This is the course project for 2019 Fall CSCI-SHU 360 Machine Learning. It proposes an improved real-time anime upscaling method that uses a Generative Adversarial Network for super resolution, building on Anime4K, an open-source, high-quality real-time anime upscaling algorithm.
TLDR
- Open this page to see the live demo. Allow several seconds for the videos to load.
- Refer to this section to run a toy example using a pre-trained model, and use this section to train your own model.
- Jump to this section to upscale your own videos.
- This section lists the Jupyter notebooks this project used on Google Colab.
Table of Contents
- TLDR
- Table of Contents
- Quick Start: Web Demo
- Quick Start: Video Upscaling
- Quick Start: Training and Testing
- Quick Start: Notebooks
- Credits
Quick Start: Web Demo
The demo is available here. Please allow several seconds for the videos to load the first time you open the website. If a video does not play automatically, simply click the Pause/Play video button to start it, because Google disabled autoplay with sound. You can spare the effort of deploying it locally, but feel free to proceed if you would like to make your own changes.
This is a screenshot of the demo page. The upper video is the upscaled one and the lower is the original. You can clearly see that the upscaled video has smoother lines across the face and hair.
Prerequisites
None. This project uses Live Server, a VS Code extension, to quickly set up a local server, but there are no specific requirements. All commands in the following sections are based on this setup.
Deployment
First, clone the ANIME4K-ML repository to your local directory.
git clone https://github.com/ANPULI/Anime4K-ML.git
code web
Open index-demo.html. Press Ctrl+Shift+P and select Live Server: Open with Live Server, or simply use the shortcut Alt+L Alt+O to deploy the website.
Usage
The path is already set. Simply click load with URI, and you can take a look at the difference between the original video and the upscaled one.
For usage on other videos/images, you may deploy index.html and click choose file. You can play with the Scale, Bold, and Blur parameters to see the altered results.
Quick Start: Video Upscaling
Prerequisites
Installation & Setup
First, clone the ANIME4K-ML repository to your local directory.
git clone https://github.com/ANPULI/Anime4K-ML.git
cd ANIME4K-ML/SRGAN-video
Then, install the Python dependencies using the following command before proceeding.
pip install -r requirements.txt
Change Configuration
:warning: Caveat: After installing FFmpeg, please change the ffmpeg_path in video2x.json to the absolute path of your local installation.
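If you prefer to do this programmatically, here is a minimal sketch that patches the setting with Python. It assumes ffmpeg_path is stored as a top-level key, so adjust the lookup (and the FFmpeg path) to match the actual layout of video2x.json.
import json

CONFIG = "video2x.json"

with open(CONFIG, "r", encoding="utf-8") as f:
    config = json.load(f)

# Point ffmpeg_path to your local FFmpeg binary (adjust the key location
# if video2x.json nests this setting differently).
config["ffmpeg_path"] = "/usr/local/bin/ffmpeg"

with open(CONFIG, "w", encoding="utf-8") as f:
    json.dump(config, f, indent=4)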
Sample Video
If you do not have a video to start with, you can use the sample video provided in the sub-directory.
The clip is from the anime “さくら荘のペットな彼女”; copyright belongs to 株式会社アニプレックス (Aniplex Inc.). It will be removed immediately if its use violates copyright.
Run Upscaling
Enlarge the video to 960p (4x) using the CPU.
python main.py -i sample_input.mp4 -o sample_input_upscaled.mp4 -m cpu -r 4
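If you have several clips to process, a small wrapper like the following can call main.py for each one; the file naming and flags are just an illustration based on the command above.
import subprocess
from pathlib import Path

# Upscale every .mp4 in the current directory by 4x on the CPU.
# Output files get an "_upscaled" suffix; adjust -m/-r as needed.
for clip in Path(".").glob("*.mp4"):
    if clip.stem.endswith("_upscaled"):
        continue  # skip files that were already produced by this script
    out = clip.with_name(f"{clip.stem}_upscaled.mp4")
    subprocess.run(
        ["python", "main.py", "-i", str(clip), "-o", str(out), "-m", "cpu", "-r", "4"],
        check=True,
    )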
Quick Start: Training and Testing
Prerequisites
- Windows System
- Python 3 Download
Installation & Setup
First, clone the ANIME4K-ML repository to your local directory.
git clone https://github.com/ANPULI/Anime4K-ML.git
cd ANIME4K-ML/SRGAN-impl
Then, install the Python dependencies using the following command before proceeding.
pip install -r requirements.txt
Use Pre-trained Model
You can spare the effort of training by directly using one of the following pre-trained models:
- models/generator.h5, trained on the DIV2K dataset.
- models/generator_ANIME.h5, trained on an anime face dataset.
It is recommended to use the first model, even though it was not trained on anime. The second model suffers severely from the low resolution of its training dataset and therefore does not perform well in terms of visual perception.
To use the DIV2K model, run the following in the terminal:
python infer_old.py --image_dir 'path/to/input/directory' --output_dir 'path/to/output/directory'
To use the anime model, run this:
python infer_anime.py --image_dir 'path/to/input/directory' --output_dir 'path/to/output/directory'
Toy Example
The folder ./SRGAN-impl/image_input contains several images that you can use as a toy example.
mkdir temp_output
python infer_old.py --image_dir "./image_input/" --output_dir "./temp_output"
Then you can see the upscaled images in ./temp_output, which should be the same as those in ./image_output.
To clean the output, use
rm -rf temp_output
Sample Outputs
Here is the performance compared with other methods, all using 240p → 960p (4x) upscaling:
Speed Benchmark
The trained model is applied to two datasets: one from DIV2K (800 images) and one from the 19-second anime clip (426 images) used in the demo. The average rendering speed, using the GPU provided by Google Colab, is shown in the table below:
Input Image Size | Output Image Size | Time (s) | FPS |
---|---|---|---|
128×128 | 512×512 | 0.022 | 46 |
256×256 | 1024×1024 | 0.045 | 22 |
384×384 | 1536×1536 | 0.083 | 12 |
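For reference, a rough sketch of how these numbers could be reproduced is shown below. It assumes the generator is a standard Keras .h5 model that maps a normalized RGB image to its 4x-upscaled counterpart; the exact loading code in infer_old.py may differ.
import time

import numpy as np
import tensorflow as tf

# Assumes models/generator.h5 loads as a plain Keras model with NHWC float inputs.
model = tf.keras.models.load_model("models/generator.h5", compile=False)

batch = np.random.rand(1, 128, 128, 3).astype("float32")  # dummy 128x128 frame

model.predict(batch)  # warm-up run so graph building is not timed

n_runs = 50
start = time.time()
for _ in range(n_runs):
    model.predict(batch)
elapsed = (time.time() - start) / n_runs

print(f"avg time per frame: {elapsed:.3f}s, approx FPS: {1 / elapsed:.1f}")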
The problem this project aims to solve is upscaling low-resolution anime videos to high-resolution ones (240p → 1080p). The typical frame rate of anime is 24 FPS. Since the program was run on Google Colab, where the GPU is shared between users and the file system has very poor read/write speed, the numbers above underestimate the achievable throughput. On a local system the running speed will be much higher and should satisfy this requirement.
Similarity Measures
All experiments are performed with a scale factor of 4× between the low- and high-resolution images, corresponding to a 16× reduction in image pixels. The Set5 and Set14 images and the corresponding metrics are taken from the supplementary materials of Huang et al. and Twitter’s work on SRGAN. The best results, except for ground truth, are shown in bold. It is apparent that SRGAN is the best among the compared methods. Other deep learning methods are not included in the comparison because they are not at the same level of computing time: SRGAN works substantially faster. The results are shown below. The MOS (mean opinion score) metric, reported in the SRGAN paper, quantifies visual perception.
Set5&14 | nearest | bicubic | SRGAN | original |
---|---|---|---|---|
PSNR | 26.26 | 28.43 | 29.40 | ∞ |
SSIM | 0.7552 | 0.8211 | 0.8472 | 1 |
MOS | 1.28 | 1.97 | 3.58 | 4.32 |
Testing is also done on some anime images; the results are as follows:
Anime images | nearest | bicubic | SRGAN | original |
---|---|---|---|---|
PSNR | 29.79 | 31.67 | 32.88 | ∞ |
SSIM | 0.9479 | 0.9585 | 0.9593 | 1 |
Do it on your own?
If you would like to test the similarity measures yourself, there is an easy-to-use package, sewar, that provides these functions. To get images upscaled with the bicubic and nearest-neighbor algorithms, you can use the following two Python files:
SRGAN-impl/image_resize_bicubic.py
SRGAN-impl/image_resize_nn.py
Their usage is as follows:
mkdir 'path/to/output/image'
python image_resize_nn.py --res 'resolution' --input_dir 'path/to/input/image' --output_dir 'path/to/output/image'
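Below is a minimal sketch of how PSNR and SSIM could be computed with sewar (install it with pip install sewar first). The file names are placeholders, and both images are assumed to have the same dimensions, as required by full-reference metrics.
import numpy as np
from PIL import Image
from sewar.full_ref import psnr, ssim

# Hypothetical file names: a ground-truth image and its upscaled counterpart.
gt = np.asarray(Image.open("ground_truth.png").convert("RGB"))
sr = np.asarray(Image.open("upscaled.png").convert("RGB"))

print("PSNR:", psnr(gt, sr))
# sewar's ssim returns a (ssim, cs) pair; the first element is the usual SSIM score.
print("SSIM:", ssim(gt, sr)[0])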
Train Your Own Model
To train the model, you need a local image dataset. You do not need to manually split it into high- and low-resolution parts; the program does this automatically. Then, simply execute the following command to start training:
python main.py --image_dir 'path/to/image/directory' --hr_size 384 --lr 1e-4 --save_iter 200 --epochs 10 --batch_size 14
The training status is updated every 200 iterations. To monitor the training process, you can open TensorBoard and point it to the log directory that will be created once training starts.
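For example, assuming TensorBoard is installed and the logs are written to the default log directory:
tensorboard --logdir log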
Convert images to h5 file (buggy, don’t use)
If you are also using Google Colab as your training tool, and using Google Drive as the file system, you may have already spotted the following problems:
- It takes a considerable time to read/write.
- Some small files appear to exist when you browse them, but raise a file-not-found error when accessed from a program.
If yes, you may want to convert the images to a single file to boost the read/write speed and preserve file completeness.
The script SRGAN-impl/image2h5.py provides a solution by packing multiple images into an HDF5 file. To use it, first install the h5py dependency by executing the following command:
pip install h5py
Then, make sure your files are in the sub-directory SRGAN-impl/image_input/ and in PNG format, and run the following command:
python image2h5.py
Otherwise, you can also specify the parameters explicitly:
python image2h5.py --image_dir 'path/to/your/image/input' --image_format 'your_image_format'
This will produce an HDF5 file called images.hdf5 that stores all of your images.
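For intuition, the underlying idea looks roughly like the following sketch. This is an illustration of the approach, not the exact contents of image2h5.py, and the dataset naming is assumed.
import glob

import h5py
import numpy as np
from PIL import Image

# Pack every PNG in image_input/ into one HDF5 file, one dataset per image.
with h5py.File("images.hdf5", "w") as f:
    for i, path in enumerate(sorted(glob.glob("image_input/*.png"))):
        img = np.asarray(Image.open(path).convert("RGB"))
        f.create_dataset(f"image_{i}", data=img, compression="gzip")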
Upscaling from an HDF5 image file
After you have your HDF5 file, you can run the upscaling on it with:
python infer_h5.py --output_dir 'path/to/output/image'
This will generate upscaled images in your desired output directory.
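If you want to inspect the generated HDF5 file yourself, a quick check with h5py might look like the following; the dataset names depend on how image2h5.py stores them.
import h5py

with h5py.File("images.hdf5", "r") as f:
    # List the stored datasets and print the shape of the first one.
    keys = list(f.keys())
    print(f"{len(keys)} images stored")
    if keys:
        print("first image shape:", f[keys[0]].shape)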
Quick Start: Notebooks
The directory ./Anime4K-ML-notebooks/
stores the Jupyter notebooks this project used on Google Colab, covering the different stages of the project. Below is a list of the notebooks with a short description of what each one does; more detailed instructions can be found inside the notebooks.
:warning: Caveat: These notebooks are not executable on a local file system because they mount Google Drive for storage. If you are interested in running the notebooks, please feel free to contact me for access and instructions.
File Name | Usage | Author |
---|---|---|
load_dataset.ipynb | Covers the code that downloaded datasets to Google Drive. | Hongyi Li |
interpolation.ipynb | Bilinear, bicubic, and quintic upscaling using Scipy on some sample images. | Qihang Zeng |
midterm_presentation.ipynb | Covers the datasets this project used and some preliminary experiments on them, i.e., randomly picking some images, applying different upscaling algorithms, and showing the results. | Hongyi Li |
video2x.ipynb | Failed attempt to run a video upscaling pipeline (system mismatch). | Anpu Li |
fast-srgan.ipynb | Covers training and testing with SRGAN and a complete pipeline of extracting frames, applying super resolution, and converting back to video. | Anpu Li |
final.ipynb | Covers the experiments, including the speed benchmark and similarity measures. | Anpu Li |
Credits
The implementation of this project could not have been done without the insights derived from Fast-SRGAN and Video2X. This project also relies on FFmpeg and Anime4K.