
Anime4k-ML

15 Dec 2019

Reading time ~8 minutes

This is the course project for the Fall 2019 offering of CSCI-SHU 360 Machine Learning. The project proposes an improved real-time anime upscaling method that uses a Generative Adversarial Network for super resolution, building on Anime4K, an open-source, high-quality real-time anime upscaling algorithm.

  • Github Repo: link
  • Project Proposal: link
  • Final Presentation Slides: link
  • Final Paper: link

TLDR

  1. Open this page to see the live demo. Allow several seconds for the videos to load.
  2. Refer to this section to run a toy example using the pre-trained model, and use this section to train your own model.
  3. Jump to this section to upscale your own videos.
  4. This section contains the Jupyter notebooks this project used on Google Colab.

Table of Contents

  • TLDR
  • Table of Contents
  • Quick Start: Web Demo
    • Prerequisites
    • Deployment
    • Usage
  • Quick Start: Video Upscaling
    • Prerequisites
    • Installation & Setup
    • Change Configuration
    • Sample Video
    • Run Upscaling
  • Quick Start: Training and Testing
    • Prerequisites
    • Installation & Setup
    • Use Pre-trained Model
    • Toy Example
    • Sample Outputs
    • Speed Benchmark
    • Similarity Measures
    • Do it on your own?
    • Train Your Own Model
    • Convert images to h5 file (buggy, don’t use)
    • Do upscaling on hdf5 image file
  • Quick Start: Notebooks
  • Credits

Quick Start: Web Demo

The demo is available here. Please allow several seconds for the videos to load if you are opening the website for the first time. If a video doesn't play automatically, simply click the Pause/Play video button to start it, since Google disabled autoplay with sound. You can skip deploying the demo locally, but feel free to proceed if you would like to make your own changes.

This is a screenshot of the demo page. The upper video is the upscaled one and the lower is the original. You can clearly see that the upscaled video has smoother lines across the face and hair.

demo

Prerequisites

None. This project uses Live Server, a VS Code extension, to spin up a simple local server, but there are no specific requirements. All commands in the following sections assume this setup.

Deployment

First, clone the ANIME4K-ML repository to your local directory.

git clone https://github.com/ANPULI/Anime4K-ML.git
code web

Open index-demo.html, press Ctrl+Shift+P, and select Live Server: Open with Live Server. Alternatively, use the shortcut Alt+L Alt+O to deploy the website.
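If you prefer not to use VS Code, any static file server works just as well; for example, Python's built-in http.server (a sketch, assuming the demo files live in the web/ sub-directory cloned above):

```shell
# Serve the demo directory on http://localhost:8000
cd web
python -m http.server 8000
# then open http://localhost:8000/index-demo.html in a browser
```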

Usage

The path is already set. Simply click load with URI, and you can compare the original video with the upscaled one.

For other videos/images, you may deploy index.html and click choose file. You can play with the Scale, Bold, and Blur parameters to see altered results.

Quick Start: Video Upscaling

Prerequisites

  • Windows System
  • Python 3 Download
  • FFmpeg Windows Build Download

Installation & Setup

First, clone the ANIME4K-ML repository to your local directory.

git clone https://github.com/ANPULI/Anime4K-ML.git
cd ANIME4K-ML/SRGAN-video

Then, install the Python dependencies using the following command before proceeding.

pip install -r requirements.txt

Change Configuration

:warning: Caveat: After installing FFmpeg, please change ffmpeg_path in video2x.json to the absolute path of your local installation.
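For reference, the relevant entry in video2x.json might look like the following (a hypothetical sketch showing only the ffmpeg_path key mentioned above; other keys in your copy of the file may differ):

```json
{
  "ffmpeg_path": "C:/ffmpeg/bin/ffmpeg.exe"
}
```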

Sample Video

If you do not have a video to start with, you can use the sample video provided in the sub-directory.

sample_contrast_4to3

  • Sample Video Original (240p)
  • Sample Video Upscaled (960p)

The clip is from the anime “さくら荘のペットな彼女” (The Pet Girl of Sakurasou). Copyright belongs to 株式会社アニプレックス (Aniplex Inc.). The clip will be removed immediately if its use violates copyright.

Run Upscaling

Enlarge the video to 960p (4x) using the CPU.

python main.py -i sample_input.mp4 -o sample_input_upscaled.mp4 -m cpu -r 4
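The flags used above can be mirrored with a minimal argparse sketch. This is a hypothetical reconstruction of the command-line interface for illustration, not the project's actual main.py:

```python
import argparse

# Hypothetical reconstruction of the main.py command-line interface:
# -i input video, -o output video, -m processing method, -r upscaling ratio
parser = argparse.ArgumentParser(description="Upscale an anime video")
parser.add_argument("-i", "--input", required=True, help="input video path")
parser.add_argument("-o", "--output", required=True, help="output video path")
parser.add_argument("-m", "--method", choices=["cpu", "gpu"], default="cpu",
                    help="processing device")
parser.add_argument("-r", "--ratio", type=int, default=4,
                    help="upscaling ratio")

args = parser.parse_args(["-i", "sample_input.mp4",
                          "-o", "sample_input_upscaled.mp4",
                          "-m", "cpu", "-r", "4"])
print(args.input, args.method, args.ratio)
```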

Quick Start: Training and Testing

Prerequisites

  • Windows System
  • Python 3 Download

Installation & Setup

First, clone the ANIME4K-ML repository to your local directory.

git clone https://github.com/ANPULI/Anime4K-ML.git
cd ANIME4K-ML/SRGAN-impl

Then, install the Python dependencies using the following command before proceeding.

pip install -r requirements.txt

Use Pre-trained Model

You can skip training by directly using one of the following pre-trained models:

  • models/generator.h5, trained on the DIV2K dataset.
  • models/generator_ANIME.h5, trained on an anime face dataset.

The first model is recommended, even though it was not trained on anime. The second model suffers severely from the low resolution of its training dataset and therefore performs poorly in terms of visual perception.

To use the DIV2K model, run the following in the terminal:

python infer_old.py --image_dir 'path/to/input/directory' --output_dir 'path/to/output/directory'

To use the anime model, run this:

python infer_anime.py --image_dir 'path/to/input/directory' --output_dir 'path/to/output/directory'

Toy Example

The folder ./SRGAN-impl/image_input contains several images that you can use as a toy example.

mkdir temp_output
python infer_old.py --image_dir "./image_input/" --output_dir "./temp_output"

Then you can see the upscaled images in ./temp_output; they should be identical to those in ./image_output.
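To check that your freshly generated images really match the reference outputs byte-for-byte, a small comparison helper can be used. This is my own sketch, not part of the repository:

```python
import hashlib
from pathlib import Path

def dirs_match(dir_a, dir_b):
    """Return True if both directories contain the same file names
    with identical contents (compared by SHA-256 digest)."""
    a, b = Path(dir_a), Path(dir_b)
    names_a = sorted(p.name for p in a.iterdir() if p.is_file())
    names_b = sorted(p.name for p in b.iterdir() if p.is_file())
    if names_a != names_b:
        return False
    digest = lambda p: hashlib.sha256(p.read_bytes()).hexdigest()
    return all(digest(a / n) == digest(b / n) for n in names_a)
```

For example, `dirs_match("./temp_output", "./image_output")` should return True after a successful run.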

To clean the output, use

rm -rf temp_output

Sample Outputs

Here is the performance compared against other methods, all using 240p → 960p (4×) upscaling:

sample outputs

Speed Benchmark

The trained model is applied to two datasets: one from DIV2K (800 images) and one from a 19-second anime clip (426 images) used in the demo. The average rendering speed, measured on the GPU provided by Google Colab, is shown in the table below:

| Input Image Size | Output Image Size | Time (s) | FPS |
| ---------------- | ----------------- | -------- | --- |
| 128×128          | 512×512           | 0.022    | 46  |
| 256×256          | 1024×1024         | 0.045    | 22  |
| 384×384          | 1536×1536         | 0.083    | 12  |

The problem this project aims to solve is upscaling low-resolution anime videos to high-resolution ones (240p → 1080p). The typical frame rate of anime videos is 24 FPS. The program was run on Google Colab, where the GPU is shared among users and the file system has extremely poor read/write speed; on a local system, the running speed will be much higher and should satisfy the requirement.
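The real-time requirement can be checked with a quick calculation: at 24 FPS, each frame must be rendered in under 1/24 ≈ 0.042 s, so of the times measured above only the 128×128 input keeps up on Colab's shared GPU:

```python
# Check which measured per-frame times keep up with 24 FPS anime playback
TARGET_FPS = 24
budget = 1 / TARGET_FPS  # ~0.0417 s available per frame

measured = {"128x128": 0.022, "256x256": 0.045, "384x384": 0.083}
for size, t in measured.items():
    status = "real-time" if t <= budget else "too slow"
    print(f"{size}: {t:.3f} s/frame -> {status}")
```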

Similarity Measures

All experiments are performed with a scale factor of 4× between the low- and high-resolution images, corresponding to a 16× reduction in image pixels. The Set5 and Set14 images and the corresponding result metrics are taken from the supplementary materials of Huang et al. and Twitter's work on SRGAN. The best results, except for ground truth, are formatted in bold. SRGAN is clearly the best among the compared methods. Other deep learning methods are not included in the comparison because they are not at the same level of computing time: SRGAN works substantially faster. The results are shown below. The MOS index, taken from the SRGAN paper, quantifies visual perception.

| Set5 & Set14 | nearest | bicubic | SRGAN  | original |
| ------------ | ------- | ------- | ------ | -------- |
| PSNR         | 26.26   | 28.43   | 29.40  | ∞        |
| SSIM         | 0.7552  | 0.8211  | 0.8472 | 1        |
| MOS          | 1.28    | 1.97    | 3.58   | 4.32     |

Testing is also done on some anime images. The results are as follows:

|      | nearest | bicubic | SRGAN  | original |
| ---- | ------- | ------- | ------ | -------- |
| PSNR | 29.79   | 31.67   | 32.88  | ∞        |
| SSIM | 0.9479  | 0.9585  | 0.9593 | 1        |

Do it on your own?

If you would like to compute the similarity measures yourself, the easy-to-use package sewar provides the functions. To get images upscaled with the bicubic and nearest-neighbor algorithms, you can use these two Python files:

  • SRGAN-impl/image_resize_bicubic.py
  • SRGAN-impl/image_resize_nn.py

Usage is as follows:

mkdir 'path/to/output/image'
python image_resize_nn.py --res 'resolution' --input_dir 'path/to/input/image' --output_dir 'path/to/output/image'
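Nearest-neighbor upscaling by an integer factor is simple enough to sketch with NumPy alone. This is a toy equivalent of what image_resize_nn.py does; the actual script's interface and implementation may differ:

```python
import numpy as np

def upscale_nn(image, factor):
    """Upscale an H x W (or H x W x C) image by an integer factor,
    repeating each pixel `factor` times along both spatial axes."""
    return np.repeat(np.repeat(image, factor, axis=0), factor, axis=1)
```

For example, applying `upscale_nn(frame, 4)` to a 240p frame yields a 960p frame, matching the 4× setting used in the experiments.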

Train Your Own Model

To train the model, you need a local image dataset. However, you do not need to manually split the dataset into high-/low-resolution parts; the program will do it for you automatically. Then, simply execute the following command to start training:

python main.py --image_dir 'path/to/image/directory' --hr_size 384 --lr 1e-4 --save_iter 200 --epochs 10 --batch_size 14
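The automatic high-/low-resolution split mentioned above amounts to deriving a downsampled counterpart of each training image. Here is a hedged NumPy sketch of one common way to do it, using 4×4 block averaging; the project's actual data pipeline may crop and resize differently:

```python
import numpy as np

def make_lr(hr, factor=4):
    """Produce a low-resolution counterpart of a high-resolution image
    by averaging non-overlapping factor x factor pixel blocks."""
    h, w = hr.shape[:2]
    h, w = h - h % factor, w - w % factor  # trim so dimensions divide evenly
    trimmed = hr[:h, :w].astype(np.float64)
    blocks = trimmed.reshape(h // factor, factor, w // factor, factor, -1)
    return blocks.mean(axis=(1, 3)).astype(hr.dtype)
```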

The training status is updated every 200 iterations. To monitor the training process, you can open TensorBoard and point it to the log directory that is created when training starts.

Convert images to h5 file (buggy, don’t use)

If you are also using Google Colab as your training tool, and using Google Drive as the file system, you may have already spotted the following problems:

  • It takes a considerable time to read/write.
  • Some small files appear to exist when you browse them, but raise a file-not-found error when accessed from programs.

If yes, you may want to convert the images to a single file to boost the read/write speed and preserve file completeness.

The script SRGAN-impl/image2h5.py provides a solution by packing multiple images into a single HDF5 file. To use it, first install the h5py dependency by executing the following command:

pip install h5py

Then, make sure your files are in the sub-directory SRGAN-impl/image_input/ and in PNG format, and run the following command:

python image2h5.py

Alternatively, you can specify the parameters yourself:

python image2h5.py --image_dir 'path/to/your/image/input' --image_format 'your_image_format'

This produces an HDF5 file called images.hdf5 that stores all your images.
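A minimal version of that conversion, assuming all images share one shape, might look like the following. This is a sketch using h5py directly; image2h5.py's actual dataset layout may differ:

```python
import numpy as np
import h5py

def pack_images(arrays, out_path="images.hdf5"):
    """Stack equally-shaped image arrays and store them as a single
    HDF5 dataset, which is far faster to read back than many small
    files on a slow file system such as a mounted Google Drive."""
    stacked = np.stack(arrays)
    with h5py.File(out_path, "w") as f:
        f.create_dataset("images", data=stacked, compression="gzip")
    return stacked.shape
```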

Do upscaling on hdf5 image file

After you have your HDF5 file, you can run upscaling on it with:

python infer_h5.py --output_dir 'path/to/output/image'

This will generate upscaled images in your desired output directory.

Quick Start: Notebooks

The directory ./Anime4K-ML-notebooks/ stores the Jupyter notebooks this project used on Google Colab, covering different stages of the project. Below is a list of the notebooks with a short description of what each one does; the notebooks themselves contain further instructions.

:warning: Caveat: These notebooks are not executable on your local file system because they mount Google Drive for storage. If you are interested in running them, please feel free to contact me for access and instructions.

| File Name                  | Usage | Author |
| -------------------------- | ----- | ------ |
| load_dataset.ipynb         | Covers the code that downloaded the datasets to Google Drive. | Hongyi Li |
| interpolation.ipynb        | Bilinear, bicubic, and quintic upscaling using SciPy on some sample images. | Qihang Zeng |
| midterm_presentation.ipynb | Covers the datasets this project used and some preliminary experiments on them, i.e., randomly picking images, applying different upscaling algorithms, and showing the results. | Hongyi Li |
| video2x.ipynb              | Failed attempt to run a video upscaling pipeline (system mismatch). | Anpu Li |
| fast-srgan.ipynb           | Covers training and testing with SRGAN, plus a complete pipeline of extracting frames, applying super resolution, and converting back to video. | Anpu Li |
| final.ipynb                | Covers the experiments, including the speed benchmark and similarity measures. | Anpu Li |

Credits

This project could not have been implemented without the insights derived from Fast-SRGAN and Video2X. It also relies on FFmpeg and Anime4K.


