traiNNer: Deep learning framework for image and video super-resolution, restoration and image-to-image translation, for training and testing.

victorca25/traiNNer

BasicSR

BasicSR (Basic Super Restoration) is an open source image and video restoration toolbox (super-resolution, denoising, deblurring and others) based on PyTorch.

This is a heavily modified fork of the original BasicSR. What you will find here: boilerplate code for training and testing computer vision (CV) models, different CV methods and strategies integrated in a single pipeline and modularity to add and remove components as needed, including new network architectures. A large rewrite of code was made to reduce code redundancy and duplicates, reorganize the code and make it more modular.

Details of the supported architectures can be found here.

(README currently WIP)

Some of the new things in the latest version of this code:

  • The filters and image manipulations used by the different functions (HFEN, SSIM/MS-SSIM, SPL, TV/DTV, etc) are now consolidated in filters.py and colors.py.
  • Reusable loss builder that reduces the changes needed when using a new model, so that new losses only have to be added once for all models.
  • Metrics builder to include only the selected ones during validation.
  • Integrated Automatic Mixed Precision (AMP). (Code updated to work with PyTorch 1.6.0 and 1.3.0.) Option: "use_amp".
  • Contextual Loss (CX, CX). Option: 'cx_type'.
  • Differential Augmentations for efficient GAN training (Paper). Option: 'diffaug'.
  • Batch augmentations (based on Cutblur). Option: 'mixup'.
  • ESRGAN+ improvements to the ESRGAN network (ESRGAN+). Options: 'gaussian' and 'plus'.
  • Adapted frequency filtering per loss function (Reference). Option: 'fs'.
  • Enabled option to use the feature maps from the VGG-like discriminator in training for feature similarity (Reference). Option: 'discriminator_vgg_128_fea'.
  • PatchGAN option for the discriminator (Reference). Option: 'patchgan'.
  • Multiscale PatchGAN option for the discriminator (Reference). Option: 'multiscale'.
  • Added a modified Pixel Attention Network for Efficient Image Super-Resolution (PAN), which includes a self-attention layer in the residual path, among other changes. A basic pretrained model for 4x scale can be found here.
  • Stochastic Weight Averaging (SWA, PyTorch) added as an option. Currently the change only applies to the generator network, switching from the original learning rate scheduler to the SWA scheduler after a defined number of iterations have passed (the original paper refers to the last 25% of training). The resulting SWA model can be converted to a regular model after training using the scripts/swa2normal.py script. Option: "use_swa", together with the SWA scheduler configuration.
  • Added the basic idea behind "Freeze Discriminator: A Simple Baseline for Fine-tuning GANs" (FreezeD) to accelerate training with transfer learning. It is possible to use a pretrained discriminator model and freeze the initial (bottom) X number of layers. Option: "freeze_loc", enabled for any of the VGG-like discriminators or patchgan (multiscale patchgan not yet added).
  • Integrated the Consistency Enforcing Module (CEM) from Explorable Super Resolution (Paper, Web). Available both for use during inference and during training (only using a default downsampling kernel at the moment). Can be easily extended to use kernels estimated from the images for downscaling, using KernelGAN from DLIP. More information on CEM here.
  • Added the training and testing codes for Super-Resolution using Normalizing Flow in PyTorch (SRFlow) models, including the GLOW reference code. A starter pretrained model can be found here; it used a model based on the original ESRGAN architecture for the RRDB module, which is necessary to later be able to test model interpolations. Otherwise, the original SRFlow model used the modified ESRGAN pretrained model, which can also be used.
  • Other changes: added graceful interruption of training to continue from where it was interrupted, virtual batch option, "strict" model loading flag, support for using YAML or JSON options files, color transfer script (color_transfer.py) with multiple algorithms to transfer image statistics (colors) from a reference image to another, integrated the "forward_chop" function into the SR model to crop images into patches before upscaling for inference in VRAM constrained systems (use option test_mode: chop), general fixes and code refactoring.
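
The idea behind the test_mode: chop option mentioned in the last item (cropping an image into patches before upscaling) can be illustrated with a minimal sketch. This is not the repository's forward_chop implementation: the function name tile_boxes and the tile and overlap parameters are hypothetical, and the code only shows one way to compute overlapping patch coordinates so that each patch can be upscaled separately on a VRAM-constrained system.

```python
def tile_boxes(height, width, tile=256, overlap=16):
    """Return (top, left, bottom, right) boxes that cover a height x width image.

    Hypothetical helper: neighbouring boxes share at least `overlap` pixels so
    seams can be blended or cropped away after each patch is upscaled.
    """
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    step = tile - overlap

    def starts(size):
        # An axis shorter than one tile needs a single box starting at 0.
        if size <= tile:
            return [0]
        positions = list(range(0, size - tile, step))
        # Always add a final box flush with the image border.
        positions.append(size - tile)
        return positions

    boxes = []
    for top in starts(height):
        for left in starts(width):
            boxes.append((top, left, min(top + tile, height), min(left + tile, width)))
    return boxes


if __name__ == "__main__":
    for box in tile_boxes(300, 500, tile=256, overlap=16):
        print(box)
```

Each box would be cropped from the low-resolution input, upscaled on its own, and pasted into the output at the coordinates multiplied by the scale factor.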

WIP:

  • Added on-the-fly use of realistic image kernels extracted with KernelGAN (Paper) and injection of noise extracted from real image patches (Reference).
  • Change to use OpenCV-based composable transformations for augmentations (From) with a new dataloader.
  • Use of configuration presets for reuse instead of editing full configuration files.
  • Video network for optical flow and video super-resolution (SOFVSR). Pretrained model using 3 frames, trained on a subset of REDS dataset here.
  • Added option to use different image upscaling networks with the HR optical flow estimation for video (Pretrained using 3 frames and default ESRGAN as SR network here).
  • Initial integration of RIFE (Paper) architecture for Video Frame Interpolation (Converted trained model from three pickle files into a single pth model here).
  • Video ESRGAN (EVSRGAN) and SR3D networks using 3D convolution for video super-resolution, inspired by "3DSRnet: Video Super-resolution using 3D Convolutional Neural Networks" (Paper). EVSRGAN pretrained using 3 frames and default arch options here.
  • Real-time Deep Video Deinterlacing (Paper) training and testing codes implemented. Pretrained DVD models can be found here.

(Previous changes can be found here)

Table of Contents

  1. Dependencies
  2. Codes
  3. Usage
  4. Datasets
  5. Pretrained models

Dependencies

  • Python 3 (using Anaconda is recommended)
  • PyTorch >= 0.4.0. PyTorch >= 1.7.0 required to enable certain features (SWA, AMP, others).
  • NVIDIA GPU + CUDA
  • Python packages: pip install numpy opencv-python

Optional Dependencies

Codes

./codes. Detailed explanation of the code framework in ./codes.

We also provide:

  1. Some useful scripts. More details in ./codes/scripts.
  2. Evaluation codes, e.g., PSNR/SSIM metric.
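
As a rough illustration of what a PSNR metric computes, here is a minimal, dependency-free sketch of the standard formula PSNR = 10 * log10(MAX^2 / MSE). It is not the evaluation code shipped in this repository, which may differ (for example in color handling or border cropping); the function name psnr and the flat-sequence image representation are assumptions made for the example.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are flat sequences of pixel values (an assumption made to keep the
    sketch free of external dependencies).
    """
    if len(reference) != len(test) or len(reference) == 0:
        raise ValueError("images must be non-empty and have the same size")
    # Mean squared error over all pixels.
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    print(psnr([100, 100, 100, 100], [110, 110, 110, 110]))
```

Higher values mean the test image is closer to the reference; identical images give infinity.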

To extract the estimated kernels and noise patches from images, use the modified KernelGAN and patch extraction code in: DLIP. Detailed instructions to use the estimated kernels are available here.

Usage

Data and model preparation

The common SR datasets can be found in Datasets. Detailed data preparation can be seen in codes/data.

We provide pretrained models in Pretrained models.

How to Test

For simple testing

The recommended way to get started with some of the models produced by the training codes available in this repository is by getting the pretrained models to be tested and either a GUI (for ESRGAN models, for video) or a smaller repo for inference (for ESRGAN, for video).

Otherwise, it is also possible to run inference on batches of images, with some additional options (such as CEM, geometric self-ensemble or automatic cropping of images before upscaling for VRAM-limited environments), using the code in this repository as follows.

Test Super Resolution models (ESRGAN, PPON, PAN, others)

  1. Modify the configuration file options/test/test_ESRGAN.yml (or options/test/test_ESRGAN.json)
  2. Run command: python test.py -opt options/test/test_ESRGAN.yml (or python test.py -opt options/test/test_ESRGAN.json)
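
Since the options file can be either YAML or JSON, a loader only needs to pick the parser from the file extension. The following is a minimal sketch of that idea, not the repository's own options parser: load_options is a hypothetical name, PyYAML (yaml.safe_load) is assumed to be installed for the YAML case, and the real option files may contain extras that a plain parser does not handle.

```python
import json
from pathlib import Path


def load_options(path):
    """Parse an options file into a dict, choosing the parser by file extension."""
    path = Path(path)
    suffix = path.suffix.lower()
    text = path.read_text(encoding="utf-8")
    if suffix == ".json":
        return json.loads(text)
    if suffix in (".yml", ".yaml"):
        # PyYAML is imported lazily so JSON-only use needs no extra package.
        import yaml
        return yaml.safe_load(text)
    raise ValueError(f"unsupported options file type: {suffix!r}")
```

For example, load_options("options/test/test_ESRGAN.yml") and load_options("options/test/test_ESRGAN.json") would both return a dictionary of options.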

Test SFTGAN models

  1. Obtain the segmentation probability maps: python test_seg.py
  2. Run command: python test_sftgan.py

Test VSR models

  1. Modify the configuration file options/test/test_video.yml
  2. Run command: python test_vsr.py -opt options/test/test_video.yml

How to Train

How to train

Datasets

Several common SR datasets are listed below.

| Name | Datasets | Short Description | Google Drive | Other |
|---|---|---|---|---|
| Classical SR Training | T91 | 91 images for training | Google Drive | Other |
| | BSDS200 | A subset (train) of BSD500 for training | | |
| | General100 | 100 images for training | | |
| Classical SR Testing | Set5 | Set5 test dataset | | |
| | Set14 | Set14 test dataset | | |
| | BSDS100 | A subset (test) of BSD500 for testing | | |
| | urban100 | 100 building images for testing (regular structures) | | |
| | manga109 | 109 images of Japanese manga for testing | | |
| | historical | 10 gray LR images without the ground-truth | | |
| 2K Resolution | DIV2K | proposed in NTIRE17 (800 train and 100 validation) | Google Drive | Other |
| | Flickr2K | 2650 2K images from Flickr for training | | |
| | DF2K | A merged training dataset of DIV2K and Flickr2K | | |
| OST (Outdoor Scenes) | OST Training | 7 categories images with rich textures | Google Drive | Other |
| | OST300 | 300 test images of outdoor scenes | | |
| PIRM | PIRM | PIRM self-val, val, test datasets | Google Drive | Other |

Any dataset can be augmented to expose the model to information that might not be available in the images, such as noise and blur. For this reason, Data Augmentation has been added to the options in this repository and it can be extended to include other types of augmentations.
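
To make the idea concrete, here is a minimal sketch of one such augmentation, additive Gaussian noise. It is not the repository's augmentation code: the function name add_gaussian_noise, the sigma parameter and the list-of-rows image representation are assumptions chosen to keep the example dependency-free.

```python
import random


def add_gaussian_noise(image, sigma=10.0, rng=None):
    """Return a noisy copy of an 8-bit grayscale image given as a list of rows."""
    if rng is None:
        rng = random.Random()
    noisy = []
    for row in image:
        # Add zero-mean Gaussian noise per pixel, then clip to the 8-bit range.
        noisy.append([
            min(255, max(0, round(pixel + rng.gauss(0.0, sigma))))
            for pixel in row
        ])
    return noisy


if __name__ == "__main__":
    clean = [[0, 128, 255], [10, 20, 30]]
    print(add_gaussian_noise(clean, sigma=25.0, rng=random.Random(0)))
```

Applying the noise only to the low-resolution input, while keeping the high-resolution target clean, is one common way to teach a model to denoise while upscaling.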

Pretrained models

The most recent community pretrained models can be found in the Wiki, Discord and nmkd's models.

You can put the downloaded models in the default experiments/pretrained_models folder.

Models that were trained from the same pretrained model, or are derivatives of the same pretrained model, can be interpolated to combine the properties of both. The original author demonstrated this by interpolating the PSNR pretrained model (which is not perceptually good, but results in smooth images) with the resulting ESRGAN models (which have more detail, sometimes excessively so) to control the balance in the resulting images. Interpolating the models, instead of interpolating the output images from both models, gives much better results.

The authors continued exploring the capabilities of linearly interpolating models in "DNI: Deep Network Interpolation for Continuous Imagery Effect Transition" (CVPR19), with very interesting results and examples. The script for interpolation can be found in the net_interp.py file. Interpolation is a way to create new models without additional training, and also to create pretrained models for easier fine-tuning.

More details and explanations of interpolation can be found here in the Wiki.
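
The linear interpolation described above amounts to blending every parameter of two compatible models with a single weight. The following is a minimal sketch of that idea and not the contents of net_interp.py: the function name interpolate_weights and the alpha parameter are hypothetical, and the dictionaries stand in for model state dicts.

```python
def interpolate_weights(weights_a, weights_b, alpha):
    """Linearly blend two weight dictionaries: (1 - alpha) * A + alpha * B.

    Values may be floats, NumPy arrays or PyTorch tensors. Both dictionaries
    must come from the same architecture, so they share parameter names.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if weights_a.keys() != weights_b.keys():
        raise KeyError("models do not have the same parameter names")
    return {
        name: (1.0 - alpha) * weights_a[name] + alpha * weights_b[name]
        for name in weights_a
    }


if __name__ == "__main__":
    psnr_like = {"conv.weight": 1.0, "conv.bias": 0.0}
    gan_like = {"conv.weight": 3.0, "conv.bias": 2.0}
    print(interpolate_weights(psnr_like, gan_like, 0.5))
```

With alpha = 0 the result equals the first model and with alpha = 1 the second, so intermediate values trade smoothness against detail. With PyTorch, the two dictionaries would typically be state dicts loaded with torch.load.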

Following are the original pretrained models that the authors made available for ESRGAN, SFTGAN and PPON:

| Name | Models | Short Description | Source | Other |
|---|---|---|---|---|
| ESRGAN | RRDB_ESRGAN_x4.pth | final ESRGAN model we used in our paper | Google Drive | Other |
| | RRDB_PSNR_x4.pth | model with high PSNR performance | | |
| SFTGAN | segmentation_OST_bic.pth | segmentation model | Google Drive | Other |
| | sft_net_ini.pth | sft_net for initialization | | |
| | sft_net_torch.pth | SFTGAN Torch version (paper) | | |
| | SFTGAN_bicx4_noBN_OST_bg.pth | SFTGAN PyTorch version | | |
| SRGAN*1 | SRGAN_bicx4_303_505.pth | SRGAN (with modification) | Google Drive | |
| SRResNet*2 | SRResNet_bicx4_in3nf64nb16.pth | SRResNet (with modification) | Google Drive | |
| PPON*2 | PPON.pth | PPON model presented in the paper | Original Repo | |
| PAN | PAN.pth | 4x pretrained modified PAN model with self-attention | | Other |
| SOFVSR | SOFVSR.pth | 4x pretrained SOFVSR model, using 3 frames | | Other |
| SOFVESRGAN | SOFVESRGAN.pth | 4x pretrained modified SOFVSR model using ESRGAN network for super-resolution, using 3 frames | | Other |
| RIFE | RIFE.pth | Converted pretrained RIFE model from the three original pickle files into a single pth model | | Other |

For more details about the original pretrained models, please see experiments/pretrained_models.


Additional Help

If you have any questions, we have a Discord server where you can ask them and a Wiki with more information.


Acknowledgement