traiNNer (victorca25/traiNNer)

Deep learning framework for image and video super-resolution, restoration and image-to-image translation, for training and testing.

BasicSR (Basic Super Restoration) is an open source image and video restoration toolbox (super-resolution, denoising, deblurring and others) based on PyTorch.

This is a heavily modified fork of the original BasicSR. Here you will find: boilerplate code for training and testing computer vision (CV) models, different CV methods and strategies integrated into a single pipeline, and the modularity to add and remove components as needed, including new network architectures. The code was largely rewritten to reduce redundancy and duplication, reorganize it, and make it more modular.

Details of the supported architectures can be found here.

(README currently WIP)

Some of the new features in the latest version of this code:

  • The filters and image manipulations used by the different functions (HFEN, SSIM/MS-SSIM, SPL, TV/DTV, etc) are now consolidated in filters.py and colors.py
  • Reusable loss builder, so that new losses are defined once for all models and new models need fewer changes
  • Metrics builder, so that only the selected metrics are computed during validation
  • Automatic Mixed Precision (AMP: https://pytorch.org/docs/master/amp.html) is now properly integrated (code updated to work with PyTorch 1.6.0 and 1.3.0). Option: "use_amp". A sketch of the underlying pattern is shown after this list.
  • Contextual Loss (https://arxiv.org/abs/1803.02077, https://arxiv.org/abs/1803.04626). Option: 'cx_type'.
  • Differentiable Augmentations for data-efficient GAN training (https://arxiv.org/pdf/2006.10738). Option: 'diffaug'.
  • Batch augmentations (based on https://arxiv.org/abs/2004.00448). Option: 'mixup'.
  • ESRGAN+ improvements to the ESRGAN network (https://arxiv.org/pdf/2001.08073). Options: 'gaussian' and 'plus'.
  • Adapted frequency filtering per loss function (https://arxiv.org/pdf/1911.07850). Option: 'fs'.
  • Enabled option to use the feature maps from the VGG-like discriminator during training for feature similarity (https://arxiv.org/abs/1712.05927). Option: 'discriminator_vgg_128_fea'.
  • PatchGAN option for the discriminator (https://arxiv.org/pdf/1611.07004v3.pdf). Option: 'patchgan'.
  • Multiscale PatchGAN option for the discriminator (https://arxiv.org/pdf/1711.11585.pdf). Option: 'multiscale'.
  • Added a modified Pixel Attention Network for Efficient Image Super-Resolution (https://arxiv.org/pdf/2010.01073.pdf), which includes a self-attention layer in the residual path, among other changes. A basic pretrained model for 4x scale can be found here.
  • Stochastic Weight Averaging (SWA: https://pytorch.org/blog/pytorch-1.6-now-includes-stochastic-weight-averaging/, https://arxiv.org/pdf/1803.05407.pdf) added as an option. Currently it only applies to the generator network, switching from the original learning-rate scheduler to the SWA scheduler after a defined number of iterations have passed (the original paper refers to the last 25% of training). The resulting SWA model can be converted to a regular model after training with the scripts/swa2normal.py script. Option: "use_swa", plus the SWA scheduler configuration. A sketch of the underlying pattern is shown after this list.
  • Added the basic idea behind "Freeze Discriminator: A Simple Baseline for Fine-tuning GANs" (https://arxiv.org/pdf/2002.10964.pdf) to accelerate training with transfer learning. It is possible to use a pretrained discriminator model and freeze its first (bottom) X layers. Option: "freeze_loc", enabled for any of the VGG-like discriminators and patchgan (multiscale patchgan not yet added).
  • Other changes: graceful interruption of training, so it can continue from where it was interrupted; a virtual batch option; a "strict" model-loading flag; support for YAML or JSON options files; a color transfer script (color_transfer.py) with multiple algorithms to transfer image statistics (colors) from a reference image to another; general fixes and code refactoring.
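For reference, the "use_amp" option wraps the standard PyTorch automatic mixed precision pattern. Below is a minimal, self-contained sketch of that pattern, with a toy model and fake data rather than this repository's actual training loop:

```python
import torch
import torch.nn as nn

# Standard PyTorch AMP training step (the pattern behind "use_amp").
# Toy model and fake data for illustration only.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
use_amp = device == 'cuda'

model = nn.Conv2d(3, 3, 3, padding=1).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.L1Loss()
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

lr_img = torch.rand(1, 3, 32, 32, device=device)  # fake LR input
hr_img = torch.rand(1, 3, 32, 32, device=device)  # fake HR target

optimizer.zero_grad()
with torch.cuda.amp.autocast(enabled=use_amp):
    sr_img = model(lr_img)            # forward pass runs in mixed precision
    loss = criterion(sr_img, hr_img)
scaler.scale(loss).backward()         # scale the loss to avoid fp16 underflow
scaler.step(optimizer)                # unscale gradients and step
scaler.update()                       # adjust the scale factor
```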

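Likewise, "use_swa" follows PyTorch's stochastic weight averaging utilities. A minimal sketch of that pattern (toy model and made-up iteration counts; the real code switches the generator's scheduler as described above):

```python
import torch
import torch.nn as nn
from torch.optim.swa_utils import AveragedModel, SWALR

# Standard PyTorch SWA pattern (the idea behind "use_swa"): after swa_start
# iterations, average the model weights and switch to the SWA scheduler.
model = nn.Linear(10, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
swa_model = AveragedModel(model)
swa_scheduler = SWALR(optimizer, swa_lr=5e-3)

total_iters, swa_start = 100, 75  # e.g. SWA over the last 25% of training
for it in range(total_iters):
    x = torch.rand(4, 10)
    loss = model(x).pow(2).mean()  # dummy objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if it >= swa_start:
        swa_model.update_parameters(model)  # update the running weight average
        swa_scheduler.step()

# swa_model now holds the averaged weights; in this repository a trained SWA
# model can be converted to a regular one with scripts/swa2normal.py.
```
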
WIP:

(Previous changes can be found here)

Table of Contents

  1. Dependencies
  2. Codes
  3. Usage
  4. Datasets
  5. Pretrained models

Dependencies

Optional Dependencies

Codes

We provide a detailed explanation of the code framework in ./codes.

We also provide:

  1. Some useful scripts. More details in ./codes/scripts.
  2. Evaluation code, e.g., PSNR/SSIM metrics.

To extract realistic kernels and noise patches, use the modified KernelGAN and patch-extraction code in: DLIP

Usage

Data and model preparation

The common SR datasets can be found in Datasets. Detailed data preparation can be seen in codes/data.

We provide pretrained models in Pretrained models.

How to Test

For simple testing

The recommended way to get started with models produced by the training code in this repository is to download the pretrained models you want to test and use either a GUI (for ESRGAN models, for video) or a smaller repository for inference (for ESRGAN, for video).

Otherwise, it is also possible to run inference on batches of images with the code in this repository, as follows.

Test ESRGAN (SRGAN) models

  1. Modify the configuration file options/test/test_ESRGAN.yml (or options/test/test_ESRGAN.json)
  2. Run command: python test.py -opt options/test/test_ESRGAN.yml (or python test.py -opt options/test/test_ESRGAN.json)

Test SFTGAN models

  1. Obtain the segmentation probability maps: python test_seg.py
  2. Run command: python test_sftgan.py

Test PPON models

  1. Modify the configuration file options/test/test_ESRGAN.yml (or options/test/test_ESRGAN.json)
  2. Run command: python test_ppon.py -opt options/test/test_ESRGAN.yml (or python test_ppon.py -opt options/test/test_ESRGAN.json)

Test VSR models

  1. Modify the configuration file options/test/test_video.yml
  2. Run command: python test_vsr.py -opt options/test/test_video.yml

How to Train

How to train

Datasets

Several common SR datasets are listed below.

| Name | Datasets | Short Description | Download |
|------|----------|-------------------|----------|
| Classical SR Training | T91 | 91 images for training | Google Drive / Other |
| | BSDS200 | A subset (train) of BSD500 for training | |
| | General100 | 100 images for training | |
| Classical SR Testing | Set5 | Set5 test dataset | |
| | Set14 | Set14 test dataset | |
| | BSDS100 | A subset (test) of BSD500 for testing | |
| | urban100 | 100 building images for testing (regular structures) | |
| | manga109 | 109 images of Japanese manga for testing | |
| | historical | 10 gray LR images without ground truth | |
| 2K Resolution | DIV2K | proposed in NTIRE17 (800 train and 100 validation) | Google Drive / Other |
| | Flickr2K | 2650 2K images from Flickr for training | |
| | DF2K | a merged training dataset of DIV2K and Flickr2K | |
| OST (Outdoor Scenes) | OST Training | images in 7 categories with rich textures | Google Drive / Other |
| | OST300 | 300 test images of outdoor scenes | |
| PIRM | PIRM | PIRM self-val, val, and test datasets | Google Drive / Other |

Any dataset can be augmented to expose the model to information that might not be available in its images, such as noise and blur. For this reason, Data Augmentation has been added to the options in this repository, and it can be extended to include other types of augmentations.
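
As a toy illustration of this kind of augmentation (not the repository's actual pipeline; the degrade() helper and its parameters are made up for this sketch), adding noise and blur to a batch of LR images might look like:

```python
import torch
import torch.nn.functional as F

def degrade(lr, noise_sigma=0.05, blur_kernel=3):
    """Toy on-the-fly degradation: additive Gaussian noise plus a box blur."""
    lr = lr + noise_sigma * torch.randn_like(lr)   # additive Gaussian noise
    c = lr.shape[1]
    kernel = torch.ones(c, 1, blur_kernel, blur_kernel) / blur_kernel ** 2
    lr = F.conv2d(lr, kernel, padding=blur_kernel // 2, groups=c)  # per-channel blur
    return lr.clamp(0, 1)

lr_batch = torch.rand(4, 3, 32, 32)  # fake batch of LR images
aug_batch = degrade(lr_batch)
```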

Pretrained models

The most recent community pretrained models can be found in the Wiki, Discord and nmkd's models.

You can put the downloaded models in the default experiments/pretrained_models folder.

Models that were trained from the same pretrained model, or are derivatives of the same pretrained model, can be interpolated to combine the properties of both. The original author demonstrated this by interpolating the PSNR-oriented pretrained model (not perceptually good, but smooth) with the resulting ESRGAN model (more detailed, but sometimes excessively so) to control the balance in the resulting images; interpolating the models gives much better results than interpolating the images they produce.

The authors continued exploring the capabilities of linearly interpolating models in their follow-up work "DNI" (CVPR19): Deep Network Interpolation for Continuous Imagery Effect Transition, with very interesting results and examples. The interpolation script can be found in the net_interp.py file; a new version with more options will be committed at a later time. This is an alternative way to create new models without additional training, and also to create pretrained models for easier fine-tuning.
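
At its core, network interpolation is a linear blend of the parameters of two compatible checkpoints. A minimal sketch of the idea behind net_interp.py (file names are examples; both models must share the same architecture and pretrained lineage):

```python
import torch

alpha = 0.8  # 0.0 = pure model A, 1.0 = pure model B

# Load the state_dicts of two models with the same architecture, e.g. the
# smooth PSNR-oriented model and the detailed ESRGAN model.
net_a = torch.load('RRDB_PSNR_x4.pth', map_location='cpu')
net_b = torch.load('RRDB_ESRGAN_x4.pth', map_location='cpu')

# Interpolate every parameter linearly and save the blended model.
net_interp = {k: (1 - alpha) * v + alpha * net_b[k] for k, v in net_a.items()}
torch.save(net_interp, 'interp_08.pth')
```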

More details and explanations of interpolation can be found here in the Wiki.

The following are the original pretrained models that the authors made available for ESRGAN, SFTGAN and PPON:

| Name | Models | Short Description | Download |
|------|--------|-------------------|----------|
| ESRGAN | RRDB_ESRGAN_x4.pth | final ESRGAN model used in the paper | Google Drive / Other |
| | RRDB_PSNR_x4.pth | model with high PSNR performance | |
| SFTGAN | segmentation_OST_bic.pth | segmentation model | Google Drive / Other |
| | sft_net_ini.pth | sft_net for initialization | |
| | sft_net_torch.pth | SFTGAN Torch version (paper) | |
| | SFTGAN_bicx4_noBN_OST_bg.pth | SFTGAN PyTorch version | |
| SRGAN*1 | SRGAN_bicx4_303_505.pth | SRGAN (with modification) | Google Drive |
| SRResNet*2 | SRResNet_bicx4_in3nf64nb16.pth | SRResNet (with modification) | Google Drive |
| PPON*2 | PPON.pth | PPON model presented in the paper | Original Repo |
| PAN | PAN.pth | 4x pretrained modified PAN model with self-attention | Other |
| SOFVSR | SOFVSR.pth | 4x pretrained SOFVSR model, using 3 frames | Other |
| SOFVESRGAN | SOFVESRGAN.pth | 4x pretrained modified SOFVSR model using the ESRGAN network for super-resolution, using 3 frames | Other |
| RIFE | RIFE.pth | pretrained RIFE model, converted from the three original pickle files into a single .pth model | Other |

For more details about the original pretrained models, please see experiments/pretrained_models.


Additional Help

If you have any questions, we have a Discord server where you can ask them, and a Wiki with more information.


Acknowledgement