BasicSR (Basic Super Restoration) is an open source image and video restoration toolbox (super-resolution, denoising, deblurring and others) based on PyTorch.
This is a heavily modified fork of the original BasicSR. What you will find here: boilerplate code for training and testing computer vision (CV) models; different CV methods and strategies integrated into a single pipeline; and a modular design for adding and removing components as needed, including new network architectures. A large part of the code was rewritten to reduce redundancy and duplication, reorganize the code and make it more modular.
Details of the supported architectures can be found here.
(README currently WIP)
Some of the new things in the latest version of this code:
- The filters and image manipulations used by the different functions (HFEN, SSIM/MS-SSIM, SPL, TV/DTV, etc.) are now consolidated in filters.py and colors.py.
- A reusable loss builder reduces the changes needed when using a new model: new losses are added only once and become available to all models.
- A metrics builder computes only the selected metrics during validation.
- Automatic Mixed Precision (AMP: https://pytorch.org/docs/master/amp.html) is now properly integrated (code updated to work with PyTorch 1.6.0 and 1.3.0). Option: "use_amp".
- Contextual Loss (https://arxiv.org/abs/1803.02077, https://arxiv.org/abs/1803.04626). Option: 'cx_type'.
- Differentiable Augmentation for data-efficient GAN training (https://arxiv.org/pdf/2006.10738). Option: 'diffaug'.
- Batch augmentations (based on https://arxiv.org/abs/2004.00448). Option: 'mixup'.
- ESRGAN+ improvements to the ESRGAN network (https://arxiv.org/pdf/2001.08073). Options: 'gaussian' and 'plus'.
- Adapted frequency filtering per loss function (https://arxiv.org/pdf/1911.07850). Option: 'fs'.
- Option to use the feature maps from the VGG-like discriminator during training for feature similarity (https://arxiv.org/abs/1712.05927). Option: 'discriminator_vgg_128_fea'.
- PatchGAN option for the discriminator (https://arxiv.org/pdf/1611.07004v3.pdf). Option: 'patchgan'.
- Multiscale PatchGAN option for the discriminator (https://arxiv.org/pdf/1711.11585.pdf). Option: 'multiscale'.
- Added a modified Pixel Attention Network for Efficient Image Super-Resolution (https://arxiv.org/pdf/2010.01073.pdf), which includes a self-attention layer in the residual path, among other changes. A basic pretrained model for 4x scale can be found here.
- Stochastic Weight Averaging (SWA: https://pytorch.org/blog/pytorch-1.6-now-includes-stochastic-weight-averaging/, https://arxiv.org/pdf/1803.05407.pdf) added as an option. Currently it only applies to the generator network. After a defined number of iterations, the original learning rate scheduler is replaced by the SWA scheduler (the original paper uses the last 25% of training). After training, the resulting SWA model can be converted to a regular model with the scripts/swa2normal.py script. Enable it with the "use_swa" option and configure the SWA scheduler.
- Added the basic idea behind "Freeze Discriminator: A Simple Baseline for Fine-tuning GANs" (https://arxiv.org/pdf/2002.10964.pdf) to accelerate training with transfer learning. A pretrained discriminator model can be loaded and its first (bottom) X layers frozen. Option: "freeze_loc". It is available for any of the VGG-like discriminators and for patchgan (multiscale patchgan is not yet supported).
- Other changes: added graceful interruption of training to continue from where it was interrupted, virtual batch option, "strict" model loading flag, support for using YAML or JSON options files, color transfer script (color_transfer.py) with multiple algorithms to transfer image statistics (colors) from a reference image to another, general fixes and code refactoring.
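The 'mixup' batch augmentation blends two training pairs with one shared weight. That weight is applied to both the LR input and the HR target, so the mixed pair stays consistent. The sketch below is a minimal plain-Python illustration under some assumptions:
- Images are 2D lists of floats.
- The weight is drawn from a Beta(alpha, alpha) distribution, as in standard MixUp.

`mixup_pair`, `blend` and the `alpha` default are hypothetical and not this repository's API.

```python
import random


def blend(img_a, img_b, lam):
    """Pixel-wise lam * a + (1 - lam) * b for two same-sized 2D images."""
    return [[lam * pa + (1.0 - lam) * pb for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(img_a, img_b)]


def mixup_pair(lr_a, hr_a, lr_b, hr_b, alpha=1.2, rng=random):
    """Mix two (LR, HR) pairs with one shared weight lam ~ Beta(alpha, alpha).

    Using the same lam for LR and HR keeps the mixed HR image a valid
    target for the mixed LR input. alpha=1.2 is an assumed default.
    """
    lam = rng.betavariate(alpha, alpha)
    return blend(lr_a, lr_b, lam), blend(hr_a, hr_b, lam), lam


# Usage: an all-zeros pair mixed with an all-ones pair (2x scale)
rng = random.Random(0)
lr_mix, hr_mix, lam = mixup_pair([[0.0, 0.0]], [[0.0] * 4] * 2,
                                 [[1.0, 1.0]], [[1.0] * 4] * 2, rng=rng)
# every mixed pixel equals 1 - lam
```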
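The SWA model itself is an equal-weight running average of generator snapshots taken after the switch to the SWA scheduler. The sketch below is minimal and assumes weights are stored as a dict of flat lists of floats. In PyTorch, `torch.optim.swa_utils.AveragedModel` keeps this kind of average for you. `update_swa` and the parameter name are hypothetical; this is not the code behind "use_swa".

```python
def update_swa(swa_weights, new_weights, n_averaged):
    """Fold one more snapshot into an equal-weight running average.

    Uses avg + (w - avg) / (n + 1), which equals the plain mean of all
    n + 1 snapshots. Returns the new average and the new count.
    """
    if n_averaged == 0:
        # First snapshot: the average is just a copy of it.
        return {name: list(vals) for name, vals in new_weights.items()}, 1
    updated = {
        name: [a + (w - a) / (n_averaged + 1)
               for a, w in zip(avg, new_weights[name])]
        for name, avg in swa_weights.items()
    }
    return updated, n_averaged + 1


# Snapshots of a hypothetical one-layer generator taken during the SWA phase
snapshots = [{"conv.weight": [1.0, 2.0]},
             {"conv.weight": [3.0, 4.0]},
             {"conv.weight": [5.0, 0.0]}]
swa, n = {}, 0
for snap in snapshots:
    swa, n = update_swa(swa, snap, n)
print(swa, n)  # {'conv.weight': [3.0, 2.0]} 3
```

The incremental form only keeps the running average and a counter, so snapshots never need to be stored.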
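A virtual batch emulates a larger batch than fits in GPU memory. One common way to do this, assumed here, is gradient accumulation: the gradients of k equal-sized micro-batches are averaged before a single optimizer step. The toy model below uses a single parameter and a least-squares loss; it is hypothetical and for illustration only. It shows that the accumulated gradient matches the full-batch gradient.

```python
def grad_mse(w, batch):
    """Gradient of mean((w * x - y)^2) with respect to w over a batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, batch, micro_batch_size):
    """Average the gradients of equal-sized micro-batches (virtual batch).

    Assumes len(batch) is divisible by micro_batch_size, so every
    micro-batch has the same size and the average is exact.
    """
    micro_batches = [batch[i:i + micro_batch_size]
                     for i in range(0, len(batch), micro_batch_size)]
    k = len(micro_batches)
    total = 0.0
    for mb in micro_batches:
        # Each micro-batch contributes 1/k, like scaling its loss by 1/k
        total += grad_mse(w, mb) / k
    return total


data = [(1.0, 2.0), (2.0, 3.5), (3.0, 6.5), (4.0, 8.0)]
full = grad_mse(0.5, data)
virtual = accumulated_grad(0.5, data, micro_batch_size=2)
print(full, virtual)  # -22.75 -22.75
```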
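One classic way to transfer color statistics from a reference image is mean and standard deviation matching, as in Reinhard et al. The sketch below applies it per channel in plain Python and assumes channels are flat lists of floats. Note two differences from the original method and from this repository:
- The original method works in the decorrelated lαβ color space, not on raw channels.
- This is not necessarily one of the algorithms in color_transfer.py.

A real implementation would also clip the result to the valid pixel range. `transfer_channel` and `mean_std_transfer` are hypothetical names.

```python
from statistics import fmean, pstdev


def transfer_channel(src, ref, eps=1e-8):
    """Shift and scale src so its mean and std match those of ref."""
    mu_s, sd_s = fmean(src), pstdev(src)
    mu_r, sd_r = fmean(ref), pstdev(ref)
    scale = sd_r / (sd_s + eps)  # eps guards against flat channels
    return [(v - mu_s) * scale + mu_r for v in src]


def mean_std_transfer(src_channels, ref_channels):
    """Apply transfer_channel independently to each channel."""
    return [transfer_channel(s, r) for s, r in zip(src_channels, ref_channels)]


# Usage: three channels of a tiny 2x2 image, flattened
src = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.5, 0.6, 0.6], [0.0, 0.1, 0.0, 0.1]]
ref = [[0.6, 0.8, 0.7, 0.9], [0.2, 0.4, 0.3, 0.1], [0.3, 0.3, 0.5, 0.5]]
out = mean_std_transfer(src, ref)
```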
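The idea behind 'fs' is to split an image into low and high frequencies with a low-pass filter, so that each loss only sees the band it should. For example, a pixel loss could be applied to the low frequencies and the adversarial loss to the high frequencies. The sketch below is minimal and, for simplicity, assumes a box filter with replicated borders. The filter used by the 'fs' option may differ, and `box_blur`/`split_frequencies` are hypothetical names.

```python
def box_blur(img, radius=1):
    """Low-pass filter: mean over a (2r+1)x(2r+1) window, edges replicated."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            acc = 0.0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    ii = min(max(i + di, 0), h - 1)  # replicate border rows
                    jj = min(max(j + dj, 0), w - 1)  # replicate border cols
                    acc += img[ii][jj]
            row.append(acc / (2 * radius + 1) ** 2)
        out.append(row)
    return out


def split_frequencies(img, radius=1):
    """Return (low, high) such that low + high reconstructs img."""
    low = box_blur(img, radius)
    high = [[p - l for p, l in zip(pr, lr)] for pr, lr in zip(img, low)]
    return low, high


# Usage: each band can then be fed to a different loss
sr = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
low, high = split_frequencies(sr)
```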
WIP:
- Added on-the-fly use of realistic image kernels extracted with KernelGAN (https://openaccess.thecvf.com/content_ICCV_2019/papers/Zhou_Kernel_Modeling_Super-Resolution_on_Real_Low-Resolution_Images_ICCV_2019_paper.pdf) and injection of noise extracted from real image patches (https://openaccess.thecvf.com/content_cvpr_2018/papers/Chen_Image_Blind_Denoising_CVPR_2018_paper.pdf)
- Switch to OpenCV-based composable transformations for augmentations (https://github.com/victorca25/opencv_transforms) with a new dataloader
- Use of configuration presets for reuse instead of editing full configuration files
- Video network for optical flow and video super-resolution (http://arxiv.org/abs/2001.02129). A pretrained model using 3 frames, trained on a subset of the REDS dataset, can be found here.
- Added an option to use different image upscaling networks with the HR optical flow estimation for video (a model pretrained using 3 frames, with the default ESRGAN as the SR network, can be found here)
- Initial integration of the RIFE (https://arxiv.org/abs/2011.06294) architecture for Video Frame Interpolation (a trained model converted from the three original pickle files into a single pth model can be found here)
- Video ESRGAN (EVSRGAN) and SR3D networks using 3D convolution for video super-resolution, inspired by "3DSRnet: Video Super-resolution using 3D Convolutional Neural Networks" (https://arxiv.org/pdf/1812.09079.pdf). (An EVSRGAN model pretrained using 3 frames and the default architecture options can be found here.)
- Real-time Deep Video Deinterlacing (https://arxiv.org/pdf/1708.00187.pdf) training and testing codes implemented
(Previous changes can be found here)
- Python 3 (Anaconda is recommended)
- PyTorch >= 0.4.0
- NVIDIA GPU + CUDA
- Python packages: `pip install numpy opencv-python`
- Python package: `pip install tensorboardX`, for visualizing curves.
- Python package: `pip install lmdb`, for lmdb database support.
Code: `./codes`. We provide a detailed explanation of the code framework in `./codes`.
We also provide:
- Some useful scripts. More details in `./codes/scripts`.
- Evaluation code, e.g., PSNR/SSIM metrics.
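PSNR compares a restored image with its ground truth as 10 * log10(MAX^2 / MSE). The sketch below is minimal and plain Python. It assumes 8-bit images (MAX = 255) given as 2D lists. The repository's evaluation code may also crop borders or compute PSNR on the Y channel only; neither is shown here.

```python
import math


def psnr(img1, img2, max_val=255.0):
    """PSNR in dB between two same-sized images given as 2D lists."""
    sq_errors = [(a - b) ** 2
                 for r1, r2 in zip(img1, img2)
                 for a, b in zip(r1, r2)]
    mse = sum(sq_errors) / len(sq_errors)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


# Usage: one of four pixels differs by 10 levels, so MSE = 25
score = psnr([[0, 0], [0, 0]], [[0, 0], [0, 10]])
```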
To extract the realistic kernels and noise patches, use the modified KernelGAN and patch extraction code in DLIP.
The common SR datasets can be found in Datasets. Detailed data preparation can be seen in `codes/data`.
We provide pretrained models in Pretrained models.
The recommended way to get started with the models produced by the training code in this repository is to download the pretrained models and test them with either a GUI (for ESRGAN models, for video) or a smaller inference repository (for ESRGAN, for video).
Otherwise, it is also possible to run inference on batches of images with the code in this repository, as follows.
For ESRGAN models:
- Modify the configuration file `options/test/test_ESRGAN.yml` (or `options/test/test_ESRGAN.json`).
- Run the command: `python test.py -opt options/test/test_ESRGAN.yml` (or `python test.py -opt options/test/test_ESRGAN.json`).
For SFTGAN models:
- Obtain the segmentation probability maps: `python test_seg.py`
- Run the command: `python test_sftgan.py`
For PPON models:
- Modify the configuration file `options/test/test_ESRGAN.yml` (or `options/test/test_ESRGAN.json`).
- Run the command: `python test_ppon.py -opt options/test/test_ESRGAN.yml` (or `python test_ppon.py -opt options/test/test_ESRGAN.json`).
For video models:
- Modify the configuration file `options/test/test_video.yml`.
- Run the command: `python test_vsr.py -opt options/test/test_video.yml`
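The `-opt` flag used by the test scripts accepts either a YAML or a JSON options file. The sketch below shows how such a loader can pick the parser from the file extension. It assumes PyYAML (`yaml.safe_load`) is used for YAML files. `load_options` is a hypothetical helper, not the repository's own options parser.

```python
import json
import os


def load_options(path):
    """Parse an options file, choosing the parser from its extension.

    .json files use the standard library; .yml/.yaml files use PyYAML
    (pip install pyyaml), which is imported only when needed.
    """
    ext = os.path.splitext(path)[1].lower()
    if ext not in (".json", ".yml", ".yaml"):
        raise ValueError(f"unsupported options file extension: {ext!r}")
    with open(path, "r", encoding="utf-8") as f:
        if ext == ".json":
            return json.load(f)
        import yaml  # third-party dependency (PyYAML)
        return yaml.safe_load(f)


# Usage (path taken from the test instructions):
# opt = load_options("options/test/test_ESRGAN.yml")
```

Dispatching on the extension keeps both formats interchangeable, so the same command works whether `-opt` points at the `.yml` or the `.json` file.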
Several common SR datasets are listed below.

| Name | Datasets | Short Description | Google Drive | Other |
|---|---|---|---|---|
| Classical SR Training | T91 | 91 images for training | Google Drive | Other |
| | BSDS200 | A subset (train) of BSD500 for training | | |
| | General100 | 100 images for training | | |
| Classical SR Testing | Set5 | Set5 test dataset | | |
| | Set14 | Set14 test dataset | | |
| | BSDS100 | A subset (test) of BSD500 for testing | | |
| | urban100 | 100 building images for testing (regular structures) | | |
| | manga109 | 109 images of Japanese manga for testing | | |
| | historical | 10 gray LR images without the ground truth | | |
| 2K Resolution | DIV2K | Proposed in NTIRE17 (800 train and 100 validation) | Google Drive | Other |
| | Flickr2K | 2650 2K images from Flickr for training | | |
| | DF2K | A merged training dataset of DIV2K and Flickr2K | | |
| OST (Outdoor Scenes) | OST Training | Images in 7 categories with rich textures | Google Drive | Other |
| | OST300 | 300 test images of outdoor scenes | | |
| PIRM | PIRM | PIRM self-val, val, test datasets | Google Drive | Other |
Any dataset can be augmented to expose the model to information that might not be available in the images, such as noise and blur. For this reason, data augmentation has been added to the options in this repository, and it can be extended to include other types of augmentations.
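As a minimal illustration of noise and blur augmentation, the sketch below degrades only the LR input and leaves the HR target clean. Each degradation is applied with some probability. The function names, the [1, 2, 1] / 4 blur kernel, the probabilities and the noise level are all assumptions chosen for the example. They are not this repository's option names or defaults. Pixel values are assumed to be in [0, 1].

```python
import random


def clip01(v):
    """Clamp a pixel value to the assumed [0, 1] range."""
    return min(max(v, 0.0), 1.0)


def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise with std sigma, then clip to [0, 1]."""
    return [[clip01(p + rng.gauss(0.0, sigma)) for p in row] for row in img]


def blur_121(img):
    """Separable [1, 2, 1] / 4 blur (a small Gaussian approximation)."""
    def smooth(seq):
        n = len(seq)
        return [(seq[max(i - 1, 0)] + 2 * seq[i] + seq[min(i + 1, n - 1)]) / 4.0
                for i in range(n)]
    rows = [smooth(row) for row in img]               # horizontal pass
    cols = [smooth(list(col)) for col in zip(*rows)]  # vertical pass
    return [list(row) for row in zip(*cols)]          # transpose back


def degrade(lr_img, rng, p_blur=0.5, p_noise=0.5, sigma=0.02):
    """Randomly blur and/or add noise to an LR image (hypothetical helper)."""
    if rng.random() < p_blur:
        lr_img = blur_121(lr_img)
    if rng.random() < p_noise:
        lr_img = add_gaussian_noise(lr_img, sigma, rng)
    return lr_img


# Usage: always apply both degradations to a small test pattern
rng = random.Random(1)
out = degrade([[0.0, 1.0, 0.0]] * 3, rng, p_blur=1.0, p_noise=1.0, sigma=0.1)
```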
The most recent community pretrained models can be found in the Wiki, Discord and nmkd's models.
You can put the downloaded models in the default `experiments/pretrained_models` folder.
Models that were trained from the same pretrained model, or that are derived from it, can be interpolated to combine the properties of both. The original authors demonstrated this with two models:
- The PSNR-oriented pretrained model, which is not perceptually good but produces smooth images.
- The resulting ESRGAN models, which have more detail, sometimes excessively so.

Interpolating between them controls the balance in the output images. Interpolating the network weights, rather than the output images of both models, gives much better results.
The authors continued exploring linear model interpolation in their later work "DNI" (CVPR19): Deep Network Interpolation for Continuous Imagery Effect Transition, with very interesting results and examples. The interpolation script is in the net_interp.py file, and a new version with more options will be committed at a later time. Interpolation is a way to create new models without additional training, and also to create pretrained models for easier fine-tuning.
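Network interpolation blends the parameters of two models with the same architecture: each parameter becomes (1 - alpha) * theta_A + alpha * theta_B. The sketch below is minimal and plain Python. It assumes both models are dicts mapping parameter names to flat lists of floats. With real PyTorch checkpoints, the same expression would be applied to each tensor of the two loaded state dicts. This is not the code in net_interp.py, and the parameter names are placeholders.

```python
def interpolate_weights(net_a, net_b, alpha):
    """Per-parameter (1 - alpha) * net_a + alpha * net_b.

    Both models must share the same parameter names and sizes, which is
    why they need a common pretrained ancestor.
    """
    if net_a.keys() != net_b.keys():
        raise ValueError("models have different parameters; cannot interpolate")
    return {name: [(1.0 - alpha) * a + alpha * b
                   for a, b in zip(net_a[name], net_b[name])]
            for name in net_a}


# Placeholder parameters standing in for a PSNR model and an ESRGAN model
psnr_net = {"conv_first.weight": [0.0, 1.0], "conv_first.bias": [0.5]}
esrgan_net = {"conv_first.weight": [1.0, 3.0], "conv_first.bias": [-0.5]}
blended = interpolate_weights(psnr_net, esrgan_net, 0.5)
```

With `net_a` as the PSNR model, values of alpha closer to 0 keep more of its smoothness, and values closer to 1 keep more of the GAN model's detail.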
More details and explanations of interpolation can be found here in the Wiki.
Following are the original pretrained models that the authors made available for ESRGAN, SFTGAN and PPON:
| Name | Models | Short Description | Google Drive | Other |
|---|---|---|---|---|
| ESRGAN | RRDB_ESRGAN_x4.pth | final ESRGAN model we used in our paper | Google Drive | Other |
| | RRDB_PSNR_x4.pth | model with high PSNR performance | | |
| SFTGAN | segmentation_OST_bic.pth | segmentation model | Google Drive | Other |
| | sft_net_ini.pth | sft_net for initialization | | |
| | sft_net_torch.pth | SFTGAN Torch version (paper) | | |
| | SFTGAN_bicx4_noBN_OST_bg.pth | SFTGAN PyTorch version | | |
| SRGAN*1 | SRGAN_bicx4_303_505.pth | SRGAN (with modification) | Google Drive | |
| SRResNet*2 | SRResNet_bicx4_in3nf64nb16.pth | SRResNet (with modification) | Google Drive | |
| PPON*2 | PPON.pth | PPON model presented in the paper | Original Repo | |
| PAN | PAN.pth | 4x pretrained modified PAN model with self-attention | Other | |
| SOFVSR | SOFVSR.pth | 4x pretrained SOFVSR model, using 3 frames | Other | |
| SOFVESRGAN | SOFVESRGAN.pth | 4x pretrained modified SOFVSR model using the ESRGAN network for super-resolution, using 3 frames | Other | |
| RIFE | RIFE.pth | Pretrained RIFE model converted from the three original pickle files into a single pth model | Other | |
For more details about the original pretrained models, please see `experiments/pretrained_models`.
If you have any questions, you can ask them on our Discord server, and the Wiki has more information.
- Code architecture is inspired by pytorch-cyclegan and based on the original version of BasicSR.