An unofficial PyTorch implementation of the paper "Denoising Diffusion Probabilistic Models" (Ho et al., 2020), based on the official TensorFlow implementation.
This implementation includes:
- A complete DDPM training pipeline
- Support for CelebA-HQ and CIFAR-10 datasets
- Multi-GPU training support via PyTorch Lightning
- Configurable model architecture and training parameters
- TensorBoard logging for training metrics and image generation
Requirements:
- Python 3.10+
- CUDA-capable GPU (recommended)
- Dependencies listed in `environment.yml` or `requirements.txt`
Approximate training cost:
- Training on the CelebA-HQ dataset (256x256 images) costs around $435 for 0.5M steps; it took ~11 days on 4x RTX A6000.
- Training on the CIFAR-10 dataset (32x32 images) costs around $70 for 0.8M steps; it took ~3 days on 1x RTX 4090.
To set up the environment:
- Clone the repository:
$ git clone https://github.com/AhmedEssam19/ddpm-pytorch.git
$ cd ddpm-pytorch
- Create and activate conda environment:
$ conda env create -f environment.yml
$ conda activate image-generation-finetuning
Alternatively, you can use pip:
$ pip install --no-cache-dir -r requirements.txt
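Either way, it is worth confirming that PyTorch can see the GPU before training; this quick sanity check is not part of the repository:
$ python -c "import torch; print(torch.__version__, torch.cuda.is_available())"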
The repository is organized as follows:
- `model.py`: the U-Net architecture with attention mechanisms
- `diffusion_utils.py`: implementation of the diffusion process utilities
- `train.py`: training script with PyTorch Lightning
- `pl_utils.py`: Lightning model and callbacks
- `dataset.py`: dataset implementations for CelebA-HQ and CIFAR-10
- `config.py`: configuration management
- `sample.py`: image generation script
- `configs/`: configuration files for different datasets
To train a model:
- Choose or modify a configuration file in the `configs/` directory (a minimal loading sketch follows this list). Two default configurations are provided:
  - `celeba.yml`: for the CelebA-HQ dataset (256x256 images)
  - `cifar10.yml`: for the CIFAR-10 dataset (32x32 images)
- Start training:
$ python train.py configs/celeba.yml
To resume training from a checkpoint:
$ python train.py configs/celeba.yml --continue-training --checkpoint-path path/to/checkpoint.ckpt
To generate images using a trained model:
$ python3 sample.py --help
Usage: sample.py [OPTIONS] CHECKPOINT_PATH
╭─ Arguments ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ * checkpoint_path TEXT [default: None] [required] │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --device TEXT [default: cuda] │
│ --num-images INTEGER [default: 8] │
│ --image-size INTEGER [default: 256] │
│ --timesteps INTEGER [default: 1000] │
│ --batch-size INTEGER [default: 8] │
│ --output-dir TEXT [default: samples] │
│ --help                          Show this message and exit.                                                                                                    │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
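For example, to generate 16 CIFAR-10-sized samples from a trained model (the checkpoint path below is a placeholder):
$ python3 sample.py path/to/checkpoint.ckpt --num-images 16 --image-size 32 --batch-size 8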
The implementation uses a U-Net architecture with:
- Residual blocks with group normalization
- Self-attention layers at specified resolutions
- Time embedding through sinusoidal positional encoding (sketched below)
- Skip connections between encoder and decoder
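To make the time-embedding bullet concrete, here is a minimal sketch of a sinusoidal timestep embedding in PyTorch; it follows the standard Transformer-style formulation used by DDPM, not necessarily the exact code in `model.py`:

```python
import math
import torch

def timestep_embedding(t: torch.Tensor, dim: int) -> torch.Tensor:
    """Sinusoidal embedding of integer timesteps t: shape (B,) -> (B, dim)."""
    half = dim // 2
    # Frequencies spaced geometrically from 1 down to 1/10000, as in Transformers.
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, dtype=torch.float32) / half)
    args = t.float()[:, None] * freqs[None, :]                     # (B, half)
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)   # (B, dim)

emb = timestep_embedding(torch.tensor([0, 10, 999]), 128)
print(emb.shape)  # torch.Size([3, 128])
```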
The training follows the DDPM paper's approach down to the details; a minimal sketch of the core steps appears after this list:
- The forward diffusion process gradually adds Gaussian noise
- The model learns to reverse the diffusion process
- Uses a linear noise schedule
- Uses linear learning-rate warmup
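As a concrete reference for these bullets, here is a minimal sketch of the linear schedule and the epsilon-prediction objective from the DDPM paper (Algorithm 1); the constants are the paper's defaults, and the repository's `diffusion_utils.py` may structure this differently:

```python
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)               # linear noise schedule (paper defaults)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)  # cumulative product of (1 - beta_t)

def q_sample(x0, t, noise):
    """Forward process: sample x_t ~ q(x_t | x_0) in closed form."""
    a = alphas_cumprod.to(x0.device)[t].view(-1, 1, 1, 1)
    return a.sqrt() * x0 + (1.0 - a).sqrt() * noise

def ddpm_loss(model, x0):
    """One training step of Algorithm 1: predict the added noise with MSE."""
    t = torch.randint(0, T, (x0.shape[0],), device=x0.device)
    noise = torch.randn_like(x0)
    return F.mse_loss(model(q_sample(x0, t, noise), t), noise)
```

Linear warmup then scales the learning rate by min(step / warmup_steps, 1) over the first few thousand steps, e.g. via `torch.optim.lr_scheduler.LambdaLR`.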
Training progress can be monitored using TensorBoard:
$ tensorboard --logdir lightning_logs
This will show:
- Training loss
- Generated samples during training
- Validation metrics
If you use this implementation in your research, please cite the original DDPM paper:
@article{ho2020denoising,
  title={Denoising Diffusion Probabilistic Models},
  author={Ho, Jonathan and Jain, Ajay and Abbeel, Pieter},
  journal={arXiv preprint arXiv:2006.11239},
  year={2020}
}