Course project for the UBC CPSC532L course "Multimodal Learning with Vision, Language and Sound". The goal of the project is to implement the StackGAN model and explore ways of improving the quality of the generated results. One approach is to train the model with different types of loss functions, for example the perceptual losses proposed in https://arxiv.org/abs/1708.09321. Another area for exploration is to modify the architecture of the second GAN so that its discriminator produces two loss functions that are then combined, or to make the second GAN more similar to SRGAN.
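As a rough illustration of the first idea, a perceptual loss compares a generated image and a target image in the feature space of a fixed network rather than pixel by pixel, and is typically added to the adversarial loss with a weight. The sketch below shows only that structure in plain Python: toy_features is a hypothetical stand-in for a pretrained feature extractor, perc_weight is an assumed hyper-parameter, and none of the names come from this repository or from the linked paper.

```python
def mse(a, b):
    """Mean squared error between two equally sized feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / float(len(a))


def toy_features(image):
    """Hypothetical feature extractor: means of adjacent pixel values.

    A real perceptual loss would use activations of a pretrained network.
    """
    return [(image[i] + image[i + 1]) / 2.0 for i in range(len(image) - 1)]


def perceptual_loss(feature_fn, generated, target):
    """Distance between the two images measured in feature space."""
    return mse(feature_fn(generated), feature_fn(target))


def generator_loss(adv_loss, perc_loss, perc_weight=1.0):
    """Weighted sum of an adversarial term and a perceptual term."""
    return adv_loss + perc_weight * perc_loss


# Images are flat lists of pixel values to keep the example small.
fake = [0.0, 0.5, 1.0, 0.5]
real = [0.0, 1.0, 1.0, 0.0]
loss = generator_loss(adv_loss=0.7, perc_loss=perceptual_loss(toy_features, fake, real), perc_weight=0.1)
print(loss)
```

The same weighted-sum pattern would apply if a discriminator produced two loss terms to be combined.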
PyTorch implementation for reproducing COCO results in the paper StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks by Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas. The network structure is slightly different from the TensorFlow implementation.
Dependencies
python 2.7
PyTorch
In addition, please add the project folder to PYTHONPATH and pip install the following packages:
tensorboard
python-dateutil
easydict
pandas
torchfile
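Before training, it can help to confirm that the packages above are importable. The following is a minimal sketch, not part of the repository: the helper name missing_packages is hypothetical, and the import names are the usual ones for these packages (torch for PyTorch, dateutil for python-dateutil).

```python
import importlib

# Import names for the packages listed above. The pip name and the import
# name differ for PyTorch (torch) and python-dateutil (dateutil).
REQUIRED = ["torch", "tensorboard", "dateutil", "easydict", "pandas", "torchfile"]


def missing_packages(names):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing
```

Calling missing_packages(REQUIRED) returns an empty list when everything is installed, and otherwise the names that still need to be installed.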
Data
- Download our preprocessed char-CNN-RNN text embeddings for training coco and evaluating coco, save them to data/coco.
- [Optional] Follow the instructions in reedscot/icml2016 to download the pretrained char-CNN-RNN text encoders and extract text embeddings.
- Download the coco image data. Extract them to data/coco/.
Training
- The steps to train a StackGAN model on the COCO dataset using our preprocessed embeddings.
- Step 1: train Stage-I GAN (e.g., for 120 epochs)
python main.py --cfg cfg/coco_s1.yml --gpu 0
- Step 2: train Stage-II GAN (e.g., for another 120 epochs)
python main.py --cfg cfg/coco_s2.yml --gpu 1
- *.yml files are example configuration files for training/evaluating our models.
- If you want to try your own datasets, here are some good tips about how to train GANs. Also, we encourage you to try different hyper-parameters and architectures, especially for more complex datasets.
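The commands above pass a configuration file and a GPU id on the command line. As a rough illustration of that interface, here is a minimal argparse sketch that accepts the same two flags. The actual parser in main.py may define other options, and the dest names and defaults shown here are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the two flags used in the training commands above."""
    parser = argparse.ArgumentParser(description="Train or evaluate a StackGAN stage")
    # Path to a YAML configuration file such as cfg/coco_s1.yml.
    parser.add_argument("--cfg", dest="cfg_file", type=str, default="cfg/coco_s1.yml")
    # GPU id, kept as a string here; the default value is an assumption.
    parser.add_argument("--gpu", dest="gpu_id", type=str, default="0")
    return parser.parse_args(argv)


# Equivalent to: python main.py --cfg cfg/coco_s2.yml --gpu 1
args = parse_args(["--cfg", "cfg/coco_s2.yml", "--gpu", "1"])
print("%s %s" % (args.cfg_file, args.gpu_id))
```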
Pretrained Model
- StackGAN for coco. Download and save it to models/coco.
- Our current implementation has a higher inception score (10.62±0.19) than reported in the StackGAN paper.
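For context on the number above: the inception score is exp(E_x[KL(p(y|x) || p(y))]), where p(y|x) is the class distribution a pretrained Inception classifier predicts for a generated image and p(y) is the average of those distributions over all images. The ± value is commonly the standard deviation over several splits of the samples; whether this repository follows that convention is an assumption. The sketch below computes the score from already-computed probability vectors in plain Python. The function names are illustrative, and obtaining the probabilities from an Inception model is left out.

```python
import math


def inception_score(probs):
    """exp of the mean KL(p(y|x) || p(y)) over all samples.

    `probs` is a list of per-image class probability vectors.
    """
    n = float(len(probs))
    num_classes = len(probs[0])
    # Marginal class distribution p(y), averaged over images.
    marginal = [sum(p[k] for p in probs) / n for k in range(num_classes)]
    kl_total = 0.0
    for p in probs:
        for k, pk in enumerate(p):
            if pk > 0.0:  # a term with pk == 0 contributes nothing
                kl_total += pk * (math.log(pk) - math.log(marginal[k]))
    return math.exp(kl_total / n)


def score_mean_std(probs, num_splits=10):
    """Mean and standard deviation of the score over consecutive splits.

    Assumes len(probs) >= num_splits; leftover samples are dropped.
    """
    size = len(probs) // num_splits
    scores = [inception_score(probs[i * size:(i + 1) * size]) for i in range(num_splits)]
    mean = sum(scores) / len(scores)
    var = sum((s - mean) ** 2 for s in scores) / len(scores)
    return mean, math.sqrt(var)
```

The score is 1 when every image gets the same prediction and grows as predictions become both confident per image and diverse across images.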
Evaluating
- Run python main.py --cfg cfg/coco_eval.yml --gpu 2 to generate samples from captions in COCO validation set.
Examples for COCO:
Save your favorite pictures generated by our models, since the randomness from noise z and conditioning augmentation makes them creative enough to generate objects with different poses and viewpoints from the same description 😃
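The variation mentioned above comes from two random inputs. In the StackGAN paper, conditioning augmentation samples the conditioning vector as c = mu + sigma * eps with eps drawn from a standard normal, where mu and sigma are computed from the text embedding, and c is then concatenated with a noise vector z. The sketch below illustrates only that sampling step in plain Python. The mu and sigma values are made up for illustration (in the model they come from a learned layer), and the function names are hypothetical.

```python
import random


def sample_condition(mu, sigma, rng):
    """Reparameterized sample c = mu + sigma * eps, eps ~ N(0, 1) per dimension."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


def sample_generator_input(mu, sigma, z_dim, rng):
    """Concatenate a sampled conditioning vector with a noise vector z."""
    c = sample_condition(mu, sigma, rng)
    z = [rng.gauss(0.0, 1.0) for _ in range(z_dim)]
    return c + z


# Hypothetical mean and standard deviation for one caption.
mu = [0.2, -0.1, 0.5]
sigma = [0.3, 0.3, 0.3]
rng = random.Random(0)
first = sample_generator_input(mu, sigma, z_dim=4, rng=rng)
second = sample_generator_input(mu, sigma, z_dim=4, rng=rng)
# Same caption statistics, but two different inputs to the generator.
print(first != second)
```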
If you find StackGAN useful in your research, please consider citing:
@inproceedings{han2017stackgan,
Author = {Han Zhang and Tao Xu and Hongsheng Li and Shaoting Zhang and Xiaogang Wang and Xiaolei Huang and Dimitris Metaxas},
Title = {StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks},
Year = {2017},
booktitle = {{ICCV}},
}
Our follow-up work
- StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks
- AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks [supplementary][code]
References