Merge pull request #10 from wilson1yan/dev
Add pretrained UCF101 / BAIR VideoGPT models + corresponding demos + FVD script
wilson1yan authored Jun 27, 2021
2 parents b234b80 + 74bd379 commit 5aca0e2
Showing 23 changed files with 2,270 additions and 71 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -4,3 +4,4 @@ datasets/
.ipynb_checkpoints/
*.mp4
*.egg-info
i3d_pretrained_400.pt
27 changes: 20 additions & 7 deletions README.md
@@ -11,20 +11,24 @@ We present VideoGPT: a conceptually simple architecture for scaling likelihood b
## Installation
Change the `cudatoolkit` version to one compatible with your machine.
```bash
$ conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=11.0
$ pip install git+https://github.com/wilson1yan/VideoGPT.git
conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=11.0
pip install git+https://github.com/wilson1yan/VideoGPT.git
```
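
As an optional sanity check after installing (a minimal sketch; it only assumes the `videogpt` package installed by the pip command above and a working PyTorch), you can confirm the package imports and that CUDA is visible:
```python
# Optional post-install sanity check: verify PyTorch, CUDA, and the videogpt package import.
import torch
import videogpt  # installed by the pip command above

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```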

### Sparse Attention (Optional)
For limited compute scenarios, it may be beneficial to use [sparse attention](https://arxiv.org/abs/1904.10509).
```bash
$ sudo apt-get install llvm-9-dev
$ DS_BUILD_SPARSE_ATTN=1 pip install deepspeed
sudo apt-get install llvm-9-dev
DS_BUILD_SPARSE_ATTN=1 pip install deepspeed
```
After installing `deepspeed`, you can train a sparse transformer by setting the flag `--attn_type sparse` in `scripts/train_videogpt.py`. The default supported sparsity configuration is an N-d strided sparsity layout; however, you can write your own arbitrary layouts to use.
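
For intuition, the sketch below is illustrative only (it does not use DeepSpeed's actual configuration objects): it marks which prior tokens a query may attend to when attention is restricted to axis-aligned lines of a small `(T, H, W)` latent grid, one simple instance of an N-d strided layout:
```python
# Illustrative sketch of an N-d "strided" sparsity layout for video tokens.
# NOTE: this is NOT DeepSpeed's config API -- it only visualizes which
# (query, key) pairs such a layout keeps, for a tiny latent grid.
import itertools
import numpy as np

def ndim_strided_mask(shape=(4, 4, 4)):
    """Causal mask where a token attends to prior tokens lying on one of
    its axis-aligned lines in the (T, H, W) latent grid."""
    coords = list(itertools.product(*[range(s) for s in shape]))
    n = len(coords)
    mask = np.zeros((n, n), dtype=bool)
    for i, q in enumerate(coords):
        for j, k in enumerate(coords[: i + 1]):           # causal: keys come before the query
            same_axes = sum(a == b for a, b in zip(q, k))
            mask[i, j] = same_axes >= len(shape) - 1       # differ in at most one axis
    return mask

if __name__ == "__main__":
    m = ndim_strided_mask((4, 4, 4))
    print(f"kept {m.sum()} of {m.size} entries "
          f"({100 * m.mean():.1f}% of the dense causal mask)")
```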

## Dataset
The default code accepts data as an HDF5 file with the specified format in `videogpt/data.py`, OR a directory format with the follow structure:
The default code accepts data as an HDF5 file with the specified format in `videogpt/data.py`. An example of such a dataset can be constructed from the BAIR Robot data by running the script:
```bash
sh scripts/preprocess/bair/create_bair_dataset.sh datasets/bair
```
Alternatively, the code supports a dataset with the following directory structure:
```
video_dataset/
train/
@@ -52,7 +56,7 @@ video_dataset/
```
An example of such a dataset can be constructed from [UCF-101](https://www.crcv.ucf.edu/data/UCF101.php) data by running the script:
```bash
sh scripts/preprocess/create_ucf_dataset.sh datasets/ucf101
sh scripts/preprocess/ucf101/create_ucf_dataset.sh datasets/ucf101
```
You may need to install `unrar` and `unzip` for the code to work correctly.
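
If you want to build the HDF5 file yourself, the exact layout the loader expects is defined in `videogpt/data.py`; the snippet below is only a hedged sketch of packing uint8 frames into one file, and the dataset key names (`train_data`, `train_idx`, etc.) are placeholders that you should rename to match that file:
```python
# Hedged sketch: pack raw uint8 frames into a single HDF5 file per split.
# The dataset/key names below are placeholders -- check videogpt/data.py
# for the exact layout the loader expects before relying on this.
import h5py
import numpy as np

def write_split(f, split, videos):
    """videos: list of (t, h, w, 3) uint8 arrays, one entry per clip."""
    frames = np.concatenate(videos, axis=0)                       # all frames, stacked
    # start index of each clip inside the stacked frame array
    idx = np.cumsum([0] + [v.shape[0] for v in videos])[:-1]
    f.create_dataset(f"{split}_data", data=frames, dtype="uint8")
    f.create_dataset(f"{split}_idx", data=idx, dtype="int64")

if __name__ == "__main__":
    fake_clips = [np.random.randint(0, 255, size=(16, 64, 64, 3), dtype=np.uint8)
                  for _ in range(4)]
    with h5py.File("toy_dataset.h5", "w") as f:
        write_split(f, "train", fake_clips)
        write_split(f, "test", fake_clips[:1])
```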

@@ -103,6 +107,12 @@ Use the `scripts/train_vqvae.py` script to train a VQ-VAE. Execute `python scrip
* `--resolution 128`: spatial resolution to train on
* `--sequence_length 16`: temporal resolution, or video clip length

## Using Pretrained VideoGPTs
There are two available pre-trained VideoGPT models:
* `bair_gpt`: single frame-conditional BAIR model using discrete encodings from the `bair_stride4x2x2` VQ-VAE
* `ucf101_uncond_gpt`: unconditional UCF101 model using discrete encodings from the `ucf101_stride4x4x4` VQ-VAE

Note that both pre-trained models use sparse attention. You will need to install sparse attention in order to fine-tune them; however, sampling does not require sparse attention to be installed.
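
As a rough sketch of pulling one of these checkpoints into your own code: the loader name below is an assumption, not a confirmed API, so check the exports of the `videogpt` package (e.g. `videogpt/__init__.py`) for the helper this repo actually provides:
```python
# Hedged sketch: load a pretrained VideoGPT by name.
# `load_videogpt` is an assumed helper -- confirm the exact export in the
# videogpt package before relying on it.
import torch
from videogpt import load_videogpt  # assumed name; see videogpt/__init__.py

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
gpt = load_videogpt("ucf101_uncond_gpt").to(device).eval()  # or "bair_gpt"
```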

## Training VideoGPT
You can download a pretrained VQ-VAE, or train your own. Afterwards, use the `scripts/train_videogpt.py` script to train a VideoGPT model for sampling. Execute `python scripts/train_videogpt.py -h` for information on all available training settings. A subset of the more relevant settings is listed below, along with default values.
### VideoGPT Specific Settings
@@ -128,7 +138,10 @@ You can download a pretrained VQ-VAE, or train your own. Afterwards, use the `sc
* `--sequence_length 16`: temporal resolution, or video clip length

## Sampling VideoGPT
After training, the VideoGPT model can be sampled using the `scripts/sample_videogpt.py`. You may need to install `ffmpeg`: `sudo apt-get install ffmpeg`
VideoGPT models can be sampled using the `scripts/sample_videogpt.py` script. You can specify either a path to a checkpoint saved during training or the name of a pretrained model. You may need to install `ffmpeg`: `sudo apt-get install ffmpeg`.
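
The `ffmpeg` dependency is needed for writing `.mp4` files. The repo's scripts handle saving for you; the sketch below is only a hedged illustration of turning a batch of sampled frames into `.mp4` files, and it assumes `imageio` (with its ffmpeg plugin) as an extra dependency plus samples shaped `(b, c, t, h, w)` in `[0, 1]`:
```python
# Hedged sketch: write sampled videos to .mp4. Assumes samples are a
# (b, c, t, h, w) float tensor in [0, 1] and that imageio + ffmpeg are
# available; the repo's own scripts already handle saving.
import numpy as np
import torch
import imageio

def save_videos(samples: torch.Tensor, fps: int = 8, prefix: str = "sample"):
    samples = (samples.clamp(0, 1) * 255).byte().cpu().numpy()   # -> uint8
    samples = np.transpose(samples, (0, 2, 3, 4, 1))             # -> (b, t, h, w, c)
    for i, video in enumerate(samples):
        imageio.mimwrite(f"{prefix}_{i}.mp4", list(video), fps=fps)
```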

## Evaluation
Evaluation is done primarily using [Frechet Video Distance (FVD)](https://arxiv.org/abs/1812.01717) for BAIR and Kinetics, and [Inception Score](https://arxiv.org/abs/1606.03498) for UCF-101. Inception Score can be computed by generating samples and using the code from the [TGANv2 repo](https://github.com/pfnet-research/tgan2). FVD can be computed through `python scripts/compute_fvd.py`, which runs a PyTorch-ported version of the [original codebase](https://github.com/google-research/google-research/tree/master/frechet_video_distance).
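
For intuition, FVD is the Frechet (2-Wasserstein) distance between Gaussians fit to I3D features of real and generated videos. A minimal sketch of that final distance computation, assuming you already have the two feature matrices (the I3D feature extraction itself is what `scripts/compute_fvd.py` handles) and that `scipy` is available:
```python
# Minimal sketch of the Frechet distance used by FVD, given precomputed
# I3D feature matrices for real and generated videos (shape: n_videos x d).
# Feature extraction with the pretrained I3D network is done elsewhere
# (e.g. by scripts/compute_fvd.py).
import numpy as np
from scipy import linalg

def frechet_distance(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)
    covmean, _ = linalg.sqrtm(cov_r @ cov_f, disp=False)
    covmean = covmean.real                                  # drop spurious imaginary part
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r + cov_f - 2.0 * covmean))
```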

## Reproducing Paper Results
Note that this repo is designed primarily for simplicity and for extending our method. The full paper results can be reproduced using the code in a [separate repo](https://github.com/wilson1yan/VideoGPT-Paper); however, be aware that the code there is not as clean.