Commit

Merge branch 'master' of github.com:jxbz/fromage
jxbz committed Jun 9, 2020
2 parents 13d114f + 14091e5 commit 6457498
Showing 4 changed files with 33 additions and 88 deletions.
29 changes: 14 additions & 15 deletions README.md
@@ -18,9 +18,11 @@ Fromage 🧀 optimiser
To get started with Fromage in your Pytorch code, copy the file `fromage.py` into your project directory, then write:
```
from fromage import Fromage
- optimizer = Fromage(net.parameters(), lr=0.01)
+ optimizer = Fromage(net.parameters(), lr=0.01, p_bound=None)
```
- We found an initial learning rate of 0.01 worked well in all experiments except model fine-tuning, where we used 0.001. You may want to experiment with learning rate decay schedules.
+ An initial learning rate of 0.01 has worked well in all our experiments except model fine-tuning, where 0.001 worked well. Decaying the learning rate when the loss plateaus is a good idea.

+ On some benchmarks, Fromage heavily overfit the training set. We were able to control this behaviour by setting the `p_bound` regularisation flag. This constrains the norm of each layer's weights to lie within a factor of `p_bound` times its initial value.
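
As a minimal sketch of how the new `p_bound` option and the plateau-based decay advice above might be used together (not taken from the diff itself): `net`, `loader` and `criterion` below are placeholders, `p_bound=1.0` simply mirrors the transformer command later in this commit, and we assume Fromage follows the standard `torch.optim.Optimizer` interface so that a stock scheduler such as `ReduceLROnPlateau` composes with it.
```
import torch
from fromage import Fromage

# Placeholders for illustration only: swap in your own model, data loader and loss.
net = torch.nn.Linear(10, 2)
loader = [(torch.randn(32, 10), torch.randint(0, 2, (32,))) for _ in range(10)]
criterion = torch.nn.CrossEntropyLoss()

# p_bound=1.0 constrains each layer's weight norm to within 1x its initial value;
# this mirrors the --p_bound 1.0 setting in the transformer command further down.
optimizer = Fromage(net.parameters(), lr=0.01, p_bound=1.0)

# One way to decay the learning rate when the loss plateaus is a stock scheduler.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1, patience=5)

for epoch in range(20):
    epoch_loss = 0.0
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(net(x), y)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    scheduler.step(epoch_loss)  # reduce the lr if the loss has stopped improving
```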

## About this repository

@@ -37,24 +39,21 @@ If something isn't clear or isn't working, let us know in the *Issues section* o
Here is the structure of this repository.

.
- ├── classify-cifar/ # CIFAR-10 classification experiments.
- ├── classify-imagenet/ # Imagenet classification experiments. Coming soon! 🕒
- ├── classify-mnist/ # MNIST classification experiments.
- ├── finetune-transformer/ # Transformer fine-tuning experiments.
- ├── generate-cifar/ # CIFAR-10 class-conditional GAN experiments.
- ├── make-plots/ # Code to reproduce the figures in the paper.
- ├── LICENSE # The license on our algorithm.
- ├── README.md # The very page you're reading now.
- └── fromage.py # Pytorch code for the Fromage optimiser.
+ ├── classify-cifar/ # CIFAR-10 classification experiments.
+ ├── classify-imagenet/ # Imagenet classification experiments.
+ ├── classify-mnist/ # MNIST classification experiments.
+ ├── transformer-wikitext2/ # Transformer training experiments.
+ ├── generate-cifar/ # CIFAR-10 class-conditional GAN experiments.
+ ├── make-plots/ # Code to reproduce the figures in the paper.
+ ├── LICENSE # The license on our algorithm.
+ ├── README.md # The very page you're reading now.
+ └── fromage.py # Pytorch code for the Fromage optimiser.

- Check back in a few days if the code you're after is missing. We're currently cleaning and posting it.

## Acknowledgements

- This research was supported by [Caltech](https://www.caltech.edu/) and [NVIDIA](https://www.nvidia.com/).
- Our code is written in [Pytorch](https://pytorch.org/).
- Our GAN implementation is based on a codebase by [Jiahui Yu](http://jiahuiyu.com/).
- - Our Transformer code is from [🤗 Transformers](https://github.com/huggingface/transformers).
+ - Our Transformer code is from the [Pytorch example](https://github.com/pytorch/examples/tree/master/word_language_model).
- Our CIFAR-10 classification code is originally by [kuangliu](https://github.com/kuangliu/pytorch-cifar).
- Our MNIST code was originally forked from the [Pytorch example](https://github.com/pytorch/examples/tree/master/mnist).
- See [here](https://arxiv.org/abs/1708.03888) and [here](https://people.eecs.berkeley.edu/~youyang/publications/batch32k.pdf) for closely related work by [Yang You](https://people.eecs.berkeley.edu/~youyang/), [Igor Gitman](https://scholar.google.com/citations?user=8r9aWLIAAAAJ&hl=en) and [Boris Ginsburg](https://scholar.google.com/citations?user=7BRYaGcAAAAJ&hl=nl).
14 changes: 10 additions & 4 deletions classify-imagenet/README.md
@@ -1,8 +1,14 @@
- ## Requirements
+ <h1 align="center">
+ Fromage 🧀 optimiser
+ </h1>

+ ## Imagenet classification experiments

+ ### Requirements
- [PyTorch](http://pytorch.org)
- [NVIDIA APEX](https://github.com/NVIDIA/apex#quick-start)

- ## Data Preparation
+ ### Data Preparation
Download the ImageNet 2012 dataset and structure the dataset under
train and val subfolders. You can follow [this page](https://github.com/pytorch/examples/tree/master/imagenet#requirements)
to structure the dataset. The data directory should be in the form:
@@ -17,11 +23,11 @@ to structure the dataset. The data directory should be in the form:
├── n01443537/
├── ...

- ## COMMANDS
+ ### Commands
```
cd classify-imagenet
python -m torch.distributed.launch --nproc_per_node=8 train_imagenet.py --data $DATA_DIR --results_dir $RESULTS_DIR \
--save $EXPR_NAME --optimizer fromage --learning_rate 1e-2 --seed 0
```
Above, `$DATA_DIR` refers to the dataset directory path, `$RESULTS_DIR` is the results directory, and `$EXPR_NAME` gives
a name for the experiment.
14 changes: 2 additions & 12 deletions generate-cifar/README.md
@@ -8,19 +8,9 @@ The following Python packages are required: numpy, torch, torchvision, tqdm.

An example job is
```
- python main.py --optim fromage --lrG 0.01 --lrD 0.01 --epochs 121 --seed 0
+ python main.py --seed 0 --optim fromage --initial_lr 0.01
```
- See inside `main.py` for additional command line arguments.

- ## Results

- Running `sh batch.sh`, we obtain the following results:

- | | train FID | test FID |
- |---------|------------|------------|
- | Fromage | 16.4 ± 0.5 | 16.3 ± 0.8 |
- | Adam | 19.1 ± 0.9 | 19.4 ± 1.1 |
- | SGD | 36.4 ± 2.5 | 36.7 ± 2.7 |
+ See inside `batch.sh` for the settings used in the paper.

## Acknowledgements
- The self-attention block implementation is originally by https://github.com/zhaoyuzhi.
64 changes: 7 additions & 57 deletions transformer-wikitext2/README.md
@@ -1,61 +1,11 @@
- # Word-level language modeling RNN
+ <h1 align="center">
+ Fromage 🧀 optimiser
+ </h1>

- This example trains a multi-layer RNN (Elman, GRU, or LSTM) on a language modeling task.
- By default, the training script uses the Wikitext-2 dataset, provided.
- The trained model can then be used by the generate script to generate new text.
+ ## Transformer training on Wikitext-2

- ```bash
- python main.py --cuda --epochs 6 # Train a LSTM on Wikitext-2 with CUDA
- python main.py --cuda --epochs 6 --tied # Train a tied LSTM on Wikitext-2 with CUDA
- python main.py --cuda --epochs 6 --model Transformer --lr 5
- # Train a Transformer model on Wikitext-2 with CUDA
- python main.py --cuda --tied # Train a tied LSTM on Wikitext-2 with CUDA for 40 epochs
- python generate.py # Generate samples from the trained LSTM model.
- python generate.py --cuda --model Transformer
- # Generate samples from the trained Transformer model.
+ This codebase is from the [Pytorch example](https://github.com/pytorch/examples/tree/master/word_language_model). To run the training script, use a command like:
```

- The model uses the `nn.RNN` module (and its sister modules `nn.GRU` and `nn.LSTM`)
- which will automatically use the cuDNN backend if run on CUDA with cuDNN installed.

- During training, if a keyboard interrupt (Ctrl-C) is received,
- training is stopped and the current model is evaluated against the test dataset.

- The `main.py` script accepts the following arguments:

- ```bash
- optional arguments:
- -h, --help show this help message and exit
- --data DATA location of the data corpus
- --model MODEL type of recurrent net (RNN_TANH, RNN_RELU, LSTM, GRU)
- --emsize EMSIZE size of word embeddings
- --nhid NHID number of hidden units per layer
- --nlayers NLAYERS number of layers
- --lr LR initial learning rate
- --clip CLIP gradient clipping
- --epochs EPOCHS upper epoch limit
- --batch_size N batch size
- --bptt BPTT sequence length
- --dropout DROPOUT dropout applied to layers (0 = no dropout)
- --decay DECAY learning rate decay per epoch
- --tied tie the word embedding and softmax weights
- --seed SEED random seed
- --cuda use CUDA
- --log-interval N report interval
- --save SAVE path to save the final model
- --onnx-export path to export the final model in onnx format
- --transformer_head N the number of heads in the encoder/decoder of the transformer model
- --transformer_encoder_layers N the number of layers in the encoder of the transformer model
- --transformer_decoder_layers N the number of layers in the decoder of the transformer model
- --transformer_d_ff N the number of nodes on the hidden layer in feed forward nn
- ```

- With these arguments, a variety of models can be tested.
- As an example, the following arguments produce slower but better models:

- ```bash
- python main.py --cuda --emsize 650 --nhid 650 --dropout 0.5 --epochs 40
- python main.py --cuda --emsize 650 --nhid 650 --dropout 0.5 --epochs 40 --tied
- python main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --epochs 40
- python main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --epochs 40 --tied
+ python main.py --cuda --epochs 20 --model Transformer --optim fromage --lr 0.01 --p_bound 1.0
```
+ We provide the shell script `batch.sh` to run multiple experiments.
