Commit
Merge branch 'master' of github.com:jxbz/fromage
Showing 4 changed files with 33 additions and 88 deletions.
```diff
@@ -1,61 +1,11 @@
-# Word-level language modeling RNN
-
-This example trains a multi-layer RNN (Elman, GRU, or LSTM) on a language modeling task.
-By default, the training script uses the Wikitext-2 dataset, provided.
-The trained model can then be used by the generate script to generate new text.
-
-```bash
-python main.py --cuda --epochs 6           # Train a LSTM on Wikitext-2 with CUDA
-python main.py --cuda --epochs 6 --tied    # Train a tied LSTM on Wikitext-2 with CUDA
-python main.py --cuda --epochs 6 --model Transformer --lr 5
-                                           # Train a Transformer model on Wikitext-2 with CUDA
-python main.py --cuda --tied               # Train a tied LSTM on Wikitext-2 with CUDA for 40 epochs
-python generate.py                         # Generate samples from the trained LSTM model.
-python generate.py --cuda --model Transformer
-                                           # Generate samples from the trained Transformer model.
-```
-
-The model uses the `nn.RNN` module (and its sister modules `nn.GRU` and `nn.LSTM`)
-which will automatically use the cuDNN backend if run on CUDA with cuDNN installed.
-
-During training, if a keyboard interrupt (Ctrl-C) is received,
-training is stopped and the current model is evaluated against the test dataset.
-
-The `main.py` script accepts the following arguments:
-
-```bash
-optional arguments:
-  -h, --help                       show this help message and exit
-  --data DATA                      location of the data corpus
-  --model MODEL                    type of recurrent net (RNN_TANH, RNN_RELU, LSTM, GRU)
-  --emsize EMSIZE                  size of word embeddings
-  --nhid NHID                      number of hidden units per layer
-  --nlayers NLAYERS                number of layers
-  --lr LR                          initial learning rate
-  --clip CLIP                      gradient clipping
-  --epochs EPOCHS                  upper epoch limit
-  --batch_size N                   batch size
-  --bptt BPTT                      sequence length
-  --dropout DROPOUT                dropout applied to layers (0 = no dropout)
-  --decay DECAY                    learning rate decay per epoch
-  --tied                           tie the word embedding and softmax weights
-  --seed SEED                      random seed
-  --cuda                           use CUDA
-  --log-interval N                 report interval
-  --save SAVE                      path to save the final model
-  --onnx-export                    path to export the final model in onnx format
-  --transformer_head N             the number of heads in the encoder/decoder of the transformer model
-  --transformer_encoder_layers N   the number of layers in the encoder of the transformer model
-  --transformer_decoder_layers N   the number of layers in the decoder of the transformer model
-  --transformer_d_ff N             the number of nodes on the hidden layer in feed forward nn
-```
-
-With these arguments, a variety of models can be tested.
-As an example, the following arguments produce slower but better models:
-
-```bash
-python main.py --cuda --emsize 650 --nhid 650 --dropout 0.5 --epochs 40
-python main.py --cuda --emsize 650 --nhid 650 --dropout 0.5 --epochs 40 --tied
-python main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --epochs 40
-python main.py --cuda --emsize 1500 --nhid 1500 --dropout 0.65 --epochs 40 --tied
-```
+<h1 align="center">
+Fromage 🧀 optimiser
+</h1>
+
+## Transformer training on Wikitext-2
+
+This codebase is from the [Pytorch example](https://github.com/pytorch/examples/tree/master/word_language_model). To run the training script, use a command like:
+```bash
+python main.py --cuda --epochs 20 --model Transformer --optim fromage --lr 0.01 --p_bound 1.0
+```
+We provide the shell script `batch.sh` to run multiple experiments.
```
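For readers unfamiliar with the optimiser that `--optim fromage` selects, the sketch below illustrates the Fromage update rule from Bernstein et al. (2020): each tensor takes a step whose relative size is roughly `lr` (the gradient is rescaled by `||w|| / ||g||`), the result is divided by `sqrt(1 + lr^2)`, and, under an assumed reading of `--p_bound`, the tensor's norm is clipped to `p_bound` times its initial norm. This is a minimal illustration, not the repository's `fromage.py`; the function name `fromage_step` and the `init_norms` bookkeeping are invented here.

```python
# Hypothetical sketch of the Fromage update rule; names and p_bound handling
# are assumptions, not the repository's actual implementation.
import math
import torch


@torch.no_grad()
def fromage_step(params, init_norms, lr=0.01, p_bound=1.0, eps=1e-12):
    """Apply one Fromage step to an iterable of parameter tensors.

    init_norms maps each parameter to its norm at initialisation; after the
    update, each parameter's norm is clipped to p_bound * initial norm.
    """
    for p in params:
        if p.grad is None:
            continue
        g = p.grad
        p_norm = p.norm().item()
        g_norm = g.norm().item()
        if p_norm > 0.0 and g_norm > 0.0:
            # Scale the step by ||p|| / ||g||, so each tensor changes by
            # roughly a factor lr regardless of the gradient's magnitude.
            p.add_(g, alpha=-lr * p_norm / (g_norm + eps))
        else:
            p.add_(g, alpha=-lr)
        # Divide by sqrt(1 + lr^2) to correct the norm growth of the step.
        p.div_(math.sqrt(1.0 + lr ** 2))
        # Assumed interpretation of --p_bound: keep ||p|| within
        # p_bound times its initial norm.
        max_norm = p_bound * init_norms[p]
        cur_norm = p.norm().item()
        if max_norm > 0.0 and cur_norm > max_norm:
            p.mul_(max_norm / cur_norm)
```

Under these assumptions, a training loop would record `init_norms = {p: p.norm().item() for p in model.parameters()}` once at start-up and call `fromage_step(model.parameters(), init_norms, lr=0.01, p_bound=1.0)` after each `loss.backward()`.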