Merge pull request #152 from thammegowda/patch-2
Update README w/ command for distributed training
StillKeepTry authored Jun 28, 2020
2 parents 208ead5 + d164b8b commit 48c119d
Showing 1 changed file with 9 additions and 0 deletions.
README.md
@@ -93,6 +93,15 @@ valid_en-fr_mt_bleu -> 7.81
test_fr-en_mt_bleu -> 11.72
test_en-fr_mt_bleu -> 8.80
```
#### Distributed Training

To use *multiple GPUs*, e.g. 3 GPUs **on the same node**:
```
export NGPU=3; CUDA_VISIBLE_DEVICES=0,1,2 python -m torch.distributed.launch --nproc_per_node=$NGPU train.py [...args]
```
To use *multiple GPUs* across **multiple nodes**, use Slurm to request a multi-node job and launch the above command.
The code automatically detects the `SLURM_*` environment variables and distributes the training accordingly.
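A minimal sketch of such a Slurm job script (job name, node and GPU counts, and time limit are placeholders; adapt them to your cluster and submit with `sbatch`):
```
#!/bin/bash
#SBATCH --job-name=mass-mt          # placeholder job name
#SBATCH --nodes=2                   # number of nodes (placeholder)
#SBATCH --ntasks-per-node=3         # one task per GPU
#SBATCH --gres=gpu:3                # GPUs per node (placeholder)
#SBATCH --time=24:00:00             # walltime (placeholder)

# srun starts one task per GPU; Slurm sets the SLURM_* environment
# variables for each task, which the training code reads to set up
# distributed training (see the note above).
srun python train.py [...args]
```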


### Fine-tuning
After pre-training, we use back-translation to fine-tune the pre-trained model on unsupervised machine translation:
