GPT2 ZeRO benchmark with data parallelism to evaluate Colossal-AI, DeepSpeed, FairScale and PatrickStar.
CUDA>=11.3
torch>=1.10.0
deepspeed>=0.5.8
fairscale>=0.4.5
patrickstar>=0.4.6
nvidia-dali>=1.8.0
- Install dependencies if you do not have them
pip install -r requirement.txt
- Also, download PatrickStar from github
https://github.com/Tencent/PatrickStar.git
- Install PatrickStar
cd PatrickStar
pip install .
- Add root dir into PYTHONPATH
export PYTHONPATH=$(dirname "$PWD"):$PYTHONPATH
-
Prepare datasets and tokenizers from HuggingFace Hub if necessary (e.g. we provide an example of training
wikitext-2
). -
Run benchmark with one of the systems to evaluate
DATA=/PATH/TO/DATASET TOKENIZER=/PATH/TO/TOKENIZER LOG=/PATH/TO/LOG torchrun --nproc_per_node=NUM_GPUS run.py --config=CONFIG_FILE
-
Prepare ImageNet-1k datasets (TFrecord version).
-
Run benchmark with one of the systems to evaluate
DATA=/PATH/TO/DATASET LOG=/PATH/TO/LOG torchrun --nproc_per_node=NUM_GPUS run.py --config=CONFIG_FILE