https://github.com/jingyaogong/minimind
Pretraining (single process):
python moe_train.py
SFT:
python moe_sft_train.py
Pretraining (torchrun, 2 processes on one node):
torchrun --nproc_per_node=2 moe_train.py
SFT:
torchrun --nproc_per_node=2 moe_sft_train.py
Pretraining (DeepSpeed, GPUs 0 and 1 on localhost):
deepspeed --include 'localhost:0,1' moe_train.py
SFT:
deepspeed --include 'localhost:0,1' moe_sft_train.py
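The three launch styles above differ only in the launcher prefix, so they can be summarized in one small helper. The following is a minimal sketch, not part of the minimind repository: the function name `launch_cmd` and the mode names `single`, `torchrun`, and `deepspeed` are assumptions made for illustration. It only prints the command (a dry run) and does not start training.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from minimind): print the launch command for a
# given launcher mode and script, mirroring the commands listed above.
launch_cmd() {
  local launcher="$1"
  local script="$2"
  case "$launcher" in
    single)    echo "python $script" ;;
    torchrun)  echo "torchrun --nproc_per_node=2 $script" ;;
    deepspeed) echo "deepspeed --include 'localhost:0,1' $script" ;;
    *)         echo "unknown launcher: $launcher" >&2; return 1 ;;
  esac
}

# Dry run: show the pretraining and SFT commands for the torchrun mode.
for script in moe_train.py moe_sft_train.py; do
  launch_cmd torchrun "$script"
done
```

To actually run a printed command, copy it to the shell; the process count and GPU indices are the ones used in the commands above and would need adjusting for other hardware.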
Test:
python moe_test.py