[hotfix] gpt example titans bug #2493 #2494

Merged
merged 12 commits on Jan 18, 2023
polish code
feifeibear committed Jan 16, 2023
commit 38424db6ffbdafbc9e65a1802640202a07e53c50
48 changes: 48 additions & 0 deletions examples/language/gpt/titans/README.md
@@ -0,0 +1,48 @@
# Run GPT With Colossal-AI

## How to Prepare the Webtext Dataset

You can download the preprocessed sample dataset for this demo via our [Google Drive sharing link](https://drive.google.com/file/d/1QKI6k-e2gJ7XgS8yIpgPPiMmwiBP_BPE/view?usp=sharing).


You can also skip dataset preparation by passing `--use_dummy_dataset` when running the training script.
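
If you use the real dataset, point the `DATA` environment variable at the downloaded file before launching training. A minimal sketch, assuming the sample file is saved as `small-gpt-dataset.json` (the path and filename are illustrative):

```bash
# tell the training script where the downloaded sample dataset lives
# (path and filename below are placeholders; use your actual location)
export DATA=/path/to/small-gpt-dataset.json
```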

## Run this Demo

Use the following commands to install prerequisites.

```bash
# assuming CUDA 11.3
pip install -r requirements.txt
```

Use the following commands to execute training.

```bash
#!/usr/bin/env sh
# to use a real dataset, remove --use_dummy_dataset and point DATA at it, e.g.
# export DATA=/path/to/small-gpt-dataset.json

# run on a single node
colossalai run --nproc_per_node=<num_gpus> train_gpt.py --config configs/<config_file> --from_torch

# run on multiple nodes with the colossalai launcher
colossalai run --nproc_per_node=<num_gpus> \
--master_addr <hostname> \
--master_port <port-number> \
--hosts <list-of-hostname-separated-by-comma> \
train_gpt.py \
--config configs/<config_file> \
--from_torch \
--use_dummy_dataset

# run on multiple nodes with slurm
srun python \
train_gpt.py \
--config configs/<config_file> \
--host <master_node> \
--use_dummy_dataset

```

You can set `<config_file>` to any file in the `configs` folder. To get started quickly, try `gpt_small_zero3_pp1d.py` on a single node first. Each config file contains explanations of how to change the parallel settings.
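
For example, a single-node run with the suggested starter config might look like the following sketch (the GPU count of 4 is illustrative; adjust `--nproc_per_node` to your hardware):

```bash
# illustrative single-node launch with the suggested starter config and the dummy dataset
colossalai run --nproc_per_node=4 train_gpt.py \
    --config configs/gpt_small_zero3_pp1d.py \
    --from_torch \
    --use_dummy_dataset
```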