To run the model using a Docker container, first set the required paths:

```
PYTORCH_IMAGE=nvcr.io/nvidia/pytorch:24.01-py3
CHECKPOINT_PATH="" # <Specify path>
TENSORBOARD_LOGS_PATH="" # <Specify path>
VOCAB_FILE="" # <Specify path to file>/bert-vocab.txt
DATA_PATH="" # <Specify path and file prefix>_text_document
```
Then launch training inside the container:

```
docker run \
  --gpus=all \
  --ipc=host \
  --workdir /workspace/megatron-lm \
  -v /path/to/data:/path/to/data \
  -v /path/to/megatron-lm:/workspace/megatron-lm \
  $PYTORCH_IMAGE \
  bash examples/bert/train_bert_340m_distributed.sh $CHECKPOINT_PATH $TENSORBOARD_LOGS_PATH $VOCAB_FILE $DATA_PATH
```
NOTE: Depending on the environment you are running in, the above command may look slightly different.
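For reference, the launcher script consumes the four positional arguments roughly as sketched below. This is a simplified sketch, not the script's verbatim contents; see examples/bert/train_bert_340m_distributed.sh for the real logic:

```
# Sketch: inside train_bert_340m_distributed.sh, the positional arguments
# are bound to the same names used above.
CHECKPOINT_PATH=$1        # checkpoint save/load directory
TENSORBOARD_LOGS_PATH=$2  # TensorBoard log directory
VOCAB_FILE=$3             # path to bert-vocab.txt
DATA_PATH=$4              # prefix of the preprocessed *_text_document data
```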
The example in this folder shows how to run a 340M parameter model. There are other configurations you can run as well. For example, a roughly 4B parameter configuration:

```
--num-layers 48 \
--hidden-size 2560 \
--num-attention-heads 32 \
--tensor-model-parallel-size 1 \
--pipeline-model-parallel-size 1 \
```

And a roughly 20B parameter configuration, which adds tensor and pipeline model parallelism:

```
--num-layers 48 \
--hidden-size 6144 \
--num-attention-heads 96 \
--tensor-model-parallel-size 4 \
--pipeline-model-parallel-size 4 \
```
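To train one of these sizes, you could swap these flags into the model-argument section of a copy of the training script. This is a sketch; the MODEL_ARGS name below is illustrative, not necessarily what the script uses:

```
# Illustrative override for the ~20B configuration.
MODEL_ARGS=(
    --num-layers 48
    --hidden-size 6144
    --num-attention-heads 96
    --tensor-model-parallel-size 4
    --pipeline-model-parallel-size 4
)
```

Note that --tensor-model-parallel-size 4 together with --pipeline-model-parallel-size 4 requires the total GPU count to be a multiple of 16 (4 x 4), so that at least one full data-parallel replica fits.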