This repository contains several Russian GPT models: ruGPT3XL (2048 context length, trained with sparse attention), ruGPT3Large (2048 context length), ruGPT3Medium (2048 context length), ruGPT3Small (2048 context length) and ruGPT2Large (Russian GPT2 large, 1024 context length).
We suggest using ruGPT2Large or ruGPT3XL because these models are well tested and achieve the best perplexity.
Examples here
The organizers gave participants the opportunity to get access to Christofari by SberCloud.
Models can be used for inference or finetuning in two ways: through the 🤗HuggingFace interface or with our code based on this implementation.
Both ways require transformers:
pip install transformers==3.5.0
We support the 🤗HuggingFace interface only for the ruGPT3Large, ruGPT3Medium, ruGPT3Small and ruGPT2Large models. For RuGPT3XL, please use the code in this repo, because the RuGPT3XL model was trained with sparse attention.
Examples of finetuning and generation are available here.
These examples are also adapted for Google Colab.
Basic usage:
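from transformers import GPT2LMHeadModel, GPT2Tokenizer
# Load the tokenizer and the model weights, then move the model to the GPU.
model_name_or_path = "sberbank-ai/rugpt3large_based_on_gpt2"
tokenizer = GPT2Tokenizer.from_pretrained(model_name_or_path)
model = GPT2LMHeadModel.from_pretrained(model_name_or_path).cuda()
# Prompt: "Alexander Sergeyevich Pushkin was born in "
text = "Александр Сергеевич Пушкин родился в "
# Encode the prompt and place it on the same device as the model.
input_ids = tokenizer.encode(text, return_tensors="pt").cuda()
# Generate a continuation and decode the first (and only) returned sequence.
out = model.generate(input_ids)
generated_text = tokenizer.decode(out[0])
print(generated_text)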
# Output should be like this:
# Александр Сергеевич Пушкин родился в \n1799 году. Его отец был крепостным крестьянином, а мать – крепостной крестьянкой. Детство и юность Пушкина прошли в деревне Михайловское под Петербургом. В 1820-х годах семья переехала
For more information about the 🤗HuggingFace interface, see this documentation.
For training, pass a single txt file.
To use our code for finetuning without deepspeed (not recommended), install apex:
%%writefile setup.sh
export CUDA_HOME=/usr/local/cuda-10.1
git clone https://github.com/NVIDIA/apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./apex
sh setup.sh
An example of finetuning, generating and loading/converting megatron checkpoints is available here.
Note! This way is valid for all RuGPT models except RuGPT3XL.
To use our code for finetuning with deepspeed (recommended), install apex (see the previous section) and deepspeed:
pip install deepspeed==0.3.7
An example of finetuning, generating and loading/converting megatron checkpoints is available here.
Note! To use deepspeed, set the USE_DEEPSPEED environment variable before every Python script and run it with torch.distributed or mpi:
USE_DEEPSPEED=1 python -m torch.distributed.launch --nproc_per_node 1 ru-gpts/pretrain_gpt3.py \
--train-data-path "train.list" \
--test-data-path "valid.list" \
--max-files-per-process 100 \
--save model \
--load-huggingface sberbank-ai/rugpt3small_based_on_gpt2 \
--model-parallel-size 1 \
--num-layers 12 \
--hidden-size 768 \
--num-attention-heads 12 \
--seq-length 2048 \
--max-position-embeddings 2048 \
--fp16 \
--checkpoint-activations \
--deepspeed-activation-checkpointing \
--deepspeed \
--deepspeed_config ru-gpts/src/deepspeed_config/gpt3_small_2048.json
We use a custom implementation of a distributed dataset. For training and evaluation, specify a file.list file that contains a list of paths to txt files. All files from file.list are split between the available GPUs. The splitting logic is described by the following code:
shard_size = len(files) // world_size
shard_start = rank * shard_size
shard_end = (rank + 1) * shard_size
files = files[shard_start:shard_end]
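The splitting rule above can be wrapped in a small function to see which files each process would receive. This is a minimal sketch, not the repository's dataset code: the function name `shard_files` and the file names are hypothetical, and only the four lines of arithmetic come from the snippet above.

```python
def shard_files(files, rank, world_size):
    """Return the slice of `files` assigned to process `rank` out of `world_size`.

    Mirrors the splitting snippet above; the function name is hypothetical.
    """
    shard_size = len(files) // world_size  # integer division, remainder is ignored
    shard_start = rank * shard_size
    shard_end = (rank + 1) * shard_size
    return files[shard_start:shard_end]


# Hypothetical file names, used only for illustration.
files = [f"part_{i}.txt" for i in range(10)]
world_size = 4

for rank in range(world_size):
    print(rank, shard_files(files, rank, world_size))

# Files that no process receives because of the integer division.
assigned = {f for rank in range(world_size) for f in shard_files(files, rank, world_size)}
print("unassigned:", [f for f in files if f not in assigned])
```

Because the shard size uses integer division, files beyond `shard_size * world_size` are not assigned to any process, and with fewer files than processes every shard is empty. Under this rule it may be worth keeping the number of files a multiple of the number of GPUs.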
For more details, see the full dataset code in src.dataset_rugpt3.RuGpt3TextDataset
and the example.
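A file such as file.list can be produced with a few lines of Python. The sketch below assumes that the list holds one path per line, which the text above does not state explicitly, so check it against the dataset code before relying on it. The helper name `write_file_list` and the directory layout are hypothetical.

```python
import tempfile
from pathlib import Path


def write_file_list(data_dir, list_path):
    """Write the paths of all .txt files under data_dir to list_path.

    Assumption: the list format is one path per line. Hypothetical helper,
    not part of the repository.
    """
    paths = sorted(str(p.resolve()) for p in Path(data_dir).rglob("*.txt"))
    Path(list_path).write_text("".join(p + "\n" for p in paths), encoding="utf-8")
    return paths


# Demo on a temporary directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.txt", "b.txt", "notes.md"):
        (Path(tmp) / name).write_text("sample", encoding="utf-8")
    listed = write_file_list(tmp, Path(tmp) / "train.list")
    print(len(listed), "txt files listed")
```

The resulting list file would then be passed to options such as `--train-data-path` in the command shown earlier.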
Note! This way is valid for all RuGPT models except RuGPT3XL.
This section mostly covers using the RuGPT3XL model and training models with sparse attention.
apt-get install llvm-9-dev
pip install cpufeature
pip install triton==0.2.3
DS_BUILD_CPU_ADAM=1 DS_BUILD_SPARSE_ATTN=1 pip install deepspeed==0.3.7
You can test the deepspeed installation with the following command: ds_report.
An example of RuGPT3XL inference is available here.
We also provide pretraining scripts for all models (except RuGPT2Large). See the scripts dir.
Note! The training parameters (such as lr, wd, ...) may have differed during the actual training. They are given only as an example.