
Commit 120b42e

Fixed GPT-125m
Hugging Face doesn't have 125M; it has 125m instead.
aksj98 authored Jul 6, 2023
1 parent 3940306 commit 120b42e
Showing 1 changed file with 30 additions and 30 deletions.
biencoder/nli_msmarco/README.md
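
The diff below swaps the capital-M hub id for the lowercase one throughout the README. As a quick sanity check of the renamed id, it can be loaded with the `transformers` auto classes (a minimal sketch, assuming `transformers` is installed; this snippet is not part of the repository):

```python
# Confirm the lowercase Hugging Face hub id resolves (illustrative check only).
from transformers import AutoModel, AutoTokenizer

model_id = "EleutherAI/gpt-neo-125m"  # lowercase "m", matching the updated commands
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
print(model.config.model_type)  # expected: "gpt_neo"
```
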
@@ -70,18 +70,18 @@ accelerate config
accelerate launch examples/training/nli/training_nli_v2.py --model_name bert-large-uncased
```

-Training of `SGPT-125M-mean-nli` on 1 24GiB GPU:
+Training of `SGPT-125m-mean-nli` on 1 24GiB GPU:

```bash
accelerate config
-accelerate launch --main_process_port 1469 examples/training/nli/training_nli_v2.py --model_name EleutherAI/gpt-neo-125M --pooling mean
+accelerate launch --main_process_port 1469 examples/training/nli/training_nli_v2.py --model_name EleutherAI/gpt-neo-125m --pooling mean
```

-Training of `SGPT-125M-weightedmean-nli` on 1 24GiB GPU:
+Training of `SGPT-125m-weightedmean-nli` on 1 24GiB GPU:

```bash
accelerate config
-accelerate launch --main_process_port 1469 examples/training/nli/training_nli_v2.py --model_name EleutherAI/gpt-neo-125M --pooling weightedmean
+accelerate launch --main_process_port 1469 examples/training/nli/training_nli_v2.py --model_name EleutherAI/gpt-neo-125m --pooling weightedmean
```

Training of `SGPT-1.3B-mean-nli` on 8 40GiB GPUs:
@@ -100,25 +100,25 @@ accelerate config
accelerate launch --main_process_port 1469 examples/training/nli/training_nli_v2.py --model_name EleutherAI/gpt-neo-1.3B --train_batch_size 6 --lr 1e-5 --pooling weightedmean
```

-Training of `SGPT-125M-mean-nli-linear5` on 4 40GiB GPUs:
+Training of `SGPT-125m-mean-nli-linear5` on 4 40GiB GPUs:

```bash
accelerate config
-accelerate launch examples/training/nli/training_nli_v2.py --model_name EleutherAI/gpt-neo-125M --freeze --addxlinear 5 --wandb --useact
+accelerate launch examples/training/nli/training_nli_v2.py --model_name EleutherAI/gpt-neo-125m --freeze --addxlinear 5 --wandb --useact
```

-Training of `SGPT-125M-mean-nli-linearthenpool5` on 4 40GiB GPUs:
+Training of `SGPT-125m-mean-nli-linearthenpool5` on 4 40GiB GPUs:

```bash
accelerate config
-accelerate launch examples/training/nli/training_nli_v2.py --model_name EleutherAI/gpt-neo-125M --freeze --addxlinear 5 --linearthenpool --wandb --useact
+accelerate launch examples/training/nli/training_nli_v2.py --model_name EleutherAI/gpt-neo-125m --freeze --addxlinear 5 --linearthenpool --wandb --useact
```

-Training of `SGPT-125M-weightedmean-nli-linearthenpool5` on 4 40GiB GPUs:
+Training of `SGPT-125m-weightedmean-nli-linearthenpool5` on 4 40GiB GPUs:

```bash
accelerate config
-accelerate launch examples/training/nli/training_nli_v2.py --model_name EleutherAI/gpt-neo-125M --freeze --addxlinear 5 --linearthenpool --wandb --pooling weightedmean --useact
+accelerate launch examples/training/nli/training_nli_v2.py --model_name EleutherAI/gpt-neo-125m --freeze --addxlinear 5 --linearthenpool --wandb --pooling weightedmean --useact
```

Training of `SGPT-1.3B-mean-nli-linearthenpool5` on 4 40GiB GPUs:
@@ -155,13 +155,13 @@ accelerate launch examples/training/nli/training_nli_v2.py --model_name Eleuther
Models with larger batch size (for the most part, these are the ones used in the paper). These models use GradCache, a technique for gradient accumulation with contrastive learning.
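
For readers unfamiliar with GradCache, the sketch below illustrates the idea in plain PyTorch: embeddings for the whole batch are first computed chunk by chunk without gradients, the contrastive loss and its gradient with respect to those cached embeddings are computed once, and each chunk is then re-encoded with gradients enabled so activation memory stays bounded by the chunk size. The `encoder`, `queries`, and `docs` names are illustrative assumptions, not the repository's API; the actual training commands enable the technique via the `--gradcache` and `--chunksize` flags shown below.

```python
# Minimal GradCache-style training step (illustrative sketch, not the repository's code).
import torch
import torch.nn.functional as F

def gradcache_step(encoder, optimizer, queries, docs, chunk_size):
    # queries/docs are assumed to be tensors the encoder maps to pooled embeddings.
    # 1) Forward pass over all chunks without gradients to cache the full batch of embeddings.
    with torch.no_grad():
        q_emb = torch.cat([encoder(c) for c in queries.split(chunk_size)])
        d_emb = torch.cat([encoder(c) for c in docs.split(chunk_size)])

    # 2) In-batch-negatives contrastive loss on the cached embeddings and its gradient
    #    with respect to them (temperature/scaling omitted for brevity).
    q_emb.requires_grad_(True)
    d_emb.requires_grad_(True)
    scores = q_emb @ d_emb.T
    labels = torch.arange(scores.size(0), device=scores.device)
    loss = F.cross_entropy(scores, labels)
    loss.backward()  # populates q_emb.grad / d_emb.grad only; encoder params are untouched here

    # 3) Re-encode chunk by chunk with gradients enabled, back-propagating the cached
    #    embedding gradients; peak memory now scales with chunk_size, not batch size.
    optimizer.zero_grad()
    for chunk, grad in zip(queries.split(chunk_size), q_emb.grad.split(chunk_size)):
        encoder(chunk).backward(gradient=grad)
    for chunk, grad in zip(docs.split(chunk_size), d_emb.grad.split(chunk_size)):
        encoder(chunk).backward(gradient=grad)
    optimizer.step()
    return loss.detach()
```
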


-Training of `SGPT-125M-weightedmean-nli-bitfit` on 8 40GiB GPUs:
+Training of `SGPT-125m-weightedmean-nli-bitfit` on 8 40GiB GPUs:

```bash
accelerate config
cd sentence-transformers
accelerate config
-CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch examples/training/nli/training_nli_v2.py --model_name EleutherAI/gpt-neo-125M --freezenonbias --train_batch_size 128 --lr 32e-5 --pooling weightedmean --wandb --wandbwatchlog gradients --gradcache --chunksize 128
+CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch examples/training/nli/training_nli_v2.py --model_name EleutherAI/gpt-neo-125m --freezenonbias --train_batch_size 128 --lr 32e-5 --pooling weightedmean --wandb --wandbwatchlog gradients --gradcache --chunksize 128
```


@@ -224,57 +224,57 @@ accelerate config
CUDA_VISIBLE_DEVICES=4,5 accelerate launch --main_process_port 2223 examples/training/ms_marco/train_bi-encoder_mnrl.py --model_name bert-base-uncased --train_batch_size 32 --freezenonbias --asym --wandb --wandbwatchlog gradients
```

-Training of `SGPT-125M-weightedmean-msmarco` on 2 40GiB GPUs:
+Training of `SGPT-125m-weightedmean-msmarco` on 2 40GiB GPUs:

```bash
accelerate config
-CUDA_VISIBLE_DEVICES=4,5 accelerate launch --main_process_port 2224 examples/training/ms_marco/train_bi-encoder_mnrl.py --model_name EleutherAI/gpt-neo-125M --train_batch_size 32 --wandb --wandbwatchlog gradients
+CUDA_VISIBLE_DEVICES=4,5 accelerate launch --main_process_port 2224 examples/training/ms_marco/train_bi-encoder_mnrl.py --model_name EleutherAI/gpt-neo-125m --train_batch_size 32 --wandb --wandbwatchlog gradients
```

-Training of `SGPT-125M-weightedmean-msmarco-asym` on 2 40GiB GPUs:
+Training of `SGPT-125m-weightedmean-msmarco-asym` on 2 40GiB GPUs:

```bash
accelerate config
-CUDA_VISIBLE_DEVICES=4,5 accelerate launch --main_process_port 2224 examples/training/ms_marco/train_bi-encoder_mnrl.py --model_name EleutherAI/gpt-neo-125M --train_batch_size 32 --asym --wandb --wandbwatchlog gradients
+CUDA_VISIBLE_DEVICES=4,5 accelerate launch --main_process_port 2224 examples/training/ms_marco/train_bi-encoder_mnrl.py --model_name EleutherAI/gpt-neo-125m --train_batch_size 32 --asym --wandb --wandbwatchlog gradients
```

-Training of `SGPT-125M-weightedmean-msmarco-bitfit` on 2 40GiB GPUs:
+Training of `SGPT-125m-weightedmean-msmarco-bitfit` on 2 40GiB GPUs:

```bash
accelerate config
-CUDA_VISIBLE_DEVICES=4,5 accelerate launch --main_process_port 2224 examples/training/ms_marco/train_bi-encoder_mnrl.py --model_name EleutherAI/gpt-neo-125M --train_batch_size 32 --freezenonbias --lr 2e-4 --wandb --wandbwatchlog gradients
+CUDA_VISIBLE_DEVICES=4,5 accelerate launch --main_process_port 2224 examples/training/ms_marco/train_bi-encoder_mnrl.py --model_name EleutherAI/gpt-neo-125m --train_batch_size 32 --freezenonbias --lr 2e-4 --wandb --wandbwatchlog gradients
```


-Training of `SGPT-125M-weightedmean-msmarco-bitfit` on 2 40GiB GPUs:
+Training of `SGPT-125m-weightedmean-msmarco-bitfit` on 2 40GiB GPUs:

```bash
accelerate config
-CUDA_VISIBLE_DEVICES=0,1 accelerate launch --main_process_port 2225 examples/training/ms_marco/train_bi-encoder_mnrl.py --model_name EleutherAI/gpt-neo-125M --train_batch_size 32 --freezenonbias --lr 8e-4 --wandb --wandbwatchlog gradients
+CUDA_VISIBLE_DEVICES=0,1 accelerate launch --main_process_port 2225 examples/training/ms_marco/train_bi-encoder_mnrl.py --model_name EleutherAI/gpt-neo-125m --train_batch_size 32 --freezenonbias --lr 8e-4 --wandb --wandbwatchlog gradients
```

-Training of `SGPT-125M-weightedmean-msmarco-speca-bitfit` on 2 40GiB GPUs:
+Training of `SGPT-125m-weightedmean-msmarco-speca-bitfit` on 2 40GiB GPUs:

```bash
-CUDA_VISIBLE_DEVICES=4,5 accelerate launch --main_process_port 2224 examples/training/ms_marco/train_bi-encoder_mnrl.py --model_name EleutherAI/gpt-neo-125M --train_batch_size 32 --wandb --wandbwatchlog gradients --speca --pooling weightedmean
+CUDA_VISIBLE_DEVICES=4,5 accelerate launch --main_process_port 2224 examples/training/ms_marco/train_bi-encoder_mnrl.py --model_name EleutherAI/gpt-neo-125m --train_batch_size 32 --wandb --wandbwatchlog gradients --speca --pooling weightedmean
```

-Training of `SGPT-125M-lasttoken-msmarco-speca-bitfit` on 2 40GiB GPUs:
+Training of `SGPT-125m-lasttoken-msmarco-speca-bitfit` on 2 40GiB GPUs:

```bash
-CUDA_VISIBLE_DEVICES=6,7 accelerate launch --main_process_port 2225 examples/training/ms_marco/train_bi-encoder_mnrl.py --model_name EleutherAI/gpt-neo-125M --train_batch_size 32 --wandb --wandbwatchlog gradients --speca --pooling lasttoken
+CUDA_VISIBLE_DEVICES=6,7 accelerate launch --main_process_port 2225 examples/training/ms_marco/train_bi-encoder_mnrl.py --model_name EleutherAI/gpt-neo-125m --train_batch_size 32 --wandb --wandbwatchlog gradients --speca --pooling lasttoken
```

-Training of `SGPT-125M-weightedmean-msmarco-specb-bitfit` on 2 40GiB GPUs:
+Training of `SGPT-125m-weightedmean-msmarco-specb-bitfit` on 2 40GiB GPUs:

```bash
-CUDA_VISIBLE_DEVICES=4,5 accelerate launch --main_process_port 2224 examples/training/ms_marco/train_bi-encoder_mnrl.py --model_name EleutherAI/gpt-neo-125M --train_batch_size 32 --wandb --wandbwatchlog gradients --specb --pooling weightedmean
+CUDA_VISIBLE_DEVICES=4,5 accelerate launch --main_process_port 2224 examples/training/ms_marco/train_bi-encoder_mnrl.py --model_name EleutherAI/gpt-neo-125m --train_batch_size 32 --wandb --wandbwatchlog gradients --specb --pooling weightedmean
```

-Training of `SGPT-125M-lasttoken-msmarco-specb-bitfit` on 2 40GiB GPUs:
+Training of `SGPT-125m-lasttoken-msmarco-specb-bitfit` on 2 40GiB GPUs:

```bash
-CUDA_VISIBLE_DEVICES=6,7 accelerate launch --main_process_port 2225 examples/training/ms_marco/train_bi-encoder_mnrl.py --model_name EleutherAI/gpt-neo-125M --train_batch_size 32 --wandb --wandbwatchlog gradients --specb --pooling lasttoken
+CUDA_VISIBLE_DEVICES=6,7 accelerate launch --main_process_port 2225 examples/training/ms_marco/train_bi-encoder_mnrl.py --model_name EleutherAI/gpt-neo-125m --train_batch_size 32 --wandb --wandbwatchlog gradients --specb --pooling lasttoken
```

Training of `SGPT-1.3B-weightedmean-msmarco-specb-bitfit` on 8 40GiB GPUs:
@@ -313,12 +313,12 @@ If unspecified in the arguments, batch size is always 64 & lr is 2e-5 (argparse

Models with larger batch size (for the most part, these are the ones used in the paper). These models use GradCache, a technique for gradient accumulation with contrastive learning.

-Training of `SGPT-125M-weightedmean-msmarco-specb-bitfit` on 8 40GiB GPUs:
+Training of `SGPT-125m-weightedmean-msmarco-specb-bitfit` on 8 40GiB GPUs:

```bash
cd sentence-transformers
accelerate config
-CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch --main_process_port 2222 examples/training/ms_marco/train_bi-encoder_mnrl.py --model_name EleutherAI/gpt-neo-125M --train_batch_size 32 --eval_batch_size 16 --freezenonbias --specb --lr 4e-4 --wandb --wandbwatchlog gradients --pooling weightedmean --gradcache --chunksize 32
+CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch --main_process_port 2222 examples/training/ms_marco/train_bi-encoder_mnrl.py --model_name EleutherAI/gpt-neo-125m --train_batch_size 32 --eval_batch_size 16 --freezenonbias --specb --lr 4e-4 --wandb --wandbwatchlog gradients --pooling weightedmean --gradcache --chunksize 32
```

Training of `SGPT-1.3B-weightedmean-msmarco-specb-bitfit` on 8 40GiB GPUs:
