Batch merge #411 (Draft)

jacob-morrison wants to merge 45 commits into main from the batch-merge branch.

Commits (45)
1b214d1  stash initial changes for now (jacob-morrison, Aug 16, 2024)
6178897  stash (jacob-morrison, Aug 16, 2024)
cc5670f  . (jacob-morrison, Aug 17, 2024)
d3ccf4a  . (jacob-morrison, Aug 17, 2024)
f9e0319  . (jacob-morrison, Aug 17, 2024)
4c161e1  . (jacob-morrison, Aug 17, 2024)
f282a3c  . (jacob-morrison, Aug 18, 2024)
488b71b  . (jacob-morrison, Aug 18, 2024)
623e68d  . (jacob-morrison, Aug 18, 2024)
758b2a9  . (jacob-morrison, Aug 18, 2024)
8b68012  fix (jacob-morrison, Aug 18, 2024)
09d6834  . (jacob-morrison, Aug 18, 2024)
baaa375  . (jacob-morrison, Aug 18, 2024)
e0b9a84  . (jacob-morrison, Aug 18, 2024)
88b1656  . (jacob-morrison, Aug 18, 2024)
32739a4  . (jacob-morrison, Aug 18, 2024)
f516abc  . (jacob-morrison, Aug 19, 2024)
f97c4d2  . (jacob-morrison, Aug 19, 2024)
ef11e70  . (jacob-morrison, Aug 19, 2024)
8250980  . (jacob-morrison, Aug 19, 2024)
b704fde  . (jacob-morrison, Aug 19, 2024)
8404b6c  . (jacob-morrison, Aug 21, 2024)
691143b  . (jacob-morrison, Aug 21, 2024)
1045794  Merge branch 'main' into batch-merge (jacob-morrison, Sep 9, 2024)
bbe7648  add (jacob-morrison, Sep 10, 2024)
45cabbb  . (jacob-morrison, Sep 20, 2024)
ace26b0  . (jacob-morrison, Sep 20, 2024)
7e7e1c1  test (jacob-morrison, Sep 20, 2024)
29a7a95  , (jacob-morrison, Sep 20, 2024)
9b29228  fix (jacob-morrison, Sep 20, 2024)
d07a819  test (jacob-morrison, Sep 20, 2024)
1812c40  Merge branch 'main' into batch-merge (jacob-morrison, Oct 28, 2024)
47fb938  push new commits (jacob-morrison, Oct 29, 2024)
c33ab5a  Merge branch 'main' into batch-merge (jacob-morrison, Oct 29, 2024)
9a13d8b  changes to support weka (rough draft for now) (jacob-morrison, Oct 29, 2024)
5aa6267  changes (jacob-morrison, Oct 30, 2024)
f4bbe02  update merge configs (jacob-morrison, Oct 30, 2024)
6377335  committing changes (jacob-morrison, Nov 1, 2024)
c5b9c0f  update (jacob-morrison, Nov 3, 2024)
67d05a4  update (jacob-morrison, Nov 5, 2024)
205c2f6  final configs (jacob-morrison, Nov 5, 2024)
bc2aec8  update (jacob-morrison, Nov 13, 2024)
f002136  Merge branch 'main' into batch-merge (jacob-morrison, Nov 13, 2024)
a0fc16f  update my branch with garbo (jacob-morrison, Nov 17, 2024)
9c0e769  dumping changes, not necessary for release (jacob-morrison, Nov 21, 2024)
Changes from 1 commit: a0fc16f3f97dbf2ba3cfc93130e0f8a8477b5d1a ("update my branch with garbo"), committed by jacob-morrison on Nov 17, 2024.
configs/beaker_configs/default_dpo.yaml (2 changes: 1 addition & 1 deletion)
@@ -8,7 +8,7 @@ tasks:
     command: [
       '/bin/sh', '-c'
     ]
-    arguments: ['PYTHONPATH="/stage:$PYTHONPATH" accelerate launch
+    arguments: ['pip install --upgrade transformers && PYTHONPATH="/stage:$PYTHONPATH" accelerate launch
     --mixed_precision bf16
     --num_machines 1
     --num_processes 4
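Note: in these Beaker specs the whole launch string in arguments is handed to /bin/sh -c, so setup steps such as the pip install --upgrade transformers added above are simply chained with && ahead of accelerate launch and must succeed first. A minimal Python sketch of that execution pattern (the two echo commands are hypothetical stand-ins, not the real launch string):

import subprocess

# The spec's command ('/bin/sh', '-c') plus a single quoted arguments string
# runs as one shell pipeline: steps chained with '&&' run left to right and
# the launcher only starts if the setup step succeeds.
setup = "echo '[stand-in] pip install --upgrade transformers'"
launch = "echo '[stand-in] accelerate launch ...'"

subprocess.run(["/bin/sh", "-c", f"{setup} && {launch}"], check=True)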
configs/beaker_configs/default_finetune_multinode.yaml (8 changes: 4 additions & 4 deletions)
@@ -15,7 +15,7 @@ tasks:
       '/bin/sh', '-c'
     ]
     arguments: ['
-    unset CUDA_LAUNCH_BLOCKING && PYTHONPATH="/stage:$PYTHONPATH" pip install git+https://github.com/vwxyzjn/transformers.git@olmo1124_classification && accelerate launch
+    unset CUDA_LAUNCH_BLOCKING && PYTHONPATH="/stage:$PYTHONPATH" accelerate launch
     --mixed_precision bf16
     --num_machines 4
     --num_processes 32
@@ -66,9 +66,9 @@ tasks:
     - mountPath: /oe-adapt-default
       source:
         weka: oe-adapt-default
-    - mountPath: /model
-      source:
-        beaker: jacobm/llama-3.1-8b
+    # - mountPath: /model
+    #   source:
+    #     beaker: jacobm/llama-3.1-8b
     result:
       path: /output
     resources:
configs/beaker_configs/default_finetune_multinode_olmo.yaml (78 changes: 78 additions & 0 deletions)
@@ -0,0 +1,78 @@
version: v2
description: open-instruct-finetune-multinode
budget: ai2/oe-adapt
tasks:
  - name: open-instruct-finetune-multinode
    replicas: 4
    leaderSelection: true
    hostNetworking: true
    propagateFailure: true
    propagatePreemption: true
    synchronizedStartTimeout: 60m
    image:
      beaker: nathanl/open_instruct_auto
    command: [
      '/bin/sh', '-c'
    ]
    arguments: ['
      unset CUDA_LAUNCH_BLOCKING && pip install git+https://github.com/vwxyzjn/transformers.git@olmo1124_classification && PYTHONPATH="/stage:$PYTHONPATH" accelerate launch
      --mixed_precision bf16
      --num_machines 4
      --num_processes 32
      --machine_rank $BEAKER_REPLICA_RANK
      --main_process_ip $BEAKER_LEADER_REPLICA_HOSTNAME
      --main_process_port 29400
      --use_deepspeed
      --deepspeed_config_file configs/ds_configs/stage3_no_offloading_accelerate.conf
      --deepspeed_multinode_launcher standard
      open_instruct/finetune.py
      --model_name_or_path meta-llama/Meta-Llama-3-8B
      --tokenizer_name meta-llama/Meta-Llama-3-8B
      --use_slow_tokenizer
      --use_flash_attn
      --max_seq_length 4096
      --preprocessing_num_workers 16
      --per_device_train_batch_size 1
      --gradient_accumulation_steps 4
      --learning_rate 5e-6
      --lr_scheduler_type linear
      --warmup_ratio 0.03
      --weight_decay 0.
      --num_train_epochs 2
      --output_dir /output/
      --with_tracking
      --report_to tensorboard
      --logging_steps 1
      --reduce_loss sum
    ']
    envVars:
      - name: CUDA_DEVICE_ORDER
        value: PCI_BUS_ID
      - name: TRANSFORMERS_CACHE
        value: ./cache/
      - name: WANDB_API_KEY
        secret: jacobm_WANDB_API_KEY
      - name: WANDB_PROJECT
        value: open-instruct
      - name: WANDB_WATCH
        value: false
      - name: WANDB_LOG_MODEL
        value: false
      - name: WANDB_DISABLED
        value: true
      - name: HF_TOKEN
        secret: jacobm_HF_TOKEN
    datasets:
      - mountPath: /oe-adapt-default
        source:
          weka: oe-adapt-default
      # - mountPath: /model
      #   source:
      #     beaker: jacobm/llama-3.1-8b
    result:
      path: /output
    resources:
      gpuCount: 8
    context:
      priority: normal
      preemptible: true
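For reference, the batch-size arithmetic implied by this spec: 4 replicas with gpuCount: 8 give 32 processes, so with a per-device batch size of 1 and 4 gradient-accumulation steps the effective global batch size works out to 128 (assuming the usual accelerate/DeepSpeed data-parallel setup):

# Effective global batch size implied by default_finetune_multinode_olmo.yaml.
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
num_processes = 32  # replicas: 4, gpuCount: 8 -> 4 * 8 data-parallel workers

global_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_processes
print(global_batch_size)  # 128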
configs/beaker_configs/default_finetune_multinode_olmoe.yaml (78 changes: 78 additions & 0 deletions)
@@ -0,0 +1,78 @@
version: v2
description: open-instruct-finetune-multinode
budget: ai2/oe-adapt
tasks:
  - name: open-instruct-finetune-multinode
    replicas: 4
    leaderSelection: true
    hostNetworking: true
    propagateFailure: true
    propagatePreemption: true
    synchronizedStartTimeout: 60m
    image:
      beaker: nathanl/open_instruct_auto
    command: [
      '/bin/sh', '-c'
    ]
    arguments: ['
      unset CUDA_LAUNCH_BLOCKING && pip install --upgrade transformers && PYTHONPATH="/stage:$PYTHONPATH" accelerate launch
      --mixed_precision bf16
      --num_machines 4
      --num_processes 32
      --machine_rank $BEAKER_REPLICA_RANK
      --main_process_ip $BEAKER_LEADER_REPLICA_HOSTNAME
      --main_process_port 29400
      --use_deepspeed
      --deepspeed_config_file configs/ds_configs/stage3_no_offloading_accelerate.conf
      --deepspeed_multinode_launcher standard
      open_instruct/finetune.py
      --model_name_or_path meta-llama/Meta-Llama-3-8B
      --tokenizer_name meta-llama/Meta-Llama-3-8B
      --use_slow_tokenizer
      --use_flash_attn
      --max_seq_length 4096
      --preprocessing_num_workers 16
      --per_device_train_batch_size 1
      --gradient_accumulation_steps 4
      --learning_rate 5e-6
      --lr_scheduler_type linear
      --warmup_ratio 0.03
      --weight_decay 0.
      --num_train_epochs 2
      --output_dir /output/
      --with_tracking
      --report_to tensorboard
      --logging_steps 1
      --reduce_loss sum
    ']
    envVars:
      - name: CUDA_DEVICE_ORDER
        value: PCI_BUS_ID
      - name: TRANSFORMERS_CACHE
        value: ./cache/
      - name: WANDB_API_KEY
        secret: jacobm_WANDB_API_KEY
      - name: WANDB_PROJECT
        value: open-instruct
      - name: WANDB_WATCH
        value: false
      - name: WANDB_LOG_MODEL
        value: false
      - name: WANDB_DISABLED
        value: true
      - name: HF_TOKEN
        secret: jacobm_HF_TOKEN
    datasets:
      - mountPath: /oe-adapt-default
        source:
          weka: oe-adapt-default
      # - mountPath: /model
      #   source:
      #     beaker: jacobm/llama-3.1-8b
    result:
      path: /output
    resources:
      gpuCount: 8
    context:
      priority: normal
      preemptible: true
configs/beaker_configs/default_finetune_olmo.yaml (65 changes: 65 additions & 0 deletions)
@@ -0,0 +1,65 @@
version: v2
description: open-instruct-finetune
budget: ai2/oe-adapt
tasks:
  - name: open-instruct-finetune
    image:
      beaker: nathanl/open_instruct_auto
    command: [
      '/bin/sh', '-c'
    ]
    arguments: ['pip install git+https://github.com/vwxyzjn/transformers.git@olmo1124_classification && PYTHONPATH="/stage:$PYTHONPATH" accelerate launch
      --mixed_precision bf16
      --num_machines 1
      --num_processes 4
      --use_deepspeed
      --deepspeed_config_file configs/ds_configs/stage3_no_offloading_accelerate.conf
      open_instruct/finetune.py
      --model_name_or_path /hf_llama_models
      --use_flash_attn
      --max_seq_length 2048
      --preprocessing_num_workers 16
      --per_device_train_batch_size 2
      --gradient_accumulation_steps 16
      --learning_rate 2e-5
      --lr_scheduler_type linear
      --warmup_ratio 0.03
      --weight_decay 0.
      --num_train_epochs 2
      --output_dir /output/
      --with_tracking
      --report_to tensorboard
      --logging_steps 1
    ']
    envVars:
      - name: CUDA_DEVICE_ORDER
        value: PCI_BUS_ID
      - name: TRANSFORMERS_CACHE
        value: ./cache/
      - name: WANDB_API_KEY
        secret: jacobm_WANDB_API_KEY
      - name: WANDB_PROJECT
        value: open-instruct
      - name: WANDB_WATCH
        value: false
      - name: WANDB_LOG_MODEL
        value: false
      - name: WANDB_DISABLED
        value: true
      - name: HF_TOKEN
        secret: jacobm_HF_TOKEN
    datasets:
      - mountPath: /oe-adapt-default
        source:
          weka: oe-adapt-default
      - mountPath: /oe-training-default
        source:
          weka: oe-training-default
    result:
      path: /output
    resources:
      gpuCount: 4
    context:
      cluster: ai2/allennlp-cirrascale
      priority: high
      preemptible: false
configs/merge_configs/my-merge-config.yaml (25 changes: 18 additions & 7 deletions)
@@ -1,15 +1,26 @@
 merge_method: linear
 normalize: true
 models:
-  - name: llama-3.1-8b-resized
-    location: huggingface
-    path: ai2-adapt-dev/llama-3.1-8b-resized
-    weight: 0.5
-  - name: L3.1-8B-v3.9-nc-fixed-soup-best_2
+  # - name: llama-3.1-8b-resized
+  #   location: huggingface
+  #   path: ai2-adapt-dev/llama-3.1-8b-resized
+  #   weight: 0.5
+  # - name: L3.1-8B-v3.9-nc-fixed-soup-best_2
+  #   location: weka
+  #   path: /oe-adapt-default/jacobm/tulu-3-dev/checkpoints/base_models/L3.1-8B-v3.9-nc-fixed-best_2/
+  #   wekaBucket: "oe-adapt-default"
+  #   weight: 0.5
+
+  - name: gsm_math_if_valpy_best_overall_avg_8b_beta0.05-step_200
+    location: weka
+    path: /oe-adapt-default/hamishi/model_checkpoints/gsm_math_if_valpy_best_overall_avg_8b_beta0.05_checkpoints/step_200/
+    wekaBucket: "oe-adapt-default"
+    weight: 1.0
+  - name: gsm_math_if_valpy_best_and_if_avg_8b_beta0.05-step_200
     location: weka
-    path: /oe-adapt-default/jacobm/tulu-3-dev/checkpoints/base_models/L3.1-8B-v3.9-nc-fixed-best_2/
+    path: /oe-adapt-default/hamishi/model_checkpoints/gsm_math_if_valpy_best_and_if_avg_8b_beta0.05_checkpoints/step_200/
     wekaBucket: "oe-adapt-default"
-    weight: 0.5
+    weight: 1.0
   # - name: L3.1-8B-v3.9-nc-fixed-2
   #   location: weka
   #   path: /oe-adapt-default/jacobm/tulu-3-dev/checkpoints/base_models/L3.1-8B-v3.9-nc-fixed-2/
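The updated config now soups two of the step_200 DPO checkpoints with equal weight. As a rough sketch of what merge_method: linear with normalize: true is expected to compute, a weighted average of parameters with the weights rescaled to sum to 1 (assumed semantics, not the repository's actual merge code):

def linear_merge(state_dicts, weights, normalize=True):
    # Weighted average of parameter tensors across checkpoints (assumed
    # linear-merge semantics). With normalize=True the weights are rescaled
    # to sum to 1, so two models at weight 1.0 become a plain 50/50 average.
    if normalize:
        total = sum(weights)
        weights = [w / total for w in weights]
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
    return merged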
configs/train_configs/dpo/olmoe_dpo_test.yaml (37 changes: 37 additions & 0 deletions)
@@ -0,0 +1,37 @@
model_name_or_path: /model
tokenizer_name: /model
model_revision: main
use_flash_attn: true
gradient_checkpointing: true
dataset_mixer:
  # ai2-adapt-dev/sft_v3.9_used_off_policy: 1.0
  # ai2-adapt-dev/sft_v3.9_used_on_policy_large_70b_ckpt: 1.0
  # ai2-adapt-dev/DaringAnteater-prefs-RM-filter-uf-pipeline-regen-v3.9_large_70b_ckpt: 1.0
  # ai2-adapt-dev/WildChat-prefs-280824-uf-pipeline-regen-v3.9_large_70b_ckpt: 1.0
  # ai2-adapt-dev/Llama-3.1-if_taxonomy_tulu-uf-pipeline-regen-v3.9_large_70b_ckpt: 1.0
  ai2-adapt-dev/wildchat_v3.9_unused_off_policy: 1.0

  ai2-adapt-dev/sft_v3.9_used_p0_olmoe-1b-7b: 1.0
  ai2-adapt-dev/sft_v3.9_used_p1_olmoe-1b-7b: 1.0
  ai2-adapt-dev/daring_anteater_olmoe-1b-7b: 1.0
  ai2-adapt-dev/wildchat-prefs-280824_olmoe-1b-7b: 1.0
  ai2-adapt-dev/llama3.1-if_taxonomy_tulu_olmoe-1b-7b: 1.0
use_slow_tokenizer: true
max_seq_length: 2048
preprocessing_num_workers: 16
per_device_train_batch_size: 2
gradient_accumulation_steps: 8 # designed for 8 GPUs, so batch size 128
learning_rate: 5.0e-7
lr_scheduler_type: linear
warmup_ratio: 0.1
weight_decay: 0.0
num_train_epochs: 1
output_dir: /output
with_tracking: true
report_to:
- wandb
logging_steps: 1
use_lora: false
dpo_loss_type: dpo_norm
dpo_beta: 5
checkpointing_steps: 1000
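For context on dpo_loss_type: dpo_norm and dpo_beta: 5: the objective is the usual DPO loss on the margin between policy and reference log-probabilities, and the "_norm" variant is, to the best of my understanding, a length-normalized form of that margin. A minimal sketch of the plain DPO loss for reference (not the repository's exact implementation):

import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=5.0):
    # Standard DPO: -log sigmoid(beta * (chosen margin - rejected margin)).
    # A length-normalized ("dpo_norm"-style) variant would divide each
    # sequence log-probability by its token count before taking the margins.
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()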