Batch merge #411 (Draft)

Wants to merge 45 commits into base: main
Commits (45)
All commits by jacob-morrison (2024):

1b214d1  stash initial changes for now  (Aug 16)
6178897  stash  (Aug 16)
cc5670f  .  (Aug 17)
d3ccf4a  .  (Aug 17)
f9e0319  .  (Aug 17)
4c161e1  .  (Aug 17)
f282a3c  .  (Aug 18)
488b71b  .  (Aug 18)
623e68d  .  (Aug 18)
758b2a9  .  (Aug 18)
8b68012  fix  (Aug 18)
09d6834  .  (Aug 18)
baaa375  .  (Aug 18)
e0b9a84  .  (Aug 18)
88b1656  .  (Aug 18)
32739a4  .  (Aug 18)
f516abc  .  (Aug 19)
f97c4d2  .  (Aug 19)
ef11e70  .  (Aug 19)
8250980  .  (Aug 19)
b704fde  .  (Aug 19)
8404b6c  .  (Aug 21)
691143b  .  (Aug 21)
1045794  Merge branch 'main' into batch-merge  (Sep 9)
bbe7648  add  (Sep 10)
45cabbb  .  (Sep 20)
ace26b0  .  (Sep 20)
7e7e1c1  test  (Sep 20)
29a7a95  ,  (Sep 20)
9b29228  fix  (Sep 20)
d07a819  test  (Sep 20)
1812c40  Merge branch 'main' into batch-merge  (Oct 28)
47fb938  push new commits  (Oct 29)
c33ab5a  Merge branch 'main' into batch-merge  (Oct 29)
9a13d8b  changes to support weka (rough draft for now)  (Oct 29)
5aa6267  changes  (Oct 30)
f4bbe02  update merge configs  (Oct 30)
6377335  committing changes  (Nov 1)
c5b9c0f  update  (Nov 3)
67d05a4  update  (Nov 5)
205c2f6  final configs  (Nov 5)
bc2aec8  update  (Nov 13)
f002136  Merge branch 'main' into batch-merge  (Nov 13)
a0fc16f  update my branch with garbo  (Nov 17)
9c0e769  dumping changes, not necessary for release  (Nov 21)
1 change: 1 addition & 0 deletions Dockerfile

@@ -91,6 +91,7 @@ RUN pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url h
 RUN pip install packaging
 RUN pip install flash-attn==2.6.3 --no-build-isolation
 RUN pip install -r requirements.txt
+RUN pip install git+https://github.com/arcee-ai/mergekit.git

 # NLTK download
 RUN python -m nltk.downloader punkt
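The Dockerfile change above installs mergekit from source. For context on what that tool consumes, a minimal mergekit merge config looks like the sketch below; the model names and weights are hypothetical illustrations, not values taken from this PR.

```yaml
# Hypothetical mergekit config; run with: mergekit-yaml merge_config.yaml ./merged-model
# A linear merge averages the two checkpoints' weights 50/50.
models:
  - model: meta-llama/Meta-Llama-3-8B
    parameters:
      weight: 0.5
  - model: meta-llama/Meta-Llama-3-8B-Instruct
    parameters:
      weight: 0.5
merge_method: linear
dtype: bfloat16
```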
6 changes: 3 additions & 3 deletions configs/beaker_configs/default_dpo.yaml

@@ -8,7 +8,7 @@ tasks:
     command: [
       '/bin/sh', '-c'
     ]
-    arguments: ['PYTHONPATH="/stage:$PYTHONPATH" accelerate launch
+    arguments: ['pip install --upgrade transformers && PYTHONPATH="/stage:$PYTHONPATH" accelerate launch
       --mixed_precision bf16
       --num_machines 1
       --num_processes 4
@@ -37,7 +37,7 @@ tasks:
       - name: TRANSFORMERS_CACHE
         value: ./cache/
       - name: WANDB_API_KEY
-        secret: WANDB_API_KEY
+        secret: jacobm_WANDB_API_KEY
       - name: WANDB_PROJECT
         value: open-instruct
       - name: WANDB_WATCH
@@ -47,7 +47,7 @@ tasks:
       - name: WANDB_DISABLED
         value: true
       - name: HF_TOKEN
-        secret: HF_TOKEN
+        secret: jacobm_HF_TOKEN
     datasets:
       - mountPath: /oe-adapt-default
         source:
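The changed arguments line above chains a pip upgrade ahead of the launcher inside a single '/bin/sh -c' invocation. The shape of that pattern, commands joined with && so a failed setup step aborts the launch, can be sketched as follows (echo stands in for the real pip install and accelerate launch commands):

```shell
# Sketch of the '/bin/sh -c' pattern used in these beaker configs:
# the second command only runs if the first succeeds.
/bin/sh -c 'echo "setup step" && echo "launch step"'

# If setup fails, the launch never runs:
/bin/sh -c 'false && echo "launch step"' || echo "setup failed, launch skipped"
```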
11 changes: 7 additions & 4 deletions configs/beaker_configs/default_eval.yaml

@@ -35,16 +35,19 @@ tasks:
       - name: WANDB_DISABLED
         value: true
       - name: OPENAI_API_KEY
-        secret: openai_api_key
+        secret: jacobm_OPENAI_API_KEY
       - name: HF_TOKEN
-        secret: HF_TOKEN
+        secret: jacobm_HF_TOKEN
     datasets:
-      - mountPath: /data/
+      - mountPath: /oe-adapt-default
         source:
-          beaker: hamishivi/open-instruct-eval-data
+          weka: oe-adapt-default
       - mountPath: /model
         source:
           beaker: 01GVYXDGJC6DV0JW9JZ16YM07G
+      - mountPath: /data/
+        source:
+          beaker: hamishivi/open-instruct-eval-data
       - mountPath: /net/nfs.cirrascale
         source:
           hostPath: /net/nfs.cirrascale
7 changes: 5 additions & 2 deletions configs/beaker_configs/default_finetune.yaml

@@ -37,7 +37,7 @@ tasks:
       - name: TRANSFORMERS_CACHE
         value: ./cache/
       - name: WANDB_API_KEY
-        secret: WANDB_API_KEY
+        secret: jacobm_WANDB_API_KEY
       - name: WANDB_PROJECT
         value: open-instruct
       - name: WANDB_WATCH
@@ -47,11 +47,14 @@ tasks:
       - name: WANDB_DISABLED
         value: true
       - name: HF_TOKEN
-        secret: HF_TOKEN
+        secret: jacobm_HF_TOKEN
     datasets:
       - mountPath: /oe-adapt-default
         source:
           weka: oe-adapt-default
+      - mountPath: /oe-training-default
+        source:
+          weka: oe-training-default
     result:
       path: /output
     resources:
7 changes: 5 additions & 2 deletions configs/beaker_configs/default_finetune_multinode.yaml

@@ -51,7 +51,7 @@ tasks:
       - name: TRANSFORMERS_CACHE
         value: ./cache/
       - name: WANDB_API_KEY
-        secret: WANDB_API_KEY
+        secret: jacobm_WANDB_API_KEY
       - name: WANDB_PROJECT
         value: open-instruct
       - name: WANDB_WATCH
@@ -61,11 +61,14 @@ tasks:
       - name: WANDB_DISABLED
         value: true
       - name: HF_TOKEN
-        secret: HF_TOKEN
+        secret: jacobm_HF_TOKEN
     datasets:
       - mountPath: /oe-adapt-default
         source:
           weka: oe-adapt-default
+    # - mountPath: /model
+    #   source:
+    #     beaker: jacobm/llama-3.1-8b
     result:
       path: /output
     resources:
128 changes: 128 additions & 0 deletions configs/beaker_configs/default_finetune_multinode_augusta.yaml

@@ -0,0 +1,128 @@
version: v2
description: open-instruct-finetune-multinode
budget: ai2/oe-adapt
tasks:
  - name: open-instruct-finetune-multinode
    replicas: 4
    leaderSelection: true
    hostNetworking: true
    propagateFailure: true
    propagatePreemption: true
    synchronizedStartTimeout: 60m
    image:
      beaker: nathanl/open_instruct_auto
    command: [
      '/bin/sh', '-c'
    ]
    arguments: ['
      unset CUDA_LAUNCH_BLOCKING && export LD_LIBRARY_PATH=/var/lib/tcpxo/lib64:${LD_LIBRARY_PATH} && PYTHONPATH="/stage:$PYTHONPATH" accelerate launch
      --mixed_precision bf16
      --num_machines 4
      --num_processes 32
      --machine_rank $BEAKER_REPLICA_RANK
      --main_process_ip $BEAKER_LEADER_REPLICA_HOSTNAME
      --main_process_port 29400
      --use_deepspeed
      --deepspeed_config_file configs/ds_configs/stage3_no_offloading_accelerate.conf
      --deepspeed_multinode_launcher standard
      open_instruct/finetune.py
      --model_name_or_path meta-llama/Meta-Llama-3-8B
      --tokenizer_name meta-llama/Meta-Llama-3-8B
      --use_slow_tokenizer
      --use_flash_attn
      --max_seq_length 4096
      --preprocessing_num_workers 16
      --per_device_train_batch_size 1
      --gradient_accumulation_steps 4
      --learning_rate 5e-6
      --lr_scheduler_type linear
      --warmup_ratio 0.03
      --weight_decay 0.
      --num_train_epochs 2
      --output_dir /output/
      --with_tracking
      --report_to tensorboard
      --logging_steps 1
      --reduce_loss sum
    ']
    envVars:
      - name: CUDA_DEVICE_ORDER
        value: PCI_BUS_ID
      - name: TRANSFORMERS_CACHE
        value: ./cache/
      - name: WANDB_API_KEY
        secret: jacobm_WANDB_API_KEY
      - name: WANDB_PROJECT
        value: open-instruct
      - name: WANDB_WATCH
        value: false
      - name: WANDB_LOG_MODEL
        value: false
      - name: WANDB_DISABLED
        value: true
      - name: HF_TOKEN
        secret: jacobm_HF_TOKEN
      - name: NCCL_CROSS_NIC
        value: 0
      - name: NCCL_ALGO
        value: Ring,Tree
      - name: NCCL_PROTO
        value: Simple
      - name: NCCL_MIN_NCHANNELS
        value: 4
      - name: NCCL_P2P_NET_CHUNKSIZE
        value: 524288
      - name: NCCL_P2P_PCI_CHUNKSIZE
        value: 524288
      - name: NCCL_P2P_NVL_CHUNKSIZE
        value: 1048576
      - name: NCCL_FASTRAK_NUM_FLOWS
        value: 2
      - name: NCCL_FASTRAK_ENABLE_CONTROL_CHANNEL
        value: 0
      - name: NCCL_BUFFSIZE
        value: 8388608
      - name: NCCL_FASTRAK_USE_SNAP
        value: 1
      - name: CUDA_VISIBLE_DEVICES
        value: 0,1,2,3,4,5,6,7
      - name: NCCL_NET_GDR_LEVEL
        value: PIX
      - name: NCCL_FASTRAK_ENABLE_HOTPATH_LOGGING
        value: 0
      - name: NCCL_TUNER_PLUGIN
        value: libnccl-tuner.so
      - name: NCCL_TUNER_CONFIG_PATH
        value: /var/lib/tcpxo/lib64/a3plus_tuner_config.textproto
      - name: NCCL_SHIMNET_GUEST_CONFIG_CHECKER_CONFIG_FILE
        value: /var/lib/tcpxo/lib64/a3plus_guest_config.textproto
      - name: NCCL_FASTRAK_PLUGIN_ACCEPT_TIMEOUT_MS
        value: 600000
      - name: NCCL_NVLS_ENABLE
        value: 0
      - name: NCCL_DEBUG
        value: WARN
      - name: NCCL_FASTRAK_CTRL_DEV
        value: enp0s12
      - name: NCCL_FASTRAK_IFNAME
        value: enp6s0,enp7s0,enp13s0,enp14s0,enp134s0,enp135s0,enp141s0,enp142s0
      - name: NCCL_SOCKET_IFNAME
        value: enp0s12
      - name: NCCL_USE_SNAP
        value: 1
      - name: NCCL_FASTRAK_USE_LLCM
        value: 1
      - name: NCCL_FASTRAK_LLCM_DEVICE_DIRECTORY
        value: /dev/aperture_devices

    datasets:
      - mountPath: /oe-adapt-default
        source:
          weka: oe-adapt-default
    result:
      path: /output
    resources:
      gpuCount: 8
    context:
      priority: normal
      preemptible: true
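A quick sanity check on the launch math in the multinode configs above: accelerate's --num_processes must equal --num_machines times the per-node GPU count, which here comes from replicas: 4 and gpuCount: 8. A minimal sketch of that invariant:

```python
# Sanity-check helper for multinode accelerate configs: the total rank
# count must equal machines * GPUs per machine. Values below mirror the
# config above (replicas: 4, gpuCount: 8, --num_processes 32).
def expected_num_processes(num_machines: int, gpus_per_node: int) -> int:
    """Total ranks accelerate will launch across all nodes."""
    return num_machines * gpus_per_node

config = {"replicas": 4, "gpuCount": 8, "num_processes": 32}
assert expected_num_processes(config["replicas"], config["gpuCount"]) == config["num_processes"]
```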
78 changes: 78 additions & 0 deletions configs/beaker_configs/default_finetune_multinode_olmo.yaml

@@ -0,0 +1,78 @@
version: v2
description: open-instruct-finetune-multinode
budget: ai2/oe-adapt
tasks:
  - name: open-instruct-finetune-multinode
    replicas: 4
    leaderSelection: true
    hostNetworking: true
    propagateFailure: true
    propagatePreemption: true
    synchronizedStartTimeout: 60m
    image:
      beaker: nathanl/open_instruct_auto
    command: [
      '/bin/sh', '-c'
    ]
    arguments: ['
      unset CUDA_LAUNCH_BLOCKING && pip install git+https://github.com/vwxyzjn/transformers.git@olmo1124_classification && PYTHONPATH="/stage:$PYTHONPATH" accelerate launch
      --mixed_precision bf16
      --num_machines 4
      --num_processes 32
      --machine_rank $BEAKER_REPLICA_RANK
      --main_process_ip $BEAKER_LEADER_REPLICA_HOSTNAME
      --main_process_port 29400
      --use_deepspeed
      --deepspeed_config_file configs/ds_configs/stage3_no_offloading_accelerate.conf
      --deepspeed_multinode_launcher standard
      open_instruct/finetune.py
      --model_name_or_path meta-llama/Meta-Llama-3-8B
      --tokenizer_name meta-llama/Meta-Llama-3-8B
      --use_slow_tokenizer
      --use_flash_attn
      --max_seq_length 4096
      --preprocessing_num_workers 16
      --per_device_train_batch_size 1
      --gradient_accumulation_steps 4
      --learning_rate 5e-6
      --lr_scheduler_type linear
      --warmup_ratio 0.03
      --weight_decay 0.
      --num_train_epochs 2
      --output_dir /output/
      --with_tracking
      --report_to tensorboard
      --logging_steps 1
      --reduce_loss sum
    ']
    envVars:
      - name: CUDA_DEVICE_ORDER
        value: PCI_BUS_ID
      - name: TRANSFORMERS_CACHE
        value: ./cache/
      - name: WANDB_API_KEY
        secret: jacobm_WANDB_API_KEY
      - name: WANDB_PROJECT
        value: open-instruct
      - name: WANDB_WATCH
        value: false
      - name: WANDB_LOG_MODEL
        value: false
      - name: WANDB_DISABLED
        value: true
      - name: HF_TOKEN
        secret: jacobm_HF_TOKEN
    datasets:
      - mountPath: /oe-adapt-default
        source:
          weka: oe-adapt-default
    # - mountPath: /model
    #   source:
    #     beaker: jacobm/llama-3.1-8b
    result:
      path: /output
    resources:
      gpuCount: 8
    context:
      priority: normal
      preemptible: true
78 changes: 78 additions & 0 deletions configs/beaker_configs/default_finetune_multinode_olmoe.yaml

@@ -0,0 +1,78 @@
version: v2
description: open-instruct-finetune-multinode
budget: ai2/oe-adapt
tasks:
  - name: open-instruct-finetune-multinode
    replicas: 4
    leaderSelection: true
    hostNetworking: true
    propagateFailure: true
    propagatePreemption: true
    synchronizedStartTimeout: 60m
    image:
      beaker: nathanl/open_instruct_auto
    command: [
      '/bin/sh', '-c'
    ]
    arguments: ['
      unset CUDA_LAUNCH_BLOCKING && pip install --upgrade transformers && PYTHONPATH="/stage:$PYTHONPATH" accelerate launch
      --mixed_precision bf16
      --num_machines 4
      --num_processes 32
      --machine_rank $BEAKER_REPLICA_RANK
      --main_process_ip $BEAKER_LEADER_REPLICA_HOSTNAME
      --main_process_port 29400
      --use_deepspeed
      --deepspeed_config_file configs/ds_configs/stage3_no_offloading_accelerate.conf
      --deepspeed_multinode_launcher standard
      open_instruct/finetune.py
      --model_name_or_path meta-llama/Meta-Llama-3-8B
      --tokenizer_name meta-llama/Meta-Llama-3-8B
      --use_slow_tokenizer
      --use_flash_attn
      --max_seq_length 4096
      --preprocessing_num_workers 16
      --per_device_train_batch_size 1
      --gradient_accumulation_steps 4
      --learning_rate 5e-6
      --lr_scheduler_type linear
      --warmup_ratio 0.03
      --weight_decay 0.
      --num_train_epochs 2
      --output_dir /output/
      --with_tracking
      --report_to tensorboard
      --logging_steps 1
      --reduce_loss sum
    ']
    envVars:
      - name: CUDA_DEVICE_ORDER
        value: PCI_BUS_ID
      - name: TRANSFORMERS_CACHE
        value: ./cache/
      - name: WANDB_API_KEY
        secret: jacobm_WANDB_API_KEY
      - name: WANDB_PROJECT
        value: open-instruct
      - name: WANDB_WATCH
        value: false
      - name: WANDB_LOG_MODEL
        value: false
      - name: WANDB_DISABLED
        value: true
      - name: HF_TOKEN
        secret: jacobm_HF_TOKEN
    datasets:
      - mountPath: /oe-adapt-default
        source:
          weka: oe-adapt-default
    # - mountPath: /model
    #   source:
    #     beaker: jacobm/llama-3.1-8b
    result:
      path: /output
    resources:
      gpuCount: 8
    context:
      priority: normal
      preemptible: true