[Feature] Submitit run script #1822

Merged
merged 72 commits into from
Jan 31, 2024
Merged
Show file tree
Hide file tree
Changes from 62 commits
Commits
Show all changes
72 commits
Select commit Hold shift + click to select a range
402c339 init (vmoens, Jan 9, 2024)
7481abf simple test (albertbou92, Jan 18, 2024)
76185f7 simple test (albertbou92, Jan 18, 2024)
7d5d1bb simple test log to same project (albertbou92, Jan 18, 2024)
1907265 simple test log to same project (albertbou92, Jan 18, 2024)
b056750 simple test log to same project (albertbou92, Jan 18, 2024)
57ecb4a simple test log to same project (albertbou92, Jan 18, 2024)
8013ec9 add a2c (albertbou92, Jan 18, 2024)
6b92f35 add dqn (albertbou92, Jan 18, 2024)
1557e1b add sac td3 redq (albertbou92, Jan 18, 2024)
6c6158c add sac td3 redq (albertbou92, Jan 18, 2024)
156843e add sac td3 redq (albertbou92, Jan 18, 2024)
a9676aa fix benchmarks (albertbou92, Jan 18, 2024)
b689bf2 fix benchmarks (albertbou92, Jan 18, 2024)
e0681ce iql cql dreamer impala benchmarks (albertbou92, Jan 19, 2024)
ed30b89 multiagent benchmarks (albertbou92, Jan 19, 2024)
734684b accept partition as argument (albertbou92, Jan 19, 2024)
8f64f9e accept n_runs as argument (albertbou92, Jan 19, 2024)
f1a0541 improve docs (albertbou92, Jan 19, 2024)
a429c59 improve docs (albertbou92, Jan 19, 2024)
f3c681e fix (albertbou92, Jan 19, 2024)
f951264 add rlhf (albertbou92, Jan 19, 2024)
12c9311 add group to wandb logs (albertbou92, Jan 19, 2024)
944572a add group to wandb logs (albertbou92, Jan 19, 2024)
2dd88db fixes (albertbou92, Jan 19, 2024)
a08170f fixes (albertbou92, Jan 19, 2024)
1bf21bb fixes (albertbou92, Jan 20, 2024)
74e752c merge main (albertbou92, Jan 20, 2024)
c92b093 fixes (albertbou92, Jan 20, 2024)
0643cf2 fixes (albertbou92, Jan 20, 2024)
0b0d62d fixes (albertbou92, Jan 20, 2024)
66dab5f remove small trainings (albertbou92, Jan 20, 2024)
1f719d1 add disclaimer note for rlhf and bandits (albertbou92, Jan 22, 2024)
473dfb9 add disclaimer note for rlhf and bandits (albertbou92, Jan 22, 2024)
701bdae remove unused scripts (albertbou92, Jan 22, 2024)
3d87136 remove unused scripts (albertbou92, Jan 22, 2024)
1c1f856 remove unused scripts (albertbou92, Jan 22, 2024)
e8f9f96 Update examples/dreamer/README.md (albertbou92, Jan 22, 2024)
1c793d5 mino change READMEs (albertbou92, Jan 22, 2024)
e574c81 temp-modif (vmoens, Jan 25, 2024)
80c3ed5 amend (vmoens, Jan 25, 2024)
ce692ff oops (vmoens, Jan 25, 2024)
1baa40d add job id (vmoens, Jan 25, 2024)
f880b27 amend (vmoens, Jan 25, 2024)
36cf93c amend (vmoens, Jan 25, 2024)
55b0346 python path (vmoens, Jan 25, 2024)
446a97e python path (vmoens, Jan 25, 2024)
b41d20e python path (vmoens, Jan 25, 2024)
412e6f5 edit (vmoens, Jan 25, 2024)
b3a0c51 reverts (vmoens, Jan 25, 2024)
aa31584 Merge remote-tracking branch 'origin/main' into submitit-run-script-a… (vmoens, Jan 25, 2024)
796532a tmp (vmoens, Jan 25, 2024)
53d5307 amend (vmoens, Jan 25, 2024)
b8aad27 amend (vmoens, Jan 25, 2024)
9b4352e fix (vmoens, Jan 25, 2024)
caaa0bf Merge remote-tracking branch 'origin/main' into submitit-run-script-a… (vmoens, Jan 25, 2024)
aefb24e amend (vmoens, Jan 25, 2024)
208264a amend (vmoens, Jan 25, 2024)
79cbace amend (vmoens, Jan 25, 2024)
92fb391 amend (vmoens, Jan 25, 2024)
d23a48f amend (vmoens, Jan 25, 2024)
14433de amend (vmoens, Jan 25, 2024)
6611063 Update benchmarks/sota-check/run_iql_online.sh (vmoens, Jan 26, 2024)
7f525de Update benchmarks/sota-check/run_multiagent_sac.sh (vmoens, Jan 26, 2024)
b40664e Merge remote-tracking branch 'origin/main' into submitit-run-script-a… (vmoens, Jan 26, 2024)
8c66adf non_blocking=False (vmoens, Jan 26, 2024)
3d759cd Merge remote-tracking branch 'origin/main' into submitit-run-script-a… (vmoens, Jan 29, 2024)
8789b33 Merge remote-tracking branch 'origin/main' into submitit-run-script-a… (vmoens, Jan 30, 2024)
76748e1 lint (vmoens, Jan 30, 2024)
82dd859 amend (vmoens, Jan 30, 2024)
9678fcc Merge remote-tracking branch 'origin/main' into submitit-run-script-a… (vmoens, Jan 31, 2024)
7fe2e40 amend (vmoens, Jan 31, 2024)

35 changes: 35 additions & 0 deletions benchmarks/sota-check/README.md
@@ -0,0 +1,35 @@
# SOTA Performance checks

This folder contains a `submitit-release-check.sh` script that runs all the
training scripts through `sbatch` with their default configuration and logs
them to a common WandB project.

This script is meant to be run before every release to assess the performance
of the various algorithms available in torchrl. The project name includes the
short hash of the torchrl commit used to run the scripts (e.g. `torchrl-example-check-<commit>`).
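The release-check script itself is not part of this file. As a rough sketch of the idea, assuming it only has to submit each `run_*.sh` file in this folder with `sbatch` (the real `submitit-release-check.sh` may do more, such as choosing a partition or repeating runs), a minimal launcher could look like this. `submit_all` and `SBATCH_CMD` are hypothetical names used for illustration only.

```bash
#!/bin/bash
# Hypothetical sketch, not the actual submitit-release-check.sh:
# submit every run_*.sh script in a folder as its own SLURM job.
# Set SBATCH_CMD=echo to print the script names instead of submitting them.
submit_all() {
  local dir="${1:-.}"
  local cmd="${SBATCH_CMD:-sbatch}"
  (
    # Submit from inside the folder: the run scripts derive PYTHONPATH
    # from the directory the job starts in.
    cd "$dir" || exit 1
    # The run scripts write to these folders via --output/--error,
    # so create them before submitting anything.
    mkdir -p slurm_logs slurm_errors
    for script in run_*.sh; do
      [ -e "$script" ] || continue  # the glob matched nothing
      "$cmd" "$script"
    done
  )
}
```

For a dry run from the repository root: `SBATCH_CMD=echo submit_all benchmarks/sota-check`. The subshell keeps the `cd` from leaking into the caller's shell.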

## Usage

To display the script usage, you can use the `--help` option:

```bash
./submitit-release-check.sh --help
```

## Setup

The following setup should allow you to run the scripts:

```bash
export MUJOCO_GL=egl

conda create -n rl-sota-bench python=3.10 -y
conda activate rl-sota-bench
conda install anaconda::libglu -y
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121
pip3 install "gymnasium[accept-rom-license,atari,mujoco]" vmas tqdm wandb pygame moviepy imageio submitit hydra-core transformers

cd /path/to/tensordict
python setup.py develop
cd /path/to/torchrl
python setup.py develop
```
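Before submitting anything, it can help to confirm that the command-line tools the run scripts call (`git`, `python` and `sbatch`) are on the `PATH` of the active environment. A small sketch; `check_tools` is a hypothetical helper, not part of this PR.

```bash
#!/bin/bash
# Hypothetical helper: print every command that cannot be found on the PATH
# and return non-zero if at least one is missing.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example, with the conda environment activated:
#   check_tools git python sbatch || echo "fix the setup before submitting"
```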
27 changes: 27 additions & 0 deletions benchmarks/sota-check/run_a2c_atari.sh
@@ -0,0 +1,27 @@
#!/bin/bash

#SBATCH --job-name=a2c_atari
#SBATCH --ntasks=32
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:1
#SBATCH --output=slurm_logs/a2c_atari_%j.txt
#SBATCH --error=slurm_errors/a2c_atari_%j.txt

current_commit=$(git rev-parse --short HEAD)
project_name="torchrl-example-check-$current_commit"
group_name="a2c_atari"

export PYTHONPATH=$(dirname $(dirname $PWD))
python $PYTHONPATH/examples/a2c/a2c_atari.py \
logger.backend=wandb \
logger.project_name="$project_name" \
logger.group_name="$group_name"

# Capture the exit status of the Python command
exit_status=$?
# Write the exit status to a file
if [ $exit_status -eq 0 ]; then
echo "${group_name}_${SLURM_JOB_ID}=success" >> report.log
else
echo "${group_name}_${SLURM_JOB_ID}=error" >> report.log
fi
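Each run script sets `PYTHONPATH` to the directory two levels above its working directory. Because `sbatch` starts the batch script in the directory it was submitted from, this is the torchrl repository root as long as the job is submitted from `benchmarks/sota-check`. The sketch below reproduces that computation on a fixed path so the result is easy to check; `repo_root_from` is a hypothetical name.

```bash
#!/bin/bash
# Hypothetical helper mirroring: export PYTHONPATH=$(dirname $(dirname $PWD))
# Each dirname call drops one trailing path component; quoting keeps
# paths that contain spaces intact.
repo_root_from() {
  dirname "$(dirname "$1")"
}

repo_root_from /path/to/torchrl/benchmarks/sota-check
# prints /path/to/torchrl
```

With that value, `$PYTHONPATH/examples/a2c/a2c_atari.py` resolves to `/path/to/torchrl/examples/a2c/a2c_atari.py`. Submitting from any other directory would point both `PYTHONPATH` and the script path somewhere else.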
26 changes: 26 additions & 0 deletions benchmarks/sota-check/run_a2c_mujoco.sh
@@ -0,0 +1,26 @@
#!/bin/bash

#SBATCH --job-name=a2c_mujoco
#SBATCH --ntasks=32
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:1
#SBATCH --output=slurm_logs/a2c_mujoco_%j.txt
#SBATCH --error=slurm_errors/a2c_mujoco_%j.txt

current_commit=$(git rev-parse --short HEAD)
project_name="torchrl-example-check-$current_commit"
group_name="a2c_mujoco"
export PYTHONPATH=$(dirname $(dirname $PWD))
python $PYTHONPATH/examples/a2c/a2c_mujoco.py \
logger.backend=wandb \
logger.project_name="$project_name" \
logger.group_name="$group_name"

# Capture the exit status of the Python command
exit_status=$?
# Write the exit status to a file
if [ $exit_status -eq 0 ]; then
echo "${group_name}_${SLURM_JOB_ID}=success" >> report.log
else
echo "${group_name}_${SLURM_JOB_ID}=error" >> report.log
fi
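Every run script ends with the same status-reporting block. As written, the script's own exit status is that of the final `echo`, so SLURM records the job as completed even when training failed. A sketch of how the block could be factored out while keeping the `<group>_<job id>=success|error` line format: the hypothetical `record_status` function appends the line and then returns the original exit status, so the script can pass it on with `exit`.

```bash
#!/bin/bash
# Hypothetical helper factoring out the status-reporting block of run_*.sh.
# Usage: record_status <group_name> <exit_status> [report_file]
record_status() {
  local group_name="$1" status="$2" report="${3:-report.log}"
  local result=success
  [ "$status" -eq 0 ] || result=error
  # Same line format as the run scripts: <group>_<job id>=success|error
  echo "${group_name}_${SLURM_JOB_ID}=${result}" >> "$report"
  return "$status"
}

# In a run script, right after the python command:
#   record_status "$group_name" $?
#   exit $?
```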
27 changes: 27 additions & 0 deletions benchmarks/sota-check/run_cql_offline.sh
@@ -0,0 +1,27 @@
#!/bin/bash

#SBATCH --job-name=cql_offline
#SBATCH --ntasks=32
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:1
#SBATCH --output=slurm_logs/cql_offline_%j.txt
#SBATCH --error=slurm_errors/cql_offline_%j.txt

current_commit=$(git rev-parse --short HEAD)
project_name="torchrl-example-check-$current_commit"
group_name="cql_offline"

export PYTHONPATH=$(dirname $(dirname $PWD))
python $PYTHONPATH/examples/cql/cql_offline.py \
logger.backend=wandb \
logger.project_name="$project_name" \
logger.group_name="$group_name"

# Capture the exit status of the Python command
exit_status=$?
# Write the exit status to a file
if [ $exit_status -eq 0 ]; then
echo "${group_name}_${SLURM_JOB_ID}=success" >> report.log
else
echo "${group_name}_${SLURM_JOB_ID}=error" >> report.log
fi
26 changes: 26 additions & 0 deletions benchmarks/sota-check/run_cql_online.sh
@@ -0,0 +1,26 @@
#!/bin/bash

#SBATCH --job-name=cql_online
#SBATCH --ntasks=32
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:1
#SBATCH --output=slurm_logs/cql_online_%j.txt
#SBATCH --error=slurm_errors/cql_online_%j.txt

current_commit=$(git rev-parse --short HEAD)
project_name="torchrl-example-check-$current_commit"
group_name="cql_online"
export PYTHONPATH=$(dirname $(dirname $PWD))
python $PYTHONPATH/examples/cql/cql_online.py \
logger.backend=wandb \
logger.project_name="$project_name" \
logger.group_name="$group_name"

# Capture the exit status of the Python command
exit_status=$?
# Write the exit status to a file
if [ $exit_status -eq 0 ]; then
echo "${group_name}_${SLURM_JOB_ID}=success" >> report.log
else
echo "${group_name}_${SLURM_JOB_ID}=error" >> report.log
fi
26 changes: 26 additions & 0 deletions benchmarks/sota-check/run_ddpg.sh
@@ -0,0 +1,26 @@
#!/bin/bash

#SBATCH --job-name=ddpg
#SBATCH --ntasks=32
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:1
#SBATCH --output=slurm_logs/ddpg_%j.txt
#SBATCH --error=slurm_errors/ddpg_%j.txt

current_commit=$(git rev-parse --short HEAD)
project_name="torchrl-example-check-$current_commit"
group_name="ddpg"
export PYTHONPATH=$(dirname $(dirname $PWD))
python $PYTHONPATH/examples/ddpg/ddpg.py \
logger.backend=wandb \
logger.project_name="$project_name" \
logger.group_name="$group_name"

# Capture the exit status of the Python command
exit_status=$?
# Write the exit status to a file
if [ $exit_status -eq 0 ]; then
echo "${group_name}_${SLURM_JOB_ID}=success" >> report.log
else
echo "${group_name}_${SLURM_JOB_ID}=error" >> report.log
fi
26 changes: 26 additions & 0 deletions benchmarks/sota-check/run_discrete_sac.sh
@@ -0,0 +1,26 @@
#!/bin/bash

#SBATCH --job-name=discrete_sac
#SBATCH --ntasks=32
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:1
#SBATCH --output=slurm_logs/discrete_sac_%j.txt
#SBATCH --error=slurm_errors/discrete_sac_%j.txt

current_commit=$(git rev-parse --short HEAD)
project_name="torchrl-example-check-$current_commit"
group_name="discrete_sac"
export PYTHONPATH=$(dirname $(dirname $PWD))
python $PYTHONPATH/examples/discrete_sac/discrete_sac.py \
logger.backend=wandb \
logger.project_name="$project_name" \
logger.group_name="$group_name"

# Capture the exit status of the Python command
exit_status=$?
# Write the exit status to a file
if [ $exit_status -eq 0 ]; then
echo "${group_name}_${SLURM_JOB_ID}=success" >> report.log
else
echo "${group_name}_${SLURM_JOB_ID}=error" >> report.log
fi
26 changes: 26 additions & 0 deletions benchmarks/sota-check/run_dqn_atari.sh
@@ -0,0 +1,26 @@
#!/bin/bash

#SBATCH --job-name=dqn_atari
#SBATCH --ntasks=32
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:1
#SBATCH --output=slurm_logs/dqn_atari_%j.txt
#SBATCH --error=slurm_errors/dqn_atari_%j.txt

current_commit=$(git rev-parse --short HEAD)
project_name="torchrl-example-check-$current_commit"
group_name="dqn_atari"
export PYTHONPATH=$(dirname $(dirname $PWD))
python $PYTHONPATH/examples/dqn/dqn_atari.py \
logger.backend=wandb \
logger.project_name="$project_name" \
logger.group_name="$group_name"

# Capture the exit status of the Python command
exit_status=$?
# Write the exit status to a file
if [ $exit_status -eq 0 ]; then
echo "${group_name}_${SLURM_JOB_ID}=success" >> report.log
else
echo "${group_name}_${SLURM_JOB_ID}=error" >> report.log
fi
26 changes: 26 additions & 0 deletions benchmarks/sota-check/run_dqn_cartpole.sh
@@ -0,0 +1,26 @@
#!/bin/bash

#SBATCH --job-name=dqn_cartpole
#SBATCH --ntasks=32
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:1
#SBATCH --output=slurm_logs/dqn_cartpole_%j.txt
#SBATCH --error=slurm_errors/dqn_cartpole_%j.txt

current_commit=$(git rev-parse --short HEAD)
project_name="torchrl-example-check-$current_commit"
group_name="dqn_cartpole"
export PYTHONPATH=$(dirname $(dirname $PWD))
python $PYTHONPATH/examples/dqn/dqn_cartpole.py \
logger.backend=wandb \
logger.project_name="$project_name" \
logger.group_name="$group_name"

# Capture the exit status of the Python command
exit_status=$?
# Write the exit status to a file
if [ $exit_status -eq 0 ]; then
echo "${group_name}_${SLURM_JOB_ID}=success" >> report.log
else
echo "${group_name}_${SLURM_JOB_ID}=error" >> report.log
fi
26 changes: 26 additions & 0 deletions benchmarks/sota-check/run_dt.sh
@@ -0,0 +1,26 @@
#!/bin/bash

#SBATCH --job-name=dt
#SBATCH --ntasks=32
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:1
#SBATCH --output=slurm_logs/dt_offline_%j.txt
#SBATCH --error=slurm_errors/dt_offline_%j.txt

current_commit=$(git rev-parse --short HEAD)
project_name="torchrl-example-check-$current_commit"
group_name="dt_offline"
export PYTHONPATH=$(dirname $(dirname $PWD))
python $PYTHONPATH/examples/decision_transformer/dt.py \
logger.backend=wandb \
logger.project_name="$project_name" \
logger.group_name="$group_name"

# Capture the exit status of the Python command
exit_status=$?
# Write the exit status to a file
if [ $exit_status -eq 0 ]; then
echo "${group_name}_${SLURM_JOB_ID}=success" >> report.log
else
echo "${group_name}_${SLURM_JOB_ID}=error" >> report.log
fi
26 changes: 26 additions & 0 deletions benchmarks/sota-check/run_dt_online.sh
@@ -0,0 +1,26 @@
#!/bin/bash

#SBATCH --job-name=dt_online
#SBATCH --ntasks=32
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:1
#SBATCH --output=slurm_logs/dt_online_%j.txt
#SBATCH --error=slurm_errors/dt_online_%j.txt

current_commit=$(git rev-parse --short HEAD)
project_name="torchrl-example-check-$current_commit"
group_name="dt_online"
export PYTHONPATH=$(dirname $(dirname $PWD))
python $PYTHONPATH/examples/decision_transformer/online_dt.py \
logger.backend=wandb \
logger.project_name="$project_name" \
logger.group_name="$group_name"

# Capture the exit status of the Python command
exit_status=$?
# Write the exit status to a file
if [ $exit_status -eq 0 ]; then
echo "${group_name}_${SLURM_JOB_ID}=success" >> report.log
else
echo "${group_name}_${SLURM_JOB_ID}=error" >> report.log
fi
26 changes: 26 additions & 0 deletions benchmarks/sota-check/run_impala_single_node.sh
@@ -0,0 +1,26 @@
#!/bin/bash

#SBATCH --job-name=impala_1node
#SBATCH --ntasks=32
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:1
#SBATCH --output=slurm_logs/impala_1node_%j.txt
#SBATCH --error=slurm_errors/impala_1node_%j.txt

current_commit=$(git rev-parse --short HEAD)
project_name="torchrl-example-check-$current_commit"
group_name="impala_1node"
export PYTHONPATH=$(dirname $(dirname $PWD))
python $PYTHONPATH/examples/impala/impala_single_node.py \
logger.backend=wandb \
logger.project_name="$project_name" \
logger.group_name="$group_name"

# Capture the exit status of the Python command
exit_status=$?
# Write the exit status to a file
if [ $exit_status -eq 0 ]; then
echo "${group_name}_${SLURM_JOB_ID}=success" >> report.log
else
echo "${group_name}_${SLURM_JOB_ID}=error" >> report.log
fi
26 changes: 26 additions & 0 deletions benchmarks/sota-check/run_iql_discrete.sh
@@ -0,0 +1,26 @@
#!/bin/bash

#SBATCH --job-name=iql_discrete
#SBATCH --ntasks=32
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:1
#SBATCH --output=slurm_logs/iql_discrete_%j.txt
#SBATCH --error=slurm_errors/iql_discrete_%j.txt

current_commit=$(git rev-parse --short HEAD)
project_name="torchrl-example-check-$current_commit"
group_name="iql_discrete"
export PYTHONPATH=$(dirname $(dirname $PWD))
python $PYTHONPATH/examples/iql/discrete_iql.py \
logger.backend=wandb \
logger.project_name="$project_name" \
logger.group_name="$group_name"

# Capture the exit status of the Python command
exit_status=$?
# Write the exit status to a file
if [ $exit_status -eq 0 ]; then
echo "${group_name}_${SLURM_JOB_ID}=success" >> report.log
else
echo "${group_name}_${SLURM_JOB_ID}=error" >> report.log
fi
26 changes: 26 additions & 0 deletions benchmarks/sota-check/run_iql_offline.sh
@@ -0,0 +1,26 @@
#!/bin/bash

#SBATCH --job-name=iql_offline
#SBATCH --ntasks=32
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:1
#SBATCH --output=slurm_logs/iql_offline_%j.txt
#SBATCH --error=slurm_errors/iql_offline_%j.txt

current_commit=$(git rev-parse --short HEAD)
project_name="torchrl-example-check-$current_commit"
group_name="iql_offline"
export PYTHONPATH=$(dirname $(dirname $PWD))
python $PYTHONPATH/examples/iql/iql_offline.py \
logger.backend=wandb \
logger.project_name="$project_name" \
logger.group_name="$group_name"

# Capture the exit status of the Python command
exit_status=$?
# Write the exit status to a file
if [ $exit_status -eq 0 ]; then
echo "${group_name}_${SLURM_JOB_ID}=success" >> report.log
else
echo "${group_name}_${SLURM_JOB_ID}=error" >> report.log
fi
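Once all jobs have finished, `report.log` holds one `<group>_<job id>=success|error` line per job. A minimal sketch for summarizing it, assuming exactly that line format; `summarize_report` is a hypothetical helper, not part of this PR.

```bash
#!/bin/bash
# Hypothetical helper: list the runs that failed, then print how many
# runs succeeded and how many failed in a report.log file.
summarize_report() {
  local report="${1:-report.log}"
  awk -F'=' '
    $2 == "success" { ok++ }
    $2 == "error"   { failed++; print "failed: " $1 }
    END { printf "success: %d, error: %d\n", ok, failed }
  ' "$report"
}
```

Usage, from the folder the jobs were submitted from: `summarize_report report.log`.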