Commit

[CI] Fix benchmark on gpu (#1706)
Co-authored-by: DanilBaibak <baibak@meta.com>
vmoens and DanilBaibak authored Nov 20, 2023
1 parent 5cac16a commit c2edf35
Showing 4 changed files with 143 additions and 132 deletions.
116 changes: 60 additions & 56 deletions .github/workflows/benchmarks.yml
@@ -34,7 +34,8 @@ jobs:
           python -m pip install git+https://github.com/pytorch/tensordict
           python setup.py develop
           python -m pip install pytest pytest-benchmark
-          python -m pip install dm_control
+          python3 -m pip install "gym[accept-rom-license,atari]"
+          python3 -m pip install dm_control
       - name: Run benchmarks
         run: |
           cd benchmarks/
@@ -57,62 +58,65 @@ jobs:
 
   benchmark_gpu:
     name: GPU Pytest benchmark
-    runs-on: ubuntu-20.04
-    strategy:
-      matrix:
-        include:
-          - os: linux.4xlarge.nvidia.gpu
-            python-version: 3.8
+    runs-on: linux.g5.4xlarge.nvidia.gpu
     defaults:
       run:
         shell: bash -l {0}
-    container: nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04
+    container:
+      image: nvidia/cuda:12.3.0-base-ubuntu22.04
+      options: --gpus all
     steps:
-    - name: Install deps
-      run: |
-        export TZ=Europe/London
-        export DEBIAN_FRONTEND=noninteractive # tzdata bug
-        apt-get update -y
-        apt-get install software-properties-common -y
-        add-apt-repository ppa:git-core/candidate -y
-        apt-get update -y
-        apt-get upgrade -y
-        apt-get -y install libglu1-mesa libgl1-mesa-glx libosmesa6 gcc curl g++ unzip wget libglfw3-dev libgles2-mesa-dev libglew-dev sudo git cmake libz-dev
-    - name: Check ldd --version
-      run: ldd --version
-    - name: Checkout
-      uses: actions/checkout@v3
-    - name: Update pip
-      run: |
-        apt-get install python3.8 python3-pip -y
-        pip3 install --upgrade pip
-    - name: Setup git
-      run: git config --global --add safe.directory /__w/rl/rl
-    - name: setup Path
-      run: |
-        echo /usr/local/bin >> $GITHUB_PATH
-    - name: Setup Environment
-      run: |
-        python3 -m pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu118
-        python3 -m pip install git+https://github.com/pytorch/tensordict
-        python3 setup.py develop
-        python3 -m pip install pytest pytest-benchmark
-        python3 -m pip install dm_control
-    - name: Run benchmarks
-      run: |
-        cd benchmarks/
-        python3 -m pytest --benchmark-json output.json
-    - name: Store benchmark results
-      uses: benchmark-action/github-action-benchmark@v1
-      if: ${{ github.ref == 'refs/heads/main' || github.event_name == 'workflow_dispatch' }}
-      with:
-        name: GPU Benchmark Results
-        tool: 'pytest'
-        output-file-path: benchmarks/output.json
-        fail-on-alert: true
-        alert-threshold: '200%'
-        alert-comment-cc-users: '@vmoens'
-        comment-on-alert: true
-        github-token: ${{ secrets.GITHUB_TOKEN }}
-        gh-pages-branch: gh-pages
-        auto-push: true
+      - name: Install deps
+        run: |
+          export TZ=Europe/London
+          export DEBIAN_FRONTEND=noninteractive # tzdata bug
+          apt-get update -y
+          apt-get install software-properties-common -y
+          add-apt-repository ppa:git-core/candidate -y
+          apt-get update -y
+          apt-get upgrade -y
+          apt-get -y install libglu1-mesa libgl1-mesa-glx libosmesa6 gcc curl g++ unzip wget libglfw3-dev libgles2-mesa-dev libglew-dev sudo git cmake libz-dev
+      - name: Check ldd --version
+        run: ldd --version
+      - name: Checkout
+        uses: actions/checkout@v3
+      - name: Python Setup
+        uses: actions/setup-python@v4
+        with:
+          python-version: 3.8
+      - name: Setup git
+        run: git config --global --add safe.directory /__w/rl/rl
+      - name: setup Path
+        run: |
+          echo /usr/local/bin >> $GITHUB_PATH
+      - name: Setup Environment
+        run: |
+          python3 -m pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121
+          python3 -m pip install git+https://github.com/pytorch/tensordict
+          python3 setup.py develop
+          python3 -m pip install pytest pytest-benchmark
+          python3 -m pip install "gym[accept-rom-license,atari]"
+          python3 -m pip install dm_control
+      - name: check GPU presence
+        run: |
+          python -c """import torch
+          assert torch.cuda.device_count()
+          """
+      - name: Run benchmarks
+        run: |
+          cd benchmarks/
+          python3 -m pytest --benchmark-json output.json
+      - name: Store benchmark results
+        uses: benchmark-action/github-action-benchmark@v1
+        if: ${{ github.ref == 'refs/heads/main' || github.event_name == 'workflow_dispatch' }}
+        with:
+          name: GPU Benchmark Results
+          tool: 'pytest'
+          output-file-path: benchmarks/output.json
+          fail-on-alert: true
+          alert-threshold: '200%'
+          alert-comment-cc-users: '@vmoens'
+          comment-on-alert: true
+          github-token: ${{ secrets.GITHUB_TOKEN }}
+          gh-pages-branch: gh-pages
+          auto-push: true
142 changes: 73 additions & 69 deletions .github/workflows/benchmarks_pr.yml
@@ -33,7 +33,8 @@ jobs:
           python -m pip install git+https://github.com/pytorch/tensordict
           python setup.py develop
           python -m pip install pytest pytest-benchmark
-          python -m pip install dm_control
+          python3 -m pip install "gym[accept-rom-license,atari]"
+          python3 -m pip install dm_control
       - name: Setup benchmarks
         run: |
           echo "BASE_SHA=$(echo ${{ github.event.pull_request.base.sha }} | cut -c1-8)" >> $GITHUB_ENV
@@ -63,75 +64,78 @@ jobs:
 
   benchmark_gpu:
     name: GPU Pytest benchmark
-    runs-on: ubuntu-20.04
-    strategy:
-      matrix:
-        include:
-          - os: linux.4xlarge.nvidia.gpu
-            python-version: 3.8
+    runs-on: linux.g5.4xlarge.nvidia.gpu
     defaults:
       run:
         shell: bash -l {0}
-    container: nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04
+    container:
+      image: nvidia/cuda:12.3.0-base-ubuntu22.04
+      options: --gpus all
     steps:
-    - name: Who triggered this?
-      run: |
-        echo "Action triggered by ${{ github.event.pull_request.html_url }}"
-    - name: Install deps
-      run: |
-        export TZ=Europe/London
-        export DEBIAN_FRONTEND=noninteractive # tzdata bug
-        apt-get update -y
-        apt-get install software-properties-common -y
-        add-apt-repository ppa:git-core/candidate -y
-        apt-get update -y
-        apt-get upgrade -y
-        apt-get -y install libglu1-mesa libgl1-mesa-glx libosmesa6 gcc curl g++ unzip wget libglfw3-dev libgles2-mesa-dev libglew-dev sudo git cmake libz-dev
-    - name: Check ldd --version
-      run: ldd --version
-    - name: Checkout
-      uses: actions/checkout@v3
-      with:
-        fetch-depth: 50 # this is to make sure we obtain the target base commit
-    - name: Update pip
-      run: |
-        apt-get install python3.8 python3-pip -y
-        pip3 install --upgrade pip
-    - name: Setup git
-      run: git config --global --add safe.directory /__w/rl/rl
-    - name: setup Path
-      run: |
-        echo /usr/local/bin >> $GITHUB_PATH
-    - name: Setup Environment
-      run: |
-        python3 -m pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu118
-        python3 -m pip install git+https://github.com/pytorch/tensordict
-        python3 setup.py develop
-        python3 -m pip install pytest pytest-benchmark
-        python3 -m pip install dm_control
-    - name: Setup benchmarks
-      run: |
-        echo "BASE_SHA=$(echo ${{ github.event.pull_request.base.sha }} | cut -c1-8)" >> $GITHUB_ENV
-        echo "HEAD_SHA=$(echo ${{ github.event.pull_request.head.sha }} | cut -c1-8)" >> $GITHUB_ENV
-        echo "BASELINE_JSON=$(mktemp)" >> $GITHUB_ENV
-        echo "CONTENDER_JSON=$(mktemp)" >> $GITHUB_ENV
-        echo "PR_COMMENT=$(mktemp)" >> $GITHUB_ENV
-    - name: Run benchmarks
-      run: |
-        cd benchmarks/
-        RUN_BENCHMARK="pytest --rank 0 --benchmark-json "
-        git checkout ${{ github.event.pull_request.base.sha }}
-        $RUN_BENCHMARK ${{ env.BASELINE_JSON }}
-        git checkout ${{ github.event.pull_request.head.sha }}
-        $RUN_BENCHMARK ${{ env.CONTENDER_JSON }}
-    - name: Publish results
-      uses: apbard/pytest-benchmark-commenter@v3
-      with:
-        token: ${{ secrets.GITHUB_TOKEN }}
-        benchmark-file: ${{ env.CONTENDER_JSON }}
-        comparison-benchmark-file: ${{ env.BASELINE_JSON }}
-        benchmark-metrics: 'name,max,mean,ops'
-        comparison-benchmark-metric: 'ops'
-        comparison-higher-is-better: true
-        comparison-threshold: 5
-        benchmark-title: 'Result of GPU Benchmark Tests'
+      - name: Who triggered this?
+        run: |
+          echo "Action triggered by ${{ github.event.pull_request.html_url }}"
+      - name: Install deps
+        run: |
+          export TZ=Europe/London
+          export DEBIAN_FRONTEND=noninteractive # tzdata bug
+          apt-get update -y
+          apt-get install software-properties-common -y
+          add-apt-repository ppa:git-core/candidate -y
+          apt-get update -y
+          apt-get upgrade -y
+          apt-get -y install libglu1-mesa libgl1-mesa-glx libosmesa6 gcc curl g++ unzip wget libglfw3-dev libgles2-mesa-dev libglew-dev sudo git cmake libz-dev
+      - name: Check ldd --version
+        run: ldd --version
+      - name: Checkout
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 50 # this is to make sure we obtain the target base commit
+      - name: Python Setup
+        uses: actions/setup-python@v4
+        with:
+          python-version: 3.8
+      - name: Setup git
+        run: git config --global --add safe.directory /__w/rl/rl
+      - name: setup Path
+        run: |
+          echo /usr/local/bin >> $GITHUB_PATH
+      - name: Setup Environment
+        run: |
+          python3 -m pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121
+          python3 -m pip install git+https://github.com/pytorch/tensordict
+          python3 setup.py develop
+          python3 -m pip install pytest pytest-benchmark
+          python3 -m pip install "gym[accept-rom-license,atari]"
+          python3 -m pip install dm_control
+      - name: check GPU presence
+        run: |
+          python -c """import torch
+          assert torch.cuda.device_count()
+          """
+      - name: Setup benchmarks
+        run: |
+          echo "BASE_SHA=$(echo ${{ github.event.pull_request.base.sha }} | cut -c1-8)" >> $GITHUB_ENV
+          echo "HEAD_SHA=$(echo ${{ github.event.pull_request.head.sha }} | cut -c1-8)" >> $GITHUB_ENV
+          echo "BASELINE_JSON=$(mktemp)" >> $GITHUB_ENV
+          echo "CONTENDER_JSON=$(mktemp)" >> $GITHUB_ENV
+          echo "PR_COMMENT=$(mktemp)" >> $GITHUB_ENV
+      - name: Run benchmarks
+        run: |
+          cd benchmarks/
+          RUN_BENCHMARK="pytest --rank 0 --benchmark-json "
+          git checkout ${{ github.event.pull_request.base.sha }}
+          $RUN_BENCHMARK ${{ env.BASELINE_JSON }}
+          git checkout ${{ github.event.pull_request.head.sha }}
+          $RUN_BENCHMARK ${{ env.CONTENDER_JSON }}
+      - name: Publish results
+        uses: apbard/pytest-benchmark-commenter@v3
+        with:
+          token: ${{ secrets.GITHUB_TOKEN }}
+          benchmark-file: ${{ env.CONTENDER_JSON }}
+          comparison-benchmark-file: ${{ env.BASELINE_JSON }}
+          benchmark-metrics: 'name,max,mean,ops'
+          comparison-benchmark-metric: 'ops'
+          comparison-higher-is-better: true
+          comparison-threshold: 5
+          benchmark-title: 'Result of GPU Benchmark Tests'
15 changes: 9 additions & 6 deletions benchmarks/test_collectors_benchmark.py
@@ -13,7 +13,7 @@
     MultiSyncDataCollector,
     RandomPolicy,
 )
-from torchrl.envs import EnvCreator, StepCounter, TransformedEnv
+from torchrl.envs import EnvCreator, GymEnv, StepCounter, TransformedEnv
 from torchrl.envs.libs.dm_control import DMControlEnv
 
 
Expand Down Expand Up @@ -78,9 +78,10 @@ def async_collector_setup():

def single_collector_setup_pixels():
device = "cuda:0" if torch.cuda.device_count() else "cpu"
env = TransformedEnv(
DMControlEnv("cheetah", "run", device=device, from_pixels=True), StepCounter(50)
)
# env = TransformedEnv(
# DMControlEnv("cheetah", "run", device=device, from_pixels=True), StepCounter(50)
# )
env = TransformedEnv(GymEnv("ALE/Pong-v5"), StepCounter(50))
c = SyncDataCollector(
env,
RandomPolicy(env.action_spec),
Expand All @@ -99,7 +100,8 @@ def sync_collector_setup_pixels():
device = "cuda:0" if torch.cuda.device_count() else "cpu"
env = EnvCreator(
lambda: TransformedEnv(
DMControlEnv("cheetah", "run", device=device, from_pixels=True),
# DMControlEnv("cheetah", "run", device=device, from_pixels=True),
GymEnv("ALE/Pong-v5"),
StepCounter(50),
)
)
Expand All @@ -121,7 +123,8 @@ def async_collector_setup_pixels():
device = "cuda:0" if torch.cuda.device_count() else "cpu"
env = EnvCreator(
lambda: TransformedEnv(
DMControlEnv("cheetah", "run", device=device, from_pixels=True),
# DMControlEnv("cheetah", "run", device=device, from_pixels=True),
GymEnv("ALE/Pong-v5"),
StepCounter(50),
)
)
Expand Down
2 changes: 1 addition & 1 deletion benchmarks/test_objectives_benchmarks.py
@@ -123,7 +123,7 @@ def test_gae_speed(benchmark, gae_fn, gamma_tensor, batches, timesteps):
 
     gamma = 0.99
     if gamma_tensor:
-        gamma = torch.full(size, gamma)
+        gamma = torch.full(size, gamma, device=device)
     lmbda = 0.95
 
     benchmark(

1 comment on commit c2edf35

@github-actions

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'GPU Benchmark Results'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 2.

| Benchmark suite | Current: c2edf35 | Previous: 5cac16a | Ratio |
|---|---|---|---|
| benchmarks/test_collectors_benchmark.py::test_sync | 9.810059463167097 iter/sec (stddev: 0.002041292288897499) | 28.532412789722958 iter/sec (stddev: 0.0026846866550320116) | 2.91 |
| benchmarks/test_collectors_benchmark.py::test_async | 10.080410924621184 iter/sec (stddev: 0.09701922146618137) | 30.9133477332784 iter/sec (stddev: 0.0311860928756896) | 3.07 |
| benchmarks/test_envs_benchmark.py::test_simple | 1.1489713303686049 iter/sec (stddev: 0.03784047056231992) | 2.3423070056657287 iter/sec (stddev: 0.03146140112651748) | 2.04 |
| benchmarks/test_envs_benchmark.py::test_parallel | 0.4121471142355259 iter/sec (stddev: 0.033077600972760635) | 0.9163986086644761 iter/sec (stddev: 0.03835037655176077) | 2.22 |
| benchmarks/test_objectives_benchmarks.py::test_values[td1_return_estimate-False-False] | 18.737783606872945 iter/sec (stddev: 0.0006650968771473066) | 38.14853728413215 iter/sec (stddev: 0.00014240161117313105) | 2.04 |
| benchmarks/test_objectives_benchmarks.py::test_values[td_lambda_return_estimate-True-False] | 11.782401705119526 iter/sec (stddev: 0.00029902150016902155) | 27.50430573580675 iter/sec (stddev: 0.00031334395849292625) | 2.33 |
| benchmarks/test_objectives_benchmarks.py::test_gae_speed[generalized_advantage_estimate-False-1-512] | 43.328464861657416 iter/sec (stddev: 0.00008755735906681894) | 123.99011192517025 iter/sec (stddev: 0.000029043194140820502) | 2.86 |
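
Reader's note: the Ratio values are consistent with dividing the previous throughput by the current one, and "threshold 2" matches the `alert-threshold: '200%'` setting in the workflow. A minimal Python sketch of that check on three rows of the alert follows. The helper name `regression_ratio` and the strict `>` comparison are assumptions for illustration, not github-action-benchmark's actual implementation.

```python
# Hypothetical re-computation of the "Ratio" column; not the action's real code.
ALERT_THRESHOLD = 2.0  # assumed to correspond to alert-threshold: '200%'


def regression_ratio(previous_ips: float, current_ips: float) -> float:
    """Return previous/current throughput; a value above 1 means the commit is slower."""
    return previous_ips / current_ips


# (benchmark, current iter/sec, previous iter/sec), copied from the alert.
rows = [
    ("test_sync", 9.810059463167097, 28.532412789722958),
    ("test_async", 10.080410924621184, 30.9133477332784),
    ("test_simple", 1.1489713303686049, 2.3423070056657287),
]

for name, current, previous in rows:
    ratio = regression_ratio(previous, current)
    flagged = ratio > ALERT_THRESHOLD
    print(f"{name}: ratio={ratio:.2f} alert={flagged}")
```

Because this commit changed the runner, the container image, the CUDA build of the nightly wheels, and the environment used by the pixel collectors all at once, a ratio above the threshold here does not by itself isolate a code-level slowdown.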

This comment was automatically generated by workflow using github-action-benchmark.

CC: @vmoens
