[Refactor] Faster and more generic multi-agent nets #1921

Merged: 8 commits, Feb 20, 2024

vmoens (Contributor) commented Feb 16, 2024

cc @matteobettini @kfu02

TODO:

  • Account for non-initialized params
  • Make TD.from_modules work with lazy params (the issue being that the list of parameters changes between before and after the first call).


pytorch-bot bot commented Feb 16, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/1921

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (14 Unrelated Failures)

As of commit 030d0dd with merge base 799f939 (image):

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Feb 16, 2024
@vmoens added the Refactoring label Feb 16, 2024
@vmoens mentioned this pull request Feb 16, 2024

github-actions bot commented Feb 16, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 89. Improved: $\large\color{#35bf28}6$. Worsened: $\large\color{#d91a1a}5$.

Name Max Mean Ops Ops on Repo HEAD Change
test_single 62.5112ms 61.8857ms 16.1588 Ops/s 15.4966 Ops/s $\color{#35bf28}+4.27\%$
test_sync 38.1996ms 34.0074ms 29.4054 Ops/s 29.3890 Ops/s $\color{#35bf28}+0.06\%$
test_async 45.9222ms 31.6040ms 31.6416 Ops/s 33.4017 Ops/s $\textbf{\color{#d91a1a}-5.27\%}$
test_simple 0.4893s 0.4354s 2.2969 Ops/s 2.2675 Ops/s $\color{#35bf28}+1.30\%$
test_transformed 0.6400s 0.5867s 1.7044 Ops/s 1.6920 Ops/s $\color{#35bf28}+0.73\%$
test_serial 1.4752s 1.4270s 0.7008 Ops/s 0.6966 Ops/s $\color{#35bf28}+0.60\%$
test_parallel 1.4908s 1.4378s 0.6955 Ops/s 0.7061 Ops/s $\color{#d91a1a}-1.49\%$
test_step_mdp_speed[True-True-True-True-True] 0.1542ms 21.3359μs 46.8693 KOps/s 46.9593 KOps/s $\color{#d91a1a}-0.19\%$
test_step_mdp_speed[True-True-True-True-False] 45.7350μs 13.0406μs 76.6838 KOps/s 77.3673 KOps/s $\color{#d91a1a}-0.88\%$
test_step_mdp_speed[True-True-True-False-True] 42.6000μs 12.5365μs 79.7673 KOps/s 78.8723 KOps/s $\color{#35bf28}+1.13\%$
test_step_mdp_speed[True-True-True-False-False] 41.0170μs 7.5323μs 132.7619 KOps/s 131.3094 KOps/s $\color{#35bf28}+1.11\%$
test_step_mdp_speed[True-True-False-True-True] 45.0240μs 22.6592μs 44.1323 KOps/s 43.4680 KOps/s $\color{#35bf28}+1.53\%$
test_step_mdp_speed[True-True-False-True-False] 48.2300μs 14.1607μs 70.6179 KOps/s 69.5814 KOps/s $\color{#35bf28}+1.49\%$
test_step_mdp_speed[True-True-False-False-True] 38.7220μs 13.7885μs 72.5244 KOps/s 72.4843 KOps/s $\color{#35bf28}+0.06\%$
test_step_mdp_speed[True-True-False-False-False] 48.2400μs 8.7880μs 113.7916 KOps/s 112.9430 KOps/s $\color{#35bf28}+0.75\%$
test_step_mdp_speed[True-False-True-True-True] 49.3320μs 24.3310μs 41.0998 KOps/s 41.2828 KOps/s $\color{#d91a1a}-0.44\%$
test_step_mdp_speed[True-False-True-True-False] 52.5380μs 15.6520μs 63.8896 KOps/s 63.1895 KOps/s $\color{#35bf28}+1.11\%$
test_step_mdp_speed[True-False-True-False-True] 55.1830μs 13.6556μs 73.2302 KOps/s 71.5723 KOps/s $\color{#35bf28}+2.32\%$
test_step_mdp_speed[True-False-True-False-False] 46.0360μs 8.7747μs 113.9646 KOps/s 111.0286 KOps/s $\color{#35bf28}+2.64\%$
test_step_mdp_speed[True-False-False-True-True] 61.3340μs 25.3292μs 39.4802 KOps/s 39.6050 KOps/s $\color{#d91a1a}-0.32\%$
test_step_mdp_speed[True-False-False-True-False] 56.4050μs 16.8540μs 59.3330 KOps/s 59.2923 KOps/s $\color{#35bf28}+0.07\%$
test_step_mdp_speed[True-False-False-False-True] 47.8290μs 14.8159μs 67.4950 KOps/s 66.3243 KOps/s $\color{#35bf28}+1.77\%$
test_step_mdp_speed[True-False-False-False-False] 34.1330μs 10.0487μs 99.5153 KOps/s 98.8897 KOps/s $\color{#35bf28}+0.63\%$
test_step_mdp_speed[False-True-True-True-True] 87.6340μs 24.3215μs 41.1160 KOps/s 41.4181 KOps/s $\color{#d91a1a}-0.73\%$
test_step_mdp_speed[False-True-True-True-False] 51.2860μs 15.6128μs 64.0502 KOps/s 63.9903 KOps/s $\color{#35bf28}+0.09\%$
test_step_mdp_speed[False-True-True-False-True] 51.9070μs 16.1272μs 62.0069 KOps/s 62.3443 KOps/s $\color{#d91a1a}-0.54\%$
test_step_mdp_speed[False-True-True-False-False] 60.3230μs 10.0896μs 99.1118 KOps/s 100.3243 KOps/s $\color{#d91a1a}-1.21\%$
test_step_mdp_speed[False-True-False-True-True] 40.3860μs 25.5884μs 39.0801 KOps/s 38.8752 KOps/s $\color{#35bf28}+0.53\%$
test_step_mdp_speed[False-True-False-True-False] 45.9250μs 16.9135μs 59.1243 KOps/s 59.8775 KOps/s $\color{#d91a1a}-1.26\%$
test_step_mdp_speed[False-True-False-False-True] 43.6020μs 17.2918μs 57.8309 KOps/s 58.1520 KOps/s $\color{#d91a1a}-0.55\%$
test_step_mdp_speed[False-True-False-False-False] 58.1390μs 11.3058μs 88.4501 KOps/s 88.6783 KOps/s $\color{#d91a1a}-0.26\%$
test_step_mdp_speed[False-False-True-True-True] 59.3400μs 26.7710μs 37.3538 KOps/s 37.4845 KOps/s $\color{#d91a1a}-0.35\%$
test_step_mdp_speed[False-False-True-True-False] 48.0590μs 18.2981μs 54.6505 KOps/s 54.8705 KOps/s $\color{#d91a1a}-0.40\%$
test_step_mdp_speed[False-False-True-False-True] 47.0180μs 17.3663μs 57.5829 KOps/s 58.8495 KOps/s $\color{#d91a1a}-2.15\%$
test_step_mdp_speed[False-False-True-False-False] 44.2230μs 11.2854μs 88.6097 KOps/s 88.9063 KOps/s $\color{#d91a1a}-0.33\%$
test_step_mdp_speed[False-False-False-True-True] 85.0180μs 27.7796μs 35.9977 KOps/s 36.3071 KOps/s $\color{#d91a1a}-0.85\%$
test_step_mdp_speed[False-False-False-True-False] 54.6720μs 19.2423μs 51.9690 KOps/s 52.5437 KOps/s $\color{#d91a1a}-1.09\%$
test_step_mdp_speed[False-False-False-False-True] 41.7570μs 18.0820μs 55.3036 KOps/s 55.1335 KOps/s $\color{#35bf28}+0.31\%$
test_step_mdp_speed[False-False-False-False-False] 56.5660μs 12.3468μs 80.9929 KOps/s 80.7951 KOps/s $\color{#35bf28}+0.24\%$
test_values[generalized_advantage_estimate-True-True] 9.4978ms 9.2944ms 107.5916 Ops/s 106.0318 Ops/s $\color{#35bf28}+1.47\%$
test_values[vec_generalized_advantage_estimate-True-True] 37.3751ms 35.4342ms 28.2213 Ops/s 28.4637 Ops/s $\color{#d91a1a}-0.85\%$
test_values[td0_return_estimate-False-False] 0.2141ms 0.1817ms 5.5036 KOps/s 5.9081 KOps/s $\textbf{\color{#d91a1a}-6.85\%}$
test_values[td1_return_estimate-False-False] 25.9423ms 23.2916ms 42.9339 Ops/s 42.5047 Ops/s $\color{#35bf28}+1.01\%$
test_values[vec_td1_return_estimate-False-False] 39.5866ms 35.6245ms 28.0706 Ops/s 26.6271 Ops/s $\textbf{\color{#35bf28}+5.42\%}$
test_values[td_lambda_return_estimate-True-False] 36.0859ms 33.5557ms 29.8012 Ops/s 29.0833 Ops/s $\color{#35bf28}+2.47\%$
test_values[vec_td_lambda_return_estimate-True-False] 39.9226ms 35.4771ms 28.1872 Ops/s 27.9420 Ops/s $\color{#35bf28}+0.88\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 10.2543ms 8.1220ms 123.1224 Ops/s 121.1597 Ops/s $\color{#35bf28}+1.62\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 2.6590ms 1.9837ms 504.1169 Ops/s 492.9592 Ops/s $\color{#35bf28}+2.26\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.5679ms 0.3477ms 2.8758 KOps/s 2.8004 KOps/s $\color{#35bf28}+2.69\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 48.5830ms 45.0959ms 22.1750 Ops/s 20.9736 Ops/s $\textbf{\color{#35bf28}+5.73\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 3.6315ms 3.0373ms 329.2428 Ops/s 326.5696 Ops/s $\color{#35bf28}+0.82\%$
test_dqn_speed 71.3281ms 1.5295ms 653.7952 Ops/s 706.7451 Ops/s $\textbf{\color{#d91a1a}-7.49\%}$
test_ddpg_speed 3.0161ms 2.8495ms 350.9411 Ops/s 355.4388 Ops/s $\color{#d91a1a}-1.27\%$
test_sac_speed 10.0665ms 8.4885ms 117.8069 Ops/s 118.3113 Ops/s $\color{#d91a1a}-0.43\%$
test_redq_speed 15.1853ms 13.5094ms 74.0225 Ops/s 74.7511 Ops/s $\color{#d91a1a}-0.97\%$
test_redq_deprec_speed 14.7894ms 13.6044ms 73.5056 Ops/s 73.9770 Ops/s $\color{#d91a1a}-0.64\%$
test_td3_speed 9.1014ms 8.4831ms 117.8813 Ops/s 116.9084 Ops/s $\color{#35bf28}+0.83\%$
test_cql_speed 38.4061ms 36.9620ms 27.0548 Ops/s 26.7744 Ops/s $\color{#35bf28}+1.05\%$
test_a2c_speed 7.9608ms 7.4434ms 134.3466 Ops/s 131.7123 Ops/s $\color{#35bf28}+2.00\%$
test_ppo_speed 8.7415ms 7.7662ms 128.7624 Ops/s 123.1680 Ops/s $\color{#35bf28}+4.54\%$
test_reinforce_speed 7.8257ms 6.6934ms 149.4007 Ops/s 146.5871 Ops/s $\color{#35bf28}+1.92\%$
test_iql_speed 34.7747ms 33.0926ms 30.2182 Ops/s 27.5289 Ops/s $\textbf{\color{#35bf28}+9.77\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 3.1669ms 2.9088ms 343.7802 Ops/s 337.8755 Ops/s $\color{#35bf28}+1.75\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.8041ms 0.5168ms 1.9352 KOps/s 1.9013 KOps/s $\color{#35bf28}+1.78\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7203ms 0.4920ms 2.0325 KOps/s 1.7640 KOps/s $\textbf{\color{#35bf28}+15.22\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 4.3366ms 3.0188ms 331.2587 Ops/s 329.2647 Ops/s $\color{#35bf28}+0.61\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.8343ms 0.5087ms 1.9658 KOps/s 1.9613 KOps/s $\color{#35bf28}+0.23\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.8723ms 0.4944ms 2.0227 KOps/s 2.0945 KOps/s $\color{#d91a1a}-3.43\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 4.3268ms 3.0176ms 331.3874 Ops/s 334.4820 Ops/s $\color{#d91a1a}-0.93\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.1657ms 0.6424ms 1.5566 KOps/s 1.5441 KOps/s $\color{#35bf28}+0.81\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.9610ms 0.6132ms 1.6307 KOps/s 1.6413 KOps/s $\color{#d91a1a}-0.65\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 4.5387ms 2.9156ms 342.9884 Ops/s 355.9943 Ops/s $\color{#d91a1a}-3.65\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.6352ms 0.5181ms 1.9301 KOps/s 1.9613 KOps/s $\color{#d91a1a}-1.59\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.8116ms 0.4942ms 2.0236 KOps/s 2.0610 KOps/s $\color{#d91a1a}-1.82\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 4.4452ms 3.0216ms 330.9462 Ops/s 347.6243 Ops/s $\color{#d91a1a}-4.80\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.6180ms 0.5168ms 1.9350 KOps/s 1.9655 KOps/s $\color{#d91a1a}-1.55\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.8366ms 0.4940ms 2.0244 KOps/s 2.0849 KOps/s $\color{#d91a1a}-2.90\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 3.5454ms 3.1096ms 321.5820 Ops/s 332.9705 Ops/s $\color{#d91a1a}-3.42\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.9910ms 0.6424ms 1.5567 KOps/s 1.5637 KOps/s $\color{#d91a1a}-0.45\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.9332ms 0.6118ms 1.6345 KOps/s 1.6476 KOps/s $\color{#d91a1a}-0.79\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.1035s 7.8996ms 126.5884 Ops/s 106.6373 Ops/s $\textbf{\color{#35bf28}+18.71\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 15.8632ms 13.5575ms 73.7600 Ops/s 75.0293 Ops/s $\color{#d91a1a}-1.69\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 5.0162ms 2.5494ms 392.2481 Ops/s 389.7161 Ops/s $\color{#35bf28}+0.65\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 98.8650ms 9.6645ms 103.4716 Ops/s 135.7539 Ops/s $\textbf{\color{#d91a1a}-23.78\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 16.2543ms 13.4681ms 74.2493 Ops/s 74.6999 Ops/s $\color{#d91a1a}-0.60\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 5.3868ms 2.5618ms 390.3453 Ops/s 229.4036 Ops/s $\textbf{\color{#35bf28}+70.16\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 96.9531ms 9.6783ms 103.3245 Ops/s 126.3552 Ops/s $\textbf{\color{#d91a1a}-18.23\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 16.0304ms 13.6296ms 73.3695 Ops/s 72.4780 Ops/s $\color{#35bf28}+1.23\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 5.3510ms 2.8281ms 353.5933 Ops/s 354.3193 Ops/s $\color{#d91a1a}-0.20\%$


github-actions bot commented Feb 16, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 92. Improved: $\large\color{#35bf28}5$. Worsened: $\large\color{#d91a1a}9$.

Name Max Mean Ops Ops on Repo HEAD Change
test_single 0.1184s 0.1167s 8.5685 Ops/s 8.2938 Ops/s $\color{#35bf28}+3.31\%$
test_sync 95.6823ms 95.5085ms 10.4703 Ops/s 10.4926 Ops/s $\color{#d91a1a}-0.21\%$
test_async 0.1810s 91.6261ms 10.9139 Ops/s 10.9060 Ops/s $\color{#35bf28}+0.07\%$
test_single_pixels 0.2084s 0.1396s 7.1625 Ops/s 7.3286 Ops/s $\color{#d91a1a}-2.27\%$
test_sync_pixels 83.4398ms 81.6323ms 12.2501 Ops/s 12.1391 Ops/s $\color{#35bf28}+0.91\%$
test_async_pixels 0.1537s 76.3523ms 13.0972 Ops/s 15.5890 Ops/s $\textbf{\color{#d91a1a}-15.98\%}$
test_simple 0.8338s 0.8331s 1.2003 Ops/s 1.1937 Ops/s $\color{#35bf28}+0.56\%$
test_transformed 1.0684s 1.0669s 0.9373 Ops/s 0.9441 Ops/s $\color{#d91a1a}-0.72\%$
test_serial 2.5607s 2.5143s 0.3977 Ops/s 0.4138 Ops/s $\color{#d91a1a}-3.89\%$
test_parallel 2.1641s 2.1130s 0.4733 Ops/s 0.4925 Ops/s $\color{#d91a1a}-3.91\%$
test_step_mdp_speed[True-True-True-True-True] 0.1038ms 33.0820μs 30.2280 KOps/s 30.3900 KOps/s $\color{#d91a1a}-0.53\%$
test_step_mdp_speed[True-True-True-True-False] 37.5710μs 20.3983μs 49.0238 KOps/s 51.3512 KOps/s $\color{#d91a1a}-4.53\%$
test_step_mdp_speed[True-True-True-False-True] 40.8600μs 19.0811μs 52.4080 KOps/s 53.6387 KOps/s $\color{#d91a1a}-2.29\%$
test_step_mdp_speed[True-True-True-False-False] 25.9100μs 11.0821μs 90.2354 KOps/s 90.8600 KOps/s $\color{#d91a1a}-0.69\%$
test_step_mdp_speed[True-True-False-True-True] 55.6110μs 34.4252μs 29.0485 KOps/s 28.9621 KOps/s $\color{#35bf28}+0.30\%$
test_step_mdp_speed[True-True-False-True-False] 46.2700μs 21.3644μs 46.8069 KOps/s 46.6053 KOps/s $\color{#35bf28}+0.43\%$
test_step_mdp_speed[True-True-False-False-True] 42.4700μs 20.8717μs 47.9117 KOps/s 48.3550 KOps/s $\color{#d91a1a}-0.92\%$
test_step_mdp_speed[True-True-False-False-False] 30.4800μs 12.9776μs 77.0560 KOps/s 75.9859 KOps/s $\color{#35bf28}+1.41\%$
test_step_mdp_speed[True-False-True-True-True] 54.0110μs 36.4678μs 27.4214 KOps/s 26.8388 KOps/s $\color{#35bf28}+2.17\%$
test_step_mdp_speed[True-False-True-True-False] 45.5400μs 23.4286μs 42.6829 KOps/s 42.5291 KOps/s $\color{#35bf28}+0.36\%$
test_step_mdp_speed[True-False-True-False-True] 66.8910μs 20.7156μs 48.2727 KOps/s 48.9665 KOps/s $\color{#d91a1a}-1.42\%$
test_step_mdp_speed[True-False-True-False-False] 31.0700μs 13.0651μs 76.5396 KOps/s 76.4647 KOps/s $\color{#35bf28}+0.10\%$
test_step_mdp_speed[True-False-False-True-True] 60.8310μs 38.4740μs 25.9916 KOps/s 25.7416 KOps/s $\color{#35bf28}+0.97\%$
test_step_mdp_speed[True-False-False-True-False] 44.5010μs 25.1692μs 39.7311 KOps/s 38.7664 KOps/s $\color{#35bf28}+2.49\%$
test_step_mdp_speed[True-False-False-False-True] 38.5910μs 22.4431μs 44.5571 KOps/s 44.7135 KOps/s $\color{#d91a1a}-0.35\%$
test_step_mdp_speed[True-False-False-False-False] 31.0110μs 14.6817μs 68.1122 KOps/s 66.4048 KOps/s $\color{#35bf28}+2.57\%$
test_step_mdp_speed[False-True-True-True-True] 61.0110μs 36.4476μs 27.4367 KOps/s 27.3817 KOps/s $\color{#35bf28}+0.20\%$
test_step_mdp_speed[False-True-True-True-False] 40.0500μs 23.4517μs 42.6408 KOps/s 42.6362 KOps/s $\color{#35bf28}+0.01\%$
test_step_mdp_speed[False-True-True-False-True] 42.2310μs 24.6669μs 40.5401 KOps/s 40.5428 KOps/s $-0.01\%$
test_step_mdp_speed[False-True-True-False-False] 32.1000μs 14.8204μs 67.4746 KOps/s 66.4415 KOps/s $\color{#35bf28}+1.55\%$
test_step_mdp_speed[False-True-False-True-True] 62.0510μs 38.3651μs 26.0654 KOps/s 25.1009 KOps/s $\color{#35bf28}+3.84\%$
test_step_mdp_speed[False-True-False-True-False] 47.0400μs 25.3399μs 39.4634 KOps/s 38.7635 KOps/s $\color{#35bf28}+1.81\%$
test_step_mdp_speed[False-True-False-False-True] 54.4110μs 26.5593μs 37.6516 KOps/s 37.7093 KOps/s $\color{#d91a1a}-0.15\%$
test_step_mdp_speed[False-True-False-False-False] 38.0100μs 16.8921μs 59.1994 KOps/s 59.4586 KOps/s $\color{#d91a1a}-0.44\%$
test_step_mdp_speed[False-False-True-True-True] 60.3610μs 40.7070μs 24.5658 KOps/s 24.6939 KOps/s $\color{#d91a1a}-0.52\%$
test_step_mdp_speed[False-False-True-True-False] 56.0410μs 27.4741μs 36.3979 KOps/s 36.9156 KOps/s $\color{#d91a1a}-1.40\%$
test_step_mdp_speed[False-False-True-False-True] 46.5510μs 26.3135μs 38.0033 KOps/s 38.7284 KOps/s $\color{#d91a1a}-1.87\%$
test_step_mdp_speed[False-False-True-False-False] 41.4400μs 16.5814μs 60.3084 KOps/s 59.8627 KOps/s $\color{#35bf28}+0.74\%$
test_step_mdp_speed[False-False-False-True-True] 65.3610μs 42.0179μs 23.7994 KOps/s 23.6972 KOps/s $\color{#35bf28}+0.43\%$
test_step_mdp_speed[False-False-False-True-False] 51.1100μs 28.9439μs 34.5496 KOps/s 34.3553 KOps/s $\color{#35bf28}+0.57\%$
test_step_mdp_speed[False-False-False-False-True] 46.3610μs 27.8713μs 35.8792 KOps/s 35.8235 KOps/s $\color{#35bf28}+0.16\%$
test_step_mdp_speed[False-False-False-False-False] 35.3610μs 18.4249μs 54.2745 KOps/s 54.4260 KOps/s $\color{#d91a1a}-0.28\%$
test_values[generalized_advantage_estimate-True-True] 27.6203ms 26.7298ms 37.4115 Ops/s 40.2444 Ops/s $\textbf{\color{#d91a1a}-7.04\%}$
test_values[vec_generalized_advantage_estimate-True-True] 84.6563ms 3.2728ms 305.5507 Ops/s 308.4672 Ops/s $\color{#d91a1a}-0.95\%$
test_values[td0_return_estimate-False-False] 99.8720μs 63.8011μs 15.6737 KOps/s 16.4883 KOps/s $\color{#d91a1a}-4.94\%$
test_values[td1_return_estimate-False-False] 59.8015ms 59.0706ms 16.9289 Ops/s 18.9545 Ops/s $\textbf{\color{#d91a1a}-10.69\%}$
test_values[vec_td1_return_estimate-False-False] 2.1419ms 1.7911ms 558.3039 Ops/s 564.5024 Ops/s $\color{#d91a1a}-1.10\%$
test_values[td_lambda_return_estimate-True-False] 94.8850ms 94.0660ms 10.6308 Ops/s 11.8930 Ops/s $\textbf{\color{#d91a1a}-10.61\%}$
test_values[vec_td_lambda_return_estimate-True-False] 4.0662ms 1.8285ms 546.9042 Ops/s 556.0384 Ops/s $\color{#d91a1a}-1.64\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 26.4302ms 26.2245ms 38.1322 Ops/s 42.3764 Ops/s $\textbf{\color{#d91a1a}-10.02\%}$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 0.9329ms 0.7227ms 1.3837 KOps/s 1.4196 KOps/s $\color{#d91a1a}-2.53\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.7309ms 0.6898ms 1.4496 KOps/s 1.5336 KOps/s $\textbf{\color{#d91a1a}-5.48\%}$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.5365ms 1.4785ms 676.3477 Ops/s 686.8483 Ops/s $\color{#d91a1a}-1.53\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.9569ms 0.6846ms 1.4607 KOps/s 1.4835 KOps/s $\color{#d91a1a}-1.54\%$
test_dqn_speed 4.0529ms 1.4515ms 688.9229 Ops/s 681.5146 Ops/s $\color{#35bf28}+1.09\%$
test_ddpg_speed 3.2679ms 2.8022ms 356.8585 Ops/s 356.7989 Ops/s $\color{#35bf28}+0.02\%$
test_sac_speed 8.6458ms 8.1824ms 122.2131 Ops/s 123.1767 Ops/s $\color{#d91a1a}-0.78\%$
test_redq_speed 10.8624ms 10.1591ms 98.4337 Ops/s 98.1957 Ops/s $\color{#35bf28}+0.24\%$
test_redq_deprec_speed 11.8456ms 11.3007ms 88.4902 Ops/s 90.2089 Ops/s $\color{#d91a1a}-1.91\%$
test_td3_speed 8.3240ms 8.1816ms 122.2261 Ops/s 122.3455 Ops/s $\color{#d91a1a}-0.10\%$
test_cql_speed 25.7121ms 24.7448ms 40.4125 Ops/s 39.4782 Ops/s $\color{#35bf28}+2.37\%$
test_a2c_speed 5.3978ms 5.1400ms 194.5530 Ops/s 182.7475 Ops/s $\textbf{\color{#35bf28}+6.46\%}$
test_ppo_speed 5.6780ms 5.4305ms 184.1448 Ops/s 172.5937 Ops/s $\textbf{\color{#35bf28}+6.69\%}$
test_reinforce_speed 4.4662ms 4.1536ms 240.7576 Ops/s 223.8770 Ops/s $\textbf{\color{#35bf28}+7.54\%}$
test_iql_speed 99.9122ms 20.0872ms 49.7828 Ops/s 52.7061 Ops/s $\textbf{\color{#d91a1a}-5.55\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 3.8416ms 3.7133ms 269.3011 Ops/s 273.1119 Ops/s $\color{#d91a1a}-1.40\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.7622ms 0.5588ms 1.7897 KOps/s 1.8074 KOps/s $\color{#d91a1a}-0.98\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 92.9655ms 0.6011ms 1.6636 KOps/s 1.9055 KOps/s $\textbf{\color{#d91a1a}-12.69\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 3.9339ms 3.7354ms 267.7065 Ops/s 268.6227 Ops/s $\color{#d91a1a}-0.34\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.6768ms 0.5489ms 1.8217 KOps/s 1.8369 KOps/s $\color{#d91a1a}-0.83\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6644ms 0.5244ms 1.9069 KOps/s 1.9271 KOps/s $\color{#d91a1a}-1.05\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 3.9844ms 3.8475ms 259.9084 Ops/s 262.7046 Ops/s $\color{#d91a1a}-1.06\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.8016ms 0.6855ms 1.4588 KOps/s 1.4813 KOps/s $\color{#d91a1a}-1.52\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8700ms 0.6572ms 1.5215 KOps/s 1.5497 KOps/s $\color{#d91a1a}-1.82\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 3.7862ms 3.7004ms 270.2437 Ops/s 271.4084 Ops/s $\color{#d91a1a}-0.43\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.6862ms 0.5573ms 1.7945 KOps/s 1.8113 KOps/s $\color{#d91a1a}-0.93\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6794ms 0.5320ms 1.8796 KOps/s 1.8992 KOps/s $\color{#d91a1a}-1.03\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 3.8937ms 3.7178ms 268.9767 Ops/s 269.3558 Ops/s $\color{#d91a1a}-0.14\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.7071ms 0.5513ms 1.8139 KOps/s 1.8318 KOps/s $\color{#d91a1a}-0.98\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6558ms 0.5261ms 1.9007 KOps/s 1.6318 KOps/s $\textbf{\color{#35bf28}+16.48\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 3.9620ms 3.8475ms 259.9061 Ops/s 263.1344 Ops/s $\color{#d91a1a}-1.23\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.8100ms 0.6866ms 1.4565 KOps/s 1.4733 KOps/s $\color{#d91a1a}-1.14\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8064ms 0.6597ms 1.5158 KOps/s 1.5328 KOps/s $\color{#d91a1a}-1.11\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.1120s 11.3552ms 88.0651 Ops/s 89.7844 Ops/s $\color{#d91a1a}-1.91\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 18.8925ms 16.5106ms 60.5670 Ops/s 62.3541 Ops/s $\color{#d91a1a}-2.87\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 7.4333ms 3.1065ms 321.9091 Ops/s 333.9906 Ops/s $\color{#d91a1a}-3.62\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.1023s 9.2850ms 107.7011 Ops/s 90.7362 Ops/s $\textbf{\color{#35bf28}+18.70\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 19.3573ms 16.5439ms 60.4451 Ops/s 62.4266 Ops/s $\color{#d91a1a}-3.17\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 8.1033ms 3.1456ms 317.9022 Ops/s 334.9338 Ops/s $\textbf{\color{#d91a1a}-5.09\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.1014s 11.4684ms 87.1963 Ops/s 88.2951 Ops/s $\color{#d91a1a}-1.24\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 19.4856ms 16.7447ms 59.7204 Ops/s 61.3875 Ops/s $\color{#d91a1a}-2.72\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 8.8550ms 3.4436ms 290.3966 Ops/s 304.9729 Ops/s $\color{#d91a1a}-4.78\%$

matteobettini (Contributor) left a comment

This is amazing. Thanks so much for this! Can't wait to see the perf gains.

Before merging, we should run a comparison between main and this PR on one of the multi-agent example scripts (e.g., mappo_ippo.py) in the case of NOT sharing params, to check that the reward is the same and to gather data on the performance differences.

In particular, I am interested in confirming that

    @staticmethod
    def vmap_func_module(module, *args, **kwargs):
        # Run `module` with the given params swapped in, vectorized over agents.
        def exec_module(params, *input):
            with params.to_module(module):
                return module(*input)

        return torch.vmap(exec_module, *args, **kwargs)

    output = self.vmap_func_module(
        self._empty_net, (0, self.agent_dim), (-2,)
    )(self.params, inputs)

is faster than

    output = torch.stack(
        [
            net(inputs[..., i, :])
            for i, net in enumerate(self.agent_networks)
        ],
        dim=-2,
    )

when the number of agents is small.
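For readers who want to try the two variants side by side, here is a minimal, self-contained sketch. It is not the PR's code: it swaps TensorDict's `to_module` for `torch.func.stack_module_state` and `torch.func.functional_call`, and the network sizes, variable names, and the helper `call_single_agent` are all assumptions made for illustration. It only checks that both variants produce the same output; it does not measure speed.

```python
import copy

import torch
from torch import nn
from torch.func import functional_call, stack_module_state

torch.manual_seed(0)
n_agents, batch, in_dim, out_dim = 3, 5, 4, 2  # illustrative sizes (assumed)

# One independent network per agent (no parameter sharing).
agent_networks = [
    nn.Sequential(nn.Linear(in_dim, 8), nn.Tanh(), nn.Linear(8, out_dim))
    for _ in range(n_agents)
]
inputs = torch.randn(batch, n_agents, in_dim)  # agent dimension is dim 1

# Variant 1: Python loop over agents, then stack along the agent dimension.
looped = torch.stack(
    [net(inputs[:, i, :]) for i, net in enumerate(agent_networks)],
    dim=1,
)

# Variant 2: stack the parameters once and vmap a single functional call.
params, buffers = stack_module_state(agent_networks)
base_net = copy.deepcopy(agent_networks[0]).to("meta")  # structure only


def call_single_agent(agent_params, agent_buffers, agent_input):
    # Hypothetical helper: run the shared structure with one agent's weights.
    return functional_call(base_net, (agent_params, agent_buffers), (agent_input,))


vmapped = torch.vmap(call_single_agent, in_dims=(0, 0, 1), out_dims=1)(
    params, buffers, inputs
)

print(looped.shape, torch.allclose(looped, vmapped, atol=1e-5))
```

Timing both variants, including the backward pass and optimizer step, would be the natural next step for the comparison requested above.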

vmoens (Contributor, Author) commented Feb 17, 2024

It is faster; we ran multiple benchmarks on this.
It is faster even with few agents once you account for the backward pass (which is much slower when you build multiple graphs) and for the optimizer, which otherwise holds many more params (you end up calling step and zero_grad on many more tensors, with ops executed in Python loops).

https://gist.github.com/vmoens/4b6037896a6a0ad347e91877ade354ae
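As a back-of-the-envelope illustration of the optimizer point, the sketch below counts how many parameter tensors an optimizer would have to iterate over in each setup. The function name `optimizer_tensor_count` is hypothetical, and the count assumes every layer holds exactly one weight and one bias tensor.

```python
def optimizer_tensor_count(n_agents, n_layers, stacked):
    """Count the parameter tensors an optimizer would loop over.

    Assumption: each layer contributes one weight and one bias tensor.
    With stacked parameters, the agent dimension lives inside each tensor,
    so the count does not grow with the number of agents.
    """
    tensors_per_net = 2 * n_layers
    return tensors_per_net if stacked else n_agents * tensors_per_net


for n_agents in (3, 10):
    separate = optimizer_tensor_count(n_agents, n_layers=3, stacked=False)
    stacked = optimizer_tensor_count(n_agents, n_layers=3, stacked=True)
    print(f"{n_agents} agents: {separate} separate tensors vs {stacked} stacked")
```

Under these assumptions the separate-network count grows linearly with the number of agents, while the stacked count stays constant.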

kfu02 left a comment

This is cool! Thank you so much!

To follow up on your comment here, what do you anticipate adding an RNN will require in addition to this? Is the issue that the hidden state must be initialized first? I don't quite understand what "non initialized lazy params" means or how that creates an issue with RNNs.

self.params = TensorDict.from_modules(*agent_networks, as_module=True)

@abc.abstractmethod
def _build_single_net(self, *, device, **kwargs):

Just for my understanding, any new MultiAgent* will simply need to implement this method with some nn.Module (and the pre_forward_check below) right? I would be interested in helping contribute a MultiAgentGNN.

vmoens (Contributor, Author):

yep that is the idea!
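To make the extension pattern concrete, here is a rough sketch of the idea. It uses a hypothetical stand-in base class (`MultiAgentNetSketch`), not TorchRL's actual class: the constructor signature, the plain-loop `forward`, and the omission of `pre_forward_check` are all simplifications assumed for illustration.

```python
import abc

import torch
from torch import nn


class MultiAgentNetSketch(nn.Module, abc.ABC):
    """Hypothetical stand-in for the base class (not TorchRL's code)."""

    def __init__(self, n_agents, **net_kwargs):
        super().__init__()
        self.n_agents = n_agents
        # The base class owns construction; subclasses only say what one net is.
        self.agent_networks = nn.ModuleList(
            [self._build_single_net(**net_kwargs) for _ in range(n_agents)]
        )

    @abc.abstractmethod
    def _build_single_net(self, **kwargs):
        """Return the nn.Module used by a single agent."""

    def forward(self, inputs):
        # inputs: (..., n_agents, features). A plain loop keeps the sketch short.
        return torch.stack(
            [net(inputs[..., i, :]) for i, net in enumerate(self.agent_networks)],
            dim=-2,
        )


class MultiAgentLinearSketch(MultiAgentNetSketch):
    """Example subclass: the only thing it defines is the per-agent module."""

    def _build_single_net(self, *, in_features, out_features):
        return nn.Linear(in_features, out_features)


model = MultiAgentLinearSketch(3, in_features=4, out_features=2)
out = model(torch.randn(5, 3, 4))
print(out.shape)
```

A new architecture would, under this reading, only need its own `_build_single_net` plus whatever input checks the real base class requires.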

vmoens (Contributor, Author) commented Feb 19, 2024

@kfu02 @matteobettini for context: the problem with uninitialized params is that if you have modules with lazy, non-initialized params, it is usually assumed that you can pass them to an optimizer and that the optimizer will know it must wait until they are initialized before doing anything with them.

Currently this isn't supported by from_modules, which will only create dense params. We could create lazy params, but I'm not sure how that would work with vmap...
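A small sketch of what "lazy, non-initialized params" means in plain PyTorch, using `nn.LazyLinear` as the example module (the sizes are arbitrary). Before the first forward call the weight is a shapeless placeholder, which is presumably why it cannot be stacked into a dense per-agent tensor up front.

```python
import torch
from torch import nn
from torch.nn.parameter import UninitializedParameter

lazy = nn.LazyLinear(out_features=3)

# Before the first call, the weight has no shape yet.
lazy_before = isinstance(lazy.weight, UninitializedParameter)

# The first forward pass infers in_features from the input and materializes it.
lazy(torch.randn(2, 4))
lazy_after = isinstance(lazy.weight, UninitializedParameter)
weight_shape = tuple(lazy.weight.shape)

print(lazy_before, lazy_after, weight_shape)
```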

matteobettini (Contributor) commented Feb 19, 2024

> It is faster, we ran multiple benchmarks on this.

That is cool! I would still run the full multi-agent training script to check those two things.

> @kfu02 @matteobettini for context: the problem with uninitialized params is that if you have modules with lazy, non-initialized params, it is usually assumed that you can pass them to an optimizer and that the optimizer will know it must wait until they are initialized before doing anything with them.

Since lazy modules were not a feature of these classes in the first place, why don't we just leave it that way?

I personally would prefer this to a complex solution to support it.

vmoens (Contributor, Author) commented Feb 19, 2024

It was a feature of the CNN, so I just made it uniform.

vmoens (Contributor, Author) commented Feb 20, 2024

@kfu02 wanna give a shot at MA-RNNs with this once it's merged? Or should I draft a PR?

@vmoens vmoens merged commit ca42794 into main Feb 20, 2024
53 of 67 checks passed
@vmoens vmoens deleted the edit_ma_mlp2 branch February 20, 2024 21:28
kfu02 commented Feb 21, 2024

> @kfu02 wanna give a shot at MA-RNNs with this once it's merged? Or should I draft a PR?

Yes! I will put up a draft by the end of the week!

matteobettini (Contributor) commented Feb 21, 2024

I ran some benchmarks on the mappo_ippo example for the MLP, with MAPPO and non-shared params for 3 agents.

It seems to work! I do not see any regression in this case.

All metrics match (no perf improvement, though).

[Two W&B charts, 21 Feb 2024]

matteobettini (Contributor) commented

Further to #1957, I ran some tests with MASAC and non-shared parameters on the same task for 4 agents, and the results look good!
[Two W&B charts, 26 Feb 2024]

Almost half training time!
