[Refactor] Faster and more generic multi-agent nets #1921
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/1921
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (14 Unrelated Failures)
As of commit 030d0dd with merge base 799f939:
FLAKY - The following jobs failed but were likely due to flakiness present on trunk:
BROKEN TRUNK - The following jobs failed but were present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_single | 62.5112ms | 61.8857ms | 16.1588 Ops/s | 15.4966 Ops/s | |
test_sync | 38.1996ms | 34.0074ms | 29.4054 Ops/s | 29.3890 Ops/s | |
test_async | 45.9222ms | 31.6040ms | 31.6416 Ops/s | 33.4017 Ops/s | |
test_simple | 0.4893s | 0.4354s | 2.2969 Ops/s | 2.2675 Ops/s | |
test_transformed | 0.6400s | 0.5867s | 1.7044 Ops/s | 1.6920 Ops/s | |
test_serial | 1.4752s | 1.4270s | 0.7008 Ops/s | 0.6966 Ops/s | |
test_parallel | 1.4908s | 1.4378s | 0.6955 Ops/s | 0.7061 Ops/s | |
test_step_mdp_speed[True-True-True-True-True] | 0.1542ms | 21.3359μs | 46.8693 KOps/s | 46.9593 KOps/s | |
test_step_mdp_speed[True-True-True-True-False] | 45.7350μs | 13.0406μs | 76.6838 KOps/s | 77.3673 KOps/s | |
test_step_mdp_speed[True-True-True-False-True] | 42.6000μs | 12.5365μs | 79.7673 KOps/s | 78.8723 KOps/s | |
test_step_mdp_speed[True-True-True-False-False] | 41.0170μs | 7.5323μs | 132.7619 KOps/s | 131.3094 KOps/s | |
test_step_mdp_speed[True-True-False-True-True] | 45.0240μs | 22.6592μs | 44.1323 KOps/s | 43.4680 KOps/s | |
test_step_mdp_speed[True-True-False-True-False] | 48.2300μs | 14.1607μs | 70.6179 KOps/s | 69.5814 KOps/s | |
test_step_mdp_speed[True-True-False-False-True] | 38.7220μs | 13.7885μs | 72.5244 KOps/s | 72.4843 KOps/s | |
test_step_mdp_speed[True-True-False-False-False] | 48.2400μs | 8.7880μs | 113.7916 KOps/s | 112.9430 KOps/s | |
test_step_mdp_speed[True-False-True-True-True] | 49.3320μs | 24.3310μs | 41.0998 KOps/s | 41.2828 KOps/s | |
test_step_mdp_speed[True-False-True-True-False] | 52.5380μs | 15.6520μs | 63.8896 KOps/s | 63.1895 KOps/s | |
test_step_mdp_speed[True-False-True-False-True] | 55.1830μs | 13.6556μs | 73.2302 KOps/s | 71.5723 KOps/s | |
test_step_mdp_speed[True-False-True-False-False] | 46.0360μs | 8.7747μs | 113.9646 KOps/s | 111.0286 KOps/s | |
test_step_mdp_speed[True-False-False-True-True] | 61.3340μs | 25.3292μs | 39.4802 KOps/s | 39.6050 KOps/s | |
test_step_mdp_speed[True-False-False-True-False] | 56.4050μs | 16.8540μs | 59.3330 KOps/s | 59.2923 KOps/s | |
test_step_mdp_speed[True-False-False-False-True] | 47.8290μs | 14.8159μs | 67.4950 KOps/s | 66.3243 KOps/s | |
test_step_mdp_speed[True-False-False-False-False] | 34.1330μs | 10.0487μs | 99.5153 KOps/s | 98.8897 KOps/s | |
test_step_mdp_speed[False-True-True-True-True] | 87.6340μs | 24.3215μs | 41.1160 KOps/s | 41.4181 KOps/s | |
test_step_mdp_speed[False-True-True-True-False] | 51.2860μs | 15.6128μs | 64.0502 KOps/s | 63.9903 KOps/s | |
test_step_mdp_speed[False-True-True-False-True] | 51.9070μs | 16.1272μs | 62.0069 KOps/s | 62.3443 KOps/s | |
test_step_mdp_speed[False-True-True-False-False] | 60.3230μs | 10.0896μs | 99.1118 KOps/s | 100.3243 KOps/s | |
test_step_mdp_speed[False-True-False-True-True] | 40.3860μs | 25.5884μs | 39.0801 KOps/s | 38.8752 KOps/s | |
test_step_mdp_speed[False-True-False-True-False] | 45.9250μs | 16.9135μs | 59.1243 KOps/s | 59.8775 KOps/s | |
test_step_mdp_speed[False-True-False-False-True] | 43.6020μs | 17.2918μs | 57.8309 KOps/s | 58.1520 KOps/s | |
test_step_mdp_speed[False-True-False-False-False] | 58.1390μs | 11.3058μs | 88.4501 KOps/s | 88.6783 KOps/s | |
test_step_mdp_speed[False-False-True-True-True] | 59.3400μs | 26.7710μs | 37.3538 KOps/s | 37.4845 KOps/s | |
test_step_mdp_speed[False-False-True-True-False] | 48.0590μs | 18.2981μs | 54.6505 KOps/s | 54.8705 KOps/s | |
test_step_mdp_speed[False-False-True-False-True] | 47.0180μs | 17.3663μs | 57.5829 KOps/s | 58.8495 KOps/s | |
test_step_mdp_speed[False-False-True-False-False] | 44.2230μs | 11.2854μs | 88.6097 KOps/s | 88.9063 KOps/s | |
test_step_mdp_speed[False-False-False-True-True] | 85.0180μs | 27.7796μs | 35.9977 KOps/s | 36.3071 KOps/s | |
test_step_mdp_speed[False-False-False-True-False] | 54.6720μs | 19.2423μs | 51.9690 KOps/s | 52.5437 KOps/s | |
test_step_mdp_speed[False-False-False-False-True] | 41.7570μs | 18.0820μs | 55.3036 KOps/s | 55.1335 KOps/s | |
test_step_mdp_speed[False-False-False-False-False] | 56.5660μs | 12.3468μs | 80.9929 KOps/s | 80.7951 KOps/s | |
test_values[generalized_advantage_estimate-True-True] | 9.4978ms | 9.2944ms | 107.5916 Ops/s | 106.0318 Ops/s | |
test_values[vec_generalized_advantage_estimate-True-True] | 37.3751ms | 35.4342ms | 28.2213 Ops/s | 28.4637 Ops/s | |
test_values[td0_return_estimate-False-False] | 0.2141ms | 0.1817ms | 5.5036 KOps/s | 5.9081 KOps/s | |
test_values[td1_return_estimate-False-False] | 25.9423ms | 23.2916ms | 42.9339 Ops/s | 42.5047 Ops/s | |
test_values[vec_td1_return_estimate-False-False] | 39.5866ms | 35.6245ms | 28.0706 Ops/s | 26.6271 Ops/s | |
test_values[td_lambda_return_estimate-True-False] | 36.0859ms | 33.5557ms | 29.8012 Ops/s | 29.0833 Ops/s | |
test_values[vec_td_lambda_return_estimate-True-False] | 39.9226ms | 35.4771ms | 28.1872 Ops/s | 27.9420 Ops/s | |
test_gae_speed[generalized_advantage_estimate-False-1-512] | 10.2543ms | 8.1220ms | 123.1224 Ops/s | 121.1597 Ops/s | |
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 2.6590ms | 1.9837ms | 504.1169 Ops/s | 492.9592 Ops/s | |
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.5679ms | 0.3477ms | 2.8758 KOps/s | 2.8004 KOps/s | |
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 48.5830ms | 45.0959ms | 22.1750 Ops/s | 20.9736 Ops/s | |
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 3.6315ms | 3.0373ms | 329.2428 Ops/s | 326.5696 Ops/s | |
test_dqn_speed | 71.3281ms | 1.5295ms | 653.7952 Ops/s | 706.7451 Ops/s | |
test_ddpg_speed | 3.0161ms | 2.8495ms | 350.9411 Ops/s | 355.4388 Ops/s | |
test_sac_speed | 10.0665ms | 8.4885ms | 117.8069 Ops/s | 118.3113 Ops/s | |
test_redq_speed | 15.1853ms | 13.5094ms | 74.0225 Ops/s | 74.7511 Ops/s | |
test_redq_deprec_speed | 14.7894ms | 13.6044ms | 73.5056 Ops/s | 73.9770 Ops/s | |
test_td3_speed | 9.1014ms | 8.4831ms | 117.8813 Ops/s | 116.9084 Ops/s | |
test_cql_speed | 38.4061ms | 36.9620ms | 27.0548 Ops/s | 26.7744 Ops/s | |
test_a2c_speed | 7.9608ms | 7.4434ms | 134.3466 Ops/s | 131.7123 Ops/s | |
test_ppo_speed | 8.7415ms | 7.7662ms | 128.7624 Ops/s | 123.1680 Ops/s | |
test_reinforce_speed | 7.8257ms | 6.6934ms | 149.4007 Ops/s | 146.5871 Ops/s | |
test_iql_speed | 34.7747ms | 33.0926ms | 30.2182 Ops/s | 27.5289 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 3.1669ms | 2.9088ms | 343.7802 Ops/s | 337.8755 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.8041ms | 0.5168ms | 1.9352 KOps/s | 1.9013 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.7203ms | 0.4920ms | 2.0325 KOps/s | 1.7640 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 4.3366ms | 3.0188ms | 331.2587 Ops/s | 329.2647 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.8343ms | 0.5087ms | 1.9658 KOps/s | 1.9613 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.8723ms | 0.4944ms | 2.0227 KOps/s | 2.0945 KOps/s | |
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 4.3268ms | 3.0176ms | 331.3874 Ops/s | 334.4820 Ops/s | |
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.1657ms | 0.6424ms | 1.5566 KOps/s | 1.5441 KOps/s | |
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.9610ms | 0.6132ms | 1.6307 KOps/s | 1.6413 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 4.5387ms | 2.9156ms | 342.9884 Ops/s | 355.9943 Ops/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.6352ms | 0.5181ms | 1.9301 KOps/s | 1.9613 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.8116ms | 0.4942ms | 2.0236 KOps/s | 2.0610 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 4.4452ms | 3.0216ms | 330.9462 Ops/s | 347.6243 Ops/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.6180ms | 0.5168ms | 1.9350 KOps/s | 1.9655 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.8366ms | 0.4940ms | 2.0244 KOps/s | 2.0849 KOps/s | |
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.5454ms | 3.1096ms | 321.5820 Ops/s | 332.9705 Ops/s | |
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.9910ms | 0.6424ms | 1.5567 KOps/s | 1.5637 KOps/s | |
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.9332ms | 0.6118ms | 1.6345 KOps/s | 1.6476 KOps/s | |
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.1035s | 7.8996ms | 126.5884 Ops/s | 106.6373 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 15.8632ms | 13.5575ms | 73.7600 Ops/s | 75.0293 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 5.0162ms | 2.5494ms | 392.2481 Ops/s | 389.7161 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 98.8650ms | 9.6645ms | 103.4716 Ops/s | 135.7539 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 16.2543ms | 13.4681ms | 74.2493 Ops/s | 74.6999 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 5.3868ms | 2.5618ms | 390.3453 Ops/s | 229.4036 Ops/s | |
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 96.9531ms | 9.6783ms | 103.3245 Ops/s | 126.3552 Ops/s | |
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 16.0304ms | 13.6296ms | 73.3695 Ops/s | 72.4780 Ops/s | |
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 5.3510ms | 2.8281ms | 353.5933 Ops/s | 354.3193 Ops/s |

Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_single | 0.1184s | 0.1167s | 8.5685 Ops/s | 8.2938 Ops/s | |
test_sync | 95.6823ms | 95.5085ms | 10.4703 Ops/s | 10.4926 Ops/s | |
test_async | 0.1810s | 91.6261ms | 10.9139 Ops/s | 10.9060 Ops/s | |
test_single_pixels | 0.2084s | 0.1396s | 7.1625 Ops/s | 7.3286 Ops/s | |
test_sync_pixels | 83.4398ms | 81.6323ms | 12.2501 Ops/s | 12.1391 Ops/s | |
test_async_pixels | 0.1537s | 76.3523ms | 13.0972 Ops/s | 15.5890 Ops/s | |
test_simple | 0.8338s | 0.8331s | 1.2003 Ops/s | 1.1937 Ops/s | |
test_transformed | 1.0684s | 1.0669s | 0.9373 Ops/s | 0.9441 Ops/s | |
test_serial | 2.5607s | 2.5143s | 0.3977 Ops/s | 0.4138 Ops/s | |
test_parallel | 2.1641s | 2.1130s | 0.4733 Ops/s | 0.4925 Ops/s | |
test_step_mdp_speed[True-True-True-True-True] | 0.1038ms | 33.0820μs | 30.2280 KOps/s | 30.3900 KOps/s | |
test_step_mdp_speed[True-True-True-True-False] | 37.5710μs | 20.3983μs | 49.0238 KOps/s | 51.3512 KOps/s | |
test_step_mdp_speed[True-True-True-False-True] | 40.8600μs | 19.0811μs | 52.4080 KOps/s | 53.6387 KOps/s | |
test_step_mdp_speed[True-True-True-False-False] | 25.9100μs | 11.0821μs | 90.2354 KOps/s | 90.8600 KOps/s | |
test_step_mdp_speed[True-True-False-True-True] | 55.6110μs | 34.4252μs | 29.0485 KOps/s | 28.9621 KOps/s | |
test_step_mdp_speed[True-True-False-True-False] | 46.2700μs | 21.3644μs | 46.8069 KOps/s | 46.6053 KOps/s | |
test_step_mdp_speed[True-True-False-False-True] | 42.4700μs | 20.8717μs | 47.9117 KOps/s | 48.3550 KOps/s | |
test_step_mdp_speed[True-True-False-False-False] | 30.4800μs | 12.9776μs | 77.0560 KOps/s | 75.9859 KOps/s | |
test_step_mdp_speed[True-False-True-True-True] | 54.0110μs | 36.4678μs | 27.4214 KOps/s | 26.8388 KOps/s | |
test_step_mdp_speed[True-False-True-True-False] | 45.5400μs | 23.4286μs | 42.6829 KOps/s | 42.5291 KOps/s | |
test_step_mdp_speed[True-False-True-False-True] | 66.8910μs | 20.7156μs | 48.2727 KOps/s | 48.9665 KOps/s | |
test_step_mdp_speed[True-False-True-False-False] | 31.0700μs | 13.0651μs | 76.5396 KOps/s | 76.4647 KOps/s | |
test_step_mdp_speed[True-False-False-True-True] | 60.8310μs | 38.4740μs | 25.9916 KOps/s | 25.7416 KOps/s | |
test_step_mdp_speed[True-False-False-True-False] | 44.5010μs | 25.1692μs | 39.7311 KOps/s | 38.7664 KOps/s | |
test_step_mdp_speed[True-False-False-False-True] | 38.5910μs | 22.4431μs | 44.5571 KOps/s | 44.7135 KOps/s | |
test_step_mdp_speed[True-False-False-False-False] | 31.0110μs | 14.6817μs | 68.1122 KOps/s | 66.4048 KOps/s | |
test_step_mdp_speed[False-True-True-True-True] | 61.0110μs | 36.4476μs | 27.4367 KOps/s | 27.3817 KOps/s | |
test_step_mdp_speed[False-True-True-True-False] | 40.0500μs | 23.4517μs | 42.6408 KOps/s | 42.6362 KOps/s | |
test_step_mdp_speed[False-True-True-False-True] | 42.2310μs | 24.6669μs | 40.5401 KOps/s | 40.5428 KOps/s | |
test_step_mdp_speed[False-True-True-False-False] | 32.1000μs | 14.8204μs | 67.4746 KOps/s | 66.4415 KOps/s | |
test_step_mdp_speed[False-True-False-True-True] | 62.0510μs | 38.3651μs | 26.0654 KOps/s | 25.1009 KOps/s | |
test_step_mdp_speed[False-True-False-True-False] | 47.0400μs | 25.3399μs | 39.4634 KOps/s | 38.7635 KOps/s | |
test_step_mdp_speed[False-True-False-False-True] | 54.4110μs | 26.5593μs | 37.6516 KOps/s | 37.7093 KOps/s | |
test_step_mdp_speed[False-True-False-False-False] | 38.0100μs | 16.8921μs | 59.1994 KOps/s | 59.4586 KOps/s | |
test_step_mdp_speed[False-False-True-True-True] | 60.3610μs | 40.7070μs | 24.5658 KOps/s | 24.6939 KOps/s | |
test_step_mdp_speed[False-False-True-True-False] | 56.0410μs | 27.4741μs | 36.3979 KOps/s | 36.9156 KOps/s | |
test_step_mdp_speed[False-False-True-False-True] | 46.5510μs | 26.3135μs | 38.0033 KOps/s | 38.7284 KOps/s | |
test_step_mdp_speed[False-False-True-False-False] | 41.4400μs | 16.5814μs | 60.3084 KOps/s | 59.8627 KOps/s | |
test_step_mdp_speed[False-False-False-True-True] | 65.3610μs | 42.0179μs | 23.7994 KOps/s | 23.6972 KOps/s | |
test_step_mdp_speed[False-False-False-True-False] | 51.1100μs | 28.9439μs | 34.5496 KOps/s | 34.3553 KOps/s | |
test_step_mdp_speed[False-False-False-False-True] | 46.3610μs | 27.8713μs | 35.8792 KOps/s | 35.8235 KOps/s | |
test_step_mdp_speed[False-False-False-False-False] | 35.3610μs | 18.4249μs | 54.2745 KOps/s | 54.4260 KOps/s | |
test_values[generalized_advantage_estimate-True-True] | 27.6203ms | 26.7298ms | 37.4115 Ops/s | 40.2444 Ops/s | |
test_values[vec_generalized_advantage_estimate-True-True] | 84.6563ms | 3.2728ms | 305.5507 Ops/s | 308.4672 Ops/s | |
test_values[td0_return_estimate-False-False] | 99.8720μs | 63.8011μs | 15.6737 KOps/s | 16.4883 KOps/s | |
test_values[td1_return_estimate-False-False] | 59.8015ms | 59.0706ms | 16.9289 Ops/s | 18.9545 Ops/s | |
test_values[vec_td1_return_estimate-False-False] | 2.1419ms | 1.7911ms | 558.3039 Ops/s | 564.5024 Ops/s | |
test_values[td_lambda_return_estimate-True-False] | 94.8850ms | 94.0660ms | 10.6308 Ops/s | 11.8930 Ops/s | |
test_values[vec_td_lambda_return_estimate-True-False] | 4.0662ms | 1.8285ms | 546.9042 Ops/s | 556.0384 Ops/s | |
test_gae_speed[generalized_advantage_estimate-False-1-512] | 26.4302ms | 26.2245ms | 38.1322 Ops/s | 42.3764 Ops/s | |
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 0.9329ms | 0.7227ms | 1.3837 KOps/s | 1.4196 KOps/s | |
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.7309ms | 0.6898ms | 1.4496 KOps/s | 1.5336 KOps/s | |
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 1.5365ms | 1.4785ms | 676.3477 Ops/s | 686.8483 Ops/s | |
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 0.9569ms | 0.6846ms | 1.4607 KOps/s | 1.4835 KOps/s | |
test_dqn_speed | 4.0529ms | 1.4515ms | 688.9229 Ops/s | 681.5146 Ops/s | |
test_ddpg_speed | 3.2679ms | 2.8022ms | 356.8585 Ops/s | 356.7989 Ops/s | |
test_sac_speed | 8.6458ms | 8.1824ms | 122.2131 Ops/s | 123.1767 Ops/s | |
test_redq_speed | 10.8624ms | 10.1591ms | 98.4337 Ops/s | 98.1957 Ops/s | |
test_redq_deprec_speed | 11.8456ms | 11.3007ms | 88.4902 Ops/s | 90.2089 Ops/s | |
test_td3_speed | 8.3240ms | 8.1816ms | 122.2261 Ops/s | 122.3455 Ops/s | |
test_cql_speed | 25.7121ms | 24.7448ms | 40.4125 Ops/s | 39.4782 Ops/s | |
test_a2c_speed | 5.3978ms | 5.1400ms | 194.5530 Ops/s | 182.7475 Ops/s | |
test_ppo_speed | 5.6780ms | 5.4305ms | 184.1448 Ops/s | 172.5937 Ops/s | |
test_reinforce_speed | 4.4662ms | 4.1536ms | 240.7576 Ops/s | 223.8770 Ops/s | |
test_iql_speed | 99.9122ms | 20.0872ms | 49.7828 Ops/s | 52.7061 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 3.8416ms | 3.7133ms | 269.3011 Ops/s | 273.1119 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.7622ms | 0.5588ms | 1.7897 KOps/s | 1.8074 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 92.9655ms | 0.6011ms | 1.6636 KOps/s | 1.9055 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 3.9339ms | 3.7354ms | 267.7065 Ops/s | 268.6227 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.6768ms | 0.5489ms | 1.8217 KOps/s | 1.8369 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.6644ms | 0.5244ms | 1.9069 KOps/s | 1.9271 KOps/s | |
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.9844ms | 3.8475ms | 259.9084 Ops/s | 262.7046 Ops/s | |
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.8016ms | 0.6855ms | 1.4588 KOps/s | 1.4813 KOps/s | |
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.8700ms | 0.6572ms | 1.5215 KOps/s | 1.5497 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 3.7862ms | 3.7004ms | 270.2437 Ops/s | 271.4084 Ops/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.6862ms | 0.5573ms | 1.7945 KOps/s | 1.8113 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.6794ms | 0.5320ms | 1.8796 KOps/s | 1.8992 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 3.8937ms | 3.7178ms | 268.9767 Ops/s | 269.3558 Ops/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.7071ms | 0.5513ms | 1.8139 KOps/s | 1.8318 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.6558ms | 0.5261ms | 1.9007 KOps/s | 1.6318 KOps/s | |
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.9620ms | 3.8475ms | 259.9061 Ops/s | 263.1344 Ops/s | |
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.8100ms | 0.6866ms | 1.4565 KOps/s | 1.4733 KOps/s | |
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.8064ms | 0.6597ms | 1.5158 KOps/s | 1.5328 KOps/s | |
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.1120s | 11.3552ms | 88.0651 Ops/s | 89.7844 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 18.8925ms | 16.5106ms | 60.5670 Ops/s | 62.3541 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 7.4333ms | 3.1065ms | 321.9091 Ops/s | 333.9906 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.1023s | 9.2850ms | 107.7011 Ops/s | 90.7362 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 19.3573ms | 16.5439ms | 60.4451 Ops/s | 62.4266 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 8.1033ms | 3.1456ms | 317.9022 Ops/s | 334.9338 Ops/s | |
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 0.1014s | 11.4684ms | 87.1963 Ops/s | 88.2951 Ops/s | |
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 19.4856ms | 16.7447ms | 59.7204 Ops/s | 61.3875 Ops/s | |
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 8.8550ms | 3.4436ms | 290.3966 Ops/s | 304.9729 Ops/s |
This is amazing. Thanks so much for this! Can't wait to see the perf gains.
Before merging, we should run a comparison between main and this PR on one of the multi-agent example scripts (e.g., mappo_ippo.py) with parameter sharing disabled, to check that the reward is unchanged and to gather data on the performance difference.
In particular, I am interested in confirming that

    @staticmethod
    def vmap_func_module(module, *args, **kwargs):
        def exec_module(params, *input):
            with params.to_module(module):
                return module(*input)
        return torch.vmap(exec_module, *args, **kwargs)

    output = self.vmap_func_module(
        self._empty_net, (0, self.agent_dim), (-2,)
    )(self.params, inputs)

is faster than

    output = torch.stack(
        [
            net(inputs[..., i, :])
            for i, net in enumerate(self.agent_networks)
        ],
        dim=-2,
    )

when the number of agents is small.
It is faster; we ran multiple benchmarks on this: https://gist.github.com/vmoens/4b6037896a6a0ad347e91877ade354ae
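For readers who want to check that the two code paths above agree numerically before timing them, here is a minimal sketch. It is not the PR's implementation: it uses `torch.func.stack_module_state` and `torch.func.functional_call` in place of the TensorDict `to_module` context manager, and the sizes (`n_agents`, `batch`, feature dimensions) and the `nn.Linear` networks are arbitrary assumptions chosen for illustration.

```python
import copy

import torch
from torch import nn
from torch.func import functional_call, stack_module_state

torch.manual_seed(0)

# Illustrative sizes only (assumptions, not taken from the PR).
n_agents, batch, in_features, out_features = 3, 5, 4, 2

# One independent network per agent (the non-shared-parameter case).
agent_networks = [nn.Linear(in_features, out_features) for _ in range(n_agents)]

# Inputs laid out as [batch, n_agents, features]; the agent dimension is 1 (i.e. -2).
inputs = torch.randn(batch, n_agents, in_features)

# Baseline: call each agent's network on its own slice and stack the results.
looped = torch.stack(
    [net(inputs[..., i, :]) for i, net in enumerate(agent_networks)],
    dim=-2,
)

# vmap path: stack all agents' parameters along a new leading dimension and
# run a single parameter-free copy of the network over that dimension.
params, buffers = stack_module_state(agent_networks)
base = copy.deepcopy(agent_networks[0]).to("meta")


def exec_module(p, b, x):
    # Run `base` with one agent's slice of the stacked parameters.
    return functional_call(base, (p, b), (x,))


# Map over dim 0 of the stacked parameters and dim 1 (agents) of the inputs,
# and put the mapped dimension back at position 1 of the output.
vmapped = torch.vmap(exec_module, in_dims=(0, 0, 1), out_dims=1)(
    params, buffers, inputs
)
```

Both `looped` and `vmapped` have shape `[batch, n_agents, out_features]`. Wrapping each path in a timer then gives a rough speed comparison for a given number of agents.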
This is cool! Thank you so much!
To follow up on your comment here, what do you anticipate adding an RNN will require in addition to this? Is the issue that the hidden state must be initialized first? I don't quite understand what "non initialized lazy params" means or how that creates an issue with RNNs.
    self.params = TensorDict.from_modules(*agent_networks, as_module=True)

    @abc.abstractmethod
    def _build_single_net(self, *, device, **kwargs):
Just for my understanding, any new MultiAgent* will simply need to implement this method with some nn.Module (and the pre_forward_check below), right? I would be interested in helping contribute a MultiAgentGNN.
yep that is the idea!
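To make the extension point concrete, here is a framework-free sketch of the pattern, under loud assumptions: plain Python callables stand in for `nn.Module`, the `device` argument is dropped, and the class names, the `share_params` flag, and the `rng` keyword are hypothetical illustrations rather than the torchrl API.

```python
import abc
import random


class MultiAgentNetSketch(abc.ABC):
    """Toy stand-in for a multi-agent base class (not the torchrl class)."""

    def __init__(self, n_agents, share_params=False, **kwargs):
        self.n_agents = n_agents
        # With shared parameters a single network serves every agent.
        n_nets = 1 if share_params else n_agents
        self.agent_networks = [
            self._build_single_net(**kwargs) for _ in range(n_nets)
        ]

    @abc.abstractmethod
    def _build_single_net(self, **kwargs):
        """Return one per-agent network; subclasses only implement this."""

    def forward(self, inputs):
        # `inputs` holds one feature list per agent.
        if len(self.agent_networks) == 1:
            return [self.agent_networks[0](x) for x in inputs]
        return [net(x) for net, x in zip(self.agent_networks, inputs)]


class MultiAgentScaleSketch(MultiAgentNetSketch):
    """Hypothetical subclass: each 'network' scales its input by one weight."""

    def _build_single_net(self, *, rng, **kwargs):
        weight = rng.uniform(-1.0, 1.0)
        return lambda x: [weight * v for v in x]


net = MultiAgentScaleSketch(n_agents=3, rng=random.Random(0))
outputs = net.forward([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```

The base class owns the construction loop and the per-agent dispatch, so a new variant only has to say how to build one network.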
@kfu02 @matteobettini for context: the problem with uninitialized params is that if you have modules with lazy, non-initialized params, it is usually assumed that you can pass them to an optimizer and the optimizer will know that it must wait until they are initialized before doing anything with them. Currently this isn't supported by
That is cool! I would still run the full multi-agent training script to check those 2 things.
Since lazy modules were not a feature of these classes in the first place, why don't we just leave it that way? I personally would prefer this to a complex solution to support it.
It was a feature of CNN, so I just made it uniform.
@kfu02 wanna give a shot at MA-RNNs with this once it's merged? Or should I draft a PR?
Yes! I will put up a draft by the end of the week!
Further to #1957, I ran some tests with MASAC and non-shared parameters on the same task for 4 agents and the results look good! Almost half the training time!
cc @matteobettini @kfu02
TODO: