[Performance, Refactor, BugFix] Faster loading of uninitialized storages #2221
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2221
Note: Links to docs will display an error until the docs builds have been completed.
❌ 3 New Failures, 2 Unrelated Failures
As of commit 934f48c with merge base 166467a:
NEW FAILURES - The following jobs have failed:
BROKEN TRUNK - The following jobs failed but were present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.

Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_single | 0.1092s | 57.8620ms | 17.2825 Ops/s | 17.8609 Ops/s | |
test_sync | 40.5347ms | 34.6504ms | 28.8597 Ops/s | 32.1993 Ops/s | |
test_async | 58.3820ms | 29.3221ms | 34.1040 Ops/s | 35.4492 Ops/s | |
test_simple | 0.4386s | 0.3810s | 2.6249 Ops/s | 2.6591 Ops/s | |
test_transformed | 0.5818s | 0.5352s | 1.8685 Ops/s | 1.8462 Ops/s | |
test_serial | 1.2906s | 1.2341s | 0.8103 Ops/s | 0.7900 Ops/s | |
test_parallel | 1.1262s | 1.0652s | 0.9388 Ops/s | 0.9392 Ops/s | |
test_step_mdp_speed[True-True-True-True-True] | 74.4860μs | 21.3683μs | 46.7982 KOps/s | 45.0420 KOps/s | |
test_step_mdp_speed[True-True-True-True-False] | 46.2970μs | 13.0328μs | 76.7294 KOps/s | 74.6741 KOps/s | |
test_step_mdp_speed[True-True-True-False-True] | 33.5430μs | 12.7415μs | 78.4836 KOps/s | 78.0207 KOps/s | |
test_step_mdp_speed[True-True-True-False-False] | 46.2170μs | 7.6698μs | 130.3807 KOps/s | 127.8736 KOps/s | |
test_step_mdp_speed[True-True-False-True-True] | 51.3670μs | 22.9853μs | 43.5061 KOps/s | 42.9170 KOps/s | |
test_step_mdp_speed[True-True-False-True-False] | 50.3350μs | 14.2544μs | 70.1539 KOps/s | 67.9458 KOps/s | |
test_step_mdp_speed[True-True-False-False-True] | 42.0690μs | 13.9407μs | 71.7323 KOps/s | 70.8762 KOps/s | |
test_step_mdp_speed[True-True-False-False-False] | 44.2640μs | 8.9339μs | 111.9331 KOps/s | 109.6311 KOps/s | |
test_step_mdp_speed[True-False-True-True-True] | 54.7230μs | 24.3320μs | 41.0981 KOps/s | 40.4271 KOps/s | |
test_step_mdp_speed[True-False-True-True-False] | 53.4900μs | 15.6971μs | 63.7061 KOps/s | 61.6495 KOps/s | |
test_step_mdp_speed[True-False-True-False-True] | 50.9560μs | 14.1277μs | 70.7828 KOps/s | 70.0807 KOps/s | |
test_step_mdp_speed[True-False-True-False-False] | 33.7940μs | 8.9734μs | 111.4408 KOps/s | 109.5249 KOps/s | |
test_step_mdp_speed[True-False-False-True-True] | 60.0630μs | 25.4968μs | 39.2205 KOps/s | 38.6252 KOps/s | |
test_step_mdp_speed[True-False-False-True-False] | 44.7740μs | 16.9516μs | 58.9914 KOps/s | 57.1577 KOps/s | |
test_step_mdp_speed[True-False-False-False-True] | 50.3650μs | 15.1156μs | 66.1567 KOps/s | 65.5584 KOps/s | |
test_step_mdp_speed[True-False-False-False-False] | 33.5730μs | 10.0547μs | 99.4557 KOps/s | 95.7110 KOps/s | |
test_step_mdp_speed[False-True-True-True-True] | 60.4630μs | 24.2469μs | 41.2423 KOps/s | 40.6127 KOps/s | |
test_step_mdp_speed[False-True-True-True-False] | 58.5790μs | 15.5458μs | 64.3260 KOps/s | 61.9441 KOps/s | |
test_step_mdp_speed[False-True-True-False-True] | 42.6800μs | 16.2754μs | 61.4425 KOps/s | 60.8697 KOps/s | |
test_step_mdp_speed[False-True-True-False-False] | 45.6050μs | 10.0666μs | 99.3385 KOps/s | 95.9511 KOps/s | |
test_step_mdp_speed[False-True-False-True-True] | 57.1410μs | 25.2579μs | 39.5915 KOps/s | 38.5319 KOps/s | |
test_step_mdp_speed[False-True-False-True-False] | 41.8790μs | 16.8572μs | 59.3219 KOps/s | 57.5940 KOps/s | |
test_step_mdp_speed[False-True-False-False-True] | 51.5070μs | 17.2917μs | 57.8312 KOps/s | 56.4563 KOps/s | |
test_step_mdp_speed[False-True-False-False-False] | 68.8250μs | 11.2917μs | 88.5604 KOps/s | 85.3813 KOps/s | |
test_step_mdp_speed[False-False-True-True-True] | 65.3530μs | 26.9352μs | 37.1261 KOps/s | 36.9008 KOps/s | |
test_step_mdp_speed[False-False-True-True-False] | 41.8580μs | 18.1241μs | 55.1753 KOps/s | 53.4923 KOps/s | |
test_step_mdp_speed[False-False-True-False-True] | 56.7600μs | 17.4686μs | 57.2455 KOps/s | 56.5296 KOps/s | |
test_step_mdp_speed[False-False-True-False-False] | 34.5250μs | 11.3576μs | 88.0468 KOps/s | 85.9283 KOps/s | |
test_step_mdp_speed[False-False-False-True-True] | 42.0190μs | 28.3084μs | 35.3251 KOps/s | 26.8601 KOps/s | |
test_step_mdp_speed[False-False-False-True-False] | 58.3600μs | 19.3066μs | 51.7958 KOps/s | 50.7974 KOps/s | |
test_step_mdp_speed[False-False-False-False-True] | 53.2900μs | 18.2481μs | 54.8002 KOps/s | 53.2239 KOps/s | |
test_step_mdp_speed[False-False-False-False-False] | 50.8250μs | 12.5588μs | 79.6253 KOps/s | 77.8421 KOps/s | |
test_values[generalized_advantage_estimate-True-True] | 12.0578ms | 9.6984ms | 103.1097 Ops/s | 106.5470 Ops/s | |
test_values[vec_generalized_advantage_estimate-True-True] | 37.2958ms | 33.5827ms | 29.7772 Ops/s | 28.2439 Ops/s | |
test_values[td0_return_estimate-False-False] | 0.2204ms | 0.1691ms | 5.9127 KOps/s | 5.5624 KOps/s | |
test_values[td1_return_estimate-False-False] | 24.4367ms | 23.8635ms | 41.9050 Ops/s | 42.1385 Ops/s | |
test_values[vec_td1_return_estimate-False-False] | 34.2890ms | 33.5054ms | 29.8460 Ops/s | 28.1775 Ops/s | |
test_values[td_lambda_return_estimate-True-False] | 37.1753ms | 34.0718ms | 29.3497 Ops/s | 29.1320 Ops/s | |
test_values[vec_td_lambda_return_estimate-True-False] | 34.3717ms | 33.5269ms | 29.8268 Ops/s | 28.1350 Ops/s | |
test_gae_speed[generalized_advantage_estimate-False-1-512] | 10.8436ms | 8.5277ms | 117.2655 Ops/s | 120.3497 Ops/s | |
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 2.1265ms | 1.8685ms | 535.1877 Ops/s | 515.9436 Ops/s | |
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.4341ms | 0.3516ms | 2.8445 KOps/s | 2.8863 KOps/s | |
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 45.2625ms | 44.2174ms | 22.6155 Ops/s | 21.6929 Ops/s | |
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 3.5937ms | 3.0399ms | 328.9621 Ops/s | 330.8133 Ops/s | |
test_dqn_speed | 1.8057ms | 1.3137ms | 761.2101 Ops/s | 739.7779 Ops/s | |
test_ddpg_speed | 3.0697ms | 2.8032ms | 356.7342 Ops/s | 348.4421 Ops/s | |
test_sac_speed | 9.5437ms | 8.3002ms | 120.4794 Ops/s | 115.6984 Ops/s | |
test_redq_speed | 13.8142ms | 13.2296ms | 75.5881 Ops/s | 76.1562 Ops/s | |
test_redq_deprec_speed | 15.4172ms | 13.3708ms | 74.7896 Ops/s | 74.7745 Ops/s | |
test_td3_speed | 8.4256ms | 8.2086ms | 121.8233 Ops/s | 117.4037 Ops/s | |
test_cql_speed | 37.7098ms | 36.4579ms | 27.4289 Ops/s | 27.3792 Ops/s | |
test_a2c_speed | 8.1285ms | 7.4553ms | 134.1320 Ops/s | 134.3177 Ops/s | |
test_ppo_speed | 9.1172ms | 7.7171ms | 129.5831 Ops/s | 129.9359 Ops/s | |
test_reinforce_speed | 7.3752ms | 6.6290ms | 150.8530 Ops/s | 150.5917 Ops/s | |
test_iql_speed | 33.7329ms | 32.6757ms | 30.6038 Ops/s | 30.5080 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 3.8562ms | 3.5203ms | 284.0637 Ops/s | 291.5949 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.9525ms | 0.4949ms | 2.0207 KOps/s | 1.9187 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.7620ms | 0.4719ms | 2.1190 KOps/s | 2.1129 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 3.8689ms | 3.4414ms | 290.5762 Ops/s | 296.3669 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.1808ms | 0.4894ms | 2.0435 KOps/s | 2.0260 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.8313ms | 0.4656ms | 2.1479 KOps/s | 2.1340 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.8562ms | 1.6835ms | 594.0028 Ops/s | 588.5154 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 5.8767ms | 1.6752ms | 596.9592 Ops/s | 624.7120 Ops/s | |
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 4.0163ms | 3.6317ms | 275.3516 Ops/s | 283.5997 Ops/s | |
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.8957ms | 0.6077ms | 1.6457 KOps/s | 1.4537 KOps/s | |
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.9469ms | 0.6121ms | 1.6337 KOps/s | 1.7038 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 3.6729ms | 3.5188ms | 284.1882 Ops/s | 295.7819 Ops/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.6241ms | 0.4963ms | 2.0150 KOps/s | 1.9907 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 1.2025ms | 0.4803ms | 2.0822 KOps/s | 2.0988 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 3.9169ms | 3.5734ms | 279.8423 Ops/s | 298.6102 Ops/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.0338ms | 0.4867ms | 2.0547 KOps/s | 2.0391 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.7904ms | 0.4689ms | 2.1325 KOps/s | 2.1014 KOps/s | |
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.9345ms | 3.6907ms | 270.9542 Ops/s | 286.2378 Ops/s | |
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.9495ms | 0.6151ms | 1.6257 KOps/s | 1.6228 KOps/s | |
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 3.9971ms | 0.5894ms | 1.6966 KOps/s | 1.6907 KOps/s | |
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.1001s | 7.8698ms | 127.0684 Ops/s | 133.7878 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 14.0244ms | 12.0853ms | 82.7454 Ops/s | 78.5992 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 1.5295ms | 1.0381ms | 963.3048 Ops/s | 952.2411 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 92.6896ms | 7.2269ms | 138.3710 Ops/s | 182.9572 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 14.7396ms | 12.1825ms | 82.0852 Ops/s | 79.5937 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 1.5912ms | 1.0422ms | 959.5006 Ops/s | 901.9687 Ops/s | |
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 91.4655ms | 5.5483ms | 180.2363 Ops/s | 139.5870 Ops/s | |
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 14.5767ms | 12.3381ms | 81.0495 Ops/s | 78.7748 Ops/s | |
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 1.7043ms | 1.1932ms | 838.0811 Ops/s | 777.5394 Ops/s |

Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_single | 0.1247s | 0.1219s | 8.2066 Ops/s | 7.9619 Ops/s | |
test_sync | 99.4821ms | 97.5342ms | 10.2528 Ops/s | 9.7877 Ops/s | |
test_async | 0.2012s | 0.1015s | 9.8526 Ops/s | 12.2151 Ops/s | |
test_single_pixels | 0.1322s | 0.1301s | 7.6857 Ops/s | 7.6848 Ops/s | |
test_sync_pixels | 84.5871ms | 81.4647ms | 12.2753 Ops/s | 12.2654 Ops/s | |
test_async_pixels | 0.1534s | 69.6501ms | 14.3575 Ops/s | 14.4305 Ops/s | |
test_simple | 0.8987s | 0.8377s | 1.1938 Ops/s | 1.2080 Ops/s | |
test_transformed | 1.1691s | 1.1078s | 0.9027 Ops/s | 0.9246 Ops/s | |
test_serial | 2.6078s | 2.5460s | 0.3928 Ops/s | 0.3906 Ops/s | |
test_parallel | 2.4279s | 2.3704s | 0.4219 Ops/s | 0.4216 Ops/s | |
test_step_mdp_speed[True-True-True-True-True] | 0.1042ms | 34.3438μs | 29.1173 KOps/s | 30.3251 KOps/s | |
test_step_mdp_speed[True-True-True-True-False] | 47.0310μs | 20.0846μs | 49.7895 KOps/s | 50.6616 KOps/s | |
test_step_mdp_speed[True-True-True-False-True] | 46.8110μs | 19.7952μs | 50.5174 KOps/s | 53.2348 KOps/s | |
test_step_mdp_speed[True-True-True-False-False] | 33.1100μs | 11.3293μs | 88.2670 KOps/s | 88.5778 KOps/s | |
test_step_mdp_speed[True-True-False-True-True] | 53.3810μs | 35.8582μs | 27.8877 KOps/s | 28.6471 KOps/s | |
test_step_mdp_speed[True-True-False-True-False] | 92.6510μs | 21.9436μs | 45.5715 KOps/s | 46.4418 KOps/s | |
test_step_mdp_speed[True-True-False-False-True] | 47.3600μs | 21.5297μs | 46.4474 KOps/s | 47.6835 KOps/s | |
test_step_mdp_speed[True-True-False-False-False] | 31.6910μs | 13.4313μs | 74.4530 KOps/s | 76.5390 KOps/s | |
test_step_mdp_speed[True-False-True-True-True] | 62.5310μs | 37.8340μs | 26.4312 KOps/s | 27.3187 KOps/s | |
test_step_mdp_speed[True-False-True-True-False] | 45.6120μs | 23.8456μs | 41.9364 KOps/s | 42.8202 KOps/s | |
test_step_mdp_speed[True-False-True-False-True] | 47.6710μs | 21.4381μs | 46.6459 KOps/s | 47.7618 KOps/s | |
test_step_mdp_speed[True-False-True-False-False] | 32.7000μs | 13.3613μs | 74.8430 KOps/s | 76.1968 KOps/s | |
test_step_mdp_speed[True-False-False-True-True] | 76.3720μs | 39.0205μs | 25.6275 KOps/s | 25.9224 KOps/s | |
test_step_mdp_speed[True-False-False-True-False] | 52.6110μs | 25.6643μs | 38.9646 KOps/s | 39.5361 KOps/s | |
test_step_mdp_speed[True-False-False-False-True] | 97.6830μs | 22.9926μs | 43.4922 KOps/s | 43.9993 KOps/s | |
test_step_mdp_speed[True-False-False-False-False] | 38.2310μs | 15.2051μs | 65.7674 KOps/s | 66.6305 KOps/s | |
test_step_mdp_speed[False-True-True-True-True] | 57.0110μs | 37.0565μs | 26.9858 KOps/s | 26.9680 KOps/s | |
test_step_mdp_speed[False-True-True-True-False] | 47.0620μs | 23.5977μs | 42.3771 KOps/s | 42.3610 KOps/s | |
test_step_mdp_speed[False-True-True-False-True] | 39.8000μs | 25.5388μs | 39.1561 KOps/s | 39.9572 KOps/s | |
test_step_mdp_speed[False-True-True-False-False] | 31.5210μs | 15.0657μs | 66.3758 KOps/s | 66.9541 KOps/s | |
test_step_mdp_speed[False-True-False-True-True] | 78.0610μs | 39.2871μs | 25.4536 KOps/s | 25.9605 KOps/s | |
test_step_mdp_speed[False-True-False-True-False] | 48.7110μs | 25.4869μs | 39.2359 KOps/s | 39.6686 KOps/s | |
test_step_mdp_speed[False-True-False-False-True] | 51.7100μs | 27.2187μs | 36.7395 KOps/s | 37.2501 KOps/s | |
test_step_mdp_speed[False-True-False-False-False] | 40.5410μs | 16.8886μs | 59.2114 KOps/s | 59.0581 KOps/s | |
test_step_mdp_speed[False-False-True-True-True] | 59.4200μs | 41.1592μs | 24.2959 KOps/s | 24.8247 KOps/s | |
test_step_mdp_speed[False-False-True-True-False] | 51.2710μs | 27.5354μs | 36.3169 KOps/s | 36.5849 KOps/s | |
test_step_mdp_speed[False-False-True-False-True] | 55.7510μs | 28.0172μs | 35.6924 KOps/s | 36.6941 KOps/s | |
test_step_mdp_speed[False-False-True-False-False] | 37.3200μs | 17.1487μs | 58.3136 KOps/s | 58.8771 KOps/s | |
test_step_mdp_speed[False-False-False-True-True] | 58.1020μs | 43.7889μs | 22.8368 KOps/s | 23.0226 KOps/s | |
test_step_mdp_speed[False-False-False-True-False] | 54.7920μs | 29.4428μs | 33.9641 KOps/s | 33.6025 KOps/s | |
test_step_mdp_speed[False-False-False-False-True] | 46.8800μs | 28.9329μs | 34.5628 KOps/s | 34.6461 KOps/s | |
test_step_mdp_speed[False-False-False-False-False] | 40.2610μs | 18.7618μs | 53.2999 KOps/s | 53.1594 KOps/s | |
test_values[generalized_advantage_estimate-True-True] | 27.2343ms | 26.0084ms | 38.4492 Ops/s | 37.9881 Ops/s | |
test_values[vec_generalized_advantage_estimate-True-True] | 89.5652ms | 2.7031ms | 369.9388 Ops/s | 375.2375 Ops/s | |
test_values[td0_return_estimate-False-False] | 90.4710μs | 68.4001μs | 14.6199 KOps/s | 14.7702 KOps/s | |
test_values[td1_return_estimate-False-False] | 60.8827ms | 57.5631ms | 17.3722 Ops/s | 16.7169 Ops/s | |
test_values[vec_td1_return_estimate-False-False] | 1.3054ms | 1.0998ms | 909.2529 Ops/s | 903.7523 Ops/s | |
test_values[td_lambda_return_estimate-True-False] | 97.1568ms | 91.9891ms | 10.8709 Ops/s | 11.0462 Ops/s | |
test_values[vec_td_lambda_return_estimate-True-False] | 1.2597ms | 1.0977ms | 910.9856 Ops/s | 909.0890 Ops/s | |
test_gae_speed[generalized_advantage_estimate-False-1-512] | 25.8896ms | 25.6867ms | 38.9307 Ops/s | 38.4613 Ops/s | |
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 0.9914ms | 0.7398ms | 1.3517 KOps/s | 1.3459 KOps/s | |
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.7577ms | 0.6830ms | 1.4642 KOps/s | 1.4594 KOps/s | |
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 1.5657ms | 1.4914ms | 670.4977 Ops/s | 672.5191 Ops/s | |
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 0.7710ms | 0.7337ms | 1.3629 KOps/s | 1.4300 KOps/s | |
test_dqn_speed | 1.8492ms | 1.5069ms | 663.6203 Ops/s | 675.0348 Ops/s | |
test_ddpg_speed | 3.1779ms | 3.0525ms | 327.6003 Ops/s | 328.4542 Ops/s | |
test_sac_speed | 9.0255ms | 8.7509ms | 114.2737 Ops/s | 115.8512 Ops/s | |
test_redq_speed | 12.4803ms | 10.8554ms | 92.1204 Ops/s | 92.1711 Ops/s | |
test_redq_deprec_speed | 12.4117ms | 11.6500ms | 85.8367 Ops/s | 81.8555 Ops/s | |
test_td3_speed | 8.8584ms | 8.6303ms | 115.8710 Ops/s | 116.3523 Ops/s | |
test_cql_speed | 27.8610ms | 26.3357ms | 37.9713 Ops/s | 38.0951 Ops/s | |
test_a2c_speed | 6.1794ms | 5.6258ms | 177.7529 Ops/s | 172.2162 Ops/s | |
test_ppo_speed | 6.5330ms | 5.9797ms | 167.2322 Ops/s | 162.9063 Ops/s | |
test_reinforce_speed | 5.3603ms | 4.6012ms | 217.3324 Ops/s | 208.2109 Ops/s | |
test_iql_speed | 20.5949ms | 19.9497ms | 50.1260 Ops/s | 49.4915 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 5.1486ms | 4.8638ms | 205.5987 Ops/s | 204.6035 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.7258ms | 0.5987ms | 1.6704 KOps/s | 1.6703 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 4.4106ms | 0.5812ms | 1.7206 KOps/s | 1.7465 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 5.0072ms | 4.8062ms | 208.0634 Ops/s | 205.5567 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.7070ms | 0.5909ms | 1.6923 KOps/s | 1.6770 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 4.4329ms | 0.5703ms | 1.7535 KOps/s | 1.7575 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 2.3754ms | 2.1413ms | 467.0046 Ops/s | 466.2946 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 5.8159ms | 2.0457ms | 488.8358 Ops/s | 491.7983 Ops/s | |
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 5.1108ms | 4.9754ms | 200.9893 Ops/s | 199.6342 Ops/s | |
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.4664ms | 0.7283ms | 1.3731 KOps/s | 1.3762 KOps/s | |
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.8874ms | 0.7048ms | 1.4188 KOps/s | 1.4154 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 5.0340ms | 4.8781ms | 204.9971 Ops/s | 204.7488 Ops/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 1.3017ms | 0.6022ms | 1.6605 KOps/s | 1.6660 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.6878ms | 0.5762ms | 1.7354 KOps/s | 1.7238 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 5.0378ms | 4.8468ms | 206.3227 Ops/s | 205.3881 Ops/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.7085ms | 0.5926ms | 1.6874 KOps/s | 1.6753 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.7397ms | 0.5691ms | 1.7571 KOps/s | 1.7436 KOps/s | |
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 5.0843ms | 4.9830ms | 200.6806 Ops/s | 199.8634 Ops/s | |
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.4396ms | 0.7280ms | 1.3737 KOps/s | 1.3687 KOps/s | |
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.8708ms | 0.7048ms | 1.4188 KOps/s | 1.4139 KOps/s | |
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.1132s | 9.2914ms | 107.6263 Ops/s | 106.2519 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 20.9692ms | 17.0166ms | 58.7663 Ops/s | 59.0157 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 2.3498ms | 1.3488ms | 741.4260 Ops/s | 736.6960 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.1051s | 7.1471ms | 139.9167 Ops/s | 139.4236 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 19.4005ms | 16.8990ms | 59.1751 Ops/s | 59.7428 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 6.8648ms | 1.4606ms | 684.6658 Ops/s | 734.3014 Ops/s | |
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 0.1055s | 9.3444ms | 107.0160 Ops/s | 106.4787 Ops/s | |
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 19.3551ms | 16.7573ms | 59.6756 Ops/s | 57.9838 Ops/s | |
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 2.5454ms | 1.5052ms | 664.3746 Ops/s | 661.6923 Ops/s |
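
The Change column is empty in this capture of the tables. As a rough sketch of how a row could be compared against Repo HEAD, the snippet below parses the two throughput cells and computes a relative difference. The definition `(ops - ops_on_head) / ops_on_head` is an assumption, not necessarily what the benchmark bot reports, and the helper names `parse_ops` and `relative_change` are hypothetical.

```python
# Scale factors for the throughput units that appear in the tables.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3}


def parse_ops(cell):
    """Convert a cell such as '46.7982 KOps/s' to operations per second."""
    value, unit = cell.split()
    return float(value) * UNIT_SCALE[unit]


def relative_change(ops, ops_on_head):
    """Assumed definition: positive means the PR is faster than Repo HEAD."""
    return (ops - ops_on_head) / ops_on_head


# A row copied from the first table.
row = "test_single | 0.1092s | 57.8620ms | 17.2825 Ops/s | 17.8609 Ops/s | |"
cells = [cell.strip() for cell in row.split("|")]
name = cells[0]
change = relative_change(parse_ops(cells[3]), parse_ops(cells[4]))
print(f"{name}: {change:+.2%}")
```

Differences of a few percent in either direction, as in this row, may well be within run-to-run noise for benchmarks of this kind.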
Awesome <3
cc @teopir
cc @shagunsodhani this is a good example of preallocation with tensordict. We were using a lot of lazy stacks and stacking at the last minute. Using a preallocated TD instead (create an empty TD -> take a bunch of views of that TD -> write to the first view, and all the views get instantiated at once) made the whole thing 20-1000x faster!
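
For readers who have not seen the pattern, here is a minimal sketch of the idea in plain PyTorch. It is not the code from this PR: ordinary tensors stand in for the TensorDict, so the lazy instantiation of entries on first write is not shown, and the function names `collect_by_stacking` and `collect_preallocated` are made up for illustration.

```python
import torch


def collect_by_stacking(num_steps, obs_dim):
    """Baseline: build one tensor per step and stack them at the last minute."""
    steps = [torch.full((obs_dim,), float(t)) for t in range(num_steps)]
    # torch.stack allocates a new tensor and copies every step into it.
    return torch.stack(steps)


def collect_preallocated(num_steps, obs_dim):
    """Preallocate the buffer once, then write each step through a view."""
    buffer = torch.empty(num_steps, obs_dim)
    # unbind returns one view per step; each shares memory with `buffer`.
    views = buffer.unbind(0)
    for t, view in enumerate(views):
        view.fill_(float(t))  # in-place write lands directly in the buffer
    return buffer, views


stacked = collect_by_stacking(4, 3)
buffer, views = collect_preallocated(4, 3)

# Both approaches produce the same data; only the second avoids the final copy.
print(torch.equal(stacked, buffer))

# Writing through a view is visible in the buffer, with no stacking step.
views[1][0] = 42.0
print(buffer[1, 0].item())
```

How much this saves depends on the number and size of the entries; the sketch only shows the mechanism, not the speedups quoted above.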