[Performance, Refactor, BugFix] Faster loading of uninitialized storages #2221

Merged (5 commits) on Jun 11, 2024

vmoens (Contributor) commented Jun 10, 2024

cc @teopir

cc @shagunsodhani this is a good example of preallocation with TensorDict. We were using a lot of lazy stacks and stacking at the last minute. Using a preallocated TensorDict instead (create an empty TensorDict, take a set of views of it, then write to the first view so that all views are instantiated at once) made the whole thing 20x to 1000x faster!
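To make the contrast in this comment concrete, here is a minimal sketch of "stack at the last minute" versus "preallocate and write through views". It is not the code from this PR: it uses plain `torch` tensors instead of TensorDict, and the function names `stack_at_the_end` and `preallocate_and_fill` are hypothetical names chosen for illustration.

```python
import torch


def stack_at_the_end(items):
    """Baseline: keep separate tensors and stack them in one late step."""
    return torch.stack(items, dim=0)


def preallocate_and_fill(items):
    """Allocate the full buffer once, then write each item through a view."""
    first = items[0]
    buffer = torch.empty((len(items), *first.shape), dtype=first.dtype)
    views = buffer.unbind(0)  # each view shares memory with `buffer`
    for view, item in zip(views, items):
        view.copy_(item)  # the in-place write lands directly in `buffer`
    return buffer


# Hypothetical toy data: four items of shape (3,).
items = [torch.full((3,), float(i)) for i in range(4)]
stacked = stack_at_the_end(items)
filled = preallocate_and_fill(items)
```

Both paths produce the same values; the difference is that the preallocated buffer is allocated once and never needs a final stacking copy. The sketch makes no claim about the size of the speedup, which depends on the storage and data involved.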

pytorch-bot bot commented Jun 10, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2221

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 2 Unrelated Failures

As of commit 934f48c with merge base 166467a:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the `CLA Signed` label Jun 10, 2024
vmoens added the `enhancement` and `performance` labels Jun 10, 2024

github-actions bot commented Jun 10, 2024

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 91. Improved: 11. Worsened: 5.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_single | 0.1092s | 57.8620ms | 17.2825 Ops/s | 17.8609 Ops/s | -3.24% |
| test_sync | 40.5347ms | 34.6504ms | 28.8597 Ops/s | 32.1993 Ops/s | **-10.37%** |
| test_async | 58.3820ms | 29.3221ms | 34.1040 Ops/s | 35.4492 Ops/s | -3.79% |
| test_simple | 0.4386s | 0.3810s | 2.6249 Ops/s | 2.6591 Ops/s | -1.29% |
| test_transformed | 0.5818s | 0.5352s | 1.8685 Ops/s | 1.8462 Ops/s | +1.21% |
| test_serial | 1.2906s | 1.2341s | 0.8103 Ops/s | 0.7900 Ops/s | +2.57% |
| test_parallel | 1.1262s | 1.0652s | 0.9388 Ops/s | 0.9392 Ops/s | -0.04% |
| test_step_mdp_speed[True-True-True-True-True] | 74.4860μs | 21.3683μs | 46.7982 KOps/s | 45.0420 KOps/s | +3.90% |
| test_step_mdp_speed[True-True-True-True-False] | 46.2970μs | 13.0328μs | 76.7294 KOps/s | 74.6741 KOps/s | +2.75% |
| test_step_mdp_speed[True-True-True-False-True] | 33.5430μs | 12.7415μs | 78.4836 KOps/s | 78.0207 KOps/s | +0.59% |
| test_step_mdp_speed[True-True-True-False-False] | 46.2170μs | 7.6698μs | 130.3807 KOps/s | 127.8736 KOps/s | +1.96% |
| test_step_mdp_speed[True-True-False-True-True] | 51.3670μs | 22.9853μs | 43.5061 KOps/s | 42.9170 KOps/s | +1.37% |
| test_step_mdp_speed[True-True-False-True-False] | 50.3350μs | 14.2544μs | 70.1539 KOps/s | 67.9458 KOps/s | +3.25% |
| test_step_mdp_speed[True-True-False-False-True] | 42.0690μs | 13.9407μs | 71.7323 KOps/s | 70.8762 KOps/s | +1.21% |
| test_step_mdp_speed[True-True-False-False-False] | 44.2640μs | 8.9339μs | 111.9331 KOps/s | 109.6311 KOps/s | +2.10% |
| test_step_mdp_speed[True-False-True-True-True] | 54.7230μs | 24.3320μs | 41.0981 KOps/s | 40.4271 KOps/s | +1.66% |
| test_step_mdp_speed[True-False-True-True-False] | 53.4900μs | 15.6971μs | 63.7061 KOps/s | 61.6495 KOps/s | +3.34% |
| test_step_mdp_speed[True-False-True-False-True] | 50.9560μs | 14.1277μs | 70.7828 KOps/s | 70.0807 KOps/s | +1.00% |
| test_step_mdp_speed[True-False-True-False-False] | 33.7940μs | 8.9734μs | 111.4408 KOps/s | 109.5249 KOps/s | +1.75% |
| test_step_mdp_speed[True-False-False-True-True] | 60.0630μs | 25.4968μs | 39.2205 KOps/s | 38.6252 KOps/s | +1.54% |
| test_step_mdp_speed[True-False-False-True-False] | 44.7740μs | 16.9516μs | 58.9914 KOps/s | 57.1577 KOps/s | +3.21% |
| test_step_mdp_speed[True-False-False-False-True] | 50.3650μs | 15.1156μs | 66.1567 KOps/s | 65.5584 KOps/s | +0.91% |
| test_step_mdp_speed[True-False-False-False-False] | 33.5730μs | 10.0547μs | 99.4557 KOps/s | 95.7110 KOps/s | +3.91% |
| test_step_mdp_speed[False-True-True-True-True] | 60.4630μs | 24.2469μs | 41.2423 KOps/s | 40.6127 KOps/s | +1.55% |
| test_step_mdp_speed[False-True-True-True-False] | 58.5790μs | 15.5458μs | 64.3260 KOps/s | 61.9441 KOps/s | +3.85% |
| test_step_mdp_speed[False-True-True-False-True] | 42.6800μs | 16.2754μs | 61.4425 KOps/s | 60.8697 KOps/s | +0.94% |
| test_step_mdp_speed[False-True-True-False-False] | 45.6050μs | 10.0666μs | 99.3385 KOps/s | 95.9511 KOps/s | +3.53% |
| test_step_mdp_speed[False-True-False-True-True] | 57.1410μs | 25.2579μs | 39.5915 KOps/s | 38.5319 KOps/s | +2.75% |
| test_step_mdp_speed[False-True-False-True-False] | 41.8790μs | 16.8572μs | 59.3219 KOps/s | 57.5940 KOps/s | +3.00% |
| test_step_mdp_speed[False-True-False-False-True] | 51.5070μs | 17.2917μs | 57.8312 KOps/s | 56.4563 KOps/s | +2.44% |
| test_step_mdp_speed[False-True-False-False-False] | 68.8250μs | 11.2917μs | 88.5604 KOps/s | 85.3813 KOps/s | +3.72% |
| test_step_mdp_speed[False-False-True-True-True] | 65.3530μs | 26.9352μs | 37.1261 KOps/s | 36.9008 KOps/s | +0.61% |
| test_step_mdp_speed[False-False-True-True-False] | 41.8580μs | 18.1241μs | 55.1753 KOps/s | 53.4923 KOps/s | +3.15% |
| test_step_mdp_speed[False-False-True-False-True] | 56.7600μs | 17.4686μs | 57.2455 KOps/s | 56.5296 KOps/s | +1.27% |
| test_step_mdp_speed[False-False-True-False-False] | 34.5250μs | 11.3576μs | 88.0468 KOps/s | 85.9283 KOps/s | +2.47% |
| test_step_mdp_speed[False-False-False-True-True] | 42.0190μs | 28.3084μs | 35.3251 KOps/s | 26.8601 KOps/s | **+31.52%** |
| test_step_mdp_speed[False-False-False-True-False] | 58.3600μs | 19.3066μs | 51.7958 KOps/s | 50.7974 KOps/s | +1.97% |
| test_step_mdp_speed[False-False-False-False-True] | 53.2900μs | 18.2481μs | 54.8002 KOps/s | 53.2239 KOps/s | +2.96% |
| test_step_mdp_speed[False-False-False-False-False] | 50.8250μs | 12.5588μs | 79.6253 KOps/s | 77.8421 KOps/s | +2.29% |
| test_values[generalized_advantage_estimate-True-True] | 12.0578ms | 9.6984ms | 103.1097 Ops/s | 106.5470 Ops/s | -3.23% |
| test_values[vec_generalized_advantage_estimate-True-True] | 37.2958ms | 33.5827ms | 29.7772 Ops/s | 28.2439 Ops/s | **+5.43%** |
| test_values[td0_return_estimate-False-False] | 0.2204ms | 0.1691ms | 5.9127 KOps/s | 5.5624 KOps/s | **+6.30%** |
| test_values[td1_return_estimate-False-False] | 24.4367ms | 23.8635ms | 41.9050 Ops/s | 42.1385 Ops/s | -0.55% |
| test_values[vec_td1_return_estimate-False-False] | 34.2890ms | 33.5054ms | 29.8460 Ops/s | 28.1775 Ops/s | **+5.92%** |
| test_values[td_lambda_return_estimate-True-False] | 37.1753ms | 34.0718ms | 29.3497 Ops/s | 29.1320 Ops/s | +0.75% |
| test_values[vec_td_lambda_return_estimate-True-False] | 34.3717ms | 33.5269ms | 29.8268 Ops/s | 28.1350 Ops/s | **+6.01%** |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 10.8436ms | 8.5277ms | 117.2655 Ops/s | 120.3497 Ops/s | -2.56% |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 2.1265ms | 1.8685ms | 535.1877 Ops/s | 515.9436 Ops/s | +3.73% |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.4341ms | 0.3516ms | 2.8445 KOps/s | 2.8863 KOps/s | -1.45% |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 45.2625ms | 44.2174ms | 22.6155 Ops/s | 21.6929 Ops/s | +4.25% |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 3.5937ms | 3.0399ms | 328.9621 Ops/s | 330.8133 Ops/s | -0.56% |
| test_dqn_speed | 1.8057ms | 1.3137ms | 761.2101 Ops/s | 739.7779 Ops/s | +2.90% |
| test_ddpg_speed | 3.0697ms | 2.8032ms | 356.7342 Ops/s | 348.4421 Ops/s | +2.38% |
| test_sac_speed | 9.5437ms | 8.3002ms | 120.4794 Ops/s | 115.6984 Ops/s | +4.13% |
| test_redq_speed | 13.8142ms | 13.2296ms | 75.5881 Ops/s | 76.1562 Ops/s | -0.75% |
| test_redq_deprec_speed | 15.4172ms | 13.3708ms | 74.7896 Ops/s | 74.7745 Ops/s | +0.02% |
| test_td3_speed | 8.4256ms | 8.2086ms | 121.8233 Ops/s | 117.4037 Ops/s | +3.76% |
| test_cql_speed | 37.7098ms | 36.4579ms | 27.4289 Ops/s | 27.3792 Ops/s | +0.18% |
| test_a2c_speed | 8.1285ms | 7.4553ms | 134.1320 Ops/s | 134.3177 Ops/s | -0.14% |
| test_ppo_speed | 9.1172ms | 7.7171ms | 129.5831 Ops/s | 129.9359 Ops/s | -0.27% |
| test_reinforce_speed | 7.3752ms | 6.6290ms | 150.8530 Ops/s | 150.5917 Ops/s | +0.17% |
| test_iql_speed | 33.7329ms | 32.6757ms | 30.6038 Ops/s | 30.5080 Ops/s | +0.31% |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 3.8562ms | 3.5203ms | 284.0637 Ops/s | 291.5949 Ops/s | -2.58% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.9525ms | 0.4949ms | 2.0207 KOps/s | 1.9187 KOps/s | **+5.32%** |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.7620ms | 0.4719ms | 2.1190 KOps/s | 2.1129 KOps/s | +0.29% |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 3.8689ms | 3.4414ms | 290.5762 Ops/s | 296.3669 Ops/s | -1.95% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.1808ms | 0.4894ms | 2.0435 KOps/s | 2.0260 KOps/s | +0.86% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.8313ms | 0.4656ms | 2.1479 KOps/s | 2.1340 KOps/s | +0.65% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.8562ms | 1.6835ms | 594.0028 Ops/s | 588.5154 Ops/s | +0.93% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 5.8767ms | 1.6752ms | 596.9592 Ops/s | 624.7120 Ops/s | -4.44% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 4.0163ms | 3.6317ms | 275.3516 Ops/s | 283.5997 Ops/s | -2.91% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.8957ms | 0.6077ms | 1.6457 KOps/s | 1.4537 KOps/s | **+13.21%** |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.9469ms | 0.6121ms | 1.6337 KOps/s | 1.7038 KOps/s | -4.12% |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 3.6729ms | 3.5188ms | 284.1882 Ops/s | 295.7819 Ops/s | -3.92% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.6241ms | 0.4963ms | 2.0150 KOps/s | 1.9907 KOps/s | +1.22% |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 1.2025ms | 0.4803ms | 2.0822 KOps/s | 2.0988 KOps/s | -0.79% |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 3.9169ms | 3.5734ms | 279.8423 Ops/s | 298.6102 Ops/s | **-6.29%** |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.0338ms | 0.4867ms | 2.0547 KOps/s | 2.0391 KOps/s | +0.77% |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.7904ms | 0.4689ms | 2.1325 KOps/s | 2.1014 KOps/s | +1.48% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.9345ms | 3.6907ms | 270.9542 Ops/s | 286.2378 Ops/s | **-5.34%** |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.9495ms | 0.6151ms | 1.6257 KOps/s | 1.6228 KOps/s | +0.18% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 3.9971ms | 0.5894ms | 1.6966 KOps/s | 1.6907 KOps/s | +0.35% |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.1001s | 7.8698ms | 127.0684 Ops/s | 133.7878 Ops/s | **-5.02%** |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 14.0244ms | 12.0853ms | 82.7454 Ops/s | 78.5992 Ops/s | **+5.28%** |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 1.5295ms | 1.0381ms | 963.3048 Ops/s | 952.2411 Ops/s | +1.16% |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 92.6896ms | 7.2269ms | 138.3710 Ops/s | 182.9572 Ops/s | **-24.37%** |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 14.7396ms | 12.1825ms | 82.0852 Ops/s | 79.5937 Ops/s | +3.13% |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 1.5912ms | 1.0422ms | 959.5006 Ops/s | 901.9687 Ops/s | **+6.38%** |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 91.4655ms | 5.5483ms | 180.2363 Ops/s | 139.5870 Ops/s | **+29.12%** |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 14.5767ms | 12.3381ms | 81.0495 Ops/s | 78.7748 Ops/s | +2.89% |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 1.7043ms | 1.1932ms | 838.0811 Ops/s | 777.5394 Ops/s | **+7.79%** |
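
For readers checking the numbers, the Change column appears to be the relative difference between Ops and Ops on Repo HEAD. That definition is inferred from the reported values, not taken from the benchmark tooling, and `relative_change` is a hypothetical helper name. A minimal sketch under that assumption:

```python
def relative_change(ops: float, ops_head: float) -> float:
    """Percentage change of the PR's throughput relative to the repo HEAD."""
    return (ops - ops_head) / ops_head * 100.0


# Ops values copied from the CPU results for test_sync and test_single.
print(f"test_sync:   {relative_change(28.8597, 32.1993):+.2f}%")  # -10.37%
print(f"test_single: {relative_change(17.2825, 17.8609):+.2f}%")  # -3.24%
```

Under this reading, a negative value means the PR branch completed fewer operations per second than the merge base for that benchmark.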

github-actions bot commented Jun 10, 2024

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 94. Improved: 0. Worsened: 3.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_single | 0.1247s | 0.1219s | 8.2066 Ops/s | 7.9619 Ops/s | +3.07% |
| test_sync | 99.4821ms | 97.5342ms | 10.2528 Ops/s | 9.7877 Ops/s | +4.75% |
| test_async | 0.2012s | 0.1015s | 9.8526 Ops/s | 12.2151 Ops/s | **-19.34%** |
| test_single_pixels | 0.1322s | 0.1301s | 7.6857 Ops/s | 7.6848 Ops/s | +0.01% |
| test_sync_pixels | 84.5871ms | 81.4647ms | 12.2753 Ops/s | 12.2654 Ops/s | +0.08% |
| test_async_pixels | 0.1534s | 69.6501ms | 14.3575 Ops/s | 14.4305 Ops/s | -0.51% |
| test_simple | 0.8987s | 0.8377s | 1.1938 Ops/s | 1.2080 Ops/s | -1.18% |
| test_transformed | 1.1691s | 1.1078s | 0.9027 Ops/s | 0.9246 Ops/s | -2.37% |
| test_serial | 2.6078s | 2.5460s | 0.3928 Ops/s | 0.3906 Ops/s | +0.55% |
| test_parallel | 2.4279s | 2.3704s | 0.4219 Ops/s | 0.4216 Ops/s | +0.07% |
| test_step_mdp_speed[True-True-True-True-True] | 0.1042ms | 34.3438μs | 29.1173 KOps/s | 30.3251 KOps/s | -3.98% |
| test_step_mdp_speed[True-True-True-True-False] | 47.0310μs | 20.0846μs | 49.7895 KOps/s | 50.6616 KOps/s | -1.72% |
| test_step_mdp_speed[True-True-True-False-True] | 46.8110μs | 19.7952μs | 50.5174 KOps/s | 53.2348 KOps/s | **-5.10%** |
| test_step_mdp_speed[True-True-True-False-False] | 33.1100μs | 11.3293μs | 88.2670 KOps/s | 88.5778 KOps/s | -0.35% |
| test_step_mdp_speed[True-True-False-True-True] | 53.3810μs | 35.8582μs | 27.8877 KOps/s | 28.6471 KOps/s | -2.65% |
| test_step_mdp_speed[True-True-False-True-False] | 92.6510μs | 21.9436μs | 45.5715 KOps/s | 46.4418 KOps/s | -1.87% |
| test_step_mdp_speed[True-True-False-False-True] | 47.3600μs | 21.5297μs | 46.4474 KOps/s | 47.6835 KOps/s | -2.59% |
| test_step_mdp_speed[True-True-False-False-False] | 31.6910μs | 13.4313μs | 74.4530 KOps/s | 76.5390 KOps/s | -2.73% |
| test_step_mdp_speed[True-False-True-True-True] | 62.5310μs | 37.8340μs | 26.4312 KOps/s | 27.3187 KOps/s | -3.25% |
| test_step_mdp_speed[True-False-True-True-False] | 45.6120μs | 23.8456μs | 41.9364 KOps/s | 42.8202 KOps/s | -2.06% |
| test_step_mdp_speed[True-False-True-False-True] | 47.6710μs | 21.4381μs | 46.6459 KOps/s | 47.7618 KOps/s | -2.34% |
| test_step_mdp_speed[True-False-True-False-False] | 32.7000μs | 13.3613μs | 74.8430 KOps/s | 76.1968 KOps/s | -1.78% |
| test_step_mdp_speed[True-False-False-True-True] | 76.3720μs | 39.0205μs | 25.6275 KOps/s | 25.9224 KOps/s | -1.14% |
| test_step_mdp_speed[True-False-False-True-False] | 52.6110μs | 25.6643μs | 38.9646 KOps/s | 39.5361 KOps/s | -1.45% |
| test_step_mdp_speed[True-False-False-False-True] | 97.6830μs | 22.9926μs | 43.4922 KOps/s | 43.9993 KOps/s | -1.15% |
| test_step_mdp_speed[True-False-False-False-False] | 38.2310μs | 15.2051μs | 65.7674 KOps/s | 66.6305 KOps/s | -1.30% |
| test_step_mdp_speed[False-True-True-True-True] | 57.0110μs | 37.0565μs | 26.9858 KOps/s | 26.9680 KOps/s | +0.07% |
| test_step_mdp_speed[False-True-True-True-False] | 47.0620μs | 23.5977μs | 42.3771 KOps/s | 42.3610 KOps/s | +0.04% |
| test_step_mdp_speed[False-True-True-False-True] | 39.8000μs | 25.5388μs | 39.1561 KOps/s | 39.9572 KOps/s | -2.01% |
| test_step_mdp_speed[False-True-True-False-False] | 31.5210μs | 15.0657μs | 66.3758 KOps/s | 66.9541 KOps/s | -0.86% |
| test_step_mdp_speed[False-True-False-True-True] | 78.0610μs | 39.2871μs | 25.4536 KOps/s | 25.9605 KOps/s | -1.95% |
| test_step_mdp_speed[False-True-False-True-False] | 48.7110μs | 25.4869μs | 39.2359 KOps/s | 39.6686 KOps/s | -1.09% |
| test_step_mdp_speed[False-True-False-False-True] | 51.7100μs | 27.2187μs | 36.7395 KOps/s | 37.2501 KOps/s | -1.37% |
| test_step_mdp_speed[False-True-False-False-False] | 40.5410μs | 16.8886μs | 59.2114 KOps/s | 59.0581 KOps/s | +0.26% |
| test_step_mdp_speed[False-False-True-True-True] | 59.4200μs | 41.1592μs | 24.2959 KOps/s | 24.8247 KOps/s | -2.13% |
| test_step_mdp_speed[False-False-True-True-False] | 51.2710μs | 27.5354μs | 36.3169 KOps/s | 36.5849 KOps/s | -0.73% |
| test_step_mdp_speed[False-False-True-False-True] | 55.7510μs | 28.0172μs | 35.6924 KOps/s | 36.6941 KOps/s | -2.73% |
| test_step_mdp_speed[False-False-True-False-False] | 37.3200μs | 17.1487μs | 58.3136 KOps/s | 58.8771 KOps/s | -0.96% |
| test_step_mdp_speed[False-False-False-True-True] | 58.1020μs | 43.7889μs | 22.8368 KOps/s | 23.0226 KOps/s | -0.81% |
| test_step_mdp_speed[False-False-False-True-False] | 54.7920μs | 29.4428μs | 33.9641 KOps/s | 33.6025 KOps/s | +1.08% |
| test_step_mdp_speed[False-False-False-False-True] | 46.8800μs | 28.9329μs | 34.5628 KOps/s | 34.6461 KOps/s | -0.24% |
| test_step_mdp_speed[False-False-False-False-False] | 40.2610μs | 18.7618μs | 53.2999 KOps/s | 53.1594 KOps/s | +0.26% |
| test_values[generalized_advantage_estimate-True-True] | 27.2343ms | 26.0084ms | 38.4492 Ops/s | 37.9881 Ops/s | +1.21% |
| test_values[vec_generalized_advantage_estimate-True-True] | 89.5652ms | 2.7031ms | 369.9388 Ops/s | 375.2375 Ops/s | -1.41% |
| test_values[td0_return_estimate-False-False] | 90.4710μs | 68.4001μs | 14.6199 KOps/s | 14.7702 KOps/s | -1.02% |
| test_values[td1_return_estimate-False-False] | 60.8827ms | 57.5631ms | 17.3722 Ops/s | 16.7169 Ops/s | +3.92% |
| test_values[vec_td1_return_estimate-False-False] | 1.3054ms | 1.0998ms | 909.2529 Ops/s | 903.7523 Ops/s | +0.61% |
| test_values[td_lambda_return_estimate-True-False] | 97.1568ms | 91.9891ms | 10.8709 Ops/s | 11.0462 Ops/s | -1.59% |
| test_values[vec_td_lambda_return_estimate-True-False] | 1.2597ms | 1.0977ms | 910.9856 Ops/s | 909.0890 Ops/s | +0.21% |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 25.8896ms | 25.6867ms | 38.9307 Ops/s | 38.4613 Ops/s | +1.22% |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 0.9914ms | 0.7398ms | 1.3517 KOps/s | 1.3459 KOps/s | +0.43% |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.7577ms | 0.6830ms | 1.4642 KOps/s | 1.4594 KOps/s | +0.33% |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 1.5657ms | 1.4914ms | 670.4977 Ops/s | 672.5191 Ops/s | -0.30% |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 0.7710ms | 0.7337ms | 1.3629 KOps/s | 1.4300 KOps/s | -4.69% |
| test_dqn_speed | 1.8492ms | 1.5069ms | 663.6203 Ops/s | 675.0348 Ops/s | -1.69% |
| test_ddpg_speed | 3.1779ms | 3.0525ms | 327.6003 Ops/s | 328.4542 Ops/s | -0.26% |
| test_sac_speed | 9.0255ms | 8.7509ms | 114.2737 Ops/s | 115.8512 Ops/s | -1.36% |
| test_redq_speed | 12.4803ms | 10.8554ms | 92.1204 Ops/s | 92.1711 Ops/s | -0.06% |
| test_redq_deprec_speed | 12.4117ms | 11.6500ms | 85.8367 Ops/s | 81.8555 Ops/s | +4.86% |
| test_td3_speed | 8.8584ms | 8.6303ms | 115.8710 Ops/s | 116.3523 Ops/s | -0.41% |
| test_cql_speed | 27.8610ms | 26.3357ms | 37.9713 Ops/s | 38.0951 Ops/s | -0.32% |
| test_a2c_speed | 6.1794ms | 5.6258ms | 177.7529 Ops/s | 172.2162 Ops/s | +3.21% |
| test_ppo_speed | 6.5330ms | 5.9797ms | 167.2322 Ops/s | 162.9063 Ops/s | +2.66% |
| test_reinforce_speed | 5.3603ms | 4.6012ms | 217.3324 Ops/s | 208.2109 Ops/s | +4.38% |
| test_iql_speed | 20.5949ms | 19.9497ms | 50.1260 Ops/s | 49.4915 Ops/s | +1.28% |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 5.1486ms | 4.8638ms | 205.5987 Ops/s | 204.6035 Ops/s | +0.49% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.7258ms | 0.5987ms | 1.6704 KOps/s | 1.6703 KOps/s | +0.00% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 4.4106ms | 0.5812ms | 1.7206 KOps/s | 1.7465 KOps/s | -1.48% |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 5.0072ms | 4.8062ms | 208.0634 Ops/s | 205.5567 Ops/s | +1.22% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.7070ms | 0.5909ms | 1.6923 KOps/s | 1.6770 KOps/s | +0.91% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 4.4329ms | 0.5703ms | 1.7535 KOps/s | 1.7575 KOps/s | -0.23% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 2.3754ms | 2.1413ms | 467.0046 Ops/s | 466.2946 Ops/s | +0.15% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 5.8159ms | 2.0457ms | 488.8358 Ops/s | 491.7983 Ops/s | -0.60% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 5.1108ms | 4.9754ms | 200.9893 Ops/s | 199.6342 Ops/s | +0.68% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.4664ms | 0.7283ms | 1.3731 KOps/s | 1.3762 KOps/s | -0.22% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.8874ms | 0.7048ms | 1.4188 KOps/s | 1.4154 KOps/s | +0.24% |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 5.0340ms | 4.8781ms | 204.9971 Ops/s | 204.7488 Ops/s | +0.12% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 1.3017ms | 0.6022ms | 1.6605 KOps/s | 1.6660 KOps/s | -0.34% |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.6878ms | 0.5762ms | 1.7354 KOps/s | 1.7238 KOps/s | +0.67% |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 5.0378ms | 4.8468ms | 206.3227 Ops/s | 205.3881 Ops/s | +0.46% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.7085ms | 0.5926ms | 1.6874 KOps/s | 1.6753 KOps/s | +0.72% |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.7397ms | 0.5691ms | 1.7571 KOps/s | 1.7436 KOps/s | +0.78% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 5.0843ms | 4.9830ms | 200.6806 Ops/s | 199.8634 Ops/s | +0.41% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.4396ms | 0.7280ms | 1.3737 KOps/s | 1.3687 KOps/s | +0.36% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.8708ms | 0.7048ms | 1.4188 KOps/s | 1.4139 KOps/s | +0.35% |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.1132s | 9.2914ms | 107.6263 Ops/s | 106.2519 Ops/s | +1.29% |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 20.9692ms | 17.0166ms | 58.7663 Ops/s | 59.0157 Ops/s | -0.42% |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 2.3498ms | 1.3488ms | 741.4260 Ops/s | 736.6960 Ops/s | +0.64% |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.1051s | 7.1471ms | 139.9167 Ops/s | 139.4236 Ops/s | +0.35% |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 19.4005ms | 16.8990ms | 59.1751 Ops/s | 59.7428 Ops/s | -0.95% |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 6.8648ms | 1.4606ms | 684.6658 Ops/s | 734.3014 Ops/s | **-6.76%** |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 0.1055s | 9.3444ms | 107.0160 Ops/s | 106.4787 Ops/s | +0.50% |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 19.3551ms | 16.7573ms | 59.6756 Ops/s | 57.9838 Ops/s | +2.92% |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 2.5454ms | 1.5052ms | 664.3746 Ops/s | 661.6923 Ops/s | +0.41% |

vmoens merged commit 3787a9e into main Jun 11, 2024
46 of 51 checks passed
vmoens added the `bug` label Jun 11, 2024
vmoens changed the title from "[Performance, Refactor] Faster loading of uninitialized storages" to "[Performance, Refactor, BugFix] Faster loading of uninitialized storages" Jun 11, 2024

shagunsodhani (Contributor) commented:

> cc @teopir
>
> cc @shagunsodhani this is a good example of preallocation with TensorDict. We were using a lot of lazy stacks and stacking at the last minute. Using a preallocated TensorDict instead (create an empty TensorDict, take a set of views of it, then write to the first view so that all views are instantiated at once) made the whole thing 20x to 1000x faster!

Awesome <3

vmoens deleted the refactor-checkpointers branch August 7, 2024 01:39
Labels: `bug`, `CLA Signed`, `enhancement`, `performance`