[Performance] Faster DMC #2002

Merged
merged 14 commits into from
Mar 8, 2024

@vmoens (Contributor) commented Mar 7, 2024

Other improvements to explore:

  • The `to` call in `env._step` may be unnecessary when no mapping is needed.
  • `step_mdp` takes 11% of runtime, and `_set`, called within it, accounts for another 8% (2.2% own time). Improving `_set` could speed things up drastically.
  • `tensordict.get` takes another 5.1%.
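
For readers unfamiliar with the "own time" versus total-time distinction in the figures above, here is a minimal sketch of how such numbers can be read from a profile using Python's standard `cProfile` and `pstats` modules. The `step_mdp`, `_set` and `rollout` functions below are toy stand-ins written for this illustration; they are not the TorchRL or tensordict implementations, and the profiler actually used for this PR is not stated.

```python
import cProfile
import pstats


def _set(store, key, value):
    # Toy stand-in for a low-level setter (hypothetical, not tensordict's _set).
    store[key] = value


def step_mdp(data):
    # Toy stand-in: move every "next" entry to the root, one _set call per key.
    out = {}
    for key, value in data["next"].items():
        _set(out, key, value)
    return out


def rollout(n_steps):
    # Hypothetical driver that calls step_mdp repeatedly, like an env loop would.
    data = {"next": {f"obs{i}": i for i in range(20)}}
    for _ in range(n_steps):
        step_mdp(data)


profiler = cProfile.Profile()
profiler.enable()
rollout(2000)
profiler.disable()

stats = pstats.Stats(profiler)
# stats.stats maps (file, line, function name) to
# (primitive calls, total calls, own time, cumulative time, callers).
timings = {
    name: (own, cumulative)
    for (_, _, name), (_, _, own, cumulative, _) in stats.stats.items()
}

for name in ("rollout", "step_mdp", "_set"):
    own, cumulative = timings[name]
    print(f"{name:10s} own={own:.4f}s cumulative={cumulative:.4f}s")
```

Own time excludes time spent in callees, so a function with a small own time and a large cumulative time (as reported for `_set` here) is mostly paying for what it calls.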

cc @teopir


pytorch-bot bot commented Mar 7, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2002

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (14 Unrelated Failures)

As of commit 8c2d463 with merge base ad73733:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Mar 7, 2024
@vmoens added the performance label Mar 7, 2024

github-actions bot commented Mar 7, 2024

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 91. Improved: 12. Worsened: 3.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_single | 54.9466ms | 54.4435ms | 18.3677 Ops/s | 16.7933 Ops/s | **+9.37%** |
| test_sync | 38.1087ms | 30.2650ms | 33.0415 Ops/s | 30.2263 Ops/s | **+9.31%** |
| test_async | 69.6969ms | 28.9454ms | 34.5478 Ops/s | 33.8853 Ops/s | +1.96% |
| test_simple | 0.3996s | 0.3439s | 2.9075 Ops/s | 2.3823 Ops/s | **+22.05%** |
| test_transformed | 0.4732s | 0.4681s | 2.1364 Ops/s | 1.7738 Ops/s | **+20.44%** |
| test_serial | 1.2593s | 1.2049s | 0.8299 Ops/s | 0.7755 Ops/s | **+7.02%** |
| test_parallel | 1.0938s | 1.0445s | 0.9574 Ops/s | 0.8774 Ops/s | **+9.12%** |
| test_step_mdp_speed[True-True-True-True-True] | 0.1188ms | 21.0743μs | 47.4511 KOps/s | 46.9723 KOps/s | +1.02% |
| test_step_mdp_speed[True-True-True-True-False] | 35.1050μs | 12.8739μs | 77.6765 KOps/s | 77.8198 KOps/s | -0.18% |
| test_step_mdp_speed[True-True-True-False-True] | 50.2150μs | 12.4727μs | 80.1750 KOps/s | 80.1655 KOps/s | +0.01% |
| test_step_mdp_speed[True-True-True-False-False] | 29.0540μs | 7.4492μs | 134.2420 KOps/s | 133.7036 KOps/s | +0.40% |
| test_step_mdp_speed[True-True-False-True-True] | 47.6290μs | 22.3032μs | 44.8367 KOps/s | 44.9474 KOps/s | -0.25% |
| test_step_mdp_speed[True-True-False-True-False] | 32.9010μs | 14.2036μs | 70.4047 KOps/s | 71.0393 KOps/s | -0.89% |
| test_step_mdp_speed[True-True-False-False-True] | 56.2050μs | 13.6133μs | 73.4575 KOps/s | 74.6957 KOps/s | -1.66% |
| test_step_mdp_speed[True-True-False-False-False] | 28.5840μs | 8.7657μs | 114.0806 KOps/s | 115.4353 KOps/s | -1.17% |
| test_step_mdp_speed[True-False-True-True-True] | 51.0950μs | 23.7527μs | 42.1005 KOps/s | 42.2697 KOps/s | -0.40% |
| test_step_mdp_speed[True-False-True-True-False] | 57.4970μs | 15.4941μs | 64.5405 KOps/s | 65.1902 KOps/s | -1.00% |
| test_step_mdp_speed[True-False-True-False-True] | 56.7750μs | 13.6544μs | 73.2365 KOps/s | 74.2432 KOps/s | -1.36% |
| test_step_mdp_speed[True-False-True-False-False] | 29.4940μs | 8.7314μs | 114.5286 KOps/s | 115.5065 KOps/s | -0.85% |
| test_step_mdp_speed[True-False-False-True-True] | 67.3960μs | 24.8654μs | 40.2165 KOps/s | 40.3204 KOps/s | -0.26% |
| test_step_mdp_speed[True-False-False-True-False] | 51.7460μs | 16.4867μs | 60.6548 KOps/s | 60.7809 KOps/s | -0.21% |
| test_step_mdp_speed[True-False-False-False-True] | 61.0240μs | 14.5404μs | 68.7737 KOps/s | 68.4805 KOps/s | +0.43% |
| test_step_mdp_speed[True-False-False-False-False] | 38.6720μs | 9.8409μs | 101.6166 KOps/s | 102.7851 KOps/s | -1.14% |
| test_step_mdp_speed[False-True-True-True-True] | 65.2510μs | 23.7407μs | 42.1218 KOps/s | 42.5437 KOps/s | -0.99% |
| test_step_mdp_speed[False-True-True-True-False] | 88.5550μs | 15.2562μs | 65.5473 KOps/s | 65.5935 KOps/s | -0.07% |
| test_step_mdp_speed[False-True-True-False-True] | 45.4740μs | 15.7024μs | 63.6846 KOps/s | 63.7918 KOps/s | -0.17% |
| test_step_mdp_speed[False-True-True-False-False] | 33.9030μs | 10.0019μs | 99.9813 KOps/s | 101.4395 KOps/s | -1.44% |
| test_step_mdp_speed[False-True-False-True-True] | 46.0260μs | 25.3313μs | 39.4768 KOps/s | 39.4748 KOps/s | +0.01% |
| test_step_mdp_speed[False-True-False-True-False] | 43.4710μs | 16.6559μs | 60.0388 KOps/s | 60.5008 KOps/s | -0.76% |
| test_step_mdp_speed[False-True-False-False-True] | 61.3240μs | 16.8992μs | 59.1744 KOps/s | 59.3162 KOps/s | -0.24% |
| test_step_mdp_speed[False-True-False-False-False] | 31.2380μs | 11.1578μs | 89.6230 KOps/s | 90.1399 KOps/s | -0.57% |
| test_step_mdp_speed[False-False-True-True-True] | 60.0310μs | 25.9903μs | 38.4759 KOps/s | 38.0415 KOps/s | +1.14% |
| test_step_mdp_speed[False-False-True-True-False] | 52.7580μs | 17.8736μs | 55.9484 KOps/s | 55.9508 KOps/s | -0.00% |
| test_step_mdp_speed[False-False-True-False-True] | 39.2030μs | 16.9375μs | 59.0407 KOps/s | 58.7536 KOps/s | +0.49% |
| test_step_mdp_speed[False-False-True-False-False] | 41.9180μs | 11.0571μs | 90.4397 KOps/s | 92.7597 KOps/s | -2.50% |
| test_step_mdp_speed[False-False-False-True-True] | 55.8040μs | 27.1391μs | 36.8472 KOps/s | 37.5135 KOps/s | -1.78% |
| test_step_mdp_speed[False-False-False-True-False] | 65.5500μs | 18.4993μs | 54.0560 KOps/s | 54.0824 KOps/s | -0.05% |
| test_step_mdp_speed[False-False-False-False-True] | 45.5450μs | 17.8651μs | 55.9750 KOps/s | 56.5182 KOps/s | -0.96% |
| test_step_mdp_speed[False-False-False-False-False] | 56.9670μs | 12.0616μs | 82.9079 KOps/s | 84.0519 KOps/s | -1.36% |
| test_values[generalized_advantage_estimate-True-True] | 12.1555ms | 9.3732ms | 106.6871 Ops/s | 109.3249 Ops/s | -2.41% |
| test_values[vec_generalized_advantage_estimate-True-True] | 38.1462ms | 35.6110ms | 28.0812 Ops/s | 28.0127 Ops/s | +0.24% |
| test_values[td0_return_estimate-False-False] | 0.2363ms | 0.1663ms | 6.0123 KOps/s | 6.0827 KOps/s | -1.16% |
| test_values[td1_return_estimate-False-False] | 26.6716ms | 23.1362ms | 43.2224 Ops/s | 43.3607 Ops/s | -0.32% |
| test_values[vec_td1_return_estimate-False-False] | 40.6710ms | 35.7374ms | 27.9819 Ops/s | 26.6582 Ops/s | +4.97% |
| test_values[td_lambda_return_estimate-True-False] | 36.2816ms | 33.8585ms | 29.5347 Ops/s | 30.1283 Ops/s | -1.97% |
| test_values[vec_td_lambda_return_estimate-True-False] | 38.7429ms | 35.6605ms | 28.0422 Ops/s | 27.3061 Ops/s | +2.70% |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 8.3191ms | 8.1582ms | 122.5763 Ops/s | 123.0389 Ops/s | -0.38% |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 2.6940ms | 2.0757ms | 481.7729 Ops/s | 487.3168 Ops/s | -1.14% |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.5461ms | 0.3549ms | 2.8180 KOps/s | 2.8082 KOps/s | +0.35% |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 52.8779ms | 50.1531ms | 19.9389 Ops/s | 19.7415 Ops/s | +1.00% |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 4.8956ms | 3.0601ms | 326.7827 Ops/s | 329.1803 Ops/s | -0.73% |
| test_dqn_speed | 6.8741ms | 1.3733ms | 728.1971 Ops/s | 740.8297 Ops/s | -1.71% |
| test_ddpg_speed | 3.5466ms | 2.7081ms | 369.2577 Ops/s | 378.2602 Ops/s | -2.38% |
| test_sac_speed | 9.4743ms | 8.2313ms | 121.4869 Ops/s | 122.6106 Ops/s | -0.92% |
| test_redq_speed | 20.5118ms | 13.7747ms | 72.5968 Ops/s | 76.3396 Ops/s | -4.90% |
| test_redq_deprec_speed | 14.0023ms | 13.1660ms | 75.9534 Ops/s | 75.9133 Ops/s | +0.05% |
| test_td3_speed | 8.2969ms | 8.0923ms | 123.5744 Ops/s | 123.9042 Ops/s | -0.27% |
| test_cql_speed | 37.2574ms | 36.3188ms | 27.5340 Ops/s | 27.7421 Ops/s | -0.75% |
| test_a2c_speed | 8.6477ms | 7.3573ms | 135.9191 Ops/s | 136.6136 Ops/s | -0.51% |
| test_ppo_speed | 8.2442ms | 7.6387ms | 130.9117 Ops/s | 132.4777 Ops/s | -1.18% |
| test_reinforce_speed | 7.7244ms | 6.5549ms | 152.5572 Ops/s | 153.4903 Ops/s | -0.61% |
| test_iql_speed | 0.1077s | 35.3098ms | 28.3208 Ops/s | 30.7878 Ops/s | **-8.01%** |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 2.6606ms | 2.2101ms | 452.4605 Ops/s | 450.0818 Ops/s | +0.53% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.7530ms | 0.5004ms | 1.9982 KOps/s | 2.0045 KOps/s | -0.31% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 3.6596ms | 0.4747ms | 2.1064 KOps/s | 2.1097 KOps/s | -0.16% |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 2.4354ms | 2.1559ms | 463.8338 Ops/s | 457.2159 Ops/s | +1.45% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.8874ms | 0.4913ms | 2.0353 KOps/s | 2.0524 KOps/s | -0.83% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.7624ms | 0.4659ms | 2.1463 KOps/s | 2.1492 KOps/s | -0.14% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.8124ms | 1.2877ms | 776.5870 Ops/s | 693.9381 Ops/s | **+11.91%** |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.7023ms | 1.2209ms | 819.0545 Ops/s | 815.7828 Ops/s | +0.40% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.6463ms | 2.3731ms | 421.3903 Ops/s | 429.2438 Ops/s | -1.83% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.8506ms | 0.6206ms | 1.6112 KOps/s | 1.6268 KOps/s | -0.95% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 3.6599ms | 0.5950ms | 1.6806 KOps/s | 1.7054 KOps/s | -1.45% |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 2.5037ms | 2.1886ms | 456.9071 Ops/s | 451.5163 Ops/s | +1.19% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.7968ms | 0.5064ms | 1.9747 KOps/s | 2.0029 KOps/s | -1.41% |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.6287ms | 0.4743ms | 2.1082 KOps/s | 1.7182 KOps/s | **+22.70%** |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 2.6057ms | 2.2387ms | 446.6885 Ops/s | 443.9410 Ops/s | +0.62% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.0320ms | 0.5004ms | 1.9983 KOps/s | 2.0461 KOps/s | -2.34% |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.7147ms | 0.4719ms | 2.1190 KOps/s | 2.1118 KOps/s | +0.34% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.3295ms | 2.3044ms | 433.9515 Ops/s | 438.7581 Ops/s | -1.10% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.6951ms | 0.6160ms | 1.6233 KOps/s | 1.3483 KOps/s | **+20.40%** |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 3.9044ms | 0.6010ms | 1.6639 KOps/s | 1.6952 KOps/s | -1.85% |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.1022s | 7.2878ms | 137.2147 Ops/s | 185.6099 Ops/s | **-26.07%** |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 14.8769ms | 11.9463ms | 83.7077 Ops/s | 83.4100 Ops/s | +0.36% |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 1.0650ms | 1.0222ms | 978.2908 Ops/s | 934.6910 Ops/s | +4.66% |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 91.4867ms | 5.4234ms | 184.3861 Ops/s | 138.0649 Ops/s | **+33.55%** |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 14.9478ms | 11.9530ms | 83.6607 Ops/s | 83.4027 Ops/s | +0.31% |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 1.5116ms | 1.0239ms | 976.6737 Ops/s | 877.4548 Ops/s | **+11.31%** |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 0.1008s | 7.6505ms | 130.7110 Ops/s | 174.0527 Ops/s | **-24.90%** |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 15.1163ms | 12.2787ms | 81.4418 Ops/s | 70.6770 Ops/s | **+15.23%** |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 1.4740ms | 1.3396ms | 746.4639 Ops/s | 739.7329 Ops/s | +0.91% |
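
As a rough check on how the columns above relate, the sketch below recomputes the Ops and Change values for a few rows. It assumes Ops is the reciprocal of the mean time and Change is the relative difference against the Repo HEAD value. The 5% cutoff in `classify` is an assumption inferred from which rows are emphasized; the bot's actual rule is not stated in this thread, and all function names are hypothetical.

```python
def ops_per_second(mean_seconds):
    # Throughput as the reciprocal of the mean time per call.
    return 1.0 / mean_seconds


def relative_change(ops_pr, ops_head):
    # Percent change of the PR throughput relative to the Repo HEAD throughput.
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change_pct, threshold_pct=5.0):
    # threshold_pct is an assumed cutoff, not a documented value.
    if change_pct > threshold_pct:
        return "improved"
    if change_pct < -threshold_pct:
        return "worsened"
    return "unchanged"


# test_single (CPU): mean 54.4435 ms, 18.3677 Ops/s vs 16.7933 Ops/s on HEAD.
single_ops = ops_per_second(54.4435e-3)
single_change = relative_change(18.3677, 16.7933)

# test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] (CPU).
populate_change = relative_change(137.2147, 185.6099)

# test_step_mdp_speed[True-True-True-True-False] (CPU), values in KOps/s.
step_mdp_change = relative_change(77.6765, 77.8198)

for label, change in [
    ("test_single", single_change),
    ("test_rb_populate", populate_change),
    ("test_step_mdp_speed", step_mdp_change),
]:
    print(f"{label}: {change:+.2f}% ({classify(change)})")
```

Because the table prints rounded values, a recomputed percentage can differ from the reported one in the last digit.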


github-actions bot commented Mar 7, 2024

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 94. Improved: 8. Worsened: 3.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_single | 0.1053s | 0.1048s | 9.5402 Ops/s | 8.8735 Ops/s | **+7.51%** |
| test_sync | 93.1055ms | 91.3681ms | 10.9447 Ops/s | 10.5755 Ops/s | +3.49% |
| test_async | 0.1693s | 82.5262ms | 12.1174 Ops/s | 11.0288 Ops/s | **+9.87%** |
| test_single_pixels | 0.1143s | 0.1136s | 8.8055 Ops/s | 8.2567 Ops/s | **+6.65%** |
| test_sync_pixels | 69.9057ms | 68.0163ms | 14.7023 Ops/s | 12.3855 Ops/s | **+18.71%** |
| test_async_pixels | 0.1152s | 56.4100ms | 17.7274 Ops/s | 13.7634 Ops/s | **+28.80%** |
| test_simple | 0.6885s | 0.6736s | 1.4846 Ops/s | 1.2481 Ops/s | **+18.95%** |
| test_transformed | 0.8886s | 0.8835s | 1.1319 Ops/s | 0.9849 Ops/s | **+14.93%** |
| test_serial | 2.2022s | 2.1400s | 0.4673 Ops/s | 0.4552 Ops/s | +2.66% |
| test_parallel | 1.9292s | 1.8662s | 0.5359 Ops/s | 0.5212 Ops/s | +2.82% |
| test_step_mdp_speed[True-True-True-True-True] | 0.1008ms | 33.6243μs | 29.7404 KOps/s | 30.2446 KOps/s | -1.67% |
| test_step_mdp_speed[True-True-True-True-False] | 0.2260ms | 19.8878μs | 50.2821 KOps/s | 51.5465 KOps/s | -2.45% |
| test_step_mdp_speed[True-True-True-False-True] | 0.2137ms | 19.0197μs | 52.5770 KOps/s | 53.5591 KOps/s | -1.83% |
| test_step_mdp_speed[True-True-True-False-False] | 29.1700μs | 11.4359μs | 87.4443 KOps/s | 91.5548 KOps/s | -4.49% |
| test_step_mdp_speed[True-True-False-True-True] | 0.2237ms | 35.2708μs | 28.3521 KOps/s | 29.0685 KOps/s | -2.46% |
| test_step_mdp_speed[True-True-False-True-False] | 0.2094ms | 21.7482μs | 45.9808 KOps/s | 47.0838 KOps/s | -2.34% |
| test_step_mdp_speed[True-True-False-False-True] | 46.3900μs | 20.6178μs | 48.5017 KOps/s | 49.4923 KOps/s | -2.00% |
| test_step_mdp_speed[True-True-False-False-False] | 0.1987ms | 13.4531μs | 74.3323 KOps/s | 77.3809 KOps/s | -3.94% |
| test_step_mdp_speed[True-False-True-True-True] | 53.8800μs | 36.9493μs | 27.0641 KOps/s | 27.2602 KOps/s | -0.72% |
| test_step_mdp_speed[True-False-True-True-False] | 0.2164ms | 23.4900μs | 42.5713 KOps/s | 42.7234 KOps/s | -0.36% |
| test_step_mdp_speed[True-False-True-False-True] | 45.6510μs | 20.6770μs | 48.3630 KOps/s | 50.0540 KOps/s | -3.38% |
| test_step_mdp_speed[True-False-True-False-False] | 0.2021ms | 13.3604μs | 74.8481 KOps/s | 78.4656 KOps/s | -4.61% |
| test_step_mdp_speed[True-False-False-True-True] | 0.2303ms | 38.9099μs | 25.7004 KOps/s | 26.5661 KOps/s | -3.26% |
| test_step_mdp_speed[True-False-False-True-False] | 0.2105ms | 25.4717μs | 39.2592 KOps/s | 40.7500 KOps/s | -3.66% |
| test_step_mdp_speed[True-False-False-False-True] | 37.8200μs | 22.2589μs | 44.9259 KOps/s | 46.6974 KOps/s | -3.79% |
| test_step_mdp_speed[True-False-False-False-False] | 0.2037ms | 15.1446μs | 66.0300 KOps/s | 68.3172 KOps/s | -3.35% |
| test_step_mdp_speed[False-True-True-True-True] | 0.2261ms | 37.3424μs | 26.7792 KOps/s | 27.6234 KOps/s | -3.06% |
| test_step_mdp_speed[False-True-True-True-False] | 0.2130ms | 23.6515μs | 42.2806 KOps/s | 43.3181 KOps/s | -2.40% |
| test_step_mdp_speed[False-True-True-False-True] | 54.7600μs | 25.0185μs | 39.9705 KOps/s | 41.3415 KOps/s | -3.32% |
| test_step_mdp_speed[False-True-True-False-False] | 35.9510μs | 15.0159μs | 66.5961 KOps/s | 67.8640 KOps/s | -1.87% |
| test_step_mdp_speed[False-True-False-True-True] | 0.2367ms | 39.1446μs | 25.5463 KOps/s | 25.9129 KOps/s | -1.41% |
| test_step_mdp_speed[False-True-False-True-False] | 61.4200μs | 25.3885μs | 39.3880 KOps/s | 39.6335 KOps/s | -0.62% |
| test_step_mdp_speed[False-True-False-False-True] | 0.2347ms | 26.1091μs | 38.3008 KOps/s | 39.0226 KOps/s | -1.85% |
| test_step_mdp_speed[False-True-False-False-False] | 0.2251ms | 16.8075μs | 59.4974 KOps/s | 60.4356 KOps/s | -1.55% |
| test_step_mdp_speed[False-False-True-True-True] | 0.2328ms | 40.5414μs | 24.6661 KOps/s | 24.9934 KOps/s | -1.31% |
| test_step_mdp_speed[False-False-True-True-False] | 41.5410μs | 27.4610μs | 36.4153 KOps/s | 37.5841 KOps/s | -3.11% |
| test_step_mdp_speed[False-False-True-False-True] | 0.2141ms | 26.1625μs | 38.2226 KOps/s | 38.5824 KOps/s | -0.93% |
| test_step_mdp_speed[False-False-True-False-False] | 0.2052ms | 16.7848μs | 59.5779 KOps/s | 60.9686 KOps/s | -2.28% |
| test_step_mdp_speed[False-False-False-True-True] | 66.8810μs | 42.0411μs | 23.7862 KOps/s | 24.5155 KOps/s | -2.97% |
| test_step_mdp_speed[False-False-False-True-False] | 50.0800μs | 29.4854μs | 33.9151 KOps/s | 34.7811 KOps/s | -2.49% |
| test_step_mdp_speed[False-False-False-False-True] | 0.2220ms | 28.2840μs | 35.3557 KOps/s | 36.9128 KOps/s | -4.22% |
| test_step_mdp_speed[False-False-False-False-False] | 0.2079ms | 18.9369μs | 52.8070 KOps/s | 55.1709 KOps/s | -4.28% |
| test_values[generalized_advantage_estimate-True-True] | 26.5945ms | 25.8259ms | 38.7208 Ops/s | 40.1434 Ops/s | -3.54% |
| test_values[vec_generalized_advantage_estimate-True-True] | 81.8232ms | 3.1973ms | 312.7599 Ops/s | 298.3300 Ops/s | +4.84% |
| test_values[td0_return_estimate-False-False] | 95.2620μs | 64.4098μs | 15.5256 KOps/s | 15.9943 KOps/s | -2.93% |
| test_values[td1_return_estimate-False-False] | 55.6230ms | 55.2522ms | 18.0988 Ops/s | 18.9888 Ops/s | -4.69% |
| test_values[vec_td1_return_estimate-False-False] | 2.1263ms | 1.7602ms | 568.1048 Ops/s | 572.5939 Ops/s | -0.78% |
| test_values[td_lambda_return_estimate-True-False] | 87.5876ms | 86.8333ms | 11.5163 Ops/s | 11.9753 Ops/s | -3.83% |
| test_values[vec_td_lambda_return_estimate-True-False] | 2.1455ms | 1.7571ms | 569.1277 Ops/s | 572.2012 Ops/s | -0.54% |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 24.7921ms | 24.2610ms | 41.2184 Ops/s | 42.7412 Ops/s | -3.56% |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 0.8967ms | 0.7011ms | 1.4264 KOps/s | 1.4591 KOps/s | -2.24% |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.7145ms | 0.6427ms | 1.5559 KOps/s | 1.5802 KOps/s | -1.54% |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 1.4860ms | 1.4500ms | 689.6318 Ops/s | 695.7410 Ops/s | -0.88% |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 0.9477ms | 0.6610ms | 1.5129 KOps/s | 1.5244 KOps/s | -0.75% |
| test_dqn_speed | 1.6920ms | 1.4638ms | 683.1322 Ops/s | 653.2872 Ops/s | +4.57% |
| test_ddpg_speed | 3.0744ms | 2.7757ms | 360.2713 Ops/s | 377.5191 Ops/s | -4.57% |
| test_sac_speed | 8.6103ms | 8.2382ms | 121.3864 Ops/s | 126.6283 Ops/s | -4.14% |
| test_redq_speed | 11.5923ms | 10.4704ms | 95.5070 Ops/s | 98.5872 Ops/s | -3.12% |
| test_redq_deprec_speed | 11.7309ms | 10.9784ms | 91.0880 Ops/s | 92.0563 Ops/s | -1.05% |
| test_td3_speed | 8.3691ms | 8.1718ms | 122.3714 Ops/s | 128.6381 Ops/s | -4.87% |
| test_cql_speed | 26.7132ms | 25.7107ms | 38.8943 Ops/s | 40.0036 Ops/s | -2.77% |
| test_a2c_speed | 6.3013ms | 5.4951ms | 181.9797 Ops/s | 182.5892 Ops/s | -0.33% |
| test_ppo_speed | 7.4162ms | 5.8398ms | 171.2401 Ops/s | 174.6646 Ops/s | -1.96% |
| test_reinforce_speed | 5.0613ms | 4.4638ms | 224.0232 Ops/s | 223.2150 Ops/s | +0.36% |
| test_iql_speed | 20.1141ms | 19.4622ms | 51.3815 Ops/s | 51.6020 Ops/s | -0.43% |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 3.1221ms | 2.9141ms | 343.1560 Ops/s | 353.6893 Ops/s | -2.98% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.6560ms | 0.5467ms | 1.8291 KOps/s | 1.8690 KOps/s | -2.14% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.7539ms | 0.5226ms | 1.9134 KOps/s | 1.9471 KOps/s | -1.73% |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 3.2092ms | 2.9216ms | 342.2724 Ops/s | 352.8219 Ops/s | -2.99% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.7791ms | 0.5423ms | 1.8442 KOps/s | 1.8931 KOps/s | -2.58% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 4.3877ms | 0.5212ms | 1.9185 KOps/s | 1.9592 KOps/s | -2.07% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.7777ms | 1.5916ms | 628.3162 Ops/s | 657.5156 Ops/s | -4.44% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.7096ms | 1.5161ms | 659.5971 Ops/s | 689.4304 Ops/s | -4.33% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.2333ms | 3.0524ms | 327.6138 Ops/s | 338.0050 Ops/s | -3.07% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.2166ms | 0.6754ms | 1.4805 KOps/s | 1.5070 KOps/s | -1.76% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.8891ms | 0.6486ms | 1.5418 KOps/s | 1.5441 KOps/s | -0.15% |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 3.1330ms | 2.9111ms | 343.5168 Ops/s | 352.4896 Ops/s | -2.55% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.6515ms | 0.5475ms | 1.8266 KOps/s | 1.8610 KOps/s | -1.85% |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.1090s | 0.6513ms | 1.5355 KOps/s | 1.9220 KOps/s | **-20.11%** |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 3.2360ms | 2.9504ms | 338.9392 Ops/s | 347.3809 Ops/s | -2.43% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.6657ms | 0.5423ms | 1.8441 KOps/s | 1.8798 KOps/s | -1.90% |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.7055ms | 0.5165ms | 1.9362 KOps/s | 1.9330 KOps/s | +0.17% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.1917ms | 3.0531ms | 327.5371 Ops/s | 337.4994 Ops/s | -2.95% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.3305ms | 0.6765ms | 1.4781 KOps/s | 1.4961 KOps/s | -1.20% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.1115s | 0.7899ms | 1.2659 KOps/s | 1.2743 KOps/s | -0.66% |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.1086s | 6.8983ms | 144.9628 Ops/s | 146.6475 Ops/s | -1.15% |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 17.6853ms | 15.4995ms | 64.5183 Ops/s | 67.6092 Ops/s | -4.57% |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 2.3490ms | 1.1469ms | 871.9251 Ops/s | 929.2022 Ops/s | **-6.16%** |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.1071s | 8.8925ms | 112.4545 Ops/s | 112.8677 Ops/s | -0.37% |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 17.8142ms | 15.4407ms | 64.7639 Ops/s | 68.2595 Ops/s | **-5.12%** |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 2.1314ms | 1.1144ms | 897.3651 Ops/s | 932.5171 Ops/s | -3.77% |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 0.1064s | 7.2378ms | 138.1631 Ops/s | 139.7721 Ops/s | -1.15% |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 18.1205ms | 15.7745ms | 63.3933 Ops/s | 59.2813 Ops/s | **+6.94%** |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 2.5002ms | 1.4632ms | 683.4405 Ops/s | 717.4225 Ops/s | -4.74% |

@vmoens marked this pull request as ready for review March 8, 2024 20:22
@vmoens merged commit 358475a into main Mar 8, 2024
52 of 66 checks passed
@vmoens added a commit that referenced this pull request Mar 25, 2024 (cherry picked from commit 358475a)
@vmoens deleted the faster-dmc branch April 3, 2024 06:04