[Refactor,Performance] Faster collectors (bis) #1331

Merged: 36 commits merged on Jul 7, 2023


@vmoens (Contributor) commented Jun 28, 2023

Stacking onto an existing output performs better than a plain stack. The comparison is against a stack followed by a call to `contiguous()`; without that call, no real stack occurs.
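
As a rough illustration of the difference between a plain stack and stacking onto an existing output, the sketch below uses plain `torch.Tensor` objects and the `out=` argument of `torch.stack`. This is an assumption made for illustration only: the collectors touched by this PR work with tensordicts, the helper names `stack_fresh` and `stack_onto` are hypothetical, and the snippet does not reproduce the PR's benchmark.

```python
import torch


def stack_fresh(tensors):
    # Plain stack: a new result tensor is allocated on every call.
    return torch.stack(tensors, dim=0)


def stack_onto(tensors, out):
    # Stack onto a preallocated buffer: the result is written into `out`.
    torch.stack(tensors, dim=0, out=out)
    return out


# Three hypothetical per-step tensors of shape (4,).
steps = [torch.full((4,), float(i)) for i in range(3)]

# Buffer allocated once and reused for the stacked result.
buffer = torch.empty(3, 4)
result = stack_onto(steps, buffer)

# The result lives in the buffer's memory and matches a plain stack.
same_storage = result.data_ptr() == buffer.data_ptr()
matches = torch.equal(result, stack_fresh(steps))
print(same_storage, matches)
```

With plain tensors, `torch.stack` always produces a real stacked tensor, so the remark about `contiguous()` does not apply to this sketch; it only shows what writing into a reused buffer looks like.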

@facebook-github-bot added the CLA Signed label Jun 28, 2023
@vmoens added the performance label Jun 28, 2023
github-actions bot commented Jun 30, 2023

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 89. Improved: 17. Worsened: 10.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_single | 0.1892s | 0.1857s | 5.3861 Ops/s | 4.6913 Ops/s | **+14.81%** |
| test_sync | 0.1013s | 97.8309ms | 10.2217 Ops/s | 8.0327 Ops/s | **+27.25%** |
| test_async | 0.1854s | 95.0111ms | 10.5251 Ops/s | 8.4431 Ops/s | **+24.66%** |
| test_simple | 0.9737s | 0.8832s | 1.1323 Ops/s | 1.1198 Ops/s | +1.11% |
| test_transformed | 2.2864s | 2.2053s | 0.4535 Ops/s | 0.4736 Ops/s | -4.25% |
| test_serial | 2.8114s | 2.7485s | 0.3638 Ops/s | 0.3778 Ops/s | -3.69% |
| test_parallel | 2.3747s | 2.1575s | 0.4635 Ops/s | 0.4774 Ops/s | -2.91% |
| test_step_mdp_speed[True-True-True-True-True] | 1.3202ms | 54.3867μs | 18.3869 KOps/s | 18.7080 KOps/s | -1.72% |
| test_step_mdp_speed[True-True-True-True-False] | 2.2762ms | 30.9771μs | 32.2819 KOps/s | 32.4563 KOps/s | -0.54% |
| test_step_mdp_speed[True-True-True-False-True] | 4.9490ms | 40.8395μs | 24.4861 KOps/s | 24.1757 KOps/s | +1.28% |
| test_step_mdp_speed[True-True-True-False-False] | 0.4803ms | 22.0723μs | 45.3056 KOps/s | 44.3045 KOps/s | +2.26% |
| test_step_mdp_speed[True-True-False-True-True] | 0.3837ms | 55.3917μs | 18.0532 KOps/s | 17.8371 KOps/s | +1.21% |
| test_step_mdp_speed[True-True-False-True-False] | 0.5658ms | 32.6650μs | 30.6138 KOps/s | 28.8716 KOps/s | **+6.03%** |
| test_step_mdp_speed[True-True-False-False-True] | 1.1439ms | 42.2791μs | 23.6524 KOps/s | 23.1702 KOps/s | +2.08% |
| test_step_mdp_speed[True-True-False-False-False] | 0.5493ms | 24.3059μs | 41.1422 KOps/s | 40.4941 KOps/s | +1.60% |
| test_step_mdp_speed[True-False-True-True-True] | 0.6469ms | 55.9424μs | 17.8755 KOps/s | 17.0418 KOps/s | +4.89% |
| test_step_mdp_speed[True-False-True-True-False] | 2.2910ms | 34.0688μs | 29.3524 KOps/s | 28.6408 KOps/s | +2.48% |
| test_step_mdp_speed[True-False-True-False-True] | 0.6002ms | 42.3791μs | 23.5965 KOps/s | 23.8015 KOps/s | -0.86% |
| test_step_mdp_speed[True-False-True-False-False] | 0.3552ms | 24.0170μs | 41.6372 KOps/s | 40.0353 KOps/s | +4.00% |
| test_step_mdp_speed[True-False-False-True-True] | 1.5068ms | 59.1872μs | 16.8955 KOps/s | 15.9275 KOps/s | **+6.08%** |
| test_step_mdp_speed[True-False-False-True-False] | 0.9343ms | 35.6961μs | 28.0142 KOps/s | 26.9587 KOps/s | +3.92% |
| test_step_mdp_speed[True-False-False-False-True] | 4.3551ms | 44.4860μs | 22.4790 KOps/s | 19.1732 KOps/s | **+17.24%** |
| test_step_mdp_speed[True-False-False-False-False] | 0.5197ms | 25.9347μs | 38.5583 KOps/s | 37.3886 KOps/s | +3.13% |
| test_step_mdp_speed[False-True-True-True-True] | 4.4477ms | 59.0413μs | 16.9373 KOps/s | 16.7771 KOps/s | +0.96% |
| test_step_mdp_speed[False-True-True-True-False] | 1.3981ms | 34.6571μs | 28.8542 KOps/s | 29.5698 KOps/s | -2.42% |
| test_step_mdp_speed[False-True-True-False-True] | 0.3970ms | 47.8693μs | 20.8902 KOps/s | 18.6256 KOps/s | **+12.16%** |
| test_step_mdp_speed[False-True-True-False-False] | 6.4158ms | 27.2973μs | 36.6337 KOps/s | 36.0154 KOps/s | +1.72% |
| test_step_mdp_speed[False-True-False-True-True] | 0.5516ms | 58.9145μs | 16.9738 KOps/s | 17.1028 KOps/s | -0.75% |
| test_step_mdp_speed[False-True-False-True-False] | 1.1672ms | 35.9291μs | 27.8326 KOps/s | 27.1799 KOps/s | +2.40% |
| test_step_mdp_speed[False-True-False-False-True] | 0.5573ms | 49.4166μs | 20.2361 KOps/s | 20.3988 KOps/s | -0.80% |
| test_step_mdp_speed[False-True-False-False-False] | 0.9514ms | 28.2428μs | 35.4073 KOps/s | 35.5741 KOps/s | -0.47% |
| test_step_mdp_speed[False-False-True-True-True] | 3.1293ms | 61.8573μs | 16.1662 KOps/s | 16.8778 KOps/s | -4.22% |
| test_step_mdp_speed[False-False-True-True-False] | 0.6172ms | 37.5420μs | 26.6369 KOps/s | 26.0692 KOps/s | +2.18% |
| test_step_mdp_speed[False-False-True-False-True] | 0.1831ms | 48.1761μs | 20.7572 KOps/s | 18.1208 KOps/s | **+14.55%** |
| test_step_mdp_speed[False-False-True-False-False] | 0.4270ms | 27.9910μs | 35.7257 KOps/s | 33.5171 KOps/s | **+6.59%** |
| test_step_mdp_speed[False-False-False-True-True] | 2.9801ms | 62.7129μs | 15.9457 KOps/s | 16.1196 KOps/s | -1.08% |
| test_step_mdp_speed[False-False-False-True-False] | 2.3710ms | 39.5908μs | 25.2584 KOps/s | 25.4783 KOps/s | -0.86% |
| test_step_mdp_speed[False-False-False-False-True] | 0.9253ms | 49.5415μs | 20.1851 KOps/s | 18.7217 KOps/s | **+7.82%** |
| test_step_mdp_speed[False-False-False-False-False] | 4.0849ms | 29.7339μs | 33.6316 KOps/s | 30.5640 KOps/s | **+10.04%** |
| test_values[generalized_advantage_estimate-True-True] | 21.6707ms | 18.8630ms | 53.0139 Ops/s | 53.6339 Ops/s | -1.16% |
| test_values[vec_generalized_advantage_estimate-True-True] | 75.5466ms | 66.1030ms | 15.1279 Ops/s | 14.4239 Ops/s | +4.88% |
| test_values[td0_return_estimate-False-False] | 0.7134ms | 0.3071ms | 3.2564 KOps/s | 2.9574 KOps/s | **+10.11%** |
| test_values[td1_return_estimate-False-False] | 19.4841ms | 18.0950ms | 55.2639 Ops/s | 58.2740 Ops/s | **-5.17%** |
| test_values[vec_td1_return_estimate-False-False] | 86.9889ms | 67.0895ms | 14.9055 Ops/s | 14.9019 Ops/s | +0.02% |
| test_values[td_lambda_return_estimate-True-False] | 53.5987ms | 45.7055ms | 21.8792 Ops/s | 21.5424 Ops/s | +1.56% |
| test_values[vec_td_lambda_return_estimate-True-False] | 78.4416ms | 66.6382ms | 15.0064 Ops/s | 14.9267 Ops/s | +0.53% |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 15.5938ms | 14.3389ms | 69.7404 Ops/s | 71.5780 Ops/s | -2.57% |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 6.7497ms | 4.3936ms | 227.6047 Ops/s | 233.2881 Ops/s | -2.44% |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 2.3877ms | 0.6359ms | 1.5726 KOps/s | 1.5499 KOps/s | +1.47% |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 84.4087ms | 74.7777ms | 13.3730 Ops/s | 14.0316 Ops/s | -4.69% |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 9.6492ms | 5.5209ms | 181.1298 Ops/s | 186.2370 Ops/s | -2.74% |
| test_dqn_speed | 7.9113ms | 2.3938ms | 417.7523 Ops/s | 423.0881 Ops/s | -1.26% |
| test_ddpg_speed | 9.9337ms | 4.4004ms | 227.2501 Ops/s | 228.6864 Ops/s | -0.63% |
| test_sac_speed | 16.6543ms | 12.1767ms | 82.1242 Ops/s | 79.7428 Ops/s | +2.99% |
| test_redq_speed | 30.3917ms | 24.1900ms | 41.3394 Ops/s | 42.2615 Ops/s | -2.18% |
| test_redq_deprec_speed | 22.8509ms | 20.3040ms | 49.2514 Ops/s | 50.7549 Ops/s | -2.96% |
| test_td3_speed | 23.2361ms | 17.7538ms | 56.3258 Ops/s | 61.6967 Ops/s | **-8.71%** |
| test_cql_speed | 54.7995ms | 47.8879ms | 20.8821 Ops/s | 17.7078 Ops/s | **+17.93%** |
| test_a2c_speed | 15.2632ms | 10.5665ms | 94.6384 Ops/s | 98.6468 Ops/s | -4.06% |
| test_ppo_speed | 20.7207ms | 11.2802ms | 88.6509 Ops/s | 88.3122 Ops/s | +0.38% |
| test_reinforce_speed | 14.7695ms | 8.5470ms | 116.9999 Ops/s | 113.6091 Ops/s | +2.98% |
| test_iql_speed | 45.2048ms | 41.6881ms | 23.9877 Ops/s | 24.5867 Ops/s | -2.44% |
| test_sample_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 0.1409s | 5.6380ms | 177.3691 Ops/s | 207.9854 Ops/s | **-14.72%** |
| test_sample_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 7.6142ms | 5.0055ms | 199.7810 Ops/s | 200.3868 Ops/s | -0.30% |
| test_sample_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 9.8131ms | 5.1683ms | 193.4877 Ops/s | 212.6838 Ops/s | **-9.03%** |
| test_sample_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 12.1216ms | 5.1335ms | 194.7987 Ops/s | 183.9582 Ops/s | **+5.89%** |
| test_sample_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 8.2018ms | 5.1565ms | 193.9308 Ops/s | 208.1475 Ops/s | **-6.83%** |
| test_sample_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 7.2151ms | 5.0109ms | 199.5639 Ops/s | 170.9874 Ops/s | **+16.71%** |
| test_sample_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 0.1467s | 5.7025ms | 175.3609 Ops/s | 212.6741 Ops/s | **-17.54%** |
| test_sample_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 9.6422ms | 5.0883ms | 196.5275 Ops/s | 195.2171 Ops/s | +0.67% |
| test_sample_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 14.8343ms | 5.2845ms | 189.2316 Ops/s | 192.0651 Ops/s | -1.48% |
| test_iterate_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 9.3542ms | 4.9993ms | 200.0278 Ops/s | 202.0956 Ops/s | -1.02% |
| test_iterate_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 9.3471ms | 5.1211ms | 195.2707 Ops/s | 198.3255 Ops/s | -1.54% |
| test_iterate_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.1802s | 6.0057ms | 166.5075 Ops/s | 191.9995 Ops/s | **-13.28%** |
| test_iterate_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 9.5452ms | 5.0945ms | 196.2883 Ops/s | 204.6935 Ops/s | -4.11% |
| test_iterate_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 11.8956ms | 5.1343ms | 194.7691 Ops/s | 197.6823 Ops/s | -1.47% |
| test_iterate_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 8.9045ms | 5.0053ms | 199.7900 Ops/s | 191.6021 Ops/s | +4.27% |
| test_iterate_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 0.1405s | 5.7054ms | 175.2720 Ops/s | 196.5704 Ops/s | **-10.83%** |
| test_iterate_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 7.3654ms | 5.1167ms | 195.4396 Ops/s | 197.1261 Ops/s | -0.86% |
| test_iterate_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.1788s | 5.8603ms | 170.6403 Ops/s | 197.9995 Ops/s | **-13.82%** |
| test_populate_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.3916s | 45.8901ms | 21.7912 Ops/s | 22.4059 Ops/s | -2.74% |
| test_populate_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 0.1861s | 41.3057ms | 24.2097 Ops/s | 22.0096 Ops/s | **+10.00%** |
| test_populate_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 0.1884s | 42.0834ms | 23.7624 Ops/s | 23.7557 Ops/s | +0.03% |
| test_populate_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.2077s | 41.9166ms | 23.8569 Ops/s | 23.6065 Ops/s | +1.06% |
| test_populate_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 0.1916s | 42.5099ms | 23.5239 Ops/s | 24.4105 Ops/s | -3.63% |
| test_populate_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 0.2047s | 45.2123ms | 22.1179 Ops/s | 23.4627 Ops/s | **-5.73%** |
| test_populate_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 0.1971s | 41.9915ms | 23.8144 Ops/s | 22.1013 Ops/s | **+7.75%** |
| test_populate_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 0.1867s | 41.8494ms | 23.8952 Ops/s | 24.1772 Ops/s | -1.17% |
| test_populate_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 0.1961s | 42.8442ms | 23.3404 Ops/s | 23.9019 Ops/s | -2.35% |
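
The Change column appears to be the relative difference between the Ops column and the Ops on Repo HEAD column. The sketch below recomputes it for three rows of the CPU table above. The function names and the 5% threshold are assumptions: the threshold is inferred from which entries the bot marks in bold (in this run, every bold entry exceeds 5% in magnitude, and the bold entries number 17 positive and 10 negative, matching the Improved and Worsened counts), not from anything the report states.

```python
THRESHOLD = 5.0  # assumption: inferred from the bold entries, not stated in the report


def relative_change(ops: float, ops_head: float) -> float:
    """Percent change in throughput of the PR relative to repo HEAD."""
    return (ops - ops_head) / ops_head * 100.0


def is_flagged(change: float, threshold: float = THRESHOLD) -> bool:
    """True when the magnitude of the change exceeds the assumed threshold."""
    return abs(change) > threshold


# (Ops, Ops on Repo HEAD) pairs copied from the CPU table above.
rows = {
    "test_single": (5.3861, 4.6913),
    "test_sync": (10.2217, 8.0327),
    "test_td3_speed": (56.3258, 61.6967),
}

for name, (ops, head) in rows.items():
    change = relative_change(ops, head)
    print(f"{name}: {change:+.2f}% flagged={is_flagged(change)}")
```

Because the table shows rounded throughputs, a recomputed value can differ from the reported one in the last digit for some rows.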

github-actions bot commented Jun 30, 2023

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 89. Improved: 8. Worsened: 5.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_single | 0.1704s | 0.1699s | 5.8864 Ops/s | 5.1710 Ops/s | **+13.83%** |
| test_sync | 90.0793ms | 88.0239ms | 11.3606 Ops/s | 9.9377 Ops/s | **+14.32%** |
| test_async | 0.1721s | 86.1629ms | 11.6059 Ops/s | 10.0400 Ops/s | **+15.60%** |
| test_simple | 0.8805s | 0.7884s | 1.2683 Ops/s | 1.3161 Ops/s | -3.63% |
| test_transformed | 2.0528s | 1.9794s | 0.5052 Ops/s | 0.5171 Ops/s | -2.31% |
| test_serial | 2.4584s | 2.3818s | 0.4199 Ops/s | 0.4341 Ops/s | -3.28% |
| test_parallel | 1.8794s | 1.8074s | 0.5533 Ops/s | 0.5419 Ops/s | +2.11% |
| test_step_mdp_speed[True-True-True-True-True] | 0.2216ms | 43.2430μs | 23.1251 KOps/s | 23.6474 KOps/s | -2.21% |
| test_step_mdp_speed[True-True-True-True-False] | 0.2150ms | 24.0319μs | 41.6113 KOps/s | 42.4848 KOps/s | -2.06% |
| test_step_mdp_speed[True-True-True-False-True] | 0.1462ms | 30.2507μs | 33.0570 KOps/s | 33.5703 KOps/s | -1.53% |
| test_step_mdp_speed[True-True-True-False-False] | 41.8010μs | 16.8012μs | 59.5196 KOps/s | 60.9452 KOps/s | -2.34% |
| test_step_mdp_speed[True-True-False-True-True] | 0.1589ms | 44.3112μs | 22.5677 KOps/s | 23.0281 KOps/s | -2.00% |
| test_step_mdp_speed[True-True-False-True-False] | 50.0000μs | 25.6981μs | 38.9134 KOps/s | 40.0840 KOps/s | -2.92% |
| test_step_mdp_speed[True-True-False-False-True] | 0.1447ms | 32.2604μs | 30.9978 KOps/s | 31.8297 KOps/s | -2.61% |
| test_step_mdp_speed[True-True-False-False-False] | 48.0010μs | 18.7129μs | 53.4390 KOps/s | 55.2770 KOps/s | -3.33% |
| test_step_mdp_speed[True-False-True-True-True] | 0.1227ms | 45.7098μs | 21.8771 KOps/s | 22.0147 KOps/s | -0.62% |
| test_step_mdp_speed[True-False-True-True-False] | 61.6010μs | 27.5046μs | 36.3576 KOps/s | 37.6358 KOps/s | -3.40% |
| test_step_mdp_speed[True-False-True-False-True] | 0.1262ms | 31.9648μs | 31.2844 KOps/s | 31.9056 KOps/s | -1.95% |
| test_step_mdp_speed[True-False-True-False-False] | 50.2000μs | 18.4261μs | 54.2707 KOps/s | 55.9239 KOps/s | -2.96% |
| test_step_mdp_speed[True-False-False-True-True] | 0.1546ms | 47.0473μs | 21.2552 KOps/s | 21.3604 KOps/s | -0.49% |
| test_step_mdp_speed[True-False-False-True-False] | 58.1010μs | 28.8819μs | 34.6238 KOps/s | 35.5858 KOps/s | -2.70% |
| test_step_mdp_speed[True-False-False-False-True] | 0.1453ms | 33.5414μs | 29.8139 KOps/s | 30.6289 KOps/s | -2.66% |
| test_step_mdp_speed[True-False-False-False-False] | 67.2010μs | 20.0241μs | 49.9398 KOps/s | 51.0707 KOps/s | -2.21% |
| test_step_mdp_speed[False-True-True-True-True] | 0.1504ms | 46.0335μs | 21.7233 KOps/s | 21.9549 KOps/s | -1.05% |
| test_step_mdp_speed[False-True-True-True-False] | 60.6000μs | 27.3346μs | 36.5837 KOps/s | 37.6938 KOps/s | -2.94% |
| test_step_mdp_speed[False-True-True-False-True] | 0.1456ms | 37.0356μs | 27.0010 KOps/s | 27.3387 KOps/s | -1.24% |
| test_step_mdp_speed[False-True-True-False-False] | 0.2772ms | 20.4644μs | 48.8652 KOps/s | 50.0280 KOps/s | -2.32% |
| test_step_mdp_speed[False-True-False-True-True] | 0.1573ms | 47.8914μs | 20.8806 KOps/s | 21.0942 KOps/s | -1.01% |
| test_step_mdp_speed[False-True-False-True-False] | 0.1113ms | 28.9172μs | 34.5815 KOps/s | 35.2366 KOps/s | -1.86% |
| test_step_mdp_speed[False-True-False-False-True] | 0.1387ms | 38.3578μs | 26.0703 KOps/s | 26.2010 KOps/s | -0.50% |
| test_step_mdp_speed[False-True-False-False-False] | 52.0000μs | 22.1564μs | 45.1336 KOps/s | 46.5548 KOps/s | -3.05% |
| test_step_mdp_speed[False-False-True-True-True] | 0.1533ms | 48.9225μs | 20.4405 KOps/s | 20.8729 KOps/s | -2.07% |
| test_step_mdp_speed[False-False-True-True-False] | 0.1173ms | 30.5890μs | 32.6914 KOps/s | 33.5767 KOps/s | -2.64% |
| test_step_mdp_speed[False-False-True-False-True] | 63.1010μs | 39.2234μs | 25.4950 KOps/s | 25.9366 KOps/s | -1.70% |
| test_step_mdp_speed[False-False-True-False-False] | 0.1064ms | 21.7969μs | 45.8780 KOps/s | 47.0852 KOps/s | -2.56% |
| test_step_mdp_speed[False-False-False-True-True] | 0.2385ms | 49.9431μs | 20.0228 KOps/s | 20.3246 KOps/s | -1.49% |
| test_step_mdp_speed[False-False-False-True-False] | 0.1098ms | 32.0035μs | 31.2466 KOps/s | 31.7519 KOps/s | -1.59% |
| test_step_mdp_speed[False-False-False-False-True] | 0.1545ms | 39.6949μs | 25.1921 KOps/s | 25.6122 KOps/s | -1.64% |
| test_step_mdp_speed[False-False-False-False-False] | 50.0010μs | 23.4226μs | 42.6939 KOps/s | 43.9301 KOps/s | -2.81% |
| test_values[generalized_advantage_estimate-True-True] | 16.7459ms | 16.1545ms | 61.9022 Ops/s | 61.3062 Ops/s | +0.97% |
| test_values[vec_generalized_advantage_estimate-True-True] | 56.5632ms | 50.9361ms | 19.6325 Ops/s | 19.1710 Ops/s | +2.41% |
| test_values[td0_return_estimate-False-False] | 0.4517ms | 0.3064ms | 3.2637 KOps/s | 3.4089 KOps/s | -4.26% |
| test_values[td1_return_estimate-False-False] | 15.8342ms | 15.5858ms | 64.1608 Ops/s | 63.5724 Ops/s | +0.93% |
| test_values[vec_td1_return_estimate-False-False] | 53.1953ms | 50.6735ms | 19.7342 Ops/s | 19.2212 Ops/s | +2.67% |
| test_values[td_lambda_return_estimate-True-False] | 39.4961ms | 38.4341ms | 26.0186 Ops/s | 26.0583 Ops/s | -0.15% |
| test_values[vec_td_lambda_return_estimate-True-False] | 58.9599ms | 51.1611ms | 19.5461 Ops/s | 19.0920 Ops/s | +2.38% |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 13.5418ms | 13.2792ms | 75.3059 Ops/s | 73.1945 Ops/s | +2.88% |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 7.5598ms | 4.2535ms | 235.0991 Ops/s | 228.9204 Ops/s | +2.70% |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 2.2000ms | 0.5897ms | 1.6958 KOps/s | 1.6728 KOps/s | +1.38% |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 67.9402ms | 67.3996ms | 14.8369 Ops/s | 14.9897 Ops/s | -1.02% |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 6.3622ms | 3.9343ms | 254.1773 Ops/s | 261.3731 Ops/s | -2.75% |
| test_dqn_speed | 2.6318ms | 1.9935ms | 501.6255 Ops/s | 487.4321 Ops/s | +2.91% |
| test_ddpg_speed | 10.3786ms | 3.2902ms | 303.9293 Ops/s | 280.2547 Ops/s | **+8.45%** |
| test_sac_speed | 12.0594ms | 10.2940ms | 97.1436 Ops/s | 95.1448 Ops/s | +2.10% |
| test_redq_speed | 24.8585ms | 18.4094ms | 54.3202 Ops/s | 54.6378 Ops/s | -0.58% |
| test_redq_deprec_speed | 16.7859ms | 15.6741ms | 63.7995 Ops/s | 63.0728 Ops/s | +1.15% |
| test_td3_speed | 19.5024ms | 14.5638ms | 68.6632 Ops/s | 70.9143 Ops/s | -3.17% |
| test_cql_speed | 47.8674ms | 40.6239ms | 24.6160 Ops/s | 22.8095 Ops/s | **+7.92%** |
| test_a2c_speed | 9.0028ms | 7.4013ms | 135.1113 Ops/s | 142.9486 Ops/s | **-5.48%** |
| test_ppo_speed | 20.8610ms | 8.0706ms | 123.9069 Ops/s | 135.4278 Ops/s | **-8.51%** |
| test_reinforce_speed | 7.1560ms | 5.5983ms | 178.6252 Ops/s | 188.5206 Ops/s | **-5.25%** |
| test_iql_speed | 29.2524ms | 27.4961ms | 36.3688 Ops/s | 35.1077 Ops/s | +3.59% |
| test_sample_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 5.4035ms | 4.5511ms | 219.7267 Ops/s | 196.5339 Ops/s | **+11.80%** |
| test_sample_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 9.1175ms | 4.6834ms | 213.5220 Ops/s | 219.0414 Ops/s | -2.52% |
| test_sample_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 9.4895ms | 4.7057ms | 212.5090 Ops/s | 217.1204 Ops/s | -2.12% |
| test_sample_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 0.1520s | 5.1868ms | 192.7964 Ops/s | 225.6472 Ops/s | **-14.56%** |
| test_sample_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 8.0227ms | 4.6747ms | 213.9182 Ops/s | 211.6111 Ops/s | +1.09% |
| test_sample_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.1857s | 5.4546ms | 183.3301 Ops/s | 217.4018 Ops/s | **-15.67%** |
| test_sample_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 11.5651ms | 4.5467ms | 219.9393 Ops/s | 193.3705 Ops/s | **+13.74%** |
| test_sample_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 9.7504ms | 4.6618ms | 214.5090 Ops/s | 216.5552 Ops/s | -0.94% |
| test_sample_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 7.8136ms | 4.6453ms | 215.2733 Ops/s | 217.5755 Ops/s | -1.06% |
| test_iterate_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 0.1470s | 5.1432ms | 194.4331 Ops/s | 197.0654 Ops/s | -1.34% |
| test_iterate_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 7.7157ms | 4.6356ms | 215.7223 Ops/s | 217.0088 Ops/s | -0.59% |
| test_iterate_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 11.9930ms | 4.7296ms | 211.4336 Ops/s | 184.2382 Ops/s | **+14.76%** |
| test_iterate_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 5.1093ms | 4.5214ms | 221.1718 Ops/s | 226.0547 Ops/s | -2.16% |
| test_iterate_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 9.4435ms | 4.7244ms | 211.6661 Ops/s | 216.9080 Ops/s | -2.42% |
| test_iterate_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 7.3330ms | 4.6999ms | 212.7685 Ops/s | 217.7595 Ops/s | -2.29% |
| test_iterate_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 0.1482s | 5.1920ms | 192.6025 Ops/s | 191.0213 Ops/s | +0.83% |
| test_iterate_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 8.1383ms | 4.6986ms | 212.8295 Ops/s | 214.5232 Ops/s | -0.79% |
| test_iterate_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 9.4354ms | 4.7232ms | 211.7204 Ops/s | 216.8997 Ops/s | -2.39% |
| test_populate_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.3343s | 42.0968ms | 23.7548 Ops/s | 24.9902 Ops/s | -4.94% |
| test_populate_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 0.1880s | 35.7583ms | 27.9655 Ops/s | 28.1401 Ops/s | -0.62% |
| test_populate_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 0.1859s | 35.4190ms | 28.2334 Ops/s | 28.0174 Ops/s | +0.77% |
| test_populate_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.1838s | 35.1400ms | 28.4576 Ops/s | 28.3727 Ops/s | +0.30% |
| test_populate_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 0.1895s | 35.5189ms | 28.1540 Ops/s | 28.1951 Ops/s | -0.15% |
| test_populate_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 0.1900s | 38.8619ms | 25.7321 Ops/s | 25.4413 Ops/s | +1.14% |
| test_populate_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 0.1934s | 36.2949ms | 27.5521 Ops/s | 27.8152 Ops/s | -0.95% |
| test_populate_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 0.1937s | 35.6335ms | 28.0635 Ops/s | 27.9271 Ops/s | +0.49% |
| test_populate_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 0.1928s | 35.7775ms | 27.9505 Ops/s | 28.4495 Ops/s | -1.75% |

@matteobettini (Contributor) left a comment

LGTM, I just have a question about the 2 clones. These are very expensive, so I just want to make sure there is absolutely no way to avoid them.

torchrl/collectors/collectors.py (2 resolved review threads)
@vmoens merged commit fcb04e4 into main Jul 7, 2023
@vmoens deleted the faster_collector_rollout branch July 7, 2023 15:55
Labels: CLA Signed, performance

3 participants