[Refactor,Performance] Faster collectors (bis) #1331

vmoens · 2023-06-28T11:12:40Z

Stacking onto an existing TensorDict performs better than `torch.stack` (measured against `stack` followed by a call to `contiguous()`; without that call no real stacking occurs, because the result is only a lazy stack).
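To illustrate the comparison above, here is a minimal sketch that uses plain `torch` tensors instead of TensorDicts. This substitution keeps the example self-contained, and it does not reproduce the collector code from this PR. With plain tensors, `torch.stack` always materializes a new contiguous tensor, so it stands in for the `stack` + `contiguous()` path. `torch.stack(..., out=buffer)` plays the role of stacking onto a preallocated destination. The shapes, the iteration count, and the helper names (`stack_fresh`, `stack_onto`, `mean_time`) are hypothetical, and timings will vary by machine.

```python
import time

import torch

# Hypothetical rollout: 32 steps, each producing a batch of 8 observations of size 84x84.
N_STEPS, BATCH, H, W = 32, 8, 84, 84
frames = [torch.randn(BATCH, H, W) for _ in range(N_STEPS)]

# Destination allocated once, outside the timed loop (the "stack onto" target).
buffer = torch.empty(N_STEPS, BATCH, H, W)


def stack_fresh(steps):
    # Allocates a brand-new contiguous output tensor on every call.
    return torch.stack(steps, dim=0)


def stack_onto(steps, out):
    # Writes the steps into an existing tensor of matching shape
    # without allocating a new output tensor.
    return torch.stack(steps, dim=0, out=out)


def mean_time(fn, n_iter=100):
    # Average wall-clock seconds per call.
    start = time.perf_counter()
    for _ in range(n_iter):
        fn()
    return (time.perf_counter() - start) / n_iter


t_fresh = mean_time(lambda: stack_fresh(frames))
t_onto = mean_time(lambda: stack_onto(frames, buffer))
print(f"fresh stack: {t_fresh * 1e3:.3f} ms, stack onto buffer: {t_onto * 1e3:.3f} ms")
```

Preallocating only pays off when the output shape stays fixed across iterations. This is typically the case when a collector gathers the same number of frames for every batch.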


github-actions · 2023-06-30T14:38:44Z

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 89. Improved: 17. Worsened: 10.


Name	Max time	Mean time	Ops (this PR)	Ops on repo `HEAD`	Change
test_single	0.1892s	0.1857s	5.3861 Ops/s	4.6913 Ops/s	$\textbf{\color{#35bf28}+14.81\%}$
test_sync	0.1013s	97.8309ms	10.2217 Ops/s	8.0327 Ops/s	$\textbf{\color{#35bf28}+27.25\%}$
test_async	0.1854s	95.0111ms	10.5251 Ops/s	8.4431 Ops/s	$\textbf{\color{#35bf28}+24.66\%}$
test_simple	0.9737s	0.8832s	1.1323 Ops/s	1.1198 Ops/s	$\color{#35bf28}+1.11\%$
test_transformed	2.2864s	2.2053s	0.4535 Ops/s	0.4736 Ops/s	$\color{#d91a1a}-4.25\%$
test_serial	2.8114s	2.7485s	0.3638 Ops/s	0.3778 Ops/s	$\color{#d91a1a}-3.69\%$
test_parallel	2.3747s	2.1575s	0.4635 Ops/s	0.4774 Ops/s	$\color{#d91a1a}-2.91\%$
test_step_mdp_speed[True-True-True-True-True]	1.3202ms	54.3867μs	18.3869 KOps/s	18.7080 KOps/s	$\color{#d91a1a}-1.72\%$
test_step_mdp_speed[True-True-True-True-False]	2.2762ms	30.9771μs	32.2819 KOps/s	32.4563 KOps/s	$\color{#d91a1a}-0.54\%$
test_step_mdp_speed[True-True-True-False-True]	4.9490ms	40.8395μs	24.4861 KOps/s	24.1757 KOps/s	$\color{#35bf28}+1.28\%$
test_step_mdp_speed[True-True-True-False-False]	0.4803ms	22.0723μs	45.3056 KOps/s	44.3045 KOps/s	$\color{#35bf28}+2.26\%$
test_step_mdp_speed[True-True-False-True-True]	0.3837ms	55.3917μs	18.0532 KOps/s	17.8371 KOps/s	$\color{#35bf28}+1.21\%$
test_step_mdp_speed[True-True-False-True-False]	0.5658ms	32.6650μs	30.6138 KOps/s	28.8716 KOps/s	$\textbf{\color{#35bf28}+6.03\%}$
test_step_mdp_speed[True-True-False-False-True]	1.1439ms	42.2791μs	23.6524 KOps/s	23.1702 KOps/s	$\color{#35bf28}+2.08\%$
test_step_mdp_speed[True-True-False-False-False]	0.5493ms	24.3059μs	41.1422 KOps/s	40.4941 KOps/s	$\color{#35bf28}+1.60\%$
test_step_mdp_speed[True-False-True-True-True]	0.6469ms	55.9424μs	17.8755 KOps/s	17.0418 KOps/s	$\color{#35bf28}+4.89\%$
test_step_mdp_speed[True-False-True-True-False]	2.2910ms	34.0688μs	29.3524 KOps/s	28.6408 KOps/s	$\color{#35bf28}+2.48\%$
test_step_mdp_speed[True-False-True-False-True]	0.6002ms	42.3791μs	23.5965 KOps/s	23.8015 KOps/s	$\color{#d91a1a}-0.86\%$
test_step_mdp_speed[True-False-True-False-False]	0.3552ms	24.0170μs	41.6372 KOps/s	40.0353 KOps/s	$\color{#35bf28}+4.00\%$
test_step_mdp_speed[True-False-False-True-True]	1.5068ms	59.1872μs	16.8955 KOps/s	15.9275 KOps/s	$\textbf{\color{#35bf28}+6.08\%}$
test_step_mdp_speed[True-False-False-True-False]	0.9343ms	35.6961μs	28.0142 KOps/s	26.9587 KOps/s	$\color{#35bf28}+3.92\%$
test_step_mdp_speed[True-False-False-False-True]	4.3551ms	44.4860μs	22.4790 KOps/s	19.1732 KOps/s	$\textbf{\color{#35bf28}+17.24\%}$
test_step_mdp_speed[True-False-False-False-False]	0.5197ms	25.9347μs	38.5583 KOps/s	37.3886 KOps/s	$\color{#35bf28}+3.13\%$
test_step_mdp_speed[False-True-True-True-True]	4.4477ms	59.0413μs	16.9373 KOps/s	16.7771 KOps/s	$\color{#35bf28}+0.96\%$
test_step_mdp_speed[False-True-True-True-False]	1.3981ms	34.6571μs	28.8542 KOps/s	29.5698 KOps/s	$\color{#d91a1a}-2.42\%$
test_step_mdp_speed[False-True-True-False-True]	0.3970ms	47.8693μs	20.8902 KOps/s	18.6256 KOps/s	$\textbf{\color{#35bf28}+12.16\%}$
test_step_mdp_speed[False-True-True-False-False]	6.4158ms	27.2973μs	36.6337 KOps/s	36.0154 KOps/s	$\color{#35bf28}+1.72\%$
test_step_mdp_speed[False-True-False-True-True]	0.5516ms	58.9145μs	16.9738 KOps/s	17.1028 KOps/s	$\color{#d91a1a}-0.75\%$
test_step_mdp_speed[False-True-False-True-False]	1.1672ms	35.9291μs	27.8326 KOps/s	27.1799 KOps/s	$\color{#35bf28}+2.40\%$
test_step_mdp_speed[False-True-False-False-True]	0.5573ms	49.4166μs	20.2361 KOps/s	20.3988 KOps/s	$\color{#d91a1a}-0.80\%$
test_step_mdp_speed[False-True-False-False-False]	0.9514ms	28.2428μs	35.4073 KOps/s	35.5741 KOps/s	$\color{#d91a1a}-0.47\%$
test_step_mdp_speed[False-False-True-True-True]	3.1293ms	61.8573μs	16.1662 KOps/s	16.8778 KOps/s	$\color{#d91a1a}-4.22\%$
test_step_mdp_speed[False-False-True-True-False]	0.6172ms	37.5420μs	26.6369 KOps/s	26.0692 KOps/s	$\color{#35bf28}+2.18\%$
test_step_mdp_speed[False-False-True-False-True]	0.1831ms	48.1761μs	20.7572 KOps/s	18.1208 KOps/s	$\textbf{\color{#35bf28}+14.55\%}$
test_step_mdp_speed[False-False-True-False-False]	0.4270ms	27.9910μs	35.7257 KOps/s	33.5171 KOps/s	$\textbf{\color{#35bf28}+6.59\%}$
test_step_mdp_speed[False-False-False-True-True]	2.9801ms	62.7129μs	15.9457 KOps/s	16.1196 KOps/s	$\color{#d91a1a}-1.08\%$
test_step_mdp_speed[False-False-False-True-False]	2.3710ms	39.5908μs	25.2584 KOps/s	25.4783 KOps/s	$\color{#d91a1a}-0.86\%$
test_step_mdp_speed[False-False-False-False-True]	0.9253ms	49.5415μs	20.1851 KOps/s	18.7217 KOps/s	$\textbf{\color{#35bf28}+7.82\%}$
test_step_mdp_speed[False-False-False-False-False]	4.0849ms	29.7339μs	33.6316 KOps/s	30.5640 KOps/s	$\textbf{\color{#35bf28}+10.04\%}$
test_values[generalized_advantage_estimate-True-True]	21.6707ms	18.8630ms	53.0139 Ops/s	53.6339 Ops/s	$\color{#d91a1a}-1.16\%$
test_values[vec_generalized_advantage_estimate-True-True]	75.5466ms	66.1030ms	15.1279 Ops/s	14.4239 Ops/s	$\color{#35bf28}+4.88\%$
test_values[td0_return_estimate-False-False]	0.7134ms	0.3071ms	3.2564 KOps/s	2.9574 KOps/s	$\textbf{\color{#35bf28}+10.11\%}$
test_values[td1_return_estimate-False-False]	19.4841ms	18.0950ms	55.2639 Ops/s	58.2740 Ops/s	$\textbf{\color{#d91a1a}-5.17\%}$
test_values[vec_td1_return_estimate-False-False]	86.9889ms	67.0895ms	14.9055 Ops/s	14.9019 Ops/s	$\color{#35bf28}+0.02\%$
test_values[td_lambda_return_estimate-True-False]	53.5987ms	45.7055ms	21.8792 Ops/s	21.5424 Ops/s	$\color{#35bf28}+1.56\%$
test_values[vec_td_lambda_return_estimate-True-False]	78.4416ms	66.6382ms	15.0064 Ops/s	14.9267 Ops/s	$\color{#35bf28}+0.53\%$
test_gae_speed[generalized_advantage_estimate-False-1-512]	15.5938ms	14.3389ms	69.7404 Ops/s	71.5780 Ops/s	$\color{#d91a1a}-2.57\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512]	6.7497ms	4.3936ms	227.6047 Ops/s	233.2881 Ops/s	$\color{#d91a1a}-2.44\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512]	2.3877ms	0.6359ms	1.5726 KOps/s	1.5499 KOps/s	$\color{#35bf28}+1.47\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512]	84.4087ms	74.7777ms	13.3730 Ops/s	14.0316 Ops/s	$\color{#d91a1a}-4.69\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512]	9.6492ms	5.5209ms	181.1298 Ops/s	186.2370 Ops/s	$\color{#d91a1a}-2.74\%$
test_dqn_speed	7.9113ms	2.3938ms	417.7523 Ops/s	423.0881 Ops/s	$\color{#d91a1a}-1.26\%$
test_ddpg_speed	9.9337ms	4.4004ms	227.2501 Ops/s	228.6864 Ops/s	$\color{#d91a1a}-0.63\%$
test_sac_speed	16.6543ms	12.1767ms	82.1242 Ops/s	79.7428 Ops/s	$\color{#35bf28}+2.99\%$
test_redq_speed	30.3917ms	24.1900ms	41.3394 Ops/s	42.2615 Ops/s	$\color{#d91a1a}-2.18\%$
test_redq_deprec_speed	22.8509ms	20.3040ms	49.2514 Ops/s	50.7549 Ops/s	$\color{#d91a1a}-2.96\%$
test_td3_speed	23.2361ms	17.7538ms	56.3258 Ops/s	61.6967 Ops/s	$\textbf{\color{#d91a1a}-8.71\%}$
test_cql_speed	54.7995ms	47.8879ms	20.8821 Ops/s	17.7078 Ops/s	$\textbf{\color{#35bf28}+17.93\%}$
test_a2c_speed	15.2632ms	10.5665ms	94.6384 Ops/s	98.6468 Ops/s	$\color{#d91a1a}-4.06\%$
test_ppo_speed	20.7207ms	11.2802ms	88.6509 Ops/s	88.3122 Ops/s	$\color{#35bf28}+0.38\%$
test_reinforce_speed	14.7695ms	8.5470ms	116.9999 Ops/s	113.6091 Ops/s	$\color{#35bf28}+2.98\%$
test_iql_speed	45.2048ms	41.6881ms	23.9877 Ops/s	24.5867 Ops/s	$\color{#d91a1a}-2.44\%$
test_sample_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-4000]	0.1409s	5.6380ms	177.3691 Ops/s	207.9854 Ops/s	$\textbf{\color{#d91a1a}-14.72\%}$
test_sample_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000]	7.6142ms	5.0055ms	199.7810 Ops/s	200.3868 Ops/s	$\color{#d91a1a}-0.30\%$
test_sample_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000]	9.8131ms	5.1683ms	193.4877 Ops/s	212.6838 Ops/s	$\textbf{\color{#d91a1a}-9.03\%}$
test_sample_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000]	12.1216ms	5.1335ms	194.7987 Ops/s	183.9582 Ops/s	$\textbf{\color{#35bf28}+5.89\%}$
test_sample_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000]	8.2018ms	5.1565ms	193.9308 Ops/s	208.1475 Ops/s	$\textbf{\color{#d91a1a}-6.83\%}$
test_sample_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000]	7.2151ms	5.0109ms	199.5639 Ops/s	170.9874 Ops/s	$\textbf{\color{#35bf28}+16.71\%}$
test_sample_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000]	0.1467s	5.7025ms	175.3609 Ops/s	212.6741 Ops/s	$\textbf{\color{#d91a1a}-17.54\%}$
test_sample_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000]	9.6422ms	5.0883ms	196.5275 Ops/s	195.2171 Ops/s	$\color{#35bf28}+0.67\%$
test_sample_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000]	14.8343ms	5.2845ms	189.2316 Ops/s	192.0651 Ops/s	$\color{#d91a1a}-1.48\%$
test_iterate_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-4000]	9.3542ms	4.9993ms	200.0278 Ops/s	202.0956 Ops/s	$\color{#d91a1a}-1.02\%$
test_iterate_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000]	9.3471ms	5.1211ms	195.2707 Ops/s	198.3255 Ops/s	$\color{#d91a1a}-1.54\%$
test_iterate_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000]	0.1802s	6.0057ms	166.5075 Ops/s	191.9995 Ops/s	$\textbf{\color{#d91a1a}-13.28\%}$
test_iterate_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000]	9.5452ms	5.0945ms	196.2883 Ops/s	204.6935 Ops/s	$\color{#d91a1a}-4.11\%$
test_iterate_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000]	11.8956ms	5.1343ms	194.7691 Ops/s	197.6823 Ops/s	$\color{#d91a1a}-1.47\%$
test_iterate_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000]	8.9045ms	5.0053ms	199.7900 Ops/s	191.6021 Ops/s	$\color{#35bf28}+4.27\%$
test_iterate_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000]	0.1405s	5.7054ms	175.2720 Ops/s	196.5704 Ops/s	$\textbf{\color{#d91a1a}-10.83\%}$
test_iterate_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000]	7.3654ms	5.1167ms	195.4396 Ops/s	197.1261 Ops/s	$\color{#d91a1a}-0.86\%$
test_iterate_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000]	0.1788s	5.8603ms	170.6403 Ops/s	197.9995 Ops/s	$\textbf{\color{#d91a1a}-13.82\%}$
test_populate_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-400]	0.3916s	45.8901ms	21.7912 Ops/s	22.4059 Ops/s	$\color{#d91a1a}-2.74\%$
test_populate_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400]	0.1861s	41.3057ms	24.2097 Ops/s	22.0096 Ops/s	$\textbf{\color{#35bf28}+10.00\%}$
test_populate_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400]	0.1884s	42.0834ms	23.7624 Ops/s	23.7557 Ops/s	$\color{#35bf28}+0.03\%$
test_populate_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400]	0.2077s	41.9166ms	23.8569 Ops/s	23.6065 Ops/s	$\color{#35bf28}+1.06\%$
test_populate_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400]	0.1916s	42.5099ms	23.5239 Ops/s	24.4105 Ops/s	$\color{#d91a1a}-3.63\%$
test_populate_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400]	0.2047s	45.2123ms	22.1179 Ops/s	23.4627 Ops/s	$\textbf{\color{#d91a1a}-5.73\%}$
test_populate_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-400]	0.1971s	41.9915ms	23.8144 Ops/s	22.1013 Ops/s	$\textbf{\color{#35bf28}+7.75\%}$
test_populate_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400]	0.1867s	41.8494ms	23.8952 Ops/s	24.1772 Ops/s	$\color{#d91a1a}-1.17\%$
test_populate_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400]	0.1961s	42.8442ms	23.3404 Ops/s	23.9019 Ops/s	$\color{#d91a1a}-2.35\%$
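The `Change` column can be recomputed from the two throughput columns as (Ops - Ops on `HEAD`) / Ops on `HEAD`. Every bolded row exceeds 5% in magnitude, and every non-bolded row stays below it. This suggests that the "Improved"/"Worsened" counts use a ±5% threshold. The threshold is an inference from the table, not a documented setting. With that threshold, the bolded rows account exactly for the 17 improved and 10 worsened CPU results. The sketch below recomputes a few rows copied from the CPU table. The helper names are hypothetical. Last-digit discrepancies (for example, `test_simple`) come from the rounded Ops values shown in the table.

```python
# Assumption: threshold inferred from which rows are bolded, not from the benchmark tool's docs.
THRESHOLD_PCT = 5.0

# (benchmark name, Ops on this PR, Ops on repo HEAD, Change as reported in the table)
ROWS = [
    ("test_single", 5.3861, 4.6913, 14.81),
    ("test_sync", 10.2217, 8.0327, 27.25),
    ("test_simple", 1.1323, 1.1198, 1.11),
    ("test_td3_speed", 56.3258, 61.6967, -8.71),
    ("test_sample_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-4000]", 177.3691, 207.9854, -14.72),
]


def pct_change(ops_pr, ops_head):
    """Relative throughput change in percent; positive means the PR is faster."""
    return 100.0 * (ops_pr - ops_head) / ops_head


def classify(change_pct, threshold=THRESHOLD_PCT):
    """Bucket a change the way the summary line appears to count it."""
    if change_pct > threshold:
        return "improved"
    if change_pct < -threshold:
        return "worsened"
    return "within noise"


for name, ops_pr, ops_head, reported in ROWS:
    change = pct_change(ops_pr, ops_head)
    print(f"{name}: {change:+.2f}% (reported {reported:+.2f}%) -> {classify(change)}")
```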

github-actions · 2023-06-30T14:40:53Z

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 89. Improved: 8. Worsened: 5.


Name	Max time	Mean time	Ops (this PR)	Ops on repo `HEAD`	Change
test_single	0.1704s	0.1699s	5.8864 Ops/s	5.1710 Ops/s	$\textbf{\color{#35bf28}+13.83\%}$
test_sync	90.0793ms	88.0239ms	11.3606 Ops/s	9.9377 Ops/s	$\textbf{\color{#35bf28}+14.32\%}$
test_async	0.1721s	86.1629ms	11.6059 Ops/s	10.0400 Ops/s	$\textbf{\color{#35bf28}+15.60\%}$
test_simple	0.8805s	0.7884s	1.2683 Ops/s	1.3161 Ops/s	$\color{#d91a1a}-3.63\%$
test_transformed	2.0528s	1.9794s	0.5052 Ops/s	0.5171 Ops/s	$\color{#d91a1a}-2.31\%$
test_serial	2.4584s	2.3818s	0.4199 Ops/s	0.4341 Ops/s	$\color{#d91a1a}-3.28\%$
test_parallel	1.8794s	1.8074s	0.5533 Ops/s	0.5419 Ops/s	$\color{#35bf28}+2.11\%$
test_step_mdp_speed[True-True-True-True-True]	0.2216ms	43.2430μs	23.1251 KOps/s	23.6474 KOps/s	$\color{#d91a1a}-2.21\%$
test_step_mdp_speed[True-True-True-True-False]	0.2150ms	24.0319μs	41.6113 KOps/s	42.4848 KOps/s	$\color{#d91a1a}-2.06\%$
test_step_mdp_speed[True-True-True-False-True]	0.1462ms	30.2507μs	33.0570 KOps/s	33.5703 KOps/s	$\color{#d91a1a}-1.53\%$
test_step_mdp_speed[True-True-True-False-False]	41.8010μs	16.8012μs	59.5196 KOps/s	60.9452 KOps/s	$\color{#d91a1a}-2.34\%$
test_step_mdp_speed[True-True-False-True-True]	0.1589ms	44.3112μs	22.5677 KOps/s	23.0281 KOps/s	$\color{#d91a1a}-2.00\%$
test_step_mdp_speed[True-True-False-True-False]	50.0000μs	25.6981μs	38.9134 KOps/s	40.0840 KOps/s	$\color{#d91a1a}-2.92\%$
test_step_mdp_speed[True-True-False-False-True]	0.1447ms	32.2604μs	30.9978 KOps/s	31.8297 KOps/s	$\color{#d91a1a}-2.61\%$
test_step_mdp_speed[True-True-False-False-False]	48.0010μs	18.7129μs	53.4390 KOps/s	55.2770 KOps/s	$\color{#d91a1a}-3.33\%$
test_step_mdp_speed[True-False-True-True-True]	0.1227ms	45.7098μs	21.8771 KOps/s	22.0147 KOps/s	$\color{#d91a1a}-0.62\%$
test_step_mdp_speed[True-False-True-True-False]	61.6010μs	27.5046μs	36.3576 KOps/s	37.6358 KOps/s	$\color{#d91a1a}-3.40\%$
test_step_mdp_speed[True-False-True-False-True]	0.1262ms	31.9648μs	31.2844 KOps/s	31.9056 KOps/s	$\color{#d91a1a}-1.95\%$
test_step_mdp_speed[True-False-True-False-False]	50.2000μs	18.4261μs	54.2707 KOps/s	55.9239 KOps/s	$\color{#d91a1a}-2.96\%$
test_step_mdp_speed[True-False-False-True-True]	0.1546ms	47.0473μs	21.2552 KOps/s	21.3604 KOps/s	$\color{#d91a1a}-0.49\%$
test_step_mdp_speed[True-False-False-True-False]	58.1010μs	28.8819μs	34.6238 KOps/s	35.5858 KOps/s	$\color{#d91a1a}-2.70\%$
test_step_mdp_speed[True-False-False-False-True]	0.1453ms	33.5414μs	29.8139 KOps/s	30.6289 KOps/s	$\color{#d91a1a}-2.66\%$
test_step_mdp_speed[True-False-False-False-False]	67.2010μs	20.0241μs	49.9398 KOps/s	51.0707 KOps/s	$\color{#d91a1a}-2.21\%$
test_step_mdp_speed[False-True-True-True-True]	0.1504ms	46.0335μs	21.7233 KOps/s	21.9549 KOps/s	$\color{#d91a1a}-1.05\%$
test_step_mdp_speed[False-True-True-True-False]	60.6000μs	27.3346μs	36.5837 KOps/s	37.6938 KOps/s	$\color{#d91a1a}-2.94\%$
test_step_mdp_speed[False-True-True-False-True]	0.1456ms	37.0356μs	27.0010 KOps/s	27.3387 KOps/s	$\color{#d91a1a}-1.24\%$
test_step_mdp_speed[False-True-True-False-False]	0.2772ms	20.4644μs	48.8652 KOps/s	50.0280 KOps/s	$\color{#d91a1a}-2.32\%$
test_step_mdp_speed[False-True-False-True-True]	0.1573ms	47.8914μs	20.8806 KOps/s	21.0942 KOps/s	$\color{#d91a1a}-1.01\%$
test_step_mdp_speed[False-True-False-True-False]	0.1113ms	28.9172μs	34.5815 KOps/s	35.2366 KOps/s	$\color{#d91a1a}-1.86\%$
test_step_mdp_speed[False-True-False-False-True]	0.1387ms	38.3578μs	26.0703 KOps/s	26.2010 KOps/s	$\color{#d91a1a}-0.50\%$
test_step_mdp_speed[False-True-False-False-False]	52.0000μs	22.1564μs	45.1336 KOps/s	46.5548 KOps/s	$\color{#d91a1a}-3.05\%$
test_step_mdp_speed[False-False-True-True-True]	0.1533ms	48.9225μs	20.4405 KOps/s	20.8729 KOps/s	$\color{#d91a1a}-2.07\%$
test_step_mdp_speed[False-False-True-True-False]	0.1173ms	30.5890μs	32.6914 KOps/s	33.5767 KOps/s	$\color{#d91a1a}-2.64\%$
test_step_mdp_speed[False-False-True-False-True]	63.1010μs	39.2234μs	25.4950 KOps/s	25.9366 KOps/s	$\color{#d91a1a}-1.70\%$
test_step_mdp_speed[False-False-True-False-False]	0.1064ms	21.7969μs	45.8780 KOps/s	47.0852 KOps/s	$\color{#d91a1a}-2.56\%$
test_step_mdp_speed[False-False-False-True-True]	0.2385ms	49.9431μs	20.0228 KOps/s	20.3246 KOps/s	$\color{#d91a1a}-1.49\%$
test_step_mdp_speed[False-False-False-True-False]	0.1098ms	32.0035μs	31.2466 KOps/s	31.7519 KOps/s	$\color{#d91a1a}-1.59\%$
test_step_mdp_speed[False-False-False-False-True]	0.1545ms	39.6949μs	25.1921 KOps/s	25.6122 KOps/s	$\color{#d91a1a}-1.64\%$
test_step_mdp_speed[False-False-False-False-False]	50.0010μs	23.4226μs	42.6939 KOps/s	43.9301 KOps/s	$\color{#d91a1a}-2.81\%$
test_values[generalized_advantage_estimate-True-True]	16.7459ms	16.1545ms	61.9022 Ops/s	61.3062 Ops/s	$\color{#35bf28}+0.97\%$
test_values[vec_generalized_advantage_estimate-True-True]	56.5632ms	50.9361ms	19.6325 Ops/s	19.1710 Ops/s	$\color{#35bf28}+2.41\%$
test_values[td0_return_estimate-False-False]	0.4517ms	0.3064ms	3.2637 KOps/s	3.4089 KOps/s	$\color{#d91a1a}-4.26\%$
test_values[td1_return_estimate-False-False]	15.8342ms	15.5858ms	64.1608 Ops/s	63.5724 Ops/s	$\color{#35bf28}+0.93\%$
test_values[vec_td1_return_estimate-False-False]	53.1953ms	50.6735ms	19.7342 Ops/s	19.2212 Ops/s	$\color{#35bf28}+2.67\%$
test_values[td_lambda_return_estimate-True-False]	39.4961ms	38.4341ms	26.0186 Ops/s	26.0583 Ops/s	$\color{#d91a1a}-0.15\%$
test_values[vec_td_lambda_return_estimate-True-False]	58.9599ms	51.1611ms	19.5461 Ops/s	19.0920 Ops/s	$\color{#35bf28}+2.38\%$
test_gae_speed[generalized_advantage_estimate-False-1-512]	13.5418ms	13.2792ms	75.3059 Ops/s	73.1945 Ops/s	$\color{#35bf28}+2.88\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512]	7.5598ms	4.2535ms	235.0991 Ops/s	228.9204 Ops/s	$\color{#35bf28}+2.70\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512]	2.2000ms	0.5897ms	1.6958 KOps/s	1.6728 KOps/s	$\color{#35bf28}+1.38\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512]	67.9402ms	67.3996ms	14.8369 Ops/s	14.9897 Ops/s	$\color{#d91a1a}-1.02\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512]	6.3622ms	3.9343ms	254.1773 Ops/s	261.3731 Ops/s	$\color{#d91a1a}-2.75\%$
test_dqn_speed	2.6318ms	1.9935ms	501.6255 Ops/s	487.4321 Ops/s	$\color{#35bf28}+2.91\%$
test_ddpg_speed	10.3786ms	3.2902ms	303.9293 Ops/s	280.2547 Ops/s	$\textbf{\color{#35bf28}+8.45\%}$
test_sac_speed	12.0594ms	10.2940ms	97.1436 Ops/s	95.1448 Ops/s	$\color{#35bf28}+2.10\%$
test_redq_speed	24.8585ms	18.4094ms	54.3202 Ops/s	54.6378 Ops/s	$\color{#d91a1a}-0.58\%$
test_redq_deprec_speed	16.7859ms	15.6741ms	63.7995 Ops/s	63.0728 Ops/s	$\color{#35bf28}+1.15\%$
test_td3_speed	19.5024ms	14.5638ms	68.6632 Ops/s	70.9143 Ops/s	$\color{#d91a1a}-3.17\%$
test_cql_speed	47.8674ms	40.6239ms	24.6160 Ops/s	22.8095 Ops/s	$\textbf{\color{#35bf28}+7.92\%}$
test_a2c_speed	9.0028ms	7.4013ms	135.1113 Ops/s	142.9486 Ops/s	$\textbf{\color{#d91a1a}-5.48\%}$
test_ppo_speed	20.8610ms	8.0706ms	123.9069 Ops/s	135.4278 Ops/s	$\textbf{\color{#d91a1a}-8.51\%}$
test_reinforce_speed	7.1560ms	5.5983ms	178.6252 Ops/s	188.5206 Ops/s	$\textbf{\color{#d91a1a}-5.25\%}$
test_iql_speed	29.2524ms	27.4961ms	36.3688 Ops/s	35.1077 Ops/s	$\color{#35bf28}+3.59\%$
test_sample_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-4000]	5.4035ms	4.5511ms	219.7267 Ops/s	196.5339 Ops/s	$\textbf{\color{#35bf28}+11.80\%}$
test_sample_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000]	9.1175ms	4.6834ms	213.5220 Ops/s	219.0414 Ops/s	$\color{#d91a1a}-2.52\%$
test_sample_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000]	9.4895ms	4.7057ms	212.5090 Ops/s	217.1204 Ops/s	$\color{#d91a1a}-2.12\%$
test_sample_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000]	0.1520s	5.1868ms	192.7964 Ops/s	225.6472 Ops/s	$\textbf{\color{#d91a1a}-14.56\%}$
test_sample_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000]	8.0227ms	4.6747ms	213.9182 Ops/s	211.6111 Ops/s	$\color{#35bf28}+1.09\%$
test_sample_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000]	0.1857s	5.4546ms	183.3301 Ops/s	217.4018 Ops/s	$\textbf{\color{#d91a1a}-15.67\%}$
test_sample_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000]	11.5651ms	4.5467ms	219.9393 Ops/s	193.3705 Ops/s	$\textbf{\color{#35bf28}+13.74\%}$
test_sample_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000]	9.7504ms	4.6618ms	214.5090 Ops/s	216.5552 Ops/s	$\color{#d91a1a}-0.94\%$
test_sample_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000]	7.8136ms	4.6453ms	215.2733 Ops/s	217.5755 Ops/s	$\color{#d91a1a}-1.06\%$
test_iterate_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-4000]	0.1470s	5.1432ms	194.4331 Ops/s	197.0654 Ops/s	$\color{#d91a1a}-1.34\%$
test_iterate_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000]	7.7157ms	4.6356ms	215.7223 Ops/s	217.0088 Ops/s	$\color{#d91a1a}-0.59\%$
test_iterate_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000]	11.9930ms	4.7296ms	211.4336 Ops/s	184.2382 Ops/s	$\textbf{\color{#35bf28}+14.76\%}$
test_iterate_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000]	5.1093ms	4.5214ms	221.1718 Ops/s	226.0547 Ops/s	$\color{#d91a1a}-2.16\%$
test_iterate_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000]	9.4435ms	4.7244ms	211.6661 Ops/s	216.9080 Ops/s	$\color{#d91a1a}-2.42\%$
test_iterate_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000]	7.3330ms	4.6999ms	212.7685 Ops/s	217.7595 Ops/s	$\color{#d91a1a}-2.29\%$
test_iterate_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000]	0.1482s	5.1920ms	192.6025 Ops/s	191.0213 Ops/s	$\color{#35bf28}+0.83\%$
test_iterate_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000]	8.1383ms	4.6986ms	212.8295 Ops/s	214.5232 Ops/s	$\color{#d91a1a}-0.79\%$
test_iterate_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000]	9.4354ms	4.7232ms	211.7204 Ops/s	216.8997 Ops/s	$\color{#d91a1a}-2.39\%$
test_populate_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-400]	0.3343s	42.0968ms	23.7548 Ops/s	24.9902 Ops/s	$\color{#d91a1a}-4.94\%$
test_populate_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400]	0.1880s	35.7583ms	27.9655 Ops/s	28.1401 Ops/s	$\color{#d91a1a}-0.62\%$
test_populate_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400]	0.1859s	35.4190ms	28.2334 Ops/s	28.0174 Ops/s	$\color{#35bf28}+0.77\%$
test_populate_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400]	0.1838s	35.1400ms	28.4576 Ops/s	28.3727 Ops/s	$\color{#35bf28}+0.30\%$
test_populate_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400]	0.1895s	35.5189ms	28.1540 Ops/s	28.1951 Ops/s	$\color{#d91a1a}-0.15\%$
test_populate_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400]	0.1900s	38.8619ms	25.7321 Ops/s	25.4413 Ops/s	$\color{#35bf28}+1.14\%$
test_populate_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-400]	0.1934s	36.2949ms	27.5521 Ops/s	27.8152 Ops/s	$\color{#d91a1a}-0.95\%$
test_populate_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400]	0.1937s	35.6335ms	28.0635 Ops/s	27.9271 Ops/s	$\color{#35bf28}+0.49\%$
test_populate_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400]	0.1928s	35.7775ms	27.9505 Ops/s	28.4495 Ops/s	$\color{#d91a1a}-1.75\%$


matteobettini

LGTM, I just have a question about the two clones. They are very expensive, so I want to make sure there is absolutely no way to avoid them.

torchrl/collectors/collectors.py

vmoens added 4 commits June 27, 2023 21:49

init

da92cc8

amend

4d37ea5

amend

9b92c3a

Merge remote-tracking branch 'origin/main' into faster_collector_rollout

185fb51

facebook-github-bot added the CLA Signed label Jun 28, 2023

vmoens added 5 commits June 28, 2023 13:17

amend

33991ab

Merge remote-tracking branch 'origin/main' into faster_collector_rollout

37eadb1

amend

cf5f014

amend

f602d1e

Merge remote-tracking branch 'origin/main' into faster_collector_rollout

8059956

vmoens added the performance label Jun 28, 2023

vmoens added 3 commits June 29, 2023 14:38

Merge branch 'main' into faster_collector_rollout

f6e89e4

# Conflicts: # torchrl/envs/utils.py

amend

4d9a56c

Merge remote-tracking branch 'origin/main' into faster_collector_rollout

1fcec3a

vmoens added 14 commits July 1, 2023 07:19

Merge branch 'main' into faster_collector_rollout

a4653a2

# Conflicts: # torchrl/envs/utils.py

init

945dc73

amend

e3753fe

fix

1db0f63

Merge remote-tracking branch 'origin/main' into more_unravel_fixes

6c66de8

amend

cade46c

amend

b882f18

Merge remote-tracking branch 'origin/main' into faster_collector_rollout

e9e0da6

amend

c5e07c0

Merge remote-tracking branch 'origin/more_unravel_fixes' into faster_…

99e5872

…collector_rollout # Conflicts: # torchrl/envs/transforms/rlhf.py

address comment

83299a6

Merge remote-tracking branch 'origin/more_unravel_fixes' into faster_…

37949b2

…collector_rollout

amend

565a5cc

init

13dcc55

vmoens added 8 commits July 6, 2023 10:46

Merge branch 'followu-473' into faster_collector_rollout

2b4699b

# Conflicts: # torchrl/envs/transforms/rlhf.py # torchrl/envs/utils.py # torchrl/envs/vec_env.py # torchrl/modules/tensordict_module/common.py

Merge remote-tracking branch 'origin/main' into faster_collector_rollout

01baa1e

# Conflicts: # torchrl/_utils.py # torchrl/envs/transforms/transforms.py

Merge remote-tracking branch 'origin/main' into faster_collector_rollout

332eb1a

ignore warnings about policy device

54b486d

Merge remote-tracking branch 'origin/main' into faster_collector_rollout

ac18334

Merge branch 'main' into faster_collector_rollout

3767359

amend

d0cab4b

amend

bcf7263

matteobettini approved these changes Jul 7, 2023


torchrl/collectors/collectors.py (resolved)

torchrl/collectors/collectors.py (resolved)

matteobettini reviewed Jul 7, 2023


torchrl/collectors/collectors.py (resolved)

matteobettini reviewed Jul 7, 2023


torchrl/collectors/collectors.py (outdated, resolved)

vmoens added 2 commits July 7, 2023 15:50

addressing comments

3c879a8

lint

3b16598

vmoens merged commit fcb04e4 into main Jul 7, 2023

vmoens deleted the faster_collector_rollout branch July 7, 2023 15:55
