[Feature] optionally set truncated = True at the end of rollouts #2042

vmoens · 2024-03-26T20:37:46Z

No description provided.

pytorch-bot · 2024-03-26T20:37:49Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2042

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

[PREEMPTIVE] - We recently implemented changes in pull job linux-jammy-py3.8-gcc11 / build

❌ 4 New Failures, 1 Unrelated Failure

As of commit 1987d4b with merge base a7bf5a4 ():

NEW FAILURES - The following jobs have failed:

Habitat Tests on Linux / tests (3.9, 11.6) / linux-job (gh)
RuntimeError: Command docker exec -t ef307dd8d455ab5a83e09f2b0542f38bf15e1035f3a6c0daf2a1cc0e9b475c71 /exec failed with exit code 139
Libs Tests on Linux / unittests-robohive (3.9, 12.1) / linux-job (gh)
RuntimeError: Command docker exec -t f86623b048890fa2ba72d5e0ca2d45eaf3de3323b096eb0decc2787176ca7dc5 /exec failed with exit code 1
Unit-tests on Linux / tests-stable-gpu (3.10, 11.8) / linux-job (gh)
test/test_distributed.py::TestRayCollector::test_distributed_collector_updatepolicy[True-SyncDataCollector]
Unit-tests on MacOS CPU / tests (3.8) / macos-job (gh)
test/test_modules.py::TestMultiAgent::test_multiagent_mlp[batch1-None-False-True-3]

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

Unit-tests on Windows / unittests-cpu / windows-job (gh)
test/test_transforms.py::TestBatchSizeTransform::test_trans_parallel_env_check[False-reshape]

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2024-03-26T20:45:12Z

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 91. Improved: $\large\color{#35bf28}4$. Worsened: $\large\color{#d91a1a}4$.

Name	Max	Mean	Ops	Ops on Repo `HEAD`	Change
test_single	54.1541ms	53.2761ms	18.7701 Ops/s	18.2941 Ops/s	$\color{#35bf28}+2.60\%$
test_sync	42.3707ms	29.6104ms	33.7719 Ops/s	33.4684 Ops/s	$\color{#35bf28}+0.91\%$
test_async	53.3146ms	28.4770ms	35.1161 Ops/s	36.7166 Ops/s	$\color{#d91a1a}-4.36\%$
test_simple	0.4095s	0.3445s	2.9030 Ops/s	2.8701 Ops/s	$\color{#35bf28}+1.15\%$
test_transformed	0.5465s	0.4914s	2.0348 Ops/s	2.0342 Ops/s	$\color{#35bf28}+0.03\%$
test_serial	1.2635s	1.2035s	0.8309 Ops/s	0.8255 Ops/s	$\color{#35bf28}+0.66\%$
test_parallel	1.0625s	0.9986s	1.0014 Ops/s	0.9992 Ops/s	$\color{#35bf28}+0.22\%$
test_step_mdp_speed[True-True-True-True-True]	0.1289ms	21.8460μs	45.7749 KOps/s	45.1332 KOps/s	$\color{#35bf28}+1.42\%$
test_step_mdp_speed[True-True-True-True-False]	37.2090μs	13.2887μs	75.2516 KOps/s	72.9493 KOps/s	$\color{#35bf28}+3.16\%$
test_step_mdp_speed[True-True-True-False-True]	0.1205ms	12.8720μs	77.6882 KOps/s	77.7383 KOps/s	$\color{#d91a1a}-0.06\%$
test_step_mdp_speed[True-True-True-False-False]	38.8320μs	7.8716μs	127.0394 KOps/s	125.9493 KOps/s	$\color{#35bf28}+0.87\%$
test_step_mdp_speed[True-True-False-True-True]	50.7440μs	23.0824μs	43.3231 KOps/s	43.0129 KOps/s	$\color{#35bf28}+0.72\%$
test_step_mdp_speed[True-True-False-True-False]	38.1010μs	14.5969μs	68.5079 KOps/s	66.8575 KOps/s	$\color{#35bf28}+2.47\%$
test_step_mdp_speed[True-True-False-False-True]	47.5280μs	13.9873μs	71.4933 KOps/s	70.7476 KOps/s	$\color{#35bf28}+1.05\%$
test_step_mdp_speed[True-True-False-False-False]	42.2880μs	9.0369μs	110.6569 KOps/s	109.2804 KOps/s	$\color{#35bf28}+1.26\%$
test_step_mdp_speed[True-False-True-True-True]	0.1246ms	24.2964μs	41.1583 KOps/s	40.1956 KOps/s	$\color{#35bf28}+2.40\%$
test_step_mdp_speed[True-False-True-True-False]	42.7600μs	15.8982μs	62.9001 KOps/s	60.8960 KOps/s	$\color{#35bf28}+3.29\%$
test_step_mdp_speed[True-False-True-False-True]	50.9850μs	14.0212μs	71.3207 KOps/s	70.0646 KOps/s	$\color{#35bf28}+1.79\%$
test_step_mdp_speed[True-False-True-False-False]	34.7440μs	9.0065μs	111.0305 KOps/s	109.2813 KOps/s	$\color{#35bf28}+1.60\%$
test_step_mdp_speed[True-False-False-True-True]	52.0060μs	25.4475μs	39.2966 KOps/s	38.2625 KOps/s	$\color{#35bf28}+2.70\%$
test_step_mdp_speed[True-False-False-True-False]	53.4380μs	17.0824μs	58.5397 KOps/s	57.0661 KOps/s	$\color{#35bf28}+2.58\%$
test_step_mdp_speed[True-False-False-False-True]	40.9560μs	14.9994μs	66.6695 KOps/s	65.1739 KOps/s	$\color{#35bf28}+2.29\%$
test_step_mdp_speed[True-False-False-False-False]	38.1700μs	10.0860μs	99.1470 KOps/s	96.1816 KOps/s	$\color{#35bf28}+3.08\%$
test_step_mdp_speed[False-True-True-True-True]	61.6640μs	24.4666μs	40.8721 KOps/s	40.2075 KOps/s	$\color{#35bf28}+1.65\%$
test_step_mdp_speed[False-True-True-True-False]	57.7360μs	15.8900μs	62.9326 KOps/s	61.7635 KOps/s	$\color{#35bf28}+1.89\%$
test_step_mdp_speed[False-True-True-False-True]	46.7370μs	16.1404μs	61.9563 KOps/s	60.6197 KOps/s	$\color{#35bf28}+2.20\%$
test_step_mdp_speed[False-True-True-False-False]	49.6810μs	10.2567μs	97.4971 KOps/s	95.8179 KOps/s	$\color{#35bf28}+1.75\%$
test_step_mdp_speed[False-True-False-True-True]	57.8370μs	25.8410μs	38.6981 KOps/s	37.9072 KOps/s	$\color{#35bf28}+2.09\%$
test_step_mdp_speed[False-True-False-True-False]	51.5950μs	17.0040μs	58.8096 KOps/s	57.8319 KOps/s	$\color{#35bf28}+1.69\%$
test_step_mdp_speed[False-True-False-False-True]	38.9320μs	17.1797μs	58.2083 KOps/s	57.3492 KOps/s	$\color{#35bf28}+1.50\%$
test_step_mdp_speed[False-True-False-False-False]	45.1330μs	11.3933μs	87.7706 KOps/s	86.5803 KOps/s	$\color{#35bf28}+1.37\%$
test_step_mdp_speed[False-False-True-True-True]	62.1150μs	26.7131μs	37.4348 KOps/s	36.6676 KOps/s	$\color{#35bf28}+2.09\%$
test_step_mdp_speed[False-False-True-True-False]	46.9470μs	18.3287μs	54.5592 KOps/s	52.6202 KOps/s	$\color{#35bf28}+3.68\%$
test_step_mdp_speed[False-False-True-False-True]	56.7250μs	17.2786μs	57.8752 KOps/s	56.9528 KOps/s	$\color{#35bf28}+1.62\%$
test_step_mdp_speed[False-False-True-False-False]	0.1072ms	11.4702μs	87.1828 KOps/s	85.9531 KOps/s	$\color{#35bf28}+1.43\%$
test_step_mdp_speed[False-False-False-True-True]	0.1840ms	27.9314μs	35.8020 KOps/s	35.4127 KOps/s	$\color{#35bf28}+1.10\%$
test_step_mdp_speed[False-False-False-True-False]	60.8330μs	19.2760μs	51.8779 KOps/s	49.8502 KOps/s	$\color{#35bf28}+4.07\%$
test_step_mdp_speed[False-False-False-False-True]	43.9420μs	18.1464μs	55.1073 KOps/s	53.9366 KOps/s	$\color{#35bf28}+2.17\%$
test_step_mdp_speed[False-False-False-False-False]	52.3970μs	12.4475μs	80.3376 KOps/s	78.2613 KOps/s	$\color{#35bf28}+2.65\%$
test_values[generalized_advantage_estimate-True-True]	9.8556ms	9.3652ms	106.7780 Ops/s	104.3380 Ops/s	$\color{#35bf28}+2.34\%$
test_values[vec_generalized_advantage_estimate-True-True]	38.1104ms	35.3068ms	28.3232 Ops/s	28.1289 Ops/s	$\color{#35bf28}+0.69\%$
test_values[td0_return_estimate-False-False]	0.2171ms	0.1663ms	6.0136 KOps/s	5.5202 KOps/s	$\textbf{\color{#35bf28}+8.94\%}$
test_values[td1_return_estimate-False-False]	26.2221ms	23.2443ms	43.0213 Ops/s	42.5454 Ops/s	$\color{#35bf28}+1.12\%$
test_values[vec_td1_return_estimate-False-False]	36.6657ms	35.3796ms	28.2649 Ops/s	28.2320 Ops/s	$\color{#35bf28}+0.12\%$
test_values[td_lambda_return_estimate-True-False]	34.8335ms	33.5814ms	29.7784 Ops/s	29.9888 Ops/s	$\color{#d91a1a}-0.70\%$
test_values[vec_td_lambda_return_estimate-True-False]	37.1210ms	35.4393ms	28.2173 Ops/s	28.0824 Ops/s	$\color{#35bf28}+0.48\%$
test_gae_speed[generalized_advantage_estimate-False-1-512]	10.0847ms	8.1340ms	122.9412 Ops/s	122.1736 Ops/s	$\color{#35bf28}+0.63\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512]	2.3559ms	2.0104ms	497.4099 Ops/s	508.9985 Ops/s	$\color{#d91a1a}-2.28\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512]	0.4397ms	0.3509ms	2.8499 KOps/s	2.8541 KOps/s	$\color{#d91a1a}-0.15\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512]	48.6160ms	46.0222ms	21.7286 Ops/s	21.8424 Ops/s	$\color{#d91a1a}-0.52\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512]	3.6095ms	3.0261ms	330.4591 Ops/s	328.0371 Ops/s	$\color{#35bf28}+0.74\%$
test_dqn_speed	7.0253ms	1.3681ms	730.9443 Ops/s	743.4159 Ops/s	$\color{#d91a1a}-1.68\%$
test_ddpg_speed	3.0080ms	2.7137ms	368.4944 Ops/s	371.8748 Ops/s	$\color{#d91a1a}-0.91\%$
test_sac_speed	9.3823ms	8.2273ms	121.5460 Ops/s	112.9140 Ops/s	$\textbf{\color{#35bf28}+7.64\%}$
test_redq_speed	14.6738ms	13.5082ms	74.0289 Ops/s	75.4641 Ops/s	$\color{#d91a1a}-1.90\%$
test_redq_deprec_speed	16.3825ms	13.8212ms	72.3525 Ops/s	75.8567 Ops/s	$\color{#d91a1a}-4.62\%$
test_td3_speed	16.4720ms	8.2494ms	121.2213 Ops/s	122.3328 Ops/s	$\color{#d91a1a}-0.91\%$
test_cql_speed	38.2879ms	36.6135ms	27.3123 Ops/s	27.6303 Ops/s	$\color{#d91a1a}-1.15\%$
test_a2c_speed	8.8133ms	7.4239ms	134.6997 Ops/s	135.1570 Ops/s	$\color{#d91a1a}-0.34\%$
test_ppo_speed	8.7702ms	7.7710ms	128.6833 Ops/s	129.9824 Ops/s	$\color{#d91a1a}-1.00\%$
test_reinforce_speed	9.0330ms	6.8494ms	145.9974 Ops/s	151.8069 Ops/s	$\color{#d91a1a}-3.83\%$
test_iql_speed	39.3403ms	33.3765ms	29.9612 Ops/s	30.6083 Ops/s	$\color{#d91a1a}-2.11\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000]	2.4527ms	2.2024ms	454.0443 Ops/s	445.7992 Ops/s	$\color{#35bf28}+1.85\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000]	0.9992ms	0.5019ms	1.9924 KOps/s	2.0010 KOps/s	$\color{#d91a1a}-0.43\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000]	0.6594ms	0.4752ms	2.1046 KOps/s	2.1228 KOps/s	$\color{#d91a1a}-0.86\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000]	2.6138ms	2.2930ms	436.1068 Ops/s	445.1155 Ops/s	$\color{#d91a1a}-2.02\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000]	1.0169ms	0.4953ms	2.0189 KOps/s	2.0443 KOps/s	$\color{#d91a1a}-1.24\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000]	0.6661ms	0.4688ms	2.1331 KOps/s	2.1427 KOps/s	$\color{#d91a1a}-0.45\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000]	1.8958ms	1.2180ms	821.0388 Ops/s	778.3532 Ops/s	$\textbf{\color{#35bf28}+5.48\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000]	4.2978ms	1.1502ms	869.3972 Ops/s	871.8975 Ops/s	$\color{#d91a1a}-0.29\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000]	3.2866ms	2.3512ms	425.3058 Ops/s	424.6132 Ops/s	$\color{#35bf28}+0.16\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000]	0.1063s	0.7114ms	1.4057 KOps/s	1.6324 KOps/s	$\textbf{\color{#d91a1a}-13.88\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000]	0.9199ms	0.5979ms	1.6725 KOps/s	1.7072 KOps/s	$\color{#d91a1a}-2.03\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000]	2.7143ms	2.3406ms	427.2466 Ops/s	447.2209 Ops/s	$\color{#d91a1a}-4.47\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000]	1.0986ms	0.5161ms	1.9375 KOps/s	2.0153 KOps/s	$\color{#d91a1a}-3.86\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000]	0.5912ms	0.4788ms	2.0887 KOps/s	2.1016 KOps/s	$\color{#d91a1a}-0.61\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000]	3.5073ms	2.3662ms	422.6176 Ops/s	441.5814 Ops/s	$\color{#d91a1a}-4.29\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000]	0.6615ms	0.4979ms	2.0086 KOps/s	2.0476 KOps/s	$\color{#d91a1a}-1.91\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000]	3.9965ms	0.4825ms	2.0725 KOps/s	2.1134 KOps/s	$\color{#d91a1a}-1.93\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000]	3.4552ms	2.3668ms	422.5106 Ops/s	414.0246 Ops/s	$\color{#35bf28}+2.05\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000]	0.7596ms	0.6141ms	1.6284 KOps/s	1.6319 KOps/s	$\color{#d91a1a}-0.21\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000]	0.7702ms	0.5984ms	1.6711 KOps/s	1.6973 KOps/s	$\color{#d91a1a}-1.54\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400]	0.1221s	7.8768ms	126.9554 Ops/s	127.4865 Ops/s	$\color{#d91a1a}-0.42\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400]	15.1955ms	12.3321ms	81.0890 Ops/s	83.4848 Ops/s	$\color{#d91a1a}-2.87\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400]	1.8907ms	1.1164ms	895.7701 Ops/s	958.3805 Ops/s	$\textbf{\color{#d91a1a}-6.53\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400]	0.1047s	5.7461ms	174.0313 Ops/s	173.8109 Ops/s	$\color{#35bf28}+0.13\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400]	16.4739ms	12.2293ms	81.7706 Ops/s	82.3765 Ops/s	$\color{#d91a1a}-0.74\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400]	1.7063ms	1.0951ms	913.1350 Ops/s	944.6832 Ops/s	$\color{#d91a1a}-3.34\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400]	0.1164s	6.3100ms	158.4797 Ops/s	122.9751 Ops/s	$\textbf{\color{#35bf28}+28.87\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400]	0.1260s	15.2692ms	65.4914 Ops/s	80.2057 Ops/s	$\textbf{\color{#d91a1a}-18.35\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400]	4.2557ms	1.4771ms	677.0150 Ops/s	739.9074 Ops/s	$\textbf{\color{#d91a1a}-8.50\%}$

github-actions · 2024-03-26T20:48:00Z

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 94. Improved: $\large\color{#35bf28}4$. Worsened: $\large\color{#d91a1a}7$.

Name	Max	Mean	Ops	Ops on Repo `HEAD`	Change
test_single	0.1027s	0.1013s	9.8716 Ops/s	9.2686 Ops/s	$\textbf{\color{#35bf28}+6.51\%}$
test_sync	88.8339ms	87.3446ms	11.4489 Ops/s	11.2616 Ops/s	$\color{#35bf28}+1.66\%$
test_async	0.1626s	71.8186ms	13.9240 Ops/s	13.9564 Ops/s	$\color{#d91a1a}-0.23\%$
test_single_pixels	0.1106s	0.1097s	9.1162 Ops/s	9.1859 Ops/s	$\color{#d91a1a}-0.76\%$
test_sync_pixels	69.3908ms	67.4312ms	14.8299 Ops/s	15.0853 Ops/s	$\color{#d91a1a}-1.69\%$
test_async_pixels	0.1284s	63.4724ms	15.7549 Ops/s	18.1342 Ops/s	$\textbf{\color{#d91a1a}-13.12\%}$
test_simple	0.7513s	0.6842s	1.4615 Ops/s	1.4451 Ops/s	$\color{#35bf28}+1.13\%$
test_transformed	0.9641s	0.9011s	1.1097 Ops/s	1.1288 Ops/s	$\color{#d91a1a}-1.69\%$
test_serial	2.1475s	2.0823s	0.4802 Ops/s	0.4758 Ops/s	$\color{#35bf28}+0.93\%$
test_parallel	1.9230s	1.8435s	0.5424 Ops/s	0.5575 Ops/s	$\color{#d91a1a}-2.70\%$
test_step_mdp_speed[True-True-True-True-True]	90.2910μs	33.7893μs	29.5951 KOps/s	30.3243 KOps/s	$\color{#d91a1a}-2.40\%$
test_step_mdp_speed[True-True-True-True-False]	55.0010μs	19.9174μs	50.2073 KOps/s	50.8639 KOps/s	$\color{#d91a1a}-1.29\%$
test_step_mdp_speed[True-True-True-False-True]	85.7810μs	18.8942μs	52.9263 KOps/s	54.7489 KOps/s	$\color{#d91a1a}-3.33\%$
test_step_mdp_speed[True-True-True-False-False]	26.6100μs	11.1959μs	89.3187 KOps/s	89.7529 KOps/s	$\color{#d91a1a}-0.48\%$
test_step_mdp_speed[True-True-False-True-True]	70.7210μs	35.3093μs	28.3212 KOps/s	29.3588 KOps/s	$\color{#d91a1a}-3.53\%$
test_step_mdp_speed[True-True-False-True-False]	81.1310μs	21.7125μs	46.0563 KOps/s	47.1354 KOps/s	$\color{#d91a1a}-2.29\%$
test_step_mdp_speed[True-True-False-False-True]	40.9910μs	20.9013μs	47.8440 KOps/s	49.0400 KOps/s	$\color{#d91a1a}-2.44\%$
test_step_mdp_speed[True-True-False-False-False]	47.1810μs	13.2735μs	75.3383 KOps/s	77.4936 KOps/s	$\color{#d91a1a}-2.78\%$
test_step_mdp_speed[True-False-True-True-True]	62.3610μs	36.9616μs	27.0551 KOps/s	27.7295 KOps/s	$\color{#d91a1a}-2.43\%$
test_step_mdp_speed[True-False-True-True-False]	51.4710μs	23.8136μs	41.9927 KOps/s	43.7485 KOps/s	$\color{#d91a1a}-4.01\%$
test_step_mdp_speed[True-False-True-False-True]	38.9510μs	20.7645μs	48.1592 KOps/s	50.6265 KOps/s	$\color{#d91a1a}-4.87\%$
test_step_mdp_speed[True-False-True-False-False]	28.7700μs	13.1848μs	75.8449 KOps/s	77.0773 KOps/s	$\color{#d91a1a}-1.60\%$
test_step_mdp_speed[True-False-False-True-True]	66.9610μs	39.2771μs	25.4602 KOps/s	26.1514 KOps/s	$\color{#d91a1a}-2.64\%$
test_step_mdp_speed[True-False-False-True-False]	71.3710μs	25.5317μs	39.1669 KOps/s	40.0616 KOps/s	$\color{#d91a1a}-2.23\%$
test_step_mdp_speed[True-False-False-False-True]	42.0310μs	22.5788μs	44.2894 KOps/s	46.2623 KOps/s	$\color{#d91a1a}-4.26\%$
test_step_mdp_speed[True-False-False-False-False]	33.9710μs	15.1331μs	66.0803 KOps/s	67.5110 KOps/s	$\color{#d91a1a}-2.12\%$
test_step_mdp_speed[False-True-True-True-True]	72.2910μs	37.6643μs	26.5504 KOps/s	28.0183 KOps/s	$\textbf{\color{#d91a1a}-5.24\%}$
test_step_mdp_speed[False-True-True-True-False]	43.0200μs	23.6058μs	42.3625 KOps/s	43.8102 KOps/s	$\color{#d91a1a}-3.30\%$
test_step_mdp_speed[False-True-True-False-True]	57.0110μs	24.8980μs	40.1639 KOps/s	41.7949 KOps/s	$\color{#d91a1a}-3.90\%$
test_step_mdp_speed[False-True-True-False-False]	40.4010μs	14.7952μs	67.5896 KOps/s	68.3151 KOps/s	$\color{#d91a1a}-1.06\%$
test_step_mdp_speed[False-True-False-True-True]	78.2510μs	39.2851μs	25.4549 KOps/s	26.2466 KOps/s	$\color{#d91a1a}-3.02\%$
test_step_mdp_speed[False-True-False-True-False]	0.2068ms	25.5247μs	39.1778 KOps/s	39.8467 KOps/s	$\color{#d91a1a}-1.68\%$
test_step_mdp_speed[False-True-False-False-True]	47.8910μs	26.3705μs	37.9212 KOps/s	38.8869 KOps/s	$\color{#d91a1a}-2.48\%$
test_step_mdp_speed[False-True-False-False-False]	38.5310μs	16.7475μs	59.7103 KOps/s	60.5427 KOps/s	$\color{#d91a1a}-1.37\%$
test_step_mdp_speed[False-False-True-True-True]	76.9010μs	41.3910μs	24.1598 KOps/s	25.1420 KOps/s	$\color{#d91a1a}-3.91\%$
test_step_mdp_speed[False-False-True-True-False]	45.4310μs	27.4479μs	36.4327 KOps/s	37.1553 KOps/s	$\color{#d91a1a}-1.94\%$
test_step_mdp_speed[False-False-True-False-True]	53.5500μs	26.1215μs	38.2827 KOps/s	39.0848 KOps/s	$\color{#d91a1a}-2.05\%$
test_step_mdp_speed[False-False-True-False-False]	33.2410μs	16.7285μs	59.7784 KOps/s	60.1820 KOps/s	$\color{#d91a1a}-0.67\%$
test_step_mdp_speed[False-False-False-True-True]	69.4610μs	42.0668μs	23.7717 KOps/s	24.2140 KOps/s	$\color{#d91a1a}-1.83\%$
test_step_mdp_speed[False-False-False-True-False]	59.4610μs	29.3196μs	34.1069 KOps/s	34.9961 KOps/s	$\color{#d91a1a}-2.54\%$
test_step_mdp_speed[False-False-False-False-True]	50.9710μs	27.9004μs	35.8417 KOps/s	36.9359 KOps/s	$\color{#d91a1a}-2.96\%$
test_step_mdp_speed[False-False-False-False-False]	44.4710μs	18.3406μs	54.5237 KOps/s	54.6840 KOps/s	$\color{#d91a1a}-0.29\%$
test_values[generalized_advantage_estimate-True-True]	25.0296ms	24.5631ms	40.7114 Ops/s	41.6859 Ops/s	$\color{#d91a1a}-2.34\%$
test_values[vec_generalized_advantage_estimate-True-True]	82.5006ms	3.2208ms	310.4796 Ops/s	307.9508 Ops/s	$\color{#35bf28}+0.82\%$
test_values[td0_return_estimate-False-False]	94.0910μs	66.3601μs	15.0693 KOps/s	15.1780 KOps/s	$\color{#d91a1a}-0.72\%$
test_values[td1_return_estimate-False-False]	55.7502ms	55.4171ms	18.0450 Ops/s	18.5148 Ops/s	$\color{#d91a1a}-2.54\%$
test_values[vec_td1_return_estimate-False-False]	2.1321ms	1.7760ms	563.0503 Ops/s	564.4104 Ops/s	$\color{#d91a1a}-0.24\%$
test_values[td_lambda_return_estimate-True-False]	88.1122ms	87.8007ms	11.3894 Ops/s	11.6763 Ops/s	$\color{#d91a1a}-2.46\%$
test_values[vec_td_lambda_return_estimate-True-False]	2.1168ms	1.7737ms	563.7921 Ops/s	563.6563 Ops/s	$\color{#35bf28}+0.02\%$
test_gae_speed[generalized_advantage_estimate-False-1-512]	24.6290ms	24.4217ms	40.9472 Ops/s	42.2255 Ops/s	$\color{#d91a1a}-3.03\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512]	0.9146ms	0.7171ms	1.3945 KOps/s	1.3991 KOps/s	$\color{#d91a1a}-0.33\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512]	0.7287ms	0.6613ms	1.5121 KOps/s	1.5159 KOps/s	$\color{#d91a1a}-0.25\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512]	1.5980ms	1.4669ms	681.7070 Ops/s	680.5353 Ops/s	$\color{#35bf28}+0.17\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512]	0.9825ms	0.6852ms	1.4594 KOps/s	1.4688 KOps/s	$\color{#d91a1a}-0.64\%$
test_dqn_speed	3.1749ms	1.4535ms	687.9778 Ops/s	695.7797 Ops/s	$\color{#d91a1a}-1.12\%$
test_ddpg_speed	2.9353ms	2.7616ms	362.1076 Ops/s	366.3564 Ops/s	$\color{#d91a1a}-1.16\%$
test_sac_speed	8.5834ms	8.1405ms	122.8432 Ops/s	124.0672 Ops/s	$\color{#d91a1a}-0.99\%$
test_redq_speed	11.2733ms	10.2944ms	97.1403 Ops/s	97.2235 Ops/s	$\color{#d91a1a}-0.09\%$
test_redq_deprec_speed	11.4476ms	11.0777ms	90.2718 Ops/s	84.7380 Ops/s	$\textbf{\color{#35bf28}+6.53\%}$
test_td3_speed	8.1302ms	8.0675ms	123.9537 Ops/s	125.1050 Ops/s	$\color{#d91a1a}-0.92\%$
test_cql_speed	26.5135ms	25.3384ms	39.4657 Ops/s	39.7049 Ops/s	$\color{#d91a1a}-0.60\%$
test_a2c_speed	7.1851ms	5.7267ms	174.6218 Ops/s	180.7659 Ops/s	$\color{#d91a1a}-3.40\%$
test_ppo_speed	6.6788ms	6.0180ms	166.1682 Ops/s	169.5433 Ops/s	$\color{#d91a1a}-1.99\%$
test_reinforce_speed	4.8481ms	4.5732ms	218.6652 Ops/s	222.9516 Ops/s	$\color{#d91a1a}-1.92\%$
test_iql_speed	20.4252ms	19.7591ms	50.6095 Ops/s	51.3232 Ops/s	$\color{#d91a1a}-1.39\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000]	3.0360ms	2.8853ms	346.5835 Ops/s	344.3605 Ops/s	$\color{#35bf28}+0.65\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000]	0.6862ms	0.5444ms	1.8369 KOps/s	1.6162 KOps/s	$\textbf{\color{#35bf28}+13.65\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000]	4.5183ms	0.5229ms	1.9125 KOps/s	1.9406 KOps/s	$\color{#d91a1a}-1.45\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000]	3.1380ms	2.9059ms	344.1327 Ops/s	341.6145 Ops/s	$\color{#35bf28}+0.74\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000]	0.7098ms	0.5388ms	1.8559 KOps/s	1.8718 KOps/s	$\color{#d91a1a}-0.85\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000]	4.6065ms	0.5185ms	1.9287 KOps/s	1.9670 KOps/s	$\color{#d91a1a}-1.95\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000]	1.6134ms	1.4694ms	680.5312 Ops/s	700.5742 Ops/s	$\color{#d91a1a}-2.86\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000]	5.1934ms	1.3983ms	715.1658 Ops/s	722.0869 Ops/s	$\color{#d91a1a}-0.96\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000]	3.2160ms	3.0210ms	331.0203 Ops/s	331.1683 Ops/s	$\color{#d91a1a}-0.04\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000]	1.4843ms	0.6711ms	1.4901 KOps/s	1.4909 KOps/s	$\color{#d91a1a}-0.05\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000]	0.8164ms	0.6450ms	1.5503 KOps/s	1.5352 KOps/s	$\color{#35bf28}+0.98\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000]	3.0815ms	2.8976ms	345.1158 Ops/s	343.7651 Ops/s	$\color{#35bf28}+0.39\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000]	1.4427ms	0.5444ms	1.8370 KOps/s	1.8479 KOps/s	$\color{#d91a1a}-0.59\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000]	0.6908ms	0.5218ms	1.9166 KOps/s	1.9316 KOps/s	$\color{#d91a1a}-0.78\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000]	3.1532ms	2.9293ms	341.3839 Ops/s	341.8258 Ops/s	$\color{#d91a1a}-0.13\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000]	0.7729ms	0.5394ms	1.8538 KOps/s	1.8645 KOps/s	$\color{#d91a1a}-0.58\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000]	4.5562ms	0.5200ms	1.9229 KOps/s	1.9491 KOps/s	$\color{#d91a1a}-1.34\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000]	3.1995ms	3.0164ms	331.5176 Ops/s	328.2921 Ops/s	$\color{#35bf28}+0.98\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000]	0.8283ms	0.6711ms	1.4902 KOps/s	1.4957 KOps/s	$\color{#d91a1a}-0.37\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000]	0.8202ms	0.6484ms	1.5423 KOps/s	1.5277 KOps/s	$\color{#35bf28}+0.96\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400]	0.1237s	9.5222ms	105.0179 Ops/s	134.3401 Ops/s	$\textbf{\color{#d91a1a}-21.83\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400]	17.4338ms	15.1665ms	65.9349 Ops/s	58.6331 Ops/s	$\textbf{\color{#35bf28}+12.45\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400]	2.3949ms	1.1868ms	842.5853 Ops/s	954.8889 Ops/s	$\textbf{\color{#d91a1a}-11.76\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400]	0.1178s	7.1699ms	139.4716 Ops/s	139.4107 Ops/s	$\color{#35bf28}+0.04\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400]	17.2727ms	15.0543ms	66.4263 Ops/s	68.0604 Ops/s	$\color{#d91a1a}-2.40\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400]	2.5677ms	1.1526ms	867.5911 Ops/s	951.5833 Ops/s	$\textbf{\color{#d91a1a}-8.83\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400]	0.1190s	7.5575ms	132.3191 Ops/s	133.2891 Ops/s	$\color{#d91a1a}-0.73\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400]	0.1289s	17.7859ms	56.2244 Ops/s	66.5063 Ops/s	$\textbf{\color{#d91a1a}-15.46\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400]	7.4459ms	1.6180ms	618.0521 Ops/s	708.5483 Ops/s	$\textbf{\color{#d91a1a}-12.77\%}$

skandermoalla · 2024-03-27T10:56:20Z

@vmoens I would raise a warning in collectors when set_truncated=True but reset_at_each_iter=False, or even raise an error.
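A minimal sketch of the check proposed here, written as a hypothetical helper `check_truncation_flags` (not a torchrl function) that a collector could call at construction time. Only the flag names `set_truncated` and `reset_at_each_iter` come from the comment; the helper, its `strict` switch and the message are illustrative assumptions.

```python
import warnings


def check_truncation_flags(
    set_truncated: bool, reset_at_each_iter: bool, strict: bool = False
) -> None:
    """Hypothetical validation of collector flags (not a torchrl API).

    Warns (or raises, if strict=True) when the last step of each batch is
    flagged as truncated while the environment is NOT reset between
    collection iterations, i.e. when the next batch continues the same
    trajectories.
    """
    if set_truncated and not reset_at_each_iter:
        msg = (
            "set_truncated=True with reset_at_each_iter=False: the last step of "
            "each batch will be flagged as truncated even though the next batch "
            "continues the same trajectories."
        )
        if strict:
            raise ValueError(msg)
        warnings.warn(msg, UserWarning)
```

The `strict` switch covers the "or even raise an error" option; the reply below explains why this flag combination can still be legitimate.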

vmoens · 2024-03-27T15:29:43Z

> @vmoens I would raise a warning in collectors when set_truncated=True but reset_at_each_iter=False, or even raise an error.

Can you elaborate? To me it's fine to have `set_truncated=True` when you don't reset. You will get trajectories that are slices of real ones (the start of the batch isn't the start of the episode), but at least you can tell what comes from where if you do a `reshape(-1)` or that sort of thing, which lets you feed the data to GAE or other modules without worrying about one trajectory polluting another.
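To illustrate this point, here is a minimal pure-Python sketch, assuming a batch of rollouts stored as per-environment lists of transitions. `mark_last_step_truncated`, `gae`, `flatten` and `make_rollout` are hypothetical helpers written for this example, not torchrl functions: the first flags the final step of each rollout as truncated (unless it already terminated), and `gae` is textbook GAE that bootstraps from the next value unless `terminated` is set and stops accumulating at any `terminated or truncated` boundary. Flattening the batch (the analogue of `reshape(-1)`) then gives the same advantages as processing each environment separately.

```python
from typing import Dict, List

Rollout = Dict[str, List]


def mark_last_step_truncated(rollout: Rollout) -> Rollout:
    """Return a copy with truncated=True at the final step, unless that step
    already terminated (hypothetical helper)."""
    out = {k: list(v) for k, v in rollout.items()}
    if not out["terminated"][-1]:
        out["truncated"][-1] = True
    return out


def gae(r: Rollout, gamma: float = 0.99, lmbda: float = 0.95) -> List[float]:
    """Textbook generalized advantage estimation over a flat sequence."""
    n = len(r["reward"])
    adv = [0.0] * n
    running = 0.0
    for t in reversed(range(n)):
        # Bootstrap from V(s_{t+1}) unless the episode truly ended at t.
        bootstrap = 0.0 if r["terminated"][t] else r["next_value"][t]
        delta = r["reward"][t] + gamma * bootstrap - r["value"][t]
        # Any trajectory boundary stops the backward recursion, so advantages
        # never leak from one trajectory into the one stored before it.
        boundary = r["terminated"][t] or r["truncated"][t]
        running = delta + (0.0 if boundary else gamma * lmbda * running)
        adv[t] = running
    return adv


def flatten(rollouts: List[Rollout]) -> Rollout:
    """Concatenate rollouts from several envs, like reshape(-1) on a [B, T] batch."""
    return {k: [x for ro in rollouts for x in ro[k]] for k in rollouts[0]}


def make_rollout(rewards: List[float], values: List[float],
                 next_values: List[float]) -> Rollout:
    n = len(rewards)
    return {"reward": rewards, "value": values, "next_value": next_values,
            "terminated": [False] * n, "truncated": [False] * n}


env_a = make_rollout([1.0, 1.0, 1.0], [0.5, 0.6, 0.7], [0.6, 0.7, 0.8])
env_b = make_rollout([0.0, 2.0, 0.0], [1.5, 1.4, 1.3], [1.4, 1.3, 1.2])

marked = [mark_last_step_truncated(env_a), mark_last_step_truncated(env_b)]
per_env = gae(marked[0]) + gae(marked[1])
flat_marked = gae(flatten(marked))
flat_unmarked = gae(flatten([env_a, env_b]))

print(flat_marked == per_env)    # True: trajectories stay separate
print(flat_unmarked == per_env)  # False: env_b leaks into env_a's last step
```

In this sketch `truncated` still bootstraps from `next_value`; it only cuts the advantage recursion. TorchRL's own value estimators may combine `done`, `terminated` and `truncated` differently in detail.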

skandermoalla · 2024-03-27T16:16:04Z

TLDR: my bad, I guess this is a valid thing to do, but if you have something that computes the episodic return out of that, it needs to be adjusted.

Okay, I see. I didn't have that use case in mind. I had the opposite assumptions:

1. When truncation happens in the middle of the rollout, `truncated` is set and a value estimator (GAE) knows to bootstrap rather than sum over the next rewards. This is not an issue here.
2. When a rollout stops in the middle of a trajectory and you feed the trajectory to GAE, it will bootstrap at the end because `terminated` is not set (`truncated` is not needed: it already knows there are no more samples to sum over).

   2b. When you decide to roll out twice because for some reason your algorithm told you that you need more data, you can concatenate the rollouts and GAE will make use of more samples.
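Point 2b can be checked with the same kind of sketch. Assuming a single environment whose 6-step trajectory was collected as two consecutive 3-step rollouts (no reset in between), the hypothetical `gae` helper below (textbook GAE, not torchrl's implementation) shows that concatenating the rollouts unflagged lets the end of rollout 1 use rollout 2's rewards, whereas flagging step 2 as truncated makes rollout 1 behave exactly as if it had been processed on its own.

```python
from typing import List


def gae(rewards: List[float], values: List[float], next_values: List[float],
        terminated: List[bool], truncated: List[bool],
        gamma: float = 0.99, lmbda: float = 0.95) -> List[float]:
    """Textbook GAE over a flat sequence: any terminated/truncated step ends
    the backward recursion; only terminated disables bootstrapping."""
    adv = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        bootstrap = 0.0 if terminated[t] else next_values[t]
        delta = rewards[t] + gamma * bootstrap - values[t]
        boundary = terminated[t] or truncated[t]
        running = delta + (0.0 if boundary else gamma * lmbda * running)
        adv[t] = running
    return adv


# Rollout 1 = steps 0-2, rollout 2 = steps 3-5, already concatenated.
rewards = [1.0, 0.0, 1.0, 0.0, 1.0, 5.0]
values = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
next_values = values[1:] + [0.3]
terminated = [False] * 6

# No flag at the rollout boundary: steps 0-2 also see rollout 2's rewards.
plain = gae(rewards, values, next_values, terminated, [False] * 6)

# truncated=True at the end of each rollout: step 2 bootstraps from
# next_values[2] instead of accumulating rollout 2's advantages.
cut = gae(rewards, values, next_values, terminated,
          [False, False, True, False, False, True])

# Rollout 1 processed on its own, with its last step flagged as truncated.
solo = gae(rewards[:3], values[:3], next_values[:3], terminated[:3],
           [False, False, True])

print(cut[:3] == solo)       # True
print(plain[:3] == cut[:3])  # False
```

Which behaviour is wanted depends on whether the value targets should span rollouts (2b) or be computed over a fixed number of rollout steps.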

I thought you would never want to bootstrap (set `truncated`) when the data coming next can still be used to compute a value estimate (no reset), but I guess that's wrong: you may simply want to compute your values over a fixed number of rollout steps.

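I can think of another issue, though: if you have something that computes the episodic return from the trajectories, it will get the wrong signal, since a trajectory cut at the rollout boundary looks like a finished episode and only its partial return gets counted.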
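The episodic-return concern can be made concrete with a hypothetical logger, `episode_returns`, that sums rewards between `done` flags (an illustration, not any specific torchrl logger). Assuming one 8-step episode with a reward of 1 per step, collected as two 4-step rollouts without reset, treating the rollout-end `truncated` flag as an episode boundary reports two partial returns instead of one full return.

```python
from typing import List


def episode_returns(rewards: List[float], done: List[bool]) -> List[float]:
    """Hypothetical logger: sum rewards between consecutive done flags and
    report one return per completed segment."""
    returns: List[float] = []
    total = 0.0
    for r, d in zip(rewards, done):
        total += r
        if d:
            returns.append(total)
            total = 0.0
    return returns


rewards = [1.0] * 8
# Only the real end of the episode is terminated.
terminated = [False] * 7 + [True]
# Artificial truncation at the end of the first 4-step rollout.
truncated = [False, False, False, True, False, False, False, False]

# Counting only real episode ends gives the true return.
print(episode_returns(rewards, terminated))  # [8.0]

# Treating the rollout-end truncation as an episode end splits the episode.
done = [te or tr for te, tr in zip(terminated, truncated)]
print(episode_returns(rewards, done))  # [4.0, 4.0]
```

The adjustment mentioned in the TLDR amounts to telling rollout-boundary truncation apart from a real episode end when logging returns.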

Commit: init (7eb400b)

facebook-github-bot added the CLA Signed label Mar 26, 2024

Commit: amend (1987d4b)

vmoens changed the title ~~[WIP] optionally set truncated = True at the end of rollouts~~ [Feature] optionally set truncated = True at the end of rollouts Mar 27, 2024

vmoens added the enhancement label Mar 27, 2024

vmoens merged commit f439b54 into main Mar 27, 2024
62 of 67 checks passed

vmoens deleted the truncated-rollouts branch March 27, 2024 09:11
