
[Feature] optionally set truncated = True at the end of rollouts #2042

Merged (2 commits) on Mar 27, 2024

vmoens (Contributor) commented Mar 26, 2024

No description provided.


pytorch-bot bot commented Mar 26, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2042

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 4 New Failures, 1 Unrelated Failure

As of commit 1987d4b with merge base a7bf5a4:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label Mar 26, 2024

github-actions bot commented Mar 26, 2024

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 91. Improved: 4. Worsened: 4.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_single | 54.1541ms | 53.2761ms | 18.7701 Ops/s | 18.2941 Ops/s | +2.60% |
| test_sync | 42.3707ms | 29.6104ms | 33.7719 Ops/s | 33.4684 Ops/s | +0.91% |
| test_async | 53.3146ms | 28.4770ms | 35.1161 Ops/s | 36.7166 Ops/s | -4.36% |
| test_simple | 0.4095s | 0.3445s | 2.9030 Ops/s | 2.8701 Ops/s | +1.15% |
| test_transformed | 0.5465s | 0.4914s | 2.0348 Ops/s | 2.0342 Ops/s | +0.03% |
| test_serial | 1.2635s | 1.2035s | 0.8309 Ops/s | 0.8255 Ops/s | +0.66% |
| test_parallel | 1.0625s | 0.9986s | 1.0014 Ops/s | 0.9992 Ops/s | +0.22% |
| test_step_mdp_speed[True-True-True-True-True] | 0.1289ms | 21.8460μs | 45.7749 KOps/s | 45.1332 KOps/s | +1.42% |
| test_step_mdp_speed[True-True-True-True-False] | 37.2090μs | 13.2887μs | 75.2516 KOps/s | 72.9493 KOps/s | +3.16% |
| test_step_mdp_speed[True-True-True-False-True] | 0.1205ms | 12.8720μs | 77.6882 KOps/s | 77.7383 KOps/s | -0.06% |
| test_step_mdp_speed[True-True-True-False-False] | 38.8320μs | 7.8716μs | 127.0394 KOps/s | 125.9493 KOps/s | +0.87% |
| test_step_mdp_speed[True-True-False-True-True] | 50.7440μs | 23.0824μs | 43.3231 KOps/s | 43.0129 KOps/s | +0.72% |
| test_step_mdp_speed[True-True-False-True-False] | 38.1010μs | 14.5969μs | 68.5079 KOps/s | 66.8575 KOps/s | +2.47% |
| test_step_mdp_speed[True-True-False-False-True] | 47.5280μs | 13.9873μs | 71.4933 KOps/s | 70.7476 KOps/s | +1.05% |
| test_step_mdp_speed[True-True-False-False-False] | 42.2880μs | 9.0369μs | 110.6569 KOps/s | 109.2804 KOps/s | +1.26% |
| test_step_mdp_speed[True-False-True-True-True] | 0.1246ms | 24.2964μs | 41.1583 KOps/s | 40.1956 KOps/s | +2.40% |
| test_step_mdp_speed[True-False-True-True-False] | 42.7600μs | 15.8982μs | 62.9001 KOps/s | 60.8960 KOps/s | +3.29% |
| test_step_mdp_speed[True-False-True-False-True] | 50.9850μs | 14.0212μs | 71.3207 KOps/s | 70.0646 KOps/s | +1.79% |
| test_step_mdp_speed[True-False-True-False-False] | 34.7440μs | 9.0065μs | 111.0305 KOps/s | 109.2813 KOps/s | +1.60% |
| test_step_mdp_speed[True-False-False-True-True] | 52.0060μs | 25.4475μs | 39.2966 KOps/s | 38.2625 KOps/s | +2.70% |
| test_step_mdp_speed[True-False-False-True-False] | 53.4380μs | 17.0824μs | 58.5397 KOps/s | 57.0661 KOps/s | +2.58% |
| test_step_mdp_speed[True-False-False-False-True] | 40.9560μs | 14.9994μs | 66.6695 KOps/s | 65.1739 KOps/s | +2.29% |
| test_step_mdp_speed[True-False-False-False-False] | 38.1700μs | 10.0860μs | 99.1470 KOps/s | 96.1816 KOps/s | +3.08% |
| test_step_mdp_speed[False-True-True-True-True] | 61.6640μs | 24.4666μs | 40.8721 KOps/s | 40.2075 KOps/s | +1.65% |
| test_step_mdp_speed[False-True-True-True-False] | 57.7360μs | 15.8900μs | 62.9326 KOps/s | 61.7635 KOps/s | +1.89% |
| test_step_mdp_speed[False-True-True-False-True] | 46.7370μs | 16.1404μs | 61.9563 KOps/s | 60.6197 KOps/s | +2.20% |
| test_step_mdp_speed[False-True-True-False-False] | 49.6810μs | 10.2567μs | 97.4971 KOps/s | 95.8179 KOps/s | +1.75% |
| test_step_mdp_speed[False-True-False-True-True] | 57.8370μs | 25.8410μs | 38.6981 KOps/s | 37.9072 KOps/s | +2.09% |
| test_step_mdp_speed[False-True-False-True-False] | 51.5950μs | 17.0040μs | 58.8096 KOps/s | 57.8319 KOps/s | +1.69% |
| test_step_mdp_speed[False-True-False-False-True] | 38.9320μs | 17.1797μs | 58.2083 KOps/s | 57.3492 KOps/s | +1.50% |
| test_step_mdp_speed[False-True-False-False-False] | 45.1330μs | 11.3933μs | 87.7706 KOps/s | 86.5803 KOps/s | +1.37% |
| test_step_mdp_speed[False-False-True-True-True] | 62.1150μs | 26.7131μs | 37.4348 KOps/s | 36.6676 KOps/s | +2.09% |
| test_step_mdp_speed[False-False-True-True-False] | 46.9470μs | 18.3287μs | 54.5592 KOps/s | 52.6202 KOps/s | +3.68% |
| test_step_mdp_speed[False-False-True-False-True] | 56.7250μs | 17.2786μs | 57.8752 KOps/s | 56.9528 KOps/s | +1.62% |
| test_step_mdp_speed[False-False-True-False-False] | 0.1072ms | 11.4702μs | 87.1828 KOps/s | 85.9531 KOps/s | +1.43% |
| test_step_mdp_speed[False-False-False-True-True] | 0.1840ms | 27.9314μs | 35.8020 KOps/s | 35.4127 KOps/s | +1.10% |
| test_step_mdp_speed[False-False-False-True-False] | 60.8330μs | 19.2760μs | 51.8779 KOps/s | 49.8502 KOps/s | +4.07% |
| test_step_mdp_speed[False-False-False-False-True] | 43.9420μs | 18.1464μs | 55.1073 KOps/s | 53.9366 KOps/s | +2.17% |
| test_step_mdp_speed[False-False-False-False-False] | 52.3970μs | 12.4475μs | 80.3376 KOps/s | 78.2613 KOps/s | +2.65% |
| test_values[generalized_advantage_estimate-True-True] | 9.8556ms | 9.3652ms | 106.7780 Ops/s | 104.3380 Ops/s | +2.34% |
| test_values[vec_generalized_advantage_estimate-True-True] | 38.1104ms | 35.3068ms | 28.3232 Ops/s | 28.1289 Ops/s | +0.69% |
| test_values[td0_return_estimate-False-False] | 0.2171ms | 0.1663ms | 6.0136 KOps/s | 5.5202 KOps/s | **+8.94%** |
| test_values[td1_return_estimate-False-False] | 26.2221ms | 23.2443ms | 43.0213 Ops/s | 42.5454 Ops/s | +1.12% |
| test_values[vec_td1_return_estimate-False-False] | 36.6657ms | 35.3796ms | 28.2649 Ops/s | 28.2320 Ops/s | +0.12% |
| test_values[td_lambda_return_estimate-True-False] | 34.8335ms | 33.5814ms | 29.7784 Ops/s | 29.9888 Ops/s | -0.70% |
| test_values[vec_td_lambda_return_estimate-True-False] | 37.1210ms | 35.4393ms | 28.2173 Ops/s | 28.0824 Ops/s | +0.48% |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 10.0847ms | 8.1340ms | 122.9412 Ops/s | 122.1736 Ops/s | +0.63% |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 2.3559ms | 2.0104ms | 497.4099 Ops/s | 508.9985 Ops/s | -2.28% |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.4397ms | 0.3509ms | 2.8499 KOps/s | 2.8541 KOps/s | -0.15% |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 48.6160ms | 46.0222ms | 21.7286 Ops/s | 21.8424 Ops/s | -0.52% |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 3.6095ms | 3.0261ms | 330.4591 Ops/s | 328.0371 Ops/s | +0.74% |
| test_dqn_speed | 7.0253ms | 1.3681ms | 730.9443 Ops/s | 743.4159 Ops/s | -1.68% |
| test_ddpg_speed | 3.0080ms | 2.7137ms | 368.4944 Ops/s | 371.8748 Ops/s | -0.91% |
| test_sac_speed | 9.3823ms | 8.2273ms | 121.5460 Ops/s | 112.9140 Ops/s | **+7.64%** |
| test_redq_speed | 14.6738ms | 13.5082ms | 74.0289 Ops/s | 75.4641 Ops/s | -1.90% |
| test_redq_deprec_speed | 16.3825ms | 13.8212ms | 72.3525 Ops/s | 75.8567 Ops/s | -4.62% |
| test_td3_speed | 16.4720ms | 8.2494ms | 121.2213 Ops/s | 122.3328 Ops/s | -0.91% |
| test_cql_speed | 38.2879ms | 36.6135ms | 27.3123 Ops/s | 27.6303 Ops/s | -1.15% |
| test_a2c_speed | 8.8133ms | 7.4239ms | 134.6997 Ops/s | 135.1570 Ops/s | -0.34% |
| test_ppo_speed | 8.7702ms | 7.7710ms | 128.6833 Ops/s | 129.9824 Ops/s | -1.00% |
| test_reinforce_speed | 9.0330ms | 6.8494ms | 145.9974 Ops/s | 151.8069 Ops/s | -3.83% |
| test_iql_speed | 39.3403ms | 33.3765ms | 29.9612 Ops/s | 30.6083 Ops/s | -2.11% |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 2.4527ms | 2.2024ms | 454.0443 Ops/s | 445.7992 Ops/s | +1.85% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.9992ms | 0.5019ms | 1.9924 KOps/s | 2.0010 KOps/s | -0.43% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.6594ms | 0.4752ms | 2.1046 KOps/s | 2.1228 KOps/s | -0.86% |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 2.6138ms | 2.2930ms | 436.1068 Ops/s | 445.1155 Ops/s | -2.02% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.0169ms | 0.4953ms | 2.0189 KOps/s | 2.0443 KOps/s | -1.24% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.6661ms | 0.4688ms | 2.1331 KOps/s | 2.1427 KOps/s | -0.45% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.8958ms | 1.2180ms | 821.0388 Ops/s | 778.3532 Ops/s | **+5.48%** |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 4.2978ms | 1.1502ms | 869.3972 Ops/s | 871.8975 Ops/s | -0.29% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.2866ms | 2.3512ms | 425.3058 Ops/s | 424.6132 Ops/s | +0.16% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.1063s | 0.7114ms | 1.4057 KOps/s | 1.6324 KOps/s | **-13.88%** |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.9199ms | 0.5979ms | 1.6725 KOps/s | 1.7072 KOps/s | -2.03% |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 2.7143ms | 2.3406ms | 427.2466 Ops/s | 447.2209 Ops/s | -4.47% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 1.0986ms | 0.5161ms | 1.9375 KOps/s | 2.0153 KOps/s | -3.86% |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.5912ms | 0.4788ms | 2.0887 KOps/s | 2.1016 KOps/s | -0.61% |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 3.5073ms | 2.3662ms | 422.6176 Ops/s | 441.5814 Ops/s | -4.29% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.6615ms | 0.4979ms | 2.0086 KOps/s | 2.0476 KOps/s | -1.91% |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 3.9965ms | 0.4825ms | 2.0725 KOps/s | 2.1134 KOps/s | -1.93% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.4552ms | 2.3668ms | 422.5106 Ops/s | 414.0246 Ops/s | +2.05% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.7596ms | 0.6141ms | 1.6284 KOps/s | 1.6319 KOps/s | -0.21% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.7702ms | 0.5984ms | 1.6711 KOps/s | 1.6973 KOps/s | -1.54% |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.1221s | 7.8768ms | 126.9554 Ops/s | 127.4865 Ops/s | -0.42% |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 15.1955ms | 12.3321ms | 81.0890 Ops/s | 83.4848 Ops/s | -2.87% |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 1.8907ms | 1.1164ms | 895.7701 Ops/s | 958.3805 Ops/s | **-6.53%** |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.1047s | 5.7461ms | 174.0313 Ops/s | 173.8109 Ops/s | +0.13% |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 16.4739ms | 12.2293ms | 81.7706 Ops/s | 82.3765 Ops/s | -0.74% |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 1.7063ms | 1.0951ms | 913.1350 Ops/s | 944.6832 Ops/s | -3.34% |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 0.1164s | 6.3100ms | 158.4797 Ops/s | 122.9751 Ops/s | **+28.87%** |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 0.1260s | 15.2692ms | 65.4914 Ops/s | 80.2057 Ops/s | **-18.35%** |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 4.2557ms | 1.4771ms | 677.0150 Ops/s | 739.9074 Ops/s | **-8.50%** |
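
The Change column appears to be the relative difference in throughput (Ops) between this PR and repo HEAD, and every bolded entry, which together match the Improved/Worsened counts, exceeds 5% in magnitude. Both readings are inferred from the numbers, not from the benchmark bot's documentation, so the 5% threshold in this sketch is an assumption. It recomputes three rows of the CPU table:

```python
def relative_change(ops: float, ops_on_head: float) -> float:
    """Percentage change in throughput of the PR relative to repo HEAD (same unit on both sides)."""
    return (ops - ops_on_head) / ops_on_head * 100.0


def classify(change_pct: float, threshold_pct: float = 5.0) -> str:
    """Assumed rule, inferred from the bolded rows: flag changes larger than threshold_pct."""
    if change_pct > threshold_pct:
        return "improved"
    if change_pct < -threshold_pct:
        return "worsened"
    return "unchanged"


rows = {
    "test_single": (18.7701, 18.2941),  # Ops/s
    "test_async": (35.1161, 36.7166),  # Ops/s
    "test_values[td0_return_estimate-False-False]": (6.0136, 5.5202),  # KOps/s
}
for name, (ops, head) in rows.items():
    pct = relative_change(ops, head)
    print(f"{name}: {pct:+.2f}% ({classify(pct)})")
```

Because the Ops columns are themselves rounded, a recomputed value can differ from the table in the last digit (for example, the prioritized LazyMemmapStorage sample row recomputes to about -13.89% rather than -13.88%).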


github-actions bot commented Mar 26, 2024

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 94. Improved: 4. Worsened: 7.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_single | 0.1027s | 0.1013s | 9.8716 Ops/s | 9.2686 Ops/s | **+6.51%** |
| test_sync | 88.8339ms | 87.3446ms | 11.4489 Ops/s | 11.2616 Ops/s | +1.66% |
| test_async | 0.1626s | 71.8186ms | 13.9240 Ops/s | 13.9564 Ops/s | -0.23% |
| test_single_pixels | 0.1106s | 0.1097s | 9.1162 Ops/s | 9.1859 Ops/s | -0.76% |
| test_sync_pixels | 69.3908ms | 67.4312ms | 14.8299 Ops/s | 15.0853 Ops/s | -1.69% |
| test_async_pixels | 0.1284s | 63.4724ms | 15.7549 Ops/s | 18.1342 Ops/s | **-13.12%** |
| test_simple | 0.7513s | 0.6842s | 1.4615 Ops/s | 1.4451 Ops/s | +1.13% |
| test_transformed | 0.9641s | 0.9011s | 1.1097 Ops/s | 1.1288 Ops/s | -1.69% |
| test_serial | 2.1475s | 2.0823s | 0.4802 Ops/s | 0.4758 Ops/s | +0.93% |
| test_parallel | 1.9230s | 1.8435s | 0.5424 Ops/s | 0.5575 Ops/s | -2.70% |
| test_step_mdp_speed[True-True-True-True-True] | 90.2910μs | 33.7893μs | 29.5951 KOps/s | 30.3243 KOps/s | -2.40% |
| test_step_mdp_speed[True-True-True-True-False] | 55.0010μs | 19.9174μs | 50.2073 KOps/s | 50.8639 KOps/s | -1.29% |
| test_step_mdp_speed[True-True-True-False-True] | 85.7810μs | 18.8942μs | 52.9263 KOps/s | 54.7489 KOps/s | -3.33% |
| test_step_mdp_speed[True-True-True-False-False] | 26.6100μs | 11.1959μs | 89.3187 KOps/s | 89.7529 KOps/s | -0.48% |
| test_step_mdp_speed[True-True-False-True-True] | 70.7210μs | 35.3093μs | 28.3212 KOps/s | 29.3588 KOps/s | -3.53% |
| test_step_mdp_speed[True-True-False-True-False] | 81.1310μs | 21.7125μs | 46.0563 KOps/s | 47.1354 KOps/s | -2.29% |
| test_step_mdp_speed[True-True-False-False-True] | 40.9910μs | 20.9013μs | 47.8440 KOps/s | 49.0400 KOps/s | -2.44% |
| test_step_mdp_speed[True-True-False-False-False] | 47.1810μs | 13.2735μs | 75.3383 KOps/s | 77.4936 KOps/s | -2.78% |
| test_step_mdp_speed[True-False-True-True-True] | 62.3610μs | 36.9616μs | 27.0551 KOps/s | 27.7295 KOps/s | -2.43% |
| test_step_mdp_speed[True-False-True-True-False] | 51.4710μs | 23.8136μs | 41.9927 KOps/s | 43.7485 KOps/s | -4.01% |
| test_step_mdp_speed[True-False-True-False-True] | 38.9510μs | 20.7645μs | 48.1592 KOps/s | 50.6265 KOps/s | -4.87% |
| test_step_mdp_speed[True-False-True-False-False] | 28.7700μs | 13.1848μs | 75.8449 KOps/s | 77.0773 KOps/s | -1.60% |
| test_step_mdp_speed[True-False-False-True-True] | 66.9610μs | 39.2771μs | 25.4602 KOps/s | 26.1514 KOps/s | -2.64% |
| test_step_mdp_speed[True-False-False-True-False] | 71.3710μs | 25.5317μs | 39.1669 KOps/s | 40.0616 KOps/s | -2.23% |
| test_step_mdp_speed[True-False-False-False-True] | 42.0310μs | 22.5788μs | 44.2894 KOps/s | 46.2623 KOps/s | -4.26% |
| test_step_mdp_speed[True-False-False-False-False] | 33.9710μs | 15.1331μs | 66.0803 KOps/s | 67.5110 KOps/s | -2.12% |
| test_step_mdp_speed[False-True-True-True-True] | 72.2910μs | 37.6643μs | 26.5504 KOps/s | 28.0183 KOps/s | **-5.24%** |
| test_step_mdp_speed[False-True-True-True-False] | 43.0200μs | 23.6058μs | 42.3625 KOps/s | 43.8102 KOps/s | -3.30% |
| test_step_mdp_speed[False-True-True-False-True] | 57.0110μs | 24.8980μs | 40.1639 KOps/s | 41.7949 KOps/s | -3.90% |
| test_step_mdp_speed[False-True-True-False-False] | 40.4010μs | 14.7952μs | 67.5896 KOps/s | 68.3151 KOps/s | -1.06% |
| test_step_mdp_speed[False-True-False-True-True] | 78.2510μs | 39.2851μs | 25.4549 KOps/s | 26.2466 KOps/s | -3.02% |
| test_step_mdp_speed[False-True-False-True-False] | 0.2068ms | 25.5247μs | 39.1778 KOps/s | 39.8467 KOps/s | -1.68% |
| test_step_mdp_speed[False-True-False-False-True] | 47.8910μs | 26.3705μs | 37.9212 KOps/s | 38.8869 KOps/s | -2.48% |
| test_step_mdp_speed[False-True-False-False-False] | 38.5310μs | 16.7475μs | 59.7103 KOps/s | 60.5427 KOps/s | -1.37% |
| test_step_mdp_speed[False-False-True-True-True] | 76.9010μs | 41.3910μs | 24.1598 KOps/s | 25.1420 KOps/s | -3.91% |
| test_step_mdp_speed[False-False-True-True-False] | 45.4310μs | 27.4479μs | 36.4327 KOps/s | 37.1553 KOps/s | -1.94% |
| test_step_mdp_speed[False-False-True-False-True] | 53.5500μs | 26.1215μs | 38.2827 KOps/s | 39.0848 KOps/s | -2.05% |
| test_step_mdp_speed[False-False-True-False-False] | 33.2410μs | 16.7285μs | 59.7784 KOps/s | 60.1820 KOps/s | -0.67% |
| test_step_mdp_speed[False-False-False-True-True] | 69.4610μs | 42.0668μs | 23.7717 KOps/s | 24.2140 KOps/s | -1.83% |
| test_step_mdp_speed[False-False-False-True-False] | 59.4610μs | 29.3196μs | 34.1069 KOps/s | 34.9961 KOps/s | -2.54% |
| test_step_mdp_speed[False-False-False-False-True] | 50.9710μs | 27.9004μs | 35.8417 KOps/s | 36.9359 KOps/s | -2.96% |
| test_step_mdp_speed[False-False-False-False-False] | 44.4710μs | 18.3406μs | 54.5237 KOps/s | 54.6840 KOps/s | -0.29% |
| test_values[generalized_advantage_estimate-True-True] | 25.0296ms | 24.5631ms | 40.7114 Ops/s | 41.6859 Ops/s | -2.34% |
| test_values[vec_generalized_advantage_estimate-True-True] | 82.5006ms | 3.2208ms | 310.4796 Ops/s | 307.9508 Ops/s | +0.82% |
| test_values[td0_return_estimate-False-False] | 94.0910μs | 66.3601μs | 15.0693 KOps/s | 15.1780 KOps/s | -0.72% |
| test_values[td1_return_estimate-False-False] | 55.7502ms | 55.4171ms | 18.0450 Ops/s | 18.5148 Ops/s | -2.54% |
| test_values[vec_td1_return_estimate-False-False] | 2.1321ms | 1.7760ms | 563.0503 Ops/s | 564.4104 Ops/s | -0.24% |
| test_values[td_lambda_return_estimate-True-False] | 88.1122ms | 87.8007ms | 11.3894 Ops/s | 11.6763 Ops/s | -2.46% |
| test_values[vec_td_lambda_return_estimate-True-False] | 2.1168ms | 1.7737ms | 563.7921 Ops/s | 563.6563 Ops/s | +0.02% |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 24.6290ms | 24.4217ms | 40.9472 Ops/s | 42.2255 Ops/s | -3.03% |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 0.9146ms | 0.7171ms | 1.3945 KOps/s | 1.3991 KOps/s | -0.33% |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.7287ms | 0.6613ms | 1.5121 KOps/s | 1.5159 KOps/s | -0.25% |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 1.5980ms | 1.4669ms | 681.7070 Ops/s | 680.5353 Ops/s | +0.17% |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 0.9825ms | 0.6852ms | 1.4594 KOps/s | 1.4688 KOps/s | -0.64% |
| test_dqn_speed | 3.1749ms | 1.4535ms | 687.9778 Ops/s | 695.7797 Ops/s | -1.12% |
| test_ddpg_speed | 2.9353ms | 2.7616ms | 362.1076 Ops/s | 366.3564 Ops/s | -1.16% |
| test_sac_speed | 8.5834ms | 8.1405ms | 122.8432 Ops/s | 124.0672 Ops/s | -0.99% |
| test_redq_speed | 11.2733ms | 10.2944ms | 97.1403 Ops/s | 97.2235 Ops/s | -0.09% |
| test_redq_deprec_speed | 11.4476ms | 11.0777ms | 90.2718 Ops/s | 84.7380 Ops/s | **+6.53%** |
| test_td3_speed | 8.1302ms | 8.0675ms | 123.9537 Ops/s | 125.1050 Ops/s | -0.92% |
| test_cql_speed | 26.5135ms | 25.3384ms | 39.4657 Ops/s | 39.7049 Ops/s | -0.60% |
| test_a2c_speed | 7.1851ms | 5.7267ms | 174.6218 Ops/s | 180.7659 Ops/s | -3.40% |
| test_ppo_speed | 6.6788ms | 6.0180ms | 166.1682 Ops/s | 169.5433 Ops/s | -1.99% |
| test_reinforce_speed | 4.8481ms | 4.5732ms | 218.6652 Ops/s | 222.9516 Ops/s | -1.92% |
| test_iql_speed | 20.4252ms | 19.7591ms | 50.6095 Ops/s | 51.3232 Ops/s | -1.39% |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 3.0360ms | 2.8853ms | 346.5835 Ops/s | 344.3605 Ops/s | +0.65% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.6862ms | 0.5444ms | 1.8369 KOps/s | 1.6162 KOps/s | **+13.65%** |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 4.5183ms | 0.5229ms | 1.9125 KOps/s | 1.9406 KOps/s | -1.45% |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 3.1380ms | 2.9059ms | 344.1327 Ops/s | 341.6145 Ops/s | +0.74% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.7098ms | 0.5388ms | 1.8559 KOps/s | 1.8718 KOps/s | -0.85% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 4.6065ms | 0.5185ms | 1.9287 KOps/s | 1.9670 KOps/s | -1.95% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.6134ms | 1.4694ms | 680.5312 Ops/s | 700.5742 Ops/s | -2.86% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 5.1934ms | 1.3983ms | 715.1658 Ops/s | 722.0869 Ops/s | -0.96% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.2160ms | 3.0210ms | 331.0203 Ops/s | 331.1683 Ops/s | -0.04% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.4843ms | 0.6711ms | 1.4901 KOps/s | 1.4909 KOps/s | -0.05% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.8164ms | 0.6450ms | 1.5503 KOps/s | 1.5352 KOps/s | +0.98% |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 3.0815ms | 2.8976ms | 345.1158 Ops/s | 343.7651 Ops/s | +0.39% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 1.4427ms | 0.5444ms | 1.8370 KOps/s | 1.8479 KOps/s | -0.59% |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.6908ms | 0.5218ms | 1.9166 KOps/s | 1.9316 KOps/s | -0.78% |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 3.1532ms | 2.9293ms | 341.3839 Ops/s | 341.8258 Ops/s | -0.13% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.7729ms | 0.5394ms | 1.8538 KOps/s | 1.8645 KOps/s | -0.58% |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 4.5562ms | 0.5200ms | 1.9229 KOps/s | 1.9491 KOps/s | -1.34% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.1995ms | 3.0164ms | 331.5176 Ops/s | 328.2921 Ops/s | +0.98% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.8283ms | 0.6711ms | 1.4902 KOps/s | 1.4957 KOps/s | -0.37% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.8202ms | 0.6484ms | 1.5423 KOps/s | 1.5277 KOps/s | +0.96% |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.1237s | 9.5222ms | 105.0179 Ops/s | 134.3401 Ops/s | **-21.83%** |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 17.4338ms | 15.1665ms | 65.9349 Ops/s | 58.6331 Ops/s | **+12.45%** |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 2.3949ms | 1.1868ms | 842.5853 Ops/s | 954.8889 Ops/s | **-11.76%** |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.1178s | 7.1699ms | 139.4716 Ops/s | 139.4107 Ops/s | +0.04% |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 17.2727ms | 15.0543ms | 66.4263 Ops/s | 68.0604 Ops/s | -2.40% |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 2.5677ms | 1.1526ms | 867.5911 Ops/s | 951.5833 Ops/s | **-8.83%** |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 0.1190s | 7.5575ms | 132.3191 Ops/s | 133.2891 Ops/s | -0.73% |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 0.1289s | 17.7859ms | 56.2244 Ops/s | 66.5063 Ops/s | **-15.46%** |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 7.4459ms | 1.6180ms | 618.0521 Ops/s | 708.5483 Ops/s | **-12.77%** |

vmoens changed the title from "[WIP] optionally set truncated = True at the end of rollouts" to "[Feature] optionally set truncated = True at the end of rollouts" Mar 27, 2024
vmoens added the enhancement label Mar 27, 2024
vmoens merged commit f439b54 into main Mar 27, 2024
62 of 67 checks passed
vmoens deleted the truncated-rollouts branch Mar 27, 2024 09:11
skandermoalla (Contributor) commented Mar 27, 2024

@vmoens I would raise a warning in collectors when set_truncated=True but reset_at_each_iter=False, or even raise an error.
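
A minimal sketch of the suggested check, assuming the collector exposes both settings as plain booleans. `check_truncation_settings` is a hypothetical helper written for illustration, not part of TorchRL:

```python
import warnings


def check_truncation_settings(set_truncated: bool, reset_at_each_iter: bool) -> bool:
    """Hypothetical collector-side check (not TorchRL code).

    Emits a UserWarning and returns True when set_truncated=True is combined with
    reset_at_each_iter=False; returns False otherwise.
    """
    if set_truncated and not reset_at_each_iter:
        warnings.warn(
            "set_truncated=True with reset_at_each_iter=False: every batch ends with "
            "truncated=True even though the next batch continues the same episodes.",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False
```

A warning rather than an error keeps the combination usable for callers who want it on purpose.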

vmoens (Contributor, Author) commented Mar 27, 2024

> @vmoens I would raise a warning in collectors when set_truncated=True but reset_at_each_iter=False, or even raise an error.

Can you elaborate? To me it's fine to have `set_truncated=True` when you don't reset. You will get trajectories that are slices of real ones (the start of the batch isn't the start of the episode), but at least you can tell which steps belong to which trajectory after a `reshape(-1)` or a similar operation, so you can feed the data to GAE or other modules without one trajectory polluting another.
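
The delimiting argument can be sketched in pure Python, assuming a batch of `done` flags (terminated or truncated) shaped `[num_envs][T]`. The helper names are hypothetical and this is not how TorchRL implements `set_truncated`; it only shows why forcing the last step of every row to done lets a flattened batch split cleanly per environment:

```python
from typing import List


def set_truncated_at_end(done_batch: List[List[bool]]) -> List[List[bool]]:
    """Copy of a [num_envs][T] batch of done flags with each row's last step forced to True."""
    out = [list(row) for row in done_batch]
    for row in out:
        row[-1] = True
    return out


def split_segments(flat_done: List[bool]) -> List[List[int]]:
    """Split flat step indices (as after reshape(-1)) into segments that end on a done flag."""
    segments: List[List[int]] = []
    current: List[int] = []
    for index, done in enumerate(flat_done):
        current.append(index)
        if done:
            segments.append(current)
            current = []
    if current:  # trailing steps that never hit a done flag
        segments.append(current)
    return segments


# Two environments, 4 steps each; env 0 ends an episode at its step 1, env 1 never ends.
done = [
    [False, True, False, False],
    [False, False, False, False],
]
flat = [d for row in done for d in row]  # row-major flattening, like reshape(-1)
print(split_segments(flat))  # [[0, 1], [2, 3, 4, 5, 6, 7]]: env 0's tail merges into env 1
flat_cut = [d for row in set_truncated_at_end(done) for d in row]
print(split_segments(flat_cut))  # [[0, 1], [2, 3], [4, 5, 6, 7]]
```

Without the forced flag, steps 2 and 3 (env 0) and steps 4 to 7 (env 1) end up in one segment, which is the cross-trajectory pollution the comment describes.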

skandermoalla (Contributor) commented Mar 27, 2024

TL;DR: my bad, I guess this is a valid thing to do, but anything that computes the episodic return from such data needs to be adjusted.

Okay, I see. I didn't have that use case in mind. I had the opposite assumptions:

  1. When truncation happens in the middle of a rollout, `truncated` is set, and a value estimator (GAE) knows to bootstrap rather than sum over the following rewards. This is not an issue here.

  2. When a rollout stops in the middle of a trajectory and you feed the trajectory to GAE, it bootstraps at the end because `terminated` is not set (`truncated` is not needed: GAE already knows there are no more samples to sum over).

  2b. When you decide to roll out twice because, for whatever reason, your algorithm says you need more data, you can concatenate the rollouts and GAE will make use of more samples.
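
These cases can be made concrete with a small pure-Python GAE. This is a textbook-style sketch, not TorchRL's implementation: it assumes the common convention where `terminated` zeroes the bootstrap value and `done = terminated or truncated` cuts the backward recursion.

```python
from typing import List, Sequence


def gae(
    rewards: Sequence[float],
    values: Sequence[float],
    next_values: Sequence[float],
    terminated: Sequence[bool],
    truncated: Sequence[bool],
    gamma: float = 0.99,
    lmbda: float = 0.95,
) -> List[float]:
    """Textbook GAE over one flat sequence of steps (illustrative, not TorchRL's code).

    terminated[t]: the episode really ended, so V(s_{t+1}) is not bootstrapped.
    truncated[t]:  the trajectory was cut, so V(s_{t+1}) is bootstrapped but the
                   backward recursion must not read the steps that follow.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0  # past the last stored step there is nothing left to sum over (case 2)
    for t in reversed(range(len(rewards))):
        not_terminated = 0.0 if terminated[t] else 1.0
        not_done = 0.0 if (terminated[t] or truncated[t]) else 1.0
        delta = rewards[t] + gamma * next_values[t] * not_terminated - values[t]
        running = delta + gamma * lmbda * not_done * running
        advantages[t] = running
    return advantages


# Two 3-step rollouts concatenated; gamma = lambda = 1 and zero values make advantages reward sums.
rewards, zeros = [1.0] * 6, [0.0] * 6
no_flags = [False] * 6
cut_after_first = [False, False, True, False, False, False]
print(gae(rewards, zeros, zeros, no_flags, no_flags, 1.0, 1.0))  # [6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
print(gae(rewards, zeros, zeros, no_flags, cut_after_first, 1.0, 1.0))  # [3.0, 2.0, 1.0, 3.0, 2.0, 1.0]

# One step, reward 1, next value 5: truncation bootstraps, termination does not.
print(gae([1.0], [0.0], [5.0], [False], [True], 1.0, 1.0))  # [6.0]
print(gae([1.0], [0.0], [5.0], [True], [False], 1.0, 1.0))  # [1.0]
```

Without a flag at the boundary (case 2b), the first rollout's advantages draw on the second rollout's rewards; with `truncated` set there, each rollout is handled on its own while still bootstrapping from `next_values`.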

I thought you would never want to bootstrap (set `truncated`) when the data coming next can still be used to compute a value estimate (no reset), but I guess that's wrong: you may simply want to compute your values over a fixed number of rollout steps.

I can think of another issue, though: anything that computes the episodic return on these trajectories will get the wrong signal.
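
The episodic-return issue can be illustrated with a naive logger that treats every done flag as the end of an episode. `episode_returns` is a hypothetical helper written for this illustration, not a TorchRL API. When `truncated` is forced at a rollout boundary without a reset, one 6-step episode is reported as two 3-step episodes:

```python
from typing import List, Sequence, Tuple


def episode_returns(rewards: Sequence[float], done: Sequence[bool]) -> Tuple[List[float], float]:
    """Sum rewards between done flags (illustrative helper, not a TorchRL API).

    Returns the completed "episode" returns and the partial sum of a trailing open segment.
    """
    returns: List[float] = []
    running = 0.0
    for reward, is_done in zip(rewards, done):
        running += reward
        if is_done:
            returns.append(running)
            running = 0.0
    return returns, running


# One real 6-step episode (reward 1 per step) collected as two 3-step rollouts.
rewards = [1.0] * 6
real_end_only = [False, False, False, False, False, True]
forced_cut = [False, False, True, False, False, True]  # truncated=True at the first rollout's end
print(episode_returns(rewards, real_end_only))  # ([6.0], 0.0)
print(episode_returns(rewards, forced_cut))  # ([3.0, 3.0], 0.0)
```

One way around this, assuming the logger can tell artificial truncations from real episode ends, is to carry the running sum across batches instead of closing the episode at the boundary.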

Labels: CLA Signed, enhancement
3 participants