[Feature] RB MultiStep transform #2008
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2008
Note: Links to docs will display an error until the docs builds have been completed.
❌ 6 New Failures, 1 Unrelated Failure
As of commit ddfce88 with merge base 2b8450c:
NEW FAILURES - The following jobs have failed:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_single | 55.2397ms | 54.3096ms | 18.4129 Ops/s | 17.0154 Ops/s | |
test_sync | 49.1180ms | 30.5735ms | 32.7081 Ops/s | 33.3896 Ops/s | |
test_async | 55.8512ms | 28.8483ms | 34.6641 Ops/s | 34.8361 Ops/s | |
test_simple | 0.4021s | 0.3431s | 2.9150 Ops/s | 2.8794 Ops/s | |
test_transformed | 0.5228s | 0.4774s | 2.0945 Ops/s | 2.0813 Ops/s | |
test_serial | 1.2587s | 1.2006s | 0.8329 Ops/s | 0.8088 Ops/s | |
test_parallel | 1.0810s | 1.0446s | 0.9573 Ops/s | 0.9474 Ops/s | |
test_step_mdp_speed[True-True-True-True-True] | 0.1357ms | 21.1236μs | 47.3404 KOps/s | 47.9664 KOps/s | |
test_step_mdp_speed[True-True-True-True-False] | 39.6240μs | 12.8604μs | 77.7584 KOps/s | 78.8775 KOps/s | |
test_step_mdp_speed[True-True-True-False-True] | 67.7060μs | 12.5542μs | 79.6543 KOps/s | 82.1345 KOps/s | |
test_step_mdp_speed[True-True-True-False-False] | 28.2420μs | 7.5815μs | 131.9006 KOps/s | 134.5477 KOps/s | |
test_step_mdp_speed[True-True-False-True-True] | 64.2100μs | 22.5734μs | 44.2999 KOps/s | 45.3420 KOps/s | |
test_step_mdp_speed[True-True-False-True-False] | 51.1950μs | 13.9565μs | 71.6514 KOps/s | 72.0603 KOps/s | |
test_step_mdp_speed[True-True-False-False-True] | 47.4380μs | 13.7273μs | 72.8475 KOps/s | 75.0534 KOps/s | |
test_step_mdp_speed[True-True-False-False-False] | 22.7920μs | 8.8217μs | 113.3565 KOps/s | 115.2522 KOps/s | |
test_step_mdp_speed[True-False-True-True-True] | 84.8280μs | 23.5563μs | 42.4515 KOps/s | 42.7198 KOps/s | |
test_step_mdp_speed[True-False-True-True-False] | 58.5290μs | 15.3836μs | 65.0044 KOps/s | 66.1872 KOps/s | |
test_step_mdp_speed[True-False-True-False-True] | 66.8340μs | 13.5837μs | 73.6178 KOps/s | 70.8827 KOps/s | |
test_step_mdp_speed[True-False-True-False-False] | 27.8610μs | 8.8230μs | 113.3406 KOps/s | 117.4087 KOps/s | |
test_step_mdp_speed[True-False-False-True-True] | 71.4330μs | 24.9371μs | 40.1009 KOps/s | 40.9825 KOps/s | |
test_step_mdp_speed[True-False-False-True-False] | 60.6630μs | 16.4484μs | 60.7963 KOps/s | 62.0824 KOps/s | |
test_step_mdp_speed[True-False-False-False-True] | 34.2440μs | 14.7981μs | 67.5762 KOps/s | 69.2625 KOps/s | |
test_step_mdp_speed[True-False-False-False-False] | 54.1510μs | 9.8606μs | 101.4142 KOps/s | 103.0649 KOps/s | |
test_step_mdp_speed[False-True-True-True-True] | 56.9060μs | 23.7541μs | 42.0979 KOps/s | 42.9149 KOps/s | |
test_step_mdp_speed[False-True-True-True-False] | 59.9410μs | 15.3455μs | 65.1656 KOps/s | 66.1454 KOps/s | |
test_step_mdp_speed[False-True-True-False-True] | 41.1060μs | 15.7380μs | 63.5403 KOps/s | 65.1623 KOps/s | |
test_step_mdp_speed[False-True-True-False-False] | 58.3980μs | 9.9359μs | 100.6455 KOps/s | 102.7644 KOps/s | |
test_step_mdp_speed[False-True-False-True-True] | 37.1890μs | 25.2497μs | 39.6045 KOps/s | 40.7799 KOps/s | |
test_step_mdp_speed[False-True-False-True-False] | 62.0860μs | 16.2656μs | 61.4794 KOps/s | 61.6155 KOps/s | |
test_step_mdp_speed[False-True-False-False-True] | 38.1710μs | 16.8938μs | 59.1935 KOps/s | 59.9536 KOps/s | |
test_step_mdp_speed[False-True-False-False-False] | 53.5790μs | 11.0696μs | 90.3374 KOps/s | 91.5644 KOps/s | |
test_step_mdp_speed[False-False-True-True-True] | 80.1690μs | 25.9467μs | 38.5406 KOps/s | 39.1314 KOps/s | |
test_step_mdp_speed[False-False-True-True-False] | 57.9980μs | 17.7593μs | 56.3086 KOps/s | 57.0719 KOps/s | |
test_step_mdp_speed[False-False-True-False-True] | 68.8580μs | 16.9200μs | 59.1016 KOps/s | 59.9579 KOps/s | |
test_step_mdp_speed[False-False-True-False-False] | 56.2440μs | 11.1648μs | 89.5676 KOps/s | 91.9210 KOps/s | |
test_step_mdp_speed[False-False-False-True-True] | 73.6370μs | 27.0882μs | 36.9164 KOps/s | 37.2905 KOps/s | |
test_step_mdp_speed[False-False-False-True-False] | 56.3040μs | 18.7844μs | 53.2357 KOps/s | 53.9330 KOps/s | |
test_step_mdp_speed[False-False-False-False-True] | 63.5980μs | 17.9066μs | 55.8453 KOps/s | 56.4411 KOps/s | |
test_step_mdp_speed[False-False-False-False-False] | 60.0710μs | 12.1551μs | 82.2698 KOps/s | 83.7082 KOps/s | |
test_values[generalized_advantage_estimate-True-True] | 9.6162ms | 9.3432ms | 107.0297 Ops/s | 101.7809 Ops/s | |
test_values[vec_generalized_advantage_estimate-True-True] | 38.9748ms | 35.1340ms | 28.4625 Ops/s | 29.7484 Ops/s | |
test_values[td0_return_estimate-False-False] | 0.1956ms | 0.1675ms | 5.9710 KOps/s | 5.1201 KOps/s | |
test_values[td1_return_estimate-False-False] | 25.9116ms | 23.5836ms | 42.4023 Ops/s | 41.9541 Ops/s | |
test_values[vec_td1_return_estimate-False-False] | 36.8319ms | 35.6383ms | 28.0597 Ops/s | 29.6684 Ops/s | |
test_values[td_lambda_return_estimate-True-False] | 36.5092ms | 33.6616ms | 29.7075 Ops/s | 29.5712 Ops/s | |
test_values[vec_td_lambda_return_estimate-True-False] | 36.7253ms | 35.5446ms | 28.1337 Ops/s | 29.8620 Ops/s | |
test_gae_speed[generalized_advantage_estimate-False-1-512] | 8.2880ms | 8.1641ms | 122.4876 Ops/s | 121.1500 Ops/s | |
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 2.1138ms | 1.8609ms | 537.3641 Ops/s | 492.3930 Ops/s | |
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.4161ms | 0.3398ms | 2.9430 KOps/s | 2.8458 KOps/s | |
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 48.4963ms | 46.7232ms | 21.4026 Ops/s | 24.5081 Ops/s | |
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 3.7285ms | 3.0080ms | 332.4431 Ops/s | 327.7110 Ops/s | |
test_dqn_speed | 6.9942ms | 1.3600ms | 735.3133 Ops/s | 723.2078 Ops/s | |
test_ddpg_speed | 3.3529ms | 2.6785ms | 373.3496 Ops/s | 368.9682 Ops/s | |
test_sac_speed | 9.2023ms | 8.1403ms | 122.8459 Ops/s | 120.2305 Ops/s | |
test_redq_speed | 14.1571ms | 12.9521ms | 77.2076 Ops/s | 75.4805 Ops/s | |
test_redq_deprec_speed | 14.2136ms | 13.0098ms | 76.8653 Ops/s | 75.8306 Ops/s | |
test_td3_speed | 9.6185ms | 8.0905ms | 123.6025 Ops/s | 121.5871 Ops/s | |
test_cql_speed | 36.8606ms | 35.7516ms | 27.9708 Ops/s | 27.5889 Ops/s | |
test_a2c_speed | 77.8576ms | 7.8578ms | 127.2617 Ops/s | 134.9268 Ops/s | |
test_ppo_speed | 9.0401ms | 7.6721ms | 130.3429 Ops/s | 131.0389 Ops/s | |
test_reinforce_speed | 7.3551ms | 6.6135ms | 151.2060 Ops/s | 152.7481 Ops/s | |
test_iql_speed | 33.3002ms | 32.0147ms | 31.2356 Ops/s | 30.4422 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 2.5435ms | 2.2603ms | 442.4180 Ops/s | 438.6506 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.9111ms | 0.5026ms | 1.9896 KOps/s | 1.8097 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.6353ms | 0.4724ms | 2.1168 KOps/s | 1.9480 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 2.5590ms | 2.2097ms | 452.5507 Ops/s | 449.6242 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.9793ms | 0.4897ms | 2.0421 KOps/s | 2.0491 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.6426ms | 0.4669ms | 2.1418 KOps/s | 2.1273 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.5997ms | 1.2705ms | 787.0846 Ops/s | 748.4306 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.4226ms | 1.2057ms | 829.3599 Ops/s | 802.6966 Ops/s | |
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.4830ms | 2.3388ms | 427.5640 Ops/s | 437.2774 Ops/s | |
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 95.1741ms | 0.6856ms | 1.4585 KOps/s | 1.6173 KOps/s | |
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.8731ms | 0.5811ms | 1.7210 KOps/s | 1.7004 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 5.8809ms | 2.2455ms | 445.3319 Ops/s | 449.0400 Ops/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.5973ms | 0.5003ms | 1.9987 KOps/s | 2.0056 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 3.7021ms | 0.4820ms | 2.0749 KOps/s | 2.1148 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 3.5995ms | 2.3980ms | 417.0226 Ops/s | 443.2812 Ops/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.6752ms | 0.4897ms | 2.0423 KOps/s | 2.0323 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.6713ms | 0.4674ms | 2.1396 KOps/s | 2.1184 KOps/s | |
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.5745ms | 2.3818ms | 419.8487 Ops/s | 414.8421 Ops/s | |
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.1136ms | 0.6114ms | 1.6356 KOps/s | 1.6219 KOps/s | |
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.7501ms | 0.5818ms | 1.7188 KOps/s | 1.6755 KOps/s | |
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.1024s | 7.5131ms | 133.1017 Ops/s | 137.2360 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 14.4365ms | 12.0554ms | 82.9502 Ops/s | 83.6324 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 3.7645ms | 1.1416ms | 875.9338 Ops/s | 949.1960 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 87.3050ms | 5.3301ms | 187.6152 Ops/s | 135.7376 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 14.4403ms | 12.0072ms | 83.2834 Ops/s | 83.9775 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 3.6615ms | 1.1192ms | 893.4643 Ops/s | 956.7210 Ops/s | |
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 90.9132ms | 7.5499ms | 132.4516 Ops/s | 163.7701 Ops/s | |
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 14.5945ms | 12.2862ms | 81.3922 Ops/s | 68.5295 Ops/s | |
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 2.2413ms | 1.3760ms | 726.7642 Ops/s | 732.4263 Ops/s |

Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
---|---|---|---|---|---|
test_single | 0.1026s | 0.1008s | 9.9224 Ops/s | 8.9794 Ops/s | |
test_sync | 93.2983ms | 90.4722ms | 11.0531 Ops/s | 10.8826 Ops/s | |
test_async | 0.1762s | 89.0927ms | 11.2243 Ops/s | 11.2765 Ops/s | |
test_single_pixels | 0.1816s | 0.1181s | 8.4683 Ops/s | 8.8167 Ops/s | |
test_sync_pixels | 69.0961ms | 67.7455ms | 14.7611 Ops/s | 14.7793 Ops/s | |
test_async_pixels | 0.1225s | 55.4467ms | 18.0353 Ops/s | 17.5950 Ops/s | |
test_simple | 0.7276s | 0.6723s | 1.4875 Ops/s | 1.4588 Ops/s | |
test_transformed | 0.9257s | 0.8713s | 1.1477 Ops/s | 1.1124 Ops/s | |
test_serial | 2.1422s | 2.0915s | 0.4781 Ops/s | 0.4664 Ops/s | |
test_parallel | 1.8980s | 1.8659s | 0.5359 Ops/s | 0.5502 Ops/s | |
test_step_mdp_speed[True-True-True-True-True] | 87.2150μs | 32.6232μs | 30.6530 KOps/s | 30.5787 KOps/s | |
test_step_mdp_speed[True-True-True-True-False] | 38.5830μs | 19.6904μs | 50.7861 KOps/s | 51.1338 KOps/s | |
test_step_mdp_speed[True-True-True-False-True] | 46.9420μs | 18.7218μs | 53.4136 KOps/s | 53.2623 KOps/s | |
test_step_mdp_speed[True-True-True-False-False] | 37.3310μs | 11.1527μs | 89.6646 KOps/s | 89.0231 KOps/s | |
test_step_mdp_speed[True-True-False-True-True] | 58.4630μs | 34.4433μs | 29.0332 KOps/s | 28.9692 KOps/s | |
test_step_mdp_speed[True-True-False-True-False] | 38.5420μs | 21.2697μs | 47.0152 KOps/s | 47.1266 KOps/s | |
test_step_mdp_speed[True-True-False-False-True] | 36.7630μs | 20.4306μs | 48.9462 KOps/s | 49.8005 KOps/s | |
test_step_mdp_speed[True-True-False-False-False] | 36.7520μs | 13.0397μs | 76.6891 KOps/s | 76.4129 KOps/s | |
test_step_mdp_speed[True-False-True-True-True] | 58.0240μs | 36.4431μs | 27.4400 KOps/s | 27.7327 KOps/s | |
test_step_mdp_speed[True-False-True-True-False] | 42.9730μs | 23.2366μs | 43.0356 KOps/s | 42.8265 KOps/s | |
test_step_mdp_speed[True-False-True-False-True] | 45.1030μs | 20.2512μs | 49.3798 KOps/s | 49.2630 KOps/s | |
test_step_mdp_speed[True-False-True-False-False] | 27.7520μs | 12.9574μs | 77.1762 KOps/s | 77.1979 KOps/s | |
test_step_mdp_speed[True-False-False-True-True] | 64.2140μs | 38.3790μs | 26.0559 KOps/s | 26.6114 KOps/s | |
test_step_mdp_speed[True-False-False-True-False] | 49.4130μs | 25.2262μs | 39.6413 KOps/s | 40.2207 KOps/s | |
test_step_mdp_speed[True-False-False-False-True] | 47.3130μs | 21.9061μs | 45.6495 KOps/s | 45.6322 KOps/s | |
test_step_mdp_speed[True-False-False-False-False] | 38.4620μs | 14.6481μs | 68.2681 KOps/s | 67.9560 KOps/s | |
test_step_mdp_speed[False-True-True-True-True] | 60.2940μs | 36.3306μs | 27.5250 KOps/s | 27.7301 KOps/s | |
test_step_mdp_speed[False-True-True-True-False] | 55.1330μs | 23.1913μs | 43.1196 KOps/s | 43.0816 KOps/s | |
test_step_mdp_speed[False-True-True-False-True] | 53.9630μs | 24.5690μs | 40.7016 KOps/s | 42.2944 KOps/s | |
test_step_mdp_speed[False-True-True-False-False] | 37.7930μs | 14.7929μs | 67.6001 KOps/s | 67.3720 KOps/s | |
test_step_mdp_speed[False-True-False-True-True] | 69.6240μs | 38.6807μs | 25.8527 KOps/s | 26.0104 KOps/s | |
test_step_mdp_speed[False-True-False-True-False] | 46.9130μs | 25.3339μs | 39.4728 KOps/s | 39.6026 KOps/s | |
test_step_mdp_speed[False-True-False-False-True] | 43.2810μs | 25.8333μs | 38.7098 KOps/s | 38.2899 KOps/s | |
test_step_mdp_speed[False-True-False-False-False] | 37.5420μs | 16.7718μs | 59.6239 KOps/s | 60.0938 KOps/s | |
test_step_mdp_speed[False-False-True-True-True] | 64.4130μs | 39.7703μs | 25.1444 KOps/s | 24.8837 KOps/s | |
test_step_mdp_speed[False-False-True-True-False] | 49.8730μs | 27.0405μs | 36.9815 KOps/s | 36.8751 KOps/s | |
test_step_mdp_speed[False-False-True-False-True] | 52.8720μs | 26.1020μs | 38.3112 KOps/s | 38.3862 KOps/s | |
test_step_mdp_speed[False-False-True-False-False] | 35.0010μs | 16.7248μs | 59.7915 KOps/s | 60.0534 KOps/s | |
test_step_mdp_speed[False-False-False-True-True] | 58.8740μs | 41.8937μs | 23.8699 KOps/s | 24.2384 KOps/s | |
test_step_mdp_speed[False-False-False-True-False] | 52.0730μs | 28.7207μs | 34.8181 KOps/s | 34.5716 KOps/s | |
test_step_mdp_speed[False-False-False-False-True] | 48.7430μs | 27.6634μs | 36.1488 KOps/s | 36.2752 KOps/s | |
test_step_mdp_speed[False-False-False-False-False] | 37.7720μs | 18.2018μs | 54.9396 KOps/s | 54.3757 KOps/s | |
test_values[generalized_advantage_estimate-True-True] | 25.8758ms | 24.6231ms | 40.6123 Ops/s | 39.2225 Ops/s | |
test_values[vec_generalized_advantage_estimate-True-True] | 95.8533ms | 3.4766ms | 287.6374 Ops/s | 307.2527 Ops/s | |
test_values[td0_return_estimate-False-False] | 93.2740μs | 64.1303μs | 15.5932 KOps/s | 15.0885 KOps/s | |
test_values[td1_return_estimate-False-False] | 52.2964ms | 51.6854ms | 19.3478 Ops/s | 18.1430 Ops/s | |
test_values[vec_td1_return_estimate-False-False] | 1.9428ms | 1.7481ms | 572.0559 Ops/s | 566.5981 Ops/s | |
test_values[td_lambda_return_estimate-True-False] | 83.1762ms | 82.5093ms | 12.1198 Ops/s | 11.4121 Ops/s | |
test_values[vec_td_lambda_return_estimate-True-False] | 2.0434ms | 1.7486ms | 571.8803 Ops/s | 567.6234 Ops/s | |
test_gae_speed[generalized_advantage_estimate-False-1-512] | 23.0065ms | 22.7172ms | 44.0195 Ops/s | 42.6969 Ops/s | |
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 0.8722ms | 0.6897ms | 1.4498 KOps/s | 1.4416 KOps/s | |
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.7120ms | 0.6399ms | 1.5628 KOps/s | 1.5468 KOps/s | |
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 1.4954ms | 1.4436ms | 692.7072 Ops/s | 688.9805 Ops/s | |
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 0.9430ms | 0.6635ms | 1.5071 KOps/s | 1.4986 KOps/s | |
test_dqn_speed | 7.9962ms | 1.4425ms | 693.2542 Ops/s | 683.3785 Ops/s | |
test_ddpg_speed | 2.9893ms | 2.7240ms | 367.1091 Ops/s | 364.3629 Ops/s | |
test_sac_speed | 8.4200ms | 8.0084ms | 124.8693 Ops/s | 121.5691 Ops/s | |
test_redq_speed | 11.1589ms | 10.0636ms | 99.3679 Ops/s | 95.9891 Ops/s | |
test_redq_deprec_speed | 11.4745ms | 10.7043ms | 93.4203 Ops/s | 88.2921 Ops/s | |
test_td3_speed | 8.2623ms | 7.9278ms | 126.1380 Ops/s | 122.4805 Ops/s | |
test_cql_speed | 26.3237ms | 25.2289ms | 39.6370 Ops/s | 39.0922 Ops/s | |
test_a2c_speed | 6.7543ms | 5.4871ms | 182.2458 Ops/s | 180.2229 Ops/s | |
test_ppo_speed | 6.7604ms | 5.7417ms | 174.1650 Ops/s | 169.8639 Ops/s | |
test_reinforce_speed | 4.6484ms | 4.4443ms | 225.0073 Ops/s | 222.2596 Ops/s | |
test_iql_speed | 19.7389ms | 19.1285ms | 52.2780 Ops/s | 51.2922 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 3.0869ms | 2.8796ms | 347.2729 Ops/s | 344.2655 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.6603ms | 0.5376ms | 1.8603 KOps/s | 1.8548 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 4.3175ms | 0.5163ms | 1.9368 KOps/s | 1.9228 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 3.1610ms | 2.8885ms | 346.2012 Ops/s | 346.8901 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.6728ms | 0.5284ms | 1.8926 KOps/s | 1.8805 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 4.5575ms | 0.5161ms | 1.9375 KOps/s | 1.9579 KOps/s | |
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.6204ms | 1.5150ms | 660.0794 Ops/s | 661.7739 Ops/s | |
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 5.3627ms | 1.4471ms | 691.0345 Ops/s | 693.8739 Ops/s | |
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.0901ms | 3.0071ms | 332.5457 Ops/s | 332.0259 Ops/s | |
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.4649ms | 0.6579ms | 1.5199 KOps/s | 1.3228 KOps/s | |
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.8365ms | 0.6360ms | 1.5723 KOps/s | 1.5642 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 2.9517ms | 2.8778ms | 347.4826 Ops/s | 345.1936 Ops/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 1.2415ms | 0.5431ms | 1.8413 KOps/s | 1.8469 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.7203ms | 0.5170ms | 1.9343 KOps/s | 1.9338 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 3.2246ms | 2.9180ms | 342.7000 Ops/s | 345.6547 Ops/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.6193ms | 0.5298ms | 1.8875 KOps/s | 1.8886 KOps/s | |
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 4.5273ms | 0.5130ms | 1.9494 KOps/s | 1.9484 KOps/s | |
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.1921ms | 3.0077ms | 332.4791 Ops/s | 330.8049 Ops/s | |
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.8840ms | 0.6633ms | 1.5075 KOps/s | 1.4995 KOps/s | |
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.7743ms | 0.6394ms | 1.5640 KOps/s | 1.5560 KOps/s | |
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.1058s | 8.7414ms | 114.3980 Ops/s | 111.9774 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 16.6530ms | 14.3538ms | 69.6681 Ops/s | 68.1723 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 1.1402ms | 1.0397ms | 961.7768 Ops/s | 834.3765 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.1001s | 6.7293ms | 148.6036 Ops/s | 149.0107 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 16.6246ms | 14.3152ms | 69.8556 Ops/s | 68.3636 Ops/s | |
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 2.4998ms | 1.1988ms | 834.1687 Ops/s | 760.5105 Ops/s | |
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 0.1031s | 9.0474ms | 110.5291 Ops/s | 111.6162 Ops/s | |
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 17.2234ms | 14.6359ms | 68.3250 Ops/s | 67.0633 Ops/s | |
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 7.8760ms | 1.6455ms | 607.7174 Ops/s | 657.8848 Ops/s |

@AechPro I did not change the n_steps argument as you suggested, for consistency within torchrl. I'll ping you when the preview of the doc is built so you can check whether things make sense; in the meantime you can look at the diff.
@vmoens I understand that backwards compatibility is important, but if we expect torchrl to be used largely by the research community, it seems a bit strange for the n_steps argument to behave in a slightly unexpected way. Consider a user of torchrl who is re-implementing an algorithm like RAINBOW in good faith and naturally sets the n_steps argument to 3, as suggested by the paper. With the current behavior this would actually produce a multi-step return estimate with n=4 in the paper's terms, which would meaningfully change the performance of the algorithm. I can imagine our hypothetical torchrl user becoming quite confused when they cannot replicate the results of RAINBOW despite having (seemingly) implemented everything correctly.

Further, if we expect new algorithms to be written using torchrl, this discrepancy in the meaning of the multi-step parameter could cause confusion in the opposite direction: someone not using torchrl who tries to reproduce a paper whose results come from a torchrl implementation using the multi-step return object may find it difficult to replicate those results, for the same reason as our hypothetical RAINBOW user.

In my opinion, preserving the meaning of the parameter in the multi-step return algorithm is more important than backwards compatibility because of the two examples above, but I understand others may feel differently. At the very least I would strongly advocate for some easy-to-see documentation about this behavior whenever there is a tutorial using the
Makes sense! Then let's make this
I'm having a little trouble understanding the output of the example. We have the I'm sure I must be misunderstanding what's happening in the example. Could you clarify?
Let me clarify:

>>> print("step_count", rb[:]["step_count"][:, :5])
step_count tensor([[[ 9],
[10],
[11],
[12],
[13]],
[[12],
[13],
[14],
[15],
[16]]])

Env 0 has steps [9, 10, 11, ...] and env 1 has steps [12, 13, 14, ...]. Then we look at the "next" entry to see the shift.

>>> print("next step_count", rb[:]["next", "step_count"][:, :5])
next step_count tensor([[[13],
[14],
[15],
[16],
[17]],
[[16],
[17],
[18],
[19],
[20]]])

Note that we're looking at the replay buffer content, so it doesn't really matter what those values are (i.e., it's expected that they don't start at 0). For a single env and n=3, you would have this data structure accessible in the buffer:

done: [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
step_count: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
next, step_count_orig: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
next, step_count: [4, 5, 6, 7, 8, 9, 10, 10, 10, 10]

If you think this isn't the desired behaviour I'd be happy to read what you think would be expected, but to me it looks like what multi-step requires.
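
The table above can be reproduced with a small plain-Python sketch. This is only an illustration of the indexing, not the torchrl implementation: the function name `shifted_next_step_count` and its `shift` parameter are hypothetical, and the assumption is that the "next" entry is looked up `shift` transitions ahead and clamped at the last transition of the episode. The numbers in the table correspond to `shift=4` here.

```python
def shifted_next_step_count(step_count, done, shift):
    """Return, for each index t, the "next" step count `shift` transitions
    ahead, clamped to the last transition of the episode containing t.

    Hypothetical helper for illustration only; not part of torchrl.
    """
    # One-step "next" values: the step count observed after each transition.
    next_orig = [s + 1 for s in step_count]
    out = []
    for t in range(len(step_count)):
        # Last index of the episode containing t (first done at or after t,
        # or the end of the data if no done flag is found).
        end = t
        while end < len(done) - 1 and not done[end]:
            end += 1
        out.append(next_orig[min(t + shift - 1, end)])
    return out


done = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
step_count = list(range(10))
print(shifted_next_step_count(step_count, done, shift=4))
# [4, 5, 6, 7, 8, 9, 10, 10, 10, 10]
```

With `shift=1` the function returns the unshifted `next, step_count_orig` row.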
Ah-hah, my apologies for misunderstanding. Thanks for the clarification, this looks good!
Although if you are looking to implement the change to the

done: [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
step_count: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
next, step_count_orig: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
next, step_count: [3, 4, 5, 6, 7, 8, 9, 10, 10, 10]

Because starting from state 0 we will compute the return estimate as the sum of rewards starting with timestep 0, then 1, then 2, and finally the learning algorithm will need to bootstrap from the state encountered at timestep 3.
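
To make the n=3 reasoning concrete, here is a minimal plain-Python sketch of an n-step target for a single trajectory. It is an assumption-laden illustration rather than the code in this PR: the name `n_step_targets`, the discount `gamma`, and the returned tuple layout are all hypothetical. For each timestep it sums up to `n` discounted rewards, stops early at a done flag, and reports the index of the state the learner would bootstrap from.

```python
def n_step_targets(rewards, done, n, gamma):
    """Compute (n-step return, bootstrap state index, terminal flag) per step.

    Hypothetical helper for illustration only; not part of torchrl.
    """
    targets = []
    num_steps = len(rewards)
    for t in range(num_steps):
        ret, k, terminal = 0.0, 0, False
        # Accumulate at most n rewards, stopping at the end of the episode.
        while k < n and t + k < num_steps:
            ret += (gamma ** k) * rewards[t + k]
            k += 1
            if done[t + k - 1]:
                terminal = True
                break
        # After consuming k transitions, the agent is in state t + k.
        targets.append((ret, t + k, terminal))
    return targets


done = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
rewards = [1.0] * 10
targets = n_step_targets(rewards, done, n=3, gamma=1.0)
print([idx for _, idx, _ in targets])
# [3, 4, 5, 6, 7, 8, 9, 10, 10, 10]
```

The bootstrap indices match the `next, step_count` row above. When the terminal flag is set, the learner would typically not bootstrap from the reached state.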
So given what you're saying, what it should be is: I will change that thx
I'm not sure what the right outcome for With that said I'm not aware of anything in the literature using n=0, and I wonder if it would be confusing or if it's even necessary.
The way I see it n=0 is equivalent to "doing nothing" which can be achieved by... doing nothing haha
LOL 😅 yeah it makes sense to do it that way. I was just imagining a scenario where maybe a user is doing some sort of hyper-parameter investigation and they wanted to vary the value of n from 0 to some number without changing anything in the underlying learning algorithm, which is maybe a realistic thing to want if someone is interested in measuring the impact of rewards on the return estimate.
I guess they will have to start from 1 :) I think accounting for those edge cases causes more hassle and loads the doc while bringing little value in practice (+ requires proper testing of a behaviour that is even poorly defined on our end!)
Yeah, fair enough. Let's keep it as you suggested then!
cc @agarwl this is an implementation of multi-step that allows the horizon to be changed dynamically during training
Test script
cc @AechPro