[BugFix] Fix max-priority update #2215

Merged
merged 6 commits into from
Jun 8, 2024

Conversation

vmoens
Contributor

@vmoens vmoens commented Jun 7, 2024

No description provided.


pytorch-bot bot commented Jun 7, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2215

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 4 New Failures

As of commit 40f7b2d with merge base 0813dc0 (image):

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 7, 2024
@vmoens vmoens added the bug Something isn't working label Jun 7, 2024
@vmoens vmoens linked an issue Jun 7, 2024 that may be closed by this pull request
3 tasks
@vmoens
Contributor Author

vmoens commented Jun 7, 2024

@wertyuilife2 can you confirm that this makes sense?


github-actions bot commented Jun 7, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 91. Improved: $\large\color{#35bf28}3$. Worsened: $\large\color{#d91a1a}2$.

Name Max Mean Ops Ops on Repo HEAD Change
test_single 0.1043s 58.8602ms 16.9894 Ops/s 17.6957 Ops/s $\color{#d91a1a}-3.99\%$
test_sync 37.1610ms 30.7970ms 32.4707 Ops/s 32.5672 Ops/s $\color{#d91a1a}-0.30\%$
test_async 48.3588ms 28.8045ms 34.7168 Ops/s 33.7761 Ops/s $\color{#35bf28}+2.79\%$
test_simple 0.4605s 0.3977s 2.5144 Ops/s 2.6398 Ops/s $\color{#d91a1a}-4.75\%$
test_transformed 0.5367s 0.5360s 1.8657 Ops/s 1.8770 Ops/s $\color{#d91a1a}-0.60\%$
test_serial 1.3348s 1.2833s 0.7792 Ops/s 0.7788 Ops/s $\color{#35bf28}+0.05\%$
test_parallel 1.1384s 1.0853s 0.9214 Ops/s 0.9214 Ops/s $+0.00\%$
test_step_mdp_speed[True-True-True-True-True] 0.1160ms 21.2276μs 47.1084 KOps/s 46.5276 KOps/s $\color{#35bf28}+1.25\%$
test_step_mdp_speed[True-True-True-True-False] 39.8250μs 12.8734μs 77.6798 KOps/s 76.1392 KOps/s $\color{#35bf28}+2.02\%$
test_step_mdp_speed[True-True-True-False-True] 46.6070μs 12.5464μs 79.7044 KOps/s 78.3356 KOps/s $\color{#35bf28}+1.75\%$
test_step_mdp_speed[True-True-True-False-False] 28.9740μs 7.6003μs 131.5735 KOps/s 128.7339 KOps/s $\color{#35bf28}+2.21\%$
test_step_mdp_speed[True-True-False-True-True] 46.5370μs 22.4981μs 44.4482 KOps/s 43.8488 KOps/s $\color{#35bf28}+1.37\%$
test_step_mdp_speed[True-True-False-True-False] 69.5100μs 14.1474μs 70.6842 KOps/s 69.3626 KOps/s $\color{#35bf28}+1.91\%$
test_step_mdp_speed[True-True-False-False-True] 35.7170μs 13.8797μs 72.0474 KOps/s 71.8321 KOps/s $\color{#35bf28}+0.30\%$
test_step_mdp_speed[True-True-False-False-False] 39.9420μs 8.7764μs 113.9417 KOps/s 109.6297 KOps/s $\color{#35bf28}+3.93\%$
test_step_mdp_speed[True-False-True-True-True] 50.8950μs 24.1007μs 41.4925 KOps/s 40.9210 KOps/s $\color{#35bf28}+1.40\%$
test_step_mdp_speed[True-False-True-True-False] 65.9630μs 15.4680μs 64.6494 KOps/s 64.0287 KOps/s $\color{#35bf28}+0.97\%$
test_step_mdp_speed[True-False-True-False-True] 39.3730μs 13.7240μs 72.8650 KOps/s 71.5930 KOps/s $\color{#35bf28}+1.78\%$
test_step_mdp_speed[True-False-True-False-False] 36.8590μs 8.7441μs 114.3624 KOps/s 111.6765 KOps/s $\color{#35bf28}+2.41\%$
test_step_mdp_speed[True-False-False-True-True] 53.2990μs 25.0813μs 39.8703 KOps/s 39.4315 KOps/s $\color{#35bf28}+1.11\%$
test_step_mdp_speed[True-False-False-True-False] 55.1830μs 16.7056μs 59.8600 KOps/s 59.4001 KOps/s $\color{#35bf28}+0.77\%$
test_step_mdp_speed[True-False-False-False-True] 34.7750μs 14.9684μs 66.8073 KOps/s 65.2349 KOps/s $\color{#35bf28}+2.41\%$
test_step_mdp_speed[True-False-False-False-False] 40.2550μs 9.9874μs 100.1260 KOps/s 99.7417 KOps/s $\color{#35bf28}+0.39\%$
test_step_mdp_speed[False-True-True-True-True] 51.6860μs 23.9312μs 41.7864 KOps/s 41.7730 KOps/s $\color{#35bf28}+0.03\%$
test_step_mdp_speed[False-True-True-True-False] 35.8170μs 15.5180μs 64.4414 KOps/s 63.7693 KOps/s $\color{#35bf28}+1.05\%$
test_step_mdp_speed[False-True-True-False-True] 47.1780μs 16.0036μs 62.4860 KOps/s 62.1575 KOps/s $\color{#35bf28}+0.53\%$
test_step_mdp_speed[False-True-True-False-False] 45.7350μs 10.0420μs 99.5819 KOps/s 100.2382 KOps/s $\color{#d91a1a}-0.65\%$
test_step_mdp_speed[False-True-False-True-True] 81.6520μs 24.7959μs 40.3292 KOps/s 39.6193 KOps/s $\color{#35bf28}+1.79\%$
test_step_mdp_speed[False-True-False-True-False] 37.5500μs 16.7234μs 59.7966 KOps/s 59.6463 KOps/s $\color{#35bf28}+0.25\%$
test_step_mdp_speed[False-True-False-False-True] 41.3060μs 17.2385μs 58.0098 KOps/s 58.3641 KOps/s $\color{#d91a1a}-0.61\%$
test_step_mdp_speed[False-True-False-False-False] 32.9820μs 11.2561μs 88.8406 KOps/s 89.0066 KOps/s $\color{#d91a1a}-0.19\%$
test_step_mdp_speed[False-False-True-True-True] 58.1190μs 26.4074μs 37.8682 KOps/s 37.6319 KOps/s $\color{#35bf28}+0.63\%$
test_step_mdp_speed[False-False-True-True-False] 65.8530μs 17.9931μs 55.5767 KOps/s 55.0455 KOps/s $\color{#35bf28}+0.97\%$
test_step_mdp_speed[False-False-True-False-True] 44.6330μs 17.2401μs 58.0042 KOps/s 58.0825 KOps/s $\color{#d91a1a}-0.13\%$
test_step_mdp_speed[False-False-True-False-False] 39.1730μs 11.2879μs 88.5904 KOps/s 88.6854 KOps/s $\color{#d91a1a}-0.11\%$
test_step_mdp_speed[False-False-False-True-True] 40.9660μs 28.0060μs 35.7067 KOps/s 35.7087 KOps/s $-0.01\%$
test_step_mdp_speed[False-False-False-True-False] 49.6520μs 19.1233μs 52.2923 KOps/s 51.7268 KOps/s $\color{#35bf28}+1.09\%$
test_step_mdp_speed[False-False-False-False-True] 48.4210μs 18.3601μs 54.4660 KOps/s 55.1702 KOps/s $\color{#d91a1a}-1.28\%$
test_step_mdp_speed[False-False-False-False-False] 34.7240μs 12.2876μs 81.3830 KOps/s 82.1562 KOps/s $\color{#d91a1a}-0.94\%$
test_values[generalized_advantage_estimate-True-True] 12.7881ms 9.7993ms 102.0479 Ops/s 104.3926 Ops/s $\color{#d91a1a}-2.25\%$
test_values[vec_generalized_advantage_estimate-True-True] 37.7328ms 35.1124ms 28.4800 Ops/s 27.9398 Ops/s $\color{#35bf28}+1.93\%$
test_values[td0_return_estimate-False-False] 0.2215ms 0.1686ms 5.9328 KOps/s 5.8474 KOps/s $\color{#35bf28}+1.46\%$
test_values[td1_return_estimate-False-False] 24.4535ms 24.0948ms 41.5028 Ops/s 43.0704 Ops/s $\color{#d91a1a}-3.64\%$
test_values[vec_td1_return_estimate-False-False] 49.8435ms 35.7092ms 28.0040 Ops/s 28.4233 Ops/s $\color{#d91a1a}-1.48\%$
test_values[td_lambda_return_estimate-True-False] 36.4228ms 35.1292ms 28.4663 Ops/s 29.4509 Ops/s $\color{#d91a1a}-3.34\%$
test_values[vec_td_lambda_return_estimate-True-False] 39.3531ms 35.0850ms 28.5022 Ops/s 28.2981 Ops/s $\color{#35bf28}+0.72\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 8.6676ms 8.5767ms 116.5947 Ops/s 121.5702 Ops/s $\color{#d91a1a}-4.09\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 2.5040ms 2.0047ms 498.8180 Ops/s 491.3307 Ops/s $\color{#35bf28}+1.52\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.5437ms 0.3633ms 2.7528 KOps/s 2.8373 KOps/s $\color{#d91a1a}-2.98\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 47.0912ms 45.0314ms 22.2067 Ops/s 21.1452 Ops/s $\textbf{\color{#35bf28}+5.02\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 3.7554ms 3.0333ms 329.6766 Ops/s 330.0294 Ops/s $\color{#d91a1a}-0.11\%$
test_dqn_speed 1.6946ms 1.3564ms 737.2303 Ops/s 716.7874 Ops/s $\color{#35bf28}+2.85\%$
test_ddpg_speed 3.8097ms 2.9013ms 344.6734 Ops/s 343.1917 Ops/s $\color{#35bf28}+0.43\%$
test_sac_speed 9.0322ms 8.5376ms 117.1293 Ops/s 114.0446 Ops/s $\color{#35bf28}+2.70\%$
test_redq_speed 15.2159ms 13.4464ms 74.3695 Ops/s 74.0607 Ops/s $\color{#35bf28}+0.42\%$
test_redq_deprec_speed 15.4307ms 13.9122ms 71.8795 Ops/s 72.2616 Ops/s $\color{#d91a1a}-0.53\%$
test_td3_speed 16.5480ms 8.5518ms 116.9342 Ops/s 116.7568 Ops/s $\color{#35bf28}+0.15\%$
test_cql_speed 37.7928ms 36.8340ms 27.1488 Ops/s 27.1071 Ops/s $\color{#35bf28}+0.15\%$
test_a2c_speed 8.6934ms 7.6181ms 131.2664 Ops/s 130.9282 Ops/s $\color{#35bf28}+0.26\%$
test_ppo_speed 9.6943ms 7.9505ms 125.7779 Ops/s 128.8597 Ops/s $\color{#d91a1a}-2.39\%$
test_reinforce_speed 7.7625ms 6.7031ms 149.1852 Ops/s 149.8170 Ops/s $\color{#d91a1a}-0.42\%$
test_iql_speed 34.0312ms 33.2067ms 30.1144 Ops/s 30.2121 Ops/s $\color{#d91a1a}-0.32\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.3949ms 3.5430ms 282.2431 Ops/s 285.0538 Ops/s $\color{#d91a1a}-0.99\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.8825ms 0.5050ms 1.9804 KOps/s 1.9775 KOps/s $\color{#35bf28}+0.15\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6163ms 0.4758ms 2.1018 KOps/s 2.0805 KOps/s $\color{#35bf28}+1.03\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 4.9631ms 3.5593ms 280.9555 Ops/s 283.8965 Ops/s $\color{#d91a1a}-1.04\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.9706ms 0.4924ms 2.0310 KOps/s 1.9989 KOps/s $\color{#35bf28}+1.61\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6633ms 0.4708ms 2.1239 KOps/s 2.1127 KOps/s $\color{#35bf28}+0.53\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 2.0045ms 1.7123ms 584.0103 Ops/s 581.9883 Ops/s $\color{#35bf28}+0.35\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 2.3006ms 1.6233ms 616.0252 Ops/s 610.1367 Ops/s $\color{#35bf28}+0.97\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 5.7005ms 3.6661ms 272.7678 Ops/s 274.3353 Ops/s $\color{#d91a1a}-0.57\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.2082ms 0.6191ms 1.6152 KOps/s 1.6091 KOps/s $\color{#35bf28}+0.38\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7831ms 0.5947ms 1.6814 KOps/s 1.6827 KOps/s $\color{#d91a1a}-0.07\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.2187ms 3.5693ms 280.1686 Ops/s 282.1928 Ops/s $\color{#d91a1a}-0.72\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.0418ms 0.5114ms 1.9555 KOps/s 1.9904 KOps/s $\color{#d91a1a}-1.76\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6347ms 0.4776ms 2.0937 KOps/s 2.0867 KOps/s $\color{#35bf28}+0.34\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 3.8358ms 3.4662ms 288.4997 Ops/s 289.4559 Ops/s $\color{#d91a1a}-0.33\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.7055ms 0.5224ms 1.9141 KOps/s 1.9963 KOps/s $\color{#d91a1a}-4.12\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 3.7849ms 0.4994ms 2.0025 KOps/s 2.1062 KOps/s $\color{#d91a1a}-4.92\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 5.3947ms 3.6576ms 273.4053 Ops/s 277.2647 Ops/s $\color{#d91a1a}-1.39\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.7862ms 0.6199ms 1.6133 KOps/s 1.6006 KOps/s $\color{#35bf28}+0.79\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7625ms 0.5975ms 1.6737 KOps/s 1.6558 KOps/s $\color{#35bf28}+1.08\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.1011s 7.5503ms 132.4453 Ops/s 169.5401 Ops/s $\textbf{\color{#d91a1a}-21.88\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 14.6654ms 12.5282ms 79.8198 Ops/s 68.7219 Ops/s $\textbf{\color{#35bf28}+16.15\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 1.5349ms 1.0512ms 951.2659 Ops/s 943.2838 Ops/s $\color{#35bf28}+0.85\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.1001s 5.6204ms 177.9221 Ops/s 177.9252 Ops/s $-0.00\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 15.0053ms 12.5625ms 79.6020 Ops/s 78.7909 Ops/s $\color{#35bf28}+1.03\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 1.5962ms 1.0469ms 955.2409 Ops/s 934.4257 Ops/s $\color{#35bf28}+2.23\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 99.7729ms 5.7405ms 174.2019 Ops/s 123.1605 Ops/s $\textbf{\color{#35bf28}+41.44\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 0.1086s 14.7651ms 67.7273 Ops/s 77.9960 Ops/s $\textbf{\color{#d91a1a}-13.17\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 1.8096ms 1.1901ms 840.2485 Ops/s 828.8785 Ops/s $\color{#35bf28}+1.37\%$


github-actions bot commented Jun 7, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 94. Improved: $\large\color{#35bf28}3$. Worsened: $\large\color{#d91a1a}3$.

Name Max Mean Ops Ops on Repo HEAD Change
test_single 0.1182s 0.1178s 8.4898 Ops/s 8.4326 Ops/s $\color{#35bf28}+0.68\%$
test_sync 0.1063s 0.1056s 9.4653 Ops/s 9.5627 Ops/s $\color{#d91a1a}-1.02\%$
test_async 0.1962s 98.6730ms 10.1345 Ops/s 10.1655 Ops/s $\color{#d91a1a}-0.31\%$
test_single_pixels 0.1295s 0.1283s 7.7964 Ops/s 7.8389 Ops/s $\color{#d91a1a}-0.54\%$
test_sync_pixels 83.8544ms 80.1277ms 12.4801 Ops/s 12.3994 Ops/s $\color{#35bf28}+0.65\%$
test_async_pixels 0.1595s 69.5722ms 14.3736 Ops/s 14.6743 Ops/s $\color{#d91a1a}-2.05\%$
test_simple 0.8963s 0.8350s 1.1976 Ops/s 1.2244 Ops/s $\color{#d91a1a}-2.19\%$
test_transformed 1.1391s 1.0904s 0.9171 Ops/s 0.9344 Ops/s $\color{#d91a1a}-1.85\%$
test_serial 2.5624s 2.5099s 0.3984 Ops/s 0.4038 Ops/s $\color{#d91a1a}-1.33\%$
test_parallel 2.4315s 2.3633s 0.4231 Ops/s 0.4278 Ops/s $\color{#d91a1a}-1.09\%$
test_step_mdp_speed[True-True-True-True-True] 0.1576ms 33.6553μs 29.7130 KOps/s 28.4457 KOps/s $\color{#35bf28}+4.46\%$
test_step_mdp_speed[True-True-True-True-False] 45.3810μs 19.6828μs 50.8058 KOps/s 48.8177 KOps/s $\color{#35bf28}+4.07\%$
test_step_mdp_speed[True-True-True-False-True] 0.1397ms 19.2062μs 52.0664 KOps/s 49.9615 KOps/s $\color{#35bf28}+4.21\%$
test_step_mdp_speed[True-True-True-False-False] 0.1345ms 11.5670μs 86.4528 KOps/s 86.4352 KOps/s $\color{#35bf28}+0.02\%$
test_step_mdp_speed[True-True-False-True-True] 92.1620μs 35.8353μs 27.9054 KOps/s 27.0577 KOps/s $\color{#35bf28}+3.13\%$
test_step_mdp_speed[True-True-False-True-False] 46.8910μs 22.1102μs 45.2279 KOps/s 45.3198 KOps/s $\color{#d91a1a}-0.20\%$
test_step_mdp_speed[True-True-False-False-True] 52.4820μs 21.2134μs 47.1400 KOps/s 45.5795 KOps/s $\color{#35bf28}+3.42\%$
test_step_mdp_speed[True-True-False-False-False] 29.0810μs 13.3718μs 74.7841 KOps/s 74.0719 KOps/s $\color{#35bf28}+0.96\%$
test_step_mdp_speed[True-False-True-True-True] 62.5710μs 37.3409μs 26.7803 KOps/s 26.0053 KOps/s $\color{#35bf28}+2.98\%$
test_step_mdp_speed[True-False-True-True-False] 0.2080ms 23.7885μs 42.0372 KOps/s 42.0319 KOps/s $\color{#35bf28}+0.01\%$
test_step_mdp_speed[True-False-True-False-True] 0.1941ms 21.4292μs 46.6654 KOps/s 47.1253 KOps/s $\color{#d91a1a}-0.98\%$
test_step_mdp_speed[True-False-True-False-False] 0.1257ms 13.3476μs 74.9200 KOps/s 74.2369 KOps/s $\color{#35bf28}+0.92\%$
test_step_mdp_speed[True-False-False-True-True] 0.2221ms 39.9251μs 25.0469 KOps/s 24.7500 KOps/s $\color{#35bf28}+1.20\%$
test_step_mdp_speed[True-False-False-True-False] 43.7110μs 25.9037μs 38.6046 KOps/s 38.5737 KOps/s $\color{#35bf28}+0.08\%$
test_step_mdp_speed[True-False-False-False-True] 0.1164ms 23.2000μs 43.1034 KOps/s 42.5682 KOps/s $\color{#35bf28}+1.26\%$
test_step_mdp_speed[True-False-False-False-False] 70.7920μs 15.2094μs 65.7490 KOps/s 65.0622 KOps/s $\color{#35bf28}+1.06\%$
test_step_mdp_speed[False-True-True-True-True] 0.1174ms 38.1813μs 26.1908 KOps/s 25.8177 KOps/s $\color{#35bf28}+1.45\%$
test_step_mdp_speed[False-True-True-True-False] 43.3110μs 23.7328μs 42.1358 KOps/s 40.7886 KOps/s $\color{#35bf28}+3.30\%$
test_step_mdp_speed[False-True-True-False-True] 74.6810μs 25.1655μs 39.7369 KOps/s 39.1705 KOps/s $\color{#35bf28}+1.45\%$
test_step_mdp_speed[False-True-True-False-False] 0.1462ms 15.2698μs 65.4889 KOps/s 62.7860 KOps/s $\color{#35bf28}+4.30\%$
test_step_mdp_speed[False-True-False-True-True] 64.7910μs 39.7635μs 25.1487 KOps/s 25.7441 KOps/s $\color{#d91a1a}-2.31\%$
test_step_mdp_speed[False-True-False-True-False] 0.1003ms 25.4460μs 39.2989 KOps/s 37.8416 KOps/s $\color{#35bf28}+3.85\%$
test_step_mdp_speed[False-True-False-False-True] 0.2215ms 26.8060μs 37.3050 KOps/s 36.5015 KOps/s $\color{#35bf28}+2.20\%$
test_step_mdp_speed[False-True-False-False-False] 0.1940ms 17.0718μs 58.5760 KOps/s 56.9456 KOps/s $\color{#35bf28}+2.86\%$
test_step_mdp_speed[False-False-True-True-True] 0.2120ms 41.5007μs 24.0960 KOps/s 23.3437 KOps/s $\color{#35bf28}+3.22\%$
test_step_mdp_speed[False-False-True-True-False] 50.5410μs 27.7592μs 36.0241 KOps/s 35.1016 KOps/s $\color{#35bf28}+2.63\%$
test_step_mdp_speed[False-False-True-False-True] 58.1710μs 27.0750μs 36.9344 KOps/s 36.7259 KOps/s $\color{#35bf28}+0.57\%$
test_step_mdp_speed[False-False-True-False-False] 40.7010μs 17.0810μs 58.5444 KOps/s 56.0468 KOps/s $\color{#35bf28}+4.46\%$
test_step_mdp_speed[False-False-False-True-True] 64.1320μs 43.9558μs 22.7501 KOps/s 22.3458 KOps/s $\color{#35bf28}+1.81\%$
test_step_mdp_speed[False-False-False-True-False] 69.6110μs 30.1445μs 33.1735 KOps/s 33.3447 KOps/s $\color{#d91a1a}-0.51\%$
test_step_mdp_speed[False-False-False-False-True] 48.5210μs 28.8129μs 34.7067 KOps/s 34.5105 KOps/s $\color{#35bf28}+0.57\%$
test_step_mdp_speed[False-False-False-False-False] 0.1100ms 19.1191μs 52.3037 KOps/s 51.0965 KOps/s $\color{#35bf28}+2.36\%$
test_values[generalized_advantage_estimate-True-True] 25.3792ms 24.2590ms 41.2219 Ops/s 42.0674 Ops/s $\color{#d91a1a}-2.01\%$
test_values[vec_generalized_advantage_estimate-True-True] 88.8726ms 2.6679ms 374.8328 Ops/s 363.8501 Ops/s $\color{#35bf28}+3.02\%$
test_values[td0_return_estimate-False-False] 93.9320μs 67.3642μs 14.8447 KOps/s 15.1845 KOps/s $\color{#d91a1a}-2.24\%$
test_values[td1_return_estimate-False-False] 54.1823ms 53.1385ms 18.8187 Ops/s 18.8414 Ops/s $\color{#d91a1a}-0.12\%$
test_values[vec_td1_return_estimate-False-False] 1.3892ms 1.0717ms 933.0713 Ops/s 938.0656 Ops/s $\color{#d91a1a}-0.53\%$
test_values[td_lambda_return_estimate-True-False] 88.0548ms 84.8988ms 11.7787 Ops/s 11.7837 Ops/s $\color{#d91a1a}-0.04\%$
test_values[vec_td_lambda_return_estimate-True-False] 1.3558ms 1.0651ms 938.8776 Ops/s 942.9374 Ops/s $\color{#d91a1a}-0.43\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 26.3019ms 24.8714ms 40.2068 Ops/s 38.9117 Ops/s $\color{#35bf28}+3.33\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 0.9590ms 0.7061ms 1.4162 KOps/s 1.4160 KOps/s $+0.01\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.8158ms 0.6576ms 1.5207 KOps/s 1.5453 KOps/s $\color{#d91a1a}-1.59\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.7308ms 1.4580ms 685.8562 Ops/s 686.3101 Ops/s $\color{#d91a1a}-0.07\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.8430ms 0.6692ms 1.4944 KOps/s 1.5059 KOps/s $\color{#d91a1a}-0.77\%$
test_dqn_speed 1.8264ms 1.4540ms 687.7703 Ops/s 693.8680 Ops/s $\color{#d91a1a}-0.88\%$
test_ddpg_speed 3.2972ms 3.0095ms 332.2797 Ops/s 335.5814 Ops/s $\color{#d91a1a}-0.98\%$
test_sac_speed 9.2288ms 8.5807ms 116.5407 Ops/s 118.0311 Ops/s $\color{#d91a1a}-1.26\%$
test_redq_speed 12.5131ms 10.9948ms 90.9525 Ops/s 83.7090 Ops/s $\textbf{\color{#35bf28}+8.65\%}$
test_redq_deprec_speed 12.9905ms 12.2703ms 81.4975 Ops/s 86.4076 Ops/s $\textbf{\color{#d91a1a}-5.68\%}$
test_td3_speed 17.6180ms 8.5166ms 117.4182 Ops/s 121.3378 Ops/s $\color{#d91a1a}-3.23\%$
test_cql_speed 28.5733ms 26.7734ms 37.3506 Ops/s 37.9203 Ops/s $\color{#d91a1a}-1.50\%$
test_a2c_speed 6.5701ms 5.8716ms 170.3112 Ops/s 174.3038 Ops/s $\color{#d91a1a}-2.29\%$
test_ppo_speed 6.6060ms 6.1406ms 162.8506 Ops/s 165.9254 Ops/s $\color{#d91a1a}-1.85\%$
test_reinforce_speed 5.2094ms 4.8890ms 204.5416 Ops/s 207.8513 Ops/s $\color{#d91a1a}-1.59\%$
test_iql_speed 21.4055ms 20.5310ms 48.7069 Ops/s 48.1808 Ops/s $\color{#35bf28}+1.09\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.0514ms 4.6413ms 215.4564 Ops/s 213.3100 Ops/s $\color{#35bf28}+1.01\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.8181ms 0.6116ms 1.6350 KOps/s 1.6142 KOps/s $\color{#35bf28}+1.29\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 4.9341ms 0.6007ms 1.6649 KOps/s 1.6730 KOps/s $\color{#d91a1a}-0.49\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.0360ms 4.6613ms 214.5302 Ops/s 216.7009 Ops/s $\color{#d91a1a}-1.00\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.7787ms 0.6110ms 1.6368 KOps/s 1.6267 KOps/s $\color{#35bf28}+0.62\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 4.9007ms 0.5965ms 1.6763 KOps/s 1.6605 KOps/s $\color{#35bf28}+0.95\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 2.4486ms 2.1486ms 465.4113 Ops/s 477.6289 Ops/s $\color{#d91a1a}-2.56\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 6.5543ms 2.0827ms 480.1438 Ops/s 495.9933 Ops/s $\color{#d91a1a}-3.20\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 5.2402ms 4.8140ms 207.7254 Ops/s 208.4852 Ops/s $\color{#d91a1a}-0.36\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.8062ms 0.7497ms 1.3339 KOps/s 1.3405 KOps/s $\color{#d91a1a}-0.49\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.9395ms 0.7269ms 1.3756 KOps/s 1.3546 KOps/s $\color{#35bf28}+1.56\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.0441ms 4.6706ms 214.1048 Ops/s 212.6052 Ops/s $\color{#35bf28}+0.71\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.3934ms 0.6137ms 1.6294 KOps/s 1.6222 KOps/s $\color{#35bf28}+0.44\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7980ms 0.5915ms 1.6907 KOps/s 1.6956 KOps/s $\color{#d91a1a}-0.29\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.2211ms 4.6894ms 213.2476 Ops/s 216.3348 Ops/s $\color{#d91a1a}-1.43\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.8002ms 0.5991ms 1.6693 KOps/s 1.6561 KOps/s $\color{#35bf28}+0.80\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 4.7518ms 0.5873ms 1.7026 KOps/s 1.2774 KOps/s $\textbf{\color{#35bf28}+33.29\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 5.1273ms 4.8011ms 208.2839 Ops/s 210.5258 Ops/s $\color{#d91a1a}-1.06\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.6579ms 0.7386ms 1.3539 KOps/s 1.3666 KOps/s $\color{#d91a1a}-0.93\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.9390ms 0.7195ms 1.3899 KOps/s 1.4062 KOps/s $\color{#d91a1a}-1.16\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.1282s 7.3544ms 135.9732 Ops/s 136.9164 Ops/s $\color{#d91a1a}-0.69\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 18.8038ms 15.8194ms 63.2137 Ops/s 54.8889 Ops/s $\textbf{\color{#35bf28}+15.17\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 1.6766ms 1.3353ms 748.8747 Ops/s 770.2602 Ops/s $\color{#d91a1a}-2.78\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.1205s 9.5155ms 105.0918 Ops/s 139.3734 Ops/s $\textbf{\color{#d91a1a}-24.60\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 18.5375ms 15.8210ms 63.2072 Ops/s 62.9612 Ops/s $\color{#35bf28}+0.39\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 7.8991ms 1.4568ms 686.4375 Ops/s 754.1752 Ops/s $\textbf{\color{#d91a1a}-8.98\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.1209s 7.3746ms 135.6005 Ops/s 136.7233 Ops/s $\color{#d91a1a}-0.82\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 18.3290ms 15.9232ms 62.8013 Ops/s 63.1243 Ops/s $\color{#d91a1a}-0.51\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 2.6162ms 1.5173ms 659.0750 Ops/s 632.0176 Ops/s $\color{#35bf28}+4.28\%$

@wertyuilife2

wertyuilife2 commented Jun 7, 2024

@vmoens I searched a bit further and found that many open-source libraries implementing PER, such as Dopamine, maintain the historical maximum priority.

However, the original PER paper and some other codebases in my domain use the buffer's maximum priority approach, including EfficientZero, whose v2 is the SOTA data-efficient method in RL.

So overall, both approaches make sense, and this is not a bug (my bad). That said, I believe the buffer's maximum priority approach is more robust to the priority values.
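To make the distinction concrete, here is a minimal sketch of the two definitions of "maximum priority" discussed above. The function names and the sequence of writes are made up for illustration and are not taken from TorchRL, Dopamine, or EfficientZero.

```python
def historical_max(updates):
    """Largest priority ever written, regardless of later overwrites."""
    best = 0.0
    for _, p in updates:
        best = max(best, p)
    return best


def buffer_max(updates):
    """Largest priority among the values currently stored."""
    current = {}
    for index, p in updates:
        current[index] = p  # a later write replaces the earlier value
    return max(current.values())


# (index, priority) writes in order; slot 0 briefly holds an outlier.
updates = [(0, 50.0), (1, 0.3), (2, 0.2), (0, 0.1)]

print(historical_max(updates))  # 50.0
print(buffer_max(updates))      # 0.3
```

Once the outlier in slot 0 is replaced, the buffer maximum drops back to the scale of the remaining priorities, while the historical maximum keeps the outlier indefinitely.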

@wertyuilife2

wertyuilife2 commented Jun 7, 2024

I raised the issue because I found in practice that, during the early stages of training, when a transition is sampled for the first time its PER weight is typically 1e-8 (the value of epsilon), which is weird. This is because max_priority=1 and the Bellman error I get early in training is 0 (because the network outputs are initialized to 0 for stability).
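The arithmetic behind that 1e-8 can be sketched as follows. This is an illustration under stated assumptions, not the library's code: it assumes priorities of the form (|error| + eps)^alpha, importance weights normalised by the smallest priority, w = (p / p_min)^(-beta), and alpha = beta = 1. The helper names are hypothetical.

```python
import math

EPS = 1e-8    # epsilon added to every priority (value quoted in the comment)
ALPHA = 1.0   # assumed priority exponent
BETA = 1.0    # assumed importance-sampling exponent


def priority(td_error: float) -> float:
    """Priority assigned after a transition has been sampled and updated."""
    return (abs(td_error) + EPS) ** ALPHA


def is_weight(p: float, p_min: float) -> float:
    """Importance-sampling weight normalised by the smallest priority."""
    return (p / p_min) ** (-BETA)


# Transitions already sampled once: the Bellman error is 0, so priority = EPS.
updated = priority(0.0)
# A fresh transition that still carries the default max_priority of 1.
fresh = 1.0

p_min = min(updated, fresh)
fresh_weight = is_weight(fresh, p_min)
updated_weight = is_weight(updated, p_min)
print(fresh_weight, updated_weight)
```

Under these assumptions the fresh transition's weight is about eight orders of magnitude smaller than that of an already-updated one, so its contribution to the loss is almost nil the first time it is sampled.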

So basically, this is more of an additional feature for better priority adaptation, similar to priority normalization. I don't think it is essential (if you find it complicated to implement), but it could be made available as an option. I believe it can increase sample efficiency and training stability.

@vmoens
Contributor Author

vmoens commented Jun 7, 2024

So should I make the erasing of the max_priority during extend optional?

@wertyuilife2

Yep, I think so. In some tasks the historical max may be better.
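One way such an option could look is sketched below, as a toy circular store rather than TorchRL's `PrioritizedSampler`. The class, the `max_within_buffer` flag, and the `run` helper are all hypothetical names chosen for this example. It assumes the tracked maximum starts at 1 and that overwriting a slot during `extend` discards that slot's previous priority.

```python
class ToyPrioritySampler:
    """Toy circular priority store (not TorchRL's PrioritizedSampler)."""

    def __init__(self, capacity, max_within_buffer=False):
        self.capacity = capacity
        self.max_within_buffer = max_within_buffer  # hypothetical option
        self.updated = {}       # slot -> priority set by update_priority
        self.historical = 1.0   # largest priority seen so far, starting at 1
        self.cursor = 0

    @property
    def default_priority(self):
        """Priority given to newly written transitions."""
        if self.max_within_buffer:
            # Only priorities still stored in the buffer count.
            return max(self.updated.values(), default=1.0)
        return self.historical

    def extend(self, n):
        """Write n transitions; overwritten slots lose their old priority."""
        for _ in range(n):
            self.updated.pop(self.cursor, None)
            self.cursor = (self.cursor + 1) % self.capacity

    def update_priority(self, slot, priority):
        self.updated[slot] = priority
        self.historical = max(self.historical, priority)


def run(max_within_buffer):
    s = ToyPrioritySampler(capacity=2, max_within_buffer=max_within_buffer)
    s.extend(2)                # fill slots 0 and 1
    s.update_priority(0, 9.0)  # outlier priority in slot 0
    s.update_priority(1, 0.5)
    s.extend(1)                # overwrites slot 0, erasing the outlier
    return s.default_priority


print(run(False))  # 9.0
print(run(True))   # 0.5
```

With the flag off, the default priority stays at the historical maximum. With it on, the default follows whatever is still in the buffer, which matches the "erase on extend" behaviour being discussed.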

@vmoens vmoens added enhancement New feature or request and removed bug Something isn't working labels Jun 8, 2024
@vmoens vmoens merged commit 4d37ee1 into main Jun 8, 2024
43 of 47 checks passed
@vmoens vmoens deleted the fix-max-priority branch June 8, 2024 20:50
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

[BUG] Inadequate Default Priority Design in PrioritizedSampler
3 participants