[BugFix] Fix clip_fraction in PO losses #2021

Merged
merged 2 commits into from
Mar 19, 2024

Conversation

vmoens (Contributor) commented Mar 18, 2024

pytorch-bot bot commented Mar 18, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2021

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 22 Unrelated Failures

As of commit 4ad0e86 with merge base 9747170:

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Mar 18, 2024
@vmoens linked an issue Mar 18, 2024 that may be closed by this pull request
@vmoens added the bug label Mar 18, 2024

albertbou92 (Contributor) commented Mar 18, 2024

This one also needs to be corrected
https://github.com/pytorch/rl/blob/main/torchrl/objectives/ppo.py#L864
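
For context, here is a minimal sketch of the quantity named in this PR's title. It assumes the common definition of clip fraction in PPO-style objectives: the share of samples whose probability ratio falls outside `[1 - eps, 1 + eps]`. The function name `clip_fraction` and the parameter `clip_epsilon` are illustrative, and the code is plain Python rather than TorchRL's implementation, which this thread does not show.

```python
import math


def clip_fraction(log_ratios, clip_epsilon=0.2):
    """Fraction of samples whose importance ratio lies outside the clip range.

    `log_ratios` holds log(pi_new / pi_old) per sample (illustrative input format).
    """
    if not log_ratios:
        raise ValueError("log_ratios must not be empty")
    low, high = 1.0 - clip_epsilon, 1.0 + clip_epsilon
    # A sample counts as clipped when its ratio leaves [low, high].
    clipped = sum(1 for lr in log_ratios if not (low <= math.exp(lr) <= high))
    return clipped / len(log_ratios)


# Made-up ratios for illustration: 0.7 and 1.5 fall outside [0.8, 1.2].
ratios = [0.7, 0.95, 1.0, 1.1, 1.5]
frac = clip_fraction([math.log(r) for r in ratios])
```

With these made-up ratios, two of the five samples are outside the range, so `frac` is 0.4.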

github-actions bot commented Mar 18, 2024

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 91. Improved: 2. Worsened: 9.

Detailed results (bold marks the changes counted as improved or worsened):

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_single | 54.8529ms | 53.6499ms | 18.6394 Ops/s | 17.8283 Ops/s | +4.55% |
| test_sync | 42.2192ms | 32.1768ms | 31.0783 Ops/s | 34.2578 Ops/s | **-9.28%** |
| test_async | 53.3675ms | 26.4034ms | 37.8739 Ops/s | 37.7404 Ops/s | +0.35% |
| test_simple | 0.3937s | 0.3379s | 2.9598 Ops/s | 3.0760 Ops/s | -3.78% |
| test_transformed | 0.5237s | 0.4738s | 2.1104 Ops/s | 2.1111 Ops/s | -0.03% |
| test_serial | 1.2306s | 1.1915s | 0.8393 Ops/s | 0.8337 Ops/s | +0.66% |
| test_parallel | 1.1006s | 1.0574s | 0.9457 Ops/s | 0.9708 Ops/s | -2.58% |
| test_step_mdp_speed[True-True-True-True-True] | 0.1539ms | 21.6457μs | 46.1985 KOps/s | 46.6829 KOps/s | -1.04% |
| test_step_mdp_speed[True-True-True-True-False] | 57.9680μs | 13.1682μs | 75.9406 KOps/s | 77.3392 KOps/s | -1.81% |
| test_step_mdp_speed[True-True-True-False-True] | 35.4260μs | 12.6536μs | 79.0291 KOps/s | 80.9492 KOps/s | -2.37% |
| test_step_mdp_speed[True-True-True-False-False] | 34.3740μs | 7.7429μs | 129.1505 KOps/s | 135.5008 KOps/s | -4.69% |
| test_step_mdp_speed[True-True-False-True-True] | 0.2123ms | 23.2020μs | 43.0997 KOps/s | 44.5797 KOps/s | -3.32% |
| test_step_mdp_speed[True-True-False-True-False] | 38.3310μs | 14.4936μs | 68.9962 KOps/s | 70.7399 KOps/s | -2.46% |
| test_step_mdp_speed[True-True-False-False-True] | 48.8610μs | 13.8221μs | 72.3479 KOps/s | 74.1937 KOps/s | -2.49% |
| test_step_mdp_speed[True-True-False-False-False] | 50.0730μs | 8.9741μs | 111.4314 KOps/s | 115.9046 KOps/s | -3.86% |
| test_step_mdp_speed[True-False-True-True-True] | 49.1820μs | 24.4966μs | 40.8220 KOps/s | 42.0915 KOps/s | -3.02% |
| test_step_mdp_speed[True-False-True-True-False] | 54.4210μs | 15.7989μs | 63.2956 KOps/s | 65.6919 KOps/s | -3.65% |
| test_step_mdp_speed[True-False-True-False-True] | 54.3810μs | 13.9978μs | 71.4397 KOps/s | 75.0452 KOps/s | -4.80% |
| test_step_mdp_speed[True-False-True-False-False] | 91.6110μs | 8.9609μs | 111.5956 KOps/s | 116.3496 KOps/s | -4.09% |
| test_step_mdp_speed[True-False-False-True-True] | 64.2600μs | 25.6019μs | 39.0597 KOps/s | 39.7305 KOps/s | -1.69% |
| test_step_mdp_speed[True-False-False-True-False] | 44.7030μs | 16.8279μs | 59.4251 KOps/s | 60.7396 KOps/s | -2.16% |
| test_step_mdp_speed[True-False-False-False-True] | 51.6470μs | 15.1368μs | 66.0642 KOps/s | 68.2035 KOps/s | -3.14% |
| test_step_mdp_speed[True-False-False-False-False] | 44.6130μs | 10.0559μs | 99.4442 KOps/s | 102.6328 KOps/s | -3.11% |
| test_step_mdp_speed[False-True-True-True-True] | 61.5050μs | 24.2551μs | 41.2284 KOps/s | 41.8561 KOps/s | -1.50% |
| test_step_mdp_speed[False-True-True-True-False] | 41.8380μs | 15.6606μs | 63.8544 KOps/s | 65.9320 KOps/s | -3.15% |
| test_step_mdp_speed[False-True-True-False-True] | 56.7450μs | 16.0353μs | 62.3622 KOps/s | 64.3467 KOps/s | -3.08% |
| test_step_mdp_speed[False-True-True-False-False] | 40.5750μs | 10.0528μs | 99.4743 KOps/s | 102.8881 KOps/s | -3.32% |
| test_step_mdp_speed[False-True-False-True-True] | 39.4640μs | 25.4749μs | 39.2543 KOps/s | 39.5018 KOps/s | -0.63% |
| test_step_mdp_speed[False-True-False-True-False] | 55.3230μs | 16.7364μs | 59.7500 KOps/s | 61.0560 KOps/s | -2.14% |
| test_step_mdp_speed[False-True-False-False-True] | 47.8690μs | 17.1844μs | 58.1924 KOps/s | 60.2936 KOps/s | -3.49% |
| test_step_mdp_speed[False-True-False-False-False] | 47.0680μs | 11.1828μs | 89.4228 KOps/s | 92.2795 KOps/s | -3.10% |
| test_step_mdp_speed[False-False-True-True-True] | 65.0210μs | 26.7418μs | 37.3946 KOps/s | 38.5647 KOps/s | -3.03% |
| test_step_mdp_speed[False-False-True-True-False] | 82.7040μs | 17.6951μs | 56.5129 KOps/s | 56.7012 KOps/s | -0.33% |
| test_step_mdp_speed[False-False-True-False-True] | 54.9820μs | 17.3501μs | 57.6367 KOps/s | 59.2781 KOps/s | -2.77% |
| test_step_mdp_speed[False-False-True-False-False] | 49.2010μs | 11.1357μs | 89.8014 KOps/s | 91.4668 KOps/s | -1.82% |
| test_step_mdp_speed[False-False-False-True-True] | 66.1230μs | 27.5687μs | 36.2730 KOps/s | 37.3188 KOps/s | -2.80% |
| test_step_mdp_speed[False-False-False-True-False] | 45.8350μs | 19.1660μs | 52.1756 KOps/s | 53.7078 KOps/s | -2.85% |
| test_step_mdp_speed[False-False-False-False-True] | 55.4730μs | 18.0481μs | 55.4074 KOps/s | 56.9979 KOps/s | -2.79% |
| test_step_mdp_speed[False-False-False-False-False] | 34.6450μs | 12.2665μs | 81.5230 KOps/s | 84.4411 KOps/s | -3.46% |
| test_values[generalized_advantage_estimate-True-True] | 9.7374ms | 9.4411ms | 105.9194 Ops/s | 104.5957 Ops/s | +1.27% |
| test_values[vec_generalized_advantage_estimate-True-True] | 36.6737ms | 35.4772ms | 28.1871 Ops/s | 28.2836 Ops/s | -0.34% |
| test_values[td0_return_estimate-False-False] | 0.2318ms | 0.1711ms | 5.8458 KOps/s | 6.0440 KOps/s | -3.28% |
| test_values[td1_return_estimate-False-False] | 24.8629ms | 23.5341ms | 42.4915 Ops/s | 42.1407 Ops/s | +0.83% |
| test_values[vec_td1_return_estimate-False-False] | 36.5048ms | 35.5267ms | 28.1478 Ops/s | 27.8442 Ops/s | +1.09% |
| test_values[td_lambda_return_estimate-True-False] | 37.2956ms | 33.9720ms | 29.4360 Ops/s | 28.8761 Ops/s | +1.94% |
| test_values[vec_td_lambda_return_estimate-True-False] | 36.6347ms | 35.4271ms | 28.2270 Ops/s | 28.1840 Ops/s | +0.15% |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 11.0432ms | 8.2791ms | 120.7864 Ops/s | 121.8514 Ops/s | -0.87% |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 2.3395ms | 2.0094ms | 497.6603 Ops/s | 526.9344 Ops/s | **-5.56%** |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.4331ms | 0.3498ms | 2.8586 KOps/s | 2.8349 KOps/s | +0.84% |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 50.4425ms | 47.5409ms | 21.0345 Ops/s | 21.2549 Ops/s | -1.04% |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 3.9077ms | 3.0537ms | 327.4717 Ops/s | 331.1043 Ops/s | -1.10% |
| test_dqn_speed | 3.6776ms | 1.3580ms | 736.3656 Ops/s | 733.3903 Ops/s | +0.41% |
| test_ddpg_speed | 3.0634ms | 2.6923ms | 371.4330 Ops/s | 373.3012 Ops/s | -0.50% |
| test_sac_speed | 9.4584ms | 8.3050ms | 120.4100 Ops/s | 121.5413 Ops/s | -0.93% |
| test_redq_speed | 14.2198ms | 13.4014ms | 74.6192 Ops/s | 76.2795 Ops/s | -2.18% |
| test_redq_deprec_speed | 18.1117ms | 13.5024ms | 74.0608 Ops/s | 76.4899 Ops/s | -3.18% |
| test_td3_speed | 9.2300ms | 8.2236ms | 121.6011 Ops/s | 122.1034 Ops/s | -0.41% |
| test_cql_speed | 0.1167s | 40.3036ms | 24.8117 Ops/s | 27.4760 Ops/s | **-9.70%** |
| test_a2c_speed | 8.6464ms | 7.5280ms | 132.8368 Ops/s | 136.4388 Ops/s | -2.64% |
| test_ppo_speed | 9.2567ms | 7.8485ms | 127.4127 Ops/s | 129.3169 Ops/s | -1.47% |
| test_reinforce_speed | 8.1106ms | 6.6529ms | 150.3096 Ops/s | 151.8134 Ops/s | -0.99% |
| test_iql_speed | 33.9989ms | 33.0103ms | 30.2936 Ops/s | 30.5041 Ops/s | -0.69% |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 3.5780ms | 2.3454ms | 426.3706 Ops/s | 453.0878 Ops/s | **-5.90%** |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 1.2047ms | 0.5062ms | 1.9755 KOps/s | 2.0166 KOps/s | -2.04% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.6709ms | 0.4824ms | 2.0728 KOps/s | 2.1259 KOps/s | -2.50% |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 3.5494ms | 2.3994ms | 416.7703 Ops/s | 451.6534 Ops/s | **-7.72%** |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.6932ms | 0.5019ms | 1.9926 KOps/s | 2.0321 KOps/s | -1.94% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 3.8303ms | 0.4867ms | 2.0548 KOps/s | 2.1404 KOps/s | -4.00% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 1.4098ms | 1.2874ms | 776.7593 Ops/s | 777.5997 Ops/s | -0.11% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.7452ms | 1.2200ms | 819.6628 Ops/s | 817.4316 Ops/s | +0.27% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.0637ms | 2.4695ms | 404.9468 Ops/s | 426.0126 Ops/s | -4.94% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 1.2612ms | 0.6227ms | 1.6059 KOps/s | 1.5610 KOps/s | +2.88% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.8227ms | 0.5947ms | 1.6815 KOps/s | 1.6640 KOps/s | +1.05% |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 2.6051ms | 2.3724ms | 421.5062 Ops/s | 432.7168 Ops/s | -2.59% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 1.1072ms | 0.5050ms | 1.9802 KOps/s | 1.9670 KOps/s | +0.67% |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.6714ms | 0.4882ms | 2.0482 KOps/s | 2.0980 KOps/s | -2.37% |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 2.6765ms | 2.4137ms | 414.3068 Ops/s | 431.6344 Ops/s | -4.01% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 0.7559ms | 0.5035ms | 1.9863 KOps/s | 1.9602 KOps/s | +1.33% |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.1024s | 0.6115ms | 1.6352 KOps/s | 2.1294 KOps/s | **-23.21%** |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 2.9449ms | 2.4639ms | 405.8653 Ops/s | 421.0375 Ops/s | -3.60% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.9723ms | 0.6290ms | 1.5898 KOps/s | 1.6233 KOps/s | -2.07% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 0.7409ms | 0.5928ms | 1.6868 KOps/s | 1.6604 KOps/s | +1.59% |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.1018s | 7.4829ms | 133.6383 Ops/s | 129.1978 Ops/s | +3.44% |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 14.5649ms | 11.9783ms | 83.4840 Ops/s | 82.6253 Ops/s | +1.04% |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 1.8997ms | 1.1148ms | 897.0100 Ops/s | 952.5231 Ops/s | **-5.83%** |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 95.7334ms | 5.5036ms | 181.7004 Ops/s | 131.3032 Ops/s | **+38.38%** |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 14.5623ms | 11.9149ms | 83.9287 Ops/s | 81.2568 Ops/s | +3.29% |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 1.8711ms | 1.0881ms | 919.0741 Ops/s | 892.8438 Ops/s | +2.94% |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 95.4756ms | 7.6216ms | 131.2059 Ops/s | 167.3812 Ops/s | **-21.61%** |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 14.9369ms | 12.3259ms | 81.1303 Ops/s | 70.0432 Ops/s | **+15.83%** |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 4.7951ms | 1.6425ms | 608.8319 Ops/s | 676.3244 Ops/s | **-9.98%** |
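
The Change column above appears to be the relative difference between Ops and Ops on Repo HEAD. This is an inference from the numbers, not something the bot comment states. The sketch below checks that reading against two rows of the CPU table; the helper name `relative_change` is hypothetical.

```python
def relative_change(ops, ops_head):
    """Percent change in throughput relative to the repo HEAD measurement."""
    return (ops - ops_head) / ops_head * 100.0


# (Ops, Ops on Repo HEAD) copied from the CPU table above.
rows = {
    "test_single": (18.6394, 17.8283),
    "test_sync": (31.0783, 34.2578),
}
changes = {name: round(relative_change(*pair), 2) for name, pair in rows.items()}
```

Under this assumption the two rows come out as +4.55 and -9.28, matching the reported values.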

github-actions bot commented Mar 18, 2024

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 94. Improved: 11. Worsened: 1.

Detailed results (bold marks the changes counted as improved or worsened):

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_single | 0.1038s | 0.1032s | 9.6913 Ops/s | 8.9799 Ops/s | **+7.92%** |
| test_sync | 93.7425ms | 92.0456ms | 10.8642 Ops/s | 10.8075 Ops/s | +0.52% |
| test_async | 0.1760s | 88.3071ms | 11.3241 Ops/s | 11.4079 Ops/s | -0.73% |
| test_single_pixels | 0.1156s | 0.1144s | 8.7417 Ops/s | 8.6866 Ops/s | +0.63% |
| test_sync_pixels | 69.2623ms | 67.0755ms | 14.9086 Ops/s | 14.4749 Ops/s | +3.00% |
| test_async_pixels | 0.1192s | 56.8451ms | 17.5917 Ops/s | 16.9567 Ops/s | +3.74% |
| test_simple | 0.6753s | 0.6748s | 1.4820 Ops/s | 1.4101 Ops/s | **+5.10%** |
| test_transformed | 0.8902s | 0.8876s | 1.1266 Ops/s | 1.0869 Ops/s | +3.65% |
| test_serial | 2.1872s | 2.1298s | 0.4695 Ops/s | 0.4594 Ops/s | +2.20% |
| test_parallel | 1.8745s | 1.8387s | 0.5439 Ops/s | 0.5473 Ops/s | -0.64% |
| test_step_mdp_speed[True-True-True-True-True] | 0.1354ms | 33.9922μs | 29.4185 KOps/s | 30.6315 KOps/s | -3.96% |
| test_step_mdp_speed[True-True-True-True-False] | 41.6710μs | 20.2865μs | 49.2939 KOps/s | 50.1505 KOps/s | -1.71% |
| test_step_mdp_speed[True-True-True-False-True] | 46.6200μs | 18.9010μs | 52.9073 KOps/s | 53.0187 KOps/s | -0.21% |
| test_step_mdp_speed[True-True-True-False-False] | 32.9400μs | 11.4262μs | 87.5185 KOps/s | 87.0603 KOps/s | +0.53% |
| test_step_mdp_speed[True-True-False-True-True] | 67.6000μs | 35.7918μs | 27.9394 KOps/s | 28.9022 KOps/s | -3.33% |
| test_step_mdp_speed[True-True-False-True-False] | 44.7410μs | 22.2038μs | 45.0373 KOps/s | 45.8253 KOps/s | -1.72% |
| test_step_mdp_speed[True-True-False-False-True] | 42.4010μs | 21.3610μs | 46.8143 KOps/s | 48.5070 KOps/s | -3.49% |
| test_step_mdp_speed[True-True-False-False-False] | 71.9410μs | 13.4539μs | 74.3281 KOps/s | 74.5756 KOps/s | -0.33% |
| test_step_mdp_speed[True-False-True-True-True] | 78.4010μs | 37.2390μs | 26.8535 KOps/s | 27.0610 KOps/s | -0.77% |
| test_step_mdp_speed[True-False-True-True-False] | 44.6700μs | 23.7766μs | 42.0581 KOps/s | 42.2506 KOps/s | -0.46% |
| test_step_mdp_speed[True-False-True-False-True] | 87.3210μs | 20.7524μs | 48.1873 KOps/s | 48.8155 KOps/s | -1.29% |
| test_step_mdp_speed[True-False-True-False-False] | 34.2300μs | 13.2943μs | 75.2201 KOps/s | 75.4507 KOps/s | -0.31% |
| test_step_mdp_speed[True-False-False-True-True] | 63.7300μs | 39.2100μs | 25.5037 KOps/s | 26.0411 KOps/s | -2.06% |
| test_step_mdp_speed[True-False-False-True-False] | 46.6310μs | 25.5692μs | 39.1095 KOps/s | 39.5013 KOps/s | -0.99% |
| test_step_mdp_speed[True-False-False-False-True] | 57.9000μs | 22.3395μs | 44.7638 KOps/s | 44.6202 KOps/s | +0.32% |
| test_step_mdp_speed[True-False-False-False-False] | 35.0110μs | 15.2016μs | 65.7824 KOps/s | 66.6291 KOps/s | -1.27% |
| test_step_mdp_speed[False-True-True-True-True] | 57.8500μs | 37.1184μs | 26.9408 KOps/s | 27.1691 KOps/s | -0.84% |
| test_step_mdp_speed[False-True-True-True-False] | 51.5210μs | 24.0019μs | 41.6633 KOps/s | 42.3338 KOps/s | -1.58% |
| test_step_mdp_speed[False-True-True-False-True] | 56.4900μs | 24.8231μs | 40.2851 KOps/s | 40.9084 KOps/s | -1.52% |
| test_step_mdp_speed[False-True-True-False-False] | 41.2610μs | 15.0977μs | 66.2351 KOps/s | 66.3917 KOps/s | -0.24% |
| test_step_mdp_speed[False-True-False-True-True] | 67.4300μs | 39.8683μs | 25.0826 KOps/s | 25.7811 KOps/s | -2.71% |
| test_step_mdp_speed[False-True-False-True-False] | 54.5110μs | 25.6102μs | 39.0469 KOps/s | 39.1930 KOps/s | -0.37% |
| test_step_mdp_speed[False-True-False-False-True] | 47.9300μs | 26.5666μs | 37.6412 KOps/s | 38.3496 KOps/s | -1.85% |
| test_step_mdp_speed[False-True-False-False-False] | 50.7910μs | 16.9757μs | 58.9079 KOps/s | 59.7959 KOps/s | -1.49% |
| test_step_mdp_speed[False-False-True-True-True] | 64.7110μs | 41.0660μs | 24.3510 KOps/s | 24.9523 KOps/s | -2.41% |
| test_step_mdp_speed[False-False-True-True-False] | 54.2800μs | 27.3303μs | 36.5894 KOps/s | 36.6049 KOps/s | -0.04% |
| test_step_mdp_speed[False-False-True-False-True] | 55.3010μs | 26.4227μs | 37.8463 KOps/s | 38.5926 KOps/s | -1.93% |
| test_step_mdp_speed[False-False-True-False-False] | 38.0200μs | 17.1171μs | 58.4212 KOps/s | 59.6472 KOps/s | -2.06% |
| test_step_mdp_speed[False-False-False-True-True] | 64.0410μs | 41.8212μs | 23.9113 KOps/s | 24.0766 KOps/s | -0.69% |
| test_step_mdp_speed[False-False-False-True-False] | 68.2120μs | 29.2793μs | 34.1538 KOps/s | 34.2640 KOps/s | -0.32% |
| test_step_mdp_speed[False-False-False-False-True] | 54.1710μs | 28.5857μs | 34.9826 KOps/s | 36.2923 KOps/s | -3.61% |
| test_step_mdp_speed[False-False-False-False-False] | 45.4610μs | 18.7048μs | 53.4622 KOps/s | 54.4002 KOps/s | -1.72% |
| test_values[generalized_advantage_estimate-True-True] | 26.0947ms | 25.2455ms | 39.6110 Ops/s | 38.8276 Ops/s | +2.02% |
| test_values[vec_generalized_advantage_estimate-True-True] | 85.8623ms | 3.2909ms | 303.8721 Ops/s | 308.4311 Ops/s | -1.48% |
| test_values[td0_return_estimate-False-False] | 96.1010μs | 65.3236μs | 15.3084 KOps/s | 14.9198 KOps/s | +2.60% |
| test_values[td1_return_estimate-False-False] | 54.3926ms | 53.9750ms | 18.5271 Ops/s | 18.0735 Ops/s | +2.51% |
| test_values[vec_td1_return_estimate-False-False] | 2.0886ms | 1.7760ms | 563.0551 Ops/s | 560.4680 Ops/s | +0.46% |
| test_values[td_lambda_return_estimate-True-False] | 86.2963ms | 86.0493ms | 11.6212 Ops/s | 11.3094 Ops/s | +2.76% |
| test_values[vec_td_lambda_return_estimate-True-False] | 2.1260ms | 1.7723ms | 564.2507 Ops/s | 560.3681 Ops/s | +0.69% |
| test_gae_speed[generalized_advantage_estimate-False-1-512] | 24.0555ms | 23.7243ms | 42.1508 Ops/s | 41.5306 Ops/s | +1.49% |
| test_gae_speed[vec_generalized_advantage_estimate-True-1-512] | 0.9188ms | 0.7148ms | 1.3989 KOps/s | 1.3833 KOps/s | +1.13% |
| test_gae_speed[vec_generalized_advantage_estimate-False-1-512] | 0.7205ms | 0.6618ms | 1.5111 KOps/s | 1.4962 KOps/s | +1.00% |
| test_gae_speed[vec_generalized_advantage_estimate-True-32-512] | 1.4987ms | 1.4653ms | 682.4467 Ops/s | 680.7899 Ops/s | +0.24% |
| test_gae_speed[vec_generalized_advantage_estimate-False-32-512] | 0.9366ms | 0.6871ms | 1.4555 KOps/s | 1.4517 KOps/s | +0.26% |
| test_dqn_speed | 7.7316ms | 1.4184ms | 705.0255 Ops/s | 679.6334 Ops/s | +3.74% |
| test_ddpg_speed | 2.9395ms | 2.6696ms | 374.5863 Ops/s | 361.7080 Ops/s | +3.56% |
| test_sac_speed | 8.4166ms | 8.0375ms | 124.4171 Ops/s | 122.1676 Ops/s | +1.84% |
| test_redq_speed | 10.9451ms | 10.0368ms | 99.6334 Ops/s | 96.8887 Ops/s | +2.83% |
| test_redq_deprec_speed | 11.8753ms | 11.3032ms | 88.4702 Ops/s | 86.2508 Ops/s | +2.57% |
| test_td3_speed | 8.2390ms | 7.9844ms | 125.2446 Ops/s | 122.5395 Ops/s | +2.21% |
| test_cql_speed | 26.5403ms | 24.7321ms | 40.4333 Ops/s | 39.3434 Ops/s | +2.77% |
| test_a2c_speed | 5.5372ms | 5.1711ms | 193.3839 Ops/s | 181.5382 Ops/s | **+6.53%** |
| test_ppo_speed | 5.9204ms | 5.5262ms | 180.9577 Ops/s | 170.9439 Ops/s | **+5.86%** |
| test_reinforce_speed | 4.3618ms | 4.1247ms | 242.4410 Ops/s | 220.1847 Ops/s | **+10.11%** |
| test_iql_speed | 21.0816ms | 18.6129ms | 53.7261 Ops/s | 52.4109 Ops/s | +2.51% |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 2.9814ms | 2.8771ms | 347.5709 Ops/s | 340.9956 Ops/s | +1.93% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 1.2055ms | 0.5411ms | 1.8479 KOps/s | 1.8475 KOps/s | +0.02% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.6735ms | 0.5159ms | 1.9385 KOps/s | 1.9193 KOps/s | +1.01% |
| test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 3.0467ms | 2.8919ms | 345.7994 Ops/s | 337.8203 Ops/s | +2.36% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.3702ms | 0.5313ms | 1.8820 KOps/s | 1.8746 KOps/s | +0.40% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.6528ms | 0.5080ms | 1.9685 KOps/s | 1.9576 KOps/s | +0.55% |
| test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] | 4.4184ms | 1.5222ms | 656.9564 Ops/s | 647.7077 Ops/s | +1.43% |
| test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] | 1.6155ms | 1.4499ms | 689.6850 Ops/s | 674.5977 Ops/s | +2.24% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.1703ms | 3.0184ms | 331.3044 Ops/s | 325.7340 Ops/s | +1.71% |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.7836ms | 0.6632ms | 1.5079 KOps/s | 1.3180 KOps/s | **+14.41%** |
| test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 4.5291ms | 0.6417ms | 1.5585 KOps/s | 1.5483 KOps/s | +0.65% |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] | 3.1535ms | 2.8833ms | 346.8240 Ops/s | 343.4429 Ops/s | +0.98% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] | 0.6627ms | 0.5379ms | 1.8592 KOps/s | 1.8521 KOps/s | +0.38% |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] | 0.6798ms | 0.5160ms | 1.9381 KOps/s | 1.9326 KOps/s | +0.28% |
| test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] | 3.0867ms | 2.9014ms | 344.6611 Ops/s | 341.0702 Ops/s | +1.05% |
| test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] | 1.4538ms | 0.5336ms | 1.8741 KOps/s | 1.8735 KOps/s | +0.03% |
| test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] | 0.7050ms | 0.5121ms | 1.9529 KOps/s | 1.9321 KOps/s | +1.07% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] | 3.1735ms | 3.0318ms | 329.8374 Ops/s | 326.9663 Ops/s | +0.88% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] | 0.7854ms | 0.6643ms | 1.5054 KOps/s | 1.5035 KOps/s | +0.13% |
| test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] | 4.7877ms | 0.6481ms | 1.5430 KOps/s | 1.5532 KOps/s | -0.65% |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] | 0.1209s | 7.1544ms | 139.7747 Ops/s | 110.2889 Ops/s | **+26.74%** |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] | 17.5032ms | 14.8400ms | 67.3854 Ops/s | 66.4090 Ops/s | +1.47% |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] | 2.1050ms | 1.0725ms | 932.4229 Ops/s | 840.8846 Ops/s | **+10.89%** |
| test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] | 0.1013s | 8.7296ms | 114.5532 Ops/s | 149.6893 Ops/s | **-23.47%** |
| test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] | 17.1986ms | 14.8805ms | 67.2022 Ops/s | 66.6495 Ops/s | +0.83% |
| test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] | 1.1242ms | 1.0444ms | 957.5282 Ops/s | 877.9433 Ops/s | **+9.06%** |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] | 99.1277ms | 7.0778ms | 141.2875 Ops/s | 110.6259 Ops/s | **+27.72%** |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] | 17.6565ms | 15.1560ms | 65.9804 Ops/s | 65.0224 Ops/s | +1.47% |
| test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] | 1.5090ms | 1.4032ms | 712.6329 Ops/s | 599.5928 Ops/s | **+18.85%** |

@vmoens merged commit 4bce371 into main Mar 19, 2024
9 of 10 checks passed
@vmoens deleted the fix-clip-fraction branch March 19, 2024 08:00

Labels: bug, CLA Signed

Linked issue (may be closed by this pull request): Value clipping for PPO loss

3 participants