[BugFix] Make KL-controllers independent of the model #1903

Merged
vmoens merged 2 commits into main from fix-wrapper-kl on Feb 12, 2024

@vmoens (Contributor) commented Feb 12, 2024

No description provided.

pytorch-bot bot commented Feb 12, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/1903

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit 3d55991 with merge base 2f9e1ae:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Feb 12, 2024
@vmoens added the bug and Suitable for minor labels Feb 12, 2024
@vmoens vmoens linked an issue Feb 12, 2024 that may be closed by this pull request
1 task
$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 89. Improved: $\large\color{#35bf28}10$. Worsened: $\large\color{#d91a1a}13$.

Name Max Mean Ops Ops on Repo HEAD Change
test_single 0.1324s 68.3079ms 14.6396 Ops/s 15.6824 Ops/s $\textbf{\color{#d91a1a}-6.65\%}$
test_sync 36.4531ms 34.2971ms 29.1570 Ops/s 27.6005 Ops/s $\textbf{\color{#35bf28}+5.64\%}$
test_async 0.1265s 33.4058ms 29.9349 Ops/s 30.5841 Ops/s $\color{#d91a1a}-2.12\%$
test_simple 0.5128s 0.4456s 2.2440 Ops/s 2.2961 Ops/s $\color{#d91a1a}-2.27\%$
test_transformed 0.6785s 0.6136s 1.6297 Ops/s 1.6436 Ops/s $\color{#d91a1a}-0.84\%$
test_serial 1.5212s 1.4603s 0.6848 Ops/s 0.6957 Ops/s $\color{#d91a1a}-1.58\%$
test_parallel 1.5034s 1.4391s 0.6949 Ops/s 0.7073 Ops/s $\color{#d91a1a}-1.76\%$
test_step_mdp_speed[True-True-True-True-True] 0.1423ms 21.2962μs 46.9566 KOps/s 45.7171 KOps/s $\color{#35bf28}+2.71\%$
test_step_mdp_speed[True-True-True-True-False] 53.9300μs 12.9580μs 77.1726 KOps/s 73.7173 KOps/s $\color{#35bf28}+4.69\%$
test_step_mdp_speed[True-True-True-False-True] 38.2120μs 12.4134μs 80.5580 KOps/s 77.9385 KOps/s $\color{#35bf28}+3.36\%$
test_step_mdp_speed[True-True-True-False-False] 54.7020μs 7.5807μs 131.9144 KOps/s 128.5106 KOps/s $\color{#35bf28}+2.65\%$
test_step_mdp_speed[True-True-False-True-True] 49.0810μs 22.6885μs 44.0752 KOps/s 42.6296 KOps/s $\color{#35bf28}+3.39\%$
test_step_mdp_speed[True-True-False-True-False] 62.4770μs 14.1981μs 70.4319 KOps/s 66.9734 KOps/s $\textbf{\color{#35bf28}+5.16\%}$
test_step_mdp_speed[True-True-False-False-True] 47.0370μs 13.6606μs 73.2031 KOps/s 70.6038 KOps/s $\color{#35bf28}+3.68\%$
test_step_mdp_speed[True-True-False-False-False] 52.0370μs 8.7548μs 114.2234 KOps/s 109.7189 KOps/s $\color{#35bf28}+4.11\%$
test_step_mdp_speed[True-False-True-True-True] 66.3340μs 23.9959μs 41.6738 KOps/s 40.3735 KOps/s $\color{#35bf28}+3.22\%$
test_step_mdp_speed[True-False-True-True-False] 66.0530μs 15.5802μs 64.1840 KOps/s 60.9898 KOps/s $\textbf{\color{#35bf28}+5.24\%}$
test_step_mdp_speed[True-False-True-False-True] 64.0300μs 13.8127μs 72.3971 KOps/s 70.8653 KOps/s $\color{#35bf28}+2.16\%$
test_step_mdp_speed[True-False-True-False-False] 49.9740μs 8.7668μs 114.0671 KOps/s 110.1630 KOps/s $\color{#35bf28}+3.54\%$
test_step_mdp_speed[True-False-False-True-True] 82.1240μs 25.1602μs 39.7454 KOps/s 38.6409 KOps/s $\color{#35bf28}+2.86\%$
test_step_mdp_speed[True-False-False-True-False] 48.4600μs 16.8618μs 59.3057 KOps/s 56.8300 KOps/s $\color{#35bf28}+4.36\%$
test_step_mdp_speed[True-False-False-False-True] 66.3740μs 15.0298μs 66.5344 KOps/s 65.8258 KOps/s $\color{#35bf28}+1.08\%$
test_step_mdp_speed[True-False-False-False-False] 48.2300μs 9.9847μs 100.1529 KOps/s 95.8480 KOps/s $\color{#35bf28}+4.49\%$
test_step_mdp_speed[False-True-True-True-True] 95.2090μs 23.8813μs 41.8738 KOps/s 40.1309 KOps/s $\color{#35bf28}+4.34\%$
test_step_mdp_speed[False-True-True-True-False] 37.7910μs 15.5110μs 64.4702 KOps/s 60.8455 KOps/s $\textbf{\color{#35bf28}+5.96\%}$
test_step_mdp_speed[False-True-True-False-True] 46.0960μs 15.8890μs 62.9368 KOps/s 61.2096 KOps/s $\color{#35bf28}+2.82\%$
test_step_mdp_speed[False-True-True-False-False] 43.3610μs 9.9964μs 100.0359 KOps/s 94.2127 KOps/s $\textbf{\color{#35bf28}+6.18\%}$
test_step_mdp_speed[False-True-False-True-True] 44.9330μs 25.5794μs 39.0940 KOps/s 37.7869 KOps/s $\color{#35bf28}+3.46\%$
test_step_mdp_speed[False-True-False-True-False] 60.4410μs 16.6162μs 60.1821 KOps/s 57.3243 KOps/s $\color{#35bf28}+4.99\%$
test_step_mdp_speed[False-True-False-False-True] 41.4070μs 17.1349μs 58.3603 KOps/s 56.9516 KOps/s $\color{#35bf28}+2.47\%$
test_step_mdp_speed[False-True-False-False-False] 60.4120μs 11.1537μs 89.6560 KOps/s 84.8945 KOps/s $\textbf{\color{#35bf28}+5.61\%}$
test_step_mdp_speed[False-False-True-True-True] 86.0400μs 26.5256μs 37.6994 KOps/s 36.7230 KOps/s $\color{#35bf28}+2.66\%$
test_step_mdp_speed[False-False-True-True-False] 49.4620μs 17.9004μs 55.8648 KOps/s 52.4648 KOps/s $\textbf{\color{#35bf28}+6.48\%}$
test_step_mdp_speed[False-False-True-False-True] 57.6580μs 16.9725μs 58.9187 KOps/s 56.7031 KOps/s $\color{#35bf28}+3.91\%$
test_step_mdp_speed[False-False-True-False-False] 35.0150μs 11.1546μs 89.6493 KOps/s 84.6767 KOps/s $\textbf{\color{#35bf28}+5.87\%}$
test_step_mdp_speed[False-False-False-True-True] 81.3920μs 27.6991μs 36.1023 KOps/s 35.5608 KOps/s $\color{#35bf28}+1.52\%$
test_step_mdp_speed[False-False-False-True-False] 64.9610μs 19.0421μs 52.5153 KOps/s 49.8246 KOps/s $\textbf{\color{#35bf28}+5.40\%}$
test_step_mdp_speed[False-False-False-False-True] 47.2980μs 18.1348μs 55.1427 KOps/s 53.8711 KOps/s $\color{#35bf28}+2.36\%$
test_step_mdp_speed[False-False-False-False-False] 58.8400μs 12.1643μs 82.2076 KOps/s 77.5261 KOps/s $\textbf{\color{#35bf28}+6.04\%}$
test_values[generalized_advantage_estimate-True-True] 12.5014ms 9.2133ms 108.5390 Ops/s 109.7919 Ops/s $\color{#d91a1a}-1.14\%$
test_values[vec_generalized_advantage_estimate-True-True] 38.2641ms 35.2823ms 28.3428 Ops/s 29.9760 Ops/s $\textbf{\color{#d91a1a}-5.45\%}$
test_values[td0_return_estimate-False-False] 0.2343ms 0.1776ms 5.6291 KOps/s 6.1322 KOps/s $\textbf{\color{#d91a1a}-8.20\%}$
test_values[td1_return_estimate-False-False] 25.3157ms 23.1152ms 43.2615 Ops/s 43.1213 Ops/s $\color{#35bf28}+0.33\%$
test_values[vec_td1_return_estimate-False-False] 36.7201ms 35.3016ms 28.3273 Ops/s 29.8491 Ops/s $\textbf{\color{#d91a1a}-5.10\%}$
test_values[td_lambda_return_estimate-True-False] 33.5539ms 33.0705ms 30.2384 Ops/s 30.4545 Ops/s $\color{#d91a1a}-0.71\%$
test_values[vec_td_lambda_return_estimate-True-False] 37.9413ms 35.4278ms 28.2264 Ops/s 29.8848 Ops/s $\textbf{\color{#d91a1a}-5.55\%}$
test_gae_speed[generalized_advantage_estimate-False-1-512] 11.2234ms 8.0484ms 124.2487 Ops/s 124.9060 Ops/s $\color{#d91a1a}-0.53\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 2.2986ms 1.9987ms 500.3173 Ops/s 515.5404 Ops/s $\color{#d91a1a}-2.95\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.4341ms 0.3500ms 2.8572 KOps/s 2.8357 KOps/s $\color{#35bf28}+0.76\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 47.7726ms 46.3235ms 21.5873 Ops/s 24.8521 Ops/s $\textbf{\color{#d91a1a}-13.14\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 3.5771ms 3.0218ms 330.9330 Ops/s 331.8518 Ops/s $\color{#d91a1a}-0.28\%$
test_dqn_speed 2.0722ms 1.3684ms 730.7951 Ops/s 730.7246 Ops/s $+0.01\%$
test_ddpg_speed 3.5568ms 2.7263ms 366.7944 Ops/s 368.7430 Ops/s $\color{#d91a1a}-0.53\%$
test_sac_speed 9.4539ms 8.6467ms 115.6505 Ops/s 117.3205 Ops/s $\color{#d91a1a}-1.42\%$
test_redq_speed 14.5455ms 13.3648ms 74.8235 Ops/s 75.9926 Ops/s $\color{#d91a1a}-1.54\%$
test_redq_deprec_speed 14.9599ms 13.6153ms 73.4470 Ops/s 74.8107 Ops/s $\color{#d91a1a}-1.82\%$
test_td3_speed 9.1808ms 8.7448ms 114.3531 Ops/s 115.2798 Ops/s $\color{#d91a1a}-0.80\%$
test_cql_speed 46.1974ms 37.3327ms 26.7862 Ops/s 27.4552 Ops/s $\color{#d91a1a}-2.44\%$
test_a2c_speed 8.7508ms 7.2595ms 137.7500 Ops/s 138.4245 Ops/s $\color{#d91a1a}-0.49\%$
test_ppo_speed 8.6382ms 7.5379ms 132.6621 Ops/s 133.6390 Ops/s $\color{#d91a1a}-0.73\%$
test_reinforce_speed 7.3139ms 6.5338ms 153.0493 Ops/s 153.8789 Ops/s $\color{#d91a1a}-0.54\%$
test_iql_speed 34.3194ms 33.0340ms 30.2719 Ops/s 30.5758 Ops/s $\color{#d91a1a}-0.99\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 3.2828ms 2.7837ms 359.2400 Ops/s 379.1417 Ops/s $\textbf{\color{#d91a1a}-5.25\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.6516ms 0.5325ms 1.8780 KOps/s 1.9598 KOps/s $\color{#d91a1a}-4.17\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7486ms 0.5073ms 1.9710 KOps/s 2.0658 KOps/s $\color{#d91a1a}-4.59\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 3.3022ms 2.9685ms 336.8698 Ops/s 375.3582 Ops/s $\textbf{\color{#d91a1a}-10.25\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.7269ms 0.5155ms 1.9399 KOps/s 1.9851 KOps/s $\color{#d91a1a}-2.28\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.7771ms 0.4924ms 2.0310 KOps/s 1.9602 KOps/s $\color{#35bf28}+3.61\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 4.3102ms 2.9458ms 339.4709 Ops/s 364.1617 Ops/s $\textbf{\color{#d91a1a}-6.78\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.8429ms 0.6310ms 1.5849 KOps/s 1.5944 KOps/s $\color{#d91a1a}-0.60\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6981ms 0.6005ms 1.6652 KOps/s 1.6634 KOps/s $\color{#35bf28}+0.11\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 3.2504ms 2.7951ms 357.7715 Ops/s 379.5803 Ops/s $\textbf{\color{#d91a1a}-5.75\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.7181ms 0.5174ms 1.9328 KOps/s 1.9354 KOps/s $\color{#d91a1a}-0.14\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6412ms 0.4979ms 2.0085 KOps/s 2.0420 KOps/s $\color{#d91a1a}-1.64\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 4.3198ms 2.8436ms 351.6681 Ops/s 370.8897 Ops/s $\textbf{\color{#d91a1a}-5.18\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.6928ms 0.5150ms 1.9417 KOps/s 1.9707 KOps/s $\color{#d91a1a}-1.47\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6176ms 0.4832ms 2.0696 KOps/s 2.0517 KOps/s $\color{#35bf28}+0.87\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 3.3885ms 2.9436ms 339.7237 Ops/s 345.2475 Ops/s $\color{#d91a1a}-1.60\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.8032ms 0.6438ms 1.5533 KOps/s 1.5747 KOps/s $\color{#d91a1a}-1.36\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 2.5515ms 0.6675ms 1.4981 KOps/s 1.6534 KOps/s $\textbf{\color{#d91a1a}-9.39\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.1142s 8.1545ms 122.6321 Ops/s 119.0987 Ops/s $\color{#35bf28}+2.97\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 16.3498ms 13.3950ms 74.6545 Ops/s 76.2100 Ops/s $\color{#d91a1a}-2.04\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 3.8404ms 2.5714ms 388.8976 Ops/s 403.9602 Ops/s $\color{#d91a1a}-3.73\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.1048s 10.0120ms 99.8798 Ops/s 100.1709 Ops/s $\color{#d91a1a}-0.29\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 15.3784ms 13.1553ms 76.0152 Ops/s 77.0846 Ops/s $\color{#d91a1a}-1.39\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 3.8586ms 2.5098ms 398.4427 Ops/s 397.1219 Ops/s $\color{#35bf28}+0.33\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.1059s 10.1542ms 98.4815 Ops/s 123.9992 Ops/s $\textbf{\color{#d91a1a}-20.58\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 15.7146ms 13.3479ms 74.9183 Ops/s 74.7048 Ops/s $\color{#35bf28}+0.29\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 4.3791ms 2.7839ms 359.2140 Ops/s 361.7275 Ops/s $\color{#d91a1a}-0.69\%$

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 92. Improved: $\large\color{#35bf28}3$. Worsened: $\large\color{#d91a1a}4$.

Name Max Mean Ops Ops on Repo HEAD Change
test_single 0.1179s 0.1173s 8.5230 Ops/s 8.4034 Ops/s $\color{#35bf28}+1.42\%$
test_sync 0.1849s 0.1041s 9.6077 Ops/s 9.5998 Ops/s $\color{#35bf28}+0.08\%$
test_async 0.1829s 92.1374ms 10.8534 Ops/s 10.8756 Ops/s $\color{#d91a1a}-0.20\%$
test_single_pixels 0.1380s 0.1372s 7.2890 Ops/s 7.1712 Ops/s $\color{#35bf28}+1.64\%$
test_sync_pixels 83.5034ms 81.1872ms 12.3172 Ops/s 12.4676 Ops/s $\color{#d91a1a}-1.21\%$
test_async_pixels 0.1447s 73.2075ms 13.6598 Ops/s 13.4635 Ops/s $\color{#35bf28}+1.46\%$
test_simple 0.9144s 0.8392s 1.1916 Ops/s 1.1906 Ops/s $\color{#35bf28}+0.09\%$
test_transformed 1.1748s 1.0954s 0.9129 Ops/s 0.9015 Ops/s $\color{#35bf28}+1.26\%$
test_serial 2.5318s 2.4573s 0.4069 Ops/s 0.4112 Ops/s $\color{#d91a1a}-1.03\%$
test_parallel 2.1593s 2.0831s 0.4801 Ops/s 0.4639 Ops/s $\color{#35bf28}+3.48\%$
test_step_mdp_speed[True-True-True-True-True] 71.2910μs 32.7884μs 30.4986 KOps/s 29.5529 KOps/s $\color{#35bf28}+3.20\%$
test_step_mdp_speed[True-True-True-True-False] 58.7210μs 19.8873μs 50.2834 KOps/s 49.4461 KOps/s $\color{#35bf28}+1.69\%$
test_step_mdp_speed[True-True-True-False-True] 42.2710μs 18.6288μs 53.6804 KOps/s 52.0376 KOps/s $\color{#35bf28}+3.16\%$
test_step_mdp_speed[True-True-True-False-False] 28.4410μs 11.0898μs 90.1729 KOps/s 87.1390 KOps/s $\color{#35bf28}+3.48\%$
test_step_mdp_speed[True-True-False-True-True] 0.1008ms 34.7024μs 28.8165 KOps/s 27.6890 KOps/s $\color{#35bf28}+4.07\%$
test_step_mdp_speed[True-True-False-True-False] 40.8310μs 21.2237μs 47.1171 KOps/s 45.1924 KOps/s $\color{#35bf28}+4.26\%$
test_step_mdp_speed[True-True-False-False-True] 47.5900μs 20.3000μs 49.2610 KOps/s 48.1284 KOps/s $\color{#35bf28}+2.35\%$
test_step_mdp_speed[True-True-False-False-False] 33.2410μs 13.0305μs 76.7429 KOps/s 74.7627 KOps/s $\color{#35bf28}+2.65\%$
test_step_mdp_speed[True-False-True-True-True] 58.0400μs 36.4238μs 27.4545 KOps/s 26.5374 KOps/s $\color{#35bf28}+3.46\%$
test_step_mdp_speed[True-False-True-True-False] 52.6700μs 23.1013μs 43.2876 KOps/s 41.7226 KOps/s $\color{#35bf28}+3.75\%$
test_step_mdp_speed[True-False-True-False-True] 84.9610μs 20.2265μs 49.4400 KOps/s 48.0279 KOps/s $\color{#35bf28}+2.94\%$
test_step_mdp_speed[True-False-True-False-False] 44.5900μs 12.9848μs 77.0129 KOps/s 74.8285 KOps/s $\color{#35bf28}+2.92\%$
test_step_mdp_speed[True-False-False-True-True] 56.0010μs 38.4077μs 26.0364 KOps/s 25.3436 KOps/s $\color{#35bf28}+2.73\%$
test_step_mdp_speed[True-False-False-True-False] 47.8610μs 25.3007μs 39.5246 KOps/s 38.5038 KOps/s $\color{#35bf28}+2.65\%$
test_step_mdp_speed[True-False-False-False-True] 44.0510μs 22.1639μs 45.1185 KOps/s 43.6618 KOps/s $\color{#35bf28}+3.34\%$
test_step_mdp_speed[True-False-False-False-False] 69.8500μs 14.9414μs 66.9280 KOps/s 65.3538 KOps/s $\color{#35bf28}+2.41\%$
test_step_mdp_speed[False-True-True-True-True] 57.2110μs 36.8217μs 27.1579 KOps/s 26.2812 KOps/s $\color{#35bf28}+3.34\%$
test_step_mdp_speed[False-True-True-True-False] 53.4700μs 23.1268μs 43.2398 KOps/s 41.5188 KOps/s $\color{#35bf28}+4.15\%$
test_step_mdp_speed[False-True-True-False-True] 71.7010μs 24.6797μs 40.5192 KOps/s 40.3079 KOps/s $\color{#35bf28}+0.52\%$
test_step_mdp_speed[False-True-True-False-False] 33.4800μs 14.8949μs 67.1372 KOps/s 65.8102 KOps/s $\color{#35bf28}+2.02\%$
test_step_mdp_speed[False-True-False-True-True] 0.2287ms 38.5400μs 25.9471 KOps/s 24.8976 KOps/s $\color{#35bf28}+4.22\%$
test_step_mdp_speed[False-True-False-True-False] 95.4110μs 25.1633μs 39.7405 KOps/s 38.3868 KOps/s $\color{#35bf28}+3.53\%$
test_step_mdp_speed[False-True-False-False-True] 89.2710μs 26.1846μs 38.1904 KOps/s 37.0561 KOps/s $\color{#35bf28}+3.06\%$
test_step_mdp_speed[False-True-False-False-False] 47.6800μs 16.3668μs 61.0993 KOps/s 58.1256 KOps/s $\textbf{\color{#35bf28}+5.12\%}$
test_step_mdp_speed[False-False-True-True-True] 65.8910μs 39.9007μs 25.0622 KOps/s 24.0326 KOps/s $\color{#35bf28}+4.28\%$
test_step_mdp_speed[False-False-True-True-False] 49.6410μs 27.4056μs 36.4889 KOps/s 35.7801 KOps/s $\color{#35bf28}+1.98\%$
test_step_mdp_speed[False-False-True-False-True] 89.9910μs 25.8550μs 38.6773 KOps/s 37.5179 KOps/s $\color{#35bf28}+3.09\%$
test_step_mdp_speed[False-False-True-False-False] 35.4900μs 16.4496μs 60.7917 KOps/s 58.5563 KOps/s $\color{#35bf28}+3.82\%$
test_step_mdp_speed[False-False-False-True-True] 78.9810μs 42.1311μs 23.7354 KOps/s 23.1953 KOps/s $\color{#35bf28}+2.33\%$
test_step_mdp_speed[False-False-False-True-False] 57.6810μs 29.3103μs 34.1177 KOps/s 33.6672 KOps/s $\color{#35bf28}+1.34\%$
test_step_mdp_speed[False-False-False-False-True] 45.8110μs 27.2964μs 36.6349 KOps/s 35.1003 KOps/s $\color{#35bf28}+4.37\%$
test_step_mdp_speed[False-False-False-False-False] 42.0210μs 18.1411μs 55.1234 KOps/s 52.8680 KOps/s $\color{#35bf28}+4.27\%$
test_values[generalized_advantage_estimate-True-True] 26.4176ms 25.3493ms 39.4489 Ops/s 41.1520 Ops/s $\color{#d91a1a}-4.14\%$
test_values[vec_generalized_advantage_estimate-True-True] 86.4498ms 3.2964ms 303.3600 Ops/s 308.6651 Ops/s $\color{#d91a1a}-1.72\%$
test_values[td0_return_estimate-False-False] 0.1071ms 65.0206μs 15.3797 KOps/s 15.8494 KOps/s $\color{#d91a1a}-2.96\%$
test_values[td1_return_estimate-False-False] 56.7009ms 55.9954ms 17.8586 Ops/s 19.1729 Ops/s $\textbf{\color{#d91a1a}-6.85\%}$
test_values[vec_td1_return_estimate-False-False] 2.0837ms 1.7656ms 566.3951 Ops/s 568.6260 Ops/s $\color{#d91a1a}-0.39\%$
test_values[td_lambda_return_estimate-True-False] 91.1190ms 89.5853ms 11.1626 Ops/s 11.9892 Ops/s $\textbf{\color{#d91a1a}-6.89\%}$
test_values[vec_td_lambda_return_estimate-True-False] 4.2488ms 1.8002ms 555.4967 Ops/s 557.6346 Ops/s $\color{#d91a1a}-0.38\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 26.4183ms 25.1195ms 39.8097 Ops/s 43.9788 Ops/s $\textbf{\color{#d91a1a}-9.48\%}$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 0.8997ms 0.7016ms 1.4253 KOps/s 1.4166 KOps/s $\color{#35bf28}+0.62\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.8025ms 0.6512ms 1.5356 KOps/s 1.5456 KOps/s $\color{#d91a1a}-0.65\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.6121ms 1.4654ms 682.4285 Ops/s 689.6159 Ops/s $\color{#d91a1a}-1.04\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.9492ms 0.6661ms 1.5012 KOps/s 1.5012 KOps/s $+0.00\%$
test_dqn_speed 8.9950ms 1.4707ms 679.9465 Ops/s 687.4269 Ops/s $\color{#d91a1a}-1.09\%$
test_ddpg_speed 3.2453ms 2.7982ms 357.3689 Ops/s 358.6136 Ops/s $\color{#d91a1a}-0.35\%$
test_sac_speed 8.9857ms 8.5400ms 117.0956 Ops/s 117.6642 Ops/s $\color{#d91a1a}-0.48\%$
test_redq_speed 13.4564ms 11.1467ms 89.7124 Ops/s 89.9339 Ops/s $\color{#d91a1a}-0.25\%$
test_redq_deprec_speed 14.8117ms 11.8919ms 84.0906 Ops/s 85.4062 Ops/s $\color{#d91a1a}-1.54\%$
test_td3_speed 19.2277ms 8.7971ms 113.6744 Ops/s 114.1219 Ops/s $\color{#d91a1a}-0.39\%$
test_cql_speed 28.3263ms 26.7336ms 37.4061 Ops/s 37.5161 Ops/s $\color{#d91a1a}-0.29\%$
test_a2c_speed 5.9396ms 5.6118ms 178.1953 Ops/s 181.7701 Ops/s $\color{#d91a1a}-1.97\%$
test_ppo_speed 6.3786ms 5.9699ms 167.5077 Ops/s 169.8639 Ops/s $\color{#d91a1a}-1.39\%$
test_reinforce_speed 4.8936ms 4.6314ms 215.9159 Ops/s 219.0295 Ops/s $\color{#d91a1a}-1.42\%$
test_iql_speed 21.5136ms 20.6215ms 48.4931 Ops/s 48.1640 Ops/s $\color{#35bf28}+0.68\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 3.8003ms 3.6445ms 274.3890 Ops/s 274.7728 Ops/s $\color{#d91a1a}-0.14\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.8504ms 0.5774ms 1.7318 KOps/s 1.7117 KOps/s $\color{#35bf28}+1.17\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.8026ms 0.5571ms 1.7950 KOps/s 1.7983 KOps/s $\color{#d91a1a}-0.19\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 4.1098ms 3.6920ms 270.8580 Ops/s 271.7314 Ops/s $\color{#d91a1a}-0.32\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.7536ms 0.5637ms 1.7741 KOps/s 1.7535 KOps/s $\color{#35bf28}+1.17\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.8503ms 0.5421ms 1.8447 KOps/s 1.4452 KOps/s $\textbf{\color{#35bf28}+27.64\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 4.0240ms 3.7968ms 263.3815 Ops/s 264.6601 Ops/s $\color{#d91a1a}-0.48\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.8661ms 0.7007ms 1.4271 KOps/s 1.4055 KOps/s $\color{#35bf28}+1.54\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8107ms 0.6709ms 1.4905 KOps/s 1.4520 KOps/s $\color{#35bf28}+2.65\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 3.7864ms 3.6390ms 274.8042 Ops/s 274.7318 Ops/s $\color{#35bf28}+0.03\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.7385ms 0.5751ms 1.7389 KOps/s 1.7186 KOps/s $\color{#35bf28}+1.18\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6896ms 0.5480ms 1.8249 KOps/s 1.8022 KOps/s $\color{#35bf28}+1.26\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 4.0174ms 3.7058ms 269.8463 Ops/s 269.6094 Ops/s $\color{#35bf28}+0.09\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.8682ms 0.5696ms 1.7555 KOps/s 1.7462 KOps/s $\color{#35bf28}+0.53\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.7231ms 0.5460ms 1.8316 KOps/s 1.8253 KOps/s $\color{#35bf28}+0.34\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 4.0435ms 3.7788ms 264.6364 Ops/s 263.7062 Ops/s $\color{#35bf28}+0.35\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.8822ms 0.7020ms 1.4245 KOps/s 1.4123 KOps/s $\color{#35bf28}+0.86\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.9262ms 0.6812ms 1.4681 KOps/s 1.4753 KOps/s $\color{#d91a1a}-0.49\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.1313s 10.5576ms 94.7184 Ops/s 92.0101 Ops/s $\color{#35bf28}+2.94\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 18.5969ms 16.5383ms 60.4655 Ops/s 53.6134 Ops/s $\textbf{\color{#35bf28}+12.78\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 6.4393ms 3.1601ms 316.4424 Ops/s 318.8474 Ops/s $\color{#d91a1a}-0.75\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.1573s 13.7966ms 72.4816 Ops/s 93.4411 Ops/s $\textbf{\color{#d91a1a}-22.43\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 19.0249ms 16.4943ms 60.6269 Ops/s 61.3790 Ops/s $\color{#d91a1a}-1.23\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 6.9454ms 3.1556ms 316.8926 Ops/s 318.5568 Ops/s $\color{#d91a1a}-0.52\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.1289s 10.6422ms 93.9653 Ops/s 91.4958 Ops/s $\color{#35bf28}+2.70\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 19.8259ms 16.8122ms 59.4805 Ops/s 60.4965 Ops/s $\color{#d91a1a}-1.68\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 8.0304ms 3.4972ms 285.9415 Ops/s 290.2167 Ops/s $\color{#d91a1a}-1.47\%$
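
For readers checking the tables by hand: the Change column appears to be the relative difference between the Ops column and the Ops on Repo HEAD column. The bolded rows are exactly those whose change exceeds 5% in either direction, and their counts match the "Improved" and "Worsened" totals in both reports, so a 5% threshold is assumed below. This is a minimal sketch of that reading, not the benchmark bot's actual code; the function names and the threshold default are hypothetical.

```python
# Hypothetical helpers: the names and the 5% threshold are assumptions
# inferred from the tables above, not taken from the benchmark bot's source.


def percent_change(ops: float, ops_head: float) -> float:
    """Relative change of the PR's throughput versus repo HEAD, in percent."""
    return (ops - ops_head) / ops_head * 100.0


def classify(change: float, threshold: float = 5.0) -> str:
    """Label a change the way the summary lines seem to count it."""
    if change > threshold:
        return "improved"
    if change < -threshold:
        return "worsened"
    return "unchanged"


# (Ops, Ops on Repo HEAD) pairs copied from the CPU table, both in Ops/s.
rows = {
    "test_single": (14.6396, 15.6824),
    "test_sync": (29.1570, 27.6005),
    "test_async": (29.9349, 30.5841),
}

for name, (ops, ops_head) in rows.items():
    change = percent_change(ops, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Both columns must be in the same unit before comparing; rows reported in KOps/s use KOps/s for both values, so the ratio is unaffected.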

@vmoens vmoens merged commit 899af07 into main Feb 12, 2024
66 of 68 checks passed
@vmoens vmoens deleted the fix-wrapper-kl branch February 12, 2024 20:14
Labels
bug: Something isn't working
CLA Signed: This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Suitable for minor: Suitable to be integrated in minor release (no new feature)
Development

Successfully merging this pull request may close these issues.

[Feature Request] Make KLControllerBase independent of the model
2 participants