[BugFix] Fix RNNs trajectory split in VMAP calls #1736

Merged
merged 4 commits into from
Dec 6, 2023

@vmoens vmoens commented Dec 6, 2023

Description

Fixes #1735
The issue was that, during a call to torch.vmap, a newly created tensor was being populated with a non-leaf tensor, which made vmap raise an error.
The solution is to rely on torch.masked_scatter instead of __setitem__ (i.e., tensor[mask] = something).
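
For illustration, here is a minimal sketch of the two patterns outside of vmap. It is not the code changed in this PR: the `mask` and `values` tensors are hypothetical toy data standing in for a padded trajectory split, and the sketch only shows that the out-of-place `Tensor.masked_scatter` call produces the same result as the indexed assignment it replaces.

```python
import torch

# Hypothetical toy data: two trajectories of lengths 2 and 1, padded to length 3.
mask = torch.tensor([[True, True, False],
                     [True, False, False]])
# Flat values, one per True entry in the mask (row-major order).
values = torch.tensor([1.0, 2.0, 3.0])

# Pattern the PR moves away from: allocate a tensor, then write into it
# in place through __setitem__.
padded_setitem = torch.zeros(mask.shape)
padded_setitem[mask] = values

# Replacement pattern: masked_scatter returns a new tensor instead of
# mutating the freshly created one.
padded_scatter = torch.zeros(mask.shape).masked_scatter(mask, values)

# Both approaches fill the masked positions with the same values.
assert torch.equal(padded_setitem, padded_scatter)
```

The out-of-place form avoids mutating a freshly allocated tensor, which is the operation described above as failing under vmap.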

pytorch-bot bot commented Dec 6, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/1736

Note: Links to docs will display an error until the docs builds have been completed.

⏳ 1 Pending, 7 Unrelated Failures

As of commit e4830db with merge base 25bd8a5:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@vmoens marked this pull request as ready for review December 6, 2023 06:35
@facebook-github-bot added the CLA Signed label Dec 6, 2023
@vmoens added the bug label Dec 6, 2023
github-actions bot commented Dec 6, 2023

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 89. Improved: $\large\color{#35bf28}3$. Worsened: $\large\color{#d91a1a}8$.

Name Max Mean Ops Ops on Repo HEAD Change
test_single 62.4729ms 62.3050ms 16.0501 Ops/s 15.4727 Ops/s $\color{#35bf28}+3.73\%$
test_sync 36.1186ms 34.0500ms 29.3686 Ops/s 29.0412 Ops/s $\color{#35bf28}+1.13\%$
test_async 77.8250ms 33.4864ms 29.8629 Ops/s 29.9895 Ops/s $\color{#d91a1a}-0.42\%$
test_simple 0.4848s 0.4312s 2.3189 Ops/s 2.3322 Ops/s $\color{#d91a1a}-0.57\%$
test_transformed 0.6568s 0.5977s 1.6730 Ops/s 1.6940 Ops/s $\color{#d91a1a}-1.24\%$
test_serial 1.3741s 1.3273s 0.7534 Ops/s 0.7602 Ops/s $\color{#d91a1a}-0.90\%$
test_parallel 1.3150s 1.2737s 0.7851 Ops/s 0.7722 Ops/s $\color{#35bf28}+1.66\%$
test_step_mdp_speed[True-True-True-True-True] 0.2596ms 22.4059μs 44.6312 KOps/s 45.1755 KOps/s $\color{#d91a1a}-1.20\%$
test_step_mdp_speed[True-True-True-True-False] 45.2950μs 13.7440μs 72.7589 KOps/s 74.4933 KOps/s $\color{#d91a1a}-2.33\%$
test_step_mdp_speed[True-True-True-False-True] 54.2410μs 13.7468μs 72.7441 KOps/s 73.9382 KOps/s $\color{#d91a1a}-1.61\%$
test_step_mdp_speed[True-True-True-False-False] 28.1330μs 8.2757μs 120.8353 KOps/s 123.6047 KOps/s $\color{#d91a1a}-2.24\%$
test_step_mdp_speed[True-True-False-True-True] 74.5800μs 23.9708μs 41.7173 KOps/s 42.5877 KOps/s $\color{#d91a1a}-2.04\%$
test_step_mdp_speed[True-True-False-True-False] 37.2490μs 14.9508μs 66.8861 KOps/s 67.9856 KOps/s $\color{#d91a1a}-1.62\%$
test_step_mdp_speed[True-True-False-False-True] 51.5370μs 14.8891μs 67.1634 KOps/s 67.9013 KOps/s $\color{#d91a1a}-1.09\%$
test_step_mdp_speed[True-True-False-False-False] 73.9130μs 9.7346μs 102.7258 KOps/s 106.6703 KOps/s $\color{#d91a1a}-3.70\%$
test_step_mdp_speed[True-False-True-True-True] 0.1133ms 25.3665μs 39.4221 KOps/s 40.4230 KOps/s $\color{#d91a1a}-2.48\%$
test_step_mdp_speed[True-False-True-True-False] 52.9690μs 16.1471μs 61.9306 KOps/s 61.7724 KOps/s $\color{#35bf28}+0.26\%$
test_step_mdp_speed[True-False-True-False-True] 50.8240μs 14.7903μs 67.6117 KOps/s 66.3864 KOps/s $\color{#35bf28}+1.85\%$
test_step_mdp_speed[True-False-True-False-False] 51.1950μs 9.5848μs 104.3313 KOps/s 105.3482 KOps/s $\color{#d91a1a}-0.97\%$
test_step_mdp_speed[True-False-False-True-True] 60.1730μs 26.3154μs 38.0006 KOps/s 38.2871 KOps/s $\color{#d91a1a}-0.75\%$
test_step_mdp_speed[True-False-False-True-False] 43.1410μs 17.5197μs 57.0785 KOps/s 58.6994 KOps/s $\color{#d91a1a}-2.76\%$
test_step_mdp_speed[True-False-False-False-True] 47.1180μs 16.3052μs 61.3301 KOps/s 63.3293 KOps/s $\color{#d91a1a}-3.16\%$
test_step_mdp_speed[True-False-False-False-False] 41.4480μs 10.7092μs 93.3775 KOps/s 95.7040 KOps/s $\color{#d91a1a}-2.43\%$
test_step_mdp_speed[False-True-True-True-True] 79.9500μs 25.5745μs 39.1015 KOps/s 40.1772 KOps/s $\color{#d91a1a}-2.68\%$
test_step_mdp_speed[False-True-True-True-False] 38.6230μs 16.3699μs 61.0876 KOps/s 61.7471 KOps/s $\color{#d91a1a}-1.07\%$
test_step_mdp_speed[False-True-True-False-True] 43.2410μs 17.5254μs 57.0600 KOps/s 58.0485 KOps/s $\color{#d91a1a}-1.70\%$
test_step_mdp_speed[False-True-True-False-False] 40.8760μs 10.7599μs 92.9381 KOps/s 92.6964 KOps/s $\color{#35bf28}+0.26\%$
test_step_mdp_speed[False-True-False-True-True] 82.0860μs 26.7091μs 37.4405 KOps/s 38.2819 KOps/s $\color{#d91a1a}-2.20\%$
test_step_mdp_speed[False-True-False-True-False] 47.6090μs 17.5577μs 56.9550 KOps/s 57.3317 KOps/s $\color{#d91a1a}-0.66\%$
test_step_mdp_speed[False-True-False-False-True] 69.5170μs 18.3429μs 54.5169 KOps/s 54.1544 KOps/s $\color{#35bf28}+0.67\%$
test_step_mdp_speed[False-True-False-False-False] 58.3990μs 11.8972μs 84.0531 KOps/s 85.7811 KOps/s $\color{#d91a1a}-2.01\%$
test_step_mdp_speed[False-False-True-True-True] 64.7700μs 27.7817μs 35.9949 KOps/s 36.6284 KOps/s $\color{#d91a1a}-1.73\%$
test_step_mdp_speed[False-False-True-True-False] 62.0360μs 18.8021μs 53.1856 KOps/s 53.9781 KOps/s $\color{#d91a1a}-1.47\%$
test_step_mdp_speed[False-False-True-False-True] 58.0690μs 18.3436μs 54.5148 KOps/s 53.6599 KOps/s $\color{#35bf28}+1.59\%$
test_step_mdp_speed[False-False-True-False-False] 43.8020μs 11.8702μs 84.2448 KOps/s 84.4889 KOps/s $\color{#d91a1a}-0.29\%$
test_step_mdp_speed[False-False-False-True-True] 61.5150μs 28.7104μs 34.8306 KOps/s 34.8649 KOps/s $\color{#d91a1a}-0.10\%$
test_step_mdp_speed[False-False-False-True-False] 54.3320μs 19.9077μs 50.2319 KOps/s 51.0596 KOps/s $\color{#d91a1a}-1.62\%$
test_step_mdp_speed[False-False-False-False-True] 48.9810μs 19.7084μs 50.7398 KOps/s 51.7002 KOps/s $\color{#d91a1a}-1.86\%$
test_step_mdp_speed[False-False-False-False-False] 36.8290μs 13.0796μs 76.4552 KOps/s 77.9678 KOps/s $\color{#d91a1a}-1.94\%$
test_values[generalized_advantage_estimate-True-True] 16.5994ms 11.9141ms 83.9344 Ops/s 84.9609 Ops/s $\color{#d91a1a}-1.21\%$
test_values[vec_generalized_advantage_estimate-True-True] 35.0960ms 27.6827ms 36.1236 Ops/s 36.8629 Ops/s $\color{#d91a1a}-2.01\%$
test_values[td0_return_estimate-False-False] 0.2384ms 0.1743ms 5.7387 KOps/s 5.6848 KOps/s $\color{#35bf28}+0.95\%$
test_values[td1_return_estimate-False-False] 25.2784ms 25.0908ms 39.8553 Ops/s 40.0601 Ops/s $\color{#d91a1a}-0.51\%$
test_values[vec_td1_return_estimate-False-False] 36.0232ms 27.7448ms 36.0428 Ops/s 34.5767 Ops/s $\color{#35bf28}+4.24\%$
test_values[td_lambda_return_estimate-True-False] 35.5488ms 35.1436ms 28.4547 Ops/s 28.4911 Ops/s $\color{#d91a1a}-0.13\%$
test_values[vec_td_lambda_return_estimate-True-False] 36.9708ms 27.8804ms 35.8675 Ops/s 36.9607 Ops/s $\color{#d91a1a}-2.96\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 7.9694ms 7.8479ms 127.4219 Ops/s 128.4934 Ops/s $\color{#d91a1a}-0.83\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 2.1367ms 1.8937ms 528.0724 Ops/s 563.5431 Ops/s $\textbf{\color{#d91a1a}-6.29\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 8.6675ms 0.4356ms 2.2956 KOps/s 2.2668 KOps/s $\color{#35bf28}+1.27\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 44.3928ms 39.0428ms 25.6129 Ops/s 25.6919 Ops/s $\color{#d91a1a}-0.31\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 2.7381ms 2.6102ms 383.1192 Ops/s 392.4063 Ops/s $\color{#d91a1a}-2.37\%$
test_dqn_speed 8.1862ms 1.6178ms 618.1329 Ops/s 614.4082 Ops/s $\color{#35bf28}+0.61\%$
test_ddpg_speed 12.1894ms 3.6107ms 276.9511 Ops/s 271.2368 Ops/s $\color{#35bf28}+2.11\%$
test_sac_speed 18.2228ms 10.1580ms 98.4442 Ops/s 98.4520 Ops/s $-0.01\%$
test_redq_speed 26.0371ms 19.4827ms 51.3276 Ops/s 51.6185 Ops/s $\color{#d91a1a}-0.56\%$
test_redq_deprec_speed 85.7187ms 16.3149ms 61.2935 Ops/s 65.1665 Ops/s $\textbf{\color{#d91a1a}-5.94\%}$
test_td3_speed 18.3125ms 10.3685ms 96.4459 Ops/s 94.3028 Ops/s $\color{#35bf28}+2.27\%$
test_cql_speed 45.6702ms 38.0346ms 26.2919 Ops/s 25.9011 Ops/s $\color{#35bf28}+1.51\%$
test_a2c_speed 16.5492ms 8.0851ms 123.6839 Ops/s 123.3517 Ops/s $\color{#35bf28}+0.27\%$
test_ppo_speed 16.5383ms 8.4347ms 118.5579 Ops/s 118.8014 Ops/s $\color{#d91a1a}-0.20\%$
test_reinforce_speed 17.0420ms 7.2367ms 138.1847 Ops/s 138.7615 Ops/s $\color{#d91a1a}-0.42\%$
test_iql_speed 42.1560ms 34.1999ms 29.2399 Ops/s 28.5804 Ops/s $\color{#35bf28}+2.31\%$
test_sample_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 2.3113ms 1.8458ms 541.7770 Ops/s 544.2045 Ops/s $\color{#d91a1a}-0.45\%$
test_sample_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.1063s 2.1620ms 462.5452 Ops/s 494.7293 Ops/s $\textbf{\color{#d91a1a}-6.51\%}$
test_sample_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 3.1818ms 1.9559ms 511.2844 Ops/s 481.6566 Ops/s $\textbf{\color{#35bf28}+6.15\%}$
test_sample_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 2.9271ms 1.8543ms 539.2768 Ops/s 531.1994 Ops/s $\color{#35bf28}+1.52\%$
test_sample_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.1043s 2.1585ms 463.2796 Ops/s 476.5990 Ops/s $\color{#d91a1a}-2.79\%$
test_sample_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 3.7859ms 1.9571ms 510.9670 Ops/s 477.8078 Ops/s $\textbf{\color{#35bf28}+6.94\%}$
test_sample_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 2.3768ms 1.8484ms 541.0022 Ops/s 522.3526 Ops/s $\color{#35bf28}+3.57\%$
test_sample_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.1053s 2.1773ms 459.2812 Ops/s 491.6693 Ops/s $\textbf{\color{#d91a1a}-6.59\%}$
test_sample_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 2.5979ms 1.9338ms 517.1148 Ops/s 495.2307 Ops/s $\color{#35bf28}+4.42\%$
test_iterate_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 2.7699ms 1.8532ms 539.6053 Ops/s 530.8084 Ops/s $\color{#35bf28}+1.66\%$
test_iterate_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.1070s 2.1910ms 456.4211 Ops/s 495.8942 Ops/s $\textbf{\color{#d91a1a}-7.96\%}$
test_iterate_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 2.9042ms 1.9980ms 500.5127 Ops/s 510.6322 Ops/s $\color{#d91a1a}-1.98\%$
test_iterate_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 2.5688ms 1.9133ms 522.6638 Ops/s 536.2025 Ops/s $\color{#d91a1a}-2.52\%$
test_iterate_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.1358s 2.3783ms 420.4676 Ops/s 512.4814 Ops/s $\textbf{\color{#d91a1a}-17.95\%}$
test_iterate_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 2.9786ms 1.9673ms 508.3050 Ops/s 511.8242 Ops/s $\color{#d91a1a}-0.69\%$
test_iterate_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 2.2206ms 1.8655ms 536.0606 Ops/s 538.7182 Ops/s $\color{#d91a1a}-0.49\%$
test_iterate_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.1160s 2.1715ms 460.5040 Ops/s 510.5925 Ops/s $\textbf{\color{#d91a1a}-9.81\%}$
test_iterate_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 2.7042ms 1.9696ms 507.7169 Ops/s 509.6035 Ops/s $\color{#d91a1a}-0.37\%$
test_populate_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.1667s 17.6158ms 56.7672 Ops/s 58.4661 Ops/s $\color{#d91a1a}-2.91\%$
test_populate_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 0.1053s 16.1956ms 61.7450 Ops/s 61.4954 Ops/s $\color{#35bf28}+0.41\%$
test_populate_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 0.1093s 16.3162ms 61.2889 Ops/s 54.9021 Ops/s $\textbf{\color{#35bf28}+11.63\%}$
test_populate_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.1188s 16.7884ms 59.5651 Ops/s 61.6821 Ops/s $\color{#d91a1a}-3.43\%$
test_populate_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 0.1111s 16.3934ms 61.0002 Ops/s 61.9529 Ops/s $\color{#d91a1a}-1.54\%$
test_populate_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 0.1091s 16.4373ms 60.8371 Ops/s 62.0388 Ops/s $\color{#d91a1a}-1.94\%$
test_populate_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.1160s 16.7060ms 59.8588 Ops/s 62.2903 Ops/s $\color{#d91a1a}-3.90\%$
test_populate_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 0.1199s 17.0734ms 58.5707 Ops/s 61.8246 Ops/s $\textbf{\color{#d91a1a}-5.26\%}$
test_populate_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 0.1141s 16.4904ms 60.6414 Ops/s 61.5437 Ops/s $\color{#d91a1a}-1.47\%$

github-actions bot commented Dec 6, 2023

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 92. Improved: $\large\color{#35bf28}5$. Worsened: $\large\color{#d91a1a}3$.

Name Max Mean Ops Ops on Repo HEAD Change
test_single 0.1153s 0.1150s 8.6934 Ops/s 8.5465 Ops/s $\color{#35bf28}+1.72\%$
test_sync 0.1006s 0.1002s 9.9804 Ops/s 9.9557 Ops/s $\color{#35bf28}+0.25\%$
test_async 0.2658s 99.0745ms 10.0934 Ops/s 10.3963 Ops/s $\color{#d91a1a}-2.91\%$
test_single_pixels 0.1381s 0.1378s 7.2585 Ops/s 7.2684 Ops/s $\color{#d91a1a}-0.14\%$
test_sync_pixels 93.6747ms 91.7365ms 10.9008 Ops/s 10.7997 Ops/s $\color{#35bf28}+0.94\%$
test_async_pixels 0.1742s 86.0832ms 11.6167 Ops/s 11.5436 Ops/s $\color{#35bf28}+0.63\%$
test_simple 0.9002s 0.8349s 1.1978 Ops/s 1.1990 Ops/s $\color{#d91a1a}-0.10\%$
test_transformed 1.1242s 1.0619s 0.9417 Ops/s 0.9428 Ops/s $\color{#d91a1a}-0.11\%$
test_serial 2.3627s 2.3028s 0.4342 Ops/s 0.4349 Ops/s $\color{#d91a1a}-0.16\%$
test_parallel 2.5106s 2.4429s 0.4094 Ops/s 0.4126 Ops/s $\color{#d91a1a}-0.79\%$
test_step_mdp_speed[True-True-True-True-True] 71.9210μs 27.7418μs 36.0467 KOps/s 35.6217 KOps/s $\color{#35bf28}+1.19\%$
test_step_mdp_speed[True-True-True-True-False] 39.3300μs 16.8743μs 59.2618 KOps/s 58.9770 KOps/s $\color{#35bf28}+0.48\%$
test_step_mdp_speed[True-True-True-False-True] 40.6010μs 16.4936μs 60.6298 KOps/s 60.4426 KOps/s $\color{#35bf28}+0.31\%$
test_step_mdp_speed[True-True-True-False-False] 28.7200μs 9.9874μs 100.1264 KOps/s 98.2805 KOps/s $\color{#35bf28}+1.88\%$
test_step_mdp_speed[True-True-False-True-True] 56.3000μs 29.4817μs 33.9193 KOps/s 33.8222 KOps/s $\color{#35bf28}+0.29\%$
test_step_mdp_speed[True-True-False-True-False] 41.4200μs 18.0488μs 55.4053 KOps/s 53.2611 KOps/s $\color{#35bf28}+4.03\%$
test_step_mdp_speed[True-True-False-False-True] 44.8900μs 17.9490μs 55.7133 KOps/s 54.9891 KOps/s $\color{#35bf28}+1.32\%$
test_step_mdp_speed[True-True-False-False-False] 43.6510μs 11.5271μs 86.7518 KOps/s 85.7221 KOps/s $\color{#35bf28}+1.20\%$
test_step_mdp_speed[True-False-True-True-True] 67.3820μs 30.9700μs 32.2894 KOps/s 31.9507 KOps/s $\color{#35bf28}+1.06\%$
test_step_mdp_speed[True-False-True-True-False] 44.4300μs 19.8865μs 50.2855 KOps/s 49.5662 KOps/s $\color{#35bf28}+1.45\%$
test_step_mdp_speed[True-False-True-False-True] 43.4200μs 18.0999μs 55.2489 KOps/s 55.1560 KOps/s $\color{#35bf28}+0.17\%$
test_step_mdp_speed[True-False-True-False-False] 31.5100μs 11.6410μs 85.9031 KOps/s 84.9741 KOps/s $\color{#35bf28}+1.09\%$
test_step_mdp_speed[True-False-False-True-True] 59.1400μs 33.1113μs 30.2012 KOps/s 30.6049 KOps/s $\color{#d91a1a}-1.32\%$
test_step_mdp_speed[True-False-False-True-False] 48.2310μs 21.4734μs 46.5693 KOps/s 46.3052 KOps/s $\color{#35bf28}+0.57\%$
test_step_mdp_speed[True-False-False-False-True] 55.2110μs 19.5025μs 51.2755 KOps/s 51.1870 KOps/s $\color{#35bf28}+0.17\%$
test_step_mdp_speed[True-False-False-False-False] 33.7110μs 13.2172μs 75.6592 KOps/s 75.2350 KOps/s $\color{#35bf28}+0.56\%$
test_step_mdp_speed[False-True-True-True-True] 62.9700μs 30.9220μs 32.3394 KOps/s 32.3101 KOps/s $\color{#35bf28}+0.09\%$
test_step_mdp_speed[False-True-True-True-False] 43.6500μs 20.2183μs 49.4602 KOps/s 49.7100 KOps/s $\color{#d91a1a}-0.50\%$
test_step_mdp_speed[False-True-True-False-True] 46.2200μs 22.0639μs 45.3229 KOps/s 46.2934 KOps/s $\color{#d91a1a}-2.10\%$
test_step_mdp_speed[False-True-True-False-False] 37.1210μs 13.1858μs 75.8393 KOps/s 75.1700 KOps/s $\color{#35bf28}+0.89\%$
test_step_mdp_speed[False-True-False-True-True] 61.5010μs 33.5261μs 29.8275 KOps/s 30.4481 KOps/s $\color{#d91a1a}-2.04\%$
test_step_mdp_speed[False-True-False-True-False] 50.2300μs 21.2835μs 46.9847 KOps/s 45.7998 KOps/s $\color{#35bf28}+2.59\%$
test_step_mdp_speed[False-True-False-False-True] 0.1835ms 22.9255μs 43.6196 KOps/s 43.6419 KOps/s $\color{#d91a1a}-0.05\%$
test_step_mdp_speed[False-True-False-False-False] 89.1510μs 14.9223μs 67.0138 KOps/s 68.5061 KOps/s $\color{#d91a1a}-2.18\%$
test_step_mdp_speed[False-False-True-True-True] 66.0610μs 33.7138μs 29.6614 KOps/s 29.5733 KOps/s $\color{#35bf28}+0.30\%$
test_step_mdp_speed[False-False-True-True-False] 49.0300μs 23.0663μs 43.3532 KOps/s 42.7691 KOps/s $\color{#35bf28}+1.37\%$
test_step_mdp_speed[False-False-True-False-True] 59.5100μs 23.0523μs 43.3795 KOps/s 43.2179 KOps/s $\color{#35bf28}+0.37\%$
test_step_mdp_speed[False-False-True-False-False] 39.3300μs 14.7487μs 67.8024 KOps/s 67.3332 KOps/s $\color{#35bf28}+0.70\%$
test_step_mdp_speed[False-False-False-True-True] 71.0610μs 35.1313μs 28.4647 KOps/s 28.2006 KOps/s $\color{#35bf28}+0.94\%$
test_step_mdp_speed[False-False-False-True-False] 48.7910μs 24.6406μs 40.5834 KOps/s 39.8929 KOps/s $\color{#35bf28}+1.73\%$
test_step_mdp_speed[False-False-False-False-True] 49.5710μs 23.5736μs 42.4203 KOps/s 41.7288 KOps/s $\color{#35bf28}+1.66\%$
test_step_mdp_speed[False-False-False-False-False] 44.2810μs 16.4649μs 60.7351 KOps/s 61.9318 KOps/s $\color{#d91a1a}-1.93\%$
test_values[generalized_advantage_estimate-True-True] 26.6140ms 25.8910ms 38.6234 Ops/s 37.4947 Ops/s $\color{#35bf28}+3.01\%$
test_values[vec_generalized_advantage_estimate-True-True] 85.7796ms 3.2969ms 303.3171 Ops/s 310.0842 Ops/s $\color{#d91a1a}-2.18\%$
test_values[td0_return_estimate-False-False] 0.1044ms 66.4625μs 15.0461 KOps/s 14.6697 KOps/s $\color{#35bf28}+2.57\%$
test_values[td1_return_estimate-False-False] 60.1695ms 56.9134ms 17.5706 Ops/s 17.6307 Ops/s $\color{#d91a1a}-0.34\%$
test_values[vec_td1_return_estimate-False-False] 2.1589ms 1.7864ms 559.7978 Ops/s 572.1826 Ops/s $\color{#d91a1a}-2.16\%$
test_values[td_lambda_return_estimate-True-False] 92.9865ms 91.2995ms 10.9530 Ops/s 10.8590 Ops/s $\color{#35bf28}+0.87\%$
test_values[vec_td_lambda_return_estimate-True-False] 2.0178ms 1.7821ms 561.1276 Ops/s 577.1634 Ops/s $\color{#d91a1a}-2.78\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 25.2971ms 24.4476ms 40.9038 Ops/s 40.7368 Ops/s $\color{#35bf28}+0.41\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 0.8818ms 0.7185ms 1.3918 KOps/s 1.3949 KOps/s $\color{#d91a1a}-0.22\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.7746ms 0.6944ms 1.4400 KOps/s 1.4340 KOps/s $\color{#35bf28}+0.42\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.5481ms 1.4756ms 677.7084 Ops/s 678.6787 Ops/s $\color{#d91a1a}-0.14\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.9791ms 0.6930ms 1.4430 KOps/s 1.3780 KOps/s $\color{#35bf28}+4.72\%$
test_dqn_speed 7.7039ms 1.3681ms 730.9192 Ops/s 724.2191 Ops/s $\color{#35bf28}+0.93\%$
test_ddpg_speed 4.4519ms 3.0791ms 324.7704 Ops/s 303.0028 Ops/s $\textbf{\color{#35bf28}+7.18\%}$
test_sac_speed 9.9046ms 8.7512ms 114.2694 Ops/s 115.6547 Ops/s $\color{#d91a1a}-1.20\%$
test_redq_speed 16.0487ms 15.3840ms 65.0026 Ops/s 65.1090 Ops/s $\color{#d91a1a}-0.16\%$
test_redq_deprec_speed 13.8389ms 12.4675ms 80.2085 Ops/s 81.7304 Ops/s $\color{#d91a1a}-1.86\%$
test_td3_speed 17.8773ms 8.8848ms 112.5513 Ops/s 113.2423 Ops/s $\color{#d91a1a}-0.61\%$
test_cql_speed 32.6927ms 31.2298ms 32.0207 Ops/s 33.0996 Ops/s $\color{#d91a1a}-3.26\%$
test_a2c_speed 8.5030ms 7.1345ms 140.1635 Ops/s 143.1502 Ops/s $\color{#d91a1a}-2.09\%$
test_ppo_speed 8.8351ms 7.4654ms 133.9508 Ops/s 137.2279 Ops/s $\color{#d91a1a}-2.39\%$
test_reinforce_speed 7.6448ms 6.1752ms 161.9390 Ops/s 166.5317 Ops/s $\color{#d91a1a}-2.76\%$
test_iql_speed 0.1273s 29.2328ms 34.2082 Ops/s 38.0672 Ops/s $\textbf{\color{#d91a1a}-10.14\%}$
test_sample_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 2.7141ms 2.1244ms 470.7132 Ops/s 468.1961 Ops/s $\color{#35bf28}+0.54\%$
test_sample_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 3.4279ms 2.2776ms 439.0546 Ops/s 382.3039 Ops/s $\textbf{\color{#35bf28}+14.84\%}$
test_sample_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 3.8170ms 2.2957ms 435.5893 Ops/s 432.9420 Ops/s $\color{#35bf28}+0.61\%$
test_sample_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 2.7382ms 2.1093ms 474.0904 Ops/s 475.6538 Ops/s $\color{#d91a1a}-0.33\%$
test_sample_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 3.4474ms 2.2960ms 435.5352 Ops/s 435.2561 Ops/s $\color{#35bf28}+0.06\%$
test_sample_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 3.1122ms 2.2920ms 436.2970 Ops/s 433.6112 Ops/s $\color{#35bf28}+0.62\%$
test_sample_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 2.4948ms 2.1306ms 469.3418 Ops/s 471.1282 Ops/s $\color{#d91a1a}-0.38\%$
test_sample_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 3.7273ms 2.2951ms 435.7093 Ops/s 433.5276 Ops/s $\color{#35bf28}+0.50\%$
test_sample_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.1187s 2.5660ms 389.7155 Ops/s 433.0458 Ops/s $\textbf{\color{#d91a1a}-10.01\%}$
test_iterate_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 2.3264ms 2.1198ms 471.7463 Ops/s 471.7108 Ops/s $+0.01\%$
test_iterate_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 3.2093ms 2.2980ms 435.1568 Ops/s 435.1688 Ops/s $-0.00\%$
test_iterate_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 3.3050ms 2.2857ms 437.5082 Ops/s 435.4531 Ops/s $\color{#35bf28}+0.47\%$
test_iterate_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 2.5548ms 2.1283ms 469.8566 Ops/s 471.0720 Ops/s $\color{#d91a1a}-0.26\%$
test_iterate_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 3.4413ms 2.2956ms 435.6139 Ops/s 435.5780 Ops/s $+0.01\%$
test_iterate_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 3.5739ms 2.3053ms 433.7895 Ops/s 434.0599 Ops/s $\color{#d91a1a}-0.06\%$
test_iterate_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 2.6422ms 2.1342ms 468.5667 Ops/s 472.2010 Ops/s $\color{#d91a1a}-0.77\%$
test_iterate_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.2044ms 2.1497ms 465.1890 Ops/s 433.5703 Ops/s $\textbf{\color{#35bf28}+7.29\%}$
test_iterate_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 3.1839ms 2.2998ms 434.8189 Ops/s 433.8295 Ops/s $\color{#35bf28}+0.23\%$
test_populate_rb[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.2181s 18.2760ms 54.7165 Ops/s 55.4495 Ops/s $\color{#d91a1a}-1.32\%$
test_populate_rb[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 0.1250s 14.0430ms 71.2098 Ops/s 61.8474 Ops/s $\textbf{\color{#35bf28}+15.14\%}$
test_populate_rb[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 0.1256s 16.3552ms 61.1428 Ops/s 72.0612 Ops/s $\textbf{\color{#d91a1a}-15.15\%}$
test_populate_rb[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.1273s 16.3855ms 61.0294 Ops/s 61.6957 Ops/s $\color{#d91a1a}-1.08\%$
test_populate_rb[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 0.1268s 16.3877ms 61.0215 Ops/s 61.5741 Ops/s $\color{#d91a1a}-0.90\%$
test_populate_rb[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 0.1261s 16.4227ms 60.8914 Ops/s 61.7320 Ops/s $\color{#d91a1a}-1.36\%$
test_populate_rb[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.1261s 14.0852ms 70.9967 Ops/s 61.7646 Ops/s $\textbf{\color{#35bf28}+14.95\%}$
test_populate_rb[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 0.1287s 16.3716ms 61.0814 Ops/s 61.6801 Ops/s $\color{#d91a1a}-0.97\%$
test_populate_rb[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 0.1307s 14.2142ms 70.3524 Ops/s 72.0727 Ops/s $\color{#d91a1a}-2.39\%$

@vmoens merged commit f1e4b43 into main Dec 6, 2023
53 of 60 checks passed
@vmoens deleted the fix-rnn-vmap branch December 6, 2023 20:15
Successfully merging this pull request may close these issues.

[BUG] Discrete SAC vmap runtime error when using MLP + LSTM architecture