[BugFix] Fix sampling without replacement with ndim storages #1999

Merged
merged 1 commit into from
Mar 7, 2024


vmoens
Contributor

@vmoens vmoens commented Mar 7, 2024

No description provided.


pytorch-bot bot commented Mar 7, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/1999

Note: Links to docs will display an error until the docs builds have been completed.

❌ 6 New Failures

As of commit 887b83e with merge base fe6c070 (image):

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 7, 2024
@vmoens vmoens added the bug Something isn't working label Mar 7, 2024
@vmoens vmoens marked this pull request as ready for review March 7, 2024 11:17

github-actions bot commented Mar 7, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 91. Improved: $\large\color{#35bf28}3$. Worsened: $\large\color{#d91a1a}7$.

Expand to view detailed results
Name Max Mean Ops Ops on Repo HEAD Change
test_single 60.9970ms 60.2178ms 16.6064 Ops/s 16.7231 Ops/s $\color{#d91a1a}-0.70\%$
test_sync 33.6331ms 32.8454ms 30.4456 Ops/s 30.8374 Ops/s $\color{#d91a1a}-1.27\%$
test_async 62.6889ms 29.9776ms 33.3583 Ops/s 33.9784 Ops/s $\color{#d91a1a}-1.82\%$
test_simple 0.4857s 0.4249s 2.3533 Ops/s 2.3710 Ops/s $\color{#d91a1a}-0.75\%$
test_transformed 0.6215s 0.5706s 1.7524 Ops/s 1.7230 Ops/s $\color{#35bf28}+1.71\%$
test_serial 1.4759s 1.4107s 0.7089 Ops/s 0.7215 Ops/s $\color{#d91a1a}-1.75\%$
test_parallel 1.4465s 1.3934s 0.7177 Ops/s 0.7234 Ops/s $\color{#d91a1a}-0.79\%$
test_step_mdp_speed[True-True-True-True-True] 0.2390ms 21.3347μs 46.8721 KOps/s 46.1631 KOps/s $\color{#35bf28}+1.54\%$
test_step_mdp_speed[True-True-True-True-False] 38.2710μs 13.0412μs 76.6801 KOps/s 76.8079 KOps/s $\color{#d91a1a}-0.17\%$
test_step_mdp_speed[True-True-True-False-True] 34.7850μs 12.4971μs 80.0185 KOps/s 78.5767 KOps/s $\color{#35bf28}+1.83\%$
test_step_mdp_speed[True-True-True-False-False] 27.9320μs 7.6123μs 131.3657 KOps/s 129.4680 KOps/s $\color{#35bf28}+1.47\%$
test_step_mdp_speed[True-True-False-True-True] 53.6790μs 23.0498μs 43.3843 KOps/s 43.6090 KOps/s $\color{#d91a1a}-0.52\%$
test_step_mdp_speed[True-True-False-True-False] 42.0680μs 14.3931μs 69.4779 KOps/s 68.8721 KOps/s $\color{#35bf28}+0.88\%$
test_step_mdp_speed[True-True-False-False-True] 65.9600μs 13.6471μs 73.2756 KOps/s 72.1810 KOps/s $\color{#35bf28}+1.52\%$
test_step_mdp_speed[True-True-False-False-False] 59.2800μs 8.7845μs 113.8372 KOps/s 111.6352 KOps/s $\color{#35bf28}+1.97\%$
test_step_mdp_speed[True-False-True-True-True] 54.9020μs 24.0845μs 41.5205 KOps/s 40.8037 KOps/s $\color{#35bf28}+1.76\%$
test_step_mdp_speed[True-False-True-True-False] 68.4370μs 15.3794μs 65.0220 KOps/s 62.9618 KOps/s $\color{#35bf28}+3.27\%$
test_step_mdp_speed[True-False-True-False-True] 46.8270μs 13.7119μs 72.9291 KOps/s 72.1179 KOps/s $\color{#35bf28}+1.12\%$
test_step_mdp_speed[True-False-True-False-False] 28.6430μs 8.7728μs 113.9892 KOps/s 111.8438 KOps/s $\color{#35bf28}+1.92\%$
test_step_mdp_speed[True-False-False-True-True] 51.0140μs 25.3054μs 39.5173 KOps/s 39.1193 KOps/s $\color{#35bf28}+1.02\%$
test_step_mdp_speed[True-False-False-True-False] 42.7590μs 16.8501μs 59.3467 KOps/s 58.6947 KOps/s $\color{#35bf28}+1.11\%$
test_step_mdp_speed[True-False-False-False-True] 48.0390μs 14.8598μs 67.2958 KOps/s 66.1725 KOps/s $\color{#35bf28}+1.70\%$
test_step_mdp_speed[True-False-False-False-False] 41.2260μs 10.0394μs 99.6071 KOps/s 98.3348 KOps/s $\color{#35bf28}+1.29\%$
test_step_mdp_speed[False-True-True-True-True] 50.8550μs 23.9702μs 41.7185 KOps/s 40.9585 KOps/s $\color{#35bf28}+1.86\%$
test_step_mdp_speed[False-True-True-True-False] 41.7080μs 15.5990μs 64.1065 KOps/s 63.3907 KOps/s $\color{#35bf28}+1.13\%$
test_step_mdp_speed[False-True-True-False-True] 50.6640μs 16.1261μs 62.0112 KOps/s 61.5763 KOps/s $\color{#35bf28}+0.71\%$
test_step_mdp_speed[False-True-True-False-False] 43.1400μs 10.0878μs 99.1293 KOps/s 98.6607 KOps/s $\color{#35bf28}+0.47\%$
test_step_mdp_speed[False-True-False-True-True] 65.1910μs 25.8730μs 38.6504 KOps/s 38.7059 KOps/s $\color{#d91a1a}-0.14\%$
test_step_mdp_speed[False-True-False-True-False] 64.5000μs 16.9242μs 59.0871 KOps/s 58.6996 KOps/s $\color{#35bf28}+0.66\%$
test_step_mdp_speed[False-True-False-False-True] 39.9740μs 17.2545μs 57.9560 KOps/s 58.2829 KOps/s $\color{#d91a1a}-0.56\%$
test_step_mdp_speed[False-True-False-False-False] 38.9920μs 11.3143μs 88.3834 KOps/s 88.3810 KOps/s $+0.00\%$
test_step_mdp_speed[False-False-True-True-True] 90.0090μs 26.3691μs 37.9232 KOps/s 36.6376 KOps/s $\color{#35bf28}+3.51\%$
test_step_mdp_speed[False-False-True-True-False] 57.1360μs 18.0055μs 55.5385 KOps/s 55.0027 KOps/s $\color{#35bf28}+0.97\%$
test_step_mdp_speed[False-False-True-False-True] 47.0970μs 17.2524μs 57.9630 KOps/s 57.4908 KOps/s $\color{#35bf28}+0.82\%$
test_step_mdp_speed[False-False-True-False-False] 56.6250μs 11.3277μs 88.2790 KOps/s 88.1337 KOps/s $\color{#35bf28}+0.16\%$
test_step_mdp_speed[False-False-False-True-True] 54.4610μs 27.2829μs 36.6530 KOps/s 35.3251 KOps/s $\color{#35bf28}+3.76\%$
test_step_mdp_speed[False-False-False-True-False] 55.8040μs 19.1503μs 52.2185 KOps/s 51.8672 KOps/s $\color{#35bf28}+0.68\%$
test_step_mdp_speed[False-False-False-False-True] 60.8140μs 18.1366μs 55.1372 KOps/s 54.7685 KOps/s $\color{#35bf28}+0.67\%$
test_step_mdp_speed[False-False-False-False-False] 56.2240μs 12.2703μs 81.4973 KOps/s 80.4569 KOps/s $\color{#35bf28}+1.29\%$
test_values[generalized_advantage_estimate-True-True] 11.0056ms 9.5661ms 104.5357 Ops/s 104.0785 Ops/s $\color{#35bf28}+0.44\%$
test_values[vec_generalized_advantage_estimate-True-True] 35.9879ms 33.5975ms 29.7641 Ops/s 29.9755 Ops/s $\color{#d91a1a}-0.71\%$
test_values[td0_return_estimate-False-False] 0.2540ms 0.1843ms 5.4269 KOps/s 5.6825 KOps/s $\color{#d91a1a}-4.50\%$
test_values[td1_return_estimate-False-False] 27.7087ms 24.3411ms 41.0827 Ops/s 42.4168 Ops/s $\color{#d91a1a}-3.15\%$
test_values[vec_td1_return_estimate-False-False] 35.0941ms 33.6865ms 29.6855 Ops/s 29.8909 Ops/s $\color{#d91a1a}-0.69\%$
test_values[td_lambda_return_estimate-True-False] 35.6119ms 34.8973ms 28.6555 Ops/s 29.1999 Ops/s $\color{#d91a1a}-1.86\%$
test_values[vec_td_lambda_return_estimate-True-False] 35.0851ms 33.6788ms 29.6923 Ops/s 29.8647 Ops/s $\color{#d91a1a}-0.58\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 11.5883ms 8.3932ms 119.1436 Ops/s 120.2800 Ops/s $\color{#d91a1a}-0.94\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 2.4637ms 2.0577ms 485.9837 Ops/s 512.0872 Ops/s $\textbf{\color{#d91a1a}-5.10\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.6700ms 0.3578ms 2.7946 KOps/s 2.8168 KOps/s $\color{#d91a1a}-0.79\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 50.1522ms 46.1261ms 21.6797 Ops/s 23.7528 Ops/s $\textbf{\color{#d91a1a}-8.73\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 3.7624ms 3.0706ms 325.6667 Ops/s 330.5338 Ops/s $\color{#d91a1a}-1.47\%$
test_dqn_speed 6.9518ms 1.3785ms 725.4361 Ops/s 743.9947 Ops/s $\color{#d91a1a}-2.49\%$
test_ddpg_speed 3.4652ms 2.7274ms 366.6543 Ops/s 376.8843 Ops/s $\color{#d91a1a}-2.71\%$
test_sac_speed 9.2401ms 8.2822ms 120.7415 Ops/s 121.6956 Ops/s $\color{#d91a1a}-0.78\%$
test_redq_speed 14.4575ms 13.1785ms 75.8814 Ops/s 76.2355 Ops/s $\color{#d91a1a}-0.46\%$
test_redq_deprec_speed 79.1337ms 14.3340ms 69.7643 Ops/s 77.2293 Ops/s $\textbf{\color{#d91a1a}-9.67\%}$
test_td3_speed 8.7955ms 8.3411ms 119.8885 Ops/s 122.8891 Ops/s $\color{#d91a1a}-2.44\%$
test_cql_speed 38.0072ms 36.4701ms 27.4197 Ops/s 27.7541 Ops/s $\color{#d91a1a}-1.21\%$
test_a2c_speed 8.8137ms 7.5616ms 132.2471 Ops/s 135.4036 Ops/s $\color{#d91a1a}-2.33\%$
test_ppo_speed 9.0147ms 8.0887ms 123.6298 Ops/s 130.0478 Ops/s $\color{#d91a1a}-4.94\%$
test_reinforce_speed 12.6824ms 7.2544ms 137.8479 Ops/s 150.2258 Ops/s $\textbf{\color{#d91a1a}-8.24\%}$
test_iql_speed 34.1959ms 33.2739ms 30.0536 Ops/s 30.4747 Ops/s $\color{#d91a1a}-1.38\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 3.6069ms 2.3975ms 417.1067 Ops/s 428.4256 Ops/s $\color{#d91a1a}-2.64\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.0198ms 0.5160ms 1.9379 KOps/s 1.9775 KOps/s $\color{#d91a1a}-2.00\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7758ms 0.4885ms 2.0470 KOps/s 2.0873 KOps/s $\color{#d91a1a}-1.93\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 3.6707ms 2.3866ms 419.0128 Ops/s 429.7131 Ops/s $\color{#d91a1a}-2.49\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.0749ms 0.4975ms 2.0099 KOps/s 1.9948 KOps/s $\color{#35bf28}+0.76\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.7390ms 0.4730ms 2.1142 KOps/s 2.1168 KOps/s $\color{#d91a1a}-0.13\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.8467ms 1.3215ms 756.7024 Ops/s 792.0921 Ops/s $\color{#d91a1a}-4.47\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.7422ms 1.2413ms 805.5826 Ops/s 837.5714 Ops/s $\color{#d91a1a}-3.82\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 3.6665ms 2.3575ms 424.1754 Ops/s 437.7523 Ops/s $\color{#d91a1a}-3.10\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.9961ms 0.6134ms 1.6303 KOps/s 1.6302 KOps/s $+0.00\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.9390ms 0.5893ms 1.6970 KOps/s 1.6898 KOps/s $\color{#35bf28}+0.42\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 3.3547ms 2.2686ms 440.7955 Ops/s 436.1876 Ops/s $\color{#35bf28}+1.06\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.8623ms 0.5204ms 1.9216 KOps/s 1.6244 KOps/s $\textbf{\color{#35bf28}+18.30\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 3.9079ms 0.4897ms 2.0420 KOps/s 2.1212 KOps/s $\color{#d91a1a}-3.73\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 2.8777ms 2.3663ms 422.6050 Ops/s 443.9349 Ops/s $\color{#d91a1a}-4.80\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.6147ms 0.4954ms 2.0188 KOps/s 2.0436 KOps/s $\color{#d91a1a}-1.21\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.7587ms 0.4842ms 2.0653 KOps/s 2.0933 KOps/s $\color{#d91a1a}-1.34\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 3.6361ms 2.4056ms 415.7010 Ops/s 417.0885 Ops/s $\color{#d91a1a}-0.33\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.1827ms 0.6212ms 1.6099 KOps/s 1.3480 KOps/s $\textbf{\color{#35bf28}+19.43\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8668ms 0.5973ms 1.6743 KOps/s 1.6744 KOps/s $-0.01\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 91.4688ms 7.0329ms 142.1887 Ops/s 178.9965 Ops/s $\textbf{\color{#d91a1a}-20.56\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 16.7212ms 12.3563ms 80.9303 Ops/s 81.4689 Ops/s $\color{#d91a1a}-0.66\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 4.3326ms 1.1279ms 886.5911 Ops/s 983.6858 Ops/s $\textbf{\color{#d91a1a}-9.87\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 94.5798ms 7.1364ms 140.1264 Ops/s 136.1541 Ops/s $\color{#35bf28}+2.92\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 14.8538ms 12.4244ms 80.4867 Ops/s 81.9493 Ops/s $\color{#d91a1a}-1.78\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 4.1833ms 1.1298ms 885.0827 Ops/s 923.0552 Ops/s $\color{#d91a1a}-4.11\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 97.1122ms 5.8168ms 171.9152 Ops/s 135.5284 Ops/s $\textbf{\color{#35bf28}+26.85\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 0.1008s 14.3977ms 69.4556 Ops/s 79.3424 Ops/s $\textbf{\color{#d91a1a}-12.46\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 2.1144ms 1.3781ms 725.6343 Ops/s 706.6403 Ops/s $\color{#35bf28}+2.69\%$


github-actions bot commented Mar 7, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 94. Improved: $\large\color{#35bf28}6$. Worsened: $\large\color{#d91a1a}4$.

Expand to view detailed results
Name Max Mean Ops Ops on Repo HEAD Change
test_single 0.1157s 0.1155s 8.6563 Ops/s 8.5871 Ops/s $\color{#35bf28}+0.81\%$
test_sync 95.2852ms 95.0553ms 10.5202 Ops/s 10.3786 Ops/s $\color{#35bf28}+1.36\%$
test_async 0.1791s 90.7689ms 11.0170 Ops/s 10.9784 Ops/s $\color{#35bf28}+0.35\%$
test_single_pixels 0.1259s 0.1254s 7.9762 Ops/s 7.9111 Ops/s $\color{#35bf28}+0.82\%$
test_sync_pixels 81.9354ms 80.7189ms 12.3887 Ops/s 12.2627 Ops/s $\color{#35bf28}+1.03\%$
test_async_pixels 0.1495s 65.3788ms 15.2955 Ops/s 15.2790 Ops/s $\color{#35bf28}+0.11\%$
test_simple 0.8925s 0.8334s 1.1998 Ops/s 1.1834 Ops/s $\color{#35bf28}+1.39\%$
test_transformed 1.1196s 1.0631s 0.9406 Ops/s 0.9267 Ops/s $\color{#35bf28}+1.50\%$
test_serial 2.4834s 2.4270s 0.4120 Ops/s 0.4018 Ops/s $\color{#35bf28}+2.54\%$
test_parallel 2.1678s 2.1075s 0.4745 Ops/s 0.4710 Ops/s $\color{#35bf28}+0.74\%$
test_step_mdp_speed[True-True-True-True-True] 0.1011ms 33.4079μs 29.9330 KOps/s 29.7217 KOps/s $\color{#35bf28}+0.71\%$
test_step_mdp_speed[True-True-True-True-False] 0.1631ms 20.1140μs 49.7167 KOps/s 49.2096 KOps/s $\color{#35bf28}+1.03\%$
test_step_mdp_speed[True-True-True-False-True] 35.9310μs 19.0081μs 52.6091 KOps/s 53.0392 KOps/s $\color{#d91a1a}-0.81\%$
test_step_mdp_speed[True-True-True-False-False] 29.7410μs 11.1241μs 89.8948 KOps/s 88.6914 KOps/s $\color{#35bf28}+1.36\%$
test_step_mdp_speed[True-True-False-True-True] 58.2610μs 34.6945μs 28.8230 KOps/s 28.4584 KOps/s $\color{#35bf28}+1.28\%$
test_step_mdp_speed[True-True-False-True-False] 46.8910μs 21.8295μs 45.8095 KOps/s 45.2638 KOps/s $\color{#35bf28}+1.21\%$
test_step_mdp_speed[True-True-False-False-True] 40.2210μs 20.8159μs 48.0402 KOps/s 47.7399 KOps/s $\color{#35bf28}+0.63\%$
test_step_mdp_speed[True-True-False-False-False] 30.6110μs 13.1026μs 76.3206 KOps/s 74.2729 KOps/s $\color{#35bf28}+2.76\%$
test_step_mdp_speed[True-False-True-True-True] 61.3910μs 37.2876μs 26.8186 KOps/s 26.6475 KOps/s $\color{#35bf28}+0.64\%$
test_step_mdp_speed[True-False-True-True-False] 45.8010μs 23.9040μs 41.8340 KOps/s 41.3479 KOps/s $\color{#35bf28}+1.18\%$
test_step_mdp_speed[True-False-True-False-True] 40.1600μs 21.2214μs 47.1222 KOps/s 48.1805 KOps/s $\color{#d91a1a}-2.20\%$
test_step_mdp_speed[True-False-True-False-False] 32.1600μs 13.2127μs 75.6846 KOps/s 75.4623 KOps/s $\color{#35bf28}+0.29\%$
test_step_mdp_speed[True-False-False-True-True] 58.6010μs 38.7111μs 25.8324 KOps/s 25.3458 KOps/s $\color{#35bf28}+1.92\%$
test_step_mdp_speed[True-False-False-True-False] 45.9710μs 25.6258μs 39.0231 KOps/s 38.8261 KOps/s $\color{#35bf28}+0.51\%$
test_step_mdp_speed[True-False-False-False-True] 45.9710μs 22.1634μs 45.1194 KOps/s 44.3898 KOps/s $\color{#35bf28}+1.64\%$
test_step_mdp_speed[True-False-False-False-False] 34.9000μs 15.0083μs 66.6296 KOps/s 66.4990 KOps/s $\color{#35bf28}+0.20\%$
test_step_mdp_speed[False-True-True-True-True] 64.3320μs 37.5432μs 26.6359 KOps/s 26.6353 KOps/s $+0.00\%$
test_step_mdp_speed[False-True-True-True-False] 43.9400μs 23.6812μs 42.2277 KOps/s 42.0344 KOps/s $\color{#35bf28}+0.46\%$
test_step_mdp_speed[False-True-True-False-True] 50.3410μs 24.2223μs 41.2842 KOps/s 40.3293 KOps/s $\color{#35bf28}+2.37\%$
test_step_mdp_speed[False-True-True-False-False] 36.6300μs 15.0265μs 66.5491 KOps/s 67.8245 KOps/s $\color{#d91a1a}-1.88\%$
test_step_mdp_speed[False-True-False-True-True] 64.1010μs 39.8163μs 25.1153 KOps/s 25.3613 KOps/s $\color{#d91a1a}-0.97\%$
test_step_mdp_speed[False-True-False-True-False] 59.6210μs 25.6759μs 38.9470 KOps/s 38.5787 KOps/s $\color{#35bf28}+0.95\%$
test_step_mdp_speed[False-True-False-False-True] 50.9110μs 26.3309μs 37.9782 KOps/s 37.0734 KOps/s $\color{#35bf28}+2.44\%$
test_step_mdp_speed[False-True-False-False-False] 36.2410μs 16.7416μs 59.7315 KOps/s 58.4528 KOps/s $\color{#35bf28}+2.19\%$
test_step_mdp_speed[False-False-True-True-True] 67.5120μs 40.9931μs 24.3944 KOps/s 24.4408 KOps/s $\color{#d91a1a}-0.19\%$
test_step_mdp_speed[False-False-True-True-False] 49.8710μs 27.4561μs 36.4217 KOps/s 35.5320 KOps/s $\color{#35bf28}+2.50\%$
test_step_mdp_speed[False-False-True-False-True] 47.5710μs 26.5084μs 37.7239 KOps/s 37.8663 KOps/s $\color{#d91a1a}-0.38\%$
test_step_mdp_speed[False-False-True-False-False] 35.6810μs 16.6428μs 60.0862 KOps/s 58.3384 KOps/s $\color{#35bf28}+3.00\%$
test_step_mdp_speed[False-False-False-True-True] 68.5210μs 41.9975μs 23.8110 KOps/s 23.7610 KOps/s $\color{#35bf28}+0.21\%$
test_step_mdp_speed[False-False-False-True-False] 52.5910μs 29.2207μs 34.2223 KOps/s 33.9195 KOps/s $\color{#35bf28}+0.89\%$
test_step_mdp_speed[False-False-False-False-True] 42.3510μs 28.0557μs 35.6434 KOps/s 35.8501 KOps/s $\color{#d91a1a}-0.58\%$
test_step_mdp_speed[False-False-False-False-False] 41.0500μs 18.4426μs 54.2222 KOps/s 53.6272 KOps/s $\color{#35bf28}+1.11\%$
test_values[generalized_advantage_estimate-True-True] 26.8668ms 26.4790ms 37.7657 Ops/s 35.4544 Ops/s $\textbf{\color{#35bf28}+6.52\%}$
test_values[vec_generalized_advantage_estimate-True-True] 86.7080ms 3.3153ms 301.6361 Ops/s 288.4627 Ops/s $\color{#35bf28}+4.57\%$
test_values[td0_return_estimate-False-False] 0.1028ms 66.0000μs 15.1515 KOps/s 14.6590 KOps/s $\color{#35bf28}+3.36\%$
test_values[td1_return_estimate-False-False] 56.9697ms 55.9616ms 17.8694 Ops/s 17.1131 Ops/s $\color{#35bf28}+4.42\%$
test_values[vec_td1_return_estimate-False-False] 2.1739ms 1.7830ms 560.8656 Ops/s 557.2825 Ops/s $\color{#35bf28}+0.64\%$
test_values[td_lambda_return_estimate-True-False] 90.9388ms 88.8689ms 11.2525 Ops/s 10.8420 Ops/s $\color{#35bf28}+3.79\%$
test_values[vec_td_lambda_return_estimate-True-False] 2.1456ms 1.7791ms 562.0888 Ops/s 558.6854 Ops/s $\color{#35bf28}+0.61\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 26.5052ms 26.0341ms 38.4112 Ops/s 38.9473 Ops/s $\color{#d91a1a}-1.38\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 0.9150ms 0.7204ms 1.3881 KOps/s 1.3615 KOps/s $\color{#35bf28}+1.95\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.7359ms 0.6868ms 1.4560 KOps/s 1.4793 KOps/s $\color{#d91a1a}-1.58\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.6172ms 1.4701ms 680.2051 Ops/s 674.5316 Ops/s $\color{#35bf28}+0.84\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.9713ms 0.6882ms 1.4530 KOps/s 1.4363 KOps/s $\color{#35bf28}+1.16\%$
test_dqn_speed 8.1140ms 1.4716ms 679.5484 Ops/s 617.5489 Ops/s $\textbf{\color{#35bf28}+10.04\%}$
test_ddpg_speed 3.0900ms 2.7521ms 363.3631 Ops/s 358.5430 Ops/s $\color{#35bf28}+1.34\%$
test_sac_speed 8.7136ms 8.1406ms 122.8409 Ops/s 121.1985 Ops/s $\color{#35bf28}+1.36\%$
test_redq_speed 11.0077ms 10.1408ms 98.6112 Ops/s 96.8959 Ops/s $\color{#35bf28}+1.77\%$
test_redq_deprec_speed 11.4759ms 11.0327ms 90.6393 Ops/s 88.4069 Ops/s $\color{#35bf28}+2.53\%$
test_td3_speed 15.8196ms 8.1666ms 122.4502 Ops/s 121.3105 Ops/s $\color{#35bf28}+0.94\%$
test_cql_speed 26.4805ms 25.2004ms 39.6819 Ops/s 39.3810 Ops/s $\color{#35bf28}+0.76\%$
test_a2c_speed 5.7727ms 5.4812ms 182.4413 Ops/s 179.5641 Ops/s $\color{#35bf28}+1.60\%$
test_ppo_speed 6.2156ms 5.8546ms 170.8062 Ops/s 169.4295 Ops/s $\color{#35bf28}+0.81\%$
test_reinforce_speed 4.7363ms 4.5203ms 221.2241 Ops/s 221.2841 Ops/s $\color{#d91a1a}-0.03\%$
test_iql_speed 19.8145ms 19.2566ms 51.9303 Ops/s 52.0642 Ops/s $\color{#d91a1a}-0.26\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 3.0069ms 2.9055ms 344.1691 Ops/s 344.1852 Ops/s $-0.00\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.2394ms 0.5401ms 1.8516 KOps/s 1.8222 KOps/s $\color{#35bf28}+1.61\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6901ms 0.5154ms 1.9404 KOps/s 1.9292 KOps/s $\color{#35bf28}+0.58\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 3.1000ms 2.9206ms 342.3975 Ops/s 341.8731 Ops/s $\color{#35bf28}+0.15\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.2629ms 0.5335ms 1.8743 KOps/s 1.8384 KOps/s $\color{#35bf28}+1.95\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6632ms 0.5077ms 1.9696 KOps/s 1.9450 KOps/s $\color{#35bf28}+1.27\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.6433ms 1.5370ms 650.6259 Ops/s 646.4109 Ops/s $\color{#35bf28}+0.65\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.6269ms 1.4603ms 684.7912 Ops/s 670.2791 Ops/s $\color{#35bf28}+2.17\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 3.1372ms 2.9977ms 333.5885 Ops/s 331.3470 Ops/s $\color{#35bf28}+0.68\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.9280ms 0.6666ms 1.5001 KOps/s 1.4844 KOps/s $\color{#35bf28}+1.06\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.1082s 0.7242ms 1.3808 KOps/s 1.5450 KOps/s $\textbf{\color{#d91a1a}-10.63\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 3.0105ms 2.8941ms 345.5317 Ops/s 345.8247 Ops/s $\color{#d91a1a}-0.08\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.6600ms 0.5412ms 1.8478 KOps/s 1.8335 KOps/s $\color{#35bf28}+0.78\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 4.8975ms 0.5238ms 1.9092 KOps/s 1.5743 KOps/s $\textbf{\color{#35bf28}+21.27\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 3.2205ms 2.9377ms 340.4044 Ops/s 342.0949 Ops/s $\color{#d91a1a}-0.49\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.1057s 0.6774ms 1.4762 KOps/s 1.8499 KOps/s $\textbf{\color{#d91a1a}-20.20\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.6573ms 0.5096ms 1.9622 KOps/s 1.9539 KOps/s $\color{#35bf28}+0.43\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 3.1255ms 3.0466ms 328.2343 Ops/s 331.2959 Ops/s $\color{#d91a1a}-0.92\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.7741ms 0.6684ms 1.4962 KOps/s 1.4919 KOps/s $\color{#35bf28}+0.29\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 4.9604ms 0.6444ms 1.5519 KOps/s 1.2724 KOps/s $\textbf{\color{#35bf28}+21.97\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.1070s 8.7714ms 114.0063 Ops/s 150.5914 Ops/s $\textbf{\color{#d91a1a}-24.29\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 17.2516ms 15.0581ms 66.4095 Ops/s 63.0597 Ops/s $\textbf{\color{#35bf28}+5.31\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 1.9368ms 1.0688ms 935.6512 Ops/s 934.5577 Ops/s $\color{#35bf28}+0.12\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 99.6234ms 6.6950ms 149.3659 Ops/s 116.7187 Ops/s $\textbf{\color{#35bf28}+27.97\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 17.2661ms 14.9544ms 66.8699 Ops/s 63.8407 Ops/s $\color{#35bf28}+4.74\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 2.1498ms 1.1237ms 889.9106 Ops/s 929.8765 Ops/s $\color{#d91a1a}-4.30\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.1007s 8.9784ms 111.3790 Ops/s 140.3155 Ops/s $\textbf{\color{#d91a1a}-20.62\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 17.6600ms 15.3836ms 65.0042 Ops/s 62.6849 Ops/s $\color{#35bf28}+3.70\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 2.4057ms 1.4135ms 707.4393 Ops/s 696.8788 Ops/s $\color{#35bf28}+1.52\%$

# like a non-zero through stacking.
def tuple_to_tensor(traj_idx, lengths=lengths):
    if isinstance(traj_idx, tuple):
        traj_idx = torch.arange(len(storage), device=lengths.device).view(

Contributor Author

I compared this with np.ravel_multi_index using

torch.as_tensor(np.ravel_multi_index(tuple(idx.numpy() for idx in unravelled), shape))

Runtimes are roughly equivalent, with a slight advantage for the numpy version.
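
For readers unfamiliar with `np.ravel_multi_index`, here is a minimal sketch of what it computes in the comparison above. The shape and index values are made up for illustration and are not taken from the PR.

```python
import numpy as np

# Hypothetical 2-D storage shape and unravelled indices (one array per dimension).
shape = (3, 4)
unravelled = (np.array([0, 1, 2]), np.array([3, 0, 2]))

# Row-major flattening: flat = row * shape[1] + col.
flat = np.ravel_multi_index(unravelled, shape)

# np.unravel_index is the inverse mapping back to per-dimension indices.
rows, cols = np.unravel_index(flat, shape)
```

In the comparison quoted above, the resulting array is additionally wrapped in `torch.as_tensor` so that it can be used alongside torch indices.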

Contributor Author

This is way slower, about 2.5x the numpy solution

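import numpy as np

def ravel_multi_index(x, shape):
    # Pure-Python loop: accumulate index * stride, starting from the last dimension.
    out = 0
    # Cumulative products of (1, shape[-1], ..., shape[0]) give the row-major strides.
    shape_modif = np.cumprod(list(reversed((*shape, 1))))
    for i, idx in enumerate(reversed(x)):
        out += idx * shape_modif[i]
    return out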

Contributor Author

A more vectorized version still underperforms arange and numpy

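import torch

def ravel_multi_index(x, shape):
    # Row-major strides, e.g. shape (a, b, c) -> [[b * c, c, 1]].
    shape_modif = torch.flipud(
        torch.cumprod(torch.tensor(list(reversed((*shape[1:], 1)))), 0)
    ).unsqueeze(0)
    # Stack the per-dimension indices on the last dim and contract with the strides.
    return (torch.stack(x, -1) * shape_modif).sum(-1)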
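
Both hand-written variants rely on the same row-major stride arithmetic. A dependency-free sketch of that arithmetic follows; the helper names `row_major_strides` and `ravel_index` are hypothetical and do not appear in the PR.

```python
def row_major_strides(shape):
    """Return the row-major (C-order) stride of each dimension, in elements."""
    strides = []
    running = 1
    # Walk the shape from the last dimension: its stride is 1, and each
    # earlier dimension's stride is the product of all later sizes.
    for size in reversed(shape):
        strides.append(running)
        running *= size
    return tuple(reversed(strides))


def ravel_index(index, shape):
    """Flatten one multi-dimensional index into a single integer offset."""
    return sum(i * s for i, s in zip(index, row_major_strides(shape)))
```

For a shape `(2, 3, 4)` the strides are `(12, 4, 1)`, so the index `(1, 2, 3)` maps to `1 * 12 + 2 * 4 + 3 * 1`.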

@vmoens vmoens merged commit 07eb02d into main Mar 7, 2024
61 of 67 checks passed
@vmoens vmoens deleted the fix-ndim-samples branch March 7, 2024 12:01
vmoens added a commit that referenced this pull request Mar 25, 2024
Labels
bug Something isn't working CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
2 participants