[Performance] Better names handling in LazyStackTD #482

Merged: 3 commits, Jul 9, 2023

vmoens (Contributor) commented Jul 8, 2023

No description provided.

facebook-github-bot added the CLA Signed label Jul 8, 2023

github-actions bot commented Jul 9, 2023

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 109. Improved: $\large\color{#35bf28}36$. Worsened: $\large\color{#d91a1a}1$.

Detailed results

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 36.5000μs | 19.7140μs | 50.7254 KOps/s | 50.3945 KOps/s | $\color{#35bf28}+0.66\%$ |
| test_plain_set_stack_nested | 0.2243ms | 0.1808ms | 5.5304 KOps/s | 5.5104 KOps/s | $\color{#35bf28}+0.36\%$ |
| test_plain_set_nested_inplace | 88.8010μs | 23.1873μs | 43.1271 KOps/s | 42.3931 KOps/s | $\color{#35bf28}+1.73\%$ |
| test_plain_set_stack_nested_inplace | 0.2481ms | 0.2138ms | 4.6773 KOps/s | 4.5585 KOps/s | $\color{#35bf28}+2.61\%$ |
| test_items | 43.1010μs | 3.5762μs | 279.6268 KOps/s | 305.0550 KOps/s | $\textbf{\color{#d91a1a}-8.34\%}$ |
| test_items_nested | 0.3706ms | 0.3465ms | 2.8858 KOps/s | 2.6327 KOps/s | $\textbf{\color{#35bf28}+9.61\%}$ |
| test_items_nested_locked | 2.0413ms | 0.3611ms | 2.7696 KOps/s | 2.8296 KOps/s | $\color{#d91a1a}-2.12\%$ |
| test_items_nested_leaf | 0.2380ms | 0.2107ms | 4.7450 KOps/s | 4.6471 KOps/s | $\color{#35bf28}+2.11\%$ |
| test_items_stack_nested | 1.9749ms | 1.9106ms | 523.3876 Ops/s | 341.6974 Ops/s | $\textbf{\color{#35bf28}+53.17\%}$ |
| test_items_stack_nested_leaf | 1.8096ms | 1.7346ms | 576.5040 Ops/s | 361.6908 Ops/s | $\textbf{\color{#35bf28}+59.39\%}$ |
| test_items_stack_nested_locked | 1.0454ms | 0.9373ms | 1.0669 KOps/s | 810.3591 Ops/s | $\textbf{\color{#35bf28}+31.65\%}$ |
| test_keys | 24.5000μs | 4.8431μs | 206.4808 KOps/s | 208.8662 KOps/s | $\color{#d91a1a}-1.14\%$ |
| test_keys_nested | 1.9769ms | 0.1717ms | 5.8242 KOps/s | 5.6567 KOps/s | $\color{#35bf28}+2.96\%$ |
| test_keys_nested_locked | 0.1942ms | 0.1701ms | 5.8802 KOps/s | 5.7529 KOps/s | $\color{#35bf28}+2.21\%$ |
| test_keys_nested_leaf | 0.2951ms | 0.1641ms | 6.0942 KOps/s | 5.5703 KOps/s | $\textbf{\color{#35bf28}+9.41\%}$ |
| test_keys_stack_nested | 1.8871ms | 1.6961ms | 589.6019 Ops/s | 364.8520 Ops/s | $\textbf{\color{#35bf28}+61.60\%}$ |
| test_keys_stack_nested_leaf | 2.0113ms | 1.7018ms | 587.6292 Ops/s | 365.6810 Ops/s | $\textbf{\color{#35bf28}+60.69\%}$ |
| test_keys_stack_nested_locked | 0.8567ms | 0.7249ms | 1.3794 KOps/s | 981.6144 Ops/s | $\textbf{\color{#35bf28}+40.53\%}$ |
| test_values | 8.0000μs | 1.2228μs | 817.7731 KOps/s | 701.4518 KOps/s | $\textbf{\color{#35bf28}+16.58\%}$ |
| test_values_nested | 93.4010μs | 64.0652μs | 15.6091 KOps/s | 15.1843 KOps/s | $\color{#35bf28}+2.80\%$ |
| test_values_nested_locked | 95.2020μs | 63.8259μs | 15.6676 KOps/s | 15.1973 KOps/s | $\color{#35bf28}+3.09\%$ |
| test_values_nested_leaf | 0.1231ms | 56.4928μs | 17.7014 KOps/s | 17.2581 KOps/s | $\color{#35bf28}+2.57\%$ |
| test_values_stack_nested | 1.5877ms | 1.5276ms | 654.6207 Ops/s | 392.4694 Ops/s | $\textbf{\color{#35bf28}+66.80\%}$ |
| test_values_stack_nested_leaf | 1.6399ms | 1.5217ms | 657.1713 Ops/s | 393.7564 Ops/s | $\textbf{\color{#35bf28}+66.90\%}$ |
| test_values_stack_nested_locked | 0.8418ms | 0.6215ms | 1.6090 KOps/s | 1.1052 KOps/s | $\textbf{\color{#35bf28}+45.58\%}$ |
| test_membership | 19.6000μs | 1.7871μs | 559.5607 KOps/s | 532.8338 KOps/s | $\textbf{\color{#35bf28}+5.02\%}$ |
| test_membership_nested | 26.6000μs | 3.5420μs | 282.3254 KOps/s | 271.1346 KOps/s | $\color{#35bf28}+4.13\%$ |
| test_membership_nested_leaf | 26.9000μs | 3.5024μs | 285.5180 KOps/s | 271.2350 KOps/s | $\textbf{\color{#35bf28}+5.27\%}$ |
| test_membership_stacked_nested | 32.5000μs | 13.5651μs | 73.7188 KOps/s | 70.2036 KOps/s | $\textbf{\color{#35bf28}+5.01\%}$ |
| test_membership_stacked_nested_leaf | 64.0000μs | 13.5508μs | 73.7963 KOps/s | 70.0473 KOps/s | $\textbf{\color{#35bf28}+5.35\%}$ |
| test_membership_nested_last | 31.5010μs | 7.2711μs | 137.5307 KOps/s | 131.9665 KOps/s | $\color{#35bf28}+4.22\%$ |
| test_membership_nested_leaf_last | 34.0000μs | 7.2657μs | 137.6323 KOps/s | 130.8132 KOps/s | $\textbf{\color{#35bf28}+5.21\%}$ |
| test_membership_stacked_nested_last | 0.2452ms | 0.2159ms | 4.6325 KOps/s | 4.4438 KOps/s | $\color{#35bf28}+4.25\%$ |
| test_membership_stacked_nested_leaf_last | 38.5010μs | 16.0368μs | 62.3566 KOps/s | 59.0942 KOps/s | $\textbf{\color{#35bf28}+5.52\%}$ |
| test_nested_getleaf | 73.1010μs | 15.1142μs | 66.1629 KOps/s | 65.0607 KOps/s | $\color{#35bf28}+1.69\%$ |
| test_nested_get | 0.1943ms | 14.4759μs | 69.0806 KOps/s | 68.9992 KOps/s | $\color{#35bf28}+0.12\%$ |
| test_stacked_getleaf | 0.9605ms | 0.8451ms | 1.1833 KOps/s | 707.1999 Ops/s | $\textbf{\color{#35bf28}+67.32\%}$ |
| test_stacked_get | 0.8900ms | 0.8101ms | 1.2345 KOps/s | 739.4170 Ops/s | $\textbf{\color{#35bf28}+66.95\%}$ |
| test_nested_getitemleaf | 48.3000μs | 15.3712μs | 65.0568 KOps/s | 64.1006 KOps/s | $\color{#35bf28}+1.49\%$ |
| test_nested_getitem | 43.8000μs | 14.5888μs | 68.5460 KOps/s | 67.4643 KOps/s | $\color{#35bf28}+1.60\%$ |
| test_stacked_getitemleaf | 1.1439ms | 0.8508ms | 1.1754 KOps/s | 700.5907 Ops/s | $\textbf{\color{#35bf28}+67.77\%}$ |
| test_stacked_getitem | 0.9823ms | 0.8299ms | 1.2050 KOps/s | 734.4859 Ops/s | $\textbf{\color{#35bf28}+64.05\%}$ |
| test_lock_nested | 74.1324ms | 1.4760ms | 677.5128 Ops/s | 707.2141 Ops/s | $\color{#d91a1a}-4.20\%$ |
| test_lock_stack_nested | 87.2093ms | 16.2647ms | 61.4828 Ops/s | 61.3237 Ops/s | $\color{#35bf28}+0.26\%$ |
| test_unlock_nested | 69.1038ms | 1.4795ms | 675.8865 Ops/s | 656.5249 Ops/s | $\color{#35bf28}+2.95\%$ |
| test_unlock_stack_nested | 0.1022s | 17.0166ms | 58.7662 Ops/s | 59.9334 Ops/s | $\color{#d91a1a}-1.95\%$ |
| test_flatten_speed | 1.1360ms | 1.0124ms | 987.7036 Ops/s | 967.1909 Ops/s | $\color{#35bf28}+2.12\%$ |
| test_unflatten_speed | 1.9637ms | 1.8014ms | 555.1329 Ops/s | 519.7613 Ops/s | $\textbf{\color{#35bf28}+6.81\%}$ |
| test_common_ops | 1.3760ms | 1.0909ms | 916.6547 Ops/s | 911.6756 Ops/s | $\color{#35bf28}+0.55\%$ |
| test_creation | 37.9010μs | 6.1784μs | 161.8553 KOps/s | 162.7801 KOps/s | $\color{#d91a1a}-0.57\%$ |
| test_creation_empty | 31.5010μs | 13.9186μs | 71.8461 KOps/s | 70.1621 KOps/s | $\color{#35bf28}+2.40\%$ |
| test_creation_nested_1 | 62.4010μs | 24.8109μs | 40.3048 KOps/s | 39.0455 KOps/s | $\color{#35bf28}+3.23\%$ |
| test_creation_nested_2 | 68.8020μs | 27.5208μs | 36.3362 KOps/s | 36.4954 KOps/s | $\color{#d91a1a}-0.44\%$ |
| test_clone | 0.1992ms | 23.8160μs | 41.9885 KOps/s | 40.4078 KOps/s | $\color{#35bf28}+3.91\%$ |
| test_getitem[int] | 0.1047ms | 30.1904μs | 33.1231 KOps/s | 33.0110 KOps/s | $\color{#35bf28}+0.34\%$ |
| test_getitem[slice_int] | 98.3010μs | 64.0166μs | 15.6209 KOps/s | 15.5474 KOps/s | $\color{#35bf28}+0.47\%$ |
| test_getitem[range] | 0.1261ms | 67.7193μs | 14.7668 KOps/s | 15.0698 KOps/s | $\color{#d91a1a}-2.01\%$ |
| test_getitem[tuple] | 0.1653ms | 59.0979μs | 16.9211 KOps/s | 16.7552 KOps/s | $\color{#35bf28}+0.99\%$ |
| test_getitem[list] | 96.3020μs | 58.2089μs | 17.1795 KOps/s | 16.9407 KOps/s | $\color{#35bf28}+1.41\%$ |
| test_setitem_dim[int] | 61.1000μs | 32.4346μs | 30.8312 KOps/s | 30.1666 KOps/s | $\color{#35bf28}+2.20\%$ |
| test_setitem_dim[slice_int] | 0.1008ms | 66.6406μs | 15.0059 KOps/s | 14.6818 KOps/s | $\color{#35bf28}+2.21\%$ |
| test_setitem_dim[range] | 0.1091ms | 63.9477μs | 15.6378 KOps/s | 15.4247 KOps/s | $\color{#35bf28}+1.38\%$ |
| test_setitem_dim[tuple] | 98.8020μs | 58.9954μs | 16.9505 KOps/s | 16.5688 KOps/s | $\color{#35bf28}+2.30\%$ |
| test_setitem | 0.2225ms | 31.6545μs | 31.5911 KOps/s | 30.0681 KOps/s | $\textbf{\color{#35bf28}+5.07\%}$ |
| test_set | 0.1979ms | 30.2475μs | 33.0605 KOps/s | 31.4474 KOps/s | $\textbf{\color{#35bf28}+5.13\%}$ |
| test_set_shared | 0.4165ms | 0.1767ms | 5.6604 KOps/s | 5.6229 KOps/s | $\color{#35bf28}+0.67\%$ |
| test_update | 0.2259ms | 34.2443μs | 29.2020 KOps/s | 28.0682 KOps/s | $\color{#35bf28}+4.04\%$ |
| test_update_nested | 0.2343ms | 50.6550μs | 19.7414 KOps/s | 18.7928 KOps/s | $\textbf{\color{#35bf28}+5.05\%}$ |
| test_set_nested | 0.1784ms | 33.5755μs | 29.7836 KOps/s | 28.2965 KOps/s | $\textbf{\color{#35bf28}+5.26\%}$ |
| test_set_nested_new | 0.2318ms | 51.7611μs | 19.3195 KOps/s | 18.7101 KOps/s | $\color{#35bf28}+3.26\%$ |
| test_select | 2.3049ms | 94.9993μs | 10.5264 KOps/s | 10.2833 KOps/s | $\color{#35bf28}+2.36\%$ |
| test_unbind_speed | 0.7282ms | 0.6274ms | 1.5938 KOps/s | 1.5888 KOps/s | $\color{#35bf28}+0.32\%$ |
| test_unbind_speed_stack0 | 3.4515ms | 3.0928ms | 323.3364 Ops/s | 255.1125 Ops/s | $\textbf{\color{#35bf28}+26.74\%}$ |
| test_unbind_speed_stack1 | 3.4151μs | 0.4310μs | 2.3202 MOps/s | 2.1105 MOps/s | $\textbf{\color{#35bf28}+9.93\%}$ |
| test_creation[device0] | 0.5458ms | 0.4384ms | 2.2808 KOps/s | 2.2822 KOps/s | $\color{#d91a1a}-0.06\%$ |
| test_creation_from_tensor | 0.5951ms | 0.4908ms | 2.0373 KOps/s | 2.0316 KOps/s | $\color{#35bf28}+0.28\%$ |
| test_add_one[memmap_tensor0] | 1.3587ms | 32.2044μs | 31.0516 KOps/s | 30.2074 KOps/s | $\color{#35bf28}+2.79\%$ |
| test_contiguous[memmap_tensor0] | 0.1074ms | 8.7092μs | 114.8205 KOps/s | 109.2337 KOps/s | $\textbf{\color{#35bf28}+5.11\%}$ |
| test_stack[memmap_tensor0] | 67.1010μs | 26.2219μs | 38.1361 KOps/s | 37.3146 KOps/s | $\color{#35bf28}+2.20\%$ |
| test_memmaptd_index | 0.3304ms | 0.2749ms | 3.6378 KOps/s | 3.5056 KOps/s | $\color{#35bf28}+3.77\%$ |
| test_memmaptd_index_astensor | 1.3043ms | 1.1793ms | 847.9714 Ops/s | 819.5922 Ops/s | $\color{#35bf28}+3.46\%$ |
| test_memmaptd_index_op | 2.5272ms | 2.3841ms | 419.4451 Ops/s | 410.5599 Ops/s | $\color{#35bf28}+2.16\%$ |
| test_reshape_pytree | 0.1056ms | 36.2998μs | 27.5483 KOps/s | 26.9232 KOps/s | $\color{#35bf28}+2.32\%$ |
| test_reshape_td | 83.4010μs | 43.8861μs | 22.7863 KOps/s | 22.4789 KOps/s | $\color{#35bf28}+1.37\%$ |
| test_view_pytree | 0.1380ms | 33.7766μs | 29.6063 KOps/s | 28.7544 KOps/s | $\color{#35bf28}+2.96\%$ |
| test_view_td | 30.5010μs | 8.4904μs | 117.7807 KOps/s | 115.9792 KOps/s | $\color{#35bf28}+1.55\%$ |
| test_unbind_pytree | 77.5010μs | 37.3945μs | 26.7419 KOps/s | 26.0471 KOps/s | $\color{#35bf28}+2.67\%$ |
| test_unbind_td | 0.2053ms | 93.8288μs | 10.6577 KOps/s | 10.7499 KOps/s | $\color{#d91a1a}-0.86\%$ |
| test_split_pytree | 89.3010μs | 42.9006μs | 23.3097 KOps/s | 22.1641 KOps/s | $\textbf{\color{#35bf28}+5.17\%}$ |
| test_split_td | 0.8901ms | 0.1123ms | 8.9054 KOps/s | 8.5948 KOps/s | $\color{#35bf28}+3.61\%$ |
| test_add_pytree | 0.1099ms | 45.7420μs | 21.8617 KOps/s | 21.2855 KOps/s | $\color{#35bf28}+2.71\%$ |
| test_add_td | 0.1158ms | 74.8637μs | 13.3576 KOps/s | 13.7557 KOps/s | $\color{#d91a1a}-2.89\%$ |
| test_distributed | 23.0010μs | 8.4238μs | 118.7117 KOps/s | 115.6390 KOps/s | $\color{#35bf28}+2.66\%$ |
| test_tdmodule | 0.2057ms | 28.2387μs | 35.4124 KOps/s | 35.1451 KOps/s | $\color{#35bf28}+0.76\%$ |
| test_tdmodule_dispatch | 0.3021ms | 54.7005μs | 18.2814 KOps/s | 8.0924 KOps/s | $\textbf{\color{#35bf28}+125.91\%}$ |
| test_tdseq | 0.6030ms | 33.1476μs | 30.1681 KOps/s | 29.5710 KOps/s | $\color{#35bf28}+2.02\%$ |
| test_tdseq_dispatch | 0.2172ms | 66.8924μs | 14.9494 KOps/s | 14.6476 KOps/s | $\color{#35bf28}+2.06\%$ |
| test_instantiation_functorch | 2.0620ms | 1.5680ms | 637.7673 Ops/s | 615.6777 Ops/s | $\color{#35bf28}+3.59\%$ |
| test_instantiation_td | 2.0919ms | 1.3133ms | 761.4605 Ops/s | 733.6300 Ops/s | $\color{#35bf28}+3.79\%$ |
| test_exec_functorch | 0.2425ms | 0.1811ms | 5.5209 KOps/s | 5.3457 KOps/s | $\color{#35bf28}+3.28\%$ |
| test_exec_td | 0.2710ms | 0.1728ms | 5.7882 KOps/s | 5.6527 KOps/s | $\color{#35bf28}+2.40\%$ |
| test_vmap_mlp_speed[True-True] | 1.3172ms | 1.1605ms | 861.6862 Ops/s | 610.4136 Ops/s | $\textbf{\color{#35bf28}+41.16\%}$ |
| test_vmap_mlp_speed[True-False] | 1.0522ms | 0.5942ms | 1.6829 KOps/s | 1.6981 KOps/s | $\color{#d91a1a}-0.90\%$ |
| test_vmap_mlp_speed[False-True] | 1.9686ms | 0.9814ms | 1.0189 KOps/s | 728.5250 Ops/s | $\textbf{\color{#35bf28}+39.86\%}$ |
| test_vmap_mlp_speed[False-False] | 11.1605ms | 0.4492ms | 2.2259 KOps/s | 2.2865 KOps/s | $\color{#d91a1a}-2.65\%$ |
| test_vmap_transformer_speed[True-True] | 14.7679ms | 13.8752ms | 72.0710 Ops/s | 53.2357 Ops/s | $\textbf{\color{#35bf28}+35.38\%}$ |
| test_vmap_transformer_speed[True-False] | 9.4918ms | 8.5114ms | 117.4892 Ops/s | 120.2111 Ops/s | $\color{#d91a1a}-2.26\%$ |
| test_vmap_transformer_speed[False-True] | 13.2990ms | 12.4904ms | 80.0612 Ops/s | 54.4092 Ops/s | $\textbf{\color{#35bf28}+47.15\%}$ |
| test_vmap_transformer_speed[False-False] | 8.9025ms | 7.9833ms | 125.2615 Ops/s | 121.7401 Ops/s | $\color{#35bf28}+2.89\%$ |
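
For readers who want to check the Change column, here is a minimal Python sketch of how it appears to be derived: the relative difference between the Ops value and the Ops on Repo HEAD value, after converting both to the same unit. The helper names (`parse_ops`, `relative_change`, `classify`) are hypothetical and are not part of the benchmark bot. The 5% cut-off is an assumption inferred from which rows are bold in the results (for example, +5.01% is bold while +4.25% is not); it is not a documented rule.

```python
# Multipliers for the throughput units that appear in the results.
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3, "MOps/s": 1e6}


def parse_ops(text: str) -> float:
    """Convert a cell such as '1.0669 KOps/s' to operations per second."""
    value, unit = text.split()
    return float(value) * UNIT_SCALE[unit]


def relative_change(ops: str, ops_head: str) -> float:
    """Percent change of the PR throughput relative to repo HEAD."""
    new, old = parse_ops(ops), parse_ops(ops_head)
    return (new - old) / old * 100.0


def classify(change: float, threshold: float = 5.0) -> str:
    """Label a change. The 5% threshold is an assumption, not the bot's documented rule."""
    if change >= threshold:
        return "improved"
    if change <= -threshold:
        return "worsened"
    return "within noise"


# A few rows copied from the results: (Ops, Ops on Repo HEAD).
rows = {
    "test_items": ("279.6268 KOps/s", "305.0550 KOps/s"),
    "test_items_stack_nested": ("523.3876 Ops/s", "341.6974 Ops/s"),
    "test_items_stack_nested_locked": ("1.0669 KOps/s", "810.3591 Ops/s"),
}

for name, (ops, head) in rows.items():
    change = relative_change(ops, head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Because the reported throughputs are rounded to four decimals, a recomputed percentage can differ from the bot's figure in the last digit, most visibly when the two columns use different units.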

vmoens (Contributor, Author) commented Jul 9, 2023

Merging, as it solves a bug that was formerly hidden in RL.

vmoens merged commit a73040f into main Jul 9, 2023
vmoens added a commit to pytorch/rl that referenced this pull request Jul 9, 2023
vmoens deleted the better_names_for_lazystack branch October 21, 2024 14:03
Labels: CLA Signed, Performance