[Refactor] Faster and more generic multi-agent nets #1921

vmoens · 2024-02-16T21:01:24Z

cc @matteobettini @kfu02

TODO:

Account for non-initialized params
Have TD.from_modules work with lazy params (the issue here being that the list of parameters will change between before and after the first call is made).

pytorch-bot · 2024-02-16T21:01:28Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/1921

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (14 Unrelated Failures)

As of commit 030d0dd with merge base 799f939:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

Examples Tests on Linux / tests (3.9, 12.1) / linux-job (gh)
RuntimeError: Command docker exec -t 08751a687847001aabc1da3a15ac4cea359dce24267f7a0ca16a31a35be21fa3 /exec failed with exit code 1
Habitat Tests on Linux / tests (3.9, 11.6) / linux-job (gh)
RuntimeError: Command docker exec -t 153b3ee3687b9e7eb70eb40301359d7d7f14e2c1d2a1e8829608e62df7bdc93f /exec failed with exit code 139
Unit-tests on Linux / tests-cpu (3.10) / linux-job (gh)
test/test_rb.py::TestStorages::test_storage_dumps_loads[True-pytree-LazyTensorStorage-device_data0]
Unit-tests on Linux / tests-cpu (3.11) / linux-job (gh)
test/test_rb.py::TestStorages::test_storage_dumps_loads[True-pytree-LazyTensorStorage-device_data0]
Unit-tests on Windows / unittests-cpu / windows-job (gh)
The process 'C:\Program Files\Git\cmd\git.exe' failed with exit code 128
Unit-tests on Windows / unittests-gpu / windows-job (gh)
##[error]The operation was canceled.

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

Unit-tests on Linux / tests-cpu (3.8) / linux-job (gh)
test/test_env.py::TestModelBasedEnvBase::test_mb_env_batch_lock[device0]
Unit-tests on Linux / tests-cpu (3.9) / linux-job (gh)
test/test_env.py::TestModelBasedEnvBase::test_mb_env_batch_lock[device0]
Unit-tests on Linux / tests-gpu (3.8, 12.1) / linux-job (gh)
test/test_env.py::TestModelBasedEnvBase::test_mb_env_batch_lock[device0]
Unit-tests on Linux / tests-olddeps (3.8, 11.6) / linux-job (gh)
test/test_env.py::TestModelBasedEnvBase::test_mb_env_batch_lock[device0]
Unit-tests on Linux / tests-optdeps (3.9, 12.1) / linux-job (gh)
test/test_env.py::TestModelBasedEnvBase::test_mb_env_batch_lock[device0]
Unit-tests on Linux / tests-stable-gpu (3.8, 11.8) / linux-job (gh)
test/test_env.py::TestModelBasedEnvBase::test_mb_env_batch_lock[device0]
Unit-tests on MacOS CPU / tests (3.11) / macos-job (gh)
test/test_env.py::TestModelBasedEnvBase::test_mb_env_batch_lock[device0]
Unit-tests on MacOS CPU / tests (3.8) / macos-job (gh)
test/test_env.py::TestModelBasedEnvBase::test_mb_env_batch_lock[device0]

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2024-02-16T21:08:47Z

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 89. Improved: $\large\color{#35bf28}6$. Worsened: $\large\color{#d91a1a}5$.

Expand to view detailed results

Name	Max	Mean	Ops	Ops on Repo `HEAD`	Change
test_single	62.5112ms	61.8857ms	16.1588 Ops/s	15.4966 Ops/s	$\color{#35bf28}+4.27\%$
test_sync	38.1996ms	34.0074ms	29.4054 Ops/s	29.3890 Ops/s	$\color{#35bf28}+0.06\%$
test_async	45.9222ms	31.6040ms	31.6416 Ops/s	33.4017 Ops/s	$\textbf{\color{#d91a1a}-5.27\%}$
test_simple	0.4893s	0.4354s	2.2969 Ops/s	2.2675 Ops/s	$\color{#35bf28}+1.30\%$
test_transformed	0.6400s	0.5867s	1.7044 Ops/s	1.6920 Ops/s	$\color{#35bf28}+0.73\%$
test_serial	1.4752s	1.4270s	0.7008 Ops/s	0.6966 Ops/s	$\color{#35bf28}+0.60\%$
test_parallel	1.4908s	1.4378s	0.6955 Ops/s	0.7061 Ops/s	$\color{#d91a1a}-1.49\%$
test_step_mdp_speed[True-True-True-True-True]	0.1542ms	21.3359μs	46.8693 KOps/s	46.9593 KOps/s	$\color{#d91a1a}-0.19\%$
test_step_mdp_speed[True-True-True-True-False]	45.7350μs	13.0406μs	76.6838 KOps/s	77.3673 KOps/s	$\color{#d91a1a}-0.88\%$
test_step_mdp_speed[True-True-True-False-True]	42.6000μs	12.5365μs	79.7673 KOps/s	78.8723 KOps/s	$\color{#35bf28}+1.13\%$
test_step_mdp_speed[True-True-True-False-False]	41.0170μs	7.5323μs	132.7619 KOps/s	131.3094 KOps/s	$\color{#35bf28}+1.11\%$
test_step_mdp_speed[True-True-False-True-True]	45.0240μs	22.6592μs	44.1323 KOps/s	43.4680 KOps/s	$\color{#35bf28}+1.53\%$
test_step_mdp_speed[True-True-False-True-False]	48.2300μs	14.1607μs	70.6179 KOps/s	69.5814 KOps/s	$\color{#35bf28}+1.49\%$
test_step_mdp_speed[True-True-False-False-True]	38.7220μs	13.7885μs	72.5244 KOps/s	72.4843 KOps/s	$\color{#35bf28}+0.06\%$
test_step_mdp_speed[True-True-False-False-False]	48.2400μs	8.7880μs	113.7916 KOps/s	112.9430 KOps/s	$\color{#35bf28}+0.75\%$
test_step_mdp_speed[True-False-True-True-True]	49.3320μs	24.3310μs	41.0998 KOps/s	41.2828 KOps/s	$\color{#d91a1a}-0.44\%$
test_step_mdp_speed[True-False-True-True-False]	52.5380μs	15.6520μs	63.8896 KOps/s	63.1895 KOps/s	$\color{#35bf28}+1.11\%$
test_step_mdp_speed[True-False-True-False-True]	55.1830μs	13.6556μs	73.2302 KOps/s	71.5723 KOps/s	$\color{#35bf28}+2.32\%$
test_step_mdp_speed[True-False-True-False-False]	46.0360μs	8.7747μs	113.9646 KOps/s	111.0286 KOps/s	$\color{#35bf28}+2.64\%$
test_step_mdp_speed[True-False-False-True-True]	61.3340μs	25.3292μs	39.4802 KOps/s	39.6050 KOps/s	$\color{#d91a1a}-0.32\%$
test_step_mdp_speed[True-False-False-True-False]	56.4050μs	16.8540μs	59.3330 KOps/s	59.2923 KOps/s	$\color{#35bf28}+0.07\%$
test_step_mdp_speed[True-False-False-False-True]	47.8290μs	14.8159μs	67.4950 KOps/s	66.3243 KOps/s	$\color{#35bf28}+1.77\%$
test_step_mdp_speed[True-False-False-False-False]	34.1330μs	10.0487μs	99.5153 KOps/s	98.8897 KOps/s	$\color{#35bf28}+0.63\%$
test_step_mdp_speed[False-True-True-True-True]	87.6340μs	24.3215μs	41.1160 KOps/s	41.4181 KOps/s	$\color{#d91a1a}-0.73\%$
test_step_mdp_speed[False-True-True-True-False]	51.2860μs	15.6128μs	64.0502 KOps/s	63.9903 KOps/s	$\color{#35bf28}+0.09\%$
test_step_mdp_speed[False-True-True-False-True]	51.9070μs	16.1272μs	62.0069 KOps/s	62.3443 KOps/s	$\color{#d91a1a}-0.54\%$
test_step_mdp_speed[False-True-True-False-False]	60.3230μs	10.0896μs	99.1118 KOps/s	100.3243 KOps/s	$\color{#d91a1a}-1.21\%$
test_step_mdp_speed[False-True-False-True-True]	40.3860μs	25.5884μs	39.0801 KOps/s	38.8752 KOps/s	$\color{#35bf28}+0.53\%$
test_step_mdp_speed[False-True-False-True-False]	45.9250μs	16.9135μs	59.1243 KOps/s	59.8775 KOps/s	$\color{#d91a1a}-1.26\%$
test_step_mdp_speed[False-True-False-False-True]	43.6020μs	17.2918μs	57.8309 KOps/s	58.1520 KOps/s	$\color{#d91a1a}-0.55\%$
test_step_mdp_speed[False-True-False-False-False]	58.1390μs	11.3058μs	88.4501 KOps/s	88.6783 KOps/s	$\color{#d91a1a}-0.26\%$
test_step_mdp_speed[False-False-True-True-True]	59.3400μs	26.7710μs	37.3538 KOps/s	37.4845 KOps/s	$\color{#d91a1a}-0.35\%$
test_step_mdp_speed[False-False-True-True-False]	48.0590μs	18.2981μs	54.6505 KOps/s	54.8705 KOps/s	$\color{#d91a1a}-0.40\%$
test_step_mdp_speed[False-False-True-False-True]	47.0180μs	17.3663μs	57.5829 KOps/s	58.8495 KOps/s	$\color{#d91a1a}-2.15\%$
test_step_mdp_speed[False-False-True-False-False]	44.2230μs	11.2854μs	88.6097 KOps/s	88.9063 KOps/s	$\color{#d91a1a}-0.33\%$
test_step_mdp_speed[False-False-False-True-True]	85.0180μs	27.7796μs	35.9977 KOps/s	36.3071 KOps/s	$\color{#d91a1a}-0.85\%$
test_step_mdp_speed[False-False-False-True-False]	54.6720μs	19.2423μs	51.9690 KOps/s	52.5437 KOps/s	$\color{#d91a1a}-1.09\%$
test_step_mdp_speed[False-False-False-False-True]	41.7570μs	18.0820μs	55.3036 KOps/s	55.1335 KOps/s	$\color{#35bf28}+0.31\%$
test_step_mdp_speed[False-False-False-False-False]	56.5660μs	12.3468μs	80.9929 KOps/s	80.7951 KOps/s	$\color{#35bf28}+0.24\%$
test_values[generalized_advantage_estimate-True-True]	9.4978ms	9.2944ms	107.5916 Ops/s	106.0318 Ops/s	$\color{#35bf28}+1.47\%$
test_values[vec_generalized_advantage_estimate-True-True]	37.3751ms	35.4342ms	28.2213 Ops/s	28.4637 Ops/s	$\color{#d91a1a}-0.85\%$
test_values[td0_return_estimate-False-False]	0.2141ms	0.1817ms	5.5036 KOps/s	5.9081 KOps/s	$\textbf{\color{#d91a1a}-6.85\%}$
test_values[td1_return_estimate-False-False]	25.9423ms	23.2916ms	42.9339 Ops/s	42.5047 Ops/s	$\color{#35bf28}+1.01\%$
test_values[vec_td1_return_estimate-False-False]	39.5866ms	35.6245ms	28.0706 Ops/s	26.6271 Ops/s	$\textbf{\color{#35bf28}+5.42\%}$
test_values[td_lambda_return_estimate-True-False]	36.0859ms	33.5557ms	29.8012 Ops/s	29.0833 Ops/s	$\color{#35bf28}+2.47\%$
test_values[vec_td_lambda_return_estimate-True-False]	39.9226ms	35.4771ms	28.1872 Ops/s	27.9420 Ops/s	$\color{#35bf28}+0.88\%$
test_gae_speed[generalized_advantage_estimate-False-1-512]	10.2543ms	8.1220ms	123.1224 Ops/s	121.1597 Ops/s	$\color{#35bf28}+1.62\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512]	2.6590ms	1.9837ms	504.1169 Ops/s	492.9592 Ops/s	$\color{#35bf28}+2.26\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512]	0.5679ms	0.3477ms	2.8758 KOps/s	2.8004 KOps/s	$\color{#35bf28}+2.69\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512]	48.5830ms	45.0959ms	22.1750 Ops/s	20.9736 Ops/s	$\textbf{\color{#35bf28}+5.73\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512]	3.6315ms	3.0373ms	329.2428 Ops/s	326.5696 Ops/s	$\color{#35bf28}+0.82\%$
test_dqn_speed	71.3281ms	1.5295ms	653.7952 Ops/s	706.7451 Ops/s	$\textbf{\color{#d91a1a}-7.49\%}$
test_ddpg_speed	3.0161ms	2.8495ms	350.9411 Ops/s	355.4388 Ops/s	$\color{#d91a1a}-1.27\%$
test_sac_speed	10.0665ms	8.4885ms	117.8069 Ops/s	118.3113 Ops/s	$\color{#d91a1a}-0.43\%$
test_redq_speed	15.1853ms	13.5094ms	74.0225 Ops/s	74.7511 Ops/s	$\color{#d91a1a}-0.97\%$
test_redq_deprec_speed	14.7894ms	13.6044ms	73.5056 Ops/s	73.9770 Ops/s	$\color{#d91a1a}-0.64\%$
test_td3_speed	9.1014ms	8.4831ms	117.8813 Ops/s	116.9084 Ops/s	$\color{#35bf28}+0.83\%$
test_cql_speed	38.4061ms	36.9620ms	27.0548 Ops/s	26.7744 Ops/s	$\color{#35bf28}+1.05\%$
test_a2c_speed	7.9608ms	7.4434ms	134.3466 Ops/s	131.7123 Ops/s	$\color{#35bf28}+2.00\%$
test_ppo_speed	8.7415ms	7.7662ms	128.7624 Ops/s	123.1680 Ops/s	$\color{#35bf28}+4.54\%$
test_reinforce_speed	7.8257ms	6.6934ms	149.4007 Ops/s	146.5871 Ops/s	$\color{#35bf28}+1.92\%$
test_iql_speed	34.7747ms	33.0926ms	30.2182 Ops/s	27.5289 Ops/s	$\textbf{\color{#35bf28}+9.77\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000]	3.1669ms	2.9088ms	343.7802 Ops/s	337.8755 Ops/s	$\color{#35bf28}+1.75\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000]	0.8041ms	0.5168ms	1.9352 KOps/s	1.9013 KOps/s	$\color{#35bf28}+1.78\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000]	0.7203ms	0.4920ms	2.0325 KOps/s	1.7640 KOps/s	$\textbf{\color{#35bf28}+15.22\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000]	4.3366ms	3.0188ms	331.2587 Ops/s	329.2647 Ops/s	$\color{#35bf28}+0.61\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000]	0.8343ms	0.5087ms	1.9658 KOps/s	1.9613 KOps/s	$\color{#35bf28}+0.23\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000]	0.8723ms	0.4944ms	2.0227 KOps/s	2.0945 KOps/s	$\color{#d91a1a}-3.43\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000]	4.3268ms	3.0176ms	331.3874 Ops/s	334.4820 Ops/s	$\color{#d91a1a}-0.93\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000]	1.1657ms	0.6424ms	1.5566 KOps/s	1.5441 KOps/s	$\color{#35bf28}+0.81\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000]	0.9610ms	0.6132ms	1.6307 KOps/s	1.6413 KOps/s	$\color{#d91a1a}-0.65\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000]	4.5387ms	2.9156ms	342.9884 Ops/s	355.9943 Ops/s	$\color{#d91a1a}-3.65\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000]	0.6352ms	0.5181ms	1.9301 KOps/s	1.9613 KOps/s	$\color{#d91a1a}-1.59\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000]	0.8116ms	0.4942ms	2.0236 KOps/s	2.0610 KOps/s	$\color{#d91a1a}-1.82\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000]	4.4452ms	3.0216ms	330.9462 Ops/s	347.6243 Ops/s	$\color{#d91a1a}-4.80\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000]	0.6180ms	0.5168ms	1.9350 KOps/s	1.9655 KOps/s	$\color{#d91a1a}-1.55\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000]	0.8366ms	0.4940ms	2.0244 KOps/s	2.0849 KOps/s	$\color{#d91a1a}-2.90\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000]	3.5454ms	3.1096ms	321.5820 Ops/s	332.9705 Ops/s	$\color{#d91a1a}-3.42\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000]	0.9910ms	0.6424ms	1.5567 KOps/s	1.5637 KOps/s	$\color{#d91a1a}-0.45\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000]	0.9332ms	0.6118ms	1.6345 KOps/s	1.6476 KOps/s	$\color{#d91a1a}-0.79\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400]	0.1035s	7.8996ms	126.5884 Ops/s	106.6373 Ops/s	$\textbf{\color{#35bf28}+18.71\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400]	15.8632ms	13.5575ms	73.7600 Ops/s	75.0293 Ops/s	$\color{#d91a1a}-1.69\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400]	5.0162ms	2.5494ms	392.2481 Ops/s	389.7161 Ops/s	$\color{#35bf28}+0.65\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400]	98.8650ms	9.6645ms	103.4716 Ops/s	135.7539 Ops/s	$\textbf{\color{#d91a1a}-23.78\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400]	16.2543ms	13.4681ms	74.2493 Ops/s	74.6999 Ops/s	$\color{#d91a1a}-0.60\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400]	5.3868ms	2.5618ms	390.3453 Ops/s	229.4036 Ops/s	$\textbf{\color{#35bf28}+70.16\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400]	96.9531ms	9.6783ms	103.3245 Ops/s	126.3552 Ops/s	$\textbf{\color{#d91a1a}-18.23\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400]	16.0304ms	13.6296ms	73.3695 Ops/s	72.4780 Ops/s	$\color{#35bf28}+1.23\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400]	5.3510ms	2.8281ms	353.5933 Ops/s	354.3193 Ops/s	$\color{#d91a1a}-0.20\%$

github-actions · 2024-02-16T21:14:58Z

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 92. Improved: $\large\color{#35bf28}5$. Worsened: $\large\color{#d91a1a}9$.

Expand to view detailed results

Name	Max	Mean	Ops	Ops on Repo `HEAD`	Change
test_single	0.1184s	0.1167s	8.5685 Ops/s	8.2938 Ops/s	$\color{#35bf28}+3.31\%$
test_sync	95.6823ms	95.5085ms	10.4703 Ops/s	10.4926 Ops/s	$\color{#d91a1a}-0.21\%$
test_async	0.1810s	91.6261ms	10.9139 Ops/s	10.9060 Ops/s	$\color{#35bf28}+0.07\%$
test_single_pixels	0.2084s	0.1396s	7.1625 Ops/s	7.3286 Ops/s	$\color{#d91a1a}-2.27\%$
test_sync_pixels	83.4398ms	81.6323ms	12.2501 Ops/s	12.1391 Ops/s	$\color{#35bf28}+0.91\%$
test_async_pixels	0.1537s	76.3523ms	13.0972 Ops/s	15.5890 Ops/s	$\textbf{\color{#d91a1a}-15.98\%}$
test_simple	0.8338s	0.8331s	1.2003 Ops/s	1.1937 Ops/s	$\color{#35bf28}+0.56\%$
test_transformed	1.0684s	1.0669s	0.9373 Ops/s	0.9441 Ops/s	$\color{#d91a1a}-0.72\%$
test_serial	2.5607s	2.5143s	0.3977 Ops/s	0.4138 Ops/s	$\color{#d91a1a}-3.89\%$
test_parallel	2.1641s	2.1130s	0.4733 Ops/s	0.4925 Ops/s	$\color{#d91a1a}-3.91\%$
test_step_mdp_speed[True-True-True-True-True]	0.1038ms	33.0820μs	30.2280 KOps/s	30.3900 KOps/s	$\color{#d91a1a}-0.53\%$
test_step_mdp_speed[True-True-True-True-False]	37.5710μs	20.3983μs	49.0238 KOps/s	51.3512 KOps/s	$\color{#d91a1a}-4.53\%$
test_step_mdp_speed[True-True-True-False-True]	40.8600μs	19.0811μs	52.4080 KOps/s	53.6387 KOps/s	$\color{#d91a1a}-2.29\%$
test_step_mdp_speed[True-True-True-False-False]	25.9100μs	11.0821μs	90.2354 KOps/s	90.8600 KOps/s	$\color{#d91a1a}-0.69\%$
test_step_mdp_speed[True-True-False-True-True]	55.6110μs	34.4252μs	29.0485 KOps/s	28.9621 KOps/s	$\color{#35bf28}+0.30\%$
test_step_mdp_speed[True-True-False-True-False]	46.2700μs	21.3644μs	46.8069 KOps/s	46.6053 KOps/s	$\color{#35bf28}+0.43\%$
test_step_mdp_speed[True-True-False-False-True]	42.4700μs	20.8717μs	47.9117 KOps/s	48.3550 KOps/s	$\color{#d91a1a}-0.92\%$
test_step_mdp_speed[True-True-False-False-False]	30.4800μs	12.9776μs	77.0560 KOps/s	75.9859 KOps/s	$\color{#35bf28}+1.41\%$
test_step_mdp_speed[True-False-True-True-True]	54.0110μs	36.4678μs	27.4214 KOps/s	26.8388 KOps/s	$\color{#35bf28}+2.17\%$
test_step_mdp_speed[True-False-True-True-False]	45.5400μs	23.4286μs	42.6829 KOps/s	42.5291 KOps/s	$\color{#35bf28}+0.36\%$
test_step_mdp_speed[True-False-True-False-True]	66.8910μs	20.7156μs	48.2727 KOps/s	48.9665 KOps/s	$\color{#d91a1a}-1.42\%$
test_step_mdp_speed[True-False-True-False-False]	31.0700μs	13.0651μs	76.5396 KOps/s	76.4647 KOps/s	$\color{#35bf28}+0.10\%$
test_step_mdp_speed[True-False-False-True-True]	60.8310μs	38.4740μs	25.9916 KOps/s	25.7416 KOps/s	$\color{#35bf28}+0.97\%$
test_step_mdp_speed[True-False-False-True-False]	44.5010μs	25.1692μs	39.7311 KOps/s	38.7664 KOps/s	$\color{#35bf28}+2.49\%$
test_step_mdp_speed[True-False-False-False-True]	38.5910μs	22.4431μs	44.5571 KOps/s	44.7135 KOps/s	$\color{#d91a1a}-0.35\%$
test_step_mdp_speed[True-False-False-False-False]	31.0110μs	14.6817μs	68.1122 KOps/s	66.4048 KOps/s	$\color{#35bf28}+2.57\%$
test_step_mdp_speed[False-True-True-True-True]	61.0110μs	36.4476μs	27.4367 KOps/s	27.3817 KOps/s	$\color{#35bf28}+0.20\%$
test_step_mdp_speed[False-True-True-True-False]	40.0500μs	23.4517μs	42.6408 KOps/s	42.6362 KOps/s	$\color{#35bf28}+0.01\%$
test_step_mdp_speed[False-True-True-False-True]	42.2310μs	24.6669μs	40.5401 KOps/s	40.5428 KOps/s	$-0.01\%$
test_step_mdp_speed[False-True-True-False-False]	32.1000μs	14.8204μs	67.4746 KOps/s	66.4415 KOps/s	$\color{#35bf28}+1.55\%$
test_step_mdp_speed[False-True-False-True-True]	62.0510μs	38.3651μs	26.0654 KOps/s	25.1009 KOps/s	$\color{#35bf28}+3.84\%$
test_step_mdp_speed[False-True-False-True-False]	47.0400μs	25.3399μs	39.4634 KOps/s	38.7635 KOps/s	$\color{#35bf28}+1.81\%$
test_step_mdp_speed[False-True-False-False-True]	54.4110μs	26.5593μs	37.6516 KOps/s	37.7093 KOps/s	$\color{#d91a1a}-0.15\%$
test_step_mdp_speed[False-True-False-False-False]	38.0100μs	16.8921μs	59.1994 KOps/s	59.4586 KOps/s	$\color{#d91a1a}-0.44\%$
test_step_mdp_speed[False-False-True-True-True]	60.3610μs	40.7070μs	24.5658 KOps/s	24.6939 KOps/s	$\color{#d91a1a}-0.52\%$
test_step_mdp_speed[False-False-True-True-False]	56.0410μs	27.4741μs	36.3979 KOps/s	36.9156 KOps/s	$\color{#d91a1a}-1.40\%$
test_step_mdp_speed[False-False-True-False-True]	46.5510μs	26.3135μs	38.0033 KOps/s	38.7284 KOps/s	$\color{#d91a1a}-1.87\%$
test_step_mdp_speed[False-False-True-False-False]	41.4400μs	16.5814μs	60.3084 KOps/s	59.8627 KOps/s	$\color{#35bf28}+0.74\%$
test_step_mdp_speed[False-False-False-True-True]	65.3610μs	42.0179μs	23.7994 KOps/s	23.6972 KOps/s	$\color{#35bf28}+0.43\%$
test_step_mdp_speed[False-False-False-True-False]	51.1100μs	28.9439μs	34.5496 KOps/s	34.3553 KOps/s	$\color{#35bf28}+0.57\%$
test_step_mdp_speed[False-False-False-False-True]	46.3610μs	27.8713μs	35.8792 KOps/s	35.8235 KOps/s	$\color{#35bf28}+0.16\%$
test_step_mdp_speed[False-False-False-False-False]	35.3610μs	18.4249μs	54.2745 KOps/s	54.4260 KOps/s	$\color{#d91a1a}-0.28\%$
test_values[generalized_advantage_estimate-True-True]	27.6203ms	26.7298ms	37.4115 Ops/s	40.2444 Ops/s	$\textbf{\color{#d91a1a}-7.04\%}$
test_values[vec_generalized_advantage_estimate-True-True]	84.6563ms	3.2728ms	305.5507 Ops/s	308.4672 Ops/s	$\color{#d91a1a}-0.95\%$
test_values[td0_return_estimate-False-False]	99.8720μs	63.8011μs	15.6737 KOps/s	16.4883 KOps/s	$\color{#d91a1a}-4.94\%$
test_values[td1_return_estimate-False-False]	59.8015ms	59.0706ms	16.9289 Ops/s	18.9545 Ops/s	$\textbf{\color{#d91a1a}-10.69\%}$
test_values[vec_td1_return_estimate-False-False]	2.1419ms	1.7911ms	558.3039 Ops/s	564.5024 Ops/s	$\color{#d91a1a}-1.10\%$
test_values[td_lambda_return_estimate-True-False]	94.8850ms	94.0660ms	10.6308 Ops/s	11.8930 Ops/s	$\textbf{\color{#d91a1a}-10.61\%}$
test_values[vec_td_lambda_return_estimate-True-False]	4.0662ms	1.8285ms	546.9042 Ops/s	556.0384 Ops/s	$\color{#d91a1a}-1.64\%$
test_gae_speed[generalized_advantage_estimate-False-1-512]	26.4302ms	26.2245ms	38.1322 Ops/s	42.3764 Ops/s	$\textbf{\color{#d91a1a}-10.02\%}$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512]	0.9329ms	0.7227ms	1.3837 KOps/s	1.4196 KOps/s	$\color{#d91a1a}-2.53\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512]	0.7309ms	0.6898ms	1.4496 KOps/s	1.5336 KOps/s	$\textbf{\color{#d91a1a}-5.48\%}$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512]	1.5365ms	1.4785ms	676.3477 Ops/s	686.8483 Ops/s	$\color{#d91a1a}-1.53\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512]	0.9569ms	0.6846ms	1.4607 KOps/s	1.4835 KOps/s	$\color{#d91a1a}-1.54\%$
test_dqn_speed	4.0529ms	1.4515ms	688.9229 Ops/s	681.5146 Ops/s	$\color{#35bf28}+1.09\%$
test_ddpg_speed	3.2679ms	2.8022ms	356.8585 Ops/s	356.7989 Ops/s	$\color{#35bf28}+0.02\%$
test_sac_speed	8.6458ms	8.1824ms	122.2131 Ops/s	123.1767 Ops/s	$\color{#d91a1a}-0.78\%$
test_redq_speed	10.8624ms	10.1591ms	98.4337 Ops/s	98.1957 Ops/s	$\color{#35bf28}+0.24\%$
test_redq_deprec_speed	11.8456ms	11.3007ms	88.4902 Ops/s	90.2089 Ops/s	$\color{#d91a1a}-1.91\%$
test_td3_speed	8.3240ms	8.1816ms	122.2261 Ops/s	122.3455 Ops/s	$\color{#d91a1a}-0.10\%$
test_cql_speed	25.7121ms	24.7448ms	40.4125 Ops/s	39.4782 Ops/s	$\color{#35bf28}+2.37\%$
test_a2c_speed	5.3978ms	5.1400ms	194.5530 Ops/s	182.7475 Ops/s	$\textbf{\color{#35bf28}+6.46\%}$
test_ppo_speed	5.6780ms	5.4305ms	184.1448 Ops/s	172.5937 Ops/s	$\textbf{\color{#35bf28}+6.69\%}$
test_reinforce_speed	4.4662ms	4.1536ms	240.7576 Ops/s	223.8770 Ops/s	$\textbf{\color{#35bf28}+7.54\%}$
test_iql_speed	99.9122ms	20.0872ms	49.7828 Ops/s	52.7061 Ops/s	$\textbf{\color{#d91a1a}-5.55\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000]	3.8416ms	3.7133ms	269.3011 Ops/s	273.1119 Ops/s	$\color{#d91a1a}-1.40\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000]	0.7622ms	0.5588ms	1.7897 KOps/s	1.8074 KOps/s	$\color{#d91a1a}-0.98\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000]	92.9655ms	0.6011ms	1.6636 KOps/s	1.9055 KOps/s	$\textbf{\color{#d91a1a}-12.69\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000]	3.9339ms	3.7354ms	267.7065 Ops/s	268.6227 Ops/s	$\color{#d91a1a}-0.34\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000]	0.6768ms	0.5489ms	1.8217 KOps/s	1.8369 KOps/s	$\color{#d91a1a}-0.83\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000]	0.6644ms	0.5244ms	1.9069 KOps/s	1.9271 KOps/s	$\color{#d91a1a}-1.05\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000]	3.9844ms	3.8475ms	259.9084 Ops/s	262.7046 Ops/s	$\color{#d91a1a}-1.06\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000]	0.8016ms	0.6855ms	1.4588 KOps/s	1.4813 KOps/s	$\color{#d91a1a}-1.52\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000]	0.8700ms	0.6572ms	1.5215 KOps/s	1.5497 KOps/s	$\color{#d91a1a}-1.82\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000]	3.7862ms	3.7004ms	270.2437 Ops/s	271.4084 Ops/s	$\color{#d91a1a}-0.43\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000]	0.6862ms	0.5573ms	1.7945 KOps/s	1.8113 KOps/s	$\color{#d91a1a}-0.93\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000]	0.6794ms	0.5320ms	1.8796 KOps/s	1.8992 KOps/s	$\color{#d91a1a}-1.03\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000]	3.8937ms	3.7178ms	268.9767 Ops/s	269.3558 Ops/s	$\color{#d91a1a}-0.14\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000]	0.7071ms	0.5513ms	1.8139 KOps/s	1.8318 KOps/s	$\color{#d91a1a}-0.98\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000]	0.6558ms	0.5261ms	1.9007 KOps/s	1.6318 KOps/s	$\textbf{\color{#35bf28}+16.48\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000]	3.9620ms	3.8475ms	259.9061 Ops/s	263.1344 Ops/s	$\color{#d91a1a}-1.23\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000]	0.8100ms	0.6866ms	1.4565 KOps/s	1.4733 KOps/s	$\color{#d91a1a}-1.14\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000]	0.8064ms	0.6597ms	1.5158 KOps/s	1.5328 KOps/s	$\color{#d91a1a}-1.11\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400]	0.1120s	11.3552ms	88.0651 Ops/s	89.7844 Ops/s	$\color{#d91a1a}-1.91\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400]	18.8925ms	16.5106ms	60.5670 Ops/s	62.3541 Ops/s	$\color{#d91a1a}-2.87\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400]	7.4333ms	3.1065ms	321.9091 Ops/s	333.9906 Ops/s	$\color{#d91a1a}-3.62\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400]	0.1023s	9.2850ms	107.7011 Ops/s	90.7362 Ops/s	$\textbf{\color{#35bf28}+18.70\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400]	19.3573ms	16.5439ms	60.4451 Ops/s	62.4266 Ops/s	$\color{#d91a1a}-3.17\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400]	8.1033ms	3.1456ms	317.9022 Ops/s	334.9338 Ops/s	$\textbf{\color{#d91a1a}-5.09\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400]	0.1014s	11.4684ms	87.1963 Ops/s	88.2951 Ops/s	$\color{#d91a1a}-1.24\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400]	19.4856ms	16.7447ms	59.7204 Ops/s	61.3875 Ops/s	$\color{#d91a1a}-2.72\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400]	8.8550ms	3.4436ms	290.3966 Ops/s	304.9729 Ops/s	$\color{#d91a1a}-4.78\%$

matteobettini

This is amazing. Thanks so much for this! Can't wait to see the perf gains.

Before merging, we should run a comparison between main and this PR on one of the multi-agent example scripts (e.g., mappo_ippo.py) with parameters NOT shared, to check that the reward is the same and to gather data on the performance difference.

In particular, I am interested in confirming that

    @staticmethod
    def vmap_func_module(module, *args, **kwargs):
        def exec_module(params, *input):
            with params.to_module(module):
                return module(*input)

        return torch.vmap(exec_module, *args, **kwargs)

    output = self.vmap_func_module(
        self._empty_net, (0, self.agent_dim), (-2,)
    )(self.params, inputs)

is faster than

    output = torch.stack(
        [
            net(inputs[..., i, :])
            for i, net in enumerate(self.agent_networks)
        ],
        dim=-2,
    )

when the number of agents is low.
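The two forward paths being compared can be reproduced in isolation. The sketch below is a stand-in under stated assumptions, not the PR's code: it uses `torch.func.stack_module_state` and `torch.func.functional_call` in place of `TensorDict.from_modules` and `params.to_module`, and the layer sizes and names (`call_single`, `looped`, `vmapped`) are made up for illustration. It only checks that both paths produce the same output; it says nothing about speed.

```python
import copy

import torch
from torch import nn
from torch.func import functional_call, stack_module_state

torch.manual_seed(0)
n_agents, batch, in_dim, out_dim = 3, 5, 4, 2

# One independent network per agent (no parameter sharing).
agent_networks = [
    nn.Sequential(nn.Linear(in_dim, 8), nn.Tanh(), nn.Linear(8, out_dim))
    for _ in range(n_agents)
]
# Agent dimension is 1 here, which is also -2 for this 3-D input.
inputs = torch.randn(batch, n_agents, in_dim)

# Path A: Python loop over the agents, then stack along the agent dim.
looped = torch.stack(
    [net(inputs[..., i, :]) for i, net in enumerate(agent_networks)],
    dim=-2,
)

# Path B: stack every parameter along a new leading dim and vmap over it.
params, buffers = stack_module_state(agent_networks)
# A parameter-free copy on the "meta" device only provides the structure.
template = copy.deepcopy(agent_networks[0]).to("meta")


def call_single(p, b, x):
    # Run the template with one agent's slice of the stacked parameters.
    return functional_call(template, (p, b), (x,))


vmapped = torch.vmap(call_single, in_dims=(0, 0, 1), out_dims=1)(
    params, buffers, inputs
)

max_abs_diff = (looped - vmapped).abs().max().item()
```

Timing the two paths (forward, backward and optimizer step) on a real training script is still needed to answer the performance question.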

torchrl/modules/models/multiagent.py

vmoens · 2024-02-17T20:49:07Z

It is faster; we ran multiple benchmarks on this.
It is faster even with few agents once you account for the backward pass (which is much slower when you build multiple graphs) and for optimizers populated with many more params (since you'll be calling step and zero_grad on many more tensors, with ops executed in Python loops).

https://gist.github.com/vmoens/4b6037896a6a0ad347e91877ade354ae
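To make the optimizer argument concrete, here is a rough sketch of how many tensors an optimizer has to iterate over in each setup. It is not taken from the gist above: the layer sizes are arbitrary and `torch.func.stack_module_state` is used here as a stand-in for the stacked parameters built in this PR.

```python
import torch
from torch import nn
from torch.func import stack_module_state

n_agents = 4
nets = [
    nn.Sequential(nn.Linear(6, 16), nn.ReLU(), nn.Linear(16, 2))
    for _ in range(n_agents)
]

# Separate networks: the optimizer sees every tensor of every agent.
separate = [p for net in nets for p in net.parameters()]

# Stacked: one tensor per parameter name, with a leading agent dimension.
stacked, _ = stack_module_state(nets)
stacked_leaves = [t.detach().clone().requires_grad_() for t in stacked.values()]

opt_separate = torch.optim.Adam(separate, lr=1e-3)
opt_stacked = torch.optim.Adam(stacked_leaves, lr=1e-3)

# Number of tensors that step() and zero_grad() have to visit.
n_separate = sum(len(g["params"]) for g in opt_separate.param_groups)
n_stacked = sum(len(g["params"]) for g in opt_stacked.param_groups)

# Same number of scalar weights in both cases, just packed differently.
same_numel = sum(p.numel() for p in separate) == sum(
    p.numel() for p in stacked_leaves
)
```

With these sizes the separate setup holds `n_agents` times as many tensors as the stacked one, while the total number of weights is identical.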

kfu02

This is cool! Thank you so much!

To follow up on your comment here, what do you anticipate adding an RNN will require in addition to this? Is the issue that the hidden state must be initialized first? I don't quite understand what "non initialized lazy params" means or how that creates an issue with RNNs.

kfu02 · 2024-02-18T20:53:36Z

torchrl/modules/models/multiagent.py

+            self.params = TensorDict.from_modules(*agent_networks, as_module=True)
+
+    @abc.abstractmethod
+    def _build_single_net(self, *, device, **kwargs):


Just for my understanding, any new MultiAgent* will simply need to implement this method with some nn.Module (and the pre_forward_check below) right? I would be interested in helping contribute a MultiAgentGNN.

vmoens · 2024-02-20

Yep, that is the idea!
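As a rough illustration of that extension pattern, here is a self-contained toy version. Everything in it is hypothetical: `ToyMultiAgentBase` and `ToyMultiAgentMLP` are not TorchRL classes, the constructor signature is invented, and `torch.func` utilities replace `TensorDict.from_modules`. Only the shape of the idea (the base class stacks the per-agent nets, the subclass only says how to build one net) follows the diff above.

```python
import abc
import copy

import torch
from torch import nn
from torch.func import functional_call, stack_module_state


class ToyMultiAgentBase(nn.Module, abc.ABC):
    """Hypothetical stand-in for a multi-agent base class (not TorchRL's)."""

    def __init__(self, n_agents, **net_kwargs):
        super().__init__()
        nets = [self._build_single_net(**net_kwargs) for _ in range(n_agents)]
        stacked, _ = stack_module_state(nets)
        # Register the stacked weights; "." is not allowed in parameter names.
        self.params = nn.ParameterDict(
            {
                name.replace(".", "__"): nn.Parameter(t.detach().clone())
                for name, t in stacked.items()
            }
        )
        # Parameter-free template, kept out of the registered submodules.
        self.__dict__["_template"] = copy.deepcopy(nets[0]).to("meta")

    @abc.abstractmethod
    def _build_single_net(self, **kwargs):
        """Return the nn.Module used by a single agent."""

    def forward(self, x):  # x: (batch, n_agents, features)
        params = {k.replace("__", "."): p for k, p in self.params.items()}

        def call_single(p, inp):
            return functional_call(self._template, p, (inp,))

        return torch.vmap(call_single, in_dims=(0, 1), out_dims=1)(params, x)


class ToyMultiAgentMLP(ToyMultiAgentBase):
    def _build_single_net(self, *, in_features, out_features):
        return nn.Sequential(
            nn.Linear(in_features, 32), nn.Tanh(), nn.Linear(32, out_features)
        )


model = ToyMultiAgentMLP(3, in_features=4, out_features=2)
out = model(torch.randn(5, 3, 4))  # shape (5, 3, 2)
```

A new network type in this toy setup would only override `_build_single_net`; the real base class in the PR additionally expects the pre-forward check mentioned above.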

vmoens · 2024-02-19T07:59:20Z

@kfu02 @matteobettini for context: the problem with uninitialized params is that if you have modules with lazy, non-initialized params, it is usually assumed that you can pass them to an optimizer and the optimizer will know that it must wait until they are initialized before doing something with them.

Currently this isn't supported by from_modules which will only create dense params. We could create lazy params but I'm not super duper sure I see how that will work with vmap...
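For readers unfamiliar with lazy parameters, a minimal sketch of what "non-initialized" means in plain PyTorch (no TorchRL or tensordict involved; the sizes are arbitrary): a lazy layer holds an `UninitializedParameter` with no shape until the first forward call materializes it in place.

```python
import torch
from torch import nn
from torch.nn.parameter import UninitializedParameter

lazy = nn.LazyLinear(out_features=4)

# Before the first call the weight has no shape, so there is nothing dense
# to stack across agents yet.
weight_ref = lazy.weight
lazy_before = isinstance(weight_ref, UninitializedParameter)

# The first forward pass infers in_features from the input and materializes
# the parameters.
lazy(torch.randn(2, 3))

lazy_after = isinstance(lazy.weight, UninitializedParameter)
shape_after = tuple(lazy.weight.shape)  # (4, 3)

# Materialization happens in place: references taken earlier stay valid.
same_object = weight_ref is lazy.weight
```

Stacking per-agent parameters into dense tensors requires known shapes, which is why lazy layers do not fit directly into the `from_modules` path described above.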

matteobettini · 2024-02-19T08:15:40Z

> It is faster, we ran multiple benchmarks on this.

That is cool! I would still run the full multiagent training script to check those 2 things

> @kfu02 @matteobettini for context: the problem with uninitialized params is that if you have modules with lazy, non-initialized params, it is usually assumed that you can pass them to an optimizer and the optimizer will know that it must wait until they are initialized before doing something with them.

Since lazy modules were not a feature of these classes in the first place, why don't we just leave it that way?

I personally would prefer this to a complex solution to support it.

vmoens · 2024-02-19T08:25:13Z

It was a feature of CNN so I just made it uniform

vmoens · 2024-02-20T21:20:28Z

@kfu02 wanna give a shot at MA-RNNs with this once it's merged? Or should I draft a PR?

kfu02 · 2024-02-21T02:54:35Z

> @kfu02 wanna give a shot at MA-RNNs with this once it's merged? Or should I draft a PR?

Yes! I will put up a draft by the end of the week!

matteobettini · 2024-02-21T11:21:26Z

I ran some benchmarks on the mappo_ippo example for the MLP, with MAPPO and non-shared params for 3 agents.

It seems to work! I do not see any regression in this case.

All metrics match (no perf improvement though).

matteobettini · 2024-02-26T09:18:42Z

Further to #1957, I ran some tests with MASAC and non-shared parameters on the same task for 4 agents and the results look good!

Almost half training time!

init

b54f73d

facebook-github-bot added the CLA Signed label Feb 16, 2024

vmoens added the Refactoring label Feb 16, 2024

vmoens mentioned this pull request Feb 16, 2024

[Refactor] Refactor multi-agent MLP #1497

Closed

1 task

Merge remote-tracking branch 'origin/main' into edit_ma_mlp2

b0d2c36

matteobettini approved these changes Feb 17, 2024


amend

5b6ee6e

amend

66343ee

kfu02 approved these changes Feb 18, 2024


amend

5c8834f

vmoens mentioned this pull request Feb 20, 2024

[BugFix] Fix lazy params init pytorch/tensordict#681

Merged

vmoens and others added 3 commits February 20, 2024 09:17

Merge remote-tracking branch 'origin/main' into edit_ma_mlp2

126cdf0

:wMerge remote-tracking branch 'origin/main' into edit_ma_mlp2

ffe318c

amend

030d0dd

vmoens merged commit ca42794 into main Feb 20, 2024
53 of 67 checks passed

vmoens deleted the edit_ma_mlp2 branch February 20, 2024 21:28

kfu02 mentioned this pull request Feb 22, 2024

[WIP] add multiagentRNN #1948

Closed

10 tasks

This was referenced Feb 23, 2024

[BUG] Multiagent nets problems with SAC #1957

Closed

[BUG] Restoring multiagent nets #1960

Closed

matteobettini mentioned this pull request Feb 26, 2024

[Feature] Reset parameters of multiagent networks #1967

Closed

kfu02 mentioned this pull request Mar 8, 2024

[Feature Request] Multi-Agent RNNs #2003

Open

1 task
