[Feature] RB MultiStep transform #2008

vmoens · 2024-03-11T17:47:05Z

- Allow replay buffers (RBs) to handle extend with no data (this will or may happen during the first iterations when calling extend / add)
- Design class
  - Handle change of horizon correctly
- Test equivalence with MultiStep in collectors
- Document feature + compare with collector version

Test script

import torch

from torchrl.envs import GymEnv, TransformedEnv, StepCounter, SerialEnv
from torchrl.envs.transforms.rb_transforms import MultiStepTransform
from tensordict.utils import assert_allclose_td

# env = TransformedEnv(SerialEnv(2, lambda:GymEnv("CartPole-v1")), StepCounter())
env = TransformedEnv(GymEnv("CartPole-v1"), StepCounter())

env.set_seed(0)
torch.manual_seed(0)

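# Reference run: a single 250-step rollout processed in one call
t = MultiStepTransform(3, 0.98)

outs_2 = []
td = env.reset()
for _ in range(1):
    rollout = env.rollout(250, auto_reset=False, tensordict=td, break_when_any_done=False)
    out = t._inv_call(rollout)
    td = rollout[..., -1]["next"]
    outs_2.append(out)

# 250 steps in, 247 transitions out (the last 3 steps stay buffered in the transform);
# split to match the chunks produced by the 5 x 50-step run below
outs_2 = torch.cat(outs_2, -1).split([47, 50, 50, 50, 50], -1)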

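# Chunked run: same seeds, five consecutive 50-step rollouts
t = MultiStepTransform(3, 0.98)

env.set_seed(0)
torch.manual_seed(0)

outs = []
td = env.reset()
for i in range(5):
    rollout = env.rollout(50, auto_reset=False, tensordict=td, break_when_any_done=False)
    out = t._inv_call(rollout)
    # each chunk must match the corresponding slice of the single-call output
    assert_allclose_td(out, outs_2[i])
    # continue the next rollout from the last "next" state
    td = rollout[..., -1]["next"]
    outs.append(out)

outs = torch.cat(outs, -1)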
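The property this script checks (processing a trajectory in chunks gives the same multi-step output as processing it in one call) can be illustrated without torchrl. Below is a minimal pure-Python sketch, assuming a scalar reward stream and treating `done` as termination; `StreamingMultiStep` is a hypothetical name, not the class added in this PR. It buffers the trailing steps of each chunk until enough future rewards are known, so a chunk that is too short simply yields nothing (the "extend with no data" case from the checklist).

```python
class StreamingMultiStep:
    """Hypothetical sketch of a chunk-friendly multi-step transform (not torchrl's class).

    For each step t it emits (discounted sum of up to n rewards, bootstrap discount).
    Steps stay buffered until n rewards are known or the episode ends.
    """

    def __init__(self, n, gamma):
        self.n = n
        self.gamma = gamma
        self.pending = []  # (reward, done) pairs whose target is not final yet

    def __call__(self, rewards, dones):
        self.pending.extend(zip(rewards, dones))
        out = []
        while self.pending:
            window = self.pending[: self.n]
            # position of the first terminal step inside the window, if any
            end = next((k for k, (_, d) in enumerate(window) if d), None)
            if end is None and len(window) < self.n:
                break  # not enough future rewards yet: wait for the next chunk
            used = window if end is None else window[: end + 1]
            ret = sum(self.gamma ** k * r for k, (r, _) in enumerate(used))
            # no bootstrapping past a terminal step
            boot = 0.0 if end is not None else self.gamma ** self.n
            out.append((ret, boot))
            self.pending.pop(0)
        return out


rewards = [1.0] * 10
dones = [False] * 9 + [True]

# one call over the whole trajectory
whole = StreamingMultiStep(3, 0.5)(rewards, dones)

# same data fed in two chunks of 5
t = StreamingMultiStep(3, 0.5)
first = t(rewards[:5], dones[:5])  # trailing steps are held back
second = t(rewards[5:], dones[5:])
assert first + second == whole
print(len(first), len(second))  # 3 7
```

Because emission depends only on the data and never on where a chunk boundary falls, the concatenated chunked output is identical to the single-call output, which is the same invariant `assert_allclose_td` checks above.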

cc @AechPro

pytorch-bot · 2024-03-11T17:47:09Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2008

❌ 6 New Failures, 1 Unrelated Failure

As of commit ddfce88 with merge base 2b8450c:

NEW FAILURES - The following jobs have failed:

Habitat Tests on Linux / tests (3.9, 11.6) / linux-job (gh)
RuntimeError: Command docker exec -t e9558bfc5ff04dcc8ed1d9f334fac9e285a8368836bb480c45cebc858bd50abd /exec failed with exit code 139
Unit-tests on Linux / tests-cpu (3.8) / linux-job (gh)
AttributeError: 'OrphanPath' object has no attribute 'exists'
Unit-tests on Linux / tests-gpu (3.8, 12.1) / linux-job (gh)
AttributeError: 'OrphanPath' object has no attribute 'exists'
Unit-tests on Linux / tests-stable-gpu (3.8, 11.8) / linux-job (gh)
test/test_env.py::TestLibThreading::test_auto_num_threads
Unit-tests on MacOS CPU / tests (3.8) / macos-job (gh)
AttributeError: 'OrphanPath' object has no attribute 'exists'
Unit-tests on Windows / unittests-cpu / windows-job (gh)
The process 'C:\Program Files\Git\cmd\git.exe' failed with exit code 128

FLAKY - The following job failed but was likely due to flakiness present on trunk:

Libs Tests on Linux / unittests-sklearn (3.9, 12.1) / linux-job (gh)
The process '/usr/bin/git' failed with exit code 128

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2024-03-11T17:53:54Z

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 91. Improved: 9. Worsened: 9.

Name	Max	Mean	Ops	Ops on Repo `HEAD`	Change
test_single	55.2397ms	54.3096ms	18.4129 Ops/s	17.0154 Ops/s	$\textbf{\color{#35bf28}+8.21\%}$
test_sync	49.1180ms	30.5735ms	32.7081 Ops/s	33.3896 Ops/s	$\color{#d91a1a}-2.04\%$
test_async	55.8512ms	28.8483ms	34.6641 Ops/s	34.8361 Ops/s	$\color{#d91a1a}-0.49\%$
test_simple	0.4021s	0.3431s	2.9150 Ops/s	2.8794 Ops/s	$\color{#35bf28}+1.24\%$
test_transformed	0.5228s	0.4774s	2.0945 Ops/s	2.0813 Ops/s	$\color{#35bf28}+0.64\%$
test_serial	1.2587s	1.2006s	0.8329 Ops/s	0.8088 Ops/s	$\color{#35bf28}+2.98\%$
test_parallel	1.0810s	1.0446s	0.9573 Ops/s	0.9474 Ops/s	$\color{#35bf28}+1.04\%$
test_step_mdp_speed[True-True-True-True-True]	0.1357ms	21.1236μs	47.3404 KOps/s	47.9664 KOps/s	$\color{#d91a1a}-1.30\%$
test_step_mdp_speed[True-True-True-True-False]	39.6240μs	12.8604μs	77.7584 KOps/s	78.8775 KOps/s	$\color{#d91a1a}-1.42\%$
test_step_mdp_speed[True-True-True-False-True]	67.7060μs	12.5542μs	79.6543 KOps/s	82.1345 KOps/s	$\color{#d91a1a}-3.02\%$
test_step_mdp_speed[True-True-True-False-False]	28.2420μs	7.5815μs	131.9006 KOps/s	134.5477 KOps/s	$\color{#d91a1a}-1.97\%$
test_step_mdp_speed[True-True-False-True-True]	64.2100μs	22.5734μs	44.2999 KOps/s	45.3420 KOps/s	$\color{#d91a1a}-2.30\%$
test_step_mdp_speed[True-True-False-True-False]	51.1950μs	13.9565μs	71.6514 KOps/s	72.0603 KOps/s	$\color{#d91a1a}-0.57\%$
test_step_mdp_speed[True-True-False-False-True]	47.4380μs	13.7273μs	72.8475 KOps/s	75.0534 KOps/s	$\color{#d91a1a}-2.94\%$
test_step_mdp_speed[True-True-False-False-False]	22.7920μs	8.8217μs	113.3565 KOps/s	115.2522 KOps/s	$\color{#d91a1a}-1.64\%$
test_step_mdp_speed[True-False-True-True-True]	84.8280μs	23.5563μs	42.4515 KOps/s	42.7198 KOps/s	$\color{#d91a1a}-0.63\%$
test_step_mdp_speed[True-False-True-True-False]	58.5290μs	15.3836μs	65.0044 KOps/s	66.1872 KOps/s	$\color{#d91a1a}-1.79\%$
test_step_mdp_speed[True-False-True-False-True]	66.8340μs	13.5837μs	73.6178 KOps/s	70.8827 KOps/s	$\color{#35bf28}+3.86\%$
test_step_mdp_speed[True-False-True-False-False]	27.8610μs	8.8230μs	113.3406 KOps/s	117.4087 KOps/s	$\color{#d91a1a}-3.46\%$
test_step_mdp_speed[True-False-False-True-True]	71.4330μs	24.9371μs	40.1009 KOps/s	40.9825 KOps/s	$\color{#d91a1a}-2.15\%$
test_step_mdp_speed[True-False-False-True-False]	60.6630μs	16.4484μs	60.7963 KOps/s	62.0824 KOps/s	$\color{#d91a1a}-2.07\%$
test_step_mdp_speed[True-False-False-False-True]	34.2440μs	14.7981μs	67.5762 KOps/s	69.2625 KOps/s	$\color{#d91a1a}-2.43\%$
test_step_mdp_speed[True-False-False-False-False]	54.1510μs	9.8606μs	101.4142 KOps/s	103.0649 KOps/s	$\color{#d91a1a}-1.60\%$
test_step_mdp_speed[False-True-True-True-True]	56.9060μs	23.7541μs	42.0979 KOps/s	42.9149 KOps/s	$\color{#d91a1a}-1.90\%$
test_step_mdp_speed[False-True-True-True-False]	59.9410μs	15.3455μs	65.1656 KOps/s	66.1454 KOps/s	$\color{#d91a1a}-1.48\%$
test_step_mdp_speed[False-True-True-False-True]	41.1060μs	15.7380μs	63.5403 KOps/s	65.1623 KOps/s	$\color{#d91a1a}-2.49\%$
test_step_mdp_speed[False-True-True-False-False]	58.3980μs	9.9359μs	100.6455 KOps/s	102.7644 KOps/s	$\color{#d91a1a}-2.06\%$
test_step_mdp_speed[False-True-False-True-True]	37.1890μs	25.2497μs	39.6045 KOps/s	40.7799 KOps/s	$\color{#d91a1a}-2.88\%$
test_step_mdp_speed[False-True-False-True-False]	62.0860μs	16.2656μs	61.4794 KOps/s	61.6155 KOps/s	$\color{#d91a1a}-0.22\%$
test_step_mdp_speed[False-True-False-False-True]	38.1710μs	16.8938μs	59.1935 KOps/s	59.9536 KOps/s	$\color{#d91a1a}-1.27\%$
test_step_mdp_speed[False-True-False-False-False]	53.5790μs	11.0696μs	90.3374 KOps/s	91.5644 KOps/s	$\color{#d91a1a}-1.34\%$
test_step_mdp_speed[False-False-True-True-True]	80.1690μs	25.9467μs	38.5406 KOps/s	39.1314 KOps/s	$\color{#d91a1a}-1.51\%$
test_step_mdp_speed[False-False-True-True-False]	57.9980μs	17.7593μs	56.3086 KOps/s	57.0719 KOps/s	$\color{#d91a1a}-1.34\%$
test_step_mdp_speed[False-False-True-False-True]	68.8580μs	16.9200μs	59.1016 KOps/s	59.9579 KOps/s	$\color{#d91a1a}-1.43\%$
test_step_mdp_speed[False-False-True-False-False]	56.2440μs	11.1648μs	89.5676 KOps/s	91.9210 KOps/s	$\color{#d91a1a}-2.56\%$
test_step_mdp_speed[False-False-False-True-True]	73.6370μs	27.0882μs	36.9164 KOps/s	37.2905 KOps/s	$\color{#d91a1a}-1.00\%$
test_step_mdp_speed[False-False-False-True-False]	56.3040μs	18.7844μs	53.2357 KOps/s	53.9330 KOps/s	$\color{#d91a1a}-1.29\%$
test_step_mdp_speed[False-False-False-False-True]	63.5980μs	17.9066μs	55.8453 KOps/s	56.4411 KOps/s	$\color{#d91a1a}-1.06\%$
test_step_mdp_speed[False-False-False-False-False]	60.0710μs	12.1551μs	82.2698 KOps/s	83.7082 KOps/s	$\color{#d91a1a}-1.72\%$
test_values[generalized_advantage_estimate-True-True]	9.6162ms	9.3432ms	107.0297 Ops/s	101.7809 Ops/s	$\textbf{\color{#35bf28}+5.16\%}$
test_values[vec_generalized_advantage_estimate-True-True]	38.9748ms	35.1340ms	28.4625 Ops/s	29.7484 Ops/s	$\color{#d91a1a}-4.32\%$
test_values[td0_return_estimate-False-False]	0.1956ms	0.1675ms	5.9710 KOps/s	5.1201 KOps/s	$\textbf{\color{#35bf28}+16.62\%}$
test_values[td1_return_estimate-False-False]	25.9116ms	23.5836ms	42.4023 Ops/s	41.9541 Ops/s	$\color{#35bf28}+1.07\%$
test_values[vec_td1_return_estimate-False-False]	36.8319ms	35.6383ms	28.0597 Ops/s	29.6684 Ops/s	$\textbf{\color{#d91a1a}-5.42\%}$
test_values[td_lambda_return_estimate-True-False]	36.5092ms	33.6616ms	29.7075 Ops/s	29.5712 Ops/s	$\color{#35bf28}+0.46\%$
test_values[vec_td_lambda_return_estimate-True-False]	36.7253ms	35.5446ms	28.1337 Ops/s	29.8620 Ops/s	$\textbf{\color{#d91a1a}-5.79\%}$
test_gae_speed[generalized_advantage_estimate-False-1-512]	8.2880ms	8.1641ms	122.4876 Ops/s	121.1500 Ops/s	$\color{#35bf28}+1.10\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512]	2.1138ms	1.8609ms	537.3641 Ops/s	492.3930 Ops/s	$\textbf{\color{#35bf28}+9.13\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512]	0.4161ms	0.3398ms	2.9430 KOps/s	2.8458 KOps/s	$\color{#35bf28}+3.42\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512]	48.4963ms	46.7232ms	21.4026 Ops/s	24.5081 Ops/s	$\textbf{\color{#d91a1a}-12.67\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512]	3.7285ms	3.0080ms	332.4431 Ops/s	327.7110 Ops/s	$\color{#35bf28}+1.44\%$
test_dqn_speed	6.9942ms	1.3600ms	735.3133 Ops/s	723.2078 Ops/s	$\color{#35bf28}+1.67\%$
test_ddpg_speed	3.3529ms	2.6785ms	373.3496 Ops/s	368.9682 Ops/s	$\color{#35bf28}+1.19\%$
test_sac_speed	9.2023ms	8.1403ms	122.8459 Ops/s	120.2305 Ops/s	$\color{#35bf28}+2.18\%$
test_redq_speed	14.1571ms	12.9521ms	77.2076 Ops/s	75.4805 Ops/s	$\color{#35bf28}+2.29\%$
test_redq_deprec_speed	14.2136ms	13.0098ms	76.8653 Ops/s	75.8306 Ops/s	$\color{#35bf28}+1.36\%$
test_td3_speed	9.6185ms	8.0905ms	123.6025 Ops/s	121.5871 Ops/s	$\color{#35bf28}+1.66\%$
test_cql_speed	36.8606ms	35.7516ms	27.9708 Ops/s	27.5889 Ops/s	$\color{#35bf28}+1.38\%$
test_a2c_speed	77.8576ms	7.8578ms	127.2617 Ops/s	134.9268 Ops/s	$\textbf{\color{#d91a1a}-5.68\%}$
test_ppo_speed	9.0401ms	7.6721ms	130.3429 Ops/s	131.0389 Ops/s	$\color{#d91a1a}-0.53\%$
test_reinforce_speed	7.3551ms	6.6135ms	151.2060 Ops/s	152.7481 Ops/s	$\color{#d91a1a}-1.01\%$
test_iql_speed	33.3002ms	32.0147ms	31.2356 Ops/s	30.4422 Ops/s	$\color{#35bf28}+2.61\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000]	2.5435ms	2.2603ms	442.4180 Ops/s	438.6506 Ops/s	$\color{#35bf28}+0.86\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000]	0.9111ms	0.5026ms	1.9896 KOps/s	1.8097 KOps/s	$\textbf{\color{#35bf28}+9.94\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000]	0.6353ms	0.4724ms	2.1168 KOps/s	1.9480 KOps/s	$\textbf{\color{#35bf28}+8.67\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000]	2.5590ms	2.2097ms	452.5507 Ops/s	449.6242 Ops/s	$\color{#35bf28}+0.65\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000]	0.9793ms	0.4897ms	2.0421 KOps/s	2.0491 KOps/s	$\color{#d91a1a}-0.35\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000]	0.6426ms	0.4669ms	2.1418 KOps/s	2.1273 KOps/s	$\color{#35bf28}+0.68\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000]	1.5997ms	1.2705ms	787.0846 Ops/s	748.4306 Ops/s	$\textbf{\color{#35bf28}+5.16\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000]	1.4226ms	1.2057ms	829.3599 Ops/s	802.6966 Ops/s	$\color{#35bf28}+3.32\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000]	3.4830ms	2.3388ms	427.5640 Ops/s	437.2774 Ops/s	$\color{#d91a1a}-2.22\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000]	95.1741ms	0.6856ms	1.4585 KOps/s	1.6173 KOps/s	$\textbf{\color{#d91a1a}-9.82\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000]	0.8731ms	0.5811ms	1.7210 KOps/s	1.7004 KOps/s	$\color{#35bf28}+1.21\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000]	5.8809ms	2.2455ms	445.3319 Ops/s	449.0400 Ops/s	$\color{#d91a1a}-0.83\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000]	0.5973ms	0.5003ms	1.9987 KOps/s	2.0056 KOps/s	$\color{#d91a1a}-0.34\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000]	3.7021ms	0.4820ms	2.0749 KOps/s	2.1148 KOps/s	$\color{#d91a1a}-1.89\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000]	3.5995ms	2.3980ms	417.0226 Ops/s	443.2812 Ops/s	$\textbf{\color{#d91a1a}-5.92\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000]	0.6752ms	0.4897ms	2.0423 KOps/s	2.0323 KOps/s	$\color{#35bf28}+0.49\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000]	0.6713ms	0.4674ms	2.1396 KOps/s	2.1184 KOps/s	$\color{#35bf28}+1.00\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000]	3.5745ms	2.3818ms	419.8487 Ops/s	414.8421 Ops/s	$\color{#35bf28}+1.21\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000]	1.1136ms	0.6114ms	1.6356 KOps/s	1.6219 KOps/s	$\color{#35bf28}+0.84\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000]	0.7501ms	0.5818ms	1.7188 KOps/s	1.6755 KOps/s	$\color{#35bf28}+2.58\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400]	0.1024s	7.5131ms	133.1017 Ops/s	137.2360 Ops/s	$\color{#d91a1a}-3.01\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400]	14.4365ms	12.0554ms	82.9502 Ops/s	83.6324 Ops/s	$\color{#d91a1a}-0.82\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400]	3.7645ms	1.1416ms	875.9338 Ops/s	949.1960 Ops/s	$\textbf{\color{#d91a1a}-7.72\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400]	87.3050ms	5.3301ms	187.6152 Ops/s	135.7376 Ops/s	$\textbf{\color{#35bf28}+38.22\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400]	14.4403ms	12.0072ms	83.2834 Ops/s	83.9775 Ops/s	$\color{#d91a1a}-0.83\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400]	3.6615ms	1.1192ms	893.4643 Ops/s	956.7210 Ops/s	$\textbf{\color{#d91a1a}-6.61\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400]	90.9132ms	7.5499ms	132.4516 Ops/s	163.7701 Ops/s	$\textbf{\color{#d91a1a}-19.12\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400]	14.5945ms	12.2862ms	81.3922 Ops/s	68.5295 Ops/s	$\textbf{\color{#35bf28}+18.77\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400]	2.2413ms	1.3760ms	726.7642 Ops/s	732.4263 Ops/s	$\color{#d91a1a}-0.77\%$

github-actions · 2024-03-11T17:57:21Z

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 94. Improved: 7. Worsened: 2.

Name	Max	Mean	Ops	Ops on Repo `HEAD`	Change
test_single	0.1026s	0.1008s	9.9224 Ops/s	8.9794 Ops/s	$\textbf{\color{#35bf28}+10.50\%}$
test_sync	93.2983ms	90.4722ms	11.0531 Ops/s	10.8826 Ops/s	$\color{#35bf28}+1.57\%$
test_async	0.1762s	89.0927ms	11.2243 Ops/s	11.2765 Ops/s	$\color{#d91a1a}-0.46\%$
test_single_pixels	0.1816s	0.1181s	8.4683 Ops/s	8.8167 Ops/s	$\color{#d91a1a}-3.95\%$
test_sync_pixels	69.0961ms	67.7455ms	14.7611 Ops/s	14.7793 Ops/s	$\color{#d91a1a}-0.12\%$
test_async_pixels	0.1225s	55.4467ms	18.0353 Ops/s	17.5950 Ops/s	$\color{#35bf28}+2.50\%$
test_simple	0.7276s	0.6723s	1.4875 Ops/s	1.4588 Ops/s	$\color{#35bf28}+1.97\%$
test_transformed	0.9257s	0.8713s	1.1477 Ops/s	1.1124 Ops/s	$\color{#35bf28}+3.17\%$
test_serial	2.1422s	2.0915s	0.4781 Ops/s	0.4664 Ops/s	$\color{#35bf28}+2.51\%$
test_parallel	1.8980s	1.8659s	0.5359 Ops/s	0.5502 Ops/s	$\color{#d91a1a}-2.59\%$
test_step_mdp_speed[True-True-True-True-True]	87.2150μs	32.6232μs	30.6530 KOps/s	30.5787 KOps/s	$\color{#35bf28}+0.24\%$
test_step_mdp_speed[True-True-True-True-False]	38.5830μs	19.6904μs	50.7861 KOps/s	51.1338 KOps/s	$\color{#d91a1a}-0.68\%$
test_step_mdp_speed[True-True-True-False-True]	46.9420μs	18.7218μs	53.4136 KOps/s	53.2623 KOps/s	$\color{#35bf28}+0.28\%$
test_step_mdp_speed[True-True-True-False-False]	37.3310μs	11.1527μs	89.6646 KOps/s	89.0231 KOps/s	$\color{#35bf28}+0.72\%$
test_step_mdp_speed[True-True-False-True-True]	58.4630μs	34.4433μs	29.0332 KOps/s	28.9692 KOps/s	$\color{#35bf28}+0.22\%$
test_step_mdp_speed[True-True-False-True-False]	38.5420μs	21.2697μs	47.0152 KOps/s	47.1266 KOps/s	$\color{#d91a1a}-0.24\%$
test_step_mdp_speed[True-True-False-False-True]	36.7630μs	20.4306μs	48.9462 KOps/s	49.8005 KOps/s	$\color{#d91a1a}-1.72\%$
test_step_mdp_speed[True-True-False-False-False]	36.7520μs	13.0397μs	76.6891 KOps/s	76.4129 KOps/s	$\color{#35bf28}+0.36\%$
test_step_mdp_speed[True-False-True-True-True]	58.0240μs	36.4431μs	27.4400 KOps/s	27.7327 KOps/s	$\color{#d91a1a}-1.06\%$
test_step_mdp_speed[True-False-True-True-False]	42.9730μs	23.2366μs	43.0356 KOps/s	42.8265 KOps/s	$\color{#35bf28}+0.49\%$
test_step_mdp_speed[True-False-True-False-True]	45.1030μs	20.2512μs	49.3798 KOps/s	49.2630 KOps/s	$\color{#35bf28}+0.24\%$
test_step_mdp_speed[True-False-True-False-False]	27.7520μs	12.9574μs	77.1762 KOps/s	77.1979 KOps/s	$\color{#d91a1a}-0.03\%$
test_step_mdp_speed[True-False-False-True-True]	64.2140μs	38.3790μs	26.0559 KOps/s	26.6114 KOps/s	$\color{#d91a1a}-2.09\%$
test_step_mdp_speed[True-False-False-True-False]	49.4130μs	25.2262μs	39.6413 KOps/s	40.2207 KOps/s	$\color{#d91a1a}-1.44\%$
test_step_mdp_speed[True-False-False-False-True]	47.3130μs	21.9061μs	45.6495 KOps/s	45.6322 KOps/s	$\color{#35bf28}+0.04\%$
test_step_mdp_speed[True-False-False-False-False]	38.4620μs	14.6481μs	68.2681 KOps/s	67.9560 KOps/s	$\color{#35bf28}+0.46\%$
test_step_mdp_speed[False-True-True-True-True]	60.2940μs	36.3306μs	27.5250 KOps/s	27.7301 KOps/s	$\color{#d91a1a}-0.74\%$
test_step_mdp_speed[False-True-True-True-False]	55.1330μs	23.1913μs	43.1196 KOps/s	43.0816 KOps/s	$\color{#35bf28}+0.09\%$
test_step_mdp_speed[False-True-True-False-True]	53.9630μs	24.5690μs	40.7016 KOps/s	42.2944 KOps/s	$\color{#d91a1a}-3.77\%$
test_step_mdp_speed[False-True-True-False-False]	37.7930μs	14.7929μs	67.6001 KOps/s	67.3720 KOps/s	$\color{#35bf28}+0.34\%$
test_step_mdp_speed[False-True-False-True-True]	69.6240μs	38.6807μs	25.8527 KOps/s	26.0104 KOps/s	$\color{#d91a1a}-0.61\%$
test_step_mdp_speed[False-True-False-True-False]	46.9130μs	25.3339μs	39.4728 KOps/s	39.6026 KOps/s	$\color{#d91a1a}-0.33\%$
test_step_mdp_speed[False-True-False-False-True]	43.2810μs	25.8333μs	38.7098 KOps/s	38.2899 KOps/s	$\color{#35bf28}+1.10\%$
test_step_mdp_speed[False-True-False-False-False]	37.5420μs	16.7718μs	59.6239 KOps/s	60.0938 KOps/s	$\color{#d91a1a}-0.78\%$
test_step_mdp_speed[False-False-True-True-True]	64.4130μs	39.7703μs	25.1444 KOps/s	24.8837 KOps/s	$\color{#35bf28}+1.05\%$
test_step_mdp_speed[False-False-True-True-False]	49.8730μs	27.0405μs	36.9815 KOps/s	36.8751 KOps/s	$\color{#35bf28}+0.29\%$
test_step_mdp_speed[False-False-True-False-True]	52.8720μs	26.1020μs	38.3112 KOps/s	38.3862 KOps/s	$\color{#d91a1a}-0.20\%$
test_step_mdp_speed[False-False-True-False-False]	35.0010μs	16.7248μs	59.7915 KOps/s	60.0534 KOps/s	$\color{#d91a1a}-0.44\%$
test_step_mdp_speed[False-False-False-True-True]	58.8740μs	41.8937μs	23.8699 KOps/s	24.2384 KOps/s	$\color{#d91a1a}-1.52\%$
test_step_mdp_speed[False-False-False-True-False]	52.0730μs	28.7207μs	34.8181 KOps/s	34.5716 KOps/s	$\color{#35bf28}+0.71\%$
test_step_mdp_speed[False-False-False-False-True]	48.7430μs	27.6634μs	36.1488 KOps/s	36.2752 KOps/s	$\color{#d91a1a}-0.35\%$
test_step_mdp_speed[False-False-False-False-False]	37.7720μs	18.2018μs	54.9396 KOps/s	54.3757 KOps/s	$\color{#35bf28}+1.04\%$
test_values[generalized_advantage_estimate-True-True]	25.8758ms	24.6231ms	40.6123 Ops/s	39.2225 Ops/s	$\color{#35bf28}+3.54\%$
test_values[vec_generalized_advantage_estimate-True-True]	95.8533ms	3.4766ms	287.6374 Ops/s	307.2527 Ops/s	$\textbf{\color{#d91a1a}-6.38\%}$
test_values[td0_return_estimate-False-False]	93.2740μs	64.1303μs	15.5932 KOps/s	15.0885 KOps/s	$\color{#35bf28}+3.34\%$
test_values[td1_return_estimate-False-False]	52.2964ms	51.6854ms	19.3478 Ops/s	18.1430 Ops/s	$\textbf{\color{#35bf28}+6.64\%}$
test_values[vec_td1_return_estimate-False-False]	1.9428ms	1.7481ms	572.0559 Ops/s	566.5981 Ops/s	$\color{#35bf28}+0.96\%$
test_values[td_lambda_return_estimate-True-False]	83.1762ms	82.5093ms	12.1198 Ops/s	11.4121 Ops/s	$\textbf{\color{#35bf28}+6.20\%}$
test_values[vec_td_lambda_return_estimate-True-False]	2.0434ms	1.7486ms	571.8803 Ops/s	567.6234 Ops/s	$\color{#35bf28}+0.75\%$
test_gae_speed[generalized_advantage_estimate-False-1-512]	23.0065ms	22.7172ms	44.0195 Ops/s	42.6969 Ops/s	$\color{#35bf28}+3.10\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512]	0.8722ms	0.6897ms	1.4498 KOps/s	1.4416 KOps/s	$\color{#35bf28}+0.57\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512]	0.7120ms	0.6399ms	1.5628 KOps/s	1.5468 KOps/s	$\color{#35bf28}+1.04\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512]	1.4954ms	1.4436ms	692.7072 Ops/s	688.9805 Ops/s	$\color{#35bf28}+0.54\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512]	0.9430ms	0.6635ms	1.5071 KOps/s	1.4986 KOps/s	$\color{#35bf28}+0.56\%$
test_dqn_speed	7.9962ms	1.4425ms	693.2542 Ops/s	683.3785 Ops/s	$\color{#35bf28}+1.45\%$
test_ddpg_speed	2.9893ms	2.7240ms	367.1091 Ops/s	364.3629 Ops/s	$\color{#35bf28}+0.75\%$
test_sac_speed	8.4200ms	8.0084ms	124.8693 Ops/s	121.5691 Ops/s	$\color{#35bf28}+2.71\%$
test_redq_speed	11.1589ms	10.0636ms	99.3679 Ops/s	95.9891 Ops/s	$\color{#35bf28}+3.52\%$
test_redq_deprec_speed	11.4745ms	10.7043ms	93.4203 Ops/s	88.2921 Ops/s	$\textbf{\color{#35bf28}+5.81\%}$
test_td3_speed	8.2623ms	7.9278ms	126.1380 Ops/s	122.4805 Ops/s	$\color{#35bf28}+2.99\%$
test_cql_speed	26.3237ms	25.2289ms	39.6370 Ops/s	39.0922 Ops/s	$\color{#35bf28}+1.39\%$
test_a2c_speed	6.7543ms	5.4871ms	182.2458 Ops/s	180.2229 Ops/s	$\color{#35bf28}+1.12\%$
test_ppo_speed	6.7604ms	5.7417ms	174.1650 Ops/s	169.8639 Ops/s	$\color{#35bf28}+2.53\%$
test_reinforce_speed	4.6484ms	4.4443ms	225.0073 Ops/s	222.2596 Ops/s	$\color{#35bf28}+1.24\%$
test_iql_speed	19.7389ms	19.1285ms	52.2780 Ops/s	51.2922 Ops/s	$\color{#35bf28}+1.92\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000]	3.0869ms	2.8796ms	347.2729 Ops/s	344.2655 Ops/s	$\color{#35bf28}+0.87\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000]	0.6603ms	0.5376ms	1.8603 KOps/s	1.8548 KOps/s	$\color{#35bf28}+0.30\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000]	4.3175ms	0.5163ms	1.9368 KOps/s	1.9228 KOps/s	$\color{#35bf28}+0.73\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000]	3.1610ms	2.8885ms	346.2012 Ops/s	346.8901 Ops/s	$\color{#d91a1a}-0.20\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000]	0.6728ms	0.5284ms	1.8926 KOps/s	1.8805 KOps/s	$\color{#35bf28}+0.64\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000]	4.5575ms	0.5161ms	1.9375 KOps/s	1.9579 KOps/s	$\color{#d91a1a}-1.04\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000]	1.6204ms	1.5150ms	660.0794 Ops/s	661.7739 Ops/s	$\color{#d91a1a}-0.26\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000]	5.3627ms	1.4471ms	691.0345 Ops/s	693.8739 Ops/s	$\color{#d91a1a}-0.41\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000]	3.0901ms	3.0071ms	332.5457 Ops/s	332.0259 Ops/s	$\color{#35bf28}+0.16\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000]	1.4649ms	0.6579ms	1.5199 KOps/s	1.3228 KOps/s	$\textbf{\color{#35bf28}+14.90\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000]	0.8365ms	0.6360ms	1.5723 KOps/s	1.5642 KOps/s	$\color{#35bf28}+0.52\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000]	2.9517ms	2.8778ms	347.4826 Ops/s	345.1936 Ops/s	$\color{#35bf28}+0.66\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000]	1.2415ms	0.5431ms	1.8413 KOps/s	1.8469 KOps/s	$\color{#d91a1a}-0.30\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000]	0.7203ms	0.5170ms	1.9343 KOps/s	1.9338 KOps/s	$\color{#35bf28}+0.02\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000]	3.2246ms	2.9180ms	342.7000 Ops/s	345.6547 Ops/s	$\color{#d91a1a}-0.85\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000]	0.6193ms	0.5298ms	1.8875 KOps/s	1.8886 KOps/s	$\color{#d91a1a}-0.06\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000]	4.5273ms	0.5130ms	1.9494 KOps/s	1.9484 KOps/s	$\color{#35bf28}+0.05\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000]	3.1921ms	3.0077ms	332.4791 Ops/s	330.8049 Ops/s	$\color{#35bf28}+0.51\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000]	0.8840ms	0.6633ms	1.5075 KOps/s	1.4995 KOps/s	$\color{#35bf28}+0.54\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000]	0.7743ms	0.6394ms	1.5640 KOps/s	1.5560 KOps/s	$\color{#35bf28}+0.51\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400]	0.1058s	8.7414ms	114.3980 Ops/s	111.9774 Ops/s	$\color{#35bf28}+2.16\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400]	16.6530ms	14.3538ms	69.6681 Ops/s	68.1723 Ops/s	$\color{#35bf28}+2.19\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400]	1.1402ms	1.0397ms	961.7768 Ops/s	834.3765 Ops/s	$\textbf{\color{#35bf28}+15.27\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400]	0.1001s	6.7293ms	148.6036 Ops/s	149.0107 Ops/s	$\color{#d91a1a}-0.27\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400]	16.6246ms	14.3152ms	69.8556 Ops/s	68.3636 Ops/s	$\color{#35bf28}+2.18\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400]	2.4998ms	1.1988ms	834.1687 Ops/s	760.5105 Ops/s	$\textbf{\color{#35bf28}+9.69\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400]	0.1031s	9.0474ms	110.5291 Ops/s	111.6162 Ops/s	$\color{#d91a1a}-0.97\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400]	17.2234ms	14.6359ms	68.3250 Ops/s	67.0633 Ops/s	$\color{#35bf28}+1.88\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400]	7.8760ms	1.6455ms	607.7174 Ops/s	657.8848 Ops/s	$\textbf{\color{#d91a1a}-7.63\%}$

vmoens · 2024-03-12T18:14:29Z

@AechPro I did not change the n_steps argument as you suggested, for consistency within torchrl.
I'm open to changing both, but I'm not sure how to proceed without breaking backward compatibility.

I'll ping you when the doc preview is built so you can check whether things make sense; in the meantime, you can look at the diff.

AechPro · 2024-03-12T18:55:38Z

@vmoens I understand that backward compatibility is important, but if we expect torchrl to be used largely by the research community, it seems a bit strange for the n_steps argument to behave in a slightly unexpected way.

Consider the case where a torchrl user is re-implementing an algorithm like RAINBOW in good faith, and they naturally set the n_steps argument to 3, as suggested by the paper. With the current behavior, this would actually produce a multi-step return estimate with n=4 as defined in the paper, which would meaningfully change the performance of the algorithm. I can imagine our hypothetical torchrl user becoming quite confused when they cannot replicate the results of RAINBOW despite having (seemingly) implemented everything correctly.

Further, if we expect new algorithms to be written using torchrl, this discrepancy in the meaning of the multi-step parameter could introduce confusion in the opposite direction: someone not using torchrl who tries to implement a paper reporting results from a torchrl algorithm built on the multi-step return object may find it difficult to replicate those results, for the same reason as our hypothetical RAINBOW user.

In my opinion, preserving the meaning of the parameter in the multi-step return algorithm is more important than backward compatibility because of the two examples above, but I understand others may feel differently. At the very least, I would strongly advocate for easy-to-find documentation of this behavior in every tutorial that uses the MultiStep object and in the object's docstring, so we can mitigate these two potential issues.
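For reference, the textbook n-step target behind this argument uses exactly n reward terms before bootstrapping. A minimal sketch, assuming a single episode with no terminal step inside the window; `n_step_return` and its list-based arguments are hypothetical and not a torchrl API. It shows that bootstrapping one state too far is the same as computing the target with n+1.

```python
def n_step_return(rewards, values, t, n, gamma):
    """Textbook n-step target: sum_{k<n} gamma^k r_{t+k} + gamma^n V(s_{t+n}).

    rewards[k]: reward received after acting in state s_k
    values[k]:  estimate of V(s_k); requires t + n < len(values)
    """
    ret = sum(gamma ** k * rewards[t + k] for k in range(n))
    return ret + gamma ** n * values[t + n]


rewards = [1.0] * 6
values = [10.0 * k for k in range(7)]  # toy value estimates V(s_k) = 10k

# Rainbow-style n=3: three rewards, then bootstrap from s_{t+3}
print(n_step_return(rewards, values, t=0, n=3, gamma=0.5))  # 5.5
# bootstrapping from s_{t+4} instead silently turns this into n=4
print(n_step_return(rewards, values, t=0, n=4, gamma=0.5))  # 4.375
```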

vmoens · 2024-03-12T18:58:00Z

Makes sense! Then let's make n_steps congruent with what the community expects.

vmoens · 2024-03-12T19:01:16Z

Here's the doc https://docs-preview.pytorch.org/pytorch/rl/2008/reference/generated/torchrl.envs.transforms.rb_transforms.MultiStepTransform.html#torchrl.envs.transforms.rb_transforms.MultiStepTransform

AechPro · 2024-03-12T19:23:22Z

I'm having a little trouble understanding the output of the example. We have the MultiStepTransform with n_steps=3, then we sample some timesteps and slice the first 5 entries from our replay buffer. The first entry in that slice reports a step count of 9, which I assume is supposed to equal 5+n; but that would be 5 + 3 = 8, and even that would be incorrect behavior. The first timestep emitted by the transform should be timestep 1 (or zero if you start counting at zero), because it contains the state from which we compute the n-step return. So I would expect rb[:]["step_count"][:, 0] to be 1 (or zero) and the final entry, rb[:]["step_count"][:, -1], to be T-n+1: the internal buffers should only need to hold n-1 waiting timesteps, unless there is a terminal state at the end of the internal buffer, in which case the transform should just compute all of the remaining possible returns at each of those timesteps.

I'm sure I must be misunderstanding what's happening in the example. Could you clarify?

vmoens · 2024-03-12T20:28:08Z

Let me clarify:
Here we look at the step count at the root:

>>> print("step_count", rb[:]["step_count"][:, :5])
step_count tensor([[[ 9],
         [10],
         [11],
         [12],
         [13]],

        [[12],
         [13],
         [14],
         [15],
         [16]]])

Env 0 has steps [9, 10, 11, ...] and env 1 has [12, 13, 14, ...].

Then we look at the "next" entry to see the shift.
Without MultiStep, we would have [10, 11, 12, ...] for env 0 and [13, 14, 15, ...] for env 1.
Because we use multi-step with a shift of 3, we have [13, 14, 15, ...] and [16, 17, 18, ...], respectively.

>>> print("next step_count", rb[:]["next", "step_count"][:, :5])
next step_count tensor([[[13],
         [14],
         [15],
         [16],
         [17]],

        [[16],
         [17],
         [18],
         [19],
         [20]]])

Note that we're looking at the replay buffer content, so it doesn't really matter what those values are (i.e., it's expected that the count doesn't start at 0).

For a single env and n=3, you would have this data structure accessible in the buffer:

done:                        [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
step_count:                  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
next, step_count_orig:       [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
next, step_count:            [4, 5, 6, 7, 8, 9, 10, 10, 10, 10]
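The `next, step_count` row above can be reproduced with a clamped gather along the time dimension. A minimal sketch, assuming a single episode stored in plain Python lists and a shift of 3 as in this example; `shift_next` is a hypothetical helper, not the code in this PR.

```python
def shift_next(next_values, shift):
    """Replace each step's 'next' entry by the one `shift` steps later,
    clamping at the last step of the episode (where done is True)."""
    last = len(next_values) - 1
    return [next_values[min(t + shift, last)] for t in range(len(next_values))]


step_count = list(range(10))                        # [0, 1, ..., 9]
next_step_count_orig = [s + 1 for s in step_count]  # [1, 2, ..., 10]
print(shift_next(next_step_count_orig, 3))
# [4, 5, 6, 7, 8, 9, 10, 10, 10, 10]
```

The clamp is what makes the last entries repeat `10`: once the episode ends, there is no later state to shift to.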

If you think this isn't the desired behaviour, I'd be happy to read what you think would be expected, but to me it looks like what multi-step requires.

AechPro · 2024-03-12T21:14:40Z

Ah-hah, my apologies for misunderstanding. Thanks for the clarification, this looks good!

AechPro · 2024-03-13T05:59:01Z

That said, if you're looking to implement the change to the n_steps parameter we discussed earlier right now, then I believe the example with 1 env should be as follows:

done:                        [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
step_count:                  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
next, step_count_orig:       [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
next, step_count:            [3, 4, 5, 6, 7, 8, 9, 10, 10, 10]

This is because, starting from state 0, we compute the return estimate as the sum of the rewards at timesteps 0, 1 and 2, and the learning algorithm then needs to bootstrap from the state reached at timestep 3.
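A minimal plain-Python sketch of the n-step return described above (an illustration, not the torchrl code): sum `n` discounted rewards starting at step `t`, then bootstrap from the value of the state reached `n` steps later, unless the episode has already terminated. `n_step_return` is a hypothetical name; `values[k]` is assumed to be the value estimate of the state at step `k`, and the trajectory is assumed to terminate after its last reward.

```python
def n_step_return(rewards, values, t, n, gamma):
    """Hypothetical helper: n-step return from step t of a single episode.

    Returns (return_estimate, bootstrap_index). If the episode ends before
    t + n, the return is truncated and no bootstrap term is added.
    """
    T = len(rewards)
    end = min(t + n, T)  # index of the state we bootstrap from (T = terminal)
    ret = sum(gamma ** (k - t) * rewards[k] for k in range(t, end))
    if end < T:  # non-terminal: add the discounted value of state s_{t+n}
        ret += gamma ** (end - t) * values[end]
    return ret, end


rewards = [1.0] * 10
values = [0.5] * 10
# From state 0 with n=3: rewards at steps 0, 1, 2, then bootstrap from state 3.
print(n_step_return(rewards, values, t=0, n=3, gamma=0.5))  # (1.8125, 3)
# The bootstrap index for every step matches the "next, step_count" row above.
print([n_step_return(rewards, values, t, 3, 0.5)[1] for t in range(10)])
# [3, 4, 5, 6, 7, 8, 9, 10, 10, 10]
```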

vmoens · 2024-03-13T09:04:59Z

So, given what you're saying, it should be:
n_steps=0 => error
n_steps=1 => no transform
n_steps=2 => 1 shift in the future
etc.
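The convention above can be sketched as a small validation step. This is a hypothetical helper for illustration, not torchrl's actual check: it reads `n_steps` as the number of rewards summed before bootstrapping, so the extra shift applied on top of the ordinary "next" entry is `n_steps - 1`.

```python
def extra_shift(n_steps):
    """Hypothetical helper (not torchrl API): map n_steps to the extra shift
    applied to the ("next", ...) entries under the convention above."""
    if isinstance(n_steps, bool) or not isinstance(n_steps, int) or n_steps < 1:
        raise ValueError(f"n_steps must be a positive integer, got {n_steps!r}")
    return n_steps - 1  # n_steps=1 -> 0, i.e. no transform


print([extra_shift(n) for n in (1, 2, 3)])  # [0, 1, 2]

# Single env, 10 steps ending in done, n_steps=3: clamp the shifted "next"
# step count at the terminal step count (10).
print([min(t + 1 + extra_shift(3), 10) for t in range(10)])
# [3, 4, 5, 6, 7, 8, 9, 10, 10, 10]

try:
    extra_shift(0)
except ValueError as err:
    print(err)  # n_steps must be a positive integer, got 0
```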

I will change that, thanks.

AechPro · 2024-03-13T14:19:58Z

I'm not sure what the right outcome for n_steps=0 is. It seems reasonable to set all of the rewards to zero, set all of the next states to the current states (i.e. step_count = t and next, step_count = t), and set the gamma values to 1. That would turn the equation return_estimates = batch["rewards"] + batch["next"]["gammas"] * value_estimator(batch["next"]["observation"]) into return_estimates = 0 + 1 * value_estimator(batch["next"]["observation"]) = value_estimator(batch["observation"]). This makes some sense if n_steps is read as the number of reward terms we incorporate before falling back on the value-function estimator: n_steps=0 would then mean using the value estimator alone, with no reward terms at all.

That said, I'm not aware of anything in the literature using n=0, and I wonder whether it would be confusing, or whether it's even necessary.
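For reference, the textbook n-step return being discussed can be written as below, with $V$ the value estimator and $\gamma$ the discount. Under the batch layout in AechPro's equation (an assumption about how the terms are stored, not a statement about torchrl's internals), `batch["rewards"]` would hold the discounted sum and `batch["next"]["gammas"]` would hold $\gamma^n$. With $n = 0$ the sum is empty and $\gamma^0 = 1$, which recovers $V(s_t)$:

```math
G_t^{(n)} = \sum_{k=0}^{n-1} \gamma^{k}\, r_{t+k} + \gamma^{n}\, V(s_{t+n}),
\qquad
G_t^{(0)} = V(s_t)
```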

vmoens · 2024-03-13T14:25:56Z

The way I see it, n=0 is equivalent to "doing nothing", which can be achieved by... doing nothing haha

AechPro · 2024-03-13T14:29:39Z

LOL 😅 yeah, it makes sense to do it that way. I was just imagining a scenario where a user is running some sort of hyperparameter study and wants to vary n from 0 to some number without changing anything in the underlying learning algorithm. That might be a realistic thing to want if someone is interested in measuring the impact of rewards on the return estimate.

vmoens · 2024-03-13T14:32:19Z

I guess they will have to start from 1 :)

I think accounting for those edge cases causes more hassle and clutters the docs while bringing little value in practice (plus it requires proper testing of a behaviour that is poorly defined even on our end!)

AechPro · 2024-03-13T14:33:45Z

Yeah, fair enough. Let's keep it as you suggested then!

vmoens · 2024-03-18T08:52:27Z

cc @agarwl this is an implementation of multi-step that allows the horizon to be changed dynamically during training

init

1118e1c

facebook-github-bot added the CLA Signed label Mar 11, 2024

amend

7550059

vmoens added the enhancement label Mar 12, 2024

vmoens added 3 commits March 12, 2024 14:34

amend

5a5f561

amend

48f0d7d

amend

260ef94

amend

e1494f6

vmoens added 4 commits March 13, 2024 16:02

amend

5cc8bc1

amend

3f1dd7a

amend

71dc49d

amend

ddfce88

vmoens marked this pull request as ready for review March 18, 2024 08:51

vmoens merged commit e3b66bb into main Mar 18, 2024
45 of 52 checks passed

vmoens deleted the void-add-extend branch March 18, 2024 08:52

SandishKumarHN pushed a commit to SandishKumarHN/rl that referenced this pull request Mar 18, 2024

[Feature] RB MultiStep transform (pytorch#2008)

a87094d


github-actions bot commented Mar 11, 2024

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 91. Improved: 9. Worsened: 9.

github-actions bot commented Mar 11, 2024

⚠ Warning: Result of GPU Benchmark Tests

Total Benchmarks: 94. Improved: 7. Worsened: 2.
