[BugFix] Remove reset on last step of a rollout #1936

matteobettini · 2024-02-20T14:11:57Z

Discussion in #1929

pytorch-bot · 2024-02-20T14:12:04Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/1936

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures

As of commit 5222496 with merge base 23bf315:

NEW FAILURES - The following jobs have failed:

Continuous Benchmark (PR) / CPU Pytest benchmark (gh)
Workflow failed! Resource not accessible by integration
Continuous Benchmark (PR) / GPU Pytest benchmark (gh)
Workflow failed! Resource not accessible by integration
Unit-tests on Windows / unittests-gpu / windows-job (gh)
Process completed with exit code 1.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vmoens

Thanks for this! Ideally we'd like a non regression test.
Also now that there is a reset missing, if the user takes the tensordict passed as output and feeds it back to rollout, rollout will need to call reset on it. I don't think this was part of rollout contract before: either a tensordict is passed or reset is called but not both (as this PR will dictate)

matteobettini · 2024-02-20T15:33:32Z

> Ideally we'd like a non regression test.

I'll work on the test.

> Also now that there is a reset missing, if the user takes the tensordict passed as output and feeds it back to rollout, rollout will need to call reset on it. I don't think this was part of rollout contract before: either a tensordict is passed or reset is called but not both (as this PR will dictate)

So this is a design choice, let me explain my view.

Before the PR

Before the PR, a user could not take the last tensordict from a rollout and safely pass it back in.

This is because if the env was reset on the last step, the data from that reset would not be part of the rollout results.

Thus, if the user passed the output of a rollout back into a new rollout, they would be providing old data from a previous trajectory to a freshly reset env, which is in a different state.

After this PR

In the current state of this PR, the user still cannot just take the last tensordict from a rollout and safely pass it back in.

Doing so would have the same effect as prior to this PR, with the difference that now at least the env is not in a new reset state.

In my opinion this should remain like this (and like it was).

If the user chooses auto_reset=False, it is the user's duty to pass non-done data as the input to a rollout (just as before).

The difference that this PR allows is that now users that want to run rollouts in a row are able to by adding the reset logic outside the rollout function (as detailed in the snippet in #1929 (comment)). Before they could not do this.
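
To make the before/after difference concrete, here is a minimal sketch that does not use torchrl at all. `ToyEnv` and `toy_rollout` are hypothetical stand-ins invented for illustration, and the `reset_after_last_step` flag exists only to contrast the two behaviours described above; it is not a real `rollout` argument.

```python
class ToyEnv:
    """Hypothetical counter env: an episode is done after `horizon` steps."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return {"obs": self.t, "done": False}

    def step(self):
        self.t += 1
        return {"obs": self.t, "done": self.t >= self.horizon}


def toy_rollout(env, max_steps, reset_after_last_step):
    """Collect `max_steps` step outputs from an env that was already reset."""
    steps = []
    for i in range(max_steps):
        out = env.step()
        steps.append(out)
        is_last = i == max_steps - 1
        # The reset output is never stored in `steps`, so resetting after the
        # final step leaves the env ahead of the returned data.
        if out["done"] and (reset_after_last_step or not is_last):
            env.reset()
    return steps


# Behaviour described as "before the PR": the env is reset after the last step.
old_env = ToyEnv()
old_env.reset()
old_steps = toy_rollout(old_env, 3, reset_after_last_step=True)
old_mismatch = old_steps[-1]["obs"] != old_env.t  # returned data and env disagree

# Behaviour after the PR: no trailing reset, so returned data and env agree.
new_env = ToyEnv()
new_env.reset()
new_steps = toy_rollout(new_env, 3, reset_after_last_step=False)
new_in_sync = new_steps[-1]["obs"] == new_env.t

# The caller can now put the reset logic between rollouts.
if new_steps[-1]["done"]:
    start = new_env.reset()
else:
    start = new_steps[-1]
```

In the first case the last returned step no longer describes the env, so nothing the caller does with it is safe. In the second case the caller sees the done flag and decides when to reset.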

vmoens · 2024-02-20T17:00:21Z

If we patch things let's patch them correctly no?
Why not have something like:

        if auto_reset:
            if tensordict is not None:
                raise RuntimeError(
                    "tensordict cannot be provided when auto_reset is True"
                )
            tensordict = self.reset()
        elif tensordict is None:
            raise RuntimeError("tensordict must be provided when auto_reset is False")
        else:
            # Reset only if the provided tensordict is in a done state.
            aggregate_reset = _aggregate_end_of_traj(tensordict)
            if aggregate_reset.any():
                self.reset(tensordict)

I don't understand the reluctance against this, does this break anything?
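
The branching proposed above can be exercised on its own with plain dictionaries. The sketch below is an illustration under stated assumptions, not the torchrl implementation: `CountingEnv` and `prepare_input` are hypothetical names, and a single boolean `"done"` entry stands in for the aggregated end-of-trajectory check.

```python
class CountingEnv:
    """Hypothetical env that only records how many times reset() was called."""

    def __init__(self):
        self.n_resets = 0

    def reset(self):
        self.n_resets += 1
        return {"obs": 0, "done": False}


def prepare_input(env, state=None, auto_reset=True):
    """Return the state a rollout should start from (toy version of the check)."""
    if auto_reset:
        if state is not None:
            raise RuntimeError("state cannot be provided when auto_reset is True")
        return env.reset()
    if state is None:
        raise RuntimeError("state must be provided when auto_reset is False")
    # Assumption: a done input should not be stepped further, so start fresh.
    if state["done"]:
        return env.reset()
    return state


env = CountingEnv()
fresh = prepare_input(env)  # auto_reset=True: always resets
running = prepare_input(env, {"obs": 2, "done": False}, auto_reset=False)  # kept as-is
restarted = prepare_input(env, {"obs": 3, "done": True}, auto_reset=False)  # reset
```

Only the first and third calls trigger a reset; a non-done input is passed through untouched.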

matteobettini · 2024-02-20T17:13:57Z

> I don't understand the reluctance against this, does this break anything?

Oh no this works too! I'm not against it.

It will just be a new feature.

I.e., rollout will check whether the passed tensordict is done and reset the env if so.

We can do it, a few considerations:

This feature should be active only when auto_reset is also set. That changes the meaning of auto_reset, since it could now be set together with an input tensordict, which gives the following cases:
- no tensordict passed -> auto_reset decides whether reset is called (like before)
- tensordict passed and auto_reset=True -> rollout checks whether the passed td is done and, if so, resets (this parameter combination was not allowed before and triggers the new feature)
- tensordict passed and auto_reset=False -> rollout does not check the input td for reset (like before)

Keeping the last case is good for backward compatibility, and there might be cases where you do not want the rollout to call a reset.
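
The three cases above can be written down as a small decision function. This is only a sketch of the semantics proposed in this comment, not of what was merged; `should_reset` is a hypothetical helper, and the error for "no input and auto_reset=False" is assumed to stay as in the snippet earlier in the thread.

```python
def should_reset(has_input: bool, auto_reset: bool, input_done: bool = False) -> bool:
    """Decide whether rollout calls reset before its first step (proposed semantics)."""
    if not has_input:
        if not auto_reset:
            # Assumed unchanged: nothing to start from and no reset requested.
            raise RuntimeError("an input is required when auto_reset is False")
        return True  # case 1: reset, like before
    if auto_reset:
        return input_done  # case 2: new combination, reset only if the input is done
    return False  # case 3: never inspect the input, like before


case_1 = should_reset(has_input=False, auto_reset=True)
case_2_done = should_reset(has_input=True, auto_reset=True, input_done=True)
case_2_running = should_reset(has_input=True, auto_reset=True, input_done=False)
case_3 = should_reset(has_input=True, auto_reset=False, input_done=True)
```

Under this reading, `auto_reset=False` remains the opt-out for users who want to step an env that is already done.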

vmoens · 2024-02-20T17:19:19Z

> It will just be a new feature.

I think we're at the edge between new feature and bug fix. If you can't pass the tensordict you got from the last step instead of calling reset safely, this would be a bugfix.

matteobettini · 2024-02-20T17:21:27Z

> If you can't pass the tensordict you got from the last step instead of calling reset safely, this would be a bugfix.

Yeah if this was supposed to be a feature before, I guess this is a bug fix

vmoens · 2024-02-20T17:22:35Z

> keeping the last case is good for bc compatibility and there might be cases where you do not want the rollout to call a reset

If the env does not allow a step after done and you pass a tensordict that is done, it will result in an error. IMO we should capture that and make sure it does not happen for ease of use. I can't think of anyone using rollout reasonably right now who would be annoyed by this change and consider it bc-breaking, but I could be overlooking things.

To me auto_reset just means reset when starting. I don't think we should change the meaning. The consideration I was putting forward is orthogonal to auto_reset IMO.

matteobettini · 2024-02-20T17:28:23Z

> If the env does not allow a step after done and you pass a tensordict that is done, it will result in an error. IMO we should capture that and make sure it does not happen for ease of use. I can't think of anyone using rollout reasonably right now who would be annoyed by this change and consider it bc-breaking, but I could be overlooking things.
>
> To me auto_reset just means reset when starting. I don't think we should change the meaning. The consideration I was putting forward is orthogonal to auto_reset IMO.

This consideration makes an assumption about the env.
There could be users that might want to take steps in done environments

Forcing a reset on these users would narrow the flexibility of rollout with respect to prior versions.
I think if we introduce this auto resetting when a td is given, it is important to make it deactivatable.

If we want to have another param to deactivate it other than auto_reset, that is fine too. I just thought that auto_reset fits nicely, as this combination was not allowed before and it is true by default.

EDIT: thinking about it, you might be right. Since rollout auto-resets anyway along the trajectory, it could make sense that this is done on the input td as well. It will add some computational cost for checking the done state though, so maybe having it optional still makes sense?

vmoens · 2024-02-20T17:47:02Z

Not sure I'm following here

Previously we had a broken version of rollout where the reset data of the last step was potentially lost. Anyone wanting to do anything with this was doomed to fail. I don't think that in this scenario working with envs that do not reset when done was an option. So this scenario can be excluded entirely and considered as a separate issue.

If we want to land this, to me having it work with tensordicts that are in a done state is a prerequisite. The change I'm proposing isn't bc-breaking in any way since it builds upon a bug fix: anyone who was doing anything with rollout before can only have fewer bugs now.

The runtime won't be affected since the check I'm suggesting is already performed by step_and_maybe_reset, which we now replace by step.

matteobettini · 2024-02-20T18:04:30Z

There is something that we still did not discuss.

If you really want to chain calls of rollout, then the function should call self.maybe_reset on the input dict (like in the current version of the PR), which includes step_mdp.

Or are we assuming that users call step_mdp between rollout calls? (which I guess was the assumption before?)

vmoens · 2024-02-20T18:51:08Z

It's safe to assume that users call step_mdp because the contract is that the tensordict you pass is similar to the one you'd get out of a call to reset
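
As a rough illustration of that contract, the sketch below uses plain dictionaries instead of TensorDict objects. `toy_step_mdp` is a hypothetical, heavily simplified stand-in for `step_mdp`: it only promotes the entries stored under `"next"` to the root and drops the reward, and it models none of the real function's options.

```python
def toy_step_mdp(td: dict) -> dict:
    """Promote the entries under "next" to the root, dropping the reward."""
    return {key: value for key, value in td["next"].items() if key != "reward"}


# Last transition of a rollout: root entries describe time t, "next" describes t+1.
last_transition = {
    "obs": 2,
    "action": 1,
    "done": False,
    "next": {"obs": 3, "reward": 1.0, "done": True},
}

# The result has the same layout as a reset output (observation plus done flag),
# which is the shape of input that rollout expects.
next_input = toy_step_mdp(last_transition)
needs_reset = next_input["done"]
```

Because the done flag ends up at the root, a subsequent rollout call can inspect it and reset the env before stepping.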

matteobettini · 2024-02-20T19:01:51Z

Ok if you wanna have a look now it should do what we want

vmoens

LGTM, just missing a docstring and a test
(maybe also writing a note in the rollout docstring about all of this? Like "how to use rollout as a data collector" or similar?)

matteobettini · 2024-02-21T09:57:35Z

Further question:

if break_when_any_done is true, we still do the reset and ignore it?

matteobettini · 2024-02-21T10:40:22Z

> LGTM, just missing a docstring and a test (maybe also writing a note in the rollout docstring about all of this? Like "how to use rollout as a data collector" or similar?)

We should be gucci!

Also, I still haven't come around the problem in #1929, so currently rollout is still not working for me as a data collector.
Have to see if it is a bug of my brain or a very subtle bug of the library.

Nevertheless, this can be merged.

torchrl/envs/common.py

…o fix_rollout

Co-authored-by: vmoens <vincentmoens@gmail.com>

Amend

016fec1

facebook-github-bot added the CLA Signed label Feb 20, 2024

matteobettini changed the title ~~[BugFix] Remove rollout reset on last step~~ [BugFix] Remove reset on last step of a rollout Feb 20, 2024

vmoens reviewed Feb 20, 2024

vmoens added the bug and Suitable for minor labels Feb 20, 2024

Amend

3e16d9f

Merge remote-tracking branch 'origin/main' into fix_rollout

611155a

matteobettini added 2 commits February 20, 2024 19:00

Amend

8d9bb83

Merge remote-tracking branch 'origin/fix_rollout' into fix_rollout

009fb79

vmoens reviewed Feb 20, 2024

vmoens added 2 commits February 20, 2024 13:29

Merge remote-tracking branch 'origin/main' into fix_rollout

4943eff

Merge remote-tracking branch 'origin/main' into fix_rollout

ad6b8d4

matteobettini added 4 commits February 21, 2024 10:10

docs

8289d04

docs

6c3fa98

docs

a84090a

docs

0d7ea9e

matteobettini added 3 commits February 21, 2024 10:27

docs

31d74d5

docs

cd3aaed

amend

79790da

vmoens reviewed Feb 21, 2024

vmoens added 10 commits February 21, 2024 17:40

Update torchrl/envs/common.py

447ee55

Update torchrl/envs/common.py

8d94a5c

Update torchrl/envs/common.py

621afd5

Update torchrl/envs/common.py

77374e0

Update torchrl/envs/common.py

47f02ec

Update torchrl/envs/common.py

7ae8092

Update torchrl/envs/common.py

aca27c5

Merge remote-tracking branch 'origin/main' into fix_rollout

a7d57f1

Merge branch 'fix_rollout' of https://github.com/matteobettini/rl int…

440a736

…o fix_rollout

lint

5222496

vmoens merged commit 03f4aa3 into pytorch:main Feb 21, 2024
65 of 68 checks passed

matteobettini deleted the fix_rollout branch February 21, 2024 22:11

vmoens added a commit that referenced this pull request Feb 28, 2024

[BugFix] Remove reset on last step of a rollout (#1936)

5e03a4c

Co-authored-by: vmoens <vincentmoens@gmail.com>

vmoens mentioned this pull request Feb 28, 2024

[NOMERG] Annoyingly creating a PR to log in failed jobs nodes through ssh #1979

Closed
