[BugFix] DDPG select also critic input for actor loss #1563
Makes sense, but not sure we cover all use cases.
Can we add a test in TestDDPG?
Can we also change line 243:
self._in_keys = list(set(keys))
into
self._in_keys = sorted(set(keys), key=str)
?
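To illustrate the suggestion, here is a minimal sketch in plain Python of why sorting on the string form helps. The key names are made up for illustration and are not taken from the PR. Keys can be plain strings or tuples (nested keys), so a bare `sorted()` would raise a `TypeError` when comparing a `str` with a `tuple`, while `list(set(keys))` gives no guaranteed order.

```python
# Keys may be plain strings or tuples of strings (nested keys).
# These names are illustrative only.
keys = ["observation", ("next", "reward"), "action", ("next", "done"), "observation"]

# set() drops the duplicate but iterates in no guaranteed order;
# sorting on the string form yields a reproducible list even when
# strings and tuples are mixed.
in_keys = sorted(set(keys), key=str)
print(in_keys)
# [('next', 'done'), ('next', 'reward'), 'action', 'observation']
```

Tuples sort first here only because their string form starts with `(`; the point is determinism, not any particular order.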
We should be there; now I just need to write tests.
unravel_key(("next", self.tensor_keys.reward)),
unravel_key(("next", self.tensor_keys.done)),
we should check but since it's a TensorDictModuleBase the in_keys are unravelled by default
https://github.com/pytorch-labs/tensordict/blob/accd8a4a31ec749f52e75a87a875424652069163/tensordict/nn/common.py#L474-L495
So I think we can spare the effort of doing that here
but even if they are already unraveled, we are creating a new tuple ("next", already_unraveled_key)
which could be
("next","done")
or ("next",("nested", "done"))
that is why I am only unraveling the ones where we are prepending "next"
look at the link: they will be unravelled after you set them in theory
funky stuff! I'll make sure to test it
Ah maybe not for properties, this one's harder actually
Let's keep it as it is
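The nested-key concern raised in this thread can be sketched in plain Python. `flatten_key` below is a hypothetical stand-in written for illustration, not the tensordict `unravel_key` implementation; collapsing a single-element tuple to a string is an assumption of this sketch.

```python
def flatten_key(key):
    """Hypothetical helper: flatten a possibly nested key.

    A key is either a string or a (possibly nested) tuple of strings.
    """
    if isinstance(key, str):
        return key
    flat = []
    for part in key:
        sub = flatten_key(part)
        if isinstance(sub, str):
            flat.append(sub)
        else:
            flat.extend(sub)
    # Assumed convention: a one-element tuple collapses to its string.
    return flat[0] if len(flat) == 1 else tuple(flat)


# Prepending "next" to a flat key gives an already-flat tuple ...
print(flatten_key(("next", "done")))              # ('next', 'done')
# ... but prepending it to a nested key creates a nested tuple
# that needs flattening again.
print(flatten_key(("next", ("nested", "done"))))  # ('next', 'nested', 'done')
```

This shows why wrapping an already-flat key in a new `("next", key)` tuple can reintroduce nesting when the configured key is itself a tuple.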
Signed-off-by: Matteo Bettini <matbet@meta.com>
Previously, only the actor input and output keys were selected when computing the actor loss (which involves a forward pass through the critic).
This is limiting, as the critic can have extra inputs (for example, a fully observable state or a centralized state in MARL).
This PR fixes that by extending the selected keys to include the critic inputs.
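The effect of extending the selected keys can be sketched with plain dictionaries. This is an illustration of the idea only, not the TorchRL code: `select_keys`, the key names, and the batch contents are all hypothetical.

```python
def select_keys(data, *key_groups):
    """Hypothetical helper: keep only the entries named in any key group."""
    wanted = sorted({k for group in key_groups for k in group}, key=str)
    return {k: data[k] for k in wanted if k in data}


# Illustrative key sets: the critic needs an extra "state" input.
actor_in_keys = ["observation"]
actor_out_keys = ["action"]
critic_in_keys = ["observation", "action", "state"]

batch = {
    "observation": [0.1, 0.2],
    "action": [1.0],
    "state": [0.3, 0.4, 0.5],
    ("next", "reward"): [1.0],
}

# Selecting only actor keys drops "state", so the critic pass would fail.
before = select_keys(batch, actor_in_keys, actor_out_keys)
# Adding the critic inputs keeps everything the critic needs.
after = select_keys(batch, actor_in_keys, actor_out_keys, critic_in_keys)

print(list(before))  # ['action', 'observation']
print(list(after))   # ['action', 'observation', 'state']
```

Entries that neither module reads, such as `("next", "reward")` here, are still left out of the selection.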