[Feature] SAC compatibility with composite distributions. #2447
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2447
Note: Links to docs will display an error until the docs builds have been completed.
❌ 12 New Failures, 7 Unrelated Failures
As of commit 53ba371 with merge base f411f93:
NEW FAILURES - The following jobs have failed:
BROKEN TRUNK - The following jobs failed but were present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@@ -251,7 +264,7 @@ class _AcceptedKeys:
 state_action_value (NestedKey): The input tensordict key where the
 state action value is expected. Defaults to ``"state_action_value"``.
 log_prob (NestedKey): The input tensordict key where the log probability is expected.
-Defaults to ``"_log_prob"``.
+Defaults to ``"sample_log_prob"``.
I renamed this key to make it work with the new API. It did not seem to break anything, but I imagine there was a reason for the original name.
@@ -450,9 +463,7 @@ def target_entropy(self):
 else:
 action_container_shape = action_spec.shape
 target_entropy = -float(
action_spec[self.tensor_keys.action] |
I had to change this too. Shouldn't the action spec already have the correct shape?
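For context, a common SAC heuristic sets the automatic target entropy to minus the number of action dimensions. The sketch below shows one plausible way to extend that heuristic to a composite action, by summing the element counts of every component. The function name `auto_target_entropy` and the `shapes` mapping are hypothetical, introduced only for illustration; this is not the TorchRL implementation and does not reflect what the PR actually computes.

```python
import math


def auto_target_entropy(action_shapes):
    """Return minus the total number of action elements.

    `action_shapes` is a hypothetical mapping from component name to
    the shape tuple of that component (batch dimensions excluded).
    """
    # math.prod(()) == 1, so a scalar component counts as one element.
    total = sum(math.prod(shape) for shape in action_shapes.values())
    return -float(total)


# Hypothetical composite action with two components.
shapes = {"steer": (1,), "gripper": (2, 3)}
target = auto_target_entropy(shapes)
```

Whether a single summed target or one target per component is preferable is a design question this sketch does not settle.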
886705d to 53ba371
Description
This PR adapts the SAC objective to be compatible with composite distributions.
DiscreteSAC seems somewhat trickier, as it requires computing the log probabilities of all possible actions. For composite distributions, this might mean computing the probabilities of all combinations of the component actions.
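To make the combinatorial concern concrete, here is a minimal pure-Python sketch that enumerates the joint log-probabilities of two categorical action heads. It assumes the heads are independent, so the joint log-probability is the sum of the per-head log-probabilities. The head names (`"move"`, `"fire"`), the logits, and the helper `joint_log_probs` are made up for illustration and are not part of TorchRL.

```python
import itertools
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def joint_log_probs(heads):
    """Enumerate joint log-probs for independent categorical heads.

    `heads` maps a (hypothetical) head name to its list of logits.
    Returns the head names and a dict mapping each tuple of action
    indices to the joint log-probability of that combination.
    """
    names = list(heads)
    per_head = [log_softmax(heads[name]) for name in names]
    joint = {}
    # Cartesian product of the action indices of every head.
    for combo in itertools.product(*(range(len(lp)) for lp in per_head)):
        joint[combo] = sum(lp[i] for lp, i in zip(per_head, combo))
    return names, joint


names, joint = joint_log_probs({"move": [0.0, 1.0, 2.0], "fire": [0.5, -0.5]})
total = sum(math.exp(lp) for lp in joint.values())
```

The number of entries in `joint` is the product of the head sizes (here 3 × 2), so the table grows multiplicatively with every head added to the composite action.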