Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BugFix] Multiagent "auto" entropy fix in SAC #1494

Merged
merged 4 commits into from
Sep 7, 2023

Conversation

matteobettini
Copy link
Contributor

In SAC, when the entropy target is set to "auto", it is computed using the shape of the action.
In multi-agent settings this shape included the number of agents, which should not be the case.

This PR fixes that

Signed-off-by: Matteo Bettini <matbet@meta.com>
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 6, 2023
@vmoens vmoens added bug Something isn't working Refactoring Refactoring of an existing feature labels Sep 6, 2023
Copy link
Contributor

@vmoens vmoens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we also address this in other algos?
TD3, CQL, REDQ, DTs?

torchrl/objectives/sac.py Outdated Show resolved Hide resolved
matteobettini and others added 3 commits September 6, 2023 14:22
Co-authored-by: Vincent Moens <vincentmoens@gmail.com>
Signed-off-by: Matteo Bettini <matbet@meta.com>
Signed-off-by: Matteo Bettini <matbet@meta.com>
@matteobettini
Copy link
Contributor Author

Shall we also address this in other algos? TD3, CQL, REDQ, DTs?

done

Copy link
Contributor

@vmoens vmoens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could run a quick check in the test by passing composite actions to the losses and look at the target entropy?
Also: what I understand is that the target entropy is the number of agents.
Why is it that for single agent the entropy is the size of the action (eg, an action of size 100 has a very low target entropy) but for batched actions (not always MA but sometimes just composite actions) the target entropy is the number of actions?
How do we compute the entropy? Sum of the entropies?

I don't see the math behind these changes

@matteobettini
Copy link
Contributor Author

Also: what I understand is that the target entropy is the number of agents. Why is it that for single agent the entropy is the size of the action (eg, an action of size 100 has a very low target entropy) but for batched actions (not always MA but sometimes just composite actions) the target entropy is the number of actions? How do we compute the entropy? Sum of the entropies?

I don't see the math behind these changes

The purpose of this change is to exclude the agents from the entropy calculation.

there are no changes in the math here.
The target entropy is always computed using the number of actions.
If i have an action with shape [3] it is 3, if i have a multidimnesional action [3,4] it is 12.

Therefore, we need to remove the batch size before computing this multiplication, otherwise the batch size will influence it.
In this case i am removing the batch size of the composite spec containing the action from the action itself (which removes the number of agents).

@vmoens
Copy link
Contributor

vmoens commented Sep 7, 2023

Oh ok my mistake I understood something else!

@vmoens vmoens merged commit 4c50f1e into pytorch:main Sep 7, 2023
@matteobettini matteobettini deleted the fix_sac_multiagent_auto_entropy branch September 7, 2023 14:53
vmoens added a commit to hyerra/rl that referenced this pull request Oct 10, 2023
Signed-off-by: Matteo Bettini <matbet@meta.com>
Co-authored-by: Vincent Moens <vincentmoens@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Refactoring Refactoring of an existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants