[BugFix] Multiagent "auto" entropy fix in SAC #1494
Shall we also address this in other algos?
TD3, CQL, REDQ, DTs?
Done.
We could run a quick check in the tests by passing composite actions to the losses and looking at the target entropy.
Also, my understanding is that the target entropy is the number of agents.
Why is it that for a single agent the target entropy is the size of the action (e.g., an action of size 100 has a very low target entropy), but for batched actions (not always multi-agent, sometimes just composite actions) the target entropy is the number of actions?
How do we compute the entropy? As the sum of the entropies?
I don't see the math behind these changes.
The purpose of this change is to exclude the agents from the entropy calculation; there are no changes to the math here. Therefore, we need to remove the batch size before computing this multiplication, otherwise the batch size will influence it.
Oh ok, my mistake, I understood something else!
Signed-off-by: Matteo Bettini <matbet@meta.com> Co-authored-by: Vincent Moens <vincentmoens@gmail.com>
In SAC, when the entropy target is set to "auto", it is computed using the shape of the action.
In multi-agent settings this shape included the number of agents, which should not be the case.
This PR fixes that.
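
To illustrate the issue, here is a minimal sketch in plain Python. It is not the code from this PR: the helper `auto_target_entropy` and its `n_leading_dims` parameter are hypothetical, and it assumes the common SAC heuristic in which the "auto" target entropy is the negative product of the action shape.

```python
import math


def auto_target_entropy(action_shape, n_leading_dims=0):
    """Hypothetical helper (not the library's API).

    Assumes the "auto" target entropy is -prod(action shape).
    `n_leading_dims` is the number of leading dimensions (batch or
    agent dimensions) to drop before taking the product.
    """
    per_agent_shape = tuple(action_shape)[n_leading_dims:]
    return -float(math.prod(per_agent_shape))


# Single agent with a 2-dimensional action.
single = auto_target_entropy((2,))

# 3 agents, each with a 2-dimensional action.
# Using the full shape lets the number of agents leak into the target.
with_agents = auto_target_entropy((3, 2))

# Dropping the leading agent dimension recovers the single-agent value.
without_agents = auto_target_entropy((3, 2), n_leading_dims=1)

print(single, with_agents, without_agents)
```

With the agent dimension excluded, the target entropy depends only on the per-agent action size, so it matches the single-agent case regardless of how many agents there are.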