[FEATURE] HAPPO, and HetIPPO, HetMAPPO, ... Implementations #1150
Comments
@HenningBeyer check PR #1151, which tries out a basic implementation. Would love your feedback!
Hi @HenningBeyer, thanks for the issue. We have tried HAPPO in the past and found that it doesn't perform well, which is why we don't currently have it in Mava. If you'd like to contribute it you're welcome to; just know that we'd like to fully benchmark it and see that it performs comparably to MAPPO/its paper results before accepting it. As for heterogeneous algorithms, we don't currently support this, but I think it would be a good addition to the Mava roadmap (cc @RuanJohn). If you're looking to implement this, simply…
Hi @sash-a, thank you for your insights. The performance of MAPPO and HAPPO mostly matched in the HARL paper and follow-up papers, where HAPPO performs equal to or barely better than HetMAPPO/MAPPO in ~75% of cases, so I suspect it's some small detail that's missing. Otherwise, HAPPO outperformed MAPPO on highly heterogeneous and complex tasks such as Humanoid-v2 17x1, ShadowHandCatchOver2Underarm, or ShadowHandPen, while HAPPO and HetMAPPO achieve very similar performance on simple homogeneous tasks like MPE in the HARL paper. So HAPPO might mainly be relevant for very complex, highly heterogeneous CTDE MARL tasks, where HASAC also does well.
In my case, I was looking for a less memory-intensive alternative to HASAC for large-scale MARL simulations, one that conveniently fits on a single GPU and transfers more easily to a real-world online-learning setup without HASAC's big replay buffer. MAPPO/HetMAPPO plus the recurrent variants should suffice here, although testing HAPPO in simulation would be interesting too. PQN (CTCE) or PQN-VDN (CTDE) also look interesting for this use case; PQN-VDN is currently implemented in JaxMARL and seems competitive with MAPPO (based on the paper's results).
Feature
HAPPO seems to be missing from the current implementation, being either WIP or discarded, even though HASAC is already implemented in Mava. So implementing HAPPO + HAPPO_rec would be a relevant feature.
Heterogeneous-agent versions of any I-type and MA-type agent would also be interesting, giving heterogeneous counterparts to agents like IPPO and MAPPO that could tackle tasks which require heterogeneity and which the standard IPPO/MAPPO algorithms cannot solve properly.
Proposal
I also looked through the Mava code and can't spot any barrier to implementing this yet. One should essentially be able to implement this in the actor_loss_fn, similarly to HASAC.
This documentation source might be helpful as additional context for HAPPO.
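For concreteness, here is a minimal sketch of what the sequential HAPPO update could look like in JAX. Everything here is an assumption for illustration, not Mava's actual API: `policy_logp` is a hypothetical helper returning per-timestep log-probabilities, per-agent batches are plain lists, and the Python-level loop would need to become a `jax.lax.scan` to be jit-compatible. The point is only the HAPPO-specific part: agents are updated in a random order, and each agent's advantage is rescaled by the compounded importance ratios of the agents updated before it.

```python
import jax
import jax.numpy as jnp
import numpy as np
import optax

def happo_actor_update(agent_params, opt_states, optimizer, policy_logp,
                       obs, actions, old_logps, advantages, clip_eps=0.2, rng=None):
    """One sequential HAPPO actor update over all agents in a random order.

    Assumed (hypothetical) interfaces:
      policy_logp(params, obs, actions) -> log pi(a|o), shape [T]
      obs[i], actions[i], old_logps[i]: per-agent batches, shape [T, ...]
      advantages: shared advantages from the centralised critic, shape [T]
    """
    rng = np.random.default_rng() if rng is None else rng
    order = rng.permutation(len(agent_params))  # random agent order each iteration
    m = jnp.ones_like(advantages)               # compounding correction factor M

    def loss_fn(params, i, m):
        logp = policy_logp(params, obs[i], actions[i])
        ratio = jnp.exp(logp - old_logps[i])
        adv = m * advantages                    # HAPPO: scale A by earlier agents' ratios
        clipped = jnp.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
        return -jnp.mean(jnp.minimum(ratio * adv, clipped))

    for i in order:
        grads = jax.grad(loss_fn)(agent_params[i], i, m)
        updates, opt_states[i] = optimizer.update(grads, opt_states[i], agent_params[i])
        agent_params[i] = optax.apply_updates(agent_params[i], updates)
        # Fold agent i's post-update ratio into M for the next agent in the order.
        new_logp = policy_logp(agent_params[i], obs[i], actions[i])
        m = m * jnp.exp(new_logp - old_logps[i])

    return agent_params, opt_states
```

With `m` fixed to 1 and a single simultaneous update, this reduces to the usual clipped MAPPO loss, which is why the change should slot into the existing actor_loss_fn structure.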
To implement Het- agent versions, simply provide an option/implementation without parameter sharing for I- and MA-type MARL agents (see the sketch below). Recurrent versions for the Het-type agents might be kept as well.
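As a sketch of what that parameter-sharing switch could look like (hypothetical names and config flag, not Mava's actual setup), assuming a Flax actor network: the shared case keeps one parameter pytree for all agents, while the Het- case initialises one parameter set per agent, stacked along a leading agent axis and applied with `vmap`.

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class Actor(nn.Module):
    action_dim: int

    @nn.compact
    def __call__(self, obs):
        x = nn.relu(nn.Dense(64)(obs))
        return nn.Dense(self.action_dim)(x)

def init_actor_params(key, actor, obs_dim, num_agents, share_params):
    dummy_obs = jnp.zeros((obs_dim,))
    if share_params:
        # Homogeneous (IPPO/MAPPO-style): one parameter pytree for all agents.
        return actor.init(key, dummy_obs)
    # Heterogeneous (Het-) variant: independently initialised parameters per
    # agent, stacked along a leading agent axis.
    keys = jax.random.split(key, num_agents)
    return jax.vmap(lambda k: actor.init(k, dummy_obs))(keys)

# Forward pass (per_agent_obs: [num_agents, obs_dim]):
#   shared:        logits = actor.apply(params, per_agent_obs)        # broadcast
#   heterogeneous: logits = jax.vmap(actor.apply)(params, per_agent_obs)
```

Since the shared and heterogeneous cases differ only in initialisation and the `vmap` over the agent axis, a single config flag could plausibly turn each existing I-/MA-type system into its Het- counterpart.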
Definition of done
Implemented HAPPO, HAPPO_rec, HetIPPO, HetIPPO_rec, HetMAPPO_rec, HetISAC, HetMASAC.
One can do the same with IQL and QMIX, giving HetQMix_rec and HetIQL_rec. I wonder why there are no non-recurrent IQL/QMIX variants; I guess they do not perform as well as the recurrent variants, but it could still be interesting to have a classic IQL/QMIX version for testing. [OPTIONAL]
In general, this is a lot of ideas/tasks. Feel free to just implement what seems relevant to you.