[FEATURE] HAPPO, and HetIPPO, HetMAPPO, ... Implementations #1150
Comments
@HenningBeyer check PR #1151, which tries out a basic implementation. Would love your feedback!
Hi @HenningBeyer, thanks for the issue. We have tried HAPPO in the past and found that it doesn't perform well, which is why we don't currently have it in Mava. If you'd like to contribute it you're welcome to; just know that we'd like to fully benchmark it and see that it performs comparably to MAPPO/its paper results before accepting it. As for heterogeneous algorithms, we don't currently support this, but I think it would be a good addition to the Mava roadmap (cc @RuanJohn). If you're looking to implement this, simply…
Hi @sash-a, thank you for your insights. The performance of MAPPO and HAPPO mostly matched in the HARL paper and follow-up papers, where HAPPO performs equal to or barely better than HetMAPPO/MAPPO in ~75% of cases, so I suspect it's some small detail that's missing. Otherwise, HAPPO outperformed MAPPO on highly heterogeneous and complex tasks such as Humanoid-v2 17x1, ShadowHandCatchOver2Underarm, or ShadowHandPen, while HAPPO and HetMAPPO achieve very similar performance on simple homogeneous tasks like MPE in the HARL paper. So HAPPO might mainly be relevant for very complex, highly heterogeneous CTDE MARL tasks, where HASAC also does well.
In my case, I was looking for a less memory-intensive alternative to HASAC for large-scale MARL simulations, one that conveniently fits on a single GPU and transfers more easily to a real-world online-learning setup without HASAC's big replay buffer. MAPPO/HetMAPPO plus the recurrent variants should suffice here, although testing HAPPO in simulation would be interesting too. PQN (CTCE) or PQN-VDN (CTDE) also look interesting for this use case; PQN-VDN is currently implemented in JaxMARL and seems competitive with MAPPO (based on the paper's results).
Feature
HAPPO seems to be missing from the current implementation, being either WIP or discarded, even though HASAC is already implemented in Mava. So implementing HAPPO + HAPPO_rec would be a relevant feature.
Heterogeneous-agent versions of any I-type and MA-type agent would also be interesting, giving heterogeneous counterparts to agents like IPPO and MAPPO that could tackle tasks which require heterogeneity and which the standard IPPO/MAPPO algorithms cannot solve properly.
Proposal
I also looked through the Mava code and can't spot any barrier to implementing this yet. One should essentially be able to implement this in the actor_loss_fn, similarly to HASAC.
This documentation source might be helpful as additional context for HAPPO.
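For concreteness, here is a minimal sketch of what the sequential HAPPO update could look like in JAX. Everything here is an assumption for illustration, not Mava's actual API: `policy_logp` is a hypothetical helper returning per-timestep log-probabilities, per-agent batches are plain lists, and the Python-level loop would need to become a `jax.lax.scan` to be jit-compatible. The point is only the HAPPO-specific part: agents are updated in a random order, and each agent's advantage is rescaled by the compounded importance ratios of the agents updated before it.

```python
import jax
import jax.numpy as jnp
import numpy as np
import optax

def happo_actor_update(agent_params, opt_states, optimizer, policy_logp,
                       obs, actions, old_logps, advantages, clip_eps=0.2, rng=None):
    """One sequential HAPPO actor update over all agents in a random order.

    Assumed (hypothetical) interfaces:
      policy_logp(params, obs, actions) -> log pi(a|o), shape [T]
      obs[i], actions[i], old_logps[i]: per-agent batches, shape [T, ...]
      advantages: shared advantages from the centralised critic, shape [T]
    """
    rng = np.random.default_rng() if rng is None else rng
    order = rng.permutation(len(agent_params))  # random agent order each iteration
    m = jnp.ones_like(advantages)               # compounding correction factor M

    def loss_fn(params, i, m):
        logp = policy_logp(params, obs[i], actions[i])
        ratio = jnp.exp(logp - old_logps[i])
        adv = m * advantages                    # HAPPO: scale A by earlier agents' ratios
        clipped = jnp.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
        return -jnp.mean(jnp.minimum(ratio * adv, clipped))

    for i in order:
        grads = jax.grad(loss_fn)(agent_params[i], i, m)
        updates, opt_states[i] = optimizer.update(grads, opt_states[i], agent_params[i])
        agent_params[i] = optax.apply_updates(agent_params[i], updates)
        # Fold agent i's post-update ratio into M for the next agent in the order.
        new_logp = policy_logp(agent_params[i], obs[i], actions[i])
        m = m * jnp.exp(new_logp - old_logps[i])

    return agent_params, opt_states
```

With `m` fixed to 1 and a single simultaneous update, this reduces to the usual clipped MAPPO loss, which is why the change should slot into the existing actor_loss_fn structure.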
To implement Het- agent versions, simply provide an option/implementation without parameter sharing for I- and MA-type MARL agents (see the sketch below). Recurrent versions for the Het-type agents might be kept as well.
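As a sketch of what that parameter-sharing switch could look like (hypothetical names and config flag, not Mava's actual setup), assuming a Flax actor network: the shared case keeps one parameter pytree for all agents, while the Het- case initialises one parameter set per agent, stacked along a leading agent axis and applied with `vmap`.

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class Actor(nn.Module):
    action_dim: int

    @nn.compact
    def __call__(self, obs):
        x = nn.relu(nn.Dense(64)(obs))
        return nn.Dense(self.action_dim)(x)

def init_actor_params(key, actor, obs_dim, num_agents, share_params):
    dummy_obs = jnp.zeros((obs_dim,))
    if share_params:
        # Homogeneous (IPPO/MAPPO-style): one parameter pytree for all agents.
        return actor.init(key, dummy_obs)
    # Heterogeneous (Het-) variant: independently initialised parameters per
    # agent, stacked along a leading agent axis.
    keys = jax.random.split(key, num_agents)
    return jax.vmap(lambda k: actor.init(k, dummy_obs))(keys)

# Forward pass (per_agent_obs: [num_agents, obs_dim]):
#   shared:        logits = actor.apply(params, per_agent_obs)        # broadcast
#   heterogeneous: logits = jax.vmap(actor.apply)(params, per_agent_obs)
```

Since the shared and heterogeneous cases differ only in initialisation and the `vmap` over the agent axis, a single config flag could plausibly turn each existing I-/MA-type system into its Het- counterpart.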
Definition of done
Implemented HAPPO, HAPPO_rec, HetIPPO, HetIPPO_rec, HetMAPPO_rec, HetISAC, HetMASAC.
One can do the same with IQL and QMIX, giving HetQMix_rec and HetIQL_rec. I wonder why there are no non-recurrent IQL/QMIX variants; I guess they do not perform as well as the recurrent variants, but it could still be interesting to have a classic IQL/QMIX version for testing. [OPTIONAL]
In general, this is a lot of ideas/tasks. Feel free to just implement what seems relevant to you.