We provide the following four multi-agent extensions to PPO, all following the Anakin architecture.
In all cases, IPPO indicates an implementation following the independent-learners MARL paradigm, while MAPPO indicates that the implementation follows the centralised training with decentralised execution (CTDE) paradigm by using a centralised critic during training. The ff or rec suffix in a system's name indicates whether the policy networks are MLPs or include a GRU memory module to aid learning under partial observability in the environment.
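The key structural difference between the two critic styles can be sketched in JAX as follows. This is an illustrative toy, not Mava's actual networks: the weights, shapes, and agent count are all hypothetical, and linear maps stand in for the real value networks.

```python
import jax
import jax.numpy as jnp

num_agents, obs_dim = 3, 4
key = jax.random.PRNGKey(0)
obs = jax.random.normal(key, (num_agents, obs_dim))  # per-agent observations

# Independent (IPPO-style) critic: one value per agent from its own observation.
w_indep = jnp.ones((obs_dim,))            # hypothetical critic weights
indep_values = obs @ w_indep              # shape (num_agents,)

# Centralised (MAPPO-style) critic: a value from the concatenated joint observation,
# available only during training under CTDE.
joint_obs = obs.reshape(-1)               # shape (num_agents * obs_dim,)
w_central = jnp.ones((num_agents * obs_dim,))
central_value = joint_obs @ w_central     # a single scalar baseline
```

The point is only the input signature: the independent critic sees one agent's observation at a time, while the centralised critic conditions on all agents' observations jointly.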
In addition to the Anakin-based implementations, we also include a Sebulba-based implementation of ff-IPPO, which can be used with environments that are not written in JAX, provided they adhere to the Gymnasium API.
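For reference, the Gymnasium API such environments must follow is `reset() -> (obs, info)` and `step(action) -> (obs, reward, terminated, truncated, info)`. The stub below is a hypothetical toy environment, not a Sebulba component, showing the interaction loop a non-JAX environment would be driven through:

```python
class CountingEnv:
    """Toy Gymnasium-style env: episode ends after 3 steps, reward 1.0 per step."""

    def reset(self, seed=None):
        self.t = 0
        return self.t, {}          # (observation, info)

    def step(self, action):
        self.t += 1
        terminated = self.t >= 3   # natural episode end
        truncated = False          # no time-limit truncation in this toy
        return self.t, 1.0, terminated, truncated, {}


env = CountingEnv()
obs, info = env.reset(seed=0)
episode_return = 0.0
terminated = truncated = False
while not (terminated or truncated):
    obs, reward, terminated, truncated, info = env.step(0)
    episode_return += reward
```

Any environment exposing these two methods with these return signatures can, in principle, be plugged into the Sebulba-based ff-IPPO system.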