We provide the following four multi-agent extensions to PPO, all following the Anakin architecture.
In all cases, IPPO indicates an implementation following the independent-learners MARL paradigm, while MAPPO indicates that the implementation follows the centralised training with decentralised execution (CTDE) paradigm by using a centralised critic during training. The ff or rec suffix in a system's name indicates whether the policy networks are MLPs or include a GRU memory module to aid learning under partial observability in the environment.
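The key structural difference between the two critic styles can be sketched in JAX as follows. This is an illustrative toy, not Mava's actual networks: the weights, shapes, and agent count are all hypothetical, and linear maps stand in for the real value networks.

```python
import jax
import jax.numpy as jnp

num_agents, obs_dim = 3, 4
key = jax.random.PRNGKey(0)
obs = jax.random.normal(key, (num_agents, obs_dim))  # per-agent observations

# Independent (IPPO-style) critic: one value per agent from its own observation.
w_indep = jnp.ones((obs_dim,))            # hypothetical critic weights
indep_values = obs @ w_indep              # shape (num_agents,)

# Centralised (MAPPO-style) critic: a value from the concatenated joint observation,
# available only during training under CTDE.
joint_obs = obs.reshape(-1)               # shape (num_agents * obs_dim,)
w_central = jnp.ones((num_agents * obs_dim,))
central_value = joint_obs @ w_central     # a single scalar baseline
```

The point is only the input signature: the independent critic sees one agent's observation at a time, while the centralised critic conditions on all agents' observations jointly.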
In addition to the Anakin-based implementations, we also include a Sebulba-based implementation of ff-IPPO, which can be used with environments that are not written in JAX, provided they adhere to the Gymnasium API.
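For reference, the Gymnasium API such environments must follow is `reset() -> (obs, info)` and `step(action) -> (obs, reward, terminated, truncated, info)`. The stub below is a hypothetical toy environment, not a Sebulba component, showing the interaction loop a non-JAX environment would be driven through:

```python
class CountingEnv:
    """Toy Gymnasium-style env: episode ends after 3 steps, reward 1.0 per step."""

    def reset(self, seed=None):
        self.t = 0
        return self.t, {}          # (observation, info)

    def step(self, action):
        self.t += 1
        terminated = self.t >= 3   # natural episode end
        truncated = False          # no time-limit truncation in this toy
        return self.t, 1.0, terminated, truncated, {}


env = CountingEnv()
obs, info = env.reset(seed=0)
episode_return = 0.0
terminated = truncated = False
while not (terminated or truncated):
    obs, reward, terminated, truncated, info = env.step(0)
    episode_return += reward
```

Any environment exposing these two methods with these return signatures can, in principle, be plugged into the Sebulba-based ff-IPPO system.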