(AAAI24 oral) Implementation of RPPO (risk-sensitive PPO) and RPBT (population-based self-play with RPPO)
competition ppo population-based-training self-play multi-agent-reinforcement-learning risk-sensitive-preferences reinforcment-learning
Updated May 22, 2023 - Python