Based on PARL, this example reproduces the PPO algorithm of deep reinforcement learning and reaches the same level of performance as the paper on Mujoco benchmarks.
It includes the following approaches:
- Clipped Surrogate Objective
- Adaptive KL Penalty Coefficient
Paper: PPO in Proximal Policy Optimization Algorithms
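The two approaches listed above can be illustrated with a minimal, framework-free sketch. It is not the PARL implementation: the function names, the `epsilon=0.2` default, and the plain-list inputs are assumptions made for illustration. The beta update rule (halve or double around a KL target with a factor of 1.5) follows the heuristic described in the PPO paper.

```python
import math


def clipped_surrogate_loss(new_logp, old_logp, advantages, epsilon=0.2):
    """Negative clipped surrogate objective, averaged over samples.

    new_logp / old_logp: log-probabilities of the taken actions under the
    new and old policies. epsilon is an assumed clipping range.
    """
    total = 0.0
    for nlp, olp, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nlp - olp)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
        # Take the pessimistic (smaller) of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


def kl_penalty_loss(new_logp, old_logp, advantages, kl, beta):
    """Negative KL-penalized surrogate: -(mean(ratio * adv) - beta * kl)."""
    surrogate = sum(
        math.exp(nlp - olp) * adv
        for nlp, olp, adv in zip(new_logp, old_logp, advantages)
    ) / len(advantages)
    return -(surrogate - beta * kl)


def update_beta(beta, kl, kl_target):
    """Adapt the penalty coefficient after a policy update."""
    if kl < kl_target / 1.5:
        return beta / 2.0  # policy moved too little: weaken the penalty
    if kl > kl_target * 1.5:
        return beta * 2.0  # policy moved too much: strengthen the penalty
    return beta
```

With a positive advantage, the clipped loss stops rewarding ratios above `1 + epsilon`; with a negative advantage, the unclipped term is kept, so overly large steps are still penalized.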
Please see here to learn more about Mujoco games.
- python3.5+
- paddlepaddle>=1.6.1
- parl
- gym
- tqdm
- mujoco-py>=1.50.1.0
# To train an agent for HalfCheetah-v2 game (default: CLIP loss)
python train.py
# To train on a different game with a different loss type
# python train.py --env [ENV_NAME] --loss_type [CLIP|KLPEN]
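For readers curious how such flags are typically handled, here is a minimal sketch using Python's standard `argparse`. It is an assumption about how a script like `train.py` could parse `--env` and `--loss_type`, not a copy of the actual script; `build_parser` is a hypothetical helper, and the defaults mirror the ones stated above (HalfCheetah-v2, CLIP).

```python
import argparse


def build_parser():
    """Build a parser for the two flags shown in the commands above."""
    parser = argparse.ArgumentParser(description="PPO training (sketch)")
    parser.add_argument(
        "--env",
        type=str,
        default="HalfCheetah-v2",  # default game named in this README
        help="Mujoco environment name")
    parser.add_argument(
        "--loss_type",
        type=str,
        default="CLIP",  # default loss named in this README
        choices=["CLIP", "KLPEN"],
        help="Clipped surrogate objective or adaptive KL penalty")
    return parser


# Example: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(["--env", "Hopper-v2", "--loss_type", "KLPEN"])
print(args.env, args.loss_type)
```

Restricting `--loss_type` with `choices` makes the parser reject anything other than the two supported loss types before training starts.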