Policy Gradient Algorithms

VPG (VANILLA POLICY GRADIENT)
PPO (PROXIMAL POLICY OPTIMIZATION)
TRPO (TRUST REGION POLICY OPTIMIZATION)

Installation

    pip install matplotlib gym==0.25.2 tensorflow keras-rl2 pyglet "protobuf==3.20.*"

Training results