This is a simple implementation of off-policy TRPO (link).
- Python 3.7 or greater
- gym
- mujoco-py (https://github.com/openai/mujoco-py)
- stable-baselines3
- torch 1.10.0 or greater
- requests
- wandb
- Results are obtained by training with three random seeds.
- {algo_name}-Norm: training with state normalization.
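
For readers unfamiliar with state normalization, here is a minimal sketch of one common approach: keep a running per-dimension mean and variance of observed states and standardize each state before it reaches the policy. The class name `RunningStateNormalizer` and its methods are hypothetical and are not taken from this repository; the actual `-Norm` variants may compute their statistics differently. The sketch uses plain Python to stay dependency-free.

```python
import math
from typing import List, Sequence


class RunningStateNormalizer:
    """Hypothetical helper: running per-dimension mean/variance (Welford's algorithm)."""

    def __init__(self, dim: int, eps: float = 1e-8) -> None:
        self.count = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations from the mean
        self.eps = eps  # keeps the division safe when a dimension is constant

    def update(self, state: Sequence[float]) -> None:
        """Fold one observed state into the running statistics."""
        self.count += 1
        for i, x in enumerate(state):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])

    def normalize(self, state: Sequence[float]) -> List[float]:
        """Return the state shifted by the running mean and scaled by the running std."""
        if self.count < 2:
            # Not enough data for a variance estimate, so only center the state.
            return [x - m for x, m in zip(state, self.mean)]
        return [
            (x - m) / (math.sqrt(m2 / self.count) + self.eps)
            for x, m, m2 in zip(state, self.mean, self.m2)
        ]


# Usage: update with each environment observation, then normalize before the policy.
normalizer = RunningStateNormalizer(dim=2)
for obs in ([1.0, 10.0], [3.0, 30.0]):
    normalizer.update(obs)
normalized_obs = normalizer.normalize([3.0, 30.0])
```

A running estimate is used here because off-policy data arrives incrementally; with a fixed dataset, the statistics could instead be computed once up front.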