CleanRL is a Deep Reinforcement Learning library that provides high-quality single-file implementations with research-friendly features. The implementations are clean and simple, yet we can scale them to run thousands of experiments using AWS Batch. The highlight features of CleanRL are:
- 📜 Single-file implementation
- Every detail about an algorithm is put into the algorithm's own file. It is therefore easier to fully understand an algorithm and do research with it.
- 📊 Benchmarked Implementation (7+ algorithms and 34+ games at https://benchmark.cleanrl.dev)
- 📈 Tensorboard Logging
- 🪛 Local Reproducibility via Seeding
- 🎮 Videos of Gameplay Capturing
- 🧫 Experiment Management with Weights and Biases
- 💸 Cloud Integration with docker and AWS
Good luck have fun 🚀
Prerequisites:
- Python 3.8+
- Poetry
To run experiments locally, give the following a try:
```bash
git clone https://github.com/vwxyzjn/cleanrl.git && cd cleanrl
poetry install

# alternatively, you could use `poetry shell` and do
# `python cleanrl/ppo.py`
poetry run python cleanrl/ppo.py \
    --seed 1 \
    --gym-id CartPole-v0 \
    --total-timesteps 50000

# open another terminal and enter `cd cleanrl/cleanrl`
tensorboard --logdir runs
```
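The `runs` directory above is where each script drops its TensorBoard event files. As a rough sketch of the logging pattern (the run name and tags here are illustrative, not the exact ones every script uses):

```python
from torch.utils.tensorboard import SummaryWriter

# every experiment writes to its own subfolder under `runs/`,
# which is exactly what `tensorboard --logdir runs` picks up
writer = SummaryWriter("runs/CartPole-v0__ppo__1")
writer.add_scalar("charts/episodic_return", 200.0, global_step=50000)
writer.add_scalar("losses/policy_loss", 0.02, global_step=50000)
writer.close()
```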
To use experiment tracking with wandb, run
```bash
wandb login # only required for the first time
poetry run python cleanrl/ppo.py \
    --seed 1 \
    --gym-id CartPole-v0 \
    --total-timesteps 50000 \
    --track \
    --wandb-project-name cleanrltest
```
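Under the hood, `--track` essentially hands the same TensorBoard logs to Weights and Biases. A minimal sketch of that pattern, with arguments mirroring the CLI flags above (treat the exact call as illustrative rather than the scripts' literal code):

```python
import wandb

wandb.init(
    project="cleanrltest",  # --wandb-project-name
    sync_tensorboard=True,  # mirror the TensorBoard scalars to wandb
    config={"seed": 1, "gym_id": "CartPole-v0", "total_timesteps": 50000},
    name="CartPole-v0__ppo__1",
    monitor_gym=True,       # upload any recorded gameplay videos
    save_code=True,
)
```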
To run training scripts in other games:
```bash
poetry shell

# classic control
python cleanrl/dqn.py --gym-id CartPole-v1
python cleanrl/ppo.py --gym-id CartPole-v1
python cleanrl/c51.py --gym-id CartPole-v1

# atari
poetry install -E atari
python cleanrl/dqn_atari.py --gym-id BreakoutNoFrameskip-v4
python cleanrl/c51_atari.py --gym-id BreakoutNoFrameskip-v4
python cleanrl/ppo_atari.py --gym-id BreakoutNoFrameskip-v4
python cleanrl/apex_dqn_atari.py --gym-id BreakoutNoFrameskip-v4

# pybullet
poetry install -E pybullet
python cleanrl/td3_continuous_action.py --gym-id MinitaurBulletDuckEnv-v0
python cleanrl/ddpg_continuous_action.py --gym-id MinitaurBulletDuckEnv-v0
python cleanrl/sac_continuous_action.py --gym-id MinitaurBulletDuckEnv-v0

# procgen
poetry install -E procgen
python cleanrl/ppo_procgen.py --gym-id starpilot
python cleanrl/ppo_procgen_impala_cnn.py --gym-id starpilot
python cleanrl/ppg_procgen.py --gym-id starpilot
python cleanrl/ppg_procgen_impala_cnn.py --gym-id starpilot
```
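The `*_atari.py` scripts rely on the standard DeepMind-style Atari preprocessing: no-op resets, frame skipping, grayscaling, 84x84 resizing, and frame stacking. As a rough illustration using gym's built-in wrappers (the actual scripts may assemble their own wrapper stack):

```python
import gym

env = gym.make("BreakoutNoFrameskip-v4")
# no-op resets, 4-frame skip with max-pooling, grayscale, 84x84 resize
env = gym.wrappers.AtariPreprocessing(env, noop_max=30, frame_skip=4, screen_size=84)
# stack the last 4 frames so the agent can infer motion
env = gym.wrappers.FrameStack(env, 4)

obs = env.reset()
print(env.observation_space.shape)  # (4, 84, 84)
```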
- Deep Q-Learning (DQN)
- dqn.py
- For discrete action space.
- dqn_atari.py
- For playing Atari games. It uses convolutional layers and common atari-based pre-processing techniques.
- Categorical DQN (C51)
- c51.py
- For discrete action space.
- c51_atari.py
- For playing Atari games. It uses convolutional layers and common atari-based pre-processing techniques.
- c51_atari_visual.py
- Adds return and q-values visualization for c51_atari.py.
- Proximal Policy Optimization (PPO)
- All of the PPO implementations below are augmented with some code-level optimizations; see https://costa.sh/blog-the-32-implementation-details-of-ppo.html for more details. A minimal sketch of the core clipped objective appears after this algorithm list.
- ppo.py
- For discrete action space.
- ppo_continuous_action.py
- For continuous action space. It also implements Mujoco-specific code-level optimizations.
- ppo_atari.py
- For playing Atari games. It uses convolutional layers and common atari-based pre-processing techniques.
- Soft Actor Critic (SAC)
- sac_continuous_action.py
- For continuous action space.
- Deep Deterministic Policy Gradient (DDPG)
- ddpg_continuous_action.py
- For continuous action space.
- Twin Delayed Deep Deterministic Policy Gradient (TD3)
- td3_continuous_action.py
- For continuous action space.
- Apex Deep Q-Learning (Apex-DQN)
- apex_dqn_atari_visual.py
- For playing Atari games. It uses convolutional layers and common atari-based pre-processing techniques.
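As referenced in the PPO entry above, the common core of the ppo*.py scripts is the clipped surrogate objective. A minimal PyTorch sketch of that idea (illustrative only; the actual scripts add the code-level optimizations discussed in the blog post):

```python
import torch

def ppo_clip_loss(new_logprob, old_logprob, advantages, clip_coef=0.2):
    # probability ratio pi_new(a|s) / pi_old(a|s), computed in log space
    ratio = (new_logprob - old_logprob).exp()
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_coef, 1 + clip_coef) * advantages
    # negate because optimizers minimize, while PPO maximizes the surrogate
    return -torch.min(unclipped, clipped).mean()
```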
Open RL Benchmark by CleanRL is a comprehensive, interactive and reproducible benchmark of deep Reinforcement Learning (RL) algorithms. It uses Weights and Biases to keep track of the experiment data of popular deep RL algorithms (e.g. DQN, PPO, DDPG, TD3) in a variety of games (e.g. Atari, Mujoco, PyBullet, Procgen, Griddly, MicroRTS). The experiment data includes:
- reproducibility info
- metrics
Open RL Benchmark has over 1,000 experiments, including runs from other projects, which would be overwhelming to present in a single report. Instead, we present the results in separate reports. Please click on the links below to access them.
- Atari results
- Mujoco results
- PyBullet results
- Procgen results
- Griddly results
- Gym-μRTS results
- Slimevolleygym results
- PySC2 results
- CarRacing-v0
- Montezuma Revenge results
We hope it brings a new level of transparency, openness, and reproducibility. Our plan is to benchmark as many algorithms and games as possible. If you are interested, please join us and contribute more algorithms and games. To get started, check out our contribution guide and our roadmap for the Open RL Benchmark.
Check out the documentation here
We have a Discord Community for support; feel free to ask questions. Posting in GitHub Issues and PRs is also welcome. Our past video recordings are also available on YouTube.
We have a short contribution guide here: https://github.com/vwxyzjn/cleanrl/blob/master/CONTRIBUTING.md. Consider adding new algorithms or testing new games on the Open RL Benchmark (https://benchmark.cleanrl.dev).
Big thanks to all the contributors of CleanRL!
I have been heavily inspired by many repos and blog posts. Below is an incomplete list of them.
- http://inoryy.com/post/tensorflow2-deep-reinforcement-learning/
- https://github.com/seungeunrho/minimalRL
- https://github.com/Shmuma/Deep-Reinforcement-Learning-Hands-On
- https://github.com/hill-a/stable-baselines
The following ones helped me a lot with the continuous action space handling:
- https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail
- https://github.com/zhangchuheng123/Reinforcement-Implementation/blob/master/code/ppo.py
If you use CleanRL in your work, please cite our technical paper:
```bibtex
@article{huang2021cleanrl,
  title={CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms},
  author={Shengyi Huang and Rousslan Fernand Julien Dossa and Chang Ye and Jeff Braga},
  year={2021},
  journal={arXiv preprint arXiv:2111.08819},
  url={https://arxiv.org/abs/2111.08819}
}
```