This repository contains naive implementations of deep reinforcement learning algorithms. Its purpose is to help me understand the algorithms and their code, so the code is not optimized for performance. If you want to use it for research, please refer to the original papers and the official implementations. I verify the code with Gymnasium (formerly OpenAI Gym). The environments I use most are LunarLander-v3, CartPole-v1, and Pendulum-v1.
Create the conda environment with:
conda create -n rltorch pytorch torchvision torchaudio pytorch-cuda=12.1 gymnasium pyglet pygame gymnasium-box2d colorama pylint yapf tqdm 'tensorboardx>=2.5.0' 'tensorboard>2.0' pillow matplotlib scipy seaborn ipykernel -c conda-forge -c pytorch -c nvidia
This project does not provide trained model weights.
You can start training a model inside the conda environment with:
(rltorch) > python -m <project name>.main
For example (DDPG):
(rltorch) > python -m DDPG.main
Improved with Gumbel distribution regression from XQL:
- XAWR
- XDDPG
- XTD3
- XSAC
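For readers unfamiliar with the Gumbel regression objective that the XQL variants build on, here is a minimal sketch in plain Python. It is not the code used in this repository. The function name `gumbel_regression_loss`, the `beta` temperature, and the `clip` threshold are assumptions made for illustration, and the clipping is a simple stability measure rather than the exact scheme used by any official implementation. The loss is `exp(z) - z - 1` with `z = (target - prediction) / beta`, so underestimating the target is penalized exponentially while overestimating it is penalized only linearly.

```python
import math


def gumbel_regression_loss(predictions, targets, beta=1.0, clip=10.0):
    """Mean of exp(z) - z - 1, where z = (target - prediction) / beta.

    Hypothetical helper for illustration only; `beta` and `clip` are
    assumed parameter names, not taken from this repository.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if beta <= 0:
        raise ValueError("beta must be positive")

    total = 0.0
    for prediction, target in zip(predictions, targets):
        z = (target - prediction) / beta
        # Cap z from above so exp(z) cannot overflow (assumed safeguard).
        z = min(z, clip)
        total += math.exp(z) - z - 1.0
    return total / len(predictions)


# The loss is zero for a perfect fit and asymmetric around it.
perfect = gumbel_regression_loss([1.0], [1.0])
under = gumbel_regression_loss([0.0], [1.0])  # prediction below target
over = gumbel_regression_loss([1.0], [0.0])   # prediction above target
print(perfect, under > over)
```

The asymmetry is the point of the objective: a value estimate trained this way is pushed toward the upper end of its targets, which approximates a soft maximum without an explicit max over actions. A smaller `beta` makes the approximation sharper but the exponential term less stable.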
- TrainMonitor and Generategif modified from coax
- https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf
- https://arxiv.org/pdf/1509.06461.pdf
- https://arxiv.org/pdf/1509.02971.pdf
- https://arxiv.org/pdf/1707.06347.pdf
- https://arxiv.org/pdf/1707.06887.pdf
- https://openreview.net/attachment?id=H1gdF34FvS&name=original_pdf
- https://proceedings.neurips.cc/paper/1999/file/6449f44a102fde848669bdd9eb6b76fa-Paper.pdf
- https://div99.github.io/XQL/