Diffusion-Offline-RL

In this work, we propose Diffusion-QL, which uses a diffusion model as a highly expressive policy class for behavior cloning and policy regularization. We learn an action-value function and add a term that maximizes action-values to the training loss of the diffusion model. The resulting loss seeks optimal actions that stay close to the behavior policy.
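
As a rough illustration of the combined objective described above, the following toy sketch adds a denoising (behavior-cloning) term and a negated action-value term. It is not the code in this repository: the function name `policy_loss`, the weight `alpha`, and the NumPy arrays standing in for network outputs are all assumptions made for the example.

```python
import numpy as np

def policy_loss(eps_true, eps_pred, q_values, alpha=1.0):
    """Toy combined objective (hypothetical helper, not the repository's API)."""
    # Behavior-cloning term: mean squared error between true and predicted noise.
    bc_loss = float(np.mean((eps_true - eps_pred) ** 2))
    # Q term: negated, so minimizing the total loss pushes action-values up.
    q_loss = -float(np.mean(q_values))
    # alpha (an assumed weight) trades off staying near the data vs. improving Q.
    return bc_loss + alpha * q_loss

rng = np.random.default_rng(0)
eps_true = rng.standard_normal((4, 3))       # noise added to 4 actions of dim 3
eps_pred = eps_true + 0.1                    # stand-in for the model's noise prediction
q_values = np.array([1.0, 2.0, 3.0, 2.0])    # stand-in for Q(s, a) on sampled actions
loss = policy_loss(eps_true, eps_pred, q_values, alpha=0.5)
print(loss)
```

With a larger `alpha` the Q term dominates and the policy is pulled toward high-value actions; with a smaller `alpha` the loss behaves more like plain behavior cloning.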

Dependencies

Please see the requirements.txt file for the detailed Python package dependencies of this project.

Run our Code

To run our code, use a command such as the following:

python run_offline.py --env_name walker2d-medium-expert-v2 --algo pcq
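
To launch the same command for several datasets, a small helper along these lines may be convenient. This is only a sketch: `build_command` and `ENV_NAMES` are hypothetical names, the two extra environment names are D4RL dataset names that we assume `run_offline.py` accepts, and the flags are copied from the example above.

```python
import shlex

def build_command(env_name, algo="pcq"):
    """Build the argument list for one run (hypothetical helper)."""
    # Flags mirror the example command above; no other flags are assumed.
    return ["python", "run_offline.py", "--env_name", env_name, "--algo", algo]

# Assumed environment names; adjust to the datasets you actually have installed.
ENV_NAMES = [
    "walker2d-medium-expert-v2",
    "hopper-medium-expert-v2",
    "halfcheetah-medium-expert-v2",
]

commands = [build_command(name) for name in ENV_NAMES]
for cmd in commands:
    # Print rather than execute; pass cmd to subprocess.run(cmd, check=True) to launch it.
    print(shlex.join(cmd))
```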

