Supported Policy Optimization

Official implementation of the NeurIPS 2022 paper "Supported Policy Optimization for Offline Reinforcement Learning".

🚩 News:

June 2023: SPOT has been included in the Clean Offline Reinforcement Learning (CORL) library as a strong baseline for Offline-to-Online RL. Thanks to Tinkoff AI and Denis Tarasov for the implementation!

Environment

Install MuJoCo version 2.0 at ~/.mujoco/mujoco200 and copy your license key to ~/.mujoco/mjkey.txt.
Create a conda environment:

conda env create -f conda_env.yml
conda activate spot

Install D4RL
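
Before training, it can help to confirm that the three steps above took effect. The following is a minimal sketch, not part of the repository: the helper name `check_setup` is hypothetical, and the module names `gym`, `d4rl`, and `mujoco_py` are an assumption about what the conda environment and D4RL install provide. It only checks that the MuJoCo paths exist and that the modules can be found, without importing them.

```python
import importlib.util
from pathlib import Path


def check_setup(home=None, modules=("gym", "d4rl", "mujoco_py")):
    """Return a dict mapping each prerequisite to True (found) or False (missing).

    `modules` is an assumed list of packages; adjust it to match conda_env.yml.
    """
    home = Path(home) if home is not None else Path.home()
    mujoco_root = home / ".mujoco"
    report = {
        # Paths taken from the installation steps in this README.
        "mujoco200 directory": (mujoco_root / "mujoco200").is_dir(),
        "mjkey.txt license": (mujoco_root / "mjkey.txt").is_file(),
    }
    for name in modules:
        # find_spec locates a top-level module without importing it.
        report[f"import {name}"] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_setup().items():
        print(f"[{'ok' if ok else 'MISSING'}] {item}")
```

Run it inside the activated `spot` environment; any line marked MISSING points at the step to revisit.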

Usage

Pretrained Models

We have uploaded pretrained VAE models and offline models to facilitate experiment reproduction. Download from this link and unzip:

unzip spot-models.zip -d .

Offline RL

Run the following commands to train the VAE:

python train_vae.py --env halfcheetah --dataset medium-replay
python train_vae.py --env antmaze --dataset medium-diverse --no_normalize

Run the following commands to train the offline RL agent on D4RL with pretrained VAE models:

python main.py --config configs/offline/halfcheetah-medium-replay.yml
python main.py --config configs/offline/antmaze-medium-diverse.yml

You can also specify the random seed and VAE model:

python main.py --config configs/offline/halfcheetah-medium-replay.yml --seed <seed> --vae_model_path <vae_model.pt>
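
To repeat a run over several seeds, the command above can be generated in a loop. This is a rough sketch rather than a script shipped with the repository: `build_command` and `run_seeds` are hypothetical helpers, and only the `--config`, `--seed`, and `--vae_model_path` flags shown in this README are used. It defaults to a dry run that prints the commands instead of launching them.

```python
import subprocess
import sys


def build_command(config, seed, vae_model_path=None, script="main.py"):
    """Build the argument list for one training run."""
    cmd = [sys.executable, script, "--config", config, "--seed", str(seed)]
    if vae_model_path is not None:
        cmd += ["--vae_model_path", vae_model_path]
    return cmd


def run_seeds(config, seeds, vae_model_path=None, dry_run=True):
    """Print (dry_run=True) or execute one run per seed, sequentially."""
    commands = [build_command(config, s, vae_model_path) for s in seeds]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the sweep if a run exits with an error.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_seeds("configs/offline/halfcheetah-medium-replay.yml", seeds=[0, 1, 2])
```

Pass `dry_run=False` from the repository root to actually start the runs.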

Logging

This codebase logs to TensorBoard. You can view saved runs with:

tensorboard --logdir <run_dir>

Online Fine-tuning

Run the following command to fine-tune online on AntMaze, starting from pretrained VAE models and offline models:

python main_finetune.py --config configs/online_finetune/antmaze-medium-diverse.yml

You can also specify the random seed, VAE model and offline models:

python main_finetune.py --config configs/online_finetune/antmaze-medium-diverse.yml --seed <seed> --vae_model_path <vae_model.pt> --pretrain_model <pretrain_model/>
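
Because fine-tuning depends on two pretrained artifacts, a quick path check before launching can save a failed run. The sketch below is illustrative only: `finetune_command` is a hypothetical helper, and it assumes the VAE model is a single file and the offline model is a directory (suggested by the trailing slash in the placeholder above, but not confirmed here).

```python
import sys
from pathlib import Path


def finetune_command(config, seed, vae_model_path, pretrain_model):
    """Validate the pretrained artifacts, then return the launch command."""
    vae = Path(vae_model_path)
    pretrain = Path(pretrain_model)
    # Assumption: the VAE checkpoint is a file, the offline model a directory.
    if not vae.is_file():
        raise FileNotFoundError(f"VAE model not found: {vae}")
    if not pretrain.is_dir():
        raise NotADirectoryError(f"offline model directory not found: {pretrain}")
    return [
        sys.executable, "main_finetune.py",
        "--config", str(config),
        "--seed", str(seed),
        "--vae_model_path", str(vae),
        "--pretrain_model", str(pretrain),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.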

Citation

If you find this code useful for your research, please cite our paper as:

@inproceedings{wu2022supported,
  title={Supported Policy Optimization for Offline Reinforcement Learning},
  author={Jialong Wu and Haixu Wu and Zihan Qiu and Jianmin Wang and Mingsheng Long},
  booktitle={Advances in Neural Information Processing Systems},
  year={2022}
}

Contact

If you have any questions, please contact wujialong0229@gmail.com.

Acknowledgement

This repo borrows heavily from sfujim/TD3_BC and sfujim/BCQ.
