Official implementation of the NeurIPS 2022 paper "Supported Policy Optimization for Offline Reinforcement Learning" (SPOT).
🚩 News:
- June 2023: SPOT has been included in the Clean Offline Reinforcement Learning (CORL) library as a strong baseline for Offline-to-Online RL. Thanks to Tinkoff AI and Denis Tarasov for the implementation!
- Install MuJoCo version 2.0 at ~/.mujoco/mujoco200 and copy your license key to ~/.mujoco/mjkey.txt
- Create a conda environment
conda env create -f conda_env.yml
conda activate spot
- Install D4RL
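Before creating the environment, it can help to confirm that the MuJoCo files are where the steps above expect them. The following is a minimal sketch, not part of this repository: the function name `check_mujoco_install` and its `home` parameter are hypothetical, and the check only looks for the two paths named above without validating their contents.

```python
from pathlib import Path


def check_mujoco_install(home=None):
    """Return the expected MuJoCo paths that are missing (hypothetical helper).

    `home` defaults to the current user's home directory; it is a parameter
    only so the check can be pointed at another directory.
    """
    root = Path(home) if home is not None else Path.home()
    expected = [
        root / ".mujoco" / "mujoco200",  # MuJoCo 2.0 install directory
        root / ".mujoco" / "mjkey.txt",  # license key file
    ]
    # Only existence is checked; the files themselves are not inspected.
    return [str(path) for path in expected if not path.exists()]


if __name__ == "__main__":
    missing = check_mujoco_install()
    if missing:
        print("Missing:", *missing, sep="\n  ")
    else:
        print("MuJoCo 2.0 files found.")
```

An empty list means both paths exist; any returned path should be fixed before running the training scripts.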
We have uploaded pretrained VAE models and offline models to facilitate experiment reproduction. Download from this link and unzip:
unzip spot-models.zip -d .
Run the following commands to train the VAE:
python train_vae.py --env halfcheetah --dataset medium-replay
python train_vae.py --env antmaze --dataset medium-diverse --no_normalize
Run the following commands to train offline RL on D4RL with pretrained VAE models:
python main.py --config configs/offline/halfcheetah-medium-replay.yml
python main.py --config configs/offline/antmaze-medium-diverse.yml
You can also specify the random seed and VAE model:
python main.py --config configs/offline/halfcheetah-medium-replay.yml --seed <seed> --vae_model_path <vae_model.pt>
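To repeat a run over several seeds, the command above can be generated in a loop. This is a minimal sketch under stated assumptions: the helpers `build_offline_commands` and `run_all` are hypothetical and not part of this repository, the seed values are arbitrary placeholders, and only the `--config`, `--seed`, and `--vae_model_path` flags shown above are used.

```python
import subprocess
import sys


def build_offline_commands(config, seeds, vae_model_path=None):
    """Build one `main.py` command (as an argument list) per seed."""
    commands = []
    for seed in seeds:
        cmd = [sys.executable, "main.py", "--config", config, "--seed", str(seed)]
        if vae_model_path is not None:
            # Optional: point the run at a specific pretrained VAE checkpoint.
            cmd += ["--vae_model_path", vae_model_path]
        commands.append(cmd)
    return commands


def run_all(commands, dry_run=True):
    """Print each command; execute it sequentially only if dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sweep at the first failing run.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmds = build_offline_commands(
        "configs/offline/halfcheetah-medium-replay.yml",
        seeds=[0, 1, 2],  # placeholder seeds, chosen for illustration only
    )
    run_all(cmds, dry_run=True)
```

With `dry_run=True` the script only prints the commands, which makes it easy to inspect them before launching the runs from the repository root.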
This codebase uses tensorboard. You can view saved runs with:
tensorboard --logdir <run_dir>
Run the following command to fine-tune online on AntMaze with pretrained VAE models and offline models:
python main_finetune.py --config configs/online_finetune/antmaze-medium-diverse.yml
You can also specify the random seed, VAE model and offline models:
python main_finetune.py --config configs/online_finetune/antmaze-medium-diverse.yml --seed <seed> --vae_model_path <vae_model.pt> --pretrain_model <pretrain_model/>
If you find this code useful for your research, please cite our paper as:
@inproceedings{wu2022supported,
title={Supported Policy Optimization for Offline Reinforcement Learning},
author={Jialong Wu and Haixu Wu and Zihan Qiu and Jianmin Wang and Mingsheng Long},
booktitle={Advances in Neural Information Processing Systems},
year={2022}
}
If you have any questions, please contact wujialong0229@gmail.com.
This repo borrows heavily from sfujim/TD3_BC and sfujim/BCQ.