This repo depends on PyTorch (version 1.9.0) and MuJoCo (version 200). I am running Python 3.7 on my system.
Other requirements can be installed via pip by running:

```
pip install -r requirements.txt
```
Data must be saved into a replay buffer in the `data/` directory by running, e.g., `python d4rl_to_replay.py --name halfcheetah-medium-v2` for whichever D4RL dataset you want data from.
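For reference, here is a minimal sketch of what such a conversion could look like, built on the public `d4rl.qlearning_dataset` helper. The actual `d4rl_to_replay.py` in this repo defines its own replay buffer format; the `.npz` output below is only an assumption for illustration.

```python
# Illustrative sketch only -- the real d4rl_to_replay.py writes this repo's
# own replay buffer format; the .npz output below is a stand-in.
import argparse
import os

import gym
import numpy as np
import d4rl  # noqa: F401 -- importing registers the D4RL environments with gym

parser = argparse.ArgumentParser()
parser.add_argument('--name', default='halfcheetah-medium-v2')
args = parser.parse_args()

env = gym.make(args.name)
# Standard D4RL helper: returns observations, actions, rewards,
# next_observations, and terminals as numpy arrays.
dataset = d4rl.qlearning_dataset(env)

os.makedirs('data', exist_ok=True)
np.savez(os.path.join('data', args.name + '.npz'), **dataset)
```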
The training loop is in `train.py`, which can reproduce all of the algorithms in the paper by varying the parameters as described below.
Important: to run the training files, you need to set the `path` variable in `config/train.yaml` to the path of the `onestep-rl` directory on your machine, e.g. `path: /path/to/onestep-rl`.
Config files with all the relevant hyperparameters can be found in the `config/` directory.
To get the one-step algorithm from the paper, we set the following parameters of the training loop in `config/train.yaml` (a sketch of how the four parameters interact follows the three settings below):

```yaml
beta_steps: 5e5
steps: 1
q_steps: 2e6
pi_steps: 1e5
```
To get the iterative algorithms, we load the pre-trained beta and Q estimators used by the one-step algorithm and then run with:

```yaml
beta_steps: 0
steps: 1e5
q_steps: 2
pi_steps: 1
```
For the multi-step algorithms, we load the pre-trained beta and Q estimators used by the one-step algorithm and then run with:

```yaml
beta_steps: 0
steps: 5
q_steps: 2e5
pi_steps: 2e4
```
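Reading the three settings together, one plausible interpretation of the four parameters is sketched below; this is an assumption for illustration, not the actual contents of `train.py`. On this reading, `beta_steps` gradient steps fit the behavior policy, and then each of `steps` outer iterations performs `q_steps` policy-evaluation updates followed by `pi_steps` policy-improvement updates, so one-step is a single long evaluation/improvement pass, iterative takes many tiny alternating steps, and multi-step sits in between.

```python
# Hedged sketch of how the four hyperparameters plausibly relate; the real
# logic lives in train.py and may differ in detail. The update_* callables
# stand in for single gradient steps on the respective networks.
def train(beta_steps, steps, q_steps, pi_steps,
          update_beta, update_q, update_pi):
    for _ in range(int(beta_steps)):      # fit the behavior policy beta
        update_beta()
    for _ in range(int(steps)):           # outer policy-iteration loop
        for _ in range(int(q_steps)):     # evaluate the current policy
            update_q()
        for _ in range(int(pi_steps)):    # improve the policy
            update_pi()

# one-step:   train(5e5, 1,   2e6, 1e5, ...)
# iterative:  train(0,   1e5, 2,   1,   ...)
# multi-step: train(0,   5,   2e5, 2e4, ...)
```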
All the figures from the paper, along with the notebooks that generated them, are in the `figures/` directory. Due to space constraints, the data and log files needed to generate the figures are not included.
If you use this repo in your research, please cite the paper as follows:

```
@article{brandfonbrener2021offline,
  title={Offline RL Without Off-Policy Evaluation},
  author={Brandfonbrener, David and Whitney, William F and Ranganath, Rajesh and Bruna, Joan},
  journal={arXiv preprint arXiv:2106.08909},
  year={2021}
}
```