Objective: Benchmark reinforcement learning (RL) and imitation learning (GAIL) algorithms from Stable Baselines 2.10 on OpenAI Gym and AirSim environments. More specifically, the goal of this codebase is to:
- Train a GAIL model to imitate expert demonstrations generated from a trained RL model
- Integrate several cool features provided by Stable Baselines (to the best of my knowledge, uncharted territory!)
Idea: Pick your favourite [task, RL algo] pair -> train RL -> rollout expert data -> train GAIL -> verify imitation
Framework, language, OS: Tensorflow 1.14, Python 3.7, Windows 10
Thesis problem statement: Imitate autonomous UAV maneuvering and landing purely from human demonstrations. We train GAIL on a custom environment built on Microsoft AirSim 2.0. Short video here
The implementation uses Stable Baselines 2.10. 'utils.py' is included from here to save hyperparameters as a Dict object
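The [task, RL algo] pipeline above maps onto the Stable Baselines 2.10 API roughly as follows. This is only a minimal sketch (not the repo's train.py): the [Pendulum-v0, SAC] pair, file names, episode counts, and timesteps are placeholders.

```python
import gym
from stable_baselines import SAC, GAIL
from stable_baselines.gail import ExpertDataset, generate_expert_traj

env = gym.make('Pendulum-v0')

# 1. Train an RL expert (SAC here; any Stable Baselines algorithm can fill this role)
expert = SAC('MlpPolicy', env, learning_starts=1000, verbose=1, seed=42)
expert.learn(total_timesteps=int(1e5))

# 2. Roll out expert demonstrations to an .npz file
generate_expert_traj(expert, 'expert_pendulum', env, n_episodes=25)

# 3. Train GAIL to imitate the recorded demonstrations
dataset = ExpertDataset(expert_path='expert_pendulum.npz', traj_limitation=-1, verbose=1)
imitator = GAIL('MlpPolicy', env, expert_dataset=dataset, verbose=1)
imitator.learn(total_timesteps=int(3e5))
imitator.save('gail_pendulum')

# 4. Verify imitation by comparing episode returns of the expert and the imitator
```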
# create virtual environment (optional)
conda create -n myenv python=3.7
conda activate myenv
git clone https://github.com/prabhasak/masters-thesis.git
cd masters-thesis
pip install -r requirements.txt # recommended
pip install stable-baselines[mpi] # MPI needed for TRPO, GAIL
For CustomEnvs and CustomAlgos: register your CustomEnv on Gym (examples), and add your custom env and/or algorithm details to the code. You can use the "airsim_env" folder for reference.
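For reference, registering a CustomEnv with Gym boils down to a register() call at package import time. A minimal sketch, where the module path and class name are hypothetical placeholders:

```python
# e.g. in custom_env/__init__.py (module path and class name are illustrative)
from gym.envs.registration import register

register(
    id='My-Pendulum-v0',                                  # must match the --env argument
    entry_point='custom_env.my_pendulum:MyPendulumEnv',   # hypothetical module:Class
    max_episode_steps=200,
)
```

Once registered, gym.make('My-Pendulum-v0') works like any built-in environment.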
For AirSim: some resources for generating custom binary files and modifying settings. The binaries for my thesis are available here. You will have to run a binary before running the code.
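As a quick sanity check that the binary is up and reachable before launching training, the plain airsim Python client can be used (this is independent of the repo's environment wrapper):

```python
import airsim

# Connect to a running AirSim binary (launch the packaged executable first)
client = airsim.MultirotorClient()   # use airsim.CarClient() for car binaries
client.confirmConnection()           # prints the connection status
client.enableApiControl(True)
client.armDisarm(True)
client.takeoffAsync().join()         # simple end-to-end check that the sim responds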
- Train RL and GAIL:
  python train.py --seed 42 --env Pendulum-v0 --algo sac -rl -trl 1e5 -il -til 3e5 -best -check -eval -tb -params-RL learning_starts:1000 -params-IL lam:0.9 vf_iters:10
  Exclude -rl if expert data is already available. For deterministic evaluation of expert data, add deterministic=True here. Tuned hyperparameters (HPs) are available on Baselines Zoo. Please read description.txt for info on the sub-folders.
- Check expert data:
  python expert_data_view.py --seed 42 --env Pendulum-v0 --algo sac --episodic
  If --episodic, use 'c' to go through each episode and 'q' to stop the program (see the .npz inspection sketch after this list).
- Render expert data:
  python expert_data_render.py --seed 42 --env My-Pendulum-v0 --algo sac --render
  For envs in the "custom_env" folder. If --episodic, use 'c' to go through each episode and 'q' to stop the program.
- Evaluate, render model:
  python model_render.py --seed 42 --env Pendulum-v0 --algo sac --mode rl -policy
  Verifies the optimality of the trained RL model and the imitation accuracy of the trained GAIL model (see the evaluate_policy sketch after this list). Include --test to render.
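For the "Check expert data" step, the demonstrations are stored as a NumPy .npz archive with the keys written by Stable Baselines' generate_expert_traj. A minimal inspection sketch (the file name is a placeholder):

```python
import numpy as np

data = np.load('expert_pendulum.npz')   # placeholder file name
print(data.files)   # ['actions', 'obs', 'rewards', 'episode_returns', 'episode_starts']
print('episodes:', len(data['episode_returns']))
print('mean episode return:', data['episode_returns'].mean())
print('obs shape:', data['obs'].shape, '| action shape:', data['actions'].shape)
```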
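For the "Evaluate, render model" step, the same check can be done programmatically with Stable Baselines' evaluate_policy helper; a sketch assuming placeholder model file names:

```python
import gym
from stable_baselines import SAC, GAIL
from stable_baselines.common.evaluation import evaluate_policy

env = gym.make('Pendulum-v0')

# Load the trained expert and the GAIL imitator (file names are placeholders)
expert = SAC.load('sac_pendulum', env=env)
imitator = GAIL.load('gail_pendulum', env=env)

for name, model in [('RL expert', expert), ('GAIL imitator', imitator)]:
    mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=20,
                                              deterministic=True)
    print('{}: {:.1f} +/- {:.1f}'.format(name, mean_reward, std_reward))
```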
The codebase contains Tensorboard and Callback features, which help monitor performance during training. You can enable them with -tb and -check, -eval respectively. TB: tensorboard --logdir "/your/file/path". A short wiring sketch follows the feature lists below. Callbacks for:
- Saving the model periodically (useful for continual learning and to resume training)
- Evaluating the model periodically and saving the best model found throughout training (with -best, you can choose to save and evaluate just the best model)
Other features:
- Multiprocessing: speed up training (observed 6x speedup for CartPole-v0 on my CPU with 12 threads)
- HP tuning: find the best set of hyperparameters for an [env, algo] pair
- VecNormalize: normalize env observation, action spaces (useful for MuJoCo environments)
- Monitor: record internal state information during training (episode length, rewards)
- Comparing consecutive runs of an experiment, and passing arguments and HPs to custom environments
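A sketch of how the callback, Tensorboard, multiprocessing, and Monitor pieces wire together in Stable Baselines 2.10. The PPO2/CartPole-v0 pair, paths, and frequencies are illustrative only; on Windows, SubprocVecEnv needs the __main__ guard shown here.

```python
import os
import gym
from stable_baselines import PPO2
from stable_baselines.bench import Monitor
from stable_baselines.common.vec_env import SubprocVecEnv
from stable_baselines.common.callbacks import CheckpointCallback, EvalCallback

LOG_DIR = './logs/'   # placeholder paths throughout

def make_env(env_id, rank, seed=42):
    """One monitored environment copy per SubprocVecEnv worker."""
    def _init():
        env = gym.make(env_id)
        env.seed(seed + rank)
        # Monitor records episode lengths and rewards to a .monitor.csv file
        return Monitor(env, os.path.join(LOG_DIR, str(rank)))
    return _init

if __name__ == '__main__':
    os.makedirs(LOG_DIR, exist_ok=True)

    # Multiprocessing: one environment per worker process
    env = SubprocVecEnv([make_env('CartPole-v0', i) for i in range(4)])

    callbacks = [
        # save a checkpoint periodically (continual learning / resuming training)
        CheckpointCallback(save_freq=10000, save_path='./checkpoints/'),
        # evaluate periodically and keep the best model seen so far
        EvalCallback(gym.make('CartPole-v0'), best_model_save_path='./best/',
                     eval_freq=5000, n_eval_episodes=10, deterministic=True),
    ]

    # tensorboard_log enables: tensorboard --logdir ./tb/
    model = PPO2('MlpPolicy', env, verbose=1, tensorboard_log='./tb/')
    model.learn(total_timesteps=int(1e5), callback=callbacks)
```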
This is a work in progress (available here), but I hope to release clean code once my research is done!