MBMPO (Model-Based Meta-Policy-Optimization)

MBMPO is a Dyna-style model-based RL method that learns from the predictions of an ensemble of transition-dynamics models. As in MAML, MBMPO meta-learns an optimal policy by treating each dynamics model as a different task. Following the original paper, MBMPO is evaluated on MuJoCo, with the horizon set to 200 instead of the default 1000.
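The idea of treating each dynamics model as a separate task can be illustrated with a toy sketch. This is not the RLlib implementation. It assumes a scalar policy parameter `theta` and replaces each learned dynamics model with a hypothetical quadratic "imagined return" whose optimum `c` differs per model. All function names here are made up for illustration.

```python
from statistics import mean


def imagined_return(theta, c):
    # Hypothetical stand-in for the return a policy with parameter theta
    # would obtain in rollouts imagined by one dynamics model (optimum at c).
    return -(theta - c) ** 2


def grad_return(theta, c):
    # Analytic gradient of imagined_return with respect to theta.
    return -2.0 * (theta - c)


def adapt(theta, c, inner_lr):
    # Inner adaptation: one gradient-ascent step on a single model (task).
    return theta + inner_lr * grad_return(theta, c)


def maml_step(theta, model_optima, inner_lr=0.1, outer_lr=0.1):
    # Outer (meta) update: average the post-adaptation gradients over the
    # ensemble, differentiating through the inner step via the chain rule.
    meta_grads = []
    for c in model_optima:
        adapted = adapt(theta, c, inner_lr)
        d_adapted_d_theta = 1.0 - 2.0 * inner_lr  # exact for this quadratic
        meta_grads.append(grad_return(adapted, c) * d_adapted_d_theta)
    return theta + outer_lr * mean(meta_grads)


def post_adaptation_return(theta, model_optima, inner_lr=0.1):
    # Average return after adapting separately to every model in the ensemble.
    return mean(
        imagined_return(adapt(theta, c, inner_lr), c) for c in model_optima
    )


model_optima = [-1.0, 0.5, 2.0]  # one hypothetical "task" per dynamics model
theta = 5.0
before = post_adaptation_return(theta, model_optima)
for _ in range(200):
    theta = maml_step(theta, model_optima)
after = post_adaptation_return(theta, model_optima)
```

In this toy setting the meta-learned `theta` settles at the point from which a single inner step does best on average across the ensemble. The actual algorithm uses neural-network policies and learned dynamics models rather than closed-form returns.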

MBMPO logs additional statistics. Each MBMPO iteration corresponds to multiple MAML iterations, and MAMLIter_i_DynaTrajInner_j_episode_reward_mean measures the agent’s returns across the dynamics models at iteration i of MAML and step j of inner adaptation.
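A short sketch of how these per-iteration metrics might be pulled out of a training result for plotting. It assumes the result is a flat dictionary keyed by metric name and that keys follow the pattern described above. The regular expression allows an optional underscore after `MAMLIter`, since the exact key spelling may differ between versions. The helper name and the reward values are placeholders, not real output.

```python
import re

# Matches e.g. "MAMLIter_0_DynaTrajInner_1_episode_reward_mean";
# the underscore after "MAMLIter" is optional (an assumption about key spelling).
METRIC_RE = re.compile(
    r"^MAMLIter_?(\d+)_DynaTrajInner_(\d+)_episode_reward_mean$"
)


def collect_mbmpo_rewards(result):
    """Return {(maml_iter, inner_step): reward} for matching keys in result."""
    rewards = {}
    for key, value in result.items():
        match = METRIC_RE.match(key)
        if match:
            maml_iter, inner_step = int(match.group(1)), int(match.group(2))
            rewards[(maml_iter, inner_step)] = value
    return rewards


# Placeholder values for illustration only; not real training output.
example_result = {
    "MAMLIter_0_DynaTrajInner_0_episode_reward_mean": -120.5,
    "MAMLIter_0_DynaTrajInner_1_episode_reward_mean": -98.2,
    "MAMLIter1_DynaTrajInner_1_episode_reward_mean": -75.0,
    "episode_reward_mean": -101.3,
}
rewards = collect_mbmpo_rewards(example_result)
```

Grouping by `(i, j)` makes it straightforward to compare returns before and after inner adaptation within the same MAML iteration.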

Installation

conda create -n rllib-mbmpo python=3.10
conda activate rllib-mbmpo
pip install -r requirements.txt
pip install -e '.[development]'

Usage

MBMPO Example