forked from montaserFath/BCO

ErlebnisW/BCO

 
 


Behavior Cloning (BC) and Behavior Cloning from Observation (BCO)

  • PyTorch implementation of Behavior Cloning (BC) and Behavior Cloning from Observation (BCO) (pdf) for OpenAI Gym environments.

  • BC and BCO are imitation learning algorithms.

  • BC assumes access to the expert's states and actions, whereas BCO assumes access to the expert's states only.

How does it work?

1- Collect data:

  • Learner: run an exploration policy and save the visited states and the actions taken.

  • Expert: train an expert (if you don't have one) and save its states only.

  • All data is available here.
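As a minimal sketch of how the saved trajectories might be turned into training pairs, the helpers below build (state, next state, action) triples for the learner and (state, next state) pairs for the expert. The function names and the list-based trajectory layout are assumptions made for illustration; they are not taken from the repository's notebooks.

```python
from typing import List, Sequence, Tuple

State = Sequence[float]
Action = Sequence[float]


def learner_transitions(
    states: List[State], actions: List[Action]
) -> List[Tuple[State, State, Action]]:
    """Pair each learner state with its successor and the action taken between them."""
    # A trajectory with T actions visits T + 1 states.
    if len(states) != len(actions) + 1:
        raise ValueError("expected len(states) == len(actions) + 1")
    return [(states[t], states[t + 1], actions[t]) for t in range(len(actions))]


def expert_transitions(states: List[State]) -> List[Tuple[State, State]]:
    """Pair consecutive expert states; no actions are recorded for the expert."""
    return list(zip(states[:-1], states[1:]))
```

The learner triples feed step 2, while the expert pairs are what the inverse dynamics model labels in step 3.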

2- Train the inverse dynamics model (T):

  • Input: the Learner's current state and next state.

  • Output: the predicted action the Learner took in the current state.

  • Loss function: MSE, L1 loss, or NLL between the predicted and the recorded Learner action.
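The step above could be sketched in PyTorch roughly as follows. The class name `InverseDynamicsModel`, the network size, the optimizer settings, and the toy dynamics `s' = s + a` are all assumptions chosen to keep the example small; the repository's own architecture may differ. MSE is used here because the toy actions are continuous; for discrete actions a classification loss such as `nn.NLLLoss` on log-probabilities would take its place.

```python
import torch
from torch import nn


class InverseDynamicsModel(nn.Module):
    """Maps (state, next_state) to the action assumed to connect them."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state: torch.Tensor, next_state: torch.Tensor) -> torch.Tensor:
        # Concatenate along the feature axis so the network sees both states.
        return self.net(torch.cat([state, next_state], dim=-1))


def train_inverse_model(model, states, next_states, actions, epochs=200, lr=1e-2):
    """Full-batch regression onto learner actions; returns the loss history."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()  # nn.L1Loss() is a drop-in alternative
    history = []
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(states, next_states), actions)
        loss.backward()
        optimizer.step()
        history.append(loss.item())
    return history


# Toy learner data (assumption): 1-D dynamics s' = s + a, so the true action is s' - s.
torch.manual_seed(0)
s = torch.randn(256, 1)
a = torch.rand(256, 1) * 2 - 1
inverse_model = InverseDynamicsModel(state_dim=1, action_dim=1)
losses = train_inverse_model(inverse_model, s, s + a, a)
```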

3- Test the inverse dynamics model (T) on the expert data:

  • Input: the Expert's current state and next state.

  • Output: the predicted Expert action for the current state.
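To illustrate this labelling step, the sketch below runs an inverse model over consecutive expert states and returns (state, inferred action) pairs. The function name `label_expert_transitions` is hypothetical, and `toy_inverse_model` is a hand-written stand-in for a trained model, assuming toy dynamics `s' = s + a`.

```python
from typing import Callable, List, Sequence, Tuple

State = Sequence[float]
Action = Sequence[float]


def label_expert_transitions(
    inverse_model: Callable[[State, State], Action],
    expert_states: List[State],
) -> List[Tuple[State, Action]]:
    """Return (state, inferred_action) for every consecutive pair of expert states."""
    labelled = []
    for state, next_state in zip(expert_states[:-1], expert_states[1:]):
        labelled.append((state, inverse_model(state, next_state)))
    return labelled


# Stand-in for a trained model (assumption): dynamics s' = s + a, so a = s' - s.
def toy_inverse_model(state: State, next_state: State) -> Action:
    return [n - s for s, n in zip(state, next_state)]


pairs = label_expert_transitions(toy_inverse_model, [[0.0], [0.5], [2.0]])
```

The last expert state has no successor, so it receives no label; a trajectory of N states yields N - 1 training pairs for the policy.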

4- Train the behavior model (policy):

  • Input: the Expert's current state.

  • Output: the policy's estimate of the Expert action that T predicted in step 3.

  • Loss function: MSE, L1 loss, or NLL between the policy's output and the Expert action predicted by T.
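A rough PyTorch sketch of this supervised step is shown below. The tensors `expert_states` and `inferred_actions` are synthetic placeholders for the output of step 3, and the network shape, learning rate, and epoch count are assumptions rather than the repository's settings.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Placeholder inputs (assumption): expert states and the actions T inferred for them.
expert_states = torch.randn(256, 3)
inferred_actions = torch.tanh(expert_states.sum(dim=1, keepdim=True))

# The policy maps a single state to an action; no next state is needed.
policy = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()  # nn.L1Loss() is a drop-in alternative

policy_losses = []
for _ in range(200):
    optimizer.zero_grad()
    loss = loss_fn(policy(expert_states), inferred_actions)
    loss.backward()
    optimizer.step()
    policy_losses.append(loss.item())

# At run time the policy acts from the current state alone.
with torch.no_grad():
    action = policy(expert_states[:1])
```

Because the targets come from T rather than from the expert, any error in the inverse dynamics model carries over into the policy, which is one motivation for refining both models in step 5.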

5- The Learner interacts with the environment, BCO(alpha):

  • The Learner uses the behavior model (policy) to choose an action given the current state.

  • Collect new data (states and actions).

  • Use the collected data to update the inverse dynamics model (T) and the behavior model (policy) (repeat steps 2, 3, and 4).
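The overall loop might be organized as in the skeleton below. It assumes alpha has the meaning given in the BCO paper, namely the number of post-demonstration interactions per iteration expressed as a fraction of the pre-demonstration interactions. The callbacks `collect`, `update_inverse_model`, and `update_policy` are hypothetical placeholders for steps 1 to 4, and accumulating all data across iterations is a design assumption, not something the README states.

```python
from typing import Callable, List, Tuple

Transition = Tuple[list, list, list]  # (state, next_state, action)


def bco_alpha(
    collect: Callable[[int], List[Transition]],
    update_inverse_model: Callable[[List[Transition]], None],
    update_policy: Callable[[], None],
    pre_demo_steps: int,
    alpha: float,
    iterations: int,
) -> List[Transition]:
    """Run steps 2-4 once on exploration data, then repeat them with fresh data."""
    data = collect(pre_demo_steps)      # step 1: exploration data
    update_inverse_model(data)          # step 2
    update_policy()                     # steps 3 and 4
    steps_per_iteration = int(alpha * pre_demo_steps)
    for _ in range(iterations):
        if steps_per_iteration == 0:    # alpha = 0: no further interaction
            break
        data.extend(collect(steps_per_iteration))  # step 5: act with the policy
        update_inverse_model(data)
        update_policy()
    return data


# Stub callbacks that only count calls, to show the control flow.
calls = {"inverse": 0, "policy": 0}


def fake_collect(n: int) -> List[Transition]:
    return [([0.0], [0.0], [0.0]) for _ in range(n)]


def fake_update_inverse(batch: List[Transition]) -> None:
    calls["inverse"] += 1


def fake_update_policy() -> None:
    calls["policy"] += 1


data = bco_alpha(fake_collect, fake_update_inverse, fake_update_policy,
                 pre_demo_steps=100, alpha=0.5, iterations=3)
```

With `alpha=0` the loop body never runs, so the procedure reduces to a single pass through steps 1 to 4.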

OpenAI Gym Environment

  • OpenAI Gym has several environments. We use the classic control environment Pendulum and the Bipedal Walker2D environment.

Installing

pip install gym
pip install numpy
pip install box2d-py
pip install torchvision

Data

Results

[Figure: BCO vs. BC, Pendulum result]

[Figure: alpha]

Demo

BC

BCO
