week11_rl

Materials (based on practical_rl course)

Practice

Part 1 - intro to gym(nasium) interface - Open In Colab
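As a rough orientation before opening the notebook, the interaction loop that the Gymnasium interface is built around can be sketched without installing anything. `CoinFlipEnv` and `run_episode` below are hypothetical names invented for this sketch and are not part of the notebook or of Gymnasium. The toy class only imitates the Gymnasium conventions that `reset()` returns `(observation, info)` and `step(action)` returns `(observation, reward, terminated, truncated, info)`.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment that imitates Gymnasium's reset/step signatures."""

    def __init__(self, max_steps=10, seed=0):
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        # Gymnasium-style reset returns (observation, info).
        self.t = 0
        return 0, {}

    def step(self, action):
        # Gymnasium-style step returns
        # (observation, reward, terminated, truncated, info).
        self.t += 1
        coin = self.rng.randint(0, 1)
        reward = 1.0 if action == coin else 0.0
        terminated = False                     # this toy task has no terminal state
        truncated = self.t >= self.max_steps   # stop when the time limit is reached
        return coin, reward, terminated, truncated, {}


def run_episode(env, policy):
    """Roll out one episode and return the list of per-step rewards."""
    obs, info = env.reset()
    rewards = []
    done = False
    while not done:
        action = policy(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        rewards.append(reward)
        done = terminated or truncated
    return rewards


# A constant policy that always guesses "1".
rewards = run_episode(CoinFlipEnv(max_steps=10), policy=lambda obs: 1)
print(len(rewards), sum(rewards))
```

With a real Gymnasium environment the same loop applies; only the environment construction changes.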

Part 2 - implement REINFORCE with a neural network agent - Open In Colab
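One building block that REINFORCE typically needs is the discounted return from each time step onward, which weights the log-probability of the action taken at that step. The sketch below is a minimal pure-Python version. The function name `discounted_returns` and the default `gamma` are assumptions made for illustration and may not match what the notebook asks for.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for every t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each G_t reuses G_{t+1}: G_t = r_t + gamma * G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # → [1.75, 1.5, 1.0]
```

The backward pass keeps the computation linear in episode length instead of recomputing each sum from scratch.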

Optionally, if you want to go full hardcore, you may choose to implement the actor-critic algorithm in a2c-optional.ipynb.

More materials

  • A full-term course on reinforcement learning - practical_rl

  • Actually proving the policy gradient for discounted rewards - article

  • On the variance of policy gradients and optimal baselines: article, another article

  • Generalized Advantage Estimation - a way you can speed up training for homework_*.ipynb - article

  • Generalizing log-derivative trick - url

  • Combining policy gradient and q-learning - arxiv

  • Bayesian perspective on why reparameterization & log-derivative tricks matter (Vetrov's take) - pdf

  • Adversarial review of policy gradient - blog
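For the Generalized Advantage Estimation item in the list above, the core recursion is short enough to sketch. This is a minimal illustration under stated assumptions, not the homework's reference solution: the name `gae_advantages`, the argument layout (a `values` list with one extra bootstrap entry at the end), and the default `gamma` and `lam` are all chosen here for the example.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Compute GAE advantages for one trajectory.

    rewards: list of length T.
    values:  list of length T + 1; values[T] is the bootstrap value of the
             state after the last step (use 0.0 if the episode terminated).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum: A_t = delta_t + gamma * lam * A_{t+1}.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


# With lam=1 and a zero value function, GAE reduces to discounted returns.
print(gae_advantages([1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], gamma=0.5, lam=1.0))
# → [1.75, 1.5, 1.0]
```

Setting `lam=0` gives the one-step TD error (lower variance, more bias), while `lam=1` gives the Monte Carlo return minus the value baseline (higher variance, less bias).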