week11_rl

Materials (based on `practical_rl` course)

Part 1 - intro to gym(nasium) interface -
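As a rough illustration of the interface Part 1 introduces, here is a minimal sketch of the reset/step loop. To stay self-contained it uses a hypothetical toy class, `CoinFlipEnv`, that only mimics the Gymnasium call signatures. With the real library you would instead create an environment with something like `gymnasium.make("CartPole-v1")`. The names `CoinFlipEnv` and `run_episode` are assumptions made for this example and are not taken from the notebook.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment that mimics the Gymnasium reset/step signatures."""

    def __init__(self, max_steps=5):
        self.max_steps = max_steps
        self._rng = random.Random()
        self._t = 0
        self._coin = 0

    def reset(self, seed=None):
        # Gymnasium-style reset returns (observation, info).
        if seed is not None:
            self._rng.seed(seed)
        self._t = 0
        self._coin = self._rng.randint(0, 1)
        return self._coin, {}

    def step(self, action):
        # Gymnasium-style step returns
        # (observation, reward, terminated, truncated, info).
        reward = 1.0 if action == self._coin else 0.0
        self._t += 1
        self._coin = self._rng.randint(0, 1)
        terminated = False                     # this toy task has no terminal state
        truncated = self._t >= self.max_steps  # time limit reached
        return self._coin, reward, terminated, truncated, {}


def run_episode(env, policy, seed=None):
    """Roll out one episode and return the total (undiscounted) reward."""
    obs, info = env.reset(seed=seed)
    total_reward, done = 0.0, False
    while not done:
        action = policy(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        done = terminated or truncated
    return total_reward


# A policy that copies the observed coin always guesses correctly.
print(run_episode(CoinFlipEnv(max_steps=5), policy=lambda obs: obs, seed=0))  # 5.0
```

The episode ends when either `terminated` or `truncated` is true. Keeping the two flags separate matters later, because only true termination means the value of the next state is zero.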

Part 2 - implement REINFORCE with a neural network agent -
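The two core pieces of REINFORCE can be sketched without any deep learning framework: the discounted returns-to-go and the surrogate loss built from the log-probabilities of the actions taken. This is a minimal sketch, not the notebook's implementation. The function names are hypothetical, and it assumes the per-step logits are plain Python lists, whereas in the assignment they would come from a neural network and be differentiated by an autograd framework.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for every step t."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return returns[::-1]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_loss(logits_per_step, actions, rewards, gamma=0.99):
    """Surrogate loss -mean_t[ log pi(a_t | s_t) * G_t ].

    Minimizing this loss with respect to the policy parameters is a
    gradient-ascent step on the expected discounted return.
    """
    returns = discounted_returns(rewards, gamma)
    log_probs = [
        math.log(softmax(logits)[a])
        for logits, a in zip(logits_per_step, actions)
    ]
    return -sum(lp * g for lp, g in zip(log_probs, returns)) / len(rewards)


print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

Computing the returns in a single backward pass keeps the cost linear in the episode length. Subtracting a baseline from `returns` before forming the loss is a common variance-reduction step.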

Optionally, if you want to go full hardcore, you may choose to implement the actor-critic algorithm in `a2c-optional.ipynb`.

A full-term course on reinforcement learning - practical_rl
Actually proving the policy gradient for discounted rewards - article
On variance of policy gradient and optimal baselines: article, another article
Generalized Advantage Estimation - a way you can speed up training for homework_*.ipynb - article
Generalizing log-derivative trick - url
Combining policy gradient and Q-learning - arxiv
Bayesian perspective on why reparameterization and log-derivative tricks matter (Vetrov's take) - pdf
Adversarial review of policy gradient - blog
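
For the Generalized Advantage Estimation entry in the list above, the recursion behind the estimator can be sketched in a few lines. This is a minimal sketch under stated assumptions: a single trajectory with no terminations in the middle, value estimates `values[t]` supplied by some critic, and `last_value` as the bootstrap value of the state after the final step (0.0 if the episode truly ended). The function name `gae_advantages` is hypothetical.

```python
def gae_advantages(rewards, values, last_value=0.0, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for one trajectory.

    delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    A_t     = delta_t + gamma * lam * A_{t+1}
    """
    advantages = [0.0] * len(rewards)
    next_value, running = last_value, 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # one-step TD error
        running = delta + gamma * lam * running              # exponentially weighted sum
        advantages[t] = running
        next_value = values[t]
    return advantages


# With gamma = lam = 1 the estimate reduces to (return-to-go minus value).
print(gae_advantages([1.0, 1.0], [0.5, 0.5], gamma=1.0, lam=1.0))  # [1.5, 0.5]
```

Setting `lam=0` gives the one-step TD error (low variance, more bias), while `lam=1` gives the Monte Carlo return minus the value baseline (no bias from the critic, higher variance). Values in between trade the two off.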