Materials (based on the practical_rl course)
- Slides
- Video lecture by D. Silver - https://www.youtube.com/watch?v=KHZVXao4qXs
- Our lecture, seminar
- Alternative lecture by J. Schulman part 1 - https://www.youtube.com/watch?v=BB-BhTn6DCM
- Alternative lecture by J. Schulman part 2 - https://www.youtube.com/watch?v=Wnl-Qh2UHGg
Part 1 - intro to the gym(nasium) interface
Part 2 - implement REINFORCE with a neural network agent
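Part 2 comes down to two pieces. First, turn each episode's rewards into discounted returns G_t = r_t + gamma * G_{t+1}. Second, ascend the gradient of sum_t G_t * log pi(a_t | s_t). The sketch below shows both in a toy setting rather than the homework's. The "policy" is a vector of state-independent softmax logits standing in for the neural network, and the environment is a made-up one where action 1 pays reward 1. The names `discounted_returns`, `reinforce_grad`, `toy_episode` and `train_toy_policy` are hypothetical. They are not part of gym(nasium) or the course notebooks.

```python
import math
import random


def discounted_returns(rewards, gamma=0.99):
    """Reward-to-go G_t = r_t + gamma * G_{t+1}, computed right to left."""
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return returns[::-1]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_grad(logits, actions, rewards, gamma=0.99):
    """Gradient of sum_t G_t * log pi(a_t) w.r.t. the logits.

    For a softmax policy, d log pi(a) / d logit_k = 1[k == a] - pi(k).
    """
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for a, g in zip(actions, discounted_returns(rewards, gamma)):
        for k in range(len(logits)):
            grad[k] += ((1.0 if k == a else 0.0) - probs[k]) * g
    return grad


def toy_episode(logits, rng, length=3):
    """Hypothetical environment: action 1 pays reward 1, action 0 pays 0."""
    actions, rewards = [], []
    for _ in range(length):
        a = 1 if rng.random() < softmax(logits)[1] else 0
        actions.append(a)
        rewards.append(1.0 if a == 1 else 0.0)
    return actions, rewards


def train_toy_policy(episodes=300, lr=0.1, gamma=0.99, seed=0):
    """Plain REINFORCE: sample one episode, then take one gradient-ascent step."""
    rng = random.Random(seed)
    logits = [0.0, 0.0]
    for _ in range(episodes):
        actions, rewards = toy_episode(logits, rng)
        grad = reinforce_grad(logits, actions, rewards, gamma)
        logits = [w + lr * g for w, g in zip(logits, grad)]
    return logits


if __name__ == "__main__":
    print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
    print(softmax(train_toy_policy()))  # probability of action 1 should approach 1
```

In the real homework, the logits come from a network evaluated on each state. An autodiff library then computes the same gradient from the surrogate loss -mean(G_t * log pi(a_t | s_t)).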
Optionally, if you want to go full hardcore, you may choose to implement the actor-critic algorithm in a2c-optional.ipynb.
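The actor-critic variant replaces the Monte Carlo return with a learned value estimate. The sketch below shows one common advantage actor-critic formulation. It uses the one-step advantage A_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t). The actor minimizes -mean(log pi(a_t | s_t) * A_t) with A_t held constant, and the critic minimizes mean(A_t^2). It assumes your networks already produce the values V(s) and log-probabilities log pi(a | s) as plain numbers. The function names are hypothetical, and the losses expected in a2c-optional.ipynb may differ (for example, by adding an entropy bonus).

```python
def one_step_advantages(rewards, values, next_values, dones, gamma=0.99):
    """A_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)."""
    advantages = []
    for r, v, v_next, done in zip(rewards, values, next_values, dones):
        bootstrap = 0.0 if done else gamma * v_next  # no bootstrap past a terminal state
        advantages.append(r + bootstrap - v)
    return advantages


def a2c_losses(log_probs, advantages):
    """Actor and critic losses averaged over a batch of transitions.

    In an autodiff framework the advantage is treated as a constant (detached)
    in the actor loss, so the value estimate is trained only by the critic loss.
    """
    n = len(advantages)
    actor_loss = -sum(lp * a for lp, a in zip(log_probs, advantages)) / n
    critic_loss = sum(a * a for a in advantages) / n
    return actor_loss, critic_loss


if __name__ == "__main__":
    adv = one_step_advantages([1.0, 0.0], [0.5, 0.25], [0.25, 0.75], [False, True], gamma=0.5)
    print(adv)  # [0.625, -0.25]
    print(a2c_losses([-1.0, -2.0], adv))  # (0.0625, 0.2265625)
```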
- A full-term course on reinforcement learning - practical_rl
- Actually proving the policy gradient for discounted rewards - article
- On the variance of policy gradients and optimal baselines: article, another article
- Generalized Advantage Estimation - a way you can speed up training for homework_*.ipynb - article
- Generalizing the log-derivative trick - url
- Combining policy gradient and Q-learning - arxiv
- Bayesian perspective on why reparameterization & log-derivative tricks matter (Vetrov's take) - pdf
- Adversarial review of policy gradient - blog
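Here is the Generalized Advantage Estimation recursion as it is commonly implemented. The TD errors are delta_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t). The advantage is A_t = delta_t + gamma * lambda * (1 - done_t) * A_{t+1}, computed backwards over a rollout. The sketch below assumes value estimates are already available as plain lists. `gae`, `last_value` and `lam` are hypothetical names, not taken from the homework notebooks.

```python
def gae(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout.

    values[t] is V(s_t); last_value is V(s_T), used to bootstrap the final step.
    dones[t] is True if the episode ended right after step t.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; no bootstrapping across an episode boundary.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of future TD errors, reset at episode ends.
        running = delta + gamma * lam * not_done * running
        advantages[t] = running
        next_value = values[t]
    return advantages


if __name__ == "__main__":
    # lam=1 with zero values: plain discounted returns.
    print(gae([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], 0.0, [False, False, True], gamma=0.5, lam=1.0))
    # [1.75, 1.5, 1.0]
    # lam=0: one-step TD advantages.
    print(gae([1.0, 0.0], [0.5, 0.25], 0.75, [False, False], gamma=0.5, lam=0.0))
    # [0.625, 0.125]
```

Setting lam=0 recovers the one-step TD advantage used by a basic actor-critic. Setting lam=1 gives the discounted Monte Carlo return minus the value baseline. Values in between trade bias against variance.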