week2_value_based

Materials

Lecture slides
Our videos: lecture, seminar (in Russian)
[main] lecture by David Silver - url
Alternative lecture by Pieter Abbeel (English): part 1, part 2
Alternative lecture by John Schulman (English): video
Definitive guide to policy/value iteration from Sutton: start from page 81 here.

Planning by dynamic programming (D. Silver) - video
Planning via tree search: videos 2-6 from CS188
Our lecture:
- Slides: part 1 (intro), part 2 (POMDP)
- Lecture & seminar
Monte Carlo tree search
- Udacity video on Monte Carlo tree search (first in a series) - video
- Reminder: UCB-1 - slides
- Monte Carlo tree search step-by-step by J. Levine - video
- Guide to MCTS (Monte Carlo tree search) - post
- Another guide to MCTS - url
Integrating learning and planning (D. Silver) - video
Approximating the MCTS optimal actions - 5vision solution for deephack.RL, code by Mikhail Pavlov - repo

The main assignment is the seminar1_VI.ipynb notebook in this week's folder.
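
As a warm-up for the value iteration assignment, here is a minimal sketch of tabular value iteration. It is not the notebook's code or API: the function name `value_iteration`, the array layout (`P[s, a, s_next]` for transition probabilities, `R[s, a]` for expected rewards) and the toy two-state MDP are all assumptions made for illustration.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Tabular value iteration (illustrative sketch, hypothetical interface).

    P: array of shape (S, A, S), P[s, a, s_next] = transition probability.
    R: array of shape (S, A), expected immediate reward for (s, a).
    Returns the state values V and a greedy policy (one action per state).
    """
    n_states = P.shape[0]
    V = np.zeros(n_states)
    for _ in range(max_iters):
        # Bellman optimality backup: Q(s, a) = R(s, a) + gamma * E[V(s_next)]
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Greedy policy with respect to the final value estimate
    Q = R + gamma * (P @ V)
    return V, Q.argmax(axis=1)


# Toy MDP (an assumption for the example): state 1 is absorbing with zero
# reward; in state 0, action 0 stays put (reward 0) and action 1 moves to
# state 1 (reward 1).
P_toy = np.zeros((2, 2, 2))
P_toy[0, 0, 0] = 1.0
P_toy[0, 1, 1] = 1.0
P_toy[1, :, 1] = 1.0
R_toy = np.array([[0.0, 1.0],
                  [0.0, 0.0]])

V_toy, policy_toy = value_iteration(P_toy, R_toy, gamma=0.9)
```

On this toy problem the greedy policy takes action 1 in state 0, since collecting the one-time reward beats staying put forever with zero reward.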

If you're interested in model-based RL at scale, go through the Materials: planning section and then proceed with the seminar2_MCTS.ipynb notebook.
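
The materials include a reminder on UCB-1, the rule commonly used in the selection step of MCTS. A small sketch of that rule follows, assuming each child is summarised by a `(total_value, visits)` pair. The helper names `ucb1_score` and `select_child` are hypothetical and are not taken from the seminar notebook.

```python
import math


def ucb1_score(mean_value, child_visits, parent_visits, c=math.sqrt(2)):
    """UCB-1 score: mean value plus an exploration bonus.

    With c = sqrt(2) the bonus equals sqrt(2 * ln(N) / n), the classic
    UCB-1 form. Unvisited children get +inf so they are tried first.
    """
    if child_visits == 0:
        return math.inf
    return mean_value + c * math.sqrt(math.log(parent_visits) / child_visits)


def select_child(stats, c=math.sqrt(2)):
    """Return the index of the child with the highest UCB-1 score.

    stats: list of (total_value, visits) pairs, one per child
    (a hypothetical layout chosen for this example).
    """
    parent_visits = sum(visits for _, visits in stats)
    scores = [
        ucb1_score(total / visits if visits else 0.0, visits, parent_visits, c)
        for total, visits in stats
    ]
    return max(range(len(stats)), key=scores.__getitem__)


# A rarely visited child can win despite a lower mean value,
# because its exploration bonus is larger.
chosen = select_child([(6.0, 10), (0.5, 1)])
```

The constant `c` trades off exploration against exploitation: larger values favour rarely visited children, smaller values favour children with a high mean value.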