A course on reinforcement learning in the wild. Taught on campus at HSE and Yandex SDA (in Russian) and maintained to be friendly to online students (both English and Russian).
- Optimize for the curious. For all the materials that aren’t covered in detail there are links to more information and related materials (D.Silver/Sutton/blogs/whatever). Assignments will have bonus sections if you want to dig deeper.
- Practicality first. Everything essential to solving reinforcement learning problems is worth mentioning. We won't shy away from covering tricks and heuristics. For every major idea there should be a lab that lets you “feel” it on a practical problem.
- Git-course. Know a way to make the course better? Noticed a typo in a formula? Made the code more readable? Made a version for an alternative framework? You're awesome! Pull-request it!
- HSE classes are on Mondays at 18-10 in Room 505
- YSDA classes are on Thursdays at 18-00 in the "Princeton" classroom
- Online student survival guide
- Installing the libraries - guide and issues thread
- Magical button that creates a VM: (may be down from time to time)
- Telegram chat room (Russian)
- English chat -
- How to submit homework [HSE and YSDA only]: anytask instructions and grading rules
- E-mail for everything else: practicalrl17@gmail.com (please don't submit homework via e-mail)
- Anonymous feedback form for everything that didn't go through e-mail.
- About the course
- 16.02.17 - HSE homework 3 added
- 14.02.17 - HSE deadlines for weeks 1-2 extended!
- 14.02.17 - anytask invites moved here
- 14.02.17 - if you're on the HSE track and we didn't reply to your week0 homework submission, raise panic!
- 11.02.17 - week2 success thresholds are now easier: get >+50 for LunarLander or >-180 for MountainCar. Solving the env will yield bonus points.
- 13.02.17 - Added invites for anytask.org
- 10.02.17 - from now on, we'll formally describe homework and add useful links via ./week*/README.md files. Example.
- 9.02.17 - YSDA track started
- 7.02.17 - HWs checked up
- 6.02.17 - week2 uploaded
- 27.01.17 - merged fix by omtcyfz, thanks!
- 27.01.17 - added course mail for homework submission: practicalrl17@gmail.com
- 23.01.17 - first class happened
- 23.01.17 - created repo
- week0 Welcome to the MDP
- Lecture: RL problems around us. Markov decision process. Simple solutions through combinatorial optimization.
- Seminar: FrozenLake with genetic algorithms
- Homework description - ./week0/README.md
- HSE Homework deadline: 23.59 1.02.17
- YSDA Homework deadline: 23.59 19.02.17
- week1 Monte Carlo methods
- Lecture: Cross-entropy method in general and for RL. Extension to continuous state & action space. Limitations.
- Seminar: Tabular CEM for Taxi-v0, deep CEM for box2d environments.
- HSE homework deadline: 23.59 15.02.17
- week2 Temporal Difference
- Lecture: Discounted reward MDP. Value iteration. Q-learning. Temporal difference vs Monte Carlo.
- Seminar: Tabular Q-learning
- Homework description - see ./week2/README.md
- HSE homework deadline: 23.59 15.02.17
- week3 Value-based algorithms
- Lecture: SARSA. Off-policy vs on-policy algorithms. N-step algorithms. Eligibility traces.
- Seminar: Q-learning vs SARSA vs expected value SARSA in the wild
- Homework description
- HSE homework deadline: 23.59 22.02.17
- week4 Approximate reinforcement learning
- Lecture: Infinite/continuous state space. Value function approximation. Convergence conditions. Multiple agents trick.
- Seminar: Approximate Q-learning. (CartPole, MountainCar, Breakout)
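
For orientation, the tabular update targets compared in weeks 2 and 3 (Q-learning, SARSA, expected value SARSA) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the seminar code: the function names, the dict-based Q-table, and the epsilon-greedy policy used for the expectation are all made up for this example.

```python
from collections import defaultdict


def q_learning_target(q, reward, next_state, actions, gamma=0.99):
    """Off-policy target: bootstrap from the best next action."""
    return reward + gamma * max(q[(next_state, a)] for a in actions)


def sarsa_target(q, reward, next_state, next_action, gamma=0.99):
    """On-policy target: bootstrap from the action actually taken next."""
    return reward + gamma * q[(next_state, next_action)]


def expected_sarsa_target(q, reward, next_state, actions, epsilon=0.1, gamma=0.99):
    """Bootstrap from the expected next Q-value under an epsilon-greedy policy (assumed)."""
    values = [q[(next_state, a)] for a in actions]
    # With prob. (1 - epsilon) act greedily, with prob. epsilon act uniformly at random.
    expected = (1 - epsilon) * max(values) + epsilon * sum(values) / len(values)
    return reward + gamma * expected


def td_update(q, state, action, target, alpha=0.5):
    """Move Q(state, action) a step of size alpha toward the target."""
    q[(state, action)] += alpha * (target - q[(state, action)])


# Toy Q-table (hypothetical states "s0", "s1" and two actions).
q = defaultdict(float)
actions = [0, 1]
q[("s1", 0)], q[("s1", 1)] = 1.0, 3.0
td_update(q, "s0", 0, q_learning_target(q, 1.0, "s1", actions, gamma=0.5))
```

The three targets differ only in how they summarize the next state's Q-values, which is why the same `td_update` step works for all of them.
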
Somewhere around here: introduction to Theano
- week i+1 Deep reinforcement learning
- Lecture: Deep Q-learning/SARSA/whatever. Heuristics & motivation behind them: experience replay, target networks, double/dueling/bootstrap DQN, etc.
- Seminar: Playing Atari with deep reinforcement learning. Experience replay. (classwork = doombasic)
- week i+1 Policy-based methods
- Lecture: Motivation for policy-based methods, policy gradient, log-derivative trick, REINFORCE/cross-entropy method, variance theorem (advantage), advantage actor-critic (incl. n-step advantage), off-policy actor-critic (off-PAC), natural gradients (briefly), continuous action space (teaser).
- Seminar: A2C vs Q-learning for MountainCar/Doom, entropy regularization & tricks.
- week i+1 Trust Region Policy Optimization
- Lecture: Trust region policy optimization in detail.
- Seminar: Approximate TRPO vs approximate Q-learning for gym box2d envs (robotics-themed)
- week i+1 Large/Continuous action space. Case study: recsys.
- Lecture: Continuous action space MDPs. Model-based approach (NAF). Actor-critic approach (DPG, SVG). Trust Region Policy Optimization. Large discrete action space problem. Action embedding.
- Seminar: Classic Control and BipedalWalker with DDPG vs qNAF. https://gym.openai.com/envs/BipedalWalker-v2
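
The experience replay heuristic listed for the deep reinforcement learning week (and reused by off-policy methods such as DDPG) can be sketched as a fixed-size buffer sampled uniformly at random. A minimal sketch only: the class name `ReplayBuffer` and its methods are hypothetical and not taken from the course notebooks.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest transition when full.
        self.storage = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks the temporal
        # correlation between consecutive transitions.
        batch = random.sample(list(self.storage), batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.storage)


# Toy usage: integer "states", capacity of 3, five transitions added.
buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.add(t, 0, 1.0, t + 1, False)
```

After the loop only the three most recent transitions remain, so training batches are always drawn from recent experience.
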
Somewhere around here: RNN crash course
- week i+1 Partially observable MDPs
- Lecture: POMDP intro. Model-based solvers. RNN solvers. RNN tricks: attention, problems with normalization methods, pre-training.
- Seminar: Deep kung-fu with recurrent A2C vs feedforward A2C
- week i+1 Advanced exploration methods: intrinsic motivation
- Lecture: Augmented rewards. Heuristics (UNREAL, density-based models). Formal approach: information maximizing exploration. Model-based tricks (see also MCTS).
- Seminar: VIME vs epsilon-greedy for Go9x9 (bonus 19x19)
- week i+1 Advanced exploration methods: probabilistic approach
- Lecture: Improved exploration methods (quantile-based, etc.). Bayesian approach. Case study: Contextual bandits for RTB.
- Seminar: Bandits
- week i+1 Case studies I
- Lecture: Reinforcement Learning as a general way to optimize non-differentiable loss. KL(p||q) vs KL(q||p). Case study: machine translation, speech synthesis, conversation models.
- Seminar: Optimizing Levenshtein for word transcription
- week i+1 Hierarchical MDP
- Lecture: MDP vs real world. Sparse and delayed rewards. When Q-learning fails. Hierarchical MDP. Hierarchy as temporal abstraction. MDP with symbolic reasoning.
- Seminar: Hierarchical RL for Atari games with rare rewards (starting from pre-trained DQN)
- week i+1 Case studies II
- Lecture: Direct policy optimization: finance. Inverse Reinforcement Learning: personalized medical treatment, robotics.
- Seminar: Portfolio optimization as POMDP.
Course materials and teaching by
- Fedor Ratnikov - lectures, seminars, hw checkups
- Alexander Fritsler - lectures, seminars, hw checkups
- Oleg Vasilev - seminars, hw checkups, technical stuff
- Pavel Shvechikov - lectures, seminars, hw checkups
- Using pictures from http://ai.berkeley.edu/home.html
- Other contributions: omtcyfz dmittov arogozhnikov