- Lecture slides
- Our lecture and seminar (Russian)
- [main] Lecture by David Silver (english): https://www.youtube.com/watch?v=PnHCvfgC_ZA
- Alternative lecture by Pieter Abbeel (english): https://www.youtube.com/watch?v=ifma8G7LegE
- Alternative lecture by John Schulman (english): https://www.youtube.com/watch?v=IL3gVyJMmhg
- Policy improvement theorems from Sutton book - http://webdocs.cs.ualberta.ca/~sutton/book/ebook/node42.html
- Lecture II by Dan Klein (english): https://www.youtube.com/watch?v=jUoZg513cdE
- Q-learning guide from Habr (Russian): https://habrahabr.ru/post/308094/
- A great tutorial/assignment on value-based methods from CS294 - https://github.com/berkeleydeeprlcourse/homework/blob/master/hw2/HW2.ipynb
For ease of access, we provide two versions of the same homework. They share the same algorithmic part but use slightly different examples.
You can pick whichever one you prefer, but mind the technical limitations. If you have Python 2 on a local machine (NOT in docker), even if it's on Windows, we recommend the ./assignment one.
This assignment borrows code from the awesome cs188 course and works on Python 2 only. If you stick to Python 3, consider the alternative homework, or install Python 2 for this homework alone and remove it afterwards.
This homework also requires a physical display (e.g. a laptop monitor). It won't work on a binder VM or a headless server. Please run it on your laptop or consider ./alternative.
- Go to ./assignment and edit qlearningAgents.py (see instructions inside; a minimal sketch of the core update rule is included below)
- Make sure you can tune the agent to beat ./run_crawler.sh
  - on Windows, just run `python crawler.py` from cmd in the project directory - other ./run* files are mostly for your amusement.
- ./run_pacman.sh will need more epochs to converge, see comments
  - on Windows, just type `python pacman.py -p PacmanQAgent -x 2000 -n 2010 -l smallGrid` in cmd from the assignment directory

(YSDA/HSE) Please submit only the qlearningAgents.py file and include a brief text report as comments in it.
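If you get lost in the starter code, remember that the core of the exercise is just the classic tabular Q-learning update. Below is a minimal, self-contained sketch of it in plain Python with a dict-based Q-table. The class and method names here are illustrative assumptions, NOT the interface that qlearningAgents.py actually expects, so treat it only as a reminder of the math.

```python
import random
from collections import defaultdict


class SimpleQLearner:
    """Minimal tabular Q-learning sketch; names are illustrative only."""

    def __init__(self, get_legal_actions, alpha=0.5, epsilon=0.1, discount=0.99):
        self.get_legal_actions = get_legal_actions  # state -> list of actions
        self.alpha = alpha          # learning rate
        self.epsilon = epsilon      # exploration probability
        self.discount = discount    # gamma
        self.q_values = defaultdict(float)  # (state, action) -> Q-value

    def get_value(self, state):
        """V(s) = max_a Q(s, a); returns 0.0 for terminal states with no actions."""
        actions = self.get_legal_actions(state)
        if not actions:
            return 0.0
        return max(self.q_values[(state, a)] for a in actions)

    def update(self, state, action, reward, next_state):
        """Q(s,a) <- (1 - alpha) * Q(s,a) + alpha * (r + gamma * V(s'))."""
        target = reward + self.discount * self.get_value(next_state)
        self.q_values[(state, action)] = (
            (1 - self.alpha) * self.q_values[(state, action)] + self.alpha * target
        )

    def get_action(self, state):
        """Epsilon-greedy policy: explore with probability epsilon, else act greedily."""
        actions = self.get_legal_actions(state)
        if not actions:
            return None
        if random.random() < self.epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: self.q_values[(state, a)])
```

The same update and epsilon-greedy rule carry over to the cs188 agent once you map them onto its own method names.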
Please make a separate copy of qlearningAgents.py for this assignment
The default tabular Q-learning requires an unrealistic amount of experience to learn anything useful on the pacman tasks. This is mostly due to the extremely large state space, which combines the positions of pacman, the ghosts and all the dots.
To speed up training you will need to implement a preprocessor that extracts new discrete features from the state space. You can design these features to account only for the most important things around pacman. This time, it's okay to use environment-specific duct tape :)
Please read the tips on how to solve this here. Also, if you find some state features that work amazingly well on pacman, feel free to propose a Pull Request with your advice.
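As a starting point, here is one possible shape of such a preprocessor, sketched under assumptions about what you can read from the game state. The helper arguments (`pacman_position`, `ghost_positions`, `food_positions`) are hypothetical, not the actual cs188 API, so adapt them to whatever accessors the state object really provides.

```python
def discretize_state(pacman_position, ghost_positions, food_positions):
    """Compress the raw game state into a small, hashable tuple of features.

    The arguments are hypothetical: extract them from the real game state
    with whatever accessors it provides (positions as (x, y) tuples).
    """
    px, py = pacman_position

    def manhattan(pos):
        return abs(pos[0] - px) + abs(pos[1] - py)

    # Bucketize the distance to the nearest ghost: "danger", "near" or "far".
    ghost_dist = min([manhattan(g) for g in ghost_positions]) if ghost_positions else 99
    if ghost_dist <= 1:
        ghost_bucket = "danger"
    elif ghost_dist <= 3:
        ghost_bucket = "near"
    else:
        ghost_bucket = "far"

    # Rough direction (sign of dx, sign of dy) towards the nearest food dot.
    if food_positions:
        fx, fy = min(food_positions, key=manhattan)
        food_direction = (int(fx > px) - int(fx < px), int(fy > py) - int(fy < py))
    else:
        food_direction = (0, 0)

    # The resulting tuple is hashable, so it can be used directly as a Q-table key.
    return (ghost_bucket, food_direction)
```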
(HSE/YSDA) Please send us
- The alternative qlearningAgents.py file (and any other files you modified)
- A short description of what you did there
- How to run it, usually something like
python pacman.py -p PacmanQAgent -x SOMETHING -n SOMETHING -l __mediumClassic__ -SOMETHING SOMETHING ...
- The end of the train/test log (or even the whole log), including at least the last iteration of learning and the final statistics (especially the win rate)
To get 5 points, your algorithm should win on mediumGrid more than 50% of the time. Creative features and outstanding performance on mediumClassic yield bonus points!
Alternative homework description:
- Go to the notebook
- The assignment is described there.
- If you use binder or a server, see week1 for an example of how to run CartPole and other envs.
- 5 points for implementing Q-learning and testing it on Taxi
- 5 points for solving CartPole-v0 (see the discretization sketch after this list)
- bonus tasks are listed inside
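A note for the CartPole-v0 task: its observations are continuous, so a tabular Q-learning agent only works after you discretize them. Below is a small sketch of one common approach, binning each observation component, written under the assumption that you use gym and numpy; the bounds and bin counts are arbitrary guesses to tune, not recommended values.

```python
import numpy as np

# Rough bounds and bin counts for the four CartPole-v0 observation components
# (cart position, cart velocity, pole angle, pole angular velocity).
# All numbers here are illustrative; tune them for your own agent.
BOUNDS = [(-2.4, 2.4), (-3.0, 3.0), (-0.21, 0.21), (-3.0, 3.0)]
N_BINS = [6, 6, 12, 12]


def discretize(observation):
    """Map a continuous observation to a tuple of bin indices (a hashable state)."""
    state = []
    for value, (low, high), bins in zip(observation, BOUNDS, N_BINS):
        clipped = np.clip(value, low, high)
        edges = np.linspace(low, high, bins - 1)   # bin boundaries
        state.append(int(np.digitize(clipped, edges)))
    return tuple(state)
```

A tabular agent (e.g. the one you implement for Taxi) can then use `discretize(observation)` as its state key.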