- Lecture slides
- Our lecture and seminar (Russian)
- [main] Lecture by David Silver (english): https://www.youtube.com/watch?v=PnHCvfgC_ZA
- Alternative lecture by Pieter Abbeel (english): https://www.youtube.com/watch?v=ifma8G7LegE
- Alternative lecture by John Schulman (english): https://www.youtube.com/watch?v=IL3gVyJMmhg
- Policy improvement theorems from Sutton book - http://webdocs.cs.ualberta.ca/~sutton/book/ebook/node42.html
- Lecture II by Dan Klein (english): https://www.youtube.com/watch?v=jUoZg513cdE
- Q-learning guide from Habr (Russian): https://habrahabr.ru/post/308094/
- A great tutorial/assignment on value-based methods from CS294 - https://github.com/berkeleydeeprlcourse/homework/blob/master/hw2/HW2.ipynb
For ease of access, we provide two versions of the same homework. They share the same algorithmic part but use slightly different examples.
You can pick whichever one you prefer, but mind the technical limitations. If you have Python 2 on a local machine (NOT in docker), even if it's on Windows, we recommend the ./assignment one.
This assignment borrows code from the awesome cs188 course and works on Python 2 only. If you stick to Python 3, consider the alternative homework, or install Python 2 for this homework alone and remove it afterwards.
This homework also requires a physical display (e.g. a laptop monitor). It won't work on a binder VM or a headless server. Please run it on your laptop or consider ./alternative.
- Go to ./assignment and edit qlearningAgents.py (see instructions inside; a minimal sketch of the core update rule is included below)
- Make sure you can tune the agent to beat ./run_crawler.sh
  - on Windows, just run `python crawler.py` from cmd in the project directory - other ./run* files are mostly for your amusement.
- ./run_pacman.sh will need more epochs to converge, see comments
  - on Windows, just type `python pacman.py -p PacmanQAgent -x 2000 -n 2010 -l smallGrid` in cmd from the assignment directory

(YSDA/HSE) Please submit only the qlearningAgents.py file and include a brief text report as comments in it.
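If you get lost in the starter code, remember that the core of the exercise is just the classic tabular Q-learning update. Below is a minimal, self-contained sketch of it in plain Python with a dict-based Q-table. The class and method names here are illustrative assumptions, NOT the interface that qlearningAgents.py actually expects, so treat it only as a reminder of the math.

```python
import random
from collections import defaultdict


class SimpleQLearner:
    """Minimal tabular Q-learning sketch; names are illustrative only."""

    def __init__(self, get_legal_actions, alpha=0.5, epsilon=0.1, discount=0.99):
        self.get_legal_actions = get_legal_actions  # state -> list of actions
        self.alpha = alpha          # learning rate
        self.epsilon = epsilon      # exploration probability
        self.discount = discount    # gamma
        self.q_values = defaultdict(float)  # (state, action) -> Q-value

    def get_value(self, state):
        """V(s) = max_a Q(s, a); returns 0.0 for terminal states with no actions."""
        actions = self.get_legal_actions(state)
        if not actions:
            return 0.0
        return max(self.q_values[(state, a)] for a in actions)

    def update(self, state, action, reward, next_state):
        """Q(s,a) <- (1 - alpha) * Q(s,a) + alpha * (r + gamma * V(s'))."""
        target = reward + self.discount * self.get_value(next_state)
        self.q_values[(state, action)] = (
            (1 - self.alpha) * self.q_values[(state, action)] + self.alpha * target
        )

    def get_action(self, state):
        """Epsilon-greedy policy: explore with probability epsilon, else act greedily."""
        actions = self.get_legal_actions(state)
        if not actions:
            return None
        if random.random() < self.epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: self.q_values[(state, a)])
```

The same update and epsilon-greedy rule carry over to the cs188 agent once you map them onto its own method names.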
Please make a separate copy of qlearningAgents.py for this assignment
The default tabular Q-learning requires an unrealistic amount of experience to learn anything useful on the pacman tasks. This is mostly due to the extremely large state space, which combines the positions of pacman, the ghosts and all the dots.
To speed up training you will need to implement a preprocessor that extracts new discrete features from the state space. You can design these features to account only for the most important things around pacman. This time, it's okay to use environment-specific duct tape :)
Please read the tips on how to solve this here. Also, if you find some state features that work amazingly well on pacman, feel free to propose a Pull Request with your advice.
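As a starting point, here is one possible shape of such a preprocessor, sketched under assumptions about what you can read from the game state. The helper arguments (`pacman_position`, `ghost_positions`, `food_positions`) are hypothetical, not the actual cs188 API, so adapt them to whatever accessors the state object really provides.

```python
def discretize_state(pacman_position, ghost_positions, food_positions):
    """Compress the raw game state into a small, hashable tuple of features.

    The arguments are hypothetical: extract them from the real game state
    with whatever accessors it provides (positions as (x, y) tuples).
    """
    px, py = pacman_position

    def manhattan(pos):
        return abs(pos[0] - px) + abs(pos[1] - py)

    # Bucketize the distance to the nearest ghost: "danger", "near" or "far".
    ghost_dist = min([manhattan(g) for g in ghost_positions]) if ghost_positions else 99
    if ghost_dist <= 1:
        ghost_bucket = "danger"
    elif ghost_dist <= 3:
        ghost_bucket = "near"
    else:
        ghost_bucket = "far"

    # Rough direction (sign of dx, sign of dy) towards the nearest food dot.
    if food_positions:
        fx, fy = min(food_positions, key=manhattan)
        food_direction = (int(fx > px) - int(fx < px), int(fy > py) - int(fy < py))
    else:
        food_direction = (0, 0)

    # The resulting tuple is hashable, so it can be used directly as a Q-table key.
    return (ghost_bucket, food_direction)
```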
(HSE/YSDA) Please send us
- The alternative qlearningAgents.py file (and any other files you modified)
- A short description of what you did there
- How to run it, usually something like
python pacman.py -p PacmanQAgent -x SOMETHING -n SOMETHING -l __mediumClassic__ -SOMETHING SOMETHING ...
- The end of the train/test log (or even the whole log), including at least the last iteration of learning and the final statistics (especially the win rate)
To get 5 points, your algorithm should win on mediumGrid more than 50% of the time. Creative features and outstanding performance on mediumClassic yield bonus points!
Alternative homework description:
- Go to the notebook
- The assignment is described there.
- If you use binder or a server, see week1 for an example of how to run CartPole and other envs.
- 5 points for implementing Q-learning and testing it on Taxi
- 5 points for solving CartPole-v0 (see the discretization sketch after this list)
- bonus tasks are listed inside
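A note for the CartPole-v0 task: its observations are continuous, so a tabular Q-learning agent only works after you discretize them. Below is a small sketch of one common approach, binning each observation component, written under the assumption that you use gym and numpy; the bounds and bin counts are arbitrary guesses to tune, not recommended values.

```python
import numpy as np

# Rough bounds and bin counts for the four CartPole-v0 observation components
# (cart position, cart velocity, pole angle, pole angular velocity).
# All numbers here are illustrative; tune them for your own agent.
BOUNDS = [(-2.4, 2.4), (-3.0, 3.0), (-0.21, 0.21), (-3.0, 3.0)]
N_BINS = [6, 6, 12, 12]


def discretize(observation):
    """Map a continuous observation to a tuple of bin indices (a hashable state)."""
    state = []
    for value, (low, high), bins in zip(observation, BOUNDS, N_BINS):
        clipped = np.clip(value, low, high)
        edges = np.linspace(low, high, bins - 1)   # bin boundaries
        state.append(int(np.digitize(clipped, edges)))
    return tuple(state)
```

A tabular agent (e.g. the one you implement for Taxi) can then use `discretize(observation)` as its state key.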