Materials

Bonus materials

Homework description:

For ease of access, we provide two versions of the same homework. They feature the same algorithmic part but slightly different examples.

You can pick whichever one you prefer, but mind the technical limitations. If you have Python 2 on a local machine (NOT in Docker), even on Windows, we recommend the ./assignment one.

./assignment

This assignment borrows code from the awesome cs188 course. It works on Python 2 only. If you stick to Python 3, consider the alternative homework, or install Python 2 for this homework alone and remove it afterwards.

This homework also requires a physical display (e.g. a laptop monitor). It won't work on a binder VM or a headless server. Please run it on a laptop or consider ./alternative.

Part I (5 points)

  • Go to ./assignment and edit qlearningAgents.py (see instructions inside; a minimal sketch of the update rule appears below).
  • Make sure you can tune the agent to beat ./run_crawler.sh
    • On Windows, just run python crawler.py from cmd in the project directory.
  • The other ./run* files are mostly for your amusement.
    • ./run_pacman.sh will need more epochs to converge, see comments.
    • On Windows, just type python pacman.py -p PacmanQAgent -x 2000 -n 2010 -l smallGrid in cmd from the assignment dir.

(YSDA/HSE) Please submit only the qlearningAgents.py file and include a brief text report as comments in it.
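
For reference, the core of what you implement is the classic tabular Q-learning update. Below is a minimal standalone sketch; it assumes nothing about the cs188 agent API, which has its own class structure and method names, so treat it only as a reference:

    import random
    from collections import defaultdict

    class QLearner(object):
        """Minimal tabular Q-learning sketch; not the cs188 agent interface."""

        def __init__(self, actions, alpha=0.5, epsilon=0.1, discount=0.9):
            self.q = defaultdict(float)  # (state, action) -> Q-value, defaults to 0
            self.actions = actions
            self.alpha, self.epsilon, self.discount = alpha, epsilon, discount

        def get_value(self, state):
            # V(s) = max over actions of Q(s, a)
            return max(self.q[(state, a)] for a in self.actions)

        def get_action(self, state):
            # epsilon-greedy: explore with probability epsilon, else act greedily
            if random.random() < self.epsilon:
                return random.choice(self.actions)
            return max(self.actions, key=lambda a: self.q[(state, a)])

        def update(self, state, action, reward, next_state):
            # Q(s,a) <- (1 - alpha) * Q(s,a) + alpha * (r + gamma * V(s'))
            target = reward + self.discount * self.get_value(next_state)
            self.q[(state, action)] = ((1 - self.alpha) * self.q[(state, action)]
                                       + self.alpha * target)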

Part II (5+ points)

Please make a separate copy of qlearningAgents.py for this assignment.

The default tabular Q-learning requires an unrealistic amount of experience to learn anything useful on pacman tasks. This is mostly due to the extremely large state space, which combines the positions of pacman, the ghosts, and all the dots: on a grid with P cells, G ghosts, and D dots there are on the order of P * P^G * 2^D distinct states.

To speed up training, you will need to implement a preprocessor that extracts new discrete features from the state space. You can design these features to account only for the most important stuff around pacman. This time, it's okay to use environment-specific duct tape :)
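
As an illustration, such a preprocessor might collapse the full game state into a tiny discrete tuple. All names below are hypothetical (this is not the cs188 API), but the idea carries over:

    def sign(x):
        # -1, 0 or +1; works on both python 2 and python 3
        return (x > 0) - (x < 0)

    def extract_features(pacman_pos, ghost_positions, food_positions):
        """Hypothetical preprocessor: reduce the game state to a small
        discrete tuple so the Q-table stays manageable."""
        px, py = pacman_pos
        # flag: is any ghost within manhattan distance 3 of pacman?
        ghost_close = any(abs(gx - px) + abs(gy - py) <= 3
                          for gx, gy in ghost_positions)
        # coarse direction (sign of dx, dy) towards the nearest food dot
        fx, fy = min(food_positions,
                     key=lambda f: abs(f[0] - px) + abs(f[1] - py))
        return (ghost_close, sign(fx - px), sign(fy - py))

Such a tuple takes only 2 * 3 * 3 = 18 distinct values, so the Q-table fills up after a few games; deciding which information is worth keeping is exactly the design decision this part grades.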

Please read the tips on how to solve it here. Also, if you find state features that work amazingly well on pacman, feel free to propose a Pull Request with your advice.

(HSE/YSDA) Please send us

  • The alternative qlearningAgents.py file (and any other files you modified)
  • A short description of what you did there
  • How to run it. Usually something like python pacman.py -p PacmanQAgent -x SOMETHING -n SOMETHING -l mediumClassic -SOMETHING SOMETHING ...
  • The end of your train/test log (or even the whole log), including at least the last iteration of learning and the final statistics (especially winrate)

To get 5 points, your algorithm should win on mediumGrid more than 50% of the time. Creative features and outstanding performance on mediumClassic yield bonus points!

./alternative

Alternative homework description:

  • Go to the notebook
  • The assignment is described there.
  • If you use binder or a server, see week1 for an example of how to run CartPole and other envs.

Grading (alternative)

  • 5 points for implementing Q-learning and testing it on Taxi
  • 5 points for solving CartPole-v0 (its observations are continuous; see the discretization sketch after this list)
  • bonus tasks listed inside
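
CartPole observations are continuous (cart position and velocity, pole angle and angular velocity), so a tabular agent needs them discretized first. A minimal sketch, assuming the classic gym API where reset() returns the observation; the bin count and clip ranges below are assumptions you should tune:

    import gym
    import numpy as np

    N_BINS = 10  # bins per observation dimension -- an assumption to tune

    # manual clip ranges: the velocity bounds in env.observation_space are
    # effectively infinite, so clipping keeps the bins informative
    LOWS  = np.array([-2.4, -3.0, -0.21, -3.0])
    HIGHS = np.array([ 2.4,  3.0,  0.21,  3.0])

    def discretize(observation):
        """Map a continuous observation to a tuple of integer bin indices."""
        ratios = (np.clip(observation, LOWS, HIGHS) - LOWS) / (HIGHS - LOWS)
        return tuple((ratios * (N_BINS - 1)).round().astype(int))

    env = gym.make("CartPole-v0")
    state = discretize(env.reset())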