week5

Materials

Slides here
Video lecture (esp. second half) by J. Schulman - https://www.youtube.com/watch?v=h1-pj4Y9-kM
Our lecture, seminar (Russian)
Article on dueling DQN - https://arxiv.org/pdf/1511.06581.pdf
Article on double DQN - https://arxiv.org/abs/1509.06461
Article on prioritized experience replay - https://arxiv.org/abs/1511.05952
Video on asynchronous methods (Mnih) - https://www.youtube.com/watch?v=9sx1_u2qVhQ
Article on bootstrap DQN - https://papers.nips.cc/paper/6501-deep-exploration-via-bootstrapped-dqn.pdf, summary

More materials

[recommended] An overview of deep reinforcement learning - https://arxiv.org/pdf/1701.07274v1.pdf
Reinforcement learning architectures list - https://github.com/5vision/deep-reinforcement-learning-networks
Building deep q-network from ~scratch (blog) - https://jaromiru.com/2016/09/27/lets-make-a-dqn-theory/
Another guide to DQN from ~scratch (blog) - https://rubenfiszel.github.io/posts/rl4j/2016-08-24-Reinforcement-Learning-and-DQN.html
Article on asynchronous methods in deep RL - https://arxiv.org/abs/1602.01783
[recap] Slides on basic DQN, including target networks - https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf

Homework

As usual, "lasagne way" and "other way"

Lasagne way

Basically go to the notebook and follow what's inside.

Other way

This week's task is to implement (and hopefully compare) target networks, double DQN and/or dueling DQN, and to train them on Atari Breakout.
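To make the difference between these variants concrete, here is a minimal NumPy sketch of the three pieces named above. It is not taken from the course notebook or the cs294 template: the function names, argument names and array shapes (`[batch]` for rewards and done flags, `[batch, n_actions]` for Q-values) are assumptions made for illustration only.

```python
import numpy as np

def dqn_targets(rewards, dones, q_next_target, gamma=0.99):
    """Target-network DQN: r + gamma * max_a Q_target(s', a)."""
    max_next = q_next_target.max(axis=1)
    # (1 - dones) zeroes the bootstrap term on terminal transitions.
    return rewards + gamma * (1.0 - dones) * max_next

def double_dqn_targets(rewards, dones, q_next_online, q_next_target, gamma=0.99):
    """Double DQN: the online net picks the action, the target net scores it."""
    best_actions = q_next_online.argmax(axis=1)
    chosen = q_next_target[np.arange(len(best_actions)), best_actions]
    return rewards + gamma * (1.0 - dones) * chosen

def dueling_q(values, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    return values[:, None] + advantages - advantages.mean(axis=1, keepdims=True)
```

In a real agent the Q-value arrays would come from forward passes of the online and target networks on the next states; only the target computation changes between plain and double DQN, which makes the two easy to compare in one training script.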

Tensorflow template: cs294 assignment 3

Implementing prioritized experience replay, bootstrapped DQN, or any other cool stuff earns you bonus points. You can also choose a different environment if you have issues with Breakout, but don't get too complicated. E.g. your DQN will likely fail on Montezuma's Revenge unless you do weird stuff with the reward function.
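For the prioritized experience replay bonus, the core of the proportional variant can be sketched as follows. This is only an illustration under assumptions: the function name, its parameters and the flat-array sampling are hypothetical (a practical buffer would typically use a sum-tree), and the importance weights are normalized by the maximum within the sampled batch as a simplification.

```python
import numpy as np

def sample_prioritized(td_errors, batch_size, alpha=0.6, beta=0.4, eps=1e-6, rng=None):
    """Sample transition indices with probability proportional to |TD error|^alpha."""
    rng = np.random.default_rng() if rng is None else rng
    # Priority p_i = (|delta_i| + eps)^alpha; eps keeps zero-error items sampleable.
    priorities = (np.abs(td_errors) + eps) ** alpha
    probs = priorities / priorities.sum()
    indices = rng.choice(len(probs), size=batch_size, p=probs)
    # Importance-sampling weights w_i = (N * P(i))^(-beta) correct the sampling bias.
    weights = (len(probs) * probs[indices]) ** (-beta)
    weights /= weights.max()  # simplification: normalize within the batch
    return indices, weights, probs
```

The returned weights would multiply the per-sample TD loss, and the stored TD errors would be refreshed for the sampled indices after each update.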

We recommend that you upload your results to OpenAI Gym and fit your solution in a notebook (ipython/torch/r) unless your framework is incompatible with that. In the latter case, please supply us with some notes on what code lies where.

Again, we recommend that you read the lasagne/agentnet assignments briefly to get a grasp of what parameters to start from.

Bonus assignments remain exactly the same as in the first track.

Blindly copy-pasting code from any publicly available demos will result in us interrogating you about every significant line of code to make sure you at least understand (and regret) what you copy-pasted.
