week4

Materials

lecture slides
David Silver lecture - https://www.youtube.com/watch?v=UoPei5o4fps&t=3s
More practical and less theoretical lecture from MIT 6.S191 - https://www.youtube.com/watch?v=xWe58WGWmlk
Karpathy's post on approximate RL - http://karpathy.github.io/2016/05/31/rl/

More materials

[recommended] How to actually do deep reinforcement learning by J. Schulman - http://rll.berkeley.edu/deeprlcourse/docs/nuts-and-bolts.pdf
interactive demos in your browser: demo1(karpathy), demo2(Hünermann)
A guide to deep RL from ~scratch (nervana blog) - https://www.nervanasys.com/demystifying-deep-reinforcement-learning/

Homework

From now on, we offer an alternative homework track that isn't tied to lasagne/agentnet/rllab/any_other_framework. In that track you'll get similar problems, but without the jupyter notebooks built around lasagne networks.

You can choose whichever track you want, but unless you already know your framework very well, we recommend that you first complete the task in lasagne and only then reproduce your solution in your chosen framework.

Recommended path

Step 1 - go to Seminar4.1, complete it and make sure it reaches the desired reward on Acrobot-v1. Then go to the homework section (at the end) and follow the instructions there.
- Tip - for your network to work properly on Acrobot-v1, either use non-saturating nonlinearities (elu/leaky_relu/softplus), normalize observations, or initialize with smaller weights. Otherwise a nonlinearity such as sigmoid may saturate and fail to learn anything.
Step 2 - go to Seminar4.2 and make it beat DoomBasic.
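The tip under Step 1 can be illustrated with a small sketch. It assumes a hypothetical pre-activation value of 20.0 (a plausible magnitude when an unnormalized velocity feature is multiplied by a weight near 1) and a hypothetical scaling constant of 28.0; neither number comes from the seminar notebooks. It compares the gradient that flows through sigmoid and through elu at that point, before and after scaling the input.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s * (1.0 - s)


def elu_grad(x, alpha=1.0):
    # d/dx elu(x): 1 for positive inputs, alpha * exp(x) otherwise
    return 1.0 if x > 0 else alpha * math.exp(x)


# Hypothetical numbers chosen for illustration only.
raw_preactivation = 20.0     # large input reaching the nonlinearity
assumed_max_magnitude = 28.0  # assumed bound used to normalize the input
scaled_preactivation = raw_preactivation / assumed_max_magnitude

print("sigmoid grad, raw input:   ", sigmoid_grad(raw_preactivation))
print("sigmoid grad, scaled input:", sigmoid_grad(scaled_preactivation))
print("elu grad, raw input:       ", elu_grad(raw_preactivation))
```

With the raw input the sigmoid gradient is vanishingly small, so almost no learning signal passes through that unit, while elu still passes a gradient of 1. Scaling the input brings sigmoid back into its responsive range, which is why any one of the three remedies in the tip can help.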

Doom environments are powered by VizDoom (via doom_py), which may require separate installation. If you're using the docker container or running in binder, the dependency should already be installed.

To install doom envs manually, follow the instructions at the top of the Seminar4.2 notebook.

For example, on python2, ubuntu 14, stardate 2017.02.27, it took us the following:

apt-get install -y gcc g++ wget unzip libsdl2-dev libboost-all-dev
pip install gym_pull
pip install ppaquette-gym-doom

If it just won't install, pick BreakoutDeterministic-v0 and try to get an average reward >= +10.

Alternative frameworks

The task is to implement approximate Q-learning with experience replay and show that it works on Acrobot-v1, LunarLander-v2 and ppaquette/DoomBasic-v0 (or other versions of those environments).
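As a framework-independent starting point, here is a minimal sketch of the two pieces the task names: a replay buffer and the one-step Q-learning target. The class and function names (ReplayBuffer, td_target) and the default gamma are illustrative assumptions, not the interface used in the seminar notebooks, and the network that produces the Q-values is left out entirely.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (s, a, r, s', done) transitions (hypothetical API)."""

    def __init__(self, capacity):
        # deque with maxlen silently drops the oldest transition when full
        self._storage = deque(maxlen=capacity)

    def __len__(self):
        return len(self._storage)

    def add(self, state, action, reward, next_state, done):
        self._storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks the temporal
        # correlation between consecutive transitions.
        batch = random.sample(list(self._storage), batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones


def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a').

    next_q_values is whatever your approximator predicts for s';
    terminal transitions get no bootstrap term.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


# Usage with made-up integer "states", just to show the data flow.
buffer = ReplayBuffer(capacity=1000)
for step in range(10):
    buffer.add(step, 0, 1.0, step + 1, step == 9)

states, actions, rewards, next_states, dones = buffer.sample(batch_size=4)
print(len(states), td_target(1.0, [0.5, 2.0], done=False, gamma=0.5))
```

In a full solution you would interleave the two: play one step in the environment, add the transition, sample a batch, compute targets for it, and take a gradient step on the squared difference between the targets and the predicted Q(s, a).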

If you use tensorflow, there's a very convenient notebook for you to start from (by Scitator).

We do, however, recommend that you skim the lasagne/agentnet assignments to get a sense of which parameters to start from.

We also recommend fitting your solution in a notebook (ipython/torch/r) unless your framework is incompatible with that. In the latter case, please give us some notes on which code lives where.

Bonus assignments remain exactly the same as in the first track.

Blindly copy-pasting code from any publicly available demos will result in us interrogating you about every significant line of code to make sure you at least understand (and regret) what you copy-pasted.

