[this week is still largely under construction]
- [main] David Silver lecture on exploration and exploitation - https://www.youtube.com/watch?v=sGuiWX07sKw
- Russian version - [under construction]
- "Deep" version: variational information maximizing exploration - https://www.youtube.com/watch?v=sRIjxxjVrnY
- Same topics in Russian - https://yadi.sk/i/_2_0yqeW3HDbcn
- Lecture covering intrinsically motivated reinforcement learning - https://www.youtube.com/watch?v=aJI_9SoBDaQ
- Slides
- Same topics in Russian - https://www.youtube.com/watch?v=WCE9hhPbCmc
- Note: UCB-1 is not restricted to Bernoulli rewards; it works for any reward r in [0,1]. So if your rewards have known bounds, you can simply rescale them to [0,1] for peace of mind. The confidence bound is derived directly from Hoeffding's inequality.
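A minimal Python sketch of UCB-1 with the rescaling trick, assuming each arm's raw reward lies in a known range [r_min, r_max]. The arm index is the empirical mean plus sqrt(2 ln t / n_j): Hoeffding gives P(true mean > x̄_j + u) ≤ exp(-2 n_j u²) for rewards in [0,1], and choosing u so that this probability equals t^(-4) yields exactly that bonus. The `UCB1` class, the `rescale` helper and the simulated arms below are hypothetical illustrations, not part of the course code.

```python
import math
import random


def rescale(r, r_min, r_max):
    """Map a reward from a known bounded range [r_min, r_max] onto [0, 1]."""
    return (r - r_min) / (r_max - r_min)


class UCB1:
    """UCB-1 index policy; assumes every reward passed to update() is in [0, 1]."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms    # n_j: how many times arm j was pulled
        self.values = [0.0] * n_arms  # empirical mean reward of arm j
        self.t = 0                    # total number of pulls so far

    def select_arm(self):
        # Pull every arm once before the confidence bound is defined.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm

        # Index = empirical mean + sqrt(2 ln t / n_j) (Hoeffding-based bonus).
        def index(arm):
            bonus = math.sqrt(2 * math.log(self.t) / self.counts[arm])
            return self.values[arm] + bonus

        return max(range(len(self.counts)), key=index)

    def update(self, arm, reward):
        # Incremental update of the running mean for the pulled arm.
        self.t += 1
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical arms: raw rewards uniform around these means, always in [-10, 10].
    means = [-5.0, 0.0, 5.0]
    agent = UCB1(len(means))
    for _ in range(5000):
        arm = agent.select_arm()
        raw = random.uniform(means[arm] - 2, means[arm] + 2)
        agent.update(arm, rescale(raw, -10.0, 10.0))
    print(agent.counts)  # the best arm (index 2) should receive most of the pulls
```

Rescaling is only safe when the bounds are real: if a raw reward can fall outside [r_min, r_max], the Hoeffding argument no longer holds and the bonus may be too small.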
under construction...