Future improvements

First, hands down, amazing work. Taking this as a baseline, I see one possible improvement, if someone wants to implement it:

- The n-step return, as it is, is biased: the samples come from older policies, so they are off-policy, and the multi-step return applies no off-policy correction to them. Retrace [[Safe and Efficient Off-Policy Reinforcement Learning](https://arxiv.org/abs/1606.02647)] would resolve the issue. Implementing Retrace in distributional RL is not straightforward, but [[The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning](https://arxiv.org/abs/1704.04651)] deals with it (apparently without quantile regression, however).
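To make the suggestion concrete, here is a minimal sketch of Retrace targets for a single replayed segment, written for the ordinary expected-value case only. It is not the distributional variant and not this repository's code; the function name `retrace_targets` and all argument names are hypothetical. It assumes the segment contains no terminal state except possibly at its end, and that the behaviour probabilities `mu_taken` were stored alongside the transitions.

```python
import numpy as np

def retrace_targets(rewards, q_taken, v_next, pi_taken, mu_taken,
                    gamma=0.99, lam=1.0):
    """Retrace(lambda) targets for one segment of length T (expected-value case).

    rewards[t]  : r_t
    q_taken[t]  : Q(x_t, a_t) under the current estimate
    v_next[t]   : E_{a ~ pi} Q(x_{t+1}, a); use 0 if x_{t+1} is terminal
    pi_taken[t] : pi(a_t | x_t), target-policy probability of the stored action
    mu_taken[t] : mu(a_t | x_t), behaviour-policy probability (assumed stored)
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    q_taken = np.asarray(q_taken, dtype=np.float64)
    v_next = np.asarray(v_next, dtype=np.float64)
    pi_taken = np.asarray(pi_taken, dtype=np.float64)
    mu_taken = np.asarray(mu_taken, dtype=np.float64)

    # Truncated importance weights c_t = lam * min(1, pi / mu).
    c = lam * np.minimum(1.0, pi_taken / mu_taken)

    T = len(rewards)
    targets = np.empty(T, dtype=np.float64)
    # carry holds c_{t+1} * (G_{t+1} - Q(x_{t+1}, a_{t+1})); zero past the segment end.
    carry = 0.0
    for t in reversed(range(T)):
        # G_t = r_t + gamma * E_pi Q(x_{t+1}, .) + gamma * c_{t+1} * (G_{t+1} - Q_{t+1})
        targets[t] = rewards[t] + gamma * (v_next[t] + carry)
        carry = c[t] * (targets[t] - q_taken[t])
    return targets
```

When the stored actions are exactly what the current policy would take (`pi_taken == mu_taken`, `lam=1`, and `v_next[t] == q_taken[t+1]`), this reduces to the uncorrected n-step return; when `pi_taken` is 0 everywhere, the traces are cut and it falls back to one-step targets. The distributional case would need the same traces applied to return distributions rather than scalars, which is the non-trivial part.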




Future improvements #23
