Open
Description
First, hands down, amazing work. Since this serves as a baseline, I see one possible improvement, if someone wants to implement it:
- The n-step return, as implemented, is biased because it is computed from old off-policy samples without any correction. Retrace [Safe and Efficient Off-Policy Reinforcement Learning] would resolve this. Implementing Retrace in distributional RL is not straightforward, but [The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning] appears to address it (though, as far as I can tell, without quantile regression).
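
To make the suggestion concrete, here is a minimal sketch of the Retrace target for plain expected Q-values along a single replayed trajectory. It is not the distributional variant and not this repository's code: the function name `retrace_targets` and all argument names are hypothetical, and I am assuming the replay buffer stores the behaviour policy's probability for each taken action so the importance ratio can be formed.

```python
import numpy as np


def retrace_targets(q_taken, v_next, rewards, ratios, gamma=0.99, lam=1.0):
    """Retrace(lambda) targets for one trajectory of length T (expected-value case).

    All names here are illustrative assumptions:
      q_taken[t] = Q(x_t, a_t) for the action stored in the replay buffer
      v_next[t]  = sum_a pi(a | x_{t+1}) Q(x_{t+1}, a), set to 0 after a terminal step
      rewards[t] = r_t
      ratios[t]  = pi(a_t | x_t) / mu(a_t | x_t), target over behaviour probability
    """
    q_taken = np.asarray(q_taken, dtype=np.float64)
    v_next = np.asarray(v_next, dtype=np.float64)
    rewards = np.asarray(rewards, dtype=np.float64)
    ratios = np.asarray(ratios, dtype=np.float64)

    T = len(rewards)
    # Truncated importance weights c_t = lam * min(1, pi / mu).
    c = lam * np.minimum(1.0, ratios)
    # One-step TD errors under the target policy.
    delta = rewards + gamma * v_next - q_taken

    targets = np.empty(T, dtype=np.float64)
    acc = 0.0  # accumulated correction, i.e. target[t + 1] - q_taken[t + 1]
    for t in reversed(range(T)):
        # The trace is cut at the end of the stored trajectory.
        c_next = c[t + 1] if t + 1 < T else 0.0
        acc = delta[t] + gamma * c_next * acc
        targets[t] = q_taken[t] + acc
    return targets
```

Two sanity checks on the behaviour: when every ratio is 0 the trace is cut immediately and the targets reduce to the one-step targets `rewards + gamma * v_next`; when the data is on-policy (`ratios` all 1, `lam=1`) the full multi-step return is used. The uncorrected n-step return corresponds to treating every `c` as 1 regardless of how far the stored actions are from the current policy, which is where the bias comes from.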