Closed
Description
I think the way you transform the value/reward does not quite match the original paper at this line (line 153 in commit fe791e8).
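According to the referenced paper (https://arxiv.org/abs/1805.11593), the transformation function should be:

$$h(x) = \operatorname{sign}(x)\left(\sqrt{|x| + 1} - 1\right) + \varepsilon x$$

where ε is a small constant (0.001 in the code).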
So instead of
x = torch.sign(x) * (torch.sqrt(torch.abs(x) + 1) - 1 + 0.001 * x)
the correct formula should be
x = torch.sign(x) * (torch.sqrt(torch.abs(x) + 1) - 1) + .001 * x
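Here is a minimal sketch comparing the two expressions. It uses plain Python `math` instead of `torch` so it runs without PyTorch. The names `h_current` and `h_paper` are hypothetical, and `EPS = 0.001` is taken from the quoted lines. Expanding the current line gives sign(x)(√(|x|+1) − 1) + 0.001·sign(x)·x = sign(x)(√(|x|+1) − 1) + 0.001·|x|. So the two versions agree for x ≥ 0 but differ by 2·0.001·|x| for negative x. For example, x = −3 gives −0.997 with the current line and −1.003 with the paper's h. The current version is also no longer an odd function.

```python
import math

# Assumption: same epsilon as the quoted line 153 (illustrative constant name).
EPS = 0.001


def sign(x: float) -> float:
    """Return -1.0, 0.0 or 1.0, mirroring torch.sign for scalars."""
    return float((x > 0) - (x < 0))


def h_current(x: float, eps: float = EPS) -> float:
    """Transform as currently written: eps * x sits inside sign(x) * (...)."""
    return sign(x) * (math.sqrt(abs(x) + 1) - 1 + eps * x)


def h_paper(x: float, eps: float = EPS) -> float:
    """h(x) = sign(x) * (sqrt(|x| + 1) - 1) + eps * x, as in arXiv:1805.11593."""
    return sign(x) * (math.sqrt(abs(x) + 1) - 1) + eps * x


if __name__ == "__main__":
    # Positive inputs agree; negative inputs diverge by 2 * eps * |x|.
    for x in (-100.0, -3.0, 0.0, 3.0, 100.0):
        print(f"x={x:7.1f}  current={h_current(x):.4f}  paper={h_paper(x):.4f}")
```

Because the paper's h is odd, h(−x) = −h(x) holds for it, which is a quick sanity check for any fix.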