value/reward transform issue

I think the way you transform the value/reward does not quite match the original paper at this line (https://github.com/werner-duvaud/muzero-general/blob/fe791e8651645ea05f5b582157b4892588ee56ca/trainer.py#L153)

From the referenced paper (https://arxiv.org/abs/1805.11593), the transformation function should be 
<img width="346" alt="transform" src="https://user-images.githubusercontent.com/29387830/74098329-93229880-4adc-11ea-9705-668d8e164a9a.png">

So instead of

`x = torch.sign(x) * (torch.sqrt(torch.abs(x) + 1) - 1 + 0.001 * x)`

the correct formula should be

`x = torch.sign(x) * (torch.sqrt(torch.abs(x) + 1) - 1) + .001 * x`
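
To make the difference concrete, here is a minimal scalar sketch of both formulas. It uses plain `math` instead of `torch` so it runs without dependencies, and the names `h_current`, `h_paper`, `sign`, and `EPS` are only for illustration, not from the repo. The two forms agree for `x >= 0`, but for negative `x` the `0.001 * x` term ends up with the wrong sign when it sits inside the `sign(x)` factor.

```python
import math

EPS = 0.001  # the epsilon constant that appears in both formulas above


def sign(x):
    # Scalar stand-in for torch.sign: returns -1, 0 or 1.
    return (x > 0) - (x < 0)


def h_current(x):
    # Form quoted from trainer.py: the EPS * x term is inside the sign(x) factor.
    return sign(x) * (math.sqrt(abs(x) + 1) - 1 + EPS * x)


def h_paper(x):
    # Form from the paper: the EPS * x term is added outside the sign(x) factor.
    return sign(x) * (math.sqrt(abs(x) + 1) - 1) + EPS * x


if __name__ == "__main__":
    for x in (-100.0, -3.0, 0.0, 3.0, 100.0):
        print(f"x={x:8.1f}  current={h_current(x):.6f}  paper={h_paper(x):.6f}")
```

For negative inputs the two results differ by `2 * EPS * abs(x)`, so the gap is small near zero but grows with the magnitude of the value/reward. The paper's form is also odd-symmetric (`h(-x) == -h(x)`), which the current form is not.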
