Hi,
Thanks for the information!

For the 'wrap' mode in `numpy.put()`, you are correct; I fixed that line in this commit. For the mean, I don't have a reference. Since we first sample games and then sample transitions from those games, we need two prioritized sampling processes, which differs from the Prioritized Experience Replay paper. An alternative to the mean could be the max, since after averaging, transitions with high priorities may be washed out. I'm still testing these two schemes.
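As a rough illustration of the two aggregation schemes (the function and variable names here are hypothetical, not the actual implementation):

```python
import numpy as np

def game_priority(transition_priorities, mode="mean"):
    # Aggregate per-transition priorities into one game-level priority used by
    # the first (game-level) stage of the two-level prioritized sampling.
    # "mean" can wash out a few high-priority transitions inside a long game,
    # whereas "max" keeps the game likely to be sampled as long as it contains
    # at least one high-priority transition.
    if mode == "mean":
        return np.mean(transition_priorities)
    if mode == "max":
        return np.max(transition_priorities)
    raise ValueError(f"unknown aggregation mode: {mode}")
```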
I also added an option to choose whether or not to use prioritized replay (`self.PER` in the config file). If it is set to False, the algorithm never updates the priorities, so all transitions keep equal priorities (currently 1.0), which is equivalent to uniform sampling. I also made the alpha in the PER algorithm configurable by adding `self.PER_alpha` to the config file.
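To make the new options concrete, here is a rough sketch of the relevant config fields (only the names `self.PER` and `self.PER_alpha` come from the actual config; the surrounding class is just a placeholder):

```python
class Config:
    def __init__(self):
        # If False, priorities are never updated and stay at 1.0,
        # so prioritized sampling degenerates to uniform sampling.
        self.PER = True
        # Exponent alpha from the PER paper; priorities are used as p_i ** alpha,
        # so alpha = 0 also recovers uniform sampling.
        self.PER_alpha = 1.0
```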
For the remaining items on the list, here are my thoughts:

I think that to calculate the IS weights for each sample, we need $N$ and $P(i)$ in this formula:

$$w_i = \left(\frac{1}{N} \cdot \frac{1}{P(i)}\right)^{\beta}$$

where $P(i) = \frac{p_i^{\alpha}}{\sum_k p_k^{\alpha}}$ is the sampling probability of transition $i$ and $N$ is the replay buffer size, as defined in the Prioritized Experience Replay paper.
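A minimal sketch of that computation, assuming we have the full array of buffer priorities and the indices of the sampled transitions (the function name and signature are hypothetical):

```python
import numpy as np

def importance_sampling_weights(priorities, sampled_idx, alpha, beta):
    # P(i) = p_i^alpha / sum_k p_k^alpha over the whole buffer of size N
    probs = priorities ** alpha
    probs = probs / probs.sum()
    N = len(priorities)
    # w_i = (1/N * 1/P(i))^beta, then normalized by the maximum weight for
    # stability, as done in the PER paper.
    weights = (N * probs[sampled_idx]) ** (-beta)
    return weights / weights.max()
```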
As suggested by the Distributed Prioritized Experience Replay paper (page 4, paragraph 2), the initial priority can either be set to the maximum priority seen so far (which performs well when the replay buffer is small but does not scale to cases with many actors and a large replay buffer) or to the current n-step TD error (as you suggested, which requires an additional prediction step). We can make this configurable.
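A hedged sketch of how the two initialization schemes could be exposed behind a config switch (all names here, including `initial_priority_mode` and `predict_value`, are hypothetical placeholders):

```python
def initial_priority(config, max_priority_seen, model, observation, n_step_return):
    if config.initial_priority_mode == "max_seen":
        # Cheap: reuse the maximum priority seen so far. Works well with a small
        # replay buffer but becomes stale with many actors / a large buffer.
        return max_priority_seen
    else:
        # "td_error": pay for one extra prediction step and use the current
        # n-step TD error as the initial priority.
        predicted_value = model.predict_value(observation)
        return abs(n_step_return - predicted_value)
```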
Let me know if my points make sense to you!