RNaD Entropy Schedule #1076

spktrm · 2023-05-25T10:46:59Z

s = EntropySchedule(sizes=(10,))
print([s(i) for i in range(20)])
[
(0.0, False), (0.2, False), (0.4, False), (0.6, False), (0.8, False), 
(1.0, False), (1.0, False), (1.0, False), (1.0, False), (1.0, False), 
(0.0, True), (0.2, False), (0.4, False), (0.6, False), (0.8, False), 
(1.0, False), (1.0, False), (1.0, False), (1.0, False), (1.0, False)
]

Shouldn't the alpha value be 1.0 when the regularization nets are updated?

Since it is 0, this effectively disrupts the linear interpolation between regularization policies.

lanctot · 2023-06-01T19:16:49Z

@perolat, @bartdevylder: any ideas?

perolat · 2023-06-13T12:52:42Z

Hi @spktrm,

Thanks for the question.

It should become 0 after we update the two regularisation policies. Unless we missed an edge case, there should only be continuous interpolations between regularisation networks.

It should go this way:

alpha goes from 0 to 1 linearly over half of the interval (interpolate from pi_{reg, 0} to pi_{reg, 1}),
then alpha stays at one for the rest of the interval (here the regularisation policy is pi_{reg, 1}),
we update the networks (to interpolate between pi_{reg, 1} and pi_{reg, 2})
start a new interval from alpha=0 (so at the first step of this interval we start with the regularisation policy pi_{reg, 1})
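
The intended schedule described above can be sketched in plain Python. This is only an illustration: `intended_alpha` and `interval_size` are hypothetical names that do not come from the library, and the sketch assumes a single interval length that simply repeats.

```python
def intended_alpha(step: int, interval_size: int) -> float:
    """Alpha for a learner step, following the description above (hypothetical helper)."""
    # Position inside the current interval; it resets to 0 when a new interval starts.
    offset = step % interval_size
    # Ramp linearly from 0 to 1 over the first half of the interval, then hold at 1.
    return min(2.0 * offset / interval_size, 1.0)


alphas = [intended_alpha(i, 10) for i in range(20)]
print(alphas)
```

With `interval_size=10`, alpha reaches 1 at step 5, stays there through step 9, and restarts from 0 at step 10.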

Let me know if you see something that doesn't match this intended behaviour.

Julien

spktrm · 2023-06-13T13:35:50Z

In the current code, on the steps the regularisation nets are updated, alpha = 0. The regularisation nets are updated after alpha is used to compute the parameter updates. As such, the current entropy schedule implementation outputs alpha = 0 when update_target_net is True.
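
To make the ordering issue concrete, here is a small simulation, again a sketch under stated assumptions. `schedule` and `effective_policy` are hypothetical helpers, the regularisation policies are stood in for by integer indices, and the `update_on_last_step=True` variant is just one possible alternative, not necessarily the fix that was eventually merged.

```python
def schedule(step, size, update_on_last_step):
    """Return (alpha, update_target_net) for a step (hypothetical helper)."""
    alpha = min(2.0 * (step % size) / size, 1.0)
    if update_on_last_step:
        # Assumed alternative: raise the flag on the last step of each interval,
        # where alpha is already 1.
        update = step % size == size - 1
    else:
        # Behaviour reported in this issue: the flag is raised on the first
        # step of the next interval, where alpha has already reset to 0.
        update = step > 0 and step % size == 0
    return alpha, update


def effective_policy(size, steps, update_on_last_step):
    """Trace the interpolated regularisation policy index at each step."""
    prev_idx, next_idx = 0, 1  # stand-ins for pi_{reg, 0} and pi_{reg, 1}
    trace = []
    for step in range(steps):
        alpha, update = schedule(step, size, update_on_last_step)
        # Alpha is consumed first, against the networks as they are now ...
        trace.append(prev_idx + alpha * (next_idx - prev_idx))
        # ... and only afterwards are the regularisation networks advanced.
        if update:
            prev_idx, next_idx = next_idx, next_idx + 1
    return trace


reported = effective_policy(size=10, steps=12, update_on_last_step=False)
alternative = effective_policy(size=10, steps=12, update_on_last_step=True)
print(reported[9:12])
print(alternative[9:12])
```

In the reported ordering the trace falls back to index 0 at step 10, because alpha = 0 is applied before the networks are advanced. In the alternative ordering the trace stays at index 1 across the interval boundary.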

lanctot · 2023-08-31T14:18:40Z

Update: We have identified this as a bug and Eugene Tarassov has a fix. It will be fixed on the next sync to GitHub.

lanctot added the labels bug (Something isn't working) and fixed (This is fixed internally, and will be merged in the next github sync!) on Aug 31, 2023

lanctot closed this as completed in 3be74c1 Sep 12, 2023
