It should become 0 after we update the two regularisation policies. Unless we missed an edge case, there should only be continuous interpolations between regularisation networks.
It should go this way:
alpha goes from 0 to 1 linearly over half of the interval (interpolate from pi_{reg, 0} to pi_{reg, 1}),
then alpha stays at one for the rest of the interval (here the regularisation policy is pi_{reg, 1}),
we update the networks (to interpolate between pi_{reg, 1} and pi_{reg, 2})
start a new interval from alpha=0 (so at the first step of this interval we start with the regularisation policy pi_{reg, 1})
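The steps above can be sketched as a small simulation. This is only an illustration of the intended behaviour, not the repository's code: `intended_alpha`, `simulate`, the fixed `interval` length, and the use of plain floats as stand-ins for the regularisation policies are all assumptions made for the example.

```python
def intended_alpha(step: int, interval: int) -> tuple[float, bool]:
    """Hypothetical schedule: returns (alpha, update_after_this_step)."""
    pos = step % interval                   # position inside the current interval
    alpha = min(2.0 * pos / interval, 1.0)  # linear ramp over the first half, then 1
    update_after = pos == interval - 1      # swap nets once the last step has used alpha
    return alpha, update_after


def simulate(policies: list[float], interval: int) -> list[float]:
    """Trace the effective regularisation policy, with scalars standing in for policies."""
    prev, cur = policies[0], policies[1]
    next_idx = 2
    trace = []
    for step in range(interval * (len(policies) - 1)):
        alpha, update_after = intended_alpha(step, interval)
        # Effective policy: interpolate between the two regularisation nets.
        trace.append((1.0 - alpha) * prev + alpha * cur)
        if update_after and next_idx < len(policies):
            # pi_{reg, n} becomes the start point; the next policy is the new target.
            prev, cur = cur, policies[next_idx]
            next_idx += 1
    return trace


print(simulate([0.0, 1.0, 2.0], interval=4))
# [0.0, 0.5, 1.0, 1.0, 1.0, 1.5, 2.0, 2.0]
```

In this sketch the update happens while alpha is 1, so the first step of the new interval (alpha = 0 on the new pair) yields the same effective policy as the last step of the old one, and the trace never jumps.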
Let me know if you see something that doesn't match this intended behaviour.
In the current code, alpha = 0 on the steps where the regularisation nets are updated. The regularisation nets are updated after alpha has been used to compute the parameter updates. As such, the current entropy schedule implementation outputs alpha = 0 when update_target_net is True.
Shouldn't the alpha value be 1.0 when the regularization nets are updated?
Since it is 0, this effectively disrupts the linear interpolation between regularization policies.
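To make the concern concrete, here is a minimal sketch of the behaviour as described in this report, not a copy of the actual implementation. It assumes the update flag fires on the first step of a new interval, where alpha is already 0, and that the nets are swapped only after alpha has been applied. `reported_trace` and the scalar stand-ins for policies are hypothetical.

```python
def reported_trace(policies: list[float], interval: int, total_steps: int) -> list[float]:
    """Effective regularisation policy if the update coincides with alpha = 0."""
    prev, cur = policies[0], policies[1]
    next_idx = 2
    trace = []
    for step in range(total_steps):
        pos = step % interval
        alpha = min(2.0 * pos / interval, 1.0)
        # alpha is applied to the OLD pair before the nets are swapped.
        trace.append((1.0 - alpha) * prev + alpha * cur)
        # Assumed timing: the flag is raised at the start of each new interval.
        if step > 0 and pos == 0 and next_idx < len(policies):
            prev, cur = cur, policies[next_idx]
            next_idx += 1
    return trace


print(reported_trace([0.0, 1.0, 2.0], interval=4, total_steps=7))
# [0.0, 0.5, 1.0, 1.0, 0.0, 1.5, 2.0]
```

Under these assumptions the effective policy falls back to the first policy for one step (the 0.0 at index 4) before resuming the ramp, which is the discontinuity the question is about.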