correct issues about symmetric games in psro_v2 #1031
Merged
Hi, I think there are a few issues in the current psro_v2 with the symmetric game option, so I am submitting this PR. They are:
(1) `get_metagame()`. The original implementation returns `self._game_num_players * self._meta_games`. This is incorrect because `self._meta_games` is always a list of N tensors of shape [|S1|, |S2|, ..., |SN|], where each tensor contains the payoff values of one player and |Sn| is the number of strategies of player n. `self._game_num_players * self._meta_games` therefore yields N references to the same N tensors, i.e. N^2 tensors. My guess is that it was originally assumed that `self._meta_games` only contains the payoff tensor of player 0, but I don't see anywhere in psro_v2 that handles the meta-game specially for symmetric games. Even if that were the case, it would still be incorrect: `get_metagame()` returns a payoff tensor for each player, so `get_metagame()[0]` and `get_metagame()[1]` are still different even though they obey some degree of symmetry. (See the sketch below for why the multiplication replicates the list.)
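A minimal sketch of the list-replication behaviour. The player count and tensor shapes here are made up for illustration; only the Python list semantics matter:

```python
import numpy as np

# Toy setup: 2 players, 3 strategies each (shapes are illustrative only).
num_players = 2
meta_games = [np.zeros((3, 3)) for _ in range(num_players)]  # one payoff tensor per player

# Multiplying a Python list replicates references, so this yields
# num_players * num_players tensors instead of num_players.
wrong = num_players * meta_games
assert len(wrong) == 4
assert wrong[0] is wrong[2]  # the "extra" tensors are aliases of the originals

# Returning the list as-is keeps exactly one payoff tensor per player.
correct = meta_games
assert len(correct) == 2
```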
(2) `_initialize_policy`. In psro_v2, when `symmetric_game=True`, `self._policies` and `self._new_policies` represent a single-population strategy set whenever they are not being updated. However, `_initialize_policy` has no special handling for this case and initializes the policy sets as an N-player policy list. (A hedged sketch of the intended single-population initialization is shown below.)
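For illustration, a sketch of the intended behaviour, assuming the attribute names used in psro_v2 (`self.symmetric_game`, `self._num_players`, `self._game`) and `policy.UniformRandomPolicy` as the default initial policy; this is not necessarily the exact code in the PR:

```python
from open_spiel.python import policy


def _initialize_policy(self, initial_policies):
  """Sketch: keep one shared population when the game is symmetric."""
  if self.symmetric_game:
    # A single population shared by all players instead of an N-player list.
    self._policies = [[]]
    self._new_policies = [([initial_policies[0]] if initial_policies
                           else [policy.UniformRandomPolicy(self._game)])]
  else:
    # One strategy set per player, as in the asymmetric case.
    self._policies = [[] for _ in range(self._num_players)]
    self._new_policies = [
        ([initial_policies[k]] if initial_policies
         else [policy.UniformRandomPolicy(self._game)])
        for k in range(self._num_players)
    ]
```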
(3) In several places, when the code switches between the one-population strategy set representation and the N-player strategy set representation, it does something like `self._policies = self._game_num_players * self._policies`. This approach assumes it is safe to use N references to the same policy set object for other computations, which is not always the case. For example, if we use `open_spiel.python.jax.dqn` as an oracle, each policy carries player_id information, and `self._policies = self._game_num_players * self._policies` may produce N DQN agents that all have `player_id=0`, which can lead to bugs. This part needs some ad-hoc treatment, including in how the oracle object is built, so I put comments wherever it appears. (The sketch below illustrates the aliasing pitfall and one possible workaround.)
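To make the concern concrete, here is a self-contained illustration with a hypothetical stand-in class (`FakeRLPolicy` is not part of OpenSpiel); the same aliasing happens with any oracle policy that stores a `player_id`:

```python
import copy


class FakeRLPolicy:
  """Hypothetical stand-in for an RL oracle policy that records its player id."""

  def __init__(self, player_id):
    self.player_id = player_id


num_players = 2
population = [FakeRLPolicy(player_id=0)]  # single-population set, trained as player 0
policies = [population]

# The pattern used in psro_v2 when expanding to the N-player representation:
aliased = num_players * policies
assert aliased[0] is aliased[1]          # both "players" point at the same list object
assert aliased[1][0].player_id == 0      # player 1's policy still believes it is player 0

# One possible ad-hoc treatment: copy the population per player and fix the id.
expanded = [copy.deepcopy(population) for _ in range(num_players)]
for pid, pop in enumerate(expanded):
  for pol in pop:
    pol.player_id = pid
```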