Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

correct issues about symmetric games in psro_v2 #1031

Merged
merged 1 commit into from
Mar 20, 2023

Conversation

rezunli96
Copy link
Contributor

@rezunli96 rezunli96 commented Mar 9, 2023

Hi, I think there are a few issues in the current psro_v2 with the symmetric game option so I submit a PR. They are:

(1) for get_metagame(). The original one returns self._game_num_players * self._meta_games. This is incorrect because self._meta_games is always a list of N [|S1|, |S2|,...,|SN|] tensors where each tensor contains the payoff values of a player. |Sn| is the number of strategies of player n. self._game_num_players * meta_games will result to N references of the same N tensors which are N^2 tensors.

My guess is that originally it was thought that self._meta_games only contains the tensor of player 0. But I don't see anywhere in psro_v2 handling metagame for symmetric game specifically. And even if it is the case this is still incorrect. Because get_metagame() gives payoff tensors for each player, so get_metagame()[0] and get_metagame()[1] are still different despite that they obey some degree of symmetry.

(2) _initialize_policy. In psro_v2, when symmetric_game=True, self._policies, self._new_policies will represent a single population strategy set when they aren't getting updated. But in _initialize_policy it does not have a special handler for this case and initializes the policy sets as an N-player policy list.

(3) Also in multiple places when it switches between one-population strategy set representation and N-player strategy set representation it does something like self._policies = self._game_num_players * self._policies. This approach assumes it is safe to use N references of the same policy set object for other computational purposes. However this might not be always be safe. For example if we use open_spiel.python.jax.dqn as an oracle, then it will contain player_id information. And self._policies = self._game_num_players * self._policies may contain N DQN all with player_id=0, which could lead to bug. This part needs some ad-hoc treatment, including how you build the oracle object so I put some comments at anywhere it appears.

@lanctot
Copy link
Collaborator

lanctot commented Mar 9, 2023

Thanks!

@PaulFMMuller, can you take a look?

@lanctot lanctot requested a review from PaulFMMuller March 10, 2023 14:47
Copy link
Collaborator

@PaulFMMuller PaulFMMuller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @rezunli96 !

@lanctot lanctot added imported This PR has been imported and awaiting internal review. Please avoid any more local changes, thanks! merged internally The code is now submitted to our internal repo and will be merged in the next github sync. labels Mar 13, 2023
@lanctot lanctot merged commit 9ed727f into google-deepmind:master Mar 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
imported This PR has been imported and awaiting internal review. Please avoid any more local changes, thanks! merged internally The code is now submitted to our internal repo and will be merged in the next github sync.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants