
Adapt PPO algorithm from CleanRL to OpenSpiel. Adapt Gym Atari environment to OpenSpiel #889

Merged (22 commits) on Nov 24, 2022

Conversation

newmanne
Contributor

@lanctot @ssokota: @gregdeon and I have adapted @vwxyzjn's CleanRL PPO implementation to work with OpenSpiel. To test the implementation, we have also wrapped Gym's Atari games as an OpenSpiel game.

Notes

  • We included a SyncVectorEnv class modelled after Gym's class but built on OpenSpiel RL environments (a minimal sketch follows this list). Running multiple environments in parallel matters a lot for performance.
  • This only works for single-player games right now. We plan to get a multiplayer version running, but there are some challenges: when using a SyncVectorEnv, the "turn order" of players across the environments can quickly get out of sync, and that is not trivial to resolve. The games @gregdeon and I are studying are simultaneous-move games with no player elimination; this class of games can never go out of sync (the same agent is always acting in every environment). That is all we were planning to support, but perhaps we are overlooking a simpler way to handle arbitrary turn orders in combination with vector environments.
  • Atari only works if gym, the Atari ROMs, and Stable Baselines are all installed.
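For reference, here is a minimal sketch of the vector-environment idea. This is illustrative only; the SyncVectorEnv class actually included in the PR may differ in its interface and details (for example, in how terminal rewards are surfaced to the learner before auto-resetting).

from open_spiel.python import rl_environment


class SyncVectorEnvSketch(object):
  """Steps several single-player OpenSpiel RL environments in lockstep."""

  def __init__(self, envs):
    self.envs = envs

  def reset(self):
    # One TimeStep per environment.
    return [env.reset() for env in self.envs]

  def step(self, actions):
    # One action per environment; each env.step expects a list of actions.
    time_steps = [
        env.step([action]) for env, action in zip(self.envs, actions)
    ]
    # Auto-reset environments whose episode just ended so that every slot
    # always holds a live episode.
    return [
        env.reset() if ts.last() else ts
        for env, ts in zip(self.envs, time_steps)
    ]


if __name__ == "__main__":
  # Example usage with 4 copies of the built-in single-player game "catch".
  vec_env = SyncVectorEnvSketch(
      [rl_environment.Environment("catch") for _ in range(4)])
  time_steps = vec_env.reset()
  # Take the first legal action in each environment.
  actions = [ts.observations["legal_actions"][0][0] for ts in time_steps]
  time_steps = vec_env.step(actions)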

Evaluation

  • Catch converges to a score of 1.0 every episode fairly quickly (71_680 steps).
  • Performance of three seeds of Breakout on TensorBoard (heavily smoothed) after 10_000_000 iterations (~8 wall-clock hours on our setup):
    [TensorBoard screenshot] You can compare to this example; we don't match it exactly but seem to be in the right ballpark.

Example commands:

python ppo_example.py --game-name catch  # runs catch
python ppo_example.py                    # runs breakout

@google-cla

google-cla bot commented Jul 23, 2022

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.


@lanctot
Collaborator

lanctot commented Jul 23, 2022

Would be nice to have a non-Atari test that solves a really basic single-player game. See the pytorch DQN test for how to construct one using the gambit EFG format: https://github.com/deepmind/open_spiel/blob/b5e0bf6495bd8baf7c6011c18fa4fd403e21385d/open_spiel/python/pytorch/dqn_pytorch_test.py#L39

Note: when you add a python test you also need to add it to python/CMakeLists.txt.
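For illustration, a bare-bones skeleton for such a test might look like the following. This is not the test added in this PR: it loads the built-in single-player game "catch" rather than a gambit EFG definition, and it only rolls out random actions where a real test would train the PPO agent and assert on its returns.

import random

from absl.testing import absltest

from open_spiel.python import rl_environment


class PPOSmokeTest(absltest.TestCase):

  def test_random_rollout_on_catch(self):
    env = rl_environment.Environment("catch")
    time_step = env.reset()
    while not time_step.last():
      # A real test would query the trained PPO agent here instead.
      legal_actions = time_step.observations["legal_actions"][0]
      time_step = env.step([random.choice(legal_actions)])
    # The episode terminated and produced a reward for the single player.
    self.assertLen(time_step.rewards, 1)


if __name__ == "__main__":
  absltest.main()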

@lanctot
Collaborator

lanctot commented Jul 23, 2022

Awesome!

First thing: something went wrong with the CLA. Can you look into the message above and follow the steps, and let me know when you have done so? I will then rerun the check.

It seems that one of the commits is attributed to an AWS user that has not signed the CLA:

[screenshot]

Unfortunately I can't import the PR until this is resolved.

I think the easiest thing is probably to make a fresh branch of master, copy over all the files (individually, not the .git subdirectories), and open a fresh PR with those files. That way, it should show up as a single commit from the main author.


@vwxyzjn vwxyzjn left a comment


Hey @newmanne, this is great work. I left a couple of comments for minor issues 🙂

I saw open_spiel also has a folder for JAX learning algorithms. As a relevant note, we are experimenting with a PPO + JAX integration in CleanRL, which shows promising speed improvements with EnvPool.

Don't worry about integrating it now, since the prototype is in its early stages (messy code), but it might be worth integrating in the future once we finalize it with more tests and documentation.


3 inline review comments on open_spiel/python/pytorch/ppo.py (outdated, resolved)
@lanctot lanctot self-requested a review July 25, 2022 18:54
Collaborator

@lanctot lanctot left a comment


Can you add a simple test, i.e. the one described here: #889 (comment)

@lanctot
Collaborator

lanctot commented Jul 27, 2022

Awesome, thanks. Can you add PPO to docs/algorithms.md and Atari to docs/games.md?

@lanctot
Collaborator

lanctot commented Aug 4, 2022

Thanks @vwxyzjn for taking a look!

Apologies @newmanne for the delay, I was on vacation for a large part of July. I'm back now and will ask one of the team to take a quick look as well.

Collaborator

@lanctot lanctot left a comment


Hi guys, I noticed you're using four-space indentation, but we use two-space indentation in OpenSpiel; can you change it to two spaces?

I also noted a few other things in separate comments.

open_spiel/python/examples/ppo_example.py (outdated, resolved)

from open_spiel.python import rl_environment
import pyspiel
from open_spiel.python.pytorch.ppo import PPO, PPOAgent
Collaborator


Separate into individual imports (also elsewhere)
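For example, the combined PPO import above would become:

from open_spiel.python.pytorch.ppo import PPO
from open_spiel.python.pytorch.ppo import PPOAgent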

@vwxyzjn

vwxyzjn commented Aug 4, 2022

Hi, also a quick comment: would it be possible to add more benchmarks? I am quite interested to see the performance of the agent in the environments that OpenSpiel provides. E.g., see our ppo_atari.py docs for how we usually do it.

We certainly don't need to do it in this PR, but having more benchmarks would be quite nice.

Ideally, it would be great if you could contribute the tracked experiments to the Open RL Benchmark, which makes everything about the experiments super transparent. It leverages wandb (a proprietary service), however, so no worries if this is difficult.

Move from argparse to abseil
Fix bug in naming non-atari games

@lanctot
Collaborator

lanctot commented Aug 8, 2022

Hmmm, a test failed in Colored Trails (due to observing a utility greater than the maximum); that's probably a bug on our end, I'll look into it today.

Confirmed, it's a bug on our side; the fix is lines 186-187 in colored_trails.h here: https://github.com/deepmind/open_spiel/pull/900/files.

Member

@jhtschultz jhtschultz left a comment


Thanks for the submission! Cool to see someone integrate Atari with OpenSpiel. Left some minor comments. Didn't dig into the PPO implementation as that appears to have been forked from a stable version.

Please run the code through yapf (https://github.com/google/yapf/) or pylint to conform to the Google style guide (https://google.github.io/styleguide/pyguide.html).
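For example, one way to run yapf over the new files (illustrative only; adjust the file list and style options as needed):

yapf --in-place --style='{based_on_style: google, indent_width: 2}' open_spiel/python/pytorch/ppo.py open_spiel/python/games/atari.py open_spiel/python/examples/ppo_example.py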

Inline review comments (all resolved) on:
docs/games.md
open_spiel/python/examples/ppo_example.py
open_spiel/python/games/atari.py
open_spiel/python/pytorch/ppo.py
open_spiel/python/pytorch/ppo_pytorch_test.py

# Annealing the rate if instructed to do so.
if self.num_annealing_updates is not None:
  frac = 1.0 - (self.updates_done) / self.num_annealing_updates
Collaborator


Make sure LR doesn't go to 0 / negative ?


@vwxyzjn vwxyzjn Nov 7, 2022


It wouldn't go to 0 / negative when num_annealing_updates=num_updates (the default setting in ppo_pytorch_test.py):

learning_rate = 2.5e-4
total_timesteps = 500000
num_envs = 4
num_steps = 128
num_updates = total_timesteps // (num_envs * num_steps)
num_annealing_updates = num_updates
print(f"num_updates={num_updates}")
for update in range(num_updates):
    # Annealing the rate if instructed to do so.
    frac = 1.0 - (update) / num_annealing_updates
    lrnow = frac * learning_rate
print(f"frac={frac}")
frac = 1.0 - (update+1) / num_annealing_updates
print(f"do an extra annealing step is not necessary because frac would become {frac}")
Output:

num_updates=976
frac=0.0010245901639344135
do an extra annealing step is not necessary because frac would become 0.0

However, if num_annealing_updates=num_updates - 2, then frac does go to 0 / negative. Maybe it's worth dropping num_annealing_updates and just making learning rate annealing a toggleable option? Supporting different kinds of learning rate annealing should be done through a different API.

Output with num_annealing_updates = num_updates - 2:

num_updates=976
frac=-0.0010266940451746365
do an extra annealing step is not necessary because frac would become -0.002053388090349051

Contributor Author


We've moved the learning rate annealing outside of the agent. By default there is no annealing, and it can be added to the training loop if the user wants (as seen in ppo_example.py with the anneal_lr flag).
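For what it's worth, here is a self-contained sketch of that pattern (illustrative only; the actual wiring in ppo_example.py may differ, e.g. in how the agent's optimizer is reached). The constants are taken from the snippet above.

import torch

learning_rate = 2.5e-4
num_updates = 976
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.Adam(params, lr=learning_rate, eps=1e-5)

anneal_lr = True
for update in range(num_updates):
  if anneal_lr:
    # Linear decay; frac stays strictly positive while update < num_updates.
    frac = 1.0 - update / num_updates
    for param_group in optimizer.param_groups:
      param_group["lr"] = frac * learning_rate
  # ... collect rollouts and run the PPO update here ...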

@lanctot
Collaborator

lanctot commented Nov 7, 2022

Thanks guys. I've started the tests, but they will probably fail. A change in GitHub's configuration broke our tests today; I finally got it fixed and updated master a few minutes ago. You'll probably need to pull changes from master for the tests to pass (which would be a good thing anyway, since this PR's branch has some large changes and was started a while back).

@lanctot
Collaborator

lanctot commented Nov 15, 2022

Hi guys, can you go through the comments on the GitHub PR thread and either reply or mark them as resolved?

@newmanne @gregdeon

@newmanne
Contributor Author

@lanctot @gregdeon I think everything should be resolved now.

@lanctot added the labels "imported" (This PR has been imported and is awaiting internal review. Please avoid any more local changes, thanks!) and "merged internally" (The code is now submitted to our internal repo and will be merged in the next GitHub sync.) on Nov 22, 2022
@OpenSpiel OpenSpiel merged commit 76b2416 into google-deepmind:master Nov 24, 2022