
Adapt PPO algorithm from CleanRL to OpenSpiel. Adapt Gym Atari environment to OpenSpiel #889

Merged (22 commits) on Nov 24, 2022

Conversation

newmanne
Contributor

@lanctot @ssokota: @gregdeon and I have adapted @vwxyzjn's CleanRL PPO implementation to work with OpenSpiel. To test the implementation, we have also wrapped Gym's Atari games as an OpenSpiel game.

Notes

  • We included a SyncVectorEnv class modelled after Gym's class but built on OpenSpiel RL environments (a minimal sketch follows this list). Running multiple environments in parallel matters a lot for performance.
  • This only works for single-player games right now. We plan to get a multiplayer version running, but there are some challenges: when using a SyncVectorEnv, the "turn order" of players across the environments can quickly get out of sync, and that is not trivial to resolve. The games @gregdeon and I are studying are simultaneous-move games with no player elimination; this class of games can never go out of sync (the same agent is always acting in every environment). That is all we were planning to support, but perhaps we are overlooking a simpler way to handle arbitrary turn orders in combination with vector environments.
  • Atari only works if gym, the Atari ROMs, and Stable Baselines are all installed.
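For reference, here is a minimal sketch of the vector-environment idea. This is illustrative only; the SyncVectorEnv class actually included in the PR may differ in its interface and details (for example, in how terminal rewards are surfaced to the learner before auto-resetting).

from open_spiel.python import rl_environment


class SyncVectorEnvSketch(object):
  """Steps several single-player OpenSpiel RL environments in lockstep."""

  def __init__(self, envs):
    self.envs = envs

  def reset(self):
    # One TimeStep per environment.
    return [env.reset() for env in self.envs]

  def step(self, actions):
    # One action per environment; each env.step expects a list of actions.
    time_steps = [
        env.step([action]) for env, action in zip(self.envs, actions)
    ]
    # Auto-reset environments whose episode just ended so that every slot
    # always holds a live episode.
    return [
        env.reset() if ts.last() else ts
        for env, ts in zip(self.envs, time_steps)
    ]


if __name__ == "__main__":
  # Example usage with 4 copies of the built-in single-player game "catch".
  vec_env = SyncVectorEnvSketch(
      [rl_environment.Environment("catch") for _ in range(4)])
  time_steps = vec_env.reset()
  # Take the first legal action in each environment.
  actions = [ts.observations["legal_actions"][0][0] for ts in time_steps]
  time_steps = vec_env.step(actions)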

Evaluation

  • Catch converges to a score of 1.0 every episode fairly quickly (71_680 steps).
  • Performance of three seeds of Breakout on TensorBoard (heavily smoothed) after 10_000_000 iterations (~8 wall-clock hours on our setup):
    [TensorBoard screenshot] You can compare to this example; we don't match it exactly but seem to be in the right ballpark.

Example commands:

python ppo_example.py --game-name catch  # runs catch
python ppo_example.py                    # runs breakout

@google-cla

google-cla bot commented Jul 23, 2022

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.


@lanctot
Collaborator

lanctot commented Jul 23, 2022

Would be nice to have a non-Atari test that solves a really basic single-player game. See the pytorch DQN test for how to construct one using the gambit EFG format: https://github.com/deepmind/open_spiel/blob/b5e0bf6495bd8baf7c6011c18fa4fd403e21385d/open_spiel/python/pytorch/dqn_pytorch_test.py#L39

Note: when you add a python test you also need to add it to python/CMakeLists.txt.
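For illustration, a bare-bones skeleton for such a test might look like the following. This is not the test added in this PR: it loads the built-in single-player game "catch" rather than a gambit EFG definition, and it only rolls out random actions where a real test would train the PPO agent and assert on its returns.

import random

from absl.testing import absltest

from open_spiel.python import rl_environment


class PPOSmokeTest(absltest.TestCase):

  def test_random_rollout_on_catch(self):
    env = rl_environment.Environment("catch")
    time_step = env.reset()
    while not time_step.last():
      # A real test would query the trained PPO agent here instead.
      legal_actions = time_step.observations["legal_actions"][0]
      time_step = env.step([random.choice(legal_actions)])
    # The episode terminated and produced a reward for the single player.
    self.assertLen(time_step.rewards, 1)


if __name__ == "__main__":
  absltest.main()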

@lanctot
Collaborator

lanctot commented Jul 23, 2022

Awesome!

First thing: something went wrong with the CLA. Can you look into the message above and follow the steps, and let me know when you have done so? I will then rerun the check.

It seems that one of the commits is attributed to an AWS user that has not signed the CLA:

[screenshot]

Unfortunately I can't import the PR until this is resolved.

I think the easiest thing is probably to make a fresh branch of master, copy over all the files (individually, not the .git subdirectories), and open a fresh PR with those files. That way, it should show up as a single commit from the main author.


@vwxyzjn vwxyzjn left a comment


Hey @newmanne, this is great work. I left a couple of comments for minor issues 🙂

I saw open_spiel also has a folder for JAX learning algorithms. As a relevant note, we are experimenting with a PPO + JAX integration in CleanRL, which shows promising speed improvements with EnvPool.

Don't worry about integrating it now, since the prototype is in its early stages (messy code), but it might be worth integrating in the future once we finalize it with more tests and documentation.


3 inline review comments on open_spiel/python/pytorch/ppo.py (outdated, resolved)
@lanctot lanctot self-requested a review July 25, 2022 18:54
Collaborator

@lanctot lanctot left a comment


Can you add a simple test, i.e. the one described here: #889 (comment)

@lanctot
Collaborator

lanctot commented Jul 27, 2022

Awesome, thanks. Can you add PPO to docs/algorithms.md and Atari to docs/games.md?

@lanctot
Collaborator

lanctot commented Aug 4, 2022

Thanks @vwxyzjn for taking a look!

Apologies @newmanne for the delay, I was on vacation for a large part of July. I'm back now and will ask one of the team to take a quick look as well.

Collaborator

@lanctot lanctot left a comment


Hi guys, I noticed you're using four-space indentation, but we use two-space indentation in OpenSpiel; can you change it to two spaces?

I also noted a few other things in separate comments.

open_spiel/python/examples/ppo_example.py (outdated, resolved)

from open_spiel.python import rl_environment
import pyspiel
from open_spiel.python.pytorch.ppo import PPO, PPOAgent
Collaborator


Separate into individual imports (also elsewhere)
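For example, the combined PPO import above would become:

from open_spiel.python.pytorch.ppo import PPO
from open_spiel.python.pytorch.ppo import PPOAgent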

@vwxyzjn

vwxyzjn commented Aug 4, 2022

Hi, also a quick comment: would it be possible to add more benchmarks? I am quite interested to see the performance of the agent in the environments that OpenSpiel provides. E.g., see our ppo_atari.py docs for how we usually do it.

We certainly don't need to do it in this PR, but having more benchmarks would be quite nice.

Ideally, it would be great if you could contribute the tracked experiments to the Open RL Benchmark, which makes everything about the experiments super transparent. It leverages wandb (a proprietary service), however, so no worries if this is difficult.

Move from argparse to abseil
Fix bug in naming non-atari games

@lanctot
Collaborator

lanctot commented Aug 8, 2022

Hmmm, a test failed in Colored Trails (due to observing a utility greater than the maximum); that's probably a bug on our end, I'll look into it today.

Confirmed, it's a bug on our side; the fix is lines 186-187 in colored_trails.h here: https://github.com/deepmind/open_spiel/pull/900/files.

Member

@jhtschultz jhtschultz left a comment


Thanks for the submission! Cool to see someone integrate Atari with OpenSpiel. Left some minor comments. Didn't dig into the PPO implementation as that appears to have been forked from a stable version.

Please run the code through yapf (https://github.com/google/yapf/) or pylint to conform to the Google style guide (https://google.github.io/styleguide/pyguide.html).
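For example, one way to run yapf over the new files (illustrative only; adjust the file list and style options as needed):

yapf --in-place --style='{based_on_style: google, indent_width: 2}' open_spiel/python/pytorch/ppo.py open_spiel/python/games/atari.py open_spiel/python/examples/ppo_example.py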

Inline review comments (all resolved) on:
docs/games.md
open_spiel/python/examples/ppo_example.py
open_spiel/python/games/atari.py
open_spiel/python/pytorch/ppo.py
open_spiel/python/pytorch/ppo_pytorch_test.py

# Annealing the rate if instructed to do so.
if self.num_annealing_updates is not None:
  frac = 1.0 - (self.updates_done) / self.num_annealing_updates
Collaborator


Make sure LR doesn't go to 0 / negative ?


@vwxyzjn vwxyzjn Nov 7, 2022


It wouldn't go to 0 / negative when num_annealing_updates=num_updates (the default setting in ppo_pytorch_test.py):

learning_rate = 2.5e-4
total_timesteps = 500000
num_envs = 4
num_steps = 128
num_updates = total_timesteps // (num_envs * num_steps)
num_annealing_updates = num_updates
print(f"num_updates={num_updates}")
for update in range(num_updates):
    # Annealing the rate if instructed to do so.
    frac = 1.0 - (update) / num_annealing_updates
    lrnow = frac * learning_rate
print(f"frac={frac}")
frac = 1.0 - (update+1) / num_annealing_updates
print(f"do an extra annealing step is not necessary because frac would become {frac}")
Output:

num_updates=976
frac=0.0010245901639344135
do an extra annealing step is not necessary because frac would become 0.0

However, if num_annealing_updates=num_updates - 2, then frac does go to 0 / negative. Maybe it's worth dropping num_annealing_updates and just making learning rate annealing a toggleable option? Supporting different kinds of learning rate annealing should be done through a different API.

Output with num_annealing_updates = num_updates - 2:

num_updates=976
frac=-0.0010266940451746365
do an extra annealing step is not necessary because frac would become -0.002053388090349051

Contributor Author


We've moved the learning rate annealing outside of the agent. By default there is no annealing, and it can be added to the training loop if the user wants (as seen in ppo_example.py with the anneal_lr flag).
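For what it's worth, here is a self-contained sketch of that pattern (illustrative only; the actual wiring in ppo_example.py may differ, e.g. in how the agent's optimizer is reached). The constants are taken from the snippet above.

import torch

learning_rate = 2.5e-4
num_updates = 976
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.Adam(params, lr=learning_rate, eps=1e-5)

anneal_lr = True
for update in range(num_updates):
  if anneal_lr:
    # Linear decay; frac stays strictly positive while update < num_updates.
    frac = 1.0 - update / num_updates
    for param_group in optimizer.param_groups:
      param_group["lr"] = frac * learning_rate
  # ... collect rollouts and run the PPO update here ...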

@lanctot
Collaborator

lanctot commented Nov 7, 2022

Thanks guys. I've started the tests, but they will probably fail. A change in GitHub's configuration broke our tests today; I finally got it fixed and updated master a few minutes ago. You'll probably need to pull changes from master for the tests to pass (which would be a good thing anyway, since this PR's branch has some large changes and was started a while back).

@lanctot
Collaborator

lanctot commented Nov 15, 2022

Hi guys, can you go through the comments on the GitHub PR thread and either reply or mark them as resolved?

@newmanne @gregdeon

@newmanne
Contributor Author

@lanctot @gregdeon I think everything should be resolved now.

@lanctot added the labels "imported" (This PR has been imported and is awaiting internal review. Please avoid any more local changes, thanks!) and "merged internally" (The code is now submitted to our internal repo and will be merged in the next GitHub sync.) on Nov 22, 2022
@OpenSpiel OpenSpiel merged commit 76b2416 into google-deepmind:master Nov 24, 2022