Adapt PPO algorithm from CleanRL to OpenSpiel. Adapt Gym Atari environment to OpenSpiel #889
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
Awesome! First thing: something went wrong with the CLA. Can you look into the message above and follow the steps, and let me know when you have, and I will rerun it.
Would be nice to have a non-Atari test that solves a really basic single-player game. See the pytorch DQN test for how to construct one using the gambit EFG format: https://github.com/deepmind/open_spiel/blob/b5e0bf6495bd8baf7c6011c18fa4fd403e21385d/open_spiel/python/pytorch/dqn_pytorch_test.py#L39 Note: when you add a python test you also need to add it to python/CMakeLists.txt.
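For reference, a test along those lines might look roughly like the sketch below. The EFG string and overall structure mirror the linked DQN test; the PPO construction and training loop are left as comments because their exact arguments are specific to this PR.

from absl.testing import absltest

from open_spiel.python import rl_environment
import pyspiel

# A trivial single-player game in gambit EFG format: "R" is the better action.
SIMPLE_EFG_DATA = """
  EFG 2 R "Simple single-agent problem" { "Player 1" } ""
  p "ROOT" 1 1 "ROOT" { "L" "R" } 0
    t "L" 1 "Outcome L" { -1.0 }
    t "R" 2 "Outcome R" { 1.0 }
"""


class PPOSimpleGameTest(absltest.TestCase):

  def test_simple_game(self):
    game = pyspiel.load_efg_game(SIMPLE_EFG_DATA)
    env = rl_environment.Environment(game=game)
    # Construct the PPO agent here, train it for a few hundred episodes,
    # and assert that the average return approaches the optimal +1.0.


if __name__ == "__main__":
  absltest.main()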
Hey @newmanne, this is great work. I left a couple of comments for minor issues 🙂
I saw open_spiel also has a folder for jax learning algorithms. Just a relevant note: we are experimenting with PPO + JAX integration in CleanRL, which shows promising speed improvements with Envpool. Don't worry about integrating it now since the prototype is in its early stage (messy code), but it might be worth integrating in the future once we finalize it with more tests and documentation.
Can you add a simple test, i.e., the one described here: #889 (comment)?
Awesome, thanks. Can you add PPO to docs/algorithms.md and Atari to docs/games.md?
Hi guys, I noticed you're using four-space indentation, but we use two-space indentation in OpenSpiel. Can you change that to two spaces? I also noted a few other things in separate comments.
from open_spiel.python import rl_environment
import pyspiel
from open_spiel.python.pytorch.ppo import PPO, PPOAgent
Separate into individual imports (also elsewhere)
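For example, the import block above could be separated like this (keeping the original ordering, just one symbol per line):

from open_spiel.python import rl_environment
import pyspiel
from open_spiel.python.pytorch.ppo import PPO
from open_spiel.python.pytorch.ppo import PPOAgent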
Hi, also a quick comment: would it be possible to add more benchmarks? I am quite interested to see the performance of the agent in the environments that OpenSpiel provides; e.g., see our Open RL Benchmark. We certainly don't need to do it in this PR, but having more benchmarks would be quite nice. Ideally, it would be great if you could contribute the tracked experiments to the Open RL Benchmark, which makes everything about the experiments super transparent. It leverages wandb (a proprietary service), however, so no worries if this is difficult.
Move from argparse to abseil
Fix bug in naming non-Atari games
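As a rough sketch of what the argparse-to-abseil move looks like, assuming a hypothetical game_name flag (the actual flag names and defaults in ppo_example.py may differ):

from absl import app
from absl import flags

FLAGS = flags.FLAGS

# Hypothetical flag standing in for the argparse --game-name option.
flags.DEFINE_string("game_name", "breakout", "Game to load, e.g. catch or an Atari game.")


def main(_):
  print(f"Training PPO on {FLAGS.game_name}")
  # ... build the rl_environment and PPO agent here ...


if __name__ == "__main__":
  app.run(main)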
Hmmm, test failed in Colored Trails (due to observing a utility greater than the maximum) -- that's probably a bug on our end; I'll look into it today.
Confirmed bug on our side, fix is lines 186-187 in
Thanks for the submission! Cool to see someone integrate Atari with OpenSpiel. Left some minor comments. Didn't dig into the PPO implementation as that appears to have been forked from a stable version.
Please run the code through yapf (https://github.com/google/yapf/) or pylint to conform with the Google styleguide (https://google.github.io/styleguide/pyguide.html).
open_spiel/python/pytorch/ppo.py (outdated)

# Annealing the rate if instructed to do so.
if self.num_annealing_updates is not None:
  frac = 1.0 - (self.updates_done) / self.num_annealing_updates
Make sure the LR doesn't go to 0 / negative?
It wouldn't go to 0 / negative when num_annealing_updates=num_updates (the default setting in ppo_pytorch_test.py):
learning_rate = 2.5e-4
total_timesteps = 500000
num_envs = 4
num_steps = 128
num_updates = total_timesteps // (num_envs * num_steps)
num_annealing_updates = num_updates
print(f"num_updates={num_updates}")
for update in range(num_updates):
  # Annealing the rate if instructed to do so.
  frac = 1.0 - (update) / num_annealing_updates
  lrnow = frac * learning_rate

print(f"frac={frac}")
frac = 1.0 - (update + 1) / num_annealing_updates
print(f"doing an extra annealing step is not necessary because frac would become {frac}")
num_updates=976
frac=0.0010245901639344135
doing an extra annealing step is not necessary because frac would become 0.0
However, if num_annealing_updates=num_updates - 2, then 0 / negative happens. Maybe it's worth disabling num_annealing_updates and just making learning rate annealing a toggleable option? Supporting different kinds of learning rate annealing should be done through a different API.
num_updates=976
frac=-0.0010266940451746365
doing an extra annealing step is not necessary because frac would become -0.002053388090349051
We've moved the learning rate annealing outside of the agent. By default there is no annealing, and it can be put into the training loop if the user wants (as seen in ppo_example.py with the anneal_lr flag).
Fix ALE link in games.md
Thanks guys. I've started the tests, but they will probably fail. There was a change on GitHub's configuration that broke our tests today. I finally got it fixed and updated master a few minutes ago. You'll probably need to pull changes from master in order for the tests to pass (which would be a good thing anyway since this PR's branch has some large changes and was started a while back).
@lanctot @ssokota: @gregdeon and I have adapted @vwxyzjn's CleanRL PPO implementation to work with OpenSpiel. In order to test the implementation, we have also wrapped Gym's Atari games into an OpenSpiel game.
Notes
Evaluation
You can compare to this example; we don't match exactly, but we seem to be in the right ballpark.
Example commands:
python ppo_example.py --game-name catch  // runs catch
python ppo_example.py  // runs breakout