[Tutorial] PettingZoo Parallel competitive tutorial #2047
Conversation
Helpful links: see artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2047. Note: links to docs will display an error until the docs builds have completed.
As of commit 7b5785d with merge base 09c934d: 5 new failures, 16 unrelated failures. The flaky jobs failed but were likely due to flakiness already present on trunk.
(This comment was automatically generated by Dr. CI.)
@vmoens can you upload on s3 and give the link pls?
unfortunately no, not atm :(
I'd rather have them as a separate PR if possible
Moved the changes to #2048
also this needs s3
@vmoens this is ready for review
@elliottower @coderSupreme789 here is the first draft of the tutorial in case you are curious and want to have a look.
# Conflicts: # torchrl/envs/libs/pettingzoo.py
The tutorial shows the environment and the grouping mechanism, and then trains it.
Co-authored-by: Vincent Moens <vincentmoens@gmail.com>
I should have access to S3 soon, bear with me. EDIT: feel free to send me the files via PM and I'll store them on S3
I guessed so, but most of the feedback I get, even from very close, top collaborators such as yourself, is that our tutorials are very long and hard to follow, if not worse than that. Concretely, I would either make this tutorial advanced and presuppose that users have extensive knowledge of MARL and RL in general, just pointing them to refs in case context is needed, or split the tutorial into 2-5 sub-tutorials.
I have removed basically all the RL background intro from the tutorial and am making it advanced, since it is a more complex case than the PPO one.
What's the status of the RB/Optimizer/Loss grouping? Can we unify that? If not what's the blocker?
env = TransformedEnv(
    env,
    RewardSum(
        in_keys=env.reward_keys,
        reset_keys=["_reset"] * len(env.group_map.keys()),
    ),
)
Got it!
Not super clean that we ask people to pass _reset, which is private
We should ask for done instead
I answered that in the thread. We want to keep those separate to allow different choices across groups. Plus losses cannot physically work with multiple groups.
The problem is that the dones are still one per group, plus the shared one. The _reset, on the other hand, is just one.
IIRC the global reset/done takes precedence over the local ones. Am I right that what you're trying to do is override the local resets with the global one, then?
Sorry I missed this, the thread must have been closed. I don't understand this statement though, can you elaborate?
What does "physically" mean here?
What "choices"? Hyperparameters?
So in PettingZoo we have the global done and the local dones, but only the global _reset; the local resets just do not exist. Since the class needs a reset key for each reward key, we tell all reward keys to track the global reset.
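For context, a quick way to inspect this in TorchRL (a minimal sketch; the task and the printed groups are illustrative, not taken from this PR):

from torchrl.envs.libs.pettingzoo import PettingZooEnv

env = PettingZooEnv(task="simple_tag_v3", parallel=True)
print(env.group_map)    # e.g. {"agent": [...], "adversary": [...]}
print(env.reward_keys)  # one reward entry per group
print(env.done_keys)    # per-group done/terminated keys plus the shared top-level ones
# there is only one top-level "_reset", hence reset_keys=["_reset"] * n_groups above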
OK, let me elaborate more, because this is quite a fundamental concept about the whole MARL API and BenchMARL.
The core of the MARL API is that if you want a shared loss/buffer/optimizer you can keep agents in the same group. But when agents are in different groups, they can be decoupled into individual components, so that you can make different choices for each group (for example, train them at different times or with different hyperparameters), as sketched below.
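A minimal sketch of that decoupling, assuming per-group policies and critics plus placeholder hyperparameters (policies, critics, memory_size and lr are illustrative names, not the tutorial's exact code):

import torch
from torchrl.data import LazyTensorStorage, ReplayBuffer
from torchrl.objectives import DDPGLoss

# one loss / replay buffer / optimizer per group, keyed by group name,
# so each group can be trained, tuned, or skipped independently
losses, buffers, optimizers = {}, {}, {}
for group, agents in env.group_map.items():
    loss = DDPGLoss(actor_network=policies[group], value_network=critics[group])
    losses[group] = loss
    buffers[group] = ReplayBuffer(storage=LazyTensorStorage(memory_size))
    optimizers[group] = torch.optim.Adam(loss.parameters(), lr=lr)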
Thanks for the clarification. Since we can use param_groups in the optimizers, we could obtain the same behaviour as having many optimizers by just splitting the param groups carefully. Note that this would also come with significant speedups on CUDA, since you'd be able to update your params asynchronously.
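A rough sketch of the param_groups idea, reusing the hypothetical losses dict above and assuming a per-group learning-rate table lrs:

import torch

# a single Adam instance with one parameter group per MARL group,
# each with its own learning rate, instead of many separate optimizers
optimizer = torch.optim.Adam(
    [
        {"params": list(loss.parameters()), "lr": lrs[group]}
        for group, loss in losses.items()
    ]
)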
Isn't the training synchronous here? They're trained sequentially for a given number of steps, no?
Can you point me to the part of the tutorial that does that? What I see is that there's a constant number of steps for each component, and the learning rates as well as other hyperparameters are all shared. To a non-MARL-trained eye it seems like there's a bunch of stuff that could be simplified (for instance, why not have one replay buffer from which you sample and then get the relevant keys with indexing / select?)
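For reference, that suggestion would look roughly like this (a sketch only, not what the tutorial does; shared_buffer and batch_size are assumed to exist):

# one shared replay buffer; after sampling, keep only the keys of the group at hand
sample = shared_buffer.sample(batch_size)
for group in env.group_map:
    group_batch = sample.select(group, ("next", group))
    # ... feed group_batch to that group's loss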
# Stop training a certain group when a condition is met
# (e.g., a given number of training iterations)
if iteration == iteration_when_stop_training_evaders:
    del train_group_map["agent"]

Again, at some point we may want to sample more from the buffer of one group, or stop sampling from one group completely, like in this tutorial.
I mean train them at different times. In this tutorial, for the second half of the iterations, we are training just one group.
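Concretely, the deletion above only takes effect because the training loop iterates over train_group_map; a sketch under the assumption that the per-group dicts from earlier exist:

for group in train_group_map:  # "agent" is skipped once it has been deleted
    batch = buffers[group].sample(batch_size)  # batch_size assumed
    loss_vals = losses[group](batch)
    loss_total = loss_vals["loss_actor"] + loss_vals["loss_value"]
    optimizers[group].zero_grad()
    loss_total.backward()
    optimizers[group].step()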
I do not agree. I think this is not a feature we want, as it is good to have the clear separation and to have losses be single-group. Repeating the construction in a for loop is peanuts compared to the complexity we would introduce by moving the for loop inside the losses (because at the end of the day the for loop will be somewhere, since these are all heterogeneous networks). I do not think the current way is boilerplate, as users can just experiment with setting different values for each group.
All these cases benefit from the flexibility of training groups independently, and I currently see no benefit in making groups share training components (because the for loops would just be nested inside the loss). But since this is maybe the most important MARL choice of the lib, and kind of the core of my internship, let's maybe also discuss it offline.
…tingzoo_tuto # Conflicts: # tutorials/sphinx-tutorials/multiagent_competitive_ddpg.py
PettingZoo tutorial
So the idea is that we need a tutorial to show people how to do competitive MARL in TorchRL (also showing how to use PettingZoo parallel environments).
I would like it to be complementary to the MultiAgent PPO tutorial (https://pytorch.org/rl/tutorials/multiagent_ppo.html) in VMAS so:
This would however be a synchronous environment so the tutorial would NOT SHOW:
It would SHOW: