[Tutorial] PettingZoo Parallel competitive tutorial #2047

Merged
merged 31 commits into pytorch:main from pettingzoo_tuto on Apr 24, 2024

Conversation

matteobettini
Contributor

@matteobettini matteobettini commented Mar 29, 2024

PettingZoo tutorial

So the idea is that we need a tutorial to show people how to do competitive MARL in TorchRL (also showing how to use PettingZoo parallel environments).

I would like it to be complementary to the multi-agent PPO tutorial (https://pytorch.org/rl/tutorials/multiagent_ppo.html) in VMAS, so:

  • as the algorithm I would use MADDPG (to show an off-policy algo)
  • as the env I was thinking of the MPE environments, particularly "simple_tag", so we can show training of two competitive agent groups (see the env-setup sketch below)

This would however be a synchronous environment, so the tutorial would NOT SHOW:

  • turn-based environments
  • dead agents
  • action masking

It would SHOW:

  • parallel petting zoo environments
  • competitive MARL
  • training multiple MARL groups in PettingZoo and VMAS
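
For concreteness, a minimal sketch of the kind of env setup this would cover (assuming TorchRL's PettingZooEnv wrapper and the PettingZoo MPE task name "simple_tag_v3"; not the final tutorial code):

    from torchrl.envs.libs.pettingzoo import PettingZooEnv

    # Parallel (simultaneous-step) MPE env with two competing groups:
    # chasers ("adversary") and evaders ("agent"). Requires pettingzoo[mpe].
    env = PettingZooEnv(task="simple_tag_v3", parallel=True)  # task name assumed
    print(env.group_map)  # how agents are grouped into competitive teams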

pytorch-bot bot commented Mar 29, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2047

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures, 16 Unrelated Failures

As of commit 7b5785d with merge base 09c934d:


This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Mar 29, 2024
@matteobettini matteobettini marked this pull request as draft March 29, 2024 08:18
@matteobettini matteobettini changed the title [Tutorial] PettingZoo Parallel competitive environment [Tutorial] PettingZoo Parallel competitive tutorial Mar 29, 2024
Contributor Author

@vmoens can you upload on S3 and give the link pls?

Contributor

unfortunately no, not atm :(

@vmoens
Contributor

vmoens commented Mar 29, 2024

I'd rather have them as a separate PR if possible

@matteobettini
Contributor Author

I'd rather have them as a separate PR if possible

Moved the changes to #2048

Contributor Author

also this needs S3

@matteobettini matteobettini marked this pull request as ready for review April 2, 2024 16:54
@matteobettini
Contributor Author

@vmoens this is ready for review

@matteobettini
Contributor Author

matteobettini commented Apr 2, 2024

@elliottower @coderSupreme789 here is the first draft of the tuto in case you are curious and want to have a look.

@vmoens vmoens added the documentation (improvements or additions to documentation) and tutorials labels Apr 3, 2024
# Conflicts:
#	torchrl/envs/libs/pettingzoo.py
@matteobettini
Contributor Author

matteobettini commented Apr 22, 2024

Thanks for this, very complete!

I'm wondering if it isn't a bit too long. The code is about 1k lines long, and there's a lot of theory and code. When someone goes through this tuto, what should they get out of it exactly (a theory of MARL, a way to code DDPG in MARL, how to use the env)? Can't this be split into more tutos? Or can we remove some extra bits that aren't 100% required to train the algo?

What the tuto is doing is showing the env and the grouping mechanism, and training it.

  • Regarding the code: all the Python code is necessary for training, except the section where I print the env specs and rollout, which I think is needed to show readers the agent groups
  • Regarding the text: this follows the PPO and multi-agent PPO format. I can remove some theory; it was there because it motivates some choices, like how to centralize the critic and so on, but if you think it is too verbose we can cut. I can also cut the DDPG and off-policy explanations

@vmoens
Contributor

vmoens commented Apr 22, 2024

I should have access to S3 soon, bear with me

EDIT: feel free to send me the files via PM and I'll store them on S3

@vmoens
Contributor

vmoens commented Apr 22, 2024

  • Regarding the text: this follows the PPO and multi-agent PPO format. I can remove some theory; it was there because it motivates some choices, like how to centralize the critic and so on, but if you think it is too verbose we can cut. I can also cut the DDPG and off-policy explanations

I guessed so, but most feedback I got, even from very close, top collaborators such as yourself, is that our tutorials are very long and hard to follow, if not worse than that.
If we can make sure that tutorials that are aimed at a large audience (which I suspect is the case here, since it starts with the very basics like off-policy vs on-policy) are also completed in 10 mins tops, I think they will be more impactful.

Concretely, I would either make this tutorial advanced and presuppose that users have extensive knowledge of MARL and RL in general, and just point them to refs in case context is needed, or split the tutorial into 2-5 sub-tutorials.

@matteobettini
Contributor Author

  • Regarding the text: this follows the PPO and multi-agent PPO format. I can remove some theory; it was there because it motivates some choices, like how to centralize the critic and so on, but if you think it is too verbose we can cut. I can also cut the DDPG and off-policy explanations

I guessed so, but most feedback I got, even from very close, top collaborators such as yourself, is that our tutorials are very long and hard to follow, if not worse than that. If we can make sure that tutorials that are aimed at a large audience (which I suspect is the case here, since it starts with the very basics like off-policy vs on-policy) are also completed in 10 mins tops, I think they will be more impactful.

Concretely, I would either make this tutorial advanced and presuppose that users have extensive knowledge of MARL and RL in general, and just point them to refs in case context is needed, or split the tutorial into 2-5 sub-tutorials.

I have removed basically all the RL background intro from the tutorial and am making it advanced, since it is kind of a more complex case than the PPO one.

Contributor

@vmoens vmoens left a comment


What's the status of the RB/Optimizer/Loss grouping? Can we unify that? If not, what's the blocker?

env = TransformedEnv(
    env,
    RewardSum(
        in_keys=env.reward_keys,
        reset_keys=["_reset"] * len(env.group_map.keys()),
    ),
)
Contributor

Got it!
Not super clean that we ask people to pass _reset, which is private.
We should ask for done instead.

tutorials/sphinx-tutorials/multiagent_competitive_ddpg.py (outdated review thread, resolved)
@matteobettini
Contributor Author

Can we unify that? If not, what's the blocker?

I answered that in the thread. We want to keep those separate to allow different choices across groups. Plus, losses cannot physically work with multiple groups.

@matteobettini
Contributor Author

Got it!
Not super clean that we ask people to pass _reset, which is private.
We should ask for done instead.

The problem is that the dones are still one per group plus the shared one; the _reset, on the other hand, is just one.

@vmoens
Contributor

vmoens commented Apr 22, 2024

Got it!
Not super clean that we ask people to pass _reset, which is private.
We should ask for done instead.

The problem is that the dones are still one per group plus the shared one; the _reset, on the other hand, is just one.

IIRC the global reset/done takes precedence over the local ones.

Am I right that what you're trying to do is override the local resets with the global one, then?
Couldn't we use "done" like you're using reset?

@vmoens
Contributor

vmoens commented Apr 22, 2024

Can we unify that? If not, what's the blocker?

I answered that in the thread. We want to keep those separate to allow different choices across groups. Plus, losses cannot physically work with multiple groups.

Sorry I missed this, the thread must have been closed.

I don't understand this statement though, can you elaborate?
In particular

losses cannot physically work with multiple groups

what does "physically" mean here?
What is blocking us from using a single loss across groups?

We want to keep those separate to allow different choices across groups

What "choices"? Hyperparameters?
What I see is that they are all the same, so unless we use different ones it isn't immediately clear.

@matteobettini
Contributor Author

Am I right that what you're trying to do is override the local resets with the global one, then? Couldn't we use "done" like you're using reset?

So in PettingZoo we have the global done and the local dones, but only the global _reset. The local resets just do not exist.

Since the transform needs a reset key for each reward key, we are telling all reward keys to track the global reset.
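
A hypothetical way to see that key layout (the exact key names and grouping below are assumptions based on this discussion, not verified output):

    from torchrl.envs.libs.pettingzoo import PettingZooEnv

    env = PettingZooEnv(task="simple_tag_v3", parallel=True)  # task name assumed
    print(env.reward_keys)  # one nested reward key per group, e.g. ("agent", "reward")
    print(env.done_keys)    # a global done/terminated plus a set per group
    print(env.reset_keys)   # a single global "_reset"

    # RewardSum wants one reset key per in_key, hence the global reset repeated per group:
    reset_keys = ["_reset"] * len(env.group_map.keys())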

@matteobettini
Contributor Author

Can we unify that? If not, what's the blocker?

I answered that in the thread. We want to keep those separate to allow different choices across groups. Plus, losses cannot physically work with multiple groups.

Sorry I missed this, the thread must have been closed.

I don't understand this statement though, can you elaborate? In particular

losses cannot physically work with multiple groups

what does "physically" mean here? What is blocking us from using a single loss across groups?

We want to keep those separate to allow different choices across groups

What "choices"? Hyperparameters? What I see is that they are all the same, so unless we use different ones it isn't immediately clear.

OK, let me elaborate more, because this is quite a fundamental concept for the whole MARL API and BenchMARL.

  • losses are stateful components that only take one set of keys (reward, state, done, and so on). Therefore they cannot operate across multiple groups
  • each group can have different global values, like a global critic output, that sit at the root. Therefore, if you kept multiple groups in the same loss, it would be unclear how to handle these
  • in the example of this tutorial, we stop training one group after x iterations. This would not be possible with a shared buffer/loss/optimizer

The core of the MARL API is that if you want a shared loss/buffer/optimizer you can keep agents in the same group. But when agents are in different groups, they can be decoupled into individual components (sketched after this list) so that you can

  • use different losses/buffers/optimizers for each group
  • train them asynchronously
  • not pollute the data of other groups when writing global keys for your group
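
To make this concrete, a rough sketch (not the tutorial's exact code) of the per-group construction being discussed, assuming hypothetical policies[group] / critics[group] modules built earlier and a DDPG-style loss:

    import torch
    from torchrl.data import LazyTensorStorage, ReplayBuffer
    from torchrl.objectives import DDPGLoss

    losses, buffers, optimizers = {}, {}, {}
    for group in env.group_map.keys():
        # One loss per group: each loss reads only this group's nested keys.
        loss = DDPGLoss(actor_network=policies[group], value_network=critics[group])
        loss.set_keys(
            reward=(group, "reward"),
            done=(group, "done"),
            terminated=(group, "terminated"),
        )
        losses[group] = loss
        # One buffer per group: groups can be sampled at different rates, or not at all.
        buffers[group] = ReplayBuffer(storage=LazyTensorStorage(100_000))
        # One optimizer per group: per-group hyperparameters and independent freezing.
        optimizers[group] = torch.optim.Adam(loss.parameters(), lr=3e-4)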

@vmoens
Contributor

vmoens commented Apr 23, 2024

Thanks for the clarification.
I can see why it isn't easy to have one loss for multiple groups atm, but don't we want that to be a feature, rather than asking users to repeat the tedious construction of N different buffers, N different optimizers and N different losses?
It feels like there's a lot of boilerplate that we could get rid of without too much trouble from within the library.

Since we can use param_groups in the optimizers, we could obtain the same behaviour as having many optimizers by just splitting the param groups carefully. Note that this would also come with significant speedups on CUDA, since you'd be able to update your params asynchronously.
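
A minimal sketch of that param_groups alternative (reusing the hypothetical per-group losses dict from the sketch above; a possible direction, not what the tutorial currently does):

    import torch

    # One optimizer whose param groups mirror the agent groups.
    optimizer = torch.optim.Adam(
        [
            {"params": losses["adversary"].parameters(), "lr": 3e-4},  # chasers
            {"params": losses["agent"].parameters(), "lr": 3e-4},      # evaders
        ]
    )

    # "Stopping" one group's training could then amount to zeroing its learning rate
    # instead of deleting a separate optimizer:
    optimizer.param_groups[1]["lr"] = 0.0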

Train them asynchronously

Isn't the training synchronous here? They're trained sequentially for a given number of steps, no?

in the example of this tutorial, we stop training one group after x iterations. This would not be possible with a shared buffer/loss/optimizer

Can you point me to the part of the tutorial that does that? What I see is that there's a constant number of steps for each component, and the learning rates as well as other hyperparameters are all shared. From a non-MARL-trained eye it seems like there's a bunch of stuff that could be simplified (for instance, why not have one replay buffer from which you sample and then get the relevant keys with indexing / select?)

@matteobettini
Contributor Author

Can you point me to the part of the tutorial that does that? What I see is that there's a constant number of steps for each component, and the learning rates as well as other hyperparameters are all shared. From a non-MARL-trained eye it seems like there's a bunch of stuff that could be simplified (for instance, why not have one replay buffer from which you sample and then get the relevant keys with indexing / select?)

  # Stop training a certain group when a condition is met (e.g., number of training iterations)
  if iteration == iteration_when_stop_training_evaders:
      del train_group_map["agent"]

Again, at some point we may sample more from the buffer of one group, or stop sampling from one group completely, like in this tuto.
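
As a tiny sketch of that idea (reusing the hypothetical per-group buffers dict and the train_group_map from above; the sample sizes are made up):

    samples_per_group = {"adversary": 256, "agent": 64}  # hypothetical: sample chasers more
    for group in list(train_group_map.keys()):           # a removed group is simply skipped
        batch = buffers[group].sample(samples_per_group[group])
        # ... feed `batch` to that group's loss and optimizer as usual ...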

Isn't the training synchronous here? They're trained sequentially for a given number of steps, no?

I mean train them at different times. In this tuto, for the second half of the iterations we are training just one.

I can see why it isn't easy to have one loss for multiple groups atm, but don't we want that to be a feature, rather than asking users to repeat the tedious construction of N different buffers, N different optimizers and N different losses?
It feels like there's a lot of boilerplate that we could get rid of without too much trouble from within the library.

I do not agree. I think this is not a feature we want, as it is good to have the clear separation and have losses be single-group.

Repeating the construction in a for loop is peanuts compared to the complexity that we would introduce by moving the for loop inside the losses (because at the end of the day the for loop will be somewhere, since it is all heterogeneous networks).

I do not think that the current way is boilerplate, as users can just experiment with setting different values for groups:

  • What if they want a different learning rate for the evaders?
  • What if they want MADDPG for chasers and MAPPO for evaders? (at this point one will have a buffer and the other will not)
  • What if they want to use a curriculum of evaders to train the chasers?

All these cases benefit from the flexibility of training groups independently, and I currently see no benefit in making groups share training components (because the for loops would just be nested inside the loss).

But since this is maybe the most important MARL choice of the lib, and kind of the core of my internship, let's maybe also discuss it more offline.

@vmoens vmoens merged commit 934100f into pytorch:main Apr 24, 2024
3 of 4 checks passed
@matteobettini matteobettini deleted the pettingzoo_tuto branch April 24, 2024 15:36