[BugFix] Allow for composite action distributions in PPO/A2C losses #2391
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2391
Note: Links to docs will display an error until the docs builds have been completed.
❌ 3 New Failures, 6 Unrelated Failures as of commit 69922fa with merge base a6310ae.
NEW FAILURES - The following jobs have failed:
FLAKY - The following jobs failed but were likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Good work!
There's an exception with test_ppo_notensordict in test_cost.py.
Have a look at the couple of comments I left.
Thanks a lot for the feedback! I have made a few changes:
I think it's safe to assume that people won't use this feature with non-tensordict inputs, because the action will be a tensordict anyway.
I would just document it properly in the loss docstrings where we explain how to use the loss without tensordict.
torchrl/objectives/a2c.py (Outdated)
@@ -383,26 +383,39 @@ def get_entropy_bonus(self, dist: d.Distribution) -> torch.Tensor:
             entropy = dist.entropy()
         except NotImplementedError:
             x = dist.rsample((self.samples_mc_entropy,))
-            entropy = -dist.log_prob(x).mean(0)
+            log_prob = dist.log_prob(x)
+            if isinstance(log_prob, TensorDict):
A lazy stack is not a TensorDict but a TensorDictBase.
Also, ideally we would want this to work with tensorclasses.
The way to go should be to use is_tensor_collection from the tensordict lib.
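To illustrate the suggestion, here is a minimal sketch (not code from this PR): is_tensor_collection accepts plain TensorDicts, lazy stacks, and tensorclasses alike, whereas an isinstance(x, TensorDict) check rejects the latter two. The LogProbs tensorclass below is a hypothetical stand-in.

```python
import torch
from tensordict import TensorDict, is_tensor_collection, tensorclass


@tensorclass
class LogProbs:  # hypothetical tensorclass, used only for this illustration
    sample_log_prob: torch.Tensor


td = TensorDict({"sample_log_prob": torch.randn(4)}, batch_size=[4])
lazy = TensorDict.lazy_stack([td, td.clone()])  # a LazyStackedTensorDict, not a TensorDict
tc = LogProbs(sample_log_prob=torch.randn(4), batch_size=[4])

for x in (td, lazy, tc):
    assert is_tensor_collection(x)  # covers all three cases

assert not isinstance(lazy, TensorDict)  # why the isinstance check is too narrow
```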
torchrl/objectives/ppo.py (Outdated)
@@ -449,28 +449,38 @@ def get_entropy_bonus(self, dist: d.Distribution) -> torch.Tensor:
             entropy = dist.entropy()
         except NotImplementedError:
             x = dist.rsample((self.samples_mc_entropy,))
-            entropy = -dist.log_prob(x).mean(0)
+            log_prob = dist.log_prob(x)
+            if isinstance(log_prob, TensorDict):
ditto
torchrl/objectives/ppo.py (Outdated)
-            kl = (previous_dist.log_prob(x) - current_dist.log_prob(x)).mean(0)
+            previous_log_prob = previous_dist.log_prob(x)
+            current_log_prob = current_dist.log_prob(x)
+            if isinstance(x, TensorDict):
ditto
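For context, here is a self-contained sketch of the Monte Carlo KL path touched by this diff when the action distribution is composite. It is not code from the PR; it assumes tensordict's CompositeDistribution and its aggregate "sample_log_prob" entry, and the make_dist helper is purely illustrative.

```python
import torch
from torch import distributions as d
from tensordict import TensorDict, is_tensor_collection
from tensordict.nn import CompositeDistribution


def make_dist(shift: float) -> CompositeDistribution:
    # two Normal heads packaged as a single composite action distribution
    params = TensorDict(
        {
            "head0": {"loc": shift + torch.zeros(8), "scale": torch.ones(8)},
            "head1": {"loc": shift + torch.zeros(8), "scale": torch.ones(8)},
        },
        batch_size=[8],
    )
    return CompositeDistribution(
        params, distribution_map={"head0": d.Normal, "head1": d.Normal}
    )


previous_dist, current_dist = make_dist(0.0), make_dist(0.5)
x = previous_dist.sample((32,))  # a tensordict of samples, one entry per head

previous_log_prob = previous_dist.log_prob(x)  # a tensordict, not a Tensor
if is_tensor_collection(previous_log_prob):
    previous_log_prob = previous_log_prob.get("sample_log_prob")
current_log_prob = current_dist.log_prob(x)
if is_tensor_collection(current_log_prob):
    current_log_prob = current_log_prob.get("sample_log_prob")

kl = (previous_log_prob - current_log_prob).mean(0)  # Monte Carlo KL estimate
```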
Done! I will do the off-policy and offline losses in separate PRs.
Thanks for the PR!
Do we need a simple test for this?
Like a dedicated function in PPOTest that runs it with composite dists?
@@ -383,26 +395,39 @@ def get_entropy_bonus(self, dist: d.Distribution) -> torch.Tensor:
             entropy = dist.entropy()
         except NotImplementedError:
             x = dist.rsample((self.samples_mc_entropy,))
-            entropy = -dist.log_prob(x).mean(0)
+            log_prob = dist.log_prob(x)
Previously, was this a bug or did we sum the log-probs automatically?
This is simply because the log_prob() method for a composite dist will return a TD instead of a Tensor, so we compute the entropy in 2 steps.
This is the old version:

def get_entropy_bonus(self, dist: d.Distribution) -> torch.Tensor:
    try:
        entropy = dist.entropy()
    except NotImplementedError:
        x = dist.rsample((self.samples_mc_entropy,))
        entropy = -dist.log_prob(x).mean(0)
    return entropy.unsqueeze(-1)
This is the new version; it simply retrieves the log-prob tensor before computing the entropy:

def get_entropy_bonus(self, dist: d.Distribution) -> torch.Tensor:
    try:
        entropy = dist.entropy()
    except NotImplementedError:
        x = dist.rsample((self.samples_mc_entropy,))
        log_prob = dist.log_prob(x)
        if is_tensor_collection(log_prob):
            log_prob = log_prob.get(self.tensor_keys.sample_log_prob)
        entropy = -log_prob.mean(0)
    return entropy.unsqueeze(-1)
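As a usage illustration of the new branch (again a sketch rather than PR code, assuming tensordict's CompositeDistribution and its default "sample_log_prob" aggregate key): the log-prob of a composite distribution comes back as a tensordict, so the aggregate entry has to be pulled out before averaging.

```python
import torch
from torch import distributions as d
from tensordict import TensorDict, is_tensor_collection
from tensordict.nn import CompositeDistribution

params = TensorDict(
    {
        "head0": {"loc": torch.zeros(8), "scale": torch.ones(8)},
        "head1": {"loc": torch.zeros(8), "scale": 2 * torch.ones(8)},
    },
    batch_size=[8],
)
dist = CompositeDistribution(
    params, distribution_map={"head0": d.Normal, "head1": d.Normal}
)

x = dist.sample((16,))       # tensordict of samples (the loss itself uses rsample)
log_prob = dist.log_prob(x)  # a tensordict, not a Tensor
if is_tensor_collection(log_prob):
    log_prob = log_prob.get("sample_log_prob")  # aggregate log-prob across heads
entropy = -log_prob.mean(0)  # Monte Carlo entropy estimate
```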
Regarding a dedicated test, how do you usually approach this decision? When I thought about testing different dists, I saw it a bit like testing for different types of ValueEstimators: it should work both with single dists and with composite dists in all the tested situations. So I added it to all tests (except the …). We could probably switch to a single dedicated test function though.
LGTM, thanks for this!
Computing the entropy for composite distributions is not fully resolved, particularly when dealing with a composite distribution that includes some distributions with an implemented entropy() method and others without. We could add an entropy method to …, wdyt?
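One possible shape for that fallback, sketched outside of any torchrl or tensordict class (the function name and the dict-of-distributions input are illustrative assumptions, not an existing API): use the analytical entropy() of each component where it is implemented and a Monte Carlo estimate otherwise, then sum the results.

```python
import torch
from torch import distributions as d


def composite_entropy(dists: dict[str, d.Distribution], samples_mc: int = 10) -> torch.Tensor:
    """Sum per-component entropies, falling back to a Monte Carlo estimate when needed."""
    total = None
    for dist in dists.values():
        try:
            ent = dist.entropy()             # analytical entropy when implemented
        except NotImplementedError:
            x = dist.rsample((samples_mc,))  # otherwise a sample-based estimate
            ent = -dist.log_prob(x).mean(0)
        total = ent if total is None else total + ent
    return total


# Example with two components that both expose a closed-form entropy.
entropy = composite_entropy(
    {
        "head0": d.Normal(torch.zeros(4), torch.ones(4)),
        "head1": d.Normal(torch.zeros(4), 2 * torch.ones(4)),
    }
)
```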
From discord:
I think it's a good idea, I prefer a …
To add to my previous comment, here is how I would address this: …
Makes sense.
Merging this to clear space in the PR list, but we should take care of #2391 (comment) sooner rather than later! Wanna give it a shot, or should I?
I will give it a shot, give me a few days.
Description
At the moment, objective classes do not allow using an actor with a composite distribution. This PR aims to fix that.
I have started with PPO; it turned out to require more changes than I anticipated. In particular, I am struggling with the test test_ppo_notensordict. Once these modifications are correct, I will move on to the tests of the other on-policy objectives and then to all other objectives.
This PR requires the TensorDict PR pytorch/tensordict#961 to be merged.