[Feature] Adds value clipping in ClipPPOLoss loss #2005

albertbou92 · 2024-03-09T10:36:02Z

Description

This PR add a value clipping option in ClipPPOLoss loss.

This PR is related to #1977.

Motivation and Context

Why is this change required? What problem does it solve?
If it fixes an open issue, please link to the issue here.
You can use the syntax close #15213 if this solves the issue #15213

I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

What types of changes does your code introduce? Remove all that do not apply:

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds core functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (update in the documentation)
Example (update in the folder of examples)

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

I have read the CONTRIBUTION guide (required)
My change requires a change to the documentation.
I have updated the tests accordingly (required for a bug fix or a new feature).
I have updated the documentation accordingly.

pytorch-bot · 2024-03-09T10:36:05Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2005

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 6 New Failures, 1 Unrelated Failure

As of commit c72347c with merge base c371266 ():

NEW FAILURES - The following jobs have failed:

Continuous Benchmark (PR) / CPU Pytest benchmark (gh)
Workflow failed! Resource not accessible by integration
Continuous Benchmark (PR) / GPU Pytest benchmark (gh)
Workflow failed! Resource not accessible by integration
Habitat Tests on Linux / tests (3.9, 11.6) / linux-job (gh)
RuntimeError: Command docker exec -t 5d472dd1ea2e73eacc2d1f9f77502e8f6c3805efc9cda16adf311e59c2158a28 /exec failed with exit code 139
Unit-tests on Linux / tests-cpu (3.8) / linux-job (gh)
AttributeError: 'OrphanPath' object has no attribute 'exists'
Unit-tests on Linux / tests-gpu (3.8, 12.1) / linux-job (gh)
AttributeError: 'OrphanPath' object has no attribute 'exists'
Unit-tests on Linux / tests-stable-gpu (3.8, 11.8) / linux-job (gh)
AttributeError: 'OrphanPath' object has no attribute 'exists'

BROKEN TRUNK - The following job failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

Unit-tests on MacOS CPU / tests (3.8) / macos-job (gh)
test/test_modules.py::TestMultiAgent::test_multiagent_mlp[batch1-None-False-True-3]

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vmoens · 2024-03-09T10:53:16Z

torchrl/objectives/ppo.py

@@ -664,6 +664,8 @@ class ClipPPOLoss(PPOLoss):
            ``"none"`` | ``"mean"`` | ``"sum"``. ``"none"``: no reduction will be applied,
            ``"mean"``: the sum of the output will be divided by the number of
            elements in the output, ``"sum"``: the output will be summed. Default: ``"mean"``.
+        clip_value_loss (bool, optional): if ``True``, the value loss will be clipped with respect to the


I would have this

not provided / False = no clipping

True = same clipping as log ratio

number = clip with that number

That way you can reuse the same semantic for other losses

albertbou92 · 2024-03-10T17:28:57Z

We can potentially add it to more losses (PPOLoss, KLPENPPOLoss, A2C, Reinforce) with the only difference that we would require the user to provide a float and not a bool since we can not default to the log ratio clipping value.

But I am not sure if we should see if the feature is asked first for those other losses, since I have never seen in used in those.

vmoens · 2024-03-10T19:36:54Z

I don't thinks it's a problem to add it, we don't want to copy existing repos but enable people to experiment and swap configurations quickly.

vmoens

Thanks for this! Cool feature!
A couple of comments along the way but otherwise LGTM

torchrl/objectives/a2c.py

torchrl/objectives/ppo.py

torchrl/objectives/reinforce.py

torchrl/objectives/ppo.py

Co-authored-by: Vincent Moens <vincentmoens@gmail.com>

albertbou92 · 2024-03-12T09:05:28Z

Thanks for the feedback! I integrated the suggested changes, but I think it makes sense to add the metrics suggested in #1977 (comment) if they are helpful

albertbou92 · 2024-03-12T10:00:45Z

I added the clip_fraction for both the PPO loss and the Value loss

torchrl/objectives/a2c.py

vmoens · 2024-03-12T16:37:01Z

torchrl/objectives/ppo.py

@@ -506,6 +527,16 @@ def loss_critic(self, tensordict: TensorDictBase) -> torch.Tensor:
                f"can be used for the value loss."
            )

+        if self.clip_value:
+            try:
+                old_state_value = tensordict.get(self.tensor_keys.value).clone()


why do we clone here?

Because the self.tensor_keys.value prediction will be recomputed once we pass tensordict through the value network and we will lose this tensor. Does it make sense?

We never write anything in-place AFAICT

torchrl/objectives/a2c.py

vmoens · 2024-03-12T16:38:57Z

torchrl/objectives/a2c.py

+            )
+            # Chose the most pessimistic value prediction between clipped and non-clipped
+            loss_value = torch.max(loss_value, loss_value_clipped)
+            clip_fraction = (


I don't get what clip_fraction represents

One thing I like to look at is what proportion of data is being clipped

Also we should be careful about this since it doesn't come for free. The overhead introduced by logging this metric could impact performance...

Yes, as I understand it, it is a measure of how much the old model that collected the data differs from the current model. If the fraction values stay close to 1.0 the model is changing slowly. If the ratio consistently deviates from 1.0 and approaches or hits the clipping value, it indicates that the new model is substantially different from the old model, indicating a faster rate of model change. It might help better understand the learning process, but we could make it optional if it too costly.

torchrl/objectives/a2c.py

vmoens

Some very minor details (clone and type annotations) and we're good to go!

vmoens · 2024-03-14T15:39:00Z

torchrl/objectives/ppo.py

@@ -506,6 +527,16 @@ def loss_critic(self, tensordict: TensorDictBase) -> torch.Tensor:
                f"can be used for the value loss."
            )

+        if self.clip_value:
+            try:
+                old_state_value = tensordict.get(self.tensor_keys.value).clone()


We never write anything in-place AFAICT

torchrl/objectives/ppo.py

torchrl/objectives/reinforce.py

torchrl/objectives/a2c.py

torchrl/objectives/ppo.py

Co-authored-by: Vincent Moens <vincentmoens@gmail.com>

vmoens

LGTM

Co-authored-by: Vincent Moens <vincentmoens@gmail.com> Co-authored-by: Vincent Moens <vmoens@meta.com>

albertbou92 added 4 commits March 9, 2024 11:07

clipping code

88b1ea3

fix test

704f7da

fix test

fee26d6

fix test

0cd5af0

facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 9, 2024

albertbou92 mentioned this pull request Mar 9, 2024

Value clipping for PPO loss #1977

Closed

vmoens reviewed Mar 9, 2024

View reviewed changes

albertbou92 added 5 commits March 10, 2024 17:48

extend logiv

27e1822

fix test

3af7f0c

register param

91577c9

register param

42071f3

minor fix

5e33992

albertbou92 requested a review from vmoens March 10, 2024 17:23

vmoens added the enhancement New feature or request label Mar 10, 2024

albertbou92 added 10 commits March 11, 2024 16:20

added param for a2c and reinforce

47ed022

fix test

c363a37

fix test

c0d03aa

fix test

102235e

fix test

8bf1ae2

fix test

025ceb4

fix test

3b8728b

fix test

fdcbf9f

docstrings

5d75566

docstrings

4c8c783

vmoens reviewed Mar 11, 2024

View reviewed changes

albertbou92 and others added 2 commits March 12, 2024 08:59

Update torchrl/objectives/a2c.py

19e57b5

Co-authored-by: Vincent Moens <vincentmoens@gmail.com>

Update torchrl/objectives/ppo.py

245e562

Co-authored-by: Vincent Moens <vincentmoens@gmail.com>

albertbou92 and others added 7 commits March 12, 2024 09:00

Update torchrl/objectives/ppo.py

48f06ce

Co-authored-by: Vincent Moens <vincentmoens@gmail.com>

Update torchrl/objectives/a2c.py

c356670

Co-authored-by: Vincent Moens <vincentmoens@gmail.com>

Update torchrl/objectives/a2c.py

fc3c3dd

Co-authored-by: Vincent Moens <vincentmoens@gmail.com>

Update torchrl/objectives/a2c.py

7d04eca

Co-authored-by: Vincent Moens <vincentmoens@gmail.com>

integrate feedback

28c9cae

fix test

9e273d1

fix test

c836bd8

albertbou92 added 2 commits March 12, 2024 10:17

format

1672c48

return clip fractions

cc04ec4

fix test

78ad4d8

vmoens reviewed Mar 12, 2024

View reviewed changes

albertbou92 added 4 commits March 14, 2024 12:29

update feedback

ad1425c

fix test

82e9efd

fix test

3723dc2

fix test

bd1c974

vmoens approved these changes Mar 14, 2024

View reviewed changes

albertbou92 and others added 5 commits March 14, 2024 16:43

Update torchrl/objectives/ppo.py

3e7b824

Co-authored-by: Vincent Moens <vincentmoens@gmail.com>

Update torchrl/objectives/a2c.py

7a5c9ee

Co-authored-by: Vincent Moens <vincentmoens@gmail.com>

Update torchrl/objectives/ppo.py

1374d80

Co-authored-by: Vincent Moens <vincentmoens@gmail.com>

Update torchrl/objectives/reinforce.py

cb83230

Co-authored-by: Vincent Moens <vincentmoens@gmail.com>

minor fixes

021aa71

albertbou92 requested a review from vmoens March 18, 2024 08:39

Merge remote-tracking branch 'origin/main' into clip_value_loss

c123231

vmoens approved these changes Mar 18, 2024

View reviewed changes

amend

c72347c

vmoens merged commit 43c6bca into pytorch:main Mar 18, 2024
60 of 67 checks passed

vmoens deleted the clip_value_loss branch March 18, 2024 11:05

SandishKumarHN pushed a commit to SandishKumarHN/rl that referenced this pull request Mar 18, 2024

[Feature] Adds value clipping in ClipPPOLoss loss (pytorch#2005)

ad2b6c3

Co-authored-by: Vincent Moens <vincentmoens@gmail.com> Co-authored-by: Vincent Moens <vmoens@meta.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature] Adds value clipping in ClipPPOLoss loss #2005

[Feature] Adds value clipping in ClipPPOLoss loss #2005

albertbou92 commented Mar 9, 2024

pytorch-bot bot commented Mar 9, 2024 •

edited

Loading

vmoens Mar 9, 2024

albertbou92 commented Mar 10, 2024

vmoens commented Mar 10, 2024

vmoens left a comment

albertbou92 commented Mar 12, 2024

albertbou92 commented Mar 12, 2024 •

edited

Loading

vmoens Mar 12, 2024

albertbou92 Mar 14, 2024 •

edited

Loading

vmoens Mar 14, 2024

vmoens Mar 12, 2024

albertbou92 Mar 14, 2024 •

edited

Loading

vmoens left a comment

vmoens Mar 14, 2024

vmoens left a comment

[Feature] Adds value clipping in ClipPPOLoss loss #2005

[Feature] Adds value clipping in ClipPPOLoss loss #2005

Conversation

albertbou92 commented Mar 9, 2024

Description

Motivation and Context

Types of changes

Checklist

pytorch-bot bot commented Mar 9, 2024 • edited Loading

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2005

❌ 6 New Failures, 1 Unrelated Failure

vmoens Mar 9, 2024

Choose a reason for hiding this comment

albertbou92 commented Mar 10, 2024

vmoens commented Mar 10, 2024

vmoens left a comment

Choose a reason for hiding this comment

albertbou92 commented Mar 12, 2024

albertbou92 commented Mar 12, 2024 • edited Loading

vmoens Mar 12, 2024

Choose a reason for hiding this comment

albertbou92 Mar 14, 2024 • edited Loading

Choose a reason for hiding this comment

vmoens Mar 14, 2024

Choose a reason for hiding this comment

vmoens Mar 12, 2024

Choose a reason for hiding this comment

albertbou92 Mar 14, 2024 • edited Loading

Choose a reason for hiding this comment

vmoens left a comment

Choose a reason for hiding this comment

vmoens Mar 14, 2024

Choose a reason for hiding this comment

vmoens left a comment

Choose a reason for hiding this comment

pytorch-bot bot commented Mar 9, 2024 •

edited

Loading

albertbou92 commented Mar 12, 2024 •

edited

Loading

albertbou92 Mar 14, 2024 •

edited

Loading

albertbou92 Mar 14, 2024 •

edited

Loading