
[BugFix] Loss function convert_to_functional seems to generate backprop issues with shared policy-value networks #1034

Closed

Conversation

@albertbou92 (Contributor) commented Apr 11, 2023

Description

At least in PPO (but probably also in the rest of the Objective classes), calling self.convert_to_functional in the following lines when using a shared policy-value architecture causes the value network not to update correctly. Could the current code be stopping the value loss from being propagated to the first (shared) layers?
https://github.com/pytorch/rl/blob/main/torchrl/objectives/ppo.py#L113
https://github.com/pytorch/rl/blob/main/torchrl/objectives/ppo.py#L117

This PR removes the call to self.convert_to_functional. I ran it on a test case for the Atari Pong env and, as the plots below show, it solved the issue. The green line is the modified version of the code. However, a proper solution should probably be discussed, because I am not sure whether simply removing these lines causes problems somewhere else.

[Plots PPO_bug1 and PPO_bug2: Atari Pong training curves comparing the original and modified code; the green line is the modified version.]
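For context, here is a minimal sketch in plain PyTorch of the kind of shared policy-value architecture this affects (the class, names, and sizes are illustrative, not the network used in the Pong tests):

```python
import torch.nn as nn

# Hypothetical minimal shared policy-value architecture of the kind the PR
# describes. The trunk is shared, so it should receive gradients from BOTH
# the policy loss and the value loss.
class SharedActorCritic(nn.Module):
    def __init__(self, obs_dim=8, n_actions=4):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh())  # shared layers
        self.policy_head = nn.Linear(64, n_actions)
        self.value_head = nn.Linear(64, 1)

    def forward(self, obs):
        h = self.trunk(obs)
        return self.policy_head(h), self.value_head(h)
```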

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 11, 2023
@vmoens (Contributor) commented Apr 11, 2023

When you backprop the two losses, I guess that in your implementation the gradients for the common modules are the sum of the gradients of the two losses, right?
What compare_against does is feed the module (in this case the critic) with a detached version of the params used by the other module (in this case the policy). This is what is done in some algos, but I understand that it may be suboptimal in other cases, especially if the rest of your critic is very shallow.

Have you tried just removing the compare_against?

If that works, a solution could be to make the compare_against call optional, with a default of not doing it.
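A hedged sketch of what this detaching means in practice, using torch.func.functional_call rather than torchrl's internals (the module layout and the param filter are assumptions for illustration):

```python
import torch
import torch.nn as nn
from torch.func import functional_call

# Rough sketch of the semantics described above (NOT torchrl's actual code):
# call the critic functionally with DETACHED copies of the params it shares
# with the policy (here the trunk, submodule "0"), and live params for the rest.
trunk, value_head = nn.Linear(8, 64), nn.Linear(64, 1)
critic = nn.Sequential(trunk, value_head)
obs = torch.randn(2, 8)

params = {name: (p.detach() if name.startswith("0.") else p)
          for name, p in critic.named_parameters()}
functional_call(critic, params, (obs,)).sum().backward()

print(trunk.weight.grad)                   # None: the value loss never reaches the trunk
print(value_head.weight.grad is not None)  # True: the value head itself still updates
```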

@albertbou92 albertbou92 force-pushed the convert_to_functional_bug branch from f13d8f5 to bdf9949 Compare April 11, 2023 08:34
@albertbou92 (Contributor, Author) commented:

Some updates. I actually only need to modify this line: https://github.com/pytorch/rl/blob/main/torchrl/objectives/ppo.py#L117

Here is what works and what does not:

```python
self.convert_to_functional(critic, "critic", compare_against=self.actor_params)  # Value network update is not correct
self.convert_to_functional(critic, "critic")  # Value network update is not correct either
self.critic = critic  # This works
```

So just removing the compare_against does not seem to be the solution.

@albertbou92 (Contributor, Author) commented:

> When you backprop the two losses, I guess that in your implementation the gradients for the common modules are the sum of the gradients of the two losses, right? What compare_against does is feed the module (in this case the critic) with a detached version of the params used by the other module (in this case the policy). This is what is done in some algos, but I understand that it may be suboptimal in other cases, especially if the rest of your critic is very shallow.
>
> Have you tried just removing the compare_against?
>
> If that works, a solution could be to make the compare_against call optional, with a default of not doing it.

Yes, for the code to work the gradients have to be the sum of the gradients of the two losses.
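A minimal runnable sketch of the update this assumes, with stand-in losses and hypothetical names (not the PR's actual training code): backprop one combined objective so the shared trunk accumulates gradients from both terms.

```python
import torch
import torch.nn as nn

# Illustrative only: backprop a single combined objective so the shared trunk
# accumulates gradients from BOTH the policy term and the value term.
trunk = nn.Linear(8, 64)
policy_head, value_head = nn.Linear(64, 4), nn.Linear(64, 1)
opt = torch.optim.Adam(
    [*trunk.parameters(), *policy_head.parameters(), *value_head.parameters()],
    lr=3e-4,
)

obs = torch.randn(16, 8)
h = trunk(obs)
loss_policy = policy_head(h).logsumexp(-1).mean()  # stand-in for the PPO policy loss
loss_value = value_head(h).pow(2).mean()           # stand-in for the value loss

opt.zero_grad()
(loss_policy + loss_value).backward()  # trunk grads = sum of both terms' grads
opt.step()
```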

@vmoens (Contributor) commented Apr 11, 2023

You're right, even without that, there is no param tying. Let me fix that to have a more predictable behaviour.
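For reference, a sketch of what param tying means here in plain nn.Module terms (illustrative, not torchrl's functional-parameter machinery): the two networks must literally share the same parameter tensors, not copies of them.

```python
import torch.nn as nn

# "Param tying" in the plain nn.Module sense: actor and critic hold the SAME
# parameter objects for the shared layers, so both losses update one tensor.
trunk = nn.Linear(8, 64)
actor = nn.Sequential(trunk, nn.Linear(64, 4))
critic = nn.Sequential(trunk, nn.Linear(64, 1))  # same trunk object -> tied params

assert actor[0].weight is critic[0].weight  # one tensor, updated by both losses
```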

@vmoens (Contributor) commented Apr 13, 2023

Closed by #1037

@vmoens vmoens closed this Apr 13, 2023
@albertbou92 albertbou92 deleted the convert_to_functional_bug branch January 18, 2024 10:08