
[BUG] ClipPPOLoss encounters a bug when calculating the loss in a composite action space scenario, failing to produce the correct result. #2487

Open
@Sui-Xing

Description

Describe the bug

This is a follow-up to issue #2402.
I have already applied the modifications suggested in that issue, but my code still fails when computing the PPO loss. After debugging, I found that the line gain1 = log_weight.exp() * advantage always produces a tensor of all zeros. I also noticed that this seems to be caused by the result of return self.log_prob_composite(sample, include_sum=True) being extremely negative (e.g., -300, -200). I can't figure out why self.log_prob_composite produces such values, and I hope someone can help me with this issue.
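For illustration, here is a minimal, hypothetical sketch (not the reporter's actual code) of why a composite log-probability around -300 makes gain1 collapse to zero: a single poorly scaled distribution component can already yield a log-prob near -300, and since log_prob_composite with include_sum=True sums the per-component log-probs, the total only becomes more negative; at that magnitude, exp() underflows to exactly zero in float32. The loc/scale/advantage values below are made up for demonstration.

```python
import torch
from torch.distributions import Normal

# 1) A badly scaled Gaussian component: a sample far from the mean
#    relative to a small scale gives a hugely negative log-prob.
dist = Normal(loc=torch.tensor(0.0), scale=torch.tensor(0.1))
lp = dist.log_prob(torch.tensor(2.5))
print(lp)  # roughly -311; summing several such components makes it worse

# 2) With log-probs of that magnitude, the importance weight underflows:
#    exp(log_weight) is exactly 0.0 in float32 once log_weight << -100,
#    so gain1 = log_weight.exp() * advantage becomes all zeros.
log_weight = torch.tensor([-280.0])  # new_log_prob - old_log_prob
advantage = torch.tensor([1.5])
gain1 = log_weight.exp() * advantage
print(gain1)  # tensor([0.])
```

This suggests checking the scale of each component distribution (and whether actions are normalized to the range the policy expects) before looking at the loss itself.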

System info

win11 24h2
python 3.10.14

torch 2.4.0+cu118
torchaudio 2.4.0+cu118
torchrl 0.5.0+ca3a595
torchvision 0.19.0+cu118
tensordict 0.5.0+eba0769
