
[BUG] ClipPPOLoss encounters a bug when calculating the loss in a composite action space scenario, failing to produce the correct result. #2487

Open
@Sui-Xing

Description

Describe the bug

This is a follow-up to issue #2402.
I have already applied the modifications suggested in that issue, but my code still fails when computing the PPO loss. After debugging, I found that the line gain1 = log_weight.exp() * advantage always produces a tensor of all zeros. I also noticed that this seems to be caused by the result of return self.log_prob_composite(sample, include_sum=True) being extremely negative (e.g., -300, -200). I can't figure out why self.log_prob_composite produces such values, and I hope someone can help me with this issue.
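For illustration, here is a minimal, hypothetical sketch (not the reporter's actual code) of why a composite log-probability around -300 makes gain1 collapse to zero: a single poorly scaled distribution component can already yield a log-prob near -300, and since log_prob_composite with include_sum=True sums the per-component log-probs, the total only becomes more negative; at that magnitude, exp() underflows to exactly zero in float32. The loc/scale/advantage values below are made up for demonstration.

```python
import torch
from torch.distributions import Normal

# 1) A badly scaled Gaussian component: a sample far from the mean
#    relative to a small scale gives a hugely negative log-prob.
dist = Normal(loc=torch.tensor(0.0), scale=torch.tensor(0.1))
lp = dist.log_prob(torch.tensor(2.5))
print(lp)  # roughly -311; summing several such components makes it worse

# 2) With log-probs of that magnitude, the importance weight underflows:
#    exp(log_weight) is exactly 0.0 in float32 once log_weight << -100,
#    so gain1 = log_weight.exp() * advantage becomes all zeros.
log_weight = torch.tensor([-280.0])  # new_log_prob - old_log_prob
advantage = torch.tensor([1.5])
gain1 = log_weight.exp() * advantage
print(gain1)  # tensor([0.])
```

This suggests checking the scale of each component distribution (and whether actions are normalized to the range the policy expects) before looking at the loss itself.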

System info

win11 24h2
python 3.10.14

torch 2.4.0+cu118
torchaudio 2.4.0+cu118
torchrl 0.5.0+ca3a595
torchvision 0.19.0+cu118
tensordict 0.5.0+eba0769
