[bug] objective/entropy < 0 when using RLOOTrainer and PPOTrainer #2496
Open
Description
trl/trl/trainer/rloo_trainer.py
Line 443 in 1661bc2
This happens because, in the current code, the padded positions of `logprobs` are filled with 1:
INVALID_LOGPROB = 1.0
logprobs = torch.masked_fill(logprobs, padding_mask, INVALID_LOGPROB)
I don't understand why INVALID_LOGPROB is set to 1. Wouldn't it work just as well if it were set to 0?
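To illustrate why the fill value of 1.0 can push the logged entropy below zero: if the metric is computed as the mean of `-logprobs` and the reduction does not re-exclude the padded positions, every padded slot contributes exactly -1.0 to the average. The sketch below is a toy reproduction with made-up tensors, not TRL's actual trainer code; the variable names are illustrative assumptions:

```python
import torch

INVALID_LOGPROB = 1.0  # the fill value used for padded positions

# toy per-token logprobs for 2 sequences of length 4
logprobs = torch.tensor([[-0.1, -0.2, -0.1, -0.2],
                         [-0.1, -0.3, -0.2, -0.1]])
# pretend the last three tokens of the second sequence are padding
padding_mask = torch.tensor([[False, False, False, False],
                             [False, True,  True,  True]])

masked = torch.masked_fill(logprobs, padding_mask, INVALID_LOGPROB)

# naive reduction: averages over ALL positions, so each padded slot
# contributes -INVALID_LOGPROB = -1.0 and drags the estimate negative
naive_entropy = (-masked).mean()

# masked reduction: only non-padded positions enter the mean
correct_entropy = (-masked[~padding_mask]).mean()

print(naive_entropy.item())    # negative, despite entropy being >= 0
print(correct_entropy.item())  # positive, as expected
```

Note that filling with 0 instead of 1 would not fix a missing mask either; it would merely bias the mean toward zero instead of toward -1. The real fix is to exclude the padded positions from the reduction.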