Hi, may I ask why, in the expression `loss = nn.CrossEntropyLoss()(logits, answers)`, we don't use `answers - 1`? In the "Beauty" dataset, the item IDs range from 1 to 12101, not 0 to 12100, and the target labels passed to `nn.CrossEntropyLoss()` should typically be class indices, which start from 0. I would appreciate your insight on this.
Thank you!
Hi. Thanks for the question and for pointing this out.
You are correct that the actual item range is 1 to 12101. However, we include item 0 as a padding token to maintain consistent sequence lengths during preprocessing. For simplicity, logits are computed for item 0 as well, but item 0 never appears in the target labels (`answers`), so the model is supervised only on valid items in the range 1 to 12101.
As you mentioned, it would be more precise to use `answers - 1` and exclude the first column of the logits when computing the cross-entropy loss. However, we opted for the current implementation, which avoids extra index adjustments while producing equivalent results.
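For concreteness, here is a minimal sketch of the two formulations. The sizes and random tensors below are made up for illustration; `num_items` stands in for the 12101 Beauty items:

```python
import torch
import torch.nn as nn

num_items = 12101   # real items are IDs 1..num_items; ID 0 is the padding token
batch_size = 4

# Logits cover every index 0..num_items, including the padding item at index 0.
logits = torch.randn(batch_size, num_items + 1)
# Targets are real item IDs in 1..num_items; the padding ID 0 never appears here.
answers = torch.randint(1, num_items + 1, (batch_size,))

# Current implementation: labels are used as-is against the full logit matrix.
loss_full = nn.CrossEntropyLoss()(logits, answers)

# Alternative raised in the question: drop the padding column and shift labels.
loss_shifted = nn.CrossEntropyLoss()(logits[:, 1:], answers - 1)

print(loss_full.item(), loss_shifted.item())
```

Note that the two losses are not bit-identical, since the padding logit also enters the softmax denominator in the first form. But because index 0 never appears as a target, it only ever gets pushed down during training, and for any fixed set of logits the ranking over the real items (which is what the evaluation metrics depend on) is the same either way.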
I hope this clarifies your concern. If you have further questions, please let me know.