Hi, may I ask why, in the expression `loss = nn.CrossEntropyLoss()(logits, answers)`, we don't use `answers - 1`? In the "Beauty" dataset, the item IDs range from 1 to 12101, not 0 to 12100, and the target labels passed to `nn.CrossEntropyLoss()` should typically be class indices, which start from 0. I would appreciate your insight on this.
Thank you!
Hi. Thanks for the question and for pointing this out.
You are correct that the actual item range is 1 to 12101. However, we include item 0 as a padding token to maintain consistent sequence lengths during preprocessing. For simplicity, logits are computed for item 0 as well, but item 0 never appears in the target labels (`answers`), so the model is supervised only on valid items in the range 1 to 12101.
As you mentioned, it would be more precise to use `answers - 1` and exclude the first column of the logits when computing the cross-entropy loss. However, we opted for the current implementation, which avoids extra index adjustments while producing equivalent results.
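For concreteness, here is a minimal sketch of the two formulations. The sizes and random tensors below are made up for illustration; `num_items` stands in for the 12101 Beauty items:

```python
import torch
import torch.nn as nn

num_items = 12101   # real items are IDs 1..num_items; ID 0 is the padding token
batch_size = 4

# Logits cover every index 0..num_items, including the padding item at index 0.
logits = torch.randn(batch_size, num_items + 1)
# Targets are real item IDs in 1..num_items; the padding ID 0 never appears here.
answers = torch.randint(1, num_items + 1, (batch_size,))

# Current implementation: labels are used as-is against the full logit matrix.
loss_full = nn.CrossEntropyLoss()(logits, answers)

# Alternative raised in the question: drop the padding column and shift labels.
loss_shifted = nn.CrossEntropyLoss()(logits[:, 1:], answers - 1)

print(loss_full.item(), loss_shifted.item())
```

Note that the two losses are not bit-identical, since the padding logit also enters the softmax denominator in the first form. But because index 0 never appears as a target, it only ever gets pushed down during training, and for any fixed set of logits the ranking over the real items (which is what the evaluation metrics depend on) is the same either way.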
I hope this clarifies your concern. If you have further questions, please let me know.