
About loss #8

Closed · onetance opened this issue Dec 21, 2024 · 2 comments

@onetance

Hi,
may I ask why the expression `loss = nn.CrossEntropyLoss()(logits, answers)` does not use `answers - 1`? In the "Beauty" dataset, the item range is 1 to 12101, not 0 to 12100, and the targets passed to `nn.CrossEntropyLoss()` should typically be category indices, which start from 0. I would appreciate your insight on this.
Thank you!
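
For example (a minimal sketch with toy shapes of my own, not your code):

```python
import torch
import torch.nn as nn

# What I mean: nn.CrossEntropyLoss expects target class indices in [0, C-1].
C = 12101                            # number of items in the "Beauty" dataset
logits = torch.randn(2, C)

loss = nn.CrossEntropyLoss()(logits, torch.tensor([0, C - 1]))  # fine: ids 0..12100

# Raw item ids 1..12101 would include C itself, which is out of range:
# nn.CrossEntropyLoss()(logits, torch.tensor([C]))  # -> IndexError at runtime
```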

@yehjin-shin
Owner

Hi. Thanks for the question and for pointing this out.

You are correct that the actual item range is 1 to 12101. However, we include item 0 as a padding token to maintain consistent sequence lengths during processing. For simplicity, logits are calculated for item 0 as well, but it is important to note that item 0 never appears in the target labels `answers`. This ensures that the model focuses only on valid items in the range 1 to 12101.

As you mentioned, it would be more precise to use `answers - 1` and exclude the logits for item 0 when computing the cross-entropy loss. However, we opted for our current implementation, which avoids additional index adjustments while producing equivalent results.
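
For illustration, here is a minimal sketch of the two indexing schemes (the shapes and variable names are illustrative, not taken from the repository):

```python
import torch
import torch.nn as nn

num_items = 12101                                # valid items: 1..12101
logits = torch.randn(4, num_items + 1)           # extra column 0 for the padding item
answers = torch.randint(1, num_items + 1, (4,))  # targets never contain padding id 0

# Our scheme: keep the padding column and use raw item ids as targets.
loss_padded = nn.CrossEntropyLoss()(logits, answers)

# The shifted variant: drop the padding column and subtract 1 from the ids.
loss_shifted = nn.CrossEntropyLoss()(logits[:, 1:], answers - 1)

# The two differ only through item 0's logit in the softmax normalizer;
# masking that logit out makes them agree exactly.
masked = logits.clone()
masked[:, 0] = float("-inf")
assert torch.allclose(nn.CrossEntropyLoss()(masked, answers), loss_shifted)
```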

I hope this clarifies your concern. If you have further questions, please let me know.

@onetance
Author

Thank you for the detailed explanation, that clarifies it.
Best regards!
