Hi and thanks for providing the code to your scientific paper!
According to your article, you randomly choose N = 5000 samples per task.
I am wondering whether this could be a problem for imbalanced datasets.
For toxicity classification, for example, I would guess that most of the dialogs are non-toxic.
If the dataset is large and you draw only N = 5000 samples at random, the subset will tend to mirror the imbalance of the full dataset, so most of the selected samples are likely to be non-toxic.
Is this something you have investigated?
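To make the concern concrete, here is a minimal sketch that compares uniform random subsampling with a class-balanced alternative. The 5% toxic rate, the dataset size, and the helper names `random_subsample` and `balanced_subsample` are my own assumptions for illustration. None of them come from the paper or the repository.

```python
import random
from collections import Counter


def random_subsample(labels, n, seed=0):
    """Draw n indices uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(range(len(labels)), n)


def balanced_subsample(labels, n, seed=0):
    """Draw roughly n indices with an equal share per class.

    A class with fewer members than its share contributes all of them,
    so the result can be smaller than n.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    per_class = n // len(by_class)
    indices = []
    for members in by_class.values():
        k = min(per_class, len(members))
        indices.extend(rng.sample(members, k))
    return indices


# Hypothetical dataset: 100,000 dialogs, 5% of them toxic (assumed numbers).
labels = ["toxic"] * 5000 + ["non-toxic"] * 95000

rand_idx = random_subsample(labels, 5000)
bal_idx = balanced_subsample(labels, 5000)

rand_counts = Counter(labels[i] for i in rand_idx)
bal_counts = Counter(labels[i] for i in bal_idx)

print("random:  ", dict(rand_counts))
print("balanced:", dict(bal_counts))
```

With these assumed numbers, the random subset would be expected to contain only about 250 toxic samples out of 5000, whereas the balanced variant contains 2500 of each class.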
Thanks in advance,
kind regards,
M