Choosing number of samples per Task #22

MLRadfys · 2024-08-30T10:29:45Z

Hi and thanks for providing the code to your scientific paper!

According to your article, you randomly choose N = 5000 samples per task.
I am wondering whether this could be a problem for imbalanced datasets?

For toxicity classification, for example, I would guess that most of the dialogs are non-toxic.
If the dataset is large and you draw only N = 5000 samples at random, there is a risk that almost all of them are non-toxic, leaving very few toxic examples.
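
To make the concern concrete, here is a minimal sketch. It is not taken from the repository: the dataset size, the 2% toxic rate, and the helper names `uniform_sample` and `stratified_sample` are all hypothetical and only serve to compare a uniform draw with a class-balanced one.

```python
import random
from collections import Counter, defaultdict


def uniform_sample(labels, n, seed=0):
    """Draw n indices uniformly at random, ignoring the class labels."""
    rng = random.Random(seed)
    return rng.sample(range(len(labels)), n)


def stratified_sample(labels, n, seed=0):
    """Draw up to n indices with an equal quota per class.

    A class smaller than its quota contributes all of its items,
    so the result can contain fewer than n indices.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    per_class = n // len(by_class)
    picked = []
    for idxs in by_class.values():
        picked.extend(rng.sample(idxs, min(per_class, len(idxs))))
    return picked


# Hypothetical imbalanced task: 100,000 dialogs, 2% toxic (label 1).
labels = [1] * 2_000 + [0] * 98_000

uni = Counter(labels[i] for i in uniform_sample(labels, 5000))
bal = Counter(labels[i] for i in stratified_sample(labels, 5000))

print("uniform draw:   ", dict(uni))
print("stratified draw:", dict(bal))
```

With these assumed numbers, the uniform draw is expected to contain only about 100 toxic dialogs out of 5000, while the stratified draw keeps every available toxic example up to its quota.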

Is this something you have investigated?

Thanks in advance,

kind regards,

M
