Choosing number of samples per Task #22

Open
MLRadfys opened this issue Aug 30, 2024 · 0 comments

MLRadfys commented Aug 30, 2024

Hi and thanks for providing the code to your scientific paper!

According to your article, you randomly choose N = 5000 samples per task.
I am wondering whether this could be a problem for imbalanced datasets.

For toxicity classification, for example, I would guess that most of the dialogs are non-toxic.
If the dataset is large and you draw only N = 5000 samples at random, there is a risk that nearly all of them are non-toxic and only a few toxic examples remain.
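
To make the concern concrete, here is a minimal sketch comparing a uniform random draw with a class-balanced draw. The pool size, the 5% toxic rate, and the helper names `uniform_sample` and `balanced_sample` are assumptions made up for illustration; they are not taken from the paper or the repository.

```python
import random
from collections import Counter, defaultdict

def uniform_sample(labels, n, seed=0):
    """Draw n indices uniformly at random, ignoring class labels."""
    rng = random.Random(seed)
    return rng.sample(range(len(labels)), n)

def balanced_sample(labels, n, seed=0):
    """Draw an equal share of indices per class, capped by class size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    per_class = n // len(by_class)
    chosen = []
    for indices in by_class.values():
        chosen.extend(rng.sample(indices, min(per_class, len(indices))))
    return chosen

# Hypothetical pool: 100,000 dialogs, 5% toxic (made-up rate).
labels = ["toxic"] * 5_000 + ["non-toxic"] * 95_000
N = 5000

uniform_counts = Counter(labels[i] for i in uniform_sample(labels, N))
balanced_counts = Counter(labels[i] for i in balanced_sample(labels, N))

print("uniform: ", dict(uniform_counts))
print("balanced:", dict(balanced_counts))
```

With these assumed numbers, the uniform draw should contain roughly 250 toxic dialogs out of 5000, while the balanced draw contains 2500 of each class.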

Is this something you have investigated?

Thanks in advance,

kind regards,

M
