Where can we change the ratio for train dataset split? #822
Comments
You could split the data yourself and upload both the training and validation splits :)
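As a minimal sketch of that suggestion, the snippet below shuffles the rows of a CSV and writes separate training and validation files at a ratio you choose. It uses only the Python standard library. The function names (`split_rows`, `split_csv`), the `valid_ratio` parameter, and the file names in the usage note are hypothetical and not part of AutoTrain; how you point the trainer at the two files depends on your own config.

```python
import csv
import random


def split_rows(rows, valid_ratio=0.2, seed=42):
    """Shuffle rows and split them into (train, valid) lists."""
    if not 0.0 < valid_ratio < 1.0:
        raise ValueError("valid_ratio must be strictly between 0 and 1")
    shuffled = list(rows)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_valid = int(round(len(shuffled) * valid_ratio))
    return shuffled[n_valid:], shuffled[:n_valid]


def split_csv(src_path, train_path, valid_path, valid_ratio=0.2, seed=42):
    """Read src_path and write train/valid CSVs that keep the same header."""
    with open(src_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)  # assumes the first line is a header row
        rows = list(reader)
    train, valid = split_rows(rows, valid_ratio, seed)
    for path, part in ((train_path, train), (valid_path, valid)):
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(part)
    return len(train), len(valid)
```

For example, `split_csv("data/train.csv", "data/train_split.csv", "data/valid_split.csv", valid_ratio=0.2)` on a 1400-row file would write 1120 training rows and 280 validation rows (the output file names are placeholders).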
I am using my local machine for training. I placed a train file, train.csv, with 1400 rows in the data folder. After running the trainer, the log includes this line: INFO | 2024-12-12 12:09:15 | autotrain.trainers.clm.utils:process_input_data:398 - Train data: Dataset({ Does that mean it takes only 250 rows from the train file? I am new to ML, so please explain a bit.
What are you training? Please provide more details :)
Hi, I am training GPT2 locally. My train set has 1400 rows (please see the attached file). I am also attaching a screenshot of the training log. The config is as follows: conf = f""" data: params: hub:
The params I used are: unsloth = False # @param ["False", "True"] {type:"raw"}
Is it possible to change the train split ratio? Right now, out of the 1400 rows in the train file, I get only 250 rows in the train dataset.