Where can we change the ratio for train dataset split? #822

Open

apple-1 opened this issue Dec 12, 2024 · 5 comments
@apple-1 commented Dec 12, 2024

Is it possible to change the train split ratio? Right now, from 1,400 rows in the train file, I only get 250 rows in the train dataset.

@abhishekkrthakur (Member)

You could split the data yourself and upload both the training and validation splits :)
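
For example, a minimal sketch with pandas and scikit-learn (the 90/10 ratio, file names, and data folder are placeholders, not something AutoTrain requires):

# Split a single train.csv into explicit train/valid files before training.
# Assumes pandas and scikit-learn are installed; adjust test_size to taste.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("data/train.csv")
train_df, valid_df = train_test_split(df, test_size=0.1, random_state=42)

train_df.to_csv("data/train.csv", index=False)
valid_df.to_csv("data/valid.csv", index=False)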

@apple-1 (Author) commented Dec 12, 2024

I am using my local machine for training. I placed a train file, train.csv, with 1,400 rows in the data folder. After running the trainer, the log includes this piece of info:

INFO | 2024-12-12 12:09:15 | autotrain.trainers.clm.utils:process_input_data:398 - Train data: Dataset({
    features: ['text', 'Description'],
    num_rows: 250
})

Does that mean it takes only 250 rows from the train file?

I am new to ML. Kindly explain a bit.

@abhishekkrthakur (Member)

What are you training? Please provide more details :)

@apple-1 (Author) commented Dec 13, 2024

Hi, I am training GPT-2 locally.

My train set has 1,400 rows; please see the attached train.csv. I am also attaching a screenshot of the training log (data-rows).

Config is as follows:

conf = f"""
task: llm-{trainer}
base_model: {model_name}
project_name: {project_name}
log: tensorboard
backend: local

data:
path: /data
train_split: train
valid_split: null
chat_template: null
column_mapping:
text_column: text

params:
block_size: {block_size}
lr: {learning_rate}
warmup_ratio: {warmup_ratio}
weight_decay: {weight_decay}
epochs: {num_epochs}
batch_size: {batch_size}
gradient_accumulation: {gradient_accumulation}
mixed_precision: {mixed_precision}
peft: {peft}
quantization: {quantization}
lora_r: {lora_r}
lora_alpha: {lora_alpha}
lora_dropout: {lora_dropout}
unsloth: {unsloth}

hub:
username: ${{HF_USERNAME}}
token: ${{HF_TOKEN}}
push_to_hub: {push_to_hub}
"""

@apple-1 (Author) commented Dec 14, 2024

The params I used are:

unsloth = False # @param ["False", "True"] {type:"raw"}
learning_rate = 2e-4 # @param {type:"number"}
num_epochs = 1 #@param {type:"number"}
batch_size = 1 # @param {type:"slider", min:1, max:32, step:1}
block_size = 256 # @param {type:"number"}
trainer = "sft" # @param ["generic", "sft"] {type:"string"}
warmup_ratio = 0.1 # @param {type:"number"}
weight_decay = 0.01 # @param {type:"number"}
gradient_accumulation = 2 # @param {type:"number"}
mixed_precision = "none" # @param ["fp16", "bf16", "none"] {type:"string"}
peft = True # @param ["False", "True"] {type:"raw"}
quantization = "int8" # @param ["int4", "int8", "none"] {type:"string"}
lora_r = 16 #@param {type:"number"}
lora_alpha = 32 #@param {type:"number"}
lora_dropout = 0.05 #@param {type:"number"}
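
Training is then launched by writing the config string to a file and calling the autotrain CLI, roughly following the AutoTrain colab (a sketch; the exact invocation may differ in your setup):

# Write the YAML config built above to disk and start training.
# HF_USERNAME / HF_TOKEN are expected in the environment for the hub section.
import os

with open("conf.yaml", "w") as f:
    f.write(conf)

os.system("autotrain --config conf.yaml")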
