datasets.utils.info_utils.ExpectedMoreSplits: {'validation'} #286

Open
@SDcodehub

Description

╰─$ python llama.py /datadrive/models/Llama-2-13b-chat-hf c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors /datadrive/models/Llama-2-13b-chat-hf-gptq/llama-2-13b-4bit-gs128.safetensors

Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00, 1.41it/s]
/home/FRACTAL/sagar.desai/miniconda3/envs/gptq/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:389: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
warnings.warn(
/home/FRACTAL/sagar.desai/miniconda3/envs/gptq/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:394: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
warnings.warn(
Downloading and preparing dataset None/en to file:///home/FRACTAL/sagar.desai/.cache/huggingface/datasets/allenai___json/en-ec45c889631c3c39/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4...
Downloading data files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 6413.31it/s]
Extracting data files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1855.89it/s]
Traceback (most recent call last):
File "/home/FRACTAL/sagar.desai/GPTQ-for-LLaMa/llama.py", line 488, in <module>
dataloader, testloader = get_loaders(args.dataset, nsamples=args.nsamples, seed=args.seed, model=args.model, seqlen=model.seqlen)
File "/home/FRACTAL/sagar.desai/GPTQ-for-LLaMa/utils/datautils.py", line 189, in get_loaders
return get_c4(nsamples, seed, seqlen, model)
File "/home/FRACTAL/sagar.desai/GPTQ-for-LLaMa/utils/datautils.py", line 64, in get_c4
traindata = load_dataset('allenai/c4', 'allenai--c4', data_files={'train': 'en/c4-train.00000-of-01024.json.gz'}, split='train', use_auth_token=False)
File "/home/FRACTAL/sagar.desai/miniconda3/envs/gptq/lib/python3.9/site-packages/datasets/load.py", line 1797, in load_dataset
builder_instance.download_and_prepare(
File "/home/FRACTAL/sagar.desai/miniconda3/envs/gptq/lib/python3.9/site-packages/datasets/builder.py", line 890, in download_and_prepare
self._download_and_prepare(
File "/home/FRACTAL/sagar.desai/miniconda3/envs/gptq/lib/python3.9/site-packages/datasets/builder.py", line 1003, in _download_and_prepare
verify_splits([self.info](http://self.info/).splits, split_dict)
File "/home/FRACTAL/sagar.desai/miniconda3/envs/gptq/lib/python3.9/site-packages/datasets/utils/info_utils.py", line 91, in verify_splits
raise ExpectedMoreSplits(str(set(expected_splits) - set(recorded_splits)))
datasets.utils.info_utils.ExpectedMoreSplits: {'validation'}
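
For reference, the failure is independent of the GPTQ code itself: the error comes from the `load_dataset` call in `utils/datautils.py` shown in the traceback, so it should reproduce in a bare Python session with only `datasets` 2.10.x–2.12.x installed. A minimal repro sketch:

```python
from datasets import load_dataset

# Same call as get_c4() in utils/datautils.py. Only the 'train' split is supplied,
# while the recorded split metadata for the 'allenai--c4' config also lists
# 'validation', so verify_splits() raises ExpectedMoreSplits({'validation'}).
traindata = load_dataset(
    'allenai/c4',
    'allenai--c4',
    data_files={'train': 'en/c4-train.00000-of-01024.json.gz'},
    split='train',
    use_auth_token=False,
)
```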

Running on an A100. I tried different `datasets` versions, from 2.10.* to 2.12.*, and get the same error each time.
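
A possible workaround (untested on my side, so treat it as a sketch rather than a confirmed fix): in `get_c4()` in `utils/datautils.py`, drop the `'allenai--c4'` config name so the loader does not compare the supplied files against the repo's recorded splits, or keep the call as-is and disable the check via `verification_mode` (available in `datasets` >= 2.9.1). The validation shard name below is assumed from the `allenai/c4` repo layout.

```python
from datasets import load_dataset

# Sketch of a patched get_c4(): load only the shards that are actually needed,
# without the 'allenai--c4' config name, so the split verification against the
# recorded metadata is not triggered.
traindata = load_dataset(
    'allenai/c4',
    data_files={'train': 'en/c4-train.00000-of-01024.json.gz'},
    split='train',
)
valdata = load_dataset(
    'allenai/c4',
    data_files={'validation': 'en/c4-validation.00000-of-00008.json.gz'},
    split='validation',
)

# Alternative: keep the original call but skip the split check explicitly.
# traindata = load_dataset(
#     'allenai/c4', 'allenai--c4',
#     data_files={'train': 'en/c4-train.00000-of-01024.json.gz'},
#     split='train', verification_mode='no_checks',
# )
```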
