Skip to content

[Tracking issue] General dataset support #2071

Open
@qgallouedec

Description

The aim is for all trainers to apply the same procedure in their init function:

  • if needed, apply the chat template, then
  • if needed, tokenize.

Support todo:

Standard dataset

Conversational dataset

Misc

  • Update docs/dataset_format.mdx

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions