The aim is for all trainers to apply the same procedure in their init function:
- if needed, apply the chat template, then
- if needed, tokenize.
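As a rough illustration of that two-step procedure, here is a minimal, self-contained sketch. The helper names (`is_conversational`, `prepare`) and the stand-in template/tokenizer are hypothetical, not TRL's actual API; a real trainer would call the tokenizer's chat-template and encoding methods instead.

```python
# Hypothetical sketch of the shared preprocessing each trainer's init
# function would apply. All names below are illustrative only.

def is_conversational(example):
    # A "conversational" example stores a list of {"role", "content"} messages
    # instead of a plain string.
    return isinstance(example.get("text"), list)

def apply_chat_template(messages):
    # Stand-in for the tokenizer's chat template: flatten messages to a string.
    return "".join(f"<|{m['role']}|>{m['content']}" for m in messages)

def tokenize(text):
    # Stand-in for a real tokenizer: whitespace split.
    return text.split()

def prepare(example):
    text = example["text"]
    # Step 1: if needed, apply the chat template.
    if is_conversational(example):
        text = apply_chat_template(text)
    # Step 2: if needed, tokenize.
    if "input_ids" not in example:
        example["input_ids"] = tokenize(text)
    return example

standard = prepare({"text": "hello world"})
conversational = prepare({"text": [{"role": "user", "content": "hi"}]})
```

Applying the same `prepare` to both formats is the point: a standard example skips the template step, a conversational one goes through both.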
Support todo:
Standard dataset
- `BCOTrainer`
- `CPOTrainer`
- `DPOTrainer`
- `GKDTrainer` (same as `SFTTrainer`)
- `IterativeSFTTrainer`
- `KTOTrainer`
- `NashMDTrainer`
- `OnlineDPOTrainer`
- `ORPOTrainer`
- `PPOTrainer`
- `RewardTrainer`: [RewardTrainer] Tokenize inputs within trainer #2102
- `RLOOTrainer`
- `SFTTrainer` (could previously be achieved via `dataset_text_field`): Default `dataset_text_field` to `"text"` #2078; 🔬 SFT simplification #2405
- `XPOTrainer`
Conversational dataset
- `BCOTrainer`: BCOTrainer conversational dataset support #2107
- `CPOTrainer`: Conversational dataset support for `CPOTrainer` #2144
- `DPOTrainer`: Conversational dataset support for `DPOTrainer` #2131
- `GKDTrainer`
- `IterativeSFTTrainer`
- `KTOTrainer`: Conversational dataset support for `KTOTrainer` #2248
- `NashMDTrainer`: Conversational dataset support for Online DPO #2075
- `OnlineDPOTrainer`: Conversational dataset support for Online DPO #2075
- `ORPOTrainer`: Conversational dataset support for `ORPOTrainer` #2184
- `PPOTrainer`
- `RewardTrainer`: [RewardTrainer] Tokenize inputs within trainer #2102
- `RLOOTrainer`
- `SFTTrainer` (yes, via `get_formatting_func_from_dataset` for now, needs refactoring); refactor in 🔬 SFT simplification #2405
- `XPOTrainer`: Conversational dataset support for Online DPO #2075
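For context, the two dataset formats tracked above differ only in how text is stored. A minimal illustration (the field names follow common TRL conventions, but treat the exact keys here as assumptions):

```python
# Standard format: fields are plain strings, ready to tokenize directly.
standard_example = {"prompt": "The sky is", "completion": " blue."}

# Conversational format: fields are lists of role/content messages,
# which need a chat template applied before tokenization.
conversational_example = {
    "prompt": [{"role": "user", "content": "What color is the sky?"}],
    "completion": [{"role": "assistant", "content": "It is blue."}],
}
```

Supporting a conversational dataset in a trainer therefore amounts to detecting the message-list shape and running the chat-template step before tokenizing.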
Misc
- Update `docs/dataset_format.mdx`