SFTTrainer explicitly skips prepare_model_for_kbit_training when using PEFT + FSDP/DeepSpeed ZeRO-3, whereas DPOTrainer calls it #2537
System Info
- Platform: Linux-5.15.0-1061-gke-x86_64-with-glibc2.31
- Python version: 3.11.9
- PyTorch version: 2.4.0
- CUDA device(s): NVIDIA A100-SXM4-80GB
- Transformers version: 4.46.3
- Accelerate version: 1.0.1
- Accelerate config: not found
- Datasets version: 3.0.2
- HF Hub version: 0.27.0
- TRL version: 0.12.1
- bitsandbytes version: 0.44.1
- DeepSpeed version: not installed
- Diffusers version: not installed
- Liger-Kernel version: not installed
- LLM-Blender version: not installed
- OpenAI version: 1.58.1
- PEFT version: 0.13.2
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder
- My own task or dataset (give details below)
Reproduction
The two trainers handle PEFT + FSDP differently:
SFTTrainer (skips prepare_model_for_kbit_training when using PEFT + FSDP/DeepSpeed ZeRO-3):
https://github.com/huggingface/trl/blob/v0.12.1/trl/trainer/sft_trainer.py#L242-L244
DPOTrainer (calls prepare_model_for_kbit_training):
https://github.com/huggingface/trl/blob/v0.12.1/trl/trainer/dpo_trainer.py#L363
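For context, here is a minimal sketch of the kind of setup that hits this divergence: a 4-bit base model wrapped with PEFT outside the trainer and passed to SFTTrainer, launched under FSDP with accelerate. The model name, dataset, LoRA settings, and the FSDP accelerate config file are illustrative placeholders, not taken from the original report.

```python
# Hedged reproduction sketch: model, dataset, and LoRA settings are placeholders.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_storage=torch.bfloat16,  # so FSDP can shard the quantized weights
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# The PEFT model is created outside the trainer and passed in already wrapped.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, peft_config)

dataset = load_dataset("stanfordnlp/imdb", split="train")  # placeholder dataset

# Launched with e.g. `accelerate launch --config_file fsdp_config.yaml sft.py`.
# Per the links above, SFTTrainer skips prepare_model_for_kbit_training for
# PEFT + FSDP/DeepSpeed ZeRO-3, while DPOTrainer calls it on the same kind of
# quantized PEFT model.
trainer = SFTTrainer(
    model=model,
    args=SFTConfig(
        output_dir="sft-out",
        dataset_text_field="text",
        max_seq_length=512,
    ),
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```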
Expected behavior
The current workflow is:
- create PEFT model outside of trainer
- pass PEFT model to trainer
- first run SFTTrainer
- use output model from SFTTrainer as base model in DPOTrainer
It is unclear what the expected way is to create and pass a PEFT model to the trainers when also using FSDP for model-parallel training, since SFTTrainer and DPOTrainer handle this differently.
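To make that workflow concrete, below is a hedged sketch of the DPO step, reusing the SFT output as the policy model. The checkpoint path, preference dataset, and hyperparameters are illustrative placeholders, not from the original report.

```python
# Hedged sketch of the SFT -> DPO handoff described above; paths, dataset, and
# hyperparameters are placeholders.
import torch
from datasets import load_dataset
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer, BitsAndBytesConfig
from trl import DPOConfig, DPOTrainer

sft_checkpoint = "sft-out/checkpoint-final"  # placeholder: adapter saved by SFTTrainer

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_storage=torch.bfloat16,
)

# Reload the SFT output as a quantized PEFT model and use it as the DPO policy.
model = AutoPeftModelForCausalLM.from_pretrained(
    sft_checkpoint,
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(sft_checkpoint)

pref_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")  # placeholder

# With a PEFT model and no ref_model, DPOTrainer uses the model with adapters
# disabled as the implicit reference. Because the base weights are quantized,
# DPOTrainer calls prepare_model_for_kbit_training here (dpo_trainer.py#L363),
# which SFTTrainer skipped for the same sharded setup in the previous step.
trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="dpo-out", beta=0.1),
    train_dataset=pref_dataset,
    processing_class=tokenizer,
)
trainer.train()
```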
Checklist
- I have checked that my issue isn't already filed (see open issues)
- I have included my system information
- Any code provided is minimal, complete, and reproducible (more on MREs)
- Any code provided is properly formatted in code blocks (no screenshots, more on code blocks)
- Any traceback provided is complete