Insights: huggingface/trl
Overview
2 Pull requests merged by 2 people
- 🚜 Use field in dataclasses (#2494), merged Jan 6, 2025 (see the sketch after this list)
- Remove graph breaks for torch.compile() in padding free branch in DataCollatorForCompletionOnlyLM (#2158), merged Jan 6, 2025
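The "Use field in dataclasses" PR (#2494) refers to the standard-library `dataclasses.field` helper, which dataclass-based configs need for mutable and annotated defaults. A minimal sketch of the pattern, using a hypothetical config class rather than TRL's real ones:

```python
from dataclasses import dataclass, field

@dataclass
class TrainingConfig:
    """Hypothetical config dataclass, for illustration only (not TRL's)."""
    learning_rate: float = 1e-5
    # A bare mutable default like `[]` raises ValueError at class definition;
    # `default_factory` builds a fresh list for each instance instead.
    report_to: list = field(default_factory=list)
    # `metadata` attaches side information (e.g. help text) to a field
    # without changing runtime behavior.
    num_epochs: int = field(default=3, metadata={"help": "Number of training epochs."})

cfg = TrainingConfig()
cfg.report_to.append("wandb")  # mutates only this instance's list
```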
4 Pull requests opened by 3 people
- Add "_prepare_fsdp" for DPOTrainer (#2539), opened Jan 3, 2025
- Custom reward function support for PPO trainer (#2540), opened Jan 3, 2025 (see the sketch after this list)
- Issues Auto-Labeller (#2542), opened Jan 4, 2025
- MPO (#2544), opened Jan 6, 2025
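PR #2540 proposes letting the PPO trainer accept a plain callable as its reward signal instead of a trained reward model. Since the PR is still open, the interface below is an assumption; it only illustrates what a rule-based reward callable typically looks like:

```python
import torch

def rule_based_reward(prompts: list[str], completions: list[str]) -> torch.Tensor:
    """Score completions with hand-written rules.

    The (prompts, completions) -> tensor signature is hypothetical, chosen
    for illustration; it is not necessarily what #2540 lands on.
    """
    rewards = []
    for completion in completions:
        score = 0.0
        if len(completion.split()) <= 64:                 # reward brevity
            score += 0.5
        if completion.strip().endswith((".", "!", "?")):  # reward complete sentences
            score += 0.5
        rewards.append(score)
    return torch.tensor(rewards)

print(rule_based_reward(["Q?"], ["A short, complete answer."]))  # tensor([1.])
```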
1 Issue closed by 1 person
- SFTTrainer not loading dataset correctly, expected format? (#2541), closed Jan 4, 2025
7 Issues opened by 7 people
- Finetuning on the last turn of multi-turn conversations (#2545), opened Jan 6, 2025
- Dataset type conversion utilities (#2543), opened Jan 6, 2025
- Is `truncation_mode` used in `DPOTrainer`? (#2538), opened Jan 2, 2025
- Different finetune speed in the DPO task between peft and ms-swift (600/s iter vs 30/s iter) (#2536), opened Jan 2, 2025
- (Willing to PR) Would PRs that speed up algorithms like PPO, plus code refactoring/cleanup, be welcome? (#2535), opened Dec 31, 2024
- Using the "beam search" strategy while generating responses (#2534), opened Dec 31, 2024 (see the sketch after this list)
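On #2534: outside of any TRL trainer, beam search is exposed in Hugging Face `transformers` through `model.generate`, which switches from greedy/sampling decoding to beam search whenever `num_beams > 1`. A minimal standalone sketch (the model name is just an example; whether a given trainer passes these kwargs during rollouts is the issue's open question):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # example model; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("The capital of France is", return_tensors="pt")
output_ids = model.generate(
    **inputs,
    num_beams=4,          # > 1 enables beam search
    max_new_tokens=20,
    early_stopping=True,  # stop once enough complete beam candidates exist
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```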
13 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- Add xpu support for DPO (#2533), commented on Jan 3, 2025 • 2 new comments
- PPO example script Accelerator error: initialize your accelerator via `accelerator = Accelerator()` (#2377), commented on Dec 31, 2024 • 0 new comments
- UserWarning when training DPO with LoRA: None of the inputs have requires_grad=True. Gradients will be None (#2486), commented on Jan 2, 2025 • 0 new comments
- AttributeError: 'DistributedDataParallel' object has no attribute 'policy' when saving model using PPOTrainer (#2375), commented on Jan 3, 2025 • 0 new comments (see the sketch after this list)
- `PPOv2Trainer` `reward_model` throws `AttributeError: '<My Custom Class>' object has no attribute 'base_model_prefix'` (#1977), commented on Jan 4, 2025 • 0 new comments
- [question] Best way to have my own reward model backed by rules (#2518), commented on Jan 4, 2025 • 0 new comments
- [GRPO] Initial GRPO trainer (#1954), commented on Jan 5, 2025 • 0 new comments
- Asynchronous RLHF: Faster and More Efficient Online DPO (#2278), commented on Dec 31, 2024 • 0 new comments
- Padding-free DPO (#2437), commented on Jan 2, 2025 • 0 new comments
- [Liger] Add native liger-kernel ORPO loss (#2482), commented on Jan 3, 2025 • 0 new comments
- [Liger] Integrate Liger CPO & SimPO (#2506), commented on Jan 3, 2025 • 0 new comments
- 🕊️ DPO padding free (#2520), commented on Jan 6, 2025 • 0 new comments
- [ORPO] Revert ORPO changes (#2527), commented on Jan 6, 2025 • 0 new comments
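The `AttributeError: 'DistributedDataParallel' object has no attribute 'policy'` in #2375 is the usual symptom of reading a custom attribute through a DDP wrapper: DDP forwards `forward()` to the wrapped model but not arbitrary attributes, which live on `.module`. A generic unwrapping sketch, not necessarily the fix TRL applies:

```python
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel

def unwrap_model(model: nn.Module) -> nn.Module:
    """Return the underlying module if `model` is DDP-wrapped.

    DDP proxies `forward()` but not custom attributes, so e.g.
    `wrapped.policy` raises AttributeError until you reach `.module`.
    """
    if isinstance(model, DistributedDataParallel):
        return model.module
    return model

# Usage sketch (names hypothetical): unwrap before saving, or before
# touching attributes that were defined on the raw model.
# policy = unwrap_model(trainer.model).policy
```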