[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward (a minimal sketch of the SimPO objective appears at the end of this list)
[ACL 2024 Findings] Knowledgeable Preference Alignment for LLMs in Domain-specific Question Answering
[NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$
Source code for "A Dense Reward View on Aligning Text-to-Image Diffusion with Preference" (ICML'24).
[ICLR 2025] Official code of "Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization"
Video Generation Benchmark
Survey of preference alignment algorithms
Generate synthetic datasets for instruction tuning and preference alignment using tools like `distilabel` for efficient and scalable data creation.
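Whatever tool produces them, preference-alignment datasets generally reduce to prompt/chosen/rejected triples. The snippet below is a minimal, library-agnostic sketch of that record shape; the field names follow a common convention and are not `distilabel`'s exact schema.

```python
# A library-agnostic sketch of a preference-alignment record, not tied to any
# specific dataset-generation tool.
import json

record = {
    "prompt": "Summarize the difference between DPO and SimPO in one sentence.",
    "chosen": "SimPO drops DPO's reference model and uses the policy's "
              "length-normalized log-probability as the implicit reward.",
    "rejected": "They are the same method.",
}

# Preference pairs are commonly stored one JSON object per line (JSON Lines).
with open("preference_pairs.jsonl", "w") as f:
    f.write(json.dumps(record) + "\n")
```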
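For the SimPO repository listed at the top, the paper's objective replaces DPO's reference-model log-ratio with the policy's length-normalized log-probability and a target reward margin. The following is a minimal sketch of such a loss, assuming summed per-token log-probabilities are already available; the hyperparameter values are placeholders, not the repository's defaults.

```python
# A minimal sketch of a SimPO-style loss, not the repository's implementation.
# SimPO scores each response with the policy's length-normalized log-probability
# (no reference model) and applies a Bradley-Terry logistic loss with a margin.
import torch
import torch.nn.functional as F

def simpo_loss(chosen_logps: torch.Tensor,
               rejected_logps: torch.Tensor,
               chosen_lens: torch.Tensor,
               rejected_lens: torch.Tensor,
               beta: float = 2.0,    # reward scale (placeholder value)
               gamma: float = 0.5):  # target reward margin (placeholder value)
    """chosen_logps / rejected_logps: summed per-token log-probs of each response
    under the current policy; chosen_lens / rejected_lens: response lengths in tokens."""
    # Reference-free implicit rewards: average log-probability scaled by beta.
    r_chosen = beta * chosen_logps / chosen_lens
    r_rejected = beta * rejected_logps / rejected_lens
    # Penalize pairs whose reward gap falls short of the margin gamma.
    return -F.logsigmoid(r_chosen - r_rejected - gamma).mean()
```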