corrects loss function for Self-play Preference Optimization hard label version #1615

angelahzyuan · 2024-05-03T03:30:12Z

Corrects the implementation mentioned in #1612 (arXiv: https://arxiv.org/abs/2405.00675). This updates the loss function according to Equation (4.8) with $P(y_w > y_l) = 1$ and $P(y_l > y_w) = 0$, and the docs now describe it as the hard label version of the algorithm.

It should now work well for the first iteration. The reported results for 3 iterations were based on the soft label version.
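For illustration, the hard label loss described above can be sketched in plain Python. This is a minimal sketch, not the code merged in this PR: the function name `sppo_hard_loss`, the scalar log-probability inputs, and the step-size parameter `eta` are all assumptions, and the squared-error form follows the SPPO objective as understood from the paper, with the winner's preference probability fixed to 1 and the loser's to 0.

```python
def sppo_hard_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    eta: float = 1.0,  # hypothetical step size; not a name taken from the PR
) -> float:
    """Per-pair SPPO loss with hard labels (illustrative sketch only)."""
    # Log-ratio of the policy to the reference for each response.
    chosen_log_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_log_ratio = policy_rejected_logp - ref_rejected_logp

    # Hard labels: P(y_w > y_l) = 1 and P(y_l > y_w) = 0, so the regression
    # targets eta * (P - 1/2) become +eta/2 for the winner and -eta/2 for the loser.
    chosen_target = eta * (1.0 - 0.5)
    rejected_target = eta * (0.0 - 0.5)

    return (chosen_log_ratio - chosen_target) ** 2 + (
        rejected_log_ratio - rejected_target
    ) ** 2
```

Under these assumptions the loss is zero exactly when the winner's log-ratio equals `eta / 2` and the loser's equals `-eta / 2`. In the soft label version the two targets would instead come from estimated preference probabilities.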

angelahzyuan · 2024-05-03T03:34:10Z

@winglian

kashif · 2024-05-03T05:36:36Z

@angelahzyuan thank you for the fix! Could you run `pre-commit run --all-files` in the root of trl to fix the formatting?

kashif · 2024-05-03T05:38:55Z

@angelahzyuan perhaps move the comment inside the `elif` branch of the loss; it is currently outside, which could cause confusion

HuggingFaceDocBuilderDev · 2024-05-03T05:40:34Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

angelahzyuan · 2024-05-03T05:44:52Z

@kashif done

trl/trainer/dpo_trainer.py

corrects sppo hard lable version

4f3ac62

angelahzyuan changed the title ~~corrects sppo hard lable version~~ corrects loss function for Self-play Preference Optimization hard lable version May 3, 2024

angelahzyuan changed the title ~~corrects loss function for Self-play Preference Optimization hard lable version~~ corrects loss function for Self-play Preference Optimization hard label version May 3, 2024

kashif approved these changes May 3, 2024


formatting

89b036c

kashif reviewed May 3, 2024


trl/trainer/dpo_trainer.py (outdated, resolved)

formatting

61b2d25

kashif merged commit 75de236 into huggingface:main May 3, 2024
9 checks passed
