[core] refactor step method #76
Conversation
- add safety checker + manual device assignment
Generally looks good to me, just one comment! 🚀
trl/trainer/ppo_trainer.py (Outdated)
@@ -252,6 +255,17 @@ def _step_safety_checker(
                 f"Batch size ({batch_size}) does not match number of examples - but got {len(tensor_list)} for: {name}"
             )

+        # set scores on the correct device
+        if name == "scores":
+            scores = [score.to(self.accelerator.device) for score in scores]
I think we can do the same for queries and values as well. Although they should already be on the right device, you don't know what the user might do before passing them to step, and then the behaviour is consistent for all inputs.
Do you know what happens when the tensor is already on the device? Will it copy it again or do nothing?
Makes sense. In PyTorch, .to() should do nothing if the tensor is already on the same device, so this should be cost-free.
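For reference, a minimal sketch of the behaviour discussed above: in PyTorch, `Tensor.to(device)` returns the tensor itself (no copy) when the dtype and device already match, so the device assignment in the safety checker is a no-op for tensors that are already placed correctly.

```python
import torch

t = torch.tensor([1.0, 2.0])  # lives on CPU by default

# .to() with the same device returns the very same tensor object, no copy
same = t.to(torch.device("cpu"))
print(same is t)  # True

# Only a genuinely different target device triggers a copy
if torch.cuda.is_available():
    moved = t.to("cuda")
    print(moved is t)  # False: a new tensor on the GPU
```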
Should now be handled in b32bfbf
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
LGTM!
This PR adds a new safety checker inside the step method to make sure the rewards are set on the correct device. Regarding the queries and the responses, users should retrieve the dataloader from the trainer and use that instead, since device assignment is performed directly at the dataloader level. Because the reward is not part of the dataloader, its device assignment needs to be performed manually.

This PR also adds a new check inside the safety checker. Before this PR, if a user passed a reward tensor with a dimension different from 0 (e.g. torch.tensor([1.0])), it would break the training loop. We now force the reward tensor into the desired shape, and throw a ValueError if the dimension of the reward is > 1.

cc @lewtun @lvwerra @edbeeching
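The shape check described above can be sketched as follows. This is a hypothetical illustration, not the exact trl code: the function name `check_reward_shape` is made up, but the behaviour matches the description (a 1-element tensor such as torch.tensor([1.0]) is squeezed down to a scalar, and anything with more than one dimension raises a ValueError).

```python
import torch

def check_reward_shape(score: torch.Tensor) -> torch.Tensor:
    # Hypothetical sketch of the check this PR describes, not the trl source.
    # Rewards with more than one dimension are rejected outright.
    if score.dim() > 1:
        raise ValueError(
            f"Scores must be scalar tensors - got a tensor with {score.dim()} dims"
        )
    # A 1-d, single-element reward like torch.tensor([1.0]) is squeezed
    # to a 0-d scalar so it no longer breaks the training loop.
    if score.dim() == 1:
        score = score.squeeze()
    return score

print(check_reward_shape(torch.tensor([1.0])).dim())  # 0
```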