Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] refactor step method #76

Merged
merged 9 commits into from
Jan 5, 2023

Conversation

younesbelkada
Copy link
Contributor

This PR adds a new safety checker inside step method to make sure the rewards are set on the correct device. Regarding the queries and the responses users should retrieve the dataloader from the trainer and use that instead as the device assignment is performed directly at the dataloader level. Since the reward is not part of the dataloader the device assignment needs to be performed manually.

This PR also adds inside the safety checker a new check. Before this PR if a user pass a reward tensor with a dimension different from 0 (e.g. torch.tensor([1.0])) it would break the training loop. Therefore now we force the reward tensor to be with the desired shape. We now also throw a value error if the dimension of the reward is > 1.

cc @lewtun @lvwerra @edbeeching

- add safety checker + manual device assignment
@younesbelkada younesbelkada mentioned this pull request Jan 5, 2023
26 tasks
@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Jan 5, 2023

The documentation is not available anymore as the PR was closed or merged.

Copy link
Member

@lvwerra lvwerra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good to me, just one comment! 🚀

@@ -252,6 +255,17 @@ def _step_safety_checker(
f"Batch size ({batch_size}) does not match number of examples - but got {len(tensor_list)} for: {name}"
)

# set scores on the correct device
if name == "scores":
scores = [score.to(self.accelerator.device) for score in scores]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can do the same for queries and values as well. although they should be you don't know what the user might do before passing them to step and then the behaviour is consistent for all inputs.

Do you know what happens when the tensor is already on device? Will it copy it again or do nothing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, in PT it should do nothing if they are on the same device so this should be cost-free

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be now handled in b32bfbf

trl/trainer/ppo_trainer.py Outdated Show resolved Hide resolved
younesbelkada and others added 3 commits January 5, 2023 15:06
Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
@younesbelkada younesbelkada requested a review from lvwerra January 5, 2023 14:20
Copy link
Member

@lvwerra lvwerra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@younesbelkada younesbelkada merged commit d6fe301 into huggingface:main Jan 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants