Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

accelerate integration #58

Merged
merged 48 commits into from
Dec 30, 2022

Conversation

younesbelkada
Copy link
Contributor

@younesbelkada younesbelkada commented Dec 27, 2022

What does this PR do?

This PR integrates trl with accelerate to make it compatible with the tools provided by the library to be able to train models using PPOTrainer. This would enable users to train their models in mixed precision, using Data Parallelism etc in a very simple manner.
Users should design their own training script and run them using accelerate launch xxx.py based on the example scripts provided in examples/scripits.

This PR also integrates Data Parallelism paradigm, enabling users to benefit from multi-GPU training if they want to speedup training.

TODOs

  • add example script
  • Improve API
  • document code
  • document example (check accelerate examples)
  • Input safety checkers (very important)

DeepSpeed tests (check where it works)

  • zero-0
  • zero-1
  • zero-2
  • zero-3

#### Run PPO step
t = time.time()
stats = ppo_trainer.step(query_tensors, response_tensors, rewards)
ppo_trainer.log_stats(stats, timing, batch, rewards, t0, t, logs)
Copy link
Contributor Author

@younesbelkada younesbelkada Dec 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To improve, we probably want a better way to log the stats

trl/trainer/accelerate_ppo.py Outdated Show resolved Hide resolved
Comment on lines 272 to 275
if isinstance(v, torch.Tensor) and k != 'objective/kl':
# tensor_list = [torch.zeros_like(v) for _ in range(self.accelerator.num_processes)]
dist.all_reduce(v, dist.ReduceOp.SUM)
v /= self.accelerator.num_processes
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me in a DP setup, each GPU will need to have its own replica of objective/kl since this is used to update the kl_ctl object above. That is why I prefered to not include it in the all_reduce operation but I just wanted to confirm

Copy link
Member

@lvwerra lvwerra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @younesbelkada. My main comments are about DP. I think if we don't wrap the step inputs (queries/responses) in a dataloader we don't achieve proper DP. But maybe I am wrong?

wandb/run-20221213_140454-xsw0kjtv/files/requirements.txt Outdated Show resolved Hide resolved
trl/trainer/accelerate_ppo.py Outdated Show resolved Hide resolved
trl/trainer/accelerate_ppo.py Outdated Show resolved Hide resolved
model (torch.model): Hugging Face transformer GPT2 model with value head
ref_model (torch.model): Hugging Face transformer GPT2 refrence model used for KL penalty
tokenizer (tokenizer): Hugging Face tokenizer
ppo_params (dict or None): PPO parameters for training. Can include following keys:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should replace **config (=ppo_params) with explicit kwargs or setup TrainingArguments like in transformers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be a follow up PR btw

trl/trainer/accelerate_ppo.py Outdated Show resolved Hide resolved
trl/trainer/accelerate_ppo.py Outdated Show resolved Hide resolved
trl/trainer/accelerate_ppo.py Outdated Show resolved Hide resolved
trl/trainer/accelerate_ppo.py Outdated Show resolved Hide resolved
trl/trainer/accelerate_ppo.py Outdated Show resolved Hide resolved
younesbelkada and others added 3 commits December 29, 2022 09:58
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
- added `PPOConfig` class
- docstring on `LengthSampler`
- fix test
- gather rewards when logging
- unwrap model when calling generate
@younesbelkada
Copy link
Contributor Author

younesbelkada commented Dec 29, 2022

wandb run (multi-GPU) after the latest commit: https://wandb.ai/distill-bloom/trl/runs/1mps4h09?workspace=user-younesbelkada

Copy link
Member

@lvwerra lvwerra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are pretty close - a few open questions and minor changs :)

trl/trainer/ppo_config.py Outdated Show resolved Hide resolved
trl/trainer/ppo_config.py Outdated Show resolved Hide resolved
trl/trainer/ppo_trainer.py Outdated Show resolved Hide resolved
trl/trainer/ppo_trainer.py Outdated Show resolved Hide resolved
stats (dict[str, Any]):
a dictionary of stats with the tensors gathered.
"""
import torch.distributed as dist
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you think?

trl/trainer/utils.py Outdated Show resolved Hide resolved
# In a distributed setup, only logging needs to be performed on the main process
# check: https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html
# or: https://discuss.pytorch.org/t/use-distributed-data-parallel-correctly/82500/11
self.is_distributed = self.accelerator.distributed_type == "MULTI_GPU"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can use accelerates gather method we can probably get rid of this?

examples/scripts/04-ppo-sentiment-accelerate.py Outdated Show resolved Hide resolved

#### Compute sentiment score
t = time.time()
texts = [q + r for q,r in zip(batch['query'], batch['response'])]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with the remove columns method inside the trainer the query shouldn't be there anymore? since we don't pass the data through the model internally, we don't need to remove the columns?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

examples/scripts/04-ppo-sentiment-accelerate.py Outdated Show resolved Hide resolved
@younesbelkada younesbelkada marked this pull request as ready for review December 29, 2022 17:27
@younesbelkada
Copy link
Contributor Author

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Dec 29, 2022

The documentation is not available anymore as the PR was closed or merged.

@younesbelkada younesbelkada mentioned this pull request Dec 29, 2022
26 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants