Refactor and benchmark #662
Conversation
The documentation is not available anymore as the PR was closed or merged.
Benchmark and documentation are ready at https://github.com/vwxyzjn/trl/blob/refactor-benchmark/benchmark/README.md. We can probably tune some of these models for better performance in follow-up PRs, including testing the deepspeed integration cc @lewtun.
Hi @vwxyzjn, this looks really good to me. The openrlbenchmark code itself is a bit obscure to me, so it would be great if we could document well how it works. I left some comments here and there, but in general I am happy to add it. Since this is not strictly part of the library and more part of our test suite, we can also be a bit more experimental here :)

Regarding tyro: this looks good to me in general. I would also love to hear @younesbelkada's feedback, since he may also know about the design decisions in transformers regarding the CLI.
Thanks @lvwerra! I have addressed the comments :)
Thanks a lot for this great work, as discussed offline :D!
Feel free to merge once the CI is green.
* refactor and benchmark
* update code
* Add accelerate logging
* logs
* quick fix
* update config
* precommit
* modify training example
* fix multi-gpu all_reduce error `Tensors must be CUDA and dense`
* support more models and benchmark
* update
* add changes
* upload benchmark
* precommit
* add tyro as a dependency
* add tyro
* pre-commit
* precommit
* weird...
* lol typo
* precommit
* sigh
* push changes
* Update benchmark/README.md
* Add experiments
* upload image to tag specific folder
* add openrlbenchmark documentation
* rename
* remove unused field
* precommit
* push changes

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
This PR does a few refactors. Some raw thoughts:

* `query_dataset="imdb"` and `reward_model="sentiment-analysis:lvwerra/distilbert-imdb"`: it could work with a pipeline or a trained reward model
* `gpt2` in place of `lvwerra/gpt2-imdb`
* We have multiple benchmark axes: if we tune a small source model `M_s` (e.g., gpt2), how does that affect the performance of a target model `M_t` (e.g., falcon, gptj, llama2)? We can probably have a `train.py` that can do this.
* Uses `tyro` to eliminate duplicate code / comments. Help text also works.
* More controlled terminology and tracking config
* Add accelerate logging
* Log `global_backward_batch_size`, `global_batch_size`, and `world_size`
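For the logged quantities above, the following sketch shows one plausible way they relate. The relations here are assumptions in the style of a typical distributed PPO setup (a mini batch is stepped `gradient_accumulation_steps` times per backward pass, and global quantities scale with the number of processes), not necessarily trl's exact definitions.

```python
def batch_size_metrics(batch_size: int, mini_batch_size: int,
                       gradient_accumulation_steps: int,
                       world_size: int) -> dict:
    """Derive the logged batch-size quantities (assumed relations)."""
    # Samples contributing to one optimizer step on a single process.
    backward_batch_size = mini_batch_size * gradient_accumulation_steps
    return {
        # Scaled across all processes in the distributed job.
        "global_backward_batch_size": backward_batch_size * world_size,
        "global_batch_size": batch_size * world_size,
        "world_size": world_size,
    }


if __name__ == "__main__":
    print(batch_size_metrics(batch_size=128, mini_batch_size=16,
                             gradient_accumulation_steps=4, world_size=2))
```

Logging these explicitly makes multi-GPU runs comparable: two runs with the same per-device settings but different `world_size` have different effective batch sizes, which the global metrics surface directly.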