Horovod: adjust base LR used by schedulers to scale with the number of workers #2626
Conversation
Codecov Report
@@           Coverage Diff            @@
##           master   #2626    +/-   ##
========================================
  Coverage      91%     92%
========================================
  Files          72      74      +2
  Lines        6131    6321    +190
========================================
+ Hits         5599    5791    +192
+ Misses        532     530      -2
This pull request is now in conflict... :(
@williamFalcon @Borda The failing tests seem unrelated. Could you take a look at the PR and check that everything makes sense? As a follow-up, I want to add a param to …
Yes, no connection to this PR.
Hello. I was curious whether there have been any other PRs that add the ability for users to specify which learning-rate scaling strategy should be used with, e.g., DDP or Horovod. As a frequent user of DDP with Lightning, I would love to have my optimizer's learning rate automatically scaled according to the effective batch size across nodes and their GPUs (i.e., the total world size).
In #2574 it was observed that the learning rate used to initialize the LR schedulers conflicts with the way we scale the learning rate by the number of Horovod workers: because the schedulers are initialized before the scaling, their stored base learning rates override the scaled values. This PR fixes that so both the optimizers and the LR schedulers are scaled up correctly.
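The gist of the fix can be sketched as follows. This is a minimal illustration, not the actual Lightning implementation: the helper name `scale_lrs_for_horovod` and its list arguments are hypothetical, and it assumes `hvd.init()` has already been called, standard `torch.optim` optimizers, and schedulers that captured `base_lrs` at construction time.

```python
import horovod.torch as hvd


def scale_lrs_for_horovod(optimizers, lr_schedulers):
    """Sketch: scale optimizer LRs and scheduler base LRs by the Horovod world size."""
    world_size = hvd.size()

    # Scale the learning rate stored in every optimizer param group.
    for optimizer in optimizers:
        for param_group in optimizer.param_groups:
            param_group["lr"] *= world_size

    # Torch LR schedulers snapshot the optimizer's LR into `base_lrs` when they
    # are constructed. If a scheduler was created before the scaling above, that
    # snapshot still holds the unscaled LR and would override the scaling on the
    # next scheduler step, so it must be rescaled as well.
    for scheduler in lr_schedulers:
        scheduler.base_lrs = [lr * world_size for lr in scheduler.base_lrs]
```

The key point is the second loop: scaling only the optimizer's param groups is not enough, because schedulers recompute the LR from their own `base_lrs` on every step.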
Follow-ups to consider would be: