-
Notifications
You must be signed in to change notification settings - Fork 149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Removed cuda memory allocation which was causing CUDA OOM #103
Conversation
QQ: Is it actually guaranteed that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed offline, suggest seeding the randomness explicitly or creating an rng instance so as not to rely on randomness being the same across nodes/ranks. creating the rng instance might be better so as not to affect a user's other random seeding, not sure.
Yes, as of now, for deterministic behavior, user needs to set the seeds. If the user sets the seeds, then the behavior is deterministic on single and multi-node. |
Should we set a default shuffle seed? Help the user avoid surprising outcomes, like having a different shuffle on each node. (Fwiw, I don't think our examples in the examples repo explicitly set the shuffle seed) |
Can we just broadcast seed over gloo or something? Non-deterministic is a bug |
Is there a problem with setting a default shuffle seed? To avoid adding unnecessary distributed stuff. Iirc having a separate gloo process group on mcloud is non trivial. |
Good points... sorry am in Guanajuato with terrible reception... cannot really field code reviews atm... defer to what y'all think is best |
Would also prefer to avoid calling |
Yes, we can. As of today, we kept the Currently, I don't think so of any application where user needs a data randomness behavior in every run, and if they would, then they can set the The commit a8ef59c adds the default |
Agree. Adding a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Description of changes:
shuffle_seed
andprefix_int
across ranks.shuffle_seed
orprefix_int
would be same across all the ranks, hence, there is no need fordist.broadcast
operation.Issue #, if available:
CO-1591
Merge Checklist:
Put an
x
without space in the boxes that apply. If you are unsure about any checklist, please don't hesitate to ask. We are here to help! This is simply a reminder of what we are going to look for before merging your pull request.General
Tests
pre-commit
on my change. (check out thepre-commit
section of prerequisites)