Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ActorGroup] Add ActorGroup #18960

Merged
merged 18 commits into from
Oct 22, 2021
Merged

Conversation

amogkam
Copy link
Contributor

@amogkam amogkam commented Sep 29, 2021

Why are these changes needed?

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@amogkam amogkam marked this pull request as draft September 29, 2021 00:50
@richardliaw
Copy link
Contributor

One thing that stands out to me here is that GPU handling seems to be pretty specific to SGD. Any thoughts on whether some abstraction is needed?

@amogkam
Copy link
Contributor Author

amogkam commented Sep 29, 2021

@richardliaw we could make this into a general resources_per_worker instead of num_cpus_per_worker and num_gpus_per_worker

This reverts commit 532660f.
This reverts commit 54321f4.
@amogkam amogkam changed the title [WorkerGroup] Move SGD WorkerGroup to ray.util [WorkerGroup] Add WorkerGroup Sep 29, 2021
python/ray/util/worker_group.py Outdated Show resolved Hide resolved
python/ray/util/worker_group.py Outdated Show resolved Hide resolved
python/ray/util/worker_group.py Outdated Show resolved Hide resolved
Copy link
Contributor

@edoakes edoakes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @amogkam, see comments below about making this slightly more generic. Let's not let "perfect" get in the way of "good" for now though.

One thing that would be extremely useful and I would ask for you to do in the next follow-up is to make this completely unit testable by abstracting out the Ray API calls. This is something I did awhile back in Serve and it has been invaluable for developer productivity & stability. If you want some advice / help on this, I'm happy to help brainstorm.

python/ray/util/worker_group.py Outdated Show resolved Hide resolved
python/ray/util/worker_group.py Outdated Show resolved Hide resolved
python/ray/util/worker_group.py Outdated Show resolved Hide resolved
python/ray/util/worker_group.py Outdated Show resolved Hide resolved
python/ray/util/worker_group.py Outdated Show resolved Hide resolved
python/ray/util/worker_group.py Outdated Show resolved Hide resolved
python/ray/util/worker_group.py Outdated Show resolved Hide resolved
python/ray/util/worker_group.py Outdated Show resolved Hide resolved
python/ray/util/worker_group.py Outdated Show resolved Hide resolved
python/ray/util/worker_group.py Outdated Show resolved Hide resolved
python/ray/util/worker_group.py Outdated Show resolved Hide resolved
python/ray/util/worker_group.py Outdated Show resolved Hide resolved
python/ray/util/worker_group.py Outdated Show resolved Hide resolved
python/ray/util/worker_group.py Outdated Show resolved Hide resolved
python/ray/util/worker_group.py Outdated Show resolved Hide resolved
python/ray/util/worker_group.py Outdated Show resolved Hide resolved
python/ray/util/worker_group.py Outdated Show resolved Hide resolved
python/ray/util/worker_group.py Outdated Show resolved Hide resolved
python/ray/util/worker_group.py Outdated Show resolved Hide resolved
python/ray/util/worker_group.py Outdated Show resolved Hide resolved
@edoakes
Copy link
Contributor

edoakes commented Oct 6, 2021

@amogkam decisions/changes in the comments SGTM, re-request once they're made?

@edoakes edoakes added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Oct 6, 2021
@amogkam amogkam removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Oct 21, 2021
@amogkam amogkam marked this pull request as ready for review October 21, 2021 03:39
@amogkam amogkam changed the title [WorkerGroup] Add WorkerGroup [ActorGroup] Add ActorGroup Oct 21, 2021
Copy link
Contributor

@edoakes edoakes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a good start :)

@amogkam
Copy link
Contributor Author

amogkam commented Oct 22, 2021

Failing test is unrelated, going to merge

@amogkam amogkam merged commit 4f05bac into ray-project:master Oct 22, 2021
@amogkam amogkam deleted the worker-group-experimental branch October 22, 2021 17:22
edoakes added a commit to edoakes/ray that referenced this pull request Oct 22, 2021
amogkam pushed a commit that referenced this pull request Oct 22, 2021
@amogkam amogkam mentioned this pull request Oct 22, 2021
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants