[core] Setup cgroup v2 in C++ #49416

Open
wants to merge 17 commits into
base: master


@dentiny dentiny commented Dec 23, 2024

A rewrite of #48788

Signed-off-by: dentiny <dentinyhao@gmail.com>
@dentiny dentiny added the "go" label (add ONLY when ready to merge, run all tests) Dec 23, 2024
@dentiny dentiny requested review from jjyao and rynewang December 23, 2024 22:52
src/ray/common/cgroup/cgroup_utils.cc (outdated, resolved)
src/ray/common/cgroup/cgroup_context.h (outdated, resolved)
src/ray/common/cgroup/cgroup_utils.cc (outdated, resolved)
// There're two types of memory cgroup constraints:
// 1. For those with limit capped, they will be created a dedicated cgroup;
// 2. For those without limit specified, they will be added to the default cgroup.
inline constexpr std::string_view kDefaultCgroupUuid = "default_cgroup_uuid";
Contributor

Is default_cgroup_uuid the name of the default cgroup? The word "uuid" sounds strange to me, since this is a name, not a UUID.

Contributor Author

I chose the word uuid here because the context contains a data field called uuid.

src/ray/common/cgroup/cgroup_utils.h (outdated, resolved)
src/ray/common/cgroup/cgroup_utils.cc (outdated, resolved)
src/ray/common/cgroup/cgroup_context.h (outdated, resolved)
src/ray/common/cgroup/cgroup_utils.cc (outdated, resolved)
@dentiny dentiny force-pushed the hjiang/cgroup-utils-cpp branch from a615cab to feb0282 Compare December 24, 2024 04:51
@dentiny dentiny requested a review from rynewang December 24, 2024 05:32
@rynewang rynewang left a comment

We need more design work on the API. To that end, please describe how you would like to use this API, e.g. in WorkerPool or NodeManager. Write some pseudocode covering the worker / task lifetime to highlight how the cgroup API is used.

src/ray/common/cgroup/cgroup_utils.h (resolved)
namespace ray {

// Context used to setup cgroupv2 for a task / actor.
struct PhysicalModeExecutionContext {
Contributor

Q: Do we need a config class separate from CgroupV2Setup?

Contributor Author

These data fields are necessary to construct and destruct the cgroup.
As of now the struct doesn't seem strictly necessary, since it only contains 4 fields and we could pass them directly into the factory function. But we could end up with many more fields (e.g. cpu-related, resource min / high), so it's better to have a struct.
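
A minimal sketch of the struct-plus-factory shape discussed here, under stated assumptions: only the `uuid` field and the "capped limit gets a dedicated cgroup, otherwise the default cgroup" rule come from this PR. The type names `ExampleCgroupContext` and `ExampleCgroupSetup`, the other three fields, and the string `default_cgroup_id` are hypothetical and are not the PR's actual definitions.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

// Hypothetical default cgroup id; the real value in the PR may differ.
constexpr const char *kExampleDefaultCgroupId = "default_cgroup_id";

// Hypothetical context. Only `uuid` is named in this thread; the other
// fields are illustrative guesses at what "4 fields" could look like.
struct ExampleCgroupContext {
  std::string cgroup_directory;  // Root of the cgroup v2 hierarchy.
  std::string uuid;              // Identifies the dedicated cgroup.
  int pid = 0;                   // Process to place into the cgroup.
  std::uint64_t max_memory = 0;  // Memory cap in bytes; 0 means "no limit".
};

class ExampleCgroupSetup {
 public:
  // Factory function: the constructor is private, so every instance is
  // created here from a single context argument.
  static std::unique_ptr<ExampleCgroupSetup> New(ExampleCgroupContext ctx) {
    return std::unique_ptr<ExampleCgroupSetup>(
        new ExampleCgroupSetup(std::move(ctx)));
  }

  // A task with a memory cap gets a dedicated cgroup named after its uuid;
  // a task without one shares the default cgroup.
  std::string CgroupName() const {
    return ctx_.max_memory > 0 ? ctx_.uuid
                               : std::string(kExampleDefaultCgroupId);
  }

 private:
  explicit ExampleCgroupSetup(ExampleCgroupContext ctx)
      : ctx_(std::move(ctx)) {}

  ExampleCgroupContext ctx_;
};
```

With this shape, adding a cpu-related field later changes the struct but not the factory signature, which is the argument made above for keeping the struct.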

src/ray/common/cgroup/cgroup_utils.cc (outdated, resolved)
// 1. For those with limit capped, they will be created a dedicated cgroup;
// 2. For those without limit specified, they will be added to the default cgroup.
static constexpr std::string_view kDefaultCgroupV2Uuid = "default_cgroup_uuid";

Contributor

It's not a UUID, and we should not use a name like this that is visible in Linux. We had the idea of getting a default name from the cluster ID, right?

Contributor Author

Renamed to id.

Contributor Author

We had the idea of getting a default name from the cluster ID, right?

I'm not sure how the cluster ID is related here.

return std::unique_ptr<CgroupV2Setup>(new CgroupV2Setup(std::move(ctx)));
}

CgroupV2Setup::~CgroupV2Setup() {
Contributor

Before deleting the cgroup v2, we need to first move the procs out of it, or the deletion will fail. For that, you need to somehow record the previous cgroup, if any, or we can blindly move all those procs to the default cgroup. This can come from a "global cgroup manager" in raylet:

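#include <sys/types.h>

#include <string>

class CgroupV2Manager {
 public:
  explicit CgroupV2Manager(std::string default_cgroup_name);

  // Moves `pid` into the default cgroup, removing it from any other cgroup.
  bool PutPidIntoDefaultCgroupRemovingAnyCgroupsIfAny(pid_t pid);
  // Creates the cgroup `cgroup_name` and places `pid` into it.
  bool CreateCgroupV2ForPid(pid_t pid, const std::string &cgroup_name);
};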

Contributor Author

Before deleting the cgroup v2, we need to first move the procs out of it, or the deletion will fail.

Yes, fixed.

Contributor Author

Following our offline discussion, I added a few items:

  • I added a README for cgroup
  • I added comments in the local task manager on how I plan to integrate the cgroup RAII class with it
  • I added comments on how I plan to prepare the basic cgroup setup in raylet

@rynewang rynewang left a comment

@dentiny let's have a meeting to discuss the design of this PR.

src/ray/common/cgroup/cgroup_utils.cc (outdated, resolved)
src/ray/common/cgroup/cgroup_utils.h (resolved)

dentiny commented Dec 27, 2024

Offline discussion: we need to document the preconditions:

  • the cgroup hierarchy under the cgroup folder
  • the default cgroup and its permissions already exist


dentiny commented Dec 27, 2024

A possible map in which to keep the cgroup:

// Keeps track of where currently executing tasks are being run.
absl::flat_hash_map<TaskID, rpc::Address> executing_tasks_ ABSL_GUARDED_BY(mu_);


dentiny commented Dec 27, 2024

Possible place to put cgroup setup class:

void LocalTaskManager::RemoveFromRunningTasksIfExists(const RayTask &task) {
  auto sched_cls = task.GetTaskSpecification().GetSchedulingClass();
  auto it = info_by_sched_cls_.find(sched_cls);
  if (it != info_by_sched_cls_.end()) {
    it->second.running_tasks.erase(task.GetTaskSpecification().TaskId());
    if (it->second.running_tasks.size() == 0) {
      info_by_sched_cls_.erase(it);
    }
  }
}


dentiny commented Dec 27, 2024

Discussed offline with @rynewang; we would like two other items to be included for easier review and roadmap:

@dentiny dentiny requested a review from rynewang December 27, 2024 10:27
@@ -387,6 +387,16 @@ void LocalTaskManager::DispatchScheduledTasksToWorkers() {
const std::shared_ptr<WorkerInterface> worker,
PopWorkerStatus status,
const std::string &runtime_env_setup_error_message) -> bool {
// TODO(hjiang): After getting the ready-to-use worker and task id, we're
Contributor

Wait a minute... running_tasks is inserted with the task ID before we call PopWorker. This means that if you change running_tasks to a map { task ID -> cgroup raii }, the value must be nullptr at first, and is not set until the PopWorker callback is called. I'm OK with this, but @jjyao, do you feel that LocalTaskManager's running_tasks is the best place to put the cgroup raii (which removes the proc from the cgroup and deletes the cgroup on dtor)? Is there an even more precise data structure somewhere in raylet to model a running task attempt?

Contributor Author

the value must be nullptr at first, and is not set until the PopWorker callback is called

Yeah, that's my plan.

Contributor Author

I think a nullptr is needed anyway; for example, we need to handle cases where the cgroup prerequisites are not met.
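
The "nullptr first, set in the callback" lifecycle discussed in this thread can be sketched as below. This is an illustration only: `ExampleCgroupGuard` and `ExampleRunningTasks` are hypothetical stand-ins, task IDs are plain strings, and `std::unordered_map` replaces the raylet's actual containers so that the example is self-contained.

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Stand-in for the cgroup RAII class. Its destructor is where the process
// would be moved out of the cgroup and the cgroup deleted; here it only
// bumps a counter so the lifetime is observable.
struct ExampleCgroupGuard {
  explicit ExampleCgroupGuard(int *released) : released_(released) {}
  ~ExampleCgroupGuard() { ++*released_; }
  int *released_;
};

class ExampleRunningTasks {
 public:
  // Called before a worker is available: the entry starts with a null guard.
  void MarkRunning(const std::string &task_id) { tasks_.try_emplace(task_id); }

  // Called from the worker-ready callback. Returns false if the task is no
  // longer tracked, in which case the guard is destroyed immediately.
  bool AttachCgroup(const std::string &task_id,
                    std::unique_ptr<ExampleCgroupGuard> guard) {
    auto it = tasks_.find(task_id);
    if (it == tasks_.end()) return false;
    it->second = std::move(guard);
    return true;
  }

  // Erasing the entry destroys the guard, if one was ever attached. A null
  // entry (e.g. cgroup prerequisites not met) is erased with no side effect.
  void Remove(const std::string &task_id) { tasks_.erase(task_id); }

  bool HasCgroup(const std::string &task_id) const {
    auto it = tasks_.find(task_id);
    return it != tasks_.end() && it->second != nullptr;
  }

 private:
  std::unordered_map<std::string, std::unique_ptr<ExampleCgroupGuard>> tasks_;
};
```

The point of the design is that task removal and cgroup cleanup share one code path, and a task that never got a cgroup needs no special handling.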

Contributor Author

@jjyao Do you have any thoughts?

@dentiny dentiny requested a review from rynewang January 3, 2025 21:52

dentiny commented Jan 3, 2025

@rynewang / @jjyao Do you have any follow-up comments on this PR?

Labels
go add ONLY when ready to merge, run all tests

3 participants