[core] Setup cgroup v2 in C++ #49416
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
// Copyright 2024 The Ray Authors. | ||
// | ||
// Licensed under the Apache License, Version 2.0 (the "License"); | ||
// you may not use this file except in compliance with the License. | ||
// You may obtain a copy of the License at | ||
// | ||
// http://www.apache.org/licenses/LICENSE-2.0 | ||
// | ||
// Unless required by applicable law or agreed to in writing, software | ||
// distributed under the License is distributed on an "AS IS" BASIS, | ||
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
// See the License for the specific language governing permissions and | ||
// limitations under the License. | ||
|
||
#pragma once | ||
|
||
#include <unistd.h> | ||
|
||
#include <cstdint> | ||
#include <string> | ||
|
||
namespace ray { | ||
|
||
// Context used to setup cgroupv2 for a task / actor. | ||
struct PhysicalModeExecutionContext { | ||
// Directory for cgroup, which is applied to application process. | ||
// | ||
// TODO(hjiang): Revisit if we could save some CPU/mem with string view. | ||
std::string cgroup_directory; | ||
// A unique id which identifies a certain task / actor attempt. | ||
std::string id; | ||
// PID for the process. | ||
pid_t pid; | ||
|
||
// Memory-related spec. | ||
// | ||
// Unit: bytes. Corresponds to cgroup v2 `memory.max`, which enforces a hard cap on | ||
// memory consumption. 0 means no limit. | ||
uint64_t max_memory = 0; | ||
}; | ||
|
||
} // namespace ray |
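The struct above uses `0` to mean "no limit", while the cgroup v2 `memory.max` interface file expresses "no limit" with the literal string `max`. In the diff shown here a zero limit routes the process to the default cgroup instead of writing `memory.max`, so no conversion is needed. If a conversion were ever wanted, it might look like the sketch below; `FormatMemoryMax` is a hypothetical helper and not part of this PR.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not part of the PR): renders a byte limit in a form the
// cgroup v2 `memory.max` file accepts. The kernel interface uses the literal
// string "max" for "no limit"; the context struct uses 0 for the same meaning.
std::string FormatMemoryMax(uint64_t max_memory_bytes) {
  if (max_memory_bytes == 0) {
    return "max";
  }
  return std::to_string(max_memory_bytes);
}
```

Keeping the translation in one small function would avoid writing a literal `0` into `memory.max` by accident.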
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,161 @@ | ||
// Copyright 2024 The Ray Authors. | ||
// | ||
// Licensed under the Apache License, Version 2.0 (the "License"); | ||
// you may not use this file except in compliance with the License. | ||
// You may obtain a copy of the License at | ||
// | ||
// http://www.apache.org/licenses/LICENSE-2.0 | ||
// | ||
// Unless required by applicable law or agreed to in writing, software | ||
// distributed under the License is distributed on an "AS IS" BASIS, | ||
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
// See the License for the specific language governing permissions and | ||
// limitations under the License. | ||
|
||
#include "ray/common/cgroup/cgroup_utils.h" | ||
|
||
#ifndef __linux__ | ||
namespace ray { | ||
bool CgroupV2Setup::SetupCgroupV2ForContext(const PhysicalModeExecutionContext &ctx) { | ||
return false; | ||
} | ||
/*static*/ bool CgroupV2Setup::CleanupCgroupV2ForContext( | ||
const PhysicalModeExecutionContext &ctx) { | ||
return false; | ||
} | ||
} // namespace ray | ||
#else // __linux__ | ||
|
||
#include <sys/stat.h> | ||
|
||
#include <fstream> | ||
|
||
#include "absl/strings/str_format.h" | ||
#include "absl/strings/str_join.h" | ||
#include "absl/strings/str_split.h" | ||
#include "ray/util/logging.h" | ||
|
||
namespace ray { | ||
|
||
namespace { | ||
|
||
// Owner can read and write. | ||
constexpr int kCgroupV2FilePerm = 0600; | ||
|
||
// There are two types of memory cgroup constraints: | ||
// 1. A process with a memory limit gets a dedicated cgroup; | ||
// 2. A process without a limit is added to the default cgroup. | ||
static constexpr std::string_view kDefaultCgroupV2Id = "default_cgroup_id"; | ||
|
||
Comment: It's not a UUID, and we should not use a name like this that is visible in Linux. We had the idea of deriving a default name from the cluster ID, right?
Reply: Renamed as
Reply: I'm not sure how the cluster ID is related here?
||
// Open the file at [path] and append [content] to it. | ||
void OpenCgroupV2FileAndAppend(std::string_view path, std::string_view content) { | ||
// `std::string_view::data()` is not guaranteed to be null-terminated, so copy first. | ||
std::ofstream out_file{std::string{path}, std::ios::out | std::ios::app}; | ||
out_file << content; | ||
} | ||
|
||
bool CreateNewCgroupV2(const PhysicalModeExecutionContext &ctx) { | ||
// Sanity check. | ||
RAY_CHECK(!ctx.id.empty()); | ||
Comment: [idea] Can we put these constraints into the ctor of PhysicalModeExecutionContext?
Reply: I considered that:
|
||
RAY_CHECK_NE(ctx.id, kDefaultCgroupV2Id); | ||
RAY_CHECK_GT(ctx.max_memory, 0); | ||
|
||
const std::string cgroup_folder = | ||
absl::StrFormat("%s/%s", ctx.cgroup_directory, ctx.id); | ||
int ret_code = mkdir(cgroup_folder.data(), kCgroupV2FilePerm); | ||
if (ret_code != 0) { | ||
return false; | ||
} | ||
|
||
const std::string procs_path = absl::StrFormat("%s/cgroup.procs", cgroup_folder); | ||
OpenCgroupV2FileAndAppend(procs_path, absl::StrFormat("%d", ctx.pid)); | ||
|
||
// Add max memory into cgroup. | ||
const std::string max_memory_path = absl::StrFormat("%s/memory.max", cgroup_folder); | ||
OpenCgroupV2FileAndAppend(max_memory_path, absl::StrFormat("%d", ctx.max_memory)); | ||
|
||
return true; | ||
} | ||
|
||
bool UpdateDefaultCgroupV2(const PhysicalModeExecutionContext &ctx) { | ||
// Sanity check. | ||
RAY_CHECK(!ctx.id.empty()); | ||
RAY_CHECK_EQ(ctx.id, kDefaultCgroupV2Id); | ||
RAY_CHECK_EQ(ctx.max_memory, 0); | ||
|
||
const std::string cgroup_folder = | ||
absl::StrFormat("%s/%s", ctx.cgroup_directory, ctx.id); | ||
int ret_code = mkdir(cgroup_folder.data(), kCgroupV2FilePerm); | ||
if (ret_code != 0) { | ||
return false; | ||
} | ||
|
||
const std::string procs_path = absl::StrFormat("%s/cgroup.procs", cgroup_folder); | ||
OpenCgroupV2FileAndAppend(procs_path, absl::StrFormat("%d", ctx.pid)); | ||
|
||
return true; | ||
} | ||
|
||
bool DeleteCgroupV2(const PhysicalModeExecutionContext &ctx) { | ||
// Sanity check. | ||
RAY_CHECK(!ctx.id.empty()); | ||
RAY_CHECK_NE(ctx.id, kDefaultCgroupV2Id); | ||
RAY_CHECK_GT(ctx.max_memory, 0); | ||
|
||
const std::string cgroup_folder = | ||
absl::StrFormat("%s/%s", ctx.cgroup_directory, ctx.id); | ||
return rmdir(cgroup_folder.data()) == 0; | ||
} | ||
|
||
void PlaceProcessIntoDefaultCgroup(const PhysicalModeExecutionContext &ctx) { | ||
const std::string procs_path = | ||
absl::StrFormat("%s/%s/cgroup.procs", ctx.cgroup_directory, kDefaultCgroupV2Id); | ||
{ | ||
std::ofstream out_file{procs_path.data(), std::ios::out}; | ||
out_file << ctx.pid; | ||
} | ||
|
||
return; | ||
} | ||
|
||
} // namespace | ||
|
||
/*static*/ std::unique_ptr<CgroupV2Setup> CgroupV2Setup::New( | ||
PhysicalModeExecutionContext ctx) { | ||
if (!CgroupV2Setup::SetupCgroupV2ForContext(ctx)) { | ||
Comment: Q: do we want to CHECK-fail in the ctor on "cgroup already exists"? Otherwise, if we have two objects managing the same cgroup, the first dtor deletes it, affecting the other. We don't expect anyone to create cgroups with our naming scheme, so a CHECK failure should be acceptable.
Reply: If the folder already exists, the cgroup setup function returns false, and nullptr is returned here. My idea is to fall back to the "not use cgroup" behavior; does that sound OK to you?
Reply: I special-cased EEXIST to treat it as an internal error in the latest commit.
||
return nullptr; | ||
} | ||
return std::unique_ptr<CgroupV2Setup>(new CgroupV2Setup(std::move(ctx))); | ||
} | ||
|
||
CgroupV2Setup::~CgroupV2Setup() { | ||
Comment: Before deleting the cgroup v2, we need to first move the process out of it, or the deletion will fail. For that you need to somehow record the previous cgroup, if any, or we can blindly move all those processes to the default cgroup. This can come from a "global cgroup mgr" in raylet:
Reply: Yes, fixed.
Reply: Discussed offline; I added a few items based on our offline discussion:
|
||
if (!CleanupCgroupV2ForContext(ctx_)) { | ||
RAY_LOG(ERROR) << "Failed to clean up cgroup for execution context with id " << ctx_.id; | ||
} | ||
} | ||
|
||
/*static*/ bool CgroupV2Setup::SetupCgroupV2ForContext( | ||
const PhysicalModeExecutionContext &ctx) { | ||
// Create a new cgroup if max memory specified. | ||
if (ctx.max_memory > 0) { | ||
return CreateNewCgroupV2(ctx); | ||
} | ||
|
||
// Update default cgroup if no max resource specified. | ||
return UpdateDefaultCgroupV2(ctx); | ||
} | ||
|
||
/*static*/ bool CgroupV2Setup::CleanupCgroupV2ForContext( | ||
const PhysicalModeExecutionContext &ctx) { | ||
// Delete the dedicated cgroup if max memory specified. | ||
if (ctx.max_memory > 0) { | ||
PlaceProcessIntoDefaultCgroup(ctx); | ||
return DeleteCgroupV2(ctx); | ||
} | ||
|
||
// If the pid is already in the default cgroup, no action is needed. | ||
return true; | ||
} | ||
|
||
} // namespace ray | ||
|
||
#endif // __linux__ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,61 @@ | ||
// Copyright 2024 The Ray Authors. | ||
// | ||
// Licensed under the Apache License, Version 2.0 (the "License"); | ||
// you may not use this file except in compliance with the License. | ||
// You may obtain a copy of the License at | ||
// | ||
// http://www.apache.org/licenses/LICENSE-2.0 | ||
// | ||
// Unless required by applicable law or agreed to in writing, software | ||
// distributed under the License is distributed on an "AS IS" BASIS, | ||
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
// See the License for the specific language governing permissions and | ||
// limitations under the License. | ||
|
||
// Util functions to setup cgroup. | ||
|
||
#pragma once | ||
|
||
#include <memory> | ||
#include <string_view> | ||
#include <utility> | ||
|
||
#include "ray/common/cgroup/cgroup_context.h" | ||
|
||
namespace ray { | ||
|
||
// A util class which sets up cgroup at construction, and cleans up at destruction. | ||
// On ctor, creates a cgroup v2 if necessary based on the context. Then puts `ctx.pid` | ||
// into this cgroup. | ||
// On dtor, puts `ctx.pid` into the default cgroup, and removes this cgroup v2 if any. | ||
// | ||
// Precondition: | ||
// 1. rw permission for cgroup has been validated. | ||
// 2. Cgroup folder (i.e. default application cgroup folder) has been properly setup. | ||
// See README under this folder for more details. | ||
class CgroupV2Setup { | ||
|
||
public: | ||
// A failed construction returns nullptr. | ||
static std::unique_ptr<CgroupV2Setup> New(PhysicalModeExecutionContext ctx); | ||
|
||
~CgroupV2Setup(); | ||
|
||
CgroupV2Setup(const CgroupV2Setup &) = delete; | ||
CgroupV2Setup &operator=(const CgroupV2Setup &) = delete; | ||
CgroupV2Setup(CgroupV2Setup &&) = delete; | ||
CgroupV2Setup &operator=(CgroupV2Setup &&) = delete; | ||
|
||
private: | ||
CgroupV2Setup(PhysicalModeExecutionContext ctx) : ctx_(std::move(ctx)) {} | ||
|
||
// Set up cgroup based on the given [ctx]. Return whether the setup succeeds or not. | ||
static bool SetupCgroupV2ForContext(const PhysicalModeExecutionContext &ctx); | ||
|
||
// Clean up cgroup based on the given [ctx]. Return whether the cleanup succeeds or not. | ||
static bool CleanupCgroupV2ForContext(const PhysicalModeExecutionContext &ctx); | ||
|
||
// Execution context for current cgroup v2 setup. | ||
PhysicalModeExecutionContext ctx_; | ||
}; | ||
|
||
} // namespace ray |
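The class comment above describes a factory that returns `nullptr` on failure and a destructor that cleans up. To illustrate that shape without needing cgroup privileges, here is a minimal stand-in that manages a plain directory. `ScopedDirectory` is a hypothetical class written for this sketch; it is not the PR's implementation and it does not move any process between cgroups.

```cpp
#include <sys/stat.h>
#include <unistd.h>

#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the RAII pattern; it owns a plain directory rather
// than a real cgroup, so it can run without special permissions.
class ScopedDirectory {
 public:
  // Returns nullptr if the directory cannot be created (e.g. it already exists).
  static std::unique_ptr<ScopedDirectory> New(std::string path) {
    if (mkdir(path.c_str(), 0700) != 0) {
      return nullptr;
    }
    return std::unique_ptr<ScopedDirectory>(new ScopedDirectory(std::move(path)));
  }

  // Removes the directory; a failure is silently ignored in this sketch.
  ~ScopedDirectory() { rmdir(path_.c_str()); }

  ScopedDirectory(const ScopedDirectory &) = delete;
  ScopedDirectory &operator=(const ScopedDirectory &) = delete;

  const std::string &path() const { return path_; }

 private:
  explicit ScopedDirectory(std::string path) : path_(std::move(path)) {}

  std::string path_;
};
```

Because a second `New` on the same path fails at `mkdir`, two live objects can never own the same directory, which is the property the "cgroup already exists" review question is concerned with.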
Comment: Q: do we need this separate config class from CgroupV2Setup?
Reply: These data fields are necessary to construct and destruct the cgroup. As of now the struct doesn't seem that necessary, since it only contains 4 fields and we could pass them directly into the factory function, but we could have many more fields (e.g. cpu-related, resource min / high), so it is better to have a struct.
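To make the "many more fields" argument concrete, the sketch below shows what a grown context struct might look like. Only `cgroup_directory`, `id`, `pid` and `max_memory` exist in the PR; `ExtendedExecutionContext`, the other fields and `HasDedicatedLimits` are hypothetical. The extra fields are named after the cgroup v2 interface files `memory.min`, `memory.high` and `cpu.weight`, and using `0` for "not set" is an assumption of this sketch.

```cpp
#include <sys/types.h>

#include <cstdint>
#include <string>

// Hypothetical extension of the context struct; not part of the PR.
// In this sketch, 0 means "not set" for every resource field.
struct ExtendedExecutionContext {
  std::string cgroup_directory;
  std::string id;
  pid_t pid = 0;

  uint64_t min_memory = 0;   // Would map to `memory.min`.
  uint64_t high_memory = 0;  // Would map to `memory.high`.
  uint64_t max_memory = 0;   // Maps to `memory.max`, as in the PR.
  uint32_t cpu_weight = 0;   // Would map to `cpu.weight`.
};

// Returns true when any per-process resource setting is requested, i.e. when a
// dedicated cgroup would be needed instead of the default one.
bool HasDedicatedLimits(const ExtendedExecutionContext &ctx) {
  return ctx.min_memory > 0 || ctx.high_memory > 0 || ctx.max_memory > 0 ||
         ctx.cpu_weight > 0;
}
```

With this many fields, passing a single struct to the factory keeps its signature stable as new resource knobs are added.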