HugePages feature #50859
for i := range pod.Spec.Containers {
	resourceSet := toContainerResourcesSet(&pod.Spec.Containers[i])
	for resourceStr := range resourceSet {
		if v1helper.IsHugePageResourceName(v1.ResourceName(resourceStr)) {
Do we want to validate the HugePage size? e.g. 2Mi and 1Gi are valid and others are not.
That means, what if users configure hugepages-3Mi?
@xiangpengzhao - that is a good question.
there are a variety of large page sizes depending on the architecture, so i was hoping to avoid an explicit enumeration of them or awareness in the api server. this is an alpha feature, but my long term expectation is that operators would configure the ResourceQuota system to make consumption of hugepages denied by default per something like #36765, and they would then grant explicit quota for the page sizes supported in their fleet of machines. WDYT?
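For context, the alpha validation only needs to recognize the `hugepages-` resource-name prefix and parse the suffix as a quantity; rejecting unsupported sizes is then left to quota (or to the node, which simply never advertises them). A minimal sketch of that check — the helper names here are illustrative, not necessarily the exact ones in the tree:

```go
package main

import (
	"fmt"
	"strings"

	"k8s.io/apimachinery/pkg/api/resource"
)

const hugePagesPrefix = "hugepages-"

// isHugePageResourceName reports whether a resource name names a hugepage
// resource, e.g. "hugepages-2Mi" or "hugepages-1Gi".
func isHugePageResourceName(name string) bool {
	return strings.HasPrefix(name, hugePagesPrefix)
}

// hugePageSizeFromResourceName parses the page size encoded in the resource
// name suffix; any valid quantity is accepted, so "hugepages-3Mi" parses too,
// and it is quota or the node's advertised capacity that rejects it.
func hugePageSizeFromResourceName(name string) (resource.Quantity, error) {
	pageSize := strings.TrimPrefix(name, hugePagesPrefix)
	return resource.ParseQuantity(pageSize)
}

func main() {
	for _, name := range []string{"hugepages-2Mi", "hugepages-1Gi", "hugepages-3Mi", "cpu"} {
		if !isHugePageResourceName(name) {
			fmt.Printf("%s: not a hugepage resource\n", name)
			continue
		}
		size, err := hugePageSizeFromResourceName(name)
		fmt.Println(name, size.String(), err)
	}
}
```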
SGTM. I like this idea. Thanks for the explanation!
Another concern: I have a use case where users want the ability to adjust their hugepages usage, or even the node's hugepages capacity, dynamically. Yes, this adds complication and may not be entirely reasonable. Any thoughts?
They want to release the unused hugepages back to "normal" memory so that it can be used by other processes on the node.
@xiangpengzhao - that is out of scope for alpha. it's possible you could run a controller that observed pods pending due to lack of hugepages capacity across the cluster, and run a pod to allocate them on a particular node in response. there are pros/cons with that, especially with gigantic page sizes. i had a pod that did something like that here: https://github.com/derekwaynecarr/hugepages/tree/master/allocator
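Such an allocator ultimately just writes to the kernel's hugepage sysfs knobs on the node; a minimal sketch of that node-local step (the sysfs path is the standard kernel interface; the function and counts are illustrative, not the linked allocator's actual code):

```go
package main

import (
	"fmt"
	"os"
)

// setNrHugePages asks the kernel to reserve count hugepages of the given
// size by writing to the standard sysfs knob, e.g.
// /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages.
// The kernel may reserve fewer pages than requested if memory is fragmented,
// so callers should read the file back to verify.
func setNrHugePages(pageSizeKB, count int) error {
	path := fmt.Sprintf("/sys/kernel/mm/hugepages/hugepages-%dkB/nr_hugepages", pageSizeKB)
	return os.WriteFile(path, []byte(fmt.Sprintf("%d", count)), 0644)
}

func main() {
	// Reserve 128 pages of 2MiB hugepages (requires root on the node).
	if err := setNrHugePages(2048, 128); err != nil {
		fmt.Fprintln(os.Stderr, "failed to set nr_hugepages:", err)
		os.Exit(1)
	}
}
```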
@derekwaynecarr Cool! Thanks, Derek!
need #50629 to merge first.
/test pull-kubernetes-e2e-kops-aws
prerequisite PRs merged, this should be good to go.
note to respond to #50773 (comment) in a follow-on PR.
@@ -43,6 +47,10 @@ const (
	libcontainerSystemd libcontainerCgroupManagerType = "systemd"
)

// hugePageSizeList is useful for converting to the hugetlb canonical unit
// which is what is expected when interacting with libcontainer
var hugePageSizeList = []string{"", "kB", "MB", "GB", "TB", "PB"}
s/kB/KB to make it consistent with other units?
the units were intended to align with what libcontainer uses here (except "B" was irrelevant)
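Concretely, libcontainer keys hugetlb cgroup limits by strings like "2MB" or "1GB", so the kubelet has to render a page size in bytes into that unit list; a rough sketch of such a conversion (the function name is illustrative):

```go
package main

import "fmt"

// hugePageSizeList mirrors the units libcontainer uses for hugetlb
// cgroup entries ("B" is irrelevant because no hugepage is that small).
var hugePageSizeList = []string{"", "kB", "MB", "GB", "TB", "PB"}

// sizeToHugetlbUnit renders a page size in bytes as libcontainer's
// canonical hugetlb string, e.g. 2097152 -> "2MB", 1073741824 -> "1GB".
// It only divides while the size remains an exact multiple of 1024,
// so no precision is lost.
func sizeToHugetlbUnit(size int64) string {
	idx := 0
	for size >= 1024 && size%1024 == 0 && idx < len(hugePageSizeList)-1 {
		size /= 1024
		idx++
	}
	return fmt.Sprintf("%d%s", size, hugePageSizeList[idx])
}

func main() {
	fmt.Println(sizeToHugetlbUnit(2 * 1024 * 1024))    // 2MB
	fmt.Println(sizeToHugetlbUnit(1024 * 1024 * 1024)) // 1GB
}
```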
// if huge pages are enabled, we report them as a schedulable resource on the node
if utilfeature.DefaultFeatureGate.Enabled(features.HugePages) {
	for _, hugepagesInfo := range info.HugePages {
I thought we agreed that, for this release, we would only support a single huge page size, but I couldn't find where we ensure that.
ah, good catch. i wrote validation for pod to only consume one size, and forgot to add the validation for node to only report one size larger than 0. will add now.
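The node-side check described here amounts to counting the hugepage resources with a nonzero quantity in the node's capacity; a hedged sketch of that idea (not the exact validation code in the PR):

```go
package main

import (
	"fmt"
	"strings"

	"k8s.io/apimachinery/pkg/api/resource"
)

// validateSingleHugePageSize returns an error if the node reports more than
// one hugepage size with capacity larger than 0 (the alpha restriction).
func validateSingleHugePageSize(capacity map[string]resource.Quantity) error {
	nonZeroSizes := 0
	for name, qty := range capacity {
		if strings.HasPrefix(name, "hugepages-") && !qty.IsZero() {
			nonZeroSizes++
		}
	}
	if nonZeroSizes > 1 {
		return fmt.Errorf("node may pre-allocate only one huge page size in alpha, found %d", nonZeroSizes)
	}
	return nil
}

func main() {
	capacity := map[string]resource.Quantity{
		"hugepages-2Mi": resource.MustParse("1Gi"),
		"hugepages-1Gi": resource.MustParse("0"),
	}
	// Only one size is nonzero, so this passes.
	fmt.Println(validateSingleHugePageSize(capacity))
}
```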
@@ -262,6 +296,13 @@ func (m *qosContainerManagerImpl) UpdateCgroups() error {
	return err
}

// update the qos level cgroup settings for huge pages (ensure they remain unbounded)
Could this lead to overcommit of hugepages at the node level?
No. We enforce via validation of the pod spec that overcommit is not allowed.
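In practice, forbidding overcommit for hugepages means requiring the hugepage limit to equal the request, so the scheduler reserves exactly what the cgroup will enforce; a rough sketch under that assumption (helper name illustrative):

```go
package main

import (
	"fmt"
	"strings"

	"k8s.io/apimachinery/pkg/api/resource"
)

// validateNoHugePageOvercommit enforces that every hugepage resource has its
// limit equal to its request, so hugepages can never be overcommitted.
func validateNoHugePageOvercommit(requests, limits map[string]resource.Quantity) error {
	for name, limit := range limits {
		if !strings.HasPrefix(name, "hugepages-") {
			continue
		}
		request, ok := requests[name]
		if !ok || request.Cmp(limit) != 0 {
			return fmt.Errorf("%s limit must equal request", name)
		}
	}
	return nil
}

func main() {
	limits := map[string]resource.Quantity{"hugepages-2Mi": resource.MustParse("100Mi")}
	requests := map[string]resource.Quantity{"hugepages-2Mi": resource.MustParse("100Mi")}
	// Limit equals request, so this passes.
	fmt.Println(validateNoHugePageOvercommit(requests, limits))
}
```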
rebased on kubefeature change.
/retest
/lgtm trivial rebase
/lgtm
rebased
/test pull-kubernetes-e2e-gce-etcd3
kernel panic on previous runs not related to this test:
/test pull-kubernetes-e2e-gce-etcd3
/assign
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: dchen1107, derekwaynecarr, sjenning, smarterclayton
Associated issue: 275
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing
/test all
[submit-queue is verifying that this PR is safe to merge]
@derekwaynecarr: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Automatic merge from submit-queue
What this PR does / why we need it:
Implements HugePages support per kubernetes/community#837
Feature track issue: kubernetes/enhancements#275
Special notes for your reviewer:
A follow-on PR is opened to add EmptyDir support.
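For reviewers who want to try the feature, a pod consuming the new resource looks roughly like the sketch below, constructed with the v1 API types; the quantities are illustrative, and hugepages require the limit to equal the request:

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	pod := &v1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "hugepages-demo"},
		Spec: v1.PodSpec{
			Containers: []v1.Container{{
				Name:  "app",
				Image: "busybox",
				Resources: v1.ResourceRequirements{
					// hugepages must be requested with limit == request.
					Limits: v1.ResourceList{
						v1.ResourceName("hugepages-2Mi"): resource.MustParse("100Mi"),
						v1.ResourceMemory:                resource.MustParse("100Mi"),
					},
					Requests: v1.ResourceList{
						v1.ResourceName("hugepages-2Mi"): resource.MustParse("100Mi"),
						v1.ResourceMemory:                resource.MustParse("100Mi"),
					},
				},
			}},
		},
	}
	fmt.Printf("%+v\n", pod.Spec.Containers[0].Resources.Limits)
}
```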
Release note: