HugePages feature #50859
for i := range pod.Spec.Containers {
	resourceSet := toContainerResourcesSet(&pod.Spec.Containers[i])
	for resourceStr := range resourceSet {
		if v1helper.IsHugePageResourceName(v1.ResourceName(resourceStr)) {
Do we want to validate the HugePage size? e.g. 2Mi and 1Gi are valid and others are not.
That means, what if users configure hugepages-3Mi?
@xiangpengzhao - that is a good question.
there are a variety of large page sizes depending on the architecture, so i was hoping to avoid an explicit enumeration of them or awareness in the api server. this is an alpha feature, but my long term expectation is that operators would configure the ResourceQuota system to make consumption of hugepages denied by default per something like #36765, and they would then grant explicit quota for the page sizes supported in their fleet of machines. WDYT?
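For context, the alpha validation only needs to recognize the `hugepages-` resource-name prefix and parse the suffix as a quantity; rejecting unsupported sizes is then left to quota (or to the node, which simply never advertises them). A minimal sketch of that check — the helper names here are illustrative, not necessarily the exact ones in the tree:

```go
package main

import (
	"fmt"
	"strings"

	"k8s.io/apimachinery/pkg/api/resource"
)

const hugePagesPrefix = "hugepages-"

// isHugePageResourceName reports whether a resource name names a hugepage
// resource, e.g. "hugepages-2Mi" or "hugepages-1Gi".
func isHugePageResourceName(name string) bool {
	return strings.HasPrefix(name, hugePagesPrefix)
}

// hugePageSizeFromResourceName parses the page size encoded in the resource
// name suffix; any valid quantity is accepted, so "hugepages-3Mi" parses too,
// and it is quota or the node's advertised capacity that rejects it.
func hugePageSizeFromResourceName(name string) (resource.Quantity, error) {
	pageSize := strings.TrimPrefix(name, hugePagesPrefix)
	return resource.ParseQuantity(pageSize)
}

func main() {
	for _, name := range []string{"hugepages-2Mi", "hugepages-1Gi", "hugepages-3Mi", "cpu"} {
		if !isHugePageResourceName(name) {
			fmt.Printf("%s: not a hugepage resource\n", name)
			continue
		}
		size, err := hugePageSizeFromResourceName(name)
		fmt.Println(name, size.String(), err)
	}
}
```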
SGTM. I like this idea. Thanks for the explanation!
Another concern: I have a use case where users want the ability to adjust their hugepages usage, or even the node's hugepages capacity, dynamically. Yes, this adds complication and may not be entirely reasonable. Any thoughts?
They want to release the unused hugepages back to "normal" memory so that it can be used by other processes on the node.
@xiangpengzhao - that is out of scope for alpha. it's possible you could run a controller that observed pods pending due to lack of hugepages capacity across the cluster, and run a pod to allocate them on a particular node in response. there are pros/cons with that, especially with gigantic page sizes. i had a pod that did something like that here: https://github.com/derekwaynecarr/hugepages/tree/master/allocator
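Such an allocator ultimately just writes to the kernel's hugepage sysfs knobs on the node; a minimal sketch of that node-local step (the sysfs path is the standard kernel interface; the function and counts are illustrative, not the linked allocator's actual code):

```go
package main

import (
	"fmt"
	"os"
)

// setNrHugePages asks the kernel to reserve count hugepages of the given
// size by writing to the standard sysfs knob, e.g.
// /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages.
// The kernel may reserve fewer pages than requested if memory is fragmented,
// so callers should read the file back to verify.
func setNrHugePages(pageSizeKB, count int) error {
	path := fmt.Sprintf("/sys/kernel/mm/hugepages/hugepages-%dkB/nr_hugepages", pageSizeKB)
	return os.WriteFile(path, []byte(fmt.Sprintf("%d", count)), 0644)
}

func main() {
	// Reserve 128 pages of 2MiB hugepages (requires root on the node).
	if err := setNrHugePages(2048, 128); err != nil {
		fmt.Fprintln(os.Stderr, "failed to set nr_hugepages:", err)
		os.Exit(1)
	}
}
```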
@derekwaynecarr Cool! Thanks, Derek!
need #50629 to merge first.
/test pull-kubernetes-e2e-kops-aws
prerequisite PRs merged, this should be good to go.
note to respond to #50773 (comment) in a follow-on PR.
@@ -43,6 +47,10 @@ const (
	libcontainerSystemd libcontainerCgroupManagerType = "systemd"
)

// hugePageSizeList is useful for converting to the hugetlb canonical unit
// which is what is expected when interacting with libcontainer
var hugePageSizeList = []string{"", "kB", "MB", "GB", "TB", "PB"}
s/kB/KB to make it consistent with other units?
the units were intended to align with what libcontainer uses here (except "B" was irrelevant)
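Concretely, libcontainer keys hugetlb cgroup limits by strings like "2MB" or "1GB", so the kubelet has to render a page size in bytes into that unit list; a rough sketch of such a conversion (the function name is illustrative):

```go
package main

import "fmt"

// hugePageSizeList mirrors the units libcontainer uses for hugetlb
// cgroup entries ("B" is irrelevant because no hugepage is that small).
var hugePageSizeList = []string{"", "kB", "MB", "GB", "TB", "PB"}

// sizeToHugetlbUnit renders a page size in bytes as libcontainer's
// canonical hugetlb string, e.g. 2097152 -> "2MB", 1073741824 -> "1GB".
// It only divides while the size remains an exact multiple of 1024,
// so no precision is lost.
func sizeToHugetlbUnit(size int64) string {
	idx := 0
	for size >= 1024 && size%1024 == 0 && idx < len(hugePageSizeList)-1 {
		size /= 1024
		idx++
	}
	return fmt.Sprintf("%d%s", size, hugePageSizeList[idx])
}

func main() {
	fmt.Println(sizeToHugetlbUnit(2 * 1024 * 1024))    // 2MB
	fmt.Println(sizeToHugetlbUnit(1024 * 1024 * 1024)) // 1GB
}
```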
// if huge pages are enabled, we report them as a schedulable resource on the node
if utilfeature.DefaultFeatureGate.Enabled(features.HugePages) {
	for _, hugepagesInfo := range info.HugePages {
I thought we agreed that, for this release, we would only support a single huge page size, but I couldn't find where we ensure that.
ah, good catch. i wrote validation for pod to only consume one size, and forgot to add the validation for node to only report one size larger than 0. will add now.
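The node-side check described here amounts to counting the hugepage resources with a nonzero quantity in the node's capacity; a hedged sketch of that idea (not the exact validation code in the PR):

```go
package main

import (
	"fmt"
	"strings"

	"k8s.io/apimachinery/pkg/api/resource"
)

// validateSingleHugePageSize returns an error if the node reports more than
// one hugepage size with capacity larger than 0 (the alpha restriction).
func validateSingleHugePageSize(capacity map[string]resource.Quantity) error {
	nonZeroSizes := 0
	for name, qty := range capacity {
		if strings.HasPrefix(name, "hugepages-") && !qty.IsZero() {
			nonZeroSizes++
		}
	}
	if nonZeroSizes > 1 {
		return fmt.Errorf("node may pre-allocate only one huge page size in alpha, found %d", nonZeroSizes)
	}
	return nil
}

func main() {
	capacity := map[string]resource.Quantity{
		"hugepages-2Mi": resource.MustParse("1Gi"),
		"hugepages-1Gi": resource.MustParse("0"),
	}
	// Only one size is nonzero, so this passes.
	fmt.Println(validateSingleHugePageSize(capacity))
}
```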
@@ -262,6 +296,13 @@ func (m *qosContainerManagerImpl) UpdateCgroups() error {
	return err
}

// update the qos level cgroup settings for huge pages (ensure they remain unbounded)
Could this lead to overcommit of hugepages at the node level?
No. We enforce via validation of the pod spec that overcommit is not allowed.
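In practice, forbidding overcommit for hugepages means requiring the hugepage limit to equal the request, so the scheduler reserves exactly what the cgroup will enforce; a rough sketch under that assumption (helper name illustrative):

```go
package main

import (
	"fmt"
	"strings"

	"k8s.io/apimachinery/pkg/api/resource"
)

// validateNoHugePageOvercommit enforces that every hugepage resource has its
// limit equal to its request, so hugepages can never be overcommitted.
func validateNoHugePageOvercommit(requests, limits map[string]resource.Quantity) error {
	for name, limit := range limits {
		if !strings.HasPrefix(name, "hugepages-") {
			continue
		}
		request, ok := requests[name]
		if !ok || request.Cmp(limit) != 0 {
			return fmt.Errorf("%s limit must equal request", name)
		}
	}
	return nil
}

func main() {
	limits := map[string]resource.Quantity{"hugepages-2Mi": resource.MustParse("100Mi")}
	requests := map[string]resource.Quantity{"hugepages-2Mi": resource.MustParse("100Mi")}
	// Limit equals request, so this passes.
	fmt.Println(validateNoHugePageOvercommit(requests, limits))
}
```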
rebased on kubefeature change.
/retest
/lgtm trivial rebase
/lgtm
rebased
/test pull-kubernetes-e2e-gce-etcd3
kernel panic on previous runs not related to this test:
/test pull-kubernetes-e2e-gce-etcd3
/assign
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: dchen1107, derekwaynecarr, sjenning, smarterclayton
Associated issue: 275
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing
/test all
[submit-queue is verifying that this PR is safe to merge]
@derekwaynecarr: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Automatic merge from submit-queue
What this PR does / why we need it:
Implements HugePages support per kubernetes/community#837
Feature track issue: kubernetes/enhancements#275
Special notes for your reviewer:
A follow-on PR is opened to add EmptyDir support.
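For reviewers who want to try the feature, a pod consuming the new resource looks roughly like the sketch below, constructed with the v1 API types; the quantities are illustrative, and hugepages require the limit to equal the request:

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	pod := &v1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "hugepages-demo"},
		Spec: v1.PodSpec{
			Containers: []v1.Container{{
				Name:  "app",
				Image: "busybox",
				Resources: v1.ResourceRequirements{
					// hugepages must be requested with limit == request.
					Limits: v1.ResourceList{
						v1.ResourceName("hugepages-2Mi"): resource.MustParse("100Mi"),
						v1.ResourceMemory:                resource.MustParse("100Mi"),
					},
					Requests: v1.ResourceList{
						v1.ResourceName("hugepages-2Mi"): resource.MustParse("100Mi"),
						v1.ResourceMemory:                resource.MustParse("100Mi"),
					},
				},
			}},
		},
	}
	fmt.Printf("%+v\n", pod.Spec.Containers[0].Resources.Limits)
}
```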
Release note: