Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Support memory qos with cgroups v2 #102970

Merged
merged 1 commit into from
Jul 8, 2021

Conversation

borgerli
Copy link
Contributor

@borgerli borgerli commented Jun 18, 2021

What type of PR is this?

/kind feature

What this PR does / why we need it:

Which issue(s) this PR fixes:

Implements KEP - Support memory qos with cgroups v2

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Introducing Memory QoS support with cgroups v2 (Alpha)
The MemoryQoS feature is now in Alpha. This allows kubelet running with cgroups v2 to set memory QoS at container, pod and QoS level to protect and guarantee better memory quality. This feature can be enabled through feature gate MemoryQoS.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

Note

After discussion with KEP owner @xiaoxubeii, this PR leaves the implementation of kube-reserved / system-reserved cgroups for future. The effective min boundary of a cgroup is also limited by memory.min values of all ancestor cgroups, and memory.min of all ancestor cgroups need to be set accordingly. But for kube-reserved / system-reserved cgroups, the ancestor cgroups may not be managed by kubelet.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 18, 2021
@k8s-ci-robot
Copy link
Contributor

Hi @borgerli. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. sig/storage Categorizes an issue or PR as relevant to SIG Storage. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 18, 2021
@xiaoxubeii
Copy link
Member

/cc
/cc @bobbypage @mrunalp

@xiaoxubeii
Copy link
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 18, 2021
@borgerli borgerli changed the title [WIP] Feature: Support memory qos with cgroups v2 Feature: Support memory qos with cgroups v2 Jun 22, 2021
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 22, 2021
@borgerli
Copy link
Contributor Author

The first commit of this PR,"add unified on cri to support cgroup v2", is being reviewed in another PR: #102578.
I'll rebase after PR102578 merged.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 22, 2021
@borgerli
Copy link
Contributor Author

borgerli commented Jul 8, 2021

/test pull-kubernetes-node-crio-cgrpv2-e2e

1 similar comment
@borgerli
Copy link
Contributor Author

borgerli commented Jul 8, 2021

/test pull-kubernetes-node-crio-cgrpv2-e2e

@borgerli
Copy link
Contributor Author

borgerli commented Jul 8, 2021

The check of pull-kubernetes-node-kubelet-serial-crio-cgroupv2 always failed due to issue #103579.

@k8s-ci-robot
Copy link
Contributor

@borgerli: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-kubernetes-node-kubelet-serial-crio-cgroupv2 c3d9b10 link /test pull-kubernetes-node-kubelet-serial-crio-cgroupv2
pull-kubernetes-node-crio-cgrpv2-e2e c3d9b10 link /test pull-kubernetes-node-crio-cgrpv2-e2e

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@xiaoxubeii
Copy link
Member

/assign @derekwaynecarr

@thockin
Copy link
Member

thockin commented Jul 8, 2021

/approve

based on @bobbypage and @mrunalp and @ehashman reviews.

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 8, 2021
@ehashman
Copy link
Member

ehashman commented Jul 8, 2021

@ehashman Can we use cgv2 e2e in alpha first? It will take some time to submit e2e PR, which we can make up later.

Given that pull-kubernetes-e2e-gce-alpha-features is passing there is at least some coverage for the feature flag being enabled, but since none of the logic of this PR is triggered without the feature flag enabled and cgroupsv2, IMO this needs a test job added. Since none of our available jobs are able to check this right now, I'm hesitant to merge without exercising these code paths in CI. This PR should have had some test-infra changes submitted simultaneously so that we can trigger such a job on a PR. See e.g. https://github.com/kubernetes/test-infra/blob/58ae43ff3e79c2958827883f5f5b0164084bfb23/config/jobs/kubernetes/sig-node/sig-node-presubmit.yaml#L377-L415

If we can't get something merged in test-infra by EOD to test this I'll defer to Node leads to make the call since this is alpha, but I really think we need to add a CI job. This hasn't actually run through CI so I don't have confidence that it works.

@ehashman
Copy link
Member

ehashman commented Jul 8, 2021

/hold

to resolve discussion

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 8, 2021
@ehashman
Copy link
Member

ehashman commented Jul 8, 2021

/test pull-kubernetes-node-memoryqos-cgrpv2

@ehashman
Copy link
Member

ehashman commented Jul 8, 2021

Job passed! 🥳 https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/102970/pull-kubernetes-node-memoryqos-cgrpv2/1413181901090328576/

/hold cancel
/approve

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 8, 2021
@ehashman
Copy link
Member

ehashman commented Jul 8, 2021

/skip

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: borgerli, ehashman, mrunalp, thockin, xiaoxubeii

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ehashman
Copy link
Member

ehashman commented Jul 8, 2021

/milestone v1.22

@k8s-ci-robot k8s-ci-robot added this to the v1.22 milestone Jul 8, 2021
@k8s-ci-robot k8s-ci-robot merged commit a9d7526 into kubernetes:master Jul 8, 2021
tengqm added a commit to tengqm/website that referenced this pull request Jul 11, 2021
Here is a second batch for feature gate updates in 1.22.

- CPUManagerPolicyOptions    kubernetes/kubernetes#101432
- ControllerManagerLeaderMigration  kubernetes/kubernetes#103533
- DelegateFSGroupToCSIDriver   kubernetes/kubernetes#103244
- DynamicKubeletConfig    kubernetes/kubernetes#102966
- EndpointSliceProxying  kubernetes/kubernetes#103451
- EndpointSliceTerminatingCondition  kubernetes/kubernetes#103596
- HugePageStorageMediumSize    kubernetes/kubernetes#99144
- JobTrackingWithFinalizers   kubernetes/kubernetes#98817
  (also tracked in kubernetes#28841, can rebase).
- MemoryQoS   kubernetes/kubernetes#102970
- NodeSwap  kubernetes/kubernetes#102823, kubernetes/kubernetes#103553
- ServiceInternalTrafficPolicy  kubernetes/kubernetes#103462
- StatefulSetAutoDeletePVC kubernetes/kubernetes#99378
- WindowsEndpointSliceProxying   kubernetes/kubernetes#103451

Some of these needs more detailed documentation.
tengqm added a commit to tengqm/website that referenced this pull request Jul 18, 2021
Here is a second batch for feature gate updates in 1.22.

- CPUManagerPolicyOptions    kubernetes/kubernetes#101432
- ControllerManagerLeaderMigration  kubernetes/kubernetes#103533
- DelegateFSGroupToCSIDriver   kubernetes/kubernetes#103244
- DynamicKubeletConfig    kubernetes/kubernetes#102966
- EndpointSliceProxying  kubernetes/kubernetes#103451
- EndpointSliceTerminatingCondition  kubernetes/kubernetes#103596
- HugePageStorageMediumSize    kubernetes/kubernetes#99144
- JobTrackingWithFinalizers   kubernetes/kubernetes#98817
  (also tracked in kubernetes#28841, can rebase).
- MemoryQoS   kubernetes/kubernetes#102970
- NodeSwap  kubernetes/kubernetes#102823, kubernetes/kubernetes#103553
- ServiceInternalTrafficPolicy  kubernetes/kubernetes#103462
- StatefulSetAutoDeletePVC kubernetes/kubernetes#99378
- WindowsEndpointSliceProxying   kubernetes/kubernetes#103451

Some of these needs more detailed documentation.
tengqm added a commit to tengqm/website that referenced this pull request Jul 21, 2021
Here is a second batch for feature gate updates in 1.22.

- CPUManagerPolicyOptions    kubernetes/kubernetes#101432
- ControllerManagerLeaderMigration  kubernetes/kubernetes#103533
- DelegateFSGroupToCSIDriver   kubernetes/kubernetes#103244
- DynamicKubeletConfig    kubernetes/kubernetes#102966
- EndpointSliceProxying  kubernetes/kubernetes#103451
- EndpointSliceTerminatingCondition  kubernetes/kubernetes#103596
- HugePageStorageMediumSize    kubernetes/kubernetes#99144
- JobTrackingWithFinalizers   kubernetes/kubernetes#98817
  (also tracked in kubernetes#28841, can rebase).
- MemoryQoS   kubernetes/kubernetes#102970
- ServiceInternalTrafficPolicy  kubernetes/kubernetes#103462
- StatefulSetAutoDeletePVC kubernetes/kubernetes#99378
- WindowsEndpointSliceProxying   kubernetes/kubernetes#103451

Some of these needs more detailed documentation.
tengqm added a commit to tengqm/website that referenced this pull request Jul 22, 2021
Here is a second batch for feature gate updates in 1.22.

- CPUManagerPolicyOptions    kubernetes/kubernetes#101432
- ControllerManagerLeaderMigration  kubernetes/kubernetes#103533
- DynamicKubeletConfig    kubernetes/kubernetes#102966
- EndpointSliceProxying  kubernetes/kubernetes#103451
- EndpointSliceTerminatingCondition  kubernetes/kubernetes#103596
- HugePageStorageMediumSize    kubernetes/kubernetes#99144
- JobTrackingWithFinalizers   kubernetes/kubernetes#98817
  (also tracked in kubernetes#28841, can rebase).
- MemoryQoS   kubernetes/kubernetes#102970
- ServiceInternalTrafficPolicy  kubernetes/kubernetes#103462
- StatefulSetAutoDeletePVC kubernetes/kubernetes#99378
- WindowsEndpointSliceProxying   kubernetes/kubernetes#103451

Some of these needs more detailed documentation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-review Categorizes an issue or PR as actively needing an API review. approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Status: API review completed, 1.22
Development

Successfully merging this pull request may close these issues.