[KEP-2400]: Restrict access to swap for containers in high priority Pods #125277

Merged
merged 3 commits into from
Jul 22, 2024

Conversation

iholder101
Contributor

@iholder101 iholder101 commented Jun 2, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:

Exclude critical pods from gaining swap access.

I believe this is valuable for two main reasons:

  1. Critical pods are assumed not to tolerate performance degradation, which swap access could cause.
  2. This provides another way of opting burstable pods out of swap access.

P.S. Currently, a burstable pod can opt out of swap by setting requests.memory == limits.memory. However, this approach forces the workload owner to set limits, which is unacceptable for certain workloads. With this change, an administrator can instead classify such burstable pods as critical to opt them out of swap.
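As a rough illustration of the rule described above (not the code in this PR), here is a minimal Go sketch of the swap eligibility decision. The `podInfo` type and the `isCritical` and `swapAllowed` helpers are hypothetical names invented for this example. The criticality check is simplified to a priority comparison; the kubelet's own check may cover more cases.

```go
package main

import "fmt"

// systemCriticalPriority mirrors the value of scheduling.SystemCriticalPriority
// in Kubernetes (2 * 10^9). Pods at or above it count as critical here.
const systemCriticalPriority int32 = 2 * 1000000000

// qosClass is a hypothetical stand-in for the pod QoS class.
type qosClass string

const (
	guaranteed qosClass = "Guaranteed"
	burstable  qosClass = "Burstable"
	bestEffort qosClass = "BestEffort"
)

// podInfo is a hypothetical, flattened view of the fields this decision needs.
type podInfo struct {
	priority      *int32 // nil when no priority is set
	qos           qosClass
	memoryRequest int64 // bytes
	memoryLimit   int64 // bytes; 0 means no limit is set
}

// isCritical is a simplified, priority-only criticality check (an assumption
// of this sketch).
func isCritical(p podInfo) bool {
	return p.priority != nil && *p.priority >= systemCriticalPriority
}

// swapAllowed reports whether a container in the pod may use swap:
// critical pods never do, only Burstable pods are eligible, and a
// Burstable pod with request == limit is opted out.
func swapAllowed(p podInfo) bool {
	if isCritical(p) {
		return false
	}
	if p.qos != burstable {
		return false
	}
	if p.memoryLimit != 0 && p.memoryRequest == p.memoryLimit {
		return false
	}
	return true
}

func main() {
	nodeCritical := int32(2000001000) // value of the system-node-critical PriorityClass
	const mib = int64(1) << 20

	pods := []struct {
		name string
		pod  podInfo
	}{
		{"burstable, no limit", podInfo{qos: burstable, memoryRequest: 64 * mib}},
		{"burstable, request == limit", podInfo{qos: burstable, memoryRequest: 64 * mib, memoryLimit: 64 * mib}},
		{"burstable, critical", podInfo{priority: &nodeCritical, qos: burstable, memoryRequest: 64 * mib}},
	}
	for _, tc := range pods {
		fmt.Printf("%-28s swap allowed: %v\n", tc.name, swapAllowed(tc.pod))
	}
}
```

The third case is the one this PR is about: the pod sets no memory limit, yet it is kept away from swap because of its priority.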

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Changed Linux swap handling to restrict access to swap for containers in high priority Pods.
New Pods that have a node- or cluster-critical priority are prohibited from accessing swap on Linux,
even if your cluster and node configuration could otherwise allow this.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- KEP: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2400-node-swap

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jun 2, 2024
@k8s-ci-robot k8s-ci-robot added area/kubelet area/test sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 2, 2024
@iholder101
Contributor Author

/sig node
/cc @haircommander @fabiand @mrunalp @SergeyKanzhelev

@k8s-ci-robot
Contributor

@iholder101: GitHub didn't allow me to request PR reviews from the following users: fabiand.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/sig node
/cc @haircommander @fabiand @mrunalp @SergeyKanzhelev

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@iholder101
Contributor Author

@haircommander do we need a backing issue for this PR?

@sftim
Contributor

sftim commented Jun 2, 2024

Changelog suggestion

-Exclude critical pods from having swap access
+Changed Linux swap handling to restrict access to swap for containers in high priority Pods.
+New Pods that have a node- or cluster-critical priority are prohibited from accessing swap on Linux,
+even if your cluster and node configuration could otherwise allow this.

@iholder101 iholder101 force-pushed the swap/skip_critical_pods branch from c92bb66 to c5fbd74 June 2, 2024 14:08
@iholder101
Contributor Author

Changelog suggestion

-Exclude critical pods from having swap access
+Changed Linux swap handling to restrict access to swap for containers in high priority Pods.
+New Pods that have a node- or cluster-critical priority are prohibited from accessing swap on Linux,
+even if your cluster and node configuration could otherwise allow this.

Thanks, done!

@iholder101
Contributor Author

/retest

@haircommander
Contributor

@haircommander do we need a backing issue for this PR?

I think we can discuss pros and cons here

@iholder101 iholder101 changed the title from "[KEP2400]: Exclude critical pods from having swap access" to "[KEP2400]: Restrict access to swap for containers in high priority Pods" Jun 3, 2024
@yujuhong
Contributor

yujuhong commented Jun 3, 2024

@haircommander do we need a backing issue for this PR?

I think we can discuss pros and cons here

I think the use case is reasonable. Many critical components don't have the memory limit set today.
That said, I wonder whether we are making too many indirect/internal decisions about swap instead of surfacing it in the API for users to configure.

@aojea
Member

aojea commented Jun 3, 2024

Is this in line with the KEP? https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2400-node-swap#set-aside-swap-for-system-critical-daemons

Set Aside Swap for System Critical Daemons
Note: In Beta2, we found that having system critical daemons swapping memory could cause degradation of services.

System critical daemons (such as Kubelet) are essential for node health. Usually, an appropriate portion of system resources (e.g., memory, CPU) is reserved as system reserved. However, swap doesn't inherently support reserving a portion out of the total available. For instance, in the case of memory, we set memory.min on the node-level cgroup to ensure an adequate amount of memory is set aside, away from the pods, and for system critical daemons. But there is no equivalent for swap; i.e., no memory.swap.min is supported in the kernel.

Since this proposal advocates enabling swap only for the Burstable QoS pods, this can be done by setting memory.swap.max on the cgroups used by the Burstable QoS pods. The value of this memory.swap.max can be calculated by:

memory.swap.max = total swap memory available on the system - system reserve (memory)

This is the total amount of swap available for all the Burstable QoS pods; let's call it TotalPodsSwapAvailable. This will ensure that the system critical daemons will have access to the swap at least equal to the system reserved memory. This will indirectly act as having support for swap in system reserved.
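The calculation quoted above can be sketched in Go as follows. This is only an illustration of the formula, not kubelet code: the function name `totalPodsSwapAvailable`, its parameters, and the clamping at zero are assumptions made for this example.

```go
package main

import "fmt"

// totalPodsSwapAvailable returns the swap budget shared by all Burstable QoS
// pods: total swap on the node minus the system-reserved memory.
// Clamping at zero is an assumption of this sketch; the quoted text does not
// say what happens when the reservation exceeds the available swap.
func totalPodsSwapAvailable(totalSwapBytes, systemReservedMemoryBytes int64) int64 {
	available := totalSwapBytes - systemReservedMemoryBytes
	if available < 0 {
		return 0
	}
	return available
}

func main() {
	const gib = int64(1) << 30
	// Hypothetical node: 8 GiB of swap and 1 GiB of system-reserved memory.
	budget := totalPodsSwapAvailable(8*gib, 1*gib)
	fmt.Printf("memory.swap.max for Burstable pods: %d bytes (%d GiB)\n", budget, budget/gib)
}
```

With these hypothetical numbers, 7 GiB of swap is left for Burstable pods, and the remaining 1 GiB stays available to system critical daemons.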

@bart0sh
Contributor

bart0sh commented Jun 4, 2024

/triage accepted

@k8s-ci-robot k8s-ci-robot added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jun 4, 2024
@k8s-triage-robot

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does not have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge

You can:

/retest

5 similar comments

@pacoxu
Member

pacoxu commented Jul 22, 2024

/hold
for ci failure

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 22, 2024
@@ -1244,6 +1257,11 @@ func TestGenerateLinuxContainerResourcesWithSwap(t *testing.T) {
	pod.Spec.Containers[0].Resources = resourceReqsC1
	pod.Spec.Containers[1].Resources = resourceReqsC2

	if tc.isCriticalPod {
		pod.Spec.Priority = ptr.To(scheduling.SystemCriticalPriority)
		assert.Equal(t, true, types.IsCriticalPod(pod), "pod is expected to be critical")
Member

Suggested change
-assert.Equal(t, true, types.IsCriticalPod(pod), "pod is expected to be critical")
+assert.True(t, types.IsCriticalPod(pod), "pod is expected to be critical")

@kannon92
Contributor

/lgtm cancel

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 22, 2024
@k8s-ci-robot k8s-ci-robot requested review from mrunalp and yujuhong July 22, 2024 14:34
@iholder101 iholder101 force-pushed the swap/skip_critical_pods branch from c5fbd74 to 353d71a July 22, 2024 14:56
Signed-off-by: Itamar Holder <iholder@redhat.com>
Signed-off-by: Itamar Holder <iholder@redhat.com>
Signed-off-by: Itamar Holder <iholder@redhat.com>
@iholder101 iholder101 force-pushed the swap/skip_critical_pods branch from 353d71a to a6df16a July 22, 2024 14:56
@kannon92
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 22, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 8bd81f644c8d8f2768129031f7e36f1877bf2dc4

@kannon92
Contributor

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 22, 2024
@iholder101
Contributor Author

Thank you @pacoxu and @kannon92 for the notice!
Sorry for not noticing this myself; I am away from work for an extended period.

@k8s-ci-robot k8s-ci-robot merged commit f458a74 into kubernetes:master Jul 22, 2024
15 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.31 milestone Jul 22, 2024
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.