
[KEP-2400] - Update KEP with findings from beta2 #4401

Merged
merged 12 commits into kubernetes:master from kep-2400-beta-2-findings on Feb 6, 2024

Conversation

kannon92 (Contributor)

  • One-line PR description:
    Update Swap KEP with findings from beta2 testing.
  • Other comments:

Our recommendation is to implement MemorySwap = NoSwap and to let this feature sit in beta2. We don't want to promote this to GA at this time, but we realize that some guard rails are necessary for this feature.
In this KEP, we propose some best practices and allow for opt-in support for this feature regardless of the feature toggle.
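
For illustration, a minimal sketch of the kubelet configuration this direction implies, assuming the existing KubeletConfiguration fields (failSwapOn, featureGates, memorySwap.swapBehavior); the NoSwap value is the new behavior proposed here, and the file path is illustrative:

```bash
# Hedged sketch: kubelet config enabling the NodeSwap feature gate with the
# NoSwap behavior this PR proposes. Not the final KEP text, just a sketch.
cat <<'EOF' > /var/lib/kubelet/config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
failSwapOn: false        # allow the kubelet to start on a node that has swap enabled
featureGates:
  NodeSwap: true
memorySwap:
  swapBehavior: NoSwap   # proposed default: workloads get no swap access;
                         # LimitedSwap would be the opt-in for swap usage
EOF
```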

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 12, 2024
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 12, 2024
@kannon92 kannon92 force-pushed the kep-2400-beta-2-findings branch from af16a2d to d97e2bf Compare January 16, 2024 22:50
@kannon92 (Contributor, Author)

@@ -1059,6 +1169,7 @@ nodes that do not use swap memory.
- **2021-01-05:** Initial design discussion document for swap support and use cases.
- **2021-04-05:** Alpha KEP drafted for initial node-level swap support and implementation (KEP-2400).
- **2021-08-09:** New in Kubernetes v1.22: alpha support for using swap memory: https://kubernetes.io/blog/2021/08/09/run-nodes-with-swap-alpha/
- **2024-01-12:** Updates to Beta2.
Contributor:

Would it make sense to add a reference to the PR that added swap support in the stats summary? kubernetes/kubernetes#118865

Contributor:

Although, I see, that section only has links to issues, not PRs.

kannon92 (Contributor, Author):

This section does need some love. I don't really know the dates, but I guess I can find some PRs so I can fill this in.

kannon92 (Contributor, Author):

I updated the section with what I could find. There are a lot of PRs, so I decided to focus on the promotion to beta1. PTAL.

@kannon92 kannon92 requested review from harche and fabiand January 18, 2024 18:47
@harche (Contributor) left a comment:

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 30, 2024
@kannon92 (Contributor, Author):

/assign @dchen1107 @mrunalp @SergeyKanzhelev

Since this is staying in beta, I am not sure if we need PRR reviews/approvals at the moment.

@mrunalp (Contributor) commented Feb 2, 2024:

This looks fine to me overall.
@dchen1107 if you want to review as well.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 3, 2024
@deads2k (Contributor) commented Feb 5, 2024:

Thank you for updating PRR as you go. The PRR changes lgtm

- Add [swap specific tests](https://github.com/kubernetes/kubernetes/issues/120798), such as handling swap usage
across container restart boundaries for writes to tmpfs (which may require pod-level cgroup changes beyond what
the container runtime does at the container cgroup boundary).
- Fix flaking/failing swap node e2e jobs.
- Address eviction related [issue](https://github.com/kubernetes/kubernetes/issues/120800) in swap implementation.
- Add `NoSwap` as the default setting.
- Remove `UnlimitedSwap` as a supported option.
Member:

+1000

@@ -608,12 +714,18 @@ Here are specific improvements to be made:
- Improve coverage for appropriate scenarios in testgrid.

#### Beta 2
- Publish a Kubernetes doc page encouraging users to use encrypted swap if they wish to enable this feature.
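
As a rough illustration of what such a doc page might recommend, here is one common Linux approach to ephemeral encrypted swap (the device path is a placeholder and the cipher settings are illustrative assumptions, not a KEP recommendation):

```bash
# Hedged sketch: encrypted swap via crypttab, re-keyed from /dev/urandom on
# every boot so swap contents never survive a reboot. /dev/sdb2 is a placeholder.
echo 'cryptswap /dev/sdb2 /dev/urandom swap,cipher=aes-xts-plain64,size=256' >> /etc/crypttab
echo '/dev/mapper/cryptswap none swap defaults 0 0' >> /etc/fstab
# After a reboot, verify that the encrypted device backs the swap area:
swapon --show
```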

Member:

Anything planned for debuggability enhancements, since enabling the feature might affect the performance of the node? I know we introduced swap metrics, but what about kubectl describe node?

Comment:

FWIW, we are still exploring what the best way of monitoring swap is: whether it's direct metrics, such as swap traffic, or indirect ones, such as *wait times.

kannon92 (Contributor, Author):

I'll put this down as something to look into, @dchen1107! The issue with describe node is that I think it gives the capacity for swap (i.e. how much swap the node actually has), but a user may get confused because they are only allowed to use up to a computed limit on swap for Burstable QoS. So it's a bit misleading to show a capacity when you can't really use it.

I think we should add some way to detect if swap is enabled on a node. I talked with the NFD (Node Feature Discovery) team about potentially adding a way to detect whether swap is on a node so we can add useful node labels for swap.
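
For context, a sketch of the computed limit mentioned above, assuming the proportional formula the KEP describes for Burstable pods under LimitedSwap (all numbers are hypothetical):

```bash
# Hedged sketch of the per-container swap limit for Burstable QoS:
#   swap limit = (container memory request / node memory capacity) * node swap size
# Hypothetical numbers, in GiB:
request=4; node_mem=16; node_swap=8
echo "swap limit: $(( request * node_swap / node_mem ))Gi"   # -> 2Gi
```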

Contributor:

I think the node-level swap capacity would still be useful information, even if the pods may not be able to consume it. That's the same for other resources on the node too.

Q: Is there a good way to expose per-pod swap usage?

Contributor:

A quick way to examine whether system-critical daemons are using swap would also be helpful

kannon92 (Contributor, Author):

> A quick way to examine whether system-critical daemons are using swap would also be helpful

I don't have a great way to do this. @fabiand any ideas here?

I would look at free and then see what is using swap.
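
A hedged sketch of that approach from a node shell (VmSwap in /proc/<pid>/status reports per-process swap usage on Linux):

```bash
# Node-wide swap usage:
free -h
# Per-process swap usage, largest consumers first (0 kB entries filtered out):
grep VmSwap /proc/[0-9]*/status | awk '$2 > 0 {print $2 " kB", $1}' | sort -rn | head
```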

@kannon92 (Contributor, Author) commented Mar 5, 2024:

> I think the node-level swap capacity would still be useful information, even if the pods may not be able to consume it. That's the same for other resources on the node too.

I've been asked that a few times, so I think we can add this to the node information when doing kubectl describe node.

> Q: Is there a good way to expose per-pod swap usage?

I was actually thinking that crictl stats or something like that may be useful here.

I think one could view the summary API using kubectl get --raw and look at the swap stats.

It could maybe be useful to expose this information in a lower-level tool if that is of interest.

We would look into the container and view memory.swap.current to see if swap was being used. The scoping was a bit confusing to me, but exec'ing into the pod and viewing memory.swap.current seems to be the best way to get swap information.
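
A hedged sketch of both inspection paths ($NODE and $POD are placeholders, and exact summary-API field names may differ by version; swap stats in the summary API came from kubernetes/kubernetes#118865, mentioned earlier in this thread):

```bash
# Swap stats from the kubelet summary API for one node:
kubectl get --raw "/api/v1/nodes/$NODE/proxy/stats/summary" | grep -i swap
# Swap usage in bytes from inside a container on a cgroup v2 node:
kubectl exec "$POD" -- cat /sys/fs/cgroup/memory.swap.current
```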

@iholder101 any other ways you found useful to look at per-pod swap usage?

Contributor:

> I've been asked that a few times so I think we can add this to the node information when doing kubectl describe node

I think it makes sense. I'll get to it.

> @iholder101 any other ways you found useful to look at per-pod swap usage?

What about using kubectl top to get this information? I can dive deeper into it.

Comment:

> A quick way to examine whether system-critical daemons are using swap would also be helpful

Critical daemons can be provided in two places:

  1. node level (part of the OS)
  2. as containers

(1) is easy to monitor: system.slice contains all these services. system.slice should also be configured to not use swap (part of the KEP; see the sketch below).
(2) is more difficult, i.e. Rook or other CNI/CSI projects will have critical pods.

Monitoring and configuring those is more difficult.
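
For case (1), a hedged sketch of how system.slice could be kept off swap with a systemd drop-in on a cgroup v2 node (the drop-in file name is illustrative):

```bash
# Keep node-level critical services (system.slice) from using swap.
mkdir -p /etc/systemd/system/system.slice.d
cat <<'EOF' > /etc/systemd/system/system.slice.d/50-noswap.conf
[Slice]
MemorySwapMax=0
EOF
systemctl daemon-reload
systemctl show system.slice -p MemorySwapMax   # verify the setting took effect
```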

If we assume that pods are correctly assigned to PriorityClasses, then monitoring can be improved by displaying swap consumption by priority class.
Further (unsure about it), we could consider implicitly controlling swap for workloads based on priority class, i.e. disabling it for node- and cluster-critical workloads.

Additional thoughts:
I've often seen that pods from (2) run as Burstable, making them eligible for swap.
We either need to find a way to (a) let pods opt out of swap independent of the QoS class, or (b) make pods opt in to swap regardless of the QoS class (Burstable pods can also benefit from swap as a safety mechanism).

The broad decision we should think about is: do we want opt-in or opt-out for workloads?

> I've been asked that a few times so I think we can add this to the node information when doing kubectl describe node

@iholder101 +1 on adding this to the kubelet and then kubectl. Let's include two metrics: one around the amount of swap used, and the other on the amount of swap traffic. The latter is worse than the former.

@dchen1107 (Member):

/lgtm with a minor comment which can be added later.
/approve

@k8s-ci-robot:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dchen1107, kannon92

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 6, 2024
@mrunalp (Contributor) commented Feb 6, 2024:

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 6, 2024
@k8s-ci-robot k8s-ci-robot merged commit 1464eaf into kubernetes:master Feb 6, 2024
4 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.30 milestone Feb 6, 2024
@pacoxu pacoxu mentioned this pull request Aug 9, 2024