Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Node system swap support #2602

Merged
merged 9 commits into from
May 13, 2021
Merged

Node system swap support #2602

merged 9 commits into from
May 13, 2021

Conversation

ehashman
Copy link
Member

@ehashman ehashman commented Apr 6, 2021

See: #2400

This is an initial draft, based on the community design doc. I wanted to make sure we're all in high-level alignment before I begin on the implementation specifics. That should be the only section left TODO. (2021-04-28) I've added @ike-ma's initial draft of the implementation specifics as well, so this should be ready for overall review.

I've tried to keep the scope as narrow as possible to reduce unknowns and make this as easy/clear to graduate as possible. This will necessarily mean (IMO) splitting any workload-level accounting for swap into a separate KEP.

/sig node
/cc @dims @SergeyKanzhelev @dchen1107 @derekwaynecarr @mrunalp
cc @ike-ma

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Apr 6, 2021
@nikhita
Copy link
Member

nikhita commented Apr 7, 2021

creating a (dummy) approve comment to re-trigger CI, the approval mechanism was broken for a bit - kubernetes/test-infra#21687

/approve

@ehashman ehashman mentioned this pull request Apr 7, 2021
60 tasks
keps/sig-node/2400-node-swap/README.md Outdated Show resolved Hide resolved
keps/sig-node/2400-node-swap/README.md Outdated Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot added the kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory label Apr 19, 2021
keps/sig-node/2400-node-swap/README.md Show resolved Hide resolved
keps/sig-node/2400-node-swap/README.md Show resolved Hide resolved
keps/sig-node/2400-node-swap/README.md Show resolved Hide resolved
keps/sig-node/2400-node-swap/README.md Outdated Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot added the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label Apr 29, 2021
@k8s-ci-robot
Copy link
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. and removed cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 29, 2021
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label Apr 29, 2021
@ehashman
Copy link
Member Author

@ike-ma it appears you haven't signed the Kubernetes CLA, can you please follow the instructions in the comment above to do so and run /check-cla when you're done?

keps/sig-node/2400-node-swap/README.md Outdated Show resolved Hide resolved
### Non-Goals

- Provisioning swap. Swap must already be available on the system.
- Setting [swappiness]. This can already be set on a system-wide level outside of Kubernetes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is meant by "system-wide level" here? Is it referring to vm.swappiness sysctl?

If that's the case, the downside I see there is it's system wide. For example maybe there are some usecases where you want certain workloads to swap often (and as such set a high swappiness), but for system workloads (e.g those in /system.slice, i.e. kubelet, container runtime, etc), you don't want them to swap at all (and want a zero swappiness). If it's a system wide setting I guess you lose the ability to make those types of adjustments.

Not sure how important that is and probably a post alpha thing, but maybe something to consider :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This KEP doesn't enable setting per-workload swap utilization, so I think it's YAGNI on per-workload swappiness configuration in the CRI. When we're ready to add per-workload swap configuration to k8s, then we should add it IMO.

but for system workloads (e.g those in /system.slice, i.e. kubelet, container runtime, etc), you don't want them to swap at all (and want a zero swappiness).

Typically one wouldn't run the container runtime, kubelet, etc. using k8s though, they're prerequisites for a node that run directly on the host. So potentially this is still individually configurable for them, no?

Copy link
Member

@bobbypage bobbypage May 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we're ready to add per-workload swap configuration to k8s, then we should add it IMO.

Make sense!

So potentially this is still individually configurable for them, no?

I guess what you would do in that case is set vm.swappiness sysctl so that it applies system wide, but then you would have to edit all of your systemd units for kubelet, container runtime etc and override their swap settings. Might be helpful to note somewhere if that's the recommended path to configure swappiness since I would imagine most existing kubelet/container runtime units don't override swap settings.

edit: actually looking at the systemd config options, I don't think it exposes overriding swap for individual units... https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html

keps/sig-node/2400-node-swap/README.md Outdated Show resolved Hide resolved
keps/sig-node/2400-node-swap/README.md Show resolved Hide resolved
keps/sig-node/2400-node-swap/README.md Outdated Show resolved Hide resolved
keps/sig-node/2400-node-swap/README.md Outdated Show resolved Hide resolved
keps/sig-node/2400-node-swap/README.md Outdated Show resolved Hide resolved
keps/sig-node/2400-node-swap/README.md Outdated Show resolved Hide resolved
keps/sig-node/2400-node-swap/README.md Outdated Show resolved Hide resolved
@deads2k
Copy link
Contributor

deads2k commented May 11, 2021

PRR looks good. An overall note when going to beta: I'm not clear from the proposal as a whole how a pod author

  1. indicates the container does or does not want to use swap
  2. knows the binary is using swap

@deads2k
Copy link
Contributor

deads2k commented May 11, 2021

PRR

/approve

Copy link
Member

@derekwaynecarr derekwaynecarr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A number of requests throughout, let me know what you think.

keps/sig-node/2400-node-swap/README.md Outdated Show resolved Hide resolved
keps/sig-node/2400-node-swap/README.md Outdated Show resolved Hide resolved
keps/sig-node/2400-node-swap/README.md Outdated Show resolved Hide resolved
keps/sig-node/2400-node-swap/README.md Outdated Show resolved Hide resolved
keps/sig-node/2400-node-swap/README.md Show resolved Hide resolved
keps/sig-node/2400-node-swap/README.md Outdated Show resolved Hide resolved
keps/sig-node/2400-node-swap/README.md Outdated Show resolved Hide resolved
keps/sig-node/2400-node-swap/README.md Outdated Show resolved Hide resolved
keps/sig-node/2400-node-swap/README.md Show resolved Hide resolved
keps/sig-node/2400-node-swap/README.md Outdated Show resolved Hide resolved
@ehashman ehashman requested a review from derekwaynecarr May 11, 2021 22:53
Copy link
Member

@derekwaynecarr derekwaynecarr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more requests, I think this is close to ready.

Let me know what you think.

1. Add a feature gate `NodeSwapEnabled` to enable swap support.
1. Leave the default value of kubelet flag `--fail-on-swap` to `true`, to avoid
changing default behaviour.
1. Introduce a new kubelet config parameter, `MemorySwapLimit`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would prefer to never introduce the MemorySwapLimit parameter.

I would update this section to say:

  1. kubelet detects available swap on startup (all available swap is basically allocatable)
  2. kubelet enables reservation of swap to reduce node allocatable swap (apha or beta period)

changing default behaviour.
1. Introduce a new kubelet config parameter, `MemorySwapLimit`.
1. Introduce a new CRI parameter, `memory_swap_limit_in_bytes`.
1. Integrate new kubelet config and pass values to CRI for container creation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a little more clarification.

The container runtime will write the swap settings to the container level cgroup. This will include ephemeral containers. We need to account for the proper setting that is applied at the pod cgroup boundary. Please list addressing this as a blocker to beta for the feature

keps/sig-node/2400-node-swap/README.md Outdated Show resolved Hide resolved
keps/sig-node/2400-node-swap/README.md Outdated Show resolved Hide resolved
keps/sig-node/2400-node-swap/README.md Outdated Show resolved Hide resolved
keps/sig-node/2400-node-swap/README.md Outdated Show resolved Hide resolved
keps/sig-node/2400-node-swap/README.md Outdated Show resolved Hide resolved
@ehashman
Copy link
Member Author

I believe all comments have been addressed. PTAL!

@SergeyKanzhelev
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 13, 2021
int64 memory_swap_limit_in_bytes = 9;
...
}
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to include this as part of CRI stats as well.

Copy link
Member

@derekwaynecarr derekwaynecarr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve
/lgtm

thanks for the updates, look forward to experimenting with this as it evolves.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, derekwaynecarr, ehashman, nikhita

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 13, 2021
@k8s-ci-robot k8s-ci-robot merged commit abc88e7 into kubernetes:master May 13, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.22 milestone May 13, 2021
@carlisia
Copy link

Hello @ehashman 👋, 1.22 Docs Shadow here.
This enhancement is marked as ‘Needs Docs’ for the 1.22 release.

Please follow the steps detailed in the documentation to open a PR against dev-1.22 branch in the k/website repo. This PR can be just a placeholder at this time and must be created before Fri July 9, 11:59 PM PDT.

Also, take a look at Documenting for a release to familiarize yourself with the docs requirement for the release.

Thank you! 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/node Categorizes an issue or PR as relevant to SIG Node. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.