
Fail on swap enabled and deprecate experimental-fail-swap-on flag #47181

Merged (1 commit, Aug 4, 2017)

Conversation

@dims (Member) commented Jun 8, 2017

What this PR does / why we need it:

* Deprecate the old experimental-fail-swap-on flag
* Add a new flag, fail-swap-on, and set it to true by default

Before this change, we would not fail when swap is on. With this change, we fail for everyone when swap is on, unless they explicitly set --fail-swap-on to false.

Which issue this PR fixes:

Fixes #34726

Special notes for your reviewer:

Release note:

Kubelet will now fail to start by default when swap is enabled. The experimental flag "--experimental-fail-swap-on" has been deprecated; please set the new "--fail-swap-on" flag to false if you wish to run with swap on.
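
As a hedged illustration of the opt-out described in this release note: a kubelet started with the new flag might look like the invocation below. Only --fail-swap-on itself comes from this PR; the other flags are placeholders for whatever an existing kubelet setup already uses.

    # hypothetical example: only --fail-swap-on is introduced by this PR,
    # the remaining flags stand in for an existing kubelet configuration
    kubelet --fail-swap-on=false \
        --kubeconfig=/etc/kubernetes/kubelet.conf \
        --pod-manifest-path=/etc/kubernetes/manifests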

@k8s-ci-robot added the cncf-cla: yes label on Jun 8, 2017
@k8s-github-robot added the kind/api-change, size/M, release-note-label-needed, and release-note labels and removed the release-note-label-needed label on Jun 8, 2017
@dims (Member, Author) commented Jun 10, 2017

/cc @mtaufen

@k8s-ci-robot requested a review from mtaufen on June 10, 2017
@kiall commented Jun 12, 2017

When people are running the kubelet locally on their workstations (we do this for some dev environment stuff), swap will likely be enabled. Can a --no-fail-on-swap or similar flag be added to do the opposite of --experimental-fail-swap-on?

@mtaufen (Contributor) commented Jun 12, 2017

@kiall that's a good point.

Maybe we should just print warnings in some of these cases and report conformance failures in a Node condition, rather than crashing the Kubelet? Or maybe we should discuss having a dev mode that relaxes some of the node conformance requirements?

@vishh what do you think?

@vishh (Contributor) commented Jun 12, 2017

Adding an opt-out for strict conformance sounds like a good idea for development. However, I'd prefer not to have an opt-out per feature/check. Can we have one global conformance opt-out flag instead?

@kiall commented Jun 12, 2017

> Adding an opt-out for strict conformance sounds like a good idea for development. However, I'd prefer not to have an opt-out per feature/check. Can we have one global conformance opt-out flag instead?

The problem I see with a global opt-out flag is that people will want to opt out of one thing (e.g. the swap check), and once they enable it, they'll never notice the 100 other things they should be doing but aren't. This may be OK, but it's something to consider :)

@vishh (Contributor) commented Jun 12, 2017

> The problem I see with a global opt-out flag is that people will want to opt out of one thing (e.g. the swap check), and once they enable it, they'll never notice the 100 other things they should be doing but aren't. This may be OK, but it's something to consider :)

Devs who opt out of error checking are expected to understand what they are doing.

@kiall commented Jun 12, 2017

Absolutely, but production deployments will opt out of some rules too, and many will likely never notice that the list of opt-outs behind a global flag has grown since they initially evaluated what they opted out of. Again, this may be OK in the context of e.g. certification, but certification is likely not the only reason to want to be aware of new differences between expected requirements and actual deployment environments.

@vishh (Contributor) commented Jun 12, 2017

In general, we try not to have the kubelet crash explicitly unless the underlying configuration is incompatible with the Kubelet's core functionality. So in theory I can see how a single flag cannot satisfy all use cases, but in practice, I think it will just work for the foreseeable future.

@mtaufen (Contributor) commented Jun 13, 2017

I'd support adding an explicit --dev-mode flag to handle these sorts of things (that should only ever be a flag; if anyone adds it, please never add it to the KubeletConfiguration type). We need to be careful that too much stuff doesn't get shoved under "dev mode", though. You probably want your dev Kubelet to be as close to production as possible. And conformance failures absolutely need to be logged as errors if you're in a "dev mode" that doesn't crash when the node fails conformance checks.

@djsly (Contributor) commented Jun 21, 2017

Just to provide a use case where we will need swap: currently, we have containers that mount /dev/shm and load a lot into RAM disks for cross-container memory sharing.

While we should try not to swap at all, there are cases where we do see swapping.

In the cloud, when you don't have swap and memory fills up, the VMs become unreachable after a while, since the OS will most probably start killing processes (sshd and friends).

I would say that having a flag to opt out of just the swap check would be a necessity for our use case.

@mtaufen (Contributor) commented Jun 21, 2017

If you see swapping caused by pressure from your containers, doesn't that mean the requests and limits need to be adjusted? Ideally requests and limits are set such that the scheduler helps you avoid triggering OOM-kills. Sorry if I'm misunderstanding your situation.
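
For reference (and not taken from djsly's manifests), the requests and limits mtaufen refers to live under each container's resources field; the values below are arbitrary examples:

    "resources": {
      "requests": { "cpu": "500m", "memory": "2Gi" },
      "limits": { "memory": "2Gi" }
    }

The scheduler places pods based on the requests, while the memory limit is what gets a container OOM-killed when exceeded, so sizing both realistically is what keeps a node from swapping or starving system daemons.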

@vishh (Contributor) commented Jun 22, 2017

@djsly

> While we should try not to swap at all, there are cases where we do see swapping.

What do you mean by "see swapping"?

> I would say that having a flag to opt out of just the swap check would be a necessity for our use case.

Kubelet expects swap to be disabled by default. Are you requesting that it should allow swapping?

@djsly (Contributor) commented Jun 22, 2017

@mtaufen @vishh
Bear with me, this use case is indeed quite unusual :)

First of all, we only set CPU requests in the manifest.

We also set "hostPID": true, and define a hostPath volume under /dev/shm:

    {
      "name": "memory",
      "hostPath": {
        "path": "/dev/shm/{{ namespace }}/{{ deployed_pool_unique_id }}"
      }
    },

Since we create a RAM disk with almost all of the available RAM, we have processes that fill it up with data for other containers to share. Because that memory is managed outside of the container, we can end up in situations where memory fills up, triggering some swapping.

This is not desired behavior, but it can occur on occasion.

When we moved to the cloud, we didn't have swap enabled, and this caused the cloud VMs to become unresponsive from time to time.

@djsly (Contributor) commented Jun 22, 2017

Note that I might not be understanding the whole swap + container story, so I'm not sure how swap can affect the memory usage of a Linux container.

@mtaufen (Contributor) commented Jun 27, 2017

Interesting. Would it be possible to run a container that does nothing, but sets memory requests and limits equal to the size of the RAM disk so that the Kubernetes scheduler can know you intend to allocate that much memory to the RAM disk? If the scheduler is aware, it should be able to avoid sending too many other containers to the Node, and then you should be able to both turn off swap and avoid OOM kills.

Would it help to slightly reduce the size of the RAM disk or slightly increase the size of the RAM? It may also be that your margin of safety is just too thin.
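
A minimal sketch of the reservation pod mtaufen describes, in the same JSON style as the volume snippet earlier in the thread; the pod name, the pause image, and the 30Gi figure are illustrative placeholders (substitute the actual size of your RAM disk):

    {
      "apiVersion": "v1",
      "kind": "Pod",
      "metadata": {
        "name": "ramdisk-reservation"
      },
      "spec": {
        "containers": [
          {
            "name": "reserve",
            "image": "gcr.io/google_containers/pause:3.0",
            "resources": {
              "requests": { "memory": "30Gi" },
              "limits": { "memory": "30Gi" }
            }
          }
        ]
      }
    }

With such a pod on the node, the scheduler subtracts the RAM-disk-sized request from the node's allocatable memory, so other workloads should not be packed in on top of the memory the RAM disk will consume.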

@djsly (Contributor) commented Jul 10, 2017

For best performance, we need to fill the RAM disk almost to its maximum capacity, which is all of the RAM. We use pod anti-affinity and affinity logic to make sure that only the pods that use the RAM disk are scheduled on those nodes.

There are edge cases where, during a rolling update, the RAM disk gets overfilled, hence the need for swap.
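
The thread doesn't show the affinity rules djsly uses; purely as an illustrative sketch, dedicating labeled nodes to RAM-disk pods and spreading those pods one per host could be expressed with a stanza like the following in a pod spec, where the ramdisk=true node label and the app=ramdisk-consumer pod label are made-up names:

    "affinity": {
      "nodeAffinity": {
        "requiredDuringSchedulingIgnoredDuringExecution": {
          "nodeSelectorTerms": [
            {
              "matchExpressions": [
                { "key": "ramdisk", "operator": "In", "values": ["true"] }
              ]
            }
          ]
        }
      },
      "podAntiAffinity": {
        "requiredDuringSchedulingIgnoredDuringExecution": [
          {
            "labelSelector": {
              "matchLabels": { "app": "ramdisk-consumer" }
            },
            "topologyKey": "kubernetes.io/hostname"
          }
        ]
      }
    }

The nodeAffinity term pins the pods to the labeled nodes, and the podAntiAffinity term keeps two RAM-disk consumers off the same host; keeping unrelated workloads away from those nodes entirely would additionally need taints, which the thread doesn't go into.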

@dims (Member, Author) commented Jul 26, 2017

I don't see a consensus; abandoning this for now.

@dims closed this on Jul 26, 2017
@mtaufen (Contributor) commented Jul 27, 2017

Re-opening. This discussion trailed off during the release but we should resolve it. I propose we deprecate the experimental flag name and replace it with fail-swap-on, and default this flag to true. That gives @djsly an out for his use case and we still enforce no swap for most users.

@mtaufen reopened this on Jul 27, 2017
@djsly (Contributor) commented Jul 27, 2017 via email

@k8s-ci-robot added the lgtm label on Aug 3, 2017
@mtaufen (Contributor) commented Aug 3, 2017

/assign @yujuhong for approval

@k8s-ci-robot (Contributor) commented

@mtaufen: GitHub didn't allow me to assign the following users: for, approval.

Note that only kubernetes members can be assigned.

In response to this:

> /assign @yujuhong for approval

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@yujuhong (Contributor) commented Aug 4, 2017

/approve

@dims (Member, Author) commented Aug 4, 2017

@cblecker / @fejta can one of you please take a look? (one-line change in hack/verify-flags/known-flags.txt)

@cblecker (Member) commented Aug 4, 2017

/approve
Hopefully we can get rid of the known flags bit soon (per kubernetes-dev@) 😄

@k8s-github-robot commented

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cblecker, dims, mtaufen, yujuhong

Associated issue: 34726

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot added the approved label on Aug 4, 2017
@k8s-github-robot commented

Automatic merge from submit-queue (batch tested with PRs 50119, 48366, 47181, 41611, 49547)

@ghost commented Oct 12, 2017

Can someone explain why Kubernetes shouldn't be run with swap enabled? Neither this pull request nor the referenced ticket gives any explanation for it.

@dims (Member, Author) commented Oct 12, 2017

@Jonast - Please see #53533

@dims deleted the fail-on-swap-enabled branch on November 16, 2017
Qambar added a commit to Qambar/provisioning that referenced this pull request Dec 25, 2017
The terraform apply command got stuck at a point when it was trying to
initialize all the hosts in the cluster. It kept on saying "Waiting for
API server to respond".

Further investigation showed that it was failing on preflight checks:

    module.kubernetes.null_resource.kubernetes[0] (remote-exec): [preflight] Some fatal errors occurred:
    module.kubernetes.null_resource.kubernetes[0] (remote-exec):   [ERROR Swap]: running with swap on is not supported. Please disable swap

This pre-flight check was enabled in kubernetes/kubernetes#47181

Since swap was being enabled in main.tf, I have removed that bit to stop Terraform from enabling swap on the host machines.

This closes hobby-kube/guide#25
@blackknight36 commented

I would also like to know why running with swap on is not supported.

@dims (Member, Author) commented Jan 24, 2018

@blackknight36 please see #53533

@vherron12 commented

Hi Sir,

How do I deal with this issue? I'm running CentOS 7 on Vagrant.

    [root@k8s-master ~]# kubeadm init \
        --kubernetes-version=v1.9.0 \
        --pod-network-cidr=10.244.0.0/16 \
        --apiserver-advertise-address=192.168.25.10 \
        --ignore-preflight-errors=Swap
[init] Using Kubernetes version: v1.9.0
[init] Using Authorization modes: [Node RBAC]
[preflight] Running pre-flight checks.
[WARNING FileExisting-crictl]: crictl not found in system path
[preflight] Starting the kubelet service
[certificates] Generated ca certificate and key.
[certificates] Generated apiserver certificate and key.
[certificates] apiserver serving cert is signed for DNS names [k8s-master kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.25.10]
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] Generated sa key and public key.
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "scheduler.conf"
[controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests".
[init] This might take a minute or longer if the control plane images have to be pulled.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.

Unfortunately, an error has occurred:
timed out waiting for the condition

This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
- There is no internet connection, so the kubelet cannot pull the following control plane images:
- gcr.io/google_containers/kube-apiserver-amd64:v1.9.0
- gcr.io/google_containers/kube-controller-manager-amd64:v1.9.0
- gcr.io/google_containers/kube-scheduler-amd64:v1.9.0

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
couldn't initialize a Kubernetes cluster
[root@k8s-master ~]#

@vherron12 posted two follow-up comments containing only screenshots.


Successfully merging this pull request may close these issues.

TODO: Remove the opt-in for failing when swap is enabled.