
Fail on swap enabled and deprecate experimental-fail-swap-on flag #47181

Merged (1 commit, Aug 4, 2017)

Conversation

@dims (Member) commented Jun 8, 2017

What this PR does / why we need it:

* Deprecate the old experimental-fail-swap-on flag
* Add a new flag, fail-swap-on, and set it to true by default

Before this change, we would not fail when swap is on. With this change, we fail for everyone when swap is on, unless they explicitly set --fail-swap-on to false.

Which issue this PR fixes:

Fixes #34726

Special notes for your reviewer:

Release note:

Kubelet will now fail to start by default when swap is enabled. The experimental flag "--experimental-fail-swap-on" has been deprecated; please set the new "--fail-swap-on" flag to false if you wish to run with swap on.
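
As a hedged illustration of the opt-out described in this release note: a kubelet started with the new flag might look like the invocation below. Only --fail-swap-on itself comes from this PR; the other flags are placeholders for whatever an existing kubelet setup already uses.

    # hypothetical example: only --fail-swap-on is introduced by this PR,
    # the remaining flags stand in for an existing kubelet configuration
    kubelet --fail-swap-on=false \
        --kubeconfig=/etc/kubernetes/kubelet.conf \
        --pod-manifest-path=/etc/kubernetes/manifests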

@k8s-ci-robot added the cncf-cla: yes label on Jun 8, 2017
@k8s-github-robot added the kind/api-change, size/M, release-note-label-needed, and release-note labels and removed the release-note-label-needed label on Jun 8, 2017
@dims (Member, Author) commented Jun 10, 2017

/cc @mtaufen

@k8s-ci-robot requested a review from mtaufen on June 10, 2017
@kiall commented Jun 12, 2017

When people are running the kubelet locally on their workstations (we do this for some dev environment stuff), swap will likely be enabled. Can a --no-fail-on-swap or similar flag be added to do the opposite of --experimental-fail-swap-on?

@mtaufen (Contributor) commented Jun 12, 2017

@kiall that's a good point.

Maybe we should just print warnings in some of these cases and report conformance failures in a Node condition, rather than crashing the Kubelet? Or maybe we should discuss having a dev mode that relaxes some of the node conformance requirements?

@vishh what do you think?

@vishh (Contributor) commented Jun 12, 2017

Adding an opt-out for strict conformance sounds like a good idea for development. However, I'd prefer not to have an opt-out per feature/check. Can we have one global conformance opt-out flag instead?

@kiall commented Jun 12, 2017

> Adding an opt-out for strict conformance sounds like a good idea for development. However, I'd prefer not to have an opt-out per feature/check. Can we have one global conformance opt-out flag instead?

The problem I see with a global opt-out flag is that people will want to opt out of one thing (e.g. the swap check), and once they enable it, they'll never notice the 100 other things they should be doing but aren't. This may be OK, but it's something to consider :)

@vishh (Contributor) commented Jun 12, 2017

> The problem I see with a global opt-out flag is that people will want to opt out of one thing (e.g. the swap check), and once they enable it, they'll never notice the 100 other things they should be doing but aren't. This may be OK, but it's something to consider :)

Devs who opt out of error checking are expected to understand what they are doing.

@kiall commented Jun 12, 2017

Absolutely, but production deployments will opt out of some rules too, and many will likely never notice that the list of opt-outs behind a global flag has grown since they initially evaluated what they opted out of. Again, this may be OK in the context of e.g. certification, but certification is likely not the only reason to want to be aware of new differences between expected requirements and actual deployment environments.

@vishh (Contributor) commented Jun 12, 2017

In general, we try not to have the kubelet crash explicitly unless the underlying configuration is incompatible with the Kubelet's core functionality. So in theory I can see how a single flag cannot satisfy all use cases, but in practice, I think it will just work for the foreseeable future.

@mtaufen (Contributor) commented Jun 13, 2017

I'd support adding an explicit --dev-mode flag to handle these sorts of things (that should only ever be a flag; if anyone adds it, please never add it to the KubeletConfiguration type). We need to be careful that too much stuff doesn't get shoved under "dev mode", though. You probably want your dev Kubelet to be as close to production as possible. And conformance failures absolutely need to be logged as errors if you're in a "dev mode" that doesn't crash when the node fails conformance checks.

@djsly (Contributor) commented Jun 21, 2017

Just to provide a use case where we will need swap: currently, we have containers that mount /dev/shm and load a lot into RAM disks for cross-container memory sharing.

While we should try not to swap at all, there are cases where we do see swapping.

In the cloud, when you don't have swap and memory fills up, the VMs become unreachable after a while, since the OS will most probably start killing processes (sshd and friends).

I would say that having a flag to opt out of just the swap check would be a necessity for our use case.

@mtaufen (Contributor) commented Jun 21, 2017

If you see swapping caused by pressure from your containers, doesn't that mean the requests and limits need to be adjusted? Ideally requests and limits are set such that the scheduler helps you avoid triggering OOM-kills. Sorry if I'm misunderstanding your situation.
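
For reference (and not taken from djsly's manifests), the requests and limits mtaufen refers to live under each container's resources field; the values below are arbitrary examples:

    "resources": {
      "requests": { "cpu": "500m", "memory": "2Gi" },
      "limits": { "memory": "2Gi" }
    }

The scheduler places pods based on the requests, while the memory limit is what gets a container OOM-killed when exceeded, so sizing both realistically is what keeps a node from swapping or starving system daemons.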

@vishh (Contributor) commented Jun 22, 2017

@djsly

> While we should try not to swap at all, there are cases where we do see swapping.

What do you mean by "see swapping"?

> I would say that having a flag to opt out of just the swap check would be a necessity for our use case.

Kubelet expects swap to be disabled by default. Are you requesting that it should allow swapping?

@djsly (Contributor) commented Jun 22, 2017

@mtaufen @vishh
Bear with me, this use case is indeed quite unusual :)

First of all, we only set CPU requests in the manifest.

We also set "hostPID": true, and define a hostPath volume under /dev/shm:

    {
      "name": "memory",
      "hostPath": {
        "path": "/dev/shm/{{ namespace }}/{{ deployed_pool_unique_id }}"
      }
    },

Since we create a RAM disk with almost all of the available RAM, we have processes that fill it up with data for other containers to share. Because that memory is managed outside of the container, we can end up in situations where memory fills up, triggering some swapping.

This is not desired behavior, but it can occur on occasion.

When we moved to the cloud, we didn't have swap enabled, and this caused the cloud VMs to become unresponsive from time to time.

@djsly (Contributor) commented Jun 22, 2017

Note that I might not be understanding the whole swap + container story, so I'm not sure how swap can affect the memory usage of a Linux container.

@mtaufen (Contributor) commented Jun 27, 2017

Interesting. Would it be possible to run a container that does nothing, but sets memory requests and limits equal to the size of the RAM disk so that the Kubernetes scheduler can know you intend to allocate that much memory to the RAM disk? If the scheduler is aware, it should be able to avoid sending too many other containers to the Node, and then you should be able to both turn off swap and avoid OOM kills.

Would it help to slightly reduce the size of the RAM disk or slightly increase the size of the RAM? It may also be that your margin of safety is just too thin.
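
A minimal sketch of the reservation pod mtaufen describes, in the same JSON style as the volume snippet earlier in the thread; the pod name, the pause image, and the 30Gi figure are illustrative placeholders (substitute the actual size of your RAM disk):

    {
      "apiVersion": "v1",
      "kind": "Pod",
      "metadata": {
        "name": "ramdisk-reservation"
      },
      "spec": {
        "containers": [
          {
            "name": "reserve",
            "image": "gcr.io/google_containers/pause:3.0",
            "resources": {
              "requests": { "memory": "30Gi" },
              "limits": { "memory": "30Gi" }
            }
          }
        ]
      }
    }

With such a pod on the node, the scheduler subtracts the RAM-disk-sized request from the node's allocatable memory, so other workloads should not be packed in on top of the memory the RAM disk will consume.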

@djsly (Contributor) commented Jul 10, 2017

For best performance, we need to fill the RAM disk almost to its maximum capacity, which is all of the RAM. We use pod anti-affinity and affinity logic to make sure that only the pods that use the RAM disk are scheduled on those nodes.

There are edge cases where, during a rolling update, the RAM disk gets overfilled, hence the need for swap.
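
The thread doesn't show the affinity rules djsly uses; purely as an illustrative sketch, dedicating labeled nodes to RAM-disk pods and spreading those pods one per host could be expressed with a stanza like the following in a pod spec, where the ramdisk=true node label and the app=ramdisk-consumer pod label are made-up names:

    "affinity": {
      "nodeAffinity": {
        "requiredDuringSchedulingIgnoredDuringExecution": {
          "nodeSelectorTerms": [
            {
              "matchExpressions": [
                { "key": "ramdisk", "operator": "In", "values": ["true"] }
              ]
            }
          ]
        }
      },
      "podAntiAffinity": {
        "requiredDuringSchedulingIgnoredDuringExecution": [
          {
            "labelSelector": {
              "matchLabels": { "app": "ramdisk-consumer" }
            },
            "topologyKey": "kubernetes.io/hostname"
          }
        ]
      }
    }

The nodeAffinity term pins the pods to the labeled nodes, and the podAntiAffinity term keeps two RAM-disk consumers off the same host; keeping unrelated workloads away from those nodes entirely would additionally need taints, which the thread doesn't go into.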

@dims (Member, Author) commented Jul 26, 2017

I don't see a consensus; abandoning this for now.

@dims closed this on Jul 26, 2017
@mtaufen (Contributor) commented Jul 27, 2017

Re-opening. This discussion trailed off during the release but we should resolve it. I propose we deprecate the experimental flag name and replace it with fail-swap-on, and default this flag to true. That gives @djsly an out for his use case and we still enforce no swap for most users.

@mtaufen reopened this on Jul 27, 2017
@djsly (Contributor) commented Jul 27, 2017 via email

@k8s-ci-robot added the lgtm label on Aug 3, 2017
@mtaufen (Contributor) commented Aug 3, 2017

/assign @yujuhong for approval

@k8s-ci-robot (Contributor) commented

@mtaufen: GitHub didn't allow me to assign the following users: for, approval.

Note that only kubernetes members can be assigned.

In response to this:

> /assign @yujuhong for approval

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@yujuhong (Contributor) commented Aug 4, 2017

/approve

@dims (Member, Author) commented Aug 4, 2017

@cblecker / @fejta can one of you please take a look? (one-line change in hack/verify-flags/known-flags.txt)

@cblecker (Member) commented Aug 4, 2017

/approve
Hopefully we can get rid of the known flags bit soon (per kubernetes-dev@) 😄

@k8s-github-robot commented

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cblecker, dims, mtaufen, yujuhong

Associated issue: 34726

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot added the approved label on Aug 4, 2017
@k8s-github-robot commented

Automatic merge from submit-queue (batch tested with PRs 50119, 48366, 47181, 41611, 49547)

@ghost commented Oct 12, 2017

Can someone explain why Kubernetes shouldn't be run with swap enabled? Neither this pull request nor the referenced ticket gives any explanation for it.

@dims (Member, Author) commented Oct 12, 2017

@Jonast - Please see #53533

@dims deleted the fail-on-swap-enabled branch on November 16, 2017
Qambar added a commit to Qambar/provisioning that referenced this pull request Dec 25, 2017
The terraform apply command got stuck at a point when it was trying to
initialize all the hosts in the cluster. It kept on saying "Waiting for
API server to respond".

Further investigation showed that it was failing on preflight checks:

    module.kubernetes.null_resource.kubernetes[0] (remote-exec): [preflight] Some fatal errors occurred:
    module.kubernetes.null_resource.kubernetes[0] (remote-exec):   [ERROR Swap]: running with swap on is not supported. Please disable swap

This pre-flight check was enabled in kubernetes/kubernetes#47181

Since swap was being enabled in main.tf, I have removed that bit to stop Terraform from enabling swap on the host machines.

This closes hobby-kube/guide#25
@blackknight36 commented

I would also like to know why running with swap on is not supported.

@dims (Member, Author) commented Jan 24, 2018

@blackknight36 please see #53533

@vherron12 commented

Hi Sir,

How do I deal with this issue? I'm running CentOS 7 on Vagrant.

    [root@k8s-master ~]# kubeadm init \
        --kubernetes-version=v1.9.0 \
        --pod-network-cidr=10.244.0.0/16 \
        --apiserver-advertise-address=192.168.25.10 \
        --ignore-preflight-errors=Swap
[init] Using Kubernetes version: v1.9.0
[init] Using Authorization modes: [Node RBAC]
[preflight] Running pre-flight checks.
[WARNING FileExisting-crictl]: crictl not found in system path
[preflight] Starting the kubelet service
[certificates] Generated ca certificate and key.
[certificates] Generated apiserver certificate and key.
[certificates] apiserver serving cert is signed for DNS names [k8s-master kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.25.10]
[certificates] Generated apiserver-kubelet-client certificate and key.
[certificates] Generated sa key and public key.
[certificates] Generated front-proxy-ca certificate and key.
[certificates] Generated front-proxy-client certificate and key.
[certificates] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[kubeconfig] Wrote KubeConfig file to disk: "admin.conf"
[kubeconfig] Wrote KubeConfig file to disk: "kubelet.conf"
[kubeconfig] Wrote KubeConfig file to disk: "controller-manager.conf"
[kubeconfig] Wrote KubeConfig file to disk: "scheduler.conf"
[controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests".
[init] This might take a minute or longer if the control plane images have to be pulled.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.

Unfortunately, an error has occurred:
timed out waiting for the condition

This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
- There is no internet connection, so the kubelet cannot pull the following control plane images:
- gcr.io/google_containers/kube-apiserver-amd64:v1.9.0
- gcr.io/google_containers/kube-controller-manager-amd64:v1.9.0
- gcr.io/google_containers/kube-scheduler-amd64:v1.9.0

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
couldn't initialize a Kubernetes cluster
[root@k8s-master ~]#

@vherron12 posted two follow-up comments containing only screenshots.


Successfully merging this pull request may close these issues.

TODO: Remove the opt-in for failing when swap is enabled.