Kubelet/Kubernetes should work with Swap Enabled #53533
Support for swap is non-trivial. Guaranteed pods should never require swap. Burstable pods should have their requests met without requiring swap. BestEffort pods have no guarantee. The kubelet right now lacks the smarts to provide the right amount of predictable behavior here across pods. We discussed this topic at the resource mgmt face to face earlier this year. We are not super interested in tackling this in the near term relative to the gains it could realize. We would prefer to improve reliability around pressure detection, and optimize issues around latency before trying to optimize for swap, but if this is a higher priority for you, we would love your help. |
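For readers unfamiliar with the QoS classes mentioned above, here is a rough sketch; the pod name below is a placeholder, and the class assigned by the kubelet is reported under `status.qosClass`:

```sh
# QoS classes referenced above, in short:
#   Guaranteed  - every container sets requests == limits for both cpu and memory
#   Burstable   - at least one container sets a request or limit, but not all match
#   BestEffort  - no container sets any requests or limits
# Check which class a running pod was assigned (pod name is a placeholder):
kubectl get pod my-pod -o jsonpath='{.status.qosClass}'
```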
/kind feature |
@derekwaynecarr thank you for the explanation! It was hard to find any information/documentation on why swap should be disabled for Kubernetes. This was the main reason I opened this topic. At this point this is not a high priority for me; I just wanted to be sure we have a place where it can be discussed. |
There is more context in the discussion here: #7294 – having swap available has very strange and bad interactions with memory limits. For example, a container that hits its memory limit would then start spilling over into swap (this appears to be fixed since f4edaf2 – they won't be allowed to use any swap whether it's there or not). |
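To see that fix in practice, here is a minimal sketch for a cgroup v1 node with swap accounting enabled; the cgroup path below is an assumption and varies by runtime and cgroup driver:

```sh
# A container whose memory+swap limit equals its memory limit cannot use swap,
# even when the node itself has swap enabled.
CG=/sys/fs/cgroup/memory/kubepods/burstable/pod<pod-uid>/<container-id>  # placeholder path
cat "$CG/memory.limit_in_bytes"        # memory limit taken from the pod spec
cat "$CG/memory.memsw.limit_in_bytes"  # memory+swap limit; equal values mean no swap for this cgroup
```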
This is a critical use case for us too. We have a cron job that occasionally runs into high memory usage (>30GB), and we don't want to permanently allocate 40+GB nodes. Also, given that we run in three zones (GKE), this will allocate 3 such machines (1 in each zone). And this configuration has to be repeated in 3+ production instances and 10+ test instances, which makes using K8s super expensive. We are forced to run 25+ 48GB nodes, which incurs a huge cost! |
A workaround for those who really want swap. If you
That's what we're doing. Or at least, I'm pretty sure it is; I didn't implement it personally, but that's what I gather. This might only really be a viable strategy if none of your containers ever specify an explicit memory requirement... |
We run in GKE, and I don't know of a way to set those options. |
I'd be open to considering adopting zswap if someone can evaluate the implications to memory evictions in kubelet. |
I am running Kubernetes on my local Ubuntu laptop, and with each restart I have to turn off swap. I also have to worry about staying away from the memory limit since swap is off. Is there any way to avoid turning off swap after each restart, such as a configuration-file change in the existing installation? I don't need swap on nodes running in a cluster; it's just the other applications on my laptop, outside the Kubernetes local dev cluster, that need swap turned on.
Right now the flag is not working.
|
Set the following Kubelet flag: `--fail-swap-on=false`
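For a persistent setting rather than a one-off flag, the same knob exists in the kubelet config file as `failSwapOn`; a minimal sketch, assuming a kubeadm-style node where the config lives at /var/lib/kubelet/config.yaml:

```sh
# Tell the kubelet to tolerate enabled swap across restarts.
# Assumes failSwapOn is not already present in the file; merge by hand if it is.
echo 'failSwapOn: false' | sudo tee -a /var/lib/kubelet/config.yaml
sudo systemctl restart kubelet
```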
|
thanks @mtaufen |
For systems that bootstrap the cluster for you (like Terraform), you may need to modify the kubelet service file. This worked for me:
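The service-file change referred to here is roughly the following; a sketch only, assuming a systemd-managed kubelet (e.g. a kubeadm install) where drop-ins under /etc/systemd/system/kubelet.service.d/ are honored:

```sh
# Add a drop-in that passes --fail-swap-on=false on every kubelet start.
# The file name and the KUBELET_EXTRA_ARGS variable are assumptions; kubeadm's
# default unit reads that variable, other installs may wire flags differently.
sudo mkdir -p /etc/systemd/system/kubelet.service.d
cat <<'EOF' | sudo tee /etc/systemd/system/kubelet.service.d/90-allow-swap.conf
[Service]
Environment="KUBELET_EXTRA_ARGS=--fail-swap-on=false"
EOF
sudo systemctl daemon-reload
sudo systemctl restart kubelet
```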
|
Not supporting swap as a default? I was surprised to hear this; I thought Kubernetes was ready for prime time, and swap is one of those features. This is not really optional in most open use cases: it is how the Unix ecosystem is designed to run, with the VMM swapping out inactive pages. If the choice is no swap or no memory limits, I'll choose to keep swap any day, spin up more hosts when I start paging, and I will still come out saving money.

Can somebody clarify: is memory eviction only a problem if you are using memory limits in the pod definition, but otherwise it is okay?

It would be nice to work in a world where I have enough control over how an application uses memory that I don't have to worry about poor memory usage, but most applications have plenty of inactive memory space. I honestly think this recent move to run servers without swap is driven by the PaaS providers trying to coerce people into larger memory instances, while disregarding ~40 years of memory management design. The reality is that the kernel is really good at knowing which memory pages are active or not; let it do its job. |
This also has the effect that if memory gets exhausted on the node, the node can potentially lock up completely, requiring a restart, rather than just slowing down and recovering a while later. |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. |
|
@abdennour That doesn't solve the issue; you're just disabling swap. Depending on your workload, that may or may not be viable, as has already been pointed out in this issue. |
Is there any actual downside to leaving swap on and setting `--fail-swap-on=false`? I understand the decision/recommendation, and I understand it will take some time to work through reconsidering it, but regardless of how that goes, am I correct in thinking that the only immediate downside is over-committed memory and the resulting degraded memory performance for some workloads on the margin? The scheduler does not take swap into account when determining node resources, right, and the OOMKiller will still come around and kill processes that escape their limits; theoretically a node with no |
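On the scheduler point: node capacity and allocatable memory are reported from physical RAM only, so swap does not enter scheduling decisions either way. A quick check, with the node name as a placeholder:

```sh
# Compare what the scheduler sees with what the node actually has.
kubectl describe node my-node | grep -A 6 'Capacity:'   # memory here excludes swap
# ...and on the node itself:
free -h                                                  # swap shown here is invisible to the scheduler
```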
Swap KEP draft is up at kubernetes/enhancements#2602. The feature is tracked at kubernetes/enhancements#2400. Aiming for an alpha MVP for the 1.22 release (the upcoming one). PTAL! |
(And sorry, I realized I assigned this and left everyone hanging - I've sent out some emails to the mailing list and we have been iterating on a design doc I used to develop the draft KEP above.) |
I'm new to Kubernetes and just learned about this issue; I think swap should get at least very minimal support, but Kubernetes should aim for broader support of other scenarios. For example, I'm planning to have 3 very small nodes running 1 pod each, and to use Kubernetes mainly for replication and failover. Nothing fancy, just one big app on three VPS. When the amount of memory on the host is small, having swap is critical for the stability of the host system. A Linux distribution does not run on constant or pre-allocated memory, so there is always a chance that something in the host OS could produce a surge in memory, and without swap the OOM killer would be invoked. In my experience, when the Linux OOM killer kicks in the results are not good, and configuring it properly requires detailed knowledge of how your particular OS installation behaves, what's critical and what's not.

Following this train of thought, my problem is more about Kubernetes requiring the sysadmin to disable swap entirely on the node than about having proper swap support for the pods. Showing a nasty warning instead of failing to start seems like a better option to me than requiring a flag to be set.

Having proper swap support for pods also sounds really interesting, as it can make the nodes very dense, which can be interesting for certain applications. (Some apps preallocate a lot of memory, touch it, and almost never go back to it.) We're also seeing faster drives lately, PCIe 4.0 and new standards for using drives as memory; with these, moving pages back from disk to memory is fast enough to consider swapping as an option to get more stuff packed per server.

My point here is basically: 1) I believe swap support is needed. 2) Kubernetes doesn't need to get from 0 to 100 in one shot; there are lots of middle options that are also reasonably valid and would mitigate the majority of issues people have with removing swap entirely. |
Since we have a lot of folks commenting on this issue who are new to the Kubernetes development process, I'll try to explain what I linked above a bit more clearly.
At this point I don't think there's opposition to implementing some kind of swap support, it's just a matter of doing so carefully and in a way that will address most use cases without adding too much complexity or breaking existing users. |
My proposal has been accepted for the 1.22 cycle. We will proceed with the implementation described in the design doc: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2400-node-swap |
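For anyone who wants to try the alpha described in that KEP once it lands in 1.22, here is a minimal sketch of the kubelet configuration; the gate and field names come from KEP-2400 and may change as the feature matures, and the file path is an assumption:

```sh
# Enable the alpha node swap support: the kubelet must tolerate swap,
# the NodeSwap feature gate must be on, and swapBehavior selects the policy
# (LimitedSwap restricts workload swap use, UnlimitedSwap lets pods use node swap).
cat <<'EOF' | sudo tee /var/lib/kubelet/config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
failSwapOn: false
featureGates:
  NodeSwap: true
memorySwap:
  swapBehavior: UnlimitedSwap
EOF
sudo systemctl restart kubelet
```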
/triage accepted |
Spectacular work, @ehashman! Thank you so much for driving this through. |
Greetings, friends! This issue is now closed, as my PR has merged and we now have alpha swap support available in Kubernetes. It should be available in the next 1.22 branch release cut (v1.22.0-beta.1). There are a few things to keep in mind:
|
Is kubernetes/enhancements#2400 the best place to keep an eye out for that work? |
Yup, that's right. Future work will be tracked on that issue. I think our beta criteria are mostly solid, so it's a matter of whether we'll be able to get all the work done for beta next release, as there is a lot to do. Then there will be some lag time between beta and GA as we gather feedback and make updates. Help definitely wanted! If anyone following this issue wants to jump in, you can reach out to me at ehashman at redhat dot com or on k8s Slack (@ehashman). |
Just to confirm my assumption: with swap, the OS can avoid crashing when memory overflows. If k8s/CRI, as software, does not allow processes to use swap, will that be difficult or cause problems? |
I have been trying to find workarounds for swap, and eventually I just wrote a user-space swap solution using mmap. I've been using it for a week now and it seems to work pretty well: https://github.com/misko/bigmaac.git . Not sure if this helps anyone here; to swap a process use |
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
Kubelet/Kubernetes 1.8 does not work with swap enabled on Linux machines.
I have found this original issue #31676
This PR #31996
and last change which enabled it by default 71e8c8e
If Kubernetes does not know how to handle memory eviction when swap is enabled, it should find a way to do so, rather than asking users to get rid of swap.
See, for example, kernel.org's Chapter 11, Swap Management.
When running a lot of Node/Java applications, I have always seen many pages get swapped out, simply because they aren't used anymore.
What you expected to happen:
Kubelet/Kubernetes should work with swap enabled. Instead of disabling swap and giving users no choice, I believe Kubernetes should support more use cases and various workloads; some of them may be applications that rely on caches.
I am not sure how Kubernetes decides what to kill during memory eviction, but considering that Linux already has this capability, maybe it should align with how Linux does it? https://www.kernel.org/doc/gorman/html/understand/understand016.html
I would suggest rolling back the change that fails when swap is enabled, and revisiting how memory eviction currently works in Kubernetes. Swap can be important for some workloads.
How to reproduce it (as minimally and precisely as possible):
Run kubernetes/kubelet with default settings on a Linux box
Anything else we need to know?:
Environment:
- Kubernetes version (use `kubectl version`):
- Kernel (e.g. `uname -a`):
/sig node
cc @mtaufen @vishh @derekwaynecarr @dims