Local Ephemeral Storage limit not working #78865
Can you share monitoring showing that the pod exceeds its limit for an extended period of time (a few minutes)?
containers:
  State: Running

kubectl get po busybox-7cc68d968c-mb47z -n testns
kubectl exec -it busybox-7cc68d968c-mb47z -n testns -- bash
bash-4.2$ exit
kubectl get po busybox-7cc68d968c-mb47z -n testns
I was able to recreate this issue. For the node that was being tested, an EBS volume was attached to the instance and mounted as an xfs volume. The deployment with resources set had a pod scheduled to the node in question. After exec'ing into the pod and creating a file larger than the set ephemeral-storage limit of 500Mi, the pod was not evicted, even after waiting up to 10 minutes.
Starting again with a fresh volume and deployment: initially creating a 4G file within the new pod, with an underlying 5G volume mounted, caused the imageGCManager to kick in due to the node DiskPressure condition rather than honouring the ephemeral limits and evicting that one pod first.
The ranking for eviction, however, was correct, and the pod
cc @kubernetes/sig-storage-bugs @jingxu97
@pickledrick @arunbpt7 Could you please share your pod yaml file? You can also email me jinxu at google.com if you prefer. Thanks!
@arunbpt7 did you miss some part of the yaml file?
@arunbpt7 can you query the summary API (localhost:10255/stats/summary) on the node the pod is running on, to make sure it is measuring disk space correctly?
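For reference, this is the kind of query meant here — a minimal sketch, assuming the kubelet's read-only port (10255) is enabled and `jq` is available on the node:

```bash
# Dump per-pod ephemeral-storage usage from the kubelet summary API.
curl -s http://localhost:10255/stats/summary \
  | jq '.pods[] | {pod: .podRef.name, usedBytes: .["ephemeral-storage"].usedBytes}'
```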
Our tests for this are not super consistent: https://k8s-testgrid.appspot.com/sig-node-kubelet#node-kubelet-serial&include-filter-by-regex=LocalStorageCapacityIsolationEviction, but are mostly green. I'll try and bump the timeout on the serial tests to see if we can get a clearer signal.
I ran curl -s http://localhost:10255/stats/summary on the node where the pod is running and it shows nothing.
It sounds like that is probably your problem then. If you don't have any metrics, the kubelet can't do its monitoring or eviction. Can you share your kubelet logs, or see if there are any errors related to metrics?
@pickledrick, can you share the kubelet logs?
Hi all, the insecure stats API appears to be deprecated.
I'm not able to reproduce it with a 1.13.5 cluster; I'm having the following
@pickledrick - if you have access to generated certs you can use the secure port
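A sketch of two ways to reach the secure endpoint (not from the thread; the node name is a placeholder and the cert paths are assumptions based on a typical kubeadm layout):

```bash
# Option 1: go through the API server's node proxy instead of hitting the kubelet directly.
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/stats/summary" | jq '.pods[].podRef.name'

# Option 2: hit the kubelet's secure port (10250) with client certs the kubelet trusts.
curl -sk \
  --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt \
  --key  /etc/kubernetes/pki/apiserver-kubelet-client.key \
  https://localhost:10250/stats/summary
```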
Hi @yastij, I'm not able to reproduce. It seems that in my environment evictions are happening eventually. I am gathering more information to see if there is something else set in the original reporter's configuration.
/var/lib/docker is a separate filesystem from the node root fs.
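(A quick way to confirm that layout on a node — just a sketch, not from the thread — is to compare the filesystems backing the root and the docker directory:)

```bash
# Different devices / "Mounted on" values mean /var/lib/docker is a separate filesystem.
df -h / /var/lib/docker
```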
Hi @arunbpt7, yes, my test environment reflects this. Are you able to confirm the Docker version you are using in this environment?
Ran into something similar to this as well, but on a
This happens with the writable layer when the runtime directory (e.g. /var/lib/crio, /var/lib/docker) is not on the kubernetes root filesystem; it's due to this code in the eviction manager. It's not apparent to me why this was done, and the author of the code (in changeset 27901ad) appears to have moved on. I'm planning to open a PR on it.
Issues go stale after 90d of inactivity. If this issue is safe to close now, please do so. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
Issues go stale after 90d of inactivity. If this issue is safe to close now, please do so. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now, please do so. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closing this issue.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Hello, I got the same issue on K8s 1.16 and my Docker image is 200GB. I have plenty of space in /var/lib/docker (450 GB free out of 500 GB) and I am still getting an "ephemeral storage" error. Can someone tell me what the fix should be?
I have this error: "The node was low on resource: ephemeral-storage. Container k8stst was using 112619704Ki, which exceeds its request of 0."
Hm, maybe not 100% related to this issue, but I had a problem that was due to the fact that I had two filesystems, and Kubernetes couldn't handle that. See kubernetes/enhancements#361 (comment). I ended up mounting the filesystem and making docker and the kubelet use the same partition via symlinks, which solved the issue.
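For illustration only, a rough sketch of that kind of symlink workaround, assuming a spare partition mounted at /data and default docker/kubelet directories; the paths are assumptions, and both services need to be stopped before moving anything:

```bash
# Stop the services before relocating their state directories.
systemctl stop kubelet docker

# Move both state directories onto the same partition.
mv /var/lib/docker /data/docker
mv /var/lib/kubelet /data/kubelet

# Symlink the original paths so docker and the kubelet end up on one filesystem.
ln -s /data/docker /var/lib/docker
ln -s /data/kubelet /var/lib/kubelet

systemctl start docker kubelet
```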
@arunbpt7 opened an issue in kubernetes/enhancements. I am moving it here.
/kind bug
/priority important-longterm
/sig node
As discussed in #361, I am looking for a solution to restrict ephemeral storage usage per pod. Ephemeral storage is shared across all pods, and the pods' writable layers and logs keep filling up /var/lib/docker, causing high utilization of that filesystem. If there were a way to restrict ephemeral storage per pod, for example a defined size (say 20G), then each pod could use only 20G of ephemeral storage and would need persistent volumes for any larger storage requirement; the remaining space in /var/lib/docker would stay available to the other pods, each likewise restricted to its own 20G.
I have defined an ephemeral-storage request and limit in resources (spec.hard.requests.ephemeral-storage, spec.hard.limits.ephemeral-storage) on the deployment and verified that evictionHard is enabled for "imagefs" and "nodefs" on the node. But after deploying the pod, it is not restricted to the defined ephemeral storage: when creating a large file inside the container, it can still create files larger than the ephemeral-storage request and limit.
evictionHard:
  imagefs.available: 15%
  memory.available: 100Mi
  nodefs.available: 10%
  nodefs.inodesFree: 5%

containers:
- image:
  resources:
    requests:
      ephemeral-storage: "500Mi"
    limits:
      ephemeral-storage: "500Mi"
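For anyone trying to reproduce this, a minimal check — a sketch using the pod name from earlier in the thread, with illustrative sizes — is to write past the 500Mi limit inside the container and then watch for the eviction (the kubelet only evaluates usage periodically, so it is not immediate):

```bash
# Write ~600Mi inside the container, exceeding the 500Mi ephemeral-storage limit.
kubectl exec -n testns busybox-7cc68d968c-mb47z -- dd if=/dev/zero of=/tmp/fill bs=1M count=600

# Watch the pod; if enforcement works, it should eventually be evicted.
kubectl get pod -n testns busybox-7cc68d968c-mb47z -w
```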