kubelet eviction on inode exhaustion #30311
Conversation
/cc @vishh @ronnielai - per our chat today. A few things I am trying to reason through:
The issue with (1) is that we may have actually reclaimed enough inodes to not need to kill a pod.
I am double counting pod inode consumption, but it's double counting consistently (logs and rootfs), so I don't think it has a material impact.
If inode consumption is double counted, won't the eviction-minimum-reclaim calculation be impacted?
@ronnielai - the double counting happens when calculating per-pod usage, primarily between the rootfs and logfs. in this case, the counting is then used to choose the greediest consumer, and as long as the counting is consistent, we will still choose the greediest. for min-reclaim, that is evaluated relative to the actual rootfs or imagefs inodesFree value, so there is no double counting there. i guess what i am trying to figure out is whether there is any benefit to counting per-pod inodes for anything that is not the rootfs. thoughts?
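To make the ranking discussion concrete, here is a minimal sketch (hypothetical types, not the kubelet's actual eviction code) of summing per-pod inode usage across rootfs and logs and ordering pods so the greediest consumer is evicted first; as long as every pod is counted the same way, double counting does not change the ordering.

package main

import (
    "fmt"
    "sort"
)

type podInodeStats struct {
    name         string
    rootfsInodes uint64 // inodes consumed by the pod's writable layers
    logsInodes   uint64 // inodes consumed by the pod's log files
}

// totalInodes sums per-pod inode usage across both filesystems; it may double
// count shared inodes, but it does so consistently for every pod.
func totalInodes(p podInodeStats) uint64 {
    return p.rootfsInodes + p.logsInodes
}

// rankForEviction orders pods from greediest to least greedy inode consumer.
func rankForEviction(pods []podInodeStats) {
    sort.Slice(pods, func(i, j int) bool {
        return totalInodes(pods[i]) > totalInodes(pods[j])
    })
}

func main() {
    pods := []podInodeStats{
        {name: "logger", rootfsInodes: 1200, logsInodes: 90000},
        {name: "web", rootfsInodes: 3000, logsInodes: 500},
    }
    rankForEviction(pods)
    fmt.Println("evict first:", pods[0].name)
}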
@@ -41,10 +41,16 @@ const (
message = "The node was low on compute resources."
// disk, in bytes. internal to this module, used to account for local disk usage.
resourceDisk api.ResourceName = "disk"
// inodes, in bytes. internal to this module, used to account for local disk inode consumption.
The comment (and the following ones) says the units of inodes are in bytes, which seems incorrect.
Good point, I got logically tripped up when using df -ih. In practice, it is actually quite useful to express these in powers of 1024 ;-)
$ curl <stats>
"rootfs": {
"availableBytes": 15132864512,
"capacityBytes": 52710469632,
"inodesFree": 2592837,
"inodes": 3276800
},
$ df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/mapper/fedora-root 3276800 683963 2592837 21% /
$ df -ih
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/mapper/fedora-root 3.2M 668K 2.5M 21% /
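As a quick sanity check (a hypothetical snippet, not part of the PR), the summary API numbers above line up with the df -i output: used inodes and the usage percentage follow directly from inodes and inodesFree.

package main

import "fmt"

func main() {
    var inodes, inodesFree uint64 = 3276800, 2592837
    used := inodes - inodesFree                  // 683963, matches df's IUsed column
    pct := float64(used) / float64(inodes) * 100 // ~20.9%, which df rounds to 21%
    fmt.Printf("IUsed=%d IUse%%=%.1f%%\n", used, pct)
}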
It still says "in bytes" :)
oops, i swear i changed it to say "number" at some point. will fix up.
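For reference, a self-contained sketch of how the corrected comment might read once "bytes" becomes a count; the local ResourceName type stands in for api.ResourceName so the snippet compiles on its own, and the final wording in the PR may differ.

package main

// ResourceName is a stand-in for api.ResourceName used in the real diff above.
type ResourceName string

const (
    // disk, in bytes. internal to this module, used to account for local disk usage.
    resourceDisk ResourceName = "disk"
    // inodes, number. internal to this module, used to account for local disk inode consumption.
    resourceInodes ResourceName = "inodes"
)

func main() {}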
Please add inode testing in TestMakeSignalObservations() and TestThresholdsMet(). LGTM with nits
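A hypothetical, table-driven sketch of the kind of inode case being requested; the observation and threshold types below are stand-ins, not the kubelet's actual eviction test fixtures.

package main

import "fmt"

type observation struct {
    inodesFree uint64
    inodes     uint64
}

type threshold struct {
    minInodesFree uint64 // signal pressure when free inodes drop below this
}

// thresholdMet reports whether the observed free inode count violates the threshold.
func thresholdMet(obs observation, t threshold) bool {
    return obs.inodesFree < t.minInodesFree
}

func main() {
    cases := []struct {
        name string
        obs  observation
        thr  threshold
        want bool
    }{
        {"plenty of inodes", observation{inodesFree: 2592837, inodes: 3276800}, threshold{minInodesFree: 100000}, false},
        {"inode pressure", observation{inodesFree: 512, inodes: 3276800}, threshold{minInodesFree: 100000}, true},
    }
    for _, c := range cases {
        if got := thresholdMet(c.obs, c.thr); got != c.want {
            fmt.Printf("FAIL %s: got %v, want %v\n", c.name, got, c.want)
        } else {
            fmt.Printf("ok   %s\n", c.name)
        }
    }
}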
I was experimenting with this more, and realized the following:
## HOST MACHINE
$ df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/mapper/fedora-root 3276800 714759 2562041 22% /
...
## INSIDE CONTAINER (overlay driver)
$ df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
overlay 3276800 714759 2562041 22% /
tmpfs 1498577 18 1498559 1% /dev
tmpfs 1498577 16 1498561 1% /sys/fs/cgroup
/dev/mapper/fedora-root 3276800 714759 2562041 22% /etc/hosts
shm 1498577 1 1498576 1% /dev/shm
tmpfs 1498577 9 1498568 1% /run/secrets/kubernetes.io/serviceaccount
Notice that the overlay filesystem inside the container reports the same inode figures as the host's root filesystem, so df from inside a container does not give per-container inode usage. Given that disk is best-effort in 1.4, are we ok with that behavior until we figure out a way to return meaningful values? Given that inode consumption is independent of disk usage (i.e. many small files can exhaust inodes while using little disk), how should we rank pods for eviction under inode pressure? /cc @vishh @ronnielai
Unless we want to run something like a du equivalent for inodes, walking each pod's directories and counting entries ourselves.
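If we did go the "count them ourselves" route, it would amount to walking each pod's directories and counting entries, along the lines of this hypothetical sketch (the path is illustrative, and hard links would be over-counted):

package main

import (
    "fmt"
    "os"
    "path/filepath"
)

// countInodes returns the number of filesystem entries (files, directories,
// symlinks) under root, a rough proxy for the inodes consumed by that tree.
func countInodes(root string) (uint64, error) {
    var n uint64
    err := filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
        if err != nil {
            return err
        }
        n++
        return nil
    })
    return n, err
}

func main() {
    n, err := countInodes("/var/lib/kubelet/pods") // illustrative pod directory
    if err != nil {
        fmt.Println("walk error:", err)
    }
    fmt.Println("approximate inodes used:", n)
}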
I think we can put that as a limitation in https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/kubelet-eviction.md for now. @vishh, what do you think?
I'm totally fine with just using QoS for inode exhaustion. Given that tracking inodes is not possible across all storage drivers, let's not do it for now.
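A minimal sketch of what QoS-only ordering for inode eviction could look like; the QOSClass values and ranking below are stand-ins for the kubelet's QoS classification, not its actual eviction ranking code.

package main

import (
    "fmt"
    "sort"
)

type QOSClass int

const (
    BestEffort QOSClass = iota // evicted first
    Burstable
    Guaranteed // evicted last
)

type pod struct {
    name string
    qos  QOSClass
}

// rankByQOS orders pods so that lower QoS classes are considered for eviction first.
func rankByQOS(pods []pod) {
    sort.SliceStable(pods, func(i, j int) bool {
        return pods[i].qos < pods[j].qos
    })
}

func main() {
    pods := []pod{{"db", Guaranteed}, {"batch", BestEffort}, {"web", Burstable}}
    rankByQOS(pods)
    for _, p := range pods {
        fmt.Println("evict candidate:", p.name)
    }
}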
@derekwaynecarr we can have cadvisor track inodes similarly to how it tracks per-container disk usage.
I will update this PR and poke when ready.
Force-pushed from 0a7f8ad to e2c0e90.
@vishh @ronnielai - updated. right now, i left a place-holder that reports 0 for inode_usage per container so it is a one-line update when we get per-container stats in cadvisor. added test cases. should be good to go.
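A rough illustration (names are hypothetical, not the actual kubelet functions) of the placeholder approach described above: report 0 for per-container inode usage so only one line has to change once cadvisor exposes real per-container numbers.

package main

import "fmt"

// containerInodeUsage returns the inodes consumed by a container's writable layer.
// Until cadvisor exposes per-container inode stats, report 0 as a placeholder.
func containerInodeUsage(containerName string) uint64 {
    // TODO: return real per-container inode usage once available from cadvisor.
    return 0
}

func main() {
    fmt.Println("inodes used by 'web':", containerInodeUsage("web"))
}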
Excepting the pending comment fix, LGTM.
Force-pushed from e2c0e90 to b888d02.
rebased, fixed the lingering "bytes" comment, tagging pr for merge.
Force-pushed from b888d02 to 8261520.
Added a commit so that the new code passes golint and gets past the Jenkins verification error. Retagging.
GCE e2e build/test passed for commit 8261520.
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]
GCE e2e build/test passed for commit 8261520.
Automatic merge from submit-queue
kubelet eviction on inode exhaustion
Add support for the kubelet to monitor for inode exhaustion of either the imagefs or rootfs and, in response, attempt to reclaim node-level resources and/or evict pods.
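For context, inode-based eviction thresholds are expressed on the kubelet command line roughly as below; the signal names nodefs.inodesFree and imagefs.inodesFree come from the kubelet eviction documentation and may not exactly match the flags accepted by the version in this PR.

# hypothetical invocation; other kubelet flags omitted
$ kubelet --eviction-hard="nodefs.inodesFree<5%,imagefs.inodesFree<5%"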