
kubelet eviction on inode exhaustion #30311

Merged: 3 commits merged into kubernetes:master on Aug 18, 2016

Conversation

@derekwaynecarr (Member) commented Aug 9, 2016

Add support for kubelet to monitor for inode exhaustion of either image or rootfs, and in response, attempt to reclaim node level resources and/or evict pods.
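For orientation, a minimal Go sketch of the idea (the names and numbers here are hypothetical, not this PR's actual code): observe the free inode count for a filesystem and compare it against a configured minimum to decide whether reclaim or eviction should kick in.

// Sketch only; hypothetical names and numbers, not the PR's real API.
package main

import "fmt"

// fsStats holds inode figures for a filesystem (counts, not bytes).
type fsStats struct {
	inodesFree int64
	inodes     int64
}

// inodesThresholdMet reports whether free inodes dropped below a configured minimum.
func inodesThresholdMet(fs fsStats, minFreeInodes int64) bool {
	return fs.inodesFree < minFreeInodes
}

func main() {
	rootfs := fsStats{inodesFree: 2592837, inodes: 3276800}  // values from the df -i example later in this thread
	imagefs := fsStats{inodesFree: 900000, inodes: 1310720}  // illustrative numbers
	fmt.Println(inodesThresholdMet(rootfs, 100000))          // false: rootfs has plenty of inodes free
	fmt.Println(inodesThresholdMet(imagefs, 1000000))        // true: imagefs would trigger reclaim/eviction
}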



@derekwaynecarr derekwaynecarr added this to the v1.4 milestone Aug 9, 2016
@derekwaynecarr derekwaynecarr added the release-note-none Denotes a PR that doesn't merit a release note. label Aug 9, 2016
@k8s-github-robot k8s-github-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Aug 9, 2016
@derekwaynecarr (Member Author)

/cc @vishh @ronnielai - per our chat today.

A few things I am trying to reason through:

  1. We have no way of knowing the number of inodes reclaimed in response to image GC.
  2. I am not sure whether I am double counting inode consumption; more eyes appreciated.

The issue with (1) is that we may have already reclaimed enough inodes that we do not need to kill a pod (see the sketch below).
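A small sketch of that concern, with hypothetical names: image GC reports bytes freed but not inodes freed, so after node-level reclaim the kubelet cannot tell whether the inode threshold is still crossed without re-measuring the filesystem.

// Hypothetical sketch of concern (1); not the PR's code.
package eviction

// runImageGC frees disk but reports only bytes, not inodes.
func reclaimThenMaybeEvict(freeInodes, minFreeInodes int64, runImageGC func() (bytesFreed int64)) bool {
	if freeInodes >= minFreeInodes {
		return false // no inode pressure; nothing to do
	}
	_ = runImageGC() // frees an unknown number of inodes
	// Without an inode figure from GC, the conservative path is to continue to
	// pod eviction even though enough inodes may already have been reclaimed.
	return true
}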

@derekwaynecarr (Member Author)

I am double counting pod inode consumption, but it's double counted consistently (logs and rootfs), so I don't think it has a material impact.

@ronnielai (Contributor)

If inode consumption is double counted, won't the eviction-minimum-reclaim calculation be affected?

@derekwaynecarr (Member Author)

@ronnielai - the double counting happens when calculating per-pod usage, primarily between the rootfs and the logs fs. In that case, the count is only used to choose the greediest consumer, and as long as the counting is consistent, we will still pick the greediest pod. For min-reclaim, the value is evaluated relative to the actual rootfs or imagefs inodesFree, so there is no double counting there. What I am trying to figure out is whether there is any benefit to counting per-pod inodes for anything other than the rootfs. Thoughts? (A sketch of the ranking logic follows below.)
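To illustrate the point about consistent double counting (hypothetical types, not the PR's code): every pod's usage is the sum of its rootfs and log inodes, and the same accounting rule is applied to every pod, so the comparison used to pick the greediest consumer stays apples-to-apples.

package eviction // sketch only; hypothetical types

type podInodeStats struct {
	name         string
	rootfsInodes int64
	logsInodes   int64
}

// inodeUsage may double count inodes shared between rootfs and logs,
// but it does so consistently for every pod.
func inodeUsage(p podInodeStats) int64 {
	return p.rootfsInodes + p.logsInodes
}

// greediest returns the pod with the highest (possibly inflated) inode usage.
// Assumes pods is non-empty.
func greediest(pods []podInodeStats) podInodeStats {
	max := pods[0]
	for _, p := range pods[1:] {
		if inodeUsage(p) > inodeUsage(max) {
			max = p
		}
	}
	return max
}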

@@ -41,10 +41,16 @@ const (
message = "The node was low on compute resources."
// disk, in bytes. internal to this module, used to account for local disk usage.
resourceDisk api.ResourceName = "disk"
// inodes, in bytes. internal to this module, used to account for local disk inode consumption.
Contributor

The comment (and the following ones) says the units of inodes are in bytes, which seems incorrect.

Member Author

Good point, I got logically tripped up when using df -ih. In practice, it is actually quite useful to express these in powers of 1024 ;-)

$ curl <stats>
     "rootfs": {
      "availableBytes": 15132864512,
      "capacityBytes": 52710469632,
      "inodesFree": 2592837,
      "inodes": 3276800
     },
$ df -i
Filesystem                Inodes  IUsed    IFree IUse% Mounted on
/dev/mapper/fedora-root  3276800 683963  2592837   21% /
$ df -ih
Filesystem              Inodes IUsed IFree IUse% Mounted on
/dev/mapper/fedora-root   3.2M  668K  2.5M   21% /
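For context, the inodes / inodesFree figures in the stats summary are plain counts, matching what df -i reports from statfs(2) (f_files / f_ffree). A minimal Linux-only Go sketch of reading them directly:

// Linux-only sketch: read inode counts for a filesystem the same way df -i does.
package main

import (
	"fmt"
	"syscall"
)

func main() {
	var st syscall.Statfs_t
	if err := syscall.Statfs("/", &st); err != nil {
		panic(err)
	}
	// st.Files is the total number of inodes; st.Ffree is the number free.
	fmt.Printf("inodes=%d inodesFree=%d\n", st.Files, st.Ffree)
}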

Contributor

It still says "in bytes" :)

Member Author

Oops, I swear I changed it to say "number" at some point. Will fix up.

@ronnielai (Contributor) commented Aug 11, 2016

Please add inode testing in TestMakeSignalObservations() and TestThresholdsMet().

LGTM with nits

@ronnielai ronnielai self-assigned this Aug 11, 2016
@derekwaynecarr (Member Author)

I was experimenting with this more, and realized the following:

## HOST MACHINE
$ df -i
Filesystem                Inodes  IUsed    IFree IUse% Mounted on
/dev/mapper/fedora-root  3276800 714759  2562041   22% /
...
## INSIDE CONTAINER (overlay driver)
$ df -i       
Filesystem               Inodes  IUsed   IFree IUse% Mounted on
overlay                 3276800 714759 2562041   22% /
tmpfs                   1498577     18 1498559    1% /dev
tmpfs                   1498577     16 1498561    1% /sys/fs/cgroup
/dev/mapper/fedora-root 3276800 714759 2562041   22% /etc/hosts
shm                     1498577      1 1498576    1% /dev/shm
tmpfs                   1498577      9 1498568    1% /run/secrets/kubernetes.io/serviceaccount

Notice that the / inode consumption inside the container matches the host. This means that if I have multiple containers on a node, trying to rank pods by greediest consumer of inodes does not work with the existing cAdvisor support. In practice, our pods will be ranked by QoS only, unless we have a way of knowing the number of inodes unique to the docker image + COW layer per container.

Given that disk is best-effort in 1.4, are we OK with that behavior until we figure out a way to return meaningful values? Given that inode consumption is independent of disk usage (i.e. a touch foo.txt consumes an inode just as a 15GB file does), it seems ranking pods by core QoS measures may be the best behavior for now. (A sketch of QoS-only ranking follows below.)

/cc @vishh @ronnielai
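A minimal sketch of what QoS-only ranking for inode pressure could look like (hypothetical types; the real kubelet has its own QoS and ranking machinery): BestEffort pods are evicted before Burstable, and Burstable before Guaranteed.

package eviction // sketch only; hypothetical types

import "sort"

type qosClass int

const (
	bestEffort qosClass = iota // evicted first
	burstable
	guaranteed // evicted last
)

type podInfo struct {
	name string
	qos  qosClass
}

// rankByQOS orders pods for eviction by QoS class alone, since per-container
// inode usage is not distinguishable with the overlay-driver output above.
func rankByQOS(pods []podInfo) {
	sort.SliceStable(pods, func(i, j int) bool {
		return pods[i].qos < pods[j].qos
	})
}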

@derekwaynecarr (Member Author)

Unless we want to run find / -type f | wc -l inside the container fs, which is just awful ;-)

@ronnielai (Contributor)

I think we can put that as a limitation in https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/kubelet-eviction.md for now. @vishh, what do you think?

@vishh (Contributor) commented Aug 15, 2016

I'm totally fine with just using QoS for inode exhaustion. Given that tracking inodes is not possible across all storage drivers, let's not do it for now.

@vishh (Contributor) commented Aug 15, 2016

@derekwaynecarr we can have cAdvisor track inodes similarly to how it tracks per-container disk usage.

@derekwaynecarr (Member Author)

I will update this PR and poke when ready.

@derekwaynecarr derekwaynecarr changed the title WIP: kubelet eviction on inode exhaustion kubelet eviction on inode exhaustion Aug 16, 2016
@derekwaynecarr (Member Author)

@vishh @ronnielai - updated. For now, I left a placeholder that reports 0 for inode_usage per container so it is a one-line update when we get per-container stats in cAdvisor. Added test cases. Should be good to go. (A sketch of the placeholder follows below.)
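A sketch of the placeholder approach described above (hypothetical name, not the PR's actual function): report 0 for per-container inode usage until cAdvisor exposes it, so wiring in the real value later is a one-line change.

// Hypothetical sketch of the placeholder; not the PR's actual function.
package eviction

func containerInodeUsage() int64 {
	// TODO: return real per-container inode usage once cAdvisor reports it.
	return 0
}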

@k8s-github-robot k8s-github-robot added the kind/design Categorizes issue or PR as related to design. label Aug 16, 2016
@vishh (Contributor) commented Aug 16, 2016

Except for the pending "bytes" comment, this PR LGTM.

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 17, 2016
@derekwaynecarr derekwaynecarr removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 17, 2016
@derekwaynecarr (Member Author)

Rebased, fixed the lingering "bytes" comment, tagging the PR for merge.

@derekwaynecarr derekwaynecarr added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 17, 2016
@derekwaynecarr (Member Author)

@k8s-bot test this issue #30462

@derekwaynecarr (Member Author)

Added a commit so that we pass golint and get past the Jenkins verification error. Retagging.

@derekwaynecarr derekwaynecarr added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Aug 17, 2016
@k8s-bot commented Aug 17, 2016

GCE e2e build/test passed for commit 8261520.

@k8s-github-robot

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot commented Aug 18, 2016

GCE e2e build/test passed for commit 8261520.

@k8s-github-robot

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit ff58d04 into kubernetes:master Aug 18, 2016
@vishh (Contributor) commented Aug 18, 2016

㊗️ @derekwaynecarr & @ronnielai

@ronnielai (Contributor)

#21546

xingzhou pushed a commit to xingzhou/kubernetes that referenced this pull request Dec 15, 2016
Automatic merge from submit-queue

kubelet eviction on inode exhaustion

Add support for kubelet to monitor for inode exhaustion of either image or rootfs, and in response, attempt to reclaim node level resources and/or evict pods.