Many orphaned volumes in hyperkube #22911

Closed
amlinux opened this issue Mar 12, 2016 · 29 comments · Fixed by #27054
Labels: kind/bug, priority/important-soon

Comments

@amlinux

amlinux commented Mar 12, 2016

Kubelet continuously throws lots of errors:

W0312 22:53:09.750535   13347 kubelet.go:1935] Orphaned volume "50da8d22-e89b-11e5-952b-5c514f64b257/default-token-rse42" found, tearing down volume
E0312 22:53:09.783931   13347 kubelet.go:1946] Could not tear down volume "50da8d22-e89b-11e5-952b-5c514f64b257/default-token-rse42": rename /var/lib/kubelet/pods/50da8d22-e89b-11e5-952b-5c514f64b257/volumes/kubernetes.io~secret/default-token-rse42 /var/lib/kubelet/pods/50da8d22-e89b-11e5-952b-5c514f64b257/volumes/kubernetes.io~secret/wrapped_default-token-rse42.deleting~720717277: device or resource busy
W0312 22:53:11.731998   13347 kubelet.go:1935] Orphaned volume "50da8d22-e89b-11e5-952b-5c514f64b257/wrapped_default-token-rse42.deleting~720717277" found, tearing down volume
W0312 22:53:11.734984   13347 mount.go:98] could not determine device for path: "/var/lib/kubelet/pods/50da8d22-e89b-11e5-952b-5c514f64b257/volumes/kubernetes.io~secret/wrapped_default-token-rse42.deleting~720717277"
W0312 22:53:11.747966   13347 kubelet.go:1935] Orphaned volume "50da8d22-e89b-11e5-952b-5c514f64b257/default-token-rse42" found, tearing down volume
E0312 22:53:11.764105   13347 kubelet.go:1946] Could not tear down volume "50da8d22-e89b-11e5-952b-5c514f64b257/default-token-rse42": rename /var/lib/kubelet/pods/50da8d22-e89b-11e5-952b-5c514f64b257/volumes/kubernetes.io~secret/default-token-rse42 /var/lib/kubelet/pods/50da8d22-e89b-11e5-952b-5c514f64b257/volumes/kubernetes.io~secret/wrapped_default-token-rse42.deleting~060988951: device or resource busy
W0312 22:53:13.730480   13347 kubelet.go:1935] Orphaned volume "50da8d22-e89b-11e5-952b-5c514f64b257/wrapped_default-token-rse42.deleting~060988951" found, tearing down volume
W0312 22:53:13.733530   13347 mount.go:98] could not determine device for path: "/var/lib/kubelet/pods/50da8d22-e89b-11e5-952b-5c514f64b257/volumes/kubernetes.io~secret/wrapped_default-token-rse42.deleting~060988951"
W0312 22:53:13.762582   13347 kubelet.go:1935] Orphaned volume "50da8d22-e89b-11e5-952b-5c514f64b257/default-token-rse42" found, tearing down volume
E0312 22:53:13.787960   13347 kubelet.go:1946] Could not tear down volume "50da8d22-e89b-11e5-952b-5c514f64b257/default-token-rse42": rename /var/lib/kubelet/pods/50da8d22-e89b-11e5-952b-5c514f64b257/volumes/kubernetes.io~secret/default-token-rse42 /var/lib/kubelet/pods/50da8d22-e89b-11e5-952b-5c514f64b257/volumes/kubernetes.io~secret/wrapped_default-token-rse42.deleting~997836385: device or resource busy
W0312 22:53:15.732156   13347 kubelet.go:1935] Orphaned volume "50da8d22-e89b-11e5-952b-5c514f64b257/wrapped_default-token-rse42.deleting~997836385" found, tearing down volume
W0312 22:53:15.735224   13347 mount.go:98] could not determine device for path: "/var/lib/kubelet/pods/50da8d22-e89b-11e5-952b-5c514f64b257/volumes/kubernetes.io~secret/wrapped_default-token-rse42.deleting~997836385"
W0312 22:53:15.771894   13347 kubelet.go:1935] Orphaned volume "50da8d22-e89b-11e5-952b-5c514f64b257/default-token-rse42" found, tearing down volume
E0312 22:53:15.796077   13347 kubelet.go:1946] Could not tear down volume "50da8d22-e89b-11e5-952b-5c514f64b257/default-token-rse42": rename /var/lib/kubelet/pods/50da8d22-e89b-11e5-952b-5c514f64b257/volumes/kubernetes.io~secret/default-token-rse42 /var/lib/kubelet/pods/50da8d22-e89b-11e5-952b-5c514f64b257/volumes/kubernetes.io~secret/wrapped_default-token-rse42.deleting~884779835: device or resource busy

I'm starting the cluster this way:

docker run \
    --name=kube_kubelet \
    --volume=/:/rootfs:ro \
    --volume=/sys:/sys:ro \
    --volume=/var/lib/docker/:/var/lib/docker:rw \
    --volume=/var/lib/kubelet/:/var/lib/kubelet:rw \
    --volume=/var/run:/var/run:rw \
    --net=host \
    --pid=host \
    --privileged=true \
    -d \
    gcr.io/google_containers/hyperkube:v1.2.0-beta.0 \
    /hyperkube kubelet \
    --containerized \
    --hostname-override="127.0.0.1" \
    --address="0.0.0.0" \
    --api-servers=http://localhost:8080 \
    --cluster-dns=10.0.0.10 \
    --cluster-domain=cluster.local \
    --allow-privileged=true \
    --v=2 \
    --config=/etc/kubernetes/manifests || exit $?

docker run \
    --name=kube_kubeproxy \
    -d \
    --net=host \
    --privileged \
    gcr.io/google_containers/hyperkube:v1.2.0-beta.0 \
    /hyperkube proxy \
    --master=http://127.0.0.1:8080 \
    --v=2 || exit $?
@amlinux
Author

amlinux commented Mar 19, 2016

To reproduce this behavior, I cleaned up everything related to Kubernetes (containers, /var/lib/kubernetes contents, etc.), started hyperkube, then started a bunch of containers; everything is fine up to this point.

Then I delete a pod (with kubectl delete pod frontend), and this happens:

I0319 15:21:25.936936    9224 kubelet.go:2391] SyncLoop (UPDATE, "api"): "frontend-1jmyn_default(dbe5be50-ede5-11e5-9681-5c514f64b257)"
I0319 15:21:25.937489    9224 manager.go:1368] Killing container "e0f0f65106cd3e4165066d010f77366c95f8e96361441c7d846e4f4455e34368 frontend default/frontend-1jmyn" with 30 second grace period
I0319 15:21:25.969848    9224 kubelet.go:2388] SyncLoop (ADD, "api"): "frontend-koi5j_default(3ef8ed1e-ede6-11e5-9681-5c514f64b257)"
I0319 15:21:25.989511    9224 nsenter_mount.go:179] Failed findmnt command: exit status 1
I0319 15:21:25.990450    9224 server.go:1096] GET /containerLogs/default/frontend-1jmyn/frontend?follow=true: (2m1.56788367s) 200 [[Go 1.1 package http] 192.168.192.25:57244]
I0319 15:21:26.005163    9224 nsenter_mount.go:179] Failed findmnt command: exit status 1
I0319 15:21:26.015950    9224 manager.go:1400] Container "e0f0f65106cd3e4165066d010f77366c95f8e96361441c7d846e4f4455e34368 frontend default/frontend-1jmyn" exited after 78.410518ms
I0319 15:21:26.018012    9224 manager.go:1368] Killing container "44d793d313e3676f293b51dd940edb21a91b565bc00f1c1ef5f179ca1e7c8415 default/frontend-1jmyn" with 30 second grace period
I0319 15:21:26.154601    9224 manager.go:1688] Need to restart pod infra container for "frontend-koi5j_default(3ef8ed1e-ede6-11e5-9681-5c514f64b257)" because it is not found
I0319 15:21:26.430028    9224 manager.go:1400] Container "44d793d313e3676f293b51dd940edb21a91b565bc00f1c1ef5f179ca1e7c8415 default/frontend-1jmyn" exited after 411.995092ms
I0319 15:21:27.453218    9224 kubelet.go:2410] SyncLoop (PLEG): "frontend-koi5j_default(3ef8ed1e-ede6-11e5-9681-5c514f64b257)", event: &pleg.PodLifecycleEvent{ID:"3ef8ed1e-ede6-11e5-9681-5c514f64b257", Type:"ContainerStarted", Data:"35b226753961be3decf567780735d46a11298fe9ef980024fb560435d3811058"}
I0319 15:21:27.730298    9224 kubelet.go:2410] SyncLoop (PLEG): "frontend-1jmyn_default(dbe5be50-ede5-11e5-9681-5c514f64b257)", event: &pleg.PodLifecycleEvent{ID:"dbe5be50-ede5-11e5-9681-5c514f64b257", Type:"ContainerDied", Data:"e0f0f65106cd3e4165066d010f77366c95f8e96361441c7d846e4f4455e34368"}
I0319 15:21:27.733627    9224 kubelet.go:2410] SyncLoop (PLEG): "frontend-1jmyn_default(dbe5be50-ede5-11e5-9681-5c514f64b257)", event: &pleg.PodLifecycleEvent{ID:"dbe5be50-ede5-11e5-9681-5c514f64b257", Type:"ContainerDied", Data:"44d793d313e3676f293b51dd940edb21a91b565bc00f1c1ef5f179ca1e7c8415"}
I0319 15:21:27.767601    9224 kubelet.go:2391] SyncLoop (UPDATE, "api"): "frontend-1jmyn_default(dbe5be50-ede5-11e5-9681-5c514f64b257)"
I0319 15:21:27.822477    9224 kubelet.go:2394] SyncLoop (REMOVE, "api"): "frontend-1jmyn_default(dbe5be50-ede5-11e5-9681-5c514f64b257)"
I0319 15:21:27.822521    9224 kubelet.go:2538] Failed to delete pod "frontend-1jmyn_default(dbe5be50-ede5-11e5-9681-5c514f64b257)", err: pod not found
W0319 15:21:28.646963    9224 kubelet.go:1979] Orphaned volume "dbe5be50-ede5-11e5-9681-5c514f64b257/default-token-c6gc1" found, tearing down volume
W0319 15:21:28.649524    9224 mount.go:98] could not determine device for path: "/var/lib/kubelet/pods/dbe5be50-ede5-11e5-9681-5c514f64b257/volumes/kubernetes.io~secret/default-token-c6gc1"
E0319 15:21:28.665859    9224 kubelet.go:1990] Could not tear down volume "dbe5be50-ede5-11e5-9681-5c514f64b257/default-token-c6gc1": rename /var/lib/kubelet/pods/dbe5be50-ede5-11e5-9681-5c514f64b257/volumes/kubernetes.io~secret/default-token-c6gc1 /var/lib/kubelet/pods/dbe5be50-ede5-11e5-9681-5c514f64b257/volumes/kubernetes.io~secret/wrapped_default-token-c6gc1.deleting~655762369: device or resource busy
I0319 15:21:28.736742    9224 kubelet.go:2410] SyncLoop (PLEG): "frontend-koi5j_default(3ef8ed1e-ede6-11e5-9681-5c514f64b257)", event: &pleg.PodLifecycleEvent{ID:"3ef8ed1e-ede6-11e5-9681-5c514f64b257", Type:"ContainerStarted", Data:"aca06e4796eb721791f4d0cdc63c362746e9e301c8b12f5609197cddd0e2ab5b"}
I0319 15:21:28.736849    9224 manager.go:363] Status for pod "dbe5be50-ede5-11e5-9681-5c514f64b257" is up-to-date; skipping
W0319 15:21:30.645863    9224 kubelet.go:1979] Orphaned volume "dbe5be50-ede5-11e5-9681-5c514f64b257/wrapped_default-token-c6gc1.deleting~655762369" found, tearing down volume
W0319 15:21:30.648390    9224 mount.go:98] could not determine device for path: "/var/lib/kubelet/pods/dbe5be50-ede5-11e5-9681-5c514f64b257/volumes/kubernetes.io~secret/wrapped_default-token-c6gc1.deleting~655762369"
W0319 15:21:30.673478    9224 kubelet.go:1979] Orphaned volume "dbe5be50-ede5-11e5-9681-5c514f64b257/default-token-c6gc1" found, tearing down volume

@amlinux
Author

amlinux commented Mar 19, 2016

More observations. Here is my default token:

# kubectl get secrets | grep default
default-token-rjwc9   kubernetes.io/service-account-token   3         17m

Here is the error message I get:

E0319 16:14:53.642626   23448 kubelet.go:1990] Could not tear down volume "7cd0d6fc-edeb-11e5-bb95-5c514f64b257/default-token-rjwc9": rename /var/lib/kubelet/pods/7cd0d6fc-edeb-11e5-bb95-5c514f64b257/volumes/kubernetes.io~secret/default-token-rjwc9 /var/lib/kubelet/pods/7cd0d6fc-edeb-11e5-bb95-5c514f64b257/volumes/kubernetes.io~secret/wrapped_default-token-rjwc9.deleting~210881875: device or resource busy

This path is changing all the time:

root@amlie:~# ls -l /var/lib/kubelet/pods/7cd0d6fc-edeb-11e5-bb95-5c514f64b257/volumes/kubernetes.io~secret/
total 4
drwxrwxrwt 2 root root  100 Mar 19 16:58 default-token-rjwc9
drwx------ 2 root root 4096 Mar 19 17:16 wrapped_default-token-rjwc9.deleting~055097388
root@amlie:~# ls -l /var/lib/kubelet/pods/7cd0d6fc-edeb-11e5-bb95-5c514f64b257/volumes/kubernetes.io~secret/
total 4
drwxrwxrwt 2 root root  100 Mar 19 16:58 default-token-rjwc9
drwx------ 2 root root 4096 Mar 19 17:16 wrapped_default-token-rjwc9.deleting~525414462

(Note that the subdirectory names are different.)

Pod deletion code was recently touched by @caesarxuchao, and authentication tokens (whose directories can't be removed) by @liggitt and @jlowdermilk. Any ideas what it could be?

I would ignore this problem if it didn't consume 15% of CPU in a very tight orphan removal loop.

@liggitt
Member

liggitt commented Mar 19, 2016

Those tokens are just secrets as far as the kubelet is concerned. Not sure how secret volume cleanup differs in hyperkube.

cc @pmorie

@NikolayMurha

We have the same problem on nodes with CoreOS alpha 991.0.0 and hyperkube 1.2.

A lot of messages like:

Mar 28 14:02:41 az1 kubelet-wrapper[8417]: W0328 11:02:41.899982    8417 mount.go:98] could not determine device for path: "/var/lib/kubelet/pods/e170d3bf-f27a-11e5-b08c-90b11c4d3a00/volumes/kubernetes.io~secret/wrapped_default-token-3bfau.deleting~864258571"
Mar 28 14:02:41 az1 kubelet-wrapper[8417]: W0328 11:02:41.901281    8417 mount.go:98] could not determine device for path: "/var/lib/kubelet/pods/e170d3bf-f27a-11e5-b08c-90b11c4d3a00/volumes/kubernetes.io~secret/default-token-3bfau"
Mar 28 14:02:41 az1 kubelet-wrapper[8417]: E0328 11:02:41.901463    8417 kubelet.go:1990] Could not tear down volume "e170d3bf-f27a-11e5-b08c-90b11c4d3a00/default-token-3bfau": rename /var/lib/kubelet/pods/e170d3bf-f27a-11e5-b08c-90b11c4d3a00/volumes/kubernetes.io~secret/default-token-3bfau /var/lib/kubelet/pods/e170d3bf-f27a-11e5-b08c-90b11c4d3a00/volumes/kubernetes.io~secret/wrapped_default-token-3bfau.deleting~272239891: device or resource busy

@saturnism
Contributor

Same issue here.

@apatil

apatil commented May 4, 2016

Same on Ubuntu 14.04, hyperkube v1.2.3, docker 1.9.1 with aufs backend, kernel 3.13.0-48-generic.

I also see pod state transitions taking a long time; they hang in "Pending" and "Terminating" for minutes. Does anyone else see that in conjunction with this issue?

@apatil

apatil commented May 4, 2016

For context, this issue doesn't show up immediately in my hyperkube setup. It turns up after a few days. During that time, I update 6 or 7 deployments simultaneously, several times per day.

@pmorie pmorie self-assigned this May 4, 2016
@pmorie
Member

pmorie commented May 4, 2016

Sorry I missed the ping on this. Assigned to self, will try to triage by the end of the week.

@bgrant0607 bgrant0607 added kind/bug Categorizes issue or PR as related to a bug. priority/backlog Higher priority than priority/awaiting-more-evidence. area/devexp labels May 6, 2016
@bgrant0607
Member

cc @dlorenc @vishh

@AaronTao1990

Got the same problem on CentOS 7.2:

W0519 06:43:26.715097   49123 kubelet.go:1995] Orphaned volume "0640a1c8-1d8a-11e6-a9b1-1402ec3ed5c0/default-token-gofxs" found, tearing down volume
W0519 06:43:26.720166   49123 mount.go:99] could not determine device for path: "/var/lib/kubelet/pods/0640a1c8-1d8a-11e6-a9b1-1402ec3ed5c0/volumes/kubernetes.io~secret/default-token-gofxs"
E0519 06:43:26.738842   49123 kubelet.go:2006] Could not tear down volume "0640a1c8-1d8a-11e6-a9b1-1402ec3ed5c0/default-token-gofxs": rename /var/lib/kubelet/pods/0640a1c8-1d8a-11e6-a9b1-1402ec3ed5c0/volumes/kubernetes.io~secret/default-token-gofxs /var/lib/kubelet/pods/0640a1c8-1d8a-11e6-a9b1-1402ec3ed5c0/volumes/kubernetes.io~secret/wrapped_default-token-gofxs.deleting~000822711: device or resource busy
W0519 06:43:28.655343   49123 kubelet.go:1995] Orphaned volume "0640a1c8-1d8a-11e6-a9b1-1402ec3ed5c0/default-token-gofxs" found, tearing down volume
W0519 06:43:28.660379   49123 mount.go:99] could not determine device for path: "/var/lib/kubelet/pods/0640a1c8-1d8a-11e6-a9b1-1402ec3ed5c0/volumes/kubernetes.io~secret/default-token-gofxs"
E0519 06:43:28.675859   49123 kubelet.go:2006] Could not tear down volume "0640a1c8-1d8a-11e6-a9b1-1402ec3ed5c0/default-token-gofxs": rename /var/lib/kubelet/pods/0640a1c8-1d8a-11e6-a9b1-1402ec3ed5c0/volumes/kubernetes.io~secret/default-token-gofxs /var/lib/kubelet/pods/0640a1c8-1d8a-11e6-a9b1-1402ec3ed5c0/volumes/kubernetes.io~secret/wrapped_default-token-gofxs.deleting~550898602: device or resource busy
W0519 06:43:28.675911   49123 kubelet.go:1995] Orphaned volume "0640a1c8-1d8a-11e6-a9b1-1402ec3ed5c0/wrapped_default-token-gofxs.deleting~000822711" found, tearing down volume
W0519 06:43:28.680160   49123 mount.go:99] could not determine device for path: "/var/lib/kubelet/pods/0640a1c8-1d8a-11e6-a9b1-1402ec3ed5c0/volumes/kubernetes.io~secret/wrapped_default-token-gofxs.deleting~000822711"

@errordeveloper
Member

I am wondering if this depends on the Docker version, and not so much on the graph driver.

@errordeveloper
Member

xref #26421

@king6cong

{"log":"W0530 08:51:01.922523  200496 kubelet.go:1979] Orphaned volume \"75e97c2a-1e56-11e6-b226-e84dd0c219e4/default-token-5p5vg\" found, tearing down volume\n","stream":"stderr","time":"2016-05-30T08:51:01.922661105Z"}
{"log":"W0530 08:51:01.924846  200496 mount.go:98] could not determine device for path: \"/var/lib/kubelet/pods/75e97c2a-1e56-11e6-b226-e84dd0c219e4/volumes/kubernetes.io~secret/default-token-5p5vg\"\n","stream":"stderr","time":"2016-05-30T08:51:01.924981918Z"}
{"log":"E0530 08:51:01.944604  200496 kubelet.go:1990] Could not tear down volume \"75e97c2a-1e56-11e6-b226-e84dd0c219e4/default-token-5p5vg\": rename /var/lib/kubelet/pods/75e97c2a-1e56-11e6-b226-e84dd0c219e4/volumes/kubernetes.io~secret/default-token-5p5vg /var/lib/kubelet/pods/75e97c2a-1e56-11e6-b226-e84dd0c219e4/volumes/kubernetes.io~secret/wrapped_default-token-5p5vg.deleting~503183222: device or resource busy\n","stream":"stderr","time":"2016-05-30T08:51:01.944784947Z"}
{"log":"W0530 08:51:01.944634  200496 kubelet.go:1979] Orphaned volume \"76e8bf5f-128c-11e6-9469-e84dd0c219e4/wrapped_default-token-5p5vg.deleting~986070007\" found, tearing down volume\n","stream":"stderr","time":"2016-05-30T08:51:01.944793936Z"}

Same problem here, any ideas?

@alml
Contributor

alml commented May 30, 2016

@bgrant0607, could you please re-triage this bug? I'm not sure "team/ux" is the right tag for it; it doesn't have anything in common with UX. It looks like a problem with some low-level container machinery. And given the number of +1's, maybe raise the priority? Thanks.

@bgrant0607
Member

@alml What problem does this cause for you?

@king6cong

king6cong commented May 31, 2016

@bgrant0607


The error messages confuse log/alert monitoring infrastructure, and I really hope this can be fixed.

@alml
Contributor

alml commented May 31, 2016

@bgrant0607 As I wrote before:

I would ignore this problem if it didn't consume 15% of CPU in a very tight orphan removal loop.

I have an 8-core laptop that can run for about 4 hours on battery (enough for a mid-range flight). When I run a Kubernetes cluster via hyperkube, it loads 15% of my CPU resources (more than one core!) in a tight loop, draining my battery in less than an hour.

@saad-ali
Member

Assigning #26421 #25056 #22911 (all seem related--secret volume tear down issues with hyperkube) to @jingxu97 for investigation.

@jingxu97
Contributor

jingxu97 commented Jun 4, 2016

After setting up hyperkube and digging through the logs, I think the following happened and caused the issue.

Hyperkube automatically creates a few pods for DNS, dashboard, proxy services, etc., but something goes wrong when setting up and mounting their volumes. The following shows some logs from one such pod.

(1) 21:20:03.214957 kubelet.go:2579] SyncLoop (ADD, "api"): "kubernetes-dashboard-fnivx_kube-system(efa7ceb6-29d0-11e6-bafc-5065f351b2ea)"
(2) 21:20:03.269801 nsenter_mount.go:117] Mount command: nsenter [--mount=/rootfs/proc/1/ns/mnt -- /bin/mount -t tmpfs tmpfs /var/lib/kubelet/pods/efa7ceb6-29d0-11e6-bafc-5065f351b2ea/volumes/kubernetes.io~secret/default-token-z7ul7]
****** These two steps show that the volume is mounted.

(3) 21:20:07.170736 nsenter_mount.go:196] IsLikelyNotMountPoint findmnt output for path /var/lib/kubelet/pods/efa7ceb6-29d0-11e6-bafc-5065f351b2ea/volumes/kubernetes.io~secret/default-token-z7ul7: /var/lib/kubelet/pods/efa7ceb6-29d0-11e6-bafc-5065f351b2ea/volumes/kubernetes.io~secret/default-token-z7ul
****** The very strange thing here is that when SyncPod checks the mount path, the last letter "7" of the path is missing. Step (4) then shows the volume being mounted again.

(4) 21:20:07.218321 nsenter_mount.go:117] Mount command: nsenter [--mount=/rootfs/proc/1/ns/mnt -- /bin/mount -t tmpfs tmpfs /var/lib/kubelet/pods/efa7ceb6-29d0-11e6-bafc-5065f351b2ea/volumes/kubernetes.io~secret/default-token-z7ul7]

From this point on, each time SyncPod checks the volume, it mounts it again, which is confirmed by the findmnt output in the log. After a few hours of running there are hundreds of mounts. I am working on finding the reason why the last letter of the returned mount point is missing; when I manually tested the findmnt command inside the container, it returned the correct result.
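For anyone who wants to poke at this outside the kubelet, here is a minimal standalone Go sketch of the same style of check: it asks findmnt for the mount target of a path via exec.Command().Output() and compares the result to the path it asked about. The path below is made up for illustration, nsenter is left out for simplicity, and the exact flags the kubelet passes may differ.

package main

import (
    "fmt"
    "os/exec"
    "strings"
)

func main() {
    // Hypothetical long secret-volume path, similar to the ones in the logs above.
    path := "/var/lib/kubelet/pods/efa7ceb6-29d0-11e6-bafc-5065f351b2ea/volumes/kubernetes.io~secret/default-token-z7ul7"

    // Ask findmnt which mount target the path belongs to, reading its output
    // the same way the kubelet does (exec + Output()).
    out, err := exec.Command("findmnt", "-o", "target", "--noheadings", "--target", path).Output()
    if err != nil {
        fmt.Println("findmnt failed:", err)
        return
    }

    target := strings.TrimSpace(string(out))
    fmt.Printf("asked for:   %q\n", path)
    fmt.Printf("reported:    %q\n", target)
    fmt.Printf("exact match: %v\n", target == path)
}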

There is also an issue with unmounting. The unmount code first moves the mount dir to a temporary dir, but this fails with a "device or resource busy" error. I also tried to mount a tmpfs in the container with the command
mount -t tmpfs tmpfs userdir
and then trying to mv userdir failed with the same error.
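The rename failure itself can also be reproduced from a tiny Go program; a rough sketch (must be run as root on Linux, and the directory name is made up for illustration):

package main

import (
    "fmt"
    "os"
    "os/exec"
)

func main() {
    dir := "/tmp/userdir"
    if err := os.MkdirAll(dir, 0755); err != nil {
        panic(err)
    }

    // Mount a tmpfs on the directory, similar to what the secret volume plugin does.
    if out, err := exec.Command("mount", "-t", "tmpfs", "tmpfs", dir).CombinedOutput(); err != nil {
        panic(fmt.Sprintf("mount failed: %v: %s", err, out))
    }

    // Renaming an active mount point fails with EBUSY ("device or resource busy"),
    // the same error the teardown code runs into when it treats the tmpfs as a plain directory.
    fmt.Println("rename error:", os.Rename(dir, dir+".deleting~123456"))

    // Clean up the experiment.
    exec.Command("umount", dir).Run()
    os.Remove(dir)
}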

I will continue working on solving this issue. Please let me know if you have any comments.

@saad-ali
Member

saad-ali commented Jun 6, 2016

Excellent debugging @jingxu97. Looking forward to seeing what the root cause is for both issues.

@jingxu97
Contributor

jingxu97 commented Jun 6, 2016

Some updates on this bug:

  1. About the first issue, the path name missing its last character:
    I tried running the "findmnt" command directly in a terminal and the output is fine, but when the command is executed through exec.Command().Output(), the last character of the target path is always missing. I tried different findmnt options and it made no difference. Only when the mount path is relatively short (roughly under 90–100 characters) is no character missing, and no matter how long the path is, only the last character is dropped.
    Our current code in nsenter_mount.IsLikelyNotMountPoint() uses an exact string match to detect whether the path is already mounted, which is not robust. We can change it to a substring match instead (see the sketch just after this list).
  2. About the umount issue: TearDown also invokes IsLikelyNotMountPoint(). If that returns an error, GetMountMedium() treats the mount medium as the default when it should be memory. TearDown then invokes tearDownDefault, which tries to rename the directories and delete them, which in turn results in the "device or resource busy" error. It also does not handle the case where the path is mounted multiple times (which should not happen in a normal situation).
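To make point 1 concrete, here is a rough sketch of the more tolerant comparison. This is not the actual kubelet code; it only assumes that the target reported by findmnt may come back missing its final character.

package main

import (
    "fmt"
    "strings"
)

// looksMounted decides whether the target reported by findmnt refers to path.
// findmntOutput is assumed to be the target column from
// `findmnt -o target --noheadings --target <path>`, which, per the observation
// above, can come back with its last character dropped for long paths.
func looksMounted(path, findmntOutput string) bool {
    out := strings.TrimSpace(findmntOutput)
    if out == "" {
        // strings.Contains(x, "") is always true, so treat empty output as "not mounted".
        return false
    }
    // Exact equality fails when the last character is truncated; a substring
    // check tolerates that while still rejecting unrelated paths.
    return strings.Contains(path, out)
}

func main() {
    path := "/var/lib/kubelet/pods/efa7ceb6-29d0-11e6-bafc-5065f351b2ea/volumes/kubernetes.io~secret/default-token-z7ul7"
    truncated := path[:len(path)-1] // what findmnt was observed to return

    fmt.Println(looksMounted(path, path))      // true
    fmt.Println(looksMounted(path, truncated)) // true; an exact match would say false
    fmt.Println(looksMounted(path, ""))        // false
}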

I will try to modify IsLikelyNotMountPoint() to make it more robust. Please let me know if you have any comments or suggestions. Thanks!

@saad-ali
Member

saad-ali commented Jun 6, 2016

PR #21486 and PR #23435 both modified the behavior of NsenterMounter.IsLikelyNotMountPoint() to fix issues and evidently introduced different issues. So watch out for regressions in any potential fix. CC @jsafrane @fgrzadkowski

Also, have you observed the missing character issue in a non-hyperkube environment, or is there something unique about running locally inside a docker container that causes this to happen?

@jingxu97
Contributor

jingxu97 commented Jun 7, 2016

I see that the isLikelyNotMountPoint() function is implemented in two places: nsenter_mount.go and mount_linux.go. My understanding is that NsenterMounter is only used when the kubelet is running inside a container. nsenter_mount uses the output of the findmnt command to check the mount point. mount_linux.go uses a different approach: it checks whether the directory is on a different device than its parent; if so, it is a mount point.

I also wrote a small Go program to test the "findmnt" command in my local environment; it also truncates the last letter if the mount path is relatively long. So I think this findmnt behavior is not related to the hyperkube environment.
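For comparison, the mount_linux.go style of check described above can be sketched roughly like this (again, not the actual kubelet code): stat the directory and its parent, and treat a differing device number as "this is a mount point".

package main

import (
    "fmt"
    "os"
    "path/filepath"
    "syscall"
)

// isLikelyNotMountPoint is a rough sketch of the device-comparison approach:
// a directory that sits on a different device than its parent is very likely
// a mount point. Bind mounts within the same filesystem are not detected this way.
func isLikelyNotMountPoint(file string) (bool, error) {
    stat, err := os.Stat(file)
    if err != nil {
        return true, err
    }
    parentStat, err := os.Lstat(filepath.Dir(filepath.Clean(file)))
    if err != nil {
        return true, err
    }
    // Same device as the parent directory means it is not a mount point.
    return stat.Sys().(*syscall.Stat_t).Dev == parentStat.Sys().(*syscall.Stat_t).Dev, nil
}

func main() {
    notMnt, err := isLikelyNotMountPoint("/proc")
    fmt.Println(notMnt, err) // typically "false <nil>", since /proc is a mount point
}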

@saad-ali
Member

saad-ali commented Jun 7, 2016

Thanks for the clarification

@luxas
Member

luxas commented Jun 7, 2016

Thanks for debugging this.
May I set the v1.3 milestone, as I think we should fix this before release?

@jingxu97
Contributor

jingxu97 commented Jun 7, 2016

Sure. Thanks!

Jing


@luxas luxas added this to the v1.3 milestone Jun 7, 2016
@luxas luxas added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. and removed priority/backlog Higher priority than priority/awaiting-more-evidence. labels Jun 7, 2016
jingxu97 added a commit to jingxu97/kubernetes that referenced this issue Jun 14, 2016
In nsenter_mount.go/isLikelyNotMountPoint function, the returned output
from findmnt command misses the last letter. Modify the code to make sure
that output has the full target path. fix kubernetes#26421 kubernetes#25056 kubernetes#22911
@dsanders1234

Please note that the findmnt command does not exist on the Docker host VM for the VirtualBox/Windows setup (i.e. boot2docker). I'm pretty sure this will be the case for Mac too. The errors that I get are:

nsenter_mount.go:185] Failed findmnt command for path /var/lib/kubelet/pods/9602feb4-31ba-11e6-bdbe-f62284e3ec86/volumes/kubernetes.io~secret/default-token-4hz3v: exit status 1

and these don't seem to get cleaned up.

k8s-github-robot pushed a commit that referenced this issue Jun 19, 2016
Automatic merge from submit-queue

Fix bug in isLikelyNotMountPoint function

In nsenter_mount.go/isLikelyNotMountPoint function, the returned output
from findmnt command misses the last letter. Modify the code to use
String.contains instead of string matching. fixes #26421 fixes #25056 fixes #22911
@amlinux
Author

amlinux commented Sep 7, 2016

This problem is still there. Right now I'm observing it on my GKE instances (managed VM, Kubernetes 1.3.5):

E0907 00:20:19.615448    3457 kubelet.go:2275] Failed cleaning up orphaned pod directories: remove /var/lib/kubelet/pods/85e13d83-711e-11e6-9db3-42010af0003f/volumes/kubernetes.io~secret/authdb-dsn: device or resource busy
E0907 00:20:19.615520    3457 kubelet.go:2596] Failed cleaning pods: remove /var/lib/kubelet/pods/85e13d83-711e-11e6-9db3-42010af0003f/volumes/kubernetes.io~secret/authdb-dsn: device or resource busy
W0907 00:20:20.638008    3457 container_gc.go:116] Failed to remove dead container "/k8s_POD.d8dbe16c_frontend-511d9_default_e70b06ac-6c7d-11e6-b07d-42010af00048_528d48e3": Error response from daemon: Unable to remove filesystem for f273e3cb9cb9b65e44ded8de5e4dd3215eac08ccc4abc0402d7d4fb849de3e1c: remove /var/lib/docker/containers/f273e3cb9cb9b65e44ded8de5e4dd3215eac08ccc4abc0402d7d4fb849de3e1c/shm: device or resource busy
W0907 00:20:20.641453    3457 container_gc.go:116] Failed to remove dead container "/k8s_POD.d8dbe16c_stationd-v20160831-2-k1xyz_orbital-prod_096e406c-6f3f-11e6-b07d-42010af00048_de6b8b72": Error response from daemon: Unable to remove filesystem for f73afb5773a5321412340d6e29d3936470b9784a98af3874d599a071f2196d5a: remove /var/lib/docker/containers/f73afb5773a5321412340d6e29d3936470b9784a98af3874d599a071f2196d5a/shm: device or resource busy
W0907 00:20:20.649780    3457 container_gc.go:116] Failed to remove dead container "/k8s_POD.d8dbe16c_frontend-v20160831-2-ixopu_orbital-prod_7a4ce4d0-6f3c-11e6-b07d-42010af00048_124ba6ee": Error response from daemon: Unable to remove filesystem for 7616add0f17d3a6c65be02a5f65dfb2585d69062c80193249d90a38bd0068479: remove /var/lib/docker/containers/7616add0f17d3a6c65be02a5f65dfb2585d69062c80193249d90a38bd0068479/shm: device or resource busy
W0907 00:20:20.651825    3457 container_gc.go:116] Failed to remove dead container "/k8s_POD.d8dbe16c_sessionsd-v20160831-2-4srld_orbital-prod_fb6a70d2-6f3f-11e6-b07d-42010af00048_e7b5b4b9": Error response from daemon: Unable to remove filesystem for d37a415eb2f303631297bee47c38dd23f0cb0e214b727d337109ed0b4513af11: remove /var/lib/docker/containers/d37a415eb2f303631297bee47c38dd23f0cb0e214b727d337109ed0b4513af11/shm: device or resource busy
W0907 00:20:20.653428    3457 container_gc.go:116] Failed to remove dead container "/k8s_POD.d8dbe16c_version-assigner-m0gc8_default_8e4e34e0-5d64-11e6-b07d-42010af00048_69f10933": Error response from daemon: Unable to remove filesystem for d55b1f61bbacd01572205b48b3d085030a47352f5889933f303cc3fd7abe1308: remove /var/lib/docker/containers/d55b1f61bbacd01572205b48b3d085030a47352f5889933f303cc3fd7abe1308/shm: device or resource busy
I0907 00:20:21.399689    3457 kubelet.go:2029] Failed to remove orphaned pod "85e13d83-711e-11e6-9db3-42010af0003f" dir; err: remove /var/lib/kubelet/pods/85e13d83-711e-11e6-9db3-42010af0003f/volumes/kubernetes.io~secret/authdb-dsn: device or resource busy
E0907 00:20:21.399941    3457 kubelet.go:2275] Failed cleaning up orphaned pod directories: remove /var/lib/kubelet/pods/85e13d83-711e-11e6-9db3-42010af0003f/volumes/kubernetes.io~secret/authdb-dsn: device or resource busy
E0907 00:20:21.400817    3457 kubelet.go:2596] Failed cleaning pods: remove /var/lib/kubelet/pods/85e13d83-711e-11e6-9db3-42010af0003f/volumes/kubernetes.io~secret/authdb-dsn: device or resource busy

I noticed it after the CPU load jumped from a typical 15% to a crazy 100% on both my nodes:

[CPU utilization graph]

@jingxu97
Contributor

jingxu97 commented Sep 7, 2016

@amlinux Thank you for providing this information. After checking the log offline, the problem should be fixed in v1.3.6. Please upgrade to v1.3.6 and let us know if you have any problem. Thanks!
