Pods do not get cleaned up #45688
Comments
/sig node Great write-up, thanks for the extra detail.
@jagosan: Reiterating the mentions to trigger a notification: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
@lorenz I had the same issue, but it's now resolved. Maybe you could try "grep -l container_id /proc/*/mountinfo" to check what is still holding the mounts and preventing your pod from terminating.
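For anyone unfamiliar with that trick, here is a rough Go equivalent (a sketch only; the string to search for, e.g. a container ID or volume path fragment, is passed as a hypothetical command-line argument). Every process exposes its mounts in /proc/&lt;pid&gt;/mountinfo, so any PID whose mountinfo mentions the volume is still holding a reference to it.

```go
// mountinfo-scan: print the PIDs whose mount namespace still references a
// given string, which is roughly what "grep -l <id> /proc/*/mountinfo" does.
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: mountinfo-scan <container-id-or-path>")
		os.Exit(1)
	}
	needle := os.Args[1]

	paths, _ := filepath.Glob("/proc/[0-9]*/mountinfo")
	for _, p := range paths {
		data, err := os.ReadFile(p)
		if err != nil {
			continue // the process may have exited in the meantime
		}
		if strings.Contains(string(data), needle) {
			// The parent directory name is the PID holding the mount.
			fmt.Println(filepath.Base(filepath.Dir(p)))
		}
	}
}
```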
@moon03432: I have the same problem. Can you tell me how you resolved it? I see containers stuck in the same state, and I get an error message when I try to delete them. The strange thing for me was that the mountinfo grep listed three hits, and I checked the corresponding PIDs.
Just wanted to say that this is still an issue on 1.7.0. This is also not caused by a process hanging on to the mountpoint; I can unmount the path/volume myself and kubelet will then clean up the pod. Could we have kubelet attempt unmounting if it sees that a volume can't be removed because it's busy? I could prepare a PR for that.
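As a rough sketch of that proposal (illustrative Go only, not kubelet's actual cleanup code; the path in main is hypothetical): if removing the volume directory fails with EBUSY, unmount it first and retry the removal.

```go
// If removing a volume directory fails with EBUSY because it is still a
// live mountpoint, detach the mount and retry the removal once.
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

func removeVolumeDir(path string) error {
	err := os.Remove(path)
	if err == nil || !errors.Is(err, syscall.EBUSY) {
		return err
	}
	// EBUSY: something is still mounted at path. Detach it, then retry.
	if uerr := syscall.Unmount(path, 0); uerr != nil {
		return uerr
	}
	return os.Remove(path)
}

func main() {
	// Hypothetical stuck secret-volume directory.
	dir := "/var/lib/kubelet/pods/POD_UID/volumes/kubernetes.io~secret/default-token"
	if err := removeVolumeDir(dir); err != nil {
		fmt.Println("cleanup failed:", err)
	}
}
```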
@euank After some tests I think you're right. I tried to reproduce on the two machines on 1409.5.0 and they clean up pods properly. The only machine where it doesn't is the one still on 1298.7.0 (soon going to get updated). So I think we can close this (at least for my problem, not sure about the other ones) as fixed in CoreOS 1353.8.0.
@muffin87 I just killed my node_exporter. I think this could release some dead containers, but not all of them.
@moon03432: Thanks for the answer! Yes, killing node_exporter and fluentd frees up the busy resources. I can reproduce this when I create a Pod that mounts /var/lib into the container.
Do you happen to have an issue number within the CoreOS project that describes the fix?
ref #51835
- Mark kubelet datadir volume as a recursive mount in kubelet-wrapper (kubernetes/kubernetes#45688): https://github.com/coreos/coreos-overlay/pull/2508/files
I am still getting this error with kubelet 1.9.2. Nothing shows up with lsof or fuser on /var/lib/kubelet/pods/.. Apparently, stopping the Docker service released the lock and I was able to remove those directories manually.
@rambo45 Are you running Docker 17.xx? Those releases still don't seem to be stable with Kubernetes.
@lorenz I am running Docker 17.03. It seems to work for the most part until you start deleting and recreating containers rapidly, which seems to hit a race condition in the volume cleanup. I think it's still manageable once there is enough documentation around these scenarios.
@rambo45 Seeing exactly the same. Luckily CoreOS still ships Docker 1.12 if you enable it, so I'm currently running that everywhere. I have a lot of container churn, so staying on 17.09 (what CoreOS ships by default) was not an option; within 24 hours I accumulated a few hundred pods stuck in Terminating. Still awaiting a proper cri-containerd so that I can get rid of Docker for good.
#28750 describes the problem for a much older Kubernetes version and is marked as fixed.
Is this a BUG REPORT or FEATURE REQUEST? (choose one): Bug Report
Kubernetes version:
Environment:
What happened:
Some terminated pods are not cleaned up (staying in Terminating state) for a long time (maybe indefinitely?) because of issues deleting secret volumes.
Log excerpt:
Excerpt from mount:
Output from fuser -vm:
The reason they're not being cleaned up is that the volume is not unmounted before being moved to the deletion area, and you can't move a directory that is a mountpoint with a rename() syscall, which is what Go does internally when you call os.Rename.
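As a standalone illustration of that failure mode (the paths below are hypothetical): rename(2) refuses to move a directory that is an active mountpoint and fails with EBUSY, so the rename into the deletion area cannot succeed until the volume is unmounted.

```go
// Demonstrates that os.Rename (rename(2)) fails with EBUSY when the source
// directory is still an active mountpoint.
package main

import (
	"fmt"
	"os"
)

func main() {
	// Hypothetical kubelet-style paths: src is a secret volume that is
	// still mounted, dst is the "deletion area" it should be moved to.
	src := "/var/lib/kubelet/pods/POD_UID/volumes/kubernetes.io~secret/default-token"
	dst := "/var/lib/kubelet/pods/POD_UID/volumes/kubernetes.io~secret/default-token.deleting"

	// While src is a mountpoint, the kernel rejects the move with EBUSY
	// ("device or resource busy") and the pod stays in Terminating.
	if err := os.Rename(src, dst); err != nil {
		fmt.Println("rename failed:", err)
	}
}
```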
What you expected to happen:
The Pods should be cleaned up.
How to reproduce it (as minimally and precisely as possible):
Happens on all three machines, so reproducing with CoreOS + kubelet-wrapper should probably work.