Pods can get stuck Terminating in certain situations if running a container fails #24819
This is easy to reproduce by inserting a 2-second sleep before the error return here: kubernetes/pkg/kubelet/dockertools/manager.go Line 1468 in e566948
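For readers who want to see the shape of that reproduction, here is a minimal sketch. The function and types are placeholders rather than the real dockertools/manager.go code at the referenced line; the only point is the artificial 2-second pause on the error path:

```go
package repro

import (
	"fmt"
	"time"
)

// inspector stands in for the kubelet's Docker client; only the single call
// needed for this illustration is declared here.
type inspector interface {
	InspectContainer(id string) (interface{}, error)
}

// inspectWithDelay mirrors the shape of the reproduction: pause for two
// seconds just before propagating the inspection error, which widens the
// window in which a concurrent PLEG relist observes the failure.
func inspectWithDelay(c inspector, id string) (interface{}, error) {
	result, err := c.InspectContainer(id)
	if err != nil {
		// Inserted purely to reproduce the race; the real code returns the
		// error immediately.
		time.Sleep(2 * time.Second)
		return nil, fmt.Errorf("inspecting container %q: %w", id, err)
	}
	return result, nil
}
```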
@ncdc, does the container inspection always fail, or does it only fail in a certain window of time? I am trying to understand the scope of the problem. If it's a permanent
@yujuhong it only ever inspects this particular container one time (during the PLEG relist where it sees the 500 Unknown device error), and that inspection fails. That's what's recorded in the PLEG status cache:
If the kubelet or you tried to inspect container
kubernetes/pkg/kubelet/pleg/generic.go Lines 203 to 204 in 34d4eae
@ncdc, so the timeline is:
I think PLEG should explicitly record the pods/containers to retry the next time if an error is encountered (as opposed to relying on maintaining the old view of container states). What do you think?
@yujuhong the creation of container B failed (permission denied error in /var/lib/docker/volumes). You might argue that the fact that the container even shows up when listing all containers is a bug, since Docker is in the middle of creating the container but that call hasn't returned yet. I like the idea of recording what to retry. How are you thinking we can implement this?
Also, re "it updates the cache with the error and maintains the old view of only seeing A (so that it'd retry the next time)", it's the following: kubernetes/pkg/kubelet/pleg/generic.go Line 229 in 34d4eae
So the container appears briefly in
Yes, it was done intentionally so that PLEG will retry the inspection again in the next relist, assuming the container was still there. We should send out the events regardless (even if the inspect fails), but record the pod ID internally so that we can retry and update the cache in the next relist. The downside is that we'd decouple the event and cache update even further; the consumer of the event may not see an up-to-date cache. I think it's reasonable to assume that the consumers will retry if the cache is not ready, like what the pod workers do today, as long as this doesn't happen often.
The key is to make sure that the cache doesn't continue to return the same
We need to add context to the event created when the PLEG inspect fails
I'm actually seeing this happen frequently (the presence of the "Unknown device" errors during PLEG inspection), but it's not a problem because it's happening during the creation of a container, and a subsequent PLEG relist ends up with a successful inspection. I wrote a sample go program to create a container in 1 goroutine and then inspect it as often and as fast as possible until it gets no error. I was able to have the inspection fail 45-50 times with "Unknown device" prior to it succeeding. While this isn't really a valid test (hammering the Docker daemon to inspect 1 container), it does highlight that this can certainly happen.
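The thread doesn't include the test program itself, but a rough reconstruction along those lines might look like the following, using the fsouza/go-dockerclient library; the container name, image, and loop bound are made up for illustration:

```go
package main

import (
	"fmt"
	"log"

	docker "github.com/fsouza/go-dockerclient"
)

func main() {
	client, err := docker.NewClientFromEnv()
	if err != nil {
		log.Fatal(err)
	}

	// Hypothetical fixed name so the inspection loop can race the creation.
	const name = "inspect-race-test"

	// Create the container in a separate goroutine; the busybox image is
	// assumed to already be present locally.
	done := make(chan struct{})
	go func() {
		defer close(done)
		_, err := client.CreateContainer(docker.CreateContainerOptions{
			Name: name,
			Config: &docker.Config{
				Image: "busybox",
				Cmd:   []string{"sleep", "60"},
			},
		})
		if err != nil {
			log.Printf("create failed: %v", err)
		}
	}()

	// Hammer the daemon with inspections until one succeeds, counting how
	// many attempts fail along the way.
	failures := 0
	for i := 0; i < 1000000; i++ {
		if _, err := client.InspectContainer(name); err == nil {
			break
		}
		failures++
	}
	<-done
	fmt.Printf("inspection failed %d times before succeeding\n", failures)
}
```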
It appears that removing the log message and continue statement from kubernetes/pkg/kubelet/pleg/generic.go Line 228 in 34d4eae
kubernetes/pkg/kubelet/dockertools/manager.go Line 1974 in d6f26b6
@yujuhong what are the implications (if any) of removing that continue statement?
Once you remove the continue … In short, we can't just remove the continue.
So we need to add a new
Yes, in addition to detecting container changes via
Ok, I can do that 😄
Automatic merge from submit-queue

PLEG: reinspect pods that failed prior inspections

Fix the following sequence of events:

1. relist call 1 successfully inspects a pod (it just has an infra container)
2. relist call 2 gets an error inspecting the same pod (it has the infra container plus a transient container that failed to create) and doesn't update the old/new pod records
3. relist calls 3+ don't inspect the pod any more (it just has the infra container, so it doesn't look like anything changed)

This change adds a new list that keeps track of pods that failed inspection and retries them the next time relist is called. Without this change, a pod in this state would never be inspected again, its entry in the status cache would never be updated, and the pod worker would never call syncPod again because the most recent entry in the status cache has an error associated with it. As a result, pods in this state would be stuck Terminating forever unless the user issued a deletion with a grace period of 0.

Fixes #24819

cc @kubernetes/rh-cluster-infra @kubernetes/sig-node
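As a rough illustration of the approach described in that merge message (simplified names, not the actual PR code), the relist loop keeps a set of pods whose inspection failed and folds them into the next pass:

```go
package pleg

type podID string

type genericPLEG struct {
	// podsToReinspect holds pods whose most recent inspection returned an
	// error; they are retried on the next relist even if nothing about them
	// appears to have changed.
	podsToReinspect map[podID]struct{}
}

// relist inspects every pod that appears to have changed, plus every pod
// that failed inspection last time, and rebuilds the retry set from this
// round's failures.
func (g *genericPLEG) relist(changedPods []podID, inspect func(podID) error) {
	for id := range g.podsToReinspect {
		changedPods = append(changedPods, id)
	}

	needsReinspection := make(map[podID]struct{})
	for _, id := range changedPods {
		if err := inspect(id); err != nil {
			// Keep the pod on the retry list; its cache entry still carries
			// the error, but we will try again on the next relist instead of
			// silently forgetting about it.
			needsReinspection[id] = struct{}{}
		}
	}
	g.podsToReinspect = needsReinspection
}
```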
Has anything changed? It seems to me that I'm hitting this issue.
@DenisIzmaylov this is a very specific issue that has a very particular root cause for why pods get stuck terminating, and it has been fixed by #25077. There may be other reasons why pods get stuck terminating, so I'm wondering if you're maybe hitting a different root cause?
Maybe. How can I see the logs for this issue?
@DenisIzmaylov, one common reason for pods to be stuck at terminating is that the worker goroutine responsible for the pod is stuck on certain operations (e.g., image pulling). By the way, you can forcibly delete the pod with
I have
I run … Details about a single pod:
But I've not configured it and do not have any mention of
In the pod spec, if you don't specify a DNS policy, the default is "ClusterFirst". How long did your pods stay in the terminating state? Did they ever become running before you attempted to delete them?
Would it be possible to move this discussion to either StackOverflow or a different issue? |
@yujuhong Hm, I've not found it:
@ncdc of course |
I've created a new one. |
@ncdc @yujuhong PR #25077 should have fixed this issue as of Kubernetes 1.3.8, but I encountered almost the same issue in our environment. A pod got stuck Terminating after I deleted its rc, and then new pods got stuck ContainerCreating when I recreated the rc. When I tried to delete the rc of the new pod, the new pod got stuck Terminating, too.
Not sure if it's one of the same situations @ncdc mentioned. I just get things like this in kubelet.log:
I have had this happen to me as part of an issue with resource limits and the cluster autoscaler occurring at the same time as a kops rolling-update. This left the pod in a Terminating state without a ReplicaSet or any Deployment, StatefulSet, Job, etc. I tried the force delete and it hung for 20 minutes before I stopped it. The describe output shows that the pod didn't get another node scaled up for it (the autoscaler was failing) and that PodScheduled is False because it was requested to terminate. I suspect that the log entry was cleaned up as mentioned above, so the kubelet(?) does not know its status and cannot determine it. Clicking on it in the Dashboard of course throws an error because the ReplicaSet (or other parent) is missing. Does anyone have any other ideas for how to purge it from the list (besides --grace-period)?
In one of our clusters, we are not allowing containers to run images that use VOLUME instructions without a corresponding Kubernetes volume and volumeMount to back them. We've done this by making /var/lib/docker/volumes immutable (chattr +i). An attempt to run such a pod results in the container creation failing, and there's no way for the pod to run successfully.

What we're seeing is that sometimes, attempts to delete the pod result in the pod getting stuck Terminating unless you delete it with a grace period of 0. This appears to be because of the following:
VOLUME
API error (500): Unknown device 27038b8e8b60506e1744c7ab15e06a3e24855fa925678e75d72b6e1fd3bd4a31
kubernetes/pkg/kubelet/pod_workers.go Lines 125 to 127 in 392fc66
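The pod_workers.go reference above is the piece that keeps the pod wedged: the worker waits for a status cache entry newer than its last sync, and if that entry carries an error it returns without calling syncPod. A minimal sketch of that gate, with stand-in names rather than the real kubelet types:

```go
package podworkers

import "time"

// podStatus and statusCache are simplified stand-ins for the kubelet's
// runtime status cache; they are not the real API.
type podStatus struct{}

type statusCache interface {
	// GetNewerThan returns a status entry newer than minTime, along with any
	// inspection error that was recorded with that entry.
	GetNewerThan(uid string, minTime time.Time) (*podStatus, error)
}

// syncOnce illustrates the gate: if the freshest cache entry carries an
// error, the worker bails out without calling syncPod.
func syncOnce(cache statusCache, uid string, lastSync time.Time, syncPod func(*podStatus) error) error {
	status, err := cache.GetNewerThan(uid, lastSync)
	if err != nil {
		// With a stale, error-bearing cache entry this branch is taken on
		// every attempt, so the pod never syncs and never finishes terminating.
		return err
	}
	return syncPod(status)
}
```

Combined with the PLEG behavior discussed above, where the error-bearing cache entry was never replaced, this loop never makes progress, which is why only a grace-period-0 delete worked before the fix.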
@kubernetes/sig-node @kubernetes/rh-cluster-infra @ironcladlou @smarterclayton