Fix "PVC Volume not detached if pod deleted via namespace deletion" issue #29077
Fix "PVC Volume not detached if pod deleted via namespace deletion" issue #29077
Conversation
I believe this item ("Modify attach/detach controller to cache PVC/PV for each volume so even if these objects are deleted, pod deletion events can be processed without error, resulting in even faster detaches.") has been causing us a lot of pain using PDs on GKE.

We have a script that launches our deployments, PV, PVC, etc. It creates the PV/PVC first and then the deployment; however, the script that tears down the deployment deletes the PV/PVC before deleting the deployment. We never delete the namespace, and we deploy into a non-default namespace. We randomly see PDs remaining attached for long periods after deletion, usually finally detaching after 5-15 minutes. We have a GKE engineer looking at the cluster, so if you want any details about the findings, let me know.

We have another issue (may not be related) where deployment times start at 5-10 seconds on a fresh cluster, then after a few days or a week climb to 1-2 minutes and never recover. If we delete the cluster and re-create it, we get back to 5-10 seconds, but since it's GKE that's our only option to fix it. The PD issues don't seem to start until the cluster begins doing slow deployments, so it might be worth looking into. If this is the issue we've been seeing on GKE, it's been happening over the last couple of months and across almost every new version released during that time.
@saad-ali: If you're interested, the Google GKE support case is 09807892.
@@ -84,7 +84,7 @@ func (dswp *desiredStateOfWorldPopulator) findAndRemoveDeletedPods() {
 	}
 	// retrieve the pod object from pod informer with the namespace key
 	informerPodObj, exists, err := dswp.podInformer.GetStore().GetByKey(dswPodKey)
-	if err != nil || informerPodObj == nil {
+	if err != nil || (exists && informerPodObj == nil) {
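For context, a minimal sketch of how the populator can act on the three-valued GetByKey return (object, exists, err). Only the GetByKey call and the fixed condition come from the hunk above; the surrounding control flow is illustrative:

informerPodObj, exists, err := dswp.podInformer.GetStore().GetByKey(dswPodKey)
if err != nil || (exists && informerPodObj == nil) {
	// Lookup error, or the store reports the key present but handed back no
	// object: we cannot tell whether the pod is gone, so leave it alone.
	continue
}
if !exists {
	// Key no longer in the informer cache: the pod was deleted, even if the
	// delete event itself was missed, so fall through and remove the pod
	// (and its volumes) from the desired state of the world.
}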
@saad-ali Do we want to make this check more descriptive? Is one case worth logging more than the other?
Trying to understand:
if err != nil {
	glog.Errorf("podInformer GetByKey failed for %q with error: %v", dswPodKey, err)
	continue
}
if exists && informerPodObj == nil {
	glog.Infof("podInformer GetByKey found pod %q that is already deleted", dswPodKey)
	continue
}
Fix bug in desired_state_of_the_world_populator.go to check exists so that it can delete pods even if the delete event is missed (or fails)
0fd5e1f to afd8a58
@matchstick PTAL. Will do the following in a follow-up (#29262):
Opened #29311 to track improvements to this code.
@saad-ali Giving an LGTM for 1.3, but we really need something better for 1.4, as this code has made us bang our heads several times.
GCE e2e build/test passed for commit afd8a58. |
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge] |
GCE e2e build/test passed for commit afd8a58. |
Automatic merge from submit-queue |
…77-upstream-release-1.3 Automated cherry pick of #29077 upstream release 1.3
…ck-of-#29077-upstream-release-1.3 Automated cherry pick of kubernetes#29077 upstream release 1.3
Fixes #29051: "PVC Volume not detached if pod deleted via namespace deletion"
This PR fixes a bug in desired_state_of_the_world_populator.go: it now checks the value of exists returned by the podInformer so that it can delete pods even if the delete event is missed (or fails).
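To make the failure mode concrete, here is a small self-contained sketch. The fakeStore type is hypothetical (not the real Kubernetes informer API); it only mimics the (object, exists, err) contract of GetByKey to show why conflating "lookup error" with "key absent" leaves deleted pods, and their attached volumes, stranded:

package main

import "fmt"

// fakeStore mimics the (object, exists, err) contract of the informer
// store's GetByKey. It is illustrative only.
type fakeStore struct {
	pods map[string]interface{}
}

func (s fakeStore) GetByKey(key string) (interface{}, bool, error) {
	obj, ok := s.pods[key]
	if !ok {
		return nil, false, nil // key absent is not an error
	}
	return obj, true, nil
}

func main() {
	store := fakeStore{pods: map[string]interface{}{}} // pod already deleted

	obj, exists, err := store.GetByKey("ns/mypod")

	// Old check: a deleted pod looks the same as an error, so it is skipped
	// forever and its volume is never removed from the desired state.
	if err != nil || obj == nil {
		fmt.Println("old check: skip pod; volume stays attached")
	}

	// Fixed check: only skip on a real error or an inconsistent store entry;
	// exists == false falls through so the deleted pod can be cleaned up.
	if err != nil || (exists && obj == nil) {
		fmt.Println("fixed check: skip pod")
	} else if !exists {
		fmt.Println("fixed check: pod deleted, remove from desired state")
	}
}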