
Make the Garbage Collector work and make it remove the dummy Pod #149

Closed
luxas opened this issue Feb 8, 2017 · 18 comments · Fixed by kubernetes/kubernetes#42657

Comments

@luxas
Member

luxas commented Feb 8, 2017

Currently the garbage collector doesn't seem to be working: it should remove the dummy Pod when the dummy Deployment is deleted, but right now it doesn't, even though one sets OrphanDependents=&falseVar.

I've looked at the docs here: https://kubernetes.io/docs/user-guide/garbage-collection/, and IIUC, the GC should remove the ReplicaSet and the Pod when OrphanDependents=&falseVar:

When deleting an object, you can request the GC to asynchronously delete its dependents by explicitly specifying deleteOptions.orphanDependents=false in the deletion request that you send to the API server. A 200 OK response from the API server indicates the owner is deleted.

The code I used for deleting the deployment looks like this:

```go
falseVar := false
if err := client.ExtensionsV1beta1().Deployments(metav1.NamespaceSystem).Delete("dummy", &metav1.DeleteOptions{
	OrphanDependents: &falseVar,
}); err != nil {
	fmt.Printf("[apiclient] Failed to delete test deployment [%v] (will ignore)\n", err)
}
```

cc @caesarxuchao @mikedanese @deads2k @sttts

@luxas
Member Author

luxas commented Feb 8, 2017

What's really strange here is that the ReplicaSet gets deleted, but not the Pod.

Pod YAML:

```yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/created-by: |
      {"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"kube-system","name":"dummy-2165365107","uid":"90581c45-ee30-11e6-aa75-f0761c62f136","apiVersion":"extensions","resourceVersion":"111"}}
  creationTimestamp: 2017-02-08T18:58:24Z
  generateName: dummy-2165365107-
  labels:
    app: dummy
    pod-template-hash: "2165365107"
  name: dummy-2165365107-pvlzn
  namespace: kube-system
  resourceVersion: "422"
  selfLink: /api/v1/namespaces/kube-system/pods/dummy-2165365107-pvlzn
  uid: 915ccf2b-ee30-11e6-aa75-f0761c62f136
spec:
  containers:
  - image: gcr.io/google_containers/pause-amd64:3.0
    imagePullPolicy: IfNotPresent
    name: dummy
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-rb2lv
      readOnly: true
  dnsPolicy: ClusterFirst
  hostNetwork: true
  nodeName: theninja
  restartPolicy: Always
  schedulername: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  volumes:
  - name: default-token-rb2lv
    secret:
      defaultMode: 420
      secretName: default-token-rb2lv
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2017-02-08T18:59:29Z
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: 2017-02-08T18:59:34Z
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: 2017-02-08T18:59:29Z
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://6a56513c6e34c65b73607abb4eea8a0cfef705d97f0e060eec00304dda253465
    image: gcr.io/google_containers/pause-amd64:3.0
    imageID: docker://sha256:99e59f495ffaa222bfeb67580213e8c28c1e885f1d245ab2bbe3b1b1ec3bd0b2
    lastState: {}
    name: dummy
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: 2017-02-08T18:59:33Z
  hostIP: 172.20.10.5
  phase: Running
  podIP: 172.20.10.5
  startTime: 2017-02-08T18:59:29Z
```

The only place the owner (the RS) is mentioned is in the kubernetes.io/created-by annotation; I expected the Pod to have ownerReferences as well, but it doesn't.

The discovery Pod, on the other hand, has that field:

```yaml
  ownerReferences:
  - apiVersion: extensions/v1beta1
    controller: true
    kind: ReplicaSet
    name: kube-discovery-2187510969
    uid: bbfc2be6-ee30-11e6-aa75-f0761c62f136
```

And the discovery ReplicaSet has:

```yaml
  ownerReferences:
  - apiVersion: extensions/v1beta1
    controller: true
    kind: Deployment
    name: kube-discovery
    uid: bbf76f68-ee30-11e6-aa75-f0761c62f136
```

cc @Kargakis @mikedanese @yujuhong

@sttts

sttts commented Feb 9, 2017

@luxas do we have an e2e or integration test for something like this?

@caesarxuchao
Member

What's really strange here is that the ReplicaSet gets deleted, but not the Pod.

I know why this is happening. When deleting the RS, the garbage collector sets DeleteOptions.OrphanDependents=nil, because the garbage collector wants to respect the default GC policy and the existing finalizers of the RS. Unfortunately, the default GC policy for an RS is orphaning, so the Pods are not deleted.

To solve this issue, we can easily let the GC always delete objects with DeleteOptions.OrphanDependents=false. But this means that if a user wants to delete the Deployment and RSes, but not the Pods, they will have to manually remove the ownerReferences from each Pod. cc @lavalamp @krmayankk
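
For illustration, a sketch of the client-side equivalent of what the GC does today when it propagates the Deployment's deletion to the RS (assuming the same 2017-era client-go clientset named `client` as in the snippet above; the GC actually issues the delete through the API server, not this typed client):

```go
// OrphanDependents is left nil, so the RS's default GC policy (orphaning)
// wins and the Pod survives the deletion of the RS.
if err := client.ExtensionsV1beta1().ReplicaSets(metav1.NamespaceSystem).Delete("dummy-2165365107", &metav1.DeleteOptions{
	OrphanDependents: nil,
}); err != nil {
	return err
}
```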

@luxas
Member Author

luxas commented Feb 9, 2017

@caesarxuchao Thanks!

To solve this issue, we can easily let the GC always delete objects with DeleteOptions.OrphanDependents=false

Or instead, we could make the GC pass DeleteOptions.OrphanDependents=false down to the RS level when the Deployment is deleted with DeleteOptions.OrphanDependents=false. That might make even more sense: once you've said you want to delete the "children" of the object you're deleting, that instruction should "flow down" as well, I think. WDYT?

@yujuhong

yujuhong commented Feb 9, 2017

Or instead, we could make the GC pass DeleteOptions.OrphanDependents=false down to the RS level when the Deployment is deleted with DeleteOptions.OrphanDependents=false. That might make even more sense: once you've said you want to delete the "children" of the object you're deleting, that instruction should "flow down" as well, I think. WDYT?

+1. We should respect options users set.

@krmayankk

I can't think of a use case where a user wants to delete both the Deployment and the RS but not the Pods. I can take care of this after I am done with kubernetes/kubernetes#40898.

I think the accepted fix here is to pass on to the RS the same OrphanDependents value that was passed to the Deployment.

@krmayankk

Curious, @caesarxuchao @luxas: why is this issue in the kubeadm repo and not in kubernetes?

@errordeveloper
Member

errordeveloper commented Feb 9, 2017

I noticed this when I implemented the dummy deployment. I then opened kubernetes/kubernetes#35447, which was a dup of kubernetes/kubernetes#33845 and is now closed. I was intending to just recursively delete the thing, but haven't gotten around to it yet.

@krmayankk

@caesarxuchao @Kargakis I can handle this as well; one of you can assign it to me.

@caesarxuchao
Member

caesarxuchao commented Feb 10, 2017

Let me describe the situation once more, and then we can decide.

Let's say there is a deployment D and a service S1 (the API object "Service"; note that any API object can depend on a deployment, not just an RS) that depends on D; further, there are services S2 and S3 that depend on S1. Now the user wants to delete D and S1, but keep S2 and S3. There are two ways to achieve that:

1. remove the ownerReferences in S2 and S3 so that they no longer depend on S1, then delete D;
2. add FinalizerOrphan to metadata.finalizers on S1, then delete D.

If we change the GC to always set DeleteOptions.Orphan=false, then the second approach won't work anymore, because DeleteOptions has higher priority than existing finalizers. In the case above, it's easy to remove the ownerReferences on S2 and S3, but it would be difficult if there were 1k services depending on S1. Are we ready to accept this side effect?

cc @lavalamp

@0xmichalis
Contributor

@caesarxuchao it's not clear what you mean by "service". Is it the Service object or something else related to the Deployment?

note that any API object can depend on a deployment, not just an RS

Can you provide a concrete example? Is this dependence at the resource level (Deployment has RS children) or at the service level (Deployment startup depends on the output of a Job)? Why, in the latter case, are owner references relevant?

@caesarxuchao
Member

@caesarxuchao it's not clear what you mean by "service". Is it the Service object or something else related to the Deployment?

The API object "Service". It's a fictional example. It doesn't need to be a "Service"; it could be a Pod, a Job, or anything.

Can you provide a concrete example? Is this dependence at the resource level (Deployment has RS children) or at the service level (Deployment startup depends on the output of a Job)?

The service level. In the example, S1 exposes the deployment D, so it should go away when D is deleted. It could be that S2 and S3 are not useful without S1, so the user at first wants to delete all of them together. But at the last minute, the user changes his mind and wants to keep S2 and S3. With the current GC process, changing your mind is easy: the user just needs to put the OrphanFinalizer on S1.

Why, in the latter case, are owner references relevant?

The GC only looks at ownerReferences to build the dependency graph; it doesn't know whether a dependency is at the service level or the resource level.
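
To make the second approach from the made-up example concrete, here is a minimal sketch under the same assumptions as the snippet in the issue description (a client-go clientset named `client`; "s1" and "d" are the fictional service and deployment; the orphan finalizer is the string "orphan", exposed in apimachinery as metav1.FinalizerOrphanDependents):

```go
// Mark S1 so the GC orphans its dependents (S2, S3): the GC strips their
// ownerReferences instead of deleting them.
s1, err := client.CoreV1().Services(metav1.NamespaceSystem).Get("s1", metav1.GetOptions{})
if err != nil {
	return err
}
s1.Finalizers = append(s1.Finalizers, metav1.FinalizerOrphanDependents) // "orphan"
if _, err := client.CoreV1().Services(metav1.NamespaceSystem).Update(s1); err != nil {
	return err
}
// Now delete D with cascading enabled; S1 goes away, S2 and S3 are kept.
falseVar := false
if err := client.ExtensionsV1beta1().Deployments(metav1.NamespaceSystem).Delete("d", &metav1.DeleteOptions{
	OrphanDependents: &falseVar,
}); err != nil {
	return err
}
```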

@luxas
Member Author

luxas commented Feb 15, 2017

@caesarxuchao @Kargakis @krmayankk Any resolution here? Can we get the above proposal done in time for v1.6? It doesn't sound hard to implement.

@lavalamp
Member

Is there a way for the Deployment to create RSes with different default behavior? I think that is the way to fix this.

@caesarxuchao
Member

caesarxuchao commented Feb 17, 2017

Is there a way for the Deployment to create RSes with different default behavior?

No.

The answer is perhaps "yes" with the coming foreground GC (#38678). The Deployment could then create RSes with metadata.finalizers set to FinalizerDeleteDependents; the RSes and their Pods would then be deleted with foreground GC by default. [edit] Foreground GC means the RSes will not be deleted until their Pods have been garbage collected.
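
A sketch of what that could look like, assuming the foreground GC work (#38678) ships the finalizer as metav1.FinalizerDeleteDependents (the string "foregroundDeletion") and the extensions/v1beta1 types are imported as `v1beta1`:

```go
// A Deployment controller could create its RSes with the foreground-deletion
// finalizer pre-set, so an RS is not deleted until its Pods are gone.
rs := &v1beta1.ReplicaSet{
	ObjectMeta: metav1.ObjectMeta{
		Name:       "dummy-2165365107",
		Namespace:  metav1.NamespaceSystem,
		Finalizers: []string{metav1.FinalizerDeleteDependents}, // "foregroundDeletion"
	},
	// Spec elided.
}
if _, err := client.ExtensionsV1beta1().ReplicaSets(metav1.NamespaceSystem).Create(rs); err != nil {
	return err
}
```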

k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this issue Feb 19, 2017
Automatic merge from submit-queue (batch tested with PRs 41420, 41500)

Set OrphanDependents=&falseVar so the GC will (or should) remove the dummy Pod

**What this PR does / why we need it**:

ref: kubernetes/kubeadm#149

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:

This doesn't remove the Pod yet, only the ReplicaSet, but once the GC is working as expected, it'll remove the Pod with this configuration.

**Release note**:

```release-note
NONE
```
@errordeveloper @mikedanese @pires @caesarxuchao @krmayankk @Kargakis
@luxas
Member Author

luxas commented Mar 7, 2017

ping @caesarxuchao Any resolution to this problem?

@deads2k

deads2k commented Mar 7, 2017

ping @caesarxuchao Any resolution to this problem?

Can you try using {"propagationPolicy": "Foreground"} in the delete options? I think that gives the behavior you want for 1.6 and later levels.
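
In client-go terms, that delete looks roughly like this (same assumed `client` as in the issue description; metav1.DeletePropagationForeground is the typed form of "Foreground"):

```go
// Foreground cascading deletion: the API server puts the foregroundDeletion
// finalizer on the Deployment, and the GC removes the RS and Pod before the
// Deployment object itself disappears.
foreground := metav1.DeletePropagationForeground
if err := client.ExtensionsV1beta1().Deployments(metav1.NamespaceSystem).Delete("dummy", &metav1.DeleteOptions{
	PropagationPolicy: &foreground,
}); err != nil {
	fmt.Printf("[apiclient] Failed to delete test deployment [%v] (will ignore)\n", err)
}
```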

@luxas
Member Author

luxas commented Mar 7, 2017

Thanks a lot @deads2k! It worked fine :)
Posted PR kubernetes/kubernetes#42657 for you to LGTM.

k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this issue Jun 26, 2017
Automatic merge from submit-queue (batch tested with PRs 44058, 48085, 48077, 48076, 47823)

Make background garbage collection cascading

Fixes #44046 and #47843, where users reported that the garbage collector didn't delete Pods when a Deployment was deleted with PropagationPolicy=Background.

The cause is that when propagating a background garbage collection request, the garbage collector deletes dependents with DeleteOptions.PropagationPolicy=nil, which means the default GC policy of a resource (defined by its REST strategy) and the existing GC-related finalizers decide how the delete request is propagated further. Unfortunately, the default GC policy for an RS is orphaning, so the Pods are left behind when a Deployment is deleted.

This PR changes the garbage collector to delete dependents with DeleteOptions.PropagationPolicy=Background when the owner is deleted in the background. This means the dependents' existing GC finalizers will be overridden, making orphaning less flexible (see this made-up [case](kubernetes/kubeadm#149 (comment))). I think sacrificing the flexibility of orphaning is worthwhile, because making the behavior of background garbage collection match users' expectations is more important.

cc @lavalamp @Kargakis @krmayankk @enisoc 

```release-note
The garbage collector now cascades deletion properly when deleting an object with propagationPolicy="background". This resolves issue [#44046](#44046), so that when a deployment is deleted with propagationPolicy="background", the garbage collector ensures dependent pods are deleted as well.
```
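
For completeness, a sketch of the background-cascading delete described above, under the same assumptions as the earlier snippets (client-go clientset named `client`):

```go
// Background cascading deletion: the Deployment is deleted immediately, and
// the GC then deletes the RS and Pods, propagating Background all the way down.
background := metav1.DeletePropagationBackground
if err := client.ExtensionsV1beta1().Deployments(metav1.NamespaceSystem).Delete("dummy", &metav1.DeleteOptions{
	PropagationPolicy: &background,
}); err != nil {
	fmt.Printf("[apiclient] Failed to delete test deployment [%v] (will ignore)\n", err)
}
```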