
Make the Garbage Collector work and make it remove the dummy Pod #149

Closed
luxas opened this issue Feb 8, 2017 · 18 comments · Fixed by kubernetes/kubernetes#42657

Comments

@luxas
Member

luxas commented Feb 8, 2017

Currently the garbage collector doesn't seem to be working: it should remove the dummy Pod when the dummy Deployment is deleted, but right now it doesn't, even though one sets OrphanDependents=&falseVar.

I've looked at the docs here: https://kubernetes.io/docs/user-guide/garbage-collection/, and IIUC, the GC should remove the ReplicaSet and the Pod when OrphanDependents=&falseVar:

When deleting an object, you can request the GC to asynchronously delete its dependents by explicitly specifying deleteOptions.orphanDependents=false in the deletion request that you send to the API server. A 200 OK response from the API server indicates the owner is deleted.

The code I used for deleting the deployment looks like this:

```go
falseVar := false
if err := client.ExtensionsV1beta1().Deployments(metav1.NamespaceSystem).Delete("dummy", &metav1.DeleteOptions{
	OrphanDependents: &falseVar,
}); err != nil {
	fmt.Printf("[apiclient] Failed to delete test deployment [%v] (will ignore)\n", err)
}
```

cc @caesarxuchao @mikedanese @deads2k @sttts

@luxas
Member Author

luxas commented Feb 8, 2017

What's really strange here is that the ReplicaSet gets deleted, but not the Pod.

Pod YAML:

```yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/created-by: |
      {"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"kube-system","name":"dummy-2165365107","uid":"90581c45-ee30-11e6-aa75-f0761c62f136","apiVersion":"extensions","resourceVersion":"111"}}
  creationTimestamp: 2017-02-08T18:58:24Z
  generateName: dummy-2165365107-
  labels:
    app: dummy
    pod-template-hash: "2165365107"
  name: dummy-2165365107-pvlzn
  namespace: kube-system
  resourceVersion: "422"
  selfLink: /api/v1/namespaces/kube-system/pods/dummy-2165365107-pvlzn
  uid: 915ccf2b-ee30-11e6-aa75-f0761c62f136
spec:
  containers:
  - image: gcr.io/google_containers/pause-amd64:3.0
    imagePullPolicy: IfNotPresent
    name: dummy
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-rb2lv
      readOnly: true
  dnsPolicy: ClusterFirst
  hostNetwork: true
  nodeName: theninja
  restartPolicy: Always
  schedulername: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  volumes:
  - name: default-token-rb2lv
    secret:
      defaultMode: 420
      secretName: default-token-rb2lv
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2017-02-08T18:59:29Z
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: 2017-02-08T18:59:34Z
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: 2017-02-08T18:59:29Z
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://6a56513c6e34c65b73607abb4eea8a0cfef705d97f0e060eec00304dda253465
    image: gcr.io/google_containers/pause-amd64:3.0
    imageID: docker://sha256:99e59f495ffaa222bfeb67580213e8c28c1e885f1d245ab2bbe3b1b1ec3bd0b2
    lastState: {}
    name: dummy
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: 2017-02-08T18:59:33Z
  hostIP: 172.20.10.5
  phase: Running
  podIP: 172.20.10.5
  startTime: 2017-02-08T18:59:29Z
```

The only place the owner (the RS) is mentioned is in the kubernetes.io/created-by annotation; I expected the Pod to have ownerReferences as well, but it doesn't.

The discovery Pod, on the other hand, has that field:

```yaml
  ownerReferences:
  - apiVersion: extensions/v1beta1
    controller: true
    kind: ReplicaSet
    name: kube-discovery-2187510969
    uid: bbfc2be6-ee30-11e6-aa75-f0761c62f136
```

And the discovery ReplicaSet has:

```yaml
  ownerReferences:
  - apiVersion: extensions/v1beta1
    controller: true
    kind: Deployment
    name: kube-discovery
    uid: bbf76f68-ee30-11e6-aa75-f0761c62f136
```

cc @Kargakis @mikedanese @yujuhong

@sttts

sttts commented Feb 9, 2017

@luxas do we have an e2e or integration test for something like this?

@caesarxuchao
Member

What's really strange here is that the ReplicaSet gets deleted, but not the Pod.

I know why this is happening. When deleting the RS, the garbage collector sets DeleteOptions.OrphanDependents=nil, because the garbage collector wants to respect the default GC policy and the existing finalizers of the RS. Unfortunately, the default GC policy for an RS is orphaning, so the Pods are not deleted.

To solve this issue, we can easily let the GC always delete objects with DeleteOptions.OrphanDependents=false. But this means that if a user wants to delete the Deployment and RSes, but not the Pods, they will have to manually remove the ownerReferences from each Pod. cc @lavalamp @krmayankk
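
For illustration, a sketch of the client-side equivalent of what the GC does today when it propagates the Deployment's deletion to the RS (assuming the same 2017-era client-go clientset named `client` as in the snippet above; the GC actually issues the delete through the API server, not this typed client):

```go
// OrphanDependents is left nil, so the RS's default GC policy (orphaning)
// wins and the Pod survives the deletion of the RS.
if err := client.ExtensionsV1beta1().ReplicaSets(metav1.NamespaceSystem).Delete("dummy-2165365107", &metav1.DeleteOptions{
	OrphanDependents: nil,
}); err != nil {
	return err
}
```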

@luxas
Member Author

luxas commented Feb 9, 2017

@caesarxuchao Thanks!

To solve this issue, we can easily let the GC always delete objects with DeleteOptions.OrphanDependents=false

Or instead, we could make the GC pass DeleteOptions.OrphanDependents=false down to the RS level when the Deployment is deleted with DeleteOptions.OrphanDependents=false. That might make even more sense: once you've said you want to delete the "children" of the object you're deleting, that instruction should "flow down" as well, I think. WDYT?

@yujuhong

yujuhong commented Feb 9, 2017

Or instead, we could make the GC pass DeleteOptions.OrphanDependents=false down to the RS level when the Deployment is deleted with DeleteOptions.OrphanDependents=false. That might make even more sense: once you've said you want to delete the "children" of the object you're deleting, that instruction should "flow down" as well, I think. WDYT?

+1. We should respect options users set.

@krmayankk

I can't think of a use case where a user wants to delete both the Deployment and the RS but not the Pods. I can take care of this after I am done with kubernetes/kubernetes#40898.

I think the accepted fix here is to pass on to the RS the same OrphanDependents value that was passed to the Deployment.

@krmayankk

Curious, @caesarxuchao @luxas: why is this issue in the kubeadm repo and not in kubernetes?

@errordeveloper
Member

errordeveloper commented Feb 9, 2017

I noticed this when I implemented the dummy deployment. I then opened kubernetes/kubernetes#35447, which was a dup of kubernetes/kubernetes#33845 and is now closed. I was intending to just recursively delete the thing, but haven't gotten around to it yet.

@krmayankk

@caesarxuchao @Kargakis I can handle this as well; one of you can assign it to me.

@caesarxuchao
Member

caesarxuchao commented Feb 10, 2017

Let me describe the situation once more, and then we can decide.

Let's say there is a deployment D and a service S1 (the API object "Service"; note that any API object can depend on a deployment, not just an RS) that depends on D; further, there are services S2 and S3 that depend on S1. Now the user wants to delete D and S1, but keep S2 and S3. There are two ways to achieve that:

1. remove the ownerReferences in S2 and S3 so that they no longer depend on S1, then delete D;
2. add FinalizerOrphan to metadata.finalizers on S1, then delete D.

If we change the GC to always set DeleteOptions.Orphan=false, then the second approach won't work anymore, because DeleteOptions has higher priority than existing finalizers. In the case above, it's easy to remove the ownerReferences on S2 and S3, but it would be difficult if there were 1k services depending on S1. Are we ready to accept this side effect?

cc @lavalamp

@0xmichalis
Contributor

@caesarxuchao it's not clear what you mean by "service". Is it the Service object or something else related to the Deployment?

note that any API object can depend on a deployment, not just an RS

Can you provide a concrete example? Is this dependence at the resource level (Deployment has RS children) or at the service level (Deployment startup depends on the output of a Job)? Why, in the latter case, are owner references relevant?

@caesarxuchao
Member

@caesarxuchao it's not clear what you mean by "service". Is it the Service object or something else related to the Deployment?

The API object "Service". It's a fictional example. It doesn't need to be a "Service"; it could be a Pod, a Job, or anything.

Can you provide a concrete example? Is this dependence at the resource level (Deployment has RS children) or at the service level (Deployment startup depends on the output of a Job)?

The service level. In the example, S1 exposes the deployment D, so it should go away when D is deleted. It could be that S2 and S3 are not useful without S1, so the user at first wants to delete all of them together. But at the last minute, the user changes his mind and wants to keep S2 and S3. With the current GC process, changing your mind is easy: the user just needs to put the OrphanFinalizer on S1.

Why, in the latter case, are owner references relevant?

The GC only looks at ownerReferences to build the dependency graph; it doesn't know whether a dependency is at the service level or the resource level.
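
To make the second approach from the made-up example concrete, here is a minimal sketch under the same assumptions as the snippet in the issue description (a client-go clientset named `client`; "s1" and "d" are the fictional service and deployment; the orphan finalizer is the string "orphan", exposed in apimachinery as metav1.FinalizerOrphanDependents):

```go
// Mark S1 so the GC orphans its dependents (S2, S3): the GC strips their
// ownerReferences instead of deleting them.
s1, err := client.CoreV1().Services(metav1.NamespaceSystem).Get("s1", metav1.GetOptions{})
if err != nil {
	return err
}
s1.Finalizers = append(s1.Finalizers, metav1.FinalizerOrphanDependents) // "orphan"
if _, err := client.CoreV1().Services(metav1.NamespaceSystem).Update(s1); err != nil {
	return err
}
// Now delete D with cascading enabled; S1 goes away, S2 and S3 are kept.
falseVar := false
if err := client.ExtensionsV1beta1().Deployments(metav1.NamespaceSystem).Delete("d", &metav1.DeleteOptions{
	OrphanDependents: &falseVar,
}); err != nil {
	return err
}
```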

@luxas
Member Author

luxas commented Feb 15, 2017

@caesarxuchao @Kargakis @krmayankk Any resolution here? Can we get the above proposal done in time for v1.6? It doesn't sound hard to implement.

@lavalamp
Member

Is there a way for the Deployment to create RSes with different default behavior? I think that is the way to fix this.

@caesarxuchao
Member

caesarxuchao commented Feb 17, 2017

Is there a way for the Deployment to create RSes with different default behavior?

No.

The answer is perhaps "yes" with the coming foreground GC (#38678). The Deployment could then create RSes with metadata.finalizers set to FinalizerDeleteDependents; the RSes and their Pods would then be deleted with foreground GC by default. [edit] Foreground GC means the RSes will not be deleted until their Pods have been garbage collected.
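
A sketch of what that could look like, assuming the foreground GC work (#38678) ships the finalizer as metav1.FinalizerDeleteDependents (the string "foregroundDeletion") and the extensions/v1beta1 types are imported as `v1beta1`:

```go
// A Deployment controller could create its RSes with the foreground-deletion
// finalizer pre-set, so an RS is not deleted until its Pods are gone.
rs := &v1beta1.ReplicaSet{
	ObjectMeta: metav1.ObjectMeta{
		Name:       "dummy-2165365107",
		Namespace:  metav1.NamespaceSystem,
		Finalizers: []string{metav1.FinalizerDeleteDependents}, // "foregroundDeletion"
	},
	// Spec elided.
}
if _, err := client.ExtensionsV1beta1().ReplicaSets(metav1.NamespaceSystem).Create(rs); err != nil {
	return err
}
```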

k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this issue Feb 19, 2017
Automatic merge from submit-queue (batch tested with PRs 41420, 41500)

Set OrphanDependents=&falseVar so the GC will (or should) remove the dummy Pod

**What this PR does / why we need it**:

ref: kubernetes/kubeadm#149

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

**Special notes for your reviewer**:

This doesn't remove the Pod yet, only the ReplicaSet, but once the GC is working as expected, it'll remove the Pod with this configuration.

**Release note**:

```release-note
NONE
```
@errordeveloper @mikedanese @pires @caesarxuchao @krmayankk @Kargakis
@luxas
Member Author

luxas commented Mar 7, 2017

ping @caesarxuchao Any resolution to this problem?

@deads2k

deads2k commented Mar 7, 2017

ping @caesarxuchao Any resolution to this problem?

Can you try using {"propagationPolicy": "Foreground"} in the delete options? I think that gives the behavior you want for 1.6 and later levels.
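
In client-go terms, that delete looks roughly like this (same assumed `client` as in the issue description; metav1.DeletePropagationForeground is the typed form of "Foreground"):

```go
// Foreground cascading deletion: the API server puts the foregroundDeletion
// finalizer on the Deployment, and the GC removes the RS and Pod before the
// Deployment object itself disappears.
foreground := metav1.DeletePropagationForeground
if err := client.ExtensionsV1beta1().Deployments(metav1.NamespaceSystem).Delete("dummy", &metav1.DeleteOptions{
	PropagationPolicy: &foreground,
}); err != nil {
	fmt.Printf("[apiclient] Failed to delete test deployment [%v] (will ignore)\n", err)
}
```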

@luxas
Member Author

luxas commented Mar 7, 2017

Thanks a lot @deads2k! It worked fine :)
Posted PR kubernetes/kubernetes#42657 for you to LGTM.

k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this issue Jun 26, 2017
Automatic merge from submit-queue (batch tested with PRs 44058, 48085, 48077, 48076, 47823)

Make background garbage collection cascading

Fixes #44046 and #47843, where users reported that the garbage collector didn't delete Pods when a Deployment was deleted with PropagationPolicy=Background.

The cause is that when propagating a background garbage collection request, the garbage collector deletes dependents with DeleteOptions.PropagationPolicy=nil, which means the default GC policy of a resource (defined by its REST strategy) and the existing GC-related finalizers decide how the delete request is propagated further. Unfortunately, the default GC policy for an RS is orphaning, so the Pods are left behind when a Deployment is deleted.

This PR changes the garbage collector to delete dependents with DeleteOptions.PropagationPolicy=Background when the owner is deleted in the background. This means the dependents' existing GC finalizers will be overridden, making orphaning less flexible (see this made-up [case](kubernetes/kubeadm#149 (comment))). I think sacrificing the flexibility of orphaning is worthwhile, because making the behavior of background garbage collection match users' expectations is more important.

cc @lavalamp @Kargakis @krmayankk @enisoc 

```release-note
The garbage collector now cascades deletion properly when deleting an object with propagationPolicy="background". This resolves issue [#44046](#44046), so that when a deployment is deleted with propagationPolicy="background", the garbage collector ensures dependent pods are deleted as well.
```
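
For completeness, a sketch of the background-cascading delete described above, under the same assumptions as the earlier snippets (client-go clientset named `client`):

```go
// Background cascading deletion: the Deployment is deleted immediately, and
// the GC then deletes the RS and Pods, propagating Background all the way down.
background := metav1.DeletePropagationBackground
if err := client.ExtensionsV1beta1().Deployments(metav1.NamespaceSystem).Delete("dummy", &metav1.DeleteOptions{
	PropagationPolicy: &background,
}); err != nil {
	fmt.Printf("[apiclient] Failed to delete test deployment [%v] (will ignore)\n", err)
}
```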