Make the Garbage Collector work and make it remove the dummy Pod #149
What's really strange here is that the ReplicaSet gets deleted, but not the Pod. Pod YAML:

```yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/created-by: |
      {"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"kube-system","name":"dummy-2165365107","uid":"90581c45-ee30-11e6-aa75-f0761c62f136","apiVersion":"extensions","resourceVersion":"111"}}
  creationTimestamp: 2017-02-08T18:58:24Z
  generateName: dummy-2165365107-
  labels:
    app: dummy
    pod-template-hash: "2165365107"
  name: dummy-2165365107-pvlzn
  namespace: kube-system
  resourceVersion: "422"
  selfLink: /api/v1/namespaces/kube-system/pods/dummy-2165365107-pvlzn
  uid: 915ccf2b-ee30-11e6-aa75-f0761c62f136
spec:
  containers:
  - image: gcr.io/google_containers/pause-amd64:3.0
    imagePullPolicy: IfNotPresent
    name: dummy
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-rb2lv
      readOnly: true
  dnsPolicy: ClusterFirst
  hostNetwork: true
  nodeName: theninja
  restartPolicy: Always
  schedulername: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  volumes:
  - name: default-token-rb2lv
    secret:
      defaultMode: 420
      secretName: default-token-rb2lv
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2017-02-08T18:59:29Z
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: 2017-02-08T18:59:34Z
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: 2017-02-08T18:59:29Z
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://6a56513c6e34c65b73607abb4eea8a0cfef705d97f0e060eec00304dda253465
    image: gcr.io/google_containers/pause-amd64:3.0
    imageID: docker://sha256:99e59f495ffaa222bfeb67580213e8c28c1e885f1d245ab2bbe3b1b1ec3bd0b2
    lastState: {}
    name: dummy
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: 2017-02-08T18:59:33Z
  hostIP: 172.20.10.5
  phase: Running
  podIP: 172.20.10.5
  startTime: 2017-02-08T18:59:29Z
```

The only place it mentions the owner (the RS) is the `kubernetes.io/created-by` annotation. The discovery Pod, on the other hand, has the `ownerReferences` field:

```yaml
ownerReferences:
- apiVersion: extensions/v1beta1
  controller: true
  kind: ReplicaSet
  name: kube-discovery-2187510969
  uid: bbfc2be6-ee30-11e6-aa75-f0761c62f136
```

And the discovery ReplicaSet has:

```yaml
ownerReferences:
- apiVersion: extensions/v1beta1
  controller: true
  kind: Deployment
  name: kube-discovery
  uid: bbf76f68-ee30-11e6-aa75-f0761c62f136
```
@luxas do we have an e2e or integration test for something like this?
I know why this is happening. When deleting the RS, the garbage collector sets DeleteOptions.OrphanDependents=nil, because the garbage collector wants to respect the default GC policy and the existing finalizers of the RS. Unfortunately, the default GC policy for RS is orphaning, so the pods are not deleted. To solve this, we could simply let the GC always delete objects with DeleteOptions.OrphanDependents=false. But this means that if a user wants to delete the Deployment and the RSes, but not the Pods, they will have to manually remove the ownerReferences from each Pod, as sketched below. cc @lavalamp @krmayankk
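For concreteness, a minimal sketch of that manual workaround, assuming a recent client-go (names are illustrative, not from this thread):

```go
package gcexample

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// detachPodFromOwners clears a Pod's ownerReferences so the garbage
// collector no longer considers it a dependent; the Pod then survives
// deletion of its ReplicaSet regardless of the delete options used.
func detachPodFromOwners(cs kubernetes.Interface, namespace, name string) error {
	pod, err := cs.CoreV1().Pods(namespace).Get(context.TODO(), name, metav1.GetOptions{})
	if err != nil {
		return err
	}
	pod.OwnerReferences = nil
	_, err = cs.CoreV1().Pods(namespace).Update(context.TODO(), pod, metav1.UpdateOptions{})
	return err
}
```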
@caesarxuchao Thanks!
Or instead we could make the GC always pass down the same OrphanDependents value it received for the owner.
+1. We should respect the options users set.
I can't think of a use case where a user wants to delete both the Deployment and the RS but not the Pods. I can take care of this after I am done with kubernetes/kubernetes#40898. I think the accepted fix here is to pass on the same OrphanDependents value to the RS that was passed to the Deployment.
Curious @caesarxuchao @luxas, why is this issue in the kubeadm repo and not in kubernetes?
I noticed it when I implemented the dummy deployment. I then opened kubernetes/kubernetes#35447, which was a dup of kubernetes/kubernetes#33845 and is now closed. I was intending to just recursively delete the thing, but haven't gotten around to it yet.
@caesarxuchao @Kargakis I can handle this as well; one of you, please assign this to me.
Let me describe the situation once more, and then we can decide. Let's say there is a deployment D, and a service S1 (the API object "Service"; note that any API object can depend on a deployment, not just an RS) that depends on D; further, there are services S2 and S3 that depend on S1. Now the user wants to delete D and S1, but keep S2 and S3. There are two ways to achieve that: 1. remove the ownerReferences in S2 and S3 so that they don't depend on S1, then delete D; 2. add the FinalizerOrphan to metadata.finalizers of S1, then delete D (see the sketch below). If we change the GC to always set DeleteOptions.OrphanDependents=false, then the second approach won't work anymore, because DeleteOptions has higher priority than the existing finalizer. In the above case it's easy to remove the ownerReferences on S2 and S3, but it would be difficult if there were 1k services depending on S1. Are we ready to accept this side effect? cc @lavalamp
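A minimal sketch of the second approach, assuming hypothetical object names (`s1` in namespace `default`) and a recent client-go; `metav1.FinalizerOrphanDependents` is the "orphan" finalizer the GC respects:

```go
package gcexample

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// orphanDependentsOf adds the "orphan" finalizer to a Service so that,
// when the GC later deletes it (because its owner was deleted), its own
// dependents (S2 and S3 in the example) are orphaned instead of deleted.
func orphanDependentsOf(cs kubernetes.Interface, namespace, name string) error {
	svc, err := cs.CoreV1().Services(namespace).Get(context.TODO(), name, metav1.GetOptions{})
	if err != nil {
		return err
	}
	svc.Finalizers = append(svc.Finalizers, metav1.FinalizerOrphanDependents)
	_, err = cs.CoreV1().Services(namespace).Update(context.TODO(), svc, metav1.UpdateOptions{})
	return err
}
```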
@caesarxuchao it's not clear what you mean by "service". Is it the Service object or something else related to the Deployment?
Can you provide a concrete example? Is this dependence at the resource level (a Deployment has RS children) or at the service level (a Deployment's startup depends on the output of a Job)? Why, in the latter case, are owner references relevant?
The API object "Service". It's a fictional example. It doesn't need to be a "Service"; it could be a Pod, a Job, or anything.
The service level. In the example, S1 exposes the deployment D, so it should go away when D is deleted. It could be that S2 and S3 are not useful without S1, so the user at first wants to delete all of them together. But at the last minute the user changes their mind and wants to keep S2 and S3. With the current GC process, changing one's mind is easy: the user just needs to put the OrphanFinalizer on S1.
The GC only looks at the ownerReferences to build the dependency graph; it doesn't know whether a dependency is service-level or resource-level.
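To illustrate the point, an ownerReference is just metadata on the dependent; a hypothetical sketch making S1 depend on D (names and UID are assumptions, not from the thread):

```go
package gcexample

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
)

// addOwner records D as an owner of S1; from then on the GC treats S1 as
// a dependent of D, regardless of what kind of "dependency" the user had
// in mind when writing the reference.
func addOwner(s1 *v1.Service, deploymentUID types.UID) {
	s1.OwnerReferences = append(s1.OwnerReferences, metav1.OwnerReference{
		APIVersion: "extensions/v1beta1", // the Deployment group/version used in this thread
		Kind:       "Deployment",
		Name:       "d",
		UID:        deploymentUID,
	})
}
```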
@caesarxuchao @Kargakis @krmayankk Any resolution here? Can we get the above proposal done in time for v1.6? It doesn't sound hard to implement.
Is there a way for the Deployment to create RSes with a different default behavior? I think that is the way to fix this.
No. The answer is perhaps "yes" with the coming foreground GC (#38678). The Deployment could then create RSes with
Automatic merge from submit-queue (batch tested with PRs 41420, 41500)

Set OrphanDependents=&falseVar so the GC will (or should) remove the dummy Pod

**What this PR does / why we need it**: ref: kubernetes/kubeadm#149

**Which issue this PR fixes**: fixes #

**Special notes for your reviewer**: This doesn't remove the Pod yet, only the ReplicaSet, but once the GC is working as expected, it'll remove the Pod with this configuration.

**Release note**:
```release-note
NONE
```

@errordeveloper @mikedanese @pires @caesarxuchao @krmayankk @Kargakis
ping @caesarxuchao Any resolution to this problem?
Can you try using
Thanks a lot @deads2k! It worked fine :)
Automatic merge from submit-queue (batch tested with PRs 44058, 48085, 48077, 48076, 47823)

Make background garbage collection cascading

Fix #44046, fix #47843, where users reported that the garbage collector didn't delete pods when a deployment was deleted with PropagationPolicy=Background. The cause is that when propagating a background garbage collection request, the garbage collector deletes dependents with DeleteOptions.PropagationPolicy=nil, which means the default GC policy of a resource (defined by its REST strategy) and the existing GC-related finalizers decide how the delete request is propagated further. Unfortunately, the default GC policy for RS is orphaning, so the pods are left behind when a deployment is deleted.

This PR changes the garbage collector to delete dependents with DeleteOptions.PropagationPolicy=Background when the owner is deleted in the background. This means the dependent's existing GC finalizers will be overridden, making orphaning less flexible (see this made-up [case](kubernetes/kubeadm#149 (comment))). I think sacrificing the flexibility of orphaning is worthwhile, because making the behavior of background garbage collection match users' expectations is more important. cc @lavalamp @Kargakis @krmayankk @enisoc

```release-note
The garbage collector now cascades deletion properly when deleting an object with propagationPolicy="background". This resolves issue [#44046](#44046), so that when a deployment is deleted with propagationPolicy="background", the garbage collector ensures dependent pods are deleted as well.
```
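For reference, a minimal sketch of a background-cascading delete from a client, assuming a recent client-go; PropagationPolicy is the modern replacement for the deprecated OrphanDependents field:

```go
package gcexample

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deleteDeploymentCascading deletes a Deployment and asks the GC to delete
// its ReplicaSets and Pods in the background. DeletePropagationForeground
// and DeletePropagationOrphan are the other two policies.
func deleteDeploymentCascading(cs kubernetes.Interface, namespace, name string) error {
	policy := metav1.DeletePropagationBackground
	return cs.AppsV1().Deployments(namespace).Delete(
		context.TODO(), name, metav1.DeleteOptions{PropagationPolicy: &policy},
	)
}
```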
Currently it seems like the garbage collector isn't working; it should remove the dummy Pod when the dummy Deployment is deleted. Right now it doesn't, even though one sets `OrphanDependents=&falseVar`. I've looked at the doc here: https://kubernetes.io/docs/user-guide/garbage-collection/, and IIUC, it should remove both the ReplicaSet and the Pod when `OrphanDependents=&falseVar` is set.

The code I used for deleting the deployment looks like this:
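A minimal sketch of such a call (the original used the extensions/v1beta1 client of that era; this uses today's client-go API, and the deployment name "dummy" is inferred from the RS name above):

```go
package gcexample

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deleteDummyDeployment deletes the dummy Deployment with
// OrphanDependents=&falseVar, i.e. asking the GC not to orphan the
// ReplicaSet and Pod. (OrphanDependents is deprecated in current
// client-go in favor of PropagationPolicy, but still accepted.)
func deleteDummyDeployment(cs kubernetes.Interface) error {
	falseVar := false
	return cs.AppsV1().Deployments("kube-system").Delete(
		context.TODO(), "dummy",
		metav1.DeleteOptions{OrphanDependents: &falseVar},
	)
}
```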
cc @caesarxuchao @mikedanese @deads2k @sttts