Make background garbage collection cascading #44058

Merged (2 commits) on Jun 26, 2017

Conversation

@caesarxuchao (Member) commented Apr 4, 2017

Fixes #44046 and #47843, where users reported that the garbage collector didn't delete pods when a deployment was deleted with PropagationPolicy=Background.

The cause: when propagating a background garbage collection request, the garbage collector deletes dependents with DeleteOptions.PropagationPolicy=nil, which means the resource's default GC policy (defined by its REST strategy) and its existing GC-related finalizers decide how the delete request is propagated further. Unfortunately, the default GC policy for ReplicaSets is orphaning, so the pods are left behind when a deployment is deleted.

This PR changes the garbage collector to delete dependents with DeleteOptions.PropagationPolicy=Background when the owner is deleted in the background. This means the dependent's existing GC finalizers will be overridden, making orphaning less flexible (see this made-up case). I think sacrificing the flexibility of orphaning is worthwhile, because making the behavior of background garbage collection match users' expectations is more important.
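
A minimal sketch of the change in terms of the public client-go types (not the actual GC source; the helper names are illustrative):

```go
package gcsketch

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Before: the GC left PropagationPolicy nil, so the dependent's default
// GC policy (orphaning, for ReplicaSets) and its existing finalizers
// decided whether the delete cascaded any further.
func deleteOptionsBefore() *metav1.DeleteOptions {
	return &metav1.DeleteOptions{} // PropagationPolicy == nil
}

// After: the GC explicitly requests background cascading deletion,
// overriding the dependent's default policy and GC finalizers.
func deleteOptionsAfter() *metav1.DeleteOptions {
	policy := metav1.DeletePropagationBackground
	return &metav1.DeleteOptions{PropagationPolicy: &policy}
}
```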

cc @lavalamp @Kargakis @krmayankk @enisoc

Release note:

The garbage collector now cascades deletion properly when deleting an object with propagationPolicy="background". This resolves issue [#44046](https://github.com/kubernetes/kubernetes/issues/44046), so that when a deployment is deleted with propagationPolicy="background", the garbage collector ensures dependent pods are deleted as well.

@k8s-ci-robot added the cncf-cla: yes label on Apr 4, 2017
@k8s-github-robot added the size/M and release-note-label-needed labels on Apr 4, 2017

@lavalamp (Member) commented Apr 5, 2017

Isn't the problem that deployments should create replica sets that set a policy other than orphaning?

Making the GC choose background seems wrong; it means a setting the user may or may not have added to the object in question won't apply, right?

@caesarxuchao (Member, Author)

How about letting the deployment controller create replica sets with the finalizer set to "foregroundDeletion"?

But that only fixes the deployment->rs->pods chain. If a user creates a dependency chain that includes an RS (or any other controller type) as a link, background deletion still won't cascade. I think it's counterintuitive if deletion cascades only when the user explicitly sets the finalizer of the RS to "foregroundDeletion".
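
For concreteness, a minimal sketch of the proposal being discussed, assuming the deployment controller stamped new ReplicaSets this way (modern import paths; the name argument is made up):

```go
package gcsketch

import (
	extensionsv1beta1 "k8s.io/api/extensions/v1beta1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// newReplicaSet sketches the proposal: stamp new ReplicaSets with the
// foregroundDeletion finalizer so deleting the RS cascades to its pods
// even when the owner was deleted with the background policy.
func newReplicaSet(name string) *extensionsv1beta1.ReplicaSet {
	return &extensionsv1beta1.ReplicaSet{
		ObjectMeta: metav1.ObjectMeta{
			Name: name,
			// metav1.FinalizerDeleteDependents == "foregroundDeletion"
			Finalizers: []string{metav1.FinalizerDeleteDependents},
		},
		// Spec elided; it would be copied from the deployment's template.
	}
}
```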

@0xmichalis (Contributor)

Making the deployment controller set the policy for replicasets makes sense and could be easily factored out if we ever come up with something better. CronJobs will also need the fix. I don't have a good solution for handling this holistically but let's not make perfect the enemy of the good for now.

@caesarxuchao (Member, Author)

OK. I can send a PR, or @Kargakis / @krmayankk might be interested in doing it? It should be a small change.

@krmayankk commented Apr 26, 2017 via email

@caesarxuchao (Member, Author)

@krmayankk we need to set Replicaset.ObjectMeta.Finalizer to DeletingDependents when the deployment controller creates the replicaset. Also, we need to update the GC test you wrote for deployments to verify that the pods are deleted. Thanks for helping.

@krmayankk commented Apr 27, 2017

@caesarxuchao you mean Replicaset.ObjectMeta.Finalizer set to foregroundDeletion?
It seems this will happen here: https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/deployment/deployment_controller.go#L198. Also, do we need to change the garbage collector code to delete with the foreground policy?

@0xmichalis (Contributor)

@krmayankk that's a resource handler; you probably want to set it in a different place. I think the ideal one is in AdoptReplicaSets, but the update needs to be atomic and I am not sure we can combine two different patches into a single call (the other one updates the owner refs of the RS). We should consider switching away from Patch to an Update with retries on conflicts so we can fold both updates (owner refs, finalizer) into one call.
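
A minimal sketch of that suggestion, assuming the 2017-era client-go API (no context argument); adoptWithFinalizer, ownerRef, and the names are illustrative, not actual controller code:

```go
package gcsketch

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/util/retry"
)

// adoptWithFinalizer retries on conflicts so the owner reference and the
// foregroundDeletion finalizer land in a single atomic Update call.
func adoptWithFinalizer(cs kubernetes.Interface, ns, name string, ownerRef metav1.OwnerReference) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		rs, err := cs.ExtensionsV1beta1().ReplicaSets(ns).Get(name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		rs.OwnerReferences = append(rs.OwnerReferences, ownerRef)
		rs.Finalizers = append(rs.Finalizers, metav1.FinalizerDeleteDependents)
		_, err = cs.ExtensionsV1beta1().ReplicaSets(ns).Update(rs)
		return err
	})
}
```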

@0xmichalis (Contributor)

cc: @enisoc who was on the code recently. Any thoughts on the above?

@caesarxuchao (Member, Author)

@Kargakis I think it should be done both when we "add" and when we "adopt" replicasets.

Also it should be possible to patch the objectMeta.ownerRef and objectMeta.Finalizers at the same time.

@0xmichalis (Contributor)

> @Kargakis I think it should be done both when we "add" and when we "adopt" replicasets.

The only logic that needs to exist in the resource handlers should be about requeueing the resource and nothing more. Adding the finalizer during adoption, when we create an RS, or when we find one that doesn't have them here should be enough.

> Also it should be possible to patch the objectMeta.ownerRef and objectMeta.Finalizers at the same time.

If we can combine both then that's great.

@caesarxuchao (Member, Author)

> Adding the finalizer during adoption, when we create an RS,

That's what I meant. I should have used the word "create" instead of "add" in my previous comment. @krmayankk sorry, I didn't look closely at the link you pasted when I typed my last comment. As @Kargakis said, it's the resource handler and we shouldn't change the code there.

> when we find one that doesn't have them

I disagree with this part. Users should be able to manually remove the finalizer, because they might want to keep the pods alive when the deployment is deleted. The deployment controller shouldn't add the finalizer back. But this also means replicasets created by a deployment before 1.7 will continue to suffer from #44046.

@0xmichalis (Contributor)

> I disagree with this part. Users should be able to manually remove the finalizer, because they might want to keep the pods alive when the deployment is deleted. The deployment controller shouldn't add the finalizer back. But this also means replicasets created by a deployment before 1.7 will continue to suffer from #44046.

If users want to keep the pods, shouldn't they do a non-cascading deletion of the deployment followed by a non-cascading deletion of the replica set?

@0xmichalis (Contributor)

> If users want to keep the pods, shouldn't they do a non-cascading deletion of the deployment followed by a non-cascading deletion of the replica set?

Or even better: orphan the replica set by changing its labels so that it won't be selected by the deployment, do a cascading deletion of the deployment, then do a non-cascading deletion of the replica set.

@caesarxuchao (Member, Author)

Both are more complicated than manually changing the finalizer of one replicaset.

If the deployment controller is going to overwrite the finalizers set by a user, we might as well let the garbage collector overwrite the finalizers; that's what this PR does.

@enisoc (Member) commented Apr 27, 2017

@Kargakis wrote:

> cc: @enisoc who was on the code recently. Any thoughts on the above?

@caesarxuchao wrote:

> Also it should be possible to patch the objectMeta.ownerRef and objectMeta.Finalizers at the same time.

Patch should work (in theory) for updating multiple things at once, as long as the change you're making is not dependent on existing values. For example, if you need to first check the existing list of Finalizers before deciding what/whether you're modifying, you would need to use Update with a retry loop.
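
A minimal sketch of such a combined patch, assuming the 2017-era client-go Patch signature; the helper, names, and UID are made up. Note that the finalizers list is replaced wholesale here, which is exactly the kind of dependence on existing values that would force a switch to Update with a retry loop:

```go
package gcsketch

import (
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// patchOwnerRefAndFinalizer sets the owner reference and the
// foregroundDeletion finalizer in one strategic merge patch.
func patchOwnerRefAndFinalizer(cs kubernetes.Interface, ns, name, ownerUID string) error {
	patch := []byte(`{"metadata":{` +
		`"ownerReferences":[{"apiVersion":"extensions/v1beta1","kind":"Deployment",` +
		`"name":"foo","uid":"` + ownerUID + `","controller":true,"blockOwnerDeletion":true}],` +
		`"finalizers":["foregroundDeletion"]}}`)
	_, err := cs.ExtensionsV1beta1().ReplicaSets(ns).Patch(name, types.StrategicMergePatchType, patch)
	return err
}
```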

@0xmichalis (Contributor)

> Both are more complicated than manually changing the finalizer of one replicaset.
>
> If the deployment controller is going to overwrite the finalizers set by a user, we might as well let the garbage collector overwrite the finalizers; that's what this PR does.

Users shouldn't muck with ReplicaSets owned by Deployments, and we should discourage them from doing so as much as possible; otherwise they are racing with the controller and opening the door to unexpected behavior. If you want to isolate a specific set of Pods, that is already not a common use case, and orphaning the parent ReplicaSet or doing two non-cascading deletions of the parent objects doesn't sound complex.

@caesarxuchao (Member, Author)

> doing two non-cascading deletions of the parent objects

Users would also need to delete the replicasets in the deployment history, right? Then it's more than "two" :)

The "changing labels" solution requires the deployment controller to react in order to release the pod, so the deletion behavior is less deterministic.

Is there a benefit other than fixing pre-1.7 replicasets? If not, I think we should take the one-time pain.

@0xmichalis (Contributor) commented Apr 27, 2017

> Users would also need to delete the replicasets in the deployment history, right? Then it's more than "two" :)

```
kubectl patch deploy/foo -p '{"spec":{"revisionHistoryLimit":0}}'
kubectl delete deploy/foo --cascade=false
kubectl delete rs/foo-n --cascade=false
```

Or

```
kubectl label rs/foo-n existing-label-
kubectl delete deploy/foo
kubectl delete rs/foo-n --cascade=false
```

> The "changing labels" solution requires the deployment controller to react in order to release the pod, so the deletion behavior is less deterministic.

You mean the replica set. Removing a label from the replica set declares that you want to orphan it, and the deployment controller will remove the owner ref eventually (99.9% of the time within a second).

> Is there a benefit other than fixing pre-1.7 replicasets? If not, I think we should take the one-time pain.

IMO it's more than just fixing pre-1.7 replicasets. We should discourage users from working directly with ReplicaSets owned by Deployments. If users want more control, they should either not use Deployments or, for one-off cases such as the one you are describing, orphan.

@0xmichalis (Contributor)

You can also script out the steps above, but you would want to wait for the controllers to observe the changes during each step.

@caesarxuchao (Member, Author)

@Kargakis could you confirm whether it's a rare, or even invalid, use case to keep some pods alive while deleting the replicasets and deployment? If so, I agree with your comment here.

@0xmichalis (Contributor)

> @Kargakis could you confirm whether it's a rare, or even invalid, use case to keep some pods alive while deleting the replicasets and deployment? If so, I agree with your comment here.

I wouldn't say it's invalid but it's not a common thing to do. The only case I can think of where you would want to do that is to move the Pods under a new controller.

@krmayankk

@caesarxuchao @Kargakis one thing I am not clear on: if we set ReplicaSets.Finalizers to foregroundDeletion, how would that trigger the deletion of the replica set's pods (which is the problem we are trying to solve)? According to https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/#foreground-cascading-deletion, when doing foreground cascading deletion the finalizer gets set by the GC controller to foregroundDeletion. Does the GC notice that foregroundDeletion is set on an object and then trigger the deletion of its dependent objects? Also, is blockOwnerDeletion already set for the pods of the replica sets?

Also, it's not clear to me why setting the propagation policy to background for RSes when deleting them is a bad idea. I have read all the comments above, but I'm not sure I understand the logic.
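
For reference, a minimal sketch of the metadata involved in the mechanism being asked about; the helper and values are illustrative. When an owner carries the foregroundDeletion finalizer, the GC first deletes dependents whose ownerReference has blockOwnerDeletion=true, and only then removes the finalizer so the owner itself can go away:

```go
package gcsketch

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
)

// podOwnerRef sketches the ownerReference a controller stamps on a pod
// it manages; blockOwnerDeletion=true makes a foreground deletion of the
// owning ReplicaSet wait until this pod is gone.
func podOwnerRef(rsName string, rsUID types.UID) metav1.OwnerReference {
	ctrl, block := true, true
	return metav1.OwnerReference{
		APIVersion:         "extensions/v1beta1",
		Kind:               "ReplicaSet",
		Name:               rsName,
		UID:                rsUID,
		Controller:         &ctrl,
		BlockOwnerDeletion: &block,
	}
}
```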

@k8s-github-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: caesarxuchao, liggitt
We suggest the following additional approver: lavalamp

Assign the PR to them by writing /assign @lavalamp in a comment when ready.

Associated issue: 44046

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

1 similar comment

@0xmichalis (Contributor)

/release-note-none

@k8s-ci-robot added the release-note-none label and removed the release-note-label-needed label on Jun 26, 2017
@0xmichalis removed the do-not-merge label on Jun 26, 2017
@caesarxuchao added the cherrypick-candidate and cherry-pick-approved labels on Jun 26, 2017
@caesarxuchao (Member, Author)

@liggitt @Kargakis could you also review the release note? Although this PR is a bug fix, it changes the GC behavior, so it is worth a release note.

@caesarxuchao added the release-note label and removed the release-note-none label on Jun 26, 2017
@k8s-ci-robot (Contributor) commented Jun 26, 2017

@caesarxuchao: The following test failed, say /retest to rerun them all:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| Jenkins non-CRI GCE e2e | 4cd9d04 | link | @k8s-bot non-cri e2e test this |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@caesarxuchao (Member, Author)

/test pull-kubernetes-e2e-gce-etcd3

@liggitt (Member) commented Jun 26, 2017

@caesarxuchao updated release note to put the general change first, and the specific issue fixed second

@caesarxuchao (Member, Author)

Thanks @liggitt.

@caesarxuchao (Member, Author)

cc @dchen1107

@dchen1107 (Member)

/lgtm

@k8s-github-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: caesarxuchao, dchen1107, liggitt

Associated issue: 44046

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot added the approved label on Jun 26, 2017
@k8s-github-robot

Automatic merge from submit-queue (batch tested with PRs 44058, 48085, 48077, 48076, 47823)

@k8s-github-robot merged commit 6a28658 into kubernetes:master on Jun 26, 2017
k8s-github-robot pushed a commit that referenced this pull request on Jun 27, 2017: …#44058-upstream-release-1.7

Automatic merge from submit-queue

Automated cherry pick of #44058

Cherry pick of #44058 on release-1.7.

#44058: revert 45764
@k8s-cherrypick-bot

Commit found in the "release-1.7" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error, find help to get your PR picked.

k8s-github-robot pushed a commit that referenced this pull request Aug 8, 2017
Automatic merge from submit-queue

Add e2e test for cronjob chained removal

This is a test proving #44058 works with cronjobs. It will fail until the aforementioned PR merges.

@caesarxuchao ptal