Make background garbage collection cascading #44058

Merged (2 commits) on Jun 26, 2017

Conversation

@caesarxuchao (Member) commented Apr 4, 2017

Fixes #44046 and #47843, where users reported that the garbage collector didn't delete pods when a deployment was deleted with PropagationPolicy=Background.

The cause: when propagating a background garbage collection request, the garbage collector deletes dependents with DeleteOptions.PropagationPolicy=nil, which means the resource's default GC policy (defined by its REST strategy) and its existing GC-related finalizers decide how the delete request is propagated further. Unfortunately, the default GC policy for ReplicaSets is orphaning, so the pods are left behind when a deployment is deleted.

This PR changes the garbage collector to delete dependents with DeleteOptions.PropagationPolicy=Background when the owner is deleted in the background. This means the dependent's existing GC finalizers will be overridden, making orphaning less flexible (see this made-up case). I think sacrificing the flexibility of orphaning is worthwhile, because making the behavior of background garbage collection match users' expectations is more important.
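
A minimal sketch of the change in terms of the public client-go types (not the actual GC source; the helper names are illustrative):

```go
package gcsketch

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Before: the GC left PropagationPolicy nil, so the dependent's default
// GC policy (orphaning, for ReplicaSets) and its existing finalizers
// decided whether the delete cascaded any further.
func deleteOptionsBefore() *metav1.DeleteOptions {
	return &metav1.DeleteOptions{} // PropagationPolicy == nil
}

// After: the GC explicitly requests background cascading deletion,
// overriding the dependent's default policy and GC finalizers.
func deleteOptionsAfter() *metav1.DeleteOptions {
	policy := metav1.DeletePropagationBackground
	return &metav1.DeleteOptions{PropagationPolicy: &policy}
}
```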

cc @lavalamp @Kargakis @krmayankk @enisoc

Release note:

The garbage collector now cascades deletion properly when deleting an object with propagationPolicy="background". This resolves issue [#44046](https://github.com/kubernetes/kubernetes/issues/44046), so that when a deployment is deleted with propagationPolicy="background", the garbage collector ensures dependent pods are deleted as well.

@k8s-ci-robot added the cncf-cla: yes label on Apr 4, 2017
@k8s-github-robot added the size/M and release-note-label-needed labels on Apr 4, 2017

@lavalamp (Member) commented Apr 5, 2017

Isn't the problem that deployments should create replica sets that set a policy other than orphaning?

Making the GC choose background seems wrong; it means a setting the user may or may not have added to the object in question won't apply, right?

@caesarxuchao (Member, Author)

How about letting the deployment controller create replica sets with the finalizer set to "foregroundDeletion"?

But that only fixes the deployment->rs->pods chain. If a user creates a dependency chain that includes an RS (or any other controller type) as a link, background deletion still won't cascade. I think it's counterintuitive if deletion cascades only when the user explicitly sets the finalizer of the RS to "foregroundDeletion".
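
For concreteness, a minimal sketch of the proposal being discussed, assuming the deployment controller stamped new ReplicaSets this way (modern import paths; the name argument is made up):

```go
package gcsketch

import (
	extensionsv1beta1 "k8s.io/api/extensions/v1beta1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// newReplicaSet sketches the proposal: stamp new ReplicaSets with the
// foregroundDeletion finalizer so deleting the RS cascades to its pods
// even when the owner was deleted with the background policy.
func newReplicaSet(name string) *extensionsv1beta1.ReplicaSet {
	return &extensionsv1beta1.ReplicaSet{
		ObjectMeta: metav1.ObjectMeta{
			Name: name,
			// metav1.FinalizerDeleteDependents == "foregroundDeletion"
			Finalizers: []string{metav1.FinalizerDeleteDependents},
		},
		// Spec elided; it would be copied from the deployment's template.
	}
}
```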

@0xmichalis (Contributor)

Making the deployment controller set the policy for replicasets makes sense and could be easily factored out if we ever come up with something better. CronJobs will also need the fix. I don't have a good solution for handling this holistically but let's not make perfect the enemy of the good for now.

@caesarxuchao (Member, Author)

OK. I can send a PR, or @Kargakis / @krmayankk might be interested in doing it? It should be a small change.

@krmayankk commented Apr 26, 2017 via email

@caesarxuchao (Member, Author)

@krmayankk we need to set Replicaset.ObjectMeta.Finalizer to DeletingDependents when the deployment controller creates the replicaset. Also, we need to update the GC test you wrote for deployments to verify that the pods are deleted. Thanks for helping.

@krmayankk commented Apr 27, 2017

@caesarxuchao you mean Replicaset.ObjectMeta.Finalizer set to foregroundDeletion?
It seems this will happen here: https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/deployment/deployment_controller.go#L198. Also, do we need to change the garbage collector code to delete with the foreground policy?

@0xmichalis (Contributor)

@krmayankk that's a resource handler; you probably want to set it in a different place. I think the ideal one is in AdoptReplicaSets, but the update needs to be atomic and I am not sure we can combine two different patches into a single call (the other one updates the owner refs of the RS). We should consider switching away from Patch to an Update with retries on conflicts so we can fold both updates (owner refs, finalizer) into one call.
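
A minimal sketch of that suggestion, assuming the 2017-era client-go API (no context argument); adoptWithFinalizer, ownerRef, and the names are illustrative, not actual controller code:

```go
package gcsketch

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/util/retry"
)

// adoptWithFinalizer retries on conflicts so the owner reference and the
// foregroundDeletion finalizer land in a single atomic Update call.
func adoptWithFinalizer(cs kubernetes.Interface, ns, name string, ownerRef metav1.OwnerReference) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		rs, err := cs.ExtensionsV1beta1().ReplicaSets(ns).Get(name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		rs.OwnerReferences = append(rs.OwnerReferences, ownerRef)
		rs.Finalizers = append(rs.Finalizers, metav1.FinalizerDeleteDependents)
		_, err = cs.ExtensionsV1beta1().ReplicaSets(ns).Update(rs)
		return err
	})
}
```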

@0xmichalis (Contributor)

cc: @enisoc who was on the code recently. Any thoughts on the above?

@caesarxuchao (Member, Author)

@Kargakis I think it should be done both when we "add" and when we "adopt" replicasets.

Also it should be possible to patch the objectMeta.ownerRef and objectMeta.Finalizers at the same time.

@0xmichalis (Contributor)

> @Kargakis I think it should be done both when we "add" and when we "adopt" replicasets.

The only logic that needs to exist in the resource handlers should be about requeueing the resource and nothing more. Adding the finalizer during adoption, when we create an RS, or when we find one that doesn't have them here should be enough.

> Also it should be possible to patch the objectMeta.ownerRef and objectMeta.Finalizers at the same time.

If we can combine both then that's great.

@caesarxuchao (Member, Author)

> Adding the finalizer during adoption, when we create an RS,

That's what I meant. I should have used the word "create" instead of "add" in my previous comment. @krmayankk sorry, I didn't look closely at the link you pasted when I typed my last comment. As @Kargakis said, it's the resource handler and we shouldn't change the code there.

> when we find one that doesn't have them

I disagree with this part. Users should be able to manually remove the finalizer, because they might want to keep the pods alive when the deployment is deleted. The deployment controller shouldn't add the finalizer back. But this also means replicasets created by a deployment before 1.7 will continue to suffer from #44046.

@0xmichalis (Contributor)

> I disagree with this part. Users should be able to manually remove the finalizer, because they might want to keep the pods alive when the deployment is deleted. The deployment controller shouldn't add the finalizer back. But this also means replicasets created by a deployment before 1.7 will continue to suffer from #44046.

If users want to keep the pods, shouldn't they do a non-cascading deletion of the deployment followed by a non-cascading deletion of the replica set?

@0xmichalis (Contributor)

> If users want to keep the pods, shouldn't they do a non-cascading deletion of the deployment followed by a non-cascading deletion of the replica set?

Or even better: orphan the replica set by changing its labels so that it won't be selected by the deployment, do a cascading deletion of the deployment, then do a non-cascading deletion of the replica set.

@caesarxuchao (Member, Author)

Both are more complicated than manually changing the finalizer of one replicaset.

If the deployment controller is going to overwrite the finalizers set by a user, we might as well let the garbage collector overwrite the finalizers; that's what this PR does.

@enisoc (Member) commented Apr 27, 2017

@Kargakis wrote:

> cc: @enisoc who was on the code recently. Any thoughts on the above?

@caesarxuchao wrote:

> Also it should be possible to patch the objectMeta.ownerRef and objectMeta.Finalizers at the same time.

Patch should work (in theory) for updating multiple things at once, as long as the change you're making is not dependent on existing values. For example, if you need to first check the existing list of Finalizers before deciding what/whether you're modifying, you would need to use Update with a retry loop.
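
A minimal sketch of such a combined patch, assuming the 2017-era client-go Patch signature; the helper, names, and UID are made up. Note that the finalizers list is replaced wholesale here, which is exactly the kind of dependence on existing values that would force a switch to Update with a retry loop:

```go
package gcsketch

import (
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// patchOwnerRefAndFinalizer sets the owner reference and the
// foregroundDeletion finalizer in one strategic merge patch.
func patchOwnerRefAndFinalizer(cs kubernetes.Interface, ns, name, ownerUID string) error {
	patch := []byte(`{"metadata":{` +
		`"ownerReferences":[{"apiVersion":"extensions/v1beta1","kind":"Deployment",` +
		`"name":"foo","uid":"` + ownerUID + `","controller":true,"blockOwnerDeletion":true}],` +
		`"finalizers":["foregroundDeletion"]}}`)
	_, err := cs.ExtensionsV1beta1().ReplicaSets(ns).Patch(name, types.StrategicMergePatchType, patch)
	return err
}
```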

@0xmichalis (Contributor)

> Both are more complicated than manually changing the finalizer of one replicaset.
>
> If the deployment controller is going to overwrite the finalizers set by a user, we might as well let the garbage collector overwrite the finalizers; that's what this PR does.

Users shouldn't muck with ReplicaSets owned by Deployments, and we should discourage them from doing so as much as possible; otherwise they are racing with the controller and opening the door to unexpected behavior. If you want to isolate a specific set of Pods, that is already not a common use case, and orphaning the parent ReplicaSet or doing two non-cascading deletions of the parent objects doesn't sound complex.

@caesarxuchao (Member, Author)

> doing two non-cascading deletions of the parent objects

Users would also need to delete the replicasets in the deployment history, right? Then it's more than "two" :)

The "changing labels" solution requires the deployment controller to react in order to release the pod, so the deletion behavior is less deterministic.

Is there a benefit other than fixing pre-1.7 replicasets? If not, I think we should take the one-time pain.

@0xmichalis (Contributor) commented Apr 27, 2017

> Users would also need to delete the replicasets in the deployment history, right? Then it's more than "two" :)

```
kubectl patch deploy/foo -p '{"spec":{"revisionHistoryLimit":0}}'
kubectl delete deploy/foo --cascade=false
kubectl delete rs/foo-n --cascade=false
```

Or

```
kubectl label rs/foo-n existing-label-
kubectl delete deploy/foo
kubectl delete rs/foo-n --cascade=false
```

> The "changing labels" solution requires the deployment controller to react in order to release the pod, so the deletion behavior is less deterministic.

You mean the replica set. Removing a label from the replica set declares that you want to orphan it, and the deployment controller will remove the owner ref eventually (99.9% of the time within a second).

> Is there a benefit other than fixing pre-1.7 replicasets? If not, I think we should take the one-time pain.

IMO it's more than just fixing pre-1.7 replicasets. We should discourage users from working directly with ReplicaSets owned by Deployments. If users want more control, they should either not use Deployments or, for one-off cases such as the one you are describing, orphan.

@0xmichalis (Contributor)

You can also script out the steps above, but you would want to wait for the controllers to observe the changes during each step.

@caesarxuchao (Member, Author)

@Kargakis could you confirm whether it's a rare, or even invalid, use case to keep some pods alive while deleting the replicasets and deployment? If so, I agree with your comment here.

@0xmichalis (Contributor)

> @Kargakis could you confirm whether it's a rare, or even invalid, use case to keep some pods alive while deleting the replicasets and deployment? If so, I agree with your comment here.

I wouldn't say it's invalid but it's not a common thing to do. The only case I can think of where you would want to do that is to move the Pods under a new controller.

@krmayankk

@caesarxuchao @Kargakis one thing I am not clear on: if we set ReplicaSets.Finalizers to foregroundDeletion, how would that trigger the deletion of the replica set's pods (which is the problem we are trying to solve)? According to https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/#foreground-cascading-deletion, when doing foreground cascading deletion the finalizer gets set by the GC controller to foregroundDeletion. Does the GC notice that foregroundDeletion is set on an object and then trigger the deletion of its dependent objects? Also, is blockOwnerDeletion already set for the pods of the replica sets?

Also, it's not clear to me why setting the propagation policy to background for RSes when deleting them is a bad idea. I have read all the comments above, but I'm not sure I understand the logic.
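
For reference, a minimal sketch of the metadata involved in the mechanism being asked about; the helper and values are illustrative. When an owner carries the foregroundDeletion finalizer, the GC first deletes dependents whose ownerReference has blockOwnerDeletion=true, and only then removes the finalizer so the owner itself can go away:

```go
package gcsketch

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
)

// podOwnerRef sketches the ownerReference a controller stamps on a pod
// it manages; blockOwnerDeletion=true makes a foreground deletion of the
// owning ReplicaSet wait until this pod is gone.
func podOwnerRef(rsName string, rsUID types.UID) metav1.OwnerReference {
	ctrl, block := true, true
	return metav1.OwnerReference{
		APIVersion:         "extensions/v1beta1",
		Kind:               "ReplicaSet",
		Name:               rsName,
		UID:                rsUID,
		Controller:         &ctrl,
		BlockOwnerDeletion: &block,
	}
}
```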

@k8s-github-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: caesarxuchao, liggitt
We suggest the following additional approver: lavalamp

Assign the PR to them by writing /assign @lavalamp in a comment when ready.

Associated issue: 44046

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

1 similar comment

@0xmichalis (Contributor)

/release-note-none

@k8s-ci-robot added the release-note-none label and removed the release-note-label-needed label on Jun 26, 2017
@0xmichalis removed the do-not-merge label on Jun 26, 2017
@caesarxuchao added the cherrypick-candidate and cherry-pick-approved labels on Jun 26, 2017
@caesarxuchao (Member, Author)

@liggitt @Kargakis could you also review the release note? Although this PR is a bug fix, it changes the GC behavior, so it is worth a release note.

@caesarxuchao added the release-note label and removed the release-note-none label on Jun 26, 2017
@k8s-ci-robot (Contributor) commented Jun 26, 2017

@caesarxuchao: The following test failed, say /retest to rerun them all:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| Jenkins non-CRI GCE e2e | 4cd9d04 | link | @k8s-bot non-cri e2e test this |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@caesarxuchao (Member, Author)

/test pull-kubernetes-e2e-gce-etcd3

@liggitt (Member) commented Jun 26, 2017

@caesarxuchao updated release note to put the general change first, and the specific issue fixed second

@caesarxuchao (Member, Author)

Thanks @liggitt.

@caesarxuchao (Member, Author)

cc @dchen1107

@dchen1107 (Member)

/lgtm

@k8s-github-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: caesarxuchao, dchen1107, liggitt

Associated issue: 44046

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot added the approved label on Jun 26, 2017
@k8s-github-robot

Automatic merge from submit-queue (batch tested with PRs 44058, 48085, 48077, 48076, 47823)

@k8s-github-robot merged commit 6a28658 into kubernetes:master on Jun 26, 2017
k8s-github-robot pushed a commit that referenced this pull request on Jun 27, 2017: …#44058-upstream-release-1.7

Automatic merge from submit-queue

Automated cherry pick of #44058

Cherry pick of #44058 on release-1.7.

#44058: revert 45764
@k8s-cherrypick-bot

Commit found in the "release-1.7" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error, find help to get your PR picked.

k8s-github-robot pushed a commit that referenced this pull request Aug 8, 2017
Automatic merge from submit-queue

Add e2e test for cronjob chained removal

This is a test proving #44058 works with cronjobs. It will fail until the aforementioned PR merges.

@caesarxuchao ptal