-
Notifications
You must be signed in to change notification settings - Fork 40.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] Add a proposal for synchronous garbage collection #32628
Conversation
* When deleting dependents, it should use the same `DeleteOptions.SynchronousGC` as the owner’s finalizers suggest. | ||
|
||
**Note** | ||
We cannot have circular dependencies anymore. Otherwise SynchronousGC will enter a deadlock. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How will we handle this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would this be the case. It should not be the case.
|
||
**Modifications to processItem()** | ||
|
||
`processItem()` consumes the `dirtyQueue`, requests the API server to delete an item if all of its owners do not exist. To support synchronous GC, it has to: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added more detail as requested.
GCE e2e build/test passed for commit d67eb53. |
cba45c3
to
0298cd3
Compare
|
||
**kubectl**: synchronous GC can replace the **kubectl delete** reapers. Currently `kubectl delete` blocks until all dependents and the owner are deleted. To maintain this behavior, after switched to using synchronous GC, *kubectl delete* needs to poll on the removal of the owner object. | ||
|
||
## Security implications |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
**Handling circular dependencies** | ||
|
||
SynchronousGC will enter a deadlock in the presence of circular dependencies. The garbage collector can break the circle by lazily detecting circular dependencies: when `processItem()` processes an object, if it finds the object and all of its owners have the `GCFinalizer`, it searches the internal owner-dependency relationship graph (`uidToNode`) to check if the object and any of its owner are in a circle where all objects have the `GCFinalizer`. If so, it removes the `GCFinzlier` from the object to break the circle. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lavalamp how about handling dependency circle this way? This should break a circle timely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should track which object a user requested the sync deletion for, and which objects have the finalizer merely so that the GC can track their deletion (you could do this with two different finalizer strings, for example). The only thing that's important is that the thing the user sent gets deleted after everything else.
The only time there's a problematic circle is if users simultaneously delete all of the items in the circle. In that case the GC probably needs to downgrade someone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added an idea at the top that expands on this concept of two finalizer classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I talked with Brian about this, too-- since we don't expect users to do this on purpose, let's do the easiest possible thing and break the cycle as soon as we notice it (i.e., as soon as we add the sync finalizer to an object which depends on an object that already has the sync label, we remove that link). We can add guarantees later if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I didn't mention the "two finalizers" idea in the proposal. I think we need only one finalizer for now, and we can switch to use two finalizers when we want to support circular dependencies.
* If the GC observes the owning object with the `GCFinalizer` before it observes the creation of all the dependents, GC will remove the finalizer from the owning object before all dependents are gone. Hence, “Synchronous GC” is best-effort, though we guarantee that the dependents will be deleted eventually. We face a similar case when handling OrphanFinalizer, see [GC known issues](https://github.com/kubernetes/kubernetes/issues/26120). | ||
|
||
|
||
## Implications to existing clients |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added analysis of several controllers that might be affected by synchronous GC.
// If SynchronousGarbageCollection is set, the object will not be deleted immediately. Instead, a GarbageCollectionInProgress finalizer will be placed on the object. The garbage collector will remove the finalizer from the object when all depdendents are deleted. | ||
// SynchronousGarbageCollection and OrphanDependents are exclusive. | ||
// SynchronousGarbageCollection default to false. | ||
// SynchronousGarbageCollection is cascading, i.e., the object’s dependents will be deleted with the same SynchronousGarbageCollection. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is necessary, actually. It's one possible implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And it may be very undesirable as it makes it harder to handle cycles.
… | ||
// If SynchronousGarbageCollection is set, the object will not be deleted immediately. Instead, a GarbageCollectionInProgress finalizer will be placed on the object. The garbage collector will remove the finalizer from the object when all depdendents are deleted. | ||
// SynchronousGarbageCollection and OrphanDependents are exclusive. | ||
// SynchronousGarbageCollection default to false. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should consider defaulting it in the way that causes the smallest change in old client behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(might be different for different objects)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Defaulting to false is the least disruptive way. Old clients don't expect GC.
} | ||
``` | ||
|
||
We will introduce a new standard finalizer: const GCFinalizer string = “GarbageCollectionInProgress” |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about "CollectingGarbage".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The string for the OrphanFinalizer is "orphan" (here), so probably "collectingGarbage". I hope we had named the OrphanFinalizer "orphaning".
|
||
Delete() function needs to check the DeleteOptions.SynchronousGarbageCollection. | ||
|
||
* The option is ignored if DeleteOptions.OrphanDependents is true or nil. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, reject the request if the user sets both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
**Node controller** forcefully deletes pod if the pod is scheduled to a node that does not exist ([code](../../pkg/controller/node/nodecontroller.go#L474)). The pod will continue to exist if it has pending finalizers. The node controller will futilely retry the deletion. Also, the `node controller` forcefully deletes pods before deleting the node ([code](../../pkg/controller/node/nodecontroller.go#L592)). If the pods have pending finalizers, the `node controller` will go ahead deleting the node, leaving those pods behind. These pods will be deleted from the key-value store when the pending finalizers are removed. | ||
|
||
**Podgc** deletes terminated pods if there are too many of them in the cluster. `Podgc` should remove any pending finalizers to make sure the pods are deleted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have a problem with finalizers not getting taken off quickly enough we should solve it elsewhere. Podgc shouldn't need to do anything specific about finalizers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
|
||
**Podgc** deletes terminated pods if there are too many of them in the cluster. `Podgc` should remove any pending finalizers to make sure the pods are deleted. | ||
|
||
**Deployment controller** adopts existing `ReplicaSet` (RS) if its template matches. If a matching RS has a pending `GCFinalizer`, deployment shouldn't adopt it, because the RS controller will not scale up/down a RS that's being deleted. Hence, `deployment controller` needs to check if a RS is being deleted before adopting it. If the RS is being deleted, then the `deployment controller` should wait for the status of the RS to show 0 replicas to avoid creating extra pods, then create a new RS. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's being deleted, its pods still count. The only difference is that the deployment controller shouldn't try to mutate it at that point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
|
||
**Replication controller manager**, **Job controller**, and **ReplicaSet controller** ignore pods in terminated phase, so pods with pending finalizers will not block these controllers. | ||
|
||
**PetSet controller** will be blocked by a pod with pending finalizers, so Synchronous GC might slow down its progress. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pet set controller may wish to use synchronous deletion itself. @erictune
* handle Add or Update events where `obj.Finalizers.Has(GCFinalizer) && obj.DeletionTimestamp != nil`. The object will be added into the `synchronousGC queue`. The object will be marked as “GC in progress” in `uidToNode`. | ||
* Upon receiving the deletion event of an object, put its owner into the `synchronousGC queue`. This is to force the `GCFinalizer` (described next) to re-check if all dependents of the owner is deleted. | ||
|
||
**Addition of GCFinalizer() routine** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again I think we can just do this in the current function that pops off of the current queue. if object.waiting { check_if_done() } else { ... }
`processItem()` consumes the `dirtyQueue`, requests the API server to delete an item if all of its owners do not exist. To support synchronous GC, it has to: | ||
|
||
* treat an owner as "not exist" if `owner.DeletionTimestamp != nil && !owner.Finalizers.Has(OrphanFinalizer)`, otherwise Synchronous GC will not progress because the owner keeps existing in the key-value store. | ||
* when deleting dependents, it should use the same `DeleteOptions.SynchronousGC` as the owner’s finalizers suggest. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is one way to do it. I guess it's probably easiest.
|
||
* treat an owner as "not exist" if `owner.DeletionTimestamp != nil && !owner.Finalizers.Has(OrphanFinalizer)`, otherwise Synchronous GC will not progress because the owner keeps existing in the key-value store. | ||
* when deleting dependents, it should use the same `DeleteOptions.SynchronousGC` as the owner’s finalizers suggest. | ||
* if an object has multiple owners, some owners still exit while other owners are in the synchronous GC stage, then according to the existing logic of GC, the object wouldn't be deleted. To unblock the synchronous GC of those owners, `processItem()` has to remove the ownerReferences pointing to them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/exit/exist/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
# Overview | ||
|
||
Some users of the server-side garbage collection need to tell if the garbage collection is done ([example](https://github.com/kubernetes/kubernetes/issues/19701#issuecomment-236997077)). Synchronous Garbage Collection is a best-effort (see [unhandled cases](#unhandled-cases)) mechanism to enable such use cases: after the API server receives a deletion request of an owning object, the object keeps existing in the key-value store until all its dependents are deleted from the key-value store by the garbage collector. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do they need to know? Please add some text motivating this change.
Also clarify what exactly it means: Does it wait until all dependents are themselves in the terminating state, or until they're actually deleted and names released?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An idea might be to offer two flavors of sync GC:
- Class one merely waits for all dependent objects to be in terminating state. There's no circular dependency problems with this class.
- Class two waits for all dependent objects to additionally have their names released for reuse. There's a potential circular dependency problem here; one resolution is to say that the garbage collector may downgrade a single item in the circle to class one. A better fix is to allow releasing the name as a thing that happens before the object is physically deleted from the storage layer, but we currently can't do that. If we could, this wouldn't have circular dependency problems, either.
|
||
**Handling circular dependencies** | ||
|
||
SynchronousGC will enter a deadlock in the presence of circular dependencies. The garbage collector can break the circle by lazily detecting circular dependencies: when `processItem()` processes an object, if it finds the object and all of its owners have the `GCFinalizer`, it searches the internal owner-dependency relationship graph (`uidToNode`) to check if the object and any of its owner are in a circle where all objects have the `GCFinalizer`. If so, it removes the `GCFinzlier` from the object to break the circle. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should track which object a user requested the sync deletion for, and which objects have the finalizer merely so that the GC can track their deletion (you could do this with two different finalizer strings, for example). The only thing that's important is that the thing the user sent gets deleted after everything else.
The only time there's a problematic circle is if users simultaneously delete all of the items in the circle. In that case the GC probably needs to downgrade someone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good start, but I have some suggestions.
|
||
**Handling circular dependencies** | ||
|
||
SynchronousGC will enter a deadlock in the presence of circular dependencies. The garbage collector can break the circle by lazily detecting circular dependencies: when `processItem()` processes an object, if it finds the object and all of its owners have the `GCFinalizer`, it searches the internal owner-dependency relationship graph (`uidToNode`) to check if the object and any of its owner are in a circle where all objects have the `GCFinalizer`. If so, it removes the `GCFinzlier` from the object to break the circle. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added an idea at the top that expands on this concept of two finalizer classes.
|
||
## Security implications | ||
|
||
A user who is authorized to update one object can affect the synchronous GC behavior of another object. Specifically, by setting an object as a pod's owner, and setting a very long grace termination period for the pod, a user can make the synchronous GC of the owner to take long time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is potentially a big deal. Do we have any strategies to mitigate this? @deads2k
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is potentially a big deal. Do we have any strategies to mitigate this? @deads2k
I think this is a general problem with finalizing behavior which this proposal makes more obvious, but which always existed. Reading through, this looks like a way to provide nicer client behavior, but the mechanism is roughly the same. I don't know that you can do much better than allowing a timeout and a timeout behavior (fail or orphan), along with a status that links to offenders.
It looks very similar to the case where someone adds a finalizer ref on a namespace, but fails to start a controller that collects that finalizer. The namespace sticks around forever. @smarterclayton and I have talked about it, but we've never come up with a general solution to the stuck finalizer or stuck initializer problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I talked a bit with Brian and came up with the idea of the GC impersonating the user at deletion time, and if the object is not visible to that user, then ignoring or breaking the link. It seems like the GC controller can do something in this area.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The GC impersonating idea doesn't solve the scenario I described:
User A only has access to object Oa of kind Ka, but he sets Oa's ownerRef to point to Ob of kind Kb. User A also makes Oa's deletion take a very long time.
When user B, who has access to both Ka and Kb, initiates the synchronous deletion of Ob, he will find Ob's deletion takes a very long time, because of Oa.
The problem in this scenario is that user A gets the ability to affect the behavior of Ob.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
along with a status that links to offenders
I think the garbage collector needs to expose the dependencies via the controller manager for debugging purposes. I'm not sure about the security implications. I think it's safe if the exposed information only contains the resource kind and the UID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know that you can do much better than allowing a timeout and a timeout behavior (fail or orphan), along with a status that links to offenders.
I'm not in favor of a timeout. #33799 (comment) suggested let users define a timeout. It's hard for user to estimate how long GC will take, because it's hard for a user to figure out how many dependents are out there. On the other hand, if GC defines a default timeout, user will never know if all dependents are actually deleted.
I think we should disallow a user from setting ownerReference to point to a resource that he doesn't have update/delete permission. I think that's possible in our current system?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a number of ways to reduce the severity of this problem. These are just some ideas.
- apiserver could control the list of finalizers you're allowed to add to an object. You'd have to register a finalizer and only sysadmins would have permission to do that. This reduces the chance of an attack where you keep an object around by adding a bogus finalizer.
- If we go with the finalizer registry idea, you could supply a max time along with the finalizer (probably bad idea for a number of reasons).
- We could make adding a finalizer require a particular permission, although this would be a field level auth. We might need a dedicated subresource for this.
- We could consider adding a restriction, like "cross-user links are {not permitted by apiserver | not honored by sync deletion by the GC}"
- We could add a restriction like, you can't add a link to an object that can't see you. Although this is pretty weird because it means less permissions is better for the attackee.
- Maybe better than that, just require mutual visibility.
- Actually, even better-- make the sync GC mode do a permission check. If Ob->Oa, and Oa is sync deleted, but Oa cannot see Ob, then still go ahead and delete Ob, but don't consider it as something that needs to be waited for. This still lets people in some namespace get their thing turned down when a dependency is turned down, but they can't block the dependency's turndown process.
- We could surface a metric, time spent in sync GC. sysadmins could get paged if it starts going up. (we should surface this metric no matter what, arguably.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- We should come up with a way to express what's blocking the deletion, maybe in an annotation. After N seconds, we add an annotation like "deletion waiting for: eviluser/pods/sleepforever to be deleted". Then a user could either manually remove that link or force delete it to make the original operation proceed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1, 3 might be useful even out of the scope of synchronous GC.
4 is not possible today, we don't record which user owns an object.
5 and 7 do not solve #32628 (comment).
6 not sure what does "visibility" mean. If it means having create/delete/update permissions, then it's similar to what I suggested in #32628 (comment). But that might block some useful use cases ( e.g., a use wants his pod to share the fate of a database service, but he doesn't have delete permission of the database service, so he can't setup the reference).
8,9 look reasonable.
Didn't get a chance to read this but discussed in hall with @bgrant0607 and @lavalamp . I think the following can happen:
I don't know any way to fix this with our consistency model. |
@erictune when handling Ob, GC issues a GET to check if Oa exists. |
The Get can return a NotFound error in a HA etcd cluster. |
@liggitt I don't understand HA etcd cluster well. In the case you described, is it because one etcd has processed the creation of the object, but the consensus of the etcd cluster is that the object does not exist? |
We should do consistent reads, so that shouldn't be true. On Fri, Sep 23, 2016 at 2:05 PM, Chao Xu notifications@github.com wrote:
|
I would use a separate admission plugin. We have a proposal to ease admission, particularly the ordering aspects which are extremely difficult to reason about today (https://github.com/kubernetes/kubernetes/pull/34131/files#diff-2be0f77c0f5171454aa4db95bd66d342R110). Given a structure like that, its pretty easy to add yourself to the chain in an appropriate location. If you like, I can create one for owner references that requires permission to delete an object before allowing someone to set any ownerrefs to be used as a starting point. |
Thanks, that would be great, though we need to get both proposals in first. |
) | ||
``` | ||
|
||
`DeleteOptions.OrphanDependents *bool` will be removed. We decided not to add a `DeleteOptions.SynchronousGC *bool`, because together with `DeleteOptions.OrphanDependents` it results in 9 possible combinations and is thus confusing. This is a breaking change, so the new DeleteOptions will live in `api.k8s.io` group with version `v1`, as proposed in [#33900](https://github.com/kubernetes/kubernetes/pull/33900). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smarterclayton @lavalamp can we make breaking changes to DeleteOptions and move it to api.k8s.io/v1
in 1.5?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't remove the old field, just add the new field and declare that only one of them may be set.
Mark the old field deprecated and say we'll remove it in 1.7.
It's easy to look at both fields and figure out which one the client set, and do the right thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
… | ||
// Whether and how garbage collection will be performed. | ||
// Defaults to GarbageCollectionDefault | ||
GarbageCollectionPolicy GarbageCollectionPolicy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make it a pointer--it's optional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
GarbageCollectionPolicy GarbageCollectionPolicy | ||
} | ||
|
||
type GarabgeCollectionPolicy string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Garabge/Garbage/
I really don't like calling this Garbage Collection for end users. It's On Oct 11, 2016, at 3:01 PM, Daniel Smith notifications@github.com wrote: @lavalamp commented on this pull request.In docs/proposals/synchronous-garbage-collection.md
s/Garabge/Garbage/ — |
How about
as a name? On Tue, Oct 11, 2016 at 3:12 PM, Clayton Coleman notifications@github.com
|
How about this? DeleteOptions {
…
DeletionPropagationPolicy DeletionPropagationPolicy
}
type DeletionPropagationPolicy string
const (
DefaultDeletionPropagationPolicy DeletionPropagationPolicy = "Default"
OrphanDependents DeletionPropagationPolicy = "OrphanDependents"
DeleteDependentsInBackground DeletionPropagationPolicy = "DeleteDependentsInBackground"
DeleteAfterDependentsAreDeleted DeletionPropagationPolicy = "DeleteAfterDependentsAreDeleted"
) |
… | ||
// Whether and how garbage collection will be performed. | ||
// Defaults to DefaultPropagationPolicy | ||
DeletionPropagationPolicy *DeletionPropagationPolicy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@smarterclayton how about these new names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than tense (not sure if it should be DeletePropagation or DeletionPropagation) I like this much more (they relate to propagation of deletes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean "Delete" is present tense and "Deletion" is sort of perfect tense? I'll leave this to native speakers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The field name can probably just be PropagationPolicy
, it's redundant to say "delete" since it's in the DeleteOptions already. (Type name should still include "Delete")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The field name can probably just be PropagationPolicy
Done.
Probably "deletePropagationPolicy" is most correct. On Tue, Oct 11, 2016 at 9:26 PM, Chao Xu notifications@github.com wrote:
|
I agree with @smarterclayton's nit on the name ("delete propagation" instead of "deletion propagation"). lgtm |
|
||
**kubectl**: synchronous GC can simplify the **kubectl delete** reapers. Let's take the `deployment reaper` as an example, since it's the most complicated one. Currently, the reaper finds all `RS` with matching labels, scales them down, polls until `RS.Status.Replica` reaches 0, deletes the `RS`es, and finally deletes the `deployment`. If using synchronous GC, `kubectl delete deployment` is as easy as sending a synchronous GC delete request for the deployment, and polls until the deployment is deleted from the key-value store. | ||
|
||
Note that this **changes the behavior** of `kubectl delete`. The command will be blocked until all pods are deleted from the key-value store, instead of being blocked until pods are in the terminating state. This means `kubectl delete` blocks for longer time, but it has the benefit that the resources used by the pods are released when the `kubectl delete` returns. To allow kubectl user not waiting for the cleanup, we will add a `--wait` flag. It defaults to true; if it's set to `false`, `kubectl delete` will send the delete request with `DeletePropagationPolicy=DeleteDependentsInBackground` and return immediately. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@deads2k I added --wait
to kubectl delete
I'd argue that it should be "deletionOptions" and "deletionPropagationPolicy" but it's hard to change the former, so we might as well be consistent and say "deletePropagationPolicy". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comments mostly on the api.
DeleteOptions { | ||
… | ||
// Whether and how garbage collection will be performed. | ||
// Defaults to DefaultPropagationPolicy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Either this field or OrphanDependents may be set, but not both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
const ( | ||
// Respects the existing garbage collection related finalizers on the object and the default garbage collection policy of the resource. | ||
DefaultPropagationPolicy DeletePropagationPolicy = "Default" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DeletePropagationPolicyDefault
DeletePropagationPolicyOrphan
DeletePropagationPolicyBackground
DeletePropagationPolicyForeground
...or make the names consistent some other way. Maybe the word "policy" isn't required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the comment, note explicitly that the default depends on the object type and the controller for the object. (If the controller pre-adds the sync finalizer label, then you'll get foreground behavior as the default unless you specifically request something else).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
// The object exists in the key-value store until the garbage collector deletes all the dependents whose ownerReference.blockOwnerDeletion=true from the key-value store. | ||
// API sever will put the "DeletingDependents" finalizer on the object, and sets its deletionTimestamp. | ||
// This policy is cascading, i.e., the dependents will be deleted with GarbageCollectionSynchronous. | ||
DeleteAfterBlockingDependentsAreDeleted DeletePropagationPolicy = "DeleteAfterBlockingDependentsAreDeleted" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this name is not going to work. I prefer something short and not immediately obvious (like Foreground) so that users will read the longer, hopefully very clear, description in the documentation.
|
||
## API Server | ||
|
||
`Delete()` function checks `DeleteOptions.PropagationPolicy`. If the policy is `DeleteAfterBlockingDependentsAreDeleted`, the API server will update the object instead of deleting it, add the finalizer, and set the `ObjectMeta.DeletionTimestamp`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the finalizer is already present?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(i.e., if it's present and they ask for orphan or background gc, we need to remove it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
type DeletePropagationPolicy string | ||
|
||
const ( | ||
// The default depends on the existing finalizers on the object and the type of the object. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the comment, note explicitly that the default depends on the object type and the controller for the object. (If the controller pre-adds the sync finalizer label, then you'll get foreground behavior as the default unless you specifically request something else).
@lavalamp Is the current comment clear? I think saying the default depends on the "existing finalizers" is more general than saying it depends on "the controller for the object".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, do we need a DeletePropagationDefault
? Can we use nil to mean default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'll keep the default for now. We can move the comment to DeleteOptions.PropagationPolicy
if we later decide DeletePropagationDefault
is redundent.
Comment looks good. Leaving a default might be good just to have a place to On Wed, Oct 12, 2016 at 4:23 PM, Chao Xu notifications@github.com wrote:
|
@lavalamp can we merge the proposal? @deads2k had given his lgtm (#32628 (comment)) |
Is there any objection merging the proposal? If not I'll apply the lgtm tomorrow. |
Applying the lgtm label. There is not much time left for 1.5. I'll focus on converting controllers to use client-go. Probably I'll do the implementation of this in 1.6. |
Automatic merge from submit-queue |
Automatic merge from submit-queue [RFC] Add a proposal for synchronous garbage collection Tracking issue: kubernetes#29891 @lavalamp @kubernetes/sig-api-machinery @kubernetes/api-review-team
Tracking issue: #29891
@lavalamp @kubernetes/sig-api-machinery @kubernetes/api-review-team
This change is