Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Add a proposal for synchronous garbage collection #32628

Merged
merged 12 commits into from
Oct 18, 2016

Conversation

caesarxuchao
Copy link
Member

@caesarxuchao caesarxuchao commented Sep 14, 2016

Tracking issue: #29891

@lavalamp @kubernetes/sig-api-machinery @kubernetes/api-review-team


This change is Reviewable

@caesarxuchao caesarxuchao added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Sep 14, 2016
@k8s-github-robot k8s-github-robot added kind/design Categorizes issue or PR as related to design. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. release-note-label-needed labels Sep 14, 2016
* When deleting dependents, it should use the same `DeleteOptions.SynchronousGC` as the owner’s finalizers suggest.

**Note**
We cannot have circular dependencies anymore. Otherwise SynchronousGC will enter a deadlock.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will we handle this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would this be the case. It should not be the case.


**Modifications to processItem()**

`processItem()` consumes the `dirtyQueue`, requests the API server to delete an item if all of its owners do not exist. To support synchronous GC, it has to:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added more detail as requested.

@k8s-bot
Copy link

k8s-bot commented Sep 14, 2016

GCE e2e build/test passed for commit d67eb53.

@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 21, 2016

**kubectl**: synchronous GC can replace the **kubectl delete** reapers. Currently `kubectl delete` blocks until all dependents and the owner are deleted. To maintain this behavior, after switched to using synchronous GC, *kubectl delete* needs to poll on the removal of the owner object.

## Security implications
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liggitt @deads2k might be interested in this.


**Handling circular dependencies**

SynchronousGC will enter a deadlock in the presence of circular dependencies. The garbage collector can break the circle by lazily detecting circular dependencies: when `processItem()` processes an object, if it finds the object and all of its owners have the `GCFinalizer`, it searches the internal owner-dependency relationship graph (`uidToNode`) to check if the object and any of its owner are in a circle where all objects have the `GCFinalizer`. If so, it removes the `GCFinzlier` from the object to break the circle.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lavalamp how about handling dependency circle this way? This should break a circle timely.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should track which object a user requested the sync deletion for, and which objects have the finalizer merely so that the GC can track their deletion (you could do this with two different finalizer strings, for example). The only thing that's important is that the thing the user sent gets deleted after everything else.

The only time there's a problematic circle is if users simultaneously delete all of the items in the circle. In that case the GC probably needs to downgrade someone.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added an idea at the top that expands on this concept of two finalizer classes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I talked with Brian about this, too-- since we don't expect users to do this on purpose, let's do the easiest possible thing and break the cycle as soon as we notice it (i.e., as soon as we add the sync finalizer to an object which depends on an object that already has the sync label, we remove that link). We can add guarantees later if needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I didn't mention the "two finalizers" idea in the proposal. I think we need only one finalizer for now, and we can switch to use two finalizers when we want to support circular dependencies.

* If the GC observes the owning object with the `GCFinalizer` before it observes the creation of all the dependents, GC will remove the finalizer from the owning object before all dependents are gone. Hence, “Synchronous GC” is best-effort, though we guarantee that the dependents will be deleted eventually. We face a similar case when handling OrphanFinalizer, see [GC known issues](https://github.com/kubernetes/kubernetes/issues/26120).


## Implications to existing clients
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added analysis of several controllers that might be affected by synchronous GC.

// If SynchronousGarbageCollection is set, the object will not be deleted immediately. Instead, a GarbageCollectionInProgress finalizer will be placed on the object. The garbage collector will remove the finalizer from the object when all depdendents are deleted.
// SynchronousGarbageCollection and OrphanDependents are exclusive.
// SynchronousGarbageCollection default to false.
// SynchronousGarbageCollection is cascading, i.e., the object’s dependents will be deleted with the same SynchronousGarbageCollection.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is necessary, actually. It's one possible implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it may be very undesirable as it makes it harder to handle cycles.

// If SynchronousGarbageCollection is set, the object will not be deleted immediately. Instead, a GarbageCollectionInProgress finalizer will be placed on the object. The garbage collector will remove the finalizer from the object when all depdendents are deleted.
// SynchronousGarbageCollection and OrphanDependents are exclusive.
// SynchronousGarbageCollection default to false.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should consider defaulting it in the way that causes the smallest change in old client behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(might be different for different objects)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defaulting to false is the least disruptive way. Old clients don't expect GC.

}
```

We will introduce a new standard finalizer: const GCFinalizer string = “GarbageCollectionInProgress”
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about "CollectingGarbage".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The string for the OrphanFinalizer is "orphan" (here), so probably "collectingGarbage". I hope we had named the OrphanFinalizer "orphaning".


Delete() function needs to check the DeleteOptions.SynchronousGarbageCollection.

* The option is ignored if DeleteOptions.OrphanDependents is true or nil.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, reject the request if the user sets both.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


**Node controller** forcefully deletes pod if the pod is scheduled to a node that does not exist ([code](../../pkg/controller/node/nodecontroller.go#L474)). The pod will continue to exist if it has pending finalizers. The node controller will futilely retry the deletion. Also, the `node controller` forcefully deletes pods before deleting the node ([code](../../pkg/controller/node/nodecontroller.go#L592)). If the pods have pending finalizers, the `node controller` will go ahead deleting the node, leaving those pods behind. These pods will be deleted from the key-value store when the pending finalizers are removed.

**Podgc** deletes terminated pods if there are too many of them in the cluster. `Podgc` should remove any pending finalizers to make sure the pods are deleted.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have a problem with finalizers not getting taken off quickly enough we should solve it elsewhere. Podgc shouldn't need to do anything specific about finalizers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.


**Podgc** deletes terminated pods if there are too many of them in the cluster. `Podgc` should remove any pending finalizers to make sure the pods are deleted.

**Deployment controller** adopts existing `ReplicaSet` (RS) if its template matches. If a matching RS has a pending `GCFinalizer`, deployment shouldn't adopt it, because the RS controller will not scale up/down a RS that's being deleted. Hence, `deployment controller` needs to check if a RS is being deleted before adopting it. If the RS is being deleted, then the `deployment controller` should wait for the status of the RS to show 0 replicas to avoid creating extra pods, then create a new RS.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's being deleted, its pods still count. The only difference is that the deployment controller shouldn't try to mutate it at that point.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.


**Replication controller manager**, **Job controller**, and **ReplicaSet controller** ignore pods in terminated phase, so pods with pending finalizers will not block these controllers.

**PetSet controller** will be blocked by a pod with pending finalizers, so Synchronous GC might slow down its progress.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pet set controller may wish to use synchronous deletion itself. @erictune

* handle Add or Update events where `obj.Finalizers.Has(GCFinalizer) && obj.DeletionTimestamp != nil`. The object will be added into the `synchronousGC queue`. The object will be marked as “GC in progress” in `uidToNode`.
* Upon receiving the deletion event of an object, put its owner into the `synchronousGC queue`. This is to force the `GCFinalizer` (described next) to re-check if all dependents of the owner is deleted.

**Addition of GCFinalizer() routine**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again I think we can just do this in the current function that pops off of the current queue. if object.waiting { check_if_done() } else { ... }

`processItem()` consumes the `dirtyQueue`, requests the API server to delete an item if all of its owners do not exist. To support synchronous GC, it has to:

* treat an owner as "not exist" if `owner.DeletionTimestamp != nil && !owner.Finalizers.Has(OrphanFinalizer)`, otherwise Synchronous GC will not progress because the owner keeps existing in the key-value store.
* when deleting dependents, it should use the same `DeleteOptions.SynchronousGC` as the owner’s finalizers suggest.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one way to do it. I guess it's probably easiest.


* treat an owner as "not exist" if `owner.DeletionTimestamp != nil && !owner.Finalizers.Has(OrphanFinalizer)`, otherwise Synchronous GC will not progress because the owner keeps existing in the key-value store.
* when deleting dependents, it should use the same `DeleteOptions.SynchronousGC` as the owner’s finalizers suggest.
* if an object has multiple owners, some owners still exit while other owners are in the synchronous GC stage, then according to the existing logic of GC, the object wouldn't be deleted. To unblock the synchronous GC of those owners, `processItem()` has to remove the ownerReferences pointing to them.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/exit/exist/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


# Overview

Some users of the server-side garbage collection need to tell if the garbage collection is done ([example](https://github.com/kubernetes/kubernetes/issues/19701#issuecomment-236997077)). Synchronous Garbage Collection is a best-effort (see [unhandled cases](#unhandled-cases)) mechanism to enable such use cases: after the API server receives a deletion request of an owning object, the object keeps existing in the key-value store until all its dependents are deleted from the key-value store by the garbage collector.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do they need to know? Please add some text motivating this change.

Also clarify what exactly it means: Does it wait until all dependents are themselves in the terminating state, or until they're actually deleted and names released?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An idea might be to offer two flavors of sync GC:

  • Class one merely waits for all dependent objects to be in terminating state. There's no circular dependency problems with this class.
  • Class two waits for all dependent objects to additionally have their names released for reuse. There's a potential circular dependency problem here; one resolution is to say that the garbage collector may downgrade a single item in the circle to class one. A better fix is to allow releasing the name as a thing that happens before the object is physically deleted from the storage layer, but we currently can't do that. If we could, this wouldn't have circular dependency problems, either.


**Handling circular dependencies**

SynchronousGC will enter a deadlock in the presence of circular dependencies. The garbage collector can break the circle by lazily detecting circular dependencies: when `processItem()` processes an object, if it finds the object and all of its owners have the `GCFinalizer`, it searches the internal owner-dependency relationship graph (`uidToNode`) to check if the object and any of its owner are in a circle where all objects have the `GCFinalizer`. If so, it removes the `GCFinzlier` from the object to break the circle.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should track which object a user requested the sync deletion for, and which objects have the finalizer merely so that the GC can track their deletion (you could do this with two different finalizer strings, for example). The only thing that's important is that the thing the user sent gets deleted after everything else.

The only time there's a problematic circle is if users simultaneously delete all of the items in the circle. In that case the GC probably needs to downgrade someone.

Copy link
Member

@lavalamp lavalamp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good start, but I have some suggestions.


**Handling circular dependencies**

SynchronousGC will enter a deadlock in the presence of circular dependencies. The garbage collector can break the circle by lazily detecting circular dependencies: when `processItem()` processes an object, if it finds the object and all of its owners have the `GCFinalizer`, it searches the internal owner-dependency relationship graph (`uidToNode`) to check if the object and any of its owner are in a circle where all objects have the `GCFinalizer`. If so, it removes the `GCFinzlier` from the object to break the circle.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added an idea at the top that expands on this concept of two finalizer classes.


## Security implications

A user who is authorized to update one object can affect the synchronous GC behavior of another object. Specifically, by setting an object as a pod's owner, and setting a very long grace termination period for the pod, a user can make the synchronous GC of the owner to take long time.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is potentially a big deal. Do we have any strategies to mitigate this? @deads2k

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is potentially a big deal. Do we have any strategies to mitigate this? @deads2k

I think this is a general problem with finalizing behavior which this proposal makes more obvious, but which always existed. Reading through, this looks like a way to provide nicer client behavior, but the mechanism is roughly the same. I don't know that you can do much better than allowing a timeout and a timeout behavior (fail or orphan), along with a status that links to offenders.

It looks very similar to the case where someone adds a finalizer ref on a namespace, but fails to start a controller that collects that finalizer. The namespace sticks around forever. @smarterclayton and I have talked about it, but we've never come up with a general solution to the stuck finalizer or stuck initializer problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I talked a bit with Brian and came up with the idea of the GC impersonating the user at deletion time, and if the object is not visible to that user, then ignoring or breaking the link. It seems like the GC controller can do something in this area.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GC impersonating idea doesn't solve the scenario I described:
User A only has access to object Oa of kind Ka, but he sets Oa's ownerRef to point to Ob of kind Kb. User A also makes Oa's deletion take a very long time.
When user B, who has access to both Ka and Kb, initiates the synchronous deletion of Ob, he will find Ob's deletion takes a very long time, because of Oa.
The problem in this scenario is that user A gets the ability to affect the behavior of Ob.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

along with a status that links to offenders

I think the garbage collector needs to expose the dependencies via the controller manager for debugging purposes. I'm not sure about the security implications. I think it's safe if the exposed information only contains the resource kind and the UID.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know that you can do much better than allowing a timeout and a timeout behavior (fail or orphan), along with a status that links to offenders.

I'm not in favor of a timeout. #33799 (comment) suggested let users define a timeout. It's hard for user to estimate how long GC will take, because it's hard for a user to figure out how many dependents are out there. On the other hand, if GC defines a default timeout, user will never know if all dependents are actually deleted.

I think we should disallow a user from setting ownerReference to point to a resource that he doesn't have update/delete permission. I think that's possible in our current system?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a number of ways to reduce the severity of this problem. These are just some ideas.

  1. apiserver could control the list of finalizers you're allowed to add to an object. You'd have to register a finalizer and only sysadmins would have permission to do that. This reduces the chance of an attack where you keep an object around by adding a bogus finalizer.
  2. If we go with the finalizer registry idea, you could supply a max time along with the finalizer (probably bad idea for a number of reasons).
  3. We could make adding a finalizer require a particular permission, although this would be a field level auth. We might need a dedicated subresource for this.
  4. We could consider adding a restriction, like "cross-user links are {not permitted by apiserver | not honored by sync deletion by the GC}"
  5. We could add a restriction like, you can't add a link to an object that can't see you. Although this is pretty weird because it means less permissions is better for the attackee.
  6. Maybe better than that, just require mutual visibility.
  7. Actually, even better-- make the sync GC mode do a permission check. If Ob->Oa, and Oa is sync deleted, but Oa cannot see Ob, then still go ahead and delete Ob, but don't consider it as something that needs to be waited for. This still lets people in some namespace get their thing turned down when a dependency is turned down, but they can't block the dependency's turndown process.
  8. We could surface a metric, time spent in sync GC. sysadmins could get paged if it starts going up. (we should surface this metric no matter what, arguably.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. We should come up with a way to express what's blocking the deletion, maybe in an annotation. After N seconds, we add an annotation like "deletion waiting for: eviluser/pods/sleepforever to be deleted". Then a user could either manually remove that link or force delete it to make the original operation proceed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1, 3 might be useful even out of the scope of synchronous GC.
4 is not possible today, we don't record which user owns an object.
5 and 7 do not solve #32628 (comment).
6 not sure what does "visibility" mean. If it means having create/delete/update permissions, then it's similar to what I suggested in #32628 (comment). But that might block some useful use cases ( e.g., a use wants his pod to share the fate of a database service, but he doesn't have delete permission of the database service, so he can't setup the reference).

8,9 look reasonable.

@erictune
Copy link
Member

Didn't get a chance to read this but discussed in hall with @bgrant0607 and @lavalamp .

I think the following can happen:

  1. User creates object Oa of kind Ka and gets 200 / verifies existence of Oa.
  2. User creates object Ob of kind Kb that has parent object Oa.
  3. GC controller is watching Ka's and Kb's.
  4. GC controller sees Ob before it sees Oa. GC Controller concludes Ob needs to be deleted.

I don't know any way to fix this with our consistency model.

@caesarxuchao
Copy link
Member Author

GC controller sees Ob before it sees Oa. GC Controller concludes Ob needs to be deleted.

@erictune when handling Ob, GC issues a GET to check if Oa exists.

@liggitt
Copy link
Member

liggitt commented Sep 23, 2016

The Get can return a NotFound error in a HA etcd cluster.

@caesarxuchao
Copy link
Member Author

@liggitt I don't understand HA etcd cluster well. In the case you described, is it because one etcd has processed the creation of the object, but the consensus of the etcd cluster is that the object does not exist?

@lavalamp
Copy link
Member

We should do consistent reads, so that shouldn't be true.

On Fri, Sep 23, 2016 at 2:05 PM, Chao Xu notifications@github.com wrote:

@liggitt https://github.com/liggitt I don't understand HA etcd cluster
well. In the case you described, is it because one etcd has processed the
creation of the object, but the consensus of the etcd cluster is that the
object does not exist?


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#32628 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAnglsu59U0PMI0ZP9iz6kqdZBUNXCpfks5qtD8kgaJpZM4J8Saa
.

@deads2k
Copy link
Contributor

deads2k commented Oct 11, 2016

Regarding admission controllers, I think they are a bunch of plugins. I don't think we should write an admission controller whose sole responsibility is checking ownerReferences, but I don't see which existing admission controller can take this responsibility. Do you have suggestions?

I would use a separate admission plugin. We have a proposal to ease admission, particularly the ordering aspects which are extremely difficult to reason about today (https://github.com/kubernetes/kubernetes/pull/34131/files#diff-2be0f77c0f5171454aa4db95bd66d342R110). Given a structure like that, its pretty easy to add yourself to the chain in an appropriate location.

If you like, I can create one for owner references that requires permission to delete an object before allowing someone to set any ownerrefs to be used as a starting point.

@caesarxuchao
Copy link
Member Author

If you like, I can create one for owner references that requires permission to delete an object before allowing someone to set any ownerrefs to be used as a starting point.

Thanks, that would be great, though we need to get both proposals in first.

)
```

`DeleteOptions.OrphanDependents *bool` will be removed. We decided not to add a `DeleteOptions.SynchronousGC *bool`, because together with `DeleteOptions.OrphanDependents` it results in 9 possible combinations and is thus confusing. This is a breaking change, so the new DeleteOptions will live in `api.k8s.io` group with version `v1`, as proposed in [#33900](https://github.com/kubernetes/kubernetes/pull/33900).
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smarterclayton @lavalamp can we make breaking changes to DeleteOptions and move it to api.k8s.io/v1 in 1.5?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't remove the old field, just add the new field and declare that only one of them may be set.

Mark the old field deprecated and say we'll remove it in 1.7.

It's easy to look at both fields and figure out which one the client set, and do the right thing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// Whether and how garbage collection will be performed.
// Defaults to GarbageCollectionDefault
GarbageCollectionPolicy GarbageCollectionPolicy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make it a pointer--it's optional.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

GarbageCollectionPolicy GarbageCollectionPolicy
}

type GarabgeCollectionPolicy string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Garabge/Garbage/

@smarterclayton
Copy link
Contributor

I really don't like calling this Garbage Collection for end users. It's
really not garbage, it's dependent object deletion. I feel pretty strongly
we should be aligning what this does, not that it is

On Oct 11, 2016, at 3:01 PM, Daniel Smith notifications@github.com wrote:

@lavalamp commented on this pull request.

In docs/proposals/synchronous-garbage-collection.md
#32628 (review)
:

+}
++ +The initial draft of the proposal did not include this field and it had a security loophole: a user who is only authorized to update one resource can set ownerReference to block the synchronous GC of other resources. Requiring users to explicitly setBlockOwnerDeletion allows the master to properly authorize the request. + +## DeleteOptions + +go
+DeleteOptions {

  • // Whether and how garbage collection will be performed.
  • // Defaults to GarbageCollectionDefault
  • GarbageCollectionPolicy GarbageCollectionPolicy
    +}

+type GarabgeCollectionPolicy string

s/Garabge/Garbage/


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#32628 (review),
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABG_p6frJo0FtY4zORmN4slpu8zsyGt3ks5qzAdHgaJpZM4J8Saa
.

@lavalamp
Copy link
Member

How about

PropagationPolicy DeletionPropagationPolicy

as a name?

On Tue, Oct 11, 2016 at 3:12 PM, Clayton Coleman notifications@github.com
wrote:

I really don't like calling this Garbage Collection for end users. It's
really not garbage, it's dependent object deletion. I feel pretty strongly
we should be aligning what this does, not that it is

On Oct 11, 2016, at 3:01 PM, Daniel Smith notifications@github.com
wrote:

@lavalamp commented on this pull request.

In docs/proposals/synchronous-garbage-collection.md
<#32628 (review)
3774163>
:

+}
++ +The initial draft of the proposal did not include this field and it had a security loophole: a user who is only authorized to update one resource can set ownerReference to block the synchronous GC of other resources. Requiring users to explicitly setBlockOwnerDeletion allows the master to properly authorize the request. + +## DeleteOptions + +go
+DeleteOptions {

  • // Whether and how garbage collection will be performed.
  • // Defaults to GarbageCollectionDefault
  • GarbageCollectionPolicy GarbageCollectionPolicy
    +}
    +
    +type GarabgeCollectionPolicy string

s/Garabge/Garbage/


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#32628 (review)
3774163>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABG_
p6frJo0FtY4zORmN4slpu8zsyGt3ks5qzAdHgaJpZM4J8Saa>
.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#32628 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAnglvXEKdZaNuQ9KVet-6ChhCxdJ6z6ks5qzAnEgaJpZM4J8Saa
.

@caesarxuchao
Copy link
Member Author

caesarxuchao commented Oct 11, 2016

How about this?

DeleteOptions {
  …
  DeletionPropagationPolicy DeletionPropagationPolicy
}

type DeletionPropagationPolicy string

const (
    DefaultDeletionPropagationPolicy DeletionPropagationPolicy = "Default"
    OrphanDependents DeletionPropagationPolicy = "OrphanDependents"
    DeleteDependentsInBackground DeletionPropagationPolicy = "DeleteDependentsInBackground"
    DeleteAfterDependentsAreDeleted DeletionPropagationPolicy = "DeleteAfterDependentsAreDeleted"
)

// Whether and how garbage collection will be performed.
// Defaults to DefaultPropagationPolicy
DeletionPropagationPolicy *DeletionPropagationPolicy
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smarterclayton how about these new names?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than tense (not sure if it should be DeletePropagation or DeletionPropagation) I like this much more (they relate to propagation of deletes)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean "Delete" is present tense and "Deletion" is sort of perfect tense? I'll leave this to native speakers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The field name can probably just be PropagationPolicy, it's redundant to say "delete" since it's in the DeleteOptions already. (Type name should still include "Delete")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The field name can probably just be PropagationPolicy

Done.

@smarterclayton
Copy link
Contributor

Probably "deletePropagationPolicy" is most correct.

On Tue, Oct 11, 2016 at 9:26 PM, Chao Xu notifications@github.com wrote:

@caesarxuchao commented on this pull request.

In docs/proposals/synchronous-garbage-collection.md
#32628:

  • // Defaults to false.
    
  • // To set this field, a user needs "update" and "delete" permission of the owner, otherwise 422 (Unprocessable Entity) will be returned.
    
  • BlockOwnerDeletion *bool
    
    +}
    ++ +The initial draft of the proposal did not include this field and it had a security loophole: a user who is only authorized to update one resource can set ownerReference to block the synchronous GC of other resources. Requiring users to explicitly setBlockOwnerDeletion allows the master to properly authorize the request. + +## DeleteOptions + +go
    +DeleteOptions {
  • // Whether and how garbage collection will be performed.
  • // Defaults to DefaultPropagationPolicy
  • DeletionPropagationPolicy *DeletionPropagationPolicy

Do you mean "Delete" is present tense and "Deletion" is sort of perfect
tense? I'll leave this to native speakers.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#32628, or mute the thread
https://github.com/notifications/unsubscribe-auth/ABG_p6H_dt-Y6btx9Tf_VEb4OYo48hItks5qzGF2gaJpZM4J8Saa
.

@caesarxuchao
Copy link
Member Author

@krousey @lavalamp @deads2k are there more comments? This a 1.5 feature. Thanks.

@deads2k
Copy link
Contributor

deads2k commented Oct 12, 2016

@krousey @lavalamp @deads2k are there more comments? This a 1.5 feature. Thanks.

I agree with @smarterclayton's nit on the name ("delete propagation" instead of "deletion propagation").

lgtm


**kubectl**: synchronous GC can simplify the **kubectl delete** reapers. Let's take the `deployment reaper` as an example, since it's the most complicated one. Currently, the reaper finds all `RS` with matching labels, scales them down, polls until `RS.Status.Replica` reaches 0, deletes the `RS`es, and finally deletes the `deployment`. If using synchronous GC, `kubectl delete deployment` is as easy as sending a synchronous GC delete request for the deployment, and polls until the deployment is deleted from the key-value store.

Note that this **changes the behavior** of `kubectl delete`. The command will be blocked until all pods are deleted from the key-value store, instead of being blocked until pods are in the terminating state. This means `kubectl delete` blocks for longer time, but it has the benefit that the resources used by the pods are released when the `kubectl delete` returns. To allow kubectl user not waiting for the cleanup, we will add a `--wait` flag. It defaults to true; if it's set to `false`, `kubectl delete` will send the delete request with `DeletePropagationPolicy=DeleteDependentsInBackground` and return immediately.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deads2k I added --wait to kubectl delete

@lavalamp
Copy link
Member

I'd argue that it should be "deletionOptions" and "deletionPropagationPolicy" but it's hard to change the former, so we might as well be consistent and say "deletePropagationPolicy".

Copy link
Member

@lavalamp lavalamp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comments mostly on the api.

DeleteOptions {
// Whether and how garbage collection will be performed.
// Defaults to DefaultPropagationPolicy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Either this field or OrphanDependents may be set, but not both.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


const (
// Respects the existing garbage collection related finalizers on the object and the default garbage collection policy of the resource.
DefaultPropagationPolicy DeletePropagationPolicy = "Default"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DeletePropagationPolicyDefault
DeletePropagationPolicyOrphan
DeletePropagationPolicyBackground
DeletePropagationPolicyForeground

...or make the names consistent some other way. Maybe the word "policy" isn't required.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the comment, note explicitly that the default depends on the object type and the controller for the object. (If the controller pre-adds the sync finalizer label, then you'll get foreground behavior as the default unless you specifically request something else).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// The object exists in the key-value store until the garbage collector deletes all the dependents whose ownerReference.blockOwnerDeletion=true from the key-value store.
// API sever will put the "DeletingDependents" finalizer on the object, and sets its deletionTimestamp.
// This policy is cascading, i.e., the dependents will be deleted with GarbageCollectionSynchronous.
DeleteAfterBlockingDependentsAreDeleted DeletePropagationPolicy = "DeleteAfterBlockingDependentsAreDeleted"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this name is not going to work. I prefer something short and not immediately obvious (like Foreground) so that users will read the longer, hopefully very clear, description in the documentation.


## API Server

`Delete()` function checks `DeleteOptions.PropagationPolicy`. If the policy is `DeleteAfterBlockingDependentsAreDeleted`, the API server will update the object instead of deleting it, add the finalizer, and set the `ObjectMeta.DeletionTimestamp`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the finalizer is already present?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(i.e., if it's present and they ask for orphan or background gc, we need to remove it)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

type DeletePropagationPolicy string

const (
// The default depends on the existing finalizers on the object and the type of the object.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the comment, note explicitly that the default depends on the object type and the controller for the object. (If the controller pre-adds the sync finalizer label, then you'll get foreground behavior as the default unless you specifically request something else).

@lavalamp Is the current comment clear? I think saying the default depends on the "existing finalizers" is more general than saying it depends on "the controller for the object".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, do we need a DeletePropagationDefault? Can we use nil to mean default?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll keep the default for now. We can move the comment to DeleteOptions.PropagationPolicy if we later decide DeletePropagationDefault is redundent.

@lavalamp
Copy link
Member

Comment looks good. Leaving a default might be good just to have a place to
put documentation, but I don't feel super strongly.

On Wed, Oct 12, 2016 at 4:23 PM, Chao Xu notifications@github.com wrote:

@caesarxuchao commented on this pull request.

In docs/proposals/synchronous-garbage-collection.md
#32628 (review)
:

+## DeleteOptions
+
+```go
+DeleteOptions {

  • // Whether and how garbage collection will be performed.
  • // Defaults to DeletePropagationDefault
  • // Either this field or OrphanDependents may be set, but not both.
  • PropagationPolicy *DeletePropagationPolicy
    +}

+type DeletePropagationPolicy string
+
+const (

  • // The default depends on the existing finalizers on the object and the type of the object.

In the comment, note explicitly that the default depends on the object
type and the controller for the object. (If the controller pre-adds the
sync finalizer label, then you'll get foreground behavior as the default
unless you specifically request something else).

@lavalamp https://github.com/lavalamp Is the current comment clear? I
think saying the default depends on the "existing finalizers" is more

general than saying it depends on "the controller for the object".

In docs/proposals/synchronous-garbage-collection.md
#32628 (review)
:

+## DeleteOptions
+
+```go
+DeleteOptions {

  • // Whether and how garbage collection will be performed.
  • // Defaults to DeletePropagationDefault
  • // Either this field or OrphanDependents may be set, but not both.
  • PropagationPolicy *DeletePropagationPolicy
    +}

+type DeletePropagationPolicy string
+
+const (

  • // The default depends on the existing finalizers on the object and the type of the object.

Also, do we need a DeletePropagationDefault? Can we use nil to mean
default?


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#32628 (review),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAngltsu4xlMUfmNf9xbTSbZzaf0eZShks5qzWvtgaJpZM4J8Saa
.

@caesarxuchao
Copy link
Member Author

@lavalamp can we merge the proposal? @deads2k had given his lgtm (#32628 (comment))

@caesarxuchao
Copy link
Member Author

Is there any objection merging the proposal? If not I'll apply the lgtm tomorrow.

@caesarxuchao caesarxuchao added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 18, 2016
@caesarxuchao
Copy link
Member Author

Applying the lgtm label.

There is not much time left for 1.5. I'll focus on converting controllers to use client-go. Probably I'll do the implementation of this in 1.6.

@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 97525c0 into kubernetes:master Oct 18, 2016
xingzhou pushed a commit to xingzhou/kubernetes that referenced this pull request Dec 15, 2016
Automatic merge from submit-queue

[RFC] Add a proposal for synchronous garbage collection

Tracking issue: kubernetes#29891

@lavalamp @kubernetes/sig-api-machinery @kubernetes/api-review-team
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/design Categorizes issue or PR as related to design. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.