Consistently support graceful and immediate termination for all objects #1535
Comments
Big +1 on implementing this and making it a generic principle across resources. Definition and logic for graceful vs. non-graceful termination for each resource belongs in the server, not in client pieces like "kubecfg stop."
Thinking about the CLI perspective, I could imagine 3 operations:
Proposed approach: add a "stop" or "shutdown" field to each object's spec, and then support a custom verb, e.g. /op/stop or /op/shutdown, to set the field.
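A minimal sketch of what that could look like, assuming a hypothetical `Stop` spec field and a `/op/stop` path (neither is an actual Kubernetes API):

```go
// Hypothetical sketch of the proposal: a Stop field in the object's
// spec, flipped server-side by a custom verb. The field name, the
// /op/stop path, and the in-memory store are all illustrative.
package main

import (
	"fmt"
	"net/http"
	"strings"
)

type ReplicationControllerSpec struct {
	Replicas int
	Stop     bool // once true, controllers wind the object down gracefully
}

var store = map[string]*ReplicationControllerSpec{
	"frontend": {Replicas: 3},
}

// handleStop sets Stop=true so graceful-shutdown logic lives in the
// server, not in clients like "kubecfg stop".
func handleStop(w http.ResponseWriter, r *http.Request) {
	name := strings.TrimPrefix(r.URL.Path, "/op/stop/")
	rc, ok := store[name]
	if !ok {
		http.NotFound(w, r)
		return
	}
	rc.Stop = true
	fmt.Fprintf(w, "%s marked for graceful shutdown\n", name)
}

func main() {
	http.HandleFunc("/op/stop/", handleStop)
	http.ListenAndServe(":8080", nil)
}
```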
I'd also like to add a …
Assuming we move forward with #3613, I would like to expose a pattern where I can effectively stop a Namespace, mark all of its content for deletion by some background controller, and then ultimately purge the Namespace resource. Reading through this thread and others, it's not clear we have consensus on the right pattern for this across resources. Is there a recommended pattern I can look to prototype? I feel like I need a separate resource (NamespaceTermination) that I can post to kick off the proper workflow, as I tend to agree that a DELETE should remain a true delete.
The XxxTermination resource could be protected via a separate policy so it's not confused with a traditional PUT. Acceptance of the XxxTermination would toggle a Status field on the resource. A controller would see the resource marked for termination and perform all required cleanup. Upon completion, a client with proper Policy rights would send the DELETE to the Namespace resource. A delete would be accepted iff the resource was in a Terminated status, and the final resource removal would complete. In the interim, clients could do normal Get operations on the internal resources to track the purge to completion. This seems like the general flow I would look to prototype this week.
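A sketch of that flow as a state machine, with hypothetical type and phase names; the point is that a DELETE is only accepted once cleanup has completed:

```go
// Sketch of the termination workflow described above. NamespaceTermination,
// the phase names, and the function names are all assumptions.
package main

import (
	"errors"
	"fmt"
)

type Phase string

const (
	Active      Phase = "Active"
	Terminating Phase = "Terminating"
	Terminated  Phase = "Terminated"
)

type Namespace struct {
	Name   string
	Status Phase
}

// AcceptTermination is what POSTing a NamespaceTermination would do:
// toggle the Status field so the cleanup controller takes over.
func AcceptTermination(ns *Namespace) {
	if ns.Status == Active {
		ns.Status = Terminating
	}
}

// ReconcileCleanup stands in for the background controller; once all
// contained resources are purged, the namespace becomes Terminated.
func ReconcileCleanup(ns *Namespace, remainingObjects int) {
	if ns.Status == Terminating && remainingObjects == 0 {
		ns.Status = Terminated
	}
}

// Delete enforces "a delete is accepted iff the resource is Terminated".
func Delete(ns *Namespace) error {
	if ns.Status != Terminated {
		return errors.New("namespace not yet terminated; refusing delete")
	}
	return nil // final removal would happen here
}

func main() {
	ns := &Namespace{Name: "dev", Status: Active}
	AcceptTermination(ns)
	ReconcileCleanup(ns, 0) // pretend the controller finished its purge
	if err := Delete(ns); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("namespace removed")
}
```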
@derekwaynecarr My proposed approach was here: #1535 (comment). We may need to bikeshed about the names/paths, though. You could think of the "custom verb" as a synthetic control resource, such as XxxTermination. I'm happy to have "stop" and "delete" be distinct. We can provide the ability to do both in a single operation in kubectl.
We need existence dependencies for cascading deletion in the server. Deployments generate ReplicaSets, which generate Pods. Doing cleanup in the client will be fragile and unfriendly to non-kubectl clients.
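For illustration, a toy model of server-side existence dependencies (the real mechanism landed later as the garbage collector of #19054; the `Owner` pointer here is just a stand-in):

```go
// Toy model of existence dependencies: each object records its owner,
// and deleting an owner cascades to everything transitively owned.
package main

import "fmt"

type Object struct {
	Kind, Name string
	Owner      *Object // nil for roots such as a Deployment
}

// cascade returns root plus everything that must go with it, using a
// fixed-point pass so input order doesn't matter.
func cascade(all []*Object, root *Object) []*Object {
	doomed := map[*Object]bool{root: true}
	for changed := true; changed; {
		changed = false
		for _, o := range all {
			if !doomed[o] && o.Owner != nil && doomed[o.Owner] {
				doomed[o] = true
				changed = true
			}
		}
	}
	var out []*Object
	for _, o := range all {
		if doomed[o] {
			out = append(out, o)
		}
	}
	return out
}

func main() {
	deploy := &Object{Kind: "Deployment", Name: "web"}
	rs := &Object{Kind: "ReplicaSet", Name: "web-1", Owner: deploy}
	pod := &Object{Kind: "Pod", Name: "web-1-abc", Owner: rs}
	for _, o := range cascade([]*Object{deploy, rs, pod}, deploy) {
		fmt.Printf("deleting %s/%s\n", o.Kind, o.Name)
	}
}
```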
Closing in favor of #19054, which is being implemented for 1.3. |
We have the garbage collector in place now, but is there an umbrella issue, or separate issues, for making it work for all of the resources that need graceful deletion?
Maybe #26120? |
Issue discussed in #103, #1325, #1445, and other issues/PRs.
Our API isn't consistent on clean/graceful shutdown vs. immediate termination of our objects. We should support both modes, and in a consistent fashion. One mode should be the default, and the other should be available via a URL parameter on DELETE or a custom verb (e.g., /stop, for graceful shutdown). Graceful shutdown should accept a timeout (as discussed in the lifecycle hook PRs) and a reason (#1462) as parameters.
Cleanly turning down a replication controller currently requires external orchestration: resizing it to 0 and waiting for the pods to be deleted prior to deleting the replication controller itself.
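To make the shape concrete, a hedged sketch of the URL-parameter variant; `gracePeriodSeconds` and `reason` are illustrative names for the timeout and reason parameters discussed above, not a settled API:

```go
// Illustrative client-side construction of a graceful DELETE. The
// parameter names and the /api/v1beta1 path are assumptions for the
// sketch.
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

func gracefulDelete(base, resource, name string, seconds int, reason string) (*http.Request, error) {
	q := url.Values{}
	q.Set("gracePeriodSeconds", fmt.Sprint(seconds))
	q.Set("reason", reason)
	return http.NewRequest(http.MethodDelete,
		fmt.Sprintf("%s/%s/%s?%s", base, resource, name, q.Encode()), nil)
}

func main() {
	req, err := gracefulDelete("http://localhost:8080/api/v1beta1", "pods", "web-1", 30, "rollingUpdate")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)
	// Immediate termination would be the same request with gracePeriodSeconds=0.
}
```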
On the other hand, by default, individual pods are supposed to gracefully terminate, executing their PreStop handlers, SIGTERM handlers, and, in the future, PostStop handlers.
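The in-container half of that sequence is ordinary signal handling; for example, a Go server that drains in-flight work on SIGTERM (the 10-second drain timeout is an arbitrary choice for the sketch):

```go
// Minimal in-container graceful termination: finish in-flight requests
// when SIGTERM arrives, as it does (after any PreStop handler) during
// a pod's graceful shutdown.
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	srv := &http.Server{Addr: ":8080"}

	// Wait for SIGTERM in the background, then drain connections.
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, syscall.SIGTERM)
	go func() {
		<-stop
		log.Println("SIGTERM received; draining")
		ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
		defer cancel()
		srv.Shutdown(ctx) // stop accepting, wait for in-flight requests
	}()

	if err := srv.ListenAndServe(); err != http.ErrServerClosed {
		log.Fatal(err)
	}
}
```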
We should define what clean shutdown would mean for a service (wait for all targeted pods to be deleted?).
However, there are definitely occasions when one would not want graceful shutdown (e.g., shutting down a test deployment), and we should support that.