Proposal: Forgiveness #1574
Comments
Is the decision to unbind a pod from a host (delete it) a decision a node controller makes (delete all pods on node, then delete node)? Or is it reactive by the system from the node being deleted (node deleted, therefore some process needs to clean up pods)? The former seems like something annotations and a custom node controller would solve (immediately delete all pods that have forgiveness true). The latter seems like it would have to be supported by the system itself. This is probably a separate issue but I don't know where it is: should a node have a concept of being unavailable for scheduling?
@smarterclayton Yes, a node should have a concept of being unavailable for scheduling, and the system (a node controller) should also be responsible for removing pods from non-functional nodes -- see #1366. This separates concerns of pod controlling vs. node controlling (for the most part). Both control systems will get more and more sophisticated over time.
Sketch of forgiveness from #3058:
OpenShift implemented a "manage-node" command which models an evacuate action. Currently it's up to the admin to decide whether the pods will be moved, but a forgiveness value on the pod could be the input to that command (just like the automatic evacuation implemented in the node controller). A node with a pod with a long forgiveness policy is also less transient, so there is potentially value in making scheduling decisions based on forgiveness (where increasing the forgiveness of a given node has a cost, but scheduling a pod with a lower forgiveness is free). That's probably more a model for more traditional operations teams, and is certainly not a high priority.
Do we want to simplify forgiveness to try to get it for 1.1? As a general policy value it's useful to have in the system. |
I don't know if it would be high enough priority to commit to for 1.1, but I could imagine the first step would just be a bool (pod is treated normally, i.e. deleted by NodeController after the standard timeout, vs. pod stays bound to the node forever unless manually deleted or the node is deleted).
How about a duration? We want to dramatically reduce the failure detection time for certain ... Maybe there's a more tactical way to manage that, but that does seem ...
That seems reasonable to me; I was just suggesting the simplest thing we could do. What you suggested would also make it a subset of what @bgrant0607 suggested in his comment above. Not sure about prioritization for 1.1 though.
If we added the value, but did not leverage it in the node controller (or made ... As far as prioritization, it's something we'd be interested in implementing, ...
Is there any doc that has a couple of examples of concrete real world applications and what forgiveness values we recommend for Taints and Tolerations? |
We have a few apps running that are "must never be evicted" where the user ...
Yeah, I think there are three categories.

Anything talking to legacy network attached storage (iSCSI, Fibre Channel, ... In terms of general "not being surprised by the platform" there is a class ...
Automatic merge from submit-queue (batch tested with PRs 39469, 40557)

Forgiveness api changes

**What this PR does / why we need it**: Split from #34825; contains the API changes needed to implement forgiveness:
1. Update toleration API types to support forgiveness: added a new field, `forgivenessSeconds`, to indicate the duration of time a toleration tolerates a taint.
2. Update taint API types: added a new field to indicate the time the taint was added.

**Which issue this PR fixes**: Related issue: #1574. Related PR: #34825.

**Special notes for your reviewer**:

**Release note**:
```release-note
forgiveness alpha version api definition
```
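Based on the PR description above, a pod toleration using the new duration field might look like the following sketch (the taint key and value here are illustrative placeholders, not real API constants):

```yaml
# Sketch of a toleration using the alpha duration field described above.
# The key "example.com/unreachable" is a made-up placeholder.
tolerations:
- key: "example.com/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300   # tolerate the taint for 5 minutes, then evict
```

Note the field was discussed as `forgivenessSeconds` in this PR but shipped as `tolerationSeconds` in later PRs in this thread.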
…hanges

Automatic merge from submit-queue (batch tested with PRs 40696, 39914, 40374)

Forgiveness library changes

**What this PR does / why we need it**: Split from #34825; contains the library changes needed to implement forgiveness:
1. ~~Make taints-tolerations matching respect timestamps, so that a toleration can tolerate a taint for only a period of time.~~ As TaintManager caches taints and observes taint changes, time-based checking is now outside the library (in TaintManager); see #40355.
2. Make tolerations respect wildcard keys.
3. Add/refresh some related functions to wrap taints-tolerations operations.

**Which issue this PR fixes**: Related issue: #1574. Related PRs: #34825, #39469.

~~Please note that the first 2 commits in this PR come from #39469.~~

**Special notes for your reviewer**: ~~Since we currently have `pkg/api/helpers.go` and `pkg/api/v1/helpers.go`, there is some duplicated code in these two files.~~ ~~Ideally we should move taints-tolerations related functions into a separate package (pkg/util/taints) and make them a unified set of implementations, but I'd suggest doing that in a follow-up PR after the forgiveness ones are done, to avoid blocking the forgiveness feature too long.~~

**Release note**:
```release-note
make tolerations respect wildcard key
```
…ission-controller

Automatic merge from submit-queue (batch tested with PRs 41043, 39058, 41021, 41603, 41414)

add defaultTolerationSeconds admission controller

**What this PR does / why we need it**: Split from #34825; adds a new admission controller that:
1. adds a toleration (with tolerationSeconds = 300) for the taint `notReady:NoExecute` to every pod that does not already have a toleration for that taint, and
2. adds a toleration (with tolerationSeconds = 300) for the taint `unreachable:NoExecute` to every pod that does not already have a toleration for that taint.

**Which issue this PR fixes**: Related issue: #1574. Related PR: #34825.

**Special notes for your reviewer**:

**Release note**:
```release-note
add defaultTolerationSeconds admission controller
```
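Per the PR description above, the admission controller effectively injects tolerations like the following into pods that do not already carry them (a sketch; the fully qualified taint key names in the alpha API may differ from the short forms shown here):

```yaml
# Default tolerations injected by the DefaultTolerationSeconds admission
# controller, as described in the PR above. Key names are a best-effort
# rendering of the alpha-era taints and should be treated as illustrative.
tolerations:
- key: "node.alpha.kubernetes.io/notReady"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300
- key: "node.alpha.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300
```

The net effect is that, by default, every pod tolerates a not-ready or unreachable node for 5 minutes before being evicted, matching the previous hard-coded NodeController timeout behavior.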
…ionseconds-admission-controller

Automatic merge from submit-queue (batch tested with PRs 40932, 41896, 41815, 41309, 41628)

enable DefaultTolerationSeconds admission controller by default

**What this PR does / why we need it**: Continuation of PR #41414; enables the DefaultTolerationSeconds admission controller by default.

**Which issue this PR fixes**: fixes #41860. Related issues: #1574, #25320. Related PRs: #34825, #41133, #41414.

**Special notes for your reviewer**:

**Release note**:
```release-note
enable DefaultTolerationSeconds admission controller by default
```
…-noexecute

Automatic merge from submit-queue (batch tested with PRs 41116, 41804, 42104, 42111, 42120)

make kubectl taint command respect effect NoExecute

**What this PR does / why we need it**: Part of the forgiveness feature implementation; makes the kubectl taint command respect the NoExecute effect.

**Which issue this PR fixes**: Related issue: #1574. Related PR: #39469.

**Special notes for your reviewer**:

**Release note**:
```release-note
make kubectl taint command respect effect NoExecute
```
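For reference, applying a NoExecute taint from the CLI uses the standard `kubectl taint` syntax; the node name and key/value below are placeholders:

```shell
# Add a NoExecute taint; running pods without a matching toleration
# are evicted (after their tolerationSeconds, if one is set).
kubectl taint nodes node1 dedicated=special-user:NoExecute

# Remove the same taint (the trailing '-' deletes it).
kubectl taint nodes node1 dedicated=special-user:NoExecute-
```

These commands require a live cluster, so they are shown for illustration rather than as a runnable test.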
…onds

Automatic merge from submit-queue

fix kubectl describe pod, show tolerationSeconds

**What this PR does / why we need it**: tolerationSeconds is currently not shown in the kubectl describe result; this PR fixes that. With this fix, pod tolerations with tolerationSeconds would display like below:

```yaml
Name:           bar
Namespace:      foo
Node:           /
Labels:         <none>
Status:
IP:
Controllers:    <none>
Containers:     <none>
No volumes.
QoS Class:
Node-Selectors: <none>
Tolerations:    key1=value1
                key2=value2:NoSchedule
                key3=value3:NoExecute for 300s
```

**Which issue this PR fixes**: Related issue: #1574. Related PR: #39469.

**Special notes for your reviewer**:

**Release note**:
```release-note
make kubectl describe pod show tolerationSeconds
```
UPSTREAM: <carry>: move test rules from origin
Forked from #598 -- see that issue for more background. We don't need to implement this now, but recording the idea for posterity.
If we need more control over pod durability (e.g., to configure the node controller behavior -- #1366), rather than a single pin/persistence bit, I suggest we implement forgiveness: a list of (event type, optional duration, optional rate) of disruption events (e.g., host unreachability) the pod will tolerate. We could support an "any" event type and infinite duration for pods that want to be pinned regardless of what happens.
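A minimal sketch of what such a forgiveness list might look like in a pod spec, assuming the (event type, duration, rate) tuples described above; none of these field names exist in the actual API, they are purely illustrative:

```yaml
# Hypothetical forgiveness spec per this proposal; not real API fields.
forgiveness:
- event: "hostUnreachable"
  duration: "10m"        # tolerate unreachability for up to 10 minutes
- event: "hostReboot"
  duration: "1h"
  rate: "3/24h"          # give up after 3 reboots within 24 hours
- event: "any"
  duration: "infinite"   # pin the pod regardless of what happens
```

What ultimately shipped (see the merged PRs above) is the simpler taint/toleration form, where `tolerationSeconds` plays the role of the per-event duration.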
This approach would generalize nicely for cases where, for example, applications wanted to endure reboots but give up in the case of extended outages or in the case that the disk goes bad. We're also going to want to use a similar spec for availability requirements / failure tolerances of sets of pods.