-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: implement forgiveness phase 1 #34825
WIP: implement forgiveness phase 1 #34825
Conversation
@kevin-wangzefeng am I right that this PR obsoletes #32498 and #30191 ? |
@kevin-wangzefeng Also if you need help with the last item (add a new admission-controller), let me know. |
Yes, I think so. |
There are still some changes need to be finished, but the main changes are ready for reivew. |
// of effect NoExecute, otherwise this field is ignored) tolerates the taint. By default, | ||
// it is not set, which means tolerate the taint forever (do not evict). Zero and | ||
// negative values are not allowed. | ||
ForgivenessSeconds *int64 `json:"forgivenessSeconds,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ForgivenessPeriod or ForgivenessDuration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ForgivenessPeriodSeconds ? I like that the seconds is in the var name, otherwise easy to confuse w/ millisec
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pattern we've used elsewhere is XXXSeconds. I think ForgivenessSeconds is fine.
// string(api.TaintEffectNoScheduleNoAdmit), | ||
// string(api.TaintEffectNoScheduleNoAdmitNoExecute), | ||
string(api.TaintEffectNoExecute), | ||
// TODO: Uncomment this block when implement TaintEffectNoAdmit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it NoScheduleNoAdmit (based on the comment #25320 (comment))?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is changed in prior offline discussion with @davidopp , I will update to the tracking issue as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ops, I just checked, it not supposed to change (should still be NoScheduleNoAdmit), I will discard it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
bb15419
to
04fa9c8
Compare
tolerated := false | ||
for i := range tolerations { | ||
if TolerationToleratesTaint(&tolerations[i], taint) { | ||
if tolerations[i].ToleratesTaint(taint) { | ||
tolerated = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could just return here: return true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
tolerated := false | ||
for i := range tolerations { | ||
if TolerationToleratesTaint(&tolerations[i], taint) { | ||
if tolerations[i].ToleratesTaint(taint) { | ||
tolerated = true | ||
break | ||
} | ||
} | ||
return tolerated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could just return here false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
if t.ForgivenessSeconds != nil { | ||
// taint with no added time indicated can only be tolerated | ||
// by toleration with no forgivenessSeconds. | ||
if taint.AddedTime.IsZero() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would test for taint.Effect!=NoExecute then return false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per #28082 (comment) comment
expectTolerated: true, | ||
}, | ||
{ | ||
description: "toleration and taint have the same key and effect, and operator is Exists, and taint has some value, expect tolerated", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could add one more explicit test: "toleration and taint have the same key and effect but different values, and operator is Exists, and expect tolerated"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If toleration.operator is Exists, toleration.value must be empty (ensured in validation), I don't think the test you mentioned is necessary.
expectTolerated: false, | ||
}, | ||
{ | ||
description: "toleration with explicit forgiveness can't tolerate taint with no added time, expect not tolerated", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is invalid and not really a test for "not tolerated". Because it seems as per validation NoExcute must have addedtime so this kind of test should be part of validation tests IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taint.AddedTime
is intended to be not required even when the effect is NoExecute, so that the taint can only be tolerated with forever forgiveness toleration.
allErrors = append(allErrors, field.Invalid(idxPath.Child("effect"), toleration.Effect, "effect must be NoExecute when `ForgivenessSeconds` is set")) | ||
} | ||
if toleration.Operator != api.TolerationOpExists { | ||
allErrors = append(allErrors, field.Invalid(idxPath.Child("operator"), toleration.Operator, "operator must be Exists when `key` is 'nodeNotReady'")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"operator must be Exists when key
is 'nodeNotReady" does not seem to fit here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be "operator must be Exists when forgivenessSeconds
is set", thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't you use forgiveness with operator Equals ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there is any particular reason, just in node notready/unreachable case, taints has no value, what about removing this checking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree you should remove this. Forgiveness should work with both Exists and Equals toleration operators.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
if toleration.Effect != api.TaintEffectNoExecute { | ||
allErrors = append(allErrors, field.Invalid(idxPath.Child("effect"), toleration.Effect, "effect must be NoExecute when `ForgivenessSeconds` is set")) | ||
} | ||
if toleration.Operator != api.TolerationOpExists { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why this matters for toleration.ForgivenessSeconds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, this is wrong. Forgiveness should work with both Exists and Equals tolerator operators.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
tolerated = true | ||
break | ||
} | ||
} | ||
return tolerated | ||
} | ||
|
||
type taintsFilterFunc func(Taint) bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I am yet to look into full PR, I wonder if it could be misused if not guarded properly. For example, a malicious pod ignoring all taints of a privileged node and ends up getting scheduled on that node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, ideally we would be able to control who can create, update, and delete taints and tolerations. (And we should implement that in the future.) But I don't think we're introducing anything more unsafe than what we already have today. In particular
- to modify taint, you need to modify the Node object; but if you can modify the Node object, you can already do a lot of damage today (set node capacity to zero, add NotReady condition, etc.).
- to modify toleration, you need to modify the Pod object; if but if you can modify the Pod object, you can already do a lot of damage today (change the image name, set NodeName to an arbitrary value, etc.).
I agree that the "dedicated nodes" use case (the one you are alluding to) definitely can't be safely implemented without some ACL scheme for taints and tolerations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other topic than above, but it's github... - you probably can go with *Taint and avoid unnecessary copies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other topic than above, but it's github... - you probably can go with *Taint and avoid unnecessary copies.
done
Effect: api.TaintEffectNoExecute, | ||
AddedTime: nc.now(), | ||
} | ||
added, err := tryAddTaintToNode(nc.kubeClient, node.Name, nodeUnreachableTaint) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be checked if the node has TaintNodeNotReady (if the node's status goes from NotReady to Unreachable) and if the taint exists, it should be removed. I dont think both taints TaintNodeNotReady, TaintNodeUnreachable should exist together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what you mean. What is "Unreachable"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NodeReady can have three values: True, False and Unknown. True means ready, False and Unknown means that something's wrong. False is generally set by Kubelet to mean that something's wrong (e.g. Docker is down). Unknown means that the Node is Unreachable. Those are different things and I don't understand why we shouldn't have both taints.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gmarek I meant, at the same time, a node can not have both taints (TaintNodeNotReady, and TaintNodeUnreachable) associated to it. So if the NodeReady changes to Unknown(Unreachable), before assigning TaintNodeUnreachable, its better to check if the node had the taint TaintNodeNotReady (if NodeReady goes from False to Unkown), and if TaintNodeNotReady exists, remove it first, then assign TaintNodeUnreachable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree we should try to mimic the current behavior as closely as possible, to minimize the chance of bugs. Ideally we would have TaintNodeNotReady at the same time when we currently have ConditionFalse, and TaintNodeUnreachable at the same time when we currently have ConditionUnknown. Since we cannot have ConditionFalse and ConditionUnknown at the same time, we would not have both taints a tthe same time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough.
|
||
// tryRemoveTaintsOffNode tries to remove taints off a node, | ||
// return true if taints removed, or return false if taints don't exist. | ||
func tryRemoveTaintsOffNode(kubeClient clientset.Interface, nodeName string, taintsToRemove ...api.Taint) (bool, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems lot of common code between tryRemoveTaintsOffNode and tryAddTaintToNode, so could be combined onto one function something like tryModifyNodeTaints with a param like add/remove, or the common code could be put into some other common functions which both tryRemoveTaintsOffNode and tryAddTaintToNode could use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer that we delay refactoring for code compactness. Let's get this PR into a state where it actually works and implements all the functionality, and then clean it up.
Effect: api.TaintEffectNoExecute, | ||
AddedTime: nc.now(), | ||
} | ||
added, err := tryAddTaintToNode(nc.kubeClient, node.Name, nodeUnreachableTaint) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Above recordNodeStatusChange(nc.recorder, node, "NodeNotReady") sets node status to NodeNotReady, so shouldn't the applied taint be TaintNodeNotReady?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Double checked here, though the node status change is recorded NodeNotReady, it is actually node unreachable (NodeReady status ConditionUnknown) here, so we should use TaintNodeUnreachable
@kevin-wangzefeng thanks for this PR. could you put all generated code in a separate commit for easier review? |
if (len(toleration.Operator) == 0 || toleration.Operator == TolerationOpEqual) && toleration.Value == taint.Value { | ||
return true | ||
|
||
// check forgivenessSeconds time out |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just delete this comment :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@davidopp @gmarek @aveshagarwal, thanks for your feedback, I will do the refreshing work in new PRs and get them done in time. For this PR, I will keep as it is now. |
We can't let this slip, I'd like to enqueue a status update here during the next sig call. |
@timothysc - I'm offcall now and I'll prioritize this work. I probably won't be able to do serious code reviews because of that (@sjug) |
[APPROVALNOTIFIER] Needs approval from an approver in each of these OWNERS Files:
We suggest the following people: |
Automatic merge from submit-queue (batch tested with PRs 39469, 40557) Forgiveness api changes **What this PR does / why we need it**: Splited from #34825 , contains api changes that are needed to implement forgiveness: 1. update toleration api types to support forgiveness, added a new field forgivenessSeconds to indicate the duration of time it tolerates a taint. 2. update taint api types, added a new field to indicate the time the taint is added. **Which issue this PR fixes** : Related issue: #1574 Related PR: #34825 **Special notes for your reviewer**: **Release note**: ```release-note forgiveness alpha version api definition ```
@kevin-wangzefeng Is there are reason to keep this PR open given the other PRs are nearly done? |
…hanges Automatic merge from submit-queue (batch tested with PRs 40696, 39914, 40374) Forgiveness library changes **What this PR does / why we need it**: Splited from #34825, contains library changes that are needed to implement forgiveness: 1. ~~make taints-tolerations matching respect timestamps, so that one toleration can just tolerate a taint for only a period of time.~~ As TaintManager is caching taints and observing taint changes, time-based checking is now outside the library (in TaintManager). see #40355. 2. make tolerations respect wildcard key. 3. add/refresh some related functions to wrap taints-tolerations operation. **Which issue this PR fixes**: Related issue: #1574 Related PR: #34825, #39469 ~~Please note that the first 2 commits in this PR come from #39469 .~~ **Special notes for your reviewer**: ~~Since currently we have `pkg/api/helpers.go` and `pkg/api/v1/helpers.go`, there are some duplicated periods of code laying in these two files.~~ ~~Ideally we should move taints-tolerations related functions into a separate package (pkg/util/taints), and make it a unified set of implementations. But I'd just suggest to do it in a follow-up PR after Forgiveness ones done, in case of feature Forgiveness getting blocked to long.~~ **Release note**: ```release-note make tolerations respect wildcard key ```
No, we can close it now. |
…ission-controller Automatic merge from submit-queue (batch tested with PRs 41043, 39058, 41021, 41603, 41414) add defaultTolerationSeconds admission controller **What this PR does / why we need it**: Splited from #34825, add a new admission-controller that 1. adds toleration (with tolerationSeconds = 300) for taint `notReady:NoExecute` to every pod that does not already have a toleration for that taint, and 2. adds toleration (with tolerationSeconds = 300) for taint `unreachable:NoExecute` to every pod that does not already have a toleration for that taint. **Which issue this PR fixes**: Related issue: #1574 Related PR: #34825 **Special notes for your reviewer**: **Release note**: ```release-note add defaultTolerationSeconds admission controller ```
…ionseconds-admission-controller Automatic merge from submit-queue (batch tested with PRs 40932, 41896, 41815, 41309, 41628) enable DefaultTolerationSeconds admission controller by default **What this PR does / why we need it**: Continuation of PR #41414, enable DefaultTolerationSeconds admission controller by default. **Which issue this PR fixes**: fixes: #41860 related Issue: #1574, #25320 related PRs: #34825, #41133, #41414 **Special notes for your reviewer**: **Release note**: ```release-note enable DefaultTolerationSeconds admission controller by default ```
Why:
This PR is to implement forgiveness phase 1 and features it depends.
Related issue: #1574
Refreshes Implementation: #30191, also incorporates #32498
What changes made in this PR:
forgivenessSeconds
to indicate the duration of time it tolerates a taint.notready:NoExecute
with timestamp to a node when it goes not ready, and remove it when it turns ready.unreachable:NoExecute
with timestamp to a node when it becomes unknown, and remove it when it turns back ready.notready:NoExecute
andunreachable:NoExecute
for 5 minutes.Release note:
This change is