Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: implement forgiveness phase 1 #34825

Closed

Conversation

kevin-wangzefeng
Copy link
Member

@kevin-wangzefeng kevin-wangzefeng commented Oct 14, 2016

Why:
This PR is to implement forgiveness phase 1 and features it depends.
Related issue: #1574
Refreshes Implementation: #30191, also incorporates #32498

What changes made in this PR:

  1. update toleration api types to support forgiveness, added a new field forgivenessSeconds to indicate the duration of time it tolerates a taint.
  2. update taint api types, added a new field to indicate the time the taint is added.
  3. automatically add taint notready:NoExecute with timestamp to a node when it goes not ready, and remove it when it turns ready.
  4. automatically add taint unreachable:NoExecute with timestamp to a node when it becomes unknown, and remove it when it turns back ready.
  5. periodically check taints of a node and pods running on it, evict those not tolerate NoExecute taints.
  6. make taints-tolerations matching respect timestamps, useful in forgiveness cases.
  7. make tolerations respect wildcard key
  8. add a new admission-controller to set default forgiveness toleration for pods (that has no forgiveness tolerations) to tolerate taints notready:NoExecute and unreachable:NoExecute for 5 minutes.

Release note:

implement forgiveness alpha verion 
supports taint based pod eviction at runtime

This change is Reviewable

@k8s-github-robot k8s-github-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. release-note-label-needed labels Oct 14, 2016
@aveshagarwal
Copy link
Member

@kevin-wangzefeng am I right that this PR obsoletes #32498 and #30191 ?

@aveshagarwal
Copy link
Member

@kevin-wangzefeng Also if you need help with the last item (add a new admission-controller), let me know.

@kevin-wangzefeng
Copy link
Member Author

kevin-wangzefeng commented Oct 14, 2016

@kevin-wangzefeng am I right that this PR obsoletes #32498 and #30191 ?

Yes, I think so.

@kevin-wangzefeng
Copy link
Member Author

There are still some changes need to be finished, but the main changes are ready for reivew.
/cc @davidopp @kubernetes/sig-scheduling

// of effect NoExecute, otherwise this field is ignored) tolerates the taint. By default,
// it is not set, which means tolerate the taint forever (do not evict). Zero and
// negative values are not allowed.
ForgivenessSeconds *int64 `json:"forgivenessSeconds,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ForgivenessPeriod or ForgivenessDuration?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ForgivenessPeriodSeconds ? I like that the seconds is in the var name, otherwise easy to confuse w/ millisec

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pattern we've used elsewhere is XXXSeconds. I think ForgivenessSeconds is fine.

// string(api.TaintEffectNoScheduleNoAdmit),
// string(api.TaintEffectNoScheduleNoAdmitNoExecute),
string(api.TaintEffectNoExecute),
// TODO: Uncomment this block when implement TaintEffectNoAdmit.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it NoScheduleNoAdmit (based on the comment #25320 (comment))?

Copy link
Member Author

@kevin-wangzefeng kevin-wangzefeng Oct 14, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is changed in prior offline discussion with @davidopp , I will update to the tracking issue as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ops, I just checked, it not supposed to change (should still be NoScheduleNoAdmit), I will discard it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 14, 2016
tolerated := false
for i := range tolerations {
if TolerationToleratesTaint(&tolerations[i], taint) {
if tolerations[i].ToleratesTaint(taint) {
tolerated = true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could just return here: return true.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

tolerated := false
for i := range tolerations {
if TolerationToleratesTaint(&tolerations[i], taint) {
if tolerations[i].ToleratesTaint(taint) {
tolerated = true
break
}
}
return tolerated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could just return here false.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if t.ForgivenessSeconds != nil {
// taint with no added time indicated can only be tolerated
// by toleration with no forgivenessSeconds.
if taint.AddedTime.IsZero() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would test for taint.Effect!=NoExecute then return false.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per #28082 (comment) comment

expectTolerated: true,
},
{
description: "toleration and taint have the same key and effect, and operator is Exists, and taint has some value, expect tolerated",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could add one more explicit test: "toleration and taint have the same key and effect but different values, and operator is Exists, and expect tolerated"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If toleration.operator is Exists, toleration.value must be empty (ensured in validation), I don't think the test you mentioned is necessary.

expectTolerated: false,
},
{
description: "toleration with explicit forgiveness can't tolerate taint with no added time, expect not tolerated",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is invalid and not really a test for "not tolerated". Because it seems as per validation NoExcute must have addedtime so this kind of test should be part of validation tests IMO.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

taint.AddedTime is intended to be not required even when the effect is NoExecute, so that the taint can only be tolerated with forever forgiveness toleration.

allErrors = append(allErrors, field.Invalid(idxPath.Child("effect"), toleration.Effect, "effect must be NoExecute when `ForgivenessSeconds` is set"))
}
if toleration.Operator != api.TolerationOpExists {
allErrors = append(allErrors, field.Invalid(idxPath.Child("operator"), toleration.Operator, "operator must be Exists when `key` is 'nodeNotReady'"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"operator must be Exists when key is 'nodeNotReady" does not seem to fit here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be "operator must be Exists when forgivenessSeconds is set", thanks

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't you use forgiveness with operator Equals ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is any particular reason, just in node notready/unreachable case, taints has no value, what about removing this checking?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree you should remove this. Forgiveness should work with both Exists and Equals toleration operators.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if toleration.Effect != api.TaintEffectNoExecute {
allErrors = append(allErrors, field.Invalid(idxPath.Child("effect"), toleration.Effect, "effect must be NoExecute when `ForgivenessSeconds` is set"))
}
if toleration.Operator != api.TolerationOpExists {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why this matters for toleration.ForgivenessSeconds?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, this is wrong. Forgiveness should work with both Exists and Equals tolerator operators.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

tolerated = true
break
}
}
return tolerated
}

type taintsFilterFunc func(Taint) bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I am yet to look into full PR, I wonder if it could be misused if not guarded properly. For example, a malicious pod ignoring all taints of a privileged node and ends up getting scheduled on that node.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, ideally we would be able to control who can create, update, and delete taints and tolerations. (And we should implement that in the future.) But I don't think we're introducing anything more unsafe than what we already have today. In particular

  • to modify taint, you need to modify the Node object; but if you can modify the Node object, you can already do a lot of damage today (set node capacity to zero, add NotReady condition, etc.).
  • to modify toleration, you need to modify the Pod object; if but if you can modify the Pod object, you can already do a lot of damage today (change the image name, set NodeName to an arbitrary value, etc.).

I agree that the "dedicated nodes" use case (the one you are alluding to) definitely can't be safely implemented without some ACL scheme for taints and tolerations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other topic than above, but it's github... - you probably can go with *Taint and avoid unnecessary copies.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other topic than above, but it's github... - you probably can go with *Taint and avoid unnecessary copies.

done

Effect: api.TaintEffectNoExecute,
AddedTime: nc.now(),
}
added, err := tryAddTaintToNode(nc.kubeClient, node.Name, nodeUnreachableTaint)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be checked if the node has TaintNodeNotReady (if the node's status goes from NotReady to Unreachable) and if the taint exists, it should be removed. I dont think both taints TaintNodeNotReady, TaintNodeUnreachable should exist together.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what you mean. What is "Unreachable"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NodeReady can have three values: True, False and Unknown. True means ready, False and Unknown means that something's wrong. False is generally set by Kubelet to mean that something's wrong (e.g. Docker is down). Unknown means that the Node is Unreachable. Those are different things and I don't understand why we shouldn't have both taints.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gmarek I meant, at the same time, a node can not have both taints (TaintNodeNotReady, and TaintNodeUnreachable) associated to it. So if the NodeReady changes to Unknown(Unreachable), before assigning TaintNodeUnreachable, its better to check if the node had the taint TaintNodeNotReady (if NodeReady goes from False to Unkown), and if TaintNodeNotReady exists, remove it first, then assign TaintNodeUnreachable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we should try to mimic the current behavior as closely as possible, to minimize the chance of bugs. Ideally we would have TaintNodeNotReady at the same time when we currently have ConditionFalse, and TaintNodeUnreachable at the same time when we currently have ConditionUnknown. Since we cannot have ConditionFalse and ConditionUnknown at the same time, we would not have both taints a tthe same time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough.


// tryRemoveTaintsOffNode tries to remove taints off a node,
// return true if taints removed, or return false if taints don't exist.
func tryRemoveTaintsOffNode(kubeClient clientset.Interface, nodeName string, taintsToRemove ...api.Taint) (bool, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems lot of common code between tryRemoveTaintsOffNode and tryAddTaintToNode, so could be combined onto one function something like tryModifyNodeTaints with a param like add/remove, or the common code could be put into some other common functions which both tryRemoveTaintsOffNode and tryAddTaintToNode could use.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer that we delay refactoring for code compactness. Let's get this PR into a state where it actually works and implements all the functionality, and then clean it up.

Effect: api.TaintEffectNoExecute,
AddedTime: nc.now(),
}
added, err := tryAddTaintToNode(nc.kubeClient, node.Name, nodeUnreachableTaint)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Above recordNodeStatusChange(nc.recorder, node, "NodeNotReady") sets node status to NodeNotReady, so shouldn't the applied taint be TaintNodeNotReady?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double checked here, though the node status change is recorded NodeNotReady, it is actually node unreachable (NodeReady status ConditionUnknown) here, so we should use TaintNodeUnreachable

@aveshagarwal
Copy link
Member

@kevin-wangzefeng thanks for this PR. could you put all generated code in a separate commit for easier review?

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 14, 2016
if (len(toleration.Operator) == 0 || toleration.Operator == TolerationOpEqual) && toleration.Value == taint.Value {
return true

// check forgivenessSeconds time out
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just delete this comment :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@kevin-wangzefeng
Copy link
Member Author

@davidopp @gmarek @aveshagarwal, thanks for your feedback, I will do the refreshing work in new PRs and get them done in time. For this PR, I will keep as it is now.

@timothysc
Copy link
Member

We can't let this slip, I'd like to enqueue a status update here during the next sig call.

@gmarek
Copy link
Contributor

gmarek commented Jan 18, 2017

@timothysc - I'm offcall now and I'll prioritize this work. I probably won't be able to do serious code reviews because of that (@sjug)

@k8s-github-robot
Copy link

[APPROVALNOTIFIER] Needs approval from an approver in each of these OWNERS Files:

We suggest the following people:
cc @derekwaynecarr @lavalamp @wojtek-t @spxtr
You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

k8s-github-robot pushed a commit that referenced this pull request Jan 27, 2017
Automatic merge from submit-queue (batch tested with PRs 39469, 40557)

Forgiveness api changes

**What this PR does / why we need it**:
Splited from #34825 , contains api changes that are needed to implement forgiveness:
1. update toleration api types to support forgiveness, added a new field forgivenessSeconds to indicate the duration of time it tolerates a taint.
2. update taint api types, added a new field to indicate the time the taint is added.

**Which issue this PR fixes** : 
Related issue: #1574
Related PR: #34825 

**Special notes for your reviewer**:

**Release note**:

```release-note
forgiveness alpha version api definition
```
@spxtr spxtr removed their assignment Jan 31, 2017
@timothysc
Copy link
Member

@kevin-wangzefeng Is there are reason to keep this PR open given the other PRs are nearly done?

@lavalamp lavalamp removed their assignment Feb 3, 2017
k8s-github-robot pushed a commit that referenced this pull request Feb 3, 2017
…hanges

Automatic merge from submit-queue (batch tested with PRs 40696, 39914, 40374)

Forgiveness library changes

**What this PR does / why we need it**:
Splited from #34825, contains library changes that are needed to implement forgiveness:

1. ~~make taints-tolerations matching respect timestamps, so that one toleration can just tolerate a taint for only a period of time.~~ As TaintManager is caching taints and observing taint changes, time-based checking is now outside the library (in TaintManager). see #40355.
2. make tolerations respect wildcard key.
3. add/refresh some related functions to wrap taints-tolerations operation.

**Which issue this PR fixes**: 
Related issue: #1574
Related PR: #34825, #39469 
~~Please note that the first 2 commits in this PR come from #39469 .~~

**Special notes for your reviewer**:

~~Since currently we have `pkg/api/helpers.go` and `pkg/api/v1/helpers.go`, there are some duplicated periods of code laying in these two files.~~

~~Ideally we should move taints-tolerations related functions into a separate package (pkg/util/taints), and make it a unified set of implementations. But I'd just suggest to do it in a follow-up PR after Forgiveness ones done, in case of feature Forgiveness getting blocked to long.~~

**Release note**:

```release-note
make tolerations respect wildcard key
```
@kevin-wangzefeng
Copy link
Member Author

No, we can close it now.

k8s-github-robot pushed a commit that referenced this pull request Feb 19, 2017
…ission-controller

Automatic merge from submit-queue (batch tested with PRs 41043, 39058, 41021, 41603, 41414)

add defaultTolerationSeconds admission controller

**What this PR does / why we need it**:
Splited from #34825, add a new admission-controller that
1. adds toleration (with tolerationSeconds = 300) for taint `notReady:NoExecute` to every pod that does not already have a toleration for that taint, and
2. adds toleration (with tolerationSeconds = 300) for taint `unreachable:NoExecute` to every pod that does not already have a toleration for that taint.

**Which issue this PR fixes**: 
Related issue: #1574
Related PR: #34825

**Special notes for your reviewer**:

**Release note**:

```release-note
add defaultTolerationSeconds admission controller
```
k8s-github-robot pushed a commit that referenced this pull request Feb 26, 2017
…ionseconds-admission-controller

Automatic merge from submit-queue (batch tested with PRs 40932, 41896, 41815, 41309, 41628)

enable DefaultTolerationSeconds admission controller by default

**What this PR does / why we need it**:
Continuation of PR #41414, enable DefaultTolerationSeconds admission controller by default.


**Which issue this PR fixes**: 
fixes: #41860
related Issue: #1574, #25320
related PRs: #34825, #41133, #41414 

**Special notes for your reviewer**:

**Release note**:

```release-note
enable DefaultTolerationSeconds admission controller by default
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.