Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Task 1: Tainted node by condition. #49257

Merged
merged 1 commit into from
Sep 1, 2017

Conversation

k82cn
Copy link
Member

@k82cn k82cn commented Jul 20, 2017

What this PR does / why we need it:
Tainted node by condition for MemoryPressure, OutOfDisk and so on.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): part of #42001

Release note:

Tainted nodes by conditions as following:
  * 'node.kubernetes.io/network-unavailable=:NoSchedule' if NetworkUnavailable is true
  * 'node.kubernetes.io/disk-pressure=:NoSchedule' if DiskPressure is true
  * 'node.kubernetes.io/memory-pressure=:NoSchedule' if MemoryPressure is true
  * 'node.kubernetes.io/out-of-disk=:NoSchedule' if OutOfDisk is true

@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Jul 20, 2017
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jul 20, 2017
@k82cn
Copy link
Member Author

k82cn commented Jul 20, 2017

/assign

@k82cn
Copy link
Member Author

k82cn commented Jul 20, 2017

/unassign @wojtek-t @errordeveloper

@k82cn k82cn force-pushed the k8s_42001 branch 4 times, most recently from fa19889 to 9d71f3f Compare July 20, 2017 13:17
@k82cn
Copy link
Member Author

k82cn commented Jul 20, 2017

/cc @gmarek

This PR show the major idea on tainting Node NoSchedule by condition; is this OK for you?

Test MemoryPressure in local cluster with instrument code; I'll continue to add more unit test if it align with the code in your mind :).

@k8s-ci-robot k8s-ci-robot requested a review from gmarek July 20, 2017 13:36
@k82cn
Copy link
Member Author

k82cn commented Jul 21, 2017

cc/ @resouer

@resouer resouer self-requested a review July 22, 2017 15:54
@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 22, 2017
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 23, 2017
@k82cn
Copy link
Member Author

k82cn commented Jul 23, 2017

/retest

@k82cn
Copy link
Member Author

k82cn commented Jul 24, 2017 via email

@k82cn k82cn changed the title [WIP] Task 1: Tainted node by condition. Task 1: Tainted node by condition. Jul 24, 2017
if !ok {
return nil
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove empty line

@@ -885,7 +885,11 @@ func (o ReplicaSetsBySizeNewer) Less(i, j int) bool {
return *(o[i].Spec.Replicas) > *(o[j].Spec.Replicas)
}

func AddOrUpdateTaintOnNode(c clientset.Interface, nodeName string, taint *v1.Taint) error {
func AddOrUpdateTaintOnNode(c clientset.Interface, nodeName string, taints ...*v1.Taint) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put those changes into separate PR? They have a lot of sense on their own, and we don't want to revert those in case we find a bug somewhere else.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, created PR #49524 .

if err != nil {
utilruntime.HandleError(
fmt.Errorf(
"unable to taint %v unresponsive Node %q: %v",
taintToAdd.Key,
taintsToAdd,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will print pointer addresses. Make sure that it prints something understandable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

zoneNoExecuteTainer map[string]*RateLimitedTimedQueue

noScheduleTainerLock sync.Mutex
zoneNoScheduleTainer map[string]*RateLimitedTimedQueue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll need to read it carefully (I'll try to do this in the evening, but I may fail, so I'm leaving this note here), but I think this will be wrong. We want to have evictor (i.e. rate-limited tainting) only for Unreachable/NotReady Taints. The rest of the taints should not need rate limiting as they're NoSchedule ones. The only thing that we should protect against is TaintFlapping (i.e. adding and removing Taints in quick succession).

As I said - it's possible that this is how it's done, but I didn't have time to read it carefully enough this time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, I stand by what I wrote. We don't need rate limiting for those taints.

@k82cn
Copy link
Member Author

k82cn commented Aug 12, 2017

/unassign
/assign @wojtek-t

@k8s-ci-robot k8s-ci-robot assigned wojtek-t and unassigned k82cn Aug 12, 2017
@k82cn
Copy link
Member Author

k82cn commented Aug 12, 2017

@davidopp , @wojtek-t , do you have any comments for this :).

@davidopp
Copy link
Member

I will take a look this week. May not be able to do it for a few days.

@k82cn
Copy link
Member Author

k82cn commented Aug 23, 2017

ping :).

@wojtek-t
Copy link
Member

/approve

I will let @davidopp LGTM (it looks good to me).

@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 23, 2017
@luxas
Copy link
Member

luxas commented Aug 24, 2017

Just noting that networkUnavailable should be network-unavailable before moving to beta...
Is that something we can do at this stage?
(I know this goes for the general TaintBasedEvictions feature, which still is alpha)

cc @thockin

Copy link
Contributor

@resouer resouer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@luxas This PR does not rely on well known labels, here is a separate fix #51266 , I am asking @k82cn to review.

@k82cn
Copy link
Member Author

k82cn commented Aug 24, 2017

@luxas , as we slack, there's not hard code in this feature; #51266 is enough to cover, and no impact to this PR.

@resouer
Copy link
Contributor

resouer commented Aug 24, 2017

Cool, but I think you may still need to update the PR description.

@yastij
Copy link
Member

yastij commented Aug 24, 2017

We'll also need to align the design doc

@luxas
Copy link
Member

luxas commented Aug 24, 2017

@k82cn @resouer Yeah, cool, just stumbled upon that in the release note above; we should replace it there as well. Thanks!

@luxas luxas added this to the v1.8 milestone Aug 24, 2017
@k82cn
Copy link
Member Author

k82cn commented Aug 25, 2017

the description was updated :).

@davidopp , would you help to review this PR?

@gmarek
Copy link
Contributor

gmarek commented Aug 28, 2017

Ping @davidopp

@gmarek
Copy link
Contributor

gmarek commented Aug 30, 2017

I spoke with David offline. Applying label.

@gmarek
Copy link
Contributor

gmarek commented Aug 30, 2017

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 30, 2017
@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gmarek, k82cn, wojtek-t

Associated issue: 42001

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@gmarek gmarek closed this Aug 30, 2017
@gmarek gmarek reopened this Aug 30, 2017
@gmarek
Copy link
Contributor

gmarek commented Aug 30, 2017

/retest

1 similar comment
@k82cn
Copy link
Member Author

k82cn commented Aug 30, 2017

/retest

@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 51574, 51534, 49257, 44680, 48836)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants