Task 1: Tainted node by condition. #49257
/assign
/unassign @wojtek-t @errordeveloper
fa19889 to 9d71f3f
/cc @gmarek This PR shows the main idea of tainting nodes with NoSchedule by condition; is this OK with you? I tested MemoryPressure in a local cluster with instrumented code; I'll continue to add more unit tests if it aligns with what you have in mind :).
cc/ @resouer
/retest
/retest
On Sun, Jul 23, 2017 at 10:22 AM, k8s-ci-robot wrote:
@k82cn <https://github.com/k82cn>: The following test *failed*, say /retest to rerun them all:
Test name: pull-kubernetes-e2e-gce-etcd3
Commit: 02a6f18
Details: link <https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/pr-logs/pull/49257/pull-kubernetes-e2e-gce-etcd3/42331/>
Rerun command: /test pull-kubernetes-e2e-gce-etcd3
Full PR test history <https://k8s-gubernator.appspot.com/pr/49257>. Your PR dashboard <https://k8s-gubernator.appspot.com/pr/k82cn>. Please help us cut down on flakes by linking to <https://github.com/kubernetes/community/blob/master/contributors/devel/flaky-tests.md#filing-issues-for-flaky-tests> an open issue <https://github.com/kubernetes/kubernetes/issues?q=is:issue+is:open> when you hit one in your PR.
Instructions for interacting with me using PR comments are available here <https://github.com/kubernetes/community/blob/master/contributors/devel/pull-requests.md>. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra <https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:> repository. I understand the commands that are listed here <https://github.com/kubernetes/test-infra/blob/master/commands.md>.
pkg/controller/controller_utils.go
	if !ok {
		return nil
	}

remove empty line
pkg/controller/controller_utils.go
@@ -885,7 +885,11 @@ func (o ReplicaSetsBySizeNewer) Less(i, j int) bool {
 	return *(o[i].Spec.Replicas) > *(o[j].Spec.Replicas)
 }

-func AddOrUpdateTaintOnNode(c clientset.Interface, nodeName string, taint *v1.Taint) error {
+func AddOrUpdateTaintOnNode(c clientset.Interface, nodeName string, taints ...*v1.Taint) error {
Can you put those changes into a separate PR? They make a lot of sense on their own, and we don't want to have to revert them if we find a bug somewhere else.
Sure, created PR #49524.
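To illustrate why the variadic signature is useful on its own, here is a minimal, self-contained sketch of an add-or-update helper that accepts any number of taints in one call. The `Taint` struct and `addOrUpdateTaints` function are hypothetical stand-ins written for this example; they are not the code from this PR or from PR #49524, and the example keys are made up.

```go
package main

import "fmt"

// Taint is a simplified, hypothetical stand-in for v1.Taint.
type Taint struct {
	Key    string
	Value  string
	Effect string
}

// addOrUpdateTaints returns a copy of existing with every given taint added
// or updated, and reports whether anything changed. Two taints are treated
// as the same taint when both Key and Effect match (an assumption of this
// sketch).
func addOrUpdateTaints(existing []Taint, taints ...*Taint) ([]Taint, bool) {
	result := append([]Taint(nil), existing...)
	changed := false
	for _, t := range taints {
		if t == nil {
			continue
		}
		found := false
		for i := range result {
			if result[i].Key == t.Key && result[i].Effect == t.Effect {
				found = true
				if result[i].Value != t.Value {
					result[i].Value = t.Value
					changed = true
				}
				break
			}
		}
		if !found {
			result = append(result, *t)
			changed = true
		}
	}
	return result, changed
}

func main() {
	current := []Taint{{Key: "example.io/a", Effect: "NoSchedule"}}
	// One call applies several taints; a caller with a single taint
	// can still pass just one argument.
	updated, changed := addOrUpdateTaints(current,
		&Taint{Key: "example.io/a", Effect: "NoSchedule"},
		&Taint{Key: "example.io/b", Effect: "NoSchedule"},
	)
	fmt.Println(len(updated), changed)
}
```

Because the helper reports whether anything changed, a caller could skip the API update entirely when all taints are already present.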
 	if err != nil {
 		utilruntime.HandleError(
 			fmt.Errorf(
 				"unable to taint %v unresponsive Node %q: %v",
-				taintToAdd.Key,
+				taintsToAdd,
This will print pointer addresses. Make sure that it prints something understandable.
done
	zoneNoExecuteTainer map[string]*RateLimitedTimedQueue

	noScheduleTainerLock sync.Mutex
	zoneNoScheduleTainer map[string]*RateLimitedTimedQueue
I'll need to read it carefully (I'll try to do this in the evening, but I may fail, so I'm leaving this note here), but I think this will be wrong. We want to have an evictor (i.e. rate-limited tainting) only for the Unreachable/NotReady taints. The rest of the taints should not need rate limiting, as they're NoSchedule ones. The only thing that we should protect against is taint flapping (i.e. adding and removing taints in quick succession).
As I said, it's possible that this is how it's done, but I didn't have time to read it carefully enough this time.
Yup, I stand by what I wrote. We don't need rate limiting for those taints.
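The approach suggested in this thread, applying NoSchedule taints directly from node conditions without a rate-limited queue, could look roughly like the sketch below. It is an assumption-laden illustration: the `Condition` type, the taint keys in `conditionToTaintKey`, and both functions are hypothetical and are not the names or keys used in this PR. The reconcile step is idempotent, so repeated syncs with unchanged conditions produce no taint changes; it does not by itself protect against conditions that genuinely flip back and forth.

```go
package main

import (
	"fmt"
	"sort"
)

// Condition is a simplified, hypothetical stand-in for a node condition.
type Condition struct {
	Type   string
	Status string // "True", "False", or "Unknown"
}

// conditionToTaintKey maps condition types to taint keys.
// The keys are made up for this example.
var conditionToTaintKey = map[string]string{
	"MemoryPressure": "example.io/memory-pressure",
	"OutOfDisk":      "example.io/out-of-disk",
}

// desiredTaintKeys returns the NoSchedule taint keys that the given
// conditions call for: one per mapped condition whose status is "True".
func desiredTaintKeys(conds []Condition) map[string]bool {
	desired := map[string]bool{}
	for _, c := range conds {
		if key, ok := conditionToTaintKey[c.Type]; ok && c.Status == "True" {
			desired[key] = true
		}
	}
	return desired
}

// diffTaintKeys compares the keys currently on a node with the desired set
// and returns what to add and what to remove, both sorted. Keys that are not
// managed by conditionToTaintKey are never removed.
func diffTaintKeys(current []string, desired map[string]bool) (toAdd, toRemove []string) {
	managed := map[string]bool{}
	for _, k := range conditionToTaintKey {
		managed[k] = true
	}
	have := map[string]bool{}
	for _, k := range current {
		if have[k] {
			continue
		}
		have[k] = true
		if managed[k] && !desired[k] {
			toRemove = append(toRemove, k)
		}
	}
	for k := range desired {
		if !have[k] {
			toAdd = append(toAdd, k)
		}
	}
	sort.Strings(toAdd)
	sort.Strings(toRemove)
	return toAdd, toRemove
}

func main() {
	conds := []Condition{
		{Type: "MemoryPressure", Status: "True"},
		{Type: "OutOfDisk", Status: "False"},
	}
	current := []string{"example.io/out-of-disk", "other/key"}
	toAdd, toRemove := diffTaintKeys(current, desiredTaintKeys(conds))
	fmt.Println("add:", toAdd)
	fmt.Println("remove:", toRemove)
}
```

Computing a diff rather than blindly rewriting the taint list keeps unrelated taints intact and makes a no-op sync cheap.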
/unassign
I will take a look this week. May not be able to do it for a few days.
ping :).
/approve I will let @davidopp LGTM (it looks good to me).
Just noting that cc @thockin
Cool, but I think you may still need to update the PR description.
We'll also need to align the design doc.
The description was updated :). @davidopp, would you help review this PR?
Ping @davidopp
I spoke with David offline. Applying label.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: gmarek, k82cn, wojtek-t
Associated issue: 42001
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS Files:
You can indicate your approval by writing
/retest
Automatic merge from submit-queue (batch tested with PRs 51574, 51534, 49257, 44680, 48836)
What this PR does / why we need it:
Taint nodes by condition for MemoryPressure, OutOfDisk, and so on.
Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): part of #42001
Release note: