-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add kubelet awareness to taint tolerant match caculator. #26501
Conversation
Thanks for sending. I will take a look later this week. |
ping @davidopp any progress on this? |
Sorry, I will take a look as soon as I can. |
@k8s-bot test this issue: #IGNORE |
@@ -2483,6 +2489,13 @@ func (kl *Kubelet) canAdmitPod(pods []*api.Pod, pod *api.Pod) (bool, string, str | |||
glog.V(2).Infof("Predicate failed on Pod: %v, for reason: %v", format.Pod(pod), message) | |||
return fit, reason, message | |||
} | |||
if re, ok := err.(*predicates.ErrTaintsTolerationsNotMatch); ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about this check. Correct me if I'm wrong but for now GeneralPredicates does not check Taints. There is a separate function for it predicates.PodToleratesNodeTaints
so this error will not appear here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, I need to call TaintToleratedByTolerations
here separately, will fix this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the meantime I created PR which adds Taint to GeneralPredicates #29116
5a17beb
to
e1609ac
Compare
ping @davidopp any bandwidth on this? |
Sorry, not yet. Maybe toward the end of the upcoming week. |
cc/ @kubernetes/sig-node |
} | ||
} | ||
|
||
reason := "UnexpectedPredicateFailureType" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should change this to
reason = "PodToleratesNodeTaints"
message = re.Error()
You need some kind of e2e or integration test that verifies that Kubelet does not admit a pod when the node has an untolerated taint of type NoScheduleNoAdmit. The pod needs to be sent directly to the kubelet, and not go through the scheduler (because the scheduler would never send such a pod to such a node). In other words, create a pod with NodeName already set to a node that has a taint of type NoScheduleNoAdmit, and verify that Kubelet does not admit the pod. |
Jenkins GCE e2e failed for commit 03960a945a1e25e284a64d952988af9c50cf3d88. The magic incantation to run this job again is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comments.
if len(reasons) == 0 { | ||
message = fmt.Sprint("PodToleratesNodeTaints failed due to unknown reason, which is unexpected.") | ||
glog.Warningf("Failed to admit pod %v - %s", format.Pod(pod), message) | ||
return PodAdmitResult{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without me popping through the node code, are the events log'd higher up the call chain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see event will higher it up, more clues?
Message: message, | ||
} | ||
} | ||
r := reasons[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If multiple reasons are returned, why only output [0]?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The caller: canAdmitPod
returns bool value with one reason, so it's enough to use the first reason.
glog.Warningf("Failed to admit pod %v - %s", format.Pod(pod), message) | ||
} | ||
|
||
return PodAdmitResult{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment re: event.
By("Waiting for static pod rejected event") | ||
eventFound := false | ||
EventsLoop: | ||
for start := time.Now(); time.Since(start) < 4*time.Minute; time.Sleep(2 * time.Second) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious why your list looping on events vs. watching the pod in question or events for that namespace? In general lists should be avoided even in the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There exists a reflector / informer in the framework code iirc.
Use event watch instead of list based on @timothysc 's comments |
if e.InvolvedObject.Kind == "Pod" && e.Reason == "PodToleratesNodeTaints" && strings.Contains(e.Message, | ||
"Taint Toleration unmatched with SomeUntoleratedTaintIsNoAdmit is: true") { | ||
By("PodToleratesNodeTaints event found") | ||
eventFound = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update.
lgtm... @davidopp I have no more comments, did you have anything you wanted to add? /cc @aveshagarwal |
I'm sure I have signed foundation cla btw, may be a re-push is needed :/ |
k, I'm going to timeout on this and lgtm. |
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge] |
I'm removing the LGTM for now because I'd really like to review this (or at least have @kevin-wangzefeng review it) before it goes in. I'll try to take a look at it soon. |
Automatic merge from submit-queue |
Ugh, look like I was too late. |
@davidopp I ping'd 4+ days ago for additional comments. |
Yes, sorry, I'm really really backed up right now. I suspect the node folks are too :( |
Automatic merge from submit-queue Revert "Add kubelet awareness to taint tolerant match caculator." Reverts #26501 Original PR was not fully reviewed by @kubernetes/sig-node cc/ @timothysc @resouer
@davidopp So I need to open a new PR by using this branch, right? |
Yes, send out a new PR and we will review it as soon as we can. Thanks and sorry for the trouble. |
Add kubelet awareness to taint tolerant match caculator.
Ref: #25320
This is required by
TaintEffectNoScheduleNoAdmit
&TaintEffectNoScheduleNoAdmitNoExecute
, so that node will know if it should expect the taint&tolerantThis change is