Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add kubelet awareness to taint tolerant match caculator. #26501

Merged
merged 3 commits into from
Oct 7, 2016

Conversation

resouer
Copy link
Contributor

@resouer resouer commented May 29, 2016

Add kubelet awareness to taint tolerant match caculator.

Ref: #25320

This is required by TaintEffectNoScheduleNoAdmit & TaintEffectNoScheduleNoAdmitNoExecute, so that node will know if it should expect the taint&tolerant


This change is Reviewable

@k8s-github-robot k8s-github-robot added kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note-label-needed labels May 29, 2016
@davidopp
Copy link
Member

davidopp commented Jun 1, 2016

Thanks for sending. I will take a look later this week.

@resouer
Copy link
Contributor Author

resouer commented Jun 13, 2016

ping @davidopp any progress on this?

@davidopp
Copy link
Member

Sorry, I will take a look as soon as I can.

@resouer
Copy link
Contributor Author

resouer commented Jun 22, 2016

@k8s-bot test this issue: #IGNORE

@kevin-wangzefeng
Copy link
Member

/cc @kevin-wangzefeng

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 10, 2016
@@ -2483,6 +2489,13 @@ func (kl *Kubelet) canAdmitPod(pods []*api.Pod, pod *api.Pod) (bool, string, str
glog.V(2).Infof("Predicate failed on Pod: %v, for reason: %v", format.Pod(pod), message)
return fit, reason, message
}
if re, ok := err.(*predicates.ErrTaintsTolerationsNotMatch); ok {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this check. Correct me if I'm wrong but for now GeneralPredicates does not check Taints. There is a separate function for it predicates.PodToleratesNodeTaints so this error will not appear here.

Copy link
Contributor Author

@resouer resouer Jul 21, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I need to call TaintToleratedByTolerations here separately, will fix this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the meantime I created PR which adds Taint to GeneralPredicates #29116

@resouer resouer force-pushed the scheduler branch 2 times, most recently from 5a17beb to e1609ac Compare July 22, 2016 09:15
@resouer
Copy link
Contributor Author

resouer commented Jul 22, 2016

[Updated] Updated the commit. I am hesitate to add PodToleratesNodeTaints in GeneralPredicates, so choose to call it in kubelet directly cc @davidopp @lukaszo

@k8s-github-robot k8s-github-robot added do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. retest-not-required-docs-only and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. retest-not-required-docs-only labels Jul 22, 2016
@resouer
Copy link
Contributor Author

resouer commented Jul 30, 2016

ping @davidopp any bandwidth on this?

@davidopp
Copy link
Member

Sorry, not yet. Maybe toward the end of the upcoming week.

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 2, 2016
@davidopp
Copy link
Member

davidopp commented Aug 7, 2016

cc/ @kubernetes/sig-node

@davidopp
Copy link
Member

davidopp commented Aug 7, 2016

cc/ @kevin-wangzefeng

}
}

reason := "UnexpectedPredicateFailureType"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should change this to

reason = "PodToleratesNodeTaints"
message = re.Error()

@davidopp
Copy link
Member

davidopp commented Aug 7, 2016

You need some kind of e2e or integration test that verifies that Kubelet does not admit a pod when the node has an untolerated taint of type NoScheduleNoAdmit. The pod needs to be sent directly to the kubelet, and not go through the scheduler (because the scheduler would never send such a pod to such a node). In other words, create a pod with NodeName already set to a node that has a taint of type NoScheduleNoAdmit, and verify that Kubelet does not admit the pod.

@davidopp davidopp added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Aug 7, 2016
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 22, 2016
@k8s-ci-robot
Copy link
Contributor

Jenkins GCE e2e failed for commit 03960a945a1e25e284a64d952988af9c50cf3d88.

The magic incantation to run this job again is @k8s-bot gce e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@resouer
Copy link
Contributor Author

resouer commented Sep 22, 2016

@davidopp This PR is ready for final review cc @yujuhong

Copy link
Member

@timothysc timothysc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comments.

if len(reasons) == 0 {
message = fmt.Sprint("PodToleratesNodeTaints failed due to unknown reason, which is unexpected.")
glog.Warningf("Failed to admit pod %v - %s", format.Pod(pod), message)
return PodAdmitResult{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without me popping through the node code, are the events log'd higher up the call chain?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see event will higher it up, more clues?

Message: message,
}
}
r := reasons[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If multiple reasons are returned, why only output [0]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The caller: canAdmitPod returns bool value with one reason, so it's enough to use the first reason.

glog.Warningf("Failed to admit pod %v - %s", format.Pod(pod), message)
}

return PodAdmitResult{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment re: event.

By("Waiting for static pod rejected event")
eventFound := false
EventsLoop:
for start := time.Now(); time.Since(start) < 4*time.Minute; time.Sleep(2 * time.Second) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious why your list looping on events vs. watching the pod in question or events for that namespace? In general lists should be avoided even in the tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There exists a reflector / informer in the framework code iirc.

@timothysc timothysc added this to the v1.5 milestone Sep 26, 2016
@timothysc timothysc self-assigned this Sep 26, 2016
@resouer
Copy link
Contributor Author

resouer commented Oct 1, 2016

Use event watch instead of list based on @timothysc 's comments

if e.InvolvedObject.Kind == "Pod" && e.Reason == "PodToleratesNodeTaints" && strings.Contains(e.Message,
"Taint Toleration unmatched with SomeUntoleratedTaintIsNoAdmit is: true") {
By("PodToleratesNodeTaints event found")
eventFound = true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update.

@timothysc
Copy link
Member

lgtm... @davidopp I have no more comments, did you have anything you wanted to add?

/cc @aveshagarwal

@resouer
Copy link
Contributor Author

resouer commented Oct 7, 2016

I'm sure I have signed foundation cla btw, may be a re-push is needed :/

@timothysc
Copy link
Member

k, I'm going to timeout on this and lgtm.

@timothysc timothysc added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 7, 2016
@k8s-github-robot
Copy link

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@davidopp davidopp removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 7, 2016
@davidopp
Copy link
Member

davidopp commented Oct 7, 2016

I'm removing the LGTM for now because I'd really like to review this (or at least have @kevin-wangzefeng review it) before it goes in. I'll try to take a look at it soon.

@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit 21188ca into kubernetes:master Oct 7, 2016
@davidopp
Copy link
Member

davidopp commented Oct 7, 2016

Ugh, look like I was too late.
I'm going to send a rollback PR, because it was not reviewed by anyone from node team (I see just one comment from @yujuhong but not a review of the changes to Kubelet).

@timothysc
Copy link
Member

@davidopp I ping'd 4+ days ago for additional comments.

@davidopp
Copy link
Member

davidopp commented Oct 7, 2016

Yes, sorry, I'm really really backed up right now. I suspect the node folks are too :(

k8s-github-robot pushed a commit that referenced this pull request Oct 7, 2016
Automatic merge from submit-queue

Revert "Add kubelet awareness to taint tolerant match caculator."

Reverts #26501

Original PR was not fully reviewed by @kubernetes/sig-node 

cc/ @timothysc @resouer
@resouer
Copy link
Contributor Author

resouer commented Oct 8, 2016

@davidopp So I need to open a new PR by using this branch, right?

@davidopp
Copy link
Member

davidopp commented Oct 8, 2016

Yes, send out a new PR and we will review it as soon as we can. Thanks and sorry for the trouble.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.