Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add node problem detector as an addon pod. #25986

Merged

Conversation

Random-Liu
Copy link
Member

@Random-Liu Random-Liu commented May 20, 2016

Introduce a new add-on pod NodeProblemDetector.

NodeProblemDetector is a DaemonSet running on each node, monitoring node health and reporting
node problems as NodeCondition and Event. Currently it already supports kernel log monitoring, and
will support more problem detection in the future. It is enabled by default on gce now.

This PR enables NodeProblemDetector as an add-on pod.

/cc @mikedanese @kubernetes/sig-node

Analytics

@Random-Liu Random-Liu added area/reliability sig/node Categorizes an issue or PR as relevant to SIG Node. labels May 20, 2016
@Random-Liu Random-Liu added this to the v1.3 milestone May 20, 2016
@Random-Liu Random-Liu force-pushed the enable-node-problem-detector branch from b819301 to 1b0c70c Compare May 20, 2016 20:52
@Random-Liu Random-Liu force-pushed the enable-node-problem-detector branch from 1b0c70c to 195e249 Compare May 20, 2016 20:54
@k8s-github-robot k8s-github-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. release-note-label-needed labels May 20, 2016
@Random-Liu Random-Liu added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-label-needed labels May 20, 2016
@dchen1107
Copy link
Member

LGTM overall. But have you run feasibility check related test? I am wondering this newly introduced daemonset will break those tests.

@Random-Liu
Copy link
Member Author

I run all SchedulerPredicates test just now, and get the following result:

Summarizing 6 Failures:

[Fail] [k8s.io] SchedulerPredicates [Serial] [It] validates that InterPodAffinity is respected if matching [Feature:PodAffinity] 
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/scheduler_predicates.go:929

[Fail] [k8s.io] SchedulerPredicates [Serial] [It] validates that InterPodAffinity is respected if matching with multiple Affinities [Feature:PodAffinity] 
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/scheduler_predicates.go:1117

[Fail] [k8s.io] SchedulerPredicates [Serial] [It] validates that embedding the JSON PodAffinity and PodAntiAffinity setting as a string in the annotation value work 
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/scheduler_predicates.go:1282

[Fail] [k8s.io] SchedulerPredicates [Serial] [It] validates that Inter-pod-Affinity is respected if not matching [Feature:PodAffinity] 
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/scheduler_predicates.go:114

[Fail] [k8s.io] SchedulerPredicates [Serial] [It] validates that InterPod Affinity and AntiAffinity is respected if matching [Feature:PodAffinity] 
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/scheduler_predicates.go:1220

[Fail] [k8s.io] SchedulerPredicates [Serial] [It] validates that InterPodAntiAffinity is respected if matching 2 [Feature:PodAffinity] 
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/scheduler_predicates.go:114

Ran 17 of 298 Specs in 424.268 seconds
FAIL! -- 11 Passed | 6 Failed | 0 Pending | 281 Skipped --- FAIL: TestE2E (424.67s)

Looks like the [Feature:PodAffinity] ones are skipped by default. I'll look into this one:

validates that embedding the JSON PodAffinity and PodAntiAffinity setting as a string in the annotation value work

@Random-Liu
Copy link
Member Author

Random-Liu commented May 21, 2016

I found the reason why the test failed. It is unrelated with this PR.
These 2 tests:

validates that embedding the JSON PodAffinity and PodAntiAffinity setting as a string in the annotation value work

and

validates that embedding the JSON NodeAffinity setting as a string in the annotation value work

They apply node labels kubernetes.io/e2e-az-name and e2e.inter-pod-affinity.kubernetes.io/zone to a specific node and expect the pod to be scheduled on that node. After the test, they cleaned up the labels.

So if I run the test for more than once, there will be more than one nodes beening labelled. Then the pod may be scheduled on any of these labelled nodes, which causes the test failure.

We should:

  • Cleanup the label after the test if possible.
  • Or randomize the label key just like the other affinity test if possible.

@dchen1107
/cc @davidopp

@dchen1107 dchen1107 added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 21, 2016
@dchen1107
Copy link
Member

LGTM

@dchen1107
Copy link
Member

cc/ @jlowdermilk @roberthbailey

@k8s-github-robot
Copy link

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot
Copy link

k8s-bot commented May 22, 2016

GCE e2e build/test passed for commit 195e249.

@k8s-github-robot
Copy link

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit fe15db6 into kubernetes:master May 22, 2016
@Random-Liu Random-Liu deleted the enable-node-problem-detector branch May 22, 2016 19:53
@roberthbailey
Copy link
Contributor

/cc @andyzheng0831 @wonderfly we need this configured on gci masters as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/introspection area/reliability lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants