Kubelet: Fail kubelet if cadvisor is not started. #29581

Merged

@Random-Liu Random-Liu commented Jul 25, 2016

Fixes #28997.

We start cadvisor inside sync.Once.Do(), which runs only once regardless of whether cadvisor starts successfully.

Once it fails, the kubelet is stuck in a bad state: it can never start the sync loop because of the internal error, and it never retries starting cadvisor.
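
The failure mode can be reproduced outside the kubelet. The following is a minimal sketch, not the kubelet's actual code: `module`, `start`, and `initialize` are hypothetical names, and the transient error is simulated. It shows that `sync.Once` does not call the function again after a failed first attempt.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// module is a hypothetical stand-in for a component that is started
// lazily through sync.Once, the way cadvisor is described above.
type module struct {
	once     sync.Once
	attempts int
	started  bool
}

// start simulates a transient failure: the first attempt fails,
// any later attempt would succeed.
func (m *module) start() error {
	m.attempts++
	if m.attempts == 1 {
		return errors.New("transient error")
	}
	m.started = true
	return nil
}

// initialize is safe to call repeatedly, but sync.Once guarantees the
// function body runs only on the first call, even if start() failed.
func (m *module) initialize() {
	m.once.Do(func() {
		if err := m.start(); err != nil {
			fmt.Println("start failed:", err)
		}
	})
}

func main() {
	m := &module{}
	// Simulate a periodic loop that keeps calling initialize.
	for i := 0; i < 3; i++ {
		m.initialize()
	}
	// The second attempt, which would have succeeded, never happens.
	fmt.Printf("attempts=%d started=%v\n", m.attempts, m.started)
}
```

Because the retry never happens, the only ways out are to stop using `sync.Once` for a fallible start or to treat the failure as fatal.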

This PR simply fails the kubelet when cadvisor fails to start, and relies on the babysitter to restart it.
In the future, we may want to add backoff logic to the babysitter to protect the system.

Separately, #29492 will fix the cadvisor side so that cadvisor no longer fails on this kind of transient error.

Marked P1 to match the original issue.

@dchen1107 @vishh

@Random-Liu added the kind/bug, area/kubelet, sig/node, and cherrypick-candidate labels Jul 25, 2016
@Random-Liu added this to the v1.3 milestone Jul 25, 2016
@Random-Liu added the priority/important-soon label Jul 25, 2016
@Random-Liu force-pushed the panic-if-cadvisor-not-started branch from 18043ad to e7908c0 July 25, 2016 23:29
@k8s-github-robot added the size/XS and release-note-label-needed labels Jul 25, 2016
@Random-Liu added the release-note-none label and removed the release-note-label-needed label Jul 25, 2016
@Random-Liu changed the title from "Kubelet: Fail kubelet is cadvisor is not started." to "Kubelet: Fail kubelet if cadvisor is not started." Jul 25, 2016
@Random-Liu force-pushed the panic-if-cadvisor-not-started branch from e7908c0 to 973f2fc July 25, 2016 23:46
@dchen1107

So you are crashing Kubelet instead of failing /healthz?

@dchen1107

LGTM

@dchen1107 added the lgtm label Jul 26, 2016
@k8s-bot

k8s-bot commented Jul 26, 2016

GCE e2e build/test passed for commit 973f2fc.

@k8s-github-robot

Automatic merge from submit-queue

@k8s-github-robot merged commit ed3a29b into kubernetes:master Jul 26, 2016
@Random-Liu deleted the panic-if-cadvisor-not-started branch July 26, 2016 01:46
fabioy added a commit that referenced this pull request Jul 26, 2016
…9581-upstream-release-1.3

Automated cherry pick of #29581
@k8s-cherrypick-bot

Commit found in the "release-1.3" branch appears to be this PR. Removing the "cherrypick-candidate" label. If this is an error find help to get your PR picked.

shyamjvs pushed a commit to shyamjvs/kubernetes that referenced this pull request Dec 1, 2016
…pick-of-#29581-upstream-release-1.3

Automated cherry pick of kubernetes#29581
Successfully merging this pull request may close these issues.

Kubelet doesn't start master components