Update cadvisor godeps to v0.28.3 #56967

Merged: 1 commit, Dec 12, 2017

Conversation

jsravn
Contributor

@jsravn jsravn commented Dec 8, 2017

What this PR does / why we need it:
Adds timeout around docker queries, to prevent wedging kubelet on node status updates if docker is non-responsive.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #53207

Special notes for your reviewer:
Kubelet's node status update queries cadvisor, which had no timeout on its underlying docker queries. As a result, if docker was non-responsive, kubelet could not recover without a restart.
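
As a rough illustration of the idea (not the actual cadvisor change), a docker query can be bounded with a context deadline in Go. The names `queryWithTimeout`, `queryFunc`, `errQueryTimeout`, `hungQuery`, and `healthyQuery` are hypothetical and exist only for this sketch:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errQueryTimeout is a hypothetical sentinel error for this sketch.
var errQueryTimeout = errors.New("docker query timed out")

// queryFunc stands in for any call to the docker daemon (hypothetical type).
type queryFunc func(ctx context.Context) (string, error)

// queryWithTimeout runs q but gives up once timeout has elapsed, so a
// non-responsive daemon cannot block the caller indefinitely.
func queryWithTimeout(timeout time.Duration, q queryFunc) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	type result struct {
		val string
		err error
	}
	// Buffered so the goroutine can always send and exit, even if the
	// caller has already returned because of the timeout.
	done := make(chan result, 1)
	go func() {
		v, err := q(ctx)
		done <- result{v, err}
	}()

	select {
	case r := <-done:
		// A query that honors ctx may itself report the deadline;
		// normalize that to the same sentinel error.
		if errors.Is(r.err, context.DeadlineExceeded) {
			return "", errQueryTimeout
		}
		return r.val, r.err
	case <-ctx.Done():
		return "", errQueryTimeout
	}
}

// hungQuery simulates a non-responsive daemon: it only returns once ctx ends.
func hungQuery(ctx context.Context) (string, error) {
	<-ctx.Done()
	return "", ctx.Err()
}

// healthyQuery simulates a daemon that answers immediately.
func healthyQuery(ctx context.Context) (string, error) {
	return "ok", nil
}

func main() {
	_, err := queryWithTimeout(50*time.Millisecond, hungQuery)
	fmt.Println("hung daemon:", err)

	v, err := queryWithTimeout(time.Second, healthyQuery)
	fmt.Println("healthy daemon:", v, err)
}
```

With a bound like this, a hung daemon surfaces as an error the caller can handle on the next status update, rather than blocking the update loop forever.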

Release note:

Timeout docker queries to prevent node going NotReady indefinitely.

@k8s-ci-robot added labels Dec 8, 2017:
  • release-note: Denotes a PR that will be considered when it comes time to generate release notes.
  • size/XS: Denotes a PR that changes 0-9 lines, ignoring generated files.
  • cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
  • needs-ok-to-test: Indicates a PR that requires an org member to verify it is safe to test.
@dashpole
Contributor

dashpole commented Dec 8, 2017

/ok-to-test

@k8s-ci-robot removed the needs-ok-to-test label (indicates a PR that requires an org member to verify it is safe to test) Dec 8, 2017
@dashpole
Contributor

dashpole commented Dec 8, 2017

This godep update contains three changes in total:

  1. This change to prevent the kubelet from becoming NotReady when docker is hung
  2. An adjustment of logging verbosity to lower spam
  3. A prometheus metrics bug fix

All three changes are low risk by my assessment.

/assign @dchen1107 @yujuhong
/unassign @lavalamp @brendandburns

@dashpole
Contributor

dashpole commented Dec 8, 2017

/sig node

@k8s-ci-robot added the sig/node label (categorizes an issue or PR as relevant to SIG Node) Dec 8, 2017
@dashpole
Contributor

dashpole commented Dec 8, 2017

/kind bug

@k8s-ci-robot added the kind/bug label (categorizes issue or PR as related to a bug) Dec 8, 2017
@dashpole
Contributor

dashpole commented Dec 8, 2017

/priority critical-urgent

@k8s-ci-robot added the priority/critical-urgent label (highest priority; must be actively worked on as someone's top priority right now) Dec 8, 2017
@yujuhong
Contributor

yujuhong commented Dec 8, 2017

/lgtm

> This change to prevent the kubelet from becoming NotReady when docker is hung

This is a reliability fix and the main reason for bumping cadvisor version right now.

@k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) Dec 8, 2017
@yujuhong added this to the v1.9 milestone Dec 8, 2017
@jberkus

jberkus commented Dec 11, 2017

We need to get this PR approved and merged today. It appears to relate to a serious issue, but we also need to close the 1.9 release train. Can a SIG lead look at this please?

@dchen1107
Member

/lgtm

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dchen1107, jsravn, yujuhong

Associated issue: #53207

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot

[MILESTONENOTIFIER] Milestone Pull Request Current

@dchen1107 @jsravn @yujuhong

Note: This pull request is marked as priority/critical-urgent, and must be updated every 1 day during code freeze.

Example update:

ACK.  In progress
ETA: DD/MM/YYYY
Risks: Complicated fix required
Pull Request Labels
  • sig/node: Pull Request will be escalated to these SIGs if needed.
  • priority/critical-urgent: Never automatically move pull request out of a release milestone; continually escalate to contributor and SIG through all available channels.
  • kind/bug: Fixes a bug discovered during the current release.

@k8s-github-robot

Automatic merge from submit-queue (batch tested with PRs 56599, 56824, 56918, 56967, 56959). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot merged commit 83fdf49 into kubernetes:master Dec 12, 2017
@jsravn
Contributor Author

jsravn commented Dec 13, 2017

Is this cherry-pickable to 1.8? I started trying to do it, but the vendoring differences are relatively significant (cadvisor 0.27.3 -> 0.28.3, which brings in the dockerd -> containerd change). I'm also finding the vendoring process really painstaking overall, so I don't want to spend more time on it unless someone gives the okay for a cherry-pick. Thanks!

jsravn pushed a commit to jsravn/kubernetes that referenced this pull request Jan 10, 2018
Update cadvisor dependency to v0.27.4.

Fix kubernetes#53207.
k8s-github-robot pushed a commit that referenced this pull request Mar 31, 2018
…-upstream-release-1.8

Automatic merge from submit-queue.

Cherry pick of #56967 to release-1.8

**What this PR does / why we need it**:
Adds timeout around docker queries, to prevent wedging kubelet on node status updates if docker is non-responsive.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #53207
Cherry picks #56967

**Special notes for your reviewer**:
Kubelet's node status update queries cadvisor, which had no timeout on its underlying docker queries. As a result, if docker was non-responsive, kubelet could not recover without a restart.

**Release note**:

```release-note
Timeout docker queries to prevent node going NotReady indefinitely.
```
Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
  • kind/bug: Categorizes issue or PR as related to a bug.
  • lgtm: "Looks good to me", indicates that a PR is ready to be merged.
  • priority/critical-urgent: Highest priority. Must be actively worked on as someone's top priority right now.
  • release-note: Denotes a PR that will be considered when it comes time to generate release notes.
  • sig/node: Categorizes an issue or PR as relevant to SIG Node.
  • size/XS: Denotes a PR that changes 0-9 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

kubelet status updates wedging after failed CRI call
10 participants