[test failed] [1.10 upgrade] Cadvisor should be healthy on every node #60768

Closed
krzyzacy opened this issue Mar 5, 2018 · 12 comments
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. milestone/needs-attention priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation.
Milestone

Comments

@krzyzacy
Member

krzyzacy commented Mar 5, 2018

the tests are failing in http://k8s-testgrid.appspot.com/sig-release-master-upgrade#gce-1.9-master-upgrade-master

/sig sig-instrumentation
/priority failing-test
/priority critical-urgent
/kind bug
/status approved-for-milestone

cc @jdumars @jberkus
/assign @piosz @fabxc
also cc @dashpole for cadvisor I guess

@k8s-ci-robot k8s-ci-robot added status/approved-for-milestone kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. kind/bug Categorizes issue or PR as related to a bug. labels Mar 5, 2018
@krzyzacy
Member Author

krzyzacy commented Mar 5, 2018

/sig instrumentation

@k8s-ci-robot k8s-ci-robot added the sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. label Mar 5, 2018
@krzyzacy
Member Author

krzyzacy commented Mar 5, 2018

xref #60764

@tpepper
Member

tpepper commented Mar 6, 2018

@kubernetes/sig-release-admins looks like this needs /milestone'd by somebody with priv's

@krzyzacy krzyzacy added this to the v1.10 milestone Mar 6, 2018
@krzyzacy
Member Author

krzyzacy commented Mar 6, 2018

whoops, thanks for reminding me

@brancz
Member

brancz commented Mar 8, 2018

I dug into this a bit and I think I figured out what's happening.

The test logs for each of the 6 retries of the cAdvisor "ping" on each node say:

I0306 16:16:11.292] Mar  6 16:16:11.292: INFO: failed to retrieve kubelet stats -
I0306 16:16:11.293]  [the server could not find the requested resource the server could not find the requested resource the server could not find the requested resource the server could not find the requested resource]
I0306 16:16:21.292] STEP: Querying stats from node bootstrap-e2e-master using url api/v1/proxy/nodes/bootstrap-e2e-master/stats/
I0306 16:16:21.295] STEP: Querying stats from node bootstrap-e2e-minion-group-dspf using url api/v1/proxy/nodes/bootstrap-e2e-minion-group-dspf/stats/
I0306 16:16:21.296] STEP: Querying stats from node bootstrap-e2e-minion-group-p1c4 using url api/v1/proxy/nodes/bootstrap-e2e-minion-group-p1c4/stats/
I0306 16:16:21.298] STEP: Querying stats from node bootstrap-e2e-minion-group-vw6b using url api/v1/proxy/nodes/bootstrap-e2e-minion-group-vw6b/stats/
I0306 16:16:21.299] Mar  6 16:16:21.298: INFO: failed to retrieve kubelet stats -
I0306 16:16:21.299]  [the server could not find the requested resource the server could not find the requested resource the server could not find the requested resource the server could not find the requested resource]

At first this seemed curious, but it is actually the correct behavior for a 1.10 cluster: the /api/v1/proxy API was removed in 1.10. Because this is an upgrade test, however, the tests are run (as far as I understand) from the latest 1.9.x release, where the test still uses the old API path.

I wasn't sure how to proceed, so I talked to @sttts. He told me that we need to get the patch that uses the "new"/non-deprecated API into a 1.9.x release and cut that release; then the upgrade tests should pass. He also mentioned that you, @mbohlool, are the release lead for 1.9, so I'm tagging you to make you aware of this. I will open a PR against the 1.9 release branch to adapt the test to use the "new"/non-deprecated API.
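
To make the path change concrete, here is a minimal Go sketch that builds both URL forms for a node. It assumes the "new"/non-deprecated API is the node `proxy` subresource (`api/v1/nodes/<node>/proxy/...`); the helper names are hypothetical and the code is not taken from the e2e test or from the fix PR.

```go
package main

import (
	"fmt"
	"net/url"
)

// legacyNodeProxyPath builds the path form seen in the failing logs above.
// A 1.10 API server answers it with "could not find the requested resource".
// (Hypothetical helper name, for illustration only.)
func legacyNodeProxyPath(node, sub string) string {
	return fmt.Sprintf("api/v1/proxy/nodes/%s/%s", url.PathEscape(node), sub)
}

// nodeProxySubresourcePath builds the same request against the node "proxy"
// subresource. Assumption: this is the replacement the test should use.
func nodeProxySubresourcePath(node, sub string) string {
	return fmt.Sprintf("api/v1/nodes/%s/proxy/%s", url.PathEscape(node), sub)
}

func main() {
	// Node names copied from the log excerpt above.
	nodes := []string{"bootstrap-e2e-master", "bootstrap-e2e-minion-group-dspf"}
	for _, node := range nodes {
		fmt.Println("removed in 1.10:", legacyNodeProxyPath(node, "stats/"))
		fmt.Println("subresource:    ", nodeProxySubresourcePath(node, "stats/"))
	}
}
```

Only the position of the `proxy` segment differs between the two forms, which would explain why the same test passes against 1.9 but gets a "not found" from an upgraded 1.10 master.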

@brancz
Member

brancz commented Mar 8, 2018

/status in-progress

@k8s-ci-robot
Contributor

You must be a member of the kubernetes/kubernetes-milestone-maintainers github team to add status labels.

@brancz
Member

brancz commented Mar 8, 2018

Opened #60921. I'm not sure whether I followed the procedure correctly, so please let me know if I did something wrong.

@DirectXMan12
Contributor

/status in-progress

helping out since @brancz is a SIG lead ;-)

@liggitt
Member

liggitt commented Mar 13, 2018

in progress, 1.9 PR open at #60921

@k8s-github-robot

[MILESTONENOTIFIER] Milestone Issue Needs Attention

@fabxc @krzyzacy @piosz @kubernetes/sig-instrumentation-misc

Action Required: This issue has not been updated since Mar 13. Please provide an update.

Note: This issue is marked as priority/critical-urgent, and must be updated every 1 day during code freeze.

Example update:

ACK.  In progress
ETA: DD/MM/YYYY
Risks: Complicated fix required
Issue Labels
  • sig/instrumentation: Issue will be escalated to these SIGs if needed.
  • priority/critical-urgent: Never automatically move issue out of a release milestone; continually escalate to contributor and SIG through all available channels.
  • kind/bug: Fixes a bug discovered during the current release.
Help

k8s-github-robot pushed a commit that referenced this issue Mar 15, 2018
Automatic merge from submit-queue.

e2e/monitoring: Use non-deprecated proxy API

**What this PR does / why we need it**:

In Kubernetes 1.10 this API is removed, which causes upgrade tests to
fail, because the test still uses the API that was deprecated and is now removed.


**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:

Fixes #60768

**Special notes for your reviewer**:

I'm not sure whether the upgrade tests are run from HEAD of the 1.9 branch or from the latest release. If they run from the latest release, we need a 1.9.x patch release for the upgrade test to be fixed.

cc @mbohlool (this is my first time doing this so I may have done something completely wrong, please tell me if I did do so 🙂)
@liggitt
Member

liggitt commented Mar 15, 2018

fixed by #60921

@liggitt liggitt closed this as completed Mar 15, 2018

9 participants