TestUnschedulableNodes: Failed to observe reflected update for setting unschedulable=true #25845

Closed

lavalamp opened this issue May 19, 2016 · 26 comments

Assignees

Labels

kind/flake priority/important-soon

Milestone

Member

lavalamp commented May 19, 2016

https://console.cloud.google.com/storage/kubernetes-jenkins/logs/kubernetes-test-go/12266/

--- FAIL: TestUnschedulableNodes (20.07s)
    scheduler_test.go:166: Failed to observe reflected update for setting unschedulable=true: timed out waiting for the condition

lavalamp added priority/important-soon team/control-plane kind/flake labels

freehan mentioned this issue

Kubenet host-port support through iptables #25604

Merged

fejta assigned davidopp

Contributor

fejta commented May 23, 2016

Assigning to @davidopp per team/control-plane label

wojtek-t mentioned this issue

Some fixes to tests to support large clusters #26517

Merged

saad-ali mentioned this issue

Enable Attach/Detach Controller #26351

Merged

mml assigned mml and unassigned davidopp

Contributor

mml commented Jun 3, 2016

Not sure if this is still happening. Will take a look on Monday.

dims mentioned this issue

TestUnschedulableNodes is flaky #12312

Closed

Contributor

mml commented Jun 6, 2016

This error message comes from a place where we retry for an arbitrary period of time (20s) before giving up. If the failures are rare, this looks like the standard problem of trying to pick a timeout that fails quickly enough but doesn't flake all the time.

I have tried cranking the timeout down to 50ms instead to see if I can reproduce (no luck yet), but any transient failure around this timeout is basically by design. The "problem" could be that other threads or just etcd haven't gotten enough resources to finish in 20 wall seconds.

It might be interesting if we could get a thread dump when this fails. And maybe turn on etcd debug logging.

Contributor

mml commented Jun 6, 2016

For sure, running etcd with --debug and sending stdout to a file we keep would be super useful. It logs every request with timestamps.

mml mentioned this issue

Retain debug logs for etcd when there is a place to keep them. #26920

Merged

mml added a commit to mml/kubernetes that referenced this issue


Retain debug logs for etcd when there is a place to keep them.

5dcb821

For help debugging kubernetes#25845

wojtek-t mentioned this issue

Fix traces #26880

Merged

Member

wojtek-t commented Jun 8, 2016

It sometimes happens. I saw it yesterday in a completely unrelated PR.

Member

wojtek-t commented Jun 8, 2016

I think I wrote this test at some point btw. Let me take a look.

wojtek-t mentioned this issue

Extend logging for UnschedulableNodes #27041

Merged

Member

wojtek-t commented Jun 8, 2016

I took a look and, to be honest, I have no idea what is happening there. I sent out a PR to slightly extend logging, which may help with debugging: see #27041

k8s-github-robot pushed a commit that referenced this issue


Merge pull request #27041 from wojtek-t/unschedulable_nodes

e79f046

Automatic merge from submit-queue

Extend logging for UnschedulableNodes

Ref #25845

Contributor

mml commented Jun 9, 2016

I couldn't find an example of this happening at all in June. Downgrading to P2 and we'll wait until it happens again.

mml added priority/backlog and removed priority/important-soon labels

Member

wojtek-t commented Jun 9, 2016

This happened to me on Tuesday in #26880

wojtek-t added priority/important-soon kind/flake and removed kind/flake priority/backlog labels

abhgupta mentioned this issue

Considering all nodes for the scheduler cache to allow lookups #22568

Merged

wojtek-t mentioned this issue

cmd/integration flake: apiserver received an error that is not an unversioned.Status: couldn't get version/kind; json parse error: invalid character '%' after object key:value pair #25539

Closed

pwittrock added the flake-needs-czar-attention-release-1.4 label

k8s-github-robot commented Aug 29, 2016

[FLAKE-PING] @mml

This flaky-test issue would love to have more attention...

k8s-github-robot commented Sep 2, 2016

[FLAKE-PING] @mml

This flaky-test issue would love to have more attention.

k8s-github-robot commented Sep 6, 2016

[FLAKE-PING] @mml

This flaky-test issue would love to have more attention.

k8s-github-robot commented Sep 10, 2016

[FLAKE-PING] @mml

This flaky-test issue would love to have more attention.

k8s-github-robot commented Sep 14, 2016

[FLAKE-PING] @mml

This flaky-test issue would love to have more attention.

k8s-github-robot commented Sep 18, 2016

[FLAKE-PING] @mml

This flaky-test issue would love to have more attention.

k8s-github-robot commented Sep 22, 2016

[FLAKE-PING] @mml

This flaky-test issue would love to have more attention.

k8s-github-robot commented Sep 26, 2016

[FLAKE-PING] @mml

This flaky-test issue would love to have more attention.

k8s-github-robot commented Sep 30, 2016

[FLAKE-PING] @mml

This flaky-test issue would love to have more attention.

k8s-github-robot commented Oct 4, 2016

[FLAKE-PING] @mml

This flaky-test issue would love to have more attention.

k8s-github-robot commented Oct 12, 2016

[FLAKE-PING] @mml

This flaky-test issue would love to have more attention.

k8s-github-robot commented Oct 20, 2016

[FLAKE-PING] @mml

This flaky-test issue would love to have more attention.

k8s-github-robot commented Oct 28, 2016

[FLAKE-PING] @mml

This flaky-test issue would love to have more attention.

Contributor

calebamiles commented Nov 4, 2016

@mml @lavalamp @fejta @wojtek-t any updates on this issue? Should it be closed?

k8s-github-robot commented Nov 12, 2016

[FLAKE-PING] @mml

This flaky-test issue would love to have more attention.

bgrant0607 removed the flake-needs-czar-attention-release-1.4 label

calebamiles added this to the v1.5 milestone

Member

dims commented Nov 17, 2016

Marking this as fixed. Please reopen if necessary.

dims closed this as completed
