
TestUnschedulableNodes is flaky #12312

Closed
dchen1107 opened this issue Aug 5, 2015 · 9 comments
Assignees
Labels
kind/flake Categorizes issue or PR as related to a flaky test. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now.

Comments

@dchen1107
Member

I observed that TestUnschedulableNodes failed a couple of times today on Shippable:

--- FAIL: TestUnschedulableNodes (48.13 seconds)
scheduler_test.go:262: Test 0: Pod did not get scheduled on an unschedulable node
scheduler_test.go:275: Test 0: failed to schedule a pod: timed out waiting for the condition
scheduler_test.go:262: Test 1: Pod did not get scheduled on an unschedulable node
scheduler_test.go:275: Test 1: failed to schedule a pod: timed out waiting for the condition
W0805 20:15:05.445184 2078 master.go:249] Network range for service cluster IPs is unspecified. Defaulting to 10.0.0.0/24.
I0805 20:15:05.445401 2078 master.go:275] Node port range unspecified. Defaulting to 30000-32767.
I0805 20:15:05.446485 2078 master.go:297] Will report 172.17.10.248 as public IP address.
E0805 20:15:05.540941 2078 reflector.go:209] pkg/runtime/proc.c:1445: Failed to watch *api.PersistentVolumeClaim: Get http://127.0.0.1:47600/api/v1/watch/persistentvolumeclaims?resourceVersion=167: dial tcp 127.0.0.1:47600: connection refused
E0805 20:15:05.541304 2078 reflector.go:209] pkg/runtime/proc.c:1445: Failed to watch *api.PersistentVolume: Get http://127.0.0.1:47600/api/v1/watch/persistentvolumes?resourceVersion=167: dial tcp 127.0.0.1:47600: connection refused
I0805 20:15:05.741411 2078 etcd_utils.go:66] Deleting all etcd keys
W0805 20:15:05.805577 2078 controller.go:212] Resetting endpoints for master service "kubernetes" to &{{ } {kubernetes default 0 0001-01-01 00:00:00 +0000 UTC <nil> map[] map[]} [{[{172.17.10.248 <nil>}] [{ 6443 TCP}]}]}
E0805 20:15:05.876542 2078 repair.go:52] unable to persist the updated port allocations: 100: Key not found (/kubernetes.io) [197]
E0805 20:15:06.579242 2078 reflector.go:209] pkg/runtime/proc.c:1445: Failed to watch *api.PersistentVolumeClaim: Get http://127.0.0.1:47600/api/v1/watch/persistentvolumeclaims?resourceVersion=167: dial tcp 127.0.0.1:47600: connection refused
E0805 20:15:06.579560 2078 reflector.go:209] pkg/runtime/proc.c:1445: Failed to watch *api.PersistentVolume: Get http://127.0.0.1:47600/api/v1/watch/persistentvolumes?resourceVersion=167: dial tcp 127.0.0.1:47600: connection refused
E0805 20:15:07.653616 2078 reflector.go:209] pkg/runtime/proc.c:1445: Failed to watch *api.PersistentVolume: Get http://127.0.0.1:47600/api/v1/watch/persistentvolumes?resourceVersion=167: dial tcp 127.0.0.1:47600: connection refused
E0805 20:15:07.653937 2078 reflector.go:209] pkg/runtime/proc.c:1445: Failed to watch *api.PersistentVolumeClaim: Get http://127.0.0.1:47600/api/v1/watch/persistentvolumeclaims?resourceVersion=167: dial tcp 127.0.0.1:47600: connection refused
E0805 20:15:08.745323 2078 reflector.go:209] pkg/runtime/proc.c:1445: Failed to watch *api.PersistentVolumeClaim: Get http://127.0.0.1:47600/api/v1/watch/persistentvolumeclaims?resourceVersion=167: dial tcp 127.0.0.1:47600: connection refused
E0805 20:15:08.746963 2078 reflector.go:209] pkg/runtime/proc.c:1445: Failed to watch *api.PersistentVolume: Get http://127.0.0.1:47600/api/v1/watch/persistentvolumes?resourceVersion=167: dial tcp 127.0.0.1:47600: connection refused
E0805 20:15:09.820005 2078 reflector.go:209] pkg/runtime/proc.c:1445: Failed to watch *api.PersistentVolume: Get http://127.0.0.1:47600/api/v1/watch/persistentvolumes?resourceVersion=167: dial tcp 127.0.0.1:47600: connection refused
E0805 20:15:09.820322 2078 reflector.go:209] pkg/runtime/proc.c:1445: Failed to watch *api.PersistentVolumeClaim: Get http://127.0.0.1:47600/api/v1/watch/persistentvolumeclaims?resourceVersion=167: dial tcp 127.0.0.1:47600: connection refused
W0805 20:15:10.140209 2078 master.go:249] Network range for service cluster IPs is unspecified. Defaulting to 10.0.0.0/24.
I0805 20:15:10.140510 2078 master.go:275] Node port range unspecified. Defaulting to 30000-32767.
I0805 20:15:10.141214 2078 master.go:297] Will report 172.17.10.248 as public IP address.
E0805 20:15:10.822095 2078 reflector.go:209] pkg/runtime/proc.c:1445: Failed to watch *api.PersistentVolume: Get http://127.0.0.1:47600/api/v1/watch/persistentvolumes?resourceVersion=167: dial tcp 127.0.0.1:47600: connection refused

@dchen1107 dchen1107 added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. team/master labels Aug 5, 2015
@dchen1107
Member Author

cc/ @davidopp

@davidopp davidopp self-assigned this Aug 5, 2015
@dchen1107
Member Author

You can access the latest failure at: https://app.shippable.com/builds/55c25e6145d4c50b00fe18b0

@ghost ghost added team/control-plane and removed team/master labels Aug 19, 2015
@brendandburns
Contributor

This just flaked again. Raising to P0.

@brendandburns brendandburns added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. and removed priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Sep 23, 2015
@brendandburns
Contributor

https://app.shippable.com/builds/5602e5d74c57620b003f441f

scheduler_test.go:263: Test 0: Pod did not get scheduled on an unschedulable node
scheduler_test.go:278: Test 0: Pod got scheduled on a schedulable node
scheduler_test.go:263: Test 1: Pod did not get scheduled on an unschedulable node
scheduler_test.go:276: Test 1: failed to schedule a pod: timed out waiting for the condition

@freehan
Contributor

freehan commented May 10, 2016

--- FAIL: TestUnschedulableNodes (34.47s)
    scheduler_test.go:246: Pod scheduled successfully on unschedulable nodes
    scheduler_test.go:249: Test 0: failed while trying to confirm the pod does not get scheduled on the node: <nil>
    scheduler_test.go:266: Test 0: Pod got scheduled on a schedulable node
    scheduler_test.go:251: Test 1: Pod did not get scheduled on an unschedulable node
    scheduler_test.go:266: Test 1: Pod got scheduled on a schedulable node

@goltermann
Contributor

Is this still valid? P0 from 10 months ago, no recent work.

@dims
Member

dims commented Jun 6, 2016

We should close this as a dup of #25845

@mml
Contributor

mml commented Jun 6, 2016

@dims this is a different symptom. At least, the two issues describe failures with totally different messages.

@davidopp
Member

davidopp commented Jun 7, 2016

Closing due to inactivity.

@davidopp davidopp closed this as completed Jun 7, 2016

7 participants