
DNS e2e pretty busted since ~16:30 on Wed 4/29 #7548

Closed · ghost opened this issue Apr 30, 2015 · 10 comments

Assignee: cjcullen
Labels: area/test, area/test-infra, priority/important-soon
Milestone: v1.0

Comments

ghost commented Apr 30, 2015

I haven't looked into the details yet, but it seems that a PR went in at around 16:30 PDT today (Wed 04/29) that is causing significantly more e2e failures than usual in our continuous integration.

I'll dig into it in the morning unless one of the oncalls gets there before me.

ghost added the area/test, area/test-infra, and priority/critical-urgent labels on Apr 30, 2015
ghost assigned cjcullen on Apr 30, 2015
ghost added this to the v1.0 milestone on Apr 30, 2015
ghost commented Apr 30, 2015

Wild guess that it might be this one, based purely on merge time:

yifan-gu (Contributor) commented

Bad news :( Which tests are failing?

ghost commented Apr 30, 2015

A variety, and they're intermittent:

  • Services should provide DNS for the cluster
  • Density [Performance suite] should allow starting 30 pods per node
  • Shell tests that services.sh passes
  • Monitoring verify monitoring pods and all cluster nodes are available on influxdb using heapster.
  • kubectl guestbook should create and stop a working application
  • Cluster level logging using Elasticsearch should check that logs from pods on all nodes are ingested into Elasticsearch

... all failed at least once...

ghost commented Apr 30, 2015

Tests have been passing consistently since I filed this issue, so I'm dropping this to P1 while the investigation continues. I'm pretty sure it will keep cropping up until we figure out the root cause.

ghost added the priority/important-soon label and removed the priority/critical-urgent label on Apr 30, 2015
ghost commented Apr 30, 2015

As I guessed, this cropped up again. I'm going to try to reproduce it locally with verbose cluster logging to figure out what's happening. The DNS service is not working, which is breaking the other tests. It's not yet clear whether DNS, Services, or Pods in general are flaky.

Test Result (4 failures / +4)

Kubernetes e2e Suite run 1 of 1.Cluster level logging using Elasticsearch should check that logs from pods on all nodes are ingested into Elasticsearch
Kubernetes e2e Suite run 1 of 1.kubectl guestbook should create and stop a working application
Kubernetes e2e Suite run 1 of 1.Density [Performance suite] should allow starting 30 pods per node
Kubernetes e2e Suite run 1 of 1.Services should provide DNS for the cluster

Identified problems

Cluster level logging using Elasticsearch should check that logs from pods on all nodes are ingested into Elasticsearch

/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/es_cluster_logging.go:46
Failed to find all 200 log lines

kubectl guestbook should create and stop a working application

/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/kubectl.go:125
Frontend service did not start serving content in 600 seconds.

Density [Performance suite] should allow starting 30 pods per node

/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/density.go:158
Expected error:
<*errors.errorString | 0xc208a36240>: {
s: "Error: Pod my-hostname-density60-4363cd45-ef6e-11e4-a8a5-42010af01555zzng7: Container my-hostname-density60-4363cd45-ef6e-11e4-a8a5-42010af01555 was found to have terminated 1 times",
}
Error: Pod my-hostname-density60-4363cd45-ef6e-11e4-a8a5-42010af01555zzng7: Container my-hostname-density60-4363cd45-ef6e-11e4-a8a5-42010af01555 was found to have terminated 1 times
not to have occurred

Services should provide DNS for the cluster

/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/service.go:165
Expected
: 3
to equal
: 0

cjcullen (Member) commented
I'm also running the DNS test in a loop locally to see if I can get a failed cluster to pick at.

cjcullen commented May 1, 2015

I can consistently get DNS to fail by killing the kube-dns-xxxxx pod. And I can consistently get it to come back to life by killing the kube2sky container.

"kubectl stop pods kube-dns-xxxxx" will cause the DNS RC to recreate the kube-dns pod, and the kube2sky logs will show that the etcd client is hung on the first write to etcd.

Then, "docker stop xxxxxxxxxxx" on the kube2sky container will cause kubelet to restart kube2sky, and it will successfully write all service DNS entries to skyDNS's etcd.

nikhiljindal mentioned this issue on May 1, 2015
ghost changed the title from "e2e pretty busted since ~16:30 on Wed 4/29" to "DNS e2e pretty busted since ~16:30 on Wed 4/29" on May 1, 2015
vmarmol commented May 2, 2015

Might be worth timing out the kube2sky Set operation and retrying?
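A rough sketch of that idea is below. The etcdSetter interface, the hangingClient fake, the key path, and the 2s/3-attempt numbers are all illustrative assumptions, not kube2sky's real client or configuration, and the eventual fix in #7675 may well look different.

```go
// A sketch of "time out the Set and retry": stop waiting for a hung write and
// try again, rather than blocking forever on the first etcd write.
package main

import (
	"fmt"
	"sync"
	"time"
)

// etcdSetter stands in for whichever etcd client kube2sky uses; only the Set
// call matters for this sketch.
type etcdSetter interface {
	Set(key, value string, ttl uint64) error
}

// setWithTimeoutAndRetry runs client.Set in a goroutine and stops waiting for
// any single attempt after perTryTimeout, retrying up to attempts times. Note
// it does not cancel the hung call; it only stops blocking on it.
func setWithTimeoutAndRetry(client etcdSetter, key, value string, ttl uint64,
	perTryTimeout time.Duration, attempts int) error {
	var lastErr error
	for i := 0; i < attempts; i++ {
		done := make(chan error, 1)
		go func() { done <- client.Set(key, value, ttl) }()
		select {
		case err := <-done:
			if err == nil {
				return nil
			}
			lastErr = err
		case <-time.After(perTryTimeout):
			lastErr = fmt.Errorf("etcd Set of %q timed out after %v", key, perTryTimeout)
		}
	}
	return lastErr
}

// hangingClient simulates the reported failure mode: the first write hangs
// forever, later writes succeed.
type hangingClient struct {
	mu    sync.Mutex
	calls int
}

func (c *hangingClient) Set(key, value string, ttl uint64) error {
	c.mu.Lock()
	c.calls++
	first := c.calls == 1
	c.mu.Unlock()
	if first {
		select {} // hang forever, like the stuck first write to etcd
	}
	return nil
}

func main() {
	c := &hangingClient{}
	err := setWithTimeoutAndRetry(c, "/skydns/local/cluster/svc", "10.0.0.1", 0, 2*time.Second, 3)
	fmt.Println("result:", err) // first attempt times out, second succeeds
}
```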

ghost commented May 5, 2015

I think that we can close this one now? (closed by #7675)

cjcullen commented May 5, 2015

Agreed. I wanted to hold off until the flakiness aged out of Jenkins, which looks like it has now happened.

cjcullen closed this as completed on May 5, 2015