DNS e2e pretty busted since ~16:30 on Wed 4/29 #7548

I haven't looked into the details yet, but it seems that a PR went in at around 16:30 PDT today (Wed 04/29) that is causing significantly more e2e failures than usual in our continuous integration. I'll dig into it in the morning unless one of the on-calls gets there before me.

Comments
Wild guess that it might be this one, based purely on merge time:
Bad news :(
A variety, and they're intermittent:
... all failed at least once...
Tests have been consistently passing since filing this issue. Dropping to P1 while investigation continues. Pretty sure this is going to crop up again until we figure out the root cause.
As I guessed, this cropped up again. I'm going to try to repro locally with verbose cluster logging to figure out what's happening. The DNS service is not working, which is breaking other stuff. Not clear yet whether DNS, Services, or just Pods in general are flaky.
Kubernetes e2e Suite, run 1 of 1. Identified problems:
- Cluster level logging using Elasticsearch should check that logs from pods on all nodes are ingested into Elasticsearch (/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/es_cluster_logging.go:46)
- kubectl guestbook should create and stop a working application (/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/kubectl.go:125)
- Density [Performance suite] should allow starting 30 pods per node (/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/density.go:158)
- Services should provide DNS for the cluster (/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/service.go:165)
I'm also running the DNS test in a loop locally to see if I can get a failed cluster to pick at.
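Something along these lines (a rough sketch; the e2e runner invocation and the ginkgo focus argument are assumptions and may differ by checkout):

```sh
# Re-run the DNS e2e test until it fails, then leave the cluster up to inspect.
# The runner flags below are a sketch and may vary between versions.
i=0
while go run hack/e2e.go -v -test --test_args="--ginkgo.focus=DNS"; do
  i=$((i+1))
  echo "DNS e2e pass ${i} succeeded; re-running"
done
echo "DNS e2e failed after ${i} passing runs; cluster left up for debugging"
```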
I can consistently get DNS to fail by killing the kube-dns-xxxxx pod. And I can consistently get it to come back to life by killing the kube2sky container. "kubectl stop pods kube-dns-xxxxx" will cause the DNS RC to recreate the kube-dns pod, and the kube2sky logs will show that the etcd client is hung on the first write to etcd. Then, "docker stop xxxxxxxxxxx" on the kube2sky container will cause kubelet to restart kube2sky, and it will successfully write all service DNS entries to skyDNS's etcd.
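For anyone following along, the sequence looks roughly like this (pod and container names are placeholders from my cluster):

```sh
# 1. Kill the kube-dns pod; the replication controller recreates it,
#    but the new kube2sky container hangs on its first etcd write.
kubectl get pods | grep kube-dns
kubectl stop pods kube-dns-xxxxx

# 2. On the node running the recreated pod, the kube2sky logs show the
#    etcd client stuck on that first write.
docker ps | grep kube2sky
docker logs <kube2sky-container-id>

# 3. Kill just the kube2sky container; kubelet restarts it, and it then
#    writes all of the service DNS entries into skyDNS's etcd successfully.
docker stop <kube2sky-container-id>
```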
Might be worth timing out the kube2sky Set operation and retrying?
I think we can close this one now? (closed by #7675)
Agreed. I wanted to hold off until the flakiness aged out of Jenkins, which it looks like has now happened.