e2e flake: Pod Disks tests are very flaky #26076
Assigning @thockin for fix or triage.
I can work on this, as I am writing tests for openstack in
It would be great!
Huamin, thanks for jumping on this. Please consider it a priority - flaky tests are
Sure.
These two tests are so flaky that they block the merge queue for over half of the time it takes to merge a fix. If you think the fix will take more than a day, I'd suggest moving these to the flaky suite until they're fixed.
I'd suggest tagging them flaky ASAP. Whenever they get fixed they can be untagged.
I am okay with moving these to the flaky suite until we can get some focus on it. @childsb
The PR is #26089; it will likely need a manual merge since the submit queue is so blocked.
Hang on, this looks like it was broken by a PR merged on Sunday: #21709. Test run http://kubekins.dls.corp.google.com/job/kubernetes-e2e-gce-slow/5873/ and earlier are green. CC @swagiaal. I'll prepare a rollback.
@saad-ali good catch, I'll take a look at the GCE attacher.
@saad-ali give me a moment to figure out a fix.
I think the appropriate thing to do here was to move them into the flaky suite or revert the offending PR right away, not wait several hours for the fix to pass code review and CI.
Agreed, this could've been handled better.
Occurrences are tracked automatically in #26127. I don't care if you leave this one open too.
Closing this in favor of the autogenerated #26127.
Automatic merge from submit-queue

In the e2e test, when kubectl exec fails to find the container to run a command, it should retry. Fixes #26076.

Without retrying upon a "container not found" error, the `Pod Disks` test failed with the following error:

```console
[k8s.io] Pod Disks should schedule a pod w/two RW PDs both mounted to one container, write to PD, verify contents, delete pod, recreate pod, verify contents, and repeat in rapid succession [Slow]
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/pd.go:271
[BeforeEach] [k8s.io] Pod Disks
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/framework.go:108
STEP: Creating a kubernetes client
May 23 19:18:02.254: INFO: >>> TestContext.KubeConfig: /root/.kube/config
STEP: Building a namespace api object
STEP: Waiting for a default service account to be provisioned in namespace
[BeforeEach] [k8s.io] Pod Disks
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/pd.go:69
[It] should schedule a pod w/two RW PDs both mounted to one container, write to PD, verify contents, delete pod, recreate pod, verify contents, and repeat in rapid succession [Slow]
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/pd.go:271
STEP: creating PD1
May 23 19:18:06.678: INFO: Successfully created a new PD: "rootfs-e2e-11dd5f5b-211b-11e6-a3ff-b8ca3a62792c".
STEP: creating PD2
May 23 19:18:11.216: INFO: Successfully created a new PD: "rootfs-e2e-141f062d-211b-11e6-a3ff-b8ca3a62792c".
May 23 19:18:11.216: INFO: PD Read/Writer Iteration #0
STEP: submitting host0Pod to kubernetes
W0523 19:18:11.279910 4984 request.go:347] Field selector: v1 - pods - metadata.name - pd-test-16d3653c-211b-11e6-a3ff-b8ca3a62792c: need to check if this is versioned correctly.
STEP: writing a file in the container
May 23 19:18:39.088: INFO: Running '/srv/dev/kubernetes/_output/dockerized/bin/linux/amd64/kubectl kubectl --server=https://130.211.199.187 --kubeconfig=/root/.kube/config exec --namespace=e2e-tests-pod-disks-3t3g8 pd-test-16d3653c-211b-11e6-a3ff-b8ca3a62792c -c=mycontainer -- /bin/sh -c echo '1394466581702052925' > '/testpd1/tracker0''
May 23 19:18:40.250: INFO: Wrote value: "1394466581702052925" to PD1 ("rootfs-e2e-11dd5f5b-211b-11e6-a3ff-b8ca3a62792c") from pod "pd-test-16d3653c-211b-11e6-a3ff-b8ca3a62792c" container "mycontainer"
STEP: writing a file in the container
May 23 19:18:40.251: INFO: Running '/srv/dev/kubernetes/_output/dockerized/bin/linux/amd64/kubectl kubectl --server=https://130.211.199.187 --kubeconfig=/root/.kube/config exec --namespace=e2e-tests-pod-disks-3t3g8 pd-test-16d3653c-211b-11e6-a3ff-b8ca3a62792c -c=mycontainer -- /bin/sh -c echo '1740704063962701662' > '/testpd2/tracker0''
May 23 19:18:41.433: INFO: Wrote value: "1740704063962701662" to PD2 ("rootfs-e2e-141f062d-211b-11e6-a3ff-b8ca3a62792c") from pod "pd-test-16d3653c-211b-11e6-a3ff-b8ca3a62792c" container "mycontainer"
STEP: reading a file in the container
May 23 19:18:41.433: INFO: Running '/srv/dev/kubernetes/_output/dockerized/bin/linux/amd64/kubectl kubectl --server=https://130.211.199.187 --kubeconfig=/root/.kube/config exec --namespace=e2e-tests-pod-disks-3t3g8 pd-test-16d3653c-211b-11e6-a3ff-b8ca3a62792c -c=mycontainer -- cat /testpd1/tracker0'
May 23 19:18:42.585: INFO: Read file "/testpd1/tracker0" with content: 1394466581702052925
STEP: reading a file in the container
May 23 19:18:42.585: INFO: Running '/srv/dev/kubernetes/_output/dockerized/bin/linux/amd64/kubectl kubectl --server=https://130.211.199.187 --kubeconfig=/root/.kube/config exec --namespace=e2e-tests-pod-disks-3t3g8 pd-test-16d3653c-211b-11e6-a3ff-b8ca3a62792c -c=mycontainer -- cat /testpd2/tracker0'
May 23 19:18:43.779: INFO: Read file "/testpd2/tracker0" with content: 1740704063962701662
STEP: deleting host0Pod
May 23 19:18:44.048: INFO: PD Read/Writer Iteration #1
STEP: submitting host0Pod to kubernetes
W0523 19:18:44.132475 4984 request.go:347] Field selector: v1 - pods - metadata.name - pd-test-16d3653c-211b-11e6-a3ff-b8ca3a62792c: need to check if this is versioned correctly.
STEP: reading a file in the container
May 23 19:18:45.186: INFO: Running '/srv/dev/kubernetes/_output/dockerized/bin/linux/amd64/kubectl kubectl --server=https://130.211.199.187 --kubeconfig=/root/.kube/config exec --namespace=e2e-tests-pod-disks-3t3g8 pd-test-16d3653c-211b-11e6-a3ff-b8ca3a62792c -c=mycontainer -- cat /testpd1/tracker0'
May 23 19:18:46.290: INFO: error running kubectl exec to read file: exit status 1
stdout=
stderr=error: error executing remote command: error executing command in container: container not found ("mycontainer")
May 23 19:18:46.290: INFO: Error reading file: exit status 1
May 23 19:18:46.290: INFO: Unexpected error occurred: exit status 1
```

I've now run the e2e pd test 5 times with this fix and no longer see any failure.
The following tests failed multiple times, in both the GCE and GKE suites:
The error message is the same:
Logs:
https://console.cloud.google.com/storage/browser/kubernetes-jenkins/logs/kubernetes-e2e-gce-slow/5896
https://console.cloud.google.com/storage/browser/kubernetes-jenkins/logs/kubernetes-e2e-gce-slow/5899
https://console.cloud.google.com/storage/browser/kubernetes-jenkins/logs/kubernetes-e2e-gce-slow/5903
https://console.cloud.google.com/storage/browser/kubernetes-jenkins/logs/kubernetes-e2e-gce-slow/5904
https://console.cloud.google.com/storage/browser/kubernetes-jenkins/logs/kubernetes-e2e-gke-slow/4491
https://console.cloud.google.com/storage/browser/kubernetes-jenkins/logs/kubernetes-e2e-gke-slow/4495
https://console.cloud.google.com/storage/browser/kubernetes-jenkins/logs/kubernetes-e2e-gke-slow/4496
https://console.cloud.google.com/storage/browser/kubernetes-jenkins/logs/kubernetes-e2e-gke-slow/4497
https://console.cloud.google.com/storage/browser/kubernetes-jenkins/logs/kubernetes-e2e-gke-slow/4500
https://console.cloud.google.com/storage/browser/kubernetes-jenkins/logs/kubernetes-e2e-gke-slow/4503