e2e flake: "KubeProxy should test kube-proxy" #17781

Closed
wojtek-t opened this issue Nov 25, 2015 · 20 comments
Labels
area/test kind/flake Categorizes issue or PR as related to a flaky test. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now.

Comments

@wojtek-t
Member

Test failed with the following error:

/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/kubeproxy.go:101 Expected : 1 to be == : 2

Example run:
http://kubekins.dls.corp.google.com/view/Critical%20Builds/job/kubernetes-e2e-gke-ci/9692/

@kubernetes/goog-cluster

@mikedanese - can this be related to your recent changes?

@wojtek-t wojtek-t added area/test priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. team/cluster kind/flake Categorizes issue or PR as related to a flaky test. labels Nov 25, 2015
@gmarek gmarek added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. and removed priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Nov 25, 2015
@gmarek
Contributor

gmarek commented Nov 25, 2015

It actually failed in 2/3 of the last runs. I think we should move it to the flaky suite if we can't fix it today.

@ixdy @mikedanese

@wojtek-t
Member Author

@kubernetes/goog-testing

@gmarek
Contributor

gmarek commented Nov 26, 2015

Moving the GKE version of this test to the GKE_flaky suite.

@gmarek
Contributor

gmarek commented Nov 26, 2015

@thockin - can you investigate or reassign?

@wojtek-t
Member Author

@mikedanese - can this be related to moving kube-proxy to a pod? (the timing when it started failing was around when your PR was merged)

@mikedanese
Member

I'm not sure and haven't looked yet, but I will look today.

@wojtek-t
Member Author

@mikedanese - thanks!

@mikedanese
Member

#16344, #17121, and #15777 all went in around the same time and affect the kubeproxy e2e test.

The test relies on hitting all hostnames, which is a valid assertion when we are using the round-robin proxy, but it seems like it could be flaky after switching to iptables, which is probabilistic. That's the only thing that stands out to me, but I haven't tested this theory.

Do we have an exact date of when this test started flaking or was it "about 2 weeks ago"?

Does this only affect gke-ci?

@wojtek-t
Member Author

It affects all suites (including gce).

Regarding the exact date - we don't have an exact run, because there were a bunch of issues around that time (including starting clusters).

@ikehz
Contributor

ikehz commented Nov 30, 2015

It's possible #17965 is related to this.

@ArtfulCoder
Contributor

I am looking into it.
It is not related to #17965.
The test breaks after we reduce the number of endpoints (the first half of the tests work).
I believe the test change that was made recently, #15777, is causing this.


@mikedanese
Member

@ihmccreery I think it's not related, as @ArtfulCoder said. That looks more like #17583.

@wojtek-t
Member Author

wojtek-t commented Dec 3, 2015

The potential fix, #17995, has already been merged. We should verify in a few days whether it fixed the problem and, if so, move the test back out of the flaky suite.

@wojtek-t
Member Author

wojtek-t commented Dec 4, 2015

It seems that #17995 fixed the problem. I will leave this issue open until Monday; if there are no further failures, I will move the test back to the non-flaky suite and close this issue then.

@zmerlynn
Member

zmerlynn commented Dec 5, 2015

This needs to be cherry-picked into the release-1.1 branch as well, please. http://kubekins.dls.corp.google.com/job/kubernetes-e2e-gce-release-1.1/885/

@gmarek
Contributor

gmarek commented Dec 21, 2015

@gmarek gmarek reopened this Dec 21, 2015
@gmarek
Contributor

gmarek commented Dec 23, 2015

Actually it isn't fixed. It just failed in gke-ci: http://kubekins.dls.corp.google.com/view/Critical%20Builds/job/kubernetes-e2e-gke-ci/10073/

@gmarek
Contributor

gmarek commented Jan 4, 2016

@thockin
Member

thockin commented Jan 20, 2016

Please open new issues with new links if this pops up again.

@ArtfulCoder we need to prioritize this if it comes up.


7 participants