
kubernetes-e2e-aws failing to start cluster #18037

Closed
spxtr opened this issue Dec 1, 2015 · 12 comments
Labels: area/test, area/test-infra, priority/important-soon (Must be staffed and worked on either currently, or very soon, ideally in time for the next release.)
spxtr (Contributor) commented Dec 1, 2015

kubernetes-e2e-aws has been running daily for about a week and failing every time after repeatedly checking for salt-master:

e2e.go:141: Error starting e2e cluster. Aborting.

Once this is green, should it be moved to critical builds?

spxtr (Contributor, Author) commented Dec 1, 2015

Also related: e2e.sh has a case for kubernetes-e2e-aws-parallel, but there is no corresponding Jenkins job.

j3ffml (Contributor) commented Dec 11, 2015

Ping on this. Cluster bring-up is succeeding now, but the e2e test driver is timing out waiting for kube-system pods to start running.

ikehz added the priority/important-soon label Dec 22, 2015
ixdy (Member) commented Feb 4, 2016

Any update on this?

spxtr (Contributor, Author) commented Feb 11, 2016

It looks like both the aws and aws-1.1 jobs are running tests, with one test failing consistently on aws and several failing on aws-1.1.

spxtr (Contributor, Author) commented Feb 12, 2016

One test is failing now, in the same place three times in a row. @brendandburns, you might be interested.

[BeforeEach] Services
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/service.go:70

[It] should be able to change the type and ports of a service
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/service.go:699
...
STEP: changing the TCP service to type=LoadBalancer
STEP: changing the UDP service to type=LoadBalancer
STEP: waiting for the TCP service to have a load balancer
Feb 11 18:53:08.217: INFO: Waiting up to 20m0s for service "mutability-test" to have a LoadBalancer
Feb 11 18:53:10.291: INFO: TCP load balancer: a999de63bd13311e59c500a99e494a48-1780227540.us-east-1.elb.amazonaws.com
STEP: waiting for the UDP service mutability-test to have a load balancer
STEP: waiting for the UDP service to have a load balancer
Feb 11 18:53:10.291: INFO: Waiting up to 20m0s for service "mutability-test" to have a LoadBalancer
Feb 11 19:13:10.433: FAIL: Timeout waiting for service "mutability-test" to have a load balancer
...

• Failure [1281.610 seconds]
Services
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/service.go:885
  should be able to change the type and ports of a service [It]
  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/service.go:699

  Feb 11 19:13:10.433: Timeout waiting for service "mutability-test" to have a load balancer

  /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/service.go:1602

justinsb modified the milestone: v1.2 Feb 20, 2016
justinsb added and then removed the priority/backlog label Feb 23, 2016
brendandburns (Contributor) commented
Assigning to @justinsb since I believe he is fixing this. Ping back if that's not the case.

justinsb (Member) commented
Thank you. I'm pretty sure this is fixed (I'm running the e2e tests locally using a simulated-Jenkins hack), and they are currently all green. Once the pending PRs are merged, I will verify and investigate whether something is wrong with Jenkins.

fejta (Contributor) commented Mar 2, 2016

Jenkins ran the AWS tests on 3/1 and 2/29.

spxtr (Contributor, Author) commented Mar 2, 2016

The cluster successfully comes up, but two tests are failing: "Services should be able to up and down services" and "SSH should SSH to all nodes and run commands". We can either close this and open a new issue, or track those failures here.

On release-1.1 branch there are lots of tests failing, but I don't think that's a priority.

justinsb (Member) commented Mar 2, 2016

@spxtr those are failing on 1.2? The default SSH username changed to "admin" if you're using jessie, which is now the default if you don't set KUBE_OS_DISTRIBUTION. So KUBE_SSH_USER needs to be changed from ubuntu to admin. I can file a PR for that.

I think the service up/down failure is a flake; I've seen it come and go as well.

I would hope 1.1 passes tests, but that shouldn't be a priority over 1.2.

I propose we close this and open two issues: the default SSH username change, and the 1.1 e2e failures. Then, if we see the service up/down failure again, we open an issue for that too.
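The environment change described above would look roughly like this in the test job's environment. This is a sketch under the assumptions stated in the comment (jessie as the default distribution, "admin" as its SSH user); the exact placement in the Jenkins job configuration is not shown in this thread.

```shell
# Sketch of the SSH-user fix described above; exact job wiring is an assumption.
# With KUBE_OS_DISTRIBUTION unset, the AWS scripts now default to Debian jessie,
# whose image expects SSH logins as "admin" rather than "ubuntu".
export KUBE_OS_DISTRIBUTION=jessie   # now the default when unset
export KUBE_SSH_USER=admin           # previously "ubuntu" for the old default image
```

Setting both explicitly makes the job robust against the default changing again.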

spxtr (Contributor, Author) commented Mar 2, 2016

SGTM

justinsb (Member) commented Mar 4, 2016

Opened those two issues; closing this one.

justinsb closed this as completed Mar 4, 2016
8 participants