
PR builder: Cluster failed to initialize within 300 seconds #28641

Closed
timstclair opened this issue Jul 7, 2016 · 11 comments
Labels
kind/flake: Categorizes issue or PR as related to a flaky test.
priority/critical-urgent: Highest priority. Must be actively worked on as someone's top priority right now.

Comments

timstclair commented Jul 7, 2016

Happened several times in the last few runs:
https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/pr-logs/pull/28543/kubernetes-pull-build-test-e2e-gce/48158/
https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/pr-logs/pull/26696/kubernetes-pull-build-test-e2e-gce/48157/
https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/pr-logs/pull/27243/kubernetes-pull-build-test-e2e-gce/48156/

Waiting up to 300 seconds for cluster initialization.

  This will continually check to see if the API for kubernetes is reachable.
  This may time out if there was some uncaught error during start up.

...........................................Cluster failed to initialize within 300 seconds.
2016/07/07 15:09:32 e2e.go:218: Error running up: exit status 2
2016/07/07 15:09:32 e2e.go:214: Step 'up' finished in 7m56.450395031s
2016/07/07 15:09:32 e2e.go:114: Error starting e2e cluster. Aborting.
exit status 1
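The dots in that output come from a readiness poll against the apiserver. A minimal sketch of that kind of wait loop, not the actual kube-up.sh code (the `wait_for_api` helper, `KUBE_MASTER_IP` variable, curl health check, and 2-second interval are all illustrative assumptions):

```shell
# Hypothetical sketch of the "wait for cluster initialization" loop.
# KUBE_MASTER_IP and the curl /healthz check are assumptions.
wait_for_api() {
  local timeout=${1:-300}               # seconds to wait, default 300
  local deadline=$((SECONDS + timeout))
  until curl -ksf --max-time 2 "https://${KUBE_MASTER_IP}/healthz" >/dev/null 2>&1; do
    if (( SECONDS >= deadline )); then
      echo "Cluster failed to initialize within ${timeout} seconds." >&2
      return 2                          # mirrors the "exit status 2" above
    fi
    printf '.'                          # the dots seen in the log
    sleep 2
  done
  echo "Kubernetes cluster is running."
}
```

Each dot is one failed probe; the hard failure only surfaces after the full timeout, which is why the underlying cause needs the master's logs rather than this output.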
timstclair added the priority/critical-urgent and kind/flake labels on Jul 7, 2016
timstclair (Author) commented:

@krousey @fejta

timstclair (Author) commented:

Looks like a GCE issue:

ERROR: (gcloud.compute.firewall-rules.delete) Some requests did not succeed:
 - The resource 'projects/k8s-jkns-pr-gce/global/firewalls/e2e-gce-agent-pr-38-0-minion-e2e-gce-agent-pr-38-0-http-alt' was not found

ERROR: (gcloud.compute.firewall-rules.delete) Some requests did not succeed:
 - The resource 'projects/k8s-jkns-pr-gce/global/firewalls/e2e-gce-agent-pr-38-0-minion-e2e-gce-agent-pr-38-0-nodeports' was not found
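These deletes failed only because the rules were already gone, so tear-down generally wants to treat "not found" as success. A hedged sketch of that idempotent-cleanup pattern (the `delete_ignore_missing` helper and the match on the error text are assumptions, not the real cluster scripts):

```shell
# Hypothetical helper: run a delete command and treat "was not found"
# (resource already gone) as success, so cleanup stays idempotent.
delete_ignore_missing() {
  local out
  if out=$("$@" 2>&1); then
    return 0
  elif grep -q "was not found" <<<"$out"; then
    return 0   # already deleted: not an error for tear-down
  else
    echo "$out" >&2
    return 1   # a real failure worth surfacing
  fi
}

# Illustrative usage with one of the rules from the log:
# delete_ignore_missing gcloud compute firewall-rules delete \
#   e2e-gce-agent-pr-38-0-minion-e2e-gce-agent-pr-38-0-http-alt --quiet
```

With this pattern, a half-created cluster can be torn down repeatedly without "Some requests did not succeed" noise masking real failures.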

A recent run failed with a slightly different error:
https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/pr-logs/pull/28639/kubernetes-pull-build-test-e2e-gce/48160/

ERROR: (gcloud.compute.instances.create) Some requests did not succeed:
 - The resource 'projects/google-containers/global/images/gci-dev-53-8530-6-0' is obsolete.  New uses are not allowed.  A suggested replacement is 'projects/google-containers/global/images/gci-dev-53-8490-0-0'.

Created [https://www.googleapis.com/compute/v1/projects/k8s-jkns-pr-gce/global/firewalls/e2e-gce-agent-pr-27-0-minion-all].
NAME                              NETWORK                SRC_RANGES     RULES                     SRC_TAGS  TARGET_TAGS
e2e-gce-agent-pr-27-0-minion-all  e2e-gce-agent-pr-27-0  10.180.0.0/14  tcp,udp,icmp,esp,ah,sctp            e2e-gce-agent-pr-27-0-minion
Some commands failed.
...
ERROR: (gcloud.compute.instances.describe) Could not fetch resource:
 - The resource 'projects/k8s-jkns-pr-gce/zones/us-central1-f/instances/e2e-gce-agent-pr-27-0-master' was not found

2016/07/07 15:14:40 e2e.go:218: Error running up: exit status 1
2016/07/07 15:14:40 e2e.go:214: Step 'up' finished in 2m0.195399639s
2016/07/07 15:14:40 e2e.go:114: Error starting e2e cluster. Aborting.
exit status 1
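The "is obsolete" failure reflects GCE's image deprecation model: an image's deprecated.state can be DEPRECATED (new uses allowed with a warning), OBSOLETE, or DELETED (both of which block new uses). A hedged sketch of a pre-flight check on that field (the `image_usable` helper is hypothetical; the state value would come from `gcloud compute images describe IMAGE --format='value(deprecated.state)'`):

```shell
# Hypothetical pre-flight check: given an image's deprecated.state
# (empty for an active image), decide whether new instances may use it.
image_usable() {
  case "$1" in
    OBSOLETE|DELETED) return 1 ;;  # GCE rejects new uses of these
    *)                return 0 ;;  # active, or DEPRECATED (warning only)
  esac
}

# Illustrative usage with the image from the log:
# state=$(gcloud compute images describe gci-dev-53-8530-6-0 \
#   --project google-containers --format='value(deprecated.state)')
# image_usable "$state" || echo "use the suggested replacement image instead"
```

Checking this before `instances.create` would turn the mid-run error above into an early, actionable message pointing at the suggested replacement.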

timstclair (Author) commented:

Possibly a duplicate of #28612?

timstclair (Author) commented:

This appears to have resolved itself.

timstclair (Author) commented:

Happened again.

ixdy commented Jul 25, 2016

Still flaking. cc @kubernetes/test-infra-maintainers

gmarek commented Jul 25, 2016

@ixdy - can you paste the link to logs or a suite/run number?

ixdy commented Jul 25, 2016

My own pet peeve! Sorry about that.
https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/pr-logs/pull/29477/kubernetes-pull-build-test-e2e-gce/50321/ is the failing run.

gmarek commented Jul 25, 2016

The problem here is that the kubelet on the master node didn't start the apiserver (or any other master components). @dchen1107

dchen1107 (Member) commented:

The most recent failure reported by @ixdy has this error message in kubelet.log on the master node:

Failed to start cAdvisor inotify_add_watch /sys/fs/cgroup/memory/system.slice/kube-logrotate.service: no such file or directory

This prevented the kubelet from syncing pods. I think it's a duplicate of #28997, which should be fixed by #29492.
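The "no such file or directory" error suggests a race: cAdvisor tries to add an inotify watch on a cgroup directory for a systemd unit that hasn't been created yet (or was already removed). A minimal sketch of that failure mode, with a hypothetical `cgroup_watchable` helper:

```shell
# Hypothetical sketch: inotify_add_watch on a missing cgroup path fails
# with ENOENT ("no such file or directory"); checking first shows the race.
cgroup_watchable() {
  local dir=$1
  if [ -d "$dir" ]; then
    echo "watchable: $dir"
  else
    echo "missing: $dir (inotify_add_watch would fail with ENOENT)"
    return 1
  fi
}

# The path from the kubelet.log quoted above:
# cgroup_watchable /sys/fs/cgroup/memory/system.slice/kube-logrotate.service
```

Whether the directory exists depends entirely on timing relative to systemd, which is why this manifests as a flake rather than a hard failure.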

ixdy commented Jul 25, 2016

Thanks @dchen1107! I should probably have also pointed out that the failure I linked was on a PR being cherry-picked into release-1.3, so it might be fixed in master already.
