e2e clusters sometimes fail to create master #22655
Comments
In this case I don't think the vm even existed for us to collect logs |
Ahh, I saw the error message above. Looks like this failure can occur with a real production cluster, not just in tests. cc @kubernetes/goog-gke |
Yeah this is a https://github.com/kubernetes/kubernetes/blob/master/cluster/gce/debian/helper.sh#L42 flake (or gcloud flake). Should this be in a loop? Or is there some way we can get the gcloud debug logs? |
This happened again: http://kubekins.dls.corp.google.com/job/kubernetes-e2e-gce-slow/3019/console Maybe just put all these: kubernetes/cluster/gce/util.sh Line 544 in bc96422
In a loop like: kubernetes/cluster/gce/util.sh Line 505 in bc96422
|
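For illustration, the kind of loop suggested above could be factored into a small wrapper. This is only a sketch: `retry` is a hypothetical helper name, not something that exists in `cluster/gce/util.sh`, and the attempt count and delay are arbitrary.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the kube-up scripts): run a command up to
# <attempts> times, sleeping <delay> seconds between failed attempts.
# Usage: retry <attempts> <delay> <command> [args...]
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local n=1
  until "$@"; do
    if (( n >= attempts )); then
      echo "Failed after ${n} attempts: $*" >&2
      return 1
    fi
    echo "Attempt ${n} failed, retrying in ${delay}s: $*" >&2
    sleep "${delay}"
    n=$(( n + 1 ))
  done
}
```

A flaky call would then be wrapped as `retry 5 10 <the gcloud command and its arguments>`, so a transient failure gets a few more chances and a persistent one fails loudly with the command line in the message.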
Ideally all of the gcloud calls would be wrapped in a loop. This one is particularly insidious because the create-master-instance call is run in the background and therefore doesn't abort the cluster creation until much later in the process (and without a useful error). |
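To make the background failure visible at the point where it happens, the script can record the PID of the background job and `wait` on it explicitly. A minimal sketch, assuming a hypothetical `create_cluster` function that takes the master-creation command as its argument (the real kube-up flow is more involved):

```shell
#!/usr/bin/env bash

# Hypothetical sketch: start master creation in the background, then check
# its exit status explicitly instead of letting the failure surface later.
create_cluster() {
  local master_cmd="$1"

  "${master_cmd}" &
  local master_pid=$!

  # Other setup (firewall rules, node templates, ...) would run here
  # while the master is being created.

  # `wait <pid>` returns the exit status of that background job.
  if ! wait "${master_pid}"; then
    echo "Master creation failed; aborting cluster creation" >&2
    return 1
  fi
  echo "Master created; continuing"
}
```

With this shape, a failed master creation stops the run with a clear message rather than showing up much later as kubelets that cannot reach a master.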
Yup, need retry loop. @thockin to delegate? Not sure who owns our setup scripts. |
As Dawn noted in #22655 (comment) this appears to be a problem with kube-up in general and not just our e2e setup script, which makes me think we should fix it for 1.2. |
On the other hand, it's been this way since it was written and shouldn't be any flakier now than it's been for the last year. I wouldn't block a release on it (but I would cherry-pick the fix so that it lands on the release branch in 1.2.1 if it misses 1.2.0). |
Another occurrence. https://console.cloud.google.com/storage/kubernetes-jenkins/logs/kubernetes-e2e-gce-slow/4164/
|
I'm probably not the right assignee: I have almost no context on this area. It looks like it is flaking about once a month? Who has the most context on kube-up? Names from This is a pretty nefarious failure mode. Can either of you shake loose a little time to estimate it and see what it would take to fix this? |
I don't know if this is related, but with a brand-new Google account and the latest pull of Kubernetes, I get the following issue.
|
@bweston92 that looks like an issue creating the node VMs. |
[FLAKE-PING] @mikedanese This flaky-test issue would love to have more attention... |
hasn't been referenced in five months... I vote to close this one |
Seems reasonable. |
Observed in #18672 (comment)
Probably contributing to: #20916 (comment)
Of course the kubelets are complaining:
Looks like we didn't even create the master vm, but the error is lost.