the server has asked for the client to provide credentials #13067
https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin_gce/870/testReport/junit/(root)/Extended/_builds__Conformance__oc_new_app_should_succeed_with_a___name_of_58_characters/

Looks like the build failed; you'll need more debugging. @bparees
@jim-minter, as @smarterclayton says, we should update the test case to first check for build success (and dump logs on failure) before waiting for the deployment. @smarterclayton, what's the implication of

    Feb 22 18:18:31.153: INFO: At 2017-02-22 18:04:05 -0500 EST - event for a234567890123456789012345678901234567890123456789012345678-1-build: {kubelet ci-prtest870-ig-n-r0wr} Killing: Killing container with docker id 3e745c4dbdb0: Need to kill pod.

Is that a benign result of the build pod failing, or is that the reason the build pod failed (something killed it)?
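As a rough illustration of that ordering, here is a minimal Go sketch that polls the build via the `oc` CLI and dumps the build logs on failure before any deployment wait. The build name, timeout, and helper names are placeholders; this is not origin's actual extended-test code.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
	"time"
)

// buildPhase asks `oc` for the build's current phase
// (New, Pending, Running, Complete, Failed, Error, Cancelled).
func buildPhase(name string) (string, error) {
	out, err := exec.Command("oc", "get", "build", name,
		"-o", "jsonpath={.status.phase}").Output()
	return strings.TrimSpace(string(out)), err
}

// waitForBuild polls the build and, if it ends badly, returns an error that
// already includes the build logs so the test can fail fast with context.
func waitForBuild(name string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		phase, err := buildPhase(name)
		if err == nil && phase == "Complete" {
			return nil
		}
		if phase == "Failed" || phase == "Error" || phase == "Cancelled" {
			logs, _ := exec.Command("oc", "logs", "build/"+name).CombinedOutput()
			return fmt.Errorf("build %s ended in phase %s; logs:\n%s", name, phase, logs)
		}
		time.Sleep(5 * time.Second)
	}
	return fmt.Errorf("timed out waiting for build %s", name)
}

func main() {
	// Placeholder build name for the 58-character new-app test.
	build := "a234567890123456789012345678901234567890123456789012345678-1"
	if err := waitForBuild(build, 10*time.Minute); err != nil {
		fmt.Println(err) // build failure and its logs surface first
		return
	}
	// Only now would the test go on to wait for the deployment
	// (e.g. via `oc rollout status`).
	fmt.Println("build complete; proceed to wait for the deployment")
}
```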
I *think* that's the kubelet doing its normal "all containers exited, clean up". But it could be a hung build if the build locked up (I didn't look).
#13074 adds additional logging as requested, but I think what would be more useful here is access to the master log and ideally the node logs from the run. These are not recovered as part of the CI job. @stevekuznetsov, is this planned?
No, I assume @smarterclayton wanted to retrieve both master and node logs. This is the one test I do not intimately own, so I'll throw that one over the wall back to him :)
@stevekuznetsov I mean, IMO it would be good if GCE Jenkins jobs retrieved the master and ideally node logs and stored them in the case of a run failure. Is this planned?
Yes, I agree that would be ideal. Clayton knows the ins and outs of that specific job much better than I do, so I would assign an issue for it to him.
Looks like we got good build logging from this failure here, and it shows a problem pushing to the registry. Assigning to @mfojtik.
Saw this again here ... not on GCE, so we have node logs, master logs, docker logs, whatever you need.
@mfojtik could you please take a look at this? The build failed because the push failed part way through:
@miminar PTAL
@jim-minter This error is different from the one above. The last time I saw this it was a routing issue, but that shouldn't be a problem now unless the router is broken. Do you have more details? Can you link the failed job?
@jim-minter my bad, I thought they were unrelated. Thanks.
@stevekuznetsov just a suggestion: I'd welcome the exact docker version in the logs.
@miminar we have an artifact called
@stevekuznetsov Great, thanks!
This is indeed etcd failing over.
There is another problem: we are using quorum reads from etcd, which changed how errors are handled for authorization. It's possible that we should return 429 (retry) when etcd returns this error, even for auth.
On Apr 14, 2017, at 10:55 AM, Michail Kargakis <notifications@github.com> wrote:
https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin/742/testReport/junit/(root)/Extended/_builds__Conformance__oc_new_app_should_succeed_with_a___name_of_58_characters/
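For illustration only, here is a minimal Go sketch of that idea: answer 429 with a Retry-After header when the authorization backend (etcd) is unavailable, instead of 401 (which clients surface as "the server has asked for the client to provide credentials"). The handler and error names are hypothetical and do not reproduce the origin/Kubernetes apiserver code paths.

```go
package main

import (
	"errors"
	"net/http"
)

// errEtcdUnavailable stands in for the error a quorum read returns while
// etcd is failing over; it is a placeholder, not a real client error value.
var errEtcdUnavailable = errors.New("etcd: cluster is unavailable")

// authorize is a stand-in for an etcd-backed authorization check.
func authorize(r *http.Request) error {
	// A real implementation would consult policy stored in etcd here.
	return errEtcdUnavailable
}

// withAuth distinguishes "the backend is temporarily down" (retryable, 429)
// from "the request is actually unauthorized" (401).
func withAuth(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if err := authorize(r); err != nil {
			if err == errEtcdUnavailable {
				// Transient backend error: tell the client to retry.
				w.Header().Set("Retry-After", "1")
				http.Error(w, "authorization check temporarily unavailable, retry", http.StatusTooManyRequests)
				return
			}
			// Genuine authorization failure: ask for credentials.
			http.Error(w, "Unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	http.Handle("/", withAuth(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok\n"))
	})))
	http.ListenAndServe(":8080", nil)
}
```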
@smarterclayton do you have an idea for how to attack fixing this?
@ncdc I've been trying to tackle it in openshift-eng/aos-cd-jobs#169, but I keep seeing etcd just fall over, and setting
@stevekuznetsov have we been able to reproduce this in a VM that we control, i.e. outside of CI?
I've had very limited bandwidth, so I have not been able to dig very deep into it.
Not a blocker; it only affects test cases.
P1 is used on flakes because of the impact they have on developer productivity. We may need a more nuanced release process to make it clear the release is not blocked by them, but they shouldn't get relegated to P2 limbo if they are occurring frequently.
Of course we need a better process, but for as long as we've been labeling flakes P1, we've done the demotion/promotion dance around their priority when they span a release ... the P1 label triggers blocker-bug mechanisms that we don't need it to trigger, as it does not block the release. I'm not really sure why we should treat this flake differently from the rest.
That dance is news to me. We have ignored them on blocker calls, but I'm not aware of the process requiring us to formally mark them down to get through the release. @pweil-
(And I'm not suggesting this flake be treated differently; it's just the only one I personally saw getting marked down when it's occurring frequently.)
I've updated this flake to reflect the current root cause (which is cropping up in all sorts of test failures now): 'the server has asked for the client to provide credentials'. My understanding is we still think that is fundamentally caused by an etcd hiccup.
Should be fixed in openshift-eng/aos-cd-jobs#199.
Haven't seen it since. Closing.