the server has asked for the client to provide credentials #13067
https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin_gce/870/testReport/junit/(root)/Extended/_builds__Conformance__oc_new_app_should_succeed_with_a___name_of_58_characters/

Looks like the build failed; you'll need more debugging. @bparees
@jim-minter, as @smarterclayton says, we should update the test case to first check for build success (and dump logs on failure) before waiting for the deployment. @smarterclayton, what's the implication of

    Feb 22 18:18:31.153: INFO: At 2017-02-22 18:04:05 -0500 EST - event for a234567890123456789012345678901234567890123456789012345678-1-build: {kubelet ci-prtest870-ig-n-r0wr} Killing: Killing container with docker id 3e745c4dbdb0: Need to kill pod.

Is that a benign result of the build pod failing, or is that the reason the build pod failed (something killed it)?
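As a rough illustration of that ordering, here is a minimal Go sketch that polls the build via the `oc` CLI and dumps the build logs on failure before any deployment wait. The build name, timeout, and helper names are placeholders; this is not origin's actual extended-test code.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
	"time"
)

// buildPhase asks `oc` for the build's current phase
// (New, Pending, Running, Complete, Failed, Error, Cancelled).
func buildPhase(name string) (string, error) {
	out, err := exec.Command("oc", "get", "build", name,
		"-o", "jsonpath={.status.phase}").Output()
	return strings.TrimSpace(string(out)), err
}

// waitForBuild polls the build and, if it ends badly, returns an error that
// already includes the build logs so the test can fail fast with context.
func waitForBuild(name string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		phase, err := buildPhase(name)
		if err == nil && phase == "Complete" {
			return nil
		}
		if phase == "Failed" || phase == "Error" || phase == "Cancelled" {
			logs, _ := exec.Command("oc", "logs", "build/"+name).CombinedOutput()
			return fmt.Errorf("build %s ended in phase %s; logs:\n%s", name, phase, logs)
		}
		time.Sleep(5 * time.Second)
	}
	return fmt.Errorf("timed out waiting for build %s", name)
}

func main() {
	// Placeholder build name for the 58-character new-app test.
	build := "a234567890123456789012345678901234567890123456789012345678-1"
	if err := waitForBuild(build, 10*time.Minute); err != nil {
		fmt.Println(err) // build failure and its logs surface first
		return
	}
	// Only now would the test go on to wait for the deployment
	// (e.g. via `oc rollout status`).
	fmt.Println("build complete; proceed to wait for the deployment")
}
```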
I *think* that's the kubelet doing its normal "all containers exited, clean up". But it could be a hung build if the build locked up (I didn't look).
#13074 adds additional logging as requested, but I think what would be more useful here is access to the master log and ideally the node logs from the run. These are not recovered as part of the CI job. @stevekuznetsov, is this planned?
No, I assume @smarterclayton wanted to retrieve both master and node logs. This is the one test I do not intimately own, so I'll throw that one over the wall back to him :)
@stevekuznetsov I mean, IMO it would be good if GCE Jenkins jobs retrieved the master and ideally node logs and stored them in the case of a run failure. Is this planned?
Yes, I agree that would be ideal. Clayton knows the ins and outs of that specific job much better than I do, so I would assign an issue for it to him.
Looks like we got good build logging from this failure here, and it shows a problem pushing to the registry. Assigning to @mfojtik.
Saw this again here ... not on GCE, so we have node logs, master logs, docker logs, whatever you need.
@mfojtik could you please take a look at this? The build failed because the push failed part way through:
@miminar PTAL
@jim-minter This error is different from the one above. The last time I saw this it was a routing issue, but that shouldn't be a problem now unless the router is broken. Do you have more details? Can you link the failed job?
@jim-minter my bad, I thought they were unrelated. Thanks.
@stevekuznetsov just a suggestion: I'd welcome the exact docker version in the logs.
@miminar we have an artifact called
@stevekuznetsov Great, thanks!
This is indeed etcd failing over.
There is another problem: we are using quorum reads from etcd, which changed how errors are handled for authorization. It's possible that we should return 429 (retry) when etcd returns this error, even for auth.
On Apr 14, 2017, at 10:55 AM, Michail Kargakis <notifications@github.com> wrote:
https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin/742/testReport/junit/(root)/Extended/_builds__Conformance__oc_new_app_should_succeed_with_a___name_of_58_characters/
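For illustration only, here is a minimal Go sketch of that idea: answer 429 with a Retry-After header when the authorization backend (etcd) is unavailable, instead of 401 (which clients surface as "the server has asked for the client to provide credentials"). The handler and error names are hypothetical and do not reproduce the origin/Kubernetes apiserver code paths.

```go
package main

import (
	"errors"
	"net/http"
)

// errEtcdUnavailable stands in for the error a quorum read returns while
// etcd is failing over; it is a placeholder, not a real client error value.
var errEtcdUnavailable = errors.New("etcd: cluster is unavailable")

// authorize is a stand-in for an etcd-backed authorization check.
func authorize(r *http.Request) error {
	// A real implementation would consult policy stored in etcd here.
	return errEtcdUnavailable
}

// withAuth distinguishes "the backend is temporarily down" (retryable, 429)
// from "the request is actually unauthorized" (401).
func withAuth(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if err := authorize(r); err != nil {
			if err == errEtcdUnavailable {
				// Transient backend error: tell the client to retry.
				w.Header().Set("Retry-After", "1")
				http.Error(w, "authorization check temporarily unavailable, retry", http.StatusTooManyRequests)
				return
			}
			// Genuine authorization failure: ask for credentials.
			http.Error(w, "Unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	http.Handle("/", withAuth(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok\n"))
	})))
	http.ListenAndServe(":8080", nil)
}
```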
@smarterclayton do you have an idea for how to attack fixing this?
@ncdc I've been trying to tackle it in openshift-eng/aos-cd-jobs#169, but I keep seeing etcd just fall over, and setting
@stevekuznetsov have we been able to reproduce this in a VM that we control, i.e. outside of CI?
I've had very limited bandwidth, so I have not been able to dig very deep into it.
Not a blocker; it only affects test cases.
P1 is used on flakes because of the impact they have on developer productivity. We may need a more nuanced release process to make it clear the release is not blocked by them, but they shouldn't get relegated to P2 limbo if they are occurring frequently.
Of course we need a better process, but for as long as we've been labeling flakes P1, we've done the demotion/promotion dance around their priority when they span a release ... the P1 label triggers blocker-bug mechanisms that we don't need it to trigger, as it does not block the release. I'm not really sure why we should treat this flake differently from the rest.
That dance is news to me. We have ignored them on blocker calls, but I'm not aware of the process requiring us to formally mark them down to get through the release. @pweil-
(And I'm not suggesting this flake be treated differently; it's just the only one I personally saw getting marked down when it's occurring frequently.)
I've updated this flake to reflect the current root cause (which is cropping up in all sorts of test failures now): 'the server has asked for the client to provide credentials'. My understanding is we still think that is fundamentally caused by an etcd hiccup.
Should be fixed in openshift-eng/aos-cd-jobs#199.
Haven't seen it since. Closing.