Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

the server has asked for the client to provide credentials #13067

Closed
smarterclayton opened this issue Feb 23, 2017 · 46 comments
Closed

the server has asked for the client to provide credentials #13067

smarterclayton opened this issue Feb 23, 2017 · 46 comments
Assignees
Labels
component/build kind/test-flake Categorizes issue or PR as related to test flakes. priority/P1

Comments

@smarterclayton
Copy link
Contributor

https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin_gce/870/testReport/junit/(root)/Extended/_builds__Conformance__oc_new_app_should_succeed_with_a___name_of_58_characters/

/tmp/openshift/tito/rpmbuild-originbVKDf0/BUILD/origin-1.5.0/_output/local/go/src/github.com/openshift/origin/test/extended/builds/new_app.go:32
Expected error:
    <*errors.errorString | 0xc4212baa70>: {
        s: "timed out waiting for deployment \"a234567890123456789012345678901234567890123456789012345678\" after 15m0s",
    }
    timed out waiting for deployment "a234567890123456789012345678901234567890123456789012345678" after 15m0s
not to have occurred
/tmp/openshift/tito/rpmbuild-originbVKDf0/BUILD/origin-1.5.0/_output/local/go/src/github.com/openshift/origin/test/extended/builds/new_app.go:31
18:00:05 -0500 EST  } {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2017-02-22 17:59:35 -0500 EST  }]
Feb 22 18:18:31.243: INFO: a234567890123456789012345678901234567890123456789012345678-1-build  ci-prtest870-ig-n-r0wr  Failed          [{Initialized True 0001-01-01 00:00:00 +0000 UTC 2017-02-22 18:03:31 -0500 EST  } {Ready False 0001-01-01 00:00:00 +0000 UTC 2017-02-22 18:04:05 -0500 EST ContainersNotReady containers with unready status: [sti-build]} {PodScheduled True 0001-01-01 00:00:00 +0000 UTC 2017-02-22 18:03:31 -0500 EST  }]

Looks like the build failed, you'll need more debugging. @bparees

@bparees
Copy link
Contributor

bparees commented Feb 23, 2017

@jim-minter, as @smarterclayton says, we should update the test case to first check for build success (and dump logs on failure) before waiting for the deployment.

@smarterclayton what's the implication of

Feb 22 18:18:31.153: INFO: At 2017-02-22 18:04:05 -0500 EST - event for a234567890123456789012345678901234567890123456789012345678-1-build: {kubelet ci-prtest870-ig-n-r0wr} Killing: Killing container with docker id 3e745c4dbdb0: Need to kill pod.

is that a benign result of the build pod failing, or is that the reason the build pod failed (something killed it)?

@smarterclayton
Copy link
Contributor Author

smarterclayton commented Feb 23, 2017 via email

@jim-minter
Copy link
Contributor

#13074 adds additional logging as requested, but I think what would be more useful here is access to the master log and ideally node logs from the run. These are not recovered as part of the CI job. @stevekuznetsov, is this planned?

@stevekuznetsov
Copy link
Contributor

No, I assume @smarterclayton wanted to retrieve both master and node logs. This is the one test I do not intimately own so I'll throw that one over the wall back to him :)

@jim-minter
Copy link
Contributor

@stevekuznetsov I mean, IMO it would be good if GCE Jenkins jobs retrieved master and ideally node logs and stored them in the case of a run failure. Is this planned?

@stevekuznetsov
Copy link
Contributor

Yes, I agree that would be ideal. Clayton knows the ins and outs of that specific job much better than I, so I would assign an issue for it to him.

@bparees
Copy link
Contributor

bparees commented Feb 27, 2017

looks like we got good build logging from this failure here and it shows a problem pushing to the registry.. assigning to @mfojtik
https://ci.openshift.redhat.com/jenkins/job/zz_origin_gce_image/138/testReport/junit/%28root%29/Extended/_builds__Conformance__oc_new_app_should_succeed_with_a___name_of_58_characters/

2017-02-27T19:06:55.909760000Z Pushing image 172.30.241.235:5000/extended-test-new-app-flk70-2vm8c/a234567890123456789012345678901234567890123456789012345678:latest ...
2017-02-27T19:06:56.040451000Z Registry server Address: 
2017-02-27T19:06:56.040798000Z Registry server User Name: serviceaccount
2017-02-27T19:06:56.041079000Z Registry server Email: serviceaccount@example.org
2017-02-27T19:06:56.041370000Z Registry server Password: <<non-empty>>
2017-02-27T19:06:56.041669000Z error: build error: Failed to push image: Get https://172.30.241.235:5000/v1/_ping: x509: cannot validate certificate for 172.30.241.235 because it doesn't contain any IP SANs

@bparees bparees assigned mfojtik and unassigned jim-minter Feb 27, 2017
@smarterclayton
Copy link
Contributor Author

@stevekuznetsov
Copy link
Contributor

Saw this again here ... not on GCE, so we have node, master logs, docker logs, whatever you need.

@jim-minter
Copy link
Contributor

@mfojtik please could you take a look at this? The build failed because the push failed part way through:

2017-03-11T02:50:22.815884000Z Pushed 7/9 layers, 84% complete
2017-03-11T02:50:56.735001000Z Registry server Address: 
2017-03-11T02:50:56.742434000Z Registry server User Name: serviceaccount
2017-03-11T02:50:56.747935000Z Registry server Email: serviceaccount@example.org
2017-03-11T02:50:56.757086000Z Registry server Password: <<non-empty>>
2017-03-11T02:50:56.761795000Z error: build error: Failed to push image: unauthorized: authentication required
time="2017-03-11T02:50:56.607738296Z" level=error msg="Get user failed with error: the server has asked for the client to provide credentials (get users ~)" go.version=go1.7.4 http.request.h
ost="172.30.125.183:5000" http.request.id=11b11ffe-6b7b-4a56-9585-45c526de8f78 http.request.method=PUT http.request.remoteaddr="10.128.0.1:36682" http.request.uri="/v2/extended-test-new-app-
n1n3l-3vtvl/a234567890123456789012345678901234567890123456789012345678/blobs/uploads/8a1fb4db-ceae-457a-99f5-bcf461a47676?_state=9O2ALNijpG1rx_8F8Xlx5uL8wwapUKWpnEZwrlmjLAF7Ik5hbWUiOiJleHRlb
mRlZC10ZXN0LW5ldy1hcHAtbjFuM2wtM3Z0dmwvYTIzNDU2Nzg5MDEyMzQ1Njc4OTAxMjM0NTY3ODkwMTIzNDU2Nzg5MDEyMzQ1Njc4OTAxMjM0NTY3OCIsIlVVSUQiOiI4YTFmYjRkYi1jZWFlLTQ1N2EtOTlmNS1iY2Y0NjFhNDc2NzYiLCJPZmZzZXQ
iOjcwMDY4OTM3LCJTdGFydGVkQXQiOiIyMDE3LTAzLTExVDAyOjUwOjE0WiJ9&digest=sha256%3Ad1e4228e5083cf1fb388732eea618c296623860fd76a8af1eb5f24f9e819c065" http.request.useragent="docker/1.12.6 go/go1.7
.4 kernel/3.10.0-327.22.2.el7.x86_64 os/linux arch/amd64 UpstreamClient(go-dockerclient)" instance.id=9eb7f703-fd6c-42b9-b45d-69a5196f2ad6 openshift.logger=registry vars.name="extended-test-
new-app-n1n3l-3vtvl/a234567890123456789012345678901234567890123456789012345678" vars.uuid=8a1fb4db-ceae-457a-99f5-bcf461a47676 
time="2017-03-11T02:50:56.607902636Z" level=error msg="error authorizing context: access denied" go.version=go1.7.4 http.request.host="172.30.125.183:5000" http.request.id=11b11ffe-6b7b-4a56
-9585-45c526de8f78 http.request.method=PUT http.request.remoteaddr="10.128.0.1:36682" http.request.uri="/v2/extended-test-new-app-n1n3l-3vtvl/a23456789012345678901234567890123456789012345678
9012345678/blobs/uploads/8a1fb4db-ceae-457a-99f5-bcf461a47676?_state=9O2ALNijpG1rx_8F8Xlx5uL8wwapUKWpnEZwrlmjLAF7Ik5hbWUiOiJleHRlbmRlZC10ZXN0LW5ldy1hcHAtbjFuM2wtM3Z0dmwvYTIzNDU2Nzg5MDEyMzQ1N
jc4OTAxMjM0NTY3ODkwMTIzNDU2Nzg5MDEyMzQ1Njc4OTAxMjM0NTY3OCIsIlVVSUQiOiI4YTFmYjRkYi1jZWFlLTQ1N2EtOTlmNS1iY2Y0NjFhNDc2NzYiLCJPZmZzZXQiOjcwMDY4OTM3LCJTdGFydGVkQXQiOiIyMDE3LTAzLTExVDAyOjUwOjE0WiJ
9&digest=sha256%3Ad1e4228e5083cf1fb388732eea618c296623860fd76a8af1eb5f24f9e819c065" http.request.useragent="docker/1.12.6 go/go1.7.4 kernel/3.10.0-327.22.2.el7.x86_64 os/linux arch/amd64 Ups
treamClient(go-dockerclient)" instance.id=9eb7f703-fd6c-42b9-b45d-69a5196f2ad6 openshift.logger=registry vars.name="extended-test-new-app-n1n3l-3vtvl/a234567890123456789012345678901234567890
123456789012345678" vars.uuid=8a1fb4db-ceae-457a-99f5-bcf461a47676 

@mfojtik
Copy link
Contributor

mfojtik commented Mar 13, 2017

@miminar PTAL

@miminar
Copy link

miminar commented Mar 13, 2017

@jim-minter This error is different than the one above. The last time I saw this was a routing issue but that shouldn't be a problem now unless router is broken.

Do you have more details? Can you link the failed job?

@miminar
Copy link

miminar commented Mar 13, 2017

@jim-minter my bad, I thought they are unrelated. Thanks.

@miminar miminar assigned miminar and unassigned mfojtik Mar 13, 2017
@miminar
Copy link

miminar commented Mar 13, 2017

@stevekuznetsov just a suggestion. I'd welcome the exact docker version in the logs. 1.12.6 is too generic (it has 13 releases in rhel7.3 already).

@stevekuznetsov
Copy link
Contributor

@miminar we have an artifact called generated/installed_packages.log which has the full list of packages installed. From that list we can see:

docker.x86_64                  2:1.12.6-13.el7     @httpsmirroropenshiftcomenterpriserheldockertestedx8664os
docker-client.x86_64           2:1.12.6-13.el7     @httpsmirroropenshiftcomenterpriserheldockertestedx8664os
docker-common.x86_64           2:1.12.6-13.el7     @httpsmirroropenshiftcomenterpriserheldockertestedx8664os
docker-rhel-push-plugin.x86_64 2:1.12.6-13.el7     @httpsmirroropenshiftcomenterpriserheldockertestedx8664os

@miminar
Copy link

miminar commented Mar 14, 2017

@stevekuznetsov Great, thanks!

@smarterclayton
Copy link
Contributor Author

smarterclayton commented Apr 14, 2017 via email

@ncdc
Copy link
Contributor

ncdc commented Apr 18, 2017

@smarterclayton do you have an idea for how to attack fixing this?

@stevekuznetsov
Copy link
Contributor

@ncdc I've been trying to tackle it in openshift-eng/aos-cd-jobs#169 but I keep seeing etcd just fall over, and setting $ETCD_DEBUG on has not really helped me to figure out why ...

@ncdc
Copy link
Contributor

ncdc commented Apr 18, 2017

@stevekuznetsov have we been able to reproduce this in a VM that we control i.e. outside of CI?

@stevekuznetsov
Copy link
Contributor

I've had very limited bandwidth so I have not been able to dig very deep into it

@stevekuznetsov
Copy link
Contributor

Not a blocker, only affects test cases

@bparees
Copy link
Contributor

bparees commented Apr 19, 2017

p1 is used on flakes because of the impact they have on developer productivity. we may need a more nuanced release process to make it clear the release is not blocked by them, but they shouldn't get relegated to p2 limbo if they are occurring frequently.

@stevekuznetsov
Copy link
Contributor

Of course we need a better process, but for as long as we've been labeling flakes P1 we've done the demotion/promotion dance around their priority when they span a release ... the P1 is triggering blocker bug mechanisms that we don't need it to trigger as it does not block the release. Not really sure why we should treat this flake differently from the rest.

@bparees
Copy link
Contributor

bparees commented Apr 19, 2017

That dance is news to me. We have ignored them on blocker calls but i'm not aware of the process requiring us to formally mark them down to get through the release. @pweil-

@bparees
Copy link
Contributor

bparees commented Apr 19, 2017

(and i'm not suggesting this flake be treated differently, it's just the only one i personally saw getting marked down when it's occurring frequently)

@bparees bparees changed the title flake: oc new-app should succeed with a --name of 58 characters the server has asked for the client to provide credentials Apr 20, 2017
@bparees
Copy link
Contributor

bparees commented Apr 20, 2017

I've updated this flake to reflect the current root cause (which is cropping up in all sorts of test failures now): 'the server has asked for the client to provide credentials'

My understanding is we still think that is fundamentally caused by an etcd hiccup.

@stevekuznetsov
Copy link
Contributor

Should be fixed in openshift-eng/aos-cd-jobs#199

@stevekuznetsov
Copy link
Contributor

Haven't seen it since. Closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/build kind/test-flake Categorizes issue or PR as related to test flakes. priority/P1
Projects
None yet
Development

No branches or pull requests

9 participants