e2e flake: "x509: certificate signed by unknown authority" #27612

Closed
wojtek-t opened this issue Jun 17, 2016 · 13 comments
Labels
kind/flake: Categorizes issue or PR as related to a flaky test.
priority/critical-urgent: Highest priority. Must be actively worked on as someone's top priority right now.

Comments

@wojtek-t (Member)

In the gke-large-cluster suite, two tests failed for me with the following error:
"Unable to connect to the server: x509: certificate signed by unknown authority"

01:39:45   /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/kubeproxy.go:107
01:39:45 
01:39:45   Expected error:
01:39:45       <*errors.errorString | 0xc8211d8480>: {
01:39:45           s: "Error running &{/workspace/kubernetes/platforms/linux/amd64/kubectl [kubectl --server=https://8.35.199.16 --kubeconfig=/workspace/.kube/config exec --namespace=e2e-tests-e2e-kubeproxy-rew7e host-test-container-pod -- /bin/sh -c for i in $(seq 1 5); do echo 'hostName' | timeout -t 3 nc -w 1 -u 10.38.215.4 8081; echo; sleep 1s; done | grep -v '^\\s*$' |sort | uniq -c | wc -l] []  <nil> Please enter Username: Please enter Password:  Unable to connect to the server: x509: certificate signed by unknown authority\n [] <nil> 0xc82358a460 exit status 1 <nil> true [0xc820dd1790 0xc820dd17a8 0xc820dd17c0] [0xc820dd1790 0xc820dd17a8 0xc820dd17c0] [0xc820dd17a0 0xc820dd17b8] [0xa7bfc0 0xa7bfc0] 0xc820ffafc0}:\nCommand stdout:\nPlease enter Username: Please enter Password: \nstderr:\nUnable to connect to the server: x509: certificate signed by unknown authority\n\nerror:\nexit status 1\n",
01:39:45       }
01:39:45       Error running &{/workspace/kubernetes/platforms/linux/amd64/kubectl [kubectl --server=https://8.35.199.16 --kubeconfig=/workspace/.kube/config exec --namespace=e2e-tests-e2e-kubeproxy-rew7e host-test-container-pod -- /bin/sh -c for i in $(seq 1 5); do echo 'hostName' | timeout -t 3 nc -w 1 -u 10.38.215.4 8081; echo; sleep 1s; done | grep -v '^\s*$' |sort | uniq -c | wc -l] []  <nil> Please enter Username: Please enter Password:  Unable to connect to the server: x509: certificate signed by unknown authority
01:39:45        [] <nil> 0xc82358a460 exit status 1 <nil> true [0xc820dd1790 0xc820dd17a8 0xc820dd17c0] [0xc820dd1790 0xc820dd17a8 0xc820dd17c0] [0xc820dd17a0 0xc820dd17b8] [0xa7bfc0 0xa7bfc0] 0xc820ffafc0}:
01:39:45       Command stdout:
01:39:45       Please enter Username: Please enter Password: 
01:39:45       stderr:
01:39:45       Unable to connect to the server: x509: certificate signed by unknown authority
01:39:45       
01:39:45       error:
01:39:45       exit status 1
01:39:45       
01:39:45   not to have occurred
01:39:45 
01:39:45   /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/util.go:3449

Those errors come from:
http://kubekins.dls.corp.google.com:8080/view/Scalability/job/kubernetes-e2e-gke-large-cluster/51/console

@zmerlynn

@wojtek-t added the priority/important-soon, team/cluster, and kind/flake labels on Jun 17, 2016
@fejta (Contributor) commented Jun 17, 2016

Assigning to people based on thread activity. Please help reassign this correctly!

@wojtek-t (Member, Author)

@cjcullen @roberthbailey @kubernetes/goog-gke

ping - this just happened again in gke-serial:
http://kubekins.dls.corp.google.com:8080/view/Critical%20Builds/job/kubernetes-e2e-gke-serial/1599/

@wojtek-t added the priority/critical-urgent label and removed the priority/important-soon label on Jun 24, 2016
@wojtek-t (Member, Author)

@davidopp @erictune

@cjcullen (Member)

#28034 should fix this.

@cjcullen (Member)

I'm seeing another x509 failure in gke-serial: #27537 (comment)

@zmerlynn (Member)

This seems to be fixed. Please reopen if not?

@cjcullen reopened this on Jun 27, 2016
@zmerlynn removed their assignment on Jun 27, 2016
@cjcullen (Member)

Finally got inside a running Jenkins VM for this. Here's what the kubecfg file looks like:

# cat ~/.kube/config 
apiVersion: v1
clusters: []
contexts: []
current-context: ""
kind: Config
preferences: {}
users:
- name: gke_jenkins-gke-e2e-serial_us-central1-f_jenkins-e2e
  user:
    auth-provider:
      config:
        access-token: <redacted, but valid>
        expiry: 2016-06-27T18:27:47.859578747-07:00
      name: gcp

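Note the empty clusters and contexts sections: with no cluster entry (and therefore no certificate-authority-data for the GKE master), kubectl has nothing to verify the API server's certificate against, which lines up with the "x509: certificate signed by unknown authority" error above. For comparison, a rough sketch of what a complete gcloud-written entry would contain (values below are placeholders, not taken from the failing run):

apiVersion: v1
kind: Config
clusters:
- name: gke_jenkins-gke-e2e-serial_us-central1-f_jenkins-e2e
  cluster:
    server: https://<master-ip>                  # placeholder
    certificate-authority-data: <base64 CA cert> # placeholder
contexts:
- name: gke_jenkins-gke-e2e-serial_us-central1-f_jenkins-e2e
  context:
    cluster: gke_jenkins-gke-e2e-serial_us-central1-f_jenkins-e2e
    user: gke_jenkins-gke-e2e-serial_us-central1-f_jenkins-e2e
current-context: gke_jenkins-gke-e2e-serial_us-central1-f_jenkins-e2e
users:
- name: gke_jenkins-gke-e2e-serial_us-central1-f_jenkins-e2e
  user:
    auth-provider:
      name: gcp
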
@cjcullen (Member)

I looked at the ModifyConfig code more closely, and it is doing individual writes for each section of the kubeconfig. That means some sections can succeed while others fail (when another writer holds the file lock), leaving a partially written config behind. We should be able to fix this by batching up the writes for each possible destinationFile and issuing a single WriteToFile per file at the end of ModifyConfig.
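
As a rough illustration of that batching idea (simplified, hypothetical types; not the actual clientcmd code), the shape of the fix is roughly:

package main

import "fmt"

// change is a pending mutation to one kubeconfig section (cluster, context,
// user, or current-context). Hypothetical type for illustration only.
type change struct {
	section string
	name    string
}

// writeToFile stands in for a single WriteToFile call: lock the file, load
// it, apply every queued change, and write the result back in one pass.
func writeToFile(filename string, changes []change) error {
	fmt.Printf("writing %d change(s) to %s in one pass\n", len(changes), filename)
	return nil
}

// modifyConfig batches changes by destination file so each file is written
// exactly once, rather than once per section. A lock conflict then fails the
// whole batch instead of leaving a half-updated kubeconfig on disk.
func modifyConfig(pending []change, destinationFor func(change) string) error {
	batches := map[string][]change{}
	for _, c := range pending {
		file := destinationFor(c)
		batches[file] = append(batches[file], c)
	}
	for file, cs := range batches {
		if err := writeToFile(file, cs); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	pending := []change{
		{section: "users", name: "gke_jenkins-e2e"},
		{section: "clusters", name: "gke_jenkins-e2e"},
	}
	_ = modifyConfig(pending, func(change) string { return "/workspace/.kube/config" })
}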

@krousey (Contributor) commented Jun 28, 2016

To be fair, this could happen before the file locking too. The file locking just adds another failure path, but prevents corruption.
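
For context, the kind of advisory locking being discussed can be sketched like this (illustrative only, not the locking code kubectl actually uses): a non-blocking exclusive lock fails fast when another writer already holds it, so a concurrent ModifyConfig gets an error rather than interleaving writes and corrupting the file.

package main

import (
	"fmt"
	"os"
	"syscall"
)

// lockFile takes a non-blocking exclusive flock on path (Linux). If another
// process already holds the lock, Flock returns an error and we bail out
// instead of writing over a file someone else is midway through updating.
func lockFile(path string) (*os.File, error) {
	f, err := os.OpenFile(path, os.O_RDWR|os.O_CREATE, 0600)
	if err != nil {
		return nil, err
	}
	if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX|syscall.LOCK_NB); err != nil {
		f.Close()
		return nil, fmt.Errorf("%s is locked by another process: %v", path, err)
	}
	return f, nil
}

func main() {
	f, err := lockFile("/tmp/kubeconfig-demo")
	if err != nil {
		fmt.Println("write skipped:", err) // the extra failure path
		return
	}
	defer f.Close() // closing the file releases the lock
	fmt.Println("lock held; safe to rewrite the config in one pass")
}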

@thockin removed their assignment on Jun 29, 2016
@cjcullen (Member)

@krousey: Agreed.

I think #28197 should prevent the token refresh case from causing kubeconfig data loss. We still won't have a fully parallelizable ModifyConfig, but that is lower priority.

@cjcullen (Member)

Made a new PR #28232 to address comments from #28197.

@krousey (Contributor) commented Jun 30, 2016

@cjcullen can this be closed now?

@cjcullen (Member)

I believe so. #28232 has been merged, and gke-serial has had 6 green runs in a row since then (hopefully I'm not jinxing it).
