Ingress tests broken on gke #27813

Closed
j3ffml opened this issue Jun 21, 2016 · 13 comments
Labels: priority/important-soon, sig/network

@j3ffml
Contributor

j3ffml commented Jun 21, 2016

I didn't see an existing bug. This started failing this morning at ~10am:
http://kubekins.dls.corp.google.com/view/GKE/job/kubernetes-e2e-gke-ingress/

From the logs, it looks like the cluster nodes may be missing permissions. Did we perhaps change IAM settings for test clusters? cc @cjcullen

11:27:48 Jun 21 11:27:48.837: INFO: Running '/workspace/kubernetes/platforms/linux/amd64/kubectl --server=https://104.155.136.136 --kubeconfig=/workspace/.kube/config describe ing --namespace=e2e-tests-ingress-kd0lu'
11:27:48 Jun 21 11:27:48.972: INFO: stderr: ""
11:27:48 Jun 21 11:27:48.972: INFO: Name:           echomap
11:27:48 Namespace:     e2e-tests-ingress-kd0lu
11:27:48 Address:       
11:27:48 Default backend:   default-http-backend:80 (10.180.0.4:8080)
11:27:48 Rules:
11:27:48   Host     Path    Backends
11:27:48   ----     ----    --------
11:27:48   foo.bar.com  
11:27:48            /foo    echoheadersx:80 (<none>)
11:27:48   bar.baz.com  
11:27:48            /bar    echoheadersy:80 (<none>)
11:27:48            /foo    echoheadersx:80 (<none>)
11:27:48 Annotations:
11:27:48 Events:
11:27:48   FirstSeen    LastSeen    Count   From                SubobjectPath   Type        Reason      Message
11:27:48   ---------    --------    -----   ----                -------------   --------    ------      -------
11:27:48   15m      15m     1   {loadbalancer-controller }          Normal      ADD     e2e-tests-ingress-kd0lu/echomap
11:27:48   15m      3m      4495    {loadbalancer-controller }          Warning     GCE :Quota  googleapi: Error 403: Insufficient Permission, insufficientPermissions
j3ffml added the priority/important-soon, sig/network, and team/cluster labels Jun 21, 2016
j3ffml added this to the v1.3 milestone Jun 21, 2016
@bprashanth
Contributor

As noted, this looks related to permissions. @kubernetes/goog-gke, how do I reproduce? The project has enough quota, yet the test fails to acquire the load balancer IP. I haven't looked at the GKE master logs yet.

@j3ffml
Contributor Author

j3ffml commented Jun 21, 2016

Actually, the 403 may be a misleading or bad error code from GCE. The failures started with the build that included #27741, so this may just be kubernetes-retired/contrib#1246

@j3ffml
Contributor Author

j3ffml commented Jun 21, 2016

cc @zmerlynn

@zmerlynn
Member

That's kind of disturbing. The Ingress container should have its own vendored version of the code, so shouldn't it just be running the old code?

@zmerlynn
Member

There's also no similar failure in kubernetes-e2e-gce-ingress.

@bprashanth
Contributor

Yeah, I think what happened was that we updated gce.conf with node-instance-prefix and the vendored Kubernetes library didn't understand it. The gce.conf file changed, right? If that's the case, just checking in the glbc version bump should fix it, I think.

@zmerlynn
Member

@bprashanth: Yeah, we just worked out the same thing. It turns out, even with vendored code, this is not forward-compatible because of the gce.conf change.
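
The failure mode described here, a strict reader rejecting a key it doesn't know, can be sketched in a few lines. This is an illustration in Python, not the gcfg library or the GCE provider code. The `KNOWN_GLOBAL_KEYS` set and its key spellings are assumptions inferred from the struct fields printed in the provider logs, `load_global` is a hypothetical helper, and the config values are made up.

```python
import configparser

# Keys an older, vendored reader knows about. The spellings are assumed
# from the config struct fields that the provider logs print.
KNOWN_GLOBAL_KEYS = {
    "token-url", "token-body", "project-id",
    "network-name", "node-tags", "multizone",
}


class UnknownVariableError(ValueError):
    """Raised when the config contains a key the reader does not know."""


def load_global(text, strict=True):
    """Parse the [global] section; hypothetical helper for illustration."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    settings = {}
    for key, value in parser.items("global"):
        if key in KNOWN_GLOBAL_KEYS:
            settings[key] = value
        elif strict:
            # A strict reader fails the whole load on one unknown key.
            raise UnknownVariableError(
                f'invalid variable: section "global" subsection "" '
                f'variable "{key}"'
            )
        # Otherwise a tolerant reader silently skips the unknown key.
    return settings


# A config written by a newer component (values are made up).
NEW_CONF = """
[global]
token-url = https://example.invalid/token
node-instance-prefix = gke-example
"""

try:
    load_global(NEW_CONF)
except UnknownVariableError as err:
    print("old strict reader:", err)

print("tolerant reader:", load_global(NEW_CONF, strict=False))
```

With strict parsing, vendoring pins the code but not the file it reads, so any new key written by a newer component breaks every older reader until it is rebuilt.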

@bprashanth
Contributor

Even the GCE one doesn't read it, but somehow it doesn't seem to matter:

On GCE

I0621 22:46:29.189048       5 cluster_manager.go:230] Reading config from path /etc/gce.conf
E0621 22:46:29.192929       5 gce.go:240] Couldn't read config: invalid variable: section "global" subsection "" variable "node-instance-prefix"
W0621 22:46:29.192953       5 cluster_manager.go:209] Failed to retrieve cloud interface, retrying: invalid variable: section "global" subsection "" variable "node-instance-prefix"
I0621 22:46:39.195850       5 gce.go:243] Using GCE provider config {Global:{TokenURL: TokenBody: ProjectID: NetworkName: NodeTags:[] Multizone:false}}
I0621 22:46:39.195902       5 gce.go:282] Using existing Token Source &oauth2.reuseTokenSource{new:google.computeSource{account:""}, mu:sync.Mutex{state:0, sema:0x0}, t:(*oauth2.Token)(nil)}
I0621 22:46:39.982502       5 cluster_manager.go:243] Successfully loaded cloudprovider using config "/etc/gce.conf"
I0621 22:46:40.070218       5 controller.go:193] Starting loadbalancer controller
I0621 22:52:32.370493       5 event.go:216] Event(api.ObjectReference{Kind:"Ingress", Namespace:"e2e-tests-ingress-rpy0k", Name:"static-ip", UID:"d6a1a0d4-3802-11e6-8fd0-42010af00002", APIVersion:"extensions", ResourceVersion:"419", FieldPath:""}): type: 'Normal' reason: 'ADD' e2e-tests-ingress-rpy0k/static-ip
I0621 22:52:32.538173       5 instances.go:56] Creating instance group k8s-ig--5c2d858a34cd1f19

On GKE (I finally paged in sql query syntax and did a binary search on metadata.timestamp to when this started):

80  2016-06-21 17:41:31 UTC INFO    cluster_manager.go:230  Reading config from path /etc/gce.conf   
81  2016-06-21 17:41:31 UTC WARNING cluster_manager.go:209  Failed to retrieve cloud interface, retrying: invalid variable: section "global" subsection "" variable "node-instance-prefix"   
82  2016-06-21 17:41:31 UTC ERROR   gce.go:240  Couldn't read config: invalid variable: section "global" subsection "" variable "node-instance-prefix"   
83  2016-06-21 17:41:31 UTC INFO    configmaps.go:92    Successfully stored uid "9fdc10af230a9db7" in config map kube-system/ingress-uid     
84  2016-06-21 17:41:41 UTC WARNING pools.go:88 Failed to list: googleapi: Error 403: Insufficient Permission, insufficientPermissions   
85  2016-06-21 17:41:41 UTC INFO    gce.go:282  Using existing Token Source &oauth2.reuseTokenSource{new:google.computeSource{account:""}, mu:sync.Mutex{state:0, sema:0x0}, t:(*oauth2.Token)(nil)}     
86  2016-06-21 17:41:41 UTC INFO    cluster_manager.go:243  Successfully loaded cloudprovider using config "/etc/gce.conf"   
87  2016-06-21 17:41:41 UTC INFO    gce.go:243  Using GCE provider config {Global:{TokenURL: TokenBody: ProjectID: NetworkName: NodeTags:[] Multizone:false}}    
88  2016-06-21 17:41:41 UTC INFO    controller.go:193   Starting loadbalancer controller     
89  2016-06-21 17:41:57 UTC INFO    cluster_manager.go:89   Reporting cluster as healthy, but unable to list backends: googleapi: Error 403: Insufficient Permission, insufficientPermissions    
90  2016-06-21 17:42:11 UTC WARNING pools.go:88 Failed to list: googleapi: Error 403: Insufficient Permission, insufficientPermissions

Is GKE more strict about something? Or is the gce.conf failure just a red herring?
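
For anyone repeating the binary search on metadata.timestamp mentioned above, the procedure amounts to bisecting a time window with a "does the error appear by this time?" check. The sketch below is Python, not the SQL that was actually run. `first_failure` and `has_error` are hypothetical names, the onset time in the usage lines is made up, and the search assumes the error is absent before some instant and present from then on.

```python
from datetime import datetime, timedelta


def first_failure(lo, hi, has_error, resolution=timedelta(seconds=1)):
    """Narrow down when an error first appears.

    Assumes has_error(lo) is False, has_error(hi) is True, and the answer
    flips exactly once in between. Returns a time within `resolution`
    after the onset.
    """
    while hi - lo > resolution:
        mid = lo + (hi - lo) / 2
        if has_error(mid):
            hi = mid  # error already present: onset is at or before mid
        else:
            lo = mid  # still healthy: onset is after mid
    return hi


# Usage with a made-up onset; in practice has_error would run a log query.
onset = datetime(2016, 6, 21, 12, 0, 0)
found = first_failure(
    datetime(2016, 6, 21, 0, 0, 0),
    datetime(2016, 6, 22, 0, 0, 0),
    lambda t: t >= onset,
)
print(found - onset < timedelta(seconds=1))
```

Each probe halves the window, so a full day narrows to one second in about 17 queries.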

@zmerlynn
Member

@bprashanth: On GKE, we need the TokenURL to work.
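
Read together with the logs above, this suggests why the same parse error is harmless on GCE but not on GKE: after the parse fails, the controller retries and ends up with an empty provider config, so no token URL is set. A rough Python sketch of that behavior follows. It models what the logs show rather than the real provider code. `ProviderConfig`, `parse_strict`, `load_or_default`, `token_source`, and the returned labels are all hypothetical, and the config values are made up.

```python
from dataclasses import dataclass


@dataclass
class ProviderConfig:
    """Hypothetical stand-in for the provider's [global] settings."""
    token_url: str = ""
    project_id: str = ""


def parse_strict(pairs, known=("token_url", "project_id")):
    """Build a config, rejecting any key outside `known`."""
    for key in pairs:
        if key not in known:
            raise ValueError(f"invalid variable: {key}")
    return ProviderConfig(**pairs)


def load_or_default(pairs):
    """On a parse error, fall back to an empty config (as the logs suggest)."""
    try:
        return parse_strict(pairs)
    except ValueError:
        return ProviderConfig()


def token_source(cfg):
    """Pick a credential source: the configured URL if set, otherwise the
    instance's default credentials."""
    return "token-url" if cfg.token_url else "instance-default"


conf = {
    "token_url": "https://example.invalid/token",  # made-up value
    "node_instance_prefix": "gke-example",         # the key old code rejects
}
cfg = load_or_default(conf)
print(cfg)                # ProviderConfig(token_url='', project_id='')
print(token_source(cfg))  # instance-default
```

Under this model, one unknown key discards the whole file, including the token URL. A cluster whose instance credentials are sufficient keeps working, while one that depends on the token URL would see permission errors. That this is exactly what happens on GKE is an assumption, not something the logs prove.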

@zmerlynn
Member

I believe this will be fixed by #27814, but it has the same root cause as #27821. So we need a bump there (or a revert).

@zmerlynn
Member

zmerlynn commented Jun 21, 2016

Actually, I think I can fix this by making it a gcfg multi value, then enforcing there's exactly one or zero.

Never mind, that obviously still involves making a GCE provider change. :)
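
The idea floated here, accepting the variable as a multi-valued field and then enforcing at most one value, looks roughly like this. It is sketched in Python rather than gcfg/Go, and `zero_or_one` is a hypothetical helper.

```python
def zero_or_one(values, name):
    """Return the single value, or None if absent; reject duplicates."""
    if len(values) > 1:
        raise ValueError(
            f"{name}: expected at most one value, got {len(values)}"
        )
    return values[0] if values else None


print(zero_or_one([], "node-instance-prefix"))               # None
print(zero_or_one(["gke-example"], "node-instance-prefix"))  # gke-example
```

As noted, this still changes the reader itself, so it does nothing for copies of the provider that are already vendored.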

@zmerlynn
Member

Tentatively calling this fixed. Please reopen if it's still failing.

@bbzg

bbzg commented Jul 15, 2017

Since this thread is quite old, and probably unrelated, I have created a new issue on ingress instead: kubernetes/ingress-nginx#975

Old text:

I recently upgraded to Kubernetes 1.7 with RBAC on GKE, and I am seeing something similar:
  FirstSeen	LastSeen	Count	From			SubObjectPath	Type		Reason		Message
  ---------	--------	-----	----			-------------	--------	------		-------
  6h		6m		75	loadbalancer-controller			Warning		GCE :Quota	googleapi: Error 403: Insufficient Permission, insufficientPermissions

GKE Cluster logs (slightly redacted):

{
 insertId:  "x"   
 jsonPayload: {
  apiVersion:  "v1"    
  involvedObject: {
   apiVersion:  "extensions"     
   kind:  "Ingress"     
   name:  "ingress-testing"     
   namespace:  "default"     
   resourceVersion:  "425826"     
   uid:  "x"     
  }
  kind:  "Event"    
  message:  "googleapi: Error 403: Insufficient Permission, insufficientPermissions"    
  metadata: {
   creationTimestamp:  "2017-07-15T12:54:37Z"     
   name:  "ingress-testing.x"     
   namespace:  "default"     
   resourceVersion:  "53520"     
   selfLink:  "/api/v1/namespaces/default/events/ingress-testing.14d1822c5ed30595"     
   uid:  "x"     
  }
  reason:  "GCE :Quota"    
  source: {
   component:  "loadbalancer-controller"     
  }
  type:  "Warning"    
 }
 logName:  "projects/x/logs/events"   
 receiveTimestamp:  "2017-07-15T19:11:59.117152623Z"   
 resource: {
  labels: {
   cluster_name:  "app-cluster"     
   location:  ""     
   project_id:  "x"     
  }
  type:  "gke_cluster"    
 }
 severity:  "WARNING"   
 timestamp:  "2017-07-15T19:11:54Z"   
}

I have tried to figure out what the cause might be, but have not found anything except this thread. I suppose I should post something on Stack Overflow, but I am trying here first.

What can I do to get Ingress working again in my cluster?

Thanks!


4 participants