Ingress tests broken on gke #27813

j3ffml · 2016-06-21T22:52:34Z

I didn't see an existing bug for this. The job started failing this morning at around 10am:
http://kubekins.dls.corp.google.com/view/GKE/job/kubernetes-e2e-gke-ingress/

From the logs, it looks like the cluster nodes may be missing permissions. Did we perhaps change IAM settings for test clusters? cc @cjcullen

11:27:48 Jun 21 11:27:48.837: INFO: Running '/workspace/kubernetes/platforms/linux/amd64/kubectl --server=https://104.155.136.136 --kubeconfig=/workspace/.kube/config describe ing --namespace=e2e-tests-ingress-kd0lu'
11:27:48 Jun 21 11:27:48.972: INFO: stderr: ""
11:27:48 Jun 21 11:27:48.972: INFO: Name:           echomap
11:27:48 Namespace:     e2e-tests-ingress-kd0lu
11:27:48 Address:       
11:27:48 Default backend:   default-http-backend:80 (10.180.0.4:8080)
11:27:48 Rules:
11:27:48   Host     Path    Backends
11:27:48   ----     ----    --------
11:27:48   foo.bar.com  
11:27:48            /foo    echoheadersx:80 (<none>)
11:27:48   bar.baz.com  
11:27:48            /bar    echoheadersy:80 (<none>)
11:27:48            /foo    echoheadersx:80 (<none>)
11:27:48 Annotations:
11:27:48 Events:
11:27:48   FirstSeen    LastSeen    Count   From                SubobjectPath   Type        Reason      Message
11:27:48   ---------    --------    -----   ----                -------------   --------    ------      -------
11:27:48   15m      15m     1   {loadbalancer-controller }          Normal      ADD     e2e-tests-ingress-kd0lu/echomap
11:27:48   15m      3m      4495    {loadbalancer-controller }          Warning     GCE :Quota  googleapi: Error 403: Insufficient Permission, insufficientPermissions


bprashanth · 2016-06-21T23:05:27Z

As noted, this looks related to permissions. @kubernetes/goog-gke, how do I reproduce? The project has enough quota, and the test fails to acquire the load balancer IP. I haven't looked up the GKE master logs yet.

j3ffml · 2016-06-21T23:27:03Z

Actually, the 403 may be a misleading or incorrect error code from GCE. The failures started with the build that included #27741, so this may just be kubernetes-retired/contrib#1246.

j3ffml · 2016-06-21T23:28:57Z

cc @zmerlynn

zmerlynn · 2016-06-21T23:30:56Z

That's kind of disturbing. Shouldn't the Ingress container have its own vendored version of the code and just be running the old code?

zmerlynn · 2016-06-21T23:33:44Z

There's also no similar failure in kubernetes-e2e-gce-ingress.

bprashanth · 2016-06-21T23:39:27Z

Yeah, I think what happened is that we updated gce.conf with node-instance-prefix and the vendored Kubernetes library didn't understand it. The gce.conf file changed, right? If that's the case, just checking in the glbc version bump should fix it, I think.

zmerlynn · 2016-06-21T23:42:45Z

@bprashanth: Yeah, we just worked out the same thing. It turns out, even with vendored code, this is not forward-compatible because of the gce.conf change.
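To illustrate why vendoring alone doesn't protect against this, here is a minimal Go sketch, using only the standard library, of a strict INI-style reader that rejects any [global] variable it does not recognize. It is not the gcfg library or the real GCE provider code: parseStrict, knownGlobalKeys, and the key spellings are hypothetical, chosen only so the error text mirrors the one in the glbc logs in this thread. A binary built with a fixed list of known keys fails as soon as the config file gains node-instance-prefix, no matter which version of the code it vendors.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// knownGlobalKeys is a hypothetical stand-in for the fields an older,
// vendored provider config struct knows about. It deliberately omits
// "node-instance-prefix", the key newly written to gce.conf.
var knownGlobalKeys = map[string]bool{
	"token-url":    true,
	"token-body":   true,
	"project-id":   true,
	"network-name": true,
	"node-tags":    true,
	"multizone":    true,
}

// parseStrict reads a minimal INI-style config and returns an error for
// any [global] variable not in knownGlobalKeys, like a strict parser.
func parseStrict(conf string) (map[string]string, error) {
	values := map[string]string{}
	section := ""
	sc := bufio.NewScanner(strings.NewReader(conf))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		// Skip blank lines and comments.
		if line == "" || strings.HasPrefix(line, ";") || strings.HasPrefix(line, "#") {
			continue
		}
		// Section header, e.g. "[global]".
		if strings.HasPrefix(line, "[") && strings.HasSuffix(line, "]") {
			section = strings.ToLower(strings.TrimSpace(line[1 : len(line)-1]))
			continue
		}
		key, val, ok := strings.Cut(line, "=")
		if !ok {
			return nil, fmt.Errorf("malformed line: %q", line)
		}
		key = strings.ToLower(strings.TrimSpace(key))
		// Unknown variables are a hard error, not a warning.
		if section == "global" && !knownGlobalKeys[key] {
			return nil, fmt.Errorf("invalid variable: section %q subsection %q variable %q", section, "", key)
		}
		values[key] = strings.TrimSpace(val)
	}
	return values, sc.Err()
}

func main() {
	// Hypothetical config contents; only the key name comes from the logs.
	oldConf := "[global]\nproject-id = my-project\n"
	newConf := oldConf + "node-instance-prefix = gke-node\n"

	if _, err := parseStrict(oldConf); err == nil {
		fmt.Println("old config: ok")
	}
	if _, err := parseStrict(newConf); err != nil {
		fmt.Println("new config:", err)
	}
}
```

Running it prints `old config: ok` and then `new config: invalid variable: section "global" subsection "" variable "node-instance-prefix"`. A parser that skipped unknown keys would tolerate the new key, but the fix discussed in this thread is to bump the glbc version so its vendored provider code understands the new key.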

bprashanth · 2016-06-21T23:42:46Z

Even the GCE one doesn't read it, but somehow it doesn't seem to matter there:

On GCE

21 22:46:29.189048       5 cluster_manager.go:230] Reading config from path /etc/gce.conf
E0621 22:46:29.192929       5 gce.go:240] Couldn't read config: invalid variable: section "global" subsection "" variable "node-instance-prefix"
W0621 22:46:29.192953       5 cluster_manager.go:209] Failed to retrieve cloud interface, retrying: invalid variable: section "global" subsection "" variable "node-instance-prefix"
I0621 22:46:39.195850       5 gce.go:243] Using GCE provider config {Global:{TokenURL: TokenBody: ProjectID: NetworkName: NodeTags:[] Multizone:false}}
I0621 22:46:39.195902       5 gce.go:282] Using existing Token Source &oauth2.reuseTokenSource{new:google.computeSource{account:""}, mu:sync.Mutex{state:0, sema:0x0}, t:(*oauth2.Token)(nil)}
I0621 22:46:39.982502       5 cluster_manager.go:243] Successfully loaded cloudprovider using config "/etc/gce.conf"
I0621 22:46:40.070218       5 controller.go:193] Starting loadbalancer controller
I0621 22:52:32.370493       5 event.go:216] Event(api.ObjectReference{Kind:"Ingress", Namespace:"e2e-tests-ingress-rpy0k", Name:"static-ip", UID:"d6a1a0d4-3802-11e6-8fd0-42010af00002", APIVersion:"extensions", ResourceVersion:"419", FieldPath:""}): type: 'Normal' reason: 'ADD' e2e-tests-ingress-rpy0k/static-ip
I0621 22:52:32.538173       5 instances.go:56] Creating instance group k8s-ig--5c2d858a34cd1f19

On GKE (I finally paged in SQL query syntax and binary-searched on metadata.timestamp to find when this started):

2016-06-21 17:41:31 UTC INFO    cluster_manager.go:230  Reading config from path /etc/gce.conf
2016-06-21 17:41:31 UTC WARNING cluster_manager.go:209  Failed to retrieve cloud interface, retrying: invalid variable: section "global" subsection "" variable "node-instance-prefix"
2016-06-21 17:41:31 UTC ERROR   gce.go:240  Couldn't read config: invalid variable: section "global" subsection "" variable "node-instance-prefix"
2016-06-21 17:41:31 UTC INFO    configmaps.go:92    Successfully stored uid "9fdc10af230a9db7" in config map kube-system/ingress-uid
2016-06-21 17:41:41 UTC WARNING pools.go:88 Failed to list: googleapi: Error 403: Insufficient Permission, insufficientPermissions
2016-06-21 17:41:41 UTC INFO    gce.go:282  Using existing Token Source &oauth2.reuseTokenSource{new:google.computeSource{account:""}, mu:sync.Mutex{state:0, sema:0x0}, t:(*oauth2.Token)(nil)}
2016-06-21 17:41:41 UTC INFO    cluster_manager.go:243  Successfully loaded cloudprovider using config "/etc/gce.conf"
2016-06-21 17:41:41 UTC INFO    gce.go:243  Using GCE provider config {Global:{TokenURL: TokenBody: ProjectID: NetworkName: NodeTags:[] Multizone:false}}
2016-06-21 17:41:41 UTC INFO    controller.go:193   Starting loadbalancer controller
2016-06-21 17:41:57 UTC INFO    cluster_manager.go:89   Reporting cluster as healthy, but unable to list backends: googleapi: Error 403: Insufficient Permission, insufficientPermissions
2016-06-21 17:42:11 UTC WARNING pools.go:88 Failed to list: googleapi: Error 403: Insufficient Permission, insufficientPermissions

Is GKE stricter about something, or is the gce.conf failure just a red herring?

zmerlynn · 2016-06-21T23:43:15Z

@bprashanth: On GKE, we need the TokenURL to work.

zmerlynn · 2016-06-21T23:49:41Z

I believe this will be fixed by #27814, but it has the same root cause as #27821. So we need a bump there (or a revert).

zmerlynn · 2016-06-21T23:57:22Z

~~Actually, I think I can fix this by making it a gcfg multi value, then enforcing there's exactly one or zero.~~

Never mind, that obviously still involves making a GCE provider change. :)

zmerlynn · 2016-06-22T01:31:07Z

Tentatively calling this fixed. Please reopen if it's still failing.

bbzg · 2017-07-15T20:05:27Z

Since this thread is quite old and probably unrelated, I have created a new issue for ingress instead: kubernetes/ingress-nginx#975

Old text

I recently upgraded to Kubernetes 1.7 with RBAC on GKE, and I am seeing something similar:

  FirstSeen	LastSeen	Count	From			SubObjectPath	Type		Reason		Message
  ---------	--------	-----	----			-------------	--------	------		-------
  6h		6m		75	loadbalancer-controller			Warning		GCE :Quota	googleapi: Error 403: Insufficient Permission, insufficientPermissions

GKE Cluster logs (slightly redacted):

{
 insertId:  "x"   
 jsonPayload: {
  apiVersion:  "v1"    
  involvedObject: {
   apiVersion:  "extensions"     
   kind:  "Ingress"     
   name:  "ingress-testing"     
   namespace:  "default"     
   resourceVersion:  "425826"     
   uid:  "x"     
  }
  kind:  "Event"    
  message:  "googleapi: Error 403: Insufficient Permission, insufficientPermissions"    
  metadata: {
   creationTimestamp:  "2017-07-15T12:54:37Z"     
   name:  "ingress-testing.x"     
   namespace:  "default"     
   resourceVersion:  "53520"     
   selfLink:  "/api/v1/namespaces/default/events/ingress-testing.14d1822c5ed30595"     
   uid:  "x"     
  }
  reason:  "GCE :Quota"    
  source: {
   component:  "loadbalancer-controller"     
  }
  type:  "Warning"    
 }
 logName:  "projects/x/logs/events"   
 receiveTimestamp:  "2017-07-15T19:11:59.117152623Z"   
 resource: {
  labels: {
   cluster_name:  "app-cluster"     
   location:  ""     
   project_id:  "x"     
  }
  type:  "gke_cluster"    
 }
 severity:  "WARNING"   
 timestamp:  "2017-07-15T19:11:54Z"   
}

I have tried to figure out the cause but have found nothing except this thread. I suppose I should post on Stack Overflow, but I'm trying here first.

What can I do to get Ingress working again in my cluster?

Thanks!

j3ffml added the priority/important-soon, sig/network, and team/cluster labels Jun 21, 2016

j3ffml added this to the v1.3 milestone Jun 21, 2016

j3ffml assigned bprashanth Jun 21, 2016

This was referenced Jun 22, 2016:

cluster autoscaling tests broken on gce #27821 (closed)

Transition gce.conf (and any other gcfg users) to something like text proto #27827 (closed)

zmerlynn closed this as completed Jun 22, 2016

bprashanth mentioned this issue Jun 22, 2016:

Ingress tests panic invoking framework.BeforeEach #27486 (closed)
