Ubernetes e2e flakes #26762
@madhusudancs I saw similar symptoms today, but it turns out that I'd reached the quota on my project. Do you have sufficient quota? (Specifically, I had the default quota of 24 cores, and the federation requires more than that.) I trivially increased my quota with an increase request, which got automatically approved within a few seconds.
@quinton-hoole I have a higher quota limit in the region where I am spinning up a federation. Btw, this is a single-cluster federation.
For the record, my three clusters came up fine, but then I got the following. Will debug some more in the morning:
PS: It looks like the images are not where they should be:
I'm pretty sure that this is operator error on my part. I ran this:
and after all three clusters came up cleanly, I got this...
So (stupidly), rather than tearing the whole lot down and starting over, I ran this:
and got this:
I'm busy tearing it all down and starting from scratch, with the correct env vars.
Please don't close this issue until we are reasonably confident that this isn't flaky. If it works once, that doesn't mean it is not a flake. Also, this is an umbrella issue.
@quinton-hoole Did you also run
My apologies. I had no intention of closing this issue. I must have pressed the wrong button :-(
And thanks - my federation is now up and healthy:

```
Waiting for federation-apiserver to be running...(phase= Pending)
kubectl get pods --namespace federation-e2e
```
And the e2e test for cluster registration passed!

```
[It] should allow creation of cluster api objects ...
Ran 1 of 312 Specs in 98.316 seconds
Ginkgo ran 1 suite in 1m38.613311042s
```
But my cluster controller is spewing errors, unable to list clusters. @nikhiljindal @mfanjie @madhusudancs @jianhuiz

```
E0603 16:11:08.140689 1 reflector.go:216] k8s.io/kubernetes/federation/pkg/federation-controller/cluster/clustercontroller.go:111: Failed to list *federation.Cluster: Get federation-apiserver:443?resourceVersion=0: unsupported protocol scheme "federation-apiserver"
```
Looks like we need to replace
I haven't tried it. @quinton-hoole can you try and see if that fixes the problem?
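The `unsupported protocol scheme` failure above is standard Go net/http behavior when the server URL lacks a scheme prefix, which suggests the fix is simply prepending `https://`. A minimal sketch (the endpoint path is copied from the log; everything else is illustrative):

```go
package main

import (
	"fmt"
	"net/http"
)

func main() {
	// Without a scheme, Go parses "federation-apiserver" as the URL scheme
	// itself (hyphens are legal in schemes), so the request fails before
	// any connection attempt with:
	//   unsupported protocol scheme "federation-apiserver"
	_, err := http.Get("federation-apiserver:443/apis/federation/v1alpha1/clusters")
	fmt.Println(err)

	// With the https:// prefix the URL is well-formed; the TLS trust
	// problems discussed later in this thread are a separate issue.
	_, err = http.Get("https://federation-apiserver:443/apis/federation/v1alpha1/clusters")
	fmt.Println(err)
}
```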
Random sidenote given that this is an e2e umbrella. A while back I noticed the Ubernetes Lite e2es weren't collecting logs from all nodes: https://pantheon.corp.google.com/storage/browser/kubernetes-jenkins/logs/kubernetes-e2e-gce-ubernetes-lite/4407/artifacts/
Thanks @bprashanth. Good catch. That should be pretty easy to fix.
Thanks @nikhiljindal, that improved things, but the cluster controller is still unable to connect to the API server (see below). I'm going to head into the office now; I'll continue debugging shortly:
The service exists, although those ports look weird:
Oh, never mind, the ports are fine. It was just my kubectl output that got mushed in my terminal window.
And occasionally, this error:

```
E0603 19:09:26.005217 1 reflector.go:216] k8s.io/kubernetes/federation/pkg/federation-controller/service/servicecontroller.go:232: Failed to list *federation.Cluster: Get https://federation-apiserver:443/apis/federation/v1alpha1/clusters?resourceVersion=0: x509: failed to load system roots and no roots provided
```
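That x509 error means the client ended up with no CA pool at all: it failed to load the system roots and was given none explicitly. A minimal sketch of supplying roots by hand, assuming a hypothetical CA bundle path:

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"net/http"
	"os"
)

func main() {
	// Load a CA bundle explicitly; this path is a placeholder, not the
	// actual location used by the federation control plane.
	caPEM, err := os.ReadFile("/etc/federation/ca.crt")
	if err != nil {
		panic(err)
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		panic("no certificates found in CA bundle")
	}

	// With RootCAs set, the client no longer depends on system roots, so
	// "failed to load system roots and no roots provided" goes away.
	client := &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{RootCAs: pool},
		},
	}
	resp, err := client.Get("https://federation-apiserver:443/apis/federation/v1alpha1/clusters")
	fmt.Println(resp, err)
}
```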
OK, the problem seems to be as simple as this (from the API server logs):

```
I0603 19:10:34.136146 1 logs.go:41] http: TLS handshake error from 10.240.0.13:61288: tls: client offered an unsupported, maximum protocol version of 301
```

(The 301 is hex 0x0301, i.e. the client is offering at most TLS 1.0.)
It seems that we need the correct certificates installed on the controller manager. In the meantime I tried setting up unsecured, non-HTTPS access to the API server, but that has other problems. Apparently the generic apiserver only listens on localhost for unsecured access:

```
I0603 20:35:02.441500 1 genericapiserver.go:690] Serving securely on 0.0.0.0:443
```

So, not surprisingly, the controller manager can't connect:

```
E0603 20:43:05.903765 1 reflector.go:216] k8s.io/kubernetes/federation/pkg/federation-controller/service/servicecontroller.go:231: Failed to list *api.Service: Get http://federation-apiserver:8080/api/v1/services?resourceVersion=0: dial tcp 10.0.114.72:8080: getsockopt: connection refused
```
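As an aside, a tiny sketch of why a localhost-only listener produces exactly that symptom (the ports are illustrative, not the apiserver's actual flags):

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// Reachable from anywhere that can route to this host.
	secure, err := net.Listen("tcp", "0.0.0.0:8443")
	fmt.Println(secure, err)

	// Reachable only via 127.0.0.1 on this host: a controller manager
	// running in another pod gets "connection refused" on this port.
	insecure, err := net.Listen("tcp", "127.0.0.1:8080")
	fmt.Println(insecure, err)
}
```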
Edited: I have a fix for the unsecured connection in #26694, but it's still under discussion.
Automatic merge from submit-queue

federation: Updating federation-controller-manager to use secret to get federation-apiserver's kubeconfig

Fixing the credentials problem: #26762 (comment). The admin will create a secret with the name "federation-apiserver-secret" in the k8s cluster hosting the federation control plane. This secret will contain the kubeconfig to access the federation-apiserver. federation-controller-manager will use this secret to contact the federation-apiserver. This flow is the same as the one used by all federation controllers to contact the k8s apiservers that are part of the federation.

cc @kubernetes/sig-cluster-federation @lavalamp @erictune @colhom
I'm making fairly good progress doing manual e2e testing, but seeing some curious log messages and behaviour. I'll note them down here in case anyone else is seeing similar errors. Of note below:
@nikhiljindal @madhusudancs @mml @mfanjie FYI ^^
Actually the failure to create the service seems unrelated to the failure to create the DNS record, as it recurs independently every time around the sync loop:
I'm having success with some DNS names but not others: three of the lookups resolve correctly, but one does not. Still digging into that, but I note that having all of these be a single test makes it really hard to debug what failed, or even to notice that three of the DNS entries passed.
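One way to make that debuggable would be to give each DNS name its own spec, so the Ginkgo report shows exactly which lookup failed. A sketch with hypothetical names and a placeholder helper (the real test's entries were elided above):

```go
package e2e

import (
	. "github.com/onsi/ginkgo"
)

// federatedDNSNames is illustrative; the actual names under test were
// elided in the comment above.
var federatedDNSNames = []string{
	"myservice.mynamespace",
	"myservice.mynamespace.svc.cluster.local",
	"myservice.mynamespace.myfederation.svc.cluster.local",
}

// resolveInPod is a placeholder for the suite's nslookup-in-a-pod helper.
func resolveInPod(name string) {
	// elided: exec nslookup inside a test pod and fail the spec on error
}

var _ = Describe("federated service DNS", func() {
	for _, name := range federatedDNSNames {
		name := name // capture the loop variable for the closure
		It("should resolve "+name, func() {
			// Each name now passes or fails as its own spec.
			resolveInPod(name)
		})
	}
})
```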
Sent #27504 to dump federation apiserver and controller manager logs on failure, which should help in debugging federation failures on Jenkins. I am right now looking at the federation-apiserver test. It's failing on Jenkins but works fine locally.
@quinton-hoole FYI, this log indicates the service controller is trying to create a new service on the k8s cluster federation-e2e-gce-us-central1-f, and the repeated logs might be caused by creation failures.
Automatic merge from submit-queue

fix nslookup invocation

The old way with 'sh -c' was not correct. For #26762
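For reference, the classic way an `sh -c` invocation goes wrong (illustrative command slices, not necessarily the exact bug fixed here): sh treats only the argument immediately after `-c` as the script, and any further arguments become positional parameters, so the lookup target is silently dropped.

```go
package main

import "fmt"

func main() {
	// With sh -c, only the argument immediately after -c is the script;
	// "myservice.mynamespace" becomes $0 and is never passed to nslookup,
	// so this runs a bare `nslookup`.
	wrong := []string{"sh", "-c", "nslookup", "myservice.mynamespace"}

	// Embedding the target in the -c script runs the intended command.
	right := []string{"sh", "-c", "nslookup myservice.mynamespace"}

	fmt.Println(wrong, right)
}
```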
Automatic merge from submit-queue

Clear ClusterIP in the local service before comparison. For #26762
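Presumably the problem is that the cluster allocates a ClusterIP after the service is created, so a naive deep-equality check between the desired and stored specs never matches. A minimal sketch with stand-in types (not the actual Kubernetes structs):

```go
package main

import (
	"fmt"
	"reflect"
)

// ServiceSpec is a minimal stand-in for the relevant fields of the real
// api.ServiceSpec; the controller compares full Kubernetes objects.
type ServiceSpec struct {
	ClusterIP string
	Ports     []int32
}

// specsEqual blanks the cluster-allocated ClusterIP on its copy of the
// stored spec before comparing, which is the idea behind the fix above.
func specsEqual(desired, stored ServiceSpec) bool {
	stored.ClusterIP = "" // ignore the IP the cluster assigned on creation
	return reflect.DeepEqual(desired, stored)
}

func main() {
	desired := ServiceSpec{Ports: []int32{80}}
	stored := ServiceSpec{ClusterIP: "10.0.114.72", Ports: []int32{80}}
	fmt.Println(specsEqual(desired, stored)) // true: only the allocated IP differed
}
```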
Automatic merge from submit-queue

Re-work the DNS part of the test for #26762

- Dropped the regex test and just test for nslookup exiting 0.
- Moved more setup into BeforeEach and used a nested Context for the non-local case.
- Poll inside the container using a bash loop.
- Aim for less console noise unless something goes wrong.
- Commented out the tests trying to verify that a DNS name is absent.
Automatic merge from submit-queue

federation: reverse the order of creating controller manager and secret since controller requires secret

Ref #26762. federation-controller-manager fails if the secret is not there when it comes up: https://github.com/kubernetes/kubernetes/blob/970104df3199eeb30710d1067da28f952ae36403/federation/cmd/federation-controller-manager/app/controllermanager.go#L82. Updating the bring-up scripts to first create the secrets and then create the deployments.

@kubernetes/sig-cluster-federation @mml
This looks like a bug. Namespace being dropped? I'll send in a fix shortly.
We should probably wrap the underlying
Automatic merge from submit-queue

federation: Updating KubeDNS to try finding a local service first for federation query

Ref #26762. Updating KubeDNS to try to find a local service first for a federation query. Without this change, KubeDNS always returns the DNS hostname, even if a local service exists. I have updated the code to first remove the federation name from the path if it exists, so that the default search for a local service happens. If we don't find a local service, then we try to find the DNS hostname. I will appreciate a thorough review since this is my first change to KubeDNS. #25727 was the original PR that added federation support to KubeDNS.

cc @kubernetes/sig-cluster-federation @quinton-hoole @madhusudancs @bprashanth @mml
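A sketch of the lookup order that description implies (illustrative names and zone, not the actual KubeDNS code): strip the federation label from the query, try the local service lookup first, and fall back to the federation-wide hostname only when no local service exists.

```go
package main

import (
	"fmt"
	"strings"
)

// localServices stands in for KubeDNS's normal service records.
var localServices = map[string]string{
	"myservice.mynamespace.svc.cluster.local": "10.0.0.42",
}

// resolveFederated handles a query such as
// myservice.mynamespace.myfederation.svc.cluster.local: it drops the
// federation label (the third label) and tries a local lookup first,
// falling back to the federation-wide hostname only if that fails.
func resolveFederated(query, federation string) string {
	labels := strings.Split(query, ".")
	if len(labels) > 2 && labels[2] == federation {
		local := strings.Join(append(labels[:2:2], labels[3:]...), ".")
		if ip, ok := localServices[local]; ok {
			return ip // a local service shadows the federation record
		}
	}
	// Fall back to the cross-cluster DNS hostname (illustrative zone).
	return "CNAME " + labels[0] + "." + labels[1] + "." + federation + ".example.com"
}

func main() {
	fmt.Println(resolveFederated("myservice.mynamespace.myfederation.svc.cluster.local", "myfederation"))
	// Prints 10.0.0.42 because a matching local service exists.
}
```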
I think these problems are all resolved now, and this issue has served its purpose. Closing.
I am opening this master/umbrella issue to track Ubernetes e2e test flakes. I think this issue is sufficient right now. But as we make progress we might have to fork this into sub-issues.
This issue is not just about the e2e tests themselves but also includes infrastructure related flakes such as bringing up the clusters, bringing up the federation control plane, etc.
Here is one flake that I am seeing from yesterday:
cc @colhom @nikhiljindal @kubernetes/sig-cluster-federation @jianhuiz @mfanjie