Ubernetes e2e flakes #26762

Closed
madhusudancs opened this issue Jun 3, 2016 · 51 comments
@madhusudancs
Contributor

I am opening this master/umbrella issue to track Ubernetes e2e test flakes. I think a single issue is sufficient for now, but as we make progress we may have to split it into sub-issues.

This issue is not just about the e2e tests themselves but also includes infrastructure related flakes such as bringing up the clusters, bringing up the federation control plane, etc.

Here is one flake that I saw yesterday:

$ FEDERATION=true E2E_ZONES="us-central1-f" KUBE_GCE_ZONE="us-central1-f" go run hack/e2e.go -v --up
...
...
...
NAME                                     LOCATION       SCOPE  BASE_INSTANCE_NAME                       SIZE  TARGET_SIZE  INSTANCE_TEMPLATE                           AUTOSCALED
madhusudancs-us-central1-f-minion-group  us-central1-f  zone   madhusudancs-us-central1-f-minion-group        3            madhusudancs-us-central1-f-minion-template
Waiting for group to become stable, current operations: creating: 3
Waiting for group to become stable, current operations: creating: 3
Waiting for group to become stable, current operations: creating: 3
Waiting for group to become stable, current operations: creating: 3
Group is stable
INSTANCE_GROUPS=madhusudancs-us-central1-f-minion-group
NODE_NAMES=madhusudancs-us-central1-f-minion-group-5wk9 madhusudancs-us-central1-f-minion-group-6ys7 madhusudancs-us-central1-f-minion-group-l1vs
Using master: madhusudancs-us-central1-f-master (external IP: <REDACTED_IP>)
Waiting up to 300 seconds for cluster initialization.

  This will continually check to see if the API for kubernetes is reachable.
  This may time out if there was some uncaught error during start up.

...................................................................................................................................................Cluster failed to initialize within 300 seconds.
2016/06/02 14:18:06 e2e.go:218: Error running up: exit status 1
2016/06/02 14:18:06 e2e.go:214: Step 'up' finished in 8m0.261564635s
2016/06/02 14:18:06 e2e.go:114: Error starting e2e cluster. Aborting.
exit status 1

cc @colhom @nikhiljindal @kubernetes/sig-cluster-federation @jianhuiz @mfanjie

@ghost

ghost commented Jun 3, 2016

@madhusudancs I saw similar symptoms today, but it turned out that I'd reached the quota on my project. Do you have sufficient quota? Specifically, I had the default quota of 24 cores, and the federation requires more than that. I trivially increased my quota with an increase request, which was automatically approved within a few seconds.

@madhusudancs
Contributor Author

@quinton-hoole I have a higher quota limit in the region where I am spinning up a federation. Btw, this is a single cluster federation.

@ghost

ghost commented Jun 3, 2016

For the record, my three clusters came up fine, but then I got the following. Will debug some more in the morning:

$ FEDERATION=true E2E_ZONES="us-central1-a us-central1-b us-central1-f" KUBE_GCE_ZONE="us-central1-f" FEDERATION_PUSH_REPO_BASE="gcr.io/quinton-etcd-testing" KUBERNETES_PROVIDER="gce" federation/cluster/federated-up.sh 
namespace "federation-e2e" created
No resources found
service "federation-apiserver" created
attempting to get federation-apiserver loadbalancer hostname (1 / 30)
attempting to get federation-apiserver loadbalancer hostname (2 / 30)
attempting to get federation-apiserver loadbalancer hostname (3 / 30)
attempting to get federation-apiserver loadbalancer hostname (4 / 30)
attempting to get federation-apiserver loadbalancer hostname (5 / 30)
attempting to get federation-apiserver loadbalancer hostname (6 / 30)
Found federation-apiserver host at 104.197.87.156
deployment "federation-apiserver" created
secret "federation-apiserver-secrets" created
deployment "federation-controller-manager" created
Waiting for federation-apiserver to be running...(phase= Pending)
...
Waiting for federation-apiserver to be running...(phase= Pending)
federation-apiserver pod is not running! giving up.

@ghost ghost closed this as completed Jun 3, 2016
@ghost

ghost commented Jun 3, 2016

PS: It looks like the images are not where they should be:

kubectl --server=https://146.148.37.247 get pods --namespace federation-e2e
NAME                                             READY     STATUS             RESTARTS   AGE
federation-apiserver-3663518660-nwge3            1/2       ImagePullBackOff   0          5m
federation-controller-manager-2288429398-98142   0/1       ImagePullBackOff   0          5m

@ghost

ghost commented Jun 3, 2016

I'm pretty sure that this is operator error on my part.

I ran this:

FEDERATION=true E2E_ZONES="us-central1-a us-central1-b us-central1-f" KUBE_GCE_ZONE="us-central1-f" FEDERATION_PUSH_REPO_BASE="gcr.io/quinton-etcd-testing" go run hack/e2e.go -v --up

and after all three clusters came up cleanly, I got this...

k8s.io/kubernetes/federation/cluster/common.sh: line 75: KUBERNETES_PROVIDER: unbound variable
2016/06/03 00:09:08 e2e.go:218: Error running up: exit status 1
2016/06/03 00:09:08 e2e.go:214: Step 'up' finished in 23m42.294835356s
2016/06/03 00:09:08 e2e.go:114: Error starting e2e cluster. Aborting.
exit status 1

So (stupidly), rather than tearing the whole lot down and starting over, I ran this:

$ FEDERATION=true E2E_ZONES="us-central1-a us-central1-b us-central1-f" KUBE_GCE_ZONE="us-central1-f" FEDERATION_PUSH_REPO_BASE="gcr.io/quinton-etcd-testing" KUBERNETES_PROVIDER="gce" federation/cluster/federated-up.sh -v

and got this:

namespace "federation-e2e" created
No resources found
service "federation-apiserver" created
attempting to get federation-apiserver loadbalancer hostname (1 / 30)
attempting to get federation-apiserver loadbalancer hostname (2 / 30)
attempting to get federation-apiserver loadbalancer hostname (3 / 30)
attempting to get federation-apiserver loadbalancer hostname (4 / 30)
attempting to get federation-apiserver loadbalancer hostname (5 / 30)
attempting to get federation-apiserver loadbalancer hostname (6 / 30)
Found federation-apiserver host at 104.197.87.156
deployment "federation-apiserver" created
secret "federation-apiserver-secrets" created
deployment "federation-controller-manager" created
Waiting for federation-apiserver to be running...(phase= Pending)
...
Waiting for federation-apiserver to be running...(phase= Pending)
federation-apiserver pod is not running! giving up.

I'm busy tearing it all down and starting from scratch, with the correct env vars.

@madhusudancs
Contributor Author

Please don't close this issue until we are reasonably confident that this isn't a flake. A single successful run doesn't prove that it isn't one. Also, this is an umbrella issue.

@madhusudancs madhusudancs reopened this Jun 3, 2016
@madhusudancs
Contributor Author

@quinton-hoole Did you also run federation/cluster/federation-push.sh before running federation/cluster/federation-up.sh or hack/e2e.go --up? Also, please see the discussion here - #26656

@ghost

ghost commented Jun 3, 2016

My apologies. I had no intention of closing this issue. I must have pressed the wrong button :-(

@ghost

ghost commented Jun 3, 2016

And thanks - my federation is now up and healthy:

Waiting for federation-apiserver to be running...(phase= Pending)
Waiting for federation-apiserver to be running...(phase= Running)
federation-apiserver pod is running!
Waiting for federation-controller-manager to be running...(phase= Running)
federation-controller-manager pod is running!
cluster "federated-cluster" set.
user "federated-cluster" set.
context "federated-cluster" set.
Wrote config for federated-cluster to /Users/quinton/.kube/config

kubectl get pods --namespace federation-e2e
NAME                                             READY     STATUS    RESTARTS   AGE
federation-apiserver-608821220-rsid0             2/2       Running   0          2m
federation-controller-manager-3887574383-h40yw   1/1       Running   0          2m

@ghost

ghost commented Jun 3, 2016

And the e2e test for cluster registration passed!

[It] should allow creation of cluster api objects
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/federation-apiserver.go:63
Jun 3 09:02:42.316: INFO: Creating cluster object: federation-e2e-gce-us-central1-a (https://104.197.87.156)
Jun 3 09:02:42.746: INFO: Creating cluster object: federation-e2e-gce-us-central1-b (https://104.197.70.145)
Jun 3 09:02:42.812: INFO: Creating cluster object: federation-e2e-gce-us-central1-f (https://23.236.59.255)

...

Ran 1 of 312 Specs in 98.316 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 311 Skipped PASS

Ginkgo ran 1 suite in 1m38.613311042s
Test Suite Passed

@ghost

ghost commented Jun 3, 2016

But my cluster controller is spewing errors, unable to list clusters. @nikhiljindal @mfanjie @madhusudancs @jianhuiz

E0603 16:11:08.140689 1 reflector.go:216] k8s.io/kubernetes/federation/pkg/federation-controller/cluster/clustercontroller.go:111: Failed to list *federation.Cluster: Get federation-apiserver:443?resourceVersion=0: unsupported protocol scheme "federation-apiserver"

@nikhiljindal
Contributor

Looks like we need to replace --master={{.FEDERATION_APISERVER_DEPLOYMENT_NAME}}:443 with --master=https://{{.FEDERATION_APISERVER_DEPLOYMENT_NAME}}:443 here:

- --master={{.FEDERATION_APISERVER_DEPLOYMENT_NAME}}:443

I haven't tried it. @quinton-hoole can you try it and see if that fixes the problem?

@bprashanth
Contributor

Random sidenote, given that this is an e2e umbrella: a while back I noticed that the ubernetes lite e2es weren't collecting logs from all nodes: https://pantheon.corp.google.com/storage/browser/kubernetes-jenkins/logs/kubernetes-e2e-gce-ubernetes-lite/4407/artifacts/

@ghost

ghost commented Jun 3, 2016

Thanks @bprashanth. Good catch. That should be pretty easy to fix. I imagine that the e2e log collector is getting confused by the multi-zone cluster.

@ghost

ghost commented Jun 3, 2016

Thanks @nikhiljindal, that improved things, but the cluster controller is still unable to connect to the API server (see below). I'm going to head into the office now and will continue debugging shortly:

kubectl log federation-controller-manager-3548753978-1gv2b --namespace federation-e2e
...
E0603 18:01:44.649876       1 reflector.go:216] k8s.io/kubernetes/federation/pkg/federation-controller/service/servicecontroller.go:232: Failed to list *federation.Cluster: Get https://federation-apiserver:443/apis/federation/v1alpha1/clusters?resourceVersion=0: dial tcp 10.0.240.175:443: getsockopt: no route to host
E0603 18:02:02.661837       1 reflector.go:216] k8s.io/kubernetes/federation/pkg/federation-controller/cluster/clustercontroller.go:111: Failed to list *federation.Cluster: Get https://federation-apiserver:443/apis/federation/v1alpha1/clusters?resourceVersion=0: dial tcp: i/o timeout
E0603 18:02:08.667227       1 reflector.go:216] k8s.io/kubernetes/federation/pkg/federation-controller/service/servicecontroller.go:231: Failed to list *api.Service: Get https://federation-apiserver:443/api/v1/services?resourceVersion=0: dial tcp 10.0.240.175:443: i/o timeout

The service exists, although those ports look weird:

$ kubectl get services --namespace federation-e2e
NAME                   CLUSTER-IP     EXTERNAL-IP      PORT(S)   AGE
federation-apiserver   10.0.240.175   146.148.79.202   443/TCP   7m
$ kubectl describe services --namespace federation-e2e
Name:           federation-apiserver
Namespace:      federation-e2e
Labels:         app=federated-cluster
Selector:       app=federated-cluster,module=federation-apiserver
Type:           LoadBalancer
IP:         10.0.240.175
LoadBalancer Ingress:   146.148.79.202
Port:           https   443/TCP
NodePort:       https   32152/TCP
Endpoints:      10.180.1.2:443
Session Affinity:   None
Events:
  FirstSeen LastSeen    Count   From            SubobjectPath   Type        Reason          Message
  --------- --------    -----   ----            -------------   --------    ------          -------
  8m        8m      1   {service-controller }           Normal      CreatingLoadBalancer    Creating load balancer
  7m        7m      1   {service-controller }           Normal      CreatedLoadBalancer Created load balancer

@ghost

ghost commented Jun 3, 2016

Oh never mind, the ports are fine. It was just my kubectl output that got mushed in my terminal window.

@ghost

ghost commented Jun 3, 2016

And occasionally, this error:

E0603 19:09:26.005217 1 reflector.go:216] k8s.io/kubernetes/federation/pkg/federation-controller/service/servicecontroller.go:232: Failed to list *federation.Cluster: Get https://federation-apiserver:443/apis/federation/v1alpha1/clusters?resourceVersion=0: x509: failed to load system roots and no roots provided
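"x509: failed to load system roots and no roots provided" is what Go's `crypto/x509` reports when a TLS client has no explicit `RootCAs` and the system certificate bundle cannot be loaded, which can happen in a minimal container image without CA certificates. A sketch of a client config that trusts a specific CA instead; the CA file path is hypothetical, and `tlsConfigFromCA` is an illustrative helper:

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
	"os"
)

// tlsConfigFromCA builds a client TLS config that trusts only the CA(s) in caPEM.
// With RootCAs set explicitly, verification never consults the system roots.
func tlsConfigFromCA(caPEM []byte, serverName string) (*tls.Config, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no PEM certificates found in CA data")
	}
	return &tls.Config{RootCAs: pool, ServerName: serverName}, nil
}

func main() {
	// Hypothetical mount path for the federation API server's CA certificate.
	const caPath = "/etc/federation/ca.crt"
	caPEM, err := os.ReadFile(caPath)
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading CA:", err)
		os.Exit(1)
	}
	cfg, err := tlsConfigFromCA(caPEM, "federation-apiserver")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	client := &http.Client{Transport: &http.Transport{TLSClientConfig: cfg}}
	// Requests to https://federation-apiserver:443 made with client verify the
	// server against the supplied CA instead of the system roots.
	_ = client
}
```

The server certificate must also be valid for the name the client checks (`federation-apiserver` here); a certificate issued for a different name would still fail verification.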

@ghost

ghost commented Jun 3, 2016

OK, the problem seems to be as simple as this (from API server logs):

I0603 19:10:34.136146 1 logs.go:41] http: TLS handshake error from 10.240.0.13:61288: tls: client offered an unsupported, maximum protocol version of 301
I0603 19:12:53.040150 1 logs.go:41] http: TLS handshake error from 10.240.0.11:50186: remote error: bad certificate

@ghost

ghost commented Jun 3, 2016

It seems that we need the correct certificates installed on the controller manager.

In the meantime I tried setting up unsecured, non-https access to the API server, but that has other problems. Apparently the generic apiserver only listens on localhost for unsecured access:

I0603 20:35:02.441500 1 genericapiserver.go:690] Serving securely on 0.0.0.0:443
I0603 20:35:03.222121 1 genericapiserver.go:703] Using self-signed cert (/var/run/kubernetes/apiserver.crt, /var/run/kubernetes/apiserver.key)
I0603 20:35:03.222214 1 genericapiserver.go:734] Serving insecurely on 127.0.0.1:8080

So, not surprisingly, controller manager can't connect:

E0603 20:43:05.903765 1 reflector.go:216] k8s.io/kubernetes/federation/pkg/federation-controller/service/servicecontroller.go:231: Failed to list *api.Service: Get http://federation-apiserver:8080/api/v1/services?resourceVersion=0: dial tcp 10.0.114.72:8080: getsockopt: connection refused
E0603 20:43:05.903783 1 reflector.go:216] k8s.io/kubernetes/federation/pkg/federation-controller/service/servicecontroller.go:232: Failed to list *federation.Cluster: Get http://federation-apiserver:8080/apis/federation/v1alpha1/clusters?resourceVersion=0: dial tcp 10.0.114.72:8080: getsockopt: connection refused
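The "connection refused" errors follow from the bind address: a listener on 127.0.0.1:8080 only accepts connections that arrive on the pod's loopback interface, while traffic sent to the service's cluster IP is forwarded to the pod IP, where nothing listens on port 8080. A small stdlib-only helper (hypothetical, for illustration) that flags such bind addresses:

```go
package main

import (
	"fmt"
	"net"
)

// acceptsRemoteConnections reports whether a listener bound to bindAddr
// (host:port, host an IP literal) can accept connections arriving on a
// non-loopback interface, such as traffic routed to the pod IP by a service.
func acceptsRemoteConnections(bindAddr string) (bool, error) {
	host, _, err := net.SplitHostPort(bindAddr)
	if err != nil {
		return false, err
	}
	ip := net.ParseIP(host)
	if ip == nil {
		return false, fmt.Errorf("bind host %q is not an IP literal", host)
	}
	// 0.0.0.0 (or ::) means "all interfaces"; 127.0.0.0/8 and ::1 mean loopback only.
	return !ip.IsLoopback(), nil
}

func main() {
	// Bind addresses taken from the genericapiserver log lines above.
	for _, addr := range []string{"0.0.0.0:443", "127.0.0.1:8080"} {
		ok, err := acceptsRemoteConnections(addr)
		fmt.Println(addr, "reachable from other pods:", ok, err)
	}
}
```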

@mfanjie

mfanjie commented Jun 4, 2016

I have a fix for the unsecured connection in #26694, but it is still under discussion.

k8s-github-robot pushed a commit that referenced this issue Jun 4, 2016
Automatic merge from submit-queue

federation: Updating federation-controller-manager to use secret to get federation-apiserver's kubeconfig

Fixing the credentials problem: #26762 (comment).

Admin will create a secret with the name "federation-apiserver-secret" in the k8s cluster hosting the federation control plane. This secret will contain the kubeconfig to access federation-apiserver.
federation-controller-manager will use this secret to contact the federation-apiserver.
This flow is the same as the one used by all federation-controllers to contact k8s apiservers that are part of the federation.

cc @kubernetes/sig-cluster-federation @lavalamp @erictune @colhom
@nikhiljindal
Contributor

Thanks @mfanjie!

I am fixing the secured connection in #26819. I need to update the federated-up.sh script after that to auto create the secret.

We should have e2e tests for both secured and unsecured connections.

@ghost

ghost commented Jun 15, 2016

I'm making fairly good progress doing manual e2e testing, but seeing some curious log messages and behaviour. I'll note them down here in case anyone else is seeing similar errors:

Of note below:

  1. At 18:48, four DNS records were successfully updated, so that part appears to work.
  2. At 20:07 we detect a new federated service and try to create it in the underlying cluster. Good. But this seems to fail due to a misconfigured DNS domain ("example.com"). I'll fix that now.
  3. Presumably due to the above failure, the service does not seem to be created at all (i.e. it is not just the DNS record that is missing). Probably OK, but a bit weird.
I0615 18:48:09.849880       1 servicecontroller.go:659] Successfully updated 2 out of 2 DNS records to direct traffic to the updated cluster
I0615 18:48:10.850275       1 servicecontroller.go:645] Detected change in list of cluster names. New  set: map[], Old set: map[federation-e2e-gce-us-central1-f:{}]
I0615 18:48:10.850568       1 servicecontroller.go:659] Successfully updated 4 out of 4 DNS records to direct traffic to the updated cluster
I0615 20:05:10.254312       1 servicecontroller.go:645] Detected change in list of cluster names. New  set: map[federation-e2e-gce-us-central1-f:{}], Old set: map[]
I0615 20:05:10.254468       1 servicecontroller.go:654] New cluster observed federation-e2e-gce-us-central1-f
I0615 20:05:10.257255       1 servicecontroller.go:417] Service 'e2e-tests-service-3s4ko/federated-service' is not found in cluster federation-e2e-gce-us-central1-f, trying to create new
I0615 20:07:50.264688       1 servicecontroller.go:417] Service 'e2e-tests-service-3s4ko/federated-service' is not found in cluster federation-e2e-gce-us-central1-f, trying to create new
E0615 20:11:23.972640       1 endpoint_helper.go:88] Failed to sync service: DNS zone example.com not found., put back to service queue
I0615 20:12:50.272983       1 servicecontroller.go:417] Service 'e2e-tests-service-3s4ko/federated-service' is not found in cluster federation-e2e-gce-us-central1-f, trying to create new
I0615 20:17:50.279530       1 servicecontroller.go:417] Service 'e2e-tests-service-3s4ko/federated-service' is not found in cluster federation-e2e-gce-us-central1-f, trying to create new
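The "Detected change in list of cluster names" lines print Go sets of type map[string]struct{}: at 18:48:10 the only cluster dropped out of the set, and at 20:05:10 it came back. A sketch of the comparison those lines imply, computing added and removed cluster names; the `clusterSet` type and `diff` helper are illustrative, not the service controller's code:

```go
package main

import (
	"fmt"
	"sort"
)

// clusterSet mirrors the map[string]struct{} shape printed in the log lines above.
type clusterSet map[string]struct{}

// diff returns, sorted, the names only in newSet (added) and only in oldSet (removed).
func diff(oldSet, newSet clusterSet) (added, removed []string) {
	for name := range newSet {
		if _, ok := oldSet[name]; !ok {
			added = append(added, name)
		}
	}
	for name := range oldSet {
		if _, ok := newSet[name]; !ok {
			removed = append(removed, name)
		}
	}
	sort.Strings(added)
	sort.Strings(removed)
	return added, removed
}

func main() {
	// The 18:48:10 transition: the only cluster disappears from the set.
	oldSet := clusterSet{"federation-e2e-gce-us-central1-f": {}}
	newSet := clusterSet{}
	added, removed := diff(oldSet, newSet)
	fmt.Printf("New set: %v, Old set: %v\n", newSet, oldSet)
	fmt.Println("added:", added, "removed:", removed)
}
```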

@ghost

ghost commented Jun 15, 2016

@nikhiljindal @madhusudancs @mml @mfanjie FYI ^^

@ghost

ghost commented Jun 15, 2016

Actually the failure to create the service seems unrelated to the failure to create the DNS record, as it recurs independently every time around the sync loop:

I0615 20:12:50.272983       1 servicecontroller.go:417] Service 'e2e-tests-service-3s4ko/federated-service' is not found in cluster federation-e2e-gce-us-central1-f, trying to create new
I0615 20:17:50.279530       1 servicecontroller.go:417] Service 'e2e-tests-service-3s4ko/federated-service' is not found in cluster federation-e2e-gce-us-central1-f, trying to create new
I0615 20:22:50.286356       1 servicecontroller.go:417] Service 'e2e-tests-service-3s4ko/federated-service' is not found in cluster federation-e2e-gce-us-central1-f, trying to create new
I0615 20:27:50.292919       1 servicecontroller.go:417] Service 'e2e-tests-service-d4kwp/federated-service' is not found in cluster federation-e2e-gce-us-central1-f, trying to create new
I0615 20:27:55.299025       1 servicecontroller.go:417] Service 'e2e-tests-service-d4kwp/federated-service' is not found in cluster federation-e2e-gce-us-central1-f, trying to create new
I0615 20:28:05.305809       1 servicecontroller.go:417] Service 'e2e-tests-service-d4kwp/federated-service' is not found in cluster federation-e2e-gce-us-central1-f, trying to create new
I0615 20:28:25.312875       1 servicecontroller.go:417] Service 'e2e-tests-service-d4kwp/federated-service' is not found in cluster federation-e2e-gce-us-central1-f, trying to create new
I0615 20:29:05.319426       1 servicecontroller.go:417] Service 'e2e-tests-service-d4kwp/federated-service' is not found in cluster federation-e2e-gce-us-central1-f, trying to create new
I0615 20:30:25.326059       1 servicecontroller.go:417] Service 'e2e-tests-service-yjdey/federated-service' is not found in cluster federation-e2e-gce-us-central1-f, trying to create new
I0615 20:30:30.332917       1 servicecontroller.go:417] Service 'e2e-tests-service-yjdey/federated-service' is not found in cluster federation-e2e-gce-us-central1-f, trying to create new
I0615 20:30:40.339491       1 servicecontroller.go:417] Service 'e2e-tests-service-yjdey/federated-service' is not found in cluster federation-e2e-gce-us-central1-f, trying to create new
I0615 20:31:00.346206       1 servicecontroller.go:417] Service 'e2e-tests-service-yjdey/federated-service' is not found in cluster federation-e2e-gce-us-central1-f, trying to create new
I0615 20:31:40.352150       1 servicecontroller.go:417] Service 'e2e-tests-service-yjdey/federated-service' is not found in cluster federation-e2e-gce-us-central1-f, trying to create new
I0615 20:33:00.357819       1 servicecontroller.go:417] Service 'e2e-tests-service-9p7ro/federated-service' is not found in cluster federation-e2e-gce-us-central1-f, trying to create new
I0615 20:35:40.364418       1 servicecontroller.go:417] Service 'e2e-tests-service-9p7ro/federated-service' is not found in cluster federation-e2e-gce-us-central1-f, trying to create new
I0615 20:40:40.373982       1 servicecontroller.go:417] Service 'e2e-tests-service-9p7ro/federated-service' is not found in cluster federation-e2e-gce-us-central1-f, trying to create new

@mml
Contributor

mml commented Jun 15, 2016

I'm having success with some DNS names but not others. For example:

Server:    10.0.0.10
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local

Name:      federated-service
Address 1: 10.0.254.162 federated-service.e2e-tests-service-ev6w0.svc.cluster.local

and

Server:    10.0.0.10
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local

Name:      federated-service.e2e-tests-service-ev6w0
Address 1: 10.0.254.162 federated-service.e2e-tests-service-ev6w0.svc.cluster.local

and

Server:    10.0.0.10
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local

Name:      federated-service.e2e-tests-service-ev6w0.svc.cluster.local.
Address 1: 10.0.254.162 federated-service.e2e-tests-service-ev6w0.svc.cluster.local

but federated-service.e2e-tests-service-ev6w0.federation fails.

Still digging into that, but having all of these in a single test makes it really hard to debug what failed, or even to notice that three of the DNS lookups passed.
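With the typical pod search list (`<namespace>.svc.cluster.local svc.cluster.local cluster.local`), the three successful lookups all end up at federated-service.e2e-tests-service-ev6w0.svc.cluster.local. For the failing name, the expansion that yields the federated query form is the `svc.cluster.local` one, producing federated-service.e2e-tests-service-ev6w0.federation.svc.cluster.local, which KubeDNS must then recognise as a federated query. A simplified sketch of the expansion; it ignores the `ndots` rule, which decides whether a name is tried as-is before or after the search domains, and the search list is an assumption:

```go
package main

import (
	"fmt"
	"strings"
)

// searchCandidates lists the fully qualified names a stub resolver would try
// for name, given a search list. Names ending in "." are already absolute.
// Simplified: ignores ndots and always tries the search domains first.
func searchCandidates(name string, search []string) []string {
	if strings.HasSuffix(name, ".") {
		return []string{name}
	}
	out := make([]string, 0, len(search)+1)
	for _, domain := range search {
		out = append(out, name+"."+domain+".")
	}
	return append(out, name+".")
}

func main() {
	// Typical search list for a pod in namespace e2e-tests-service-ev6w0 (assumption).
	search := []string{"e2e-tests-service-ev6w0.svc.cluster.local", "svc.cluster.local", "cluster.local"}
	for _, c := range searchCandidates("federated-service.e2e-tests-service-ev6w0.federation", search) {
		fmt.Println(c)
	}
}
```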

@nikhiljindal
Contributor

Sent #27504 to dump federation apiserver and controller manager logs on failure which should help in debugging federation failures on jenkins.

I am currently looking at the federation-apiserver test. It's failing on Jenkins but works fine locally.

@mfanjie

mfanjie commented Jun 16, 2016

@quinton-hoole FYI, this log indicates that the service controller is trying to create a new service in the k8s cluster federation-e2e-gce-us-central1-f, and the repeated log lines are probably caused by creation failures.

I0615 20:12:50.272983       1 servicecontroller.go:417] Service 'e2e-tests-service-3s4ko/federated-service' is not found in cluster federation-e2e-gce-us-central1-f, trying to create new

k8s-github-robot pushed a commit that referenced this issue Jun 16, 2016
Automatic merge from submit-queue

Dumping logs of federation pods (federation-apiserver, federation-controller-manager) on e2e test failure

Ref #26762

This should help with debugging failures.
Right now there is no way to access those logs.

@kubernetes/sig-cluster-federation @colhom
k8s-github-robot pushed a commit that referenced this issue Jun 16, 2016
Automatic merge from submit-queue

fix nslookup invocation

The old way with 'sh -c' was not correct.

For #26762
k8s-github-robot pushed a commit that referenced this issue Jun 17, 2016
Automatic merge from submit-queue

Clear ClusterIP in the local service before comparison.

For #26762
k8s-github-robot pushed a commit that referenced this issue Jun 17, 2016
Automatic merge from submit-queue

Adding a wait for federation apiserver to be ready in e2e tests

Ref #26762

@kubernetes/sig-cluster-federation @mml
mml added a commit to mml/kubernetes that referenced this issue Jun 17, 2016
- Dropped the regex test and just test for nslookup exiting 0.
- Moved more setup into BeforeEach and used nested Context for non-local
  case.
- Poll inside the container using a bash loop.
- Aim for less console noise unless something goes wrong.
- Commented out the tests trying to verify that a DNS name is absent.
k8s-github-robot pushed a commit that referenced this issue Jun 17, 2016
Automatic merge from submit-queue

Re-work the DNS part of the test for #26762
k8s-github-robot pushed a commit that referenced this issue Jun 17, 2016
Automatic merge from submit-queue

federation: reverse the order of creating controller manager and secret since controller requires secret

Ref #26762

federation-controller-manager fails if the secret is not there when it comes up: https://github.com/kubernetes/kubernetes/blob/970104df3199eeb30710d1067da28f952ae36403/federation/cmd/federation-controller-manager/app/controllermanager.go#L82.

Updating the bring up scripts to first create the secrets and then create the deployments.

@kubernetes/sig-cluster-federation @mml
@ghost

ghost commented Jun 17, 2016

This looks like a bug. Is the namespace being dropped? I'll send in a fix shortly.

[BeforeEach] DNS
  /Users/quinton/code/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/federated-service.go:136
STEP: Creating federated service "federated-service" in namespace "e2e-tests-service-40rio"
Jun 17 11:43:48.605: INFO: Trying to create service "federated-service" in namespace ""

@nikhiljindal
Contributor

The federation apiserver e2e test was a lot greener over the weekend. It seems to have regressed again:

[screenshot: fed-apiserver-flake]

@colhom

colhom commented Jun 21, 2016

Should we wrap the underlying kube-up invocations in a retry loop? That would probably take care of most of the failures.

k8s-github-robot pushed a commit that referenced this issue Jun 24, 2016
Automatic merge from submit-queue

federation: Updating KubeDNS to try finding a local service first for federation query

Ref #26762

Updating KubeDNS to try to find a local service first for federation query.
Without this change, KubeDNS always returns the DNS hostname, even if a local service exists.

I have updated the code to first remove the federation name from the path if it exists, so that the default search for a local service happens. If we don't find a local service, then we try to find the DNS hostname.

I would appreciate a thorough review since this is my first change to KubeDNS.
#25727 was the original PR that added federation support to KubeDNS.

cc @kubernetes/sig-cluster-federation @quinton-hoole @madhusudancs @bprashanth @mml
@ghost

ghost commented Jul 25, 2016

I think these problems are all resolved now, and this issue has served its purpose. Closing.

@ghost ghost closed this as completed Jul 25, 2016
perotinus pushed a commit to kubernetes-retired/cluster-registry that referenced this issue Sep 2, 2017
This issue was closed.