Don't recreate forwarding rule/target pool on master restart #29079

Closed
bprashanth opened this issue Jul 18, 2016 · 2 comments · Fixed by #29082

bprashanth commented Jul 18, 2016

Restarting the master causes recreation of the forwarding rule and target pool for type=LoadBalancer services. We are saved in the non-restart case because we compare a cached version of the service to decide if we should even call into the cloudprovider and update resources.

This is happening because our reconciliation routines are confused about session affinity and the lb IP, leading them to recreate the resources, since an in-place update isn't allowed.

The impact is at best a few seconds of radio silence, at worst who knows?
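For context, the non-restart path only survives because of that cache comparison. A minimal sketch of the idea (hypothetical names, not the actual servicecontroller code): if the cached spec still matches the live spec we never touch the cloudprovider, but right after a controller-manager restart the cache is empty, so every LoadBalancer service falls through to the GCE reconcile path.

package main

import (
	"fmt"
	"reflect"
)

// Minimal stand-in for the fields we care about; the real type is api.ServiceSpec.
type ServiceSpec struct {
	Type            string
	SessionAffinity string
	LoadBalancerIP  string
}

// needsCloudUpdate is illustrative only: skip the cloudprovider call when the
// cached spec matches the live spec. With an empty cache (fresh restart) we
// always fall through to the reconcile path that recreates resources.
func needsCloudUpdate(cached, live *ServiceSpec) bool {
	if cached == nil {
		return true
	}
	return !reflect.DeepEqual(*cached, *live)
}

func main() {
	live := &ServiceSpec{Type: "LoadBalancer", SessionAffinity: "None"}
	fmt.Println(needsCloudUpdate(live, live)) // false: steady state, no cloud call
	fmt.Println(needsCloudUpdate(nil, live))  // true: post-restart, reconcile runs
}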

Repro:
Create a load-balanced service, e.g.:

apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP
    name: http
  selector:
    app: nginx

---
apiVersion: v1
kind: ReplicationController
metadata:
  name: nginx
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: gcr.io/google_containers/nginx-slim:0.8
        ports:
        - containerPort: 80
        volumeMounts:
        - name: host-proc
          mountPath: /host-proc
        command:
        - sh
        - -c
        - |
          echo 1024 > /host-proc/sys/net/core/somaxconn && nginx -g "daemon off;"
      volumes:
      - name: host-proc
        hostPath:
          path: /proc

Wait for it to acquire a VIP.

$ k get svc
NAME         CLUSTER-IP   EXTERNAL-IP       PORT(S)   AGE
kubernetes   10.0.0.1     <none>            443/TCP   5m
nginx        10.0.68.45   104.197.169.231   80/TCP    48s

Try it out:

$ while true; do wget -q -O - --retry-connrefused --waitretry=1 --read-timeout=1 --timeout=1 -t 0 104.197.169.231; sleep
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>

SSH into the master and restart kube-controller-manager:

$ docker kill $(docker ps | grep kube-controller | awk '{print $1}')

Watch the wgets; you'll notice radio silence after a bit.
This is because the target pool and forwarding rule get recreated, for two reasons (both failing checks are sketched below, after the gcloud output):

  • Affinity mismatch:
$ k get svc nginx -o yaml | grep -i affinity
    sessionAffinity: None
$ gcloud compute target-pools describe abd65fb3837ec11e69d3642010af0000 --format=json --region=us-central1
{
  "creationTimestamp": "2016-06-21T13:14:54.713-07:00",
  "description": "{\"kubernetes.io/service-name\":\"default/echoheaders\"}",
  "id": "5596799261850476241",
  "kind": "compute#targetPool",
  "name": "abd65fb3837ec11e69d3642010af0000",
  "region": "https://www.googleapis.com/compute/v1/projects/kubernetesdev/regions/us-central1",
  "selfLink": "https://www.googleapis.com/compute/v1/projects/kubernetesdev/regions/us-central1/targetPools/abd65fb3837ec11e69d3642010af0000"
}

So this check fails: https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/gce/gce.go#L816

  • LB IP mismatch:
$ k get svc nginx --template='{{.spec.clusterIP}}'
10.0.68.45
$ k get svc nginx --template='{{.spec.loadBalancerIP}}'
<no value>
$ gcloud compute forwarding-rules list abd65fb3837ec11e69d3642010af0000 --format=json
[
  {
    "IPAddress": "162.222.178.160",
    "IPProtocol": "TCP",
    "creationTimestamp": "2016-06-21T13:14:58.244-07:00",
    "description": "{\"kubernetes.io/service-name\":\"default/echoheaders\"}",
    "id": "1689396002026692269",
    "kind": "compute#forwardingRule",
    "name": "abd65fb3837ec11e69d3642010af0000",
    "portRange": "80-80",
    "region": "us-central1",
    "selfLink": "https://www.googleapis.com/compute/v1/projects/kubernetesdev/regions/us-central1/forwardingRules/abd65fb3837ec11e69d3642010af0000",
    "target": "us-central1/targetPools/abd65fb3837ec11e69d3642010af0000"
  }
]

So this check fails: https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/gce/gce.go#L765 (lbip is retrieved here: https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/gce/gce.go#L535)
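Putting the two failing comparisons together, here is a rough sketch of why both checks trip after a restart (illustrative names only; the real logic is in pkg/cloudprovider/providers/gce/gce.go at the lines linked above). GCE omits sessionAffinity from the target pool when it is the default, so the API hands back an empty string while the service's sessionAffinity: None translates to a non-empty constant; and spec.loadBalancerIP is empty because no static IP was requested, while the existing forwarding rule carries the ephemeral IP GCE already assigned to it.

package main

import "fmt"

// translateAffinity mirrors the idea of mapping the service's sessionAffinity
// to GCE's enum ("NONE"/"CLIENT_IP"); the exact constants are assumptions here.
func translateAffinity(svcAffinity string) string {
	if svcAffinity == "ClientIP" {
		return "CLIENT_IP"
	}
	return "NONE"
}

// targetPoolNeedsRecreate: the existing pool's affinity comes back as "" because
// GCE omitted the default value, so "" != "NONE" and we decide to recreate the pool.
func targetPoolNeedsRecreate(existingPoolAffinity, svcAffinity string) bool {
	return existingPoolAffinity != translateAffinity(svcAffinity)
}

// forwardingRuleNeedsRecreate: spec.loadBalancerIP is "" (no static IP was
// requested), the existing rule holds an ephemeral IP, so the strings differ
// and we decide to recreate the rule.
func forwardingRuleNeedsRecreate(existingRuleIP, requestedLBIP string) bool {
	return existingRuleIP != requestedLBIP
}

func main() {
	fmt.Println(targetPoolNeedsRecreate("", "None"))                // true: spurious recreate
	fmt.Println(forwardingRuleNeedsRecreate("162.222.178.160", "")) // true: spurious recreate
}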

@bprashanth bprashanth added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. area/platform/gce team/cluster labels Jul 18, 2016
@bprashanth bprashanth added this to the v1.3 milestone Jul 18, 2016
@bprashanth
Contributor Author

The session affinity fix is trivial. The easiest change for the lb IP fix is to not actually check the IP, but simply back off as long as the forwarding rule exists. This is "bad" because of the way we allocate a static IP, assign it to the fwd rule, then de-allocate it. The alternative is to use status.lbip to reconcile, but that's racy.
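A rough sketch of that "back off if it exists" idea (illustrative only, not the patch that actually merged in #29082): stop comparing IPs entirely and leave an existing forwarding rule alone.

package main

import "fmt"

// Minimal stand-in for the GCE forwarding rule fields used here.
type ForwardingRule struct {
	Name      string
	IPAddress string
}

// forwardingRuleNeedsUpdate sketches the proposed behaviour: if a rule with the
// expected name already exists, keep it, regardless of the (possibly empty)
// requested loadBalancerIP. The old comparison against IPAddress is what forced
// the recreate on restart.
func forwardingRuleNeedsUpdate(existing *ForwardingRule, requestedLBIP string) bool {
	if existing == nil {
		return true // no rule yet: create one
	}
	// old behaviour: return existing.IPAddress != requestedLBIP
	return false
}

func main() {
	rule := &ForwardingRule{Name: "abd65fb3837ec11e69d3642010af0000", IPAddress: "162.222.178.160"}
	fmt.Println(forwardingRuleNeedsUpdate(rule, "")) // false: keep the existing rule
	fmt.Println(forwardingRuleNeedsUpdate(nil, ""))  // true: nothing there, create it
}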

k8s-github-robot pushed a commit that referenced this issue Jul 18, 2016
Automatic merge from submit-queue

Don't recreate lb cloud resources on kcm restart

Absolute dumbest way to fix it right now.
Fixes #29079
@bprashanth
Contributor Author

cherry pick #29083

@zmerlynn zmerlynn changed the title Don't recrete forwarding rule/target pool on master restart Don't recreate forwarding rule/target pool on master restart Jul 18, 2016