
Multiple versions of addons running after upgrade. #37641

Closed
krousey opened this issue Nov 29, 2016 · 14 comments


krousey commented Nov 29, 2016

Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see http://kubernetes.io/docs/troubleshooting/.): No

What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.): None


Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT

Kubernetes version (use kubectl version):

$ kubectl version                                                                          
Client Version: version.Info{Major:"1", Minor:"5+", GitVersion:"v1.5.0-beta.2.2+f64c9f2d999ceb", GitCommit:"f64c9f2d999ceb157d5672e9bba6639a4c456f6e", GitTreeState:"clean", BuildDate:"2016-11-29T15:21:56Z", GoVersion:"go1.7.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"5+", GitVersion:"v1.5.0-beta.2.2+f64c9f2d999ceb", GitCommit:"f64c9f2d999ceb157d5672e9bba6639a4c456f6e", GitTreeState:"clean", BuildDate:"2016-11-29T15:13:51Z", GoVersion:"go1.7.3", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: GKE

What happened: Upgrades to version 1.5 (from any previous version) change existing addons from ReplicationControllers to Deployments without deleting the old ReplicationControllers. This leads to multiple versions of the addons running at the same time. There also seem to be multiple heapster Deployments.

$ kubectl get rc --namespace=kube-system                                                   
NAME                          DESIRED   CURRENT   READY     AGE
kube-dns-v17.1                2         2         2         1h
kubernetes-dashboard-v1.1.1   1         1         1         1h
l7-default-backend-v1.0       1         1         1         1h

$ kubectl get deployment --namespace=kube-system                                           
NAME                   DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
heapster-v1.1.0        1         1         1            1           1h
heapster-v1.2.0        1         1         1            1           1h
kube-dns               1         1         1            1           1h
kubernetes-dashboard   1         1         1            1           1h
l7-default-backend     1         1         1            1           1h

$ kubectl get pods --namespace=kube-system                                                 
NAME                                                               READY     STATUS    RESTARTS   AGE
fluentd-cloud-logging-gke-jenkins-e2e-default-pool-91ebbcc7-f3wt   1/1       Unknown   0          1h
fluentd-cloud-logging-gke-jenkins-e2e-default-pool-91ebbcc7-mgst   1/1       Running   0          1h
fluentd-cloud-logging-gke-jenkins-e2e-default-pool-91ebbcc7-zhm0   1/1       Running   0          1h
heapster-v1.1.0-2096339923-39key                                   2/2       Running   0          1h
heapster-v1.2.0-2168613315-1bcy3                                   2/2       Running   0          1h
kube-dns-4101612645-78hx6                                          4/4       Running   0          1h
kube-dns-v17.1-3pyz0                                               3/3       Running   0          1h
kube-dns-v17.1-zaskz                                               3/3       Running   0          1h
kube-proxy-gke-jenkins-e2e-default-pool-91ebbcc7-f3wt              1/1       Unknown   0          1h
kube-proxy-gke-jenkins-e2e-default-pool-91ebbcc7-mgst              1/1       Running   0          1h
kube-proxy-gke-jenkins-e2e-default-pool-91ebbcc7-zhm0              1/1       Running   0          1h
kubernetes-dashboard-3697774758-n808h                              1/1       Running   0          1h
kubernetes-dashboard-v1.1.1-ljtj5                                  1/1       Running   0          1h
l7-default-backend-2234341178-vo5z1                                1/1       Running   0          1h
l7-default-backend-v1.0-qe9yo                                      1/1       Running   0          1h

I found this as a counting error in https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/logs/kubernetes-e2e-gke-container_vm-1.3-container_vm-1.5-upgrade-cluster/337. At first I thought the test was just counting incorrectly, and I attempted to fix that in #36924. That fix is still valid and an improvement, but the underlying problem of multiple versions of addons running at the same time is the real concern.

We need a mechanism to delete the old ReplicationControllers/Deployments after an upgrade.

@krousey krousey added area/upgrade priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-blocker labels Nov 29, 2016
@krousey krousey added this to the v1.5 milestone Nov 29, 2016

krousey commented Nov 29, 2016

We could roll back #36008 to avoid the RC -> Deployment issue. To solve the two heapster Deployments, we would have to either keep the version=v1.1.0 label in the v1.2.0 Deployment or find a label combination that doesn't cause duplicate Deployments to be created.

We could address this in GKE with a post-upgrade cleanup script, and document manual correction steps in the release notes.
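
For illustration only, a manual cleanup on an already-upgraded cluster would look roughly like this, using the resource names from the output above (a sketch of the idea, not a vetted procedure):

# Once the Deployment-based replacements are confirmed healthy, remove the
# superseded RC-based addons and the old heapster Deployment.
$ kubectl delete rc kube-dns-v17.1 kubernetes-dashboard-v1.1.1 l7-default-backend-v1.0 --namespace=kube-system
$ kubectl delete deployment heapster-v1.1.0 --namespace=kube-system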

I marked this as P0 because my counting fix won't work in 1.3, the tests are still failing, and there's a bigger issue at play. I would be OK with a short-term workaround to avoid this in 1.5 and a proper fix in 1.6.


krousey commented Nov 29, 2016

cc @saad-ali


krousey commented Nov 29, 2016

cc @roberthbailey

@krousey krousey added sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. team/cluster labels Nov 29, 2016
@saad-ali

CC @MrHohn @bowei


MrHohn commented Nov 29, 2016

We did add a mechanism to the Add-on Manager to delete the old ReplicationControllers/Deployments after an upgrade.

This https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/addon-manager/kube-addons.sh#L191-L197 is for pruning the old ReplicationControllers. For the old heapster Deployment, since the old and new Deployments have different names, kubectl apply --prune should be able to prune the old one as well.
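
For illustration, the pruning mechanism is ordinary kubectl apply --prune: the addon manifests are applied as a set, and labeled objects that no longer appear in that set are deleted. The directory path and label selector below are assumptions for the sketch, not the exact arguments the script passes:

# Apply all addon manifests and prune labeled objects that are no longer declared.
$ kubectl apply -f /etc/kubernetes/addons \
    --prune -l kubernetes.io/cluster-service=true \
    --namespace=kube-system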

Is there any way to retrieve the Add-on Manager's log from the GKE master?


MrHohn commented Nov 29, 2016

cc @mikedanese


MrHohn commented Nov 29, 2016

The old-resource pruning does have a one-minute delay, which is there to support zero downtime for kube-dns.

But that does not seem to be the case here.


MrHohn commented Nov 29, 2016

Sorry, one mistake above: if the name of the heapster Deployment changed, the current Addon Manager will not prune the old one. This could be fixed by adding one more resource type in the same place (https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/addon-manager/kube-addons.sh#L191-L197).
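
For illustration, the quick fix amounts to whitelisting the Deployment type for pruning alongside ReplicationControllers, roughly along these lines (the flag values are assumptions based on the 1.5 API groups, not the actual kube-addons.sh change):

# Explicitly whitelist both resource kinds so a renamed Deployment
# (heapster-v1.1.0 -> heapster-v1.2.0) is pruned together with old RCs.
$ kubectl apply -f /etc/kubernetes/addons \
    --prune -l kubernetes.io/cluster-service=true \
    --prune-whitelist=core/v1/ReplicationController \
    --prune-whitelist=extensions/v1beta1/Deployment \
    --namespace=kube-system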

I'm taking a look at why the old RCs were not pruned.

@mikedanese mikedanese assigned mikedanese and MrHohn and unassigned mikedanese Nov 29, 2016
@mikedanese

@MrHohn yup, that's definitely it. Ping me on a PR and I can give you a quick review.

@mikedanese

We also need to merge #37139 to get the --prune-whitelist in.


bowei commented Nov 29, 2016

I referenced the wrong issue; ignore the above ^^^^^^


MrHohn commented Nov 29, 2016

Yeah, but I think #37139 may not fix this issue since Addon Manager v6.0-alpha should be able to prune the old RCs in theory.

I'm working on a repro on my own cluster (upgrading from 1.3 -> 1.5). I'm also checking the GCE 1.4 -> 1.5 upgrade tests here, but the Addon Manager's log there looks normal.

@mikedanese

> Yeah, but I think #37139 may not fix this issue since Addon Manager v6.0-alpha should be able to prune the old RCs in theory.

What's currently deployed doesn't have the prune whitelist, and there are no RCs in the addons folder anymore, so RCs aren't considered for pruning. I think we need both?


MrHohn commented Nov 29, 2016

> there are no RCs in the addons folder anymore

You are right. I used to think there was still one ReplicationController in the addons folder --- elasticsearch-logging-v1 --- but it turns out that one is not enabled on GKE.

If this is the case, #37139 combined with the quick fix for Deployments should do the job.

Will send that PR very soon.

k8s-github-robot pushed a commit that referenced this issue Nov 30, 2016
Automatic merge from submit-queue

Fixes Addon Manager's pruning issue for old Deployments

Fixes #37641.

Attaches the `last-applied` annotations to the existing Deployments for pruning.

Below images are built and pushed:
- gcr.io/google-containers/kube-addon-manager:v6.1
- gcr.io/google-containers/kube-addon-manager-amd64:v6.1
- gcr.io/google-containers/kube-addon-manager-arm:v6.1
- gcr.io/google-containers/kube-addon-manager-arm64:v6.1
- gcr.io/google-containers/kube-addon-manager-ppc64le:v6.1

@mikedanese 

cc @saad-ali @krousey
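
For context on the fix above: kubectl's pruning skips objects that lack the kubectl.kubernetes.io/last-applied-configuration annotation, which is why the annotation has to be attached to the pre-existing Deployments. An illustrative way to check whether an addon Deployment already carries it (not part of the PR):

# Print the Deployment and look for the annotation that pruning relies on.
$ kubectl get deployment heapster-v1.1.0 --namespace=kube-system -o yaml | grep last-applied-configuration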