Proportionally scale paused and rolling deployments #20273
Conversation
// modified since. Such is life.
	return nil
}
_, err = dc.scaleRCAndRecordEvent(newRC, deployment.Spec.Replicas, deployment)
We don't want to scale the new RC to the full replica count of the Deployment. That would radically change the pace of the rolling update and violate maxSurge.
The least impactful behavior with respect to update progress would be to proportionally scale all RCs with non-zero replicas, rounding to whole replicas.
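A minimal sketch of that idea, with made-up types and the simplest possible rounding (not the controller's actual code):

```go
package main

import "fmt"

// rcState is a minimal stand-in for a ReplicationController's scale.
type rcState struct {
	Name     string
	Replicas int
}

// proportionalScale sketches "scale all RCs with non-zero replicas
// proportionally": each RC keeps its share of the old total, rounded down,
// and the rounding leftovers are handed out one replica at a time.
func proportionalScale(rcs []rcState, newTotal int) []rcState {
	oldTotal := 0
	for _, rc := range rcs {
		oldTotal += rc.Replicas
	}
	out := make([]rcState, len(rcs))
	copy(out, rcs)
	if oldTotal == 0 {
		return out
	}
	allocated := 0
	for i, rc := range rcs {
		if rc.Replicas == 0 {
			continue // fully scaled-down RCs stay at zero
		}
		out[i].Replicas = rc.Replicas * newTotal / oldTotal
		allocated += out[i].Replicas
	}
	// Distribute rounding leftovers across RCs that still own replicas.
	for i := range out {
		if allocated >= newTotal {
			break
		}
		if rcs[i].Replicas > 0 {
			out[i].Replicas++
			allocated++
		}
	}
	return out
}

func main() {
	// Example: deployment scaled from 10 to 12 while two RCs own 5 pods each.
	fmt.Println(proportionalScale([]rcState{{"old", 5}, {"new", 5}}, 12))
}
```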
I don't think the un-paused scaling behavior is what we want, either. If the new scale is greater than the scale of the newest RC, then it scales down just the newest RC. However, that doesn't take into account that older RCs may still have replicas. Scaling proportionally makes sense to me while un-paused, also.
It also doesn't take into account which pods are ready or not ready, but that's another issue.
If the new scale is greater than the scale of the newest RC, then it scales down just the newest RC.
You mean lesser, right?
Yes, lesser. :-)
fat fingers
We don't want to scale the new RC to the full replica count of the Deployment. That would radically change the pace of the rolling update and violate maxSurge.
I was thinking about this today and I believe maxSurge makes sense only for a progressing deployment, right? If a deployment is paused but not in the middle of a rollout (i.e., its latest RS owns all of the replicas), then we should scale to the full count. If a deployment is paused in the middle of a rollout, we should respect maxSurge/maxUnavailable and scale down the older RCs while scaling up the new one. Also, this is specific to rolling deployments.
Why might a user pause a rollout? They might want to treat new instances as a canary to observe how they behave under real load. Or they might want to run tests in prod. Or they may have observed a problem that they want to investigate.
In any of these cases, I'd imagine they'd want to freeze the rollout in whatever state it was in.
And, whether paused or not, I'd imagine scaling (e.g. by HPA) would change the current state proportionally.
Which makes me think that we should implement canary checks. Also regarding test deployments see openshift/origin#6930.
I don't think the un-paused scaling behavior is what we want, either. If the new scale is lesser than the scale of the newest RC, then it scales down just the newest RC. However, that doesn't take into account that older RCs may still have replicas.
Then we are susceptible to violating maxUnavailable, right? It seems to me that we want exactly the behavior of unpaused, minus the generation of a new RC in case of pod template changes.
@bgrant0607 updated to use the normal path when there is already a new rc for a paused deployment. The intent of a paused deployment is to make changes in the pod template without having the controller automatically reconcile those new changes. I now believe it makes sense to provide everything else we already have: proportional scaling in both directions to ensure we won't violate maxSurge/maxUnavailable, and the cleanup policy if defined.
@Kargakis I'll take a look, but I would not expect a rollout to advance (e.g., scale all old RCs down to 0 while the Deployment replicas count were unchanged) if the Deployment were paused.
@bgrant0607 we can check that d.status.replicas (current pods from all RCs) equals d.spec.replicas. If not, we can interpret this as a change in the scale (most probably it will be that - otherwise it's just the difference created by maxSurge/maxUnavailable) and allow the deployment to proceed.
@Kargakis We should take surge into account: compare spec.replicas with (status.replicas - surge amount), assuming that a rollout is underway. If no rollout is underway, then there would be no surge amount.
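A rough sketch of that check; all names here are illustrative and the surge amount is assumed to have already been resolved to an absolute number:

```go
// scalingEvent compares spec.replicas against status.replicas minus the
// surge amount, where a surge only exists while a rollout is underway.
func scalingEvent(specReplicas, statusReplicas, maxSurge int, rolloutUnderway bool) bool {
	surge := 0
	if rolloutUnderway {
		surge = maxSurge
	}
	// With no rollout underway this reduces to a plain spec vs. status check.
	return specReplicas != statusReplicas-surge
}
```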
When paused, I don't see much harm in creating new RCs with 0 replicas, so that they appear in the revision history. OTOH, the revision history is about the only reason to. Note also that an RC might already exist that matches the updated pod template, anyway. If there's only one RC with non-zero replicas, scaling is easy. Otherwise, it's not so easy. Scenario axes when a scaling (up or down) event occurs:
Unless we record the number of surge pods in status or in annotations on the RCs (#14062 (comment)), it's not obvious to me that we could figure out the previous desired number of replicas, which I'd like to do in order to compute the scaling factor. If we detect a scaling event, I'd like to scale all the RCs with non-zero replicas proportionally, independent of the normal reconciliation that enables rollout progress.
One advantage of annotations on the RCs is that they can be written atomically with RC scale changes.
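For illustration only, recording the deployment's replica count on the RC in the same write that changes its scale might look roughly like this (the types, client interface, and annotation key are all assumed, not the actual implementation):

```go
import "strconv"

// replicationController and rcClient are minimal stand-ins; the point is only
// that the annotation and the new scale land in the same Update call.
type replicationController struct {
	Annotations map[string]string
	Replicas    int
}

type rcClient interface {
	Update(rc *replicationController) error
}

// The annotation key is illustrative.
const desiredReplicasAnnotation = "deployment.kubernetes.io/desired-replicas"

func scaleAndRecord(c rcClient, rc *replicationController, newSize, deploymentReplicas int) error {
	if rc.Annotations == nil {
		rc.Annotations = map[string]string{}
	}
	rc.Annotations[desiredReplicasAnnotation] = strconv.Itoa(deploymentReplicas)
	rc.Replicas = newSize
	return c.Update(rc) // one write: the scale and the record stay consistent
}
```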
The proposal to scale proportionally was to mitigate risk. Otherwise, scaling up would increase the size of the newRC and scaling down would decrease the sizes of the old ones, both of which would have the effect of hastening the rollout progress, which could produce a higher proportion of unavailable replicas in the event of a problem with the rolled out template. An alternative would be to scale up the RC with the largest number of available replicas and scale down the one(s) with the most unavailable replicas. However, in order to be obviously better than proportional scaling, we'd have to be able to do it without actually figuring out whether we were scaling up or down. The lack of multi-resource transactions certainly makes this harder.
We should be able to tell whether there was a scaling event. If there is one RC with non-zero replicas, just compare its replicas count with the Deployment's. If there are multiple RCs with non-zero replicas, compare the total pod count with Deployment's plus maxSurge and newRC's replica count with Deployment's replica count (since newRC's shouldn't exceed Deployment's). Concrete example, similar to the one I posted to #20368:
Deployment is updated, newRC is created with 3 replicas, oldRC is scaled down to 8, and newRC is scaled up to 5. 3 newRC replicas become available, so oldRC is scaled down to 5 and newRC is scaled up to 8. Deployment is updated again, a newnewRC is created with 0 replicas (allPodsCount is already 13). Let's say 5 of the 8 replicas of (the old) newRC are available, so 3 will be scaled down and then newnewRC will be scaled up to 3: 5, 5, and 3. What if we scale the Deployment up to 12 at this point? Well, there are multiple RCs with non-zero replicas, so we could have up to 12 + maxSurge replicas, 15 in this case. But there are only 13 pods. Where should the 2 new pods go? The current policy would always add them to newRC:
And shrinking the Deployment replicas count shrinks the newRC if it exceeds the new replicas count, and otherwise just shrinks the oldRCs. Could we do proportional distribution? Sort the RCs in descending order by number of available pods: 5, 5, 0. Compute the distribution weighted by proportion of available pods, round off, distribute leftovers evenly across the RCs in order, and cap at the number of pods we want to add. 5/10*2 = 1, so +1, +1, +0. Shrinking down should work the same way. However, one catch is that we currently create the newRC with 0 replicas, so a pod template update would always look like a scaling event. We could fix that by setting the replicas count at creation time, though, to maxTotalPods - currentPodCount.
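Sketched as code, under the assumption that availability counts are already known (the helper name and shapes are made up; shrinking would mirror this with the delta handled in the opposite direction):

```go
import "sort"

// distributeProportionally weights the pods to add by each RC's share of
// available pods, rounds down, then hands out the rounding leftovers in
// descending order of availability, capped at the number of pods to add.
func distributeProportionally(available []int, podsToAdd int) []int {
	idx := make([]int, len(available))
	total := 0
	for i, a := range available {
		idx[i] = i
		total += a
	}
	// Sort indices descending by available pods (e.g. 5, 5, 0).
	sort.Slice(idx, func(a, b int) bool { return available[idx[a]] > available[idx[b]] })

	add := make([]int, len(available))
	if total == 0 || podsToAdd <= 0 {
		return add
	}
	allocated := 0
	for _, i := range idx {
		add[i] = available[i] * podsToAdd / total // 5/10*2 = 1, so +1, +1, +0
		allocated += add[i]
	}
	// Distribute any leftovers evenly across the RCs in order, up to the cap.
	for _, i := range idx {
		if allocated >= podsToAdd {
			break
		}
		add[i]++
		allocated++
	}
	return add
}
```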
Note that if the controller failed in the middle of effecting the proportional changes it could skew the proportions. I think that's acceptable, especially if we use available pod counts, since newly created pods wouldn't become available immediately when scaling up, and RCs that had already been scaled down would be scaled proportionally less after recovery. That would still be more graceful than always scaling the newest RC.
@bgrant0607 thanks for the thorough analysis. I have pushed changes that implement proportional scaling for paused deployments. Please have a look. Note that tomorrow I'll be out for DevConf here in Brno and I am not sure I can come back to this first thing on Monday. If needed, @janetkuo can pick it up.
@Kargakis Thanks. I'll take a look. Scaling proportionally while paused would definitely be an improvement over not scaling while paused. However, I would like unpaused deployments to be scaled proportionally, also.
if err != nil {
	return fmt.Errorf("invalid value for MaxSurge: %v", err)
}
if isPercent {
I've seen this code other places. We need a GetIntValueFromIntOrPercent() helper.
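Roughly what such a helper could look like; this sketch works on a plain string rather than the real IntOrString util type, so the signature here is an assumption:

```go
import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// getIntValueFromIntOrPercent resolves a value that is either an absolute
// integer or a percentage ("25%") against a total, rounding up or down.
func getIntValueFromIntOrPercent(v string, total int, roundUp bool) (int, error) {
	if strings.HasSuffix(v, "%") {
		pct, err := strconv.Atoi(strings.TrimSuffix(v, "%"))
		if err != nil {
			return 0, fmt.Errorf("invalid percentage %q: %v", v, err)
		}
		if roundUp {
			return int(math.Ceil(float64(pct) * float64(total) / 100.0)), nil
		}
		return pct * total / 100, nil
	}
	return strconv.Atoi(v)
}
```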
We can update this once #21044 merges
My pleasure
allErrs = append(allErrs, apivalidation.ValidateNonnegativeField(int64(intOrPercent.IntValue()), fldPath)...)
default:
	allErrs = append(allErrs, field.Invalid(fldPath, intOrPercent, "must be an integer or percentage (e.g '5%')"))
You need to escape % with another one. So this needs to look like this: 5%%.
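For example, with any printf-style formatter:

```go
package main

import "fmt"

func main() {
	// "%%" is how a printf-style format string spells a literal '%'.
	fmt.Printf("must be an integer or percentage (e.g '5%%')\n")
	// Output: must be an integer or percentage (e.g '5%')
}
```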
correct. will update
Enable paused and rolling deployments to be proportionally scaled. Also have cleanup policy work for paused deployments.
GCE e2e build/test passed for commit f3d2e3f.
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]
GCE e2e build/test passed for commit f3d2e3f.
Automatic merge from submit-queue
Proportional scaling of deployments was introduced in v1.4 (kubernetes#20273), which caused this 1.3 test to fail on 1.4 clusters. This is cherrypicked into 1.3 merely for fixing test failures on kubernetes-e2e-gke-1.3-1.4-upgrade-cluster
Proportional scaling of deployments was introduced in v1.4 (kubernetes#20273), which caused this 1.2 test to fail on 1.4 clusters. This is cherrypicked into 1.2 merely for fixing test failures on kubernetes-e2e-gke-1.2-1.4-upgrade-cluster
Automatic merge from submit-queue
Fix 1.3 paused deployment test failure against >1.4 cluster
**What this PR does / why we need it**: Fixes the upgrade test kubernetes-e2e-gke-1.3-1.4-upgrade-cluster. Proportional scaling of deployments was introduced in v1.4 (#20273), which caused this 1.3 test to fail on 1.4 clusters. This is cherrypicked into 1.3 merely for fixing test failures on kubernetes-e2e-gke-1.3-1.4-upgrade-cluster.
**Which issue this PR fixes**: fixes #32710
@Kargakis @pwittrock this PR has a release-note-action-required label, but I can't actually tell what the action is; CHANGELOG.md only lists the PR title. Can you clarify what the action required is?
I don't think there is any action required before upgrading, but this is definitely a notable change to existing behavior. Maybe there should be a section for that.
Enable paused and rolling deployments to be proportionally scaled.
Also have cleanup policy work for paused deployments.
Fixes #20853
Fixes #20966
Fixes #20754
@bgrant0607 @janetkuo @ironcladlou @nikhiljindal