Proportionally scale paused and rolling deployments #20273

Merged · 3 commits · Jun 25, 2016

Conversation

@0xmichalis (Contributor) commented Jan 28, 2016

Enable paused and rolling deployments to be proportionally scaled.
Also have cleanup policy work for paused deployments.

Fixes #20853
Fixes #20966
Fixes #20754

@bgrant0607 @janetkuo @ironcladlou @nikhiljindal



@k8s-github-robot added the size/M label Jan 28, 2016
@bprashanth assigned janetkuo and unassigned bprashanth Jan 28, 2016
// modified since. Such is life.
return nil
}
_, err = dc.scaleRCAndRecordEvent(newRC, deployment.Spec.Replicas, deployment)

Member:

We don't want to scale the new RC to the full replica count of the Deployment. That would radically change the pace of the rolling update and violate maxSurge.

The least impactful behavior with respect to update progress would be to proportionally scale all RCs with non-zero replicas, rounding to whole replicas.
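
As a rough illustration of that idea (the function and types below are assumptions for the sake of the example, not the controller's actual code), whole-replica proportional scaling could look like:

```go
// Hypothetical sketch: redistribute a Deployment's new replica total across
// its RCs in proportion to their current sizes, rounding to whole replicas.
package main

import "fmt"

// proportionalTargets keeps each RC's share of the total, rounding down, and
// gives any leftover replicas to the RCs in order so the sum matches newTotal.
func proportionalTargets(current []int32, newTotal int32) []int32 {
	var oldTotal int32
	for _, r := range current {
		oldTotal += r
	}
	targets := make([]int32, len(current))
	if oldTotal == 0 {
		return targets
	}
	var assigned int32
	for i, r := range current {
		targets[i] = r * newTotal / oldTotal // truncates toward zero
		assigned += targets[i]
	}
	for i := 0; assigned < newTotal; i = (i + 1) % len(current) {
		targets[i]++
		assigned++
	}
	return targets
}

func main() {
	// e.g. an old RC with 6 replicas and a new RC with 4, scaled from 10 to 15.
	fmt.Println(proportionalTargets([]int32{6, 4}, 15)) // [9 6]
}
```

Scaling down works the same way, since the rounded-down shares never exceed the new total before the leftovers are handed out.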

Member:

I don't think the un-paused scaling behavior is what we want, either. If the new scale is greater than the scale of the newest RC, then it scales down just the newest RC. However, that doesn't take into account that older RCs may still have replicas. Scaling proportionally makes sense to me while un-paused, also.

It also doesn't take into account which pods are ready or not ready, but that's another issue.

Contributor Author:

If the new scale is greater than the scale of the newest RC, then it scales down just the newest RC.

You mean lesser, right?

Member:

Yes, lesser. :-)

Contributor Author:

fat fingers

We don't want to scale the new RC to the full replica count of the Deployment. That would radically change the pace of the rolling update and violate maxSurge.

I was thinking about this today and I believe maxSurge only makes sense for a progressing deployment, right? If a deployment is paused but not in the middle of a rollout (i.e. its latest RS owns all of the replicas), then we should scale to the full count. If a deployment is paused in the middle of a rollout, respect maxSurge/maxUnavailable: scale down the older RSes and scale up the new one. Also, this is specific to rolling deployments.

Member:

Why might a user pause a rollout? They might want to treat new instances as a canary to observe how they behave under real load. Or they might want to run tests in prod. Or they may have observed a problem that they want to investigate.

In any of these cases, I'd imagine they'd want to freeze the rollout in whatever state it was in.

And, whether paused or not, I'd imagine scaling (eg. by HPA) would change the current state proportionally.

Contributor Author:

Which makes me think that we should implement canary checks. Also regarding test deployments see openshift/origin#6930.

Contributor Author:

I don't think the un-paused scaling behavior is what we want, either. If the new scale is lesser than the scale of the newest RC, then it scales down just the newest RC. However, that doesn't take into account that older RCs may still have replicas.

Then we are susceptible to violating maxUnavailable, right? It seems to me that we want exactly the unpaused behavior, minus the generation of a new RC on pod template changes.

@bgrant0607 assigned bgrant0607 and unassigned janetkuo Jan 29, 2016
@k8s-github-robot added the needs-rebase label Jan 31, 2016
@bgrant0607 (Member):

@Kargakis I'd like to get this into 1.2. Do you have time to work on it this week?

cc @mqliang

@bgrant0607 added this to the v1.2 milestone Feb 3, 2016
@k8s-github-robot removed the needs-rebase label Feb 3, 2016
@0xmichalis changed the title from "deployment-controller: allow scaling of paused deployments to propagate" to "Allow scaling of paused deployments to propagate" Feb 3, 2016
@0xmichalis (Contributor Author):

@bgrant0607 updated to use the normal path when there is already a new RC for a paused deployment. The intent of a paused deployment is to make changes to the pod template without the controller automatically reconciling those changes. I now believe it makes sense to provide everything else we already have: proportional scaling in both directions so we won't violate maxSurge/maxUnavailable, and the cleanup policy if defined.

@bgrant0607 (Member):

@Kargakis I'll take a look, but I would not expect a rollout to advance (e.g., scale all old RCs down to 0 while the Deployment replicas count were unchanged) if the Deployment were paused.

@0xmichalis closed this Feb 4, 2016
@0xmichalis reopened this Feb 4, 2016
@0xmichalis (Contributor Author):

@bgrant0607 we can check that d.status.replicas (current pods from all RCs) equals d.spec.replicas. If not, we can interpret this as a change in the scale (most probably it will be that; otherwise it's just the difference created by maxSurge/maxUnavailable) and allow the deployment to proceed.

@bgrant0607 (Member):

@Kargakis We should take surge into account: compare spec.replicas with (status.replicas - surge amount), assuming that a rollout is underway. If no rollout is underway, then there would be no surge amount.

@bgrant0607 (Member):

When paused, I don't see much harm in creating new RCs with 0 replicas, so that they appear in the revision history. OTOH, the revision history is about the only reason to. Note also that an RC might already exist that matches the updated pod template, anyway.

If there's only one RC with non-zero replicas, scaling is easy.

Otherwise, it's not so easy.

Scenario axes when a scaling (up or down) event occurs:

  • Paused or unpaused
  • RC that matches the current pod template or not
  • Old RCs with non-zero replicas or not
  • Full surge pod count, less than full surge pod count since new RC had full number of replicas, or no surge pods

Unless we record the number of surge pods in status or in annotations on the RCs (#14062 (comment)), it's not obvious to me that we could figure out the previous desired number of replicas, which I'd like to do in order to compute the scaling factor.

If we detect a scaling event, I'd like to scale all the RCs with non-zero replicas proportionally, independent of the normal reconciliation that enables rollout progress.

@bgrant0607 (Member):

One advantage of annotations on the RCs is that they can be written atomically with RC scale changes.

@bgrant0607 (Member):

The proposal to scale proportionally was to mitigate risk. Otherwise, scaling up would increase the size of the newRC and scaling down would decrease the sizes of the old ones, both of which would have the effect of hastening the rollout progress, which could produce a higher proportion of unavailable replicas in the event of a problem with the rolled out template.

An alternative would be to scale up the RC with the largest number of available replicas and scale down the one(s) with the most unavailable replicas. However, in order to be obviously better than proportional scaling, we'd have to be able to do it without actually figuring out whether we were scaling up or down.

The lack of multi-resource transactions certainly makes this harder.

@bgrant0607 (Member):

We should be able to tell whether there was a scaling event.

If there is one RC with non-zero replicas, just compare its replicas count with the Deployment's.

If there are multiple RCs with non-zero replicas, compare the total pod count with Deployment's plus maxSurge and newRC's replica count with Deployment's replica count (since newRC's shouldn't exceed Deployment's).
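
A hedged sketch of that check, using simplified stand-in types (the field names and shapes are assumptions, not the real API objects):

```go
// Sketch of the scaling-event check described above.
package deployscale

type rcInfo struct {
	replicas int32
	isNew    bool // the RC that matches the Deployment's current pod template
}

// looksLikeScalingEvent reports whether the observed RC sizes can only be
// explained by a change to the Deployment's replica count.
func looksLikeScalingEvent(rcs []rcInfo, deploymentReplicas, maxSurge int32) bool {
	var nonZero []rcInfo
	var totalPods, newRCReplicas int32
	for _, rc := range rcs {
		totalPods += rc.replicas
		if rc.replicas > 0 {
			nonZero = append(nonZero, rc)
		}
		if rc.isNew {
			newRCReplicas = rc.replicas
		}
	}
	// One RC with non-zero replicas: a plain mismatch means the Deployment was scaled.
	if len(nonZero) == 1 {
		return nonZero[0].replicas != deploymentReplicas
	}
	// Mid-rollout: the total may legitimately exceed the Deployment's replicas by
	// up to maxSurge, and the new RC should never exceed the Deployment's count.
	return totalPods > deploymentReplicas+maxSurge || newRCReplicas > deploymentReplicas
}
```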

Concrete example, similar to the one I posted to #20368:

  • 10 replicas
  • 2 maxUnavailable (absolute number, not percent)
  • 3 maxSurge (absolute number, not percent)

Deployment is updated, newRC is created with 3 replicas, oldRC is scaled down to 8, and newRC is scaled up to 5.

3 newRC replicas become available, so oldRC is scaled down to 5 and newRC is scaled up to 8.

Deployment is updated again, a newnewRC is created with 0 replicas (allPodsCount is already 13). Let's say 5 of the 8 replicas of (the old) newRC are available, so 3 will be scaled down and then newnewRC will be scaled up to 3: 5, 5, and 3.

What if we scale the Deployment up to 12 at this point? Well, there are multiple RCs with non-zero replicas, so we could have up to 12 + maxSurge replicas, 15 in this case. But there are only 13 pods.

Where should the 2 new pods go?

The current policy would always add them to newRC:

scaleUpCount := min(maxTotalPods - currentPodCount, deploymentReplicas-newRCReplicas)
newReplicasCount := newRCReplicas + scaleUpCount
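
Plugging the numbers from this example into that policy (a small assumed illustration; min here is Go's built-in):

```go
package main

import "fmt"

func main() {
	var (
		deploymentReplicas int32 = 12        // after scaling up from 10
		maxSurge           int32 = 3         // absolute number, as above
		currentPodCount    int32 = 5 + 5 + 3 // oldRC + newRC + newnewRC
		newRCReplicas      int32 = 3         // the newest RC
	)
	maxTotalPods := deploymentReplicas + maxSurge
	scaleUpCount := min(maxTotalPods-currentPodCount, deploymentReplicas-newRCReplicas)
	newReplicasCount := newRCReplicas + scaleUpCount
	fmt.Println(scaleUpCount, newReplicasCount) // 2 5: both new pods go to the newest RC
}
```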

And shrinking the Deployment replicas count shrinks the newRC if it exceeds the new replicas count, and otherwise just shrinks the oldRCs.

Could we do proportional distribution?

Sort the RCs in descending order by number of available pods: 5, 5, 0.

Compute the distribution weighted by the proportion of available pods, round off, distribute leftovers evenly across the RCs in order, and cap at the number of pods we want to add. 5/10*2 = 1, so +1, +1, +0.

Shrinking down should work the same way.

However, one catch is that we currently create the newRC with 0 replicas, so a pod template update would always look like a scaling event. We could fix that by setting the replicas count at creation time, though, to maxTotalPods - currentPodCount.
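
A minimal sketch of that distribution, weighting by each RC's share of available pods (the types and names are invented for illustration, not the code that eventually merged):

```go
package main

import (
	"fmt"
	"sort"
)

type rc struct {
	name      string
	replicas  int32
	available int32
}

// distributeScaleUp returns how many replicas to add to each RC when the
// Deployment is scaled up by toAdd replicas.
func distributeScaleUp(rcs []rc, toAdd int32) map[string]int32 {
	add := map[string]int32{}
	if len(rcs) == 0 {
		return add
	}
	// Sort descending by available pods so leftovers favor the healthiest RCs.
	sort.Slice(rcs, func(i, j int) bool { return rcs[i].available > rcs[j].available })

	var totalAvailable int32
	for _, r := range rcs {
		totalAvailable += r.available
	}
	var distributed int32
	if totalAvailable > 0 {
		for _, r := range rcs {
			share := toAdd * r.available / totalAvailable // weight by available share, rounded down
			add[r.name] = share
			distributed += share
		}
	}
	// Hand out any leftover replicas one at a time, in sorted order.
	for i := 0; distributed < toAdd; i = (i + 1) % len(rcs) {
		add[rcs[i].name]++
		distributed++
	}
	return add
}

func main() {
	// The example above: RCs at 5, 5, and 3 replicas with 5, 5, and 0 available,
	// and the Deployment scaled from 10 to 12 (2 pods to place).
	rcs := []rc{{"oldRC", 5, 5}, {"newRC", 5, 5}, {"newnewRC", 3, 0}}
	fmt.Println(distributeScaleUp(rcs, 2)) // map[newRC:1 newnewRC:0 oldRC:1]
}
```

Shrinking would reuse the same weighting, removing pods instead of adding them.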

@bgrant0607 (Member):

Note that if the controller failed in the middle of effecting the proportional changes it could skew the proportions. I think that's acceptable, especially if we use available pod counts, since newly created pods wouldn't become available immediately when scaling up, and RCs that had already been scaled down would be scaled proportionally less after recovery. That would still be more graceful than always scaling the newest RC.

@k8s-github-robot added the size/L label and removed the size/M label Feb 4, 2016
@0xmichalis (Contributor Author):

@bgrant0607 thanks for the thorough analysis. I have pushed changes that implement proportional scaling for paused deployments. Please have a look. Note that tomorrow I'll be out at DevConf here in Brno and I am not sure I can come back to this first thing on Monday. If needed, @janetkuo can pick it up.

@bgrant0607 (Member):

@Kargakis Thanks. I'll take a look. Scaling proportionally while paused would definitely be an improvement over not scaling while paused. However, I would like unpaused deployments to be scaled proportionally, also.

if err != nil {
	return fmt.Errorf("invalid value for MaxSurge: %v", err)
}
if isPercent {

Member:

I've seen this code other places. We need a GetIntValueFromIntOrPercent() helper.

Member:

We can update this once #21044 merges
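
For illustration, a rough sketch of the kind of helper being asked for here; the name, the plain-string signature (rather than the real IntOrString type), and the rounding rule are assumptions, not necessarily what #21044 adds:

```go
package intorpercent

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// GetIntValueFromIntOrPercent resolves a value that is either a plain integer
// ("3") or a percentage of total ("25%"), rounding percentages up or down.
func GetIntValueFromIntOrPercent(value string, total int, roundUp bool) (int, error) {
	if strings.HasSuffix(value, "%") {
		p, err := strconv.Atoi(strings.TrimSuffix(value, "%"))
		if err != nil {
			return 0, fmt.Errorf("invalid percentage %q: %v", value, err)
		}
		f := float64(p) * float64(total) / 100.0
		if roundUp {
			return int(math.Ceil(f)), nil
		}
		return int(math.Floor(f)), nil
	}
	i, err := strconv.Atoi(value)
	if err != nil {
		return 0, fmt.Errorf("invalid value %q: %v", value, err)
	}
	return i, nil
}
```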

@k8s-github-robot added the size/XXL label and removed the lgtm and size/XL labels Jun 16, 2016
@0xmichalis added the lgtm label Jun 16, 2016
@pwittrock (Member):

My pleasure

	allErrs = append(allErrs, apivalidation.ValidateNonnegativeField(int64(intOrPercent.IntValue()), fldPath)...)
default:
	allErrs = append(allErrs, field.Invalid(fldPath, intOrPercent, "must be an integer or percentage (e.g '5%')"))

Contributor:

You need to escape % with another one. So this needs to look like this: 5%%.

Contributor Author:

correct. will update
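
For reference, assuming the detail string is later run through printf-style formatting, the doubled sign is what produces a single literal percent:

```go
package main

import "fmt"

func main() {
	// With printf-style formatting, a literal "%" must be written as "%%".
	fmt.Printf("must be an integer or percentage (e.g '5%%')\n")
	// Output: must be an integer or percentage (e.g '5%')
}
```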

Enable paused and rolling deployments to be proportionally scaled.
Also have cleanup policy work for paused deployments.
@k8s-github-robot removed the lgtm label Jun 20, 2016
@0xmichalis added the lgtm label Jun 20, 2016
@k8s-bot commented Jun 24, 2016

GCE e2e build/test passed for commit f3d2e3f.

@k8s-github-robot:

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot commented Jun 25, 2016

GCE e2e build/test passed for commit f3d2e3f.

@k8s-github-robot:

Automatic merge from submit-queue

@k8s-github-robot merged commit fc1937f into kubernetes:master Jun 25, 2016
@0xmichalis deleted the allow-scaling-paused-deployments branch June 25, 2016 08:02
janetkuo added a commit to janetkuo/kubernetes that referenced this pull request Sep 16, 2016
Proportional scaling of deployments was introduced in v1.4 (kubernetes#20273),
which caused this 1.3 test to fail on 1.4 clusters.
This is cherry-picked into 1.3 merely to fix test failures on
kubernetes-e2e-gke-1.3-1.4-upgrade-cluster
janetkuo added a commit to janetkuo/kubernetes that referenced this pull request Sep 16, 2016
Proportional scaling of deployments was introduced in v1.4 (kubernetes#20273),
which caused this 1.2 test to fail on 1.4 clusters.
This is cherry-picked into 1.2 merely to fix test failures on
kubernetes-e2e-gke-1.2-1.4-upgrade-cluster
k8s-github-robot pushed a commit that referenced this pull request Sep 17, 2016
Automatic merge from submit-queue

Fix 1.3 paused deployment test failure against >1.4 cluster


**What this PR does / why we need it**: Fixing upgrade test kubernetes-e2e-gke-1.3-1.4-upgrade-cluster

Proportional scaling of deployments was introduced in v1.4 (#20273),
which caused this 1.3 test to fail on 1.4 clusters.
This is cherry-picked into 1.3 merely to fix test failures on
kubernetes-e2e-gke-1.3-1.4-upgrade-cluster

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #32710

@spiffxp (Member) commented Sep 21, 2016

@Kargakis @pwittrock this PR has a release-note-action-required label, but I can't actually tell what the action is; CHANGELOG.md only lists the PR title.

Can you clarify what the action required is?

@pwittrock (Member):

I don't think there is any action required before upgrading, but this is definitely a notable change to existing behavior. Maybe there should be a section for that.

  • ReplicaSets of paused Deployments are now scaled while the Deployment is paused. This is retroactive to existing Deployments.
  • When scaling a Deployment during a rollout, all of its ReplicaSets are now scaled proportionally, based on the number of replicas each one has, instead of only the newest ReplicaSet being scaled.

robdaemon pushed a commit to hpcloud/kubernetes that referenced this pull request Oct 31, 2016
Proportional scaling of deployments was introduced in v1.4 (kubernetes#20273),
which caused this 1.3 test to fail on 1.4 clusters.
This is cherry-picked into 1.3 merely to fix test failures on
kubernetes-e2e-gke-1.3-1.4-upgrade-cluster
shyamjvs pushed a commit to shyamjvs/kubernetes that referenced this pull request Dec 1, 2016
Proportional scaling of deployments was introduced in v1.4 (kubernetes#20273),
which caused this 1.3 test to fail on 1.4 clusters.
This is cherry-picked into 1.3 merely to fix test failures on
kubernetes-e2e-gke-1.3-1.4-upgrade-cluster
shyamjvs pushed a commit to shyamjvs/kubernetes that referenced this pull request Dec 1, 2016
…-deployment

Automatic merge from submit-queue

Fix 1.3 paused deployment test failure against >1.4 cluster


**What this PR does / why we need it**: Fixing upgrade test kubernetes-e2e-gke-1.3-1.4-upgrade-cluster

Proportional scaling of deployments was introduced in v1.4 (kubernetes#20273),
which caused this 1.3 test to fail on 1.4 clusters.
This is cherry-picked into 1.3 merely to fix test failures on
kubernetes-e2e-gke-1.3-1.4-upgrade-cluster

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes kubernetes#32710

shyamjvs pushed a commit to shyamjvs/kubernetes that referenced this pull request Dec 1, 2016
Proportional scaling of deployments was introduced in v1.4 (kubernetes#20273),
which caused this 1.2 test to fail on 1.4 clusters.
This is cherry-picked into 1.2 merely to fix test failures on
kubernetes-e2e-gke-1.2-1.4-upgrade-cluster
Labels
area/app-lifecycle · lgtm · release-note · release-note-action-required · size/XXL