
Feature request: ability to specify a min/max replica count on scaleable resources #33843

Closed
derekwaynecarr opened this issue Sep 30, 2016 · 22 comments
Assignees
Labels
area/stateful-apps, lifecycle/rotten, sig/apps

Comments

@derekwaynecarr
Member

Users have requested the ability to set local min/max replica constraints for any resource that is a target for manual scaling (i.e. Deployment / ReplicaSet / ReplicationController / Job / PetSet).

The idea is that each spec for the related resources would have the following (similar to HPA):

```go
// lower limit for the number of replicas (defaults to 0)
MinReplicas *int32 `json:"minReplicas,omitempty"`
// upper limit for the number of replicas
MaxReplicas *int32 `json:"maxReplicas,omitempty"`
```

If a user submitted a kubectl scale command that would move Replicas outside the configured bounds, the request would fail validation and not proceed.
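
A minimal sketch of that validation check, assuming the proposed minReplicas/maxReplicas fields above; the helper name and package are illustrative, not existing Kubernetes code:

```go
package validation

import "fmt"

// validateReplicaBounds is a hypothetical helper: it rejects a requested
// replica count that falls outside the optional min/max bounds, which is
// roughly the check a scale request would hit during validation.
func validateReplicaBounds(replicas int32, min, max *int32) error {
	if min != nil && replicas < *min {
		return fmt.Errorf("replicas %d is below minReplicas %d", replicas, *min)
	}
	if max != nil && replicas > *max {
		return fmt.Errorf("replicas %d is above maxReplicas %d", replicas, *max)
	}
	return nil
}
```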

The example use case is as follows:

When deploying a database (pod) with no replication using persistent volumes and persistent volume claims, I want to restrict the scaling of the pod to no more than one replica. The reason is that I don't want data distributed over various persistent stores causing fragmentation.

Keeping the min/max local to the resource may make the most sense, and it is easily enforceable via validation. I had debated a pattern using LimitRange, but that has the drawback that its scope is too broad. While I am interested in adding label selectors to LimitRange so constraints can be scoped to particular classes of objects, in this case I think it makes the most sense to put min/max replicas directly on the resource being constrained.
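
For comparison, the LimitRange route would have needed something like the selector-scoped item below; this is purely hypothetical (no such field exists on LimitRange) and is shown only to illustrate why that option felt too far removed from the resource it constrains:

```go
package api

// scopedReplicaLimit is an illustrative sketch of what a LimitRange-style
// constraint would need: a label selector to narrow its scope, plus the
// same min/max bounds proposed above. It is not a real Kubernetes type.
type scopedReplicaLimit struct {
	// Selector narrows which workloads the bounds apply to.
	Selector map[string]string
	// MinReplicas and MaxReplicas mirror the proposed spec fields.
	MinReplicas *int32
	MaxReplicas *int32
}
```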

Thoughts?

/cc @smarterclayton @eparis @bgrant0607 @lavalamp

@derekwaynecarr derekwaynecarr self-assigned this Sep 30, 2016
@derekwaynecarr derekwaynecarr added this to the v1.5 milestone Sep 30, 2016
@derekwaynecarr
Member Author

derekwaynecarr commented Sep 30, 2016

I have users who want to run Spark or a Gluster management service and who have requested this extra level of control: local min/max replica boundaries on their resources to prevent operator changes to replicas.

@bgrant0607
Member

cc @erictune

@bgrant0607
Member

I am not a fan of this feature. It would add API complexity and seems cumbersome. I think there are much more important issues with the controllers that we need to tackle if we're going to spend API review bandwidth.

@derekwaynecarr
Member Author

@bgrant0607 - my response echoed your sentiment, but I got a fair amount of push-back from three distinct user communities looking to support Spark, Gluster, and a mobile application scenario.

At its core, the request is asking for a way to prevent users from harming themselves. Users can harm themselves in any number of ways, so I understand that the complexity/cumbersomeness may not be worth it for all workloads.

Any objection to the concept via annotations on the objects in question, with enforcement via an optional admission controller? If it gets broad adoption, we could revisit a first-class API when there is more bandwidth.
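
A minimal sketch of what that could look like; the annotation keys and the admission check below are hypothetical and do not exist in Kubernetes:

```go
package admission

import (
	"fmt"
	"strconv"
)

// Hypothetical annotation keys for locally declared scaling bounds.
const (
	minReplicasAnnotation = "scale.alpha.kubernetes.io/min-replicas"
	maxReplicasAnnotation = "scale.alpha.kubernetes.io/max-replicas"
)

// admitScale sketches what an optional admission plugin could do: read the
// min/max annotations off the object and reject any update whose replica
// count falls outside those bounds.
func admitScale(annotations map[string]string, replicas int32) error {
	if v, ok := annotations[minReplicasAnnotation]; ok {
		if min, err := strconv.ParseInt(v, 10, 32); err == nil && replicas < int32(min) {
			return fmt.Errorf("replicas %d is below annotated minimum %d", replicas, min)
		}
	}
	if v, ok := annotations[maxReplicasAnnotation]; ok {
		if max, err := strconv.ParseInt(v, 10, 32); err == nil && replicas > int32(max) {
			return fmt.Errorf("replicas %d is above annotated maximum %d", replicas, max)
		}
	}
	return nil
}
```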

@eparis
Contributor

eparis commented Sep 30, 2016

We are running into a number of places where the dev/person who wrote the app has these restrictions, but the operator likely doesn't know about them. It looks to me like a dev/ops split: allow dev to keep ops from doing things that hurt themselves.

@danmcp
Contributor

danmcp commented Sep 30, 2016

I think @eparis nailed the interaction exactly. The application writers are often the ones applying/suggesting these limitations for the deployer/operator. It should generally be possible for the operator to override (at a config level) separately from the scaling operation itself.

@smarterclayton
Contributor

It's usually not enough to do min/max. We'd need to do cut outs - an etcd pet set should probably only be allowed to exist at scales 3, 5, or 7.
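
As an illustration only (nothing like this exists in the API), a cut-out constraint is a membership check against an explicit set of allowed sizes rather than a range:

```go
package validation

import "fmt"

// checkAllowedScales is a hypothetical "cut-out" constraint: only the
// listed replica counts are permitted, e.g. 3, 5, or 7 for an etcd pet set.
func checkAllowedScales(replicas int32, allowed []int32) error {
	for _, n := range allowed {
		if replicas == n {
			return nil
		}
	}
	return fmt.Errorf("replicas %d is not one of the allowed scales %v", replicas, allowed)
}
```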

@derekwaynecarr
Member Author

ack: enforcing odd numbers is a reasonable request

@smarterclayton
Contributor

I share some of Brian's concerns, as I've mentioned before. I think this tends to be more of a requirement for pet sets (it was documented in the original proposal but is not currently in the critical implementation path). Ultimately, the expectation that the person doing the scaling doesn't understand what is being scaled is somewhat difficult to enforce, even if it's not unreasonable when dealing with white-box deployed software.

This most frequently comes up for things that are "stateful" - I can only have 1 of something, but nothing prevents me from setting it to 2 or more. I would argue the database case is best solved by PV, and the "I'm using a PVC with RWO but based on RWM and it scales up and breaks me" case is best solved by not allowing a PVC with RWO to be mounted in two locations (this has been discussed under the pet set fencing and storage locking discussion, and applies to local persistent storage and users performing in-place upgrades on the same node).

@bgrant0607
Member

cc @erictune

@smarterclayton
Contributor

smarterclayton commented Oct 12, 2016 via email

It's not enough to do odd numbers though - 3, 5, 7 may be valid but others are not. A few systems require 3n+1. Also, does zero count, or is it a potential exception?

I think this is a requirement to be leveraged against pet sets, not against general resources, and should be solved there, not across all scalable resources.

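To illustrate the point: rules like "odd only" or "3n+1" cannot be expressed as a range at all; the constraint would effectively be an arbitrary predicate over the replica count. A hypothetical sketch:

```go
package validation

// scalePredicate is a hypothetical constraint expressed as an arbitrary
// predicate over the replica count, which is what formulas like "odd only"
// or "3n+1" would require (zero may need to be a separate exception).
type scalePredicate func(replicas int32) bool

var (
	oddOnly       scalePredicate = func(r int32) bool { return r%2 == 1 }
	threeNPlusOne scalePredicate = func(r int32) bool { return r%3 == 1 }
)
```
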
@smarterclayton
Contributor

Also, enforcing an annotation would break autoscaling, and autoscaling a pet set should be possible, so an enforcing admission controller needs to at least take that into account and prevent abuse / bugs / unexpected behavior.

@smarterclayton
Contributor

A much simpler option might be a "scale protection", like the "delete protection" we discussed previously: an annotation which clients respect as "the caller suggested not scaling this". That makes this less of an API problem than a UX problem.
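
A sketch of how a client could honor such an annotation; the key and helper below are hypothetical:

```go
package cli

import "fmt"

// scaleProtectionAnnotation is a hypothetical key mirroring the "delete
// protection" idea. Clients such as kubectl would merely respect it, so
// this is a UX convention rather than server-side enforcement.
const scaleProtectionAnnotation = "protection.alpha.kubernetes.io/scale"

// confirmScale returns an error asking the caller to override explicitly
// when the object carries the scale-protection annotation.
func confirmScale(annotations map[string]string, override bool) error {
	if annotations[scaleProtectionAnnotation] == "true" && !override {
		return fmt.Errorf("object is marked scale-protected; an explicit override is required to scale it")
	}
	return nil
}
```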

@dims
Member

dims commented Nov 16, 2016

This needs to be triaged as a release-blocker or not for 1.5 @smarterclayton @derekwaynecarr

@eparis
Contributor

eparis commented Nov 16, 2016

Not a blocker.

@dims
Member

dims commented Nov 18, 2016

thanks @eparis

@dims
Member

dims commented Dec 9, 2016

@derekwaynecarr Is it appropriate to move this to the next milestone or clear the 1.5 milestone? (and remove the non-release-blocker tag as well)

@eparis eparis modified the milestones: v1.6, v1.5 Dec 9, 2016
@ethernetdan
Contributor

Moving to 1.7 as it is too late to land in 1.6. Feel free to switch back if this is incorrect.

@lpshikhar

Any progress on the scale selector?

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 23, 2017
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 22, 2018
@bgrant0607
Member

This kind of policy should be implemented outside the workload controllers.
