diff --git a/docs/proposals/autoscaling.md b/docs/proposals/autoscaling.md
new file mode 100644
index 0000000000000..029a6a822d966
--- /dev/null
+++ b/docs/proposals/autoscaling.md
@@ -0,0 +1,254 @@

## Abstract

Auto-scaling is a data-driven feature that allows users to increase or decrease capacity as needed by automatically
controlling the number of pods deployed within the system.

## Motivation

Applications experience peaks and valleys in usage. In order to respond to increases and decreases in load, administrators
scale their applications by adding computing resources. In a cloud computing environment this can be
done automatically based on statistical analysis and thresholds.

### Goals

* Provide a concrete proposal for implementing auto-scaling of pods within Kubernetes.
* The implementation proposal should be in line with current discussions in existing issues:
  * Resize verb - [1629](https://github.com/GoogleCloudPlatform/kubernetes/issues/1629)
  * Config conflicts - [Config](https://github.com/GoogleCloudPlatform/kubernetes/blob/c7cb991987193d4ca33544137a5cb7d0292cf7df/docs/config.md#automated-re-configuration-processes)
  * Rolling updates - [1353](https://github.com/GoogleCloudPlatform/kubernetes/issues/1353)
  * Multiple scalable types - [1624](https://github.com/GoogleCloudPlatform/kubernetes/issues/1624)

## Constraints and Assumptions

* This proposal covers horizontal scaling only. Vertical scaling will be handled in [issue 2072](https://github.com/GoogleCloudPlatform/kubernetes/issues/2072).
* `ReplicationControllers` will not know about the auto-scaler; they are the target of the auto-scaler. The `ReplicationController` responsibilities are
constrained to ensuring that the desired number of pods are operational, per the [Replication Controller Design](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/replication-controller.md#responsibilities-of-the-replication-controller).
* Auto-scalers will be loosely coupled with data-gathering components in order to allow a wide variety of input sources.
* Auto-scalable resources will support a resize verb ([1629](https://github.com/GoogleCloudPlatform/kubernetes/issues/1629))
so that the auto-scaler does not directly manipulate the underlying resource.
* Initially, most thresholds will be set by application administrators. It should be possible to later write an auto-scaler
that sets thresholds automatically based on past behavior (e.g. CPU used vs. incoming requests).
* The auto-scaler must be aware of user-defined actions so it does not override them unintentionally (for instance, someone
explicitly setting the replica count to 0 means that the auto-scaler should not try to scale the application up).
* It should be possible to write and deploy a custom auto-scaler without modifying existing auto-scalers.
* Auto-scalers must be able to monitor multiple replication controllers while only targeting a single scalable
object (for now a `ReplicationController`, but in the future it could be a job or any resource that implements resize).

## Use Cases

### Scaling based on traffic

The current, most obvious, use case is scaling an application based on network traffic, such as requests per second. Most
applications will expose one or more network endpoints for clients to connect to. Many of those endpoints will be load
balanced or situated behind a proxy - the data from those proxies and load balancers can be used to estimate client-to-server
traffic for applications. This is the primary, but not sole, source of data for making scaling decisions.

Within Kubernetes a [kube proxy](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/services.md#ips-and-portals)
running on each node directs service requests to the underlying implementation.

While the proxy provides internal inter-pod connections, there will be L3 and L7 proxies and load balancers that manage
traffic to backends. OpenShift, for instance, adds a "route" resource for defining external-to-internal traffic flow.
The "routers" are HAProxy or Apache load balancers that aggregate many different services and pods and can serve as a
data source for the number of backends.
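
As a rough illustration of how proxy and load balancer data might be turned into a statistic an auto-scaler can consume, the sketch below polls cumulative request counters and derives a requests-per-second value. It is a hypothetical example only; the `TrafficSource` interface, the `pollingSource` type, and the counter-scraping callback are assumptions made for this sketch and are not part of the proposal.

```go
// Hypothetical sketch only: deriving a requests-per-second statistic from a
// load balancer's cumulative request counters. None of these names are part
// of the proposed API.
package trafficstats

import (
	"sync"
	"time"
)

// TrafficSource exposes a single statistic that a data aggregator (or the
// auto-scaler itself) could read for a monitored backend.
type TrafficSource interface {
	RequestsPerSecond(backend string) float64
}

// pollingSource computes requests per second by sampling cumulative request
// counts (for example, scraped from an HAProxy or Apache status page) at a
// fixed interval.
type pollingSource struct {
	mu       sync.Mutex
	interval time.Duration
	last     map[string]uint64  // previous counter sample per backend
	rps      map[string]float64 // most recently computed rate per backend
	counters func() (map[string]uint64, error)
}

// poll takes one counter sample and updates the derived rates.
func (p *pollingSource) poll() error {
	current, err := p.counters()
	if err != nil {
		return err
	}
	p.mu.Lock()
	defer p.mu.Unlock()
	for backend, count := range current {
		if prev, ok := p.last[backend]; ok && count >= prev {
			p.rps[backend] = float64(count-prev) / p.interval.Seconds()
		}
		p.last[backend] = count
	}
	return nil
}

func (p *pollingSource) RequestsPerSecond(backend string) float64 {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.rps[backend]
}
```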
### Scaling based on predictive analysis

Scaling may also occur based on predictions of system state, such as anticipated load or historical data. Hand in hand
with scaling based on traffic, predictive analysis may be used to determine anticipated system load and scale the application automatically.

### Scaling based on arbitrary data

Administrators may wish to scale the application based on any number of arbitrary data points, such as job execution time or
duration of active sessions. There are any number of reasons an administrator may wish to increase or decrease capacity, which
means the auto-scaler must be a configurable, extensible component.

## Specification

In order to facilitate talking about auto-scaling the following definitions are used:

* `ReplicationController` - the first building block of auto-scaling. Pods are deployed and scaled by a `ReplicationController`.
* kube proxy - handles internal inter-pod traffic; an example of a data source to drive an auto-scaler.
* L3/L7 proxies - a routing layer handling outside-to-inside traffic requests; an example of a data source to drive an auto-scaler.
* auto-scaler - scales replicas up and down by using the `resize` endpoint provided by scalable resources (`ReplicationController`).

### Auto-Scaler

The Auto-Scaler is a state reconciler responsible for checking data against configured scaling thresholds
and calling the `resize` endpoint to change the number of replicas. The scaler will
use a client/cache implementation to receive watch data from the data aggregators and respond to them by
scaling the application. Auto-scalers are created and defined like other resources via REST endpoints and belong to a
namespace, just as a `ReplicationController` or `Service` does.
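
To make the reconciliation behavior concrete, here is a minimal sketch of a single reconcile pass. The `Threshold` and `Target` interfaces are simplified stand-ins assumed for this example (approximating `AutoScaleThreshold` and a resource exposing the resize endpoint); the types actually proposed are defined in the next section.

```go
// Minimal sketch, not a prescribed design: evaluate thresholds, clamp to the
// configured bounds, and call the resize endpoint only when the count changes.
package autoscaler

// Threshold reports whether it is met and by how much to adjust the replica
// count when it is (negative values scale down).
type Threshold interface {
	ShouldScale() bool
	Increment() int
}

// Target is the scalable resource; Resize stands in for the resize endpoint.
type Target interface {
	Replicas() (int, error)
	Resize(count int) error
}

// Reconcile performs one pass of the reconciliation loop.
func Reconcile(target Target, thresholds []Threshold, min, max int) error {
	current, err := target.Replicas()
	if err != nil {
		return err
	}
	desired := current
	for _, t := range thresholds {
		if t.ShouldScale() {
			desired += t.Increment()
		}
	}
	// Clamp to the configured bounds before resizing.
	if desired > max {
		desired = max
	}
	if desired < min {
		desired = min
	}
	if desired == current {
		return nil
	}
	return target.Resize(desired)
}
```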
Since an auto-scaler is a durable object, it is best represented as a resource.

```go
// AutoScalerInterface is the auto-scaler interface.
type AutoScalerInterface interface {
	// ScaleApplication adjusts a resource's replica count by calling the resize endpoint.
	// The arguments are based on what the endpoint can support.
	// See https://github.com/GoogleCloudPlatform/kubernetes/issues/1629
	ScaleApplication(num int) error
}

type AutoScaler struct {
	// Common construct
	TypeMeta
	// Common construct
	ObjectMeta

	// Spec defines the configuration options that drive the behavior of this auto-scaler.
	Spec AutoScalerSpec

	// Status defines the current status of this auto-scaler.
	Status AutoScalerStatus
}

type AutoScalerSpec struct {
	// AutoScaleThresholds holds a collection of AutoScaleThresholds that drive the auto-scaler.
	AutoScaleThresholds []AutoScaleThreshold

	// Enabled turns auto-scaling on or off.
	Enabled bool

	// MaxAutoScaleCount defines the maximum number of replicas that the auto-scaler can use.
	// This value must be greater than 0 and >= MinAutoScaleCount.
	MaxAutoScaleCount int

	// MinAutoScaleCount defines the minimum number of replicas that the auto-scaler can reduce to;
	// 0 means that the application is allowed to idle.
	MinAutoScaleCount int

	// TargetSelector provides the resizable target(s). Right now this is a ReplicationController;
	// in the future it could be a job or any resource that implements resize.
	TargetSelector map[string]string

	// MonitorSelector defines the set of capacity that the auto-scaler is monitoring
	// (replication controllers). Monitored objects are used by thresholds to examine
	// statistics. Example: get statistic X for object Y to see if a threshold is passed.
	MonitorSelector map[string]string
}

type AutoScalerStatus struct {
	// TODO: open for discussion on what meaningful information can be reported in the status.
	// The status may return the replica count here, but we may want more information,
	// such as whether the count reflects a threshold being passed.
}

// AutoScaleThresholdInterface abstracts the data analysis from the auto-scaler.
// Example: scale by 1 (Increment) when requests per second pass a
// comparison (Comparison) of 50 (Value) for 30 seconds (Duration).
type AutoScaleThresholdInterface interface {
	// ShouldScale is called by the auto-scaler to determine if this threshold is met or not.
	ShouldScale() bool
}

// AutoScaleThreshold is a single statistic used to drive the auto-scaler in scaling decisions.
type AutoScaleThreshold struct {
	// Type is the type of threshold being used, intention or value.
	Type AutoScaleThresholdType

	// ValueConfig holds the config for value based thresholds.
	ValueConfig AutoScaleValueThresholdConfig

	// IntentionConfig holds the config for intention based thresholds.
	IntentionConfig AutoScaleIntentionThresholdConfig
}

// AutoScaleIntentionThresholdConfig holds configuration for intention based thresholds.
// An intention based threshold defines no increment; the scaler will adjust by 1 accordingly
// and maintain once the intention is reached. Also, no selector is defined; the intention
// should dictate the selector used for statistics. The same applies to duration, although we
// may want a configurable duration later so intentions are more customizable.
type AutoScaleIntentionThresholdConfig struct {
	// Intent is the lexicon of what intention is requested.
	Intent AutoScaleIntentionType

	// Value is intention dependent in terms of above, below, equal and represents
	// the value to check against.
	Value float64
}

// AutoScaleValueThresholdConfig holds configuration for value based thresholds.
type AutoScaleValueThresholdConfig struct {
	// Increment determines how the auto-scaler should scale up or down (positive number to
	// scale up based on this threshold, negative number to scale down by this threshold).
	Increment int
	// Selector represents the retrieval mechanism for a statistic value from statistics
	// storage. Once statistics are better defined the retrieval mechanism may change.
	// Ultimately, the selector returns a representation of a statistic that can be
	// compared against the threshold value.
	Selector map[string]string
	// Duration is the time lapse after which this threshold is considered passed.
	Duration time.Duration
	// Value is the number at which, after the duration is passed, this threshold is considered
	// to be triggered.
	Value float64
	// Comparison is the comparison (e.g. greater than, less than) applied to the value.
	Comparison string
}

// AutoScaleThresholdType is either intention based or value based.
type AutoScaleThresholdType string

// AutoScaleIntentionType is a lexicon for intentions such as "cpu-utilization" or
// "max-rps-per-endpoint".
type AutoScaleIntentionType string
```

#### Boundary Definitions

The `AutoScaleThreshold` definitions provide the boundaries for the auto-scaler. By defining comparisons that form a range,
along with positive and negative increments, you may define bi-directional scaling. For example, the upper bound may be
specified as "when requests per second rise above 50 for 30 seconds, scale the application up by 1", and a lower bound may
be specified as "when requests per second fall below 25 for 30 seconds, scale the application down by 1" (implemented by using -1).
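
As a hedged illustration of the bi-directional example above, the following snippet populates an `AutoScalerSpec` (using the types defined in the previous block) with one upper and one lower value threshold. The selector keys and the comparison strings are made-up placeholders; the proposal does not yet pin down their exact vocabulary.

```go
// Illustrative only: one possible encoding of "scale up by 1 when requests per
// second rise above 50 for 30 seconds, scale down by 1 when they fall below 25
// for 30 seconds". Selector keys and comparison strings ("greater-than",
// "less-than") are assumed placeholders.
var exampleSpec = AutoScalerSpec{
	Enabled:           true,
	MinAutoScaleCount: 1,
	MaxAutoScaleCount: 10,
	TargetSelector:    map[string]string{"name": "frontend"},
	MonitorSelector:   map[string]string{"service": "frontend"},
	AutoScaleThresholds: []AutoScaleThreshold{
		{
			Type: "value",
			ValueConfig: AutoScaleValueThresholdConfig{
				Increment:  1,
				Value:      50,
				Comparison: "greater-than",
				Duration:   30 * time.Second,
				Selector:   map[string]string{"statistic": "requests-per-second"},
			},
		},
		{
			Type: "value",
			ValueConfig: AutoScaleValueThresholdConfig{
				Increment:  -1,
				Value:      25,
				Comparison: "less-than",
				Duration:   30 * time.Second,
				Selector:   map[string]string{"statistic": "requests-per-second"},
			},
		},
	},
}
```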
### Data Aggregator

This section has intentionally been left empty. I will defer to folks who have more experience gathering and analyzing
time series statistics.

Data aggregation is opaque to the auto-scaler resource. The auto-scaler is configured to use `AutoScaleThresholds`
that know how to work with the underlying data in order to know if an application must be scaled up or down. Data aggregation
must feed a common data structure to ease the development of `AutoScaleThreshold`s, but it does not matter to the
auto-scaler whether this occurs in a push or pull implementation, whether or not the data is stored at a granular level,
or what algorithm is used to determine the final statistics value. Ultimately, the auto-scaler only requires that a statistic
resolve to a value that can be checked against a configured threshold.

Of note: if the statistics-gathering mechanisms can be initialized with a registry, other components storing statistics can
potentially piggyback on this registry.

### Multi-target Scaling Policy

If multiple resizable targets satisfy the `TargetSelector` criteria, the auto-scaler should be configurable as to which
target(s) are resized. To begin with, if multiple targets are found, the auto-scaler will scale the largest target up
or down as appropriate. In the future this may be more configurable; a sketch of the initial policy follows.
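
A minimal sketch of that default "scale the largest target" rule, assuming a hypothetical `candidate` stand-in for whatever resizable resource matched the selector:

```go
// Default multi-target policy sketch: among the resizable targets matched by
// TargetSelector, pick the one with the most replicas. The candidate type is
// an assumed stand-in, not part of the proposed API.
type candidate struct {
	name     string
	replicas int
}

// pickLargest returns the largest matching target, or false if none matched.
func pickLargest(matches []candidate) (candidate, bool) {
	if len(matches) == 0 {
		return candidate{}, false
	}
	largest := matches[0]
	for _, c := range matches[1:] {
		if c.replicas > largest.replicas {
			largest = c
		}
	}
	return largest, true
}
```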
### Interactions with a deployment

In a deployment it is likely that multiple replication controllers must be monitored. For instance, in a [rolling deployment](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/replication-controller.md#rolling-updates)
there will be multiple replication controllers, with one scaling up and another scaling down. This means that an
auto-scaler must be aware of the entire set of capacity that backs a service so it does not fight with the deployer. `AutoScalerSpec.MonitorSelector`
is what provides this ability. By using a selector that spans the entire service, the auto-scaler can monitor the capacity
of multiple replication controllers and check that capacity against `AutoScalerSpec.MaxAutoScaleCount` and
`AutoScalerSpec.MinAutoScaleCount` while still only targeting a specific set of `ReplicationController`s with `TargetSelector`.

In the course of a deployment it is up to the deployment orchestration to decide how to manage the labels
on the replication controllers if it needs to ensure that only specific replication controllers are targeted by
the auto-scaler. By default, the auto-scaler will scale the largest replication controller that meets the target label
selector criteria.

During deployment orchestration the auto-scaler may be making decisions to scale its target up or down. In order to prevent
the scaler from fighting with a deployment process that is scaling one replication controller up and scaling another one
down, the deployment process must assume that the current replica count may be changed by objects other than itself and
account for this in the scale-up or scale-down process. Therefore, the deployment process may no longer target an exact number
of instances to be deployed. It must be satisfied that the replica count for the deployment meets or exceeds the number
of requested instances.

Auto-scaling down in a deployment scenario is a special case. In order for the deployment to complete successfully, the
deployment orchestration must ensure that the desired number of instances that are supposed to be deployed has been met.
If the auto-scaler is trying to scale the application down (due to no traffic, or other statistics), then the deployment
process and the auto-scaler are fighting to increase and decrease the count of the targeted replication controller. In order
to prevent this, deployment orchestration should notify the auto-scaler that a deployment is occurring. This will
temporarily disable negative decrement thresholds until the deployment process is completed. It is more important for
an auto-scaler to be able to grow capacity during a deployment than to shrink the number of instances precisely.
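
A minimal sketch of how that notification could be honored during reconciliation, assuming a `deploymentInProgress` flag supplied by the deployment orchestration and a hypothetical `thresholdMet` helper wrapping `ShouldScale()`; none of these names are prescribed by the proposal:

```go
// Sketch only: while a deployment is in progress, suspend negative
// (scale-down) thresholds so the auto-scaler can still grow capacity but
// will not shrink the replica count out from under the deployment.
func desiredReplicas(current int, thresholds []AutoScaleThreshold, deploymentInProgress bool) int {
	desired := current
	for _, t := range thresholds {
		increment := t.ValueConfig.Increment
		if deploymentInProgress && increment < 0 {
			continue // negative decrement thresholds are temporarily disabled
		}
		if thresholdMet(t) { // assumed helper wrapping ShouldScale()
			desired += increment
		}
	}
	return desired
}
```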