# [WIP][DO NOT MERGE] Proposal: Auto-scaling (#546)

## Abstract

Auto-scaling is a data-driven feature that allows users to increase or decrease capacity as needed by automatically
controlling the number of pods deployed within the system.

## Motivation

Applications experience peaks and valleys in usage. In order to respond to increases and decreases in load, administrators
scale their applications by adding computing resources. In a cloud computing environment this can be
done automatically based on statistical analysis and thresholds.

### Goals

* Provide a concrete proposal for implementing auto-scaling of pods within Kubernetes
* Keep the implementation proposal in line with current discussions in existing issues:
    * Resize verb - [1629](https://github.com/GoogleCloudPlatform/kubernetes/issues/1629)
    * Config conflicts - [Config](https://github.com/GoogleCloudPlatform/kubernetes/blob/c7cb991987193d4ca33544137a5cb7d0292cf7df/docs/config.md#automated-re-configuration-processes)
    * Rolling updates - [1353](https://github.com/GoogleCloudPlatform/kubernetes/issues/1353)
    * Multiple scalable types - [1624](https://github.com/GoogleCloudPlatform/kubernetes/issues/1624)

## Constraints and Assumptions

* This proposal is for horizontal scaling only. Vertical scaling will be handled by [issue 2072](https://github.com/GoogleCloudPlatform/kubernetes/issues/2072)
* `ReplicationControllers` will not know about the auto-scaler; they are the target of the auto-scaler. The `ReplicationController` responsibilities are
constrained to only ensuring that the desired number of pods are operational per the [Replication Controller Design](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/replication-controller.md#responsibilities-of-the-replication-controller)
* Auto-scalers will be loosely coupled with data-gathering components in order to allow a wide variety of input sources
* Auto-scalable resources will support a resize verb ([1629](https://github.com/GoogleCloudPlatform/kubernetes/issues/1629))
such that the auto-scaler does not directly manipulate the underlying resource
* Initially, most thresholds will be set by application administrators. It should be possible for an auto-scaler to be
written later that sets thresholds automatically based on past behavior (CPU used vs. incoming requests)
* The auto-scaler must be aware of user-defined actions so it does not override them unintentionally (for instance, someone
explicitly setting the replica count to 0 should mean that the auto-scaler does not try to scale the application up)
* It should be possible to write and deploy a custom auto-scaler without modifying existing auto-scalers

## Use Cases

### Scaling based on traffic

The current, most obvious use case is scaling an application based on network traffic, such as requests per second. Most
applications will expose one or more network endpoints for clients to connect to. Many of those endpoints will be load
balanced or situated behind a proxy - the data from those proxies and load balancers can be used to estimate client-to-server
traffic for applications. This is the primary, but not sole, source of data for making scaling decisions.

Within Kubernetes a [kube proxy](https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/services.md#ips-and-portals)
running on each node directs service requests to the underlying implementation.

While the proxy provides internal inter-pod connections, there will be L3 and L7 proxies and load balancers that manage
traffic to backends. OpenShift, for instance, adds a "route" resource for defining external-to-internal traffic flow.
The "routers" are HAProxy or Apache load balancers that aggregate many different services and pods and can serve as a
data source for the number of backends.

### Scaling based on predictive analysis

Scaling may also occur based on predictions of system state, such as anticipated load or historical data. Hand in hand
with scaling based on traffic, predictive analysis may be used to determine anticipated system load and scale the application automatically.

### Scaling based on arbitrary data

Administrators may wish to scale the application based on any number of arbitrary data points, such as job execution time or
duration of active sessions. There are any number of reasons an administrator may wish to increase or decrease capacity, which
means the auto-scaler must be a configurable, extensible component.

## Specification

In order to facilitate talking about auto-scaling, the following definitions are used:

* `ReplicationController` - the first building block of auto-scaling. Pods are deployed and scaled by a `ReplicationController`.
* kube proxy - the proxy that handles internal inter-pod traffic; an example of a data source to drive an auto-scaler
* L3/L7 proxies - a routing layer handling outside-to-inside traffic requests; an example of a data source to drive an auto-scaler
* auto-scaler - scales replicas up and down by using the `resize` endpoint provided by scalable resources (`ReplicationController`)

### Auto-Scaler

The auto-scaler is a state reconciler responsible for checking data against configured scaling thresholds
and calling the `resize` endpoint to change the number of replicas. The scaler will
use a client/cache implementation to receive watch data from the data aggregators and respond to them by
scaling the application. Auto-scalers are created and defined like other resources via REST endpoints and belong to a
namespace, just as a `ReplicationController` or `Service` does.

There are two options for implementing the auto-scaler:

1. Annotations on a `ReplicationController`

    Pros:

    * uses an existing resource, not another component that must be defined separately
    * easy to know what the target of the auto-scaler is, since the config for the scaler is attached to the target

    Cons:

    * configuration in annotations is marginally more difficult than plain old JSON
    * rather than watching explicitly for new auto-scaler definitions, the auto-scaler controller must watch all
    `ReplicationController`s and create auto-scalers when appropriate. As new auto-scalable resources are defined, the
    auto-scaler controller must also watch those resources.

1. As a new resource

    Pros:

    * auto-scalers are managed by the user independently of the `ReplicationController`
    * flexible by using a selector to the scalable resource (one that implements the `resize` verb); future implementations
    *may* require no extra work on the auto-scaler side

    Cons:

    * one more resource to store, manage, and monitor

For this proposal, the auto-scaler is a resource:

```go
//The auto-scaler interface
type AutoScalerInterface interface {
    //Adjust a resource's replica count. Calls the resize endpoint. The args to this are based on what the endpoint
    //can support. See https://github.com/GoogleCloudPlatform/kubernetes/issues/1629
    ScaleApplication(num int) error
}

type AutoScaler struct {
    //Thresholds that drive scaling decisions
    AutoScaleThresholds []AutoScaleThreshold

    //turn auto-scaling on or off
    Enabled bool
    //max replicas that the auto-scaler can use, empty is unlimited
    MaxAutoScaleCount int
    //min replicas that the auto-scaler can use, empty == 0 (idle)
    MinAutoScaleCount int

    //the label selector that points to a resource implementing the resize verb. Right now this is a ReplicationController;
    //in the future it could be a job or any resource that implements resize.
    ScalableTargetSelector string
}
```
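
For illustration only, an instance of this resource equivalent to the JSON definition shown under Use Case Realization
below might be constructed like this in Go (a sketch; it assumes the types above plus the standard `time` package):

```go
scaler := AutoScaler{
    Enabled:           true,
    MaxAutoScaleCount: 50,
    MinAutoScaleCount: 1,
    AutoScaleThresholds: []AutoScaleThreshold{
        //scale when requests per second stay above 50 for 30 seconds
        {Type: "requestPerSecond", Duration: 30 * time.Second, Value: 50},
    },
    ScalableTargetSelector: "myapp-replcontroller",
}
```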

```go
//abstracts the data analysis from the auto-scaler
//example: scale when RequestsPerSecond (type) is above 50 (value) for 30 seconds (duration)
type AutoScaleThresholdInterface interface {
    //called by the auto-scaler to determine if this threshold is met or not
    ShouldScale() bool
}

type StatisticType string

//generic type definition
type AutoScaleThreshold struct {
    //scale based on this threshold (see below for definition)
    //example: RequestsPerSecond StatisticType = "requestPerSecond"
    Type StatisticType
    //after this duration
    Duration time.Duration
    //when this value is passed
    Value float64
}
```
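
To make the contract concrete, here is a minimal sketch of one way a concrete threshold could satisfy
`AutoScaleThresholdInterface`. The `StatsSource` interface and every name in it are illustrative assumptions about what
a data aggregator might expose; they are not part of this proposal:

```go
//StatsSource is a hypothetical view onto the data aggregator: it resolves a
//statistic to a single value observed over a trailing window (see the Data
//Aggregator section below).
type StatsSource interface {
    Average(stat StatisticType, window time.Duration) (float64, error)
}

//requestRateThreshold pairs a generic AutoScaleThreshold with a data source
//so that it can satisfy AutoScaleThresholdInterface.
type requestRateThreshold struct {
    threshold AutoScaleThreshold
    source    StatsSource
}

//ShouldScale returns true when the observed statistic has exceeded the
//configured value over the configured duration.
func (r requestRateThreshold) ShouldScale() bool {
    observed, err := r.source.Average(r.threshold.Type, r.threshold.Duration)
    if err != nil {
        //fail closed: if no data is available, take no scaling action
        return false
    }
    return observed > r.threshold.Value
}
```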

### Data Aggregator

This section has intentionally been left empty. I will defer to folks who have more experience gathering and analyzing
time series statistics.

Data aggregation is opaque to the auto-scaler resource. The auto-scaler is configured to use `AutoScaleThresholds`
that know how to work with the underlying data in order to know if an application must be scaled up or down. Data aggregation
must feed a common data structure to ease the development of `AutoScaleThreshold`s, but it does not matter to the
auto-scaler whether this occurs in a push or pull implementation, whether or not the data is stored at a granular level,
or what algorithm is used to determine the final statistics value. Ultimately, the auto-scaler only requires that a statistic
resolve to a value that can be checked against a configured threshold.

Of note: if the statistics-gathering mechanisms can be initialized with a registry, other components storing statistics can
potentially piggyback on this registry.
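
As one possible shape for that common data structure and registry, consider the sketch below; all of the names here are
invented for illustration and assume the standard `sync` and `time` packages:

```go
//Sample is a hypothetical common structure that every gatherer (kube proxy,
//router, etc.) feeds into the registry, whether pushed or pulled.
type Sample struct {
    Stat      StatisticType //e.g. "requestPerSecond"
    Selector  string        //the target the sample describes
    Value     float64
    Timestamp time.Time
}

//Registry is a minimal shared sink for samples. Thresholds read samples back
//to resolve a statistic to a single value.
type Registry struct {
    mu      sync.Mutex
    samples []Sample
}

//Record stores one observation; gatherers call this.
func (r *Registry) Record(s Sample) {
    r.mu.Lock()
    defer r.mu.Unlock()
    r.samples = append(r.samples, s)
}

//Since returns the samples for a statistic newer than the cutoff; a threshold
//can average them to decide whether it has been exceeded.
func (r *Registry) Since(stat StatisticType, cutoff time.Time) []Sample {
    r.mu.Lock()
    defer r.mu.Unlock()
    var out []Sample
    for _, s := range r.samples {
        if s.Stat == stat && s.Timestamp.After(cutoff) {
            out = append(out, s)
        }
    }
    return out
}
```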

## Use Case Realization

### Scaling based on traffic

1. The user defines the application's auto-scaling resources:

    ```json
    {
      "id": "myapp-autoscaler",
      "kind": "AutoScaler",
      "apiVersion": "v1beta1",
      "maxAutoScaleCount": 50,
      "minAutoScaleCount": 1,
      "thresholds": [
        {
          "id": "myapp-rps",
          "kind": "AutoScaleThreshold",
          "type": "requestPerSecond",
          "durationVal": 30,
          "durationInterval": "seconds",
          "value": 50
        }
      ],
      "selector": "myapp-replcontroller"
    }
    ```

1. The auto-scaler controller watches for new `AutoScaler` definitions and creates the resource
1. Periodically, the auto-scaler loops through the defined thresholds and determines whether a threshold has been exceeded
1. If the app must be scaled, the auto-scaler calls the `resize` endpoint for `myapp-replcontroller`
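
A minimal sketch of one pass of steps 3-4 might look like the following; the `resize` callback and the simple
step-by-one policy are assumptions for illustration, not part of the proposal:

```go
//reconcile performs one pass of the periodic threshold check described above.
//The thresholds are assumed to satisfy AutoScaleThresholdInterface, and resize
//stands in for a call to the target resource's resize endpoint.
func reconcile(scaler AutoScaler, thresholds []AutoScaleThresholdInterface,
    current int, resize func(num int) error) error {
    if !scaler.Enabled {
        return nil
    }
    for _, t := range thresholds {
        if !t.ShouldScale() {
            continue
        }
        //simplest possible policy: step up by one replica, clamped to the
        //configured maximum (empty/zero meaning unlimited)
        desired := current + 1
        if scaler.MaxAutoScaleCount > 0 && desired > scaler.MaxAutoScaleCount {
            desired = scaler.MaxAutoScaleCount
        }
        if desired == current {
            return nil
        }
        return resize(desired)
    }
    return nil
}
```

Scale-down and `MinAutoScaleCount` handling would follow the same pattern.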