Skip to content

Latest commit

 

History

History
 
 

app-scaling

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 

Kubernetes App Auto-scaling

Horizontal Pod Autoscaling (HPA) is a Kubernetes feature to dynamically increase/decrease the number of pod replicas based on resource utilization metrics.

Heapster is an open source project that collects cluster metrics, and Kubernetes integrates with it to determine CPU consumption. Heapster’s default backend, referred to as a “sink”, is InfluxDB. Additional sinks are supported including Elasticsearch.

Kubernetes currently has beta support for custom metrics from multiple sources, and not just Heapster. This is enabled via the API Server Aggregation feature, and this documentation will be updated when the feature is live to show HPA with additional types of metrics.

HPA can automatically scale pods deployed in a replication controller, deployment, or a replica set. For additional information on how HPA works, check out the Kubernetes community documentation.

Pre-requisites

A 3 master nodes and 5 worker nodes cluster as explained at ../cluster-install#multi-master-multi-node-multi-az-gossip-based-cluster is used for this chapter.

Deploy Heapster using the steps described in the Cluster Monitoring lab. You can deploy Heapster using the configuration file cluster-monitoring/heapster/templates/heapster.yaml:

$ kubectl create -f heapster/templates/heapster.yaml
serviceaccount "heapster" created
deployment "heapster" created
service "heapster" created

Deploy an application

In this step, we deploy a simple Go web application and constrain the CPU resources just for the purposes of this test.

$ kubectl run webapp --image=trevorrobertsjr/webapp --requests=cpu=50m --expose --port=8080
service "webapp" created
deployment "webapp" created

It also publishes the service at port 8080.

Horizontal Pod Autoscaler configuration

Now that our application is running, we create a Horizonal Pod Autoscaler for our webapp deployment.

$ kubectl autoscale deployment webapp --cpu-percent=10 --min=1 --max=10
deployment "webapp" autoscaled

This command will mainain between 1 and 10 replicas of the pod. The autoscaler will increase or decrease the number of replicas to maintain average CPU utilization of 10% across all the pods.

Generate load

The simplest method to do this would be to access the application in an infinite loop similar to the example in the Kubernetes Horizonal Pod Autoscaler documentation:

First, deploy a busybox container, label it load-generator and attach to it’s prompt:

$ kubectl run -i --tty load-generator --image=busybox /bin/sh

At the load-generator command prompt, run a continuous request of the webapp

$ while true; do wget -q -O- http://webapp.default.svc.cluster.local:8080; done

If for any reason you get disconnected from the load-generator container, you can re-attach to it with the following command.

$ kubectl attach $(kubectl get pod | grep load | awk '{print $1}') -c load-generator -i -t

In a different terminal window, check the status of the Horizontal Pod Autoscaler.

$ kubectl get hpa -w

You will see output similar to the following over successive queries of the hpa resource:

$ kubectl get hpa -w
NAME      REFERENCE           TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
webapp    Deployment/webapp   0% / 10%   1         10        1          6m
webapp    Deployment/webapp   62% / 10%   1         10        1         7m
webapp    Deployment/webapp   62% / 10%   1         10        4         7m
webapp    Deployment/webapp   112% / 10%   1         10        4         8m
webapp    Deployment/webapp   112% / 10%   1         10        4         8m
webapp    Deployment/webapp   53% / 10%   1         10        4         9m

Notice that, eventually, the value in the REPLICAS column will increase as the load generator continues to run.

Stop load

In the terminal window that is running the load generator, hit Ctrl+C to terminate the process. Again, run the kubectl get hpa -w command in your other terminal window, and you will see the number of replicas begin to decrease as the CPU load returns to 0%. It shows the output:

Note
It takes a few minutes for the number of replicas to scale down.
$ kubectl get hpa -w
NAME      REFERENCE           TARGETS     MINPODS   MAXPODS   REPLICAS   AGE
webapp    Deployment/webapp   51% / 10%   1         10        4          10m
webapp    Deployment/webapp   51% / 10%   1         10        4         10m
webapp    Deployment/webapp   27% / 10%   1         10        4         11m
webapp    Deployment/webapp   27% / 10%   1         10        8         11m
webapp    Deployment/webapp   0% / 10%   1         10        8         12m
webapp    Deployment/webapp   0% / 10%   1         10        8         12m
webapp    Deployment/webapp   0% / 10%   1         10        8         13m
webapp    Deployment/webapp   0% / 10%   1         10        8         13m
webapp    Deployment/webapp   0% / 10%   1         10        8         14m
webapp    Deployment/webapp   0% / 10%   1         10        8         14m
webapp    Deployment/webapp   0% / 10%   1         10        8         15m
webapp    Deployment/webapp   0% / 10%   1         10        8         15m
webapp    Deployment/webapp   0% / 10%   1         10        8         16m
webapp    Deployment/webapp   0% / 10%   1         10        1         16m
webapp    Deployment/webapp   0% / 10%   1         10        1         17m