Kubernetes Autoscaling – an overview

This post describes the features Kubernetes (k8s) provides for automated scaling of applications. This is Part 3 of a series: Part 1 is an introduction to Kubernetes and Part 2 covers Kubernetes internals.

  • k8s can automate horizontal scaling of pods and has experimental support for vertical scaling. Scaling can even extend to requesting new hardware nodes through the infra provider's API. Scaling decisions can be based on utilization of actual hardware (CPU/memory), on pod-level metrics such as QPS across the pods, or on cluster-wide signals such as load on a database or the network.

  • For horizontal scaling, the HorizontalPodAutoscaler (HPA) tracks the current values of all resources/metrics listed in its spec and compares them against the target values defined there. A new pod count is calculated so that the values of all resources stay within those targets; roughly, desiredReplicas = ceil(currentReplicas × currentValue / targetValue). Per-pod averages are used for the calculation, for the most part, which also implies that scaling behaviour is assumed to be roughly linear.
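
    As a sketch, an HPA for a hypothetical frontend Deployment targeting 70% average CPU utilization might look like this (the names are illustrative, the field structure follows the autoscaling/v2 API):

    ```yaml
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: frontend-hpa        # hypothetical name
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: frontend          # hypothetical Deployment to scale
      minReplicas: 2
      maxReplicas: 10
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 70   # target average CPU across pods
    ```

    With this spec, if the average CPU across pods rises to 140% of requests, the HPA would roughly double the replica count (subject to the max of 10).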

  • The HPA then updates the “replicas” field of the Scale sub-resource of the target Deployment/ReplicaSet/ReplicationController. The actual scaling is performed by those respective controllers.
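
    The Scale sub-resource is a small object that exposes only the replica counts of the underlying workload. For a hypothetical Deployment it looks roughly like this (autoscaling/v1 shape, values illustrative):

    ```yaml
    apiVersion: autoscaling/v1
    kind: Scale
    metadata:
      name: frontend            # hypothetical Deployment name
    spec:
      replicas: 5               # the HPA writes its desired count here
    status:
      replicas: 4               # replicas the controller currently observes
      selector: app=frontend    # label selector covering the pods
    ```

    Because every scalable workload exposes this same tiny interface, the HPA never needs to understand Deployment- or ReplicaSet-specific details.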

  • Given that gathering metrics takes time and resource usage can spike briefly, k8s has conservative defaults for how quickly it scales a workload up or down and by how much, in order to avoid flapping.
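
    These defaults can be tuned per-HPA via the behavior field of the autoscaling/v2 API. A hedged sketch (the values shown are illustrative choices, not the defaults):

    ```yaml
    # Goes under spec: of an autoscaling/v2 HorizontalPodAutoscaler
    behavior:
      scaleDown:
        stabilizationWindowSeconds: 300  # act only on the highest recommendation
                                         # seen in the last 5 minutes
        policies:
        - type: Percent
          value: 50                      # remove at most 50% of pods per period
          periodSeconds: 60
      scaleUp:
        stabilizationWindowSeconds: 0    # scale up immediately
        policies:
        - type: Pods
          value: 4                       # add at most 4 pods per minute
          periodSeconds: 60
    ```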

  • The Vertical Pod Autoscaler (VPA) can change the resource requests and limits of existing pods without changing their count, and can even set initial resource values for new pods based on current traffic and past usage history.

  • The VPA contains a Recommender module which consumes metrics, builds a model of the cluster’s utilization, and comes up with suitable recommendations. The Updater module acts on these recommendations to evict pods. When replacement pods get scheduled, a VPA component registered in the new-pod creation flow (an admission webhook) sets the new resource values on these replacements.
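
    A minimal sketch of a VPA object, assuming the autoscaler add-on (API group autoscaling.k8s.io) is installed in the cluster; the target name is hypothetical:

    ```yaml
    apiVersion: autoscaling.k8s.io/v1
    kind: VerticalPodAutoscaler
    metadata:
      name: frontend-vpa
    spec:
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: frontend       # hypothetical workload to right-size
      updatePolicy:
        updateMode: "Auto"   # Updater may evict pods to apply recommendations;
                             # "Off" produces recommendations without acting on them
    ```

    Running with updateMode "Off" first is a common way to inspect the Recommender's output before letting the Updater evict anything.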

  • VPA has limitations: i) scale-up happens by restarting pods, not by letting running pods grab more capacity; ii) it may not work well alongside the HPA when both scale on CPU or memory utilization.

  • The Cluster Autoscaler interacts with your infra provider’s APIs to add or remove hardware nodes, so its implementation is necessarily coupled to those APIs. Separate implementations exist per provider, e.g. the AWS Cluster Autoscaler and the GKE Cluster Autoscaler.
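
    On AWS, for example, the Cluster Autoscaler runs as a Deployment in the cluster and is pointed at an Auto Scaling Group via flags. A fragment of its container spec (the flags are real, the ASG name and image tag are illustrative):

    ```yaml
    containers:
    - name: cluster-autoscaler
      image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.0
      command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      - --nodes=2:10:my-node-group-asg        # min:max:ASG name (hypothetical)
      - --scale-down-utilization-threshold=0.5 # consider draining nodes below 50% use
    ```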
