How We Use Autoscaling to Save Money With Kubernetes

Since mid-2021, Split has run all of its services in containers on Kubernetes clusters; we don’t have that many services—around 30—but some of them receive a high volume of traffic, around 35k requests per second.

Before containers, our infrastructure consisted of EC2 instances managed by Terraform, with some scripting to provision them. In that setup, the number of instances per service was really high: provisioning each individual instance could take several minutes, which made it impossible to scale quickly, so we had to stay constantly over-provisioned.

At the same time, each of those instances was running a single copy of our microservices, wasting a lot of resources along the way.

The Kubernetes World

Then we moved to Kubernetes. We were able to fill up those instances more efficiently, and also use the Kubernetes orchestration power to provision more replicas of our microservices quickly, much faster than starting up entire EC2 instances.

There is one catch, though: even when you run everything in containers, those containers must run somewhere, usually on more EC2 instances controlled by Auto Scaling groups. The thing about Auto Scaling groups is that they are “not aware” of Kubernetes by default; they can only act on system metrics or custom metrics in CloudWatch. Here is where cluster-autoscaler comes in.

Cluster-autoscaler is a service that runs inside your Kubernetes cluster and does two things:

  1. It listens for events from the Kubernetes scheduler, particularly when there are not enough nodes to schedule pods. When that happens, cluster-autoscaler connects to the Auto Scaling group through the AWS API (or any other cloud API) and increases the “desired” number of nodes by one, creating a new node that joins the cluster and makes room for the pod to be scheduled.
  2. It’s constantly trying to find unused or underutilized nodes to destroy. It does this by draining the node (removing all pods from it) and then destroying it using the AWS API.

This way, your cluster is always efficiently provisioned: you rarely have underutilized nodes, and when you need more nodes, cluster-autoscaler will get them for you.
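
To make this concrete, here is a minimal sketch of how cluster-autoscaler is typically deployed on AWS. The cluster name, node-group tags, and version below are placeholders, not our exact production configuration:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
        - name: cluster-autoscaler
          image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.21.0
          command:
            - ./cluster-autoscaler
            - --cloud-provider=aws
            # Find Auto Scaling groups by tag instead of hardcoding their names
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
            # Treat nodes below 50% utilization as candidates for scale-down
            - --scale-down-utilization-threshold=0.5
YAML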

This is great! Now our cluster is truly elastic.

Now, let’s see how we can combine cluster-autoscaler with a great feature from Kubernetes.

Horizontal Pod Autoscaling

Horizontal Pod Autoscaling (HPA) is a Kubernetes feature that will use metrics to scale your deployment up or down. To put it in simple terms, if a metric goes over a certain number, HPA will increase the number of replicas of a given deployment; if it goes below, it will decrease it.
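
Under the hood, the HPA controller periodically computes the desired replica count as roughly ceil(currentReplicas × currentMetricValue / targetMetricValue) and clamps the result between the minimum and maximum replicas you configure.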

Do you see what we are doing here? This is a killer combo! HPA can watch metrics to scale pods up and down, and cluster-autoscaler will swoop in as its sidekick, creating nodes when needed and destroying them when not.

Let’s see how we use HPA + cluster-autoscaler at Split. You can use different kinds of metrics to trigger HPA events: basic, custom, or external metrics.

Basic Metrics
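
The simplest option is to scale on the resource metrics Kubernetes already collects. The manifest below keeps a deployment named hello between 3 and 10 replicas, aiming for an average CPU utilization of 50% across its pods: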

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: hello-autoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hello
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
YAML

Custom Metrics

Sometimes, system metrics are not enough. You won’t always see an increase in CPU or memory when you need more instances. Maybe your application “is always busy” and uses all of the available resources at all times. In these cases, you can use custom metrics: metrics that you create for your application specifically to drive scaling.

A good example of such a metric is “number of concurrent users of your application.” You have to create this metric yourself, and it’s specific to your application code, but it can bring great insight into the state of your application and be useful for scaling up or down:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: hello-autoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hello
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Object
      object:
        metric:
          name: rabbitmq_queue_messages_ready
        describedObject:
          apiVersion: v1
          kind: Service
          name: rabbitmq-service
        target:
          type: Value
          value: 5
YAML

Note: Using custom metrics requires some extra work on the infrastructure side: a service called prometheus-adapter can act as a proxy that lets HPA find the metrics your application exposes.
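
As a rough illustration of that glue, a prometheus-adapter rule exposing the rabbitmq_queue_messages_ready metric used above could look like the sketch below; the label names are assumptions that depend on how your Prometheus scrapes RabbitMQ:

# prometheus-adapter rule (adapter configuration file)
rules:
  # Expose the RabbitMQ queue depth series to the custom metrics API,
  # mapping its namespace/service labels to Kubernetes objects
  - seriesQuery: 'rabbitmq_queue_messages_ready{namespace!="",service!=""}'
    resources:
      overrides:
        namespace: { resource: "namespace" }
        service: { resource: "service" }
    metricsQuery: 'sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
YAML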

External Metrics

In the same way that metrics tied to your code can be helpful, sometimes metrics from outside your application can help you understand whether you need more resources. A typical example is queue size.

In our case, we have a feature called live-tail that allows users to see their events in real time, as they are received by Split.

When one of our customers starts using this feature, it triggers a service that starts sending those events to a queue, which is then consumed by our live-tail pods. These live-tail pods are mostly dormant, as they don’t need to do anything when there are no messages in the queue. But once the queue starts receiving events, we need to scale up those live-tail pods so they can process the events and show them to the end user.

In this case, we use the queue size to scale up and down our live-tail replicas. Easy peasy!
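
In HPA terms, that looks roughly like the sketch below, which scales a deployment on an external queue-depth metric. The deployment, metric, and queue names here are illustrative, not our exact production values:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: live-tail-autoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: live-tail
  minReplicas: 1
  maxReplicas: 20
  metrics:
    - type: External
      external:
        metric:
          # CloudWatch SQS metric, as exposed to Prometheus by cloudwatch-exporter
          name: aws_sqs_approximate_number_of_messages_visible_average
          selector:
            matchLabels:
              queue_name: live-tail-events
        target:
          # Add replicas while the queue holds more than ~30 messages per pod
          type: AverageValue
          averageValue: "30"
YAML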

Note: Using external metrics requires some more work on the infrastructure side; besides prometheus-adapter, you’ll need cloudwatch-exporter to bring AWS CloudWatch metrics into Prometheus.
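
For completeness, a minimal cloudwatch-exporter configuration for the queue-depth example above might look like this, with the region and queue dimension as placeholders:

# cloudwatch-exporter config: pull the SQS queue depth into Prometheus
region: us-east-1
metrics:
  - aws_namespace: AWS/SQS
    aws_metric_name: ApproximateNumberOfMessagesVisible
    aws_dimensions: [QueueName]
    aws_statistics: [Average]
YAML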

Autoscaling is not only helpful when you need more resources; it also saves you money when you don’t.
