How to Measure Latency at Scale


When measuring latency, keeping track of individual measurements is a good first-pass solution. It gives you the flexibility to aggregate later, experiment with different ways of constructing histograms, and explore metrics in specific time windows.

However, as your scale increases, memory restrictions can make it impossible to record every individual measurement. At that point, you’ll need techniques that let you summarize a growing number of measurements in a constrained amount of memory. We’ve found that a latency histogram, which stores one count per latency range instead of every measurement, is a great way to aggregate latency measurements.
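To make the memory argument concrete, here is a minimal sketch of a histogram that keeps one counter per bucket, so its size depends on the number of buckets and not on the number of measurements. The class name `LatencyHistogram`, the overflow bucket, and the placeholder boundaries are assumptions for illustration, not a description of any particular production implementation.

```python
from bisect import bisect_left


class LatencyHistogram:
    """Counts latencies per bucket instead of storing each measurement."""

    def __init__(self, boundaries_ms):
        # Upper bound of each bucket, in ascending order.
        self.boundaries_ms = list(boundaries_ms)
        # One counter per boundary, plus an overflow bucket for larger values.
        self.counts = [0] * (len(self.boundaries_ms) + 1)

    def record(self, latency_ms):
        # bisect_left returns the index of the first boundary >= latency_ms,
        # so bucket i holds latencies in (boundaries[i-1], boundaries[i]].
        self.counts[bisect_left(self.boundaries_ms, latency_ms)] += 1


# Placeholder boundaries, chosen only for illustration.
histogram = LatencyHistogram([1, 5, 10, 50, 100])
for latency_ms in [0.4, 3.2, 3.9, 12.0, 250.0]:
    histogram.record(latency_ms)

print(histogram.counts)  # → [1, 2, 0, 1, 0, 1]
```

However many latencies you record, the histogram above still holds only six counters. What remains open is how to pick the boundaries.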

What is the ideal bucket size for the histogram? That’s the key question you’ll need to answer if you decide to set up a latency histogram. You’ll likely consider two bucketization approaches, but I’ve found one to be vastly superior to the other:

1. Lay out the bucket sizes as an arithmetic sequence

This concept is best illustrated with an example:

[1ms, 3ms, 5ms, 7ms, 9ms, 11ms, … 2000ms]

In the example above, bucket 0 contains the count of latencies that were <= 1ms. Bucket 1 contains the count of latencies > 1ms and <= 3ms, and so on. The difference between consecutive bucket boundaries in this arithmetic sequence is 2ms.

This approach uses less memory than recording each latency separately, but with fixed-width buckets it takes roughly 1,000 buckets to cover latencies up to 2000ms. Most of those buckets will be empty, which isn’t ideal.
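A quick sketch of the arithmetic layout shows where the "roughly 1,000 buckets" figure comes from. The helper `bucket_index` and the sample latencies are made up for illustration, and the boundaries stop at 1999ms because that is the last value on the 2ms grid.

```python
from bisect import bisect_left

# Boundaries 1ms, 3ms, 5ms, ..., 1999ms: an arithmetic sequence with a 2ms step.
arithmetic_boundaries_ms = list(range(1, 2000, 2))


def bucket_index(boundaries_ms, latency_ms):
    # Index of the first boundary that is >= latency_ms.
    return bisect_left(boundaries_ms, latency_ms)


print(len(arithmetic_boundaries_ms))                 # → 1000
print(bucket_index(arithmetic_boundaries_ms, 0.8))   # → 0  (<= 1ms)
print(bucket_index(arithmetic_boundaries_ms, 2.5))   # → 1  (> 1ms and <= 3ms)

# Hypothetical sample: mostly fast requests plus one slow outlier.
sample_latencies_ms = [0.8, 2.5, 2.9, 4.1, 7.3, 950.0]
used_buckets = {bucket_index(arithmetic_boundaries_ms, x) for x in sample_latencies_ms}
print(len(used_buckets))                             # → 5
```

With this sample, only 5 of the 1,000 buckets receive any data; the rest are allocated but stay at zero.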

2. Lay out the bucket sizes as a geometric sequence

Again, we’ll start with an example:

[1ms, 1.5ms, 2.25ms, 3.38ms, 5.06ms, 7.59ms, … 2000ms]

Here each bucket boundary is 150% of the previous boundary. The advantage of this style of bucketization is that we get very granular data at low latencies, which is the interesting part of the distribution, and progressively coarser data at higher latencies. With roughly 20 buckets, you can capture latencies up to 2000ms, and each bucket will be well used.
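The geometric layout can be sketched the same way. The function name `geometric_boundaries` and the choice to cap the last boundary at exactly 2000ms are assumptions for illustration. Under those assumptions, the 150% progression contributes 19 boundaries (1ms up to about 1478ms), and the 2000ms cap adds one more.

```python
def geometric_boundaries(first_ms, ratio, max_ms):
    """Boundaries first_ms, first_ms*ratio, ... below max_ms, then max_ms itself."""
    boundaries = []
    boundary = first_ms
    while boundary < max_ms:
        boundaries.append(boundary)
        boundary *= ratio
    # Cap the final bucket at max_ms instead of overshooting it.
    boundaries.append(max_ms)
    return boundaries


boundaries_ms = geometric_boundaries(1.0, 1.5, 2000.0)

print(boundaries_ms[:6])   # → [1.0, 1.5, 2.25, 3.375, 5.0625, 7.59375]
print(len(boundaries_ms))  # → 20

# Bucket widths grow with latency: fine-grained at the low end, coarse at the top.
print(boundaries_ms[1] - boundaries_ms[0])    # → 0.5
print(boundaries_ms[-1] - boundaries_ms[-2] > 500)  # → True
```

The number of boundaries grows with the logarithm of the maximum latency, which is why the count stays near 20 where the fixed 2ms step needed about 1,000.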

I greatly prefer the geometric sequence over the arithmetic one, and I’m not alone. According to Ben Sigelman, the geometric sequence is the default bucketization used at Google for latency histograms. I strongly recommend using a geometric sequence to bucketize your latency histogram.
