How to Measure Latency at Scale

When measuring latency, recording every individual measurement is a good first-pass solution. It gives you the flexibility to aggregate later, experiment with different ways of constructing histograms, and explore metrics in specific time windows.

However, as your scale increases, memory constraints can make it impossible to record every individual measurement. At that point, you’ll need techniques that let you summarize a growing number of measurements in a fixed amount of memory. We’ve found that a latency histogram is a great way to aggregate latency measurements.

What is the ideal bucket size for the histogram? That’s the key question you’ll need to answer if you decide to set up a latency histogram. You’ll likely consider two bucketization approaches, but I’ve found one to be vastly superior to the other:

1. Lay out the bucket sizes as an arithmetic sequence

This concept is best illustrated with an example:

[1ms, 3ms, 5ms, 7ms, 9ms, 11ms, …. 2000ms]

In the example above, bucket 0 contains the count of latencies that were <= 1ms. Bucket 1 contains the count of latencies > 1ms and <= 3ms. Each bucket boundary in this arithmetic sequence is 2ms higher than the previous one.

This approach uses less memory than storing each latency separately, but with fixed-width buckets it takes roughly 1,000 buckets to cover latencies up to 2000ms. Most of those buckets will be empty, which isn’t ideal.
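To make the cost concrete, here is a minimal Python sketch of the arithmetic layout. The helper names (arithmetic_boundaries, record) and the sample latencies are made up for illustration, and clamping anything above the last boundary into the final bucket is my assumption, not part of the scheme described above.

```python
import bisect


def arithmetic_boundaries(first_ms, step_ms, max_ms):
    """Return upper bounds first_ms, first_ms + step_ms, ... until max_ms is covered."""
    bounds = []
    upper = first_ms
    while upper < max_ms:
        bounds.append(upper)
        upper += step_ms
    bounds.append(upper)  # first boundary that is >= max_ms
    return bounds


def record(counts, bounds, latency_ms):
    """Increment the first bucket whose upper bound is >= latency_ms."""
    index = bisect.bisect_left(bounds, latency_ms)
    index = min(index, len(counts) - 1)  # assumption: overflow goes to the last bucket
    counts[index] += 1


bounds = arithmetic_boundaries(1, 2, 2000)  # 1ms, 3ms, 5ms, ...
counts = [0] * len(bounds)

for latency_ms in [0.4, 1.0, 2.5, 3.0, 12.0, 850.0]:  # hypothetical samples
    record(counts, bounds, latency_ms)

empty = sum(1 for c in counts if c == 0)
print(len(bounds), "buckets,", empty, "of them empty")
```

Even with more realistic traffic, most requests tend to land in a small range, so the bulk of those counters stay at zero.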

2. Lay out the bucket sizes as a geometric sequence

Again, we’ll start with an example:

[1ms, 1.5ms, 2.25ms, 3.38ms, 5.06ms, 7.59ms, …. 2000ms]

Here each bucket boundary is 150% of the previous bucket’s boundary. The advantage of this style of bucketization is that we get very granular data at low latencies, which is the interesting part of the distribution, and the data becomes less granular for larger buckets. With about 20 buckets, you can capture latencies up to 2000ms, and each bucket is far more likely to be used.
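Here is a rough Python sketch of the geometric layout, assuming a first boundary of 1ms and a growth factor of 1.5 as in the example. The function names and sample latencies are hypothetical, and sending anything above the last boundary into the final bucket is my own simplification. Counting the <= 1ms bucket, the sketch ends up with 20 boundaries (1.5^0 through 1.5^19), the last of which is the first one past 2000ms.

```python
import bisect


def geometric_boundaries(first_ms, ratio, max_ms):
    """Return upper bounds first_ms, first_ms * ratio, ... until max_ms is covered."""
    bounds = [first_ms]
    while bounds[-1] < max_ms:
        bounds.append(bounds[-1] * ratio)
    return bounds


def record(counts, bounds, latency_ms):
    """Increment the first bucket whose upper bound is >= latency_ms."""
    index = bisect.bisect_left(bounds, latency_ms)
    index = min(index, len(counts) - 1)  # assumption: overflow goes to the last bucket
    counts[index] += 1


bounds = geometric_boundaries(1.0, 1.5, 2000.0)
counts = [0] * len(bounds)

for latency_ms in [0.4, 1.0, 2.5, 3.0, 12.0, 850.0]:  # hypothetical samples
    record(counts, bounds, latency_ms)

print(len(bounds), "buckets")
print([round(b, 2) for b in bounds[:6]])  # narrow buckets at the low end
print([round(b, 2) for b in bounds[-3:]])  # wide buckets at the high end
```

The whole histogram is a couple of dozen integers no matter how many requests you record, which is what makes this layout practical in constrained memory.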

I greatly prefer the geometric sequence over the arithmetic one, and I’m not alone. According to Ben Sigelman, the geometric sequence is the default bucketization used at Google to measure latency histograms. I strongly recommend using a geometric sequence to bucketize your latency histogram.