So why isn't anomaly detection more common? First, it is really, really hard. Second, its value drops drastically if it doesn't control for false positives (normal behavior incorrectly flagged as anomalous).
At O’Reilly Velocity 2016, there were many interesting sessions on anomaly detection in APM (application performance monitoring). In particular, two of my former colleagues from LinkedIn, Ritesh Maheshwari and Yang Yang, gave a fantastic talk titled ‘Anomaly Detection for Real User Monitoring Data’. A video is not yet available, but you can see the slides here.
I’ve highlighted two important topics from their presentation:
#1 Their anomaly detection algorithm was simple yet powerful at detecting sustained anomalies (anomalies that persist for a period of time rather than a single brief spike). Engineers learn from experience that threshold-based anomaly detection is broken: yesterday’s threshold is today’s normal. Ritesh and Yang used the sign test to detect whether, say, today’s page load times were anomalous compared with yesterday or with the same time a week ago. Beyond its simplicity, the approach adapts to a shifting baseline, so it detects sustained anomalies while producing fewer false positives than fixed thresholds.
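To make the idea concrete, here is a minimal sketch of a one-sided sign test. It assumes page load times have already been aggregated into paired time buckets (for example, the median per bucket today and in the matching bucket yesterday); the bucketing, the `alpha` of 0.01, and the function names are illustrative assumptions, not details from the talk. The test counts only how many buckets got slower, not by how much, so one large spike cannot trigger it while a consistent shift across many buckets does.

```python
from math import comb


def sign_test_p_value(current, baseline):
    """One-sided sign test: is `current` systematically higher than `baseline`?

    `current` and `baseline` are paired samples, e.g. the median page load
    time per time bucket today and in the same bucket yesterday (assumed).
    """
    if len(current) != len(baseline):
        raise ValueError("current and baseline must be paired (same length)")
    diffs = [c - b for c, b in zip(current, baseline)]
    # Ties carry no sign information, so the sign test drops them.
    nonzero = [d for d in diffs if d != 0]
    n = len(nonzero)
    if n == 0:
        return 1.0
    k = sum(1 for d in nonzero if d > 0)  # buckets where today was slower
    # Under the null hypothesis each nonzero difference is positive with
    # probability 0.5, so k ~ Binomial(n, 0.5); the p-value is P(K >= k).
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


def is_sustained_anomaly(current, baseline, alpha=0.01):
    # alpha is a hypothetical significance level, not a value from the talk.
    return sign_test_p_value(current, baseline) < alpha


# Usage with made-up load times (ms) for 12 paired buckets.
yesterday = [1200, 1150, 1180, 1210, 1190, 1170, 1220, 1160, 1200, 1185, 1175, 1195]
today_slow = [t + 150 for t in yesterday]  # every bucket ~150 ms slower
today_spike = list(yesterday)
today_spike[5] += 2000                     # one brief spike, otherwise unchanged

print(is_sustained_anomaly(today_slow, yesterday))   # True
print(is_sustained_anomaly(today_spike, yesterday))  # False
```

In `today_slow` all 12 buckets are slower, giving p = 1/4096; in `today_spike` only one difference is nonzero, so p = 0.5 and no anomaly is reported.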
#2 By connecting RUM (real user monitoring) data with anomaly detection, they were able to quickly determine a high-level root cause. For instance, if the anomaly was in connection time, they could be confident that the problem lay in their network, and could narrow it down to the region or PoP where the problem occurred. Similarly, if the anomaly was in first byte time or page download time, they could be confident that the problem lay on the server side (the CDN origin).
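One way to encode this triage logic is a simple lookup table, assuming a detector (such as the sign test) has already flagged individual (metric, PoP) pairs. The metric names, PoP identifiers, and the `triage` helper are hypothetical; only the metric-to-layer mapping follows the rules described above.

```python
from collections import defaultdict

# Hypothetical metric names; real RUM tools may label these timings differently.
LIKELY_LAYER = {
    "connect_time": "network",
    "first_byte_time": "server side (CDN origin)",
    "download_time": "server side (CDN origin)",
}


def triage(anomalies):
    """Group flagged (metric, pop) pairs by the layer most likely at fault.

    Each pair means a detector flagged `metric` for users served by the
    point of presence `pop`.
    """
    suspects = defaultdict(set)
    for metric, pop in anomalies:
        layer = LIKELY_LAYER.get(metric, "unclassified")
        suspects[layer].add(pop)
    # Sort PoPs so the report is deterministic.
    return {layer: sorted(pops) for layer, pops in suspects.items()}


report = triage([
    ("connect_time", "ams1"),
    ("connect_time", "fra1"),
    ("first_byte_time", "sjc1"),
])
print(report)
# {'network': ['ams1', 'fra1'], 'server side (CDN origin)': ['sjc1']}
```

Keeping the PoP alongside each flagged metric is what lets a network-layer anomaly be localized to the affected region or PoP.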
In summary, combining RUM data with their anomaly detection approach is very promising and offers modern engineering teams an interesting new way to analyze performance.