Rethinking DORA: Mean Time to Restore

DORA continues to be the benchmark for evaluating the performance and efficiency of software engineering teams. However, as the industry relies heavily on feature flags, we should take a closer look at the DORA metrics and how we approach them. 

When it comes to Mean Time to Restore (MTTR), it’s hard to ignore the impact that feature flags have upon the numbers and the way in which we respond to feature-related issues. Before I go into further detail, let’s do a quick review of DORA.

The 4 DORA Metrics

What are the DORA metrics? They’re a set of four metrics that measure software delivery performance. How did they come to be? They’re a result of seven years of surveys conducted by the DevOps Research and Assessment Group. 

Here’s how each is currently defined:

1. Lead Time for Changes

This is the length of time between when a code change is committed to the trunk and when it is deployed to production. 

2. Change Failure Rate

The change failure rate is the percentage of code changes that require hotfixes, rollbacks, or other remediation after they are deployed to production. It does not count failures that are caught by testing and fixed before the code is deployed.

3. Deployment Frequency

This measures the frequency of new code being deployed into production. Many teams use the term “delivery” to mean code changes that are released into a pre-production staging environment, while “deployment” is reserved only for production environments. 

4. Mean Time to Restore (MTTR)

This is how long it takes to restore a service from a partial interruption or total failure. Whether the interruption is the result of a recent deployment or an isolated system failure, MTTR is important to track.
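To make the definition concrete, here is a minimal sketch of how MTTR could be computed from an incident log. The incident timestamps and the function name `mean_time_to_restore` are hypothetical, made up purely for illustration; real teams would pull these times from their own incident tracker.

```python
from datetime import datetime, timedelta

# Hypothetical incident log: (service interrupted at, service restored at).
incidents = [
    (datetime(2024, 1, 3, 9, 0), datetime(2024, 1, 3, 9, 45)),      # 45 minutes
    (datetime(2024, 1, 10, 14, 0), datetime(2024, 1, 10, 14, 15)),  # 15 minutes
    (datetime(2024, 1, 21, 22, 30), datetime(2024, 1, 21, 23, 30)), # 60 minutes
]


def mean_time_to_restore(incidents):
    """Average the interruption durations and return a timedelta."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum(
        (restored - started for started, restored in incidents),
        timedelta(),  # start value so sum() adds timedeltas, not ints
    )
    return total / len(incidents)


mttr = mean_time_to_restore(incidents)
print(mttr)  # 0:40:00
```

Note that this treats every interruption the same way, whether it was ended by a shipped fix or by something faster, which is exactly the nuance the rest of this article explores.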

In this article, I focus primarily on how feature flags redefine the way we respond to feature-related issues. Because of this, it’s worth reimagining the full potential of MTTR in a way that’s relevant to today’s work streams.

Feature Flags Elevate the Old Standard

One of the metrics that feature flags blow the doors off is Mean Time to Restore (MTTR). With feature flags, restoring a previously stable state is as simple as flipping a switch, literally. Engineers who adopt feature flags also tend to shift toward small, frequent releases. As a result, the blast radius of a release is minimized, as is the level of recovery needed should a feature-related issue occur.

All of this is great news for MTTR numbers. But what is MTTR missing in a feature flag-driven world? 

Beyond Shipping Fixes Fast, It’s About Switching Off the Pain

While it’s important for engineering teams to get faster at creating and shipping fixes, there’s another modern tool in the toolbox that doesn’t require rushing through a half-baked fix. Feature flags allow you to instantly “turn off the pain” of an issue, so customers are not affected throughout the remediation process. As a result, you can spend the extra time needed to fully repair the problem without harming the user experience. This is a major plus for risk mitigation.

As we dissect the DORA metric from this lens, we shouldn’t just be considering the pace of restoration. Fixing things right without a customer noticing the problem is just as important as fixing things fast. Therefore, we should be prioritizing the time it takes to “stop the pain” as well. “How quickly can I turn this problematic feature flag off, so it’s not impacting customers?” This is another important efficiency to gain and improve upon. 

Can MTTR accurately capture this metric? If not, what’s the new one? Switch Off Speed (SOS)? Let’s leave that to DORA to figure out. 
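As a rough sketch of what tracking such a measure might look like, the example below records two durations per incident: how long it took to turn the flag OFF, and how long it took to ship the full repair. Everything here is an assumption for illustration: the `FlagIncident` record, its field names, the sample timestamps, and the idea of averaging the two durations separately. It is not an official DORA definition.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class FlagIncident:
    """Hypothetical record of one feature-related incident."""
    detected_at: datetime      # issue first detected
    flag_off_at: datetime      # feature flag turned OFF (customer pain stops)
    fix_released_at: datetime  # repaired feature is back in production


def mean_delta(deltas):
    """Average a list of timedeltas."""
    return timedelta(seconds=mean(d.total_seconds() for d in deltas))


def switch_off_speed(incidents):
    """Mean time from detection to the flag being turned OFF."""
    return mean_delta([i.flag_off_at - i.detected_at for i in incidents])


def time_to_full_fix(incidents):
    """Mean time from detection to the repaired feature shipping."""
    return mean_delta([i.fix_released_at - i.detected_at for i in incidents])


incidents = [
    FlagIncident(
        detected_at=datetime(2024, 2, 1, 10, 0, 0),
        flag_off_at=datetime(2024, 2, 1, 10, 0, 30),
        fix_released_at=datetime(2024, 2, 1, 16, 0, 0),
    ),
    FlagIncident(
        detected_at=datetime(2024, 2, 8, 13, 0, 0),
        flag_off_at=datetime(2024, 2, 8, 13, 1, 30),
        fix_released_at=datetime(2024, 2, 9, 13, 0, 0),
    ),
]

print(switch_off_speed(incidents))  # 0:01:00
print(time_to_full_fix(incidents))  # 15:00:00
```

Keeping the two numbers apart shows the point of the argument: customers stopped feeling the issue after about a minute, even though the proper fix took hours.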

Improve MTTR & More With Feature Management

By adopting a feature flag approach to instant triage, you can cut down MTTR to a matter of seconds (and in a way that barely harms user experiences). All you need is feature management to help. 

With the right feature management platform, there’s no need to scramble and make major repairs to a big bang release in a rush. Instead, you’ll be able to release feature by feature, attaching each one to a feature flag and measuring the impact as soon as it’s turned ON. Is it creating latency issues? Is it breaking the experience? If so, you’ll be automatically alerted to the problem-causing feature, and all you have to do is turn it OFF. Then, rather than rushing to ship a major fix, you’re just isolating the problem and taking it out of the equation.
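The flow described above can be sketched in a few lines. This is a toy, in-memory illustration and not Split’s SDK or API: the `FlagStore` class, the `check_and_kill` function, the `new_checkout` flag name, and the 20% latency threshold are all hypothetical stand-ins for what a feature management platform would provide.

```python
class FlagStore:
    """Hypothetical in-memory flag store standing in for a real platform."""

    def __init__(self):
        self._flags = {}

    def set(self, name, enabled):
        self._flags[name] = enabled

    def is_on(self, name):
        # Unknown flags default to OFF, the safe state.
        return self._flags.get(name, False)


def render_checkout(flags):
    """Serve the new feature only while its flag is ON."""
    return "new checkout" if flags.is_on("new_checkout") else "old checkout"


def check_and_kill(flags, flag_name, baseline_p95_ms, current_p95_ms,
                   max_regression=0.20):
    """Turn the flag OFF if p95 latency regressed beyond the threshold.

    Returns True if the flag was switched off by this call.
    """
    if not flags.is_on(flag_name):
        return False
    regression = (current_p95_ms - baseline_p95_ms) / baseline_p95_ms
    if regression > max_regression:
        flags.set(flag_name, False)  # isolate the problem; no redeploy needed
        return True
    return False


flags = FlagStore()
flags.set("new_checkout", True)
print(render_checkout(flags))  # new checkout

# Latency jumps from 200 ms to 320 ms (a 60% regression) after release.
killed = check_and_kill(flags, "new_checkout",
                        baseline_p95_ms=200, current_p95_ms=320)
print(killed, render_checkout(flags))  # True old checkout
```

The design choice worth noticing is that the old code path stays deployed alongside the new one, so switching OFF is a configuration change rather than a rollback or a hotfix.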

Hotfixes aren’t really a thing in this new way of working, and feature management platforms like Split are redefining the standards for speedy MTTR metrics and beyond. 

In Conclusion

As a new standard of nearly automatic triage emerges, it’s important you have the right tools and techniques at your disposal. Otherwise, you’ll be left in the dust. Stay on the right side of today’s faster MTTR standards by being able to isolate issues throughout the process. Strengthen your approach to DORA metrics with a feature management platform that has automated rollout monitoring baked right in. You’ll cut downtime and hotfixes, and stop the pain experienced by customers with the push of a button.

More on “Rethinking the DORA Metrics”

I speak more in depth about reimagining the DORA metrics and leveraging feature flags in a recent podcast interview on Dev Interrupted. Be sure to listen here. Plus, be sure to check back soon on the Split blog feed for my upcoming discussion around Deployment Frequency. 

Switch It On With Split

The Split Feature Data Platform™ gives you the confidence to move fast without breaking things. Set up feature flags and safely deploy to production, controlling who sees which features and when. Connect every flag to contextual data, so you can know if your features are making things better or worse and act without hesitation. Effortlessly conduct feature experiments like A/B tests without slowing down. Whether you’re looking to increase your releases, to decrease your MTTR, or to ignite your dev team without burning them out–Split is both a feature management platform and partnership to revolutionize the way the work gets done. Schedule a demo to learn more.
