Data is crucial to software development, and we shouldn’t take it for granted. However, extracting insights from feature releases and experiments isn’t always straightforward. Getting the insights you need takes the right tools and a little something I call “data respect.”
What does it mean to respect data as software people? When it comes to experimentation, using a feature flagging tool that connects flags to an advanced statistical engine is a great start. The goal is to draw conclusions that are accurate and can be used to monitor development operations. Inevitably, this will empower your software teams to make smarter, data-driven decisions.
I can think of one helpful platform that carefully processes large amounts of flag data at the deepest, most granular level. It’s called Split. Apologies if I sound biased; I’ve been working on the platform’s metric cards. Let’s look at some of the new improvements, along with the importance of measuring impact.
Improvements to Split’s Metric Cards
Just recently, I helped improve the UI on the Split metric cards. They’re one of the first things you’ll see if you’re trying to interpret feature flag measurements. The strategy of this update has been to make the data understandable, valuable, and actionable. Here’s what you’ll notice:
For crystal clarity, we’ve made it easier to interpret the nuances of the data distribution. Split’s new metric cards now decouple negative signals from desired outcomes. Why? Because when a metric indicates a negative signal (and that’s the outcome you’re watching for), the data should be more apparent. By bringing the right insights to the forefront fast, software engineers and product teams can react just as quickly.
Attention to Impact Intervals
Also, we’ve given further thought to every impact interval displayed on the cards. The smaller the range, the more precise the impact estimate; the wider the range, the less certain it is. If the range spans from negative to positive, the metric is inconclusive, and we’ve made that obvious to the viewer! Whatever the impact may be, metric readings are designed to be communicated with clarity.
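To picture how an impact interval is read, here’s a minimal sketch using a normal approximation for the difference of two means. The function name and signature are my own illustration, not Split’s statistical engine, which is considerably more sophisticated:

```python
import math

def impact_interval(mean_treatment, mean_control, se_treatment, se_control, z=1.96):
    """Approximate 95% confidence interval for the impact (difference of means).

    A simplified normal-approximation sketch, not Split's actual engine.
    """
    impact = mean_treatment - mean_control
    # Standard error of a difference of two independent means.
    se = math.sqrt(se_treatment ** 2 + se_control ** 2)
    low, high = impact - z * se, impact + z * se
    # If the interval spans zero, the sign of the impact is uncertain,
    # so the metric reading is inconclusive.
    conclusive = not (low < 0.0 < high)
    return (low, high), conclusive
```

With tight standard errors, `impact_interval(4.0, 3.36, 0.2, 0.2)` produces a strictly positive interval (conclusive); widen the standard errors and the interval starts to straddle zero, which is exactly the inconclusive case the cards now flag.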
Making inclusivity a central part of the development process starts with listening to customers. What we heard was that the metric cards needed enhancement from a UI perspective, and we were all ears! Specifically, we added color accessibility to Split’s functionality. This allows people with visual impairments (including color vision deficiencies) to interact with digital experiences the same way their non-impaired counterparts do.
A Detailed Approach to Simplification
Statistical analysis can be confusing. Therefore, we looked to simplify the information on metric cards. But we did it in a very detailed way. Now when we surface results that have no true causation or correlation, we bring that to our customers’ full attention. Analyzing, summarizing, and processing data at the feature level should be intuitive. And, you shouldn’t have to be a data scientist to make sense of it all.
Impact Is Key
Yes, there are some very advanced experimentation strategies that will help you fully unleash the power of the Split platform. Take the Central Limit Theorem, for example: power users, feel free to run with it if you’re ready to up your game. But if you’re just getting started with hypothesis-driven development, focus on “impact”; it’s one of the most important concepts to understand. Impact is the difference between the mean, or average, of an experiment’s control treatment and its test treatment for a particular metric. In other words, when analyzing a certain metric for a software feature, the impact is used to compare two versions and see which one performed better.
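As a rough sketch of that definition (the function name and shape are mine, not Split’s API), impact for one metric boils down to a difference of means, often also expressed as a lift relative to the control:

```python
def impact(control: list[float], treatment: list[float]) -> tuple[float, float]:
    """Return (absolute, relative) impact of treatment vs. control for one metric.

    Simplified illustration; a real statistical engine also quantifies
    the uncertainty around these point estimates.
    """
    mean_control = sum(control) / len(control)
    mean_treatment = sum(treatment) / len(treatment)
    absolute = mean_treatment - mean_control
    relative = absolute / mean_control  # lift relative to the control baseline
    return absolute, relative
```

For example, with per-user click counts of `[1, 2, 3]` in the control and `[2, 3, 4]` in the treatment, the absolute impact is 1.0 click per user and the relative impact is a 50% lift.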
A Simple Example of Impact Measurement
Let’s say we have two feature flags, one ON and one OFF. The first version, ON, exposes an exciting new application button, in an attempt to see whether it receives more clicks. The feature flag in the OFF state, on the other hand, only shows our current version, the one we want to improve. In this experiment, we measure the number of clicks on the “Buy” button, with 100 users exposed to each treatment. After the results come in, the ON treatment has 400 clicks, and the OFF treatment has only 336. The means are therefore 400/100 = 4.00 clicks per user for ON versus 336/100 = 3.36 for OFF, so the impact for the metric “count of clicks per buy” is 4.00 − 3.36 = 0.64 clicks per user. Relative to the OFF baseline, that is 0.64/3.36 ≈ 19%: the ON treatment shows roughly a 19% increase in clicks compared to the OFF treatment.
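The arithmetic can be checked in a few lines. Note that the 100-users-per-treatment figure is an assumption carried over from the example; with it, the absolute impact is 0.64 clicks per user and the lift relative to the OFF baseline is about 19%:

```python
# Worked example: click totals from the text, with an assumed
# 100 users exposed to each treatment.
users_per_treatment = 100
clicks_on, clicks_off = 400, 336

mean_on = clicks_on / users_per_treatment    # 4.00 clicks per user
mean_off = clicks_off / users_per_treatment  # 3.36 clicks per user

absolute_impact = mean_on - mean_off          # 0.64 clicks per user
relative_impact = absolute_impact / mean_off  # ~0.19, i.e. about a 19% lift
```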
When reading metric cards, it’s critical to review with “Impact” in mind. If there is no impact difference between the samples, we feel it’s important to bring it to our customers’ attention. That’s why Split metric cards only show results that are useful and certain. If the data doesn’t give a helpful answer, then it’s our job to “respect” that by making everyone aware. After all, accuracy is the truth!
Remember this: after an experiment starts, you can still create new metrics, even if you’re already sending the corresponding events for a particular measurement. Also, all the metrics you create are calculated together, so if a feature release has unintended effects on other parts of your core infrastructure, you’ll see them. Most tools only work with two or three metrics; feature management platforms like Split’s will let you know when a release causes an undesired effect elsewhere in your software system.
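The idea of watching every metric at once can be sketched as a simple guardrail loop. The metric names and values below are hypothetical, and this is my own illustration rather than Split’s implementation:

```python
# Hypothetical per-user means for control (OFF) and treatment (ON),
# calculated together across all metrics in the experiment.
metrics = {
    "buy_button_clicks": {"off": 3.36, "on": 4.00},
    "page_load_seconds": {"off": 1.20, "on": 1.45},  # an unintended regression
    "error_rate":        {"off": 0.02, "on": 0.02},
}

# Metrics where a larger value is worse; for these, a positive impact is a red flag.
lower_is_better = {"page_load_seconds", "error_rate"}

for name, means in metrics.items():
    impact = means["on"] - means["off"]
    regressed = impact > 0 if name in lower_is_better else impact < 0
    status = "REGRESSION" if regressed else "ok"
    print(f"{name}: impact={impact:+.2f} [{status}]")
```

Here the new button wins on clicks, but the same pass over all metrics also surfaces the page-load regression, which is the point of calculating everything together.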
Creating tools that are powerful and simple to use is almost a contradiction in itself. But that doesn’t mean we can’t aim to do both. Our objective with our measurement and learning platform is to process huge amounts of data to confirm or refute a hypothesis, and to show the results succinctly. The metric cards are the starting point for understanding what is going on with your feature flags. So be sure to bucket your users, and we can find the impact on your most important metrics. The insights are right at your fingertips; you just need a clear goal for what you want to improve, then go ahead and experiment.