Every organization has specific business goals and its own ways of measuring them: revenue, number of clients, profit, and so on. However, there are basic metrics every business should be tracking. Here are seven essential metrics all organizations should consider keeping an eye on.
Error Rates and Page Load Time Beyond a Threshold
The first two metrics are easy to build from your log streams. First, how fast do your pages load? Second, do they load at all? If you change anything in your code, you want to make sure it doesn’t make your pages slower. Good software engineering practice says you should run unit tests and integration checks. Both practices are good, but some issues can still find their way into a release.
Tracking both Error Rate and Page Load Time before and after a feature launch is useful, but not enough. If you have a complex website, there might be a lot of external factors affecting your page loads. Splitting your release with an A/B test lets you convincingly attribute an error to your change.
Catching bugs doesn’t inspire a lot of wow factor during keynotes, but it’s useful. This third layer of verification — after unit tests and integration checks — really helps with releasing quality software.
Should you track load times or error rates? The answer is both.
Some people capture the average load time, which can seem convenient. But that’s generally not a good idea.
Let’s take an example:
- If 98 percent of your users load a page in 10 milliseconds and 2 percent in 1,010 milliseconds, it will have the same average load time as:
- 90 percent loading in 10 milliseconds and 10 percent in 210 milliseconds
In the first case, 2 percent of your users experience a visible delay; in the second, a 200-millisecond delay is barely perceptible. Capturing the 95th percentile of load time is also a good metric in that respect. But it’s not an average and doesn’t behave as seamlessly with test statistics.
The better approach is to decide on a threshold for a wait time that does occur but isn’t acceptable (for example, the 98th or 99th percentile) and to look at the ratio of pages that load slower than that. If it stays around 1 or 2 percent, fine. But if there is any significant deviation, you know your change is affecting users.
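To make this concrete, here is a minimal Python sketch with illustrative numbers (the 200-millisecond threshold and the two sample distributions are assumptions matching the example above), showing how the above-threshold ratio separates two distributions that share the same average:

```python
# Two load-time distributions with the same average but different tails.
# Values are in milliseconds; sample sizes and the 200 ms threshold are
# illustrative assumptions, not measurements.
fast_with_outliers = [10] * 98 + [1010] * 2   # 98% in 10 ms, 2% in 1,010 ms
mildly_slow = [10] * 90 + [210] * 10          # 90% in 10 ms, 10% in 210 ms

def average(samples):
    return sum(samples) / len(samples)

def slow_ratio(samples, threshold_ms=200):
    """Share of page loads slower than the chosen threshold."""
    return sum(1 for t in samples if t > threshold_ms) / len(samples)

print(average(fast_with_outliers), average(mildly_slow))  # 30.0 30.0
print(slow_ratio(fast_with_outliers))  # 0.02 -> visible delay for 2% of users
print(slow_ratio(mildly_slow))         # 0.1  -> slower, but barely perceptible
```

The averages are identical, so only the threshold ratio distinguishes the distribution with a painful tail from the harmless one.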
Slow processes are often a sign that things aren’t going well and that users are waiting. However, your Page-load-time metric waits for the page to finish loading before it records a value. If there’s a problem and the page doesn’t load, that value is null. It might be ignored by your metric platform.
You won’t notice the failure. For most A/B tests nowadays, you run well-tested code changes and stay within an optimized front-end framework, so you would not expect either of those metrics to be significantly affected. Issues should be caught earlier during QA, code review, or integration tests.
However, if you see a degradation of service, you should investigate. Both Page-load-time and Error-rate are very common safety net metrics.
“Safety net,” “health,” or “hygiene” metrics are names for metrics that you include in your test results not because you expect them to change, but to ensure that your change isn’t affecting other teams down the line.
Activity and Engagement
The second set of metrics worth tracking is user activity and overall engagement.
Are they clicking buttons and opening links? Is their mouse moving around the page? Define a minimum level of activity to make sure that your users are engaging with your website or app. You should also remove obvious spam from that metric.
The idea is to have a realistic threshold of humans who would acknowledge having used your service. Many things can affect the number of server queries, but it takes a lot more to influence human behavior. If that number is affected by an A/B test, then you know that the change is triggering something meaningful. Depending on the direction, the change can improve the application and make it more usable, or make it worse.
On top of that minimal activity, define a more meaningful engagement level. Just opening an app isn’t the same as, say:

1. Loading the home page fully, scrolling more than half a screen, clicking on at least one item, and considering it for at least one second
2. Going from not logged in to logged in, then loading a page fully
3. Typing in a query, opening at least one search result, and staying there for one second
4. Opening their Favorites selection

All four of those could be considered engagement with the app or not, depending on your expectations.
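As a sketch, the first definition above can be written as a predicate. The session fields (`home_loaded`, `scroll_fraction`, `clicks`, `dwell_seconds`) are hypothetical names for events your analytics pipeline would record:

```python
def is_engaged(session: dict) -> bool:
    """One possible 'engaged' definition: full home page load, more than
    half a screen of scrolling, at least one click, at least one second
    spent. Field names are illustrative assumptions."""
    return (
        session.get("home_loaded", False)
        and session.get("scroll_fraction", 0.0) > 0.5
        and session.get("clicks", 0) >= 1
        and session.get("dwell_seconds", 0.0) >= 1.0
    )

print(is_engaged({"home_loaded": True, "scroll_fraction": 0.7,
                  "clicks": 2, "dwell_seconds": 4.2}))  # True
print(is_engaged({"home_loaded": True, "scroll_fraction": 0.7,
                  "clicks": 0, "dwell_seconds": 4.2}))  # False
```

Whatever definition you settle on, encoding it once and reusing it everywhere keeps every team counting "engaged users" the same way.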
Defining a minimum threshold can be a frustrating exercise of listing many ways of using the app, with every team thinking that their feature is the most important one. That matters less than isolating and measuring the rest: users who have not meaningfully engaged with your service.
Typically, some people will consider your service. Others may have opened the application by accident, would not remember registering, or failed to do what they wanted. Facebook famously set that threshold at having ten active friends. Having a clear, agreed-upon number is key. It lets you set expectations for how many users a change can affect, and it separates changes that only make sense for engaged users from efforts to convince non-engaged users to become engaged.
Identify both the number of active users and engaged users.
The reason is that those numbers are often not the same. They might be further apart than many would be willing to acknowledge. This can be a sobering exercise. In many cases, a large majority of active users are not engaged. Knowing that will help the team focus on bigger priorities, like making the app more usable for that active but disengaged majority.
Conversion and Revenue
Once you have enough users engaging with your service, you likely want them to spend money. Or you are an ad business, in which case you probably want them to click on ads. Either event represents revenue for you. You want to define one key metric that counts users reaching the earliest revenue-making event.
Cancellations in e-commerce or disputes over attribution in advertising can mean that your revenue is threatened.
What matters is that you have an early, clear, and commonly understood event that represents your commercial goal. That will help focus your efforts on things that immediately move the needle (like unblocking shoppers who want to spend their money). For some businesses, early events like “Add to basket” or “Add to Favorites” work best because they generally lead to a sale; for others, “Click the Purchase button” or “Click on the Payment section” are more relevant events.
There will be tests that dramatically increase the difference between conversions and net revenue. For example, you can offer free shipping on orders above a lower price threshold. However, those extra orders carry a cost, so the change needs to be tested on a more subtle metric than conversion rate alone.
If you operate an advertising platform, the number of users who clicked on an ad is a good indicator of whether or not you are offering appropriate suggestions.
In addition, capture both the conversion event and the amount. Testing on skewed quantities is harder for statistical reasons, but capturing that value is key. It ensures that you are not turning large orders into more, smaller orders, increasing the conversion rate but not the total revenue.
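A minimal sketch of why both numbers matter, with made-up order amounts: in the hypothetical variant, one large order becomes three smaller ones, so the conversion rate triples while revenue per user stays flat:

```python
# Per-user order totals for an imaginary test; zeros are non-converters.
control_orders = [0, 0, 300, 0, 0]       # one user spends 300
variant_orders = [100, 100, 100, 0, 0]   # three users spend 100 each

def conversion_rate(orders):
    """Share of users with at least one purchase."""
    return sum(1 for amount in orders if amount > 0) / len(orders)

def revenue_per_user(orders):
    return sum(orders) / len(orders)

print(conversion_rate(control_orders), conversion_rate(variant_orders))   # 0.2 0.6
print(revenue_per_user(control_orders), revenue_per_user(variant_orders)) # 60.0 60.0
```

Looking at conversion rate alone, the variant appears to be a big win; revenue per user shows that nothing actually changed.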
This represents a lot of effort, but it is really useful to have net margin or profit broken down per item.
For example, if you offer free shipping, apply an average shipping cost to every order. Or consider whether certain types of bookings lead to expensive customer service interactions. A detailed breakdown helps test results stay representative. While that often requires extensive analytical work, every organization needs that type of metric. It helps you understand how moving into a new market can bring in a lot of revenue yet cost more than it is worth.
Satisfaction and Retention
Finally, the most important metric of all is whether your users or customers are happy.
It’s essential information. Ask them: “Would you recommend our service to friends?” That question can feel indirect. They might be happy with the service themselves, but not expect their friends to be. That pretense is useful: “recommend … to friends” forces users to think about justifying their choice.
They have to confront other people’s expectations in their heads. They will still share their personal opinion, but it won’t be overly positive just because they already used your service. The answer is subjective but still relevant. It’s often collected on a ten-point scale and aggregated into a Net Promoter Score (NPS). All methods are good: honestly, just ask. NPS has the benefit of being a standard, so you can compare your score with other companies’.
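For reference, NPS is derived from the 0–10 answers as the percentage of promoters (scores of 9 or 10) minus the percentage of detractors (scores of 0 through 6). A minimal sketch:

```python
def nps(scores):
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6)."""
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)

# Illustrative survey answers, not real data.
print(nps([10, 9, 9, 10, 8, 7, 6, 2]))  # 25.0
```

Scores of 7 and 8 ("passives") count toward the denominator but neither group, which is why NPS can move even when the average score does not.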
Once customers have successfully conducted business with you, they are happy to be done. They often have other things to do. They might not care enough to fill in surveys. So getting effective feedback can be hard. Many surveys include a single, simple question on the confirmation page to get a response from more people. You can include a longer form if they are unhappy — that’s often when they want to be heard!
Many also include surveys at the end of customer service interactions. That’s rarely the moment you get the best grades, but it’s an important one for understanding whether you’ve done well. A better pattern is to ask before and after the interaction. The first measures the customer’s expectations for the service; the second, reality. The difference therefore measures how much you’ve changed their mind, which is crucial if you want to estimate whether your representatives are improving things.
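A sketch of that before/after measurement, assuming you can pair each customer’s two survey answers on the same scale (the pairing mechanism itself is left out):

```python
def average_mind_change(before, after):
    """Average per-customer change between paired before/after scores."""
    deltas = [post - pre for pre, post in zip(before, after)]
    return sum(deltas) / len(deltas)

# Illustrative paired scores: expectations going in, reality afterwards.
print(average_mind_change([3, 5, 2], [7, 6, 4]))  # positive: minds changed for the better
```

A positive average means the interaction left customers better off than they expected; a negative one means your representatives are making things worse.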
More simply, the ratio of calls to your customer hotline per transaction is likely a representative metric of how usable your website is and whether it covers most interactions.
This might be tricky to define if your users are employees of your clients, and the two might disagree on priorities. For instance, say you offer a service that allows your corporate customers to operate single sign-on (SSO) authentication. With SSO, their employees can use the same process to connect to your service as the one they use internally. Most of the calls from users are about lost passwords, new accounts, and password resets: things you’ve delegated to their employer. As a provider, you have done all you can to help your client clarify how SSO works, but they haven’t.
Defining satisfaction is harder for B2B. However, it’s generally quite clear what you can do to improve it.
You might care about satisfaction because that is core to your brand or because of who you are. However, for many businesses facing competition, satisfaction is the biggest predictor of customer retention. If you spend a lot of money to convince people to try your service, retaining their business is extremely important. If you are a non-profit trying to prevent crime, recidivism is also a good metric to tell whether your actions are effective — although it’s best if it goes down, not up.
That link to satisfaction is why you also want to measure retention, particularly if the purchase cycle moves fast enough to observe repeat purchases during an A/B test. If you sell homes, mortgages, cars, wedding services, or even exclusive travel, users typically don’t come back for a year or more. Satisfaction might be the only thing you can measure within the timeline of your test.
A Small Set of Metrics
When running experiments, you want to make sure that all aspects of your business are captured by a small set of metrics. Call these control metrics, hygiene, safety, or health metrics.
They won’t explain every change, but they will help you make changes to one part of a service without damaging another business area. We recommend imagining the customer journey: loading your site, engaging with your features, converting, and hopefully coming back. There are many other ways to look at it, typically more tailored by industry.
This initial framework should help you come up with a relevant set of health metrics.
Some changes can impact teams that you might not usually work with. For instance, promising a promotion that doesn’t work all the time can upset some customers. They will reach out, possibly angry. The customer service team will be affected. Some tests might unexpectedly impact other teams in your organization: logistics, external partners, and contractors.
To give a comprehensive idea of the impact of tests, you’ll want to include the impact on those team metrics in test results.
Many organizations learn that the hard way. You are better off being proactive: iterate based on feedback from your stakeholders. Remember to occasionally share your test ideas with colleagues who are not directly involved.
And be sure to ask if there’s anything you are not capturing, but should.
We have a series about key metrics for various industries on the horizon: e-commerce, financial industry, subscription-based services, and two-sided markets. Stay tuned!
Schedule a demo and discover how feature flags and data drive results.