A/B testing is the process of comparing two variations of a web page or feature by serving each version to a set percentage of users, collecting data until the sample size is large enough, and then checking whether the difference in a key metric, such as conversion rate, is statistically significant. A/A testing runs that same process with two identical versions to confirm that the testing process itself is in working order.
Why Run A/A Tests?
A/B testing is an immensely valuable process for making data-driven decisions about everything from web pages to feature releases. A hunch that your conversion rate could be improved by making the CTA button larger is all well and good, but if you’ve split your user base into two groups and the group that saw the larger button converted 5% more often, that’s a very different (and much better) thing. But an A/B test can be a complicated process. How can you tell that your testing process is operating properly?
This is where A/A tests come in. By running two identical features through your A/B testing software or other process, you can ensure that the testing tool works as expected. With an A/A test, you can answer these questions:
- Are users split according to the percentages you planned?
- Does the data generally look how you expect it to?
- Are the results statistically insignificant about 95% of the time (or whatever your confidence level is)?
Let’s discuss that last point a bit further. If the two versions are identical, why are the results statistically insignificant only 95% of the time? Shouldn’t they be insignificant all the time?
A 95% confidence level means you accept a 5% chance of calling a difference significant when no real difference exists. The two groups never produce exactly the same data, because there is always some random variation, and that variation is large enough to look “significant” about 5% of the time, even when the versions are identical. This is called a false positive.
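The 5% figure can be illustrated with a small simulation. The sketch below is a minimal example, not a description of any particular testing tool: it assumes a pooled two-proportion z-test as the significance test, and the function names, the 10% conversion rate, the group size, and the number of runs are all hypothetical values chosen for illustration. It repeatedly generates two groups from the same conversion rate and counts how often the test reports a "significant" difference.

```python
import math
import random


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test (assumed test)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 1.0  # no variation at all, so nothing to compare
    z = (conv_a / n_a - conv_b / n_b) / std_err
    # P(|Z| > |z|) for a standard normal variable
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_aa_false_positive_rate(runs=2000, users_per_group=1000,
                                    true_rate=0.10, alpha=0.05, seed=42):
    """Run many simulated A/A tests and return the share flagged significant.

    All defaults are illustrative assumptions, not recommended settings.
    """
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(runs):
        # Both groups see the identical version, so they share one true rate.
        conv_a = sum(rng.random() < true_rate for _ in range(users_per_group))
        conv_b = sum(rng.random() < true_rate for _ in range(users_per_group))
        p = two_proportion_p_value(conv_a, users_per_group,
                                   conv_b, users_per_group)
        if p < alpha:
            false_positives += 1
    return false_positives / runs


rate = simulate_aa_false_positive_rate()
print(f"Share of A/A tests flagged as significant: {rate:.3f}")
```

With a 0.05 significance threshold, the printed share should land close to 5%. A share far above that in a real A/A test would suggest a problem with the test setup or the significance calculation.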
A/A tests help you confirm that your A/B testing process is working properly: you understand your data, users are being split into groups as you intended, and your significance levels are appropriate. That way you can trust that your A/B test results mean what you think they mean.
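The first question in the list above, whether users are split according to the planned percentages, can also be checked numerically. The following is a minimal sketch that assumes a chi-square goodness-of-fit test with one degree of freedom as the check; the function name and the user counts are hypothetical.

```python
import math


def split_check_p_value(count_a, count_b, expected_share_a=0.5):
    """P-value for whether observed group sizes match the planned split.

    Uses a chi-square goodness-of-fit test with one degree of freedom
    (an assumed choice of test for this illustration).
    """
    total = count_a + count_b
    expected_a = total * expected_share_a
    expected_b = total - expected_a
    chi2 = ((count_a - expected_a) ** 2 / expected_a
            + (count_b - expected_b) ** 2 / expected_b)
    # For one degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts from an A/A test planned as a 50/50 split.
print(split_check_p_value(5050, 4950))  # small imbalance
print(split_check_p_value(5200, 4800))  # larger imbalance
```

A very small p-value here means the observed group sizes are unlikely under the planned split, which points to a problem in how users are being assigned rather than to anything about the feature itself.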