Simultaneous Experimentation: Run Multiple A/B Tests Concurrently

The ability to run multiple experiments simultaneously is one of the most significant force multipliers for development teams, maximizing a company's velocity, flexibility, and revenue. To make this process as impactful as possible, here are some best practices.

Use Randomization to Empower Simultaneous Experimentation

When you run a simple A/B test, you divide your users into two groups: the control, which receives the existing baseline functionality, and the treatment, which receives the new proposed change.

Imagine you were to divide your users the same way for every experiment. In that case, you would have only two groups: one that gets the baseline experience everywhere, and one that receives every change from every experiment you are running. When analyzing the data, you would see only the combined effect of all the changes and could not trace the impact back to its source. The changes might be improving results equally, they might be doing so unequally, or one might even have a negative effect that is being masked by the other. Identical assignment confounds results, making it impossible to know which change is causing the effect.

Fortunately, this isn’t how an experiment is run. Users are sorted between the baseline and treatment experiences at random for each separate rollout plan, so each experiment’s impact is distributed independently.

With two concurrent 50/50 experiments, this creates four groups: one group that has the standard experience, two groups that each receive one change alone, and a group that gets both changes.
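One common way to implement this independent randomization is to bucket each user by hashing their ID together with the experiment's name, so every experiment gets its own independent shuffle of the population. The sketch below is an illustration of that technique, not Split's actual assignment algorithm; the experiment names and percentages are hypothetical.

```python
import hashlib
from collections import Counter

def bucket(user_id: str, experiment_name: str, treatment_pct: float = 0.5) -> str:
    """Deterministically assign a user to control or treatment for one experiment.

    Hashing the user ID together with the experiment name means each
    experiment randomizes users independently of every other experiment.
    """
    digest = hashlib.md5(f"{experiment_name}:{user_id}".encode()).hexdigest()
    # Map the 128-bit hash to a fraction in [0, 1) and compare to the rollout percentage.
    fraction = int(digest, 16) / 16**32
    return "treatment" if fraction < treatment_pct else "control"

# Two concurrent experiments split 100,000 users into four roughly equal groups.
groups = Counter(
    (bucket(str(u), "exp_checkout"), bucket(str(u), "exp_search"))
    for u in range(100_000)
)
```

Because the assignments are independent, each of the four (control/treatment) combinations ends up with roughly a quarter of the users.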

When you focus on just one experiment, the impact of the other experiment still exists – however, it is distributed evenly between the baseline (control) and the treatment (variant). So while each group has a different absolute effect than the vanilla experience with your product, the relative difference between the treatment and baseline remains constant, and the experiment results remain accurate.
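A quick back-of-the-envelope calculation shows why the relative difference is preserved. The conversion rates and lifts below are hypothetical numbers chosen for illustration.

```python
base = 0.10      # baseline conversion rate (assumed)
lift_a = 0.02    # effect of experiment A's treatment (assumed)
lift_b = 0.05    # effect of experiment B's treatment (assumed)

# Because B's users are split 50/50 independently of A's assignment,
# both of A's groups contain half of B's treatment users on average,
# so half of B's lift shows up in each of A's groups.
control_a   = base + 0.5 * lift_b            # A's control:   0.125
treatment_a = base + lift_a + 0.5 * lift_b   # A's treatment: 0.145

# B's contribution cancels out: the measured lift is A's true effect.
measured_lift_a = treatment_a - control_a    # ≈ 0.02
```

Both of A's groups are shifted by the same amount, so subtracting one from the other recovers exactly A's effect.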

The Effect on Overall Velocity

Given that randomization allows you to isolate the impact of an experiment run in parallel, it's valuable to highlight how that process can significantly improve your experimentation and development velocity. If you run the tests sequentially, the time to reach a final decision depends on the combined length of all experiments. You risk engineers and other resources either moving on to other projects once they finish the code for a feature and being unavailable to support its later release, or waiting for one experiment to end before they can start working on the next.

Having seen how these experiments can be safely run in parallel, you can now reap the benefits of starting or completing experiments on a schedule that suits each particular test and team. The total time to complete the concurrent experiments is capped at the duration of the longest one. Any other experiments can be started or concluded during that run time without fear of interfering with overall results and decisions.
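The velocity gain is easy to quantify: sequential testing takes the sum of all the durations, while parallel testing takes only the maximum. The experiment names and durations below are hypothetical.

```python
# Hypothetical experiment durations, in days.
durations = {
    "exp_checkout": 14,
    "exp_search": 21,
    "exp_onboarding": 10,
}

# Run one at a time: total time is the sum of all durations.
sequential_days = sum(durations.values())   # 45 days

# Run concurrently: total time is capped at the longest experiment.
parallel_days = max(durations.values())     # 21 days
```

In this example, parallelizing cuts the time to a final decision from 45 days to 21 days, and the gap only widens as more experiments are added.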

This distribution of impact holds even when the rollout is targeted at a fraction of the overall population rather than a 50/50 split. Let's say you choose to test a new feature on only one-quarter of the population.

The populations are seeing the same absolute impact as before. The only difference is that the treatment group is smaller. 
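Because the randomizations are independent, the size of each combined group is simply the product of the individual rollout fractions. Here is a small sketch with one 50/50 experiment and one 25% rollout (both percentages are hypothetical):

```python
# Experiment A runs 50/50; experiment B rolls out to only 25% of users.
a_treatment = 0.50
b_treatment = 0.25

# Independent randomization means each combined group's share of the
# population is just the product of the individual fractions.
group_sizes = {
    ("A_control", "B_control"):     (1 - a_treatment) * (1 - b_treatment),  # 0.375
    ("A_control", "B_treatment"):   (1 - a_treatment) * b_treatment,        # 0.125
    ("A_treatment", "B_control"):   a_treatment * (1 - b_treatment),        # 0.375
    ("A_treatment", "B_treatment"): a_treatment * b_treatment,              # 0.125
}
```

B's treatment users still make up 25% of A's control group and 25% of A's treatment group, so B's effect cancels out of A's comparison exactly as it did in the 50/50 case.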

Be on the Lookout for Interaction Effects

There is one pitfall to keep in mind when running parallel experiments: two changes can directly interfere with one another, creating a different impact on behavior when combined than when in isolation. This typically happens when experiments run on the same page, the same user flow, or the same piece of code. To catch interaction effects, review concurrent experiments for potential overlap: use tags and naming conventions to track which areas each experiment changes, manually test the combined changes as part of the rollout, and look for other feature flags in nearby code. You can also deliberately design potentially colliding tests to highlight interactions and compare each variant combination's performance against the others. When you receive surprising results from an experiment, be sure to examine them qualitatively to help you understand the cause.
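One way to quantify an interaction is to compare the effect of both changes together against the sum of their individual effects, using the four groups that independent randomization already gives you. The conversion rates below are hypothetical numbers invented for illustration.

```python
# Observed conversion rates for the four groups of two concurrent
# experiments (hypothetical numbers).
rates = {
    ("control", "control"):     0.100,
    ("treatment", "control"):   0.120,  # A alone: +2.0pp
    ("control", "treatment"):   0.130,  # B alone: +3.0pp
    ("treatment", "treatment"): 0.135,  # together: +3.5pp, not +5.0pp
}

base = rates[("control", "control")]
effect_a = rates[("treatment", "control")] - base
effect_b = rates[("control", "treatment")] - base
effect_both = rates[("treatment", "treatment")] - base

# If the changes were independent, effect_both would be close to
# effect_a + effect_b. A large gap signals an interaction.
interaction = effect_both - (effect_a + effect_b)  # ≈ -0.015
```

Here the two changes together deliver notably less than the sum of their individual lifts, a negative interaction worth investigating qualitatively before making a ship decision.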

Learn More About Experimentation

Experimenting with your business models and user flows can provide you with feedback on what your customers value most in your product. Running simultaneous experiments can give you even more insight, helping you build a better product, increase your revenue, and increase your overall velocity. Be sure to avoid interaction effects by reviewing concurrent experiments and using guardrail metrics that will help you highlight them.

Here at Split, we know feature flags and experimentation go hand-in-hand! These powerful tools are the future of software development, and our platform empowers organizations to drive clear business impact every day. Ready to learn more? Check out these resources:

To stay up to date on all things testing in production, feature flags, and experimentation, follow us on Twitter @splitsoftware, and subscribe to our YouTube channel!