When speaking to product managers and data scientists about their A/B testing initiatives, I always emphasize the need to incorporate seasonality into the design and planning of their experiments. Seasonality is the idea that visitors’ behavior may differ by day of the week or time of year, and you need to account for that when setting the duration of your experiment. For instance, when running a test where the OEC is conversion percentage, you probably want to run the experiment for a number of complete weeks: in many cases visitors are more likely to shop online on the weekend, and you want each day of the week to be equally represented in your results. Similarly, you might avoid drawing generalized conclusions from an experiment run during the week of Black Friday.
So given the unusual and unprecedented times we are going through right now, what’s a product manager to do? How can you redirect your feature flagging and experimentation efforts to still get meaningful and actionable information?
Tighten Up the Ship with A/B Testing
A/B testing can be used to measure changes that have nothing to do with users’ actions. Right now you may be looking at changing code or infrastructure to increase efficiency, either to handle increased demand or to reduce infrastructure spend. Putting those changes behind a split and adding events and metrics to measure performance effects like page load time will give you solid data on whether the changes negatively affect customer experience. You have the safety of rolling out new code to a small audience and verifying that it is indeed more efficient without risking the alienation of your entire customer base if things go south (as well as the ability to roll back the changes instantly if they do).
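As a minimal sketch of this pattern, here is a percentage rollout gate in plain Python (this is not the Split SDK; the flag name, user id, and rollout percentage are all made up for illustration):

```python
import hashlib

def in_rollout(user_id: str, flag_name: str, percent: int) -> bool:
    """Deterministically bucket a user into a percentage rollout.

    Hashing the flag name together with the user id means each flag
    buckets users independently, and a given user always lands in the
    same bucket for a given flag.
    """
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in 0..99
    return bucket < percent

# Serve the new, more efficient code path to 5% of users first;
# everyone else keeps the existing path. Dropping percent to 0 is
# the instant "roll back" described above.
if in_rollout("user-123", "optimized-cache-layer", 5):
    pass  # new infrastructure path; record page-load-time events here
else:
    pass  # existing path
```

In a feature flagging platform the bucketing and rollback are handled for you; the point of the sketch is that the assignment is deterministic per user, so the same visitors stay in the treatment as you ramp the percentage up.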
Testing: New Learnings for a New World
The dramatic changes in the context for your business are the new “normal”. There are tests other than traditional A/B tests that can be run, and ways you might alter or add to the measurement of your tests, that can help you get a handle on how your customers are behaving. Here are some suggestions for gauging the new landscape.
Targeting/Measuring New Users
Depending on what business you are in, lockdowns and working from home may be driving significant numbers of new users to register on your site. If you maintain the date on which a user registered, you can use that value either as a targeting attribute to target users who registered after a given date or as an event property to measure the behavior of “novel” users separately from your legacy user base. Target these users into a separate cohort if your feature (and hence your hypothesis) explicitly addresses this group of users. On the other hand, using an event property gives you another axis around which to slice your data to measure a feature targeted at all users.
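A quick sketch of the targeting-attribute approach, in plain Python (the cutoff date, user records, and field names are hypothetical):

```python
from datetime import date

# Hypothetical cutoff: anyone who registered on or after this date
# is treated as a "novel" user for targeting purposes.
LOCKDOWN_START = date(2020, 3, 15)

def is_novel_user(registered_on: date) -> bool:
    return registered_on >= LOCKDOWN_START

users = [
    {"id": "u1", "registered_on": date(2019, 11, 2)},   # legacy user
    {"id": "u2", "registered_on": date(2020, 4, 1)},    # novel user
]

# Use the attribute to build a separate cohort for the new users;
# legacy users keep the existing experience.
novel_cohort = [u for u in users if is_novel_user(u["registered_on"])]
```

The same registration date, attached as an event property instead of a targeting attribute, would let you slice one experiment’s results by novel vs. legacy users rather than splitting them into separate cohorts up front.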
Run an A/A Test for Metric Baselines
It’s a good bet that any metric baselines you’ve gathered in the past no longer represent the current state of things. One easily executed “experiment” that would collect this new data for you is an A/A test. Running one for a week before pulling new metric baselines (so you get samples for every day of the week – seasonality is still a thing) should be sufficient, but it’s cost-free to let it run longer and continue to monitor the metrics for changes. See the Split knowledge base for our video on Creating and Running an A/A test.
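To make the idea concrete, here is a sketch of pulling new baselines from A/A data (the per-user conversion samples are made up; both arms ran identical code, so their means should agree):

```python
import statistics

# Per-user conversion outcomes (1 = converted) collected over one full
# week of an A/A test -- illustrative numbers only.
arm_a1 = [0, 1, 0, 0, 1, 0, 1, 0]
arm_a2 = [0, 0, 1, 0, 1, 0, 0, 1]

# Since both arms are identical, pool them for the new baselines.
baseline_mean = statistics.mean(arm_a1 + arm_a2)
baseline_stdev = statistics.stdev(arm_a1 + arm_a2)

# Sanity check: the two identical arms should not differ meaningfully.
# A persistent gap suggests an assignment or instrumentation bug.
gap = abs(statistics.mean(arm_a1) - statistics.mean(arm_a2))
```

These baselines feed directly into power calculations for your next real experiment, which is why refreshing them matters when user behavior has shifted.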
Enhance Existing Metrics
To help you understand the implications of the global situation on the usage of your application you may want more detail on how your users are behaving. For instance, you may currently have a straightforward Items Added to Cart Per User metric to measure treatment effects on that action. If there’s a particular product category that you suspect is being added in unusual numbers and your add_to_cart event has ‘product category’ as a property, you could create a new metric to track how many items in that category were added by filtering on a specific value for that property.
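A small sketch of the filtered metric, assuming events shaped like the `add_to_cart` example above (the event records and the `product_category` values are invented):

```python
# Raw add_to_cart events, each carrying a product_category property.
events = [
    {"user": "u1", "event": "add_to_cart", "product_category": "groceries"},
    {"user": "u1", "event": "add_to_cart", "product_category": "electronics"},
    {"user": "u2", "event": "add_to_cart", "product_category": "groceries"},
]

unique_users = {e["user"] for e in events}

# Existing metric: items added to cart per user, any category.
items_per_user = len(events) / len(unique_users)

# New, filtered metric: only count events whose property matches.
groceries = [e for e in events if e["product_category"] == "groceries"]
groceries_per_user = len(groceries) / len(unique_users)
```

In an experimentation platform you would define the filter on the metric rather than in code, but the computation it performs is the same: count only the events whose property matches the chosen value, then normalize per user.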
Similarly, there may be new outliers that skew metrics in unexpected ways. Creating a capped version of the metric and comparing it to the uncapped version can help you understand the effect these outliers are having on the metric value and variance.
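A sketch of the capped-versus-uncapped comparison (the samples and the cap value are made up; in practice the cap is often chosen from a high percentile of the historical distribution):

```python
import statistics

# Items-added-per-user samples; one bulk buyer is an extreme outlier.
samples = [2, 1, 3, 2, 4, 1, 2, 150]

CAP = 10  # hypothetical cap
capped = [min(x, CAP) for x in samples]

uncapped_mean = statistics.mean(samples)   # dragged up by the outlier
capped_mean = statistics.mean(capped)      # closer to typical behavior
uncapped_var = statistics.variance(samples)
capped_var = statistics.variance(capped)   # much smaller variance
```

Comparing the two versions shows how much of the metric’s movement is driven by a handful of outliers; the reduced variance of the capped metric also makes treatment effects easier to detect.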
These new, more focused metrics do not replace your existing OEC metric(s) but instead are intended to shine a light on possible areas for improvement and experimentation moving forward.
Why We Test: Change is Opportunity
Finally, we encourage you to continue running traditional A/B Tests at this time; you just need to do so being fully aware that the environment in which that test is running is not representative of what you might’ve seen six weeks ago or will see six weeks from now. It doesn’t mean you can’t learn something, just that what you learn should be put into the context of the times.
This is a good reminder of a general best practice not often followed: retesting and verifying that the original conclusion reached is still valid. While the social and economic changes we are currently experiencing are unprecedented in this era, it’s a truism that change is constant and this too shall pass.
Learn More About Feature Delivery and Testing
If this post got you excited to learn more about feature flags and experimentation, we have you covered. Check out these other resources from our team:
- You Might Not Need Continuous Deployment
- Know Your Why: Experimentation and Progressive Delivery at Walmart Grocery
- Set Up Feature Flags with React in 10 Minutes
- Monitor Your Feature Flag Performance with New Relic
- The 80% Rule of Software Development