The use of feature flags has taken off, and with good reason. Feature flags let you control which users are exposed to code by comparing the current execution context to rules that can be updated instantly, without a new deployment. This superpower of dynamic exposure control drives the top three feature flag use cases: testing in production, gradual release, and feature experimentation.
Before we look at each of these use cases, let’s consider why they deliver so much value to engineering teams by briefly looking back at life before feature flags.
Pick One: Speed, or Safety
Before feature flags, engineering teams were forced to choose either speed or safety. Would innovation or stability rule the day?
Teams that chose speed focused on frequent deployments, pushing code quickly into production and fixing issues in real time when things went wrong, an approach also known as “failing forward.” To many traditional teams, this approach was cavalier and “simply not feasible” in their environment due to concerns over safety, security, regulation, and reputation. Teams that worked this way made for fascinating conference presentations, but they also fueled conversations about burnout, because keeping services online demanded frequent heroics at a high human cost.
Teams that chose safety focused on layer upon layer of testing and approvals. That meant multiple environments (none of which ever truly matched production). It also meant a lot of time and money spent on managing infrastructure and process complexity that had nothing to do with the live service consumed by customers. While it was less likely that anything would go wrong on a given day (since nothing was being pushed to production most days), this approach actually carried a greater risk at release time. When you change many things all at once, it’s more likely something will go wrong that eludes quick discovery, triage, and repair. Going slowly and doing “big bang” deployments wasn’t actually safer, after all.
What If You Could Have Speed and Safety (and Less Stress)?
It’s liberating when you realize that a constraint that has been holding you back can be eliminated by just looking at the problem differently. For years, teams equated deploying code with releasing it. The maintenance window, often a weekly planned downtime, was used to upgrade systems from one release to the next. Blue-green deployments did away with the need for maintenance windows and frantic upgrade-in-place procedures, but they required a full copy of production running in parallel for at least some time. They were also still a “big bang” release: all of the new code went live at once when traffic was switched from blue to green.
When teams discovered how easy it was to decouple deployment from release all the way down to individual blocks of code with feature flags, things got even more interesting. Instead of switching from one replica of production to another, they could turn on and off individual features within a single deployed image, and they could do this on production at any time.
Testing In Production
When new code is deployed behind feature flags that are turned off by default (i.e. “dark launching”), deployment risk is reduced to nearly zero, because nothing changes for users until a flag is turned on. Developers can go from committing a change to verifying it in production in minutes. Designers, developers, and product managers can review and refine visual changes across multiple separate parts of an application in production before customers see them. As a bonus, any automated tests that are written to verify functionality in production double as always-up-to-date monitoring scripts and post-deployment smoke tests. To learn more about testing in production, have a look at Talia Nassi’s post, Increase your Productivity by Testing in Production.
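To make dark launching concrete, here is a minimal sketch in Python. It assumes a hypothetical in-memory `Flag` class and a made-up `checkout` function; a real flag service would store and distribute its rules elsewhere, so treat this as an illustration of the idea rather than any product's API.

```python
class Flag:
    """Hypothetical in-memory flag: off by default, on only for targeted users."""

    def __init__(self):
        self.enabled_for = set()  # empty set means the feature is dark for everyone

    def is_on(self, user_id):
        return user_id in self.enabled_for


# Deployed "dark": the flag exists in production but nobody is targeted yet.
new_checkout = Flag()


def checkout(user_id):
    # Both code paths ship in the same deployment; the flag picks one per call.
    if new_checkout.is_on(user_id):
        return "new checkout flow"
    return "legacy checkout flow"


print(checkout("qa-tester"))  # the flag is still off, so this takes the legacy path

# A rule update at runtime, with no redeploy, exposes the feature to one tester.
new_checkout.enabled_for.add("qa-tester")
print(checkout("qa-tester"))   # the tester now reaches the new path
print(checkout("customer-1"))  # customers still get the legacy path
```

The point of the sketch is that the deployment and the exposure decision are two separate events: the new path is already in production when the first line runs, but no customer can reach it.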
Gradual Release
Once code is in production behind a feature flag, it can be gradually released, starting with lower risk groups and expanding in stages to confirm both usability and scalability. Lower risk groups might include your own employees (“dogfooding”), free-tier customers, design partner customers, and those who have opted in as early adopters. The goal here is to “limit the blast radius” when things occasionally go wrong. If fewer users are impacted and for a shorter period of time, then incidents are less harmful to your business and less stressful for everyone involved.
While this may sound a lot like canary deployments, it’s actually more powerful in terms of productivity and decoupling of independent teams, because control is scoped to the individual feature rather than to the entire release payload. For more on the quality-of-life differences between canary deployments and gradual releases, check out my post, Pros and Cons of Canary Release and Feature Flags in Continuous Delivery.
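A staged rollout like the one described above can be sketched as follows. The function names, the `group` field, and the flag name `new-search` are all hypothetical, and hashing the user ID into a percentage bucket is one common way to do this, not necessarily how any particular flag service works.

```python
import hashlib


def bucket(flag_name, user_id):
    """Map a user to a stable bucket in [0, 100) for a given flag."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_exposed(flag_name, user, allowed_groups, rollout_percent):
    """Expose a user who is in a low-risk group or falls under the rollout percentage."""
    if user["group"] in allowed_groups:
        return True
    return bucket(flag_name, user["id"]) < rollout_percent


# Made-up users for illustration only.
users = [{"id": f"user-{i}", "group": "customer"} for i in range(1000)]
employee = {"id": "emp-1", "group": "employee"}

# Employees are exposed even at 0%, which models the "dogfooding" stage.
print("employee exposed at 0%:", is_exposed("new-search", employee, {"employee"}, 0))

# Widening the percentage in stages grows the exposed population.
for percent in (0, 5, 25, 100):
    exposed = sum(is_exposed("new-search", u, {"employee"}, percent) for u in users)
    print(f"{percent:>3}% rollout: {exposed} of {len(users)} customers exposed")
```

Because each user's bucket never changes, anyone exposed at 5% is still exposed at 25%. Rolling back is the same operation in reverse: lower the percentage and the blast radius shrinks with it.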
Feature Experimentation (A/B Testing)
Once you have the ability to target gradual releases by user demographics and to keep individual users in the same “on” or “off” or A/B/n cohort across sessions, you have half of what you need to conduct feature experimentation, also known as A/B testing or A/B/n testing. The other half of what you need is the ability to gather and compare statistics between the different cohorts in a reliable and repeatable way.
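The first half, keeping each user in the same cohort across sessions, can be sketched with deterministic hashing. This is a simplified illustration: `assign_variant`, the experiment name, and the variant names are invented for the example, and weights are assumed to be whole percentages.

```python
import hashlib


def assign_variant(experiment, user_id, variants):
    """Deterministically pick a variant; weights are integer percentages summing to 100."""
    if sum(variants.values()) != 100:
        raise ValueError("variant weights must sum to 100")
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    point = int(digest, 16) % 100
    upper = 0
    # Walk the variants in insertion order, giving each a slice of [0, 100).
    for name, weight in variants.items():
        upper += weight
        if point < upper:
            return name
    raise AssertionError("unreachable when weights sum to 100")


variants = {"control": 50, "blue-button": 25, "green-button": 25}

# The same user gets the same answer on every call, with no stored state.
first = assign_variant("button-color", "user-42", variants)
second = assign_variant("button-color", "user-42", variants)
print(first, first == second)
```

Including the experiment name in the hash input means a user's cohort in one experiment does not determine their cohort in another, which helps keep experiments from biasing each other.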
With feature experimentation, you split some portion of your user population across old and new or multiple alternatives of a feature (typically for a week or two) to compare user experience and business outcomes. The goal is to go beyond proving whether code is buggy or not to focus on business impact.
Be sure to keep an eye on application-wide “do no harm” metrics (proving you aren’t accidentally setting your business back), in addition to feature-specific conversion statistics.
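As a rough sketch of the statistics half, the snippet below compares a feature-specific conversion rate and a “do no harm” error rate between two cohorts using a pooled two-proportion z-test. The choice of test and the 0.05 threshold are assumptions made for this example, and every number in it is invented for illustration; none of it is real experiment data.

```python
import math


def two_proportion_z(events_a, n_a, events_b, n_b):
    """Return (difference in rates, two-sided p-value) for cohort B versus cohort A."""
    rate_a, rate_b = events_a / n_a, events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return rate_b - rate_a, 1.0  # no variation at all, so nothing to detect
    z = (rate_b - rate_a) / se
    # Two-sided tail probability of a standard normal beyond |z|.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_b - rate_a, p_value


ALPHA = 0.05  # assumed significance threshold

# Invented counts: 10,000 users per cohort.
lift, p_conv = two_proportion_z(500, 10_000, 580, 10_000)  # conversions
harm, p_err = two_proportion_z(100, 10_000, 104, 10_000)   # errors (guardrail)

print(f"conversion lift: {lift:+.4f}, significant: {p_conv < ALPHA}")
print(f"error rate change: {harm:+.4f}, significant: {p_err < ALPHA}")

# Ship only if the feature metric improved and the guardrail did not get significantly worse.
ship = (lift > 0 and p_conv < ALPHA) and not (harm > 0 and p_err < ALPHA)
print("ship:", ship)
```

A production experimentation platform handles much more than this, such as sample size planning and multiple comparisons, but the shape of the decision is the same: one check for the feature's own metric, and a separate check that nothing else was set back.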
Putting It All Together To Balance Speed, Quality and Risk
Many teams are combining testing in production, gradual release, and feature experimentation into a standard workflow that balances speed, quality, and risk. Adil Aijaz’s post, The 5 Phases of a Feature Launch, describes a five-step process based on the work LinkedIn has done in balancing speed, quality, and risk (SQR).
Learn More About Testing In Production, Gradual Release and Feature Experimentation
If you’re interested in learning more about the top three feature flag use cases, check out these posts:
- Increase your Productivity by Testing in Production
- Why would I want to decouple deployment from release?
- Embracing feature experimentation one step at a time