There’s a hazard that teams adopting feature flags very often trip over: too many flags! Feature flags are useful and easy to create. Unfortunately, removing a flag takes more effort than creating one, so without a conscious effort to manage them, the number of flags in the system will grow over time. And while they’re valuable, these flags aren’t free. Each flag adds complexity to the codebase, increases the testing burden, and adds to the cognitive load required to manage feature releases.
So, it’s in a team’s best interests to keep the number of flags in check. Some teams try to keep this number low by using feature-set flags (a single flag that controls a large amount of functionality) or standing flags (flags that are repurposed to control different areas of functionality over time). However, both of these strategies have serious shortcomings. As we’ll discover in this post, a better way to control the number of flags in your system is simply to get good at retiring flags.
Treat Feature Flags like Branches
There are a lot of parallels between managing source control branches and managing feature flags. This isn’t surprising, given that feature flags are often used as an alternative to feature branches, offering a better way to organize concurrent changes to a codebase.
The similarities between flags and branches continue. Just like feature flags, branches are easier to create than to remove, and they’re expensive to maintain. Our industry has learned over time that long-lived branches are best avoided. Rather than working on one long-lived branch that contains a large batch of changes, many teams have shifted towards feature branches: small, short-lived branches that contain a single feature’s worth of changes. This allows a continuous flow of small changes, which leads to less risk and a more sustainable workflow.
We should apply the same thinking when it comes to feature flags — creating an individual flag for each feature, rather than batching up multiple features behind a single flag. This allows us to work in smaller batches, testing each feature individually as it is completed. It also provides more flexibility when it comes to releasing features.
Avoid Standing Flags
To further understand why small, focused feature flags are preferred, let’s look at an alternative approach. Faced with the realization that there’s such a thing as too many flags, some teams decide to reduce their flag count by introducing one or more “standing flags” — permanent flags that control different features over time.
An example of this approach would be having a standing beta flag. Any feature that you want to manage with a flag would start life controlled by an earlier-stage standing flag.
After some initial testing, the feature would be promoted to beta by updating the flagging decision logic so that it’s now controlled by the standing beta flag.
Once ready for general release, feature-flagging control would be removed from the feature entirely.
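The promotion steps described above can be sketched in code. This is a minimal illustration, assuming a hypothetical in-memory flag store (STANDING_FLAGS) and hypothetical stage names early-access and beta; the post itself does not name the first stage. What it shows is that each promotion is an edit to the decision logic in the code, not a runtime toggle.

```python
# Minimal sketch of the standing-flag approach. The flag store, the stage
# names ("early-access", "beta"), and the user IDs are all hypothetical.
from typing import Dict, Set

# Each standing flag maps to the set of users who see features at that stage.
STANDING_FLAGS: Dict[str, Set[str]] = {
    "early-access": {"alice"},
    "beta": {"alice", "bob"},
}


def is_on(flag: str, user: str) -> bool:
    """Return True if the named standing flag is on for this user."""
    return user in STANDING_FLAGS.get(flag, set())


# Stage 1: the new search page is gated by the early-stage standing flag.
def search_page_stage1(user: str) -> str:
    return "new search" if is_on("early-access", user) else "old search"


# Stage 2: promoting to beta means editing this decision point and redeploying.
def search_page_stage2(user: str) -> str:
    return "new search" if is_on("beta", user) else "old search"


# Stage 3: general release deletes the flag check entirely.
def search_page_stage3(user: str) -> str:
    return "new search"
```

In practice these would be three successive versions of a single function, each shipped in its own deployment; they sit side by side here only for comparison. For example, search_page_stage1("bob") returns "old search", while search_page_stage2("bob") returns "new search".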
While this approach certainly reduces the number of flags under management, it comes with some significant drawbacks.
Firstly, all features at a given promotion stage (e.g., “beta”) are coupled together. There’s no way to manage these features individually: you can’t turn a specific feature on for a specific group of users, or off in a certain environment. In addition, if a specific feature is controlled via multiple decision points in your code, you have to be careful to update every one of those decision points whenever you promote the feature.
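To make the coupling concrete, here is a minimal sketch contrasting a shared standing flag with per-feature flags. The flag stores, the flag names new-search and new-checkout, and the user IDs are hypothetical stand-ins for whatever flag service a team actually uses.

```python
from typing import Dict, Set, Tuple

# Hypothetical flag stores: flag name -> set of user IDs the flag is on for.
standing_flags: Dict[str, Set[str]] = {"beta": {"bob"}}
per_feature_flags: Dict[str, Set[str]] = {
    "new-search": {"bob"},   # ready for bob to try
    "new-checkout": set(),   # still buggy, so off for everyone
}


def is_on(store: Dict[str, Set[str]], flag: str, user: str) -> bool:
    return user in store.get(flag, set())


def features_with_standing_flag(user: str) -> Tuple[bool, bool]:
    # Both beta-stage features check the same standing flag, so for a
    # given user they can only ever be on or off together.
    search = is_on(standing_flags, "beta", user)
    checkout = is_on(standing_flags, "beta", user)
    return search, checkout


def features_with_per_feature_flags(user: str) -> Tuple[bool, bool]:
    # Each feature has its own flag, so its audience is managed independently.
    search = is_on(per_feature_flags, "new-search", user)
    checkout = is_on(per_feature_flags, "new-checkout", user)
    return search, checkout


print(features_with_standing_flag("bob"))      # (True, True)
print(features_with_per_feature_flags("bob"))  # (True, False)
```

With the standing flag, there is no configuration that gives bob the new search without also giving him the unfinished checkout; with per-feature flags, that is just a matter of flag state.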
Another big drawback is that the only way to release or un-release a feature is via a code change. We’ve lost one of the big benefits of a feature flag management system — runtime control over a feature without needing to change code or make a deployment.
Avoid Feature-set Flags
Another strategy for reducing flag count, one that seems reasonable at first but turns out to be misguided, is the feature-set flag. A feature-set flag is a broad-scoped flag that covers a related set of features intended to be released to customers as a unit.
Let’s say, for example, that a delivery team is working on a big UI refresh, one which will span several pages. It might be tempting to place all the work behind a single ui-refresh feature flag, since the team is confident it will release all the UI changes at once: they would never want their users to experience a mix of the old UI and the new UI. However, while this makes sense from a release management perspective, it can create some challenges during feature development.
Implementing this set of features will involve work in a few different areas of the codebase, but it is hard to manage those changes independently if they are all controlled by the same flag. Let’s say that a change in one area of the code is ready for testing, but a change in another area of the code is in the midst of development and somewhat buggy. A tester can’t test the first change without turning on the other, buggy change. This tends to lead to a lumpy delivery flow, where all the changes behind this set of features pile up and end up having to be tested and signed off in one big batch, causing “feast or famine” situations for people further down the delivery pipeline — testers, product managers, etc. They might have a light workload for an extended period, and then once the final changes in a feature set are ready to test, they are suddenly hit with a large batch of changes that all need testing ASAP.
Even when we intend to release a set of related changes in one go, it’s preferable to be able to test and validate each change independently. This allows a smooth, continuous flow of changes, which maximizes utilization of the different parts of our delivery pipeline — development, testing, UAT. We achieve this by using a set of more granular feature flags that control distinct parts of a larger change. We can still release all of these parts together by making coordinated changes to this set of flags.
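A minimal sketch of this pattern, assuming a hypothetical per-environment flag store and three hypothetical granular flags that together make up the UI refresh. A tester can enable one finished part in the test environment on its own, while a single helper turns on the whole set for a coordinated production release. A real flag management system would provide its own tooling for this; nothing here reflects a particular product’s API.

```python
from typing import Dict, List

# Hypothetical granular flags that together make up the UI refresh.
UI_REFRESH_FLAGS: List[str] = [
    "ui-refresh-home",
    "ui-refresh-search",
    "ui-refresh-account",
]

# Per-environment flag state: environment -> {flag name: on/off}.
flag_state: Dict[str, Dict[str, bool]] = {
    "test": {name: False for name in UI_REFRESH_FLAGS},
    "production": {name: False for name in UI_REFRESH_FLAGS},
}


def set_flag(env: str, flag: str, on: bool) -> None:
    flag_state[env][flag] = on


def release_together(env: str, flags: List[str]) -> None:
    """Coordinated release: turn on every flag in the set together."""
    for flag in flags:
        set_flag(env, flag, True)


# A tester validates the finished home page on its own; the
# still-buggy search change stays off.
set_flag("test", "ui-refresh-home", True)
assert flag_state["test"]["ui-refresh-search"] is False

# Later, production gets the whole refresh in one coordinated step,
# so users never see a mix of old and new pages.
release_together("production", UI_REFRESH_FLAGS)
assert all(flag_state["production"].values())
```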
Avoid Broad Flags which Dilute the Value of Feature Flagging
To summarize our observations so far, approaches like standing flags and feature-set flags aim to reduce the number of flags under management by creating fewer flags. However, we’ve seen that doing this takes away many of the benefits that feature flagging can provide — we lose the ability to manage feature release at runtime, and we risk returning to large, lumpy batches of changes that run counter to Continuous Delivery principles.
Strive for Feature Flag Flow
There’s a better way to control the number of feature flags under management. Instead of creating fewer flags, we focus on shortening each flag’s lifecycle; in other words, we get better at removing flags.
Teams should aim for a steady stream of feature flags, with flags being removed at the same rate that they are being added. To achieve this, teams need to make it as easy as possible to remove flags and set aside time to do so. Teams can also keep themselves honest by tracking how many active flags they have under management and perhaps placing a WIP limit (work-in-progress limit) on those flags. Once that limit is reached, a team can’t create a new flag until it has removed an existing one.
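One way a team might track active flags and enforce a WIP limit in its own tooling is sketched below. FlagRegistry, FlagLimitReached, and the limit of two are hypothetical choices for illustration, not any flag product’s API. The registry also reports the longest-lived flags, which are natural candidates for removal.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Tuple


class FlagLimitReached(Exception):
    """Raised when creating a flag would exceed the team's WIP limit."""


@dataclass
class FlagRegistry:
    wip_limit: int
    active: Dict[str, date] = field(default_factory=dict)  # flag -> creation date

    def create(self, name: str, created: date) -> None:
        if name in self.active:
            raise ValueError(f"flag {name!r} already exists")
        if len(self.active) >= self.wip_limit:
            raise FlagLimitReached(
                f"{len(self.active)} active flags; retire one before adding {name!r}"
            )
        self.active[name] = created

    def retire(self, name: str) -> None:
        # Call once the flag's checks have been deleted from the code.
        del self.active[name]

    def oldest(self, today: date, n: int = 3) -> List[Tuple[str, int]]:
        """Return up to n flags as (name, age in days), oldest first."""
        ages = [(name, (today - d).days) for name, d in self.active.items()]
        return sorted(ages, key=lambda item: item[1], reverse=True)[:n]


registry = FlagRegistry(wip_limit=2)
registry.create("new-search", date(2024, 1, 10))
registry.create("new-checkout", date(2024, 2, 1))

try:
    registry.create("dark-mode", date(2024, 3, 1))
except FlagLimitReached as err:
    print(err)  # the limit blocks a third flag

registry.retire("new-search")                    # remove a flag first...
registry.create("dark-mode", date(2024, 3, 1))   # ...then the new one fits
print(registry.oldest(today=date(2024, 3, 11)))
```

Raising an exception at the limit is a deliberate choice in this sketch: it makes the team confront its existing flags at the moment it wants a new one, rather than letting the count drift upward.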
By focusing on feature flag flow, teams can reap the full benefits of feature flagging while also keeping the number of active flags to a manageable level.