The process of launching a feature has changed significantly in the last ten years. A decade ago, it was common for teams to tie feature launch to code release. When the release branch was merged into master and pushed to production, the new features riding on that branch were launched to customers.
Feature flags changed this equation by separating code release from feature launch. By putting a feature behind a flag, product managers and engineers got access to a rudimentary switch they could flip on after the code shipped. If things went south, they could flip the switch off.
At large-scale companies – typically B2C companies – this switch evolved into a ramp. Simply put, the flag could be off, on, or on for a randomly selected percentage of users. For example, a product manager could release a feature to 5% of all users, or to 10% of trial users. This evolution was a game changer: it allowed engineers to test the scalability of the systems supporting the feature, and it allowed product managers to turn every feature launch into an experiment by tying metrics to feature flags.
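As a rough illustration of how such a percentage ramp can work, the sketch below hashes a user ID together with a flag name to give each user a stable bucket from 0 to 99, then turns the flag on for buckets below the rollout percentage. The function names (`bucket`, `is_enabled`) and the choice of SHA-256 are assumptions made for this example, not the API of any particular feature flag product.

```python
import hashlib


def bucket(user_id: str, flag_name: str, buckets: int = 100) -> int:
    """Map a user to a stable bucket in [0, buckets) for a given flag.

    Hashing the flag name with the user ID means the same user can land
    in different buckets for different flags (an assumption of this sketch).
    """
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def is_enabled(user_id: str, flag_name: str, rollout_percent: int) -> bool:
    """Return True if the user falls inside the current rollout percentage."""
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    return bucket(user_id, flag_name) < rollout_percent


if __name__ == "__main__":
    # Hypothetical flag name and user IDs, for illustration only.
    users = [f"user-{i}" for i in range(10_000)]
    exposed = sum(is_enabled(u, "new-checkout", 10) for u in users)
    print(f"{exposed} of {len(users)} users see the feature at a 10% ramp")
```

Because each user's bucket never changes, raising the percentage from 5 to 10 keeps the original 5% of users in the feature and only adds new ones, so nobody flips back and forth between ramp steps.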
However, this capability has also led to confusion. Many teams struggle with the question: how many steps are required in this ramp, and how long should we spend on each step? For instance, should a feature launch go from 1% to 5% to 10% to 20% to 50% to 100%? Or should it go from 10% to 50% to 100%? Taking too many steps or taking too long at any step can slow down innovation. Taking big jumps or not spending enough time at each step can expose many users to bugs, or lead to decisions made before enough data has been collected.
The experimentation team at LinkedIn has proposed a useful framework for answering this question. As an aside, in the rest of this document, I use the terms feature and experiment interchangeably.
5 Launch Phases
Dogfooding Phase – The first phase of the ramp is dogfooding the feature in production with internal employees. The goal is to detect integration bugs, get design feedback from colleagues, get QA certification in production, or train sales or support teammates on a new feature. At this step, the goal is not to detect performance challenges or to measure the impact of the feature. So, this step can be quick: a couple of days at most. As an aside, this phase was added by me; it is not part of the original LinkedIn framework.
Debugging Phase – The second phase of the ramp is aimed at reducing the risk of obvious bugs or a bad user experience. If there is a UI component, does it render the right way? Can the system take the load of the feature? The goal of this phase is not to decide whether the feature improved the user experience, so there is no need to wait at this phase for statistical significance. Ideally, a few quick ramps – to 1%, 5%, or 10% of users – each lasting a day should be sufficient for debugging.
Maximum Power Ramp (MPR) Phase – Once we are confident the feature is not risky, the goal shifts to decision making. By decision making, we mean whether the feature is positively impacting the metrics it was designed to improve. The ideal next ramp step is a 50/50 ramp – 50% of users see the feature, 50% do not. From an experimentation or decision making perspective, a 50/50 ramp is the fastest way to gather data on customer impact. You should spend at least a week on this step of the ramp to collect data across high and low traffic days.
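To see why an even split gathers data fastest, here is a minimal sketch under simplifying assumptions that are mine, not the framework's: a fixed total number of users, a comparison of group means, and equal variance in both groups. Under those assumptions the variance of the measured difference is proportional to 1 / (p × (1 − p)), where p is the share of users who see the feature, and that quantity is smallest at p = 0.5. The function name `relative_sample_size` is hypothetical.

```python
def relative_sample_size(treatment_fraction: float) -> float:
    """Traffic needed at a given split, relative to a 50/50 split.

    Assumes a two-group comparison of means with equal variance, so the
    variance of the difference scales with 1 / (p * (1 - p)). A return
    value of 2.0 means roughly twice the users (or twice the time at
    constant traffic) for the same statistical power.
    """
    p = treatment_fraction
    if not 0.0 < p < 1.0:
        raise ValueError("treatment_fraction must be strictly between 0 and 1")
    return 1.0 / (4.0 * p * (1.0 - p))


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.10, 0.25, 0.50):
        print(f"{p:>4.0%} ramp needs {relative_sample_size(p):.1f}x the traffic of a 50/50 ramp")
```

Under these assumptions, a 10% ramp needs close to three times the traffic of a 50/50 ramp to reach the same power, which is why the small debugging ramps are a poor place to wait for a decision.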
Scalability Phase – The MPR phase tells us whether the feature was successful or not. If it was, we can ramp directly to 100% of users. However, at any non-trivial scale, there may be concerns about whether your system can handle the feature for 100% of users. To resolve these operational scalability concerns, you can optionally ramp to 75% of users and stay there for one day of peak traffic to be confident your system will not topple over.
Learning Phase – The feature or experiment may be successful, but you may want to understand its long-term impact on users. For instance, if you are dealing with ads, did the new feature lead to long-term ad blindness? You can address these ‘learning’ concerns by keeping a hold-out set of 5% of users who are not given the feature for a long period of time, at least a month. This hold-out set can be used to measure long-term impact, which is useful in some cases. The key thing here is to have clear learning objectives, rather than keeping a hold-out set for hold-out’s sake.
Getting to Lift Off
Effectively launching a feature requires a series of steps, each with a specific objective. The dogfooding phase is for internal feedback, the debugging and scalability phases are meant for risk mitigation, and the MPR and learning phases are meant to speed up learning and decision making.
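The five phases above can be written down as a simple ramp plan. The sketch below is one possible encoding, not part of the LinkedIn framework: the `RampStep` type, its field names, and the helper functions are hypothetical. The durations are my reading of the text (2 days is the upper bound for dogfooding, 7 days is the lower bound for MPR, and 30 days stands in for "at least a month"), and exposing the feature to all internal employees during dogfooding is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RampStep:
    phase: str
    audience: str  # "internal" (employees) or "external" (customers)
    percent: int   # share of that audience who see the feature
    days: int      # suggested time at this step (see assumptions above)
    goal: str


RAMP_PLAN: List[RampStep] = [
    RampStep("dogfooding", "internal", 100, 2, "integration bugs, design feedback, training"),
    RampStep("debugging", "external", 1, 1, "catch obvious bugs and bad user experience"),
    RampStep("debugging", "external", 5, 1, "catch obvious bugs and bad user experience"),
    RampStep("debugging", "external", 10, 1, "check the system can take the load"),
    RampStep("mpr", "external", 50, 7, "decide whether the feature moves its metrics"),
    RampStep("scalability", "external", 75, 1, "optional: hold through one day of peak traffic"),
    RampStep("learning", "external", 95, 30, "measure long-term impact with a 5% hold-out"),
]


def total_days(plan: List[RampStep]) -> int:
    """Sum the suggested time across all steps of the plan."""
    return sum(step.days for step in plan)


def ramps_only_upward(plan: List[RampStep]) -> bool:
    """Check that customer exposure never decreases from one step to the next."""
    external = [step.percent for step in plan if step.audience == "external"]
    return all(a <= b for a, b in zip(external, external[1:]))


if __name__ == "__main__":
    for step in RAMP_PLAN:
        print(f"{step.phase:<12} {step.audience:<9} {step.percent:>3}%  {step.days:>2}d  {step.goal}")
    print("total days:", total_days(RAMP_PLAN))
```

Writing the plan out this way makes the trade-off visible: most of the calendar time goes to the MPR and learning phases, where the goal is a decision or long-term learning, while the risk-mitigation steps take about a day each.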
Hope this structure is useful to you in your next feature launch.