The process of launching a feature has changed significantly over the last ten years. Back then, teams commonly tied feature launch to code release: when the release branch was merged into master and pushed to production, any new features riding on that branch launched to customers.
Feature flags changed this equation by separating code release from feature launch. By putting a feature behind a flag, product managers and engineers gained a rudimentary switch they could flip on after the code shipped. If things went south, they could flip the switch off.
At large-scale companies, typically B2C companies, this switch evolved into a ramp. Simply put, the flag could be off, on, or on for a randomly selected percentage of users. For example, a product manager could release a feature to 5% of all users, or to 10% of trial users. This evolution was a game changer: it let engineers test the scalability of the systems supporting the feature, and it let product managers turn every feature launch into an experiment by tying metrics to feature flags.
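A percentage ramp is commonly implemented by hashing each user into a stable bucket, so the same user gets the same answer on every request, and users admitted at 5% stay admitted when the ramp grows to 10%. The sketch below is a minimal illustration, not any particular vendor's implementation: the names `bucket` and `is_enabled`, the flag name `"new-checkout"`, and the choice of SHA-256 over 10,000 buckets are all assumptions.

```python
import hashlib

NUM_BUCKETS = 10_000  # assumption: 0.01% ramp granularity


def bucket(user_id: str, flag_name: str) -> int:
    """Map a user to a stable bucket in [0, NUM_BUCKETS) for one flag."""
    # Salting with the flag name keeps assignments independent across flags.
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def is_enabled(user_id: str, flag_name: str, ramp_percent: float) -> bool:
    """True if the user falls inside the current ramp percentage (0 = off, 100 = on)."""
    if ramp_percent <= 0:
        return False
    if ramp_percent >= 100:
        return True
    threshold = round(ramp_percent * NUM_BUCKETS / 100)
    # Raising ramp_percent only raises the threshold, so exposure is monotonic.
    return bucket(user_id, flag_name) < threshold


print(is_enabled("user-42", "new-checkout", 5))
```

Because the threshold only grows as the ramp advances, nobody who saw the feature at one step loses it at the next. Targeting a segment such as trial users would add an eligibility filter before the percentage check.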
However, this capability has also led to confusion. Many teams struggle with two questions: how many steps should the ramp have, and how long should we spend on each step? For instance, should a feature launch go from 1% to 5% to 10% to 20% to 50% to 100%? Or should it go 10%-50%-100%? Taking too many steps, or lingering too long at any one step, slows down innovation. Taking big jumps, or not spending enough time at each step, can lead to suboptimal outcomes, such as exposing many users to a bug or making a launch decision on too little data.
The experimentation team at LinkedIn has proposed a useful framework for answering this question. As an aside, in the rest of this article I use the terms feature and experiment interchangeably.
5 Launch Phases
Dogfooding Phase – The first phase of the ramp is dogfooding the feature in production with internal employees. Typical goals are detecting integration bugs, getting design feedback from colleagues, getting QA certification in production, or training sales and support teammates on the new feature. The goal at this step is not to detect performance problems or to measure the feature's impact, so this step can be quick: a couple of days at most. As an aside, I added this phase; it is not part of the original LinkedIn framework.
Debugging Phase – The second phase of the ramp aims to reduce the risk of obvious bugs or a bad user experience. If there is a UI component, does it render correctly? Can the system handle the load the feature adds? The goal of this phase is not to decide whether the feature improved the user experience, so there is no need to wait here for statistical significance. Ideally, a few quick ramps (for example, to 1%, 5%, and 10% of users), each lasting about a day, should be sufficient for debugging.
Maximum Power Ramp (MPR) Phase – Once we are confident the feature is not risky, the goal shifts to decision making: is the feature positively impacting the metrics it was designed to improve? The ideal next ramp step is a 50/50 ramp, in which 50% of users see the feature and 50% do not. From an experimentation perspective, a 50/50 ramp is the fastest way to gather data on customer impact, because for a fixed amount of traffic, equal-sized treatment and control groups minimize the variance of the measured difference between them. You should spend at least a week on this step to collect data across both high- and low-traffic days.
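To see why 50/50 is the maximum-power split, assume a fixed total of N users, a fraction p of them in treatment, and the same per-user metric variance σ² in both groups. The variance of the difference in group means is then σ²/N · (1/p + 1/(1 - p)), which is smallest at p = 0.5. The sketch below, with hypothetical helper names, turns that formula into how much longer a lopsided split would need to run, at constant daily traffic, to match the precision of a 50/50 split.

```python
import math


def se_factor(p: float) -> float:
    """Std. error of (treatment mean - control mean), in units of sigma / sqrt(N).

    Assumes equal per-user variance in both groups and a fixed total N.
    """
    if not 0 < p < 1:
        raise ValueError("treatment fraction must be strictly between 0 and 1")
    return math.sqrt(1 / p + 1 / (1 - p))


def relative_duration(p: float) -> float:
    """How many times longer a p-split must run than a 50/50 split
    to reach the same standard error, assuming constant traffic."""
    return (se_factor(p) / se_factor(0.5)) ** 2


for p in (0.01, 0.05, 0.10, 0.25, 0.50):
    print(f"{p:>5.0%} in treatment -> {relative_duration(p):5.2f}x the 50/50 duration")
```

For example, a 10/90 split needs 25/9, roughly 2.8 times as long, and a 1/99 split about 25 times as long. This is why the debugging ramps are kept short and not used for decisions.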
Scalability Phase – The MPR phase tells us whether the feature succeeded. If it did, we can ramp directly to 100% of users. However, at any non-trivial user scale, there may be concerns about whether your system can handle the feature for 100% of users. To resolve these operational scalability concerns, you can optionally ramp to 75% of users and stay there for one day of peak traffic to be confident your system will not topple over.
Learning Phase – The feature or experiment may be successful, but you may still want to understand its long-term impact on users. For instance, if you are dealing with ads, does the new feature lead to long-term ad blindness? You can address these ‘learning’ concerns by keeping a hold-out set of 5% of users who do not get the feature for a long period, at least a month. This hold-out set lets you measure long-term impact, which is useful in some cases. The key is to have clear learning objectives rather than keeping a hold-out set for the hold-out's sake.
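A long-term hold-out has to be stable: the same 5% of users must stay excluded for the whole month or more. One way to get that, sketched below under stated assumptions, is to hash users with a dedicated, fixed salt into 100 buckets and exclude the first five; the long-term lift is then the relative difference in a metric's mean between exposed and held-out users. The names `in_holdout`, `feature_on`, `long_term_lift`, and the salt `"ads-redesign-holdout"` are hypothetical, and the lift here is a bare comparison of means with no significance test.

```python
import hashlib
from statistics import mean

HOLDOUT_PERCENT = 5  # the 5% long-term hold-out from the text
HOLDOUT_SALT = "ads-redesign-holdout"  # hypothetical; must not change during the hold-out


def in_holdout(user_id: str, salt: str = HOLDOUT_SALT) -> bool:
    """Stable membership: a user always lands in the same 1-of-100 bucket."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100 < HOLDOUT_PERCENT


def feature_on(user_id: str) -> bool:
    """After the launch decision, everyone outside the hold-out gets the feature."""
    return not in_holdout(user_id)


def long_term_lift(metric_by_user: dict[str, float]) -> float:
    """Relative difference in mean metric: exposed users vs. the hold-out."""
    exposed = [v for u, v in metric_by_user.items() if feature_on(u)]
    held_out = [v for u, v in metric_by_user.items() if in_holdout(u)]
    if not exposed or not held_out:
        raise ValueError("need users in both the exposed group and the hold-out")
    return mean(exposed) / mean(held_out) - 1
```

A positive lift after a month suggests the effect persisted; a lift that shrinks over time may point to novelty effects or, in the ads example, growing ad blindness.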
Getting to Lift Off
Effectively launching a feature requires a series of steps, each with a specific objective. The dogfooding phase is for internal feedback, the debugging and scalability phases are for risk mitigation, and the MPR and learning phases are for speeding up learning and decision making.
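Putting the phases together, a launch plan can be written down as data and sanity-checked before the ramp starts. The sketch below uses a hypothetical `RampStep` type and a hypothetical `schedule` helper; the percentages and durations are one reading of the guidance above (dogfooding's couple of days is an upper bound, the MPR week and learning month are minimums, and the 75% scalability step is optional), not fixed rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RampStep:
    phase: str
    percent_exposed: float  # share of external users who see the feature (0 = employees only)
    days: int               # planned time at this step
    objective: str


# Hypothetical plan; tune percentages and durations for each feature.
RAMP_PLAN = [
    RampStep("Dogfooding", 0, 2, "internal feedback"),
    RampStep("Debugging", 1, 1, "risk mitigation"),
    RampStep("Debugging", 5, 1, "risk mitigation"),
    RampStep("Debugging", 10, 1, "risk mitigation"),
    RampStep("MPR", 50, 7, "decision making"),
    RampStep("Scalability", 75, 1, "risk mitigation"),  # optional step
    RampStep("Learning", 95, 30, "learning"),  # 5% long-term hold-out
]


def schedule(plan: list[RampStep]) -> list[tuple[int, RampStep]]:
    """Return (start_day, step) pairs, rejecting plans whose exposure ever decreases."""
    rows, start = [], 0
    for i, step in enumerate(plan):
        if i > 0 and step.percent_exposed < plan[i - 1].percent_exposed:
            raise ValueError(f"{step.phase} ramps down from {plan[i - 1].percent_exposed}%")
        rows.append((start, step))
        start += step.days
    return rows


for start, step in schedule(RAMP_PLAN):
    print(f"day {start:>2}: {step.phase:<11} {step.percent_exposed:>3.0f}%  {step.objective}")
```

With these example numbers, the decision-making 50/50 step starts on day 5, and the whole plan, including the month-long hold-out, spans 43 days. Dropping the optional scalability step shortens it by one day.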
I hope this structure is useful in your next feature launch.