The Power of Controlled Rollouts: Software Development Lessons from the Samsung Galaxy Note 7 Recall

October 21, 2016

Samsung recently pulled the plug on its Galaxy Note 7 phones after failing to fix the random combustion problem plaguing them. On a recent flight I took, an announcement asked us to power down these devices for the safety of the flight. This week it’s even more official: it’s now a crime to bring one on an airplane, even turned off and in checked luggage. For Samsung, this is a painful blow to revenue and customer trust.

Delivering great hardware has always been difficult. Once a unit has shipped, it cannot be fixed remotely. As software engineers at cloud companies, we have it a bit easier: we can improve or fix our products whenever we want, usually without user interaction. Still, there are some lessons we can draw from the Samsung example to help us improve software delivery.

#1 Use Controlled Rollouts to test features and quickly remediate issues.

The delay between the first reports of trouble and the device manufacturer’s response fueled confusion and rumor, making it difficult to take stock of the true impact. It also meant that phones continued to be sold, exposing more customers to the issue. The lesson: getting in front of problems before more customers experience them is incredibly important.

Similarly, it’s better to discover a failure in new code early, by exposing it to a small, targeted percentage of users rather than risking them all. This concept of a gradually ramping release is called ‘Controlled Rollouts’ (CR).
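One common way to get a gradually ramping release is to hash each user into a stable bucket and compare that bucket with the current rollout percentage. The sketch below is a minimal illustration under that assumption, not a description of any particular product; the names `bucket`, `is_enabled`, and the feature name `new-checkout` are hypothetical.

```python
import hashlib


def bucket(user_id: str, feature: str) -> int:
    """Map a user to a stable bucket in [0, 100) for a given feature.

    Hashing the feature name together with the user ID means the same user
    can land in different buckets for different features.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest[:8], 16) % 100


def is_enabled(user_id: str, feature: str, rollout_percent: int) -> bool:
    """Return True if this user falls inside the current rollout percentage."""
    return bucket(user_id, feature) < rollout_percent


# Hypothetical usage: count how many of 1,000 made-up users see the new code.
users = [f"user-{i}" for i in range(1000)]
exposed = sum(is_enabled(u, "new-checkout", 10) for u in users)
print(f"{exposed} of {len(users)} users see the new code at 10%")
```

Because the bucket is deterministic, a user who sees the feature at 5% still sees it when the rollout is raised to 20%, so ramping up only ever adds users.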

There are two popular ways of achieving CR: canaries and CR platforms.

A canary is one or more dedicated machines in your production cluster on which you can deploy new code. Using a load balancer, you can serve a percentage of production traffic with the new code and the remainder with the old. Canaries are easy to set up with most cloud providers, but they cannot target particular users with a feature: the load balancer simply delivers new code to a portion of traffic, regardless of what type of user is behind each request.

CR platforms, like Split, let you deploy new code to an entire production cluster and be much more granular in targeting customers with the change: 20% of ‘trial’ customers based in Los Angeles, for instance. Anyone on the team, from PMs to SREs, can target a group of individuals, a segment, or a percentage of customers when releasing new features. Another distinction: CR platforms live within the app, post-deployment, and unlike canaries do not use load balancers.
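To make the difference from a canary concrete, here is a rough sketch of attribute-based targeting evaluated inside the application, using the "20% of trial customers in Los Angeles" example. It is an illustration only and does not reflect Split's actual API; `TargetingRule`, `treatment`, and the attribute names `plan` and `city` are assumptions.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class TargetingRule:
    """Hypothetical rule: serve a feature to a percentage of matching users."""
    feature: str
    percent: int
    required: dict = field(default_factory=dict)  # attribute name -> required value

    def matches(self, attributes: dict) -> bool:
        # Every required attribute must be present with the expected value.
        return all(attributes.get(k) == v for k, v in self.required.items())


def _bucket(user_id: str, feature: str) -> int:
    """Stable bucket in [0, 100) derived from the feature and user ID."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest[:8], 16) % 100


def treatment(rule: TargetingRule, user_id: str, attributes: dict) -> str:
    """Return "on" only for users who match the rule and fall in the percentage."""
    if rule.matches(attributes) and _bucket(user_id, rule.feature) < rule.percent:
        return "on"
    return "off"


# Hypothetical usage: 20% of trial customers based in Los Angeles.
rule = TargetingRule(
    feature="new-dashboard",
    percent=20,
    required={"plan": "trial", "city": "Los Angeles"},
)
print(treatment(rule, "user-42", {"plan": "trial", "city": "Los Angeles"}))
print(treatment(rule, "user-42", {"plan": "enterprise", "city": "Los Angeles"}))
```

The decision is made per user from attributes the application already knows, which is something a load balancer splitting raw traffic cannot do.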

Samsung could never use CR to fix a hardware problem (though if the battery fires turn out to be software-induced, that might be another story). Cloud software companies, however, should never roll out software to all of their customers at once.

#2 Kill features, not products.

Emergency fixes, like the ones Samsung attempted, are high-stress situations in which engineers can’t take stock of the bigger picture. The result is a fix that works most of the time but occasionally compounds problems, snowballing until the entire product has to be killed, as happened with the Note 7.

In software, good monitoring can help you pinpoint the exact change causing a failure. A good CR platform adds insight into which features launched and how users experienced them, so you can make these correlations. Instead of attempting an emergency fix, stabilize the system by rolling back (that is, killing) that change. In CR, this is equivalent to dialing the code down to 0% of production traffic. Without CR, you can achieve this by doing a code rollback.
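Dialing a change down to 0% can be sketched as a small in-memory store of rollout percentages with a `kill` operation. This is a toy model under stated assumptions: `RolloutStore` and its methods are hypothetical, and a real system would persist the setting and propagate it to every running instance.

```python
import hashlib


class RolloutStore:
    """Hypothetical in-memory store of per-feature rollout percentages."""

    def __init__(self) -> None:
        self._percent: dict[str, int] = {}

    def set_percent(self, feature: str, percent: int) -> None:
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self._percent[feature] = percent

    def kill(self, feature: str) -> None:
        """Kill the feature by dialing it down to 0% of traffic."""
        self.set_percent(feature, 0)

    def is_enabled(self, feature: str, user_id: str) -> bool:
        # Unknown features default to 0%, i.e. off for everyone.
        percent = self._percent.get(feature, 0)
        digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
        return int(digest[:8], 16) % 100 < percent


# Hypothetical usage: ramp to 50%, see errors in monitoring, then kill the feature.
store = RolloutStore()
store.set_percent("new-search", 50)
store.kill("new-search")
print(store.is_enabled("new-search", "user-7"))
```

The old code path stays deployed the whole time, so killing the feature is a configuration change rather than a redeploy.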

#3 Identifying who has a problem is just as important as knowing why.

Failures happen; that’s a fact of software. But knowing who experienced the failure can be an early insight into why the failure happened.

For something like a phone, that can be very hard to figure out based on over-the-counter hardware sales. In software development, we don’t have to suffer from that problem, though many of us do. Logging who saw a new feature isn’t always treated as a priority when you’re rushing it into delivery, and many new features are often bundled together before they ship, making it difficult to quickly pinpoint which one might be the problem.

The ability to release discrete features to targeted groups of users means their experience, good or bad, can be tied directly to a unique treatment. Using a CR platform can also help you uniquely log a feature impression for each user experiencing the new feature, so you can correlate experiences with problems at the user level, using the analytics products of your choice.
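As a rough illustration of tying experiences to treatments, the sketch below records one impression per user and then compares failure rates across treatments. The record fields and the functions `record_impression` and `failure_rate_by_treatment` are assumptions made for this example; a real setup would send impressions to whatever analytics tool you already use.

```python
import time
from collections import Counter


def record_impression(sink: list, user_id: str, feature: str, treatment: str) -> None:
    """Append a hypothetical impression record: who saw which treatment, and when."""
    sink.append({
        "user_id": user_id,
        "feature": feature,
        "treatment": treatment,
        "timestamp": time.time(),
    })


def failure_rate_by_treatment(impressions: list, failed_user_ids: set) -> dict:
    """Share of impressions per treatment whose user also reported a failure.

    Assumes one impression per user, which keeps the arithmetic simple.
    """
    seen: Counter = Counter()
    failed: Counter = Counter()
    for imp in impressions:
        seen[imp["treatment"]] += 1
        if imp["user_id"] in failed_user_ids:
            failed[imp["treatment"]] += 1
    return {t: failed[t] / seen[t] for t in seen}


# Hypothetical usage with made-up users and failures.
log: list = []
record_impression(log, "alice", "new-checkout", "on")
record_impression(log, "bob", "new-checkout", "on")
record_impression(log, "carol", "new-checkout", "off")
print(failure_rate_by_treatment(log, failed_user_ids={"alice"}))
```

A clearly higher failure rate for the "on" treatment than for "off" points at the new feature as the likely cause, and the impression log says exactly which users were affected.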

Take advantage of the safety measures available to software developers.

Degrading a user’s experience can leave an unforgettable blemish on your product, driving away prospects and customers and generating bad press and word-of-mouth for your brand. Adopting a CR approach to feature release makes it much easier (and faster) to prevent these problems or to solve them when they do arise. This understanding is exactly why we built Split as a platform for controlled rollouts. If you’d like to try it yourself, you can do so for free, and if you have any questions just drop us a line.

