A Culture of Experimentation in 4 Stages 

This article is based on the Flagship 2022 keynote, “Accelerating Software Innovation with A/B Testing,” by Ronny Kohavi. Ronny is a best-selling author and a former executive at Airbnb, Microsoft, and Amazon. 

————————————————————————————–

“It is difficult to get a man to understand something when his salary depends upon his not understanding it.” – Upton Sinclair    

Everybody focuses on technology, but not on controlled experimentation or A/B testing.

Some believe it threatens their job as decision-makers. They want to make the decision and say it’s great. For some, failures may hurt their image and professional standing. It’s easier to declare success when the feature launches.

Why do people avoid controlled experimentation? We’ve heard it all before: 

  • “We know what to do. It’s in our DNA.”
  • “Why don’t we just do the right thing?”
  • “You don’t need to test gravity.” 
  • “We don’t see the value in testing something we know we will ship.”

At Microsoft, program managers (PMs) typically select the next set of features to develop. They propose ideas, which are then implemented and tested. In the early days, there was a new release of Office every three years. The PMs now have to propose alternative ideas. 

The data will come from building these minimum viable products. Many ideas are not going to be as good as expected, and people aren’t going to be as sure about the new features. This humility is healthy.  

Editors and designers tell me they get paid to select a great design. The data shows that even the best learn much when exposed to a controlled experiment. They can adjust their intuitions. Failures may hurt, and it’s always easier to declare success when the feature is implemented and launched. However, the reality is we care about the users.  

Hubris

1. We don’t want to test it. It’s in our DNA. We know. Why don’t we just do the right thing? We’ve heard, “You don’t need to test gravity.” People are so sure about something that they don’t even want to test it. Yet testing those radical ideas will actually teach you something. 

2. We don’t see the value in testing something because we will ship it anyway. This is sometimes true when you have to be legally compliant and you have to ship something. However, testing four versions of something that’s legally compliant may show you that one version is better than the others.

3. We know what to do and we’re sure of it. It’s gravity. We don’t need to test it. Here’s a true story from 1854. John Snow claimed cholera was caused by polluted water. A landlord owned apartment houses where the tenants were complaining that the water stank. It didn’t just stink; cholera was frequent among the tenants. The landlord came in and said, “What is this crap that you’re telling me? No need to be concerned.” He drank a glass of water in front of the tenants to show them there was nothing wrong with it.

He died three days later. That’s hubris. Even if you’re sure about the ideas, test them. It will help you adjust your intuition and make things better down the road.  

Insight Through Measurement and Control   

A doctor worked at an important teaching and research hospital in the 1830s and ’40s. At that time, there was a disease most of you have not heard of called childbed fever. It killed more than a million women giving birth.  

This professor, a very important doctor, made an amazing observation: The mortality rate for women in his ward was 15 percent. About one in seven women died giving birth at the ward staffed by doctors and students. But at the other ward, attended by midwives, the rate was 2 percent.  

He tried to understand what was happening and control the differences. Birthing positions, ventilation, diet, nothing worked. He even checked if the laundry was different. Were they getting the hard cases? Nothing could explain what was happening with this massive difference in death rate. He went on sabbatical, a perk of teaching. He later returned to find the students were all excited. “Hey, the death rate significantly dropped when you were gone.” 

It fell so much that he began to think this was related to him. And then he had an insight. Doctors performed autopsies each morning on the cadavers of women who had died the day before. He conjectured that they were transmitting particles (today we call them germs) to healthy patients. All of this was happening at the hands of the physicians.  

Semmelweis Reflex

He started experiments with cleansing agents. He came up with chlorine and lime as something effective. The death rate fell from 18 to 1 percent. Then we get to the third stage. Is it a success? It should be. But no, there’s disbelief. Where are these particles? We’re scientific. How come we don’t see them? He was fired from his post at the hospital. 

He went to Hungary, where he was originally from. While in Hungary, he reduced the mortality rate to less than a percent. His students published a paper about the success of washing your hands with chlorine and lime. 

An editor wrote, “We believe this chlorine washing theory has long outlived its usefulness. It is time this theory no longer deceives us.” 

He felt responsible for the deaths of many women. He was not able to convince people. He suffered a nervous breakdown and was beaten at a mental hospital, where he died. 

Today we have a name for this: the Semmelweis reflex. It’s a reflex-like rejection of new knowledge because it contradicts entrenched norms, beliefs, or paradigms. Is that true only for the 1800s? Turns out it’s not.  

In 2005, a study showed that inadequate hand washing is one of the prime contributors to 2 million healthcare-associated infections and 90,000 related deaths annually in the United States alone. It’s still hard for people to grasp the importance of washing your hands.  

Fundamental Understanding

The last stage is fundamental understanding. You build a theory and can explain what’s going on. 

In 1879, Louis Pasteur showed the presence of streptococcus in the blood of women with childbed fever. It took 143 years after Semmelweis’s death, but in 2008, a 50-euro coin commemorating him was issued.  

This is our model: You start with hubris and think you know everything. You start to measure and control. You realize it’s a little humbling. You come up with new insights. They’re going to be hard on your intuition. Sometimes you’ll see the Semmelweis reflex kicking in. Hopefully, over time you’ll believe the data and get to fundamental understanding.  

Controlled experimentation is the way to help. It helps with doing the right thing. You launch what works even if you don’t understand why it is working. Over time, you start developing the underlying theories as you get more data and more experiments to work with.  
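To make the idea concrete, here is a minimal sketch of how the result of a two-variant controlled experiment might be read. It is not taken from the keynote: the function name and the counts are hypothetical, and it assumes a simple two-proportion z-test on conversion counts, which is only one of several reasonable ways to analyze an A/B test.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (absolute lift, z statistic, two-sided p-value) for B versus A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value

# Hypothetical counts for illustration only, not data from the talk.
lift, z, p = two_proportion_z_test(conv_a=500, n_a=10_000, conv_b=560, n_b=10_000)
print(f"lift={lift:.4f} z={z:.2f} p={p:.4f}")
```

The test says nothing about why one variant performs better. It only tells you whether the observed difference is larger than chance alone would plausibly produce, which is the point made above: you can act on the result first and build the theory later.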

So let me summarize here: The less the data, the stronger the opinions.  

Think about your OEC, the Overall Evaluation Criterion. What are you optimizing for? Make sure the organization agrees on what you’re optimizing for. It’s an important cultural first step. Are the executives aligned? They should be able to say: this is the set of metrics we will optimize for, and they are driver metrics for our long-term value.  
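One simple way to picture an OEC is as a single score built from a few agreed driver metrics. The sketch below assumes a weighted sum of relative metric changes. The metric names, the weights, and the weighted-sum form are all hypothetical choices made for illustration, not the formula used in the keynote or at any particular company.

```python
def oec_score(relative_changes, weights):
    """Combine per-metric relative changes (treatment vs. control) into one score."""
    if set(relative_changes) != set(weights):
        raise ValueError("every metric needs exactly one weight")
    return sum(weights[name] * relative_changes[name] for name in weights)

# Hypothetical driver metrics and weights, agreed on before any experiment runs.
weights = {"sessions_per_user": 0.5, "task_success_rate": 0.3, "revenue_per_user": 0.2}
# Hypothetical relative changes from one experiment (0.02 means +2 percent).
changes = {"sessions_per_user": 0.02, "task_success_rate": -0.01, "revenue_per_user": 0.03}

score = oec_score(changes, weights)
print(f"OEC score: {score:+.4f}")
```

Writing the weights down before experiments run is the useful part: it forces the alignment conversation to happen up front, so a mixed result is not argued about metric by metric after the fact.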

Listen to Your Customers, Get the Data

We talk a lot about optimizing for lifetime customer value. Realize it’s hard to assess the values in advance. Listen to your customers and get the data. There’s nothing better than deploying something in the real world and seeing how it performs. Prepare to be humbled because data trumps intuition.  

Getting numbers is easy. Getting numbers you can trust is harder.  

Experiment often. 

There’s no reason not to experiment. To succeed, you’re going to have to triple your experimentation rate and fail fast and often. Accelerate innovation by lowering the cost of experimentation.