As companies adopt more modern delivery practices, they typically run into the faults of relying on an imperfect environment like staging. Staging environments are expensive, and they often do not match the behavior of production, which leads to faulty test results. Passing tests in staging also give you little confidence that your features will work once they launch. The only way to know that your features are working in production is to test them in production. Testing in production increases developer productivity and gives you higher confidence in each feature release. It allows you to be proactive rather than reactive, and it gives your users a much better experience.
Stop Testing in Staging
Staging environments are expensive to maintain, and it’s nearly impossible to find a staging environment that perfectly matches production and behaves exactly the same way. Because the data in staging differs from the data in production, test results can (and usually do) differ. This makes your test results in staging less reliable, and tests often run much slower in staging as well. Something I see a lot: a tester goes into the staging environment, automates a flow, and builds up confidence that the feature is working. Then, once the feature is released to production, bugs appear because of the differences between the two environments. Check out this letter I wrote to break up with staging.
Bottom line: No one cares if your features are working in staging.
4 Steps for Safely Testing in Production
Testing in production, also known as “TiP” or “testing in prod,” can be risky when not done safely. In order to safely test your features in production, follow these steps.
1. Install the Right Tools
The first tool you will need is a feature flagging tool. Feature flagging is a way to decide who sees which features. It’s used to hide, enable, or disable features at runtime. As a developer, you deploy your features behind a feature flag and target your internal team, so that only your team can see and interact with the new feature. If there are bugs or things that need to be worked out, your end users will not see them. Once you finish testing and you are confident that the feature works in production, turn on the feature flag so that your entire user base can see and interact with the feature.
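To make the targeting idea concrete, here is a minimal sketch of a feature flag kept in memory. It is not how any particular feature flagging product works; the `FeatureFlag` class, the flag name, and the user IDs are all hypothetical and only illustrate "on for my team, off for everyone else".

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class FeatureFlag:
    """Hypothetical in-memory flag: on for everyone, or only for targeted users."""
    name: str
    enabled_for_all: bool = False
    targeted_users: Set[str] = field(default_factory=set)

    def is_on(self, user_id: str) -> bool:
        # Targeted users see the feature even while it is off for everyone else.
        return self.enabled_for_all or user_id in self.targeted_users


# Assumed flag name and internal user ID, for illustration only.
new_checkout = FeatureFlag("new_checkout", targeted_users={"alice@internal"})


def render_checkout(user_id: str) -> str:
    """Pick the code path at runtime based on the flag."""
    return "new checkout" if new_checkout.is_on(user_id) else "old checkout"
```

In this sketch, releasing the feature is a single change, `new_checkout.enabled_for_all = True`, with no new deployment.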
The second tool you need is an automation framework. This is a necessity when testing in production because manually testing every feature during a release does not scale. When you create your feature flags and target your teammates, you should also target your automation bots. Then, in your test, use the bot that you targeted in that feature flag to run the test. The tests can keep running while the feature flag is still off for everyone else, and they will catch any bugs that creep in. What’s great about this process is that you don’t need to change your automation scripts after you release your feature.
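As a rough sketch of why the script does not change after release, assume the flag state is a simple dictionary and the bot has the hypothetical ID `qa-bot-1`. The same check passes before release (because the bot is targeted) and after release (because the flag is on for everyone).

```python
# Hypothetical flag state: who is targeted, and whether the flag is on for all users.
flag = {"on_for_all": False, "targeted": {"qa-bot-1"}}

BOT_ID = "qa-bot-1"  # assumed ID of the automation bot targeted in the flag


def sees_feature(user_id: str) -> bool:
    """True if this user gets the new feature under the current flag state."""
    return flag["on_for_all"] or user_id in flag["targeted"]


def check_bot_sees_feature() -> bool:
    """The automated check: the bot should always get the new feature."""
    return sees_feature(BOT_ID)
```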
The next tool you need is a job scheduler. The job scheduler should run your tests for you at regular intervals or, preferably, as part of your build pipeline.
Lastly, you’ll need an alerting tool that can be integrated with your job scheduler. If a test fails, you should be alerted immediately so that you can fix it.
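The hand-off between the scheduler and the alerting tool can be sketched as a small runner. This is only an illustration: `run_suite` and the `alert` callback are hypothetical names, and in practice the scheduler (a cron job or a pipeline step, for example) would call something like this and the callback would post to whatever alerting tool you use.

```python
from typing import Callable, Dict, List


def run_suite(tests: Dict[str, Callable[[], None]],
              alert: Callable[[str], None]) -> List[str]:
    """Run every test, alert on each failure, and return the failed test names."""
    failures = []
    for name, test in tests.items():
        try:
            test()
        except Exception as exc:
            # Alert immediately instead of waiting for the whole suite to finish.
            failures.append(name)
            alert(f"{name} failed: {exc}")
    return failures


def checkout_loads() -> None:
    pass  # stand-in for a passing production test


def coupon_applies() -> None:
    raise AssertionError("coupon field missing")  # stand-in for a failing test


sent_alerts: List[str] = []
failed = run_suite(
    {"checkout_loads": checkout_loads, "coupon_applies": coupon_applies},
    alert=sent_alerts.append,
)
```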
2. Carefully Create Your Test Data
To create and manipulate test data in production without affecting real end users or skewing your data and analytics, I recommend marking test data with a boolean. You can have something like is_test = true for all test users. Then, in your data and analytics platform, you can exclude those test users’ actions or move them to a different list. You also want to make sure that your test entities only interact with each other, and you can hard code this. For example, your test user can go to a test web page and interact with a test entity. Because you know these components ahead of time, you can mark them as test entities in the back end and hard code them in your automation scripts, so that the test user goes to the test web page, and so on.
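Here is a minimal sketch of both ideas: filtering test activity out of analytics and keeping test entities apart from real ones. Only the `is_test` field comes from the text above; the user records, event shape, and function names are assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical user records; only the is_test field comes from the article.
USERS: Dict[str, dict] = {
    "customer-1": {"is_test": False},
    "qa-bot-1": {"is_test": True},
}


def real_events(events: List[dict]) -> List[dict]:
    """Drop events produced by test users before they reach analytics."""
    return [e for e in events if not USERS[e["user_id"]]["is_test"]]


def may_interact(actor_is_test: bool, target_is_test: bool) -> bool:
    """Test entities may only touch test entities, and real ones only real ones."""
    return actor_is_test == target_is_test


events = [
    {"user_id": "customer-1", "action": "purchase"},
    {"user_id": "qa-bot-1", "action": "purchase"},
]
```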
3. Testing in Production Begins with Writing Tests
One of the most important parts of writing good tests is having accurate setup and teardown. In the setup step, make sure you start from a clean canvas: clear all cookies, log in with the correct test user, and so on. In the teardown step, undo any actions that were performed in the test. Set up your alerts so that they fire not only when a test fails, but when any step of this process fails. If your teardown is failing, tests that run in production are not being cleaned up, and leftover test data will cause problems.
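One way to sketch this is a small wrapper that always runs teardown and raises an alert for whichever step fails. The function and callback names are hypothetical; most automation frameworks offer their own setup and teardown hooks that would take the place of this wrapper.

```python
from typing import Callable, List


def run_with_cleanup(setup: Callable[[], None],
                     test: Callable[[], None],
                     teardown: Callable[[], None],
                     alert: Callable[[str], None]) -> bool:
    """Run setup, test, and teardown; alert on any failing step; return overall success."""
    try:
        setup()
    except Exception as exc:
        alert(f"setup failed: {exc}")
        return False  # nothing was created, so there is nothing to tear down

    passed = True
    try:
        test()
    except Exception as exc:
        passed = False
        alert(f"test failed: {exc}")
    finally:
        # Teardown runs even when the test fails, so production is left clean.
        try:
            teardown()
        except Exception as exc:
            passed = False
            alert(f"teardown failed: {exc}")
    return passed


def failing_teardown() -> None:
    raise RuntimeError("could not delete test order")


alerts: List[str] = []
ok = run_with_cleanup(lambda: None, lambda: None, failing_teardown, alerts.append)
```

Note that a run whose test passes but whose teardown fails is still reported as a failure, since leftover test data in production is a problem in its own right.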
4. Deploy to a Production Canary
Production canaries allow you to slowly roll out a change to a small subset of users before rolling it out to the entire population. This mitigates risk in case something does go wrong. Let’s say you implement your feature flags and test in production with the feature flag off, but for some reason you miss a bug that is now released for your users to see. Would you rather 100% of your users encounter that issue, or 1%? A production canary limits the impact to that small subset, so you can fix the issue in a controlled way before it reaches everyone.
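A common way to pick that small subset is to hash each user ID into a bucket and compare the bucket against the rollout percentage. The sketch below assumes this hashing approach and a hypothetical `in_canary` function; it is not a description of how any specific canary or feature flagging tool assigns users.

```python
import hashlib


def in_canary(user_id: str, percent: float) -> bool:
    """Deterministically place a user in the canary group for a given percentage (0-100)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 10000  # stable bucket in the range 0..9999
    return bucket < percent * 100         # e.g. 1% covers buckets 0..99


# The same user always gets the same answer, so their experience does not flip
# between requests, and raising the percentage only ever adds users.
canary_users = [f"user-{i}" for i in range(20000) if in_canary(f"user-{i}", 1)]
```

Because the bucket depends only on the user ID, moving from 1% to 10% keeps every user who was already in the canary group.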
Shift Your Company’s Culture to Welcome Testing in Production
Often, the first reaction people have when you bring up “testing in production” is fear. Companies don’t test in production because of fear and a lack of trust in their systems, and for the same reason they refuse to invest in the tools and process changes that would generate that trust. By using feature flags before launch and canary releases during launch, you mitigate that risk.
Here are some things I like to do when talking to managers and VPs about testing in production.
First, explain why the pros outweigh the cons. Think about your current staging environment and how reliable it is. Are there frequent issues that could have been caught if you had been testing in production? The best way to make the case is with examples from the past. Think about a time when you tested a feature inside and out, worked with your product owner to make sure all of the requirements were tested and all the bugs were sorted out, and then released the feature to production only to find issues.
Next, you’ll need to propose a path forward. You can’t just go to management with a problem and not a solution. Ask if you can take some time in the next sprint to experiment with feature flags. You can even do it for free with Split!
Learn More About Testing in Production
Excited about being able to test in production? Check out this video on Feature Flag Maintenance that goes into more detail about how to handle your feature flags in production.