Developers love feature flags, or feature toggles as they’re often called, and why wouldn’t they? Feature flags make life much easier: They address common challenges such as delivery delays, unreliable releases, and unmeasurable features. In short, they accelerate the software development process and give development teams greater control and insight over every release.
Feature flags are a powerful development tool. I’ve previously written about some of the more common uses of feature flags. In this blog, I want to focus on some more unusual uses.
1. Removing Dead Code
Dead code is any code that no longer gets executed. If your codebase is more than a year old, I can guarantee you have dead code. There are a variety of reasons why your code might be dead.
First, you might have a method somewhere that’s no longer referenced. Most modern IDEs will help identify unreferenced methods, which works well in some cases. However, if you’re using dynamic method calls, it can be really tricky to identify unused methods. In addition to unreferenced methods, it’s possible to have an entire API endpoint that’s no longer called. This is impossible to see by looking at the code alone, because the callers live outside your codebase.
It’s also possible that you have an if-statement or switch-statement that executes different blocks of code based on a variable. If this variable never has certain values, some of those blocks will never be executed.
These problems can also stack together. It’s possible, for example, that you have a method that is called only from two API endpoints. If neither endpoint is ever called, that method will also never be called.
Now that we’ve established that there’s a chance you have some dead code, how do you get rid of it?
One solution is to add a log line to every branch of your code, and to include a UUID (Universally Unique Identifier) in each log line so you can tell exactly which branches were executed. The problem is that this can generate a high volume of logs, which quickly gets expensive. Feature flags are one way to solve this problem.
If you wrap each log line in a feature flag, you can control whether it logs. With this control, you can enable logging very briefly, for a small percentage of traffic. Then you can remove the log lines that were hit, which eliminates the logs for the most commonly executed branches. Now you can turn the flag back on for longer or for a higher percentage of traffic. Repeat this process, and eventually you can leave the feature flag on for 100% of traffic and check your logs periodically for any additional usage. If you are no longer seeing new logs, the log lines remaining in your code are not getting hit. Once you’re comfortable that the flag has been on long enough, you can delete the blocks of code that still have log lines.
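To make the idea concrete, here is a minimal Python sketch of a flag-gated usage log. Everything in it is an assumption for illustration: the in-memory `FLAGS` dictionary stands in for whatever flag service you use, and `flag_is_on`, `mark_reached`, `apply_discount`, and the marker IDs are hypothetical names.

```python
import hashlib
import logging

logger = logging.getLogger("dead_code_probe")

# Hypothetical in-memory flag store; a real system would ask a flag service.
FLAGS = {"dead-code-logging": {"enabled": True, "percent": 5}}


def flag_is_on(flag_name: str, traffic_key: str) -> bool:
    """Return True if the flag is enabled for this traffic key's bucket."""
    flag = FLAGS.get(flag_name)
    if not flag or not flag["enabled"]:
        return False
    # Hash the key into a stable bucket in [0, 100) so the same request
    # always gets the same answer for a given percentage.
    digest = hashlib.sha256(f"{flag_name}:{traffic_key}".encode()).hexdigest()
    return int(digest, 16) % 100 < flag["percent"]


def mark_reached(marker_id: str, traffic_key: str) -> bool:
    """Log that the branch tagged with marker_id ran, if the flag allows it."""
    if flag_is_on("dead-code-logging", traffic_key):
        logger.info("code path reached: %s", marker_id)
        return True
    return False


def apply_discount(order_total: float, customer_type: str, request_id: str) -> float:
    """Example function with one marker per branch (IDs generated once, then pasted in)."""
    if customer_type == "legacy_partner":
        mark_reached("0b6f3c1e-5d2a-4c8e-9f47-1a2b3c4d5e6f", request_id)
        return order_total * 0.8
    mark_reached("7e9d2f40-8a1b-4e6c-b3d5-6f7a8b9c0d1e", request_id)
    return order_total
```

Once a marker shows up in the logs, you delete that `mark_reached` call. Whatever markers are still in the code after the flag has sat at 100% for a while point at your candidates for deletion.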
2. Parity Testing
Parity checking or parity testing is a way to check if results from multiple systems are the same. It’s an important form of testing when you replace an old system with a newer one. You might be migrating to a new datastore, breaking up a monolith into microservices, or rearchitecting a data pipeline. It’s useful any time you want the experience for the end-user to stay the same even though things are changing in the background.
How does parity testing work? After getting a new request from a user, you send it to your old system like normal. When you receive the response, you send an asynchronous call to a parity checker that includes the original request and the old system’s response. At the same time, you send that response back to the end user, so, from their perspective, they’re just calling the old system.
The parity checker then sends the call to the new system and verifies that the response from the new system is the same as what the old system returned. If the responses differ, log the original request and the two responses. Because this is done asynchronously, the user should not see any added latency.
There are a few reasons to put this parity checking behind a feature flag. First of all, logs are expensive and if you have a major problem with the system, you could get a lot of logs. Secondly, if you find a problem, there may not be a point in running traffic through a new system until the problem is fixed.
Finally, if you need to gather additional information about the problem or you want to reproduce it, you can turn the checking back on for a specific test account. Doing this allows you to easily isolate the problem.
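Here is a rough Python sketch of that flow, including the flag and the per-account override. It is an illustration under assumptions, not a reference implementation: `PARITY_FLAG`, `parity_enabled`, `check_parity`, and `handle_request` are hypothetical names, the old and new systems are modeled as plain functions, and a thread pool stands in for whatever asynchronous mechanism you already use.

```python
import logging
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Optional, Tuple

logger = logging.getLogger("parity")

# Hypothetical flag state: a global switch plus accounts forced on for debugging.
PARITY_FLAG = {"enabled": True, "forced_accounts": {"test-account-1"}}

_executor = ThreadPoolExecutor(max_workers=4)

Handler = Callable[[dict], dict]


def parity_enabled(account_id: str) -> bool:
    """Check the global switch first, then the per-account override."""
    return PARITY_FLAG["enabled"] or account_id in PARITY_FLAG["forced_accounts"]


def check_parity(request: dict, old_response: dict, new_system: Handler) -> bool:
    """Replay the request on the new system and compare it with the old response."""
    try:
        new_response = new_system(request)
    except Exception:
        logger.exception("new system raised for request=%r", request)
        return False
    if new_response != old_response:
        logger.warning(
            "parity mismatch request=%r old=%r new=%r",
            request, old_response, new_response,
        )
        return False
    return True


def handle_request(
    request: dict, old_system: Handler, new_system: Handler
) -> Tuple[dict, Optional[Future]]:
    """Serve from the old system; run the parity check in the background."""
    old_response = old_system(request)
    pending = None
    if parity_enabled(request["account_id"]):
        pending = _executor.submit(check_parity, request, old_response, new_system)
    # The Future is returned only so callers and tests can inspect the result;
    # the user-facing response never waits on it.
    return old_response, pending
```

With the global switch off, only the accounts in `forced_accounts` are checked, which is one way to reproduce a problem for a single test account without paying for logs on everyone else’s traffic.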
3. Evaluating Technology Costs
When you are moving functionality to a new system, you may be hoping to reduce costs. We recently redesigned a significant portion of our data pipeline to do just that.
Before switching to the new system, we wanted to verify that the new design would, in fact, reduce costs. To do that, we used mirroring. Mirroring sends a copy of all traffic to the new system through an asynchronous call while the old system keeps serving requests as usual. It’s very similar to parity checking except that mirroring doesn’t check for response disparities. Because the old system’s response is not sent or compared, the asynchronous call to the new system can be made at any point while handling the request.
To use mirroring to evaluate costs, pick a time interval that gives a representative view of your traffic. Most commonly, that might be a day or a week. Turn mirroring on for that interval, then turn it back off. You can then check how much the new system cost while mirroring was on and extrapolate from there.
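The extrapolation itself is simple arithmetic. The sketch below assumes cost scales linearly with time and with the share of traffic mirrored, which won’t hold for every pricing model (tiered or reserved pricing, for example). The function name, the `mirrored_fraction` parameter, and the 730-hour month are my own assumptions for illustration.

```python
def extrapolate_monthly_cost(
    cost_during_window: float,
    window_hours: float,
    mirrored_fraction: float = 1.0,
    hours_per_month: float = 730.0,
) -> float:
    """Estimate the monthly cost of the new system from one mirroring window.

    Assumes cost grows linearly with both time and traffic volume.
    """
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    if not 0 < mirrored_fraction <= 1:
        raise ValueError("mirrored_fraction must be in (0, 1]")
    # Cost per hour at full traffic, then scaled up to a month.
    hourly_cost_at_full_traffic = cost_during_window / window_hours / mirrored_fraction
    return hourly_cost_at_full_traffic * hours_per_month
```

For example, if mirroring all traffic for 24 hours cost 42.00, `extrapolate_monthly_cost(42.0, 24.0)` gives 1277.50 for the month. If you only mirrored half of your traffic during that window, passing `mirrored_fraction=0.5` doubles the estimate to 2555.00.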
Depending on the technology, it might be easier to do a lightweight proof of concept to test costs early on. Or you may need to do this as a verification step later. In either case, it is helpful to verify the cost before fully switching to a new system.
4. Load Testing
We can use mirroring for more than estimating costs. If you’re standing up a net new service, mirroring also lets you evaluate whether your service can handle your typical traffic patterns. The nice thing about doing this through a feature flag is that you can ramp up the traffic slowly while monitoring your service. If something goes wrong, it’s often easier to identify the problem when you’re near the traffic threshold that causes it rather than way over. Furthermore, if you discover the service can’t handle the load, there’s no point in continuing to send traffic to the service until the problems are fixed.
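A ramp like this can be as simple as a list of percentages and a rule for when to step up or bail out. The sketch below is one possible shape, with assumptions called out: `RAMP_STEPS`, `MirrorFlag`, and `next_percent` are hypothetical names, and using an observed error rate with a 1% threshold as the health signal is my choice for the example, not something prescribed here.

```python
from dataclasses import dataclass

# Hypothetical ramp schedule: percentage of traffic mirrored at each step.
RAMP_STEPS = [1, 5, 10, 25, 50, 100]


@dataclass
class MirrorFlag:
    percent: int = 0  # share of traffic currently mirrored to the new service


def next_percent(flag: MirrorFlag, error_rate: float, max_error_rate: float = 0.01) -> int:
    """Advance one ramp step if the service looks healthy; otherwise turn mirroring off."""
    if error_rate > max_error_rate:
        # The service is struggling near this traffic level: stop sending
        # mirrored traffic until the problem is fixed.
        flag.percent = 0
        return flag.percent
    for step in RAMP_STEPS:
        if step > flag.percent:
            flag.percent = step
            return flag.percent
    return flag.percent  # already at the top of the ramp
```

You would call `next_percent` after each observation period, whether from a scheduled job or by hand, and the percentage at which it drops back to 0 tells you roughly where to start looking.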
5. Stress Testing
While load testing helps determine whether or not your service can handle expected traffic patterns, stress testing is meant to determine the point at which your service will fail. It allows you to identify and monitor the point at which you should add additional resources or rearchitect your system.
Many tools, such as Gatling or JMeter, can be helpful for stress testing. However, these tools work by having someone write a series of test cases, which they then send repeatedly at higher and higher rates. The problem is that these test case traffic patterns are usually very different from actual traffic patterns, so you could miss edge cases that cause problems.
To better emulate your actual traffic patterns, you can instead use mirroring. However, instead of sending the traffic once as you did for load testing, you send it, wait a few seconds, and send it again, possibly several times. While there will be some differences from the actual traffic patterns, in most cases, it will be closer than anything you artificially generate.
In this case, the feature flag allows you to ramp up the traffic, and turn it back off once the service fails or when you decide the test is done.
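As a rough illustration, here is a Python sketch that mirrors each request several times with a delay between copies. The names `STRESS_FLAG` and `mirror_for_stress` are hypothetical, the flag is a plain dictionary rather than a real flag service, and one timer thread per copy is only reasonable for a small demo; in practice you would more likely push the copies onto a queue.

```python
import threading
from typing import Callable, List

# Hypothetical flag state: how many copies of each request to send to the
# service under test, and how far apart to space them.
STRESS_FLAG = {"enabled": True, "copies": 3, "delay_seconds": 2.0}


def mirror_for_stress(request: dict, send: Callable[[dict], None]) -> List[threading.Timer]:
    """Schedule `copies` sends of the request, spaced `delay_seconds` apart."""
    if not STRESS_FLAG["enabled"]:
        return []
    timers = []
    for i in range(STRESS_FLAG["copies"]):
        # The first copy goes out right away, the rest after i * delay seconds.
        timer = threading.Timer(i * STRESS_FLAG["delay_seconds"], send, args=(request,))
        timer.daemon = True  # don't keep the process alive just for mirroring
        timer.start()
        timers.append(timer)
    return timers
```

Raising `copies` is how you ramp up the multiplier, and flipping `enabled` to `False` stops new copies from being scheduled once the service fails or the test is done.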
We’ve covered some unique and advanced use cases. Maybe you have some new ideas of your own. We’d love to hear them!