Where Feature Flags Meet Testing

Henry Jewkes on February 23, 2017

At Split, we’re committed to making meaningful contributions to how software is released and to providing the tools that make it easier to test your application under all circumstances. We always have our sights set on making our service better, more responsive to customer feedback, and more robust in its integrations. In light of that, Henry Jewkes will describe how we have extended Split to the JUnit Java testing framework, to make software testing easier when feature flags are being employed for quicker releases. - Patricio Echague, CTO

Since joining Split, it has been clear to me that testing is the fundamental building block of our software development process. We deploy unit tests to validate discrete parts of business logic, end-to-end tests to monitor our APIs, automated user interaction tests to maintain our user experience, and an entire network of bots to ensure the quality of service of our SDKs.

Our tests drive development, allowing engineers to think through all possible use cases and edge conditions. Tests provide documentation, showing new members of the team how a block of code is expected to work in a variety of situations. And tests drive stability, constantly seeking out regressions as more and more features are added.

Testing, with control.

In the modern software development lifecycle, high-quality testing is as important as ever, but it has become increasingly challenging to ensure your coverage is complete. While feature flag technology has allowed releases to be more stable than ever, poor implementations of that technology have brought about cultures that test purely in production, relying on the ease of ramping and rollback to protect users from widespread catastrophe. Using a controlled rollout platform that is both feature rich and highly testable is a challenge, but here at Split we believe that successful releases require the utmost confidence in the features released to clients.

When most organizations start working with controlled rollout, they default all feature flag checks to the "off" or "control" state and then rely on unit testing the methods specific to the new feature to ensure it behaves as expected. However, as more and more features are enabled in production, there can be significant drift between the behavior you test and the behavior customers actually see.

To prevent that drift, both during testing and for engineers running their products locally, we have integrated the recently announced Off-the-Grid mode into the Split Software Development Kit. It uses a configuration file to set the state of each of your features. Our customers often maintain such a configuration, mirroring the state of an environment, to run their tests against. It is even possible to set up your deployment architecture to run tests with multiple Off-the-Grid configurations, allowing a variety of states to be tested before deploying to users.
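To make the idea concrete, here is a minimal sketch of loading feature states from a plain-text configuration. The line format (`feature_name treatment`, with `#` comments), the class name `LocalFlagFile`, and its methods are assumptions made for this example; they are not the Split SDK's actual Off-the-Grid implementation.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of the Split SDK. */
final class LocalFlagFile {
    static final String CONTROL = "control";

    /** Parses lines of the assumed form "feature_name treatment". */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> treatments = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            // Skip blank lines and comments.
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] parts = line.split("\\s+");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Expected 'feature treatment': " + raw);
            }
            treatments.put(parts[0], parts[1]);
        }
        return treatments;
    }

    /** Features missing from the configuration resolve to "control". */
    static String treatmentFor(Map<String, String> treatments, String feature) {
        return treatments.getOrDefault(feature, CONTROL);
    }

    public static void main(String[] args) {
        // Feature names below are made up for the example.
        Map<String, String> flags = parse(Arrays.asList(
                "# mirrors a staging environment",
                "admin_groups_read dark",
                "admin_groups_write dual"));
        System.out.println(treatmentFor(flags, "admin_groups_read"));
        System.out.println(treatmentFor(flags, "not_configured"));
    }
}
```

Keeping one such file per environment is what lets a build run the same suite against several flag states.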

Introducing Split's new programmatic testing mode.

Often, though, we will want to test the multiple states of a feature directly inside the unit test suite. Rather than pass in multiple configuration files, we prefer a method that lets a single test file exercise a variety of feature options for consistency, and also directly test for the expected changes in behavior when a feature is enabled. To that end, we have released a new testing mode for our Java client. This testing mode allows you to programmatically change the treatments returned by the Split Client, so you can start a test by setting the specific features you wish to test regardless of their state in the Split console.
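The shape of such a testing mode can be sketched with a small in-memory stand-in. Everything here (`InMemoryFlagClient`, its method names, the `admin_groups_read` feature, and the `adminSource` helper) is hypothetical and only illustrates the pattern; consult the Split Java client for the real testing API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a programmable test client; not Split's API. */
final class InMemoryFlagClient {
    private final Map<String, String> treatments = new HashMap<>();

    /** Sets the treatment a feature should return for the rest of the test. */
    void registerTreatment(String feature, String treatment) {
        treatments.put(feature, treatment);
    }

    /** Forgets all registered treatments. */
    void clearTreatments() {
        treatments.clear();
    }

    /** Returns the registered treatment, or "control" when none was set. */
    String getTreatment(String userKey, String feature) {
        return treatments.getOrDefault(feature, "control");
    }

    /** Example business logic that branches on a (made-up) feature. */
    static String adminSource(InMemoryFlagClient client, String userKey) {
        return "on".equals(client.getTreatment(userKey, "admin_groups_read"))
                ? "groups"
                : "legacy";
    }

    public static void main(String[] args) {
        InMemoryFlagClient client = new InMemoryFlagClient();
        System.out.println(adminSource(client, "user-1")); // nothing registered yet
        client.registerTreatment("admin_groups_read", "on");
        System.out.println(adminSource(client, "user-1")); // feature forced on
    }
}
```

Because the treatment is set inside the test, the outcome no longer depends on what is configured in any console or file.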

Examples: how we test at Split.

This is best described in the context of a recent feature release here at Split. We know our success depends on ensuring our product is built for teams, and to that end we are releasing functionality that allows users to construct their own groups of users and use those groups as shorthand throughout the product. Part of this project involves transitioning our existing model of handling Split Administrators to use this same groups infrastructure. However, a change to such a critical part of our architecture needs to be made with robust testing and a deliberate rollout.

A best practice we have established for these types of transitions is to use the techniques of dual writes and dark reads. Effectively, you start your rollout by writing the relevant data to both the original collection and the new data structure, while continuing to read from the original collection. Then, to validate your new logic, you start reading from both the original source and the new structure and compare the results. You still return the data coming from the original source (this second read happens "in the dark") but are able to log any occasions on which the results differ. Once you confirm that the data is always consistent, you can turn the reads from the new structure "on", and only once those reads are fully ramped do you transition your writes from "dual" to "off" and deprecate the old code. This process ensures a steady, consistent, and controlled rollout with minimal risk of exposure to customers.
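The dual-write and dark-read steps above can be sketched as follows. This is an illustration under assumptions, not Split's implementation: the `AdminStore` class, the enum names, and the in-memory maps standing in for the two data stores are all invented for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch of dual writes and dark reads over two in-memory stores. */
final class AdminStore {
    enum WriteMode { LEGACY_ONLY, DUAL, NEW_ONLY }
    enum ReadMode { LEGACY, DARK, NEW }

    private final Map<String, String> legacy = new HashMap<>();
    private final Map<String, String> replacement = new HashMap<>();
    private final List<String> mismatches = new ArrayList<>();

    void write(WriteMode mode, String key, String value) {
        // DUAL writes to both stores; the other modes write to exactly one.
        if (mode != WriteMode.NEW_ONLY) {
            legacy.put(key, value);
        }
        if (mode != WriteMode.LEGACY_ONLY) {
            replacement.put(key, value);
        }
    }

    String read(ReadMode mode, String key) {
        String oldValue = legacy.get(key);
        if (mode == ReadMode.LEGACY) {
            return oldValue;
        }
        String newValue = replacement.get(key);
        if (mode == ReadMode.NEW) {
            return newValue;
        }
        // DARK: compare both results, record any difference, still serve the old value.
        if (!Objects.equals(oldValue, newValue)) {
            mismatches.add(key);
        }
        return oldValue;
    }

    /** Keys whose dark read disagreed with the legacy read. */
    List<String> mismatches() {
        return mismatches;
    }

    public static void main(String[] args) {
        AdminStore store = new AdminStore();
        store.write(WriteMode.LEGACY_ONLY, "alice", "admin"); // written before the rollout
        store.write(WriteMode.DUAL, "bob", "admin");           // written during the rollout
        store.read(ReadMode.DARK, "alice");
        store.read(ReadMode.DARK, "bob");
        System.out.println("Mismatched keys: " + store.mismatches());
    }
}
```

In a real service the two modes would come from feature flag treatments and the mismatch list would be a log line or metric; the point of the sketch is that the dark read never changes what the caller receives.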

Testing this logic in the old world would be very risky, typically requiring careful monitoring of logs and dashboards, ramping and de-ramping of the relevant features, and additional deployments to fix bugs found along the way. Using the testing mode of our Java client, we can add extensive unit tests to confirm that all permutations of these features behave as expected.

This system allows you to validate multiple feature release states in a single unit test file, both setting treatments at the class level in the setup function and at the test level. However, this model does result in a lot of repeated code. Certainly you could separate out the business logic of the test to a separate function, but here at Split we have a better way.

Alongside the SplitClientForTest, we are also today releasing a JUnit Test Runner and a suite of Java Annotations to streamline testing of your application across a wide variety of configurations. The SplitTestRunner is built on the standard JUnit4 test runner and preserves all of the @Before, @After, and @Rule functionality you are likely using today. However, by using this runner in your test files and tagging your SplitClient in the file, you can annotate your tests with the features and treatments you wish to run them against, and Split will do the rest.

The SplitTestRunner executes each @SplitTest once for each treatment value, updating the SplitClient with the correct treatments before each run. Any unset features will return the "control" treatment, and any treatments manually set elsewhere will be preserved unless specifically overwritten by the SplitTest annotation.

In most cases, you are not testing a single feature's rollout in isolation; ideally you want to test how a behavior interacts with a variety of other feature states. Therefore we have built the SplitScenario, which allows the developer to define a variety of feature states, and the test will run across all possible permutations. In the case at hand, that would be testing both the read and write treatments simultaneously!
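To show what "all possible permutations" means in practice, the sketch below expands a set of features and their candidate treatments into every combination. It is plain Java written for this illustration (the class `ScenarioPermutations`, the feature names, and the treatment values are assumptions) and does not show how SplitScenario is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: builds the cross product of feature treatments. */
final class ScenarioPermutations {

    /** Returns one map per combination, assigning every feature one of its treatments. */
    static List<Map<String, String>> permutations(Map<String, List<String>> features) {
        List<Map<String, String>> result = new ArrayList<>();
        result.add(new LinkedHashMap<>()); // start from a single empty assignment
        for (Map.Entry<String, List<String>> feature : features.entrySet()) {
            List<Map<String, String>> expanded = new ArrayList<>();
            for (Map<String, String> partial : result) {
                for (String treatment : feature.getValue()) {
                    // Extend each partial assignment with each treatment of this feature.
                    Map<String, String> next = new LinkedHashMap<>(partial);
                    next.put(feature.getKey(), treatment);
                    expanded.add(next);
                }
            }
            result = expanded;
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> features = new LinkedHashMap<>();
        features.put("admin_groups_read", Arrays.asList("off", "dark", "on"));
        features.put("admin_groups_write", Arrays.asList("off", "dual", "on"));
        for (Map<String, String> scenario : permutations(features)) {
            System.out.println(scenario); // a test body would run once per scenario
        }
    }
}
```

Two features with three treatments each yield 3 × 3 = 9 combinations, so the number of runs grows multiplicatively with every feature added to a scenario.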

Now in less code than before we are actually running the test 9 times! Our goal is to make it easier to test a wide variety of complex interactions than it would be to simply test two different feature states under other frameworks.

In running this test for the first time, one may see certain failing scenarios. With writes disabled to the feature, total reads to that feature would likely start failing, and so would that run of the test. Now, the best practice I recommend is to identify these failure scenarios and defensively code against them - falling back to a sane default. These permutations are an excellent way to identify such cases. However, there could be scenarios where a bad combination of features may be inevitable, and we do support limiting your tests to only supported feature combinations with SplitSuites.
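One way to code defensively against such a combination is to fall back to the original source whenever the new one has no answer. The sketch below assumes hypothetical names throughout (`DefensiveRead`, `readRole`, the "on" treatment, and maps standing in for the two stores) and is only one possible choice of default.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a read path that falls back to a sane default. */
final class DefensiveRead {

    /**
     * Serves from the new store only when reads are "on" and the value exists there;
     * in every other case (including unknown treatments) serves from the legacy store.
     */
    static String readRole(String readTreatment, String key,
                           Map<String, String> legacy, Map<String, String> replacement) {
        if ("on".equals(readTreatment)) {
            String value = replacement.get(key);
            if (value != null) {
                return value;
            }
            // The new store has nothing, e.g. because writes to it were disabled.
        }
        return legacy.get(key);
    }

    public static void main(String[] args) {
        Map<String, String> legacy = new HashMap<>();
        Map<String, String> replacement = new HashMap<>();
        legacy.put("alice", "admin"); // written while writes to the new store were off

        // Reads are "on" but the new store is empty, so the legacy value is served.
        System.out.println(readRole("on", "alice", legacy, replacement));
    }
}
```

Whether a silent fallback is appropriate depends on the feature; for some data it is safer to fail loudly, which is where restricting the tested combinations becomes the better tool.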

In this way you can specifically test for different behavior with different combinations of features. Finally, the SplitTestClient annotation also works as a global level SplitSuite, letting you apply the same set of feature permutations across all tests run in the class (unless specifically overridden at the test level). This configuration even carries over through inheritance, allowing you to define a BaseTest class with a full suite of default feature treatments, and then have your Test files inherit from that BaseTest and further customize accordingly. We do exactly that here at Split, with a BaseTest which both defines our default feature configuration and sets up our Guice injection. That allows our test class in this case to focus specifically on the logic that we care to test:

Testing with high numbers of permutations can be done as part of integration or release testing so as to reduce the runtime of local tests. The Java testing client and annotations suite are available now, and further documentation is available by reviewing the sample tests available in that repository. In coming releases we will be rolling out similar testing clients for all of our SDKs, and will be implementing similar annotations and decorators based on language support.
