For online experiments, we test whether a change is improving a metric, so a false negative result would indicate that “the change being tested has not improved the key metric significantly when in fact, the change generally has a positive impact on the underlying behavior.” In the context of A/B testing, a false negative declares that the treatment (i.e. the new way of doing things) did not lead to an outcome different from the control (the status quo), when it did. A false negative is also known as a type II error, or a mistaken acceptance of the null hypothesis.
Understanding False Negatives
To understand why a false negative is the “mistaken acceptance of the null hypothesis,” it’s useful to remember that in a statistical test, we start with the assumption that there will be no difference between the control and treatment (the “A” and “B” in an A/B test). The goal is then to disprove this “null hypothesis” by accumulating enough evidence to observe a difference greater than what random chance would introduce. However, there is still a small chance (often 5 or 10 percent) of reaching the wrong conclusion. If we don’t reject the null hypothesis in our findings when in fact there is a difference in the real world, we have mistakenly accepted the null hypothesis, committed a type II error, and found a false negative result.
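To make this concrete, here is a minimal sketch of such a test on two conversion rates (a two-proportion z-test via statsmodels). The counts are illustrative assumptions, not data from any real experiment:

```python
# A minimal sketch of the underlying hypothesis test, with made-up counts.
from statsmodels.stats.proportion import proportions_ztest

conversions = [120, 135]   # control, treatment successes (assumed)
visitors = [1000, 1000]    # users per group (assumed)

stat, p_value = proportions_ztest(conversions, visitors)
if p_value < 0.05:
    print(f"p = {p_value:.3f}: reject the null hypothesis")
else:
    # Failing to reject is not proof of "no effect" -- in an underpowered
    # test, a real improvement can still land here (a false negative).
    print(f"p = {p_value:.3f}: fail to reject the null hypothesis")
```

With these assumed numbers the treatment converts at 13.5% versus 12% for control, yet the test fails to reject the null hypothesis: if the treatment truly is better, this outcome is exactly a false negative.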
The best way to avoid a false negative result is to ensure that your experiment has sufficient power before you conduct it. A power analysis will help you determine how many observations (i.e. users passing through your test) are needed in order to reliably detect a given amount of difference. Power analysis lets you calculate the minimum likely detectable effect (MLDE) for a given sample size, or conversely, the sample size needed to reliably measure a given MLDE.
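As a sketch, here is how such a power analysis might look in Python with statsmodels. The baseline rate, MLDE, and power target below are assumed values chosen for illustration:

```python
# A minimal sketch of a pre-test power analysis using statsmodels.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.10   # assumed baseline conversion rate (10%)
mlde = 0.11       # smallest treatment rate we want to detect reliably

# Convert the two proportions into a standardized effect size (Cohen's h).
effect_size = proportion_effectsize(mlde, baseline)

analysis = NormalIndPower()
n_per_group = analysis.solve_power(
    effect_size=effect_size,
    alpha=0.05,    # 5% false positive rate
    power=0.80,    # 80% chance of detecting the effect if it exists,
                   # i.e. a 20% false negative rate
    alternative="two-sided",
)
print(f"Users needed per group: {n_per_group:.0f}")
```

Raising the power target (say, to 90%) or shrinking the MLDE increases the required sample size, which is the trade-off a power analysis makes explicit before the experiment runs.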
Definition of a False Negative
Errors sometimes creep into an experiment or A/B testing process, and preventing them is essential so that their effect is negligible. Yet no matter how much precaution is taken, some errors still slip in. These are known as random errors, and they can generate false positive or false negative results.
In an online experiment, a false positive means that a team releases a change that isn’t effective; a false negative means the team doesn’t release a change that is effective. A bad release (a false positive) can be corrected or improved later. Not releasing a change that is effective, however, can demoralize a team and discourage it from trying similar ideas.
Technical Difficulties May Cause False Negatives
Technical difficulties, such as an inappropriate setup, misconfiguration, or faulty software involved in a test, may cause false negatives. So can a lack of familiarity with the test procedures.
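One common way to catch this kind of setup problem is a sample ratio mismatch (SRM) check, which compares the number of users actually assigned to each group against the configured split. The sketch below uses assumed counts and a 50/50 split, not output from a real test:

```python
# A minimal sketch of a sample ratio mismatch (SRM) check.
from scipy.stats import chisquare

observed = [50_000, 48_500]      # users actually assigned to control/treatment (assumed)
expected_split = [0.5, 0.5]      # the split the experiment was configured for

total = sum(observed)
expected = [total * share for share in expected_split]

stat, p_value = chisquare(observed, f_exp=expected)
if p_value < 0.001:
    print(f"p = {p_value:.2e}: likely sample ratio mismatch -- check the test setup")
else:
    print(f"p = {p_value:.2e}: assignment counts look consistent with the configured split")
```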
False Negatives Are Misleading
Every test aims to produce a result reliable enough to determine the action that follows. A false negative is misleading because the wrong line of action follows from it. For this reason, errors that might lead to this type of result must be reduced to a negligible minimum: the trustworthiness of a result protects both the team running the experiment and the users affected by the decision.