Glossary

Multi-Armed Bandit

A multi-armed bandit is a problem in which limited resources must be allocated among multiple competing options whose benefits are not yet fully known.

A multi-armed bandit problem occurs when the benefits of each option are not known or are incompletely known at the time of allocation, but can be discovered as time passes and resources are reallocated. The name comes from a particular visualization of this problem.

Imagine a gambler playing several different slot machines (sometimes called “one-armed bandits”), each of which has a different expected return: some arms are better than others, but the gambler doesn’t know which ones. The gambler wants to maximize his total reward, and in every round he can choose one arm to pull from those available. Repeated over many rounds, this leaves the gambler with two choices each time: he can keep playing whichever arm has had the greatest return so far, or he can pull some other arm at random, knowing that it may turn out to be better than his current best arm or worse. In machine learning, the tension between these options is called the exploration/exploitation tradeoff.

This may seem like a highly specific, non-generalizable problem, but its applications range from clinical trials to financial portfolio design to adaptive routing to feature experimentation. The exploration/exploitation tradeoff arises for any agent that must act while it is still learning which of its options is best.

In general, multi-armed bandit algorithms (also called multi-arm bandits or MABs) attempt to solve these kinds of problems, aiming for the allocation that yields the greatest total return and, equivalently, the lowest total regret, where regret is the reward lost by not having played the best arm in every round.
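Total regret can be written down compactly. The notation here is an assumption made for illustration and does not come from this glossary: there are K arms, arm k has an unknown mean reward \mu_k, and a_t is the arm pulled in round t out of T rounds.

```latex
% Assumed notation: \mu_k = mean reward of arm k, a_t = arm pulled in round t
\mu^{*} = \max_{1 \le k \le K} \mu_k
\qquad
R_T = T\,\mu^{*} - \sum_{t=1}^{T} \mu_{a_t}
```

Under this definition, a strategy that always pulled the best arm would have R_T = 0, and every round spent on a worse arm adds the gap \mu^{*} - \mu_{a_t} to the total.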

Types of Multi-Armed Bandits

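There are several approximate solutions to the multi-armed bandit problem. The simplest is the “epsilon-greedy” algorithm: given a small value epsilon (ε) between 0 and 1, it explores by pulling a randomly chosen arm a fraction ε of the time and exploits the best arm found so far the remaining 1 − ε of the time. For example, with ε = 0.1 it explores in about 10% of rounds and exploits in about 90%. The algorithm is called “greedy” because in most rounds it simply takes the arm with the highest observed return.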
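The epsilon-greedy rule can be sketched in a few lines of Python. This is a minimal simulation, not a production implementation: the function name `epsilon_greedy`, the payout rates in `true_rates`, and the assumption that each arm pays either 0 or 1 are all illustrative choices that do not come from this glossary.

```python
import random

def epsilon_greedy(true_rates, epsilon=0.1, rounds=10_000, seed=0):
    """Simulate epsilon-greedy on arms that pay 1 with probability true_rates[arm].

    Returns (pulls per arm, estimated mean reward per arm, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    counts = [0] * n_arms    # how many times each arm has been pulled
    values = [0.0] * n_arms  # running mean reward observed for each arm
    total = 0
    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: pull an arm chosen uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pull the arm with the best estimate so far
            # (ties go to the lowest-numbered arm).
            arm = max(range(n_arms), key=lambda a: values[a])
        reward = 1 if rng.random() < true_rates[arm] else 0
        counts[arm] += 1
        # Incremental update of the running mean for the pulled arm.
        values[arm] += (reward - values[arm]) / counts[arm]
        total += reward
    return counts, values, total

# Hypothetical payout rates; the gambler does not get to see these.
counts, values, total = epsilon_greedy([0.2, 0.5, 0.7])
print(counts, [round(v, 2) for v in values], total)
```

With ε = 0 the sketch never explores and stays on whichever arm it tried first, which is why some exploration is needed at all.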

There are many variations on the basic epsilon-greedy algorithm. Strategies for finite experiments include epsilon-first (pure exploration followed by pure exploitation) and epsilon-decreasing (ε shrinks over the course of the experiment). Strategies that can be used on open-ended or continuous experiments include value-difference-based epsilon (ε is reduced automatically as the learning process progresses) and contextual-epsilon-greedy (ε is computed from the current situation). There are also probability-matching strategies (also called Thompson sampling or Bayesian bandits), which pull each arm in proportion to the probability that it is the optimal one.
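Probability matching is easiest to see in code. The sketch below shows one common form, Thompson sampling with a Beta posterior per arm. It assumes each arm pays either 0 or 1 and starts every arm from a uniform Beta(1, 1) prior. The function name `thompson_sampling` and the payout rates are hypothetical, chosen only for illustration.

```python
import random

def thompson_sampling(true_rates, rounds=5_000, seed=0):
    """Simulate Beta-Bernoulli Thompson sampling.

    Returns (successes per arm, failures per arm).
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms
    for _ in range(rounds):
        # Draw one plausible payout rate per arm from its posterior,
        # Beta(successes + 1, failures + 1), i.e. a uniform prior plus the data.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Pull the arm whose sampled rate is highest. An arm is chosen
        # about as often as the posterior says it is the best one.
        arm = max(range(n_arms), key=lambda a: samples[a])
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures

# Hypothetical payout rates; unknown to the algorithm.
successes, failures = thompson_sampling([0.2, 0.5, 0.7])
pulls = [s + f for s, f in zip(successes, failures)]
print(pulls)
```

Unlike epsilon-greedy, there is no ε to tune here: exploration fades on its own as the posteriors narrow around each arm's observed payout rate.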

You may note similarities to A/B/n testing in the process of finding the optimal alternative among many for the purpose of exploiting it.

Benefits and Drawbacks

Multi-armed bandit algorithms are best suited to two use cases: very short experiments, where gathering enough data for a statistically significant A/B test would take too long (like finding the best headline for a hot new article), and very long or ongoing experiments, where waiting for a “final answer” from an A/B test doesn’t make sense (like optimizing each user’s news feed).

The main drawback of bandit algorithms is that they are difficult to implement. If an organization is falling at all short in its DevOps practices, trying to implement a bandit will expose it. Further, because there aren’t many data scientists who are also excellent programmers, bandits are frequently more expensive, since they take more people to build.

Switch It On With Split

The Split Feature Data Platform™ gives you the confidence to move fast without breaking things. Set up feature flags and safely deploy to production, controlling who sees which features and when. Connect every flag to contextual data, so you can know if your features are making things better or worse and act without hesitation. Effortlessly conduct feature experiments like A/B tests without slowing down. Whether you’re looking to increase your releases, to decrease your MTTR, or to ignite your dev team without burning them out–Split is both a feature management platform and partnership to revolutionize the way the work gets done. Schedule a demo or explore our feature flag solution at your own pace to learn more.
