Multi-armed bandit testing

A multi-armed bandit is an adaptive allocation strategy: rather than holding a fixed split, it sends more traffic to variants that look better as data accumulates. It minimises 'regret' (conversions lost by showing inferior options), but the moving allocation complicates classic inference. Bandits suit ongoing optimisation more than one-off learning. This page covers both sides of that trade-off.

Partially verified

Explore versus exploit

The bandit framing comes from a gambler facing slot machines ('one-armed bandits') with unknown payouts. Every pull is a choice between exploring (learning which arm is best) and exploiting (playing the arm that looks best so far). Adaptive allocation tilts toward exploitation as evidence grows, reducing conversions lost to weak variants.
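
The explore/exploit choice can be sketched with epsilon-greedy, one of the simplest bandit rules. This is a minimal illustration, not a method this page prescribes: the function name `choose_arm`, the `stats` shape (arm name mapped to conversions and impressions) and the default `epsilon` are all assumptions made for the example.

```python
import random


def choose_arm(stats, epsilon=0.1, rng=random):
    """Pick the next arm to show using epsilon-greedy.

    stats maps arm name -> (conversions, impressions); hypothetical shape.
    """
    arms = sorted(stats)
    # Explore: with probability epsilon, pick any arm uniformly at random.
    if rng.random() < epsilon:
        return rng.choice(arms)

    def observed_rate(arm):
        conversions, impressions = stats[arm]
        # An arm with no impressions yet is tried first.
        return conversions / impressions if impressions else float("inf")

    # Exploit: otherwise pick the arm with the best observed rate so far.
    return max(arms, key=observed_rate)
```

With `epsilon=0` the rule always exploits the current leader; with `epsilon=1` it only explores. Values in between set how much traffic keeps going to arms that currently look worse.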

What you gain and give up

The gain is lower regret: fewer users see the losing variant while the test runs, which matters when traffic is valuable. The cost is statistical: because allocation changes with the data, the clean confidence intervals of a fixed A/B test no longer apply directly, and estimating a precise effect size for each arm is harder.

Lower regret during the test
Adaptive split, not a fixed ratio
Harder to extract a clean effect estimate
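
To make 'lower regret' concrete, the arithmetic can be sketched as below. The conversion rates and traffic counts are invented for illustration, and in a real test the true rates are unknown, so this is a thought experiment rather than something you can compute from live data.

```python
def expected_regret(true_rates, impressions):
    """Expected conversions lost versus always showing the best arm.

    true_rates maps arm -> true conversion rate (assumed known here).
    impressions maps arm -> number of users who saw that arm.
    """
    best = max(true_rates.values())
    return sum(n * (best - true_rates[arm]) for arm, n in impressions.items())


# Assumed rates: A converts at 5%, B at 4%, over 1,000 users in total.
rates = {"A": 0.05, "B": 0.04}
fixed = expected_regret(rates, {"A": 500, "B": 500})     # fixed 50/50 split
adaptive = expected_regret(rates, {"A": 800, "B": 200})  # tilted toward A
```

Under these assumed numbers the fixed split loses about 5 expected conversions and the tilted allocation about 2. The gap grows with traffic volume and with the difference between the arms.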

When a bandit fits

Bandits suit short-lived or continuous-optimisation problems such as headlines, promotions and layouts, where the main aim is to serve the best option soon. When the goal is a durable, well-estimated measurement of an effect (for example, to inform strategy), a fixed-horizon A/B test is usually the cleaner instrument.

How it appears in analytics and logs

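Shifting traffic shares are expected behaviour for a bandit, not a bug. Because the split adapts, per-arm results cannot be read like those of a fixed 50/50 A/B test: each arm receives a different and changing amount of traffic, so sample sizes and time coverage differ between arms.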

Diagnostic use case

Use a bandit when the goal is to maximise conversions during the test (exploit the winner fast), rather than to obtain a clean, fixed-horizon estimate of the effect size.

What WebmasterID can help detect

WebmasterID's first-party per-variant conversion events give a bandit the aggregate signal it needs, and let you audit the realised allocation over time.
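
Auditing the realised allocation can be as simple as computing each variant's share of exposures per day. The sketch below assumes exposure records are available as `(day, variant)` pairs; that shape and the function name `allocation_by_day` are hypothetical and do not describe WebmasterID's actual export format.

```python
from collections import Counter, defaultdict


def allocation_by_day(exposures):
    """Return {day: {variant: share of that day's exposures}}.

    exposures is an iterable of (day, variant) pairs; hypothetical shape.
    """
    counts = defaultdict(Counter)
    for day, variant in exposures:
        counts[day][variant] += 1

    shares = {}
    for day in sorted(counts):
        total = sum(counts[day].values())
        shares[day] = {v: n / total for v, n in sorted(counts[day].items())}
    return shares
```

A share that drifts steadily toward one variant is the bandit doing its job. A share that swings sharply in the first days usually reflects thin evidence rather than a real difference.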

Common mistakes

Reading bandit arms as if they were a fixed 50/50 A/B test.
Using a bandit when you need a precise effect-size estimate.
Ignoring that early allocation is noisy before evidence accrues.

Privacy and accuracy notes

A bandit allocates by performance, not by identity. It can run on aggregate per-arm rates without storing personal data about who saw which variant.
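
As an illustration of allocation from aggregates alone, the sketch below uses Thompson sampling with a Beta posterior per arm. The choice of algorithm, the uniform Beta(1, 1) prior and the names `thompson_pick` and `record` are assumptions for the example; the only state kept is two counters per arm, with no user identifiers.

```python
import random


def thompson_pick(arm_counts, rng=random):
    """Pick an arm by sampling a plausible rate for each from its posterior.

    arm_counts maps arm -> (conversions, impressions): aggregates only.
    """
    best_arm, best_draw = None, -1.0
    for arm in sorted(arm_counts):
        conversions, impressions = arm_counts[arm]
        # Beta(1 + successes, 1 + failures), i.e. a uniform prior (assumed).
        draw = rng.betavariate(1 + conversions, 1 + impressions - conversions)
        if draw > best_draw:
            best_arm, best_draw = arm, draw
    return best_arm


def record(arm_counts, arm, converted):
    """Update the per-arm counters after one exposure; stores no identity."""
    conversions, impressions = arm_counts[arm]
    arm_counts[arm] = (conversions + int(converted), impressions + 1)
```

With few impressions the sampled rates vary widely, so every arm still gets traffic. As the counts grow, the draws concentrate and the better arm is picked more often.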

Sources and verification notes

Wikipedia, Multi-armed bandit: explore/exploit trade-off and regret minimisation.

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.