Contextual bandit optimisation
A contextual bandit extends the bandit idea by conditioning the choice of variant on context: features available at decision time, such as device or referrer. It learns a policy that maps context to the option most likely to convert, allowing per-segment personalisation. It inherits the inference caveats of any bandit and adds risks around the context features it uses. This page covers both.
Context changes the choice
A plain bandit asks 'which arm is best overall?'. A contextual bandit asks 'which arm is best given what I know about this visit?'. It uses features present at decision time to pick a variant, learning a mapping from context to action that can differ across segments.
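The mapping from context to action can be sketched as a small tabular policy. This is a minimal illustration, assuming an epsilon-greedy rule and one set of counters per context value; the class name, arm names, and context labels are hypothetical, and production systems often use other algorithms.

```python
import random
from collections import defaultdict


class ContextualEpsilonGreedy:
    """Tabular epsilon-greedy policy: separate arm statistics per context."""

    def __init__(self, arms, epsilon=0.1, seed=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # stats[context][arm] = [times shown, conversions]
        self.stats = defaultdict(lambda: {arm: [0, 0] for arm in self.arms})

    def rate(self, context, arm):
        shows, conversions = self.stats[context][arm]
        return conversions / shows if shows else 0.0

    def choose(self, context):
        # Explore with probability epsilon; otherwise pick the arm with the
        # best observed conversion rate for this context only.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda arm: self.rate(context, arm))

    def update(self, context, arm, converted):
        record = self.stats[context][arm]
        record[0] += 1
        record[1] += int(converted)


# Hypothetical demo; epsilon=0.0 only to make the result deterministic.
policy = ContextualEpsilonGreedy(["layout_a", "layout_b"], epsilon=0.0)
for _ in range(10):
    policy.update("mobile", "layout_a", True)
    policy.update("mobile", "layout_b", False)
    policy.update("desktop", "layout_a", False)
    policy.update("desktop", "layout_b", True)

mobile_choice = policy.choose("mobile")    # "layout_a"
desktop_choice = policy.choose("desktop")  # "layout_b"
```

Keeping statistics per context is what lets the two segments settle on different variants; a plain bandit would hold a single table shared by all visits.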
Per-segment, not one winner
The payoff is that no single variant has to win everywhere. A layout that converts well on mobile and another that converts well on desktop can each be served to the audience it suits. The cost is added complexity and the same adaptive-allocation inference caveats as any bandit, repeated in every context, where each segment's estimate also rests on less data.
- Conditions on visit context, not just arm performance
- Enables per-segment best options
- Inference is harder than a fixed A/B test
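To make the inference point concrete, the sketch below compares a naive conversion rate with an inverse-propensity-scored (IPS) estimate on made-up logs. IPS is one common correction for adaptive allocation and is not something this page prescribes; it assumes each decision was logged with the probability the policy had of showing that variant. All numbers and names are hypothetical.

```python
def naive_rate(logs, target_arm):
    """Conversion rate among visits that happened to be shown target_arm."""
    outcomes = [converted for arm, _, converted in logs if arm == target_arm]
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def ips_value(logs, target_arm):
    """IPS estimate of the conversion rate if target_arm were shown to everyone.

    Each log entry is (arm_shown, propensity, converted), where propensity is
    the probability the policy assigned to arm_shown at decision time.
    """
    if not logs:
        raise ValueError("no logged decisions")
    total = 0.0
    for arm_shown, propensity, converted in logs:
        if arm_shown == target_arm:
            total += converted / propensity
    return total / len(logs)


# Hypothetical logs: 100 mobile and 100 desktop visits.
# Arm A converts at 20% on mobile and 10% on desktop (15% overall),
# but the policy shows A to 90% of mobile and only 10% of desktop visits.
logs = []
logs += [("A", 0.9, 1)] * 18 + [("A", 0.9, 0)] * 72 + [("B", 0.1, 0)] * 10
logs += [("A", 0.1, 1)] * 1 + [("A", 0.1, 0)] * 9 + [("B", 0.9, 0)] * 90

print(round(naive_rate(logs, "A"), 3))  # 0.19, inflated by the mobile-heavy mix
print(round(ips_value(logs, "A"), 3))   # 0.15, matches the overall rate
```

The naive figure overstates arm A because the policy mostly showed it where it already performed best, which is why such numbers should not be read as fixed-split A/B estimates.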
Choose context features carefully
The features you feed a contextual bandit determine both its power and its privacy posture. Coarse, first-party signals — device class, traffic source, page type — keep it privacy-safe. Reaching for identity-grade or fingerprinting features to gain accuracy is a poor trade and not recommended.
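One way to enforce this is an explicit allowlist applied before any feature reaches the policy. The sketch below is illustrative only; the feature names, allowed values, and the `coarse_context` helper are assumptions, not a WebmasterID API.

```python
# Hypothetical allowlist: only coarse, first-party features and values.
ALLOWED_FEATURES = {
    "device_class": {"mobile", "tablet", "desktop"},
    "traffic_source": {"search", "social", "referral", "direct", "email"},
    "page_type": {"home", "product", "article", "checkout"},
}


def coarse_context(raw):
    """Reduce a raw request dict to allowlisted features only.

    Features not in the allowlist (user IDs, IP addresses, fingerprint
    hashes) are dropped; unknown values are bucketed as "other".
    """
    context = []
    for name in sorted(ALLOWED_FEATURES):
        value = raw.get(name)
        if not isinstance(value, str) or value not in ALLOWED_FEATURES[name]:
            value = "other"
        context.append((name, value))
    return tuple(context)


raw_request = {
    "device_class": "mobile",
    "page_type": "product",
    "traffic_source": "newsletter-2024-campaign-17",  # too specific: bucketed
    "user_id": "u-12345",                             # identity-grade: dropped
    "canvas_hash": "9f2c...",                         # fingerprinting: dropped
}
context = coarse_context(raw_request)
# (("device_class", "mobile"), ("page_type", "product"), ("traffic_source", "other"))
```

Returning a tuple keeps the context hashable, so it can be used directly as a lookup key, and the small fixed vocabulary bounds how many distinct segments the policy can ever form.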
How it appears in analytics and logs
Different segments receiving different variants is the intended behaviour, not an inconsistency. The policy is conditioning on the context features it was given.
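A quick way to confirm this in logs is to tabulate variant shares per segment. The snippet below is a rough sketch that assumes decisions are available as (segment, variant) pairs; the data and function name are hypothetical.

```python
from collections import Counter, defaultdict


def allocation_by_segment(decisions):
    """Return, for each segment, the share of visits given each variant."""
    counts = defaultdict(Counter)
    for segment, variant in decisions:
        counts[segment][variant] += 1
    return {
        segment: {variant: n / sum(c.values()) for variant, n in c.items()}
        for segment, c in counts.items()
    }


# Hypothetical decision log.
decisions = [("mobile", "A")] * 3 + [("mobile", "B")] * 1 + [("desktop", "B")] * 4
shares = allocation_by_segment(decisions)
# {"mobile": {"A": 0.75, "B": 0.25}, "desktop": {"B": 1.0}}
```

Uneven shares across segments, as in this example, are what a working per-context policy is expected to produce.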
Diagnostic use case
Use a contextual bandit when the best variant plausibly differs by segment, and you want allocation to adapt per-context rather than crowning one global winner.
What WebmasterID can help detect
WebmasterID's first-party dimensions (device class, referrer, page) supply coarse, privacy-safe context a bandit can condition on without cross-site identifiers.
Common mistakes
- Feeding the policy identity-grade or fingerprinting features.
- Expecting one global winner when the policy is per-segment.
- Treating contextual results as clean fixed-split A/B estimates.
Privacy and accuracy notes
Context should use coarse, non-identifying signals. Building a policy on fingerprinting-grade features is not endorsed; coarse first-party context is the safer choice for privacy.
Related pages
- Multi-armed bandit testing
A multi-armed bandit is an adaptive allocation strategy that sends more traffic to variants that look better as data accumulates, instead of a fixed split. It minimises 'regret' — lost conversions from showing inferior options — but the moving allocation complicates classic inference. Bandits suit ongoing optimisation more than one-off learning. This page explains the trade-off honestly.
- Segmentation for conversion analysis
Segmentation divides visitors into groups — by source, device, geography, or behaviour — so you can compare conversion within comparable cohorts. A single blended conversion rate can hide that one segment converts well and another barely at all. The discipline is choosing segments that answer a question without slicing so finely that each group becomes noise.
- Interaction effects between changes
An interaction effect occurs when the combined impact of two changes is not simply the sum of their individual impacts — one change alters how the other performs. Interactions matter when several experiments run on the same page at once, and they are the core reason multivariate testing exists. This page explains interactions and how concurrent tests can collide.
- Privacy-first analytics
Coarse, first-party context without fingerprinting.
Sources and verification notes
- Wikipedia: Multi-armed bandit (contextual bandit). Context-conditioned policy learning.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.