Conversion & funnels

Contextual bandit optimisation

A contextual bandit extends the bandit idea by conditioning the choice of variant on context: features available at decision time, such as device or referrer. It learns a policy that maps each context to the option most likely to convert, which allows per-segment personalisation. It carries the same inference caveats as a plain bandit, plus privacy risks tied to the context features it uses. This page covers both.

Partially verified

Context changes the choice

A plain bandit asks 'which arm is best overall?'. A contextual bandit asks 'which arm is best given what I know about this visit?'. It uses features present at decision time to pick a variant, learning a mapping from context to action that can differ across segments.
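Below is a minimal sketch of that mapping. It assumes context has already been reduced to one coarse, discrete label such as a device class. Each (context, arm) pair keeps its own Beta-Bernoulli posterior, and the policy runs Thompson sampling within the visit's context. The class name, the Beta(1, 1) prior and the independent-per-context design are illustrative assumptions, not a prescribed implementation. Many production contextual bandits (LinUCB, for example) instead share information across contexts through a model of the features.

```python
import random
from collections import defaultdict


class PerContextThompson:
    """Hypothetical contextual bandit: an independent Beta-Bernoulli
    Thompson sampler for each discrete context value."""

    def __init__(self, arms, seed=None):
        self.arms = list(arms)
        self.rng = random.Random(seed)
        # (context, arm) -> [alpha, beta]; starts at a uniform Beta(1, 1) prior.
        self.params = defaultdict(lambda: [1, 1])

    def choose(self, context):
        # Sample a plausible conversion rate for every arm in this context
        # and serve the arm with the highest sample.
        draws = {
            arm: self.rng.betavariate(*self.params[(context, arm)])
            for arm in self.arms
        }
        return max(draws, key=draws.get)

    def update(self, context, arm, converted):
        # Only this context's posterior moves, so evidence from desktop
        # visits never changes what mobile visits are served.
        alpha_beta = self.params[(context, arm)]
        if converted:
            alpha_beta[0] += 1
        else:
            alpha_beta[1] += 1


# Usage: one decision and its outcome for a mobile visit.
bandit = PerContextThompson(["layout_a", "layout_b"], seed=42)
arm = bandit.choose("mobile")
bandit.update("mobile", arm, converted=True)
```

Keeping the posteriors separate per context makes the segment-specific behaviour easy to see. The trade-off is that each context must gather its own evidence, so the number of distinct contexts should stay small.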

Per-segment, not one winner

The payoff is that no single variant has to win everywhere. A layout that converts well on mobile and another that converts well on desktop can each be served to the audience it suits. The cost is added complexity, plus the adaptive-allocation inference caveats of any bandit, now multiplied across contexts.

Conditions on visit context, not just arm performance
Enables per-segment best options
Inference is harder than a fixed A/B test

Choose context features carefully

The features you feed a contextual bandit determine both its power and its privacy posture. Coarse, first-party signals (device class, traffic source, page type) keep it privacy-safe. Reaching for identity-grade or fingerprinting features to gain accuracy is a poor trade and is not recommended.
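One way to enforce this is to pass the policy only allowlisted fields with a small, fixed vocabulary, and to collapse anything else to "other". The sketch below assumes raw events arrive as dictionaries. The field names and value sets are hypothetical examples of the coarse signals named above, not a schema defined by WebmasterID or any library.

```python
# Hypothetical allowlist: only coarse, first-party signals reach the policy.
ALLOWED_VALUES = {
    "device_class": {"mobile", "tablet", "desktop"},
    "traffic_source": {"direct", "search", "social", "email", "referral"},
    "page_type": {"home", "product", "pricing", "blog"},
}


def build_context(raw_event):
    """Reduce a raw event dict to a coarse context tuple.

    Fields not on the allowlist (IP address, user-agent string, screen
    size, user ID, ...) are never read. Unexpected or missing values
    collapse to "other" so rare values cannot single out a visitor."""
    context = []
    for field, allowed in ALLOWED_VALUES.items():
        value = str(raw_event.get(field, "")).lower()
        context.append(value if value in allowed else "other")
    return tuple(context)
```

A small vocabulary also helps the policy itself. With three fields of a few values each, every context still sees enough traffic to learn from, whereas high-cardinality features would spread the data thinly.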

How it appears in analytics and logs

Different segments receiving different variants is the intended behaviour, not an inconsistency. The policy is conditioning on the context features it was given.
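Logging the context and the serving probability with each assignment makes this behaviour auditable. An analyst can group by context and confirm that each segment's variant mix is deliberate. The record layout below is an assumption for illustration. The propensity field matters because adaptive allocation changes how often each variant is shown, and later analysis needs that probability to correct for it. An epsilon-greedy policy gives the propensity directly; with Thompson sampling it has no closed form and would need to be estimated, for example from repeated posterior draws.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class DecisionRecord:
    """One logged bandit decision. Field names are hypothetical."""
    context: tuple       # coarse context the policy conditioned on
    arm: str             # variant actually served
    propensity: float    # probability the policy gave to that variant
    converted: bool = False


def to_log_line(record):
    # JSON has no tuple type, so json.dumps writes the context as a list.
    return json.dumps(asdict(record), sort_keys=True)


line = to_log_line(DecisionRecord(("mobile",), "layout_a", 0.9))
```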

Diagnostic use case

Use a contextual bandit when the best variant plausibly differs by segment, and you want allocation to adapt per-context rather than crowning one global winner.
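A quick way to check whether the best variant plausibly differs by segment is to compare each segment's leader with the overall leader. The sketch below uses raw observed rates and ignores sampling noise; a real check would add confidence intervals. Ideally it would also use data from a fixed-split test, because counts collected by a bandit are already skewed by its allocation. The counts in the example are hypothetical, not real data.

```python
def best_arms(stats):
    """stats maps (segment, arm) -> (conversions, visits).

    Returns ({segment: best_arm}, overall_best_arm), judged by raw
    observed conversion rate."""
    per_segment = {}
    totals = {}
    for (segment, arm), (conv, visits) in stats.items():
        rate = conv / visits
        best = per_segment.get(segment)
        if best is None or rate > best[1]:
            per_segment[segment] = (arm, rate)
        c, v = totals.get(arm, (0, 0))
        totals[arm] = (c + conv, v + visits)
    overall = max(totals, key=lambda a: totals[a][0] / totals[a][1])
    return {s: arm for s, (arm, _) in per_segment.items()}, overall


# Hypothetical counts: A leads overall (80/1500 vs 70/1500),
# yet B is clearly better on desktop (40/500 vs 20/500).
stats = {
    ("mobile", "A"): (60, 1000), ("mobile", "B"): (30, 1000),
    ("desktop", "A"): (20, 500), ("desktop", "B"): (40, 500),
}
per_segment, overall = best_arms(stats)
```

When the per-segment leaders disagree with the overall leader, as they do here, crowning one global winner would underserve at least one segment. That is the situation a contextual bandit is meant for.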

What WebmasterID can help detect

WebmasterID's first-party dimensions (device class, referrer, page) supply coarse, privacy-safe context a bandit can condition on without cross-site identifiers.

Common mistakes

Feeding the policy identity-grade or fingerprinting features.
Expecting one global winner when the policy is per-segment.
Treating contextual results as clean fixed-split A/B estimates.
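Adaptive allocation skews the naive per-variant conversion rate. A variant is over-represented in the contexts where the policy favoured it. If each decision logged the probability with which the served variant was chosen, and every variant kept a non-zero chance of being served, inverse propensity weighting (IPW) is one standard correction. The sketch below assumes records of (arm, propensity, converted) and estimates the rate the whole logged population would have had under one fixed variant. It is a generic off-policy technique, not a WebmasterID feature, and its variance grows quickly when propensities are small.

```python
def ipw_conversion_rate(records, target_arm):
    """Inverse-propensity-weighted estimate of the conversion rate if
    every logged visit had been served target_arm.

    Each record is (arm_served, propensity_of_that_arm, converted)."""
    if not records:
        raise ValueError("no records")
    total = 0.0
    for arm, propensity, converted in records:
        if arm == target_arm:
            if propensity <= 0:
                raise ValueError("propensity must be positive")
            # Up-weight outcomes that the policy was unlikely to produce.
            total += float(converted) / propensity
    # Divide by all records, not just the target arm's.
    return total / len(records)
```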

Privacy and accuracy notes

Context features should be coarse and non-identifying. Building a policy on fingerprinting-grade features is not endorsed; coarse first-party context is the privacy-safer choice.


Sources and verification notes

Wikipedia: Multi-armed bandit (contextual bandit). Context-conditioned policy learning.

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.