Interleaving experiments

Interleaving compares two ranking algorithms by merging their results into a single list shown to the same user, then crediting whichever ranker contributed the items that were clicked. Because each user sees both rankers' picks mixed into one list, the comparison happens within a user rather than between users. That removes between-user noise and makes interleaving far more sensitive than splitting users between two whole rankings, an advantage that is widely documented for search and recommendation evaluation.

Partially verified

Blend two rankings, read the clicks

Given ranker A and ranker B, interleaving constructs one result list by alternating or team-drafting items from each, tracking which ranker contributed each position. The user sees a single combined list. Clicks are then attributed to the contributing ranker, and the side that accrues more clicks is preferred. Because the comparison happens within each user's single session, it controls for the user-to-user variation that dominates a between-user split.

Merge both rankers' results into one list
Credit clicks to the contributing ranker
Within-user comparison cancels between-user noise
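
The merge-and-credit steps above can be sketched in Python. This is a minimal illustration of team-draft interleaving, not WebmasterID's implementation: the function names `team_draft_interleave` and `credit_clicks`, the `"A"`/`"B"` team labels, and the document IDs are all assumptions made for the example.

```python
import random
from collections import Counter


def team_draft_interleave(ranking_a, ranking_b, length, rng=None):
    """Merge two rankings into one list of (item, team) pairs.

    The team with fewer picks so far chooses next; ties are broken by a
    coin flip. Each team contributes its highest-ranked item that is not
    already in the merged list.
    """
    rng = rng or random.Random()
    sources = {"A": ranking_a, "B": ranking_b}
    picks = {"A": 0, "B": 0}
    used = set()
    interleaved = []

    def next_unused(ranking):
        for item in ranking:
            if item not in used:
                return item
        return None

    while len(interleaved) < length:
        if picks["A"] < picks["B"]:
            order = ["A", "B"]
        elif picks["B"] < picks["A"]:
            order = ["B", "A"]
        else:
            order = rng.sample(["A", "B"], 2)  # coin flip on ties

        for team in order:
            item = next_unused(sources[team])
            if item is not None:
                interleaved.append((item, team))
                used.add(item)
                picks[team] += 1
                break
        else:
            break  # both rankings are exhausted

    return interleaved


def credit_clicks(interleaved, clicked_items):
    """Return (clicks credited to A, clicks credited to B) for one session."""
    team_of = {item: team for item, team in interleaved}
    credit = Counter(team_of[item] for item in clicked_items if item in team_of)
    return credit["A"], credit["B"]


# Hypothetical rankings; the document IDs are placeholders.
merged = team_draft_interleave(
    ["d1", "d2", "d3", "d4"], ["d3", "d1", "d5", "d6"], length=4
)
print(merged)
print(credit_clicks(merged, clicked_items=[merged[0][0]]))
```

The user would see only the item order from `merged`; the team labels stay server-side so that clicks can be credited afterwards. If the team whose turn it is has no unused items left, this sketch lets the other team fill the slot, which is one possible design choice rather than a required part of the method.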

Sensitivity and limits

The within-user design is its strength: interleaving has been reported to reach reliable conclusions with far less traffic than equivalent A/B tests, which is why it is common for search and recommendation tuning. Its scope is the limit — it measures relative ranking preference via clicks, not downstream outcomes like conversion or revenue, so teams pair it with an A/B test on business metrics before launch.

Clicks are a proxy; confirm the winner improves the outcome that matters.

How it appears in analytics and logs

A consistent click preference for one ranker's contributed items indicates that users prefer that ranker, and the preference typically becomes detectable with less traffic than a between-user test would need.
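
One way to judge whether a click preference is consistent is to count per-session wins and apply a two-sided sign test. The sketch below assumes each logged session has already been reduced to a pair of click counts, one per ranker. The choice of a sign test, the practice of dropping tied sessions, and the function names are assumptions for illustration; the source does not prescribe a particular statistical test.

```python
from math import comb


def session_outcomes(sessions):
    """Count sessions won by A, won by B, and tied.

    `sessions` is an iterable of (clicks_a, clicks_b) pairs.
    """
    wins_a = wins_b = ties = 0
    for clicks_a, clicks_b in sessions:
        if clicks_a > clicks_b:
            wins_a += 1
        elif clicks_b > clicks_a:
            wins_b += 1
        else:
            ties += 1
    return wins_a, wins_b, ties


def sign_test_p_value(wins_a, wins_b):
    """Exact two-sided sign test p-value, ignoring tied sessions.

    Under the null hypothesis that neither ranker is preferred, each
    non-tied session is equally likely to be won by A or by B.
    """
    n = wins_a + wins_b
    if n == 0:
        return 1.0
    k = max(wins_a, wins_b)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical per-session click credits (clicks_a, clicks_b).
sessions = [(2, 0), (1, 0), (0, 1), (1, 1), (3, 1), (1, 0), (2, 1)]
wins_a, wins_b, ties = session_outcomes(sessions)
print(wins_a, wins_b, ties, sign_test_p_value(wins_a, wins_b))
```

A small p-value suggests the preference is unlikely to be chance, but it still describes click preference only, not conversion or revenue.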

Diagnostic use case

Use interleaving to compare two search or recommendation rankers quickly, when a small ranking improvement would be hard to detect with a user-split A/B test.

What WebmasterID can help detect

WebmasterID's first-party click and result-interaction events provide the signal that interleaving credits to each ranker.

Common mistakes

Treating a click-preference winner as proof of better conversion.
Using interleaving outside the ranking problems it was designed for.
Skipping a follow-up A/B test on the real business metric.

Privacy and accuracy notes

Interleaving compares rankers using aggregate click preference; no extra personal data beyond normal interaction logging is required.

Sources and verification notes

Chapelle, Joachims, Radlinski, Yue: Large-scale validation and analysis of interleaved search evaluation (ACM TOIS). Peer-reviewed validation of interleaving's sensitivity vs A/B.

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.