
Interleaving experiments

Interleaving compares two ranking algorithms by merging their results into a single list shown to the same user, then crediting whichever ranker contributed the items that were clicked. Because each user is exposed to both rankers' picks within that one merged list, the comparison is made within users, which removes between-user noise. In search and recommendation evaluation, interleaving is widely reported to be far more sensitive than splitting users between two whole rankings.

Partially verified

Blend two rankings, read the clicks

Given ranker A and ranker B, interleaving constructs one result list by alternating or team-drafting items from each, tracking which ranker contributed each position. The user sees a single combined list. Clicks are then attributed to the contributing ranker, and the side that accrues more clicks is preferred. Because the comparison happens within each user's single session, it controls for the user-to-user variation that dominates a between-user split.
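The merge-and-credit step described above can be sketched roughly as follows. This is a minimal illustration of team-draft interleaving, not WebmasterID's implementation; the function names `team_draft_interleave` and `credit_clicks`, the `length` parameter, and the document IDs are all hypothetical.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, length, rng=None):
    """Merge two rankings into one list (team-draft style).

    Returns (merged, teams), where teams[i] is "A" or "B", the ranker
    that contributed merged[i]. All names here are illustrative.
    """
    rng = rng or random.Random()
    sources = {"A": ranking_a, "B": ranking_b}
    picks = {"A": 0, "B": 0}
    merged, teams, seen = [], [], set()

    while len(merged) < length:
        # The ranker with fewer picks goes next; a coin flip breaks ties.
        if picks["A"] < picks["B"]:
            order = ["A", "B"]
        elif picks["B"] < picks["A"]:
            order = ["B", "A"]
        else:
            order = ["A", "B"] if rng.random() < 0.5 else ["B", "A"]

        placed = False
        for team in order:
            # Take this ranker's highest-ranked item not yet in the list.
            candidate = next((x for x in sources[team] if x not in seen), None)
            if candidate is not None:
                merged.append(candidate)
                teams.append(team)
                seen.add(candidate)
                picks[team] += 1
                placed = True
                break
        if not placed:
            break  # both rankings are exhausted

    return merged, teams


def credit_clicks(teams, clicked_positions):
    """Count clicks per contributing ranker for one session."""
    credit = {"A": 0, "B": 0}
    for pos in clicked_positions:
        credit[teams[pos]] += 1
    return credit


# Example: one session with clicks on positions 0 and 2 of the merged list.
merged, teams = team_draft_interleave(
    ["d1", "d2", "d3", "d4"], ["d3", "d1", "d5", "d6"], length=4,
    rng=random.Random(0),
)
session_credit = credit_clicks(teams, clicked_positions=[0, 2])
```

The coin flip on ties is what keeps either ranker from systematically owning the top position; without it, position bias would favor whichever ranker always picked first.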

Sensitivity and limits

The within-user design is its strength: interleaving has been reported to reach reliable conclusions with far less traffic than equivalent A/B tests, which is why it is common for search and recommendation tuning. Its scope is the limit — it measures relative ranking preference via clicks, not downstream outcomes like conversion or revenue, so teams pair it with an A/B test on business metrics before launch.

Clicks are a proxy; confirm the winner improves the outcome that matters.

How it appears in analytics and logs

In click logs, a consistent preference across many sessions for the items one ranker contributed indicates that ranker is preferred. The signal typically emerges with less traffic than a between-user test needs.
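One way to judge whether a logged preference is consistent rather than noise is to count per-session wins and apply a two-sided sign test. The sketch below assumes each session has already been reduced to a per-ranker click count; the `{"A": ..., "B": ...}` shape and the function names are hypothetical, and the sign test is one reasonable choice of test rather than a method this page prescribes.

```python
from math import comb


def session_winner(credit):
    """Return "A", "B", or "tie" for one session's click credit."""
    if credit["A"] > credit["B"]:
        return "A"
    if credit["B"] > credit["A"]:
        return "B"
    return "tie"


def sign_test_p_value(wins_a, wins_b):
    """Two-sided exact binomial p-value.

    Null hypothesis: each non-tied session is equally likely to
    favor either ranker. Ties are excluded before calling this.
    """
    n = wins_a + wins_b
    if n == 0:
        return 1.0
    k = min(wins_a, wins_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def summarize(sessions):
    """Aggregate per-session credit dicts into wins, ties, and a p-value."""
    outcomes = [session_winner(c) for c in sessions]
    wins_a = outcomes.count("A")
    wins_b = outcomes.count("B")
    return {
        "wins_a": wins_a,
        "wins_b": wins_b,
        "ties": outcomes.count("tie"),
        "p_value": sign_test_p_value(wins_a, wins_b),
    }


# Example with made-up sessions: 9 favor A, 1 favors B, 3 are tied.
sessions = [{"A": 2, "B": 0}] * 9 + [{"A": 0, "B": 1}] + [{"A": 1, "B": 1}] * 3
report = summarize(sessions)
```

Counting each session as a single win, rather than summing raw clicks, keeps one heavy-clicking user from dominating the result.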

Diagnostic use case

Use interleaving to compare two search or recommendation rankers quickly, when a small ranking improvement would be hard to detect with a user-split A/B test.

What WebmasterID can help detect

WebmasterID's first-party click and result-interaction events provide the signal that interleaving credits to each ranker.

Common mistakes

Privacy and accuracy notes

Interleaving compares rankers using aggregate click preference; no extra personal data beyond normal interaction logging is required.


Sources and verification notes

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.