Holdout groups

A holdout group is a randomly chosen set of users who are intentionally excluded from one or more shipped changes, so their behaviour serves as a long-run baseline. Where an A/B test measures one change briefly, a holdout measures the combined, sustained effect of everything launched, guarding against the slow accumulation of small regressions or overstated wins.

What this means

A holdout group is held back from receiving changes — sometimes a single feature, sometimes the whole stream of launches — for an extended period. Everyone else gets the new experiences. The holdout's metrics become the counterfactual: what the world would look like had you shipped nothing. Comparing the treated population to the holdout estimates the real, accumulated impact.
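One common way to keep the same users held back for the whole period is to derive membership from a hash of the user ID rather than storing a list. The sketch below is only an illustration under assumptions: the 5% share, the salt string, and the function name `in_holdout` are hypothetical and do not come from any particular product.

```python
import hashlib

HOLDOUT_PERCENT = 5.0      # hypothetical share of users held back
SALT = "holdout-period-1"  # hypothetical; keep fixed for the life of the holdout


def in_holdout(user_id: str, percent: float = HOLDOUT_PERCENT, salt: str = SALT) -> bool:
    """Return True if the user falls in the holdout.

    Hashing salt + user ID gives a stable, effectively random bucket,
    so the same user gets the same answer on every visit.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 10_000  # bucket in 0..9999
    return bucket < percent * 100          # e.g. 5% -> buckets 0..499


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(20_000)]
    held_back = sum(in_holdout(u) for u in users)
    print(f"{held_back} of {len(users)} users are in the holdout")
```

Changing the salt reshuffles membership, so a new salt would normally be used only when starting a fresh holdout period.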

Why short tests are not enough

Individual A/B tests are short and measured at launch, when novelty effects and optimistic interpretation can inflate the measured lift. Ship dozens and the headline wins rarely sum to the expected total: some effects decay, some interact, some were noise. A long-running holdout reveals the true aggregate by keeping a clean baseline untouched by the changes.
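The comparison between headline wins and the holdout result can be made concrete with a little arithmetic. The sketch below assumes each test reported a relative lift and that lifts would compound multiplicatively if they all held up; both the compounding model and the function names are assumptions for illustration, and the numbers are made up.

```python
from math import prod


def compounded_lift(lifts):
    """Total relative lift expected if every per-test lift held and compounded."""
    return prod(1 + lift for lift in lifts) - 1


def realized_share(test_lifts, holdout_rate, treated_rate):
    """Compare the lift the tests promised with the lift the holdout shows.

    Returns (expected, observed, observed / expected).
    """
    expected = compounded_lift(test_lifts)
    observed = treated_rate / holdout_rate - 1
    return expected, observed, observed / expected


if __name__ == "__main__":
    # Hypothetical figures: three launch-time wins and two long-run conversion rates.
    expected, observed, share = realized_share(
        test_lifts=[0.02, 0.03, 0.01],
        holdout_rate=0.0400,
        treated_rate=0.0412,
    )
    print(f"expected {expected:.1%}, observed {observed:.1%}, realized share {share:.0%}")
```

A realized share well below 100% is the pattern the paragraph above describes: wins that looked good at launch but did not fully persist or combine.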

The cost is real — holdout users miss improvements, and the group must be large enough and maintained long enough to detect the cumulative effect — so teams size and time-box holdouts deliberately.
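To get a feel for the sizing trade-off, the sketch below estimates the smallest absolute difference in conversion rate that a holdout of a given size could reliably detect. It uses the standard normal approximation for comparing two proportions; the function name, the default 5% significance level, and the 80% power are assumptions chosen for illustration, not recommendations.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_difference(baseline_rate, n_holdout, n_treated,
                                  alpha=0.05, power=0.80):
    """Approximate smallest absolute rate difference detectable (two-sided test).

    Normal approximation; assumes both groups convert near baseline_rate.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    std_err = sqrt(baseline_rate * (1 - baseline_rate)
                   * (1 / n_holdout + 1 / n_treated))
    return (z_alpha + z_power) * std_err


if __name__ == "__main__":
    # Hypothetical: 4% baseline conversion, 1,000,000 users in total.
    for holdout_share in (0.01, 0.05, 0.10):
        n_h = int(1_000_000 * holdout_share)
        n_t = 1_000_000 - n_h
        mdd = minimum_detectable_difference(0.04, n_h, n_t)
        print(f"holdout {holdout_share:.0%}: detectable difference ~{mdd:.4f}")
```

Because the holdout is the smaller group, its size dominates the result: growing the treated group further barely helps, while growing the holdout costs more users the improvements.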

Random users kept on the old experience as a baseline
Measures cumulative, long-run impact of shipped changes
Costs holdout users the benefit of improvements in the meantime

How it appears in analytics and logs

A persistent gap between the holdout baseline and the treated population, larger than random variation would explain, is the durable, cumulative effect of your shipped changes. A shrinking or absent gap warns that short-term test wins did not add up.
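Whether a gap is real or just noise can be checked with a confidence interval on the difference in conversion rates. The following is a minimal sketch using a normal-approximation (Wald) interval; the function name and the counts are hypothetical, and a production analysis may call for a more careful method.

```python
from math import sqrt
from statistics import NormalDist


def conversion_gap(holdout_conversions, holdout_users,
                   treated_conversions, treated_users, confidence=0.95):
    """Return (gap, lower, upper) for treated rate minus holdout rate.

    Uses a normal-approximation interval for the difference of two proportions.
    """
    p_holdout = holdout_conversions / holdout_users
    p_treated = treated_conversions / treated_users
    gap = p_treated - p_holdout
    std_err = sqrt(p_holdout * (1 - p_holdout) / holdout_users
                   + p_treated * (1 - p_treated) / treated_users)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return gap, gap - z * std_err, gap + z * std_err


if __name__ == "__main__":
    # Hypothetical counts for one reporting window.
    gap, low, high = conversion_gap(2_000, 50_000, 41_800, 950_000)
    print(f"gap {gap:.4f}, {low:.4f} to {high:.4f}")
```

An interval that stays above zero across successive windows is consistent with a durable effect; one that drifts toward or straddles zero matches the warning sign described above.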

Diagnostic use case

Reserve a holdout when you want to verify that the sum of many shipped experiments actually moved the business over months, not just that each looked good in isolation at launch.

What WebmasterID can help detect

WebmasterID measures first-party conversion and engagement over long windows, which is what comparing a holdout baseline to the treated group requires.

Common mistakes

Assuming launch-time test wins simply add up over time.
Making the holdout too small to detect the cumulative effect.
Contaminating the holdout by leaking changes into it.

Privacy and accuracy notes

Holdouts are defined by random assignment and analysed in aggregate, not by profiling individuals. This page is educational, not statistical advice.

Sources and verification notes

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.