Stratification in experiments
Stratification splits the population into subgroups (strata) such as device, country, or new-vs-returning, then randomises within each so every variant gets a balanced share of each stratum. This prevents chance imbalance on a known high-variance dimension and, when the stratifying variable predicts the outcome, lowers the variance of the overall effect estimate. It is a variance-reduction technique that sits alongside CUPED.
Balanced assignment within strata
Pure randomisation balances subgroups only in expectation; in any single run a high-variance dimension can end up skewed between variants by chance. Stratified (block) randomisation removes that risk by assigning within each stratum, guaranteeing each variant gets a proportional slice of every subgroup. This is most valuable when the stratifying variable both varies a lot and correlates with conversion.
- Randomise within strata, not just across the whole population
- Guarantees balance on the chosen dimension
- Most useful when the dimension predicts the outcome
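Randomising within strata can be sketched in a few lines. This is a minimal illustration, not a production assigner: it assumes all units are known up front (live traffic would need per-stratum blocks filled as users arrive), and the names `stratified_assign` and `stratum_of` are hypothetical.

```python
import random
from collections import defaultdict


def stratified_assign(units, stratum_of, variants=("A", "B"), seed=0):
    """Assign units to variants by shuffling and dealing within each stratum.

    units      -- hashable unit ids (assumed known before assignment)
    stratum_of -- callable mapping a unit id to its stratum label
    Returns a dict {unit: variant}.
    """
    rng = random.Random(seed)

    # Group units by stratum.
    by_stratum = defaultdict(list)
    for unit in units:
        by_stratum[stratum_of(unit)].append(unit)

    assignment = {}
    for members in by_stratum.values():
        rng.shuffle(members)
        # Shuffle the variant order too, so no variant systematically
        # receives the leftover unit when the stratum size is uneven.
        order = list(variants)
        rng.shuffle(order)
        # Deal units round-robin: counts per variant differ by at most one.
        for i, unit in enumerate(members):
            assignment[unit] = order[i % len(order)]
    return assignment


# Illustrative data: 100 users, roughly one third on desktop.
users = {f"u{i}": ("desktop" if i % 3 == 0 else "mobile") for i in range(100)}
assignment = stratified_assign(list(users), users.get, seed=7)
```

Within every stratum the variant counts differ by at most one unit, which is the balance guarantee that pure randomisation only provides in expectation.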
Variance reduction and analysis
When the stratifying variable predicts the outcome, estimating the effect within strata and pooling (a stratified estimate) removes the between-stratum noise, narrowing the confidence interval for the same data. The catch is that strata must be defined before assignment and not be so fine that cells become tiny or identifying. Analyse with the same strata you assigned on.
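The stratified estimate described above can be sketched as follows, next to an unstratified estimate for comparison. This is a simplified illustration under stated assumptions: the outcome is a numeric metric, each stratum is weighted by its share of all units, every cell holds at least two observations, and the function names and sample numbers are made up for the example.

```python
from math import sqrt
from statistics import mean, variance


def stratified_effect(data):
    """Pool per-stratum differences in means, weighted by stratum share.

    data -- {stratum: {"control": [...], "treatment": [...]}}
    Returns (estimate, standard_error). Each list needs >= 2 values,
    because the sample variance is undefined for a single observation.
    """
    total = sum(len(a["control"]) + len(a["treatment"]) for a in data.values())
    estimate, var = 0.0, 0.0
    for arms in data.values():
        c, t = arms["control"], arms["treatment"]
        weight = (len(c) + len(t)) / total
        estimate += weight * (mean(t) - mean(c))
        var += weight ** 2 * (variance(t) / len(t) + variance(c) / len(c))
    return estimate, sqrt(var)


def unstratified_effect(data):
    """Ignore strata: a plain difference in means over all units."""
    c = [x for a in data.values() for x in a["control"]]
    t = [x for a in data.values() for x in a["treatment"]]
    se = sqrt(variance(t) / len(t) + variance(c) / len(c))
    return mean(t) - mean(c), se


# Made-up data: desktop users have a much higher baseline than mobile users.
data = {
    "desktop": {"control": [10, 11, 9, 10], "treatment": [11, 12, 10, 11]},
    "mobile": {"control": [2, 3, 1, 2], "treatment": [3, 4, 2, 3]},
}
strat_est, strat_se = stratified_effect(data)
naive_est, naive_se = unstratified_effect(data)
print(f"stratified: {strat_est:.2f} +/- {strat_se:.2f}")
print(f"unstratified: {naive_est:.2f} +/- {naive_se:.2f}")
```

On this toy data both approaches give the same point estimate, but the stratified standard error is much smaller because the large desktop-versus-mobile baseline gap no longer counts as noise.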
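Stratification and CUPED both cut variance and can be combined: stratification balances a categorical dimension at assignment, while CUPED adjusts the metric at analysis time using a covariate measured before the test.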
How it appears in analytics and logs
Balanced strata across variants reduce the risk that an observed difference is really a subgroup-mix difference rather than a treatment effect.
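One way to check this in assignment logs is to compare the share of each stratum inside each variant. The sketch below assumes the log can be reduced to (variant, stratum) pairs; the function names and the example counts are illustrative, not a WebmasterID API.

```python
from collections import Counter, defaultdict


def stratum_mix(rows):
    """Return {variant: {stratum: share of that variant's units}}."""
    counts = defaultdict(Counter)
    for variant, stratum in rows:
        counts[variant][stratum] += 1
    return {
        variant: {s: n / sum(c.values()) for s, n in c.items()}
        for variant, c in counts.items()
    }


def max_mix_gap(mix):
    """Largest difference in any stratum's share between variants."""
    strata = {s for shares in mix.values() for s in shares}
    return max(
        max(shares.get(s, 0.0) for shares in mix.values())
        - min(shares.get(s, 0.0) for shares in mix.values())
        for s in strata
    )


# Illustrative log: variant A skews towards mobile, variant B does not.
rows = (
    [("A", "mobile")] * 60 + [("A", "desktop")] * 40
    + [("B", "mobile")] * 50 + [("B", "desktop")] * 50
)
mix = stratum_mix(rows)
gap = max_mix_gap(mix)
```

A gap near zero means the variants share the same subgroup mix. A large gap means part of any observed difference may come from the mix rather than the treatment.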
Diagnostic use case
Stratify on a variable strongly tied to the outcome (e.g. device) so variants are balanced on it and the effect estimate is more precise.
What WebmasterID can help detect
WebmasterID's first-party dimensions (device, source, returning) supply coarse strata for balanced assignment and analysis.
Common mistakes
- Choosing strata after seeing results (fishing). Post-stratifying at analysis time is only defensible when the strata were specified in advance.
- Using strata so granular that cells are tiny or identifying.
- Stratifying then ignoring the strata in the analysis.
Privacy and accuracy notes
Stratify on coarse, aggregate attributes; avoid strata so granular they could single out an individual.
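A simple guard is to fold rare stratum labels into a catch-all bucket before assignment. The sketch below is one possible approach, not a prescribed method: the `min_cell` threshold and the "other" label are assumptions to tune for your own traffic and privacy requirements.

```python
from collections import Counter


def coarsen_strata(labels, min_cell=100, other="other"):
    """Replace stratum labels seen fewer than min_cell times with `other`.

    labels -- one stratum label per unit, in unit order
    Returns a new list of labels of the same length.
    """
    counts = Counter(labels)
    return [lab if counts[lab] >= min_cell else other for lab in labels]


# Illustrative labels with a tiny threshold so the effect is visible.
labels = ["mobile"] * 5 + ["desktop"] * 4 + ["tv"] + ["console"]
coarse = coarsen_strata(labels, min_cell=3)
```

The catch-all bucket can itself end up below the threshold, so it is worth checking its size and excluding it from stratum-level reporting if it is still too small.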
Related pages
- CUPED variance reduction
CUPED (Controlled-experiment Using Pre-Experiment Data) reduces the variance of an experiment metric by adjusting it with a covariate measured before the test — typically each user's own pre-period behaviour. Because the covariate is independent of the treatment, the adjustment removes noise without introducing bias, so confidence intervals narrow and tests reach a decision with less traffic.
- Pitfalls of segmenting test results
Segmenting experiment results — by device, country, source — is useful, but slicing a non-significant test until some segment 'wins' is a recipe for false positives. Each extra segment is another comparison; enough slices guarantee a spurious hit. Legitimate segment analysis is pre-planned or corrected for multiplicity. This page separates honest segmentation from data dredging.
- Sample ratio mismatch (SRM)
Sample ratio mismatch (SRM) is when the observed allocation of users to experiment arms diverges from the planned ratio by more than chance allows — for example a 50/50 test that lands far from 50/50. It signals a bug in assignment, logging, or filtering, and a test with SRM should not be trusted regardless of how good the headline result looks.
- Event Explorer
First-party dimensions usable as experiment strata.
Sources and verification notes
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.