Variance reduction overview
Variance reduction is a family of techniques that make an experiment more sensitive by lowering the variance of its effect estimate, narrowing confidence intervals so that a true effect is detected with less traffic. Done correctly, it changes precision, not the expected effect, so it introduces no bias. The main methods (CUPED, stratification, and covariate adjustment) all exploit information that predicts the outcome but is independent of the treatment.
Why reduce variance
The sample size needed to detect a given effect scales linearly with the metric's variance: a noisier metric needs more traffic for the same power and minimum detectable effect. Variance reduction attacks the noise directly, so the same data yields a tighter estimate. The crucial property is that it must not change the expected effect; these are sensitivity techniques, not ways to manufacture a result. They borrow predictive information that is independent of the treatment.
- Required sample size scales with variance
- Tighter estimate from the same data
- Must not shift the expected effect (no bias)
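The scaling in the list above can be made concrete with the standard normal-approximation formula for a two-sided test on a difference in means. This is a minimal sketch: the function name `required_n_per_arm`, the variance values, and the minimum effect are illustrative assumptions, not figures from this page.

```python
from statistics import NormalDist

def required_n_per_arm(variance, min_effect, alpha=0.05, power=0.8):
    """Approximate users per arm for a two-sided z-test on a difference in means.

    Assumes equal arm sizes and the same metric variance in both arms.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    return 2 * (z_alpha + z_beta) ** 2 * variance / min_effect ** 2

# Hypothetical numbers: same minimum detectable effect, variance halved.
baseline = required_n_per_arm(variance=4.0, min_effect=0.1)
reduced = required_n_per_arm(variance=2.0, min_effect=0.1)

print(round(baseline), round(reduced))
```

Because the formula is linear in the variance, halving the variance halves the traffic needed per arm for the same power and minimum detectable effect.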
The main methods
CUPED adjusts the metric with a pre-experiment covariate, removing noise the treatment could not have caused. Stratification balances assignment within predictive subgroups and pools the per-stratum estimates. General covariate adjustment (regression with pre-treatment covariates) generalises the idea. All require that the covariate or strata be fixed before assignment so they are independent of treatment; violate that and you reintroduce bias. The methods can be combined, and for ratio metrics they should be paired with a delta-method variance estimate.
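As a rough illustration of the CUPED adjustment described above, the sketch below simulates a metric that is strongly predicted by a pre-period covariate and compares the raw and adjusted estimates. Everything here is an assumption for demonstration: the synthetic data, the helper names, the true effect of 0.5, and the choice to estimate the coefficient `theta` on data pooled across both arms.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def cuped_adjust(y, x, theta, x_mean):
    """Subtract the part of the metric explained by the pre-period covariate."""
    return [yi - theta * (xi - x_mean) for yi, xi in zip(y, x)]

rng = random.Random(0)
n = 2000            # users per arm (hypothetical)
true_effect = 0.5   # hypothetical lift applied to the treatment arm

# Pre-period covariate: drawn before assignment, so identical in distribution across arms.
pre_c = [rng.gauss(10, 3) for _ in range(n)]
pre_t = [rng.gauss(10, 3) for _ in range(n)]
# In-experiment metric: pre-period behaviour plus noise, plus the lift for treatment.
y_c = [x + rng.gauss(0, 1) for x in pre_c]
y_t = [x + true_effect + rng.gauss(0, 1) for x in pre_t]

# theta = Cov(Y, X) / Var(X), estimated on the pooled data.
x_all, y_all = pre_c + pre_t, y_c + y_t
theta = covariance(x_all, y_all) / variance(x_all)
x_mean = mean(x_all)

adj_c = cuped_adjust(y_c, pre_c, theta, x_mean)
adj_t = cuped_adjust(y_t, pre_t, theta, x_mean)

raw_effect = mean(y_t) - mean(y_c)
cuped_effect = mean(adj_t) - mean(adj_c)
raw_var = variance(y_t) / n + variance(y_c) / n
cuped_var = variance(adj_t) / n + variance(adj_c) / n

print(f"raw:   effect={raw_effect:.3f}  var={raw_var:.5f}")
print(f"cuped: effect={cuped_effect:.3f}  var={cuped_var:.5f}")
```

In this setup the covariate explains most of the metric's variance, so the adjusted estimator's variance is a small fraction of the raw one while both estimates target the same effect. Real pre-period covariates are usually less predictive, and the gain shrinks accordingly.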
None of these is a substitute for adequate sample size; they stretch it.
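Stratification can be sketched in the same spirit. The example below assumes users were randomised within two hypothetical strata ("mobile" and "desktop") with very different baseline levels, then compares a stratum-weighted estimate with a naive pooled one. The data, stratum names, lift of 0.3, and function names are all illustrative assumptions.

```python
import random
from collections import defaultdict

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def stratified_effect(rows):
    """rows: (stratum, arm, value) tuples with arm in {"control", "treatment"}.

    Returns the stratum-weighted difference in means and its variance,
    weighting each stratum by its share of all users.
    """
    groups = defaultdict(list)
    for stratum, arm, value in rows:
        groups[(stratum, arm)].append(value)
    effect, var = 0.0, 0.0
    for stratum in sorted({s for s, _ in groups}):
        t = groups[(stratum, "treatment")]
        c = groups[(stratum, "control")]
        weight = (len(t) + len(c)) / len(rows)
        effect += weight * (mean(t) - mean(c))
        var += weight ** 2 * (variance(t) / len(t) + variance(c) / len(c))
    return effect, var

rng = random.Random(0)
rows = []
# Randomisation within each stratum: 1000 users per arm per stratum.
for stratum, base in (("mobile", 2.0), ("desktop", 8.0)):
    for arm, lift in (("control", 0.0), ("treatment", 0.3)):
        rows += [(stratum, arm, rng.gauss(base + lift, 1.0)) for _ in range(1000)]

strat_effect, strat_var = stratified_effect(rows)

# Naive analysis that ignores the strata.
t_all = [v for _, a, v in rows if a == "treatment"]
c_all = [v for _, a, v in rows if a == "control"]
naive_effect = mean(t_all) - mean(c_all)
naive_var = variance(t_all) / len(t_all) + variance(c_all) / len(c_all)

print(f"naive:      effect={naive_effect:.3f}  var={naive_var:.5f}")
print(f"stratified: effect={strat_effect:.3f}  var={strat_var:.5f}")
```

With perfectly balanced strata the two point estimates coincide; only the variance differs, because the between-stratum spread no longer counts as noise.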
How it appears in analytics and logs
After variance reduction, narrower intervals reflect lower estimator variance, not a bigger effect; the point estimate should be unchanged in expectation.
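For a CUPED-style adjustment, the reason the interval narrows while the expected point estimate stays put can be written out in two lines. The notation is an assumption introduced here, not taken from this page: Y is the in-experiment metric, X the pre-period covariate, subscripts T and C denote the treatment and control arms, θ is a fixed adjustment coefficient, and ρ is the correlation between Y and X.

```latex
\begin{aligned}
\hat{\Delta}_{\text{adj}}
  &= (\bar{Y}_T - \bar{Y}_C) - \theta\,(\bar{X}_T - \bar{X}_C), \\
\mathbb{E}\big[\hat{\Delta}_{\text{adj}}\big]
  &= \mathbb{E}\big[\bar{Y}_T - \bar{Y}_C\big]
     - \theta\,\underbrace{\mathbb{E}\big[\bar{X}_T - \bar{X}_C\big]}_{=\,0\ \text{under randomisation}}
   = \mathbb{E}\big[\bar{Y}_T - \bar{Y}_C\big], \\
\operatorname{Var}(Y - \theta X)\Big|_{\theta = \operatorname{Cov}(Y, X)/\operatorname{Var}(X)}
  &= \operatorname{Var}(Y)\,\bigl(1 - \rho^{2}\bigr).
\end{aligned}
```

The second line holds only because X is measured before assignment, so its mean is the same in both arms in expectation; the third line shows that the variance shrinks by the factor 1 − ρ², which is why a more predictive covariate gives a narrower interval.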
Diagnostic use case
Apply variance reduction when traffic is the binding constraint, to reach decisions sooner without inflating false positives or biasing the estimate.
What WebmasterID can help detect
WebmasterID's first-party pre-period metrics and dimensions supply the covariates and strata these techniques rely on.
Common mistakes
- Using a covariate influenced by the treatment, reintroducing bias.
- Expecting variance reduction to change the effect estimate, not just precision.
- Treating it as a replacement for adequate sample size.
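The first mistake in the list above is easy to reproduce in simulation. In this sketch the treatment raises in-experiment activity, which in turn drives the outcome; adjusting with the pre-period covariate recovers the effect, while adjusting with the in-experiment covariate removes it. The data-generating process, variable names, and the true effect of 1.0 are hypothetical.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def adjusted_effect(y_t, y_c, x_t, x_c):
    """Difference in means after a CUPED-style adjustment with covariate x."""
    x_all, y_all = x_t + x_c, y_t + y_c
    mx, my = mean(x_all), mean(y_all)
    cov = sum((x - mx) * (y - my) for x, y in zip(x_all, y_all))
    var = sum((x - mx) ** 2 for x in x_all)
    theta = cov / var
    return (mean(y_t) - mean(y_c)) - theta * (mean(x_t) - mean(x_c))

rng = random.Random(1)
n = 5000
TRUE_EFFECT = 1.0  # treatment adds 0.5 to activity; the outcome is 2 * activity + noise

def simulate_arm(treated):
    pre, during, outcome = [], [], []
    for _ in range(n):
        p = rng.gauss(5, 2)                                     # pre-period activity
        d = p + (0.5 if treated else 0.0) + rng.gauss(0, 0.5)   # in-experiment activity
        pre.append(p)
        during.append(d)
        outcome.append(2 * d + rng.gauss(0, 1))
    return pre, during, outcome

pre_t, during_t, y_t = simulate_arm(treated=True)
pre_c, during_c, y_c = simulate_arm(treated=False)

good = adjusted_effect(y_t, y_c, pre_t, pre_c)        # covariate fixed before assignment
bad = adjusted_effect(y_t, y_c, during_t, during_c)   # covariate affected by treatment

print(f"pre-period covariate:    {good:.3f}")
print(f"in-experiment covariate: {bad:.3f}")
```

The in-experiment covariate absorbs the treatment's own effect, so the adjusted estimate lands near zero even though the true effect is 1.0.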
Privacy and accuracy notes
These methods use aggregate covariates and pre-period data; keep inputs first-party and within retention and consent rules.
Related pages
- CUPED variance reduction
CUPED (Controlled-experiment Using Pre-Experiment Data) reduces the variance of an experiment metric by adjusting it with a covariate measured before the test — typically each user's own pre-period behaviour. Because the covariate is independent of the treatment, the adjustment removes noise without introducing bias, so confidence intervals narrow and tests reach a decision with less traffic.
- Stratification in experiments
Stratification splits the population into subgroups (strata) such as device, country, or new-vs-returning, then randomises within each so every variant gets a balanced share of each stratum. This prevents chance imbalance on a known high-variance dimension and, when the stratifying variable predicts the outcome, lowers the variance of the overall effect estimate — a variance-reduction technique alongside CUPED.
- Delta method for ratio metrics
Many experiment metrics are ratios where the denominator is itself random — clicks per session, revenue per user, pages per visit. When the randomisation unit is coarser than the denominator unit, the numerator and denominator are correlated, so naive variance formulas are wrong. The delta method uses a first-order Taylor expansion to approximate the variance of the ratio correctly, fixing confidence intervals.
- WebmasterID docs
How conversion events feed your own analysis.
Sources and verification notes
- Deng, Xu, Kohavi, Walker: Improving the Sensitivity of Online Controlled Experiments (Microsoft). Foundational variance-reduction reference; methods change precision, not the expected effect.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.