CUPED variance reduction
CUPED (Controlled-experiment Using Pre-Experiment Data) reduces the variance of an experiment metric by adjusting it with a covariate measured before the test — typically each user's own pre-period behaviour. Because the covariate is independent of the treatment, the adjustment removes noise without introducing bias, so confidence intervals narrow and tests reach a decision with less traffic.
How the adjustment works
CUPED replaces the raw metric Y with an adjusted metric Y − θ(X − E[X]), where X is a pre-experiment covariate (often the same metric measured before the test) and θ is chosen to minimise variance. Because X is fixed before any user is assigned, it cannot be affected by the treatment, so subtracting its variation removes noise without shifting the expected treatment effect. The optimal θ is the covariance of Y and X divided by the variance of X.
- Covariate must be pre-experiment (independent of treatment)
- Adjusted metric has lower variance, same expected effect
- Optimal θ = Cov(Y,X) / Var(X)
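The adjustment described above can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not a reference implementation: the function name `cuped_adjust`, the simulated effect of 0.5, and the noise levels are all assumptions made for the example. θ is estimated on data pooled across both arms, and E[X] is replaced by the sample mean of X.

```python
import numpy as np

def cuped_adjust(y, x):
    """Return the CUPED-adjusted metric and the theta used (illustrative helper)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    # theta = Cov(Y, X) / Var(X), estimated on data pooled across arms
    theta = np.cov(y, x, ddof=1)[0, 1] / np.var(x, ddof=1)
    # Subtract the covariate's deviation from its mean, scaled by theta
    return y - theta * (x - x.mean()), theta

# Synthetic experiment (all numbers are assumptions for the sketch)
rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(10.0, 3.0, size=n)                  # pre-period metric per user
treated = rng.integers(0, 2, size=n).astype(bool)  # random assignment
y = 0.8 * x + rng.normal(0.0, 2.0, size=n) + 0.5 * treated

y_adj, theta = cuped_adjust(y, x)
raw_effect = y[treated].mean() - y[~treated].mean()
adj_effect = y_adj[treated].mean() - y_adj[~treated].mean()

print(f"theta={theta:.3f}")
print(f"variance raw={y.var(ddof=1):.2f} adjusted={y_adj.var(ddof=1):.2f}")
print(f"effect raw={raw_effect:.3f} adjusted={adj_effect:.3f}")
```

Both effect estimates target the same quantity; the adjusted one is computed from a metric with much lower variance, which is what narrows the interval.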
When it helps and when it does not
The variance reduction grows with how strongly the covariate correlates with the outcome. Users with no history (new visitors) have no covariate, so the adjustment does nothing for them and the overall gain shrinks as their share of traffic grows. CUPED is a sensitivity technique, not a way to manufacture an effect: applied correctly it changes precision, not the expected point estimate. It is one of several variance-reduction methods, alongside stratification.
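The link between correlation and variance reduction can be made explicit with a short derivation. It uses only the definitions given earlier (adjusted metric Y − θ(X − E[X]) and θ = Cov(Y,X) / Var(X)); the symbols Y_adj and ρ, the correlation between Y and X, are introduced here for convenience.

```latex
\begin{align*}
\operatorname{Var}(Y_{\mathrm{adj}})
  &= \operatorname{Var}\bigl(Y - \theta (X - \mathbb{E}[X])\bigr) \\
  &= \operatorname{Var}(Y) - 2\theta \operatorname{Cov}(Y, X) + \theta^{2} \operatorname{Var}(X) \\
  &= \operatorname{Var}(Y) - \frac{\operatorname{Cov}(Y, X)^{2}}{\operatorname{Var}(X)}
     \qquad \text{with } \theta = \frac{\operatorname{Cov}(Y, X)}{\operatorname{Var}(X)} \\
  &= \operatorname{Var}(Y)\,\bigl(1 - \rho^{2}\bigr),
     \qquad \rho = \frac{\operatorname{Cov}(Y, X)}{\sqrt{\operatorname{Var}(Y)\operatorname{Var}(X)}}
\end{align*}
```

For example, a covariate with ρ = 0.5 removes 25% of the variance, while ρ = 0 removes none.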
Validate that the covariate truly predates assignment, or the no-bias guarantee breaks.
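One way to run that validation is to compare, per user, the end of the window used to compute the covariate with the assignment timestamp. The sketch below assumes a hypothetical record layout (`user_id`, `covariate_window_end`, `assigned_at`); these field names are illustrative and do not come from any particular product or export format.

```python
from datetime import datetime, timezone

def find_leaky_users(records):
    """Return IDs of users whose covariate window ends after assignment."""
    return [
        r["user_id"]
        for r in records
        if r["covariate_window_end"] > r["assigned_at"]
    ]

# Hypothetical records: u2's covariate window runs past its assignment time
records = [
    {
        "user_id": "u1",
        "covariate_window_end": datetime(2024, 3, 1, tzinfo=timezone.utc),
        "assigned_at": datetime(2024, 3, 2, tzinfo=timezone.utc),
    },
    {
        "user_id": "u2",
        "covariate_window_end": datetime(2024, 3, 5, tzinfo=timezone.utc),
        "assigned_at": datetime(2024, 3, 2, tzinfo=timezone.utc),
    },
]

leaky = find_leaky_users(records)
print(leaky)
```

Any user returned by the check has a covariate that may include post-assignment behaviour, so the covariate should be recomputed from an earlier window before CUPED is applied.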
How it appears in analytics and logs
Narrower intervals after CUPED reflect lower estimator variance, not a larger effect — the point estimate of the effect is unchanged in expectation.
Diagnostic use case
Apply CUPED when users have pre-experiment history correlated with the outcome, either to detect the same effect with less traffic or to detect a smaller effect with the same traffic.
What WebmasterID can help detect
WebmasterID's first-party history supplies the pre-experiment covariate CUPED needs, computed within your own retention settings.
Common mistakes
- Using a covariate measured during the experiment, which reintroduces bias.
- Expecting CUPED to change the effect estimate rather than its variance.
- Assuming gains for cohorts with no pre-experiment history.
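The last mistake has a practical side: users without history still have to pass through the pipeline. One common approach, shown here as a sketch rather than the only valid choice, is to fill a missing covariate with the mean of the observed covariate so that the adjustment term is exactly zero for those users. The function name and the use of NaN to mark missing history are assumptions for the example.

```python
import numpy as np

def cuped_adjust_with_missing(y, x):
    """CUPED adjustment where x is NaN for users with no pre-period history."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    has_history = ~np.isnan(x)
    x_mean = x[has_history].mean()
    # theta is estimated only on users who actually have a covariate
    theta = (
        np.cov(y[has_history], x[has_history], ddof=1)[0, 1]
        / np.var(x[has_history], ddof=1)
    )
    # Missing covariates are set to the mean, so (x - mean) is 0 for them
    x_filled = np.where(has_history, x, x_mean)
    return y - theta * (x_filled - x_mean)

y = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, np.nan, 3.0, np.nan, 5.0]   # users 2 and 4 are new visitors
adjusted = cuped_adjust_with_missing(y, x)
print(adjusted)
```

Users without history keep their raw metric unchanged, which is why a cohort made up mostly of new visitors sees little or no variance reduction.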
Privacy and accuracy notes
CUPED uses pre-period aggregates as covariates; apply it only to first-party data, within your retention and consent rules.
Related pages
- Stratification in experiments
Stratification splits the population into subgroups (strata) such as device, country, or new-vs-returning, then randomises within each so every variant gets a balanced share of each stratum. This prevents chance imbalance on a known high-variance dimension and, when the stratifying variable predicts the outcome, lowers the variance of the overall effect estimate — a variance-reduction technique alongside CUPED.
- Confidence intervals for conversion metrics
A confidence interval gives a range of plausible values for a metric rather than a single point. A 95% confidence interval is constructed so that, over many repeats, that procedure captures the true value 95% of the time. Reporting an interval communicates uncertainty honestly — a conversion rate of 4% with a wide interval is a very different claim than a narrow one.
- Sample size in experiments
Sample size is the number of subjects per arm an experiment needs to detect a chosen effect with acceptable error rates. It is computed in advance from the baseline rate, the minimum effect worth detecting, and the false-positive and false-negative rates you accept. Too small and you miss real effects; running until 'it looks good' inflates false positives.
- Event Explorer
Pre-period behaviour that can serve as a CUPED covariate.
Sources and verification notes
- Deng, Xu, Kohavi, Walker: Improving the Sensitivity of Online Controlled Experiments (CUPED), Microsoft. Original CUPED paper; θ formula and no-bias property defined there.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.