The winner’s curse in experiments

The winner's curse is the tendency for the measured effect of a 'winning' experiment to overstate the true effect, because selecting on statistical significance favours upward noise. It explains why shipped wins often underdeliver in production. Larger samples and replication shrink the inflation. This page explains the mechanism and how to set realistic expectations after a win.

Partially verified

Why selection inflates effects

Every measured effect is the true effect plus noise. When you keep only the experiments that crossed a significance bar, you preferentially keep the ones where noise pushed the estimate up. The surviving winners therefore have measured effects biased above their true values — selection itself causes the inflation.
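The mechanism can be illustrated with a small simulation. This is a minimal sketch, not a description of any particular testing tool: it assumes every experiment shares the same true effect and standard error, that noise is normally distributed, and that a "win" means a z-score above 1.96. The function name and all parameter values are hypothetical.

```python
import random
import statistics


def simulate_winners(true_effect=0.01, std_error=0.01,
                     n_experiments=20000, z_crit=1.96, seed=0):
    """Simulate many identical experiments and keep only the 'winners'.

    All defaults are illustrative assumptions, not values from real tests.
    """
    rng = random.Random(seed)
    # Measured effect = true effect + normally distributed noise.
    estimates = [rng.gauss(true_effect, std_error)
                 for _ in range(n_experiments)]
    # A 'win' is an estimate whose z-score clears the significance bar.
    winners = [e for e in estimates if e / std_error > z_crit]
    return {
        "mean_all": statistics.fmean(estimates),
        "mean_winners": statistics.fmean(winners),
        "share_winning": len(winners) / len(estimates),
    }


result = simulate_winners()
print(f"mean of all estimates:     {result['mean_all']:.4f}")
print(f"mean of winning estimates: {result['mean_winners']:.4f}")
print(f"share of tests that 'win': {result['share_winning']:.2%}")
```

Across all experiments the average estimate sits close to the true effect, but the average among winners cannot fall below the threshold itself (1.96 standard errors), which in this setup is nearly twice the true effect.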

It hits marginal wins hardest

The smaller the sample and the closer the result sits to the significance threshold, the larger the inflation. A win that barely cleared the bar in an underpowered test is the most likely to disappoint in production. Well-powered tests with comfortable margins suffer far less inflation.

Selecting significant results favours upward noise
Underpowered, marginal wins are inflated most
Larger samples and replication reduce the bias
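The link between power and inflation can also be worked out directly rather than simulated. The sketch below assumes a normally distributed estimate and a one-sided selection rule (z above 1.96), and uses the standard formula for the mean of a truncated normal distribution. The function names and the example effect sizes are hypothetical.

```python
import math


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def normal_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def expected_winner_estimate(true_effect, std_error, z_crit=1.96):
    """Return (mean estimate among winners, chance of winning).

    Assumes estimate ~ Normal(true_effect, std_error) and that a test
    'wins' when estimate / std_error exceeds z_crit.
    """
    a = z_crit - true_effect / std_error   # threshold in noise units
    power = 1 - normal_cdf(a)              # probability of clearing the bar
    # Mean of a normal distribution truncated below at the threshold.
    mean_winner = true_effect + std_error * normal_pdf(a) / power
    return mean_winner, power


true_effect = 0.01  # illustrative value
rows = []
for std_error in (0.02, 0.01, 0.005, 0.0025):  # shrinking = larger samples
    mean_winner, power = expected_winner_estimate(true_effect, std_error)
    exaggeration = mean_winner / true_effect
    rows.append((std_error, power, exaggeration))
    print(f"se={std_error:<7} power={power:6.1%} "
          f"winner estimate = {exaggeration:.2f}x true effect")
```

Under these assumptions the exaggeration factor falls steadily as the standard error shrinks: the noisiest configuration overstates the true effect several times over, while the best-powered one is inflated by only about one percent.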

Setting honest expectations

Treat a winning test's point estimate as optimistic rather than as a forecast, especially for marginal wins: the true effect is more likely to sit below it than above it. Replicate important results, prefer adequately powered tests, and monitor the effect after launch. Building a roadmap on uncorrected, barely-significant uplifts is how a backlog of 'wins' fails to add up to real growth.

How it appears in analytics and logs

A win that scraped past the significance threshold likely overstates its true effect. Production typically delivers less than the test's point estimate suggested.

Diagnostic use case

Discount the headline uplift of a barely-significant win when forecasting impact; expect production results below the test estimate, and replicate before betting big.
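One common way to apply such a discount is to shrink the estimate toward zero in proportion to how noisy it is. The sketch below uses a simple normal prior centred on zero, which is an assumption rather than a method prescribed here; the prior spread (`prior_sd`) is a hypothetical input that would need to come from your own history of experiment results.

```python
def shrink_estimate(estimate, std_error, prior_sd):
    """Shrink a measured uplift toward zero (normal-normal model).

    prior_sd is an assumed spread of true effects across experiments;
    it is not something the test itself provides.
    """
    # Weight on the data: near 1 for precise tests, near 0 for noisy ones.
    weight = prior_sd ** 2 / (prior_sd ** 2 + std_error ** 2)
    return weight * estimate


# Illustrative numbers: a 4% measured uplift that barely cleared z = 1.96.
headline = 0.04
forecast = shrink_estimate(headline, std_error=0.02, prior_sd=0.01)
print(f"headline uplift: {headline:.1%}, discounted forecast: {forecast:.1%}")
```

The noisier the test relative to the assumed spread of true effects, the heavier the discount. A precise test with the same headline number would keep most of its estimate.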

What WebmasterID can help detect

WebmasterID's first-party measurement lets you re-observe a shipped win over time, so you can compare the post-launch effect against the test estimate and detect inflation.
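A post-launch comparison of this kind might be summarised as follows. This is a generic sketch, not a WebmasterID API: it assumes you already have an effect estimate and standard error from the original test and from a post-launch measurement (for example a holdback group), and that the two are independent. All names and numbers are hypothetical.

```python
import math


def compare_to_test(test_estimate, test_se, post_estimate, post_se):
    """Compare a post-launch effect estimate with the original test estimate."""
    shortfall = test_estimate - post_estimate
    # Standard error of the difference between two independent estimates.
    se_diff = math.sqrt(test_se ** 2 + post_se ** 2)
    return {
        "shortfall": shortfall,
        "retained_share": post_estimate / test_estimate,
        "z": shortfall / se_diff,
    }


report = compare_to_test(test_estimate=0.04, test_se=0.015,
                         post_estimate=0.015, post_se=0.008)
print(f"retained share of test uplift: {report['retained_share']:.0%}")
print(f"z-score of the shortfall:      {report['z']:.2f}")
```

A small retained share with a modest z-score, as in this example, is the typical signature of an inflated test estimate: the drop is large in business terms even though the two measurements are not strongly inconsistent statistically.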

Common mistakes

Forecasting production impact from a marginal win's point estimate.
Shipping underpowered wins without replication.
Summing many barely-significant uplifts as if each is certain.

Privacy and accuracy notes

The winner's curse is a property of how results are selected, and it is assessed from aggregate experiment results. No personal data is involved.

Sources and verification notes

Wikipedia: Winner’s curse. Selection-induced overestimation of effects.

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.