Conversion & funnels

P-value misconceptions

The p-value is one of the most misread numbers in experimentation. It is the probability of seeing data at least as extreme as those observed if the null hypothesis were true. It is not the probability that the null is true, not the probability that the result arose by chance alone, and not a measure of effect size. The American Statistical Association's 2016 statement on p-values addresses exactly these misconceptions.

Verified against primary sources

What this means

Formally, a p-value is the probability, computed under a specified statistical model and the null hypothesis, of obtaining a result equal to or more extreme than what was observed. That is a statement about the data given the null, not about the null given the data. Inverting it ('there is a 5% chance the null is true') is a logical error, not a harmless simplification.
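The distinction can be written out in symbols. This is a sketch in assumed notation that does not come from the ASA statement: T is the test statistic, t_obs its observed value, and H_0 the null hypothesis, shown for a one-sided test.

```latex
p = \Pr\left(T \ge t_{\mathrm{obs}} \mid H_0\right)
\qquad \text{whereas} \qquad
\Pr\left(H_0 \mid \text{data}\right)
  = \frac{\Pr\left(\text{data} \mid H_0\right)\,\Pr\left(H_0\right)}{\Pr\left(\text{data}\right)}
```

The right-hand quantity needs a prior probability for the null, Pr(H_0), which a p-value never uses. That is why the two cannot be swapped.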

What the ASA warned against

The American Statistical Association's 2016 statement set out principles every experimenter should internalise: p-values do not measure the probability that the hypothesis is true or that the data were produced by chance alone; conclusions should not be based only on whether a p-value passes a threshold; and a p-value does not measure the size or importance of an effect. Proper inference needs context, effect sizes, and design quality, not a single number.

In practice this means a 'significant' result on a tiny, meaningless effect can be worthless, and a non-significant result is not proof of no effect.
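A minimal sketch of both failure modes, assuming a pooled two-proportion z-test with a normal approximation. The function name and all visitor and conversion counts are made up for illustration.

```python
import math


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    # Two-sided tail area of the standard normal: P(|Z| >= |z|)
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical case 1: one million visitors per arm, 10.0% vs 10.1% conversion.
# The lift is tiny, yet the p-value lands below 0.05.
tiny_lift_p = two_proportion_p_value(100_000, 1_000_000, 101_000, 1_000_000)

# Hypothetical case 2: 100 visitors per arm, 10% vs 17% conversion.
# The observed lift is large, yet the p-value lands above 0.05.
big_lift_p = two_proportion_p_value(10, 100, 17, 100)

print(f"tiny lift, huge sample:  p = {tiny_lift_p:.4f}")
print(f"big lift, small sample:  p = {big_lift_p:.4f}")
```

The first result is 'significant' on a lift that may not cover the cost of shipping the change. The second is 'not significant' even though the data are compatible with a substantial effect; the sample is simply too small to tell.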

Not the probability the null is true
Not the probability the result is a fluke
Not a measure of effect size or business value
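The first point in the list above can be checked with a small simulation. This is a sketch under loudly stated assumptions: 90% of tested ideas truly have no effect, and real effects produce a z-statistic centred on 2.5. Both figures are arbitrary illustrative choices, as are the function names.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def share_of_significant_with_true_null(
    n_experiments=20_000,
    null_rate=0.9,   # assumed share of experiments with no real effect
    effect_z=2.5,    # assumed mean z-statistic when an effect exists
    alpha=0.05,
    seed=0,
):
    """Among results with p < alpha, return the fraction where the null was true."""
    rng = random.Random(seed)
    significant = 0
    significant_and_null = 0
    for _ in range(n_experiments):
        null_is_true = rng.random() < null_rate
        mean = 0.0 if null_is_true else effect_z
        z = rng.gauss(mean, 1.0)
        if two_sided_p(z) < alpha:
            significant += 1
            if null_is_true:
                significant_and_null += 1
    return significant_and_null / significant


share = share_of_significant_with_true_null()
print(f"share of 'significant' results where the null was true: {share:.2f}")
```

Under these assumptions the share comes out far above 5%, even though every counted result passed p < 0.05. The exact figure depends entirely on the assumed mix of true and false nulls, which is information a p-value does not contain.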

How it appears in analytics and logs

A small p-value (say, 0.01) means the data would be unusual if there were no effect. It does not tell you the probability that the effect is real, how large the effect is, or whether the result matters commercially.

Diagnostic use case

Read p-values correctly so that you neither overstate nor understate the evidence: a small p-value flags surprise under the null, nothing more, and must be paired with effect size and context.

What WebmasterID can help detect

WebmasterID provides the first-party event counts that a significance test consumes. Interpreting the resulting p-value correctly remains the analyst's job, and this page is meant to support it.
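One way to keep the interpretation honest is to report the effect size and an interval alongside the p-value. The sketch below assumes the counts have already been exported as plain integers; it does not use any WebmasterID API, and the function, class, and field names are hypothetical. It uses a pooled z-test for the p-value and a simple Wald interval for the difference, both normal approximations.

```python
import math
from dataclasses import dataclass


@dataclass
class ABSummary:
    p_value: float
    absolute_lift: float  # rate_b - rate_a
    relative_lift: float  # absolute lift as a fraction of rate_a
    ci_low: float         # approximate 95% interval for the absolute lift
    ci_high: float


def summarize_ab_counts(conv_a, n_a, conv_b, n_b):
    """Summarize two conversion counts with a p-value, lift, and 95% interval."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    diff = rate_b - rate_a

    # Pooled standard error for the test of "no difference".
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    p_value = math.erfc(abs(diff / se_pooled) / math.sqrt(2))

    # Unpooled standard error for the Wald interval around the difference.
    se_diff = math.sqrt(
        rate_a * (1 - rate_a) / n_a + rate_b * (1 - rate_b) / n_b
    )
    z95 = 1.959964  # 97.5th percentile of the standard normal
    return ABSummary(
        p_value=p_value,
        absolute_lift=diff,
        relative_lift=diff / rate_a,
        ci_low=diff - z95 * se_diff,
        ci_high=diff + z95 * se_diff,
    )


# Hypothetical counts: 480 of 12,000 vs 540 of 12,000 conversions.
report = summarize_ab_counts(480, 12_000, 540, 12_000)
print(report)
```

Reading the interval next to the p-value shows which effect sizes the data are compatible with, which a pass/fail threshold alone hides.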

Common mistakes

Reading p as 'the probability the result is wrong'.
Treating a p-value just above the threshold as proof of no effect.
Reporting significance without an effect size.

Privacy and accuracy notes

P-values are computed from aggregate counts, not personal data. This page is educational and not a substitute for a statistician on high-stakes decisions.


Sources and verification notes

American Statistical Association — Statement on p-values (2016)

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.