Statistical significance and p-values
A result is 'statistically significant' when it would be unlikely if there were really no effect. The p-value is the probability of seeing data at least as extreme as yours assuming the null hypothesis is true — it is not the probability the variant is better, and not a measure of how big the effect is. Significance and practical importance are different questions.
What a p-value is
The p-value is the probability of observing a result at least as extreme as the one you got, assuming the null hypothesis (no real difference) is true. A common threshold is 0.05, but that number is a convention, not a law of nature. A p-value below it means the data would be surprising under 'no effect', and nothing more.
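To make the definition concrete, here is a minimal sketch of one common way to get a p-value for two conversion rates: a pooled two-proportion z-test with a normal approximation. The function name and the example counts are illustrative assumptions, not something WebmasterID computes for you, and other tests (for example an exact test for small samples) may suit your data better.

```python
import math

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test.

    Hypothetical helper for illustration; relies on a normal
    approximation, so it is only reasonable for fairly large samples.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Under the null both groups share one rate, so pool the counts.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all: nothing surprising to report
    z = (p_b - p_a) / se
    # P(|Z| >= |z|) for a standard normal equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

# Made-up counts: 200/5000 conversions for A, 250/5000 for B.
p = two_proportion_p_value(200, 5000, 250, 5000)
print(f"p-value: {p:.4f}")
```

The number printed answers one question only: how often would a gap at least this large appear if A and B truly converted at the same rate?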
What it is not
A p-value is not the probability that your variant is better, nor the probability the null is true. It says nothing about effect size: with enough traffic, even a trivial difference becomes 'significant'. And one significant result is not proof: when there is truly no effect, a test at the 0.05 threshold still comes up 'significant' about 5% of the time. Always pair significance with an effect size and a confidence interval.
The American Statistical Association cautioned in its 2016 statement on p-values that they are routinely misinterpreted; treat them as one input, not a verdict.
- p is computed assuming the null is true
- Significance is not importance or effect size
- Large samples make tiny effects 'significant'
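The last point is easy to check numerically. The sketch below holds a tiny difference fixed (4.0% versus 4.1%, made-up rates) and only increases the sample size per group. It assumes a pooled two-proportion z-test; the helper name is hypothetical.

```python
import math

def two_sided_p(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value, pooled two-proportion z-test (illustrative)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    return math.erfc(abs(z) / math.sqrt(2))

results = {}
for n in (1_000, 10_000, 100_000, 1_000_000):
    conv_a = n * 40 // 1000  # control converts at 4.0%
    conv_b = n * 41 // 1000  # variant converts at 4.1%
    results[n] = two_sided_p(conv_a, n, conv_b, n)
    print(f"n per group = {n:>9,}  p = {results[n]:.4f}")
```

With these numbers the p-value shrinks steadily as the sample grows and drops below 0.05 only at the largest size, even though the observed difference never changes. Whether a 0.1 percentage point lift matters is a separate, business question.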
How it appears in analytics and logs
A small p-value means the data would be surprising if there were no effect. It does not tell you the probability the variant wins, how large the effect is, or that the result will replicate.
Diagnostic use case
Use significance to judge whether an observed difference is plausibly more than noise — but read effect size and confidence interval to judge whether it matters.
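As a rough sketch of reading effect size and interval together, the code below computes the observed difference in conversion rate and an approximate 95% confidence interval using the simple Wald formula. The counts, the function name, and the `min_worthwhile` threshold are assumptions chosen for illustration; the Wald interval can be poor for small samples or rates near 0 or 1.

```python
import math

def diff_with_ci(conv_a, n_a, conv_b, n_b, z_crit=1.96):
    """Difference in conversion rate (B minus A) with a Wald interval.

    z_crit=1.96 corresponds to roughly 95% coverage under a normal
    approximation. Hypothetical helper, for illustration only.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    diff = p_b - p_a
    # Unpooled standard error: each group keeps its own rate.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, diff - z_crit * se, diff + z_crit * se

diff, low, high = diff_with_ci(200, 5000, 250, 5000)
print(f"lift: {diff:.4f}  95% CI: [{low:.4f}, {high:.4f}]")

# Assumed business threshold: lifts under 0.5 points are not worth shipping.
min_worthwhile = 0.005
if low > 0 and low < min_worthwhile:
    print("Likely more than noise, but the effect may be too small to matter.")
```

Here the interval excludes zero, so the difference is plausibly real, yet its lower end sits below the assumed threshold. That is the gap between significance and importance.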
What WebmasterID can help detect
WebmasterID supplies the first-party conversion counts a significance test consumes; the statistical judgement stays yours.
Common mistakes
- Reading p as the probability the variant wins.
- Equating statistical significance with business importance.
- Treating a single significant result as settled proof.
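The third mistake can be illustrated with a small simulation. The sketch below runs many A/A tests, where both groups share the same true conversion rate, and counts how often a pooled two-proportion z-test still reports p < 0.05. The sample size, rate, and seed are arbitrary assumptions.

```python
import math
import random

def two_sided_p(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value, pooled two-proportion z-test (illustrative)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return math.erfc(abs(z) / math.sqrt(2))

def false_positive_share(experiments=1000, n=500, rate=0.10,
                         alpha=0.05, seed=42):
    """Share of A/A tests (no real effect) that come out 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(experiments):
        # Both groups draw conversions from the same true rate.
        conv_a = sum(rng.random() < rate for _ in range(n))
        conv_b = sum(rng.random() < rate for _ in range(n))
        if two_sided_p(conv_a, n, conv_b, n) < alpha:
            hits += 1
    return hits / experiments

share = false_positive_share()
print(f"Share of A/A tests flagged significant: {share:.3f}")
```

The share should land in the neighbourhood of the chosen threshold, which is why a lone significant result deserves a replication before it is treated as settled.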
Privacy and accuracy notes
Significance testing operates on aggregate counts, not individuals. Computing a p-value for a conversion test needs only the number of visitors and conversions in each group, so no personal data is required.
Related pages
- A/B testing fundamentals
An A/B test randomly assigns visitors to a control (A) or a variant (B), shows each group one version, and compares a pre-chosen metric. Random assignment is what lets you attribute a difference to the change rather than to who happened to see it. The discipline is in deciding the metric and sample size before you start, not after you peek at the numbers.
- Confidence intervals for conversion metrics
A confidence interval gives a range of plausible values for a metric rather than a single point. A 95% confidence interval is constructed so that, over many repeats, that procedure captures the true value 95% of the time. Reporting an interval communicates uncertainty honestly: a conversion rate of 4% with a wide interval is a very different claim from the same rate with a narrow one.
- The peeking problem in A/B tests
The peeking problem is checking an experiment over and over and stopping the moment it crosses significance. Because each look is another chance for noise to cross the threshold, repeated peeking inflates the false-positive rate well above the nominal level. The fixes are a pre-set sample size or a sequential method designed for continuous monitoring.
- WebmasterID docs
How conversion events feed your own analysis.
Sources and verification notes
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.