Frequentist vs Bayesian experiment analysis
Frequentist and Bayesian analysis are two coherent ways to analyse the same experiment data. Frequentist methods ask how likely data at least as extreme as that observed would be under a null hypothesis, and report p-values and confidence intervals. Bayesian methods combine a prior with the data to report posterior probabilities and credible intervals. Each has assumptions and failure modes; neither is universally 'correct'.
What this means
The frequentist framework treats the true effect as a fixed unknown and the data as random; it controls long-run error rates and summarises evidence with p-values and confidence intervals. The Bayesian framework treats the effect as a random quantity with a probability distribution and the data as fixed once observed; it reports posteriors and credible intervals. They can reach the same practical conclusion but phrase certainty differently.
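To make the contrast concrete, here is a minimal sketch that analyses one set of counts both ways. The counts, the function names, the normal-approximation z-test and the flat Beta(1, 1) prior are all assumptions chosen for illustration, not a recommended procedure.

```python
import math
import random
from statistics import NormalDist


def frequentist_summary(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-sided two-proportion z-test and a Wald interval for the difference B - A."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    diff = rate_b - rate_a
    # The test uses a pooled rate because the null hypothesis says both arms share one rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_null = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    p_value = 2 * (1 - NormalDist().cdf(abs(diff / se_null)))
    # The interval uses the unpooled standard error of the observed difference.
    se = math.sqrt(rate_a * (1 - rate_a) / n_a + rate_b * (1 - rate_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return {"diff": diff, "p_value": p_value,
            "conf_int": (diff - z_crit * se, diff + z_crit * se)}


def bayesian_summary(conv_a, n_a, conv_b, n_b, prior=(1.0, 1.0), draws=20_000, seed=0):
    """Monte Carlo summary of the posterior of B - A under independent Beta priors."""
    rng = random.Random(seed)
    a0, b0 = prior  # assumed prior; Beta(1, 1) is flat over [0, 1]
    diffs = sorted(
        rng.betavariate(a0 + conv_b, b0 + n_b - conv_b)
        - rng.betavariate(a0 + conv_a, b0 + n_a - conv_a)
        for _ in range(draws)
    )
    prob_b_better = sum(d > 0 for d in diffs) / draws
    # Equal-tailed 95% credible interval read off the sorted posterior draws.
    cred_int = (diffs[int(0.025 * draws)], diffs[int(0.975 * draws)])
    return {"prob_b_better": prob_b_better, "cred_int": cred_int}


# Hypothetical counts: 200/5000 conversions in A, 250/5000 in B.
print(frequentist_summary(200, 5000, 250, 5000))
print(bayesian_summary(200, 5000, 250, 5000))
```

The two summaries come from the same four numbers. The first reports how surprising the difference would be if there were no effect; the second reports a probability about the effect itself, conditional on the chosen prior.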
Trade-offs to weigh
Frequentist tests give an explicit, prior-free error-rate guarantee, which auditors and platforms often expect, but the p-value is widely misread. Bayesian tests give an intuitive probability and a natural way to express prior knowledge, but the prior must be chosen and disclosed, and a poorly chosen prior can dominate the result when the sample is small.
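The effect of the prior on small samples can be seen with a few lines of arithmetic. This sketch assumes a Beta prior on a single conversion rate, for which the posterior mean has a closed form; the function name and the two example priors are hypothetical.

```python
def posterior_mean(conversions, visitors, prior_a, prior_b):
    """Mean of the Beta posterior for a conversion rate under a Beta(prior_a, prior_b) prior."""
    return (prior_a + conversions) / (prior_a + prior_b + visitors)


flat_prior = (1, 1)        # assumed: no strong belief about the rate
strong_prior = (20, 980)   # assumed: belief in a ~2% rate, weighted like 1000 visitors

# Both samples have an observed rate of 8%; only the sample size differs.
for conversions, visitors in [(4, 50), (800, 10_000)]:
    flat = posterior_mean(conversions, visitors, *flat_prior)
    strong = posterior_mean(conversions, visitors, *strong_prior)
    print(f"n={visitors}: flat prior {flat:.4f}, strong prior {strong:.4f}")
```

With 50 visitors the strong prior pulls the estimate most of the way back towards 2%; with 10,000 visitors the data outweighs it and the two priors roughly agree.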
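Both require enough data, and both can mislead under peeking: checking results repeatedly and stopping as soon as they look favourable, without a decision rule designed for it. The choice is about which question and which assumptions fit your context, not about one being more honest than the other.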
- Frequentist: p-values, confidence intervals, fixed error rates
- Bayesian: posteriors, credible intervals, explicit priors
- Both fail under peeking and inadequate samples
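The peeking problem can be illustrated with a small simulation. The sketch below covers only the frequentist side: it runs A/A experiments (no true difference) and compares how often a z-test is significant at the planned end with how often it is significant at any of several interim looks. The function names, batch sizes and the 10% base rate are assumptions for illustration.

```python
import math
import random
from statistics import NormalDist


def two_sided_p(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no conversions (or all conversions) in both arms: no evidence
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rates(experiments=1000, looks=10, batch=100,
                         rate=0.10, alpha=0.05, seed=1):
    """Return (rate at final look only, rate when any look may stop the test)."""
    rng = random.Random(seed)
    sig_at_end = 0
    sig_at_any_look = 0
    for _ in range(experiments):
        conv_a = conv_b = visitors = 0
        crossed = False
        p = 1.0
        for _look in range(looks):
            # Both arms share the same true rate, so any "win" is a false positive.
            conv_a += sum(rng.random() < rate for _ in range(batch))
            conv_b += sum(rng.random() < rate for _ in range(batch))
            visitors += batch
            p = two_sided_p(conv_a, visitors, conv_b, visitors)
            crossed = crossed or p < alpha
        sig_at_end += p < alpha
        sig_at_any_look += crossed
    return sig_at_end / experiments, sig_at_any_look / experiments


fixed, peeking = false_positive_rates()
print(f"final look only: {fixed:.3f}, stop at any look: {peeking:.3f}")
```

The final-look rate should sit near the nominal 5%, while the any-look rate is noticeably higher, because each extra look is another chance for noise to cross the threshold.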
How it appears in analytics and logs
A p-value answers 'how surprising is this data under no effect'; a posterior answers 'how probable is this effect given the data and prior'. Knowing which you are reading prevents misinterpreting one as the other.
Diagnostic use case
Pick a framework deliberately: frequentist when you want fixed error-rate control and a familiar p-value, Bayesian when you want a direct probability statement and can justify a prior.
What WebmasterID can help detect
WebmasterID supplies the first-party event counts; the analytical framework is a downstream choice. The same measured events support either approach.
Common mistakes
- Calling one framework objectively correct.
- Reading a confidence interval as a credible interval.
- Switching frameworks mid-test to get a result you like.
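The second mistake is easier to avoid once the long-run meaning of a confidence interval is concrete. The sketch below repeats a hypothetical experiment many times with a known true rate and counts how often a 95% interval contains it. The Wilson score interval is used here as one common choice for proportions; the function names and parameters are assumptions.

```python
import math
import random
from statistics import NormalDist


def wilson_interval(conversions, visitors, level=0.95):
    """Wilson score interval for a single conversion rate."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    rate = conversions / visitors
    denom = 1 + z * z / visitors
    centre = (rate + z * z / (2 * visitors)) / denom
    half = z * math.sqrt(rate * (1 - rate) / visitors
                         + z * z / (4 * visitors ** 2)) / denom
    return centre - half, centre + half


def coverage(true_rate=0.04, visitors=500, repeats=2000, seed=2):
    """Fraction of repeated experiments whose interval contains the true rate."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        conversions = sum(rng.random() < true_rate for _ in range(visitors))
        low, high = wilson_interval(conversions, visitors)
        hits += low <= true_rate <= high
    return hits / repeats


print(f"share of intervals containing the true rate: {coverage():.3f}")
```

The 95% describes the procedure across repeats, not the probability that any single computed interval contains the true value. A credible interval makes the latter kind of statement, but only relative to a prior.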
Privacy and accuracy notes
Both frameworks work on aggregate exposure and conversion counts, not individuals. This is an educational comparison, not statistical advice for a specific decision.
Related pages
- Bayesian A/B testing
Bayesian A/B testing treats the conversion rate of each arm as an unknown with a probability distribution. It combines a prior belief with observed data to produce a posterior, from which you can state things like 'the probability that B beats A is high' and quantify the expected loss of choosing wrong. It is an alternative framing to the frequentist p-value, with different assumptions rather than a guarantee of more truth.
- P-value misconceptions
The p-value is one of the most misread numbers in experimentation. It is the probability of seeing data at least as extreme as observed if the null hypothesis were true. It is not the probability the null is true, not the probability of a fluke, and not a measure of effect size. In 2016 the American Statistical Association issued a formal statement on p-values that addresses these misconceptions.
- Confidence intervals for conversion metrics
A confidence interval gives a range of plausible values for a metric rather than a single point. A 95% confidence interval is constructed so that, over many repeats, that procedure captures the true value 95% of the time. Reporting an interval communicates uncertainty honestly — a conversion rate of 4% with a wide interval is a very different claim than a narrow one.
- Statistical significance and p-values
A result is 'statistically significant' when it would be unlikely if there were really no effect. The p-value is the probability of seeing data at least as extreme as yours assuming the null hypothesis is true — it is not the probability the variant is better, and not a measure of how big the effect is. Significance and practical importance are different questions.
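The 'expected loss of choosing wrong' mentioned in the Bayesian A/B testing entry above can be sketched as follows. This assumes independent Beta(1, 1) priors and Monte Carlo draws from the posteriors; the function name and counts are hypothetical.

```python
import random


def expected_losses(conv_a, n_a, conv_b, n_b, prior=(1.0, 1.0), draws=20_000, seed=0):
    """Return (expected loss if A is chosen, expected loss if B is chosen).

    Loss is the conversion rate given up when the chosen arm is the worse one,
    and zero otherwise, averaged over posterior draws.
    """
    rng = random.Random(seed)
    a0, b0 = prior  # assumed flat prior on each arm's conversion rate
    loss_if_a = loss_if_b = 0.0
    for _ in range(draws):
        rate_a = rng.betavariate(a0 + conv_a, b0 + n_a - conv_a)
        rate_b = rng.betavariate(a0 + conv_b, b0 + n_b - conv_b)
        loss_if_a += max(rate_b - rate_a, 0.0)
        loss_if_b += max(rate_a - rate_b, 0.0)
    return loss_if_a / draws, loss_if_b / draws


# Hypothetical counts: 200/5000 conversions in A, 250/5000 in B.
loss_a, loss_b = expected_losses(200, 5000, 250, 5000)
print(f"expected loss choosing A: {loss_a:.5f}, choosing B: {loss_b:.5f}")
```

A small expected loss for one arm means that even if that arm is actually worse, the cost of choosing it is likely to be small. What counts as small enough is a business decision, not a statistical one.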
Sources and verification notes
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.