Sample ratio mismatch (SRM)
Sample ratio mismatch (SRM) is when the observed allocation of users to experiment arms diverges from the planned ratio by more than chance allows — for example a 50/50 test that lands far from 50/50. It signals a bug in assignment, logging, or filtering, and a test with SRM should not be trusted regardless of how good the headline result looks.
What this means
If you assign users 50/50 to control and variant, you expect roughly equal counts, with small random variation. SRM is when the gap between observed and expected counts is too large to be chance — formally, a chi-squared goodness-of-fit test on the arm counts returns a very small p-value. A handful of users off is normal; a persistent skew is a defect.
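As a rough illustration of that test, here is a minimal Python sketch for the two-arm case. The function name `srm_p_value` and its parameters are made up for this example. It relies on the fact that, with one degree of freedom, the chi-squared tail probability equals `erfc(sqrt(chi2 / 2))`, so no statistics library is needed. Experiments with more than two arms would need a general chi-squared routine instead.

```python
import math


def srm_p_value(control: int, variant: int, expected_control_share: float = 0.5) -> float:
    """Chi-squared goodness-of-fit p-value for a two-arm split.

    Hypothetical helper for illustration; valid for exactly two arms
    (one degree of freedom).
    """
    total = control + variant
    if total <= 0 or not 0.0 < expected_control_share < 1.0:
        raise ValueError("need at least one user and a planned share strictly between 0 and 1")

    # Counts the planned ratio predicts for each arm.
    expected_control = total * expected_control_share
    expected_variant = total - expected_control

    # Pearson statistic: sum of (observed - expected)^2 / expected over the arms.
    chi2 = ((control - expected_control) ** 2 / expected_control
            + (variant - expected_variant) ** 2 / expected_variant)

    # Upper-tail probability of chi-squared with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


print(srm_p_value(5050, 4950))  # about 0.32: ordinary random variation
print(srm_p_value(5300, 4700))  # far below 0.001: a split this skewed is not chance
```

What counts as "a very small p-value" is a policy choice. The value 0.001 mentioned in the comment is an assumed example, not a figure taken from this page.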
Why it invalidates a test
SRM almost never appears alone. The same fault that lost or misrouted users — a broken redirect, a CDN cache serving one variant, a bot hitting one arm, a filter dropping events asymmetrically — also distorts the conversion data. So an experiment with SRM is not 'mostly fine with a slight imbalance'; its core comparison is compromised. The discipline is to check SRM first and refuse to interpret results until it passes.
Common culprits include redirect-based variant delivery, asymmetric bot filtering, and logging that fires for one arm more reliably than the other.
- Detected with a chi-squared test on arm counts
- Indicates an assignment, caching, or logging bug
- Invalidates the result; investigate before trusting it
How it appears in analytics and logs
A failed SRM check means users were not split as intended. That usually points to a redirect, caching, bot, or logging fault that also biases the metrics — so the measured lift is untrustworthy until the cause is found.
Diagnostic use case
Run an SRM check (typically a chi-squared test on arm counts) before reading any experiment result, and treat a failed check as a stop-and-investigate signal, not a footnote.
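The stop-and-investigate rule can be sketched as a guard in front of the result calculation. This is one possible arrangement, not a prescribed implementation. The names `SRM_ALPHA`, `srm_p_value`, and `conversion_rate_difference` are hypothetical, the threshold of 0.001 is an assumption, and the sketch assumes a planned 50/50 split between two arms.

```python
import math
from typing import Optional

SRM_ALPHA = 0.001  # assumed threshold, not taken from this page


def srm_p_value(control_users: int, variant_users: int) -> float:
    """Chi-squared goodness-of-fit p-value for a planned 50/50 split (two arms)."""
    if control_users + variant_users <= 0:
        raise ValueError("no users in either arm")
    expected = (control_users + variant_users) / 2
    chi2 = ((control_users - expected) ** 2 + (variant_users - expected) ** 2) / expected
    # With 1 degree of freedom the chi-squared tail is erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def conversion_rate_difference(
    control_users: int,
    variant_users: int,
    control_conversions: int,
    variant_conversions: int,
    alpha: float = SRM_ALPHA,
) -> Optional[float]:
    """Return variant rate minus control rate, or None when the SRM check fails."""
    # Check the split first; a failed check blocks interpretation entirely.
    if srm_p_value(control_users, variant_users) < alpha:
        return None
    return variant_conversions / variant_users - control_conversions / control_users


print(conversion_rate_difference(5000, 5000, 500, 550))  # a small positive difference
print(conversion_rate_difference(5300, 4700, 500, 520))  # None: investigate the split first
```

Returning `None` rather than a number makes it hard to read a lift from an experiment whose split has not passed the check.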
What WebmasterID can help detect
WebmasterID records first-party exposure events per arm, and comparing those counts is exactly the input an SRM check needs to confirm a clean split.
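To show how exposure events turn into the per-arm counts an SRM check consumes, here is a small sketch. The event shape, a pair of an opaque unit key and an arm name, is an assumption made for illustration and is not WebmasterID's actual export format. The function name `exposure_counts` is likewise hypothetical. Each unit is counted once per arm, because repeated exposure events from the same unit would otherwise inflate one side.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def exposure_counts(events: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct units per arm from (unit_key, arm) exposure events.

    Hypothetical event shape: unit_key is any opaque identifier for the
    randomization unit. It is used only for de-duplication.
    """
    units_by_arm: Dict[str, Set[str]] = defaultdict(set)
    for unit_key, arm in events:
        units_by_arm[arm].add(unit_key)
    # Only these aggregate totals are passed on to the SRM check.
    return {arm: len(units) for arm, units in units_by_arm.items()}


events = [
    ("a1", "control"),
    ("a1", "control"),  # repeat exposure, counted once
    ("b2", "variant"),
    ("c3", "variant"),
]
print(exposure_counts(events))  # {'control': 1, 'variant': 2}
```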
Common mistakes
- Reading the lift before checking the split.
- Dismissing a small but statistically significant imbalance.
- Fixing the ratio without finding the root cause.
Privacy and accuracy notes
SRM detection compares aggregate counts per arm, requiring no personal data. This page is educational, not statistical advice.
Related pages
- Control and variant in experiments
In an experiment the control is the existing version that acts as the baseline, and the variant is the version carrying the one change you are testing. Comparing the two only yields a clean answer when assignment is random and the variant differs from the control in exactly one way. Multiple variants are possible but each must be isolated.
- A/B testing fundamentals
An A/B test randomly assigns visitors to a control (A) or a variant (B), shows each group one version, and compares a pre-chosen metric. Random assignment is what lets you attribute a difference to the change rather than to who happened to see it. The discipline is in deciding the metric and sample size before you start, not after you peek at the numbers.
- Guardrail metrics in experiments
Guardrail metrics are the secondary measures you monitor during an experiment to make sure a change that improves the primary metric does not quietly damage something important — load time, retention, refunds, support load. They turn 'did the target go up' into the fuller question 'did the target go up without breaking anything'.
- Bot traffic in analytics: filtering it out
Bots — crawlers, scrapers, monitors, scanners — generate requests that, unfiltered, inflate pageviews and distort every metric. Client-side analytics often misses bots (many do not run JavaScript) or miscounts the ones that do. Server-side classification at ingest is the reliable way to keep bot traffic out of human reports.
Sources and verification notes
- Wikipedia: Pearson's chi-squared test. The goodness-of-fit test used to detect SRM.
- Fabijan et al.: Diagnosing Sample Ratio Mismatch in Online Controlled Experiments
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.