Experiment instrumentation quality
Instrumentation quality is the often-ignored foundation of trustworthy experiments: if exposure, assignment, and metric events are logged wrongly, every downstream statistic is wrong too. This covers logging the exposure point correctly, deduplicating events, handling consent gaps, and validating tracking with A/A tests before a real experiment runs. Bad instrumentation produces confident, precise, and false conclusions.
What can be silently wrong
Common faults: the exposure event fires before the user actually sees the variant (or not at all on some browsers), assignment and exposure logs disagree, conversion events double-fire or drop on slow connections, and consent gating removes events unevenly across arms. None of these announce themselves — the analysis still runs and returns a confident number, just a wrong one. Garbage in, significant garbage out.
- Exposure logged at the wrong moment or not at all
- Duplicate or dropped conversion events
- Consent gaps removing events unevenly
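One way to guard against double-fired conversion events is to deduplicate on an idempotency key before analysis. The sketch below assumes each event carries a client-generated `event_id` that is reused on retries; that field name, the sample events, and the `dedupe_events` helper are hypothetical and not taken from any specific pipeline.

```python
def dedupe_events(events):
    """Keep the first occurrence of each event_id, preserving input order."""
    seen = set()
    unique = []
    for event in events:
        # Assumption: event_id is a client-generated key reused on retries.
        event_id = event["event_id"]
        if event_id in seen:
            continue  # retry or double-fire of the same logical event
        seen.add(event_id)
        unique.append(event)
    return unique


raw = [
    {"event_id": "a1", "user": "u1", "arm": "control", "type": "conversion"},
    {"event_id": "a1", "user": "u1", "arm": "control", "type": "conversion"},
    {"event_id": "b7", "user": "u2", "arm": "treatment", "type": "conversion"},
]
clean = dedupe_events(raw)
```

Deduplication only addresses duplicates. Dropped events leave no record to repair, so they have to be caught by reconciling counts or by spot-checking event firing.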
Validate before you trust
Run an A/A test (two identical variants) and confirm that significant differences appear only at the expected false-positive rate: at a 0.05 significance level, roughly 1 in 20 A/A comparisons will look significant by chance. A recurring 'winner' between identical arms exposes an instrumentation or analysis bug. Reconcile assignment counts against exposure counts for each arm; a gap that differs between arms points to a logging fault and often surfaces as an SRM. Spot-check event firing across browsers and devices. Only after the plumbing is verified do experiment results mean anything. Re-validate whenever the tracking code changes.
Verified instrumentation is the precondition for SRM debugging and every variance-reduction technique to be meaningful.
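The "expected rate" idea can be illustrated with a small simulation: run many A/A comparisons on identical arms and check that the share flagged as significant stays near the chosen significance level. This is a minimal sketch using only the standard library; the two-proportion z-test, the 10% base conversion rate, the sample sizes, and the function names are illustrative assumptions, not a prescribed method.

```python
import math
import random


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (conv_a / n_a - conv_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


def aa_false_positive_rate(runs=400, users_per_arm=500, base_rate=0.10,
                           alpha=0.05, seed=7):
    """Share of simulated A/A tests that look 'significant' at level alpha."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(runs):
        # Both arms draw from the same conversion rate: a true A/A test.
        conv_a = sum(rng.random() < base_rate for _ in range(users_per_arm))
        conv_b = sum(rng.random() < base_rate for _ in range(users_per_arm))
        p = two_proportion_p_value(conv_a, users_per_arm, conv_b, users_per_arm)
        if p < alpha:
            significant += 1
    return significant / runs


rate = aa_false_positive_rate()
```

With healthy plumbing, `rate` should land close to `alpha`. On real data, the same comparison would be repeated over many A/A splits or time windows; a rate well above the significance level suggests an instrumentation or analysis bug rather than bad luck.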
How it appears in analytics and logs
An A/A test that shows a 'significant' difference, or exposure counts that do not match assignment, signals an instrumentation fault, not a real effect.
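Both signals can be checked with a few lines of arithmetic. The sketch below assumes a two-arm test with per-arm assignment and exposure counts already aggregated; the counts, the arm names, and the helper names are made up for illustration. For two arms the chi-square statistic has one degree of freedom, so its p-value can be computed with `math.erfc`.

```python
import math


def srm_p_value(count_a, count_b, expected_share_a=0.5):
    """Chi-square (1 degree of freedom) p-value for a two-arm split."""
    total = count_a + count_b
    expected_a = total * expected_share_a
    expected_b = total - expected_a
    chi2 = ((count_a - expected_a) ** 2 / expected_a
            + (count_b - expected_b) ** 2 / expected_b)
    # For 1 degree of freedom, P(chi2 > x) equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def exposure_rates(assigned, exposed):
    """Fraction of assigned users with a logged exposure, per arm."""
    return {arm: exposed.get(arm, 0) / assigned[arm] for arm in assigned}


# Hypothetical aggregated counts for a planned 50/50 test.
assigned = {"control": 5000, "treatment": 5000}
exposed = {"control": 4900, "treatment": 4500}

rates = exposure_rates(assigned, exposed)
p = srm_p_value(exposed["control"], exposed["treatment"])
```

Here the treatment arm logs exposure for a noticeably smaller share of assigned users than control, and the resulting p-value is far below a strict cutoff such as 0.001 (the cutoff itself is an assumption; pick one before looking at the data). Either signal is a reason to inspect the logging before reading any metric.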
Diagnostic use case
Validate exposure, assignment, and metric logging — ideally with an A/A test — before launching experiments, so results rest on correct data.
What WebmasterID can help detect
WebmasterID's first-party event pipeline is where exposure and conversion events are captured; validating them underpins every test.
Common mistakes
- Trusting results without ever running an A/A validation.
- Logging exposure before the variant is actually rendered.
- Letting consent gating drop events unevenly across arms.
Privacy and accuracy notes
Instrumentation should log only the events needed, respect users' consent choices, and avoid capturing identifying field values.
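A minimal sketch of that rule, assuming events are plain dictionaries and consent is available as a boolean at capture time. The allowlist contents, the field names, and the `sanitize_event` helper are hypothetical.

```python
# Assumption: only these non-identifying fields are needed for analysis.
ALLOWED_FIELDS = {"event_id", "type", "arm", "timestamp"}


def sanitize_event(event, has_consent):
    """Return an allowlisted copy of the event, or None without consent."""
    if not has_consent:
        return None  # nothing is logged for users who have not consented
    return {key: value for key, value in event.items() if key in ALLOWED_FIELDS}


event = {
    "event_id": "a1",
    "type": "conversion",
    "arm": "control",
    "timestamp": 1700000000,
    "email": "user@example.com",  # identifying value that must not be stored
}
logged = sanitize_event(event, has_consent=True)
skipped = sanitize_event(event, has_consent=False)
```

An allowlist fails closed: a new field added by a form or a tag is dropped by default rather than captured by accident.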
Related pages
- Debugging a sample ratio mismatch
A sample ratio mismatch (SRM) — observed variant counts that differ from the intended split by more than chance — invalidates a test, because whatever broke the ratio likely biased the metrics too. Debugging SRM is a systematic hunt: check the assignment mechanism, redirect and timing effects, bot filtering, logging gaps, and analysis filters that drop one arm unevenly. This entry is the troubleshooting procedure, not the definition.
- Sample ratio mismatch (SRM)
Sample ratio mismatch (SRM) is when the observed allocation of users to experiment arms diverges from the planned ratio by more than chance allows — for example a 50/50 test that lands far from 50/50. It signals a bug in assignment, logging, or filtering, and a test with SRM should not be trusted regardless of how good the headline result looks.
- Bot traffic in analytics: filtering it out
Bots — crawlers, scrapers, monitors, scanners — generate requests that, unfiltered, inflate pageviews and distort every metric. Client-side analytics often misses bots (many do not run JavaScript) or miscounts the ones that do. Server-side classification at ingest is the reliable way to keep bot traffic out of human reports.
- Events docs
Capture exposure and conversion events reliably.
Sources and verification notes
- Kohavi, Tang, Xu: Trustworthy Online Controlled Experiments (book site). Standard reference on instrumentation, A/A tests, and trust.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.