Network effects in experiments
Standard A/B tests assume each user's outcome depends only on their own assigned variant — the no-interference (SUTVA) assumption. Network effects break it: in social products, marketplaces, or anything with sharing, a treated user changes the experience of untreated users, so control is 'contaminated' and the measured effect is biased. Cluster, switchback, or ego-network designs reduce the leakage.
Interference breaks the core assumption
The validity of a user-randomised A/B test rests on SUTVA: one user's outcome is unaffected by another user's assignment. Network effects violate this. If a treated user shares content, invites friends, or consumes shared supply, untreated control users feel the treatment second-hand. The control group is no longer a clean baseline, and the estimated treatment effect is biased — often toward zero, sometimes away.
- SUTVA = no interference between units
- Sharing, invites, shared supply all cause spillover
- Bias can shrink or inflate the measured effect
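A toy linear model makes the direction of the bias concrete. The sketch below is illustrative only: the function names, the additive form, and the numbers are assumptions, not results from a real experiment. Each user's expected outcome is a baseline, plus a direct effect if they are treated, plus a spillover term proportional to the share of treated users around them.

```python
def expected_outcome(treated: bool, treated_share: float,
                     base: float, direct: float, spill: float) -> float:
    """Expected outcome under an assumed additive interference model."""
    return base + (direct if treated else 0.0) + spill * treated_share


def naive_vs_global(base: float, direct: float, spill: float,
                    treated_share: float = 0.5) -> tuple[float, float]:
    """Return (user-level A/B estimate, global effect) for the toy model.

    The user-level estimate compares treated and control users who both
    live in a world where `treated_share` of users are treated.
    The global effect compares "everyone treated" with "nobody treated".
    """
    naive = (expected_outcome(True, treated_share, base, direct, spill)
             - expected_outcome(False, treated_share, base, direct, spill))
    global_effect = (expected_outcome(True, 1.0, base, direct, spill)
                     - expected_outcome(False, 0.0, base, direct, spill))
    return naive, global_effect


# Positive spillover (e.g. sharing): the A/B estimate understates the effect.
print(naive_vs_global(base=10.0, direct=2.0, spill=1.0))   # (2.0, 3.0)
# Negative spillover (e.g. competing for shared supply): it overstates it.
print(naive_vs_global(base=10.0, direct=2.0, spill=-1.5))  # (2.0, 0.5)
```

In both cases the user-level comparison recovers only the direct effect, because the spillover term is the same in both arms and cancels. Whether that understates or overstates the global effect depends on the sign of the spillover.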
Designs that contain spillover
Cluster randomisation assigns whole groups (geographies, communities, network components) to one variant so that most interactions stay within an arm. Switchback designs randomise over time windows, so the whole system runs one variant at a time. Ego-cluster and graph-cluster methods build clusters from the interaction graph to approximate isolated neighbourhoods. Each trades statistical power (fewer effective units) for reduced interference, a deliberate exchange when spillover would otherwise dominate the bias.
No design fully removes interference; the goal is to make residual leakage small enough to ignore.
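As a rough illustration of how the unit of assignment changes, the sketch below assigns a variant by hashing a cluster ID or a time-window index instead of a user ID. The function names, the `salt` value, and the 60-minute window are hypothetical choices for this example. Note that it randomises each window independently, whereas some switchback schedules strictly alternate.

```python
import hashlib
from datetime import datetime, timezone

VARIANTS = ("control", "treatment")


def _bucket(key: str, salt: str) -> str:
    """Deterministically map a key to a variant via a salted hash."""
    digest = hashlib.sha256(f"{salt}:{key}".encode("utf-8")).digest()
    return VARIANTS[digest[0] % 2]


def assign_by_cluster(cluster_id: str, salt: str = "exp-001") -> str:
    """Cluster randomisation: every user in the cluster shares one variant."""
    return _bucket(cluster_id, salt)


def assign_by_window(ts: datetime, window_minutes: int = 60,
                     salt: str = "exp-001") -> str:
    """Switchback-style assignment: the whole system shares one variant
    for the duration of each time window."""
    window_index = int(ts.timestamp()) // (window_minutes * 60)
    return _bucket(str(window_index), salt)


# Two users in the same (hypothetical) region always land in the same arm.
print(assign_by_cluster("region-7") == assign_by_cluster("region-7"))

# Two requests within the same hour see the same variant.
t1 = datetime(2024, 1, 1, 10, 5, tzinfo=timezone.utc)
t2 = datetime(2024, 1, 1, 10, 55, tzinfo=timezone.utc)
print(assign_by_window(t1) == assign_by_window(t2))
```

Whichever unit is hashed here should also be the unit of analysis. Analysing at user level after assigning by cluster or window would understate the variance.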
How it appears in analytics and logs
Under interference, a user-level A/B test under- or over-states the true effect because control users are indirectly exposed to treatment.
Diagnostic use case
Watch for interference whenever users interact (messaging, sharing, shared inventory) and switch to a cluster or time-based design when spillover is plausible.
What WebmasterID can help detect
WebmasterID's aggregate first-party metrics can help spot control-group behaviour that shifts in ways consistent with spillover.
Common mistakes
- Running user-level tests on social or marketplace features without checking for spillover.
- Assuming a near-zero effect is real when interference may be masking it.
- Ignoring the power cost of cluster designs.
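The power cost in the last point can be roughly quantified with the standard design-effect approximation, 1 + (m - 1) × ICC, where m is the cluster size and ICC is the intra-cluster correlation. The sketch below assumes equal-sized clusters, and the numbers are invented for illustration.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation from cluster randomisation, assuming equal
    cluster sizes and intra-cluster correlation `icc`."""
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n_users: int, cluster_size: float,
                          icc: float) -> float:
    """Number of independent users that would give the same precision."""
    return n_users / design_effect(cluster_size, icc)


# Hypothetical experiment: 10,000 users in clusters of 100, ICC = 0.05.
deff = design_effect(cluster_size=100, icc=0.05)
n_eff = effective_sample_size(n_users=10_000, cluster_size=100, icc=0.05)
print(f"design effect ~ {deff:.2f}, effective sample size ~ {n_eff:.0f}")
```

Under these assumed numbers the variance is inflated by roughly a factor of six, so the 10,000 clustered users carry about as much information as 1,700 independently randomised ones. Even a small ICC matters when clusters are large.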
Privacy and accuracy notes
Cluster designs group by coarse units (regions, communities) rather than tracking individual relationships in identifiable detail.
Related pages
- Switchback experiments
A switchback experiment randomises treatment at the level of time windows (and sometimes regions) rather than users: the entire system runs control for one interval, treatment for the next, alternating on a schedule. It is used where treating some users affects others — marketplaces, pricing, dispatch — so a user-level split would leak between arms. Time becomes the randomisation unit.
- Randomization unit
The randomization unit is the thing you randomly assign to control or treatment: a user, a session, a device, a cookie, or a cluster. The choice must match how you analyse and how users experience the change. Mismatches cause two classic failures — a user flipping variants between sessions (inconsistent experience) and analysing at a finer grain than you assigned (understated variance, false significance).
- Referral funnel
The referral funnel measures how existing users bring in new ones: being prompted to invite, sharing, the invitee clicking, the invitee signing up, and the invitee activating. Each stage has its own drop-off. Referral carries pitfalls that other funnels do not — two-sided incentives that can attract gaming, attribution of who gets credit, and network interference that complicates experiments measuring it.
- WebmasterID docs
How conversion events feed your own analysis.
Sources and verification notes
- Saveski et al., "Detecting Network Effects: Randomizing Over Randomized Experiments" (KDD). Peer-reviewed treatment of interference detection in experiments.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.