Differential privacy
Differential privacy is a mathematical framework that bounds how much any single person's data can affect a published result, by injecting carefully calibrated random noise. It lets you release useful aggregate statistics while provably limiting what can be learned about any individual. This page explains the core idea and where it appears in analytics.
The core guarantee
A randomised analysis is differentially private if its output distribution barely changes whether or not any one individual's record is included. 'Barely' is quantified by a privacy-loss parameter, epsilon (ε): a smaller epsilon means stronger privacy and more noise; a larger epsilon means weaker privacy and more accuracy. A mechanism typically adds random noise, for example drawn from a Laplace or Gaussian distribution, scaled to the query's sensitivity, which is the most that adding or removing one individual's record can change the true result.
Because the guarantee holds regardless of an attacker's side knowledge, it is robust against many re-identification attacks that defeat ad-hoc anonymisation.
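As a minimal sketch of the noise calibration described above, the Python below adds Laplace noise with scale sensitivity / epsilon to a count. The names laplace_noise and noisy_count, and the default sensitivity of 1 (one person changes a count by at most one), are illustrative assumptions rather than any particular system's API. Naive floating-point sampling like this is for illustration only and should not be treated as production-grade.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace(0, scale) distribution."""
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace distribution centred on zero with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count: int, epsilon: float, rng: random.Random,
                sensitivity: float = 1.0) -> float:
    """Return true_count plus Laplace noise scaled to sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)  # seeded only so the illustration is repeatable
strong_privacy = noisy_count(1000, epsilon=0.1, rng=rng)  # noise scale 10
weak_privacy = noisy_count(1000, epsilon=2.0, rng=rng)    # noise scale 0.5
print(strong_privacy, weak_privacy)
```

A smaller epsilon gives a larger noise scale, which is the privacy-accuracy trade-off in concrete form.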
Where it shows up
Differential privacy underpins parts of several privacy-preserving systems and is used to publish statistics: for example, it was applied to protect census data and appears in some browser and platform measurement features that report aggregates. In analytics, it lets you share counts and trends while bounding individual exposure, at the cost of added noise whose relative impact is largest for small or finely sliced segments.
- Epsilon (ε) tunes the privacy-accuracy trade-off
- Noise is calibrated to query sensitivity
- Strong against side-knowledge re-identification attacks
How it appears in analytics and logs
Differentially private outputs are intentionally approximate. The noise scale typically does not grow with the size of the true value, so large aggregates stay accurate while tiny segments can be noise-dominated; that is the privacy guarantee working as designed.
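To make the large-versus-small contrast concrete, here is a rough sketch that assumes a Laplace mechanism with sensitivity 1, which a given deployment may or may not use. The expected absolute error of Laplace noise equals its scale (sensitivity / epsilon), so the relative error shrinks as the true count grows. The function name expected_relative_error is hypothetical.

```python
def expected_relative_error(true_count: int, epsilon: float,
                            sensitivity: float = 1.0) -> float:
    """Expected absolute Laplace error divided by the true count."""
    if true_count <= 0 or epsilon <= 0:
        raise ValueError("true_count and epsilon must be positive")
    # For Laplace(0, b) noise the mean absolute deviation is b itself.
    return (sensitivity / epsilon) / true_count


for count in (100_000, 1_000, 10):
    error = expected_relative_error(count, epsilon=1.0)
    print(f"count={count:>7}  expected relative error={error:.3%}")
```

Under these assumptions and an epsilon of 1, a count of 100,000 is off by about 0.001% on average, while a count of 10 is off by about 10%.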
Diagnostic use case
Recognise when aggregates are differentially private, and that small or finely sliced segments carry proportionally more noise, so that you read noised counts appropriately.
What WebmasterID can help detect
The intuition behind differential privacy — protect individuals while publishing aggregates — mirrors WebmasterID's aggregate-first reporting posture.
Common mistakes
- Reading noised small-segment counts as exact.
- Assuming that adding any noise amounts to differential privacy.
- Ignoring the cumulative privacy budget across many queries.
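The cumulative budget in the last point can be sketched with basic sequential composition, under which the epsilons of successive queries on the same data add up. The PrivacyBudget class below is a hypothetical illustration, not a real library's API; real deployments often use tighter accounting methods.

```python
class PrivacyBudget:
    """Track total epsilon spent, assuming basic sequential composition."""

    def __init__(self, total_epsilon: float) -> None:
        if total_epsilon <= 0:
            raise ValueError("total_epsilon must be positive")
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    @property
    def remaining(self) -> float:
        return max(self.total_epsilon - self.spent, 0.0)

    def charge(self, epsilon: float) -> None:
        """Record one query's epsilon, refusing it if the budget would be exceeded."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance so float rounding does not reject an exact fit.
        if self.spent + epsilon > self.total_epsilon + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon


budget = PrivacyBudget(total_epsilon=1.0)
for _ in range(4):
    budget.charge(0.25)  # four queries at epsilon 0.25 use the whole budget
print(budget.remaining)
```

Once the budget is spent, any further query on the same data would weaken the overall guarantee, so the sketch refuses it rather than answering.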
Privacy and accuracy notes
Differential privacy is a privacy-protective technique. This page is educational; it explains the mechanism, not a guarantee that any given deployment is correctly configured.
Related pages
- k-anonymity in aggregate reporting
k-anonymity is a privacy model in which every record is indistinguishable from at least k-1 others on its quasi-identifiers, so no individual can be singled out within a group. Analytics platforms apply k-anonymity-style thresholds to suppress or hide small segments. This page explains the model, why thresholds appear in reports, and its known weaknesses.
- Anonymisation vs pseudonymisation
Anonymisation and pseudonymisation are often confused but have very different legal consequences. Truly anonymous data cannot be linked back to a person by any reasonable means, so it falls outside the GDPR. Pseudonymous data can be re-identified using a separately held key, so it remains personal data. Mislabelling one as the other is a common and costly error. This is educational, not legal advice.
- The Attribution Reporting API
The Attribution Reporting API (ARA) is a Privacy Sandbox API that connects ad clicks or views to later conversions without third-party cookies or cross-site identifiers. It produces two kinds of output — limited, noised event-level reports and aggregatable summary reports processed through an aggregation service. This page explains both and their trade-offs.
- Privacy-first analytics
Aggregate-first reporting that protects individuals.
Sources and verification notes
- Harvard: Differential Privacy (privacytools.seas.harvard.edu). Accessible primer on the formal definition.
- NIST: Differential Privacy (SP 800-226 / blog series). Standards-body explanation of the technique.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.