Tokenization and data masking
Tokenization replaces a sensitive value with a non-sensitive surrogate ('token') and keeps the mapping in a separately protected vault, so analytics can join records without holding the original. Data masking transforms or obscures field values by redacting, scrambling, or partially hiding characters, so the displayed or stored data is less revealing. Both are data-protection techniques, not legal regimes. This page is educational; how any particular law treats them depends on whether the transformation is reversible and on who controls the keys or vault.
Tokenization vs masking
Tokenization substitutes a sensitive value with an unrelated token and stores the original-to-token map in a guarded vault; with vault access the value can be recovered, so it is reversible by design. Masking instead transforms the value itself, for example by showing only the last digits or replacing characters, and may be irreversible (static masking) or applied only at display time (dynamic masking). Choose between them based on whether the original must ever be recovered and whether downstream joins need a consistent surrogate for each value.
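The difference can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the `TokenVault` class, the `tok_` prefix, and the `mask_keep_last` helper are hypothetical names, and the in-memory dictionaries stand in for what would normally be a separately secured vault service.

```python
import secrets


class TokenVault:
    """Toy in-memory vault (assumption: a real vault is a separate, access-controlled store)."""

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the same value always gets the same surrogate.
        if value in self._token_by_value:
            return self._token_by_value[value]
        token = "tok_" + secrets.token_hex(8)  # random, not derived from the value
        self._token_by_value[value] = token
        self._value_by_token[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # Reversal is only possible with access to the vault mapping.
        return self._value_by_token[token]


def mask_keep_last(value: str, visible: int = 4, fill: str = "*") -> str:
    """Mask all but the last `visible` characters; mask everything if the value is too short."""
    if visible <= 0 or len(value) <= visible:
        return fill * len(value)
    return fill * (len(value) - visible) + value[-visible:]


vault = TokenVault()
token = vault.tokenize("alice@example.com")
masked_card = mask_keep_last("4111111111111111")
```

Here `token` can be turned back into the email only through `vault.detokenize`, while `masked_card` has lost its hidden digits for good: nothing in the masked string allows them to be restored.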
What it does and does not achieve
Both techniques cut the blast radius of a leak: a stolen set of tokenized records is far less useful without the vault. But reversible tokenization is pseudonymisation, not anonymisation: the data can be re-linked given the mapping, so privacy law often still applies. Static, irreversible masking can move data closer to anonymous, but only if the values that remain visible, alone or combined with other fields, do not allow re-identification. Protect keys and vaults as carefully as the original data.
NIST's de-identification guidance (SP 800-188, listed under Sources) frames these trade-offs.
- Tokenization: reversible surrogate via a protected vault
- Masking: redact, scramble, or partially hide values
- Reversible mapping means the data stays pseudonymous
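The point about residual values can be made concrete with a simple uniqueness check. The sketch below is an assumption-laden illustration, not a re-identification risk assessment: `mask_email_local_part` and `unique_values` are hypothetical helpers, and the sample emails are invented. It masks the local part of each email but keeps the domain, then lists masked values that occur only once.

```python
from collections import Counter


def mask_email_local_part(email: str) -> str:
    """Static mask (illustrative): hide the local part, keep the domain visible."""
    local, sep, domain = email.partition("@")
    if not sep:
        # Not an email-shaped value: mask it completely.
        return "*" * len(email)
    return "***@" + domain


def unique_values(values):
    """Return masked values that appear exactly once and so may single out one record."""
    counts = Counter(values)
    return sorted(v for v, n in counts.items() if n == 1)


emails = ["ann@example.com", "bob@example.com", "cy@tiny-firm.example"]
masked = [mask_email_local_part(e) for e in emails]
risky = unique_values(masked)
```

In this sample, `risky` contains the single `tiny-firm.example` entry: the mask is irreversible, yet the visible domain still points at one record. A value that is unique after masking is a signal to mask more aggressively or to treat the data as still personal.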
How it appears in analytics and logs
If analytics fields show tokens or partly hidden values rather than raw identifiers, tokenization or masking is in use; the key question is who can reverse it.
Diagnostic use case
Reduce exposure of sensitive fields in analytics by replacing them with tokens or masked values, so downstream systems work without the raw data.
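As a rough sketch of this use case, the Python below applies a per-field policy to an analytics event before it is passed downstream. Everything named here is hypothetical: the `FIELD_POLICY` mapping, the field names, the `tok_` prefix, and the plain dictionary used as a stand-in for a protected vault.

```python
import secrets

# Hypothetical policy: which fields to tokenize or mask. Unlisted fields pass through.
FIELD_POLICY = {"email": "tokenize", "card_number": "mask"}


def mask_keep_last4(text: str) -> str:
    """Keep the last four characters; mask everything if the value is that short."""
    if len(text) <= 4:
        return "*" * len(text)
    return "*" * (len(text) - 4) + text[-4:]


def protect_event(event: dict, vault: dict) -> dict:
    """Return a copy of `event` with sensitive fields replaced by surrogates.

    `vault` maps original values to tokens (assumption: stored separately and
    access-controlled in any real deployment).
    """
    safe = {}
    for field, value in event.items():
        action = FIELD_POLICY.get(field, "keep")
        if action == "tokenize":
            # Same value, same token: downstream joins still work.
            if value not in vault:
                vault[value] = "tok_" + secrets.token_hex(8)
            safe[field] = vault[value]
        elif action == "mask":
            safe[field] = mask_keep_last4(str(value))
        else:
            safe[field] = value
    return safe


vault = {}
first = protect_event(
    {"email": "ann@example.com", "card_number": "4111111111111111", "page": "/pricing"},
    vault,
)
second = protect_event({"email": "ann@example.com", "page": "/checkout"}, vault)
```

Both events carry the same email token, so a downstream system can group them by visitor without ever seeing the address. Only a party with access to `vault` can map the token back.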
What WebmasterID can help detect
WebmasterID minimises identifiers at ingest; tokenization and masking illustrate the same goal of working with surrogates instead of raw sensitive values.
Common mistakes
- Calling reversible tokenization 'anonymisation'.
- Leaving the token vault less protected than the raw data.
- Masking display fields while storing the raw value unprotected.
Privacy and accuracy notes
This page is educational, not legal advice. Reversible tokenization is typically pseudonymisation, not anonymisation, since the mapping can re-identify data.
Related pages
- Pseudonymisation in analytics
Pseudonymisation processes personal data so it can no longer be attributed to a specific person without additional information that is kept separately and secured. It is a recognised safeguard under the GDPR — but pseudonymised data is still personal data, not anonymous. Understanding that distinction prevents over-claiming privacy protection. This is an educational overview, not legal advice.
- Hashing and salting identifiers
Hashing applies a one-way function to an identifier (such as an email or IP) to produce a fixed-length digest, so the original is not stored directly. Salting prepends a secret value before hashing to defeat precomputed lookup ('rainbow') tables and dictionary attacks. In analytics these techniques pseudonymise identifiers, but because the input space is often small or guessable, hashed identifiers are frequently still personal data. This page is educational, not legal advice.
- Anonymisation vs pseudonymisation
Anonymisation and pseudonymisation are often confused but have very different legal consequences. Truly anonymous data cannot be linked back to a person by any reasonable means, so it falls outside the GDPR. Pseudonymous data can be re-identified using a separately held key, so it remains personal data. Mislabelling one as the other is a common and costly error. This is educational, not legal advice.
- Privacy-first analytics
Working with surrogates reduces exposure of raw identifiers.
Sources and verification notes
- NIST SP 800-188 / Tokenization guidance (De-Identifying Government Data): primary guidance on de-identification techniques including tokenization.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.