Privacy & compliance

Tokenization and data masking

Tokenization replaces a sensitive value with a non-sensitive surrogate (a 'token') and keeps the value-to-token mapping in a separately protected vault, so analytics can join records without holding the original. Data masking transforms or obscures field values (redacting, scrambling, or partially hiding characters) so the displayed or stored data reveals less. Both are data-protection techniques, not legal regimes. This page is educational; how either technique affects obligations under a given law depends on whether it can be reversed and who controls the keys or vault.
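The vault idea can be sketched in a few lines. This is a minimal, in-memory illustration under stated assumptions, not a production design: the class name `TokenVault`, the `tok_` prefix, and the use of a plain dict as the "vault" are all hypothetical choices for readability. It shows the two properties the definition relies on: the same value always gets the same token (so joins work), and only code with access to the mapping can reverse it.

```python
import secrets


class TokenVault:
    """In-memory token vault, for illustration only.

    Assumption: a production vault is a separate, access-controlled and audited
    service. A dict in the same process as the analytics code offers no such
    separation.
    """

    def __init__(self) -> None:
        self._token_to_value: dict[str, str] = {}
        self._value_to_token: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Reuse an existing token so the same input always yields the same
        # surrogate; this consistency is what lets analytics join on tokens.
        if value in self._value_to_token:
            return self._value_to_token[value]
        # Tokens are random, so they reveal nothing about the original value.
        # "tok_" is a hypothetical prefix chosen for readability.
        token = "tok_" + secrets.token_hex(8)
        while token in self._token_to_value:  # guard against a rare collision
            token = "tok_" + secrets.token_hex(8)
        self._value_to_token[value] = token
        self._token_to_value[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # Reversal is only possible with access to the vault's mapping.
        return self._token_to_value[token]


vault = TokenVault()
t1 = vault.tokenize("4111111111111111")
t2 = vault.tokenize("4111111111111111")
print(t1 == t2)              # True: same value, same token
print(vault.detokenize(t1))  # 4111111111111111
```

Because `detokenize` exists at all, whoever can call it can re-identify the data; that is the reversibility the rest of this page turns on.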

Verified against primary sources

Tokenization vs masking

Tokenization substitutes a sensitive value with an unrelated token and stores the original-to-token mapping in a guarded vault; anyone with vault access can recover the value, so tokenization is reversible by design. Masking instead transforms the value itself, for example by showing only the last few digits or replacing characters. Static masking rewrites the stored data and may be irreversible; dynamic masking leaves the stored value intact and hides it only at display time. Choose based on whether downstream joins need consistent surrogates: a token maps one value to one surrogate, whereas partial masking such as showing the last four digits can map many different values to the same output.
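The difference between static and dynamic masking can be sketched as follows. This is a minimal illustration, assuming a "keep the last four characters" rule and a hypothetical `fraud_analyst` role that is allowed to see raw values; neither comes from any particular product.

```python
def mask_static(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Static masking: keep only the last `visible` characters.

    Whatever this returns is what gets stored, so the hidden characters are
    gone from that copy of the data.
    """
    if visible <= 0:
        return mask_char * len(value)
    hidden = max(len(value) - visible, 0)
    return mask_char * hidden + value[-visible:]


def render_field(stored_value: str, viewer_role: str) -> str:
    """Dynamic masking: the stored value is untouched; masking happens on read.

    "fraud_analyst" is a hypothetical role; a real system would consult its
    own access-control policy here.
    """
    if viewer_role == "fraud_analyst":
        return stored_value
    return mask_static(stored_value)


print(mask_static("4111111111111111"))  # ************1111
print(mask_static("5500000000001111"))  # ************1111 (different card, same output)
print(render_field("4111111111111111", "support_agent"))  # ************1111
```

The second line shows why masked fields are a poor join key: two different card numbers collapse to the same masked string. Note also that a value no longer than `visible` comes back unmasked, so real code would need a policy for short fields.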

What it does and does not achieve

Both techniques reduce the blast radius of a leak: a stolen table of tokens is far less useful to an attacker who cannot reach the vault. But reversible tokenization is pseudonymisation, not anonymisation: anyone holding the mapping can re-link the data, so privacy law often still applies. Static, irreversible masking can move data closer to anonymisation, but only if no residual values (such as the visible digits combined with other fields) allow re-identification. Protect keys and vaults as carefully as the original data.
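One rough way to test the "no residual values allow re-identification" condition is to count masked records whose remaining fields are unique, which is the basic idea behind k-anonymity checks. The sketch below assumes hypothetical field names (`card`, `postcode`) and made-up sample rows; passing such a check is necessary but not sufficient for anonymisation.

```python
from collections import Counter


def count_unique(records: list[dict[str, str]], fields: list[str]) -> int:
    """Count records whose combination of `fields` occurs exactly once.

    Each such record is singled out by its residual values, so anyone holding
    another dataset with the same fields could link it to one person.
    """
    keys = [tuple(r[f] for f in fields) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1)


# Hypothetical masked rows: the visible digits plus a postcode remain.
masked = [
    {"card": "************1111", "postcode": "SW1A"},
    {"card": "************1111", "postcode": "SW1A"},
    {"card": "************4242", "postcode": "EC2R"},
]
print(count_unique(masked, ["card", "postcode"]))  # 1: the EC2R record is unique
```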

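NIST SP 800-188 frames these de-identification trade-offs for government datasets; the same questions of reversibility and key control apply to payment card data and other sensitive fields.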

Tokenization: reversible surrogate via a protected vault
Masking: redact, scramble, or partially hide values
Reversible tokenization remains pseudonymisation, not anonymisation

How it appears in analytics and logs

If analytics fields show tokens or partially hidden values rather than raw identifiers, tokenization or masking is in use; the key question is who holds the vault or keys that can reverse it.
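A simple heuristic can flag which of the three cases a logged value looks like. This is a sketch under loud assumptions: it treats a run of `*`, `X` or `#` as masking and a hypothetical `tok_` prefix followed by 16 hex characters as a token. It cannot answer who can reverse a value; that is a question about vault and key access, not about the string itself.

```python
import re

# Assumed formats: a run of four or more mask characters, and tokens shaped
# like a hypothetical "tok_" prefix plus 16 hex characters. Real token formats
# vary; format-preserving tokens can look exactly like raw values, which is
# why the fallback is "raw_or_unknown" rather than "raw".
MASK_RE = re.compile(r"[*X#]{4,}")
TOKEN_RE = re.compile(r"tok_[0-9a-f]{16}")


def classify_field(value: str) -> str:
    if TOKEN_RE.fullmatch(value):
        return "tokenized"
    if MASK_RE.search(value):
        return "masked"
    return "raw_or_unknown"


for v in ["tok_0123456789abcdef", "************1111", "4111111111111111"]:
    print(v, "->", classify_field(v))
```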

Diagnostic use case

Reduce exposure of sensitive fields in analytics by replacing them with tokens or masked values, so downstream systems work without the raw data.
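Applied to an analytics event, the use case amounts to rewriting sensitive fields before the event leaves the collecting system. The sketch below assumes hypothetical field names (`user_id` to tokenize, `email` to mask) and uses a plain dict as a stand-in for a separately protected vault; it is illustrative, not a prescribed pipeline.

```python
import secrets

# Hypothetical field names for an analytics event.
TOKENIZE_FIELDS = {"user_id"}
MASK_FIELDS = {"email"}


def mask_email(email: str) -> str:
    # Keep the first character and the domain; hide the rest of the local part.
    local, _, domain = email.partition("@")
    if not local or not domain:
        return "***"
    return local[0] + "***@" + domain


def protect_event(event: dict[str, str], vault: dict[str, str]) -> dict[str, str]:
    """Return a copy of `event` that is safer to send to downstream analytics.

    `vault` maps raw values to tokens. It stands in for a separately protected
    vault (an assumption); a real vault would also hold the reverse mapping
    under its own access control.
    """
    out = {}
    for key, value in event.items():
        if key in TOKENIZE_FIELDS:
            if value not in vault:
                vault[value] = "tok_" + secrets.token_hex(8)
            out[key] = vault[value]  # same user, same token: joins still work
        elif key in MASK_FIELDS:
            out[key] = mask_email(value)
        else:
            out[key] = value  # non-sensitive fields pass through unchanged
    return out


vault: dict[str, str] = {}
e1 = protect_event({"user_id": "u-123", "email": "alice@example.com", "page": "/pricing"}, vault)
e2 = protect_event({"user_id": "u-123", "email": "alice@example.com", "page": "/docs"}, vault)
print(e1["email"])                     # a***@example.com
print(e1["user_id"] == e2["user_id"])  # True
```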

What WebmasterID can help detect

WebmasterID minimises identifiers at ingest; tokenization and masking illustrate the same goal of working with surrogates instead of raw sensitive values.

Common mistakes

Calling reversible tokenization 'anonymisation'.
Leaving the token vault less protected than the raw data.
Masking display fields while storing the raw value unprotected.
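For payment card numbers specifically, one way to catch the third mistake is to scan stored rows or log lines for digit runs that pass the Luhn checksum, which real card numbers satisfy. This is a minimal sketch assuming unformatted 13 to 19 digit numbers; it will miss numbers written with spaces or dashes and can flag unrelated digit strings that happen to pass the check.

```python
import re

# Candidate card numbers: 13 to 19 consecutive digits. Numbers written with
# spaces or dashes are not matched by this simple pattern.
CANDIDATE_RE = re.compile(r"\b\d{13,19}\b")


def luhn_valid(digits: str) -> bool:
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_raw_pans(text: str) -> list[str]:
    """Return digit runs in `text` that look like unprotected card numbers."""
    return [m.group() for m in CANDIDATE_RE.finditer(text) if luhn_valid(m.group())]


row = "card=4111111111111111 display=************1111"
print(find_raw_pans(row))  # ['4111111111111111']
```

Here the masked display field is fine, but the raw value stored next to it is exactly the exposure the list warns about.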

Privacy and accuracy notes

This page is educational, not legal advice. Reversible tokenization is typically pseudonymisation, not anonymisation, since the mapping can re-identify data.


Sources and verification notes

NIST SP 800-188, De-Identifying Government Datasets: Techniques and Governance. Primary guidance on de-identification techniques, including tokenization.

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.