Hashing and salting identifiers
Hashing applies a one-way function to an identifier (such as an email or IP address) to produce a fixed-length digest, so the original is not stored directly. Salting combines the identifier with an additional value, kept secret in this context, before hashing to defeat precomputed lookup ('rainbow') tables and dictionary attacks. In analytics these techniques pseudonymise identifiers, but because the input space is often small or guessable, hashed identifiers are frequently still personal data. This page is educational, not legal advice.
How hashing and salting work
A cryptographic hash maps any input to a fixed-length, deterministic digest that is infeasible to reverse directly. But determinism is double-edged: the same email always hashes to the same value, so an attacker who guesses candidate emails and hashes them can match the results against stored digests. A salt (a secret, ideally per-record or per-system value added before hashing) breaks precomputed tables and makes guessing far more expensive. A pepper (a secret kept separate from the data) raises the bar further.
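As a minimal sketch of the difference, the Python below compares a plain SHA-256 digest with a keyed one. It is an illustration under assumptions, not a description of any particular product: the function names are hypothetical, the identifier is normalised by trimming and lower-casing, and HMAC-SHA256 is used as one standard way to combine a secret with the input rather than literally prepending it.

```python
import hashlib
import hmac
import secrets


def plain_digest(identifier: str) -> str:
    """Unsalted SHA-256 of a normalised identifier (deterministic for everyone)."""
    normalised = identifier.strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def keyed_digest(identifier: str, secret: bytes) -> str:
    """HMAC-SHA256 of a normalised identifier, keyed with a secret value."""
    normalised = identifier.strip().lower()
    return hmac.new(secret, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical secret for this demo; a real one would be generated once and stored securely.
secret = secrets.token_bytes(32)

# Same input, same digest: anyone can reproduce the unsalted value.
print(plain_digest("Alice@Example.com") == plain_digest("alice@example.com"))  # True

# The keyed digest cannot be reproduced without the secret.
print(keyed_digest("alice@example.com", secret) == plain_digest("alice@example.com"))  # False
```

With a single system-wide secret, digests stay consistent across records, so they can still be joined. That also means they remain linkable to a person by anyone who holds the secret.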
Why hashed identifiers are often still personal data
Regulators including the EDPB and the UK ICO have made clear that hashing usually produces pseudonymous, not anonymous, data: if the original can be re-derived by guessing a limited or predictable input space, the hash still relates to an identifiable person. IPv4 has only about 4.3 billion possible addresses, phone numbers follow fixed formats, and email addresses can often be guessed from known lists. So hashing reduces exposure and is good practice, but it does not remove data from the scope of privacy law. Treat hashed identifiers with the same care as the originals, and keep salts secret.
Unsalted hashes of low-entropy inputs offer little real protection.
- Hash = one-way digest; salt = secret to defeat guessing
- Deterministic hashes of small input spaces are reversible by guessing
- Regulators treat re-identifiable hashes as personal data
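To make the guessing risk concrete, the sketch below recovers an IP address from its unsalted SHA-256 digest by hashing every candidate in a range. The function names and the documentation-range addresses are illustrative assumptions. The demo searches a /24 block of 256 addresses, and the same loop would work over the full IPv4 space of 2^32 addresses, only more slowly.

```python
import hashlib
import ipaddress
from typing import Optional


def sha256_hex(value: str) -> str:
    """Unsalted SHA-256 digest of a string, as lowercase hex."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def recover_ip(target_digest: str, network: str) -> Optional[str]:
    """Hash each address in the network and return the one that matches, if any."""
    for address in ipaddress.ip_network(network):
        candidate = str(address)
        if sha256_hex(candidate) == target_digest:
            return candidate
    return None


# A digest as it might appear in a log that stores "hashed" IPs without a salt.
stored_digest = sha256_hex("192.0.2.77")

print(recover_ip(stored_digest, "192.0.2.0/24"))  # 192.0.2.77
```

A secret salt or key defeats this particular loop only while the secret stays unknown to the attacker.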
How it appears in analytics and logs
If analytics stores digests instead of raw emails or IPs, hashing is in use; whether it counts as anonymous depends on guessability and salt secrecy.
Diagnostic use case
Reduce direct storage of raw identifiers in analytics by hashing them, using a secret salt so the digests resist reversal by guessing common inputs.
What WebmasterID can help detect
WebmasterID minimises identifiers at ingest; hashing and salting illustrate why a digest of an email or IP is still typically personal data, not anonymous.
Common mistakes
- Calling hashed identifiers 'anonymous'.
- Using unsalted hashes of emails, phone numbers, or IPs.
- Storing the salt alongside the digests with no separation.
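One way to avoid the last mistake is to keep the secret out of the analytics datastore entirely and load it at runtime. The sketch below assumes the secret is supplied as a hex string in an environment variable. The variable name `ANALYTICS_HASH_SECRET`, the function names, and the 32-byte minimum are hypothetical choices for illustration, and a secrets manager would serve the same purpose.

```python
import hashlib
import hmac
import os


def load_secret(env_var: str = "ANALYTICS_HASH_SECRET") -> bytes:
    """Read a hex-encoded secret from the environment, never from the data store."""
    value = os.environ.get(env_var)
    if not value:
        raise RuntimeError(f"{env_var} is not set; refusing to hash without a secret")
    secret = bytes.fromhex(value)  # raises ValueError if the value is not valid hex
    if len(secret) < 32:
        raise ValueError("secret should be at least 32 bytes")
    return secret


def pseudonymise(identifier: str, secret: bytes) -> str:
    """Keyed digest (HMAC-SHA256) of a trimmed, lower-cased identifier."""
    normalised = identifier.strip().lower()
    return hmac.new(secret, normalised.encode("utf-8"), hashlib.sha256).hexdigest()
```

A caller would run `secret = load_secret()` once at startup and pass it to `pseudonymise` for each identifier. Failing loudly when the secret is missing avoids silently falling back to unsalted hashes.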
Privacy and accuracy notes
This page is educational, not legal advice. Hashing is usually pseudonymisation, not anonymisation: regulators treat re-identifiable hashes as personal data.
Related pages
- Pseudonymisation in analytics
Pseudonymisation processes personal data so it can no longer be attributed to a specific person without additional information that is kept separately and secured. It is a recognised safeguard under the GDPR — but pseudonymised data is still personal data, not anonymous. Understanding that distinction prevents over-claiming privacy protection. This is an educational overview, not legal advice.
- Tokenization and data masking
Tokenization replaces a sensitive value with a non-sensitive surrogate ('token'), keeping the mapping in a separately protected vault so analytics can join records without holding the original. Data masking transforms or obscures field values — redacting, scrambling, or partially hiding characters — so the displayed or stored data is less revealing. Both are data-protection techniques, not legal regimes. This page is educational; their effect on any law depends on reversibility and key control.
- Anonymisation vs pseudonymisation
Anonymisation and pseudonymisation are often confused but have very different legal consequences. Truly anonymous data cannot be linked back to a person by any reasonable means, so it falls outside the GDPR. Pseudonymous data can be re-identified using a separately held key, so it remains personal data. Mislabelling one as the other is a common and costly error. This is educational, not legal advice.
- Privacy-first analytics
Minimising identifiers avoids relying on hashing for compliance.
Sources and verification notes
- ICO (UK): Guidance on anonymisation and pseudonymisation. Regulator guidance treating re-identifiable hashes as personal data. Educational, not legal advice.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.