How to parse user agents safely
Parsing user agents by hand with regular expressions is fragile and breaks as strings evolve. The safer approach is to use a maintained UA library, store a coarse category rather than each visitor's raw string, and treat the result as a hint, not an identity. This page sets out a privacy-safe parsing approach.
Use a maintained library, not hand-rolled regex
User-agent strings are inconsistent and change frequently, so hand-written regular expressions tend to drift out of date and misclassify new clients. A maintained user-agent parsing library encodes community knowledge of current patterns and is updated as browsers evolve.
Match on stable tokens for coarse classification and accept that you will not perfectly identify every client. An honest 'unknown' is better than a confident wrong guess.
- Prefer a maintained UA library over bespoke regexes
- Match on stable tokens, not full version strings
- Keep an honest 'unknown' bucket for unrecognised clients
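To make the idea of stable tokens and an honest 'unknown' bucket concrete, here is a minimal Python sketch. It is an illustration only, not a substitute for a maintained library: the token lists, the `Coarse` type, and the `classify` function are all hypothetical names chosen for this example, and the device rule is deliberately crude.

```python
from typing import NamedTuple


class Coarse(NamedTuple):
    family: str  # browser family, or "unknown"
    device: str  # "mobile", "desktop", or "unknown"
    kind: str    # "bot", "human" (a hint only), or "unknown"


# Illustrative token table (an assumption, not a complete signature list).
# Order matters: Edge and Opera strings also contain "Chrome/", and
# Chrome strings also contain "Safari/".
_FAMILY_TOKENS = [
    ("Edg/", "edge"),
    ("OPR/", "opera"),
    ("Firefox/", "firefox"),
    ("Chrome/", "chrome"),
    ("Safari/", "safari"),
]
_BOT_TOKENS = ("bot", "crawler", "spider")

UNKNOWN = Coarse("unknown", "unknown", "unknown")


def classify(ua: str) -> Coarse:
    """Map a raw user-agent string to a coarse category."""
    if not ua:
        return UNKNOWN
    lowered = ua.lower()
    if any(token in lowered for token in _BOT_TOKENS):
        return Coarse("unknown", "unknown", "bot")
    family = next(
        (name for token, name in _FAMILY_TOKENS if token in ua), "unknown"
    )
    if family == "unknown":
        # No recognised token: say so rather than guess.
        return UNKNOWN
    # Crude rule: tablets without "Mobi" in the string land in "desktop".
    device = "mobile" if "Mobi" in ua else "desktop"
    return Coarse(family, device, "human")
```

Note that nothing here reads a version number: the match is on product tokens only, and a client such as a command-line tool that carries none of them falls into the unknown bucket instead of being forced into a browser family.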
Store a category, not the raw string
For real visitors, you rarely need the raw user-agent string after classification. Storing a coarse category — browser family, device class, bot-or-human — meets analytics needs while minimising data and reducing the fingerprinting surface that a full raw string would add.
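As a rough sketch of what storing a category rather than the raw string can look like, the Python below builds a record whose fields have no place for the raw user agent. The `VisitRecord` shape, the `build_record` helper, and the `stub_classifier` placeholder are assumptions made for this example; in practice the classifier would be a maintained library.

```python
from dataclasses import asdict, dataclass
from typing import Callable, Tuple

# (browser family, device class, bot-or-human)
Category = Tuple[str, str, str]


@dataclass(frozen=True)
class VisitRecord:
    """What gets stored. There is deliberately no raw user-agent field."""
    path: str
    browser_family: str
    device_class: str
    client_kind: str


def build_record(
    path: str, raw_ua: str, classify: Callable[[str], Category]
) -> VisitRecord:
    # The raw string is used once, here, and then goes out of scope.
    family, device, kind = classify(raw_ua)
    return VisitRecord(path, family, device, kind)


def stub_classifier(raw_ua: str) -> Category:
    # Placeholder only; stands in for a maintained UA library.
    if "Firefox/" in raw_ua:
        return ("firefox", "desktop", "human")
    return ("unknown", "unknown", "unknown")
```

Calling `asdict(build_record("/pricing", raw_ua, stub_classifier))` yields only the path and the three coarse fields, so whatever is written to storage cannot include the original string.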
Reserve raw-string handling for the moment of classification, and prefer User-Agent Client Hints when you genuinely need finer detail, because browsers are reducing the detail exposed in the raw UA string. This keeps parsing both robust and privacy-safe.
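For the Client Hints route, the following Python sketch shows one way a server might request extra hints and read the basic ones. `Accept-CH`, `Sec-CH-UA-Mobile`, and `Sec-CH-UA-Platform` are real header names; the two helper functions and the returned dictionary shape are hypothetical, and the parsing is simplified rather than a full structured-field parser.

```python
from typing import Dict, Iterable, Tuple


def accept_ch_header(hints: Iterable[str]) -> Tuple[str, str]:
    """Build the response header that opts in to specific hints."""
    return ("Accept-CH", ", ".join(hints))


def coarse_from_hints(headers: Dict[str, str]) -> Dict[str, str]:
    """Derive a coarse device class and platform from request headers."""
    # Header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    # Sec-CH-UA-Mobile is a boolean sent as "?1" or "?0".
    mobile = lowered.get("sec-ch-ua-mobile", "").strip()
    device = {"?1": "mobile", "?0": "non-mobile"}.get(mobile, "unknown")

    # Sec-CH-UA-Platform is a quoted string such as "Windows".
    platform = lowered.get("sec-ch-ua-platform", "").strip().strip('"')

    return {"device_class": device, "platform": platform or "unknown"}
```

A client that sends no hints at all simply yields "unknown" for both fields. As with the raw string, the hint values are client-controlled, so the result remains a hint rather than an identity.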
How it appears in analytics and logs
A parsed user agent yields a coarse classification — browser family, device class, or bot category — that is useful context but never proof of identity, because the underlying string is client-controlled.
Diagnostic use case
Choose a parsing approach that stays accurate as user agents change and keeps visitor data minimal, instead of brittle hand-rolled regexes and raw-string storage.
What WebmasterID can help detect
WebmasterID parses user agents server-side into coarse categories against a maintained signature list, keeping raw strings out of stored visitor data and unknown clients in an honest bucket.
Common mistakes
- Hand-rolling regexes that break as user-agent strings evolve.
- Storing every visitor's raw user-agent string when a category would do.
- Treating a parsed result as proof of identity rather than a hint.
Privacy and accuracy notes
Storing the raw user agent of every real visitor is unnecessary and increases the fingerprinting surface. Store a coarse category instead. WebmasterID classifies at ingest and does not retain raw visitor user-agent strings.
Related pages
- Browser user agents: how to read them
A browser user-agent string packs several tokens into one line: a legacy Mozilla prefix, a rendering-engine signature, the platform, and the browser itself. This page explains each part so you can read a UA without over-reading it, because the contents are client-controlled and can be copied by any client.
- User-Agent Client Hints
User-Agent Client Hints are HTTP headers (the Sec-CH-UA family) that let a site request specific browser, platform, and version detail rather than reading it all from one passive string. They underpin UA reduction: the raw user agent is shrinking, and finer detail moves to opt-in hints. This page explains the model.
- Privacy-first analytics
Analytics that store coarse categories, not fingerprintable detail.
Sources and verification notes
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.