How to parse user agents safely

Parsing user agents by hand with regular expressions is fragile and breaks as strings evolve. The safer approach is to use a maintained UA library, store a coarse category rather than each visitor's raw string, and treat the result as a hint, not an identity. This page sets out a privacy-safe parsing approach.

Use a maintained library, not hand-rolled regex

User-agent strings are inconsistent and change frequently, so hand-written regular expressions tend to drift out of date and misclassify new clients. A maintained user-agent parsing library encodes community knowledge of current patterns and is updated as browsers evolve.

Match on stable tokens for coarse classification and accept that you will not perfectly identify every client. An honest 'unknown' is better than a confident wrong guess.

Prefer a maintained UA library over bespoke regexes
Match on stable tokens, not full version strings
Keep an honest 'unknown' bucket for unrecognised clients
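
The idea of matching on stable tokens and keeping an honest 'unknown' bucket can be sketched as follows. This is a minimal illustration, not a replacement for a maintained library: the token table, the bot markers, and the function name classify are all assumptions made for the example, and a real signature list would be far larger and regularly updated.

```python
from typing import Dict, Optional

# Hypothetical, deliberately tiny token table (an assumption for this sketch).
# Order matters: Edge strings also contain "Chrome/", and Chromium-based
# browsers also contain "Safari/", so the more specific token is checked first.
FAMILY_TOKENS = [
    ("Edg/", "edge"),
    ("Firefox/", "firefox"),
    ("Chrome/", "chrome"),
    ("Safari/", "safari"),
]

# Hypothetical substrings that commonly appear in self-declared crawlers.
BOT_TOKENS = ("bot", "crawler", "spider")

UNKNOWN = {"kind": "unknown", "family": "unknown"}


def classify(ua: Optional[str]) -> Dict[str, str]:
    """Return a coarse category for a user-agent string (a hint, not an identity)."""
    if not ua:
        return dict(UNKNOWN)
    lowered = ua.lower()
    if any(token in lowered for token in BOT_TOKENS):
        return {"kind": "bot", "family": "unknown"}
    # Match only the product token, never the full version string.
    for token, family in FAMILY_TOKENS:
        if token in ua:
            return {"kind": "browser", "family": family}
    # No confident match: report 'unknown' rather than guessing.
    return dict(UNKNOWN)
```

A client such as a command-line tool that matches none of the tokens falls into the 'unknown' bucket instead of being forced into the nearest browser family.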

Store a category, not the raw string

For real visitors, you rarely need the raw user-agent string after classification. Storing a coarse category — browser family, device class, bot-or-human — meets analytics needs while minimising data and reducing the fingerprinting surface that a full raw string would add.
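
One way to picture "store a category, not the raw string" is a record type that simply has no field for the raw value. The sketch below assumes a parser callable that returns a dict with family, device, and bot keys; that interface, the VisitCategory type, and the fake_parse stand-in are hypothetical and do not describe any particular library.

```python
from dataclasses import asdict, dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class VisitCategory:
    """The only user-agent data that gets stored: coarse categories."""
    browser_family: str
    device_class: str
    is_bot: bool


def categorise(raw_ua: str, parse: Callable[[str], Dict[str, object]]) -> VisitCategory:
    """Classify once at ingest; the raw string is never copied into the record."""
    parsed = parse(raw_ua)
    return VisitCategory(
        browser_family=str(parsed.get("family") or "unknown"),
        device_class=str(parsed.get("device") or "unknown"),
        is_bot=bool(parsed.get("bot", False)),
    )


def fake_parse(raw_ua: str) -> Dict[str, object]:
    # Hypothetical stand-in for a call into a maintained UA library.
    if "Firefox/" in raw_ua:
        return {"family": "firefox", "device": "desktop", "bot": False}
    return {}


raw = "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0"
row = asdict(categorise(raw, fake_parse))  # this dict is what would be persisted
```

Because the record has no slot for the raw string, it cannot be stored by accident, and an unrecognised client still produces a valid row with 'unknown' values.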

Handle the raw string only at the moment of classification, then discard it. When you genuinely need finer detail, prefer User-Agent Client Hints, since browsers are progressively reducing the detail exposed in the raw User-Agent string. This keeps parsing both robust and privacy-safe.
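
As a rough sketch of reading User-Agent Client Hints, the function below looks at two low-entropy request headers, Sec-CH-UA-Mobile and Sec-CH-UA-Platform. The function name and the shape of the returned dict are assumptions for illustration. Not every browser sends these headers, so absent values are reported as unknown rather than inferred.

```python
from typing import Dict, Mapping


def read_client_hints(headers: Mapping[str, str]) -> Dict[str, object]:
    """Extract coarse device hints from Client Hints request headers."""
    # Header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}

    # Sec-CH-UA-Mobile is a structured-field boolean: "?1" or "?0".
    mobile_raw = lowered.get("sec-ch-ua-mobile")
    if mobile_raw == "?1":
        mobile = True
    elif mobile_raw == "?0":
        mobile = False
    else:
        mobile = None  # header absent or malformed: do not guess

    # Sec-CH-UA-Platform is a quoted string such as "Android".
    platform = lowered.get("sec-ch-ua-platform", "").strip('"') or "unknown"

    return {"mobile": mobile, "platform": platform}
```

Like the parsed user agent itself, these values are supplied by the client and should be treated as hints.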

How it appears in analytics and logs

A parsed user agent yields a coarse classification — browser family, device class, or bot category — that is useful context but never proof of identity, because the underlying string is client-controlled.

Diagnostic use case

Choose a parsing approach that stays accurate as user agents change and keeps visitor data minimal, instead of brittle hand-rolled regexes and raw-string storage.

What WebmasterID can help detect

WebmasterID parses user agents server-side into coarse categories against a maintained signature list. It keeps raw strings out of stored visitor data and places unrecognised clients in an honest 'unknown' bucket.

Common mistakes

Hand-rolling regexes that break as user-agent strings evolve.
Storing every visitor's raw user-agent string when a category would do.
Treating a parsed result as proof of identity rather than a hint.

Privacy and accuracy notes

Storing the raw user agent of every real visitor is unnecessary and increases fingerprinting surface. Store a coarse category instead. WebmasterID classifies at ingest and does not retain raw visitor user-agent strings.

Sources and verification notes

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.