
Parsing user agents with regex: pitfalls

Writing your own regular expressions to parse user-agent strings is fragile: the strings carry overlapping legacy tokens, Chromium-based browsers share the Chrome and Safari tokens, and new browsers appear constantly. Hand-rolled patterns produce false matches and silently rot. A maintained user-agent parser, Client Hints, or feature detection is a more durable approach.

Partially verified

What this means

A user-agent string is not a clean, structured field. It carries legacy tokens (Mozilla, KHTML), shared engine names (every Chromium browser has Chrome and Safari tokens), and a long tail of niche and embedded clients. A regex that looks reasonable for the top browsers quietly mismatches the rest.

The classic failures: matching Safari before Chrome and labelling Chrome as Safari; matching a substring that also appears in another browser; and bucketing every new or niche browser as Other because the pattern never anticipated it.
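The order-of-match failure can be reproduced in a few lines. The sketch below is illustrative only: the sample strings, the rule lists, and the `classify` helper are assumptions made for this example, not a recommended pattern set.

```python
import re

# Sample strings for illustration; real strings vary by version and platform.
CHROME_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)
EDGE_UA = CHROME_UA + " Edg/120.0.0.0"
SAFARI_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15"
)

# Naive rules: Safari is checked first, so every Chromium browser matches it.
NAIVE_RULES = [
    ("Safari", re.compile(r"Safari/")),
    ("Chrome", re.compile(r"Chrome/")),
]

# Order-aware rules: most specific token first, shared tokens last.
ORDERED_RULES = [
    ("Edge", re.compile(r"\bEdg/")),
    ("Chrome", re.compile(r"\bChrome/")),
    ("Safari", re.compile(r"\bVersion/[\d.]+ .*Safari/")),
]


def classify(ua, rules):
    """Return the name of the first rule whose pattern matches, else 'Other'."""
    for name, pattern in rules:
        if pattern.search(ua):
            return name
    return "Other"


for label, ua in [("chrome", CHROME_UA), ("edge", EDGE_UA), ("safari", SAFARI_UA)]:
    print(label, classify(ua, NAIVE_RULES), classify(ua, ORDERED_RULES))
```

With the naive rules, the Chrome and Edge samples are both labelled Safari. Reordering fixes these three samples, but any client the rule list never anticipated still falls through to Other, which is the third failure described above.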

Why DIY regex rots

User agents change continuously: new browsers ship, versions increment, and User-Agent reduction trims detail from the string. A static pattern set cannot keep up without ongoing maintenance, and the breakage is silent: numbers still appear, they are just wrong. Order-of-match bugs are especially common because token overlap means the sequence of checks matters.

Prefer a maintained, community-updated user-agent parser that already handles the overlap and the long tail. Better still, where you only need OS family or form factor, read low-entropy Client Hints; and where you need a capability, use feature detection instead of parsing at all.
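As a rough server-side sketch of the Client Hints route, the function below reads the two low-entropy headers `Sec-CH-UA-Platform` and `Sec-CH-UA-Mobile`. The `coarse_context` name and the lowercase-keyed `headers` dict are assumptions for this example; how headers are exposed depends on your framework. Not every browser sends these headers, so the sketch returns `None` rather than guessing.

```python
def coarse_context(headers):
    """Derive OS family and form factor from low-entropy Client Hints.

    `headers` is a hypothetical dict of request headers with lowercase keys.
    Any field whose hint is absent is returned as None.
    """
    platform = headers.get("sec-ch-ua-platform")
    mobile = headers.get("sec-ch-ua-mobile")

    # Sec-CH-UA-Platform is sent as a quoted string, e.g. "Windows".
    os_family = platform.strip().strip('"') if platform else None

    # Sec-CH-UA-Mobile is a structured-field boolean: ?1 (true) or ?0 (false).
    if mobile is None:
        form_factor = None
    else:
        form_factor = "mobile" if mobile.strip() == "?1" else "non-mobile"

    return {"os_family": os_family, "form_factor": form_factor}


print(coarse_context({"sec-ch-ua-platform": '"Android"', "sec-ch-ua-mobile": "?1"}))
print(coarse_context({}))
```

When the hints are missing, a reasonable fallback is a maintained parser applied to the User-Agent header, not a hand-written pattern.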

How it appears in analytics and logs

Misclassified browser/OS data in analytics often traces back to brittle regex (for example, Chrome counted as Safari, or a new browser bucketed as Other) because the pattern did not account for shared tokens and evolving strings.

Diagnostic use case

Avoid the common failure modes of DIY user-agent regex and choose a more robust approach to extracting browser, OS, or device context.

What WebmasterID can help detect

WebmasterID classifies user agents with maintained logic and matches stable, specific product tokens, avoiding the false positives that naive regex produces.

Privacy and accuracy notes

This is about parsing technique, not visitor data. Whatever the method, keep extraction coarse (browser/OS family and form factor) and avoid assembling high-entropy detail into identifiers.

Frequently asked questions

Is it ever fine to parse user agents with regex?
For a narrow, well-tested match on a single stable product token it can be fine. For general browser/OS/device classification, a maintained parser, Client Hints, or feature detection are far more reliable than hand-rolled patterns.
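A narrow match of that kind might look like the sketch below, which extracts the major version from the `Edg/` token used by Chromium-based Edge on desktop. The `edge_major_version` helper and the sample strings are assumptions for illustration; the point is a single token, a word boundary, and test cases kept next to the pattern.

```python
import re

# One product token, anchored with a word boundary. Assumes the desktop
# "Edg/" token; other Edge variants use different tokens and will not match.
EDG_TOKEN = re.compile(r"\bEdg/(\d+)\.")


def edge_major_version(ua):
    """Return the Edge major version as an int, or None if the token is absent."""
    match = EDG_TOKEN.search(ua)
    return int(match.group(1)) if match else None


# Sample strings for illustration only.
EDGE_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 Edg/120.0.0.0"
)
LEGACY_EDGE_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19041"
)

print(edge_major_version(EDGE_UA))
print(edge_major_version(LEGACY_EDGE_UA))
```

The second sample carries the older `Edge/` token, which this pattern deliberately does not match. Keeping negative cases like that alongside the pattern is what makes a narrow match "well-tested".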

Sources and verification notes

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.