User agents in access log formats
Web-server access logs follow conventional formats, and the user agent lives in a known position within them. The widely used combined log format appends the referrer and user agent to the common format, while JSON log formats give the user agent a named key. Knowing where the field sits prevents mis-parsing and quoted-string mistakes.
Where the user agent sits
The common log format records host, identity, user, timestamp, request line, status, and size. The combined log format extends it by appending two quoted fields: the referrer and the user agent. The user agent is therefore the final quoted field in a typical combined-format line.
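The field order described above can be sketched as a single regular expression with named groups. This is a minimal illustration, not a complete parser: the group names, the `parse_combined` helper, and the sample line are assumptions made for the example, and the pattern does not handle quotes escaped inside a field.

```python
import re
from typing import Optional

# Sketch of a combined-format matcher. Group names are illustrative only.
# Assumption: no quoted field contains an escaped double quote.
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<identity>\S+) (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"$'
)

def parse_combined(line: str) -> Optional[dict]:
    """Return the named fields of one combined-format line, or None if it does not match."""
    match = COMBINED.match(line.rstrip("\n"))
    return match.groupdict() if match else None

# Hypothetical sample line, made up for this example.
sample = (
    '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] '
    '"GET /index.html HTTP/1.1" 200 2326 '
    '"https://example.com/start" '
    '"Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"'
)
fields = parse_combined(sample)
```

Here `fields["user_agent"]` holds the final quoted field and `fields["referrer"]` the one before it, which mirrors the positional layout of the format.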
JSON-structured log formats, increasingly common, instead expose the user agent under a named key, which removes positional ambiguity. Knowing your format tells you exactly where to read the field.
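For a JSON log, reading the field is a key lookup. The sketch below assumes one JSON object per line and a key named `http_user_agent`; that key name is hypothetical, since the actual name depends on how your log format is configured.

```python
import json
from typing import Optional

# Hypothetical key name; check your own log configuration for the real one.
UA_KEY = "http_user_agent"

def user_agent_from_json(line: str, key: str = UA_KEY) -> Optional[str]:
    """Return the user agent from one JSON log line, or None if absent or malformed."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    value = record.get(key)
    return value if isinstance(value, str) else None

# Hypothetical sample record; the embedded quotes survive JSON escaping intact.
sample = json.dumps({
    "remote_addr": "203.0.113.7",
    "status": 200,
    "http_user_agent": 'Example "Quoted" Browser/1.0',
})
```

Returning `None` for a malformed line or a missing key keeps a bad record from being mistaken for an empty user agent.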
Extracting it without mistakes
The classic pitfall is splitting a combined-format line on spaces: the user agent itself contains spaces, so a naive split shreds it. Because the UA is wrapped in quotes, parse it as a quoted field rather than by whitespace position alone.
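The difference between the two approaches can be seen on a single line. In this sketch the sample line and the `last_quoted_field` helper are made up for illustration, and the helper assumes the user agent contains no embedded or escaped quotes.

```python
# Hypothetical combined-format line used only for this comparison.
line = (
    '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] '
    '"GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"'
)

# Naive approach: splitting on spaces breaks the UA into several tokens,
# so the "last field" is only the tail of the user agent.
tokens = line.split(" ")
naive_ua = tokens[-1]

def last_quoted_field(text: str):
    """Return the text between the last two double quotes, or None.

    Assumption: the final field has no embedded or escaped quotes.
    """
    parts = text.rstrip().rsplit('"', 2)
    # Expect: [everything before the field, the field, "" after the closing quote]
    if len(parts) != 3 or parts[2] != "":
        return None
    return parts[1]

quoted_ua = last_quoted_field(line)
```

With the naive split, `naive_ua` is just the trailing fragment `ExampleBrowser/1.0"`, while `quoted_ua` is the whole string between the final pair of quotes.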
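Also account for escaping and length. Embedded quotes must be handled per your server's escaping rules: Apache, for example, writes a literal quote inside a logged header as \" while nginx's default escaping writes it as \x22. A long user agent can also be truncated if a length cap is configured somewhere in the logging pipeline. JSON formats sidestep most of this by quoting and escaping the value as a string under its key.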
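An escape-aware extraction might look like the following sketch. It assumes backslash escaping of quotes and backslashes inside the field; other servers may escape differently, so adjust to your own rules. The `extract_ua` helper and the `max_len` cap are hypothetical, and the truncation flag is only a heuristic: a value at or above the cap may have been cut, but the code cannot know for certain.

```python
import re

# Matches a double-quoted field at the end of the line, allowing
# backslash-escaped characters (such as \" and \\) inside it.
TRAILING_QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"\s*$')

def extract_ua(line: str, max_len: int = 1024):
    """Return (user_agent, possibly_truncated), or None if no trailing quoted field.

    max_len is a hypothetical cap; set it to whatever limit your pipeline applies.
    """
    match = TRAILING_QUOTED.search(line)
    if match is None:
        return None
    raw = match.group(1)
    # Undo backslash escaping of quotes and backslashes only.
    user_agent = re.sub(r'\\(["\\])', r'\1', raw)
    # Heuristic: a logged value that reaches the cap may have been truncated.
    return user_agent, len(raw) >= max_len

# Hypothetical line whose user agent contains escaped quotes.
sample = (
    r'203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] '
    r'"GET / HTTP/1.1" 200 512 "-" '
    r'"Example \"Quoted\" Agent/1.0"'
)
result = extract_ua(sample)
```

Flagging rather than discarding a suspiciously long value leaves the decision about how to treat it to downstream classification.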
- Combined format appends quoted referrer then quoted user agent
- JSON formats expose the user agent under a named key
- Parse the quoted field; do not split a combined line on spaces
How it appears in analytics and logs
A correctly extracted user-agent field gives the client's self-reported identity per log line. Mis-parsing, for example splitting a quoted UA on its internal spaces, corrupts that value and every classification built on it.
Diagnostic use case
Locate the user-agent field in combined and JSON access-log formats and extract it reliably without splitting on the wrong delimiter.
What WebmasterID can help detect
WebmasterID captures and classifies the user agent at the edge, so you get a structured, parsed value instead of having to extract the right field from each log format yourself.
Common mistakes
- Splitting a combined-format line on spaces and shredding the UA.
- Ignoring quote-escaping rules when extracting the field.
- Assuming the log format never truncates a long user agent.
Privacy and accuracy notes
The user-agent field in any log format describes a client, not a person. It is coarse metadata; combining log fields to single out individuals would be fingerprinting and is outside privacy-safe practice.
Related pages
- User agents in server logs
Most web servers record the User-Agent header of every request in their access logs. That field is a primary source for understanding who and what reaches your site, but it is self-reported by the client, so it can be blank, generic, or spoofed. Reading server-log user agents well means treating them as claims to corroborate, not facts.
- User-agent string length and truncation
User-agent strings have grown long thanks to layered compatibility tokens, and intermediaries sometimes cap their length. A database column, log format, or proxy that truncates the string silently corrupts downstream parsing, turning a known browser into an unknown one. Knowing where truncation happens helps you keep UA data intact.
- Parsing user agents with regex pitfalls
Writing your own regular expressions to parse user-agent strings is fragile: strings carry overlapping legacy tokens, Chromium browsers share Chrome and Safari tokens, and new browsers appear constantly. Hand-rolled patterns produce false matches and silently rot. A maintained user-agent parser, Client Hints, or feature detection are more durable approaches.
- Website observability
Get a parsed user agent per request without hand-extracting log fields.
Sources and verification notes
- Apache HTTP Server: Log Files (Combined Log Format). Documents the combined log format that appends quoted referrer and user-agent fields.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.