Detecting AI crawlers without a user agent
Not every AI crawler declares a clean token — some send a blank, generic, or browser-like user agent. You cannot identify those by token alone. This entry describes the behavioural and network signals that flag likely automated AI fetching, while being explicit that behaviour suggests a class, not a named vendor, and that you must never invent identity.
Why some crawlers have no usable token
A user-agent header is set by the client, so it can be blank, generic (a bare library default), or a copied browser string. Crawlers that present no recognisable token cannot be classified by token matching, which is the primary method most identification relies on.
This is common with undeclared scrapers and with agents driving real browsers. The absence of a token is itself a signal worth noting, but it is not proof of who is behind the request.
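The three cases above (blank, generic, browser-like) can be separated with a small classifier. The following is a minimal sketch, not a definitive implementation: the function name `classify_user_agent`, the category labels, and both lookup lists are assumptions made for illustration, and the lists are deliberately short rather than an authoritative registry. A "declared" result only means the string claims a token; it says nothing about whether the claim is genuine.

```python
import re

# Illustrative, incomplete lists (assumptions for this sketch, not a registry).
DECLARED_TOKENS = ("GPTBot", "ClaudeBot")
LIBRARY_DEFAULT = re.compile(
    r"^(python-requests|curl|Wget|Go-http-client|okhttp|Java|axios|aiohttp)\b",
    re.IGNORECASE,
)


def classify_user_agent(ua):
    """Return 'blank', 'declared', 'generic', 'browser_like', or 'other'."""
    ua = (ua or "").strip()
    # Many access logs write "-" when the header is absent.
    if not ua or ua == "-":
        return "blank"
    lowered = ua.lower()
    # Check declared tokens first: crawler strings often start with "Mozilla/".
    if any(token.lower() in lowered for token in DECLARED_TOKENS):
        return "declared"  # claimed by the client, not verified
    if LIBRARY_DEFAULT.match(ua):
        return "generic"
    if ua.startswith("Mozilla/"):
        return "browser_like"
    return "other"
```

Only the "declared" case can proceed to token matching. The other categories fall through to behavioural and network signals, and none of them identifies an operator.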
Behavioural and network signals
When the token is missing, lean on behaviour: a methodical sweep through many URLs, requests for HTML without the accompanying images, CSS, and scripts a real browser would load, unusual request cadence, and origins in datacenter rather than consumer networks. Several weak signals together raise confidence that traffic is automated.
Critically, behaviour points to a class — 'likely automated, possibly AI' — not to a named vendor. Do not invent a user-agent string, an IP range, or a vendor attribution to fill the gap. Where verification material exists for a suspected crawler, use it; where it does not, the honest label is a probabilistic bot classification, not a confirmed identity.
- Missing/generic UA cannot be matched by token
- Signals: methodical URL sweep, no asset loading, datacenter origin
- Behaviour suggests a class, not a vendor — never invent identity
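The idea that several weak signals together raise confidence, while no combination names a vendor, can be sketched as follows. This is a hedged illustration: the `SessionSignals` fields, the `classify_session` function, and the threshold of three signals are all assumptions chosen for the example, not values taken from any tool or standard.

```python
from dataclasses import dataclass


@dataclass
class SessionSignals:
    """One boolean per weak signal; field names are hypothetical."""
    ua_missing_or_generic: bool
    methodical_sweep: bool
    no_asset_loading: bool
    unusual_cadence: bool
    datacenter_origin: bool


def classify_session(signals, min_signals=3):
    """Combine weak signals into a class label; never a vendor name."""
    hits = [name for name, value in vars(signals).items() if value]
    if len(hits) >= min_signals:  # threshold is an assumption, tune per site
        label = "likely automated, possibly AI"
    elif hits:
        label = "inconclusive"  # a single weak signal is not proof
    else:
        label = "no automation signals"
    # Vendor stays None: behaviour alone cannot confirm an identity.
    return {"label": label, "signals": hits, "vendor": None}
```

Returning the list of matched signals alongside the label keeps the classification auditable, and the fixed `None` vendor makes it explicit that attribution needs separate verification material.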
How it appears in analytics and logs
Sessions with a missing or generic user agent that also show methodical fetch patterns, no asset loading, and datacenter origins suggest non-human, possibly AI, fetching. They indicate a behaviour class, not a confirmed vendor, so attribution must stay cautious.
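One of these patterns, page fetches with no accompanying assets, can be surfaced from log data with a simple per-client count. The sketch below assumes the log has already been parsed into `(client_ip, url)` pairs; the function names, the extension list, and the `min_pages` threshold are illustrative assumptions, and the addresses in the usage example come from reserved documentation ranges.

```python
from collections import defaultdict
from os.path import splitext
from urllib.parse import urlsplit

# Extensions treated as page assets (an assumption; adjust to the site).
ASSET_EXTENSIONS = {
    ".css", ".js", ".png", ".jpg", ".jpeg", ".gif",
    ".svg", ".webp", ".woff", ".woff2", ".ico",
}


def summarise_clients(requests):
    """Count page and asset requests per client from (client_ip, url) pairs."""
    stats = defaultdict(lambda: {"pages": 0, "assets": 0})
    for client_ip, url in requests:
        # Strip any query string before reading the file extension.
        ext = splitext(urlsplit(url).path)[1].lower()
        kind = "assets" if ext in ASSET_EXTENSIONS else "pages"
        stats[client_ip][kind] += 1
    return dict(stats)


def pages_without_assets(stats, min_pages=20):
    """List clients that fetched many pages and no assets at all."""
    return sorted(
        ip for ip, s in stats.items()
        if s["pages"] >= min_pages and s["assets"] == 0
    )


if __name__ == "__main__":
    sample = [("192.0.2.10", f"/docs/page-{i}") for i in range(25)]
    sample += [("198.51.100.7", "/"), ("198.51.100.7", "/static/site.css?v=3")]
    print(pages_without_assets(summarise_clients(sample)))
```

A client flagged this way is only a candidate: browsers with warm caches or blocked scripts can also skip assets, so the result is one weak signal to combine with others, not a verdict.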
Diagnostic use case
Flag probable AI-crawler activity when the user agent is missing or generic, using behavioural signals, without falsely attributing it to a specific vendor.
What WebmasterID can help detect
WebmasterID classifies traffic server-side using more than the user agent, so sessions without a clean token can still be flagged as likely bots rather than silently counted as humans.
Common mistakes
- Attributing a token-less crawler to a specific vendor without evidence.
- Inventing an IP range or UA string to manufacture an identity.
- Treating a single weak signal as proof of automated traffic.
Privacy and accuracy notes
Behavioural detection uses request and network metadata, not personal data, and is not user fingerprinting. WebmasterID records the result as a bot event and never builds a human profile from it.
Related pages
- Undeclared AI scrapers and how they appear
Some AI scrapers do not declare a recognisable token. They appear with generic user agents, browser-like strings, or forged identities. They cannot be identified by a clean token, so the honest approach is to describe the pattern, verify what you can, and categorise conservatively.
- AI crawler user-agent spoofing
Any client can put GPTBot or ClaudeBot in its User-Agent header, because that header is supplied by the client and never validated by HTTP. Spoofers do this to borrow a trusted crawler's reputation or to get around rules. The defence is verifying the request's network source against the operator's published ranges, not trusting the string.
- AI crawler honeypots and traps
An AI crawler honeypot is a deliberately planted resource — a hidden link, a disallowed path, or an endlessly generated 'tar-pit' page — used to detect or slow crawlers that ignore robots.txt. Tools such as Nepenthes popularised the tar-pit approach. This entry explains the techniques, what they can prove, and why they are a detection aid rather than enforcement.
- Bot vs human traffic
Classify automated sessions even when the user agent is missing.
Sources and verification notes
- MDN: User-Agent header
Explains that the UA is client-set and may be absent or generic.
- IAB/ABC: international spiders & bots list (background)
Industry background on classifying non-declaring automated traffic.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.