Detecting AI crawlers without a user agent

Not every AI crawler declares a clean token: some send a blank, generic, or browser-like user agent, and those cannot be identified by token alone. This entry describes the behavioural and network signals that flag likely automated AI fetching. Behaviour suggests a class, not a named vendor, so identity must never be invented.

Partially verified

Why some crawlers have no usable token

A user-agent header is set by the client, so it can be blank, generic (a bare library default), or a copied browser string. Crawlers that present no recognisable token cannot be classified by token matching — the primary method most identification relies on.

This is common with undeclared scrapers and with agents driving real browsers. The absence of a token is itself a signal worth noting, but it is not proof of who is behind the request.
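As a rough illustration of the cases above, the following Python sketch sorts a raw User-Agent value into coarse categories. The function name, the category labels, and the `known_tokens` parameter are hypothetical. The prefix list is a small, non-exhaustive sample of library defaults, and treating "-" as missing assumes an access log that writes a dash for an absent header.

```python
# Assumption: an illustrative, non-exhaustive sample of default user-agent
# prefixes sent by common HTTP libraries and command-line tools.
GENERIC_PREFIXES = (
    "python-requests/", "python-urllib/", "curl/", "wget/",
    "go-http-client/", "java/", "okhttp/", "axios/",
)


def classify_user_agent(ua, known_tokens=()):
    """Return 'missing', 'declared', 'generic', 'browser_like', or 'other'.

    known_tokens is a caller-supplied list of tokens the operator has
    already verified; this sketch ships none of its own.
    """
    text = (ua or "").strip()
    # Assumption: many access logs record an absent header as "-".
    if not text or text == "-":
        return "missing"
    lowered = text.lower()
    # Check declared tokens first: declared crawlers often embed their
    # token inside a Mozilla-style string.
    if any(token.lower() in lowered for token in known_tokens):
        return "declared"
    if lowered.startswith(GENERIC_PREFIXES):
        return "generic"
    if lowered.startswith("mozilla/"):
        return "browser_like"
    return "other"
```

Only the "declared" result supports token matching. A "browser_like" result says nothing about whether a person or an agent is driving the browser, so it is a prompt to look at behaviour, not a verdict.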

Behavioural and network signals

When the token is missing, lean on behaviour: a methodical sweep through many URLs; requests for HTML without the accompanying images, CSS, and scripts a real browser would load; an unusual request cadence, such as evenly spaced or very rapid requests; and origins in datacenter rather than consumer networks. No single signal is conclusive, but several weak signals together raise confidence that traffic is automated.
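One way to make these signals countable is sketched below in Python. It is a minimal illustration, not a description of any product's method: the `Request` type, the function names, the asset suffix list, and every threshold (`min_pages`, `max_cadence_cv`, the minimum of five pages or gaps) are assumptions chosen for the example. "Unusual cadence" is interpreted here as machine-regular spacing, which is only one possible reading.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

# Assumption: file suffixes treated as page assets for this sketch.
ASSET_SUFFIXES = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif",
                  ".svg", ".webp", ".woff2", ".ico")


@dataclass
class Request:
    timestamp: float  # seconds since the epoch
    path: str         # request path without the query string


def behavioural_signals(requests, from_datacenter,
                        min_pages=20, max_cadence_cv=0.2):
    """Return a dict of boolean weak signals for one session."""
    pages = [r for r in requests
             if not r.path.lower().endswith(ASSET_SUFFIXES)]
    asset_count = len(requests) - len(pages)
    distinct_pages = {r.path for r in pages}

    # Gaps between consecutive requests, in seconds.
    times = sorted(r.timestamp for r in requests)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]

    # Coefficient of variation: low values mean evenly spaced requests.
    regular = False
    if len(gaps) >= 5 and mean(gaps) > 0:
        regular = pstdev(gaps) / mean(gaps) <= max_cadence_cv

    return {
        "methodical_sweep": len(distinct_pages) >= min_pages,
        "no_asset_loading": len(pages) >= 5 and asset_count == 0,
        "regular_cadence": regular,
        "datacenter_origin": bool(from_datacenter),
    }


def signal_count(signals):
    """Count how many weak signals fired."""
    return sum(signals.values())
```

A session of thirty distinct pages fetched two seconds apart from a datacenter address, with no assets, would fire all four signals. A single page view followed by its stylesheet and script from a consumer network would fire none.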

Critically, behaviour points to a class — 'likely automated, possibly AI' — not to a named vendor. Do not invent a user-agent string, an IP range, or a vendor attribution to fill the gap. Where verification material exists for a suspected crawler, use it; where it does not, the honest label is a probabilistic bot classification, not a confirmed identity.
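The labelling rule in the paragraph above can be sketched as a small Python function. The function name, the label strings, and the cut-offs are hypothetical. The `verified_vendor` argument is assumed to be set only by a separate check against verification material the vendor itself publishes; the sketch never derives a vendor from behaviour.

```python
def label_session(signal_count, verified_vendor=None):
    """Map a count of weak behavioural signals to a cautious label.

    verified_vendor must come from an independent verification step;
    it is never inferred here.
    """
    if verified_vendor:
        return {"label": "verified crawler",
                "vendor": verified_vendor,
                "confidence": "confirmed"}

    # Assumption: illustrative cut-offs. A single signal is never enough.
    if signal_count >= 3:
        label, confidence = "likely automated, possibly AI", "high"
    elif signal_count == 2:
        label, confidence = "possibly automated", "medium"
    else:
        label, confidence = "unclassified", "low"

    # Without verification the vendor field stays empty by design.
    return {"label": label, "vendor": None, "confidence": confidence}
```

Keeping `vendor` as `None` unless verification succeeded makes the gap visible in reports instead of hiding it behind a guess.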

Missing/generic UA cannot be matched by token
Signals: methodical URL sweep, no asset loading, datacenter origin
Behaviour suggests a class, not a vendor — never invent identity

How it appears in analytics and logs

Automated-looking sessions with no or generic user agent — methodical fetch patterns, no asset loading, datacenter origins — suggest non-human, possibly AI, fetching. They indicate a behaviour class, not a confirmed vendor, so attribution must stay cautious.
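When reviewing logs, the datacenter-origin check could look something like the Python sketch below, which uses the standard-library `ipaddress` module. The function name is hypothetical, and the CIDR list must be supplied by the operator from a source they trust. The ranges in the usage example are reserved documentation ranges used as placeholders, not real datacenter networks.

```python
import ipaddress


def is_datacenter_ip(ip, datacenter_cidrs):
    """Return True if ip falls inside any operator-supplied CIDR range."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        # Unparseable addresses are not treated as a signal.
        return False
    for cidr in datacenter_cidrs:
        network = ipaddress.ip_network(cidr, strict=False)
        # Compare only within the same IP version.
        if addr.version == network.version and addr in network:
            return True
    return False


# Placeholder ranges reserved for documentation, not real datacenters.
placeholder_ranges = ["203.0.113.0/24", "2001:db8::/32"]
print(is_datacenter_ip("203.0.113.7", placeholder_ranges))   # True
print(is_datacenter_ip("198.51.100.7", placeholder_ranges))  # False
```

A match is still only one weak signal: plenty of legitimate traffic, such as VPN users and corporate proxies, also originates from datacenter networks.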

Diagnostic use case

Flag probable AI-crawler activity when the user agent is missing or generic, using behavioural signals, without falsely attributing it to a specific vendor.

What WebmasterID can help detect

WebmasterID classifies traffic server-side using more than the user agent, so sessions without a clean token can still be flagged as likely bots rather than silently counted as humans.

Common mistakes

Attributing a token-less crawler to a specific vendor without evidence.
Inventing an IP range or UA string to manufacture an identity.
Treating a single weak signal as proof of automated traffic.

Privacy and accuracy notes

Behavioural detection uses request and network metadata, not personal data, and is not user fingerprinting. WebmasterID records the result as a bot event and never builds a human profile from it.

Sources and verification notes

MDN, User-Agent header: explains that the UA is client-set and may be absent or generic.
IAB/ABC, international spiders & bots list (background): industry background on classifying non-declaring automated traffic.

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.