Diagnosing an unknown bot
An unknown bot is a client whose user-agent does not match a known crawler. The right response is to verify what you can and resist guessing: attributing an unfamiliar user-agent to a named operator without evidence is how bad data spreads. An honest “other” bucket is more useful than a confident wrong label.
What an unknown bot is
An unknown bot is a request that looks automated but whose user-agent does not match any crawler you have documented. Some self-identify with a token and URL you simply have not catalogued yet; others are deliberately vague, generic, or spoofed.
The absence of a match is information in itself — it means you do not yet know who this is.
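As a rough illustration of the distinction above, the Python sketch below pulls a self-identifying token and URL out of a user-agent string. The regular expressions, the `Hints` type, and `extract_hints` are assumptions made for this example, not part of any WebmasterID API. Real user-agents vary widely, so a match is a lead to look up, not proof of identity.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: an http(s) URL, optionally prefixed with "+", as many
# crawlers include in their user-agent.
URL_RE = re.compile(r"\+?(https?://[^\s;)]+)")
# Assumed pattern: a product name containing "bot", "crawler", or "spider".
TOKEN_RE = re.compile(
    r"\b([A-Za-z][\w.-]*(?:bot|crawler|spider)[\w.-]*)", re.IGNORECASE
)


class Hints(NamedTuple):
    token: Optional[str]  # self-declared crawler name, if any
    url: Optional[str]    # self-identifying URL, if any


def extract_hints(user_agent: str) -> Hints:
    """Return whatever self-identifying hints the user-agent offers."""
    token_match = TOKEN_RE.search(user_agent)
    url_match = URL_RE.search(user_agent)
    return Hints(
        token=token_match.group(1) if token_match else None,
        url=url_match.group(1) if url_match else None,
    )
```

A generic client string such as `python-requests/2.31.0` yields no token and no URL, which is itself a useful result: there is nothing to look up, so the request stays unclassified.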
Verify, do not guess
The temptation is to assign an unfamiliar user-agent to a plausible operator. That is how inaccurate data spreads: a guess becomes a label, and the label gets trusted. Instead, verify what you can — does the user-agent contain a self-identifying URL or token you can look up? For clients claiming a known crawler, does the source pass that operator's published verification?
If verification does not resolve it, the correct outcome is to leave it unclassified. An honest “other” bucket preserves the integrity of every category you are confident about.
- An unmatched user-agent is not evidence of a specific operator
- Verify via self-identifying URLs and published methods
- Keep the unresolved in an explicit “other” bucket
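The three rules above can be sketched as a small decision function. This is a minimal sketch under stated assumptions: the `DOCUMENTED` catalogue, the `Verdict` type, and the `source_passes_verification` callback are hypothetical names introduced for illustration. The callback stands in for whatever verification method the claimed operator actually publishes; it is not implemented here.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical catalogue: lowercased token -> operator name. Fill it only
# with crawlers you have documented from primary sources.
DOCUMENTED = {"examplebot": "Example Corp"}

UNCLASSIFIED = "other"


@dataclass(frozen=True)
class Verdict:
    bucket: str     # operator name, or UNCLASSIFIED
    verified: bool  # True only when published verification passed
    reason: str


def classify(
    token: Optional[str],
    source_passes_verification: Callable[[str], bool],
) -> Verdict:
    """Name an operator only when the catalogue and verification agree."""
    if not token:
        return Verdict(UNCLASSIFIED, False, "no self-identifying token")
    operator = DOCUMENTED.get(token.lower())
    if operator is None:
        # An unmatched token is not evidence of any specific operator.
        return Verdict(UNCLASSIFIED, False, "token not in catalogue")
    if source_passes_verification(operator):
        return Verdict(operator, True, "passed published verification")
    # A claimed identity that fails verification is not relabelled.
    return Verdict(UNCLASSIFIED, False, f"claimed {operator} but failed verification")
```

Note that every path except one ends in the unclassified bucket. That asymmetry is deliberate in this sketch: a named label requires both a catalogue match and a passed check.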
Operator checklist
- Capture the user-agent token and any self-identifying URL.
- Look the token up against your documented crawlers.
- If the client claims a known crawler, verify the source using that operator's published method.
- If it does not resolve, keep it unclassified rather than guessing.
- Revisit the bucket periodically as you catalogue more crawlers.
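The final checklist step, revisiting the bucket, might look like the sketch below. The function names and the catalogue shape (lowercased token mapped to operator name) are assumptions for this example. The output is a list of candidates that the catalogue now documents; each still needs source verification before it leaves the unclassified bucket.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple


def tally_unclassified(tokens: Iterable[Optional[str]]) -> Counter:
    """Count unclassified requests per lowercased token."""
    return Counter(t.lower() if t else "(no token)" for t in tokens)


def revisit(bucket: Counter, catalogue: Dict[str, str]) -> List[Tuple[str, str, int]]:
    """List (token, operator, hits) for tokens the catalogue now documents.

    Results are ordered by hit count, highest first. They are candidates
    for verification, not confirmed attributions.
    """
    rows = (
        (token, catalogue[token], hits)
        for token, hits in bucket.items()
        if token in catalogue
    )
    return sorted(rows, key=lambda row: (-row[2], row[0]))
```

Tokens that still match nothing simply remain in the bucket for the next review.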
How it appears in analytics and logs
An unknown bot is automated traffic with a user-agent that does not map to a documented crawler. It is not evidence of any specific operator; treat it as unclassified until verification proves otherwise.
Diagnostic use case
Handle an uncategorized user-agent responsibly — verify identity where possible and keep it in an honest unclassified bucket rather than guessing an operator.
What WebmasterID can help detect
WebmasterID classifies what it can verify and keeps the rest in an explicit unclassified bucket, so unknown bots are visible and honestly labelled rather than guessed into a named category.
Common mistakes
- Guessing a named operator for an unfamiliar user-agent without evidence.
- Collapsing all unknown bots into a known category, polluting its data.
- Trusting a self-declared crawler token without verification.
Privacy and accuracy notes
Diagnosis uses the user-agent string and published verification methods only, never visitor identity, fingerprinting, or raw IP addresses. WebmasterID keeps unclassified bots in a transparent “other” bucket and does not invent attribution.
Related pages
- Diagnosing a bot traffic spike
A sudden spike in traffic is often bots, not audience. The diagnostic question is which bots: a verified crawler doing a fresh crawl wave, or spoofers and scrapers impersonating known crawlers. Separating verified crawlers from impostors by user-agent token and verification keeps your human analytics honest.
- Diagnosing a blocked crawler
When a crawler is not reaching your pages, the block can come from several layers: a robots.txt Disallow, a server-side 403, a WAF or bot-management rule, or an IP filter. Confirming which layer is responsible — rather than guessing — is the key to fixing it without opening doors you meant to keep shut.
- Bot intelligence
Deterministic categorisation with an honest unclassified bucket.
Sources and verification notes
- Google Search Central: Verifying Googlebot and other crawlers
Documents verifying a crawler rather than trusting the user agent.
- MDN: User-Agent header
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.