AI crawler user agents
AI crawlers from companies building and serving large models fetch public web content. Their user agents follow a recognisable shape: a product token plus a self-identifying URL pointing at the operator's documentation. This page explains how to read the AI-crawler pattern and links to the AI-crawlers hub for specifics.
The AI-crawler UA pattern
AI crawlers follow the same self-identifying convention as other well-behaved crawlers: a stable product token (such as GPTBot, ClaudeBot, or PerplexityBot) plus a URL that points at the operator's crawler documentation. The token is the stable identifier; the version and other details around it can change between releases.
For the per-crawler details — what each token is, how to control it in robots.txt, and how to verify it — see the AI-crawlers hub.
- Product token (e.g. GPTBot, ClaudeBot, PerplexityBot)
- A self-identifying +https://… documentation URL
- Match on the token, not the full version string
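As a minimal sketch of matching on the token rather than the full version string, the Python below checks a user agent against the three example tokens named above. The function name `ai_crawler_token`, the token list, and the sample user-agent strings in the usage lines are illustrative assumptions, not an official list or any operator's real header.

```python
import re
from typing import Optional

# Example tokens taken from this page; this is not a complete or official list.
AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")

# Match a token as a whole word and ignore whatever version follows it.
_TOKEN_RE = re.compile(
    r"(?<![A-Za-z0-9])("
    + "|".join(re.escape(t) for t in AI_CRAWLER_TOKENS)
    + r")(?![A-Za-z0-9])",
    re.IGNORECASE,
)


def ai_crawler_token(user_agent: str) -> Optional[str]:
    """Return the canonical product token claimed by the user agent, or None."""
    match = _TOKEN_RE.search(user_agent or "")
    if match is None:
        return None
    found = match.group(1).lower()
    # Map the matched text back to the canonical spelling of the token.
    return next(t for t in AI_CRAWLER_TOKENS if t.lower() == found)


# Made-up user agents for illustration only.
crawler_ua = "Mozilla/5.0 (compatible; GPTBot/9.9; +https://example.com/bot-docs)"
browser_ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0 Safari/537.36"
```

Because the version number is never part of the match, `ai_crawler_token(crawler_ua)` keeps returning the same token when the operator bumps the version, while `browser_ua` yields `None`. The result is still only the claimed identity; it says nothing about whether the request is genuine.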
Reading and verifying AI crawlers
Treat an AI-crawler user agent as a claim, not proof: the header is set by the client, so any client can send a crawler's token. Many operators publish verification methods (IP ranges or reverse DNS) for requests that need to be trusted, exactly as search engines do.
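One way to check a claim against published IP ranges can be sketched as follows. The token `ExampleBot`, the `PUBLISHED_RANGES` table, and the function name are hypothetical, and the CIDR blocks are reserved documentation ranges (RFC 5737), not any operator's real addresses; in practice the ranges would come from the operator's own published list.

```python
import ipaddress

# Hypothetical table: placeholder documentation ranges, NOT real crawler ranges.
PUBLISHED_RANGES = {
    "ExampleBot": ["192.0.2.0/24", "198.51.100.0/24"],
}


def ip_matches_published_range(token, remote_ip, ranges=PUBLISHED_RANGES):
    """Return True if remote_ip falls inside a range listed for the token."""
    try:
        addr = ipaddress.ip_address(remote_ip)
    except ValueError:
        # Malformed address: treat the claim as unverified.
        return False
    return any(
        addr in ipaddress.ip_network(cidr) for cidr in ranges.get(token, [])
    )
```

A request that claims the token but arrives from outside the listed ranges would be treated as unverified rather than as the crawler. Reverse DNS verification follows the same idea with a hostname check in place of the range check.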
When you allow an AI crawler you let it fetch your public pages, which can help your content be represented in that operator's products; disallowing the token asks it to stay out. Decide per token, not in bulk.
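To illustrate deciding per token, the sketch below feeds a small robots.txt to Python's standard `urllib.robotparser` and asks what each token may fetch. The policy itself (blocking GPTBot, allowing ClaudeBot, a catch-all group for everyone else), the `allowed` helper, and the example.com URLs are assumptions chosen for illustration, not a recommendation.

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy: one group per token, plus a catch-all group.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Allow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(token, url="https://example.com/articles/page"):
    """Return True if the parsed rules let this token fetch the URL."""
    return parser.can_fetch(token, url)


# Report the decision for each token on the same public page.
for token in ("GPTBot", "ClaudeBot", "PerplexityBot"):
    print(token, allowed(token))
```

Here PerplexityBot has no group of its own, so it falls back to the `*` group and is only kept out of `/private/`. The parser applies the first matching rule in file order, which can differ from how a given crawler resolves overlapping rules, so treat the output as an approximation of the crawler's behaviour.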
How it appears in analytics and logs
A user agent carrying an AI crawler's product token and a documentation URL indicates an AI crawler fetching your content, or a client claiming to be one. Either way it is a crawl signal, not a human visit, and should be kept separate from human analytics.
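Separating crawl hits from everything else can be sketched as a simple tally over the user agents in a log. The function name `split_hits`, the token list, and the sample user agents are illustrative assumptions; the "other" bucket is only "not a listed AI crawler" and may still contain other bots.

```python
from collections import Counter

# Example tokens taken from this page; not a complete list.
AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def split_hits(user_agents):
    """Tally hits per claimed AI-crawler token and count everything else."""
    crawler_hits = Counter()
    other_hits = 0
    for ua in user_agents:
        lowered = ua.lower()
        token = next(
            (t for t in AI_CRAWLER_TOKENS if t.lower() in lowered), None
        )
        if token is not None:
            crawler_hits[token] += 1
        else:
            other_hits += 1
    return crawler_hits, other_hits


# Made-up user agents standing in for parsed log lines.
sample = [
    "Mozilla/5.0 (compatible; GPTBot/9.9; +https://example.com/bot-docs)",
    "Mozilla/5.0 (compatible; GPTBot/9.9; +https://example.com/bot-docs)",
    "Mozilla/5.0 (compatible; ClaudeBot/9.9; +https://example.com/bot-docs)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0 Safari/537.36",
]
crawler_hits, other_hits = split_hits(sample)
```

Only `other_hits` would feed audience reporting; `crawler_hits` is reported as bot activity per token.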
Diagnostic use case
Recognise AI crawler traffic by its token-plus-URL pattern, count it as bot activity, and find the per-crawler robots.txt controls in the AI-crawlers hub.
What WebmasterID can help detect
WebmasterID classifies AI crawlers server-side and reports them on the bot-intelligence and AI-visibility surfaces, so AI crawl coverage is observable per page without parsing logs.
Common mistakes
- Counting AI crawl hits as human traffic or audience growth.
- Trusting an AI-crawler user agent without verification when it matters.
- Assuming one robots.txt rule covers every AI crawler — each has its own token.
Privacy and accuracy notes
AI crawlers carry no visitor identity. WebmasterID records their requests as bot events, separate from human analytics, and never as human profiles.
Related pages
- GPTBot — OpenAI's web crawler
GPTBot is the crawler OpenAI uses to fetch publicly available web content that may be used to help train its foundation models. It is a declared, well-documented crawler with a stable robots.txt token, and OpenAI publishes both documentation and an IP range list so operators can identify and control it.
- ClaudeBot — Anthropic's web crawler
ClaudeBot is the web crawler operated by Anthropic to fetch publicly available content. It is a declared crawler with a documented robots.txt token, and Anthropic publishes guidance for operators who want to identify or restrict it. It is separate from Claude-User, the agent that fetches pages when a person asks Claude to browse.
- SEO crawler user agents
SEO platforms run their own crawlers to build backlink indexes and audit data. Bots such as AhrefsBot, SemrushBot, and DotBot identify themselves with a documented token and honour robots.txt. They are not search-engine indexers. This page explains the family and how to recognise and control it.
- AI visibility analytics
See which AI crawlers and assistants reach your site, recorded server-side.
Sources and verification notes
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.