GPTBot — OpenAI's web crawler
GPTBot is the crawler OpenAI uses to fetch publicly available web content that may be used to help train its foundation models. It is a declared, well-documented crawler with a stable robots.txt token, and OpenAI publishes both documentation and an IP range list so operators can identify and control it.
What this means
OpenAI describes GPTBot as the crawler that fetches publicly accessible content that may be used to improve its models. Allowing GPTBot can help your content be represented in those models; disallowing it asks OpenAI not to use your site for that purpose.
GPTBot is distinct from ChatGPT-User (which fetches a page in real time when a user asks ChatGPT to browse) and from OAI-SearchBot (which supports OpenAI's search features). Treat the three as separate controls.
How GPTBot identifies itself
GPTBot uses the robots.txt user-agent token GPTBot. Its full user-agent string contains the GPTBot token together with a self-identifying URL pointing at OpenAI's GPTBot documentation. Match on the stable token rather than a full version string, because the version component changes over time.
Because any client can copy a user-agent string, treat the user agent as a claim. For requests where authenticity matters, verify the source IP against OpenAI's published GPTBot IP range list.
- robots.txt token: GPTBot
- User agent contains the GPTBot token plus an OpenAI GPTBot URL
- Verification: OpenAI publishes a downloadable IP range list
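The two checks above (token match, then IP verification) can be sketched in Python as follows. This is a minimal illustration, not OpenAI's method: the function names are hypothetical, the sample user-agent string is illustrative rather than GPTBot's actual string, and the CIDR blocks are placeholder documentation ranges rather than OpenAI's real ones. In practice the range list would be loaded from the file OpenAI publishes.

```python
import ipaddress

# Placeholder CIDR blocks taken from the reserved documentation ranges.
# They are NOT OpenAI's ranges; substitute the list OpenAI publishes.
EXAMPLE_RANGES = ["192.0.2.0/24", "198.51.100.0/24"]


def claims_gptbot(user_agent: str) -> bool:
    """Return True if the user-agent string contains the stable GPTBot token."""
    # Match the token only, not a full version string, and ignore case.
    return "gptbot" in user_agent.lower()


def is_verified_gptbot(user_agent: str, source_ip: str, ranges: list[str]) -> bool:
    """Return True only if the UA claims GPTBot and the IP is inside a listed range."""
    if not claims_gptbot(user_agent):
        return False
    address = ipaddress.ip_address(source_ip)
    # An address of one IP version is never "in" a network of the other version.
    return any(address in ipaddress.ip_network(cidr) for cidr in ranges)


# Illustrative user-agent string (hypothetical, not copied from OpenAI).
sample_ua = "Mozilla/5.0 (compatible; GPTBot/1.0)"
print(is_verified_gptbot(sample_ua, "192.0.2.10", EXAMPLE_RANGES))
print(is_verified_gptbot(sample_ua, "203.0.113.5", EXAMPLE_RANGES))
```

The user agent is checked first because it is cheap, but only the IP check carries any weight: a request that passes the first test and fails the second is a client imitating GPTBot.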
robots.txt considerations
GPTBot honours robots.txt. Allowing it everywhere requires no GPTBot-specific rule, provided your existing rules do not already disallow it. To disallow it from your whole site, target the GPTBot token specifically: blocking GPTBot does not block ChatGPT-User or OAI-SearchBot, and vice versa.
Example to disallow GPTBot site-wide:
User-agent: GPTBot
Disallow: /
Remember robots.txt is a request, not an access control. It is honoured by compliant crawlers; it is not a security boundary and cannot stop a non-compliant client.
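One way to check that a GPTBot rule leaves the other two tokens untouched is to run it through a robots.txt parser before deploying it. The sketch below uses Python's standard-library urllib.robotparser; the helper name allowed_tokens and the example.com URL are hypothetical. It shows how that parser interprets the rule, which is a reasonable proxy for a compliant crawler but not a statement about how OpenAI's crawlers are implemented.

```python
from urllib.robotparser import RobotFileParser

# The site-wide GPTBot disallow rule from this page.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def allowed_tokens(robots_txt: str, url: str, tokens: list[str]) -> dict[str, bool]:
    """Map each user-agent token to whether the rules permit fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {token: parser.can_fetch(token, url) for token in tokens}


result = allowed_tokens(
    ROBOTS_TXT,
    "https://example.com/article",  # hypothetical URL
    ["GPTBot", "ChatGPT-User", "OAI-SearchBot"],
)
for token, allowed in result.items():
    print(f"{token}: {'allowed' if allowed else 'disallowed'}")
```

With this rule, only GPTBot is reported as disallowed; the other two tokens have no matching group and fall through to the default, which is to allow.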
How it appears in analytics and logs
A request whose user-agent contains the GPTBot token indicates that OpenAI's training crawler reached that URL, provided the source IP also checks out. It is a crawl signal, not a human visit, so it should be counted as bot traffic and kept out of human analytics. A spike in GPTBot requests usually reflects a fresh crawl wave, not audience growth.
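If you are working from raw access logs, the separation described above can be sketched as a small filter. This assumes the common "combined" log format with the user agent as the last quoted field; the sample lines, IP addresses, and the split_hits helper are hypothetical, and the match is on the user-agent claim only, without IP verification.

```python
import re
from collections import Counter

# Captures the request path and the final quoted field (the user agent)
# of a combined-log-format line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

# Hypothetical sample lines; the addresses are reserved documentation IPs.
SAMPLE_LINES = [
    '192.0.2.10 - - [01/Jan/2025:00:00:01 +0000] "GET /pricing HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.7 - - [01/Jan/2025:00:00:02 +0000] "GET /pricing HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]


def split_hits(lines):
    """Count hits per path, separating GPTBot-claimed requests from all others."""
    gptbot_hits = Counter()
    other_hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # Skip lines that do not fit the assumed format.
        target = gptbot_hits if "gptbot" in match["ua"].lower() else other_hits
        target[match["path"]] += 1
    return gptbot_hits, other_hits


gptbot_hits, other_hits = split_hits(SAMPLE_LINES)
print("GPTBot-claimed hits:", dict(gptbot_hits))
print("Other hits:", dict(other_hits))
```

The "other" bucket is not automatically human traffic, since it can still contain different crawlers; the point is only that GPTBot hits are counted on their own rather than as page views.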
Diagnostic use case
Confirm whether GPTBot has fetched a given page, decide whether to allow it for AI-training visibility, and verify a request claiming to be GPTBot against OpenAI's published IP ranges before trusting it.
What WebmasterID can help detect
WebmasterID classifies GPTBot requests server-side as an AI crawler and surfaces them on the bot-intelligence and AI-visibility surfaces, separate from human analytics. It records which pages GPTBot reached and when, so you can see AI-training crawl coverage without parsing raw server logs.
Common mistakes
- Blocking GPTBot and assuming it also blocks ChatGPT-User or OAI-SearchBot — each has its own token.
- Trusting the GPTBot user agent without verifying the IP for requests where authenticity matters.
- Counting GPTBot crawl hits as human traffic, inflating page-view metrics.
- Treating a robots.txt Disallow as enforcement rather than a request to compliant crawlers.
Privacy and accuracy notes
GPTBot identification relies only on the request user-agent and, for verification, OpenAI's published IP ranges. No visitor identity is involved — a crawler is not a person. WebmasterID records the crawl as a bot event and never attaches it to a human profile.
Frequently asked questions
- Does blocking GPTBot remove my site from ChatGPT?
- Not necessarily. GPTBot governs crawling for model training. ChatGPT browsing on a user's behalf uses ChatGPT-User, and OpenAI's search features use OAI-SearchBot. Control each token according to what you actually want to allow.
- How do I verify a request is really GPTBot?
- Check that the source IP falls within OpenAI's published GPTBot IP ranges. A user agent that says GPTBot but originates outside those ranges is not GPTBot.
Related pages
- ClaudeBot — Anthropic's web crawler
ClaudeBot is the web crawler operated by Anthropic to fetch publicly available content. It is a declared crawler with a documented robots.txt token, and Anthropic publishes guidance for operators who want to identify or restrict it. It is separate from Claude-User, the agent that fetches pages when a person asks Claude to browse.
- PerplexityBot — Perplexity's web crawler
PerplexityBot is the crawler operated by Perplexity to index publicly available web pages for its AI answer engine. Perplexity documents the crawler and its robots.txt token. It is separate from Perplexity-User, which fetches a page in real time in response to a user's question.
- How to block GPTBot in robots.txt
If you do not want OpenAI's training crawler fetching your site, you can disallow GPTBot in robots.txt. This page gives the exact rule, clarifies that it does not affect ChatGPT-User or OAI-SearchBot, and is honest about the limits of robots-based blocking.
- AI visibility analytics
See which AI crawlers and assistants reach your site, recorded server-side.
Sources and verification notes
- OpenAI — GPTBot documentation: token, user-agent guidance, and IP range list reference.
- OpenAI — GPTBot overview
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.