GPTBot — OpenAI's web crawler

GPTBot is the crawler OpenAI uses to fetch publicly available web content that may be used to help train its foundation models. It is a declared, well-documented crawler with a stable robots.txt token, and OpenAI publishes both documentation and an IP range list so operators can identify and control it.

Verified against primary sources

What this means

OpenAI describes GPTBot as the crawler that fetches publicly accessible content that may be used to improve its models. Allowing GPTBot can help your content be represented in OpenAI's models; disallowing it asks OpenAI not to use your site for that purpose.

GPTBot is distinct from ChatGPT-User (which fetches a page in real time when a user asks ChatGPT to browse) and from OAI-SearchBot (which supports OpenAI's search features). Treat the three as separate controls.

How GPTBot identifies itself

GPTBot uses the robots.txt user-agent token GPTBot. Its full user-agent string contains the GPTBot token together with a self-identifying URL pointing at OpenAI's GPTBot documentation. Match on the stable token rather than a full version string, because the version component changes over time.
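
Token matching can be sketched in a few lines of Python. This is a minimal illustration, not OpenAI's or WebmasterID's implementation: the function name `claims_gptbot` is hypothetical, and the sample user-agent strings in the usage note are made up to show that the version component does not affect the match.

```python
import re
from typing import Optional

# Match the stable token only; the version after the slash is ignored.
# Case-insensitive matching is a design choice of this sketch.
GPTBOT_TOKEN = re.compile(r"\bGPTBot\b", re.IGNORECASE)


def claims_gptbot(user_agent: Optional[str]) -> bool:
    """Return True when the user-agent header contains the GPTBot token.

    A True result is only a claim: any client can send this header.
    """
    return bool(GPTBOT_TOKEN.search(user_agent or ""))
```

For example, `claims_gptbot("Mozilla/5.0 (compatible; GPTBot/1.0; +https://example.com/bot)")` returns `True` regardless of the version number, while a string containing only `ChatGPT-User` or `OAI-SearchBot` returns `False`.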

Because any client can copy a user-agent string, treat the user agent as a claim. For requests where authenticity matters, verify the source IP against OpenAI's published GPTBot IP range list.
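
A source-IP check might look like the sketch below, which uses Python's standard `ipaddress` module. The CIDR blocks are placeholders taken from address space reserved for documentation; they are not OpenAI's ranges. The function names are hypothetical, and in practice the list would be loaded from OpenAI's published GPTBot IP range file.

```python
import ipaddress

# Placeholder ranges reserved for documentation. These are NOT OpenAI's
# ranges; substitute the list OpenAI publishes for GPTBot.
EXAMPLE_RANGES = ["192.0.2.0/24", "2001:db8::/32"]


def load_networks(cidrs):
    """Parse CIDR strings into network objects once, up front."""
    return [ipaddress.ip_network(cidr) for cidr in cidrs]


def ip_in_ranges(ip: str, networks) -> bool:
    """Return True when the address falls inside any of the given networks."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False  # malformed address: treat as unverified
    return any(addr.version == net.version and addr in net for net in networks)
```

With the placeholder list, `ip_in_ranges("192.0.2.10", load_networks(EXAMPLE_RANGES))` is true and `ip_in_ranges("198.51.100.7", ...)` is false. A request that claims GPTBot in its user agent but fails this check should be treated as not GPTBot.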

robots.txt considerations

GPTBot honours robots.txt. Allowing it everywhere needs no GPTBot-specific rule, provided your existing rules do not already exclude it. To disallow it from your whole site, target the GPTBot token specifically. Blocking GPTBot does not block ChatGPT-User or OAI-SearchBot, and vice versa.

Example to disallow GPTBot site-wide:

User-agent: GPTBot
Disallow: /

Remember robots.txt is a request, not an access control. It is honoured by compliant crawlers; it is not a security boundary and cannot stop a non-compliant client.
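
The token-specific behaviour of the site-wide disallow rule can be checked locally before deploying it. The sketch below uses Python's standard `urllib.robotparser`; the helper name `can_fetch` and the `example.com` URL are illustrative assumptions, and the parser only models how a compliant client reads the file.

```python
from urllib.robotparser import RobotFileParser

# The site-wide GPTBot disallow rule from the example above.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def can_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Evaluate a robots.txt body for one user-agent token and URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


for agent in ("GPTBot", "ChatGPT-User", "OAI-SearchBot"):
    allowed = can_fetch(ROBOTS_TXT, agent, "https://example.com/page")
    print(f"{agent}: {'allowed' if allowed else 'disallowed'}")
```

Only GPTBot is reported as disallowed; the other two tokens have no matching group in this file, so each needs its own rule if you want to restrict it.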

How it appears in analytics and logs

A request whose user agent contains the GPTBot token indicates that a client identifying as OpenAI's training crawler requested that URL; a source IP within OpenAI's published ranges confirms it. It is a crawl signal, not a human visit, so it should be counted as bot traffic and kept out of human analytics. A spike in GPTBot requests usually reflects a fresh crawl wave, not audience growth.
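
If you work from raw server logs, a tally of GPTBot requests per path can be sketched as follows. This assumes the widely used "combined" access-log layout (request line, status, size, referer, user agent); the regular expression and the function name `gptbot_paths` are illustrative and would need adjusting for other log formats. It counts claimed GPTBot requests only and does not verify the source IP.

```python
import re
from collections import Counter

# Assumed layout: ... "METHOD /path PROTOCOL" status size "referer" "user agent"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def gptbot_paths(lines):
    """Count requests per path whose user agent carries the GPTBot token."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "GPTBot" in match.group("ua"):
            hits[match.group("path")] += 1
    return hits
```

Passing an open log file to `gptbot_paths` returns a `Counter` keyed by path, which shows both whether a given page was fetched and how often, while leaving human requests out of the count.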

Diagnostic use case

Confirm whether GPTBot has fetched a given page, decide whether to allow it for AI-training visibility, and verify a request claiming to be GPTBot against OpenAI's published IP ranges before trusting it.

What WebmasterID can help detect

WebmasterID classifies GPTBot requests server-side as AI crawler traffic and reports them on the bot-intelligence and AI-visibility surfaces, separate from human analytics. It records which pages GPTBot reached and when, so you can see AI-training crawl coverage without parsing raw server logs.

Common mistakes

Privacy and accuracy notes

GPTBot identification relies only on the request user-agent and, for verification, OpenAI's published IP ranges. No visitor identity is involved — a crawler is not a person. WebmasterID records the crawl as a bot event and never attaches it to a human profile.

Frequently asked questions

Does blocking GPTBot remove my site from ChatGPT?
Not necessarily. GPTBot governs crawling for model training. ChatGPT browsing on a user's behalf uses ChatGPT-User, and OpenAI's search features use OAI-SearchBot. Control each token according to what you actually want to allow.

How do I verify a request is really GPTBot?
Check that the source IP falls within OpenAI's published GPTBot IP ranges. A user agent that says GPTBot but originates outside those ranges is not GPTBot.

Sources and verification notes

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.