How to block GPTBot in robots.txt
If you do not want OpenAI's training crawler fetching your site, you can disallow GPTBot in robots.txt. This page gives the exact rule, clarifies that it does not affect ChatGPT-User or OAI-SearchBot, and is honest about the limits of robots-based blocking.
The rule
Add this group to your robots.txt to disallow GPTBot across the whole site:
User-agent: GPTBot
Disallow: /
To block only part of the site, list specific paths instead of /, and use Allow to carve out exceptions.
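As a rough illustration of a partial block, the sketch below embeds a GPTBot group with one carve-out and checks it with Python's standard-library urllib.robotparser. The paths /private/ and /private/press/ are hypothetical, and the helper name gptbot_may_fetch is made up for this example. Python's parser applies the first matching rule in file order, while some crawlers prefer the longest matching path, so Allow is listed before Disallow here to give the same answer under both readings.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: block /private/ for GPTBot, but carve out /private/press/.
# Allow comes first because urllib.robotparser returns the first rule that matches.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /private/press/
Disallow: /private/
"""


def gptbot_may_fetch(path: str) -> bool:
    """Return True if the rules above let the GPTBot token request `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch("GPTBot", path)


if __name__ == "__main__":
    for path in ("/private/report.html", "/private/press/release.html", "/blog/"):
        print(path, "allowed" if gptbot_may_fetch(path) else "blocked")
```

This only shows how one parser reads the file. It does not prove how any particular crawler will behave.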
What it does and does not affect
This rule targets only the GPTBot token. It does not affect ChatGPT-User (real-time browsing on a user's behalf) or OAI-SearchBot (OpenAI search features). If you want to restrict those too, add their tokens as separate groups. It also has no effect on non-OpenAI crawlers.
- Affects: GPTBot (training crawler)
- Does NOT affect: ChatGPT-User, OAI-SearchBot
- Does NOT affect: other companies' AI crawlers
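The scoping described above can be checked locally. This minimal sketch, again using urllib.robotparser, compares a GPTBot-only file with one that adds separate groups for ChatGPT-User and OAI-SearchBot. The helper blocked_tokens is a hypothetical name, and ClaudeBot is included only as an example of a non-OpenAI token.

```python
from urllib.robotparser import RobotFileParser

# The rule from this page: only the GPTBot token is disallowed.
GPTBOT_ONLY = """\
User-agent: GPTBot
Disallow: /
"""

# Same rule, plus one separate group per additional OpenAI token.
ALL_THREE = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /
"""

TOKENS = ["GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot"]


def blocked_tokens(robots_txt: str, tokens: list[str], path: str = "/") -> list[str]:
    """Return the tokens that the given robots.txt text disallows for `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [token for token in tokens if not parser.can_fetch(token, path)]


if __name__ == "__main__":
    print("GPTBot-only file blocks:", blocked_tokens(GPTBOT_ONLY, TOKENS))
    print("Three-group file blocks:", blocked_tokens(ALL_THREE, TOKENS))
```

With the GPTBot-only file, the other tokens fall through to "no matching group" and stay allowed. That is why each one needs its own group.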
The limits
robots.txt is honoured by compliant crawlers; OpenAI states GPTBot respects it. But robots.txt cannot force compliance, and a disallow does not retroactively remove content already used. Treat it as a forward-looking request, and verify suspicious GPTBot requests by IP.
How it appears in analytics and logs
After adding a GPTBot Disallow, compliant GPTBot requests to the blocked paths should stop. Continued requests claiming to be GPTBot warrant IP verification against OpenAI's published ranges.
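One way to approach the IP check is sketched below with Python's ipaddress module. The CIDR blocks in EXAMPLE_RANGES are reserved documentation ranges used as placeholders, not OpenAI's real ranges, so substitute the list OpenAI publishes. The function names and the user-agent strings in the usage lines are likewise hypothetical.

```python
import ipaddress

# PLACEHOLDER ranges (reserved documentation blocks), NOT OpenAI's real ranges.
# Replace with the CIDR list OpenAI publishes for GPTBot.
EXAMPLE_RANGES = ["192.0.2.0/24", "198.51.100.0/28"]


def ip_in_ranges(ip: str, cidrs: list[str]) -> bool:
    """Return True if `ip` falls inside any of the CIDR blocks in `cidrs`."""
    addr = ipaddress.ip_address(ip)
    networks = (ipaddress.ip_network(cidr, strict=False) for cidr in cidrs)
    return any(addr in network for network in networks)


def classify(user_agent: str, ip: str, cidrs: list[str]) -> str:
    """Label a logged request by its claimed user agent and source IP."""
    if "gptbot" not in user_agent.lower():
        return "not-gptbot"
    return "verified" if ip_in_ranges(ip, cidrs) else "unverified"


if __name__ == "__main__":
    # Hypothetical log entries: (user-agent string, client IP).
    print(classify("ExampleUA GPTBot/1.0", "192.0.2.10", EXAMPLE_RANGES))
    print(classify("ExampleUA GPTBot/1.0", "203.0.113.5", EXAMPLE_RANGES))
```

A request labelled "unverified" claims to be GPTBot but comes from outside the supplied ranges. Treat it as a different client, not as evidence that GPTBot ignored the rule.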
Diagnostic use case
Disallow GPTBot site-wide (or on specific paths) while leaving other OpenAI tokens and other crawlers under their own rules.
What WebmasterID can help detect
WebmasterID shows GPTBot crawl activity before and after your change, so you can confirm the block took effect for the compliant crawler and flag any client that ignores it.
Common mistakes
- Expecting a GPTBot block to also stop ChatGPT-User or OAI-SearchBot.
- Assuming robots.txt removes content already crawled in the past.
- Typos in the token: it must be spelled GPTBot; a variant such as GPT-Bot or GPT Bot will not match.
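The token-typo mistake is easy to catch mechanically. The sketch below is a hypothetical lint helper, not part of any official tool. It reads the User-agent lines of a robots.txt file and uses difflib to flag values that look like near misses of the three OpenAI tokens named on this page. Case differences are ignored, on the assumption that crawlers match tokens case-insensitively. The 0.8 similarity cutoff is an arbitrary choice.

```python
import difflib

# Tokens named on this page; extend the list as needed.
KNOWN_TOKENS = ["GPTBot", "ChatGPT-User", "OAI-SearchBot"]


def find_token_typos(robots_txt: str, cutoff: float = 0.8) -> list[tuple[str, str]]:
    """Return (found, probably_meant) pairs for user-agent values that nearly match."""
    canonical = {token.lower(): token for token in KNOWN_TOKENS}
    suspects = []
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() != "user-agent":
            continue
        token = value.split("#")[0].strip()  # drop any trailing comment
        if token.lower() in canonical:
            continue  # spelled correctly (ignoring case)
        close = difflib.get_close_matches(
            token.lower(), list(canonical), n=1, cutoff=cutoff
        )
        if close:
            suspects.append((token, canonical[close[0]]))
    return suspects


if __name__ == "__main__":
    print(find_token_typos("User-agent: GPT-Bot\nDisallow: /\n"))
```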
Privacy and accuracy notes
Blocking a crawler is a publishing-policy choice, not a privacy mechanism. The rule itself is public in your robots.txt.
Related pages
- robots.txt basics: what it does and what it cannot do
robots.txt is a plain-text file at your site root that tells compliant crawlers which paths they may request. This page covers the directives, how user-agent groups are matched, and the limits that trip people up: robots.txt is advisory, it does not hide pages from search, and it is not a security boundary.
- GPTBot — OpenAI's web crawler
GPTBot is the crawler OpenAI uses to fetch publicly available web content that may be used to help train its foundation models. It is a declared, well-documented crawler with a stable robots.txt token, and OpenAI publishes both documentation and an IP range list so operators can identify and control it.
- ClaudeBot — Anthropic's web crawler
ClaudeBot is the web crawler operated by Anthropic to fetch publicly available content. It is a declared crawler with a documented robots.txt token, and Anthropic publishes guidance for operators who want to identify or restrict it. It is separate from Claude-User, the agent that fetches pages when a person asks Claude to browse.
- AI visibility analytics
Confirm AI-crawler activity before and after a robots change.
Sources and verification notes
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.