Google-Extended — Google AI training control
Google-Extended is not a crawler and has no user-agent string of its own in HTTP requests. It is a robots.txt token that lets site owners control whether their content is used to improve the generative AI models behind Google products such as Gemini and Vertex AI. Googlebot continues to crawl for Search normally, regardless of the Google-Extended setting.
What this means
Google-Extended is a token that Google provides for use on a User-agent line in robots.txt, so site owners can control whether their content helps improve Google's generative AI products, including Gemini and the Vertex AI generative APIs. It is a policy switch, not a crawler.
Crucially, setting a Google-Extended rule does not change Googlebot's crawling for Search. Your pages can still be crawled and ranked in Search while being excluded from generative AI training. Google documents the two controls as independent.
How to use Google-Extended
Because Google-Extended is not a fetcher, it never appears as a user agent in server logs. You use it only in robots.txt. To opt out of generative AI training site-wide:
User-agent: Google-Extended
Disallow: /
Google honours this rule when deciding whether your content may be used for generative AI training. It does not affect Googlebot, Google-InspectionTool, or other Google tokens, each of which you control with its own rules.
- robots.txt token: Google-Extended
- Not a separate crawler and not a user-agent string in logs
- Googlebot continues crawling for Search regardless of this token
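The site-wide opt-out rule described in this section can be sanity-checked locally before you deploy it. The sketch below uses Python's standard-library urllib.robotparser to evaluate a robots.txt that contains only a Google-Extended group. The helper name is_allowed and the path /any-page are assumptions made for illustration, and Python's parser is only a stand-in: it is not Google's own robots.txt implementation, so treat the result as a rough check rather than a guarantee of how Google interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: opt out of generative AI training site-wide,
# with no rules at all for Googlebot.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /
"""


def is_allowed(robots_txt: str, token: str, path: str = "/") -> bool:
    """Return True if `token` may access `path` under `robots_txt`.

    Hypothetical helper: evaluation is done by Python's standard-library
    parser, which approximates but does not replicate Google's parser.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(token, path)


if __name__ == "__main__":
    for token in ("Google-Extended", "Googlebot"):
        print(token, "allowed:", is_allowed(ROBOTS_TXT, token, "/any-page"))
    # Google-Extended allowed: False
    # Googlebot allowed: True
```

Under this parser, the Google-Extended group matches only the Google-Extended token, so Googlebot falls through to the default of "allowed". That mirrors the independence of the two controls described above.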
How it appears in analytics and logs
You will not see Google-Extended as a user agent in your logs — it is a robots.txt control token, not a fetcher. Its presence in your robots.txt signals a policy choice about AI-training use, not a crawl event.
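As a rough illustration of the point above, the following sketch counts how often a token appears in the user-agent field of access-log lines. The log lines are hypothetical samples in combined log format (using documentation-range IP addresses), and the function name count_tokens is an assumption for this example. On real logs you would expect the same pattern: hits for Googlebot, none for Google-Extended.

```python
import re

# Hypothetical access-log lines (combined log format), for illustration only.
SAMPLE_LOG = [
    '192.0.2.10 - - [01/Jan/2025:00:00:01 +0000] "GET / HTTP/1.1" 200 512 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.11 - - [01/Jan/2025:00:00:02 +0000] "GET /about HTTP/1.1" 200 734 '
    '"-" "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"',
]

# In combined log format the user agent is the last double-quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def count_tokens(lines, tokens=("Googlebot", "Google-Extended")):
    """Count log lines whose user-agent field contains each token."""
    counts = {token: 0 for token in tokens}
    for line in lines:
        match = UA_PATTERN.search(line)
        if match is None:
            continue  # skip lines that do not end with a quoted field
        user_agent = match.group(1).lower()
        for token in tokens:
            if token.lower() in user_agent:
                counts[token] += 1
    return counts


if __name__ == "__main__":
    print(count_tokens(SAMPLE_LOG))
    # {'Googlebot': 1, 'Google-Extended': 0}
```

A zero count for Google-Extended is the expected result, not a sign that your rule is being ignored.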
Diagnostic use case
Opt your content in or out of Google's generative AI training via robots.txt without affecting how Googlebot crawls your site for Search.
What WebmasterID can help detect
WebmasterID focuses on observed crawl traffic. Because Google-Extended is a control token rather than a fetcher, it generates no requests and will not appear among bot events. WebmasterID can still help you see Googlebot's actual crawl activity, which Google-Extended does not change.
Common mistakes
- Expecting to see Google-Extended in server logs — it is a control token, not a fetcher.
- Believing a Google-Extended Disallow removes pages from Google Search — it does not; it governs generative AI training only.
- Confusing Google-Extended with Googlebot rules — they are independent.
Privacy and accuracy notes
Google-Extended is a robots.txt token, not an HTTP request, so it involves no visitor data at all. It governs how Google may use already-crawled content, and concerns policy rather than identity.
Frequently asked questions
- Does blocking Google-Extended hurt my Google Search ranking?
No. Google documents Google-Extended as controlling use of content for generative AI training only. Googlebot continues to crawl and index your site for Search independently of the Google-Extended setting.
Related pages
- Applebot-Extended — Apple AI training control
Applebot-Extended is a robots.txt token Apple provides so site owners can opt out of having their content used to train Apple's generative AI models. It is a control, not a separate crawler: Applebot remains the user agent that powers Apple search features and Siri, and it keeps crawling regardless of the Applebot-Extended setting.
- GPTBot — OpenAI's web crawler
GPTBot is the crawler OpenAI uses to fetch publicly available web content that may be used to help train its foundation models. It is a declared, well-documented crawler with a stable robots.txt token, and OpenAI publishes both documentation and an IP range list so operators can identify and control it.
- Web crawlers reference
Reference for crawlers, control tokens, and how they appear in traffic.
Sources and verification notes
- Google: Google-Extended documentation
Documents Google-Extended as a control token separate from Googlebot.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.