How to opt out of AI training
Opting your content out of AI training is done through robots.txt: per-crawler tokens such as GPTBot and CCBot, plus dedicated control tokens like Google-Extended and Applebot-Extended. There is no single switch — you assemble the policy token by token, and it is a request to compliant systems.
There is no single switch
Opting out of AI training is assembled from multiple robots.txt rules, because different vendors expose different controls. Some training crawlers have their own tokens you disallow directly, such as GPTBot and CCBot. Others provide a dedicated training-control token layered on an existing crawler — Google-Extended for Google's generative AI, and Applebot-Extended for Apple's.
The key consequence: there is no universal opt-out token. You build the policy vendor by vendor. Importantly, control tokens like Google-Extended do not affect search indexing — Googlebot keeps crawling for Search regardless.
A practical checklist
Decide your intent first, then express it per token. A common goal is to opt out of training while keeping search visibility. Disallow the direct training-crawler tokens you want to exclude, and set the dedicated training-control tokens where vendors provide them.
Remember that robots.txt is a request honoured by compliant systems, not an access-control boundary, and it does not retroactively remove content that has already been collected or used for training. Pair the rules with measurement so you can see whether the crawlers you opted out actually stop appearing.
- Direct training crawlers: e.g. GPTBot, CCBot
- Dedicated control tokens: Google-Extended, Applebot-Extended
- Search indexing is controlled separately and is unaffected
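The checklist above can be sketched as a robots.txt policy plus a quick local sanity check. This is a minimal sketch, assuming you want to exclude all four tokens site-wide; the `Disallow: /` scope, the `parse_policy` helper, and the example.com URL are illustrative assumptions, not a recommended policy. It uses Python's standard `urllib.robotparser`, whose matching rules may differ from how each vendor interprets the file, so treat the result as a syntax check rather than proof of compliance.

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy (an assumption, not a recommendation): exclude the
# training tokens site-wide and leave every other crawler, including
# search crawlers, unrestricted.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""


def parse_policy(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


policy = parse_policy(ROBOTS_TXT)
URL = "https://example.com/article"  # hypothetical page on your site

# Print whether each token would be allowed to fetch the page.
for token in ("GPTBot", "CCBot", "Google-Extended", "Applebot-Extended", "Googlebot"):
    print(f"{token}: allowed={policy.can_fetch(token, URL)}")
```

With this policy the four training tokens are reported as not allowed, while Googlebot falls through to the `User-agent: *` group and stays allowed, which matches the goal of opting out of training while keeping search visibility.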
How it appears in analytics and logs
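An AI-training opt-out lives in your robots.txt as a set of token rules. It is a policy signal, not a crawl event, and whether it is honoured depends on each vendor's compliance. For direct crawler tokens such as GPTBot and CCBot, you can confirm compliance by observing whether those crawlers still appear in your logs. Control tokens such as Google-Extended and Applebot-Extended have no user agent of their own, so they never appear in logs and cannot be verified this way.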
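As a rough sketch of that observation step, the snippet below counts access-log lines whose user agent contains a direct training-crawler token. It assumes a combined-style log format in which the user agent is the last quoted field. The `count_training_hits` helper and the sample log lines, including their simplified user-agent strings, are invented for illustration. User-agent strings can be spoofed, so the counts are an indication rather than proof.

```python
import re
from collections import Counter

# Direct training crawlers that send their own user-agent string.
# Control tokens (Google-Extended, Applebot-Extended) are omitted because
# they are robots.txt tokens only and never show up as a user agent.
TRAINING_CRAWLERS = ("GPTBot", "CCBot")

# Assumption: the user agent is the last double-quoted field on the line.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def count_training_hits(log_lines):
    """Count log lines per training-crawler token (case-insensitive match)."""
    hits = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue  # line does not end with a quoted field; skip it
        user_agent = match.group(1).lower()
        for token in TRAINING_CRAWLERS:
            if token.lower() in user_agent:
                hits[token] += 1
    return hits


# Invented sample lines with simplified user-agent strings.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2026:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - [01/Jan/2026:00:00:02 +0000] "GET /a HTTP/1.1" 200 512 "-" "CCBot/2.0"',
    '198.51.100.7 - - [01/Jan/2026:00:00:03 +0000] "GET /b HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
]

hits = count_training_hits(SAMPLE_LOG)
for token in TRAINING_CRAWLERS:
    print(f"{token}: {hits[token]} hit(s)")
```

Running the same count over logs from before and after the robots.txt change gives a simple view of whether an opted-out crawler has actually stopped requesting pages.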
Diagnostic use case
Assemble a robots.txt policy that opts your content out of AI training across the major vendors that offer such controls.
What WebmasterID can help detect
WebmasterID shows which AI training crawlers still reach your site after you set opt-out tokens, so you can confirm whether your policy is being respected rather than assuming it.
Common mistakes
- Looking for a single universal AI-training opt-out token — there is none.
- Assuming a training opt-out also removes you from search indexes.
- Expecting opt-out tokens to retroactively undo past training use.
Privacy and accuracy notes
These are robots.txt directives, not visitor requests, so they involve no visitor data. They govern how vendors may use already-public content for training, which is a policy matter, not an identity one.
Related pages
- Google-Extended — Google AI training control
Google-Extended is not a crawler or a user-agent string. It is a robots.txt token that lets site owners control whether their content is used to improve Google's generative AI models, such as the Gemini models offered through Gemini Apps and Vertex AI. Googlebot continues to crawl for Search normally regardless of the Google-Extended setting.
- Applebot-Extended — Apple AI training control
Applebot-Extended is a robots.txt token Apple provides so site owners can opt out of having their content used to train Apple's generative AI models. It is a control, not a separate crawler: Applebot remains the user agent that powers Apple search features and Siri, and it keeps crawling regardless of the Applebot-Extended setting.
- Web crawlers reference
Reference for crawlers, control tokens, and how they appear in traffic.
Sources and verification notes
- Google — Google-Extended documentation
Documents Google-Extended as a training control separate from Googlebot.
- Apple — Applebot-Extended documentation
Documents the Applebot-Extended training opt-out token.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.