llms.txt and AI crawlers
llms.txt is a proposed convention: a Markdown file at your site root that points AI systems to your most important, LLM-friendly content. It is not robots.txt and not an access-control mechanism; it is a curation hint. Adoption by AI crawlers is voluntary and uneven, so treat it as a complement to robots.txt and server-side controls, not a replacement for them.
What llms.txt proposes
llms.txt is a community-proposed convention for a Markdown file at your domain root that lists and links your most relevant content in a form that is easy for language models to consume. The idea is curation: instead of letting a model wade through navigation and boilerplate, you hand it a clean index of what matters.
It is explicitly not robots.txt. robots.txt says what may be crawled; llms.txt suggests what is worth reading. The two address different questions and do not overlap.
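To make the idea of a curated index concrete, here is a minimal Python sketch that renders a file in the general shape the llmstxt.org proposal describes: a title heading, a short blockquote summary, then sections of Markdown links. The `Link` class, the `render_llms_txt` function, the site name, and the example.com URL are all hypothetical names chosen for illustration, not part of the proposal.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Link:
    """One curated entry: a title, a URL, and an optional short note."""
    title: str
    url: str
    note: str = ""


def render_llms_txt(site_name: str, summary: str, sections: dict[str, list[Link]]) -> str:
    """Render a Markdown index: H1 title, blockquote summary, H2 link sections."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for link in links:
            # The note is optional; append it after a colon when present.
            suffix = f": {link.note}" if link.note else ""
            lines.append(f"- [{link.title}]({link.url}){suffix}")
        lines.append("")
    # Normalise to exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical content, for illustration only.
example = render_llms_txt(
    "Example Docs",
    "Hypothetical product documentation.",
    {"Guides": [Link("Quick start", "https://example.com/quick-start.md", "install and first run")]},
)
print(example)
```

Nothing in the rendered file expresses permission or restriction; it only lists what is worth reading.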
Why it is not enforcement
llms.txt is a proposal, not a ratified standard, and honouring it is entirely voluntary. There is no requirement that any AI crawler read it, and support across operators is uneven. The format also defines no allow, disallow, or rate-limit directives, so it cannot block, allow, or rate-limit anything.
Because of that, it must not be treated as access control or as an opt-out mechanism. To control crawling you still need robots.txt; to enforce, you still need server-side rules. llms.txt sits on top as an optional curation hint.
- Proposed convention, not a ratified or enforced standard
- Curation hint — not an allow/block control like robots.txt
- Adoption by AI crawlers is voluntary and uneven
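The contrast with robots.txt can be sketched with Python's standard `urllib.robotparser`, which evaluates allow and disallow rules the way a compliant crawler would. The robots.txt body and the `may_fetch` helper below are illustrative assumptions. The result only predicts what a crawler that honours robots.txt should do; it enforces nothing, and llms.txt has no equivalent rules to evaluate.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: ask GPTBot to stay out entirely, others to avoid /private/.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler with this user agent may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(may_fetch(ROBOTS_TXT, "GPTBot", "https://example.com/llms.txt"))
print(may_fetch(ROBOTS_TXT, "SomeOtherBot", "https://example.com/docs/"))
print(may_fetch(ROBOTS_TXT, "SomeOtherBot", "https://example.com/private/report"))
```

Note that a disallow rule covering the whole site also asks the crawler not to fetch /llms.txt itself, which is one more reason to keep the two files consistent with each other.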
Using it pragmatically
If you publish llms.txt, keep it a faithful, current map of genuinely useful pages — stale or padded files help no one. Maintain robots.txt and server controls independently for actual policy and enforcement.
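One way to keep the file from drifting is a small check that compares the links in llms.txt against the URLs you know to be live. The sketch below assumes the file uses ordinary Markdown links and that `live_urls` is a set you obtain elsewhere, for example from your sitemap or CMS; both function names are hypothetical.

```python
from __future__ import annotations

import re

# Matches Markdown links of the form [title](url); a simplification that
# ignores edge cases such as nested brackets or URLs containing spaces.
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def linked_urls(llms_txt: str) -> list[str]:
    """Return every URL linked from the file, in document order."""
    return [url for _title, url in LINK_RE.findall(llms_txt)]


def drift_report(llms_txt: str, live_urls: set[str]) -> dict[str, list[str]]:
    """List links that point at unknown pages and links that appear more than once."""
    listed = linked_urls(llms_txt)
    return {
        "dead": [u for u in listed if u not in live_urls],
        "duplicates": sorted({u for u in listed if listed.count(u) > 1}),
    }


# Hypothetical file content and live-URL set, for illustration only.
sample = (
    "# Site\n\n## Docs\n\n"
    "- [A](https://example.com/a.md): first\n"
    "- [B](https://example.com/b.md)\n"
    "- [A again](https://example.com/a.md)\n"
)
report = drift_report(sample, {"https://example.com/a.md"})
print(report)
```

Running a check like this in a publishing pipeline would flag stale entries before they ship, though it cannot judge whether the listed pages are still the most useful ones.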
Because support is still evolving, set expectations accordingly: llms.txt may improve how some systems find your best content, but it is not a guarantee of visibility, citation, or any traffic outcome.
How it appears in analytics and logs
Requests for /llms.txt in your logs suggest that a client is probing for the convention, but because adoption is uneven, the presence or absence of such requests says little on its own. The file does not control crawling; robots.txt and server rules still do.
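As a rough illustration of spotting these fetches yourself, the sketch below counts requests for /llms.txt per user agent. It assumes access logs in the common "combined" format; the regular expression, the `llms_txt_fetchers` function, and the sample lines are illustrative, and real log layouts vary. User-agent strings are self-declared, so treat the counts as a hint rather than proof of who fetched the file.

```python
import re
from collections import Counter

# Simplified pattern for the request, status, size, referer, and user-agent
# fields of a combined-format log line.
LOG_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def llms_txt_fetchers(log_lines):
    """Count requests whose path (ignoring any query string) is /llms.txt, by user agent."""
    counts = Counter()
    for line in log_lines:
        match = LOG_RE.search(line)
        if match and match.group("path").split("?", 1)[0] == "/llms.txt":
            counts[match.group("ua")] += 1
    return counts


# Hypothetical log lines, for illustration only.
sample_log = [
    '203.0.113.5 - - [24/Jun/2026:10:00:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '203.0.113.9 - - [24/Jun/2026:10:00:05 +0000] "GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
    '203.0.113.5 - - [24/Jun/2026:11:00:00 +0000] "GET /llms.txt?x=1 HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
]
fetchers = llms_txt_fetchers(sample_log)
print(fetchers.most_common())
```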
Diagnostic use case
Offer AI systems a curated, structured map of your key content via llms.txt while keeping robots.txt and server controls as the real policy and enforcement layer.
What WebmasterID can help detect
WebmasterID records requests to /llms.txt as bot events like any other path, so you can see which AI clients fetch it. Recording those fetches does not change how crawling is controlled.
Common mistakes
- Treating llms.txt as an opt-out or access control — it is neither.
- Assuming all AI crawlers read it; adoption is voluntary and uneven.
- Letting llms.txt drift out of sync with the site it describes.
- Dropping robots.txt because llms.txt exists — they solve different problems.
Privacy and accuracy notes
llms.txt is a static curation file with no visitor data. Whether a crawler reads it is observable only as a bot request to that path; no human identity is involved.
Frequently asked questions
- Does llms.txt replace robots.txt?
No. robots.txt tells crawlers what they may fetch and is widely honoured by declared crawlers. llms.txt is a proposed curation hint pointing AI systems to your best content, with voluntary and uneven adoption. Keep robots.txt for actual crawl control.
Related pages
- Do AI crawlers obey robots.txt?
Major declared AI crawlers such as GPTBot and ClaudeBot document that they honour robots.txt, and control tokens such as Google-Extended are expressed through it, but compliance is voluntary and varies across operators. robots.txt is a crawl request defined by a shared standard, not an access-control mechanism, so a non-compliant or undeclared scraper can ignore it. Enforcement requires server-side controls.
- How to opt out of AI training
Opting your content out of AI training is done through robots.txt: per-crawler tokens such as GPTBot and CCBot, plus dedicated control tokens like Google-Extended and Applebot-Extended. There is no single switch — you assemble the policy token by token, and it is a request to compliant systems.
- AI visibility analytics
See which AI clients fetch resources like llms.txt, recorded server-side.
Sources and verification notes
- llmstxt.org (proposal): Community proposal; not a ratified standard, adoption is voluntary.
- Google (robots.txt specification): Contrast: robots.txt is the established crawl-control standard.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.