AI crawlers and content negotiation
Content negotiation lets a server return different representations of a URL based on request headers like Accept and Accept-Encoding. AI crawlers send these headers too, so the variant they receive depends on what they advertise and what you serve. Mishandled negotiation — wrong Vary header, or serving crawlers a different representation than humans — can distort what is ingested.
What content negotiation does
A single URL can have multiple representations — formats, languages, encodings. HTTP content negotiation lets the client express preferences through request headers: Accept for media types, Accept-Encoding for compression, Accept-Language for language. The server picks a representation and serves it, ideally signalling its choice with the Vary header.
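As a rough illustration of server-driven negotiation on Accept, the sketch below parses the header's q-values and picks one of the representations a server has available. The function names, the list of available types, and the fallback to a default type (instead of answering 406) are assumptions made for this example, not a description of any particular server.

```python
def parse_accept(header):
    """Parse an Accept header into (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # q defaults to 1 when the parameter is absent
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q as "not acceptable"
        ranges.append((media_range, q))
    return ranges


def matches(media_range, media_type):
    """True if a range such as text/* or */* covers a concrete media type."""
    if media_range == "*/*":
        return True
    range_type, _, range_sub = media_range.partition("/")
    type_, _, sub = media_type.partition("/")
    return range_type == type_ and range_sub in ("*", sub)


def specificity(media_range):
    """Rank ranges so text/html beats text/*, which beats */*."""
    if media_range == "*/*":
        return 0
    if media_range.endswith("/*"):
        return 1
    return 2


def pick_representation(accept_header, available, default="text/html"):
    """Pick the available type with the highest q.

    The order of `available` is the (assumed) server preference used to
    break ties. Falls back to `default` when nothing matches.
    """
    if not accept_header.strip():
        return default
    ranges = parse_accept(accept_header)
    best_type, best_q = None, 0.0
    for media_type in available:
        candidates = [(specificity(r), q) for r, q in ranges if matches(r, media_type)]
        if not candidates:
            continue
        _, q = max(candidates)  # the most specific matching range sets q
        if q > best_q:
            best_type, best_q = media_type, q
    return best_type or default
```

A client that sends only `*/*`, as some crawlers may, gets whichever type the server lists first, which is why the server-side preference order matters as much as the header itself.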
AI crawlers participate in this. They send Accept and Accept-Encoding headers, so the representation they get depends on what they advertise and what your negotiation logic does with it. Most of the time this is invisible and correct; problems arise at the edges.
Where negotiation goes wrong for crawlers
The Vary header is the common pitfall. It tells caches which request headers a response depends on; omit it, and a shared cache may serve a variant negotiated for one client to a different one — including serving a crawler a representation meant for someone else. Set it accurately for every header your negotiation branches on.
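The effect of a missing Vary header can be shown with a toy shared cache. This is a deliberately simplified sketch: the `ToySharedCache` class, the `make_origin` helper, and the two bodies are hypothetical, and real caches implement many more rules than keying on Vary.

```python
class ToySharedCache:
    """A toy shared cache that keys stored bodies on URL plus Vary headers."""

    def __init__(self):
        self._vary = {}   # url -> tuple of lowercase header names from Vary
        self._store = {}  # (url, header values) -> cached body

    def _key(self, url, request_headers):
        names = self._vary.get(url, ())
        return (url, tuple(request_headers.get(n, "") for n in names))

    def fetch(self, url, request_headers, origin):
        key = self._key(url, request_headers)
        if url in self._vary and key in self._store:
            return self._store[key]  # cache hit
        body, response_headers = origin(url, request_headers)
        vary = response_headers.get("Vary", "")
        self._vary[url] = tuple(h.strip().lower() for h in vary.split(",") if h.strip())
        self._store[self._key(url, request_headers)] = body
        return body


def make_origin(send_vary):
    """Build a hypothetical origin that negotiates on Accept."""
    def origin(url, request_headers):
        wants_json = "application/json" in request_headers.get("accept", "")
        body = '{"title": "Example"}' if wants_json else "<h1>Example</h1>"
        headers = {"Vary": "Accept"} if send_vary else {}
        return body, headers
    return origin


# Request headers use lowercase names in this sketch.
browser = {"accept": "text/html"}
crawler = {"accept": "application/json"}

# Without Vary, the crawler is served the HTML cached for the browser.
no_vary = ToySharedCache()
no_vary.fetch("/page", browser, make_origin(send_vary=False))
misserved = no_vary.fetch("/page", crawler, make_origin(send_vary=False))

# With Vary: Accept, each client gets the variant it negotiated.
with_vary = ToySharedCache()
with_vary.fetch("/page", browser, make_origin(send_vary=True))
correct = with_vary.fetch("/page", crawler, make_origin(send_vary=True))
```

In the first case the cache has no way to know the response depended on Accept, so it reuses the stored body for every client; in the second, the Accept value becomes part of the cache key.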
A second pitfall is branching content meaningfully on headers a crawler sends differently than a browser. If, for example, you serve a stripped representation to clients that do not advertise a particular Accept type, a crawler that omits it gets the stripped version. Negotiation should change format or encoding, not silently degrade the content a crawler ingests.
- Crawlers send Accept and Accept-Encoding; the variant follows from them
- Set Vary for every header negotiation branches on, or caches misserve
- Negotiate format/encoding, not a degraded content variant for crawlers
Serving variants safely
Keep negotiation about representation, not substance: the same content in HTML or JSON, compressed or not, in different languages — but the same information. Set Vary precisely so caches key variants correctly, and test by sending crawler-like headers to confirm the response is the complete, intended representation.
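One way to run such a test is sketched below, using only the Python standard library. The header profiles are assumptions for illustration; the headers real crawlers send differ between crawlers and are better taken from your own logs. The `check_response` helper is likewise hypothetical: it checks that Vary names every header you branch on and that phrases you expect in the complete page are present.

```python
import urllib.request

# Hypothetical header profiles; replace with values observed in your logs.
PROFILES = {
    "browser-like": {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Encoding": "gzip, br",
    },
    "crawler-like": {
        "Accept": "*/*",
        "Accept-Encoding": "gzip",
    },
}


def build_probe(url, profile):
    """Build a GET request carrying one profile's negotiation headers."""
    return urllib.request.Request(url, headers=PROFILES[profile])


def check_response(response_headers, body_text, branched_on, must_contain):
    """Return a list of problems found in one negotiated response.

    response_headers: dict with the response's header names as given
                      (this sketch looks up "Vary" with that exact casing).
    body_text:        the decoded, decompressed response body.
    branched_on:      request headers your negotiation logic branches on.
    must_contain:     phrases the complete representation should include.
    """
    problems = []
    vary = {
        h.strip().lower()
        for h in response_headers.get("Vary", "").split(",")
        if h.strip()
    }
    for name in branched_on:
        if name.lower() not in vary and "*" not in vary:
            problems.append(f"Vary is missing {name}")
    for phrase in must_contain:
        if phrase not in body_text:
            problems.append(f"body is missing {phrase!r}")
    return problems
```

A probe built this way can be sent with `urllib.request.urlopen`; note that urllib does not decompress bodies automatically, so a gzip-encoded response needs decoding before it is passed to `check_response`. Running the same check for each profile and comparing the results shows whether the crawler-like request receives the same substance as the browser-like one.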
Where you offer a structured alternative (such as a JSON representation of an HTML page), document it and keep it in sync, rather than letting content negotiation become an accidental way crawlers receive something different from what humans see. Consistency across variants is what keeps AI ingestion faithful to your content.
How it appears in analytics and logs
If crawlers receive a different or degraded representation than browsers do, content negotiation may be branching on a header the crawler sends differently. A missing or wrong Vary header can also cause caches to serve the wrong variant to a crawler.
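A simple way to look for this pattern is to compare, per URL, what crawlers and browsers were served. The sketch below assumes log records have already been reduced to dictionaries with hypothetical `url`, `client`, `content_type`, and `bytes` fields, and that sizes are comparable (for example, uncompressed sizes, or responses with the same encoding). The 0.5 threshold is an arbitrary starting point, not a recommended value.

```python
from collections import defaultdict
from statistics import median


def find_divergent_urls(records, min_ratio=0.5):
    """Flag URLs where crawler responses look different from browser ones."""
    by_url = defaultdict(lambda: defaultdict(list))
    for rec in records:
        by_url[rec["url"]][rec["client"]].append(rec)

    flagged = {}
    for url, clients in by_url.items():
        browser, crawler = clients.get("browser"), clients.get("crawler")
        if not browser or not crawler:
            continue  # need both sides to compare
        reasons = []
        browser_types = {r["content_type"] for r in browser}
        crawler_types = {r["content_type"] for r in crawler}
        if not crawler_types <= browser_types:
            reasons.append("content type seen only by crawlers")
        browser_median = median(r["bytes"] for r in browser)
        crawler_median = median(r["bytes"] for r in crawler)
        if browser_median and crawler_median / browser_median < min_ratio:
            reasons.append("crawler responses much smaller")
        if reasons:
            flagged[url] = reasons
    return flagged
```

A flagged URL is only a prompt to investigate: a crawler may legitimately have negotiated a different format, so the next step is to confirm whether the substance of the response differs.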
Diagnostic use case
Confirm that content negotiation serves AI crawlers a correct, complete representation of each URL, and that the Vary header is set accurately so caches key variants correctly and crawlers do not receive an unintended format.
What WebmasterID can help detect
WebmasterID records which AI tokens fetched which URLs and the response status, so on the bot-intelligence surface you can spot crawlers receiving unexpected responses that a negotiation or Vary issue might explain.
Common mistakes
- Omitting or mis-setting Vary, so caches serve crawlers the wrong variant.
- Branching real content, not just format, on headers crawlers send differently.
- Letting a structured representation drift out of sync with the HTML page.
- Never testing responses with crawler-like Accept and Accept-Encoding headers.
Privacy and accuracy notes
Content negotiation acts on request headers describing format and encoding, not identity. The variant a crawler receives depends on those headers and the crawler token, never on visitor identity or precise location.
Frequently asked questions
- Can content negotiation give AI crawlers the wrong version of a page?
- Yes. If the Vary header is missing or wrong, a shared cache can serve a crawler a variant negotiated for a different client. It can also happen if you branch real content on a header crawlers send differently than browsers. Keep negotiation about format and encoding, and set Vary accurately.
Related pages
- AI crawlers: API vs HTML access
AI systems can reach your content two ways: by crawling your public HTML pages, or by calling a structured API or feed you expose. HTML crawling is uncontrolled discovery of whatever is public; API access is an explicit, shaped channel you can authenticate, rate-limit, and version. The choice shapes how much control and visibility you keep.
- AI crawlers and conditional requests
Conditional requests let a crawler ask 'send this only if it changed' using validators it stored from a prior fetch — an ETag or a Last-Modified date. If the page is unchanged, the server replies 304 Not Modified with no body, saving bandwidth and origin work. Supporting conditional requests well makes re-crawling by AI crawlers efficient for both sides.
- AI crawlers, caching, and snapshots
An AI assistant can present content from a stored snapshot taken during an earlier crawl rather than fetching your page in real time. That means an AI may reference a version of your page that no longer matches the live one, and your logs may show no recent crawl despite active AI usage. This entry explains snapshot behaviour and its measurement consequences.
- Website observability
Spot unexpected crawler responses a negotiation or Vary issue can cause.
Sources and verification notes
- MDN: Content negotiation. Documents Accept headers and server-driven negotiation.
- MDN: Vary header. Vary tells caches which headers a representation depends on.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.