
AI crawlers and content negotiation

Content negotiation lets a server return different representations of the same URL based on request headers such as Accept and Accept-Encoding. AI crawlers send these headers too, so the variant they receive depends on what they advertise and what you serve. Mishandled negotiation, such as a missing or wrong Vary header or serving crawlers a different representation than humans get, can distort what is ingested.

Verified against primary sources

What content negotiation does

A single URL can have multiple representations in different formats, languages, or encodings. HTTP content negotiation lets the client state its preferences in request headers: Accept for media types, Accept-Encoding for compression, Accept-Language for language. The server picks a representation, describes it with headers such as Content-Type, Content-Encoding, and Content-Language, and should list the request headers that influenced the choice in the Vary response header.

AI crawlers participate in this. They send Accept and Accept-Encoding headers, so the representation they get depends on what they advertise and what your negotiation logic does with it. Most of the time this is invisible and correct; problems arise at the edges.

Where negotiation goes wrong for crawlers

The Vary header is the most common pitfall. It tells caches which request headers a response depends on. If you omit it, a shared cache may serve a variant negotiated for one client to another, including serving a crawler a representation meant for someone else. Set it accurately for every request header your negotiation branches on.
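The failure is easy to reproduce with a toy model of a shared cache. The Python sketch below is a deliberate simplification: `SharedCache`, `origin`, and `fetch` are hypothetical names, the cache remembers one Vary list per URL rather than per stored response, request-header names are assumed to be lowercase, values are compared as exact strings, and `Vary: *` is not handled. It shows the core mechanism: the cache key is the URL plus the request's values for each header named in Vary, so without Vary every client shares a single entry.

```python
"""Toy shared cache: why Vary must list every header negotiation used."""


class SharedCache:
    def __init__(self) -> None:
        self._store: dict[tuple[str, ...], tuple[dict[str, str], str]] = {}
        self._vary: dict[str, list[str]] = {}  # URL -> lowercase header names from Vary

    def _key(self, url: str, request_headers: dict[str, str]) -> tuple[str, ...]:
        # Primary key is the URL; the secondary key is the request's value for
        # each header the stored response said it varied on.
        names = self._vary.get(url, [])
        return (url, *(request_headers.get(name, "") for name in names))

    def get(self, url: str, request_headers: dict[str, str]):
        return self._store.get(self._key(url, request_headers))

    def put(self, url, request_headers, response_headers, body) -> None:
        vary = response_headers.get("Vary", "")
        self._vary[url] = [h.strip().lower() for h in vary.split(",") if h.strip()]
        self._store[self._key(url, request_headers)] = (response_headers, body)


def origin(request_headers: dict[str, str], send_vary: bool):
    """Hypothetical origin that negotiates HTML vs JSON on Accept."""
    if "application/json" in request_headers.get("accept", ""):
        headers, body = {"Content-Type": "application/json"}, '{"title": "Example"}'
    else:
        headers, body = {"Content-Type": "text/html"}, "<h1>Example</h1>"
    if send_vary:
        headers["Vary"] = "Accept"
    return headers, body


def fetch(cache: SharedCache, url: str, request_headers: dict[str, str], send_vary: bool):
    """Serve from cache when the key matches; otherwise go to the origin and store."""
    hit = cache.get(url, request_headers)
    if hit is not None:
        return hit
    headers, body = origin(request_headers, send_vary)
    cache.put(url, request_headers, headers, body)
    return headers, body
```

With `send_vary=False`, a JSON request followed by a crawler-like `Accept: text/html` request hands the cached JSON to the crawler. With `send_vary=True`, the second request misses the cache and receives HTML.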

A second pitfall is branching content meaningfully on headers a crawler sends differently than a browser. If, for example, you serve a stripped representation to clients that do not advertise a particular Accept type, a crawler that omits it gets the stripped version. Negotiation should change format or encoding, not silently degrade the content a crawler ingests.
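The difference between the pitfall and the safe pattern fits in a few lines. In this hypothetical Python sketch, `PAGE` stands in for your content and the substring checks on Accept are deliberately naive; the point is which branch changes the information and which changes only the format.

```python
"""Pitfall vs safer pattern when branching on Accept (illustrative)."""

import json
from html import escape

PAGE = {"title": "Example page", "body": "The complete explanation readers should see."}


def render_html(page: dict[str, str]) -> str:
    return f"<h1>{escape(page['title'])}</h1><p>{escape(page['body'])}</p>"


def degrading_handler(accept: str) -> str:
    # Pitfall: the *content* depends on whether text/html was advertised, so a
    # crawler sending only */* receives a stub instead of the page.
    if "text/html" in accept:
        return render_html(PAGE)
    return PAGE["title"]


def format_only_handler(accept: str) -> str:
    # Safer: the branch picks a format, and every format carries the full content.
    if "application/json" in accept:
        return json.dumps(PAGE)
    return render_html(PAGE)
```

A crawler that sends only `Accept: */*` gets the full body from `format_only_handler` but just the title from `degrading_handler`.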

Serving variants safely

Keep negotiation about representation, not substance: the same content in HTML or JSON, compressed or not, in different languages — but the same information. Set Vary precisely so caches key variants correctly, and test by sending crawler-like headers to confirm the response is the complete, intended representation.

Where you offer a structured alternative (such as a JSON representation of an HTML page), document it and keep it in sync, rather than letting content negotiation become an accidental way crawlers receive something different from what humans see. Consistency across variants is what keeps AI ingestion faithful to your content.
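A simple guard against drift is to check that every text field in the JSON variant also appears in the HTML variant. This Python sketch uses only the standard library; `missing_from_html` is a hypothetical helper, it compares top-level string fields only, and it treats all HTML text nodes (including any script or style content) as visible text, so it is a smoke test rather than a full comparison.

```python
"""Smoke test: do the JSON variant's text fields all appear in the HTML variant?"""

import json
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect non-empty text nodes from an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        if data.strip():
            self.chunks.append(data.strip())


def html_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def missing_from_html(html: str, json_doc: str) -> list[str]:
    """Return top-level JSON string values that do not appear in the HTML text."""
    text = html_text(html)
    values = [v for v in json.loads(json_doc).values() if isinstance(v, str)]
    return [v for v in values if v not in text]
```

Running this against each page's HTML and JSON variants in CI flags a JSON field that was updated without the HTML (or the reverse).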

How it appears in analytics and logs

If logs show crawlers receiving a different or degraded representation compared with browsers, content negotiation may be branching on a header the crawler sends differently. A missing or wrong Vary header can also cause caches to serve the wrong variant to a crawler.
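If you have access logs, a rough first pass is to compare, per URL, what browsers and AI crawlers received. This Python sketch assumes you have already parsed log lines into tuples and classified each client as `browser` or `ai-crawler`; the row layout, the class labels, and the 0.5 size threshold are all assumptions. Compression negotiated through Accept-Encoding also changes bytes sent, so compare uncompressed sizes if your logs record them.

```python
"""Flag URLs where AI crawlers received something different from browsers (sketch)."""

from collections import defaultdict
from statistics import median

# Hypothetical pre-parsed log row: (url, client_class, status, content_type, bytes_sent)
Row = tuple[str, str, int, str, int]


def find_suspect_urls(rows: list[Row], min_ratio: float = 0.5) -> list[str]:
    """Return URLs where crawler status, media type, or typical size diverges."""
    by_url = defaultdict(lambda: defaultdict(list))
    for url, client, status, content_type, size in rows:
        media_type = content_type.split(";")[0].strip().lower()  # drop charset etc.
        by_url[url][client].append((status, media_type, size))

    suspects = []
    for url, clients in by_url.items():
        browser, crawler = clients.get("browser"), clients.get("ai-crawler")
        if not browser or not crawler:
            continue  # nothing to compare against
        status_differs = {s for s, _, _ in browser} != {s for s, _, _ in crawler}
        type_differs = {t for _, t, _ in browser} != {t for _, t, _ in crawler}
        browser_size = median(n for _, _, n in browser)
        crawler_size = median(n for _, _, n in crawler)
        # Flag crawler responses much smaller than the typical browser response.
        shrunk = browser_size > 0 and crawler_size / browser_size < min_ratio
        if status_differs or type_differs or shrunk:
            suspects.append(url)
    return suspects
```

A flagged URL is a lead, not a verdict: confirm it by requesting the page with crawler-like headers and comparing the bodies directly.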

Diagnostic use case

Make sure content negotiation serves AI crawlers a correct, complete representation of each URL, setting the Vary header accurately so caches and crawlers handle variants properly and crawlers do not receive an unintended format.

What WebmasterID can help detect

On the bot-intelligence surface, WebmasterID records which AI crawler tokens fetched which URLs and the response status each received, so you can spot crawlers receiving unexpected responses that a negotiation or Vary issue might explain.


Privacy and accuracy notes

Content negotiation acts on request headers describing format and encoding, not identity. Which variant a crawler receives keys on those headers and the crawler token, never on visitor identity or precise location.

Frequently asked questions

Can content negotiation give AI crawlers the wrong version of a page?
Yes. If the Vary header is missing or wrong, a shared cache can serve a crawler a variant negotiated for a different client. It can also happen if you branch real content on a header crawlers send differently. Keep negotiation about format and encoding, and set Vary accurately.


Sources and verification notes

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.