AI crawlers and conditional requests
Conditional requests let a crawler ask 'send this only if it changed' using validators it stored from a prior fetch — an ETag or a Last-Modified date. If the page is unchanged, the server replies 304 Not Modified with no body, saving bandwidth and origin work. Supporting conditional requests well makes re-crawling by AI crawlers efficient for both sides.
How conditional requests work
When a server serves a page, it can include validators: an ETag (an opaque token representing the current version) and a Last-Modified date. A crawler that stores these can, on its next visit, send them back as If-None-Match (for the ETag) or If-Modified-Since (for the date), asking the server to respond fully only if the resource changed.
If nothing changed, the server returns 304 Not Modified with headers but no body. The crawler keeps using its stored copy, and almost no bandwidth is spent. If the resource did change, the server returns a normal 200 with the new content and updated validators.
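The exchange above can be sketched from the crawler's side. This is a minimal illustration, not how any particular AI crawler is implemented: `StoredCopy`, `conditional_headers`, and `apply_response` are hypothetical names, and the network call is left out so the logic stays visible.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredCopy:
    """What a hypothetical crawler keeps from its previous fetch."""
    body: bytes
    etag: Optional[str] = None
    last_modified: Optional[str] = None


def conditional_headers(stored: StoredCopy) -> dict:
    """Echo the stored validators back as conditional request headers."""
    headers = {}
    if stored.etag:
        headers["If-None-Match"] = stored.etag
    if stored.last_modified:
        headers["If-Modified-Since"] = stored.last_modified
    return headers


def apply_response(stored: StoredCopy, status: int,
                   headers: dict, body: bytes) -> StoredCopy:
    """Return the copy the crawler should keep after the server replies."""
    if status == 304:
        # Unchanged: keep the stored body, refresh validators if any were sent.
        return StoredCopy(
            stored.body,
            headers.get("ETag", stored.etag),
            headers.get("Last-Modified", stored.last_modified),
        )
    if status == 200:
        # Changed: replace the body and take the new validators.
        return StoredCopy(body, headers.get("ETag"), headers.get("Last-Modified"))
    # Any other status (errors, redirects): keep what we have in this sketch.
    return stored
```

A real crawler would pass the result of `conditional_headers` to its HTTP client and feed the reply into `apply_response`; how it treats redirects and errors is a separate policy decision.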
Why this helps AI crawlers and you
AI crawlers re-visit pages to keep their picture current. Without conditional requests, every re-visit is a full 200 download even when nothing changed, wasting bandwidth on both ends. With them, an unchanged page costs a tiny 304 exchange instead of a full transfer.
For a large site re-crawled often, the saving is substantial: the crawler reconfirms freshness cheaply, and your origin and CDN move far fewer bytes. It also lets the crawler spend its budget on pages that actually changed, improving how current its view of your site stays.
- Server sends ETag and Last-Modified validators with a page
- Crawler re-requests with If-None-Match or If-Modified-Since
- Unchanged page returns a bodyless 304, saving bandwidth both ways
Supporting conditional requests well
Make sure your stack emits stable validators: an ETag that only changes when the content does, and an accurate Last-Modified. Validators that change on every request (for instance, an ETag derived from the time of the request rather than from the content) defeat the mechanism, because the crawler is always told the page changed.
Honour the conditional headers on the way in, returning 304 when the validator still matches. Behind a CDN, confirm the cache passes conditional requests through or answers them itself. Done right, conditional requests pair with accurate sitemap lastmod and good caching to keep AI re-crawls light.
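One way to picture both halves of this advice is a small, framework-independent sketch in Python. It assumes the ETag is derived from a hash of the response body and that the modification time is a UTC-aware datetime. The function names (`make_etag`, `etag_matches`, `respond`) are hypothetical, and a real server or CDN will usually provide this logic for you.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def make_etag(body: bytes) -> str:
    """Stable ETag from a content hash: it changes only when the bytes change."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def _strip_weak(tag: str) -> str:
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def etag_matches(if_none_match: str, etag: str) -> bool:
    """Weak comparison for If-None-Match: any W/ prefix is ignored.

    Simplification: splits on commas, so it assumes ETags contain none.
    """
    if if_none_match.strip() == "*":
        return True
    candidates = {_strip_weak(t) for t in if_none_match.split(",")}
    return _strip_weak(etag) in candidates


def respond(request_headers: dict, body: bytes, modified: datetime):
    """Return (status, headers, payload) for a GET on one resource.

    `modified` must be a UTC-aware datetime (assumption of this sketch).
    """
    etag = make_etag(body)
    validators = {
        "ETag": etag,
        "Last-Modified": format_datetime(modified, usegmt=True),
    }
    inm = request_headers.get("If-None-Match")
    ims = request_headers.get("If-Modified-Since")
    unchanged = False
    if inm is not None:
        # When If-None-Match is present, If-Modified-Since is ignored.
        unchanged = etag_matches(inm, etag)
    elif ims is not None:
        try:
            since = parsedate_to_datetime(ims)
            # HTTP dates have one-second resolution.
            unchanged = modified.replace(microsecond=0) <= since
        except (TypeError, ValueError):
            unchanged = False  # unparseable date: fall back to a full 200
    if unchanged:
        return 304, validators, b""
    return 200, validators, body
```

Because the ETag comes from the content itself, two requests for identical bytes always see the same validator. If `make_etag` were swapped for something based on the request time, the 304 branch would never be reached.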
How it appears in analytics and logs
Frequent 304 responses to an AI crawler mean conditional requests are working: the crawler is re-validating and your server is confirming no change cheaply. All-200 re-crawls of unchanged pages mean validators are missing or ignored, wasting bandwidth.
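To check this in your own logs, a rough sketch like the following can compute the share of 304 responses per crawler. It assumes access logs in the common "combined" format and matches on a short, illustrative list of user-agent tokens. `LOG_LINE`, `CRAWLER_TOKENS`, and `share_of_304` are hypothetical names, and the token list should be adjusted to the crawlers you actually see.

```python
import re
from collections import Counter, defaultdict

# Assumes combined log format: ... "METHOD path proto" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) \S+ [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

# Illustrative tokens only; extend or replace for your own traffic.
CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def share_of_304(lines):
    """Map each crawler token seen to the fraction of its requests answered 304."""
    counts = defaultdict(Counter)
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # not a GET/HEAD line in the expected format
        token = next((t for t in CRAWLER_TOKENS if t in match["ua"]), None)
        if token:
            counts[token][match["status"]] += 1
    return {
        token: statuses["304"] / sum(statuses.values())
        for token, statuses in counts.items()
    }
```

A share near zero for a crawler that repeatedly fetches unchanged URLs points to missing or unstable validators, though it can also mean that the crawler does not send conditional headers at all, so it is worth checking the request headers before changing the server.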
Diagnostic use case
Serve ETag and Last-Modified validators and honour If-None-Match and If-Modified-Since so AI crawlers re-checking your pages receive a lightweight 304 when nothing changed, cutting bandwidth on repeat crawls.
What WebmasterID can help detect
WebmasterID records the status codes that AI crawler tokens receive, so a healthy share of 304 responses on re-crawls is visible on the bot-intelligence surface as evidence that conditional requests are saving bandwidth.
Common mistakes
- Emitting an ETag that changes every request, so 304 never happens.
- Not honouring If-None-Match or If-Modified-Since, forcing full 200 re-crawls.
- Sending an inaccurate Last-Modified that misreports change.
- Assuming a CDN handles conditional requests without confirming it does.
Privacy and accuracy notes
Conditional requests concern whether a resource changed, not who asks. The validators are page metadata; detection keys on the crawler token and the request, never on visitor identity or precise location.
Frequently asked questions
- How do AI crawlers avoid re-downloading unchanged pages?
Through conditional requests. If you serve an ETag or Last-Modified validator, a crawler can re-request with If-None-Match or If-Modified-Since, and your server replies 304 Not Modified, with no body, when nothing changed, so the crawler reuses its stored copy and almost no bandwidth is spent.
Related pages
- AI crawlers, caching, and snapshots
An AI assistant can present content from a stored snapshot taken during an earlier crawl rather than fetching your page in real time. That means an AI may reference a version of your page that no longer matches the live one, and your logs may show no recent crawl despite active AI usage. This entry explains snapshot behaviour and its measurement consequences.
- AI crawlers and content negotiation
Content negotiation lets a server return different representations of a URL based on request headers like Accept and Accept-Encoding. AI crawlers send these headers too, so the variant they receive depends on what they advertise and what you serve. Mishandled negotiation — wrong Vary header, or serving crawlers a different representation than humans — can distort what is ingested.
- AI crawl budget and server load
Each AI crawler spends a finite budget on your site and consumes real origin resources per request. Inefficient URL structures, parameter explosions, and uncacheable dynamic pages waste that budget and amplify load. Reducing wasted fetches lets the budget reach your important content while keeping CPU, database, and bandwidth use sustainable.
- Website observability
See the share of 304 responses AI crawlers receive on re-crawls.
Sources and verification notes
- MDN: Conditional requests. Documents ETag, Last-Modified, If-None-Match, If-Modified-Since, and 304.
- RFC 9110: HTTP Semantics (Conditional Requests). Defines validator semantics and 304 Not Modified.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.