HTTP response codes and AI crawlers
AI crawlers act on the HTTP status you return. A 200 invites ingestion; 301/308 moves them to a new URL; 403 or 401 signals refusal; 404/410 says the page is gone; 429 asks them to slow down; 5xx says try again later. Returning the right code is how you steer a compliant AI crawler without blunt blocking, and the wrong code can mislead it for a long time.
Codes that invite or move crawling
A 200 OK is an invitation: the content is here, ingest it. A 301 or 308 permanent redirect tells the crawler the URL has moved for good, so it should follow and update its reference; a 302 or 307 signals a temporary move, so the crawler follows it this time but keeps treating the original URL as the one to request.
Use redirects deliberately. A permanent move deserves 301/308 so the crawler consolidates on the new URL; using a temporary redirect for a permanent move keeps the crawler returning to the old address.
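The choice between a permanent and a temporary redirect can be sketched as a small lookup. This is a minimal illustration, not a definitive implementation: the `REDIRECTS` table, its paths, and the `redirect_response` helper are hypothetical names invented for the example.

```python
from http import HTTPStatus

# Hypothetical redirect table: old path -> (new path, is the move permanent?)
REDIRECTS = {
    "/old-pricing": ("/pricing", True),        # moved for good
    "/status": ("/maintenance-notice", False), # temporary detour
}


def redirect_response(path):
    """Return (status, headers) for a redirected path, or None if none applies."""
    entry = REDIRECTS.get(path)
    if entry is None:
        return None
    target, permanent = entry
    # 301 asks the crawler to consolidate on the new URL;
    # 302 asks it to keep requesting the original one.
    status = HTTPStatus.MOVED_PERMANENTLY if permanent else HTTPStatus.FOUND
    return int(status), {"Location": target}
```

Swapping in 308 and 307 gives the same intent while also preserving the request method, which matters mainly for non-GET requests.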
Codes that refuse, retire, or throttle
403 Forbidden signals refusal: the crawler is not allowed here. 401 Unauthorized says the request lacks valid credentials, which an unauthenticated crawler also reads as refusal. 404 Not Found says the page is missing; 410 Gone says it is intentionally and permanently removed, which is a stronger signal to drop the URL and is often acted on sooner. 429 Too Many Requests with a Retry-After header asks the crawler to slow down rather than go away.
Choosing precisely matters: 410 typically retires a URL faster than 404, and 429 throttles where 403 would read as a standing block. Each code communicates a different intent to a compliant crawler.
- 200 invites ingestion; 301/308 permanently moves the crawler
- 404 means missing, 410 means intentionally gone (stronger signal)
- 429 + Retry-After throttles; 403/401 refuse outright
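The mapping above can be written down as a small table from intent to response. This is a sketch under stated assumptions: the `Intent` enum, the `response_for` helper, and the 60-second default are hypothetical and only illustrate the idea of choosing the code from what you want the crawler to do.

```python
from enum import Enum


class Intent(Enum):
    SERVE = "serve"        # content is here, ingest it
    THROTTLE = "throttle"  # slow down, come back later
    REFUSE = "refuse"      # not allowed here
    MISSING = "missing"    # page not found
    RETIRED = "retired"    # intentionally and permanently removed


# One status per intent, following the list above.
STATUS_FOR_INTENT = {
    Intent.SERVE: 200,
    Intent.THROTTLE: 429,
    Intent.REFUSE: 403,
    Intent.MISSING: 404,
    Intent.RETIRED: 410,
}


def response_for(intent, retry_after_seconds=60):
    """Return (status, headers) expressing the given intent to a crawler."""
    status = STATUS_FOR_INTENT[intent]
    headers = {}
    if intent is Intent.THROTTLE:
        # Retry-After accepts a delay in seconds (or an HTTP date).
        headers["Retry-After"] = str(retry_after_seconds)
    return status, headers
```

Keeping the mapping in one place makes it harder to return 403 by accident when the real intent is to throttle.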
Server errors and crawl behaviour
5xx responses tell a crawler your origin is having trouble. Compliant crawlers typically back off and retry later when they see sustained 5xx, which can slow your crawl coverage if errors persist. A flood of 500s during a crawl wave is both a reliability problem and a crawl-coverage problem.
Avoid serving 200 for error or empty pages — a soft-404 hides the problem and wastes crawl budget. Return honest status codes so crawlers behave the way the code intends.
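One rough way to spot soft 404s in your own responses is a heuristic check on status and body. The sketch below is only an assumption-laden illustration: the phrase list, the `min_length` threshold, and the `looks_like_soft_404` name are hypothetical and would need tuning for a real site.

```python
# Hypothetical phrases that suggest an error page served with a 200 status.
SOFT_404_PHRASES = ("page not found", "no longer available", "nothing here")


def looks_like_soft_404(status, body, min_length=200):
    """Heuristically flag a 200 response whose body looks empty or like an error page."""
    if status != 200:
        # Any other status is already an explicit signal, not a soft 404.
        return False
    text = body.strip().lower()
    if len(text) < min_length:
        # Near-empty page served as a success (threshold is an assumption).
        return True
    return any(phrase in text for phrase in SOFT_404_PHRASES)
```

A flagged URL is a candidate for a real 404 or 410, not proof of a problem; short but legitimate pages will also trip the length check.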
How it appears in analytics and logs
The status your origin returns to an AI crawler's user-agent token tells you how it will likely behave next: repeated 200s mean ongoing ingestion, sustained 429s mean throttling is engaging, and 5xx spikes may slow crawling because the crawler treats your origin as unstable.
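A per-crawler tally of these signals can be sketched from parsed log records. The example assumes each record has already been reduced to a `(token, status)` pair; the `summarise_statuses` helper and the `ExampleBot` token are hypothetical, and real log parsing is left out.

```python
from collections import Counter, defaultdict


def bucket_for(status):
    """Group a status code into the signals discussed above."""
    if status >= 500:
        return "5xx"
    if status == 429:
        return "429"
    if status == 200:
        return "200"
    return "other"


def summarise_statuses(records):
    """Count status buckets per crawler token from (token, status) pairs."""
    per_token = defaultdict(Counter)
    for token, status in records:
        per_token[token][bucket_for(status)] += 1
    return {token: dict(counts) for token, counts in per_token.items()}


# Usage with made-up records for a hypothetical crawler token.
sample = [("ExampleBot", 200), ("ExampleBot", 429), ("ExampleBot", 503)]
summary = summarise_statuses(sample)
```

A rising share of "429" or "5xx" for one token is the cue to check whether throttling or origin errors are shaping that crawler's coverage.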
Diagnostic use case
Use precise HTTP status codes to guide AI crawlers — throttle with 429, retire URLs with 410, refuse with 403 — instead of relying on ambiguous responses that confuse their behaviour.
What WebmasterID can help detect
WebmasterID records the status returned to each AI crawler token per URL, so you can confirm on the observability and bot-intelligence surfaces whether crawlers receive the codes you intend.
Common mistakes
- Using a temporary redirect (302/307) for a permanent move.
- Serving soft-404s (200 for missing content) instead of real 404/410.
- Returning 403 when you meant to throttle — use 429 with Retry-After.
- Ignoring sustained 5xx, which makes crawlers back off and slows coverage.
Privacy and accuracy notes
Response-code handling concerns crawler behaviour and server policy, not visitor identity. Analysis keys on the crawler token and status returned; no human data is involved.
Related pages
- Rate-limiting AI crawlers
Rate-limiting AI crawlers throttles how fast they fetch without fully blocking them. Options range from robots.txt crawl-delay (honoured by some crawlers, ignored by others) to server-side or CDN request limits that return 429 Too Many Requests. The goal is to protect origin capacity while still allowing AI crawlers to read your content over time.
- AI crawl budget and server load
Each AI crawler spends a finite budget on your site and consumes real origin resources per request. Inefficient URL structures, parameter explosions, and uncacheable dynamic pages waste that budget and amplify load. Reducing wasted fetches lets the budget reach your important content while keeping CPU, database, and bandwidth use sustainable.
- Website observability
See the status codes your origin returns to each AI crawler.
Sources and verification notes
- MDN: HTTP response status codes. Canonical reference for 2xx/3xx/4xx/5xx semantics.
- Google: HTTP status codes and crawling. Documents how crawlers interpret status codes including 410 and 5xx.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.