AI crawlers and HTTP 429 rate limits
HTTP 429 Too Many Requests is the standard way to tell a crawler it is sending too many requests. A compliant AI crawler should back off, ideally honouring a Retry-After header. This entry explains how 429 interacts with AI crawlers, the Retry-After mechanism, and why 429 is a cooperative signal rather than a hard block.
What 429 signals
HTTP 429 Too Many Requests, defined in RFC 6585, tells a client it has sent too many requests in a given window. For an AI crawler, returning 429 is the cooperative way to say 'slow down' without permanently denying access. A well-behaved crawler interprets 429 as a throttle and reduces its request rate.
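Here is a minimal sketch of the server side of "too many requests in a given window". It assumes a simple fixed-window policy keyed by the crawler's user-agent token. Each token gets at most limit requests per window seconds, and any request beyond that receives 429 along with the seconds left until the window resets. The FixedWindowLimiter class, its parameters, and the ExampleBot token are hypothetical. In practice this is usually configured in the CDN, WAF, or web server, and sliding windows or token buckets are common alternatives to a fixed window.

```python
import math
import time
from typing import Callable, Dict, Optional, Tuple


class FixedWindowLimiter:
    """Hypothetical per-crawler limiter: at most `limit` requests per `window` seconds."""

    def __init__(self, limit: int, window: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        # user-agent token -> (window start time, requests counted in this window)
        self.state: Dict[str, Tuple[float, int]] = {}

    def check(self, token: str) -> Tuple[int, Optional[int]]:
        """Return (status, retry_after_seconds): (200, None) if allowed, (429, n) if not."""
        now = self.clock()
        start, count = self.state.get(token, (now, 0))
        if now - start >= self.window:
            # The previous window has expired, so start a fresh one.
            start, count = now, 0
        if count < self.limit:
            self.state[token] = (start, count + 1)
            return 200, None
        # Over the limit: report the whole seconds left until this window resets.
        retry_after = max(1, math.ceil(start + self.window - now))
        return 429, retry_after
```

The clock parameter lets tests inject a fake time source instead of waiting for real windows to expire.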
This is friendlier than a hard 403 block: it keeps the door open while protecting your origin from overload during a heavy crawl wave.
Retry-After and the limits of 429
You can pair 429 with a Retry-After header to suggest how long the crawler should wait before retrying. The value is either a number of seconds (for example, Retry-After: 120) or an HTTP date after which the client may retry. A compliant crawler should respect that interval. The header is the polite, standards-based way to set the pace, rather than leaving the back-off to the crawler's own heuristics.
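On the crawler side, honouring Retry-After means turning either form of the header into a wait time. The sketch below uses the standard library's email.utils.parsedate_to_datetime to read the HTTP-date form. The function name and the 60-second fallback used when the header is missing or unparseable are assumptions for illustration. It is not any particular crawler's behaviour.

```python
import email.utils
from datetime import datetime, timezone
from typing import Optional


def retry_after_seconds(value: Optional[str], now: datetime, default: float = 60.0) -> float:
    """Convert a Retry-After header value into a wait time in seconds.

    Accepts delay-seconds ("120") or an HTTP-date ("Wed, 21 Oct 2015 07:28:00 GMT").
    `now` must be timezone-aware. `default` is a hypothetical fallback for a
    missing or unparseable header.
    """
    if value is None:
        return default
    value = value.strip()
    if value.isascii() and value.isdigit():
        # delay-seconds form
        return float(int(value))
    try:
        when = email.utils.parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Older Pythons raise TypeError, newer ones ValueError, for bad dates.
        return default
    if when.tzinfo is None:
        # HTTP dates are always GMT, so treat a zone-less date as UTC.
        when = when.replace(tzinfo=timezone.utc)
    # A date already in the past means "retry now", never a negative wait.
    return max(0.0, (when - now).total_seconds())
```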
The catch is that 429 only works if the crawler honours it. A non-compliant client can ignore 429 and keep hammering, in which case you need a harder control at the edge or WAF. So treat 429 as the first, cooperative lever — effective for well-behaved crawlers, insufficient on its own for those that ignore it.
- 429 (RFC 6585) asks a crawler to slow down, not stop forever
- Retry-After suggests the wait interval to compliant crawlers
- Non-compliant clients can ignore 429 — escalate at edge/WAF if needed
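The escalation path above can be sketched as a small policy object. It remembers the Retry-After deadline sent to each user-agent token, counts requests that arrive before that deadline expires, and signals a hard block (for example, a 403 at the edge) once a threshold is reached. The class name, the default threshold of three early retries, and keying on the user-agent token alone are assumptions. A real deployment would normally express this as a CDN or WAF rule.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class IgnoredBackoffTracker:
    """Hypothetical policy: escalate from 429 to a hard block for clients that ignore it."""

    max_violations: int = 3
    # token -> time (seconds) before which we asked the client not to retry
    retry_not_before: Dict[str, float] = field(default_factory=dict)
    # token -> number of requests that arrived before that deadline
    violations: Dict[str, int] = field(default_factory=dict)

    def record_429(self, token: str, now: float, retry_after: float) -> None:
        """Remember the Retry-After deadline just sent to this client."""
        self.retry_not_before[token] = now + retry_after

    def should_block(self, token: str, now: float) -> bool:
        """Count early retries; return True once the client has ignored 429 too often."""
        deadline = self.retry_not_before.get(token)
        if deadline is not None and now < deadline:
            self.violations[token] = self.violations.get(token, 0) + 1
        return self.violations.get(token, 0) >= self.max_violations
```

A compliant crawler that waits out the interval never accumulates violations, so only clients that keep hammering are escalated.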
How it appears in analytics and logs
A pattern of 429 responses to a given AI-crawler token means your origin is shedding load from that crawler. Whether the crawler then slows down tells you something about its compliance; continued hammering after 429 signals a poorly behaved client.
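One rough way to estimate from your own logs whether a crawler slowed down is to compare how many requests it made in a fixed window before and after the first 429. This assumes you can export per-crawler records as (timestamp, status) pairs. The function name, the record format, the 300-second window, and the after/before ratio are illustrative choices, not how WebmasterID or any other tool measures compliance.

```python
from typing import List, Optional, Tuple

# Hypothetical log record for one crawler: (unix timestamp in seconds, HTTP status)
LogRecord = Tuple[float, int]


def backoff_ratio(records: List[LogRecord], window: float = 300.0) -> Optional[float]:
    """Compare request counts in the `window` seconds before and after the first 429.

    Returns after/before: a value well below 1.0 suggests the crawler backed off,
    while a value near or above 1.0 suggests it kept hammering. Returns None if no
    429 was sent or there were no requests before it.
    """
    records = sorted(records)
    first_429 = next((t for t, status in records if status == 429), None)
    if first_429 is None:
        return None
    before = sum(1 for t, _ in records if first_429 - window <= t < first_429)
    after = sum(1 for t, _ in records if first_429 < t <= first_429 + window)
    if before == 0:
        return None
    return after / before
```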
Diagnostic use case
Use 429 responses to throttle aggressive AI crawlers gracefully, while understanding what a compliant crawler should do in response and what 429 cannot enforce.
What WebmasterID can help detect
WebmasterID can surface response codes per crawler, so you can see whether 429s are being issued to an AI crawler and whether that crawler subsequently backs off.
Common mistakes
- Assuming 429 forces every crawler to back off — only compliant ones do.
- Returning 429 without a Retry-After to guide the back-off interval.
- Using a hard 403 when a cooperative 429 would have throttled a good crawler.
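To avoid sending a bare 429, the response should carry a Retry-After header. The following is a minimal WSGI sketch using only the standard library. The is_over_limit callable and the fixed 120-second interval are hypothetical placeholders for whatever rate-limit decision your server or CDN actually makes.

```python
from typing import Callable, Iterable

RETRY_AFTER_SECONDS = 120  # hypothetical back-off interval


def make_app(is_over_limit: Callable[[str], bool]):
    """Wrap a hypothetical over-limit check (given the User-Agent) in a minimal WSGI app."""

    def app(environ, start_response) -> Iterable[bytes]:
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if is_over_limit(user_agent):
            body = b"Too Many Requests\n"
            start_response("429 Too Many Requests", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
                # Always tell the crawler how long to wait before retrying.
                ("Retry-After", str(RETRY_AFTER_SECONDS)),
            ])
            return [body]
        body = b"ok\n"
        start_response("200 OK", [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ])
        return [body]

    return app
```

For a quick local trial, wsgiref.simple_server.make_server("", 8000, app).serve_forever() will serve it. A production setup would run behind a proper WSGI server or enforce the limit at the edge instead.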
Privacy and accuracy notes
Rate-limit decisions use request metadata such as the user-agent token and request rate, not visitor identity. WebmasterID records crawls and their response codes as bot events only.
Frequently asked questions
- Will a 429 remove an AI crawler permanently?
- No. 429 is a temporary 'too many requests' signal, not a block. A compliant crawler slows down and tries again later, ideally after the Retry-After interval. To deny access entirely you need a different control, such as a 403 response or a block at the edge or WAF.
Related pages
- Rate-limiting AI crawlers
Rate-limiting AI crawlers throttles how fast they fetch without fully blocking them. Options range from robots.txt crawl-delay (honoured by some crawlers, ignored by others) to server-side or CDN request limits that return 429 Too Many Requests. The goal is to protect origin capacity while still allowing AI crawlers to read your content over time.
- HTTP response codes and AI crawlers
AI crawlers act on the HTTP status you return. A 200 invites ingestion; 301/308 moves them to a new URL; 403 or 401 signals refusal; 404/410 says the page is gone; 429 asks them to slow down; 5xx says try again later. Returning the right code is how you steer a compliant AI crawler without blunt blocking, and the wrong code can mislead it for a long time.
- AI crawlers, CDN and WAF
Most AI-crawler traffic hits your CDN and WAF before it ever reaches the origin. That edge layer is where allow, throttle, challenge, and block decisions are most effective. Some CDNs ship managed rules and verified-bot lists for AI crawlers; the trade-off is that a JavaScript challenge can break a legitimate crawler that does not execute scripts.
- Website observability
Watch response codes per crawler, including 429 back-off behaviour.
Sources and verification notes
- RFC 6585: HTTP 429 Too Many Requests. Defines 429 and references Retry-After.
- MDN: Retry-After header. Documents the Retry-After back-off interval.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.