Rate-limiting AI crawlers

Rate-limiting AI crawlers throttles how fast they fetch without fully blocking them. Options range from robots.txt crawl-delay (honoured by some crawlers, ignored by others) to server-side or CDN request limits that return 429 Too Many Requests. The goal is to protect origin capacity while still allowing AI crawlers to read your content over time.

Verified against primary sources

Why rate-limit instead of block

Blocking an AI crawler removes your content from that system entirely. Rate-limiting keeps it allowed but caps the request rate so a crawl wave does not exhaust origin CPU, database connections, or bandwidth. It is the middle path between full allow and full block.

The right ceiling depends on your capacity, not on a universal number. A static site behind a CDN tolerates far higher crawl rates than a dynamic app that renders each request at the origin.

Crawl-delay and its limits

Some crawlers honour a robots.txt Crawl-delay directive, which asks for a minimum gap between requests. It is a request, not an enforcement mechanism, and major crawlers vary in whether they obey it. Google, for example, does not use Crawl-delay and manages crawl rate through its own controls.

Because support is inconsistent, treat crawl-delay as a hint and pair it with enforced limits. Never assume a crawl-delay line alone will protect a fragile origin.
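
As a rough illustration of what a Crawl-delay line does and does not give you, the sketch below parses a robots.txt with Python's standard-library urllib.robotparser and reads back the requested delay. The robots.txt content and the token "ExampleAIBot" are made up for this example; a compliant crawler would perform a similar lookup, while a non-compliant one simply never does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; "ExampleAIBot" is an invented token for illustration.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Crawl-delay: 10
Allow: /

User-agent: *
Allow: /
"""


def crawl_delay_for(robots_txt, user_agent):
    """Return the Crawl-delay (seconds) requested for user_agent, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


# The named crawler is asked to wait 10 seconds between requests.
delay_example = crawl_delay_for(ROBOTS_TXT, "ExampleAIBot")
# Crawlers matched only by "*" get no delay hint in this file.
delay_other = crawl_delay_for(ROBOTS_TXT, "SomeOtherBot")
```

Nothing in this lookup is enforced by your server: the value only matters if the crawler chooses to read and respect it.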

Enforced limits and 429

For enforcement, apply a rate limit at the CDN or web server keyed on the crawler's token or its verified source, and return 429 Too Many Requests with a Retry-After header when the limit is exceeded. Well-behaved crawlers back off on 429 and retry later, so coverage continues at a sustainable pace.

Prefer 429 over 403 for throttling: 403 signals a hard refusal, while 429 signals 'slow down, come back', which is the behaviour you actually want from a crawler you intend to keep allowing.
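
One common way to enforce such a limit is a token bucket per crawler token. The sketch below is a minimal, framework-free version, not a production implementation: the class TokenBucket, the function respond, and the default rate and burst values are all assumptions chosen for illustration, and a real deployment would usually rely on the CDN's or web server's built-in rate-limiting features.

```python
import math


class TokenBucket:
    """Simple token bucket: `rate` tokens per second, bursts up to `capacity`."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so a short burst is allowed
        self.updated = now

    def take(self, now):
        """Try to spend one token. Returns (allowed, retry_after_seconds)."""
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True, 0
        # Seconds until one full token is available, rounded up to at least 1.
        wait = (1.0 - self.tokens) / self.rate
        return False, max(1, math.ceil(wait))


def respond(buckets, crawler_token, now, rate=2.0, capacity=5.0):
    """Return (status, headers) for one request from crawler_token.

    The rate and capacity defaults are placeholders, not recommendations.
    """
    if crawler_token not in buckets:
        buckets[crawler_token] = TokenBucket(rate, capacity, now)
    allowed, retry_after = buckets[crawler_token].take(now)
    if allowed:
        return 200, {}
    # 429 rather than 403: the crawler stays welcome, just slower.
    return 429, {"Retry-After": str(retry_after)}


buckets = {}
# Six requests in the same instant: the burst of five passes, the sixth is throttled.
statuses = [respond(buckets, "ExampleAIBot", now=0.0)[0] for _ in range(6)]
```

Passing the clock in as `now` keeps the sketch deterministic; in a live handler you would typically feed it a monotonic clock reading instead.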

How it appears in analytics and logs

A burst of requests from one AI token hitting many URLs per second is a crawl wave that may strain your origin. Sustained 429s to that token in logs mean your rate limit is engaging: it is controlling load, but it is also slowing that crawler's coverage.
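
If you do want to check this in raw logs, the two signals above reduce to a peak requests-per-second figure and a share of 429 responses per token. The sketch below assumes the log has already been parsed into (timestamp in seconds, crawler token, status code) tuples; that record shape, the function name summarize, and the sample data are assumptions for illustration.

```python
from collections import Counter, defaultdict


def summarize(records):
    """Summarize parsed log records per crawler token.

    records: iterable of (timestamp_seconds, token, status_code) tuples.
    Returns {token: {"peak_rps", "requests", "share_429"}}.
    """
    per_second = defaultdict(Counter)  # token -> {whole second -> request count}
    totals = Counter()
    throttled = Counter()
    for ts, token, status in records:
        per_second[token][int(ts)] += 1
        totals[token] += 1
        if status == 429:
            throttled[token] += 1
    return {
        token: {
            "peak_rps": max(per_second[token].values()),
            "requests": totals[token],
            "share_429": throttled[token] / totals[token],
        }
        for token in totals
    }


# Invented sample data: one bursty token that is being throttled, one quiet token.
sample = [
    (0.1, "ExampleAIBot", 200),
    (0.5, "ExampleAIBot", 200),
    (0.9, "ExampleAIBot", 429),
    (1.2, "ExampleAIBot", 429),
    (3.0, "OtherBot", 200),
]
summary = summarize(sample)
```

A high peak_rps with a share_429 near zero suggests the limit is not engaging; a high share_429 sustained over days suggests the ceiling may be tighter than your capacity requires.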

Diagnostic use case

Reduce origin load from aggressive AI crawl waves while keeping your content available to AI systems, by combining crawl-delay hints with enforced server or CDN rate limits.

What WebmasterID can help detect

WebmasterID shows per-token request volume and timing on the bot-intelligence surface, so you can see which AI crawler is driving load and whether your rate limits are taking effect, without parsing raw logs.

Privacy and accuracy notes

Rate-limiting decisions key on the crawler's user-agent token and request rate, not on any visitor identity. No human data is used, and a crawler is not a person.
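
To make the keying concrete, here is a minimal sketch of deriving a rate-limit key from the User-Agent header alone. The token list is illustrative and may be out of date, so check each operator's documentation; the function name rate_limit_key and the sample header values are assumptions. Note that the header is self-declared, so this key identifies what a client claims to be, not a verified source.

```python
# Illustrative list only; confirm current tokens with each crawler operator.
KNOWN_TOKENS = ("GPTBot", "ClaudeBot", "CCBot", "PerplexityBot")


def rate_limit_key(user_agent):
    """Return the matching crawler token, or None for unrecognized clients.

    Only the declared User-Agent string is inspected; no cookie, account,
    or other visitor identifier is involved.
    """
    ua = user_agent.lower()
    for token in KNOWN_TOKENS:
        if token.lower() in ua:
            return token
    return None


# Hypothetical header values for illustration.
crawler_key = rate_limit_key("ExampleFetcher/1.0 (compatible; GPTBot/1.0)")
browser_key = rate_limit_key("Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0")
```

Requests with no matching token fall outside the crawler limit entirely, which keeps ordinary visitors unaffected by it.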

Frequently asked questions

Does every AI crawler honour Crawl-delay?
No. Crawl-delay is advisory and support varies by operator; some ignore it entirely. To actually protect your origin, enforce a rate limit at the CDN or server and return 429 with Retry-After.

Sources and verification notes

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.