Rate-limiting crawlers without losing indexing

When a crawler is overloading your server, the goal is to slow it down without telling search engines your content is gone. Safe techniques include returning 503 or 429 with a Retry-After header for short-term overload, using crawl-delay only where a crawler honours it (Googlebot does not), and using crawl-rate settings where the crawler's operator provides them. Blunt blocks or long outages risk deindexing, so rate-limit deliberately.

Verified against primary sources

Use the right temporary signal

For genuine short-term overload, return 503 Service Unavailable or 429 Too Many Requests with a Retry-After header. These tell compliant crawlers the condition is temporary and when to come back; they preserve the URL's index status because you are signalling a transient state, not removal. Google specifically recommends 503/429 for telling Googlebot to slow down or pause briefly.
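
The temporary signal described above can be sketched in Python as a small helper that builds the status line and headers. The function name `throttle_response` and the 120-second default are illustrative assumptions, not values recommended by any search engine; wire the result into whatever server or framework you actually use.

```python
from typing import Dict, Tuple

# Only these two statuses are treated as "temporary overload" signals here.
REASONS = {503: "Service Unavailable", 429: "Too Many Requests"}


def throttle_response(status: int = 503,
                      retry_after_seconds: int = 120) -> Tuple[str, Dict[str, str]]:
    """Return (status_line, headers) for a temporary throttling response.

    The default delay is a placeholder; pick one that matches how long
    the overload is actually expected to last.
    """
    if status not in REASONS:
        raise ValueError("use 503 or 429 for temporary throttling")
    if retry_after_seconds <= 0:
        raise ValueError("retry_after_seconds must be positive")
    headers = {
        # Retry-After may carry a delay in seconds (used here) or an HTTP date.
        "Retry-After": str(retry_after_seconds),
        "Content-Type": "text/plain; charset=utf-8",
    }
    return f"{status} {REASONS[status]}", headers


status_line, headers = throttle_response(429, retry_after_seconds=300)
print(status_line, headers)
```

The helper deliberately rejects other status codes so that a 403, 404, or 410 cannot be sent by accident when the intent is only to slow a crawler down.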

Keep such throttling short. Sustained 5xx/429 over long periods can lead crawlers to reduce crawl rate broadly and, if pages stay unavailable, to drop them from the index.

Crawl-delay and operator settings

The robots.txt crawl-delay directive is honoured by some crawlers (Bing, for example, has historically supported it) but not by Googlebot, which ignores it. So crawl-delay is not a universal lever; apply it only for crawlers documented to obey it, and rely on other mechanisms for those that do not.
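
To illustrate how a crawler that does honour the directive would read it, the sketch below parses a sample robots.txt with Python's standard `urllib.robotparser`. The robots.txt content, the 10-second value, and the `SomeOtherBot` name are made up for the example. It shows only what a compliant parser sees; it says nothing about whether a given crawler acts on it, and Googlebot does not.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a crawl-delay scoped to one crawler only.
ROBOTS_TXT = """\
User-agent: bingbot
Crawl-delay: 10

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler looks up the delay for its own user agent.
bing_delay = parser.crawl_delay("bingbot")        # 10
other_delay = parser.crawl_delay("SomeOtherBot")  # None: no delay declared

print(bing_delay, other_delay)
```

Scoping the directive to a named user agent keeps the rule from implying anything for crawlers that are not documented to obey it.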

Where an operator provides a way to influence crawl rate, use it. Reduce the cause of load too: a crawler hammering thousands of faceted or parameter URLs is better fixed by removing the crawl trap than by throttling, because that recovers crawl budget instead of just capping it.
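
One way to spot the kind of crawl trap mentioned above is to group crawled URLs by path and count distinct query strings. This is a rough sketch under stated assumptions: the function name `find_crawl_traps`, the `min_variants` threshold, and the sample URLs are all hypothetical, and a real analysis would read URLs from your own access logs.

```python
from collections import defaultdict
from typing import Dict, Iterable
from urllib.parse import urlsplit


def find_crawl_traps(urls: Iterable[str], min_variants: int = 3) -> Dict[str, int]:
    """Return {path: distinct_query_count} for paths crawled with many query variants."""
    variants = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        if parts.query:  # ignore plain URLs without parameters
            variants[parts.path].add(parts.query)
    return {path: len(queries)
            for path, queries in variants.items()
            if len(queries) >= min_variants}


# Made-up sample of URLs requested by one crawler.
crawled = [
    "/shoes?color=red",
    "/shoes?color=blue",
    "/shoes?color=red&size=9",
    "/shoes?size=9&sort=price",
    "/about",
    "/blog?page=2",
]
traps = find_crawl_traps(crawled)
print(traps)  # {'/shoes': 4}
```

A path that accumulates many parameter combinations is a candidate for fixing at the source rather than throttling.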

What not to do

Do not block an aggressive but legitimate crawler with a 403 or a robots Disallow as a load fix; that can remove your content from that engine. Do not return 404/410, which signal that the content is gone. Do not leave a site-wide 503 up for days; that is read as an extended outage.

First verify the crawler is who it claims to be (by the operator's published method), then choose a proportionate response: throttle verified-but-aggressive crawlers temporarily, and block only unverified or abusive automation. Reserve hard blocks for traffic that is not a legitimate crawler at all.
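
The verify-then-respond flow can be sketched as below. It assumes the operator's published method is a reverse DNS lookup followed by a forward lookup that must return the original IP, which is the approach Google documents for Googlebot; other operators may publish IP lists instead, so check their documentation. The hostname suffixes, the function names, and the fake lookup tables are assumptions for illustration, and `socket.gethostbyname_ex` covers IPv4 only.

```python
import socket
from typing import Callable, Iterable, List


def dns_reverse(ip: str) -> str:
    """Reverse DNS: IP address to primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def dns_forward(host: str) -> List[str]:
    """Forward DNS: hostname to list of IPv4 addresses."""
    return socket.gethostbyname_ex(host)[2]


def is_verified_crawler(ip: str,
                        allowed_suffixes: Iterable[str],
                        reverse: Callable[[str], str] = dns_reverse,
                        forward: Callable[[str], List[str]] = dns_forward) -> bool:
    """True if reverse DNS lands in an allowed domain and forward DNS maps back to ip."""
    try:
        host = reverse(ip).rstrip(".").lower()
        if not host.endswith(tuple(allowed_suffixes)):
            return False
        return ip in forward(host)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False


def choose_response(verified: bool, aggressive: bool) -> str:
    """Policy for a request that claims to be a crawler."""
    if not verified:
        return "block"
    return "throttle" if aggressive else "allow"


# Assumed suffixes; confirm against the operator's current documentation.
SUFFIXES = (".googlebot.com", ".google.com")

# Fake lookup tables so the example runs without network access.
fake_ptr = {"192.0.2.10": "crawl-192-0-2-10.googlebot.com",
            "192.0.2.99": "host.example.net"}
fake_a = {"crawl-192-0-2-10.googlebot.com": ["192.0.2.10"]}

for ip in ("192.0.2.10", "192.0.2.99"):
    verified = is_verified_crawler(ip, SUFFIXES,
                                   reverse=fake_ptr.__getitem__,
                                   forward=lambda h: fake_a.get(h, []))
    print(ip, choose_response(verified, aggressive=True))
```

The suffixes include the leading dot so that a lookalike domain ending in the same letters does not pass the check.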

How it appears in analytics and logs

How you slow a crawler matters. A temporary 503/429 with Retry-After tells crawlers to back off and return; a 404/410 or robots block tells them the content is gone or off-limits. Using the wrong signal to manage load can cost you indexing.

Diagnostic use case

Reduce load from aggressive crawling without harming indexing: choose the right throttling signal (503/429 with Retry-After) and avoid blocks that cause pages to drop.

What WebmasterID can help detect

WebmasterID records which crawlers are fetching how often and what status they receive server-side, helping you identify the specific bot causing load and confirm that throttling reduced it without blocking legitimate crawling.
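
As a rough illustration of that kind of check, the sketch below tallies status codes per crawler from a list of `(crawler, status)` pairs. The record shape, the function names, and the sample data are hypothetical; they do not represent WebmasterID's export format or API.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def status_counts_by_crawler(records: Iterable[Tuple[str, int]]) -> Dict[str, Counter]:
    """Group fetches by crawler name and count each status code served."""
    counts: Dict[str, Counter] = {}
    for crawler, status in records:
        counts.setdefault(crawler, Counter())[status] += 1
    return counts


def throttled_share(by_status: Counter) -> float:
    """Fraction of a crawler's fetches that received 503 or 429."""
    total = sum(by_status.values())
    throttled = by_status[503] + by_status[429]
    return throttled / total if total else 0.0


# Made-up sample: (crawler name, HTTP status served).
records = [
    ("ExampleBot", 200), ("ExampleBot", 503), ("ExampleBot", 503), ("ExampleBot", 429),
    ("OtherBot", 200), ("OtherBot", 404),
]
counts = status_counts_by_crawler(records)
for crawler, by_status in counts.items():
    print(crawler, dict(by_status), round(throttled_share(by_status), 2))
```

Comparing the throttled share before and after a change gives a quick view of whether a crawler is being slowed temporarily or is instead receiving statuses such as 403 or 404 that suggest a block or removal.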

Privacy and accuracy notes

Rate-limiting governs request volume from crawlers, not people. WebmasterID records crawler fetch rates and statuses without attaching them to any visitor.

Frequently asked questions

How do I tell Googlebot to slow down temporarily?
Return 503 or 429 responses with a Retry-After header for the affected requests. Google reads this as a temporary signal to reduce crawling and to retry later, without treating the pages as removed. Keep it short-term.

Does Googlebot obey robots.txt crawl-delay?
No. Googlebot ignores the crawl-delay directive. Some other crawlers honour it, so use it only for those, and manage Googlebot through proper status codes and server performance instead.

Sources and verification notes

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.