Rate-limiting crawlers without losing indexing
When a crawler is overloading your server, the goal is to slow it without telling search engines your content is gone. Safe techniques include returning 503 or 429 with a Retry-After header for short-term overload, using crawl-delay only where a crawler honours it (Googlebot does not), and using any crawl-rate controls the crawler's operator provides. Blunt blocks or long outages risk deindexing, so rate-limit deliberately.
Use the right temporary signal
For genuine short-term overload, return 503 Service Unavailable or 429 Too Many Requests with a Retry-After header. These tell compliant crawlers the condition is temporary and when to come back; they preserve the URL's index status because you are signalling a transient state, not removal. Google specifically recommends 503/429 for telling Googlebot to slow down or pause briefly.
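As a minimal sketch of the temporary signal, assuming a Python WSGI stack, the middleware below answers every request with 503 and a Retry-After header while a load check reports overload, and passes requests through otherwise. The names `overload_guard`, `is_overloaded` and `hello`, the flag-file path and the 120-second default are all hypothetical; real deployments often do this at the reverse proxy or CDN instead, with the load signal taken from their own metrics.

```python
import os
from typing import Callable, Iterable, List, Tuple

StartResponse = Callable[[str, List[Tuple[str, str]]], object]
WSGIApp = Callable[[dict, StartResponse], Iterable[bytes]]


def overload_guard(app: WSGIApp, is_overloaded: Callable[[], bool],
                   retry_after_seconds: int = 120) -> WSGIApp:
    """Hypothetical middleware: while is_overloaded() is True, answer every
    request with 503 plus Retry-After instead of calling the wrapped app."""

    def wrapped(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
        if is_overloaded():
            body = b"Temporarily overloaded; please retry later.\n"
            start_response("503 Service Unavailable", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
                # Delta-seconds form; an HTTP-date is also a valid Retry-After value.
                ("Retry-After", str(retry_after_seconds)),
            ])
            return [body]
        # Normal path: serve content as usual.
        return app(environ, start_response)

    return wrapped


def hello(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


# Example wiring: shed load only while a (hypothetical) flag file exists.
app = overload_guard(hello, lambda: os.path.exists("/tmp/overloaded"))
```

Because the check runs on every request, the 503 stops as soon as `is_overloaded()` returns False, so the "outage" lasts no longer than the overload itself.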
Keep such throttling short. Serving 5xx or 429 responses for an extended period can lead crawlers to reduce their crawl rate across the site and, if pages stay unavailable, to drop them from the index.
- Short-term overload: 503 or 429 with a Retry-After header
- These preserve index status by signalling a temporary condition
- Do not keep them on for long periods — that reduces crawl rate and risks deindexing
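One way to make "keep it short" hard to forget is to build throttling that expires on its own. The sketch below is a hypothetical sliding-window limiter, keyed by crawler identity (assumed to be established by verification beforehand), that returns 429 with a Retry-After value once a crawler exceeds a request budget and stops throttling entirely after a fixed maximum duration. The class name, limits and duration are placeholders, not recommended values.

```python
import math
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Tuple


class TemporaryCrawlerThrottle:
    """Hypothetical sliding-window limiter: 429 + Retry-After once a crawler exceeds
    max_requests per window_seconds, and no throttling at all after max_duration_seconds."""

    def __init__(self, max_requests: int, window_seconds: float,
                 max_duration_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window = window_seconds
        self.clock = clock
        # The throttle lifts itself at this moment, so it cannot linger for days.
        self.expires_at = clock() + max_duration_seconds
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def check(self, crawler: str) -> Tuple[int, List[Tuple[str, str]]]:
        """Return (status, extra_headers) for one request from `crawler`."""
        now = self.clock()
        if now >= self.expires_at:
            return 200, []
        hits = self.hits[crawler]
        # Forget requests that have slid out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) >= self.max_requests:
            # Suggest retrying once the oldest counted request leaves the window.
            wait = max(1, math.ceil(hits[0] + self.window - now))
            return 429, [("Retry-After", str(wait))]
        hits.append(now)
        return 200, []
```

For example, `TemporaryCrawlerThrottle(max_requests=10, window_seconds=1.0, max_duration_seconds=6 * 3600)` would cap one crawler at roughly ten requests per second for six hours and then step aside. Rejected requests are not counted, so a crawler that backs off as asked regains its full budget after one window. The injectable `clock` exists mainly so the behaviour can be tested without waiting.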
Crawl-delay and operator settings
The robots.txt crawl-delay directive is honoured by some crawlers (Bingbot, for example, supports it) but is ignored by Googlebot. So crawl-delay is not a universal lever: apply it only to crawlers documented to obey it, and rely on other mechanisms for those that do not.
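Because crawl-delay is scoped to a user-agent group, it helps to check what each crawler would actually read from your file. Python's standard `urllib.robotparser` exposes the value through `RobotFileParser.crawl_delay()`, which matches on the crawler's product token (such as `bingbot`) rather than the full user-agent string. The robots.txt content and the 5-second delay below are illustrative placeholders, and `crawl_delays` is a hypothetical helper. This only shows what the file declares; it does not make Googlebot honour the directive.

```python
from typing import Dict, Iterable, Optional
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: Crawl-delay applies only to the bingbot group.
ROBOTS_TXT = """\
User-agent: bingbot
Crawl-delay: 5

User-agent: *
Disallow: /search
"""


def crawl_delays(robots_txt: str, agents: Iterable[str]) -> Dict[str, Optional[int]]:
    """Return the Crawl-delay each product token would read (None if its group sets none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.crawl_delay(agent) for agent in agents}


print(crawl_delays(ROBOTS_TXT, ["bingbot", "Googlebot"]))
# {'bingbot': 5, 'Googlebot': None}
```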
Where a crawler's operator offers a crawl-rate control, use it (Bing Webmaster Tools, for example, has crawl-control settings). Reduce the cause of load too: a crawler hammering thousands of faceted or parameter URLs is better fixed by removing the crawl trap than by throttling, because that recovers crawl budget instead of just capping it.
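Before throttling, it is worth checking whether the load comes from a crawl trap. One rough heuristic, sketched here under our own assumptions, is to group the URLs a crawler requested by path and count distinct query-string combinations; a path with hundreds of variants usually points to facets, sort orders or session parameters. The function name and the threshold are hypothetical, and the input is assumed to be the request paths you have already extracted from your logs for one crawler.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple
from urllib.parse import parse_qsl, urlsplit


def parameter_explosion(urls: Iterable[str], threshold: int = 50) -> List[Tuple[str, int]]:
    """Hypothetical crawl-trap heuristic: per path, count distinct query-string
    combinations a crawler requested; flag paths at or above `threshold`."""
    variants: Dict[str, Set[Tuple[Tuple[str, str], ...]]] = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        # Sort parameters so ?a=1&b=2 and ?b=2&a=1 count as one variant.
        params = tuple(sorted(parse_qsl(parts.query, keep_blank_values=True)))
        variants[parts.path].add(params)
    flagged = [(path, len(seen)) for path, seen in variants.items() if len(seen) >= threshold]
    # Worst offenders first.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Ten colours x ten sizes on one listing page: 100 crawlable variants.
crawled = [f"/shoes?color=c{i}&size={j}" for i in range(10) for j in range(10)]
print(parameter_explosion(crawled + ["/about"], threshold=50))
# [('/shoes', 100)]
```

A flagged path is a prompt to investigate (canonical tags, robots rules for the parameter, or not linking to the combinations), not proof of a trap.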
What not to do
Do not block an aggressive but legitimate crawler with a 403 or a robots Disallow as a load fix — that can remove your content from that engine. Do not return 404/410, which signals the content is gone. Do not leave a site-wide 503 up for days; that is read as an extended outage.
First verify the crawler is who it claims to be (by the operator's published method), then choose a proportionate response: throttle verified-but-aggressive crawlers temporarily, and block only unverified or abusive automation. Reserve hard blocks for traffic that is not a legitimate crawler at all.
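Google and Bing both document verifying their crawlers with a reverse DNS lookup on the requesting IP, a check that the hostname belongs to their domain, and a forward lookup confirming that the hostname resolves back to the same IP. The sketch below implements that check with Python's `socket` module and pairs it with the proportionate policy described above, for traffic that claims to be a crawler. The domain suffixes are assumptions to confirm against each operator's current documentation (Google also publishes its crawler IP ranges, which avoids DNS lookups), and `verify_crawler` and `choose_action` are hypothetical names.

```python
import ipaddress
import socket
from typing import Callable, List

# Assumed hostname suffixes; confirm against each operator's documentation.
CRAWLER_DOMAINS = {
    "googlebot": (".googlebot.com", ".google.com"),
    "bingbot": (".search.msn.com",),
}


def reverse_lookup(ip: str) -> str:
    return socket.gethostbyaddr(ip)[0]  # PTR lookup; raises OSError on failure


def forward_lookup(host: str) -> List[str]:
    # getaddrinfo returns both IPv4 and IPv6 addresses for the host.
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def verify_crawler(ip: str, claimed: str,
                   reverse: Callable[[str], str] = reverse_lookup,
                   forward: Callable[[str], List[str]] = forward_lookup) -> bool:
    """Reverse DNS, domain check, then forward DNS back to the same IP."""
    suffixes = CRAWLER_DOMAINS.get(claimed.lower())
    if suffixes is None:
        return False  # no known verification method for this crawler
    try:
        host = reverse(ip).lower().rstrip(".")
        if not host.endswith(suffixes):
            return False
        target = ipaddress.ip_address(ip)
        return any(ipaddress.ip_address(addr) == target for addr in forward(host))
    except (OSError, ValueError):  # DNS failure or unparseable address
        return False


def choose_action(verified: bool, over_limit: bool, abusive: bool = False) -> str:
    """Proportionate response for traffic claiming to be a crawler."""
    if abusive or not verified:
        return "block"      # impostors and abusive automation
    if over_limit:
        return "throttle"   # temporary 429/503 with Retry-After
    return "allow"
```

DNS lookups on every request add latency, so in practice you would cache the verification result per IP for a while rather than calling `verify_crawler` each time.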
How it appears in analytics and logs
How you slow a crawler matters, and your logs show which signal each crawler received. A temporary 503/429 with Retry-After tells crawlers to back off and return; a 404/410 or robots block tells them the content is gone or off-limits. Using the wrong signal to manage load can cost you indexing.
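A quick way to see which signal each crawler is actually receiving is to tally response codes per crawler from your access logs. This sketch assumes the common Apache/Nginx "combined" log format and matches only on the user-agent product token, so it counts *claimed* crawlers; pair it with verification before acting. The token list and the function name are illustrative.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable

# "combined" format: ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)
# Product tokens to look for; extend with the crawlers you care about.
CRAWLER_TOKENS = ("Googlebot", "bingbot")


def status_by_crawler(lines: Iterable[str]) -> Dict[str, Counter]:
    """Count HTTP status codes per claimed crawler; non-matching lines are skipped."""
    summary: Dict[str, Counter] = defaultdict(Counter)
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # not in combined format
        ua = match.group("ua").lower()
        for token in CRAWLER_TOKENS:
            if token.lower() in ua:
                summary[token][match.group("status")] += 1
                break
    return summary
```

Passing an open log file works directly, since a file object iterates line by line. During a throttling episode you would expect the share of 429 or 503 responses to rise briefly and then fall back; a sudden run of 403, 404 or 410 responses to a legitimate crawler on URLs that should be live is the pattern to avoid.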
Diagnostic use case
Reduce load from aggressive crawling without harming indexing: choose the right throttling signal (503/429 with Retry-After) and avoid blocks that cause pages to drop.
What WebmasterID can help detect
WebmasterID records which crawlers fetch your pages, how often, and which status codes they receive server-side, helping you identify the specific bot causing load and confirm that throttling reduced it without blocking legitimate crawling.
Common mistakes
- Blocking a legitimate crawler with 403/robots to fix load, risking deindexing.
- Returning 404/410 for overload, telling crawlers the content is gone.
- Relying on robots.txt crawl-delay for Googlebot, which ignores it.
- Leaving a site-wide 503 up for days so it reads as an extended outage.
Privacy and accuracy notes
Rate-limiting governs request volume from crawlers, not people. WebmasterID records crawler fetch rates and statuses without attaching them to any visitor.
Frequently asked questions
- How do I tell Googlebot to slow down temporarily?
- Return 503 or 429 responses with a Retry-After header for the affected requests. Google reads this as a temporary signal to reduce crawling and to retry later, without treating the pages as removed. Keep it short-term.
- Does Googlebot obey robots.txt crawl-delay?
- No. Googlebot ignores the crawl-delay directive. Some other crawlers honour it, so use it only for those, and manage Googlebot through proper status codes and server performance instead.
Related pages
- HTTP 429 Too Many Requests and crawl rate
429 Too Many Requests means the client has sent too many requests in a given time and is being rate limited. It can include a Retry-After header telling the client when to try again. Compliant crawlers slow down in response, making 429 a controlled way to manage crawl rate.
- Crawl rate and server load
When crawlers request pages faster than your origin can comfortably serve, load rises. Compliant crawlers respond to 429 and 503 with Retry-After by slowing down, giving you a controlled way to protect the server. Google adjusts crawl rate automatically based on site responsiveness and offers a way to report rate problems.
- Diagnosing a bot traffic spike
A sudden spike in traffic is often bots, not audience. The diagnostic question is which bots: a verified crawler doing a fresh crawl wave, or spoofers and scrapers impersonating known crawlers. Because impostors copy user-agent tokens, group requests by token and then verify each claimed crawler; separating genuine crawlers from impostors this way keeps your human analytics honest.
- Bot intelligence
Identify which crawler is causing load and confirm throttling worked, server-side.
Sources and verification notes
- Google Search Central — Reduce the Googlebot crawl rate
- Google Search Central — robots.txt specifications (crawl-delay unsupported)
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.