AI crawler incident response

An AI crawler incident is a crawl event that threatens stability or trust: a sudden request surge, a crawl that pushes the origin close to failure, or a request claiming a crawler identity it cannot prove. Good incident response is staged (verify, contain, then decide), so you protect the site without permanently blocking a crawler over a transient spike.

Verified against primary sources

Verify before you react

The first step is to confirm what is happening. Identify the crawler token responsible for the surge and check whether the requests match the operator's documented identity — the self-identifying URL pattern and, where published, the source IP ranges. A user agent is a claim; a spike that says GPTBot or ClaudeBot but comes from outside the operator's ranges is not that crawler.
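As a minimal sketch of that check, the Python below treats the user agent as a claim and accepts it only when the source IP falls inside the ranges published for the claimed token. The token name ExampleBot and the CIDR blocks are hypothetical placeholders (reserved documentation ranges); in practice you would load whatever ranges each operator actually publishes.

```python
import ipaddress
from typing import Optional

# Hypothetical token and ranges for illustration only (documentation blocks).
# Replace with the ranges each operator actually publishes.
PUBLISHED_RANGES = {
    "ExampleBot": ["192.0.2.0/24", "2001:db8::/32"],
}

def claimed_token(user_agent: str) -> Optional[str]:
    """Return the first known token found in the user-agent string, if any."""
    ua = user_agent.lower()
    for token in PUBLISHED_RANGES:
        if token.lower() in ua:
            return token
    return None

def is_verified(user_agent: str, source_ip: str) -> bool:
    """True only if the UA claims a known token AND the IP is in its ranges."""
    token = claimed_token(user_agent)
    if token is None:
        return False
    ip = ipaddress.ip_address(source_ip)
    # An IPv4 address is never "in" an IPv6 network (and vice versa),
    # so mixed-version comparisons simply evaluate to False.
    return any(ip in ipaddress.ip_network(cidr)
               for cidr in PUBLISHED_RANGES[token])
```

A request whose user agent names ExampleBot but arrives from 198.51.100.7 would fail this check, which is the spoofing case described above.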

Distinguish a genuine crawl wave from abuse. Compliant crawlers respect robots.txt and back off on 429 and 503; a client that ignores both while claiming a known token is probably not that crawler, and the response differs accordingly.
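One rough way to test for back-off is to count how many requests a client sent while it was still inside the wait window of an earlier 429 or 503. The sketch below assumes you have already extracted one client's requests as (unix time, status) pairs sorted by time and that you sent a fixed Retry-After value; the function name and input shape are illustrative, not taken from any particular log tool.

```python
def requests_ignoring_backoff(events, retry_after_s):
    """Count requests sent before a prior 429/503 wait window had ended.

    events: list of (unix_time, status) for ONE client, sorted by time.
    retry_after_s: the Retry-After value (seconds) the server was sending.
    """
    violations = 0
    wait_until = None  # end of the most recent back-off window, if any
    for ts, status in events:
        if wait_until is not None and ts < wait_until:
            violations += 1
        if status in (429, 503):
            # Each throttled response starts (or extends) a wait window.
            wait_until = ts + retry_after_s
    return violations
```

A count near zero is consistent with a compliant crawler; a client that keeps sending requests through every window is not honouring the throttle.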

Contain without breaking

If load is the problem, contain it with reversible controls before reaching for a permanent block. Rate limiting and returning 429 Too Many Requests with a Retry-After header ask a compliant crawler to slow down; a temporary 503 during peak stress signals unavailability without telling the crawler the content is gone.
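The throttle described above can be sketched as a token bucket that answers either 200 or 429 with a Retry-After value. This is an illustrative, framework-free sketch: the class name, the rate and burst parameters, and the injectable clock are assumptions made for the example, and a production setup would more likely use the rate limiter built into your web server or CDN.

```python
import math
import time

class TokenBucket:
    """Per-crawler limiter: refills at rate_per_s, allows bursts up to burst."""

    def __init__(self, rate_per_s, burst, now=None):
        self.rate = rate_per_s
        self.capacity = burst
        self.tokens = float(burst)
        self.updated = time.monotonic() if now is None else now

    def check(self, now=None):
        """Return (status, headers) for one incoming request."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return 200, {}
        # Tell the client how long until one full token is available again.
        wait = math.ceil((1.0 - self.tokens) / self.rate)
        return 429, {"Retry-After": str(max(1, wait))}
```

Because the limit lives in configuration rather than in a deny rule, relaxing or removing it later is a one-line change, which is what makes this control reversible.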

Reserve permanent measures for confirmed abuse. Blocking a genuine, well-behaved crawler over a transient surge can cost AI visibility you wanted, and the surge often subsides on its own once the crawler completes its wave or honours your throttle.

Verify token and source against operator documentation first
Use 429 with Retry-After or temporary 503 to contain load reversibly
Reserve permanent blocks for confirmed spoofing or non-compliance

Decide and record

Once the pressure is off, make a durable decision. If the crawler is genuine and you value it, the fix is usually caching the heavy endpoints or a standing rate limit, not a block. If it is genuine but unwanted, target its documented token in robots.txt. If it was spoofing, the durable control is verification-based — match the source against published ranges — not the user agent alone.
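For the genuine-but-unwanted case, a robots.txt rule aimed at one token can be checked before deployment with Python's standard urllib.robotparser. The token ExampleBot and the example.com URL are placeholders; substitute the token the operator documents.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule set: disallow one crawler token, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The targeted token is refused; other agents fall through to the * group.
blocked = parser.can_fetch("ExampleBot", "https://example.com/page")
allowed = parser.can_fetch("OtherBot", "https://example.com/page")
print(blocked, allowed)
```

This only confirms what the file says. It has no effect on a client that ignores robots.txt, which is why the spoofing case needs source verification instead.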

Record what happened: which token, what volume, how you responded, and the outcome. A short incident note turns a one-off scramble into a playbook, so the next surge is routine rather than an emergency.
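An incident note can be as small as a structured record with those four facts. The sketch below shows one possible shape; the field names and every value in it are made-up placeholders, not data from a real incident.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class CrawlerIncident:
    date: str                   # when the surge was noticed
    token: str                  # crawler token seen in the user agent
    verified_source: bool       # did the source match published ranges?
    peak_requests_per_min: int  # observed volume
    response: str               # what you did
    outcome: str                # what happened next

# Placeholder values for illustration only.
note = CrawlerIncident(
    date="2026-01-15",
    token="ExampleBot",
    verified_source=True,
    peak_requests_per_min=1200,
    response="429 with Retry-After, then cached heavy endpoints",
    outcome="load normal within the hour; standing rate limit kept",
)

print(json.dumps(asdict(note), indent=2))
```

Keeping these as JSON lines in a single file is enough to search past incidents by token the next time a surge appears.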

How it appears in analytics and logs

A short, sharp rise in requests from one AI token usually signals a crawl wave, not an attack. Requests claiming a known token but originating outside the operator's published ranges signal spoofing and warrant a different response than a genuine crawler.
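To spot that kind of rise in raw logs, one option is to bucket requests by token and minute. The sketch assumes each log entry has already been reduced to an ISO 8601 timestamp and a user-agent string; the token list is hypothetical.

```python
from collections import Counter

KNOWN_TOKENS = ("ExampleBot", "OtherBot")  # hypothetical tokens to watch

def requests_per_minute(entries):
    """Count requests per (token, minute).

    entries: iterable of (iso_timestamp, user_agent), with timestamps
    formatted like "2026-01-15T10:00:05Z".
    """
    counts = Counter()
    for ts, ua in entries:
        ua_lower = ua.lower()
        token = next((t for t in KNOWN_TOKENS if t.lower() in ua_lower), None)
        if token is not None:
            # The first 16 characters are "YYYY-MM-DDTHH:MM", the minute bucket.
            counts[(token, ts[:16])] += 1
    return counts
```

A single token whose per-minute count jumps and then falls back looks like a crawl wave. Volume alone says nothing about identity, so pair this with the source check before deciding how to respond.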

Diagnostic use case

Work through an AI crawler incident methodically: confirm what is actually happening and verify the crawler's identity, contain the load with rate limiting and 429 or a temporary 503, then make a durable allow, limit, or block decision once the pressure is off.

What WebmasterID can help detect

WebmasterID records AI crawler request volume and the pages and status codes involved. During an incident, the bot-intelligence surface shows which token is responsible and how the site is responding, so you are not grepping logs under pressure.

Common mistakes

Permanently blocking a genuine crawler over a transient surge.
Trusting the user agent during an incident instead of verifying the source.
Returning 404 or 410 under load, telling crawlers the content is gone.
Not recording the incident, so the next surge is handled from scratch.

Privacy and accuracy notes

Incident response keys on crawler tokens, request rates, and operator-published verification signals. It concerns machine traffic, not people, and introduces no visitor identity or precise-location tracking.

Frequently asked questions

An AI crawler is hammering my site. Should I block it?

Not as a first move. Verify the token against the operator's published identity, then contain the load with rate limiting and 429 or temporary 503. Reserve a permanent block for confirmed spoofing or non-compliance; for a genuine crawler, caching or a standing limit usually solves it.

Sources and verification notes

MDN: 429 Too Many Requests. Documents 429 and Retry-After for reversibly throttling crawlers.
MDN: 503 Service Unavailable. 503 signals temporary unavailability without implying content removal.

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.