Robots & crawl control

How to block the Netcraft crawler

Netcraft runs crawlers for its internet-survey, technology-detection and anti-phishing services. They are declared crawlers with a documented robots.txt token. Operators who do not want their site sampled into Netcraft's surveys can disallow the crawler, with the caveat that security-related scanning may not be governed by robots.txt at all.

Partially verified

What this means

Netcraft is known for its long-running internet survey, web-server and technology detection, and anti-phishing services. Its crawlers sample public sites to build those datasets. Blocking the survey crawler asks Netcraft to stop including your pages in survey-style data collection.

Note that Netcraft also performs security and anti-fraud scanning, for example checking sites reported as phishing. That kind of scanning is not a courtesy crawl and may not follow robots.txt, so a robots.txt block addresses survey-style crawling rather than all Netcraft activity.

How to block it

Target the Netcraft crawler token in its own user-agent group, matching on the stable token rather than a version string.

User-agent: NetcraftSurveyAgent
Disallow: /
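You can sanity-check this group before deploying it with Python's standard-library urllib.robotparser. The sketch below assumes the token NetcraftSurveyAgent from this page, which should still be confirmed against Netcraft's current docs. The version suffix /1.0, the other bot name and example.com are placeholders. robotparser only compares the text before the first "/" of the agent string you pass, so pass the product token rather than a full browser-style User-Agent header. A passing check only shows that the file says what you intend. It does not prove that any crawler obeys it.

```python
from urllib.robotparser import RobotFileParser

# The group from this page, as it would appear in /robots.txt.
ROBOTS_TXT = """\
User-agent: NetcraftSurveyAgent
Disallow: /
"""


def build_parser(robots_text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
url = "https://example.com/pricing"  # placeholder URL

# "/1.0" is an illustrative version suffix; robotparser drops everything from the first "/".
netcraft_allowed = parser.can_fetch("NetcraftSurveyAgent/1.0", url)
other_allowed = parser.can_fetch("SomeOtherBot/2.0", url)  # hypothetical crawler

print(netcraft_allowed, other_allowed)  # False True
```

Crawlers with no matching group and no `User-agent: *` group fall through to "allowed". That is why the hypothetical SomeOtherBot is unaffected.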

Because robots.txt is advisory, confirm in your logs that survey requests stop. If requests with a Netcraft user agent keep arriving, particularly around security checks, robots.txt is the wrong tool. A firewall or WAF rule is the appropriate control for traffic that does not honour the exclusion protocol.
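One way to do this log check is to count token-carrying requests per day and compare the days before and after the robots.txt change. The sketch below assumes access logs in the common Apache/Nginx "combined" format. The sample lines, documentation-range IP addresses, dates and full User-Agent string are invented for illustration. The sketch skips fetches of /robots.txt itself, because re-reading the file is what a compliant crawler does. It also allows a one-day grace period, because RFC 9309 lets crawlers use a cached robots.txt for up to 24 hours.

```python
import re
from collections import Counter
from datetime import date, datetime, timedelta

# Apache/Nginx "combined" log format:
# ip ident user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)
TOKEN = "netcraftsurveyagent"  # matched case-insensitively as a substring


def netcraft_hits_by_day(lines):
    """Count requests per day whose User-Agent contains the token, ignoring /robots.txt."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None or TOKEN not in m.group("agent").lower():
            continue  # unparseable line, or not a Netcraft-claimed request
        if m.group("path") == "/robots.txt":
            continue  # re-reading robots.txt is expected, not a violation
        stamp = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        counts[stamp.date()] += 1
    return counts


def hits_after_change(counts, change_day, grace_days=1):
    """Sum hits dated after the change day plus a grace period for robots.txt caching."""
    cutoff = change_day + timedelta(days=grace_days)
    return sum(n for day, n in counts.items() if day > cutoff)


# Invented sample lines for illustration only.
UA = "Mozilla/5.0 (compatible; NetcraftSurveyAgent/1.0)"
SAMPLE = [
    f'203.0.113.7 - - [20/Jun/2026:10:00:00 +0000] "GET /pricing HTTP/1.1" 200 512 "-" "{UA}"',
    f'203.0.113.7 - - [25/Jun/2026:09:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 40 "-" "{UA}"',
    f'203.0.113.7 - - [27/Jun/2026:09:05:00 +0000] "GET /about HTTP/1.1" 200 812 "-" "{UA}"',
    '198.51.100.4 - - [27/Jun/2026:11:00:00 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

counts = netcraft_hits_by_day(SAMPLE)
late = hits_after_change(counts, change_day=date(2026, 6, 24))
print(late)  # 1
```

A non-zero count after the grace period means something carrying the token is still fetching disallowed pages. It may be a crawler that ignores robots.txt or a client spoofing the token. Either way, it is the traffic a firewall or WAF rule is meant for.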

robots.txt token to target: NetcraftSurveyAgent
Security/anti-phishing scans may ignore robots.txt by design
Use a firewall/WAF for traffic that does not honour robots.txt

How it appears in analytics and logs

A request carrying Netcraft's crawler token claims to be a survey or technology-detection fetch, not a human visit, and should be counted as bot traffic. Because the user agent is only a claim, verify behaviour in your logs rather than trusting the token alone.
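The sketch below illustrates both points. It keeps token-carrying requests out of human counts, and it judges a claimed Netcraft client by what it fetched rather than by its User-Agent. It reuses urllib.robotparser with the rule from this page. The Request record, the BOT_TOKENS list and the sample requests are hypothetical. Requests without a bot token are only "not declared bots", not proven humans, because an undeclared crawler can send any User-Agent.

```python
from dataclasses import dataclass
from urllib.robotparser import RobotFileParser

# Hypothetical list of declared crawler tokens; extend with the crawlers you track.
BOT_TOKENS = ("netcraftsurveyagent",)


@dataclass
class Request:
    ip: str
    agent: str
    path: str


def declares_bot(agent: str) -> bool:
    """True when the User-Agent contains a known crawler token (case-insensitive)."""
    lowered = agent.lower()
    return any(token in lowered for token in BOT_TOKENS)


def split_traffic(requests):
    """Separate declared-bot requests from everything else."""
    bots = [r for r in requests if declares_bot(r.agent)]
    others = [r for r in requests if not declares_bot(r.agent)]
    return bots, others


def claimants_ignoring_rules(requests, parser, token="NetcraftSurveyAgent"):
    """IPs that claim the token yet fetched paths robots.txt disallows for it."""
    return sorted({
        r.ip
        for r in requests
        if token.lower() in r.agent.lower()
        and r.path != "/robots.txt"  # fetching robots.txt itself is always expected
        and not parser.can_fetch(token, r.path)
    })


parser = RobotFileParser()
parser.parse(["User-agent: NetcraftSurveyAgent", "Disallow: /"])

# Invented sample requests (documentation IP ranges).
requests = [
    Request("203.0.113.7", "Mozilla/5.0 (compatible; NetcraftSurveyAgent/1.0)", "/robots.txt"),
    Request("203.0.113.7", "Mozilla/5.0 (compatible; NetcraftSurveyAgent/1.0)", "/pricing"),
    Request("198.51.100.4", "Mozilla/5.0 (X11; Linux x86_64)", "/pricing"),
]

bots, others = split_traffic(requests)
print(len(bots), len(others))                      # 2 1
print(claimants_ignoring_rules(requests, parser))  # ['203.0.113.7']
```

The sketch passes the bare token to can_fetch rather than the logged User-Agent. robotparser would otherwise truncate a browser-style header at its first "/" and match nothing.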

Diagnostic use case

Ask Netcraft's survey crawler to skip your site, and understand why security scanning may continue regardless of robots.txt.

What WebmasterID can help detect

WebmasterID classifies survey and security crawlers server-side, so you can see whether Netcraft activity continues after a robots.txt change and decide if a firewall rule is warranted.

Common mistakes

Expecting a robots.txt block to stop Netcraft's security or anti-phishing scans.
Trusting the Netcraft user agent without confirming behaviour in logs.
Counting survey-crawler hits as human traffic.

Privacy and accuracy notes

Blocking Netcraft relies only on the request user-agent token. No human identity is involved. WebmasterID records the crawl as a bot event, separate from human analytics, and never attaches it to a visitor profile.


Sources and verification notes

Netcraft: Internet Data Mining and survey. Netcraft documents its survey and security services; confirm the exact token against current docs.
Robots Exclusion Protocol (RFC 9309). Defines user-agent group and Disallow semantics.

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.