How to block the Netcraft crawler
Netcraft runs crawlers for its internet-survey, technology-detection and anti-phishing services. They are declared crawlers with a documented robots.txt token. Operators who do not want their site sampled into Netcraft's surveys can disallow the crawler, with the caveat that security-related scanning may not be governed by robots.txt at all.
What this means
Netcraft is known for its long-running internet survey, web-server and technology detection, and anti-phishing services. Its crawlers sample public sites to build those datasets. Blocking the survey crawler asks Netcraft to stop including your pages in survey-style data collection.
Note that Netcraft also performs security and anti-fraud scanning, for example checking sites reported as phishing. That kind of scanning is not a courtesy crawl and may not follow robots.txt, so a robots.txt block addresses survey-style crawling rather than all Netcraft activity.
How to block it
Target the Netcraft crawler token in its own user-agent group, matching on the stable token rather than a version string.
User-agent: NetcraftSurveyAgent
Disallow: /
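Before deploying, the group above can be checked locally. The following is a minimal sketch using Python's standard-library urllib.robotparser; the helper name is_allowed, the example.com URL, and the "/1.0" version suffix are illustrative assumptions, not values taken from Netcraft's documentation. It shows only how a compliant parser would read the rule, not how Netcraft's crawler actually behaves.

```python
from urllib.robotparser import RobotFileParser

# The rule group from this page, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: NetcraftSurveyAgent
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant parser would let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "NetcraftSurveyAgent/1.0" is a hypothetical product token with a version;
# the parser matches on the part before the slash, so the version is ignored.
print(is_allowed(ROBOTS_TXT, "NetcraftSurveyAgent/1.0", "https://example.com/page"))  # False
# An agent with no matching group and no "*" group is allowed by default.
print(is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/page"))  # True
```

Pass the product token rather than a full browser-style user-agent string: urllib.robotparser compares only the text before the first slash, so a string beginning with "Mozilla/5.0" would not match this group.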
Because robots.txt is advisory, confirm in your logs that survey requests stop. If requests carrying a Netcraft user agent continue, particularly around security checks, robots.txt is the wrong tool; a firewall or WAF rule is the appropriate control for traffic that does not honour the exclusion protocol.
- robots.txt token to target: NetcraftSurveyAgent
- Security/anti-phishing scans may ignore robots.txt by design
- Use a firewall/WAF for traffic that does not honour robots.txt
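The log check described above can be sketched as follows. This assumes access logs in the common "combined" format with the user agent as the last quoted field; the sample lines, IP addresses, user-agent strings, and change date are all made up for illustration.

```python
import re
from collections import Counter
from datetime import date, datetime

TOKEN = "netcraftsurveyagent"  # compared case-insensitively

# Captures the day from the [timestamp] and the last quoted field (user agent).
LOG_LINE = re.compile(
    r'\[(?P<day>\d{2}/[A-Za-z]{3}/\d{4}):[^\]]*\].*"(?P<ua>[^"]*)"\s*$'
)

# Hypothetical sample data; real lines come from your own access log.
SAMPLE_LINES = [
    '203.0.113.7 - - [01/Jun/2026:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; NetcraftSurveyAgent/1.0)"',
    '203.0.113.7 - - [05/Jun/2026:11:30:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; NetcraftSurveyAgent/1.0)"',
    '198.51.100.4 - - [05/Jun/2026:11:31:00 +0000] "GET /about HTTP/1.1" 200 900 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]


def hits_by_day(lines):
    """Count requests per day whose user agent contains the token."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and TOKEN in match.group("ua").lower():
            day = datetime.strptime(match.group("day"), "%d/%b/%Y").date()
            counts[day] += 1
    return counts


def hits_after(lines, changed_on):
    """Total token-bearing requests on days after the robots.txt change."""
    return sum(n for day, n in hits_by_day(lines).items() if day > changed_on)


# Assume robots.txt was updated on 2026-06-03 (illustrative date).
print(hits_after(SAMPLE_LINES, date(2026, 6, 3)))  # 1
```

A non-zero count after the change date is only a prompt to look closer: crawlers may cache robots.txt for a while, and the user agent is a claim rather than proof of origin.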
How it appears in analytics and logs
A request carrying Netcraft's crawler token is a survey or technology-detection fetch, not a human visit. It is bot traffic. The user agent is only a claim, so verify behaviour in your logs rather than trusting the token alone.
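As a rough illustration of keeping these requests out of human counts, the sketch below splits user-agent strings into a declared-bot bucket and everything else. The function name split_hits and the sample strings are hypothetical, and the match proves only that the token was claimed, not that the request came from Netcraft.

```python
# Tokens are matched case-insensitively as substrings of the user agent.
BOT_TOKENS = ("netcraftsurveyagent",)


def split_hits(user_agents):
    """Partition user-agent strings into (declared_bots, others)."""
    declared_bots, others = [], []
    for ua in user_agents:
        lowered = ua.lower()
        if any(token in lowered for token in BOT_TOKENS):
            declared_bots.append(ua)
        else:
            # "others" means "not tagged by this list", not "verified human".
            others.append(ua)
    return declared_bots, others


# Hypothetical sample strings.
sample = [
    "Mozilla/5.0 (compatible; NetcraftSurveyAgent/1.0)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]
bots, others = split_hits(sample)
print(len(bots), len(others))  # 1 1
```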
Diagnostic use case
Ask Netcraft's survey crawler to skip your site, and understand why security scanning may continue regardless of robots.txt.
What WebmasterID can help detect
WebmasterID classifies survey and security crawlers server-side, so you can see whether Netcraft activity continues after a robots.txt change and decide if a firewall rule is warranted.
Common mistakes
- Expecting a robots.txt block to stop Netcraft's security or anti-phishing scans.
- Trusting the Netcraft user agent without confirming behaviour in logs.
- Counting survey-crawler hits as human traffic.
Privacy and accuracy notes
Blocking Netcraft relies only on the request user-agent token. No human identity is involved. WebmasterID records the crawl as a bot event, separate from human analytics, and never attaches it to a visitor profile.
Related pages
- How to block the Censys scanner
Censys runs internet-wide scanning that catalogs hosts and services for security research. Because it operates at the host/port level rather than fetching pages as a polite web crawler, robots.txt is largely ineffective. This page explains what Censys does and why firewall-level controls, not robots.txt, are the right response.
- How to block the Qualys web scanner
Qualys runs web-application and vulnerability scanners used by security teams to assess sites. When a Qualys crawler fetches content with a declared token, robots.txt can ask it to stop — but a scan you own is configured inside Qualys, so the right control depends on whether the scan is yours or a third party's. This page covers both cases.
- robots.txt vs a firewall/WAF
robots.txt and a firewall/WAF solve different problems: robots.txt politely asks compliant crawlers what to skip, while a firewall or WAF actually blocks requests at the network or edge layer. This page contrasts the two, explains when each is appropriate, and warns against using robots.txt for jobs only enforcement can do.
- Bot intelligence
Deterministic categorisation of survey, security, and SEO crawlers.
Sources and verification notes
- Netcraft - Internet Data Mining and survey
Netcraft documents its survey and security services; confirm the exact token against current docs.
- Robots Exclusion Protocol (RFC 9309)
Defines user-agent group and Disallow semantics.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.