Security scanners vs search crawlers
Security scanners (Censys, Shodan, BinaryEdge, Qualys and similar) probe hosts, ports, and application surface to assess exposure and find vulnerabilities. Search crawlers (Googlebot, Bingbot) fetch and index content to rank it. Confusing the two leads to wrong robots.txt decisions and misread logs: robots.txt governs content crawling, not port scanning, and scan traffic should never be counted as audience.
What this means
A search crawler exists to discover and index content so it can be ranked and shown in search results. A security scanner exists to map exposure: which hosts answer, which ports are open, and whether an application has exploitable weaknesses. They share the trait of being automated, but their goals and their request patterns are different.
Search crawlers follow links and fetch real pages. Scanners often request unusual paths, probe ports, and send many parameter variations that have nothing to do with content.
Why robots.txt is the wrong control for scanning
robots.txt is a convention that compliant content crawlers honour when deciding what to index. Internet scanners and vulnerability tools are not content crawlers and generally do not treat robots.txt as a scanning boundary, because they are characterising infrastructure, not building a search index.
To limit scanning, use the provider's documented opt-out where one exists, network-level controls, and good external hygiene — not robots.txt. To control content indexing, use robots.txt and meta-robots directives aimed at the search crawlers.
- Search crawler: fetches content to index and rank it
- Security scanner: probes ports/parameters to assess exposure
- robots.txt governs content crawling, not scanning
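As a minimal sketch of why robots.txt only binds clients that choose to consult it, the example below uses Python's standard urllib.robotparser to answer the question a compliant content crawler asks before fetching a URL. The robots.txt content and the example.com URLs are hypothetical. Nothing here runs on the server side: a scanner that never performs this check is unaffected by the rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an example site (an assumption for illustration).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text the way a compliant crawler would before fetching."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A compliant crawler asks this question per URL and honours the answer.
googlebot_public = parser.can_fetch("Googlebot", "https://example.com/articles/intro")
googlebot_private = parser.can_fetch("Googlebot", "https://example.com/private/report")

print("public page allowed:", googlebot_public)
print("private page allowed:", googlebot_private)
```

The check is voluntary and made by the client, which is why limiting scanners falls to provider opt-outs and network-level controls rather than to this file.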
How it appears in analytics and logs
If firewall or server logs show systematic probing of ports, parameters, or many URL variations, that is scanning, not indexing. Search crawlers fetch real content paths to index them. Misclassifying scanning as crawling produces wrong robots.txt rules and wrong analytics.
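The distinction above can be sketched as a rough behavioural heuristic over parsed request logs. This is an illustration under stated assumptions, not how any particular product classifies traffic: the Request record, the KNOWN_CONTENT set, the thresholds, and the sample addresses (taken from documentation IP ranges) are all hypothetical. A client is labelled scan-like when most of its requests miss real content paths, or when it sends many query-string variations to a single path.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    client: str  # client identifier, e.g. source IP
    path: str    # requested URL path, without the query string
    query: str   # raw query string, empty if none


# Hypothetical set of paths that exist as real content on the site.
KNOWN_CONTENT = {"/", "/about", "/articles/intro", "/articles/pricing"}


def classify_clients(requests, known_content, min_requests=5,
                     off_content_ratio=0.5, max_query_variants=4):
    """Label each client 'scan-like', 'crawl-like', or 'unclassified'.

    All thresholds are illustrative assumptions, not recommended values.
    """
    by_client = defaultdict(list)
    for req in requests:
        by_client[req.client].append(req)

    labels = {}
    for client, reqs in by_client.items():
        if len(reqs) < min_requests:
            labels[client] = "unclassified"  # too little evidence either way
            continue
        # Share of requests aimed at paths that are not real content.
        off_content = sum(1 for r in reqs if r.path not in known_content)
        # Distinct query strings seen per path (parameter variation).
        variants = defaultdict(set)
        for r in reqs:
            if r.query:
                variants[r.path].add(r.query)
        most_variants = max((len(q) for q in variants.values()), default=0)
        if (off_content / len(reqs) >= off_content_ratio
                or most_variants >= max_query_variants):
            labels[client] = "scan-like"
        else:
            labels[client] = "crawl-like"
    return labels


SAMPLE = (
    # Fetches real content paths only.
    [Request("192.0.2.10", p, "") for p in
     ["/", "/about", "/articles/intro", "/articles/pricing", "/"]]
    # Requests unusual paths that are not site content.
    + [Request("198.51.100.7", p, "") for p in
       ["/.env", "/wp-login.php", "/admin", "/phpmyadmin/", "/"]]
    # Sends many parameter variations to one real path.
    + [Request("203.0.113.5", "/articles/intro", f"id={n}") for n in range(5)]
    # Single request: not enough to judge.
    + [Request("192.0.2.99", "/about", "")]
)

labels = classify_clients(SAMPLE, KNOWN_CONTENT)
for client, label in sorted(labels.items()):
    print(client, label)
```

A heuristic like this only separates request patterns. It does not confirm who operates a client, so the labels are deliberately "scan-like" and "crawl-like" rather than named services.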
Diagnostic use case
Classify an automated probe as a security scan or a search crawl before deciding how to respond, because both the applicable controls and the meaning of the traffic differ.
What WebmasterID can help detect
WebmasterID separates security-scan probes from search-crawl coverage server-side, so each is read correctly and neither inflates human analytics.
Common mistakes
- Adding robots.txt rules to stop port scanning — wrong tool.
- Reading scanner probes as search indexing that affects rankings.
- Counting scan traffic as human or as crawl coverage.
Privacy and accuracy notes
Both scanners and crawlers are identified by user-agent and behaviour only. No visitor identity is involved; WebmasterID records automated probes as bot events, never as human profiles.
Related pages
- Censys and Shodan scanning crawlers
Censys and Shodan are internet-wide scanning services that map reachable hosts, open ports, and exposed services for security research and asset discovery. They are not search-engine crawlers indexing your content for ranking; they probe infrastructure. Their requests appear in logs as scanning activity from their published scanner identities, and they offer opt-out mechanisms for operators.
- Qualys web application scanner
Qualys operates security scanning that assesses web applications and infrastructure for vulnerabilities and misconfigurations. Some Qualys scanning is authorised by the site owner (for example, an internal security assessment); some is part of broader internet measurement. It is a security tool, not a search crawler, and its probes appear in logs as scanning rather than content fetching for ranking.
- Search crawlers vs SEO crawlers
Search-engine crawlers like Googlebot and Bingbot build the indexes that determine search visibility. Third-party SEO crawlers like AhrefsBot and SemrushBot feed analysis tools and do not affect rankings directly. Distinguishing them matters for crawl-budget reasoning and for deciding what to allow or limit.
- Bot vs human
How automated probes are separated from real visitors.
Sources and verification notes
- Google Search Central, robots.txt introduction: robots.txt governs content crawling by compliant crawlers; it is not an access-control or anti-scanning mechanism.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.