How to block the BinaryEdge scanner
BinaryEdge runs internet-wide scans that catalogue exposed services and web properties for its attack-surface and threat-intelligence datasets. Where it crawls web content with a declared token, robots.txt can ask it to stop; but much internet-wide scanning operates below the HTTP-courtesy layer, so a firewall rule is usually the real control. This page covers both.
What this means
BinaryEdge builds datasets about exposed internet services and web properties for security and attack-surface monitoring. When it fetches web content with a declared crawler token, robots.txt can ask it to stop. But much internet-wide scanning happens at the service level, probing ports and endpoints directly, and that kind of probing never consults robots.txt.
So treat a robots.txt block as covering the courteous, token-carrying web crawl, while accepting that broad scanning of public IP space is a separate matter handled at the network layer.
How to block it
Target the BinaryEdge crawler token in its own group:
User-agent: BinaryEdge
Disallow: /
Then confirm in your logs whether token-carrying requests stop. For scanning that continues without honouring robots.txt, the appropriate control is a firewall or WAF rule, optionally rate-limiting or blocking the offending sources. robots.txt is a request to compliant crawlers and is never an access-control mechanism for security scanners.
- robots.txt token to target: BinaryEdge
- Service-level scanning ignores robots.txt by design
- Use a firewall/WAF for non-compliant scanning
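Before deploying, it can help to check that the group parses the way you expect. The following is a minimal sketch using Python's standard-library `urllib.robotparser`; the `is_allowed` helper, the `SomeOtherBot` name, and the `example.com` URL are illustrative assumptions, and the `BinaryEdge` token should still be confirmed against current BinaryEdge documentation.

```python
from urllib.robotparser import RobotFileParser

# The group from this page. The token is taken from the text above and
# should be confirmed against current BinaryEdge documentation.
ROBOTS_TXT = """\
User-agent: BinaryEdge
Disallow: /
"""


def is_allowed(robots_txt: str, token: str, url: str) -> bool:
    """Return True if a crawler identifying with `token` may fetch `url`.

    Pass the bare crawler token, not a full User-Agent header:
    robotparser only looks at the part before the first "/".
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(token, url)


# Hypothetical URL and second crawler name, for illustration only.
binaryedge_ok = is_allowed(ROBOTS_TXT, "BinaryEdge", "https://example.com/page")
other_ok = is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/page")

print("BinaryEdge allowed:", binaryedge_ok)
print("Other crawler allowed:", other_ok)
```

This only shows how a compliant parser reads the file. It says nothing about whether a given scanner actually honours it, which is why the log check still matters.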
How it appears in analytics and logs
A request carrying a BinaryEdge token is an attack-surface or threat-intelligence scan, not a human visit. It is bot traffic. Internet-wide scanners often probe directly without a declared token, so seeing no token-carrying requests does not prove you are not being scanned.
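One way to confirm whether token-carrying requests stop is to count them per day in your access log. The sketch below assumes the common "combined" log format, where the user agent is the last quoted field; the sample lines, addresses, and user-agent strings are invented for illustration and are not real BinaryEdge traffic.

```python
import re
from collections import Counter

# Assumption: combined log format, with the date in [..] and the
# user agent as the final quoted field on the line.
LINE = re.compile(r'\[(?P<day>\d{2}/\w{3}/\d{4}):[^\]]*\].*"(?P<ua>[^"]*)"\s*$')


def token_hits_per_day(lines, token="binaryedge"):
    """Count log lines per day whose user agent contains `token`."""
    hits = Counter()
    for line in lines:
        match = LINE.search(line)
        if match and token in match.group("ua").lower():
            hits[match.group("day")] += 1
    return hits


# Invented sample lines using documentation-only IP ranges.
SAMPLE = [
    '192.0.2.10 - - [01/Jan/2026:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; BinaryEdge-example)"',
    '192.0.2.10 - - [01/Jan/2026:00:05:09 +0000] "GET /login HTTP/1.1" 404 128 "-" "Mozilla/5.0 (compatible; BinaryEdge-example)"',
    '198.51.100.7 - - [02/Jan/2026:08:30:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

hits = token_hits_per_day(SAMPLE)
for day, count in hits.items():
    print(day, count)
```

A falling count after the robots.txt change suggests the token-carrying crawl is complying. A count of zero covers only requests that declare the token, so it does not rule out service-level probing.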
Diagnostic use case
Ask BinaryEdge's web crawler to skip your pages, and decide when a firewall rule is the correct control for scanning that ignores robots.txt.
What WebmasterID can help detect
WebmasterID classifies scanning crawlers server-side, so you can see BinaryEdge web-crawl activity and judge whether robots.txt is enough or a firewall rule is needed.
Common mistakes
- Expecting robots.txt to stop internet-wide service scanning — it only governs courteous web crawls.
- Trusting the user agent rather than confirming behaviour in logs.
- Counting scanner hits as human traffic.
Privacy and accuracy notes
Blocking BinaryEdge relies only on the request user-agent token. No human identity or raw IP is exposed as a feature. WebmasterID records the scan as a bot event, separate from human analytics.
Related pages
- How to block the Censys scanner
Censys runs internet-wide scanning that catalogs hosts and services for security research. Because it operates at the host/port level rather than fetching pages as a polite web crawler, robots.txt is largely ineffective. This page explains what Censys does and why firewall-level controls, not robots.txt, are the right response.
- How to block the Qualys web scanner
Qualys runs web-application and vulnerability scanners used by security teams to assess sites. When a Qualys crawler fetches content with a declared token, robots.txt can ask it to stop — but a scan you own is configured inside Qualys, so the right control depends on whether the scan is yours or a third party's. This page covers both cases.
- robots.txt vs a firewall/WAF
robots.txt and a firewall/WAF solve different problems: robots.txt politely asks compliant crawlers what to skip, while a firewall or WAF actually blocks requests at the network or edge layer. This page contrasts the two, explains when each is appropriate, and warns against using robots.txt for jobs only enforcement can do.
- Bot intelligence
Deterministic categorisation of scanning and security crawlers.
Sources and verification notes
- BinaryEdge documentation: BinaryEdge documents its scanning; confirm the exact web-crawler token against current docs.
- Robots Exclusion Protocol (RFC 9309): robots.txt governs courteous crawlers, not service-level scanners.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.