How to block the Censys scanner

Censys runs internet-wide scanning that catalogs hosts and services for security research. Because it operates at the host/port level rather than fetching pages the way a polite web crawler does, robots.txt is largely ineffective against it. This page explains what Censys does and why firewall-level controls, not robots.txt, are the right response.

Partially verified

What Censys is

Censys performs internet-wide scanning to catalog hosts, open ports, certificates, and services for security and research purposes. Unlike a search-engine crawler that requests pages and reads robots.txt, a host scanner probes addresses and services directly.

Where Censys makes HTTP requests, it may send a self-identifying user agent. But because the goal is host discovery rather than content crawling, robots.txt, which is a content-crawling convention, does not meaningfully govern its behavior.

Why robots.txt is the wrong tool

You can add a Disallow for any self-identifying scanner token, and a compliant HTTP fetcher may honour it on the content-crawling side:

User-agent: CensysInspect
Disallow: /
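To see how a fetcher that honours robots.txt would read this rule, the following minimal sketch feeds it to Python's standard-library parser. The helper name is_allowed and the second agent token ExampleBot are illustrative assumptions, not part of any Censys or WebmasterID tooling.

```python
from urllib.robotparser import RobotFileParser

# The rule from this page, written as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: CensysInspect
Disallow: /
"""


def is_allowed(agent_token: str, path: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a compliant fetcher using agent_token may request path.

    Hypothetical helper: it models only a client that reads robots.txt.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent_token, path)


if __name__ == "__main__":
    # The named token is disallowed everywhere; an unnamed agent is not.
    print("CensysInspect may fetch /:", is_allowed("CensysInspect", "/"))
    print("ExampleBot may fetch /:", is_allowed("ExampleBot", "/"))
```

The check only models a client that reads and honours robots.txt before making HTTP requests.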

But this does nothing about host-level scanning. To limit exposure, use enforcement: firewall rules, an IP allowlist for sensitive services, and a WAF. Censys also documents an opt-out process for its scanning, which is more effective than robots.txt for this kind of activity.
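The allowlist idea can be sketched in a few lines of Python. This is an illustration of default-deny logic only, under stated assumptions: the networks below are reserved documentation ranges used as placeholders, the function name is hypothetical, and real enforcement would normally live in a firewall or WAF rather than in application code.

```python
import ipaddress

# Placeholder allowlist (assumption): reserved documentation ranges,
# to be replaced with the networks that genuinely need access.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_allowed_source(client_ip: str, allowed=ALLOWED_NETWORKS) -> bool:
    """Default-deny check: permit only addresses inside an allowed network."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        # Unparseable input is treated as not allowed.
        return False
    # Membership is False when the IP versions differ, so mixed lists are safe.
    return any(addr in net for net in allowed)


if __name__ == "__main__":
    for ip in ("192.0.2.10", "198.51.100.7", "2001:db8::1"):
        print(ip, "allowed" if is_allowed_source(ip) else "denied")
```

An allowlist suits sensitive services because any source not listed, scanners included, is refused without needing to know the scanner's addresses in advance.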

How it appears in analytics and logs

Requests or connections attributed to Censys come from internet-wide scanning, not from human visits or polite page crawling. They reflect external host discovery; treat them as automated scanning, not as audience traffic.
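As a rough sketch of keeping such requests out of audience figures, the Python below separates user-agent strings that contain a self-identifying scanner token from everything else. The function names and the sample user-agent strings are illustrative assumptions; only the CensysInspect token comes from this page, and a scanner that does not identify itself would not be caught this way.

```python
# Tokens matched case-insensitively against the User-Agent header.
SCANNER_TOKENS = ("censysinspect",)


def is_scanner_request(user_agent: str) -> bool:
    """Return True if the user agent contains a known scanner token."""
    ua = user_agent.lower()
    return any(token in ua for token in SCANNER_TOKENS)


def split_traffic(user_agents):
    """Partition user-agent strings into (scanner, other).

    "other" is simply everything unmatched; it is not guaranteed to be human.
    """
    scanner, other = [], []
    for ua in user_agents:
        (scanner if is_scanner_request(ua) else other).append(ua)
    return scanner, other


if __name__ == "__main__":
    # Sample strings are made up for illustration.
    sample = [
        "Mozilla/5.0 (compatible; CensysInspect/1.1)",
        "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
    ]
    scanner, other = split_traffic(sample)
    print(len(scanner), "scanner request(s);", len(other), "other request(s)")
```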

Diagnostic use case

Understand why a robots.txt Disallow does little against an internet-wide host scanner, and use firewall/edge controls to limit exposure instead.

What WebmasterID can help detect

WebmasterID classifies self-identifying scanner requests that reach your application as automated, separate from human analytics, so HTTP-level scans are visible in bot traffic.

Common mistakes

Privacy and accuracy notes

Identifying scanner traffic relies on request characteristics and any self-identifying user agent, not visitor identity. Edge enforcement may act on connection metadata operationally; that is access control, not visitor profiling.

Sources and verification notes

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.