Security scanner user agents
The public web receives constant probing from security scanners — vulnerability tools, research crawlers, and internet-wide scanners. Some identify themselves clearly in the user agent; others mimic browsers. This page explains why probing is expected background noise and why reacting with blanket blocks can do more harm than good.
Probing is constant background noise
Any site exposed to the internet is continuously probed: vulnerability scanners, academic and commercial internet-wide research crawlers, and opportunistic tools request known-sensitive paths to see what responds. This is a normal condition of being online, not necessarily a sign you are specifically targeted.
Some research scanners identify themselves with a token and a URL explaining their project and how to opt out; others, especially hostile ones, mimic browsers or send minimal strings.
- Internet-wide probing is constant on the public web
- Some research scanners self-identify with a token and opt-out URL
- Hostile probes may mimic browsers or send minimal UAs
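The distinction above can be sketched as a coarse labelling step. This is a minimal illustration, not a detection method: the function names, the labels, and the rule "a URL inside the user agent suggests a self-identifying crawler" are assumptions made for this example. A hostile client can send any string, so a label describes only what the request claims.

```python
import re

# Assumption for this sketch: self-identifying crawlers often embed a
# project or opt-out URL in the user agent, commonly written "+https://...".
URL_PATTERN = re.compile(r"https?://[^\s;)]+", re.IGNORECASE)


def extract_info_url(user_agent):
    """Return the first URL found in the user agent, or None."""
    match = URL_PATTERN.search(user_agent or "")
    return match.group(0) if match else None


def describe_user_agent(user_agent):
    """Return a coarse, illustrative label for a user-agent string.

    The labels are hypothetical and none of them proves intent.
    """
    ua = (user_agent or "").strip()
    if not ua:
        return "missing"
    if extract_info_url(ua):
        # Checked first: many crawlers start with "Mozilla/" and still
        # include an info URL.
        return "self-identified"
    if ua.startswith("Mozilla/"):
        return "browser-like"
    return "minimal-or-tool"
```

A string such as `ExampleScanner/1.0 (+https://scanner.example/optout)` (a made-up scanner) would be labelled `self-identified`, and `extract_info_url` would return the URL to check for exclusion instructions.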
Respond proportionately, do not over-block
It is tempting to block anything that looks like a scanner, but blanket blocking by user agent is brittle: hostile scanners simply change their string, while you risk blocking legitimate research scanners, monitoring, or your own security testing. The user agent is a weak signal here.
A measured approach is to ensure sensitive paths are properly protected regardless of who probes them, rate-limit abusive sources, and categorise scanner traffic honestly rather than relying on UA-based bans. Self-identifying scanners often document how to request exclusion.
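Rate-limiting by source rather than by user agent can be sketched with a sliding-window counter. This is a single-process, in-memory illustration only; the class name, the thresholds, and the choice of keying on the client address are assumptions for this example, and a production setup would more likely use the rate-limiting features of the web server or CDN.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per source within window_seconds.

    Hypothetical helper for illustration. The user agent is deliberately
    not part of the key, so changing the UA string does not reset the limit.
    """

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # source -> timestamps of allowed requests

    def allow(self, source, now):
        hits = self._hits[source]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit; the denied request is not recorded
        hits.append(now)
        return True


# Example usage with an address from the documentation range (203.0.113.0/24).
limiter = SlidingWindowLimiter(max_requests=3, window_seconds=10.0)
decisions = [limiter.allow("203.0.113.7", t) for t in (0.0, 1.0, 2.0, 3.0)]
```

In this usage the first three requests are allowed and the fourth, arriving inside the same 10-second window, is refused. Other sources are unaffected because each one has its own window.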
How it appears in analytics and logs
Requests that probe for known-sensitive paths or known vulnerabilities show up in logs as scanner activity. Some carry a self-identifying research-scanner token; many do not. Probing is constant on the public web and is not, by itself, proof of a targeted attack.
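One way to make this activity visible is to count requests for paths that scanners commonly try. The sketch below is a rough illustration: the path list is a small sample chosen for this example, not an authoritative set, and the function names are hypothetical. A match means the request looks like probing, not that an attack succeeded or that the site was specifically targeted.

```python
from collections import Counter
from urllib.parse import urlsplit

# Illustrative sample only; adjust to the paths that appear in your own logs.
PROBE_PATH_PREFIXES = ("/.env", "/.git/", "/wp-login.php", "/phpmyadmin")


def is_probe_path(request_target):
    """Return True if the request path starts with a listed probe prefix."""
    path = urlsplit(request_target).path.lower()
    return path.startswith(PROBE_PATH_PREFIXES)


def summarise_probes(request_targets):
    """Count probe-like requests per path, ignoring query strings."""
    counts = Counter()
    for target in request_targets:
        if is_probe_path(target):
            counts[urlsplit(target).path.lower()] += 1
    return counts


summary = summarise_probes(
    ["/", "/.env", "/.env?debug=1", "/.git/config", "/about"]
)
```

Here `summary` counts two requests for `/.env` and one for `/.git/config`, while the ordinary page requests are ignored.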
Diagnostic use case
Recognise security-scanner probing as expected background activity, distinguish self-identifying research scanners from hostile probes, and respond proportionately rather than over-blocking.
What WebmasterID can help detect
WebmasterID classifies recognisable scanner patterns server-side as automation and keeps ambiguous probes in an honest bucket, so background scanning is visible without being mistaken for human traffic.
Common mistakes
- Treating routine internet-wide probing as proof of a targeted attack.
- Blanket-blocking by user agent, which hostile scanners trivially evade.
- Blocking self-identifying research scanners while real threats spoof browsers.
Privacy and accuracy notes
Scanner user agents describe tools, not people. WebmasterID records scanner activity as bot events and never builds a human profile from it.
Related pages
- Spoofed and fake user agents: what to watch for
Spoofing a user agent is trivial — any client can claim to be Googlebot or a normal browser. This page explains why spoofing happens, the common fake-crawler patterns, and the verification methods that turn a claimed identity into a confirmed one.
- Empty or missing user-agent strings
The User-Agent header is not mandatory, so some requests arrive with an empty string or no header at all. This usually points to a script, a misconfigured client, or an old device — not a specific identity. This page explains what a missing UA means and how to handle it without over-blocking.
- Bot intelligence
Categorise scanners and automation, with the unknowns kept honest.
Sources and verification notes
- MDN: User-Agent header. Scanner tokens vary widely, and many probes spoof or omit a UA; identification is partial by nature.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.