
Project Honey Pot and Http:BL

Project Honey Pot is a community effort that uses honeypot pages to catch email harvesters, comment spammers, and other malicious bots, and exposes its findings through the Http:BL (HTTP blacklist) service. It is not a search crawler: it identifies bad bots so operators can recognise them. Understanding it helps separate abusive automation from legitimate search and SEO crawling.

Partially verified

What this means

Project Honey Pot embeds invisible honeypot addresses and pages that legitimate users and well-behaved crawlers never touch. When a bot harvests those addresses or hits those traps, it reveals itself as a harvester or spammer. The project aggregates these detections across many participating sites.

Http:BL exposes the resulting reputation data so operators can query whether an IP has been seen behaving badly. None of this is search indexing — it is a way to recognise malicious automation.
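As a rough illustration of what such a query can look like, the sketch below builds a lookup name and decodes an answer. It assumes the DNS-based format described in the public Http:BL API documentation (access key, then the reversed IPv4 octets, then the `dnsbl.httpbl.org` zone, answered with a `127.<days>.<threat>.<type>` address); these specifics are not confirmed by this page and should be checked against the official documentation. The function and class names are hypothetical.

```python
import ipaddress
from typing import NamedTuple, Optional

# Assumed from the public Http:BL API description; confirm before relying on it.
HTTPBL_ZONE = "dnsbl.httpbl.org"
VISITOR_TYPES = {1: "suspicious", 2: "harvester", 4: "comment spammer"}


class HttpBLResult(NamedTuple):
    days_since_last_activity: int
    threat_score: int  # for "search engine" answers this octet may carry a different meaning
    visitor_types: list[str]


def build_query_name(access_key: str, ip: str) -> str:
    """Return the DNS name to look up for an IPv4 address."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for anything that is not IPv4
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{access_key}.{reversed_octets}.{HTTPBL_ZONE}"


def decode_response(answer: str) -> Optional[HttpBLResult]:
    """Decode an A-record answer such as '127.3.25.2'; return None if malformed."""
    parts = answer.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        return None
    first, days, threat, type_bits = (int(p) for p in parts)
    if first != 127:
        return None  # not a well-formed answer under the assumed format
    if type_bits == 0:
        types = ["search engine"]
    else:
        # The type octet is treated as a bitmask, so several labels can apply.
        types = [name for bit, name in VISITOR_TYPES.items() if type_bits & bit]
    return HttpBLResult(days, threat, types)
```

In practice the name would be resolved with something like `socket.gethostbyname(build_query_name(key, ip))`, where a failed lookup (`socket.gaierror`) generally means the IP is not listed. A decoded result is still only reputation data, not proof of abuse.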

How it relates to search crawlers

Legitimate search crawlers (Googlebot, Bingbot) identify themselves, publish verification methods, and avoid honeypot traps. Harvesters and spam bots do none of these, which is exactly what honeypots expose.
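One widely documented verification method is forward-confirmed reverse DNS: resolve the IP to a hostname, check that the hostname belongs to the crawler operator's domain, then resolve the hostname back and confirm it returns the same IP. The sketch below is a minimal version of that idea. The hostname suffixes are an assumption based on commonly published guidance and should be checked against each operator's current documentation; `verify_crawler` and its resolver parameters are hypothetical names.

```python
import socket
from typing import Callable, Iterable

# Suffixes commonly documented for these crawlers; treat the list as an
# assumption and check each operator's current verification guidance.
CRAWLER_SUFFIXES = {
    "googlebot": (".googlebot.com", ".google.com"),
    "bingbot": (".search.msn.com",),
}


def _reverse(ip: str) -> str:
    """Reverse (PTR) lookup using the system resolver."""
    return socket.gethostbyaddr(ip)[0]


def _forward(hostname: str) -> Iterable[str]:
    """Forward lookup; gethostbyname_ex returns IPv4 addresses only."""
    return socket.gethostbyname_ex(hostname)[2]


def verify_crawler(
    ip: str,
    claimed: str,
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], Iterable[str]] = _forward,
) -> bool:
    """Return True only if reverse and forward DNS both support the claim."""
    suffixes = CRAWLER_SUFFIXES.get(claimed.lower())
    if not suffixes:
        return False
    try:
        hostname = reverse(ip).rstrip(".").lower()
        if not hostname.endswith(suffixes):
            return False
        # The forward lookup must point back at the requesting IP.
        return ip in forward(hostname)
    except OSError:  # covers socket.herror and socket.gaierror
        return False
```

The resolvers are passed in as parameters so the check can be exercised without network access, for example by supplying stub functions in tests. A user-agent string alone is never treated as evidence here, since any client can send one.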

Use this distinction when reading logs: a request that fails crawler verification, ignores robots.txt, and trips traps is abusive automation, not a search engine. Reputation lists and traps are heuristic and have edge cases, so treat Http:BL as one signal among several. For the same reason, this concept page is marked partially verified.
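The "one signal among several" approach can be sketched as a small classifier over per-request flags. Everything here is illustrative: the field names, the labels, and the threshold of two corroborating signals are assumptions chosen for the example, not values taken from Project Honey Pot or WebmasterID.

```python
from dataclasses import dataclass


@dataclass
class RequestSignals:
    """Hypothetical per-request flags gathered from logs and lookups."""
    claims_search_crawler: bool  # user agent names a known search crawler
    crawler_verified: bool       # the claim passed the operator's verification method
    ignored_robots_txt: bool     # fetched a path that robots.txt disallows
    tripped_honeypot: bool       # requested a trap URL or harvested a trap address
    httpbl_listed: bool          # the IP has an Http:BL reputation entry


def classify(s: RequestSignals) -> str:
    """Combine signals; no single heuristic decides the outcome on its own."""
    if s.claims_search_crawler and s.crawler_verified:
        return "verified search crawler"
    abuse_signals = sum([
        s.claims_search_crawler and not s.crawler_verified,  # spoofed identity
        s.ignored_robots_txt,
        s.tripped_honeypot,
        s.httpbl_listed,
    ])
    if abuse_signals >= 2:  # illustrative threshold, not an official value
        return "likely abusive automation"
    if abuse_signals == 1:
        return "needs scrutiny"
    return "unclassified"
```

For example, `classify(RequestSignals(False, False, False, False, True))` returns `"needs scrutiny"`: an Http:BL listing by itself flags the request for review rather than triggering a block.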

Honeypots catch harvesters and spam bots, not real users
Http:BL exposes IP reputation data for operators
Legitimate search crawlers self-identify and avoid traps

How it appears in analytics and logs

A client flagged by Http:BL or caught by a honeypot is most likely automation associated with harvesting or spam, not a legitimate search crawler. Treat it as a candidate for scrutiny, and do not count it as audience traffic or indexing activity.

Diagnostic use case

Use Project Honey Pot's framing to distinguish abusive harvester/spammer bots from legitimate search crawlers when reading logs and deciding what to block.

What WebmasterID can help detect

WebmasterID helps separate abusive automation from legitimate search and SEO crawlers server-side, so harvester and spam traffic does not get mistaken for search coverage or audience.

Common mistakes

Treating an Http:BL hit as proof rather than one signal among several.
Confusing honeypot-caught bots with legitimate search crawlers.
Blocking by reputation alone without verifying crawler identity.

Privacy and accuracy notes

Bot identification here is based on behaviour and IP reputation data, not on tracking real visitors. WebmasterID records automated requests as bot events and never as human profiles; reputation lookups should be used carefully and lawfully.


Sources and verification notes

Project Honey Pot: Community honeypot project and Http:BL service; reputation data is heuristic, used as one signal.

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.