WebmasterID

AI crawler honeypots and traps

An AI crawler honeypot is a deliberately planted resource — a hidden link, a disallowed path, or an endlessly generated 'tar-pit' page — used to detect or slow crawlers that ignore robots.txt. Tools such as Nepenthes popularised the tar-pit approach. This entry explains the techniques, what they can prove, and why they are a detection aid rather than enforcement.

Partially verified

How honeypots work

The simplest honeypot is a link hidden from humans (for example via CSS) and listed as Disallow in robots.txt. A compliant crawler skips it; a non-compliant one fetches it, outing itself. A more aggressive variant is a tar pit — a page that generates endless low-value links to waste a misbehaving crawler's budget. The open-source Nepenthes project is a documented example of the tar-pit pattern.
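A minimal sketch of the hidden-link setup in Python, assuming a hypothetical trap path `/trap/do-not-follow/`. It builds a robots.txt that disallows the path for every user agent and an anchor hidden with CSS (and from keyboard and screen-reader users via `tabindex` and `aria-hidden`). It then uses the standard library's `urllib.robotparser` to confirm that a crawler honouring the file would skip the trap. The names and path are illustrative, not a WebmasterID or Nepenthes convention.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical trap path: any URL that no real page links to visibly.
HONEYPOT_PATH = "/trap/do-not-follow/"

# robots.txt rule that disallows the trap for every user agent.
ROBOTS_TXT = f"""\
User-agent: *
Disallow: {HONEYPOT_PATH}
"""

# Anchor hidden from sighted visitors (CSS), keyboard users (tabindex)
# and screen readers (aria-hidden); an HTML-parsing crawler still sees the href.
HIDDEN_LINK = (
    f'<a href="{HONEYPOT_PATH}" style="display:none" '
    'tabindex="-1" aria-hidden="true" rel="nofollow">.</a>'
)


def compliant_crawler_would_fetch(user_agent: str, url: str) -> bool:
    """Return the decision a crawler that honours ROBOTS_TXT would make for url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(compliant_crawler_would_fetch("ExampleBot", "https://example.com/trap/do-not-follow/"))  # False
    print(compliant_crawler_would_fetch("ExampleBot", "https://example.com/about"))  # True
```

In use, `ROBOTS_TXT` would be served at `/robots.txt`, `HIDDEN_LINK` placed in page templates, and any request under the trap path logged as a possible non-compliant fetch.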

The goal is detection and friction, not access control. A honeypot tells you something fetched a path it should not have; that is evidence of non-compliance you can act on.
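The friction side (the tar pit) can be sketched as a page generator whose links only ever point deeper into the trap. This is an illustrative assumption, not how Nepenthes is implemented. The prefix, link count and depth cap are hypothetical values. Links are derived from a SHA-256 hash of the current path so repeat fetches see the same page, and the depth cap bounds how much of your own capacity the maze can consume.

```python
import hashlib
import html

# Hypothetical values: the prefix must match a Disallow rule in robots.txt.
TARPIT_PREFIX = "/trap/do-not-follow/"
LINKS_PER_PAGE = 5
MAX_DEPTH = 20  # bounds the maze so it cannot consume your resources indefinitely


def child_slugs(path: str, n: int = LINKS_PER_PAGE) -> list[str]:
    """Derive n stable pseudo-random slugs from path (same path, same links)."""
    return [
        hashlib.sha256(f"{path}#{i}".encode()).hexdigest()[:12]
        for i in range(n)
    ]


def tarpit_page(path: str) -> str:
    """Return a low-value HTML page whose links lead one level deeper into the trap."""
    if not path.startswith(TARPIT_PREFIX):
        raise ValueError(f"{path!r} is not under the tar-pit prefix")
    depth = path[len(TARPIT_PREFIX):].count("/")
    if depth >= MAX_DEPTH:
        # Dead end: stop generating links once the cap is reached.
        return "<html><body><p>End.</p></body></html>"
    base = path if path.endswith("/") else path + "/"
    items = "".join(
        f'<li><a href="{html.escape(base + slug + "/")}">{slug}</a></li>'
        for slug in child_slugs(path)
    )
    return f"<html><body><ul>{items}</ul></body></html>"
```

Real tar pits often also delay each response to slow the crawler further. That is left out here because, on many servers, a deliberately slow response ties up your own worker processes as well.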

Limits and cautions

Honeypots prove behaviour, not identity. A fetch of a disallowed path shows the client ignored robots.txt, but anyone can copy a user-agent string, so it cannot tell you which vendor really sent the request. Verify the client, for example with forward-confirmed reverse DNS or a vendor's published IP ranges where offered, before naming anyone. Tar pits also consume your own server resources and can entangle benign tools, so use them deliberately.
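One common way to check a claimed crawler identity is forward-confirmed reverse DNS. Resolve the IP to a hostname, check that the hostname belongs to the vendor's domain, then resolve that hostname forward and confirm it maps back to the same IP. The sketch below uses only the Python standard library. The suffix tuple and CIDR list are placeholders you would take from each vendor's own documentation. `ip_in_ranges` covers vendors that publish IP ranges instead of supporting reverse DNS.

```python
import ipaddress
import socket


def hostname_matches(hostname: str, suffixes: tuple[str, ...]) -> bool:
    """True if hostname ends with one of suffixes; each suffix should start with '.'."""
    name = hostname.rstrip(".").lower()
    return any(name.endswith(suffix.lower()) for suffix in suffixes)


def ip_in_ranges(ip: str, cidrs: list[str]) -> bool:
    """True if ip falls inside any of the published CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr, strict=False) for cidr in cidrs)


def forward_confirmed_rdns(ip: str, suffixes: tuple[str, ...]) -> bool:
    """Reverse-resolve ip, check the vendor domain, then confirm the forward lookup."""
    try:
        hostname = socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    if not hostname_matches(hostname, suffixes):
        return False
    try:
        infos = socket.getaddrinfo(hostname, None)
    except OSError:
        return False
    claimed = ipaddress.ip_address(ip)
    for _family, _type, _proto, _canon, sockaddr in infos:
        try:
            if ipaddress.ip_address(sockaddr[0]) == claimed:
                return True
        except ValueError:
            continue  # e.g. an IPv6 address with a scope suffix on older Pythons
    return False
```

For example, `forward_confirmed_rdns(ip, (".googlebot.com", ".google.com"))` follows the check Google documents for Googlebot. A `False` result means the identity is unconfirmed, not that the client is malicious.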

Never present a honeypot as a guarantee against scraping. Determined non-compliant clients can detect and avoid traps, and aggressive tar-pitting may breach your host's acceptable-use terms. Treat honeypots as one detection signal among several, not a wall.

How it appears in analytics and logs

A request for a hidden or robots-disallowed honeypot URL is a strong signal that the client did not honour your robots.txt rules, which makes it a useful non-compliance flag. It identifies behaviour, not necessarily a specific named vendor.
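A minimal log-scanning sketch, assuming an Apache/Nginx combined log format and the hypothetical trap prefix `/trap/do-not-follow/`. It counts trap hits per (IP, user agent) pair. This records behaviour only: the user-agent string is whatever the client claimed.

```python
import re
from collections import Counter

# Combined log format: ip ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

# Hypothetical trap prefixes; they must match what robots.txt disallows.
HONEYPOT_PREFIXES = ("/trap/do-not-follow/",)


def honeypot_hits(lines) -> Counter:
    """Count trap requests per (ip, claimed user agent); unparseable lines are skipped."""
    hits: Counter = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if m and m.group("path").startswith(HONEYPOT_PREFIXES):
            hits[(m.group("ip"), m.group("ua"))] += 1
    return hits
```

Passing an open log file works directly, since iterating a file yields lines, for example `honeypot_hits(open("access.log"))`.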

Diagnostic use case

Detect crawlers that ignore robots.txt by observing whether they fetch a disallowed honeypot path, and understand the limits before relying on traps.
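To go beyond a single trap, every logged request can be checked against the site's own robots.txt. The sketch assumes requests have already been parsed into hypothetical `(ip, agent_token, path)` tuples. Here `agent_token` is the crawler's product token (for example `ExampleBot`) rather than the full user-agent string, because `urllib.robotparser` only looks at the part of the supplied user-agent before the first `/`. Python's parser also does plain prefix matching and does not implement the `*` and `$` patterns described in RFC 9309, so results are approximate for sites that use them.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt (hypothetical). The trap rule is repeated in the ExampleBot
# group because a crawler follows only the most specific group matching it.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/
Disallow: /trap/do-not-follow/

User-agent: *
Disallow: /trap/do-not-follow/
"""


def disallowed_fetches(requests, robots_txt: str = ROBOTS_TXT):
    """Return the (ip, agent_token, path) tuples that robots_txt disallowed for that agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        (ip, agent_token, path)
        for ip, agent_token, path in requests
        if not parser.can_fetch(agent_token, path)
    ]
```

The repeated trap rule is deliberate: a trap listed only under `User-agent: *` is not disallowed for a crawler that has its own group, so a compliant crawler may legitimately fetch it and produce a false positive.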

What WebmasterID can help detect

WebmasterID can surface hits to known honeypot paths as bot events, helping you spot crawlers that ignored robots.txt without manually grepping logs for the trap URL.


Privacy and accuracy notes

Honeypots observe crawler behaviour, not human users. Because the trap link is hidden from people, hits on it should not count toward human analytics. WebmasterID records honeypot hits as bot events only.


Sources and verification notes

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.