How to block magpie-crawler
magpie-crawler is a web crawler associated with Brandwatch's social and web monitoring platform, which gathers public content for brand and media analysis. This page shows the robots.txt token to target, what the crawler does, and why a Disallow steers only compliant fetchers.
What magpie-crawler is
magpie-crawler is associated with Brandwatch, a social and web monitoring platform that collects public content for brand listening and media analysis. Operators who do not want their pages folded into that monitoring can disallow it in robots.txt.
Match on the documented magpie-crawler user-agent token rather than a full version string, because version numbers can change while the token stays the same. The user agent is self-identifying and contains a URL pointing at the operator.
- robots.txt token: magpie-crawler
- Associated with Brandwatch social/web monitoring
- Purpose: brand and media intelligence collection
robots.txt rule
To ask magpie-crawler to stay off your site:
User-agent: magpie-crawler
Disallow: /
This targets only that token and leaves search and AI crawlers unaffected. robots.txt is honoured by compliant crawlers but is not enforcement, so check your server logs after publishing the rule to confirm the crawler actually backed off.
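One way to sanity-check the rule before publishing it is to parse it locally. The sketch below uses Python's standard-library urllib.robotparser. The example.com URL and the Googlebot comparison agent are placeholders chosen for illustration, and the parser's matching only approximates how any particular crawler reads robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The rule from this page, written as it should appear in robots.txt.
ROBOTS_TXT = """\
User-agent: magpie-crawler
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/any/page"  # placeholder URL, not a real site
    print("magpie-crawler allowed:", is_allowed(ROBOTS_TXT, "magpie-crawler", page))
    print("Googlebot allowed:", is_allowed(ROBOTS_TXT, "Googlebot", page))
```

This only shows what the rule asks for. Whether the crawler honours the request still has to be confirmed from real traffic.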
How it appears in analytics and logs
Requests carrying the magpie-crawler token are monitoring-crawler events, not human visits. They indicate a brand/media-intelligence platform is collecting your public content; classify them as bot traffic.
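A minimal sketch of that classification, assuming you already have the User-Agent header of each request as a string. The sample user-agent strings are hypothetical and made up for illustration; only the magpie-crawler token itself comes from this page.

```python
BOT_TOKEN = "magpie-crawler"

# Hypothetical User-Agent values, invented for this example.
SAMPLE_USER_AGENTS = [
    "magpie-crawler/0.0 (+https://operator.example/)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]


def classify_user_agent(user_agent: str) -> str:
    """Label a request 'bot' when its User-Agent contains the token, else 'other'."""
    return "bot" if BOT_TOKEN in user_agent.lower() else "other"


def count_bot_hits(user_agents) -> int:
    """Count how many User-Agent strings carry the magpie-crawler token."""
    return sum(1 for ua in user_agents if classify_user_agent(ua) == "bot")


if __name__ == "__main__":
    for ua in SAMPLE_USER_AGENTS:
        print(classify_user_agent(ua), "<-", ua)
```

Matching on the lowercased token rather than the whole string keeps the check stable if the version part of the user agent changes.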
Diagnostic use case
Keep a brand-monitoring crawler from harvesting your public pages for social/media analytics, and confirm that the rule targets the correct token.
What WebmasterID can help detect
WebmasterID classifies magpie-crawler server-side as a crawler and shows whether it keeps reaching your pages after a robots.txt rule is added.
Common mistakes
- Misspelling the magpie-crawler token so the rule never matches.
- Expecting robots.txt to enforce the block rather than request compliance.
- Counting monitoring-crawler hits as human sessions.
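The first mistake can be caught mechanically. The sketch below scans a robots.txt body for User-agent values that resemble magpie-crawler without matching it exactly, using Python's standard-library difflib. The function names and the 0.8 similarity cutoff are assumptions made for this example, not part of any standard.

```python
import difflib

EXPECTED_TOKEN = "magpie-crawler"


def user_agent_tokens(robots_txt: str):
    """Yield the value of each User-agent line, skipping comments."""
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "user-agent":
            yield value.strip()


def near_miss_tokens(robots_txt: str, cutoff: float = 0.8):
    """Return tokens that look like the expected token but are not equal to it."""
    tokens = [t.lower() for t in user_agent_tokens(robots_txt)]
    candidates = [t for t in tokens if t != EXPECTED_TOKEN]
    return difflib.get_close_matches(EXPECTED_TOKEN, candidates, n=5, cutoff=cutoff)


if __name__ == "__main__":
    broken = "User-agent: magpie-crawlr\nDisallow: /\n"
    print(near_miss_tokens(broken))
```

An empty result means no likely typo was found; it does not prove that a rule for the token is present.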
Privacy and accuracy notes
Blocking magpie-crawler uses only the request user-agent token. No visitor identity is involved, and WebmasterID records the crawl as a bot event separate from human analytics.
Related pages
- Magpie-crawler (Brandwatch)
Magpie-crawler is a crawler that has been associated with Brandwatch's Magpie data-collection infrastructure for social and web monitoring. It fetches publicly available pages to support media monitoring and analytics rather than to power a consumer search engine. The self-identifying token is observable; published specifics are limited, so this entry is partially verified.
- How to block ZoominfoBot
ZoominfoBot is the crawler associated with ZoomInfo, a business-data platform that compiles company and contact information from public web pages. This page shows how the crawler identifies itself, the robots.txt token to target, and why a Disallow is a request rather than enforcement against a non-compliant fetcher.
- robots.txt basics: what it does and what it cannot do
robots.txt is a plain-text file at your site root that tells compliant crawlers which paths they may request. This page covers the directives, how user-agent groups are matched, and the limits that trip people up: robots.txt is advisory, it does not hide pages from search, and it is not a security boundary.
- Bot intelligence
Categorise monitoring and brand-listening crawlers.
Sources and verification notes
- Brandwatch (crawler/bot information): magpie-crawler associated with Brandwatch monitoring; token matched on the self-identifying user agent.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.