Magpie-crawler (Brandwatch)
Magpie-crawler is a crawler that has been associated with Brandwatch's Magpie data-collection infrastructure for social and web monitoring. It fetches publicly available pages to support media monitoring and analytics, not to power a consumer search engine. Its self-identifying token can be observed in request logs, but published specifics are limited, so this entry is only partially verified.
What this means
Magpie-crawler has been associated with Brandwatch's Magpie data-collection system for social and web monitoring. It fetches publicly available pages to feed media-monitoring and analytics products rather than a search engine index.
Operators who do not want their content collected can disallow the token. As always, robots.txt is a request honoured by compliant crawlers, not an enforcement mechanism.
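A minimal sketch of the opt-out follows. It assumes you want to block Magpie-crawler everywhere while leaving other crawlers unrestricted. It uses Python's standard urllib.robotparser to check the rules locally. The helper name parse_robots and the example.com URLs are illustrative only. A passing check shows how a compliant parser reads the file, not that Magpie-crawler will honour it.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: opt out of Magpie-crawler only and leave every
# other crawler unrestricted (an empty Disallow value means "allow all").
ROBOTS_TXT = """\
User-agent: magpie-crawler
Disallow: /

User-agent: *
Disallow:
"""


def parse_robots(text: str) -> RobotFileParser:
    """Parse robots.txt text locally, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = parse_robots(ROBOTS_TXT)
    print(rp.can_fetch("magpie-crawler", "https://example.com/news/story"))  # False
    print(rp.can_fetch("Googlebot", "https://example.com/news/story"))       # True
```

The group for magpie-crawler is listed before the catch-all group. This placement is for readability only. The standard parser checks named groups first and falls back to the * group when no named group matches.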
How it identifies itself
Its user-agent string contains the self-identifying token magpie-crawler together with a self-identifying URL. Match on the stable token rather than the full string. Published documentation is limited, so confirm the crawler by its self-identifying URL and request behaviour rather than by asserting unverified IP ranges or version strings.
- robots.txt token: magpie-crawler (self-identifying)
- Purpose: media/brand monitoring data collection
- Not a consumer search engine
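A minimal sketch of matching on the stable token is shown below. It assumes you have the raw User-Agent header as a string. The function name is_magpie_crawler is hypothetical. A token match only tells you what the client claims to be, so confirmation still depends on the self-identifying URL and behaviour.

```python
import re
from typing import Optional

# Match the stable token as a whole word, case-insensitively. Version numbers
# and the rest of the string are deliberately ignored.
MAGPIE_TOKEN = re.compile(r"\bmagpie-crawler\b", re.IGNORECASE)


def is_magpie_crawler(user_agent: Optional[str]) -> bool:
    """Return True if the user-agent string claims the magpie-crawler token.

    Hypothetical helper for illustration; it does not verify the claim.
    """
    if not user_agent:
        return False
    return MAGPIE_TOKEN.search(user_agent) is not None


if __name__ == "__main__":
    # Placeholder strings, not real Magpie-crawler user agents.
    print(is_magpie_crawler("Mozilla/5.0 (compatible; Magpie-Crawler)"))  # True
    print(is_magpie_crawler("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))  # False
```

The word boundaries stop accidental hits inside longer tokens such as "notmagpie-crawler".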
How it appears in analytics and logs
A request carrying the magpie-crawler token comes from a media/brand-monitoring crawler fetching public content, not from a search engine indexing your pages for its users. It is third-party monitoring traffic and should be counted as bot traffic.
Diagnostic use case
Recognise Magpie-crawler as a monitoring/data-collection crawler in your logs and distinguish it from search-engine bots when reviewing crawl sources.
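This review can be sketched as a small tally over access-log lines. The sketch assumes the common combined log format, in which the user agent is the last double-quoted field. The token lists, category names, and sample lines are hypothetical placeholders. The sample user agents are not real Magpie-crawler strings. Anything unmatched lands in "other", which still mixes human visitors with unlisted bots.

```python
import re
from collections import Counter

# Hypothetical token lists for illustration; extend them for your own logs.
SEARCH_TOKENS = ("googlebot", "bingbot")
MONITORING_DATA_TOKENS = ("magpie-crawler",)

# In the combined log format the user agent is the last quoted field.
UA_AT_END = re.compile(r'"([^"]*)"\s*$')

# Placeholder log lines using documentation IP ranges.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2026:10:00:00 +0000] "GET /news HTTP/1.1" 200 512 "-" "magpie-crawler"',
    '198.51.100.7 - - [01/Jan/2026:10:00:01 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '192.0.2.9 - - [01/Jan/2026:10:00:02 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]


def crawl_source(user_agent: str) -> str:
    """Classify a user agent by the tokens it claims (substring match)."""
    ua = user_agent.lower()
    if any(tok in ua for tok in MONITORING_DATA_TOKENS):
        return "monitoring/data crawler"
    if any(tok in ua for tok in SEARCH_TOKENS):
        return "search crawler"
    return "other"


def tally_crawl_sources(log_lines) -> Counter:
    """Count requests per crawl source; lines without a UA field are skipped."""
    counts = Counter()
    for line in log_lines:
        match = UA_AT_END.search(line)
        if match:
            counts[crawl_source(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    print(dict(tally_crawl_sources(SAMPLE_LOG)))
```

Checking the monitoring tokens before the search tokens keeps a string that somehow carries both out of the search bucket. Plain substring matching keeps the sketch short.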
What WebmasterID can help detect
WebmasterID classifies Magpie-crawler as a monitoring/data crawler distinct from search bots, so its requests are visible separately and excluded from human analytics.
Common mistakes
- Mistaking a monitoring crawler for a search-engine indexer.
- Counting Magpie-crawler hits as human visits.
- Inventing an exact user-agent string or IP range for it.
Privacy and accuracy notes
Magpie-crawler is identified by its user-agent token only. WebmasterID records it as a bot event and never attaches it to a visitor identity; any monitoring it does is outside WebmasterID.
Related pages
- BUbiNG research crawler
BUbiNG is an open-source distributed web crawler developed by the Laboratory for Web Algorithmics (LAW) at the University of Milan. It is designed for high-throughput crawling for research and dataset building, not to power a consumer search engine. Because anyone can run the open-source software, a BUbiNG user agent indicates the crawler software, not a single operator.
- Monitoring bots vs search crawlers
Monitoring bots (uptime and performance checkers such as Pingdom and UptimeRobot) fetch your pages on a schedule to confirm availability, not to index them. They differ from search crawlers, which build a search index, and from SEO crawlers, which gather competitive data. Telling them apart keeps synthetic checks out of human analytics.
- Managing third-party SEO crawler load
Third-party SEO crawlers such as AhrefsBot and SemrushBot can generate significant request volume without contributing to search visibility. You can manage their load by targeting their tokens in robots.txt, using crawl-delay where the crawler supports it, and blocking those that bring no value to you.
- Bot vs human
Separates monitoring crawlers from real visitors.
Sources and verification notes
- Brandwatch: social and web monitoring; Magpie data-collection infrastructure. Full crawler specifics are not exhaustively published.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.