Robots & crawl control

How to block BUbiNG

BUbiNG is an open-source, high-throughput web crawler developed for research and large-scale web data collection. Because anyone can run an instance, its behavior depends on the operator. This page shows the robots.txt token to target and why a Disallow only steers compliant deployments.

Partially verified

What BUbiNG is

BUbiNG is an open-source distributed crawler from the Laboratory for Web Algorithmics, designed for high-throughput web data collection in research contexts. Because it is software anyone can deploy, the crawler you see is run by whoever set up that instance, and politeness settings are operator-controlled.

Match on the documented BUbiNG user-agent token rather than a version string. Since deployments vary, a robots.txt rule reaches only the instances configured to honour it.

robots.txt token: BUbiNG (self-identifying user agent)
Open-source research crawler; run by many operators
Politeness and volume depend on the deployment

robots.txt rule

To ask BUbiNG to stay off your site:

User-agent: BUbiNG
Disallow: /

BUbiNG deployments that respect robots.txt will back off. A deployment that ignores robots.txt, or runs with politeness disabled, is not stopped by this rule. If crawl load continues, escalate to edge-level rate limiting or a WAF.
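When a deployment ignores the rule, enforcement has to happen on the server side. The sketch below shows the idea at application level, assuming a Python WSGI stack; in practice this usually lives at the CDN, reverse proxy, or WAF instead. The names block_bubing and demo_app and the sample user-agent string are hypothetical, and the check only catches instances that still self-identify with the BUbiNG token.

```python
# Minimal sketch: refuse requests whose User-Agent carries the BUbiNG token.
# Assumption: a WSGI application; block_bubing and demo_app are illustrative names.

BLOCKED_TOKEN = "bubing"  # compared case-insensitively against the User-Agent


def block_bubing(app):
    """Wrap a WSGI app and answer 403 to self-identifying BUbiNG requests."""

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if BLOCKED_TOKEN in user_agent.lower():
            body = b"Forbidden\n"
            start_response(
                "403 Forbidden",
                [("Content-Type", "text/plain"),
                 ("Content-Length", str(len(body)))],
            )
            return [body]
        # Any other client is passed through unchanged.
        return app(environ, start_response)

    return middleware


def demo_app(environ, start_response):
    """Stand-in application used only to exercise the middleware."""
    body = b"ok\n"
    start_response(
        "200 OK",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]


application = block_bubing(demo_app)
```

Matching on the token rather than a full version string mirrors the robots.txt advice above. An operator who changes the user agent will not be caught this way, which is why rate limiting by request volume remains the stronger fallback.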

How it appears in analytics and logs

Requests carrying the BUbiNG token are research-crawler events, not human visits. Because BUbiNG is run by many operators, volume and politeness vary by deployment; treat the hits as bot traffic.
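To separate these hits from human traffic in raw logs, one option is to classify each line by its user-agent field. The sketch below assumes access logs in the common "combined" format, where the user agent is the last double-quoted field. The sample lines, IP addresses, and user-agent strings are made up for illustration and are not real BUbiNG traffic.

```python
import re
from collections import Counter

# Assumption: combined log format, user agent is the final quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def user_agent_of(line: str) -> str:
    """Return the trailing quoted field of a log line, or '' if absent."""
    match = UA_PATTERN.search(line)
    return match.group(1) if match else ""


def classify(line: str) -> str:
    """Label a log line as a BUbiNG bot event or as other traffic."""
    if "bubing" in user_agent_of(line).lower():
        return "bot:BUbiNG"
    return "other"


# Illustrative sample lines only (documentation IP ranges, hypothetical UA).
SAMPLE_LINES = [
    '203.0.113.7 - - [01/Jan/2025:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "BUbiNG (+http://example.org/bot.html)"',
    '198.51.100.4 - - [01/Jan/2025:00:00:02 +0000] "GET /about HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    '203.0.113.7 - - [01/Jan/2025:00:00:03 +0000] "GET /robots.txt HTTP/1.1" 200 64 "-" "BUbiNG (+http://example.org/bot.html)"',
]

counts = Counter(classify(line) for line in SAMPLE_LINES)
print(counts)
```

Keeping the bot label separate from the "other" bucket lets the crawler hits be excluded from session counts without discarding them.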

Diagnostic use case

Reduce crawl load from BUbiNG-based research crawlers on your public pages and confirm the disallow targeted the right token.
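A quick local check can confirm that the rule targets the intended token before it is deployed. The sketch below uses Python's standard urllib.robotparser; its matching rules may differ in detail from the parser a given BUbiNG deployment uses, so treat the result as a sanity check, not a guarantee. The example.com URL and the SomeOtherBot agent name are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The rule from this page, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: BUbiNG
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The BUbiNG token should be refused everywhere on the site.
bubing_allowed = is_allowed(ROBOTS_TXT, "BUbiNG", "https://example.com/page")

# A crawler with a different token is unaffected by this rule.
other_allowed = is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/page")

print("BUbiNG allowed:", bubing_allowed)
print("SomeOtherBot allowed:", other_allowed)
```

If the first check reports that BUbiNG is still allowed, the usual causes are a misspelled token or the Disallow line ending up in a different User-agent group.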

What WebmasterID can help detect

WebmasterID classifies BUbiNG server-side as a crawler and shows whether it keeps reaching your pages after a robots.txt rule, helping you tell compliant deployments from ones that ignore it.

Common mistakes

Assuming one rule stops every BUbiNG instance regardless of its configuration.
Expecting robots.txt to enforce a block rather than request compliance.
Counting research-crawler hits as human sessions.

Privacy and accuracy notes

Blocking BUbiNG uses only the request user-agent token. No visitor identity is involved, and WebmasterID records the crawl as a bot event separate from human analytics.


Sources and verification notes

BUbiNG (Laboratory for Web Algorithmics): BUbiNG open-source crawler project page; token matched on the self-identifying user agent.

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.