BUbiNG research crawler

BUbiNG is an open-source distributed web crawler developed by the Laboratory for Web Algorithmics (LAW) at the University of Milan. It is designed for high-throughput crawling for research and dataset building, not to power a consumer search engine. Because anyone can run the open-source software, a BUbiNG user agent indicates the crawler software, not a single operator.

Partially verified

What this means

BUbiNG comes from the same LAW group that built UbiCrawler and the WebGraph datasets. Deployments crawl large portions of the web for research and dataset building; they do not feed a public search engine.

Because the software is open source, a BUbiNG user agent tells you the crawler software in use, not a single fixed operator. Different deployments may be run by different researchers or organisations.

How it identifies itself

It uses a BUbiNG user-agent token, typically with a self-identifying URL or contact set by whoever runs it. Match on the BUbiNG token, but remember the contact/operator portion is configurable by the deployer.

Do not assume a fixed IP range or operator. Verify each deployment by the self-identifying details in its user agent and by how it behaves.
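The matching advice above can be sketched in Python. This is a minimal illustration, not a definitive parser: the function names, the contact pattern, and the sample user-agent strings are assumptions made for the example, and real deployments may format the deployer-set portion differently.

```python
import re
from typing import Optional

# Match only the software token, case-insensitively.
BUBING_TOKEN = re.compile(r"\bBUbiNG\b", re.IGNORECASE)

# Hypothetical pattern for the deployer-set portion: a URL (optionally
# prefixed with "+") or an email address. The real format is up to the deployer.
CONTACT = re.compile(r"\+?(https?://[^\s;)]+)|([\w.+-]+@[\w-]+(?:\.[\w-]+)+)")


def is_bubing(user_agent: str) -> bool:
    """Return True if the user agent carries the BUbiNG token."""
    return bool(BUBING_TOKEN.search(user_agent))


def deployer_contact(user_agent: str) -> Optional[str]:
    """Return the self-identifying URL or email, if one is present."""
    if not is_bubing(user_agent):
        return None
    match = CONTACT.search(user_agent)
    if match is None:
        return None
    return match.group(1) or match.group(2)


# Hypothetical user-agent strings, for illustration only.
EXAMPLE_URL_UA = "BUbiNG (+http://crawler.example.org/about.html)"
EXAMPLE_MAIL_UA = "BUbiNG (ops@lab.example.edu)"
```

The token decides whether a request counts as BUbiNG; the contact is extracted separately so it can be reviewed per deployment rather than treated as part of the identity.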

Open-source crawler from LAW, University of Milan
High-throughput, research/dataset focused
Operator varies by deployment; not a search product

How it appears in analytics and logs

A request carrying a BUbiNG user agent means someone is running the BUbiNG crawler, often for academic or dataset purposes. The identity behind it is not fixed, so treat it as research bot traffic and verify behaviour rather than assuming a specific organisation.
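Because the operator can differ between deployments, it can help to group BUbiNG requests by their full user-agent string instead of counting them as one source. The sketch below assumes access logs in Combined Log Format, where the user agent is the last quoted field; the function name, sample lines, addresses, and URLs are all hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# In Combined Log Format the user agent is the last double-quoted field.
UA_FIELD = re.compile(r'"(?P<ua>[^"]*)"\s*$')


def bubing_deployments(log_lines: Iterable[str]) -> Counter:
    """Count BUbiNG requests per distinct full user-agent string."""
    counts: Counter = Counter()
    for line in log_lines:
        match = UA_FIELD.search(line)
        if match and "bubing" in match.group("ua").lower():
            counts[match.group("ua")] += 1
    return counts


# Hypothetical log lines using documentation-only IP ranges.
SAMPLE_LOG = [
    '192.0.2.10 - - [01/Jan/2025:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "BUbiNG (+http://lab-a.example.org/crawl.html)"',
    '192.0.2.10 - - [01/Jan/2025:00:00:02 +0000] "GET /a HTTP/1.1" 200 128 "-" "BUbiNG (+http://lab-a.example.org/crawl.html)"',
    '198.51.100.7 - - [01/Jan/2025:00:00:03 +0000] "GET /b HTTP/1.1" 200 256 "-" "BUbiNG (+http://lab-b.example.net/bot)"',
    '203.0.113.5 - - [01/Jan/2025:00:00:04 +0000] "GET /c HTTP/1.1" 200 64 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

deployments = bubing_deployments(SAMPLE_LOG)
```

Each key in `deployments` corresponds to one self-described deployment, which gives a starting point for checking behaviour per operator.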

Diagnostic use case

Recognise BUbiNG-identified requests as research/dataset crawling, understand the operator is whoever deployed the open-source crawler, and set policy accordingly.
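One way to express such a policy is a robots.txt group addressed to the BUbiNG token. The sketch below uses Python's standard `urllib.robotparser` to check how a compliant parser would read that rule. It assumes the deployment honours robots.txt and matches on the `BUbiNG` token, which depends on how the deployer has configured the crawler; the paths and domain are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: keep BUbiNG out of /private/, allow everything else.
ROBOTS_TXT = """\
User-agent: BUbiNG
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# True means a compliant crawler with that token may fetch the URL.
blocked = parser.can_fetch("BUbiNG", "https://example.com/private/report.html")
allowed = parser.can_fetch("BUbiNG", "https://example.com/index.html")
other_bot = parser.can_fetch("SomeOtherBot", "https://example.com/private/report.html")
```

This only verifies what the rule says. Whether a given deployment respects it still has to be confirmed from its behaviour in the logs.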

What WebmasterID can help detect

WebmasterID classifies BUbiNG as a research crawler distinct from search engines, so its requests are visible separately and excluded from human analytics, even though the operator behind it can vary.

Common mistakes

Assuming all BUbiNG traffic comes from one organisation.
Treating a research crawler as a search-engine indexer.
Inventing a fixed IP range for an open-source crawler anyone can run.

Privacy and accuracy notes

BUbiNG is identified by its user-agent token only. It is crawler software, not a person; WebmasterID records it as a bot event with no visitor profile attached.


Sources and verification notes

LAW, BUbiNG crawler: open-source crawler; operator varies, so per-deployment specifics are not fixed.

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.