Robots & crawl control

How to block omgilibot in robots.txt

omgilibot is the crawler historically associated with Omgili and the Webz.io web-data project, which collects public web content for datasets resold to third parties, including for AI training. This page gives the robots.txt rule to disallow the omgilibot token and flags where documentation is limited.

Partially verified

What omgilibot is

omgilibot is a crawler historically associated with Omgili, a forum/discussion search project later linked to the Webz.io web-data business, which collects and resells public web content as structured feeds. Those feeds are used for market intelligence and, increasingly, AI training inputs.

Because the project's branding and ownership have shifted over time, some public documentation is limited. This entry avoids asserting current ownership specifics that cannot be confidently sourced and focuses on the stable robots.txt token.

The rule

To disallow omgilibot site-wide, target its token:

User-agent: omgilibot
Disallow: /

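Write the token in its conventional lowercase form. Parsers that follow RFC 9309 match user-agent tokens case-insensitively, but lowercase is how this crawler's token is normally written. Because related data products have used different crawler tokens over time, confirm in your logs which token is actually requesting your pages before assuming this rule covers all of that crawling. robots.txt is a request, not enforcement: a crawler can ignore it.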

Token: omgilibot
Lineage: Omgili / Webz.io web-data collection
Related products have used other tokens — verify in logs
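
One way to sanity-check the rule before publishing it is to run it through a robots.txt parser. The sketch below uses Python's standard-library urllib.robotparser; the helper name is_allowed and the example.com URL are illustrative assumptions, not part of any Webz.io or WebmasterID tooling. It only shows how one parser interprets the rule, not how omgilibot itself behaves.

```python
from urllib.robotparser import RobotFileParser

# The rule from this page, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: omgilibot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper for illustration; it reflects how Python's
    parser reads the file, which may differ from a given crawler.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/some/page"  # placeholder URL
    for agent in ("omgilibot", "Omgilibot", "SomeOtherBot"):
        print(agent, "allowed" if is_allowed(ROBOTS_TXT, agent, url) else "disallowed")
```

Under this parser, the rule disallows the omgilibot token regardless of letter case and leaves agents with no matching group unaffected.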

How it appears in analytics and logs

A request whose user-agent carries the omgilibot token comes from a data-collection crawler in the Omgili/Webz.io lineage fetching a URL. It is a bot event tied to dataset building, not a human visit.
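
To see whether the token shows up in your own logs, and whether its activity changes after you publish the rule, you can count matching requests per day. This is a minimal sketch that assumes access logs in the common "combined" format, with the date in square brackets and the user-agent as the last quoted field. The function name daily_hits and both regular expressions are assumptions for illustration; adjust them to your server's actual log format.

```python
import re
from collections import Counter
from typing import Iterable

TOKEN = "omgilibot"

# Assumed combined log format: ... [dd/Mon/yyyy:hh:mm:ss zone] ... "referer" "user-agent"
DATE_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}):")
UA_RE = re.compile(r'"([^"]*)"\s*$')  # last quoted field on the line


def daily_hits(lines: Iterable[str], token: str = TOKEN) -> Counter:
    """Count log lines per day whose user-agent contains token (case-insensitive)."""
    counts: Counter = Counter()
    needle = token.lower()
    for line in lines:
        ua = UA_RE.search(line)
        day = DATE_RE.search(line)
        if ua and day and needle in ua.group(1).lower():
            counts[day.group(1)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_token.py < access.log
    for day, hits in daily_hits(sys.stdin).items():
        print(day, hits)
```

Comparing the daily counts from before and after the rule went live gives a rough indication of whether the crawler honoured it. Running the same function with a different token argument helps check for related crawlers.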

Diagnostic use case

Disallow omgilibot when you do not want your content collected into resold web-data feeds that may be used for AI training or market intelligence.

What WebmasterID can help detect

WebmasterID classifies omgilibot by its token, separate from human analytics, so you can confirm whether a disallow rule reduced its activity.

Common mistakes

Assuming omgilibot is the only token used by the wider Webz.io data project.
Capitalising the token: it is conventionally written in lowercase as omgilibot, even though standards-compliant parsers match it case-insensitively.
Inventing ownership or IP details where documentation is sparse.

Privacy and accuracy notes

Blocking omgilibot is a publishing-policy choice in a public file. It involves no visitor data and is not an access-control boundary.


Sources and verification notes

Webz.io / Omgili: crawler reference (token observed). Token omgilibot is observed in the Omgili/Webz.io lineage; comprehensive current docs are limited, so specifics are marked partially verified.

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.