How to block omgilibot in robots.txt
omgilibot is the crawler historically associated with Omgili and the Webz.io web-data project, which collects public web content into datasets resold to third parties, including for AI use. This page gives the robots.txt rule that disallows the omgilibot token and flags where documentation is limited.
What omgilibot is
omgilibot is a crawler historically associated with Omgili, a forum/discussion search project later linked to the Webz.io web-data business, which collects and resells public web content as structured feeds. Those feeds are used for market intelligence and, increasingly, as AI training inputs.
Because the project's branding and ownership have shifted over time, some public documentation is limited. This entry avoids asserting current ownership specifics that cannot be confidently sourced and focuses on the stable robots.txt token.
The rule
To disallow omgilibot site-wide, target its token:
User-agent: omgilibot
Disallow: /
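Write the token in lowercase, as it is conventionally published. Crawlers that follow the Robots Exclusion Protocol (RFC 9309) match user-agent tokens case-insensitively, so lowercase is a convention rather than a requirement, but it keeps the rule consistent with how the token is documented. Because related data products have used different crawler tokens over time, confirm in your logs which token is actually requesting your pages before assuming this rule covers all of the project's crawling. robots.txt is a request to compliant crawlers, not enforcement.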
- Token: omgilibot
- Lineage: Omgili / Webz.io web-data collection
- Related products have used other tokens — verify in logs
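Before publishing, the rule can be checked locally. The following is a minimal sketch using Python's standard-library urllib.robotparser; the helper name is_allowed, the example.com URL, and the ExampleBot agent are illustrative assumptions, and the standard-library parser only approximates how any given crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# The rule from this page, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: omgilibot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com and ExampleBot are placeholders, not real targets.
print(is_allowed(ROBOTS_TXT, "omgilibot", "https://example.com/page"))
print(is_allowed(ROBOTS_TXT, "ExampleBot", "https://example.com/page"))
```

The first call should report the omgilibot token as disallowed, while the second shows that agents without a matching group are unaffected by this rule.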
How it appears in analytics and logs
A request whose User-Agent header carries the omgilibot token comes from a data-collection crawler in the Omgili/Webz.io lineage fetching a URL. Treat it as a bot event tied to dataset building, not as a human visit.
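To see the token in your own logs, a short script can count matching requests per day. This is a sketch under stated assumptions: it expects the common "combined" access-log layout with the user agent as the last quoted field, the function name daily_hits is hypothetical, and the sample lines (including their user-agent strings and documentation-range IP addresses) are made up for illustration rather than copied from real omgilibot traffic.

```python
import re
from collections import Counter

# Assumed combined log format:
# host ident user [day:time zone] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'\[(?P<day>[^:\]]+):[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def daily_hits(lines, token="omgilibot"):
    """Count requests per day whose user agent contains token (case-insensitive)."""
    counts = Counter()
    for line in lines:
        match = LOG_RE.search(line)
        if match and token.lower() in match.group("ua").lower():
            counts[match.group("day")] += 1
    return counts


# Made-up sample lines; the user-agent strings are illustrative only.
SAMPLE_LOG = [
    '203.0.113.5 - - [23/Jun/2026:10:00:00 +0000] "GET /a HTTP/1.1" 200 512 "-" "omgilibot/0.0 (sample)"',
    '203.0.113.5 - - [23/Jun/2026:10:05:00 +0000] "GET /b HTTP/1.1" 200 734 "-" "omgilibot/0.0 (sample)"',
    '198.51.100.7 - - [23/Jun/2026:11:00:00 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (sample browser)"',
    '203.0.113.5 - - [24/Jun/2026:09:00:00 +0000] "GET /a HTTP/1.1" 200 512 "-" "omgilibot/0.0 (sample)"',
]

for day, hits in daily_hits(SAMPLE_LOG).items():
    print(day, hits)
```

Running the same count over logs from before and after a robots.txt change gives a rough view of whether requests carrying the token dropped. A user-agent string is self-reported, so the count reflects what requests claim to be.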
Diagnostic use case
Disallow omgilibot when you do not want your content collected into resold web-data feeds that may be used for AI training or market intelligence.
What WebmasterID can help detect
WebmasterID classifies omgilibot by its token, separate from human analytics, so you can confirm whether a disallow rule reduced its activity.
Common mistakes
- Assuming omgilibot is the only token used by the wider Webz.io data project.
- Capitalising the token; the conventional form is lowercase omgilibot.
- Inventing ownership or IP details where documentation is sparse.
Privacy and accuracy notes
Blocking omgilibot is a publishing-policy choice in a public file. It involves no visitor data and is not an access-control boundary.
Related pages
- Omgilibot — Webz.io data crawler
Omgilibot is a web data crawler operated by Webz.io, also seen under the omgili name. Its robots.txt token is omgilibot. Public documentation is limited in places, so specifics that cannot be confidently sourced are marked partially verified rather than guessed.
- How to block cohere-ai in robots.txt
cohere-ai is the robots.txt token associated with Cohere's web fetching for its AI products. This page gives the rule to disallow the cohere-ai token, explains where it fits in an AI-crawler policy, and stays cautious where Cohere's public documentation is limited.
- Writing an AI crawler policy for robots.txt
An AI crawler policy is a deliberate decision about which AI-related tokens you allow and which you disallow in robots.txt. This page offers a structured way to make and document those choices, while staying realistic: robots.txt is a request to compliant crawlers, not a legal or technical guarantee.
- Bot intelligence
See whether an omgilibot disallow changed its activity.
Sources and verification notes
- Webz.io / Omgili: crawler reference (token observed). Token omgilibot is observed in the Omgili/Webz.io lineage; comprehensive current docs are limited, so specifics are marked partially verified.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.