Omgilibot — Webz.io data crawler
Omgilibot is a web data crawler operated by Webz.io, also seen under the omgili name. Its robots.txt token is omgilibot. Public documentation is limited in places, so specifics that cannot be confidently sourced are marked partially verified rather than guessed.
What this means
Omgilibot is a web data crawler operated by Webz.io. It is also referenced under the older omgili name, and it gathers web data that Webz.io makes available as feeds and datasets.
Public documentation is limited in places. This entry therefore describes the stable identification pattern and avoids asserting specifics that cannot be confidently sourced.
How Omgilibot identifies itself
The crawler uses the robots.txt user-agent token omgilibot, and the related name omgili is also seen. Its user-agent string contains that token. Match on the stable token rather than a full version string.
The user-agent string is self-reported, so any client can copy it. Do not invent IP ranges to verify it; identify the crawler by its token and treat trust-sensitive decisions conservatively.
- robots.txt token: omgilibot (also omgili)
- Operated by Webz.io
- Some specifics: not fully documented publicly
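As a minimal sketch of token-based matching, the Python function below classifies a raw User-Agent header by the two names listed above. The function name and the sample user-agent strings are hypothetical; real Omgilibot strings may look different, which is why only the token is matched.

```python
import re

# "omgili" is a prefix of "omgilibot", so one case-insensitive pattern
# covers both names. The optional "bot" suffix is matched when present.
OMGILI_PATTERN = re.compile(r"omgili(bot)?", re.IGNORECASE)


def classify_user_agent(user_agent):
    """Return 'omgilibot', 'omgili', or 'other' for a raw User-Agent header."""
    match = OMGILI_PATTERN.search(user_agent or "")
    if match is None:
        return "other"
    return match.group(0).lower()


if __name__ == "__main__":
    # Made-up strings for illustration only, not documented Webz.io values.
    samples = [
        "ExampleFetcher/1.0 (compatible; Omgilibot/9.9)",
        "omgili/0.0 (hypothetical)",
        "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
    ]
    for sample in samples:
        print(classify_user_agent(sample), "<-", sample)
```

A match only says the request claims to be the crawler; it does not prove where the request came from.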
robots.txt considerations
To disallow it site-wide, target both names if needed:
User-agent: omgilibot
Disallow: /

User-agent: omgili
Disallow: /
Because public documentation is incomplete, treat the rule as a request rather than a guarantee. robots.txt is never an access-control boundary.
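One way to sanity-check the rules before deploying them is to parse them with Python's standard-library urllib.robotparser. The sketch below assumes the two rule groups shown in this section; the helper name is_allowed and the test paths are hypothetical. It only shows how Python's parser reads the file and says nothing about how the crawler itself behaves.

```python
from urllib.robotparser import RobotFileParser

# The two rule groups from this section, one directive per line.
ROBOTS_TXT = """\
User-agent: omgilibot
Disallow: /

User-agent: omgili
Disallow: /
"""


def is_allowed(robots_txt, agent, path):
    """Return True if the given robots.txt text lets `agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for agent in ("omgilibot", "omgili", "SomeOtherBot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, "/any/page"))
```

Both tokens should come back as disallowed, while an unrelated agent stays allowed because the file has no wildcard group.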
How it appears in analytics and logs
A request carrying the omgilibot token is a Webz.io data crawler fetching a URL — a bot event, not a human visit. Identify it by the token and treat undocumented specifics conservatively.
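For sites that do work from raw logs, the sketch below tallies requests carrying the token per path, so they can be kept apart from human traffic. It assumes the common "combined" access-log layout, where the request line is the first quoted field and the user agent is the last; the function name and the sample lines are hypothetical, and other log formats would need a different parser.

```python
import re
from collections import Counter

# Quoted fields in a combined-format line: request, referrer, user agent.
QUOTED_FIELD = re.compile(r'"([^"]*)"')
# "omgili" is the prefix shared by both the omgilibot and omgili names.
TOKEN = re.compile(r"omgili", re.IGNORECASE)


def bot_hits_per_path(log_lines):
    """Count requests whose user agent carries the token, keyed by path."""
    hits = Counter()
    for line in log_lines:
        fields = QUOTED_FIELD.findall(line)
        if len(fields) < 2:
            continue  # not a combined-format line; skip rather than guess
        request, user_agent = fields[0], fields[-1]
        if not TOKEN.search(user_agent):
            continue  # some other client; leave it to the normal pipeline
        parts = request.split()
        if len(parts) >= 2:
            hits[parts[1]] += 1  # parts = [method, path, protocol]
    return hits


if __name__ == "__main__":
    # Made-up log lines using documentation-reserved IP addresses.
    sample = [
        '203.0.113.7 - - [01/Jan/2026:00:00:01 +0000] "GET /pricing HTTP/1.1" 200 512 "-" "ExampleFetcher/1.0 (compatible; omgilibot/9.9)"',
        '203.0.113.7 - - [01/Jan/2026:00:00:02 +0000] "GET /blog HTTP/1.1" 200 900 "-" "omgili/0.0 (hypothetical)"',
        '198.51.100.4 - - [01/Jan/2026:00:00:03 +0000] "GET /pricing HTTP/1.1" 200 512 "-" "Mozilla/5.0 ExampleBrowser/1.0"',
    ]
    print(dict(bot_hits_per_path(sample)))
```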
Diagnostic use case
Identify Omgilibot in logs by its token and set robots.txt policy for the Webz.io data crawler.
What WebmasterID can help detect
WebmasterID classifies Omgilibot server-side by its token and surfaces it on the bot-intelligence surface, so you can see its activity per page without parsing logs.
Common mistakes
- Targeting only one of the omgilibot/omgili names when both may appear.
- Inventing IP ranges to verify the crawler.
- Counting crawler hits as human sessions.
Privacy and accuracy notes
Detection uses only the request user-agent. No human identity is involved. WebmasterID records the crawl as a bot event, separate from human analytics, and never attaches it to a visitor profile.
Related pages
- ImagesiftBot — image dataset crawler
ImagesiftBot is an image-focused web crawler associated with ImageSift (linked to Hive). Its robots.txt token is ImagesiftBot. Public documentation is limited in places, so specifics that cannot be confidently sourced are marked partially verified rather than guessed.
- CCBot — Common Crawl crawler
CCBot is the crawler operated by Common Crawl to build its open, freely available web dataset. That dataset is widely reused as a training source by many AI projects. Common Crawl documents the crawler and its robots.txt token, and CCBot honours robots.txt.
- Bot intelligence
Deterministic categorisation of crawlers, search bots, and automation.
Sources and verification notes
- Webz.io crawler reference (token observed). Tokens omgilibot/omgili are observed; comprehensive official docs are limited, so some specifics are marked partially verified.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.