SimilarWeb crawler

SimilarWeb is a digital-intelligence company whose crawler fetches publicly accessible web pages as one input to its market-research, traffic-estimation, and competitive-analytics products. It is a data-collection crawler, not a search engine: it gathers signals about websites rather than building a public search index. SimilarWeb publishes a self-identifying crawler user-agent and a page describing the bot so operators can recognise and control it.

Verified against primary sources

What this means

SimilarWeb sells digital market intelligence: estimated traffic, audience, and competitive metrics for websites and apps. Its crawler is one of many inputs into those estimates, fetching publicly accessible pages to read their structure and public signals.

This is not a search engine. SimilarWeb does not publish a consumer search index of your pages; it aggregates data for its analytics customers. Treat it as a research/data-collection crawler distinct from Googlebot or Bingbot.

How it identifies itself

SimilarWeb documents a self-identifying crawler user-agent containing a SimilarWeb token and a URL pointing at its bot documentation. Match on the stable token rather than a full version string, which changes over time.

As with any user-agent, the string is a claim and can be copied. Where authenticity matters, corroborate with request patterns rather than trusting the string alone.

Operator: SimilarWeb (digital market intelligence)
User agent contains a SimilarWeb token plus a bot-info URL
Purpose: data collection for analytics, not public search
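
The token-matching advice above can be sketched in Python. This is a minimal illustration, not SimilarWeb's or WebmasterID's implementation: the token value "SimilarWeb" and the sample user-agent string are assumptions made for the example, and the exact token should be taken from SimilarWeb's bot documentation.

```python
import re
from typing import Optional

# Assumption: "SimilarWeb" stands in for the documented token. Confirm the
# exact string in SimilarWeb's bot documentation before relying on it.
ASSUMED_TOKEN = "SimilarWeb"

# Match the token anywhere in the header, ignoring case, so that version
# numbers and surrounding text can change without breaking the check.
_TOKEN_RE = re.compile(re.escape(ASSUMED_TOKEN), re.IGNORECASE)


def claims_similarweb(user_agent: Optional[str]) -> bool:
    """Return True if the user-agent string contains the assumed token.

    This only reports what the request claims; it does not prove origin.
    """
    if not user_agent:
        return False
    return _TOKEN_RE.search(user_agent) is not None


# Hypothetical user-agent strings, for illustration only.
print(claims_similarweb("Mozilla/5.0 (compatible; SimilarWeb/1.0; +https://example.com/bot)"))
print(claims_similarweb("Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0"))
```

Matching on the token alone keeps the check stable across version changes, but a positive result is still only a claim made by the client.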

robots.txt considerations

SimilarWeb states its crawler honours robots.txt. To disallow it, target the documented SimilarWeb token:

User-agent: SimilarWeb
Disallow: /

Use the exact token published in SimilarWeb's bot documentation. robots.txt is a request honoured by compliant crawlers, not an access-control mechanism, and blocking the crawler does not remove your site from any estimates already derived from other inputs.
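
Before deploying a rule like this, it can help to check how a standards-following parser reads it. The sketch below uses Python's standard-library `urllib.robotparser`; the token "SimilarWeb", the sample robots.txt body, and the example.com URL are assumptions for illustration. It shows how one compliant parser interprets the file, not how SimilarWeb's crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# Assumption: "SimilarWeb" stands in for the documented token.
ASSUMED_TOKEN = "SimilarWeb"

# Hypothetical robots.txt: block the assumed token, allow everyone else.
ROBOTS_TXT = """\
User-agent: SimilarWeb
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if this robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Pass the product token, not a full header: robotparser compares the
    # rule's token against the text before the first "/" in this argument.
    return parser.can_fetch(user_agent, url)


print(is_allowed(ROBOTS_TXT, ASSUMED_TOKEN, "https://example.com/pricing"))
print(is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/pricing"))
```

A check like this only confirms that the file says what you intend. Whether a given crawler obeys it is still up to that crawler.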

How it appears in analytics and logs

A request carrying SimilarWeb's crawler identity means a web-intelligence platform fetched your page as one input to its analytics. It is data-collection bot traffic, not a human visit and not a search-index crawl; sustained activity reflects market-research coverage, not audience.
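
As a rough illustration of separating this traffic in raw logs, the sketch below tallies the paths requested by user-agents carrying the token. It assumes access logs in the common "combined" format, an assumed token of "similarweb", and hypothetical sample lines; adjust the pattern to your server's actual log format.

```python
import re
from collections import Counter

# Assumption: lowercase form of the documented token, used for matching.
ASSUMED_TOKEN = "similarweb"

# Combined log format tail: "METHOD path PROTO" status size "referer" "user-agent"
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def crawler_paths(lines):
    """Count requests per path for log lines whose user-agent has the token."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match and ASSUMED_TOKEN in match.group("ua").lower():
            hits[match.group("path")] += 1
    return hits


# Hypothetical log lines, for illustration only.
SAMPLE_LINES = [
    '203.0.113.7 - - [24/Jun/2026:10:00:00 +0000] "GET /pricing HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; SimilarWeb/1.0; +https://example.com/bot)"',
    '198.51.100.4 - - [24/Jun/2026:10:00:05 +0000] "GET /pricing HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0"',
    '203.0.113.7 - - [24/Jun/2026:10:00:09 +0000] "GET /about HTTP/1.1" 200 300 "-" '
    '"Mozilla/5.0 (compatible; SimilarWeb/1.0; +https://example.com/bot)"',
]

for path, count in sorted(crawler_paths(SAMPLE_LINES).items()):
    print(path, count)
```

Lines that do not match the pattern or the token are skipped, so the tally covers only requests that claim the crawler identity and can be excluded from human session counts.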

Diagnostic use case

Recognise SimilarWeb's web-intelligence crawler in logs, separate it from search indexing and SEO link crawlers, and decide robots.txt policy for a market-research data collector.

What WebmasterID can help detect

WebmasterID classifies the SimilarWeb crawler server-side as a web-intelligence data collector and surfaces its activity on the bot-intelligence surface, so you can see which pages it reached without parsing raw logs.

Common mistakes

Treating SimilarWeb's crawler as a search engine that indexes your pages for end users.
Counting market-intelligence crawl hits as human sessions in analytics.
Guessing the robots.txt token instead of using SimilarWeb's documented one.

Privacy and accuracy notes

Identification uses only the request user-agent. A crawler is not a person, and no visitor identity is involved. WebmasterID records the fetch as a bot event, separate from human analytics, and never attaches it to a profile.

Sources and verification notes

SimilarWeb crawler / bot information: self-identifying crawler user-agent and robots.txt guidance documented.

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.