How to block the SimilarWeb crawler

SimilarWeb operates a crawler that gathers public web data for its market-intelligence and traffic-estimation products. It is a declared crawler with a documented robots.txt token, so operators who do not want their pages crawled for competitive-analytics datasets can disallow it. This page shows the token to target and the rule to use.

Partially verified

What this means

Part of SimilarWeb's data collection involves a crawler that fetches public pages. If you do not want your content used for competitive-analytics datasets, you can ask the crawler to stay out via robots.txt.

A block does not remove your site from estimates SimilarWeb derives from other data sources; it only asks the crawler to stop fetching your pages directly. robots.txt is a request honoured by compliant crawlers, not an access-control mechanism.

How to block it

Target the SimilarWeb crawler token in its own user-agent group. Match on the stable token rather than a full version string, because the version component changes over time.

User-agent: SimilarWebBot
Disallow: /

Place this group alongside your other rules. Because robots.txt is advisory, verify in your logs that requests carrying the token stop after the change; if they continue, the source may be a non-compliant client impersonating the token, which a firewall rule would address instead.
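Before deploying, it can help to check that the group parses the way you expect. The following is a minimal sketch using Python's standard-library urllib.robotparser. The catch-all group, the example.com URLs, the OtherBot name, and the is_allowed helper are assumptions added for illustration; they are not part of SimilarWeb's documentation or of this page's rule.

```python
from urllib.robotparser import RobotFileParser

# The group from this page, plus a hypothetical catch-all group for contrast.
ROBOTS_TXT = """\
User-agent: SimilarWebBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, token: str, url: str) -> bool:
    """Return True if the robots.txt text permits `token` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(token, url)


if __name__ == "__main__":
    # The SimilarWebBot group disallows everything for that token.
    print(is_allowed(ROBOTS_TXT, "SimilarWebBot", "https://example.com/pricing"))
    # Other crawlers fall through to the catch-all group.
    print(is_allowed(ROBOTS_TXT, "OtherBot", "https://example.com/pricing"))
```

This only confirms how one parser reads the file. It says nothing about whether a given client actually obeys it, which is why the log check still matters.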

How it appears in analytics and logs

A request carrying the SimilarWebBot token is, on its face, SimilarWeb's crawler fetching a URL for analytics datasets, not a human visit. Treat it as bot traffic: sustained activity from the token reflects crawl coverage, not audience. The user agent is only a claim, however, so the token alone does not prove that a request came from SimilarWeb.
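As a rough illustration of separating this traffic from audience counts, the sketch below matches on the token case-insensitively and ignores any version suffix. The user-agent strings are hypothetical examples made up for this sketch, not SimilarWeb's documented format, and carries_token and split_hits are illustrative helper names.

```python
TOKEN = "SimilarWebBot"


def carries_token(user_agent: str, token: str = TOKEN) -> bool:
    """Case-insensitive substring match on the stable token, ignoring versions."""
    return token.lower() in user_agent.lower()


def split_hits(user_agents):
    """Return (token_hits, other_hits) for an iterable of user-agent strings."""
    agents = list(user_agents)
    token_hits = sum(1 for ua in agents if carries_token(ua))
    return token_hits, len(agents) - token_hits


if __name__ == "__main__":
    # Hypothetical user-agent strings, for illustration only.
    sample = [
        "Mozilla/5.0 (compatible; SimilarWebBot/1.0)",
        "Mozilla/5.0 (compatible; similarwebbot/2.7)",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/126.0",
    ]
    print(split_hits(sample))
```

A match here means only that the request claims the token. It labels the hit as bot traffic for reporting purposes and does not authenticate the sender.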

Diagnostic use case

Stop SimilarWeb from crawling your site for its market-intelligence datasets, and confirm in your logs whether the crawler has honoured the rule.
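If you are working from raw logs, one way to run this check is to count token-carrying requests per day and see whether any arrive after the day the rule went live. This is a sketch under stated assumptions: it expects access-log lines in the common "combined" format with the user agent as the last quoted field, and the sample lines, IP addresses, dates, and function names are invented for illustration.

```python
import re
from collections import Counter
from datetime import date, datetime

# Assumed layout: ... [20/Jun/2026:10:15:32 +0000] "GET / HTTP/1.1" 200 512 "-" "UA"
LINE_RE = re.compile(r'\[(?P<ts>[^\]]+)\].*"(?P<ua>[^"]*)"\s*$')


def daily_token_hits(lines, token="SimilarWebBot"):
    """Count requests per calendar day whose user agent carries the token."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match or token.lower() not in match.group("ua").lower():
            continue
        stamp = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        counts[stamp.date()] += 1
    return counts


def hits_after(counts, change_day):
    """Total token-carrying requests on days strictly after the rule change."""
    return sum(n for day, n in counts.items() if day > change_day)


if __name__ == "__main__":
    # Invented sample lines; the user-agent strings are hypothetical.
    sample = [
        '203.0.113.5 - - [20/Jun/2026:10:15:32 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; SimilarWebBot/1.0)"',
        '203.0.113.5 - - [21/Jun/2026:11:02:10 +0000] "GET /b HTTP/1.1" 200 734 "-" "Mozilla/5.0 (compatible; SimilarWebBot/1.0)"',
        '198.51.100.7 - - [23/Jun/2026:09:00:00 +0000] "GET / HTTP/1.1" 200 900 "-" "Mozilla/5.0 (Windows NT 10.0) Firefox/126.0"',
    ]
    counts = daily_token_hits(sample)
    print(dict(counts))
    print(hits_after(counts, date(2026, 6, 22)))
```

A result of zero after the change day is consistent with the rule being honoured. Crawlers may cache robots.txt for a while, so allow some time before treating a non-zero count as non-compliance or impersonation.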

What WebmasterID can help detect

WebmasterID classifies the SimilarWeb crawler server-side and surfaces its activity on the bot-intelligence surface, so you can confirm whether a robots.txt block is being respected without parsing raw server logs.

Common mistakes

Privacy and accuracy notes

Blocking SimilarWeb relies only on the request user-agent token. No human identity is involved — a crawler is not a person. WebmasterID records the crawl as a bot event, separate from human analytics, and never attaches it to a visitor profile.

Sources and verification notes

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.