How to block Sogou Spider

Sogou Spider is the web crawler for Sogou, a Chinese search engine. This page shows how to disallow it in robots.txt using its documented user-agent tokens, explains what blocking does and does not affect, and describes how to confirm that the rule is honoured.

Partially verified

robots.txt rule

Sogou's crawler identifies itself with user-agent tokens beginning with Sogou (for example, Sogou web spider). To disallow it site-wide, give the token its own group:

User-agent: Sogou web spider
Disallow: /
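Before deploying, the rule can be checked locally. The following is a minimal sketch using Python's standard urllib.robotparser; the user-agent strings and the example.com URL are illustrative assumptions, and Python's matching logic is not guaranteed to be identical to Sogou's own.

```python
from urllib.robotparser import RobotFileParser

# The rule from this page, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: Sogou web spider
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the parsed robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative user-agent values (assumptions; confirm against your own logs).
for agent in ("Sogou web spider/4.0", "Sogou inst spider/4.0", "Googlebot/2.1"):
    print(agent, "->", is_allowed(ROBOTS_TXT, agent, "https://example.com/page"))
```

With this rule, only the first agent is reported as disallowed. The second line illustrates how a differently named Sogou token would slip past a group that names just one token.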

Verify the exact token from your own access logs before committing, because Sogou operates more than one crawler token and a partial match may miss some of them. Match on the documented token, not a full version string.
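One way to see which Sogou tokens actually reach a site is to count them in the access log. This sketch assumes the common "combined" log format, where the user agent is the last quoted field; the sample lines, IP addresses, and token names are made up for illustration.

```python
import re
from collections import Counter

# In the combined log format the user agent is the last quoted field on the line.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')


def sogou_tokens(log_lines):
    """Count user-agent tokens (text before the first '/') that start with 'Sogou'."""
    counts = Counter()
    for line in log_lines:
        match = UA_FIELD.search(line)
        if not match:
            continue
        # Keep the product token only, dropping the version and any URL.
        token = match.group(1).split("/", 1)[0].strip()
        if token.lower().startswith("sogou"):
            counts[token] += 1
    return counts


# Hypothetical log lines (documentation-range IPs, invented paths).
SAMPLE = [
    '203.0.113.5 - - [01/Jan/2026:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"',
    '203.0.113.6 - - [01/Jan/2026:10:00:05 +0000] "GET /a HTTP/1.1" 200 512 "-" "Sogou inst spider/4.0"',
    '198.51.100.7 - - [01/Jan/2026:10:00:09 +0000] "GET /b HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '203.0.113.5 - - [01/Jan/2026:10:01:00 +0000] "GET /c HTTP/1.1" 200 512 "-" "Sogou web spider/4.0"',
]

print(sogou_tokens(SAMPLE))
```

Each distinct token in the result is a candidate for its own User-agent line in robots.txt.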

What blocking does and does not do

A Disallow asks compliant crawlers to stop fetching; it does not remove already-indexed pages and is not a firewall. If you need pages dropped from a search index, serve noindex on a page the crawler is still allowed to fetch. A robots.txt block on that page would stop the crawler from ever reading the directive.
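The conflict between a Disallow and a noindex can be detected mechanically. The sketch below, with hypothetical names (RobotsMetaParser, noindex_status), reads a page's robots meta tag and checks whether the same URL is blocked for a given user agent. It only checks readability; whether a particular crawler honours noindex is outside its scope.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def noindex_status(robots_txt, user_agent, url, html):
    """Classify whether a page's noindex can be read by the given crawler."""
    meta = RobotsMetaParser()
    meta.feed(html)
    has_noindex = "noindex" in meta.directives

    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    crawlable = rules.can_fetch(user_agent, url)

    if has_noindex and not crawlable:
        return "noindex hidden by robots.txt"
    if has_noindex:
        return "noindex readable"
    return "no noindex"


# Illustrative inputs (assumed page markup and URL).
RULE = "User-agent: Sogou web spider\nDisallow: /\n"
PAGE = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(noindex_status(RULE, "Sogou web spider/4.0", "https://example.com/old", PAGE))
```

A result of "noindex hidden by robots.txt" means the page would need to be unblocked for that crawler before the directive could be seen.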

Because any client can send the Sogou user agent, treat the user agent as a claim. If hits persist from outside expected networks, the source may be a non-compliant scraper rather than Sogou itself.
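A common way to test a user-agent claim is forward-confirmed reverse DNS: resolve the IP to a hostname, check the hostname's domain, then resolve the hostname back to the same IP. The sketch below is a generic version of that technique. The function name forward_confirmed is hypothetical, and the hostname suffix to pass for Sogou is an assumption that should be taken from Sogou's own documentation.

```python
import socket


def default_reverse(ip):
    """Reverse (PTR) lookup: IP address -> hostname."""
    return socket.gethostbyaddr(ip)[0]


def default_forward(hostname):
    """Forward lookup: hostname -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(hostname)[2]


def forward_confirmed(ip, allowed_suffixes,
                      reverse=default_reverse, forward=default_forward):
    """Return True if ip reverse-resolves to an allowed hostname
    that in turn resolves back to the same ip."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
        if not any(hostname.endswith(suffix) for suffix in allowed_suffixes):
            return False
        return ip in forward(hostname)
    except OSError:
        # No PTR record or a failed lookup: the claim cannot be confirmed.
        return False


# Example call (needs network access; the suffix is an assumption to verify):
# forward_confirmed("203.0.113.5", (".sogou.com",))
```

The lookups are passed in as parameters so the logic can be exercised offline with stand-in resolvers. A False result does not prove the client is hostile, only that the claim could not be confirmed.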

How it appears in analytics and logs

Continued Sogou Spider hits after a Disallow usually mean a token mismatch, a not-yet-refreshed robots.txt cache, or a non-compliant client copying the Sogou user agent.
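To tell a stale robots.txt cache apart from the other two causes, it can help to split the hits around the moment the crawler next fetched robots.txt. This is a rough sketch over already-parsed log entries; the tuple layout, the function name, and the sample data are assumptions for illustration.

```python
from datetime import datetime


def hits_after_robots_fetch(entries, rule_deployed_at):
    """Split claimed-Sogou page hits around the first robots.txt fetch
    made after the rule went live.

    entries: iterable of (timestamp, path, user_agent) tuples, in any order.
    Returns (refetched_at, hits_before_refetch, hits_after_refetch); only
    requests at or after rule_deployed_at are counted.
    """
    sogou = sorted(
        e for e in entries
        if "sogou" in e[2].lower() and e[0] >= rule_deployed_at
    )
    refetched_at = next((ts for ts, path, _ in sogou if path == "/robots.txt"), None)
    pages = [ts for ts, path, _ in sogou if path != "/robots.txt"]
    if refetched_at is None:
        # The crawler has not re-read robots.txt yet; a cached copy is plausible.
        return None, len(pages), 0
    before = sum(1 for ts in pages if ts < refetched_at)
    return refetched_at, before, len(pages) - before


# Hypothetical entries around a rule deployed at 12:00.
DEPLOYED = datetime(2026, 1, 1, 12, 0)
ENTRIES = [
    (datetime(2026, 1, 1, 11, 0), "/old", "Sogou web spider/4.0"),
    (datetime(2026, 1, 1, 13, 0), "/a", "Sogou web spider/4.0"),
    (datetime(2026, 1, 1, 14, 0), "/robots.txt", "Sogou web spider/4.0"),
    (datetime(2026, 1, 1, 15, 0), "/b", "Sogou web spider/4.0"),
    (datetime(2026, 1, 1, 15, 5), "/c", "Googlebot/2.1"),
]
print(hits_after_robots_fetch(ENTRIES, DEPLOYED))
```

Hits counted before the refetch are consistent with a cached robots.txt. Hits that continue after it point more towards a token mismatch or a client that only copies the Sogou user agent.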

Diagnostic use case

Reduce crawl load from Sogou Spider when your audience is not in the Chinese Sogou market, or keep specific sections out of Sogou's index, without affecting Google or Bing.

What WebmasterID can help detect

WebmasterID records Sogou Spider hits as search-bot events, so after adding a Disallow you can watch whether the crawler's activity actually tapers off. That decline is the practical signal that the rule is being honoured.

Common mistakes

Privacy and accuracy notes

Blocking Sogou Spider concerns a crawler, not a person. The rule matches a user-agent token and involves no visitor data; robots.txt is a request, not an access control.

Sources and verification notes

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.