How to block the Gigablast crawler
Gigablast was an independent search engine whose crawler, GigaBot, fetched public pages to build its index. The public search service has largely wound down, but the token can still appear in logs from residual or impersonating clients. This page shows the robots.txt rule and explains how to interpret leftover GigaBot activity.
What this means
Gigablast was a long-running independent search engine that ran its own crawler, GigaBot, to build its index. The public search service has effectively wound down, so meaningful indexing-driven crawling is unlikely. Despite that, the GigaBot token can still surface in logs.
When a token from a defunct service appears, the most likely explanations are residual infrastructure, archived crawl jobs, or another client reusing the recognisable token. Treat such hits as low-trust and verify behaviour rather than assuming a legitimate search crawl.
How to block it
To disallow GigaBot, target its token in its own group:
User-agent: GigaBot
Disallow: /
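One way to sanity-check the group above before deploying it is to parse it with Python's standard-library robots.txt parser. The sketch below is illustrative only: the `is_allowed` helper, the `example.com` URL, and the `SomeOtherBot` name are assumptions made for the example, and the standard-library parser's matching may differ in detail from how any particular crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# The GigaBot group from this page. No other groups are present,
# so every other crawler stays unrestricted in this sketch.
ROBOTS_TXT = """\
User-agent: GigaBot
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the parsed rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com and SomeOtherBot are placeholders, not real targets.
    print("GigaBot allowed:", is_allowed("GigaBot", "https://example.com/page"))
    print("SomeOtherBot allowed:", is_allowed("SomeOtherBot", "https://example.com/page"))
```

A check like this only confirms that the file says what you intended. It says nothing about whether a given client actually honours it.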
Given the service's status, allowing or blocking GigaBot has little practical effect on real search visibility. If GigaBot-tokened requests persist and ignore the rule, that strongly suggests impersonation rather than the original crawler, in which case a firewall or WAF rule is the appropriate control.
- robots.txt token to target: GigaBot
- Original Gigablast search service is largely defunct
- Persistent GigaBot hits likely mean impersonation — use a firewall
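To make the difference between asking and enforcing concrete, here is a minimal sketch of refusing GigaBot-tokened requests outright. It is an application-layer stand-in written as WSGI middleware, not a firewall or WAF rule. A real deployment would normally enforce this at the network or edge layer, where the syntax depends on the product in use. The names `block_gigabot` and `demo_app` are hypothetical.

```python
# Token to refuse, compared case-insensitively against the User-Agent header.
BLOCKED_TOKEN = "gigabot"


def block_gigabot(app):
    """Wrap a WSGI app so requests whose User-Agent contains the token get a 403."""

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if BLOCKED_TOKEN in user_agent.lower():
            body = b"Forbidden\n"
            start_response(
                "403 Forbidden",
                [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
            )
            return [body]
        # Anything else passes through to the wrapped application unchanged.
        return app(environ, start_response)

    return middleware


def demo_app(environ, start_response):
    """Hypothetical application used only to exercise the middleware."""
    body = b"ok\n"
    start_response(
        "200 OK",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]
```

Matching on the user-agent string alone stops only clients that keep sending the token. An impersonator can change it at any time, so this complements log review rather than replacing it.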
How it appears in analytics and logs
A request carrying the GigaBot token is attributed to Gigablast's crawler. Because the original service is largely defunct, current GigaBot hits may be residual infrastructure or a client impersonating the token, so treat them with extra scepticism.
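As a rough way to see whether the token still shows up, the sketch below counts GigaBot-tokened requests per client IP in an access log. It assumes the common "combined" log format, which your server may not use. The sample lines, the documentation-range IP addresses, and the `GigaBot/1.0` user-agent string are invented for illustration and are not real Gigablast traffic.

```python
import re
from collections import Counter

# Assumed combined log format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)


def gigabot_hits_by_ip(lines):
    """Count requests whose user-agent contains 'gigabot', keyed by client IP."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and "gigabot" in match.group("ua").lower():
            hits[match.group("ip")] += 1
    return hits


# Invented sample lines (documentation-range IPs, hypothetical UA string).
SAMPLE = [
    '203.0.113.7 - - [01/Jan/2025:00:00:01 +0000] "GET /robots.txt HTTP/1.1" 200 45 "-" "GigaBot/1.0"',
    '203.0.113.7 - - [01/Jan/2025:00:00:02 +0000] "GET /private HTTP/1.1" 200 512 "-" "GigaBot/1.0"',
    '198.51.100.9 - - [01/Jan/2025:00:00:03 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for ip, count in gigabot_hits_by_ip(SAMPLE).items():
        print(ip, count)
```

A source that keeps requesting pages after a Disallow rule is published is the kind of pattern worth handing to a firewall or WAF, since the count itself cannot tell a residual crawler from an impersonator.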
Diagnostic use case
Disallow GigaBot in robots.txt and understand why a crawler from a discontinued search engine might still show up in your logs.
What WebmasterID can help detect
WebmasterID classifies GigaBot server-side and surfaces its activity, so you can see whether the token still appears in your traffic and whether a robots.txt rule changes anything.
Common mistakes
- Assuming GigaBot activity reflects live search-engine indexing today.
- Trusting the token without confirming behaviour in logs.
- Counting residual crawler hits as human traffic.
Privacy and accuracy notes
Blocking Gigablast relies only on the request user-agent token. No human identity is involved. WebmasterID records the crawl as a bot event, separate from human analytics, and never attaches it to a visitor profile.
Related pages
- How to block Exabot (Exalead)
Exabot is the web crawler historically associated with Exalead, a search engine. Its crawler fetches public pages to build a search index. This page shows the robots.txt token to target, notes the crawler's search-engine origin, and explains why a Disallow steers only compliant fetchers.
- robots.txt vs a firewall/WAF
robots.txt and a firewall/WAF solve different problems: robots.txt politely asks compliant crawlers what to skip, while a firewall or WAF actually blocks requests at the network or edge layer. This page contrasts the two, explains when each is appropriate, and warns against using robots.txt for jobs only enforcement can do.
- Gigablast crawler (GigaBot)
Gigablast was an independent search engine, known for running its own web index and open-sourcing parts of its technology. Its crawler (associated with the GigaBot identity) fetched public pages to build that index. Gigablast's public search has wound down, so its crawler is largely a legacy token seen in historic logs rather than an active mainstream engine.
- Web crawler reference
How search crawlers, past and present, identify themselves.
Sources and verification notes
- Gigablast (open-source search engine project)
Gigablast's crawler GigaBot; the public search service is no longer actively operating.
- Robots Exclusion Protocol (RFC 9309)
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.