Datacenter traffic filtering
A large share of non-human traffic originates from datacenter and cloud-hosting IP ranges: scrapers, uptime monitors, and other automation that may not declare itself as a bot. Filtering on known datacenter ranges removes a class of noise that user-agent rules miss, but ranges change, and some legitimate users (people on VPNs or behind corporate proxies) also appear in them. This page covers the technique and its limits.
What this means
Bots, scrapers, uptime monitors, and headless automation overwhelmingly run in cloud and hosting environments, so their requests come from a relatively small set of datacenter IP ranges and autonomous system numbers (ASNs). Filtering by these ranges catches automation that does not announce itself in the user-agent string.
This complements user-agent and known-bot filtering: it targets origin network rather than declared identity.
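The core check is whether a request's address falls inside any known datacenter range. The sketch below uses Python's standard `ipaddress` module to do that. It assumes the range list is supplied from elsewhere. The prefixes in `EXAMPLE_DATACENTER_RANGES` are documentation-only placeholders (RFC 5737 and RFC 3849), not real cloud allocations. The names `load_ranges` and `is_datacenter_ip` are hypothetical.

```python
import ipaddress
from typing import Iterable, Union

IPNetwork = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Placeholder ranges: RFC 5737 / RFC 3849 documentation prefixes, NOT real
# datacenter allocations. A real deployment would load current ranges from
# provider-published lists or an IP-to-ASN dataset.
EXAMPLE_DATACENTER_RANGES = [
    "192.0.2.0/24",
    "198.51.100.0/24",
    "2001:db8::/32",
]


def load_ranges(cidrs: Iterable[str]) -> list[IPNetwork]:
    """Parse CIDR strings into network objects (malformed entries raise ValueError)."""
    return [ipaddress.ip_network(c) for c in cidrs]


def is_datacenter_ip(ip: str, ranges: list[IPNetwork]) -> bool:
    """Return True if the address falls inside any of the given ranges."""
    addr = ipaddress.ip_address(ip)
    # Membership is simply False when the IP version differs from the
    # network's version, so one mixed IPv4/IPv6 list can be scanned directly.
    return any(addr in net for net in ranges)


ranges = load_ranges(EXAMPLE_DATACENTER_RANGES)
print(is_datacenter_ip("198.51.100.7", ranges))  # True
print(is_datacenter_ip("203.0.113.9", ranges))   # False
```

A linear scan is fine for a short list. Thousands of ranges would call for a sorted interval lookup or a prefix tree instead.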
Limits of the approach
Datacenter ranges are constantly reassigned and expanded, so static lists go stale. Some legitimate human traffic (corporate VPNs, privacy proxies, mobile carrier gateways) also originates from such ranges, so blanket filtering risks dropping real visitors. Treat datacenter origin as a strong signal, not a verdict, and combine it with other classification signals.
- Most undeclared bots run on cloud/hosting networks
- Ranges change; static lists decay
- VPNs and proxies put some real users in these ranges
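One way to treat datacenter origin as a signal rather than a verdict is to give it a weight in a combined score. The sketch below is a minimal illustration. All weights, the threshold, and the `vpn_or_proxy` flag are hypothetical assumptions, not tuned values or a documented WebmasterID method. It shows that datacenter origin alone does not cross the threshold, while datacenter origin plus a second signal does.

```python
from dataclasses import dataclass

# Hypothetical weights and threshold: illustrative only, not tuned on real traffic.
DATACENTER_WEIGHT = 0.5
DECLARED_BOT_WEIGHT = 1.0
NO_JS_WEIGHT = 0.3
VPN_DISCOUNT = 0.4  # weakens the datacenter signal for known VPN/proxy networks
BOT_THRESHOLD = 0.7


@dataclass
class RequestSignals:
    datacenter_origin: bool  # address matched a datacenter range or ASN
    declared_bot_ua: bool    # user-agent matched a known-bot rule
    ran_javascript: bool     # a client-side beacon executed
    vpn_or_proxy: bool       # hypothetical flag: network is a known VPN/corporate proxy


def bot_score(s: RequestSignals) -> float:
    """Sum weighted evidence; no single weak signal reaches the threshold."""
    score = 0.0
    if s.declared_bot_ua:
        score += DECLARED_BOT_WEIGHT
    if s.datacenter_origin:
        dc = DATACENTER_WEIGHT
        if s.vpn_or_proxy:
            dc -= VPN_DISCOUNT
        score += dc
    if not s.ran_javascript:
        score += NO_JS_WEIGHT
    return score


def classify(s: RequestSignals) -> str:
    return "bot" if bot_score(s) >= BOT_THRESHOLD else "human"


# Datacenter origin alone, with JavaScript running: still counted as human.
print(classify(RequestSignals(True, False, True, False)))   # human
# Datacenter origin plus no JavaScript: crosses the threshold.
print(classify(RequestSignals(True, False, False, False)))  # bot
```

The VPN discount is where the "real users in these ranges" limit shows up. Without it, the same request would have the same score whether it came from a scraper's server or a corporate gateway.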
How it appears in analytics and logs
Concentrated traffic from cloud or hosting ASNs is a strong non-human signal, but not absolute — corporate VPNs and proxies can place real users in datacenter ranges.
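Spotting concentration amounts to computing each ASN's share of requests and flagging hosting ASNs above some share. The sketch below assumes each request has already been mapped to an ASN at ingest. It uses private-use ASNs (RFC 6996) as stand-ins for real hosting providers. The `HOSTING_ASNS` set, the `min_share` default, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable

# Placeholder ASNs from the private-use range (RFC 6996), standing in for
# real hosting-provider ASNs.
HOSTING_ASNS = {64512, 64513}


def asn_shares(request_asns: Iterable[int]) -> dict[int, float]:
    """Fraction of requests attributed to each ASN."""
    counts = Counter(request_asns)
    total = sum(counts.values())
    return {asn: n / total for asn, n in counts.items()} if total else {}


def concentrated_hosting_asns(request_asns: Iterable[int], min_share: float = 0.2) -> list[int]:
    """Hosting ASNs whose share of requests meets or exceeds min_share."""
    shares = asn_shares(request_asns)
    return sorted(
        asn for asn, share in shares.items()
        if asn in HOSTING_ASNS and share >= min_share
    )


# 64999 stands in for a residential ISP network.
sample = [64512] * 30 + [64513] * 5 + [64999] * 65
print(concentrated_hosting_asns(sample))  # [64512]
```

Because the input is ASNs rather than addresses, this check works on already-aggregated data. A flagged ASN is a prompt for review, not proof that every request from it is automated.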
Diagnostic use case
Reduce automated noise by recognizing datacenter-origin traffic, while understanding that IP ranges shift and a few real users route through them.
What WebmasterID can help detect
WebmasterID's bot intelligence classifies datacenter-origin and automated traffic so it is separated from human totals without you maintaining IP lists.
Common mistakes
- Treating every datacenter IP as definitively a bot.
- Relying on a static IP list that goes stale.
- Storing raw IPs instead of aggregating by network.
Privacy and accuracy notes
IP-based filtering must avoid storing or exposing raw addresses; aggregate by network/ASN, not by individual IP. WebmasterID classifies bots without retaining raw IPs.
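Aggregating by network can start with truncating each address to a prefix before anything is written to storage. The sketch below uses the standard `ipaddress` module for that step. The /24 (IPv4) and /48 (IPv6) lengths are assumed, commonly used choices, not a requirement from any standard. The function names are hypothetical. Whether a given prefix length is coarse enough for your privacy requirements is a policy decision that this sketch does not settle.

```python
import ipaddress
from collections import Counter
from typing import Iterable

# Assumed prefix lengths; adjust to your own privacy policy.
IPV4_PREFIX = 24
IPV6_PREFIX = 48


def to_network_key(ip: str) -> str:
    """Reduce an address to its enclosing network, e.g. 198.51.100.77 -> 198.51.100.0/24."""
    addr = ipaddress.ip_address(ip)
    prefix = IPV4_PREFIX if addr.version == 4 else IPV6_PREFIX
    # strict=False zeroes the host bits instead of raising ValueError.
    return str(ipaddress.ip_network(f"{addr}/{prefix}", strict=False))


def aggregate_by_network(ips: Iterable[str]) -> Counter:
    """Count requests per network; only the network keys leave this function."""
    return Counter(to_network_key(ip) for ip in ips)


print(to_network_key("2001:db8:1234:5678::1"))  # 2001:db8:1234::/48
print(aggregate_by_network(["198.51.100.1", "198.51.100.2", "203.0.113.5"]))
```

The raw address exists only transiently in memory at ingest. Reports and stored counts see only the network key.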
Related pages
- Bot traffic in analytics: filtering it out
Bots — crawlers, scrapers, monitors, scanners — generate requests that, unfiltered, inflate pageviews and distort every metric. Client-side analytics often misses bots (many do not run JavaScript) or miscounts the ones that do. Server-side classification at ingest is the reliable way to keep bot traffic out of human reports.
- IP filtering pitfalls
Filtering out internal or unwanted traffic by IP address is intuitive but fragile: residential IPs are dynamic, mobile and shared networks sit behind carrier-grade NAT, IPv4 rules do not carry over to IPv6 prefixes, and privacy relays mask the real address. As a result, IP filters silently stop matching or match the wrong people. That page details the pitfalls of IP-based filtering.
- Dark traffic in analytics
Dark traffic (or dark social) is genuine human traffic whose source is lost, so it falls into the Direct bucket. It comes from links opened inside apps and messaging clients, email programs, documents, and secure-to-insecure (HTTPS to HTTP) transitions that strip the Referer header. The result is an inflated Direct channel that hides real acquisition. That page explains the mechanisms that erase the referrer.
- Bot Intelligence
Classify datacenter and automated traffic.
Sources and verification notes
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.