Crawler IP verification methods
Because user-agent strings are trivially copied, the reliable way to confirm a crawler is to verify the source IP of its requests. The two documented methods are reverse DNS with a forward-confirm step and matching the source IP against the engine's published IP ranges. Together they defend against spoofed crawler traffic.
Reverse DNS with forward-confirm
The reverse-DNS method looks up the source IP to get a hostname, then confirms that hostname belongs to the engine — for example googlebot.com or google.com for Google, search.msn.com for Bing, or a Yandex domain for Yandex. The essential second step is a forward lookup on that hostname, confirming it resolves back to the original IP.
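The forward-confirm step matters because a PTR record is set by whoever controls the reverse-DNS zone for that IP, so an attacker can make their own address claim any hostname, including one under the engine's domain. The forward lookup is answered by the engine's own DNS, which the attacker does not control, so requiring both directions to agree closes that gap.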
- Reverse lookup the IP to a hostname
- Confirm the hostname is the engine's domain
- Forward lookup the hostname back to the original IP
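The three steps above can be sketched in Python using only the standard library. This is a minimal illustration, not a reference implementation: the function name verify_by_reverse_dns, the injectable reverse and forward parameters, and the suffix tuple are assumptions made for the example. The suffixes shown are the Google domains named above; other engines need their own.

```python
import ipaddress
import socket
from typing import Callable, Iterable, Set

# Hostname suffixes for Google as named in this article. The leading dot
# prevents "evilgooglebot.com" from matching. Other engines need their own list.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def _reverse(ip: str) -> str:
    """PTR lookup: IP -> hostname. Raises OSError when no record exists."""
    return socket.gethostbyaddr(ip)[0]


def _forward(hostname: str) -> Set[str]:
    """A/AAAA lookup: hostname -> set of IP address strings."""
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}


def verify_by_reverse_dns(
    ip: str,
    allowed_suffixes: Iterable[str] = GOOGLE_SUFFIXES,
    reverse: Callable[[str], str] = _reverse,          # hypothetical hook for testing
    forward: Callable[[str], Set[str]] = _forward,     # hypothetical hook for testing
) -> bool:
    try:
        target = ipaddress.ip_address(ip)
    except ValueError:
        return False

    # Step 1: reverse lookup the IP to a hostname.
    try:
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:
        return False

    # Step 2: confirm the hostname is under the engine's domain.
    if not hostname.endswith(tuple(allowed_suffixes)):
        return False

    # Step 3: forward lookup the hostname and require the original IP.
    try:
        addresses = forward(hostname)
    except OSError:
        return False
    for candidate in addresses:
        try:
            if ipaddress.ip_address(candidate) == target:
                return True
        except ValueError:
            continue
    return False
```

Comparing parsed address objects rather than raw strings avoids false negatives when the same IPv6 address is written in two different forms. In production the DNS answers would normally be cached, since each uncached check costs two lookups.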
Published IP-range matching
Several engines publish their crawler IP ranges, sometimes as downloadable, regularly updated lists. Matching a request's source IP against the current published ranges confirms it originates from the engine, without a per-request DNS lookup.
The trade-off is maintenance: published ranges change, so you must refresh them rather than hardcoding addresses. Never paste raw IPs into documentation or rules as permanent facts — treat the engine's published list as the live source of truth. Many operators combine IP matching with reverse DNS for defence in depth.
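As a rough sketch of range matching, the Python below parses an already-downloaded range list and tests an address against it. The JSON shape (a "prefixes" array whose entries carry "ipv4Prefix" or "ipv6Prefix") is an assumption that should be checked against the engine's actual file. The function names are hypothetical, and the sample prefixes are reserved documentation ranges, not real crawler addresses.

```python
import ipaddress
from typing import List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Placeholder data in the ASSUMED published format. These are documentation
# ranges (TEST-NET-1 and 2001:db8::/32), not real crawler ranges. In practice
# this dict would come from the engine's current published list, refreshed
# on a schedule.
SAMPLE_PUBLISHED = {
    "prefixes": [
        {"ipv4Prefix": "192.0.2.0/24"},
        {"ipv6Prefix": "2001:db8::/32"},
    ]
}


def load_networks(published: dict) -> List[Network]:
    """Turn the published prefix entries into network objects."""
    networks: List[Network] = []
    for entry in published.get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr:
            networks.append(ipaddress.ip_network(cidr))
    return networks


def ip_in_published_ranges(ip: str, networks: List[Network]) -> bool:
    """True when the address falls inside any published prefix."""
    addr = ipaddress.ip_address(ip)
    # Membership is False when address and network versions differ.
    return any(addr in net for net in networks)
```

Parsing the list once and reusing the network objects keeps the per-request cost to a few in-memory comparisons, which is the advantage over a DNS round trip.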
How it appears in analytics and logs
A user-agent string is a claim that anyone can copy. IP verification establishes whether the request actually came from the engine's infrastructure. A crawler claim that fails both reverse-DNS and IP-range checks is spoofed.
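One way to apply this when reading logs is to label each request by whether it claims to be a crawler and whether any check confirms the claim. The sketch below is illustrative only: the token table, the function names, and the three labels are assumptions for the example. Each check is any callable taking an IP and an engine name, such as a reverse-DNS or IP-range verifier.

```python
from typing import Callable, Iterable, Optional

# User-agent tokens for the crawlers named in this article. The mapping to
# engine names is illustrative, not an exhaustive or official list.
CRAWLER_TOKENS = {"googlebot": "google", "bingbot": "bing"}


def claimed_engine(user_agent: str) -> Optional[str]:
    """Return the engine a user agent claims to be, or None if it claims none."""
    ua = user_agent.lower()
    for token, engine in CRAWLER_TOKENS.items():
        if token in ua:
            return engine
    return None


def classify_request(
    ip: str,
    user_agent: str,
    checks: Iterable[Callable[[str, str], bool]],
) -> str:
    """Label a request as 'unclaimed', 'verified', or 'spoofed'."""
    engine = claimed_engine(user_agent)
    if engine is None:
        return "unclaimed"   # no crawler claim to verify
    if any(check(ip, engine) for check in checks):
        return "verified"    # at least one source check confirms the claim
    return "spoofed"         # claim made, every check failed
```

Treating a single passing check as sufficient follows the "reverse DNS or published IP ranges" framing used here. A stricter policy could require every check to pass.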
Diagnostic use case
Confirm a crawler's authenticity before trusting it, using reverse DNS or published IP ranges, so spoofed user agents cannot drive your decisions.
What WebmasterID can help detect
WebmasterID classifies crawlers server-side and distinguishes verified crawlers from spoofed lookalikes, so the verification logic runs once centrally instead of on every request you inspect by hand.
Common mistakes
- Skipping the forward-confirm step, which makes reverse DNS spoofable.
- Hardcoding IP addresses instead of refreshing the engine's published ranges.
- Trusting the user agent alone for decisions where authenticity matters.
Privacy and accuracy notes
Verification inspects the request's own IP and DNS records, which belong to crawler infrastructure, not to any human visitor. WebmasterID applies verification to classify bots and never builds human profiles from it.
Related pages
- How to verify Googlebot
The Googlebot user agent is widely spoofed, so a request claiming to be Googlebot should be verified, not trusted. Google documents two methods: a reverse-DNS check that resolves into googlebot.com or google.com confirmed by a matching forward lookup, and matching the source IP against Google's published crawler IP ranges.
- Fake search-bot traffic
Because search-engine crawlers are widely allowed, abusive clients copy the Googlebot or Bingbot user-agent string to slip past rules meant for real crawlers. This fake search-bot traffic is identified by verifying the source: genuine crawlers pass reverse-DNS and published-IP checks, spoofed ones do not.
- Bot intelligence
Verified crawler classification separated from human traffic.
Sources and verification notes
- Google: Verifying Googlebot and other crawlers. Documents reverse-DNS verification and published crawler IP ranges.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.