Measuring AI crawl coverage
AI crawl coverage is the share of your important URLs that declared AI crawlers have actually fetched in a window. Measuring it means joining a list of crawl-worthy pages to observed bot requests by token, then looking at which URLs were reached, how recently, and which were missed. It is a server-side measurement built from request logs, not from human analytics.
What coverage actually measures
Coverage answers a precise question: of the URLs you care about, which ones has a particular AI crawler fetched, and when? It is measured per token, because GPTBot, ClaudeBot, PerplexityBot, and others crawl independently; being reached by one says nothing about the others.
Start from a denominator: your canonical list of crawl-worthy URLs, usually your sitemap minus noindex and parameter duplicates. The numerator is the set of those URLs a token fetched in your window. Coverage is the ratio, but the useful output is the gap list, not the percentage.
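As a minimal sketch of that arithmetic, assuming you already have the sitemap paths, the noindex paths, and the paths one token fetched as plain Python sets (the function name and sample paths are hypothetical):

```python
def coverage_report(crawl_worthy, fetched):
    """Return (coverage ratio, gap list) for a single crawler token."""
    denominator = set(crawl_worthy)
    # Only fetches of URLs we care about count toward the numerator.
    numerator = denominator & set(fetched)
    gaps = sorted(denominator - numerator)
    ratio = len(numerator) / len(denominator) if denominator else 0.0
    return ratio, gaps


# Illustrative data only: sitemap minus noindex gives the denominator.
sitemap_urls = {"/", "/pricing", "/docs/setup", "/blog/launch"}
noindex_urls = {"/blog/launch"}
crawl_worthy = sitemap_urls - noindex_urls

# Paths one token fetched in the window; "/admin/login" is not crawl-worthy,
# so it is ignored rather than inflating the numerator.
gptbot_fetched = {"/", "/pricing", "/admin/login"}

ratio, gaps = coverage_report(crawl_worthy, gptbot_fetched)
print(f"coverage: {ratio:.0%}, gaps: {gaps}")
```

The gap list is returned alongside the ratio because it is the part you can act on.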
How to compute it from logs
Filter requests to those whose user-agent carries a declared AI token, then group by token and normalised path. For each path, keep the most recent fetch timestamp and a count. Left-join your sitemap URL set against that table: rows with no match are uncovered; rows whose newest fetch is older than your content's last change are stale.
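The filter, group, and left-join steps above could be sketched as follows. This assumes log lines have already been parsed into (timestamp, normalised path, user-agent) tuples and that the sitemap is a mapping of path to last-modified time; the token list, function names, and sample user-agent strings are illustrative, and matching the token as a substring of the user-agent is a simplification.

```python
from datetime import datetime, timezone

# Assumed set of declared tokens to track; extend as needed.
AI_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def token_of(user_agent):
    """Return the first declared AI token found in the user-agent, else None."""
    ua = user_agent.lower()
    for token in AI_TOKENS:
        if token.lower() in ua:
            return token
    return None


def aggregate(requests):
    """Group by (token, path), keeping (newest fetch, fetch count)."""
    table = {}
    for ts, path, user_agent in requests:
        token = token_of(user_agent)
        if token is None:
            continue  # not a declared AI crawler
        key = (token, path)
        if key in table:
            newest, count = table[key]
            table[key] = (max(newest, ts), count + 1)
        else:
            table[key] = (ts, 1)
    return table


def classify(sitemap, table, token):
    """Left-join sitemap paths against one token's fetches."""
    status = {}
    for path, last_modified in sitemap.items():
        row = table.get((token, path))
        if row is None:
            status[path] = "uncovered"  # no matching fetch
        elif row[0] < last_modified:
            status[path] = "stale"      # fetched before the last change
        else:
            status[path] = "covered"
    return status


def day(d):
    return datetime(2026, 6, d, tzinfo=timezone.utc)


# Simplified, made-up user-agent strings for illustration.
requests = [
    (day(1), "/pricing", "Mozilla/5.0; compatible; GPTBot/1.1"),
    (day(9), "/pricing", "Mozilla/5.0; compatible; GPTBot/1.1"),
    (day(3), "/docs/setup", "Mozilla/5.0; compatible; GPTBot/1.1"),
    (day(4), "/pricing", "Mozilla/5.0 (Windows NT 10.0) Chrome/125.0"),
]
sitemap = {"/pricing": day(5), "/docs/setup": day(7), "/about": day(2)}

table = aggregate(requests)
gptbot_status = classify(sitemap, table, "GPTBot")
claudebot_status = classify(sitemap, table, "ClaudeBot")
print(gptbot_status)
```

Running `classify` once per token keeps the results separate, so a page covered by one crawler is not mistaken for covered by all.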
Resolve paths consistently — strip tracking parameters, unify trailing slashes, and fold canonical equivalents — or you will undercount coverage because the same page appears under several keys.
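One way to build such a key, as a sketch using only the Python standard library. The tracking-parameter list and the canonical map are assumptions for illustration; substitute whatever your site actually uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Assumed tracking parameters; adjust to your own campaigns.
TRACKING_PREFIXES = ("utm_",)
TRACKING_KEYS = {"gclid", "fbclid"}


def normalise(url, canonical_map=None):
    """Reduce a URL or path to one comparison key."""
    parts = urlsplit(url)
    # Unify trailing slashes, keeping the root as "/".
    path = parts.path.rstrip("/") or "/"
    # Fold known canonical equivalents (hypothetical mapping).
    if canonical_map:
        path = canonical_map.get(path, path)
    # Drop tracking parameters and sort the rest for a stable key.
    kept = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if not k.lower().startswith(TRACKING_PREFIXES)
        and k.lower() not in TRACKING_KEYS
    )
    query = urlencode(kept)
    return f"{path}?{query}" if query else path


print(normalise("https://example.com/pricing/?utm_source=news"))
print(normalise("/index.html", {"/index.html": "/"}))
```

Apply the same function to both the sitemap URLs and the logged request paths, so the two sides of the join share a key.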
- Denominator: canonical crawl-worthy URLs (sitemap minus noindex/dupes)
- Numerator: those URLs fetched by the token in the window
- Track newest fetch per URL to separate covered-but-stale from never-covered
Reading the gaps
A page no AI crawler has reached usually has a discoverability problem: weak internal linking, a robots.txt block, or absence from the sitemap. A page covered by GPTBot but not ClaudeBot is normal, because crawl budgets and schedules differ per operator.
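To rule out the robots.txt cause for a gap list, one option is Python's standard `urllib.robotparser`. The sketch below parses robots.txt text you have already downloaded; the rules and paths are made up, and the parser's matching may differ in detail from how a given crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt, token, paths):
    """Return the paths that robots.txt disallows for this token."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(token, p)]


# Illustrative robots.txt: GPTBot is kept out of /docs/, others are not.
robots_txt = """\
User-agent: GPTBot
Disallow: /docs/

User-agent: *
Allow: /
"""

uncovered = ["/docs/setup", "/about"]
gptbot_blocked = blocked_paths(robots_txt, "GPTBot", uncovered)
claudebot_blocked = blocked_paths(robots_txt, "ClaudeBot", uncovered)
print(gptbot_blocked, claudebot_blocked)
```

Uncovered pages that are not blocked are the ones to check for weak internal linking or a missing sitemap entry.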
Coverage is necessary but not sufficient for AI visibility: a crawler reaching a page does not guarantee the content is used or surfaced. Treat coverage as the upstream signal and AI referrals or citations as the downstream one.
How it appears in analytics and logs
High coverage of priority URLs by a given token means that crawler has had the chance to read most of your content. Low or stale coverage means some pages may be absent from that AI system's view of your site, even if they rank well in classic search.
Diagnostic use case
Quantify how much of your site AI crawlers such as GPTBot and ClaudeBot have reached, find pages never fetched, and prioritise fixes for orphaned or slow-to-be-crawled URLs.
What WebmasterID can help detect
WebmasterID records AI-crawler requests server-side by token and URL, so you can see per-page coverage and recency for each crawler on the bot-intelligence and AI-visibility surfaces without parsing raw logs yourself.
Common mistakes
- Reporting one global coverage number instead of per-token coverage — crawlers differ.
- Counting a page as covered without checking how stale the last fetch is.
- Failing to normalise URLs, so fetches of the same page are split across path variants and coverage is undercounted.
- Assuming coverage equals AI visibility — being fetched is not the same as being cited.
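The first mistake is easy to see with numbers. In this sketch (hypothetical function names and made-up fetch sets), a blended figure reports full coverage while one crawler has seen a quarter of the site and another none of it:

```python
def per_token_coverage(crawl_worthy, fetched_by_token):
    """Coverage ratio for each token separately."""
    if not crawl_worthy:
        return {token: 0.0 for token in fetched_by_token}
    total = len(crawl_worthy)
    return {
        token: len(crawl_worthy & fetched) / total
        for token, fetched in fetched_by_token.items()
    }


def blended_coverage(crawl_worthy, fetched_by_token):
    """Share of URLs reached by at least one token (the misleading number)."""
    if not crawl_worthy:
        return 0.0
    reached = set().union(*fetched_by_token.values())
    return len(crawl_worthy & reached) / len(crawl_worthy)


crawl_worthy = {"/a", "/b", "/c", "/d"}
fetched_by_token = {
    "GPTBot": {"/a", "/b", "/c", "/d"},
    "ClaudeBot": {"/a"},
    "PerplexityBot": set(),
}

per_token = per_token_coverage(crawl_worthy, fetched_by_token)
blended = blended_coverage(crawl_worthy, fetched_by_token)
print(per_token, blended)
```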
Privacy and accuracy notes
Coverage is computed from bot requests identified by user-agent token, never from human sessions. No visitor identity is involved, and edge country is a coarse estimate only — a crawler is not a person.
Frequently asked questions
- Is one coverage percentage enough?
No. Coverage should be reported per crawler token, because GPTBot, ClaudeBot, and PerplexityBot crawl on independent schedules and budgets. A single blended number hides which AI systems can actually see your content.
Related pages
- Tracking GPTBot activity in logs
Tracking GPTBot means isolating requests whose user-agent carries the GPTBot token, verifying them against OpenAI's published IP ranges, then reporting which URLs were fetched, how often, and how recently. It is a server-side log exercise that should keep GPTBot out of human analytics and distinguish it from OpenAI's other tokens, ChatGPT-User and OAI-SearchBot.
- AI crawler traffic patterns
AI crawler activity often shows up as crawl waves — bursts as a vendor refreshes coverage — or as steadier background streams. Reading these patterns helps you interpret spikes correctly and, crucially, keep bot traffic separate from human analytics.
- AI visibility analytics
See which AI crawlers reach your site and which pages they cover, recorded server-side.
Sources and verification notes
- Google: Sitemaps overview. Sitemaps define the canonical URL set used as a coverage denominator.
- OpenAI: bots documentation. Documents AI crawler tokens used to attribute coverage per crawler.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.