Most analytics tools were designed before AI assistants started reading the web on their own schedule. The default behavior — count everything as a page view, treat every user-agent as a human — gives you a misleading picture if your site is being indexed by GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Applebot, Bingbot, and friends.
For a publisher or a content network, AI crawler activity is its own signal. It tells you which models are interested in your content, how often they re-crawl, and where they concentrate. That is information you can act on — robots policy, sitemap priority, editorial decisions about what to publish behind a login.
Two-track ingestion
WebmasterID classifies every incoming event at the edge of the ingest API. Requests with a recognised AI/search-bot user-agent are written to a separate bot_visits table; everything else goes to the main events table. Human aggregates stay clean by construction; the AI side is queryable as a first-class signal.
The detector list lives in @webmasterid/ai-visibility; the full set of recognised crawlers is documented on the AI visibility page.
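A minimal sketch of the two-track split at the ingest edge, assuming a naive substring detector stands in for the package's real list. The table writes are stubbed as in-memory arrays, and every name here is illustrative rather than the actual API:

```typescript
// Sketch only: the real detector lives in @webmasterid/ai-visibility and the
// real pipeline writes to bot_visits / events tables, not arrays.

type IngestEvent = {
  userAgent: string;
  url: string;
  referrer?: string;
  receivedAt: string; // ISO timestamp
};

// Illustrative stand-in for the detector list.
const AI_CRAWLER_PATTERNS = [
  "GPTBot",
  "ClaudeBot",
  "PerplexityBot",
  "Google-Extended",
  "Applebot",
  "Bingbot",
];

function detectAiCrawler(userAgent: string): string | null {
  return AI_CRAWLER_PATTERNS.find((p) => userAgent.includes(p)) ?? null;
}

// Stand-ins for the bot_visits and events tables.
const botVisits: Array<IngestEvent & { bot: string }> = [];
const events: IngestEvent[] = [];

export function ingest(event: IngestEvent): void {
  const bot = detectAiCrawler(event.userAgent);
  if (bot) {
    // AI/search crawler: kept out of human aggregates, stored as its own signal.
    botVisits.push({ ...event, bot });
  } else {
    events.push(event);
  }
}
```

The point of the split is that it happens once, at write time: human aggregates never need to filter bots out after the fact.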
AI referrals are the other half
When a real human clicks a citation in ChatGPT, Claude, or Perplexity, the resulting visit looks like any other browser page view — except the referrer is recognisable. WebmasterID tags those events with traffic_category = ai_referral so you can size demand from each AI surface without losing the human dimension.
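A sketch of that referrer-based tagging, assuming the referrer hostname is enough to attribute a visit to an AI surface. The hostname map and category names are illustrative:

```typescript
// Hypothetical referrer-to-surface map; the real list is maintained server-side.
const AI_REFERRER_HOSTS: Record<string, string> = {
  "chat.openai.com": "ChatGPT",
  "chatgpt.com": "ChatGPT",
  "claude.ai": "Claude",
  "perplexity.ai": "Perplexity",
  "www.perplexity.ai": "Perplexity",
};

type TrafficCategory = "ai_referral" | "organic" | "direct";

function classifyReferral(referrer: string | undefined): {
  trafficCategory: TrafficCategory;
  aiSurface?: string;
} {
  if (!referrer) return { trafficCategory: "direct" };
  try {
    const surface = AI_REFERRER_HOSTS[new URL(referrer).hostname];
    if (surface) return { trafficCategory: "ai_referral", aiSurface: surface };
  } catch {
    // Malformed referrer: fall through to the default bucket.
  }
  return { trafficCategory: "organic" };
}

// Example: a click on a Perplexity citation.
classifyReferral("https://www.perplexity.ai/search?q=example");
// => { trafficCategory: "ai_referral", aiSurface: "Perplexity" }
```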
What you can do with this data
Three concrete uses, each on a different timescale:
- Editorial: which articles are being re-crawled by which AI surfaces, and which ones generate human AI referrals (a roll-up sketch follows this list).
- SEO/AEO operations: robots policy and sitemap priority informed by actual crawler behaviour, not guessed defaults.
- Architecture: for AI-native sites, knowing which surfaces are reading you helps shape the structured-data and machine-readability investments that pay off.
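For the editorial case, the roll-up amounts to joining per-URL crawl counts with per-URL AI-referral counts. A sketch under assumed row shapes; the real tables carry more fields than shown here:

```typescript
// Illustrative row shapes for the two tracks described above.
type BotVisitRow = { url: string; bot: string };
type ReferralRow = { url: string; aiSurface: string };

type ArticleSignal = {
  url: string;
  crawlsByBot: Record<string, number>;
  referralsBySurface: Record<string, number>;
};

function editorialRollup(
  botVisits: BotVisitRow[],
  aiReferrals: ReferralRow[],
): ArticleSignal[] {
  const byUrl = new Map<string, ArticleSignal>();
  const row = (url: string): ArticleSignal => {
    let r = byUrl.get(url);
    if (!r) {
      r = { url, crawlsByBot: {}, referralsBySurface: {} };
      byUrl.set(url, r);
    }
    return r;
  };

  for (const v of botVisits) {
    const r = row(v.url);
    r.crawlsByBot[v.bot] = (r.crawlsByBot[v.bot] ?? 0) + 1;
  }
  for (const a of aiReferrals) {
    const r = row(a.url);
    r.referralsBySurface[a.aiSurface] = (r.referralsBySurface[a.aiSurface] ?? 0) + 1;
  }
  return [...byUrl.values()];
}
```

Articles with heavy re-crawling but no human referrals, or the reverse, are the interesting outliers.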
For implementation details, see /architecture. For who this matters most to, see /use-cases.