Tracking GPTBot activity in logs

Tracking GPTBot means isolating requests whose user-agent carries the GPTBot token, verifying them against OpenAI's published IP ranges, then reporting which URLs were fetched, how often, and how recently. It is a server-side log exercise that should keep GPTBot out of human analytics and distinguish it from OpenAI's other tokens, ChatGPT-User and OAI-SearchBot.

Isolating GPTBot requests

Start by filtering your access logs to requests whose user-agent contains the GPTBot token. Match on the stable token, not a full version string, because OpenAI changes the version component over time and a brittle exact match will silently miss traffic.
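The token match described above can be sketched as follows. This is a minimal illustration, assuming logs in the common "combined" access-log format; the names LOG_LINE, parse_line, and openai_token are hypothetical helpers, not part of any OpenAI or WebmasterID tooling. It also tags ChatGPT-User and OAI-SearchBot so the three tokens stay separate from the first step.

```python
import re

# Stable tokens only (no version component), so a version bump in the
# user-agent string does not cause missed traffic.
OPENAI_TOKENS = ("GPTBot", "ChatGPT-User", "OAI-SearchBot")

# Assumed layout: combined log format
# ip ident user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)


def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


def openai_token(user_agent):
    """Return the OpenAI token the user-agent claims, or None.

    This is only a claim: the source IP still has to be verified.
    """
    lowered = user_agent.lower()
    for token in OPENAI_TOKENS:
        if token.lower() in lowered:
            return token
    return None
```

Requests for which openai_token returns "GPTBot" go into the bot bucket for further checks; everything returning None continues through whatever human or other-bot classification you already use.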

Keep GPTBot in a bot bucket, never in human analytics. Counting crawl hits as page views inflates traffic and corrupts engagement metrics, so segment it out at the point of measurement.

Verifying it is really GPTBot

The user agent is a claim anyone can copy. To trust a GPTBot request, verify its source IP against OpenAI's published GPTBot IP range list. Requests whose UA says GPTBot but whose IP falls outside those ranges are impostors and should be excluded from coverage and flagged separately.

Do not invent or hardcode IP literals from memory — fetch OpenAI's current published list, which can change, and verify against it. Never treat user-agent match alone as proof of identity.

Match the GPTBot token, not a brittle full version string
Verify source IP against OpenAI's published GPTBot range list
Exclude UA-spoofed requests from coverage; flag them separately
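The IP check above might look like the sketch below, which uses Python's standard ipaddress module. The function names are hypothetical, and the CIDR strings are assumed to come from OpenAI's current published list, loaded by whatever means you use; nothing here hardcodes real OpenAI addresses.

```python
import ipaddress


def load_networks(cidrs):
    """Turn CIDR strings (from the published list) into network objects."""
    return [ipaddress.ip_network(cidr, strict=False) for cidr in cidrs]


def is_verified(ip, networks):
    """True if the source IP falls inside any of the published ranges."""
    try:
        address = ipaddress.ip_address(ip)
    except ValueError:
        # Malformed address in the log: cannot be verified.
        return False
    return any(address in network for network in networks)


def split_claims(requests, networks):
    """Split requests that claim GPTBot into (verified, spoofed) lists.

    Each request is assumed to be a dict with at least an "ip" key.
    """
    verified, spoofed = [], []
    for request in requests:
        bucket = verified if is_verified(request["ip"], networks) else spoofed
        bucket.append(request)
    return verified, spoofed
```

Only the verified list should feed coverage reports; the spoofed list is worth keeping as its own series, since a rise in impostor traffic is a separate signal.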

Reporting what GPTBot reached

Once you have verified requests, group by normalised path to see which pages GPTBot fetched, keep the newest timestamp per path for recency, and count requests over time to spot crawl waves. Separate GPTBot cleanly from ChatGPT-User (real-time browsing) and OAI-SearchBot (search), since all three share the OpenAI origin but serve different purposes.

The useful outputs are a covered-pages list, a never-fetched gap list, and a volume timeline — these tell you what OpenAI's training crawler can currently see of your site.
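A rough sketch of those three outputs, assuming each verified request is a dict with "path" and "time" keys and that timestamps use the combined-log layout (for example 24/Jun/2026:10:15:32 +0000). The names normalise_path and crawl_report are hypothetical, and the normalisation rule (drop the query string and trailing slash) is one possible choice rather than a standard.

```python
from collections import Counter
from datetime import datetime
from urllib.parse import urlsplit

# Assumed timestamp layout; %b relies on English month abbreviations.
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def normalise_path(raw):
    """Drop the query string and any trailing slash (except for the root)."""
    path = urlsplit(raw).path or "/"
    return path.rstrip("/") or "/"


def crawl_report(requests, known_paths=()):
    """Build covered pages, never-fetched gaps, and a daily volume timeline."""
    last_seen = {}       # normalised path -> newest fetch time
    per_day = Counter()  # ISO date -> request count
    for request in requests:
        path = normalise_path(request["path"])
        when = datetime.strptime(request["time"], TIME_FORMAT)
        if path not in last_seen or when > last_seen[path]:
            last_seen[path] = when
        per_day[when.date().isoformat()] += 1
    known = {normalise_path(p) for p in known_paths}
    return {
        "covered": last_seen,
        "never_fetched": sorted(known - set(last_seen)),
        "per_day": dict(per_day),
    }
```

Passing your sitemap paths as known_paths gives the gap list; leaving it empty still yields coverage and volume.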

How it appears in analytics and logs

A verified GPTBot request means OpenAI's training crawler read that URL. A user agent that says GPTBot but originates outside OpenAI's IP ranges is not GPTBot and should be treated as a spoof, not as crawl coverage.

Diagnostic use case

Build a reliable view of GPTBot's crawl on your site: confirmed-genuine requests, the pages it reached, fetch recency, and request volume over time.

What WebmasterID can help detect

WebmasterID classifies GPTBot server-side and records the URLs it reached with timestamps, so you can see GPTBot activity on the bot-intelligence and AI-visibility surfaces without writing log queries yourself.

Common mistakes

Matching a full version string instead of the stable GPTBot token, missing traffic.
Trusting the GPTBot user agent without IP verification.
Lumping GPTBot together with ChatGPT-User and OAI-SearchBot.
Counting GPTBot fetches as human page views.

Privacy and accuracy notes

GPTBot tracking uses the user-agent token and OpenAI's published IP ranges only. No visitor identity is involved, and a crawler is never attached to a human profile.

Frequently asked questions

How do I separate GPTBot from ChatGPT-User in logs?

Match each token independently. GPTBot is the training crawler, ChatGPT-User is the real-time browsing fetcher, and OAI-SearchBot supports search. OpenAI documents them together, but they are distinct tokens, so report them separately.

Sources and verification notes

OpenAI, GPTBot documentation: documents the GPTBot token and the IP range list used for verification.

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.