archive.org_bot — Internet Archive web crawler
archive.org_bot is a user-agent associated with Internet Archive crawling that fetches public web pages for preservation in collections such as the Wayback Machine. It is an archival agent, distinct from search-engine indexing crawlers, and identifies via an archive.org URL in its user-agent. Operators see it when their public pages are captured for long-term snapshots.
What this means
archive.org_bot fetches public pages on behalf of the Internet Archive so that they can be preserved. The Archive stores each capture as a snapshot that can be viewed later through the Wayback Machine, supporting research, journalism, and the historical record.
This is preservation, not search. An archive.org_bot visit does not influence your search rankings; it captures the page as it appeared at that moment.
How it identifies itself
The agent identifies itself with an archive.org URL in its user-agent string. Match on the archive.org token pattern rather than on an exact version string, because version details can change. As with any crawler, the user-agent is a claim and can be copied, so where authenticity matters, corroborate it with how the source actually behaves.
The Internet Archive does not publish an exhaustive token list or set of IP ranges, so this entry is marked partially verified; the archival purpose and the archive.org self-identification are the reliable signals.
- Identifies via an archive.org URL in the user-agent
- Purpose: snapshot preservation for the Wayback Machine
- Separate from search-engine indexing crawlers
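A minimal sketch of matching on the token rather than an exact version is shown here in Python. The function name, the pattern, and both sample user-agent strings are illustrative assumptions, not an official token list, and a match only confirms what the request claims to be.

```python
import re

# Assumption: a case-insensitive match on the "archive.org_bot" token is
# enough to flag the claim. This pattern is illustrative, not an official list.
ARCHIVE_BOT_PATTERN = re.compile(r"archive\.org_bot", re.IGNORECASE)


def claims_archive_org_bot(user_agent: str) -> bool:
    """Return True if the user-agent string claims to be archive.org_bot."""
    return bool(ARCHIVE_BOT_PATTERN.search(user_agent or ""))


# Illustrative strings only; real requests may carry other versions or extras.
sample_bot = (
    "Mozilla/5.0 (compatible; archive.org_bot "
    "+http://archive.org/details/archive.org_bot)"
)
sample_browser = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"

print(claims_archive_org_bot(sample_bot))      # True
print(claims_archive_org_bot(sample_browser))  # False
```

Because the string can be copied by any client, a True result is best treated as "claims to be the Archive" rather than as proof of origin.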
robots.txt considerations
To ask Internet Archive crawlers not to fetch your pages, add a robots.txt group that targets the relevant Archive user-agent token. The Archive's handling of robots.txt has changed over time, especially for already-captured material, so treat a rule as a stated preference and check current Archive guidance.
robots.txt is a request honoured by compliant crawlers, not enforcement, and does not retroactively remove existing snapshots.
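One way to check what a rule would ask for, before publishing it, is Python's standard urllib.robotparser. This is a sketch under assumptions: the robots.txt content, the /private/ path, and the example.com URLs are hypothetical, and using "archive.org_bot" as the group name is an assumption based on the agent name in this entry. It shows how a compliant parser reads the file, not how the Archive actually responds.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. The "archive.org_bot" group name and the
# /private/ path are assumptions chosen for illustration.
ROBOTS_TXT = """\
User-agent: archive.org_bot
Disallow: /private/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The archive group is asked to stay out of /private/ only.
print(parser.can_fetch("archive.org_bot", "https://example.com/private/page"))
print(parser.can_fetch("archive.org_bot", "https://example.com/about"))

# Other agents fall through to the "*" group, which disallows nothing.
print(parser.can_fetch("SomeOtherBot", "https://example.com/private/page"))
```

The parser reports what the file requests. It says nothing about snapshots that already exist, which a robots.txt rule does not remove.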
How it appears in analytics and logs
An archive.org_bot request is an archival fetch made to preserve a snapshot. Count it as bot traffic that reflects preservation coverage, not as search-ranking activity or human audience.
Diagnostic use case
Identify Internet Archive preservation crawling in logs, separate it from search indexing and SEO tools, and decide robots.txt policy for snapshot capture.
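The log review described here could be sketched as follows, assuming access logs in the common "combined" format with the user-agent as the last quoted field. The regular expression, the function name, and the sample lines (including their IP addresses, dates, and paths) are made up for illustration; adjust the pattern to your server's actual log format.

```python
import re
from collections import Counter

# Assumption: "combined" log format, e.g.
# IP - - [date] "METHOD /path HTTP/x" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def archival_paths(lines):
    """Count requested paths whose user-agent claims to be archive.org_bot."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "archive.org_bot" in match.group("ua").lower():
            counts[match.group("path")] += 1
    return counts


# Fabricated sample lines for illustration only.
sample_log = [
    '203.0.113.5 - - [01/Jan/2026:00:00:01 +0000] "GET /about HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; archive.org_bot +http://archive.org/details/archive.org_bot)"',
    '198.51.100.7 - - [01/Jan/2026:00:00:02 +0000] "GET /about HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"',
    '203.0.113.5 - - [01/Jan/2026:00:00:03 +0000] "GET /pricing HTTP/1.1" 200 2048 '
    '"-" "Mozilla/5.0 (compatible; archive.org_bot +http://archive.org/details/archive.org_bot)"',
]

for path, hits in sorted(archival_paths(sample_log).items()):
    print(path, hits)
```

Keeping the archival counts in their own tally makes it straightforward to exclude them from human visit totals and to see which pages the crawler reached.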
What WebmasterID can help detect
WebmasterID classifies archive.org_bot server-side as a bot and shows which pages it reached, keeping archival crawling out of human analytics.
Common mistakes
- Confusing archival capture with search indexing.
- Assuming robots.txt removes pages already archived.
- Counting archival crawl hits as human visits.
Privacy and accuracy notes
Detection uses only the request user-agent. No human identity is involved. WebmasterID records the fetch as a bot event, separate from human analytics.
Related pages
- ia_archiver and the Internet Archive crawler
ia_archiver is a long-standing user-agent token associated with crawling for the Internet Archive's Wayback Machine and related collections. The Internet Archive operates archival crawlers that fetch public pages to preserve snapshots over time. The token has historic ties to the Alexa crawler that fed early Archive collections, so log entries may show ia_archiver or archive.org-related agents depending on the crawl source.
- Archival crawlers overview
Archival crawlers, led by the Internet Archive's Wayback Machine crawling, fetch public pages to preserve point-in-time snapshots for research, journalism, and the historical record. They are not search crawlers: they capture how a page looked rather than ranking it. Understanding the difference keeps robots.txt and analytics decisions sensible, since archiving and indexing serve different goals.
- Wayback Machine Save Page Now fetcher
Save Page Now is the Internet Archive feature that captures a specific URL on demand when a person requests a snapshot through the Wayback Machine. Unlike background archival crawling, this fetch happens because someone asked for it right now, making it a user-triggered archival fetch. It appears in logs as an archive.org-identifying request tied to a save request rather than a scheduled crawl.
- Web crawlers
How crawlers and archival bots are detected and categorised.
Sources and verification notes
- Internet Archive, Wayback Machine help: background on Archive crawling; exact token list and IP ranges not exhaustively published.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.