archive.org_bot — Internet Archive web crawler

archive.org_bot is a user-agent associated with Internet Archive crawling. It fetches public web pages for preservation in collections such as the Wayback Machine. It is an archival agent, distinct from search-engine indexing crawlers, and identifies itself via an archive.org URL in its user-agent string. Operators see it in their logs when their public pages are captured for long-term snapshots.

Partially verified

What this means

archive.org_bot is an Internet Archive crawler agent used to fetch public pages for preservation. The Internet Archive captures snapshots so pages can be viewed later via the Wayback Machine, supporting research, journalism, and historical record.

This is preservation, not search. An archive.org_bot visit does not influence your search rankings; it captures the page as it appeared at that moment.

How it identifies itself

The agent identifies itself with an archive.org URL in its user-agent string. Match on the archive.org token pattern rather than on an exact version string. As with any crawler, the user-agent is a claim and can be copied, so where authenticity matters, corroborate it with how the source behaves (for example, request rate and the paths it fetches).

The Internet Archive does not publish a single, complete list of user-agent tokens or IP ranges, so this entry is marked partially verified. The archival purpose and the archive.org self-identification are the reliable signals.

Identifies via an archive.org URL in the user-agent
Purpose: snapshot preservation for the Wayback Machine
Separate from search-engine indexing crawlers
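
The pattern-matching advice above can be sketched in a few lines of Python. This is a minimal illustration, not the Internet Archive's or WebmasterID's implementation. The function name and the sample user-agent strings are hypothetical, and a match only shows that the request claims to come from the Archive.

```python
import re

# Assumption: matching on the "archive.org" substring is enough to catch the
# agent, since the source says it carries an archive.org URL in its user-agent.
ARCHIVE_UA_PATTERN = re.compile(r"archive\.org", re.IGNORECASE)


def looks_like_archive_bot(user_agent: str) -> bool:
    """Return True if the user-agent claims an archive.org identity.

    This checks the claim only; a user-agent string can be copied by anyone.
    """
    return bool(user_agent) and ARCHIVE_UA_PATTERN.search(user_agent) is not None


if __name__ == "__main__":
    # Hypothetical sample strings for illustration.
    samples = [
        "Mozilla/5.0 (compatible; archive.org_bot +http://archive.org/details/archive.org_bot)",
        "Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0",
    ]
    for ua in samples:
        print(looks_like_archive_bot(ua), ua)
```

Matching on the token rather than the full string keeps the check stable if the version or surrounding text changes.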

robots.txt considerations

To ask Internet Archive crawling not to fetch your pages, target the relevant archive token in robots.txt. The Archive's handling of robots.txt has changed over time, especially for already-captured material.

robots.txt is a request honoured by compliant crawlers, not enforcement, and does not retroactively remove existing snapshots.
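
One way to check a draft policy before publishing it is to run it through Python's standard-library robots.txt parser. The sketch below assumes "archive.org_bot" as the robots.txt token purely for illustration; this entry does not confirm which token the Archive honours, so verify it against current Internet Archive guidance. The helper name and example URL are also hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumption: "archive.org_bot" is used as the token for illustration only.
ROBOTS_TXT = """\
User-agent: archive.org_bot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, agent_token: str, url: str) -> bool:
    """Evaluate a robots.txt body for one agent token and URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent_token, url)


if __name__ == "__main__":
    url = "https://example.com/page"
    print(is_allowed(ROBOTS_TXT, "archive.org_bot", url))  # archival agent
    print(is_allowed(ROBOTS_TXT, "Googlebot", url))        # any other agent
```

This only confirms how a compliant parser would read the file. It says nothing about whether a given crawler obeys it or about snapshots that already exist.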

How it appears in analytics and logs

An archive.org_bot request is an archival fetch made to preserve a snapshot. Treat it as bot traffic that reflects preservation coverage, not as search-ranking activity or human visits.

Diagnostic use case

Identify Internet Archive preservation crawling in logs, separate it from search indexing and SEO tools, and decide robots.txt policy for snapshot capture.
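
As a rough sketch of that diagnostic, the following Python tallies archival fetches per path from access-log lines. It assumes the common "combined" log format, with the user-agent as the last quoted field. The function name and regular expression are illustrative and are not how WebmasterID classifies traffic.

```python
import re
from collections import Counter

# Assumption: combined log format, i.e.
# ... "METHOD path PROTO" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def archival_hits_by_path(lines):
    """Count requests per path whose user-agent mentions archive.org."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "archive.org" in match.group("ua").lower():
            hits[match.group("path")] += 1
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python archival_hits.py < access.log
    for path, count in archival_hits_by_path(sys.stdin).most_common():
        print(count, path)
```

Lines that do not fit the assumed format are skipped rather than guessed at, so the counts are a lower bound.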

What WebmasterID can help detect

WebmasterID classifies archive.org_bot server-side as a bot and shows which pages it reached, keeping archival crawling out of human analytics.

Common mistakes

Confusing archival capture with search indexing.
Assuming robots.txt removes pages already archived.
Counting archival crawl hits as human visits.

Privacy and accuracy notes

Detection uses only the request user-agent. No human identity is involved. WebmasterID records the fetch as a bot event, separate from human analytics.

Sources and verification notes

Internet Archive, Wayback Machine help: background on Archive crawling; exact token list and IP ranges not exhaustively published.

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.