Crossref crawler

Crossref is a non-profit DOI registration agency that links scholarly publications through persistent identifiers and rich metadata. Its services fetch publisher landing pages and content to support DOI registration, metadata deposit, similarity checking, and link resolution. It is scholarly-infrastructure crawling for the academic citation ecosystem, not general web search indexing.

Verified against primary sources

What this means

Crossref maintains the DOI infrastructure that links scholarly works and exposes their metadata. Publishers deposit metadata with Crossref, and Crossref provides services such as Cited-by linking, Similarity Check, and metadata APIs. Some of these services may fetch publisher landing pages or content.

If you are a scholarly publisher, Crossref-related fetches support registering and connecting your DOIs. This is academic-citation infrastructure, not consumer web search.

How it identifies itself

Crossref services carry Crossref-identifying user-agents, and most interaction is via deposit and metadata APIs rather than broad crawling. Match on the Crossref identifier in the user-agent and on the scholarly context of the request, rather than on an exact version string.

As with any crawler, the user-agent is a claim and can be copied. Where authenticity matters, corroborate it with request behaviour, such as whether the requests target DOI landing pages.

Operator: Crossref (scholarly DOI registration agency)
Scope: publisher landing pages, DOI metadata, link/similarity services
Access pattern: mostly API-based deposit and metadata, not broad crawling
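
The loose matching described above can be sketched in a few lines of Python. This is a minimal illustration, not Crossref's or WebmasterID's implementation. It assumes that the word "crossref" appears somewhere in the user-agent header. The function name claims_crossref and the sample user-agent strings are hypothetical.

```python
import re
from typing import Optional

# Assumption: a Crossref fetcher includes "crossref" somewhere in its
# User-Agent header. Match loosely instead of pinning an exact version.
CROSSREF_PATTERN = re.compile(r"crossref", re.IGNORECASE)


def claims_crossref(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent header claims a Crossref identity.

    This checks the claim only. A user-agent can be copied, so a True
    result is not proof that the request came from Crossref.
    """
    if not user_agent:
        return False
    return CROSSREF_PATTERN.search(user_agent) is not None


if __name__ == "__main__":
    # Hypothetical header values, for illustration only.
    print(claims_crossref("ExampleCrossrefFetcher/2.1"))
    print(claims_crossref("Mozilla/5.0 (X11; Linux x86_64)"))
```

Matching on a substring keeps the check stable when a version number changes, at the cost of also matching any client that copies the name.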

robots.txt considerations

To express a preference for a Crossref fetcher, target its documented user-agent token in robots.txt. Because you deposit scholarly metadata with Crossref deliberately, most of the data Crossref holds about your works does not depend on crawling your pages.

robots.txt is honoured only by compliant crawlers. It expresses a preference and is not an access control.
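
A rule set can be checked before deployment with Python's standard urllib.robotparser module. The sketch below is illustrative only. "ExampleCrossrefBot" is a placeholder and not a real Crossref token, so substitute the token that Crossref documents. The paths and the publisher.example host are also assumptions.

```python
from urllib.robotparser import RobotFileParser

# "ExampleCrossrefBot" is a placeholder token, not one documented by Crossref.
ROBOTS_TXT = """\
User-agent: ExampleCrossrefBot
Disallow: /drafts/

User-agent: *
Disallow:
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The placeholder bot is kept out of /drafts/ but may fetch articles.
    print(is_allowed("ExampleCrossrefBot", "https://publisher.example/drafts/paper-1"))
    print(is_allowed("ExampleCrossrefBot", "https://publisher.example/articles/paper-1"))
```

This only reports what a compliant crawler would do with the rules. It does not block a client that ignores robots.txt.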

How it appears in analytics and logs

A Crossref request typically relates to DOI registration, metadata, or link checking for scholarly content you publish. It is scholarly-infrastructure bot traffic, not a human visit and not a general web-search crawl.
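
One way to keep such requests out of human reader counts is to label each access-log line before tallying. The following Python sketch assumes logs in the common "combined" format and assumes that the user-agent contains "crossref". The sample log lines, IP addresses, and label names are hypothetical, and this is not a description of how WebmasterID classifies traffic.

```python
import re
from collections import Counter
from typing import Iterable

# Combined Log Format: the final quoted field is the User-Agent.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def classify(line: str) -> str:
    """Label one log line by what its user-agent claims to be."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return "unparsed"
    # Assumption: Crossref fetchers carry "crossref" in the user-agent.
    if "crossref" in match.group("agent").lower():
        return "scholarly-infrastructure bot"
    # "other" covers humans and every other bot; it is not a human count.
    return "other"


def tally(lines: Iterable[str]) -> Counter:
    """Count log lines per label."""
    return Counter(classify(line) for line in lines)


if __name__ == "__main__":
    # Hypothetical log lines for illustration.
    sample = [
        '203.0.113.7 - - [24/Jun/2026:10:00:00 +0000] '
        '"GET /article/10.5555/12345 HTTP/1.1" 200 5120 "-" '
        '"ExampleCrossrefFetcher/2.1"',
        '198.51.100.4 - - [24/Jun/2026:10:00:05 +0000] '
        '"GET /article/10.5555/12345 HTTP/1.1" 200 5120 '
        '"https://example.org/" "Mozilla/5.0 (X11; Linux x86_64)"',
    ]
    print(tally(sample))
```

The label reflects only what the user-agent claims, so the tally is a starting point for review and not a verified attribution.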

Diagnostic use case

Recognise Crossref-related fetches in publisher logs, distinguish scholarly DOI/metadata infrastructure from general web search, and interpret these fetches as citation-linking activity.

What WebmasterID can help detect

WebmasterID classifies Crossref-related fetches server-side as scholarly-infrastructure bot traffic and reports them on the bot-intelligence surface, separate from human analytics.

Common mistakes

Confusing scholarly DOI/metadata infrastructure with general web search indexing.
Assuming robots.txt affects metadata you deposited with Crossref directly.
Counting infrastructure fetches as human readers in analytics.

Privacy and accuracy notes

Identification uses only the request user-agent and scholarly context. No visitor identity is involved. WebmasterID records the fetch as a bot event, separate from human analytics, and never attaches it to a profile.

Sources and verification notes

Crossref: DOI registration and scholarly metadata. Scholarly DOI infrastructure; deposit, metadata, and linking services documented.

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.