CORE academic aggregator crawler

CORE is one of the world's largest aggregators of open-access research papers. It harvests content from institutional and subject repositories to provide a unified scholarly search service and dataset. Its crawler and harvesters fetch open-access papers and metadata from repositories rather than indexing the general web, so in logs it shows up as scholarly harvesting, typically against repository and publisher endpoints.

Verified against primary sources

What this means

CORE aggregates open-access research from thousands of repositories and journals worldwide, offering search, a dataset, and an API over harvested scholarly content. It is run as a not-for-profit scholarly service.

If you operate an institutional repository or open-access journal, CORE may harvest your papers and metadata to include them in its aggregated index. This is scholarly harvesting, distinct from general web search crawling.

How it identifies itself

CORE harvesting carries a CORE-identifying user-agent and frequently uses standard repository protocols such as OAI-PMH for metadata, alongside HTTP fetches for full text. Match on the CORE identity and the harvesting context rather than an exact version string.

As with any crawler, the user-agent is a claim and can be copied. Corroborate with behaviour where authenticity matters.
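
The matching advice above can be sketched in Python. This is a minimal illustration, not CORE's or WebmasterID's actual logic: `CORE_UA_TOKEN` is a hypothetical placeholder to be replaced with the token CORE documents, and the function names are invented for this example. The six verbs are the request types defined by OAI-PMH 2.0.

```python
import re
from urllib.parse import urlparse, parse_qs

# Hypothetical placeholder: substitute the user-agent token CORE documents.
CORE_UA_TOKEN = "CORE"

# The six request verbs defined by OAI-PMH 2.0.
OAI_PMH_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def claims_core(user_agent: str, token: str = CORE_UA_TOKEN) -> bool:
    """True if the token appears as a whole word, whatever version follows it."""
    pattern = r"(?<![A-Za-z0-9])" + re.escape(token) + r"(?![A-Za-z0-9])"
    return re.search(pattern, user_agent, flags=re.IGNORECASE) is not None


def is_oai_pmh_request(url: str) -> bool:
    """True if the query string carries one of the OAI-PMH verbs."""
    query = parse_qs(urlparse(url).query)
    return any(verb in OAI_PMH_VERBS for verb in query.get("verb", []))


def classify(user_agent: str, url: str) -> str:
    """Combine the claimed identity with the harvesting context."""
    if not claims_core(user_agent):
        return "other"
    return "core-oai-pmh" if is_oai_pmh_request(url) else "core-http"
```

A result of `core-oai-pmh` or `core-http` only means the request claims to be CORE; it does not prove authenticity, since the user-agent can be copied.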

robots.txt considerations

To express a crawl preference for CORE, target its documented user-agent token in robots.txt. Note that repository metadata is often exposed deliberately via OAI-PMH for harvesting, so blocking the HTTP crawler may not stop metadata aggregation.

robots.txt is honoured by compliant crawlers and is not an access control. For open-access content, harvesting generally increases discoverability of your research.
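
One way to check how a rule would be read is Python's standard `urllib.robotparser`. The sketch below is illustrative only: the token `CORE`, the `/private/` path, and the helper name `can_fetch` are assumptions made for this example, so confirm the real token against CORE's own documentation before deploying a rule.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: "CORE" stands in for the documented user-agent token,
# and /private/ is an example path, not a recommendation.
ROBOTS_TXT = """\
User-agent: CORE
Disallow: /private/

User-agent: *
Disallow:
"""


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Evaluate a robots.txt body the way a compliant crawler would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, the placeholder agent is refused under `/private/` and allowed elsewhere, while other agents are unrestricted. The check covers HTTP fetches only; it says nothing about what an OAI-PMH endpoint exposes.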

How it appears in analytics and logs

A CORE request means an open-access aggregator harvested scholarly content or metadata from your repository. It is academic-harvesting bot traffic, not a human visit and not a general web-search crawl; it reflects inclusion in a unified research index.
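
To keep this traffic out of human pageview counts, access-log lines can be tallied by claimed identity. The sketch below assumes the server writes the common Combined Log Format and again uses a hypothetical placeholder token; the bucket names are invented for this example.

```python
import re
from collections import Counter

# Hypothetical placeholder: substitute the user-agent token CORE documents.
CORE_UA_TOKEN = "CORE"

# Combined Log Format tail: "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def tally(lines, token: str = CORE_UA_TOKEN) -> Counter:
    """Count log lines per bucket: claimed CORE harvests, other, unparsed."""
    # Match the token as a whole word so longer names do not count.
    token_re = re.compile(
        r"(?<![A-Za-z0-9])" + re.escape(token) + r"(?![A-Za-z0-9])",
        flags=re.IGNORECASE,
    )
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            counts["unparsed"] += 1
        elif token_re.search(match.group("ua")):
            counts["core_harvest"] += 1
        else:
            counts["other"] += 1
    return counts
```

The `core_harvest` bucket counts requests that claim the CORE identity, so it is a starting point for review rather than a verified figure.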

Diagnostic use case

Recognise CORE's open-access harvesting in repository logs, distinguish scholarly aggregation from general web-search crawling, and interpret it as collection of research content for an aggregated index.

What WebmasterID can help detect

WebmasterID classifies CORE's harvester server-side as an academic-aggregation bot and surfaces it on the bot-intelligence surface, so scholarly harvesting stays separate from human analytics.

Privacy and accuracy notes

Identification uses only the request user-agent and harvesting context. No visitor identity is involved. WebmasterID records the fetch as a bot event, separate from human analytics, and never attaches it to a profile.

Sources and verification notes

Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.