OpenAlex crawler
OpenAlex, run by the non-profit OurResearch, is a free and open catalogue of the global research system — papers, authors, institutions, venues, and concepts — offered as data and an API. Its crawler and harvesters gather scholarly metadata and links to build an open scientific knowledge graph. It is a research-metadata aggregator rather than a general web search engine.
What this means
OpenAlex provides an open, comprehensive index of scholarly entities and the relationships between them. It is intended as a successor to the discontinued Microsoft Academic Graph.
If you host scholarly content or metadata, OpenAlex may gather links and metadata about it to enrich its open graph. This is research-metadata aggregation, not general web search indexing.
How it identifies itself
Requests made by OpenAlex carry a user-agent that identifies OpenAlex or OurResearch. Much of its data is assembled from open sources, Crossref, and repository metadata rather than from broad page crawling. Match on the OpenAlex identity rather than an exact version string.
As with any crawler, the user-agent is a claim and can be copied. Corroborate with behaviour where authenticity matters.
- Operator: OurResearch (OpenAlex open scholarly catalogue)
- Scope: papers, authors, institutions, venues, concepts
- Built largely from open metadata sources and Crossref
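Matching on identity rather than an exact version string can be sketched as below. This is a minimal illustration, not WebmasterID's or OpenAlex's actual logic: the substrings `openalex` and `ourresearch`, the function name `claims_openalex`, and the sample user-agent strings are all assumptions chosen for the example.

```python
import re

# Assumption: these substrings are illustrative only, not an official
# list of OpenAlex user-agent tokens.
OPENALEX_PATTERN = re.compile(r"openalex|ourresearch", re.IGNORECASE)


def claims_openalex(user_agent):
    """Return True if the user-agent string claims an OpenAlex identity.

    The match is a case-insensitive substring test, so it does not
    depend on any particular version number in the string.
    """
    return bool(OPENALEX_PATTERN.search(user_agent or ""))


# Hypothetical user-agent strings, made up for this example.
print(claims_openalex("ExampleHarvester/1.0 (OpenAlex)"))
print(claims_openalex("ExampleHarvester/2.7 (OpenAlex)"))
print(claims_openalex("Mozilla/5.0 (X11; Linux x86_64)"))
```

A positive result only says the request claims to be OpenAlex. Because the header can be copied, it is a starting point for corroboration, not proof of origin.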
robots.txt considerations
To express a crawl preference for OpenAlex, target its documented user-agent token in robots.txt. Because OpenAlex assembles much of its graph from open metadata and partner sources, blocking a direct crawler may not remove metadata sourced elsewhere.
robots.txt is honoured by compliant crawlers and is not an access control.
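A robots.txt rule can be checked locally before it is deployed. The sketch below uses Python's standard `urllib.robotparser`. The token `ExampleOpenAlexToken`, the `/drafts/` path, and the example.org URLs are placeholders invented for the illustration; substitute the user-agent token that OpenAlex actually documents.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical token: replace with the token OpenAlex documents.
HYPOTHETICAL_TOKEN = "ExampleOpenAlexToken"

# Illustrative policy: keep the named crawler out of /drafts/ only.
ROBOTS_TXT = f"""\
User-agent: {HYPOTHETICAL_TOKEN}
Disallow: /drafts/

User-agent: *
Allow: /
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser without any network fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
blocked = parser.can_fetch(HYPOTHETICAL_TOKEN, "https://example.org/drafts/paper.html")
allowed = parser.can_fetch(HYPOTHETICAL_TOKEN, "https://example.org/papers/1234")
print(blocked, allowed)
```

This only confirms how a compliant parser would read the rule. It does not affect metadata that reaches OpenAlex through Crossref or repositories, and it does not stop a client that ignores robots.txt.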
How it appears in analytics and logs
An OpenAlex request means an open scholarly catalogue harvested research metadata or links related to your content. It is academic-metadata bot traffic, not a human visit and not a general web-search crawl.
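Keeping these hits out of human counts can be sketched as a simple tally over access-log lines. The example assumes logs in Combined Log Format, where the user-agent is the last quoted field. The category labels, the identity substrings, and the sample log lines are hypothetical and are not WebmasterID's actual classification rules.

```python
import re
from collections import Counter

# Combined Log Format: the user-agent is the last double-quoted field.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')
# Assumption: illustrative identity substrings, not an official token list.
OPENALEX_HINT = re.compile(r"openalex|ourresearch", re.IGNORECASE)


def tally_hits(log_lines):
    """Count hits labelled 'academic-metadata-bot' versus 'other'."""
    counts = Counter()
    for line in log_lines:
        match = UA_FIELD.search(line)
        user_agent = match.group(1) if match else ""
        if OPENALEX_HINT.search(user_agent):
            counts["academic-metadata-bot"] += 1
        else:
            counts["other"] += 1
    return counts


# Hypothetical log lines, made up for this example.
sample = [
    '203.0.113.5 - - [24/Jun/2026:10:00:00 +0000] "GET /papers/1 HTTP/1.1" '
    '200 512 "-" "ExampleHarvester/1.0 (OpenAlex)"',
    '198.51.100.7 - - [24/Jun/2026:10:00:05 +0000] "GET /papers/1 HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]
print(tally_hits(sample))
```

The "other" bucket is not the same as human traffic: it still contains every other bot, so a real pipeline would apply further classification before reporting reader counts.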
Diagnostic use case
Recognise OpenAlex harvesting in scholarly logs, distinguish open research-metadata aggregation from general web search, and read it as inclusion in an open scientific graph.
What WebmasterID can help detect
WebmasterID classifies OpenAlex harvesting server-side as an academic-metadata bot and reports it on the bot-intelligence surface, so research-graph aggregation stays separate from human analytics.
Common mistakes
- Confusing an open scholarly-metadata catalogue with a general web search engine.
- Assuming a robots.txt block removes metadata sourced from Crossref or repositories.
- Counting metadata-harvesting hits as human readers in analytics.
Privacy and accuracy notes
Identification uses only the request user-agent and harvesting context. No visitor identity is involved. WebmasterID records the fetch as a bot event, separate from human analytics, and never attaches it to a profile.
Related pages
- Academic and research crawlers — overview
Academic and research crawlers fetch scholarly papers and metadata to build research search engines, open catalogues, and citation infrastructure. This overview covers how Semantic Scholar, CORE, OpenAlex, and Crossref differ from general web crawlers, why much of their work is metadata harvesting via standard protocols, and how to set policy. For sites hosting research, they generally increase scholarly discoverability.
- Crossref crawler
Crossref is a non-profit DOI registration agency that links scholarly publications through persistent identifiers and rich metadata. Its services fetch publisher landing pages and content to support DOI registration, metadata deposit, similarity checking, and link resolution. It is scholarly-infrastructure crawling for the academic citation ecosystem, not general web search indexing.
- CORE academic aggregator crawler
CORE is one of the world's largest aggregators of open-access research papers, harvesting content from institutional and subject repositories to provide a unified scholarly search and dataset. Its crawler and harvesters fetch open-access papers and metadata from repositories rather than indexing the general web. It appears in logs as scholarly harvesting, typically against repository and publisher endpoints.
- Web crawlers
How academic catalogues and search crawlers are categorised.
Sources and verification notes
- OpenAlex: open catalogue of scholarly works (OurResearch). Open scholarly graph and API; assembled from open metadata sources.
Last reviewed 2026-06-24. Facts are checked against primary/official sources where available; uncertain specifics are marked “Data not yet verified” rather than guessed.