Tag: openalex api

  • OpenAlex: The Case for Open Research Metrics

    OpenAlex is a free, CC0-licensed index of more than 319 million scholarly works, together with their authors and institutions, built by the non-profit OurResearch to replace the discontinued Microsoft Academic Graph. For institutions weighing research-metrics platforms, its open data answers a question closed commercial indices cannot: who can audit the numbers behind an assessment decision.

    Named after the Library of Alexandria, OpenAlex is an openly accessible bibliographic catalogue of scientific papers, authors and institutions. One design choice, publishing the full dataset under a public-domain licence rather than behind a subscription wall, is what separates it structurally from Elsevier’s Scopus and Clarivate’s Web of Science, and why it has become a reference point in debates about research-assessment transparency.

    What Is OpenAlex?

    OpenAlex launched in January 2022, built by OurResearch (a US non-profit operating as Impactstory, Inc.) as a successor to the Microsoft Academic Graph, which Microsoft stopped updating on 31 December 2021. The project inherited MAG’s dataset and rebuilt it as an open, queryable graph of works, authors, institutions, funders, and topics.

    Two design decisions define the platform. First, the entire dataset is released under a Creative Commons Zero (CC0) licence, meaning any institution, developer, or researcher can download, redistribute, and build on it without permission or cost. Second, OpenAlex has formally adopted the Principles of Open Scholarly Infrastructure (POSI), a governance commitment covering sustainability, community control, and data portability.

    The scale is now substantial. OpenAlex’s own catalogue reports more than 319 million scholarly works, and its API handled roughly 115 million queries a month in 2024, according to figures cited in the platform’s Wikipedia entry. It draws source data from Crossref, ORCID, DOAJ, and Unpaywall rather than from a closed editorial pipeline.

    How Does OpenAlex Compare with Scopus and Web of Science?

    The practical difference is not just price — it is what each platform lets an institution verify. Scopus and Web of Science apply proprietary, selective journal-inclusion criteria and sell access to the resulting index. OpenAlex indexes broadly by default and publishes the inclusion logic as open code, which means an institution can inspect exactly why a work is or is not counted.

    | Dimension | OpenAlex | Scopus (Elsevier) | Web of Science (Clarivate) |
    |---|---|---|---|
    | Governance | Non-profit (OurResearch), POSI-aligned | Commercial publisher | Commercial data company |
    | Data licence | CC0, fully open, bulk download | Proprietary, licensed access only | Proprietary, licensed access only |
    | Core journal metric | No proprietary journal metric | CiteScore (four-year citation average) | Journal Impact Factor |
    | Coverage approach | Broad, automated aggregation, strong Diamond OA and non-English coverage | Curated, selective journal list | Curated, selective journal list |
    | Cost to institutions | Free API; optional paid support tier | Subscription | Subscription |

    CiteScore, Scopus’s flagship journal metric, averages the citations a journal’s documents receive over a four-year window — a useful signal, but one calculated entirely inside a closed system that institutions cannot independently reproduce. OpenAlex does not publish an equivalent branded journal score; instead it exposes the underlying citation and work-level data so that any bibliometrician can calculate their own indicator and show their working.

    Coverage differences matter for equity as much as accuracy. A 2024 study cited in OpenAlex’s Wikipedia entry found the platform indexes more than 12,500 Diamond Open Access journal titles, including over 60% of Diamond OA journals absent from both Web of Science and Scopus — a direct consequence of not gating inclusion behind a commercial selection committee.

    Why Does Open Metrics Infrastructure Serve DORA’s Transparency Principle?

    The San Francisco Declaration on Research Assessment (DORA), first published in 2012, asks funders, institutions, and publishers to stop substituting journal-based proxies for direct evaluation of research and to be explicit about the criteria used in funding, hiring, and promotion decisions. That explicitness requirement is where the platform choice stops being neutral.

    A closed index can tell an institution that a number was calculated a certain way, but it cannot let that institution independently verify how, because the underlying citation graph is licensed, not published. An open metadata layer removes that opacity: the same dataset an institution cites in a tenure file or a funding report can be downloaded, re-run, and checked by anyone, including the researcher being assessed.

    Adoption evidence has followed the argument. Leiden University announced in September 2023 that it would produce an open-source edition of its CWTS Leiden Ranking using OpenAlex data from 2024 onward. Sorbonne University announced in December 2023 that it was withdrawing its Scopus subscription in favour of OpenAlex. In 2024, France’s Ministry of Higher Education and Research pledged financial support to the project, describing it as “crucial open science infrastructure,” and the Arcadia Fund awarded OurResearch a $7.5 million grant explicitly to build OpenAlex into a sustainable alternative to commercial citation indices.

    • Leiden University: open-source CWTS Leiden Ranking edition built on OpenAlex data (from 2024)
    • Sorbonne University: Scopus subscription withdrawn in favour of OpenAlex (December 2023)
    • French Ministry of Higher Education and Research: financial commitment to OpenAlex as open science infrastructure (2024)
    • Arcadia Fund: $7.5 million grant to OurResearch for OpenAlex sustainability (March 2024)

    None of this means closed indices lack value; their curated selection and mature analytics tooling still suit some high-stakes evaluations. But where the explicit requirement is transparency rather than convenience, an auditable, CC0-licensed data layer meets DORA’s stated principle more directly than a licensed black box.

    Common Questions About OpenAlex

    What is OpenAlex used for?

    Universities, funders, and publishers use OpenAlex to track publication output, measure open-access status, benchmark institutional performance, and feed alternative rankings such as the open-source CWTS Leiden Ranking. Its free API also underpins third-party dashboards, systematic-review tools, and research-information systems that need citation and affiliation data without a subscription fee.

    Is OpenAlex legit?

    Yes. OpenAlex is maintained by OurResearch, a non-profit with a multi-year record of building open scholarly infrastructure, and it has formally adopted the Principles of Open Scholarly Infrastructure (POSI). Its data and methodology are openly licensed and auditable, and the platform is already cited in peer-reviewed scientometrics research, including a 2022 arXiv paper by its founders.

    Is OpenAlex free?

    Yes. The full dataset is released under a Creative Commons Zero (CC0) public-domain licence, and the REST API can be queried without a subscription, unlike Scopus or Web of Science. A polite-pool rate limit applies to unauthenticated use, and OurResearch offers an optional paid support tier for high-volume institutional queries.

    Who owns OpenAlex?

    OpenAlex is created and maintained by OurResearch, a US-based non-profit operating as Impactstory, Inc., not by a commercial publisher. Governance sits with a mission-driven organisation rather than a shareholder-owned company — the structural distinction that underpins its CC0 licensing and its appeal to institutions pursuing publisher-independent, DORA-aligned metrics.

    What Should Institutional Leaders Do Next?

    Platform choice is now a governance decision, not just a procurement one. An institution that cites OpenAlex data in a promotion case, a funding report, or an open-access dashboard is making a transparency claim as well as a metrics claim, and that claim should be tested before it is relied upon.

    • Map which existing assessment workflows (tenure, funding reports, rankings submissions) rely on a metric an evaluator cannot independently reproduce.
    • Pilot OpenAlex alongside — not instead of — existing subscriptions, comparing coverage gaps directly against Scopus or Web of Science outputs for your own institutional corpus.
    • Document data provenance explicitly in assessment criteria, consistent with DORA’s requirement for stated, auditable methodology.
    • Track POSI-aligned infrastructure commitments (OpenAlex, Crossref, ORCID, ROR) as the durable layer beneath any commercial tool an institution also chooses to license.

    Open, non-proprietary metadata will not replace every function a commercial index performs today. But as funders and assessment reformers keep pressing for auditable evidence over proprietary scores, institutions that already understand — and can reproduce — their own metrics will be the ones best placed to defend them.

  • OpenAlex API: Building a Metrics Dashboard

    The OpenAlex API is a free, fully open REST interface to a catalogue of hundreds of millions of scholarly works, authors, institutions and funders, and it is the most practical data source for building an in-house institutional research metrics dashboard without a subscription. Query the /works endpoint with an institution filter and aggregate with group_by, and publication counts or open-access share come back pre-aggregated in a single JSON response per query; citation-percentile data is carried on each work record.

    OpenAlex is an open, CC0-licensed catalogue of the global research system — works, authors, institutions, sources, funders and topics — built and maintained by the non-profit OurResearch as a successor to the discontinued Microsoft Academic Graph. Because every record and the API itself are free to query, research offices can build metrics dashboards without licensing a commercial bibliometrics platform, provided they understand the filter syntax, pagination limits and the metric gaps this guide covers.

    What is the OpenAlex API and what does it cover?

    The OpenAlex API exposes entity endpoints — Works, Authors, Institutions, Sources, Topics, Funders and Awards — each accessed at https://api.openalex.org/{entity}. Every entity supports four operations: list, get (by ID), filter, and group_by (server-side aggregation), which together are the building blocks of a dashboard.

    Each entity carries a persistent OpenAlex ID and, for institutions, a cross-walked ROR identifier — the Research Organization Registry ID also used by ORCID, Crossref and DataCite. Filtering on an institution’s ROR-linked OpenAlex ID, rather than a free-text name match, is what keeps a dashboard’s institutional attribution stable as an organisation’s name or subsidiary structure changes.

    | Entity endpoint | Dashboard use case | Example filter |
    |---|---|---|
    | /works | Publication counts, open-access share, citation percentiles | authorships.institutions.id |
    | /authors | Researcher productivity, h-index-style summary stats | affiliations.institution.id |
    | /institutions | Peer benchmarking, collaboration networks | ror |
    | /topics | Subject-area concentration and trend detection | works_count |

    How do you query the Works endpoint for institutional metrics?

    Every institution-level query starts with the authorships.institutions.id filter set to the institution’s OpenAlex ID, which you resolve once via /institutions?filter=ror:https://ror.org/{your-ror-id}. From there, combine filters with commas (AND logic) and pipes (OR logic), and add group_by to turn a list query into an aggregation query in one request — no client-side loop required.
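The lookup step described above can be sketched as follows. This is a minimal illustration, not an official client: the helper names and the placeholder ROR ID are hypothetical, and it assumes the list response wraps matches in a "results" array whose records carry a full-URL "id" field.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.openalex.org"


def institution_lookup_url(ror_id: str) -> str:
    """Build the /institutions lookup URL for a bare ROR ID."""
    params = {"filter": f"ror:https://ror.org/{ror_id}"}
    # Keep ':' and '/' unescaped so the filter stays readable in logs.
    return f"{API_ROOT}/institutions?{urlencode(params, safe=':/')}"


def short_openalex_id(institution_record: dict) -> str:
    """Reduce a full-URL OpenAlex ID to its short form (assumed record shape)."""
    return institution_record["id"].rsplit("/", 1)[-1]


# Hypothetical ROR ID and record, used only to show the shapes involved.
lookup_url = institution_lookup_url("00example0")
sample_record = {"id": "https://openalex.org/I0000000000"}
institution_id = short_openalex_id(sample_record)
```

Resolving the ID once and storing it alongside the ROR ID means later queries never depend on a name match.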

    • Publication trend: /works?filter=authorships.institutions.id:I123...,publication_year:2020-2026&group_by=publication_year
    • Open-access share: add &group_by=oa_status to the same filter to split output into gold, green, hybrid, bronze and closed counts.
    • Field distribution: &group_by=primary_topic.field.id reveals subject concentration across an institution’s output.
    • Collaboration mapping: &group_by=authorships.institutions.id returns co-publishing partner institutions ranked by shared-work count.
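As a rough sketch, the queries listed above can be generated from one helper, and an open-access share can be derived from the grouped counts. The helper names and the institution ID are hypothetical placeholders; the code assumes each group_by bucket is a dict with "key" and "count" fields and that closed-access works are keyed as "closed".

```python
API_ROOT = "https://api.openalex.org"


def works_group_by_url(institution_id, group_by, extra_filters=()):
    """Combine the institution filter with optional extra filters (comma = AND)."""
    filters = [f"authorships.institutions.id:{institution_id}", *extra_filters]
    return f"{API_ROOT}/works?filter={','.join(filters)}&group_by={group_by}"


def open_access_share(groups):
    """Fraction of works whose oa_status bucket is anything other than 'closed'."""
    total = sum(g["count"] for g in groups)
    open_count = sum(g["count"] for g in groups if g["key"] != "closed")
    return open_count / total if total else 0.0


INSTITUTION_ID = "I0000000000"  # hypothetical placeholder, not a real institution

trend_url = works_group_by_url(
    INSTITUTION_ID, "publication_year", ["publication_year:2020-2026"]
)
oa_url = works_group_by_url(INSTITUTION_ID, "oa_status")

# Made-up buckets in the assumed shape of a group_by=oa_status response.
sample_groups = [{"key": "gold", "count": 40}, {"key": "closed", "count": 60}]
share = open_access_share(sample_groups)
```

Keeping URL construction separate from the arithmetic makes it easy to unit-test the dashboard logic against stored responses.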

    Use the select parameter to strip unused fields from large responses. Once a query’s meta.count exceeds roughly 10,000 results, switch from offset-based page/per_page pagination to cursor pagination: offset pagination is capped at that depth and will not return results beyond it.
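A cursor loop might look like the sketch below. It assumes the API accepts cursor=* on the first request and returns the next cursor in meta.next_cursor, with a null value marking the end; the function names and the page size of 200 are illustrative assumptions, so check both against the current OpenAlex paging documentation.

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_page(base_url: str, cursor: str, per_page: int = 200) -> dict:
    """Fetch one page of a list query; base_url may already carry filters."""
    separator = "&" if "?" in base_url else "?"
    query = urlencode({"per_page": per_page, "cursor": cursor})
    with urlopen(f"{base_url}{separator}{query}", timeout=30) as response:
        return json.load(response)


def iter_results(get_page: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every record, following meta.next_cursor until it is empty."""
    cursor = "*"  # assumed starting value for cursor pagination
    while cursor:
        page = get_page(cursor)
        yield from page.get("results", [])
        cursor = page.get("meta", {}).get("next_cursor")
```

In use, `get_page` would be something like `lambda c: fetch_page(works_url, c)`, where `works_url` is your own filtered /works URL. Passing the page fetcher in as a callable keeps the loop testable without network access.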

    How do you approximate field-weighted citation impact with OpenAlex data?

    Field-weighted citation impact (FWCI) is a proprietary metric popularised by Elsevier’s SciVal and Scopus products, calculated by comparing a work’s citations to the average for same-year, same-subject, same-document-type publications; OpenAlex does not expose a field literally called “FWCI”, and no open API replicates the Scopus subject-classification baseline it is normalised against.
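In general form, the comparison described above is a ratio of observed to expected citations. The notation below is illustrative only; the citation window and the handling of works assigned to several subjects are vendor-specific and are not captured here.

```latex
\[
\mathrm{FWCI}_i = \frac{c_i}{e_i},
\qquad
e_i = \frac{1}{|S_i|} \sum_{j \in S_i} c_j
\]
```

Here \(c_i\) is the citation count of work \(i\) and \(S_i\) is the set of works sharing its publication year, subject and document type, so a value of 1 means the work is cited exactly at the average for its peer set.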

    OpenAlex’s nearest open equivalent is the cited_by_percentile_year object returned on every work record, which gives a min/max percentile rank of that work’s citation count against all works of the same publication year and type. Aggregating this field across an institution’s output — for example, the share of works in the top decile (percentile ≥ 90) per year — produces a transparent, reproducible citation-impact proxy that a dashboard can compute without a commercial licence, though it is not interchangeable with SciVal’s FWCI for benchmarking against institutions that report the Scopus figure.

    For most dashboards the honest approach is to present both: raw citation counts (context-dependent, not comparable across fields) and the percentile-year proxy (comparable within OpenAlex’s corpus), clearly labelled as distinct from any vendor-reported FWCI value cited in external reports.
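The top-decile share described above could be computed along these lines. This is a sketch under stated assumptions: it treats cited_by_percentile_year as a dict with "min" and "max" keys, uses the conservative "min" bound, and skips works where the field is missing. The function name and sample records are made up for illustration.

```python
from collections import defaultdict


def top_decile_share_by_year(works, threshold=90):
    """Per publication year, the share of works at or above the percentile threshold."""
    totals = defaultdict(int)
    top = defaultdict(int)
    for work in works:
        percentile = work.get("cited_by_percentile_year") or {}
        year = work.get("publication_year")
        if year is None or percentile.get("min") is None:
            continue  # no usable percentile data for this record
        totals[year] += 1
        if percentile["min"] >= threshold:
            top[year] += 1
    return {year: top[year] / totals[year] for year in sorted(totals)}


# Made-up records in the assumed shape of /works results.
sample_works = [
    {"publication_year": 2021, "cited_by_percentile_year": {"min": 95, "max": 96}},
    {"publication_year": 2021, "cited_by_percentile_year": {"min": 40, "max": 45}},
    {"publication_year": 2022, "cited_by_percentile_year": {"min": 90, "max": 91}},
    {"publication_year": 2022, "cited_by_percentile_year": None},
]
shares = top_decile_share_by_year(sample_works)
```

Using the lower bound avoids overstating impact when a work sits on the edge of the decile. Whichever bound is chosen should be stated next to the chart.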

    What are the authentication, rate-limit and pricing rules?

    OpenAlex’s underlying dataset, website and API are free and the data is CC0-licensed, so no purchase is required to query or redistribute results. Every request should still include a contact identifier — either a mailto query parameter with your email address or a registered api_key — to enter the “polite pool”, which OurResearch prioritises over anonymous traffic for faster, more consistent response times.

    Requests without a mailto parameter or API key are routed to a slower, lower-priority pool and are more likely to be throttled during peak load; this single parameter is the most common fix for intermittent 429 or timeout errors reported by developers building batch-harvesting scripts. Dashboard builders scheduling nightly refresh jobs should always set mailto or an API key rather than relying on the anonymous pool.
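One way to make sure a scheduled job never falls into the anonymous pool is to refuse to build a URL without a contact identifier. The helper names and the example address below are hypothetical; the parameter names mailto and api_key are taken from the text above.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.openalex.org"


def with_contact(params, mailto=None, api_key=None):
    """Return a copy of params with a contact identifier, or fail loudly."""
    if not (mailto or api_key):
        raise ValueError("set mailto or api_key to avoid the anonymous pool")
    merged = dict(params)
    if api_key:
        merged["api_key"] = api_key
    else:
        merged["mailto"] = mailto
    return merged


def polite_url(path, params, mailto=None, api_key=None):
    """Build a full request URL that always carries a contact identifier."""
    query = urlencode(with_contact(params, mailto, api_key), safe=":/@,|")
    return f"{API_ROOT}/{path.lstrip('/')}?{query}"


# Hypothetical contact address for a nightly refresh job.
nightly_url = polite_url(
    "/works", {"filter": "publication_year:2024"}, mailto="metrics@example.org"
)
```

Failing at URL-build time makes a missing identifier visible in development rather than as intermittent throttling in production.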

    Common developer questions

    Is the OpenAlex API free?

    Yes. OpenAlex is free to query, and the underlying data is licensed under CC0, meaning it can be reused and redistributed without royalties. Registering an email via the mailto parameter or an API key gives access to the faster “polite pool” but does not change the underlying no-cost model.

    Does OpenAlex have an API for institutional data?

    Yes. The Institutions endpoint returns disambiguated organisation records cross-walked to ROR identifiers, and the Works endpoint accepts an authorships.institutions.id filter, which is the standard way to scope any query to a single institution’s publication output for a dashboard.

    What is OpenAlex used for in research administration?

    Research offices use OpenAlex to track publication trends, open-access compliance, collaboration networks and topic concentration without paying for a commercial bibliometrics subscription. Its open licence also makes it suitable for public-facing institutional reporting, since results can be republished without redistribution restrictions.

    Implications for institutional research offices

    A dashboard built directly on the OpenAlex API gives research administration teams a free, auditable alternative to proprietary bibliometrics tools for routine reporting — publication counts, open-access compliance tracking and collaboration mapping — while reserving paid platforms for tasks that genuinely require vendor-normalised metrics such as reported FWCI. The trade-off is that teams take on the engineering work themselves: handling pagination beyond 10,000 results, keeping institution ID mappings current as ROR records change, and documenting clearly that a percentile-based proxy is not the same figure a funder or ranking body may expect from Scopus.

    As OpenAlex’s topic classification and percentile fields mature, the gap between what a free, transparent API can deliver and what a paid platform delivers continues to narrow for most day-to-day institutional reporting needs, making a well-built in-house dashboard an increasingly credible default rather than a stopgap.