Abstract
The proliferation of scholarly publications underscores the necessity for reliable tools to navigate scientific literature. OpenAlex, an emerging platform amalgamating data from diverse academic sources, holds promise in meeting these evolving demands. Nonetheless, our investigation uncovered a flaw in OpenAlex’s portrayal of publication status, particularly concerning retractions. Although the metadata sourced from the Crossref database were accurate, OpenAlex consolidated this information into a single Boolean field, ‘is_retracted’, leading to misclassifications of papers. This problem affects not only OpenAlex users but also users of other academic resources that integrate the OpenAlex API. The issue affects data provided by OpenAlex between 22 December 2023 and 19 March 2024. Anyone using data from this period should urgently check it and replace it if necessary.
Introduction
The exponential growth of scholarly publications highlights the increasing need for tools that facilitate rapid access to current and authentic scientific knowledge. Such tools not only help researchers stay abreast of the latest advancements but also play a pivotal role in bibliometric analyses, which make it possible to evaluate how scientific literature evolves within different domains. These analyses yield crucial metrics for assessing the productivity and impact of authors, institutions and journals. Among the emerging online resources in this domain is OpenAlex, 1 a platform known for its openness and data integration capabilities. OpenAlex consolidates and standardises data from diverse academic sources, with notable emphasis on the Microsoft Academic Graph, which ceased operation in December 2021 [1], and the extensive corpus maintained by Crossref, 2 the largest DOI registration agency [2].
The current scholarly communication landscape is witnessing a significant shift towards open science [3]. In this evolving paradigm, OpenAlex by OurResearch emerges as a solution that is better aligned with the current requisites of the academic community when compared with the closed, subscription-based citation databases such as Web of Science and Scopus. OpenAlex provides a significantly broader coverage of academic literature, as noted by [1, 4], thereby addressing the growing demand for comprehensive and accessible sources of research information. Moreover, the OpenAlex API presents a compelling advantage with its unrestricted access to metadata retrieval, rendering it an invaluable resource for conducting large-scale bibliometric analyses [5–8]. Furthermore, the provision of database snapshots empowers users with the capability to obtain full copies of the OpenAlex database for deployment on their own servers, thereby enhancing accessibility and facilitating further research endeavours.
Since its launch in January 2022, OpenAlex has swiftly garnered substantial interest among academic stakeholders. A notable example is Sorbonne University, which, in line with its overarching policy of fostering openness, opted not to renew its subscription to the Clarivate bibliometric tools. Instead, the university turned to exploring open alternatives, with OpenAlex emerging as a prominent candidate. 3 Similarly, the Center for Science and Technology Studies (CWTS) at Leiden University has adopted OpenAlex as a cornerstone data source for its CWTS Leiden Ranking Open Edition. This initiative aims to equip stakeholders with ‘fully transparent information about the scientific performance of over 1400 major universities worldwide’. 4
The rapid evolution of computer technology has enabled us to swiftly combine bibliographic data from diverse sources and automate its processing and analysis. However, while such advancements offer immense potential, they often entail challenges concerning the accuracy, comprehensiveness and standardisation of data obtained from disparate sources. As a new data source, OpenAlex faces precisely these challenges. A recent comprehensive investigation conducted by Zhang et al. [9] delves into the issue of absent affiliations in the metadata of journal articles within the OpenAlex platform. Analysis by Jahn et al. [10] found that the is_oa filter in OpenAlex, which indicates the availability of open full texts, did not always match the open access status information of the paper. In this article, we present our own observations regarding the incorrect representation of retractions within OpenAlex metadata and propose potential remedies to mitigate this issue.
The growing volume of scientific output is accompanied by a corresponding increase in various forms of academic misconduct, including paper mills, questionable journals, plagiarism and the fabrication or falsification of research findings [11–14]. This concerning trend places heightened demands on journal editors and reviewers, whose workload is experiencing a corresponding escalation [15]. As a result, errors or misconduct may not always be promptly identified. Consequently, there has been a surge in retractions worldwide – a process in which journal editors formally notify readers of publications containing significant flaws or erroneous data, thereby announcing that the reliability of their findings and conclusions is questionable [16–18].
The process of retracting a publication involves a meticulous and exhaustive investigation by the journal’s editors, culminating in a formal decision to retract the article. Information about retractions is typically published separately within the journal, where editors explain the rationale behind the decision as well as the date of retraction. For detailed information on retractions of scientific articles, researchers can leverage the Retraction Watch database. 5 Notably, in September 2023, Crossref, the pre-eminent DOI registration agency, acquired the Retraction Watch database. 6 This acquisition enhances the database’s utility and accessibility as an important resource for scholarly inquiry.
Retracted papers are accessible to readers on the journal’s website, but they must contain a clear note indicating their retracted status. This serves as a cautionary measure to alert users to potential issues associated with the respective paper. However, ensuring consistent marking of retractions across all reference databases where the publication is indexed remains a challenge. Although it is important that retractions are accurately marked, there are inconsistencies in the way that many databases approach this task [19, 20]. Therefore, we conducted an investigation to assess how information pertaining to retractions is presented in the metadata of publications within the OpenAlex database.
Methods
In the initial phase of our study, on 6 March 2024, we used the OpenAlex API to retrieve 47,720 retraction records. 7 This data set included all records that were marked as retracted in OpenAlex. We then downloaded these records as a CSV file for further analysis. On scrutinising the results, it became apparent that not all entries designated as retractions were accurate. Closer examination of the OpenAlex metadata revealed that the ‘is_retracted’ field determines a publication’s status, with values restricted to either true or false.
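The retrieval step can be sketched as follows. This is a minimal illustration rather than the script used in the study: it assumes the public OpenAlex works endpoint with the `is_retracted:true` filter and cursor paging, and the helper names `build_url`, `fetch_json` and `iter_retracted` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

OPENALEX_WORKS = "https://api.openalex.org/works"


def build_url(cursor="*", per_page=200):
    """Build a works query restricted to records flagged as retracted."""
    params = {
        "filter": "is_retracted:true",
        "per-page": per_page,
        "cursor": cursor,
    }
    return f"{OPENALEX_WORKS}?{urlencode(params)}"


def fetch_json(url):
    """Download one page of results (network access required)."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_retracted(fetch=fetch_json, per_page=200):
    """Yield every work flagged is_retracted, following the paging cursor.

    `fetch` is injectable so the paging logic can be exercised offline.
    """
    cursor = "*"
    while cursor:
        page = fetch(build_url(cursor, per_page))
        yield from page.get("results", [])
        # The loop ends when the response carries no further cursor.
        cursor = page.get("meta", {}).get("next_cursor")
```

Calling `list(iter_retracted())` would collect the flagged records, which could then be written to a CSV file with the standard `csv` module.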
As previously mentioned, OpenAlex primarily sources its data from the Crossref database. 8 Following Crossref’s acquisition of the Retraction Watch database, information from this database was integrated into the Crossref Labs API, accessible through the ‘update-nature’ field. 9 Using a Python script, we enriched 47,018 OpenAlex records (excluding 704 records lacking DOIs) with the ‘update-nature’ field from Crossref. Owing to the experimental character of the Labs API, it was not possible to obtain a complete data set; the enrichment yielded a subset of 20,486 records.
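A sketch of such an enrichment step is given below. It is not the authors' script. The Labs endpoint URL and the position of ‘update-nature’ within the response are assumptions, so the code searches the whole JSON document for that key instead of relying on a fixed path. The function names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL of the experimental Crossref Labs API.
LABS_WORKS = "https://api.labs.crossref.org/works/"


def find_update_natures(node):
    """Collect every string stored under an 'update-nature' key, at any depth."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "update-nature" and isinstance(value, str):
                found.append(value)
            else:
                found.extend(find_update_natures(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_update_natures(item))
    return found


def fetch_labs_record(doi):
    """Download the Labs record for one DOI (network access required)."""
    with urlopen(LABS_WORKS + quote(doi), timeout=30) as resp:
        return json.load(resp)


def enrich(works, fetch=fetch_labs_record):
    """Attach an 'update_nature' list to each OpenAlex work that has a DOI.

    Works without a DOI are skipped; a failed lookup is recorded as None,
    which mirrors the incomplete coverage reported for the Labs API.
    """
    enriched = []
    for work in works:
        doi = (work.get("doi") or "").replace("https://doi.org/", "", 1)
        if not doi:
            continue
        try:
            natures = find_update_natures(fetch(doi))
        except OSError:  # URLError and HTTPError are subclasses of OSError
            natures = None
        enriched.append({**work, "update_nature": natures})
    return enriched
```

Recording failed lookups as `None` instead of an empty list keeps "no update registered" distinguishable from "no answer received", which matters when only a subset of records can be enriched.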
Results and discussion
The results of our analysis of a subset of the ‘update-nature’ field in Crossref metadata are depicted in Figure 1. It is evident from the figure that this field encompasses a range of classifications beyond retractions, including Corrections, Expressions of Concern and Crossmark Retractions. Our findings indicate that Crossref presents the publication status granularly in the metadata (as illustrated in Figure 2), but OpenAlex employs an approach that consolidates this information in a single Boolean field labelled ‘is_retracted’ (Figure 3). Consequently, the mere presence of any information about an update causes OpenAlex to categorise the publication as retracted.
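The effect of this consolidation can be made concrete with a small model. The sketch below is not OpenAlex's code. It contrasts a flag that is true whenever any update exists with a flag that is true only for retraction-type updates. The exact label spellings and the sample DOIs are assumptions made for illustration.

```python
# Update types treated as genuine retractions; the spellings are assumed
# from the categories named in the text.
RETRACTION_LABELS = {"retraction", "crossmark retraction"}


def collapsed_flag(update_natures):
    """Model of the faulty behaviour: any registered update counts as a retraction."""
    return bool(update_natures)


def granular_flag(update_natures):
    """True only if at least one update is a retraction-type update."""
    return any(n.strip().lower() in RETRACTION_LABELS for n in update_natures)


def misclassified(records):
    """Return DOIs flagged as retracted by the collapsed rule but not the granular one."""
    return [
        doi
        for doi, natures in records.items()
        if collapsed_flag(natures) and not granular_flag(natures)
    ]


# Made-up sample records, keyed by hypothetical DOIs.
sample = {
    "10.1234/corrected": ["Correction"],
    "10.1234/concern": ["Expression of Concern"],
    "10.1234/retracted": ["Correction", "Retraction"],
    "10.1234/clean": [],
}
print(misclassified(sample))
```

Under these assumptions, a paper that received only a correction or an expression of concern is flagged by the collapsed rule but not by the granular one.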

Figure 1. Results of analysing the content of the ‘update-nature’ field in selected Crossref metadata.

Figure 2. Example of contents in the ‘update-nature’ field in Crossref metadata. 10

Figure 3. Example of incorrect contents of the ‘is_retracted’ field in OpenAlex metadata. 11
This representation of publication status in OpenAlex is a significant concern, particularly given the platform’s increasing importance. For instance, in our examination of retractions within OpenAlex, we observed that one of the most cited papers marked as retracted is a seminal work by Corman et al. [21], which established a reverse transcription polymerase chain reaction (RT-PCR) test for the detection of 2019-nCoV, the virus that caused the COVID-19 pandemic. Although this article underwent minor corrections, it was never retracted. Mislabelling such influential publications as retracted not only has the potential to misinform healthcare professionals and jeopardise patient care but also risks undermining public trust in the quality of scientific research as a whole.
In a blog post, Herb [22] highlights the issue of inaccurate representation of retractions in OpenAlex, resulting in the misclassification of papers within institutional repositories. Consequently, the ramifications of this problem extend beyond users directly accessing OpenAlex via the web interface to encompass users of other academic resources leveraging the OpenAlex API.
Given the far-reaching implications of this issue, it was imperative that it be addressed promptly. Because accurate portrayal of retraction status is of the utmost importance, we contacted the OurResearch team and brought the issue to their attention. Approximately 2300 incorrect records were identified and corrected. Metadata provided via the API between 22 December 2023 and 19 March 2024, as well as the data snapshot releases 2024-01-24 and 2024-02-27, are affected. 12
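Users who hold OpenAlex data locally can test it against these dates. The following sketch assumes that both end dates of the API window are inclusive and that snapshot releases are identified by their date strings; the function names are hypothetical.

```python
from datetime import date

# Affected API window as stated above; inclusive bounds are an assumption.
AFFECTED_START = date(2023, 12, 22)
AFFECTED_END = date(2024, 3, 19)

# Affected snapshot releases as stated above.
AFFECTED_SNAPSHOTS = {"2024-01-24", "2024-02-27"}


def api_data_affected(retrieved_on):
    """True if data fetched via the API on this date falls in the affected window."""
    return AFFECTED_START <= retrieved_on <= AFFECTED_END


def snapshot_affected(release):
    """True if the snapshot release label is one of the affected releases."""
    return release in AFFECTED_SNAPSHOTS
```

For example, `api_data_affected(date(2024, 3, 6))` evaluates to true, so the data set retrieved for this study on 6 March 2024 itself falls within the affected window.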
It is crucial to periodically scrutinise critical metadata to ensure its accuracy and reliability. Employing alternative tools, such as the Problematic Paper Screener’s Annulled Detector, 13 can help verify the status of publications and identify discrepancies. Given the significant implications of retraction statuses – affecting academic integrity, research validity and public trust – each indication of retraction must be handled with meticulous care. By doing so, we can maintain the integrity of scholarly communication and support the ongoing efforts to uphold rigorous standards in academic publishing.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship and/or publication of this article: No funding was received by the authors or for this research. The publication of this article was funded by the Open Access Fund of Technische Informationsbibliothek (TIB).
