Death of a reviewer or death of peer review integrity? the challenges of using AI tools in peer reviewing and the need to go beyond publishing policies

Abstract

Peer review facilitates quality control and integrity of scientific research. Although publishing policies have adapted to include the use of Artificial Intelligence (AI) tools, such as Chat Generative Pre-trained Transformer (ChatGPT), in the preparation of manuscripts by authors, there is a lack of guidelines or policies on whether peer reviewers can use such tools. The present article highlights the lack of policies on the use of AI tools in the peer review process (PRP) and argues that we need to go beyond policies by creating transparent procedures that will enable journals to investigate allegations of non-compliance and take decisions that will protect the integrity of the peer review system. Reviewers found to violate relevant policies must be excluded from the process to safeguard the integrity of the peer review system.

Keywords

Artificial Intelligence ChatGPT ethics integrity peer review policy

Introduction

Peer review not only plays a main role in the scholarly communication; it also facilitates quality control and integrity of scientific research (Guston, 2007; Rennie 2003), even in the digital age (Nicholas et al., 2015). Setting academic research to the judgment of peers who are experts in the field aims to critically assess the novelty, quality, impact, and ethicality of academic research (Ware, 2008). To enhance peer review systems, publishers have incorporated automated screening tools in the peer review process (PRP) that enable editors to accelerate manuscript screening, to assess compliance with journal policies and to identify suitable reviewers based on expertise and past performance, which, however, have raised concerns of responsibility and transparency among scholars (Schulz et al., 2022). On the other side of the publishing process, we are witnessing a turning point where academic authors increasingly use Artificial Intelligence (AI)-assisted technologies, particularly generative conversational AI such as Chat Generative Pre-trained Transformer (ChatGPT) and Large Language Models (LLMs) in general, in the manuscript preparation phase (Dergaa et al., 2023; Lee et al., 2023). Generative AI and LLMs in general are the point of focus for this manuscript because of their ability to produce content, which is not always disclosed as such. AI-assistance tools frequently used by scholars such as search engines, formatting and thesaurus are out of the scope of the arguments presented herein. Although generative AI tools have the potential to enhance and speed up academic writing, their use raises ethics issues on authorship, authenticity, credibility and accountability of academic work (Dergaa et al., 2023), with subsequent legal implications on copyright.

However, an issue that has not received comparable attention so far is what happens when a reviewer uses AI-based tools to produce a peer review report without disclosing it. In other words, what would be an appropriate policy for editors to follow in the case that authors expressed their concerns, or editors themselves have suspicions that the peer review report is a result of, or it includes parts produced by, generative AI tools? How should these allegations be investigated? Should the reviewer be excluded from peer reviewing? If so, at which level? Should it be exclusion from the peer review of the journal or all journals the publisher is responsible for? What are the responsibilities of different parties in this process to ensure compliance with ethics standards and integrity? Are these sufficiently described in journal publishing policies?

The triggering event for this article is the author’s own experience as an editor, during the peer review of a manuscript, where ethical dilemmas emerged and the lack of effective procedures was exposed. The aim of this article is to illustrate the lack of specific policies on the use of AI-based tools in the PRP, to analyze the potential consequences on the integrity of the PRP, and propose practical ways to mitigate the risks and safeguard the integrity of peer review.

Materials and methods

An ethics analysis approach was followed and critical reasoning was applied to explore whether the absence of policies exposes the peer review to risks. To provide grounds for the absence of policies, the publishing policies of the 10 largest scientific publishers, as classified by journal count (Nishikawa-Pacher, 2022), were reviewed based on the information provided in their websites. It is acknowledged that this is just an indicative list of publishers, suggestive only of the different policies, which serves the aim of this manuscript. Individual journal publishing policies were not reviewed.

Results

Large publishers have adapted their publishing policies to the potential use of AI-assisted technologies by authors during manuscript preparation (Table 1). Eight out of the 10 largest publishers have established policies on the use of generative AI and AI-assisted technologies by authors, five of which are in line with the Committee On Publication Ethics (COPE) position statement on “Authorship and AI tools,” according to which:

AI tools, such as ChatGPT or Large Language Models (LLMs), cannot be listed as an author of a paper and authors who use AI tools in the writing of a manuscript, production of images or graphical elements of the paper, or in the collection and analysis of data, must be transparent in disclosing how the AI tool was used and which tool was used. (Committee on Publication Ethics, 2023)

Table 1.

Comparative summary of policies on the use of AI-based tools by editors, reviewers, and authors, which are followed by large publishers.

Publisher	Policy on the use of AI-based tools
Publisher	For editors	For reviewers	For authors
Springer	None	None	LLMs, such as ChatGPT, do not currently satisfy the authorship criteria. Use of an LLM should be properly documented in the manuscript (Springer Editorial Policies, Artificial Intelligence, 2023)
Taylor & Francis	Files, images or information from unpublished manuscripts must not be uploaded into databases or tools that do not guarantee confidentiality, are accessible by the public and/or may store or use this information for their own purposes (for example, generative AI tools like ChatGPT) (Taylor and Francis, Journal Editor Code of Conduct, 2023)	Reviewers must not use AI tools to generate manuscript review reports, including LLM-based tools like ChatGPT (Taylor and Francis, Reviewer Guidelines, 2023)	COPE guidelines (COPE, 2023)
Elsevier	None	Reviewers should not upload a submitted manuscript or any part of it into a generative AI tool as this may violate the authors’ confidentiality and proprietary rights and, where the paper contains personally identifiable information, may breach data privacy rights (Elsevier, Publishing Ethics, 2023)	Authors are allowed to use generative AI and AI-assisted technologies in the writing process before submission, but only to improve the language and readability of their paper and with the appropriate disclosure (Elsevier, Publishing Ethics, 2023)
Wiley	None	None	COPE guidelines (COPE, 2023) The final decision about whether use of an AI tool is appropriate or permissible lies with the journal’s editor or other party responsible for the publication’s editorial policy (Willey, Best Practice Guidelines on Research Integrity and Publishing Ethics, 2023)
SAGE	None	None	COPE guidelines (COPE, 2023) WAME guidelines (Zielinski et al., 2023) AI bots, such as ChatGPT, should not be listed as an author. Authors who use AI tools must acknowledge this in the manuscript. Editors and reviewers should evaluate the appropriateness of the use of LLMs in the manuscript and ensure that the generated content is accurate and valid (Sage, Publishing Policies, ChatGPT and Generative AI, 2023)
OMICS International online journals	None	None	None (OMICS online, Publication Policies and Ethics, 2023)
DeGruyter	Refer to journals’ policy	None	None
Oxford University Press	None	None	Neither symbolic figures, such as Camille Noûs, nor natural language processing tools driven by AI, such as ChatGPT, qualify as authors. The use of AI tools must be disclosed in the manuscript and the cover letter (Oxford Academic, Authoring, Ethics, 2023)
Inder Science	None	None	COPE guidelines (COPE, 2023) AI tools (e.g. ChatGPT) cannot be listed as authors. Authors are fully responsible for the content of their article, even those parts produced by any AI tool, and are thus liable for any inaccuracies or breach of publication ethics. The use of AI tools must be acknowledged in the manuscript, describing the technologies used and the purpose (Inder Science, Publishing Ethics Statement, 2023)
Brill	Editors same as authors	None	COPE guidelines (COPE, 2023) Authors may use AI and LLMs in the writing and preparation of their manuscripts when doing so with transparency, maintaining full responsibility and accountability for their research (BRILL, Position Statement on Author/Editor Use of Artificial Intelligence & Large Language Models, 2023)

AI: Artificial Intelligence; ChatGPT: Chat Generative Pre-trained Transformer; COPE: Committee On Publication Ethics; LLMs: Large Language Models; WAME: World Association of Medical Editors.

Only 1 of the 10 large publishers also refers to the World Association of Medical Editors (WAME) recommendations, according to which:

[. . .] chatbots cannot be authors, authors should be transparent when chatbots are used and provide information about how they were used, authors are responsible for material provided by a chatbot in their paper and for appropriate attribution of all sources. (Zielinski et al., 2023)

Nonetheless, neither COPE nor WAME have published guidelines on the use of AI tools by peer reviewers, and most large publishers do not cover in their policies the use of generative AI by reviewers. Notably, only 2 out of the 10 large publishers mention the use of AI tools by reviewers. However, the policies of these two publishers have a distinct focus. Elsevier focuses on data security, the potential breach of the authors’ confidentiality and rights, as well as on potential breach of data privacy rights when personal information are included in the manuscript. Thus, this policy does not explicitly prohibit the use of generative AI to produce the content of the peer review report. For instance, if generative AI was used offline or as part of a private generative AI tool, then data confidentiality would not be an issue, and the reviewer would be allowed to do so.

Taylor & Francis is the only publisher that explicitly mentions that reviewers “must not use AI tools to generate manuscript review reports, including LLM-based tools like ChatGPT,” although the rationale is not clarified. Regarding the use of generative AI tools by editors, the policy by Taylor & Francis prohibits the use of generative AI tools like ChatGPT to protect confidentiality of the information in the manuscript. Thus, one could assume that the prohibition for reviewers is also based on confidentiality reasons.

None of the policies of large publishers, as described in their websites, indicate the procedure to be followed in handling potential violations. By way of explanation, there is a lack of transparency regarding the investigation of cases where a reviewer (or an editor) has used or is suspected to have used generative AI. It remains obscure if and how these cases are handled by the journal’s Ethics Committee (if one exists at all) or research integrity team. It is unclear whether publishers have any means available to provide evidence for the detection of content generated by AI tools. There is opacity on whether any kind of sanctions will be imposed on reviewers using generative AI tools in the PRP.

The lack of relevant policies entails risks for the PRP, which are described in Table 2. Unless transparent policies are established by publishers, the trust between different parties involved in the PRP, including authors, editors, peer reviewers, and publishers, is at stake. Decision-making on which manuscripts will be published is challenged, and disputes between authors and journals may arise. A consequent critical risk is that the integrity of the peer review system overall is called into question.

Table 2.

Concerns, risks, and suggested approach for the ethics review process.

Concerns	Risks	Suggested approach
What should happen when a reviewer uses AI-based tools to produce a peer review report without disclosing it?	Trust between authors/editors and peer reviewers is challenged. Decision-making in publishing manuscripts is questioned. Lack of transparency in procedures leads to disputes. The integrity of the peer review system is questioned. The reputation of journals and publishers is at risk.	Clear policies should be implemented on whether it is allowed to used AI-based tools in the peer review or not. Procedures should be in place to investigate allegations. AI-based tools could be used to detect (and consequently prove) whether a review Report is/contains parts generated by AI-based tools.
Authors express concerns OR editor has suspicions that the peer review report is a result of, or includes content produced by generative AI tools, such as ChatGPT. What should editors do in such a case?
How should allegations/suspicions of violations be investigated?
Should the reviewer be excluded from peer reviewing?
What are the responsibilities of different parties (authors, editors, journal Ethics Committees/research integrity team, publishers) in this process to ensure compliance with ethics standards and integrity?
Are responsibilities and procedures sufficiently and transparently described in journal publishing policies?

Discussion

Clear guidelines on the use of AI-based tools by academic authors, like the position statements published by COPE and WAME, are essential to promote transparency and integrity in research and publishing. Therefore, it is not surprising that several large publishers follow these guidelines. However, this is not the case for reviewers playing an important role in the PRP. There is clearly a lack of guidelines by appropriate committees, such as COPE and WAME, on whether AI-based tools should be used by reviewers and if yes, under which conditions. This issue is also insufficiently addressed by the existing publishing policies of large scientific publishers.

Such an issue should be examined by taking into consideration the advantages and disadvantages of using AI tools in the PRP. For instance, AI tools can facilitate editors in the initial screening phase of manuscripts to assess their quality, detect plagiarism, check formatting, and whether they fit with journal scope in a time efficient manner; they can be used in identifying suitable reviewers; they can even be used in summarizing individual review reports and in writing final decision letters. Generative AI tools can also enhance peer reviewers by generating more concise, informative and well-written reviews for native but particularly for non-native speakers, empowering them to confidently use the English language and improve their writing style; they can also save time and effort for reviewers from correcting grammar and spelling mistakes.

However, recent studies investigating the potential of AI tools to serve as peer reviewers in the evaluation process of manuscripts, highlight the limitations of AI-generated review reports. Donker (2023) found that when ChatGPT was used to produce a peer review report, it was able to produce a good summary of the manuscript clearly describing its main goal and its conclusions, but it was unable to provide specific improvements for a manuscript and was mainly left to general comments, with no critical content about the described study. Of note, the AI-generated peer review provided a list of specific-looking general comments with no bearing on the text and included specific but unrelated comments to the study content, which could be perceived as reasons for rejection. When additional references were asked, ChatGPT fabricated articles which were non-existent but were authored by real persons working on similar topics. The author concluded that due to the ability of ChatGPT in summarizing the paper and its methodology remarkably well, it could easily be mistaken for an actual review report by persons that have not fully read the manuscript and that authors should be prepared to challenge reviewer comments that seem unrelated and non-specific (Donker, 2023).

A study by Checco et al. (2021) aimed to illustrate and discuss the potential, pitfalls, and uncertainties of the use of AI to approximate or assist human decisions in the quality assurance and peer-review process associated with research outputs. For this reason, they developed an AI tool to assess, amongst others, whether AI can approximate human decisions in the peer-review process, whether it can uncover common biases in the review process or preserve such biases. It was found that even when rather superficial metrics such as word distribution, readability and formatting scores were used to perform the training, the AI tool was often able to successfully predict the peer review outcome reached as a result of human reviewers’ recommendations. This may indicate that the “first-impression bias” found in reviewers can be present in a training dataset and thus, the trained AI model can propagate such biases (Checco et al., 2021).

Additional to limitations of AI-tools in the PRP, their use raises ethical concerns on various other aspects. Data confidentiality is a key issue in the PRP, highlighted in all publishing policies as an obligation for editors and reviewers involved. One of the possible risks when using AI tools, such as ChatGPT, in the PRP is that of data breach. When confidential manuscripts are uploaded, they contain original and novel ideas that are the authors’ intellectual property, original research data, or even include personal data of research subjects. It is unknown if these can be used by LLMs for self-training and subsequent tasks. Therefore, data security is a prerequisite not always met when using online generative AI tools.

Algorithmic bias of the LLMs that are used can be also introduced to AI-produced review reports. Existing peer reviews are used as training datasets for the AI algorithms, and these datasets may contain biases propagated by the AI model. Such biases include first impression bias (Checco et al., 2021), cultural and organizational biases (Diakopoulos, 2016) including language skills, and biases towards highly reputable research institutes or wealthy countries with high research and development expenditure. AI-generated review reports which are biased can have a significant impact on a researchers’ career or in a research team’s reputation, emphasizing that human oversight is necessary in the process. There are also concerns that AI-based tools could exacerbate existing challenges of the peer review system allowing fake peer reviewers to create more unique and well-written reviews (Hosseini and Horbach, 2023).

Most importantly, one should not overlook the risk of editors and reviewers over-relying on AI tools to assess the originality, quality and impact of a manuscript, overlooking their own experience, scientific and expert judgment. Dependency on AI tools – even to a certain extent – in order to manage the PRP can jeopardize the way that editors and reviewers exercise their autonomy, which becomes even more problematic when biases exist in the algorithms, as discussed earlier. Consequently, this leads authors to distrust the PRP due to a lack of transparency on the rationale of decision-making.

Ultimately, the potential negative consequences of AI tools used in the peer review may undermine the integrity and the purpose of the process, and challenge academic communication and even trust in science at large. Thus, the need to have clear policies on the use of AI tools by reviewers is evident. Indeed, this issue has been very recently raised by scholars who argue that there is a need to apply a consistent, end-to-end policy on the ethics and integrity of AI-based tools in publishing, including the editorial process and the PRP, to avoid the risk of compromising the integrity of the PRP and of undermining the credibility of academic publications (Garcia, 2023; Ling and Yan, 2023).

Judging from existing policies on the use of AI-based tools by authors, the policies on whether reviewers can use AI tools will be either restrictive or permissive on the condition that the tools and the way they are used are disclosed in the review report. But the next question that arises is: are policies enough and what are the steps that need to be taken further down to ensure that policies are followed and reviewers who violate them are being identified? WAME guidelines for the use of AI tools by authors recommend that:

[. . .] editors need appropriate tools to help them detect content generated or altered by AI for the good of science and the public, and to help ensure the integrity of healthcare information and reducing the risk of adverse health outcomes. (Zielinski et al., 2023)

Should we use similar tools to detect AI-generated/altered review reports or will this undermine the altruism on which peer review is based on? Reviewers voluntarily spend time and effort on manuscript aiming to help authors improve their papers and play an active role in the scientific community (Mulligan and Raphael, 2010). Checking for AI content in review reports challenges the trust between reviewers and editors as integral parts of the PRP, as well as the reviewers’ willingness to continue offering their expertise is the peer review system.

But then again, without tools to detect content generated or altered by AI in peer review reports, how can we prove whether allegations are true or not? How can we fill in the gap of evidence in cases where there are suspicions that a reviewer has used AI tools, such as ChatGPT, to produce a review report? What are the procedures to be followed by editors in such cases? Publishers urgently need to consider forming appropriate policies for such cases, which are expected to increase along with the use of generative AI and LLMs. Policies and procedures have to be transparent, detailed and solid, facilitating decision-making when a reviewer is found to have used AI-tools without acknowledging it. In case authors suspect that a reviewer has used AI-assisted technologies to produce a review report, they need to alert the editors, who subsequently have an obligation to raise such matters to the journal’s Ethics Committee (again, assuming one exists) or research integrity team. But in the absence of evidence, how could such allegations be investigated by Ethics Committees? If the use of AI tools in review reports cannot be proven, for example, through tools that detect content generated or altered by AI, I am afraid that this will lead to the death of ethics in publishing. Unless measures are taken, this weakness will be added on top of the existing criticism on the peer review system related to the long delays in publishing new findings, to the efficacy of the whole process, to its susceptibility to perceived bias by editors and reviewers, and to its inability to detect fraudulent research and scientific misconduct (Castelo-Branco, 2023; Manchikanti et al., 2015; Tennant et al., 2017).

Thus, in the dilemma “death of a reviewer or death of peer review integrity?,” I choose the death of a reviewer, after acquiring evidence of course. Reviewers who are found not to comply with publishing policies must be flagged and excluded from the PRP, not only for the specific journal but also for all journals under the same publisher. Publishing policies are essential, but they need to enable certain solid steps towards protecting peer review ethics. Reluctance to do so can put the journal’s and/or the publisher’s reputation at risk. In fact, it is more than reputation which is at stake here: the integrity and ethics of the peer review system can be compromised. Peer review misconduct is not just about attempts to make fake reviews or abusing the recommended reviewer functionality of journals. It is also about trying to verify the originality and accuracy of the peer review report, for which reviewers are solely accountable.

Footnotes

Declaration of conflicting interests

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

All articles in Research Ethics are published as open access. There are no submission charges and no Article Processing Charges as these are fully funded by institutions through Knowledge Unlatched, resulting in no direct charge to authors. For more information about Knowledge Unlatched please see here: .

Ethics approval

Not applicable.

ORCID iD

Vasiliki Mollaki

References

BRILL, Position Statement on Author/Editor Use of Artificial Intelligence (AI) & Large Language Models (LLMs) (2023). Available at: https://brill.com/page/ai_policy/position-statement-on-authoreditor-use-of-artificial-intelligence-ai-large-language-models-llms (accessed 29 September 2023).

Castelo-Branco

(2023) Peer review? No thanks! Climacteric 26(1): 3–4.

Checco

Bracciale

Loreti

, et al. (2021) AI-assisted peer review. Humanities and Social Sciences Communications 8: 25.

Committee On Publication Ethics (COPE) (2023) Authorship and AI. Available at: https://publicationethics.org/cope-position-statements/ai-author (accessed 29 September 2023).

Dergaa

Chamari

Zmijewski

, et al. (2023) From human writing to artificial intelligence generated text: examining the prospects and potential threats of ChatGPT in academic writing. Biology of Sport 40(2): 615–622.

Diakopoulos

(2016) Accountability in algorithmic decision making. Communications of the ACM 59: 56–62.

Donker

(2023) The dangers of using large language models for peer review. Lancet Infectious Diseases 23(7): 781.

Elsevier, Publishing Ethics (2023). Available at: https://www.elsevier.com/about/policies/publishing-ethics (accessed 29 September 2023).

Garcia

(2023) Using AI tools in writing peer review reports: Should academic journals embrace the use of ChatGPT? Annals of Biomedical Engineering. Online ahead of print.

10.

Guston

(2007) Between Politics and Science: Assuring the Integrity and Productivity of Research. New York, NY: Cambridge University Press.

11.

Hosseini

Horbach

SPJM

(2023) Fighting reviewer fatigue or amplifying bias? Considerations and recommendations for use of ChatGPT and other large language models in scholarly peer review. Research Square 8(1): 4.

12.

Inder Science, Publishing Ethics Statement (2023). Available at: https://www.inderscience.com/mobile/ingeneral/publishing-ethics-statement/ (accessed 29 September 2023).

13.

Lee

Salim

Abdullah

, et al (2023) Use of ChatGPT in medical research and scientific writing. Malaysian Family Physician 18: 58.

14.

Ling

Yan

(2023) Let’s be fair. What about an AI editor? Account Research 15: 1–2.

15.

Manchikanti

Kaye

Boswell

, et al. (2015) Medical journal peer review: Process and bias. Pain Physician 18(1): E1–E14.

16.

Mulligan

Raphael

(2010) Peer review in a changing world: Preliminary findings of a global study. Serials 23: 25–34.

17.

Nicholas

Watkinson

Jamali

, et al. (2015) Peer review: Still king in the digital age. Learned Publishing 28: 15–21.

18.

Nishikawa-Pacher

(2022) Who are the 100 largest scientific publishers by journal count? A webscraping approach. Journal of Documentation 78(7): 450–463.

19.

OMICS online, Publication Policies and Ethics (2023). Available at: https://www.omicsonline.org/publication-policies-and-ethics.php (accessed 29 September 2023).

20.

Oxford Academic, Authoring, Ethics (2023). Available at: https://academic.oup.com/pages/authoring/journals/preparing_your_manuscript/ethics (accessed 29 September 2023).

21.

Rennie

(2003) Editorial peer review: its development and rationale. Peer Review in Health Sciences 2: 1–13.

22.

Sage Publishing Policies, ChatGPT and Generative AI (2023). Available at: https://us.sagepub.com/en-us/nam/chatgpt-and-generative-ai (accessed 29 September 2023).

23.

Schulz

Barnett

Bernard

, et al. (2022) Is the future of peer review automated? BMC Research Notes 15(1): 203.

24.

Springer Editorial Policies, Artificial Intelligence (AI) (2023). Available at: https://www.springer.com/gp/editorial-policies/artificial-intelligence–ai-/25428500 (accessed 29 September 2023).

25.

Taylor and Francis, Journal Editor Code of Conduct (2023). Available at: https://editorresources.taylorandfrancis.com/welcome-to-tf/policies-guidelines/editor-code-of-conduct/ (accessed 29 September 2023).

26.

Taylor and Francis, Reviewer Guidelines (2023). Available at: https://editorresources.taylorandfrancis.com/reviewer-guidelines/ (accessed 29 September 2023).

27.

Tennant

Dugan

Graziotin

, et al. (2017). A multi-disciplinary perspective on emergent and future innovations in peer review. F1000Res 20(6): 1151.

28.

Ware

(2008) Peer review: benefits, perceptions and alternatives. Publishing Research Consortium 4: 1–20.

29.

Willey, Best Practice Guidelines on Research Integrity and Publishing Ethics (2023). Available at: (https://authorservices.wiley.com/ethics-guidelines/index.html) (accessed 29 September 2023).

30.

Zielinski

Winker

Aggarwal

, et al. (2023) Chatbots, generative AI, and scholarly manuscripts. WAME recommendations on Chatbots and generative artificial intelligence in relation to scholarly publications. WAME. Available at: https://wame.org/page3.php?id=106 (accessed 29 September 2023).