Abstract
Journals and publishers are increasingly using artificial intelligence (AI) to screen submissions for potential misconduct, including plagiarism and data or image manipulation. While using AI can enhance the integrity of published manuscripts, it can also increase the risk of false/unsubstantiated allegations. Ambiguities related to journals’ and publishers’ responsibilities concerning fairness and transparency also raise ethical concerns. In this Topic Piece, we offer the following guidance: (1) All cases of suspected misconduct identified by AI tools should be carefully reviewed by humans to verify accuracy and ensure accountability; (2) Journals/publishers that use AI tools to detect misconduct should use only well-tested and reliable tools, remain vigilant concerning forms of misconduct that cannot be detected by these tools, and stay abreast of advancements in technology; (3) Journals/publishers should inform authors about irregularities identified by AI tools and give them a chance to respond before forwarding allegations to their institutions in accordance with Committee on Publication Ethics guidelines; (4) Journals/publishers that use AI tools to detect misconduct should screen all relevant submissions and not just random/purposefully selected submissions; and (5) Journals should inform authors about their definition of misconduct, their use of AI tools to detect misconduct, and their policies and procedures for responding to suspected cases of misconduct.
Background
Recently, the research integrity community has been using artificial intelligence (AI) tools, such as Papermill Alarm, Proofig, FigCheck, ImaCheck, and ImageTwin, to identify manipulated images and papers produced by paper mills or AI chatbots (Else, 2022; Oza, 2023; Sanderson, 2024; STM Integrity Hub, n.d.). Now, journals and publishers are beginning to take advantage of these tools (Conroy, 2023). The editors of the Science family of journals, for example, recently announced that they will use AI to screen all submitted images for problematic manipulations (Thorp, 2024). In March 2024, Wiley announced it would begin using a new AI-powered Papermill Detection service to identify papers produced by paper mills (De Rose, 2024). A Wiley spokesperson told the research integrity blog Retraction Watch that this system is “meant to supplement human integrity checks with AI-powered tools. This means that papers will not automatically be rejected if they are flagged in the system – rather, they will be flagged to an editor for closer consideration before proceeding in the publishing workflow” (Oransky, 2024). While using AI tools to screen journal submissions for plagiarism and data or image manipulation is likely to enhance the integrity of published research, it also generates some ethical concerns. In this Topic Piece, we explore these concerns and offer guidance for journals and publishers.
AI errors increase the likelihood of false or unsubstantiated allegations of misconduct
Although misconduct-detection programs perform well, they are not error-free (Haritonova, 2022). Errors made by these programs include false positives (i.e. flagging a deviation from research standards where none exists) and false negatives (i.e. failing to flag a genuine deviation). For example, Turnitin, a popular plagiarism detection tool used by many academic institutions, has a false positive rate of 4% (D’Agostino, 2023). AI image-manipulation detection programs are also mostly accurate, but they can make mistakes (David, 2023). Proofig, for example, has an estimated false positive rate of 5.8% (Evanko, 2022). While both types of error can undermine efforts to detect scientific misconduct, false positives raise more serious concerns than false negatives because they may lead to false allegations of misconduct against innocent researchers.
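The practical weight of even a modest false positive rate depends on how rare genuine misconduct is among submissions. The minimal sketch below works through this base-rate arithmetic with Bayes’ rule; the prevalence and sensitivity figures are assumptions chosen purely for illustration, and only the 5.8% false positive rate echoes the Proofig estimate cited above.

```python
# Illustrative only: positive predictive value (PPV) of an AI screening tool
# under assumed rates. Prevalence and sensitivity are hypothetical; the false
# positive rate echoes the published Proofig estimate (~5.8%).

def ppv(prevalence: float, sensitivity: float, false_positive_rate: float) -> float:
    """Fraction of flagged submissions that are true positives (Bayes' rule)."""
    true_flags = prevalence * sensitivity
    false_flags = (1 - prevalence) * false_positive_rate
    return true_flags / (true_flags + false_flags)

# Suppose 2% of submissions contain genuine image manipulation, the tool
# catches 90% of them, and it falsely flags 5.8% of clean submissions.
print(f"PPV: {ppv(prevalence=0.02, sensitivity=0.90, false_positive_rate=0.058):.2f}")
# -> PPV: 0.24, i.e. roughly three out of four flags would be false alarms.
```

Under these assumed numbers, most flags would not correspond to genuine misconduct, which underscores why careful human review of every flag matters.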
The use of AI tools to screen submissions is also likely to lead to unsubstantiated allegations, that is, allegations that lack a sufficient factual basis. An unsubstantiated allegation may occur when an AI tool correctly identifies a deviation from research standards, but there is not enough evidence to show that the accused person (or respondent) acted with ill intent. Most research misconduct regulations and policies distinguish between misconduct (which is intentional) and honest error (which is not) (Shamoo and Resnik, 2022). For example, under US federal policy, a finding of research misconduct (defined as fabrication or falsification of data or plagiarism) requires proof by a preponderance of the evidence that the person acted intentionally, knowingly, or recklessly (Office of Science and Technology Policy, 2000). While AI tools can enhance our ability to detect deviations from accepted research practices, they cannot prove misconduct, because they cannot show that the respondent acted knowingly, intentionally, or recklessly, which requires delving into the person’s state of mind (Shamoo and Resnik, 2022).
Because researchers’ behavior can be negligent or incompetent without qualifying as misconduct, distinguishing between misconduct and honest error can be a resource-intensive and time-consuming activity. In one of the most famous cases of alleged misconduct, which spanned roughly a decade from 1986 to 1996, a federal appeals panel found that Thereza Imanishi-Kari’s recordkeeping practices were negligent without amounting to research misconduct (Shamoo and Resnik, 2022). Although the media and research integrity experts tend to focus on high-profile cases involving substantiated allegations of research misconduct (such as the Hwang Woo-suk or Diederik Stapel cases), most misconduct allegations turn out to be unsubstantiated. A study of 3561 allegations of research misconduct involving Public Health Service (PHS)-funded research between 1994 and 2011 found that institutions dismissed 87.4% of allegations because they did not meet the definition of misconduct or there was insufficient evidence (Loikith and Bauchwitz, 2016). In 2022, the Office of Research Integrity, which oversees PHS-funded research, dismissed 85.9% of the misconduct allegations it received for the same reasons (Office of Research Integrity, 2023).
It is likely, therefore, that a high percentage of research misconduct allegations generated by AI tools will also turn out to be unsubstantiated. Support for this hypothesis comes from a recent study that examined manuscripts submitted to nine journals sponsored by the American Association of Cancer Research between January 2021 and May 2022. These journals used Proofig to screen submissions for improper image manipulation. Correspondence between the editors and authors showed that the majority (63%) of the 195 images identified as duplications were likely due to error rather than deliberate misbehavior (Evanko, 2022).
Unsubstantiated allegations of research misconduct can cause considerable emotional, financial, and reputational harm to respondents. Defending oneself against a research misconduct allegation is an arduous and emotionally taxing process that can last years. Respondents who can afford to hire an attorney may spend tens of thousands of dollars in legal fees, and their reputations may still be permanently damaged (Benderly, 2016). Furthermore, false or unsubstantiated allegations can impose needless costs on the institutions involved. One study estimated the cost of a single research misconduct investigation at more than $500,000 (Michalek et al., 2010).
AI tools have limitations that can impact efforts to detect misconduct
Detection tools have limitations that can impact efforts to detect plagiarism and image or data manipulation. For example, current AI tools can detect plagiarism of words but not other types of plagiarism, such as misappropriation of data, ideas, methods, arguments, images, tables, or unique forms of expression (Hosseini and Gordijn, 2020); the sketch following this paragraph illustrates why word-matching alone misses copied ideas. Current AI tools can detect relatively unsophisticated forms of image manipulation, such as image duplication or repositioning, but not more subtle forms of manipulation, such as alteration of the microdetails of represented structures (Bik et al., 2016). In the future, ill-intentioned researchers may take advantage of generative AI tools to create fraudulent images that are nearly impossible to distinguish from real images (Gu et al., 2022; Kim et al., 2024). Furthermore, because many AI detection tools are openly available, ill-intentioned researchers can run their manuscripts through these tools before submission and revise until the manuscripts pass, thereby evading detection after submission.
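To make the first limitation concrete, the toy sketch below shows how a simple word-overlap measure, of the kind underlying much text-matching software, flags verbatim copying but scores a paraphrased idea as entirely original. This is a deliberately simplified illustration, not a description of how Turnitin or any other named product works.

```python
# Toy illustration of why word-overlap detectors miss paraphrased ideas.
# Simplified sketch; real text-matching systems are more sophisticated,
# but they still compare surface text rather than underlying ideas.

def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two passages."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    return len(words_a & words_b) / len(words_a | words_b)

original   = "the enzyme activity doubled after heat treatment"
verbatim   = "the enzyme activity doubled after heat treatment"
paraphrase = "thermal exposure produced a twofold increase in catalytic rate"

print(jaccard(original, verbatim))    # 1.0 -> flagged as copied
print(jaccard(original, paraphrase))  # 0.0 -> missed, though the idea is taken
```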
Journals and publishers that plan to use AI to detect misconduct should be aware of these tools’ limitations and correct for their biases (COPE Council, 2021: 6). For example, journals and publishers should use only well-tested and reliable tools and remain vigilant about preventing and detecting forms of plagiarism and data or image manipulation that their AI tools cannot easily detect. They should also stay abreast of technological advancements, such as next-generation detection software and digital certification, which can help to verify the originality of text and the authenticity of data and images (Chennamma and Madhushree, 2023; Kim et al., 2024).
Publishers’ responsibilities associated with misconduct detection using AI tools are unclear
Because using AI tools to screen submissions increases the ability to detect misconduct, it may also increase journals’ and publishers’ responsibilities related to reporting or investigating suspected misconduct. However, it is unclear what journals or publishers will do after detecting suspected misconduct in a submitted manuscript and whether or how they will share this information with authors or institutions. The editors of the Science family of journals indicate that the handling editor “will probe further and take steps that could include rejecting the paper” (Thorp, 2024: 7), but they say little about their process or about whether institutions will be informed. In cases where misconduct appears highly likely, rejecting the paper without informing the institution is not sufficient, because it neither prevents the author(s) from submitting elsewhere (Oransky and Marcus, 2016) nor alerts the institutions that support and fund the authors to their possible malpractice. It is also important to note that bias could occur in screening manuscripts for misconduct if AI tools are applied only to certain types of manuscripts or authors. Studies have shown, for example, that researchers tend to be biased against submissions from low- and middle-income countries and low-prestige institutions (Harris et al., 2017; Tavoletti et al., 2022).
To promote fairness and transparency, journals and publishers that use AI tools to detect misconduct should disclose their policies and procedures on their websites and follow Committee on Publication Ethics (COPE) guidelines for dealing with cases of suspected misconduct, including the stipulation to “inform institutions if misconduct by their researchers is suspected, and provide evidence to support these concerns; cooperate with investigations and respond promptly to institutions’ questions about misconduct allegations, with dedicated individuals or teams assigned to investigate and communicate with institutions; have policies and procedures for responding to institutions and other organizations that investigate cases of research misconduct” (COPE Council, 2024a, 2024b: 3).
Conclusion
While using AI tools to screen journal submissions for misconduct is likely to enhance the integrity of published research, it also generates some ethical concerns related to harm, fairness, and transparency, which need to be addressed. Therefore, we offer the following recommendations to journals and publishers that use AI to screen submissions for possible misconduct:
(1) In agreement with COPE guidelines, all cases of suspected misconduct identified by AI tools should be carefully reviewed by humans to verify accuracy and ensure accountability;
(2) Journals/publishers that use AI tools to detect misconduct should use only well-tested and reliable tools, remain vigilant concerning forms of misconduct that cannot be detected by these tools, and stay abreast of advancements in technology;
(3) Journals/publishers should inform authors about irregularities identified by AI tools and give them a chance to respond before forwarding allegations to their institutions in accordance with COPE guidelines;
(4) Journals/publishers that use AI tools to detect misconduct should screen all relevant submissions and not just random/purposefully selected submissions; and
(5) Journals should inform authors about their use of AI tools to detect misconduct, their policies and procedures for responding to suspected cases of misconduct, and their definition of misconduct.
Acknowledgements
We thank the journal editor and two anonymous reviewers for their constructive and valuable feedback. We are also grateful for helpful comments from Ivan Oransky and Jennifer Wright.
Funding
Mohammad Hosseini was supported by the National Institutes of Health’s National Center for Advancing Translational Sciences (UL1TR001422). David Resnik was supported by the Intramural Program of the NIH. The funders played no role in the design, analysis, decision to publish, or preparation of the manuscript.
Ethical approval
Ethical approval is not relevant to this study because it did not involve human subjects.
