Abstract
Journals and publishers are increasingly using artificial intelligence (AI) to screen submissions for potential misconduct, including plagiarism and data or image manipulation. While using AI can enhance the integrity of published manuscripts, it can also increase the risk of false/unsubstantiated allegations. Ambiguities related to journals’ and publishers’ responsibilities concerning fairness and transparency also raise ethical concerns. In this Topic Piece, we offer the following guidance: (1) All cases of suspected misconduct identified by AI tools should be carefully reviewed by humans to verify accuracy and ensure accountability; (2) Journals/publishers that use AI tools to detect misconduct should use only well-tested and reliable tools, remain vigilant concerning forms of misconduct that cannot be detected by these tools, and stay abreast of advancements in technology; (3) Journals/publishers should inform authors about irregularities identified by AI tools and give them a chance to respond before forwarding allegations to their institutions in accordance with Committee on Publication Ethics guidelines; (4) Journals/publishers that use AI tools to detect misconduct should screen all relevant submissions and not just random/purposefully selected submissions; and (5) Journals should inform authors about their definition of misconduct, their use of AI tools to detect misconduct, and their policies and procedures for responding to suspected cases of misconduct.
Background
Recently, the research integrity community has been using artificial intelligence (AI) tools, such as Papermill Alarm, Proofig, FigCheck, ImaCheck, and ImageTwin, to identify manipulated images and papers produced by paper mills or AI chatbots (Else, 2022; Oza, 2023; Sanderson, 2024; STM Integrity Hub, n.d.). Now, journals and publishers are beginning to take advantage of these tools (Conroy, 2023). The editors of the Science family of journals, for example, recently announced that they will use AI to screen all submitted images for problematic manipulations (Thorp, 2024). In March 2024, Wiley announced it would begin using a new AI-powered Papermill Detection service to identify papers produced by paper mills (De Rose, 2024). A Wiley spokesperson told the research integrity blog Retraction Watch that this system is “meant to supplement human integrity checks with AI-powered tools. This means that papers will not automatically be rejected if they are flagged in the system – rather, they will be flagged to an editor for closer consideration before proceeding in the publishing workflow” (Oransky, 2024). While using AI tools to screen journal submissions for plagiarism and data or image manipulation is likely to enhance the integrity of published research, it also generates some ethical concerns. In this Topic Piece, we explore these concerns and offer guidance for journals and publishers.
AI errors increase the likelihood of false or unsubstantiated allegations of misconduct
Although misconduct-detection programs perform well, they are not error-free (Haritonova, 2022). Errors made by these programs include false positives (i.e. flagging a deviation from research standards where none exists) and false negatives (i.e. failing to flag a genuine deviation). For example, Turnitin, a popular plagiarism detection tool used by many academic institutions, has a false positive rate of 4% (D’Agostino, 2023). AI image-manipulation detection programs are also mostly accurate, but they can make mistakes (David, 2023). Proofig, for example, has an estimated false positive rate of 5.8% (Evanko, 2022). While both types of error can undermine efforts to detect scientific misconduct, false positives raise more serious concerns than false negatives because they may lead to false allegations of misconduct against innocent researchers.
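The practical weight of even a modest false positive rate depends on how rare genuine misconduct is among submissions. The minimal sketch below works through this base-rate arithmetic with Bayes’ rule; the prevalence and sensitivity figures are assumptions chosen purely for illustration, and only the 5.8% false positive rate echoes the Proofig estimate cited above.

```python
# Illustrative only: positive predictive value (PPV) of an AI screening tool
# under assumed rates. Prevalence and sensitivity are hypothetical; the false
# positive rate echoes the published Proofig estimate (~5.8%).

def ppv(prevalence: float, sensitivity: float, false_positive_rate: float) -> float:
    """Fraction of flagged submissions that are true positives (Bayes' rule)."""
    true_flags = prevalence * sensitivity
    false_flags = (1 - prevalence) * false_positive_rate
    return true_flags / (true_flags + false_flags)

# Suppose 2% of submissions contain genuine image manipulation, the tool
# catches 90% of them, and it falsely flags 5.8% of clean submissions.
print(f"PPV: {ppv(prevalence=0.02, sensitivity=0.90, false_positive_rate=0.058):.2f}")
# -> PPV: 0.24, i.e. roughly three out of four flags would be false alarms.
```

Under these assumed numbers, most flags would not correspond to genuine misconduct, which underscores why careful human review of every flag matters.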
The use of AI tools to screen submissions is also likely to lead to unsubstantiated allegations, that is, allegations that lack a sufficient factual basis. An unsubstantiated allegation may occur when an AI tool correctly identifies a deviation from research standards, but there is not enough evidence to show that the accused person (or respondent) acted with ill intent. Most research misconduct regulations and policies distinguish between misconduct (which is intentional) and honest error (which is not) (Shamoo and Resnik, 2022). For example, under US federal policy, a finding of research misconduct (defined as fabrication or falsification of data or plagiarism) requires proof by a preponderance of the evidence that the person acted intentionally, knowingly, or recklessly (Office of Science and Technology Policy, 2000). While AI tools can enhance our ability to detect deviations from accepted research practices, they cannot prove misconduct, because they cannot show that the respondent acted knowingly, intentionally, or recklessly, which requires delving into the person’s state of mind (Shamoo and Resnik, 2022).
Because researchers’ behavior can be negligent or incompetent without qualifying as misconduct, distinguishing between misconduct and honest error can be a resource-intensive and time-consuming activity. In one of the most famous cases of alleged misconduct, which spanned roughly a decade from 1986 to 1996, a federal appeals panel found that Thereza Imanishi-Kari’s recordkeeping practices were negligent without amounting to research misconduct (Shamoo and Resnik, 2022). Although the media and research integrity experts tend to focus on high-profile cases involving substantiated allegations of research misconduct (such as the Hwang Woo-suk or Diederik Stapel cases), most misconduct allegations turn out to be unsubstantiated. A study of 3561 allegations of research misconduct involving Public Health Service (PHS)-funded research between 1994 and 2011 found that institutions dismissed 87.4% of allegations because they did not meet the definition of misconduct or there was insufficient evidence (Loikith and Bauchwitz, 2016). In 2022, the Office of Research Integrity, which oversees PHS-funded research, dismissed 85.9% of the misconduct allegations it received for the same reasons (Office of Research Integrity, 2023).
It is likely, therefore, that a high percentage of research misconduct allegations generated by AI tools will also turn out to be unsubstantiated. Support for this hypothesis comes from a recent study that examined manuscripts submitted to nine journals sponsored by the American Association of Cancer Research between January 2021 and May 2022. These journals used Proofig to screen submissions for improper image manipulation. Correspondence between the editors and authors showed that the majority (63%) of the 195 images identified as duplications were likely due to error rather than deliberate misbehavior (Evanko, 2022).
Unsubstantiated allegations of research misconduct can cause considerable emotional, financial, and reputational harm to respondents. Defending oneself against a research misconduct allegation is an arduous and emotionally taxing process that can last years. Respondents who can afford to hire an attorney may spend tens of thousands of dollars in legal fees, and their reputations may still be permanently damaged (Benderly, 2016). Furthermore, false or unsubstantiated allegations can impose needless costs on the institutions involved. One study estimated the cost of a single research misconduct investigation at more than $500,000 (Michalek et al., 2010).
AI tools have limitations that can impact efforts to detect misconduct
Detection tools have limitations that can impact efforts to detect plagiarism and image or data manipulation. For example, current AI tools can detect plagiarism of words but not other types of plagiarism, such as misappropriation of data, ideas, methods, arguments, images, tables, or unique forms of expression (Hosseini and Gordijn, 2020); the sketch following this paragraph illustrates why word-matching alone misses copied ideas. Current AI tools can detect relatively unsophisticated forms of image manipulation, such as image duplication or repositioning, but not more subtle forms of manipulation, such as alteration of the microdetails of represented structures (Bik et al., 2016). In the future, ill-intentioned researchers may take advantage of generative AI tools to create fraudulent images that are nearly impossible to distinguish from real images (Gu et al., 2022; Kim et al., 2024). Furthermore, because many AI detection tools are openly available, ill-intentioned researchers can run their manuscripts through these tools before submission and revise until the manuscripts pass, thereby evading detection after submission.
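To make the first limitation concrete, the toy sketch below shows how a simple word-overlap measure, of the kind underlying much text-matching software, flags verbatim copying but scores a paraphrased idea as entirely original. This is a deliberately simplified illustration, not a description of how Turnitin or any other named product works.

```python
# Toy illustration of why word-overlap detectors miss paraphrased ideas.
# Simplified sketch; real text-matching systems are more sophisticated,
# but they still compare surface text rather than underlying ideas.

def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two passages."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    return len(words_a & words_b) / len(words_a | words_b)

original   = "the enzyme activity doubled after heat treatment"
verbatim   = "the enzyme activity doubled after heat treatment"
paraphrase = "thermal exposure produced a twofold increase in catalytic rate"

print(jaccard(original, verbatim))    # 1.0 -> flagged as copied
print(jaccard(original, paraphrase))  # 0.0 -> missed, though the idea is taken
```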
Journals and publishers that plan to use AI to detect misconduct should be aware of these tools’ limitations and correct for their biases (COPE Council, 2021: 6). For example, journals and publishers should use only well-tested and reliable tools and remain vigilant about preventing and detecting forms of plagiarism and data or image manipulation that their AI tools cannot easily detect. They should also stay abreast of technological advancements, such as next-generation detection software and digital certification, which can help to verify the originality of text and the authenticity of data and images (Chennamma and Madhushree, 2023; Kim et al., 2024).
Publishers’ responsibilities associated with misconduct detection using AI tools are unclear
Because using AI tools to screen submissions increases the ability to detect misconduct, it may also increase journals’ and publishers’ responsibilities related to reporting or investigating suspected misconduct. However, it is unclear what journals or publishers will do after detecting suspected misconduct in a submitted manuscript and whether or how they will share this information with authors or institutions. The editors of the Science family of journals indicate that the handling editor “will probe further and take steps that could include rejecting the paper” (Thorp, 2024: 7), but they say little about their process or about whether institutions will be informed. In cases where misconduct appears highly likely, rejecting the paper without informing the institution is not sufficient, because it neither prevents the author(s) from submitting elsewhere (Oransky and Marcus, 2016) nor alerts the institutions that support and fund the authors to their possible malpractice. It is also important to note that bias could occur in screening manuscripts for misconduct if AI tools are applied only to certain types of manuscripts or authors. Studies have shown, for example, that researchers tend to be biased against submissions from low- and middle-income countries and low-prestige institutions (Harris et al., 2017; Tavoletti et al., 2022).
To promote fairness and transparency, journals and publishers that use AI tools to detect misconduct should disclose their policies and procedures on their websites and follow Committee on Publication Ethics (COPE) guidelines for dealing with cases of suspected misconduct, including the stipulation to “inform institutions if misconduct by their researchers is suspected, and provide evidence to support these concerns; cooperate with investigations and respond promptly to institutions’ questions about misconduct allegations, with dedicated individuals or teams assigned to investigate and communicate with institutions; have policies and procedures for responding to institutions and other organizations that investigate cases of research misconduct” (COPE Council, 2024a, 2024b: 3).
Conclusion
While using AI tools to screen journal submissions for misconduct is likely to enhance the integrity of published research, it also generates some ethical concerns related to harm, fairness, and transparency, which need to be addressed. Therefore, we offer the following recommendations to journals and publishers that use AI to screen submissions for possible misconduct:
(1) In agreement with COPE guidelines, all cases of suspected misconduct identified by AI tools should be carefully reviewed by humans to verify accuracy and ensure accountability;
(2) Journals/publishers that use AI tools to detect misconduct should use only well-tested and reliable tools, remain vigilant concerning forms of misconduct that cannot be detected by these tools, and stay abreast of advancements in technology;
(3) Journals/publishers should inform authors about irregularities identified by AI tools and give them a chance to respond before forwarding allegations to their institutions in accordance with COPE guidelines;
(4) Journals/publishers that use AI tools to detect misconduct should screen all relevant submissions and not just random/purposefully selected submissions; and
(5) Journals should inform authors about their use of AI tools to detect misconduct, their policies and procedures for responding to suspected cases of misconduct, and their definition of misconduct.
Acknowledgements
We thank the journal editor and two anonymous reviewers for their constructive and valuable feedback. We are also grateful for helpful comments from Ivan Oransky and Jennifer Wright.
Funding
Mohammad Hosseini was supported by the National Institutes of Health’s National Center for Advancing Translational Sciences (UL1TR001422). David Resnik was supported by the Intramural Program of the NIH. The funders played no role in the design, analysis, decision to publish, or preparation of the manuscript.
Ethical approval
Ethical approval is not relevant to this study because it did not involve human subjects.
