Abstract
The integration of artificial intelligence (AI) in health care, particularly in colorectal cancer screening, represents a transformative advancement in medical diagnostics. This commentary explores the development and deployment of AI platforms in colonoscopy, highlighting the critical role of human–AI interaction and the valuable lessons learned throughout this journey. We discuss the challenges of integrating AI into clinical settings, including the necessity for robust AI validation, transparency, and accountability. Key insights from our experience emphasize the importance of effective human–AI collaboration to enhance diagnostic accuracy and procedure quality. This commentary aims to share critical insights that can guide future AI applications in various medical disciplines.
Introduction
The potential of artificial intelligence (AI) to revolutionize health care, particularly in cancer screening, has been increasingly recognized.1,2 Cancer screening aims to detect cancer at an early stage, but human error and variability can limit its effectiveness. AI platforms can address some of these challenges by providing consistent, accurate, and real-time analysis of medical images and videos.3,4
Colorectal cancer (CRC) screening is a prime example of how AI can make a significant impact. CRC is the third most common cancer worldwide, with precancerous polyps offering a window for early detection and intervention. However, identifying and characterizing these lesions can be challenging even for experienced endoscopists.5 AI platforms can aid in this process by enhancing the diagnostic accuracy of colonoscopies.2,6
The development of AI platforms for polyp detection during colonoscopies represents a significant milestone in the intersection of AI and health care.1 In the journey of developing a successful AI platform for screening colonoscopy, numerous valuable lessons have been gleaned. These lessons, derived from the intricate process of translating AI from development to deployment in health care, have significantly shaped our understanding of the challenges and opportunities in this field.
The genesis of the first AI platform for polyp detection to be cleared in the United States7–9 was rooted in the need to address unmet clinical needs in gastroenterology, specifically to reduce the adenoma miss rate and enhance overall procedure quality in colonoscopy through AI technology. This endeavor began before AI became a mainstream tool in health care, capitalizing on a unique opportunity to record and analyze colonoscopy videos to aid early polyp detection.8
Following this initial de novo clearance by the FDA, other companies joined the AI race in gastroenterology within the United States and worldwide.2,6,10 The evolution of these AI platforms since their inception has been remarkable. These devices have been adopted globally, particularly in the United States, where they have significantly influenced the field of gastroenterology. Their success in improving clinical outcomes has been corroborated by numerous studies and trials, affirming their utility and effectiveness in enhancing gastroenterological procedures.11–13 These findings have been instrumental in establishing the credibility of these AI platforms and highlighting their potential to revolutionize CRC screening. Although high-quality evidence13 has confirmed the benefit of AI in improving adenoma detection rates, it is also essential to acknowledge that some data suggest a lack of performance improvement.14
Lessons Learned
The journey into developing a successful AI platform for endoscopy has been a rewarding and enlightening experience. It has offered numerous lessons, some of which were expected, whereas others were surprising and unexpected. These lessons have not only shaped the development and deployment of the AI platform but have also influenced the broader understanding of AI in health care. In this commentary, we will focus on three specific areas where we learned some lessons we had not anticipated when we embarked on this journey.
Transparency and accountability
One of the first lessons learned during our journey is that developing AI technologies, particularly in health care, is a delicate balancing act. On one hand, we have the immense potential of AI to improve patient outcomes, streamline health care processes, and reduce costs. On the other hand, we have legitimate concerns about patient safety, data privacy, and the ethical implications of AI use. Striking the right balance between these competing interests is a complex task that requires careful consideration and a nuanced understanding of the technology and its potential impacts. Differing perspectives on what constitutes “responsible AI” are often influenced by the varying priorities of stakeholders. For instance, AI developers and manufacturers might focus on innovation and economic gains, whereas health care providers and patients are likely to emphasize safety and ethical standards.
Given these complexities, various authoritative bodies—including The White House,15 The European Union,16 WHO,17 and FDA18—have set forth principles to guide the responsible development of AI. These guidelines stress the importance of transparency and accountability in AI development to foster wider acceptance and integration within health care settings.
As we progressed with our AI platform, we found that transparency about our development processes,5,19,20 the data sources used,8 and sharing datasets to spur further innovation21 significantly bolstered stakeholder confidence. Engaging with the scientific community through peer-reviewed publications has allowed us to subject our work to both regulatory oversight and expert academic critique, enhancing the credibility and reliability of our AI applications.
Another lesson learned regarding transparency and accountability involves robust validation and testing procedures for AI platforms, especially those intended for clinical use. Ensuring these platforms’ safety, efficacy, and reliability is imperative and requires comprehensive testing. This includes technical validation, where the AI’s performance is tested across various datasets;18 clinical validation, such as conducting randomized controlled trials (RCTs) to evaluate AI’s impact on patient outcomes;22,23 and real-world studies, for ensuring AI-based medical devices continue to perform safely and effectively after they have reached the market.24,25 Adhering to established guidelines for conducting and reporting on AI performance studies is crucial. Guidelines such as DECIDE-AI,26 SPIRIT-AI,27 CONSORT-AI,28 STARD-AI,29 and TRIPOD-AI30 offer structured frameworks that ensure transparency, accountability, and efficacy throughout the different stages of AI development and implementation. It is of vital importance for both the industry and the medical community to thoroughly understand these guidelines. A more profound grasp of the scientific requirements is essential for health care providers to discern well-conducted AI studies that provide solid evidence from the plethora of less rigorous research often found in the scientific literature. By enhancing this knowledge base, clinicians can effectively evaluate AI technologies, adopting only scientifically validated and effective solutions in clinical practice. Conversely, it is equally crucial that AI engineers and medical device manufacturers understand clinical needs and respond with appropriate solutions.
Communication between stakeholders
Perhaps one of the most challenging lessons learned is the need for effective communication and collaboration between AI developers and health care professionals. The successful deployment of AI in health care hinges on clinicians’ acceptance and adoption of the technology. This requires not only educating clinicians about the benefits and limitations of AI but also actively involving them in the development process to ensure that the AI platform meets their needs and fits seamlessly into their workflow.17,18,31,32
In the health care sector, the introduction of new technologies often encounters resistance because of a variety of factors, including the fear of obsolescence, concerns about patient safety, and the perceived threat to professional autonomy.33 The adoption of AI in health care is no exception. Despite the demonstrated potential of AI to enhance clinical decision making and improve patient outcomes, its integration into routine clinical practice has been met with mixed reactions from physicians.34 The introduction of AI in health care represents a paradigm shift in the practice of medicine, requiring physicians to adapt to new ways of working and thinking. This change can be challenging, particularly for those unfamiliar with AI or who may have reservations about its effectiveness or ethical implications.35 Therefore, it is essential to address these concerns through effective communication, training, and support. In our experience, when physicians were not adequately prepared for the AI or did not accept the new tool, the clinical study outcomes were suboptimal.14
Conversely, we learned that promoting information sharing is a powerful catalyst for increasing the number of scientific publications using our technology. By encouraging investigators to publish their study results, regardless of outcome, we facilitate knowledge sharing and accelerate innovation. In less than 4 years since FDA clearance of our AI platform, more than 30 papers have been published utilizing it, eight of which were RCTs.36 This success is a testament to the importance of collaboration in developing AI solutions for health care.
One significant lesson learned from this experience was the value of a multidisciplinary approach in addressing unmet clinical needs and improving procedure quality through AI technology.18 Our unique partnership between a pharmaceutical company, a global med-tech giant, physicians, and AI experts brought together diverse expertise in handling high-quality data, understanding clinical and regulatory needs, marketing and distribution, and AI for medical image diagnosis.8 This collaboration was instrumental in developing our AI platform, which has improved polyp detection and overall procedure quality in colonoscopy. Our experience with AI-driven Software as a Medical Device (SaMD) highlights the importance of effective communication and resource sharing among stakeholders to prevent redundant efforts and ensure commercial viability.
Unfortunately, fragmentation is common among AI initiatives, leading to delays and failures. To address this issue, we propose introducing specialized marketplaces for AI applications that facilitate innovation sharing. These marketplaces can streamline regulatory approval processes, accelerate technology commercialization, and provide a structured environment where stakeholders can collaborate effectively.37 By creating these ecosystems, more AI devices can reach patients, ultimately improving health care outcomes.
Human–AI interaction
The rapid growth in information and computer technologies in the 1980s led to the recognition of the importance of studying human–computer interaction.38 This field aims to investigate social and psychological aspects of interactions that influence the acceptability and utility of new technologies. As humans’ methods of interacting with technology have become more sophisticated, so too has the complexity of their influences on and impact from these interactions.39 Ease of use is a crucial driver of uptake for new technological products or platforms.40 In medical science, successful implementation of AI platforms depends not only on technical success but also on the ability to interact effectively with human operators.41
An important lesson learned is that when the appropriate conditions are met, the potential for a synergistic relationship between AI and human expertise becomes evident.20 This “human–AI hybrid” model has demonstrated superior accuracy in medical tasks compared with either the AI or the human practitioner alone. The key to achieving this result lies in the AI’s ability to clearly communicate the boundaries of its predictions. By doing so, the physician can effectively integrate the additional information provided by the AI while also considering their own knowledge and the clinical context.5,20,42
Achieving optimal collaboration requires careful consideration of user interfaces and communication mechanisms between AI and physicians. The AI must provide clear, concise, and actionable information that physicians can easily interpret and apply. In addition, it should transparently disclose its limitations and uncertainties to prevent overreliance on AI and ensure physicians remain in control of clinical decision-making processes.10 This “human–AI hybrid” model represents a significant advancement in AI health care applications, highlighting AI’s potential as a complementary tool that can enhance accuracy and efficiency.
Human–AI interaction will be crucial for individual application success or failure; therefore, optimizing interface elements is essential to prevent clinician frustration with poorly designed AI platforms.32 Historically, the development of AI platforms has focused mainly on technical outcomes in idealized settings, expecting clinicians and patients to adapt to these technologies without sufficient consideration for human preferences and cognitive biases.41 To create more user-friendly, effective, safer, and better-value AI systems, we must reorient our focus toward developing human-centered AI and incorporating the study of human interaction at each stage of platform development. By aligning with appropriate regulation and governance, we can ensure that AI platforms are designed with humans in mind, leading to more successful implementation and adoption.43
A Look to the Future
The rapid advancements in large language models and large multimodal models, like GPT-4,44 PaLM,45 and Gemini,46 have significant implications for health care. These models can effectively encode clinical knowledge and perform impressively in medical question-answering benchmarks, even for complex cases requiring clinical reasoning and multimodal understanding.47
However, before these models are implemented in real-world scenarios, it is crucial to address the regulatory, clinical, and equity-related risks they pose. Today, generative AI in health care is being used primarily for nondiagnostic tasks that have a lower risk profile but can still significantly improve care providers’ efficiency by alleviating administrative burdens. Nevertheless, the capabilities of large multimodal language models are ushering in a new era of possibilities for health and medicine.47 These advancements hold the potential to accelerate biomedical discoveries, assist in health care delivery, and enhance patient experiences.
In the future, AI platforms will continue to improve, becoming increasingly time-saving technologies. By automating routine tasks, AI can free up valuable time for physicians to focus on more complex aspects of patient care. In the context of CRC screening, AI platforms have demonstrated their ability to improve the efficiency and accuracy of polyp detection, thereby enhancing the overall quality of colonoscopy procedures. However, the benefits of AI extend beyond mere diagnostic accuracy. By automating routine tasks, AI allows physicians to devote more time to patient interaction, thereby fostering a more holistic approach to patient care. This shift in focus from the keyboard to the patient is not merely a return to the traditional doctor–patient relationship but a significant leap forward in patient-centered care.
However, it is essential to temper optimism with realism. AI is not a panacea but rather a tool that requires careful consideration of various factors before successful integration into the health care workflow. These factors include physician acceptance, clarity of AI predictions, regulatory landscape, and more.
Conclusions
The integration of AI into CRC screening has been transformative, driving significant advancements in medical diagnostics and patient care. This journey highlights the importance of transparency, accountability, and strong communication between developers and health care professionals to build trust and facilitate AI adoption.
As we navigate this technological landscape, a crucial question arises: How do we allocate tasks between humans and machines? The rapid growth of tasks that technology can perform has led to an ever-diminishing number of tasks reserved for human expertise. This debate is not new; researchers have long disagreed on the purpose of AI—should it replace or augment human performance? However, as AI continues to evolve at a breakneck pace, it appears that the decision will ultimately be driven by technological development rather than specific ethical considerations.
To ensure AI remains a safe and effective tool, adhering to stringent regulatory and ethical standards is crucial. As we look forward, thoughtful integration of AI into health care holds great promise—better clinical outcomes, more efficient processes, and a new era of medical technology are all within reach.
Author Disclosure Statement
A.C. is affiliated with Cosmo Intelligent Medical Devices, the developer of the GI Genius medical device.
