Abstract
Background:
Artificial intelligence (AI) is rapidly infiltrating multiple areas in medicine, with gastrointestinal endoscopy paving the way in both research and clinical applications. Multiple challenges associated with the incorporation of AI in endoscopy are being addressed in recent consensus documents.
Objectives:
In the current paper, we aimed to map future challenges and areas of research for the incorporation of AI in capsule endoscopy (CE) practice.
Design:
Modified three-round Delphi consensus online survey.
Methods:
The study design was based on a modified three-round Delphi consensus online survey distributed to a group of CE and AI experts. Round one aimed to map out key research statements and challenges for the implementation of AI in CE. All queries addressing the same questions were merged into a single issue. The second round aimed to rank all generated questions during round one and to identify the top-ranked statements with the highest total score. Finally, the third round aimed to redistribute and rescore the top-ranked statements.
Results:
Twenty-one experts (16 gastroenterologists and 5 data scientists) participated in the survey. In the first round, 48 statements divided into seven themes were generated. After all statements were scored and the top 12 rescored, the use of AI for identification and grading of small bowel pathologies scored highest (mean score 9.15), followed by correlation of AI and human expert reading (9.05) and real-life feasibility (9.0).
Conclusion:
In summary, our current study points out a roadmap for future challenges and research areas on our way to fully incorporating AI in CE reading.
Introduction
Artificial intelligence (AI) technologies are being rapidly developed and implemented in multiple areas of medicine. In recent years, gastrointestinal (GI) endoscopy led the way, with several AI-based technologies for colonic polyp detection introduced and evaluated in clinical practice.1–8 Some of the systems are already being used routinely across the world, even though real-world implementation results are still lacking.
AI is being implemented in several other areas of GI endoscopy, such as early detection of gastric neoplasia,9,10 Barrett’s esophagus,11 endoscopic ultrasound,12,13 and grading of mucosal inflammation in ulcerative colitis.14–18 Another field with rapidly developing AI research is capsule endoscopy (CE), with several publications evaluating deep learning for automated detection of inflammatory lesions,19–26 vascular lesions,27,28 protruding and neoplastic lesions/masses,29 and scoring of bowel cleanliness.30
However, multiple challenges still stand in the way of translating the impressive experimental performance of AI in CE into clinical practice. These challenges include standardization of the results, validation of established end points, creation of common datasets and computational methodology, and correlation with clinical outcomes. They are partially common to other areas of GI endoscopy and medicine in general31 and are being addressed by expert consensus meetings and workshops that provide roadmaps for future research targets and methodologies. Recently, such a priority-setting statement was published for colonoscopy5; the key themes were identified as the establishment of clinical trial design/end points, technological development, clinical integration, data access and annotation, and regulatory approval.
The main aim of our study was to identify the top research priorities for the implementation of, and further research on, AI in CE.
Materials and methods
Study design
The study design was based on a modified three-round Delphi consensus online survey. The modified Delphi methodology is well established in the medical literature5,32 for setting research priorities based on experts’ opinions. All rounds were distributed through GoogleForm® (Mountain View, CA, USA), and the study was conducted between September 2021 and January 2022. The first round consisted of an open, anonymized 10-query survey proposed to a panel of CE experts. The suggested queries, based on the experts’ opinions, aimed to identify and map out key research statements and challenges for the implementation of AI in CE. All answers collected during round one were reviewed by a core group (CG) and categorized into seven distinct themes. Queries addressing the same question were merged into a single issue by the CG. The second round aimed to rank the questions generated during round one and to identify the highest-scoring statements. The third round aimed to redistribute and rescore the top-ranked statements. All respondents were asked to rate all statements on a wide numerical scale from 1 (very low priority) to 10 (very high priority).
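The ranking step of the second round can be illustrated with a minimal sketch. It assumes that statements are ranked by their mean score and that every statement tied with the last selected one is retained; the function name `top_ranked`, the statement labels, and the ratings below are hypothetical and are not taken from the study data.

```python
from statistics import mean


def top_ranked(scores_by_statement, k):
    """Return the k highest statements by mean score, keeping ties at the cutoff.

    scores_by_statement maps a statement label to the list of 1-10 ratings
    it received from the respondents (hypothetical input format).
    """
    for label, ratings in scores_by_statement.items():
        if not all(1 <= r <= 10 for r in ratings):
            raise ValueError(f"ratings for {label} must lie between 1 and 10")

    means = {label: mean(r) for label, r in scores_by_statement.items()}
    # Sort by mean score, highest first; sorted() is stable, so equal means
    # keep their original insertion order.
    ranked = sorted(means.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) <= k:
        return ranked
    cutoff = ranked[k - 1][1]
    # Keep every statement whose mean reaches the k-th best mean.
    return [(label, m) for label, m in ranked if m >= cutoff]


# Hypothetical ratings from three respondents for four statements.
ratings = {
    "S1": [9, 10, 9],
    "S2": [8, 8, 8],
    "S3": [7, 9, 8],
    "S4": [5, 6, 4],
}
selected = top_ranked(ratings, k=2)
print([label for label, _ in selected])
```

In this example S2 and S3 share the same mean, so asking for the top two statements returns three. Under this tie rule, the number of selected statements can exceed the nominal cutoff.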
The core group and expert group
The CG comprised translational CE readers and data scientists (UK, RL, AK, XD, and AH), who acted as key opinion leaders in conducting this study. All questionnaires were sent to a panel of CE experts, including the CG, with diverse backgrounds, including physician expert CE readers and data scientists working in the CE field.
Results
Among the 22 experts invited to participate in this study, 21 answered at least one of the questionnaires. The participation rate was 90% (n = 19) for the first round and 95% (n = 20) for each of the second and third rounds. The 21 respondents were considered the expert group (EG) and included physician expert CE readers (76%, n = 16) and data scientists (24%, n = 5). Members of the EG were based in Denmark (n = 3), England (n = 1), France (n = 4), Germany (n = 1), Greece (n = 1), Ireland (n = 1), Israel (n = 3), Italy (n = 2), Norway (n = 1), Portugal (n = 1), Spain (n = 1), and Sweden (n = 2). The mean age of the experts was 49 years. The main practice settings were academic (n = 18; 86%) and mixed academic/private (n = 3; 14%). The physician CE experts had a mean CE reading experience of 14 years [interquartile range (IQR) = 13] and read a mean of 154 CE studies annually (IQR = 150). The data scientists had a mean CE experience of 12 years.
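The reported participation rates can be checked with a short calculation. The sketch below assumes, as an inference from the numbers rather than something stated in the text, that the denominator is the 21 respondents of the EG and not the 22 invited experts; the function name `participation_rate` is hypothetical.

```python
def participation_rate(respondents, panel_size):
    """Return the participation rate as a whole-number percentage."""
    return round(100 * respondents / panel_size)


EXPERT_GROUP = 21  # respondents who answered at least one questionnaire
INVITED = 22       # experts invited to participate

# Rates computed against the expert group (assumed denominator).
round_one = participation_rate(19, EXPERT_GROUP)
rounds_two_three = participation_rate(20, EXPERT_GROUP)

# Rates computed against all invited experts, for comparison.
round_one_invited = participation_rate(19, INVITED)
rounds_two_three_invited = participation_rate(20, INVITED)

print(round_one, rounds_two_three)
print(round_one_invited, rounds_two_three_invited)
```

Only the 21-member denominator reproduces the reported 90% and 95%; with 22 invited experts the rates would round to 86% and 91%.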
After review by the CG, the first round generated 48 statements divided into seven themes (Table 1). These statements were then scored in the second round. In round two, on the scale from 1 (very low priority) to 10 (very high priority), the mean scores of the 48 statements ranged from 4.6 to 9.2 (Table 1). The top 12 statements, including tied scores, were then identified; they came from three themes: performance metrics, AI in CE in clinical practice, and auditing AI systems. These 12 statements were rescored in the third round, in which the mean scores ranged from 7.63 to 9.15 (Table 2).
Table 1. Results of the first round after review by the core group.
AI, artificial intelligence; CE, capsule endoscopy.
Table 2. Results of the third round for the top 12 ranked statements.
AI, artificial intelligence; CE, capsule endoscopy; SB, small bowel.
Discussion
The current study is the first attempt to prioritize and standardize the research challenges and questions in the application of AI in CE. The consensus was reached through a modified three-round Delphi process involving an established group of CE experts and data engineers with extensive experience in the subject.
AI is rapidly being incorporated into multiple fields in medicine, with GI endoscopy among the leading disciplines. In colonoscopy, AI-based systems that facilitate polyp detection are already commercially available and have been shown to improve the polyp detection rate by up to 40%.6,7 In CE, comparative research is not yet available; however, detection of multiple types of small bowel (SB) and colonic pathologies is accurate and feasible.33–35 CE is perhaps an even more attractive target for AI research, as there is no need for real-time diagnosis and the variety of pathology types is somewhat limited; accordingly, identification of most SB pathologies by AI has been very accurate, with an area under the curve (AUC) above 90%.22,30,36–38 In future models of CE, strong incorporation of AI modules with automated lesion markup can be expected. However, the incorporation of AI into clinical practice and clinical trials requires a huge leap in terms of standardization, quality assessment, reproducibility, and workflow integration. In a recent large European survey of 380 gastroenterologists (88% of them experienced capsule readers), a majority of the responders agreed that AI would positively impact CE, shorten CE reading time, help standardize reporting in CE, and characterize lesions seen in CE; however, the likelihood of complete replacement of human readers by AI was deemed to be low.39
We aimed to map and prioritize the main challenges for further research and integration into clinical practice. Our EG comprised 76% physicians and 24% data scientists, all of them with an extensive track record of CE reading and research. The statements scored highest by this EG were still those referring to the accuracy of detecting findings in both the SB and the colon, as well as the optimal accuracy threshold for the algorithm. The next group of statements addressed feasibility and accuracy in a real-world setting. Indeed, to date, no real-world model for the utilization of AI in CE has been published. Identification of specific lesions by AI may be introduced into the capsule reading software of any capsule producer/vendor; nonetheless, clinical decision-making or predictive models based on AI capsule reading (of either complete videos or still images) are still missing.
Most of the available studies included images obtained with a specific capsule model or brand. Widespread utilization of AI will require brand/model-spanning algorithms, which are still very rare.28 Similarly, a clinical algorithm would be required to detect multiple types of pathologies at the same time and on the same still image, regardless of the location of the image (SB/colon). An additional issue of concern is whether AI would be able to completely replace a human reader, and what degree of human supervision/auditing will be required. This challenge is closer to those arising in imaging and pathology; AI in colonoscopy may augment human judgment but will not replace it completely, as the human is still behind the scope and is instrumental in obtaining quality images. CE diagnosis, by contrast, takes place entirely post-acquisition, and human intervention is not required for anything but the reading itself. Nevertheless, CE reading is a tedious and lengthy task, especially for colonic capsules; AI can shorten the reading time by at least 95%; however, real-life accuracy data that could support this model in clinical practice are completely lacking. This issue could be critical for the uptake of colonic CE, which is currently hampered by long reading times and the resulting devaluation of the economic model for this potentially appealing screening modality. An additional temporary compromise could be to utilize AI to remove normal images, similar to the current ‘Quickview’ or top-100 features; currently, these features are insufficiently accurate40,41; however, future hardware and software improvements may change that.
It is likely that in the near future we will witness several AI systems for CE, some incorporated in the reading software and others as standalone suites.42 There is a need to compare the accuracy of these and other forthcoming systems and a requirement for benchmarking parameters for both clinical trial and real-world use.
In inflammatory bowel diseases (IBD), CE has the potential major advantage of being able to assess the panenteric inflammatory burden. The importance of mucosal healing as a therapeutic target in IBD has been well described.43,44 However, the concept of mucosal healing in IBD almost invariably addresses the colon and the terminal ileum, whereas the mucosal responsiveness of different gut segments to medical treatment is not identical.45,46 In addition, in Crohn’s disease patients in clinical remission, some residual SB inflammation is very common47 and has major clinical implications for the likelihood of long-term remission.48 Surprisingly, only a handful of studies to date have utilized CE for prospective evaluation of mucosal healing.49,50 In the last few years, several studies have reported the use of colonic capsules for panenteric evaluation51–54; recently, a specialized Crohn’s disease capsule (Pillcam Crohn, Medtronic, Minneapolis, MN, USA) has been released.55–58 AI-read CE could be a potentially safe and accurate modality for the assessment of mucosal inflammation in clinical trials in IBD, pending further benchmarking and standardization. To date, no studies evaluating complete-film AI-augmented reading have been published. This challenge may require a very different analytical approach.
Our study has several limitations. Primarily, this was a Delphi survey of a predefined group of AI and CE experts. The group was limited in size; however, it included participants with a significant track record in the field. Attitudes toward AI in CE in a larger and more representative group of gastroenterologists were previously evaluated by our group.39 In addition, our objective was to raise and solicit research questions; the suggestions of the participating experts merit further research efforts in the years to come.
Conclusion
In summary, our current study points out a roadmap for future challenges and research areas on our way to fully incorporating AI in CE reading. These statements are useful not only for research but also for AI medical education.
