Abstract
Inflammatory bowel disease is a complex chronic inflammatory disorder with challenges in diagnosis, choosing appropriate therapy, determining individual responsiveness, and prediction of future disease course to guide appropriate management. Artificial intelligence has been examined in the field of inflammatory bowel disease endoscopy with promising data in different domains of inflammatory bowel disease, including diagnosis, assessment of mucosal activity, and prediction of recurrence and complications. Artificial intelligence use during endoscopy could be a step toward precision medicine in inflammatory bowel disease care pathways. We reviewed available data on use of artificial intelligence for diagnosis of inflammatory bowel disease, grading of severity, prediction of recurrence, and dysplasia detection. We examined the potential role of artificial intelligence enhanced endoscopy in various aspects of inflammatory bowel disease care and future perspectives in this review.
Introduction
In their leading textbook, ‘Artificial Intelligence: A Modern Approach’, experts in artificial intelligence (AI), Stuart Russell and Peter Norvig, identify four possible goals of AI: systems that think like humans, act like humans, think rationally (i.e. within the ideal concept of intelligence), or act rationally. 1 The scientific discipline of AI can be classified into ‘Computer Vision’ (image recognition and classification), ‘Natural Language Processing’ (speech-to-text generation and translation), and ‘Machine Learning’, which is the most commonly employed type in medicine. 2 The objective of machine learning (ML) is to build an algorithm from inputted data that can make predictions about new data, such as future outcomes. Processes of ML include supervised learning, unsupervised learning, and deep learning.3,4 In supervised learning, the algorithm is trained on labeled data points. The algorithm learns the relationship between the known data and the correct outcomes. It then predicts outcomes for previously unseen data. On the other hand, unsupervised learning searches for patterns in the available data. The algorithm learns the inherent structure/relationships within inputted data without the supervision of prescribed labels and outcomes.1,4 Finally, deep learning, specifically deep/convolutional neural networks (DNN/CNN), works by using interconnected layers of algorithms, each receiving weighted input from earlier layers, to recognize patterns and ultimately capture complex relations within the data. In this way, these DNNs function much like the mammalian brain. 4
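The distinction between supervised and unsupervised learning can be illustrated with a minimal sketch. It is not drawn from any study cited in this review: the per-image "redness" values, the labels, and the function names are all hypothetical, and a nearest-centroid rule and one-dimensional two-cluster k-means stand in for the far larger models used in practice.

```python
# Toy illustration only: the feature values and labels below are invented,
# not taken from any study cited in this review.

def nearest_centroid_fit(features, labels):
    """Supervised step: learn one mean feature value (centroid) per label."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def nearest_centroid_predict(centroids, x):
    """Assign an unseen value to the label whose learned centroid is closest."""
    return min(centroids, key=lambda y: abs(centroids[y] - x))


def two_means(features, iterations=10):
    """Unsupervised step: split unlabeled values into two clusters (1-D k-means, k=2)."""
    low, high = min(features), max(features)  # initial cluster centers
    for _ in range(iterations):
        group_low = [x for x in features if abs(x - low) <= abs(x - high)]
        group_high = [x for x in features if abs(x - low) > abs(x - high)]
        low = sum(group_low) / len(group_low)
        high = sum(group_high) / len(group_high)
    return low, high


# Hypothetical per-image "redness" scores (0 to 1) with expert labels.
redness = [0.10, 0.15, 0.20, 0.70, 0.80, 0.90]
expert_labels = ["remission", "remission", "remission", "active", "active", "active"]

# Supervised: labels are used during training, then an unseen image is classified.
centroids = nearest_centroid_fit(redness, expert_labels)
prediction = nearest_centroid_predict(centroids, 0.75)

# Unsupervised: the same values are grouped without ever seeing the labels.
cluster_centers = two_means(redness)
```

In the supervised half the expert labels determine what the algorithm learns; in the unsupervised half the two groups emerge from the data alone, and a clinician would still need to interpret what each cluster represents.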
AI in medicine, specifically in gastroenterology, has been of growing interest with exciting developments in the last couple of decades. Two major applications of ML are computer-aided detection (CADe) for detection of a pathology (e.g. polyp identification) and computer-aided diagnosis (CADx) for predicting diagnosis (e.g. polyp classification). 4 Previously, the use of AI in gastroenterology has largely focused on CADe with polyp detection during screening or surveillance colonoscopy with advancement to CADx in predicting polyp pathology to aid in cancer diagnosis.2,5–7
Another area of interest is endoscopy in inflammatory bowel disease (IBD), where there is significant interobserver variation in disease diagnosis, endoscopic activity assessment, dysplasia detection, and prognostic assessment of the disease course. Given the significant heterogeneity in presentation, disease course, treatment response, and provider assessment, AI offers a way to make the endoscopic assessment of inflammation and lesion detection more objective, and may have a role in prognostication of disease recurrence. Potential applications of AI in IBD include diagnosis, assessment of mucosal disease activity, prediction of response to therapy/recurrence/complications/hospitalizations, and dysplasia detection (Figure 1). This review will focus on the technologies currently being studied in IBD and possible future applications. We will review published data on AI in disease diagnosis by endoscopy, staging or activity assessment, and lesion detection (summarized in Table 1). Given the rapidly expanding role of AI in other aspects of IBD care, including IBD patient education, electronic health record linkage using natural language processing (NLP), and pattern recognition during abdominal imaging, we have limited our discussion to AI application during IBD endoscopy for the purpose of this review.

Figure 1. Potential AI Applications during Endoscopy in IBD.
Table 1. Brief Overview of Selected Prior Studies Examining Utility of AI in Various Aspects of IBD Management.
AI, artificial intelligence; AUC, area under the curve; AUROC, area under the receiver operating curve; CD, Crohn’s disease; CI, confidence interval; IBD, inflammatory bowel disease; UC, ulcerative colitis; UCEIS, Ulcerative Colitis Endoscopic Index of Severity.
Diagnosis of IBD
The diagnosis of IBD is a clinicopathological diagnosis that involves a combination of a thorough clinical history, imaging, endoscopy, histopathology, and serum markers.8,9 It can be a difficult diagnosis to make, even for gastrointestinal (GI) specialists and pathologists. The accuracy of endoscopic evaluation rests on the subjective observation and biopsy retrieval of the single endoscopist performing it. Most endoscopy technologies currently employ high-definition white light endoscopy (HD-WLE) with optical enhancing chromoendoscopy options. Therefore, the challenge in endoscopic diagnosis is primarily interpretation and less so visualization. 4 A poor-quality endoscopy can lead to a delayed diagnosis and severe complications. 10 There is documented interobserver variation among pathologists in diagnosing IBD and, furthermore, in differentiating between ulcerative colitis (UC) and Crohn’s disease (CD), even among specialized GI pathologists, who are not available in most clinical centers. 9 For these reasons, serological, genetic, and inflammatory markers have become a focus of study for diagnosing IBD, differentiating UC from CD, and grading severity of inflammation. A combination of all these techniques is the current practice for diagnosing both UC and CD. 8
To aid in diagnosis for IBD and many other disorders, Thakkar and colleagues 11 developed a CADe system to assess the quality of endoscopy and present interpretable feedback to the endoscopist in real time. The system yields a score for each of four quality metrics (visible surface area, opened/distended colon, preparation conditions, and clarity of current view). When compared with scores given by expert endoscopists, this AI-driven system shows comparable performance. This system was used in 10 real-time colonoscopies to demonstrate its feasibility. The quality of endoscopies is highly dependent on the expertise of the endoscopist performing them. Furthermore, the many AI systems aimed at improving diagnosis (e.g. polyp/lesion detection) and treatment (e.g. identifying inflammatory tissue to determine remission versus further treatment in UC) depend on the quality of a procedure to identify a specific abnormality. Systems like this can provide the clinical team with an objective measure of confidence in the quality of a procedure, ultimately enhancing intraprocedural diagnostic precision.
To diagnose IBD and further differentiate between UC and CD, Mossotto and colleagues developed a CADx system using supervised and unsupervised learning in pediatric patients. The training data set used endoscopic images and histopathology data from previously diagnosed patients to build an algorithm that could classify CD versus UC. On the validation set, the combined DNN model classified CD and UC at diagnosis with 83.3% accuracy. 12 The importance of distinguishing CD and UC at diagnosis cannot be overemphasized because treatment options and follow-up differ between these diagnoses. A notable difference is the surveillance for dysplasia detection and subsequent intervention. Due to the significantly increased risk of colon cancer in UC, colectomy is often recommended if high-grade dysplasia is detected. 9 This system begins to show the potential for AI in diagnosis of IBD; however, integration of serum markers and application in real time are future developments that will be necessary for this complex diagnostic process.
Detection of inflammation
An early area of study was the use of ML for detection of inflammatory lesions in IBD patients including the areas that are beyond the reach of ileocolonoscopy. Systems that have been developed to do this use video capsule endoscopy and endomicroscopy technologies. While systems that can interpret live endoscopy images have not yet been developed, the DNN technology described here used with video capsule endoscopy and endomicroscopy may theoretically be used in the real-time setting.
Video capsule endoscopy
In the early 2010s, Kumar and colleagues used a supervised learning model to classify lesions on video capsule endoscopy images. Their system detected CD lesions with 91% precision but only 79% accuracy for classification of lesion severity. 13 Girgis and colleagues 14 also used capsule endoscopy to detect regions of inflammation in CD patients with an accuracy of 87%. Most recently, in 2021, Barash and colleagues demonstrated that their DNN technology which was trained first to identify CD lesions versus normal mucosa could also grade the severity of CD ulcers with impressive accuracy. The system performed with the highest classification accuracy [0.91, confidence interval (CI): 0.716–0.844] for distinguishing grade 1 versus grade 3 ulcers.15,16 These CADe systems show evidence that AI can be used with video capsule endoscopy to help decrease manpower and time spent reviewing these images while improving subjective human accuracy.
Endomicroscopy
A novel frontier is the field of endomicroscopy that allows histologic observation during endoscopy without requiring biopsies. Using a multidimensional system that incorporates endoscopic visualization and histological evaluation is important for the diagnosis of IBD.
A DNN model developed by Takenaka and colleagues in Japan is built on data from both endoscopic images and histopathology from patients with UC. The team used the Ulcerative Colitis Endoscopic Index of Severity (UCEIS) and Geboes score to define endoscopic and histologic remission, respectively. In the validation phase, this study found that the model predicted endoscopic remission with 90% accuracy and had a kappa score of 0.80 when compared with experienced reviewer scores. Histological remission was identified with 93% accuracy and reproducibility kappa score of 0.86 compared with pathologist readings. 17 This innovative system shows the potential to avoid biopsies in UC patients in remission, decreasing healthcare expenditure in this chronic disease for both the system (procedural costs and pathologist compensation) and patient.
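Agreement statistics such as the kappa scores quoted above correct raw agreement for the agreement expected by chance. The following is a minimal sketch of the unweighted Cohen's kappa for two raters; the function name and the label lists are hypothetical, and the cited studies may have used weighted variants or other implementations.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: (observed - expected) / (1 - expected)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of cases.")
    n = len(rater_a)
    # Fraction of cases on which the two raters give the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Invented example: model output versus an expert reader on ten images.
model = ["remission", "remission", "active", "active", "remission",
         "active", "remission", "active", "remission", "active"]
reader = ["remission", "remission", "active", "active", "remission",
          "active", "remission", "remission", "remission", "active"]

kappa = cohen_kappa(model, reader)
```

In this invented example the raters agree on 9 of 10 images (observed agreement 0.9) while chance agreement is 0.5, so kappa is lower than the raw agreement would suggest.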
Bossuyt and colleagues created a Red Density (RD) score as an endomicroscopic evaluation of UC activity. This DNN was built on algorithmic data from redness color map and vascular patterns on endoscopic images that were then linked to the Robarts histological index (RHI). The system performed with statistically significant correlation to RHI in the validation set (r = 0.65). 18 As discussed, this endomicroscopy system allows for computer-aided evaluation of patient inflammation versus remission without the need for extensive biopsies, and the score produced presents this information in a functionally objective form.
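A correlation coefficient such as the r reported above summarizes how closely an image-derived score tracks a histological index. The sketch below assumes Pearson's coefficient, which the text does not specify, and uses invented paired values; the function and variable names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("Need two sequences of equal length with at least 2 values.")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("Correlation is undefined when one sequence is constant.")
    return covariance / math.sqrt(spread_x * spread_y)


# Invented paired measurements: an image-based score and a histological index.
image_scores = [20, 35, 50, 65, 80]
histology_index = [2, 10, 6, 20, 15]

r = pearson_r(image_scores, histology_index)
```

A value near 1 would indicate that the image-based score rises almost linearly with the histological index, whereas a value near 0 would indicate no linear relationship.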
Scoring disease severity in UC
Multiple studies have found promising evidence that deep learning algorithms have the potential to predict severity of disease in UC. Systems are being trained to produce Mayo Endoscopic Disease Activity Assessment score and UCEIS scores based on endoscopic images. This has the clinical benefit of assessing the effectiveness of a current treatment for patients and determining appropriate future treatments. Given the poor interobserver agreement for these endoscopic scores, a DNN system would reduce human error, allow for more timely treatments, and improve a patient’s quality of life. 19 We will discuss these systems based on the imaging technologies.
Endoscopic images
Ozawa and colleagues developed a complex neural network system to label endoscopic mucosal images with Mayo endoscopic scores to evaluate disease activity. The system labeled images with Mayo scores of 0, 1, or 2–3 with an area under the receiver operating curve (AUROC) of 0.98. This system can be used to determine whether a patient with UC is in endoscopic remission (defined as Mayo score: 0–1) or has active inflammation (Mayo score: 2–3). Ozawa and colleagues 20 acknowledged the necessity of using this system in real time and claimed that the DNN processing time was less than 30 ms, which is fast enough for use during a colonoscopy.
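For readers less familiar with the AUROC, it can be read as the probability that a randomly chosen image with active disease receives a higher model score than a randomly chosen image in remission. The sketch below computes it by direct pairwise comparison; the scores and labels are invented, and the function name is hypothetical rather than taken from any cited system.

```python
def auroc(scores, labels):
    """AUROC by pairwise comparison; labels are 1 (active) or 0 (remission)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("Need at least one case of each class.")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0   # active image correctly ranked above remission image
            elif p == q:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Invented model outputs (probability of active disease) for eight images.
model_scores = [0.05, 0.20, 0.35, 0.60, 0.55, 0.80, 0.90, 0.30]
is_active = [0, 0, 0, 0, 1, 1, 1, 1]

area = auroc(model_scores, is_active)
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.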
Bhambhvani and Zamora recently demonstrated that their DNN system could go further and accurately classify UC lesion severity by specific Mayo scores of 1, 2, or 3 with an area under the curve (AUC) of 0.96, 0.86, and 0.89, respectively. The system performed with an average specificity of 85.7% and average sensitivity of 72.4%. 21 This expansion of AI from identification of inflammation to classification of severity is a promising advancement for patients with complex UC, who require detailed distinction between their levels of active disease.
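When a classifier assigns one of several Mayo scores, sensitivity and specificity are typically computed per class in a one-vs-rest fashion and then averaged. The sketch below assumes a simple unweighted (macro) average, which may differ from the averaging used in the cited study; the labels are invented and the function names are hypothetical.

```python
def one_vs_rest_rates(true_labels, predicted_labels, target):
    """Sensitivity and specificity for one class treated as 'positive'."""
    tp = fn = tn = fp = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == target:
            if predicted == target:
                tp += 1
            else:
                fn += 1
        else:
            if predicted == target:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("Each class needs both positive and negative cases.")
    return tp / (tp + fn), tn / (tn + fp)


def macro_average(true_labels, predicted_labels, classes):
    """Unweighted mean of per-class sensitivity and specificity."""
    rates = [one_vs_rest_rates(true_labels, predicted_labels, c) for c in classes]
    sensitivity = sum(r[0] for r in rates) / len(rates)
    specificity = sum(r[1] for r in rates) / len(rates)
    return sensitivity, specificity


# Invented example: expert Mayo scores versus model predictions for 12 images.
true_mayo = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
pred_mayo = [1, 1, 1, 2, 2, 2, 1, 3, 3, 3, 3, 2]

avg_sensitivity, avg_specificity = macro_average(true_mayo, pred_mayo, [1, 2, 3])
```

In this invented example the intermediate class (Mayo 2) is the hardest to separate, which pulls the average sensitivity below the average specificity.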
Numerous DNN models have also been built based on data from full endoscopy videos. Gottlieb and colleagues used both Mayo and UCEIS scores to compare the performance of a recurrent neural network model with human reader scores with very good reproducibility, κ = 0.84 for Mayo scoring and κ = 0.85 for UCEIS scoring. A DNN system by Stidham and colleagues 22 is capable of distinguishing UC in remission versus moderate/severe disease state using the Mayo score with an AUROC of 0.97. This model had comparable agreement to human reviewer scores, κ = 0.86. Yao and colleagues built a similar model and investigated its performance both on high- and low-resolution endoscopic videos. Their system performed on par with the aforementioned studies with the same aims of grading UC severity. They found improved agreement of the model and human scores with high-resolution videos (κ = 0.84) and an overall system accuracy of 84%. This highlights the need for standardization of endoscopy digitization if AI is to be used more broadly in both clinical trial and clinical treatment settings. 23
Maeda and colleagues 19 constructed a CADx system to predict persistent histological inflammation in patients with UC using endocytoscopy, which relies on a specialized endocytoscope that has a forward-facing microscope capable of 1000× real-time magnification compared with the traditional 50× of a standard endoscope. In the cohort of 187 patients with UC, when scoring per segment for histological active inflammation versus healing, the system performed with 91% accuracy, 97% specificity, and perfect agreement with histologist Geboes scoring (κ = 1), but only 74% sensitivity. 24 Because additional indications such as dysplasia surveillance necessitate endoscopic biopsy specimens regardless, lower sensitivity may be tolerated, given the exceptional specificity and agreement regarding histological remission.
Dysplasia detection
Current CADe/CADx systems aimed at the detection and differentiation of colon polyps/lesions have shown remarkable data for colon cancer screening studies, but this technology has yet to be applied as broadly for dysplasia surveillance in IBD patients. Theoretically, the same technology that differentiates types of polyps could be used to survey for visible dysplasia. In addition, virtual chromoendoscopy (VCE) is another technique that has recently been of interest for dysplasia surveillance in patients with IBD. A few recent studies have evaluated the potential role for VCE in the identification of dysplastic lesions. A multicenter randomized controlled trial by Khandiah and colleagues 25 compared the performance of high-definition VCE (i-scan OE mode 2) with HD-WLE, showing comparable detection of neoplasia. El-Dallal and colleagues recently published a meta-analysis of the comparative efficacy of VCE versus HD-WLE or traditional chromoendoscopy for dysplasia detection in higher risk IBD patients. In the per-patient analysis, there were no statistically significant differences between VCE, traditional chromoendoscopy, and HD-WLE. 26 Although still being investigated, VCE could be an additional modality that could be used by AI systems to target dysplasia in IBD.
In an alternate approach to AI outside of endoscopy, Selaru and colleagues demonstrated the potential for AI in dysplasia detection for patients with long-standing colitis using tissue specimens. This group trained a DNN system using complementary DNA to distinguish between IBD-related dysplasia and sporadic colorectal adenomas. 27 Notably, as it requires tissue samples, this system still necessitates colonoscopy with biopsies of any suspicious lesions. The decrease in human error and manpower for tissue analysis and diagnosis are benefits of such DNN-based technology that could improve patient outcomes. There are several challenges to address when implementing this technology in clinical practice for IBD patients, including the ability to differentiate pseudopolyps from true polyps and the detection of abnormal flat lesions that are common in long-standing colitis.
Limitations
While AI is not ready for prime time in IBD endoscopy and the field is still evolving, with no system ready to be used in daily practice, there is certainly growing interest in various systems from different organizations. While competition for the best system is important, incorporating DNNs that can perform diagnosis, differentiation, and detection of lesions in the same system, although ideal, is challenging. Another concern with the use of AI in gastroenterology is ethical. How independently can and should these systems perform? What level of responsibility for negative outcomes (i.e. complications from procedures, missed diagnoses) can and should be placed on the system versus a physician? As science progresses, these questions will need to be answered for the safety and comfort of patients. Similarly, in IBD endoscopy, before these systems become available at the bedside, multicenter prospective trials with comparison against current standard of care followed by rigorous regulatory approval are required.
While these AI systems show promising evidence in validation sets, many of these current AI technologies have not yet been integrated into current patient care. Therefore, there is a lack of data with regard to patient outcomes. AI technologies applied to detecting inflammation and grading severity should improve patient experience during a procedure and decrease procedure time. A major improvement in patient outcome would be achieved if the use of AI decreased the time required for data interpretation, diagnosis, and treatment enough to improve patients’ quality of life. Ultimately, the optimal goal of AI use in IBD patients is improving the clinical course of the disease by earlier diagnosis, leading to reduced disease burden, improved quality of life, and possible decrease or early detection of dysplasia.
Potential applications and future
The benefits and potential for the use of AI in IBD diagnosis and treatment as well as clinical trials are vast. Computer systems processing is undeniably faster than that of its human counterparts. Decreasing the need for biopsies can contribute to overall savings in healthcare costs. Finally, the potential to improve interobserver agreement in endoscopic disease scoring between gastroenterologists is possibly the most exciting aim of ML. Coupling CAD systems with physician observation and clinical gestalt amplifies the positive impact and illuminates an exciting future in medicine.
What is on the horizon for AI in IBD? Endomicroscopy is clearly an exciting new technology with promising results; however, the system is not widely available and has a steep learning curve. Future studies in AI systems that incorporate clinical information such as biomarkers and symptoms may further the use of DNN in the multifactorial diagnosis of IBD. Finally, the current and future technologies should be applied to live colonoscopies to fully utilize their potential. Live CADe and CADx systems will provide the most clinically significant benefits to patients and endoscopists, so the next step includes using these newest technologies in clinical trials. The ideal AI system would help improve all aspects of endoscopy-related care in IBD, including detection and characterization of inflammation, mucosal healing, recurrence pattern, and detection of any concerning lesion, within a single system, so that endoscopists and patients benefit from a one-step assessment.
Footnotes
Conflict of interest statement
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
