Abstract
Artificial intelligence (AI) is redefining the management of inflammatory bowel diseases (IBD) by enhancing diagnostic accuracy, refining disease classification, and optimizing disease monitoring. This review highlights AI’s potential to transform IBD management by streamlining clinical workflows, improving diagnostic precision, and supporting personalized treatment strategies. By addressing the limitations of traditional clinical assessments including variability, subjectivity, and resource intensity, AI offers unbiased, consistent, and efficient solutions. Concluding with a forward-looking perspective, this paper emphasizes how integrating AI into clinical practice could lead to more precise, proactive, and patient-centric approaches to IBD care, ultimately enhancing clinical outcomes and quality of life for these patients.
Plain language summary
Inflammatory Bowel Diseases (IBD), like Crohn’s disease and ulcerative colitis, are chronic conditions that affect the digestive system. Diagnosing and managing IBD can be complex, often requiring multiple tests and expert interpretation. Artificial Intelligence (AI) offers new tools to improve IBD care. AI systems can quickly and accurately analyze medical images, predict treatment responses, and support doctors in making better decisions. This review explains how AI can make diagnosing IBD more accurate, treatment plans more personalized, and overall care more efficient. By using AI, healthcare providers could help patients with IBD get faster, more reliable, and more tailored treatments, improving their quality of life.
Keywords
Introduction
The management of inflammatory bowel diseases (IBD) requires specialized expertise for accurate diagnosis and nuanced decision-making to optimize patient outcomes. Accurate diagnosis, disease classification, and proactive disease monitoring are fundamental to effective care and rely on endoscopy, imaging, and histopathology to evaluate treatment targets and guide disease management. 1 However, interpreting these tools is often subjective, prone to bias, costly, and time-intensive. While standardized clinical and endoscopic scoring systems exist to bring consistency to these assessments, their adoption in routine practice has been limited because they are complex, cumbersome, and difficult to integrate efficiently into clinical workflows. Consequently, evaluations of disease activity, phenotype, and therapeutic response can differ between assessors, introducing variability in care.
Furthermore, crafting effective treatment strategies for IBD goes beyond understanding disease activity alone. A patient’s coexisting conditions, prior treatments, extraintestinal manifestations, and personal preferences often shape management decisions, resulting in highly individualized care plans. While clinical guidelines provide a general roadmap, the intricate and dynamic nature of IBD care demands advanced expertise and experience to navigate these complexities.
Artificial intelligence (AI) refers to the simulation of human intelligence in machines programmed to think, learn, and solve problems. In healthcare, AI systems are primarily powered by machine learning (ML), which enables computers to identify patterns and make predictions based on data. One common approach is supervised learning, where models are trained on labeled datasets to forecast outcomes in new cases. 2
Deep learning is a subfield of ML that uses neural networks with multiple layers that learn complex relationships directly from raw inputs. Convolutional neural networks (CNNs), in particular, are highly effective at analyzing visual data, making them well-suited for tasks like endoscopic image interpretation, radiologic pattern detection, and histological feature recognition.3,4 Natural language processing (NLP), another key technique, allows AI systems to extract meaningful information from unstructured clinical text, such as physician notes or patient reports. 5
AI is poised to transform how IBD is diagnosed, classified, and monitored, addressing many challenges faced in current clinical workflows. 6 By leveraging digitized medical data such as imaging, pathology slides, clinical notes, and patient-reported outcomes alongside advanced analytical methods including ML and neural networks, AI can identify patterns that may be difficult for clinicians to detect. For instance, AI-powered tools can provide rapid, unbiased, and consistent interpretation of endoscopic and radiological images, minimizing variability and ensuring uniformity across providers and settings.7,8
In addition, AI applications are advancing beyond imaging to include NLP capabilities that can analyze clinical documentation and patient narratives, extracting actionable insights to support diagnostic and prognostic decisions. ML models have shown promise in categorizing patients into distinct disease phenotypes, predicting complications, and tracking therapeutic responses over time, offering a personalized and proactive approach to care. 9
This review explores the expanding role of AI in IBD care, focusing on its applications in disease diagnosis, classification, and monitoring (Figure 1). The integration of AI into clinical practice has the potential to streamline workflows, improve diagnostic precision, reduce variability, and enhance patient outcomes, marking a transformative step forward in the management of IBD.

Figure 1. AI applications in IBD management, showcasing innovations in diagnosis, monitoring, and personalized care.
Search strategy and review methodology
This article is a narrative review designed to synthesize key advances in AI applications in the diagnosis, classification, and monitoring of IBD. To identify relevant studies, we searched the PubMed, Scopus, and Web of Science databases for English-language articles published between January 2012 and December 2024 using combinations of the following terms: “inflammatory bowel disease,” “ulcerative colitis,” “Crohn’s disease,” “artificial intelligence,” “machine learning,” “deep learning,” “natural language processing,” “radiomics,” “endoscopy,” “histology,” and “diagnostic imaging.”
We included peer-reviewed original studies, systematic reviews, and meta-analyses that applied AI to clinical or imaging data in the context of IBD. Exclusion criteria were non-English publications, conference abstracts, non-human studies, and papers focused purely on algorithm development without clinical relevance. As this was a narrative review rather than a systematic one, study selection was guided by relevance to clinical practice and recent impact in the field.
AI in disease diagnosis and classification
Differentiating UC and CD
Distinguishing ulcerative colitis (UC) from Crohn’s disease (CD) presents several challenges due to overlapping clinical, endoscopic, and histopathological characteristics. Endoscopically, UC is typically characterized by continuous colonic involvement, loss of the normal vascular pattern, and superficial ulcerations, whereas CD often features cobblestoning, deep ulcers, and strictures. 10 However, these findings are not pathognomonic, and overlap can occur. Atypical presentations, such as rectal sparing and patchy inflammation in UC, may resemble CD, further complicating the diagnostic process. 11 The use of immunohistochemical markers (e.g., Das-1 and CG-3) and quantification of CD30+ lymphocytes and eosinophils in biopsy samples have been proposed to enhance the differentiation between UC and CD. However, the use of immunohistochemical markers is limited by variability in marker expression, lack of absolute specificity, and the requirement for specialized expertise and standardized protocols.12,13 Several studies have highlighted the potential of AI to overcome these challenges in distinguishing UC from CD through various advanced methodologies, thereby enhancing diagnostic accuracy.
Chierici et al. 14 developed a deep learning framework using endoscopic images to classify IBD subtypes and distinguish healthy controls. The model exhibited moderate performance in differentiating UC from CD (Matthews Correlation Coefficient (MCC) = 0.688). The model’s lower performance may reflect intrinsic difficulty in distinguishing IBD subtypes. Conversely, the model demonstrated excellent performance in differentiating UC from healthy controls with an MCC of 0.931. 14
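For readers less familiar with the Matthews Correlation Coefficient, the following is a minimal sketch of how the binary form of the metric is computed from a confusion matrix. It is an illustration only and not the implementation used by Chierici et al.; the function name and all counts are hypothetical, and the cited study may have used a multiclass variant.

```python
import math


def matthews_corrcoef(tp: int, fp: int, fn: int, tn: int) -> float:
    """Binary MCC from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (chance level)
    to +1 (perfect prediction). Returns 0.0 when the denominator
    is zero, a common convention for the undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, chosen only to illustrate the scale of the metric.
perfect = matthews_corrcoef(tp=50, fp=0, fn=0, tn=50)
chance = matthews_corrcoef(tp=25, fp=25, fn=25, tn=25)
example = matthews_corrcoef(tp=45, fp=10, fn=5, tn=40)

print(f"perfect={perfect:.3f} chance={chance:.3f} example={example:.3f}")
```

Unlike raw accuracy, the MCC uses all four cells of the confusion matrix, so it remains informative when one class is much more common than the other.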
Another study employed whole transcriptome analysis on endoscopic samples to identify differentially expressed genes, including PI3, ANXA1, and VDR, which exhibited significant performance in discriminating CD from UC with an area under the curve (AUC) of 0.84. This system leverages the differential expression of specific genes to accurately classify the two conditions. 15
Furthermore, Manandhar et al. 16 utilized supervised ML on gut microbiome data to identify differential bacterial taxa and operational taxonomic units that distinguish IBD patients from healthy controls and CD from UC. The model achieved an AUC of >0.90 for differentiating CD from UC. 16
Another AI system based on Raman spectroscopy, in conjunction with support vector machines (SVMs), demonstrated the ability to differentiate between CD and UC with 98.9% accuracy. This method utilizes the unique molecular signatures captured by Raman spectroscopy to distinguish between the two types of IBD, providing a highly accurate diagnostic tool. 17
These studies suggest that AI-based approaches show great promise in accurately distinguishing UC from CD, addressing a critical gap in clinical practice, where misdiagnosis can lead to suboptimal outcomes. Approaches that integrate microbiome data and molecular signatures may further improve diagnostic precision. However, their clinical application requires further refinement, validation, and standardized protocols.
Endoscopy
AI technologies, particularly deep learning algorithms, have demonstrated remarkable accuracy in detecting and grading mucosal inflammation and lesions. Rimondi et al. 18 conducted a comprehensive systematic review and meta-analysis of studies evaluating the diagnostic accuracy of AI systems assessing mucosal healing in patients with UC using endoscopic images and videos. For fixed images, the AI systems exhibited a sensitivity of 91%, a specificity of 89%, and a diagnostic odds ratio (OR) of 92.42. For videos, sensitivity was 86%, specificity was 91%, and the OR was 70.86. The AUC was 0.957 for fixed images and 0.941 for videos, indicating excellent diagnostic performance. Despite this impressive performance, the findings were subject to moderate to high heterogeneity due to variations in training algorithms, datasets, and mucosal healing definitions. 18
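The sensitivity, specificity, and diagnostic odds ratio reported above all derive from a 2×2 table of test results against a reference standard. The sketch below shows the single-study arithmetic under stated assumptions: the counts are hypothetical, the function name is illustrative, and the 0.5 continuity correction for empty cells is one common convention. Pooled estimates in a meta-analysis come from a statistical model across studies, so a pooled OR need not equal the value computed from pooled sensitivity and specificity.

```python
def diagnostic_metrics(tp: float, fp: float, fn: float, tn: float):
    """Return (sensitivity, specificity, diagnostic odds ratio).

    If any cell is zero, 0.5 is added to every cell before the odds
    ratio is computed, so the ratio stays finite.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))
    dor = (tp * tn) / (fp * fn)
    return sensitivity, specificity, dor


# Hypothetical single-study counts (not taken from any cited paper).
sens, spec, dor = diagnostic_metrics(tp=90, fp=20, fn=10, tn=80)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} DOR={dor:.1f}")
```

A diagnostic odds ratio of 1 means the test does not discriminate; larger values indicate better separation of diseased from non-diseased cases.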
Xie et al. 19 evaluated the performance of a deep learning model in detecting and grading small-bowel CD ulcers on double-balloon endoscopy images. The model demonstrated high accuracies for detecting ulcers (96.3%), non-inflammatory stenosis (95.7%), and inflammatory stenosis (96.7%), with AUC values exceeding 0.98 for all categories. In ulcer grading, it exhibited accuracies of 87.3% (95% CI, 84.6%–89.6%) for ulcerated surface, 87.8% (95% CI, 85.0%–90.2%) for ulcer size, and 85.2% (95% CI, 83.2%–87.0%) for ulcer depth. The AI model outperformed junior and intermediate endoscopists and performed similarly to senior endoscopists, demonstrating expert-level accuracy in lesion detection and objective severity grading. These findings highlight the potential of AI to enhance the accuracy, efficiency, and objectivity of small-bowel CD evaluations. 19
Capsule endoscopy (CE) is a crucial modality for diagnostic evaluation and for assessing the extent of disease in patients with CD. Ferreira et al. 20 developed and validated a CNN-based model for automatically detecting ulcers and erosions in the gastrointestinal tract using images from the PillCam™ Crohn’s Capsule (PCC). Images from 59 PCC examinations conducted at two centers were divided into a training dataset (80%) and a validation dataset (20%). The model achieved an overall accuracy of 98.8%, a sensitivity of 98%, a specificity of 99%, and an AUC of 1.00 for detecting ulcers and erosions in the PCC images. Notably, the CNN demonstrated diagnostic accuracy comparable to that of expert gastroenterologists. 20
Similarly, Klang et al. 21 developed a deep learning algorithm for the automated detection of small-bowel ulcers in CE images from patients with CD. The study included 17,640 images from 49 patients; the algorithm achieved AUC values of 0.94–0.99 and accuracies ranging from 95.4% to 96.7%. 21 In 2019, Aoki et al. 22 developed a CNN-based model for automatically detecting ulcers and erosions in CE images. The model achieved an AUC of 0.95, a sensitivity of 88%, and a specificity of 91%. 22
These studies illustrate the transformative role of AI in improving diagnostic workflows, particularly in areas requiring precise lesion detection and grading (Table 1). By automating image analysis with high accuracy and efficiency, AI-based models hold promise for enhancing the diagnostic process, standardizing assessments, and alleviating the clinical workload. Continued validation in diverse clinical settings and integration into practice are essential steps to realize the full potential of AI in IBD diagnostics.
Table 1. Summary of AI applications in endoscopy, histology, and imaging for inflammatory bowel disease.
AI, artificial intelligence; AUC, area under the curve; CADe, computer-aided detection; CD, Crohn’s disease; CE, capsule endoscopy; CNN, convolutional neural network; CTE, computed tomography enterography; IBD, inflammatory bowel diseases; ICC, intraclass correlation coefficient; IHC, immunohistochemistry; IUS, intestinal ultrasound; ML, machine learning; MSCT, multi-slice computed tomography; OR, odds ratio; UC, ulcerative colitis.
Histology
Several studies have evaluated the potential of AI for histological diagnosis of IBD. Furlanello et al. 23 utilized 4981 annotated histological images to develop an AI system for semi-automated detection and quantification of plasma cells, specifically focusing on basal plasmacytosis, a key histological feature in IBD. The model was validated using 356 biopsies from CD, UC, and control samples. The AI system demonstrated reliable detection of plasma cells with high sensitivity, with these cells being more prevalent in colonic regions compared to the ileum, aligning well with human assessments. UC cases exhibited significantly higher plasma cell counts compared to CD cases, reflecting established histological patterns. The OR for IBD diagnosis versus normal tissues was 4.968, highlighting the AI system’s accuracy. 23
The study by Rymarczyk et al. 24 utilized 6431 biopsies from 1189 patients enrolled in six phase II and III clinical trials for CD and UC. Biopsies were collected from specific anatomical sites, including the terminal ileum and colon for CD and the rectum and sigmoid colon for UC. The study evaluated three multi-instance learning methods and selected the model with the best overall performance for detailed analysis. Model predictions were compared against scores assigned by a central pathologist (gold standard) and an independent panel of five experienced pathologists. For colonic biopsies (CD and UC), the model achieved 87%–94% accuracy. For CD ileum biopsies, accuracy ranged from 76% to 83%. The authors acknowledged data imbalances, with fewer CD ileum biopsies and an overrepresentation of normal or mild disease severity in some subgroups, potentially limiting the model’s generalizability. 24
These studies highlight the ability of AI to complement pathologists by streamlining and standardizing histological assessments, particularly for large datasets (Table 1). However, challenges such as data diversity, representation of disease severity, and anatomical site variability must be addressed to enhance generalizability and clinical applicability.
Cross-sectional and diagnostic imaging
AI has been utilized in cross-sectional imaging modalities such as intestinal ultrasound (IUS) to identify bowel wall thickening, a surrogate marker for bowel inflammation in CD. For instance, Carter et al. 25 developed an AI-based system that achieved an overall accuracy of 90.1%, sensitivity of 86.4%, and specificity of 94% in detecting bowel wall thickening exceeding 3 mm on IUS images. This AI module facilitates the utilization of IUS by less experienced operators, potentially standardizing the interpretation of IUS imaging and enhancing diagnostic consistency. 25
Naziroglu et al. 26 employed magnetic resonance enterography (MRE) to evaluate a semiautomatic method for measuring bowel wall thickness (BWT) in patients with CD. The study analyzed the MRE dataset of 53 patients. The algorithm-generated measurements of BWT exhibited superior interobserver agreement compared to the human-assessed measurements (intraclass correlation coefficient 0.88 vs 0.45, p = 0.005). 26
Computed tomography enterography (CTE) is a valuable diagnostic tool for identifying IBD. Stidham et al. 27 employed ML to automate the assessment of cumulative ileal injury utilizing 8242 ileal mini segments extracted from 229 CTE scans of patients diagnosed with ileal CD. The ML-predicted injury grades exhibited substantial concordance with the radiologists’ assessments (kappa = 0.80), which is comparable to the inter-radiologist agreement (kappa = 0.87). Notably, the ML method demonstrated high accuracy (74.8% exact match with radiologists) and exhibited particularly strong performance in distinguishing mild-moderate from severe disease (88.6% accuracy 27 ; Table 1).
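Agreement statistics such as the kappa values quoted above correct raw percentage agreement for the agreement expected by chance. The following is a minimal sketch of unweighted Cohen's kappa for two raters; the grade lists are hypothetical, and the cited study may have used a weighted variant suited to ordinal grades, so this should be read as an illustration of the concept rather than a reproduction of its analysis.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b) -> float:
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical severity grades (0 = mild, 1 = moderate, 2 = severe)
# assigned to ten bowel segments by a model and by a radiologist.
model_grades = [0, 0, 1, 1, 2, 2, 2, 1, 0, 2]
radiologist_grades = [0, 0, 1, 2, 2, 2, 1, 1, 0, 2]

kappa = cohens_kappa(model_grades, radiologist_grades)
print(f"kappa={kappa:.3f}")
```

In this toy example the raters agree on 8 of 10 segments, yet kappa is lower than 0.8 because some of that agreement would be expected by chance alone.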
AI in IBD dysplasia
AI systems, particularly those utilizing deep learning algorithms, have accurately identified and classified neoplastic lesions in IBD patients.
Yamamoto et al. 28 evaluated the diagnostic accuracy of an AI system against four expert and three non-expert endoscopists using 862 non-magnified endoscopic images from 99 IBD-associated neoplastic lesions. The AI system was designed to differentiate high-grade dysplasia and adenocarcinoma from low-grade dysplasia, sporadic adenomas, and non-neoplastic mucosa. On an image basis, the system yielded a sensitivity, specificity, and accuracy of 64.5%, 89.5%, and 80.6%, respectively; on a lesion basis, the corresponding values were 74.4%, 85%, and 80.8%. In the comparison with endoscopists, the AI system achieved an accuracy of 79%, higher than that of both experts (77.8%) and non-experts (75.8%). While the AI system outperformed experts in sensitivity (72.5% vs 60.5%), it had slightly lower specificity (82.9% vs 88.0%). The results may not be generalizable because of the small sample size. 28
Histologic evaluation is a vital component in diagnosing UC-associated cancer or dysplasia, and immunohistochemical analysis of p53 in biopsy samples is a key test for detecting dysplasia and colitis-associated carcinoma. Noguchi et al. 29 developed a CNN model based on p53 positivity with an average precision of 0.71–0.754. The model predicted p53 immunohistochemistry staining with an accuracy of 86%–91%. 29
A recent prospective, cross-sectional, non-inferiority study by López-Serrano et al. 30 compared a computer-aided detection (CADe) system (Discovery™) with virtual chromoendoscopy (VCE using iSCAN) during surveillance colonoscopy in patients with UC at risk for colorectal cancer. The CADe system demonstrated comparable diagnostic performance to VCE, identifying dysplasia in a similar proportion of lesions and patients. These findings offer important real-world validation for AI-assisted endoscopy in the IBD surveillance setting, while also highlighting practical considerations for integrating AI tools into routine clinical workflows. 30
AI has the potential to improve dysplasia detection in IBD through high-accuracy imaging, histology, and advanced optical tools, enhancing precision and consistency. Future work should focus on integrating AI into clinical practice for better outcomes. To further support reader understanding of AI-based imaging workflows in IBD, we include a schematic overview (Figure 2) illustrating how medical images such as those from colonoscopy or cross-sectional imaging are processed by AI systems to generate standardized outputs. These outputs can aid in disease activity scoring, dysplasia detection, or structural classification, ultimately enhancing real-time interpretation and clinical decision support.

Figure 2. AI-assisted workflow for dysplasia detection in colonoscopy images: a colonoscopy image is analyzed by an AI system to detect dysplasia. The result is displayed through a user interface, indicating whether dysplasia is present to support clinical decision-making.
Disease monitoring
Traditionally, scoring systems are employed to monitor disease activity in patients with IBD. Commonly utilized scoring systems for CD include the Crohn’s Disease Activity Index (CDAI), the Simple Endoscopic Score for Crohn’s Disease, and the Crohn’s Disease Endoscopic Index of Severity (CDEIS); 31 those for UC include the Mayo Endoscopic Score and the Ulcerative Colitis Endoscopic Index of Severity (UCEIS). 32 Several imaging-based scoring systems have also been developed to assess disease activity in CD, such as the Magnetic Resonance Index of Activity (MaRIA), the Clermont score, and the London score. 33 However, these scoring systems have significant limitations, including their time-consuming nature, variable sensitivity and specificity, and interobserver variability. These limitations restrict their clinical applicability and present an opportunity for AI to enhance patient care. AI-based studies have demonstrated significant potential in addressing these limitations 34 (Table 2).
Table 2. Summary of AI applications in disease monitoring and prognosis for inflammatory bowel disease.
AI, artificial intelligence; AUC, area under the curve; CD, Crohn’s disease; CDAI, Crohn’s Disease Activity Index; CDEIS, Crohn’s Disease Endoscopic Index of Severity; CTE, computed tomography enterography; GCR, Goblet Cell Ratio; HR, hazard ratio; ML, machine learning; PHRI, PICaSSO Histologic Remission Index; UC, ulcerative colitis; UCEIS, Ulcerative Colitis Endoscopic Index of Severity; VIGOR, Virtual gastrointestinal tract.
Fan et al. 35 evaluated the efficacy of a deep learning-based system in assessing inflammatory activity in UC using 5875 endoscopic images and 20 full-length videos from 332 UC patients. The system demonstrated an accuracy of 86.54% for Mayo Score classification and accuracies of 90.7%, 84.6%, and 77.7% for the UCEIS sub-scores for vascular pattern, erosions/ulcers, and bleeding, respectively, with kappa coefficients exceeding 0.7, indicating a high level of agreement with the expert endoscopist. Additionally, the system generated two-dimensional images that provided visual representations of inflammation severity and distribution, facilitating the identification of disease activity. The study also reported the system’s ability to track changes in inflammation before and after treatment, which correlated with clinical outcomes. However, the system performed less well in certain categories, particularly mild inflammation (Mayo 1) and UCEIS bleeding scores, 35 which suggests the need for further refinement before routine clinical adoption.
Cai et al. 36 evaluated ML models for predicting disease activity based on non-invasive, routinely collected clinical and laboratory data from 876 individuals with IBD (601 patients with CD and 275 patients with UC). Disease activity was assessed using the CDAI for CD and the Mayo score for UC. Of the seven algorithms tested, the SVM exhibited the best performance for predicting disease activity in CD patients, while the Adaptive Boosting (AdaBoost) algorithm demonstrated the best performance for UC patients. In CD patients, the SVM achieved an accuracy of 93%, a sensitivity of 94.7%, a specificity of 92%, and an AUC of 0.975. In UC patients, AdaBoost achieved an accuracy of 85.5%, a sensitivity of 84.4%, a specificity of 87.5%, and an AUC of 0.911. 36 These findings demonstrate the high accuracy and reliability of ML models in predicting disease activity, suggesting potential applications in clinical decision-making. However, the study’s retrospective design, potential for selection bias, and lack of external validation limit its generalizability and emphasize the need for prospective, multicenter validation before clinical integration.
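Because the AUC is quoted throughout this review, a brief illustration of what it measures may be useful. The sketch below computes the AUC through its rank interpretation: the probability that a randomly chosen active-disease case receives a higher model score than a randomly chosen inactive case, with ties counted as one half. The labels and scores are hypothetical, and the function is illustrative rather than the evaluation code of any cited study.

```python
def roc_auc(labels, scores) -> float:
    """AUC via pairwise comparison of positive and negative scores.

    labels: 1 for the positive class (e.g., active disease), 0 otherwise.
    scores: model outputs, where higher means more likely positive.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and predicted probabilities for six patients.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

auc = roc_auc(labels, scores)
print(f"AUC={auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation; unlike accuracy, the AUC does not depend on a single decision threshold.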
Puylaert et al. 37 developed a semiautomatic scoring system to evaluate disease activity in CD using MRI. The Virtual gastrointestinal tract (VIGOR) score was derived from semiautomatic measurements of BWT, excess volume, and dynamic contrast enhancement, combined with radiologist-assessed features such as mural T2 signal. Scores were compared to established MRI activity indices (MaRIA, London score, and CD MRI index), with the CDEIS as the reference standard. The VIGOR score exhibited moderate correlation with the CDEIS (r = 0.58 for observer 1 and r = 0.59 for observer 2), comparable to other MRI activity scores. Notably, the VIGOR score demonstrated superior interobserver agreement (ICC = 0.81 vs 0.44–0.59). Furthermore, the VIGOR score achieved a diagnostic accuracy of 80%–81% for detecting active disease, comparable to other scores. 37
Rymarczyk et al. 24 developed an AI-based severity scoring system for both CD and UC using a histological dataset. The model’s performance was evaluated by comparing its predicted disease severity with the assessments of central readers. For the global histology activity score in CD, accuracies ranged from 65% to 89%, with kappa values of 0.46–0.67. Similarly, for the simplified Geboes score in UC, accuracies ranged from 65% to 85%, with kappa values of 0.44–0.68. The model performed best at the extremes of severity (grades 0 and 3) and was less accurate for intermediate grades. Its performance was comparable to that of independent pathologists for most features and subgrades, with minor discrepancies in accuracy for specific categories (e.g., detecting neutrophils in CD ileum biopsies). Furthermore, the model’s predictions of histological improvement were significantly correlated with clinical remission and endoscopic improvement. 24
AI is transforming disease monitoring in IBD by enhancing the accuracy and efficiency of activity assessments. It supports scoring systems, such as Mayo and UCEIS, through endoscopic image analysis, predicts disease activity using clinical data, and refines imaging evaluations with MRI and CTE-based tools.
In addition to transforming traditional scoring methods, AI aligns with the STRIDE-II framework, a cornerstone in IBD management that emphasizes a treat-to-target approach. STRIDE-II defines therapeutic goals, progressing from clinical remission and biomarker normalization to endoscopic and histological healing. 38 AI tools enhance this framework by enabling real-time, objective assessments of disease activity and treatment response. For example, endoscopic AI systems refine mucosal healing evaluations, while ML models predict treatment outcomes and track biomarker normalization, supporting intermediate goals. Advanced imaging-based AI tools provide detailed assessments of transmural and mucosal healing, aligning with long-term targets. By integrating AI-driven insights with STRIDE-II, clinicians can implement timely interventions and personalized care strategies, ensuring optimal outcomes for patients with IBD.
Prognosis
A substantial amount of research has been conducted to identify prognostic factors that can help reduce the risk of complications and prompt timely intervention. Nevertheless, these factors have limited predictive value due to their inability to distinguish between disease remission and activity, lack of standardization, and variable predictive accuracy. Studies have demonstrated that AI-based models generally exhibit superior prognostic accuracy compared to conventional methods in IBD (Table 2).
Iacucci et al. 39 evaluated an AI-based computer-aided diagnosis system for assessing histological disease activity and predicting clinical outcomes in UC. The model was developed utilizing the PICaSSO Histologic Remission Index (PHRI). The model demonstrated the ability to differentiate between histologic remission and activity with a sensitivity of 89%, specificity of 85%, and accuracy of 87%. Furthermore, the model-predicted PHRI exhibited a strong correlation with flare-ups within the subsequent year, with a hazard ratio of 4.64 (95% confidence interval (CI): 2.76–7.80), comparable to or surpassing human assessments. 39
Ohara et al. 40 developed a deep learning-based model to quantify goblet cell mucus (GCM) in colonic biopsies from 114 UC patients in clinical and endoscopic remission (Mayo Endoscopic Subscore ⩽1). The model demonstrated high accuracy in identifying GCM areas in histologic images. Patients who experienced relapse (Mayo score ⩾3) within 12 months exhibited a significantly lower Goblet Cell Ratio (GCR; defined as the ratio of GCM area to epithelial cell area) in the rectum, cecum, and ascending colon compared to the relapse-free group. A GCR threshold of ⩽12% in rectal specimens was strongly associated with relapse (45% vs 6.5%, p < 0.01). Interobserver agreement for pathologists assessing mucin depletion was moderate (Cohen’s kappa = 0.59), while the AI model exhibited excellent reproducibility. 40 Likewise, Klein et al. 41 designed a system to analyze baseline histological images from individuals with CD. This system demonstrated the ability to predict the likelihood of developing fibrostenosis (AUC 0.74) and internal penetrating disease behavior (AUC 0.78) within 5 years. 41
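To make the Goblet Cell Ratio described above concrete, the following sketch computes a GCR from two segmented areas and checks it against the 12% rectal threshold quoted in the text. It is a simplified illustration and not the authors' pipeline: the function names and pixel counts are hypothetical, and the sketch assumes that a segmentation model has already produced the goblet cell mucus and epithelial areas.

```python
def goblet_cell_ratio(gcm_area_px: int, epithelial_area_px: int) -> float:
    """Ratio of goblet cell mucus area to epithelial area (both in pixels)."""
    if epithelial_area_px <= 0:
        raise ValueError("epithelial area must be positive")
    return gcm_area_px / epithelial_area_px


def is_low_gcr(gcr: float, threshold: float = 0.12) -> bool:
    """True when the GCR is at or below the threshold (12% by default)."""
    return gcr <= threshold


# Hypothetical pixel counts for two rectal biopsy images.
depleted = goblet_cell_ratio(gcm_area_px=1_100, epithelial_area_px=10_000)
preserved = goblet_cell_ratio(gcm_area_px=2_500, epithelial_area_px=10_000)

for name, gcr in (("depleted", depleted), ("preserved", preserved)):
    print(f"{name}: GCR={gcr:.1%}, low={is_low_gcr(gcr)}")
```

A low GCR should be read as a marker associated with higher relapse risk in the cited cohort, not as a deterministic prediction for an individual patient.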
A recent systematic review by Maeda et al. 42 of AI-assisted colonoscopy for identifying histologic remission and predicting clinical outcomes in patients with UC reported that AI systems demonstrated performance comparable to or exceeding that of experienced endoscopists in detecting histologic remission. Sensitivity ranged from 65% to 98%, and specificity ranged from 80% to 97%. 42 Moreover, these models demonstrated the ability to identify patients at risk of relapse based on both endoscopic and histological features.
Imaging-based studies offer a non-invasive approach to monitor disease activity and predict outcomes, and several have demonstrated the utility of radiomics in predicting IBD outcomes. Zeng et al. 43 developed and validated a radiomics nomogram for IBD patients using multi-slice computed tomography (MSCT) and clinical data to stratify the risk of intestinal fibrosis. The study included data from 218 IBD patients (113 with CD and 105 with UC) who underwent MSCT imaging and endoscopic or histological evaluations. A clinical-radiomics nomogram was constructed by integrating selected radiomics features with clinical factors (e.g., lesion location, engorged vasa recta, and computed tomography (CT) value of arterial phase enhancement). The nomogram achieved an AUC of 0.971 in the training set; in the test set, it achieved an AUC of 0.865 (95% CI: 0.738–0.992) and an accuracy of 79%, reflecting good predictive performance. This study indicates that the integration of radiomics and clinical data can provide superior predictive accuracy compared to models using single data sources. 43
Zhu et al. 44 evaluated radiomics models based on CT imaging of the bowel wall and mesenteric adipose tissue to identify the severity of colonic fibrosis and predict clinical response to biologics in UC patients. Radiomics features were extracted from the CT images of 119 UC patients (72 undergoing proctocolectomy and 47 starting biologics). Two radiomic models were developed: a bowel wall radiomic model (BW-RM) focused on bowel wall features, and a mesenteric adipose tissue radiomic model (MAT-RM) focused on mesenteric fat characteristics. For predicting colonic fibrosis, BW-RM had an AUC of 0.86, and MAT-RM had an AUC of 0.83. Both models significantly outperformed visual assessment by radiologists (AUC ~0.60). In predicting response to biologics, MAT-RM showed superior performance compared to BW-RM (AUC 0.71–0.80 vs 0.61–0.72). 44
Stidham et al. 27 studied a CTE image-based ML model to quantify cumulative ileal damage and to predict surgical outcomes. They compared ML-derived scores with traditional imaging features for predicting bowel resection within 3 years. Patients who underwent surgery had significantly higher Simple Cumulative Ileal Damage Severity Scores (S-CIDSS; 46.6 vs 30.4, p = 0.0007) and mean severity grades (1.80 vs 1.42, p < 0.0001). ML models combining S-CIDSS and mean severity grade achieved an AUC of 0.76 for predicting surgery, outperforming traditional imaging measures (AUC 0.62). 27 These studies support the integration of radiomics into clinical workflows for personalized management of IBD patients.
Another study based on data from the OptumLabs® Data Warehouse assessed the feasibility of ML in predicting adverse outcomes in IBD patients. The study included 72,178 patients in the training set and 69,165 in the validation set, and 108 predictive variables were incorporated in model evaluation. Random forest had the best overall performance for predicting IBD-related hospitalization (AUC 0.73), biologic initiation (AUC 0.92), steroid use (AUC 0.81), and surgery (AUC 0.71). Common predictors of adverse outcomes included prior hospitalizations, use of steroids, antibiotics, and biologics, frequency of office visits, and IBD-related procedures. 45 The study highlights the potential of ML models to predict adverse outcomes in IBD and paves the way for implementing data-driven, preemptive care in IBD management.
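The AUC values quoted throughout this section can be read as the probability that a randomly chosen patient with the outcome receives a higher predicted risk than a randomly chosen patient without it. A minimal sketch of that calculation follows; the risk scores and labels are made-up toy values, not data from any cited study.

```python
def auc(scores, labels):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy predicted risks (e.g., of hospitalization) and observed outcomes.
toy_scores = [0.9, 0.8, 0.4, 0.35, 0.1]
toy_labels = [1, 0, 1, 0, 0]
toy_auc = auc(toy_scores, toy_labels)
```

On this reading, an AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why values such as 0.71 and 0.92 for different outcomes reflect quite different levels of discrimination.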
AI demonstrates significant potential in improving prognostication for IBD, offering superior accuracy in predicting disease relapse, fibrosis, and adverse outcomes. By leveraging histological, imaging, and clinical data, AI-based models enable personalized, data-driven care. Future efforts should focus on integrating these models into routine practice to enhance predictive precision and patient management.
Personalized treatment strategy
Managing IBD requires personalized strategies due to its diverse phenotypes, variable disease progression, and outcomes. AI has the potential to revolutionize IBD care by processing patient data, including medical history and treatment outcomes, to generate tailored treatment recommendations that align with individual needs and preferences. 46 This may not only enhance patient adherence and clinical outcomes but also optimize direct and indirect costs, including payers’ financial returns. By integrating omics data, historical records, and predictive treatment responses, AI can uncover patterns and biomarkers that guide the selection of the most effective therapies for each patient. 47,48 Additionally, as the medicine cabinet for IBD expands to include novel therapeutic options such as Janus kinase inhibitors, S1P receptor modulators, and anti-integrin therapies, AI can integrate emerging data on these agents to refine their positioning within treatment algorithms and ensure their appropriate use in clinical practice.
Single therapeutic agents often reach a plateau with limited remission rates, 49 a phenomenon recognized as the “therapeutic ceiling.” This could be due to multiple pathological pathways driving the inflammatory process in IBD. Treatment strategies that combine established single agents could be an alternative approach to address these complex inflammatory pathways. 49 The existing biologic therapies for IBD are supported by predictive models and clinical decision-support tools, yet there is room to improve their evaluation and application. 47,50–52 AI’s capacity to refine these tools and assess their predictive performance can lead to more precise treatment strategies, resulting in better patient outcomes and more effective disease management. 53
AI and ML algorithms can forecast disease trajectories from diagnosis, enabling clinicians to determine the most suitable treatment pathways. These algorithms excel at detecting patterns within structured data, while natural language processing (NLP) tools provide insights from unstructured sources such as clinical notes and patient-reported outcomes, supporting the creation of highly individualized management and treatment plans.
Real-world implementation and ongoing clinical trials
In addition to retrospective model development, recent efforts have focused on operationalizing AI tools both in clinical practice and within IBD clinical trials. A real-time deep learning system for Mayo Endoscopic Score classification has been evaluated in endoscopy units to assist in disease activity grading and reduce interobserver variability. 54 In parallel, AI-designed therapeutics such as ISM5411 have progressed through phase I trials, with phase II studies in UC expected to begin in late 2025, marking early translation of AI into therapeutic discovery. 55
Complementing these efforts, emerging translational frameworks are shaping how AI may be integrated into IBD trials and care pathways. Sedano et al. 56 proposed a comprehensive AI roadmap for IBD clinical trials, emphasizing its role in digital enrichment, automated eligibility screening, and dynamic outcome assessment. Similarly, Ahmad et al. 57 highlighted the utility of AI-assisted endoscopy for reducing variability in mucosal healing endpoints, a major challenge in trial reproducibility. These frameworks underscore the increasing recognition of AI as both a diagnostic and trial optimization tool, even as formal implementation studies remain limited.
Current limitations
While the integration of AI into IBD care shows considerable promise, several limitations must be addressed before widespread clinical adoption is feasible. First, data heterogeneity across institutions and platforms presents a significant challenge. Most AI models are trained on retrospective, single-center datasets using varying imaging protocols, histologic scoring systems, and clinical documentation standards, limiting cross-institutional reproducibility. Additionally, the lack of external validation for many models restricts their generalizability, with few undergoing prospective or real-world testing. Generalizability concerns are further compounded by narrow patient populations, particularly when training cohorts lack diversity in age, ethnicity, or disease phenotype. These limitations increase the risk of model performance degradation in broader clinical populations. Another major concern is algorithmic bias, which can arise from training datasets that reflect existing disparities in healthcare delivery or documentation. Without careful oversight, AI models may inadvertently perpetuate health inequities, particularly for underserved populations. From a practical standpoint, integration into clinical workflows remains challenging. Many AI tools are not designed with interoperability or end-user efficiency in mind, resulting in poor adoption or redundancy with existing systems. This highlights the importance of co-designing tools with input from clinicians, informaticians, and patients. Regulatory and ethical considerations, including patient privacy, data governance, and the explainability of AI models, also pose ongoing hurdles. Finally, technical pitfalls and overreliance on automation warrant caution. Many models, especially deep learning systems, are susceptible to overfitting, wherein performance on training data is strong but fails to generalize to unseen patient populations.
In addition, the black-box nature of most AI algorithms can obscure how decisions are made, limiting interpretability and raising concerns about trust, transparency, and accountability in clinical care. As AI becomes more embedded in diagnostic workflows, there is also a risk of overreliance, where clinicians may defer too heavily to algorithmic outputs. This highlights the need to maintain clinical oversight and emphasizes that AI should augment—not replace—human judgment. Future development must prioritize explainability, validation in diverse settings, and continuous feedback mechanisms to ensure safe and effective deployment. Additionally, the impact of AI on healthcare disparities, particularly its potential to either exacerbate or reduce inequities in under-resourced settings, remains a critical area for ongoing evaluation.
Future directions and conclusion
Looking ahead, the future of AI in IBD lies in collaborative, cross-disciplinary innovation. Key priorities include building multi-institutional and diverse datasets to enhance generalizability, developing explainable AI models to enhance trust and improve interpretability, and embedding these tools into clinical decision support systems that are both intuitive and actionable for healthcare providers. The incorporation of multi-modal data, including genomics, microbiome profiles, imaging, and patient-reported outcomes, offers a unique opportunity to develop robust predictive models for disease trajectory, treatment response, and complications. Federated learning and other privacy-preserving AI frameworks may help overcome data-sharing barriers while enabling broader validation efforts. AI has the potential to support treat-to-target strategies, streamline disease monitoring, and individualize therapy based on real-time data. As regulatory frameworks evolve and health systems embrace digital transformation, AI is poised to become an indispensable component of precision medicine in IBD.
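To make the federated learning idea concrete, the sketch below shows one round of weighted parameter averaging, in the style of federated averaging: each site trains locally and shares only model parameters and a sample count, never patient records. The sites, parameter vectors, and sample counts are hypothetical, and real deployments typically add secure aggregation and many training rounds.

```python
def federated_average(site_updates):
    """Average parameter vectors across sites, weighted by local sample count.

    site_updates is a list of (parameters, n_samples) pairs, one per site.
    """
    total = sum(n for _, n in site_updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(site_updates[0][0])
    averaged = [0.0] * dim
    for params, n in site_updates:
        if len(params) != dim:
            raise ValueError("all sites must share the same parameter layout")
        for i, value in enumerate(params):
            averaged[i] += value * n / total
    return averaged


# Hypothetical updates from two hospitals after local training.
updates = [
    ([0.2, -1.0], 100),  # hypothetical hospital A: 100 local patients
    ([0.6, -0.5], 300),  # hypothetical hospital B: 300 local patients
]
global_params = federated_average(updates)
```

Weighting by sample count lets larger cohorts contribute proportionally more to the shared model, while the raw records stay within each institution.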
In conclusion, while substantial challenges remain, the current trajectory of AI in IBD is promising. With continued research, rigorous validation, and ethical implementation, AI has the potential to revolutionize IBD care by enhancing diagnostic accuracy, reducing clinical variability, and enabling personalized treatment pathways.
