Abstract
Colonoscopy remains the gold-standard examination for colorectal cancer screening because of its ability to detect and resect precancerous lesions in the colon. However, its performance is highly operator dependent. Studies have shown that up to one-quarter of colorectal polyps can be missed on a single colonoscopy, contributing to interval colorectal cancer. In addition, the American Society for Gastrointestinal Endoscopy has proposed the “resect-and-discard” and “diagnose-and-leave” strategies for diminutive colorectal polyps to reduce the costs of unnecessary polyp resection and pathology evaluation. However, the performance of optical biopsy has been suboptimal in community practice. With recent improvements in machine-learning techniques, artificial intelligence–assisted computer-aided detection and diagnosis have been increasingly used by endoscopists. The application of computer-aided detection and diagnosis to real-time colonoscopy has been shown to increase the adenoma detection rate while decreasing the withdrawal time, and to improve the accuracy of endoscopists’ optical biopsy while reducing the time needed to reach a diagnosis. These are promising steps toward standardizing and improving colonoscopy quality and implementing the “resect-and-discard” and “diagnose-and-leave” strategies. Yet, issues such as real-world application and regulatory approval need to be addressed before artificial intelligence models can be successfully implemented in clinical practice. In this review, we summarize the recent literature on the application of artificial intelligence for detection and characterization of colorectal polyps and review the limitations of existing artificial intelligence technologies and future directions for this field.
Keywords
Introduction
Colorectal cancer (CRC) is the third most common cancer and the second most common cause of cancer death worldwide. 1 Colonoscopy reduces the risk of CRC through detection and resection of precancerous lesions such as adenomas. 2 The adenoma detection rate (ADR), the proportion of screening colonoscopies in which at least one adenoma is detected, is highly operator dependent, with studies reporting ADRs ranging from 7% to 53% among endoscopists. 3 Failure to detect and remove neoplastic lesions is associated with the development of interval CRC, which accounts for nearly 10% of all diagnosed CRC. 4 In addition, most polyps detected during colonoscopy are diminutive (1–5 mm) and carry a negligible risk of progression to cancer. 5 Unnecessary resection and pathology evaluation of these lesions, many of which are non-neoplastic, are associated with increased costs and adverse events. The American Society for Gastrointestinal Endoscopy (ASGE) has published a Preservation and Incorporation of Valuable endoscopic Innovations (PIVI) statement for optical biopsy of diminutive polyps. The “resect and discard” paradigm is recommended when post-polypectomy surveillance intervals assigned on the basis of optical biopsy agree with histology-based intervals in at least 90% of cases, and the “diagnose and leave” strategy for diminutive rectosigmoid hyperplastic polyps is recommended when the negative predictive value (NPV) for adenomatous histology is 90% or more. 6 However, endoscopists’ optical biopsy performance has not consistently reached these thresholds in community practice.
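Both PIVI thresholds reduce to simple proportions that can be audited from routine data. The following is a minimal Python sketch of such an audit, assuming hypothetical counts; the class and field names are illustrative and do not correspond to any ASGE tool.

```python
from dataclasses import dataclass

PIVI_THRESHOLD = 0.90  # both ASGE PIVI thresholds discussed above are 90%


@dataclass
class OpticalBiopsyAudit:
    """Hypothetical audit counts for one endoscopist or one CADx system."""

    # "Resect and discard": surveillance intervals assigned from optical
    # biopsy that match the intervals assigned from histology.
    intervals_agreeing: int
    intervals_total: int
    # "Diagnose and leave": diminutive rectosigmoid polyps called
    # non-adenomatous on optical biopsy, split by histology.
    true_negatives: int   # called non-adenoma, histology non-adenoma
    false_negatives: int  # called non-adenoma, histology adenoma

    def interval_agreement(self) -> float:
        return self.intervals_agreeing / self.intervals_total

    def npv(self) -> float:
        # NPV = TN / (TN + FN) among polyps predicted to be non-adenomatous
        return self.true_negatives / (self.true_negatives + self.false_negatives)

    def meets_resect_and_discard(self) -> bool:
        return self.interval_agreement() >= PIVI_THRESHOLD

    def meets_diagnose_and_leave(self) -> bool:
        return self.npv() >= PIVI_THRESHOLD


# Hypothetical example: 182/200 matching intervals, 88 TN and 12 FN.
audit = OpticalBiopsyAudit(intervals_agreeing=182, intervals_total=200,
                           true_negatives=88, false_negatives=12)
print(audit.interval_agreement(), audit.meets_resect_and_discard())  # 0.91 True
print(audit.npv(), audit.meets_diagnose_and_leave())                  # 0.88 False
```

In this invented example the same endoscopist would qualify for resect-and-discard but not for diagnose-and-leave, which is why the two thresholds are audited separately.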
To overcome these challenges, artificial intelligence (AI) has been introduced to the field of endoscopy. AI-assisted computer-aided detection (CADe) and diagnosis (CADx) systems, especially those based on deep learning, are promising options to improve detection and optical biopsy and to reduce human variation, because they can process high-dimensional endoscopic data and learn features that are not apparent to humans. The application of computer-aided detection and diagnosis (CAD) to real-time colonoscopy has been shown to increase the ADR, reduce the withdrawal time, and improve endoscopists’ optical biopsy while reducing the time needed to make a diagnosis. Yet, issues such as real-world application and regulatory approval need to be addressed before AI models can be successfully implemented in clinical practice. In this review, we summarize the recent literature on the application of AI for detection and characterization of colorectal polyps, and review the clinical implementation, current limitations of existing AI technologies, and future directions for this field.
AI for detection of colorectal polyps (CADe)
Table 1 summarizes the important studies on CADe for detection of colorectal polyps.
Clinical studies on computer-aided detection (CADe) for colorectal polyps.
ADR, adenoma detection rate; AI, artificial intelligence; AQCS, Automatic Quality Control System; AUC, area under the curve; CADe, computer-aided detection; CNN, convolutional neural network; NBI, narrow band imaging; PPV, positive predictive value; RCT, randomized controlled trial; WLI, white light imaging.
The initial CADe systems were reported in the early 2000s.7,8,24 These systems relied on handcrafted algorithms based on predefined polyp features and reported accuracies above 90%. Several other groups designed and evaluated different handcrafted CADe solutions using small numbers of static images. Although these systems typically showed high accuracy on carefully selected data sets, their real-world application was limited by low sensitivity, high false-positive rates, and long processing times. More recently, deep-learning algorithms such as convolutional neural networks (CNNs), which learn features directly from the data rather than relying on handcrafted ones, have been used to develop CADe systems capable of continuously recognizing lesions during the examination. Using 50 polyp and 85 non-polyp videos, Misawa and colleagues 11 developed a three-dimensional CNN-based CADe with a sensitivity of 90% and a specificity of 63%. Urban and colleagues reported the first real-time application of a CNN-based CADe, trained on more than 8,000 images from 2,000 patients. Their CADe showed 97% sensitivity, 95% specificity, and 96% accuracy for detection of colorectal polyps, and outperformed the endoscopists (45% vs 36%). Notably, of the 73 polyps missed by the endoscopists, 67 were detected by the CADe, with a false-positive rate of 5%. 12 Klare and colleagues prospectively studied CADe during live colonoscopy performed by a trained endoscopist while a second observer monitored the CADe output. The system processed images with an average delay of only 50 ms and achieved a polyp detection rate (PDR) of 51% and an ADR of 29%, comparable to the endoscopist’s PDR of 56% and ADR of 31%. The first commercially available CADe (GI-Genius, Medtronic) was recently evaluated in a retrospective validation study that showed excellent performance, with a per-lesion sensitivity of 99.7%. 15
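Because CADe runs on video, its sensitivity can be counted per frame or per lesion, and the two can differ widely. The sketch below illustrates this with invented frame-level outputs; the rule that a lesion counts as "detected" once it is flagged in at least `min_frames` frames is an assumption for illustration, and the studies above may define per-lesion detection differently.

```python
from typing import Dict, List

# Hypothetical data: per-frame CADe output for each lesion while it is in
# view (True = the system raised an alert on that lesion in that frame).
frame_flags: Dict[str, List[bool]] = {
    "lesion_A": [False, True, True, True, False],
    "lesion_B": [False, False, False, False],
    "lesion_C": [True, False, False],
}


def per_frame_sensitivity(flags: Dict[str, List[bool]]) -> float:
    """Share of all lesion-containing frames in which the lesion was flagged."""
    total = sum(len(frames) for frames in flags.values())
    hits = sum(sum(frames) for frames in flags.values())
    return hits / total


def per_lesion_sensitivity(flags: Dict[str, List[bool]], min_frames: int = 1) -> float:
    """Share of lesions flagged in at least `min_frames` frames.

    The min_frames rule is an illustrative assumption; published studies
    define a per-lesion "detection" in their own ways.
    """
    detected = sum(1 for frames in flags.values() if sum(frames) >= min_frames)
    return detected / len(flags)


print(round(per_frame_sensitivity(frame_flags), 3))                 # 0.333
print(round(per_lesion_sensitivity(frame_flags), 3))                # 0.667
print(round(per_lesion_sensitivity(frame_flags, min_frames=2), 3))  # 0.333
```

A lesion needs to be flagged only briefly to count under the per-lesion definition, which is one reason per-lesion figures such as the one reported for GI-Genius can be much higher than frame-level figures.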
To date, eight randomized controlled trials (RCTs) have compared CADe with standard colonoscopy, all demonstrating a significantly higher ADR with CADe. Wang and colleagues reported the first (non-blinded) RCT, in 1,058 patients (536 with CADe, 522 without CADe), and found a significantly higher ADR (29.1% vs 20.3%, p < 0.001) and a greater number of adenomas per patient (0.53 vs 0.31) in the CADe group. However, the increase in ADR was driven by detection of diminutive adenomas, and there was no difference in detection of polyps larger than 10 mm between the two groups. Moreover, a higher proportion of polyps detected with CADe were hyperplastic (43.6% vs 34.9%), and there was no difference in the proportion of advanced adenomas or sessile serrated lesions (SSLs) detected between the two groups. 17 The same authors subsequently performed a double-blind RCT using a sham AI system and showed a significantly higher ADR in the CADe group than in the sham group (34% vs 28%, p = 0.03). 18 Su and colleagues designed a CADe system that could also evaluate the quality of bowel preparation and measure withdrawal time. Their analysis included 308 patients in the CADe group and 315 in the control group. The CADe group had a significantly higher ADR (29% vs 17%, p < 0.001), a longer withdrawal time (7.0 vs 5.6 min, p < 0.001), and a higher rate of adequate bowel preparation. 22 Liu and colleagues 21 conducted an RCT in 1,026 patients and found a significantly higher ADR in the CADe group (39% vs 24%, p < 0.001). Repici and colleagues conducted a multicenter RCT of the GI-Genius CADe system in 685 patients and found a significantly higher ADR in the CADe group (54.8% vs 40.4%). Notably, the increase was seen for both diminutive (33.7% vs 26.5%) and small (6–9 mm) adenomas (10.6% vs 5.8%), irrespective of polyp shape or location. This trial was also distinguished by a high baseline ADR, unlike the trials described above. 20
Gong and colleagues developed a CADe system with the ability to recognize cecal intubation. In addition to a significantly higher ADR in the CADe group (16% vs 8%, p = 0.001), they demonstrated a significantly higher detection rate for advanced polyps (3% vs 1%). 19 Wang and colleagues conducted the first randomized tandem trial comparing CADe with standard colonoscopy. The adenoma miss rate was significantly lower in the CADe group than in the standard group (13.8% vs 40.0%, p < 0.001); the difference was significant for diminutive (13.1% vs 39.6%, p < 0.001) and small lesions (13.7% vs 46.9%, p < 0.0001), but not for those larger than 10 mm (15.3% vs 33.3%). 23 They further classified missed polyps as visible (in view but not recognized by the endoscopist) or invisible (never exposed) and reported that CADe rarely missed a polyp once the operator had exposed the mucosa (miss rates for visible lesions with CADe: 1.5% for adenomas and 2.3% for polyps). The miss rate for sessile serrated lesions did not differ significantly between the two groups.
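The trial endpoints above are simple ratios. The following Python sketch computes them from hypothetical per-patient data: ADR (share of colonoscopies with at least one adenoma), mean adenomas per patient, and the tandem-design adenoma miss rate (adenomas found only on the second pass divided by all adenomas found across both passes). The z-test is a generic pooled two-proportion test, equivalent to an uncorrected chi-square test on a 2×2 table; it is not necessarily the test any of these trials used, and all counts are invented.

```python
import math
from typing import Sequence, Tuple


def adenoma_detection_rate(counts: Sequence[int]) -> float:
    """ADR: share of colonoscopies with at least one adenoma detected."""
    return sum(1 for n in counts if n > 0) / len(counts)


def mean_adenomas_per_patient(counts: Sequence[int]) -> float:
    """Mean number of adenomas detected per colonoscopy."""
    return sum(counts) / len(counts)


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Pooled two-sided z-test for a difference in two proportions (e.g. ADRs).

    Equivalent to an uncorrected chi-square test on the 2x2 table.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


def adenoma_miss_rate(found_second_pass: int, found_first_pass: int) -> float:
    """Tandem design: adenomas found only on the second pass, divided by
    all adenomas found across both passes."""
    return found_second_pass / (found_first_pass + found_second_pass)


# Hypothetical per-patient adenoma counts for ten colonoscopies.
counts = [0, 2, 0, 1, 0, 0, 3, 0, 0, 1]
print(adenoma_detection_rate(counts))     # 0.4
print(mean_adenomas_per_patient(counts))  # 0.7

# Hypothetical arms: 60/200 vs 40/200 patients with >= 1 adenoma.
z, p = two_proportion_z_test(60, 200, 40, 200)
print(round(z, 2), round(p, 3))           # 2.31 0.021

# Hypothetical tandem data: 40 adenomas on pass 1, 10 more on pass 2.
print(adenoma_miss_rate(found_second_pass=10, found_first_pass=40))  # 0.2
```

Because ADR counts patients rather than adenomas, a CADe system that finds additional diminutive adenomas in patients who already had one raises adenomas per patient without raising ADR, which is why both endpoints are usually reported.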
AI for characterization of colorectal polyps (CADx)
Table 2 summarizes the studies on CADx for characterization of colorectal polyps.
Clinical studies on computer-aided diagnosis (CADx) for characterization of colorectal polyps.
AI, artificial intelligence; AUC, area under the curve; BLI, blue light imaging; CADx, computer-aided diagnosis; CNN, convolutional neural network; NBI, narrow band imaging; NPV, negative predictive value; PPV, positive predictive value; SSL, sessile serrated lesion; SVM, support vector machine; WLI, white light imaging.
CADx for digital image-enhanced endoscopy
CADx systems based on narrow band imaging (NBI; Olympus Corp., Tokyo, Japan) are the most extensively studied to date. The initial CADx systems used a support vector machine (SVM) and were designed for magnifying NBI, which limited their widespread use in clinical practice.25,26,28 More recent integration of CNNs into CADx has produced systems with higher diagnostic accuracy and faster processing times.31,40,41 Using standard non-magnified NBI, Chen and colleagues 31 developed a CNN-based CADx with a sensitivity, specificity, positive predictive value (PPV), NPV, and accuracy of 96.3%, 78.1%, 89.6%, 91.5%, and 91%, respectively. Byrne and colleagues developed the first CADx that reached the ASGE optical biopsy thresholds in real-time clinical practice. 34 Using standard NBI, they trained the CADx on 223 polyp videos (60,089 frames) and tested it on 125 videos of diminutive polyps; for 19 of these polyps, the model’s credibility score did not exceed 50%, so no prediction was made. For the remaining 106 polyp videos, the sensitivity, specificity, PPV, NPV, and accuracy for distinguishing diminutive adenomas from hyperplastic polyps were 98%, 83%, 90%, 97%, and 94%, respectively. Zachariah and colleagues 37 designed a CNN-based CADx for both white-light imaging (WLI) and NBI that exceeded the ASGE PIVI thresholds, with an NPV of 93% and an accuracy of 94%. The system classified diminutive polyps accurately regardless of endoscopist experience or NBI use, which could particularly benefit community endoscopists. Using both NBI and blue light imaging (BLI), Zorron Cheng Tao Pu and colleagues developed a CADx based on the modified Sano (MS) classification and validated it on an internal data set and two external polyp image data sets.39,42 The CADx had a mean area under the curve (AUC) of 94.3% for the internal set, and 84.5% and 90.3% for the external NBI and BLI sets, respectively.
A distinctive feature of this study was the comparably high accuracy of the CADx across two imaging technologies (NBI and BLI), suggesting that a CADx may perform well even with an imaging technology that is not part of its training set. Moreover, the CADx AUC was comparable to that of experts with both NBI and BLI. Song and colleagues developed a CNN-based CADx model and compared it with trainees and NBI-expert endoscopists. The CADx system had a significantly higher diagnostic accuracy (81%–82%) than the trainees (63.8%–71.8%, p < 0.01) and was comparable to the experts (82.4%–87.3%, p = 0.72). 35 Importantly, using CADx as a support tool significantly improved the trainees’ diagnostic accuracy (from 63.8%–72% to 82.7%–84.2%, p < 0.001). Jin and colleagues similarly showed that CADx support improved endoscopists’ diagnostic accuracy (from 82.5% to 88.5%, p < 0.05). The greatest improvement was seen in novice endoscopists (from 73.8% to 85.6%, p < 0.05), whose accuracy approached that of experts (89.0%, p = 0.10). 38
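The CADx studies above report the same five metrics, each a ratio of cells in a 2×2 table of CADx prediction versus histology. A minimal Python sketch, treating adenoma as the positive class (an assumption; some studies use "neoplastic" instead) and using invented counts:

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ConfusionMatrix:
    """Counts for a binary CADx task: adenoma (positive) vs non-adenoma."""

    tp: int  # predicted adenoma, adenoma on histology
    fp: int  # predicted adenoma, non-adenoma on histology
    tn: int  # predicted non-adenoma, non-adenoma on histology
    fn: int  # predicted non-adenoma, adenoma on histology

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)


def confusion_from_labels(y_true: Sequence[int], y_pred: Sequence[int]) -> ConfusionMatrix:
    """Build the matrix from 0/1 histology labels and 0/1 CADx predictions."""
    pairs = list(zip(y_true, y_pred))
    return ConfusionMatrix(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
    )


# Hypothetical counts, not taken from any study above.
cm = ConfusionMatrix(tp=45, fp=5, tn=40, fn=10)
print(cm.ppv, cm.npv, cm.accuracy)  # 0.9 0.8 0.85
```

Sensitivity and specificity are conditioned on histology, whereas PPV and NPV are conditioned on the CADx call, which is why the diagnose-and-leave threshold is stated as an NPV.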
CADx for chromoendoscopy
There are a few older studies on CADx for chromoendoscopy. Takemura and colleagues developed software that predicted pit patterns by extracting six features (e.g., area, perimeter, circularity) from crystal violet–stained images. Their CADx performed remarkably well, with 98.5% accuracy. 27 However, pit pattern classification requires crystal violet staining by the endoscopist, and the depth of color depends on how much dye is sprayed. As a result, uniform image quality, and therefore a robust CADx for chromoendoscopy, is difficult to achieve.
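Shape features such as area, perimeter, and circularity can be computed from a segmented outline. The exact feature definitions used by Takemura and colleagues are not given here, so this Python sketch assumes the common definition circularity = 4πA/P² (1.0 for a circle) and an outline already extracted as a polygon of pixel coordinates; the segmentation step itself is out of scope.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(pts: List[Point]) -> float:
    """Area of a simple closed polygon via the shoelace formula."""
    n = len(pts)
    s = 0.0
    for i in range(n):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def polygon_perimeter(pts: List[Point]) -> float:
    """Sum of edge lengths, including the closing edge."""
    n = len(pts)
    return sum(math.dist(pts[i], pts[(i + 1) % n]) for i in range(n))


def circularity(pts: List[Point]) -> float:
    """4*pi*A / P**2: 1.0 for a circle, smaller for elongated or jagged outlines."""
    p = polygon_perimeter(pts)
    return 4 * math.pi * polygon_area(pts) / (p * p)


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(round(circularity(square), 4))  # 0.7854 (pi / 4)

# A 360-sided regular polygon approximates a circle.
ngon = [(math.cos(2 * math.pi * k / 360), math.sin(2 * math.pi * k / 360)) for k in range(360)]
print(round(circularity(ngon), 4))  # 1.0
```

Features like these are scale-sensitive (area, perimeter) or scale-free (circularity), so inconsistent staining that blurs pit boundaries affects them directly.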
CADx for white-light imaging
Studies on CADx for WLI have generally reported lower diagnostic accuracy, likely because WLI provides less information for optical diagnosis than NBI or chromoendoscopy. Komeda and colleagues developed a WLI-based CADx model with a reported diagnostic accuracy of only 75.1%. In contrast, the WLI-based CADx developed by Sánchez-Montes and colleagues reached 95.0% sensitivity, 87.9% specificity, 82.6% PPV, 96.7% NPV, and 91.1% accuracy for differentiating diminutive rectosigmoid adenomas.33,30
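Unlike sensitivity and specificity, PPV and NPV depend on the proportion of adenomas in the test set, so they are not directly comparable across studies with different case mixes. The sketch below holds the sensitivity and specificity quoted above fixed and varies a hypothetical adenoma prevalence; the prevalences are illustrative and are not those of the cited study.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) implied by sensitivity, specificity and prevalence.

    Works with expected proportions of a cohort rather than raw counts.
    """
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Same sensitivity/specificity (95.0% and 87.9%, as quoted above),
# evaluated at three hypothetical adenoma prevalences.
for prev in (0.3, 0.5, 0.7):
    ppv, npv = predictive_values(0.950, 0.879, prev)
    print(f"prevalence {prev:.1f}: PPV {ppv:.3f}, NPV {npv:.3f}")
```

As prevalence rises, PPV increases and NPV falls, so an NPV that clears the 90% diagnose-and-leave threshold in one population may not clear it in a population with more adenomas.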
CADx for endocytoscopy
Endocytoscopy (H290ECI, Olympus, Tokyo, Japan) is a novel in vivo microscopic imaging technique that allows real-time visualization of the cellular and microvascular patterns of colorectal polyps. 43 Endocytoscopy is considered ideal for pairing with CAD systems because it consistently provides focused, fixed-size images, which facilitates image analysis. In 2015, Mori and colleagues developed a CAD system that extracted features from stained endocytoscopic images to predict neoplastic polyps and evaluated it in 152 patients. Polyps smaller than 10 mm were analyzed in real time, and the system achieved a sensitivity of 92.0%, a specificity of 79.5%, and an accuracy of 89.2% for identifying neoplastic changes, comparable to those of expert endoscopists. 44 In a prospective trial of 791 patients with 466 diminutive rectosigmoid polyps, the CAD system achieved an NPV of 93.7%, reaching the performance level required for the ASGE diagnose-and-leave strategy. 32 Misawa and colleagues 29 developed an NBI-based CADx for endocytoscopy that achieved an overall sensitivity of 84.5%, specificity of 97.6%, and accuracy of 90.0% using the existing training images. Predictions with a probability greater than 90% were considered “high-confidence” diagnoses; these had an overall sensitivity of 97.6%, specificity of 95.8%, and accuracy of 96.9%, surpassing the proposed cutoffs for the diagnose-and-leave strategy. 29 In a retrospective study comparing the CAD system with 30 endoscopists (trainees and experts) in interpreting stained and NBI endocytoscopic images, the CAD system identified neoplastic lesions on stained images with 96.9% sensitivity, 100% specificity, 98% accuracy, 100% PPV, and 94.6% NPV, all significantly higher than those of the endoscopy trainees and experts.
For NBI images, the CAD system distinguished neoplastic from non-neoplastic lesions with 96.9% sensitivity, 94.3% specificity, 96.0% accuracy, 96.9% PPV, and 94.3% NPV, all significantly higher than those of the endoscopy trainees. Compared with the experts, sensitivity and NPV were significantly higher, whereas the other values were comparable. 36 A recent cost-effectiveness analysis of AI-supported implementation of the diagnose-and-leave strategy found that, with AI, 145 diminutive rectosigmoid polyps were left unresected; on this basis, the authors estimated that the average colonoscopy cost and the gross annual reimbursement for colonoscopies could be reduced by 18.9% and US$149.2 million in Japan, 6.9% and US$12.3 million in England, 7.6% and US$1.1 million in Norway, and 10.9% and US$85.2 million in the United States, respectively. 45 However, endocytoscopy is not widely used in clinical practice. Given its potential for cost savings, more attention should be paid to the regulation, accessibility, and effective implementation of this technology.
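The "high-confidence" analysis described for Misawa and colleagues' endocytoscopy CADx, and Byrne and colleagues' exclusion of low-credibility predictions, trade coverage for accuracy: the system commits to a diagnosis only when it is confident. This Python sketch illustrates the idea, assuming the model outputs a probability of neoplasia per polyp and that confidence is the probability of the predicted class; the actual scoring in those systems may differ, and the data are invented.

```python
from typing import List, Optional, Tuple


def high_confidence_analysis(
    probs: List[float], labels: List[int], threshold: float = 0.90
) -> Tuple[float, float, Optional[float]]:
    """Return (coverage, overall accuracy, accuracy on high-confidence cases).

    probs:  model probability that each polyp is neoplastic.
    labels: 1 if neoplastic on histology, else 0.
    A prediction counts as high-confidence when the probability of the
    predicted class, max(p, 1 - p), exceeds `threshold`.
    """
    preds = [1 if p >= 0.5 else 0 for p in probs]
    correct = [pred == y for pred, y in zip(preds, labels)]
    confident = [max(p, 1 - p) > threshold for p in probs]

    overall = sum(correct) / len(correct)
    n_conf = sum(confident)
    coverage = n_conf / len(probs)
    conf_acc = (
        sum(c for c, keep in zip(correct, confident) if keep) / n_conf
        if n_conf else None  # no prediction cleared the threshold
    )
    return coverage, overall, conf_acc


# Hypothetical outputs for eight polyps.
probs = [0.97, 0.95, 0.60, 0.08, 0.30, 0.92, 0.03, 0.55]
labels = [1, 1, 0, 0, 1, 0, 0, 1]
print(high_confidence_analysis(probs, labels))  # (0.625, 0.625, 0.8)
```

Reported high-confidence accuracies should therefore be read together with the share of polyps that received a high-confidence call, because the remaining polyps still need resection or histology.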
Full workflow systems (CADe + CADx)
To facilitate the integration of CAD systems into clinical practice, full-workflow systems that can perform both polyp detection and characterization have been developed. Mori and colleagues 17 designed a novel CAD that combined two algorithms: a deep learning–based algorithm for polyp detection with WLI and an algorithm for optical biopsy using endocytoscopic images. Guizard and colleagues 46 developed a full-workflow system using both WLI and NBI that could also tag polyps with unique identifiers and track them throughout the procedure. Ozawa and colleagues designed a CNN-based CAD for both WLI and NBI using a Single Shot MultiBox Detector, an architecture that detects and classifies target objects simultaneously. For WLI, the sensitivity and PPV were 90% and 83%, respectively, and for NBI, 97% and 98%, respectively. Among lesions correctly detected as polyps, 83% were correctly classified, and 97% of adenomas were correctly identified under WLI. 17
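How Guizard and colleagues keep a unique identifier attached to each polyp is not described here. One generic approach is to match each new bounding box to the previous frame's boxes by intersection over union (IoU). The Python sketch below is a hypothetical illustration of that general idea: the class name, the 0.3 threshold, and the greedy matching are assumptions, and it is not the published system.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class GreedyIoUTracker:
    """Hypothetical tracker: carries a polyp ID forward while its box overlaps."""

    def __init__(self, iou_threshold: float = 0.3) -> None:
        self.iou_threshold = iou_threshold
        self.tracks: Dict[int, Box] = {}  # polyp ID -> box in the previous frame
        self.next_id = 1

    def update(self, detections: List[Box]) -> List[int]:
        """Return one polyp ID per detection in the current frame."""
        # Score every (track, detection) pair and match greedily, best first.
        pairs = sorted(
            ((iou(box, det), tid, j)
             for tid, box in self.tracks.items()
             for j, det in enumerate(detections)),
            reverse=True,
        )
        matched: Dict[int, int] = {}  # detection index -> polyp ID
        used_ids = set()
        for score, tid, j in pairs:
            if score < self.iou_threshold:
                break  # remaining pairs overlap even less
            if tid in used_ids or j in matched:
                continue
            matched[j] = tid
            used_ids.add(tid)

        ids: List[int] = []
        new_tracks: Dict[int, Box] = {}
        for j, det in enumerate(detections):
            tid = matched.get(j)
            if tid is None:  # unmatched detection: treat as a new polyp
                tid = self.next_id
                self.next_id += 1
            new_tracks[tid] = det
            ids.append(tid)
        # Simplification: a track that goes unmatched for one frame is dropped.
        self.tracks = new_tracks
        return ids


tracker = GreedyIoUTracker()
f1 = tracker.update([(10, 10, 50, 50)])
f2 = tracker.update([(12, 11, 52, 51), (200, 200, 240, 240)])
f3 = tracker.update([(201, 202, 241, 242)])
print(f1, f2, f3)  # [1] [1, 2] [2]
```

A clinical system would also need to tolerate brief occlusions and camera motion, which this sketch deliberately ignores by dropping any track that is unmatched for a single frame.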
Limitations and future directions
While AI technologies have shown impressive results for detection and histologic prediction of colorectal polyps, several issues need to be addressed before CAD can be implemented in routine clinical practice. To improve reliability and minimize bias, the performance of CAD systems should be evaluated in prospective RCTs conducted in both community and academic centers and among endoscopists with different levels of experience. The preferred study endpoints would be those of the ASGE PIVI strategies. CAD models should also be designed around widely available technology (such as standard NBI) and should be able to process raw videos captured during real-time colonoscopy. Moreover, training should use large, standardized, high-quality data sets, and testing should be performed on several diverse data sets. Recently, Misawa and colleagues launched a publicly accessible colonoscopy video database (SUN database) that contains 49,799 polyp frames annotated with bounding boxes and 102,761 frames without polyps, for a total of 152,560 frames. 47 It is important to note that pathology is not always a reliable gold standard for diagnosis, especially for colorectal lesions ⩽3 mm. In a recent study of 644 colon polyps ⩽3 mm in size, expert endoscopic and histologic diagnoses were discordant for 28.9% of polyps, with pathology reporting hyperplastic polyps (HPs) in 13.2%, SSLs in 0.3%, and normal mucosa in 15.4%. On blinded optical evaluation by two expert endoscopists, agreement with the endoscopic diagnosis was 94% and 100%, respectively. 48 Based on these data, Shahidi and colleagues evaluated AI as an arbiter between endoscopist and pathologist when their diagnoses are discordant. They used an established real-time AI clinical decision support solution (CDSS), which agreed with the endoscopic diagnosis for 89.6% of lesions.
In discordant cases, the CDSS agreed with the endoscopic diagnosis for 90.3% of lesions. Interestingly, for lesions reported as normal mucosa on pathology, the CDSS agreed with the endoscopic diagnosis in 90.9% of cases. 49 Beyond adenomas, CAD development should also focus on detecting proximal colon lesions, particularly SSLs.
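Shahidi and colleagues' three agreement figures are the same ratio (AI matches the endoscopist) computed over different subsets of lesions: all lesions, lesions where endoscopy and pathology disagree, and lesions called normal mucosa on pathology. The Python sketch below makes the subsetting explicit; the diagnostic labels and exact-match rule are simplifying assumptions, and the lesions are invented rather than drawn from the study.

```python
from typing import Callable, List, NamedTuple, Optional


class Lesion(NamedTuple):
    endoscopy: str  # endoscopist's optical diagnosis
    pathology: str  # histologic diagnosis
    ai: str         # AI (CDSS) prediction


def ai_agreement_with_endoscopy(
    lesions: List[Lesion],
    include: Callable[[Lesion], bool] = lambda x: True,
) -> Optional[float]:
    """Share of the selected lesions where the AI matches the endoscopist."""
    subset = [x for x in lesions if include(x)]
    if not subset:
        return None
    return sum(x.ai == x.endoscopy for x in subset) / len(subset)


# Hypothetical lesions, not data from the study described above.
lesions = [
    Lesion("adenoma", "adenoma", "adenoma"),
    Lesion("adenoma", "normal", "adenoma"),
    Lesion("hyperplastic", "adenoma", "hyperplastic"),
    Lesion("adenoma", "hyperplastic", "hyperplastic"),
    Lesion("hyperplastic", "hyperplastic", "adenoma"),
]
overall = ai_agreement_with_endoscopy(lesions)
discordant = ai_agreement_with_endoscopy(lesions, lambda x: x.endoscopy != x.pathology)
path_normal = ai_agreement_with_endoscopy(lesions, lambda x: x.pathology == "normal")
print(overall, round(discordant, 3), path_normal)  # 0.6 0.667 1.0
```

Reporting agreement within the discordant subset is what makes the AI useful as an arbiter: high overall agreement alone could simply reflect the many cases in which endoscopist and pathologist already agree.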
Obtaining regulatory approval is essential for using CAD systems in clinical practice. Currently, CAD EYE™ (Fujifilm Corp, Tokyo, Japan), DISCOVERY™ (Pentax Corp, Tokyo, Japan), Endo-AID (Olympus Corp), and GI-Genius (Medtronic Corp, Minneapolis, MN) have obtained regulatory approval, which will hopefully open the door for more platforms. Medico-legal issues also need to be discussed. Because AI systems do not always provide accurate information, adverse outcomes related to the use of AI may occur and could lead to medico-legal challenges. We should recognize the strengths and weaknesses of AI and avoid overreliance on its output. As AI tools spread across medical fields, we will have to revisit these medico-legal issues in the near future.
Summary
In recent years, the application of AI has expanded significantly in the field of gastrointestinal endoscopy. Multiple studies have shown that integrating CAD with colonoscopy can improve endoscopists’ performance in the detection and characterization of colorectal polyps, which are promising steps toward improving and standardizing colonoscopy quality and implementing the ASGE PIVI paradigms, among others. However, most of these data come from small studies at tertiary care centers that used relatively small numbers of images to train the AI models, with possible selection bias and no randomization. There is a substantial need for large, multicenter clinical trials to establish the diagnostic accuracy of AI technology in real-time clinical practice, which will be an essential step toward regulatory approval and widespread use of AI technologies.
Footnotes
Conflict of interest statement
The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: M.F.B.: CEO and shareholder: Satisfai Health; founder of AI4GI joint venture. Co-development agreement between Olympus America and AI4GI in artificial intelligence and colorectal polyps. N.P. has no conflicts to declare.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
