Abstract
Colorectal cancer is the third most prevalent cancer worldwide, and its treatment has been a demanding clinical problem. Beyond traditional surgical therapy and chemotherapy, newly revealed molecular mechanisms diversify therapeutic approaches for colorectal cancer. However, the selection of personalized treatment among multiple treatment options has become another challenge in the era of precision medicine. Artificial intelligence has recently been increasingly investigated in the treatment of colorectal cancer. This narrative review mainly discusses the applications of artificial intelligence in the treatment of colorectal cancer patients. A comprehensive literature search was conducted in MEDLINE, EMBASE, and Web of Science to identify relevant papers, resulting in 49 articles being included. The results showed that, based on different categories of data, artificial intelligence can predict treatment outcomes and essential guidance information of traditional and novel therapies, thus enabling individualized treatment strategy selection for colorectal cancer patients. Some frequently implemented machine learning algorithms and deep learning frameworks have also been employed for long-term prognosis prediction in patients with colorectal cancer. Overall, artificial intelligence shows encouraging results in treatment strategy selection and prognosis evaluation for colorectal cancer patients.
Introduction
Colorectal cancer (CRC) ranks as the third most prevalent cancer worldwide and has the second highest mortality rate. 1 Managing patients with CRC appropriately is a challenging medical problem, especially with respect to treatment. 2 Surgery and endoscopic resection are the mainstay of treatment for localized, nonmetastatic CRC. Fluoropyrimidine-based chemotherapy has contributed to improved prognosis in metastatic patients. 3 Recent work has further elucidated the molecular mechanisms of CRC and diversified the therapeutic approaches to include targeted therapy and immunotherapy.4,5 However, in the era of precision medicine, the individualized selection of treatment strategies for CRC patients is essential for providing the best cancer care. This poses a problem for clinicians: predicting treatment outcomes and making appropriate treatment decisions. 6 Besides, accurate prediction of the prognosis of CRC patients can also provide evidence for selecting treatment strategies. With the advancement of computer technology, personalized prediction of cancer treatment outcomes can move beyond the traditional biomarkers of therapeutic outcome and prognosis.
Artificial intelligence (AI) is an essential branch of computing that can perform various functions, including prediction and classification based on existing data. 7 AI with sufficient data can classify patients to select personalized treatment strategies. 8 AI is an overall term, while machine learning (ML) and deep learning (DL) are the 2 most extensively used AI approaches in the medical field. Machine learning uses algorithms to parse and learn from data and then make decisions and predictions about real-world objects. In the medical field, popular ML algorithms such as support vector machines (SVMs) have been employed for disease stratification, prediction, and other purposes. 9 DL is a new learning approach based on an extension of ML, with multi-layered neural network algorithms to implement tasks. 10 It has proven to be proficient at finding complex structures in high-dimensional data. 11 DL has made more breakthroughs in areas such as image recognition, predicting gene expression, and disease impact than conventional ML, which has a limited ability to process natural data in its original form. 11 Cancer patients have multimodal data, including electronic medical records, molecular data, radiological data, and digital pathology data. 12 DL techniques such as convolution neural networks (CNN) 13 can better process these complex data individually for assisting in personalized treatment decision-making. The development of multimodal approaches in recent years has also enabled the integrated processing of these data. 14 Currently, AI has already played a role in CRC treatment with its excellent predictive and stratification power. This narrative review will discuss the applications of AI in the treatment of CRC patients in terms of therapeutic strategies and prognosis evaluation. 
In terms of therapeutic strategies, we review the applications of AI in the decision-making of treatment, neoadjuvant chemoradiotherapy efficacy prediction, chemotherapy efficacy and toxicity prediction, immunotherapy efficacy prediction, targeted therapy efficacy prediction, endoscopic therapy selection, and surgical therapy management. In terms of prognosis evaluation, we summarize the performance of ML and DL approaches in CRC, respectively.
A literature search was conducted strictly and comprehensively in MEDLINE, EMBASE, and Web of Science. The investigators searched for studies available up to October 5, 2023, according to predetermined protocols. The following keywords and/or Medical Subject Headings terms were used: “Colorectal cancer,” “Artificial intelligence,” “Neoadjuvant chemoradiotherapy,” “Chemotherapy,” “Immunotherapy,” “Targeted therapy,” “Endoscopic therapy,” “Surgical therapy,” “Prognosis,” “Machine learning,” and “Deep learning.” The investigators performed the initial screening of titles and abstracts. Full-length articles of identified studies were retrieved to ensure the representativeness of the included references. We also searched ClinicalTrials.gov to identify the ongoing clinical trials of AI applications in CRC treatment.
Therapeutic Strategies
Artificial intelligence could be applied in neoadjuvant therapy, chemotherapy, immunotherapy, targeted therapy, endoscopic therapy, and surgical therapy for CRC (Figure 1). Previous studies have demonstrated the promising performance of AI applied to CRC therapeutic strategy selection (Table 1).

Figure 1. Artificial intelligence applications in therapeutic strategies of colorectal cancer.
Table 1. Artificial intelligence in CRC therapeutic strategies selection.
Abbreviations: ANN, artificial neural network; AUC, area under the curve; BSLR, backward stepwise logistic regression; CI, confidence interval; CNN, convolution neural network; CRC, colorectal cancer; CT, computerized tomography; DNN, deep neural network; HR, hazard ratio; LARC, locally advanced rectal cancer; LASSO, least absolute shrinkage and selection operator; lmCRC, colorectal cancer liver metastases; LNM, lymph node metastases; MRI, magnetic resonance imaging; MSI, microsatellite instability; NA, not applicable; nCRT, neoadjuvant chemoradiotherapy; NPV, negative predictive value; pCR, pathologic complete response; PPV, positive predictive value; PTS, peritumoral stroma; RF, random forests; SVM, support vector machines; TILs, tumor-infiltrating lymphocytes; TMB, tumor mutational burden; TCGA, The Cancer Genome Atlas; COAD, colon adenocarcinoma; ESD, endoscopic submucosal dissection; POEM, peroral endoscopic myotomy.
AI in neoadjuvant chemoradiotherapy efficacy prediction
For locally advanced rectal cancer (LARC), preoperative neoadjuvant chemoradiotherapy (nCRT) and surgery are the standard therapy. 49 Preoperative nCRT is administered to achieve tumor shrinkage and to increase the probability of complete tumor clearance in surgical resection. 50 However, about 15% to 27% of cases will achieve a pathologic complete response (pCR) after nCRT. 51 Patients cured by neoadjuvant chemotherapy do not need to be referred for bowel surgery, and their postoperative pathological sections show no residual tumor cells. Therefore, it is of great significance to identify pCR after nCRT to avoid the impairment brought by surgical resection. Nevertheless, accurate prediction of pCR is currently challenging.
As a potential approach, AI has been applied to predict treatment response through radiomics. Radiomics allows the digital decoding of radiographic images into quantitative features, such as shape and texture. 52 Radiomic texture features from magnetic resonance imaging (MRI) may reflect the biological characteristics of the tumor. 53 The T2-weighted sequence was the most investigated sequence in rectal radiomics. 54 A study 15 proposed a radiomics model to predict pCR in rectal adenocarcinoma patients based on pretreatment T2-weighted MRI. In this research, a set of radiomics texture features was identified and used to construct a random forests (RF) classification model. The results showed that the RF model reached an area under the curve (AUC) of 0.712 with an accuracy of 70.5% on a hold-out validation data set containing 44 cases. The AUC of this model was only moderate, and an external validation cohort with a larger sample size should be used to validate the model. Another study 16 used recursive feature elimination to select texture features in pre-nCRT T2-weighted MRI and developed a logistic regression classifier to predict pCR in LARC patients. The model yielded the best AUC of 0.80 on the hold-out test set. A combination of pretreatment, mid-treatment, and post-treatment MRI radiomics could also characterize pCR after nCRT. Another team of researchers 17 proposed an RF classifier model based on pretreatment, mid-treatment, and post-treatment T2-weighted MRI of patients with LARC. The RF model could identify nonresponders to nCRT with a mean AUC of 0.83 in a validation cohort.
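The AUC values quoted for these classifiers can be read as the probability that a randomly chosen patient with pCR receives a higher predicted score than a randomly chosen patient without pCR. The following is a minimal sketch of that calculation in Python, assuming a binary label and a continuous model score. The function name `roc_auc` and the 8 example patients are hypothetical and are not taken from any of the cited studies.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney pairwise statistic.

    labels: 1 = pCR (positive), 0 = no pCR (negative)
    scores: model-predicted score, higher = more likely pCR
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical hold-out set of 8 patients (illustrative numbers only).
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

In this toy example, 12 of the 15 positive-negative pairs are ranked correctly, so the AUC is 0.8. An AUC of 0.5 corresponds to random ranking, which is why a value near 0.71 on 44 cases is usually regarded as modest.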
In addition to the T2-weighted sequence, radiomics features could also be extracted from diffusion-weighted magnetic resonance images (DWIs) alone or combined with other sequences to predict treatment response. Researchers 18 utilized features extracted from the apparent diffusion coefficient (ADC) maps of DWI to construct least absolute shrinkage and selection operator (LASSO)-logistic regression models, which were capable of predicting nCRT therapeutic response in patients with LARC. Of note, the model based on features extracted by a pre-trained CNN yielded a higher mean AUC of 0.73 than the model built with handcrafted features (AUC = 0.64). However, this study focused on differentiating between patients who responded to nCRT and nonresponders. In another study, 19 three feature selection algorithms combined with 4 ML classifiers were tested to predict pCR in LARC. Interestingly, texture features of the models were extracted from T2-weighted images as well as ADC maps, and 3-dimensional segmentations were accomplished by a DL algorithm or radiology residents. The model that combined a ranking approach for feature selection with an SVM classifier achieved the best performance. On the validation set, automatic segmentation yielded a higher accuracy (75%) than manual segmentation (68%). Besides, a previous study 20 designed a deep neural network (DNN) based on radiomics features extracted from computerized tomography (CT). The DNN could predict pCR after nCRT with an overall accuracy of 80% on an external validation set, which also showed a good predictive capacity of CT.
Pathomics also has the potential to predict the treatment response to nCRT. A study 21 constructed an SVM classifier based on collagen structural features (CFs) in the tumor microenvironment to predict pCR among LARC patients. The CFs of pre-nCRT patients were analyzed by multiphoton imaging technology. The CFs-SVM classifier displayed good discrimination, achieving a high AUC of 0.854 on the validation data set.
AI in chemotherapy efficacy and toxicity prediction
For patients with colorectal cancer liver metastasis (CRLM) who are not eligible for surgery, chemotherapy is the usual treatment option. Identifying the therapeutic response of lesions to chemotherapy is essential for selecting treatment strategies in CRLM patients. 55 Previous studies have demonstrated that AI can predict treatment response to chemotherapy, which is challenging for physicians. Of note, the short-term treatment response was generally characterized by tumor shrinkage on imaging.
A previous study 22 proposed a radiomics model for predicting the therapeutic response of an individual liver lesion in patients with CRLM. The radiomics model was based on ML algorithms and achieved a per-lesion sensitivity of 73% and a specificity of 47% on a validation data set composed of portal CT scans. The sensitivity of this model is moderate, and the specificity is very low, which limits its clinical applicability. Besides, the delta-radiomics score can also identify lesions that are nonresponsive to FOLFOX chemotherapy. A delta-radiomics model 23 achieved a high sensitivity of 85% and a high specificity of 92% in predicting nonresponsive liver metastatic CRC lesions. Recently, long noncoding RNAs (lncRNAs) have been considered potential biomarkers of CRC prognosis. 56 lncRNAs have been shown to be associated with immune modifications in CRC. 24 A group of researchers 24 proposed an ML-based composite model that yields a consensus immune-related lncRNA signature. The lncRNA signature can identify nonresponders to fluorouracil-based adjuvant chemotherapy and achieved a high AUC of 0.854 on the validation data set. Another study 25 used 10 ML algorithms to construct a consensus ML-derived lncRNA signature, which can also characterize patients who benefited from fluorouracil-based adjuvant chemotherapy. To sum up, AI enables individualized assessment of chemotherapy treatment response.
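The sensitivity and specificity reported above for the two radiomics models do not by themselves show how often a positive call would be correct, because that also depends on how common the predicted event is. The sketch below applies Bayes' rule to convert sensitivity and specificity into positive and negative predictive values. The sensitivity and specificity pairs are those reported above, whereas the 50% prevalence and the function name `predictive_values` are assumptions made only for illustration. The two studies also differ in what they treat as the positive class, so the comparison is schematic.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# ASSUMPTION: 50% of lesions belong to the positive class (illustrative only).
assumed_prevalence = 0.5

# Sensitivity/specificity as reported for the per-lesion radiomics model.
ppv_a, npv_a = predictive_values(0.73, 0.47, assumed_prevalence)
# Sensitivity/specificity as reported for the delta-radiomics model.
ppv_b, npv_b = predictive_values(0.85, 0.92, assumed_prevalence)

print(f"Radiomics model:       PPV = {ppv_a:.2f}, NPV = {npv_a:.2f}")
print(f"Delta-radiomics model: PPV = {ppv_b:.2f}, NPV = {npv_b:.2f}")
```

Under this assumed prevalence, the low specificity of the first model means that a large share of its positive calls would be false positives, which is the practical consequence of the limitation noted above.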
Artificial intelligence has also been applied to predict the toxicities of chemotherapy. A study 26 developed ML models to predict cardiotoxicity in CRC patients who received fluoropyrimidine-based chemotherapy. Of all the algorithms tested, XGBoost achieved the highest precision of 0.607 in predicting 30-day cardiotoxicity. A previous study has shown that the predictive factors for chemotherapy cardiotoxicity are relatively few and are associated with the treatment regimen chosen. 57 For individuals on the same treatment regimen, the ability of clinicians to predict cardiotoxicity before treatment is therefore limited. Hence, AI could be introduced into clinical practice as a novel tool that may predict cardiotoxicity better than health personnel. Another study 27 demonstrated that ML can predict the toxicity of irinotecan, characterized by leukopenia, neutropenia, and diarrhea, for each cycle of treatment. In patients with CRC, ML achieved accuracies of over 75% in predicting all 3 adverse reactions. Although the models in this study predicted adverse reactions with high accuracy, they required pharmacokinetic data obtained after drug administration. Therefore, their practicability may be weaker than human assessment of risk factors for chemotherapy toxicity. For individuals, personalized prediction of adverse reactions is of great significance for medical decision-making.
AI in immunotherapy efficacy prediction
Previous studies have applied AI models in predicting the response of immunotherapy in various types of cancers, including CRC.58,59 Microsatellite instability (MSI) is reported as a common molecular phenomenon in CRC, which is attributed to DNA mismatch repair deficiency (dMMR). 60 MSI was found in approximately 15% of colorectal tumors. 60 Initially, MSI was considered related to the efficacy of 5-fluorouracil in patients with CRC. 61 In recent years, the application of detecting MSI status for tumor immunotherapy has gradually been emphasized. Patients with MSI-high tumors may be more responsive to immune checkpoint inhibitors (ICIs),62,63 which is essential for the selection of immunotherapy. In clinical practice, MSI status was usually discriminated by immunohistochemistry or genetic analysis of a biopsy or resected specimens. 28 However, the current methods are not universal due to the cost and technical constraints. As a new approach, AI has been applied to identify MSI effectively.
A study 28 presented DL models based on MRI to identify the MSI status in rectal cancer. The pure MRI-based model reached an AUC of 0.820, while a model based only on clinical factors had an AUC of 0.573 on the testing data set. When the MRI-based model and the clinical model were combined, the integrated model had the highest AUC of 0.868. Moreover, histology-based models can also detect MSI status to predict immunotherapy efficacy. Another study 29 proposed a multiple-instance-learning-based DL model named Ensembled Patch Likelihood Aggregation (EPLA) to distinguish MSI-high from MSI-low/microsatellite stable (MSS) colorectal tumors. Ensembled Patch Likelihood Aggregation was based on histopathology images and had a low AUC of 0.6497 in an external validation data set. The researchers further applied transfer learning to generalize EPLA across the wide variations seen in clinical practice. Transfer learning reuses knowledge learned from related problems. 64 For instance, image acquisition differs between data sets because distinct scanners or scanning protocols are employed. Transfer learning may be able to reduce such data discrepancies in clinical practice. After transfer learning, EPLA achieved a high AUC of 0.8504 on the external validation data set. Although AUC is only one aspect of model performance evaluation, the dramatic improvement in AUC after transfer learning is encouraging.
Currently, tumor mutational burden (TMB) is used to identify sensitivity to immunotherapy. Metastatic CRC that is MSS/MMR-proficient shows a significantly lower TMB, which indicates resistance to anti-PD-1-based treatment. 65 Researchers 30 developed a DL method to evaluate TMB from hematoxylin and eosin (HE)-stained CRC sections. Based on the Residual Network (ResNet) 50, the method achieved the highest AUC of 0.774 among all tested algorithms. Compared to the current standard method of measuring TMB, the model based on DL can reduce costs and improve efficiency.
AI in targeted therapy efficacy prediction
Generally, KRAS mutations can be found in approximately 40% of CRC. 66 In CRC patients with KRAS mutations, anti-EGFR-targeted therapy lacks benefits. 67 Therefore, KRAS mutations are considered a negative biomarker for anti-EGFR-targeted therapy. 68 Detecting KRAS mutations is recommended by practice guidelines and is of great importance for the selection of anti-EGFR-targeted therapy in metastatic CRC patients. 69 A study 31 used a radiomics model to predict KRAS mutations in rectal cancer patients. Researchers proposed several T2-weighted image-based classifiers, including logistic regression, decision tree, and SVM. Among them, the SVM classifier achieved the best performance, with an AUC value of 0.714 on an external validation data set. Another study 32 designed a DL model based on both T2-weighted images and clinicopathological characteristics to detect KRAS mutations in rectal cancer. The combined model yielded an AUC of 0.841. Artificial intelligence may serve as an assistive method for the noninvasive assessment of KRAS mutations. Combined detection of KRAS, NRAS, and BRAF gene mutations contributes to the selection of anti-EGFR-targeted therapy in CRC patients. A previous study 33 designed DL models utilizing radiomics and semantic features. The models are capable of predicting KRAS, NRAS, and BRAF mutations in patients with CRLM. The model combining radiomics with the effective semantic score achieved an AUC value of 0.79 in a validation cohort. Artificial intelligence may therefore enable rapid and accurate selection of anti-EGFR-targeted therapy.
Artificial intelligence has also been developed to predict the therapeutic response of targeted therapy. A research team reported that 70% of patients with metastatic CRC with HER2 amplification or overexpression benefited from trastuzumab plus lapatinib treatment. 70 Researchers 34 further constructed a DL model based on pretreatment CT to distinguish responders and nonresponders in CRC patients with hepatic metastases who received trastuzumab and lapatinib treatment. On an external validation cohort, the model has a sensitivity of 90% and a specificity of 42% per lesion. More cases are required to validate the performance of the model in predicting trastuzumab plus lapatinib treatment response.
AI in endoscopic therapy management
Artificial intelligence can support medical decision-making in endoscopic therapy. Endoscopic resection has emerged as an effective method to remove some early-stage CRCs before open surgery. 71 Unnecessary surgical resection may pose additional risks. However, 8% of patients with T1 and 18.5% of patients with T2 CRCs have lymph node metastases (LNM), which is a contraindication to endoscopic resection. 72 Currently, LNM in early CRCs cannot be accurately predicted. Therefore, some AI models have been developed to predict LNM, thus determining endoscopic treatment strategy. A previous study 35 proposed a ML model based on 45 clinicopathological factors to predict preoperative LNM in patients with T1 CRCs. On an external data set, the model reached a sensitivity of 100%. Whereas compared with guidelines of different countries, the model had specificity ranging from 0% to 66% and accuracy ranging from 9% to 69%. The high-performing model can decrease the high rate of unnecessary surgery brought by the guidelines. Researchers 36 developed models to predict LNM in patients with T1 CRCs after endoscopic resection. Some clinicopathological factors were identified by the RF classifier or generalized linear algorithm, respectively, and the RF classifier yielded a higher AUC of 0.85 on an external validation data set. In patients with T2 CRCs, the presence of LNM can also be identified by AI. A research team 37 constructed a model based on RF to predict LNM in patients with T2 CRCs after endoscopic resection. In this model, 8 clinicopathological factors, such as age, were utilized. The RF-based model achieved a robust AUC of 0.93 on a validation cohort and can help LNM-negative patients undergoing endoscopic full-thickness resections to avoid additional surgical resections. Another study 38 designed a LASSO-based algorithm to select clinicopathological variables. The model achieved an AUC of 0.765 in a validation cohort, which is higher than the Japanese guideline. 
Histopathological sections were also applied independently to predict the presence of LNM. A previous study 39 designed a CNN model to predict CRC LNM from histological slides. The model had an AUC of 0.710 on the internal test data set, which is higher than the model based on clinical data. However, genomics phenotypes 73 and clinical factors such as the T stage can be combined to improve model performance.
More importantly, AI technology can assist in the endoscopic resection of CRC. A study 40 proposed a DeepLabv3-based model to depict blood vessels and other structures on endoscopic images. As a DL model, the method achieved a mean vessel detection rate of 85%. This finding could reduce the risk of bleeding and perforation in endoscopic submucosal dissection performed by operators. However, studies investigating using AI technology to assist in the endoscopic resection of CRC were relatively scarce. More primary and validation studies are needed.
AI in surgical therapy management
Artificial intelligence can predict preoperative pathological variables for CRC surgery to support surgical management. Perineural invasion (PNI) is considered to be a negative prognostic factor. Patients undergoing radical resection of rectal cancer with PNI have higher postoperative mortality. 74 To predict PNI, a previous study 41 constructed SVM models based on preoperative CT. The classifiers achieved an AUC of 0.793 for detecting PNI in a colon cancer validation set. Preoperative prediction of PNI may be important for formulating surgical plans and postoperative management. In addition to preoperative prediction, AI can play an essential role in predicting postoperative complications. Anastomotic leakage (AL) is a common postoperative complication of CRC surgery, and some AI models have been developed to predict the occurrence of AL. A research team 42,43 used SVM and composite kernels to predict AL from preoperative electronic health records, with a high AUC of 0.92. Besides, a study 44 proposed ML models based on clinical data to predict AL in postoperative CRC patients. In internal validation, the LASSO-based model achieved the highest AUC of 0.690 among all algorithms. A study 45 used machine learning models, including logistic regression, SVM, decision trees, RF, and ANN, to predict low anterior resection syndrome following CRC resection. These authors concluded that logistic regression is the most practical since it has a high sensitivity of 0.911 and can be used as a screening tool for low anterior resection syndrome. However, the results were not verified on an external validation set. Another study 46 used DL frameworks based on the Gated Recurrent Unit with Decay to predict postoperative wound and organ space infection in real time. The models can complete a bedside risk assessment with dynamic and static clinical variables, achieving an AUC of 0.68 in predicting wound infection and 0.78 in predicting organ space infection.
Despite the low AUC, this novel tool also has the capacity to assist surgeons in making timely adjustments to treatment regimens. Artificial intelligence can use preoperative data to predict postoperative complications, which is essential for the management of CRC surgery.
Of note, the applications of AI in surgery could also be expressed in robotics. However, the use of AI in surgery, especially AI algorithms, is currently rarely associated with surgical technology. Therefore, we only reviewed the studies investigating AI applications in managing surgical complications.
Personalized Treatment
With the growing need for precision medicine, providing personalized treatment for CRC patients has become a challenge for oncologists and surgeons. However, the treatment of CRC is a complicated decision-making process that relies on diverse factors, including treatment guidelines, the condition of patients, and the physicians themselves. Clinical Decision Support Systems (CDSSs) based on AI technologies are considered as potential approaches to help solve the challenge. Based on clinical medical information, CDSSs can support physicians in their demand for making treatment decisions, providing personalized treatment, minimizing medical errors, and improving the quality of care. 75
As shown in Table 1, some AI-based CDSSs have been proposed for precision medicine. A previous study 47 developed a DL-based CDSS to enable the personalized selection of adjuvant treatment in CRC patients. The CDSS can stratify CRC patients into 3 risk groups based on HE-stained tissue sections and pathological staging markers. Thus, adjuvant chemotherapy could be avoided in patients stratified as low risk to reduce morbidity, mortality, and costs. Another study 48 developed a multistain deep learning model to evaluate AImmunoscore for CRC patients. AImmunoscore is a parameter based on multiple immunohistopathological images of various immune cell subtypes. The results showed that AImmunoscore is an independent prognostic factor for CRC and can predict treatment response to neoadjuvant therapy. In clinical practice, AImmunoscore can provide clinicians with additional rationale for performing neoadjuvant chemotherapy. AImmunoscore could be considered a decision tool for clinicians to promote precision medicine.
Besides, some other CDSSs, such as IBM(R) Watson for Oncology, were also introduced into clinical practice to provide personalized treatment recommendations for CRC. Their effectiveness has also been validated.76-79 Generally, the evaluation criterion in most validation studies is the consistency between AI and human experts. The results suggest that the best setting for using AI-based CDSS is probably in centers with limited expert CRC resources. Besides, AI-given regimens inconsistent with expert regimens are not necessarily incorrect options that physicians have considered and rejected. These may be regimens that physicians have not considered. AI-based CDSS may provide additional regimens with evidence, prompting physicians to query evidence and improve decision-making accuracy. These AI-based CDSS can considerably assist physicians in making personalized medical decisions for CRC patients.
Prognosis Evaluation
Previous studies have demonstrated the promising performance of AI in CRC prognosis evaluation, including the prediction of recurrence, metastasis, and survival (Table 2).
Table 2. Artificial intelligence in CRC prognosis evaluation.
Abbreviations: AUC, area under the curve; CI, confidence interval; CNN, convolution neural network; CRC, colorectal cancer; CT, computerized tomography; DFS, disease-free survival; HR, hazard ratio; HE, hematoxylin-eosin; LASSO, least absolute shrinkage and selection operator; MTR, mucus-tumor ratio; NA, not applicable; OS, overall survival; PFS, progression-free survival; RF, random forests; RFS, recurrence-free survival; SVM, support vector machines; TSR, tumor-stroma ratio; TTNT, the time to next treatment; DACHS, Darmkrebs: Chancen der Verhütung durch Screening; PET, positron emission tomography; RWE, real-world evidence; TRIBE2, Triplet plus Bevacizumab 2; TCGA, The Cancer Genome Atlas.
AI in recurrence prediction
Tumor recurrence after surgical resection is correlated with poor prognosis. In stage II and III CRC patients, 5-year cumulative local recurrence rates after surgery were 11.0% and 23.5%, respectively. 95 A previous study 80 proposed radiomics models based on 3 different ML algorithms to predict recurrence in patients with stage II and III CRC and compared the performance of the algorithms. Radiomics features for model construction were extracted from the preoperative CT image of the tumor. Among the 3 tested algorithms, multivariate regression (MR) and RF achieved better predictive performance than SVM. Overall, the MR-based model combining radiomics features with clinicopathological factors achieved the highest balanced accuracy of 0.78 and a Matthews correlation coefficient (MCC) of 0.6. Machine learning approaches can also construct models that predict the recurrence of CRC from T stage, KRAS mutations, and other clinicopathological factors alone. Based on clinicopathological factors such as KRAS mutation, researchers developed a score derived from the bootstrap method and multivariable mixed-effects logistic regression. The scoring model achieved an AUC of 0.693 in predicting 1-year disease-free survival (DFS) in patients with CRLM after hepatectomy. 81 Another group of researchers 82 tested the performance of 4 ML algorithms for predicting postoperative recurrence in patients with stage IV CRC. Based on several clinicopathological factors, GradientBoosting achieved the highest AUC of 0.761 among logistic regression, decision tree, GradientBoosting, and Light Gradient Boosting Machine. A study 88 used DL to construct a CT-based model capable of evaluating the prognosis of patients with CRC. In that research, an end-to-end multi-size convolutional neural network (MSCNN) was developed to evaluate DFS, and Kaplan-Meier analysis showed that the CT signature can predict DFS (P < 0.001).
These studies also demonstrated that AI-based models may be better than traditional models in predicting tumor recurrence.
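Balanced accuracy and the Matthews correlation coefficient (MCC), reported for the recurrence model above, are both computed from the four cells of a confusion matrix and are less sensitive to class imbalance than plain accuracy. The sketch below illustrates the two formulas. The confusion matrix counts are hypothetical and do not come from any cited study.

```python
import math


def balanced_accuracy(tp, fp, tn, fn):
    """Mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def matthews_corrcoef(tp, fp, tn, fn):
    """MCC: correlation between predicted and true binary labels, in [-1, 1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # conventional value when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical cohort: 20 patients with recurrence, 80 without (illustrative only).
tp, fp, tn, fn = 15, 10, 70, 5

accuracy = (tp + tn) / (tp + fp + tn + fn)
bal_acc = balanced_accuracy(tp, fp, tn, fn)
mcc = matthews_corrcoef(tp, fp, tn, fn)
print(f"accuracy = {accuracy:.3f}, balanced accuracy = {bal_acc:.4f}, MCC = {mcc:.3f}")
```

In this toy cohort, plain accuracy (0.85) looks better than balanced accuracy (0.8125) because the non-recurrence class dominates, and the MCC of about 0.58 summarizes all four cells in a single value.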
AI in metastasis prediction
Tumor metastasis in CRC patients is considered to be associated with a high risk of death. 96 About 18% to 25% of patients without distant metastases at the time of primary diagnosis will develop distant metastases within 5 years. 97 Therefore, early prediction of distant metastases from CRC is valuable for prognosis prediction. The liver is the most frequent site of CRC metastasis, and liver metastasis occurs in approximately 50% of cases during the course of the disease. 96 A study 83 tested the performance of models based on different ML algorithms to predict liver metastases in T1 stage CRC patients at primary diagnosis. With all 7 common ML algorithms, clinicopathological factors such as tumor information from the training set could significantly predict liver metastasis. To achieve the highest performance, the researchers optimized and integrated the 7 models using the bootstrap aggregating algorithm and stacked regression to obtain a stacking bagging model. The stacking bagging model yielded the highest AUC of 0.9631, which provides reliable evidence for prognosis prediction at primary diagnosis.
AI in survival prediction
Some researchers have concentrated on the advances of AI in predicting the prognosis of CRC patients receiving chemotherapy. Chemotherapy may bring risks or benefits to patients with CRC. A previous study 84 constructed an ML model based on pretreatment clinicopathological data to predict mortality within 30 days after cancer chemotherapy. This model used the gradient-boosted trees algorithm and was validated on external data sets. For CRC patients, the model achieved an AUC of 0.924 in predicting 30-day mortality, which could help avoid unnecessary chemotherapy that may pose a high risk. Another study 85 developed an ML-derived molecular signature named FOLFOXai for predicting treatment efficacy in metastatic CRC patients who received oxaliplatin-containing chemotherapy. Validation on independent data sets demonstrated that FOLFOXai can directly predict treatment efficacy, as characterized by progression-free survival, time to next treatment (TTNT), and overall survival (OS). For the treatment selection of CRC patients, ML models based on clinical and molecular data can help balance the benefits and risks of chemotherapy through prognostic prediction. For CRC patients undergoing surgical resection, ML approaches can also be used to predict prognosis. A previous study 86 developed a radiomics model to predict postoperative prognosis for patients with CRC. Researchers employed ML algorithms to select PET/CT radiomic features and clinical features and constructed random survival forest (RSF) models. The best RSF model achieved a C-index of 0.820 in predicting the prognosis of patients with stage III colon cancer, which is essential for therapeutic strategy selection. Another study 89 proposed a fusion model based on radiomics and deep convolutional neural networks (DCNN) to evaluate the prognosis of stage II CRC patients. Radiomics and DCNN features were obtained from CT of the primary tumors and peripheral lymph nodes to construct the model. The model achieved an AUC of 0.76 ± 0.08 in predicting DFS and 0.91 ± 0.05 in predicting OS. These studies have demonstrated the potential of CT-based AI models for prognostic prediction.
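The C-index reported for the RSF model measures how often a model ranks patients correctly by risk when survival times are censored. A minimal sketch of Harrell's C-index is given below; it assumes a higher risk score means shorter expected survival, ignores tied survival times, and uses invented data, so it is an illustration rather than the evaluation code of any cited study.

```python
def concordance_index(times, events, risks):
    """Harrell's C: fraction of comparable pairs ranked correctly by risk.

    times  -- follow-up time per patient
    events -- 1 if the event (e.g. death) was observed, 0 if censored
    risks  -- model risk score (higher = expected to fail sooner)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical cohort: the third patient is censored at 15 months.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.6, 0.7, 0.2]
print(concordance_index(times, events, risks))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the same interpretation as AUC for binary outcomes.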
In addition, AI has been applied to histopathological images for prognostic evaluation. Artificial intelligence can automatically quantify indicators of survival in patients with CRC from histopathology images. A multicenter study 90 validated the performance of a CNN model based on HE-stained tumor tissue sections to predict the prognosis of CRC patients. This model assessed prognosis by extracting microenvironment biomarkers from histopathological images. On an external validation data set, the “deep stroma score” based on the DL model was an independent prognosticator for CRC patients and was highly correlated with OS, disease-specific survival (DSS), and relapse-free survival (RFS). Another study 91 developed a CNN model based on HE-stained histopathological images to distinguish poor from good prognosis in CRC patients who received capecitabine treatment. On a large validation cohort, the prognosticator selected by the CNN had a sensitivity of 52% and a specificity of 78% in predicting 3-year DSS. In addition, integrating biomarkers selected by DL algorithms with clinicopathologic factors may achieve better performance in prognostic prediction. 98 A study 92 validated the performance of a CNN model that automatically evaluates the tumor-stroma ratio (TSR) from HE-stained tumor sections. In the validation cohort, high TSR based on the tested CNN model was correlated with increased OS (P < 0.004). In another study, 93 the CNN-quantified mucus proportion was also shown to be correlated with the prognosis of colorectal mucinous adenocarcinoma patients (P < 0.008). Another innovative study 94 focused on classifying consensus molecular subtypes (CMSs) based on histopathology images of colorectal tumors. Derived from molecular classification, CMSs comprise 4 robust subtypes at the gene-expression level and are associated with the prognosis of CRC patients. 99 Researchers developed neural networks to classify CMSs on histopathology images instead of using high-cost bulk transcriptomics.94,100 The DL model achieved an AUC of 0.85 on an external validation set, which significantly facilitates the clinical application of CMSs in prognosis classification. These DL models are important for the prognostic stratification and treatment selection of CRC patients.
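Once a CNN has assigned a tissue class to each image tile, a quantity such as the stroma proportion can be derived by simple counting. The sketch below assumes a hypothetical list of per-tile labels and takes the stroma fraction as stroma tiles divided by tumor plus stroma tiles. This definition is an assumption made for illustration; the cited studies may define and threshold TSR differently.

```python
from collections import Counter

def stroma_fraction(tile_labels):
    """Share of stroma among tumor and stroma tiles (other classes ignored).

    tile_labels -- iterable of per-tile class names predicted by a classifier,
                   e.g. "tumor", "stroma", "mucus" (hypothetical label set).
    """
    counts = Counter(tile_labels)
    tumor, stroma = counts["tumor"], counts["stroma"]
    if tumor + stroma == 0:
        raise ValueError("no tumor or stroma tiles found")
    return stroma / (tumor + stroma)

# Hypothetical slide: 6 tumor tiles, 4 stroma tiles, 5 mucus tiles.
labels = ["tumor"] * 6 + ["stroma"] * 4 + ["mucus"] * 5
print(stroma_fraction(labels))
```

The same counting approach would give a mucus proportion by changing which classes enter the numerator and denominator.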
In addition to traditional single-omics data, multi-omics data, such as genetic and tissue microbial data, have recently also performed well in predicting the survival of CRC patients. 87 A study 87 identified tissue bacterial biomarkers that predict the survival of CRC patients and developed a microbiome-based ML model. This model showed better predictive performance than models based on mRNA or miRNA data. Recent studies 101 have revealed the mechanisms behind the relationship between microbes and tumor progression. With ML approaches, microbiome and other multi-omics information provide reliable new directions for prognosis prediction in CRC patients. These studies also demonstrate the great potential of AI approaches for predicting prognosis in CRC patients.
Ongoing Clinical Trials
Table 3 lists the ongoing clinical trials utilizing AI in the treatment of CRC. Most clinical trials focus on predicting the treatment response to neoadjuvant chemotherapy. In addition, some clinical trials are using AI to predict the occurrence of complications after surgery. Other areas of focus are prognostic prediction and targeted therapy response prediction.
Table 3. Summary of the ongoing clinical trials of artificial intelligence applications in colorectal cancer treatment.
Abbreviations: CNN, convolutional neural network; CRC, colorectal cancer; CT, computed tomography; DCE, dynamic contrast-enhanced; DL, deep learning; ML, machine learning; MRI, magnetic resonance imaging; NA, not applicable; PANIC, Prediction of Anastomotic Insufficiency Risk After Colorectal Surgery; PET, positron emission tomography; RPAI-TRG, RadioPathomics Artificial Intelligence Model to Predict Tumor Regression Grading; VAMIS, Video Analysis in Minimally Invasive Surgery.
Discussion
AI plays an essential role in computer science and is emerging as an important approach in medical research. Although integrating AI into daily patient management still faces numerous practical issues, its performance so far is encouraging. Artificial intelligence can predict treatment outcomes, which may help clinicians select treatment strategies in CRC. For conventional therapies such as chemotherapy, surgery, and endoscopic resection, AI can help clinicians predict treatment effectiveness, toxicity, and complications. In addition, AI technology can directly assist surgeons in completing endoscopic resection. For novel therapies such as immunotherapy and targeted therapies, AI enables screening of appropriate populations for treatment and reduces resource consumption. For CRC, AI techniques have also made great strides in prognosis prediction, including recurrence, metastasis, and survival prediction.
In the era of precision medicine, personalized treatment selection has become a new requirement for cancer care. By supporting the selection of treatment strategies, AI may make personalized cancer treatment a reality. Previous studies have proposed some CDSSs that make direct treatment recommendations to promote precision medicine. However, some of the included CDSSs were not validated on diverse patient cohorts. Large retrospective studies with robust external validation are necessary to further integrate AI into clinical practice. Moreover, the evaluation criterion in most validation studies is the consistency between AI-based CDSSs and human experts. The good performance of CDSSs by this criterion indicates that the best setting for using CDSSs may be centers with limited expert resources. In addition, in some previous validation studies, 76,79 AI-based CDSSs suggested treatment recommendations that experts considered outdated and inappropriate. Almost all of the studies 76,77,79 considered these divergences as negative. However, we should recognize that AI may occasionally provide better advice. Since CDSSs are not yet commonly used in clinical practice, no studies have investigated this issue. After CDSSs are applied in clinical practice, future studies could be designed to directly compare the advantages and disadvantages of the treatment decisions given by the CDSS with those given by experts. In addition, the application scenarios of CDSSs in CRC are currently relatively limited. Artificial intelligence models could be applied to augment the decision to operate, to identify and mitigate modifiable risk factors, and to predict and manage complications. 102 They also have the potential to screen the appropriate populations for immunotherapy and targeted therapy. Future applications of AI-based CDSSs could greatly help surgeons and physicians make treatment decisions for CRC in clinical practice.
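Consistency between a CDSS and human experts is typically summarized as a concordance rate, and a chance-corrected statistic such as Cohen's kappa can be reported alongside it. The sketch below shows both on invented recommendations; the treatment labels and function names are hypothetical, and the included validation studies may have used other agreement measures.

```python
from collections import Counter

def concordance_rate(cdss, experts):
    """Fraction of cases where the CDSS and the experts recommend the same option."""
    return sum(a == b for a, b in zip(cdss, experts)) / len(cdss)

def cohens_kappa(cdss, experts):
    """Agreement corrected for the agreement expected by chance."""
    n = len(cdss)
    p_observed = concordance_rate(cdss, experts)
    c1, c2 = Counter(cdss), Counter(experts)
    # Chance agreement: product of marginal frequencies, summed over categories.
    p_expected = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)
    if p_expected == 1:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical recommendations for 8 patients.
cdss = ["chemo", "chemo", "surgery", "surgery", "chemo", "surgery", "chemo", "chemo"]
experts = ["chemo", "chemo", "surgery", "chemo", "chemo", "surgery", "surgery", "chemo"]
print(concordance_rate(cdss, experts), round(cohens_kappa(cdss, experts), 3))
```

In this toy example the raw concordance is 75%, yet kappa is below 0.5, because two raters who both favor chemotherapy would agree often by chance alone. Neither number indicates which recommendation was clinically better in the discordant cases.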
AI has surpassed traditional models in some fields, and there is still room for further advances. Prognostic models for cancer patients are based on multimodal data, including electronic medical records, molecular data, radiological data, and digital pathology data. 12 Previous studies have demonstrated that ML can theoretically automate the processing and merging of these multimodal data for cancer treatment. 12 Nevertheless, for high-dimensional data such as omics information, DL-based multimodal approaches allow for better integration. 14 This partly shows the advantages of DL techniques for the personalized prediction of cancer treatments. In addition, for medical image processing and analysis, DL can achieve better performance than conventional ML. 103 Recently, some novel DL architectures, such as the transformer, have shown better performance in image recognition. Transformers can potentially process medical images for more accurate treatment outcome prediction in CRC patients. 104 Advances in algorithms could facilitate cancer treatment outcome prediction.
The AI models included in this review also have some limitations. Before AI can be introduced into clinical practice, it must be critically evaluated to ensure the validity and safety of the model. First, a key step in the evaluation process is validating model performance on an external validation data set, which is an important part of establishing trust in AI algorithms. Notably, a proportion of the included studies did not perform external validation, so some of the reported models may be overfitted and may not generalize. Second, we reported metrics such as AUC, recall, and precision to partially represent the reliability of AI models in the included studies. Some of the studies selected models by evaluating discrimination, which can be measured using AUC. However, AUC only measures ranking and does not address the calibration of the models. Calibration refers to the agreement between the estimated and the “true” risk of an outcome, which is different from discrimination.105,106 Calibration is important for the evaluation of AI models for clinical prediction. 107 Of note, only a small number of the studies included in this review reported calibration. Therefore, the performance of AI models for which calibration was not reported should be interpreted more cautiously. Third, the small sample sizes of some included studies limit the robustness of their results.
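The difference between discrimination and calibration can be shown with a small numerical sketch. Below, a set of predicted risks is halved: the ranking of patients, and therefore the AUC, is unchanged, while the mean predicted risk drifts away from the observed event rate and the Brier score worsens. The data and function names are invented for illustration, and mean calibration and the Brier score are only two of several possible calibration summaries.

```python
def auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def brier_score(y_true, probs):
    """Mean squared difference between predicted risk and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)

def mean_calibration_gap(y_true, probs):
    """Mean predicted risk minus observed event rate (0 = calibrated on average)."""
    return sum(probs) / len(probs) - sum(y_true) / len(y_true)

# Hypothetical outcomes and predicted risks for 8 patients.
y = [0, 0, 0, 1, 0, 1, 1, 1]
probs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 0.9]
halved = [p / 2 for p in probs]  # same ranking, systematically too low

for name, p in (("original", probs), ("halved", halved)):
    print(name, auc(y, p), round(brier_score(y, p), 4), round(mean_calibration_gap(y, p), 4))
```

A model selected on AUC alone could therefore systematically under- or overestimate absolute risk, which matters when predicted probabilities are used to decide whether a treatment is worth its risk.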
Several issues still need to be addressed for further application of AI models in CRC treatment. First, the AI models currently being developed are “weak AIs” that do not perform tasks in the same way as humans. “Weak AIs” are merely driven by the designer’s program and achieve satisfying results in actual problem-solving. 108 These models play only a supplementary role in clinical practice, and any eventual clinical decisions must be made by physicians. However, “strong AI” with consciousness and intentionality may be proposed for assisting clinical decision making in the future. The problems associated with the use of “strong AI” in clinical practice, such as validation and ethical issues, should be considered in advance. Second, AI involves many ethical issues, including data privacy disclosure, patient consent, and the risk of decision errors. AI models are based on patients’ private and sensitive information, such as identity, health, diagnosis, and treatment information, so data security is at risk. For patients, collecting their information without informed consent infringes on their rights and interests, for example, if the information is stolen or misused. Researchers have developed approaches such as the k-anonymity privacy protection algorithm to address these issues. 109 However, the application of these methods is limited by time and cost. In addition, the problem of attributing responsibility for medical decision errors caused by AI remains unresolved. Artificial intelligence has not yet been applied to clinical practice on a large scale, partly because liability for medical malpractice involving AI is still controversial. Medical malpractice caused by AI has not yet been legally defined and remains a thorny issue. Despite these difficulties, we still believe that AI approaches could play a greater part in medical prediction and classification.
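The k-anonymity property mentioned above requires that every combination of quasi-identifier values (attributes that could re-identify a patient when combined) is shared by at least k records. A minimal sketch of the check, together with one simple generalization step, is shown below. The records, field names, and the age-banding helper are hypothetical and do not reproduce the algorithm of the cited work.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True when each quasi-identifier combination occurs in at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

def generalize_age(record, width=10):
    """Replace an exact age with a coarse band, e.g. 63 -> '60-69'."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}

# Hypothetical patient table; "stage" is the sensitive attribute.
patients = [
    {"age": 63, "sex": "F", "stage": "III"},
    {"age": 67, "sex": "F", "stage": "II"},
    {"age": 52, "sex": "M", "stage": "II"},
    {"age": 58, "sex": "M", "stage": "IV"},
]
banded = [generalize_age(p) for p in patients]
print(is_k_anonymous(patients, ["age", "sex"], 2), is_k_anonymous(banded, ["age", "sex"], 2))
```

Generalization trades data precision for privacy: coarser bands make re-identification harder but may remove detail that a prediction model would otherwise use.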
Limitations
This narrative review provides a relatively comprehensive overview of artificial intelligence in CRC treatment. However, this review also has some limitations. First, we have not performed a systematic review. Because the AI field advances rapidly, some older studies were not appropriate for review, and we included only studies that we considered representative of the field. We are aware that some studies were excluded because they lacked good results or sufficient supporting data. Second, we summarized only key information, including AUC, external validation, and calibration, to report the limitations of the included studies. More comprehensive information would be required to evaluate the quality of the included studies.
Footnotes
Funding:
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests:
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Author Contributions
Conception and design: JQY, XLM and DQH. Manuscript writing and editing: JQY and JH. Manuscript reviewing: JQY and XLM. Figure creation and chart drawing: JQY.
