Abstract
Background
Ureteral injury (UI) is a rare but devastating complication during colorectal surgery. Ureteral stents may reduce UI but carry risks themselves. Risk predictors for UI could help target the use of stents, but previous efforts have relied on logistic regression (LR), shown moderate accuracy, and used intraoperative variables. We sought to use an emerging approach in predictive analytics, machine learning, to create a model for UI.
Methods
Patients who underwent colorectal surgery were identified in the National Surgical Quality Improvement Program (NSQIP) database. Patients were split into training, validation, and test sets. The primary outcome was UI. Three machine learning approaches were tested, namely random forest (RF), gradient boosting (XGB), and neural networks (NN), and compared with traditional LR. Model performance was assessed using area under the receiver operating characteristic curve (AUROC).
Results
The data set included 262,923 patients, of whom 1519 (.578%) experienced UI. Of the modeling techniques, XGB performed the best, with an AUROC score of .774 (95% CI .742-.807) compared with .698 (95% CI .664-.733) for LR. Random forest and NN performed similarly with scores of .738 and .763, respectively. Type of procedure, work RVUs, indication for surgery, and mechanical bowel prep showed the strongest influence on model predictions.
Conclusions
Machine learning-based models significantly outperformed LR and previous models and showed high accuracy in predicting UI during colorectal surgery. With proper validation, they could be used to support decision making regarding the placement of ureteral stents preoperatively.
Key Takeaways
• Machine learning-based models significantly outperformed logistic regression in the prediction of ureteral injury during colorectal surgery.
• With proper validation, these machine learning models could augment surgeon judgment by identifying patients at highest risk of UI for consideration of ureteral stent placement.
Introduction
Ureteral injury (UI) is a rare but devastating complication of colorectal surgery, occurring in .3-.6% of cases.1,2 In addition to the injury, UI is associated with higher morbidity, mortality, length of stay, and hospital charges. 1 Identifying the ureter is a key step in most colorectal operations, and ureteral stents may help surgeons rapidly and safely identify the ureter in cases of increased complexity. Ureteral stent placement may be associated with a decreased rate of UI, but stent placement itself carries risks, including hematuria, urinary tract infection, acute kidney injury, and stent migration.3,4
A predictive model to identify patients at highest risk of UI would be helpful in selecting patients who would benefit most from ureteral stents. Previous efforts have produced predictive models based on logistic regression (LR) for this outcome, but these showed limited accuracy and required intraoperative information.1 This limits the usefulness of these models, as ureteral stent placement occurs prior to surgery, when intraoperative information is not yet known. Machine learning is an emerging technique within data science that uses computational methods to find nonlinear patterns in large data sets. It has been successfully applied to the prediction of procedure-specific outcomes, such as pancreatic fistula and complexity of abdominal wall reconstruction.5-7
We sought to use machine learning to predict UI during colorectal surgery and create a tool which could be used to guide ureteral stent placement. Our hypothesis was that machine learning methods applied to multi-institutional data from the National Surgical Quality Improvement Program (NSQIP) database would outperform previous approaches and result in a more accurate model.
Methods
Data
Study exemption and waiver of informed consent were obtained from the University of North Carolina Institutional Review Board. We used data from the NSQIP database, including the proctectomy and colectomy procedure-targeted data sets. NSQIP collects data from more than 700 participating hospitals using trained surgical reviewers, with reliability ensured through rigorous training and reliability audits.8 We used all available years, which included 2012 to 2019 for colectomy and 2016 to 2019 for proctectomy. Data from 2019 were held out as an external test set, as it was the most recent year available and would most closely resemble current patients. Data from 2012 to 2018 were split into training and validation sets in an 80/20 ratio with 5-fold cross-validation. These data were used for model training, adjusting model settings (hyperparameters), and ensuring model generalizability prior to evaluation on the test set. This work is reported in accordance with the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) checklist.
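The splitting scheme described above can be illustrated with a minimal sketch. It assumes each patient record carries an operation year and a binary UI label; the field names `year` and `ui`, the synthetic records, and the use of stratified folds are illustrative assumptions and are not taken from the study code. With 5 folds, each validation fold holds roughly 20% of the development data, which corresponds to the 80/20 ratio.

```python
import random

def temporal_split(records, test_year=2019):
    """Hold out one operation year as the test set; earlier years form the development set."""
    dev = [r for r in records if r["year"] < test_year]
    test = [r for r in records if r["year"] == test_year]
    return dev, test

def stratified_k_folds(records, k=5, seed=0):
    """Yield (train, validation) pairs in which every fold keeps a similar UI rate."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        group = [r for r in records if r["ui"] == label]
        rng.shuffle(group)
        # Deal cases round-robin so each fold receives a similar share of each class.
        for i, r in enumerate(group):
            folds[i % k].append(r)
    for i in range(k):
        validation = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, validation

# Synthetic stand-in for the registry (hypothetical data; event rate inflated for illustration).
rng = random.Random(42)
records = [{"year": rng.randint(2012, 2019), "ui": int(rng.random() < 0.05)}
           for _ in range(2000)]

dev, test = temporal_split(records)
splits = list(stratified_k_folds(dev))
```

Stratifying by outcome is a design choice made here because rare events can otherwise be distributed unevenly across folds; whether the original analysis stratified its folds is not stated.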
Outcomes and Variables
Included Variables for the Prediction of Ureteral Injury.
Abbreviations: L, left; R, right; SIRS, systemic inflammatory response syndrome; ASA, American Society of Anesthesiologists; SSI, surgical site infection; PATOS, present at time of surgery; RVUs, relative value units.
Modeling
We applied various machine learning approaches including random forest (RF), gradient boosting (XGB), and deep neural networks (NNs). We compared these techniques with LR. Random forest and XGB are based on combining decision trees, while NN processes data through nonlinear functions that are adjusted with training. 9 Hyperparameters were tuned for each algorithm using Bayesian optimization from the scikit-optimize library. This algorithm explores a wide range of hyperparameters and uses previous results to guide a targeted search for the highest performing set. For RF and XGB, hyperparameters included the number of trees, the maximum tree depth, the minimum samples per split, the number of features considered per split, and the subsample of features considered. For NN, hyperparameters included the number of neurons, the number of layers, the percentage of dropout used per layer, and the learning rate.
Evaluation
We evaluated each model using area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPRC). Area under the receiver operating characteristic curve is based on a plot of sensitivity vs 1 – specificity and represents a model’s ability to distinguish positive from negative cases. Area under the receiver operating characteristic curve ranges from .5 to 1, with .5 representing random guessing and 1 representing perfect classification. The DeLong test was used to compare AUROC, with significance set at P < .05.10 Area under the precision-recall curve is based on a plot of sensitivity vs positive predictive value and represents a model’s ability to identify all positive cases without incorrectly flagging negative cases. Area under the precision-recall curve ranges from the rate of the positive class (ie, the rate of UI), which corresponds to random guessing, to 1 (representing perfect sensitivity and positive predictive value). We also calculated sensitivity, specificity, positive predictive value, and negative predictive value. Model performance was also analyzed among patients who had ureteral stents placed. We used Shapley additive explanations (SHAP) to identify which variables had the strongest impact on model predictions.11 LR, RF, and XGB were implemented using scikit-learn and NN using TensorFlow/Keras.12,13 All analyses were performed in Python (version 3.8). Code to reproduce this work is available at https://github.com/gomezlab/colorectal_predictors/tree/main/ureter_injury.
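As an illustration of the two summary metrics, the following dependency-free sketch computes AUROC as the probability that a positive case is scored above a negative case, and AUPRC as the step-wise average precision. The labels and scores are invented for illustration, and the functions are simplified stand-ins, not the routines used in the study.

```python
def auroc(y_true, y_score):
    """Probability that a random positive case outscores a random negative one (ties count 1/2)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(y_true, y_score):
    """Step-wise area under the precision-recall curve, sweeping thresholds from high to low."""
    n_pos = sum(y_true)
    ap, prev_recall = 0.0, 0.0
    for t in sorted(set(y_score), reverse=True):
        flagged = sum(1 for s in y_score if s >= t)
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 1)
        precision, recall = tp / flagged, tp / n_pos
        # Each gain in recall is weighted by the precision achieved at that threshold.
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap

# Invented labels (1 = injury) and predicted risks, for illustration only.
y_true = [0, 0, 1, 0, 1, 0, 0, 1]
y_score = [0.10, 0.40, 0.15, 0.20, 0.80, 0.05, 0.50, 0.70]

roc_area = auroc(y_true, y_score)
pr_area = average_precision(y_true, y_score)
prevalence = sum(y_true) / len(y_true)  # approximate AUPRC expected from random guessing
```

Because the AUPRC baseline equals the event rate, an AUPRC that looks small in absolute terms can still represent a large improvement over chance when the outcome is rare.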
Results
The data set included 262,923 patients. Of these, 213,786 were used for training/validation, while 49,137 were used for testing. A total of 12,229 patients who had prophylactic ureteral stents placed were excluded from analysis. Overall, 1519 patients (.578%) experienced UI. The cohort consisted of 52% female patients, with an average age of 61.6 years. By procedure type, the rate of UI was 1.0% for left-sided colectomies, .7% for low anterior resection, and .2% for right-sided colectomies. By indication, the rate of UI was highest for rectal cancer (1.3%) and chronic diverticular disease (.9%) and lowest for colon cancer (.5%) and non-malignant polyp (.1%). Average work relative value units (RVUs) were 26.8 for procedures with no UI compared with 29.2 for those with UI.
Of the 3 machine learning techniques, XGB performed the best, with an AUROC score of .774 (95% CI .742-.807), compared with .698 (95% CI .664-.733) for LR. Random forest and NN also performed well with AUROC scores of .738 (95% CI .704-.772) and .763 (95% CI .730-.796), respectively. Comparison using the DeLong test showed a significant difference between the AUROCs of XGB and LR with P < .001. Receiver operating characteristic curves are shown in Figure 1.
Receiver operating characteristic curves for models predicting ureteral injury. RF, random forest; XGB, gradient boosting; NN, neural network; LR, logistic regression; AUROC, area under the receiver operating characteristic curve.
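For readers unfamiliar with the DeLong comparison, the sketch below implements the standard two-model form of the test for correlated AUROCs in plain Python. It is a simplified illustration with quadratic running time, suitable only for small samples, and is not the implementation used in this work; the labels and scores are invented.

```python
import math

def _placements(pos, neg):
    """Per-case fraction of opposite-class cases that are ranked correctly (ties count 1/2)."""
    psi = lambda x, y: 1.0 if x > y else 0.5 if x == y else 0.0
    v10 = [sum(psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01

def _cov(a, b):
    """Sample covariance of two equally long lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def delong_test(y_true, scores_a, scores_b):
    """Return (auc_a, auc_b, z, two-sided p) for two models scored on the same cases."""
    comps = []
    for scores in (scores_a, scores_b):
        pos = [s for y, s in zip(y_true, scores) if y == 1]
        neg = [s for y, s in zip(y_true, scores) if y == 0]
        comps.append(_placements(pos, neg))
    (v10_a, v01_a), (v10_b, v01_b) = comps
    m, n = len(v10_a), len(v01_a)
    auc_a, auc_b = sum(v10_a) / m, sum(v10_b) / m
    # Variance of the AUROC difference, accounting for both models sharing the same patients.
    var = ((_cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2 * _cov(v10_a, v10_b)) / m
           + (_cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2 * _cov(v01_a, v01_b)) / n)
    if var <= 0:
        # No estimable variance (for example, identical score vectors).
        return auc_a, auc_b, float("nan"), float("nan")
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return auc_a, auc_b, z, p

# Invented labels and scores from two hypothetical models on the same 10 cases.
y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model_a = [0.90, 0.80, 0.70, 0.40, 0.50, 0.30, 0.20, 0.10, 0.15, 0.05]
model_b = [0.60, 0.30, 0.70, 0.20, 0.50, 0.40, 0.10, 0.35, 0.25, 0.65]

auc_a, auc_b, z, p = delong_test(y, model_a, model_b)
```

The covariance terms are what distinguish this test from comparing two independent AUROCs: both models are evaluated on the same patients, so their errors are correlated.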
Area Under the Receiver Operating Characteristic Curve and Precision Recall Curve for Models Predicting Ureteral Injury.
Abbreviations: AUROC, area under the receiver operating characteristic curve; CI, confidence interval; AUPRC, area under the precision-recall curve.

Precision-recall curves for models predicting ureteral injury. RF, random forest; XGB, gradient boosting; NN, neural network; LR, logistic regression; AUPRC, area under the precision-recall curve.
Sensitivity, Specificity, Positive Predictive Value, and Negative Predictive Value for Gradient Boosting Models Predicting Ureteral Injury.
We also analyzed model performance for patients who had stents placed. Among patients who had ureteral stents placed, the rate of UI was .883%. For the XGB model, AUROC was .613 and AUPRC was .017. However, the average prediction for the XGB model was .007 for patients with stents compared with .003 for those without stents, indicating a higher average predicted risk among patients who received stents.
Using SHAP values, we identified which variables had the strongest influence as predictors in the XGB model. The type of procedure, complexity of procedure (assessed using work relative value units), indication for surgery, mechanical bowel prep, wound classification, emergency procedure, and operative approach had the strongest influence on model decision making (Figure 3).
Importance of variables in predicting ureteral injury for the gradient boosting model. RVUs, relative value units.
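The idea behind Shapley-based attribution can be shown with a toy example that enumerates every feature ordering and averages each feature's marginal contribution to the prediction. The risk function, feature names, and coefficients below are hypothetical and bear no relation to the fitted XGB model; exact enumeration is only feasible for a handful of features, and practical tools rely on faster model-specific or approximate algorithms.

```python
from itertools import permutations

def shapley_values(predict, instance, baseline):
    """Exact Shapley attribution: average marginal contribution of each feature over all orderings."""
    features = list(instance)
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for f in order:
            # Switch one feature from its baseline value to the patient's value.
            current[f] = instance[f]
            updated = predict(current)
            totals[f] += updated - previous
            previous = updated
    return {f: total / len(orderings) for f, total in totals.items()}

def toy_risk(x):
    """Hypothetical risk score with an interaction term (NOT the study's model)."""
    return (0.002 + 0.004 * x["left_sided"] + 0.003 * x["emergency"]
            + 0.002 * x["left_sided"] * x["emergency"])

baseline = {"left_sided": 0, "emergency": 0, "bowel_prep": 0}
patient = {"left_sided": 1, "emergency": 1, "bowel_prep": 1}

phi = shapley_values(toy_risk, patient, baseline)
```

In this toy case the interaction term is shared equally between the two interacting features, the unused feature receives zero attribution, and the attributions sum to the difference between the patient's prediction and the baseline prediction.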
Discussion
This study sought to create and validate machine learning-based models to predict UI during colorectal surgery and showed significant improvements in predictive ability compared with LR, as well as high accuracy in the test set. Analysis of model decision making showed that the type of procedure, procedure complexity, indication for surgery, and mechanical bowel prep were the most important variables. Model accuracy was much lower for patients who had ureteral stents placed, suggesting that ureteral stent placement changes the risk profile for UI, but the model agreed with surgeon assessment that these patients were at overall higher risk.
Multiple previous studies have identified risk factors for UI during colorectal surgery.14-16 The largest of these, an analysis using NSQIP from 2012 to 2014, found that UI was associated with diverticulitis, T4 malignancy, and open approach.2 Additionally, a study using the Danish Colorectal Database from 2005 to 2011 found that laparoscopic approach and surgery for rectal cancer were associated with a higher risk of UI.17 Our study contradicted this study’s findings regarding operative approach, perhaps due to increased laparoscopic experience during the time period included in the NSQIP database. One previous study built a predictive model for UI during colorectal surgery using the National Inpatient Sample from 2001 to 2010 and showed an AUROC of .73.1 However, this model required intraoperative information, including the presence of adhesions, limiting its use for preoperative decision making and inflating its accuracy.
One interesting finding from our study was the model’s use of work RVUs as a risk factor for UI. While work RVUs are mainly used for financial compensation, they are also a measure of the complexity of operations. Work RVUs have been strongly associated with risk of postoperative complications in multiple previous studies.18 In addition, other factors such as mechanical and oral antibiotic bowel prep are unlikely to have a direct influence on UI but are indirectly associated with decreased risk of injury and therefore have predictive value. Overall, there is a strong overlap between previously identified risk factors for UI and factors used by the current model. This suggests that one strength of machine learning approaches when applied to clinical risk prediction lies in improved interpretation of known risk factors and their combinatory interactions rather than identification of novel ones.
Our model adds to the growing literature showing the potential for machine learning-based models to assist with preoperative decision making through risk prediction. Machine learning has shown high accuracy in predicting general postoperative outcomes across a variety of procedures and data sets. 5 More recently, it has been successfully applied to procedure and disease-specific outcomes including complexity of abdominal wall reconstruction and response of rectal cancer to neo-adjuvant therapy.7,19 The current models similarly show the potential of using machine learning to incorporate information from large data sets and provide insights traditionally dependent on a surgeon’s clinical judgment. If these models were integrated into the electronic medical record as decision-support tools, they could be used to automatically flag patients at the highest risk of UI for the consideration of stent placement. However, decision making would still be dependent on surgeons, who would choose the threshold at which they believe the benefits of stent placement outweigh the risks.
The study of iatrogenic UI is particularly difficult because it is so rare. This results in important limitations for the current study. The most significant is its method of identifying UI (with the use of CPT codes) within the NSQIP database. As noted by previous authors, this approach can incorrectly identify intentional concurrent resections of the ureter, although these cases are likely rare.2,20 Additionally, it does not identify ureteral injuries that are not recognized intraoperatively and do not result in a return to the operating room. Second, the models show low AUPRC and positive predictive value, which is expected given the rare incidence of UI and the importance of the operation itself in causing injuries. Third, validation of the model on data external to NSQIP is difficult due to the low incidence of UI, and models trained on the national NSQIP data set may not perform well at individual institutions. Fourth, our data set does not include other preoperative information that may be helpful in predicting UI, such as a history of abdominal surgery or preoperative CT imaging. Finally, more accurate models specific to procedure type and indication are possible. However, we chose to balance model accuracy and generalizability by developing a model specific to colorectal surgery but useful for all colorectal resections.
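The dependence of positive predictive value on event rate noted above can be made concrete with a short calculation. The sketch applies Bayes' rule using the overall UI rate reported in this study (.578%); the sensitivity and specificity of 70% are hypothetical values chosen only for illustration and are not the operating characteristics of any model reported here.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) implied by sensitivity, specificity, and prevalence."""
    tp = sensitivity * prevalence              # true positives per patient screened
    fn = (1 - sensitivity) * prevalence        # missed injuries
    fp = (1 - specificity) * (1 - prevalence)  # false alarms
    tn = specificity * (1 - prevalence)        # correctly cleared patients
    return tp / (tp + fp), tn / (tn + fn)

# Hypothetical operating point (70% / 70%) at the reported UI rate of .578%.
ppv, npv = predictive_values(sensitivity=0.70, specificity=0.70, prevalence=0.00578)

# Same hypothetical operating point at a tenfold higher event rate, for comparison.
ppv_common, _ = predictive_values(sensitivity=0.70, specificity=0.70, prevalence=0.0578)
```

Under these assumed values, only a little over 1 in 100 flagged patients would go on to have an injury, while the negative predictive value stays above 99%. This is why a model for a rare outcome is better suited to prompting consideration of stent placement than to making the decision on its own.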
Conclusion
In conclusion, this study shows that machine learning-based models can significantly outperform LR in the prediction of UI during colorectal surgery. Validation of these models will require assembly of a large, multi-institutional database in order to capture a sufficient number of events. However, with proper validation, these machine learning models could augment surgeon judgment by identifying patients at highest risk of UI for consideration of ureteral stent placement.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by funding from the National Institutes of Health (Program in Translational Medicine T32-CA244125 to UNC/KAC).
