Abstract
Background:
Radical cystectomy for bladder cancer has one of the highest morbidity rates among urologic surgeries, but the ability to predict postoperative complications remains poor. Our study objective was to create machine learning models to predict complications, as well as factors leading to extended length of hospital stay and discharge to a higher level of care, after radical cystectomy.
Methods:
Using the American College of Surgeons National Surgical Quality Improvement Program, peri-operative adverse outcome variables for patients undergoing elective radical cystectomy for bladder cancer from 2005 to 2016 were extracted. Variables assessed include occurrence of minor, infectious, serious, or any adverse events, extended length of hospital stay, and discharge to higher-level care. To develop predictive models of radical cystectomy complications, we fit generalized additive model (GAM), least absolute shrinkage and selection operator (LASSO) logistic, neural network, and random forest models to training data using various candidate predictor variables. Each model was evaluated on the test data using receiver operating characteristic curves.
Results:
A total of 7557 patients were identified who met the inclusion criteria, and 2221 complications occurred. LASSO logistic models demonstrated the highest area under curve for predicting any complications (0.63), discharge to a higher level of care (0.75), extended length of stay (0.68), and infectious (0.62) adverse events. This was comparable with random forest in predicting minor (0.60) and serious (0.63) adverse events.
Conclusions:
Our models perform modestly in predicting radical cystectomy complications, highlighting both the complexity of the cystectomy process and the limitations of large healthcare datasets. Identifying the most important variables leading to each type of adverse event may allow for further strategies to model cystectomy complications and target optimization of modifiable variables pre-operatively to reduce postoperative adverse events.
Introduction
Radical cystectomy with urinary diversion for bladder cancer is a highly morbid procedure for which predicting complications is challenging. Depending on the reporting method, complication rates following radical cystectomy can be as high as 50–69%,1–5 even in high-volume centers, 6 and regardless of whether an open or robotic approach is used. 7 Complications can increase length of stay (LOS) by, on average, 4 days, and increase cost by between $10,000 and $30,000 based on readmission status. 8 Because radical cystectomy is one of the most complex and expensive genitourinary cancer surgeries performed, 9 bladder cancer surgeons are tasked with examining ways to lessen the burden on both patients and the health care system.
A number of risk stratification tools have been employed in an attempt to predict surgical peri-operative outcomes and complications after radical cystectomy, each with limited success.10–12 The American College of Surgeons National Surgical Quality Improvement Program surgical risk calculator, which is designed to predict 30-day postoperative outcomes, has been shown to poorly predict complications after radical cystectomy. 13 Other commonly used indices for surgical risk assessment, including the American Society of Anesthesiologists (ASA) classification, the Modified Frailty Index (mFI), and the Charlson Comorbidity Index (CCI), had similarly poor prognostic ability for predicting complications in patients undergoing radical cystectomy. 14 Each of these studies demonstrates the need for better pre-operative risk assessment tools, as well as the challenges in predicting complications from radical cystectomy.
Despite studies investigating methods to more accurately assess risk factors for complications following radical cystectomy, a high-confidence model that can be used in clinical practice for this patient population and procedure is still lacking. Using the American College of Surgeons (ACS) National Surgical Quality Improvement Program (NSQIP) database, our primary study aim was to create models that predict complications in patients undergoing radical cystectomy, with separate models for any, serious, or minor complications, infectious events, extended LOS, and discharge to a higher level of care. As a secondary aim, we report the importance of variables in each of these predictive models. We employ machine learning techniques, which can help organize large datasets and reveal trends that would be difficult to detect otherwise. Machine learning techniques have previously been employed in prostate cancer detection and in predicting postcystectomy recurrence and survival.15,16 By predicting complications, we may both inform the decision to operate and target postoperative care and care provision for these patients. As health care moves to a value-based system that penalizes surgeons and hospitals for complications and readmissions, the inability to predict postoperative radical cystectomy complications remains problematic.
Patients and methods
Data for this study are from the ACS NSQIP database. NSQIP is a large national registry of surgical admissions submitted by over 500 hospitals in the US, which includes data on over 300 demographic, comorbidity, and pre- and postoperative clinical and laboratory variables with high reliability (approximately 98% inter-rater agreement). 17 Postoperative variables are measured with 30-day follow-up.
For this study we selected all patients admitted during 2005–2016 with a diagnosis of bladder cancer, defined by ICD-9 codes 188.0–9, who underwent a cystectomy, defined by Current Procedural Terminology (CPT) codes 51590, 51595, or 51596, with no exclusion criteria.
Variables
The primary outcome was any adverse event during the 30 days following surgery, which included coma greater than 24 h, cardiac arrest requiring cardiopulmonary resuscitation (CPR), deep vein thrombosis (DVT), pulmonary embolism (PE), myocardial infarction (MI), unplanned intubation, unplanned return to operating room (OR), sepsis, stroke/cerebrovascular accident (CVA), acute renal failure, bleeding/transfusions, pneumonia, surgical site infection (SSI), urinary tract infection (UTI), and wound disruption. We then divided our adverse outcomes categories into serious and minor, similar to prior studies evaluating the NSQIP dataset.18,19 Serious adverse events included coma greater than 24 h, cardiac arrest requiring CPR, DVT, PE, MI, unplanned intubation, unplanned return to OR, sepsis, and stroke/CVA. Minor adverse events included acute renal failure, bleeding/transfusions, pneumonia, SSI, UTI, and wound disruption. Infectious adverse events included pneumonia, SSI, UTI, and wound disruption. Additional secondary outcomes were extended LOS (defined as greater than the 75th percentile), and discharge to a higher level of care (including rehab, separate acute care, skilled care, or unskilled facility).
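The outcome definitions above can be sketched in code. This is a minimal illustration in Python, not the authors' R implementation: the event labels and function names are hypothetical placeholders rather than NSQIP variable names, and the percentile interpolation method is an assumption, since the text does not specify how the 75th percentile was computed.

```python
from statistics import quantiles

# Event groupings as listed in the text. The string labels are hypothetical
# placeholders, not actual NSQIP variable names.
SERIOUS = {"coma", "cardiac_arrest", "dvt", "pe", "mi", "unplanned_intubation",
           "return_to_or", "sepsis", "stroke"}
MINOR = {"acute_renal_failure", "bleeding_transfusion", "pneumonia", "ssi",
         "uti", "wound_disruption"}
INFECTIOUS = {"pneumonia", "ssi", "uti", "wound_disruption"}


def los_threshold(los_days):
    """Return the 75th percentile of length of stay across the cohort."""
    # quantiles(..., n=4) returns the three quartile cut points; the
    # "inclusive" interpolation method is an assumption of this sketch.
    return quantiles(los_days, n=4, method="inclusive")[2]


def label_outcomes(events, los, threshold):
    """Derive binary outcome flags for one patient from recorded events."""
    events = set(events)
    return {
        "any": bool(events & (SERIOUS | MINOR)),
        "serious": bool(events & SERIOUS),
        "minor": bool(events & MINOR),
        "infectious": bool(events & INFECTIOUS),
        "extended_los": los > threshold,
    }


# Toy cohort: eight lengths of stay (days) and one patient with two events.
los_days = [5, 6, 7, 8, 9, 10, 12, 20]
cutoff = los_threshold(los_days)
flags = label_outcomes({"uti", "dvt"}, los=20, threshold=cutoff)
```

Note that, as defined in the text, every infectious event is also a minor event, so the infectious flag can never be set without the minor flag.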
We selected candidate predictors based on previous literature. 5 Our approach was to select as many candidate predictor variables as possible, then to employ methods with built-in variable selection and support for high dimensionality in the predictor space.
Demographic variables selected included age, sex, body mass index (BMI), and the following comorbidities: cerebrovascular disease, chronic pulmonary disease, congestive heart failure, peripheral vascular disease, diabetes, renal disease, metastatic solid tumor, respiratory problems, decreased peripheral pulse, arterial hypertension, cardiac problems, changes in everyday life, history of stroke, current smoker (within 1 year), current drinker (>2 drinks/day in 2 weeks before admission), ventilator dependent, previous percutaneous coronary intervention (PCI), paraplegia, quadriplegia, open wound/wound infection, steroid use for chronic condition, bleeding disorders, chemotherapy for malignancy in ⩽30 days pre-op, radiotherapy for malignancy in last 90 days, systemic sepsis, prior operation within 30 days.
We included the cystectomy classification (CPT 51590, 51595, or 51596) as well as additional procedure categories defined by CPT: ureterocolon conduit, 50815; ureteroileal conduit, 50820; continent diversion, 50825; hernia, 49491–49611; prostate, 55700–55899; intestines, 44005–44799; ureter, 50600–50980; integumentary, 10030–19499; appendix, 44900–44979; female genital, 56405–58999; lymph, 38300–38999; abdomen, 49000–49999; kidney, 50010–50593; urethra, 53000–53899; other bladder, 51020–52700 not including 51590, 51595 and 51596; cardio, 33010–37799; rectum, 45000–45999; and any other procedure.
Additional candidate predictors included peri-operative transfusion, pre-operative transfusion, operative time, days from hospital admission to operation, wound classification, ASA classification, >10% loss of body weight in last 6 months, sum of work relative value units (RVU) (total sum of the RVUs of each of the procedures performed at the time of the radical cystectomy), and pre-operative laboratory values, which included hematocrit, platelet, white blood cell count (WBC), creatinine, and albumin, that were missing for less than 5% of patients.
Statistical analysis
We tested four types of predictive models: generalized additive models (GAM) with logistic link; least absolute shrinkage and selection operator (LASSO) again with logistic link; feed-forward neural network with logistic activation function and no weight decay (NNET); and random forest classifier (RF). These models were selected to capture potential sparsity (LASSO), arbitrary nonlinearity (GAM), interactions (RF), and higher order patterning (NNET).
We fit each of the four model types to each outcome (any, serious, minor, infection event, extended LOS, discharge to a higher level of care), entering all candidate predictors into each model. We used the Box-Cox transformation on all continuous variables to help stabilize models, which is especially recommended in the case of NNET.
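For reference, the Box-Cox transformation maps a strictly positive value x to (x^λ − 1)/λ, with the natural logarithm as the limiting case at λ = 0. The short Python sketch below illustrates the formula only; how λ was chosen for each variable is not stated in the text, so it is passed in as a parameter here, and the function name is illustrative.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform of a strictly positive value x with parameter lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive input")
    if lam == 0:
        # Limiting case of (x**lam - 1) / lam as lam approaches 0.
        return math.log(x)
    return (x ** lam - 1.0) / lam


# Example: a square-root-like transform (lam = 0.5) and a log transform.
sqrt_like = box_cox(4.0, 0.5)
log_like = box_cox(math.e, 0)
```

Because the transform is undefined for zero or negative values, any continuous predictor that can take such values would need shifting or a different transformation; the text does not say how such cases were handled.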
Before any model fitting, we split the data into training (80%) and test (20%) sets, and conducted all model fitting on the training sets. We used 10-fold cross-validation on the training set to select hyperparameters, which were the number of hidden layers (1–5) for NNET, the number of split variables (1–7) for RF, lambda for LASSO, and the spline penalty parameter for GAM. Hyperparameters were selected to maximize the cross-validated area under curve (AUC) of the receiver operating characteristic (ROC) curve.
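The split-then-tune procedure can be outlined as follows. This is a simplified Python sketch of the general technique, not a reproduction of the 'caret' workflow the authors used: the helper names are hypothetical, and `cv_score` stands in for whatever routine fits a model with a given hyperparameter value and returns the AUC on the held-out fold.

```python
import random


def train_test_split(n, test_fraction=0.2, seed=0):
    """Shuffle row indices 0..n-1 and split them into (train, test) lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def k_folds(indices, k=10):
    """Partition indices into k disjoint, nearly equal validation folds."""
    return [indices[i::k] for i in range(k)]


def select_hyperparameter(candidates, train_idx, cv_score, k=10):
    """Return (best candidate, mean score) under k-fold cross-validation.

    cv_score(candidate, fit_idx, val_idx) is supplied by the caller
    (hypothetical here): it fits on fit_idx and returns the AUC on val_idx.
    """
    folds = k_folds(train_idx, k)
    best, best_score = None, float("-inf")
    for cand in candidates:
        scores = []
        for j, val in enumerate(folds):
            # Fit on every fold except the j-th, validate on the j-th.
            fit = [i for m, f in enumerate(folds) if m != j for i in f]
            scores.append(cv_score(cand, fit, val))
        mean = sum(scores) / len(scores)
        if mean > best_score:
            best, best_score = cand, mean
    return best, best_score


# Toy usage with a stand-in scoring function that peaks at candidate 3.
train_idx, test_idx = train_test_split(100)
best, score = select_hyperparameter(
    range(1, 8), train_idx, lambda c, fit, val: 1 - abs(c - 3) / 10)
```

The key property is that the test indices never enter `select_hyperparameter`, so the tuned models are evaluated on data that played no role in fitting or tuning.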
After models were selected using cross validation, we evaluated predictions against their observed value in the test set and estimated ROC curves. We estimated uncertainty around the AUC using 1000 bootstrap resamples of the test data.
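To make the evaluation step concrete, the sketch below computes the AUC through its rank interpretation (the probability that a randomly chosen patient with the event receives a higher predicted score than one without) and a bootstrap interval from resampled test data. It is written in Python as an illustration; the authors used the 'pROC' package in R. The percentile form of the interval and the redrawing of single-class resamples are assumptions of this sketch, not details given in the text.

```python
import random


def auc(labels, scores):
    """AUC as P(score of a positive > score of a negative), ties counted half.

    This is equivalent to the normalized Mann-Whitney U statistic.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC on a fixed test set."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        # Resample patients with replacement, keeping label/score pairs intact.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contains only one class; draw again
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy example: four patients, two with the event.
example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise comparison is quadratic in the number of patients, which is acceptable for a sketch; a rank-based implementation would be preferable for a test set of the size used in this study.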
All data analysis was conducted in R version 3.5.0 for Windows, with cross validation and model training procedures implemented in the ‘caret’ package, neural networks using the ‘nnet’ package, random forest using the ‘randomForest’ package, lasso using the ‘glmnet’ package, GAM using the ‘mgcv’ package, and ROC curves estimated using the ‘pROC’ package.
Results
There were 7557 patient admissions meeting inclusion criteria in NSQIP from 2005 to 2016, who had a median [interquartile range (IQR)] age 70.0 (62.0–76.0) and BMI (kg/m2) 27.8 (24.7–31.5), of whom 6231 (82.5%) were male and 1871 (24.8%) were current smokers within 1 year (Table 1). Overall, 2221 (29.4%) patients experienced any adverse event, with approximately similar numbers experiencing a serious (19.7%), minor (19.2%), or infectious event (22.3%); additionally, 891 (11.8%) were discharged to a higher level of care and 2277 (30.1%) had an extended (>75th percentile) LOS (Table 2).
Patient demographics.
ASA, American Society of Anesthesiologists; BMI, body mass index; CPT, Current Procedural Terminology; IQR, interquartile range; PLND, pelvic lymph node dissection.
Incidence of adverse events.
CPR, cardiopulmonary resuscitation; DVT, deep vein thrombosis; OR, operating room.
Figure 1 displays boxplots of estimates of AUC for each response variable (any, serious, infectious, minor adverse event; discharge to higher level of care; extended length of stay) for each model type (LASSO, RF, GAM, and NN), estimated using 1000 bootstrap resamples on the test data. For the primary outcome of any adverse event, the highest discrimination was achieved on the test set by LASSO (AUC 0.64, 95% CI 0.62, 0.65), followed by RF (AUC 0.55, 95% CI 0.49, 0.59), and GAM (AUC 0.54, 95% CI 0.51, 0.58), with NN performing the worst (AUC 0.52, 95% CI 0.46, 0.56). Similarly, LASSO was the highest performing model type for serious (AUC 0.67, 95% CI 0.65, 0.68), minor (AUC 0.62, 95% CI 0.60, 0.64), and infectious (AUC 0.62, 95% CI 0.60, 0.64) adverse events, as well as discharge to a higher level of care (AUC 0.74, 95% CI 0.72, 0.76). In contrast, GAM was the highest performer for extended LOS (AUC 0.66, 95% CI 0.64, 0.67), although it was closely followed by LASSO (AUC 0.65, 95% CI 0.63, 0.66).

Discrimination accuracy of each model type for each response variable, estimated on the training data (n = 5657) and applied to 1000 bootstrap resamples of the test data (n = 1431).
Figure 2 shows variable importance (VI) estimates for the top-performing model for each response variable. For LASSO, the VI represents the standardized regression coefficient as a signed percent of the maximum coefficient (not including the intercept), and for GAM, VI is the sum of absolute values of t statistics for each spline coefficient for that variable, as a percentage of the maximum value, with sign based on the overall linear trend. These variable importance measures are intended to allow for comparison across model types.
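The LASSO scaling described above can be illustrated with a few lines of Python. This sketch assumes that "maximum coefficient" means the largest coefficient in absolute value, which is consistent with reported VI values of –100; the function name and the example coefficients are hypothetical.

```python
def signed_percent_importance(coefs):
    """Scale coefficients to signed percentages of the largest absolute value.

    coefs maps variable name -> standardized regression coefficient, with
    the intercept already excluded.
    """
    max_abs = max(abs(b) for b in coefs.values())
    if max_abs == 0:
        # Every coefficient was shrunk to zero; nothing to rank.
        return {name: 0.0 for name in coefs}
    return {name: 100.0 * b / max_abs for name, b in coefs.items()}


# Hypothetical coefficients: the variable with the largest magnitude gets
# +100 or -100, and a variable dropped by the LASSO penalty gets 0.
vi = signed_percent_importance({"a": 0.5, "b": -0.25, "c": 0.0})
```

Under this scaling the sign indicates the direction of association with the outcome, while the magnitude is only meaningful relative to the other variables in the same model.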

Variable importance of the top 10 most important variables for the top-performing model for each response variable.
For any adverse event, the two most important variables (VI) were integumentary procedure (100) and cardio procedure (94.2); for minor events, they were sum of RVUs (–100) and BMI (98.7); for infectious events, they were CPT 51596 (100) and abdomen procedure (97); and for serious events, they were abdomen procedure (100) and sum of RVUs (–93.1). For discharge to a higher level of care, the two most important variables (VI) were age of patient, with patients over 89 coded as 90+ (100), and changes in everyday life (27); and for extended LOS, they were days from hospital admission to operation (100) and age of patient, with patients over 89 coded as 90+ (73.2).
Discussion
Despite recent efforts to enhance recovery protocols, 20 radical cystectomy remains one of the most complex disease processes urologists manage, due, in large part, to the high complication rates. Our study aim was to develop models to predict complications following radical cystectomy. Even though our models perform better than many others in the literature, our best-performing model achieved only fair discrimination, highlighting the need for continued investigation into predicting radical cystectomy complications along with the challenges of this complex process.
We found we could best predict discharge to a higher-level care facility using the LASSO model, which resulted in an AUC of 0.74. This is similar to models of Golen and colleagues from the NSQIP data set that moderately predicted discharge to a rehabilitation center with AUC of 0.75. 13 The LASSO model was the highest performing model for each adverse event category, with the exception of extended LOS, which was predicted marginally better by the GAM model (AUC 0.66). Still, most models performed similarly at predicting a given outcome, suggesting that the limitation lies in the data rather than the models. The success of modeling discharge to higher level facilities may suggest that elements of transitions of care, as well as appropriateness of discharge, are more predictable by variables recorded during the hospital stay. Apart from this, the higher performance obtained using a LASSO model with purely linear specification suggests either that important interactions or nonlinearities among the included variables are unlikely, or that measurement error or sampling error made them difficult to detect. The NSQIP data, like most healthcare datasets, may have more measurement error than a prospective study.
Overall, all four models performed modestly in predicting adverse events in general, with AUC scores between 0.52 and 0.64. These findings are marginally better than prior work, which demonstrated AUC scores for other commonly used risk stratification tools including ASA, mCCI, and mFI of between 0.51 and 0.58. 14 There may be several explanations why predicting complications of radical cystectomy remains challenging. First, there continues to be a lack of consensus among surgeons on what constitutes a postoperative complication. This difficulty is further exacerbated by disagreement on a structured classification of postoperative complications and morbidity. Although the NSQIP dataset has high inter-rater agreement, what constitutes a complication still needs to be agreed upon. Second, although the NSQIP is designed to capture variables to improve the quality of surgical care, cancer stage, which has been shown to impact outcomes, is not a dataset variable. Third, additional trends not captured in this study, such as postoperative trends in consecutive laboratory values or vital signs, may impact performance of our predictive models. Lastly, despite the comprehensive nature of the NSQIP data, surgical volume is not reported. Both the experience of the surgeon and the institutional cystectomy volume may impact cystectomy outcomes.21,22
As a secondary aim, we evaluated VI estimates for the top-performing model for each adverse outcome. The most important variables for any adverse event were procedures on the integumentary system and prior cardiac procedures, both unsurprising given the substantially higher risk of operating on those with prior significant cardiac disease. Similarly, both discharge to a higher level of care and extended LOS outcomes underscored age of the patient as the most important variable, which is seen in clinical practice with the slower recovery of older patients. Increased RVUs appear not to increase complications, which may suggest that patients who receive more care at the same level of need do better. These findings, taken together, highlight the crude ability to use pre-operative patient variables to predict certain complications within the designed models. In our analysis, additional variables, such as sarcopenia, leukocytosis, and hypoalbuminemia, were included as additional candidate predictor variables based on prior work that revealed associations with complications and readmissions after radical cystectomy.23–27 As mentioned above, important variables such as surgeon experience, hospital volume, and surgical approach were not included in the analysis and may have meaningful effects on complication rates.
Our study highlights how large healthcare datasets remain extremely limited in their predictive ability in the domain of complications of radical cystectomy for bladder cancer. We identified the most important variables for each adverse event, which appear to be important to consider in future similar studies. Furthermore, these findings may provide insight into which more predictive variables could be collected prospectively. Interestingly, a large percentage of complications appear closely related to the reconstructive rather than the exenterative portion of the surgery. 28 Outside of urinary diversion procedure codes, which were incorporated into our models, additional bowel-related variables could further contribute to improved predictive performance of future models.
Additional limitations of this analysis arise from the NSQIP database, which has a disproportionate number of large academic facilities, so the results of these models may not be generalizable to all urologic practices. Any model derived from NSQIP will be limited by the selection biases of the database. Only patients who actually underwent a procedure are included, which means that patient and surgeon were willing to accept the perceived risks. We were unable to control for the surgical approach, whether open or robotic, which may have further informed our models. Moreover, we could not control for postoperative ileus, which is a common complication after radical cystectomy, as no standard definition of ileus after bowel surgery exists, and its recording is therefore subject to the judgment of the surgeon and the coder, as well as to the dataset itself. In addition, there may be other unmeasured confounders or interactions that significantly affect the outcomes, such as whether an enhanced recovery after surgery (ERAS) protocol was followed. Although NSQIP is subject to oversight and regular auditing, it may also be vulnerable to reviewer bias. As mentioned, the NSQIP also captures follow-up data only up to 30 days postoperatively, and does not record cancer stage. Future predictive models may benefit from investigating additional variables not found in the NSQIP data that may have meaningful impact on outcomes.
Conclusion
Our machine learning statistical models predict complications, discharge status, and extended LOS following radical cystectomy for bladder cancer marginally better than other commonly used indices, underscoring the need for continued investigation into predicting radical cystectomy complications along with the challenges of this complex process. Identifying the most important variables leading to each type of adverse event may allow for further strategies to model cystectomy complications and target optimization of modifiable variables pre-operatively to reduce postoperative adverse events.
Footnotes
Author contribution
Protocol/project development: J. Taylor, X. Meng, A. Renson, M.A. Bjurlin.
Data collection or management: J. Taylor, X. Meng, A. Renson, M.A. Bjurlin.
Data analysis: J. Taylor, A. Renson, M.A. Bjurlin.
Manuscript writing/editing: J. Taylor, X. Meng, A. Renson, A.B. Smith, J.S. Wysock, S.S. Taneja, W.C. Huang, and M.A. Bjurlin.
Critical revisions and supervision: A.B. Smith, J.S. Wysock, S.S. Taneja, W.C. Huang, and M.A. Bjurlin.
Funding
The author(s) received no financial support for the research, authorship, and publication of this article.
Conflict of interest statement
Marc A. Bjurlin is a paid speaker for Ultimate Medical Academy.
Ethical approval
The National Surgical Quality Improvement Program is a de-identified, publicly available dataset.
