Abstract
Introduction
National ranking, funding, and overall reputation for hospitals and their individual departments are determined in part by outcome metrics. The American College of Surgeons (ACS) National Surgical Quality Improvement Program (NSQIP) developed a surgical risk calculator (SRC) to help risk-stratify patients and predict outcomes. The SRC was developed using data from over 4.3 million operations at 780 hospitals between 2013 and 2017, and calculates a patient’s overall risk for specific complications, length of hospital stay, and mortality within 30 days of surgery.1 The calculated risk is based on a variety of factors including patient demographics, preexisting comorbidities, and type of procedure.
These algorithms have demonstrated inconsistent predictive accuracy; they have been shown to be accurate in predicting adverse events for head and neck oncologic resections without microvascular free flap reconstruction,2,3 but less accurate in predicting risk in patients undergoing microvascular free flaps and total laryngectomies with prior chemoradiation.4-8 Surgeons, on the other hand, develop predictive skills over the course of their careers, but their predictive accuracy is unknown. This study aims to prospectively compare the accuracy of surgeons’ predictions with that of established risk calculators in head and neck surgery, as well as to assess the individual risk factors associated with morbidity and mortality.
Patients and Methods
This prospective cohort study was approved by the institutional review board and conducted at a single academic medical center with 5 participating head and neck oncologic surgeons. For a 1-year period, surgeons were asked to complete a survey for any patient undergoing surgery with an expected inpatient admission. The survey included 4 questions about the patient’s anticipated postoperative risk, each answered on a 5-point Likert scale (1 being least likely, 5 being most likely). Using their own judgment, clinical reasoning, and knowledge of the patient, surgeons assessed risk for (1) overall complication, (2) mortality while in the hospital, (3) mortality within 6 months due to cancer (if applicable), and (4) mortality within 6 months due to medical comorbidities.
Surgical procedure and preoperative risk factors as determined by the ACS were then entered into the SRC to obtain calculated risk of complications and mortality. An example of SRC risk predictions can be found in Figure 1. In addition to identifying risk factors included in the SRC, this study identified additional data including known cancer diagnosis, clinical evidence of locoregional metastases, and whether or not the procedure was a salvage surgery (patient received prior treatment with any combination of surgery, chemotherapy, and radiation).

Figure 1. Example of NSQIP SRC risk predictions for a patient undergoing total laryngectomy. NSQIP, National Surgical Quality Improvement Program; SRC, surgical risk calculator.
Electronic medical records were manually reviewed at least 30 days following surgery to determine postoperative outcomes. They were not reviewed again after this, as the perioperative period is defined by the NSQIP-ACS as 30 days. Complications designated as “serious” included all mortalities, cardiac events, infections requiring antibiotics >1 week, and surgical takebacks. Nonserious complications included uncomplicated pneumonia, local infections treated with <1 week of antibiotics, urinary tract infections, deep venous thrombosis without embolism, and readmission without operative intervention. Study data were collected by 2 investigators who were not involved in the surgeries and managed using REDCap electronic data capture tools.
Univariate logistic regression was used to assess the association between each preoperative risk factor and outcomes, yielding unadjusted odds ratios (ORs) with their corresponding 95% confidence intervals (CIs). The area under the curve (AUC) of the receiver operating characteristic (ROC) curve was used to evaluate the discriminative ability of the surgeons’ risk assessment versus the calculated SRC score based on preoperative risk factors. The 95% CI of each AUC was computed with 2000 stratified bootstrap replicates, and DeLong’s test was used to compare 2 correlated ROC curves. Statistical analysis was performed using R (version 4.0.3; R Core Team, 2020), and the pROC package was used to calculate AUC and perform DeLong’s test.
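The AUC and bootstrap interval described above can be illustrated with a minimal sketch. This is not the study's analysis code, which used R and pROC; it is a standard-library Python approximation in which the function names, the percentile interval, and the toy data are all assumptions made for illustration. Ties, which are unavoidable with a 5-point Likert score, are counted as half a win. DeLong's test is not reproduced here.

```python
import random

def auc(scores, labels):
    """AUC as the probability that a randomly chosen case scores higher
    than a randomly chosen non-case; tied pairs count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_bootstrap_ci(scores, labels, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for the AUC. Cases and non-cases are resampled
    separately, so every replicate keeps the original class sizes."""
    rng = random.Random(seed)
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    stats = []
    for _ in range(n_boot):
        bp = rng.choices(pos, k=len(pos))  # resample cases with replacement
        bn = rng.choices(neg, k=len(neg))  # resample non-cases with replacement
        stats.append(auc(bp + bn, [1] * len(bp) + [0] * len(bn)))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical toy data, NOT study data: Likert ratings and observed outcomes.
likert = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
complication = [0, 0, 1, 0, 1, 0, 0, 1, 1, 0]

point = auc(likert, complication)
lower, upper = stratified_bootstrap_ci(likert, complication)
print(f"AUC = {point:.2f}, 95% CI: {lower:.2f}-{upper:.2f}")
```

An AUC near 0.5 means the score ranks patients with and without complications no better than chance, which is how the results below should be read.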
Results
Of 123 patients undergoing qualifying procedures, 22 did not meet criteria for an inpatient admission, resulting in 101 included surgeries performed by 5 surgeons. The patients were 78.2% male (n = 79) and 21.8% female (n = 22). Mean patient age was 63.9 years, and mean body mass index was 26.55 kg/m². The 101 cases included oncologic ablation with any type of microvascular free flap reconstruction, excluding total laryngectomy (n = 45, 44.6%), total laryngectomy with or without free flap reconstruction (n = 18, 17.8%), parotidectomy with neck dissection (n = 10, 9.9%), composite resection without free flap (n = 9, 8.9%), neck dissection (n = 7, 6.9%), transoral robotic surgery with or without neck dissection (n = 6, 5.9%), thyroidectomy or parathyroidectomy (n = 3, 3.0%), partial laryngectomy (n = 2, 2.0%), and carotid body tumor excision (n = 1, 1.0%).
Observed preoperative risk factors as recognized by the SRC included medically treated hypertension (n = 56, 55.4%), smoking within 1 year of surgery (n = 31, 30.7%), chronic obstructive pulmonary disease (n = 20, 19.8%), diabetes treated with oral medications (n = 15, 14.9%), diabetes treated with insulin (n = 8, 7.9%), distantly metastatic cancer (n = 3, 3.0%), chronic kidney disease (n = 1, 1.0%), and steroid use (n = 1, 1.0%). No patients had congestive heart failure or ascites within 30 days of surgery. Although not included by the SRC, we examined other risk factors including known cancer diagnosis (n = 91, 90.1%), presence of clinical neck metastases (n = 42, 41.6%), and whether this was a salvage surgery (n = 34, 33.7%).
Of the 101 patients, 37 (36.6%) experienced a complication of any kind and 18 (17.8%) experienced a serious complication. The most common complications were surgical site infection (n = 21, 20.8%), return to the operating room (n = 19, 18.8%), and readmission (n = 18, 17.8%). Less common complications included pneumonia (n = 4, 4.0%), cardiac event (n = 1, 1.0%), and deep venous thromboembolism (n = 1, 1.0%). There were 2 (2.0%) mortalities in total; one of these occurred >30 days from the time of surgery, which is beyond the standard ACS time frame of postoperative surveillance. However, it was included for the purposes of this study because it was attributed to multiple postoperative complications and known cancer diagnosis.
Smoking was associated with 2.49 times higher odds of any complication (OR 2.49; P = .04). When laryngectomies were used as the baseline for complication rates, patients were less likely to experience a complication when undergoing free flap reconstruction (OR 0.9) or any other surgery (OR 0.26; P = .02). Patients who were older than the mean of 63.9 years were less likely to experience a complication (OR 0.95; P = .002). Other preoperative risk factors did not have a statistically significant association with outcomes. The ORs for all risk factors are displayed in Table 1.
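For a binary risk factor, the unadjusted OR from a univariate logistic regression coincides with the cross-product ratio of the corresponding 2 × 2 table, and a Wald 95% CI can be formed on the log-odds scale. The sketch below illustrates that arithmetic only. The function name and the cell counts are hypothetical and are not taken from this study, whose per-factor counts appear in Table 1.

```python
import math

def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald CI from a 2x2 table.

    a: exposed, complication      b: exposed, no complication
    c: unexposed, complication    d: unexposed, no complication
    All four cells must be nonzero.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

# Hypothetical counts for illustration only (NOT study data).
or_, lower, upper = odds_ratio_wald(a=12, b=8, c=10, d=20)
print(f"OR = {or_:.2f}, 95% CI: {lower:.2f}-{upper:.2f}")
```

A CI that includes 1 indicates that the association is not statistically significant at the 5% level, which matters in a cohort of this size, where intervals tend to be wide.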
Table 1. Preoperative Risk Factors and Associated Complication Rates.
Both surgeons and the SRC performed poorly in predicting outcomes. The surgeons’ assessment for any complication yielded an AUC of 0.51 (95% CI: 0.39-0.62), and the SRC predictions yielded an AUC of 0.58 (95% CI: 0.47-0.70). While the SRC score performed slightly better than the surgeons’ assessment on AUC, DeLong’s test did not show a statistically significant difference between the two (P = .34). The ROC curves for surgeon prediction and SRC prediction of any complication are superimposed for direct comparison in Figure 2. For the prediction of serious complications, the AUCs for surgeons and the SRC were 0.55 (95% CI: 0.41-0.69) and 0.60 (95% CI: 0.46-0.74), respectively; this difference was also not statistically significant (P = .58; Figure 3).

Figure 2. ROC curves comparing surgeon and SRC predictions for any complications. ROC, receiver operating characteristic; SRC, surgical risk calculator.

Figure 3. ROC curves comparing surgeon and SRC predictions for serious complications. ROC, receiver operating characteristic; SRC, surgical risk calculator.
Discussion
To our knowledge, this is the first study to directly compare surgeons and electronic risk calculators in predicting surgical outcomes. While certain individual risk factors were associated with complications, neither surgeons nor the ACS-NSQIP SRC predicted complications accurately.
Not surprisingly, smoking status and type of surgery were significant predictors of postoperative complications. The effects of tobacco use on wound healing and overall health are well established across surgical subspecialties, including head and neck surgery.9,10 The findings of this study can be used to reinforce preoperative discussions with patients regarding smoking cessation. Patients undergoing total laryngectomy with or without free flap were also more likely to experience complications than those undergoing other types of surgery. Laryngectomies have high rates of complication at baseline; pharyngocutaneous fistula alone has a reported incidence of up to 33% in primary and 48% to 68% in salvage total laryngectomies.11,12 However, in this study, salvage surgery in and of itself was not significantly associated with a higher postoperative complication rate compared to non-salvage surgery.
Interestingly, patients who were older than the average age of patients in this study were less likely to experience a complication. One possible explanation is that surgical treatments geared toward younger patients tend to be more aggressive, with the goal of maximizing survival. However, there are no objective data to substantiate this, and the statistical significance of this finding may be an artifact of the small sample size.
Prior studies have found that the SRC has variable accuracy in head and neck surgery,3-12 and there remains uncertainty regarding the role that preoperative risk factors such as age, smoking, alcohol use, and patient comorbidities play in determining a patient’s risk for complications.13-15 Very few attempts have been made to create an otolaryngology-specific risk calculator, and these have had limited success or adoption within the specialty.16-18
The initial hypothesis when designing this study was that surgeons are more accurate in risk prediction than the standardized tool. The reasoning was that an automated risk calculator is a snapshot of the patient and does not capture a patient’s motivation, available support systems, and social determinants of health. While the SRC showed slightly higher discriminative power than surgeons, this difference was not statistically significant, and both the surgeons and the SRC were poor predictors of outcomes.
The SRC has been prominently featured in both news media and television shows and is widely used by institutions and ranking systems across the United States, most notably its parent organization, the ACS. The implications for surgical departments being graded by a tool that is not accurate are considerable, and this study suggests that using the SRC may not be an adequate way to capture data and make these assessments. A high ratio of observed to expected mortalities can result in reduced reimbursement and lower national rankings by programs such as U.S. News & World Report, which can be devastating to a department. Our study also identified, however, that surgeons themselves are poor predictors, and so the question remains of how best to predict outcomes and thus assess surgical departments. As new machine learning techniques are developed and medicine leans toward automation, we hope that more accurate and specialty-specific risk calculators can lead to more thorough, accurate, and fair assessments of surgical departments.
There are notable limitations to this study. The most troublesome limitation encountered in data analysis was comparing a 5-point Likert scale to a risk model. The surveys given to surgeons were designed as a 5-point Likert scale for ease of use and to encourage reporting, but this inherently created challenges for direct comparison with the SRC’s percentages. Measures such as the Brier score assess predictive accuracy well but require predicted probabilities; they could not be applied to the surgeons’ ordinal risk assessments and so were not used. The AUC of the ROC curve measures the discriminative power of a prediction and remains the most appropriate analysis given these different measurement scales, although its overall fit remains limited.
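The scale mismatch described above can be made concrete with a short sketch. The Brier score is the mean squared difference between a predicted probability and the observed 0/1 outcome, so a Likert rating must first be converted to a probability, and the result depends on that conversion. The AUC depends only on rank order and is unchanged by any strictly increasing conversion. The ratings, outcomes, and both Likert-to-probability mappings below are hypothetical and chosen only for illustration.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)

def auc(scores, labels):
    """Rank-based AUC; tied case/non-case pairs count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical ratings and outcomes (NOT study data).
likert = [1, 2, 3, 4, 5, 2, 4]
outcome = [0, 0, 1, 0, 1, 1, 0]

# Two arbitrary, equally defensible ways to turn a rating into a probability.
map_a = {1: 0.05, 2: 0.10, 3: 0.20, 4: 0.40, 5: 0.80}
map_b = {1: 0.20, 2: 0.40, 3: 0.60, 4: 0.80, 5: 1.00}

probs_a = [map_a[r] for r in likert]
probs_b = [map_b[r] for r in likert]

# The Brier score changes with the chosen mapping ...
print(brier_score(probs_a, outcome), brier_score(probs_b, outcome))
# ... while the AUC is identical for the raw ratings and both mappings.
print(auc(likert, outcome), auc(probs_a, outcome), auc(probs_b, outcome))
```

Because no single mapping from Likert rating to probability is justified by the survey design, any Brier score for the surgeons would reflect the analyst's choice of mapping as much as the surgeons' judgment.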
Other limitations include a small study size and complex procedures. Analysis could be more powerful if there were more patients, more diversity of cases, and multiple surgeons making predictions for each patient. The SRC also only allows for input of one procedure code; if patients underwent multiple procedures, only the major or most complex procedure was listed. For this reason, all total laryngectomies were considered within the same category during analysis, rather than breaking the group into those with and without free flap reconstruction. With increasing complexity of cases, risk should increase but this cannot easily be captured by the SRC.
Over one-third of surgeries in this study were salvage and nearly half were microvascular free flaps, and prior studies have identified that the SRC may be unreliable in these surgeries.2-7 Future studies with a larger cohort may discriminate salvage surgery based on the type of prior treatment; due to the small size of this study, patients who had received any prior combination of chemotherapy, radiation, or surgery with curative intent were analyzed together.
This study was not powered to stratify patients with higher risk percentages from those with lower risk, which may be a way to further delineate accuracy of surgeons and the SRC. Patients with increased comorbidities may not be selected by the surgeon for high-risk cases based on low survivability. Future studies may compare the accuracy of predictions based on prior risk stratification, type of surgery, or the complexity and duration of the procedure.
Based on this study and other retrospective work evaluating the ACS-NSQIP in head and neck surgery, the tools used to assign risk to patients are inaccurate and may lead to inadequate evaluations. This study serves as a starting point for other studies wishing to compare surgeons and computerized risk calculators. These findings demonstrate the need for larger studies evaluating risk predictions with diverse groups of surgeons and procedures, with the overarching goal of producing highly accurate, specialty-specific risk calculators. This is ever more important as the healthcare world continues to adapt to the emergence of artificial intelligence.
In conclusion, both surgeons and standardized risk calculators are poor predictors of outcomes in head and neck surgery, and neither significantly outperformed a coin toss. Preoperative smoking, age, and type of surgery were the only risk factors significantly associated with complications. These findings can help guide preoperative decision-making and discussion with patients. More studies are needed to further characterize the accuracy of surgeon predictions and standardized risk calculators in otolaryngology.
Footnotes
Data Availability Statement
Data are available on request.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The author(s) received an internal grant from the Vanderbilt Institute for Clinical and Translational Research, which was used solely to obtain assistance in complex statistical analysis.
Ethical Approval
The study was approved by the Vanderbilt University Medical Center Institutional Review Board.
