Abstract
Study Design
Retrospective Cohort Study.
Objectives
To evaluate the accuracy of the ACS-NSQIP Pediatric Surgical Risk Calculator in predicting postoperative complications and mortality following pediatric spinal deformity surgery. Predicted risks were compared with observed outcomes from the ACS-NSQIP Pediatric database, stratified by scoliosis etiology, fusion level, and surgical approach.
Methods
We performed a retrospective analysis of pediatric patients who underwent spinal deformity correction between 2012 and 2023 using the ACS-NSQIP Pediatric database. Patients were categorized as idiopathic or neuromuscular scoliosis. Predicted risks were compared with observed 30-day outcomes including mortality, surgical site infection, pneumonia, and urinary tract infection. Predictive performance was assessed using the Brier score across discrimination and calibration dimensions.
Results
A total of 58,010 patients were included (45,211 idiopathic; 12,799 neuromuscular). Overall, the calculator predicted a 2.74% complication rate vs an observed rate of 9.54% (Brier: 0.00462; 5.35% of maximum), reflecting poor discrimination and substantially underestimated absolute risk. The most frequent complications were surgical site infection (2.12%), pneumonia (0.99%), and urinary tract infection (0.64%), each demonstrating adequate individual-level discrimination and calibration. Stratified analyses showed adequate performance for idiopathic scoliosis patients undergoing 0-12 level fusions, while discrimination was poor for ≥13 level fusions. Performance was substantially worse among neuromuscular scoliosis patients across all fusion levels. Surgical approach did not meaningfully affect performance.
Conclusions
The calculator reliably predicts select individual complications but underestimates overall risk in high-complexity cases, particularly extensive fusions and neuromuscular scoliosis. Incorporating deformity-specific and surgical complexity variables may improve preoperative risk stratification and counseling.
Introduction
Surgery for pediatric spinal deformity is complex and costly and carries significant risk.1 Estimated complication rates for pediatric spinal deformity surgery, particularly for adolescent idiopathic scoliosis (AIS), range from 5% to 23%, with even higher rates for neuromuscular and congenital scoliosis.2,3 Furthermore, postoperative complications can be a major driver of overall healthcare costs.4,5 These high costs and potential complications have prompted the development of predictive analytics and risk stratification tools specific to pediatric spinal deformity surgery.1,6 These tools aim to mitigate postoperative morbidity and mortality, reduce healthcare costs, and improve decision-making for both surgeons and patients.
Despite advancements in surgical techniques and perioperative care, challenges remain. Enhanced Recovery After Surgery (ERAS) protocols have shown promise in improving outcomes and reducing complications in pediatric spinal deformity surgery.7–9 These protocols aim to standardize perioperative care and accelerate recovery, potentially leading to shorter hospital stays and reduced costs. An analysis of spinal fusion outcomes in children with neuromuscular disorders showed that surgical and medical complication rates and health-related quality-of-life measures have improved, although considerable room for improvement remains.8 Accurate prediction of morbidity and mortality among pediatric patients undergoing spinal deformity surgery therefore remains essential.
There is currently a gap in the literature evaluating predictive tools in patients undergoing surgery for pediatric spinal deformity.10 Effective evaluation of potential risks and complications, including postoperative morbidity and mortality, is essential for the medical optimization of comorbidities and for further improving decision-making by spine surgeons. Therefore, the purpose of our study was to evaluate the accuracy of the American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP) Pediatric Surgical Risk Calculator (PSRC) in predicting postoperative complications and mortality following corrective surgery for pediatric spinal deformity.11
Methods
Data Source
This study used the ACS-NSQIP Pediatric database to examine perioperative outcomes in pediatric spinal deformity surgery. The dataset, maintained by the American College of Surgeons, includes detailed data on over 150 preoperative, intraoperative, and 30-day postoperative variables from patients treated in more than 250 urban and rural hospitals across the United States. Patient cohorts were identified using specific Current Procedural Terminology (CPT) codes (22800, 22802, 22804, 22808, 22810, and 22812) corresponding to corrective surgical procedures for spinal deformity. Neuromuscular scoliosis (NMS) was defined as scoliosis cases with the pediatric NSQIP neuromuscular disorder indicator present, whereas idiopathic scoliosis (IS) was defined by the absence of this flag. Data collection was standardized through trained surgical clinical reviewers to support accuracy and interrater reliability. This study was reported in accordance with the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines.12
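The inclusion and classification rules above can be expressed as a short filter. This is a hypothetical Python sketch, not the authors' extraction code: the record fields `cpt` and `neuromuscular_flag` are placeholders rather than actual ACS-NSQIP Pediatric variable names. Because the text does not state how surgical approach was assigned, the approach mapping is an assumption based on the standard CPT descriptors (22800, 22802, and 22804 for posterior arthrodesis; 22808, 22810, and 22812 for anterior arthrodesis).

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Deformity-correction CPT codes listed in the Data Source section.
DEFORMITY_CPT = {"22800", "22802", "22804", "22808", "22810", "22812"}

# Assumption: posterior codes per the standard CPT descriptors; the rest are anterior.
POSTERIOR_CPT = {"22800", "22802", "22804"}


@dataclass
class Case:
    cpt: str                  # hypothetical field: principal CPT code as a string
    neuromuscular_flag: bool  # hypothetical field: neuromuscular disorder indicator


def classify(case: Case) -> Optional[Tuple[str, str]]:
    """Return (etiology, approach) for an eligible case, or None if excluded."""
    if case.cpt not in DEFORMITY_CPT:
        return None  # not a spinal deformity correction procedure
    etiology = "NMS" if case.neuromuscular_flag else "IS"  # IS = flag absent
    approach = "posterior" if case.cpt in POSTERIOR_CPT else "anterior"
    return etiology, approach


if __name__ == "__main__":
    for c in [Case("22802", False), Case("22804", True), Case("22812", False), Case("63030", False)]:
        print(c.cpt, classify(c))
```

Defining IS as the absence of the neuromuscular flag, as the text does, means any case not flagged falls into the IS cohort by default.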
Study Design
ACS-NSQIP Pediatric Surgical Risk Calculator Inputs and Outputs
Data Collection
Comprehensive patient data were tracked using standardized forms documenting demographic, clinical, and surgical parameters. Primary outcomes included 30-day mortality, non-fatal complications, and surgical site infection (SSI). Secondary outcomes included pneumonia, cardiac complications, urinary tract infection, venous thrombosis requiring therapy, acute renal failure, and unplanned intubation. SSI comprised superficial incisional infection (SII), deep incisional infection (DSI), and organ space infection (OSI). Outcomes assessed under “any complication” included pneumonia, reintubation, pulmonary embolism, renal insufficiency or failure, urinary tract infection, central line-associated bloodstream infection, coma lasting more than 24 hours, seizure, peripheral nerve injury, cerebral intraventricular hemorrhage, stroke or intracranial hemorrhage, cardiac arrest requiring CPR, venous thrombosis requiring therapy, graft/prosthesis/flap failure, sepsis, surgical site infection, bleeding events (blood transfusion ≥25 mL/kg), deep wound disruption, and superficial wound disruption. Observed outcomes were compared with risk estimates derived from the NSQIP calculator.
Risk Calculator Input and Outcomes
For each included CPT code, relevant parameters were entered into the ACS-NSQIP Pediatric Surgical Risk Calculator without surgeon-adjusted risk modifications. Predicted probabilities of adverse events were recorded and subsequently compared with observed outcomes. Notably, the “any complication” composite outcome reported in this study reflects the exact definition embedded in the ACS-NSQIP PSRC output and was not modified or refined by the study authors. This definition is intentionally broad: the calculator was developed to generate a single aggregate risk estimate across all tracked adverse events to facilitate preoperative counseling across diverse surgical populations. As such, the observed “any complication” rate in our cohort, which includes both high-severity and lower-severity events, reflects the full scope of the calculator’s composite endpoint rather than a clinician-defined measure of major morbidity.
Statistical Analysis
The predictive accuracy of the ACS-NSQIP PSRC was assessed using the Brier score, a quadratic scoring rule.13 This score quantifies the squared difference between predicted probabilities and observed binary outcomes, with lower values indicating closer agreement. The maximum Brier score was determined as the product of the mean observed outcome and its complement (1 – mean observed outcome). Model performance was evaluated across two distinct but complementary dimensions. Discrimination, the model’s ability to rank patients by relative risk, was assessed using a Brier score threshold, with scores <5% of the maximum possible value classified as adequate and scores ≥5% classified as poor. Calibration, the agreement between predicted probabilities and observed event rates, was assessed by examining absolute differences between predicted and observed rates, classified as well-calibrated (<2 percentage point difference), underestimated (observed exceeding predicted by 2-5 percentage points), or very underestimated (observed exceeding predicted by >5 percentage points). Because a favorable Brier score does not preclude systematic underestimation of absolute risk, especially for low-prevalence outcomes, discrimination and calibration were reported separately throughout to provide a complete and clinically meaningful characterization of model performance.
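The scoring scheme described above can be made concrete with a short Python sketch. It computes the conventional per-patient Brier score, the "maximum" defined in the text (mean observed outcome times its complement), the percent-of-maximum discrimination threshold, and the calibration categories. All function names are hypothetical. One assumption is labeled explicitly: the subgroup values reported in Results appear consistent with squaring the gap between the mean predicted risk and the observed rate (for the total cohort, (0.0954 − 0.0274)² = 0.004624, matching the reported 0.00462), so `brier_aggregate` follows that reading; this is an inference from the published numbers, not the authors' code. For a model that assigns every patient the same probability p, the per-patient Brier score equals (p − ō)² + ō(1 − ō), so the two quantities differ by exactly the "maximum" term.

```python
"""Minimal sketch of the Brier-score quantities described in Statistical Analysis.

Assumption (not the authors' code): brier_aggregate() squares the gap between
mean predicted risk and observed event rate, which appears to reproduce the
subgroup values reported in Results. All names here are hypothetical.
"""


def brier_per_patient(predicted, outcomes):
    """Conventional Brier score: mean of (p_i - y_i)**2 with y_i in {0, 1}."""
    if not predicted or len(predicted) != len(outcomes):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(predicted, outcomes)) / len(outcomes)


def brier_aggregate(pred_rate, obs_rate):
    """Squared difference between mean predicted risk and observed rate (proportions)."""
    return (pred_rate - obs_rate) ** 2


def brier_max(obs_rate):
    """'Maximum' Brier score as defined in the text: mean outcome times its complement."""
    return obs_rate * (1 - obs_rate)


def discrimination_label(score, obs_rate):
    """Text's threshold: <5% of the maximum is 'adequate', >=5% is 'poor'."""
    maximum = brier_max(obs_rate)
    if maximum == 0:
        raise ValueError("maximum is zero when the observed rate is 0 or 1")
    pct_of_max = 100 * score / maximum
    return pct_of_max, ("adequate" if pct_of_max < 5 else "poor")


def calibration_label(pred_rate, obs_rate):
    """Text's categories, using observed minus predicted in percentage points."""
    gap_pp = 100 * (obs_rate - pred_rate)
    if abs(gap_pp) < 2:
        return "well-calibrated"
    if 2 <= gap_pp <= 5:
        return "underestimated"
    if gap_pp > 5:
        return "very underestimated"
    # Over-prediction by >=2 points is not covered by the text's scheme (assumption).
    return "overestimated (unclassified)"


if __name__ == "__main__":
    # Total-cohort "any complication" rates from Results: predicted 2.74%, observed 9.54%.
    pred, obs = 0.0274, 0.0954
    score = brier_aggregate(pred, obs)
    pct, disc = discrimination_label(score, obs)
    print(f"Brier {score:.5f}; {pct:.2f}% of max ({disc}); {calibration_label(pred, obs)}")
```

Applied to the total-cohort rates, the sketch gives roughly 5.36% of maximum, classified as poor and very underestimated, close to the published 5.35%; the small difference presumably reflects rounding of the published rates.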
Results
Patient Characteristics
NSQIP Calculator and Operative Factors for Patients in the IS Cohort
NSQIP Calculator and Operative Factors for Patients in the NMS Cohort
Among patients with IS, the mean age was 14.1 ± 2.53 years, and the majority were female (71.8%, N = 32,472). Most patients were classified as ASA II (61.0%, N = 27,601), followed by ASA III (19.1%, N = 8637). The most common comorbid conditions included impaired cognitive status (10.8%, N = 4881), acquired abnormality (8.1%, N = 3647), and seizure disorder (3.2%, N = 1468). The majority of patients underwent 7-12 level fusions (58.6%, N = 26,512), followed by 13+ level fusions (28.1%, N = 12,715) and 0-6 level fusions (13.2%, N = 5984). A posterior surgical approach was used in the majority of cases (97.6%, N = 44,121).
In contrast, patients with NMS were younger, with a mean age of 12.9 ± 3.05 years, and 53.9% were female (N = 6895). Patients in this cohort demonstrated a substantially higher comorbidity burden. The most common conditions included impaired cognitive status (66.9%, N = 8559), seizure disorder (36.3%, N = 4651), acquired abnormality (35.8%, N = 4581), and nutritional support dependence (32.6%, N = 4173). Most patients were classified as ASA III (71.6%, N = 9160), followed by ASA II (17.2%, N = 2199). The majority underwent 13+ level fusions (64.7%, N = 8281), followed by 7-12 level fusions (20.3%, N = 2597) and 0-6 level fusions (15.0%, N = 1921). Similar to the IS cohort, the posterior approach predominated (97.7%, N = 12,506).
Fusion Level Performance
Predicted Percentages From the NSQIP Pediatric Surgical Risk Calculator and Observed Percentages From the Observed Cohort of Perioperative Outcomes, as Well as the Brier Score and Brier Max Between the Predicted and Observed Percentages Stratified by Fusion Level in the IS Cohort
Predicted Percentages From the NSQIP Pediatric Surgical Risk Calculator and Observed Percentages From the Observed Cohort of Perioperative Outcomes, as Well as the Brier Score and Brier Max Between the Predicted and Observed Percentages Stratified by Fusion Level in the NMS Cohort
For IS patients with 0-6 levels fused, the predicted risk of any complication was 2.8%, compared with an observed rate of 5.5%. The Brier score was 0.0007 (max: 0.052, % of max: 1.27), reflecting adequate discrimination, although the calculator underestimated absolute complication risk. Within this group, SSI had a predicted risk of 1.8% and an observed rate of 1.95% (Brier score: 0.000002, max: 0.019, % of max: 0.01), demonstrating adequate discrimination and well-calibrated individual-level prediction. Mortality was predicted at 0% with an observed rate of 0.05% (Brier score: 0.0000003, max: 0.0005, % of max: 0.05), reflecting adequate discrimination for this low-prevalence outcome. For IS patients with 7-12 levels fused, the predicted risk of any complication was 1.5%, compared with an observed rate of 3.11%. The Brier score was 0.00026 (max: 0.03, % of max: 0.87), indicating adequate discrimination, although predicted probabilities underestimated observed rates. SSI rates were nearly identical between prediction (1.0%) and observation (0.99%) (Brier score: 0.00000001, max: 0.01, % of max: 0), demonstrating adequate discrimination and well-calibrated individual-level prediction. Mortality remained low, with a predicted risk of 0% and an observed rate of 0.02% (Brier score: 0.00000004, max: 0.0002, % of max: 0.02), reflecting adequate discrimination for this rare outcome. In contrast, among IS patients with 13+ levels fused, the model substantially underestimated complication risk, predicting 2.9% against an observed rate of 10%. The Brier score was 0.005 (max: 0.09, % of max: 5.6), indicating poor discrimination and very underestimated absolute complication risk. However, SSI predictions remained well-calibrated (predicted: 1.7%, observed: 2.02%; Brier score: 0.00001, max: 0.02, % of max: 0.05), with close agreement between predicted and observed rates and adequate discrimination. Mortality again remained low, with a predicted risk of 0% and an observed rate of 0.06%, reflecting adequate discrimination for this rare outcome.
Predictive performance differed substantially for patients with NMS. For 0-6 levels fused, the predicted risk of any complication was 3.5%, whereas the observed rate was 14.6%. The Brier score was 0.0123 (max: 0.125, % of max: 9.86), indicating poor discrimination and very underestimated absolute complication risk. Despite this, individual complications such as SSI demonstrated more satisfactory individual-level prediction (predicted 1.9%, observed 3.8%; Brier score: 0.00036, % of max: 1.0), with adequate discrimination, though absolute rates remained underestimated. For 7-12 level fusions, the predicted complication risk was 2.0% while the observed rate was 12.2%, with a Brier score of 0.0104 (max: 0.107, % of max: 9.73), again reflecting poor discrimination and very underestimated absolute risk. SSI predictions showed adequate discrimination with closer agreement between predicted (1.3%) and observed (3.0%) rates (Brier score: 0.00029, % of max: 1.0), though absolute rates remained underestimated. For patients with 13+ levels fused, the discrepancy between predicted and observed complication risk was greatest. The model predicted a complication rate of 4.5%, whereas the observed rate was 30.4%. The Brier score was 0.067 (max: 0.212, % of max: 31.6), demonstrating poor discrimination and very underestimated absolute complication risk. Notably, readmission also demonstrated poor discrimination at this fusion level (% of max: 5.35). Despite this, several individual complications, including SSI (predicted 1.8%, observed 5.5%) and pneumonia (predicted 0.3%, observed 3.6%), yielded adequate discrimination by Brier score thresholds, though absolute rates were consistently underestimated.
Approach Performance
Predicted Percentages From the NSQIP Pediatric Surgical Risk Calculator and Observed Percentages From the Observed Cohort of Perioperative Outcomes, as Well as the Brier Score and Brier Max Between the Predicted and Observed Percentages Stratified by Approach in the IS Cohort
Predicted Percentages From the NSQIP Pediatric Surgical Risk Calculator and Observed Percentages From the Observed Cohort of Perioperative Outcomes, as Well as the Brier Score and Brier Max Between the Predicted and Observed Percentages Stratified by Approach in the NMS Cohort
Among IS patients undergoing posterior fusion, the predicted risk for any complication was 2.53% with an observed rate of 5.34%. The Brier score was 0.00079 (max: 0.0506, % of max: 1.57), reflecting adequate discrimination, though absolute complication risk was underestimated. SSI prediction showed adequate discrimination with close agreement between predicted (1.25%) and observed (1.41%) rates (Brier score: 0.000003, % of max: 0.02), reflecting well-calibrated individual-level prediction. Mortality remained rare with a predicted risk of 0% and an observed rate of 0.03%, reflecting adequate discrimination for this low-prevalence outcome. Among IS patients undergoing anterior approaches, the predicted complication risk was 2.7% compared to an observed rate of 6.33%. The Brier score was 0.00132 (max: 0.0593, % of max: 2.23), again indicating adequate discrimination with underestimation of overall absolute risk. SSI predictions similarly demonstrated adequate discrimination and close agreement between predicted (1.0%) and observed (1.19%) rates. However, calibration was notably poorer for pneumonia and unplanned intubation, where observed rates (0.73% for both) substantially exceeded predicted risks (0.17%), suggesting the model inadequately captures respiratory complication risk in anterior approaches.
For NMS patients undergoing posterior fusion, the predicted complication risk was 3.33%, whereas the observed rate was 24.3%. The Brier score was 0.04397 (max: 0.184, % of max: 23.9), indicating poor discrimination and very underestimated absolute complication risk. Similar systematic underestimation was observed for pneumonia and unplanned intubation with this approach. Among NMS patients undergoing anterior approaches, the predicted complication risk was 3.33% compared with an observed rate of 25.3%. The Brier score was 0.04826 (max: 0.189, % of max: 25.6), again reflecting poor discrimination and very underestimated absolute complication risk. Reoperation and readmission demonstrated particularly poor discrimination (% of max: 8.94 and 9.69, respectively), with observed rates of 11.9% and 12.6% substantially exceeding predictions.
Total Cohort Performance
Predicted Percentages From the NSQIP Pediatric Surgical Risk Calculator and Observed Percentages From the Observed Total Cohort of Perioperative Outcomes, as Well as the Brier Score and Brier Max Between the Predicted and Observed Percentages for the Total Cohort
Discussion
This study of 58,010 pediatric spinal deformity patients demonstrates that the ACS-NSQIP PSRC accurately predicts several individual 30-day postoperative complications, including SSI, pneumonia, and mortality, with generally good calibration. Predictive performance was strongest in IS patients undergoing 0-6 or 7-12 level fusions, where predicted and observed complication rates aligned closely, but the model underestimated overall complication risk in ≥13 level fusions. Performance was substantially poorer in NMS, where any complication was consistently underestimated across fusion levels despite acceptable calibration for individual complications. Predictive performance was otherwise similar across posterior and anterior approaches, with good calibration for most outcomes.
The increasing frequency of spinal deformity surgery in pediatric patients has placed significant pressure on healthcare systems, leading to rising expenditures driven by prolonged hospitalizations, intensive postoperative care, and high complication rates.9,14 Pediatric spinal deformity procedures, including IS and NMS corrections, are among the most resource-intensive surgeries, with complication rates exceeding 20% in high-risk cohorts.15,16 These complications, including SSIs and respiratory failure, may have significant lifelong implications, including impaired functional outcomes, delayed return to school and activities, and increased long-term healthcare utilization.2,16,17 Thus, risk prediction tools such as the PSRC must be validated so that clinicians can identify high-risk patients before surgery and implement targeted, patient-specific interventions (eg, perioperative nutritional optimization, respiratory prehabilitation, or multidisciplinary team coordination for children with complex comorbidities) to mitigate complications.11,18 Furthermore, unlike adults, children undergoing spinal deformity correction face unique developmental, physiological, and psychosocial considerations, necessitating risk prediction models tailored to their specific needs.
In addition to the ACS-NSQIP PSRC, multiple other risk calculators and risk stratification tools have been developed or adapted for use in pediatric spine surgery. For SSI, Matsumoto et al developed a validated, multi-institutional SSI risk calculator for pediatric spinal deformity surgery.7 The calculator incorporated factors such as nonambulatory status, neuromuscular etiology, pelvic instrumentation, operative time ≥7 hours, ASA grade >2, revision surgery, hospital volume, abnormal hemoglobin, and elevated BMI, achieving an AUC of 0.76. Among risk stratification tools, the Pediatric Scoliosis Infection Risk (PSIR) Score was derived from over 31,000 NSQIP Pediatric cases undergoing posterior arthrodesis for scoliosis/kyphosis and achieved an AUC of 0.74-0.78 for predicting 30-day postoperative infection.19 The Surgical Apgar Score (SAS) was originally developed for general surgery procedures but has recently been applied to and validated in patients with cerebral palsy undergoing spinal deformity correction, in whom lower scores were associated with higher complication rates; however, its overall discriminative ability was modest (AUC 0.65).20 Notably, no universally adopted, comprehensive calculator specific to pediatric spinal deformity correction exists; most tools are either general pediatric surgical risk calculators applied to spine cases, such as the ACS-NSQIP PSRC, or focus on a single outcome such as SSI.21
The underestimation of complications in complex cases with extensive fusions aligns with prior literature demonstrating that prediction tools often fail to accurately capture complications in severe neuromuscular scoliosis cases requiring extensive fusion.10,22–24 The approach-dependent variability supports McCarthy et al’s observation that anatomical factors in anterior approaches create distinct risk profiles not adequately captured by general calculators.25 Furthermore, the NSQIP calculator’s performance in pediatric spinal surgery should be considered alongside evidence from adult NSQIP calculator validation studies. Narain et al evaluated the adult ACS-NSQIP calculator in adult spinal procedures and found similar patterns of underestimation in high-complexity cases involving multiple vertebral levels and combined approaches.26 Revision status may also affect calculator performance, with poorer prediction in revision deformity cases; other pediatric risk calculators account for revision status to improve accuracy.7 These findings mirror our pediatric results, suggesting inherent limitations of both adult and pediatric calculators in capturing the full complexity of deformity surgery.
The validation of the ACS-NSQIP PSRC for pediatric spinal deformity surgery informs clinical practice across multiple domains: surgeons may place greater confidence in predictions for posterior approaches with fewer than 13 fusion levels but should exercise greater caution for anterior approaches and extensive constructs. From a systems perspective, the calculator enables risk-stratified care pathways that may reduce ICU utilization while maintaining outcomes, supporting a framework for targeted intervention bundles (eg, nutritional supplementation, pulmonary conditioning) to mitigate complications.27,28 Families of children with scoliosis often struggle with the complexity of surgical risks, and personalized, visualized risk data can improve the quality of informed consent while helping to align expectations regarding postoperative recovery and complications.29 Providing individualized, data-driven risk estimates has been shown to significantly improve family comprehension compared with general risk discussions.30 Although refinements are needed to enhance approach-specific predictions and long-term complication tracking, the calculator in its current form can support advances in risk-adjusted, value-based care for children undergoing spinal deformity surgery.
Limitations and Future Directions
The retrospective nature of this study introduces potential selection bias and documentation variability across NSQIP-participating centers. Additionally, the NSQIP database’s reliance on ICD/CPT codes may oversimplify complex spinal deformities, and its 30-day tracking window fails to capture late complications such as implant failures and junctional kyphosis.
Developing a more accurate risk stratification tool for pediatric spinal deformity surgery requires integrating deformity-specific, patient-level, and intraoperative variables into a unified framework. Fusion length is a key predictor with a clear dose-response relationship to complication risk, a relationship that coarse, categorical CPT codes fundamentally obscure.31 Pairing fusion length with curve magnitude (Cobb angle) and deformity etiology as co-predictors would allow a model to distinguish a 14-level posterior fusion for severe neuromuscular scoliosis from a 14-level fusion for rigid adolescent idiopathic scoliosis, which carry meaningfully different risk profiles despite identical CPT coding. Neuromuscular disorder subtypes should also be modeled as categorical or continuous variables to reflect the distinct risks and burdens of cerebral palsy, Duchenne muscular dystrophy, spinal muscular atrophy, and myelomeningocele.32 One way to operationalize this is to use existing functional classification systems, such as the Gross Motor Function Classification System (GMFCS) for cerebral palsy or the Expanded Disability Status Scale, both of which have demonstrated associations with surgical outcomes and are feasibly collected preoperatively.33 Nutritional status should likewise be quantified beyond binary support-dependence flags by incorporating serum albumin or prealbumin levels, BMI z-scores, and weight-for-age percentiles, all of which have been independently associated with SSI and respiratory complications in pediatric spine surgery and are routinely available in the preoperative setting.7 Pulmonary function parameters, such as forced vital capacity, should be incorporated as continuous predictors of pneumonia and unplanned intubation risk, given that restrictive lung disease is especially prevalent in advanced NMS and directly predicts postoperative respiratory failure.34 Additional variables that merit inclusion are revision surgery status and preoperative ambulatory classification.
From a modeling standpoint, these variables are well suited to gradient boosting, random forest, or other machine learning models that can capture nonlinear interactions that standard logistic regression does not accommodate without explicitly specified interaction terms. The PSRC was recently updated from a logistic regression model to a machine learning model; however, it remains trained on a broad surgical population rather than specifically on pediatric spine surgery.35 A tree-based ensemble model trained on ACS-NSQIP Pediatric data supplemented with spine-specific variables would allow automatic detection of high-risk subgroup combinations, such as NMS patients with serum albumin <3.0 g/dL planned for a 13+ level fusion; NMS with 13+ level fusion was the subgroup in which our data demonstrated the greatest miscalibration (observed complication rate 30.4% vs predicted 4.5%). Ultimately, such a model could be incorporated into prospective clinical decision support tools that generate individualized risk estimates at the time of surgical planning, supporting both surgical decision-making and family counseling with a level of specificity and accuracy that the current PSRC cannot provide.
Conclusion
The ACS-NSQIP Pediatric Surgical Risk Calculator demonstrates generally strong predictive performance for several 30-day postoperative complications following pediatric spinal deformity surgery, including surgical site infection, pneumonia, and mortality. However, the calculator underestimates overall complication risk in higher-complexity cases, particularly among patients undergoing extensive (13 or more level) fusions and those with neuromuscular scoliosis. Despite these limitations, the tool remains useful for preoperative risk stratification and perioperative counseling. Future refinements incorporating deformity-specific and neuromuscular variables, as well as measures of surgical complexity, may improve predictive accuracy and better inform procedure-specific risk assessment for pediatric spinal deformity surgery.
Footnotes
Ethical Considerations
This study was determined to be exempt from Institutional Review Board (IRB) review as it utilized de-identified patient data from the American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP) database. No direct patient contact or identifiable private information was involved.
Consent to Participate
Not applicable. Patient consent was not required for use of de-identified database records.
Funding
The authors did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Declaration of Conflicting Interests
The authors certify that there are no financial or non-financial conflicts of interest regarding the material discussed in the manuscript.
Data Availability Statement
The data that support the findings of this study are available from the American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP). Restrictions apply to the availability of these data, which were used under license for the current study, and so they are not publicly available. Data are, however, available from the authors upon reasonable request and with permission from the ACS-NSQIP.
