Abstract
Objective
The aims of this study were to develop and validate interpretable machine learning (ML) models for predicting extended length of stay (eLOS) after endoscopic decompression for lumbar spinal stenosis (LSS), and to identify modifiable risk factors influencing healthcare costs and recovery.
Methods
A prospective-retrospective cohort of 350 patients (2019–2025) undergoing single-level endoscopic decompression was analyzed. The eLOS was defined as >9 days via classification and regression tree (CART) analysis. Predictors included demographics (age, BMI), comorbidities (osteoporosis, hypertension), surgical parameters, and hospitalization costs. Seven ML models (XGBoost, Lasso Regression, CNN, etc.) were trained using stratified 70:30 splits, SMOTE balancing, and Bayesian hyperparameter tuning. Model performance was evaluated via AUC-ROC, F1-score, and SHAP interpretability.
Results
Patients in the eLOS group (n = 135) were older (56.3 vs. 48.6 years, p < 0.001) and had higher rates of osteoporosis (23% vs. 3.7%, p < 0.001) and hypertension (33.3% vs. 14.0%, p < 0.001). On the original data, Gradient Boosting Machines (AUC = 0.96), XGBoost (AUC = 0.99), and Lasso Regression (AUC = 1.00) outperformed the other models and identified L4/L5 involvement, heart rate, age, osteoporosis, and hypertension as the top predictors. After cross-validation, CNN (Accuracy = 0.75, AUC = 0.89) and XGBoost (Accuracy = 0.69, AUC = 0.85) retained the best performance. Hospitalization costs were 13% higher in eLOS patients (p = 0.02).
Conclusion
This study establishes the first ML-driven framework for eLOS prediction in endoscopic LSS surgery, emphasizing age-related comorbidities over procedural factors. The integration of economic and clinical data enables actionable risk mitigation, supporting value-based care initiatives. Future multicenter studies should validate these models across diverse healthcare systems.
Keywords
Introduction
Lumbar spinal stenosis (LSS), a degenerative condition characterized by anatomical narrowing of the spinal canal, results in compression of neural and vascular structures, leading to neurogenic claudication and progressive functional decline.1,2 Global aging trends have precipitated a 34% rise in symptomatic LSS cases over the past decade, with projections indicating that 10% of adults over 60 will require surgical intervention for refractory symptoms.3,4 This demographic shift represents a mounting healthcare challenge, with annual healthcare expenditures for LSS exceeding $13.8 billion globally and expected to double by 2030.
Endoscopic spinal decompression has emerged as a paradigm-shifting technique that fundamentally challenges traditional open surgical approaches. Unlike conventional laminectomy, which requires extensive paraspinal muscle dissection and bone removal, endoscopic techniques employ minimally invasive approaches through small working channels (typically 7–8 mm) to achieve targeted neural decompression.5,6 This technological advancement represents the convergence of high-definition optics, specialized instrumentation, and refined surgical techniques that preserve spinal stability while achieving comprehensive decompression.
Endoscopic techniques, including single-channel (uniportal) and dual-channel (biportal) decompression, demonstrate substantial clinical advantages over traditional open procedures. (1) Reduced Surgical Trauma: Endoscopic approaches reduce intraoperative blood loss by 58% (mean 45 ml vs. 178 ml), minimize soft tissue disruption, and preserve paraspinal muscle integrity, leading to significantly reduced postoperative pain and accelerated functional recovery.7,8 (2) Enhanced Recovery Profiles: Median length of stay decreases by 3.1 days compared to traditional open laminectomy, with 73% of patients achieving same-day or next-day discharge versus 21% in open procedures.7 This accelerated recovery translates to earlier return to activities of daily living and reduced healthcare resource utilization. (3) Preserved Spinal Biomechanics: By maintaining the integrity of posterior spinal elements (spinous processes, interspinous ligaments, and facet joint capsules), endoscopic techniques preserve spinal stability while achieving equivalent decompression efficacy, reducing long-term adjacent segment degeneration risk by approximately 40%.8 (4) Superior Visualization: High-definition endoscopic optics provide magnified, illuminated visualization of neural structures, enabling precise identification of pathological anatomy and targeted decompression while minimizing inadvertent neural injury.
Conventional risk stratification tools in spine surgery rely primarily on simple demographic and anatomical factors and achieve modest predictive accuracy (typically AUC 0.65–0.75) when applied to complex postoperative outcomes.9,10 These traditional approaches fail to capture the nonlinear interactions between patient factors, surgical variables, and institutional characteristics that influence recovery trajectories in the minimally invasive setting. Machine learning (ML) has emerged as a transformative paradigm for risk stratification in spine surgery, offering superior performance over conventional regression models by integrating complex, nonlinear relationships between clinical, radiographic, and operational variables.11,12 Recent studies leveraging ML frameworks such as XGBoost and LASSO regularization have achieved AUCs >0.88 in predicting complications and readmissions in various spine surgery applications, yet their application to length of stay prediction following endoscopic procedures remains largely unexplored.13 The transition from open to endoscopic techniques fundamentally alters the risk architecture for outcome prediction: traditional surgical factors (blood loss, operative time, tissue trauma) become less predictive, while patient-specific factors (comorbidity burden, anatomical complexity, physiological reserve) assume greater importance.14
This study addresses a critical gap in endoscopic spine surgery by developing the first comprehensive ML framework specifically designed to predict extended length of stay (eLOS) following endoscopic lumbar decompression. Using a prospective-retrospective cohort of 350 patients treated between February 2019 and January 2025, we evaluated the combined predictive value of clinical variables (e.g. body mass index [BMI], comorbidity burden), surgical parameters (e.g. operative time, decompression laterality), and hospitalization cost categories. Our investigation uniquely integrates clinical variables with real-world economic data to identify actionable drivers of prolonged hospitalization, challenging the traditional emphasis on surgical factors alone. By incorporating detailed hospitalization cost categories (diagnostic, medical services, rehabilitation, examination costs) alongside traditional clinical predictors, we establish a novel paradigm that bridges clinical outcome prediction with health economics. This integration enables identification of modifiable institutional factors that influence recovery trajectories, providing actionable insights for healthcare administrators and clinicians.
Methods
Study design, setting, and population
Study Design: This prospective-retrospective cohort study developed and validated ML models to predict eLOS following endoscopic decompression for LSS. The study period (February 2019–January 2025) was designed to capture longitudinal data from both completed cases (2019–2024) and ongoing prospective follow-up (2024–2025), ensuring robust temporal validation of predictors. Ethical approval was obtained from the Institutional Review Board (IRB) of Chongqing Hechuan District People's Hospital (CQZR-2025017), which granted a waiver of informed consent for the retrospective analysis. The study adhered to the STROBE guidelines14 and the TRIPOD-AI framework for transparent ML reporting.15
Study Setting: The study was conducted at a tertiary academic spine center performing >500 endoscopic decompressions annually. All procedures utilized a standardized workflow:
Preoperative: MRI/CT confirmation of central/lateral recess stenosis; multidisciplinary evaluation for comorbidities.
Intraoperative: Endoscopic decompression via an interlaminar or transforaminal approach under general anesthesia.
Postoperative: Protocol-driven mobilization within 6 hours, with discharge criteria including independent ambulation and pain control (VAS <4/10).
Study Population: From an initial pool of 412 patients, 350 met the eligibility criteria after exclusions (the flow diagram of the research process is shown in Figure 1).

The flow diagram of the research process. LOS: length of stay; ROC: receiver operating characteristic; AUC: area under the receiver operating characteristic curve.
Inclusion criteria
Age ≥18 years with LSS confirmed by MRI/CT (central canal diameter <10 mm) and correlative neurogenic claudication/radiculopathy.16
Primary single-level endoscopic decompression (single-channel or biportal technique).
Complete preoperative records: Oswestry Disability Index (ODI), visual analog scale (VAS) for back/leg pain, and hospitalization cost breakdown (examination, rehabilitation, medical services, diagnostics).
Exclusion criteria
Age <18 years.
Revision surgery or multilevel decompression (>1 level).
Incomplete cost data or loss to follow-up precluding LOS determination.
Non-degenerative etiologies (trauma, tumor, infection).
LOS Stratification: The eLOS threshold (>9 days) was determined through iterative classification and regression tree (CART) analysis, which identified 9 days as the optimal cutoff maximizing sensitivity (82%) and specificity (76%) for adverse resource utilization outcomes.
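The cutoff search described above can be sketched as a one-split CART (a decision stump) over LOS. This is a minimal illustration under stated assumptions, not the study's code: the binary label for an adverse resource-utilization outcome, the toy data, and the function names are hypothetical. The split is chosen by weighted Gini impurity (the CART criterion), and sensitivity and specificity are then read off at that cutoff.

```python
# Hypothetical sketch: a one-split CART (decision stump) over length of stay.
# Assumption: each patient carries a binary label, 1 = adverse resource-utilization outcome.


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_los_cutoff(los_days, outcome):
    """Return (cutoff, sensitivity, specificity) for the rule 'LOS > cutoff'."""
    pairs = list(zip(los_days, outcome))
    best_impurity, best_cut = None, None
    # Candidate cutoffs: every distinct LOS value except the largest.
    for cut in sorted(set(los_days))[:-1]:
        left = [y for x, y in pairs if x <= cut]
        right = [y for x, y in pairs if x > cut]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if best_impurity is None or impurity < best_impurity:
            best_impurity, best_cut = impurity, cut
    tp = sum(1 for x, y in pairs if x > best_cut and y == 1)
    fn = sum(1 for x, y in pairs if x <= best_cut and y == 1)
    tn = sum(1 for x, y in pairs if x <= best_cut and y == 0)
    fp = sum(1 for x, y in pairs if x > best_cut and y == 0)
    return best_cut, tp / (tp + fn), tn / (tn + fp)


# Toy data, invented for illustration only.
los = [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
adverse = [0, 0, 0, 0, 0, 1, 1, 0, 1, 1]
cut, sens, spec = best_los_cutoff(los, adverse)
print(cut, sens, round(spec, 2))  # 9 1.0 0.83
```

A library tree (for example scikit-learn's `DecisionTreeClassifier` with `max_depth=1`) performs the same kind of search; the study's "iterative" CART procedure may differ from this single split.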
Data collection
Clinical data were extracted from the hospital's electronic medical record (EMR) system using a structured query protocol aligned with the Observational Medical Outcomes Partnership (OMOP) Common Data Model.17 Two trained researchers independently collected:
Demographics: Age, sex, BMI (categorized as underweight [<18.5], normal [18.5–24.9], overweight [25–29.9], obese [≥30] kg/m²).
Clinical Metrics: Preoperative back/leg pain VAS (0–10 scale); Oswestry Disability Index (ODI v2.1a) assessed ≤7 days preoperatively.
Comorbidities: Hypertension (ICD-10 I10), diabetes mellitus (E11), osteoporosis (M81.8).
Surgical Parameters: Decompression laterality (unilateral vs. bilateral); operative time (incision to closure); endoscopic approach (single-channel vs. dual-channel).
Hospitalization Costs (CNY): Diagnostic (preoperative imaging [MRI/CT], laboratory tests); Medical Services (surgeon/anesthesia fees, operating room utilization); Rehabilitation (physical therapy sessions); Examination (intraoperative neurophysiological monitoring).
Discrepancies in data entry (4.7% of records) were resolved by consensus with a senior spine surgeon. The relative improvement in VAS back/leg pain was calculated as (preoperative VAS score − VAS score on postoperative day 3) / preoperative VAS score.
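The two derived variables above, BMI category and relative VAS improvement, can be written as small helper functions. This is a hypothetical sketch: the function names are illustrative, values falling between the published category bounds (for example 24.95) are assigned using half-open intervals as an assumption, and a preoperative VAS of 0 is treated as undefined.

```python
def bmi_category(bmi):
    """Map BMI (kg/m^2) to the study's four categories.

    Assumption: half-open intervals [18.5, 25) and [25, 30) close the small
    gaps between the published bounds (e.g. 24.9 vs. 25).
    """
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal"
    if bmi < 30.0:
        return "overweight"
    return "obese"


def vas_improvement(pre_vas, post_day3_vas):
    """Relative improvement: (preoperative VAS - day-3 VAS) / preoperative VAS."""
    if pre_vas == 0:
        raise ValueError("relative improvement is undefined when preoperative VAS is 0")
    return (pre_vas - post_day3_vas) / pre_vas


print(bmi_category(27.3), vas_improvement(8, 2))  # overweight 0.75
```

A negative value indicates that pain on postoperative day 3 was worse than before surgery.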
Statistical analysis
Continuous variables were assessed for normality using Shapiro–Wilk tests.
Normally distributed: Mean ± SD (e.g. age, BMI).
Non-normal: Median [IQR] (e.g. operative time).
Categorical: Counts (%) (e.g. smoking, alcohol use, BMI overweight).
Univariate comparisons between the eLOS and non-eLOS cohorts:
Parametric: Independent t-test (e.g. BMI).
Non-parametric: Mann–Whitney U test (e.g. VAS scores).
Categorical: χ² or Fisher's exact test.
Variables with p < 0.05 in univariate analysis were retained for ML modeling. Analyses were performed in SPSS 27.0 (IBM) with syntax auditing to ensure reproducibility.
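The univariate screening step can be illustrated for a binary variable with a Pearson χ² test on a 2 × 2 table. This is a minimal sketch with hypothetical helper names. The osteoporosis counts below are back-calculated from the reported percentages (about 31 of 135 eLOS and 8 of 215 non-eLOS patients) and are an approximation, not the study's raw table. For one degree of freedom, the upper-tail p-value equals erfc(√(χ²/2)). Fisher's exact test would replace this when expected counts are small.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].

    Rows are groups (eLOS, non-eLOS); columns are feature present/absent.
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # Chi-square(1) survival function: P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


def retain_for_modeling(p_values, alpha=0.05):
    """Keep variables whose univariate p-value is below alpha."""
    return [name for name, p in p_values.items() if p < alpha]


# Approximate osteoporosis counts back-calculated from 23% of 135 and 3.7% of 215.
stat, p = chi_square_2x2([[31, 104], [8, 207]])
print(round(stat, 1), p < 0.001)  # 31.0 True
```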
Machine learning modeling algorithms
Seven ML models were employed to evaluate predictors of LOS in the original data: XGBoost (v2.0.3), Gradient Boosting Machine (GBM), Random Forest (RF) (scikit-learn 1.5.1), Logistic Regression (LR), Lasso Logistic Regression (LLR; α = 0.01), Convolutional Neural Network (CNN), and Deep Neural Network (DNN; 3 hidden layers, ReLU activation). Note: CNN was excluded due to tabular data structure.
Workflow
Data Splitting: Stratified 70% training : 30% testing split to preserve class distribution.
Class Balancing: SMOTE applied to the training set only (eLOS prevalence: 38.6%).
Hyperparameter Tuning: Bayesian optimization (100 iterations) for XGBoost (learning rate: 0.01–0.3; max_depth: 3–12).
Interpretability: SHAP (v0.45.1) summary plots generated on held-out test data.
Validation: Nested 10-fold cross-validation (AUC reported as mean ± SD).
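The ordering in this workflow matters: the test set is split off first, and SMOTE only ever sees training rows, so no synthetic point is derived from test patients. The sketch below is a simplified, dependency-free illustration under assumptions (class 1 = eLOS is the minority class, Euclidean neighbours, hypothetical function names). It is not the imbalanced-learn implementation and omits the Bayesian tuning and nested cross-validation steps.

```python
import random


def stratified_split(labels, test_frac=0.3, seed=0):
    """Return (train_idx, test_idx) with the class ratio preserved in each part."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def smote(minority, n_new, k=5, seed=0):
    """Basic SMOTE: interpolate between a minority sample and a random one
    of its k nearest minority neighbours (needs >= 2 minority samples)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: sum((a - b) ** 2 for a, b in zip(base, minority[j])))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def balance_training_set(X, y, train_idx, seed=0):
    """Oversample class 1 in the training rows only; test rows are untouched."""
    X_tr = [X[i] for i in train_idx]
    y_tr = [y[i] for i in train_idx]
    minority = [x for x, label in zip(X_tr, y_tr) if label == 1]
    deficit = max(y_tr.count(0) - len(minority), 0)
    new_rows = smote(minority, deficit, seed=seed)
    return X_tr + new_rows, y_tr + [1] * len(new_rows)


# Toy example: 20 patients, 8 with eLOS (40%); features are invented.
X = [[float(i), float(i % 3)] for i in range(20)]
y = [1] * 8 + [0] * 12
train_idx, test_idx = stratified_split(y)
X_bal, y_bal = balance_training_set(X, y, train_idx)
print(len(test_idx), y_bal.count(1), y_bal.count(0))  # 6 8 8
```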
Performance metrics
Primary: AUC-ROC (macro-averaged).
Secondary: F1-score, precision-recall AUC (PR-AUC).
Calibration: Brier score.
Code reproducibility was ensured via Anaconda (env.yaml) and Git version control.
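For reference, the primary, secondary, and calibration metrics above can be computed directly from their definitions. This sketch uses hypothetical function names and toy predictions. It computes ROC AUC through its Mann–Whitney interpretation (the probability that a randomly chosen eLOS patient receives a higher score than a randomly chosen non-eLOS patient), F1 at an assumed 0.5 threshold, and the Brier score; PR-AUC is omitted for brevity.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly;
    tied scores count one half."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, y_true)) / len(y_true)


y_true = [1, 1, 0, 0]
probs = [0.9, 0.4, 0.6, 0.1]
y_pred = [1 if p >= 0.5 else 0 for p in probs]  # assumed 0.5 threshold
print(roc_auc(y_true, probs), f1_score(y_true, y_pred), round(brier_score(y_true, probs), 3))
# 0.75 0.5 0.185
```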
Results
Population and patient characteristics
A total of 350 patients were included in the final analysis; the mean length of stay (LOS) was 8.68 days. Using the >9-day threshold, patients were categorized into two groups: those with LOS ≤9 days (non-eLOS group, n = 215) and those with LOS >9 days (eLOS group, n = 135). Table 1 compares baseline characteristics between these groups. Univariate analysis revealed significant differences (p < 0.05) in age, BMI, BMI overweight status, admission heart rate, osteoporosis, hypertension, diabetes mellitus, unilateral and bilateral decompression, L4/L5 and L5/S1 as the responsible segments, operative time for L4/L5 segments, and all hospitalization cost categories except surgical and traditional Chinese medicine (TCM) costs.
Baseline data comparison.
In general data, age, BMI, BMI overweight status, osteoporosis, hypertension, and diabetes mellitus differed significantly between the two groups. In terms of treatment, the choice of unilateral or bilateral decompression, L4/L5 and L5/S1 as the responsible segments, and operative time for L4/L5 also differed significantly. In terms of cost, all indicators except surgical and TCM costs differed significantly between the groups. BMI: body mass index; TCM: traditional Chinese medicine. All fees are in Chinese yuan (100 CNY ≈ 14 USD).
Bold indicates statistical significance (p < 0.05).
We then performed univariate and multivariable regression analyses of the indicators that differed significantly. Table 2 shows the regression coefficients and 95% confidence intervals for each indicator in the SPSS regression models. In the multivariable analysis, BMI overweight status, heart rate, osteoporosis, hypertension, and L4/L5 as the responsible segment were statistically significant.
In univariate regression, BMI overweight status, heart rate, and L4/L5 as the responsible segment were negatively associated with eLOS, whereas the remaining indicators were positively associated.
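Assuming the models in Table 2 are binary logistic regressions, the sign of each coefficient maps directly onto an odds ratio. A negative coefficient gives an odds ratio below 1 (lower odds of eLOS), and a Wald 95% confidence interval that excludes 1 corresponds to a two-sided Wald p < 0.05. The sketch below uses invented coefficient and standard-error values, not those in Table 2, and a hypothetical function name.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald 95% CI from a logistic-regression coefficient.

    OR = exp(beta); CI = exp(beta - z * se) to exp(beta + z * se).
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Invented values for illustration: a negative coefficient, as reported for heart rate.
or_hr, lo_hr, hi_hr = odds_ratio_ci(beta=-0.5, se=0.2)
print(round(or_hr, 2), round(lo_hr, 2), round(hi_hr, 2))  # 0.61 0.41 0.9
```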
CORRELATION
When only preoperative indicators were included (multivariable regression equation 1), BMI overweight status, heart rate, and osteoporosis were significantly associated with eLOS (p < 0.05). After all indicators were included (multivariable regression equation 2), heart rate, osteoporosis, hypertension, and L4/L5 as the responsible segment were significantly associated with eLOS (p < 0.05), and only heart rate showed a negative association.
Bold indicates statistical significance (p < 0.05).
Building and evaluating ML models in original data
Figure 2 displays the ROC curves, and Table 3 presents the performance metrics for each model. The tree-based models RF (Accuracy = 0.91, AUC = 0.99) and XGBoost (Accuracy = 0.95, AUC = 0.99), together with Lasso Logistic Regression (Accuracy = 0.95, AUC = 1.00), demonstrated superior performance. The GBM model (Accuracy = 0.97, AUC = 0.96) also performed well. The perfect discrimination of Lasso Regression (AUC = 1.00) warrants evaluation for overfitting.

Receiver operating characteristic curves for the machine learning models in the original data. Tree-based ensemble methods (XGBoost, RF, GBM) and Lasso Regression outperform the deep learning approaches (CNN, DNN) for this clinical prediction task.
Evaluation of machine learning models in the original data.
We found that XGB, RF and Lasso logistic regression are the three best performing models.
Feature importance
We determined the five most important features for each of the three best-performing models (RF, Lasso LR, and XGB). Figure 3 visualizes the feature importance, and Table 4 lists the weights of these features. By combining these models, we identified the top five features contributing to eLOS: L4/L5 as the responsible segment, age, heart rate, weight, and osteoporosis (Figure 4).
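The text does not specify how the three models' importances were combined. One plausible scheme, shown here purely as a hypothetical sketch, rescales each model's absolute importances to sum to 1 (so tree-based importances and signed Lasso coefficients become comparable) and averages them across models, treating a feature absent from a model as 0. The feature names and numbers are invented.

```python
def combine_importances(per_model, top_n=5):
    """Rank features by mean normalised absolute importance across models.

    per_model maps a model name to {feature: importance}. Features missing
    from a model contribute 0 for that model.
    """
    totals = {}
    for importances in per_model.values():
        scale = sum(abs(v) for v in importances.values())
        if scale == 0:
            continue  # a model with all-zero importances adds nothing
        for feature, value in importances.items():
            totals[feature] = totals.get(feature, 0.0) + abs(value) / scale
    n_models = len(per_model)
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [(feature, total / n_models) for feature, total in ranked[:top_n]]


# Invented importances for two hypothetical models.
models = {
    "model_a": {"age": 2.0, "heart_rate": 2.0},
    "model_b": {"age": 1.0, "l4_l5": -4.0},  # Lasso-style signed coefficient
}
print(combine_importances(models, top_n=3))
```

Other reasonable choices, such as averaging ranks instead of rescaled values, can produce a different ordering.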

Top five features by importance in the three best models (RF, Lasso LR, and XGB). In the RF model, they are heart rate, age, VAS leg pain improvement, weight, and VAS back pain. In the Lasso LR model, they are L4/L5 as the responsible segment, osteoporosis, hypertension, age, and heart rate. In the XGB model, they are weight, L4/L5 as the responsible segment, age, hypertension, and osteoporosis.

Top five features by importance in the ensemble model (GBM + Lasso Logistic Regression + XGB): L4/L5 as the responsible segment, age, heart rate, weight, and osteoporosis.
Weights of the top five features in the three best models: RF, Lasso LR, and XGB.
Table 4 supports a risk stratification framework by distinguishing non-modifiable factors (age, anatomical level [L4/L5]), modifiable factors (weight, hypertension management), and screening priorities (osteoporosis assessment, cardiovascular optimization). The ensemble weighting provides a data-driven foundation for the proposed clinical decision support tool and indicates that anatomical complexity (L4/L5) combined with patient physiological factors drives eLOS risk more than traditional surgical variables.
Evaluation of ML models after 10-fold cross-validation
To further assess model stability, we performed 10-fold cross-validation using the 70% training set (model training and tuning) and the 30% test set (final performance validation). Figure 5 shows the ROC curves, and Table 5 presents the performance metrics after cross-validation. Model performance decreased compared with the original-data results. The CNN model (Accuracy = 0.75, AUC = 0.89) performed best, followed by the tree-based XGB model (Accuracy = 0.69, AUC = 0.85).
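The nested 10-fold scheme named in the Methods can be sketched as index generation. An outer stratified 10-fold loop estimates performance, and an inner loop, run only on each outer training portion, would be used for hyperparameter tuning. This is a hypothetical, dependency-free sketch: the inner fold count (5), the seed, and the function names are assumptions, and model fitting is left out. With the study's class sizes (135 eLOS, 215 non-eLOS), each outer fold holds 35 patients with 13 or 14 eLOS cases.

```python
import random
import statistics


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, val_idx) for k folds that preserve the class ratio."""
    rng = random.Random(seed)
    order = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        order.extend(idx)
    # Deal class-grouped indices round-robin so every fold receives each class.
    folds = [[] for _ in range(k)]
    for pos, i in enumerate(order):
        folds[pos % k].append(i)
    for f in range(k):
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, sorted(folds[f])


def nested_splits(labels, outer_k=10, inner_k=5, seed=0):
    """Yield (outer_train, outer_test, inner_splits); inner splits index the
    original data but only ever use outer-training patients."""
    for outer_train, outer_test in stratified_kfold(labels, outer_k, seed):
        inner_labels = [labels[i] for i in outer_train]
        inner = [
            ([outer_train[i] for i in tr], [outer_train[i] for i in va])
            for tr, va in stratified_kfold(inner_labels, inner_k, seed)
        ]
        yield outer_train, outer_test, inner


def summarize(fold_aucs):
    """Report per-fold AUCs as (mean, SD)."""
    return statistics.mean(fold_aucs), statistics.stdev(fold_aucs)


labels = [1] * 135 + [0] * 215
outer = list(stratified_kfold(labels, k=10))
print([sum(labels[i] for i in val) for _, val in outer])
# [13, 13, 13, 13, 13, 14, 14, 14, 14, 14]
```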

Receiver operating characteristic curves for the machine learning models after cross-validation. The CNN showed the best cross-validated performance, making it the leading candidate for clinical implementation. Its AUC of 0.89 means that, for a randomly chosen pair of one eLOS and one non-eLOS patient, the model assigns the higher predicted risk to the eLOS patient about 89% of the time. This level of discrimination supports the proposed risk stratification thresholds (<30%, 30–70%, >70%).
Evaluation of machine learning models after 10-fold cross-validation.
The substantial decline from the original-data results (XGBoost AUC 0.99→0.85; Lasso AUC 1.00→0.77) indicates considerable overfitting in the development phase and shows that the validation procedure was effective in exposing it.
With an AUC of 0.89, the CNN ranks a randomly chosen eLOS patient above a randomly chosen non-eLOS patient about 89% of the time, a level of discrimination that is clinically actionable for risk stratification (the proposed <30%, 30–70%, and >70% thresholds), resource allocation, and preoperative optimization decisions. Cross-validation provides the most honest assessment of clinical utility, so the initial high AUCs should be regarded as development-phase findings with appropriate caveats about overfitting. The CNN's cross-validated AUC of 0.89 represents robust, clinically applicable performance that could meaningfully improve patient care and resource allocation in endoscopic spine surgery.
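Applying the proposed thresholds amounts to binning each patient's predicted eLOS probability. The sketch below is hypothetical: the tier names and the handling of the exact boundaries (30% and 70% assigned to the middle tier) are assumptions. Thresholds on probabilities are only meaningful if the model is reasonably calibrated, which is what the Brier score in the Methods is meant to check.

```python
def risk_tier(prob):
    """Bin a predicted eLOS probability into the proposed three tiers.

    Assumed boundaries: < 0.30 low, 0.30-0.70 intermediate, > 0.70 high.
    """
    if not 0.0 <= prob <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if prob < 0.30:
        return "low"
    if prob <= 0.70:
        return "intermediate"
    return "high"


print([risk_tier(p) for p in (0.12, 0.30, 0.55, 0.83)])
# ['low', 'intermediate', 'intermediate', 'high']
```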
Discussion
The integration of ML into spine surgery represents a paradigmatic shift from traditional statistical approaches toward sophisticated computational frameworks capable of capturing the complex, non-linear relationships inherent in musculoskeletal pathophysiology. Our study contributes to this evolving landscape by demonstrating the superior predictive capability of gradient boosting ensembles (XGBoost and Gradient Boosting Machines) and Lasso Regression in forecasting eLOS following endoscopic lumbar decompression.
Algorithmic architecture and spine-specific applications
The spine surgery domain presents unique challenges for ML implementation that distinguish it from other surgical specialties: spinal pathology involves multifactorial interactions between biomechanical, neurological, and psychosocial variables that traditional linear models struggle to capture.14,15 Tree-based algorithms, particularly XGBoost and Random Forest, are well suited to this complexity because they handle mixed data types and detect feature interactions automatically without explicit preprocessing.18
Our XGBoost model's exceptional performance in the original data (AUC = 0.99) builds upon recent spine literature demonstrating the superiority of tree-based models in predicting surgical complications. Li et al.10 achieved similar discriminative performance when predicting prolonged operative time in posterior lumbar interbody fusion using ML approaches. The consistency of these findings across different surgical approaches suggests that gradient boosting methods may represent the optimal algorithmic family for spine surgery prediction tasks, particularly compared with the traditional statistical methods that have dominated spine surgery outcome research.18,19
The superiority of our ensemble approach over individual algorithms reflects the complex nature of LSS pathophysiology described in recent systematic reviews.1,2 The heterogeneous presentation of neurogenic claudication and progressive functional decline requires sophisticated modeling approaches that can capture non-linear relationships between patient factors, surgical variables, and recovery trajectories. This complexity has driven the evolution from simple regression models to advanced ML frameworks in spine surgery applications. 20
Endoscopic surgery: a new paradigm requiring novel predictive approaches
The transition from open decompression laminectomy to endoscopic techniques has fundamentally altered the risk architecture for eLOS prediction.7,8 While traditional open surgery literature emphasized factors such as blood loss, transfusion requirements, and extensive tissue disruption,19,21 our endoscopic cohort demonstrated different risk patterns centered on age-related comorbidities and anatomical factors rather than procedural complexity.
This paradigm shift aligns with recent meta-analyses comparing endoscopic versus open approaches.2,22 Chin et al.2 demonstrated that full-endoscopic techniques significantly reduce operative trauma while maintaining decompression efficacy, but noted persistent variability in recovery patterns that traditional risk assessment tools fail to capture. Our ML approach addresses this gap by identifying subtle interactions between patient factors that become more prominent in the minimally invasive setting.
The 58% reduction in intraoperative blood loss and the 3.1-day median LOS improvement documented in the endoscopic literature7 create a compressed timeframe in which traditional risk factors may carry different predictive weights. Our models' identification of osteoporosis, hypertension, and L4/L5 involvement as primary predictors reflects this new risk landscape, where metabolic and anatomical factors assume greater importance than procedural variables.
Feature engineering and clinical Variable integration
The emergence of L4/L5 involvement as a leading predictor deserves particular attention within the context of LSS pathophysiology.23–25 This anatomical specificity reflects biomechanical realities well documented in the spine literature: the L4/L5 segment, located just above the transition from the mobile lumbar spine to the relatively fixed sacrum, is highly mobile and heavily loaded, which creates particular susceptibility to degenerative changes that can complicate surgical recovery.26
Recent reviews of LSS treatment principles27–30 emphasize the critical importance of anatomical level in determining surgical outcomes. Lee et al.31 describe how L4/L5 pathology often involves more complex stenotic patterns requiring extensive decompression, which may contribute to prolonged recovery even with endoscopic approaches. Our ML models capture such domain-specific patterns without requiring explicit biomechanical modeling, a significant advantage over traditional statistical approaches.
The integration of osteoporosis as a key predictor aligns with emerging understanding of bone quality's impact on spine surgery outcomes.18,32 While traditional spine surgery focused primarily on mechanical decompression, the recognition of osteoporosis as an independent risk factor for prolonged recovery reflects the growing appreciation for systemic factors in surgical outcomes. This finding supports recent calls for comprehensive preoperative bone health assessment in spine surgery patients.33,34
Machine learning model performance and validation in spine surgery
Our study's performance metrics must be interpreted within the broader context of ML applications in spine surgery outcome prediction.27–30 The achievement of AUC values >0.95 in development phases, while impressive, requires careful validation to ensure clinical applicability. The performance degradation observed post-cross-validation (XGBoost AUC declining from 0.99 to 0.85) reflects a common pattern in spine ML literature and underscores the importance of rigorous validation methodologies.
Recent systematic reviews of artificial intelligence applications in LSS diagnosis and treatment27,30 highlight the rapid evolution of ML approaches in this domain. Yang et al.27 demonstrated that AI-based diagnostic tools for lumbar stenosis achieve high accuracy rates, but noted significant variability in validation approaches across studies. Our nested cross-validation strategy addresses these concerns by providing more realistic performance estimates that better reflect real-world deployment scenarios.
The superior performance of tree-based models (Random Forest, XGBoost, Gradient Boosting) over deep learning approaches (CNN, DNN) in our original-data analysis aligns with patterns observed in other spine surgery ML applications.28–35 Schönnagel et al.28 reported similar findings when developing ML models for lumbar spinal fusion outcomes, noting that ensemble methods consistently outperformed neural networks on typical clinical datasets. This pattern likely reflects the structured nature of clinical spine data, where explicit feature relationships matter more than the complex pattern recognition capabilities that make deep learning excel in image analysis.36
Integration of economic variables: a novel approach in spine ML
Our integration of hospitalization cost data as predictive features represents a methodological innovation in spine surgery ML literature. Most existing studies focus exclusively on clinical variables, potentially missing important socioeconomic and institutional factors that influence recovery trajectories.35,36 The 13% cost differential between eLOS and standard recovery groups in our cohort suggests that financial variables serve as proxies for unmeasured complexity—perhaps reflecting the need for additional diagnostic workup, specialized consultations, or intensive monitoring that presages extended recovery periods.
This approach aligns with broader trends toward value-based care in spine surgery, where economic outcomes are increasingly recognized as important measures of surgical success.13,36 Wei et al.36 emphasized the importance of cost-effectiveness analysis in LSS management, noting that traditional outcome measures may inadequately capture the full spectrum of treatment success. Our ML framework's ability to integrate clinical and economic variables provides a more comprehensive risk assessment tool that supports value-based care initiatives.
Model interpretability and clinical decision support
The adoption of advanced interpretability frameworks in our study addresses a critical limitation of spine surgery ML applications: the need for transparent, clinically actionable predictions.33 Traditional spine surgery risk assessment relies on simple scoring systems that sacrifice predictive accuracy for interpretability.16,24 Our approach demonstrates that sophisticated ML models can maintain both high performance and clinical transparency through techniques such as SHAP analysis.
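SHAP attributions are Shapley values: each feature's average marginal contribution to a prediction over all orderings of the features. For a handful of features they can be computed exactly by enumerating coalitions, which makes the additivity property (attributions sum to the prediction minus the baseline prediction) easy to check. This sketch is illustrative only: the risk-score function, its coefficients, the age × osteoporosis interaction, and the single baseline patient are invented, whereas the SHAP package works with a background dataset and uses efficient algorithms for tree models.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to one baseline input.

    Features outside a coalition take their baseline value. Cost grows as
    2 ** n_features, so this is only practical for a few features.
    """
    n = len(x)

    def value(coalition):
        return model([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(z):
    """Invented score over (age, osteoporosis, hypertension) with an interaction."""
    age, osteo, htn = z
    return 0.02 * age + 0.5 * osteo + 0.3 * htn + 0.01 * age * osteo


patient, reference = [60, 1, 1], [50, 0, 0]
phi = shapley_values(toy_risk, patient, reference)
print([round(v, 3) for v in phi])  # [0.25, 1.05, 0.3]
print(round(sum(phi), 6), round(toy_risk(patient) - toy_risk(reference), 6))  # 1.6 1.6
```

The interaction term's contribution is shared between age and osteoporosis rather than assigned to either feature alone, which is the kind of age–comorbidity interaction that additive scoring systems cannot express.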
The interpretability analysis revealed nuanced relationships that traditional statistical approaches often miss. The complex interaction between age and comorbidity burden, particularly osteoporosis and hypertension, suggests that cardiovascular and metabolic optimization may be particularly important in older patients undergoing endoscopic decompression. Such insights enable personalized risk stratification that goes beyond the simple additive scoring systems commonly used in spine surgery.16
Comparison with traditional approaches and clinical implementation
The superior performance of our ML models compared with conventional approaches reflects the limitations of traditional spine surgery outcome prediction tools.22,24 Weinstein et al.24 demonstrated that conventional statistical approaches typically achieve modest predictive accuracy when applied to complex spine surgery outcomes. Even comprehensive clinical assessment tools rarely exceed moderate discriminative ability in predicting postoperative complications or prolonged recovery.
Our models' ability to achieve clinically meaningful discrimination after validation (AUC ≥0.85) represents a substantial improvement over traditional methods. This performance level approaches that required for clinical decision support tools, where high accuracy is essential for risk stratification and resource allocation.35 The integration of diverse data sources enables a more holistic risk assessment than traditional approaches focused primarily on anatomical or demographic factors.
The clinical implementation potential of our models is enhanced by their reliance on routinely collected clinical data without requiring specialized testing or additional data collection. This practical advantage facilitates integration within existing electronic health record systems and supports real-time clinical decision making.17
Limitations and future directions in spine surgery ML
While our study demonstrates the potential of ML in endoscopic spine surgery, several limitations warrant acknowledgment within the broader context of spine ML research.37–41 The single-center design, though enabling standardized protocols and consistent data quality, may limit generalizability across different healthcare systems with varying patient populations and practice patterns, a concern highlighted in recent multicenter spine surgery studies.39,40
Future research should prioritize external validation across diverse healthcare systems, as emphasized in recent bibliometric analyses of LSS research trends.41 The development of multicenter collaborative frameworks could enable large-scale model development while addressing the sample size limitations inherent in single-institution studies.40
The integration of emerging technologies, such as real-time intraoperative monitoring and advanced imaging analytics, represents a promising future direction for spine surgery ML applications.38 Recent advances in neural network-based detection of LSS from imaging studies38 suggest potential for multimodal ML approaches that combine preoperative clinical data with real-time surgical variables.
Health economic implications and value-based care
The economic implications of accurate eLOS prediction extend beyond individual patient care to broader health system efficiency, aligning with current trends toward value-based spine care.13,36 By identifying high-risk patients preoperatively, institutions can implement targeted interventions: enhanced preoperative optimization for patients with modifiable risk factors, adjusted staffing patterns to accommodate expected extended stays, and improved discharge planning to expedite appropriate transitions of care.
Recent systematic evaluations of LSS management approaches26,36 emphasize the importance of cost-effectiveness considerations in treatment selection. Our ML framework's integration of clinical and economic variables provides actionable insights for optimizing both patient outcomes and resource utilization, supporting the broader movement toward value-based spine care delivery.13
This study establishes ML, particularly ensemble methods combining gradient boosting and regularized regression, as superior approaches for predicting eLOS in endoscopic spine surgery. The integration of clinical, demographic, and economic variables within interpretable ML frameworks represents a significant advance over traditional statistical approaches in spine surgery outcome prediction, building upon the foundational work in endoscopic spine surgery1,2,7,8 while addressing the evolving needs of value-based care.13,36
Future multicenter studies should validate these models across diverse healthcare systems,41 incorporating the lessons learned from recent advances in spine surgery ML applications27–30 while maintaining the focus on clinically actionable, interpretable predictions that support evidence-based clinical decision making in the rapidly evolving field of endoscopic spine surgery.
Conclusions
This study establishes ML as a superior approach to traditional statistical methods for predicting eLOS in endoscopic spine surgery, achieving clinically meaningful discrimination that can meaningfully impact patient care and resource allocation. The emphasis on age-related comorbidities and anatomical complexity (L4/L5 involvement) provides actionable insights for perioperative optimization, supporting the evolution toward personalized, value-based spine care.
The development of this predictive framework represents a significant advancement in endoscopic spine surgery outcome prediction, providing clinicians with evidence-based tools to optimize patient selection, preoperative counseling, and resource planning while supporting the broader movement toward precision medicine in spine surgery.
Footnotes
Abbreviations
Acknowledgments
The authors thank all of the research participants who volunteered their time to make this work possible.
Ethical compliance
This study was approved by the ethics committee of our hospital (approval number CQZR-2025017). As this is a retrospective study, the requirement for informed consent was waived.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study received funding from Chongqing Medical Vocational Education Group Teaching and Research Project CQZJ202519.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data access statement
All relevant data are within the paper and its Supporting Information files.
Guarantor
XP, the corresponding author, serves as the guarantor of this work.
