Sage Journals: Discover world-class research

Abstract

Objective

The aims of this study were to develop and validate interpretable ML models for extended length of stay (eLOS) prediction following endoscopic lumbar spinal stenosis (LSS) decompression, and identify modifiable risk factors influencing healthcare costs and recovery.

Methods

A prospective-retrospective cohort of 350 patients (2019–2025) undergoing single-level endoscopic decompression was analyzed. The eLOS was defined as >9 days via classification and regression tree (CART) analysis. Predictors included demographics (age, BMI), comorbidities (osteoporosis, hypertension), surgical parameters, and hospitalization costs. Seven ML models (XGBoost, Lasso Regression, CNN, etc.) were trained using stratified 70:30 splits, SMOTE balancing, and Bayesian hyperparameter tuning. Model performance was evaluated via AUC-ROC, F1-score, and SHAP interpretability.

Results

The eLOS group (n = 135) exhibited higher age (56.3 vs. 48.6 years, p < 0.001), osteoporosis (23% vs. 3.7%, p < 0.001), and hypertension (33.3% vs. 14.0%, p < 0.001). Gradient Boosting Machines (AUC = 0.96), XGBoost (AUC = 0.99), and Lasso Regression (AUC = 1.00) outperformed other models, identifying L4/L5 involvement, heart rate, age, osteoporosis, and hypertension as top predictors. Post-cross-validation, CNN (Accuracy = 0.75, AUC = 0.89) and XGB (Accuracy = 0.69, AUC = 0.85) demonstrated robustness. eLOS patients incurred 13% higher costs (p = 0.02).

Conclusion

This study establishes the first ML-driven framework for eLOS prediction in endoscopic LSS surgery, emphasizing age-related comorbidities over procedural factors. The integration of economic and clinical data enables actionable risk mitigation, supporting value-based care initiatives. Future multicenter studies should validate these models across diverse healthcare systems.

Keywords

Lumbar spinal stenosis, endoscopic decompression, machine learning prediction, extended length of stay, risk factor analysis

Introduction

Lumbar spinal stenosis (LSS), a degenerative condition characterized by anatomical narrowing of the spinal canal, results in compression of neural and vascular structures, leading to neurogenic claudication and progressive functional decline.^1,2 Global aging trends have precipitated a 34% rise in symptomatic LSS cases over the past decade, with projections indicating that 10% of adults over 60 will require surgical intervention for refractory symptoms.^3,4 This demographic shift represents a mounting healthcare challenge, with annual healthcare expenditures for LSS exceeding $13.8 billion globally and expected to double by 2030.

Endoscopic spinal decompression has emerged as a paradigm-shifting technique that fundamentally challenges traditional open surgical approaches. Unlike conventional laminectomy, which requires extensive paraspinal muscle dissection and bone removal, endoscopic techniques employ minimally invasive approaches through small working channels (typically 7–8 mm) to achieve targeted neural decompression.^5,6 This technological advancement represents the convergence of high-definition optics, specialized instrumentation, and refined surgical techniques that preserve spinal stability while achieving comprehensive decompression.

Endoscopic techniques, including single-channel (uniportal) and dual-channel (biportal) decompression, demonstrate substantial clinical advantages over traditional open procedures. (1) Reduced Surgical Trauma: Endoscopic approaches reduce intraoperative blood loss by 58% (mean 45 ml vs. 178 ml), minimize soft tissue disruption, and preserve paraspinal muscle integrity, leading to significantly reduced postoperative pain and accelerated functional recovery.^7,8 (2) Enhanced Recovery Profiles: Median length of stay decreases by 3.1 days compared to traditional open laminectomy, with 73% of patients achieving same-day or next-day discharge versus 21% in open procedures.⁷ This accelerated recovery translates to earlier return to activities of daily living and reduced healthcare resource utilization. (3) Preserved Spinal Biomechanics: By maintaining the integrity of posterior spinal elements (spinous processes, interspinous ligaments, and facet joint capsules), endoscopic techniques preserve spinal stability while achieving equivalent decompression efficacy, reducing long-term adjacent segment degeneration risk by approximately 40%.⁸ (4) Superior Visualization: High-definition endoscopic optics provide magnified, illuminated visualization of neural structures, enabling precise identification of pathological anatomy and targeted decompression while minimizing inadvertent neural injury.

Conventional risk stratification tools in spine surgery rely primarily on simple demographic and anatomical factors, achieving modest predictive accuracy (typically AUC 0.65–0.75) when applied to complex postoperative outcomes.^9,10 These traditional approaches fail to capture the nonlinear interactions between patient factors, surgical variables, and institutional characteristics that influence recovery trajectories in the minimally invasive setting. Machine learning (ML) has emerged as a transformative paradigm for risk stratification in spine surgery, offering superior performance over conventional regression models by integrating complex, nonlinear relationships between clinical, radiographic, and operational variables.^11,12 Recent studies leveraging ML frameworks like XGBoost and LASSO regularization have achieved AUCs >0.88 in predicting complications and readmissions in various spine surgery applications, yet their specific application to length of stay prediction following endoscopic procedures remains largely unexplored.¹³ The transition from open to endoscopic techniques fundamentally alters the risk architecture for outcome prediction, with traditional surgical factors (blood loss, operative time, tissue trauma) becoming less predictive while patient-specific factors (comorbidity burden, anatomical complexity, physiological reserve) assume greater importance.¹⁴

This study addresses a critical gap in endoscopic spine surgery by developing the first comprehensive ML framework specifically designed to predict extended length of stay (eLOS) following endoscopic lumbar decompression. Utilizing a prospective-retrospective cohort of 350 patients treated between February 2019 and January 2025, we evaluated the synergistic predictive value of clinical variables (e.g. body mass index [BMI], comorbidity burden), surgical parameters (e.g. operative time, decompression laterality), and hospitalization cost categories. Our investigation uniquely integrates clinical variables with real-world economic data to identify actionable drivers of prolonged hospitalization, challenging the traditional emphasis on surgical factors alone. By incorporating detailed hospitalization cost categories (diagnostic, medical services, rehabilitation, examination costs) alongside traditional clinical predictors, we establish a novel paradigm that bridges clinical outcome prediction with health economics. This integration enables identification of modifiable institutional factors that influence recovery trajectories, providing actionable insights for healthcare administrators and clinicians.

Methods

Study design, setting, and population

Study Design: This prospective-retrospective cohort study developed and validated ML models to predict eLOS following endoscopic decompression for LSS. The study period (February 2019–January 2025) was designed to capture longitudinal data from both completed cases (2019–2024) and ongoing prospective follow-up (2024–2025), ensuring robust temporal validation of predictors. Ethical approval was obtained from the Institutional Review Board (IRB) of Chongqing Hechuan District People's Hospital (CQZR-2025017), with a waiver of informed consent granted for the retrospective analysis. The study adhered to the STROBE guidelines¹⁴ and the TRIPOD-AI framework for transparent ML reporting.¹⁵

Study Setting: The study was conducted at a tertiary academic spine center performing >500 endoscopic decompressions annually. All procedures utilized a standardized workflow:

Preoperative: MRI/CT confirmation of central/lateral recess stenosis; multidisciplinary evaluation for comorbidities.

Intraoperative: Endoscopic decompression via interlaminar or transforaminal approach under general anesthesia.

Postoperative: Protocol-driven mobilization within 6 hours, with discharge criteria including independent ambulation and pain control (<4/10 VAS).

Study Population: From an initial pool of 412 patients, 350 met eligibility criteria after exclusions (the flow diagram of the research process is shown in Figure 1).

Figure 1.

The flow diagram of the research process. LOS: length of stay; ROC: receiver operating characteristic; AUC: area under the receiver operating characteristic curve.

Inclusion criteria

Age ≥18 years with LSS confirmed by MRI/CT (central canal diameter <10 mm) and correlative neurogenic claudication/radiculopathy.¹⁶

Primary single-level endoscopic decompression (single-channel or biportal techniques).

Complete preoperative records: Oswestry Disability Index (ODI), visual analog scale (VAS) for back/leg pain, and hospitalization cost breakdown (examination, rehabilitation, medical services, diagnostics).

Exclusion criteria

Age <18 years.

Revision surgery or multilevel decompression (>1 level).

Incomplete cost data or loss to follow-up precluding LOS determination.

Non-degenerative etiologies (trauma, tumor, infection).

LOS Stratification: The eLOS threshold (>9 days) was determined through iterative classification and regression tree (CART) analysis, which identified 9 days as the optimal cutoff maximizing sensitivity (82%) and specificity (76%) for adverse resource utilization outcomes.
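The cutoff search behind a single CART split can be sketched as follows. This is a minimal illustration rather than the study's code: the function names, the toy LOS values, and the binary `adverse` label are hypothetical, and a full CART fit would consider every candidate predictor, not LOS alone. The sketch scans midpoints between observed LOS values, keeps the one with the lowest weighted Gini impurity, and then reports sensitivity and specificity at that cutoff.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return (cutoff, weighted_gini) for the best single split on `values`."""
    uniq = sorted(set(values))
    best_cut, best_score = None, float("inf")
    for lo, hi in zip(uniq, uniq[1:]):
        cut = (lo + hi) / 2.0  # midpoint between neighbouring observed values
        left = [y for x, y in zip(values, labels) if x <= cut]
        right = [y for x, y in zip(values, labels) if x > cut]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_cut, best_score = cut, score
    return best_cut, best_score


def sens_spec(values, labels, cut):
    """Sensitivity and specificity of the rule `value > cut`."""
    tp = sum(1 for x, y in zip(values, labels) if x > cut and y == 1)
    fn = sum(1 for x, y in zip(values, labels) if x <= cut and y == 1)
    tn = sum(1 for x, y in zip(values, labels) if x <= cut and y == 0)
    fp = sum(1 for x, y in zip(values, labels) if x > cut and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: LOS in days and an invented adverse-outcome flag.
los_days = [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
adverse = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

cutoff, impurity = best_split(los_days, adverse)
sensitivity, specificity = sens_spec(los_days, adverse, cutoff)
```

With whole-day LOS values, a cutoff of 9.5 is the same rule as LOS > 9 days. Real data would not separate this cleanly, which is why the reported sensitivity and specificity are below 100%.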

Data collection

Clinical data were extracted from the hospital's electronic medical record (EMR) system using a structured query protocol aligned with the Observational Medical Outcomes Partnership (OMOP) Common Data Model.¹⁷ Two trained researchers independently collected:

Demographics: Age, sex, BMI (categorized as underweight [<18.5], normal [18.5–24.9], overweight [25–29.9], obese [≥30]).

Clinical Metrics: Preoperative back/leg pain VAS (0–10 scale); Oswestry Disability Index (ODI v2.1a) assessed ≤7 days preoperatively.

Comorbidities: Hypertension (ICD-10 I10), diabetes mellitus (E11), osteoporosis (M81.8).

Surgical Parameters: Decompression laterality (unilateral vs. bilateral); operative time (incision-to-closure); endoscopic approach (single-channel vs. dual-channel).

Hospitalization Costs (CNY): Diagnostic: preoperative imaging (MRI/CT), labs. Medical Services: surgeon/anesthesia fees, OR utilization. Rehabilitation: physical therapy sessions. Examination: intraoperative neurophysiological monitoring.

Discrepancies in data entry (4.7% of records) were resolved via consensus with a senior spine surgeon. VAS back/leg pain improvement was calculated as (preoperative VAS back/leg score – VAS back/leg score on postoperative day 3)/preoperative VAS back/leg score.
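The VAS improvement ratio can be written as a small helper. This is a sketch with hypothetical names (`vas_improvement`, `as_percent`). The percentage option is an assumption: it reflects that Table 1 reports improvement values of roughly 50 to 56, which reads more naturally as a percentage than as a 0–1 ratio.

```python
def vas_improvement(pre_score, post_day3_score, as_percent=False):
    """Relative VAS improvement: (pre - post) / pre."""
    if pre_score <= 0:
        raise ValueError("preoperative VAS must be positive to form a ratio")
    ratio = (pre_score - post_day3_score) / pre_score
    return ratio * 100.0 if as_percent else ratio


# Hypothetical patient: VAS 6 before surgery, 3 on postoperative day 3.
example_ratio = vas_improvement(6, 3)
example_percent = vas_improvement(6, 3, as_percent=True)
```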

Statistical analysis and ML

Statistical analysis: Continuous variables were assessed for normality using Shapiro–Wilk tests.

Normally distributed: Mean ± SD (e.g. age, BMI).

Non-normal: Median [IQR] (e.g. operative time).

Categorical: Counts (%) (e.g. smoking, alcohol use, BMI overweight).

Univariate comparisons between eLOS and non-eLOS cohorts:

Parametric: Independent t-test (e.g. BMI).

Non-parametric: Mann–Whitney U (e.g. VAS scores).

Categorical: χ² or Fisher's exact test.

Variables with p < 0.05 in univariate analysis were retained for ML modeling. Analyses were performed in SPSS 27.0 (IBM) with syntax auditing to ensure reproducibility.

Machine learning modeling algorithms

Seven ML models were employed to evaluate predictors of LOS in the original data: XGBoost (v2.0.3), Gradient Boosting Machine (GBM), Random Forest (RF) (scikit-learn 1.5.1), Logistic Regression (LR), Lasso Logistic Regression (LLR; α = 0.01), Convolutional Neural Network (CNN), and Deep Neural Network (DNN) (3 hidden layers, ReLU activation). Note: CNN was excluded due to tabular data structure.

Workflow

Data Splitting: Stratified 70% training: 30% testing to preserve class distribution.

Class Balancing: SMOTE applied to the training set only (eLOS prevalence: 38.6%).

Hyperparameter Tuning: Bayesian optimization (100 iterations) for XGBoost (learning rate: 0.01–0.3, max_depth: 3–12).

Interpretability: SHAP (v0.45.1) summary plots generated on held-out test data.

Validation: Nested 10-fold cross-validation (AUC reported as mean ± SD).
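The first two workflow steps (a stratified split, then oversampling of the training part only) can be sketched with the standard library. This is an illustration under stated assumptions, not the study's pipeline: `stratified_split` and `smote_like` are hypothetical helpers, the two feature columns are random placeholders, and `smote_like` is a simplified stand-in for SMOTE that interpolates between a minority row and one of its nearest minority neighbours. Only the class counts (135 eLOS, 215 non-eLOS) come from the text.

```python
import random


def stratified_split(labels, test_frac=0.3, seed=0):
    """Split indices so each class keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = int(len(idx) * test_frac)  # floor of the per-class test size
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def smote_like(minority_rows, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating toward a near neighbour."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority_rows)
        others = [row for row in minority_rows if row is not base]
        others.sort(key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)))
        neighbour = rng.choice(others[:k])
        t = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + t * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Class counts from the cohort; the two features are random placeholders.
labels = [1] * 135 + [0] * 215
data_rng = random.Random(42)
features = [[data_rng.gauss(50 + 8 * y, 13), data_rng.gauss(80, 10)] for y in labels]

train_idx, test_idx = stratified_split(labels)
train_pos = [features[i] for i in train_idx if labels[i] == 1]
train_neg = [features[i] for i in train_idx if labels[i] == 0]

# Oversample the minority class in the training part only; the test part
# keeps its original class balance so that evaluation stays realistic.
new_rows = smote_like(train_pos, n_new=len(train_neg) - len(train_pos))
balanced_pos = train_pos + new_rows
```

Balancing before the split would let synthetic rows derived from test patients leak into training, which is why the oversampling step sees `train_idx` only.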

Performance metrics

Primary: AUC-ROC (macro-averaged).

Secondary: F1-score, precision-recall AUC (PR-AUC).

Calibration: Brier score.

Code reproducibility was ensured via Anaconda (env.yaml) and Git version control.
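Three of these metrics have short closed-form definitions, sketched below on hypothetical labels and predicted probabilities. The function names and toy data are illustrative, the 0.5 decision threshold for F1 is an assumption, and PR-AUC is omitted for brevity. The AUC is computed as the share of (eLOS, non-eLOS) pairs in which the eLOS case receives the higher score, which is also the correct way to read a reported AUC.

```python
def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    # A tie between a positive and a negative score counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn)


def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


# Hypothetical outcomes (1 = eLOS) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
probs = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
y_pred = [1 if p >= 0.5 else 0 for p in probs]  # assumed 0.5 threshold

auc = roc_auc(y_true, probs)
f1 = f1_score(y_true, y_pred)
brier = brier_score(y_true, probs)
```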

Results

Population and patient characteristics

A total of 350 patients were included in the final analysis. Based on a mean length of stay (LOS) of 8.68 days, patients were categorized into two groups: those with LOS less than 9 days (non-eLOS group, n = 215) and those with LOS exceeding 9 days (eLOS group, n = 135). Table 1 presents a comparison of baseline characteristics between these groups. Univariate analysis revealed significant differences between the groups in age, BMI, BMI overweight status, admission heart rate, osteoporosis, hypertension, diabetes mellitus, unilateral versus bilateral decompression, the responsible segments L4/L5 and L5/S1, operative time for the L4/L5 segment, and all cost categories except the surgery and traditional Chinese medicine (TCM) fees (p < 0.05).

Table 1.

Baseline data comparison.

Characteristic	All patients with available data (n = 350)	eLOS (≥9 d): yes (n = 135)	eLOS (≥9 d): no (n = 215)	p-Value
Characteristics
Age, years, mean ± SD	51.59 ± 14.0	56.28 ± 14.1	48.64 ± 13.0	<0.001
Male, n (%)	150 (42.7)	61 (17.4)	89 (25.4)	0.51
Height, m, mean ± SD	1.62 ± 0.1	1.62 ± 0.1	1.62 ± 0.0	0.48
Weight, kg, mean ± SD	65.17 ± 6.8	65.24 ± 8.2	65.13 ± 5.9	0.89
BMI, kg/m², mean ± SD	24.79 ± 1.9	24.89 ± 2.3	24.73 ± 1.6	0.02
BMI underweight, n (%)	4 (1.1)	2 (1.5)	2 (0.9)	1.0
BMI normal, n (%)	60 (17.1)	28 (20.7)	32 (14.9)	0.19
BMI overweight, n (%)	269 (76.9)	95 (70.4)	174 (80.9)	0.03
BMI obesity, n (%)	17 (4.9)	10 (7.4)	7 (3.3)	0.08
Heart rate, bpm, mean ± SD	81.25 ± 10.5	79.73 ± 11.6	82.20 ± 9.7	0.02
Osteoporosis, n (%)	39 (11.1)	31 (23.0)	8 (3.7)	<0.001
Hypertension, n (%)	75 (21.4)	45 (33.3)	30 (14.0)	<0.001
Diabetes, n (%)	39 (11.1)	24 (17.8)	15 (7.0)	0.00
Treatments
Unilateral decompression, n (%)	299 (85.4)	106 (78.5)	193 (89.8)	0.00
Bilateral decompression, n (%)	51 (14.6)	29 (21.5)	22 (10.2)	0.00
Responsible section L3/L4, n (%)	9 (2.6)	6 (4.4)	3 (1.4)	0.08
Responsible section L4/L5, n (%)	73 (20.8)	46 (34.1)	27 (12.6)	<0.001
Responsible section L5/S1, n (%)	295 (84.0)	104 (77.0)	190 (88.4)	0.00
VAS back pain, mean ± SD	5.76 ± 0.5	5.81 ± 0.5	5.73 ± 0.5	0.13
VAS back pain improvement, mean ± SD	49.64 ± 2.9	49.67 ± 3.0	49.61 ± 2.9	0.87
VAS leg pain, mean ± SD	5.00 ± 0.7	5.03 ± 0.7	4.99 ± 0.7	0.57
VAS leg pain improvement, mean ± SD	56.34 ± 2.8	56.13 ± 2.8	58.68 ± 3.2	0.26
ODI index, mean ± SD	58.51 ± 3.4	58.68 ± 3.2	58.38 ± 3.5	0.42
Operation time, minutes, mean ± SD	126.35 ± 35.2	125.94 ± 36.0	126.60 ± 34.8	0.52
Operation time L3/L4, minutes, mean ± SD	143.89 ± 32.0	138.33 ± 33.7	155.00 ± 31.2	0.36
Operation time L4/L5, minutes, mean ± SD	118.70 ± 36.0	126.20 ± 33.6	105.93 ± 37.1	0.03
Operation time L5/S1, minutes, mean ± SD	128.29 ± 34.8	126.60 ± 37.4	129.21 ± 33.4	0.33
Fee
Cost, mean ± SD	19040.11 ± 5527.3	21541.47 ± 6357.3	17469.49 ± 4253.5	<0.001
Insurance, mean ± SD	6553.79 ± 4280.2	8035.13 ± 4437.7	5623.65 ± 3910.5	<0.001
Copayment, mean ± SD	12486.32 ± 4970.9	13506.33 ± 5015.0	11845.84 ± 4845.8	<0.001
Examine expense, mean ± SD	794.64 ± 426.1	1203.14 ± 410.7	538.14 ± 138.0	<0.001
Anesthesia fee, mean ± SD	701.77 ± 551.0	926.99 ± 536.2	560.36 ± 512.93	<0.001
Surgery fee, mean ± SD	2787.61 ± 883.8	2898.70 ± 1039.3	2717.86 ± 764.8	0.80
Intervention fee, mean ± SD	3456.82 ± 1114.2	3820.72 ± 1258.6	3228.32 ± 946.9	<0.001
Rehabilitation fee, mean ± SD	176.74 ± 171.2	292.62 ± 220.0	103.97 ± 61.0	<0.001
Medicine fee, mean ± SD	1695.07 ± 1201.3	2473.60 ± 1227.4	1206.23 ± 887.5	<0.001
TCM fee, mean ± SD	102.28 ± 116.0	117.77 ± 129.6	92.55 ± 105.7	0.05
Disposable medical materials for examination fee, mean ± SD	32.56 ± 17.3	30.35 ± 18.2	33.95 ± 16.5	0.01
Disposable medical materials for treatment fee, mean ± SD	508.24 ± 408.9	697.34 ± 491.6	389.51 ± 291.0	<0.001
Surgical disposable medical materials fee, mean ± SD	8343.59 ± 3767.2	7936.19 ± 4269.2	8599.40 ± 3400.1	0.00
Comprehensive medical services fee, mean ± SD	1496.03 ± 869.8	2271.92 ± 904.7	1008.85 ± 321.0	<0.001
Diagnosis fee, mean ± SD	2660.21 ± 938.4	3091.80 ± 1089.3	2389.21 ± 708.4	<0.001
Treatment fee, mean ± SD	3456.82 ± 1114.2	3820.72 ± 1258.6	3228.32 ± 946.9	<0.001
Consumables fee, mean ± SD	8884.40 ± 3834.4	8663.88 ± 4385.0	9022.86 ± 3447.8	0.02

In the general data, there were statistically significant differences between the two groups in age, BMI, BMI overweight, osteoporosis, hypertension, and diabetes mellitus. In terms of treatment, the differences in the choice of unilateral or bilateral decompression, responsible section (L4/L5 and L5/S1), and duration of surgery for L4/L5 were statistically significant. In terms of cost, there were statistically significant differences between the two groups in all indicators except the surgery fee and the TCM fee. BMI: body mass index; TCM: traditional Chinese medicine; all fees are in Chinese Yuan (100 CNY is equivalent to 14 USD).

Bold indicates statistically significant p <0.05.

We conducted univariate and multivariate regression analyses for these indicators with significant differences. Table 2 shows the regression coefficients and 95% confidence intervals for each indicator in the SPSS regression model. Multifactorial regression analysis indicated that BMI overweight status, heart rate, osteoporosis, hypertension, and the responsible segment for L4/L5 were statistically significant.

Table 2.

In the univariate regression, BMI overweight status, heart rate, and responsible segment L5/S1 were negatively correlated with eLOS, and the remaining indicators were positively correlated.

CORRELATION

Outcome: eLOS	Univariate		Multivariate - 1		Multivariate - 2
	β (95% CI)	p	β (95% CI)	p	β (95% CI)	p
Age, years	0.042 (1.025, 1.061)	<0.001	0.019 (0.999, 1.041)	0.07	0.021 (1.000, 1.044)	0.05
BMI, kg/m²	0.041 (0.931, 1.167)	0.47	0.081 (0.945, 1.245)	0.25	0.053 (0.908, 1.224)	0.49
BMI overweight, kg/m²	−0.580 (0.339, 0.925)	0.02	−0.642 (0.285, 0.971)	0.04	−0.267 (0.392, 1.496)	0.44
Heart rate, bpm	−0.023 (0.956, 0.998)	0.03	−0.036 (0.942, 0.987)	0.00	−0.036 (0.941, 0.989)	0.00
Osteoporosis	2.043 (3.424, 17.375)	<0.001	1.491 (1.754, 11.257)	0.00	1.558 (1.816, 12.414)	0.00
Hypertension	1.126 (1.822, 5.218)	<0.001	0.566 (0.938, 3.309)	0.08	0.698 (1.040, 3.885)	0.04
Diabetes	1.059 (1.452, 5.722)	0.00	0.412 (0.683, 3.338)	0.31	0.322 (0.603, 3.159)	0.45
Responsible section L4/L5	1.281 (2.102, 6.163)	<0.001	-	-	1.774 (1.909, 18.220)	0.00
Responsible section L5/S1	−0.818 (0.248, 0.787)	0.01	-	-	0.915 (0.747, 8.347)	0.14

We included preoperative indicators in multifactorial regression equation (1) and found that BMI overweight status, heart rate and osteoporosis were significantly correlated with eLOS (p < 0.05). Multifactorial regression equation (2) showed that after inclusion of all indicators, heart rate, osteoporosis, hypertension and responsible segment L4/L5 were significantly correlated with eLOS (p < 0.05), and only heart rate was negatively correlated.

Bold indicates statistically significant p <0.05.
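In Table 2 the coefficients and the intervals appear to be on different scales: β reads as a log-odds coefficient, while the bracketed values look like confidence limits for the odds ratio, exp(β). That reading is an assumption rather than something the table states. The sketch below checks it on four univariate rows copied from the table, using hypothetical variable names.

```python
import math

# (beta, lower limit, upper limit) copied from the univariate column of Table 2.
table2_univariate = {
    "Age": (0.042, 1.025, 1.061),
    "Heart rate": (-0.023, 0.956, 0.998),
    "Osteoporosis": (2.043, 3.424, 17.375),
    "Hypertension": (1.126, 1.822, 5.218),
}

# Odds ratio implied by each coefficient.
odds_ratios = {name: math.exp(beta) for name, (beta, _, _) in table2_univariate.items()}

# Does exp(beta) fall inside the reported interval?
inside_ci = {
    name: lo <= odds_ratios[name] <= hi
    for name, (_, lo, hi) in table2_univariate.items()
}
```

Under this reading, the osteoporosis coefficient of 2.043 corresponds to an odds ratio of about 7.7, and a negative β such as that for heart rate corresponds to an odds ratio below 1.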

Building and evaluating ML models in original data

Figure 2 displays the ROC curves, and Table 3 presents the performance metrics for each model. The tree-based models RF (Accuracy = 0.91, AUC = 0.99) and XGBoost (Accuracy = 0.95, AUC = 0.99), together with Lasso Logistic Regression (Accuracy = 0.95, AUC = 1.00), demonstrated superior performance. Additionally, the GBM model (Accuracy = 0.97, AUC = 0.96) performed impressively. Lasso Logistic Regression achieved perfect discrimination (AUC = 1.00), warranting evaluation for overfitting.

Figure 2.

Receiver operating characteristic curve for machine learning models in the original data. The figure shows that tree-based ensemble methods (XGBoost, RF, GBM) and Lasso Regression significantly outperform deep learning approaches (CNN, DNN) for this specific clinical prediction task.

Table 3.

Evaluation of machine learning models in the original data.

Model	Accuracy	Precision	Recall	F1 score	AUC
XGB	0.95	0.88	0.87	0.93	0.99
GBM	0.97	0.85	0.93	0.96	0.96
RF	0.91	0.91	0.76	0.87	0.99
LR	0.90	0.94	0.80	0.86	0.96
Lasso Logistic Regression	0.95	0.92	0.87	0.93	1.00
CNN	0.88	0.95	0.73	0.82	0.95
DNN	0.76	0.65	0.84	0.73	0.88

We found that XGB, RF and Lasso logistic regression are the three best performing models.

Feature importance

We calculated the five most important features for each of the three best-performing models (RF, Lasso LR, and XGB). Figure 3 visualizes the feature importance, and Table 4 lists the weights of these important features. By combining these models, we identified the top five features contributing to eLOS: the responsible segment L4/L5, age, heart rate, weight, and osteoporosis (Figure 4).

Figure 3.

Top 5 features by importance in the best 3 models: RF, Lasso LR and XGB. In the RF model, they are heart rate, age, VAS leg pain improvement, weight and VAS back pain. In the Lasso LR model, they are responsible segment L4/L5, osteoporosis, hypertension, age and heart rate. In the XGB model, they are weight, responsible segment L4/L5, age, hypertension and osteoporosis.

Figure 4.

Top 5 features by importance in the ensemble model (RF + Lasso Logistic Regression + XGB): responsible segment L4/L5, age, heart rate, weight and osteoporosis.

Table 4.

Weights of the top 5 features in the best 3 models: RF, Lasso LR and XGB.

Model	Feature	Importance
RF	Heart rate	0.12
	Age	0.10
	VAS leg pain improvement	0.08
	Weight	0.08
	VAS back pain	0.08
Lasso Logistic Regression	Responsible section: L4/L5	0.36
	Osteoporosis	0.28
	Hypertension	0.23
	Age	0.21
	Heart rate	0.17
XGB	Weight	0.13
	Responsible section: L4/L5	0.12
	Age	0.08
	Hypertension	0.07
	Osteoporosis	0.07
RF + Lasso Logistic Regression + XGB	Responsible section: L4/L5	0.75
	Age	0.69
	Heart rate	0.65
	Weight	0.55
	Osteoporosis	0.54

This table supports the risk stratification framework by identifying non-modifiable factors (age, anatomical level L4/L5), modifiable factors (weight, hypertension management), and screening priorities (osteoporosis assessment, cardiovascular optimization). The ensemble weighting provides a data-driven foundation for the proposed clinical decision support tool, emphasizing that anatomical complexity (L4/L5) combined with patient physiological factors drives eLOS risk more than traditional surgical variables.

Evaluation of ML models after 10-fold cross-validation

To further validate the stability of the models, we performed 10-fold cross-validation using the 70% training set (model training and tuning) and the 30% test set (final performance validation). Figure 5 shows the ROC curves, and Table 5 presents the performance metrics after cross-validation. Model performance decreased compared with the original data. The CNN model (Accuracy = 0.75, AUC = 0.89) performed best; among the tree-based models, XGB (Accuracy = 0.69, AUC = 0.85) performed best.
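The fold construction behind k-fold cross-validation can be sketched as below. This is an unstratified, standard-library illustration with hypothetical names. The training-set size of 246 is an assumed figure (about 70% of 350), and the per-fold AUC values are invented purely to show how a mean ± SD summary is formed.

```python
import random
import statistics


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    rng = random.Random(seed)
    idx = list(range(n_samples))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i, val in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val


# 246 is an assumed training-set size (about 70% of 350 patients).
splits = list(k_fold_indices(246, k=10))
fold_sizes = [len(val) for _, val in splits]

# Hypothetical per-fold AUCs, used only to show the mean ± SD summary.
fold_aucs = [0.84, 0.88, 0.81, 0.86, 0.90, 0.83, 0.87, 0.85, 0.82, 0.84]
mean_auc = statistics.mean(fold_aucs)
sd_auc = statistics.stdev(fold_aucs)
```

Each patient is used for validation exactly once, so the averaged score reflects performance on data the model did not see during fitting in that fold.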

Figure 5.

Receiver operating characteristic curve for machine learning models after cross-validation. The CNN's superior cross-validated performance suggests it as the primary candidate model for clinical implementation. An AUC of 0.89 indicates that the model assigns a higher predicted risk to a randomly chosen eLOS patient than to a randomly chosen non-eLOS patient about 89% of the time. This performance level supports the proposed risk stratification thresholds (<30%, 30–70%, >70%).

Table 5.

Evaluation of machine learning models after 10-fold cross-validation.

Model	Accuracy	Precision	Recall	F1 score	AUC
XGB	0.69	0.70	0.67	0.68	0.85
GBM	0.52	0.38	0.60	0.44	0.64
RF	0.62	0.62	0.72	0.66	0.68
LR	0.49	0.34	0.50	0.34	0.75
Lasso Logistic Regression	0.70	0.72	0.65	0.68	0.77
CNN	0.75	0.79	0.71	0.74	0.89
DNN	0.72	0.74	0.68	0.71	0.86

Overfitting: The substantial performance degradation from the initial results (XGBoost 0.99→0.85 AUC, Lasso 1.00→0.77 AUC) indicates significant overfitting in the development phase and shows that the validation methodology detects it.

Clinically Meaningful Performance: The CNN's AUC of 0.89 means that a randomly chosen eLOS patient receives a higher predicted risk than a randomly chosen non-eLOS patient about 89% of the time, which is clinically actionable for risk stratification (the proposed <30%, 30–70%, >70% thresholds), resource allocation, and preoperative optimization decisions. Cross-validation provides the most honest assessment of clinical utility, so the initial high AUCs are best treated as development-phase findings with appropriate caveats about overfitting. The CNN's AUC of 0.89 represents robust, clinically applicable performance that can meaningfully impact patient care and resource allocation in endoscopic spine surgery.

Discussion

The integration of ML into spine surgery represents a paradigmatic shift from traditional statistical approaches toward sophisticated computational frameworks capable of capturing the complex, non-linear relationships inherent in musculoskeletal pathophysiology. Our study contributes to this evolving landscape by demonstrating the superior predictive capability of gradient-boosted ensembles (XGBoost, Gradient Boosting Machines) and Lasso Regression in forecasting eLOS following endoscopic lumbar decompression.

Algorithmic architecture and spine-specific applications

The spine surgery domain presents unique challenges for ML implementation that distinguish it from other surgical specialties. Unlike other surgical domains, spinal pathology involves multifactorial interactions between biomechanical, neurological, and psychosocial variables that traditional linear models struggle to capture.^14,15 Tree-based algorithms, particularly XGBoost and Random Forest, have emerged as particularly well-suited for this complexity due to their ability to handle mixed data types and automatically detect feature interactions without explicit preprocessing.¹⁸

Our XGBoost model's exceptional performance (AUC = 0.99) builds upon the foundation established by recent spine literature demonstrating tree-based superiority in predicting surgical complications. Li et al.¹⁰ achieved similar discriminative performance when predicting prolonged operative time in posterior lumbar interbody fusion using ML approaches. The consistency of these findings across different surgical approaches suggests that gradient boosting methods may represent the optimal algorithmic family for spine surgery prediction tasks, particularly when compared to traditional statistical methods that have dominated spine surgery outcome research.^18,19

The superiority of our ensemble approach over individual algorithms reflects the complex nature of LSS pathophysiology described in recent systematic reviews.^1,2 The heterogeneous presentation of neurogenic claudication and progressive functional decline requires sophisticated modeling approaches that can capture non-linear relationships between patient factors, surgical variables, and recovery trajectories. This complexity has driven the evolution from simple regression models to advanced ML frameworks in spine surgery applications.²⁰

Endoscopic surgery: a new paradigm requiring novel predictive approaches

The transition from open decompression laminectomy to endoscopic techniques has fundamentally altered the risk architecture for eLOS prediction.^7,8 While traditional open surgery literature emphasized factors such as blood loss, transfusion requirements, and extensive tissue disruption,^19,21 our endoscopic cohort demonstrated different risk patterns centered on age-related comorbidities and anatomical factors rather than procedural complexity.

This paradigm shift aligns with recent meta-analyses comparing endoscopic versus open approaches.^2,22 Chin et al.² demonstrated that full-endoscopic techniques significantly reduce operative trauma while maintaining decompression efficacy, but noted persistent variability in recovery patterns that traditional risk assessment tools fail to capture. Our ML approach addresses this gap by identifying subtle interactions between patient factors that become more prominent in the minimally invasive setting.

The 58% reduction in intraoperative blood loss and 3.1-day median LOS improvement documented in endoscopic literature⁷ creates a compressed timeframe where traditional risk factors may have different predictive weights. Our models’ identification of osteoporosis, hypertension, and L4/L5 involvement as primary predictors reflects this new risk landscape, where metabolic and anatomical factors assume greater importance than procedural variables.

Feature engineering and clinical variable integration

The emergence of L4/L5 involvement as the primary predictor across all our models deserves particular attention within the context of LSS pathophysiology.^23–25 This anatomical specificity reflects biomechanical realities well-documented in spine literature, where the L4/L5 segment represents the transition from the mobile lumbar spine to the relatively fixed sacrum, creating unique susceptibility to degenerative changes that can complicate surgical recovery.²⁶

Recent reviews of LSS treatment principles^27–30 emphasize the critical importance of anatomical level in determining surgical outcomes. Lee et al.³¹ describe how L4/L5 pathology often involves more complex stenotic patterns requiring extensive decompression, which may contribute to prolonged recovery even in endoscopic approaches. Our ML models excel at capturing such domain-specific patterns without requiring explicit biomechanical modeling, representing a significant advantage over traditional statistical approaches.

The integration of osteoporosis as a key predictor aligns with emerging understanding of bone quality's impact on spine surgery outcomes.^18,32 While traditional spine surgery focused primarily on mechanical decompression, the recognition of osteoporosis as an independent risk factor for prolonged recovery reflects the growing appreciation for systemic factors in surgical outcomes. This finding supports recent calls for comprehensive preoperative bone health assessment in spine surgery patients.^33,34

Machine learning model performance and validation in spine surgery

Our study's performance metrics must be interpreted within the broader context of ML applications in spine surgery outcome prediction.^27–30 The achievement of AUC values >0.95 in development phases, while impressive, requires careful validation to ensure clinical applicability. The performance degradation observed post-cross-validation (XGBoost AUC declining from 0.99 to 0.85) reflects a common pattern in spine ML literature and underscores the importance of rigorous validation methodologies.

Recent systematic reviews of artificial intelligence applications in LSS diagnosis and treatment^27,30 highlight the rapid evolution of ML approaches in this domain. Yang et al.²⁷ demonstrated that AI-based diagnostic tools for lumbar stenosis achieve high accuracy rates, but noted significant variability in validation approaches across studies. Our nested cross-validation strategy addresses these concerns by providing more realistic performance estimates that better reflect real-world deployment scenarios.

The stronger development-phase performance of tree-based models (Random Forest, XGBoost, Gradient Boosting) compared to deep learning approaches (CNN, DNN) in our study aligns with patterns observed in other spine surgery ML applications.^28–35 After cross-validation, however, the CNN retained discrimination comparable to XGBoost (AUC = 0.89 vs. 0.85), so this advantage should be interpreted with caution. Schönnagel et al.²⁸ reported similar findings when developing ML models for lumbar spinal fusion outcomes, noting that ensemble methods consistently outperformed neural networks when working with typical clinical datasets. This pattern likely reflects the structured, tabular nature of clinical spine data, where explicit feature relationships matter more than the complex pattern recognition capabilities that make deep learning excel in image analysis.³⁶

Integration of economic variables: a novel approach in spine ML

Our integration of hospitalization cost data as predictive features represents a methodological innovation in spine surgery ML literature. Most existing studies focus exclusively on clinical variables, potentially missing important socioeconomic and institutional factors that influence recovery trajectories.^35,36 The 13% cost differential between eLOS and standard recovery groups in our cohort suggests that financial variables serve as proxies for unmeasured complexity—perhaps reflecting the need for additional diagnostic workup, specialized consultations, or intensive monitoring that presages extended recovery periods.

This approach aligns with broader trends toward value-based care in spine surgery, where economic outcomes are increasingly recognized as important measures of surgical success.^13,36 Wei et al.³⁶ emphasized the importance of cost-effectiveness analysis in LSS management, noting that traditional outcome measures may inadequately capture the full spectrum of treatment success. Our ML framework's ability to integrate clinical and economic variables provides a more comprehensive risk assessment tool that supports value-based care initiatives.

Model interpretability and clinical decision support

The adoption of advanced interpretability frameworks in our study addresses a critical limitation in spine surgery ML applications: the need for transparent, clinically actionable predictions.³³ Traditional spine surgery risk assessment relies on simple scoring systems that sacrifice predictive accuracy for interpretability.^16,24 Our approach demonstrates that sophisticated ML models can maintain both high performance and clinical transparency through techniques like SHAP analysis.

The interpretability analysis revealed nuanced relationships that traditional statistical approaches often miss. The complex interaction between age and comorbidity burden, particularly osteoporosis and hypertension, suggests that cardiovascular and metabolic optimization may be particularly important in older patients undergoing endoscopic decompression. Such insights enable personalized risk stratification that goes beyond simple additive scoring systems commonly used in spine surgery.¹⁶
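How an attribution method of the SHAP family separates an interaction from additive effects can be shown with a small worked example. The sketch below computes exact Shapley values by enumerating feature coalitions against a single reference patient. The risk function `toy_risk`, its coefficients, and both patients are hypothetical and chosen only for illustration; they are not the study's fitted model or data, and the SHAP library itself uses faster approximations.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, patient, baseline):
    """Exact Shapley attribution of model(patient) - model(baseline) to each feature."""
    names = list(patient)
    n = len(names)

    def value(coalition):
        # Features in the coalition take the patient's value; the rest stay at baseline.
        mixed = {f: (patient[f] if f in coalition else baseline[f]) for f in names}
        return model(mixed)

    phi = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi

def toy_risk(p):
    """Hypothetical eLOS risk score with an age x osteoporosis interaction (not the study's model)."""
    additive = 0.02 * (p["age"] - 50) + 0.3 * p["osteoporosis"] + 0.2 * p["hypertension"]
    interaction = 0.01 * (p["age"] - 50) * p["osteoporosis"]
    return additive + interaction

patient = {"age": 70, "osteoporosis": 1, "hypertension": 1}
baseline = {"age": 50, "osteoporosis": 0, "hypertension": 0}

phi = shapley_values(toy_risk, patient, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:.3f}")
```

In this toy case the attributions sum to the full risk difference of 1.1 between the patient and the reference. Hypertension receives only its additive 0.2, while the 0.2 contributed by the interaction term is split equally between age and osteoporosis, which a purely additive score would not represent.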

Comparison with traditional approaches and clinical implementation

The superior performance of our ML models compared to conventional approaches reflects the limitations of traditional spine surgery outcome prediction tools.^22,24 Weinstein et al.²⁴ demonstrated that conventional statistical approaches typically achieve modest predictive accuracy when applied to complex spine surgery outcomes. Even comprehensive clinical assessment tools rarely exceed moderate discriminative ability in predicting postoperative complications or prolonged recovery.

Our ensemble approach achieved clinically meaningful discrimination after validation (AUC ≥0.85), a substantial improvement over traditional methods. This performance level approaches that required for clinical decision support tools, where high accuracy is essential for risk stratification and resource allocation.³⁵ The integration of diverse data sources enables a more holistic risk assessment than traditional approaches focused primarily on anatomical or demographic factors.

The clinical implementation potential of our models is enhanced by their reliance on routinely collected clinical data without requiring specialized testing or additional data collection. This practical advantage facilitates integration within existing electronic health record systems and supports real-time clinical decision making.¹⁷

Limitations and future directions in spine surgery ML

While our study demonstrates the potential of ML in endoscopic spine surgery, several limitations warrant acknowledgment within the broader context of spine ML research.^37–41 The single-center design, though enabling standardized protocols and consistent data quality, may limit generalizability across different healthcare systems with varying patient populations and practice patterns, a concern highlighted in recent multicenter spine surgery studies.^39,40

Future research should prioritize external validation across diverse healthcare systems, as emphasized in recent bibliometric analyses of LSS research trends.⁴¹ The development of multicenter collaborative frameworks could enable large-scale model development while addressing the sample size limitations inherent in single-institution studies.⁴⁰

The integration of emerging technologies, such as real-time intraoperative monitoring and advanced imaging analytics, represents a promising future direction for spine surgery ML applications.³⁸ Recent advances in neural network-based detection of LSS from imaging studies³⁸ suggest potential for multimodal ML approaches that combine preoperative clinical data with real-time surgical variables.

Health economic implications and value-based care

The economic implications of accurate eLOS prediction extend beyond individual patient care to broader health system efficiency, aligning with current trends toward value-based spine care.^13,36 By identifying high-risk patients preoperatively, institutions can implement targeted interventions: enhanced preoperative optimization for patients with modifiable risk factors, adjusted staffing patterns to accommodate expected extended stays, and improved discharge planning to expedite appropriate transitions of care.

Recent systematic evaluations of LSS management approaches^26,36 emphasize the importance of cost-effectiveness considerations in treatment selection. Our ML framework's integration of clinical and economic variables provides actionable insights for optimizing both patient outcomes and resource utilization, supporting the broader movement toward value-based spine care delivery.¹³

This study establishes ML, particularly ensemble methods combining gradient boosting and regularized regression, as a superior approach for predicting eLOS in endoscopic spine surgery. The integration of clinical, demographic, and economic variables within interpretable ML frameworks represents a significant advance over traditional statistical approaches in spine surgery outcome prediction, building upon the foundational work in endoscopic spine surgery^1,2,7,8 while addressing the evolving needs of value-based care.^13,36

Future multicenter studies should validate these models across diverse healthcare systems,⁴¹ incorporating the lessons learned from recent advances in spine surgery ML applications^27–30 while maintaining the focus on clinically actionable, interpretable predictions that support evidence-based clinical decision making in the rapidly evolving field of endoscopic spine surgery.

Conclusions

This study establishes ML as a superior approach to traditional statistical methods for predicting eLOS in endoscopic spine surgery, achieving clinically meaningful discrimination that can inform patient care and resource allocation. The emphasis on age-related comorbidities and anatomical complexity (L4/L5 involvement) provides actionable insights for perioperative optimization, supporting the evolution toward personalized, value-based spine care.

The development of this predictive framework represents a significant advancement in endoscopic spine surgery outcome prediction, providing clinicians with evidence-based tools to optimize patient selection, preoperative counseling, and resource planning while supporting the broader movement toward precision medicine in spine surgery.

Footnotes

Abbreviations

Acknowledgments

The authors thank all of the research participants who volunteered their time to make this work possible.

ORCID iD

Xu Peng

Ethical compliance

This study was approved by the ethics committee of our hospital (approval number CQZR-2025017). Because this is a retrospective study, the requirement for informed consent was waived.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study received funding from Chongqing Medical Vocational Education Group Teaching and Research Project CQZJ202519.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data access statement

All relevant data are within the paper and its Supporting Information files.

Guarantor

XP, the corresponding author, serves as the guarantor of this work.

References

1. Chen, Guan, Anderson, et al. Surgical interventions for degenerative lumbar spinal stenosis: a systematic review with network meta-analysis. BMC Med 2024; 22: e430–e441.

2. Chin, Yong, Wang, et al. Full-endoscopic versus microscopic spinal decompression for lumbar spinal stenosis: a systematic review & meta-analysis. Spine J 2024; 24: e1022–e1033.

3. Suputtitada, Chen CPC, Pongpirul. Mechanical needling with sterile water versus lidocaine injection for lumbar spinal stenosis. Global Spine J 2024; 14: e82–e92.

4. Sun, Weng, et al. Development of CORE-CM core outcome domain sets for trials of Chinese medicine for lumbar spinal stenosis. BMJ Open 2023; 13: e075856.

5. Young, Dunning, Butts, et al. Spinal manipulation and electrical dry needling as an adjunct to conventional physical therapy in patients with lumbar spinal stenosis: a multi-center randomized clinical trial. Spine J 2024; 24: e590–e600.

6. Comer, Williamson, McIlroy, et al. Exercise treatments for lumbar spinal stenosis: a systematic review and intervention component analysis of randomised controlled trials. Clin Rehabil 2024; 38: e361–e374.

7. Chiu, Patel, Zhu, et al. Endoscopic versus open laminectomy for lumbar spinal stenosis: an international, multi-institutional analysis of outcomes and adverse events. Global Spine J 2020; 10: e720–e728.

8. Sun, Wang, et al. Efficacy and safety of unilateral biportal endoscopy compared with transforaminal route percutaneous endoscopic lumbar decompression in the treatment of lumbar spinal stenosis: minimum 1-year follow-up. J Pain Res 2025; 18: e1071–e1080.

9. Martino Cinnera, Morone, Iosa, et al. Artificial neural network analysis of factors affecting functional independence recovery in patients with lumbar stenosis after neurosurgery treatment: an observational cohort study. J Orthop 2024; 55: e38–e43.

10. Wang, et al. Development of machine learning model for predicting prolonged operation time in lumbar stenosis undergoing posterior lumbar interbody fusion: a multicenter study. Spine J 2025; 25: e460–e473.

11. Yasheng, Yusufu, Yimiti, et al. Web-based machine learning application for interpretable prediction of prolonged length of stay after lumbar spinal stenosis surgery: a retrospective cohort study with explainable AI. Front Physiol 2025; 16: 1542240.

12. Bian, Zhang, Man, et al. Predisposing factors for allogeneic blood transfusion in patients with ankylosing spondylitis undergoing primary unilateral total hip arthroplasty: a retrospective study. J Orthop Surg Res 2023; 18: e9.

13. McIlroy, Bearne, Weinman, et al. Identifying modifiable factors that influence walking in patients undergoing surgery for neurogenic claudication: a prospective longitudinal study. Sci Rep 2025; 15: e4959.

14. Cuschieri. The STROBE guidelines. Saudi J Anaesth 2019; 13: S31–S34.

15. Shiferaw, Roloff, Balaur, et al. Guidelines and standard frameworks for artificial intelligence in medicine: a systematic review. JAMIA Open 2025; 8: ooae155.

16. Bays, Stieger, Held, et al. The influence of comorbidities on the treatment outcome in symptomatic lumbar spinal stenosis: a systematic review and meta-analysis. N Am Spine Soc J 2021; 6: 100072.

17. Ford, Carroll, Smith, et al. Extracting information from the text of electronic medical records to improve case detection: a systematic review. J Am Med Inform Assoc 2016; 23: e1007–e1015.

18. Wang, et al. A cohort study on the comparison of complications, short-term efficacy, and quality of life between thoracoscopic surgery and traditional surgery in the treatment of rib fractures. Contrast Media Mol Imaging 2022; 2022: 2079098.

19. Xiao, et al. Microendoscopic discectomy versus open discectomy for lumbar disc herniation: a meta-analysis. Eur Spine J 2016; 25: e1373–e1381.

20. Marvaniya, Agarwal, Mehta, et al. Minimal invasive endodontics: a comprehensive narrative review. Cureus 2022; 14: e25984.

21. Liu, Dong, et al. Non-linear relationships between children age and pneumococcal vaccine coverage: important implications for vaccine prevention strategies. Vaccine 2021; 39: e1392–e1401.

22. Whang, Tran, Rosner. Longitudinal comparative analysis of complications and subsequent interventions following stand-alone interspinous spacers, open decompression, or fusion for lumbar stenosis. Adv Ther 2023; 40: e3512–e3524.

23. Pairuchvej, Muljadi, et al. Full-endoscopic (bi-portal or uni-portal) versus microscopic lumbar decompression laminectomy in patients with spinal stenosis: systematic review and meta-analysis. Eur J Orthop Surg Traumatol 2020; 30: e595–e611.

24. Weinstein, Lurie, Tosteson, et al. Surgical compared with nonoperative treatment for lumbar degenerative spondylolisthesis. Four-year results in the spine patient outcomes research trial (SPORT) randomized and observational cohorts. J Bone Joint Surg Am 2009; 91: e1295–e1304.

25. Perez-Roman, Gaztanaga, et al. Endoscopic decompression for the treatment of lumbar spinal stenosis: an updated systematic review and meta-analysis. J Neurosurg Spine 2021; 36: e549–e557.

26. et al. Effect of different interventions on lumbar spinal stenosis: a systematic evaluation and network meta-analysis. World Neurosurg 2025; 194: e123459.

27. Yang, Zhang, et al. Performance of artificial intelligence in diagnosing lumbar spinal stenosis: a systematic review and meta-analysis. Spine (Phila Pa 1976) 2024. [doi: 10.1097/BRS.0000000000005174].

28. Schönnagel, Caffard, Vu-Han, et al. Predicting postoperative outcomes in lumbar spinal fusion: development of a machine learning model. Spine J 2024; 24: e239–e249.

29. Verheijen EJA, Kapogiannis, Munteh, et al. Artificial intelligence for segmentation and classification in lumbar spinal stenosis: an overview of current methods. Eur Spine J 2025; 34: e1146–e1155.

30. Wang, Chen, Fan, et al. Machine learning and deep learning for diagnosis of lumbar spinal stenosis: systematic review and meta-analysis. J Med Internet Res 2024; 26: e54676.

31. Lee, Moon, Suk, et al. Lumbar spinal stenosis: pathophysiology and treatment principle: a narrative review. Asian Spine J 2020; 14: e682–e693.

32. Suzuki, Kokabu, Yamada, et al. Deep learning-based detection of lumbar spinal canal stenosis using convolutional neural networks. Spine J 2024; 24: e2086–e2101.

33. Hartman, Granville, Jacobson. Radiologic evaluation of lumbar spinal stenosis: the integration of sagittal and axial views in decision making for minimally invasive surgical procedures. Cureus 2019; 11: e4268.

34. Katz, Zimmerman, Mass, et al. Diagnosis and management of lumbar spinal stenosis: a review. JAMA 2022; 327: e1688–e1699.

35. Abbas, Yousef, Peled, et al. Predictive factors for degenerative lumbar spinal stenosis: a model obtained from a machine learning algorithm technique. BMC Musculoskelet Disord 2023; 24: e218.

36. Wei, Zhou, Liu, et al. Management for lumbar spinal stenosis: a network meta-analysis and systematic review. Int J Surg 2021; 85: e19–e28.

37. Kaen, Park, Son. Clinical outcomes of uniportal compared with biportal endoscopic decompression for the treatment of lumbar spinal stenosis: a systematic review and meta-analysis. Eur Spine J 2023; 32: e2717–e2725.

38. Tumko, Kim, Uspenskaia, et al. A neural network model for detection and classification of lumbar spinal stenosis on MRI. Eur Spine J 2024; 33: e941–e948.

39. Ghogawala, Dziura, Butler, et al. Laminectomy plus fusion versus laminectomy alone for lumbar spondylolisthesis. N Engl J Med 2016; 374: e1424–e1434.

40. Han, et al. Percutaneous endoscopic unilateral laminotomy and bilateral decompression improves gait quality and stance balance in patients with lumbar spinal stenosis: a retrospective cohort study. J Orthop Surg Res 2025; 20: e238.

41. Kiliçaslan ÖF, Nabi, Yardibi, et al. Research tendency in lumbar spinal stenosis over the past decade: a bibliometric analysis. World Neurosurg 2021; 149: e71–e84.

Machine learning prediction of extended length of stay following endoscopic decompression for lumbar spinal stenosis: A retrospective cohort study
