Abstract
Objective
Extended length of stay (eLOS) after hip fracture surgery in elderly patients poses significant clinical and economic challenges. While traditional statistical models identify key predictors, they may miss complex variable interactions. This study compared logistic regression with machine learning (ML) algorithms to predict eLOS, emphasizing actionable factors within Enhanced Recovery After Surgery (ERAS) protocols.
Methods
This retrospective cohort study analyzed 1137 patients aged ≥50 years who underwent hip arthroplasty or internal fixation for hip fracture (2019–2025). Extended LOS was defined as hospital stay ≥14 days based on median LOS of 13.8 days. Two prediction models were developed: preoperative (admission data only) and early in-hospital (including day-1 postoperative data). Multivariate logistic regression identified independent predictors, while nine ML algorithms were trained and validated using 10-fold cross-validation. Feature importance was assessed through SHAP analysis.
Results
Among 1137 patients, 500 (44.0%) experienced eLOS. Logistic regression identified male sex (odds ratio (OR) = 1.42, p = 0.01), delayed surgery >48 hours (OR = 2.31, p < 0.001), prolonged operation time (OR = 1.67, p = 0.02), and postoperative pneumonia (OR = 3.12, p < 0.001) as independent risk factors. Tranexamic acid (TXA) use was protective (OR = 0.65, p = 0.03). After 10-fold cross-validation, logistic regression and Support Vector Machine achieved an area under the curve (AUC) of 0.76 (95% confidence interval (CI) 0.73–0.79), while XGBoost showed an AUC of 0.72 (95% CI 0.69–0.75). SHAP (SHapley Additive exPlanations) analysis confirmed time-to-surgery, TXA use, and coagulation markers as key predictors across models.
Conclusion
Both statistical and ML approaches identified delayed surgery and pneumonia as critical eLOS predictors, while ML revealed complex interactions involving coagulation dynamics and reinforced TXA's protective role. These findings support ML-augmented ERAS protocols targeting modifiable risk factors. External validation and clinical implementation studies are needed to confirm utility in routine practice.
Keywords
Introduction
Hip fractures in elderly patients represent a growing global health challenge, with incidence projected to exceed six million cases annually by 2050.1 These injuries carry substantial morbidity and mortality, with extended length of stay (eLOS) after surgical treatment associated with nosocomial infections, delayed rehabilitation, functional decline, and increased healthcare costs.2,3
Traditional statistical approaches have identified several predictors of prolonged hospitalization, including delayed surgery, comorbidities, and postoperative complications.4,5 However, conventional regression models assume linear relationships and may fail to capture complex interactions among variables, potentially limiting their predictive accuracy and clinical utility for individualized care planning.
Machine learning (ML) techniques offer advantages in handling nonlinear relationships and variable interactions, with potential to improve risk stratification in orthogeriatric populations.6–8 Recent studies demonstrate promising results using ML algorithms for predicting outcomes after hip fracture surgery, though systematic comparisons with traditional statistical methods remain limited.9–11 Furthermore, the integration of ML predictions with Enhanced Recovery After Surgery (ERAS) protocols—which emphasize evidence-based interventions to optimize perioperative care—has received limited attention despite its potential to guide targeted interventions.
The present study aims to develop and compare logistic regression (LR) with multiple ML algorithms for predicting extended hospital stay (eLOS ≥14 days) after hip fracture surgery using routinely collected clinical variables. We hypothesized that: (i) ML models would achieve superior discrimination (AUC >0.80) compared to traditional regression; (ii) surgical delay and postoperative pneumonia would emerge as primary predictors across methods; and (iii) interventions aligned with ERAS principles, particularly tranexamic acid (TXA) use, would demonstrate protective associations. To enhance clinical interpretability, we employed SHAP (SHapley Additive exPlanations) analysis to identify modifiable risk factors suitable for integration within ERAS pathways. This study follows TRIPOD 12 and TRIPOD-AI 13 reporting guidelines for prediction model development and validation.
Materials and methods
Study design and data source
This retrospective cohort study analyzed patients aged ≥50 years who underwent hip arthroplasty or internal fixation for hip fracture at the Department of Orthopaedics, People's Hospital of Chongqing Hechuan, between 2019 and 2025. The study adhered to the Declaration of Helsinki and received institutional ethics committee approval (HX-2025-009). Given the retrospective design, informed consent was waived. Reporting follows EQUATOR Network recommendations 14 and TRIPOD guidance for prediction model studies.12,13
Inclusion and exclusion criteria
Inclusion criteria:
Age ≥50 years at the time of injury;
Diagnosis of hip fracture; and
Surgical treatment (arthroplasty or internal fixation).
Exclusion criteria:
Age <50 years;
Multiple fractures, pathological fractures, or periprosthetic fractures;
Conservative treatment due to severe comorbidities; and
Missing more than 30% of clinical data (see Figure 1).

Figure 1. Study participant selection flowchart. Flow diagram illustrating the application of inclusion and exclusion criteria for elderly patients with hip fractures undergoing surgical treatment (2019–2025). The final cohort of 1137 patients comprised 765 (67.3%) hip arthroplasty cases and 372 (32.7%) internal fixation cases. Extended length of stay (eLOS) was defined as hospital stay ≥14 days based on the median length of stay of 13.8 days.
Missing data analysis and handling
All variables analyzed in this study were mandatory perioperative examination items routinely recorded for every patient according to institutional protocols. After the exclusion criteria were applied (including exclusion of records missing more than 30% of clinical data), the final dataset had no missing values for any analyzed variable, so neither imputation nor pairwise deletion was required. Working with a complete dataset avoids the additional assumptions that imputation would introduce and allows all modeling approaches to be fitted to the same records.
Variables and definitions
Primary outcome: eLOS was defined as hospital stay ≥14 days, based on the cohort median LOS of 13.8 days.
Predictor variables were categorized as:
Demographic factors: Age, sex, height, weight, and body mass index (BMI).
Clinical characteristics: Heart rate on admission, fracture type, and comorbidities.
Laboratory parameters: Hemoglobin (Hb), albumin (Alb), liver enzymes, renal function markers, coagulation studies (Activated Partial Thromboplastin Time (APTT), Prothrombin Time-International Normalized Ratio (PT-INR), and D-dimer), obtained preoperatively and on postoperative day 1.
Surgical factors: Time from admission to surgery, surgical procedure (arthroplasty vs internal fixation), anesthesia type, operation time, and TXA administration.
Postoperative complications: Pneumonia and deep vein thrombosis (DVT).
Economic variables: Total hospitalization costs relative to Diagnosis-Related Group (DRG) thresholds (CNY 32,751 for arthroplasty; CNY 22,643 for internal fixation per Chongqing health insurance regulations).
Clinical definitions followed established criteria: anemia (Hb <120 g/L for females, <130 g/L for males);15 malnutrition (albumin <3.5 g/dL);16 and organ dysfunction based on abnormal preoperative liver or renal indices.17
Statistical analysis
Descriptive statistics and group comparisons
Normality was assessed using the Shapiro–Wilk test (n < 2000) or the Kolmogorov–Smirnov test with Lilliefors correction (n ≥ 2000). Continuous variables are presented as mean ± standard deviation (SD) or median (interquartile range, IQR), as appropriate. Group comparisons used Student's t-test or the Mann–Whitney U-test for continuous variables and the chi-square test for categorical variables.
Prediction timepoints
Two prediction scenarios were evaluated:
Preoperative model: Using only baseline variables available at admission;
Early in-hospital model: Including postoperative day-1 variables for dynamic risk stratification.
Predictor specification and logistic regression analysis
Candidate predictors were defined a priori based on clinical relevance and previous literature, including demographics, comorbidities, surgical timing, operative factors, laboratory indices, and early postoperative complications. We did not rely solely on univariate p-values for variable selection. To mitigate overfitting, penalized logistic regression (LASSO) was applied to identify stable predictors. Continuous variables (e.g. age, BMI, and laboratory values) were modeled flexibly using restricted cubic splines to account for potential nonlinear relationships. Clinically plausible interactions (e.g. TXA use × operation time; gender × anemia) were tested in exploratory analyses.
In our cohort, 500 patients experienced eLOS (≥14 days). Because eLOS cases are the smaller outcome group (500 vs 637), the events-per-variable (EPV) ratio for k candidate predictors is approximately 500 ÷ k, which meets the commonly recommended minimum of 10 EPV for reliable LR modeling as long as k does not exceed 50. To further control model complexity and reduce overfitting, we prespecified clinically plausible predictors based on prior literature, applied penalized regression (LASSO) to shrink unstable coefficients, and used cross-validation with embedded preprocessing for all ML algorithms.
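The EPV arithmetic can be made explicit with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and k = 20 is an assumed value, since the number of candidate predictors is not stated here. It takes the smaller outcome class as the limiting sample size, which in this cohort is the 500 eLOS cases.

```python
import math


def events_per_variable(n_events: int, n_nonevents: int, k: int) -> float:
    """EPV: size of the smaller outcome class divided by k candidate predictors."""
    if k <= 0:
        raise ValueError("k must be a positive number of candidate predictors")
    return min(n_events, n_nonevents) / k


def max_predictors(n_events: int, n_nonevents: int, min_epv: float = 10.0) -> int:
    """Largest k that still satisfies the chosen minimum EPV."""
    return math.floor(min(n_events, n_nonevents) / min_epv)


# Cohort counts reported in this study: 500 eLOS and 637 conventional-LOS patients.
N_ELOS, N_CONVENTIONAL = 500, 637

# k = 20 is an assumed value, used only to illustrate the calculation.
epv_example = events_per_variable(N_ELOS, N_CONVENTIONAL, k=20)
k_limit = max_predictors(N_ELOS, N_CONVENTIONAL)
print(f"EPV with k=20: {epv_example:.1f}; largest k with EPV >= 10: {k_limit}")
```

Under the 10-EPV rule of thumb, this cohort could accommodate up to 50 candidate predictors before the ratio falls below the threshold.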
Given the low frequency of postoperative pneumonia (n = 15), we applied penalized LR (Firth correction) and conducted bootstrap resampling (1000 samples) to evaluate the stability of regression coefficients for rare events.
Machine learning model development
Nine classification algorithms were implemented: Gradient Boosting Machine (GBM), LR, Naïve Bayes (NB), k-nearest neighbors (KNN), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Artificial Neural Network (ANN).
Model training, hyperparameter tuning, and validation: All models were developed using a rigorous cross-validation framework to prevent information leakage. Preprocessing steps (scaling only, as no missing data required imputation) were embedded within each fold of the pipeline. Hyperparameter tuning was confined to the training folds using grid search with internal cross-validation. Model performance was then evaluated on the corresponding held-out validation fold. For final performance estimates, the entire dataset was split into a training set (70%) and a held-out test set (30%). The test set was reserved strictly for one-time final evaluation and was not used during training or tuning.
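The idea of embedding preprocessing within each fold can be illustrated with a dependency-free sketch. This is not the study's pipeline: the helper names and the simulated predictor values are hypothetical. The point it shows is that the scaling parameters (mean and SD) are estimated from the training rows of a fold only and then applied unchanged to the held-out rows.

```python
import random
import statistics


def kfold_indices(n: int, k: int, seed: int = 0) -> list[list[int]]:
    """Shuffle 0..n-1 and deal the indices into k disjoint validation folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def scale_fold(values, train_idx, valid_idx):
    """Standardize using the mean and SD of the training rows only."""
    train = [values[i] for i in train_idx]
    mu = statistics.fmean(train)
    sd = statistics.pstdev(train) or 1.0  # guard against a constant column
    scaled_train = [(v - mu) / sd for v in train]
    scaled_valid = [(values[i] - mu) / sd for i in valid_idx]
    return scaled_train, scaled_valid


# Hypothetical predictor values (for example, days from admission to surgery).
rng = random.Random(42)
delay_days = [rng.uniform(0.5, 10.0) for _ in range(100)]

folds = kfold_indices(len(delay_days), k=10)
for valid_idx in folds:
    held_out = set(valid_idx)
    train_idx = [i for i in range(len(delay_days)) if i not in held_out]
    scaled_train, scaled_valid = scale_fold(delay_days, train_idx, valid_idx)
    # Model fitting and hyperparameter tuning would use scaled_train only;
    # scaled_valid is used once, to score this fold.
```

In scikit-learn the same guarantee is usually obtained by wrapping the scaler and the classifier in a `Pipeline` and passing that pipeline to `GridSearchCV`, so that the scaler is refit on every training split.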
Performance metrics: Accuracy, precision (equivalent to positive predictive value, PPV), recall (sensitivity), F1-score, negative predictive value (NPV), and area under the ROC (receiver operating characteristic) curve (AUC-ROC).
Model calibration and clinical utility: In addition to discrimination metrics, calibration was assessed using the Brier score, calibration slope, and calibration intercept. Calibration plots were generated by plotting predicted versus observed probabilities across deciles of risk. Clinical utility was evaluated with decision-curve analysis (DCA) to quantify net benefit across a range of clinically relevant thresholds for eLOS (≥14 days).
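As a rough illustration of two of the calibration quantities named above, the sketch below computes a Brier score and the decile-wise pairs of mean predicted risk and observed event rate that a calibration plot displays. It is a simplified sketch under stated assumptions: the function names are hypothetical, the predictions are simulated rather than taken from the study, and the calibration slope and intercept (which require refitting a logistic model) are not covered.

```python
import random


def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and outcome (0/1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def calibration_table(y_true, p_pred, n_bins=10):
    """Return (mean predicted risk, observed event rate) per risk-ordered bin."""
    pairs = sorted(zip(p_pred, y_true))
    n = len(pairs)
    rows = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if chunk:
            mean_pred = sum(p for p, _ in chunk) / len(chunk)
            observed = sum(y for _, y in chunk) / len(chunk)
            rows.append((mean_pred, observed))
    return rows


# Simulated data: outcomes are drawn with exactly the predicted probability,
# so these hypothetical predictions are well calibrated by construction.
rng = random.Random(0)
sim_pred = [rng.random() for _ in range(1000)]
sim_true = [1 if rng.random() < p else 0 for p in sim_pred]

sim_brier = brier_score(sim_true, sim_pred)
sim_rows = calibration_table(sim_true, sim_pred)
for mean_pred, observed in sim_rows:
    print(f"predicted {mean_pred:.2f}  observed {observed:.2f}")
```

A well-calibrated model produces rows in which the two columns track each other closely; systematic gaps in one direction indicate over- or under-prediction of risk.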
Feature importance and interpretability
SHAP analysis 18 provided model-agnostic feature importance rankings and directional effects. Quantitative importance values were calculated across ensemble methods (GBM, RF, and XGBoost) with cross-validation stability assessment.
Sensitivity analyses
Robustness was evaluated through: (i) alternative eLOS thresholds (≥10 and ≥16 days); (ii) procedure-specific analyses (arthroplasty vs internal fixation); (iii) repeated 10-fold cross-validation with hyperparameter perturbation (±20%); and (iv) leave-one-year-out temporal validation (2019–2025).
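Leave-one-year-out validation in (iv) amounts to holding out all patients admitted in one calendar year and training on the remaining years, repeated for each year. A minimal sketch of the splitting logic follows; the function name and the toy list of admission years are hypothetical, and the actual model fitting is omitted.

```python
def leave_one_year_out(years):
    """Yield (held_out_year, train_indices, test_indices) for each distinct year."""
    for held_out in sorted(set(years)):
        train_idx = [i for i, y in enumerate(years) if y != held_out]
        test_idx = [i for i, y in enumerate(years) if y == held_out]
        yield held_out, train_idx, test_idx


# Hypothetical admission years for ten records spanning the study period.
admission_years = [2019, 2019, 2020, 2021, 2021, 2022, 2023, 2024, 2025, 2025]

splits = list(leave_one_year_out(admission_years))
for year, train_idx, test_idx in splits:
    print(f"{year}: train on {len(train_idx)} records, test on {len(test_idx)}")
```

scikit-learn's `LeaveOneGroupOut` splitter provides the same behavior when the admission year is passed as the `groups` argument.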
All analyses were performed using SPSS 27.0 (IBM Corp.) and Python 3.12.4 with scikit-learn, 19 XGBoost, 20 and SHAP 18 libraries. DCA quantified net clinical benefit across a range of thresholds. 21 SHAP plots 22 were used to interpret feature contributions to model outputs. Ensemble models (GBM, RF, and XGBoost) were further integrated to rank feature importance across architectures.
Results
Results are presented separately for the two prediction scenarios: preoperative models (baseline variables only) and early in-hospital models (including day 1 postoperative variables).
Patient characteristics and baseline comparisons
A total of 1137 patients were included after applying selection criteria (Figure 1), comprising 765 (67.3%) who underwent hip arthroplasty and 372 (32.7%) who underwent internal fixation. Based on the median hospital stay of 13.8 days, patients were categorized into conventional LOS (<14 days; n = 637, 56.0%) and eLOS (≥14 days; n = 500, 44.0%).
Baseline characteristics differed significantly between groups (Table 1). Patients with eLOS were more likely to be male (52.4% vs 45.1%, p = 0.013), had longer admission-to-surgery intervals (5.81 vs 3.37 days; difference 2.44 days, 95% CI 2.11–2.77, p < 0.001), and higher rates of preoperative malnutrition (18.6% vs 12.7%, p = 0.008) and organ dysfunction. Operation times were longer in the eLOS group (102.3 vs 89.7 minutes, p = 0.002), with lower rates of TXA use (64.2% vs 73.1%, p = 0.001) and higher incidence of postoperative pneumonia (3.0% vs 0.5%, p < 0.001). Economic costs significantly exceeded DRG thresholds more frequently in the eLOS group.
Table 1. Baseline demographic and clinical characteristics of study participants stratified by length of hospital stay.
Data are presented as mean ± standard deviation for normally distributed continuous variables, median (interquartile range) for non-normally distributed continuous variables, and number (percentage) for categorical variables. Statistical comparisons performed using Student's t-test or Mann–Whitney U-test for continuous variables and chi-square test for categorical variables. Significant differences (p < 0.05) observed for gender distribution, body weight, malnutrition status, preoperative liver and renal dysfunction, operation time, time from admission to surgery, postoperative pneumonia, tranexamic acid use, and economic factors.
Values are presented as mean ± SD or n (%), unless otherwise indicated. p-Values are reported to three decimal places.
Abbreviations: Alb: albumin; APTT: Activated Partial Thromboplastin Time; BMI: body mass index; bpm: beats per minute; CNY: Chinese yuan; DRGs: diagnosis-related groups; DVT: deep vein thrombosis; eLOS: extended length of stay; Hb: hemoglobin; HCT: hematocrit; MCHC: Mean Corpuscular Hemoglobin Concentration; n: sample size; Po: postoperative; Pre: preoperative; PT-INR: Prothrombin Time–International Normalized Ratio; SD: standard deviation.
Multivariable logistic regression analysis
Preoperative versus early in-hospital prediction models
Two prediction timepoints were evaluated to assess clinical utility across different decision-making contexts. In the preoperative model using only admission variables, male sex (OR = 1.28, 95% CI 0.98–1.67, p = 0.071) and time from admission to surgery (OR = 1.68, 95% CI 1.55–1.82, p < 0.001) emerged as primary predictors.
In the comprehensive early in-hospital model incorporating postoperative day 1 data, five factors remained independently associated with eLOS (Table 2).
Table 2. Multivariable logistic regression analysis identifying independent predictors of extended hospital stay in elderly patients with hip fractures.
Three models presented: univariate analysis, preoperative multivariable model (using admission variables only), and comprehensive multivariable model (including preoperative, operative, and early postoperative variables). Results presented as β, OR with 95% CI, and p-values. Bootstrap resampling (1000 iterations) performed to assess stability of coefficients for rare events. Extended length of stay defined as hospital stay ≥14 days.
Three models are presented:
Univariate analysis of each predictor.
Multivariate model 1 (preoperative prediction): includes only baseline and preoperative variables (demographics, comorbidities, nutritional status, laboratory indices, and time-to-surgery). This reflects the admission/triage use-case.
Multivariate model 2 (early in-hospital prediction): extends model 1 by incorporating operative and early postoperative variables (operative time, tranexamic acid use, and pneumonia within the first postoperative day). This reflects the early in-hospital monitoring use-case.
Values are presented as β, ORs with 95% CI, and p-values.
β: regression coefficient; BMI: body mass index; CI: confidence interval; eLOS: extended length of stay; Po: postoperation; Pre: preoperation; OR: odds ratio.
Risk factors:
Male sex (OR = 1.42, 95% CI 1.09–1.84, p = 0.010);
Delayed surgery >48 hours (OR = 2.31, 95% CI 1.72–3.09, p < 0.001);
Prolonged operation time (OR = 1.67, 95% CI 1.10–2.53, p = 0.020); and
Postoperative pneumonia (OR = 3.12, 95% CI 1.63–5.99, p < 0.001).
Protective factor:
TXA use (OR = 0.65, 95% CI 0.44–0.95, p = 0.030).
Alternative multivariable analysis confirmed these associations with slight variations in effect sizes (Supplemental Table 1). Sensitivity analysis using bootstrap resampling (1000 iterations) confirmed the stability of these associations, although the pneumonia effect showed wide confidence intervals due to low event frequency (n = 15). Absolute effect sizes for key continuous variables are presented in Supplemental Table 2.
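The relationship between the regression coefficient β, the OR, and its confidence interval can be checked with a few lines of arithmetic. The sketch below assumes a Wald-type interval, exp(β ± z·SE), which is an assumption on our part: intervals obtained from Firth-penalized or bootstrap procedures need not be exactly symmetric on the log scale. The function names are hypothetical; the numerical inputs are the reported estimate for delayed surgery.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def se_from_ci(lower: float, upper: float, z: float = Z_95) -> float:
    """Back-calculate the standard error of beta from an OR confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def or_with_ci(beta: float, se: float, z: float = Z_95):
    """Convert a log-odds coefficient and its SE into (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Reported estimate for delayed surgery >48 hours: OR = 2.31 (95% CI 1.72 to 3.09).
beta = math.log(2.31)
se = se_from_ci(1.72, 3.09)
odds_ratio, ci_low, ci_high = or_with_ci(beta, se)
print(f"beta = {beta:.3f}, SE = {se:.3f}, "
      f"OR = {odds_ratio:.2f} ({ci_low:.2f} to {ci_high:.2f})")
```

The reconstructed interval agrees with the reported one to within rounding, which is what one would expect if the published interval is approximately symmetric around β on the log-odds scale.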
Cross-validation performance of logistic regression
In 10-fold cross-validation, LR achieved moderate discrimination with an AUC of 0.73 (95% CI 0.70–0.76), alongside acceptable calibration (Brier score 0.19; calibration slope 0.84; intercept −0.02). Performance metrics are included in the comprehensive model comparison (Table 3). While ensemble ML models demonstrated superior discrimination in initial training, LR retained the advantage of clinical interpretability, since individual predictor effects and their confidence intervals can be read directly from the model.
Table 3. Performance metrics for nine machine learning algorithms in predicting extended length of stay before cross-validation.
Algorithms evaluated include ensemble methods (GBM, RF, XGBoost), linear methods (logistic regression, SVM), instance-based methods (KNN), probabilistic methods (Naïve Bayes), tree-based methods (Decision Tree), and neural networks (ANN). Performance assessed using standard classification metrics on held-out test set (30% of data). All models trained using 70% training data with embedded preprocessing and hyperparameter optimization. Values represent initial training set performance metrics before cross-validation. See Table 5 for cross-validated results with 95% confidence intervals.
Values represent mean cross-validated performance metrics with 95% confidence intervals where applicable.
ANN: artificial neural network; AUC: area under the curve; GBM: gradient boosting machine; KNN: k-nearest neighbor; NPV: negative predictive value; Precision: positive predictive value (PPV); Recall: sensitivity; RF: Random Forest; SVM: support vector machine; XGB: extreme gradient boosting.
Machine learning model performance
Initial model discrimination
Nine ML algorithms were trained and evaluated for eLOS prediction (Supplemental Figure 1 and Supplemental Table 3). Initial performance assessment demonstrated superior results for ensemble methods (Table 3). GBM achieved the highest discrimination (AUC = 0.88, accuracy = 81%), followed by RF (AUC = 0.85, accuracy = 79%), and XGBoost (AUC = 0.85, accuracy = 78%). Simpler algorithms showed more limited performance: DT (AUC = 0.79, accuracy = 75%) and KNN (AUC = 0.74, accuracy = 70%). All models significantly outperformed random classification (AUC = 0.5), demonstrating meaningful predictive capability.
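An AUC can be read as the probability that a randomly chosen eLOS patient receives a higher predicted risk than a randomly chosen conventional-LOS patient, which is why 0.5 corresponds to random classification. The following is a minimal sketch of that pairwise definition, not the evaluation code used in this study; the function name and the toy labels and scores are hypothetical.

```python
def pairwise_auc(y_true, scores):
    """AUC as the share of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = eLOS, 0 = conventional LOS; scores are predicted risks.
labels = [0, 0, 1, 1]
risks = [0.10, 0.40, 0.35, 0.80]

auc_value = pairwise_auc(labels, risks)                     # 3 of 4 pairs ordered correctly
constant_auc = pairwise_auc(labels, [0.5, 0.5, 0.5, 0.5])   # uninformative scores
print(auc_value, constant_auc)
```

The pairwise form costs one comparison per positive-negative pair, which remains manageable at this cohort's size (500 × 637 pairs); rank-based formulas give the same value more efficiently.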
Cross-validation performance and model robustness
Ten-fold cross-validation revealed important changes in model ranking and highlighted the importance of rigorous validation (Figure 2). After 10-fold cross-validation, SVM and logistic regression achieved AUC of 0.76 (95% CI 0.73–0.79) and balanced performance (accuracy = 73%, precision = 75%, recall = 71%). This superior generalization likely reflects SVM's inherent regularization properties and optimal hyperplane construction.

ROC curve comparison of nine machine learning algorithms showing initial training performance for predicting extended length of stay (≥14 days) in elderly patients with hip fractures. Note: These curves represent training set performance before cross-validation. After 10-fold cross-validation (see Table 5), the models achieved more conservative estimates: SVM and logistic regression (AUC = 0.76, 95% CI 0.73–0.79), XGBoost (AUC = 0.72, 95% CI 0.69–0.75), Random Forest (AUC = 0.72), Decision Tree (AUC = 0.74), GBM (AUC = 0.67), and Naïve Bayes (AUC = 0.73). KNN and ANN showed higher AUCs (0.87 and 0.86) but require external validation. AUC: area under the curve; ANN: Artificial Neural Network; CI: confidence interval; GBM: gradient boosting machine; KNN: k-nearest neighbors; ROC: receiver operating characteristic.
Table 4. Feature importance rankings across ensemble machine learning models for predicting extended length of stay.
Importance values scaled relative to the top predictor (time from admission to surgery = 1.00) for each model. Rankings derived from model-specific importance metrics: Gini importance for Random Forest, gain-based importance for XGBoost, and permutation importance for GBM. Combined ensemble ranking represents averaged importance across all three models. Cross-validation stability assessed through repeated sampling across 10 folds.
Feature importance scores from trained machine learning models. Values are scaled to model-specific maximum (1.00).
APTT: activated partial thromboplastin time; D-dimer: fibrin degradation product (D-dimer); DVT: deep vein thrombosis; GBM: gradient boosting machine; HCT: hematocrit; Hb: hemoglobin; Platelet: platelet count; XGB: extreme gradient boosting.
Table 5. Cross-validated performance of machine learning models for preoperative triage and early in-hospital prediction of eLOS.
Preoperative models (baseline variables only) demonstrated moderate discrimination (AUC ∼0.74–0.80), whereas early in-hospital models (including day 1 postoperative variables) achieved superior discrimination (AUC ∼0.85–0.92) with favorable calibration. At recall-oriented thresholds (sensitivity ≥0.80), early models identified high-risk patients with net benefit comparable to a treat-all strategy at 30–40% threshold probabilities. Values are mean estimates from 10-fold cross-validation with bootstrapped 95% CIs.
AUC: area under the curve; Cal: calibration; CI: confidence interval; eLOS: extended length of stay; NB: net benefit; NPV: negative predictive value; PPV: positive predictive value; SVM: support vector machine; Thr: threshold probability; XGBoost: extreme gradient boosting.
Ensemble methods showed performance decline after cross-validation: GBM maintained high precision (86%) but lower recall (40%), suggesting potential overfitting in initial training. RF and XGBoost achieved more balanced performance with AUCs of 0.76 (95% CI 0.73–0.79) and similar values, respectively. Detailed cross-validation metrics with confidence intervals are provided in Supplemental Table 4.
Model calibration and clinical utility
Calibration assessment demonstrated acceptable agreement between predicted and observed risks across top-performing models. Brier scores ranged from 0.15 to 0.21, with calibration slopes near 1.0 and intercepts close to 0, indicating minimal systematic bias (Supplemental Table 5 and Supplemental Figure 2(a)–(c)).
DCA confirmed clinical utility of ML approaches over conventional strategies (Supplemental Figure 2). Ensemble models (GBM, RF, and XGBoost) provided greater net benefit than LR across clinically relevant threshold probabilities (10–40%), with peak net benefit of 0.373–0.388 at 10% threshold (Supplemental Table 6).
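Net benefit in DCA weighs true positives against false positives at the odds implied by the threshold probability: NB = TP/n − (FP/n) × pt/(1 − pt). The sketch below implements that standard formula together with the treat-all reference strategy. The function names are hypothetical and no model predictions from this study are used; only the cohort prevalence (500 of 1137) is taken from the text.

```python
def net_benefit(y_true, p_pred, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, p_pred) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, p_pred) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(prevalence, threshold):
    """Net benefit when every patient is classified as high risk."""
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Cohort prevalence of eLOS reported in this study.
PREVALENCE = 500 / 1137

treat_all_10 = net_benefit_treat_all(PREVALENCE, 0.10)
treat_all_40 = net_benefit_treat_all(PREVALENCE, 0.40)
print(f"treat-all net benefit: {treat_all_10:.3f} at 10%, {treat_all_40:.3f} at 40%")

# Sanity check: a model that flags everyone reproduces the treat-all value.
everyone_true = [1] * 500 + [0] * 637
everyone_pred = [1.0] * 1137
flag_all_10 = net_benefit(everyone_true, everyone_pred, 0.10)
```

With a prevalence of 44.0%, the treat-all strategy yields a net benefit of about 0.378 at the 10% threshold, which is the reference level against which model curves at that threshold are read.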
Feature importance and interpretability analysis
Model-specific feature rankings
Individual ensemble models revealed distinct patterns in predictor prioritization. The top five predictors identified by each model were:
GBM model (Supplemental Figure 4a): Time from admission to surgery (importance ∼0.45), preoperative APTT, postoperative day 1 hematocrit, postoperative day 1 hemoglobin, and albumin levels.
XGBoost model (Supplemental Figure 4b): Time from admission to surgery (importance ∼0.16), postoperative deep vein thrombosis, TXA usage, operation time, and preoperative APTT.
RF model (Supplemental Figure 4c): Time from admission to surgery (importance ∼0.17), preoperative APTT, preoperative D-dimer, preoperative hematocrit, and platelet count.
Quantitative feature rankings
Consolidated feature importance analysis across ensemble methods revealed consistent patterns with some model-specific variations. Time from admission to surgery was universally ranked as the most influential predictor across all models (scaled importance = 1.00). The complete ranking across top predictors is detailed in Supplemental Table 7, confirming surgical delay as the primary driver with secondary contributions from coagulation and hematological parameters.
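Scaling each model's importances to its own top predictor and then averaging across models can be sketched as follows. This is an illustration under stated assumptions rather than the study's code: the function names are hypothetical, and only the raw importances for time from admission to surgery (about 0.45, 0.16, and 0.17) come from the text; all other values are invented placeholders. Features absent from a model's list are treated as zero, which is a design choice of this sketch.

```python
def scale_to_top(importances):
    """Divide every importance by the model's largest value, so the top feature is 1.00."""
    top = max(importances.values())
    return {name: value / top for name, value in importances.items()}


def ensemble_ranking(per_model):
    """Average the scaled importances across models and sort in descending order."""
    scaled = {model: scale_to_top(imp) for model, imp in per_model.items()}
    features = sorted({f for imp in scaled.values() for f in imp})
    averaged = {
        f: sum(imp.get(f, 0.0) for imp in scaled.values()) / len(scaled)
        for f in features
    }
    return sorted(averaged.items(), key=lambda kv: kv[1], reverse=True)


# time_to_surgery values follow the text; every other number is a placeholder.
raw_importances = {
    "GBM": {"time_to_surgery": 0.45, "pre_APTT": 0.10, "po_hematocrit": 0.08},
    "XGBoost": {"time_to_surgery": 0.16, "pre_APTT": 0.03, "TXA_use": 0.05},
    "RF": {"time_to_surgery": 0.17, "pre_APTT": 0.04, "pre_D_dimer": 0.03},
}

ranking = ensemble_ranking(raw_importances)
for feature, score in ranking:
    print(f"{feature:16s} {score:.2f}")
```

Because raw importance scales differ between algorithms, scaling before averaging prevents the model with the largest raw values from dominating the combined ranking.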
Ensemble feature importance and directional effects
Consolidated ensemble ranking across all three models identified the top secondary predictors as preoperative APTT (importance 0.21), postoperative day 1 hematocrit (0.18), platelet count (0.15), and preoperative D-dimer (0.12) (Table 4 and Supplemental Figure 5). SHAP analysis provided insights into feature directionality and magnitude of effects (Supplemental Figure 6(a)–(c)). Across all models, surgical delay showed strong positive association with eLOS risk. Elevated preoperative coagulation markers (APTT and D-dimer) and declining postoperative hematocrit consistently contributed to higher eLOS probability. TXA use demonstrated protective effects, particularly in XGBoost models.
Distribution analysis confirmed distinct patterns between patient groups (Supplemental Figure 7). The eLOS group showed significantly longer admission-to-surgery intervals, elevated coagulation markers, and different hematological parameter distributions, supporting their predictive validity.
Feature correlations and independence
Correlation analysis among top 20 predictive features revealed important interdependencies (Supplemental Figure 8). Strong positive correlations existed between preoperative and postoperative hematological markers (r = 0.98 for hematocrit), while coagulation measures showed mild inverse correlations. Notably, platelet count demonstrated relatively independent predictive value with minimal correlation to other variables, suggesting unique contribution to eLOS prediction.
Sensitivity and robustness analyses
Alternative outcome thresholds
Model performance remained consistent across alternative eLOS definitions. Using thresholds of ≥10 and ≥16 days, the best-performing models achieved AUCs of 0.78 (95% CI 0.75–0.81) and 0.69 (95% CI 0.65–0.72), respectively, with maintained calibration (Brier scores 0.14–0.19) and slopes near 1.0 (Supplemental Table 8).
Procedure-specific analysis
Subgroup analyses comparing arthroplasty (n = 765) versus internal fixation (n = 372) patients showed similar discrimination patterns. Cross-validated AUCs were 0.71 for arthroplasty and 0.77 for fixation procedures, with overlapping CIs supporting generalizability across surgical approaches (Supplemental Table 9).
Temporal and hyperparameter stability
Repeated 10-fold cross-validation and modest hyperparameter perturbations (±20%) produced <0.02 absolute changes in AUC without altering model rankings. Leave-one-year-out validation (2019–2025) confirmed temporal stability, though slight performance variation was observed, emphasizing the importance of ongoing model monitoring (Supplemental Table 10).
Clinical decision support applications
The developed models support two distinct clinical use-cases with different performance characteristics (Table 5):
Preoperative triage model: Using baseline variables only, this model achieved moderate discrimination (AUC ≈ 0.74–0.80), suitable for early discharge planning and resource allocation at admission.
Early in-hospital prediction model: Incorporating day 1 postoperative data, this model achieved superior discrimination (AUC ≈ 0.85–0.92) for dynamic risk stratification and intervention targeting.
At recall-oriented thresholds (sensitivity ≥0.80), the early in-hospital model identified high-risk patients with net benefit comparable to treating all patients at clinically relevant threshold probabilities of 0.30–0.40 (Supplemental Figure 2), supporting practical implementation for discharge planning decisions. DCA demonstrated that ML models provided greater clinical utility than conventional approaches across the 10–40% risk threshold range most relevant for perioperative decision making.
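A recall-oriented operating point can be chosen by scanning candidate cut-offs from high to low and stopping at the first one that reaches the target sensitivity. The sketch below shows one simple way to do this; the function name and the toy data are hypothetical, and how thresholds were actually selected in this study is not described here.

```python
def threshold_for_sensitivity(y_true, p_pred, target=0.80):
    """Return the highest risk threshold whose sensitivity is at least `target`."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("at least one positive case is required")
    for thr in sorted(set(p_pred), reverse=True):
        tp = sum(1 for y, p in zip(y_true, p_pred) if y == 1 and p >= thr)
        if tp / n_pos >= target:
            return thr
    return None


# Toy example: five eLOS patients (1) and three conventional-LOS patients (0).
toy_true = [1, 1, 1, 1, 1, 0, 0, 0]
toy_pred = [0.90, 0.80, 0.70, 0.60, 0.20, 0.65, 0.30, 0.10]

chosen = threshold_for_sensitivity(toy_true, toy_pred, target=0.80)
print(f"highest threshold with sensitivity >= 0.80: {chosen}")
```

Choosing the highest qualifying threshold keeps false positives as low as the sensitivity target allows, since any lower cut-off would flag at least as many patients.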
Discussion
Clinical insights from statistical and machine learning approaches
This study compared conventional LR with multiple ML algorithms to predict extended length of stay after hip fracture surgery in elderly patients. Both methodologies consistently identified delayed surgery and postoperative pneumonia as critical predictors of prolonged hospitalization, while revealing TXA's independent protective association—findings that align with ERAS principles.
Multivariate LR confirmed delayed surgery (>48 hours), male sex, prolonged operation time, and postoperative pneumonia as independent risk factors for extended hospitalization. The identification of TXA as an independent protective factor (OR = 0.65, p = 0.03) extends beyond its established benefits for reducing blood loss and transfusion requirements.23–25 Our findings suggest TXA's direct association with shorter hospital stays, potentially through limiting inflammatory responses secondary to significant blood loss and reducing transfusion-related complications.26–28 This may involve reducing the need for re-interventions such as surgical drainage for hematomas and minimizing intensive care unit (ICU) stays associated with bleeding complications.29–31 These findings support existing ERAS guidelines recommending routine TXA administration during hip fracture surgery, particularly among elderly patients vulnerable to perioperative complications.32–34
Nevertheless, our regression model may oversimplify dose–response relationships or optimal timing for TXA administration. It also does not explore potential interactions between TXA and other ERAS elements, such as early mobilization or multimodal analgesia, which could synergistically enhance patient recovery.35–37 Thus, while TXA independently predicts shorter hospitalization, its optimal effectiveness may depend on integration within a comprehensive ERAS protocol rather than as an isolated intervention.38–41
Machine learning performance and clinical interpretability
Ensemble ML models, specifically GBM, XGBoost, and RF, outperformed traditional regression in initial training, demonstrating their capacity to model complex, nonlinear relationships between clinical variables. GBM achieved strong initial training performance (see Table 3), which is attributable to its iterative error-correction approach capturing nuanced interactions among predictors such as surgical timing, coagulation markers (APTT and D-dimer), and TXA use. Cross-validation, however, yielded more modest estimates.
After 10-fold cross-validation, SVM unexpectedly showed robust performance (AUC: 0.76, 95% CI 0.73–0.79), outperforming the ensemble methods that had led in initial training. This better generalization likely reflects SVM's inherent regularization through the soft-margin parameter C, together with the kernel trick, which allows a maximum-margin hyperplane to be fitted in a transformed feature space.42,43 In contrast, GBM's decline after validation underscores its sensitivity to data partitioning and the risk of overfitting, emphasizing the importance of careful hyperparameter tuning and validation strategies.
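The gap between training and cross-validated discrimination can be reproduced on a toy example. The sketch below is an assumption-laden illustration, not the study pipeline: it uses a deliberately overfit nearest-neighbour "memorizer" in place of GBM, a single hypothetical feature, and 5 folds assigned by index rather than the 10-fold scheme used in the study.

```python
def auc(y_true, scores):
    """AUC as the chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def nearest_neighbour_score(x, train_x, train_y):
    """Mean label of the closest training point(s); memorizes its training data."""
    best = min(abs(x - t) for t in train_x)
    labels = [y for t, y in zip(train_x, train_y) if abs(x - t) == best]
    return sum(labels) / len(labels)


def cross_validated_scores(xs, ys, k):
    """Out-of-fold scores: each point is scored by a model that never saw it."""
    scores = [None] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        held_out = [i for i in range(len(xs)) if i % k == fold]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        for i in held_out:
            scores[i] = nearest_neighbour_score(xs[i], train_x, train_y)
    return scores


# Hypothetical toy data: one feature with a clear trend and two atypical patients.
xs = list(range(20))
ys = [1 if x >= 10 else 0 for x in xs]
ys[3], ys[15] = 1, 0

train_scores = [nearest_neighbour_score(x, xs, ys) for x in xs]  # scored on memorized data
cv_scores = cross_validated_scores(xs, ys, k=5)

train_auc = auc(ys, train_scores)
cv_auc = auc(ys, cv_scores)
print(f"training AUC={train_auc:.3f} cross-validated AUC={cv_auc:.3f}")
```

The memorizer separates its own training data perfectly, yet its out-of-fold AUC is lower because the two atypical patients cannot be predicted from their neighbours. The same mechanism, on a smaller scale, is a plausible reading of the decline observed for flexible ensemble models.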
SHAP analysis provided further clinical insights, highlighting that ML models identified not only recognized clinical risk factors (delayed surgery and pneumonia) but also subtler dynamic changes, such as declines in postoperative hematocrit, potentially reflecting hidden complexities in patient recovery trajectories. Variations in feature rankings across different algorithms—for example, the prioritization of TXA by XGBoost versus coagulation markers by RF—highlight that feature importance is context- and model-dependent, necessitating careful clinical interpretation to distinguish genuine predictive features from algorithmic artifacts.44,45
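The attribution idea behind SHAP can be illustrated with an exact Shapley computation on a very small model. This is a simplified sketch under stated assumptions: the risk function, its coefficients, and the feature names are hypothetical, and "absent" features are replaced by a single baseline patient rather than averaged over a background dataset as SHAP implementations typically do.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one patient, enumerating all feature subsets."""
    n = len(x)

    def value(subset):
        # Features in the subset take the patient's values; the rest take baseline values.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi


def toy_risk(z):
    """Hypothetical risk score with an interaction between delay and low hematocrit."""
    delayed, txa, low_hct = z
    return 0.2 + 0.3 * delayed - 0.1 * txa + 0.1 * low_hct + 0.1 * delayed * low_hct


patient = [1, 0, 1]   # delayed surgery, no TXA, low postoperative hematocrit
baseline = [0, 1, 0]  # reference patient: timely surgery, TXA given, normal hematocrit

phi = shapley_values(toy_risk, patient, baseline)
for name, contribution in zip(["delayed_surgery", "txa_used", "low_hematocrit"], phi):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the patient's predicted risk and the baseline prediction, and the interaction term is split equally between the two features involved. Because the split depends on the fitted function, different algorithms trained on the same data can rank features differently.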
Comparison with existing literature and model performance
Recent digital-health studies have reported moderate-to-strong discrimination for predicting prolonged or extended LOS after hip fracture surgery, though definitions and cohorts vary. A single-center RF model (cut-point at the 75th percentile; PLOS ≥12 days) achieved a test AUC of 0.85 in 360 patients, with surgical timing and laboratory markers among the top predictors.9 In a larger single-center study (n = 763, 2018–2022) using multiple ML algorithms, delayed surgery, D-dimer, American Society of Anesthesiologists (ASA) class, surgery type, and sex consistently emerged as important, with SVM/LR performing best after cross-validation.10 Broader fragility-fracture work has similarly reported AUCs around 0.84, but often with different endpoints and variable sets.11
Against this backdrop, our cross-validated AUCs (≈0.73–0.76) and leading predictors (time-to-surgery, coagulation/hematological indices, sex, and TXA) are consistent with the literature, while our eLOS definition (≥14 days) and inclusion of both arthroplasty and fixation distinguish the cohort and task framing. Differences in thresholds, sample size, feature availability, and timing windows likely explain performance spread across studies.
Our findings reaffirm that delayed surgical intervention remains the most critical modifiable factor associated with eLOS, aligning with recent literature.46–48 Additionally, the consistent identification of coagulation status (APTT and D-dimer) as a secondary predictor aligns with studies linking perioperative coagulation abnormalities to prolonged recovery and complications, such as thromboembolic events.49,50
Integration with enhanced recovery after surgery protocols
Interpretability analyses identified time-to-surgery, postoperative pneumonia, and TXA use as the most influential predictors across models, each aligning with potentially actionable elements of perioperative care. Expedited surgical scheduling is a core component of ERAS and may mitigate risk of prolonged hospitalization. Prevention and early recognition of pneumonia—through respiratory physiotherapy, incentive spirometry, and infection control—represent additional modifiable targets. TXA use, already recommended in orthopedic guidelines, was associated with reduced odds of eLOS and highlights the importance of protocol adherence.
Both traditional regression and ML approaches consistently identified delayed surgery and postoperative pneumonia as crucial predictors of prolonged hospitalization. However, ML techniques expanded predictive insights by highlighting underappreciated variables, such as TXA use and preoperative coagulation status, underscoring their potential as targets for intervention. Future ERAS guidelines might incorporate recommendations regarding optimal TXA dosing and timing, based on predictive model insights.51,52
Although ML models provided superior predictive accuracy, they entail greater computational complexity and reduced transparency compared to regression. These limitations highlight the importance of balancing interpretability with predictive accuracy in clinical settings. For practical clinical application, hybrid modeling approaches that leverage regression for initial identification of key predictors, followed by ML methods for enhanced risk stratification, could optimize both interpretability and accuracy.53,54
Clinical decision support and implementation
The developed models support two distinct clinical use-cases with different performance characteristics. A preoperative triage model using baseline variables only achieved moderate discrimination (AUC≈0.74–0.80) suitable for early discharge planning and resource allocation at admission. An early in-hospital prediction model incorporating postoperative day 1 data achieved superior discrimination (AUC≈0.85–0.92) for dynamic risk stratification and intervention targeting.
Beyond discrimination, we confirmed that the models were reasonably calibrated, as reflected by Brier scores and calibration slope/intercept. DCA showed potential clinical utility: ML-based models consistently achieved higher net benefit than LR at thresholds relevant for early discharge planning. These findings highlight that the models not only distinguish high-risk patients but also provide actionable value for perioperative decision making. Future work should focus on integrating these predictors into ERAS pathways and prospectively testing whether targeted interventions improve discharge outcomes.55
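The calibration measures named above can be computed as follows. This is a minimal sketch, not the study code: the helper names and the grouped toy data are hypothetical, and the calibration intercept and slope are obtained by fitting a logistic regression of the outcome on the logit of the predicted risk with a hand-written Newton-Raphson loop.

```python
from math import exp, log


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def calibration_intercept_slope(y_true, y_prob, iterations=25):
    """Fit y ~ sigmoid(a + b * logit(p)); a near 0 and b near 1 suggest good calibration."""
    lp = [log(p / (1.0 - p)) for p in y_prob]
    a, b = 0.0, 0.0
    for _ in range(iterations):
        mu = [1.0 / (1.0 + exp(-(a + b * z))) for z in lp]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(y - m for y, m in zip(y_true, mu))
        g1 = sum((y - m) * z for y, m, z in zip(y_true, mu, lp))
        # Information matrix entries.
        w = [m * (1.0 - m) for m in mu]
        h00 = sum(w)
        h01 = sum(wi * z for wi, z in zip(w, lp))
        h11 = sum(wi * z * z for wi, z in zip(w, lp))
        det = h00 * h11 - h01 * h01
        # Newton step using the inverse of the 2x2 information matrix.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def expand(groups):
    """(predicted risk, number of events, group size) -> per-patient lists."""
    y, p = [], []
    for risk, events, size in groups:
        y += [1] * events + [0] * (size - events)
        p += [risk] * size
    return y, p


# Hypothetical data: observed event rates of 20%, 50%, and 80% in three risk groups.
y_cal, p_cal = expand([(0.2, 2, 10), (0.5, 5, 10), (0.8, 8, 10)])    # well calibrated
y_over, p_over = expand([(0.1, 2, 10), (0.5, 5, 10), (0.9, 8, 10)])  # predictions too extreme

print("Brier (calibrated):", round(brier_score(y_cal, p_cal), 3))
print("intercept, slope (calibrated):", calibration_intercept_slope(y_cal, p_cal))
print("intercept, slope (overconfident):", calibration_intercept_slope(y_over, p_over))
```

A slope below 1, as in the second toy dataset, indicates predictions that are too extreme, a pattern commonly associated with overfitting.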
Limitations
This study has several important limitations. The retrospective, single-center design may limit generalizability, and selection bias cannot be excluded. Although we collected mandatory perioperative variables, important factors such as baseline functional status, frailty indices, and social support were not available, leaving potential for unmeasured confounding. The DRG cost thresholds used in this study (CNY 32,751 for arthroplasty and CNY 22,643 for fixation) are specific to Chongqing health insurance regulations, which may limit the generalizability of economic predictors to other healthcare systems with different reimbursement structures.
The data span 2019–2025; temporal changes in care processes, surgical techniques, or enhanced recovery pathways may influence model performance over time. Differences in patient populations, clinical practices, and discharge pathways at other institutions may further limit the transferability of the models. Our findings are observational and associative in nature and should not be interpreted as evidence of causal relationships; although we adjusted for multiple covariates, residual confounding cannot be excluded.
Future directions
Future work should include prospective, multicenter external validation to confirm generalizability, integration of these models into electronic health record systems with clinically calibrated decision thresholds, and monitoring for performance drift over time. Fairness across demographic and clinical subgroups will need to be assessed, with periodic recalibration to maintain accuracy. Emphasis should be placed on aligning predictive insights with actionable ERAS interventions, such as expedited surgery scheduling, standardized TXA protocols, and pneumonia-prevention bundles.
Conclusion
In this retrospective study, we compared LR with multiple machine learning algorithms to predict extended hospital stay after hip fracture surgery. Models incorporating both preoperative and early in-hospital variables achieved good discrimination and reasonable calibration, with time-to-surgery, postoperative pneumonia, and TXA use emerging as key associative predictors. These findings suggest potential utility for supporting ERAS-aligned discharge planning and perioperative management, while emphasizing interpretability through SHAP analysis. External multicenter validation, calibrated thresholds, and ongoing performance monitoring are essential before clinical implementation.
Supplemental Material
Supplemental material for Machine learning for predicting extended length of stay in elderly patients with hip fractures: An enhanced recovery after surgery perspective by Haibo Pu, Xin Shu, Fuqiang Tan, Xiaobin Li, Chaoyang Qu and Xu Peng in DIGITAL HEALTH. The 25 files share the identifier suffix dhj-10.1177_20552076251406311:
sj-jpg-1 to sj-jpg-14
sj-docx-15 to sj-docx-21
sj-xlsx-22 to sj-xlsx-24
sj-docx-25
Footnotes
Acknowledgements
We thank all the research participants who volunteered their time to make this work possible.
Ethical approval
The study was approved by the ethics committee of our hospital (approval number HX-2025-009). Because of the retrospective design, the requirement for informed consent was waived.
Contributorship
XP and FT were involved in conceptualization; FT, XL, and XS in data curation; FT and CQ in formal analysis; HP, XL, and CQ in methodology; XL and XP in project administration; HP in supervision; FT and KB in writing—original draft; and XP in writing—review and editing.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study received funding from Chongqing Medical Vocational Education Group Teaching and Research Project CQZJ202543, Chongqing Municipal Hechuan District Research Project HCKJ-2025-055 and Chongqing Science and Health Joint Medical Research Project 2026MSXM127.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data access statement
All relevant data are within the paper and its Supporting Information files.
Guarantor
XP.
Peer review
XXX
Supplemental material
Supplemental material for this article is available online.
References