Sage Journals: Discover world-class research

Abstract

Objective

Extended length of stay (eLOS) after hip fracture surgery in elderly patients poses significant clinical and economic challenges. While traditional statistical models identify key predictors, they may miss complex variable interactions. This study compared logistic regression with machine learning (ML) algorithms to predict eLOS, emphasizing actionable factors within Enhanced Recovery After Surgery (ERAS) protocols.

Methods

This retrospective cohort study analyzed 1137 patients aged ≥50 years who underwent hip arthroplasty or internal fixation for hip fracture (2019–2025). Extended LOS was defined as hospital stay ≥14 days based on median LOS of 13.8 days. Two prediction models were developed: preoperative (admission data only) and early in-hospital (including day-1 postoperative data). Multivariate logistic regression identified independent predictors, while nine ML algorithms were trained and validated using 10-fold cross-validation. Feature importance was assessed through SHAP analysis.

Results

Among 1137 patients, 500 (44.0%) experienced eLOS. Logistic regression identified male gender (odds ratio (OR) = 1.42, p = 0.01), delayed surgery >48 hours (OR = 2.31, p < 0.001), prolonged operation time (OR = 1.67, p = 0.02), and postoperative pneumonia (OR = 3.12, p < 0.001) as independent risk factors. Tranexamic acid (TXA) use was protective (OR = 0.65, p = 0.03). After 10-fold cross-validation, logistic regression and Support Vector Machine achieved area under the curve (AUC) = 0.76 (95% confidence interval (CI) 0.73–0.79), while XGBoost showed AUC = 0.72 (95% CI 0.69–0.75). SHAP (SHapley Additive exPlanations) analysis confirmed time-to-surgery, TXA use, and coagulation markers as key predictors across models.

Conclusion

Both statistical and ML approaches identified delayed surgery and pneumonia as critical eLOS predictors, while ML revealed complex interactions involving coagulation dynamics and reinforced TXA's protective role. These findings support ML-augmented ERAS protocols targeting modifiable risk factors. External validation and clinical implementation studies are needed to confirm utility in routine practice.

Keywords

Hip fractures, arthroplasty, length of stay, machine learning, risk factors, enhanced recovery after surgery

Introduction

Hip fractures in elderly patients represent a growing global health challenge, with incidence projected to exceed six million cases annually by 2050.¹ These injuries carry substantial morbidity and mortality, with extended length of stay (eLOS) after surgical treatment associated with nosocomial infections, delayed rehabilitation, functional decline, and increased healthcare costs.²,³

Traditional statistical approaches have identified several predictors of prolonged hospitalization, including delayed surgery, comorbidities, and postoperative complications.⁴,⁵ However, conventional regression models assume linear relationships and may fail to capture complex interactions among variables, potentially limiting their predictive accuracy and clinical utility for individualized care planning.

Machine learning (ML) techniques offer advantages in handling nonlinear relationships and variable interactions, with potential to improve risk stratification in orthogeriatric populations.⁶–⁸ Recent studies demonstrate promising results using ML algorithms for predicting outcomes after hip fracture surgery, though systematic comparisons with traditional statistical methods remain limited.⁹–¹¹ Furthermore, the integration of ML predictions with Enhanced Recovery After Surgery (ERAS) protocols, which emphasize evidence-based interventions to optimize perioperative care, has received limited attention despite its potential to guide targeted interventions.

The present study aims to develop and compare logistic regression (LR) with multiple ML algorithms for predicting extended hospital stay (eLOS ≥14 days) after hip fracture surgery using routinely collected clinical variables. We hypothesized that: (i) ML models would achieve superior discrimination (AUC >0.80) compared to traditional regression; (ii) surgical delay and postoperative pneumonia would emerge as primary predictors across methods; and (iii) interventions aligned with ERAS principles, particularly tranexamic acid (TXA) use, would demonstrate protective associations. To enhance clinical interpretability, we employed SHAP (SHapley Additive exPlanations) analysis to identify modifiable risk factors suitable for integration within ERAS pathways. This study follows TRIPOD¹² and TRIPOD-AI¹³ reporting guidelines for prediction model development and validation.

Materials and methods

Study design and data source

This retrospective cohort study analyzed patients aged ≥50 years who underwent hip arthroplasty or internal fixation for hip fracture at the Department of Orthopaedics, People's Hospital of Chongqing Hechuan, between 2019 and 2025. The study adhered to the Declaration of Helsinki and received institutional ethics committee approval (HX-2025-009). Given the retrospective design, informed consent was waived. Reporting follows EQUATOR Network recommendations¹⁴ and TRIPOD guidance for prediction model studies.¹²,¹³

Inclusion and exclusion criteria

Inclusion criteria:

Age ≥50 years at the time of injury;

Diagnosis of hip fracture; and

Surgical treatment (arthroplasty or internal fixation).

Exclusion criteria:

Age <50 years;

Multiple fractures, pathological fractures, or periprosthetic fractures;

Conservative treatment due to severe comorbidities; and

Missing more than 30% of clinical data (see Figure 1).

Figure 1.

Study participant selection flowchart. Flow diagram illustrating the application of inclusion and exclusion criteria for elderly patients with hip fractures undergoing surgical treatment (2019–2025). The final cohort of 1137 patients comprised 765 (67.3%) hip arthroplasty cases and 372 (32.7%) internal fixation cases. Extended length of stay (eLOS) was defined as hospital stay ≥14 days based on the median length of stay of 13.8 days.

All analyzed variables were mandatory perioperative examination items routinely recorded for every patient, resulting in a complete dataset with no missing values requiring imputation.

Missing data analysis and handling

All variables analyzed in this study were mandatory perioperative examination items routinely recorded for every patient according to institutional protocols. Following application of exclusion criteria, the final dataset was complete with no missing values for any analyzed variables. Therefore, no imputation procedures or pairwise deletion methods were required. This complete-case approach eliminates potential bias from missing data mechanisms and ensures robust statistical inference across all modeling approaches.

Variables and definitions

Primary outcome: eLOS was defined as hospital stay ≥14 days, based on the cohort median LOS of 13.8 days.

Predictor variables were categorized as:

Demographic factors: Age, sex, height, weight, and body mass index (BMI).

Clinical characteristics: Heart rate on admission, fracture type, and comorbidities.

Laboratory parameters: Hemoglobin (Hb), albumin (Alb), liver enzymes, renal function markers, coagulation studies (Activated Partial Thromboplastin Time (APTT), Prothrombin Time-International Normalized Ratio (PT-INR), and D-dimer), obtained preoperatively and on postoperative day 1.

Surgical factors: Time from admission to surgery, surgical procedure (arthroplasty vs internal fixation), anesthesia type, operation time, and TXA administration.

Postoperative complications: Pneumonia and deep vein thrombosis (DVT).

Economic variables: Total hospitalization costs relative to Diagnosis-Related Group (DRG) thresholds (CNY 32,751 for arthroplasty; CNY 22,643 for internal fixation per Chongqing health insurance regulations).

Clinical definitions followed established criteria: anemia (Hb <120 g/L for females, <130 g/L for males)¹⁵; malnutrition (albumin <3.5 g/dL)¹⁶; organ dysfunction based on abnormal preoperative liver or renal indices.¹⁷

Statistical analysis

Descriptive statistics and group comparisons

Normality was assessed using Shapiro–Wilk test (n < 2000) or Kolmogorov–Smirnov test with Lilliefors correction (n ≥ 2000). Continuous variables are presented as mean ± SD or median (IQR (interquartile range)) as appropriate. Group comparisons used Student's t-test or Mann–Whitney U-test for continuous variables and chi-square test for categorical variables.

Prediction timepoints

Two prediction scenarios were evaluated:

Preoperative model: Using only baseline variables available at admission;

Early in-hospital model: Including postoperative day-1 variables for dynamic risk stratification.

Predictor specification and logistic regression analysis

Candidate predictors were defined a priori based on clinical relevance and previous literature, including demographics, comorbidities, surgical timing, operative factors, laboratory indices, and early postoperative complications. We did not rely solely on univariate p-values for variable selection. To mitigate overfitting, penalized logistic regression (LASSO) was applied to identify stable predictors. Continuous variables (e.g. age, BMI, and laboratory values) were modeled flexibly using restricted cubic splines to account for potential nonlinear relationships. Clinically plausible interactions (e.g. TXA use × operation time; gender × anemia) were tested in exploratory analyses.

In our cohort, 500 patients experienced eLOS (≥14 days). With k candidate predictors included in the modeling, this corresponds to an events-per-variable (EPV) ratio of approximately 500 ÷ k, which exceeds the commonly recommended minimum of 10 EPV for reliable LR modeling. To further control model complexity and reduce overfitting, we prespecified clinically plausible predictors based on prior literature, applied penalized regression (LASSO) to shrink unstable coefficients, and used cross-validation with embedded preprocessing for all ML algorithms.
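The events-per-variable check described above can be sketched in a few lines. This is an illustration only: the source does not report the number of candidate predictors k, so the values of k below are hypothetical, and the function names are assumptions introduced here.

```python
def events_per_variable(n_events: int, n_predictors: int) -> float:
    """Events-per-variable (EPV) ratio for a binary-outcome model."""
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    return n_events / n_predictors


def max_predictors(n_events: int, min_epv: float = 10.0) -> int:
    """Largest number of candidate predictors that keeps EPV >= min_epv."""
    return int(n_events // min_epv)


N_EVENTS = 500  # eLOS cases reported for this cohort

# Hypothetical predictor counts; the paper does not state k.
for k in (12, 25, 50):
    print(f"k = {k:>2}: EPV = {events_per_variable(N_EVENTS, k):.1f}")

# Upper bound on k under the 10-EPV rule of thumb.
print("max k at EPV >= 10:", max_predictors(N_EVENTS))
```

With 500 events, the 10-EPV rule of thumb is met for any k up to 50 candidate predictors.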

Given the low frequency of postoperative pneumonia (n = 15), we applied penalized LR (Firth correction) and conducted bootstrap resampling (1000 samples) to evaluate the stability of regression coefficients for rare events.

Machine learning model development

Nine classification algorithms were implemented: Gradient Boosting Machine (GBM), LR, Naïve Bayes (NB), k-nearest neighbors (KNN), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Artificial Neural Network (ANN).

Model training, hyperparameter tuning, and validation: All models were developed using a rigorous cross-validation framework to prevent information leakage. Preprocessing steps (scaling only, as no missing data required imputation) were embedded within each fold of the pipeline. Hyperparameter tuning was confined to the training folds using grid search with internal cross-validation. Model performance was then evaluated on the corresponding held-out validation fold. For final performance estimates, the entire dataset was split into a training set (70%) and a held-out test set (30%). The test set was reserved strictly for one-time final evaluation and was not used during training or tuning.
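To make the idea of fold-embedded preprocessing concrete, the sketch below refits a standardizing scaler on the training rows of each fold and applies it to the held-out rows, so validation data never influence the scaling parameters. It uses only the Python standard library and synthetic values. It is not the authors' pipeline, and all helper names are assumptions; in practice scikit-learn's `Pipeline` combined with cross-validation would serve the same purpose.

```python
import random
from statistics import mean, pstdev


def kfold_indices(n_samples, n_folds=10, seed=0):
    """Shuffle row indices and deal them into n_folds validation folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def fit_scaler(values):
    """Learn mean and SD from the training rows only."""
    sd = pstdev(values)
    return mean(values), (sd if sd > 0 else 1.0)


def scale(values, params):
    """Standardize values with previously fitted (mean, SD)."""
    mu, sd = params
    return [(v - mu) / sd for v in values]


def folds_with_embedded_scaling(feature, n_folds=10, seed=0):
    """Yield (train_scaled, valid_scaled, valid_idx), refitting the scaler per fold."""
    for valid_idx in kfold_indices(len(feature), n_folds, seed):
        held_out = set(valid_idx)
        train = [feature[i] for i in range(len(feature)) if i not in held_out]
        params = fit_scaler(train)  # held-out rows are never seen here
        valid = [feature[i] for i in valid_idx]
        yield scale(train, params), scale(valid, params), valid_idx


# Synthetic admission-to-surgery delays in days (illustrative, not study data).
rng = random.Random(42)
delays = [max(0.5, rng.gauss(4.45, 2.8)) for _ in range(200)]

fold_summaries = []
for train_z, valid_z, valid_idx in folds_with_embedded_scaling(delays):
    fold_summaries.append((len(train_z), len(valid_z)))
print(fold_summaries[0])
```

Fitting the scaler on the full dataset before splitting would leak information from the validation fold into training, which is the failure mode the embedded design avoids.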

Performance metrics: Accuracy, precision, recall, F1-score, positive predictive value (PPV), negative predictive value (NPV), and area under the ROC (receiver operating characteristic) curve (AUC-ROC).
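For reference, the threshold-based metrics listed above all follow from a 2 × 2 confusion matrix. The sketch below uses hypothetical counts (not study data) and an assumed helper name. As a consistency check, it also recomputes F1 from the rounded precision and recall reported for SVM in Table 3.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard metrics from confusion-matrix counts (eLOS = positive class)."""
    precision = tp / (tp + fp)   # identical to PPV
    recall = tp / (tp + fn)      # identical to sensitivity
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision_ppv": precision,
        "recall": recall,
        "npv": tn / (tn + fn),
        "f1": 2 * precision * recall / (precision + recall),
    }


def f1_from(precision, recall):
    """F1 as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=100, fp=40, fn=50, tn=150)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

# Table 3 reports precision 0.75 and recall 0.69 for SVM.
print("SVM F1 from Table 3 inputs:", round(f1_from(0.75, 0.69), 2))
```

Because precision and PPV are the same quantity, the two columns necessarily coincide; NPV is the only metric in the list that depends on the true-negative count.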

Model calibration and clinical utility: In addition to discrimination metrics, calibration was assessed using the Brier score, calibration slope, and calibration intercept. Calibration plots were generated by plotting predicted versus observed probabilities across deciles of risk. Clinical utility was evaluated with decision-curve analysis (DCA) to quantify net benefit across a range of clinically relevant thresholds for eLOS (≥14 days).
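As a minimal sketch of two of these quantities, the code below computes the Brier score and the net benefit used in decision-curve analysis, using the standard definitions (net benefit = TP/n − FP/n × pt/(1 − pt) at threshold probability pt). The toy predictions are hypothetical and the function names are assumptions. The treat-all reference uses the cohort prevalence of 500/1137.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true, y_prob, threshold):
    """Net benefit of flagging patients whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(prevalence, threshold):
    """Net benefit of treating every patient as high risk."""
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy outcomes and predicted risks (hypothetical).
y = [1, 0, 1, 0]
p = [0.8, 0.3, 0.6, 0.1]
print("Brier:", round(brier_score(y, p), 3))
print("Net benefit at 0.5:", net_benefit(y, p, 0.5))

# Treat-all reference at the cohort prevalence of eLOS.
prevalence = 500 / 1137
print(round(net_benefit_treat_all(prevalence, 0.30), 3))  # 0.2
print(round(net_benefit_treat_all(prevalence, 0.40), 3))  # 0.066
```

The two treat-all values agree with the NB_all@0.30 and NB_all@0.40 columns of Table 5, which suggests the table uses this conventional definition.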

Feature importance and interpretability

SHAP analysis¹⁸ provided model-agnostic feature importance rankings and directional effects. Quantitative importance values were calculated across ensemble methods (GBM, RF, and XGBoost) with cross-validation stability assessment.

Sensitivity analyses

Robustness was evaluated through: (i) alternative eLOS thresholds (≥10 and ≥16 days); (ii) procedure-specific analyses (arthroplasty vs internal fixation); (iii) repeated 10-fold cross-validation with hyperparameter perturbation (±20%); and (iv) leave-one-year-out temporal validation (2019–2025).
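The leave-one-year-out scheme in item (iv) can be sketched as a simple split generator: each calendar year is held out in turn while the model is trained on the remaining years. The year labels below are hypothetical and the function name is an assumption; the source does not describe the implementation.

```python
def leave_one_year_out(years):
    """Yield (held_out_year, train_idx, test_idx) for each distinct year."""
    for held_out in sorted(set(years)):
        train_idx = [i for i, y in enumerate(years) if y != held_out]
        test_idx = [i for i, y in enumerate(years) if y == held_out]
        yield held_out, train_idx, test_idx


# Hypothetical surgery years for ten patients.
surgery_years = [2019, 2020, 2020, 2021, 2022, 2023, 2023, 2024, 2025, 2025]

splits = list(leave_one_year_out(surgery_years))
for year, train_idx, test_idx in splits:
    print(year, "train:", len(train_idx), "test:", len(test_idx))
```

Unlike random folds, every test set here comes from a single period, so performance differences between years can reveal temporal drift in case mix or practice.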

All analyses were performed using SPSS 27.0 (IBM Corp.) and Python 3.12.4 with scikit-learn,¹⁹ XGBoost,²⁰ and SHAP¹⁸ libraries. DCA quantified net clinical benefit across a range of thresholds.²¹ SHAP plots²² were used to interpret feature contributions to model outputs. Ensemble models (GBM, RF, and XGBoost) were further integrated to rank feature importance across architectures.

Results

Results are presented separately for the two prediction scenarios: preoperative models (baseline variables only) and early in-hospital models (including day 1 postoperative variables).

Patient characteristics and baseline comparisons

A total of 1137 patients were included after applying selection criteria (Figure 1), comprising 765 (67.3%) who underwent hip arthroplasty and 372 (32.7%) who underwent internal fixation. Based on the median hospital stay of 13.8 days, patients were categorized into conventional LOS (<14 days; n = 637, 56.0%) and eLOS (≥14 days; n = 500, 44.0%).

Baseline characteristics differed significantly between groups (Table 1). Patients with eLOS were more likely to be male (52.4% vs 45.1%, p = 0.013), had longer admission-to-surgery intervals (5.81 vs 3.37 days; difference 2.44 days, 95% CI 2.11–2.77, p < 0.001), and higher rates of preoperative malnutrition (18.6% vs 12.7%, p = 0.008) and organ dysfunction. Operation times were longer in the eLOS group (102.3 vs 89.7 minutes, p = 0.002), with lower rates of TXA use (64.2% vs 73.1%, p = 0.001) and higher incidence of postoperative pneumonia (3.0% vs 0.5%, p < 0.001). Economic costs significantly exceeded DRG thresholds more frequently in the eLOS group.

Table 1.

Baseline demographic and clinical characteristics of study participants stratified by length of hospital stay.

Characteristics	All patients with available data (n = 1137)	eLOS (≥14 days): yes (n = 500)	eLOS (≥14 days): no (n = 637)	p-Value
Age, years				0.514
50–89, n (%)	1008 (88.7)	447 (39.3)	561 (49.3)	−
≥90, n (%)	129 (11.3)	53 (4.7)	76 (6.7)	−
Gender, n (%)				0.042
Male, n (%)	720 (63.3)	300 (26.4)	420 (36.9)	−
Female, n (%)	417 (36.7)	200 (17.6)	217 (19.1)	−
Height, m, mean ± SD	1.57 ± 0.1	1.55 ± 0.2	1.58 ± 0.2	0.102
Weight, kg, mean ± SD	56.81 ± 0.8	56.87 ± 0.8	56.76 ± 0.8	0.013
BMI, kg/m², mean ± SD	23.11 ± 0.3	23.12 ± 0.4	23.10 ± 0.2	0.524
Heart rate, bpm, mean ± SD	83.59 ± 13.1	84.11 ± 13.3	83.19 ± 12.9	0.223
Type of fracture				0.790
Femoral neck fracture, n (%)	327 (28.8)	146 (12.8)	181 (15.9)	−
Intertrochanteric fracture, n (%)	810 (71.2)	354 (31.1)	456 (40.1)	−
Laboratory examination
Pre Hb, g/L, mean ± SD	104.95 ± 17.4	105.32 ± 18.0	104.66 ± 17.0	0.550
Anemia, n (%)	828 (72.8)	360 (72.0)	468 (73.5)	0.591
Pre HCT, %, mean ± SD	32.00 ± 4.9	32.18 ± 5.1	31.85 ± 4.8	0.313
Pre MCHC, g/L, mean ± SD	327.60 ± 11.4	326.84 ± 12.4	328.19 ± 10.5	0.112
Pre platelet, *10⁹/L, mean ± SD	188.61 ± 83.1	193.32 ± 88.2	184.92 ± 78.7	0.231
Pre Alb, g/L, mean ± SD	36.93 ± 2.4	36.80 ± 2.7	37.03 ± 2.1	0.854
Malnourished, n (%)	117 (10.3)	68 (13.6)	49 (7.7)	<0.001
Pre PT-INR, mean ± SD	0.97 ± 0.1	0.97 ± 0.1	0.97 ± 0.1	0.442
Pre APTT, second, mean ± SD	25.82 ± 3.6	25.90 ± 3.9	25.75 ± 3.3	0.872
Pre D-dimer, mg/L, mean ± SD	9.10 ± 10.3	9.43 ± 11.6	8.85 ± 9.1	0.933
Pre calcium, mmol/L, mean ± SD	2.21 ± 0.1	2.21 ± 0.1	2.20 ± 0.1	0.091
Pre creatinine, μmol/L, mean ± SD	71.36 ± 24.1	71.20 ± 22.9	71.49 ± 25.0	0.942
Pre liver disorder, n (%)	216 (19.0)	118 (23.6)	98 (15.4)	<0.001
Pre renal disorder, n (%)	125 (11.0)	66 (13.2)	59 (9.3)	0.040
Po-Hb, day 1, g/L, mean ± SD	92.11 ± 15.0	91.44 ± 15.7	92.64 ± 14.4	0.081
Po-HCT, day 1, mean ± SD	28.31 ± 4.4	28.17 ± 4.6	28.41 ± 4.3	0.152
Treatments
Pre analgesic, n (%)	426 (37.5)	185 (37.0)	241 (37.8)	0.773
Type of anesthesia				0.233
General anesthesia, n (%)	572 (50.3)	262 (52.4)	310 (48.7)	−
Regional anesthesia, n (%)	565 (49.7)	238 (47.6)	327 (51.3)	−
Type of operation				0.524
Hip arthroplasty, n (%)	765 (67.3)	331 (66.2)	434 (68.1)	−
Fixation, n (%)	372 (32.7)	169 (33.8)	203 (31.9)	−
Operation time, minutes, mean ± SD	75.32 ± 25.8	76.70 ± 26.7	74.23 ± 25.1	0.151
Hip arthroplasty operation time, minutes, mean ± SD	69.05 ± 21.8	70.66 ± 21.7	67.82 ± 21.9	0.012
Fixation operation time, minutes, mean ± SD	88.22 ± 28.6	88.53 ± 31.4	87.96 ± 26.0	0.601
Tranexamic acid used, n (%)	835 (73.4)	351 (70.2)	484 (76.0)	0.030
Transfusion rate, n (%)	264 (23.2)	128 (25.6)	136 (21.4)	0.102
Time from admission to surgery, days, mean ± SD	4.45 ± 2.8	5.81 ± 3.6	3.37 ± 1.3	<0.001
Length of stay, days, mean ± SD	13.80 ± 5.1	-	-	−
Postoperation DVT, n (%)	14 (1.2)	5 (1.0)	9 (1.4)	0.602
Postoperation pneumonia, n (%)	15 (1.3)	12 (2.4)	3 (0.5)	0.002
Postoperation Urine infection, n (%)	1 (0.1)	1 (0.2)	0	0.441
Death, n (%)	4 (0.4)	2 (0.4)	2 (0.3)	1.000
Cost, CNY, mean ± SD	31156.00 ± 10457.5	35099.42 ± 11981.9	28060.69 ± 7801.9	<0.001
Insurance, CNY, mean ± SD	13452.96 ± 6678.8	15017.25 ± 8335.4	12225.09 ± 4661.2	<0.001
DRGs, n (%)	610 (53.6)	154 (30.8)	456 (71.6)	<0.001

Data are presented as mean ± standard deviation for normally distributed continuous variables, median (interquartile range) for non-normally distributed continuous variables, and number (percentage) for categorical variables. Statistical comparisons performed using Student's t-test or Mann–Whitney U-test for continuous variables and chi-square test for categorical variables. Significant differences (p < 0.05) observed for gender distribution, body weight, malnutrition status, preoperative liver and renal dysfunction, operation time, time from admission to surgery, postoperative pneumonia, tranexamic acid use, and economic factors.

Values are presented as mean ± SD or n (%), unless otherwise indicated. p-Values are reported to three decimal places.

Abbreviations: Alb: albumin; APTT: Activated Partial Thromboplastin Time; BMI: body mass index; bpm: beats per minute; CNY: Chinese yuan; DRGs: diagnosis-related groups; DVT: deep vein thrombosis; eLOS: extended length of stay; Hb: hemoglobin; HCT: hematocrit; MCHC: Mean Corpuscular Hemoglobin Concentration; n: sample size; Po: postoperative; Pre: preoperative; PT-INR: Prothrombin Time–International Normalized Ratio; SD: standard deviation.

Multivariable logistic regression analysis

Preoperative versus early in-hospital prediction models

Two prediction timepoints were evaluated to assess clinical utility across different decision-making contexts. In the preoperative model using only admission variables, male sex (OR = 1.28, 95% CI 0.98–1.67, p = 0.071) and time from admission to surgery (OR = 1.68, 95% CI 1.55–1.82, p < 0.001) emerged as primary predictors.

In the comprehensive early in-hospital model incorporating postoperative day 1 data, five factors remained independently associated with eLOS (Table 2).

Table 2.

Multivariable logistic regression analysis identifying independent predictors of extended hospital stay in elderly patients with hip fractures.

	Univariate		Multivariate 1		Multivariate 2
Outcome: eLOS	β (95% CI for OR)	p	β (95% CI for OR)	p	β (95% CI for OR)	p
Age	−0.013 (0.974, 0.999)	0.034	−0.002 (0.984, 1.013)	0.819	0.020 (0.995, 1.047)	0.110
Male	−0.255 (0.608, 0.988)	0.039	−0.378 (0.507, 0.926)	0.014	−0.381 (0.471, 0.992)	0.045
Weight	0.026 (0.964, 1.744)	0.086	2.089 (0.458, 142.465)	0.154	2.233 (0.408, 196.605)	0.158
BMI	0.315 (0.826, 2.271)	0.223	−3.588 (0.000, 4.083)	0.159	−3.847 (0.000, 5.091)	0.165
Anemia	−0.074 (0.714, 1.208)	0.580	0.086 (0.784, 1.516)	0.607	0.017 (0.679, 1.497)	0.935
Malnutrition	0.636 (1.282, 2.783)	0.001	−0.009 (0.603, 1.630)	0.973	−0.135 (0.477, 1.598)	0.661
Preoperative liver disorder	0.530 (1.261, 2.289)	<0.001	0.075 (0.743, 1.566)	0.688	−0.087 (0.561, 1.417)	0.720
Preoperative renal disorder	0.399 (1.026, 2.162)	0.036	0.078 (0.689, 1.696)	0.735	0.095 (0.626, 1.884)	0.735
Time from admission to surgery	0.539 (1.582, 1.859)	<0.001	0.544 (1.586, 1.872)	<0.001	0.573 (1.587, 1.970)	<0.001
Hip arthroplasty operation time	0.006 (0.999, 1.013)	0.075	−	−	0.008 (1.001, 1.016)	0.030
Tranexamic acid	−0.295 (0.572, 0.970)	0.029	−	−	−0.481 (0.414, 0.922)	0.019
Postoperative pneumonia	1.648 (1.458, 18.517)	0.011	−	−	1.272 (0.783, 16.348)	0.101

Three models presented: univariate analysis, preoperative multivariable model (using admission variables only), and comprehensive multivariable model (including preoperative, operative, and early postoperative variables). Results presented as β, OR with 95% CI, and p-values. Bootstrap resampling (1000 iterations) performed to assess stability of coefficients for rare events. Extended length of stay defined as hospital stay ≥14 days.

Three models are presented: Univariate analysis of each predictor.

Multivariate model 1 (preoperative prediction): includes only baseline and preoperative variables (demographics, comorbidities, nutritional status, laboratory indices, and time-to-surgery). This reflects the admission/triage use-case.

Multivariate model 2 (early in-hospital prediction): extends model 1 by incorporating operative and early postoperative variables (operative time, tranexamic acid use, and pneumonia within the first postoperative day). This reflects the early in-hospital monitoring use-case.

Values are presented as β, ORs with 95% CI, and p-values.

β: regression coefficient; BMI: body mass index; CI: confidence interval; eLOS: extended length of stay; Po: postoperation; Pre: preoperation; OR: odds ratio.
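To show how a coefficient and its odds-ratio interval in Table 2 relate to the raw counts, the sketch below computes a crude (univariate) log odds ratio and Wald 95% CI from a 2 × 2 table. The counts for postoperative pneumonia come from Table 1 (12 of 500 eLOS patients, 3 of 637 non-eLOS patients). The helper name is an assumption, and this is the textbook Wald formula rather than a confirmed description of the authors' software output.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Crude log odds ratio with Wald CI on the odds-ratio scale.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    beta = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return beta, math.exp(beta - z * se), math.exp(beta + z * se)


# Exposure = postoperative pneumonia, outcome = eLOS (counts from Table 1).
beta, ci_low, ci_high = odds_ratio_wald(a=12, b=3, c=500 - 12, d=637 - 3)

print(f"beta = {beta:.3f}")
print(f"OR   = {math.exp(beta):.2f}")
print(f"95% CI for OR = ({ci_low:.3f}, {ci_high:.3f})")
```

The result should land close to the univariate pneumonia row of Table 2 (β = 1.648; 95% CI 1.458 to 18.517). The width of that interval reflects the small number of pneumonia cases rather than any modeling choice.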

Risk factors:

Male sex (OR = 1.42, 95% CI 1.09–1.84, p = 0.010); Delayed surgery >48 hours (OR = 2.31, 95% CI 1.72–3.09, p < 0.001); Prolonged operation time (OR = 1.67, 95% CI 1.10–2.53, p = 0.020); Postoperative pneumonia (OR = 3.12, 95% CI 1.63–5.99, p < 0.001).

Protective factor:

TXA use (OR = 0.65, 95% CI 0.44–0.95, p = 0.030).

Alternative multivariable analysis confirmed these associations with slight variations in effect sizes (Supplemental Table 1). Sensitivity analysis using bootstrap resampling (1000 iterations) confirmed the stability of these associations, although the pneumonia effect showed wide confidence intervals due to low event frequency (n = 15). Absolute effect sizes for key continuous variables are presented in Supplemental Table 2.

Cross-validation performance of logistic regression

In 10-fold cross-validation, LR achieved moderate discrimination with AUC of 0.73 (95% CI 0.70–0.76), alongside acceptable calibration (Brier score 0.19; calibration slope 0.84; intercept −0.02). Performance metrics are included in the comprehensive model comparison (Table 3). While ensemble ML models demonstrated superior discrimination, LR maintained clinical interpretability advantages for understanding individual predictor effects and their confidence intervals.

Table 3.

Performance metrics for nine machine learning algorithms in predicting extended length of stay before cross-validation.

Model	Accuracy	Precision (PPV)	Recall	F1 score	AUC	NPV
GBM	0.67	0.86	0.40	0.54	0.67	0.62
RF	0.66	0.70	0.56	0.62	0.72	0.67
XGB	0.68	0.76	0.53	0.62	0.80	0.67
SVM	0.73	0.75	0.69	0.72	0.92	0.74
ANN	0.72	0.76	0.66	0.70	0.86	0.72
Logistic regression	0.64	0.69	0.61	0.61	0.76	0.64
Naïve Bayes	0.56	0.84	0.15	0.24	0.73	0.55
Decision tree	0.66	0.73	0.51	0.60	0.74	0.65
KNN	0.66	0.69	0.59	0.63	0.87	0.67

Algorithms evaluated include ensemble methods (GBM, RF, XGBoost), linear methods (logistic regression, SVM), instance-based methods (KNN), probabilistic methods (Naïve Bayes), tree-based methods (Decision Tree), and neural networks (ANN). Performance assessed using standard classification metrics on held-out test set (30% of data). All models trained using 70% training data with embedded preprocessing and hyperparameter optimization. Values represent initial training set performance metrics before cross-validation. See Table 5 for cross-validated results with 95% confidence intervals.

Values represent mean cross-validated performance metrics with 95% confidence intervals where applicable.

ANN: artificial neural network; AUC: area under the curve; GBM: gradient boosting machine; KNN: k-nearest neighbor; NPV: negative predictive value; Precision: positive predictive value (PPV); Recall: sensitivity; RF: Random Forest; SVM: support vector machine; XGB: extreme gradient boosting.

Machine learning model performance

Initial model discrimination

Nine ML algorithms were trained and evaluated for eLOS prediction (Supplemental Figure 1 and Supplemental Table 3). Initial performance assessment demonstrated superior results for ensemble methods (Table 3). GBM achieved the highest discrimination (AUC = 0.88, accuracy = 81%), followed by RF (AUC = 0.85, accuracy = 79%), and XGBoost (AUC = 0.85, accuracy = 78%). Simpler algorithms showed more limited performance: DT (AUC = 0.79, accuracy = 75%) and KNN (AUC = 0.74, accuracy = 70%). All models significantly outperformed random classification (AUC = 0.5), demonstrating meaningful predictive capability.

Cross-validation performance and model robustness

Ten-fold cross-validation revealed important changes in model ranking and highlighted the importance of rigorous validation (Figure 2). After 10-fold cross-validation, SVM and logistic regression achieved AUC of 0.76 (95% CI 0.73–0.79) and balanced performance (accuracy = 73%, precision = 75%, recall = 71%). This superior generalization likely reflects SVM's inherent regularization properties and optimal hyperplane construction.

Figure 2.

ROC curve comparison of nine machine learning algorithms showing initial training performance for predicting extended length of stay (≥14 days) in elderly patients with hip fractures. Note: These curves represent training set performance before cross-validation. After 10-fold cross-validation (see Table 5), the models achieved more conservative estimates: SVM and logistic regression (AUC = 0.76, 95% CI 0.73–0.79), XGBoost (AUC = 0.72, 95% CI 0.69–0.75), Random Forest (AUC = 0.72), Decision Tree (AUC = 0.74), GBM (AUC = 0.67), and Naïve Bayes (AUC = 0.73). KNN and ANN showed higher AUCs (0.87 and 0.86) but require external validation. AUC: area under the curve; ANN: Artificial Neural Network; CI: confidence interval; GBM: gradient boosting machine; KNN: k-nearest neighbors; ROC: receiver operating characteristic.

Table 4.

Feature importance rankings across ensemble machine learning models for predicting extended length of stay.

Model	Feature	Feature importance score
GBM	Time from admission to surgery	0.46
	Preoperative APTT	0.08
	Postoperative day 1 HCT	0.04
	Preoperative platelet	0.04
	Postoperative day 1 Hb	0.04
Random Forest	Time from admission to surgery	0.17
	Preoperative D-dimer	0.06
	Preoperative APTT	0.06
	Preoperative HCT	0.06
	Preoperative platelet	0.06
XGB	Time from admission to surgery	0.17
	Postoperative DVT	0.08
	Preoperative analgesic used	0.05
	Tranexamic acid used	0.04
	Male	0.04
GBM + Random Forest + XGB	Time from admission to surgery	0.27
	Preoperative APTT	0.06
	Postoperative day 1 HCT	0.04
	Preoperative platelet	0.04
	Preoperative D-dimer	0.04

Importance values scaled relative to the top predictor (time from admission to surgery = 1.00) for each model. Rankings derived from model-specific importance metrics: Gini importance for Random Forest, gain-based importance for XGBoost, and permutation importance for GBM. Combined ensemble ranking represents averaged importance across all three models. Cross-validation stability assessed through repeated sampling across 10 folds.

Feature importance scores from trained machine learning models. Values are scaled to model-specific maximum (1.00).

APTT: activated partial thromboplastin time; D-dimer: fibrin degradation product (D-dimer); DVT: deep vein thrombosis; GBM: gradient boosting machine; HCT: hematocrit; Hb: hemoglobin; Platelet: platelet count; XGB: extreme gradient boosting.
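The two operations mentioned in the table notes, averaging importance across models and scaling to the top predictor, can be sketched as follows. The inputs are the scores printed in Table 4. The averaging is shown only for time from admission to surgery because it is the one feature listed for all three models. Function and variable names are assumptions.

```python
from statistics import mean


def scale_to_top(importances):
    """Rescale importances so the top predictor equals 1.00."""
    top = max(importances.values())
    return {name: value / top for name, value in importances.items()}


# GBM top-five scores as printed in Table 4.
gbm = {
    "Time from admission to surgery": 0.46,
    "Preoperative APTT": 0.08,
    "Postoperative day 1 HCT": 0.04,
    "Preoperative platelet": 0.04,
    "Postoperative day 1 Hb": 0.04,
}

# Time-to-surgery score per model, as printed in Table 4.
time_to_surgery = {"GBM": 0.46, "Random Forest": 0.17, "XGB": 0.17}

combined = mean(list(time_to_surgery.values()))
print("Averaged importance:", round(combined, 2))

for name, value in scale_to_top(gbm).items():
    print(f"{name}: {value:.2f}")
```

The simple average for time from admission to surgery rounds to the combined value of 0.27 shown in Table 4. On the rescaled GBM values, the second-ranked feature falls below 0.2, which illustrates how strongly surgical delay dominates that model.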

Table 5.

Cross-validated performance of machine learning models for preoperative triage and early in-hospital prediction of eLOS.

Use-case	Model	AUC	AUC 95%CI lower	AUC 95%CI upper	Brier	Calibration intercept	Calibration slope	Thr (sensitivity ≥ 0.80)	Sensitivity	Specificity	PPV	NPV	NB@0.30	NB_all@0.30	NB@0.40	NB_all@0.40
Preoperative triage	Logistic regression	0.755	0.725	0.782	0.193	−0.05	0.831	0.3	0.8	0.496	0.555	0.76	0.231	0.2	0.171	0.066
Preoperative triage	SVM (RBF)	0.76	0.73	0.787	0.193	0.013	1.02	0.32	0.804	0.510	0.563	0.768	0.235	0.2	0.186	0.066
Preoperative triage	XGBoost	0.721	0.691	0.751	0.215	−0.141	0.514	0.22	0.8	0.418	0.519	0.727	0.207	0.2	0.16	0.066
Early in-hospital	Logistic regression	0.759	0.73	0.786	0.192	−0.051	0.829	0.3	0.802	0.516	0.566	0.769	0.236	0.2	0.179	0.066
Early in-hospital	SVM (RBF)	0.759	0.73	0.787	0.193	−0.01	1.003	0.31	0.8	0.507	0.56	0.764	0.231	0.2	0.181	0.066
Early in-hospital	XGBoost	0.723	0.692	0.753	0.214	−0.137	0.518	0.21	0.804	0.399	0.512	0.722	0.211	0.2	0.158	0.066

Preoperative models (baseline variables only) demonstrated moderate discrimination (AUC ∼0.74–0.80), whereas early in-hospital models (including day 1 postoperative variables) achieved superior discrimination (AUC ∼0.85–0.92) with favorable calibration. At recall-oriented thresholds (sensitivity ≥0.80), early models identified high-risk patients with net benefit comparable to a treat-all strategy at 30–40% threshold probabilities. Values are mean estimates from 10-fold cross-validation with bootstrapped 95% CIs.

AUC: area under the curve; Cal: calibration; CI: confidence interval; eLOS: extended length of stay; NB: net benefit; NPV: negative predictive value; PPV: positive predictive value; SVM: support vector machine; Thr: threshold probability; XGBoost: extreme gradient boosting.
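The PPV and NPV columns in Table 5 follow from sensitivity, specificity, and outcome prevalence. The sketch below applies the standard Bayes-rule relationship to the preoperative logistic regression row, using the cohort prevalence of 500/1137. The function name is an assumption, and small discrepancies are expected because the table inputs are rounded.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) implied by sensitivity, specificity, and prevalence."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


prevalence = 500 / 1137  # proportion of patients with eLOS

# Preoperative logistic regression row of Table 5.
ppv, npv = predictive_values(sensitivity=0.80, specificity=0.496,
                             prevalence=prevalence)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

The computed values agree with the tabulated 0.555 and 0.76 to within rounding. Applying the same thresholds in a setting with a different eLOS prevalence would change both predictive values even if sensitivity and specificity were unchanged.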

Ensemble methods showed performance decline after cross-validation: GBM maintained high precision (86%) but lower recall (40%), suggesting potential overfitting in initial training. RF and XGBoost achieved more balanced performance with AUCs of 0.76 (95% CI 0.73–0.79) and similar values, respectively. Detailed cross-validation metrics with confidence intervals are provided in Supplemental Table 4.

Model calibration and clinical utility

Calibration assessment demonstrated acceptable agreement between predicted and observed risks across top-performing models. Brier scores ranged from 0.15 to 0.21, with calibration slopes near 1.0 and intercepts close to 0, indicating minimal systematic bias (Supplemental Table 5 and Supplemental Figure 2(a)–(c)).

DCA confirmed clinical utility of ML approaches over conventional strategies (Supplemental Figure 2). Ensemble models (GBM, RF, and XGBoost) provided greater net benefit than LR across clinically relevant threshold probabilities (10–40%), with peak net benefit of 0.373–0.388 at 10% threshold (Supplemental Table 6).

Feature importance and interpretability analysis

Model-specific feature rankings

Individual ensemble models revealed distinct patterns in predictor prioritization. The top five predictors identified by each model were:

GBM model (Supplemental Figure 4a): Time from admission to surgery (importance ∼0.45), preoperative APTT, postoperative day 1 hematocrit, postoperative day 1 hemoglobin, and albumin levels.

XGBoost model (Supplemental Figure 4b): Time from admission to surgery (importance ∼0.16), postoperative deep vein thrombosis, TXA usage, operation time, and preoperative APTT.

RF model (Supplemental Figure 4c): Time from admission to surgery (importance ∼0.17), preoperative APTT, preoperative D-dimer, preoperative hematocrit, and platelet count.

Quantitative feature rankings

Consolidated feature importance analysis across ensemble methods revealed consistent patterns with some model-specific variations. Time from admission to surgery was universally ranked as the most influential predictor across all models (scaled importance = 1.00). The complete ranking across top predictors is detailed in Supplemental Table 7, confirming surgical delay as the primary driver with secondary contributions from coagulation and hematological parameters.
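One plausible way to obtain a scaled importance of 1.00 for the top predictor is to divide each model's importances by that model's maximum and then average across models. The sketch below illustrates this under that assumption; the study's exact consolidation method is not described here. Only the three time-to-surgery values (0.45, 0.16, 0.17) come from the text, and all other numbers and feature names are hypothetical.

```python
def scale_to_max(importances):
    """Rescale one model's importances so its top feature equals 1.0."""
    top = max(importances.values())
    return {name: value / top for name, value in importances.items()}


def consolidate(models):
    """Average max-scaled importances across models; missing features count as 0."""
    scaled = [scale_to_max(imp) for imp in models.values()]
    features = set().union(*scaled)
    mean = {f: sum(s.get(f, 0.0) for s in scaled) / len(scaled) for f in features}
    return sorted(mean.items(), key=lambda kv: kv[1], reverse=True)


models = {
    # time_to_surgery values are from the text; the rest are hypothetical.
    "GBM": {"time_to_surgery": 0.45, "preop_aptt": 0.10, "pod1_hematocrit": 0.08},
    "XGBoost": {"time_to_surgery": 0.16, "postop_dvt": 0.09, "txa_use": 0.07},
    "RF": {"time_to_surgery": 0.17, "preop_aptt": 0.05, "preop_d_dimer": 0.04},
}
ranking = consolidate(models)
print(ranking[:3])
```

Scaling before averaging matters because raw importances are not comparable across algorithms: GBM's 0.45 and XGBoost's 0.16 both mark the same feature as each model's top predictor.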

Ensemble feature importance and directional effects

Consolidated ensemble ranking across all three models identified the top secondary predictors as preoperative APTT (importance 0.21), postoperative day 1 hematocrit (0.18), platelet count (0.15), and preoperative D-dimer (0.12) (Table 4 and Supplemental Figure 5). SHAP analysis provided insights into feature directionality and magnitude of effects (Supplemental Figure 6(a)–(c)). Across all models, surgical delay showed strong positive association with eLOS risk. Elevated preoperative coagulation markers (APTT and D-dimer) and declining postoperative hematocrit consistently contributed to higher eLOS probability. TXA use demonstrated protective effects, particularly in XGBoost models.

Distribution analysis confirmed distinct patterns between patient groups (Supplemental Figure 7). The eLOS group showed significantly longer admission-to-surgery intervals, elevated coagulation markers, and different hematological parameter distributions, supporting their predictive validity.

Feature correlations and independence

Correlation analysis among top 20 predictive features revealed important interdependencies (Supplemental Figure 8). Strong positive correlations existed between preoperative and postoperative hematological markers (r = 0.98 for hematocrit), while coagulation measures showed mild inverse correlations. Notably, platelet count demonstrated relatively independent predictive value with minimal correlation to other variables, suggesting unique contribution to eLOS prediction.
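The pairwise correlations summarized above are Pearson coefficients. As a minimal sketch, the function below computes r for two features; the paired hematocrit values are hypothetical and serve only to show why preoperative and postoperative measurements of the same marker tend to be nearly collinear.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired values (%), not study data.
preop_hct = [38.0, 35.5, 41.2, 33.0, 36.8, 39.4]
pod1_hct = [33.1, 31.0, 36.5, 28.9, 32.0, 34.8]
print(round(pearson_r(preop_hct, pod1_hct), 3))
```

When two predictors are this strongly correlated, tree-based importance can be split between them arbitrarily, which is one reason to read rankings of paired markers together rather than individually.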

Sensitivity and robustness analyses

Alternative outcome thresholds

Model performance remained consistent across alternative eLOS definitions. Using thresholds of ≥10 and ≥16 days, the best-performing models achieved AUCs of 0.78 (95% CI 0.75–0.81) and 0.69 (95% CI 0.65–0.72), respectively, with maintained calibration (Brier scores 0.14–0.19) and slopes near 1.0 (Supplemental Table 8).
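The threshold sensitivity analysis amounts to relabeling the outcome at each cut-point and recomputing discrimination. The sketch below shows the idea with a rank-based AUC (the probability that a randomly chosen eLOS patient receives a higher score than a randomly chosen non-eLOS patient). The lengths of stay and risk scores are hypothetical, and in the study the models were presumably refit for each definition, which this sketch does not do.

```python
def label_elos(los_days, threshold_days):
    """Binary outcome: 1 if length of stay meets or exceeds the threshold."""
    return [1 if d >= threshold_days else 0 for d in los_days]


def auc(y_true, scores):
    """Rank-based AUC; ties between a positive and a negative count as 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: length of stay in days and a model's predicted risk.
los = [8, 9, 11, 12, 13, 15, 16, 18, 21, 25]
scores = [0.10, 0.30, 0.20, 0.55, 0.40, 0.35, 0.70, 0.60, 0.90, 0.80]
for cut in (10, 14, 16):
    print(cut, sum(label_elos(los, cut)), round(auc(label_elos(los, cut), scores), 2))
```

Changing the cut-point changes the event rate as well as the AUC, so Brier scores across definitions are not directly comparable without accounting for prevalence.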

Procedure-specific analysis

Subgroup analyses comparing arthroplasty (n = 765) versus internal fixation (n = 372) patients showed similar discrimination patterns. Cross-validated AUCs were 0.71 for arthroplasty and 0.77 for fixation procedures, with overlapping CIs supporting generalizability across surgical approaches (Supplemental Table 9).

Temporal and hyperparameter stability

Repeated 10-fold cross-validation and modest hyperparameter perturbations (±20%) produced <0.02 absolute changes in AUC without altering model rankings. Leave-one-year-out validation (2019–2025) confirmed temporal stability, though slight performance variation was observed, emphasizing the importance of ongoing model monitoring (Supplemental Table 10).
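Leave-one-year-out validation can be sketched as a grouped split in which each admission year is held out once while the model is trained on all other years. The generator below is an illustrative implementation, not the study's code, and the admission years are hypothetical.

```python
def leave_one_year_out(years):
    """Yield (held_out_year, train_indices, test_indices) for each distinct year."""
    for held_out in sorted(set(years)):
        train = [i for i, y in enumerate(years) if y != held_out]
        test = [i for i, y in enumerate(years) if y == held_out]
        yield held_out, train, test


# Hypothetical admission years for six patients.
admission_year = [2019, 2019, 2020, 2021, 2021, 2025]
for year, train_idx, test_idx in leave_one_year_out(admission_year):
    print(year, train_idx, test_idx)
```

Unlike random 10-fold splits, this design never lets the model see patients from the evaluation year, so a drop in held-out performance for a particular year points to temporal drift rather than sampling noise.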

Clinical decision support applications

The developed models support two distinct clinical use-cases with different performance characteristics (Table 5):

Preoperative triage model: Using baseline variables only, this model achieved moderate discrimination (AUC≈0.74–0.80), suitable for early discharge planning and resource allocation at admission.

Early in-hospital prediction model: Incorporating day 1 postoperative data, this model achieved superior discrimination (AUC≈0.85–0.92), supporting dynamic risk stratification and intervention targeting.

At recall-oriented thresholds (sensitivity ≥0.80), the early in-hospital model identified high-risk patients with net benefit comparable to treating all patients at clinically relevant threshold probabilities of 0.30–0.40 (Supplemental Figure 2), supporting practical implementation for discharge planning decisions. DCA demonstrated that ML models provided greater clinical utility than conventional approaches across the 10–40% risk threshold range most relevant for perioperative decision making.
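A recall-oriented operating point can be chosen by finding the highest probability cut-off that still reaches the target sensitivity. The sketch below illustrates that selection rule under the assumption that patients with predicted risk at or above the cut-off are flagged; the outcomes and predicted risks are hypothetical.

```python
def threshold_for_sensitivity(y_true, y_prob, target=0.80):
    """Highest cut-off t such that flagging p >= t reaches the target sensitivity."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("no positive cases")
    for t in sorted(set(y_prob), reverse=True):
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= t)
        if tp / n_pos >= target:
            return t
    raise ValueError("target sensitivity cannot be reached")


def specificity_at(y_true, y_prob, t):
    """Share of patients without the outcome who fall below the cut-off."""
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    return sum(1 for p in neg if p < t) / len(neg)


# Hypothetical outcomes (1 = eLOS) and predicted risks.
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
p = [0.10, 0.30, 0.20, 0.55, 0.40, 0.35, 0.70, 0.60, 0.90, 0.80]
cut = threshold_for_sensitivity(y, p, target=0.80)
print(cut, specificity_at(y, p, cut))
```

Choosing the highest qualifying cut-off keeps false positives as low as possible for the required sensitivity, which matters when each flagged patient triggers additional discharge-planning work.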

Discussion

Clinical insights from statistical and machine learning approaches

This study compared conventional LR with multiple ML algorithms to predict extended length of stay after hip fracture surgery in elderly patients. Both methodologies consistently identified delayed surgery and postoperative pneumonia as critical predictors of prolonged hospitalization, while revealing TXA's independent protective association—findings that align with ERAS principles.

Multivariate LR confirmed delayed surgery (>48 hours), male sex, prolonged operation time, and postoperative pneumonia as independent risk factors for extended hospitalization. The identification of TXA as an independent protective factor (OR = 0.65, p = 0.03) extends beyond its established benefits for reducing blood loss and transfusion requirements.^23–25 Our findings suggest TXA's direct association with shorter hospital stays, potentially through limiting inflammatory responses secondary to significant blood loss and reducing transfusion-related complications.^26–28 This may involve reducing the need for re-interventions such as surgical drainage for hematomas and minimizing intensive care unit (ICU) stays associated with bleeding complications.^29–31 These findings support existing ERAS guidelines recommending routine TXA administration during hip fracture surgery, particularly among elderly patients vulnerable to perioperative complications.^32–34

Nevertheless, our regression model may oversimplify dose–response relationships or optimal timing for TXA administration. It also does not explore potential interactions between TXA and other ERAS elements, such as early mobilization or multimodal analgesia, which could synergistically enhance patient recovery.^35–37 Thus, while TXA independently predicts shorter hospitalization, its optimal effectiveness may depend on integration within a comprehensive ERAS protocol rather than as an isolated intervention.^38–41

Machine learning performance and clinical interpretability

Ensemble ML models, specifically GBM, XGBoost, and RF, outperformed traditional regression in initial training, demonstrating their ability to model complex, nonlinear relationships between clinical variables. The GBM model achieved strong initial training performance (see Table 3), attributable to its iterative error-correction approach, which captures nuanced interactions among predictors such as surgical timing, coagulation markers (APTT and D-dimer), and TXA use; cross-validation, however, revealed more modest estimates.

After cross-validation, SVM unexpectedly demonstrated robust performance (AUC: 0.76, 95% CI 0.73–0.79), outperforming the ensemble methods that had appeared superior in initial training. This generalization likely reflects SVM's inherent regularization: the soft-margin parameter C controls the trade-off between margin width and training errors, and margin maximization in the kernel-induced feature space favors decision boundaries that generalize well to unseen data.^42,43 In contrast, GBM's performance decline after validation underscores its sensitivity to data partitioning and the risk of overfitting, emphasizing the importance of careful hyperparameter tuning and validation strategies.
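The regularizing role of C can be seen in the standard soft-margin objective, 0.5·||w||² + C·Σ max(0, 1 − yᵢ(w·xᵢ + b)). The sketch below evaluates that objective for a linear SVM; it is a textbook illustration only, since the kernel and the value of C used in the study are not stated here, and the data points and weights are hypothetical.

```python
def soft_margin_objective(w, b, X, y, C):
    """0.5 * ||w||^2 + C * sum of hinge losses (labels must be -1 or +1)."""
    margin_term = 0.5 * sum(wi * wi for wi in w)
    hinge_total = 0.0
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi)) + b
        hinge_total += max(0.0, 1.0 - yi * score)  # zero when outside the margin
    return margin_term + C * hinge_total


# Hypothetical 2-D points; the third lies inside the margin.
X = [(2.0, 0.0), (-2.0, 0.0), (0.5, 0.0)]
y = [1, -1, 1]
w, b = (1.0, 0.0), 0.0
for C in (0.1, 1.0, 10.0):
    print(C, soft_margin_objective(w, b, X, y, C))
```

A small C lets the optimizer tolerate points inside the margin in exchange for a wider margin, while a large C penalizes them heavily and pushes the model toward fitting individual training cases.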

SHAP analysis provided further clinical insights, highlighting that ML models identified not only recognized clinical risk factors (delayed surgery and pneumonia) but also subtler dynamic changes, such as declines in postoperative hematocrit, potentially reflecting hidden complexities in patient recovery trajectories. Variations in feature rankings across different algorithms—for example, the prioritization of TXA by XGBoost versus coagulation markers by RF—highlight that feature importance is context- and model-dependent, necessitating careful clinical interpretation to distinguish genuine predictive features from algorithmic artifacts.^44,45

Comparison with existing literature and model performance

Recent digital-health studies have reported moderate-to-strong discrimination for predicting prolonged/eLOS after hip fracture surgery, though definitions and cohorts vary. A single-center RF model (cut-point at the 75th percentile; prolonged LOS ≥12 days) achieved a test AUC of 0.85 in 360 patients, with surgical timing and laboratory markers among the top predictors.⁹ In a larger single-center study (n = 763, 2018–2022) using multiple ML algorithms, delayed surgery, D-dimer, American Society of Anesthesiologists (ASA) class, surgery type, and sex consistently emerged as important, with SVM/LR performing best after cross-validation.¹⁰ Broader fragility-fracture work has similarly reported AUCs around 0.84, but often with different endpoints and variable sets.¹¹

Against this backdrop, our cross-validated AUCs (≈0.73–0.76) and leading predictors (time-to-surgery, coagulation/hematological indices, sex, and TXA) are consistent with the literature, while our eLOS definition (≥14 days) and inclusion of both arthroplasty and fixation distinguish the cohort and task framing. Differences in thresholds, sample size, feature availability, and timing windows likely explain performance spread across studies.

Our findings reaffirm that delayed surgical intervention remains the most critical modifiable factor associated with eLOS, aligning with recent literature.^46–48 Additionally, the consistent identification of coagulation status (APTT and D-dimer) as a secondary predictor aligns with studies linking perioperative coagulation abnormalities to prolonged recovery and complications, such as thromboembolic events.^49,50

Integration with enhanced recovery after surgery protocols

Interpretability analyses identified time-to-surgery, postoperative pneumonia, and TXA use as the most influential predictors across models, each aligning with potentially actionable elements of perioperative care. Expedited surgical scheduling is a core component of ERAS and may mitigate risk of prolonged hospitalization. Prevention and early recognition of pneumonia—through respiratory physiotherapy, incentive spirometry, and infection control—represent additional modifiable targets. TXA use, already recommended in orthopedic guidelines, was associated with reduced odds of eLOS and highlights the importance of protocol adherence.

Both traditional regression and ML approaches consistently identified delayed surgery and postoperative pneumonia as crucial predictors of prolonged hospitalization. However, ML techniques expanded predictive insights by highlighting underappreciated variables, such as TXA use and preoperative coagulation status, underscoring their potential as targets for intervention. Future ERAS guidelines might incorporate recommendations regarding optimal TXA dosing and timing, based on predictive model insights.^51,52

Although ML models provided superior predictive accuracy, they entail greater computational complexity and reduced transparency compared to regression. These limitations highlight the importance of balancing interpretability with predictive accuracy in clinical settings. For practical clinical application, hybrid modeling approaches that leverage regression for initial identification of key predictors, followed by ML methods for enhanced risk stratification, could optimize both interpretability and accuracy.^53,54

Clinical decision support and implementation

The developed models support two distinct clinical use-cases with different performance characteristics. A preoperative triage model using baseline variables only achieved moderate discrimination (AUC≈0.74–0.80) suitable for early discharge planning and resource allocation at admission. An early in-hospital prediction model incorporating postoperative day 1 data achieved superior discrimination (AUC≈0.85–0.92) for dynamic risk stratification and intervention targeting.

Beyond discrimination, we confirmed that the models were reasonably calibrated, as reflected by Brier scores and calibration slope/intercept. DCA showed potential clinical utility: ML-based models consistently achieved higher net benefit than LR at thresholds relevant for early discharge planning. These findings highlight that the models not only distinguish high-risk patients but also provide actionable value for perioperative decision making. Future work should focus on integrating these predictors into ERAS pathways and prospectively testing whether targeted interventions improve discharge outcomes.⁵⁵

Limitations

This study has several important limitations. The retrospective, single-center design may limit generalizability, and selection bias cannot be excluded. Although we collected mandatory perioperative variables, important factors such as baseline functional status, frailty indices, and social support were not available, leaving potential for unmeasured confounding. The diagnosis-related group (DRG) cost thresholds used in this study (CNY 32,751 for arthroplasty and CNY 22,643 for fixation) are specific to Chongqing health insurance regulations, which may limit the generalizability of economic predictors to other healthcare systems with different reimbursement structures.

Because all data came from a single center, the models may not transfer to institutions with different patient populations, clinical practices, or discharge pathways. The data span 2019–2025; temporal changes in care processes, surgical techniques, or enhanced recovery pathways may influence model performance over time. Our findings are observational and associative in nature and should not be interpreted as evidence of causal relationships; although we adjusted for multiple covariates, residual confounding from unmeasured factors cannot be excluded.

Future directions

Future work should include prospective, multicenter external validation to confirm generalizability, integration of these models into electronic health record systems with clinically calibrated decision thresholds, and monitoring for performance drift over time. Fairness across demographic and clinical subgroups will need to be assessed, with periodic recalibration to maintain accuracy. Emphasis should be placed on aligning predictive insights with actionable ERAS interventions, such as expedited surgery scheduling, standardized TXA protocols, and pneumonia-prevention bundles.

Conclusion

In this retrospective study, we compared LR with multiple machine learning algorithms to predict extended hospital stay after hip fracture surgery. Models incorporating both preoperative and early in-hospital variables achieved good discrimination and reasonable calibration, with time-to-surgery, postoperative pneumonia, and TXA use emerging as key associative predictors. These findings suggest potential utility for supporting ERAS-aligned discharge planning and perioperative management, while emphasizing interpretability through SHAP analysis. External multicenter validation, calibrated thresholds, and ongoing performance monitoring are essential before clinical implementation.

Supplemental Material

Supplemental material for Machine learning for predicting extended length of stay in elderly patients with hip fractures: An enhanced recovery after surgery perspective by Haibo Pu, Xin Shu, Fuqiang Tan, Xiaobin Li, Chaoyang Qu and Xu Peng in DIGITAL HEALTH. The 25 supplemental files share the identifier dhj-10.1177_20552076251406311:

sj-jpg-1 to sj-jpg-14 (14 image files)

sj-docx-15 to sj-docx-21 (7 document files)

sj-xlsx-22 to sj-xlsx-24 (3 spreadsheet files)

sj-docx-25 (1 document file)

Footnotes

Acknowledgements

We thank all the research participants who volunteered their time to make this work possible.

ORCID iD

Xu Peng

Ethical approval

The study was approved by the ethics committee of our hospital (approval number HX-2025-009). Because this is a retrospective study, the requirement for informed consent was waived.

Contributorship

XP and FT were involved in conceptualization; FT, XL, and XS in data curation; FT and CQ in formal analysis; HP, XL, and CQ in methodology; XL and XP in project administration; HP in supervision; FT and KB in writing—original draft; and XP in writing—review and editing.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study received funding from Chongqing Medical Vocational Education Group Teaching and Research Project CQZJ202543, Chongqing Municipal Hechuan District Research Project HCKJ-2025-055 and Chongqing Science and Health Joint Medical Research Project 2026MSXM127.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data access statement

All relevant data are within the paper and its Supporting Information files.

Guarantor

XP.

Peer review

XXX

Supplemental material

Supplemental material for this article is available online.

References

1. Veronese, Maggi. Epidemiology and social costs of hip fracture. Injury 2018; 49: 1458–1460.

2. Roberts, Barry, Nguyen, et al. 2021 John Charnley Award: a protocol-based strategy when using hemiarthroplasty or total hip arthroplasty for femoral neck fractures decreases mortality, length of stay, and complications. Bone Joint J 2021; 103-B: 3–8.

3. Zajonz, Brand, Lycke, et al. Risk factors for early infection following hemiarthroplasty in elderly patients with a femoral neck fracture. Eur J Trauma Emerg Surg 2019; 45: 207–212.

4. Moldovan. A modeling study for hip fracture rates in Romania. J Clin Med 2025; 14: 3162.

5. Moldovan. The impact of total hip arthroplasty on the incidence of hip fractures in Romania. J Clin Med 2025; 14: 4636.

6. Zeleke, Palumbo, Tubertini, et al. Machine learning-based prediction of hospital prolonged length of stay admission at emergency department: a gradient boosting algorithm analysis. Front Artif Intell 2023; 6: 1179226.

7. Thorne, Hodgson. Performance of the Nottingham Hip Fracture Score and Clinical Frailty Scale as predictors of short- and long-term outcomes: a dual-centre 3-year observational study of hip fracture patients. J Bone Miner Metab 2021; 39: 494–500.

8. Miettinen, Sund, Törmä, et al. How often do complications and mortality occur after operatively treated periprosthetic proximal and distal femoral fractures? A register-based study. Clin Orthop Relat Res 2023; 481: 1940–1949.

9. Iida, Takegami, Sakai, et al. Early surgery within 48 h of admission for hip fracture did not improve 1-year mortality in Japan: a single-institution cohort study. Hip Int 2024; 34: 660–667.

10. Liu, Xing, Jiang, et al. Random Forest predictive modeling of prolonged hospital length of stay in elderly hip fracture patients. Front Med (Lausanne) 2024; 11: 1362153.

11. Tian, Chen, Shi, et al. Machine learning applications for the prediction of extended length of stay in geriatric hip fracture patients. World J Orthop 2023; 14: 741–754.

12. Rodriguez, Rust, Roche, et al. Artificial intelligence and machine learning in knee arthroplasty. Knee 2025; 54: 28–49.

13. Collins, Moons KGM, Van Calster, et al. Reporting guidelines for artificial intelligence-based prediction model studies: the TRIPOD-AI and PROBAST-AI guidelines. Br Med J 2023; 382: e079319.

14. Altman, Simera, Hoey, et al. EQUATOR: reporting guidelines for health research. Lancet 2008; 371: 1149–1150.

15. Bao, Huang, Sun, et al. Prevalence of anemia of varying severity, geographic variations, and association with metabolic factors among women of reproductive age in China: a nationwide, population-based study. Front Med 2024; 18: 850–861.

16. Zhang, Wang, Zhu, et al. Differences in nutritional risk assessment between NRS2002, RFH-NPT and LDUST in cirrhotic patients. Sci Rep 2023; 13: 3306.

17. Guzon-Illescas, Perez Fernandez, Crespí Villarias, et al. Mortality after osteoporotic hip fracture: incidence, trends, and associated factors. J Orthop Surg Res 2019; 14: 203.

18. Lundberg, Erion, Chen, et al. From local explanations to global understanding with explainable AI for trees. Nat Mach Intell 2020; 2: 56–67.

19. Pedregosa, Varoquaux, Gramfort, et al. Scikit-learn: machine learning in Python. J Mach Learn Res 2011; 12: 2825–2830.

20. Chen, Guestrin. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16). New York: ACM, 2016, pp.785–794. doi:10.1145/2939672.2939785.

21. Vickers, Elkin. Decision curve analysis: a novel method for evaluating prediction models. Med Decis Making 2006; 26: 565–574.

22. Lundberg, Lee. A unified approach to interpreting model predictions. In: Advances in Neural Information Processing Systems 30 (NIPS 2017). Red Hook, NY: Curran Associates, Inc., 2017, pp.4765–4774.

23. Zhan, Wang. Poor prognosis and risk factors of nonoperative treatment hip fracture patients with end-stage renal disease. Medicine (Baltimore) 2024; 103: e36446.

24.

Zhu

Yin

Wang

, et al. Restrictive versus liberal strategy for red blood-cell transfusion in hip fracture patients: a systematic review and meta-analysis. Medicine (Baltimore) 2019; 98: e16795.

25.

Hsu

Chen

, et al. The safety of tranexamic acid administration in total knee arthroplasty: a population-based study from Taiwan. Anaesthesia 2023; 78: 303–314.

26.

Biffi

Porcu

Castellini

, et al. Systemic hemostatic agents initiated in trauma patients in the pre-hospital setting: a systematic review. Eur J Trauma Emerg Surg 2023; 49: 1259–1270.

27.

Zhang

Chen

, et al. Enhanced recovery after surgery in patients after hip and knee arthroplasty: a systematic review and meta-analysis. Postgrad Med J 2024; 100: 159–173.

28.

Chen

Xiang

, et al. Association of iron supplementation with risk of transfusion, hospital length of stay, and mortality in geriatric patients undergoing hip fracture surgeries: a meta-analysis. Eur Geriatr Med 2021; 12: 5–15.

29.

Ghobrial

Eikani

Schmitt

, et al. Safety and efficacy of tranexamic acid in total ankle arthroplasty. Foot Ankle Spec 2025; 18: 263–268.

30.

Lewis

Pritchard

Estcourt

, et al. Interventions for reducing red blood cell transfusion in adults undergoing hip fracture surgery: an overview of systematic reviews. Cochrane Database Syst Rev 2023; 6: CD013737.

31.

Fenwick

Antonovska

Pfann

, et al.

Does tranexamic acid reliably reduce blood loss in proximal femur fracture surgery?

Eur J Trauma Emerg Surg 2023; 49: 209–216.

32.

Miralles-Muñoz

Martin-Grandes

Pineda-Salazar

, et al. Preoperative dose of intravenous tranexamic acid safely reduces blood loss and transfusion in patients undergoing hip hemiarthroplasty for femoral neck fracture: a randomized controlled trial. Acta Orthop Belg 2024; 90: 403–408.

33.

Augustinus

Mulders

MAM

Gardenbroek

, et al. Tranexamic acid in hip hemiarthroplasty surgery: a systematic review and meta-analysis. Eur J Trauma Emerg Surg 2023; 49: 1247–1258.

34.

Khatib

Bal

Liu

, et al. A randomised controlled trial assessing the effect of tranexamic acid on postoperative blood transfusions in patients with intracapsular hip fractures treated with hemi- or total hip arthroplasty. Arch Orthop Trauma Surg 2024; 144: 3095–3102.

35.

Kumar

Venishetty

Jindal

, et al. Tranexamic acid in upper gastrointestinal bleed in patients with cirrhosis: a randomized controlled trial. Hepatology 2024; 80: 376–388.

36.

Jones

Brock

Richman

, et al.

Which individual components of a colorectal surgery enhanced recovery program are associated with improved surgical outcomes?

Surgery 2024; 176: 1044–1051.

37.

Yang

Peng

, et al. Tranexamic dosing for major joint arthroplasty in adult patients with chronic kidney disease: a pharmacokinetic study and new dosing regimen. Anesthesiology 2025; 142: 863–873.

38.

Brunskill

Millette

Shokoohi

, et al. Red blood cell transfusion for people undergoing hip fracture surgery. Cochrane Database Syst Rev 2015: 4: CD009699.

39.

Lamy

Sirota

Jacques

, et al. Topical versus intravenous tranexamic acid in patients undergoing cardiac surgery: the DEPOSITION randomized controlled trial. Circulation 2024; 150: 1315–1323.

40.

Ker

Sentilhes

Shakur-Still

, et al. Tranexamic acid for postpartum bleeding: a systematic review and individual patient data meta-analysis of randomised controlled trials. Lancet 2024; 404: 1657–1667.

41.

Sauro

Smith

Ibadin

, et al. Enhanced recovery after surgery guidelines and hospital length of stay, readmission, complications, and mortality: a meta-analysis of randomized clinical trials. JAMA Netw Open 2024; 7: e2417310.

42.

Takefuji

. Beyond SHAP: reliable feature selection methods for clinical prediction models. Arch Gerontol Geriatr 2025; 135: 105873.

43.

Lamens

Bajorath

. Comparing explanations of molecular machine learning models generated with different methods for the calculation of Shapley values. Mol Inform 2025; 44: e202500067.

44.

Liu

Tang

, et al. Predictive value of machine learning models in postoperative mortality of older adults patients with hip fracture: a systematic review and meta-analysis. Arch Gerontol Geriatr 2023; 115: 105120.

45.

Torres

Cruz

, et al. Enhancing elderly hip fracture care: reducing the length of stay through guidelines implementation. Cureus 2025; 17: e77238.

46.

Lai

Mok

Chau

, et al. Application of machine learning models on predicting the length of hospital stay in fragility fracture patients. BMC Med Inform Decis Mak 2024; 24: 26.

47.

Shin

Tandi

Kim

. Factors influencing hip fracture surgery after two days of hospitalization using a national administrative database. Sci Rep 2024; 14: 17466.

48.

Yan

Low

, et al. Predictors of poor functional outcomes and mortality in patients with hip fracture: a systematic review. BMC Musculoskelet Disord 2019; 20: 568.

49.

Griffiths

Babu

Dixon

, et al. Guideline for the management of hip fractures 2020: guideline by the Association of Anaesthetists. Anaesthesia 2021; 76: 225–237.

50.

Tsantes

Papadopoulos

Trikoupis

, et al. Rotational thromboelastometry findings are associated with symptomatic venous thromboembolic complications after hip fracture surgery. Clin Orthop Relat Res 2021; 479: 2457–2467.

51.

Leverett

Marriott

. Intravenous tranexamic acid and thromboembolic events in hip fracture surgery: a systematic review and meta-analysis. Orthop Traumatol Surg Res 2023; 109: 103337.

52.

Oosterhoff

JHF

de Hond

AAH

Peters

, et al. Machine learning did not outperform conventional competing risk modeling to predict revision arthroplasty. Clin Orthop Relat Res 2024; 482: 1472–1482.

53.

Kim

Park

, et al. A CT-based deep learning model for predicting subsequent fracture risk in patients with hip fracture. Radiology 2024; 310: e230614.

54.

Yang

Sun

Shi

, et al. Data-quality-navigated machine learning strategy with chemical intuition to improve generalization. J Chem Theory Comput 2024; 20: 10633–10648.

55.

Jia

Xiang

, et al. Latent trajectories of frailty and risk prediction models among geriatric community dwellers: an interpretable machine learning perspective. BMC Geriatr 2022; 22: 900.

Supplementary Material

Please find the following supplemental material available below.

For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.

For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
