Abstract
Background:
Elbow stiffness is a condition that causes mobility dysfunction and severe suffering in patients. Patients with posttraumatic elbow stiffness who undergo open elbow arthrolysis are at risk for poor postoperative flexion-extension outcome, which leads us to suspect that these patients possess a unique “elbow stiffness predisposition.”
Purpose:
To develop an innovative predictive model that combines clinical and laboratory indicators to forecast poor flexion-extension outcomes after open elbow arthrolysis, thereby interpreting the “elbow stiffness predisposition.”
Study Design:
Cohort study; Level of evidence, 3.
Methods:
Patients who underwent open elbow arthrolysis between 2019 and 2022 were selected for model training and validation (n = 204), while those who underwent open elbow arthrolysis between 2016 and 2017 served as a test set (n = 35). The study assessed 19 clinical features and 58 laboratory parameters. A comparative analysis of several machine learning models (logistic regression, Naive Bayes, decision trees, random forest, gradient boosting, and XGBoost) was performed to identify the most effective approach. SHapley Additive exPlanations (SHAP) were employed to prioritize the key factors.
Results:
Using univariate analysis and the least absolute shrinkage and selection operator (LASSO) regression, 14 key variables were selected for inclusion in the model. The XGBoost model demonstrated superior performance, reaching an area under the curve of 0.909 on the test dataset. The most important indicator identified by LASSO was alkaline phosphatase. Indicators ranked by SHAP were lipoprotein(a), alkaline phosphatase, visual analog scale score, serum calcium, basophil count, serum sodium, previous arthrolysis, alanine aminotransferase, blood glucose, preoperative elbow range of motion, Cystatin C, uric acid, tobacco use, and serum cholinesterase.
Conclusion:
We have successfully developed a machine learning model using 14 key indicators to predict poor flexion-extension outcomes, which preliminarily explains the “elbow stiffness predisposition.”
Posttraumatic elbow stiffness (PTES) limits elbow mobility due to trauma, causing significant impairments in daily activities. Treatment modalities encompass conservative measures and surgical interventions, with open elbow arthrolysis being the most prevalent surgical approach, 42 albeit carrying a risk of unfavorable flexion-extension outcome. 45 Elbow mobility is critically important for personal activities of daily living and occupational functions; subsequent recurrence of limited range of motion (ROM) after surgery imposes substantial economic and psychological burdens on patients and their families. As these unfavorable outcomes are not isolated incidents, we hypothesized that surgical nonresponders may possess a constitutional predisposition to elbow stiffness. We aimed to predict this susceptibility to provide adjunctive information for clinical decision-making, thereby potentially mitigating the adverse effects of nonbeneficial surgical interventions on patients.
Accurately predicting poor outcomes for patients with PTES remains a significant clinical challenge, which highlights the need for more robust, data-driven approaches to preoperative assessment. Machine learning has emerged as a potent tool for analyzing complex, multidimensional data to predict adverse events, an approach widely validated across various biomedical challenges for tasks such as deciphering drug response or identifying key molecular markers.14,34 Its utility extends to orthopaedics, for example in forecasting complications after surgeries such as anterior cruciate ligament reconstruction and total joint arthroplasty.16,26 Nonetheless, the potential of machine learning to predict postoperative problems after PTES surgery using integrated laboratory and clinical data remains untapped. Feature selection is vital in machine learning model development. While recent research has emphasized the effect of clinical factors such as smoking and obesity on PTES outcomes,35,36,46 laboratory indicators have been less examined. Blood-based tests provide accessible and cost-effective data for predictive modeling. There is growing interest in laboratory markers, such as uric acid (UA) and alkaline phosphatase,38,51 in relation to PTES outcomes. Building on strategies proven effective in other medical contexts for creating predictive scores from systemic markers, a comprehensive analysis combining clinical and laboratory data through machine learning could enhance preoperative assessments.31,54
This study proposes a machine learning-based approach to predict poor flexion-extension outcomes after open elbow arthrolysis by combining 19 clinical features and 58 laboratory markers, refined to 14 key variables. Multiple machine learning techniques—logistic regression, Naive Bayes, decision trees, random forest, gradient boosting, and extreme gradient boosting—were employed to identify the most effective predictive model. The goal is to enhance preoperative assessments, improve patient well-being, and optimize the use of health care resources.
Methods
Data Collection
The retrospective patient cohort was collected from a large municipal general hospital and analyzed under strict confidentiality to protect patient privacy. This study was reviewed and approved by the hospital's ethics committee (Approval No: 2025-KY-032-K), and all processes adhered to the core principles of the Declaration of Helsinki. Data for analysis were gathered from patients with elbow stiffness who underwent open elbow arthrolysis between May 2019 and December 2022, as well as between January 2016 and December 2017. The data from 2019 to 2022 were used for training and validation, while the data from 2016 to 2017 were used for testing. The preoperative clinical and laboratory data were retrieved from the electronic medical record system, the medical laboratory system, and the record book, and the postoperative rehabilitation data were obtained through follow-up. All patients were observed for >2 years. Because the surgeons and the time period differed from those of the training cohort, the test set has some characteristics of external validation. Patients with missing data were excluded at the data collection stage (exclusion rate, 17%); therefore, data imputation was not applied.
The eligibility requirements were as follows: (1) Posttraumatic stiffness in the elbow, with a ROM restricted to <100°. ROM was measured by the attending physician (Y.O.) using a standard goniometer. The axis was placed on the lateral epicondyle of the humerus, and the mean of 2 measurements was taken; and (2) the intervention involved open elbow arthrolysis. The exclusion criteria were as follows: (1) Elbow joint burns, central nervous system-related injuries due to stroke, spinal cord injury, or brain trauma, elbow swelling and discomfort with unclear cause after clinical examination and radiograph/magnetic resonance imaging assessment, as well as comorbid conditions—including rheumatoid arthritis or tuberculosis; (2) Patients who also underwent concomitant total elbow replacement or radial head replacement during the open elbow arthrolysis; (3) Patients who underwent another elbow surgery other than open elbow arthrolysis during the follow-up period, which affected prognostic judgment. In instances where the elbow joint required further treatment, specifically through open elbow arthrolysis during the follow-up, such cases were classified as a recurrence of elbow stiffness and thus incorporated into the study; and (4) Patients with missing medical records, those who declined participation, or those who were lost to follow-up. Ultimately, data on patients spanning from 2019 to 2022 were collected to develop the training and internal cross-validation sets, employing a method of random sampling for division. Additionally, patient data from 2016 to 2017 were collected as the test set (Figure 1). Because of the rarity of the condition and to ensure the representativeness of the small sample size, we collected as many patients as possible, with varying ages, sexes, and severities of the condition. The application of machine learning to such focused, limited-size datasets has proven effective in other specialized scientific fields. 20

Patient information collection process. Patients who had surgery between 2019 and 2022 were split into training and validation sets (7:3). Patients from 2016 to 2017 were used for model-independent testing. ROM, range of motion.
If the ROM fell <100° at any point during the follow-up phase or at the final assessment, it was deemed that the elbow stiffness had returned, indicating a negative postoperative result. A total of 19 demographic and clinical characteristics were collected, as well as 58 preoperative laboratory test indicators involving blood routine, inflammation, blood lipids, coagulation, myocardial enzymes, and liver and kidney function (Table 1).
Laboratory and Clinical Indicators a
BMI, body mass index; Preop, preoperative; ROM, range of motion; VAS, visual analog scale.
Various assessment techniques were employed to examine body mass index (BMI), visual analog scale (VAS), ROM, and ulnar nerve symptoms. BMI categories included underweight (<18.5 kg/m2), normal weight (18.5-23.9 kg/m2), overweight (24-27.9 kg/m2), and obese (≥28 kg/m2). 53 The VAS for pain was rated as none (0), mild (1-3, interferes with sleep), moderate (4-6, unable to sleep), and severe (7-10, severe pain). 12 Ulnar nerve symptoms were classified as none, mild, moderate, and severe according to the Dellon classification. 9 Preoperative ROM was separated into 4 grades: <30°, 30° to 59°, 60° to 89°, and 90° to 100° (≥90°). 25 Types of injury included dislocation, intra-articular fracture, extra-articular fracture, complex fracture dislocation, and other elbow injuries. 38 Initial treatment was categorized as conservative or operative. Other clinical features included age, sex, stiffness duration, preoperative pronation, preoperative supination, history of multiple surgeries, articular ankylosis, habitual side stiffness, tobacco use, alcohol consumption, history of radial head replacement, previous arthrolysis, and hyperuricemia.
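The grading scheme above can be written as a few small functions. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and the boundary handling (a BMI of exactly 28 treated as obese, a VAS of 0 treated as none, values between listed ranges assigned to the lower category) is an assumption.

```python
def bmi_category(bmi: float) -> str:
    """Map BMI (kg/m2) to the four categories listed in the text."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 24.0:
        return "normal weight"
    if bmi < 28.0:
        return "overweight"
    return "obese"  # assumption: exactly 28 kg/m2 counts as obese


def vas_category(vas: int) -> str:
    """Map a 0-10 VAS pain score to none, mild, moderate, or severe."""
    if not 0 <= vas <= 10:
        raise ValueError("VAS must be between 0 and 10")
    if vas == 0:
        return "none"
    if vas <= 3:
        return "mild"
    if vas <= 6:
        return "moderate"
    return "severe"


def rom_grade(rom_degrees: float) -> str:
    """Map preoperative flexion-extension ROM (degrees) to the 4 grades."""
    if rom_degrees < 30:
        return "<30"
    if rom_degrees < 60:
        return "30-59"
    if rom_degrees < 90:
        return "60-89"
    return ">=90"


# Illustrative values only, not patient data.
print(bmi_category(22.0), vas_category(5), rom_grade(45))
```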
Surgical options for elbow stiffness include open and arthroscopic arthrolysis, with open arthrolysis offering broader indications. 42 Open procedures typically employ a dual medial-lateral approach: the medial route addresses ulnar nerve repositioning and removes osteophytes and scar tissue, while the lateral approach focuses on clearing scar tissue around the radial head and coronoid fossa. Postoperative rehabilitation often involves external fixation braces.
Statistical Analysis
Feature engineering was conducted on the training dataset utilizing R Version 4.3.2 (R Foundation for Statistical Computing), a powerful tool for statistical computing in complex modeling workflows. 40 To identify relevant variables, a combination of univariate analysis and the Least Absolute Shrinkage and Selection Operator (LASSO) regression using the glmnet package was employed. The Shapiro-Wilk test was used to assess the normality of continuous variables, while the t test was used for data that were normally distributed. For continuous variables that did not conform to normality, the rank-sum test (also known as the Wilcoxon rank-sum test) was employed. Categorical variables were analyzed using either the chi-square test or the Fisher exact test. Variables with a significance level of P < .1 in the univariate analysis were initially filtered for additional LASSO regression to mitigate issues of overfitting and multicollinearity. To test the significance of intervariable correlations, the corrplot package was used to obtain the P value with a significance level of .05.
The models were developed and assessed using Python 3.8, specifically leveraging scikit-learn Version 1.3.1 and XGBoost Version 1.7.0. After feature selection, 6 distinct machine learning techniques were utilized to construct prognostic models: logistic regression, Naive Bayes, decision trees, random forests, gradient boosting, and XGBoost. The models were developed on the training dataset, with hyperparameter optimization performed using the “GridSearchCV” class from scikit-learn, configured with 10-fold cross-validation. The effectiveness of the models was gauged using both the cross-validation and test datasets. The model that yielded the highest area under the curve (AUC) in both sets was selected for a more thorough assessment across various metrics. The AUC served as the main performance indicator, while additional metrics, such as accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F1 score, were also computed using functions from the “sklearn.metrics” module (eg, roc_auc_score, confusion_matrix, accuracy_score). Moreover, the contribution of each variable to the model's predictions was evaluated using SHapley Additive exPlanations (SHAP), via the shap.Explainer class and associated plotting functions from the “shap” library, which facilitated the breakdown of predictions for individual instances. To address class imbalance in the training data, the Synthetic Minority Over-sampling Technique (SMOTE) (from the “imbalanced-learn” library, class imblearn.over_sampling.SMOTE) was applied, and the classification threshold for the models was subsequently set at the standard 0.5. One representative positive case and one representative negative case were selected for SHAP visualization.
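To make the secondary metrics concrete, the sketch below derives them from a confusion matrix obtained at the 0.5 threshold. It is a plain-Python illustration under stated assumptions, not the authors' pipeline (which used sklearn.metrics): the function names are hypothetical, a probability equal to the threshold is counted as positive, and the labels and probabilities are made-up values.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Return (tp, fp, tn, fn) after thresholding predicted probabilities."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0  # assumption: ties count as positive
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def _ratio(numerator, denominator):
    """Divide safely; undefined ratios are reported as NaN."""
    return numerator / denominator if denominator else float("nan")


def binary_metrics(tp, fp, tn, fn):
    """Compute the threshold-dependent metrics named in the text."""
    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    ppv = _ratio(tp, tp + fp)
    npv = _ratio(tn, tn + fn)
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    f1 = _ratio(2 * ppv * sensitivity, ppv + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


# Made-up labels (1 = poor outcome) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.6, 0.4, 0.2, 0.7, 0.1, 0.3, 0.8]
print(binary_metrics(*confusion_counts(y_true, y_prob)))
```

Unlike the AUC, all of these values change when the threshold moves, which is why the threshold is stated explicitly.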
Results
General Information
The study involved 204 patients who underwent elbow arthrolysis between 2019 and 2022, meeting the inclusion/exclusion criteria (Supplement Table 1). Univariate analysis identified 19 variables potentially associated with outcomes (P < .05) (Table 2). Patients were categorized as “Yes” for poor flexion-extension outcome and “No” otherwise, split into a training set (n = 142) and a validation set (n = 62). The ages were a median of 36 and 35 years, with male dominance in both sets. Adverse outcomes were present in 35.9% and 37.1% of the sets. Detailed patient data were collected for a multidimensional assessment. An additional 35-patient subset (2016-2017) served as a test set for model generalizability.
Laboratory Testing Indicators and Clinical Characteristics With P < .05 in the Baseline Table a
Data are presented as median [IQR], mean (SD), or n (%). ALT, alanine aminotransferase; ALP, alkaline phosphatase; AST, aspartate aminotransferase; Ca, calcium; CHE, cholinesterase; Cysc, cystatin C; GGT, gamma-glutamyl transferase; IQR, interquartile range; KU/L, kilo units per liter; LP(a), lipoprotein (a); Na, sodium; Preop, preoperative; RBP, retinol-binding protein; ROM, range of motion; TG, triglycerides; UA, uric acid; VAS, visual analog scale.
Feature Engineering
For all 77 clinical and laboratory features, univariate analysis was first conducted to preliminarily screen variables potentially associated with poor flexion-extension outcome. To analyze the relationship between continuous variables and negative outcomes, we employed either the t test or the rank-sum test, depending on normality. For categorical variables, we utilized the chi-square test or the Fisher exact test. We established a significance level of P < .1 for univariate screening in the training set (Supplement Table 2), which led to the identification of 36 variables (Supplement Table 3). LASSO regression was subsequently used to further reduce collinearity between variables (Figure 2). The data were standardized, and categorical variables were encoded as dummy variables. Ultimately, a minimal set of 14 outcome-related variables was identified, including alkaline phosphatase (ALP) (identified by LASSO as the most influential factor), lipoprotein(a) (Lp[a]), previous arthrolysis, basophils (BAS), UA, glucose levels (GLU), sodium (Na), alanine aminotransferase (ALT), tobacco use, VAS, cystatin C (CysC), calcium (Ca), cholinesterase (CHE), and preoperative ROM (Supplement Table 4).

LASSO regression identifies the minimal subset of variables. (A) The binomial deviance changes with the log of the regularization parameter λ. At 1 standard error above the minimum deviance, λ yields a minimal variable set (14 variables) with optimal complexity and deviance. (B) Each feature's coefficient varies with the penalization strength, and the data have been standardized. LASSO, least absolute shrinkage and selection operator.
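The way a LASSO penalty drives coefficients to exactly zero can be illustrated with the soft-thresholding operator. The sketch below is a simplification, not the glmnet logistic fit used in this study: it assumes a linear model with standardized, orthonormal predictors, in which case each penalized coefficient is the soft-thresholded unpenalized coefficient. The feature names and coefficient values are hypothetical.

```python
def soft_threshold(z: float, lam: float) -> float:
    """Shrink z toward zero by lam and set it to zero if |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_path(unpenalized: dict, lambdas: list) -> dict:
    """Coefficients at each penalty strength (orthonormal-design assumption)."""
    return {
        lam: {name: soft_threshold(coef, lam) for name, coef in unpenalized.items()}
        for lam in lambdas
    }


def n_selected(coefficients: dict) -> int:
    """Number of variables with a nonzero coefficient."""
    return sum(1 for value in coefficients.values() if value != 0.0)


# Hypothetical unpenalized coefficients for three standardized features.
unpenalized = {"feature_a": 0.8, "feature_b": -0.2, "feature_c": 0.05}
path = lasso_path(unpenalized, [0.0, 0.1, 0.3])
for lam, coefficients in path.items():
    print(lam, n_selected(coefficients), coefficients)
```

As the penalty grows, weak predictors drop out first; choosing the penalty 1 standard error above the minimum deviance favors a smaller variable set over the one with the lowest cross-validated deviance.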
Elements such as ALP and UA were identified as contributors to unfavorable outcomes, with ALP wielding the most considerable influence. In contrast, effective initial joint mobility and the lack of joint pain factors are associated with positive results. These findings align well with existing literature, indicating that our screening methodology was highly effective. Furthermore, patients who underwent reoperation after previously experiencing poor outcomes often faced similar results, hinting at a possible physical predisposition to elbow stiffness and underscoring the importance of our study. No strong correlations were observed among the selected variables (Supplement Figure 1).
Model Construction and Evaluation
To initially assess the predictive power of each model regarding postoperative outcomes, we performed a 10-fold cross-validation using the training dataset. This involved splitting the data into 10 segments, utilizing 9 segments for training while reserving 1 for validation during each iteration, thereby evaluating each model's learning and forecasting abilities across the full training set. Hyperparameters were optimized by grid search. Of the 6 models assessed, the XGBoost model achieved the highest mean AUC (AUC = 0.889), indicating the strongest ability to distinguish outcomes within the overall training dataset (Figure 3).

Ten-fold cross-validation within the training set. AUC, area under the curve; DT, decision tree; GBM, gradient boosting machine; LR, logistic regression; NBC, Naive Bayes Classifier; RF, random forest; XGB, XGBoost.
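The 10-fold procedure can be sketched as an index generator. This is a minimal plain-Python illustration, not the scikit-learn splitter used in the study: the function name and seed are hypothetical, and whether the original folds were shuffled or stratified by outcome is not stated, so simple shuffled folds are assumed here.

```python
import random


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumption: shuffled, not stratified
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# With the 142-patient training set, each fold holds out 14 or 15 patients.
for train, validation in k_fold_indices(142):
    print(len(train), len(validation))
```

Each patient appears in exactly one validation fold, so the mean AUC across the 10 folds reflects performance on the full training set.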
The final model was built using the complete training set. An independent dataset from 2016–2017 was used as an external test set in a nonrigorous sense, enabling further evaluation of each model's generalizability across different time periods and surgical teams. The baseline characteristics of the 2 cohorts are compared in detail in Supplement Table 5. The receiver operating characteristic curves and confusion matrix highlighted the XGBoost model's strong generalization capability in both the internal validation (AUC = 0.914) and test sets (AUC = 0.909) (Figure 4). Consequently, the model trained with the XGBoost method was selected as the final model for predicting poor flexion-extension outcome after open elbow arthrolysis.

Performance of the constructed models in the validation and test sets. (A) ROC curves for each model in the validation set. (B) XGBoost model's confusion matrix in the validation set. (C) Model performance metrics in the validation set. (D) ROC curves for each model in the test set. (E) XGBoost model's confusion matrix in the test set. (F) Model performance metrics in the test set. AUC, area under the curve; DT, decision tree; GBM, gradient boosting machine; LR, logistic regression; NBC, Naive Bayes Classifier; NPV, negative predictive value; PPV, positive predictive value; RF, random forest; ROC, receiver operating characteristic; XGB, XGBoost.
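The AUC values reported for the validation and test sets can be read as the probability that a randomly chosen patient with a poor outcome receives a higher predicted risk than a randomly chosen patient with a good outcome. The sketch below computes the AUC directly from that definition. It is an illustrative plain-Python equivalent of what roc_auc_score reports for binary labels, with a hypothetical function name and made-up scores.

```python
def roc_auc(y_true, y_score) -> float:
    """AUC as the fraction of positive-negative pairs ranked correctly."""
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # tied scores count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Made-up example: 3 of the 4 positive-negative pairs are ordered correctly.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```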
Interpretability of the XGBoost Model
SHAP values from the XGBoost model elucidate the effects of its features on predictions. The mean absolute SHAP values indicate feature significance, with higher values denoting greater importance. The most critical features identified were Lp(a), ALP, VAS, Ca, BAS, Na, previous arthrolysis, ALT, GLU, preoperative ROM, CysC, UA, tobacco use, and CHE (Figure 5A). Positive SHAP values contribute to poor prognosis, while negative values suggest better outcomes. Lower preoperative ROM was associated with poorer prognosis, consistent with clinical observations; however, the remaining indicators were all positively correlated with the probability of the poor outcome (Figure 5B).

Assessing variable importance in the XGBoost model using the SHAP method. (A) The mean absolute SHAP value of each feature, with larger values indicating greater importance to the model's predictions. (B) Distribution of SHAP values per feature, with red indicating high values and blue indicating low values. Units do not match the x-axis; only simple information is provided, where 0 indicates the absence of this condition, and 1 indicates its presence. ALP, alkaline phosphatase; ALT, alanine aminotransferase; BAS, basophils (absolute value); Ca, calcium; CHE, serum cholinesterase; CysC, cystatin C; GLU, glucose levels; Lp(a), lipoprotein(a); Na, sodium; ROM, range of motion; SHAP, SHapley Additive exPlanations; UA, uric acid; VAS, visual analog scale.
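The ranking by mean absolute SHAP value can be illustrated with exact Shapley values on a toy model. The sketch below is not the algorithm used by the shap library for tree models: it enumerates every feature subset, which is only feasible for a handful of features, and it represents an absent feature by a fixed baseline value. The linear toy model, its weights, the baseline, and the sample values are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline input."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def mean_absolute_importance(all_phi):
    """Average |SHAP| per feature across samples (global importance)."""
    n_features = len(all_phi[0])
    return [sum(abs(phi[j]) for phi in all_phi) / len(all_phi) for j in range(n_features)]


def toy_model(z):
    """Hypothetical linear risk score over three standardized features."""
    return 0.2 * z[0] + 0.5 * z[1] - 0.3 * z[2]


baseline = [0.0, 0.0, 0.0]
samples = [[1.0, 2.0, 3.0], [-1.0, 0.0, 1.0]]
per_sample = [shapley_values(toy_model, x, baseline) for x in samples]
print(per_sample)
print(mean_absolute_importance(per_sample))
```

For each sample, the values sum to the difference between its prediction and the baseline prediction, which is the property that lets a single prediction be decomposed feature by feature; the sign gives the direction of each contribution, and the mean of the absolute values gives the global ranking.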
Single-sample prediction decomposition helps further explain our model. The risk prediction plot differentiates between low (blue) and high-risk (red) contributions, with a baseline score of 0.5 for prognosis. For example, patient ID6, a smoker with low-risk scores, is likely to have a good outcome (Figure 6A), while patient ID171, despite lower VAS and BAS, shows a higher risk due to elevated ALP, ALT, Na, and GLU scores (Figure 6B).

Single sample prediction decomposition for patients (A) ID6 and (B) ID171. ALP, alkaline phosphatase; ALT, alanine aminotransferase; BAS, basophils (absolute value); Ca, calcium; GLU, glucose levels; Lp(a), lipoprotein(a); Na, sodium; VAS, visual analog scale.
Discussion
This study assessed 6 machine learning algorithms—logistic regression, Naive Bayes, decision trees, random forests, gradient boosting, and XGBoost—to predict negative postoperative outcomes in patients with elbow stiffness. XGBoost outperformed others, using 14 key features: 10 laboratory indicators and 4 clinical features.
Compared with previous studies, represented by the Shanghai Prediction Model for Elbow Stiffness Surgical Outcome (SPESSO), 21 our study offers a renewed perspective both conceptually and methodologically. A key distinction lies in our prediction target. Whereas SPESSO was dedicated to identifying patients likely to achieve a favorable outcome, our model is designed to predict a poor outcome, thereby providing a potential pathway to identify clinically high-risk individuals for whom alternative or adjunctive treatment options might be considered. Furthermore, we compared multiple machine learning methods, enabling us to better capture the complex, nonlinear interactions among the biological pathways that contribute to elbow stiffness. Most importantly, we comprehensively integrated laboratory parameters, including 58 indicators, with clinical data. This systematic inclusion of biomarkers enabled us to hypothesize that recalcitrant elbow stiffness is not merely driven by local joint pathology but may instead originate from a systemic, biologically detectable “stiffness predisposition.”
We speculate that abnormalities in these 14 indicators define a systemic state that acts as a crucial contributory factor, predisposing individuals to an aberrant healing response after trauma. Our findings indicate that this predisposition contributes to adverse outcomes after open elbow arthrolysis, suggesting that such patients may possess an aberrant pathological remodeling tendency in response to trauma, because the surgical procedure itself constitutes a controlled trauma that disrupts tissue integrity to enable reconstruction. The pathological development of elbow stiffness begins with trauma-induced acute inflammatory response. If this inflammatory process becomes uncontrolled or the joint undergoes prolonged immobilization, persistent inflammatory signals stimulate excessive fibroblast proliferation and secretion of disorganized collagen, resulting in progressive fibrosis, thickening, and contracture of soft tissues such as the joint capsule and ligaments, with subsequent loss of elasticity.5,27 Heterotopic ossification (HO) may also form within soft tissue, creating bony impediments.15,27 The combination of soft tissue fibrotic contracture and HO, along with disuse muscle atrophy and adhesions caused by pain and limited movement, collectively contributes to decreased ROM, potentially initiating a vicious cycle of immobility, increased stiffness, and further immobility. This ultimately progresses to a state of elbow stiffness. From this, we can first understand why VAS is associated with poor outcomes.
Previous open elbow arthrolysis that failed to achieve a good flexion-extension outcome was identified as a risk factor. This further supports our hypothesis regarding a predisposition to elbow stiffness, suggesting that reoperation may not necessarily benefit patients with this type of constitution.
Among the 14 indicators, ALP was the most important according to LASSO and the second most important according to SHAP. A study on myocardial fibrosis shows that serum ALP levels are positively correlated with fibrosis markers and can activate profibrotic signaling pathways, such as transforming growth factor beta (TGF-β)/SMAD. 3 Given that the TGF-β/SMAD pathway may also be a key mechanism in the development of joint capsule fibrosis, it is plausible that elevated serum ALP may reflect a systemic profibrotic tendency, thereby contributing to a higher risk of elbow joint capsular contracture. Furthermore, ALP explicitly supports osteogenesis by hydrolyzing organic phosphates, aiding in bone mineralization,4,22 and is predictive of HO and postoperative outcomes.33,38
In the SHAP results, Lp(a) demonstrated the strongest predictive value. Although this finding was unexpected, a review of the literature revealed a possible relationship between Lp(a) and elbow stiffness, including at least 2 plausible biological mechanisms that could explain this association. First, Lp(a) is a well-established proinflammatory mediator in the pathogenesis of atherosclerosis, partly due to its high concentration of oxidized phospholipids. 37 It is therefore hypothesized that this Lp(a)-mediated chronic, low-grade inflammation could also create a profibrotic microenvironment within the elbow joint capsule. Second, the high homology between apolipoprotein(a) and plasminogen allows Lp(a) to competitively inhibit plasminogen activation on fibrin and cell surfaces, which would impair fibrinolysis and promote fibrosis. 1 Consequently, high levels of Lp(a) may tip the balance toward excessive matrix accumulation, laying the foundation for joint capsule fibrosis. In addition, Lp(a) is an independent risk factor for atherosclerotic cardiovascular disease, closely associated with vascular and valvular calcification, as well as thrombus formation.18,28,49 Calcification and thrombosis are, in turn, closely associated with HO. 30,39,43 Vascular and valvular calcification is considered a form of HO and has been linked to the activation of key osteogenic pathways, including BMP2/4, Runx2, and Wnt/β-catenin signaling. 19,39 Moreover, recent studies have shown that Lp(a) can induce osteogenic differentiation in cells. 2,52 Numerous other lipid factors, such as low-density lipoprotein, are also associated with the formation of HO.7,8 This finding aligns with previous studies focusing on BMI. 35 In addition, serum ALT levels and butyrylcholinesterase (bCHE) are also related to BMI,41,44 and both are risk factors in our results, which indicates that BMI and obesity status have profound connections to the elbow stiffness constitution.
Serum calcium is an important indicator of bone remodeling. Concerns have been raised regarding the potential for calcium and vitamin D supplementation to worsen HO, necessitating further treatment evaluations. 11 Verapamil, a calcium channel blocker, may reduce HO in animal models. 10
Basophil-derived interleukin-4 has been shown to be associated with tissue fibrosis, promoting the production of fibroblasts. 29 Sodium ions play an important role in osteogenic signal transduction through epithelial sodium channels, especially in converting mechanical stress in rehabilitation therapy into biological signals that stimulate osteoblasts. 50 Furthermore, research in other tissues provides strong mechanistic support for the hypothesis that high sodium leads to fibrosis: for instance, a high-salt environment has been demonstrated to upregulate the key fibrotic cytokine TGF-β in the kidney 13 and to directly induce a comprehensive proinflammatory and profibrotic response in peritoneal fibroblasts. 32 Therefore, it is reasonable to infer that a similar pathological pathway exists in the microenvironment of the elbow joint, wherein a high-sodium state could drive excessive matrix production by capsular fibroblasts, ultimately leading to fibrotic contracture. Hyperglycemic states can activate TGF-β and advanced glycation end product/receptor for advanced glycation end product signaling pathways to promote tissue fibrosis. 48 CysC is associated with both fibrotic diseases and osteogenic phenotypes.6,17 Monosodium urate crystals formed by UA are closely related to inflammation around joints. 47 The upregulation of these indicators may put the body in a state of “elbow joint stiffness predisposition.”
Clinical factors, such as VAS scores, preoperative ROM, and smoking status, affect outcomes of elbow stiffness surgery.21,46 High VAS scores may decrease rehabilitation frequency due to pain, limited ROM could lead to severe ossification and challenging surgery, and smoking might impair joint recovery.
Indeed, the predictive power of our 14-marker signature likely stems not from an additive risk, but from its ability to identify a distinct patient phenotype defined by a systemic “elbow stiffness predisposition.” We conceptualize this predisposition as a state of convergent maladaptive potential, where diverse biological systems are primed for a pathological response. Specifically, it represents the synergistic interplay of at least 3 core axes: metabolic dysregulation, chronic inflammation, and a profibrotic state. The metabolic markers, those related to hepatic function and Lp(a), suggest a systemic environment that fuels low-grade inflammation. This chronic inflammatory milieu, in turn, sensitizes the downstream fibrotic pathways. Consequently, these patients exist in a state of heightened physiological tension. The arthrolysis does not simply initiate a normal healing cascade; instead, it acts as a potent trigger upon this preexisting, unstable foundation. This convergence explains why the healing process is “hijacked”—leading to a vicious feed-forward loop of excessive matrix deposition (fibrosis) and pathological mineralization (ossification), as indicated by markers like ALP. Thus, our model identifies a kind of patient whose fundamental capacity for tissue repair is intrinsically skewed toward a stiff, nonfunctional outcome.
Translating this insight into mechanistic validation is the crucial next step. While a compelling target like Lp(a) presents significant challenges for in vivo modeling due to its unique expression in primates, powerful and feasible alternatives exist. Priority should be given to ex vivo studies using patient-derived joint capsule tissue and in vitro models exposing capsular fibroblasts to patient serum. Approaches used to study keloid formation and the neuroimmune microenvironment could also be drawn upon.23,24 These approaches would allow us to directly dissect how this systemic signature orchestrates the cellular-level fibrotic and osteogenic responses, bridging our clinical prediction to its underlying biological cause.
This study has several limitations. First, because the condition is rare and few surgical centers treat it, the study lacks a strictly defined external validation set. Even with a pseudo-external validation set, the assessment of the model's generalizability remains limited by the small patient population, which poses a risk of overfitting. We also acknowledge an inherent risk of identifying false positives when analyzing a large number of initial variables. To reduce this risk, we employed a multistep feature selection strategy comprising literature- and hypothesis-driven variable preselection, univariate screening, and LASSO regression. Second, this is a retrospective single-center study. Although we assembled a test set of noncontemporaneous patients (2016-2017), providing a form of “temporal validation” that reduces the risk of the model overfitting a particular short-term cohort, this does not constitute true external validation at different institutions. Therefore, the model and the hypotheses we propose require further confirmation through multicenter collaborations and prospective studies. Third, we excluded patients with incomplete data, which may introduce selection bias. Future prospective studies should establish a more rigorous data collection process to minimize missing data.
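The temporal validation described above can be illustrated with a minimal sketch, assuming each patient record carries a surgery year. Only the year ranges (2019-2022 for model development, 2016-2017 for the test set) come from this study; the `PatientRecord` fields, the `temporal_split` helper, and the example records are hypothetical and are not the code used for the analysis.

```python
from dataclasses import dataclass

# Year ranges as reported in the Methods; everything else is illustrative.
DEVELOPMENT_YEARS = range(2019, 2023)    # 2019-2022: training and validation
TEMPORAL_TEST_YEARS = range(2016, 2018)  # 2016-2017: held-out test set


@dataclass
class PatientRecord:
    patient_id: str      # hypothetical identifier
    surgery_year: int    # year of open elbow arthrolysis
    poor_outcome: bool   # label: poor flexion-extension outcome


def temporal_split(records):
    """Assign records to cohorts by surgery year only, never at random."""
    development, temporal_test = [], []
    for record in records:
        if record.surgery_year in DEVELOPMENT_YEARS:
            development.append(record)
        elif record.surgery_year in TEMPORAL_TEST_YEARS:
            temporal_test.append(record)
        # Records from any other year belong to neither cohort.
    return development, temporal_test


# Hypothetical example records, for illustration only.
records = [
    PatientRecord("A", 2016, True),
    PatientRecord("B", 2017, False),
    PatientRecord("C", 2018, False),
    PatientRecord("D", 2019, True),
    PatientRecord("E", 2022, False),
]
development, temporal_test = temporal_split(records)
print([r.patient_id for r in development])
print([r.patient_id for r in temporal_test])
```

Because cohort membership depends only on the surgery year, no patient from the test period can influence feature selection or model fitting. Both cohorts still come from one institution, so this remains temporal rather than external validation.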
Furthermore, a limitation inherent to our retrospective design is the absence of direct tissue-level data. We were therefore unable to correlate our key serological predictors, such as Lp(a) and ALP, with direct histological markers of fibrosis or osteogenesis within the joint capsule. Future prospective cohort studies that systematically collect tissue samples are warranted to formally establish this biological link. Additionally, we did not analyze the timing of recurrence during the follow-up period. Under our study protocol, once a patient's ROM fell below 100°, we terminated observation and included the patient in the cohort without recording whether the recurrence was rapid (early) or gradual (late). In future work, we will systematically record the timing and rate of recurrence in a larger cohort to explore the relationship between recurrence timing and predictive scores, thereby adding a temporal dimension to our hypothesis.
Finally, our current model is based on clinical and serological data and does not incorporate radiological features, such as systematic preoperative evaluations for HO. We recognize the immense value of creating a “multi-modal” predictive model, a direction in which the field of artificial intelligence is broadly moving. However, we made a deliberate decision to exclude high-dimensional radiomic features in this study. Given our sample size, introducing the vast number of features generated from medical imaging would have created a significant risk of model overfitting. This consideration, however, delineates a clear vision for our future research. Our planned next steps involve 2 potential paths: (1) developing a separate, dedicated radiomics model to investigate the “osteogenesis” aspect of the predisposition specifically, and (2) more ambitiously, expanding our multicenter collaborations to amass a dataset sufficient for building a truly integrative, multi-modal model.
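The overfitting concern can be made concrete with a rough samples-per-feature calculation. This is a back-of-the-envelope sketch, not a formal sample size analysis. The cohort size (n = 254) and the 14 selected indicators come from this study; the radiomic feature count of 1000 is a purely hypothetical assumption chosen for illustration.

```python
def samples_per_feature(n_patients, n_features):
    """Crude ratio of available patients to candidate model inputs."""
    return n_patients / n_features


N_DEVELOPMENT = 254             # training and validation cohort (Methods)
N_SELECTED = 14                 # indicators retained after LASSO (Results)
N_RADIOMIC_HYPOTHETICAL = 1000  # assumption: illustrative radiomic feature count

current = samples_per_feature(N_DEVELOPMENT, N_SELECTED)
with_radiomics = samples_per_feature(
    N_DEVELOPMENT, N_SELECTED + N_RADIOMIC_HYPOTHETICAL
)

print(f"Current model: {current:.1f} patients per feature")
print(f"With hypothetical radiomics: {with_radiomics:.2f} patients per feature")
```

Under this assumption the ratio falls from roughly 18 patients per feature to well below 1, meaning the model would have more inputs than patients, which is the situation this study sought to avoid.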
In summary, machine learning is a valuable tool for predicting postoperative outcomes in medicine and warrants further investigation. This study developed an XGBoost model to predict poor flexion-extension outcome after open elbow arthrolysis. Validation in larger studies is needed to strengthen these results. Certain markers, such as Lp(a), are possibly linked to elbow stiffness but remain underexplored in this setting. Investigating these established indicators might therefore lower treatment development costs and improve patient outcomes.
Conclusion
This study developed and validated a machine learning model, identifying XGBoost as the most effective algorithm for predicting poor flexion-extension outcomes after open elbow arthrolysis. The model demonstrated high predictive accuracy using a signature of 14 selected clinical and laboratory indicators. It is crucial, however, to distinguish our model's predictive capacity from a direct explanation of the underlying biological predisposition. Rather than definitively explaining this predisposition, our model identifies key clinical and laboratory markers that are associated with it. These indicators provide important, albeit preliminary, insights into the systemic factors that may contribute to an “elbow stiffness predisposition.” Consequently, this work provides a foundation and clear direction for the targeted mechanistic investigations required to unravel the pathophysiology of elbow stiffness.
Supplemental Material
Supplemental material, sj-pdf-1-ojs-10.1177_23259671251389129 for Machine Learning Prediction of Poor Flexion-Extension Outcome After Open Elbow Arthrolysis: Identifying Individual Predisposition Using Clinical and Laboratory Indicators by Xinyu Wang, Wencai Liu, Yuanhao Tong, Yaowei Lv, Limin Han, Cunyi Fan and Yuanming Ouyang in Orthopaedic Journal of Sports Medicine
Footnotes
Final revision submitted July 6, 2025; accepted August 19, 2025.
One or more of the authors has declared the following potential conflict of interest or source of funding: This study was funded by the National Key R&D Program of China, the National Natural Science Foundation of China, the Medical Engineering Co-Project at the University of Shanghai for Science and Technology, and the Shanghai Municipal Health Commission. AOSSM checks author disclosures against the Open Payments Database (OPD). AOSSM has not conducted an independent investigation on the OPD and disclaims any liability or responsibility relating thereto.
Ethical approval for this study was obtained from Shanghai Sixth People's Hospital.
References
