Abstract
Background:
Although some evidence suggests that machine learning algorithms may outperform classical statistical methods in prognosis prediction for several orthopaedic surgeries, to our knowledge, no study has yet used machine learning to predict patient-reported outcome measures after rotator cuff repair.
Purpose:
To determine whether machine learning algorithms using preoperative data can predict the nonachievement of the minimal clinically important difference (MCID) of disability at 2 years after rotator cuff surgical repair with a similar performance to that of other machine learning studies in the orthopaedic surgery literature.
Study Design:
Case-control study; Level of evidence, 3.
Methods:
We evaluated 474 patients (n = 500 shoulders) with rotator cuff tears who underwent arthroscopic rotator cuff repair between January 2013 and April 2019. The study outcome was the difference between the preoperative and 24-month postoperative American Shoulder and Elbow Surgeons (ASES) score. A cutoff score was calculated based on the established MCID of 15.2 points to separate success (higher than the cutoff) from failure (lower than the cutoff). Routinely collected imaging, clinical, and demographic data were used to train 8 machine learning algorithms (random forest classifier; light gradient boosting machine [LightGBM]; decision tree classifier; extra trees classifier; logistic regression; extreme gradient boosting [XGBoost]; k-nearest neighbors [KNN] classifier; and CatBoost classifier). We used a random sample of 70% of patients to train the algorithms, and 30% were left for performance assessment, simulating new data. The performance of the models was evaluated with the area under the receiver operating characteristic curve (AUC).
Results:
The AUCs for all algorithms ranged from 0.58 to 0.68. The random forest classifier and LightGBM presented the highest AUC values (0.68 [95% CI, 0.48-0.79] and 0.67 [95% CI, 0.43-0.75], respectively) of the 8 machine learning algorithms. Most of the machine learning algorithms outperformed logistic regression (AUC, 0.59 [95% CI, 0.48-0.81]); nonetheless, their performance was lower than that of other machine learning studies in the orthopaedic surgery literature.
Conclusion:
Machine learning algorithms demonstrated some ability to predict the nonachievement of the MCID on the ASES 2 years after rotator cuff repair surgery.
Shoulder pain is the third most common musculoskeletal complaint that drives patients to look for health care services, affecting 18% to 26% of adults. 34 These conditions are usually labeled as subacromial pain syndrome. They may often be associated with rotator cuff tears or injuries, which is why those with shoulder pain look for a shoulder specialist. 37
The number of rotator cuff surgical repairs indicated for patients with rotator cuff tears has increased over the last decades,7,31,37 generating a high economic burden for treating this population.33,37 However, the functional results and retear rate after surgical repair are still disappointing. 39 Therefore, although rotator cuff surgical repair has considerable clinical benefits in symptomatic patients with rotator cuff tears, 22 the question of who would likely benefit from surgical treatment remains.
Deciding whether surgery is a good option depends on the interaction of several pieces of preoperative information,14,46 including the time of conservative treatment without clinical improvement (ie, 6 months) and the belief that rotator cuff tears may worsen and impair patient prognosis in the future, even without evidence of it. 5 This information is often used by clinicians to decide when to prescribe surgical treatment or continue using conservative approaches. However, these heuristics do not truly assist clinicians during their decision-making process to deliver surgical treatment for those most likely to benefit from it. 5
Few studies in the literature developed prognostic models to address this issue by identifying prognostic factors for treatment outcomes using classical statistical methods (eg, regression models) for clinical or structural outcomes with large sample sizes.12,20,27,36 However, although some models have shown promising results in predicting rotator cuff healing, 27 the clinical utility of algorithms developed to predict disability has often been limited, and their performance has been disappointing.
Recently, methods that combine computational science and statistics to maximize the accuracy and predictive power of data, known as machine learning, 51 have begun to be used to predict clinical outcomes in several health conditions.9,30,40,45 These studies have shown that machine learning algorithms can predict, with high performance, the achievement of clinically significant outcomes after several orthopaedic surgeries. 26 However, to our knowledge, no study has used machine learning to predict patient-reported outcome measures after rotator cuff repair.
The main goal of this study was to determine whether machine learning algorithms using preoperative data can predict the nonachievement of clinically significant disability improvement 2 years after rotator cuff surgical repair with a similar performance to that of other machine learning studies in the literature on orthopaedic surgeries. We hypothesized that machine learning algorithms would predict the nonachievement of the minimal clinically important difference (MCID) with a similar performance to that of other machine learning studies in the literature on orthopaedic surgeries—with the area under the receiver operating characteristic curve (AUC) values of >0.7.
Methods
Data Source
We observed a cohort of 474 patients (n = 500 shoulders) with subacromial pain syndrome associated with rotator cuff tears who underwent arthroscopic rotator cuff repair between January 2013 and April 2019. The surgical procedures were performed by 4 surgeons in the same institution. The inclusion criteria were as follows: primary arthroscopic rotator cuff repair (partial or complete); having undergone standardized preoperative data collection; and having preoperative magnetic resonance imaging (MRI). Those who had debridement without rotator cuff repair, open or mini-incision surgeries, or previous surgery in the same shoulder were not included. The study protocol received institutional review board approval, and the requirement for informed consent was waived. This study followed the guidelines of the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis. 42
Surgery and Rehabilitation
The surgeries were performed in beach-chair or lateral decubitus positions under general anesthesia associated with brachial plexus block, depending on the surgeon's preference. Bursectomy was routinely performed. Depending on the surgeon's preference, acromioplasty was either performed or not. Patients with symptomatic arthrosis in the acromioclavicular joint (defined as pain on local palpation with MRI findings) underwent distal resection. The long head of the biceps was approached when it presented subluxation, dislocation, or partial lesions >25% or in the presence of type 2, 3, or 4 superior labrum anterior to posterior lesions. 49 Tenotomy was performed for patients aged ≥60 years. Tenodesis was performed on younger patients, athletes, or those with a body mass index of >25 kg/m2, regardless of age.
Rotator cuff repair was performed in single or double rows for posterosuperior tears and single rows for subscapularis tears. Immobilization with a sling was maintained for 4 to 6 weeks. Movements with the elbow, wrist, and fingers were allowed from the first day after surgery. Passive exercises were started after the end of the third week. Active assisted and free active exercises were started after the sling was removed. Muscle strengthening was performed only after a significant gain in movement, around the 12th week. Patients were released for sports activities at 6 months, as long as the range of motion and strength were reestablished.
Study Outcome
The outcome of this study (also known as the target variable) was the difference between the preoperative and 24-month postoperative scores of the American Shoulder and Elbow Surgeons (ASES) standardized shoulder assessment form. 47 A cutoff score was calculated based on the established MCID of 15.2 points to separate success (higher than the cutoff) from failure (lower than the cutoff). 38
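The labeling rule described above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function name and the example patients are hypothetical, and because the text does not say how an improvement exactly equal to the MCID is classified, the sketch assumes it counts as success.

```python
MCID_ASES = 15.2  # MCID for the ASES score, as reported in the text


def did_not_reach_mcid(ases_pre: float, ases_post: float,
                       mcid: float = MCID_ASES) -> bool:
    """Return True when the ASES improvement falls short of the MCID (failure class)."""
    improvement = ases_post - ases_pre
    # Assumption: an improvement exactly equal to the MCID counts as success.
    return improvement < mcid


# Hypothetical patients: (preoperative ASES, 24-month postoperative ASES)
patients = [(40.0, 85.0), (70.0, 80.0), (90.0, 100.0)]
labels = [did_not_reach_mcid(pre, post) for pre, post in patients]
print(labels)
```

The third hypothetical patient improves to the maximum score yet is still labeled as a failure, which illustrates the ceiling effect for patients with high preoperative scores.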
Predictors
Predictors were selected according to already identified risk factors routinely collected at the institution. Demographic characteristics and comorbidity data were collected by a research assistant in the preoperative period using a questionnaire that collected information on patient characteristics, previous illnesses, and life habits, such as smoking.
Predictors with information on the type, location, and extent of shoulder injuries were obtained retrospectively through the analysis of the MRI files together with the radiological report by an attending orthopaedic surgeon (E.A.M.) with >10 years of experience who was blinded as to the surgical procedures and patients’ clinical outcomes. In all patients, the interval between the examination and surgery was <12 months.
The following variables were selected from the analyses: (1) supraspinatus tear evaluated according to tendon thickness (partial or full-thickness tear); (2) retraction (<3 or ≥3 cm); (3) extension (part of the tendon or the entire tendon affected); (4) tear of the anterior portion; (5) infraspinatus tear, categorized by tendon thickness (intact, partial tear, or full-thickness tear), retraction (<3 or ≥3 cm), and extension (intact, superior portion, or the entire tendon); (6) subscapularis tear, categorized as intact, partial tear of the upper third, full-thickness tear of the upper third, or tear involving the upper two-thirds or the entire tendon; (7) fatty degeneration of the supraspinatus and (8) of the infraspinatus and subscapularis muscles (according to Goutallier et al 17 and modified by Fuchs et al 13 ); (9) long head of the biceps tear (stable, partial tear, complete tear) and instability (topical, subluxated, dislocated, depending on its position in the biceps sulcus, or not applicable in cases of complete tear); and (10) the presence of arthrosis of the glenohumeral joint (absent or present). Symptomatic acromioclavicular arthrosis was defined as pain on local palpation with MRI findings, such as capsuloligamentous thickening and osteophytosis.
Statistical Analysis
We used a random sample of 70% of patients to train the algorithms, and 30% were left for performance assessment, simulating new data. Stratified cross-validation with 10 folds was used to train the models and adjust hyperparameters. The predictors were normalized by the z score. Missing outcome values at 24 months were handled with the last observation carried forward strategy, using the 12-month outcome data. Patients with missing data at both 24 and 12 months were excluded from the analysis.
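As a rough illustration of two of the preprocessing steps named above, the following Python sketch shows last observation carried forward for the outcome and z-score normalization of a predictor. All names and values are hypothetical. The sketch also assumes that the mean and SD are estimated on the training split only and that the population SD is used; the text does not specify either choice.

```python
from statistics import mean, pstdev
from typing import List, Optional, Tuple


def locf_outcome(ases_12m: Optional[float],
                 ases_24m: Optional[float]) -> Optional[float]:
    """Use the 24-month score when present; otherwise carry the 12-month score forward."""
    if ases_24m is not None:
        return ases_24m
    return ases_12m  # None when both are missing: the patient is excluded


def zscore_fit(train_values: List[float]) -> Tuple[float, float]:
    """Estimate mean and SD on the training data (assumption: population SD)."""
    mu, sd = mean(train_values), pstdev(train_values)
    if sd == 0:
        raise ValueError("constant predictor cannot be z-scored")
    return mu, sd


def zscore_apply(values: List[float], mu: float, sd: float) -> List[float]:
    """Standardize values with previously fitted parameters."""
    return [(v - mu) / sd for v in values]


# Hypothetical preoperative ASES scores
train = [50.0, 60.0, 70.0]
mu, sd = zscore_fit(train)
train_z = zscore_apply(train, mu, sd)
test_z = zscore_apply([80.0], mu, sd)  # test data reuse the training parameters
```

Fitting the scaling parameters on the training split alone keeps information from the held-out 30% out of model development.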
The borderline Synthetic Minority Oversampling Technique (SMOTE) was used to balance classes of the target variable. 19 Automated variable selection methods were not used because the predictors were chosen on a theoretical basis by experts; however, a multicollinearity test was performed, and any variable with a correlation >0.9 was removed. 6 Hyperparameter optimization of all algorithms, including logistic regression, was performed with the Optuna library 1 to maximize the AUC, with the tree-structured Parzen estimator 4 as the search algorithm and the asynchronous successive halving algorithm 32 as the early stopping algorithm. We applied the same preprocessing steps and feature selection techniques to all algorithms in the study to ensure a fair comparison. The AUC was used to evaluate the performance of the models. We also extracted the accuracy, precision, recall, and F1 score to evaluate the models. The 95% bootstrap confidence interval was calculated to assess the variability of these metrics. To interpret the final model, SHAP (SHapley Additive exPlanations) was used to understand the influence of the variables. 35
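To make the evaluation step concrete, the sketch below computes the AUC as the probability that a randomly chosen positive case (here, a patient who did not reach the MCID) receives a higher predicted score than a randomly chosen negative case, and then derives a bootstrap interval. The text does not state which bootstrap variant was used, so the percentile method, the number of resamples, and the toy data are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Rank-based AUC: share of positive/negative pairs ordered correctly (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true: Sequence[int], y_score: Sequence[float],
                     n_boot: int = 2000, alpha: float = 0.05,
                     seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for the AUC (assumed variant; not specified in the text)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contained a single class; draw again
        stats.append(auc(ys, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy example (hypothetical labels and predicted probabilities)
y = [0, 0, 1, 1]
p = [0.10, 0.40, 0.35, 0.80]
point = auc(y, p)
lower, upper = bootstrap_auc_ci(y, p)
```

With small test sets and an imbalanced outcome, intervals of this kind tend to be wide.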
After removing highly correlated variables (r > 0.9) and identifier variables (eg, patient number), 24 variables were selected to develop the predictive models. The algorithms used were as follows: random forest classifier, light gradient boosting machine (LightGBM), decision tree classifier, extra trees classifier, extreme gradient boosting (XGBoost), k-nearest neighbors (KNN) classifier, CatBoost classifier, and logistic regression.
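A correlation filter of the kind described can be sketched as follows. The text gives only the 0.9 threshold, so several details here are assumptions: Pearson correlation is used, the absolute value of r is compared with the threshold, and when two predictors are highly correlated the one listed first is kept. The column names and values are hypothetical.

```python
from math import sqrt
from statistics import mean
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (assumes neither input is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def drop_highly_correlated(columns: Dict[str, List[float]],
                           threshold: float = 0.9) -> List[str]:
    """Keep a column only if |r| with every already kept column is <= threshold."""
    kept: List[str] = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical predictors; age_months duplicates the information in age_years
data = {
    "age_years": [50.0, 60.0, 70.0, 80.0],
    "age_months": [600.0, 720.0, 840.0, 960.0],
    "ases_pre": [40.0, 55.0, 35.0, 60.0],
}
selected = drop_highly_correlated(data)
print(selected)
```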
Results
During the period evaluated, 651 surgeries were performed for rotator cuff repair. The following were excluded: 84 open procedures, 10 debridement surgeries, 26 cases with previous shoulder surgery, 12 patients without postoperative clinical evaluation, and 19 patients with incomplete pre- or perioperative evaluation data. The analyzed sample consisted of 474 patients (n = 500 shoulders). Data imputation from outcomes at 12 months was necessary in the functional assessment in 76 cases (15.2%) because of missing data at 24 months. Therefore, 474 patients were included in the study and further divided into train and test datasets. The preoperative predictors and their summarized values can be seen in Tables 1 to 3. As the target variable was highly imbalanced (ie, 17.2% of participants did not reach the MCID at 24 months after surgery), it was further corrected with the borderline SMOTE technique. 19
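The sample flow reported above can be checked with simple arithmetic. The short sketch below assumes that every exclusion count refers to one surgery (shoulder), which the text implies but does not state outright, and that the 70% training share is applied to shoulders.

```python
performed = 651  # rotator cuff repairs during the study period

# Exclusion counts as reported in the text
excluded = {
    "open procedures": 84,
    "debridement only": 10,
    "previous shoulder surgery": 26,
    "no postoperative clinical evaluation": 12,
    "incomplete pre- or perioperative data": 19,
}

analyzed_shoulders = performed - sum(excluded.values())
imputed_share = 76 / analyzed_shoulders        # outcomes carried forward from 12 months
train_shoulders = analyzed_shoulders * 70 // 100  # 70% training split

print(analyzed_shoulders, round(imputed_share * 100, 1), train_shoulders)
```

Under these assumptions the counts reproduce the 500 analyzed shoulders and the 15.2% share of imputed outcomes, and a 70% split yields 350 training cases.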
Table 1. Demographic and Comorbidity Predictors of Patients Included in the Models (N = 474)
Data are presented as mean ± SD or frequency (%). ASES, American Shoulder and Elbow Surgeons.
Table 2. Supraspinatus and Infraspinatus MRI Predictors Included in the Model
Data are presented as frequency (%). MRI, magnetic resonance imaging.
Table 3. Subscapularis, Long Head of the Biceps, and Glenohumeral Joint MRI Predictors Included in the Model
Data are presented as frequency (%). MRI, magnetic resonance imaging.
The most common models for prediction with structured data were then fitted. Table 4 presents the performance and variability (ie, 95% bootstrap CIs) of the models in the test dataset, ordered by the AUC metric. The model with the best AUC metric was the Random Forest Classifier (0.68 [95% CI, 0.48-0.79]). An interface with the trained random forest classifier algorithm deployed was developed for clinical application using Streamlit (https://bit.ly/rotatorcuffsurgeryAI).
Table 4. Performance Measures of the Prediction Models
Data in parentheses are 95% bootstrap CIs. AUC, area under the receiver operating characteristic curve; KNN, k-nearest neighbors; LightGBM, light gradient-boosting machine; XGBoost, extreme gradient boosting.
The SHAP was used to interpret the relationship of preoperative predictors with the model outcome that presented the best performance (Figure 1). The most relevant predictor was the extent of supraspinatus affected, in which patients with complete supraspinatus involvement were more likely to not reach the MCID 24 months after surgery. The second most relevant predictor was the preoperative ASES score, in which the higher the preoperative score, the greater the chance of not reaching the MCID 24 months after surgery. The third most relevant predictor was the presence of Goutallier grade ≥2 fatty infiltration of the supraspinatus tendon, in which patients without it were more likely not to reach the MCID. The fourth most relevant predictor was age, in which younger patients were more likely to not reach the MCID. The interpretation of the other models can be found in the Supplemental Material, available separately.

Figure 1. SHAP values of the random forest classifier model. This figure provides other relevant information for model interpretation: (1) the predictors are ordered from top to bottom according to their relevance; (2) the farther to the right the points of a variable are, the greater the influence of the variable in predicting the outcome (ie, not reaching the MCID); and (3) the redder the point, the higher the predictor value, and the bluer the point, the lower the predictor value. ASES, American Shoulder and Elbow Surgeons; MCID, minimal clinically important difference.
Discussion
We found that machine learning algorithms could predict the nonachievement of the disability MCID 2 years after a rotator cuff repair surgery. Still, their performance was lower than that of other machine learning studies in the orthopaedic surgery literature. 26 The mean AUC of all algorithms was 0.62 in the test set, of which the Random Forest Classifier and LightGBM presented the highest measures (0.68 and 0.67, respectively). In addition, the mean recall and precision were relatively low (0.41 and 0.25, respectively).
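For readers less familiar with these metrics, the sketch below shows how accuracy, precision, recall, and the F1 score follow from a confusion matrix in which the positive class is nonachievement of the MCID. The counts are hypothetical and are not taken from the study; they were chosen only to show how accuracy can look acceptable while precision and recall stay low when the positive class is rare.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)  # share of predicted failures that were real failures
    recall = tp / (tp + fn)     # share of real failures that the model flagged
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical test set of 150 shoulders with 25 true failures
metrics = classification_metrics(tp=10, fp=30, fn=15, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```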
A recent systematic review has identified 18 studies that used machine learning algorithms to predict MCIDs after several orthopaedic surgeries, of which 7 concerned outcomes after spine surgery, 6 after sports medicine procedures, 3 after total joint arthroplasty (hip and knee), and 2 after shoulder arthroplasty. 26 Most studies performed well in predicting the MCID, with AUCs ranging from 0.7 to 0.9.
Although we were not able to find other studies that developed machine learning algorithms to predict outcomes after rotator cuff repair surgery, 2 studies used machine learning to predict MCID achievement after primary shoulder arthroplasty. In both studies, the patients were separated into cohorts undergoing anatomic shoulder arthroplasty and reverse arthroplasty. Kumar et al24,25 used the XGBoost classifier to accurately predict MCID achievement, with AUCs between 0.7 and 0.98.
While the random forest classifier model was 8 points ahead in the AUC compared with the logistic regression (a model commonly used in health care for predictions), the more complex gradient boosting models (ie, LightGBM, XGBoost, CatBoost) did not outperform random forest. One possible explanation is that these models perform better with massive quantities of data, and the present study used relatively few variables and patients to train the algorithms. Additionally, the KNN and CatBoost classifier algorithms did not outperform logistic regression.
Therefore, although the development of accurate prognostic models for orthopaedic surgeries using machine learning algorithms is undoubtedly a natural next step, given that artificial intelligence has shown promising results in predicting outcomes in various scientific fields, including health care, better than classical statistical methods,21,44,48 the overall performance of the algorithms developed in this study was moderate.
The mean low recall and precision values as well as the inability to achieve higher AUC values in our models may be related to a few limitations of this study that should be noted. The analyses conducted during this study were not planned. Therefore, some relevant preoperative predictors of outcomes after rotator cuff repair surgery may not have been included in the analysis. While several studies suggest that the presence of systemic diseases (eg, diabetes) and the severity of rotator cuff tear or injury are relevant prognostic factors for rotator cuff repair outcomes,11,16,18,28,31,43 evidence suggests that socioeconomic and psychosocial factors might also provide relevant information about rotator cuff repair outcomes. 50
The use of socioeconomic and psychosocial variables as predictors of outcomes after surgeries is supported by studies identifying that these variables are relevant prognostic factors for failed back surgery syndrome and chronic postsurgical pain in several conditions.3,8,15,23,29,41 Finally, although the sample size used in this study is, to our knowledge, one of the largest analyzing clinical outcomes in the rotator cuff repair literature, it is important to note that training our machine learning models with a dataset of only 350 patients might limit their utility. Machine learning models typically benefit from larger training datasets, and their performance would likely improve significantly with an increase in training data in the future. Therefore, future studies should also use larger sample sizes and preoperative measures of socioeconomic status and psychosocial factors as potential predictors for this population to develop better prediction models. Moreover, the strategy for missing outcome data (ie, using the 12-month assessment when the 24-month assessment was not available) might have affected the analysis, despite the mean ASES scores at 12 and 24 months being relatively close.
It is important to note that the MCID of 15.2 points used for the ASES scale in our study was based on the mean found among 3 methods to determine the MCID in a study by Malavolta et al 38 (ie, distribution, anchor, and minimum detectable change). Those authors observed values ranging from 6.1 (anchor method) to 26.3 (minimum detectable change). Therefore, different MCID values can influence the predictors’ weight in further models. Additionally, patients with preoperative ASES scores >84.8 (ie, 2% of our dataset) cannot achieve the MCID used in this study because it is higher than their maximum possible improvement. This limitation should also be considered when interpreting the results.
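The ceiling effect mentioned above follows directly from the scale limits: the ASES score has a maximum of 100 points, so the largest possible improvement is 100 minus the preoperative score. A minimal sketch, with hypothetical function and variable names:

```python
ASES_MAX = 100.0  # upper bound of the ASES score
MCID_ASES = 15.2  # MCID used in this study

# Highest preoperative score from which the MCID is still reachable
ceiling_threshold = ASES_MAX - MCID_ASES


def can_reach_mcid(ases_pre: float, mcid: float = MCID_ASES) -> bool:
    """True when the maximum possible improvement is at least the MCID."""
    return (ASES_MAX - ases_pre) >= mcid


print(round(ceiling_threshold, 1), can_reach_mcid(90.0), can_reach_mcid(60.0))
```

Patients above this threshold are labeled as failures regardless of their actual result, which is one reason a high preoperative ASES score is expected to push predictions toward nonachievement.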
Furthermore, the structural outcome of the rotator cuff repair was not evaluated in the present study since it is not part of our routine to perform postoperative MRIs to assess tendon healing. Although the integrity of rotator cuff tendons does not seem to be directly associated with functional outcomes,2,10 future studies that develop models to predict retears after surgery might also support clinicians during their decision-making.
Clinical Implications
Supervised machine learning algorithms are primarily designed to learn hidden patterns in available data to maximize outcome prediction, rather than to explain causal relationships between a predictor and the outcome. This is because they adjust the weight of each variable based on the hyperparameters set and treat each category of categorical variables as a variable by itself, which makes it difficult to interpret the individual role of each predictor variable. This is also one of the reasons why machine learning algorithms may outperform conventional statistical methods and linear thinking. Nevertheless, the results of this study suggest that the use of machine learning algorithms might be a promising new tool that can assist clinicians during clinical decision-making to decide when to prescribe surgical treatment or continue using nonoperative approaches to treat patients with rotator cuff tears.
Conclusion
We found that machine learning algorithms demonstrated some ability to predict the nonachievement of the disability MCID 2 years after a rotator cuff repair surgery. Still, their performance was lower than that of other machine learning studies in the literature on orthopaedic surgeries.
Supplemental Material
Supplemental material, sj-pdf-1-ojs-10.1177_23259671231206180 for Using Machine Learning to Predict Nonachievement of Clinically Significant Outcomes After Rotator Cuff Repair by Rafael Krasic Alaiti, Caio Sain Vallio, Jorge Henrique Assunção, Fernando Brandão de Andrade e Silva, Mauro Emilio Conforto Gracitelli, Arnaldo Amado Ferreira Neto and Eduardo Angeli Malavolta in Orthopaedic Journal of Sports Medicine
Footnotes
Final revision submitted May 17, 2023; accepted May 30, 2023.
The authors have declared that there are no conflicts of interest in the authorship and publication of this contribution. AOSSM checks author disclosures against the Open Payments Database (OPD). AOSSM has not conducted an independent investigation on the OPD and disclaims any liability or responsibility relating thereto.
Ethical approval for this study was obtained from the Clinical Hospital of the Medical School of the University of Sao Paulo, Sao Paulo, Brazil (protocol No. 2.778.930).
