Abstract
Background:
Clinical decision-making often relies on evidence-based medicine, derived from objective data with conventional and rigorous statistical tests to evaluate significance. The literature surrounding rehabilitation after rotator cuff repair (RCR) is conflicting, with no defined standard of practice.
Purpose:
To determine the fragility index (FI) and the fragility quotient (FQ) of randomized controlled trials (RCTs) evaluating rehabilitation protocols after RCR.
Study Design:
Systematic review.
Methods:
A systematic review was performed according to PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines by searching the PubMed, Cochrane Library, and Embase databases for RCTs evaluating rehabilitation protocols after arthroscopic RCRs from 2000 to June 1, 2022. The FI was determined by manipulating the dichotomous outcome events of each article in 2 × 2 contingency tables until significance was reversed. The FQ was determined by dividing the FI by the sample size.
Results:
Fourteen RCTs with 48 dichotomous outcomes were ultimately included for analysis. The mean FI for the included dichotomous outcomes was 4 (interquartile range, 3-6), suggesting that, on average, changing the outcome of 4 patients would reverse the significance of a result. The mean FQ was 0.048. Of the outcomes with available data regarding loss to follow-up, most (58.5%) had >4 patients lost to follow-up.
Conclusion:
The results of RCTs of RCR rehabilitation protocols are moderately fragile, which clinicians should keep in mind when implementing study results in practice. We recommend reporting FI and FQ in addition to standard P values in future RCTs with dichotomous outcome variables on this topic.
Clinical decision-making in orthopaedic surgery is usually based on the current literature, with randomized controlled trials (RCTs) that compare ≥2 interventions and evaluate a series of continuous and categorical outcomes providing the highest level of evidence. The ideal rehabilitation protocol after rotator cuff repair (RCR) remains controversial among orthopaedic surgeons. 11,20 Large RCTs have sought to define a standard of care, with most studies comparing prolonged immobilization with early range of motion (ROM) protocols. 25 Systematic reviews and meta-analyses on rehabilitation after RCR have reached conflicting conclusions, suggesting that the optimal postoperative protocol remains unknown. 23-25 These reviews are limited by the poor quality of the included studies, largely because of small sample sizes, and by heterogeneity among studies, both of which can weaken their final conclusions. Despite these limitations, both RCTs and systematic reviews on rehabilitation after RCR often influence surgeons’ practices. Therefore, the robustness of their conclusions, or lack thereof, should be scrutinized more closely and reported transparently to help surgeons practice evidence-based medicine.
The P value is an important metric, along with others such as effect size, that RCTs use to test significance and justify their conclusions. Most often, the α level, the accepted probability of rejecting a null hypothesis that is actually true (a type I error), is set at .05, so that statistical significance is defined as P < .05. Although statistical tests are essential for drawing conclusions from a study, relying on P values alone to ascribe significance may not optimize statistical rigor. 32 Because the .05 threshold is otherwise arbitrary, changing only 1 or 2 events can sometimes reverse the significance of an outcome. 2,17,34 The fragility index (FI), a relatively new concept developed by Feinstein 8 in 1990, characterizes the stability (or fragility) of a given dichotomous outcome. The AAOS guidelines suggested that an FI of >2 be considered statistically robust. 9 The FI is calculated by changing the outcomes of individual patients (from event to nonevent, or vice versa) until significance is reversed; the number of changes required is the FI. A low FI signifies that an outcome is statistically fragile, because minimal manipulation of outcome events would reverse its significance.
Because the FI does not account for sample size, the fragility quotient (FQ) was subsequently developed to mitigate this shortcoming. 36 The FQ is calculated by dividing the FI by the total sample size. Together, the FI and FQ can augment the statistical reporting of RCTs and better characterize the statistical stability of each outcome. Several studies have commented on the fragility of the literature on shoulder surgery and RCR. 27-29 However, none of the currently published RCTs evaluating rehabilitation after RCR includes a fragility analysis (either FI or FQ). The absence of fragility analysis on this controversial topic limits the confidence surgeons can have in the robustness of these studies’ conclusions and in whether to implement their recommendations in practice.
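As a worked illustration using hypothetical numbers (not drawn from any included trial), the FQ expresses the FI relative to the total sample size N:

$$
\mathrm{FQ} = \frac{\mathrm{FI}}{N}, \qquad \text{e.g.}\quad \frac{4}{100} = 0.04 \quad \text{versus} \quad \frac{4}{400} = 0.01 .
$$

The same FI of 4 therefore corresponds to 4% of patients in a 100-patient trial but only 1% of patients in a 400-patient trial, which is the size dependence the FQ is meant to capture.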
The purpose of this study was to analyze dichotomous outcomes in RCTs evaluating rehabilitation after RCR to determine the FI and the FQ of these trials. Our hypothesis was that the conclusions drawn regarding rehabilitation after RCR would be statistically fragile and support inclusion of FI and FQ in future RCTs on this topic.
Methods
Search Strategy
This systematic review was performed according to PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. Three databases (Embase, Cochrane Library, and PubMed) were searched by 2 reviewers (S.B.S. and M.A.W.) up to June 1, 2022, using the search string (“rehabilitation” OR “immobilization”) AND “rotator cuff repair.”
Eligibility Criteria
To meet inclusion criteria, articles had to be comparative RCTs that evaluated outcomes of a rehabilitation intervention after RCR. They also had to evaluate at least 1 dichotomous outcome variable and report a P value for that variable. Studies were excluded if they were published in a language other than English or if the full text was not available. Data were extracted from each study independently by 2 reviewers (S.B.S. and M.A.W.), and discrepancies were reconciled by a third reviewer (A.M.M.).
The primary outcomes of our study were the mean FI and FQ across all dichotomous outcome variables reported as statistically significant in the original studies. The secondary outcomes were the FI and FQ examined separately for significant and nonsignificant outcome variables.
Statistical Analysis
The FI and FQ were calculated for all dichotomous outcome variables in the included RCTs, both significant and nonsignificant. For each outcome, events and nonevents in the 2 treatment arms were recorded in a 2 × 2 contingency table. The original P value was recorded, and the Fisher exact test was used to verify the accuracy of the reported P value. Outcome events were then iteratively manipulated until significance was reversed, that is, until a significant outcome (P < .05) became nonsignificant or a nonsignificant outcome became significant. The number of events changed to achieve this reversal was recorded as the FI; the FI of every dichotomous outcome was calculated in an identical manner. The FQ was calculated by dividing each FI by the total sample size. Means and interquartile ranges (IQRs; 25th to 75th percentiles) of the FI and FQ were computed across outcomes to describe the variability in statistical fragility.
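The procedure above can be sketched in Python with only the standard library. This is a minimal sketch under stated assumptions, not the authors' code. The two-sided Fisher exact P value sums the probabilities of all tables with the same margins that are no more likely than the observed table (the convention documented for SciPy's `fisher_exact`). Because the text does not specify which arm was manipulated, the hypothetical `fragility_index` helper searches for the smallest total number of patients, in either arm, whose outcome must be switched for the P value to cross α = .05, with arm sizes held fixed; a procedure that manipulates only one arm could return a slightly larger FI. The example trials are hypothetical.

```python
from math import comb
from typing import Optional

ALPHA = 0.05  # conventional significance threshold (two-sided)


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact P value for the 2 x 2 table
    [[a, b], [c, d]] = [[events arm 1, nonevents arm 1],
                        [events arm 2, nonevents arm 2]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    n1, n2 = a + b, c + d          # arm sizes (fixed row margins)
    m = a + c                      # total events (fixed column margin)
    total = comb(n1 + n2, m)

    def prob(x: int) -> float:     # P(arm 1 has x of the m events)
        return comb(n1, x) * comb(n2, m - x) / total

    p_obs = prob(a)
    support = range(max(0, m - n2), min(n1, m) + 1)
    # small relative tolerance so floating-point ties count as "as extreme"
    p = sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def fragility_index(e1: int, n1: int, e2: int, n2: int,
                    alpha: float = ALPHA) -> Optional[int]:
    """Hypothetical helper: smallest number of patients whose outcome must be
    switched (event <-> nonevent, in either arm, arm sizes fixed) for the
    P value to cross alpha. Works for significant and nonsignificant results.
    Returns None if no reversal is possible."""
    significant = fisher_two_sided(e1, n1 - e1, e2, n2 - e2) < alpha
    for k in range(1, n1 + n2 + 1):          # total patients switched
        for d1 in range(-k, k + 1):          # change in arm-1 event count
            rest = k - abs(d1)
            for d2 in {rest, -rest}:         # change in arm-2 event count
                x1, x2 = e1 + d1, e2 + d2
                if not (0 <= x1 <= n1 and 0 <= x2 <= n2):
                    continue                 # event count out of range
                p = fisher_two_sided(x1, n1 - x1, x2, n2 - x2)
                if (p < alpha) != significant:
                    return k
    return None


def fragility_quotient(fi: int, total_n: int) -> float:
    """FQ = FI divided by the total sample size."""
    return fi / total_n


# Hypothetical trials with 5 patients per arm (not data from the review)
fi_sig = fragility_index(0, 5, 5, 5)      # 0/5 vs 5/5 events, P < .05
fi_nonsig = fragility_index(2, 5, 3, 5)   # 2/5 vs 3/5 events, P ≈ 1
print(fi_sig, fragility_quotient(fi_sig, 10))   # 2 0.2
print(fi_nonsig)                                # 3
```

The exhaustive search is cheap for trial-sized tables and guarantees the smallest number of switched outcomes under this definition; a greedy one-arm loop is closer to some published descriptions of the FI but may not find the global minimum.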
Loss to follow-up was also evaluated for all studies by comparing, for each outcome, the number of patients initially enrolled with the number analyzed. For example, if 450 patients were enrolled in a study but only 437 returned for postoperative imaging to assess for a cuff retear, the number lost to follow-up for retear was documented as 13. If, however, 445 returned for their first postoperative visit and were evaluated for stiffness, the number lost to follow-up for stiffness was documented as 5.
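The loss-to-follow-up bookkeeping above, and its comparison with a fragility threshold as done in the Results and Discussion, can be sketched as follows. The helper names are hypothetical, and using a fixed threshold of 4 (the mean FI in this study) rather than each outcome's own FI is an assumption made to mirror the comparison reported here. The counts reproduce the worked example in the text.

```python
def lost_to_follow_up(enrolled: int, analyzed: int) -> int:
    """Patients enrolled but not analyzed for a given outcome."""
    if not 0 <= analyzed <= enrolled:
        raise ValueError("analyzed must be between 0 and enrolled")
    return enrolled - analyzed


def loss_exceeds(enrolled: int, analyzed: int, threshold: int) -> bool:
    """True if more patients were lost than the fragility threshold, i.e.,
    enough outcomes are missing that, had they gone the other way, the
    number of changed outcomes could exceed the threshold."""
    return lost_to_follow_up(enrolled, analyzed) > threshold


# Worked example from the text: 450 patients enrolled
print(lost_to_follow_up(450, 437))            # imaging for retear -> 13
print(lost_to_follow_up(450, 445))            # stiffness at first visit -> 5
print(loss_exceeds(450, 445, threshold=4))    # 5 > 4 -> True

# Share of outcomes with >4 patients lost, using the counts in the Results
print(round(100 * 24 / 41, 1))                # 58.5
```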
Results
Descriptive Summary of Included RCTs
Of 692 RCTs identified in the PRISMA search, 14 RCTs 3,5,6,10,12,15,16,18,21,22,26,30,31,35 met all inclusion criteria and were included in the statistical fragility analysis (Figure 1). The characteristics of the included studies are shown in Appendix Table A1. Of these RCTs, 9 5,10,15,16,18,22,30,31,35 (64.3%) were classified as level 1 evidence and 5 3,6,12,21,26 (35.7%) as level 2 evidence. Eleven studies 3,6,10,12,15,16,21,22,26,30,35 (78.6%) performed an a priori power analysis, 2 studies 18,31 (14.3%) provided no information regarding power analysis, and 1 study 5 (7.1%) conducted a post hoc power analysis. All included studies with a power analysis were found to be adequately powered. The mean sample size of the included RCTs was 100.6 ± 35.5 patients. A total of 48 dichotomous outcomes from the included articles were evaluated, of which 7 were initially reported as statistically significant and 41 as nonsignificant. Eighteen (37.5%) of these dichotomous outcomes were primary outcomes. Retear rate at various time points was the most common dichotomous outcome across the 14 RCTs (n = 16; 33.3%), with some studies reporting multiple retear outcomes. Other commonly evaluated dichotomous outcomes included complication rates and progression to different stages of rotator cuff degeneration, as measured by classification systems such as the one by Sugaya et al. 33

Figure 1. PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow diagram for the systematic review of rehabilitation after rotator cuff repair (RCR).
Basic FI and FQ Analysis
The mean FI of the 48 dichotomous outcomes was 4 (IQR, 3-6), and the mean FQ was 0.048 (IQR, 0.032-0.063). For the 7 significant dichotomous outcomes, the mean FI was 2 (IQR, 1-3) and the mean FQ was 0.036 (IQR, 0.025-0.050). For the 41 nonsignificant dichotomous outcomes, the mean FI was 5 (IQR, 4-6) and the mean FQ was 0.05 (IQR, 0.034-0.067). For outcomes involving retear, the mean FI was 5 and the mean FQ was 0.02 (Table 1).
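The summaries above pair a mean with an IQR. The sketch below shows how such a summary can be computed with Python's standard `statistics` module; the FI values are hypothetical and are not the study data. Quartile conventions differ between software packages (this uses the module's default `'exclusive'` method), so IQR bounds from other tools may differ slightly for the same values. The median is printed for comparison.

```python
import statistics

# Hypothetical FI values for eight outcomes (illustrative only, not study data)
fi_values = [1, 2, 3, 3, 4, 5, 6, 8]

mean_fi = statistics.mean(fi_values)
median_fi = statistics.median(fi_values)
# quantiles(n=4) returns the 25th, 50th, and 75th percentile cut points
q1, _, q3 = statistics.quantiles(fi_values, n=4)

print(f"mean FI = {mean_fi}, median FI = {median_fi}, IQR = {q1}-{q3}")
```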
Table 1. The FI and FQ for All Analyzed Outcomes (n = 48) From the 14 Included Randomized Controlled Trials a
a FI, fragility index; FQ, fragility quotient.
Analysis of Loss to Follow-up
Of the 48 outcomes across the 14 RCTs, 7 outcomes in 3 studies 3,5,35 lacked data regarding loss to follow-up. These outcomes were bursitis, 35 echogenicity at 6 weeks and 3 months, 35 and 4 retear outcomes. 5,30,35 Of the 41 outcomes with data regarding loss to follow-up, 24 (58.5%) had more than 4 patients lost to follow-up.
Discussion
This study found that the conclusions of RCTs evaluating rehabilitation after RCR were moderately fragile, with a mean FI of 4 (IQR, 3-6) and a mean FQ of 0.048. This finding suggests that, on average, changing the outcome of only 4 patients would reverse the significance of the dichotomous outcomes included in this study. Furthermore, for 58.5% of outcomes, more than 4 patients were lost to follow-up, suggesting that had follow-up been complete, the conclusions drawn in these RCTs could have been different. Knowing the fragility of a study may influence clinicians’ willingness to adopt its recommendations into practice. Future statistical reporting on this subject should include FI and FQ in addition to P values to give clinicians a more complete picture of the robustness of the data and aid in clinical decision-making.
Rehabilitation protocols after surgical RCR have been studied extensively, though the data are conflicting. 23,25 A recent systematic review of early versus delayed rehabilitation after surgical RCR concluded that although the 2 groups did not differ in most clinical outcomes or in retear rates, the early rehabilitation group had superior ROM compared with the delayed rehabilitation group. 25 Another systematic review of 16 level 1 to 2 studies also found no difference in functional outcomes or retear rates between early and delayed ROM, although external rotation was better in the early ROM cohort. 23 A third recent systematic review of postoperative rehabilitation protocols found a possible benefit of better functional outcomes at the risk of increased retear rates, conflicting with the aforementioned reviews. 4 These reviews and meta-analyses are limited by the quality of the data in the primary studies and by the heterogeneity of the included studies. None of the previous studies, reviews, or meta-analyses assessed the statistical fragility of their results, although the conflicting nature of the data and the lack of consistent conclusions may suggest that the data lack robustness.
While the fragility of studies on rehabilitation after RCR has not been previously assessed, our finding that studies on this topic are statistically fragile is consistent with reviews of statistical fragility in other areas of the orthopaedic literature. Khan et al 13 evaluated statistical fragility in the orthopaedic sports medicine literature and found that, over a 10-year period, the mean FI of study outcomes was 2. A more recent study of the statistical fragility of the orthopaedic sports literature reported an FI of 5. 28 Parisien et al 27 found that conclusions regarding the efficacy of platelet-rich plasma were statistically fragile, with a mean FI of 4 and FQ of 0.092. In that analysis, they also found that for about one-third of outcomes, the number of patients lost to follow-up exceeded the FI, suggesting that had better follow-up been maintained, statistical significance and conclusions may have been reversed, assuming the outcomes of the patients lost to follow-up trended in the opposite direction from those of the patients who were evaluated. Studies outside of orthopaedic surgery, including those in gynecologic surgery and cardiovascular research, have also found similarly low FI and FQ values, suggesting that poor rigor in statistical reporting is not unique to orthopaedic surgery. 7,29 A study of the journals with the highest impact factors, including the New England Journal of Medicine and Lancet, found that study conclusions were less fragile than those in other journals but still statistically fragile. 14 Despite our focus on higher-impact journals and more recent literature, our study still found that the literature comparing rotator cuff rehabilitation protocols is moderately fragile.
To our knowledge, this study is the first to demonstrate the fragility of the evidence for a specific and important practice among shoulder and elbow surgeons. Examining the literature on rehabilitation after RCR in this manner allows the previously published RCTs included here to be interpreted more critically. FI and FQ add information beyond the published P values that can help clinicians judge the robustness of study results and decide whether conclusions should be incorporated into clinical practice.
Based on the moderate statistical fragility found in this analysis, we recommend that future RCTs examining rehabilitation after RCR tailor their study design and statistical analysis to incorporate FI and FQ. As suggested by a previous fragility study, RCTs with larger sample sizes and greater power tend to produce higher FI and FQ values, optimizing their statistical rigor and strengthening their conclusions. 1 We posit that consistent reporting of FI and FQ alongside P values, together with larger sample sizes and greater power in future RCTs, will help address previous deficiencies in the literature and establish a gold standard for rehabilitation after arthroscopic RCR. FI and FQ give physicians who review the literature another way to critically examine the significance of findings and gauge the clinical relevance of each trial to patient care. Although the results of this analysis relate to rehabilitation after RCR, the concepts of FI and FQ can and should be applied broadly to other areas of the orthopaedic surgery literature to enhance the critical examination of RCT findings and best inform future clinical practice.
Limitations
Although, to our knowledge, this study is the first to evaluate the statistical fragility of conclusions regarding rehabilitation after RCR, it has limitations. First, the intentional inclusion of only high-impact orthopaedic and physical medicine and rehabilitation journals may have excluded RCTs that would otherwise have met inclusion criteria. In addition, the FI has intrinsic limitations. It is a stand-alone value with no universally accepted threshold for fragility or stability, and it does not take the study’s sample size into account. The FQ was introduced to mitigate some of these limitations, but it likewise lacks an established threshold for fragility. Also, this study did not evaluate basic patient demographic factors that could influence rehabilitation outcomes after RCR. Finally, only dichotomous outcomes were included in the fragility analysis; the inability to assess the fragility of continuous outcome variables limits the generalizability of the study findings.
Conclusion
The results of RCTs of RCR rehabilitation protocols are moderately fragile, which clinicians should keep in mind when implementing study results in practice. We recommend reporting FI and FQ in addition to standard P values in future RCTs on this topic.
Footnotes
Notes
Final revision submitted February 20, 2023; accepted March 2, 2023.
One or more of the authors has declared the following potential conflict of interest or source of funding: M.A.W. has received education payments from Arthrex, Elite Orthopedics, Smith & Nephew, and Supreme Orthopedic Systems. A.M.M. has received education payments from Supreme Orthopedic Systems; consulting fees from Catalyst OrthoScience, DePuy/Medical Device Business Services, Globus Medical, Ignite Orthopedics, Stryker, and Zimmer Biomet; nonconsulting fees from Globus Medical; royalties from DePuy, Ignite Orthopedics, and Globus Medical; honoraria from Wright Medical; and has an investment interest in Ignite Orthopedics. AOSSM checks author disclosures against the Open Payments Database (OPD). AOSSM has not conducted an independent investigation on the OPD and disclaims any liability or responsibility relating thereto.
Appendix
Table A1. Details of the Included Studies a
| Lead Author (Year) | LOE | Sample Size | Dichotomous Outcomes Included in Analysis |
|---|---|---|---|
| Lee (2012) 21 | 2 | 74 | Retear rates |
| Chou (2015) 5 | 1 | 24 | Small to medium retears, large retears |
| Jenssen (2018) 10 | 1 | 120 | Goutallier 0, Goutallier 1, Goutallier 2, Thomazeau 1 (early), Thomazeau 2 (early), Thomazeau 1 (late), Thomazeau 2 (late), Sugaya 2, Sugaya 3, Nonhealed rotator cuff |
| Keener (2014) 12 | 2 | 114 | Retear rates |
| Kim (2012) 15 | 1 | 105 | Retear rates |
| Kjær (2021) 16 | 1 | 82 | Retear rates |
| Koh (2014) 18 | 1 | 100 | Supraspinatus atrophy, stiffness, full-thickness retear, Sugaya 1, Sugaya 2, Sugaya 3 |
| Littlewood (2021) 22 | 1 | 73 | Retear rate, nonserious adverse event, serious adverse event |
| Mazzocca (2017) 26 | 2 | 73 | Failure |
| Sheps (2015) 30 | 1 | 189 | Retear rate, complication rate |
| Sheps (2019) 31 | 1 | 206 | Full-thickness tear, infraspinatus tears, infraspinatus atrophy (early), infraspinatus atrophy (late), reoperation rate, complication rate, persistent pain |
| Tirefort (2019) 35 | 1 | 80 | Bursitis, echogenicity (early), echogenicity (late), retear |
| Arndt (2012) 3 | 2 | 100 | External rotation <20°, external rotation >30°, adhesive capsulitis, nonintact cuff, recurrent tear, complete healing |
| Cuff (2012) 6 | 2 | 68 | Retear rates |
a LOE, level of evidence.
