Abstract
Background:
A P value of <.05 is often used to denote statistical significance; however, in many scenarios, this threshold is vulnerable to a small number of outcome reversals. This study joins a body of studies within the orthopaedic literature that evaluate the statistical fragility of existing research via metrics such as fragility index (FI) and fragility quotient (FQ).
Purpose/Hypothesis:
Given the resurgent interest in lateral extra-articular tenodesis (LET) to augment primary or revision anterior cruciate ligament reconstruction (ACLR), the purpose of this study was to investigate the statistical fragility of randomized controlled trials (RCTs) and comparative studies on the topic. It was hypothesized that the outcomes reported in these studies would be statistically fragile.
Study Design:
Systematic review; Level of evidence, 4.
Methods:
Comparative studies and RCTs regarding LET as an adjunct procedure to ACLR published between 2000 and 2022 were analyzed. Descriptive characteristics, dichotomous outcomes, and continuous outcomes were extracted. The FI and continuous FI (CFI) were calculated as the number of event reversals required to change significance; the FQ and continuous FQ (CFQ) were calculated by dividing each index by the sample size to normalize the fragility metrics.
Results:
Of 455 studies screened, 29 studies were included (9 RCTs, 20 comparative); 79.3% of included studies were published after 2020. A total of 48 dichotomous and 265 continuous outcomes were analyzed. The median FI was 9.0 (IQR, 7.0-13.3), with FQ of 0.1 (IQR, 0.04-0.17); the median CFI was 7.8 (IQR, 4.2-19.6), with CFQ of 0.12 (IQR, 0.08-0.19). The FQ and CFQ for studies on LET with revision ACLR were larger (0.117 and 0.113, respectively) than those focused on primary ACLR (0.042 and 0.095, respectively).
Conclusion:
Studies focused on LET with primary ACLR were more fragile than those on LET with revision, which suggests that further research on the indications for LET with primary ACLR is necessary. Future orthopaedic comparative research should include fragility metrics alongside traditional P values.
Injury to the anterior cruciate ligament (ACL) is one of the most common sports-related injuries, affecting more than 200,000 patients in the United States alone per year.21,24 For the past 2 decades, ACL reconstruction (ACLR) has been considered the gold standard surgical treatment for ACL injuries in patients with symptomatic instability who desire a return to cutting or pivoting activities.46,48 Extensive research has been devoted to perfecting ACLR technique and producing favorable outcomes.6,16,23,49 Despite these advances, primary graft failure rates have remained unacceptably high, between 2.8% and 30%.7,32,50
The ACL is responsible for preventing anterior translation of the tibia and providing rotational stability to the knee joint. While the former is often restored via ACLR alone, there has been biomechanical evidence of residual rotational laxity after ACLR.18 Post-ACLR rotational instability, called anterolateral rotational instability, can impede functional recovery and graft survival, especially in patients returning to high-level athletics.18,25 ACLR may be augmented by a lateral extra-articular procedure to improve rotational stability and reduce risk of reinjury. The most common lateral extra-articular tenodesis (LET) technique is the nonanatomic modified Lemaire procedure, which involves fixation of a strip of the iliotibial band to the lateral femoral epicondyle. Before the intra-articular ACLR was popularized, LET was performed in isolation; now it has re-entered the conversation to augment primary or revision ACLR to decrease the substantial risk of failure.27 In recent years, contradictory evidence has been published on the effect of ACLR augmented by LET. Although some cadaveric studies have demonstrated the ability of LET augmentation to restore native joint kinematics and tibiofemoral stability, others have reported overconstraint or lack of significant effect.9,18,25,34,45
When the available literature consists of contradictory results, researchers are prompted to scrutinize the rigor of studies to help ensure quality, evidence-based decision-making. Typically, researchers use a P value of <.05 as the threshold to reject the null hypothesis. However, several other metrics may be used to understand the strength of a study. The fragility index (FI) represents the number of data points that must be flipped between event and nonevent to reverse the statistical significance of an outcome.49 Studies with a low FI are statistically weak, and knowledge that statistical significance can be reversed by a small number of patient events can influence interpretation of findings. Furthermore, the fragility quotient (FQ), calculated as the FI divided by the sample size, represents the proportion of data point reversals needed to change statistical significance; the FQ normalizes the FI based on study size.1 Whereas FI and FQ measure the statistical fragility of dichotomous outcomes (ie, variables with only 2 categories [eg, sex]), the continuous FI (CFI) was developed to measure the statistical fragility of continuous outcomes (ie, variables that take on a value within a range [eg, age]), expanding the application of these metrics.3 The CFI is the minimum number of patients who must be moved from the experimental group to the control group to change significance, and dividing the CFI by the sample size gives the continuous FQ (CFQ).
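As a worked illustration of these definitions, the two quotients can be written as below. The symbol N is introduced here (it does not appear in the article) to denote the total sample size of a study, and the numerical values are the example values given in the figure legends of this article.

```latex
\[
\mathrm{FQ} = \frac{\mathrm{FI}}{N}, \qquad \mathrm{CFQ} = \frac{\mathrm{CFI}}{N}
\]
\[
\mathrm{FQ} = \frac{9}{589} \approx 0.0153, \qquad \mathrm{CFQ} = \frac{11}{73} \approx 0.1507
\]
```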
With renewed interest in LET, there is a pressing need to evaluate the rigor of clinical studies regarding LET with ACLR. This study aimed to present a comprehensive picture of the robustness of evidence from comparative studies regarding LET, to inform evidence-based medical decision-making for current practitioners. We hypothesized that the results of these analyses would show statistical fragility, consistent with similar evidence across the orthopaedic literature.
Methods
Search Strategy
Comparative studies and randomized controlled trials (RCTs) related to LET as an augmentation procedure to ACLR or revision ACLR published between 2000 and 2022 were identified and collected. Studies on the topic were broadly queried for relevance, and after screening, they were included or excluded based on both study-specific criteria and alignment with the current surgical trends for ACLR. Using the PubMed database, the initial search used the following terms: ((((((((((lateral extra-articular tenodesis) OR (lateral extraarticular tenodesis)) OR (LET)) OR (LEAT)) OR (lateral tenodesis)) OR (lateral plasty)) OR (lateral augmentation)) OR (anterolateral extra-articular procedures)) OR (AEAP)) AND ((anterior cruciate ligament) OR (ACL))) AND (((ACL reconstruction) OR (ACL revision)) OR (ACLR)).
Title and abstract screenings were then performed, and studies were included if they (1) pertained to LET as an augmentation to ACLR or revision ACLR and (2) were designed as comparative studies or RCTs. Studies solely regarding anterolateral ligament reconstruction (ALLR) were excluded, because although ALLR is a lateral augmentation procedure, it was not the focus of this analysis. The full text of each article was then examined carefully, and studies were excluded if they were (1) cadaveric, nonhuman, in vitro, laboratory, or surgical technique (without patient outcomes); (2) commentary, editorial, letter to the editor, conference reports, future study design/published protocol; (3) abstract only; (4) non-English; or (5) lacking the statistical basis for a fragility analysis. More specifically, studies were excluded on the basis of statistics in scenarios where (1) no calculated statistical comparison between 2 groups (treatment and control) was made, (2) reported outcomes were measured before treatment was administered, (3) comparative statistics for fragility analysis were not reported (descriptive statistics only, median), (4) >2 groups were compared, or (5) outcome measures did not provide comparisons indicating treatment success (tendon displacement, flexion angle, etc). One author (R.B.) performed a second PubMed search for systematic reviews and meta-analyses on the topic, using the following terms: (((lateral extra-articular tenodesis) OR (lateral tenodesis)) OR (tenodesis)) AND (((((anterior cruciate) OR (anterior cruciate ligament)) OR (anterior cruciate ligament reconstruction)) OR (ACL)) OR (ACLR)). The studies included in each of these reviews were examined, cross-referenced with our existing list, and included if they met criteria.
Three authors (R.B., B.A., L.Z.) independently reviewed included papers and extracted variables of interest, including both dichotomous and continuous variables relevant to clinical decision making. Discrepancies were resolved via paired discussions.
Data Extraction and Statistical Fragility Analysis
Data collected for each paper included published journal, publication year, level of evidence, length of follow-up, trial type, and intervention used. All dichotomous and continuous outcomes related to postoperative results of the LET procedure were extracted. For each dichotomous outcome, the outcome assessed, sample size, number lost to follow-up, reported P value, and number of events were collected. For each continuous outcome, the sample size, number lost to follow-up, reported P value, standard error and/or standard deviation, and the sample means were collected.
For each dichotomous outcome, we calculated FI using a 2-by-2 contingency table and the Fisher exact test using the method outlined by Walsh et al 49 (Figure 1). Through an iterative process, 1 patient at a time was moved from the negative group to the positive group until the statistical significance was flipped. The FI is represented by the number of patients moved. This was conducted for dichotomous outcomes that were initially reported as either significant or nonsignificant. To compare FI between studies of varying sample sizes, the FQ was calculated by dividing the FI by the sample size.47

Figure 1. A demonstration of how the fragility index (FI) is calculated for dichotomous variables. In this example, a 9-subject event reversal (FI = 9) resulted in altered statistical significance. Fragility quotient (FQ) is calculated by dividing the FI by the total number of patients in the study (FQ = 9/589 = 0.0153). ACLR, anterior cruciate ligament reconstruction; LET, lateral extra-articular tenodesis.
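The iterative procedure for dichotomous outcomes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the Fisher exact test is implemented directly from the hypergeometric distribution, outcomes are always flipped in the arm with the lower event rate, and for initially nonsignificant outcomes the gap between arms is widened rather than narrowed. The counts in the usage example are invented for demonstration.

```python
from math import comb


def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher exact P value for the 2-by-2 table [[a, b], [c, d]]."""
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x events in row 1 with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )


def fragility_index(events_1, n_1, events_2, n_2, alpha=0.05):
    """Number of single-patient flips needed to change significance.

    Assumption: flips are made in the arm with the lower event rate.
    Returns None if significance never changes.
    """
    if events_1 / n_1 > events_2 / n_2:
        events_1, n_1, events_2, n_2 = events_2, n_2, events_1, n_1
    significant = fisher_exact_p(events_1, n_1 - events_1,
                                 events_2, n_2 - events_2) < alpha
    # Significant: add events to narrow the gap. Nonsignificant: remove
    # events to widen it (an assumed rule for the reverse direction).
    step = 1 if significant else -1
    flips = 0
    while 0 <= events_1 + step <= n_1:
        events_1 += step
        flips += 1
        p = fisher_exact_p(events_1, n_1 - events_1, events_2, n_2 - events_2)
        if (p < alpha) != significant:
            return flips
    return None


def fragility_quotient(index, total_n):
    """Fragility index normalized by the total sample size."""
    return index / total_n


if __name__ == "__main__":
    # Hypothetical counts: 2/50 events in one arm vs 12/50 in the other.
    fi = fragility_index(2, 50, 12, 50)
    print(fi, fragility_quotient(fi, 100))
```

Flipping only the lower-rate arm keeps the search one-dimensional; other conventions exist, and they can give a different index for the same table.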
For each continuous outcome, we calculated a CFI using the Welch t test and the method proposed by Caldwell et al 3 as a way to expand fragility analysis to continuous variables. This statistical method has been refined by other studies such that it can be used for outcomes that do not report raw data, improving its utility.51 The analysis for each outcome was conducted with simulations (n = 5) using synthetic, representative data generated from the reported sample mean, standard deviation, and sample size for both the experimental and the control arms (Figure 2). In an iterative process, a patient was moved from one data set to another until statistical significance was flipped. The CFI is represented by the number of patients moved. The CFQ was calculated by dividing the CFI by the sample size.

Figure 2. A demonstration of how the continuous fragility index (CFI) is calculated for continuous variables. In this example, an 11-patient event reversal (CFI = 11) resulted in altered statistical significance. The continuous fragility quotient (CFQ) is calculated by dividing the CFI by the total number of patients in the study (CFQ = 11/73 = 0.1507). ACLR, anterior cruciate ligament reconstruction; LET, lateral extra-articular tenodesis.
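The continuous procedure can be sketched in the same spirit. The code below is one plausible reading of the description, not the authors' implementation or the exact algorithm of Caldwell et al. It rests on several assumptions: synthetic data are drawn from a normal distribution and rescaled to match the reported mean and standard deviation exactly, the Student t distribution is integrated numerically so that only the standard library is needed, only initially significant outcomes are handled, and the patient moved at each step is the one with the largest value in the higher-mean group. All function names and the summary statistics in the usage example are hypothetical.

```python
import math
import random
import statistics


def t_two_sided_p(t_stat, df, steps=2000):
    """Two-sided P value for Student's t, by Simpson integration of the pdf."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1.0 - 2.0 * central)


def welch_p(a, b):
    """Two-sided Welch t test P value for two samples."""
    va, vb = statistics.variance(a) / len(a), statistics.variance(b) / len(b)
    t_stat = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t_two_sided_p(t_stat, df)


def synthetic_sample(mean, sd, n, rng):
    """Normal draws rescaled to have exactly the reported mean and SD."""
    raw = [rng.gauss(0, 1) for _ in range(n)]
    m, s = statistics.mean(raw), statistics.stdev(raw)
    return [mean + sd * (x - m) / s for x in raw]


def continuous_fragility_index(mean_a, sd_a, n_a, mean_b, sd_b, n_b,
                               alpha=0.05, seed=0):
    """Patients moved between groups until significance is lost."""
    rng = random.Random(seed)
    a = synthetic_sample(mean_a, sd_a, n_a, rng)
    b = synthetic_sample(mean_b, sd_b, n_b, rng)
    if welch_p(a, b) >= alpha:
        raise ValueError("this sketch handles initially significant outcomes only")
    high, low = (a, b) if statistics.mean(a) > statistics.mean(b) else (b, a)
    high.sort()
    moves = 0
    while welch_p(high, low) < alpha and len(high) > 2:
        low.append(high.pop())  # move the largest value to the other group
        moves += 1
    return moves


if __name__ == "__main__":
    # Hypothetical summary statistics, averaged over 5 simulated data sets.
    runs = [continuous_fragility_index(90, 10, 40, 85, 10, 40, seed=s)
            for s in range(5)]
    print(statistics.mean(runs), statistics.mean(runs) / 80)
```

Averaging over several simulated data sets is what allows a noninteger index such as the median CFI of 7.8 reported in this article.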
For both dichotomous and continuous outcomes, the statistical fragility was reported using median and interquartile range. Comparisons of mean statistical fragility were conducted using the Welch t test. Data were analyzed using Python 3.7 (Python Software Foundation).
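The summary step can be illustrated with the standard library. This is a hedged sketch rather than the authors' code: the helper name is hypothetical, the "inclusive" quartile convention is an assumption (the article does not state which convention was used), the FI values are invented, and `statistics.quantiles` requires Python 3.8 or later, which is newer than the version cited above.

```python
import statistics


def median_iqr(values):
    """Return (median, first quartile, third quartile) of a list of values."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), q1, q3


if __name__ == "__main__":
    hypothetical_fi = [4, 7, 9, 10, 14]  # invented values for illustration
    med, q1, q3 = median_iqr(hypothetical_fi)
    print(f"median {med} (IQR, {q1}-{q3})")
```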
Results
Of the 455 initially identified studies, 178 full texts were screened. Ultimately, 29 studies were included in the final analysis (Figure 3), including 20 comparative studies and 9 RCTs. Table 1 summarizes the characteristics of the included studies and outcomes that were measured. The included studies were published in 11 journals and performed in 11 countries; the American Journal of Sports Medicine published the majority of studies, followed by Arthroscopy. A total of 18 studies reported dichotomous outcomes and 27 studies reported continuous outcomes, which resulted in a cumulative total of 48 dichotomous outcomes and 265 continuous outcomes for analysis.

Figure 3. Flowchart of study inclusion. LET, lateral extra-articular tenodesis; RCT, randomized controlled trial; SR, systematic review.
Table 1. General Characteristics of Included Outcomes From 29 Studies a
Data are presented as n (%). NA, characteristic not applicable; NR, not reported; RCT, randomized controlled trial.
Am J Sports Med, American Journal of Sports Medicine; ANZ J Surg, Australia and New Zealand Journal of Surgery; Arch Orthop Trauma Surg, Archives of Orthopaedic and Trauma Surgery; Arthroscopy, Arthroscopy: The Journal of Arthroscopic and Related Surgery; Eur J Orthop Surg Traumatol, European Journal of Orthopaedic Surgery & Traumatology; Int Orthop, International Orthopaedics; J Comp Eff Res, Journal of Comparative Effectiveness Research; Knee, The Knee; Knee Surg Sports Traumatol Arthrosc, Knee Surgery, Sports Traumatology, Arthroscopy; Orthop J Sports Med, Orthopaedic Journal of Sports Medicine; Orthop Traumatol, Journal of Orthopaedics and Traumatology.
Outcome types were categorized as clinical (primary versus secondary), translational, imaging, and other. Due to the clinical nature of this study, only clinical outcomes (115 regarding LET with primary ACLR, and 60 regarding LET with revision ACLR) were included in the final analysis.
Most studies were comparative (69%); however, most outcomes came from RCTs (59.7%). A total of 231 outcomes (73.8%) in the final analysis were nonsignificant (P ≥ .05). Notably, a majority of studies were published after 2020 (79.3%), with 12 of 29 (41.4%) published in 2022. Primary ACLR was the focus of 75.9% of studies and 36.7% of outcomes. Figure 4 reports the distribution of FI and CFI values for all dichotomous and continuous outcomes. The median FI was 9.0 (IQR, 7.0-13.3), with median FQ of 0.1 (IQR, 0.04-0.17). The median CFI was 7.8 (IQR, 4.2-19.6), with median CFQ of 0.12 (IQR, 0.08-0.19).

Figure 4. Distribution of fragility index for all dichotomous and continuous outcomes.
Table 2 reports the FI by subgroups of dichotomous outcome characteristics. Commonly reported dichotomous outcomes included clinical failure, graft rupture, and return to sports. The median FI for graft rupture was 10, with a median FQ of 0.116. The median FI for outcomes measuring return to function was 6, with a median FQ of 0.040. The number of outcomes in which loss to follow-up (LTF) exceeded the FI was 13 (27%). Dichotomous outcomes reported as significant were significantly more fragile than those reported as nonsignificant (FQ, 0.02 vs 0.11; Welch t test, P < .001); however, a majority of dichotomous outcomes were reported as nonsignificant (79.2%). Dichotomous outcomes from RCTs were more fragile than dichotomous outcomes from comparative studies (FQ, 0.04 vs 0.14).
Table 3 reports the CFI by subgroups of continuous outcome characteristics. Commonly reported continuous outcomes across different studies included the Knee injury and Osteoarthritis Outcome Score, International Knee Documentation Committee (IKDC), and Lysholm scores. The median CFI for Lysholm scores was 8.5, with a median CFQ of 0.128; the median CFI for IKDC scores was 7.2, with a median CFQ of 0.115. LTF exceeded the CFI for 20 outcomes (8%). In contrast to dichotomous outcomes, continuous outcomes from RCTs and comparative studies were equivalently robust (CFQ, 0.12 vs 0.12). The CFQs of significant and nonsignificant continuous outcomes were likewise equal (0.12 vs 0.12).
Table 2. Fragility Analysis for Subgroups of the 18 Studies Reporting Dichotomous Outcomes a
FI, fragility index; FQ, fragility quotient; RCT, randomized controlled trial.
Table 3. Fragility Analysis for Subgroups of the 27 Studies Reporting Continuous Outcomes a
CFI, continuous fragility index; CFQ, continuous fragility quotient; RCT, randomized controlled trial.
Table 4 reports fragility quotients from studies that focused on LET for either primary or revision ACLR clinically; these subgroups excluded translational research and imaging studies. The FQ and CFQ for studies focused on revision ACLR were larger (0.117 and 0.113) than those focused on primary ACLR (0.042 and 0.095).
Table 4. Analysis Based on Intervention Type a
ACLR, anterior cruciate ligament reconstruction; CFQ, continuous fragility quotient; FQ, fragility quotient; IQR, interquartile range.
Discussion
The growing number of fragility analyses has highlighted the importance of cautious interpretation of P values in clinical research. The interpretation of a P value depends on factors such as the arbitrary alpha threshold, the statistical methods used, and the sample size, and the P value ultimately indicates only the probability of observing a result at least as extreme as the one obtained if the null hypothesis were true. However, it has been shown that it is not uncommon for orthopaedic practitioners to be biased in their assessment of clinical studies, incorrectly interpreting lower P values as evidence of greater significance, effect size, or difference.2,28 Therefore, it is important to consider the addition of fragility metrics (FI, CFI, FQ, CFQ) alongside reported P values to provide a utilitarian mechanism of understanding potential uncertainty and effect size.40
Dichotomous Outcomes
The current study offers a comprehensive analysis of the statistical fragility of published outcomes regarding LET augmentation to primary or revision ACLR. The overall FI for dichotomous outcomes was 9, with an FQ of 0.1, indicating that reversing the outcome of 9 patients (or 10 out of 100) would change the statistical significance of the evaluated studies. These results are comparable with, if not superior to, those found in previously published orthopaedic fragility studies, which have traditionally reported FIs ranging from 0 to 9 and FQs of 0.025 to 0.050.‡ These results remained consistent with more granular analysis of specific, clinically important outcome measurements such as graft rupture (FI = 10; FQ = 0.116) and return to function (FI = 6; FQ = 0.040). Prior to this study, the analysis by Megafu and Megafu 35 of distal radial fracture RCTs had the highest median FI at 9 (FQ = 0.097). With regard to the fragility literature surrounding ACLR, our results join those of Ehlers et al 11 in their comparisons of single- versus double-bundle techniques (FI = 3.14; FQ = 0.05) and autograft choices (FI = 3.77; FQ = 0.04).12 Both of those studies demonstrated more fragility than the current study. The number of extracted outcomes in which LTF was greater than the FI was 27% in the current study; this metric is important because it implies that a change in the rate of follow-up or completion of study protocol could match, or at times surpass, the number of outcomes needed to flip statistical significance. Our results compare favorably with those of Ehlers et al,11,12 who reported that LTF exceeded FI in over 76% of outcomes. Interestingly, dichotomous outcomes that were initially reported as significant (P < .05) had a mean FQ that was significantly more fragile than that for nonsignificant outcomes; this suggests that although the LET data are statistically robust overall, the significant outcomes are the more fragile ones. As clinical decision-making often stems from significant findings, these results should be analyzed more carefully and interpreted with caution.
Continuous Outcomes
Our study joins a smaller group of studies calculating CFI for continuous outcomes. Extension of the concept of statistical fragility to continuous variables allows for inclusion of a larger, more comprehensive set of results from each study. The overall CFI was 7.8, with a CFQ of 0.12, which suggests that moving 7.8 patients (or 12 out of 100) from the test group to the control group would be sufficient to change significance. For specific outcomes of clinical interest, such as Lysholm (CFI = 8.5; CFQ = 0.128) and IKDC scores (CFI = 7.2; CFQ = 0.115), the fragility metrics were similar. For only 10% of outcomes did the number of patients who did not complete the study protocol (ie, were lost to follow-up) exceed the FI or CFI. These results suggest that available continuous outcome data for LET augmentation are rather robust when compared with much of the other orthopaedic literature. Caldwell et al 3 first demonstrated the use of the CFI through application of the statistical method on a preexisting fragility analysis from Khan et al 29; the authors reported a much higher CFI of 9 than Khan et al’s originally reported FI of 2, suggesting that the inclusion of continuous outcomes can increase the robustness of the included studies. Gupta et al 22 found a similarly high median CFI of 9 in their analysis of RCTs on platelet-rich plasma for the treatment of plantar fasciitis. Xu et al 51 found slightly less robust results for CFI in their analysis of RCTs on platelet-rich plasma for noninsertional Achilles tendinopathy (median FI = 4.5; median CFI = 5). Given that the CFI was much more robust than the associated FI in the current study and in the demonstration by Caldwell et al, the inclusion of both metrics when analyzing a study has the potential to balance concerns of a study’s fragility.
The inclusion of continuous as well as dichotomous outcomes in a fragility analysis also allows for the inclusion of outcomes from more studies; as explained by Caldwell et al, the study from Khan et al originally excluded 12 RCTs related to sports surgery on the basis of no reported dichotomous outcomes. The inclusion of patient-centric metrics is especially valuable in our analysis of LET because the procedure specifically addresses knee instability, which is often distressing to the patient and has the potential to affect psychological confidence in knee function. Therefore, the CFI is an important metric for a truly comprehensive review of the quality of existing literature on a topic.
LET to Augment Primary Versus Revision ACLR
Although clinical evidence currently lacks clear indications for LET augmentation, a recent international consensus statement includes revision ACLR as an “appropriate” indication.19 However, other indications included high-grade pivot shift, generalized ligamentous laxity, and younger patients returning to high-level pivoting sports.21 This suggests that LET is more commonly used in association with revision but is growing in use as an augmentation to primary ACL procedures in certain patients. Notably, both the median FQ and the median CFQ for outcomes from studies on revision ACLR (0.117 and 0.113, respectively) were larger than for primary ACLR (0.042 and 0.095, respectively). This suggests that the evidence for LET as an augmentation procedure for revision ACLR is more robust, while that for LET with primary cases remains relatively fragile. Despite the noticeable resurgence of academic interest in LET (12 of 29 [41.4%] included studies were published in 2022), the relative fragility of evidence regarding LET as an adjunct to primary ACLR indicates the need for further research before routine adoption into clinical practice.
Strengths and Limitations
Our study has many strengths. As mentioned previously, it is one of the first truly comprehensive fragility analyses, due to our inclusion of dichotomous and continuous outcomes, as well as both RCTs and comparative studies. However, it also has several limitations. First, the concept of FI for binary outcomes can be applied only to trials performing 1:1 randomization that report statistically significant findings.50 CFI can be applied only to studies that reported sample mean, standard error or standard deviation, and sample size.3 Many studies considered for this review were excluded because they had >2 parallel arms or did not report data on associations or other statistical measures (mean, standard error, standard deviation). Second, the current review did not thoroughly evaluate the study quality of individual RCTs and only focused on the FI. The FI is a tool used to evaluate the statistical robustness of RCTs and should not be the sole criterion for assessing the quality or validity of a study.26 Currently, there is no threshold value to objectively categorize a metric as “robust” or “fragile.” Therefore, the strength of an FI is interpreted in the context of FIs of other studies.5 A comparative rather than absolute evaluation ensures that the numerical value is considered in the context of the existing literature and not at face value.
Despite its limitations, the FI can be a useful tool for understanding the results of RCTs. Its simplicity and ease of interpretation make it an appealing way to evaluate RCTs or comparative studies, especially those with small sample sizes or few events, which can be difficult to interpret intuitively. The final limitation is that although the fragility analysis allows us to comment on the statistical strength of included studies, it does not allow us to comment on the directionality of the findings regarding LET usage; therefore, the findings of the current study should be considered when evaluating these sources directly, to better inform clinical practice.
Conclusion
Research regarding the usage of LET as an augmentation procedure for ACLR and revision ACLR is more statistically robust than many other topics within orthopaedics. Studies regarding LET with primary ACLR are more fragile than those for revision ACLR. Given the resurgent interest in this procedure and the current mixed evidence regarding its effectiveness and indications, we recommend the future reporting of fragility quotients alongside P values to assist clinicians in assessing the robustness of new evidence to inform decision-making.
Footnotes
Final revision submitted September 16, 2023; accepted October 5, 2023.
Presented at the AOSSM Annual Meeting in Denver, Colorado, July 10-14, 2024.
One or more of the authors has declared the following potential conflict of interest or source of funding: B.D.O. has received consulting fees from DePuy/Medical Device Business Services, Vericel, Linvatec, and Musculoskeletal Transplant Foundation; royalties from Linvatec; and honoraria from Vericel; and is a paid associate editor for the American Journal of Sports Medicine. AOSSM checks author disclosures against the Open Payments Database (OPD). AOSSM has not conducted an independent investigation on the OPD and disclaims any liability or responsibility relating thereto.
