Abstract
Background:
The literature presents conflicting findings regarding outcomes after pediatric anterior cruciate ligament reconstruction (ACLR) with various autograft options, reflecting a lack of consensus on the standard of practice. Fragility analyses may assist in evaluating the statistical robustness of these studies.
Purpose:
To evaluate the statistical fragility of comparative studies in pediatric ACLR through the fragility index (FI) and fragility quotient (FQ), as well as qualitative factors such as outcome type, outcome significance, and patients lost to follow-up.
Study Design:
Systematic review; Level of evidence, 4.
Methods:
A systematic review conducted in accordance with the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines identified 1139 studies in the PubMed and Embase databases that met the search criteria; ultimately, 6 studies were selected for inclusion. A total of 32 comparative outcomes were assessed for fragility across the 6 studies. Descriptive statistics were employed to summarize the fragility data and generate subgroup comparisons.
Results:
The mean FI was 1.5, and the mean reverse FI was 3.19 (P < .01); the mean FQ was 0.0064, and the mean reverse FQ was 0.028 (P ≤ .0001). No significant difference was found in the FIs between objective outcomes and patient-reported outcomes (P = .418). These findings suggested that a comparable number of patients would need to transition from a nonevent to an event to alter a statistically significant result to a nonsignificant one. The FI was lower than the estimated number of patients lost to follow-up for 30 of the 32 outcomes (93.8%).
Conclusion:
Comparative studies on pediatric ACLR autograft outcomes displayed vulnerability when assessed using fragility metrics, indicating a lack of statistically robust data. The findings revealed that many reported outcomes are fragile and may require further investigation. Future research should incorporate fragility analyses—especially in studies with long-term follow-ups—to enhance the reliability of conclusions regarding optimal graft selection in pediatric ACLR.
Numerous clinical studies have compared the outcomes of anterior cruciate ligament (ACL) reconstruction (ACLR) using various graft options in pediatric patients.24 ACLR in the pediatric population differs from that in the adult population because surgical techniques must minimize the risk of growth disturbance while still providing tibiofemoral stability during pivoting sports activities.28 Comparative pediatric ACLR autograft studies have provided valuable evidence regarding the efficacy and safety of patellar tendon, quadriceps tendon, and hamstring tendon autografts in the pediatric population.5,7,22,24,34 However, it is essential to critically assess the stability of the conclusions drawn from these studies, as the significance of the study results may be influenced by a small number of outcome events. This assessment is particularly important given the current controversy regarding pediatric ACLR graft selection and the equivocal nature of outcomes among different autografts.15,21,37 The robustness of such findings can be quantified using a statistical measure called the fragility index (FI).
Since its introduction in 1990,11 the concept of the FI has gained recognition as a valuable tool for evaluating the fragility of research studies within various medical disciplines. By quantifying the number of events or outcomes required to nullify the statistical significance of a result, the FI indicates the robustness of the study’s results.11,24,30 The reverse fragility index (RFI) indicates how many event reversals would be needed for a nonsignificant result to reach statistical significance (most commonly, P < .05) and can therefore be used to gauge the fragility of negative results. This analysis enables a more nuanced interpretation of hypothesis test results, with a low FI or RFI suggesting that a result may be underpowered or more likely to be attributable to chance.11,35,36
Although several studies have investigated the outcomes of ACLR using different autograft types in pediatric patients, the FI of these studies has not been explored.8,22,33 Exploring the statistical validity of this body of pediatric ACLR literature is critical, as multiple novel surgical techniques are rapidly being developed, all of which have implications with regard to potential growth disturbance and rerupture rate.
This study aimed to assess the vulnerability and reliability of current research on pediatric ACLR graft choices by evaluating the FI of comparative clinical trials. We hypothesized that the FIs would reveal significant fragility, underscoring the need for careful consideration of the robustness of these research conclusions.
Methods
Primary research published between 2010 and 2023 that investigated comparative outcomes of different autograft types for ACLR in pediatric patients was queried for this study. The initial search strategy followed a well-established method of querying the PubMed and Embase online databases for studies related to the ACL or ACLR in pediatric patients.9,12,13,29,38 The titles and abstracts of the retrieved studies were screened by 3 authors (G.S., S.A., and P.H.) for relevance to pediatric ACLR utilizing autografts. Studies were excluded if they met any of the following criteria: (1) no dichotomous outcomes generated, or no P values or statistical significance reported; (2) not a pediatric study, or no report of outcomes in a pediatric population; (3) not an autograft study with differential autograft outcomes; (4) not primary research; (5) a cadaveric study; or (6) a study using population databases, national registries, or cross-sectional data.
Fragility Metrics
To assess the stability and reliability of the reported outcomes in these studies, the FI, fragility quotient (FQ), RFI, and reverse fragility quotient (RFQ) were calculated for each outcome measured, and their means were calculated for each study. To determine the FI for each outcome, an established trial-and-error method was employed.9,12,13,29 Additionally, the FQ and RFQ were calculated for each outcome by dividing the FI or RFI by the number of patients included in the study. The FQ represents the proportion of the overall sample whose events would need to be reversed to generate a nonsignificant result.
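As an illustration of the trial-and-error approach, the following Python sketch computes an FI and FQ for a single dichotomous outcome. It is a minimal sketch under stated assumptions, not the code used in this review: the function names, the pure-Python 2-tailed Fisher exact test, the choice to flip nonevents to events in the arm with fewer events, and the example counts (1 of 50 vs 12 of 50) are all hypothetical.

```python
from math import comb


def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in row 1 with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(total, 1.0)


def fragility_index(e1, n1, e2, n2, alpha=0.05):
    """Count nonevent-to-event flips needed before P >= alpha.

    Assumption: flips are made in the arm with fewer events.
    Returns None if the baseline result is not significant, or if
    significance cannot be removed by flipping within that arm.
    """
    arms = [[e1, n1], [e2, n2]]

    def p_value():
        (ea, na), (eb, nb) = arms
        return fisher_exact_p(ea, na - ea, eb, nb - eb)

    if p_value() >= alpha:
        return None
    target = 0 if e1 <= e2 else 1
    flips = 0
    while p_value() < alpha and arms[target][0] < arms[target][1]:
        arms[target][0] += 1  # one patient moves from nonevent to event
        flips += 1
    return flips if p_value() >= alpha else None


def fragility_quotient(index, n_total):
    """FI (or RFI) divided by the total number of patients in the study."""
    return index / n_total


# Hypothetical outcome: 1/50 events with graft A vs 12/50 with graft B.
fi = fragility_index(1, 50, 12, 50)
print("FI:", fi, "FQ:", fragility_quotient(fi, 100))
```

The same loop could equally be run with an existing statistics package supplying the Fisher exact test; the hand-written version is used here only to keep the sketch self-contained.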
Outcomes Assessed
The outcomes were grouped into objective outcomes (including graft failure and postoperative complications such as arthrofibrosis) and clinical or patient-reported outcomes such as return to play. In addition, the reported P value for each outcome was verified for accuracy using the 2-tailed Fisher exact test. Outcomes with a reported significance discordant with the calculated Fisher exact test were assigned an FI or RFI of 0 because no results needed to be flipped for the post hoc calculated significance to change. An FI of 0 may arise when the original analysis of a dichotomous outcome used a statistical test other than the Fisher exact test.1 In this review, we analyzed only dichotomous outcomes; thus, the patient-reported outcomes analyzed fell into dichotomous categories (eg, whether or not patients met functional recovery based on Knee injury and Osteoarthritis Outcome scores15).
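The verification step and the RFI can be sketched in Python as follows. This is an illustrative sketch only: the function names and example counts are hypothetical, and the RFI rule shown (converting events to nonevents in the arm with the lower event rate until P < .05) is one common convention assumed here, not necessarily the exact procedure followed in this review.

```python
from math import comb


def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in row 1 with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(total, 1.0)


def is_discordant(reported_significant, e1, n1, e2, n2, alpha=0.05):
    """True when the reported significance disagrees with the Fisher exact test.

    Discordant outcomes would be assigned an FI or RFI of 0.
    """
    recalculated = fisher_exact_p(e1, n1 - e1, e2, n2 - e2) < alpha
    return reported_significant != recalculated


def reverse_fragility_index(e1, n1, e2, n2, alpha=0.05):
    """Count event-to-nonevent flips needed before P < alpha.

    Assumption: flips are made in the arm with the lower event rate.
    Returns None if the baseline result is already significant or if
    that arm runs out of events before significance is reached.
    """
    arms = [[e1, n1], [e2, n2]]

    def p_value():
        (ea, na), (eb, nb) = arms
        return fisher_exact_p(ea, na - ea, eb, nb - eb)

    if p_value() < alpha:
        return None
    low = 0 if e1 * n2 <= e2 * n1 else 1
    flips = 0
    while p_value() >= alpha:
        if arms[low][0] == 0:
            return None
        arms[low][0] -= 1  # one patient moves from event to nonevent
        flips += 1
    return flips


# Hypothetical nonsignificant outcome: 5/50 vs 8/50 events.
print("Discordant:", is_discordant(False, 5, 50, 8, 50))
print("RFI:", reverse_fragility_index(5, 50, 8, 50))
```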
The mean FI, FQ, RFI, and RFQ for all included outcome events were calculated along with their interquartile ranges. Three subgroup comparisons were analyzed for significant differences using independent t tests at a significance level of .05: (1) graft failure or arthrofibrosis outcomes versus clinical or patient-reported outcomes; (2) significant (P < .05) versus nonsignificant (P ≥ .05) outcomes; and (3) outcomes for which the FI or RFI was less than the estimated number of patients lost to follow-up (LTFU) versus the remaining outcomes.
The data analysis was conducted utilizing Excel Version 16.80 (Microsoft) and R programming language Version 4.3.2 (R Core Team). Descriptive statistics were employed to summarize the fragility data and generate subgroup comparisons.
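For readers who wish to reproduce this kind of summary, the following Python sketch shows the descriptive steps on made-up numbers. It stands in for the Excel and R workflow rather than reproducing it: the function names and all data values are hypothetical, the quartiles use the default method of Python's statistics module, and only the pooled-variance t statistic and its degrees of freedom are computed (the P value would come from a t distribution in a statistics package).

```python
from math import sqrt
from statistics import mean, quantiles, variance


def summarize(values):
    """Return the mean and the interquartile range (Q1, Q3) of a list."""
    q1, _, q3 = quantiles(values, n=4)
    return mean(values), (q1, q3)


def pooled_t_statistic(x, y):
    """Independent-samples t statistic with pooled variance, plus its df."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / sqrt(pooled_var * (1 / nx + 1 / ny))
    return t, df


def share_below_ltfu(indices, ltfu):
    """Fraction of outcomes whose FI or RFI is below the estimated LTFU."""
    below = sum(1 for idx, lost in zip(indices, ltfu) if idx < lost)
    return below / len(indices)


# Hypothetical fragility values for two subgroups of outcomes.
objective = [0, 1, 2, 3, 7]
patient_reported = [1, 2, 2, 4, 5]
print("Objective:", summarize(objective))
print("Patient-reported:", summarize(patient_reported))
print("t, df:", pooled_t_statistic(objective, patient_reported))

# Hypothetical LTFU estimates paired with the objective outcomes above.
print("Share below LTFU:", share_below_ltfu(objective, [4, 4, 6, 6, 6]))
```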
Results
A total of 1139 studies were initially screened, of which 50 met the initial search criteria. From this pool, 6 studies were selected for the final analysis. The flow chart of study inclusion is depicted in Figure 1. In these studies, bone-patellar tendon-bone (BPTB), quadriceps tendon, hamstring tendon, and iliotibial band grafts were compared (Table 1).

Figure 1. Identification of studies for inclusion via databases and registers.
Table 1. Characteristics of the Included Studies (N = 6)
Note. Data are presented as means. The number of values used to calculate the mean FI and RFI is included in parentheses. BPTB, bone-patellar tendon-bone; FI, fragility index; FQ, fragility quotient; HT, hamstring tendon; ITB, iliotibial band; LOE, level of evidence; NA, not available; QT, quadriceps tendon; RFI, reverse fragility index; RFQ, reverse fragility quotient.
The mean FIs of the included studies ranged from 0 to 3, with an overall mean of 1.5, indicating that, on average, changing <2 events to nonevents would annul the statistical significance of the reported outcomes. The mean FQ ranged from 0 to 0.01, with an overall mean of 0.006, suggesting that, on average, around 0.6% of the sample size would need to be altered to nullify the statistical significance of the outcome (Table 1). The mean RFI and RFQ were calculated to measure the fragility of nonsignificant results with reported P ≥ .05. Calculated mean RFIs ranged from 1 to 7, with an overall average of 3.19. The mean RFQ was 0.042 (Table 1).
No difference in the magnitude of fragility was observed between objective outcomes (graft failure or postoperative complications) and clinical or patient-reported outcomes (P = .418) (Table 2). This result suggested that the statistical fragility of patient-reported outcomes may not be significantly different from that of more concrete outcomes such as the proportion of ACL graft failure. Significant outcomes were found to be less robust (more fragile) than nonsignificant outcomes, as reflected by the smaller FI values for significant results and the larger RFI values for nonsignificant results (Table 2). Our analysis of the relationship between the FI and the number of patients LTFU did not reveal a statistically significant difference between the subgroup with an FI or RFI ≤ LTFU and the subgroup with an FI or RFI > LTFU (Table 2).
Table 2. Overall Fragility Data and Analysis of Subgroups
Note. Data are presented as mean (IQR). FI, fragility index; FQ, fragility quotient; IQR, interquartile range; LTFU, lost to follow-up.
Discussion
The mean magnitude of the fragility indices for all comparative outcomes was 2.875, indicating that a mean of <3 events would need to be reversed to alter the statistical significance of most findings within these studies of pediatric autograft ACLR. An FI of approximately 3 suggests that the conclusions of pediatric ACLR studies are about as vulnerable as those in the previous orthopaedic literature, which has reported similar FI values in sports medicine studies17 and in studies focusing on surgical techniques and rehabilitation in pediatric ACL tears.9,23,33 The American Academy of Orthopaedic Surgeons guidelines indicate that an FI ≥2 is desirable.6 Although the fragility of the negative findings in these studies met the desired standard, the positive findings, which achieved statistical significance (P < .05), did not. The mean FI for positive findings was 1.5, indicating that, on average, reversing the outcome of <2 patients would change the significance of the study. Furthermore, the highest FI observed was 3, meaning that even in the most robust positive results, reversing the outcomes of just 3 patients would eliminate statistical significance.
Notably, none of the included studies conducted an a priori power analysis, and only 1 study3 conducted a post hoc analysis. Britt et al3 describe a power analysis indicating that their study was underpowered at a target power (1 − β) of 0.8. Power analyses are a crucial component of strong comparative clinical studies: they help determine minimal sample sizes31 and can guide researchers to reduce fragility, ensure adequate sensitivity, estimate effect size, and assess the risk of type 2 errors in their final analysis.2,4,32 Therefore, we suggest that orthopaedic researchers perform a priori power analyses during the study design phase and conduct post hoc analyses to ensure the validity of their findings.
When considering the type of outcome, our study revealed no significant difference in fragility between outcomes measuring concrete events such as graft rerupture and patient-reported outcomes such as return to play and functional recovery (Table 2). Patient-reported outcomes have previously faced criticism for their perceived lack of precision, unsubstantiated correlations with overall outcomes, increased susceptibility to recall bias, and inherent challenges with interpretation.10,14,19,20,27 However, through a fragility analysis, patient-reported outcomes can be compared with objective outcomes to help orthopaedic surgeons assess their congruence, evaluate the robustness and quality of patient-reported outcomes, and inform patient-centered clinical decision-making.
The accuracy of patient-reported outcomes is also supported after clinical rehabilitation of ACL tears within the nonoperative setting,16 suggesting that the inclusion of both concrete and patient-reported outcomes can provide an accurate assessment of treatment outcomes and contribute to the overall validity and clinical applicability of research findings. The characterization of FI and FQ by previous studies demonstrates the moderate vulnerability of the patient-reported outcomes in pediatric ACLR relative to other areas of orthopaedic research.18,25 The most statistically robust significant conclusion drawn from this body of literature is from Maheshwer et al,21 where an FI of 3 was generated from their analysis comparing the higher rate of retear in hamstring autograft ACLR to BPTB autograft at >2 years of follow-up. This finding suggests that only 3 event reversals would be needed to change the outcome’s statistical significance, indicating moderate fragility. An FI of 0 was generated in analyzing retear rates in 13- to 15-year-old patients who received either hamstring or BPTB autografts,18 signifying that the reported significance was not reproduced by the Fisher exact test, so no event reversal was needed to change it; this demonstrates extreme fragility. The context provided by these results is critical for our study, as it underscores the variability in statistical robustness across different studies. For patient management, these findings highlight the necessity for clinicians to critically evaluate the robustness of the evidence when making decisions about autograft selection for pediatric ACLR. The fragility of some studies suggests that clinical decisions should not rely solely on statistically significant findings but should also consider the FI and other qualitative factors to ensure more reliable outcomes.
Among the nonsignificant results, notable findings emerged from studies such as Morgan et al25 and Kilkenny et al.18 Morgan et al reported comparable rerupture rates between BPTB and hamstring autografts, yielding an RFI of 7. Similarly, Kilkenny et al observed no disparity in outcomes among 13- to 15-year-old patients who underwent BPTB or hamstring autograft reconstruction, resulting in an RFI of 7. Morgan et al also reported the lowest RFI in our analysis, 0, when investigating contralateral ACL rupture rates at the 15-year follow-up of BPTB versus hamstring graft reconstruction.
Limitations
This study has several limitations. First, the FI cannot be calculated for nondichotomous data. Therefore, several studies and outcomes that examined nondichotomous outcome data in the setting of pediatric autograft ACLR were excluded, as these could not be examined with fragility methodology. Second, the grouping of outcomes into graft rupture or arthrofibrosis findings versus clinical and patient-reported outcomes was a post hoc analysis performed after the conclusion of the literature search. Third, this review provides a critical outlook on the strength of the studies examining autograft choice in pediatric ACLR, but as autograft choices exhibit individualized indications, the randomization of graft choice was not considered here. We primarily focused on population-level analysis and did not account for patient-specific factors such as age, skeletal maturity, and activity level, which play a crucial role in determining tailored treatment approaches.22,28,37 Additionally, the lack of long-term follow-up studies limited our understanding of the durability and functional outcomes associated with different graft options.
Conclusion
The findings of comparative studies investigating outcomes of pediatric ACLR with different autografts were found to be vulnerable when evaluated using fragility metrics. There was a lack of statistically robust data adequately describing the similarities and differences in outcomes between various pediatric ACLR autograft choices. Many outcomes in the literature may be statistically fragile and may require further investigation. Future comparative studies of pediatric ACLR, especially those with long-term follow-up, should consider reporting fragility metrics to ensure more reliable conclusions.
Footnotes
Final revision submitted August 23, 2024; accepted September 5, 2024.
One or more of the authors has declared the following potential conflict of interest or source of funding: D.W. has received grant support from Immunis and Vericel; education payments from Micromed; consulting fees from DePuy/Medical Device Business Services, Ipsen Bioscience, Newclip Technics, and Vericel; nonconsulting fees from Vericel; hospitality payments from Arthrex and Stryker; and has stock/stock options in Cartilage Inc and Overture Orthopaedics. B.T.F. has received education payments from Evolution Surgical. AOSSM checks author disclosures against the Open Payments Database (OPD). AOSSM has not conducted an independent investigation on the OPD and disclaims any liability or responsibility relating thereto.
