Abstract
Objective
Marrow stimulation is used to address knee cartilage defects. In this study, we used the fragility index (FI), reverse fragility index (rFI), and fragility quotient (FQ) to evaluate statistical fragility of outcomes reported in randomized controlled trials (RCTs) evaluating marrow stimulation.
Design
PubMed, Embase, and MEDLINE were queried for recent RCTs (January 1, 2010-September 5, 2023) assessing marrow stimulation for cartilage defects of the knee. The FI and rFI were calculated as the number of outcome event reversals required to alter statistical significance for significant and nonsignificant outcomes, respectively. The FQ was determined by dividing the FI by the study sample size.
Results
Across 155 total outcomes from 21 RCTs, the median FI was 3 (interquartile range [IQR], 2-5), with an associated median FQ of 0.067 (IQR, 0.033-0.100). Thirty-two outcomes were statistically significant, with a median FI of 2 (IQR, 1-3.25) and FQ of 0.050 (IQR, 0.025-0.069). Ten of the 32 (31.3%) outcomes reported as statistically significant had an FI of 1. In total, 123 outcomes were nonsignificant, with a median rFI of 3 (IQR, 2-5). Studies assessing stem cell augments were the most fragile, with a median FI of 2. In 55.5% of outcomes, the number of patients lost to follow-up was greater than or equal to the FI.
Conclusion
Statistical findings in RCTs evaluating marrow stimulation for cartilage defects of the knee are statistically fragile. We recommend combined reporting of P-values with FI and FQ metrics to aid in the interpretation of clinical findings in comparative trials assessing cartilage restoration.
Introduction
Articular cartilage has a very limited capacity for self-repair.1,2 As such, marrow stimulation techniques were developed to promote functional restoration of cartilage defects within joints instead of solely alleviating symptoms. Microfracture, first described by Steadman in the early 1990s, consists of debriding the defect and perforating small holes into the subchondral bone to induce invasion of progenitor cells and encourage tissue repair.1-3 Although temporary improvements in joint function have been reported, microfracture results in fibrocartilaginous repair tissue, degradation of subchondral bone, and long-term functional loss.1,4 Subchondral drilling is an even earlier technique, first described by Smillie in 1957, in which holes (usually larger) are drilled in the subchondral bone plate, leading to blood clot formation and fibrocartilaginous repair tissue. 5 As with microfracture, this fibrocartilaginous tissue is structurally and biomechanically inferior to hyaline articular cartilage, and clinical results may decline as soon as 18 months after surgery.5,6
To direct repair toward hyaline-like cartilage, cell-based techniques have been developed, including autologous chondrocyte implantation (ACI), matrix-associated autologous chondrocyte implantation (MACI), and osteochondral autologous transplantation (OATS). ACI and MACI involve harvesting chondrocytes, expanding them ex vivo, and subsequently implanting them into the articular defect as a patch. 7 The current generation, MACI, involves seeding chondrocytes onto a type I/III collagen matrix, which can be cut to fit the size of the defect and implanted by arthroscopy or mini-arthrotomy. 8 In previous systematic reviews of randomized controlled trials (RCTs), ACI and MACI have demonstrated superior clinical improvement compared with marrow stimulation techniques.9,10 Prior literature also suggests that OATS may lead to significantly higher return to activity, higher patient-reported outcome measures (PROMs), and lower failure rates compared with marrow stimulation.11,12 Many recent RCTs have also assessed the use of augments for marrow stimulation, including stem cells, collagen membranes, and extracellular matrix scaffolds. However, the clinical significance of marrow stimulation augments remains unclear.10,13
RCTs represent the highest level of evidence in guiding management of cartilage defects, with P-values reported widely in the orthopedic literature to indicate statistical significance. 14 Although the P-value is essential, it has received criticism for neglecting study design elements and patients lost to follow-up. 15 To supplement the P-value and address its limitations, Feinstein introduced the concept of the fragility index (FI). 16 The FI represents how “fragile” a statistical outcome is and is calculated as the number of iterative outcome event reversals needed to lose statistical significance. 16 This metric has been used widely to assess statistical fragility of RCT findings in the orthopedic literature.16-24 The reverse fragility index (rFI) was similarly defined to represent the number of outcome event reversals required to turn nonsignificant outcomes into statistically significant findings.25-27 To take sample size into consideration, the fragility quotient (FQ) is calculated as the FI divided by the sample size and represents the proportion of patients that need an outcome event reversal for significance to be altered.28,29 The purpose of this study was to evaluate the statistical fragility of RCTs assessing the efficacy of marrow stimulation techniques for cartilage defects of the knee using the FI, rFI, and FQ metrics. Specifically, we evaluated the fragility of both RCTs that compared marrow stimulation with other cartilage restoration techniques and RCTs that evaluated augments for marrow stimulation. We hypothesized that study findings would be statistically fragile, especially outcomes initially reported as statistically significant.
Methods
Literature Review
This systematic review was conducted in accordance with the guidelines of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). 30 The PubMed, Embase, and Medline databases were queried to identify RCTs published from January 1, 2010, to September 5, 2023, related to marrow stimulation for cartilage defects of the knee (Fig. 1). The search keywords used across all databases were ((stimulation) OR (microfracture) OR (mfx) OR (drilling)) AND (chondral OR cartilage) AND “knee.” Studies met the inclusion criteria if they were RCTs reporting dichotomous categorical outcomes with an intervention arm related to marrow stimulation techniques (e.g., microfracture, subchondral drilling). Non-English language, cadaveric/biomechanical/animal, in vitro, and non-RCT studies were excluded. The same RCT population at multiple follow-up time periods was included if the reported outcomes were distinct. Title/abstract screening and full-text review were performed by 2 independent reviewers, and all conflicts were resolved by a third independent reviewer. A risk-of-bias assessment was performed for all included studies using the Cochrane risk-of-bias tool for randomized trials. 31

Figure 1. Preferred Reporting Items for Systematic Reviews and Meta-Analyses flow diagram showing identification, screening, and inclusion of eligible articles from PubMed, Embase, and Medline.
Data Extraction
The first author, year of publication, journal of publication, and treatment intervention of the 2 arms were extracted from each included study. Reporting of clinically meaningful difference metrics was also assessed for each included article (i.e., the minimum clinically important difference [MCID]). 32 Outcome events in each intervention arm, any reported P-values, and patients lost to follow-up were recorded for each study outcome. All RCT outcomes were reviewed and outcome categories were established by 2 reviewers for subgroup analysis. Outcome categories included complications/adverse events, volume of cartilage defect filling, failure/reoperation rates, clinical improvement in PROMs, subchondral bone architecture, integration of cartilage repair with adjacent native cartilage, and quality and homogeneity of repair tissue surface and structure.
Fragility Analysis
A 2-tailed Fisher exact test was used to confirm reported P-values for each outcome. Outcomes with P-values <0.05 were considered statistically significant. The FI was calculated by manipulating outcome events until the P-value was reversed from <0.05 to ≥0.05, as demonstrated in Figure 2. The rFI was calculated similarly for the P-value to switch from ≥0.05 to <0.05. The FQ was calculated by dividing the FI or rFI for each outcome by the study sample size to represent the proportion of patients that require an outcome event reversal for significance to be altered for a given outcome. We further performed subgroup analysis based on statistical significance, intervention comparison type, and outcome type. Findings are presented as median FI (interquartile range [IQR]).
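The calculation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `fisher_two_sided`, `fragility`, and `Fragility` are hypothetical, the Fisher exact test is written out with the standard library rather than taken from a statistics package, and the search rule (flip one patient at a time within a single arm, in either direction, and keep the smallest count that crosses the 0.05 threshold) is one possible convention that the text does not specify.

```python
from math import comb
from typing import NamedTuple, Optional


class Fragility(NamedTuple):
    significant: bool          # True if the observed P-value is < alpha
    index: Optional[int]       # FI if significant, rFI otherwise
    quotient: Optional[float]  # index divided by the total sample size


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact P-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c

    def weight(x: int) -> int:
        # Unnormalized hypergeometric weight of a table with x events in arm 1.
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    extreme = sum(w for w in map(weight, range(lo, hi + 1)) if w <= observed)
    return extreme / comb(row1 + row2, col1)


def fragility(e1: int, n1: int, e2: int, n2: int, alpha: float = 0.05) -> Fragility:
    """FI (significant outcome) or rFI (nonsignificant outcome) with its FQ.

    Assumed convention: reverse one patient's outcome at a time within a
    single arm and report the fewest reversals that cross alpha.
    """
    def p_value(x1: int, x2: int) -> float:
        return fisher_two_sided(x1, n1 - x1, x2, n2 - x2)

    significant = p_value(e1, e2) < alpha
    best: Optional[int] = None
    for arm in (0, 1):
        for step in (1, -1):  # add an event, or remove one
            events, sizes, flips = [e1, e2], (n1, n2), 0
            while 0 <= events[arm] + step <= sizes[arm]:
                events[arm] += step
                flips += 1
                if (p_value(*events) < alpha) != significant:
                    best = flips if best is None else min(best, flips)
                    break
    quotient = None if best is None else best / (n1 + n2)
    return Fragility(significant, best, quotient)
```

For a hypothetical outcome with 0 of 10 events in one arm and 5 of 10 in the other, `fragility(0, 10, 5, 10)` reports a significant result (P ≈ 0.033) with an FI of 1 and an FQ of 0.05; a single reversal in either arm raises the P-value above 0.05.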

Figure 2. Demonstration of statistical significance reversal using a 2 × 2 contingency table with a resulting fragility index = 1. OAT = osteochondral autologous transplantation.
Results
Of 371 RCTs screened, 21 studies were included for analysis. Seven RCTs were from the American Journal of Sports Medicine, 4 from Knee Surgery, Sports Traumatology, Arthroscopy, 3 from the Journal of Bone and Joint Surgery, 2 from the Orthopaedic Journal of Sports Medicine, 2 from Arthroscopy: The Journal of Arthroscopic and Related Surgery, 2 from Cartilage, and 1 from Regenerative Therapy. The included RCTs assessed both marrow stimulation versus other cartilage restoration techniques and augments for marrow stimulation, as indicated in Table 1.
Table 1. Characteristics of Included Studies Including First Author, Journal/Year of Publication, Title, and Interventions Assessed.
MACI = matrix-associated autologous chondrocyte implantation; AMIC = autologous matrix-induced chondrogenesis; OATS = osteochondral autologous transplantation; ACI = autologous chondrocyte implantation.
There were 155 total outcomes reported across the included RCTs related to marrow stimulation techniques for cartilage defects of the knee. The median FI across all outcomes was 3 (IQR, 2-5), indicating that a median of 3 outcome event reversals alters statistical significance. The median FQ across all outcomes was 0.067 (IQR, 0.033-0.100). Thus, outcome event reversals in 6.7 out of 100 patients alter outcome significance (Table 2). In 86 of 155 outcomes (55.5%), the number of patients lost to follow-up was greater than or equal to the FI.
Table 2. Fragility Data Based on Outcome Significance.
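The comparison between attrition and fragility reported above can be expressed as a short check. The sketch below is illustrative only: the `Outcome` record, the `attrition_check` helper, and the three example records are hypothetical and are not data from the included RCTs; only the 86 of 155 proportion comes from the text.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Outcome:
    label: str              # hypothetical outcome name
    fragility: int          # FI if significant, rFI if nonsignificant
    lost_to_follow_up: int  # patients lost to follow-up in the parent trial


def attrition_check(outcomes: List[Outcome]) -> Tuple[List[Outcome], float]:
    """Return outcomes whose loss to follow-up is >= FI/rFI, and their share."""
    flagged = [o for o in outcomes if o.lost_to_follow_up >= o.fragility]
    share = len(flagged) / len(outcomes) if outcomes else 0.0
    return flagged, share


# Hypothetical records for illustration only; not data from the included RCTs.
example = [
    Outcome("defect filling", fragility=2, lost_to_follow_up=4),
    Outcome("reoperation", fragility=5, lost_to_follow_up=3),
    Outcome("adverse events", fragility=3, lost_to_follow_up=3),
]
flagged, share = attrition_check(example)

# Proportion reported in the text: 86 of 155 outcomes, as a percentage.
reported_share = round(100 * 86 / 155, 1)
```

In the hypothetical example, two of the three outcomes are flagged because the missing patients alone could, in principle, account for enough reversals to change significance.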
FI = fragility index; IQR = interquartile range; FQ = fragility quotient; RCT = randomized controlled trial.
Thirty-two outcomes were statistically significant with a median FI of 2 (IQR, 1-3.25), indicating that the significance of these outcomes hinges on just 2 outcome event reversals. The associated FQ for significant outcomes was 0.050 (IQR, 0.025-0.069). Thus, outcome event reversals in just 5% of patients reverse statistically significant outcomes. In 10 of the 32 (31.3%) outcomes reported as statistically significant, the FI was found to be 1. In total, 123 outcomes were reported as statistically nonsignificant; these outcomes were found to have a median rFI of 3 (IQR, 2-5) and FQ of 0.067 (IQR, 0.034-0.100).
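Because the FI is an integer count, a non-integer quartile such as the upper bound of 3.25 reported above implies that quartiles were interpolated. The text does not state which quartile method was used, so the sketch below assumes linear interpolation over the observed range (the `inclusive` method of Python's `statistics.quantiles`). The helper name `median_iqr` and the four FI values are hypothetical.

```python
from statistics import median, quantiles
from typing import Sequence, Tuple


def median_iqr(values: Sequence[float]) -> Tuple[float, float, float]:
    """Return (median, first quartile, third quartile) for a list of FI values.

    Assumption: quartiles use linear interpolation between data points,
    treating the sample minimum and maximum as the 0th and 100th percentiles.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3


# Hypothetical FI values chosen only to show how a fractional quartile arises.
summary = median_iqr([1, 1, 3, 4])
```

With these hypothetical values the summary is a median of 2 with quartiles of 1 and 3.25; other quartile conventions would give different bounds for the same data.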
Six of the included studies assessed the efficacy of augments to marrow stimulation and had a median FI of 3 (IQR, 2-5) across 52 outcomes. Stem cells were the most fragile augment, with a median FI of 2 (IQR, 1-4) across 21 outcomes. Autologous matrix-induced chondrogenesis (AMIC) versus microfracture alone demonstrated a median FI of 3.5 (IQR, 2.75-4) across 8 outcomes.
MACI versus microfracture was the least fragile intervention comparison assessed. Across 5 studies comprising 39 outcomes, we identified a median FI of 4 (IQR, 2-4.5). Four studies assessed ACI versus microfracture and had a median FI of 3.5 (IQR, 2-6) across 16 outcomes. Three studies assessed OATS versus microfracture and had a median FI of 3 (IQR, 2-4) across 20 outcomes.
The 22 outcomes related to cartilage defect filling and 11 outcomes related to the repair tissue structure were the most fragile outcome categories, with a median FI of 2 (IQR, 2-4) and 2 (IQR, 1.5-2.5), respectively ( Table 3 ). Complications/adverse events were the most commonly reported outcome category comprising 54 outcomes with a median FI of 4 (IQR, 2-5). Outcomes related to failure/reoperation, clinical improvement, and integration with adjacent cartilage similarly demonstrated fragility, each with a median FI of 3; outcomes relating to the subchondral architecture had a median FI of 4.
Table 3. Subgroup Analysis Based on Outcome Type.
FI = fragility index; IQR = interquartile range; FQ = fragility quotient.
Bias assessment revealed that only one study was at “high risk” of bias (Table 4). Furthermore, bias was identified in only 11 of the 105 domains evaluated across the 21 included studies, most commonly in the domains for missing outcome data (i.e., loss to follow-up) and measurement of outcomes.
Table 4. Bias Assessment for Included Studies Evaluated Using Revised Cochrane Risk-of-Bias Tool for Randomized Trials.
Green shading indicates low risk of bias. Yellow shading indicates some concerns for risk of bias. Red shading indicates high risk of bias.
Of the 21 included RCTs, 7 (33.3%) used the MCID metrics to demonstrate clinically significant improvement in reported outcomes.
Discussion
The purpose of this systematic review was to use the FI, rFI, and FQ metrics to assess the statistical fragility of RCTs evaluating marrow stimulation techniques for cartilage restoration of the knee. Across 155 total outcomes, we demonstrated that just 3 outcome event reversals may alter statistical significance for the 21 included RCTs. In addition, outcome event reversals in just 6.7% of patients may be needed to alter statistical significance. We further demonstrated that the number of patients lost to follow-up was greater than or equal to the FI in over half of all outcomes. Subgroup analysis demonstrated considerable fragility for stem cell augments for marrow stimulation, while findings relating to MACI versus microfracture were most robust. The most fragile outcome categories across the cartilage restoration modalities involved cartilage defect filling and the repair tissue structure.
An important finding in this present study was the median FI of 2 identified for statistically significant outcomes, which indicates that outcome event reversals in just 2 patients may alter statistical significance in our review. In addition, with a median FQ of 0.050, statistical significance of findings may be lost with outcome event reversals in just 5% of patients. Furthermore, given that nearly one-third of the 32 statistically significant outcomes had an FI of 1, statistically significant findings in RCTs assessing marrow stimulation may not be as reliable as previously thought. Given that statistically significant outcomes reported in RCTs evaluating marrow stimulation may hinge on the outcome events of just 2 patients, these findings must be interpreted with caution.
For 86 out of 155 of the outcomes assessed, the number of patients lost to follow-up was greater than or equal to the outcome FI or rFI. This finding raises skepticism over the reliability of outcomes in RCTs assessing marrow stimulation techniques as outcome events lost due to attrition are capable of altering over half of all outcomes. Prior literature has suggested that a substantial portion of RCTs published do not adequately report follow-up data and those that do have high levels of missing outcome data.33,34 Furthermore, in a fragility analysis of RCTs in the orthopedic sports medicine literature, over 50% of included studies did not report on potential sources of bias. 35 In our bias assessment, only 1 of 21 RCTs demonstrated high risk of bias, which indicates that the statistical fragility identified likely is not a result of high level of bias among included studies. Ten of the studies demonstrated at least “some concerns” for bias and 4 of the studies were at risk of bias as a result of missing outcome data. Thus, efforts to minimize loss to follow-up may minimize bias and improve the reliability of RCT findings in the orthopedic literature. 34
In a 2022 systematic review by Wen et al., 36 augmented microfracture was deemed to have superior Lysholm scores and radiographic outcomes compared with microfracture alone. In contrast, a 2022 systematic review and meta-analysis by Abraamyan et al. 10 identified no benefit for augmented microfracture across the PROMs assessed. In our subgroup analysis, we identified that the 6 studies assessing augmented marrow stimulation were particularly fragile, with a median of just 3 outcome event reversals altering statistical significance. Stem cell augments for marrow stimulation, in particular, were the most fragile, with a median of just 2 outcome event reversals required to alter significance. Outcomes relating to AMIC versus microfracture alone were less fragile; however, these outcomes may still be reversed with a median of just 3.5 outcome event reversals. AMIC is a 1-step repair procedure in which marrow stimulation is augmented by application of a collagen I/III matrix to stabilize the blood clot. 37 A systematic review by Kim et al. 38 reported significantly greater PROMs and radiographic findings for AMIC compared with microfracture. However, we demonstrate that RCT outcomes involving AMIC versus marrow stimulation alone might be more fragile than previously thought.
Prior literature has also suggested that ACI and MACI produce clinically meaningful improvements in PROMs compared with marrow stimulation especially in younger, more active populations.10,39 Among the studies comparing cell-based techniques to marrow stimulation, our fragility analysis identified that the 5 studies assessing MACI versus microfracture were the least fragile with a median FI of 4, while the 4 studies assessing ACI versus microfracture had a median FI of 3.5. Although the studies assessing MACI versus microfracture were less fragile in our analysis, an FI of 4 is still of concern and additional comparative trial literature is needed to guide decision-making surrounding the role of cell-based techniques in cartilage restoration.
The P-value is a ubiquitous tool for determining statistical significance in the scientific literature. 40 However, when used independently, it has garnered significant criticism because it fails to indicate effect size or clinical significance and does not account for patients lost to follow-up.41,42 Furthermore, Chavalarias et al. 43 reported that with increasing use of the P-value in the biomedical literature, there has been a considerable bias toward reporting significant P-values and even toward reporting data in a way that transforms nonsignificant findings into significant outcomes. They therefore recommend that the P-value not be used in isolation. As Sterne and Poeran noted, the FI and FQ are valuable metrics that clearly convey a result’s uncertainty in a way that can be easily interpreted by clinicians. 44 In a 2018 study, Checketts et al. 45 identified a median FI of 2 and median FQ of 0.022 across 72 trials regarded as “strong evidence” by the American Academy of Orthopaedic Surgeons Clinical Practice Guidelines. Thus, studies guiding evidence-based medicine are prone to statistical fragility, and their findings may not be as robust as previously thought.
Statistical fragility in the orthopedic literature has been demonstrated by several studies.17,18,20,21,23,24,46 -56 In a 2019 study, Parisien et al. 47 found a median FI of 5 across 102 comparative trials in the sports medicine literature. In a recent study, Lawrence et al. 57 identified a median FI of 5 in studies evaluating bone-patellar tendon-bone versus hamstring tendon autografts for anterior cruciate ligament reconstruction. In an analysis of 19 RCTs in the knee cartilage restoration literature from 2000 to 2020, Parisien et al. 46 found a median FI of 4 across 60 outcomes. Our fragility analysis focused on RCTs published from 2010 to 2023 evaluating marrow stimulation techniques for knee cartilage restoration. Interestingly, we identified a higher level of statistical fragility (FI of 3) across 155 total outcomes compared to the 2021 study by Parisien et al. The findings in this present study thus highlight the continued need for future comparative trials evaluating cartilage restoration approaches such as MACI, OATS, AMIC, and scaffold/extracellular chondral matrix augments in treating chondral lesions.
This present fragility analysis included RCTs from 2010 to present across the PubMed, Embase, and Medline databases in adherence with the PRISMA guidelines. Our 2-directional fragility analysis demonstrated statistical fragility for significant and nonsignificant outcomes and took into account the sample size through the FQ metric. Our subgroup analysis by intervention comparisons, outcome category, and assessment of lost to follow-up among included RCTs further adds credence to the clinical implications of the statistical fragility identified in the marrow stimulation literature.
Given the statistical fragility identified across the orthopedic literature, future research should integrate the FI and FQ metrics in outcome reporting to aid in the interpretation of study findings. Furthermore, future studies should consider using additional statistical tools such as the MCID, substantial clinical benefit, patient-acceptable symptomatic state, and maximal outcome improvement to ensure clinically significant improvement from interventions employed. In the present fragility analysis, just 7 articles used such metrics to indicate clinically significant improvement. Consistent reporting of the MCID or other clinically significant outcome measures may allow for more effective evaluation of the extent of treatment effects. 32 Furthermore, these metrics will aid in standardizing assessment of patient improvement and ensuring evidence-based decision-making for management of cartilage defects of the knee.
Limitations
This fragility analysis is not without limitations. Our systematic review was limited to the available RCT literature with an intervention arm related to marrow stimulation for cartilage restoration of the knee. In addition, our fragility analysis was limited to dichotomous outcomes, thus leaving out continuous outcomes and trials with more than 2 intervention arms. Furthermore, while we categorized the extracted outcomes for subgroup analysis, there was heterogeneity across the studies in assessment of the outcomes (e.g., different timepoints of outcome assessment, different PROMs evaluated, different thresholds for volume of defect filling). This review also did not assess studies published prior to 2010, which may have limited the comprehensiveness of our fragility analysis of marrow stimulation RCTs. Finally, there are currently no FI and FQ thresholds set in the literature. However, Baer et al. 58 argue against setting a uniform threshold and instead recommend considering the clinical question being addressed and study design characteristics when interpreting fragility. Thus, a comprehensive analysis of outcome robustness should include the P-value, FI, and FQ metrics in conjunction with evaluation of study design quality and evidence of bias.
Conclusion
RCTs assessing marrow stimulation techniques in cartilage restoration of the knee are statistically fragile. Only 3 outcome event reversals may be sufficient to alter significance. We therefore recommend the combined reporting of FI and FQ metrics with P-values and clinically significant outcome metrics (e.g., the MCID) to ensure that clinicians are able to effectively interpret the robustness of outcomes reported in RCTs assessing knee cartilage restoration techniques.
Footnotes
Acknowledgments and Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical Approval
IRB approval was not required for this manuscript as only publicly available data was included in the investigation.
