Abstract

Objective

The purpose of this study was to utilize fragility analysis to assess the robustness of randomized controlled trials (RCTs) evaluating the management of articular cartilage defects of the knee. We hypothesize that the cartilage restorative literature will be fragile with the reversal of only a few outcome events required to change statistical significance.

Design

RCTs from 11 orthopedic journals indexed on PubMed from 2000 to 2020 reporting dichotomous outcome measures relating to the management of articular cartilage defects of the knee were included. The Fragility Index (FI) for each outcome was calculated through the iterative reversal of a single outcome event until significance was reversed. The Fragility Quotient (FQ) was calculated by dividing each FI by study sample size. Additional statistical analysis was performed to provide median FI and FQ across subgroups.

Results

Nineteen RCTs containing 60 dichotomous outcomes were included for analysis. The median FI and FQ of all outcomes were 4 (IQR 2-7) and 0.067 (IQR 0.034-0.096), respectively. The average number of patients lost to follow-up (LTF) was 3.9, with 15.8% of the included studies reporting LTF greater than or equal to 4, the median FI of all included outcomes.

Conclusions

The orthopedic literature evaluating articular cartilage defects of the knee is fragile as the reversal of relatively few outcome events may alter the significance of statistical findings. We therefore recommend comprehensive fragility analysis and triple reporting of the P value, FI, and FQ to aid in the interpretation and contextualization of clinical findings reported in the cartilage restoration literature.

Keywords

articular cartilage, knee, fragility index, fragility quotient, cartilage restoration, statistical significance

Introduction

With the 20th century came a wave of scientific and technologic discoveries that would help shape modern medicine. With this came the recognition that physicians lacked the tools to efficiently evaluate the literature and incorporate it into their practice. Evidence-based medicine (EBM) emerged as an effective paradigm for connecting research to clinical practice and is now an integral component of medical training programs.¹ This training emphasizes an understanding of foundational statistical methods required for assessing the significance of research findings. While several methods exist, the P value is most commonly utilized to determine the statistical significance of clinical findings. If this value falls below a certain probability threshold (P < 0.05), researchers conclude a finding to be statistically significant. Thus, P values play an integral role in the resultant clinical recommendations derived from medical research.

Despite its ubiquity, the P value provides an incomplete picture of the significance of clinical findings. It does not provide any information regarding the magnitude of an effect and thus, as a single statistical metric, may fall short in its attempt to provide meaningful insight into the clinical relevance of a statistical finding.² Furthermore, small study size or large loss to follow-up may further challenge the robustness of the resultant P value. While confidence intervals have been proposed to address some of the issues surrounding P values and hypothesis testing, they are not without their limitations.^3-5 Confidence intervals often prove cumbersome and may be challenging to interpret with the majority of physicians lacking appropriate statistical training. The Fragility Index (FI) has emerged as a statistical complement to the P value while adding depth to its interpretation. The FI is determined by calculating the absolute number of outcome events required to reverse the statistical significance of a particular outcome. Ahmed et al.⁶ expanded on this concept by introducing the Fragility Quotient (FQ), which divides the FI by the study’s sample size in order to provide a relative measure of fragility. These statistics enhance understanding of P values by providing insight into the robustness of a study’s conclusion; a small FI or FQ will call into question the validity of a study’s finding, while larger values will strengthen it.

Since the concept of fragility was originally described in 1990,⁷ it has been used across medical specialties to help readers evaluate the strength of findings in dichotomous comparative trials.^8-11 Within the orthopedic literature, FI and FQ have been used to evaluate the findings of a multitude of subspecialties, from pediatric trauma to orthopedic oncology.^12-18 The majority of these studies have demonstrated significant statistical fragility, thus calling into question the robustness of study conclusions. However, the fragility of studies pertaining to articular cartilage defects of the knee has yet to be explored in the literature. The purpose of this study was to utilize fragility analysis to assess the robustness of randomized controlled trials evaluating the management of articular cartilage defects of the knee. We hypothesize the cartilage restorative literature will be fragile with the reversal of only a few outcome events required to change statistical significance.

Methods

This systematic review was performed in accordance with PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. Randomized controlled trials (RCTs) from the PubMed database related to the treatment of articular cartilage defects of the knee and published in 1 of 11 high-impact orthopedic journals from January 1, 2000 to October 10, 2020 were evaluated. The 11 journals were the American Journal of Sports Medicine (AJSM); Cartilage; Arthroscopy; Journal of Bone and Joint Surgery (JBJS); Journal of the American Academy of Orthopaedic Surgeons (JAAOS); Journal of Knee Surgery; Journal of Pediatric Orthopaedics; Knee Surgery, Sports Traumatology, Arthroscopy (KSSTA); Knee; Orthopaedic Journal of Sports Medicine (OJSM); and The Physician and Sportsmedicine. These journals were chosen given their consistent inclusion of research pertaining to the knee as well as their particular prominence within the orthopedic literature; the impact factors for included journals can be found in Table 1. The full study search strategy can be found in the online Supplemental Appendix. All RCTs pertaining to cartilage disorders of the knee that reported dichotomous categorical outcomes with P values were included in this study. All lengths of follow-up were eligible for inclusion in this study. Animal studies, cadaveric studies, biomechanical studies, case reports, systematic reviews, meta-analyses, nondichotomous studies, and non-RCTs were excluded from this systematic review. For studies meeting these criteria, the following information was extracted: study authors, journal, publication year, PMID, study design, loss to follow-up (LTF), number and type of study outcome (primary vs. secondary), and the P value associated with each outcome. Extraction was performed by a single author (MC) and verified by 2 coauthors (RLP and DPT).

Bias was assessed for each study using the Cochrane Collaboration’s tool for assessing risk of bias in randomized trials.¹⁹ This bias assessment is based on 7 guiding principles: (1) do not use quality scales; (2) focus on internal validity; (3) assess the risk of bias in trial results, not the quality of reporting or methodological problems that are not directly related to risk of bias; (4) assessment of risk of bias requires judgment; (5) choose domains to be assessed based on a combination of theoretical and empirical considerations; (6) focus on risk of bias in the data as represented in the review rather than as originally reported; and (7) report outcome-specific evaluations of risk of bias. In addition to these guiding principles, the risk of bias assessment includes specific evaluation of random sequence generation, allocation concealment, performance bias, detection bias, attrition bias, reporting bias, and other bias (Table 2).

Table 1.

Included Journals with Associated 2020 Impact Factors.

Journal	Impact Factor
American Journal of Sports Medicine	5.810
Journal of Bone and Joint Surgery	4.578
Arthroscopy	4.325
Cartilage	3.857
Knee Surgery, Sports Traumatology, Arthroscopy	3.166
Orthopaedic Journal of Sports Medicine	2.492
Journal of the American Academy of Orthopaedic Surgeons	2.286
Journal of Knee Surgery	1.986
Knee	1.913
Journal of Pediatric Orthopaedics	1.909
The Physician and Sportsmedicine	1.662

Table 2.

Risk of Bias Assessment.^a

Study	Random Sequence Generation (Selection Bias)	Allocation Concealment (Selection Bias)	Blinding of Participants and Researchers (Performance Bias)	Blinding of Outcome Assessment (Detection Bias)	Incomplete Outcome Data (Attrition Bias)	Selective Reporting (Reporting Bias)	Other Bias
Gudas et al. (2009)²⁷	⊕	?	⊕	⊖	⊕	⊕	⊕
Gudas et al. (2005)³⁴	?	⊕	⊕	⊕	⊕	⊕	⊕
Zorzi et al. (2007)³⁰	⊕	⊕	⊕	⊕	?	⊕	?
Gudas et al. (2012)³³	?	⊕	⊕	?	⊕	⊕	⊕
Kim et al. (2020)³⁶	⊕	⊕	⊕	⊕	⊖	⊕	⊕
Spahn et al. (2016)²⁸	⊕	⊕	⊕	?	?	⊕	?
Knutsen et al. (2016)²³	?	⊕	?	?	⊕	⊕	⊕
Solheim et al. (2018)³¹	⊕	⊕	?	⊕	⊕	⊕	?
Bartlett et al. (2005)²⁴	⊕	⊕	?	⊖	⊖	⊕	?
Spahn et al. (2010)³⁷	⊕	⊕	⊕	⊕	⊕	?	?
Zeifang et al. (2010)³²	⊕	⊕	?	⊕	?	⊕	?
Saris et al. (2014)²¹	⊕	⊕	⊖	?	⊕	⊕	?
Townshend et al. (2009)²⁶	?	⊕	⊕	⊕	⊕	⊕	⊕
Spahn et al. (2008)²⁹	⊕	⊕	?	⊕	⊕	⊕	?
Saris et al. (2008)²⁰	⊕	⊕	⊕	⊖	⊕	⊕	⊕
Hoogeslag et al. (2010)³⁵	?	⊕	⊕	⊕	⊕	⊕	⊕
Bentley et al. (2012)²⁵	⊕	⊕	⊖	?	?	⊕	⊕
Yoon et al. (2020)³⁸	⊕	⊕	?	⊕	?	⊕	?
Lee et al. (2019)²²	⊕	⊕	⊕	⊕	⊕	⊕	⊕

⊕ = low risk; ? = uncertain risk; ⊖ = high risk.

Risk represents overall assessment of risk across multiple outcome groups.

Fragility analysis was performed through the manipulation of a 2-by-2 contingency table containing the dichotomous outcome events extracted from comparative trials. For each statistically significant result (P < 0.05), a single outcome event was reversed at a time until the result was no longer statistically significant (P ≥ 0.05) (Table 3). The same process was applied to initially nonsignificant results, with events reversed until the result became statistically significant (P ≥ 0.05 to P < 0.05). The number of event reversals required to change the significance of a result was recorded as the FI of the outcome. An FQ was then generated by dividing each outcome’s FI by the sample size of its study. The median FI and FQ across all outcome measures were calculated, representing the fragility measures for our entire study. We further stratified FI and FQ results by the following prespecified subgroups: initial significance (P ≥ 0.05 vs. P < 0.05), outcome type (primary vs. secondary), and year of publication. Two-tailed Fisher exact tests were used to verify the P values reported in each study as well as to generate P values during FI analysis. Interquartile ranges (IQRs) were calculated for all reported FI and FQ data.

Table 3.

Demonstration of the Reversal of Statistical Significance with Resultant Fragility Index (FI) = 1.

Before reversal	Outcome A	Outcome B
Treatment A	23	75
Treatment B	11	87
P value		0.04

After reversal of 1 event	Outcome A	Outcome B
Treatment A	23	75
Treatment B	12	86
P value		0.06
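
The reversal procedure described in the Methods and illustrated in Table 3 can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code. The function names are hypothetical, the two-sided Fisher exact P value is taken as the sum of the probabilities of all tables no more likely than the observed one, and the choice of which cell to alter (the arm with the lower event proportion) is an assumption, since the text does not specify it.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in the first arm.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum every table that is no more likely than the observed one.
    p = sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def fragility_index(a, b, c, d, alpha=0.05):
    """Count single-event reversals needed to flip significance at alpha.

    Assumption: events are changed in the arm with the lower event
    proportion (added if the result starts significant, removed otherwise).
    Returns None if significance cannot be flipped this way.
    """
    significant = fisher_two_sided(a, b, c, d) < alpha
    step = 1 if significant else -1
    flips = 0
    while (fisher_two_sided(a, b, c, d) < alpha) == significant:
        if a * (c + d) <= c * (a + b):  # first arm has the lower proportion
            a, b = a + step, b - step
        else:
            c, d = c + step, d - step
        if min(a, b, c, d) < 0:
            return None
        flips += 1
    return flips


def fragility_quotient(a, b, c, d, alpha=0.05):
    """FI divided by the total sample size of the comparison."""
    fi = fragility_index(a, b, c, d, alpha)
    return None if fi is None else fi / (a + b + c + d)


# Table 3 example: Treatment A 23/75, Treatment B 11/87.
print(fragility_index(23, 75, 11, 87))
print(fragility_quotient(23, 75, 11, 87))
```

Applied to the counts in Table 3, a single reversal in Treatment B (11/87 to 12/86) is enough to move the P value across the 0.05 threshold, which is the FI of 1 shown in the table.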

Results

After confining PubMed results to our specified journal and publication date criteria, 48,881 studies were available for screening. Nineteen RCTs from 6 journals were eligible for inclusion and formed the basis of this study.^20-38 All included studies evaluated the surgical management of articular cartilage defects of the knee; the following interventions were utilized in at least one arm of any study: 7 cell-based therapies such as autologous chondrocyte implantation (ACI), 9 microfracture, 7 arthroscopic techniques (e.g., debridement, radiofrequency ablation), and 5 osteochondral autograft transplantation.

Of the 60 total outcome events reported across the included studies, 38% were statistically significant (P < 0.05) and 62% were not significant (Fig. 1). The median FI and FQ across all 60 outcome events were only 4 (IQR 2-7) and 0.067 (IQR 0.034-0.096), respectively. An FI ≤ 2 was determined for 30% of all outcome events: 7 with an FI of 1 and 11 with an FI of 2. Of the 60 total outcome events, 38% represented primary outcomes, 18% secondary outcomes, and 43% were specified as neither primary nor secondary. Primary outcomes demonstrated a median FI of 4 (IQR 2-8) and FQ of 0.067 (IQR 0.033-0.100). Secondary outcomes demonstrated a median FI of 2 (IQR 1-4) and FQ of 0.038 (IQR 0.0187-0.065). Outcomes not specified demonstrated a median FI of 6 (IQR 3-7) and FQ of 0.069 (IQR 0.040-0.096). A total of 23 outcomes (38%) initially reported as significant demonstrated a median FI and FQ of 3 (IQR 2-7) and 0.050 (IQR 0.028-0.088), respectively. A total of 37 outcomes (62%) initially reported as nonsignificant demonstrated a median FI and FQ of 6 (IQR 3-8) and 0.067 (IQR 0.034-0.098), respectively. Trials published between 2000 and 2010 represented 27% of the total outcome events with a median FI and FQ of 3 (IQR 2-4.5) and 0.058 (IQR 0.021-0.091), respectively. Trials published in the following decade, between 2011 and 2020, represented 73% of the total outcome events with a median FI and FQ of 5.5 (IQR 2.5-8) and 0.067 (IQR 0.035-0.098), respectively. A complete description of the fragility analysis can be found in Table 4. The average LTF across all trials was 3.9 patients. Of the 19 RCTs, 15.8% (3 studies) demonstrated LTF greater than or equal to the overall median FI of 4.

Figure 1.

PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flowchart.

Table 4.

Fragility Data Based on Trial and Outcome Characteristics.

Characteristic	Events	Fragility Index (IQR)	Fragility Quotient (IQR)
All trials	60	4 (2-7)	0.067 (0.034-0.096)
Outcome
Primary	23	4 (2-8)	0.067 (0.033-0.100)
Secondary	11	2 (1-4)	0.038 (0.017-0.065)
Not Specified	26	6 (3-7)	0.069 (0.040-0.096)
Reported P value
P < 0.05	23	3 (2-7)	0.050 (0.028-0.088)
P ≥ 0.05	37	6 (3-8)	0.067 (0.034-0.098)
Year of publication
2000-2010	16	3 (2-4.5)	0.058 (0.021-0.091)
2011-2020	44	5.5 (2.5-8)	0.067 (0.035-0.098)
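
The subgroup rows of Table 4 are medians with interquartile ranges of per-outcome FI and FQ values. A minimal sketch of that summary step is given below. The records are hypothetical placeholders and not data from the included trials, the helper names are illustrative, and the quartile convention (Python's "inclusive" method) is an assumption because the article does not state which one was used.

```python
from statistics import median, quantiles

# Hypothetical per-outcome records (NOT data from the included trials):
# (fragility index, study sample size, outcome type)
outcomes = [
    (1, 40, "primary"),
    (4, 60, "primary"),
    (8, 100, "primary"),
    (2, 50, "secondary"),
    (3, 80, "secondary"),
    (6, 90, "secondary"),
]


def summarize(values):
    """Return (median, Q1, Q3); the quartile method is an assumption."""
    if len(values) < 2:
        return median(values), None, None
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3


def by_subgroup(records):
    """Median (IQR) of FI and FQ for each outcome type."""
    groups = {}
    for fi, n, kind in records:
        # FQ is the FI divided by the sample size of its study.
        groups.setdefault(kind, []).append((fi, fi / n))
    return {
        kind: {
            "FI": summarize([fi for fi, _ in pairs]),
            "FQ": summarize([fq for _, fq in pairs]),
        }
        for kind, pairs in groups.items()
    }


for kind, stats in by_subgroup(outcomes).items():
    print(kind, stats)
```

The same grouping could be keyed on initial significance or publication decade to mirror the other blocks of Table 4.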

Bias Assessment

Fourteen studies (73.7%) were at low risk of selection bias due to inadequate random sequence generation and 5 (26.3%) were at uncertain risk. Eighteen studies (94.7%) were at low risk of selection bias due to inadequate allocation concealment and 1 (5.3%) was at uncertain risk. Based on the judgments in Table 2, performance bias was low risk in 11 studies (57.9%), uncertain in 6 (31.6%), and high in 2 (10.5%), while attrition bias was low risk in 12 (63.2%), uncertain in 5 (26.3%), and high in 2 (10.5%). Evaluation of detection bias showed similar results, with 11 (57.9%) low-risk, 5 (26.3%) uncertain-risk, and 3 (15.8%) high-risk studies. Reporting bias was generally low, with 18 (94.7%) studies showing low risk of bias and only 1 (5.3%) showing uncertain risk. Evaluation of additional biases not outlined in the tool showed 10 (52.6%) studies to be at low risk and 9 (47.4%) to be at uncertain risk.
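
The counts in this section are tallies of the Table 2 judgments over the 19 included studies. As a cross-check, the sketch below transcribes Table 2 into short strings and counts each domain. The encoding ("+" for low, "?" for uncertain, "-" for high risk) and the names `DOMAINS`, `TABLE_2`, and `tally` are illustrative choices and not part of the original analysis.

```python
from collections import Counter

# Column order of Table 2.
DOMAINS = [
    "random sequence generation",
    "allocation concealment",
    "performance",
    "detection",
    "attrition",
    "reporting",
    "other",
]

# Judgments transcribed from Table 2, one character per domain.
# "+" = low risk, "?" = uncertain risk, "-" = high risk.
TABLE_2 = {
    "Gudas 2009": "+?+-+++",
    "Gudas 2005": "?++++++",
    "Zorzi 2007": "++++?+?",
    "Gudas 2012": "?++?+++",
    "Kim 2020": "++++-++",
    "Spahn 2016": "+++??+?",
    "Knutsen 2016": "?+??+++",
    "Solheim 2018": "++?+++?",
    "Bartlett 2005": "++?--+?",
    "Spahn 2010": "+++++??",
    "Zeifang 2010": "++?+?+?",
    "Saris 2014": "++-?++?",
    "Townshend 2009": "?++++++",
    "Spahn 2008": "++?+++?",
    "Saris 2008": "+++-+++",
    "Hoogeslag 2010": "?++++++",
    "Bentley 2012": "++-??++",
    "Yoon 2020": "++?+?+?",
    "Lee 2019": "+++++++",
}


def tally(table):
    """Count low/uncertain/high judgments for each domain."""
    return {
        domain: Counter(row[i] for row in table.values())
        for i, domain in enumerate(DOMAINS)
    }


for domain, counts in tally(TABLE_2).items():
    n = sum(counts.values())
    summary = {k: f"{v} ({100 * v / n:.1f}%)" for k, v in sorted(counts.items())}
    print(domain, summary)
```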

Discussion

The purpose of this study was to utilize fragility analysis to assess the robustness of RCTs evaluating the management of articular cartilage defects of the knee. Fragility analysis revealed the median FI and FQ across all outcomes to be 4 and 0.067, respectively. These results indicate that the reversal of only 6.7 outcome events per 100 subjects is sufficient to alter the significance of a study’s findings. Furthermore, 30% of all outcomes demonstrated an FI of only 1 or 2. Fragility was demonstrated across all subgroups of analysis, with LTF meeting or exceeding the median FI in more than 15% of studies. Subgroup fragility analysis by year revealed increased trial robustness in the second decade of study (2011-2020), with a median FI of 5.5 versus only 3 for the previous decade (2000-2010).
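
The "per 100 subjects" reading of the FQ is a direct rescaling of the quotient. The short sketch below shows the arithmetic. The helper names are hypothetical and the trial size of 60 is an illustrative assumption, not a figure reported in the article.

```python
def reversals_per_100(fq):
    """Express a fragility quotient as event reversals per 100 subjects."""
    return round(fq * 100, 1)


def implied_reversals(fq, sample_size):
    """Approximate number of reversals an FQ implies for a given trial size."""
    return fq * sample_size


median_fq = 0.067  # overall median FQ reported in Table 4

print(reversals_per_100(median_fq))
print(implied_reversals(median_fq, 60))  # 60 is a hypothetical trial size
```

For a hypothetical 60-patient trial, an FQ of 0.067 corresponds to roughly 4 reversed events, in line with the overall median FI.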

Statisticians have been calling attention to the incompatibility of P values and hypothesis testing since their introduction in the early 20th century, yet the bond between these 2 statistical methods has only grown stronger.^39,40 This ubiquity, however, has attracted critical attention regarding the independent utilization of P values to ascertain statistical significance in medical research. Growing concern regarding the interpretation of P values in the research community led the American Statistical Association to release a statement clarifying the role of this controversial value in research.⁴¹ Through description of 6 principles, the authors highlighted some of the most common misuses and misconceptions surrounding P values. In addition to calling out the conflation of statistical significance with clinical significance, the authors noted that the statistic does not provide a complete picture of the evidence surrounding a hypothesis, and that overuse of P values “as a license for making a claim of a scientific finding leads to considerable distortion of the scientific process.”⁴¹ Finally, primary P value utilization has led publications to preferentially accept studies reporting significant results and may influence researchers to manipulate data in order to achieve such desired results.⁴²

Several alternative methods for interpreting study results have been proposed given the problematic nature of the P value. Confidence intervals (CIs) have received the support of a number of researchers who believe that their emphasis on estimation rather than a precise threshold for significance would better encapsulate a study’s results.^5,43,44 The use of CIs has not been universally supported to replace the status quo, however, and many believe its interpretation to be less straightforward than the P value and thus less likely to be adopted by the medical community.⁴⁵ An alternative suggestion is simply lowering the standard threshold for significance (i.e., P < 0.05 to P < 0.001), with advocates of the approach believing it may improve reproducibility of studies and lower the rate of false positives.^46,47 However, critics point out that this change would result in more false negatives and fails to address publication and other biases. Furthermore, isolated P value analysis does not address the context in which a study occurs, which has been described as the interplay between a study’s sample size and effect size.⁴⁸

FI and FQ are 2 metrics that can address the context in which a P value is reported by providing readers a sense of the stability of a given P value. Walsh et al.¹¹ pioneered this approach in 2014 when they reviewed 399 RCTs with a median sample size of 682, reporting a median FI of 8. They further discovered that the number of patients LTF was greater than the FI calculated for over 50% of the included RCTs, indicating that claims of significance had the potential to be reversed if all patients were appropriately maintained to follow-up. Shochet et al.⁸ described similar results in their study of 127 RCTs in nephrology, with a median FI of 3 and over 40% of studies revealing an LTF greater than their respective FI. As the use of the FI to aid in interpretation of study results has grown, Ahmed et al.⁶ noted that the absolute nature of the FI makes it difficult to compare across studies. They therefore proposed dividing the FI by the study’s sample size in order to generate the FQ, a relative indication of robustness that can be interpreted across studies. The FI and FQ have since been adopted by a number of researchers in an attempt to add clarity to the interpretation of results.^49-51

The FI and FQ have also been utilized for orthopedic research, often illustrating a body of literature whose results are far from robust. A recent article by Forrester et al.¹⁵ of 23 clinical trials in orthopedic oncology reported a median FI of 2, which the authors attributed to small sample sizes and large losses to follow-up. In a similar study examining 48 RCTs in sports medicine, Khan et al.¹³ found results to be fragile, with a median FI of 2 and loss to follow-up exceeding a study’s FI for nearly half (48%) of outcomes. Parisien et al.¹⁴ reported the same trend in the orthopedic trauma literature, where analysis of 198 comparative studies revealed a median FI of 5 and an FQ of 0.046. These studies show that the fragility of clinical findings is pervasive in the orthopedic literature and emphasize the importance of the FI and FQ as tools in critically assessing a study’s findings. This is the first study to examine the fragility of literature specifically evaluating articular cartilage defects of the knee. While many nonoperative and operative treatments exist to address these conditions, the findings of our study suggest the literature is fragile and highlight the importance of expanding clinicians’ ability to critically interpret the clinical significance of study findings.

Our study has several strengths, beginning with our methodology. We identified 11 prominent journals known to include high-quality studies on articular cartilage defects of the knee, ensuring that we identified the studies with the highest impact on treatment of these disorders. In addition to FI, we derived the FQ of included studies in order to address the effect of sample size on FI. Finally, our inclusion of primary, secondary, and unspecified outcomes, along with both significant and nonsignificant outcomes, further strengthens our study by allowing a more complete assessment of the literature. However, one limitation inherent to fragility analysis is that it applies only to dichotomous comparative trials, leaving trials that report continuous outcomes unevaluated. Furthermore, P value and fragility analyses provide an understanding of statistical significance but are not sufficient to determine clinical significance. The clinician must incorporate other information, such as the qualitative integrity of study design and the presence or absence of bias, when evaluating clinical research findings.

Conclusion

Supplemental Material

Supplemental material, sj-pdf-1-car-10.1177_19476035211012458 for The Fragility of Statistical Significance in Cartilage Restoration of the Knee: A Systematic Review of Randomized Controlled Trials by Robert L. Parisien, Michael Constant, Bryan M. Saltzman, Charles A. Popkin, Christopher S. Ahmad, Xinning Li and David P. Trofa in CARTILAGE

Footnotes

Supplementary material for this article is available on the Cartilage website.

Acknowledgments and Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Declaration of Conflicting Interests

The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Robert L. Parisien: American Orthopaedic Society for Sports Medicine: Board or committee member; Arthroscopy: Editorial or governing board; Society of Military Orthopaedic Surgeons: Board or committee member. Bryan M. Saltzman: Nova Science Publishers: Publishing royalties, financial or material support. Charles A. Popkin: Arthrex, Inc: Other financial or material support; research support; Smith & Nephew: Other financial or material support; USA Hockey Safety and Protective Equipment Committee: Board or committee member. Christopher S. Ahmad: Arthrex, Inc.: IP royalties; paid consultant; research support; At Peak: Stock or stock options; Lead Player: Publishing royalties, financial or material support; Major League Baseball: Research support; Orthopedics Today: Editorial or governing board; Stryker: Research support. Xinning Li: AAOS: Board or committee member; American Journal of Sports Medicine: Editorial or governing board; American Shoulder and Elbow Surgeons: Board or committee member; Arthroscopy Association of North America: Board or committee member; BMC Musculoskeletal Disorders: Editorial or governing board; FH Ortho: IP royalties; paid consultant; Journal of Bone and Joint Surgery–American: Editorial or governing board; Journal of Medical Insight (JOMI): Editorial or governing board; Orthopedic Reviews: Editorial or governing board; World Journal of Orthopaedics: Editorial or governing board. David P. Trofa: Education support from Arthrex, Gotham Surgical Solutions & Devices, Peerless Surgical, Smith & Nephew; Grants from Smith & Nephew, Arthrex.

References

1. Evidence-Based Medicine Working Group. Evidence-based medicine: a new approach to teaching the practice of medicine. JAMA. 1992;268(17):2420-5. doi:10.1001/jama.1992.03490170092032

2. Goodman. A dirty dozen: twelve P-value misconceptions. Semin Hematol. 2008;45(3):135-40. doi:10.1053/j.seminhematol.2008.04.003

3. Morey, Hoekstra, Rouder, Lee, Wagenmakers EJ. The fallacy of placing confidence in confidence intervals. Psychon Bull Rev. 2016;23(1_suppl):103-23. doi:10.3758/s13423-015-0947-8

4. Gardner, Altman DG. Confidence intervals rather than P values: estimation rather than hypothesis testing. Br Med J (Clin Res Ed). 1986;292(6522):746-50. doi:10.1136/bmj.292.6522.746

5. Simon. Confidence intervals for reporting results of clinical trials. Ann Intern Med. 1986;105(3):429-35. doi:10.7326/0003-4819-105-3-429

6. Ahmed, Fowler, McCredie VA. Does sample size matter when interpreting the fragility index? Crit Care Med. 2016;44(11):e1142-e1143. doi:10.1097/CCM.0000000000001976

7. Feinstein AR. The unit fragility index: an additional appraisal of “statistical significance” for a contrast of two proportions. J Clin Epidemiol. 1990;43(2):201-9. doi:10.1016/0895-4356(90)90186-S

8. Shochet, Kerr, Polkinghorne KR. The fragility of significant results underscores the need of larger randomized controlled trials in nephrology. Kidney Int. 2017;92(6):1469-75. doi:10.1016/j.kint.2017.05.011

9. Matics, Khan, Jani, Kane. The Fragility Index in a cohort of pediatric randomized controlled trials. J Clin Med. 2017;6(8):79. doi:10.3390/jcm6080079

10. Ridgeon, Young, Bellomo, Mucchetti, Lembo, Landoni. The fragility index in multicenter randomized controlled critical care trials. Crit Care Med. 2016;44(7):1278-84. doi:10.1097/CCM.0000000000001670

11. Walsh, Srinathan, McAuley, Mrkobrada, Levine, Ribic, et al. The statistical significance of randomized controlled trial results is frequently fragile: a case for a Fragility Index. J Clin Epidemiol. 2014;67(6):622-8. doi:10.1016/j.jclinepi.2013.10.019

12. Parisien, Trofa, Dashe, Cronin, Curry, et al. Statistical fragility and the role of P values in the sports medicine literature. J Am Acad Orthop Surg. 2019;27(7):e324-e329. doi:10.5435/JAAOS-D-17-00636

13. Khan, Evaniew, Gichuru, Habib, Ayeni, Bedi, et al. The fragility of statistically significant findings from randomized trials in sports surgery: a systematic survey. Am J Sports Med. 2017;45(9):2164-70. doi:10.1177/0363546516674469

14. Parisien, Dashe, Cronin, Bhandari, Tornetta. Statistical significance in trauma research. J Orthop Trauma. 2019;33(12):e466-70. doi:10.1097/BOT.0000000000001595

15. Forrester, Jang, Lawson, Capi, Tyler WK. Statistical fragility of surgical and procedural clinical trials in orthopaedic oncology. JAAOS Glob Res Rev. 2020;4(6):e1900152. doi:10.5435/JAAOSGlobal-D-19-00152

16. Evaniew, Files, Smith, Bhandari, Ghert, Walsh, et al. The fragility of statistically significant findings from randomized trials in spine surgery: a systematic survey. Spine J. 2015;15(10):2188-97. doi:10.1016/j.spinee.2015.06.004

17. Khormaee, Choe, Ruzbarsky, Agarwal, Blanco, Doyle, et al. The fragility of statistically significant results in pediatric orthopaedic randomized controlled trials as quantified by the Fragility Index. J Pediatr Orthop. 2018;38(8):e418-23. doi:10.1097/BPO.0000000000001201

18. Checketts, Scott, Meyer, Horn, Jones, Vassar. The robustness of trials that guide evidence-based orthopaedic surgery. J Bone Joint Surg Am. 2018;100(12):e85. doi:10.2106/JBJS.17.01039

19. Higgins JPT, Altman, Gøtzsche, Jüni, Moher, Oxman, et al. The Cochrane Collaboration’s tool for assessing risk of bias in randomised trials. BMJ. 2011;343(7829):d5928. doi:10.1136/bmj.d5928

20. Saris DBF, Vanlauwe, Victor, Haspl, Bohnsack, Fortems, et al. Characterized chondrocyte implantation results in better structural repair when treating symptomatic cartilage defects of the knee in a randomized controlled trial versus microfracture. Am J Sports Med. 2008;36(2):235-46. doi:10.1177/0363546507311095

21. Saris, Price, Widuchowski, Bertrand-Marchand, Caron, Drogset, et al. Matrix-applied characterized autologous cultured chondrocytes versus microfracture: two-year follow-up of a prospective randomized trial. Am J Sports Med. 2014;42(6):1384-94. doi:10.1177/0363546514528093

22. Lee, Moon, Lee, Kwon. Intra-articular injection of type I atelocollagen to alleviate knee pain: a double-blind, randomized controlled trial. Cartilage. Published online August 2, 2019. doi:10.1177/1947603519865304

23. Knutsen, Drogset, Engebretsen, Grøntvedt, Ludvigsen, Løken, et al. A randomized multicenter trial comparing autologous chondrocyte implantation with microfracture: long-term follow-up at 14 to 15 years. J Bone Joint Surg Am. 2016;98(16):1332-9. doi:10.2106/JBJS.15.01208

24. Bartlett, Skinner, Gooding, Carrington RWJ, Flanagan, Briggs TWR, et al. Autologous chondrocyte implantation versus matrix-induced autologous chondrocyte implantation for osteochondral defects of the knee. A prospective, randomised study. J Bone Joint Surg Br. 2005;87(5):640-5. doi:10.1302/0301-620X.87B5.15905

25. Bentley, Biant, Vijayan, Macmull, Skinner, Carrington RWJ. Minimum ten-year results of a prospective randomised study of autologous chondrocyte implantation versus mosaicplasty for symptomatic articular cartilage lesions of the knee. J Bone Joint Surg Br. 2012;94-B(4):504-9. doi:10.1302/0301-620X.94B4.27495

26. Townshend, Emmerson, Jones, Partington, Muller. Intra-articular injection versus portal infiltration of 0.5% bupivacaine following arthroscopy of the knee: a prospective, randomised double-blinded trial. J Bone Joint Surg Br. 2009;91(5):601-3. doi:10.1302/0301-620X.91B5.21932

27. Gudas, Simonaityte, Čekanauskas, Tamošiunas. A prospective, randomized clinical study of osteochondral autologous transplantation versus microfracture for the treatment of osteochondritis dissecans in the knee joint in children. J Pediatr Orthop. 2009;29(7):741-8. doi:10.1097/BPO.0b013e3181b8f6c7

28. Spahn, Hofmann, von Engelhardt LV. Mechanical debridement versus radiofrequency in knee chondroplasty with concomitant medial meniscectomy: 10-year results from a randomized controlled study. Knee Surg Sport Traumatol Arthrosc. 2016;24(5):1560-8. doi:10.1007/s00167-015-3810-6

29. Spahn, Kahl, Mückley, Hofmann, Klinger HM. Arthroscopic knee chondroplasty using a bipolar radiofrequency-based device compared to mechanical shaver: results of a prospective, randomized, controlled study. Knee Surg Sport Traumatol Arthrosc. 2008;16(6):565-73. doi:10.1007/s00167-008-0506-1

30.

Zorzi

Dall’Oca

Cadossi

Setti

Effects of pulsed electromagnetic fields on patients’ recovery after arthroscopic surgery: prospective, randomized and double-blind study. Knee Surg Sport Traumatol Arthrosc. 2007;15(7):830-4. doi:10.1007/s00167-007-0298-8

31.

Solheim

Hegna

Strand

Harlem

Inderhaug

Randomized study of long-term (15-17 years) outcome after microfracture versus mosaicplasty in knee articular cartilage defects. Am J Sports Med. 2018;46(4):826-31. doi:10.1177/0363546517745281

32.

Zeifang

Oberle

Nierhoff

Richter

Moradi

Schmitt

Autologous chondrocyte implantation using the original periosteum-cover technique versus matrix-associated autologous chondrocyte implantation: a randomized clinical trial. Am J Sports Med. 2010;38(5):924-33. doi:10.1177/0363546509351499

33. Gudas, Gudaite, Pocius, Gudiene, Cekanauskas, Monastryreckiene, et al. Ten-year follow-up of a prospective, randomized clinical study of mosaic osteochondral autologous transplantation versus microfracture for the treatment of osteochondral defects in the knee joint of athletes. Am J Sports Med. 2012;40(11):2499-508. doi:10.1177/0363546512458763

34. Gudas, Kalesinskas, Kimtys, Stankevicius, Toliusis, Bernotavicius, et al. A prospective randomized clinical study of mosaic osteochondral autologous transplantation versus microfracture for the treatment of osteochondral defects in the knee joint in young athletes. Arthroscopy. 2005;21(9):1066-75. doi:10.1016/j.arthro.2005.06.018

35. Hoogeslag RAG, Brouwer, Van Raay JJAM. The value of tourniquet use for visibility during arthroscopy of the knee: A double-blind, randomized controlled trial. Arthroscopy. 2010;26(9 Suppl 1):S67-72. doi:10.1016/j.arthro.2009.12.008

36. Kim, Chun, Wang, Kim, Kang, Yoo, et al. Microfractures versus a porcine-derived collagen-augmented chondrogenesis technique for treating knee cartilage defects: a multicenter randomized controlled trial. Arthroscopy. 2020;36(6):1612-24. doi:10.1016/j.arthro.2019.11.110

37. Spahn, Klinger, Mückley, Hofmann GO. Four-year results from a randomized controlled study of knee chondroplasty with concomitant medial meniscectomy: mechanical debridement versus radiofrequency chondroplasty. Arthroscopy. 2010;26(9 Suppl 1):S73-S80. doi:10.1016/j.arthro.2010.02.030

38. Yoon, Yoo, Choi, Lee, Kim, et al. Costal chondrocyte–derived pellet-type autologous chondrocyte implantation versus microfracture for repair of articular cartilage defects: a prospective randomized trial. Cartilage. Published online June 1, 2020. doi:10.1177/1947603520921448

39. Fisher. Statistical methods for research workers. In: Breakthroughs in Statistics. New York, NY: Springer; 1992:66-70. doi:10.1007/978-1-4612-4380-9_6

40. Neyman, Pearson ES. IX. On the problem of the most efficient tests of statistical hypotheses. Philos Trans R Soc London Ser A. 1933;231(694-706):289-337. doi:10.1098/rsta.1933.0009

41. Wasserstein, Lazar NA. The ASA statement on p-values: context, process, and purpose. Am Stat. 2016;70(2):129-33. doi:10.1080/00031305.2016.1154108

42. Bedi, Feeley, Williams RJ. Management of articular cartilage defects of the knee. J Bone Joint Surg Am. 2010;92(4):994-1009. doi:10.2106/JBJS.I.00895

43. Gardner, Altman DG. Statistics in medicine confidence intervals rather than P values: estimation rather than hypothesis testing. BMJ. 1986;292:746-50.

44. Rothman. A show of confidence. N Engl J Med. 1978;299(24):1362-3. doi:10.1056/nejm197812142992410

45. Morey, Hoekstra, Rouder, Lee, Wagenmakers EJ. The fallacy of placing confidence in confidence intervals. Psychon Bull Rev. 2016;23(1_suppl):103-23. doi:10.3758/s13423-015-0947-8

46. Benjamin, Berger, Johannesson, Nosek, Wagenmakers, Berk, et al. Redefine statistical significance strength of evidence from P values. Nat Hum Behav. 2017;2(1_suppl):6-10. doi:10.1038/s41562-017-0189-z

47. Ioannidis JPA. The proposal to lower P value thresholds to .005. JAMA. 2018;319(14):1429-30. doi:10.1001/jama.2018.1536

48. Betensky RA. The P-value requires context, not a threshold. Am Stat. 2019;73(Suppl 1):115-7. doi:10.1080/00031305.2018.1529624

49. Chase Kruse, Matt Vassar. Unbreakable? An analysis of the fragility of randomized trials that support diabetes treatment guidelines. Diabetes Res Clin Pract. 2017;134:91-105. doi:10.1016/j.diabres.2017.10.007

50. Tignanelli, Napolitano LM. The Fragility Index in randomized clinical trials as a means of optimizing patient care. JAMA Surg. 2019;154(1_suppl):74-9. doi:10.1001/jamasurg.2018.4318

51. Wayant, Meyer, Gupton, Som, Baker, Vassar. The Fragility Index in a cohort of HIV/AIDS randomized controlled trials. J Gen Intern Med. 2019;34(7):1236-43. doi:10.1007/s11606-019-04928-5
