Abstract
Background:
Randomized controlled trials (RCTs) stand atop the evidence-based hierarchy of study designs for their ability to arrive at results with the lowest risk of bias. Even for RCTs, however, critical appraisal is essential before applying results to clinical practice.
Purpose:
To analyze the quality of reporting of RCTs published in The American Journal of Sports Medicine (AJSM) from 1990 to 2020 and to identify trends over time and areas of improvement for future trials.
Study Design:
Systematic review; Level of evidence, 1.
Methods:
We queried the AJSM database for RCTs published between January 1990 and December 2020. Data pertaining to study characteristics were recorded. Quality assessments were conducted using the Detsky quality-of-reporting index and the modified Cochrane risk-of-bias (mROB) tool. Univariate and multivariable models were generated to identify factors associated with study quality. The Fragility Index was calculated for eligible studies.
Results:
A total of 277 RCTs were identified with a median sample size of 70 patients. A total of 19 RCTs were published between 1990 and 2000 (t1); 82 RCTs between 2001 and 2010 (t2); and 176 RCTs between 2011 and 2020 (t3). From t1 to t3, significant increases were observed in the overall mean-transformed Detsky score (from 68.2% ± 9.8% to 87.4% ± 10.2%, respectively; P < .001) and mROB score (from 4.7 ± 1.6 to 6.9 ± 1.6, respectively; P < .001). Multivariable regression analysis revealed that follow-up periods of <5 years, clearly stated primary outcomes, and a focus on the elbow, shoulder, or knee were associated with higher mean-transformed Detsky and mROB scores. The median Fragility Index was 2 (interquartile range, 0-5) for trials with statistically significant findings. Studies with small sample sizes (<100 patients) were more likely to have low Fragility Index scores and less likely to have a statistically significant finding in any outcome.
Conclusion:
The quantity and quality of RCTs published in AJSM increased over the past 3 decades. However, single-center trials with small sample sizes were prone to fragile results.
When making treatment decisions, orthopaedic surgeons must consider patient preferences and values, along with their own clinical experience and expertise, all integrated with the best available evidence. Atop the hierarchy of study designs sits the randomized controlled trial (RCT), as it is thought to minimize bias and to control for confounding factors. 28 Over time, there has been a shift in the orthopaedic sports community away from anecdote and opinion toward evidence-based medicine, with increasing demand that treatments be based on the best evidence, ideally derived from RCTs. 27
Previous studies have demonstrated a higher level of evidence within sports medicine literature compared with other orthopaedic and surgical subspecialties, 33 with a greater proportion of randomized and prospective study designs. 4,5,9,10 However, quantity does not necessarily equal quality, and the strength of conclusions drawn from this literature may be compromised by conflicting evidence from small, underpowered trials or those of poor methodological quality. 14,26,34 Accordingly, critical appraisal of the literature is an essential step before making inferences from study results and applying them to clinical practice.
The purpose of the present study was to identify and examine the quality of all RCTs published in The American Journal of Sports Medicine (AJSM) from 1990 to 2020 and to identify trends in study quality over time and areas of improvement for future clinical trials. The Fragility Index—a measurement of the robustness of statistically significant findings—and its associated variables were another important outcome. 30 It was hypothesized that the quality of RCTs in AJSM would have increased over the past 3 decades and that the Fragility Index value would be superior to that of other published RCTs in orthopaedic sports medicine.
Methods
Study Selection and Data Extraction
A search was conducted on the AJSM website (http://www.ajsm.org) for RCTs published between January 1990 and December 2020. All other study types (cohort studies, case-control studies, case series, case reports, meta-analyses, and reviews) were excluded. Two investigators (A.S., L.L.) independently reviewed eligible trial abstracts to identify trials with patients randomly allocated to interventions. The abstract screening was then followed by a full-text review. Discrepancies between reviewers were resolved by consensus discussion, involving independent review by the senior authors (D.B.W., G.H.) when an agreement could not be reached.
The following variables were extracted from each included RCT: first author’s profession; study type; cited statistical support or support by an epidemiology department; location of the trial; whether it was multicentered; financial support; body region; category of intervention; prior trial registration (protocols cross-referenced with ClinicalTrials.gov for outcomes); allocation concealment; blinding of outcome assessors; and statistically significant (P < .05) findings.
Quality Assessment
The quality assessments for each study were conducted independently by 2 research associates (A.S., L.L.), with discrepancies resolved by consensus agreement after discussion or independent review by the senior authors. Trials were reviewed using the Detsky quality-of-reporting index and the modified Cochrane risk-of-bias (mROB) tool, which were considered the 2 primary outcome measures. 17 The Detsky score evaluates the quality of reporting based on 14 questions covering 5 categories, each worth 4 points for a total possible score of 20 (Supplemental Table S1, available separately). 11 The score was then converted into a percentage (mean-transformed Detsky score). Studies scoring >75% on the transformed score were considered high quality.
The mROB assessment evaluates the methodological quality of the study based on the following 10 categories: (1) randomization; (2) allocation concealment; (3) orthopaedic surgeon or treatment provider blinding; (4) assessor blinding; (5) patient blinding; (6) patient follow-up; (7) selective outcome reporting; (8) objectivity of outcomes; (9) adequate sample size; and (10) orthopaedic surgeon experience with treatment. The maximum score on this scale is 10 points, indicating a low risk of bias. Trials scoring ≥8 of 10 points on the mROB assessment were considered high quality.
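The two high-quality thresholds described above can be illustrated with a minimal Python sketch. The function and constant names are hypothetical and are not part of the study's analysis (which was performed in SAS); the sketch assumes the Detsky maximum of 20 points and the mROB maximum of 10 points stated in the text.

```python
# Illustrative sketch only; names are assumptions, not the study's code.
DETSKY_MAX_POINTS = 20  # total possible Detsky score stated in the text
MROB_MAX_POINTS = 10    # maximum mROB score stated in the text


def transformed_detsky(raw_points, max_points=DETSKY_MAX_POINTS):
    """Convert a raw Detsky score to a percentage of the maximum score."""
    if not 0 <= raw_points <= max_points:
        raise ValueError("raw_points must lie between 0 and max_points")
    return 100.0 * raw_points / max_points


def is_high_quality_detsky(raw_points, max_points=DETSKY_MAX_POINTS):
    """High quality: transformed Detsky score strictly above 75%."""
    return transformed_detsky(raw_points, max_points) > 75.0


def is_high_quality_mrob(points):
    """High quality: at least 8 of 10 mROB points."""
    if not 0 <= points <= MROB_MAX_POINTS:
        raise ValueError("points must lie between 0 and 10")
    return points >= 8
```

Under these thresholds, a trial with 15 of 20 Detsky points (exactly 75%) would not be classified as high quality, whereas a trial with 16 of 20 points (80%) would.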
Fragility Index
Studies with a statistically significant finding in any reported dichotomous outcome were selected for the Fragility Index calculation. The Fragility Index for each outcome was calculated according to the method described by Walsh et al 30 using 2 × 2 contingency tables. The P value for each outcome was first recalculated using a 2-sided Fisher exact test. We then added events to the group with a smaller number of events while subtracting nonevents from the same group to keep the total number of participants constant. Events were added iteratively until the calculated P value became > .05. The smallest number of additional events required to obtain P > .05 was the Fragility Index for that outcome.
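The procedure above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the code used in the study: the function names are hypothetical, the 2-sided Fisher exact P value is computed here by summing the probabilities of all tables no more likely than the observed one (a common convention), and the loop stops early if the group with fewer events has no nonevents left to convert.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2 x 2 table [[a, b], [c, d]].

    Rows are treatment groups; columns are events and nonevents.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in row 1, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more probable than the observed table.
    total = sum(p for p in map(prob, range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def fragility_index(events_1, total_1, events_2, total_2, alpha=0.05):
    """Number of nonevent-to-event conversions needed to reach P > alpha.

    Events are added to the group with fewer events while group sizes stay
    constant. Returns 0 if the recalculated P value already exceeds alpha.
    """
    if events_1 <= events_2:
        e_low, n_low, e_high, n_high = events_1, total_1, events_2, total_2
    else:
        e_low, n_low, e_high, n_high = events_2, total_2, events_1, total_1

    added = 0
    while e_low < n_low:
        p = fisher_exact_two_sided(e_low, n_low - e_low,
                                   e_high, n_high - e_high)
        if p > alpha:
            break
        e_low += 1   # convert one nonevent into an event
        added += 1
    return added
```

For a hypothetical trial with 1 of 20 events in one group and 9 of 20 in the other, `fragility_index(1, 20, 9, 20)` returns 2: the P value stays below .05 after 1 added event and exceeds .05 after the second.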
Statistical Analysis
The kappa statistic (κ) was used to calculate the level of agreement between reviewers for the inclusion of studies. An a priori κ criterion of >0.65 was selected to indicate adequate agreement. 8 The intraclass correlation coefficient (ICC) with a 95% CI was used to calculate interrater agreement for the mROB assessment and the Detsky score. Descriptive statistics were calculated, with categorical variables presented as proportions and continuous data presented as means with standard error of the mean (SEM).
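As an illustration of the agreement statistic, the following Python sketch computes a kappa value for two reviewers' include/exclude decisions. It assumes the Cohen form of kappa for 2 raters, which the text does not specify; the function name and the example decisions are hypothetical.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical decisions on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same nonempty item list")
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)

    # Observed proportion of items on which the raters agree.
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal proportions.
    expected = sum((rater_a.count(label) / n) * (rater_b.count(label) / n)
                   for label in labels)

    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical screening decisions (1 = include, 0 = exclude) for 10 abstracts.
reviewer_1 = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
reviewer_2 = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
kappa = cohens_kappa(reviewer_1, reviewer_2)
meets_criterion = kappa > 0.65  # a priori criterion stated in the text
```

In this made-up example, the reviewers agree on 9 of 10 abstracts, chance agreement is 0.62, and kappa is approximately 0.74, which would meet the a priori criterion of >0.65.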
All statistical tests were 2-tailed, and significance was set at P < .05. The primary analysis examined the effect of independent variables on the dependent variables (mean-transformed Detsky score and mROB). Analysis of variance (ANOVA) with a Bonferroni correction was used to account for multiple comparisons, and independent Student t tests were used to compare the differences in the mean-transformed Detsky scores and mROB scores. Variables significantly associated with study quality in the univariate analyses for either quality assessment tool were included in a multivariable linear regression model, with results reported as beta coefficients with 95% CIs.
Studies were grouped into 3 time periods, each spanning 1 decade: t1 (1990-2000); t2 (2001-2010); and t3 (2011-2020). The chi-square test and ANOVA were used to determine whether there were significant differences between the trials within each decade for the previously stated categorical and continuous independent variables, respectively. Linear regression was used to assess for significant changes in the transformed Detsky scores and mROB scores over time. Similarly, the association of the Fragility Index with sample size, funding, trial registration, number of centers, and Detsky and mROB scores was evaluated with the Mann-Whitney U test or the Kruskal-Wallis test for categorical variables and the Pearson correlation coefficient (r) for continuous variables. The correlations were grouped as follows: r < 0.20 = no correlation; 0.20 < r < 0.40 = weak correlation; 0.40 < r < 0.60 = moderate correlation; and r > 0.60 = strong correlation. All analyses were performed using SAS Version 9.4 (SAS Institute Inc).
Results
Study Identification and Characteristics
A total of 7143 citations were published in AJSM between January 1990 and December 2020. After the exclusion of 6866 nonrandomized trials, 277 RCTs (3.9%) were included in our analysis (Table 1 and Supplemental Table S2). The agreement between the reviewers regarding the eligibility of the studies was almost perfect (κ = 0.99).
Table 1. Characteristics of the Included RCTs (N = 277) a
a Data are presented as n (%). Conflict of interest indicates ≥1 author reporting a financial conflict of interest in the author disclosures. Statistical support indicates the support of an epidemiologist or a statistician in the acknowledgment or among the listed authors. MD, medical doctor; PMR, physical medicine and rehabilitation; PRP, platelet-rich plasma; PT, physical therapist; RCT, randomized controlled trial.
b Nonsurgical treatments included rehabilitation studies, injury prevention, and laboratory- or imaging-based studies.
c Categories are not mutually exclusive.
The 277 RCTs published in AJSM between 1990 and 2020 demonstrated an increasing trend in the number of trials published over time (Figure 1). The annual number of studies published and the year of publication were strongly correlated (r = 0.89). A total of 19 RCTs were published between 1990 and 2000 (t1), 82 RCTs between 2001 and 2010 (t2), and 176 RCTs between 2011 and 2020 (t3) (Table 2).

Figure 1. Number of randomized controlled trials published in AJSM and the mean Detsky score from 1990 to 2020. Error bars represent SEM. The Pearson correlation for the number of studies versus the year of publication, r = 0.89; and for the mean-transformed Detsky score versus year of publication, r = 0.83. AJSM, The American Journal of Sports Medicine; ICC, Pearson correlation coefficient.
Table 2. Characteristics of Trials Across Decades of Publication a
a Data are presented as n (%) or mean ± SEM unless otherwise indicated. Bold P values indicate statistically significant differences between decades (P < .05). MD, mean difference; mROB, modified Cochrane risk-of-bias.
b 3×2 chi-square tests were used for categorical variables and 1-way analysis of variance was used for continuous variables, followed by unpaired t test pairwise comparisons for variables with P < .05.
The mean sample size of included trials was 139.7 ± 18 patients (range, 10-3611 patients). The median sample size was 70 patients; 201 studies (72.6%) had <100 patients.
An a priori sample size calculation was completed in 203 (73.3%) of the included trials. Of the trials that reported an a priori sample size calculation, 137 (67.5%) enrolled a sufficient number of patients to achieve statistical power and 75 (36.9%) reported maintaining the required sample size at follow-up. Of the 172 trials whose authors reported financial support or conflicts of interest, 71 (41.3%) received funding or grants from industry.
Statistically significant results in any study outcome were reported in 166 trials (59.9%). Of these 166 trials, there was a significant finding in the primary outcome of 72 trials (43.4%). The correlation between Detsky and mROB scores was moderate (r = 0.67). The Science Citation Index weakly correlated with the Detsky score (r = –0.14) and the mROB score (r = –0.14). All other individual study variables are reported in Supplemental Table S3.
Assessment of the Detsky Index Quality Score
The ICC for interrater agreement on the Detsky score was 0.82 (95% CI, 0.64-1), indicating very high agreement (Supplemental Table S4). The mean-transformed Detsky score was 84.7% ± 0.7% (Figure 1). One trial (0.4%) scored <50%, 65 trials (23.5%) scored between 50% and 75%, and 211 trials (76.2%) scored >75%.
Univariate analyses demonstrated significant associations between the Detsky score and the type of intervention, a clearly stated primary outcome, a priori trial registration, the area of body studied, length of follow-up, type of financial support, and use of platelet-rich plasma (PRP) (Table 3). Multivariable linear regression analysis subsequently demonstrated significant independent associations between improved Detsky scores and follow-up durations of <5 years; trials on the shoulder, elbow, knee, or foot/ankle (reference: multiple/injury prevention); a priori trial registration; and a clearly stated primary outcome (Table 4).
Table 3. Univariate Analysis of Characteristics Associated With Quality Scores a
a Scores are reported as mean ± SEM. Bold P values indicate variables with statistically significant differences within subgroups (P < .05); these variables were included in the multivariable analysis (Table 4). COI, conflict of interest; MD, medical doctor; mROB, modified Cochrane risk-of-bias; PMR, physical medicine and rehabilitation; PRP, platelet-rich plasma; PT, physical therapist.
b Unpaired t tests for categories with 2 variables and 1-way analysis of variance for categories with >2 variables.
Table 4. Multivariable Analysis of Characteristics Associated With Quality Scores a
a Dashes indicate variables not included in the analysis. Bold P values indicate statistical significance (P < .05). COI, conflict of interest; mROB, modified Cochrane risk-of-bias; multi, multiple; PRP, platelet-rich plasma.
Detsky scores significantly increased over time between 1990 and 2020 (β = 3.5 [95% CI, 2.5-4.5]; P < .001). The overall mean-transformed Detsky score increased significantly from t1 (68.2% ± 9.8%) to t2 (82.7% ± 11.6%), and again from t2 to t3 (87.4% ± 10.2%) (P < .001 for both) (see Table 2). The Detsky score was strongly correlated with the year of publication (r = 0.83). The mean sample size, proportion of multicenter collaborations, number of industry-funded studies, and significant findings did not change over time (see Table 2).
Risk-of-Bias Assessment
The overall interrater agreement for the mROB score was 0.88 (95% CI, 0.72-1), corresponding to a very high agreement (Supplemental Table S4). The mean mROB assessment score was 6.6 ± 0.1 points (Figure 2). The domains of “treatment-administrator blinding” (30/277) and “loss to follow-up >5%” (86/277) had the lowest scores, indicating a prevalent risk of study bias in these categories (Supplemental Table S5).

Figure 2. Number of randomized controlled trials published in AJSM and the mean mROB score, 1990 to 2020. The Pearson correlation coefficient for the number of studies versus the year of publication, r = 0.89; for the mROB score versus the year of publication, r = 0.76. AJSM, The American Journal of Sports Medicine; mROB, modified Cochrane risk-of-bias.
Univariate analysis showed significant associations between mROB scores and the type of trial, placebo-controlled comparison group, clearly stated primary outcome, a priori trial registration, number of study centers, area of body studied, length of follow-up, type of financial support, use of PRP, and reporting results of a previous trial (P < .05) (see Table 3). Multivariable regression analysis showed that trials investigating the shoulder, elbow, or knee (reference: multiple/injury prevention), trials with follow-ups of <4 weeks, 1 to 12 months, 12 to 24 months, and 24 to 36 months (reference: >5 years), and trials with a clearly stated primary outcome were associated with higher mROB scores (see Table 4).
The mROB scores significantly increased over time between 1990 and 2020 (β = 0.07 [95% CI, 0.04-0.10]; P < .001). The mean mROB score significantly increased from t1 (4.7 ± 1.6) to t2 (6.4 ± 1.7), and again from t2 to t3 (6.9 ± 1.6) (P < .001 for both) (Table 2). The mROB score was moderately correlated with the year of publication (r = 0.76).
Fragility Index
The median Fragility Index was 2 (interquartile range, 0-5) for the 44 included studies with significant findings in dichotomous outcomes (Supplemental Figure S1 and Table 5). When the P value was recalculated using the 2-sided Fisher exact test, 13 studies became nonsignificant and therefore had a Fragility Index of 0. A higher Fragility Index value (indicating less fragility) was associated with a sample size of ≥100 patients (P = .002), a clearly stated primary outcome (P = .010), and a statistically significant finding in the primary outcome (P = .020) (see Table 4). The number of patients lost to follow-up was greater than the Fragility Index score in 75% (33/44) of studies. The Fragility Index was moderately correlated with the sample size (r = 0.68). The Fragility Index was not correlated with the transformed Detsky score (r = 0.23) or the mROB score (r = 0.16).
Table 5. Fragility Index Values and Study Characteristics a
a Data are presented as median [interquartile range]. Bold P values indicate statistically significant differences within subgroups (P < .05).
b Kruskal-Wallis tests for variables of >2 categories and Mann-Whitney U tests for variables of 2 categories.
c Trials with significant findings in any outcome were included in the Fragility Index calculation for that outcome.
Discussion
In examining all RCTs published in AJSM over 30 years, we found that the mean methodological quality of RCTs in AJSM is relatively high and has increased over time. Multivariable analysis revealed that trials with follow-up periods of <5 years, a clearly stated primary outcome, and a focus on the elbow, shoulder, or knee were associated with higher mean-transformed Detsky and mROB scores. The median Fragility Index of studies with statistically significant findings was 2, and the number of patients lost to follow-up was greater than the Fragility Index in 75% of studies.
The present findings reflect similar results from a recent review of all surgical RCTs published in a high-impact general orthopaedic journal 29 from 1988 to 2013, which also noted a decrease in sample sizes over time despite increasing numbers of RCTs and improved study quality. The trend has also been observed in other surgical subspecialties. 1,7,35 A previous appraisal of the quality of all studies published in AJSM was conducted in 2016 by Brophy et al. 5 They identified an increase in the number of RCTs published and the level of evidence from the 1991-1993 and 2001-2003 periods to the 2011-2013 period. This study was limited by only sampling 3-year periods and generalizing several qualitative parameters as a proxy for methodological quality. At that time, the authors called for a more comprehensive study to assess parameters of quality across a wider breadth of published studies utilizing standardized and validated methodological quality instruments, 5 as performed in the present study.
Both the Detsky and mROB quality metrics showed relatively high study quality of published RCTs from 1990 to 2020. Identification of prevalent strengths and weaknesses within trial quality can help guide clinicians, researchers, and reviewers in performing and publishing high-quality research within sports medicine going forward. For example, we found that clearly stating a primary outcome was associated with higher quality on all metrics. This alludes to the authors’ understanding of the research process and a structured, scientific approach to writing and reporting the trial. Based on this result, those aiming to answer orthopaedic sports medicine questions through a randomized trial should ensure that a primary outcome is identified before the initiation of the research and that it is communicated in their paper.
During the data analysis, it was noted that the Detsky and mROB tools have several potential shortcomings in the context of assessing surgical trials. For example, the mROB tool places significant emphasis on blinding. However, a trial with a surgical versus nonsurgical intervention, in which neither the orthopaedic surgeon nor the patient can be blinded, is penalized by 3 points (30% of the total score). Additionally, neither quality score incorporates length of follow-up as a measure of strength despite the importance of long-term comparisons for surgical interventions. There is a penalty for loss to follow-up of >5%, which disproportionately affects trials with a longer follow-up due to their increased propensity to lose more patients. This is seen in our finding that trials with follow-ups of <3 years had higher-quality scores. A lack of correlation between Detsky and mROB scores and other proxies for study quality, such as the Fragility Index, Citation Index, and sample size/multicenter collaboration, was observed. One weakness of both tools is that they combine assessments of methodological quality with the quality of reporting into a composite score. It is important to distinguish between them: a trial that is poorly designed with notable bias but is well reported can receive a high-quality score, and vice versa. 25 Unfortunately, all well-known methodological quality questionnaires for RCTs have some flaws, primarily because of the clinical settings in which they were developed. 7,15,16
Given the shortcomings of the quality assessment scores utilized to determine a high-quality grade for the RCTs we analyzed, other metrics may shed light on the confidence with which we can draw inferences from the results of these studies. The Fragility Index assessment highlights possible shortcomings of studies with small sample sizes and their robustness. For example, 13 of 44 studies reporting statistically significant results had a Fragility Index of 0, meaning that when the analysis was performed using a more conservative Fisher exact test, they were shown to be nonsignificant. Studies with a sample size of <100 patients had a median Fragility Index of 0.5, meaning that a change in the outcome of only 1 patient would alter the study’s conclusions. It is interesting to note that, despite larger sample sizes being associated with a greater likelihood of a statistically significant difference in study outcomes, the mean RCT sample size in AJSM has shown a trend to decrease (β = –3.8 [95% CI, –9.0 to 1.4]; P = .15). The median Fragility Index of 2 is comparable with other RCTs in orthopaedic sports medicine and spinal surgery but lags behind orthopaedic trauma (Fragility Index = 5) and far behind internal medicine subspecialty trials published in high-impact factor journals (eg, New England Journal of Medicine, The Lancet, Journal of the American Medical Association, BMJ, and Annals of Internal Medicine) (Fragility Index = 13). 12,13,21-23
Within the time frame we examined, small sample sizes (<50 patients; n = 75 studies) and a high proportion of single-center trials (86.3%) were observed, and there was a nonsignificant trend toward smaller mean sample sizes over time (see Table 2). Our analysis demonstrated increased fragility of the results from trials with <100 patients. Additionally, most trials (63%) failed to meet their a priori sample size calculations at the final follow-up (Supplemental Table C), and the number of patients lost to follow-up exceeded the Fragility Index in 75% of studies with significant findings. Taken together, these metrics indicate a risk of type I error in many trials that reported significant findings. Conversely, small trials are also at risk for type II error by failing to demonstrate a true difference in outcomes because of lack of power. Both errors are problematic in that they may affect the distribution of health-research resources and funding 19 and erode confidence in the efficacy of surgical procedures. 3 An opportunity exists to encourage multicenter collaboration within the orthopaedic community to produce higher-quality research in this regard. At present, orthopaedic surgery and sports medicine have lagged behind other medical disciplines in the percentage of collaborative, multicenter trials. 5,6,31 Although larger, well-conducted trials may be time-consuming and expensive, increased collaboration among institutions and appropriate planning will increase the likelihood of producing meaningful and truthful results. 18-20,32
Limitations
Limitations of the present study include that the review did not consider trials published in other journals, limiting the generalizability of the results about the trends in the orthopaedic sports medicine literature to the global scientific community. However, AJSM has one of the highest impact factors among orthopaedic sports medicine journals and is likely to represent higher-quality orthopaedic trials. The quality of reporting of the included trials may have hindered the evaluation of the true methodological quality. Previous research has shown that few clinical trials adequately report on a number of statistical features, including the identification of primary or secondary analyses and providing or reporting sample size calculations. 24 Although certain criteria of the quality scores addressed this, further steps could be taken in the future to more comprehensively assess the adequacy of statistical reporting. 2
Conclusion
The quantity and quality of RCTs published in AJSM increased over the past 3 decades. Although these improvements are encouraging, single-center trials with small sample sizes (<100 patients) are still common (72.6% of studies) and produce fragile results. To limit bias and demonstrate the efficacy of orthopaedic treatments moving forward, there is a need to continue to conduct high-quality trials of appropriate sample size and rigorous design. This effort will undoubtedly demand an enhanced spirit of collaboration among the orthopaedic community.
Supplemental material for this article is available at https://journals.sagepub.com/doi/full/10.1177/23259671231161293#supplementary-materials.
Supplemental Material
Supplemental Material, sj-pdf-1-ojs-10.1177_23259671231161293 for Assessment of 30 Years of Randomized Controlled Trials in The American Journal of Sports Medicine: 1990-2020 by Ajay Shah, Graeme Hoit, Lucy Lan and Daniel B. Whelan in Orthopaedic Journal of Sports Medicine
Footnotes
Final revision submitted November 22, 2022; accepted January 19, 2023.
The authors declared that there are no conflicts of interest in the authorship and publication of this contribution. AOSSM checks author disclosures against the Open Payments Database (OPD). AOSSM has not conducted an independent investigation on the OPD and disclaims any liability or responsibility relating thereto.
