Abstract
Purpose:
The American Academy of Orthopaedic Surgeons produces clinical practice guidelines for the treatment of orthopedic injuries. We examined the strength of the evidence underlying these recommendations in order to answer the following questions: (1) Have AAOS work groups improved guideline creation practices to locate evidence to generate strong recommendations? (2) Is there variability in the available evidence based on anatomic site or stage of care? (3) Has the level of evidence supporting the recommendations improved over time?
Methods:
We examined the Academy’s 22 current guidelines, which yielded 408 individual recommendations. The guideline panel had assigned each recommendation one of five strength of evidence ratings (strong, moderate, limited, inconclusive, consensus), based on the availability and quality of the supporting evidence. From these guidelines, we extracted all of the recommendations and their corresponding evidence ratings. We then classified the recommendations by stage of care, year, and anatomical site.
Results:
The distribution of the levels of evidence was as follows: 77 (18.9%) were based on consensus; 53 (13.0%) were inconclusive; 93 (22.8%) were based on limited evidence; 112 (27.5%) were based on moderate evidence; and 73 (17.9%) were based on strong evidence. Strong or moderate strength of evidence was found in 45.2% of the recommendations for preventive/screening/diagnostic care, 41.1% of nonsurgical treatment, 45.1% of surgical treatment, 51.1% of rehabilitation/postoperative treatment, and 45.5% of the recommendations that had mixed stages of care. Inconclusive strength of evidence was prevalent from 2009 to 2013 but was eliminated starting in 2014.
Conclusions:
Only 73 (17.9%) recommendations generated by the Academy in its 22 clinical practice guidelines are based on a “strong” strength of evidence. More robust research is needed in orthopedics to bolster confidence in the recommendations in future guideline updates.
Introduction
Clinical practice guidelines (CPGs) are used by orthopedic surgeons to guide clinical decision-making. 1 These guidelines are developed by integrating clinical expertise with credible scientific evidence. 2–6 However, the treatment recommendations constituting these guidelines are often based on expert opinion, low-level evidence, or low quality evidence, 7–9 making them susceptible to bias. 7,10,11 One study reported that only 11% of the recommendations comprising the cardiology guidelines were supported by strong evidence. Furthermore, the proportion of recommendations based on inconclusive evidence increased over time. 12 A lack of high quality studies has also been indicated in orthopedic surgery, as fewer than 5% of all orthopedic studies are randomized trials. 13–16 Randomized trials are often considered the “gold standard” in medicine. 17 Because few recommendations are typically underpinned by high quality evidence, such as randomized trials, research is needed to identify ways to limit bias and to implement these methodological safeguards adequately in future published reports.
The American Academy of Orthopaedic Surgeons (AAOS) produces CPGs designed to guide the treatment of orthopedic injuries. These guidelines are developed in accordance with a level of evidence grading method 18 published by The Journal of Bone and Joint Surgery—American Volume. 19 This method assigns evidence ratings (e.g., strong, moderate, limited, inconclusive or consensus) to the supporting literature of each recommendation. Strength of evidence is determined by the types of studies that are included in the CPG, with randomized clinical trials and systematic reviews resulting in the highest levels of evidence. Hanzlik et al. examined more than 1,000 studies published by Journal of Bone and Joint Surgery—American Volume from 1975 to 2005 and assessed each study for strength of evidence. The percentage of Level-I studies (e.g., randomized controlled trials or systematic reviews) in this journal increased from 4% in 1975 to 21% in 2005. Despite such growth, the authors cautioned about the need for further improvement. 20
Other studies have affirmed the need for improvement in the quality of orthopedic research. Evidence suggests that over 85% of lateral epicondylitis surgery trials had unsatisfactory methodological quality. 21 Furthermore, the quality of elbow surgery trials has not improved over time, and key methodological components were often missing from the trial reports. 22 While some improvements to the poor methodological quality and reporting of pediatric orthopedic trials have been noted over time, only a minority of these trials received an acceptable level of quality on the Detsky quality scale. 23 In clinical trials of upper extremity disorders, 57% either did not describe or described inappropriate randomization methods. 24 Additionally, guidelines developed under the purview of the AAOS have been criticized for the lack of high quality evidence supporting the recommendations. 25 It is important to reverse this trend by conducting orthopedic research with high methodological quality standards which could bolster evidence ratings and make treatment recommendations more conclusive.
The number of CPGs produced in recent years has increased at an accelerating pace. 7,10 Given the considerable increase in the amount of Level-I research publications involving orthopedic injuries in recent years, 20 this growth in research production would be expected to have led to stronger evidence for guideline recommendations. To date, this has yet to be reported. Therefore, our study aims to answer the following questions: (1) Have AAOS work groups improved guideline creation practices to locate evidence that generates strong recommendations, or will orthopedic surgery, like many other fields, 12,26–29 have weak recommendations supported by low quality evidence? (2) Is there variability in the available evidence for recommendations based on anatomic site or stage of care? (3) Has the level of evidence supporting the recommendations improved over time?
Materials and methods
This study was not subject to institutional review board oversight because it did not meet the regulatory definition of human subject research as defined in 45 CFR 46.102(d) and (f) of the Department of Health and Human Services’ Code of Federal Regulations. 30 We applied the relevant Statistical Analyses and Methods in the Published Literature (SAMPL) guidelines for reporting descriptive statistics. 31
One investigator located all current clinical practice guidelines from the AAOS website on December 15, 2019. To date, the AAOS has published 22 guidelines (Table 1), all of which were included in the present study. We did not include Companion Consensus Statements, which are recommendations with no evidence rating that are provided to specialty societies so they can issue consensus statements. We also excluded Appropriate Use Criteria, since these documents identify areas where sound data are not available or do not provide sufficient evidence to apply to the full range of patients seen in clinical practice. 32
AAOS guidelines.
* Denotes the year the guidelines were established per.
The guidelines provide a list of included studies and an assessment of the methodological quality of these studies. Based on the results of these assessments, a strength of evidence rating is assigned to each recommendation by the guideline panel. These ratings are presented in a summary of recommendations section in each CPG. Details of this rating system are outlined in Table 2.
Strength of evidence.
Strength of evidence was identified by—http://www.orthoguidelines.org/guidelines.
Two investigators independently extracted each recommendation and its corresponding evidence rating from the CPGs. The investigators next stratified each recommendation by year, stage of care (preventive, screening, diagnostic, nonsurgical, surgical, postoperative, or rehabilitation), and anatomic site (upper, lower, spine, or other). A consensus meeting was planned a priori to resolve disagreements about extractions or stratifications; however, there were no disagreements between the investigators, and this meeting was not ultimately needed.
We used Microsoft Excel to calculate the descriptive statistics.
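The tallying described above could equally be done in a script. The following is a minimal sketch in Python, not the authors' workflow: the `records` rows are illustrative placeholders rather than study data, and the helper names `rating_shares` and `strong_moderate_share_by` are hypothetical.

```python
from collections import Counter

# Hypothetical extraction records: (guideline year, stage of care, rating).
# These rows are illustrative placeholders, not data from the study.
records = [
    (2009, "surgical", "inconclusive"),
    (2009, "nonsurgical", "consensus"),
    (2015, "surgical", "moderate"),
    (2015, "rehabilitation", "strong"),
    (2019, "diagnostic", "strong"),
]


def rating_shares(rows):
    """Return {rating: (count, percent)} across all extraction records."""
    counts = Counter(rating for _, _, rating in rows)
    total = sum(counts.values())
    return {r: (n, round(100 * n / total, 1)) for r, n in counts.items()}


def strong_moderate_share_by(rows, key_index):
    """Percent of strong/moderate ratings within each stratum.

    key_index selects the stratifying field: 0 for year, 1 for stage of care.
    """
    totals, hits = Counter(), Counter()
    for row in rows:
        key = row[key_index]
        totals[key] += 1
        if row[2] in ("strong", "moderate"):
            hits[key] += 1
    return {k: round(100 * hits[k] / totals[k], 1) for k in totals}


shares = rating_shares(records)
by_year = strong_moderate_share_by(records, 0)
by_stage = strong_moderate_share_by(records, 1)
```

Keeping one row per recommendation means the same records can be re-stratified by year, stage of care, or anatomic site without re-extracting anything.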
Results
Twenty-two guidelines composed of 408 individual recommendations were identified (Table 1). Across the guidelines, the distribution of the strength of evidence for these recommendations was as follows: 77 (18.90%) were consensus statements, 53 (13.00%) were inconclusive, 93 (22.80%) were based on limited evidence, 112 (27.50%) were based on moderate evidence, and 73 (17.90%) were based on strong evidence (Table 2 and Figure 1).
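As a quick arithmetic check, the percentages above can be recomputed from the reported counts. The sketch below uses only the five counts stated in this paragraph; the variable names are illustrative.

```python
# Counts of recommendations per strength of evidence rating, as reported above.
counts = {
    "consensus": 77,
    "inconclusive": 53,
    "limited": 93,
    "moderate": 112,
    "strong": 73,
}

total = sum(counts.values())  # should equal the 408 recommendations

# Percentage of all recommendations in each rating, to one decimal place.
percent = {rating: round(100 * n / total, 1) for rating, n in counts.items()}

# Combined share of recommendations rated strong or moderate.
strong_or_moderate = round(
    100 * (counts["strong"] + counts["moderate"]) / total, 1
)

for rating, n in counts.items():
    print(f"{rating:>12}: {n:3d} ({percent[rating]}%)")
print(f"strong or moderate: {strong_or_moderate}%")
```

The five counts sum to 408, and the rounded percentages match those reported (18.9%, 13.0%, 22.8%, 27.5%, and 17.9%).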

Strength of evidence by year.
Between 2009 and 2012, the share of strong/moderate ratings ranged from 0.0% to 21.2%. From 2013 to 2019, the share of strong/moderate ratings was higher, ranging from 43.5% to 66.6%.
Recommendations were then stratified by stage of care (Table 3). Overall, 157 (38.4%) recommendations addressed preventive care/screening/diagnosis, 51 (12.5%) addressed nonsurgical treatment, 133 (32.6%) addressed surgical treatment, 45 (11.0%) addressed rehabilitation and postoperative treatment, and 22 (5.4%) addressed “mixed” stages of care. Strong/moderate evidence was found in 45.2% of the recommendations for preventive care/screening/diagnosis, 41.1% of nonsurgical treatment, 45.1% of surgical treatment, 51.1% of rehabilitation/postoperative treatment, and 45.5% of the “mixed” recommendations.
Guideline categories.
* Mixed—Refers to any multiple categories other than screening/diagnostic and Rehabilitation/Postoperative. **n = 407 due to Psychosocial Factors not being treated by orthopedic physicians.
We stratified the recommendations by guideline categories (Table 3). Overall, 200 (49.0%) recommendations focused on the lower extremity, 127 (31.1%) on the upper extremity, 38 (9.3%) on joints, 16 (5.74%) on infection-based guidelines, and 26 (6.4%) on oncology-based guidelines. Strong/moderate strength of evidence was found in 40.2% of the recommendations for the lower extremity, 51.0% for the upper extremity, 39.5% for the joints, 43.8% for infectious conditions, and 34.6% for oncology-related guidelines.
Discussion
The AAOS CPGs provide guidance for the management of orthopedic conditions. Our results suggest that the quality of evidence underpinning the recommendations has improved; the majority of the recommendations between 2009 and 2013 were inconclusive, compared with the majority of recommendations between 2014 and 2019, which are based on moderate quality evidence. Yet despite this improvement, half of all AAOS recommendations are inconclusive or based on limited quality evidence. Of the guideline categories, the lower extremity had the most recommendations, while the upper extremity had the highest percentage of strong to moderate evidence. The other four categories, despite differing numbers of recommendations, had broadly similar percentages of strong to moderate evidence. Our findings suggest the need for greater emphasis on conducting randomized trials and high-level systematic reviews, when appropriate, and using other study types that contribute to greater confidence in the therapeutic effect estimates and guideline recommendations for all the guideline categories in the AAOS CPGs.
When a recommendation rests on a lower level of evidence, orthopedic surgeons are left to interpret how to approach a problem. Cabana et al. showed that at least 10% of physicians in their study disagreed with a recommendation in a CPG due to their individual interpretation. 33 Increasing the level of evidence for CPGs would give each recommendation stronger backing and, in turn, greater confidence that the procedure will benefit the patient.
While our study indicates that only 18% of the AAOS guideline recommendations are supported by high quality evidence, similar shortcomings have been reported in guidelines produced by other organizations, such as the American Heart Association and American College of Cardiology, American College of Obstetricians and Gynecologists, American College of Chest Physicians, and National Comprehensive Cancer Network Guidelines, with high-level evidence comprising less than 1% to 30% of the guideline recommendations. 12,26–29 While our findings indicate a lack of high quality evidence on which to base treatment recommendations in orthopedics, the same result has been found across diverse specialties, which suggests that this issue is problematic across medicine.
In 2010, the AAOS published the CPG “Optimizing the Management of Rotator Cuff Problems.” Shortly after publication, the guideline received criticism owing to a large number of inconclusive recommendations from low quality evidence or expert opinion. Critics of this guideline were concerned that due to the lack of adequate evidence, it would be unjust to call this work a “guideline” at all, stating that “inconclusive recommendations, do not a guideline make.” 25 Because this guideline was based primarily on expert opinion, critics stressed the limitations of using such “evidence” and argued that the “experts” may see different groups of patients than other orthopedic surgeons, may value certain outcomes more than patients, or may have undisclosed conflicts of interest, all of which may bias their perspectives. 34 However, it appears the AAOS and orthopedic researchers alike understood the need for more robust research regarding rotator cuff repair. As a result, the 2019 update of the rotator cuff repair guideline has 0% of its recommendations as inconclusive in comparison to 55.5% in its predecessor. The AAOS answering the call for a more evidence based guideline in this case demonstrates the importance of scrutinizing the quality of guideline recommendations and advocating for a higher level of evidence from published studies and guidelines alike.
Methodological flaws have also been identified in meta-analyses conducted within systematic reviews. 35 This limitation results in lower quality of evidence for the CPGs. More randomized trials and other high-level studies, together with better systematic review methodology, would improve the quality of systematic reviews and, in turn, the quality of evidence for the recommendations in CPGs.
As the majority of the AAOS CPG recommendations are based on inadequate evidence, the opportunity for orthopedic researchers, and the agencies that fund them, to address these knowledge gaps is readily available. These under-addressed research areas should be attended to by methodologically sound research. Areas of methodological deficiency in orthopedic studies include: improper reporting of randomization 24 ; poor descriptions of recruitment, poor reporting of allocation concealment 36 ; inadequate or poorly described power calculations, inadequate reporting of the number of patients needed for successful treatment 21 ; and poor adherence to Consolidated Standards of Reporting Trials (CONSORT) standards. 37 There is also evidence of selective outcome reporting bias in orthopedic trials, as discrepancies have been found between the primary outcomes listed in clinical trial registries and those detailed in the published report. 38 Trial registration is also problematic. One study 38 found that only 38% of orthopedic studies were adequately registered. These deficiencies should be a focus of orthopedic journals, funding agencies, and researchers. If funding agencies and journals would adopt policies requiring trial registration and adherence to reporting guidelines, these common methodological shortcomings in orthopedic surgery research might improve. Improvements to the quality of orthopedic research would bolster the evidence used to establish guideline recommendations.
Limitations
Our study only evaluated CPGs published by the AAOS, and therefore is not generalizable outside of orthopedic surgery. Additionally, our study is not generalizable toward other AAOS quality measures such as Appropriate Use Criteria, or other published literature. Because some of the guidelines were published prior to the current year, they may not be an accurate reflection of the current levels of evidence in orthopedic literature, and therefore our study may underestimate the current research quality in orthopedics. Furthermore, for some recommendations, conducting a double-blind randomized controlled trial may not be possible, which limits the level of evidence that can be achieved. 39
Conclusion
Only 73 (17.9%) of the recommendations included in the Academy’s 22 guidelines were based on a “strong” strength of evidence. Almost half of all recommendations are supported by limited/low quality evidence or expert opinion. Although the strength of recommendations has improved over time, there is still much room for improvement in orthopedic research.
Footnotes
Declaration of conflicting interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Dr Vassar reports receipt of funding from the National Institute on Drug Abuse, the National Institute on Alcohol Abuse and Alcoholism, the US Office of Research Integrity, Oklahoma Center for Advancement of Science and Technology, and internal grants from Oklahoma State University Center for Health Sciences—all outside of the present work. All other authors have no conflicts to report.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
