Abstract
Study Design:
A systematic cross-sectional survey of systematic reviews (SRs).
Objective:
To evaluate the methodological quality of spine surgery SRs published in 2018 using the updated AMSTAR 2 critical appraisal instrument.
Methods:
We identified the PubMed indexed journals devoted to spine surgery research in 2018. All SRs of spine surgical interventions from those journals were critically appraised for quality independently by 2 reviewers using the AMSTAR 2 instrument. We calculated the percentage of SRs achieving a positive response for each AMSTAR 2 domain item and assessed the levels of confidence in the results of each SR.
Results:
We identified 28 SRs from 4 journals that met our criteria for inclusion. Only 49.5% of the AMSTAR 2 domain items satisfied the AMSTAR 2 criteria. Critical domain items were satisfied less often (39.1%) compared with noncritical domain items (57.3%). Domain items most poorly reported included accounting for individual study risk of bias when interpreting results (14%), list and justification of excluded articles (18%), and an a priori establishment of methods prior to the review or registered protocol (18%). The overall confidence in the results was rated “low” in 2 SRs and “critically low” in 26.
Conclusions:
The credibility of a SR and its value to clinicians and policy makers are dependent on its methodological quality. This appraisal found significant methodological limitations in several critical domains, such that the confidence in the findings of these reviews is “critically low.”
Introduction
The number of published systematic reviews (SRs) of studies of spinal interventions has increased rapidly over the past 20 years. 1 Clinical decision makers and policy makers rely on the results of high-quality SRs because they are generally recognized as representing the highest level of evidence. SRs are classified as observational research studies. As such, they are subject to a range of biases. The quality of a SR is dependent on how rigorously the research team attempts to limit these biases by following methodologically sound practices. It is therefore important that users be able to distinguish high-quality reviews from those with serious methodological weaknesses.
Several instruments have been designed to assess the quality of SRs based on established methodological processes for such reviews. 2-7 One popular instrument, the AMSTAR (A MeaSurement Tool to Assess systematic Reviews), was first developed and published in 2007 to evaluate SRs of randomized controlled trials. 6 It was revised and updated in 2017 to include SRs of nonrandomized studies and to respond to critiques of the original. The result was the AMSTAR 2, a 16-item tool with a comprehensive user guide that provides an overall rating based on weaknesses in 7 critical domains. 8 Furthermore, the AMSTAR 2 provides a scheme for interpreting weaknesses in both critical and noncritical domains and provides an overall confidence level in the results of the review.
Whereas we identified 1 study that assessed spine surgery meta-analyses using the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) checklist, 1 we have not identified any study that critically appraises SRs in the spine surgery literature using the AMSTAR 2. Therefore, the purpose of the current study is to evaluate the methodological quality of spine surgery SRs published in 2018 using the AMSTAR 2 critical appraisal instrument.
Methods
Study Selection
We identified PubMed indexed journals from the Web of Science’s Journal Citation Reports in 2018 with an impact factor ≥2 and devoted to spine surgery research using the words “spine” or “spinal” in the title of the journal. The impact factor measures how often a journal’s articles are cited, not the quality of its research; we used it because we were interested in the journals with the highest citation frequency. We chose the year 2018 because it represented the latest complete year available at the beginning of our review. Each journal was searched for publications listed as a SR or meta-analysis. SRs were then screened to determine if they met a priori inclusion criteria. We included only those SRs that studied spine surgical interventions or adjunctive treatment to spine surgical interventions such as venous thromboembolism prophylaxis in spine surgery. We excluded reviews of prognostic factors, diagnostic criteria, therapy in nonoperative patients, and therapy in postoperative patients (eg, bracing or rehabilitation after spine surgery).
Study Evaluation
The quality of each SR was evaluated independently by 2 epidemiologists trained in SR methodology and critical appraisal (JD, AS) using the AMSTAR 2 (Table 1). Differences between evaluators in rating each domain item were identified. The evaluators then discussed the rationales for their respective scores, and the differences were resolved when both evaluators agreed on a common assessment. The creators of AMSTAR 2 proposed 3 answer options for each item: “yes,” “partial yes,” or “no.” For our purposes, we dichotomized the results, so that either a “yes” or “partial yes” denoted a positive result and “no” a negative result. Additionally, “no meta-analysis” was an option for items 11, 12, and 15. We then applied the AMSTAR 2 rating scheme for interpreting the overall confidence in the results of the SR (Table 2).
Table 1. 16-Item AMSTAR 2 Instrument: Critical Domains (Bold) Include Items 2, 4, 7, 9, 11, 13, and 15.
Table 2. Rating Overall Confidence in the Results of the Review.
a Multiple noncritical weaknesses may diminish confidence in the review, and it may be appropriate to move the overall appraisal down from moderate to low confidence.
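The rating step can be illustrated with a short sketch. It is not the authors' implementation (the analyses were done in Excel); it assumes the confidence rules published with the AMSTAR 2 (no or 1 noncritical weakness is “high,” more than 1 noncritical weakness is “moderate,” 1 critical flaw is “low,” and more than 1 critical flaw is “critically low”) together with the dichotomization described under Study Evaluation. The names `CRITICAL_ITEMS`, `is_positive`, and `overall_confidence` are hypothetical, and the optional downgrade from moderate to low for multiple noncritical weaknesses is a reviewer judgment that is not modeled.

```python
# Hypothetical sketch of the AMSTAR 2 overall-confidence rating.
# Critical items are those named in the Table 1 caption.
CRITICAL_ITEMS = {2, 4, 7, 9, 11, 13, 15}
NOT_APPLICABLE = "no meta-analysis"  # allowed for items 11, 12, and 15


def is_positive(answer):
    """Dichotomize: 'yes' and 'partial yes' both count as positive."""
    return answer in ("yes", "partial yes")


def overall_confidence(answers):
    """Rate one review.

    answers: dict mapping item number (1-16) to 'yes', 'partial yes',
    'no', or 'no meta-analysis'. Any other value is treated as 'no'.
    """
    critical_flaws = 0
    noncritical_weaknesses = 0
    for item, answer in answers.items():
        if answer == NOT_APPLICABLE or is_positive(answer):
            continue  # not applicable, or criterion satisfied
        if item in CRITICAL_ITEMS:
            critical_flaws += 1
        else:
            noncritical_weaknesses += 1

    if critical_flaws > 1:
        return "critically low"
    if critical_flaws == 1:
        return "low"
    if noncritical_weaknesses > 1:
        return "moderate"
    return "high"


# Made-up review: everything satisfied except items 2 and 13.
example = {item: "yes" for item in range(1, 17)}
example[2] = "no"
example[13] = "no"
print(overall_confidence(example))  # critically low
```

Under these rules, a review that fails only the protocol item (item 2) would be rated “low,” and failing any second critical item moves it to “critically low,” regardless of how the noncritical items are answered.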
Data Analysis
The included SRs were summarized using descriptive statistics. We calculated the percentage of SRs achieving a positive response for each item and the percentage achieving each of 4 different levels of confidence in the results. For SRs without a meta-analysis, items 11, 12, and 15 did not contribute to the percentage calculations. In addition to providing overall results for all SRs, we stratified results by journal. Excel (Microsoft) was used for all analyses.
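The denominator rule for items 11, 12, and 15 can be made concrete with a small sketch. This is an illustration under stated assumptions, not the authors' Excel workbook: the function name `percent_positive` and the toy answers below are hypothetical and are not data from the study.

```python
# Hypothetical sketch: percentage of SRs with a positive response for an item,
# leaving reviews answered "no meta-analysis" out of the denominator.

def percent_positive(reviews, item):
    """reviews: list of dicts mapping item number to an answer string.

    Returns the percentage of applicable reviews answering 'yes' or
    'partial yes', or None if no review is applicable for this item.
    """
    applicable = [r[item] for r in reviews if r[item] != "no meta-analysis"]
    if not applicable:
        return None
    positive = sum(answer in ("yes", "partial yes") for answer in applicable)
    return 100.0 * positive / len(applicable)


# Toy answers for 4 made-up reviews (items 4 and 11 only).
toy_reviews = [
    {4: "yes", 11: "yes"},
    {4: "partial yes", 11: "no"},
    {4: "yes", 11: "no meta-analysis"},
    {4: "no", 11: "partial yes"},
]

print(round(percent_positive(toy_reviews, 4), 1))   # 75.0 (3 of 4)
print(round(percent_positive(toy_reviews, 11), 1))  # 66.7 (2 of 3 applicable)
```

In the toy data, the review without a meta-analysis drops out of the item 11 calculation, so the denominator is 3 rather than 4.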
Results
Study Selection
Eight journals were listed in the Web of Science’s 2018 Journal Citation Reports with “spine” or “spinal” in the title. We excluded one for having published no SRs in 2018 (Joint Bone Spine) and 3 for having an impact factor less than 2.0 (Spinal Cord, Clinical Spine Surgery, and The Journal of Spinal Cord Medicine), leaving 4 journals meeting our inclusion criteria (Figure 1). A total of 72 studies were identified as SRs published in the 4 journals in 2018. A total of 44 SRs were excluded from this review; excluded studies and reasons for exclusion are listed in Table S1 in the supplemental material. The remaining 28 SRs met our criteria for inclusion: 13 from the European Spine Journal, 1 from the Journal of Neurosurgery Spine, 9 from Spine, and 5 from the Spine Journal.

Figure 1. Study selection. Abbreviation: SR, systematic review.
Critical Appraisal of Systematic Reviews
The mean percentage of the AMSTAR 2 domain items satisfying the AMSTAR 2 criteria for all SRs was 49.5% (Figure 2). The critical domain items were satisfied less often (39.1%) compared with the noncritical domain items (57.3%). The SRs published in The Spine Journal had a higher percentage of both critical and noncritical domain items answered satisfactorily compared with the other journals.

Figure 2. Mean percentage of AMSTAR 2 domain items satisfying the AMSTAR 2 criteria by journal.
Of the 16 AMSTAR 2 domain items, 11 were reported ≤50% of the time across studies (Figure 3). Domain items best reported included comprehensive literature search (item 4, 93%), potential conflict of interest (item 16, 93%), and specification of inclusion criteria containing components of PICO (patient, intervention, comparator, and outcome; item 1, 89%). Domain items most poorly reported included accounting for individual study risk of bias (RoB) when interpreting the results (item 13, 14%), list and justification of excluded articles (item 7, 18%), and an a priori establishment of methods prior to the review or registered protocol (item 2, 18%).

Figure 3. Percentage of systematic reviews satisfying the AMSTAR 2 criteria by domain item number.
Overall Confidence in Results of Systematic Reviews
The overall confidence in the results of the 28 reviews was rated as “low” in 2 SRs 9,10 and “critically low” in the rest 11-36 (Table 3). Critical appraisal of each SR is found in Table S2 in the supplemental material.
Table 3. Confidence in the Results of the Systematic Review.
Discussion
The 28 included SRs published in 4 high-impact spine journals in 2018 demonstrated poor compliance with the AMSTAR 2 instrument; confidence in the results was graded as “critically low” in 93% (n = 26). To our knowledge, this is the first critical appraisal of spine surgery SRs using the AMSTAR 2.
A few AMSTAR 2 items were well reported among the included sample of SRs. Almost all SRs used a comprehensive electronic literature search to identify potential articles. It is our experience that many spine surgeons equate SRs with systematic literature searching, underestimating the importance of the other aspects of a SR. This could, in part, account for the high percentage correctly reporting this domain item while at the same time neglecting many other aspects of the SR that document sound methodological principles. Almost all review authors declared whether they had potential conflict of interest. We expected a high percentage with this domain item because it is required now by many journals. However, although there has been an emphasis on the importance of declaring financial conflicts of interest, little attention appears to be given to declaring nonfinancial conflicts of interest (NFCOIs) such as personal relationships, strongly held beliefs, and the desire for career advancement. When ignored, NFCOIs can call into question the impartiality of a review. 37
Whereas nearly 70% of the AMSTAR 2 domain items were reported ≤50% of the time, 3 items were reported <18% of the time: publishing a protocol prior to initiating the review, detailed identification of excluded articles, and accounting for RoB in the results and conclusions. Publishing of a protocol prior to initiating a SR is often not considered by those performing spine surgery SRs. The methods for conducting the review, including inclusion/exclusion criteria and planned analyses, should be developed a priori and stated as such at a minimum. A SR is an observational study and, therefore, requires agreed upon methods prior to starting the review. Adhering to the methods helps reduce RoB.
Listing of excluded articles and the justification for exclusion were also poorly reported. Several articles listed general reasons for exclusion in their PRISMA flow diagram but failed to identify the specific studies excluded. Without explicit citation of the articles excluded, the potential impact of their exclusion cannot be known.
Half of the reviews used appropriate techniques in assessing the RoB in primary studies. However, only 14% accounted for the RoB when interpreting the results or drawing conclusions. Many SRs drew conclusions from the data of studies with a high RoB and did not account for the poor quality of the data available from such studies. As a result, conclusions drawn were not supported by the quality of the data. The spine surgery literature often includes studies with varying levels of quality. When presenting the data in a SR, investigators should emphasize the highest-quality studies, those with the lowest RoB, and acknowledge that the confidence in the data is likely low when high-RoB studies are included.
Previous assessments of SR quality using different critical appraisal instruments across a variety of surgical subspecialties have demonstrated shortcomings similar to those of the SRs we reviewed. 38-40 For example, one study of the neurosurgery literature 39 that used the original AMSTAR instrument reported that only 18% of SRs listed the excluded studies and just 21% assessed publication bias. In our review, the journal with the highest impact factor had the highest percentage of AMSTAR 2 domain items satisfying the AMSTAR 2 criteria. This is consistent with other reviews assessing quality by journal impact factor. 1,41
Our analysis is limited to SRs of spine surgery published in 2018 in 4 spine journals with an impact factor ≥2.0. Our review did not evaluate aspects of SR quality outside of the domains of the AMSTAR 2 instrument. Because we dichotomized responses, counting both “yes” and “partial yes” as positive and “no” as negative, the overall quality of some reviews may have been rated higher than if the 3-level scale (“yes,” “partial yes,” “no”) had been retained. We did not stratify studies by type of surgery or focus (eg, complications vs effectiveness); however, the methodological rigor (and appraisal) for SRs should not differ.
Conclusions
As with any research, the credibility of a SR and its value to clinicians and policy makers are dependent on its methodological quality. Although the SRs included came from spine journals with higher impact factors, this appraisal found that confidence in the findings of these reviews was critically low.
Supplemental Material
Supplemental Material, Appendix for Critically Low Confidence in the Results Produced by Spine Surgery Systematic Reviews: An AMSTAR-2 Evaluation From 4 Spine Journals by Joseph R. Dettori, Andrea C. Skelly and Erika D. Brodt in Global Spine Journal
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
The supplemental material is available in the online version of the article.
References
