Editorial Perspective: Time for Another Grading System

Abstract

This is an important critical reassessment of the proliferating publication of “systematic reviews,” a trend seen not just in spine surgery but across all medical specialties. With the advent of the “evidence-based medicine” era, the traditional hierarchy of evidence, which had historically held prospective randomized clinical trials at the very pinnacle of the evidence pyramid, was rearranged in favor of “well-performed” systematic reviews (SRs) and meta-analyses (MAs).¹ Because of their overarching and collective nature, these undertakings can offer a statistically much more potent literature overview while potentially reducing the bias almost invariably introduced in single studies.²,³ Undeniably, SRs and MAs have become essential foundations for guidelines (eg, National Institute for Health and Care Excellence [NICE] guidelines), and they are the “go-to” first-look resource for government agencies and granting bodies alike. For future authors, such studies have become a welcome primary entry point for a deeper dive into their own research projects.

By now, the dramatic proliferation of SRs and MAs has become a well-reported phenomenon, as well as a problem, in the scientific publication world.⁴ As Fontelo and Liu⁴ reported, the United States has shown a linear rise in medical SRs, crossing the 1000-publication threshold in 1996 and contributing just shy of 10 000 such studies per year to the peer-reviewed literature in 2015, for a total of >82 000 publications. In contrast, the People’s Republic of China (PRC) has shown a sudden rise since 2010 and is now the nation with the second highest number of SR publications after the United States, with 21 000 publications as of 2015. In the arena of MAs, the PRC now leads the world with almost 4000 published per year (15 345 total), ahead of the United States with just below 2500 per year (16 581 total). Again, the PRC publication profile took a sudden sharp upward turn after 2009 and has continued to climb steeply since.⁴

There are several well-recognized reasons for the appeal of SRs and MAs:

If done well, they hold the potential to provide a “state of the art” overarching assessment of the body of literature on a given topic.

Well-done or unique SRs and MAs promise a ready bounty of citations, especially if published early on regarding a novel or hot-button topic.

Frankly speaking, they also offer the convenience of publishing, even in major scientific journals, from a connected desktop workstation by searching various freely accessible search engines, without having to bother with institutional review boards or to contend with the increasingly prohibitive cost and rigmarole of de novo clinical research. Creating a new and well-recognized publication in a matter of days suddenly became a reality.

The sudden proliferation of SRs and MAs has, not unexpectedly, led to inconsistent quality, with journal editors not necessarily applying existing quality standards to submissions. The authors of this EBSJ study capture the deficiency in a telling quote: “most authors (at least in the spine literature as of 2018) seemed to equate systematic reviews with systematic literature searching.”

Looking back, there was early recognition of the need to raise the quality of SRs and MAs. The PRISMA statement was born from a collaborative effort called QUOROM (QUality Of Reporting Of Meta-analyses) in 1996.⁵ The change to the term PRISMA (Preferred Reporting Items for Systematic reviews and Meta-Analyses) arose from a wish to encompass SRs as well as MAs while providing a straightforward checklist tool, and PRISMA adopted the definitions used by the respected Cochrane Collaboration.⁶

While the intent of the PRISMA group was to provide authors with a checklist tool for creating a quality SR or MA, the AMSTAR (A MeaSurement Tool to Assess systematic Reviews) instrument published in 2007 and its AMSTAR 2 revision published in 2017 were designed as a more user-friendly “critical appraisal tool” for SRs, inclusive of those covering nonrandomized studies (Table 1).⁷ Although the two tools overlap somewhat, they are meant to be complementary and are thus applied sequentially in the critical assessment of the quality of SRs and MAs. For instance, Kelly et al⁸ applied PRISMA and AMSTAR sequentially to so-called rapid reviews (a more recent form of “accelerated evidence synthesis”) published throughout 2016 and found poor compliance with both, although the published reviews complied better with PRISMA guidelines than with AMSTAR items.

Table 1.

AMSTAR 2 Items.

Item 1: Did the research questions and inclusion criteria for the review include the components of PICO [population, intervention, comparison intervention, outcome measures]?

Item 2: Did the report of the review contain an explicit statement that the review methods were established prior to the conduct of the review, and did the report justify any significant deviations from the protocol?

Item 3: Did the review authors explain their selection of the study designs for inclusion in the review?

Item 4: Did the review authors use a comprehensive literature search strategy?

Item 5: Did the review authors perform study selection in duplicate?

Item 6: Did the authors perform data extraction in duplicate?

Item 7: Did the authors provide a list of excluded studies and justify the exclusion?

Item 8: Did the authors describe the included studies in adequate detail?

Item 9: Did the authors use a satisfactory technique for assessing risk of bias (RoB)?

Item 10: Did the authors report on the sources of funding for the studies included in the review?

Item 11: If meta-analysis was justified, did the review authors use appropriate methods for statistical combination of results?

Item 12: If meta-analysis was performed, did the review authors assess the potential impact of RoB in individual studies on the results of the meta-analysis or other evidence synthesis?

Item 13: Did the review authors account for RoB in individual studies when interpreting/discussing the results of the review?

Item 14: Did the review authors provide a satisfactory explanation for, and discussion of, any heterogeneity observed in the results of the review?

Item 15: If they performed quantitative synthesis, did the review authors carry out an adequate investigation of publication bias (small study bias) and discuss its likely impact on the results of the review?

Item 16: Did the authors report any potential sources of conflict of interest, including any funding they received for conducting the review?

In psychiatry, a larger study compared AMSTAR 2 with the original AMSTAR and with another rating tool, ROBIS (Risk Of Bias In Systematic reviews), in SRs of psychological and pharmacologic treatments for depression. It showed moderate interrater reliability for AMSTAR 2 and high concordance of AMSTAR 2 with ROBIS, but not with AMSTAR itself, and overall found similar validity across all rating tools.⁹

As the science of rating SRs and MAs is still evolving, the authors of the present EBSJ study performed a thorough assessment of spine-related SRs from one publication year (2018) and applied the AMSTAR 2 criteria to

evaluate the compliance of these SRs with this new rating tool and

inform the larger spine community about this tool and its intent.

The authors did not wish to belittle the efforts of the SR authors or of the 4 historically leading spine journals and their editorial staff but rather hoped to raise quality awareness among future authors and reviewers on the subject of SRs and MAs. The findings, which were “critically low” in 93% of results, will hopefully lead to improved adherence to PRISMA standards at the outset and to closer attention to the quality standards formulated in the AMSTAR 2 guidelines. To this end, the evolving field of evaluation tools for SRs and MAs opens a new area of investigation for a new generation of spine researchers.

Jens R. Chapman, Swedish Medical Center, Seattle, WA, USA

References

1. Sackett, Rosenberg, Gray, Haynes, Richardson. Evidence based medicine: what it is and what it isn’t. BMJ. 1996;312:71–72.

2. Guyatt, Haynes, Jaeschke, et al. Users’ guides to the medical literature: XXV. Evidence-based medicine: principles for applying the users’ guides to patient care. JAMA. 2000;284:1290–1296. doi:10.1001/jama.284.10.1290

3. Petrisor, Keating, Schemitsch. Grading the evidence: levels of evidence and grades of recommendation. Injury. 2006;37:321–327.

4. Fontelo, Liu. A review of recent publication trends from top publishing countries. Syst Rev. 2018;7:147. doi:10.1186/s13643-018-0819-1

5. Moher, Liberati, Tetzlaff, Altman; PRISMA Group. Preferred Reporting Items for Systematic Reviews and Meta-analyses: The PRISMA statement. PLoS Med. 2009;6:e1000097. doi:10.1371/journal.pmed.1000097

6. Higgins JPT, Green. Cochrane Handbook for Systematic Reviews of Interventions 4.2.5. Chichester, England: Wiley; 2005. http://www.rees-france.com/en/IMG/pdf/2005_handbook-cochrane_systematic_review_.pdf. Accessed April 3, 2020.

7. AMSTAR 2 guidance document. https://amstar.ca/docs/AMSTAR%202-Guidance-document.pdf. Accessed April 3, 2020.

8. Kelly, Moher, Clifford. Quality of conduct and reporting in rapid reviews: an exploration of compliance with PRISMA and AMSTAR guidelines. Syst Rev. 2016;5:79. doi:10.1186/s13643-016-0258-9

9. Lorenz, Mathias, Pieper, et al. AMSTAR 2 overall confidence rating: lacking discriminating capacity or requirement of high methodological quality? J Clin Epidemiol. 2019;114:133–140. doi:10.1016/j.jclinepi.2019.05.028
