Abstract
Critical appraisal of evidence is performed to assess its validity, trustworthiness, and usefulness in evidence-based practice. There currently exists a large number and variety of critical appraisal tools (also named risk of bias tools and quality assessment instruments), which makes it challenging to identify and choose an appropriate tool. We sought to develop an online inventory to inform librarians, practitioners, graduate students, and researchers about critical appraisal tools. This online inventory was developed from a literature review on critical appraisal tools and is kept up to date using a crowdsourcing collaborative web tool (eSRAP-DIY). To date, 40 tools have been added to the inventory (www.catevaluation.ca) and grouped into five general categories: (a) quantitative studies, (b) qualitative studies, (c) mixed methods studies, (d) systematic reviews, and (e) others. For each tool, a summary is provided with the following information: tool name, study designs, number of items, rating scale, validity, reliability, other information (such as existing websites or previous versions), and main references. Further studies are needed to test and improve the usability of the online inventory and to find solutions to reduce the monitoring and update workload.
Introduction
The purpose of this article is to describe the development and maintenance of an online inventory of critical appraisal tools (CATs). CATs (also named risk of bias tools and quality assessment instruments) are checklists or scales that provide a list of domains and items to assess the quality of studies (Bai et al., 2012). These tools can be used as part of evidence-based practice that advocates to critically appraise the evidence to assess its validity, trustworthiness, and usefulness (Burls, 2009; Greenhalgh, 2014). Also, critical appraisal is a step of systematic literature reviews to ensure that conclusions drawn from the included studies properly reflect the quality of evidence reviewed. The appraisal can help to highlight the strengths and weaknesses of the evidence reviewed, to select studies, to indicate how much confidence readers can have in the results, to guide the synthesis and interpretation of the findings, and to identify the gaps in knowledge (Booth et al., 2016; Viswanathan et al., 2012).
A variety of CATs has been developed to formalize the appraisal process and ensure it is done in a more rigorous, systematic, and explicit manner. Despite the numerous existing literature reviews analyzing over 500 CATs, there is still no clear guidance to help select which of the many and varied CATs to use. These reviews covered different study designs (e.g., randomized controlled trials (RCTs) (Armijo-Olivo et al., 2013; Moher et al., 1995), non-randomized studies (Deeks et al., 2003; Jarde et al., 2012; Saunders et al., 2003; Shamliyan et al., 2010), or qualitative studies (Majid & Vanstone, 2018; Santiago-Delefosse et al., 2016; Walsh & Downe, 2006)) and analyzed the tools using different criteria (e.g., number of items, time needed to apply, content, frequency of use). Also, several tools were not developed using methodological standards or are not supported by sound validation and reliability testing (Crowe & Sheppard, 2011; Katrak et al., 2004). This makes it more challenging for librarians, practitioners, graduate students, and researchers to identify and choose a proper tool to use. To respond to this challenge, we developed an online inventory of CATs.
Online inventory of critical appraisal tools
The online public inventory of CATs is available at www.catevaluation.ca.
The initial inventory in May 2020 consisted of 37 CATs. These CATs were identified from the results of literature reviews performed during the doctoral project of the first author (Hong, 2018). To select the CATs for the inventory, the selection criteria listed in Table 1 were used. Given that tools relying on rigorous development and testing are being advocated for in the literature, we decided to include only tools that were supported by validation and/or reliability testing (Whiting et al., 2017). When several tools for a specific study design met the selection criteria, we decided to only include the most recent ones. In the online inventory, these CATs are grouped into five general categories: (a) quantitative studies, (b) qualitative studies, (c) mixed methods studies, (d) systematic reviews, and (e) others. Each tool is described in a table that presents the tool name, study design, number of items, rating scale, validity, reliability, other information (such as existing websites or previous versions), and main references.
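To illustrate the structure of the descriptive tables, each inventory entry can be modeled as a simple record. This is a hypothetical sketch: the field names mirror the columns described above, while the class name, category labels, and example values are invented for illustration.

```python
from dataclasses import dataclass, field

# The five general categories used to group tools in the inventory
CATEGORIES = {"quantitative", "qualitative", "mixed methods",
              "systematic reviews", "others"}

@dataclass
class CATEntry:
    """One row of the inventory's descriptive table (illustrative model)."""
    name: str
    category: str              # one of CATEGORIES
    study_designs: list[str]
    n_items: int
    rating_scale: str
    validity: str              # summary of validation evidence
    reliability: str           # summary of reliability evidence
    other_info: str = ""       # e.g., websites or previous versions
    references: list[str] = field(default_factory=list)

    def __post_init__(self):
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")

# Example entry with invented values, for illustration only
entry = CATEntry(
    name="Example Appraisal Tool",
    category="qualitative",
    study_designs=["interview studies"],
    n_items=10,
    rating_scale="yes / no / can't tell",
    validity="content validity reported",
    reliability="inter-rater reliability reported",
)
```

A structured model like this also makes the selection rule explicit: an entry belongs to exactly one of the five categories, and validity and reliability evidence are recorded as separate fields.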
Selection criteria
To keep the inventory up to date, we are monitoring the CAT literature using the collaborative web tool eSRAP-DIY.
For users, finding and choosing a proper CAT to assess the quality of a study can be a daunting task because of the large number and variety of existing tools. To address this issue, we developed an online inventory to inform librarians, practitioners, graduate students, and researchers about the CATs that have been tested for their validity and/or reliability. We structured the inventory into five general research categories to facilitate the identification of a tool and provided a short summary of each tool. This categorization could help users identify and select CATs by providing them with a quick overview of the available validated tools.
Keeping online inventories up to date and relevant for users is a challenge. The maintenance of our CAT inventory currently relies on the voluntary efforts of people interested in contributing to a collective effort to screen records identified in the monitoring system eSRAP-DIY. In addition to the first author, five raters were recruited to participate in the screening of records in eSRAP-DIY. However, only two logged into the system and rated records. Participant retention is a problem often observed when using crowdsourcing (Strong & Simmons, 2018). Also, compared to systematic reviews, which typically have an end point, monitoring records for an online inventory is an ongoing process, making it more challenging to keep the crowd motivated. Currently, the workload to screen new records remains manageable (around 600 new records per year), but it is likely to increase in line with the annual growth in the number of articles published, estimated at approximately 3% (Ware & Mabe, 2015). To improve the engagement of the crowd, it is necessary to provide frequent feedback on its performance (Strong & Simmons, 2018), such as sending the raters a regular newsletter email on the crowd’s progress or notifications of new records.
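Using the figures above (about 600 new records per year and roughly 3% annual growth in published articles), the future screening workload can be projected with a simple compound-growth calculation; the function below is a sketch based only on those two stated figures.

```python
def projected_records(base: float = 600, growth: float = 0.03, years: int = 0) -> int:
    """Project the annual number of records to screen after `years` years,
    assuming `base` records in year 0 and a constant annual `growth` rate."""
    return round(base * (1 + growth) ** years)

# Under these assumptions, the yearly workload grows from 600 records
# to roughly 696 after 5 years and roughly 806 after 10 years.
for year in (0, 5, 10):
    print(f"year {year}: {projected_records(years=year)} records")
```

Even at 3% growth the workload rises slowly in absolute terms, which is consistent with the observation that screening remains manageable for now but will not stay flat.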
This inventory has some limitations. First, it is possible that not all eligible CATs were identified and included in the inventory. We limited our search to articles indexed in bibliographic databases and did not search for tools published in the grey literature. Also, eSRAP-DIY monitors only one bibliographic database (SCOPUS). Second, we adapted the selection criteria depending on the tool category. On the one hand, given that several CATs have been developed for the RCT and non-randomized studies categories, we mainly retained the most recently developed tools that had information on both validity and reliability. On the other hand, given the paucity of existing tools for the mixed methods and systematic reviews categories, our inclusion criteria were less stringent; we retained tools even if there was information only on validity or only on reliability. Third, we selected the CATs based on whether they included validity and/or reliability testing, regardless of the results of that testing. We advise the users of our inventory to read the results of the studies listed in the CAT descriptive tables to decide which tools to use.
Conclusion
This article presents the development and maintenance of an online inventory of CATs to help librarians, practitioners, graduate students, and researchers identify and select the proper tools to use. Usability studies are necessary to test the ease of use, acceptability, and usefulness of our CAT inventory. There is also a need to explore alternatives to manual monitoring, such as machine learning approaches, which could help reduce the workload by reducing the number of irrelevant records to screen (O’Mara-Eves et al., 2015). We invite the research community to collaborate by informing us about new or missing CATs, providing feedback on the inventory’s utility and usability, or joining the ‘crowd’ to screen records identified in the eSRAP-DIY system.
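As a toy illustration of how automation could reduce the screening burden, records can be prioritized by how strongly their titles match terms typical of the CAT literature, so that screeners see likely-relevant records first. This is only a sketch: the seed terms and sample titles are invented, and real systems of the kind reviewed by O’Mara-Eves et al. (2015) use trained classifiers and active learning rather than keyword overlap.

```python
# Hypothetical seed terms drawn from vocabulary typical of CAT articles
SEED_TERMS = {"critical", "appraisal", "bias", "quality",
              "assessment", "tool", "checklist"}

def relevance_score(title: str) -> int:
    """Count how many seed terms appear in the title (crude relevance proxy)."""
    words = {w.strip(".,;:").lower() for w in title.split()}
    return len(words & SEED_TERMS)

# Invented example titles standing in for records retrieved by the monitor
records = [
    "A new checklist for quality assessment of mixed methods studies",
    "Surgical outcomes after knee replacement",
    "Risk of bias tool for non-randomized studies: a validation study",
]

# Present the highest-scoring (most likely relevant) records first
ranked = sorted(records, key=relevance_score, reverse=True)
```

Even this crude ranking pushes the clearly irrelevant title to the bottom of the queue; a learned classifier would do the same more reliably and could suppress low-scoring records entirely.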
Footnotes
Acknowledgments
This project was funded by the Quebec Strategy for Patient-Oriented Research (SPOR) SUPPORT Unit. We would like to thank Robin Melanson for the website development, Vera Granikov and David Tang for their support with eSRAP-DIY, and Paula Bush for commenting on an earlier version of this manuscript.
