Abstract
Purpose
Existing research quality appraisal tools in the social sciences suffer from disciplinary incompatibility and methodological limitations. This study addresses these gaps by evaluating current tools and developing a versatile new checklist to standardize quality assessment across quantitative, qualitative, and mixed-methods studies.
Design/Approach/Methods
A systematic literature analysis of 24 mainstream appraisal tools assessed their disciplinary compatibility, methodological scope, and criteria design. A new checklist, the Quantitative, Qualitative, and Mixed-Methods Studies (QQM) checklist, was developed using a utility-usability framework, integrating common indicators to ensure methodological inclusiveness across study designs.
Findings
Analysis revealed that most tools are restricted to a single method, that 67% focus on medical fields, and that only two are discipline-specific to the social sciences. Issues with existing tools include ambiguous criteria, rigid binary scoring, and outdated frameworks. The QQM checklist addresses these gaps with eight universal indicators plus method-specific criteria (4–6 for quantitative studies, 6 for qualitative studies), using a three-tiered scoring system (1–3 points per indicator).
Originality/Value
The QQM checklist offers a standardized solution for quality appraisal in the social sciences, transcending disciplinary and methodological boundaries. Its design enables AI integration, advancing standardization toward consistent, rigorous evaluations that are critical for funding, publication, and policy decisions. This work contributes to improving research quality and transparency in the social sciences.
Introduction
Research quality evaluation has always been important for the advancement of science (Harden & Gough, 2012). It also plays an important part in high-stakes evaluations of researchers’ and institutions’ performance, such as grant funding (Geuna & Martin, 2003). Within scientific publishing, the burgeoning of meta-analyses and systematic reviews calls for better judgments of the studies included in those syntheses (Harden & Gough, 2012; Heyvaert et al., 2013; Pluye & Hong, 2014). For instance, “garbage in, garbage out” has long been a concern for careless meta-analyses, which may produce biased conclusions for the scientific community (Egger et al., 2001; Nelson et al., 2018; Slavin, 1986, 1995). As a result, attempts have been made to apply quality appraisal, typically conducted using checklists or tools with multiple quality indicators, across most scientific domains (Appelbaum et al., 2018; Michie et al., 2005).
To date, a number of appraisal tools have been developed to increase the rigor and standardization of the assessment process (Harrison et al., 2021; Hong et al., 2018b; Sirriyeh et al., 2012). However, most of them favor a specific method (e.g., either qualitative or quantitative). We still lack comprehensive assessment tools for evaluating studies across multiple methods, particularly in the social sciences. Our literature review showed that, out of 24 comprehensive appraisal tools/checklists, only two were specifically designed for the social sciences: one focusing on Mexican Americans and the other on implementation science.
The lack of appropriate tools has led social science researchers to simply adopt tools developed in other disciplines. However, research quality and its evaluation are highly discipline-dependent (Collins et al., 2012; Fàbregues & Molina-Azorín, 2017). Social science research employs a wide variety of methods, from collecting and analyzing numerical data to using qualitative techniques to dissect complex social phenomena that defy simple quantification. As a result, generic tools may fail to capture the specificities of social science research, potentially overlooking important evaluative aspects and reducing the validity of the assessment (Koch & Harrington, 1998; Protogerou & Hagger, 2019). Additionally, some researchers use different tools to assess different study types separately in reviews or meta-analyses, which increases the workload and compromises the objectivity of the evaluation.
In addition to the difficulty of finding suitable tools, existing tools have certain shortcomings. A major one is the lack of validated, consensus-based criteria. Some tools also suffer from incomplete or unclear evaluation criteria, a lack of user guidance, and failure to keep pace with academic advancements (Heyvaert et al., 2013; Hong et al., 2019). What social science researchers need, therefore, is a standardized and easy-to-use checklist to guide research quality evaluation across various study designs.
To address this gap, the present study developed a research quality checklist that can be used for quantitative, qualitative, and mixed-methods studies. Building upon a previous version created by the authors for a scoping review (Salmela-Aro et al., 2021), the checklist includes eight generic indicators applicable to all study types, along with 4–6 tailored indicators for quantitative studies and 6 for qualitative studies. Suggested cut-off points are also provided to guide users in determining the eligibility of studies for further analysis. One purpose of this paper is to introduce the creation, initial validation, and application of the checklist. We hope the checklist will improve systematic reviews, meta-syntheses, and evidence-based practice in the social sciences by enabling studies to be assessed with greater rigor and consistency. It can also serve as a tool for editors evaluating articles for publication, for guiding research funding decisions, and for educating university students in conducting high-quality research.
Current research appraisal tools
With the increasing popularity of systematic reviews and meta-analyses (Hong et al., 2018b; Sirriyeh et al., 2012), the number of appraisal tools has grown considerably in recent years. As the necessary instruments for systematic examination of study quality, appraisal tools formalize the quality appraisal process and help ensure that it is conducted in a systematic, transparent, and reproducible manner (Harden & Gough, 2012; Hong et al., 2018b). Appraisal tools are commonly divided into two categories based on the methods they target: (a) appraisal tools for specific research approaches and (b) comprehensive appraisal tools.
The former type usually focuses on either quantitative or qualitative studies. Appraisal tools for quantitative studies typically comprise criteria for cross-sectional and longitudinal surveys as well as experimental/trial methods, for example, the Prevalence Critical Appraisal Instrument (Munn et al., 2014). Appraisal tools for qualitative studies, on the other hand, generally concentrate on the evaluation of interviews, observations, field studies, and case studies, with the Critical Appraisal Skills Programme (CASP) qualitative checklist being one of the most commonly used (Long et al., 2020). Given their specific focus on quantitative or qualitative methods, these tools cannot be used effectively to assess the quality of mixed-methods studies. In addition, some appraisal resources were designed specifically for systematic reviews or meta-analyses rather than empirical studies, such as the Cochrane Handbook (Chandler et al., 2019) and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Moher et al., 2010); these are therefore not included in our review.
The latter type, for comprehensive appraisal, aims to evaluate quantitative, qualitative, and mixed-methods studies and to rate them on a single evaluation scale. To achieve this, most appraisal frameworks take one of two approaches. The first is to develop a broad range of criteria covering all sub-dimensions of quantitative and qualitative methods. One of the most widely used examples is the set of critical appraisal tools from the Joanna Briggs Institute (2024), with 14 different sub-scales for different methods. The second approach emphasizes the unique methodological features of mixed-methods studies by adding criteria such as the “rationale for using a mixed method design, how the quantitative and qualitative components were integrated, and the added knowledge gained from the integration” (Fàbregues et al., 2021; Ivankova, 2014); the Mixed Methods Appraisal Tool (MMAT) is a widely used example (Hong et al., 2018a). In addition, some appraisal tools serve a particular discipline or cohort, including the Quality Assessment Tool for Studies With Diverse Designs (QATSDD) for medical/clinical studies (Sirriyeh et al., 2012), A comprehenSive tool to Support rEporting and critical appraiSal for implementation Science (ASSESS; Ryan et al., 2022), and the Transformative, Mixed Methods Checklist for Psychological Research With Mexican Americans (TMMC-MA; Canales, 2013).
Prior studies have pointed out that comprehensive appraisal tools are scarce because most tools focus on a specific type of design or method (Harrison et al., 2021; Sirriyeh et al., 2012). In a systematic review of over 500 appraisal tools, Hong et al. (2019) found that few included comprehensive criteria for quantitative, qualitative, and mixed methods. The shortage of comprehensive appraisal tools is especially evident in the social sciences. Building upon Hong et al.'s (2019) review of appraisal tools from 2000 to 2018, we conducted a follow-up literature search covering 2019–2024 (see Appendix A for the search protocol). In total, we identified 64 relevant studies featuring 24 comprehensive tools/checklists. Among them, 43 studies (67.2%) were published in medical/clinical/nursing journals (25 in Social Sciences Citation Index [SSCI] Q1–Q2 journals), 15 (23.4%) in method-related journals (10 in SSCI Q1–Q2, especially the Journal of Mixed Methods Research [JMMR]), and only five (7.8%) in education, psychology, or management journals (two in SSCI Q1–Q2). Regarding the tools/checklists themselves, 10 were comprehensive tools for most disciplines (four of which were targeted more toward medicine/clinical practice/nursing), and 11 were designed particularly for the medical discipline. Only two tools were designed for the social sciences (one for implementation science and one for psychological studies of Mexican American populations). As discipline plays a critical role in shaping researchers’ thinking and decisions regarding quality appraisal (Fàbregues et al., 2019), this shortage may restrict researchers’ ability to evaluate the quality of studies within this disciplinary field.
Moreover, the existing comprehensive appraisal tools are inconvenient to use for a number of reasons. In practice, there is a tension between “too few/simple criteria to guarantee quality” and “too many/complex criteria to guarantee appraisal efficacy.” On the one hand, some tools attempt to reduce users’ analysis burden by including fewer items; however, this may result in unclear and incomplete criteria (Heyvaert et al., 2013). Take the MMAT, a comparatively short appraisal tool, as an example: experts have noted that some of its items bundle several concepts without clear clarification, making them difficult to interpret and score (Hong et al., 2018b). On the other hand, other tools add more items and categories to build a more comprehensive framework. This, however, can produce overlapping, complicated, time-consuming, and less user-friendly tools, especially for less experienced researchers (Heyvaert et al., 2013; O'Cathain, 2010).
This difficulty plays out in the two main approaches to comprehensive tool development mentioned above. Including a comprehensive range of criteria for all method sub-dimensions is useful for users who focus on a specific method, but it can be inconvenient and time-consuming for those reviewing studies with various methods. Similarly, including specific criteria for mixed-methods studies benefits researchers interested in evaluating whether mixed methods outperform other study designs (Fàbregues et al., 2021; Ivankova, 2014). However, it may also increase the demands and difficulties of tool usage, especially when mixed-methods studies are only one part of a systematic review and users are less experienced.
Furthermore, existing appraisal tools often fall short in their scoring methods and procedures. On the one hand, few tools provide simple and clear guidance for quickly understanding the scoring standards and reaching a final assessment (e.g., suggested cut-off points). This limitation undermines their efficacy and convenience for users (Heyvaert et al., 2013) and has drawn criticism from several systematic review studies (e.g., Holl et al., 2016; McPherson et al., 2017; Orr et al., 2021). On the other hand, the criteria in most comprehensive appraisal tools are scored dichotomously (i.e., only “yes,” “no,” or “unknown”), without a medium score. Dichotomous responses usually fail to distinguish between studies with strong and weak coverage of an issue (Sirriyeh et al., 2012). Given these limitations, an easy-to-use yet comprehensive appraisal tool with a clear scoring method is needed to help researchers evaluate the quality of quantitative, qualitative, and mixed-methods studies, particularly in the social sciences.
Need for a brief but powerful comprehensive tool
As systematic reviews and meta-analyses are becoming increasingly widely utilized (Hong et al., 2018b; Sirriyeh et al., 2012), it is imperative to develop a brief comprehensive appraisal tool for researchers to save time and improve efficacy during literature selection (Harrison et al., 2021; Sirriyeh et al., 2012). Otherwise, researchers have to spend substantial time and effort on applying different tools to a single corpus of literature to appraise different methods (e.g., Leung & Peisah, 2023; Narsakka et al., 2022). Developing a comprehensive all-purpose tool is most urgent in social science research, since research quality and its evaluation are highly discipline-dependent (Collins et al., 2012; Fàbregues & Molina-Azorín, 2017), and available comprehensive appraisal tools for social sciences are rare, as we have explained above. See Table 1 for the summarized needs.
Table 1. Need for a brief and powerful comprehensive research appraisal tool.
Developing a brief comprehensive appraisal tool is also beneficial for building consensus in quality appraisal and for overcoming the drawbacks that stem from its absence. Specifically, the wide variety of appraisal frameworks makes it difficult for reviewers to choose the most appropriate one(s) for their study (Hong et al., 2018b). Moreover, without a consensus to guide criteria selection, tool designers must construct their frameworks from their own research experience and views (Fàbregues et al., 2019); consequently, their tools tend to be biased toward the methods they know best (e.g., the quantitative bias in QATSDD; Clausen et al., 2017; Fenton et al., 2015). Given these drawbacks, there is an urgent need for a comprehensive appraisal tool that identifies the key, comparatively consensual criteria from existing tools while overcoming their biases.
To design such a powerful appraisal tool, the primary requirement is that it be easy, quick, clear, and comprehensive in its use (Heyvaert et al., 2013). In other words, appraisal tools are highly appreciated when they are short, comprehensive, and clear (Hong et al., 2018b). Nonetheless, achieving this balance is demanding for comprehensive appraisal tools. For instance, O'Cathain (2010) attempted to develop a clear and comprehensive appraisal framework by including all the key criteria from prior tools, but found that the framework became too long and contained overlapping criteria.
To meet these requirements, there is a further need for simple and clear user guidance, which many current comprehensive appraisal tools lack, especially with respect to their scoring frameworks (Heyvaert et al., 2013). Providing concrete and transferable examples is one of the most crucial strategies for making user guidance simple and clear (Hong et al., 2018b), but many existing tools fail to do so. For instance, users found the examples provided by QATSDD challenging to apply when they did not match the methods used in the papers under review (Harrison et al., 2021).
Finally, a powerful tool should keep pace with academic developments so that it covers up-to-date requirements for research quality. For instance, recent years have seen an increasing emphasis on research ethics; however, many widely used comprehensive appraisal tools (e.g., the MMAT and QATSDD) do not include ethics as a criterion.
Theoretical framework guiding the development of the quantitative, qualitative, and mixed-methods studies [QQM] checklist
To design a brief, powerful, yet comprehensive research quality appraisal checklist, this study adopted a realistic stance for its theoretical framework (comprising utility and usability; Hong et al., 2018b). Specifically, the framework divides the “utility” and “usability” dimensions of the realistic stance into four and five sub-dimensions, respectively (see Table 2 for details).
Table 2. Theoretical framework guiding new appraisal tool construction.
Utility is evaluated by whether the tool can function as needed, with four sub-dimensions: Coverage, Completeness, Flexibility, and Educational Value. Coverage (or comprehensiveness) refers to the possibility of applying a tool to several study designs (i.e., qualitative, quantitative, and mixed-methods studies), while Completeness refers to the degree to which all important items and criteria are included. These two sub-facets are the most essential for any comprehensive appraisal tool. Furthermore, good utility also entails flexibility and educational value: the tool should be adaptable to the research topic or study design and usable as an educational resource for graduate students learning about study designs.
Usability is assessed by how well users can work with the tool, from the users’ perspective. It comprises five sub-dimensions: Learnability, Efficiency, Clear Clarification, Fit for Purpose, and Accessibility. Learnability and Efficiency usually interact: a good tool should be easy to learn and should yield good performance once learned. Clear Clarification requires that the criteria be easy to understand and that appropriate items be easy to select for appraisal, while Fit for Purpose requires identifying the targeted users and intended uses so that users can select appropriate tools for their purpose. Finally, the tool should be easily accessible online (Accessibility).
QQM checklist—The quality appraisal checklist for quantitative, qualitative, and mixed-methods studies
Introduction of QQM Checklist
Guided by the aforementioned theoretical framework, the Quality Appraisal Checklist for Quantitative, Qualitative, and Mixed-Methods Studies (QQM Checklist; see Appendix B) is designed to evaluate the quality of quantitative, qualitative, and mixed-methods studies in social science research. A prototype of the QQM Checklist was developed and used in a review of students’ engagement (Salmela-Aro et al., 2021), showing its potential for use by social science researchers across various types of reviews. The full QQM Checklist comprises a total of 20 indicators organized into three sections: (1) Study Procedures and Sample, (2) Additional Indicators for Quantitative Studies, and (3) Additional Indicators for Qualitative Studies. For mixed-methods studies, both the quantitative and qualitative indicators should be applied.
The first section consists of eight indicators that apply to all study types. These indicators assess key aspects of research design, including the description of the study setting, participant recruitment processes, study administration, informed consent, ethical considerations, sample description, achieved sample, and justification of sample size. Clearly describing data collection, sampling strategies, and analysis procedures ensures auditability, allowing future researchers to examine and replicate the findings (Nowell et al., 2017). For example, inadequate sample size justification can lead to underpowered studies and increase the risk of Type II errors (Cohen, 1988). Additionally, poor reporting of recruitment and participant selection processes may introduce selection bias, threatening both internal and external validity (Shadish et al., 2002). Finally, quality research today is expected to be conducted ethically and transparently.
The second section applies to quantitative studies (including the quantitative aspect of mixed-methods research) and assesses the rigor of data handling and measurement. It includes four indicators: missing data reporting, missing data handling strategies, measurement validity and reliability, and descriptive statistics. Addressing missing data is critical because mishandling can introduce bias and compromise statistical power (Little & Rubin, 2019). Furthermore, ensuring measurement validity and reliability aligns with psychometric principles, as unreliable instruments reduce the generalizability and replicability of findings (DeVellis, 2017). For longitudinal and repeated-measures studies, two additional indicators assess participant attrition and repeated measures’ reliability (RepeatR). These indicators are included because high attrition rates can lead to systematic biases in longitudinal analyses (Gustavson et al., 2012), and poor reliability in repeated measures can undermine the consistency of findings over time (Schmidt & Hunter, 1996).
The third section applies to qualitative studies (including the qualitative aspect of mixed-methods research) and examines the methodological robustness of data collection and analysis. It includes six indicators evaluating the description and rationale of instrument content, the inclusion of instrument examples, the specification of data analysis methods, the provision of analytical examples, and researcher reflexivity. These indicators align with Lincoln and Guba's (1985) criteria for trustworthiness, which emphasize the importance of credibility, dependability, and confirmability in qualitative research. For instance, reflexivity enhances the credibility of the findings by helping researchers recognize their own biases, positionality, and power dynamics with the study participants.
Each indicator is rated on a three-point scale (1 = No, 2 = Partial, 3 = Yes), and the total score is converted into a percentage for comparisons across different study types. Our checklist also includes suggested cut-off points, categorizing studies as poor (≤50%), moderate (51%–79%), or high quality (≥80%). Such cut-off points are commonly used in systematic reviews to guide the inclusion or exclusion of studies based on methodological rigor (Higgins et al., 2024).
Applying our checklist involves three main steps. First, users should begin by reading the instructions in the checklist and determining the study type (qualitative, quantitative, or mixed methods). Second, users need to select and apply the appropriate sections based on the study type. A flowchart (Figure 1) is provided to guide section selection based on study type and design. For qualitative studies, users should apply the first section (“Study Procedures and Sample”) and the third section (“Additional Indicators for Qualitative Studies”). For quantitative studies, the first section (“Study Procedures and Sample”) and the second section (“Additional Indicators for Quantitative Studies”) should be applied, noting that the “Attrition” and “RepeatR” indicators in the second section only apply to studies with repeated measures or longitudinal data. Mixed-methods studies require the application of all three sections. Finally, scores from all applicable sections are summed and converted into a percentage score.

Figure 1. Flowchart for applying the QQM Checklist.
For example, a longitudinal quantitative study scoring 39 out of 42 points would have a score of 93%, indicating high quality. Notably, quality dimensions and cut-off points can be adjusted based on research objectives. For instance, all studies might be included in a scoping review, but stricter quality thresholds may be applied in a meta-analysis to ensure the validity and rigor of the synthesized findings. A further benefit of the QQM Checklist is its appropriateness for producing high-quality inter-rater reliability statistics because each item uses a three-point scale (enabling more refined judgements) and researchers can agree beforehand exactly which items should be scored.
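To make this procedure concrete, the following minimal Python sketch (our illustration, not part of the published checklist; the section names and function names are our own) implements the logic just described: selecting the applicable sections by study type, summing the 1–3 ratings, converting the total into a percentage of the maximum possible score, and applying the suggested cut-off points.

```python
# Minimal sketch of the QQM scoring logic (illustrative only).
# Indicator counts follow the paper: 8 generic indicators, 4 quantitative
# indicators (+2 for longitudinal/repeated-measures designs), 6 qualitative.

SECTION_SIZES = {"generic": 8, "quantitative": 4, "longitudinal_extra": 2, "qualitative": 6}

def applicable_sections(study_type: str, longitudinal: bool = False) -> list[str]:
    """Return the checklist sections to apply (cf. Figure 1)."""
    sections = ["generic"]
    if study_type in ("quantitative", "mixed"):
        sections.append("quantitative")
        if longitudinal:
            sections.append("longitudinal_extra")  # Attrition and RepeatR indicators
    if study_type in ("qualitative", "mixed"):
        sections.append("qualitative")
    return sections

def qqm_score(ratings: list[int]) -> tuple[float, str]:
    """Convert per-indicator ratings (1 = No, 2 = Partial, 3 = Yes) into a
    percentage of the maximum possible score plus a suggested quality label."""
    assert all(r in (1, 2, 3) for r in ratings)
    pct = 100 * sum(ratings) / (3 * len(ratings))  # divide by the maximum possible score
    label = "high" if pct >= 80 else "moderate" if pct > 50 else "poor"
    return pct, label

# Worked example from the text: a longitudinal quantitative study has
# 8 + 4 + 2 = 14 indicators (maximum 42 points); a total of 39 gives ~93%.
ratings = [3] * 11 + [2] * 3   # sums to 39
print(qqm_score(ratings))      # -> (92.857..., 'high')
```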
QQM checklist validation
To validate the effectiveness of the QQM Checklist, we first surveyed six experts in the social sciences from China. All of them received doctoral degrees (four in education, one in psychology, and one in sociology) from English-speaking programs (one in the USA, one in Hong Kong SAR, and four in Europe), so all were proficient in both English and Chinese. On average, they had five years of post-Ph.D. working experience. The survey covered all the dimensions of utility and usability with 14 items (see Table 3), rated on a 5-point Likert-type scale (1 = Strongly Disagree to 5 = Strongly Agree). The results (Table 3) indicated good quality of the checklist, with high mean scores of 4.5–5.0, small standard deviations of 0.41–0.82, and small coefficients of variation (CV) of 0.08–0.19 for nearly all items (except Item 1, with mean = 3.83, SD = 1.47, and CV = 0.38). The intraclass correlation coefficient (ICC), rather than Kappa, was calculated to assess inter-rater agreement given the ordinal variables (Gisev et al., 2013). The ICC of 0.93 indicated good inter-rater agreement, illustrating a high consensus among the experts.
Table 3. Experts’ survey results and agreements.
Later, we invited two young researchers (a master’s student and a junior doctoral student) to evaluate three studies using quantitative, qualitative, and mixed methods, respectively (the studies are listed in Supplementary Appendix C), and examined their inter-rater reliability. Good inter-rater agreement was observed, with Kappa values of 1.00 (quantitative), 0.84 (qualitative), and 0.79 (mixed methods), demonstrating that even young researchers can gain a good command of this checklist.
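For readers who wish to reproduce such agreement statistics, the short Python sketch below shows how the coefficient of variation and Cohen's kappa can be computed with standard libraries. The rating arrays are hypothetical placeholders, not our actual data, and the ICC reported above would require a dedicated routine (e.g., the intraclass_corr function in the pingouin package).

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score  # standard implementation of Cohen's kappa

# Hypothetical expert ratings for one survey item (six experts, 5-point Likert scale).
item_ratings = np.array([5, 4, 5, 5, 4, 5])
cv = item_ratings.std(ddof=1) / item_ratings.mean()  # coefficient of variation = SD / mean
print(f"mean={item_ratings.mean():.2f}, SD={item_ratings.std(ddof=1):.2f}, CV={cv:.2f}")

# Hypothetical checklist ratings (1 = No, 2 = Partial, 3 = Yes) given by two
# raters to the same study; kappa measures chance-corrected agreement.
rater_a = [3, 3, 2, 3, 1, 3, 2, 3]
rater_b = [3, 3, 2, 3, 1, 3, 3, 3]
print("kappa =", cohen_kappa_score(rater_a, rater_b))
```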
Unique features of the QQM checklist
To show the unique contributions of the QQM Checklist, we conducted a comprehensive comparison between it and two other widely used appraisal tools for empirical studies: the MMAT (Hong et al., 2018a, 2018b) and the JBI critical appraisal tools (Joanna Briggs Institute [JBI], 2024). The detailed criteria for different methods in these three tools are listed in Table 4. In terms of similarities, the QQM Checklist not only adopts the basic criteria from these two tools (see the bold text in Table 4) but also aligns with the theoretical framework of Hong et al. (2018a, 2018b), the designers of the MMAT. Specifically, both appraisals are based on the concepts of Utility (quality of the tool: Coverage, Completeness, Flexibility, and Educational Value) and Usability (user-friendliness of the tool: Learnability, Efficiency, Clear Clarification, Fit for Purpose, and Accessibility), as discussed in the section “Theoretical framework guiding the development of the QQM Checklist.”
Table 4. Comparisons of QQM with MMAT and JBI.
Regarding key differences, the QQM Checklist fills gaps in existing tools by focusing mainly on “research quality,” so as to screen out studies with less valid methods, procedures, and results, whereas the MMAT and JBI place more emphasis on “paper quality” (including writing quality). The QQM Checklist therefore comprises richer criteria for basic research principles (e.g., setting, recruitment, administration, consent, ethics, and sample) while excluding criteria related only to paper writing, such as the coherence of different sections and the effective integration of mixed methods. Consequently, studies with good research quality but comparatively lower writing quality would score higher on the QQM Checklist than on the other two tools, as their results and findings remain trustworthy and valuable for systematic reviews and meta-analyses. Additionally, the QQM Checklist integrates various empirical methods into the framework of quantitative, qualitative, and mixed methods, captures their key common principles, and provides a balanced number of criteria for each (i.e., six items in each method-specific sub-facet). This design is more tailored to social science studies, which have less complicated method profiles and place equal importance on different methods. In contrast, the other two tools provide detailed categories of methods (especially quantitative methods) and corresponding criteria to meet the requirements of medical research (e.g., 14 sub-scales in JBI).
Discussion
Throughout the literature, we found a shortage of quality evaluation tools suited to social science studies, especially tools that balance comprehensiveness, ease of use, and reliable judgment. Some tools are less user-friendly due to overlapping criteria, poor layout, and insufficient user guidance. For instance, the JBI and CASP tools, applicable to multiple study designs, provide a separate checklist for each design. However, there is considerable overlap between the general quality criteria for qualitative and quantitative research (Brannen, 1992), so this design places unnecessary burdens on users who must switch between different sub-scales for the same criteria. To address this issue, our checklist consists of a unified section that applies to all research methods, along with two separate sections containing indicators tailored to quantitative and qualitative research (mixed-methods studies draw on both). Each section clearly identifies the applicable research types and provides specific descriptions of each indicator, including how the generic indicators may differ in their application to quantitative, qualitative, and mixed-methods studies (e.g., justification of sample size in qualitative versus quantitative research). Additionally, unlike most tools that list indicators first and provide explanations in a separate section, our checklist places each indicator alongside its description in a parallel structure. This makes it easier for users, especially beginners, to interpret and score, thereby reducing variability in scoring and enhancing the efficiency of the assessment.
To ensure reliable judgment, our checklist provides criteria and examples specific to the social sciences. Applying tools designed for other disciplines to evaluate social science research can be problematic, as they often prioritize criteria specific to those fields. For instance, tools developed for clinical and medical research emphasize internal validity, generalizability, and experimental control, which may be less applicable to the social sciences, especially in qualitative research. A key issue is the quantitative bias inherent in some of these tools; for instance, small sample sizes are unnecessarily penalized by the Quality Assessment Tool for Studies With Diverse Designs. Given that social science research often deals with human behavior in complex social and cultural contexts, qualitative methods such as observational studies and ethnography, which often use smaller samples, can provide valid and valuable evidence. The misalignment between qualitative methodologies and appraisal tools focused on quantitative metrics can therefore lead to the underestimation or exclusion of qualified qualitative research (Carter & Little, 2007). To avoid this bias, our checklist bases its sample size criteria on the justification of participant numbers for the chosen analysis method, rather than on total sample size per se. In addition, the qualitative section prioritizes the credibility of the analytical process rather than factors such as replicability or statistical power. Another limitation of other tools is their emphasis on reporting quality over the appropriateness of research conduct. This can lead to misleading evaluations, as poor reporting does not necessarily indicate flaws in research design or execution. To address this, our checklist emphasizes methodological rigor by evaluating the integrity and transparency of the research process, including planning, data collection, and analysis.
Research ethics is also included in our checklist to protect participant rights, ensure academic integrity, and promote social responsibility. Many assessment tools (e.g., MMAT) prioritize core methodological criteria while ignoring ethical issues. This may lead to unassessed research misconduct and malpractice, which undermines the credibility of the research and causes negative social impacts. This problem is particularly acute in sensitive research settings, such as studies dealing with controversial topics or involving vulnerable populations that are often conducted in the social sciences. In light of this, our checklist emphasizes ethical requirements such as informed consent and obtaining ethical clearance from a mandated body.
To accurately reflect study quality, a scoring system is adopted whereby each indicator is evaluated as “no,” “partially,” or “yes,” receiving numerical scores of 1, 2, and 3, respectively. This scale addresses the limitations of the dichotomous scoring used in tools like the MMAT, which fails to distinguish between studies with stronger and weaker coverage of a given issue, oversimplifying the assessment. For example, a study employing a complex statistical assessment may receive the same score as one relying solely on a face validity check. Additionally, dichotomous scoring may ignore partial compliance with criteria and exclude studies that have limitations on some criteria but are still of high quality, biasing the synthesis through the absence of otherwise eligible studies. On the other hand, a scoring system with many rating categories may reduce inter-rater reliability, as observed with the QATSDD (Sirriyeh et al., 2012). To address this, our checklist employs a three-point scale and provides clear descriptions and transferable examples that indicate what constitutes full or partial evidence. This adds transparency and consistency to the scoring process and prevents reviewers from scoring based on personal interpretations. The overall score, summed across all applicable sections, is converted into a percentage by dividing it by the maximum possible score for those sections. This allows for standardized comparisons and categorization of study quality. It also avoids the unfair scoring seen in tools like the QATSDD, where mixed-methods studies may receive inflated scores simply because they are assessed against a wider range of metrics due to their dual qualitative and quantitative components, even if the individual quality of each component is not superior to a well-conducted single-method study. Moreover, our checklist sets different maximum scores for different study types (e.g., 54 points for mixed-methods studies versus 42 points for qualitative studies), ensuring that studies are not unfairly scored based on the number of methods used.
Our checklist can be integrated with artificial intelligence (AI) tools to enhance usability and scoring consistency. Large language models (LLMs), for instance, can be used to assess studies based on checklist indicators and generate initial scores. These scores can then be compared with human ratings to evaluate inter-rater reliability and refine the scoring process. Recent work by Zhu et al. (2025) demonstrated the feasibility of such an approach in a systematic review and meta-analysis, where ChatGPT was used for screening and coding processes, achieving strong agreement with expert reviewers. Machine learning algorithms can also be used to identify which indicators most effectively distinguish study quality and adjust their weights accordingly. Additionally, AI-generated feedback could assist novice users in understanding the criteria and making more accurate judgments.
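As one possible (as yet untested) realization of this idea, an LLM could be prompted with each indicator's description and the study's methods text and asked to return a rating on the checklist's 1–3 scale, which could then be compared with human ratings. The sketch below assumes a hypothetical ask_llm() helper standing in for whatever LLM API is available; it is an illustration only, not an implementation reported in this paper.

```python
import re

def ask_llm(prompt: str) -> str:
    """Hypothetical helper wrapping an LLM API call; replace with a real client."""
    raise NotImplementedError

def llm_rate_indicator(indicator: str, description: str, study_text: str) -> int:
    """Ask an LLM to score one QQM indicator on the checklist's 1-3 scale."""
    prompt = (
        "You are appraising study quality with the QQM checklist.\n"
        f"Indicator: {indicator}\nDescription: {description}\n"
        f"Study methods section:\n{study_text}\n"
        "Answer with a single digit: 1 = No, 2 = Partial, 3 = Yes."
    )
    reply = ask_llm(prompt)
    match = re.search(r"[123]", reply)  # extract the first valid rating
    if not match:
        raise ValueError(f"Unparseable LLM reply: {reply!r}")
    return int(match.group())

# Ratings produced this way should be checked against human scores
# (e.g., via Cohen's kappa, as in the validation above) before any use in reviews.
```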
The checklist can benefit both novice and experienced researchers in the social sciences conducting systematic reviews and meta-analyses, reviewers assessing manuscripts for potential publication, and readers critically appraising and synthesizing research findings. Its flexibility and efficiency allow users to spend more time on in-depth analysis, reflection, and generating new research questions. Additionally, funding agencies can use the QQM checklist as a baseline check to evaluate the methodological rigor of research proposals. Furthermore, educators can use it as a teaching aid to help graduate students grasp and apply key concepts and standards in research design and execution.
Further considerations in use of QQM checklist
It is worth noting that effective quality appraisal presupposes a clearly formulated review question. Systematic reviewers and meta-analysts using the checklist still need to make value judgments, based on the study specifics and the review question, to determine whether an article is relevant to that question (Petticrew & Roberts, 2008). In addition, even with scoring guidelines in place, researchers’ personal backgrounds and judgments may still influence the final score, leading to inconsistency in scoring. An iterative process is therefore encouraged, whereby reviewers independently score and then compare and discuss to resolve discrepancies (Sirriyeh et al., 2012). Moreover, our choice of a three-point scoring system may not fully capture the nuanced features of some complex studies. Readers/users may set stricter or looser cut-off points (e.g., a threshold of 90% for “very high quality”) as long as they provide a reasonable rationale. Furthermore, although this paper outlines possible pathways for AI integration, no empirical implementation has yet been conducted. Future work will focus on the development, testing, and validation of such AI-assisted workflows to assess their feasibility, reliability, and acceptance among researchers. Finally, researchers from different cultures and regions are encouraged to apply the QQM checklist in their research and teaching practices and to provide us with feedback on its applicability. This will help us revise and update the checklist, further improving its usability internationally.
Footnotes
Contributorship
Xin Tang was responsible for the conceptualization of the work, contributed to writing the original draft, and was involved in reviewing and editing the manuscript. Additionally, Xin Tang provided supervision and resources for the project. Zixin Zeng contributed to writing the original draft and was involved in reviewing and editing. Haoyan Huang contributed to writing the original draft and determined the methodology. Jennifer Symonds was responsible for the conceptualization of the work, contributed to reviewing and editing the manuscript, and provided resources.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical statement
As a review study rather than an empirical study, this work did not survey human participants using measurement tools and was thus exempt from standard IRB review. Nevertheless, the study was conducted in accordance with the principles outlined in the Helsinki Declaration. Six human experts were asked to evaluate the QQM Checklist on the dimensions of utility and usability. Prior to their participation, all six expert reviewers were given a clear explanation of the study's purpose, the nature of their task (evaluating the research quality assessment tool), and the expected time commitment. Participation was voluntary, and reviewers were informed that they could withdraw at any time without consequence. Verbal agreement to participate was obtained from all experts before they were given access to the tool for review. All data collected were anonymized and have been handled with strict confidentiality.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The study is supported by Shanghai Jiao Tong University and University College London Strategic Partner Collaborative Project (PI: Xin Tang; Co-PI: Jennifer Symonds).
Supplemental Material
Supplemental material for this article is available online.
Appendix A. Procedure of systematic review
Search string

| Construct | Synonyms | Boolean |
|---|---|---|
| Comprehensive method | “qualitative, quantitative” OR “quantitative, qualitative” OR “qualitative and quantitative” OR “quantitative and qualitative” OR “mixed method*” | AND |
| Appraisal | appraisal OR evaluation OR assessment | AND |
| Tool | tool OR checklist OR framework OR report OR table OR guideline | |

Database search

| Item | Detail |
|---|---|
| Search engine | EBSCOhost |
| Databases | Academic Search Complete |
| Field codes | Title |
| Search code | TI((“qualitative, quantitative” OR “quantitative, qualitative” OR “qualitative and quantitative” OR “quantitative and qualitative” OR “mixed method*”) AND (appraisal OR evaluation OR assessment) AND (tool OR checklist OR framework OR report OR table OR guideline)) |
| Result N (2018–2024) | 255 |
| Duplicates removed | 119 |
| Irrelevant removed | 128 |

Summary of literature

| Source | N |
|---|---|
| Literature in 2018–2024 from EBSCOhost | 8 |
| Literature in 2000–2017 (Hong et al., 2019) | 34 (excluded 1 due to duplication; 29 due to single-method appraisal) |
| Added literature from the citations of key studies | 22 |
| Total | 64 (see notes in Excel workbook) |
Appendix B. Quality Appraisal Checklist for Quantitative, Qualitative, and Mixed-Methods Studies
Apply this section to quantitative, qualitative, and mixed-methods studies.
Apply this section to quantitative and mixed-methods studies. Please note that the attrition and repeated measures indicators are only relevant for longitudinal studies (do not fill these out for cross-sectional studies).
Apply this section to qualitative and mixed-methods studies.
Researchers may wish to use a convenient score to determine eligibility for further data processing (e.g., only include studies scoring 50% or higher). It is up to you to decide whether to employ cut-off points. Cut-off points should match the purpose of your study (e.g., if you are doing a scoping review, you might not want to reject any studies based on quality, whereas if you are doing a meta-analysis, you might want to restrict the quantitative studies based on whether they report measurement validity statistics). We have included some suggestions for quality dimensions with two cut-off points, emphasizing that these are only suggestions and not rules that researchers must follow.