What can we learn from reviews of realist evaluation studies? Principles for rigorous realist evaluation

Abstract

The realist evaluation approach has become firmly established within the field of evaluation. Reflecting its sustained and increasing uptake across diverse fields, a growing number of reviews have, over the years, examined practical applications of realist evaluations. Drawing on an umbrella review of 23 published reviews of realist evaluations, this article takes stock of key challenges in realist evaluation and proposes practical principles for addressing them. The proposed principles are designed to promote greater methodological congruence, coherence and transparency in the design and implementation of future realist evaluations.

Keywords

RAMESES II, realist evaluation, realist principles, research on evaluation, theory-based evaluation, umbrella review

Introduction

Since the publication of Ray Pawson and Nick Tilley’s (1997) seminal book Realistic Evaluation, the realist evaluation (RE) approach has become firmly established within the field of evaluation. Its growing influence is evident in the exponential proliferation of publications on RE (Lemire et al., 2020; Renmans and Pleguezuelo, 2023), the publication of dedicated books (Emmel et al., 2018; Manzano and Williams, 2025), the convening of international conferences (e.g. the International Realist Conferences in 2021 and 2025), and the development of quality standards for reporting on REs (Wong et al., 2017) and realist syntheses (Wong et al., 2014). As evaluation practice grapples with rising complexity and calls for theory-driven explanation, RE appears well positioned to play an increasingly central role in future evaluation research, policy learning and practice.

At its core, RE seeks to answer the question of how programmes work, for whom, and under what conditions (Pawson and Tilley, 1997). It rests on the assumption that interventions are “theories incarnate” (Pawson and Tilley, 1997), meaning that programmes embody underlying assumptions about causal processes. The conceptual structure guiding RE is expressed through context–mechanism–outcome configurations (CMOCs) or variants thereof (De Weger et al., 2020), which articulate how mechanisms operating under particular contextual conditions generate outcomes. For realist evaluators, context is an irreducible and dynamic component of explanation (Craig et al., 2008; Greenhalgh and Manzano, 2021). Accordingly, CMOCs are developed through retroduction, a form of reasoning that moves from observed patterns to hypothesising about the underlying causal mechanisms that account for them.

The explanatory ambitions of RE are grounded in scientific realism, yet the precise nature of this grounding has been the focus of considerable theoretical debate. Early debates centred on whether RE required adherence to critical realism (Porter, 2015). In his rebuttal, Pawson (2016) resisted Bhaskarian social theory as overly normative and argued that RE does not aim to offer a full theory of social structure, agency or emancipation. Instead, RE’s purpose is to develop testable, context-sensitive causal explanations of programmes in action (Pawson, 2016). More recently, Mukumbang et al. (2023) reframe this divide by arguing that RE draws on an amalgam of scientific and critical realist principles. This account positions realist programme theories as retroductively developed, fallible and context-sensitive explanatory propositions. Functionally, they are the equivalent of middle-range theories that mediate between abstract ontology and empirical findings.

In unpacking CMOCs, REs are “methodologically promiscuous,” potentially using quantitative and qualitative data to test theories (Van Belle et al., 2016: 313). However, the compatibility of RE with certain research designs, such as experimental designs, remains contested (Blackwood et al., 2010; Jamal et al., 2015; Van Belle et al., 2016). To this day, the RE community remains divided between advocates for combining RE with experimental designs (e.g. Bonell et al., 2024) and those that are strongly opposed, arguing that such realist trials are ontologically and epistemologically incongruent with realist principles (Van Belle et al., 2016). Such divisions illustrate how methodological debate continues to play a constitutive role in the development of RE.

Whereas rigour in (post) positivist experimental impact evaluation is grounded in randomisation as a means of isolating causal effects and minimising bias, realists emphasise rigour as explicit and reasoned theorising about programmes, the articulation of their underlying mechanisms and the iterative testing of these theories to build plausible explanatory accounts. To support such practices, realist scholars have developed the Realist And Meta-narrative Evidence Syntheses: Evolving Standards (RAMESES II) for design and reporting (Wong et al., 2017). Yet, given the expansive and flexible nature of RE, these standards are necessarily less prescriptive than those found in the experimental social and health sciences. As with other emerging methodologies, the application of RE remains open to interpretation and variation.

The maturation of RE as an evaluation approach is reflected in a growing number of reviews examining its application across diverse fields (e.g. Malengreaux et al., 2024; Taylor et al., 2024). These past reviews underscore both the methodological promise of RE and the challenges that accompany its practical implementation. Considered collectively, these reviews offer a comprehensive overview of the RE landscape.

Nearly three decades after Pawson and Tilley’s seminal publication, it is timely to take stock of what has been learned about the practice of RE. To this end, we conducted an umbrella review of published reviews of REs. An umbrella review synthesises evidence across multiple reviews on a given topic, offering a comprehensive overview of existing findings, identifying patterns and generating higher-level insights to guide future research and practice (Belbasis et al., 2022). This umbrella review addresses two related questions. First, what do existing reviews of RE studies reveal about how RE is practised, particularly with respect to conceptual, methodological and analytical issues across the RE cycle? Second, building on these insights, what overarching yet practically actionable principles can be articulated to address these challenges and strengthen rigour in future REs?

This article is structured in three parts. The review methodology is outlined first, followed by a presentation of the findings, which highlight key conceptual, analytical and methodological challenges related to RE. The article concludes by proposing and discussing a set of principles to inform and strengthen future RE practice.

Methodology

In this section, we describe the search, screening, coding and analysis procedures performed in the review.

Search strategy

To inform our umbrella review, we first conducted broad electronic searches in PsycINFO, PubMed, Web of Science, ERIC, Campbell Collaboration and Cochrane Libraries, supplemented by more targeted manual searches using Google Scholar, selected institutional websites and citation chasing. Search terms for the electronic searches combined “realist evaluation,” “realistic evaluation” and “review,” covering the period from 1997 to May 2025. Publications in English, French, German and the Scandinavian languages were considered.

All types of reviews—systematic, scoping, integrative, narrative, meta-reviews and meta-syntheses—that explicitly aimed to identify, appraise and/or synthesise findings from multiple RE studies were included. Reviews including both RE studies and realist syntheses were eligible. We included reviews published in academic journals and grey literature. We excluded review protocols, book reviews, opinion pieces, reviews of realist syntheses exclusively (with no individual RE studies), individual RE studies and topical realist reviews/syntheses of other types of studies (e.g. realist syntheses of non-RE studies on a given topic).

Our database searches initially identified 1133 publications. After abstract screening and removal of duplicates, 19 publications remained for full-text screening. Manual searching and citation chasing yielded an additional nine publications, resulting in 28 publications screened for eligibility. Following full-text screening, five articles were excluded as they did not meet the inclusion criteria. In total, 23 reviews of RE were included in the analysis. The selection process is depicted in the PRISMA diagram (Figure 1).

Figure 1.

PRISMA diagram of literature search.
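
The counts reported for the selection process can be checked with simple arithmetic. The following is a minimal sketch that only re-adds the figures stated in the text; the variable names are illustrative and are not taken from the review itself.

```python
# Tally of the selection flow, using only the counts reported in the text.
# Variable names are illustrative assumptions, not part of the original review.
database_hits = 1133            # publications identified by the database searches
after_abstract_screening = 19   # remaining after abstract screening and de-duplication
manual_additions = 9            # added through manual searching and citation chasing
excluded_at_full_text = 5       # excluded for not meeting the inclusion criteria

# Publications that went through full-text eligibility screening.
full_text_screened = after_abstract_screening + manual_additions

# Reviews retained for the analysis.
included_reviews = full_text_screened - excluded_at_full_text

print(f"Full-text screened: {full_text_screened}")
print(f"Included reviews:   {included_reviews}")
```

Run as written, the tally reproduces the 28 publications screened at full text and the 23 reviews included in the analysis.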

Coding and analysis

All publications were read by the authors. Coding and data extraction in NVivo were carried out by author 1, thus avoiding interrater reliability issues. Data were extracted using a two-pronged strategy. We used NVivo for detailed coding of the reviews and developed a summary table in Microsoft Excel to capture their key characteristics. The NVivo codebook, informed by prior knowledge of the literature, included the following overarching themes (parent nodes): (1) scope and objectives, (2) review type, (3) methodology, (4) data sources and search strategies, (5) domain, (6) methodological challenges, (7) analytical challenges, (8) conceptual challenges, (9) recommendations and (10) RAMESES standards. Data for the summary table were extracted by both authors and subsequently consolidated. The summary table was used to identify overall patterns across the reviews. While discussing and consolidating our own analysis, we also compared these findings against AI-generated output.

We applied Generative Artificial Intelligence (GenAI) models—ChatGPT 5.1 and NotebookLM—for two purposes: first, as a tool for researcher triangulation, adopting the perspectives of a realist theorist and evaluator. We prompted the models to generate summary tables and draft text, which were considered alongside our own analyses. These AI-generated outputs helped to challenge and deepen our interpretations in areas we initially found underexplored, such as stakeholder involvement and the philosophical underpinnings of RE. When cross-validating the AI-generated summary tables against those produced using NVivo, we found the AI outputs to be of lower quality than our human data extraction, illustrating the current frontier of GenAI in research support (Dell’Acqua et al., 2023). Second, we employed the models as proxy persona-based article reviewers (Bougie and Watanabe, 2024), simulating the perspectives of a realist theorist, a review methodologist and an evaluation practitioner. These AI-generated reviews informed revisions to the article, including the addition of references to concrete applications and refinements to the proposed principles in the discussion section.

Limitations

This umbrella review presents several limitations that warrant consideration. First, as an umbrella review, it synthesises findings from existing reviews of REs rather than directly analysing primary RE studies. This second-order analysis may obscure important methodological nuances present in individual REs. Second, the gross number of REs cited across the reviews includes substantial duplication (about 50%), and the exact number of unique REs remains undetermined. This compromises the precision of claims about the breadth of empirical evidence underpinning the review. Third, past reviews are heavily weighted towards evaluations in the health sector and authored predominantly by researchers in Anglo-European contexts. This overrepresentation may limit the transferability of findings to other sectors or regions with different evaluation traditions. Moreover, while peer-reviewed literature provides a degree of rigour, the exclusion of unpublished evaluations, dissertations and grey literature in many past reviews introduces the risk of publication bias, particularly in a field where much evaluation practice occurs outside academic settings. Finally, while using a single coder in NVivo alleviates concerns about interrater reliability, it introduces the possibility of coder bias and limits the breadth of interpretation. To address this, we used summary tables to discuss and consolidate data extraction and employed two GenAI models to interrogate and cross-validate our interpretations. All findings were subsequently reviewed and discussed jointly by the authors to ensure consistency and analytical rigour.

Findings

In this section, we first describe the key characteristics of the 23 reviews included in this umbrella review: their publication year (Figure 2) and their country of origin, main purpose, domain, coverage and application of quality standards (Table 1). Informed by this initial description, we then identify different review orientations and discuss three methodological challenges identified across the reviews.

Table 1.

Key characteristics of the reviews.

Authors (year)	Country of origin*	Purpose	Domain	Coverage	Quality standards applied in the review	Databases searched
Marchal et al. (2012)	Belgium	Assess whether published REs adhere to realist methodological principles.	Health; Health systems, health care	18 REs published in 1997–2011.	Not specified	PubMed and Web of Science/Social Sciences Citation Index
Ridde et al. (2012)	Canada	Assess how REs have been applied and identify methodological challenges.	Health; public health	10 REs published in 1997–2010.	Not specified	Medline, CINAHL, EMBASE, CAB HEALTH, SAGE Journals Online and ERIC
Salter and Kothari (2014)	Canada	Examine the use of RE (CMOC development and methodology) in knowledge translation interventions in health care.	Health; public health policy; Knowledge translation interventions	14 REs published in 2007–2013.	Not specified	Medline, SCOPUS, CINAHL and Embase
Lacouture et al. (2015)	France	Clarify the concept of mechanism within RE and propose a classification framework.	Health; Public health	49 publications published in 1997–2012.	Not specified	Medline, Academic Search Complete, ERIC, SAGE Journals Online, BDSP, Cairn info and ScienceDirect
Manzano (2016)	United Kingdom	Examine the use of interviewing in RE and propose principles for conducting realist interviews.	Health sector	40 REs using interviews published in 2004–2013.	Not specified	PubMed and Web of Science/Social Sciences Citation Index
Wong et al. (2017)	United Kingdom	Develop quality standards for RE design and reporting.	Cross sector	37 REs published in 2012–2015.	Not specified	CINAHL, The Cochrane Library, Dissertations & Theses, EMBASE, ERIC, Global Health, MEDLINE, MEDLINE In-Process & Other Non-Indexed Citations, PsycINFO, Scopus and Web of Science
Haunberger and Baumgartner (2017)	Switzerland	Map and analyse the use of RE in social work evaluations.	Social work	33 REs published in 1997–2013.	Not specified	ERIC, Health Source: Nursing/Academic Edition, IBSS, Medline, PsycINFO, Sociological Abstracts, Web of Science and WISO
Lemire et al. (2020)	USA	Examine how mechanisms are defined and applied in RE.	Cross sector	195 REs published in 1997–2017.	Not specified	Web of Science, ArticleFirst, PsycINFO, Social Work Abstracts, Sociological Abstracts, ERIC and Medline (PubMed)
Quintans et al. (2020)	Brazil	Identify and analyse RE concepts and methodologies in the health sector.	Health; Health systems	19 articles on RE published in 1997–2018.	Not specified	PubMed, LILACS, Cochrane, EVIPNet, Health Systems Evidence, Rx for change, PDQ Evidence, SciELO, Teses-CAPES and Google Scholar
Dalkin et al. (2021)	United Kingdom	Explore the usefulness of Normalisation Process Theory (NPT) in understanding mechanisms in REs.	Health; Health care	12 REs published in 2007–2019.	Not specified	ASSIA, CINAHL, Health Research Premium Collection, MEDLINE and PsycARTICLES
Greenhalgh and Manzano (2021)	United Kingdom	Explore the concept of ‘context’ in RE and realist synthesis.	Cross sector	40 REs and syntheses published in 2012–2015.	Not specified	CINAHL, The Cochrane Library, Dissertations & Theses, EMBASE, ERIC, Global Health, MEDLINE, MEDLINE In-Process & Other Non-Indexed Citations, PsycINFO, Scopus and Web of Science
Jenkins et al. (2021)	Australia	Map the use of realist methodologies in nutrition and dietetics research.	Health; public health interventions; Nutrition, dietetics	Seven REs published in 2000–2020.	Not specified	Eight domain-specific journals
Lam et al. (2021)	Canada	Compare and contrast the use of the Theory of Change and RE in supporting food security programmes.	Food security, international development	Eight REs published in 2011–2020.	Not specified	AGRICOLA, CabDirect and Web of Science
Malengreaux et al. (2022)	Belgium	Assess the design, implementation and methodological quality of REs.	Health; Public health; Health promotion, prevention (e.g. breastfeeding, oral health, physical activity)	17 REs published in 2010–2021.	Not specified	Embase, PubMed, PsycINFO, ScienceDirect and Scopus
Nielsen et al. (2022)	Denmark	Examine how context is conceptualised and operationalised in REs.	Cross sector	126 REs published in 1997–2017.	Not specified	Web of Science, ArticleFirst, PsycINFO, Social Work Abstracts, Sociological Abstracts, ERIC and Medline (PubMed)
Palm and Hochmuth (2020)	Germany	Analyse common features across REs to identify design patterns and conditions for success.	Health; Health care, nursing	28 REs published in 1997–2019.	Not specified	MEDLINE and CINAHL
Hitchcock et al. (2022)	South Africa	Assess realist approaches in the context of health systems and implementation research.	Health; Health systems, service delivery, clinical interventions, social programmes	31 REs published in 1997–2022.	Not specified	Scopus
Nielsen et al. (2023)	Denmark	Investigate methodological compatibility of RE with randomised controlled trials (realist trials).	Cross sector	16 realist trials published in 1997–2017.	Informed by: WWC design standards, Cochrane Collaboration standards, RAMESES design standards	Web of Science, ArticleFirst, PsycINFO, Social Work Abstracts, Sociological Abstracts, ERIC and Medline (PubMed)
Rees et al. (2024)	Australia	Critically analyse the use of realist interviews in Health Professions Education Research (HPER).	Health; Health care, Health professions education	12 REs using interviews and published in 2013–2023.	Five criteria informed by RAMESES	Five domain-specific journals
Renmans and Pleguezuelo (2023)	Belgium	Examine data collection methods and challenges in REs.	Cross sector	166 REs published in 1997–2021.	Not specified	Scopus, Web of Science, Global Health, EMBASE and MEDLINE
Malengreaux et al. (2024)	Belgium	Identify how stakeholders are involved in REs and synthesise guidance for best practice.	Cross sector	29 REs published in 1997–2023.	Not specified	Medline/PubMed, EMBASE, Web of Science, Scopus and Google Scholar
Taylor et al. (2024)	United Kingdom	Introduce RE and realist synthesis to the field of kidney research, providing practical guidance.	Health; Clinical health; Kidney research, nephrology	16 REs published in 1997–2023.	Not specified	Medline, Embase and PsycINFO
Nielsen and Lemire (2025)	Denmark	Examine analytical strategies in REs to identify and illustrate how analytical strategies relate to research design and data collection.	Cross sector	126 REs published in 1997–2017.	Not specified	Web of Science, ArticleFirst, PsycINFO, Social Work Abstracts, Sociological Abstracts, ERIC and Medline (PubMed)

Note. *First author.

Key characteristics of existing reviews

Publication year

The publication of reviews on RE has grown steadily over time, reflecting the broader expansion of the approach. Although Pawson and Tilley’s book (1997) marks the starting point, the first reviews did not appear until 2012 (Marchal et al., 2012; Ridde et al., 2012). Between 1997 and 2016, only six reviews were published, compared with 17 published between 2017 and 2025. This increase mirrors the wider proliferation of RE studies in recent years (Nielsen et al., 2022; Renmans and Pleguezuelo, 2023). The increasing trend in reviews of RE studies is illustrated in Figure 2.

Figure 2.

Reviews on realist evaluation published per year (2012–2025).

Country of origin

The geographical distribution of published reviews of REs is uneven, with most reviews stemming from the Global North (21 reviews). Two reviews come from (leading economies in) the Global South (Brazil and South Africa). Most reviews originate from Anglo-Saxon contexts (Australia, Canada, South Africa and the United States) and European countries (Belgium, Denmark, Germany and the United Kingdom), with only a single review published outside these linguistic regions—in Brazil (Quintans et al., 2020). This pattern broadly reflects the distribution of REs more generally, though researchers in the United Kingdom appear less represented among review authors. Nielsen and Lemire (2025) document that 13.5 per cent of published RE studies were conducted in Africa, Asia or South America.

Purpose

The 23 reviews varied in their primary purposes, reflecting, in large part, the diverse ways RE is studied and applied. Some reviews are primarily descriptive and aim to map the RE literature and identify key issues (e.g. Haunberger and Baumgartner, 2017; Lemire et al., 2020; Nielsen et al., 2022, 2023; Renmans and Pleguezuelo, 2023; Ridde et al., 2012). Other reviews are predominantly prescriptive and serve as a foundation for methodological development (e.g. Manzano, 2016; Quintans et al., 2020; Wong et al., 2017), while a third group of reviews are primarily normative and aim to advance RE within specific sector domains, such as health or social work (e.g. Jenkins et al., 2021; Lam et al., 2021; Taylor et al., 2024).

Domain

The domains covered by the reviews varied, with health (encompassing public health, clinical settings and health systems) being the dominant and primary focus of 13 reviews. This trend reflects a similar domain concentration observed in individual RE studies (Nielsen and Lemire, 2025). Eight reviews were cross-sectoral, while the remaining two focused on other health-adjacent domains, specifically social work (Haunberger and Baumgartner, 2017) and food security (Lam et al., 2021).

Coverage

The empirical coverage of the 23 reviews varied considerably, reflecting differences in purpose, domain and, to some extent, publication year. The largest reviews addressed broad, cross-domain topics and were published more recently: Lemire et al. (2020) with 195 publications, Renmans and Pleguezuelo (2023) with 166 studies, and Nielsen et al. (2022) and Nielsen and Lemire (2025) with 126 publications each. In contrast, other reviews included fewer than 20 publications, either due to a narrow purpose (e.g. the review by Nielsen et al., 2023, on realist trials) or domain focus (e.g. the review by Lam et al., 2021, on RE in food security) or because they were published in earlier years, when fewer RE studies had been published (Marchal et al., 2012; Ridde et al., 2012; Salter and Kothari, 2014).

Collectively, the 23 reviews drew on 1049 publications (some studies represented multiple affiliated papers). Of these, we were able to retrieve 881 articles (84%). After removing duplicates, 446 unique RE studies remained. Extrapolating to the full set of reviews, we estimate that roughly 535 unique studies constitute the empirical basis of the 23 reviews.
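
The text does not spell out how the extrapolation was carried out. As a rough plausibility check, the sketch below assumes that the share of unique studies among the retrieved publications also holds for the publications that could not be retrieved. This proportional assumption and the variable names are ours, not the review's.

```python
# Proportional extrapolation of unique RE studies, using the counts in the text.
# Assumption (not stated in the source): the unique share observed among the
# retrieved publications also applies to the publications that were not retrieved.
cited_publications = 1049      # publications drawn on across the 23 reviews
retrieved_publications = 881   # publications that could be retrieved
unique_retrieved = 446         # unique RE studies after removing duplicates

retrieval_rate = retrieved_publications / cited_publications
unique_share = unique_retrieved / retrieved_publications
duplication_share = 1 - unique_share

# Scale the unique share up to the full set of cited publications.
estimated_unique_total = unique_share * cited_publications

print(f"Retrieval rate:         {retrieval_rate:.1%}")
print(f"Duplication share:      {duplication_share:.1%}")
print(f"Estimated unique total: {estimated_unique_total:.0f}")
```

Under this assumption the estimate comes out at about 531 unique studies, of the same order as the reported figure of roughly 535, and the duplication share of about 49 per cent matches the "about 50%" noted in the limitations. The small gap suggests the reported estimate rests on a slightly different calculation.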

Quality standards

Early reviews use key characteristics of RE to structure their analysis, assess and identify challenges in the design and reporting of REs, but fall short of creating rubrics or standards to assess study quality (cf. Lacouture et al., 2015; Marchal et al., 2012; Salter and Kothari, 2014).

Wong et al. (2017) use their review as the basis, along with expert consultations and Delphi surveys, to develop the RAMESES II standards for designing and reporting RE studies. This effort marks an important juncture in codifying adherence to realist principles.

After the introduction of the RAMESES II standards, we found only two reviews that applied these standards in reviewing RE studies. Nielsen et al. (2023) applied the RAMESES II standards to assess whether a trade-off between RE and experimental design standards exists when conducting realist trials. Rees et al. (2024) partially applied RAMESES II standards, as they derived five criteria to assess the use of realist interviewing in health professions education research (HPER).

Only two of the remaining reviews explicitly addressed the RAMESES II standards as part of their review procedures. Among these, Malengreaux et al. (2022) noted that they refrained from using the standards to assess study quality but did not provide a rationale. Several other reviews incorporated RAMESES reporting either as a screening criterion or as a coding variable (Dalkin et al., 2021; Lemire et al., 2020; Malengreaux et al., 2024; Nielsen and Lemire, 2025; Nielsen et al., 2022).

In a similar vein, none of the reviews set out to explicitly examine the philosophical underpinnings (epistemological and ontological) of RE studies and how these align with basic realist principles. Greenhalgh and Manzano’s review (2021) comes close, as they note that such assumptions are pivotal for the operationalisation of context. However, they refrain from presenting a systematic elicitation from the cases included in their review. Nielsen et al. (2023) also describe how differences anchored in critical or scientific realism, or their interpretation, may inform what research designs and techniques can be considered in adherence with RE principles in the context of realist trials. However, they fall short of exploring the issue in more depth.

Review orientations

Informed by the purpose and domain of the reviews, the main orientation (or focus) of each review can be identified. In Figure 3, we map the 23 reviews against three main orientations: methodological, conceptual and domain.

Figure 3.

Diagram of review orientations.

Methodological reviews

Some studies concentrate on methodological issues, such as how to design REs, collect and analyse data, and report findings. For example, Renmans and Pleguezuelo (2023) map the use and combination of different data collection methods across REs. Manzano (2016) develops principles for conducting realist interviews, while Nielsen and Lemire (2025) map analytical strategies and techniques employed in REs.

Conceptual reviews

Other reviews emphasise conceptual issues, such as how key RE constructs are defined and operationalised. Lacouture et al. (2015), Lemire et al. (2020) and Nielsen et al. (2022) focus on definitions and operationalisation of mechanism and context in RE studies. Greenhalgh and Manzano (2021) analyse the context construct with particular attention to its epistemological and ontological underpinnings. Similarly, Malengreaux et al. (2024) document how stakeholder involvement has implications for research and methods.

Domain reviews

A third group of reviews primarily adopts a domain focus, exploring the application of RE in specific domains, such as health or social work. Some reviews illustrate how RE can provide new heuristics in domains dominated by positivist paradigms (e.g. Lam et al., 2021; Taylor et al., 2024), while others map the use and implications of RE within social work (e.g. Haunberger and Baumgartner, 2017) or the health domain (e.g. Palm and Hochmuth, 2020). Some reviews combine domain focus with methodological issues, using the empirical domain as a backdrop for examining adherence to RE principles (Marchal et al., 2012) or developing procedural steps in RE (Quintans et al., 2020).

Some studies integrate all three orientations, echoing calls in the broader theory-based evaluation literature to connect substantive and implementation theory to programme and CMOC development (Lemire et al., 2020). For example, Dalkin et al. (2021) apply Normalisation Process Theory (NPT) to elucidate mechanisms in a health setting. Hitchcock et al. (2022) investigate how systems thinking informs programme theory and implementation in health systems, and Salter and Kothari (2014) use the PARiHS framework to identify mechanisms and contextual factors.

Methodological challenges

The authors of the 23 reviews identify several methodological challenges related to the design and implementation of REs. Across the reviews, we identify three main challenges: methodological congruence, methodological convergence and methodological transparency.

Methodological congruence

Methodological congruence refers to the extent to which a study’s design is logically and philosophically coherent (Creswell, 2013). Methodological congruence is particularly salient in RE, where the research design, data collection and analysis methods in unison should enable the testing of the programme theory.

While realist scholars broadly agree on methodological pluralism—allowing flexibility in tailoring research designs and methods to specific needs—RE is nonetheless grounded in epistemological and ontological foundations rooted in scientific and/or critical realism. This raises an important question: Are some designs inherently incompatible with the philosophical underpinnings of RE?

Several reviews have addressed this issue of philosophical alignment. Nielsen et al. (2023) summarise the debate on realist trials (see also Bonell et al., 2024; Van Belle et al., 2016), showing that while integration of RE with designs grounded in other ontological traditions is feasible in practice, it can present significant challenges in adhering simultaneously to established quality standards for randomised controlled trials and to the RAMESES II standards. Often, the impact study (involving the randomised controlled trial design) and the implementation study (using an RE approach) are reported separately. In their review, Nielsen et al. (2023) identified only two realist trials (out of 16 studies) that adhered to quality standards for both RE and randomised controlled trials and successfully integrated both of these aspects of their design.

Early reviews (Marchal et al., 2012; Ridde et al., 2012) noted that REs often lacked explicit CMOC logic, exhibited weak theorising and provided limited explanations of generative mechanisms. More recent reviews report persistent variation in how key realist constructs are defined, as well as analytical challenges in distinguishing mechanisms from both context and programme components (Lemire et al., 2020; Nielsen et al., 2022).

Greenhalgh and Manzano’s (2021) review of context in RE illustrates how study design and method selection shape the way realist constructs are conceptualised, operationalised and analysed. Greenhalgh and Manzano identify marked ontological and epistemological variation across realist studies, with some conceptualising context in a largely positivist or actualist manner (as static, observable features that trigger mechanisms) while others adopt a more scientific realist stance, treating context as relational, dynamic and constitutive of causal processes. These differences shape whether studies aim primarily at identifying transferable conditions for implementation or at developing explanatory, middle-range theories of how context–mechanism interactions evolve over time. Consequently, they argue that context in RE should not be treated as a statistical variable but rather as an irreducible and dynamic component of explanation, operating through interwoven micro-, meso- and macro-level processes.

Such differences in epistemological and ontological assumptions were also observed in other reviews. For example, the reviews by Palm and Hochmuth (2020) and Salter and Kothari (2014) reveal varied philosophical orientations among realist evaluators—some drawing on critical realism, others on scientific realism, and still others adopting a more pragmatic stance—though these positions are often left implicit. Defining rigour in terms of a realist logic of inquiry, therefore, requires explicit reflection on epistemological and ontological assumptions in study design, a point also emphasised by Renmans and Pleguezuelo (2023).

In sum, the uneven alignment between designs and realist principles identified across reviews highlights the need to align study design and analytic procedures more closely with the realist logic of inquiry within a coherent methodology. Methodological congruence may be supported by adherence to guiding principles and standards for RE. Although the introduction of the RAMESES II standards for designing and reporting REs (Wong et al., 2017) has provided a framework for such alignment, to date only one review (Nielsen et al., 2023) has applied these standards systematically to assess adherence to realist principles. More consistent application of the RAMESES II standards, combined with methodological innovations such as realist interviewing and focus group techniques, may help strengthen methodological congruence in future studies.

Methodological convergence

Methodological convergence refers to strengthening the rigour of findings by demonstrating consistency across different methods or sources (Sánchez-Gómez and García, 2018). This can be achieved by triangulation of different data sources, methods, analytical techniques and theoretical perspectives (Yin, 2018). Methodological convergence is particularly important in RE because adequately explaining outcomes and the context–mechanism interactions that generate them often requires the integration of multiple data sources and methods.

Several reviews show that RE studies commonly employ multiple strands of qualitative or mixed-methods data collection (Nielsen and Lemire, 2025; Renmans and Pleguezuelo, 2023), providing opportunities to design methodologically convergent lines of inquiry. Renmans and Pleguezuelo (2023) report a particularly high reliance on interviews (97%), followed by observations (participant, video and document/monitoring data) (55%), surveys (26%) and what they term “innovative methods” such as vignettes, diaries and photographs (8%). They further note that realist interviews specifically are used in only 18 per cent of studies, and they call for a broader range of data collection techniques to better elucidate mechanisms, as well as greater attention to sampling strategies and realist-informed survey methods.

When examining data sources, Renmans and Pleguezuelo (2023), however, observe that

half of the REs are based on data solely from the users/beneficiaries or the key informants (policymakers, implementers, service deliverers, etc.). Although specific evaluation circumstances may justify this focus on one group of respondents, the influence of interests and social position . . . may give a biased and incomplete understanding of the intervention. (p. 6)

In a similar vein, Malengreaux et al. (2024) found that REs rarely provide a theoretically grounded account of who stakeholders are, when and how they should be involved, and why their perspectives matter. In most cases, stakeholder involvement was limited to knowledge validation, with decision-making authority concentrated among evaluators rather than shared with participants or communities. This potentially raises questions about the extent to which perspectives across stakeholder groups are fully integrated into the REs.

Considered collectively, these findings suggest that a more purposeful pursuit of triangulation across not only data collection methods but also data sources is called for.
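
One way to make this kind of triangulation auditable is to keep a simple evidence matrix that records, for each CMOC, which collection methods and which respondent groups contributed data. The following Python sketch is illustrative only: the CMOC labels, method names, respondent groups and the default threshold of two are hypothetical assumptions, not drawn from any of the reviewed studies.

```python
from collections import defaultdict

# Hypothetical evidence log: (CMOC label, collection method, respondent group).
evidence = [
    ("CMOC-1", "interview", "beneficiaries"),
    ("CMOC-1", "survey", "implementers"),
    ("CMOC-1", "observation", "implementers"),
    ("CMOC-2", "interview", "beneficiaries"),
    ("CMOC-2", "interview", "beneficiaries"),
]


def triangulation_gaps(records, min_methods=2, min_sources=2):
    """Return CMOCs whose evidence rests on too few methods or source groups."""
    methods = defaultdict(set)
    sources = defaultdict(set)
    for cmoc, method, source in records:
        methods[cmoc].add(method)
        sources[cmoc].add(source)

    gaps = {}
    for cmoc in methods:
        problems = []
        if len(methods[cmoc]) < min_methods:
            problems.append("too few methods")
        if len(sources[cmoc]) < min_sources:
            problems.append("too few source groups")
        if problems:
            gaps[cmoc] = problems
    return gaps


gaps = triangulation_gaps(evidence)
print(gaps)
```

In this made-up log, CMOC-2 is flagged because all of its evidence comes from interviews with a single respondent group. Such a check does not establish convergence by itself; it only shows where convergence has not yet been sought.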

Methodological transparency

Methodological transparency refers to conceptual transparency (clarity in defining and operationalising realist constructs) and procedural transparency (clarity in documenting analytical processes and methodological choices). A recurring concern in the reviews is the lack of transparency in how realist evaluators and researchers define, apply and report key concepts and analytical procedures.

Reviews consistently highlight that REs often lack clear and consistent definitions of core constructs (Lacouture et al., 2015; Nielsen et al., 2022; Ridde et al., 2012; Salter and Kothari, 2014). Conceptual confusion is evident in the varied or absent definitions of foundational constructs, with a risk of conflation or misapplication. Greenhalgh and Manzano (2021) found that only 45 per cent of studies included an explicit definition of context, while Nielsen et al. (2022) reported a similar figure (48%). Lemire et al. (2020) observed that nearly half of the studies did not define mechanisms, and when definitions were provided, they varied widely across studies. They also found that these differing conceptualisations of mechanisms can be traced to the work of Pawson and Tilley (1997) and that of Astbury and Leeuw (2010). In a similar vein, Greenhalgh and Manzano (2021) argue that different conceptualisations of context can be traced to different philosophical and methodological underpinnings.

Comparable inconsistencies were also found in how outcomes were defined and assessed (Salter and Kothari, 2014). This lack of definitional clarity contributes to persistent confusion between mechanisms and context (Lacouture et al., 2015; Nielsen et al., 2022; Ridde et al., 2012; Salter and Kothari, 2014). Taken together, these observations raise questions about how core realist principles are being operationalised in practice (Marchal et al., 2012).

RE explicitly encourages the development and testing of multiple CMOCs (De Weger et al., 2020). Programmes are often complex and likely to operate through different mechanisms depending on context. Studies that elicit or test too few CMOCs risk oversimplifying these dynamics and narrowing the scope of theory building. By contrast, working with several CMOCs within the same study allows evaluators to capture the heterogeneity of contexts, explore alternative mechanisms and generate more robust and practically useful explanations of what works, for whom, how and under which circumstances. However, this comes at the risk of losing sight of key explanatory drivers. This calls for transparent and reasoned identification and selection of CMOCs in the different phases of the realist cycle (De Weger et al., 2020).

Nielsen and Lemire (2025) reported a range from 1 to 23 CMOCs per evaluation, with an average of 4.1 CMOCs per evaluation, and noted that 77 per cent of evaluations contained five or fewer CMOCs. They further observed that limited explanation was provided for why certain CMOCs were prioritised for testing while others were abandoned when moving from the initial programme theory to the CMOCs being tested.
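
The reasoned selection of CMOCs called for here could, for instance, be documented in a simple decision log kept alongside the initial programme theory. The sketch below is a hypothetical illustration in Python: the field names, the 1 to 3 rating scale, the three rating criteria, the additive score and the cut-off are all assumptions made for the example, not a procedure reported in the reviews.

```python
from dataclasses import dataclass


@dataclass
class CandidateCMOC:
    label: str
    relevance: int          # 1 (low) to 3 (high), rated by the evaluation team
    plausibility: int       # same hypothetical 1-3 scale
    explanatory_power: int  # same hypothetical 1-3 scale
    rationale: str          # free-text justification, reported with the findings

    def score(self) -> int:
        # Simple additive score; a real study might weight criteria differently.
        return self.relevance + self.plausibility + self.explanatory_power


def select_for_testing(candidates, cutoff=7):
    """Split candidates into tested and set-aside lists, keeping the rationale for both."""
    tested = [c for c in candidates if c.score() >= cutoff]
    set_aside = [c for c in candidates if c.score() < cutoff]
    return tested, set_aside


# Made-up candidates for illustration only.
candidates = [
    CandidateCMOC("CMOC-A", 3, 3, 2, "Central to the programme theory; supported by prior literature."),
    CandidateCMOC("CMOC-B", 2, 1, 2, "Plausible only in settings outside the evaluation scope."),
]

tested, set_aside = select_for_testing(candidates)
for c in set_aside:
    print(f"{c.label} not tested (score {c.score()}): {c.rationale}")
```

The point of the log is less the score than the recorded rationale: CMOCs that are set aside remain visible, together with the reason they were not taken forward.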

Additional concerns arise in relation to the use of data collection methods. Several reviews emphasise that the procedures for selecting methods and data sources (and analytical strategies) are often underdescribed or missing altogether (Haunberger and Baumgartner, 2017; Salter and Kothari, 2014). As just one example, Rees et al. (2024) found that studies often failed to specify how realist interview questions aligned with the initial CMOCs and how empirical data contributed to refining the final CMOCs.

When testing the CMOCs, reviews also identified limited procedural transparency in reporting analytical steps, techniques and methodological decisions. Nielsen and Lemire (2025) found that only half of the reviewed studies explicitly reported the analytical techniques employed. While evaluators frequently employ thematic analysis, Nielsen and Lemire (2025) caution that such approaches are not always congruent with the analytical needs in a realist logic of inquiry.

Jenkins et al. (2021) similarly observed inconsistent documentation of coding strategies or techniques for developing and refining CMOCs in the field of nutrition and dietetics, while Hitchcock et al. (2022) noted that many health system evaluations used CMOCs only partially or failed to link them transparently to their findings.

Taken together, these findings suggest that real-world RE remains a maturing approach, characterised by variability in both conceptual clarity and procedural rigour. Addressing this variability requires more explicit definitions of key constructs, clearer documentation of analytical processes and transparent reporting of methodological choices. Judged against Morse et al.’s (1996) criteria for concept maturity (clarity of definition, delineated boundaries, specified preconditions and observable outcomes), RE appears best characterised as still developing rather than fully consolidated. This has important implications. Persistent variation in the definition and operationalisation of core constructs may constrain cumulative middle-range theory building, underscoring the need for greater conceptual clarity to support methodological congruence and analytical rigour. Moving forward, developing codified procedures and tools, alongside consistent application of frameworks such as the RAMESES II standards, may help strengthen both conceptual and procedural transparency in RE.

Recommendations for future realist evaluation

Motivated by the challenges identified, many of the reviews provide recommendations on how to improve future REs. To organise the most salient recommendations, we adopt Salter and Kothari’s (2014) framework of four phases of RE: (1) formulation of an initial programme theory articulated as CMOCs, (2) data collection informed by the CMOCs, (3) data analysis and testing of the CMOCs and (4) refinement of CMOCs based on the findings. These recommendations are summarised in Table 2. Taken together, these recommendations underscore the need for clearer definitions, codified procedures and explicit reasoning throughout the RE cycle.

Table 2.

Recommendations across four phases of an RE.

Phase	Activities	Data collection and analytical tools	Recommendations
A. Formulating initial programme theory and its CMOCs	a) Formulation of initial programme theory b) Development of potential CMOCs c) Generation of testable hypotheses for CMOCs	1. Research literature analysis 2. Document analysis 3. Stakeholder consultation 4. Programme theory construction	Ad A. Explain why RE is fit for the evaluation purpose and define key constructs and realist principles. Ad a. Apply substantive theory in formulating programme theory and mechanisms (Dalkin et al., 2021; Hitchcock et al., 2022). Ad b. Apply implementation theory to develop CMOCs (Salter and Kothari, 2014). Ad 3. Consider the who, why, where, when and how of stakeholder involvement (Malengreaux et al., 2024). Ad c. Make a reasoned selection of CMOCs (Nielsen and Lemire, 2025).
B. Data collection	d) Collect data appropriate to test hypotheses for CMOCs	1. Research design 2. Quantitative data collection methods 3. Qualitative data collection methods	Ad 1. Consider methodological congruence in research design including its epistemological underpinnings (Greenhalgh and Manzano, 2021; Nielsen and Lemire, 2025). Ad 1. Make the programme theory and its CMOCs referential for data collection and analysis. Ad 1. Create a protocol with an explicit and reasoned analytical strategy for testing CMOCs (Nielsen and Lemire, 2025). Ad 1. Consider triangulation of methods and sources (Haunberger and Baumgartner, 2017; Nielsen and Lemire, 2025). Ad 1. Consider the who, why, where, when and how of stakeholder involvement (Malengreaux et al., 2024). Ad 1. Consider theory-driven data collection (Renmans and Pleguezuelo, 2023). Ad 1. Give consideration to sampling strategy (Renmans and Pleguezuelo, 2023). Ad 2. Consider a realist survey (Renmans and Pleguezuelo, 2023). Ad 3. Apply realist interviewing where appropriate (Manzano, 2016; Rees et al., 2024).
C. Data analysis and hypothesis testing	e) Data analysis centred on testing hypotheses	1. Statistical analytical techniques 2. Qualitative analytical techniques 3. Mixed-methods convergence	Ad 1. Apply descriptive or inferential statistics to support causal claims where appropriate, such as Structural Equation Modelling (Nielsen et al., 2023; Renmans and Pleguezuelo, 2023). Ad 2. Consider appropriate techniques for single or multicase studies, such as Linked Coding Approach, Process Tracing or Qualitative Comparative Analysis (Nielsen and Lemire, 2025). Ad 3. Seek converging lines of inquiry (Renmans and Pleguezuelo, 2023; Nielsen and Lemire, 2025).
D. Refining the CMOCs	f) Assessment of empirical findings and verification of hypotheses g) Refine CMOCs	1. Programme theory revision	Ad f. Display how and why programme theory has been refined (Nielsen and Lemire, 2025). Ad g. Consider the who, why, where, when and how of stakeholder involvement (Malengreaux et al., 2024).

Source. Adapted from Nielsen and Lemire (2025), originally adapted from Salter and Kothari (2014).

Discussion

This umbrella review of 23 published reviews on RE reveals a maturing, yet heterogeneous, field, which is still grappling with persistent methodological challenges. These challenges largely revolve around ensuring congruence between epistemological assumptions, study design and analytical strategies; convergence of findings through triangulation; and issues related to lack of conceptual and procedural transparency. Nonetheless, authors express cautious optimism, with several reviews proposing frameworks and recommendations to scaffold future rigorous realist practice.

In advancing quality in RE, different instruments serve different purposes. Standards such as RAMESES II are valuable for quality assessment and peer review but offer limited guidance for real-world decisions about designing and implementing an RE study. Nevertheless, the RAMESES II standards have played an important role in supporting the maturation of RE as an evaluation approach. Building on this work, our aim is not only to provide recommendations drawn from existing reviews but also to go a step further by proposing a set of principles to guide realist practice. Principles serve as heuristics for thinking and decision-making, helping evaluators navigate trade-offs, design choices and philosophical tensions in real-world contexts—for example, aligning evaluations with commissioner requirements such as the MAGENTA guidelines (HM Treasury (British government), 2020).

Drawing on the work by Patton (2017), the proposed principles are developed using the GUIDE criteria: Guiding, Useful, Inspiring, Developmental, Evaluable. Using these criteria makes each proposed principle explicit, justified and actionable. The first five principles (Principles 1–5) focus on laying the conceptual and methodological foundations. Together, these principles create the conditions for rigour. They ensure that, from the outset, an RE has conceptual clarity, methodological coherence and a theoretically informed framework. As documented, these are common issues raised across past reviews.

1. Clarify philosophical grounding: Make explicit the underlying philosophical orientation within realism (whether scientific, critical or a pragmatic blend) and explain how this stance informs design choices, CMOC construction, analytic strategies and interpretation of findings.

• Guiding: Enhances methodological coherence by aligning ontology, epistemology and methods.

• Useful: Enables critical appraisal of design and analytical choices.

• Inspiring: Encourages evaluators to engage with foundational debates in realism.

• Developmental: Supports reflective refinement as philosophical understanding deepens.

• Evaluable: Observable in the explicit articulation of philosophical stance in study reports.

2. Define and justify realist constructs clearly: Use precise, contextually justified definitions of mechanisms, context and outcomes—and explain why these interpretations are most meaningful for the evaluation at hand.

• Guiding: Anchors the entire evaluation in coherent realist reasoning.

• Useful: Helps avoid conceptual conflation in analysis.

• Inspiring: Encourages evaluators to ground practice in realist philosophy.

• Developmental: Allows definitions to adapt with empirical and theoretical progress.

• Evaluable: Reviewable through reported definitions and rationale.

3. Make programme theory central to research design: Treat the initial programme theory and its CMO configurations as the foundational framework for all aspects of evaluation design, including data collection, sampling and analysis.

• Guiding: Keeps the evaluation focused on testing the programme theory.

• Useful: Helps align the method with the purpose.

• Inspiring: Reinforces realist commitment to causal explanation.

• Developmental: Allows programme theory to evolve through iteration.

• Evaluable: Evident in the coherence between theory, design and data use.

4. Knit theory by integrating substantive and implementation theory: Enrich programme theory and CMO configuration development by drawing on both substantive theory (which explains why a phenomenon occurs by specifying the causal mechanisms) and implementation theory to explain how and why interventions work in context.

• Guiding: Broadens the conceptual lens of CMOCs.

• Useful: Deepens explanatory power and practical relevance.

• Inspiring: Encourages interdisciplinary theory use.

• Developmental: Supports refinement as new insights emerge.

• Evaluable: Observable in theory articulation and justification.

5. Scope CMO configurations purposefully: Identify a scope and number of CMO configurations sufficient to meaningfully test the programme theory—neither overextending nor under-specifying causal explanations.

• Guiding: Prevents over- or undercomplex theorising.

• Useful: Supports parsimonious yet powerful inference.

• Inspiring: Promotes thoughtful reflection on programme complexity.

• Developmental: Encourages flexibility as complexity and context are better understood.

• Evaluable: Transparent through reported CMOC maps and rationale.

The second set of five principles (Principles 6–10) shifts focus from foundational issues to execution. These principles aim to help evaluators ascertain that the evaluation is conducted and reported in a way that delivers transparent, reasoned and plausible explanations. They address the weaknesses in procedural transparency, methodological execution and stakeholder engagement that the review identified.

6. Prioritise and justify CMOCs systematically: Select and prioritise CMO configurations and hypotheses through explicit, theory-informed criteria that reflect relevance, plausibility and explanatory power.

• Guiding: Helps evaluators manage analytic scope.

• Useful: Directs data collection and testing towards the most salient questions.

• Inspiring: Elevates theoretical sensitivity and judgement.

• Developmental: Allows reprioritisation as new evidence emerges.

• Evaluable: Justifiable through stated criteria and documentation of prioritisation.

7. Define an analytical strategy early and align it with theory testing: Develop and document a realist-compatible analytical strategy from the outset, detailing how data will be used and analysed to formulate, test and refine CMOCs across the evaluation.

• Guiding: Shapes coherent and transparent analytical decisions.

• Useful: Enhances the rigour and interpretability of findings.

• Inspiring: Encourages creativity in analytical design within realist logic.

• Developmental: Can be revised as inquiry progresses.

• Evaluable: Demonstrable in design documents and analytic narrative.

8. Ensure design congruence across methods, theory, and realist principles: Align study design, data collection and analytical techniques to support realist causal inference, ensuring philosophical and procedural consistency throughout.

• Guiding: Avoids mismatches between realist logic and positivist tools.

• Useful: Provides a clear rationale for method choice and convergence.

• Inspiring: Encourages integration of theory and method.

• Developmental: Allows thoughtful adjustments during fieldwork.

• Evaluable: Evident in evaluation protocols and the coherence of execution.

9. Triangulate data sources and methods to strengthen programme theory testing: Use multiple data sources and collection methods to enhance testing and build richer, context-sensitive explanations.

• Guiding: Strengthens explanatory claims through multiple lenses.

• Useful: Increases the robustness and credibility of findings.

• Inspiring: Encourages evaluators to think expansively about evidence.

• Developmental: Supports adaptation of sources as new insights arise.

• Evaluable: Evident in documentation of methods, sampling and source integration.

10. Explicate stakeholder involvement: Define stakeholders and integrate involvement purposively throughout the RE cycle.

• Guiding: Ensures programme theory reflects multiple real-world perspectives and operational knowledge.

• Useful: Strengthens contextual analysis and plausibility of CMOCs.

• Inspiring: Fosters collaborative ownership and utilisation of findings.

• Developmental: Supports ongoing refinement of programme theory and triangulation through feedback loops.

• Evaluable: Observable in documentation of stakeholder roles, contributions and influence on CMOCs.

We readily admit that a structured and inclusive developmental process akin to the RAMESES I and II projects (Wong et al., 2014, 2017) would undoubtedly have strengthened the legitimacy and quality of the proposed principles. Nevertheless, we believe the proposed principles can be practically useful for guiding the selection of study design, data sources and collection methods, and analytical techniques. Informed by the current umbrella review, and particularly by the work of Nielsen and Lemire (2025), we posit that the principles may help realist evaluators in at least two ways: practically and heuristically. By informing RE practice, the principles may help evaluation practitioners avoid a number of the practical challenges identified in the current review. The principles address both foundational and procedural issues in RE. As such, they may help evaluators better adhere to the RAMESES II standards, inform the development of protocols and the training of RE practitioners, and guide future reviews of RE studies.

The principles may also serve to support future RE studies heuristically. Collectively, these principles are oriented towards strengthening RE’s contribution to middle-range theory building. By promoting conceptual clarity, methodological congruence and transparent reasoning, they seek to create the conditions under which explanatory propositions can accumulate, be refined across contexts, and travel beyond single case studies. Advancing RE in this direction is essential if it is to fulfil its ambition of generating transferable, yet context-sensitive, causal explanations.

Conclusion

This umbrella review synthesises key challenges and advances in RE, drawing on 23 published reviews. While RE offers strong potential for generating explanatory insights, the published studies also document conceptual ambiguity, methodological inconsistency and analytical opacity. To address these gaps, we proposed a set of guiding principles aligned with Patton’s GUIDE criteria to promote greater methodological congruence, convergence and clarity in future RE studies. The proposed principles aim to strengthen realist practice and support the continued development of standards, tools, evaluator training and rigour in thinking in future RE studies.

Footnotes

Acknowledgements

The authors would like to thank Ray Pawson and Stine Øien Dandanell Garn for constructive critique of earlier versions of this article.

ORCID iD

Steffen Bohni Nielsen

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was funded by institutional resources.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

AI use declaration

The authors used ChatGPT (GPT-5.1, OpenAI) for data extraction and writing support, including language refinement and summary drafting during article development. AI was also used as a persona-based proxy reviewer of the article. All AI-assisted text was edited, verified and approved by the authors, who take full responsibility for the content and its interpretation. No confidential or proprietary information was provided to AI systems, and no AI tool is credited as an author.

Steffen Bohni Nielsen, PhD, is Director General at the Danish National Research Centre for the Working Environment, NFA and a member of the Danish National Research and Innovation Council.

Sebastian Lemire, PhD, is Senior Director of Evaluation and Learning at the Strada Education Foundation and a former board member of the American Evaluation Association.

References

Astbury and Leeuw (2010) Unpacking black boxes: Mechanisms and theory building in evaluation. American Journal of Evaluation 31(3): 363–81.

Belbasis, Bellou and Ioannidis JPA (2022) Conducting umbrella reviews. BMJ Medicine 1: e000071.

Blackwood, O’Halloran and Porter (2010) On the problems of mixing RCTs with qualitative research: The case of the MRC framework for the evaluation of complex healthcare interventions. Journal of Research in Nursing 15(6): 511–21.

Bonell, Melendez-Torres and Warren (2024) Realist Trials and Systematic Reviews: Rigorous, Useful Evidence to Inform Health Policy. Cambridge: Cambridge University Press.

Bougie and Watanabe (2024) Generative adversarial reviews: When LLMs become the critic. arXiv [preprint]. DOI: 10.48550/arXiv.2412.10415.

Craig, Dieppe, Macintyre, et al. (2008) Developing and evaluating complex interventions: The new Medical Research Council guidance. The British Medical Journal 337: a1655.

Creswell (2013) Qualitative Inquiry and Research Design: Choosing among Five Approaches, 3rd edn. Thousand Oaks, CA: Sage.

*Dalkin, Hardwick RJL, Haighton, et al. (2021) Combining realist approaches and normalization process theory to understand implementation: A systematic review. Implementation Science Communications 2: 68.

De Weger, Van Vooren NJE, Wong, et al. (2020) What’s in a realist configuration? Deciding which causal configurations to use, how, and why. International Journal of Qualitative Methods 19. DOI: 10.1177/1609406920938577.

Dell’Acqua, McFowland, Mollick, et al. (2023) Navigating the jagged technological frontier: Field experimental evidence of the effects of AI on knowledge worker productivity and quality. Harvard Business School Technology & Operations Mgt. Unit Working paper. 24-013, 15 September. SSRN. Available at: http://dx.doi.org/10.2139/ssrn.4573321

Emmel, Greenhalgh, Manzano, et al. (eds) (2018) Doing Realist Research. London: Sage.

*Greenhalgh and Manzano (2021) Understanding “context” in realist evaluation and synthesis. International Journal of Social Research Methodology 25(5): 583–95.

*Haunberger and Baumgartner (2017) Wirkungsevaluationen in der Sozialen Arbeit mittels Realistic Evaluation: empirische Anwendungen und methodische Herausforderungen. Zeitschrift für Evaluation 16(1): 121–45.

*Hitchcock OEJ, Grobbelaar and Vermeulen (2022) A scoping review and critical analysis of the literature surrounding a systems-thinking approach to RE, in the context of monitoring and evaluation. In: 2022 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM), Kuala Lumpur, Malaysia, 7–10 December 2022, 742–46. New York: IEEE.

HM Treasury (British government) (2020) Magenta Book. Central Guidance on Evaluation. United Kingdom. PDF (assets.publishing.service.gov.uk). HM Treasury.

Jamal, Fletcher, Shackleton, et al. (2015) The three stages of building and testing mid-level theories in a realist RCT: A theoretical and methodological case-example. Trials 16: 466.

*Jenkins, Maugeri, Palermo, et al. (2021) Using realist approaches in nutrition and dietetics research. Nutrition & Dietetics 78: 238–51.

*Lacouture, Breton, Guichard, et al. (2015) The concept of mechanism from a realist approach: A scoping review to facilitate its operationalization in public health program evaluation. Implementation Science 10: 153.

*Lam, Dodd, Wyngaarden, et al. (2021) How and why are theory of change and realist evaluation used in food security contexts? A scoping review. Evaluation and Program Planning 89: 102008.

Lemire, Kwako, Nielsen, et al. (2020) What is this thing called a mechanism? Findings from a review of realist evaluations. New Directions for Evaluation 167: 73–86.

*Malengreaux, Doumont, Scheen, et al. (2022) Realist evaluation of health promotion interventions: A scoping review. Health Promotion International 37(5): daac136.

*Malengreaux, Martens, Castellano Pleguezuelo, et al. (2024) Stakeholder involvement in realist evaluation: A scoping review and best fit framework synthesis. Evaluation 31(1): 22–48.

*Manzano (2016) The craft of interviewing in realist evaluation. Evaluation 22(3): 342–60.

Manzano and Williams (eds) (2025) Realist Evaluation: Principles and Practice. London: Routledge.

*Marchal, van Belle, van Olmen, et al. (2012) Is realist evaluation keeping its promise? A review of published empirical studies in the field of health systems research. Evaluation 18(2): 192–212.

Morse, Mitcham, Hupcey, et al. (1996) Criteria for concept evaluation. Journal of Advanced Nursing 24: 385–90.

Mukumbang, De Souza and Eastwood (2023) The contributions of scientific realism and critical realism to realist evaluation. Journal of Critical Realism 22(3): 504–24.

*Nielsen and Lemire (2025) Nothing as practical as an analytical strategy. In: Manzano and Williams (eds) Realist Evaluation: Principles and Practice. London: Routledge, 40–56.

*Nielsen, Jaspers SØ and Lemire (2023) The curious case of the realist trial: Methodological oxymoron or unicorn? Evaluation 30(1): 120–37.

*Nielsen, Lemire and Tangsig (2022) Unpacking context in realist evaluations: Findings from a comprehensive review. Evaluation 28(1): 91–112.

*Palm and Hochmuth (2020) What works, for whom and under what circumstances? Using realist methodology to evaluate complex interventions in nursing: A scoping review. International Journal of Nursing Studies 109: 103601.

Patton (2017) Principles-Focused Evaluation: The GUIDE. New York: Guilford Press.

Pawson (2016) The ersatz realism of critical realism: A reply to Porter. Evaluation 22(1): 49–57.

Pawson and Tilley (1997) Realistic Evaluation. Thousand Oaks, CA: Sage.

Porter (2015) The uncritical realism of realist evaluation. Evaluation 21(1): 65–82. https://doi.org/10.1177/1356389014566134.

*Quintans, Yonekura, Trapé, et al. (2020) Realist evaluation for programs and services in the health area: An integrative review of the theoretical and methodological literature. Revista Latino-Americana de Enfermagem 28: e3255.

*Rees, Davis, Nguyen VNB, et al. (2024) A roadmap to realist interviews in health professions education research: Recommendations based on a critical analysis. Medical Education 58(6): 697–712.

*Renmans and Pleguezuelo (2023) Methods in realist evaluation: A mapping review. Evaluation and Program Planning 97: 102209.

*Ridde, Robert, Guichard, et al. (2012) L’approche realist à l’épreuve du réel de l’évaluation des programmes. The Canadian Journal of Program Evaluation 26(3): 37–59.

*Salter and Kothari (2014) Using realist evaluation to open the black box of knowledge translation: A state-of-the-art review. Implementation Science 9: 115.

Sánchez-Gómez and García AVM (2018) Convergence between quantitative and qualitative methodological orientations: Mixed models. In: Costa, Reis, Souza, et al. (eds) Computer Supported Qualitative Research. ISQR 2017. Advances in Intelligent Systems and Computing, vol. 621. Cham: Springer, 341–57.

*Taylor, Nimmo AMS, Hole, et al. (2024) An introduction to realist evaluation and synthesis for kidney research. Kidney International 105(1): 46–53.

Van Belle, Wong, Westhorp, et al. (2016) Can “realist” randomised controlled trials be genuinely realist? Trials 17: 313.

Wong, Greenhalgh, Westhorp, et al. (2014) Development of methodological guidance, publication standards and training materials for realist and meta-narrative reviews: The RAMESES (Realist and Meta-Narrative Evidence Syntheses: Evolving Standards) project. Health Services and Delivery Research 2(30). DOI: 10.3310/hsdr02300.

*Wong, Westhorp, Greenhalgh, et al. (2017) Quality and reporting standards, resources, training materials and information for realist evaluation: The RAMESES II project. Health Services and Delivery Research 5(28). DOI: 10.3310/hsdr05280.

Yin (2018) Case Study Research and Applications: Design and Methods, 6th edn. London: Sage.