Abstract
In this article, we discuss the methodological implications of data and theory integration for Theory-Based Evaluation (TBE). TBE is a family of approaches to program evaluation that use program theories as instruments to answer questions about whether, how, and why a program works. Some of the foundational work on TBE expresses the idea that a proper program theory should specify the intervening mechanisms underlying the program outcome. In the present article, we discuss how data and theory integration can help evaluators construct and refine mechanistic program theories. The paper argues that a mechanism is both a network of entities and activities and a network of counterfactual relations. Furthermore, we argue that although data integration typically provides information about different parts of a program, it is the integration of theory that provides the most important mechanistic insights.
Introduction
In this article, we discuss the methodological implications of data and theory integration for Theory-Based Evaluation (TBE). We take as a point of departure the interest—expressed in some of the main groundwork for TBE—in investigating the intervening mechanisms that contribute to a program outcome. We discuss the claims concerning mechanisms in TBE and whether these claims can justify methodological choices connected to data and theory integration. In addition, we concretize our discussion with a case: a research project conceptualizing and evaluating a national-scale professional development (PD) program for mathematics teachers in Sweden.
By delving into the complex methodological details of our case, we investigate what kinds of mechanisms are studied in our examples and how different forms of integration contribute to the construction and refinement of a mechanistic program theory, that is, a theory that accounts for how a program generates its target outcome.
In the paper, we argue for three main claims. First, researchers in the field of TBE should conceive of mechanisms as both networks of counterfactual relations and networks of theoretically specified entities and activities. Second, assuming this concept of mechanisms, our analysis of the case suggests that data integration provides useful information about the parts of a program (that is, the activities and materials that constitute the program's content) but makes no further contribution to the construction and refinement of a mechanistic program theory. Third, the integration of theories seems to be the main driver of mechanistic theorizing, and the use of theoretical resources seems to be pervasive throughout the construction and refinement of a mechanistic program theory.
The paper begins by introducing the concept of TBE and clarifying the role of program theories in this family of approaches. We then set the stage for our discussion by showcasing how the concept of a mechanism has been discussed in the context of TBE. In the same section, we propose a conceptualization of mechanisms in TBE. This conceptualization results from our interpretation of the claims concerning mechanisms in the TBE literature, but it is not a simple descriptive account of mechanistic commitments among evaluation researchers. The next section concerns how integrative methods have been discussed in TBE, and especially what rewards are connected to different forms of integration. In light of the discussions of mechanisms and integration in TBE, we move on to our analysis. This consists of two sections in which we discuss in what ways data integration and theory integration can contribute to constructing and refining mechanistic program theories, that is, theories that account for the mechanism through which a program generates its outcome. Our analysis is based on a case of a research project evaluating a national-scale PD program for mathematics teachers in Sweden. We conclude in the last section by summarizing our claims and discussing their contribution to the field of TBE.
Theory-Based Evaluation
The concept of theory-based evaluation has been around for at least three decades, as illustrated by the historical perspectives on TBE in Weiss (1997) and Rogers and Weiss (2007). TBE is a term that describes a family of approaches to the evaluation of interventions that have as a common denominator the use of an explicit theory as an instrument for answering questions about whether a program works, and also how and why it does (Chen, 2006; Coryn et al., 2011). In other words, TBE describes all the approaches to evaluation that focus on the logic of an intervention, and that use a theory to define this logic. Alongside this more common label is a longer list of alternatives, such as theory-driven evaluation, program-theory evaluation, theory-guided evaluation, theory-of-action, theory-of-change, program logic, or logical frameworks (Coryn et al., 2011), which, however, are not always used with the same distinct definitions (Rogers & Weiss, 2007).
Central to TBE is the use of a program theory, which should function as an instrument that guides both the design and the application of the evaluation (Coryn et al., 2011). The content and nature of program theories vary depending on the approach to TBE. To name an example: Weiss (1997) distinguishes between implementation theories and programmatic theories. The former describe the activities and materials that are involved in the program, together with its intermediate and final outcomes; the latter describe the processes or mechanisms through which the implementation of a program generates its outcome (for instance, the cognitive processing of the information provided during the program implementation). Weiss goes on to explain that Theory of Change approaches to TBE (also discussed in e.g., Funnell & Rogers, 2011; Rogers & Weiss, 2007) involve both programmatic and implementation theories. A theory of change describes the intervention as a chain of activities, processes, and intermediate outcomes. According to this approach, the intervention can be described as effective only if the activities described in the theory of change were actually implemented, and the contribution of the activities described (for instance, the cognitive processes resulting from the implementation) is accounted for in the theory of change and is empirically supported.
Mechanisms in TBE
Back in 1989, during the earlier development of TBE, Chen, one of the main contributors to TBE in this phase, discussed a conceptual framework for TBE (Chen, 1989). In this framework, he describes the main conceptual dimensions of theory in TBE. Theories that are employed for evaluating interventions must belong to certain theoretical domains, Chen (1989) argues, and one of these domains is what he calls the “intervening mechanism domain” (p. 393). According to Chen: This domain investigates the causal processes which link the implemented treatment to outcomes (i.e., the processes by which the treatment produces or fails to produce the desired outcome). Program treatment usually affects program outcomes through some intervening process. An investigation of intervening mechanisms will provide information about why a program works or does not work, and help to diagnose the strengths and/or weakness of a program for possible improvement. (1989, p. 393)
Chen describes here a set of specific constraints on the theory that determine the possibility of using a theory for evaluation purposes. The theories that TBE requires are causal, process-oriented, and likely to describe the causal interaction between several processes. Another early contributor to the mechanism-oriented discussion in TBE is Weiss (1997), who stressed the importance of knowing the mechanism for extrapolating program outcomes to other contexts: Knowing the mechanism that works is even more important for other sites that want to adopt the successful program. It is impossible for them to replicate the entire set of materials, procedures, physical conditions, and relationships that make up the program. They have to adapt it to their own participants. When they know the essential levers, they can make the adaptation without fear of losing the key component that makes the program effective. (Weiss, 1997, p. 511)
Although the concept of mechanism is present in the earlier contributions to TBE, it reached its centrality with the rise of realist evaluation, a specific approach to TBE, couched in the philosophy of critical realism and first developed by Pawson and Tilley (1997). The main claim of realist evaluation is that an outcome is the result of the interaction between a mechanism and a context. Pawson and Tilley conceptualize program theories as context-mechanism-outcome configurations to stress the importance of context. The theory that realist evaluators seek to develop and test accounts for how a specific context enables a mechanism to be effective. This form of theory is particularly fruitful for evaluation purposes, as it specifies not only how a program intervention works, but also under which contextual conditions, as Weiss (1997) stressed above, the effect of the program can be replicated.
We do not aim at providing an overview of the use of the mechanism concept in the TBE literature. Astbury and Leeuw (2010), Dalkin et al. (2015), and, more recently, Schmitt (2020), and Lemire et al. (2020) provide a detailed overview of this kind, showing the widespread use and the varieties of conceptualizations of the mechanisms in TBE. These overviews identify a number of recurrent themes concerning how mechanisms are conceptualized in the context of TBE. For instance, Astbury and Leeuw (2010) identify the terms “hidden,” “sensitive to variations in context,” and “generate outcomes” as recurring ways of characterizing mechanisms. Schmitt (2020) provides a taxonomy of two types of mechanism concepts used in the evaluation literature: behavioral mechanisms (describing changes in individual reasoning and decisions that mediate behavioral effects) and process mechanisms (describing how different activities involved in an intervention or program are linked together). In the same thematic volume on causal mechanisms in program evaluation, Lemire et al. (2020) provide further types of mechanism concepts in evaluation research: program components, psychological reactions, behavioral reactions, and contextual conditions.
These recent taxonomies are mainly descriptions of how the concept of mechanism is used in evaluation research and focus on what kind of components make up a mechanism.
What we observe in the literature on mechanisms in TBE is the absence of proposals for a normative standard about the nature of mechanisms. Arguably, the variety of uses of the term mechanism in the literature about TBE depends on this lack of a shared standard. Our first contribution in this paper is to fill this gap by proposing a normative definition of the concept of mechanism for TBE. Our proposal is based on how some features of mechanisms appear in the discussion around TBE, but it is not descriptive. Moreover, our proposal focuses on what defines a mechanism in TBE rather than on what constituents a mechanism should include. Our conceptualization consists of two claims:
1. The causal pathway claim, and
2. The entities-and-activities claim.
As we will elaborate later on, we argue that, together, these claims define how mechanisms should be understood in terms of these features, even if some existing approaches to TBE might not cohere with them. Our conceptualization does not settle the questions of what component types should make up a mechanism (such as program activities, psychological or behavioral reactions, or other forms of agency-related factors). We refer the reader to Dalkin et al. (2015) and to Lemire et al. (2020) for a discussion of this issue.
Causal Pathway Claim
As we mentioned above, Chen (1989) conceptualizes a mechanism as a set of “causal processes,” that is, “the processes by which the treatment produces or fails to produce the desired outcome” (p. 393). Crucially, Chen characterizes mechanisms as “intervening,” a concept that will be clarified by Tilley (2000) later on: In the case of social programmes we are concerned with change. Social programmes are concerned with effecting a change in a regularity. The initial regularity is deemed, for some reason, to be problematic. The programme aims to alter it. A pattern may be problematic for a whole variety of reasons. There may be crime problems, problems of pupils failing at school, health pattern problems, literacy difficulties, child care weaknesses and so on. […] The aim of a programme is to alter these regularities. Thus, […] evaluations of programmes are concerned with understanding how regularities are altered. (p. 5)
This passage clearly expresses a central idea of how programs produce their effect on their target variables: they effectively intervene to alter a regularity. This way of conceiving a causal mechanism fits well with so-called interventionist theories of causation, according to which A causes B if there is a potential intervention I on A that is capable of generating a change in B when all other factors relevant to A and B are kept stable (Woodward, 2005). A program theory describes just such a potential intervention I, whose role is to change the pre-intervention regularity A → B, disrupting or altering it (provided I is not correlated with any other factors relevant to A and B). The crucial aspect that we want to highlight here is that programs are, in light of the interventionist theory, counterfactually connected to their outcome: if the program had not been implemented, the regularity A → B would not have been altered. Woodward's (2005) interventionist theory of causation was originally intended as a counterfactual theory. In ideal situations, A causes B if there is a concrete intervention on A that leads to a ceteris paribus change in B. However, as Runhardt (2020) discusses, causal claims should remain well defined even in situations in which ideal interventions are not possible (situations that are common in program evaluation). In these cases, a causal claim is associated with a counterfactual claim about what change would result if an intervention were applied to A. Therefore, we should interpret the shared claim of Chen (1989) and Tilley (2000) about an intervening mechanism as a counterfactual concept, since ideal interventions will often not be available when evaluating programs.
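The interventionist reading can be illustrated with a minimal computational sketch. Every name, functional form, and number below is a hypothetical illustration of ours, not drawn from any actual program: a regularity A → B is modeled as a linear dependence, and the program is an intervention that weakens that dependence.

```python
# Toy sketch of the interventionist view of program causation.
# The coefficients and the linear form are made-up assumptions.

def outcome_b(a: float, program_implemented: bool) -> float:
    """B as a structural function of A; the program intervenes on the
    A -> B link, altering the regularity (coefficients are invented)."""
    effect_of_a = 0.2 if program_implemented else 0.8
    return effect_of_a * a

# Counterfactual contrast: same value of A, with vs. without the program.
b_without = outcome_b(10.0, program_implemented=False)  # 8.0
b_with = outcome_b(10.0, program_implemented=True)      # 2.0
# The program counts as causally effective because, holding A fixed,
# implementing it changes B: the regularity A -> B is altered.
```

The point of the sketch is only that the program's causal status rests on a counterfactual contrast between the implemented and the non-implemented scenario, with everything else held stable.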
A further indication of the nature of mechanisms is found in Pawson and Tilley's (1997) work: a mechanism is not a further intervening variable, but rather "an account of the make-up, behavior, and interrelationships of those processes which are responsible for the regularity" (p. 68). Precisely because a program theory is a theory, it is not enough to describe the intervention as an intervening factor having a counterfactual effect on a regularity; program theories should also decompose the intervening factor into more basic components and their relationships. Pawson and Tilley (1997) group these components into two basic families: resources and reasoning, where the former includes all the material artifacts that a program involves, and the latter the reactions of the individuals involved. A school-based program against bullying might use information material for parents to improve their capacity to identify the signs that their child is suffering from bullying. The mechanism consists of (1) the content of the information material, (2) the parents' cognitive beliefs and desires resulting from the information material, and (3) the relation between the two (the parents' capacity changes because the beliefs and desires result from reasoning processes made possible by the content of the information material). This is the idea of mechanisms as constitutive of the causal relationship between programs and their outcomes that Pawson and Tilley argue for in the same chapter. We will return to the issue of "makeup" in the next section. Here, we want to focus on the issue of relations. Assuming that the relationships between the parts of a program are a necessary feature of the mechanism, it seems that these inherit the more general counterfactual nature of programs discussed by Tilley (2000) above.
What we mean is that the relationships between the constituents of a mechanism in TBE are themselves counterfactual, and that the global counterfactual relation between programs and outcomes is determined by the internal counterfactual relationships among the parts of a program. However, a mere list of the counterfactual relations between parts is not enough to specify the mechanism responsible for the outcome. Mechanisms are structured, and their structure determines their outcomes just as much as their components do.
Consider again the anti-bullying program. It is supposed to work by raising parents' awareness, which is itself the effect of reading the information material. Even if the main components are in place (parents' awareness and information material), the account of how the program works is incomplete, as the program theory does not clarify that the parents' awareness would not have changed in the observed way if the information material had not provided the parents with the necessary resources.
A program mechanism is a network of counterfactuals, and the outcome of a program is brought about by the parts of the program and their structural arrangement: two networks of the same program parts arranged differently determine two different mechanisms. We call this the causal pathway claim. It is a claim about features of program mechanisms that are necessary (though perhaps not sufficient), and it sets a standard for our understanding of what a mechanism is, but not necessarily for how a mechanism should be empirically studied.
Claims like the causal pathway claim have been defended in the philosophical literature on mechanisms in science. According to Cartwright and Stegenga (2012), in order for a mechanism to answer a how-question, an evaluator must be able to "trace out the causal pathway from policy variable to effect" (p. 35). In other words, a mechanistic model of an intervention consists of a network of causal relations whose main end-node is the expected effect. Causal pathways are essential in the analysis of mechanisms, as they express the idea not only that an intervention on some part of the mechanism corresponds to an observable effect on its outcome, but also that the effect is effectively produced by the different parts of the intervention. Marchionni and Reijula (2019) argue for a similar claim. According to them, for any parts A and B of a mechanism M, in order for M to qualify as a causal mechanism, A must be counterfactually related to B; that is, the contribution of A to B would not have obtained if A had not occurred. Moreover, Marchionni and Reijula (2019) seem to claim that the causal pathway claim is both necessary and sufficient to define a mechanism: the component parts and their counterfactual relations are all there is to mechanisms.
The concept of mechanism that emerges from the causal pathway claim is often represented as a graph that includes all the relevant factors and in which internal counterfactual relations are represented as arrows. This is the approach to causal mechanisms advocated by Pearl (2000). On his approach, the contribution of each factor in a program can be represented as a structural equation, describing each factor as a function of the other factors. The global mechanism that explains the outcome of a program can then be represented as a system of structural equations. Factors that are determined only by contextual elements are exogenous, whereas factors that are determined by the parts of the program are endogenous. This is one way of conceptualizing the context-mechanism-outcome configuration introduced by Pawson and Tilley (1997).
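A Pearl-style system of structural equations can be sketched for the anti-bullying example. Every variable name, functional form, and coefficient below is an illustrative assumption of ours, not taken from Pearl (2000) or from any actual program theory.

```python
# Hypothetical structural-equation sketch of the anti-bullying mechanism.
# Exogenous factors come from the context; endogenous factors are
# determined by the parts of the program. All numbers are invented.

def material_quality(context_resources: float) -> float:
    # Exogenous input: determined by context, not by the program's parts.
    return min(1.0, context_resources)

def parental_awareness(material: float) -> float:
    # Endogenous: awareness as a function of the information material.
    return 0.9 * material

def detection_capacity(awareness: float) -> float:
    # Endogenous: capacity to spot signs as a function of awareness.
    return 0.7 * awareness

def program_outcome(context_resources: float) -> float:
    # The global mechanism: the composed system of structural equations.
    m = material_quality(context_resources)
    a = parental_awareness(m)
    return detection_capacity(a)

def outcome_do_awareness(fixed_awareness: float) -> float:
    # Pearl-style intervention ("do"): replace the structural equation
    # for awareness with a fixed value; downstream equations are unchanged.
    return detection_capacity(fixed_awareness)
```

The `do`-style function illustrates why structure matters: an intervention surgically replaces one equation while leaving the rest of the system intact, which is exactly what distinguishes a structured mechanism from a bare list of pairwise relations.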
Entities-and-Activities Claim
There are some interesting indications that the causal pathway claim might only tell a part of the story about the nature of program mechanisms. The issue is whether counterfactual relations are sufficient to build up a theory, that is, following Chen (1989), Weiss (1997), and Pawson and Tilley (1997), an explanatory account of how a program brings about its outcome.
As we mentioned in the previous section, Pawson and Tilley (1997) argue that the specification of intermediate variables is not enough to account for a mechanism; what is necessary is a theory specifying the "makeup" of and "relationships" among the parts of the program. As we argued in the previous section, the relationships between the parts of a program must be counterfactual, but nothing in that argument justifies the claim that specifying the counterfactual dependencies between the parts of a program sufficiently describes the program's mechanism. This is because counterfactual relations seem to lack the capacity to clarify the "makeup" of mechanisms. Counterfactual relations cannot indicate anything more than a relationship between changes (if X → Y, then there would not have been any change in Y if there had not been a change in X). However, counterfactual relations are grounded in concrete processes (a term used by Chen, 1989) that lend themselves to more specific descriptive accounts. The counterfactual relation is therefore, in itself, rather thin. Take, for instance, Astbury and Leeuw (2010): Mechanisms appear too frequently as unexplained ‘causal arrows’ that seem to flourish so well in the present climate of enthusiasm with visual logic models. This does not seem to be what theory-driven evaluators had in mind when they introduced the concept of ‘mechanism’ to the evaluation community. (p. 367)
Astbury and Leeuw go on to provide, by way of example, a quotation from Weiss (1997): […] if counselling is associated with reduction in pregnancy, the cause of change might seem to be the counselling. But the mechanism is not the counselling; that is the program activity, the program process. The mechanism might be the knowledge that participants gain from the counselling. Or it might be that the existence of the counselling program helps to overcome cultural taboos against family planning; it might give women confidence and bolster their assertiveness in sexual relationships; it might trigger a shift in the power relations between men and women. These or any of several other cognitive, affective, social responses could be the mechanisms leading to desired outcomes. (p. 46)
Here, Weiss (1997) claims that establishing (counterfactual) dependencies does not per se specify a mechanism. Instead, this specification requires the description of the actual processes or activities that ground that dependency.
Schmitt and Beach (2015) argue for a similar claim: leaving the causal interdependencies between parts of a program unspecified entails that the resulting program theory cannot be considered a mechanistic one. In short, according to Schmitt and Beach (2015), a network of causal arrows is not a mechanism. Instead, they argue that the crucial feature of mechanistic theories is the specification of causal arrows by means of "activities" (pp. 431–434).
In our example of the anti-bullying program, activities would describe the specific cognitive processes that lead parents to develop instruments for identifying signs that their child is a victim of bullying, and how the information material could trigger those processes. For instance, the information material could have described a simple tallying heuristic that supports parents' decision-making, consisting of a short list of signs and the rule "if more than two signs are observed, consider the risk as high, otherwise low." The information material could also have been structured in a way that produced a high sense of identification among parents in high-risk contexts. Here, the decision-making heuristic and the sense of identification are the activities that allow the effect of the program to travel along its parts (information materials → parents → children).
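The tallying rule just described can be sketched in a few lines. The threshold rule comes from the running example; the particular list of signs is our own hypothetical choice, not part of any actual program material.

```python
# Sketch of the tallying heuristic from the anti-bullying example.
# The signs listed here are invented placeholders.

KNOWN_SIGNS = {
    "withdraws socially",
    "avoids school",
    "unexplained injuries",
    "sudden mood changes",
    "belongings go missing",
}

def bullying_risk(observed: list[str]) -> str:
    """Tally the observed signs, then apply the rule: if more than
    two signs are observed, consider the risk high, otherwise low."""
    count = sum(1 for sign in observed if sign in KNOWN_SIGNS)
    return "high" if count > 2 else "low"

bullying_risk(["avoids school", "belongings go missing"])  # -> "low"
bullying_risk(["avoids school", "belongings go missing",
               "sudden mood changes"])                      # -> "high"
```

The sketch makes the point about activities concrete: the heuristic is not an arrow between variables but a specified cognitive process that explains how the information material changes parental judgment.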
Schmitt and Beach (2015) build their claim on Machamer, Darden, and Craver's (MDC) foundational work on mechanisms (2000). The MDC account is now considered one of the seminal works in the so-called new mechanist turn in the philosophy of science. The MDC definition of a mechanism is: "Mechanisms are entities and activities organized such that they are productive of regular changes from start or set-up to finish or termination conditions" (Machamer et al., 2000, p. 3). This account has the main merit of introducing entities and activities as defining components, but it is not well suited to TBE. The requirement of "regular changes" running "from start or set-up to finish or termination conditions" is simply too strong for many program evaluations. Many social interventions are unable to produce regular changes (consider, for instance, educational interventions), so policymakers will often be content with irregular ones. At the same time, although it might be quite reasonable to expect a program to have definite start conditions, it is common for social interventions to have indefinite termination conditions. Interventions that act on social norms have indefinite termination conditions, as they let social collectives take over the intervention and internalize it as their own process, thereby sustaining it indefinitely.
More recently, Illari and Williamson (2012) have discussed a concept of mechanism across the sciences that modifies the MDC account to drop these conditions. According to them: "A mechanism for a phenomenon consists of entities and activities organized in such a way that they are responsible for the phenomenon" (p. 120). Illari and Williamson's theory provides a looser conceptualization of mechanisms as any productive arrangement of entities and activities, with more or less regular outcomes and more or less definite starting and ending points. We contend that this is a more suitable conceptualization of the concept of mechanism for TBE.
We therefore have two claims that together account for the nature of mechanisms in TBE. These state that, in the context of TBE, the term mechanism should be understood as:
1. A set of parts of a program that are counterfactually related to one another in the production of a targeted effect, where the structure of counterfactual relations is itself a contributing factor (the causal pathway claim), and
2. A set of entities and activities involved in the program and organized in such a way that they are responsible for the program outcome (the entities-and-activities claim).
Beach (2016) has discussed two basic understandings of mechanisms (the counterfactual-based view and the system view) that cohere with our two claims above. Beach discusses these as contrasting views, especially from a methodological point of view. We propose that from a conceptual point of view—that is, concerning how we understand the concept of a mechanism in TBE—the two views are complementary. As we mentioned above, our conceptualization is a proposal for a normative standard for what evaluators should mean when they use the term mechanism. It can be the case that several approaches to TBE conceive mechanisms differently, or only in terms of one of the claims above.
Finally, our conceptual proposal does not, per se, entail any set of methodological guiding principles. The problem of deriving such guidance from our concept of mechanism is discussed in the next sections.
Integration in TBE
In this paper, we are interested in the rewards of integration, and especially of data and theory integration, for TBE. The concept of integration refers to the way in which different elements are combined, merged, or conjoined in order to achieve some epistemic goal. Integration has been and still is a main—but not the only—methodological principle of mixed methods (e.g., Bazeley, 2012, 2017; Bazeley & Kemp, 2012; Cronin, Alexander, Fielding, & Moran-Ellis, 2008; Fetters, Curry, & Creswell, 2013; Fetters & Freshwater, 2015; Fetters & Molina-Azorin, 2017; Moran-Ellis et al., 2006; Moseholm & Fetters, 2017), and of multi-methods/process-tracing research (Beach, 2020; Beach & Brun Pedersen, 2013; Goertz, 2017; Goertz & Mahoney, 2012; Humphreys & Jacobs, 2015; Rohlfing, 2012; Seawright, 2016; Weller & Barnes, 2014). In the vast literature on integrative methodologies, the integrands can be data, theories, designs, and more. In what follows, we focus only on data and theory integration, for the sake of simplicity and space, and because these two forms of integration are simple, intuitive, and commonly used in mixed- and multi-method research. In simple terms, data integration entails integrating data sets, whereas theory integration entails integrating models of the target phenomena. In this section, we set the stage for the later discussion of data and theory integration in TBE by showcasing some examples of claims that relate TBE to integration (understood more generally here) and that may in some way be connected to the idea of a mechanism. The point is to look at how proponents of TBE have discussed integrative methodologies.
The earliest contribution to this issue that we could trace is found in Chen and Rossi (1989). Here, the authors discuss the appropriateness of randomized controlled trials (RCTs) for theory-based evaluation and claim that "Close attention to the modelling of program processes can improve the power of randomized experiments" (p. 304). This claim does not mention the term integration explicitly, but it clearly suggests that approaches capable of constructing models of program processes (that is, theory-generating methodologies such as process tracing or grounded theory) can be used together with randomized controlled experiments in order to improve the validity of evaluation. Hence, Chen and Rossi (1989) seem to suggest that TBE can benefit from integrating more fully specified process models with effect measurements. The connection to the issue of mechanism is also not explicit here, unless the focus on program processes is understood in terms of program activities, as suggested in the previous section. Even in this case, Chen and Rossi claim only that a theory of processes improves the conclusions of a randomized experiment, not that integration helps in identifying mechanisms. Mechanisms (if processes are mechanisms) are tools for more valid conclusions and not the primary epistemic goal. The issue is avoiding the potential threats to validity that are typically attributed to randomized trials, and mechanisms (or processes) might be a way of overcoming these threats. Similar considerations are also found in Chen (1989) and Chen (2006). Chen and Rossi (1989) also discuss "mixed methods," without clearly specifying any particular form of integration. Here, the rewards of using mixed methods in TBE are not directly connected with the idea of mechanisms but consist in the possibility of either collecting data about different parts of the program or supporting different kinds of validity appraisals.
In a later contribution, Chen (1997) argues that, although mixed methods should not be considered a dominant methodological paradigm for theory-based evaluation, they entail some clear rewards, especially in the case of complex programs. These rewards are the possibility of using a specific method for each part or level of the program, and the capacity to establish a trade-off between internal and external validity. Here, too, integration provides data about the different parts of a program and supports validity appraisals. As in the examples above, the focus is not explicitly on mechanisms. The main idea seems to be that mixed methods can be used for the integration of claims (claims about parts of a program, or claims about the scope of an evaluation) and that evaluations require making, and being able to justify, different kinds of claims.
White (2008) seems to be the first to make a direct connection between the issue of integration and the mechanistic goal of TBE: The current benchmark for valid impact estimates is that the study has a credible counterfactual, which means that it addresses the issue of selection bias where this is likely to be an issue […]. [However,] just knowing if an intervention worked or not is not usually sufficient, we also want some idea of why, how, and at what cost. And knowing if it worked, without knowing the context within which it did so, limits the scope for generalisation and lesson learning. Answering the ‘why’ question is where qualitative methods come in. (pp. 98–99)
In contrast to the examples above, White connects integration to the epistemic aim of program evaluation, and not to the validity of the evaluative conclusions. The claim he presents is that integration is necessary for achieving the epistemic goal of answering a “why” or a “how” question. White discusses several examples of integration involving data, designs, methods, and theories. According to him, the answers to “why” or “how” questions are program theories, that is, theories of mechanisms, and integration is necessary for specifying the content of program theories. A further element in White’s discussion is worthy of remark: he seems to connect quantitative methods to the counterfactual element of program theories, while qualitative methods seem to be connected to the theoretical content of these theories. This suggests that the issue of integration is directly connected with the two claims about the nature of mechanisms we presented above. We return to this issue later in this section.
Killoran and Kelly (2010) have also argued that the combination of an account of a process (provided using qualitative methods) and an account of an effect (provided using quantitative methods) is appropriate to the theoretical goals of TBE: The [objective of realist evaluation] is, of course, to understand ‘why’ the programme works and this involves close understanding of the programme mechanism and its precise utility and appeal in the different quarters of an intervention. […] Some broad [methodological] principles may be established, the most fundamental of which is that theory-driven analysis demands a multi-method evidence base. […] How does one trace [the basic and active ingredients of social change]? Well, data on process is generated, broadly speaking, using qualitative methods; outputs and outcomes are measured via quantitative approaches; contextual information requires comparative observation and measurement. Testing any programme theory requires the conjunction or triangulation of all three. (pp. 51–52)
Killoran and Kelly are in line with White (2008) with their focus on the content of program theories; they argue that answering a “why” question—that is, describing a mechanism—requires knowledge of processes and of outputs/outcomes. As these forms of knowledge require different methods, answering a “why” question requires the “conjunction,” or, in our words, integration, of both. This short showcase of examples is not meant to be complete but provides an illustration of how integration has been discussed in the literature on TBE and shows some instances in which integration has been directly connected to the concept of mechanism.
White’s (2008) and Killoran and Kelly’s (2010) suggestions provide us with a vantage point for formulating a question, which will inform our discussion in the remainder of this paper. If, as it seems to be suggested, integration is necessary to the TBE goal of accounting for mechanisms, are different forms of integration (especially data and theory integration) differently connected to the two aspects of the nature of mechanisms described by the causal pathway claim and the entities-and-activities claim?
In the literature on TBE, we could not find a clear discussion about how different forms of integration—and particularly data and theory integration—contribute to theorizing and assessing claims about program mechanisms. 1 In simple terms, it is not clear what mechanistic rewards result from theory and data integration. The discussion of this issue is our main contribution to the field of TBE, and the focus of the remainder of this paper. The discussion proceeds in the next sections in the following way. We consider data integration and theory integration separately. For each form of integration, we first present a reconstruction of the general rationale, and then discuss the connection between integration and the two claims about the nature of mechanisms we argued for: the causal pathway claim and the entities-and-activities claim.
In the following sections, we support our claims using the case of a research project involving a national-scale PD program for mathematics teachers. For this reason, it can be helpful to provide the reader with some context about our case before discussing the different integration strategies.
The Case of a Research Project Studying Large-Scale Teacher Professional Development
The authors of this paper are currently involved in a research project conceptualizing and evaluating a national-scale, state-coordinated PD program (Boost for Mathematics—BfM) for mathematics teachers in Sweden. The overarching goal of BfM is to improve students’ mathematics achievement by strengthening mathematics teaching. To facilitate this development, nearly 80% of all elementary school mathematics teachers in Sweden have participated in the year-long PD program. 2 One year of PD includes 16 rounds, consisting of four sessions each, in which teachers: (A) individually study PD materials provided on a digital platform 3 ; (B) meet with their colleagues to plan for an activity (e.g., a lesson to conduct); (C) carry out the activity with the class they normally teach; and (D) meet with their colleagues again to discuss experiences gained from the conducted activity.
The aim of our research project is to characterize and examine relationships among different features of this program, such as PD materials, collegial discussions, teachers’ beliefs and knowledge, classroom instruction, and student achievement. Several data sets were collected, including the PD materials (Session A), video recordings of collegial discussions (Sessions B and D) and classroom lessons (Session C), interviews with the teachers (both before and after the lessons), and student results on mathematics tests. The volume of data collected is quite extensive with about 160 video recordings of collegial discussions, 180 video recordings of classroom lessons, and 200 recordings of interviews. In this article, we make use of two of the project’s studies to illustrate our discussion of data integration and theory integration with a practical case.
Data Integration
According to a popular characterization of data integration (Caracelli & Greene, 1993, p. 197): “One means by which qualitative and quantitative data can be integrated during analysis is to transform one data type into the other to allow for statistical or thematic analysis of both data types together.”
Let us spell out data integration in further detail. We consider the situation in which several different data sets have been collected (e.g., structured and unstructured data). Each data set is assumed to represent a part of the program. For instance, in our case, the targeted program is BfM, and the involved teachers’ knowledge of the subject matter of mathematics is a property of the teachers, a part of the program which is assumed to be relevant for the effect of the intervention. Such assumptions are common in approaches to TBE that start out by formulating a crude program theory, describing the essential contributing parts of the program, and collecting data with the purpose of refining the crude theory into a specified program theory. In some cases, the crude program theory is itself a part of the program, as stakeholders will plausibly organize program interventions to include specific parts because of a set of theoretical assumptions. This is the case of BfM, which is a complex intervention with complex theoretical planning behind it.
In the literature on mixed methods (e.g., Bazeley, 2017; Hesse-Biber & Johnson, 2015), data integration is typically based on the categorization of unstructured data as a means of comparison between structured and unstructured data. Unstructured data (interview transcripts, ethnographic notes, or video recordings) are coded, and the codes are integrated into the structured data as further variables. In our project, we employed several types of coding. We categorized the lessons using the UTeach Observation Protocol (UTOP; Walkington & Marder, 2018), in which each lesson is categorized using 28 indicators organized in four sections (Classroom Environment, Lesson Structure, Implementation, and Mathematics Content) and thereafter translated by observers to scores on a five-point Likert scale. The curriculum materials (CMs) were categorized according to one of the frameworks developed in Lindvall et al. (2018), in which the materials are classified according to one of five categories regarding their content focus (Teachers’ and students’ content knowledge, Didactics, Teacher actions, Lesson design, and Reflections on own learning and practice). Further, we categorized the collegial discussions using the framework used in Steenbrugge et al. (2018), in which each collegial discussion is described in terms of the agency of the teachers participating in it and thereafter classified as low or high.
This allowed us to integrate the UTOP data with the CM codes and the collegial discussions codes into an integrated data set in which every row is a lesson, and the columns represent the UTOP items, the connected CM codes, and the preceding collegial discussions codes. To each row, we added further contextual data, such as teacher ID, school characteristics, etc.
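To make the structure of this integrated data set concrete, the following sketch shows, in plain Python, how lesson-level rows might be assembled from separately coded data sets. All identifiers, field names, and values (`lesson_id`, `cm_focus`, `agency`, the UTOP item names, and the example codes) are hypothetical illustrations, not the project's actual variables or data.

```python
# Hypothetical sketch of the row-per-lesson integration described above.
# All names and values are invented for illustration.

utop_scores = {"L1": {"implementation": 4, "content": 3},
               "L2": {"implementation": 2, "content": 5}}   # numeric UTOP items
cm_codes = {"L1": "Didactics", "L2": "Lesson design"}        # CM content-focus codes
discussion_codes = {"L1": "high", "L2": "low"}               # teacher-agency codes
context = {"L1": {"teacher": "T07"}, "L2": {"teacher": "T12"}}

integrated = []
for lesson_id in utop_scores:
    row = {"lesson_id": lesson_id}
    row.update(utop_scores[lesson_id])          # UTOP items as numeric columns
    row["cm_focus"] = cm_codes[lesson_id]       # code from the preceding CM
    row["agency"] = discussion_codes[lesson_id] # code from the preceding discussion
    row.update(context[lesson_id])              # contextual variables (teacher ID, etc.)
    integrated.append(row)

# Each row is now one lesson; the columns mix quantitative scores and
# qualitative codes, as in the integrated data model described in the text.
```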
This integrated data set is a data model (Harris, 2003) of BfM, which is an artifact resulting from the theoretical re-description of the collected raw data materials. In this data model of the program, each part (corresponding to an element of the crude program theory) is represented as a set of codes or numerical variables. We move on now to the discussion of the mechanistic rewards of creating this integrated data model.
Data Integration and Mechanism
The first clear mechanistic reward of an integrated data model is access to information about the parts of the program. As we saw, a main feature of mechanisms is that they consist of different parts. A mechanistic program theory should—minimally—specify these parts, and an integrated data model provides exactly this specification. In our project, the crude program theory entails that the focus of CMs affects teachers’ agency and participation in collegial discussions, which in turn affect the UTOP scoring for the lessons. Data about all these program parts are required to specify the content focus of the different CMs, the teachers’ agency and participation in the collegial discussions, and the UTOP scorings for the teachers’ lessons.
The second reward is the possibility of comparing the variables in the integrated data set, allowing for the study of relationships among the parts of the program. Comparison between variables can result in a set of coefficients B = {β1,…, βn} describing the targeted relations between the elements of P (the set of program parts). Examples of such coefficients could be estimated parameters of multivariate statistical models, such as mediation models, path models, or hierarchical models. By applying estimation methods to the integrated data set, the resulting parametrized multivariate model is itself a further data model, as it describes relationships between the data. These relational data models have the further advantage of enabling a graphical representation of the path that connects the different parts of the program to the targeted outcome.
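As an illustration of how such path coefficients might be estimated, the following sketch simulates a simple mediation chain X → Z → Y and recovers the two path coefficients with least-squares slopes. The variable names and the true parameter values (0.6 and 0.8) are invented for illustration; an actual analysis would use a dedicated mediation or structural equation modeling package.

```python
# Illustrative sketch: estimating path coefficients for a mediation chain
# X -> Z -> Y from simulated data. All values are invented for illustration.
import random

random.seed(0)
n = 500
x = [random.gauss(0, 1) for _ in range(n)]           # e.g., a scored CM feature
z = [0.6 * xi + random.gauss(0, 0.3) for xi in x]    # e.g., discussion agency
y = [0.8 * zi + random.gauss(0, 0.3) for zi in z]    # e.g., a UTOP score

def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    var = sum((a - mx) ** 2 for a in xs)
    return cov / var

beta1 = slope(x, z)        # estimated X -> Z path
beta2 = slope(z, y)        # estimated Z -> Y path
indirect = beta1 * beta2   # estimated indirect (mediated) effect of X on Y
```

Note that the chain structure (which variable mediates which) is written into the simulation and the choice of regressions; it is not discovered by the estimation itself, which is precisely the point made below about structure coming from background theory.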
Here comes a first complication. Even if integrated data sets are often used to estimate the parameters of a graphical model, they are very seldom used to estimate the structure of the graphical model itself. When an integrated data set is constructed, it allows for all possible relationships between its variables. If we have a preconception of how the parts of the program are causally arranged, then we select a subset of all possible relations, and we can use an integrated data set to estimate the parameters of the model, but it is very uncommon to use the integrated data set to understand how the parts of the program are causally arranged. The key issue here (one to which we will return on more occasions in the paper) is that the causal structural arrangement of the parts in a program must be a part of the background theory that motivates the different data collection and cannot be a product of statistical analysis. In short, the path leading to connecting the intervention to its outcome is not a product of the data integration, but rather a condition for it. In our crude theory, we assume participation and agency to depend causally on the CM content focus. Our integrated data model would allow us to study the converse causal arrow, but our background theory rules this out as irrelevant.
A further major issue concerns the interpretation of the parameters of the graphical model. Following our proposed normative theory of mechanism, to interpret a graphical data model as the observable consequence of an underlying mechanism, the relations depicted in the graphical model must be counterfactual. This interpretation, however, is another issue that does not depend on data integration, but is rather determined by theoretical and modeling assumptions. Several approaches to statistical data analysis focus on counterfactuals, such as RCTs, structural causal models (SCMs), difference-in-differences (DiD), instrumental variables (IV), and propensity score approaches. However, in order to interpret the data as supporting counterfactual claims, all of these approaches require making theoretical assumptions, such as the randomized allocation into test and control groups in RCTs, the assumption of a causal theory in SCMs, the ability to hypothesize about confounders in IV, or the assumption of parallel trends in different groups in DiD. The more of these assumptions we can satisfy, the stronger the support that a counterfactual theory gets from the graphical data model. In simpler terms, graphical models of data do not always represent networks of counterfactuals, but can, under specific conditions, support counterfactual theories. When data sets are integrated, the capacity for constructing or refining a mechanistic theory depends on the successful integration of the background theories for each original data set, and not on the integration of the data itself.
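A minimal numerical sketch can make this point about assumptions concrete: a difference-in-differences estimate only licenses a counterfactual reading under the parallel-trends assumption, which the arithmetic itself cannot verify. All group means below are invented for illustration.

```python
# Difference-in-differences sketch. The subtraction is trivial; the
# counterfactual interpretation rests entirely on the parallel-trends
# assumption, a theoretical commitment the data model itself cannot establish.
# All group means are invented for illustration.

pre_treated, post_treated = 50.0, 58.0   # mean outcome, treated group
pre_control, post_control = 49.0, 52.0   # mean outcome, control group

# The control group's change stands in for what would have happened to the
# treated group without treatment -- only if the two trends are parallel.
did_estimate = (post_treated - pre_treated) - (post_control - pre_control)
```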
A further problem arises when the background theory is “not enough.” Consider again the integrated data set from our project. Our crude program theory is summarized in the hypothetical path model shown in Figure 1:

Figure 1. A partial, hypothetical path model of BfM. The rectangles are characterizations of CMs, part of the framework discussed in Lindvall et al. (2018). The rhomboids are characterizations of collegial discussions, part of the framework discussed in Steenbrugge et al. (2018). The final square on the right side of the diagram is a dimension of the UTOP framework, used to characterize lessons.
Our background theory provided us with a detailed framework for the description of lessons, CMs, and collegial discussions. However, neither this background theory nor our crude program theory is sufficient to know which set of arrows should be included in our refined program theory. Our crude program theory entails that some aspects can be related; however, we do not have a detailed background theory for which relationships between the specific dimensions in Figure 1 are relevant, because our background theory concerns each part separately. Also, our crude program theory rests mainly on the temporal succession of the parts of the program. Without a background theory that selects a subset of arrows among C1–C10, we cannot exclude the possibility that some relationships between different parts of the PD program are only apparent.
These issues show the importance of theory in the study of mechanisms. Both the structural and the counterfactual elements of mechanisms described in the causal pathway claim put important requirements on background theory and cannot be derived from the simple act of data integration. Therefore, as long as mechanisms are minimally understood according to the causal pathway claim, the construction of a mechanistic program theory depends on a background theoretical framework and is not a product of data integration. In simple terms: mechanistic analysis requires mechanistic theories.
Moreover, the issue of the missing background theory has consequences for our normative theory of mechanisms, indicating that the causal pathway claim and the entities-and-activities claim are equally important for understanding mechanisms. When background theory is not sufficient, the way of expanding it and completing the theoretical gaps is to theorize about the nature of the connection between the parts of the program, that is, to hypothesize the activities that connect the entities belonging to the different program parts.
A similar suggestion is found in the literature about multi-method research. In this field, one of the main claims is that the study of mechanisms requires integrating between-case and within-case studies. Between-case studies can be used to fit integrated data sets into available causal theories, and thereby estimate the contribution of each part of a program. However, whenever the initial theory is insufficient, within-case studies can be employed to study the nature of the connections, that is, the activities, between the parts or entities of the target phenomenon. Within-case studies generate emergent theories that are then used to fill the gaps in the background theory. This is the case in our project, in which the lack of a global mechanistic program theory made us opt for a more in-depth study of the relationships between the parts of the PD program. For instance, Steenbrugge et al. (2018) investigate the relationship between meaning potentials in CMs and meaning negotiations in collegial discussions. A further case study (Insulander et al., 2019) investigates the relationship between how teacher agency is constructed in CMs and how agency is constituted in collegial discussions.
Therefore, although data integration can be used to construct and refine mechanistic program theories, it relies heavily on background theory for the satisfaction of the two main characteristics of mechanisms, namely, structure and counterfactual relations. Furthermore, when background theory cannot specify all relevant relationships, the analysis of integrated data models can be insufficient. Finally, in the cases when background theory is not sufficient, it seems that the construction and refinement of a mechanistic program theory requires an understanding of the entities and activities involved in the program, which indicates that (as we have argued) the causal pathway condition does not sufficiently capture the nature of mechanisms.
Theory-Driven Approaches to Integrated Data Sets
The limitations discussed in the last section seem to indicate that theory-based evaluation, and especially the construction and refinement of mechanistic program theories, cannot be fully accomplished by means of data-driven methods. 4 This is plausibly the motivating idea of theory-based evaluation: evaluation requires theory. In this section, we discuss whether theory-driven methods can be more helpful in the construction and refinement of a mechanistic program theory.
Theory-driven approaches, such as qualitative methods and process-tracing methods, analyze data by looking at the relationships between concepts—and how these concepts are realized in the observed settings—rather than between observations. More specifically, researchers use theoretical resources to build or test a theoretical model of the specific ways and forms in which observed events seem to co-occur. This process rests on theory-informed inferences in which the connection between two events is either analyzed by putting forward explanatory facts and successively evaluating the plausibility of those facts (in the case of theory-generating methods), or evaluated via a theoretical model of the connection between the two events and the derivation of its observable consequences (in the case of theory-testing approaches).
Let us concretize our discussion with an example of qualitative analysis from our case: Steenbrugge et al. (2018). This study focuses on how CMs of BfM can support teachers’ collective learning. The data used for this study are CMs (Session A) from one round and the transcriptions of two collegial discussions (Sessions B and D). The hypothesis is that the CMs might influence the possibilities for teachers’ collective learning during the collegial discussion sessions.
CM texts and discussion transcriptions are categorized using coding manuals. The textual differences between the texts are conceptualized using a further theoretical framework, developed by Kennedy (2016). This framework characterizes PD programs in terms of “approaches by which the programs aim to facilitate the enactment of new ideas: through (1) prescription, (2) strategies, (3) insight, or (4) presenting a body of knowledge” (Steenbrugge et al., 2018, p. 170). Therefore, each CM text is categorized according to Kennedy’s classification by looking at its main approach, and each categorization describes a type of text. The collegial discussions are analyzed in terms of their meaning negotiations (Wenger, 2000; Wenger et al., 2002), and the authors examine the extent to which the participants in the collegial discussions were involved in activities that allowed them to negotiate the meaning of the concepts involved in the CMs (and thereby made collective learning possible). Meaning negotiation is described using two dichotomous categorizations: the teachers’ participation in the discussions (categorized as high or low) and the reification of central ideas (categorized as present or scarce). The second categorization, reification, describes the extent to which the group makes the subject of the CMs part of the teachers’ own conceptual repertoire.
The authors observed an interesting pattern in the findings: texts characterized by a “prescriptions and strategies” approach to the enactment of the central ideas led to high participation and reification, whereas CMs characterized by an “insights and body-of-knowledge” approach led to low participation and reification. The theoretical connection between these categories is—according to the authors—that enacting the central ideas of a CM text using a “prescriptions and strategies” approach facilitates teachers’ engagement, as it is often supplemented with concrete instructions on what to do. By contrast, the “insights and body-of-knowledge” approach is not supplemented with instructions in the same way, making teachers’ conceptualizations more difficult. In this example, the theory-driven approach identifies an operational relationship between differences in textual function (particularly a contrast between two strategies) and differences in meaning negotiations, and specifies the nature of this difference, putting forward a possible mechanism that connects the two events. The aim of theory-driven methods is ultimately that of building and/or refining models of entities and activities. The explanation provided by Steenbrugge et al. (2018) aims to identify the entities (meaning potentialities, meaning negotiations) and the activities connecting them (the facilitation of teachers’ engagement deriving from concrete applications/instructions). Therefore, data integration followed by thematic analysis seems to facilitate the satisfaction of the entities-and-activities claim about mechanisms.
The result of the theory-driven analysis can be used to construct a mechanistic theory if mechanisms are understood as in the entities-and-activities claim. The analysis reveals the entities and activities that connect the parts of the program together in a way that (assuming that the meaning negotiations can be traced in the same way to the UTOP scorings) explains the program outcome.
In this case, it is indeed the integration of parts that sets the stage for a mechanistic theory. By connecting the parts of the program (CMs and collegial discussions) by means of theory-driven approaches, we get a picture of the program mechanism. However, the mechanistic reward is provided by the integration of theoretical elements and not by data integration. In the example above, the emerging theory is the result of the integration of theories of the curriculum materials and of the collegial discussions with a theory of engagement facilitation (more applicable elements and concrete instructions are easier to manipulate by teachers and facilitate engagement). The theories are integrated by means of the explanatory connection provided by the engagement facilitation, rather than by the observable co-occurrence of certain events.
It is interesting, at this point, to wonder whether the explanatory account that results from theory-driven analysis describes a mechanism in the sense of the causal pathway claim. This is a question about the status of emergent theories. According to a still common view among interpretivist researchers (Lincoln & Guba, 1985), the theories that emerge from the application of interpretive qualitative methods should not be interpreted counterfactually. Rios (2004) argues, for instance, that the theories of interpretive sociology do not track mechanisms, because of their lack of counterfactual scope. According to both the influential account of Lincoln and Guba and that of Rios, the resulting theories put events in a context, but only in descriptive terms. These theories do not include, in other words, any claim about what would have resulted if the context had been different. In contrast to this standpoint, other methodological approaches to qualitative methods—for example, in the field of process tracing—have provided compelling cases for the claim that theory-driven methods can indeed support counterfactual theories.
A very well-discussed example is Mahoney’s (2015) use of process tracing in historical research. In this paper, Mahoney discusses a specific approach to process tracing which he calls counterfactual analysis. This is a method for theory construction and not specifically for theory testing, meaning that the method can be used when the initial causal theory is not fully specified and questions about what mechanism causally connects X and Y still remain. In these cases, Mahoney suggests identifying the counterfactual statements that connect the known factors in the phenomenon of interest (which in Mahoney’s example are two historical events—the assassination of Archduke Franz Ferdinand and the start of the First World War—and which in TBE are the implementation of a program and its outcome).
The basis for this claim can be found in Woodward’s theory (2005) and has been discussed in relation to process tracing by Runhardt (2015, 2020). The context considered by Mahoney (2015) is any case in which, as in historical research, it is impossible to introduce a concrete intervention to assess a causal claim, and it is therefore necessary to consider ideal intervening factors in the form of counterfactual claims that are assessed as proxies for the concrete interventions. Mahoney discusses a methodology for this operation which consists of using background theories to look for possible counterfactual connections whenever the theory is unspecified and then assessing whether there is evidence for these connections. The selection of a counterfactual should satisfy what Mahoney calls the “minimal rewrite rule,” meaning that the counterfactual connecting X and Y should describe the smallest intermediate event Z between X and Y sufficient to explain the causal connection between X and Y. This clearly requires using analogical reasoning from similar cases, as this hypothetical counterfactual is, by assumption, not part of the theory of the target phenomenon. Once a counterfactual Z → Y or X → Z is selected via analogy and is judged to satisfy the minimal rewrite rule, the process tracer should assess the available evidence in favor of or against the counterfactual. This requires deriving observational consequences from the counterfactual claim that can be used to construct a test. These observational consequences should clarify how an idealized intervention on Z (or on X) could generate a change in Y (or in Z). The severity of the test resulting from the available evidence either in favor of or against the counterfactual determines the level of support that the data provide for it. 5
In our case above, the authors hypothesize a causal relationship between “prescriptions and strategies”-oriented materials (X) and participation in the collegial discussion (Y), mediated by the ease of engagement that concrete instruction can entail (Z). The counterfactuals identified here through analogical reasoning (similar educational contexts exhibit these counterfactuals) are X → Z and Z → Y. These seem to satisfy the minimal rewrite rule, as they account for a possible change in Y and Z without the need for any other change in the context. If an intervention on Z resulted in hindering teachers’ engagement, with all else being the same, participation would decline. These counterfactuals are supported in our study by comparing how different collegial discussions developed. Collegial discussions with lower participation/reification are shown to depend on teachers’ difficulty in conceptualizing curriculum materials that are too abstract (“body of knowledge”-oriented). 6
Therefore, theory-driven approaches can be applied to integrated data sets to theorize about mechanisms, even when the term is intended counterfactually, as in the causal pathway claim. In the same way as above, the capacity of process tracing and qualitative methods to theorize about mechanisms (in the causal pathway sense) depends mainly on the use of theories for making inferences and much less on the integration of data. The possibility of constructing a counterfactual path X → Z → Y in our example depends on using background theories and analogical reasoning to assess the satisfaction of the minimal rewrite rule and to derive the observational consequences necessary to trace each step of the path. Integrating data is only a precondition (in this case a necessary one), but the leverage of the mechanistic theory requires integrating theoretical claims.
In both cases of data-driven and theory-driven analysis, the chances of theorizing about program mechanisms do not depend on the way data is integrated, but rather on the possibility of constructing a theory (or theories) that is sufficiently detailed to explain how the different parts together form a mechanism. Therefore, the discussion in this section allows for a relevant conclusion: data integration is, prima facie, not the main driver of the construction and refinement of mechanistic program theory.
In the next section, we will discuss methodologies that focus more specifically on theory integration.
Theory Integration
The main aim behind theory integration is to combine two or more models of phenomena (as opposed to data models, as in the case of data integration). Models of phenomena are commonly called theories. We use the term “theory integration” to encompass various terms that are used in the literature about mixed methods, such as integrative analysis or data analysis integration (Bazeley, 2017), interpretation integration (Fetters et al., 2013), and results integration (Schieber et al., 2017). We prefer to use the term “theory” to convey the idea that what is integrated is an epistemic representation of the target phenomenon. In simple terms, we put together what we know about the program. In what follows, we provide a general reconstruction of theory integration and discuss its mechanistic rewards.
Theory Integration and Mechanism
One of the studies generated within our research group (Lindvall et al., 2018) is an instructive example of how theory integration suits the aims of TBE. The aim of the study is to evaluate the impact of two PD programs (one of which is BfM) on student achievement. To this end, two sets of data were collected and analyzed. Firstly, the CMs for the two PD programs were analyzed. Secondly, student results were collected on a mathematics test taken annually in the municipality where the study was conducted. The test results were collected for three groups of students: those whose teachers participated in (1) BfM, (2) the other PD program, and (3) no PD program.
The study involves two theories. The first is a hypothesis about the difference in student performance between the three groups. The main parameter is the difference in performance, which is expressed with a determination coefficient. The second theory is a model of the CMs used in the two interventions. This model categorizes the CMs along two dimensions: content focus (the entities) and methods for facilitating enactment (the activities). As for the former, each CM text is categorized into one of five categories describing the material’s main content focus (see above section Integration in TBE). As for the latter, each material is categorized using one of Kennedy’s (2016) previously mentioned categories.
The main reason for using these two theories is that the two PD programs are, according to background theory (Desimone, 2009b), very alike with respect to features other than those captured by the two theories. Yet, the effect hypothesis is supported by the student data, which seem to indicate that the programs differ in effect for some grades. The categorical model is employed to explain these differences in effect size. According to the authors, the two programs differ in both content focus and methods for facilitating enactment, and these differences describe the mechanism underlying the group differences in effect size. The two models are integrated via their overlapping parts, since the CMs are a constitutive part of the PD program. This is therefore an example in which the effect size alone provides evidence that one of the PD programs has a greater effect but cannot, by itself, answer the question, “why does one of the PD programs have a greater effect?” The categorization answers this question by describing the difference between the programs.
Shifting to our mechanism terminology, the effect-size hypothesis describes a causal relationship between the two PD programs and student performance. Features of the background theory (the two programs are very similar, which excludes a set of relevant confounders) and of the data collection make the counterfactual interpretation of the effect measured in this natural experiment quite plausible. However, in itself, the effect theory does not describe any mechanism. Once the two theories are integrated, a mechanism is specified, and this mechanism seems to cohere with both normative mechanism claims. The integrated theory describes a pathway (CM with a certain content focus and method for facilitating enactment → teacher knowledge → teaching → student results), and each of these arrows can be described as an intervening factor with a differential effect (all the edges except one are compared with contrasts). Moreover, the theory specifies which entities and activities explain the outcome. Hence, the mechanism of this program theory is a mechanism according to both the causal pathway and the entities-and-activities claim. Most importantly, this mechanism is the product of theory integration. The example of theory integration we describe here coheres with what the literature on multi-method research describes as integration between between-case studies (the effect study) and within-case studies (the theory of CMs). Our contribution consists in highlighting the pervasive role of theory in mechanism-oriented integration.
If we put this together with the considerations drawn in the previous section, we obtain our main conclusion: our concept of mechanism entails that TBE is not only theory-oriented (i.e., a theory is the main epistemic aim), but also theory-driven (theoretical approaches are the main ways of achieving the aims of TBE). Theory integration appears to be an important source of the theoretical elements used in constructing a mechanistic theory. We conclude this section by discussing some possible complications of this process.
First of all, theory integration can be affected by problems regarding background theory that are similar to those affecting data integration. In our case, the overlap between the two models given by the CMs is a sufficient point of contact for model combination. However, sometimes models will not share any point of contact, which will require an ad hoc bridge (see Figure 2). This bridge might sometimes be inherited from some background theory, but in many cases, such a theoretical resource will not be available. In such cases, the situation will be the same as in data integration. The specific case may allow for the development of an ad hoc bridge theory through a within-case study (as we did in our project), but this is not necessarily always possible.

Figure 2. A diagrammatic representation of theory integration. Models I and II are integrated into Model III. As background theory is insufficient for model coupling, an ad hoc bridge (the dashed arrow in the diagram) is necessary for integration.
The second problematic issue concerns scope. Usually, if data integration is possible, then there is no problem with theories having different scopes (more or less general), since data integration requires that the different data subsets cohere in some fashion at the unit level. In the case of theory integration, this is not guaranteed. With enough goodwill and patience, models that differ greatly in scope can be integrated with one another. But if one of the models rests on an empirical base that is “too small,” it is at least questionable whether the two models have the same strength. Such a problem might emerge, for instance, if we attempt to integrate the model described in Lindvall et al. (2018) with the one in Steenbrugge et al. (2018). These two models differ significantly in size, which might imply that the scope of their main claims also differs. Integrating theories thus entails integrating claims: if the claims cannot easily be integrated, neither can the theories.
Conclusion
We have used conceptual analysis in this paper to discuss how data and theory integration can contribute to the construction and refinement of a mechanistic program theory. Our conclusions can be summarized in this way:
Mechanisms should be understood in TBE as both networks of counterfactual relations and networks of entities and activities. Both views are necessary for the goals of TBE.

The mechanistic contribution of data integration is mainly to provide information about the different parts of a program and to enable comparisons between these parts. Data integration does not, however, in itself provide knowledge about mechanisms. The analysis of integrated data sets can result in the construction or refinement of a mechanistic program theory, but the main drivers of this theory are theoretical: mechanisms are described either by integrating theories or by applying further theoretical resources and theory-driven approaches to fill the gaps and build bridges in the initial program theory.

Theory integration therefore seems to be one of the main drivers of mechanistic program theories, along with the focus on specifying the entities and activities that contribute to the program outcome. This focus on theoretical specification is pervasive in both data integration and theory integration.

It has been suggested that the causal pathway claim exhausts the concept of mechanism, making the entities-and-activities claim redundant (Marchionni & Reijula, 2019). As a consequence of our discussion in this paper, we conclude on the contrary that entities and activities are equally fundamental, defining parts of any mechanism. The causal pathway claim has also been criticized as involving a host of methodological problems (Beach, 2016). Our discussion of theory integration indicates that the analysis of counterfactuals is not an insurmountable obstacle for TBE, even when the data do not permit the use of statistical tools.
Our proposed way of understanding mechanisms and our methodological discussion should not be interpreted as entailing that any methodological approach that does not focus on both the counterfactual and the theoretical dimension of the mechanism is flawed. Many methodological approaches to TBE will focus on only one of these dimensions and can still be described as focusing on the mechanism. Our claim is rather that any conclusion, based only on counterfactual analysis or only on the specification of entities and activities, that purports to have identified a program’s underlying mechanism should be considered incomplete.
Our paper contributes to the literature on TBE and on evaluation research in general in several ways. Firstly, we provide a detailed view of the complexity involved in the epistemic goal of constructing mechanistic program theories. Integration is a complex process that contributes to this goal in various ways. In particular, the difference between data and theory integration and the methodological importance of theory in mixed methods have not received sufficient attention.
Secondly, it has been argued that concrete examples of how TBE can be conducted in practice (e.g., reports of successes and failures, analytical techniques, evaluation effects) are “seriously needed in the published literature” (Coryn et al., 2011). In this article, we have provided practical descriptions of approaches used in a project striving to conduct a TBE of a specific educational intervention: a large-scale teacher PD program.
Thirdly, as illustrated in our examples and conceptual arguments, TBEs of educational interventions involve many obstacles. For example, conducting a TBE using a mixed-methods design requires expertise in multiple methodologies, and it has been argued that few scholars have the expertise to implement several different methodologies to a high standard, for which reason it is recommended that “teams of researchers with expertise in different methods, all working together” be involved when studying educational interventions (Desimone, 2009a, p. 172). However, as demonstrated in this study, the challenges relate not only to the use of different methodologies but also to how these methodologies can or should be integrated. We stress the importance of attending to both data integration and theory integration, an issue that seems to have been neglected in the literature on TBE. In other words, we argue that the proposed research teams need not only expertise in certain methods for data collection and analysis, but also expertise in strategies for integrating data as well as theories.
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article. This work was supported by the Vetenskapsrådet (grant number 2014-2008).
