Sage Journals: Discover world-class research

Abstract

Psychologists are often interested in the effect of an internal state, such as ego depletion, that cannot be directly assigned in an experiment. Instead, they assign participants to a manipulation intended to produce this state and use manipulation checks to assess the manipulation’s effectiveness. In this article, I discuss statistical analyses for experiments in which researchers are primarily interested in the average treatment effect (ATE) of the target internal state rather than that of the manipulation. Often, researchers estimate the association of the manipulation itself with the dependent variable, but this intention-to-treat (ITT) estimator is typically biased for the ATE of the target state, and the bias could be either toward the null (conservative) or away from the null. I discuss the fairly stringent assumptions under which this estimator is conservative. Given this, I argue against the status-quo practice of interpreting the ITT estimate as the effect of the target state without any explicit discussion of whether these assumptions hold. Under a somewhat weaker version of the same assumptions, one can alternatively use instrumental-variables (IVs) analysis to directly estimate the effect of the target state. IVs analysis complements ITT analysis by directly addressing the central question of interest. As a running example, I consider a multisite replication study on the ego-depletion effect, in which the manipulation’s partial effectiveness led to criticism and several reanalyses that arrived at varying conclusions. I use IVs analysis to directly account for the manipulation’s partial effectiveness; this corroborated the replication authors’ reported null results.

Keywords

causal inference; instrumental variables; intention to treat; noncompliance

In experimental psychology and related disciplines, researchers are often interested in the effects of a certain internal state, such as ego depletion or social stress, that cannot be directly assigned in an experiment. In such cases, researchers often assign participants to a manipulation that is intended to produce the target state and use manipulation-check items to assess whether the manipulation did so effectively (Aronson et al., 1990; Ejelöv & Luke, 2020). That is, manipulation checks are often used when the intervention of interest is not the randomized manipulation itself but, rather, the target state that the manipulation check measures. For example, in a multisite replication study on the ego-depletion effect, participants were randomly assigned to perform a putatively effortful and fatiguing task or to perform a similar but noneffortful control task (Hagger et al., 2016). Here, scientific interest centered less on the effects of the effortful task per se than on the effect of effort and fatigue. Indeed, experiments on ego depletion have used a diverse range of manipulations, including working on unsolvable puzzles, concealing one’s emotions while watching a movie, and avoiding using stereotypes when describing people (Baumeister et al., 1998; Gailliot et al., 2007). As a manipulation check, Hagger et al. (2016) asked participants to self-report their effort, fatigue, perceived difficulty, and frustration. The replications were criticized partly because the manipulation was only partially effective according to these measures (Baumeister & Vohs, 2016).

Although manipulation checks are widely used, there has been little guidance on appropriate statistical analyses for studies using these measures (Hauser et al., 2018). In this article, I provide such guidance, informed by classic and recent results in causal inference. I first discuss several widespread existing statistical methods, such as estimating the association of the manipulation with the dependent variable without including the manipulation check in the analysis model. In the language of clinical trials, this is an “intention-to-treat” (ITT) analysis. Intuitively, it might seem that this analysis would always provide a conservative estimate (i.e., in the correct direction but biased toward the null) of the effect of the target state itself on the dependent variable because the estimate is “diluted” by the manipulation’s limited effectiveness. This presumed conservatism is perhaps why this approach is in widespread use. In fact, as I discuss, the ITT estimator is not necessarily conservative: It can be biased away from rather than toward the null, and it can even be in the wrong direction (Hernán & Hernández-Díaz, 2012). This means that the status-quo practice of interpreting the ITT estimate as the effect of the target state itself is not automatically justified even if the estimate is interpreted as conservative. I discuss the fairly stringent assumptions under which the ITT estimator is guaranteed to be conservative. These assumptions are violated, for example, for several sites in the ego-depletion replications, so the ITT estimate cannot necessarily be interpreted as the effect of ego depletion itself or even as a conservative estimate thereof.

In fact, under a weaker version of the same assumptions, it is possible to obtain a consistent, rather than conservative, estimator for the effect of the target state itself (a consistent estimator is one whose estimates converge to the true parameter—in this case, the effect of the target state—as the sample size increases). I discuss how to do so using standard instrumental-variables (IVs) analysis (Angrist et al., 1996). Conducting such an analysis along with the standard ITT analysis helps more directly address a central question of interest when using manipulation checks. As I discuss, estimating the effect of the target state requires fairly strong assumptions regardless of whether one uses IVs analysis or the conservative interpretation of ITT analysis. In particular, both methods require assuming “excludability,” meaning that the manipulation affects the dependent variable via only the target state and not via any other mechanisms (Angrist et al., 1996). Others have critiqued the status-quo practice of interpreting the ITT estimate in terms of the effect of the target state (Eronen, 2020; Gruijters, 2022; Spirtes & Scheines, 2004); I offer further methodological reasons to abandon this practice unless excludability and the other required assumptions have been carefully justified.

Running Example: Ego Depletion

As a running example, I consider Hagger et al.’s (2016) multisite replication study on ego depletion, in which the ITT estimate was close to the null (standardized mean difference [SMD] = 0.04; 95% confidence interval [CI] = [–0.07, 0.15]). I show that accounting for the manipulation’s partial effectiveness yields point estimates that remain very close to the null, corroborating the replication team’s findings and contributing new evidence to a contentious debate (Baumeister & Vohs, 2016; Dang, 2016; Drummond & Philipp, 2017; Hagger & Chatzisarantis, 2016). The experimental conditions in the replications were (a) a control task, which involved viewing a series of words and rapidly pressing a button if the word contained the letter “e,” or (b) an effortful task, which was identical to the control task except that participants were to avoid pressing the button if the “e” was next to or one letter away from a vowel (Hagger et al., 2016). This additional stipulation requires response inhibition and so is thought to make the task more ego-depleting. The primary dependent variable was reaction-time variability (RTV) on a multisource interference task, which is conceptually similar to the Stroop task and requires response inhibition. Higher RTV indicates worse performance, so the ego-depletion hypothesis predicts that experiencing more ego depletion would increase RTV. The four manipulation-check items were 7-point scales assessing self-reported effort, fatigue, perceived difficulty, and frustration.

Setting and Notation

I focus on experiments in which the manipulation is randomly assigned but the target state is not. Let X denote the experimental condition; I assume this is a binary variable such that $X = 1$ denotes the manipulation of interest and $X = 0$ denotes the control condition. Let Y denote the dependent variable, which could be binary or continuous. Let R denote the results of the manipulation check, which measures the extent to which a given participant experienced the target state. The manipulation check could consist of one item or a composite of multiple items and could be binary (such that $R = 1$ denotes experiencing the target state) or continuous (such that higher values of R denote experiencing the target state more strongly). Usage of the term “manipulation check” has expanded to include, for example, measures that assess whether participants are paying attention (Oppenheimer et al., 2009); I do not address such attention checks because the relevant analytic considerations are quite different, as I detailed elsewhere (Mathur, 2024).

Any discussion of appropriate statistical methods must begin with clearly defining the estimand of interest—that is, the unknown quantity one is trying to estimate. One estimand of natural interest is the average treatment effect (ATE) of R, which can be formalized using the counterfactual framework of causal inference.¹ In this framework, a participant’s potential outcome is the value that a given variable, usually Y, would take for that individual if the individual were to receive a particular intervention (Hernán & Robins, 2020). For example, if R is binary, then the potential outcome $Y (r = 1)$ is the value that the dependent variable would take for a given individual if that individual were to experience the target state, whereas $Y (r = 0)$ is its value if the individual instead were to not experience the target state. In the ego-depletion example, $Y (r = 1)$ would be the value that the dependent variable (RTV) would take for a given individual if the individual were to experience ego depletion, and $Y (r = 0)$ would be the value if not. If R is a binary variable, the ATE of R is

A T E_{Target} = E [Y (r = 1) - Y (r = 0)] .

The notation $E [\cdot]$ represents an expected value. This ATE is, for example, the average difference in the dependent variable that participants would have if they all experienced ego depletion versus if they all did not experience ego depletion. More generally, if R is continuous, one would consider the effect of a 1-unit increase in R,

A T E_{Target} = E [Y (r = c + 1) - Y (r = c)],

for any baseline level c. I now consider three existing analytic approaches for experiments that use manipulation checks and consider whether these approaches can validly estimate $A T E_{Target}$ . The conclusions are summarized in Table 1.
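As a purely illustrative sketch of the estimand (not an analysis of real data), the Python code below builds a hypothetical population in which both potential outcomes are known for every individual, so $A T E_{Target}$ for a binary R can be computed directly as the mean of the individual differences $Y(r = 1) - Y(r = 0)$. The normal distributions, the assumed average effect of 0.3, and the function names are arbitrary illustrative choices. In a real study each participant reveals at most one potential outcome, which is why estimation strategies such as those discussed next are needed.

```python
import random
import statistics


def simulate_potential_outcomes(n, mean_effect=0.3, seed=1):
    """Return lists of Y(r = 0) and Y(r = 1) for n hypothetical individuals.

    Illustrative assumption: individual effects of the target state vary
    normally around mean_effect (SD 0.2), i.e., effects are heterogeneous.
    """
    rng = random.Random(seed)
    y0, y1 = [], []
    for _ in range(n):
        baseline = rng.gauss(0.0, 1.0)                    # Y(r = 0)
        individual_effect = rng.gauss(mean_effect, 0.2)   # Y(r = 1) - Y(r = 0)
        y0.append(baseline)
        y1.append(baseline + individual_effect)
    return y0, y1


def ate_target(y0, y1):
    """ATE_Target = E[Y(r = 1) - Y(r = 0)], the mean individual difference."""
    return statistics.fmean(b - a for a, b in zip(y0, y1))


y0, y1 = simulate_potential_outcomes(100_000)
print(round(ate_target(y0, y1), 3))  # close to the assumed mean effect of 0.3
```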

Table 1.

Summary of Statistical Analyses, Questions Addressed, and Assumptions

Analysis	Question addressed	Distinctive assumptions^a
Intention-to-treat (status-quo approach)	How much does the manipulation itself affect the DV?	None
	How much does the target state affect the DV? (conservative estimate)	The target state is binary; the manipulation increases the target state; the manipulation affects the DV only via the target state; any unmeasured moderators of the target state–DV relationship are uncorrelated with any unmeasured moderators of the manipulation–target state relationship.
Mediation: indirect effect	How much of the manipulation’s effect on the DV is specifically via changing the target state?	The relationship between the target state and the DV is unconfounded, conditional on any adjusted covariates.
Moderation	How much more effective was the manipulation among participants who were in the target state?	The manipulation does not affect the target state (plus other assumptions^b).
Instrumental variables	How much does the target state affect the DV?	The manipulation affects the target state; the manipulation affects the DV only via the target state; any unmeasured moderators of the target state–DV relationship are uncorrelated with any unmeasured moderators of the manipulation–target state relationship.

Note: DV = dependent variable.

^a The listed assumptions are not exhaustive. For example, all methods assume that the manipulation is randomized and make other standard causal-inference assumptions (e.g., that participants do not affect each other; Hernán & Robins, 2020).

^b The other assumptions are given in Mathur and Shpitser’s (2024) Proposition 1.

Existing Approaches

ITT analysis

Because the target state R is not randomly assigned, the association of R itself with Y may not provide a valid estimate for $A T E_{Target}$ because of confounding variables, that is, variables affecting both R and Y (Hauser et al., 2018). In the ego-depletion example, confounding is likely because participant characteristics, such as trait executive functioning, general intelligence, or trait “depletion sensitivity,” could affect participants’ susceptibility to ego depletion and their RTV on the multisource interference task (Maples-Keller et al., 2016; Salmon et al., 2014). For this reason, researchers analyzing studies with manipulation checks typically use the ITT estimate, that is, the association of the randomized manipulation X with Y, ignoring R:

{\hat{A T E}}_{Manip} = E [Y | X = 1] - E [Y | X = 0] .

This estimate is typically obtained by taking the mean difference between experimental conditions. Of course, because X is randomized, this estimator is statistically consistent (i.e., valid) for the ATE of the manipulation itself. However, ${\hat{A T E}}_{Manip}$ will typically not be consistent for $A T E_{Target}$ if some participants assigned to $X = 1$ do not experience the target state or, likewise, if some participants assigned to $X = 0$ do experience the target state. This is analogous to noncompliance in a clinical trial, in which some participants assigned to take Treatment A might instead take Treatment B and vice versa (Angrist et al., 1996).
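The following hypothetical simulation sketches why neither the naive R–Y association nor the ITT estimate recovers $A T E_{Target}$ when the manipulation is only partially effective. The data-generating process is an illustrative assumption, not a model of the ego-depletion data: an unmeasured trait U raises both R and Y, R is binary, the manipulation only makes R = 1 more likely, and R has an assumed constant effect of 0.3 on Y. The names `simulate_trial` and `mean_difference` are hypothetical helpers.

```python
import random
import statistics


def simulate_trial(n, tau=0.3, gamma=0.5, seed=2):
    """Simulate a hypothetical experiment with noncompliance and confounding.

    Illustrative assumptions:
      - X is randomized with probability 0.5;
      - U is an unmeasured trait that raises both R and Y (a confounder);
      - R is binary, and X only makes R = 1 more likely, so some X = 1
        participants have R = 0 and some X = 0 participants have R = 1;
      - X affects Y only through R, with a constant effect tau of R on Y.
    Returns a list of (x, r, y) tuples.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        u = rng.gauss(0.0, 1.0)
        r = 1 if u + rng.gauss(0.0, 1.0) + x > 0.5 else 0
        y = tau * r + gamma * u + rng.gauss(0.0, 0.1)
        data.append((x, r, y))
    return data


def mean_difference(data, column):
    """E[Y | column = 1] - E[Y | column = 0], where column 0 is X and 1 is R."""
    ones = [row[2] for row in data if row[column] == 1]
    zeros = [row[2] for row in data if row[column] == 0]
    return statistics.fmean(ones) - statistics.fmean(zeros)


data = simulate_trial(200_000)
naive = mean_difference(data, column=1)  # association of R with Y (confounded)
itt = mean_difference(data, column=0)    # ITT: association of X with Y
print(f"assumed ATE_Target = 0.30, naive = {naive:.2f}, ITT = {itt:.2f}")
```

In this setup the naive comparison overstates the effect because of U, whereas the ITT estimate is consistent for the manipulation's own effect but falls well short of 0.3, because the manipulation raises the probability of R = 1 by only about 0.28.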

As noted in the introduction, intuition might incorrectly suggest that the ITT estimator would always be conservative (i.e., biased toward the null) for $A T E_{Target}$. In the ego-depletion example, one might incorrectly suppose that the estimated effect of imperfect ego-depleting manipulations is a conservative estimate of the effect of ego depletion itself because the manipulations’ limited effectiveness would seem to “dilute” the estimate. However, this conservative interpretation is not necessarily justified: It is possible for ${\hat{A T E}}_{Manip}$ to be biased away from, rather than toward, the null (Hernán & Hernández-Díaz, 2012; McNamee, 2009). Nevertheless, under some assumptions, the ITT estimator will indeed be conservative. In such cases, the resulting estimate ${\hat{A T E}}_{Manip}$ may be informative in a given study if it represents a meaningfully large effect size because one can then conclude that a consistent estimate of $A T E_{Target}$ would be at least as large in magnitude. This conservative interpretation holds under the following assumptions²:

Assumption 1 (no backfiring): The mean difference of X on R is positive.

Assumption 2 (excludability): The manipulation X affects the dependent variable Y only via the target state, R.

Assumption 3 (no simultaneous moderation): Any unmeasured moderators of the effect of target state R on Y are uncorrelated with any unmeasured moderators of the effect of manipulation X on R.

Assumption 4 (binary target state): The target state R is analyzed as a binary variable.

Assumption 1 states that the mean difference of X on R is positive, that is, $E [R | X = 1] - E [R | X = 0] > 0$ . This means that the manipulation does not “backfire” on average by decreasing rather than increasing the target state. A scientifically reasonable choice of manipulation makes this assumption highly plausible. For example, one would not choose an ego-depleting manipulation that would plausibly decrease rather than increase ego depletion. This assumption also can and should be tested empirically, for example, by regressing R on X (Ejelöv & Luke, 2020). Assumption 4, which states that R is analyzed as a binary variable, ensures that a second bound holds, that is, $E [R | X = 1] - E [R | X = 0] \leq 1$ . If R is not binary, the ITT estimator may be biased away from the null rather than toward the null.
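A minimal sketch of the empirical check of Assumption 1, regressing R on X by ordinary least squares, is below. With a single binary regressor, the OLS slope equals $E [R | X = 1] - E [R | X = 0]$. The standard error uses the classical homoskedastic formula, which is one reasonable choice among several (a heteroskedasticity-robust SE would also be defensible). The function name `first_stage` and the toy data are hypothetical.

```python
import math
import statistics


def first_stage(x, r):
    """OLS regression of the manipulation check R on the binary condition X.

    With one binary regressor, the slope equals E[R | X = 1] - E[R | X = 0].
    The SE uses the classical (homoskedastic) OLS formula.
    Returns (slope, standard error).
    """
    n = len(x)
    x_bar, r_bar = statistics.fmean(x), statistics.fmean(r)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxr = sum((xi - x_bar) * (ri - r_bar) for xi, ri in zip(x, r))
    slope = sxr / sxx
    intercept = r_bar - slope * x_bar
    rss = sum((ri - intercept - slope * xi) ** 2 for xi, ri in zip(x, r))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se


# Hypothetical data: binary condition and binary (0/1) manipulation check.
x = [0, 0, 0, 0, 1, 1, 1, 1]
r = [0, 0, 1, 0, 1, 1, 0, 1]
slope, se = first_stage(x, r)
print(slope, round(se, 3))  # 0.5 0.354
```

Assumption 1 is supported when the slope is positive with a confidence interval that excludes zero. Because R is coded 0/1 here, the slope cannot exceed 1, which is the second bound that Assumption 4 secures; with a 7-point R, no such bound applies.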

If the manipulation affects R (regardless of the strength or direction of the effect) and satisfies Assumption 2, then it is called an “instrument” for the target state R. The conservatism results discussed in this section follow from classic and recent work on IVs, which are used in causal inference to estimate the effects of variables that cannot themselves be randomly assigned (Angrist et al., 1996; Hartwig et al., 2023; Hernán & Robins, 2006).³

Assumption 2 states that the manipulation affects the dependent variable only via the target state; this assumption would be violated if the manipulation had any effects on the dependent variable that were not entirely mediated by the target state. This assumption may be plausible a priori if the manipulation is compared with a closely matched control condition such that the only plausible difference between the conditions is whether they promote being in the target state. In Hagger et al.’s (2016) replications, the control task and ego-depleting task were identical except that the control task did not require effortful response inhibition. In addition, Assumption 2 can to some extent be tested empirically. For example, to try to rule out the possibility that the manipulation affects the dependent variable via mechanisms not involving the target state, one could measure variables representing these other plausible mechanisms (Ejelöv & Luke, 2020). If the manipulation is not empirically associated with any of these other mechanisms, this result could increase one’s confidence in Assumption 2, although it would not conclusively prove that the assumption holds.

Assumption 3 refers to moderation of the R–Y and X–R relationships (Hartwig et al., 2023).⁴ In the ego-depletion example, if an unmeasured variable, such as trait neuroticism, moderated both (a) the effect of experiencing ego depletion on RTV and (b) the effect of the manipulation on experiencing ego depletion, then Assumption 3 would be violated. However, if either form of moderation were implausible for all unmeasured moderators, then Assumption 3 would hold.⁵ The version of this assumption stated here applies if R is binary. If R is continuous, then the effect of R on Y must also be linear and additive (Hartwig et al., 2023). This is similar to the standard assumption used in common regression models and means, for example, that for a given individual, the effect of a 1-unit increase in ego depletion on the dependent variable is the same for any level of ego depletion.
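To illustrate Assumption 3, the sketch below computes large-sample quantities exactly for a hypothetical unmeasured moderator U with a few discrete strata. It assumes binary R, randomized X, excludability, and effects that are constant within each stratum, so that the ITT within a stratum is that stratum's first stage (effect of X on the probability that R = 1) times its effect of R on Y. The ratio of the overall ITT to the overall first stage is the quantity that IVs analysis targets. All probabilities, effect sizes, and the function name `limits` are invented for illustration.

```python
def limits(strata):
    """Large-sample ATE_Target, ITT, and ITT / first-stage ratio over strata.

    strata: list of (probability, first_stage, effect) tuples for strata of a
    hypothetical unmeasured moderator U, where first_stage is the effect of X
    on P(R = 1) and effect is the effect of R on Y in that stratum.
    Assumes binary R, randomized X, excludability, and constant effects
    within each stratum, so the stratum ITT is first_stage * effect.
    """
    ate = sum(p * eff for p, _, eff in strata)
    itt = sum(p * fs * eff for p, fs, eff in strata)
    overall_first_stage = sum(p * fs for p, fs, _ in strata)
    return ate, itt, itt / overall_first_stage


# One moderator U shifts both the X-R and the R-Y effects (Assumption 3
# violated): the manipulation works mainly where the target state helps.
correlated = [(0.7, 0.1, -0.5), (0.3, 1.0, 1.0)]

# Two independent moderators, one for X-R and one for R-Y (Assumption 3 holds).
independent = [(0.25, 0.1, -0.5), (0.25, 0.1, 1.0),
               (0.25, 1.0, -0.5), (0.25, 1.0, 1.0)]

for name, strata in (("correlated", correlated), ("independent", independent)):
    ate, itt, ratio = limits(strata)
    print(f"{name}: ATE_Target = {ate:.3f}, ITT = {itt:.3f}, ITT/first stage = {ratio:.3f}")
```

In the correlated scenario, Assumptions 1, 2, and 4 all hold, yet the ITT is positive while $A T E_{Target}$ is negative. With independent moderators, the ITT is the attenuated value 0.55 × 0.25, and the ratio recovers $A T E_{Target}$.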

Under Assumptions 1 through 4, the ITT estimator will be conservative; more formally, it will be consistent for a parameter that lies between zero and the true $A T E_{Target}$ (i.e., has the same sign and is no larger in magnitude). Although a conservative estimator of $A T E_{Target}$ may be useful in a given study when the resulting estimate represents a meaningfully strong effect size, a conservative estimate that is close to the null is not particularly informative. That is, a null estimate from a conservative estimator does not rule out the presence of even large effects. Thus, in the ego-depletion replications, the null results using the ITT estimator cannot be taken as evidence against effects of ego depletion itself. (In addition, in these replications, the stringent assumptions for the ITT estimator to be conservative were violated in the first place: Assumption 1 may have been violated in some sites because the estimated effect of the manipulation on ego depletion was negative, and Assumption 4 was violated because ego depletion was analyzed as a continuous variable.) It is helpful, therefore, to have a consistent estimator of $A T E_{Target}$. In fact, this is possible using IVs analysis, which invokes Assumptions 2 and 3 and a less stringent version of Assumption 1, and which does not require Assumption 4. Given this, I generally recommend against the use of ${\hat{A T E}}_{Manip}$ as a conservative estimator of $A T E_{Target}$. The alternative estimator is the focus of the section Alternative Approach: IVs Analysis below.
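A simplified sketch of why Assumptions 1 and 4 matter follows. It adds an assumption not made in the text, namely a constant (homogeneous) effect of R on Y for every participant, under which the large-sample ITT equals the first stage $E [R | X = 1] - E [R | X = 0]$ times $A T E_{Target}$. Assumptions 1 and 4 then confine that multiplier to the interval (0, 1]. The numbers and the helper names `itt_limit` and `is_conservative` are hypothetical.

```python
def itt_limit(first_stage, ate_target):
    """Large-sample ITT under a simplified linear, constant-effects model.

    Assumes randomized X, excludability (X affects Y only via R), and the
    same effect ate_target of a 1-unit change in R for every participant,
    so that ITT = first_stage * ate_target.
    """
    return first_stage * ate_target


def is_conservative(itt, ate_target):
    """True if itt lies between zero and ate_target (same sign, no larger)."""
    return itt * ate_target >= 0 and abs(itt) <= abs(ate_target)


ATE_TARGET = 0.5  # hypothetical effect of a 1-unit increase in R on Y
scenarios = {
    "binary R, first stage 0.4 (Assumptions 1-4 hold)": 0.4,
    "7-point R, first stage 2.0 points (Assumption 4 fails)": 2.0,
    "backfiring manipulation, first stage -0.3 (Assumption 1 fails)": -0.3,
}
for label, first_stage in scenarios.items():
    itt = itt_limit(first_stage, ATE_TARGET)
    print(f"{label}: ITT = {itt:+.2f}, conservative: {is_conservative(itt, ATE_TARGET)}")
```

Only the first scenario yields a conservative ITT; with a multi-point R the ITT is biased away from the null, and with a backfiring manipulation it has the wrong sign.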

Mediation analysis

A second existing approach involves treating the target state, as assessed by the manipulation check, as a mediator and estimating the indirect effect of the manipulation that occurs through the target state (Hauser et al., 2018). For example, Drummond and Philipp (2017) reanalyzed Hagger et al.’s (2016) replication data and estimated small indirect effects through each manipulation-check item. Although mediation analysis can be useful and informative in many contexts, such analyses face two limitations in the context of manipulation checks. First, as I have suggested, experimenters using manipulation checks are often (although not always) ultimately interested in the effect of the target state itself. In contrast, the indirect effect represents the effect of the manipulation that operates via the manipulation’s effect on the target state. Although this estimand may be of interest for other reasons, it cannot be interpreted as the effect of the target state and so does not help address a question of central interest for experimenters who use manipulation checks (Hauser et al., 2018).

As a second limitation, even if the indirect effect is indeed of interest, mediation analysis requires strong assumptions to validly estimate causal effects (Pearl, 2009; Rohrer et al., 2022; VanderWeele, 2015). One such assumption is that there must be no confounding of the relationship between the mediator (here, the target state) and the dependent variable, conditional on any covariates that have been adjusted for in the analysis. Because the target state itself is not randomized, this assumption will often be violated unless the mediation analysis controls for all variables that affect both the target state and the dependent variable (Hauser et al., 2018; Pearl, 2009; VanderWeele, 2015). As noted above, in the ego-depletion example, it seems likely that certain participant characteristics could affect participants’ susceptibility to ego depletion and their RTV (Maples-Keller et al., 2016; Salmon et al., 2014). Because mediation analyses involving manipulation checks rarely adjust for such common causes (e.g., Drummond & Philipp, 2017), they may not yield valid estimates of the indirect effect.

Moderator analysis

A third existing approach involves treating the manipulation check as a moderator rather than a mediator. For example, Dang (2016) reanalyzed the ego-depletion replications by examining the association of effort with RTV, stratified by experimental condition. It is not clear why, of the four manipulation-check items, Dang considered only effort rather than fatigue, the item on which the manipulation had only weak effects. Dang found that participants who reported greater effort during the depletion task had higher RTV (i.e., worse performance on the final task) and concluded that the “ineffectiveness of Hagger’s replication may result from ineffectiveness of their manipulation.” One plausible reason for treating the target state as a moderator is to estimate how much more effective the manipulation was among participants who were in the target state.⁶ Perhaps counterintuitively, if the manipulation affects the target state at all, moderation analysis does not validly estimate this quantity (Mathur & Shpitser, 2024).⁷ This problem is essentially a form of posttreatment bias, that is, bias due to conditioning on a variable affected by the treatment of interest (Montgomery et al., 2018). For moderation analysis to estimate the difference in effectiveness between participants who were in the target state versus those who were not, other assumptions are also required (Mathur & Shpitser, 2024). I do not discuss these other assumptions because the assumption that the manipulation does not affect the target state is already violated by design.

Alternative Approach: IVs Analysis

A consistent, rather than conservative, estimator of $A T E_{Target}$ can be obtained using IVs analysis. This method is often used to analyze natural experiments in which the treatment variable of interest (e.g., juvenile incarceration) cannot be randomly assigned, but a naturally randomized variable affects the treatment of interest (e.g., randomly assigned judges; Aizer & Doyle, 2015; Angrist et al., 1996). IVs analysis relies on two of the same assumptions that were required for ${\hat{A T E}}_{Manip}$ to be conservative and a less stringent version of Assumption 1⁸:

Assumption 1′ (relevance): The manipulation X affects the target state R.

Assumption 2 (excludability): The manipulation X affects the dependent variable Y only via the target state, R.

Assumption 3 (no simultaneous moderation): Any unmeasured moderators of the effect of target state R on Y are uncorrelated with any unmeasured moderators of the effect of manipulation X on R.

Assumption $1'$ states that the manipulation is “relevant” in that it affects R, but there are no restrictions on the direction of this effect. Like the previous Assumption 1, Assumption $1'$ can be tested empirically by regressing R on X. If Assumptions $1'$ through 3 hold, then $A T E_{Target}$ can be estimated using the Wald IV estimator, which is simply the ITT estimate divided by the estimated effect of the manipulation X on the target state R (Angrist et al., 1996; Hernán & Robins, 2020)⁹:

{\hat{A T E}}_{IV} = \frac{{\hat{A T E}}_{Manip}}{E [R | X = 1] - E [R | X = 0]} = \frac{E [Y | X = 1] - E [Y | X = 0]}{E [R | X = 1] - E [R | X = 0]} . \qquad (1)

Heuristically, if X is only partially effective, then Assumptions 1′ through 3 guarantee that ${\hat{A T E}}_{Manip}$ is biased toward the null, and ${\hat{A T E}}_{IV}$ corrects this attenuation. In the ego-depletion example, if these assumptions hold and the manipulation only partially increases the subjective experience of ego depletion, then ${\hat{A T E}}_{Manip}$ will be diluted compared with the effect of ego depletion itself. However, inflating ${\hat{A T E}}_{Manip}$ by a correction factor related to the manipulation’s strength of effect on ego depletion corrects the dilution. The IV estimate, ${\hat{A T E}}_{IV}$, is exactly this corrected estimate. More precisely, if the estimated effects of X on Y and of X on R are in the same direction but the effect of X on R is relatively small, the denominator in Equation 1 is small, so the IV estimate strongly corrects ${\hat{A T E}}_{Manip}$ away from the null. If the estimated effects of X on Y and of X on R are in opposite directions, then the IV estimate will be in the opposite direction from the ITT estimate. This could occur if the effect of R itself on Y is in the opposite direction from the effect of X on R.
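A minimal Python sketch of Equation 1 on individual-level data follows. The point estimate is exactly the ratio of the two mean differences. The standard error is a first-order delta-method approximation, included here as an illustrative assumption: it is one common choice, not necessarily the one used in the reanalysis below. The function `wald_iv`, its helper `_sample_cov`, and the toy data are hypothetical.

```python
import math
import statistics


def _sample_cov(a, b):
    """Sample covariance with an n - 1 denominator."""
    a_bar, b_bar = statistics.fmean(a), statistics.fmean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (len(a) - 1)


def wald_iv(x, r, y):
    """Wald IV estimate of ATE_Target (Equation 1) with a delta-method SE.

    x: binary condition (0/1); r: manipulation check; y: dependent variable.
    Returns (itt, first_stage, iv_estimate, iv_se).
    """
    r1 = [ri for xi, ri in zip(x, r) if xi == 1]
    r0 = [ri for xi, ri in zip(x, r) if xi == 0]
    y1 = [yi for xi, yi in zip(x, y) if xi == 1]
    y0 = [yi for xi, yi in zip(x, y) if xi == 0]
    itt = statistics.fmean(y1) - statistics.fmean(y0)          # numerator
    first_stage = statistics.fmean(r1) - statistics.fmean(r0)  # denominator
    if first_stage == 0:
        raise ValueError("no estimated effect of X on R; Equation 1 is undefined")
    iv = itt / first_stage
    # Sampling variances and covariance of the two mean differences;
    # observations in the two conditions are independent.
    var_itt = _sample_cov(y1, y1) / len(y1) + _sample_cov(y0, y0) / len(y0)
    var_fs = _sample_cov(r1, r1) / len(r1) + _sample_cov(r0, r0) / len(r0)
    cov = _sample_cov(y1, r1) / len(y1) + _sample_cov(y0, r0) / len(y0)
    # Delta method for a ratio a / b: Var(a - (a / b) * b) / b^2.
    var_iv = (var_itt - 2 * iv * cov + iv ** 2 * var_fs) / first_stage ** 2
    return itt, first_stage, iv, math.sqrt(max(var_iv, 0.0))


x = [0, 0, 0, 0, 1, 1, 1, 1]
r = [0, 0, 1, 0, 1, 1, 0, 1]   # hypothetical binary manipulation check
y = [1, 2, 3, 2, 3, 4, 2, 5]   # hypothetical dependent variable
itt, first_stage, iv, se = wald_iv(x, r, y)
print(itt, first_stage, iv)  # 1.5 0.5 3.0
```

Here the ITT of 1.5 is doubled because the manipulation raised the share of participants in the target state by only 0.5.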

Near violations of Assumption $1'$ can occur if the manipulation has only a weak effect on the target state, in which case the manipulation is called a “weak instrument.” Caution is warranted with weak instruments because they can amplify any bias due to violations of Assumptions 2 and 3 (Bound et al., 1995; Hernán & Robins, 2020). As noted above, researchers should check the empirical association of X with R; a common rule of thumb is that the resulting F statistic should be at least 10 (Hernán & Robins, 2020). In the context of manipulation checks, weak instruments should be a rare occurrence because a central component of experimental design is selecting a manipulation that has a reasonably strong effect on the target state.
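For a binary manipulation, the first-stage F statistic from regressing R on X reduces to the squared pooled-variance two-sample t statistic (equivalently, the one-way ANOVA F with one numerator degree of freedom). The sketch below computes it for hypothetical data and compares it with the rule-of-thumb threshold of 10; the function name `first_stage_f` is illustrative.

```python
import statistics


def first_stage_f(x, r):
    """F statistic (1 numerator df) for the OLS regression of R on binary X.

    Equals the squared pooled-variance two-sample t statistic for the
    difference in mean R between conditions.
    """
    r1 = [ri for xi, ri in zip(x, r) if xi == 1]
    r0 = [ri for xi, ri in zip(x, r) if xi == 0]
    n1, n0 = len(r1), len(r0)
    diff = statistics.fmean(r1) - statistics.fmean(r0)
    pooled_var = ((n1 - 1) * statistics.variance(r1)
                  + (n0 - 1) * statistics.variance(r0)) / (n1 + n0 - 2)
    return diff ** 2 / (pooled_var * (1 / n1 + 1 / n0))


x = [0, 0, 0, 1, 1, 1]
r = [1, 2, 3, 4, 5, 6]  # hypothetical manipulation-check scores
f_stat = first_stage_f(x, r)
print(f"F = {f_stat:.1f}; below rule-of-thumb 10: {f_stat < 10}")  # F = 13.5; ... False
```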

Several approaches exist to assess how results might change if Assumptions $1'$ through 3, or some subset of them, are violated (Cinelli & Hazlett, 2022; Hernán & Robins, 2006; Swanson et al., 2018). Some methods allow for one or more assumptions to be violated, at the price of providing bounds on $A T E_{Target}$ rather than a point estimate. For example, there exist several possible bounds that apply if Assumptions $1'$ and 2 (relevance and excludability) hold but Assumption 3 (no simultaneous moderation) could be violated (Swanson et al., 2018). However, the bounds can often be so wide as to be relatively uninformative. Alternative approaches instead express the extent to which the IVs estimator could be biased for $A T E_{Target}$ given sensitivity parameters that characterize how strongly the assumptions are violated. For example, Cinelli and Hazlett (2022) provided bounds that apply if Assumptions $1'$ and 3 (relevance and no simultaneous moderation) hold but Assumption 2 (excludability) is violated. Sensitivity analysis for IVs analyses remains an active area of research.
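As a rough illustration of how a weak first stage amplifies an excludability violation, the sketch below assumes a linear, constant-effects model in which X also affects Y through a direct path. This is a simple algebraic calculation, not the sensitivity analysis of Cinelli and Hazlett (2022) or the bounds of Swanson et al. (2018). The direct effect of 0.02 is hypothetical; the two first-stage values reuse the composite (0.79) and fatigue-only (0.09) manipulation effects reported in the reanalysis, purely to show the scale of amplification.

```python
def iv_bias_from_direct_effect(direct_effect, first_stage):
    """Large-sample bias of the Wald IV estimator when excludability fails.

    Assumes a linear, constant-effects model in which X affects Y through R
    and also through a hypothetical direct path of size direct_effect, so
        ITT  = first_stage * ATE_Target + direct_effect
        Wald = ATE_Target + direct_effect / first_stage.
    """
    if first_stage == 0:
        raise ValueError("first stage is zero: Assumption 1' (relevance) fails")
    return direct_effect / first_stage


# Hypothetical direct effect of 0.02 (in units of Y), paired with first
# stages equal to the reported composite and fatigue-only SMDs.
for first_stage in (0.79, 0.09):
    bias = iv_bias_from_direct_effect(0.02, first_stage)
    print(f"first stage {first_stage:.2f}: bias {bias:.3f}")
# first stage 0.79: bias 0.025
# first stage 0.09: bias 0.222
```

The same small violation produces roughly nine times as much bias when the first stage is weak, which is the amplification described above.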

Reanalysis of the Ego-Depletion Effect

Using ITT analysis, Hagger et al. (2016) estimated that the ego-depletion effect was close to the null (SMD = 0.04; 95% CI = [–0.07, 0.15]). The manipulation had strong effects on three of the four manipulation-check items (effort, difficulty, and frustration); SMDs ranged from 0.82 to 1.91. However, the manipulation had only weak effects on fatigue (SMD = 0.09; 95% CI = [–0.03, 0.20]). The lead author of the original study on ego depletion (Baumeister et al., 1998) critiqued the replications on several grounds, one of which was that “the manipulation failed to create ego depletion” based on the fatigue measure (Baumeister & Vohs, 2016).

To investigate how the replication findings might change when accounting for the manipulation’s partial effectiveness, I reanalyzed the data using IVs analysis. I first consider the plausibility of Assumptions 1′ through 3. First, Assumption 1′ (relevance) appears to hold because the manipulation did somewhat increase ego depletion, if only partially. Assumption 2 (excludability) seems fairly plausible on theoretical grounds because the manipulation involved stylized laboratory tasks that were identical except that the control task did not require effortful response inhibition (i.e., suppressing a response to press a button when a word contained the letter “e” but the “e” was near a vowel). This is a strength of the replications compared with some previous studies on ego depletion in which the manipulations were often considerably less specific (e.g., forcing oneself to eat radishes instead of freshly baked cookies) and hence more likely to violate the excludability assumption (Baumeister et al., 1998; Lurquin & Miyake, 2017). On the other hand, there appears to be little empirical basis for ruling out possible undesired effects of the letter-crossing manipulation on nuisance psychological states. Such effects could result in excludability violations (but regarding possible effects on negative affect, see Hagger et al., 2010). This paucity of literature on manipulation specificity contrasts strikingly with the better developed literature on the effectiveness and mechanisms of these tasks at inducing ego depletion (Arber et al., 2017; Baumeister & Vohs, 2016; Singh & Göritz, 2019). As noted previously, Assumption 3 (no simultaneous moderation) could potentially be violated if an unmeasured variable, such as trait neuroticism, moderated both (a) the effect of experiencing ego depletion on the dependent variable (RTV) and (b) the effect of the manipulation on experiencing ego depletion. I leave further substantive consideration of this possibility to the ego-depletion research community.

In my reanalysis, I treated the four manipulation-check items (effort, difficulty, frustration, and fatigue) as a single composite measure of ego depletion by aggregating Hagger et al.’s (2016) publicly available estimates of the manipulation’s effects on each item, accounting for correlation between the items. I combined these with Hagger et al.’s ITT estimates to obtain IV estimates for each replication site (Fig. 1). I aggregated the IV estimates across sites using random-effects meta-analysis fit with restricted maximum likelihood estimation and Knapp-Hartung standard errors (Knapp & Hartung, 2003; Sidik & Jonkman, 2002). I estimated that on average across sites, the effect of the manipulation on composite ego depletion was SMD = 0.79 (95% CI = [0.71, 0.86]). Within sites, the ITT estimates ranged from −0.51 to 0.50, and the IV estimates ranged from −0.76 to 0.55.¹⁰ In 20 of 23 sites, the ITT estimate was closer to the null than the IV estimate or was equal to the IV estimate. In my aggregated IV analysis, I estimated that the effect of ego depletion itself (rather than the effect of the manipulation) was SMD = 0.06 (95% CI = [–0.09, 0.20]). Like Hagger and Chatzisarantis’s (2016) ITT estimate, this IV estimate is very close to the null with a CI that excludes medium or large effect sizes.
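The pooling step can be sketched as follows. This is an illustrative implementation, not the code used for the reanalysis, and it does not reproduce the composite-score construction. It assumes each site's IV estimate is the Wald ratio of that site's ITT SMD to its first-stage SMD, approximates its variance by the delta method while ignoring the covariance between numerator and denominator (a simplification), estimates the between-site variance τ² by REML via a standard fixed-point iteration, and computes the Knapp-Hartung standard error. Because only the standard library is used, the 95% CI needs a t quantile with k − 1 degrees of freedom from elsewhere (e.g., `scipy.stats.t.ppf(0.975, df)`). The function names and example inputs are hypothetical, not Hagger et al.'s site data.

```python
import math


def site_iv(itt, var_itt, first_stage, var_first_stage):
    """Wald ratio for one site with a first-order delta-method variance.

    Simplifying assumption: the site's ITT and first-stage estimates are
    treated as uncorrelated, so the covariance term is dropped.
    """
    iv = itt / first_stage
    var_iv = (var_itt + iv ** 2 * var_first_stage) / first_stage ** 2
    return iv, var_iv


def random_effects_reml_kh(estimates, variances, tol=1e-10, max_iter=10_000):
    """Random-effects pooling with a REML tau^2 and a Knapp-Hartung SE.

    tau^2 is found by iterating the REML fixed-point equation
        tau2 = sum(w^2 * ((y - mu)^2 - v)) / sum(w^2) + 1 / sum(w),
    with w = 1 / (v + tau2), truncating at zero.
    Returns (pooled estimate, Knapp-Hartung SE, tau2, degrees of freedom).
    """
    k = len(estimates)
    if k < 2:
        raise ValueError("need at least two sites")

    def weights_and_mean(tau2):
        w = [1.0 / (v + tau2) for v in variances]
        mu = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
        return w, mu

    tau2 = 0.0
    for _ in range(max_iter):
        w, mu = weights_and_mean(tau2)
        numerator = sum(wi ** 2 * ((yi - mu) ** 2 - vi)
                        for wi, yi, vi in zip(w, estimates, variances))
        new_tau2 = max(0.0, numerator / sum(wi ** 2 for wi in w) + 1.0 / sum(w))
        converged = abs(new_tau2 - tau2) < tol
        tau2 = new_tau2
        if converged:
            break

    w, mu = weights_and_mean(tau2)
    # Knapp-Hartung: rescale by the weighted mean squared residual on k - 1 df.
    q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, estimates)) / (k - 1)
    se = math.sqrt(q / sum(w))
    return mu, se, tau2, k - 1


# Hypothetical inputs per site: (ITT, var of ITT, first stage, var of first stage).
sites = [(0.10, 0.02, 0.80, 0.01), (-0.05, 0.03, 0.70, 0.01), (0.02, 0.02, 0.90, 0.02)]
site_results = [site_iv(*s) for s in sites]
mu, se, tau2, df = random_effects_reml_kh([iv for iv, _ in site_results],
                                          [v for _, v in site_results])
print(f"pooled IV = {mu:.3f}, Knapp-Hartung SE = {se:.3f}, tau2 = {tau2:.4f}, df = {df}")
```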

Fig. 1.

Forest plot of intention-to-treat (ITT) estimates and instrumental variable (IV) estimates within each replication site and pooled across sites with 95% confidence intervals. Interval limits that extend past the plotted range are truncated. IV estimates are based on the composite ego-depletion measure.
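
The pooling step behind the across-site estimates can be sketched with the Python standard library alone. This is a minimal illustration, not the software used for the article. It estimates the between-site variance τ² by restricted maximum likelihood using a common fixed-point iteration, pools the site estimates with inverse-variance weights, and computes the Knapp-Hartung standard error with a t reference distribution on k − 1 degrees of freedom. The t quantile is obtained by numerically integrating the t density, only to avoid third-party dependencies; in practice a dedicated meta-analysis package would normally be used. The function names t_quantile and reml_knapp_hartung are hypothetical.

```python
import math


def t_quantile(p, df, n_steps=2000):
    """Quantile of Student's t for 0.5 < p < 1, by bisection on a CDF obtained
    by integrating the t density with Simpson's rule (n_steps must be even)."""
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    def cdf(x):  # valid for x >= 0
        h = x / n_steps
        s = pdf(0.0) + pdf(x)
        s += sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, n_steps))
        return 0.5 + s * h / 3

    lo, hi = 0.0, 1000.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def reml_knapp_hartung(y, v, level=0.95, tol=1e-10, max_iter=1000):
    """Random-effects meta-analysis of estimates y with sampling variances v.

    Returns (pooled mean, Knapp-Hartung SE, (CI lower, CI upper), tau^2).
    """
    k = len(y)
    if k < 2:
        raise ValueError("need at least two estimates")
    tau2 = 0.0
    for _ in range(max_iter):
        w = [1.0 / (vi + tau2) for vi in v]
        mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        # REML fixed-point update, truncated at zero
        num = sum(wi ** 2 * ((yi - mu) ** 2 - vi) for wi, yi, vi in zip(w, y, v))
        new_tau2 = max(0.0, num / sum(wi ** 2 for wi in w) + 1.0 / sum(w))
        converged = abs(new_tau2 - tau2) < tol
        tau2 = new_tau2
        if converged:
            break
    w = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    # Knapp-Hartung variance: weighted residual variance scaled by 1 / sum(w)
    q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y)) / (k - 1)
    se = math.sqrt(q / sum(w))
    crit = t_quantile(1 - (1 - level) / 2, k - 1)
    return mu, se, (mu - crit * se, mu + crit * se), tau2
```

Here y would hold the site-level IV estimates and v their squared standard errors. As a check on the τ² iteration, when all within-site variances are equal, the REML estimate reduces to the sample variance of the estimates minus the common within-site variance (truncated at zero); for example, estimates of −0.5, 0, and 0.5 with variance 0.01 each give τ² = 0.24.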

In a critical commentary on the replications, Baumeister and Vohs (2016) argued that fatigue is the most important of the manipulation-check items. I conducted a second analysis that was maximally favorable to this viewpoint in which I treated fatigue as the only manipulation-check item. This analysis would be justified in the extreme case that the only effects of the manipulation were via fatigue and not via effort, perceived difficulty, frustration, or any other pathways. The resulting IV estimate was again very similar to the ITT estimate (SMD = 0.05; 95% CI = [–0.25, 0.38]).¹¹ Although this estimate had a wide CI, it again corroborates Hagger et al.’s (2016) original findings under assumptions that favor the opposite conclusion. I reiterate that my IV reanalysis is subject to assumptions that might be violated and that merit further empirical evaluation and substantive consideration; nevertheless, these assumptions are less stringent than those required for the status-quo interpretation of the ITT estimate.

Discussion

In this article, I considered analytic approaches for experiments that involve manipulation checks, specifically, when the effect of primary interest is that of the target internal state that the manipulation is intended to produce. In this context, the widespread ITT analysis is not necessarily conservative for the ATE of the target state on the dependent variable. I discussed assumptions under which the ITT estimator is indeed conservative for this ATE. I suggested IVs analysis as an alternative approach that is well established in the causal-inference literature. The assumptions under which the IVs analysis estimates the ATE of the target state are slightly less stringent than those under which the ITT analysis is conservative.

The IV estimate directly addresses a central question of interest in experiments involving manipulation checks, as in the ego-depletion literature. As another example, in a high-profile meta-analysis of experiments designed to assess the effects of mood on eating behavior, the experimental manipulations of mood were highly diverse, including watching emotional video clips, recalling emotional experiences, receiving social feedback, and giving a public presentation (Cardi et al., 2015). Although many of the studies included manipulation checks, the meta-analysts extracted only ITT estimates. Despite the mood manipulations’ inconsistent effectiveness, they concluded that “eating behavior is influenced by emotional state” with no discussion of the assumptions required to interpret ITT estimates in terms of the target state. This is a case in which the causal effect of interest clearly concerns the target state, not the various manipulations, yet the statistical analysis and discussion of assumptions were not well aligned with this objective.

On the other hand, there are cases in which the ITT estimate will be of equal or greater scientific interest than the IV estimate. Sometimes, researchers are additionally (or exclusively) interested in the effect of the manipulation itself on the dependent variable. This could be the case if the manipulation is a candidate real-world policy or intervention. For example, in a series of randomized experiments, Fernbach et al. (2013) found that participants who were made to explain the details of various political policies subsequently moderated their stances on those policies compared with participants who merely had to explain their own stances. The hypothesized mechanism was that having to explain policies reveals to participants their weak understanding of the policies, leading them to moderate their views. Fernbach et al. expressed interest in the manipulation itself as a potential intervention to counteract attitude polarization, and the ITT estimate directly assessed this possibility. If the authors had additionally been interested in the effect of participants’ perceived understanding of the policies, the IV estimate would more directly address this alternative question. Therefore, my recommendation is not that the IV estimate should always be used instead of the ITT estimate. Rather, researchers using manipulation checks should explicitly define and preregister the causal effect(s) of interest and choose analysis methods accordingly.

Among other assumptions, IVs analysis requires the strong assumption of excludability (i.e., that the manipulation affects the dependent variable only via the target state). By comparison, if the ITT estimate is interpreted as the causal effect of the manipulation itself, this is justified simply because the manipulation was randomly assigned. My position is not that the assumptions for IV are typically well justified in psychology experiments: In fact, I concur with others’ concerns that manipulations in psychology may often be “fat-handed,” meaning that they manipulate numerous psychological states other than the target state (Eronen, 2020; Gruijters, 2022; Spirtes & Scheines, 2004). Excludability will usually be violated if these other, nuisance effects of the manipulation also affect the dependent variable. The key point is that if a researcher is interested in the effect of the target state rather than the manipulation—as is usually the case in experimental psychology—then the excludability assumption is the price paid, and its plausibility should be scrutinized in substantive context.

Critically, as I have discussed, the excludability assumption is not unique to IVs analysis. If researchers use the ITT estimate but interpret it as the effect of the target state, then they are still implicitly assuming excludability. In fact, this seems to be the status-quo practice, and it requires assumptions that are strictly more stringent than those required for IVs analysis. Thus, the status-quo practice of interpreting the ITT estimate as the effect of the target state without any explicit discussion of whether excludability holds should be abandoned. Instead, researchers should either (a) describe why, on theoretical or empirical grounds, their manipulation warrants the excludability assumption, which would license interpreting either the IV estimate or (under additional assumptions) the conservative ITT estimate in terms of the effect of the target state, or (b) conclude that their manipulation may not warrant the excludability assumption and interpret the ITT estimate only as the effect of the manipulation itself, not the effect of the target state.

When interest does center on the target state, and hence one must establish that excludability is plausible, I concur with others’ recommendations to conduct separate, careful validation studies of the manipulation (Gruijters, 2022). These studies should assess the manipulation’s effects not only on the target state but also on other nuisance psychological states that could result in excludability violations (Gruijters, 2022). I have also suggested applying sensitivity analyses and bounding methods to assess robustness to violations of this and other assumptions.
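
As one simple illustration of such a sensitivity analysis (not one of the formal sensitivity or bounding methods referred to above), suppose a linear, constant-effects model in which the manipulation may also have a direct effect δ on the outcome, on the same SMD scale as the ITT estimate. Then ITT = ATE × (first stage) + δ, so an IV estimate that corrects for an assumed δ is (ITT − δ)/(first stage), and one can ask how large δ would need to be to move the estimate to a value of substantive interest. The linear model is an assumption, and the function names below are hypothetical.

```python
def iv_adjusted_for_direct_effect(itt, first_stage, direct_effect):
    """IV estimate after subtracting an assumed direct effect (delta) of the
    manipulation on the outcome.

    Assumes a linear, constant-effects model:
        ITT = ate * first_stage + delta  =>  ate = (ITT - delta) / first_stage
    """
    return (itt - direct_effect) / first_stage


def direct_effect_to_reach(itt, first_stage, target_ate):
    """Direct effect (delta) needed for the adjusted IV estimate to equal
    target_ate under the same linear model."""
    return itt - target_ate * first_stage


def sensitivity_grid(itt, first_stage, deltas):
    """Adjusted IV estimates over a grid of assumed direct effects."""
    return [(d, iv_adjusted_for_direct_effect(itt, first_stage, d)) for d in deltas]
```

For instance, with a hypothetical ITT estimate of 0.05 and a first-stage SMD of 0.8, the unadjusted IV estimate is 0.0625; a direct effect of δ = 0.05 would bring it to 0, and reaching an IV estimate of 0.2 would require δ = 0.05 − 0.2 × 0.8 = −0.11, that is, a direct effect working against the hypothesized effect.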

Ultimately, I do not advocate for an uncritical switch from ITT analysis to IVs analysis. Rather, I hope this article encourages psychology researchers to carefully define their estimands of interest, articulate and justify the relevant assumptions, and choose an appropriate analysis method accordingly.

Footnotes

Acknowledgements

The funders had no role in the design, conduct, or reporting. All code and data required to reproduce the applied example are publicly available and documented ().

Transparency

Action Editor: Rogier A. Kievit

Author Contributions

Maya B. Mathur: Conceptualization; Data curation; Formal analysis; Funding acquisition; Investigation; Methodology; Project administration; Visualization; Writing – original draft; Writing – review & editing.

ORCID iD

Maya B. Mathur

Notes

References

Aizer, & Doyle, J. J., Jr. (2015). Juvenile incarceration, human capital, and future crime: Evidence from randomly assigned judges. The Quarterly Journal of Economics, 130(2), 759–803.

Angrist, J. D., Imbens, G. W., & Rubin, D. B. (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association, 91(434), 444–455.

Arber, M. M., Ireland, M. J., Feger, Marrington, & Tehan (2017). Ego depletion in real-time: An examination of the sequential-task paradigm. Frontiers in Psychology, 8, Article 1672. https://doi.org/10.3389/fpsyg.2017.01672

Aronson, Ellsworth, Carlsmith, J. M., & Gonzales, M. H. (1990). Methods of research in social psychology (2nd ed.). McGraw-Hill.

Baumeister, R. F., Bratslavsky, Muraven, & Tice, D. M. (1998). Ego depletion: Is the active self a limited resource? Journal of Personality and Social Psychology, 74(5), 1252–1265.

Baumeister, R. F., & Vohs, K. D. (2016). Misguided effort with elusive implications. Perspectives on Psychological Science, 11(4), 574–575.

Bound, Jaeger, D. A., & Baker, R. M. (1995). Problems with instrumental variables estimation when the correlation between the instruments and the endogenous explanatory variable is weak. Journal of the American Statistical Association, 90(430), 443–450.

Cardi, Leppanen, & Treasure (2015). The effects of negative and positive mood induction on eating behaviour: A meta-analysis of laboratory studies in the healthy population and eating and weight disorders. Neuroscience & Biobehavioral Reviews, 57, 299–309.

Cinelli, & Hazlett (2022). An omitted variable bias framework for sensitivity analysis of instrumental variables. SSRN. https://doi.org/10.2139/ssrn.4217915

Dang (2016). Commentary: A multilab preregistered replication of the ego-depletion effect. Frontiers in Psychology, 7, Article 1155. https://doi.org/10.3389/fpsyg.2016.01155

Drummond, & Philipp, M. C. (2017). Commentary: “Misguided effort with elusive implications” and “A multi-lab pre-registered replication of the ego depletion effect.” Frontiers in Psychology, 8, Article 273. https://doi.org/10.3389/fpsyg.2017.00273

Ejelöv, & Luke, T. J. (2020). “Rarely safe to assume”: Evaluating the use and interpretation of manipulation checks in experimental social psychology. Journal of Experimental Social Psychology, 87, Article 103937. https://doi.org/10.1016/j.jesp.2019.103937

Eronen, M. I. (2020). Causal discovery and the problem of psychological interventions. New Ideas in Psychology, 59, Article 100785. https://doi.org/10.1016/j.newideapsych.2020.100785

Fernbach, P. M., Rogers, Fox, C. R., & Sloman, S. A. (2013). Political extremism is supported by an illusion of understanding. Psychological Science, 24(6), 939–946.

Gailliot, M. T., Baumeister, R. F., DeWall, C. N., Maner, J. K., Plant, E. A., Tice, D. M., Brewer, L. E., & Schmeichel, B. J. (2007). Self-control relies on glucose as a limited energy source: Willpower is more than a metaphor. Journal of Personality and Social Psychology, 92(2), 325–336. https://doi.org/10.1037/0022-3514.92.2.325

Gruijters, S. L. (2022). Making inferential leaps: Manipulation checks and the road towards strong inference. Journal of Experimental Social Psychology, 98, Article 104251. https://doi.org/10.1016/j.jesp.2021.104251

Hagger, M. S., & Chatzisarantis, N. L. (2016). Commentary: Misguided effort with elusive implications, and sifting signal from noise with replication science. Frontiers in Psychology, 7, Article 621. https://doi.org/10.3389/fpsyg.2016.00621

Hagger, M. S., Chatzisarantis, N. L., Alberts, Anggono, C. O., Batailler, Birt, A. R., Brand, Brandt, M. J., Brewer, Bruyneel, Calvillo, D. P., Campbell, W. K., Cannon, P. R., Carlucci, Carruth, N. P., Cheung, Crowell, De Ridder, D. T. D., Dewitte, . . . others. (2016). A multilab preregistered replication of the ego-depletion effect. Perspectives on Psychological Science, 11(4), 546–573. https://doi.org/10.1177/1745691616652873

Hagger, M. S., Wood, Stiff, & Chatzisarantis, N. L. (2010). Ego depletion and the strength model of self-control: A meta-analysis. Psychological Bulletin, 136(4), 495–525. https://doi.org/10.1037/a0019486

Hartwig, F. P., Wang, Smith, G. D., & Davies, N. M. (2023). Average causal effect estimation via instrumental variables: The no simultaneous heterogeneity assumption. Epidemiology, 34(3), 325–332.

Hauser, D. J., Ellsworth, P. C., & Gonzalez (2018). Are manipulation checks necessary? Frontiers in Psychology, 9, Article 998. https://doi.org/10.3389/fpsyg.2018.00998

Hernán, M. A., & Hernández-Díaz (2012). Beyond the intention-to-treat in comparative effectiveness research. Clinical Trials, 9(1), 48–55.

Hernán, M. A., & Robins, J. M. (2006). Instruments for causal inference: An epidemiologist’s dream? Epidemiology, 17(4), 360–372.

Hernán, M. A., & Robins, J. M. (2020). Causal inference: What if. Chapman & Hall/CRC.

Knapp, & Hartung (2003). Improved tests for a random effects meta-regression with a single covariate. Statistics in Medicine, 22(17), 2693–2710.

Knol, M. J., & VanderWeele, T. J. (2012). Recommendations for presenting analyses of effect modification and interaction. International Journal of Epidemiology, 41(2), 514–520.

Lurquin, J. H., & Miyake (2017). Challenges to ego-depletion research go beyond the replication crisis: A need for tackling the conceptual crisis. Frontiers in Psychology, 8, Article 568. https://doi.org/10.3389/fpsyg.2017.00568

Maples-Keller, J. L., Berke, D. S., Miller, J. D., & vanDellen (2016). Ego depletion and conscientiousness as predictors of behavioral disinhibition: A laboratory examination. Personality and Individual Differences, 98, 6–10. https://doi.org/10.1016/j.paid.2016.03.054

Mathur, M. B. (2024). On the statistical analysis of studies with attention checks. OSF. https://osf.io/preprints/osf/r9vdp/

Mathur, M. B., & Shpitser (2024). Simple graphical rules for selection bias in general-population and selected-sample treatment effects. American Journal of Epidemiology. Advance online publication. https://doi.org/10.1093/aje/kwae145

McNamee (2009). Intention to treat, per protocol, as treated and instrumental variable estimators given non-compliance and effect heterogeneity. Statistics in Medicine, 28(21), 2639–2652.

Montgomery, J. M., Nyhan, & Torres (2018). How conditioning on posttreatment variables can ruin your experiment and what to do about it. American Journal of Political Science, 62(3), 760–775.

Oppenheimer, D. M., Meyvis, & Davidenko (2009). Instructional manipulation checks: Detecting satisficing to increase statistical power. Journal of Experimental Social Psychology, 45(4), 867–872.

Pearl (2009). Causality. Cambridge University Press.

Rohrer, J. M., & Arslan, R. C. (2021). Precise answers to vague questions: Issues with interactions. Advances in Methods and Practices in Psychological Science, 4(2), Article 25152459211007368. https://doi.org/10.1177/25152459211007368

Rohrer, J. M., Hünermund, Arslan, R. C., & Elson (2022). That’s a lot to process! Pitfalls of popular path models. Advances in Methods and Practices in Psychological Science, 5(2), Article 25152459221095827. https://doi.org/10.1177/25152459221095827

Rothman, K. J., Greenland, & Walker, A. M. (1980). Concepts of interaction. American Journal of Epidemiology, 112(4), 467–470.

Salmon, S. J., Adriaanse, M. A., De Vet, Fennis, B. M., & De Ridder, D. T. (2014). “When the going gets tough, who keeps going?” Depletion sensitivity moderates the ego-depletion effect. Frontiers in Psychology, 5, Article 647. https://doi.org/10.3389/fpsyg.2014.00647

Sidik, & Jonkman, J. N. (2002). A simple confidence interval for meta-analysis. Statistics in Medicine, 21(21), 3153–3159.

Singh, R. K., & Göritz, A. S. (2019). Revisiting ego depletion: Moderators and measurement. Basic and Applied Social Psychology, 41(1), 1–19. https://doi.org/10.1080/01973533.2018.1530671

Spirtes, & Scheines (2004). Causal inference of ambiguous manipulations. Philosophy of Science, 71(5), 833–845.

Swanson, S. A., Hernán, M. A., Miller, Robins, J. M., & Richardson, T. S. (2018). Partial identification of the average treatment effect using instrumental variables: Review of methods for binary instruments, treatments, and outcomes. Journal of the American Statistical Association, 113(522), 933–947.

VanderWeele, T. J. (2009). On the distinction between interaction and effect modification. Epidemiology, 20(6), 863–871.

VanderWeele, T. J. (2015). Explanation in causal inference: Methods for mediation and interaction. Oxford University Press.

Wang, & Tchetgen Tchetgen (2018). Bounded, efficient and multiply robust estimation of average treatment effects using instrumental variables. Journal of the Royal Statistical Society, Series B: Statistical Methodology, 80(3), 531–550.

West, S. G., & Thoemmes (2010). Campbell’s and Rubin’s perspectives on causal inference. Psychological Methods, 15(1), 18–37. https://doi.org/10.1037/a0015917

On the Statistical Analysis of Experiments With Manipulation Checks
