Sage Journals

Abstract

People feel tired or depleted after exerting mental effort. But even preregistered studies often fail to find effects of exerting effort on behavioral performance in the laboratory or elucidate the underlying psychology. We tested a new paradigm in four preregistered within-subjects studies (N = 686). An initial high-demand task reliably elicited very strong effort phenomenology compared with a low-demand task. Afterward, participants completed a Stroop task. We used drift-diffusion modeling to obtain the boundary (response caution) and drift-rate (information-processing speed) parameters. Bayesian analyses indicated that the high-demand manipulation reduced boundary but not drift rate. Increased effort sensations further predicted reduced boundary. However, our demand manipulation did not affect subsequent inhibition, as assessed with traditional Stroop behavioral measures and additional diffusion-model analyses for conflict tasks. Thus, effort exertion reduced response caution rather than inhibitory control, suggesting that after exerting effort, people disengage and become uninterested in exerting further effort.

Keywords

fatigue, ego depletion, self-control, drift-diffusion model, Bayesian analysis, open data, open materials, preregistered

What are the consequences of exerting effort? Many people intuitively believe in ego depletion (Francis & Job, 2018), the idea that exerting effortful control depletes one’s energy (Baumeister & Vohs, 2016). However, high-powered preregistered studies (Garrison, Finley, & Schmeichel, 2019; Hagger et al., 2016), meta-analyses (Carter, Kofler, Forster, & McCullough, 2015), and theoretical reviews (Friese, Loschelder, Gieseler, Frankenbach, & Inzlicht, 2019; Inzlicht & Friese, 2019) suggest that laboratory depletion effects are small or potentially nonexistent and that previous work suffers from limitations such as ineffective experimental manipulations and low statistical power. Here, in four preregistered studies, we developed a paradigm that addresses previous methodological limitations and provides insights into the effects of effort exertion.

Ego-Depletion Controversy

In the first tests of ego depletion (i.e., Baumeister, Bratslavsky, Muraven, & Tice, 1998), one group of participants initially completed a difficult self-control task (depletion group; e.g., forced to eat radishes instead of chocolates), and another group completed an easier task (control group; e.g., allowed to eat chocolates). Both groups then completed a second, unrelated self-control task (e.g., worked on unsolvable puzzles), which served as the dependent variable (e.g., persistence duration on puzzles). The depletion group showed reduced self-control on the second task compared with the control group, providing evidence for ego depletion—the idea that self-control runs out after use (Friese et al., 2019).

Subsequent studies found that depletion influenced diverse outcomes even when the depletion or outcome tasks did not entail self-control or inhibitory control (e.g., Moller, Deci, & Ryan, 2006; Schmeichel, 2007), suggesting that exerting effort and experiencing fatigue (rather than recruiting inhibitory control) led to depletion effects. Critically, the first meta-analysis, of 198 published tests, suggested that the effect (d = 0.62, 95% confidence interval [CI] = [0.57, 0.67]) was practically important and deserved further investigation (Hagger, Wood, Stiff, & Chatzisarantis, 2010).

Subsequent evidence, however, suggested otherwise. Researchers began reporting replication failures or much smaller effect sizes (e.g., Tuk, Zhang, & Sweldens, 2015). Meta-analyses that were conducted to correct for publication bias (i.e., when significant results are published more frequently than nonsignificant ones) suggest that depletion might be unreal (Carter et al., 2015; Friese & Frankenbach, 2019), although subsequent work has questioned the validity of existing bias-correction techniques (Carter, Schönbrodt, Gervais, & Hilgard, 2019). These critiques were bolstered by further failures involving either large-scale preregistered replications or reanalyses of large data sets not originally gathered to investigate ego depletion (Etherton et al., 2018; Hagger et al., 2016).

Starting Anew: A Novel Approach

Although the field appears to have hit a dead end, it might be too soon to jettison ego depletion because researchers have relied mainly on one paradigm (i.e., between-subjects laboratory sequential tasks) and have yet to fully examine other approaches. For example, studies using archival data sets, field data, or experience sampling suggest that depletion or carryover fatigue effects may be apparent in people's everyday lives (e.g., Dai, Milkman, Hofmann, & Staats, 2015; Hirshleifer, Levi, Lourie, & Teoh, 2019). Although ecologically valid, these studies often cannot control for real-world confounds. Our goal was to create a laboratory paradigm that provides converging evidence and facilitates future research. We also tested the idea that laboratory depletion effects are akin to real-life fatigue effects in that people shift their priorities when tired, resulting in disengagement from ongoing tasks (Inzlicht, Schmeichel, & Macrae, 2014).

Strong manipulation and within-subjects design

First, instead of using standard depletion paradigms, which often rely on demanding tasks thought to tap inhibitory control, we designed a manipulation to robustly elicit states (e.g., effort, fatigue) typically associated with depletion (Friese et al., 2019). We used the symbol-counting task, which draws on the shifting and updating aspects of executive function (Garavan, Ross, Li, & Stein, 2000). Crucially, we modified the task so that it adapted trial by trial to each participant's performance, which ensured that the task was highly demanding for every participant. Second, we used a completely within-subjects design to reduce error variance and increase statistical power (Francis, Milyavskaya, Lin, & Inzlicht, 2018). To minimize demand characteristics and learning effects, we had participants complete the low-demand and high-demand tasks on two separate days roughly 1 week apart.

Drift-diffusion modeling

After the demand manipulation, participants completed the Stroop task, which is often used to assess inhibition abilities (Miyake & Friedman, 2012). In addition to performing traditional behavioral analyses on reaction time and accuracy, our primary interest was to transform these observed measures into latent variables assumed to underlie performance. We fitted drift-diffusion models, which assume that people make speeded decisions by gradually accumulating information until an evidence boundary is reached (Fig. 1; Ratcliff & McKoon, 2008). This model not only accounts for the speed/accuracy trade-off in reaction time tasks (Ratcliff & McKoon, 2008) but also allowed us to examine whether fatigue influences information-processing speed (drift-rate parameter) or response caution and impulsivity (boundary parameter; see Fig. 1 for an explanation). Specifically, we fitted the EZ-diffusion model (Wagenmakers, Van der Maas, & Grasman, 2007), which, despite being simpler, often outperforms the full diffusion model and better detects experimental effects (Dutilh et al., 2019; van Ravenzwaaij, Donkin, & Vandekerckhove, 2017). Given the success of this modeling approach in explaining individual differences and how experimental manipulations influence psychological processes (Evans & Wagenmakers, 2019), these latent variables can provide insights into the psychology underlying depletion.

Fig. 1.

Schematic illustrating the drift-diffusion model, which decomposes the joint distributions of reaction time and accuracy into latent variables including drift rate, decision boundary, nondecision time, and starting point (Ratcliff & McKoon, 2008). Panel (a) shows three simulated decisions with different diffusion processes or paths. Each path depicts how one decision process evolved over time. The solid black and dashed lines depict decision processes reaching the correct boundary (i.e., correct responses made) at different rates; higher drift rates terminate sooner at the boundary (i.e., yielding faster reaction times). The light-gray arrows show the direct paths to the correct boundary for these two processes. The dotted line depicts a process that terminated relatively quickly at the error boundary (i.e., resulted in a fast error response). Panel (b) shows what would happen if the boundaries were reduced for the same three decisions. Decision processes would terminate at the boundaries sooner, although drift rates would remain unchanged, reflecting less evidence accumulation and resulting in noisier or more error responses and faster reaction times. The decision process depicted by the dashed line terminated prematurely at the error boundary. Boundary widths reflect either individual differences in response caution or experimental manipulations (e.g., emphasizing speedy responses reduces boundaries, whereas emphasizing accuracy increases them).
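The boundary manipulation in panel (b) can be illustrated with a small simulation. The sketch below is a plain Euler approximation of an unbiased two-boundary diffusion process (starting point at half the boundary, within-trial noise s = 0.1); the drift, boundary, and nondecision-time values are arbitrary illustrations, not estimates from our data. Holding drift rate fixed and narrowing the boundary should yield faster correct responses and lower accuracy.

```python
import math
import random

def simulate_ddm(drift, boundary, ndt=0.3, s=0.1, dt=0.001, n_trials=2000, seed=1):
    """Euler simulation of an unbiased diffusion between 0 and `boundary`.

    Returns (proportion_correct, mean_rt_of_correct_responses_in_seconds).
    The upper boundary counts as the correct response.
    """
    rng = random.Random(seed)
    noise_sd = s * math.sqrt(dt)  # SD of the noise added on each time step
    n_correct = 0
    correct_rt_total = 0.0
    for _ in range(n_trials):
        x = boundary / 2.0  # unbiased starting point
        t = 0.0
        while 0.0 < x < boundary:  # accumulate until a boundary is crossed
            x += drift * dt + rng.gauss(0.0, noise_sd)
            t += dt
        if x >= boundary:
            n_correct += 1
            correct_rt_total += t + ndt  # add nondecision time to decision time
    pc = n_correct / n_trials
    mean_rt = correct_rt_total / n_correct if n_correct else float("nan")
    return pc, mean_rt

wide = simulate_ddm(drift=0.25, boundary=0.12)    # cautious responding
narrow = simulate_ddm(drift=0.25, boundary=0.08)  # same drift, reduced boundary
```

For an unbiased diffusion, accuracy has the closed form Pc = 1 / (1 + exp(−a·v/s²)), so the two settings should give roughly 0.95 and 0.88; the small step size keeps the Euler approximation close to these values.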

Hypotheses

Studies 1 and 2 were conducted in the laboratory. Studies 3 and 4 were conducted online. We varied the duration of the high-demand task across studies. In Study 1, we preregistered only traditional analyses on the Stroop task but ran exploratory diffusion-model analyses. Studies 2, 3, and 4 were preregistered, confirmatory experiments that tested two primary hypotheses: that the high-demand manipulation, relative to the low-demand manipulation, would reduce (a) boundary and (b) drift rate. These predictions reflect our expectation that exerting effort should reduce overall engagement in the subsequent task rather than inhibitory control specifically.

Method

Participants

Study 1 was designed primarily to evaluate the effectiveness of our within-subjects demand manipulation and its effects on traditional Stroop behavioral measures (i.e., accuracy and reaction time). Two hundred fifty-three undergraduates participated (178 women, 71 men, 4 other; mean age = 18.80 years, SD = 2.66; range = 17–46 years; the preregistration is at https://osf.io/hhn3s/). We also ran exploratory diffusion-model analyses, which we planned to confirm and replicate in Studies 2 to 4. For those studies, we assumed a relatively small effect size (d = 0.26), which reflected our beliefs at the time of preregistration, our skepticism about depletion research, and the likelihood that previous studies might have overestimated effect sizes. Sensitivity analyses using the Power Analysis for General ANOVA Designs R Shiny app (Westfall, 2016) suggested that roughly 130 participants would provide at least 80% statistical power to detect the hypothesized effect. We aimed to recruit about 130 participants for each study, but because we recruited participants in batches and had to exclude data (see the Exclusion Criteria section), our final sample sizes were not exactly 130: Study 2 (N = 132 undergraduates; 98 women, 32 men, 2 other; mean age = 18.80 years, SD = 1.78; range = 17–29 years; the preregistration is at https://osf.io/xp7hn/); Study 3 (N = 180 Amazon Mechanical Turk, or MTurk, workers; 94 women, 83 men, 3 other; mean age = 34.90 years, SD = 9.93; range = 20–70 years; the preregistration is at https://osf.io/6p8t4/); and Study 4 (N = 121 MTurk workers; 63 women, 57 men, 1 other; mean age = 39.50 years, SD = 11.20; range = 20–66 years; the preregistration is at https://osf.io/6sncm/). All participants provided informed consent in accordance with policies of the University of Toronto's Institutional Review Board.
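To show how the required sample size scales with the assumed effect, the sketch below uses the textbook normal approximation for a two-sided paired (within-subjects) comparison, n = ((z_(1−α/2) + z_(1−β)) / d_z)². This is not the ANOVA-based calculation performed by Westfall's (2016) app, and treating d = 0.26 as a within-subjects d_z is our assumption, so the result is not expected to reproduce the figure of roughly 130.

```python
import math
from statistics import NormalDist

def paired_n(d_z, alpha=0.05, power=0.80):
    """Participants needed for a two-sided paired comparison (normal approximation)."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value, about 1.96 for alpha = .05
    z_power = z.inv_cdf(power)          # about 0.84 for 80% power
    return math.ceil(((z_alpha + z_power) / d_z) ** 2)
```

Under these assumptions, paired_n(0.26) returns 117; because the normal approximation ignores the heavier tails of the t distribution, it slightly understates the n a t-based calculation would give.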

Within-subjects design

To reduce error variance and increase statistical power, we used within-subjects designs in all four studies. To minimize demand characteristics and learning effects, we had each participant complete the low-demand and high-demand tasks on two separate days. In Studies 1 and 2, undergraduate participants completed the two tasks in two different weeks. Both sessions occurred on the same day of each week at the same time of the day. Each participant was pseudorandomly assigned to complete either the low-demand or high-demand task on the first day on the basis of his or her allocated participant number. Participants in Studies 1 and 2 received course credits for completing the study. In Studies 3 and 4, MTurk workers also completed the two tasks in two different weeks. However, because participants recruited via this online platform usually complete tasks at their convenience, they completed the second task 7 to 12 days after they completed the first task. They also did not have to complete the two tasks at the same time of the day. Each was randomly assigned to complete either the low-demand or high-demand task on the first day. Participants in Studies 3 and 4 received $2.90 and $3.60, respectively, for completing the study.

Procedure and sequential-task paradigm

Task 1: experimental manipulation

The low-demand task required participants to watch a 5-min wildlife video. The high-demand task required participants to complete a titrated symbol-counting task (study materials and code are available at https://osf.io/45gyk/; see Garavan et al., 2000); the high-demand task lasted approximately 20, 15, 5, and 10 min in Studies 1, 2, 3, and 4, respectively. We did not match the durations of the low-demand and high-demand tasks in Studies 1, 2, and 4 because we wanted to avoid inducing boredom with long but easy control tasks, which might lead to levels of subjective fatigue comparable with those from exerting cognitive effort on a demanding task (Milyavskaya, Inzlicht, Johnson, & Larson, 2019) and potentially undermine the demand manipulation. Further, previous work using unbalanced designs such as ours has reported stronger effects (Sjåstad & Baumeister, 2018).

The symbol-counting task is a cognitive task that parametrically manipulates executive demands (Garavan et al., 2000). On each trial, participants had to count how many small and how many big black squares had been presented. Thus, the task heavily taxes the shifting and updating (but not inhibition) aspects of executive function (Miyake & Friedman, 2012). To keep the task difficult for every participant, we calibrated it individually: Difficulty was adjusted trial by trial according to the participant's performance on the previous trial. On each trial, multiple small and big squares were presented sequentially (between 11 and 17 squares per trial), and each square was preceded by a fixation cross (Fig. 2). The first trial began with 12 squares and a switch frequency of 5 (i.e., the square size switched 5 times within the trial, from small to big or from big to small). At the end of each trial, participants indicated how many small and how many big squares had been presented. That is, participants had to keep a running tally of two counts. If participants responded correctly, the total number of squares on the next trial increased by one, the switch frequency could also increase, and the square display duration decreased by 20 ms (see Table S1 in the Supplemental Material available online for details on how the switch frequency was determined on each trial and other task details). If participants responded incorrectly, the number of squares on the next trial decreased by one, the square display duration increased by 20 ms, and the switch frequency decreased. These calibration procedures helped to ensure that even without drawing on inhibition processes, the task was demanding and tiring for all participants regardless of individual differences in executive-function abilities.
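The trial-by-trial titration can be sketched as a simple staircase. The starting display duration (START_DURATION_MS), the duration floor, the clamping of set size to the 11 to 17 range, and the switch-frequency update are hypothetical placeholders: the actual switch-frequency rule and timing details are given in Table S1 of the Supplemental Material and are not reproduced here.

```python
from dataclasses import dataclass

START_DURATION_MS = 500  # hypothetical; not reported in the text
MIN_DURATION_MS = 100    # hypothetical floor on square display duration
MIN_SQUARES, MAX_SQUARES = 11, 17  # trial set sizes reported in the text

@dataclass
class TrialSettings:
    n_squares: int = 12          # first trial: 12 squares
    switch_frequency: int = 5    # first trial: 5 small/big switches
    duration_ms: int = START_DURATION_MS

def next_trial(prev: TrialSettings, correct: bool) -> TrialSettings:
    """Harder after a correct response, easier after an error."""
    step = 1 if correct else -1
    n_squares = min(MAX_SQUARES, max(MIN_SQUARES, prev.n_squares + step))
    # Correct: display 20 ms shorter; incorrect: 20 ms longer.
    duration = max(MIN_DURATION_MS, prev.duration_ms - 20 * step)
    # Placeholder switch rule (hypothetical): move by one, staying possible
    # for the set size. The real rule is in Table S1.
    switches = min(n_squares - 1, max(1, prev.switch_frequency + step))
    return TrialSettings(n_squares, switches, duration)
```

Starting from the defaults, one correct response gives 13 squares, 6 switches, and a 480-ms display; a following error returns the settings to 12 squares, 5 switches, and 500 ms.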

Fig. 2.

Example trial from the titrated symbol-counter task used in the high-demand manipulation (adapted from the study by Garavan, Ross, Li, & Stein, 2000). This calibrated task heavily taxes the shifting and updating aspects of executive function. On each trial, multiple small and big squares were presented sequentially, and participants reported the number of small and big squares presented at the end of the trial. If participants responded correctly, the total number of squares in the next trial increased, the switch frequency increased, and the square display duration decreased. If participants responded incorrectly, the total number of squares on the next trial decreased, the switch frequency decreased, and the square display duration increased.

Measures of phenomenology

After completing the low-demand and high-demand tasks, participants answered five questions (presented in random order) about the task and their current mental state using a sliding Likert scale: (a) mental demand: “How mentally demanding was the [task/video task]?” (from very low demand to very demanding); (b) effort: “How hard did you have [to work/to work to watch the video]?” (from very little to very hard); (c) frustration: “How insecure, discouraged, irritated, stressed, and annoyed [were you/were you when watching the video]?” (from very little to very high); (d) boredom: “How boring was the [task/video task]?” (from not boring to very boring); (e) fatigue: “I’m mentally fatigued now” (from strongly disagree to strongly agree). Each scale ranged from 1 to 7, but participants did not see the scale ranges and saw only the two text anchors below the scale.

Task 2: outcome measure

After completing the experimental manipulation and manipulation checks, participants completed a Stroop task with 120 congruent and 60 incongruent trials. On each trial, a word (“red,” “blue,” or “yellow”) was presented in either the color red, blue, or yellow, and participants had to indicate the font color of the word by pressing a key (V = red, B = blue, N = yellow). The same mapping was used for all participants and was displayed at the bottom of the screen throughout the task. On congruent trials, the word and color matched (e.g., the word “red” shown in red font); on incongruent trials, the word and color did not match (e.g., the word “red” shown in blue font). Congruent and incongruent trials were interleaved randomly, and the stimulus on each trial remained on screen until the participant responded or until 2,000 ms had elapsed. If participants failed to respond on three consecutive trials, they were reminded to respond faster and more accurately. Participants practiced 12 trials before completing 180 experimental trials.
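A trial list matching this description (120 congruent and 60 incongruent trials, randomly interleaved, with the V/B/N key mapping) can be generated as below. Balancing words and font colors equally within each trial type is our assumption; the text specifies only the trial counts.

```python
import random
from itertools import product

COLORS = ("red", "blue", "yellow")
KEY_FOR_COLOR = {"red": "V", "blue": "B", "yellow": "N"}  # mapping from the text

def make_stroop_trials(n_congruent=120, n_incongruent=60, seed=None):
    """Return a shuffled list of trial dicts (word, ink color, congruency, correct key)."""
    rng = random.Random(seed)
    congruent_pairs = [(c, c) for c in COLORS]  # 3 word/ink pairs
    incongruent_pairs = [(w, c) for w, c in product(COLORS, COLORS) if w != c]  # 6 pairs
    trials = []
    for n, pairs, is_congruent in (
        (n_congruent, congruent_pairs, True),
        (n_incongruent, incongruent_pairs, False),
    ):
        # Cycle through the pairs so each appears equally often (assumption).
        for i in range(n):
            word, ink = pairs[i % len(pairs)]
            trials.append({
                "word": word,
                "ink": ink,
                "congruent": is_congruent,
                "correct_key": KEY_FOR_COLOR[ink],  # respond to font color, not the word
            })
    rng.shuffle(trials)  # interleave congruent and incongruent trials randomly
    return trials

trials = make_stroop_trials(seed=42)
```

The 2,000-ms response window and the reminder after three consecutive missed responses belong to the presentation software and are omitted from this sketch.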

Exclusion criteria

We preregistered the same four exclusion criteria¹ for all four studies to exclude low-quality data (e.g., see the Exclusion Criteria section at https://osf.io/6sncm/). First, we excluded participants whose overall accuracy on the high-demand task (titrated symbol-counter task) was less than 20% (3, 9, 25, and 28 participants were excluded in Studies 1, 2, 3, and 4, respectively). Second, for the dependent variable (Stroop task), we excluded trials on which reaction time was faster than 250 ms (0.52%, 0.26%, 1.66%, and 0.93% of trials were excluded in Studies 1, 2, 3, and 4, respectively). Third, we used a robust outlier-detection approach (median absolute deviation) rather than the commonly used but problematic ±3-SD approach to exclude trials with outlier reaction times (Leys, Delacre, Mora, Lakens, & Ley, 2019). For each participant and within each experimental condition, we excluded trials on which the reaction time deviated from the median by more than 3 times the median absolute deviation (5.44%, 5.06%, 5.29%, and 4.68% of trials were excluded in Studies 1, 2, 3, and 4, respectively). Fourth, we used the same robust approach and criterion to exclude participants who made too many errors on congruent Stroop trials (12, 3, 24, and 16 participants were excluded in Studies 1, 2, 3, and 4, respectively). Note that these four criteria did not pertain to our dependent variables because the goal was to exclude extremely low-quality data (e.g., disengaged or inattentive participants who showed little sign of trying) rather than to exclude outliers on the basis of the outcome variables. Rerunning our main analyses with outliers included did not change our main conclusions (Table S6 in the Supplemental Material).
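The two trial-level steps (the 250-ms floor, then the median-absolute-deviation rule) can be sketched for a single participant × condition cell as follows. Two details are our assumptions: the MAD rule is applied to the trials that survive the 250-ms floor, and the MAD is multiplied by the consistency constant 1.4826 commonly used so that it estimates the SD under normality (pass scale=1.0 for the raw MAD).

```python
import statistics

def clean_rts(rts_ms, floor_ms=250.0, k=3.0, scale=1.4826):
    """Drop anticipatory responses, then MAD outliers, within one cell of RTs (ms)."""
    # Step 1: remove responses faster than the floor.
    kept = [rt for rt in rts_ms if rt >= floor_ms]
    # Step 2: keep RTs within median +/- k * MAD (scale is an assumption).
    med = statistics.median(kept)
    mad = scale * statistics.median([abs(rt - med) for rt in kept])
    lower, upper = med - k * mad, med + k * mad
    return [rt for rt in kept if lower <= rt <= upper]
```

With the cell [200, 500, 520, 540, 560, 580, 2000], the 200-ms response fails the floor and the 2,000-ms response falls outside the MAD band, leaving [500, 520, 540, 560, 580]. The same median-plus-MAD rule could be applied across participants to congruent-trial error counts for the fourth criterion.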

Diffusion-model fitting

We fitted the EZ-diffusion model (Wagenmakers et al., 2007) to each participant’s Stroop data, which transformed the observed reaction time and accuracy variables into the latent variables of drift rate, boundary, and nondecision time (for code, see Lin, 2019). This model does not compute the starting-point bias because it assumes that the starting point is equidistant from the two boundaries. Furthermore, the boundary parameter is generally assumed to be determined before stimulus onset and therefore should not vary as a function of Stroop stimulus congruency. However, using the EZ-diffusion model in our case prevented us from forcing the boundary to be the same for all stimuli. We therefore obtained separate boundary-parameter estimates for congruent and incongruent Stroop stimuli, assuming that participants rapidly adjusted their boundaries immediately after stimulus onset.
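For readers unfamiliar with the EZ-diffusion model, its closed-form equations (Wagenmakers et al., 2007) map three summary statistics per cell (proportion correct Pc, variance of correct reaction times VRT in s², and mean correct reaction time MRT in s) onto drift rate v, boundary a, and nondecision time Ter, using the conventional scaling s = 0.1. The Python sketch below transcribes those equations; it is not the R code we used (Lin, 2019), and it simply rejects Pc values of 0, 0.5, or 1, which need an edge correction first. Because we estimated boundaries separately by congruency, such a function would be called once per participant × condition × congruency cell.

```python
import math

def ez_diffusion(pc, vrt, mrt, s=0.1):
    """EZ-diffusion estimates for one cell of data.

    pc  : proportion of correct responses
    vrt : variance of correct reaction times (seconds squared)
    mrt : mean of correct reaction times (seconds)
    s   : within-trial noise scaling constant (0.1 by convention)
    Returns (drift_rate v, boundary a, nondecision_time ter).
    """
    if pc in (0.0, 0.5, 1.0):
        # The equations are undefined here; apply an edge correction first.
        raise ValueError("pc of 0, 0.5, or 1 requires an edge correction")
    if not 0.0 < pc < 1.0 or vrt <= 0:
        raise ValueError("need 0 < pc < 1 and vrt > 0")
    logit = math.log(pc / (1 - pc))
    x = logit * (logit * pc**2 - logit * pc + pc - 0.5) / vrt
    v = math.copysign(s * x**0.25, pc - 0.5)  # drift rate
    a = s**2 * logit / v                       # boundary separation
    y = -v * a / s**2
    mdt = (a / (2 * v)) * (1 - math.exp(y)) / (1 + math.exp(y))  # mean decision time
    ter = mrt - mdt                            # nondecision time
    return v, a, ter
```

A quick consistency check is that plugging v and a back into Pc = 1 / (1 + exp(−a·v/s²)) recovers the observed accuracy.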

Despite these assumptions and the fact that the EZ-diffusion model is simpler than the full model, several studies have shown that the former often outperforms the latter (Dutilh et al., 2019; van Ravenzwaaij et al., 2017). To verify the EZ-diffusion model results, we ran exploratory but preregistered analyses (osf.io/7qcxa) to fit more appropriate models (i.e., diffusion model for conflict tasks; Evans & Servant, 2019), which led to conclusions similar to those obtained via EZ-diffusion modeling (see Figs. S2 and S3 in the Supplemental Material).

Preregistered hypotheses and analyses

Phenomenology

We expected participants to report higher mental demand, effort exerted, frustration, boredom, and fatigue in the high-demand than in the low-demand condition.

Primary hypotheses

We expected the high-demand condition to have a smaller boundary than the low-demand condition after controlling for Stroop congruency (trial type: congruent vs. incongruent), which would reflect less cautious or more impulsive responding after completing the high-demand task. We also expected the high-demand condition to have lower drift rate than the low-demand condition after controlling for Stroop congruency, which would reflect a slower information-processing rate. These analyses reflected our belief at the time that our demand manipulation should reduce overall task motivation and engagement rather than specifically reduce self-control or inhibition abilities. Note that we preregistered these two hypotheses only for Studies 2 to 4 but not Study 1, for which we preregistered only traditional Stroop behavioral effects. Finally, another plausible outcome² we did not preregister (and failed to observe in our data) is that effort exertion reduces drift rate and that participants might compensate by increasing boundary separation to ensure they maintain acceptable accuracies on the task.

Secondary hypotheses

We also tested additional hypotheses to indirectly examine the effects of high and low demand, but the primary effects described above did not hinge on these secondary effects. On the basis of the results from Study 1, we expected that (a) participants who reported feeling more fatigued, frustrated, or bored³ after completing the first task would have lower drift rate or boundary on the Stroop task and (b) incongruent Stroop trials would be associated with lower drift rate and boundary⁴ than congruent Stroop trials.

Exploratory analyses

We investigated the effects of our manipulations on traditional Stroop behavioral outcomes (reported in the Results section). Note that we preregistered these behavioral effects in Study 1 but not Studies 2 to 4. In addition, we verified the results of the EZ-diffusion model by running exploratory preregistered analyses (osf.io/7qcxa) that involved fitting more complicated diffusion models (Evans & Servant, 2019; Ulrich, Schröter, Leuthold, & Birngruber, 2015). Finally, because Stroop performance might be influenced by practice or learning effects (because of our within-subjects design), we also tested for session-order effects.

Statistical analyses

Continuous predictors were mean-centered within participants, and categorical predictors were effect-coded before model fitting: condition (low demand = −0.5; high demand = 0.5) and Stroop congruency (congruent = −0.5; incongruent = 0.5). We fitted Bayesian multilevel models using the R package brms (Bürkner, 2017). We first fitted, separately for each study, two-level varying-intercept multilevel models in which data were clustered within participants:

y_{i} = \beta_{0[\text{participant}[i]]} + X_{i}\beta + \varepsilon_{i}; \quad \text{R syntax: } \texttt{(1 | participant)}.

To meta-analyze the four studies to obtain an overall effect, we fitted three-level varying-intercept multilevel models in which data were clustered within participants, who were in turn clustered within studies:

y_{i} = \beta_{0[\text{study}][\text{participant}[i]]} + X_{i}\beta + \varepsilon_{i}; \quad \text{R syntax: } \texttt{(1 | study/participant)}.
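The predictor coding described above can be sketched in plain Python. Reading "mean-centered within participants" as subtracting each participant's own mean (person-mean centering) is our interpretation, and the field names and the fatigue example are illustrative. The resulting columns would then enter a brms model with a (1 | participant) or (1 | study/participant) term.

```python
from collections import defaultdict

CONDITION_CODE = {"low": -0.5, "high": 0.5}
CONGRUENCY_CODE = {"congruent": -0.5, "incongruent": 0.5}

def code_predictors(rows, continuous="fatigue"):
    """Effect-code condition and congruency; center `continuous` on each participant's mean."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row["participant"]] += row[continuous]
        counts[row["participant"]] += 1
    coded = []
    for row in rows:
        pid = row["participant"]
        coded.append({
            **row,
            "condition_c": CONDITION_CODE[row["condition"]],
            "congruency_c": CONGRUENCY_CODE[row["congruency"]],
            continuous + "_c": row[continuous] - totals[pid] / counts[pid],
        })
    return coded
```

Person-mean centering separates within-participant changes in, say, fatigue from stable differences between participants, which is the contrast the within-subjects design is built to test.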

For the condition effect (high demand vs. low demand) in each model, we used an informed Gaussian prior of d equal to 0.28 (SD = 0.14), which was based on a Bayesian reanalysis of a depletion study (Wagenmakers & Gronau, 2017). The priors were rescaled to the raw scale of each outcome measure so that the prior mean reflected the expected difference between the low-demand and high-demand conditions, and the standard deviation of the prior distribution was half the prior mean (Dienes, 2014). For example, for the effects of condition on self-reported demand and boundary, the priors were N(0.36, 0.18) and N(−0.0088, 0.0044), respectively (see Fig. S1 in the Supplemental Material for visualizations of the prior and posterior distributions). For other effects that did not directly test the effect of our demand manipulation, we used the standard normal prior, N(0, 1).

Because the prior influences the posterior, we performed prior-sensitivity analyses by refitting the models using normal priors with the same standard deviations as the informed priors but centered around 0 for the effect of interest; effects not directly testing our demand manipulation had the prior N(0, 1). These priors reflected the belief that our experimental effects would be relatively tightly centered around 0: For example, the priors for the effects of condition on self-reported mental demand and boundary were N(0, 0.18) and N(0, 0.0044), respectively (compare these with the informed priors above). Results from the sensitivity analyses were consistent with our main or original conclusions, suggesting that our findings were robust to prior choice (see Table S2 in the Supplemental Material for complete results from models fitted using these priors).
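The rescaling of the informed prior and its zero-centered sensitivity counterpart can be written as a small helper. We assume here that "rescaled to the raw scale" means multiplying d by a standard deviation of the outcome; which SD is used is not specified above, so outcome_sd is a stand-in, and the SDs in the check below are back-calculated from the reported priors rather than taken from the data.

```python
def condition_priors(outcome_sd, d=0.28, direction=1.0):
    """Return (mean, sd) pairs for the informed and sensitivity priors on the condition effect.

    direction is +1 when the high-demand condition is expected to raise the outcome
    (e.g., self-reported demand) and -1 when it is expected to lower it (e.g., boundary).
    """
    mean = direction * d * outcome_sd  # expected raw-scale difference
    sd = abs(mean) / 2                 # prior SD is half the prior mean (Dienes, 2014)
    return {"informed": (mean, sd), "sensitivity": (0.0, sd)}
```

With these stand-in SDs, the helper returns N(0.36, 0.18) and N(−0.0088, 0.0044) as informed priors and N(0, 0.18) and N(0, 0.0044) for the sensitivity analyses, matching the values reported above.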

For each model, we ran 20 Markov chain Monte Carlo chains of 2,000 samples each and discarded the first 1,000 samples of each chain as burn-in. For each effect, we report the mean of the posterior samples and the 95% highest-posterior-density (HPD) interval, which is the narrowest interval containing the specified probability mass. We used bridge sampling to compute Bayes factors (BFs), which reflect the amount of evidence favoring one model over a reduced model that does not contain the effect or hypothesis of interest. Because bridge-sampling estimates vary slightly from run to run, each reported BF is the mean of five BF computations. BFs equal to 1 indicate equal evidence for the null and experimental hypotheses. BFs greater than 1 indicate evidence in favor of the experimental hypothesis: BFs from 1 to 3 indicate anecdotal evidence, from 3 to 10 moderate evidence, from 10 to 30 strong evidence, and greater than 30 very strong or decisive evidence (Lee & Wagenmakers, 2013; but for problems with BFs, see Gelman & Shalizi, 2013). Conversely, BFs less than 1 indicate evidence in favor of the null hypothesis, with smaller values indicating stronger evidence: BFs from 0.33 to 1 indicate anecdotal evidence, from 0.10 to 0.33 moderate evidence, from 0.03 to 0.10 strong evidence, and less than 0.03 very strong or decisive evidence. All data, materials, and code for the main analyses can be found at https://osf.io/45gyk/.
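Two of the summaries described above can be sketched directly. The first function finds the narrowest window of sorted posterior draws containing the requested probability mass, a common sample-based estimate of the HPD interval; the second maps a BF onto the verbal categories listed above. Treating the thresholds 0.33, 0.10, and 0.03 as the reciprocals 1/3, 1/10, and 1/30 is our assumption, and neither function is taken from brms or the bridge-sampling software.

```python
import math

def hpd_interval(draws, mass=0.95):
    """Narrowest interval spanning `mass` of the sorted posterior draws."""
    xs = sorted(draws)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of draws inside the window
    # Slide a window of k consecutive draws and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]

def bf_category(bf):
    """Label a Bayes factor (experimental over null) using the thresholds above."""
    if bf <= 0:
        raise ValueError("Bayes factors must be positive")
    if bf == 1:
        return "equal evidence", "neither"
    favored = "experimental" if bf > 1 else "null"
    ratio = bf if bf > 1 else 1 / bf  # express strength on a scale above 1
    if ratio <= 3:
        label = "anecdotal"
    elif ratio <= 10:
        label = "moderate"
    elif ratio <= 30:
        label = "strong"
    else:
        label = "very strong"
    return label, favored
```

For instance, the overall boundary effect in Table 1 (BF = 29.10) falls in the strong category for the experimental hypothesis, and the overall drift-rate effect (BF = 0.06) counts as strong evidence for the null.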

Preregistered-Analysis Results

Phenomenology

Demand

We found strong and consistent effects of condition on self-reported mental demand. In all studies (Fig. 3), mental demand was much higher in the high-demand than in the low-demand conditions (Study 1: b = 1.96, 95% HPD = [1.75, 2.17], d = 1.46; Study 2: b = 1.73, 95% HPD = [1.45, 2.01], d = 1.21; Study 3: b = 2.23, 95% HPD = [1.96, 2.48], d = 1.56; Study 4: b = 2.06, 95% HPD = [1.73, 2.39], d = 1.37; see Table 1 for more information). These findings suggest that our paradigm was highly effective for eliciting effort-related phenomenology. To meta-analyze the effects across studies, we fitted a three-level multilevel model (data/units clustered within participants, who were clustered within studies). The meta-analytic effect was equally strong, b = 2.83, 95% HPD = [2.70, 2.96], BF > 500, d = 2.17, 95% HPD = [2.02, 2.33] (see Fig. 3). Self-reported demand was much higher in the high-demand condition (M = 5.61, SD = 1.23) than in the low-demand condition (M = 2.41, SD = 1.36).

Fig. 3.

Phenomenology collapsed across studies: kernel-density estimates as a function of condition (high demand vs. low demand), separately for each of the self-reported ratings of mental demand, effort, frustration, boredom, and fatigue. The area beneath each density curve sums to 1. See Table 1 for detailed statistics. BF = Bayes factor.

Table 1.

Preregistered Analyses: Results From Bayesian Multilevel Models Using Informed Priors

Independent and dependent variable	Study 1 (20 min)	Study 2 (15 min)	Study 3 (5 min)	Study 4 (10 min)	Overall
Condition (high vs. low demand)
Demand	1.96 [1.75, 2.17] (BF > 500); d = 1.46 [1.25, 1.69]	1.73 [1.45, 2.01] (BF > 500); d = 1.21 [0.93, 1.49]	2.23 [1.96, 2.48] (BF > 500); d = 1.56 [1.29, 1.84]	2.06 [1.73, 2.39] (BF > 500); d = 1.37 [1.03, 1.72]	2.83 [2.70, 2.96] (BF > 500); d = 2.17 [2.02, 2.33]
Effort	1.90 [1.69, 2.11] (BF > 500); d = 1.42 [1.20, 1.64]	1.58 [1.31, 1.86] (BF > 500); d = 1.11 [0.85, 1.38]	2.37 [2.11, 2.63] (BF > 500); d = 1.72 [1.42, 2.01]	1.99 [1.66, 2.31] (BF > 500); d = 1.29 [0.98, 1.62]	2.75 [2.61, 2.88] (BF > 500); d = 2.09 [1.94, 2.24]
Frustration	2.13 [1.91, 2.36] (BF > 500); d = 1.60 [1.34, 1.87]	1.31 [1.02, 1.59] (BF > 500); d = 0.88 [0.65, 1.13]	1.50 [1.23, 1.77] (BF > 500); d = 0.97 [0.75, 1.22]	1.48 [1.15, 1.78] (BF > 500); d = 0.96 [0.68, 1.26]	2.10 [1.95, 2.24] (BF > 500); d = 1.50 [1.36, 1.64]
Boredom	1.13 [0.89, 1.38] (BF > 500); d = 0.72 [0.54, 0.89]	0.69 [0.38, 1.00] (BF > 500); d = 0.43 [0.23, 0.63]	0.60 [0.31, 0.89] (BF = 198.00); d = 0.35 [0.17, 0.53]	0.74 [0.40, 1.08] (BF > 500); d = 0.41 [0.21, 0.61]	0.91 [0.75, 1.08] (BF > 500); d = 0.55 [0.45, 0.66]
Fatigue	2.05 [1.84, 2.25] (BF > 500); d = 1.65 [1.37, 1.90]	1.41 [1.13, 1.69] (BF > 500); d = 0.97 [0.72, 1.21]	1.92 [1.65, 2.19] (BF > 500); d = 1.25 [1.01, 1.49]	1.98 [1.64, 2.31] (BF > 500); d = 1.27 [0.97, 1.60]	2.49 [2.35, 2.63] (BF > 500); d = 1.86 [1.70, 2.02]
Boundary	−0.004 [−0.006, −0.002] (BF = 25.45); d = −0.22 [−0.34, −0.10]	−0.002 [−0.005, 0.001] (BF = 0.08); d = −0.08 [−0.24, 0.07]	−0.005 [−0.008, −0.003] (BF > 500); d = −0.32 [−0.46, −0.18]	−0.006 [−0.01, 0.00] (BF = 0.60); d = −0.13 [−0.26, 0.004]	−0.004 [−0.005, −0.002] (BF = 29.10); d = −0.15 [−0.22, −0.07]
Drift rate	−0.007 [−0.01, −0.001] (BF = 0.44); d = −0.14 [−0.26, −0.02]	−0.01 [−0.02, −0.002] (BF = 0.89); d = −0.20 [−0.36, −0.05]	0.001 [−0.006, 0.009] (BF = 0.04); d = 0.03 [−0.11, 0.16]	−0.003 [−0.01, 0.007] (BF = 0.07); d = −0.05 [−0.21, 0.11]	−0.003 [−0.007, 0.001] (BF = 0.06); d = −0.05 [−0.13, 0.02]
Congruency (incongruent vs. congruent)
Boundary	−0.02 [−0.02, −0.02] (BF > 500); d = −1.12 [−1.25, −0.98]	−0.02 [−0.02, −0.01] (BF > 500); d = −0.94 [−1.12, −0.76]	−0.01 [−0.02, −0.01] (BF > 500); d = −0.88 [−1.04, −0.72]	−0.01 [−0.02, −0.004] (BF = 0.30); d = −0.27 [−0.44, −0.09]	−0.02 [−0.02, −0.01] (BF > 500); d = −0.68 [−0.75, −0.60]
Drift rate	−0.10 [−0.11, −0.10] (BF > 500); d = −2.11 [−2.28, −1.96]	−0.10 [−0.11, −0.10] (BF > 500); d = −2.18 [−2.40, −1.96]	−0.09 [−0.10, −0.09] (BF > 500); d = −1.78 [−1.95, −1.60]	−0.10 [−0.11, −0.09] (BF > 500); d = −1.56 [−1.79, −1.36]	−0.10 [−0.10, −0.10] (BF > 500); d = −1.90 [−2.00, −1.81]
Fatigue (self-report)
Boundary	−0.001 [−0.002, 0.00] (BF = 1.19); d = −0.09 [−0.15, −0.03]	0.00 [−0.001, 0.001] (BF = 0.02); d = 0.01 [−0.07, 0.09]	−0.002 [−0.003, −0.001] (BF = 317.55); d = −0.14 [−0.20, −0.08]	−0.002 [−0.004, 0.00] (BF = 0.10); d = −0.04 [−0.10, 0.02]	−0.001 [−0.002, 0.00] (BF = 1.88); d = −0.05 [−0.08, −0.02]
Drift rate	−0.001 [−0.003, 0.001] (BF = 0.03); d = −0.03 [−0.09, 0.03]	−0.003 [−0.006, 0.00] (BF = 0.09); d = −0.07 [−0.15, 0.01]	0.00 [−0.002, 0.003] (BF = 0.02); d = 0.01 [−0.05, 0.07]	0.00 [−0.004, 0.003] (BF = 0.03); d = −0.007 [−0.07, 0.06]	0.00 [−0.002, 0.001] (BF = 0.01); d = −0.007 [−0.04, 0.02]
Frustration (self-report)
Boundary	−0.001 [−0.002, 0.00] (BF = 0.96); d = −0.08 [−0.14, −0.03]	0.00 [−0.001, 0.002] (BF = 0.02); d = 0.02 [−0.07, 0.11]	−0.002 [−0.003, −0.001] (BF = 6.86); d = −0.12 [−0.19, −0.05]	−0.002 [−0.005, 0.001] (BF = 0.10); d = −0.04 [−0.12, 0.03]	−0.001 [−0.002, 0.00] (BF = 0.25); d = −0.04 [−0.08, −0.009]
Drift rate	−0.002 [−0.004, 0.00] (BF = 0.23); d = −0.07 [−0.12, −0.01]	−0.002 [−0.006, 0.001] (BF = 0.06); d = −0.06 [−0.15, 0.03]	0.00 [−0.003, 0.003] (BF = 0.03); d = −0.006 [−0.07, 0.06]	−0.002 [−0.006, 0.003] (BF = 0.05); d = −0.03 [−0.11, 0.05]	−0.001 [−0.003, 0.00] (BF = 0.05); d = −0.03 [−0.07, 0.005]
Boredom (self-report)
Boundary	−0.001 [−0.002, 0.00] (BF = 0.08); d = −0.06 [−0.13, 0.009]	0.00 [−0.002, 0.001] (BF = 0.03); d = 0.00 [−0.10, 0.09]	−0.001 [−0.002, 0.00] (BF = 0.17); d = −0.09 [−0.17, −0.005]	0.00 [−0.003, 0.004] (BF = 0.06); d = 0.005 [−0.08, 0.09]	0.00 [−0.001, 0.00] (BF = 0.02); d = −0.008 [−0.05, 0.04]
Drift rate	−0.004 [−0.006, −0.001] (BF = 0.99); d = −0.10 [−0.17, −0.03]	−0.003 [−0.007, −0.001] (BF = 0.11); d = −0.08 [−0.18, 0.01]	−0.001 [−0.005, 0.002] (BF = 0.04); d = −0.03 [−0.11, 0.05]	−0.005 [−0.01, 0.00] (BF = 0.27); d = −0.09 [−0.18, 0.004]	−0.003 [−0.004, −0.001] (BF = 0.67); d = −0.06 [−0.11, −0.02]
Note. Each cell shows the posterior mean b [95% HPD interval] (Bayes factor); d [95% HPD interval] is the corresponding standardized effect. Durations in the column headers are the approximate lengths of the high-demand task.

Note: The first value in each cell is a parameter estimate. Values in brackets are 95% highest-posterior-density intervals. Informed priors reflecting Cohen’s d = 0.28 (SD = 0.14) were created by rescaling the expected effect size to the raw scale of each outcome measure. Bayes factors (BFs) were computed using bridge sampling. BFs greater than 1 indicate evidence for the experimental hypothesis, whereas BFs less than 1 indicate evidence for the null hypothesis. For each study, the time given in parentheses indicates the length of the high-demand task (the low-demand task was always 5 min).
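The note above describes rescaling an informed prior from Cohen's d units to the raw scale of each outcome. A minimal Python sketch of one way to do this follows, assuming the rescaling simply multiplies the d-scale mean and SD by a single standard deviation of the outcome (the note does not say which SD was used). The outcome SD of 0.0314 is a hypothetical value, chosen only because it maps d = −0.28 onto roughly the prior b = −0.0088 reported for the boundary parameter; the names `NormalPrior` and `rescale_d_prior` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalPrior:
    """Normal prior on a regression coefficient, on the raw outcome scale."""
    mean: float
    sd: float


def rescale_d_prior(d_mean: float, d_sd: float, outcome_sd: float) -> NormalPrior:
    """Convert a prior stated in Cohen's d units into raw outcome units.

    Assumes b = d * outcome_sd, i.e., the standardized effect equals the raw
    effect divided by one standard deviation of the outcome.
    """
    if outcome_sd <= 0:
        raise ValueError("outcome_sd must be positive")
    return NormalPrior(mean=d_mean * outcome_sd, sd=d_sd * outcome_sd)


# Hypothetical outcome SD (not reported in the article).
prior = rescale_d_prior(d_mean=-0.28, d_sd=0.14, outcome_sd=0.0314)
print(round(prior.mean, 4), round(prior.sd, 4))
```

Because each outcome (boundary, drift rate, self-report ratings) has a different raw scale, the same d-scale prior translates into a different raw-scale prior for each model.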

Effort

Self-reported effort was also much higher in the high-demand than in the low-demand condition in all studies (Study 1: b = 1.90, 95% HPD = [1.69, 2.11], d = 1.42; Study 2: b = 1.58, 95% HPD = [1.31, 1.86], d = 1.11; Study 3: b = 2.37, 95% HPD = [2.11, 2.63], d = 1.72; Study 4: b = 1.99, 95% HPD = [1.66, 2.31], d = 1.29). Results from the three-level multilevel-model meta-analysis suggest that overall, participants reported exerting much more effort in the high-demand condition (M = 5.44, SD = 1.22) than in the low-demand condition (M = 2.33, SD = 1.42; b = 2.75, 95% HPD = [2.61, 2.88], BF > 500, d = 2.09, 95% HPD = [1.94, 2.24]).

Frustration

Similarly, participants reported feeling more frustrated in the high-demand condition than in the low-demand condition in all studies (Study 1: b = 2.13, 95% HPD = [1.91, 2.36], d = 1.60; Study 2: b = 1.31, 95% HPD = [1.02, 1.59], d = 0.88; Study 3: b = 1.50, 95% HPD = [1.23, 1.77], d = 0.97; Study 4: b = 1.48, 95% HPD = [1.15, 1.78], d = 0.96). Results from the three-level multilevel-model meta-analysis were similar (low-demand condition: M = 2.19, SD = 1.41; high-demand condition: M = 4.49, SD = 1.70; b = 2.10, 95% HPD = [1.95, 2.24], BF > 500, d = 1.50, 95% HPD = [1.36, 1.64]).

Boredom

Self-reported boredom was higher in the high-demand condition than in the low-demand condition in all studies (Study 1: b = 1.13, 95% HPD = [0.89, 1.38], d = 0.72; Study 2: b = 0.69, 95% HPD = [0.38, 1.00], d = 0.43; Study 3: b = 0.60, 95% HPD = [0.31, 0.89], d = 0.35; Study 4: b = 0.74, 95% HPD = [0.40, 1.08], d = 0.41). Results from the three-level multilevel-model meta-analysis suggest that the effect was consistent across studies but smaller than the effects on demand, effort, and frustration (low-demand condition: M = 3.53, SD = 1.83; high-demand condition: M = 4.48, SD = 1.87; b = 0.91, 95% HPD = [0.75, 1.08], BF > 500, d = 0.55, 95% HPD = [0.45, 0.66]).

Fatigue

Finally, participants reported higher fatigue in the high-demand condition than in the low-demand condition in all studies (Study 1: b = 2.05, 95% HPD = [1.84, 2.25], d = 1.65; Study 2: b = 1.41, 95% HPD = [1.13, 1.69], d = 0.97; Study 3: b = 1.92, 95% HPD = [1.65, 2.19], d = 1.25; Study 4: b = 1.98, 95% HPD = [1.64, 2.31], d = 1.27). Results from the three-level multilevel-model meta-analysis were similar (low-demand condition: M = 2.11, SD = 1.30; high-demand condition: M = 4.87, SD = 1.58; b = 2.49, 95% HPD = [2.35, 2.63], BF > 500, d = 1.86, 95% HPD = [1.70, 2.02]), which suggests that our high-demand task was effective in eliciting fatigue.

Latent parameter: boundary

We report the effect of condition on the boundary parameter after controlling for Stroop congruency, which had strong effects on the boundary parameter in all studies (Fig. 4, Table 1). The boundary parameter was smaller in the high-demand than in the low-demand condition in all four studies (Study 1: b = −0.004, 95% HPD = [−0.006, −0.002], d = −0.22; Study 2: b = −0.002, 95% HPD = [−0.005, 0.001], d = −0.08; Study 3: b = −0.005, 95% HPD = [−0.008, −0.003], d = −0.32; Study 4: b = −0.006, 95% HPD = [−0.01, 0.00], d = −0.13), but only in Studies 1 and 3 did the 95% HPD interval exclude 0. Moreover, results from the three-level multilevel-model meta-analysis provided strong evidence for our preregistered hypothesis that completing a high-demand as opposed to a low-demand task would lead to reduced boundary, b = −0.004, 95% HPD = [−0.005, −0.002], BF = 29.10, d = −0.15, 95% HPD = [−0.22, −0.07]; this effect was similar when we used a prior centered around 0 (but retaining the scale of the informed prior), b = −0.003, 95% HPD = [−0.005, −0.001], BF = 88.33, d = −0.13, 95% HPD = [−0.20, −0.06]. Together, our results provide strong evidence in favor of the hypothesis that exerting mental effort decreases subsequent boundary separation. Nonetheless, although reliable, the meta-analytic effect size was small (b = −0.004, d = −0.13) and slightly less than half the expected effect size (prior b = −0.0088, prior d = −0.28; see Fig. 4).
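The boundary and drift-rate parameters analyzed in this section come from the EZ-diffusion model, which converts three summary statistics of a participant's responses (proportion correct, and the mean and variance of correct reaction times) into drift rate, boundary separation, and nondecision time in closed form. The following is a minimal Python sketch of the standard closed-form EZ equations, assuming the conventional scaling s = 0.1; the function name `ez_diffusion` and the input values in the usage lines are illustrative and are not taken from the authors' analysis code or data.

```python
import math


def ez_diffusion(pc: float, vrt: float, mrt: float, s: float = 0.1) -> tuple[float, float, float]:
    """Closed-form EZ-diffusion estimates for one participant and one design cell.

    pc  -- proportion of correct responses (must not be exactly 0, 0.5, or 1;
           apply an edge correction first, e.g., for near-perfect congruent trials)
    vrt -- variance of correct-response reaction times, in seconds squared
    mrt -- mean of correct-response reaction times, in seconds
    s   -- within-trial noise scaling parameter (0.1 by convention)

    Returns (drift rate v, boundary separation a, nondecision time Ter).
    """
    if pc in (0.0, 0.5, 1.0):
        raise ValueError("pc must not be exactly 0, 0.5, or 1")
    s2 = s ** 2
    logit = math.log(pc / (1 - pc))
    x = logit * (logit * pc ** 2 - logit * pc + pc - 0.5) / vrt
    v = math.copysign(1.0, pc - 0.5) * s * x ** 0.25  # drift rate
    a = s2 * logit / v  # boundary separation
    y = -v * a / s2
    mdt = (a / (2 * v)) * (1 - math.exp(y)) / (1 + math.exp(y))  # mean decision time
    ter = mrt - mdt  # nondecision time
    return v, a, ter


v, a, ter = ez_diffusion(pc=0.802, vrt=0.112, mrt=0.723)
print(f"v = {v:.4f}, a = {a:.4f}, Ter = {ter:.3f}")  # v = 0.0999, a = 0.1400, Ter = 0.300
```

In a design like the one reported here, such estimates would be computed separately for each participant, session, and congruency cell before entering the multilevel models.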

Fig. 4.

Bayesian posterior- and prior-density distributions for the effect of condition (low demand vs. high demand) on the boundary (left) and drift-rate (right) parameters obtained from meta-analytic Bayesian multilevel models. Prior distributions reflect expectations about the effect sizes before empirical data are collected: Informed priors reflecting Cohen’s d of −0.28 (SD = 0.14) were created by rescaling the expected effect size to the raw scale of each parameter. Posterior distributions reflect revised or updated beliefs and effect sizes after empirical data are taken into consideration. See Table 1 for detailed statistics. BF = Bayes factor.

Exploratory analyses including session order and the session-order-by-condition interaction in the models showed that practice or learning effects were strong and consistent with previous work (e.g., Dutilh, Krypotos, & Wagenmakers, 2011). Boundary separation was smaller in the second than first session in all studies (BFs > 500), but order did not interact with condition, and the effect of our demand manipulation remained small but highly robust, b = −0.003, 95% HPD = [−0.005, −0.002], BF = 117.67, d = −0.14, 95% HPD = [−0.21, −0.07] (see Fig. S6 and Table S3 in the Supplemental Material).

Latent parameter: drift rate

We also report the effect of condition on drift rate after controlling for Stroop congruency, which had strong effects on the drift-rate parameter in all studies (Fig. 4, Table 1). The effects of condition on drift rate were inconsistent across studies (Study 1: b = −0.007, 95% HPD = [−0.01, −0.001], d = −0.14; Study 2: b = −0.01, 95% HPD = [−0.02, −0.002], d = −0.20; Study 3: b = 0.001, 95% HPD = [−0.006, 0.009], d = 0.03; Study 4: b = −0.003, 95% HPD = [−0.01, 0.007], d = −0.05). Further, results from the three-level multilevel-model meta-analysis suggest—contrary to our preregistered hypothesis—that completing a high-demand as opposed to a low-demand task did not lead to reduced drift rate, b = −0.003, 95% HPD = [−0.007, 0.001], BF = 0.06, d = −0.05, 95% HPD = [−0.13, 0.02] (Fig. 4).

Exploratory analyses including session order and the session-order-by-condition interaction in the models showed that practice or learning effects were strong; these effects were consistent with previous findings (e.g., Dutilh et al., 2011). Drift rate was higher in the second than in the first session in all studies (BFs > 500), reflecting improved task performance. Order did not interact with condition, and drift rate did not differ between the high-demand and low-demand conditions (see Fig. S6 and Table S3).

Finally, to verify the results of the EZ-diffusion model, we ran exploratory analyses that fitted the diffusion model for conflict tasks, which is specifically designed for cognitive control tasks such as the Stroop. It models information integration during conflict tasks as a function of controlled and automatic processes; the automatic process varies over time according to a gamma function (Ulrich et al., 2015). Consistent with the results of the EZ-diffusion model, these analyses showed that effort exertion reduced boundary separation but had no effect on information integration via controlled (drift-rate parameter) or automatic (ζ parameter) processes. These results bolster our interpretation that exerting effort or being depleted does not selectively impair one’s ability to inhibit automatic processes (see Figs. S2–S5 in the Supplemental Material for details and additional results from the regular analytic diffusion model).
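In the diffusion model for conflict tasks, the expected automatic activation first rises and then decays according to a scaled gamma-shaped curve and is superimposed on a controlled process whose expected activation grows linearly with time. A minimal Python sketch of that expected automatic component is below, using one common parameterization in which the curve peaks at time (shape − 1) × tau with height equal to the amplitude. We assume here that the amplitude corresponds to the ζ parameter mentioned above and that congruent and incongruent trials differ only in its sign; the function names and parameter values are illustrative, not the authors' estimates.

```python
import math


def automatic_activation(t: float, amplitude: float, tau: float, shape: float) -> float:
    """Expected automatic activation at time t (in seconds).

    amplitude -- peak height of the automatic process (positive for congruent,
                 negative for incongruent trials under the assumption above)
    tau       -- scale of the gamma-shaped curve, in seconds
    shape     -- shape parameter; must exceed 1 so the curve rises and then decays
    """
    if t < 0:
        raise ValueError("t must be nonnegative")
    if shape <= 1:
        raise ValueError("shape must be greater than 1")
    scaled = t * math.e / ((shape - 1) * tau)
    return amplitude * math.exp(-t / tau) * scaled ** (shape - 1)


def peak_time(tau: float, shape: float) -> float:
    """Time at which the expected automatic activation reaches its peak."""
    return (shape - 1) * tau


# Illustrative values only: the congruent curve peaks at +20 and the
# incongruent curve at -20, both at t = 0.1 s.
for t in (0.05, 0.1, 0.2, 0.4):
    print(t, automatic_activation(t, 20.0, 0.1, 2.0), automatic_activation(t, -20.0, 0.1, 2.0))
```

Because the automatic component fades over time, a slower response gives the controlled process more time to dominate; a selective impairment of inhibition would therefore be expected to show up in the controlled drift or in ζ, which is why the null effects on those parameters bear on the inhibition account.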

Phenomenology and latent-parameter relations

Boundary–fatigue relation

Increased self-reported fatigue was associated with smaller boundaries in three studies (Study 1: b = −0.001, 95% HPD = [−0.002, 0.00], d = −0.09; Study 3: b = −0.002, 95% HPD = [−0.003, −0.001], d = −0.14; Study 4: b = −0.002, 95% HPD = [−0.004, 0.00], d = −0.04) but not in Study 2 (b = 0.00, 95% HPD = [−0.001, 0.001], d = 0.01). Further, results from the three-level multilevel-model meta-analysis provided weak evidence for the prediction that increased fatigue was associated with reduced boundary, b = −0.001, 95% HPD = [−0.002, 0.00], BF = 1.88, d = −0.05, 95% HPD = [−0.08, −0.02].

Boundary–frustration relation

Increased self-reported frustration was associated with smaller boundaries in three studies, although only Study 3’s 95% HPD interval did not include 0 (Study 1: b = −0.001, 95% HPD = [−0.002, 0.00], d = −0.08; Study 2: b = 0.00, 95% HPD = [−0.001, 0.002], d = 0.02; Study 3: b = −0.002, 95% HPD = [−0.003, −0.001], d = −0.12; Study 4: b = −0.002, 95% HPD = [−0.005, 0.001], d = −0.04). Overall, the three-level multilevel-model meta-analysis indicated that frustration was not associated with reduced boundary, b = −0.001, 95% HPD = [−0.002, 0.00], BF = 0.25, d = −0.04, 95% HPD = [−0.08, −0.009].

Boundary–boredom relation

Self-reported boredom was not associated with the boundary parameter in any of the four studies. All 95% HPD intervals included 0 (Table 1).

Drift rate–fatigue relation

Self-reported fatigue was not associated with the drift-rate parameter in any of the four studies. All 95% HPD intervals included 0 (Table 1).

Drift rate–frustration relation

Self-reported frustration was not associated with the drift-rate parameter in any of the four studies. All 95% HPD intervals included 0 (see Table 1).

Drift rate–boredom relation

Self-reported boredom was associated with reduced drift rate in Studies 1 and 2 (Study 1: b = −0.004, 95% HPD = [−0.006, −0.001], d = −0.10; Study 2: b = −0.003, 95% HPD = [−0.007, −0.001], d = −0.08; Study 3: b = −0.001, 95% HPD = [−0.005, 0.002], d = −0.03; Study 4: b = −0.005, 95% HPD = [−0.01, 0.00], d = −0.09). However, overall, the three-level multilevel-model results failed to provide evidence for the prediction that increased boredom was associated with reduced drift rate, b = −0.003, 95% HPD = [−0.004, −0.001], BF = 0.67, d = −0.06, 95% HPD = [−0.11, −0.02].
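The text interprets Bayes factors verbally (e.g., BF = 1.88 as weak evidence, BF = 29.10 as strong evidence, BF = 0.25 as evidence against an association). To make such labels reproducible, here is a minimal Python sketch using one widely used Jeffreys-style scheme with cutoffs at 3, 10, 30, and 100. The article does not state which cutoffs it used, so the scheme and the function name `label_bayes_factor` are assumptions for illustration.

```python
def label_bayes_factor(bf10: float) -> str:
    """Verbal label for a Bayes factor under a Jeffreys-style scheme.

    BF10 > 1 favors the experimental hypothesis and BF10 < 1 favors the null,
    matching the convention in the table notes. The cutoffs (3, 10, 30, 100)
    are an assumed convention, not taken from the article.
    """
    if bf10 <= 0:
        raise ValueError("Bayes factor must be positive")
    if bf10 == 1:
        return "no evidence either way"
    favored = "experimental hypothesis" if bf10 > 1 else "null hypothesis"
    strength = max(bf10, 1 / bf10)  # evidence strength regardless of direction
    if strength < 3:
        level = "anecdotal"
    elif strength < 10:
        level = "moderate"
    elif strength < 30:
        level = "strong"
    elif strength < 100:
        level = "very strong"
    else:
        level = "extreme"
    return f"{level} evidence for the {favored}"


for bf in (29.10, 1.88, 0.67, 0.25, 0.06):
    print(bf, label_bayes_factor(bf))
```

Under this scheme, the meta-analytic boredom–drift-rate result (BF = 0.67) would count as anecdotal evidence for the null, consistent with the statement that the analysis failed to provide evidence for the prediction.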

Exploratory analyses found that session order did not interact with fatigue, frustration, or boredom (see Tables S4 and S5 in the Supplemental Material).

Exploratory-Analysis Results

Other (nonpreregistered) analyses might provide further insights into the psychology of effort exertion and ego depletion (see Table 2). Note that because we did not preregister the analyses below, we used normal priors centered around 0 instead of informed priors.

Table 2.

Exploratory Analyses: Results From Bayesian Multilevel Models Using Zero-Centered Normal Priors

Independent and dependent variable	Study 1 (20 min)	Study 2 (15 min)	Study 3 (5 min)	Study 4 (10 min)	Overall
Demand
Boundary	−0.001 [−0.002, 0.00] (BF = 0.65); d = −0.05 [−0.11, 0.002]	0.00 [−0.001, 0.00] (BF = 0.19); d = −0.03 [−0.10, 0.04]	−0.001 [−0.002, 0.00] (BF = 226.86); d = −0.10 [−0.16, −0.05]	−0.001 [−0.003, 0.001] (BF = 0.54); d = −0.03 [−0.09, 0.02]	−0.001 [−0.002, −0.001] (BF = 17.21); d = −0.05 [−0.08, −0.02]
Drift rate	0.00 [−0.002, 0.002] (BF = 0.13); d = −0.01 [−0.07, 0.05]	−0.002 [−0.005, 0.00] (BF = 0.72); d = −0.06 [−0.13, 0.006]	0.00 [−0.002, 0.003] (BF = 0.17); d = 0.02 [−0.04, 0.07]	0.00 [−0.003, 0.003] (BF = 0.20); d = 0.008 [−0.05, 0.06]	0.00 [−0.001, 0.001] (BF = 0.08); d = −0.005 [−0.03, 0.02]
Effort
Boundary	0.00 [−0.001, 0.00] (BF = 0.30); d = −0.04 [−0.10, 0.01]	0.00 [−0.002, 0.00] (BF = 0.22); d = −0.03 [−0.10, 0.04]	−0.001 [−0.002, 0.00] (BF = 66.44); d = −0.10 [−0.15, −0.04]	−0.001 [−0.004, 0.00] (BF = 0.57); d = −0.03 [−0.09, 0.02]	−0.001 [−0.002, 0.00] (BF = 16.04); d = −0.05 [−0.08, −0.02]
Drift rate	−0.001 [−0.003, 0.001] (BF = 0.23); d = −0.03 [−0.08, 0.03]	−0.002 [−0.005, 0.00] (BF = 0.62); d = −0.06 [−0.14, 0.01]	0.00 [−0.002, 0.002] (BF = 0.13); d = 0.00 [−0.05, 0.05]	0.00 [−0.003, 0.004] (BF = 0.20); d = 0.01 [−0.05, 0.07]	0.00 [−0.002, 0.00] (BF = 0.10); d = −0.01 [−0.04, 0.02]
Condition × Congruency
Boundary	0.003 [−0.001, 0.006] (BF = 1.04); d = 0.14 [−0.08, 0.36]	−0.001 [−0.006, 0.004] (BF = 0.64); d = −0.06 [−0.34, 0.21]	0.001 [−0.003, 0.005] (BF = 0.58); d = 0.08 [−0.18, 0.33]	0.00 [−0.008, 0.007] (BF = 0.89); d = −0.01 [−0.18, 0.16]	0.00 [−0.003, 0.004] (BF = 0.43); d = 0.03 [−0.11, 0.17]
Drift rate	0.002 [−0.009, 0.01] (BF = 0.47); d = 0.04 [−0.18, 0.26]	0.001 [−0.01, 0.01] (BF = 0.58); d = 0.02 [−0.26, 0.30]	−0.003 [−0.02, 0.01] (BF = 0.58); d = −0.05 [−0.30, 0.19]	0.003 [−0.01, 0.02] (BF = 0.74); d = 0.04 [−0.21, 0.30]	0.001 [−0.007, 0.009] (BF = 0.32); d = 0.02 [−0.12, 0.16]
Condition
Stroop accuracy	−0.007 [−0.02, 0.00] (BF = 1.73); d = −0.11 [−0.22, 0.009]	−0.003 [−0.01, 0.007] (BF = 0.49); d = −0.04 [−0.19, 0.11]	−0.003 [−0.01, 0.006] (BF = 0.46); d = −0.05 [−0.18, 0.09]	−0.004 [−0.01, 0.007] (BF = 0.58); d = −0.05 [−0.21, 0.10]	−0.005 [−0.01, 0.00] (BF = 1.21); d = −0.08 [−0.15, −0.01]
Stroop reaction time	−0.004 [−0.01, 0.003] (BF = 0.42); d = −0.07 [−0.19, 0.05]	0.009 [−0.002, 0.02] (BF = 1.07); d = 0.14 [−0.02, 0.30]	−0.01 [−0.02, −0.003] (BF = 7.15); d = −0.19 [−0.33, −0.05]	−0.01 [−0.02, −0.004] (BF = 15.03); d = −0.25 [−0.42, −0.08]	−0.006 [−0.01, −0.001] (BF = 2.32); d = −0.09 [−0.17, −0.02]
Congruency
Stroop accuracy	−0.09 [−0.10, −0.08] (BF > 500); d = −1.28 [−1.41, −1.14]	−0.09 [−0.10, −0.08] (BF > 500); d = −1.36 [−1.56, −1.17]	−0.07 [−0.08, −0.06] (BF > 500); d = −0.99 [−1.15, −0.83]	−0.07 [−0.08, −0.05] (BF > 500); d = −0.95 [−1.15, −0.76]	−0.08 [−0.08, −0.07] (BF > 500); d = −1.16 [−1.24, −1.08]
Stroop reaction time	0.10 [0.10, 0.11] (BF > 500); d = 1.72 [1.57, 1.87]	0.11 [0.10, 0.12] (BF > 500); d = 1.67 [1.46, 1.88]	0.10 [0.09, 0.11] (BF > 500); d = 1.61 [1.43, 1.78]	0.12 [0.11, 0.13] (BF > 500); d = 2.05 [1.80, 2.27]	0.11 [0.10, 0.11] (BF > 500); d = 1.73 [1.63, 1.82]
Condition × Congruency
Stroop accuracy	−0.001 [−0.02, 0.01] (BF = 0.56); d = −0.02 [−0.22, 0.18]	0.00 [−0.02, 0.02] (BF = 0.68); d = −0.005 [−0.25, 0.25]	−0.003 [−0.02, 0.01] (BF = 0.68); d = −0.05 [−0.28, 0.18]	−0.001 [−0.02, 0.02] (BF = 0.71); d = −0.02 [−0.27, 0.22]	−0.002 [−0.01, 0.007] (BF = 0.35); d = −0.03 [−0.17, 0.10]
Stroop reaction time	−0.006 [−0.02, 0.008] (BF = 0.56); d = −0.10 [−0.32, 0.13]	−0.001 [−0.02, 0.02] (BF = 0.52); d = −0.02 [−0.30, 0.28]	−0.004 [−0.02, 0.01] (BF = 0.52); d = −0.07 [−0.32, 0.19]	−0.007 [−0.02, 0.01] (BF = 0.66); d = −0.12 [−0.43, 0.19]	−0.005 [−0.01, 0.003] (BF = 0.55); d = −0.09 [−0.23, 0.05]

Note: The first value in each cell is a parameter estimate. Values in brackets are 95% highest-posterior-density intervals. Bayes factors (BFs) were computed using bridge sampling. BFs greater than 1 indicate evidence for the experimental hypothesis, whereas BFs less than 1 indicate evidence for the null hypothesis. For each study, the time given in parentheses indicates the length of the high-demand task (the low-demand task was always 5 min).

Phenomenology and latent-parameter relations

Boundary–demand relation

Results from the individual studies showed that self-reported demand was not consistently associated with changes in boundary (Study 1: b = −0.001, 95% HPD = [−0.002, 0.00], d = −0.05; Study 2: b = 0, 95% HPD = [−0.001, 0.00], d = −0.03; Study 3: b = −0.001, 95% HPD = [−0.002, 0.00], d = −0.10; Study 4: b = −0.001, 95% HPD = [−0.003, 0.001], d = −0.03). However, the three-level multilevel-model meta-analysis provided strong evidence for a negative relationship between demand and boundary, b = −0.001, 95% HPD = [−0.002, −0.001], BF = 17.21, d = −0.05, 95% HPD = [−0.08, −0.02], although the effect was very small.

Boundary–effort relation

As with the results for the boundary–demand relation, results from the individual studies for the boundary–effort relation were mixed (Study 1: b = 0, 95% HPD = [−0.001, 0.00], d = −0.04; Study 2: b = 0, 95% HPD = [−0.002, 0.00], d = −0.03; Study 3: b = −0.001, 95% HPD = [−0.002, 0.00], d = −0.10; Study 4: b = −0.001, 95% HPD = [−0.004, 0.00], d = −0.03). However, the three-level multilevel-model meta-analysis also provided evidence for a negative relationship between self-reported effort and boundary, b = −0.001, 95% HPD = [−0.002, 0.00], BF = 16.04, d = −0.05, 95% HPD = [−0.08, −0.02], although this effect was also very small.

These two results suggest that increased feelings of mental demand and effort were associated with less cautious responding on the Stroop task, although we note that these two ratings correlated strongly in all four studies (rs > .77). Because these analyses are exploratory, we caution against overinterpreting these effects; we present them because they could provide insights into the effects of different subjective mental states.

Drift rate–demand relation

Self-reported demand was not associated with drift rate in any of the studies. All 95% HPD intervals included 0 (Table 2).

Drift rate–effort relation

Self-reported effort was also not associated with drift rate in any of the studies. All 95% HPD intervals included 0 (Table 2).

Latent parameter: boundary (condition–congruency interaction)

We fitted a model in which the boundary parameter was predicted by condition, Stroop congruency, and their interaction. Here, we focused on the interaction term because it indicates whether the effect of condition on the boundary parameter varied as a function of Stroop congruency. In all four studies, the 95% HPD intervals of the interaction term included 0 (see Table 2). Results from the three-level multilevel model were similar, b = 0, 95% HPD = [−0.003, 0.004], BF = 0.43, d = 0.03, 95% HPD = [−0.11, 0.17]. These findings suggest that the effect of condition on the boundary parameter did not vary as a function of Stroop congruency.

Latent parameter: drift rate (condition–congruency interaction)

We also fitted a model in which the drift-rate parameter was predicted by condition, Stroop congruency, and their interaction. In all studies, the interaction effect was close to 0 (Table 2), which suggests that the effect of condition on the drift-rate parameter did not vary as a function of Stroop congruency.

Behavioral effects: Stroop accuracy

We modeled Stroop accuracy (proportion correct) as a function of condition, congruency, and their interaction. The congruency effect was robust and consistent across studies (Table 2), and the results from the three-level multilevel model indicated that accuracy was lower on incongruent trials (M = .90, SD = .11) than on congruent trials (M = .98, SD = .03), b = −0.08, 95% HPD = [−0.08, −0.07], BF > 500, d = −1.16, 95% HPD = [−1.24, −1.08]. The condition effect (high vs. low demand) was negative in all studies, but every 95% HPD interval included 0. Nonetheless, results from the three-level multilevel-model meta-analysis suggest some evidence for a small reduction in overall accuracy in the high-demand condition, b = −0.005, 95% HPD = [−0.01, 0.00], BF = 1.21, d = −0.08, 95% HPD = [−0.15, −0.01] (low-demand condition: M = .941, SD = .09; high-demand condition: M = .935, SD = .09). The congruency–condition interaction effect was close to 0 in all studies, and all 95% HPD intervals included 0 (Table 2), which suggests that the condition effect did not vary as a function of Stroop congruency.

Behavioral effects: Stroop reaction time

We also modeled Stroop reaction time (trials with correct responses) as a function of congruency, condition, and their interaction. As expected, the congruency effect was strong and consistent across studies (Table 2): Results from the three-level multilevel-model meta-analysis indicate that responses were slower on incongruent trials (M = 0.72 s, SD = 0.14) than on congruent trials (M = 0.62 s, SD = 0.10), b = 0.11, 95% HPD = [0.10, 0.11], BF > 500, d = 1.73, 95% HPD = [1.63, 1.82]. However, evidence for the condition effect (high demand vs. low demand) was weaker and mixed: The 95% HPD intervals for Studies 3 and 4 did not include 0 (Study 3: b = −0.01, 95% HPD = [−0.02, −0.003], d = −0.19; Study 4: b = −0.01, 95% HPD = [−0.02, −0.004], d = −0.25), whereas the 95% HPD intervals for Studies 1 and 2 included 0 (see Table 2). The three-level multilevel-model meta-analysis provided some evidence that across the four studies, overall reaction times were faster in the high-demand condition (M = 0.66 s, SD = 0.13) than in the low-demand condition (M = 0.67 s, SD = 0.13), although the effect was small, b = −0.006, 95% HPD = [−0.01, −0.001], BF = 2.32, d = −0.09, 95% HPD = [−0.17, −0.02]. The congruency–condition interaction effect was close to 0 in all studies (all 95% HPD intervals contained 0; see Table 2).
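The behavioral analyses above use accuracy across all trials and reaction times from correct trials only, broken down by condition and congruency. A minimal descriptive Python sketch of that bookkeeping follows, assuming trial-level records with hypothetical field names (`condition`, `congruent`, `correct`, `rt`) and made-up toy values. It computes cell means and the Stroop interference effect only; it does not reproduce the Bayesian multilevel models used in the article.

```python
from collections import defaultdict
from statistics import mean


def stroop_summary(trials):
    """Cell means by (condition, congruent) from trial-level Stroop data.

    Each trial is a dict with hypothetical keys:
      'condition' -- 'low' or 'high' (demand of the preceding task)
      'congruent' -- True for congruent, False for incongruent trials
      'correct'   -- True if the response was correct
      'rt'        -- reaction time in seconds
    Accuracy uses all trials; mean RT uses correct trials only.
    """
    acc = defaultdict(list)
    rts = defaultdict(list)
    for trial in trials:
        key = (trial["condition"], trial["congruent"])
        acc[key].append(1.0 if trial["correct"] else 0.0)
        if trial["correct"]:
            rts[key].append(trial["rt"])
    return {
        key: {
            "accuracy": mean(acc[key]),
            "mean_rt": mean(rts[key]) if rts[key] else float("nan"),
        }
        for key in acc
    }


def interference(summary, condition):
    """Stroop interference: incongruent minus congruent mean RT for one condition."""
    return summary[(condition, False)]["mean_rt"] - summary[(condition, True)]["mean_rt"]


# Toy data for illustration only (not from the studies).
toy_trials = [
    {"condition": "low", "congruent": True, "correct": True, "rt": 0.60},
    {"condition": "low", "congruent": True, "correct": True, "rt": 0.64},
    {"condition": "low", "congruent": False, "correct": True, "rt": 0.70},
    {"condition": "low", "congruent": False, "correct": False, "rt": 0.95},
    {"condition": "high", "congruent": True, "correct": True, "rt": 0.58},
    {"condition": "high", "congruent": False, "correct": True, "rt": 0.67},
]
summary = stroop_summary(toy_trials)
for cond in ("low", "high"):
    print(cond, round(interference(summary, cond), 3))
```

Note that the slow error (0.95 s) is excluded from the reaction-time mean but still lowers incongruent accuracy in the low-demand condition, mirroring the trial-selection rule described above.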

Discussion

Our results provide insights into the psychology of effort exertion. Across four studies, our demand manipulation (high vs. low) was highly depleting because it robustly elicited strong effort and fatigue sensations. Diffusion-model analyses also shed light on how effort exertion affects cognitive processes that had not been examined previously.

For the two preregistered effects on the latent parameters, Bayesian analyses provided strong evidence that completing the high-demand task, relative to the low-demand task, reduced boundary separation but not drift rate. The lack of evidence for reduced drift rate suggests that effort exertion did not worsen participants’ subsequent task performance or their ability to process information. However, reduced boundary separation suggests that participants responded less cautiously, as if they cared less about the task and had lost some of their will to persist and engage fully.

Crucially, the reduced-boundary effect was not limited to situations involving inhibition (i.e., incongruent Stroop trials), consistent with our theoretical position that exerting effort leads to task reprioritization and disengagement from ongoing tasks (Inzlicht et al., 2014). Accordingly, depletion should impair performance on both incongruent and congruent Stroop trials. Indeed, exploratory analyses revealed that overall Stroop reaction times were shorter and accuracy was lower in the high-demand condition than in the low-demand condition, although these effects were much weaker than the boundary effect. Furthermore, results from an extended diffusion model for conflict tasks also suggested that effort exertion affected only boundary separation, not controlled or automatic information-integration processes.

Further evidence for our theoretical view comes from the finding that participants reported increased boredom in the high-demand than in the low-demand condition, which might indicate unsuccessful attentional engagement when people feel either unable or unwilling to engage with ongoing tasks (Westgate & Wilson, 2018). Moreover, participants who reported increased fatigue, demand, or effort also had reduced boundary parameters, but these exploratory effects should be interpreted with caution.

Our findings suggest that even when tasks elicit strong subjective states related to fatigue, traditional behavioral measures might lack the sensitivity to detect downstream effects, whereas latent parameters might be more sensitive. For example, the reduced-boundary effect was about twice as large as the reductions in overall Stroop reaction time and accuracy, likely because the diffusion model accounts for the speed–accuracy trade-off by modeling reaction times and accuracy jointly. Given the strengths of the diffusion model (see Evans & Wagenmakers, 2019), we suggest that other researchers apply similar approaches or reanalyze previous depletion studies that used speeded reaction time tasks.
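Using the overall estimates reported in Tables 1 and 2 (boundary: d = −0.15; overall Stroop reaction time: d = −0.09; overall Stroop accuracy: d = −0.08), the "about twice as large" comparison can be checked directly:

```latex
\[
\frac{d_{\text{boundary}}}{d_{\text{RT}}} = \frac{-0.15}{-0.09} \approx 1.7,
\qquad
\frac{d_{\text{boundary}}}{d_{\text{accuracy}}} = \frac{-0.15}{-0.08} \approx 1.9
\]
```

These ratios compare point estimates only; the 95% HPD intervals around the three effects overlap considerably.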

Depletion proponents might celebrate because our results provide strong evidence against the null hypothesis that depletion effects do not exist, as well as further insights into these effects. Skeptics, however, will hasten to highlight various limitations. Only one of two hypotheses was confirmed, and only meta-analytically, with merely two of four individual studies providing evidence for it. Nevertheless, the small but meaningful effect size (reduced boundary effect: d = −0.15) suggests that researchers hoping to examine similar effects should use within-subjects designs to ensure sufficient statistical power, especially given that the boundary effect was present even after we accounted for within-subjects learning effects in our studies. Despite these issues, our work has numerous strengths (strong manipulations, preregistered hypotheses, and cognitive modeling) that have allowed us to rigorously examine the cognitive processes underlying effort exertion.
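The recommendation to use within-subjects designs can be illustrated with a rough sample-size calculation. This is a minimal Python sketch under stated assumptions: it uses the frequentist normal approximation for 80% power at a two-sided α = .05 (not the Bayesian approach used in the article), and it treats |d| = 0.15 as both the within-subjects standardized difference (d_z) and the between-groups d, which holds only if the correlation between a person's two sessions is .5. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def n_within(d_z: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants needed for a two-sided paired (within-subjects) test.

    Normal approximation: n = ((z_{1 - alpha/2} + z_{power}) / d_z) ** 2.
    """
    z = NormalDist()
    total_z = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return math.ceil((total_z / d_z) ** 2)


def n_between_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants per group for a two-sided two-group comparison.

    Normal approximation: n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2.
    """
    z = NormalDist()
    total_z = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return math.ceil(2 * (total_z / d) ** 2)


# Simplifying assumption: |d| = 0.15 serves as both d_z and the between-groups d.
print(n_within(0.15))             # paired design, total participants
print(n_between_per_group(0.15))  # two-group design, participants per group
```

Under these assumptions, a paired design needs roughly 349 participants, whereas a two-group design needs roughly 698 per group (about 1,396 in total); higher correlations between sessions would widen the gap in favor of the within-subjects design.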

Conclusion

Our paradigm robustly elicited feelings such as effort and fatigue, highlighting its utility for studying these subjective states. Bayesian analyses provided strong evidence for the idea that people disengage after exerting effort. Although not all of our hypotheses were supported, we have learned that laboratory depletion effects are elusive even with strong manipulations and latent variables that capture meaningful cognitive processes. Nonetheless, our rigorous approach has much potential to facilitate future empirical and theoretical developments.

Supplemental Material

Lin_OpenPracticesDisclosure_rev – Supplemental material for Strong Effort Manipulations Reduce Response Caution: A Preregistered Reinvention of the Ego-Depletion Paradigm

Supplemental material, Lin_OpenPracticesDisclosure_rev for Strong Effort Manipulations Reduce Response Caution: A Preregistered Reinvention of the Ego-Depletion Paradigm by Hause Lin, Blair Saunders, Malte Friese, Nathan J. Evans and Michael Inzlicht in Psychological Science

Lin_Supplemental_Material_rev – Supplemental material for Strong Effort Manipulations Reduce Response Caution: A Preregistered Reinvention of the Ego-Depletion Paradigm

Footnotes

Acknowledgements

We thank Colin Kupitz, Joachim Vandekerckhove, and Shravan Vasishth for their guidance. We also thank Julian Quandt for spotting mistakes in the preprint of this article.

Transparency

Action Editor: Laura King

Editor: D. Stephen Lindsay

Author Contributions

H. Lin, B. Saunders, M. Friese, and M. Inzlicht developed the study concept and contributed to the study design. H. Lin performed testing and collected the data. H. Lin and N. J. Evans analyzed and interpreted the data under the supervision of B. Saunders, M. Friese, and M. Inzlicht. H. Lin drafted the manuscript, and the remaining authors provided critical revisions. All of the authors approved the final version of the manuscript for submission.

ORCID iDs

Hause Lin

Malte Friese

Michael Inzlicht


Notes

References

Baumeister, R. F., Bratslavsky, Muraven, & Tice, D. M. (1998). Ego depletion: Is the active self a limited resource? Journal of Personality and Social Psychology, 74, 1252–1262. doi:10.1037/0022-3514.74.5.1252

Baumeister

R. F.

Vohs

K. D.

(2016). Strength model of self-regulation as limited resource: Assessment, controversies, update. In Olson

J. M.

Zanna

M. P.

(Eds.), Advances in experimental social psychology (Vol. 54, pp. 67–127). San Diego, CA: Academic Press. doi:10.1016/bs.aesp.2016.04.001

Bürkner

P. C.

(2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1). doi:10.18637/jss.v080.i01

Carter

E. C.

Kofler

L. M.

Forster

D. E.

McCullough

M. E.

(2015). A series of meta-analytic tests of the depletion effect: Self-control does not seem to rely on a limited resource. Journal of Experimental Psychology: General, 144, 796–815. doi:10.1037/xge0000083

Carter

E. C.

Schönbrodt

F. D.

Gervais

W. M.

Hilgard

(2019). Correcting for bias in psychology: A comparison of meta-analytic methods. Advances in Methods and Practices in Psychological Science, 2, 115–144. doi:10.1177/2515245919847196

Dai

Milkman

K. L.

Hofmann

D. A.

Staats

B. R.

(2015). The impact of time at work and time off from work on rule compliance: The case of hand hygiene in health care. Journal of Applied Psychology, 100, 846–862. doi:10.1037/a0038067

Dienes

(2014). Using Bayes to get the most out of non-significant results. Frontiers in Psychology, 5, Article 781. doi:10.3389/fpsyg.2014.00781

Dutilh

Annis

Brown

S. D.

Cassey

Evans

N. J.

Grasman

R. P. P. P.

. . . Donkin

(2019). The quality of response time data inference: A blinded, collaborative assessment of the validity of cognitive models. Psychonomic Bulletin & Review, 26, 1051–1069. doi:10.3758/s13423-017-1417-2

Dutilh

Krypotos

A.-M.

Wagenmakers

E.-J.

(2011). Task-related versus stimulus-specific practice. Experimental Psychology, 58, 434–442. doi:10.1027/1618-3169/a000111

10.

Etherton

J. L.

Osborne

Stephenson

Grace

Jones

De Nadai

A. S.

(2018). Bayesian analysis of multimethod ego-depletion studies favours the null hypothesis. British Journal of Social Psychology, 57, 367–385. doi:10.1111/bjso.12236

11.

Evans

N. J.

Servant

(2019). A comparison of conflict diffusion models in the flanker task through pseudolikelihood Bayes factors. Psychological Review, 127, 114–135. doi:10.1037/rev0000165

Evans, N. J., & Wagenmakers, E.-J. (2019). Evidence accumulation models: Current limitations and future directions. PsyArXiv. doi:10.31234/osf.io/74df9

Francis & Job (2018). Lay theories of willpower. Social and Personality Psychology Compass, 12(4), Article e12381. doi:10.1111/spc3.12381

Francis, Milyavskaya, Lin, & Inzlicht (2018). Development of a within-subject, repeated-measures ego-depletion paradigm. Social Psychology, 49, 271–286. doi:10.1027/1864-9335/a000348

Friese & Frankenbach (2019). p-Hacking and publication bias interact to distort meta-analytic effect size estimates. Psychological Methods. Advance online publication. doi:10.1037/met0000246

Friese, Loschelder, D. D., Gieseler, Frankenbach, & Inzlicht (2019). Is ego depletion real? An analysis of arguments. Personality and Social Psychology Review, 23, 107–131. doi:10.1177/1088868318762183

Garavan, Ross, T. J., S.-J., & Stein, E. A. (2000). A parametric manipulation of central executive functioning. Cerebral Cortex, 10, 585–592. doi:10.1093/cercor/10.6.585

Garrison, K. E., Finley, A. J., & Schmeichel, B. J. (2019). Ego depletion reduces attention control: Evidence from two high-powered preregistered experiments. Personality and Social Psychology Bulletin, 45, 728–739. doi:10.1177/0146167218796473

Gelman & Shalizi, C. R. (2013). Philosophy and the practice of Bayesian statistics. British Journal of Mathematical and Statistical Psychology, 66, 8–38. doi:10.1111/j.2044-8317.2011.02037.x

Hagger, M. S., Chatzisarantis, N. L. D., Alberts, Anggono, C. O., Batailler, Birt, A. R., . . . Bruyneel (2016). A multilab preregistered replication of the ego-depletion effect. Perspectives on Psychological Science, 11, 546–573. doi:10.1177/1745691616652873

Hagger, M. S., Wood, Stiff, & Chatzisarantis, N. L. (2010). Ego depletion and the strength model of self-control: A meta-analysis. Psychological Bulletin, 136, 495–525. doi:10.1037/a0019486

Hirshleifer, Levi, Lourie, & Teoh, S. H. (2019). Decision fatigue and heuristic analyst forecasts. Journal of Financial Economics, 133, 83–98. doi:10.1016/j.jfineco.2019.01.005

Inzlicht & Friese (2019). The past, present, and future of ego depletion. Social Psychology, 50, 370–378. doi:10.1027/1864-9335/a000398

Inzlicht, Schmeichel, B. J., & Macrae, C. N. (2014). Why self-control seems (but may not be) limited. Trends in Cognitive Sciences, 18, 127–133. doi:10.1016/j.tics.2013.12.009

Lee, M. D., & Wagenmakers, E.-J. (2013). Bayesian cognitive modeling: A practical course. New York, NY: Cambridge University Press.

Leys, Delacre, Mora, Y. L., Lakens, & Ley (2019). How to classify, detect, and manage univariate and multivariate outliers, with emphasis on pre-registration. International Review of Social Psychology, 32(1), Article 5. doi:10.5334/irsp.289

Lin (2019). hauselin/hausekeep: Third release (Version v0.0.0.9003-alpha) [Computer software]. doi:10.5281/zenodo.2555874

Milyavskaya, Inzlicht, Johnson, & Larson, M. J. (2019). Reward sensitivity following boredom and cognitive effort: A high-powered neurophysiological investigation. Neuropsychologia, 123, 159–168. doi:10.1016/j.neuropsychologia.2018.03.033

Miyake & Friedman, N. P. (2012). The nature and organization of individual differences in executive functions: Four general conclusions. Current Directions in Psychological Science, 21, 8–14. doi:10.1177/0963721411429458

Moller, A. C., Deci, E. L., & Ryan, R. M. (2006). Choice and ego-depletion: The moderating role of autonomy. Personality and Social Psychology Bulletin, 32, 1024–1036. doi:10.1177/0146167206288008

Ratcliff & McKoon (2008). The diffusion decision model: Theory and data for two-choice decision tasks. Neural Computation, 20, 873–922. doi:10.1162/neco.2008.12-06-420

Schmeichel, B. J. (2007). Attention control, memory updating, and emotion regulation temporarily reduce the capacity for executive control. Journal of Experimental Psychology: General, 136, 241–255. doi:10.1037/0096-3445.136.2.241

Sjåstad & Baumeister, R. F. (2018). The future and the will: Planning requires self-control, and ego depletion leads to planning aversion. Journal of Experimental Social Psychology, 76, 127–141. doi:10.1016/j.jesp.2018.01.005

Tuk, M. A., Zhang, & Sweldens (2015). The propagation of self-control: Self-control in one domain simultaneously improves self-control in other domains. Journal of Experimental Psychology: General, 144, 639–654. doi:10.1037/xge0000065

Ulrich, Schröter, Leuthold, & Birngruber (2015). Automatic and controlled stimulus processing in conflict tasks: Superimposed diffusion processes and delta functions. Cognitive Psychology, 78, 148–174. doi:10.1016/j.cogpsych.2015.02.005

van Ravenzwaaij, Donkin, & Vandekerckhove (2017). The EZ diffusion model provides a powerful test of simple empirical effects. Psychonomic Bulletin & Review, 24, 547–556. doi:10.3758/s13423-016-1081-y

Wagenmakers, E.-J., & Gronau (2017, December 28). Redefine statistical significance XIII: The case of ego depletion [Web log post]. Retrieved from https://www.bayesianspectacles.org/redefine-statistical-significance-xiii-the-case-of-ego-depletion/

Wagenmakers, E.-J., Van der Maas, H. L. J., & Grasman, R. P. P. P. (2007). An EZ-diffusion model for response time and accuracy. Psychonomic Bulletin & Review, 14, 3–22. doi:10.3758/BF03194023

Westfall (2016). PANGEA: Power ANalysis for GEneral Anova designs. Retrieved from http://jakewestfall.org/publications/pangea.pdf

Westgate, E. C., & Wilson, T. D. (2018). Boring thoughts and bored minds: The MAC model of boredom and cognitive engagement. Psychological Review, 125, 689–713. doi:10.1037/rev0000097

Supplementary Material

For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.

For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
