Abstract
The ego depletion effect posits that initial exertion of self-control impairs subsequent self-regulatory performance. Although the effect has been examined in over 1000 independent studies and cited extensively, recent large-scale studies have questioned its validity. We propose that the replicability of ego depletion may hinge on the intensity of the manipulation. Our new paradigm, involving a demanding antisaccade task lasting for 30–40 min followed by a Go-Nogo task, was tested across 14 samples, totaling 2078 participants worldwide, both in laboratory settings and online. Results consistently demonstrated significant ego depletion effects (d = 0.31 to 0.35) with minimal heterogeneity (I² = 0). Bayesian meta-analysis further supported these findings with strong evidence (BF10 > 700). This study underscores the importance of manipulation intensity in ego depletion research and provides a reliable method for future studies. These findings have significant implications for resolving empirical controversies in ego depletion and addressing the broader replication crisis in psychology.
Psychology, particularly social psychology, is undergoing a replication crisis (Shrout & Rodgers, 2018). At the center of this storm, one of the most debated phenomena is the ego depletion effect, which refers to the idea that initial exertion of self-control impairs subsequent self-regulatory performance (Friese et al., 2019). Although this effect has been examined in over 1000 independent studies (Dang, 2018) conducted by more than 2000 researchers (Wolff et al., 2018), with the seminal paper (Baumeister et al., 1998) having been cited over 8000 times in Google Scholar, it has recently been seriously challenged and asserted to be spurious. In particular, two multi-lab projects across the globe, one including 23 laboratories (N = 2141, d = 0.04) (Hagger et al., 2016) and the other including 36 laboratories (N = 3531, d = 0.06) (Vohs et al., 2021), both found non-significant effects. Although another multi-lab project including 12 labs with 1775 participants yielded significant results, the effect size was relatively small (d = 0.10, CI95 = [0.01, 0.19]) (Dang et al., 2021). Therefore, despite its pervasive influence, ego depletion is now marginalized, with many researchers questioning whether it is a real effect (Friese et al., 2019).
We suggest that the replicability of ego depletion effects may depend, at least in part, on the intensity of the depletion manipulation. Importantly, not all acts of self-regulation are equally effortful or depleting. As VanDellen et al. (2012) argue, only tasks that surpass a “self-control threshold”—that is, those that are both consciously effortful and sufficiently demanding—are likely to meaningfully consume regulatory resources (Baumeister et al., 1998). Acts that fall below this threshold—because they are automatic, habitual, or require minimal effort—may not produce measurable depletion, even if they involve goal-directed behavior. For example, overriding impulses for just a couple of minutes may not impair subsequent control, much like lifting a light weight fails to induce muscle fatigue. Duration alone is insufficient if the task lacks intensity; similarly, cognitive load may not reliably induce depletion if it does not involve volitional override.
This framework helps explain inconsistencies in the existing ego depletion literature, including the three major multi-lab projects, in which most depletion tasks were relatively brief—typically under ten minutes. Such mild manipulations may not consistently cross the self-control threshold and are more vulnerable to individual variability: some participants may experience them as depleting, while others may feel unaffected or even energized, perceiving the task as a “warm-up” (Lopez et al., 2019; Wenzel et al., 2019; Wright et al., 2019). This subjective variability likely contributes to the substantial heterogeneity in reported effect sizes, yielding small average effects. Consistent with this view, a pilot study we conducted with 165 participants using a 10-min task revealed no significant depletion effect (see Supplemental Materials). Further supporting this interpretation, Tsai and Li (2020) found that task intensity—not just duration—is positively correlated with perceived fatigue.
Building on this insight, we developed a new task combination specifically designed to robustly tax self-control resources by ensuring that the initial task reliably exceeds the self-control threshold. In line with the standard two-task paradigm for ego depletion, participants in the depletion condition first completed a self-control demanding task, followed by a second task measuring downstream control performance. In our study, the initial task was a 30–40 min antisaccade task requiring constant inhibition of reflexive eye movements. Participants in the control condition performed a prosaccade task that was structurally identical in stimuli, timing, and duration but involved minimal inhibitory demand. By matching surface features while manipulating control demands, we aimed to isolate the role of intensity in producing consistent depletion effects. This high-effort, sustained inhibition was expected to produce greater uniformity in subjective depletion and stronger group differences on the subsequent task.
A further consideration is that prolonged control tasks may themselves impair subsequent self-control, not because they require inhibitory regulation, but because they induce boredom or disengagement (Mangin et al., 2021). Such effects could reduce the contrast between depletion and control conditions, potentially obscuring genuine depletion effects. To address this, our design included both a long control condition, matched in duration to the depletion task, and a short control condition that minimized monotony. This allowed us to test whether differences between depletion and control groups persist when the control task is unlikely to produce boredom-related fatigue.
Specifically, in the depletion condition, participants completed an antisaccade task in which a white cue (“=”) briefly flashed on either the left or right side of the screen, automatically capturing attention. The target stimulus then appeared on the opposite side for 100 ms before being masked, requiring participants to identify it while resisting the reflexive urge to look toward the cue. This sustained need to override a dominant response placed high demands on inhibitory control. In the control conditions, participants performed a prosaccade task in which the target appeared on the same side as the cue, minimizing inhibitory demands. The long control condition matched the depletion condition in structure and duration (720 trials, ∼30–40 min), whereas the short control condition contained only 30 trials, thereby minimizing monotony and potential fatigue.
Following the initial task, all participants completed a Go–Nogo task as the outcome measure. On each trial, they were instructed to press the spacebar in response to frequent Go stimuli (“SSSTSSS”) and withhold responses to infrequent Nogo stimuli (“SSSHSSS”). With three-fourths Go trials and one-fourth Nogo trials, the task fostered a strong prepotent tendency to respond, making response inhibition on Nogo trials particularly challenging. The Go–Nogo task is widely used as a behavioral measure of inhibitory control and self-regulation. It was selected based on prior evidence showing that the NoGo error rate yields higher reliability than other commonly used inhibitory control measures. In a large-scale psychometric evaluation, Hedge et al. (2018) reported that the NoGo error rate achieved the highest intraclass correlation coefficient (ICC = .76) among several inhibitory tasks, including Stroop and Flanker. In the present study, we computed ICC values for the NoGo error rate in each dataset. Across all samples, ICC values exceeded .80, indicating good internal consistency for this measure.
Finally, previous research suggests that the ego depletion effect may be moderated by individual differences. In this study, we included three individual difference variables that have been examined in at least two prior studies: trait self-control (Imhoff et al., 2014; Wang et al., 2015), action orientation (Dang et al., 2015; Gröpel et al., 2014), and lay theories about willpower (Job et al., 2010, 2015). These three constructs were also examined in two recent multi-lab projects (Dang et al., 2021; Vohs et al., 2021). Furthermore, exerting self-control is often accompanied by negative affect (Hagger et al., 2010), which may in turn influence subsequent self-control performance. To account for this, we included a fourth individual difference variable: affect intensity, which reflects the typical strength with which individuals experience both positive and negative emotions (Geuens & De Pelsmacker, 2002). Individuals who experience negative affect more intensely may be more susceptible to ego depletion effects.
Methods
Procedures
The current research was approved by the Institutional Review Board of the second author's affiliation. Informed consent was obtained from all participants. All materials, data, and scripts are available via: https://osf.io/j56xz/?view_only=1bc81d41b4154b1091161df87af75f68. We tested our paradigm in both laboratory and online settings. For lab-based samples, participants were greeted by a research assistant and escorted to individual cubicles; from that point onward, all instructions and tasks were delivered via computer, with no further in-person interaction. For online samples (U.S. and international), participants completed the tasks entirely independently via the online platform, with no direct human contact at any stage of the experiment. The specific steps are described below.

Figure 1. Forest Plots for Between-Group Differences in Nogo Errors.
Table 1. Demographic Information of Each Subsample.
Notes: (a) There were only two options (male or female) when participants were asked about their gender. (b) This column indicates the minimum effect size that could likely be obtained under standard criteria (α = .05 and β = .20), given the final sample size of each study.
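Note (b) describes a sensitivity analysis. The exact procedure behind the tabled values is not stated here, so the following is only a minimal sketch under assumptions: a two-sided, two-group comparison and a normal approximation to the t distribution. The function name `min_detectable_d` is hypothetical and is not taken from the authors' scripts.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_d(n1: int, n2: int, alpha: float = 0.05, beta: float = 0.20) -> float:
    """Approximate the smallest Cohen's d detectable with power 1 - beta.

    Assumption: two-sided test, two independent groups, normal approximation.
    The authors' own software or settings may differ.
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("group sizes must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the two-sided test
    z_power = z.inv_cdf(1 - beta)        # quantile corresponding to the target power
    return (z_alpha + z_power) * sqrt(1 / n1 + 1 / n2)


if __name__ == "__main__":
    # Illustrative group sizes only; these are not values from Table 1.
    print(round(min_detectable_d(64, 64), 2))
```

With 64 participants per group, this approximation gives a value close to d = 0.50, and the detectable effect shrinks as group sizes grow.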
To increase the generalizability of our paradigm, we shortened the total experimental duration by changing the trial number of the Go-Nogo task from 320 (240 Go trials and 80 Nogo trials) to 80 (60 Go trials and 20 Nogo trials) and distributed this experiment online to a group of international participants who were recruited on Prolific (n = 297, see Online sample (international) in Figure 1 and Table 1). This sample included a depletion condition and a long control condition. The only requirement was that participants could understand English instructions. Individual difference variables were not measured. This step was not pre-registered.
Paradigm
Depletion Manipulation
Depending on the sample, participants were randomly assigned to one of two conditions (i.e., the depletion condition or the long control condition) or to one of three conditions (i.e., the depletion condition, the long control condition, or the short control condition).
In the depletion condition, participants were required to finish an antisaccade task (Dang et al., 2017, 2021). The main task required participants to identify one of three target letters (B, P, or R) by pressing the corresponding key (1, 2, or 3, respectively) as quickly and accurately as possible. Each trial began with a fixation cross displayed for 200 ms on a black background. A flashing white “=” sign then appeared to either the left or right of the fixation cross for 100 ms, followed by a 50 ms blank screen and a second presentation of the “=” sign for another 100 ms at the same location. This sequence created the illusion of a flashing cue, which readily captured participants’ attention. After another 50 ms blank screen, the target letter (B, P, or R) appeared on the opposite side of the screen for 100 ms, followed by a 50 ms mask (“H”) and then a number “8,” which remained onscreen at the same location until a response was made. Participants completed 30 practice trials (including 12 trials to learn the response mapping and 18 trials to familiarize themselves with the task procedure), followed by 720 formal experimental trials.
In the long control condition, participants performed a prosaccade task with the same number of trials. The stimulus presentation was similar to that of the antisaccade task, except that the target stimulus (B, P, or R) and the flashing “=” sign appeared at the same location. Therefore, participants did not need to exert self-control to overcome the attraction of the flashing sign. Instead, their attention was drawn to the location where the target stimulus was presented.
In the short control condition, participants performed a prosaccade task with only 30 trials.
In all conditions, participants received feedback on their performance after each trial (correct or wrong).
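The trial structure described above can be sketched as a simple event list. This is an illustrative reconstruction, not the authors' experiment script: the names `Event`, `build_trial`, and `MAIN_TRIALS` are hypothetical, and the equiprobable random choice of cue side and target letter, as well as the mask and the “8” sharing the target's location, are assumptions.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Response mapping stated in the text: B -> 1, P -> 2, R -> 3.
KEY_FOR_LETTER = {"B": "1", "P": "2", "R": "3"}

# Main (non-practice) trials per condition, as stated in the text.
MAIN_TRIALS = {
    "depletion": ("antisaccade", 720),
    "long_control": ("prosaccade", 720),
    "short_control": ("prosaccade", 30),
}


@dataclass(frozen=True)
class Event:
    name: str
    duration_ms: Optional[int]  # None means "until a response is made"
    side: str                   # "center", "left", or "right"


def build_trial(task: str, rng: random.Random) -> Tuple[List[Event], str]:
    """Return the event sequence and the correct key for one trial."""
    if task not in ("antisaccade", "prosaccade"):
        raise ValueError(f"unknown task: {task}")
    cue_side = rng.choice(["left", "right"])       # assumption: equiprobable sides
    opposite = "right" if cue_side == "left" else "left"
    # Antisaccade: target opposite the cue. Prosaccade: target at the cue location.
    target_side = opposite if task == "antisaccade" else cue_side
    letter = rng.choice(sorted(KEY_FOR_LETTER))    # assumption: equiprobable letters
    events = [
        Event("fixation", 200, "center"),
        Event("cue '='", 100, cue_side),
        Event("blank", 50, "center"),
        Event("cue '='", 100, cue_side),           # second flash, same location
        Event("blank", 50, "center"),
        Event(f"target {letter}", 100, target_side),
        Event("mask 'H'", 50, target_side),        # assumption: same side as target
        Event("'8' until response", None, target_side),
    ]
    return events, KEY_FOR_LETTER[letter]


if __name__ == "__main__":
    trial, correct_key = build_trial("antisaccade", random.Random(0))
    for event in trial:
        print(event)
    print("correct key:", correct_key)
```

Summing the stated durations gives 650 ms of fixed-time events per trial before the response-terminated “8”; the two tasks differ only in where the target appears relative to the cue.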
Manipulation Check Measures
After the depletion manipulation, participants answered four manipulation check questions regarding effort (“How much effort did you put into the task just finished”), difficulty (“How difficult did you find the task just finished”), fatigue (“How tired do you feel after doing the task just finished”), and frustration (“Did you feel frustrated while you were doing the task just finished”) on a 7-point scale (Hagger et al., 2016).
Dependent Measure
Following the manipulation check measures, participants were required to finish a Go-Nogo task. There were two types of character strings. When the string “SSSTSSS” was presented (i.e., a Go trial), participants were required to press the spacebar within 1250 ms after the presentation of the string. Failure to respond within 1250 ms was counted as an error on the Go trial. When the string “SSSHSSS” was presented (i.e., a Nogo trial), participants were required to withhold any response. Responding on a Nogo trial was counted as an error. Each trial began with a cross in the center of the screen. The duration of the cross ranged from 400 ms to 600 ms (i.e., 400 ms, 440 ms, 480 ms, 520 ms, 560 ms, or 600 ms). After the cross, the character string (either SSSTSSS or SSSHSSS) was presented for 200 ms and then masked for 50 ms by the string “XXXXXXX”. A blank screen then followed for 1000 ms while awaiting a response. Participants received 20 practice trials (15 Go trials and 5 Nogo trials) with feedback (correct or wrong) and 320 real trials (240 Go trials and 80 Nogo trials) without feedback. Note that the trial number for the international online sample was 80, as described in the procedures.
The primary dependent variable was the number of errors participants made on Nogo trials. The number of errors on Go trials and reaction times (RT) on Go trials (above 200 ms) were also examined.
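The scoring rules above can be sketched as follows. This is a hedged illustration rather than the authors' analysis script: `Trial` and `score_go_nogo` are hypothetical names, and treating a Go response of 200 ms or faster as a non-error that is merely excluded from the RT average is an assumption, since the text only says that RTs above 200 ms were examined.

```python
from typing import Dict, Iterable, NamedTuple, Optional

GO, NOGO = "SSSTSSS", "SSSHSSS"
RESPONSE_WINDOW_MS = 1250   # 200 ms string + 50 ms mask + 1000 ms blank
MIN_VALID_RT_MS = 200       # only Go RTs above this value enter the RT average


class Trial(NamedTuple):
    stimulus: str             # GO or NOGO
    rt_ms: Optional[float]    # None if no key press was recorded


def score_go_nogo(trials: Iterable[Trial]) -> Dict[str, Optional[float]]:
    """Count Nogo errors, Go errors, and the mean Go RT for one participant."""
    nogo_errors = 0
    go_errors = 0
    go_rts = []
    for trial in trials:
        if trial.stimulus == NOGO:
            # Any response on a Nogo trial is a commission error.
            if trial.rt_ms is not None:
                nogo_errors += 1
        elif trial.stimulus == GO:
            if trial.rt_ms is None or trial.rt_ms > RESPONSE_WINDOW_MS:
                go_errors += 1                     # omission error
            elif trial.rt_ms > MIN_VALID_RT_MS:
                go_rts.append(trial.rt_ms)
        else:
            raise ValueError(f"unexpected stimulus: {trial.stimulus}")
    mean_go_rt = sum(go_rts) / len(go_rts) if go_rts else None
    return {"nogo_errors": nogo_errors, "go_errors": go_errors, "mean_go_rt": mean_go_rt}


if __name__ == "__main__":
    # Made-up responses for illustration only.
    demo = [Trial(GO, 450), Trial(GO, None), Trial(NOGO, 300), Trial(NOGO, None)]
    print(score_go_nogo(demo))
```

Keeping the error counts and the RT filter in one pass makes it explicit that the primary outcome (Nogo errors) does not depend on the 200 ms RT cutoff.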
Individual Difference Measures
As described in the procedures, in five samples (Lihua Mao, Anna Baumert, Lile Jia, Oulmann Zerhouni, and Timur Sevincer), before the experiment, participants in the depletion condition and the long control condition completed a short questionnaire measuring four individual difference variables.
First, action orientation was measured by the Demand-Related Action Orientation subscale (AOD) of the Action Control Scale (Dang et al., 2015; Jostmann & Koole, 2007). The AOD scale consists of 12 items. Each item describes a demanding situation along with an action-oriented and a state-oriented way of coping. Participants were asked to indicate which option best describes their own reaction to that situation. Action-oriented responses were coded as 1 and state-oriented responses as 0. Summed scores for the entire scale could range from 0 to 12.
Second, lay theory of willpower was measured by the six items developed by Job and colleagues (Job et al., 2010). Participants responded on a 6-point rating scale (1 = strongly disagree, 6 = strongly agree). Items were scored so that higher values indicate greater agreement with the unlimited-resource theory.
Third, trait self-control was measured by the 13-item Brief Self-Control Scale (Tangney et al., 2004). Participants indicated the extent to which they agreed with each statement on a scale from 1 (strongly disagree) to 5 (strongly agree). Higher scores represent better self-control.
Finally, affect intensity was measured by the affect intensity measure (Geuens & De Pelsmacker, 2002), which included three dimensions: positive affectivity, negative affectivity, and serenity. Participants indicated how they typically respond to the event described in each item on a scale from 1 (I never feel like that) to 6 (I always feel like that).
Results
In total, there were 2078 participants recruited across 14 samples: 815 in the depletion condition, 800 in the long control condition, and 463 in the short control condition (see Table 1 for demographic information).
Meta-Analyzing Nogo Errors
Random Effects Model
We meta-analyzed the overall effect size of the depletion manipulation using the random effects model. As shown in Table 2 and Figure 1, for Nogo errors, the primary dependent variable, the weighted average standardized mean difference (SMD) between the depletion condition and the long control condition was significant, d = 0.35, CI95 = [0.25, 0.45], Z = 6.97, p < .001. The SMD between the depletion condition and the short control condition was also significant, d = 0.31, CI95 = [0.18, 0.44], Z = 4.78, p < .001. However, the SMD between the long control condition and the short control condition was not significant, d = 0.01, CI95 = [-0.12, 0.14], Z = 0.18, p = .861. As shown in Table 2, the Q statistics for all three meta-analyses were not significant. Similarly, the I² values for all three meta-analyses were zero, indicating that the effect sizes were homogeneous across samples.
Table 2. Meta-Analytical Parameters of Between-Group Differences.
*p < 0.05. **p < 0.01. ***p < 0.001.
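The quantities reported above (pooled SMD, CI95, Z, Q, I²) can be illustrated with a short random-effects computation. The text does not say which between-study variance estimator was used, so this sketch assumes the DerSimonian-Laird estimator. The helper names `d_variance` and `random_effects` are hypothetical, and the input numbers in the usage example are made up rather than taken from Table 2.

```python
from math import sqrt
from statistics import NormalDist
from typing import NamedTuple, Sequence


class MetaResult(NamedTuple):
    pooled: float
    ci_low: float
    ci_high: float
    z: float
    p: float
    q: float
    tau2: float
    i2: float   # percent


def d_variance(d: float, n1: int, n2: int) -> float:
    """Common large-sample approximation to the sampling variance of Cohen's d."""
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))


def random_effects(effects: Sequence[float], variances: Sequence[float]) -> MetaResult:
    """Random-effects pooling with the DerSimonian-Laird tau^2 (an assumption)."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two effects with matching variances")
    w = [1 / v for v in variances]
    fixed = sum(wi * di for wi, di in zip(w, effects)) / sum(w)
    q = sum(wi * (di - fixed) ** 2 for wi, di in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0   # I² is 0 whenever Q <= df
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * di for wi, di in zip(w_star, effects)) / sum(w_star)
    se = sqrt(1 / sum(w_star))
    z = pooled / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return MetaResult(pooled, pooled - 1.96 * se, pooled + 1.96 * se, z, p, q, tau2, i2)


if __name__ == "__main__":
    # Illustrative inputs only.
    print(random_effects([0.30, 0.40, 0.35], [0.02, 0.03, 0.025]))
```

Under this estimator, I² equals zero whenever Q does not exceed its degrees of freedom, which is how homogeneous samples produce the I² = 0 pattern described above. As a consistency check on the reported values, d = 0.35 with Z = 6.97 implies a standard error of about 0.05, which reproduces the reported interval of [0.25, 0.45].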
Bayesian Meta-Analysis
We also conducted a Bayesian meta-analysis of the ego depletion effect from each sample (Gronau et al., 2017). We set the prior distribution of the standardized effect size as a zero-centered Cauchy distribution with scale parameter equal to 1/√2 and the prior distribution of the between-study heterogeneity (standard deviation) as an Inverse-Gamma (1, 0.15) distribution. A Bayes factor (BF10) larger than 3 is generally considered as supporting evidence for the alternative hypothesis (i.e., ego depletion is present), whereas a BF10 smaller than 1/3 supports the null hypothesis (i.e., ego depletion is absent). A BF10 between 1/3 and 3 is considered anecdotal evidence, which supports neither the alternative hypothesis nor the null hypothesis. As shown in Table 2, the meta-analytic Bayes factors comparing the depletion condition with the long control condition and the depletion condition with the short control condition are BF10 = 142516.30 and BF10 = 721.29, respectively. These values indicate very strong evidence in support of the presence of the ego depletion effect. In contrast, the meta-analytic Bayes factor comparing the long control condition with the short control condition is BF10 = 0.09, providing evidence that these two conditions do not differ in Nogo errors.
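The decision thresholds and the effect-size prior described above can be written down compactly. This sketch only encodes the cutoffs stated in the text (3 and 1/3) and the density of a zero-centered Cauchy prior with scale 1/√2. It does not compute a Bayes factor, and the function names `interpret_bf10` and `cauchy_prior_density` are hypothetical.

```python
from math import pi, sqrt

CAUCHY_SCALE = 1 / sqrt(2)   # prior scale for the standardized effect size


def cauchy_prior_density(delta: float, scale: float = CAUCHY_SCALE) -> float:
    """Density of a zero-centered Cauchy prior evaluated at effect size delta."""
    return 1 / (pi * scale * (1 + (delta / scale) ** 2))


def interpret_bf10(bf10: float) -> str:
    """Classify a Bayes factor using the cutoffs stated in the text."""
    if bf10 <= 0:
        raise ValueError("a Bayes factor must be positive")
    if bf10 > 3:
        return "supports H1 (effect present)"
    if bf10 < 1 / 3:
        return "supports H0 (effect absent)"
    return "anecdotal (supports neither hypothesis)"


if __name__ == "__main__":
    # Bayes factors reported in the text for the three Nogo-error comparisons.
    for label, bf in [
        ("depletion vs. long control", 142516.30),
        ("depletion vs. short control", 721.29),
        ("long vs. short control", 0.09),
    ]:
        print(label, "->", interpret_bf10(bf))
```

The Cauchy prior places most of its mass on small-to-moderate effects while its heavy tails still allow large ones, which is one reason it is a common default for standardized effect sizes.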
Meta-Analyzing Go Errors and Go RT
As shown in Table 2, for Go errors and Go RT, none of the SMD comparisons were significant (for forest plots, see Figure S1 and Figure S2 in Supplemental materials). The meta-analytic Bayes factors were either smaller than 1/3 (indicating no group difference) or between 1/3 and 3 (indicating support for neither hypothesis).
Meta-Analyzing Manipulation Check Items
As shown in Table 3, manipulation check items yielded significant SMDs among all comparisons.
Table 3. Meta-Analytical Parameters of Manipulation Check Measures.
*p < 0.05. **p < 0.01. ***p < 0.001.
Moderators
Sample-Level Moderators
We coded three sample-level moderators. First, there were eight Chinese samples and six samples from other countries, so we coded whether each sample was from China (categorical variable: 0 = no, 1 = yes). The second and third moderators were the mean age and gender composition (indicated by male percentage) of each sample, respectively, both of which were continuous variables.
As shown in Table S1, meta-regression revealed that none of the three variables played a moderating role for Nogo errors. For Go errors and Go RT, there was only one significant result, such that gender composition moderated the difference in Go RT between the long control and the short control conditions, b = -1.12, SE = 0.49, p = .023. The difference between these two conditions decreased when more male participants were recruited. Gender composition accounted for 86.40% of the heterogeneity.
Individual Differences
We included several individual difference measures (i.e., trait self-control, action orientation, lay theory of willpower, and affect intensity) in five samples. The affect intensity measure was scored separately for its three subscales: positive affectivity, negative affectivity, and serenity. All analyses used these subscale scores rather than a total score. In line with previous studies (Vohs et al., 2021), we used multi-level linear regression to test the moderating roles of these individual differences (with sample as the cluster variable), in which Nogo errors, Go errors, Go RT, and each manipulation check item (effort, difficulty, fatigue, frustration) were regressed on condition, the respective individual difference variable, and their interaction. As shown in Table S2, none of the interactions were significant.
Discussion
In summary, in contrast to previous large-scale studies (Dang et al., 2021; Hagger et al., 2016; Vohs et al., 2021), our research provides stronger evidence for the ego depletion effect by employing a more intensive depletion manipulation. These findings have significant implications for understanding self-control and addressing the replication crisis in psychology.
First, our fully computerized paradigm can consistently produce the ego depletion effect across age groups and gender compositions. It is particularly noteworthy that our ego depletion paradigm proved effective not only in traditional laboratory settings but also in fully computerized, online-administered studies. This is especially significant given prior evidence suggesting that online paradigms—especially those lacking live experimenter interaction—tend to suffer from lower participant engagement and reduced replicability (e.g., Baumeister et al., 2023). The success of our paradigm in such contexts suggests it may serve as a useful methodological advancement for future online research on self-control and potentially other psychological phenomena. We hope this work contributes to improving the robustness and generalizability of psychological findings beyond the laboratory.
Second, our results showed no significant differences in performance between the long and short control conditions. This suggests that the shorter version of the control task is equally effective in maintaining a non-depleting baseline. This finding is especially valuable for researchers aiming to reduce participant burden without compromising experimental validity. At the same time, the absence of differences also indicates that the long control task is not inherently depleting, supporting its use as a valid baseline condition. Thus, both versions can be considered appropriate depending on the research context: the short control helps to minimize monotony and participant burden, whereas the long control ensures closer comparability in task duration across conditions. This finding also speaks to prior concerns (e.g., Mangin et al., 2021) that overly long and monotonous control tasks may inadvertently tax participants’ cognitive resources and thereby diminish the intended contrast with depletion conditions: in the present data, the long prosaccade task showed no such cost relative to the short version.
Third, in line with recent large-scale studies (Vohs et al., 2021), none of the four individual difference variables—trait self-control, action orientation, lay theory of willpower, and affect intensity—moderated the ego depletion effect in our data. One possible explanation is that these variables are simply not the right moderators for this particular paradigm, and future research may need to identify other individual difference factors that better capture susceptibility to depletion. Another possibility is that the relatively intense depletion task we employed produced robust effects across participants, thereby reducing variability in susceptibility and obscuring potential moderation. In other words, when regulatory demands surpass a threshold of difficulty, most participants experience depletion regardless of trait-level predispositions. From this perspective, the consistency of the effect across individuals may itself serve as evidence that the manipulation was sufficiently powerful to override potential moderating influences.
Fourth, the present findings indicate that the paradigm is effective in both Chinese and several Western samples (Germany, France, US), suggesting its applicability across these cultural contexts. However, most non-Chinese samples in our study were from WEIRD societies, and caution is warranted when extending the findings to other non-WEIRD populations. Future work should examine the paradigm's generalizability in more diverse cultural settings.
Fifth, although ego depletion has had a pervasive influence in social psychology and many related research areas such as decision making and organizational behavior (Friese et al., 2019), there are many conflicting empirical results, which may stem from the ineffectiveness of the depletion manipulations used in previous studies. Our paradigm addresses this issue by providing a more reliable method for inducing ego depletion, thus helping to resolve these empirical controversies. Notably, beyond the overall significant effect observed in our meta-analyses, the ego depletion effect reached statistical significance in the majority of individual samples. This consistency across samples suggests that our paradigm can serve as a reliable and generalizable tool for future research examining how ego depletion may influence specific psychological or behavioral outcomes.
Sixth, our paradigm encourages the development of additional paradigms using different task combinations with strong depletion manipulations. This approach can further validate the ego depletion effect and refine our understanding of self-control. More generally, the replication crisis observed in many other fields may also result from inadequate or inappropriate manipulations (Baumeister, 2020), suggesting that similar methodological improvements could enhance the reliability of findings across various domains of research.
Finally, beyond its contributions to debates in human psychology, our paradigm may also provide a useful framework for thinking about self-regulation in non-human agents. Recent work has begun to map the emergent “psychological” profiles of large language models (LLMs), such as ChatGPT, showing that they display systematic biases and patterns resembling human-like cognitive tendencies (Yuan et al., 2025; Zhang et al., 2025). This raises the intriguing possibility of whether an analogue of “ego depletion” might be observed in AI systems—that is, whether sustained engagement in tasks requiring continuous inhibition or contextual maintenance (e.g., filtering harmful content, sustaining coherence in long interactions) leads to performance degradation over time. Our intensive, computer-based antisaccade–Go/NoGo paradigm offers a potential template for probing such limits of self-regulatory processing in LLMs. Exploring these parallels could not only illuminate boundary conditions of human self-control but also inform the design and safe deployment of AI in contexts that demand prolonged inhibitory regulation.
One limitation of the current study is that the antisaccade task combines substantial visual processing demands with inhibitory control requirements, and thus the present design cannot completely disentangle whether the observed depletion effects are driven by inhibition per se or by the visual demands associated with vector inversion. Although the prosaccade control condition was matched for visual stimuli and timing, the visual processes involved differ in nature. Future research could include an additional control task that equates visual demands while minimizing inhibitory requirements to more precisely identify the mechanism underlying the observed effects.
Supplemental Material
Supplemental material, sj-docx-1-pac-10.1177_18344909251386084 for Revisiting Ego Depletion: Evidence from Multi-Lab Collaborations by Junhua Dang, Shanshan Xiao, Lihua Mao, Xiaoping Liu, Anna Baumert, Solenne Bonneterre, Shiyu Cai, Xiaoxi Chen, Margaux de Chanaleilles, Ning Ding, Wei Fan, Yi Feng, Dingguo Gao, Xiaoqing He, Wanting Huang, Ismaharif Ismail, Lile Jia, Haijiang Li, Ruijing Li, Zhenhua Li, Chunhui Lim, Laura Linke, Yangang Nie, Zhihong Qiao, Mengmeng Ren, A. Timur Sevincer, Jingbin Tan, Ziyi Wang, Song Wu, Oulmann Zerhouni, Yiping Zhong, Yalin Zhu, Axel Zinkernagel and Helgi B. Schiöth in Journal of Pacific Rim Psychology
Footnotes
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.
References