Abstract
Mindfulness-based interventions (MBIs) lead to improvements in mental health, and also increase trait mindfulness. Yet, trait mindfulness shows definitional and empirical overlaps with the personality dimension of neuroticism which is linked to mental health and is malleable through interventions as well. This meta-analysis examined whether previously reported associations between increases in self-reported trait mindfulness and mental health in MBIs, as well as in non-MBIs and treatment-as-usual (TAU) and waitlist controls, are mediated through concomitant changes in neuroticism on the between-study level. Data of 45 intervention studies (39 randomized controlled trials; total N = 2913) were investigated with three-level meta-analysis and the causal steps approach. Change in neuroticism mediated change in trait mindfulness and fully accounted for its mediational effects on mental health. Similar associative patterns were found for the active and TAU and waitlist control groups as well. Accounting for small-study effects did not alter this pattern of results. The findings highlight the relevance of neuroticism for intervention research and may explain previously observed apparent effects of trait mindfulness in non-MBIs and TAU and waitlist controls on mental health. The construct of trait mindfulness may need conceptual reconsideration and resharpening, and the investigation of personality change should be intensified in intervention research.
Plain language summary
Mindfulness-based interventions (MBIs) may improve not only mental health, but also the extent to which individuals are mindful in their daily lives (i.e., trait mindfulness). Previous research suggested that this increase in trait mindfulness explains the beneficial effects of MBIs on mental health. Paradoxically, other active, non-MBI treatments increase trait mindfulness as well. The present research presents evidence that MBIs and other active treatments also decrease the tendency to experience negative emotionality and to interpret stimuli in a more negative way (i.e., the personality trait of neuroticism). Neuroticism has close links with mental health. Changes in this more general personality trait, rather than increases in trait mindfulness in particular, may thus account for the beneficial effects of MBIs (and other active treatments). This may explain otherwise paradoxical observations and highlights problems with the construct and measurement of trait mindfulness. Neuroticism may be a more general and more relevant personality trait than trait mindfulness for explaining the beneficial effects of MBIs and other active treatments.
Introduction
Jon Kabat-Zinn, who introduced mindfulness-based interventions (MBIs) to the psychotherapeutic field, defined mindfulness as “the awareness that emerges through paying attention on purpose, in the present moment, and nonjudgmentally to the unfolding of experience moment by moment” (Kabat-Zinn, 2003, p. 145). There is an ever-growing interest in this phenomenon, and the field has extended, and currently routinely applies, the term and concept of mindfulness variously to practices, processes, and also personality traits (Van Dam et al., 2018).
Trait or dispositional mindfulness is the relatively stable and trait-like disposition to act or behave in a mindful way in everyday life, formulated in contrast to the more temporary concept of state mindfulness that is concerned with being mindful in any given moment (Rau & Williams, 2016). The trait concept of mindfulness matches the general concept of personality traits, which are portrayed as relatively consistent patterns of thinking, feeling, and acting (McCrae & Costa, 1997). Trait mindfulness can be increased via meditation and MBIs and has been reported to mediate thereby the beneficial effects of mindfulness interventions on mental health (Gu et al., 2015; Tran et al., 2022). However, there is mounting evidence that, despite different conceptual origins, trait mindfulness and established personality traits from the Big Five or HEXACO taxonomies are highly intercorrelated and intertwined (e.g., Banfi & Randall, 2022; Giluk, 2009; Hanley & Garland, 2017; Rau & Williams, 2016). Further, trait mindfulness appears to be subject to a jangle fallacy (i.e., a different label for an identical concept; Gonzales et al., 2021; Hanfstingl et al., 2024) and, consequently, its incremental validity over and above established personality traits is only small for various criterion variables, including mental health (Altgassen et al., 2024; Fischer et al., 2023; Tran et al., 2020).
All this casts doubt on the validity of trait mindfulness as a mediator of treatment effects and raises the question of the extent to which established personality traits might alternatively account for such reported associations. This appears relevant, as personality traits are intimately linked with the highly prevalent emotional dysfunction superspectrum of disorders in the hierarchical taxonomy of psychopathology (HiTOP; Watson et al., 2022; for details, see next subsection) and are malleable through psychotherapeutic interventions as well (e.g., Roberts et al., 2017). The following sections elaborate on these issues and how they were addressed in the present study.
Mindfulness, personality, and mental health
There is no normative, universally accepted definition of mindfulness, and current scientific definitions (such as Kabat-Zinn’s, above) must be seen as an abstraction and translation of this concept into a Western context, whose fit with the underlying (Hindu/Buddhist) origins can be debated (which is beyond the scope of the present study; but see Dreyfus, 2011; Gethin, 2011; Harrington & Dunne, 2015). Nonetheless, a number of self-report scales are routinely used for the assessment of trait mindfulness (e.g., the Mindful Attention Awareness Scale; Brown & Ryan, 2003; or the comprehensive Five Facet Mindfulness Questionnaire; FFMQ; Baer, Smith, Hopkins, Krietemeyer, & Toney, 2006). These scales consist of varying, yet overall similar, facets, each representing a central aspect of mindfulness (see also Altgassen et al., 2024).
On the highest, most aggregate level, trait mindfulness is associated with both personality metatraits of stability (which subsumes neuroticism, conscientiousness, and agreeableness) and plasticity (subsuming openness and extraversion; DeYoung, 2015; DeYoung et al., 2002; Digman, 1997). However, associations appear to be stronger with the former than with the latter (Banfi & Randall, 2022; see also Hanley & Garland, 2017). On the lower-order level, the personality traits of the Big Five and HEXACO taxonomies account for around 50% of the variance in trait mindfulness, which itself seems closer to conscientiousness, neuroticism, and openness than to extraversion, agreeableness, or honesty-humility (Altgassen et al., 2024). Meta-analytically, of all Big Five personality traits, neuroticism (r = −.45: Giluk, 2009; r = −.56, when corrected for unreliability: Banfi & Randall, 2022) and conscientiousness (r = .32 and .41, respectively) are most strongly associated with trait mindfulness.
These associations also extend to the conceptual and the latent level, as the Big Five personality traits and trait mindfulness have repeatedly been shown to share a common latent factor structure (Spinhoven et al., 2017; Tran et al., 2020). Neuroticism is characterized by high negative emotionality and a disposition to experience anxiety (Barlow et al., 2014). Individuals high in neuroticism are thus variously described as worrying, nervous, emotional, and vulnerable (McCrae & Costa, 1987). Further, they show higher cognitive and emotional reactivity (Barnhofer & Chittka, 2010; Suls et al., 1998) and interpret stimuli in a more negative way (Vinograd et al., 2020). This characterization of high neuroticism contrasts with central elements of mindfulness which highlight: a nonreactive stance toward one’s inner experience (as, e.g., captured in the FFMQ facet Nonreactivity to Inner Experience [Nonreact]); a sustained present-moment awareness (Acting with Awareness [Actaware] facet in the FFMQ); and abstaining from judging one’s experience (Nonjudging of Inner Experience [Nonjudge] facet in the FFMQ). Accordingly, the neuroticism facets load highest (.63–.95) on the same factor as the Nonreact, Nonjudge, and Actaware facets (−.54 to −.69; Tran et al., 2020; see also Hanley & Garland, 2017). Conceptual similarities are also mirrored in the item contents of popular trait mindfulness and Big Five scales, which linguistically are highly similar in sentiment and semantics (Fischer et al., 2023).
Given the conceptual and empirical overlaps with neuroticism in particular, and the reported low incremental validity of trait mindfulness over and above established personality traits, the uniqueness of the trait mindfulness construct has been called into question. Even though there is some unique variance in the construct, popular measures like the FFMQ mostly seem to measure neuroticism (Altgassen et al., 2024; see also Tran et al., 2020).
Both trait mindfulness and the Big Five personality traits have strong links with mental health. Higher trait mindfulness is, for example, associated with lower perceived stress and greater psychological well-being (e.g., Zimmaro et al., 2016) and psychological health (Tomlinson et al., 2018). Among the Big Five, especially neuroticism has strong associations with mental health and psychological and subjective well-being (Anglim et al., 2020; Kotov et al., 2010; Lamers et al., 2012; Malouff et al., 2005). High neuroticism also increases the likelihood for mental disorder in later life (Jeronimus et al., 2016) and is (together with negative affect) intimately linked with the HiTOP emotional dysfunction superspectrum of disorders (Watson et al., 2022).
HiTOP provides a quantitative nosological framework that comprehensively describes psychopathology in a hierarchically nested structure, which features, on the highest and most general level, superspectra into which feed (each on successively lower levels) spectra, subfactors, (dimensional) syndromes, homogeneous symptom components/maladaptive traits, and, on the lowest level, individual symptoms themselves. The HiTOP model has been derived from factor-analytic aggregation of symptom data, but has further gained broad validity evidence from, for example, the field of behavioral genetics, but also concerning neural substrates, and the course and treatment response of mental illnesses (see Watson et al., 2022).
In its original form (Kotov et al., 2017), the highest level featured only one superspectrum that was identified with a general factor of psychopathology (p-factor). Later developments proposed three superspectra (psychosis, externalizing, and emotional dysfunction; Watson et al., 2022) of which the last appears to be of specific importance in the current context. The emotional dysfunction superspectrum encompasses a broad range of ‘internalizing’ disorders, which cover anxiety, depressive, trauma-related, eating, bipolar, and somatic symptom disorders, as well as sexual dysfunction and certain personality disorders. Neuroticism is strongly linked with this superspectrum (as a vulnerability factor, through shared genetic bases, through feedback loops that may promote disorder; see Watson et al., 2022), which highlights its relevance for some of the most common and prevalent mental disorders.
Many, including some of the most widely used, symptom scales in psychiatry feed into the emotional dysfunction superspectrum (Watson et al., 2022; see also Kotov et al., 2017, on the p-factor). As seen from the perspective of HiTOP, there is thus broad psychometric evidence that a wide variety of mental-health-related scales provide information on a common construct (namely, the superspectrum) on an aggregate higher-order level. The current study thus treated the seemingly broad variety of symptom scales as indicators of an overarching construct, called ‘mental health’ here for simplicity. While this approach disregards possible differences between the various scales and their intended (specific) constructs, it takes seriously all the available evidence that has been aggregated into, and is the basis of, the HiTOP framework.
Trait mindfulness as a mediator of treatment efficacy in MBIs?
Mindfulness-based stress reduction (MBSR; Kabat-Zinn, 1982) was the first widely used clinical intervention incorporating mindfulness practices, such as meditation and breathing exercises. Since then, the third wave of cognitive behavioral therapies has introduced numerous novel interventions that emphasize mindfulness (Hayes & Hofmann, 2017), such as mindfulness-based cognitive therapy (MBCT; Segal, Williams, & Teasdale, 2002), acceptance and commitment therapy (ACT; Hayes, Luoma, Bond, Masuda, & Lillis, 2006), and dialectical behavior therapy (DBT; Linehan, 1993). Despite conceptual differences, various MBIs lead to improvements in mental health and are effective for a wide range of problems and diverse populations (Goldberg et al., 2022). MBIs, meditation, and mindfulness also have beneficial effects beyond mental health (for reviews, see, e.g., Donald et al., 2018; Eberth & Sedlmeier, 2012).
Moreover, MBIs also lead to small-to-medium sized (i.e., Cohen d = 0.2 to 0.5; Cohen, 1992) increases in trait mindfulness (e.g., Goldberg et al., 2019; Quaglia et al., 2016; Tran et al., 2022; Visted et al., 2015). As trait mindfulness is probably the most proximal outcome of MBIs, this suggested that it could also be a mediator (“mechanism of action”; but see Tran et al., 2022, on the use of such terminology) of the treatment efficacy of MBIs; that is, increases in trait mindfulness might explain increases in mental health for these treatments. Corroborative evidence for this idea has been provided by a SEM-based meta-analysis of 12 mediation studies (Gu et al., 2015), which suggested that increases in trait mindfulness indeed partially mediated observed improvements in mental health in MBIs, as compared to controls.
However, these findings are qualified by the observation that trait mindfulness does not only increase in MBIs, but, to a lesser extent, in non-MBI active controls as well (e.g., Goldberg et al., 2019; Tran et al., 2022). Moreover, a recent meta-analysis (Tran et al., 2022; based on roughly 10 times more studies and participants than Gu et al., 2015) showed that the strength of association between changes in trait mindfulness and changes in self-reported mental health is similar for MBIs and non-MBI active controls, and even similar in treatment-as-usual (TAU) and waitlist passive controls. Trait mindfulness fully mediated the differences in treatment efficacy between MBIs and active controls (which were slightly larger in the former), and partially mediated the (larger) differences compared to passive controls. The fact that changes in trait mindfulness were similarly associated with changes in mental health in non-MBI, TAU, and waitlist controls as in MBIs strongly suggests that the apparent mediating effects of trait mindfulness might stem from a third, more general, source of influence or variable. This could be neuroticism, as already suspected in prior related research (“Interventions addressing the part of mindfulness attributable to established personality factors should therefore be regarded as interventions targeting established personality factors rather than mindfulness. Our results show that in cross-section the unique mindfulness part is not related to mental health (a common target of mindfulness-based interventions). Therefore, a more differentiated perspective on mindfulness-based interventions’ effectiveness and underlying mechanisms is needed.”; Altgassen et al., 2024, p. 378).
Personality change in psychotherapeutic interventions
Although personality traits are considered to be relatively stable, they are malleable and do change throughout life (Graham et al., 2020; Roberts et al., 2006). They are amenable to change in the context of psychotherapeutic (Roberts et al., 2017) and digital interventions (Allemand & Flückinger, 2022). Neuroticism is the personality trait that shows the largest changes in psychotherapeutic interventions (roughly half a standard deviation, i.e., d = 0.5; Roberts et al., 2017), which is approximately half the magnitude of personality change commonly experienced across the lifespan (Roberts et al., 2006). Type of intervention apparently does not moderate this effect, nor does the type of problem addressed by the intervention. Reports of decreases in neuroticism and increases in conscientiousness are also scattered throughout the literature on MBIs (e.g., Hanley et al., 2019; Krasner et al., 2009). However, such reports have not been systematically gathered and brought into the present broader context.
The present study
The empirical overlap of the Big Five traits and trait mindfulness raises the question of the extent to which the treatment efficacy of MBIs (and non-MBI active controls and TAU and waitlist controls) can be attributed to changes in trait mindfulness, as compared to concomitant changes in personality. Neuroticism has the largest overlap with trait mindfulness (Banfi & Randall, 2022; Giluk, 2009), has been identified as one of the main contributing factors to psychopathology (Lahey, 2009), is assumed to play a key role in psychotherapeutic interventions (Barlow et al., 2014), and shows the largest changes in psychotherapeutic interventions, as compared to other personality traits (Roberts et al., 2017). For these combined reasons, this meta-analysis specifically focused on neuroticism (for which also the most data were available, compared to the other Big Five personality traits; see Methods section).
We focused on changes in neuroticism explaining changes in trait mindfulness in the present study, and not the other way round, as we were specifically interested in whether previously reported mediating effects of trait mindfulness on treatment efficacy (Tran et al., 2022) could alternatively be attributed to changes in neuroticism. Also, the metatrait of stability (of which neuroticism is a part) has biological roots (Ormel et al., 2013), and there is a continuity of earlier temperamental negative affect to later neuroticism (McAdams & Olson, 2010). As a fundamental attribute, stability (and neuroticism) thus likely manifests itself earlier in the developmental process than trait mindfulness, which could be considered more accessory, especially in its cultivated aspects, which are of relevance in the current study (for a thorough discussion on the distinction of dispositional and cultivated aspects of trait mindfulness, see Burzler & Tran, 2022). Lastly, neuroticism has, compared to trait mindfulness, a longer history in psychological science and appears to be less controversial, considering both its definition and measurement (e.g., Burzler & Tran, 2022; Van Dam et al., 2018). For all these reasons, neuroticism was given precedence over trait mindfulness in the present study. However, for the sake of completeness, we also provide the results of an additional, not preregistered analysis in supplemental materials, where the order of mediation was switched.
Analysis followed the statistical approach of Tran et al. (2022; see Methods section) and addressed, and controlled for, potential effect moderation by type of client (psychiatric, non-psychiatric, other medical), type of MBI, type of control group, treatment duration, and small-study effects. Importantly, this approach investigated associations on the between-study level, which made the currently available evidence amenable to analysis in the first place. Currently, there are not sufficient studies available to meta-analytically examine the associations also on the within-study level (with meta-SEM approaches). Thus, the results presented in the following pertain only to the between-study level, not to the within-study level. Still, we deem these important, as they represent the currently best available evidence. We hypothesized that changes in neuroticism account for both the changes in trait mindfulness and mental health in MBIs, non-MBI active controls, and TAU and waitlist controls alike.
Transparency and openness
The meta-analysis was preregistered at the Open Science Framework (https://osf.io/4nrhj/) on May 13, 2021. In the preregistration, the hypotheses and the basic methodological approach were outlined a priori. Overall, there were minor amendments between the original preregistration and the final study. As expected in the preregistration, there were not enough studies reporting data on all Big Five personality traits. Therefore, we concentrated on studies reporting changes in neuroticism and also included studies measuring correlates (proxy variables) of neuroticism. For studies directly measuring the Big Five, all pre–post study designs were considered eligible. For studies with proxy variables of neuroticism, only randomized controlled trials (RCTs) were included. The search terms were adjusted accordingly and expanded to include also the correlates of neuroticism. The final list of search terms can be found in Table S1 in the supplemental material. As there is evidence that differences between retrievable conference abstracts and subsequently published articles are negligible (Scherer & Saldanha, 2019), conference abstracts were also considered as eligible (contrary to what was initially stated in the preregistration). Also, for the examination of publication bias, contour-enhanced funnel plots (Peters et al., 2008) and a variant of the Egger regression test (Egger et al., 1997) were used exclusively. Further methods were not applied, as the aforementioned methods already provided sufficient evidence for small-study effects. The RoB 2 tool (Sterne et al., 2019) was used for all study designs and not for RCTs only. For moderator analysis, the sample was not divided into clinical and healthy samples, but into psychiatric samples, non-psychiatric samples, and samples with other medical conditions. Rainforest plots were not used, because they currently cannot be applied to multilevel data.
Methods
This meta-analysis conforms to the PRISMA 2020 standards (Page et al., 2021). Both the PRISMA checklist and the PRISMA flowchart of studies are provided in supplemental materials.
Study inclusion criteria
Studies that were considered eligible for this meta-analysis needed to examine some type of MBI. This included MBIs, such as MBSR, MBCT, ACT, or DBT, which were delivered in-person, but also online interventions, whose efficacy in treating psychiatric disorders has been shown previously (Karyotaki et al., 2017; for MBIs, see Sevilla-Llewellyn-Jones et al., 2018; Spijkerman et al., 2016). Interventions had to be carried out for at least two weeks; overly simple and brief mindfulness inductions (e.g., in experimental laboratory settings) were not deemed sufficient for inclusion in this meta-analysis. Eligible study designs were randomized controlled trials (RCTs) and multi-arm (non-RCT) and single-arm prospective cohort studies. We did not restrict the literature search to RCTs, as this could have resulted in excluding too much of the available evidence in a research field with an anticipated limited amount of evidence overall. However, most included studies were RCTs nonetheless (see Results section).
Beyond that, studies had to report pre- and post-intervention scores of standardized self-report scales related to three constructs: (1) trait mindfulness, (2) neuroticism, and (3) mental health. There were more studies reporting pre and post-intervention scores for neuroticism, compared to other Big Five personality traits, because of its well-known implications for mental health (Roberts et al., 2017). Yet, in conjunction with trait mindfulness scores, this number still was small (10 studies reported pre and post scores for both trait mindfulness and neuroticism/emotional stability). Therefore, the corpus of primary studies was augmented by including studies that measured at least one of three major correlates (proxy variables) of neuroticism as well: trait anxiety, trait worry, and trait negative affect. There are strong associations between trait worry and neuroticism (r = .56 to .76; de Bruin et al., 2007; McEvoy & Mahoney, 2012; Muris et al., 2005; Kennair et al., 2021; Servaas et al., 2014). Also, anxiety represents a facet of neuroticism in both the NEO-PI and NEO-FFI personality inventories (Costa & McCrae, 1992). It further has been suggested that neuroticism and trait anxiety are one and the same trait and that both can be integrated into the construct of negative affectivity (Watson & Clark, 1984).
Measures related to mental health included scales on well-being, psychiatric symptoms, stress, and constructs like rumination. Measures related to physical complaints and biological markers were not included in this meta-analysis. There were no sampling frame restrictions, as MBIs have been shown to be effective for psychiatric, medical, and healthy individuals across all age groups (Goldberg et al., 2022).
Literature search strategies
Multiple types of scientific sources were considered as eligible data sources for this meta-analysis, such as scientific journal articles, dissertations, theses, and conference abstracts. The literature search was conducted by one researcher in online literature databases (Google Scholar, PsycINFO, PubMed, Web of Science, and Scopus) until August 2021, using a combination of keywords (see Table S1 in supplemental materials). The search terms were modified for the different databases according to their specifics (e.g., the list was shortened for Google Scholar, due to the imposed character limit). Additionally, all studies included in Roberts et al. (2017) and Tran et al. (2022) were scanned for eligibility, as were all studies that cited Roberts et al. (2017) and also contained the keyword “mindfulness” (using Google Scholar for this cited-reference search and filtering by target keyword).
Data extraction and management
Study features coded included general study information, descriptive aspects of the sample, and intervention characteristics. To assess changes in the designated outcome variables, the pre- and post-scores and the corresponding standard deviations were obtained for all outcomes. An exhaustive list of the coded information for the primary studies can be found in Table S2. For studies taken from Tran et al. (2022), coded data were already available, with the exception of study location, which was originally coded as country and was merged to the level of continents for the present meta-analysis.
Control groups were coded as waitlist, if participants received no intervention at all; as treatment as usual (TAU), if they received unspecific, unrelated, non-therapeutic treatments or health education; or as active control group, if participants received some kind of specific psychotherapeutic treatment, an aligned treatment, psychopharmacological treatment, or physical exercise. Waitlist and TAU controls were combined in parts of the analysis. According to their health status, participants themselves were further classified into three categories (type of client): psychiatric, non-psychiatric (i.e., nonclinical, healthy), and samples with other medical conditions (other medical).
The main study characteristics and the effect sizes of the primary studies were independently coded by a second rater for the 23 of the 45 studies that were not taken from Tran et al. (2022). This resulted in Cohen κ values ranging from .58 to .79 (Mdn = .66; range of percentage of agreement: 71.4%–95.7%; Mdn = 80.5%). For the pre and post scores, intraclass correlation coefficients (two-way ICC; McGraw & Wong, 1996) were used to assess levels of agreement. ICCs ranged between .80 and 1.00 (further information is provided in Table S3). The low values obtained for some variables were related to a series of systematic mistakes in the second coding. This concerned year of publication, for which date of acceptance or of being published online was erroneously coded, instead of the year in which the study was finally published in the journal (as originally intended). In addition, for some studies the second coder did not distinguish between RCTs and nonrandomized study designs. For continent, the codes for North America and Europe were consistently (but erroneously) reversed for the second half of the studies. All of these errors were rechecked with the original data and corrected. Any further disagreements were also checked back with the original data and discussed between the two raters until reaching consensus.
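As an illustration of the agreement statistics reported above, the following minimal Python sketch computes the percentage of agreement and Cohen κ for one categorical study feature coded by two raters. The function names and the example codes are hypothetical and serve illustration only; they are not taken from the study's coding sheets.

```python
from collections import Counter

def percent_agreement(rater1, rater2):
    """Share of studies on which both raters assigned the same code."""
    return sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)

def cohen_kappa(rater1, rater2):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater1)
    p_observed = percent_agreement(rater1, rater2)
    counts1, counts2 = Counter(rater1), Counter(rater2)
    # Chance agreement from the product of the raters' marginal proportions.
    p_chance = sum(counts1[c] * counts2[c] for c in set(counts1) | set(counts2)) / n**2
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical codes for study design in ten studies (illustration only).
first_coder = ["RCT"] * 5 + ["nonRCT"] * 5
second_coder = ["RCT"] * 4 + ["nonRCT"] * 5 + ["RCT"]

agreement = percent_agreement(first_coder, second_coder)
kappa = cohen_kappa(first_coder, second_coder)
```

With these made-up codes, the raters agree on 8 of 10 studies, while chance agreement is .50, so κ is lower than the raw percentage of agreement.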
Risk of bias assessment
Study quality was assessed with the Risk of Bias Tool 2 (RoB 2; Sterne et al., 2019) by one researcher. This tool originally has been conceptualized for randomized trials, but, for comparability reasons, was utilized for studies with other designs as well. Study quality was rated across five domains, namely, randomization process, deviations from intended interventions, missing outcome data, measurement of the outcome, and selection of reported results, with the categories high, with some concerns, or low risk of bias in each domain per study. Additionally, an overall study rating was calculated, based on the ratings of all five domains.
Outlier diagnostics
A meta-analytic outlier analysis was performed prior to the main analysis, using forest plots and Cook’s distance values. Effect-size estimates of all three outcome variables were also checked for plausibility and mistakes in the coding for the entire study sample.
Computation of effect sizes
Standardized mean differences were calculated for all three outcomes, separately for the studies’ intervention and control groups, using the metric of Cohen d = (M_post − M_pre)/SD_pooled, where M_pre and M_post are the pre- and post-score means and SD_pooled is the pooled standard deviation of the two time points (see Formulae 11.9 and 11.10 in Cumming, 2012, p. 291). If multiple effect sizes were available for a single outcome, their overall mean was used. Magnitude of effect size was interpreted as follows (Funder & Ozer, 2019; converting their benchmarks in the metric of r of .05, .10, .20, .30, and .40 to Cohen d and rounding to the nearest multiple of 0.1): very small effect ≈ 0.1, small ≈ 0.2, medium ≈ 0.4, large ≈ 0.6, and very large ≈ 0.9. Direction of effects was chosen such that a positive sign indicated increases in mental health (e.g., a reduction in psychiatric symptoms), increases in trait mindfulness, and decreases in neuroticism and its correlates. Three effect sizes in the metric of Cohen d (one each for changes in mental health, trait mindfulness, and neuroticism) were obtained for each treatment arm in each study (i.e., six effect sizes overall for a study with one treatment and one control group).
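The effect-size computation can be sketched in a few lines of Python. Two points are assumptions rather than details confirmed by the text: the pooled standard deviation is taken as the root mean square of the pre and post standard deviations, and the benchmarks are converted with the common formula d = 2r/√(1 − r²), which does reproduce the stated values. Function names and example numbers are hypothetical.

```python
import math

def pooled_sd(sd_pre, sd_post):
    # Assumption: pooled SD = root mean square of the two time points' SDs.
    return math.sqrt((sd_pre**2 + sd_post**2) / 2)

def cohen_d_change(m_pre, m_post, sd_pre, sd_post, higher_is_better=True):
    """Pre-post Cohen d, signed so that positive values mean improvement."""
    d = (m_post - m_pre) / pooled_sd(sd_pre, sd_post)
    # Flip the sign for scales on which lower scores are better
    # (e.g., psychiatric symptoms, neuroticism).
    return d if higher_is_better else -d

def mean_effect(effects):
    """Average several effect sizes that belong to the same outcome."""
    return sum(effects) / len(effects)

def r_to_d(r):
    # Assumption: the standard conversion from r to Cohen d.
    return 2 * r / math.sqrt(1 - r**2)

# Hypothetical symptom scale: mean drops from 20 to 15, SD = 10 at both times.
d_symptoms = cohen_d_change(20, 15, 10, 10, higher_is_better=False)

# Benchmarks in r converted to d and rounded to one decimal.
benchmarks = [round(r_to_d(r), 1) for r in (0.05, 0.10, 0.20, 0.30, 0.40)]
```

In this made-up example, the reduction in symptoms yields d = 0.5 after the sign flip, and the converted benchmarks match the values of 0.1, 0.2, 0.4, 0.6, and 0.9 given above.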
Statistical approach
This meta-analysis utilized a novel statistical approach to meta-analytic mediation analysis introduced by Tran et al. (2022) for the between-study level. Meta-analytic structural equation modeling (such as TSSEM; Cheung & Chan, 2005), which allows for investigations on the within-study level, requires the correlation matrices of primary studies to test mediational models. However, such correlation matrices often are unavailable from research reports. Yet, mediation analysis still can be performed on the between-study level via a three-level meta-analysis (TLMA; e.g., Konstantopoulos, 2011), utilizing the causal steps approach (Baron & Kenny, 1986). TLMA is akin to the random-effects meta-analytic model in that it assumes, and estimates, heterogeneity in the effect sizes. However, it can do so on different levels of the data (i.e., across and within studies), as in multilevel modeling of primary study data. In the present application of TLMA, correlation matrices are not required, only the effect sizes of changes in pre and post scores for all continuous variables that are part of the mediational model. As fewer data are required, it is possible to include more studies and to attain higher analytic power using this approach, as compared to TSSEM. In addition, unlike TSSEM, TLMA also easily allows for testing moderator effects for any study-level variable. This enabled the examination of a number of potential effect moderators (see Moderator Analyses, below).
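The logic of the causal steps approach on the between-study level can be illustrated with a deliberately simplified Python sketch. It uses fixed inverse-variance weights in an ordinary weighted regression and a single mediator, and it omits the random-effects variance components that TLMA estimates, so it is not the model fitted in this meta-analysis. All names and numbers are hypothetical; the outcome is constructed to depend on the mediator only.

```python
import numpy as np

def weighted_regression(y, predictors, variances):
    """Inverse-variance weighted least squares; returns [intercept, slopes...]."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y))] + [np.asarray(p, dtype=float) for p in predictors])
    sqrt_w = np.sqrt(1.0 / np.asarray(variances, dtype=float))
    beta, *_ = np.linalg.lstsq(X * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return beta

def causal_steps(group, d_mediator, d_outcome, var_mediator, var_outcome):
    """Baron-Kenny style steps with treatment arms as the unit of analysis."""
    c_total = weighted_regression(d_outcome, [group], var_outcome)[1]   # Step 1
    a = weighted_regression(d_mediator, [group], var_mediator)[1]       # Step 2
    step3 = weighted_regression(d_outcome, [group, d_mediator], var_outcome)
    return {"c_total": c_total, "a": a, "c_direct": step3[1], "b": step3[2]}

# Made-up data for eight treatment arms (1 = intervention, 0 = control).
group = [1, 1, 1, 1, 0, 0, 0, 0]
d_mediator = [0.5, 0.3, 0.6, 0.4, 0.1, 0.2, 0.0, 0.3]
d_outcome = [0.1 + 0.8 * m for m in d_mediator]  # depends on the mediator only
sampling_var = [0.02, 0.03, 0.025, 0.04, 0.03, 0.02, 0.05, 0.03]

paths = causal_steps(group, d_mediator, d_outcome, sampling_var, sampling_var)
```

Because the made-up outcome depends on the mediator alone, the group effect is positive in Step 1 but vanishes in Step 3 once the mediator is entered, which is the pattern interpreted as full mediation in the causal steps approach.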
In the present meta-analysis, the effect sizes of the individual primary studies were non-independent and organized in a hierarchical structure of three levels (see Figure 1). TLMA provided a superior method of dealing with these dependent effect sizes via the modeling of variance components for each level. The (observed) sampling errors of the extracted effect sizes were on Level 1. On Level 2, differences between treatment and control groups in the same study were captured by the within-study variance σ2 (variance roots, i.e., standard deviation values, instead of the variance values themselves, are presented throughout the present meta-analysis, as this provided higher accuracy with fewer digits, and estimates are on the same scale as the effect sizes themselves). On Level 3, differences between the studies were captured by the between-study variance σ3. Additionally, the current data structure necessitated a further variance component, as treatment arms not only differed within studies, but also between studies (see Figure 1: Studies 1 and 2 shared the same type of control group, but differed in treatment groups). This constellation is referred to as a cross-classified data structure. Disregarding such a structure might result in biased standard errors and variance components (Luo & Kwok, 2012; Meyers & Beretvas, 2006). Therefore, an additional variance component (σT×C) was estimated, taking into account the combination of treatment and control groups.

Three-level structure of the data for the meta-analysis. Note. Mi = trait mindfulness, Ne = neuroticism, MH = mental health.
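The cross-classified structure described above can be illustrated with a small, purely hypothetical data layout: studies are nested units, whereas the combination of treatment and control type cuts across studies. The study labels and the assignment of intervention types below are invented for illustration and do not correspond to any primary study.

```python
from collections import defaultdict

# Hypothetical rows: (study id, treatment type, control type).
arms = [
    ("Study 1", "MBSR", "waitlist"),
    ("Study 2", "MBCT", "waitlist"),
    ("Study 3", "MBSR", "waitlist"),
]

# Group studies by their treatment-by-control combination (the crossed factor).
studies_by_combination = defaultdict(list)
for study, treatment, control in arms:
    studies_by_combination[f"{treatment} x {control}"].append(study)

# Studies 1 and 3 share a combination; Study 2 shares only the control type.
print(dict(studies_by_combination))
```

Because the same combination can occur in several studies, a grouping factor of this kind is crossed with, rather than nested in, the study factor.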
The mediational model tested via TLMA is presented in Figure 2. In this model, the variable coding treatment and control groups represented the independent (causal) variable, whereas change in mental health represented the dependent (or outcome) variable. Change in neuroticism and change in trait mindfulness represented two intervening (or mediator) variables. The two mediators were arranged in serial succession, such that the model allowed statistically accounting for changes in trait mindfulness by changes in neuroticism. The model thus tested whether changes in mental health, which previously have been attributed to changes in trait mindfulness (Tran et al., 2022), could be attributed to changes in neuroticism as well. Descriptions of all paths of this model are provided in the note to Figure 2. The paths themselves were estimated in a series of meta-analytic regression analyses, according to the causal steps approach of Baron and Kenny (1986; see Estimation of the Paths of the Mediational Model, below). These regression analyses formally constituted meta-analytic moderator analyses. Hence, the mediational model was investigated on the between-study level by utilizing a series of meta-analytic moderator analyses in the present approach.

Meta-analytic mediational model. Note.
Path a = total effect of group allocation on change in neuroticism; Path b = direct effect of change in neuroticism on change in trait mindfulness, controlling for group allocation; Path c = direct effect of change in trait mindfulness on change in mental health, controlling for group allocation and change in neuroticism; Path d1 (not shown) = total effect of group allocation on change in mental health; Path d1’ = direct effect of group allocation on change in mental health, controlling for change in neuroticism and trait mindfulness; Path d2 (not shown) = total effect of group allocation on change in trait mindfulness; Path d2’ = direct effect of group allocation on change in trait mindfulness, controlling for change in neuroticism; Path d3 (not shown) = direct effect of change in neuroticism on change in mental health, controlling for group allocation; Path d3’ = direct effect of change in neuroticism on change in mental health, controlling for group allocation and change in trait mindfulness. We tested whether there were indirect effects of (1) group allocation on change in mindfulness via change in neuroticism (a*b); (2) change in neuroticism on mental health via change in mindfulness (b*c); and (3) group allocation on mental health via the two candidate mediators (a*b*c). Furthermore, we tested whether (4) group allocation had an indirect effect on mental health via change in neuroticism alone, in the presence of change in trait mindfulness (a*d3’) and ensured that (5) trait mindfulness actually mediated the effect of group allocation on mental health in the absence of change in neuroticism, by testing the indirect effect d2*c0 for statistical significance, with c0 (not shown) denoting Path c in the absence of change in neuroticism in the model.
Baseline models
The data were imported into R (version 4.1.1), and the R package metafor (Viechtbauer, 2010) was used for the majority of the computations. Following Assink and Wibbelink (2016), baseline models were fitted to the data first. The intercepts in these models estimated the amount of change in each individual outcome (change in neuroticism, trait mindfulness, and mental health) across all studies and participants regardless of group allocation (i.e., treatment vs. control group). For each baseline model, the corresponding variance components for the within-study variance (σ2), the between-study variance (σ3), and for the treatment and control group combinations (σT×C) are provided as well. Likelihood ratio (LR) tests were performed to examine whether any of these variance components were statistically significant (p < .05) in any of the three baseline models. Since variance components cannot take negative values, the respective tests were one-sided. Confidence intervals for the variance components were estimated with the profile-likelihood method. However, to conform to the hierarchical data structure, the three-level structure was upheld even if the variance components were not statistically significant. For all other model parameters, the Knapp and Hartung (2003) adjustment, which is based on the t distribution, was used for tests of statistical significance, also using a robust sandwich-type estimator for the standard errors (Hedges et al., 2010). Following prior related research (Tran et al., 2022), the maximum-likelihood method was used for parameter estimation, because this made it possible to perform LR tests.
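The one-sided LR test for a single variance component can be sketched as follows. This is an illustration in Python, not the metafor code used for the analyses; it assumes that the one-sided p value is obtained by halving the p value of a chi-square test with 1 degree of freedom, and the log-likelihood values are hypothetical.

```python
import math

def lr_test_variance_component(loglik_full, loglik_reduced):
    """LR statistic and one-sided p value for dropping one variance component."""
    lr = max(0.0, 2 * (loglik_full - loglik_reduced))
    # Survival function of a chi-square variable with 1 df.
    p_two_sided = math.erfc(math.sqrt(lr / 2))
    # Assumption: the one-sided p value is obtained by halving.
    return lr, p_two_sided / 2

# Hypothetical log-likelihoods of the models with and without the component.
lr, p = lr_test_variance_component(-20.0, -22.5)
print(round(lr, 2), round(p, 4))
```

The reduced model here is the one in which the variance component under test is fixed to zero.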
Effect-size heterogeneity was examined with Q tests, and I2 values were calculated to provide relative estimates of cross-study effect heterogeneity. Both these statistics were calculated overall, as well as for the individual variance components in the model, using a formula provided at https://www.metafor-project.org/doku.php/tips:i2_multilevel_multivariate. Overall I2 values around 75%, 50%, and 25% were interpreted as high, medium, and low levels of heterogeneity (Higgins et al., 2003).
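A minimal sketch of how overall and component-specific I2 values can be derived for an intercept-only multilevel model is given below. It follows the general logic of relating each variance component to the sum of all variance components plus a "typical" sampling variance; that this matches the cited formula in every detail is an assumption, and the input values are hypothetical.

```python
def multilevel_i2(sampling_variances, sigmas):
    """Overall and per-component I2 (in %) for an intercept-only model."""
    w = [1 / v for v in sampling_variances]
    k = len(w)
    # Trace of the projection matrix P for an intercept-only model.
    trace_p = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    typical_v = (k - 1) / trace_p
    # Variance roots (SDs) are reported in the text, so square them first.
    variances = {name: s ** 2 for name, s in sigmas.items()}
    total = sum(variances.values()) + typical_v
    per_component = {name: 100 * v / total for name, v in variances.items()}
    return sum(per_component.values()), per_component

# Hypothetical example: ten effect sizes with equal sampling variance.
overall, parts = multilevel_i2([0.04] * 10, {"sigma2": 0.1, "sigma3": 0.2})
print(round(overall, 1), {name: round(v, 1) for name, v in parts.items()})
```

The component-specific percentages sum to the overall I2 value, which is why the overall value can be reported first, followed by the shares of the individual components.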
Estimation of the paths of the meta-analytic mediational model
A series of regression analyses with varying predictors were performed to estimate the paths of the mediational model (Figure 2), mirroring the similarly stepwise approach of the widely used PROCESS macro (https://www.processmacro.org/index.html) for mediation analysis of primary data on the within-study level. In total, seven models were fitted to the data. All models included the causal variable (treatment arms) as predictor and either used change in neuroticism (Model 1; estimating Path a), change in trait mindfulness (Model 2 [estimating Path d2] and Model 3 [estimating Paths b and d2’]), or change in mental health (Model 4 [estimating Path d1] and Model 5 [estimating Path d3]) as the dependent variables (i.e., outcomes). Models 3 and 5 also included change in neuroticism, but not change in trait mindfulness, as a further predictor. Model 6 investigated change in mental health as outcome and included change in trait mindfulness, but not change in neuroticism, as a further predictor. This model was run in order to test whether there indeed is an indirect effect of the treatment arms via change in trait mindfulness on change in mental health (estimating Path c in the absence of change in neuroticism, c0, and testing d2*c0 for statistical significance). Model 7 (estimating Paths c, d1’, and d3’) resembled Model 6, but included change in neuroticism in addition. This model was used in order to test whether the indirect effect via change in trait mindfulness could statistically be attributed to change in neuroticism (i.e., whether a*d3’ is significant, whereas a*b*c is not). Two additional, not preregistered, analyses also (1) switched the order of neuroticism and trait mindfulness in the mediational model (see Figure 2) and (2) limited the corpus of studies to those which directly assessed neuroticism (i.e., not using any proxy measures). The second analysis allowed for comparisons concerning the consistency of results of the main analysis with the results exclusively based on studies that assessed neuroticism strictly directly.
The treatment arms were represented by a multicategorical variable which was dummy-coded for analysis. MBIs were used as baseline (combining all different interventions), with the dummy variables enabling tests of deviations of the active and combined TAU and waitlist control groups from this baseline (see Results, below). All models described in this section formally constituted meta-analytic moderator analyses. Tests of significance for all moderators in these analyses were performed with robust F tests (Frobust; Hedges et al., 2010).
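The dummy coding with MBIs as baseline can be sketched as follows. The function and variable names are hypothetical; the numerical example uses the Path a estimates reported later in the Results for the model controlling for small-study effects (0.25 for the MBIs, with relative effects of −0.19 and −0.37) to show how relative effects translate into model-implied absolute effects at the same covariate values.

```python
def dummy_code(arm, baseline="MBI", levels=("MBI", "active", "TAU_waitlist")):
    """Return 0/1 indicators for every level except the baseline."""
    if arm not in levels:
        raise ValueError(f"unknown arm: {arm}")
    return {f"is_{lvl}": int(arm == lvl) for lvl in levels if lvl != baseline}

def implied_effect(arm, intercept, deviations):
    """Intercept (baseline effect) plus the deviation of the coded arm."""
    codes = dummy_code(arm)
    return intercept + sum(deviations[name] * x for name, x in codes.items())

# Path a estimates from the small-study-effects model reported in the Results.
deviations = {"is_active": -0.19, "is_TAU_waitlist": -0.37}
effects = {arm: round(implied_effect(arm, 0.25, deviations), 2)
           for arm in ("MBI", "active", "TAU_waitlist")}
print(effects)
```

With this coding, the intercept is the effect in the MBIs, and each dummy coefficient is the deviation of a control condition from that baseline, which is how the relative effects in the Results are to be read.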
The main analysis was not based on a model that estimated all paths in the mediational model (Figure 2) simultaneously (cf. Tran et al., 2022). This was because the number of available primary studies was smaller and the mediational model more complicated in the present study than the one tested in Tran et al. (2022), incorporating a large number of parameters which set tight constraints on computational feasibility. However, the results of a simplified model, estimating all paths of the mediational model simultaneously, are presented as well. This allowed for comparisons concerning the consistency of the obtained results of the stepwise approach in the main analysis with those of an alternative one-step approach.
Ninety-five percent confidence intervals for all indirect effects of the mediational models (d2*c0, a*b, b*c, a*b*c, and the alternative path a*d3’; see Figure 2) were obtained with the joint bootstrapping of the individually required models (again mirroring the approach of the PROCESS macro). In the bootstrap analyses (percentile bootstrap; using 1000 bootstrap samples, as in Tran et al., 2022), the variance estimates of σT×C were set to 0, and interactions of Paths b and c with active and combined TAU and waitlist controls (which were not significant in prior analyses) were not included in the models. This was done for ease of computation and to increase the likelihood of model convergence. Of the 1000 bootstrap samples for the tests of a*b, b*c, a*b*c, and a*d3’, there were 15 instances of non-converging models (1.5%), for which new bootstrap samples were created in a second run; for d2*c0, 11 samples (1.1%) needed to be redrawn. In the additional analysis, 4 samples (0.4%) had to be redrawn.
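The logic of the joint percentile bootstrap can be sketched in simplified form. The Python example below is not the TLMA-based procedure used here: it uses unweighted least squares on hypothetical, simulated study-level data with a single mediator and a binary group variable, and it ignores sampling variances and the multilevel structure. It only illustrates that both paths are re-estimated on the same bootstrap sample, that their product is stored, and that samples for which the model cannot be estimated are redrawn.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def indirect_effect(rows):
    """a*b for rows of (group, mediator change, outcome change), group in {0, 1}."""
    groups = {g: [r for r in rows if r[0] == g] for g in (0, 1)}
    if not groups[0] or not groups[1]:
        raise ValueError("a group is missing from the sample")
    m_bar = {g: mean([r[1] for r in rs]) for g, rs in groups.items()}
    y_bar = {g: mean([r[2] for r in rs]) for g, rs in groups.items()}
    a = m_bar[1] - m_bar[0]  # effect of group on the mediator
    # b: slope of outcome on mediator, controlling for group
    # (equivalent to centering both variables within groups).
    sxy = sum((m - m_bar[g]) * (y - y_bar[g]) for g, m, y in rows)
    sxx = sum((m - m_bar[g]) ** 2 for g, m, y in rows)
    if sxx == 0:
        raise ValueError("no mediator variance within groups")
    return a * (sxy / sxx)

def percentile_bootstrap_ci(rows, n_boot=1000, seed=1):
    rng = random.Random(seed)
    estimates = []
    while len(estimates) < n_boot:
        sample = [rng.choice(rows) for _ in rows]
        try:
            estimates.append(indirect_effect(sample))
        except ValueError:
            continue  # redraw samples for which the model cannot be estimated
    estimates.sort()
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot) - 1]

# Hypothetical data: 40 arms, true a = 0.5 and b = 0.8.
data_rng = random.Random(0)
rows = []
for i in range(40):
    g = i % 2
    m = 0.5 * g + data_rng.gauss(0, 0.2)
    y = 0.8 * m + data_rng.gauss(0, 0.1)
    rows.append((g, m, y))

lo, hi = percentile_bootstrap_ci(rows)
print(lo, hi)
```

An indirect effect is regarded as significant when the resulting interval does not include zero, which is the decision rule applied to the indirect effects reported in the Results.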
Meta-analytic effect moderator analyses
A number of additional relevant variables were investigated as to whether they were associated with the observed effects in primary studies (following similar analyses in Tran et al., 2022). First, type of client was investigated in all models, contrasting studies with psychiatric samples, non-psychiatric samples, and samples with other medical conditions (again using dummy-coding and psychiatric samples as baseline). Second, type of MBI (MBSR, MBCT, others) and control group (active, TAU, waitlist) were investigated (all dummy-coded) in all models. Third, effects of treatment duration were examined for the MBI study arms. Lastly, we also tested whether the measurement method for neuroticism (directly vs. via proxy variables) had an effect on the observed magnitude of change in this variable. Inter alia, these analyses served to elucidate and delineate which interventions worked for whom and under which conditions such effects did appear (see Bryan et al., 2021).
Publication bias and study quality
In addition to the above moderator analyses, the corpus of included primary studies was examined for small-study effects. This relates to the phenomenon that smaller studies might report larger effects, which could be due to publication bias (Sterne et al., 2001). Hence, it was examined whether less precise studies (i.e., smaller studies with larger standard errors) tended to report larger effects. This was analyzed by adding the standard error of the dependent effect size as a predictor to the baseline model in separate analyses (also controlling for type of client), which is akin to performing Egger’s regression test (Egger et al., 1997) in a multilevel framework. Further, it was examined whether study quality predicted effect size beyond any small-study effects. Risk of bias ratings for the different domains of study quality (randomization process, deviations from intended interventions, missing outcome data, measurement of the outcome, and selection of the reported result), as assessed with the Cochrane RoB 2 instrument, were added as predictors alongside the standard error in separate models.
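The core idea of this Egger-type test can be sketched as a weighted regression of the effect sizes on their standard errors. The example below is a single-level simplification with inverse-variance weights; it does not include the variance components or the control for type of client, and the data are hypothetical and constructed so that the true slope is 2.

```python
def weighted_regression_on_se(effects, standard_errors):
    """Weighted least-squares fit of effect size on its standard error."""
    w = [1 / se ** 2 for se in standard_errors]  # inverse-variance weights
    sw = sum(w)
    x_bar = sum(wi * se for wi, se in zip(w, standard_errors)) / sw
    y_bar = sum(wi * d for wi, d in zip(w, effects)) / sw
    sxy = sum(wi * (se - x_bar) * (d - y_bar)
              for wi, se, d in zip(w, standard_errors, effects))
    sxx = sum(wi * (se - x_bar) ** 2 for wi, se in zip(w, standard_errors))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical effect sizes that grow with the standard error.
ses = [0.10, 0.15, 0.20, 0.25, 0.30]
ds = [0.2 + 2.0 * se for se in ses]
intercept, slope = weighted_regression_on_se(ds, ses)
print(round(intercept, 3), round(slope, 3))
```

A positive slope indicates that less precise studies report larger effects; the intercept can be read as the expected effect of a hypothetical study with a standard error of zero.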
For the visual inspection of publication bias and conspicuous patterns relating to statistical significance, contour-enhanced funnel plots were applied (Peters et al., 2008; see Kossmeier et al., 2020, for the advantages of this graphical device). The contours of this type of funnel plot display the different levels of statistical significance (p < .05, p < .01). This helps to detect publication bias due to the suppression of non-significant findings. Asymmetries arising from studies missing in areas where nonsignificant studies would have been expected were interpreted to be likely caused by publication bias.
Results
Characteristics of the primary studies
Overall, 45 studies were included, totaling 2913 participants. A PRISMA study flow chart is provided in Figure S1, a bar chart of studies’ publication years in Figure S2. One study (*Delgado-Pastor et al., 2015) had two intervention groups, of which both were included, resulting in 46 MBI treatment arms across the set of 45 studies. Ten studies directly measured neuroticism, the remaining 35 studies measured proxy variables of neuroticism (17 anxiety, 16 worry, 1 anxiety and worry, 1 negative affect). The Big Five personality traits were most frequently measured with the Big 5 Personality Inventory (John & Srivastava, 1999), trait anxiety mainly with the State-Trait Anxiety Inventory (Vigneau & Cormier, 2008) and trait worry primarily with the Penn State Worry Questionnaire (Meyer et al., 1990). A full list of the used inventories can be found in Table S5 in the supplemental material. Of the participants, 75% were women, and sample mean age across studies was 39 years (SD = 14.2, Mdn = 41.4, IQR = 18.6). Of the 45 included primary studies, 28 comprised non-psychiatric, 10 psychiatric, and 7 other medical clients.
The majority (27 out of 45 studies) had treatments with a duration of 8 weeks; the mean treatment duration across all studies was 7.6 weeks (minimum 2, maximum 18). Thirty-nine studies were designed as RCTs, four were non-randomized cohort studies (Crescentini et al., 2018; Fabbro et al., 2020; Robins et al., 2019; Smith et al., 2008), and two were single-arm trials (Krasner et al., 2009; Ortet et al., 2020). MBCT was investigated in 12 studies, MBSR in 10, and other MBIs in 24 studies. Only 6 studies (Agee, 2006; Avdagic et al., 2014; Giommi et al., 2021; Smith et al., 2008, 2018; Zettle, 2003) had active control groups, such as cognitive-behavioral therapy, cognitive-behavioral stress reduction, or progressive muscle relaxation; the majority had TAU (11) or waitlist (26) control groups. Studies were published between 2003 and 2021. A total of 19 studies were conducted in Europe (4 from Italy, 3 each from Spain and the Netherlands, 2 each from England, Norway, and Ireland, and single studies from Scotland, Germany, and Sweden), 15 in North America (14 in the United States, 1 in Canada), 5 studies in Asia (2 from China, single studies from Japan, Israel, and Iran), 5 studies in Australia, and 1 study in Brazil.
A list of all included studies is provided in the supplemental materials. All coded study information and extracted effect sizes are provided in Tables S4 and S5; domain ratings of study quality are provided in Table S6. Overall, 21 studies were evaluated to exhibit “high” risk of bias, whereas 24 received a “some concerns” rating. The randomization process was rated in 11, 17, and 17 studies to entail “high,” “some concerns,” or “low” risk of bias, respectively. The respective numbers were 9, 12, and 24 for the “deviations from intended interventions” domain, and 8, 5, and 32 for the “missing outcome data” domain. All 45 studies were rated with “some concerns” in the “measurement of the outcome” domain, as only self-report measures were used. Thirty-six studies were rated with “some concerns” in the “selection of reported results” domain, whereas nine were rated with “low” risk of bias.
One effect size in one study was identified as a possible outlier (Russell et al., 2019; concerning worry as measured with the PSWQ; for forest plots, see Figures S3 to S5; for Cook’s distances, see Figure S6). The exceptionally large effect size (d = 2.14 for the intervention group and d = 1.96 for the control group) appeared to be a consequence of reporting standard errors instead of standard deviations. Its value was accordingly adjusted and analyses were performed with the corrected value (d = 0.31 for the intervention group and d = 0.29 for the control group).
Baseline models
Parameters of the baseline TLMA models for all outcome variables.
Note. SE = standard error; CI = confidence interval; t = t value with Knapp and Hartung (2003) adjustment; LRT = likelihood-ratio test; Q = heterogeneity among all effect sizes; df = degrees of freedom; I2 = total effect-size heterogeneity of the model is shown in the first line, followed by the percentage of heterogeneity accounted for by the variance components; Intercept = effect-size estimate across all groups; σ2 = within-study variance (Level 2); σ3 = between-study variance (Level 3); σT×C = heterogeneity caused by varying treatment and control group combinations within studies.
***p < .001.
Q tests were significant for all outcomes, suggesting cross-study effect-size heterogeneity beyond chance (i.e., mere sampling error). I2 values suggested moderate to high amounts of effect heterogeneity relative to sampling error, ranging from 67% (neuroticism) to 84% (mental health). Heterogeneity was, for all three outcomes, mostly due to variation of treatment and control group combinations within studies, followed by between-study differences for changes in neuroticism and changes in mindfulness, and within-study differences for changes in mental health. All variance components accounted for significant amounts of effect-size heterogeneity in at least one of the three outcomes. Given this, all variance components were kept in all further analyses.
Moderator analyses
Type of client and type of neuroticism measure
Psychiatric, non-psychiatric, and other medical samples differed with regard to the overall change in neuroticism, Frobust (2, 42) = 7.99, p = .001, but neither with regard to trait mindfulness, Frobust (2, 42) = 1.26, p = .29, nor mental health, Frobust (2, 42) = 1.68, p = .20. Changes in neuroticism were significantly smaller in other medical samples, d = 0.22, 95% CI = [0.17, 0.27], in comparison to psychiatric, d = 0.52, 95% CI = [0.35, 0.68], and non-psychiatric samples, d = 0.37, 95% CI = [0.25, 0.49]. Sample (i.e., client) type was included as a control variable in all further analyses. Controlling for type of client, proxy and direct neuroticism measures did not differ in their amount of change (direct relative to proxy: d = −0.07, 95% CI = [−0.24, 0.09], p = .38).
Treatment duration
Treatment duration moderated neither overall changes in neuroticism (Frobust (1, 41) = 0.05, p = .83), nor changes in trait mindfulness (Frobust (1, 41) = 1.43, p = .24) or mental health (Frobust (1, 41) = 0.43, p = .52).
Evidence of publication bias and effects of study quality
The effect sizes of the changes in neuroticism, trait mindfulness, and mental health were all positively associated with their respective standard errors (neuroticism: B = 2.49, SE = 0.54, 95% CI = [1.39, 3.58], p < .001; trait mindfulness: B = 2.09, SE = 0.73, 95% CI = [0.62, 3.57], p = .007; mental health: B = 3.22, SE = 0.75, 95% CI = [1.70, 4.74], p < .001). That is, studies with smaller samples and correspondingly less precise effect estimates tended to report larger changes in neuroticism, trait mindfulness, and mental health than larger studies (with more precise effect estimates). Small-study effects were also evident by the asymmetric patterns in the contour-enhanced funnel plots (Figure S7).
All five domain ratings and the overall study quality rating were investigated for possible associations with the reported effect sizes over and above small-study effects in all three outcome variables. Each quality rating was individually tested against small-study effects, resulting in 6 * 3 = 18 analyses. However, as the quality rating for the domain “measurement of the outcome” was the same for all studies (see above), 5 * 3 = 15 such analyses were performed in total. To counteract type-1-error inflation, a Bonferroni correction was applied, resulting in an adjusted α level of 0.05/15 = 0.003. There were no substantial or consistent effects of study quality (ps ≥ .041; detailed results are provided in Tables S7 to S9).
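The Bonferroni arithmetic above can be checked in a few lines of Python. The only values used are those stated in the text (15 tests, a nominal α of .05, and a smallest reported p value of .041); the variable names are illustrative.

```python
n_tests = 5 * 3            # five informative quality ratings, three outcomes
alpha_adjusted = 0.05 / n_tests
smallest_p = 0.041         # smallest p value reported in the text

# The smallest p value is compared against the adjusted threshold.
survives_correction = smallest_p < alpha_adjusted
print(round(alpha_adjusted, 3), survives_correction)
```

Because even the smallest p value exceeds the adjusted α level, none of the 15 study-quality effects remains significant after correction.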
Meta-analytic mediation analysis on the between-study level
Type of treatment and control group (paths a, d1, and d2)
Effect-size estimates of change in the outcome variables for three MBI and three control conditions.
Note. All effect estimates are in Cohen d, with associated robust standard errors and 95% confidence intervals, and refer to psychiatric samples; values for the control variables represent deviations from the studies with psychiatric samples; MBSR = mindfulness-based stress reduction; MBCT = mindfulness-based cognitive therapy; MBI = mindfulness-based intervention; TAU = treatment as usual; σ2 = within-study variance (Level 2); σ3 = between-study variance (Level 3); σT×C = heterogeneity caused by varying treatment and control group combinations within studies; a = not available (upper bound could not be obtained due to a convergence problem).
*p < .05, **p < .01, ***p < .001.
Effect-size estimates of change in the outcome variables comparing MBIs to the active and combined TAU and waitlist control conditions.
Note. All effect estimates are in Cohen d, with associated robust standard errors and 95% confidence intervals, and refer to psychiatric samples; values for the control variables represent deviations from the studies with psychiatric samples; MBI = mindfulness-based intervention; σ2 = within-study variance (Level 2); σ3 = between-study variance (Level 3); σT×C = heterogeneity caused by varying treatment and control group combinations within studies.
*p < .05, **p < .01, ***p < .001.
Effects of the MBIs and relative effects of the control conditions in the mediational model on the between-study level.
Note. N = neuroticism; TM = trait mindfulness; MH = mental health. All effect estimates are in the metric of Cohen d, with associated robust standard errors and 95% confidence intervals (for the paths involving continuous predictors, i.e., b, c, d3, and d3’, this requires assuming values of 1 in each of these predictors to directly interpret the numbers in the table as mean differences); effects (and significance levels) for the active and combined TAU and waitlist controls are relative (i.e., deviations) to the effects of the mindfulness-based interventions (MBIs).
*p < .05, **p < .01, ***p < .001.
Mediating effects of trait mindfulness in a model without neuroticism
Change in trait mindfulness was significantly associated with change in mental health, Path c0, B = 0.55 (0.16), 95% CI = [0.23, 0.86], p = .001 (interactions with active and combined TAU and waitlist controls were not significant, ps = .89 and .58, respectively). Controlling for this association, the direct effect of the MBIs on change in mental health was d = 0.29 (0.10), 95% CI = [0.09, 0.49], p = .006 (relative effect in the active controls: 0.21 (0.21), 95% CI = [−0.22, 0.64], p = .33; relative effect in the combined TAU and waitlist controls: −0.23 (0.10), 95% CI = [−0.44, −0.02], p = .034).
The 95% bootstrap confidence interval of d2*c0 = 0.31 in the MBIs was [0.17, 0.50], that is, it did not include zero. Thus, there was a significant indirect effect via change in trait mindfulness on change in mental health in the MBIs. The point estimates and confidence intervals for the active and combined TAU and waitlist controls were 0.17 [−0.03, 0.47] and 0.03 [−0.05, 0.12], respectively. Only the relative indirect effect in the combined TAU and waitlist controls differed significantly from the estimate in the MBIs (−0.28 [−0.45, −0.15]; active controls: −0.15 [−0.34, 0.08]). This provides evidence that the indirect effect of change in trait mindfulness on change in mental health was of similar magnitude in the MBIs and active control groups, but diminished in the combined TAU and waitlist controls.
Mediating effects of neuroticism and trait mindfulness (paths b, c, d1’, d2’, d3, and d3’)
In the full mediational model, change in neuroticism was significantly associated with change in trait mindfulness (Path b; Table 4) in the MBIs, active controls, and combined TAU and waitlist controls alike (i.e., no evidence for moderated mediation). The association of change in mindfulness with change in mental health was small and nominally not significant (Path c; as above, no evidence for moderated mediation).
Controlling for both mediators, the direct effect of the MBIs on change in mental health was not significant and smaller than the total effect (compare Paths d1 and d1’ in Table 4). The same was true for the active controls (Paths d1 and d1’ did not differ between the MBIs and the active controls in Table 4, see also Table 3). For the combined TAU and waitlist controls, both the total effect (d = 0.09, see Table 3) and the direct effect (d = −0.06 (0.06), 95% CI = [−0.18, 0.05], p = .28; see also Table 4) were very small and not significant. However, relative to the MBIs, d1’ was significantly smaller in the combined TAU and waitlist controls.
Controlling for change in neuroticism, the direct effect of the MBIs on change in trait mindfulness was small and not significant (Path d2’ in Table 4), whilst the total effect had been of large size (Table 3). The direct effects of the active and combined TAU and waitlist controls were significantly smaller than the direct effect of the MBIs (Table 4) and not significant themselves, d = −0.24 (0.17), 95% CI = [−0.58, 0.10], p = .16 for the active controls, and d = −0.08 (0.05), 95% CI = [−0.19, 0.03], p = .13 for the combined TAU and waitlist controls.
In contrast, change in neuroticism had a medium-to-large direct effect on change in mental health in all three groups, both in a model that did not include change in trait mindfulness (Model 5; Path d3 in Table 4), and in a model that did include change in trait mindfulness as a further mediator (Model 7; Path d3’). This pattern suggests that the effect of the group allocation on mental health was mediated by change in neuroticism (but not change in trait mindfulness) in the full mediational model on the between-study level.
Tests of significance of the indirect effects
Absolute and relative indirect effects in the mediational model on the between-study level.
Note. N = neuroticism; TM = trait mindfulness; MH = mental health. All effect estimates are in the metric of Cohen d and are presented alongside their 95% bootstrap confidence intervals (which requires assuming values of 1 for each of the continuous predictors in Paths b, c, and d3’ to directly interpret the numbers in the table as mean differences); relative indirect effects for the active and combined TAU and waitlist controls are relative (i.e., deviations) to the effects of the mindfulness-based interventions (MBIs).

Meta-analytic mediational model with estimates for all paths for the MBIs on the between-study level. Note. All effect estimates are in the metric of Cohen d (assuming values of 1 for each of the continuous predictors in the Paths b, c, and d3’); parameter estimates stem from Models 1, 2, and 7; Path a = total effect of group allocation on change in neuroticism; Path b = direct effect of change in neuroticism on change in trait mindfulness, controlling for group allocation; Path c = direct effect of change in trait mindfulness on change in mental health, controlling for group allocation and change in neuroticism; Path d1’ = direct effect of group allocation on change in mental health, controlling for change in neuroticism and trait mindfulness; Path d2’ = direct effect of group allocation on change in trait mindfulness, controlling for change in neuroticism; Path d3’ = direct effect of change in neuroticism on change in mental health, controlling for group allocation and change in trait mindfulness. *p < .05, **p < .01, ***p < .001.
Controlling for small-study effects (i.e., also including the standard errors of the changes in trait mindfulness or mental health, respectively, in Models 1 and 7) somewhat diminished (but otherwise left intact) the estimates of Path a (MBIs: d = 0.25 (0.10), 95% CI = [0.05, 0.45], p = .017; relative effect in the active controls: −0.19 (0.15), 95% CI = [−0.48, 0.11], p = .21; in the combined TAU and waitlist controls: −0.37 (0.05), 95% CI = [−0.47, −0.27], p < .001), as well as the estimates of Path d3’ (MBIs: B = 0.33 (0.16), 95% CI = [0.004, 0.66], p = .048; relative effect in the active controls: B = 0.65 (0.56), 95% CI = [−0.49, 1.78], p = .25; in the combined TAU and waitlist controls: B = 0.17 (0.28), 95% CI = [−0.40, 0.75], p = .55).
Additional analyses
Switching the order of the mediators in the model, there was a significant indirect effect of a*b*c, but no indirect effect of a*d3’ (which now went over changes in trait mindfulness instead of changes in neuroticism; see Table S10 and Figure S8). Changes in trait mindfulness also only partially mediated changes in neuroticism (significant path d2’ in Figure S8). This indicated that while the mediating effect of trait mindfulness on mental health could be fully attributed to changes in neuroticism, this was not similarly the case the other way round on the between-study level.
Using only those studies which measured neuroticism directly and not by proxy (k = 10) yielded results consistent with those of the main analysis reported above, most of whose studies relied on proxy measures (see Table S11). Also, the results of an analysis that investigated the mediational model in a single step were fully consistent with the above-presented results of the stepwise approach (see Tables S12 and S13).
Discussion
This meta-analysis investigated whether intervening variables may account for previously reported associations of change in trait mindfulness with change in self-reported mental health in mindfulness-based interventions (MBIs), as well as in active and TAU and waitlist controls in such intervention designs on the between-study level. We found that change in neuroticism mediated change in trait mindfulness and fully accounted for its mediational effects on mental health. Further, controlling for change in neuroticism, most differences in the effects of MBIs, active controls, and combined TAU and waitlist controls on mental health vanished. Thus, improvements in mental health achieved through psychotherapeutic interventions in the corpus of studies eligible for this meta-analysis could be alternatively attributed to changes in trait mindfulness or in personality on the between-study level. Seen in this perspective, trait mindfulness appears to be nothing other than a proxy for neuroticism, at least at the level of self-report measurement that is currently available to scientific research.
The current results support prior related research suggesting a common latent structure for the Big Five personality dimensions and the facets of mindfulness (Spinhoven et al., 2017; Tran et al., 2020), highlight the definitional and empirical overlaps of both constructs (e.g., Giluk, 2009), and extend these findings and insights to the field of intervention research (see Altgassen et al., 2024). Importantly, the current results may not only explain increases of trait mindfulness in non-MBIs (e.g., Goldberg et al., 2019; Tran et al., 2022), but also previously reported similar associations between change in trait mindfulness and change in mental health in MBIs, active controls, and TAU and waitlist controls (Tran et al., 2022). Furthermore, they also may explain the effect mediation of MBIs on mental health via decreased repetitive negative thinking (Gu et al., 2015), which likely can be attributed to concomitant decreases in neuroticism as well.
Thus, trait mindfulness may well be no exclusive mediator of MBIs (Tran et al., 2022), because it shares central characteristics with neuroticism, which is amenable to change through psychotherapeutic interventions, largely independent of their specific goals and ingredients (Roberts et al., 2017). As neuroticism is generally highly relevant for mental health, the present findings therefore also bear some relevance for, and relate to, the Dodo bird verdict, which states that different types of psychotherapeutic interventions may produce similar outcomes, regardless of their specific components (Wampold et al., 1997). A thereby implied common-factors perspective has recently also been advanced for MBIs (Goldberg, 2022). Transdiagnostic cognitive-behavioral therapy (CBT) treatment protocols do not differ in efficacy from disorder-specific CBT protocols (e.g., Newby et al., 2015) and there are proposals for unified protocols for the transdiagnostic treatment of emotional disorders (Barlow et al., 2020), which are assumed to be linked to neuroticism in the HiTOP model. Hence, different treatments may show similar efficacy, because they address in many cases the same superspectrum of (co-occurring and comorbid) disorders, and the same personality trait that is linked with these disorders. The present results appear compatible with such a reasoning for MBIs. Other types of interventions still need more investigation.
The observed magnitude of change in neuroticism, trait mindfulness, and mental health across the studies eligible for the present evidence synthesis was in good accordance with prior results (Goldberg et al., 2019, 2022; Quaglia et al., 2016; Roberts et al., 2017; Tran et al., 2022; Visted et al., 2015). The various MBIs and the active controls did not differ concerning observed changes in neuroticism and mental health. This is in accordance with preceding findings (Goldberg et al., 2018; Khoury et al., 2013; Roberts et al., 2017), which similarly did not find relevant differences between different psychotherapeutic interventions.
Increases in trait mindfulness were nearly twice the size for the MBIs compared to the active controls (d = 0.55 vs. 0.29), yet, this difference did not appear to be relevant for improvements in neuroticism and mental health, which were of similar size in both groups (ds ∼ 0.65). Changes in neuroticism were small-to-medium in TAU and waitlist controls, but not significant for trait mindfulness and mental health.
Of all mindfulness facets, Nonjudge, Actaware, and Nonreact appear to be most strongly associated with mental health (Carpenter, Conroy, Gomez, Curren, & Hofmann, 2019) and neuroticism (Hanley & Garland, 2017). Observing (i.e., the ability to notice or attend to experiences) appears to be the one facet that is increased first and foremost through MBIs and meditation, as the development of the more complex facets, like Nonjudge or Nonreact, builds on this very ability (Burzler & Tran, 2022; Eisenlohr-Moul et al., 2012). The larger increases in trait mindfulness in the MBIs, compared to the active controls, thus likely mirrored the specific increases in Observe, which were necessary, given the goals and techniques utilized in these interventions, for the promotion of mindfulness. However, only part of the total increase in trait mindfulness, namely, that related to Nonjudge, Actaware, and Nonreact, probably was relevant for the changes in neuroticism and mental health (on this topic, see also Altgassen et al., 2024). The results of our additional analysis, switching the order of the mediators in the model, are also consistent with such a conclusion. If the overall reasoning were correct (namely, only part of the total change in trait mindfulness is relevant for changes in neuroticism and mental health), this could also solve the apparent “conundrum” (in the words of Goldberg et al., 2019) that changes in the mediator (trait mindfulness), which is more proximal to the intervention, could be smaller than changes in the more distal outcome (mental health) for MBIs in previous research, as well as in the present meta-analysis.
Increases in mental health were smaller for medical and non-psychiatric patients, yet these differences were not significant. This is in accordance with Goldberg et al. (2022), showing that MBIs appear to be beneficial for both clinical and nonclinical samples. Yet, sample type had a significant impact on overall changes in neuroticism. Decreases in neuroticism were significantly smaller in medical samples than in non-psychiatric and psychiatric samples (and also smaller in non-psychiatric than in psychiatric samples, when merging all MBIs into a single group for analysis). This could relate to the nature of symptoms treated in the group of psychiatric patients, which likely were more closely tied to the emotional dysfunction superspectrum of disorders (in the sense of the HiTOP model), for which neuroticism acted as a direct predictor, than in the other sample types. This idea could be followed up and expanded in future research.
The duration of the MBI did not moderate changes in mental health. This is in accordance with Khoury et al. (2013) and Goldberg et al. (2022): out of a total of 13 moderator tests conducted in the course of these predecessor meta-analyses, focusing on intervention dosage, only a single one indicated a larger effect for longer interventions. Yet, adherence to and time spent with meditation practices outside of the specific intervention may also need to be considered more closely in future studies (in this context, see Tran et al., 2022, on the lack of standards for primary studies to assess such additional practices properly).
Limitations and future directions
All mediational analyses were on the between-study level, which precludes firm conclusions concerning the respective associations on the within-study level. However, the observed pattern of results is fully compatible with a host of other studies, which previously have pointed out the content and construct overlap of the involved constructs on the within-study level. As there currently is no evidence that the mediational model might differ on the within-study level from the between-study level in any substantive way, the present results constitute the currently best available evidence. It is self-evident that investigations on the within-study level are urgently needed. For meta-analytic investigations on the within-study level, there is not only a need for more primary studies, but also for studies providing their full covariance matrices (or, ideally, open data from which these can be calculated directly) to make such investigations possible.
Also, the present results pertain only to the aggregate levels of trait mindfulness, neuroticism, and mental health. More detailed analyses on the facet (or lower-order) levels of these constructs were beyond the scope of this study, because these currently are unfeasible, considering the available literature base. Further, all outcomes were measured with self-report scales, so some of the overlap could be related to common-method variance (i.e., variance attributable to the used measurement method rather than the measured constructs; Podsakoff et al., 2003, 2024). Thus, the transient mood state of the RCT participants could have affected their response behavior and therefore might have produced artificial covariance. Other common biases, such as acquiescence (i.e., tendency to agree with items regardless of their content) and social desirability phenomena might have further influenced response behavior (Podsakoff et al., 2003, 2024). Additionally, jangle fallacies probably not only concerned trait mindfulness and neuroticism, but also mental health (and its various measures), neuroticism (and its proxies), and trait mindfulness. While these potential overlaps were beyond the focus of the present study (see Hanfstingl et al., 2024, for systematic approaches to elucidate and resolve suspected jangle-fallacy phenomena), there is evidence that not only trait mindfulness shows content overlap with neuroticism in popular self-report scales, but also with mental health, and mental health, in turn, with neuroticism (e.g., Fischer et al., 2023). Such possible overlaps need to be borne in mind when interpreting the results of the current study and need to be addressed in future research.
There is thus a need for the (alternative or complementing) use of objective assessment methods as well. For personality, there are methods like the conditional reasoning approach (James & LeBreton, 2012), which is an implicit method of measuring personality using an inductive reasoning problem that involves several correct answers that reflect the person’s personality (i.e., motives). Further, there is the implicit association task (Asendorpf et al., 2002). Informant reports and behavioral assessments are still further alternative approaches to measure personality. Informant reports have been shown to provide valid results and are relatively low in cost and effort. They still are an underutilized source of information that has the potential to improve the validity of personality assessments (Vazire, 2006). Further, studies could also consider combining multiple methods (e.g., self-reports alongside informant reports) to enhance the validity of scores for the assessment of personality. Likewise, informant reports or clinical expert ratings could be considered more explicitly for the assessment of mental health. The use of more objective assessment methods is also discussed in the literature on MBIs (e.g., Goldberg et al., 2022). Besides behavioral or physiological measurements, telomerase activity (i.e., the activity of an enzyme that acts on the telomeres, which protect the chromosomal ends from deterioration) could be a possible candidate on the cellular level (Bossert et al., 2023). Also, there are a number of biomarkers of stress and inflammation, which warrant further investigation (Grasmann et al., 2023).
The results of the current meta-analysis indicated the presence of small-study effects, which likely inflated observed overall effects. Yet, this did not affect the direction of effects, or the associations between the included variables. A common, reasonable explanation for small-study effects is publication bias, a perennial and ubiquitous problem in empirical science in general that needs to be further addressed in this specific research field as well (Goldberg et al., 2022; Tran et al., 2022). The current publication system still disfavors in many instances negative findings, which inflates effect-size estimates in meta-analyses (Fanelli, 2011). Publication pressure pushes researchers to engage in questionable research practices (Wicherts, 2017). Together, this biases reported effects and impedes future research. More preregistered studies are therefore needed. Nonsignificant results and primary studies’ data, materials, and analytic code also need to be made publicly accessible.
Overall, the quality of the included studies was moderate to low, corroborating the results of previous research (Goldberg et al., 2022; Tran et al., 2022). Only five studies were preregistered (Gordon et al., 2021; Masih et al., 2020; Ninomiya et al., 2020; Oken et al., 2017; Russell et al., 2019); almost all studies relied on self-report methods only. Yet, other problems were bound to the nature and design of the included studies. For example, participant blinding really is impossible in psychological intervention studies. In many instances, required information was only imprecisely reported or not provided at all in included studies (e.g., concerning the randomization method). This increased subjectivity in the rating. Goldberg et al. (2022) further reported inconsistencies in the interpretation of the different domains between different meta-analyses. The assessment and interpretation of study quality still needs more attention in future research.
The interpretation of the effects in the present meta-analysis is limited by the number of available studies. More studies with larger sample sizes are also still needed. The 45 studies included in the current meta-analysis had fewer than 3000 participants in total (i.e., fewer than 70 participants on average per study). Yet, obtained estimates for all outcomes were consistent with prior research. Studies, interventions, and observed effects were also heterogeneous. Hence, the aggregated effects represent average expected effect sizes. Still, important study variables, like intervention and sample type, were not evenly distributed in the study sample. This needs to be considered concerning the expectable generalizability of results. Also, some further potentially interesting variables (like type of trait mindfulness or the specific mental-health measure used) could not be included and controlled for in the main analysis.
The present study reaffirms that treating the concept of personality as something immutable would not be consistent with extant evidence. Accordingly, it appears to be important to include measures of neuroticism more frequently in intervention studies. Currently, studies often assess personality traits, such as neuroticism, only at baseline, but not at later measurement occasions. Only 10 studies directly examined personality change in intervention study designs (Armstrong & Rimes, 2016; Crescentini et al., 2018; Fabbro et al., 2020; Jacobs et al., 2011; Krasner et al., 2009; Oken et al., 2017; Ortet et al., 2020; Robins et al., 2019; Smith et al., 2008, 2018). Therefore, the present study often had to rely on proxy variables instead. Changes in proxy measures did not differ from changes in direct neuroticism measures; still, proxy variables might not have completely covered the full breadth of the neuroticism domain. Personality represents a fundamental building block of behavior. Accordingly, it is overdue that more studies in the context of intervention research examine the malleability of personality dimensions and their links to improvements in mental health in the future.
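As a quick check of the per-study average mentioned above, the following snippet divides the total sample size reported in the abstract (N = 2913) by the number of included studies (45). The variable names are illustrative only.

```python
total_participants = 2913   # total N as reported in the abstract
n_studies = 45              # number of included intervention studies

# Mean number of participants per study
mean_per_study = total_participants / n_studies
print(f"{mean_per_study:.1f} participants per study on average")
```

The result, roughly 65 participants per study, is consistent with the bound of 70 stated in the text.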
Previous research has proposed trait mindfulness as a “mechanism of action” of MBIs (e.g., Gu et al., 2015). Terms like “mechanism of action” or “mechanism of change” have been defined several times, but are often misused in research on treatment outcomes and used interchangeably with terms relating to techniques and processes. Yet, when the How and Why are not addressed, the mere occurrence of change does not imply knowledge of its reasons with any certainty. The exact processes and techniques causing such change need to be identified (Petrik & Cronin, 2014). Recent research has demonstrated that trait mindfulness is not specific to MBIs in its assumed effects on mental health and proposed using for it the simpler and more general term “mediator” (or “mode”) instead of “mechanism of action” or “mechanism of change” in the absence of more detailed empirical data (Tran et al., 2022). The present research synthesis suggests that the mode of action of trait mindfulness in intervention studies may well be attributed to concomitant changes in neuroticism, rendering trait mindfulness a mere proxy variable for neuroticism. Still, the exact processes and techniques leading to change need more study. Currently, the field appears mostly far from being able to explain the How and Why of change, be it with regard to mental health or neuroticism, or MBIs, or any other active treatment. Research into the processes and techniques leading to change needs to be intensified.
Connected to this, future studies may also need to take more than two assessment points into focus to be able to address also alternative (and more refined) models of change, like the three-phase model of change in psychotherapy (Howard et al., 1993). This would also enable more insight into when in the process personality changes may actually occur (the three-phase model assumes that progressive improvements in subjective, more state-like, well-being propel symptomatic improvement, which then propels improvement in, more trait-like, life functioning—to which personality change could conceptually relate). Changes in neuroticism could thus be some sort of “transfer effect” of MBIs, whose proximate target would still be (the more malleable) trait mindfulness, for which changes might occur in earlier phases. More research is still needed here.
Lastly, the construct of trait mindfulness appears to need conceptual reconsideration and resharpening. To provide added value, trait mindfulness needs to be clearly differentiated from existing constructs (i.e., neuroticism), and differences between mindfulness scales and underlying conceptualizations, which hint at disparities and a lack of construct validity of trait mindfulness (Altgassen et al., 2024; Fischer et al., 2023; Siegling & Petrides, 2014; Tran et al., 2020), need to be addressed. Empirical and conceptual overlaps between trait mindfulness and the Big Five personality traits need to be carefully sorted out, addressing also questions such as: which is on the higher level of a common hierarchy, if there is one? Currently, the answer appears to be: The Big Five personality traits, judging from available structural analyses of empirical data (Spinhoven et al., 2017; Tran et al., 2020).
Importantly, currently available trait mindfulness scales do not explain to respondents that the behaviors and actions mentioned in their item contents could (or should) be conceived as (mindfulness-related) practices to gain a better awareness of one’s psychological state. Also, respondents are not queried for whether they actually attempt to gain, or are interested in gaining, a better awareness of their psychological state (either via mindfulness-related practices or otherwise). The contents of these scales thus appear mostly context-free (like the contents of Big Five scales) and therefore might lack sufficient face validity to non-experts and non-meditators alike (that experienced meditators and non-meditators may differently understand and respond to the contents of mindfulness scales is already widely known, see Burzler & Tran, 2022). What is more, typical response formats (usually Likert-type scales) do not request respondents to rate how often they actually apply these practices to improve the awareness of their psychological state (or the state itself); instead, respondents are asked to rate how much these statements on behaviors and actions “generally” apply to them. Thus, a first step to better distinguish trait mindfulness from neuroticism (and its proxies) and other traits of the Big Five in self-report could lie in trying to bring more context into the scale instructions, but also response formats, of currently utilized mindfulness measures. Mindfulness scales might need to better factor-in respondents’ intention of wanting to use the therein presented practices to make the contents of these scales better distinguishable from the contents of other scales (for a discussion of further response format changes, see Burzler & Tran, 2022).
Also, a closer examination of the mindfulness construct at the facet level seems indispensable. It has been suggested that Actaware might be a unique component of mindfulness with relevance to mental health (Tran et al., 2020). Yet, more detailed investigations into dispositional versus cultivated mindfulness, which is highly relevant for intervention studies, and their valid assessment currently are also needed (Burzler & Tran, 2022).
Kraemer et al. (2001) recommended combining all proxy indicators and global risk factors in the investigation of risk factors to assess their global relevance. Relatedly, all proxy indicators of trait (or cultivated) mindfulness and neuroticism could be combined to assess their global relevance for mental health in future intervention studies as well. This could help to gain a better understanding of the causal processes in a first step, before taking a more detailed look at relevant techniques and processes in isolation in later steps.
Conclusion
This meta-analysis provided evidence that change in mental health in mindfulness-based interventions (previously attributed to changes in trait mindfulness) could alternatively be attributed to concomitant changes in neuroticism instead. Change in trait mindfulness was mediated by change in neuroticism, and change in neuroticism mediated the increases in mental health. Controlling for change in neuroticism, change in trait mindfulness did not predict change in mental health anymore. Similar associative patterns were found for the active and TAU and waitlist control groups in the investigated studies as well.
The current findings suggest that trait mindfulness is an apparent general mediator of treatment efficacy, because of its intimate links with the personality dimension of neuroticism. Future research thus needs to investigate those unique features of trait mindfulness that link it to mental health in more detail and needs to intensify efforts to delineate the construct of mindfulness more clearly from established alternative and competing constructs and traits, like neuroticism.
On the other hand, personality change in the context of intervention research should also receive more attention in the future. The current results reaffirmed the importance and reality of personality change through psychological interventions; this needs to be followed up for mindfulness-based interventions, but also for other types of psychological treatment. Given their links to the HiTOP model, investigations into personality, and neuroticism in particular, in intervention research also appear to be interesting and informative from a transdiagnostic perspective of psychopathology.
Supplemental Material
Supplemental Material - Personality change through mindfulness-based interventions: A preregistered systematic review and meta-analysis
Supplemental Material for Personality change through mindfulness-based interventions: A preregistered systematic review and meta-analysis by Frederick Almenräder, Gerit S. Heßmann, Martin Voracek, and Ulrich S. Tran in European Journal of Personality
Footnotes
Author contributions
Conceptualization, resources, and supervision: UST and MV; data curation, formal analysis, project administration, software, and writing of original draft: FA and UST; funding acquisition: N/A; investigation: FA and GSH; methodology: FA, UST, and MV; validation: UST; visualization: FA; review and editing of draft: FA, UST, MV, and GSH.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Open science statement
All study materials, data, and analysis scripts used for this research work can be accessed at https://osf.io/wt85d and are also provided in the supplementary materials. This research was preregistered (May 13, 2021, with subsequent amendments) at https://osf.io/wt85d. The data underlying the meta-analysis and open code to reproduce the analysis can be found online at osf.io and in
.
Supplemental Material
Supplemental material for this article is available online.
References
