
Abstract

Experiential avoidance is a core construct of third-wave behavioral theories and a predictor of internalizing psychopathology. Experiential avoidance has been most frequently measured using the Acceptance and Action Questionnaire-II (AAQ-II). However, several studies have indicated the AAQ-II scale scores demonstrate poor discriminant validity from neuroticism, calling into question the interpretation of past findings and leading some researchers to suggest measuring experiential avoidance with the Multidimensional Experiential Avoidance Questionnaire (MEAQ). In large online community (N = 643) and undergraduate (N = 488) samples, discriminant and convergent validity between scale scores of the AAQ-II, MEAQ, measures of neuroticism, a measure of trait negative affect, and a mindfulness measure were tested. In addition, the joint structure of scores from all measures was tested using Goldberg’s “bass-ackward” approach at the scale, subscale, and item levels. This allows for a thorough evaluation of the latent content being captured by scores on the AAQ-II and MEAQ. Results indicate that the AAQ-II is more accurately described as a measure of neuroticism/negative affectivity, more specifically the anxiety/depression facets of trait neuroticism. Moreover, the MEAQ scale scores were distinct from neuroticism and negative affectivity scale scores and functioned as expected for a measure of experiential avoidance.

Keywords

experiential avoidance, discriminant validity, neuroticism, negative affect, internalizing

Research on third-wave behavioral therapies has provided an abundance of empirical support for their efficacy in fostering well-being (Howell & Passmore, 2019; Stenhoff et al., 2020) and, by extension, treating symptoms of anxiety and depression (Byrne et al., 2019; Garey et al., 2020; Robinson et al., 2019; Zhenggang et al., 2020). As such, third-wave behavioral theories are a major focus of psychological research, particularly for improving clinical interventions (Hayes et al., 2022). Specifically, a core construct of third-wave behavioral theory, experiential avoidance, has been suggested as a key factor in the development and maintenance of psychological distress (Akbari et al., 2022; Angelakis & Gooding, 2021; Angelakis & Pseftogianni, 2021; Brereton & McGlinchey, 2020). Experiential avoidance has been defined as “the tendency to avoid uncomfortable thoughts, feelings, and experiences, even when doing so leads to long-term negative consequences” (Gámez et al., 2011; Hayes et al., 1999). This includes attempts to control internal experiences by means of suppressing thoughts and emotions and judging oneself for having negative internal experiences (Hayes-Skelton & Eustis, 2020). Past research has demonstrated that experiential avoidance predicts several clinical outcomes, such as anxiety, depression, trauma, and obsessive-compulsive symptoms (Akbari et al., 2022; Angelakis & Gooding, 2021; Angelakis & Pseftogianni, 2021; Kroska et al., 2018).

Notably, experiential avoidance expands upon typical conceptualizations of avoidance. Avoidance has been traditionally discussed as the resistance to encountering an unpleasant person, place, or situation due to fear of a specific and immediate consequence (APA, 2018). Experiential avoidance takes this further by emphasizing the act of avoiding internal experiences and by including the consequences that repeated avoidance behaviors (both internal and external) have for one’s quality of life (Gámez et al., 2011; Hayes et al., 1999, 2016). Furthermore, traditional exposure therapies targeting avoidance as a broader construct aim to extinguish the fear of negative consequences (e.g., Cognitive Behavioral Therapy, Exposure Therapy), whereas treating experiential avoidance through a third-wave behavioral therapy involves learning to accept the experience of anxiety and engage in meaningful life experiences anyway (e.g., Acceptance and Commitment Therapy, Dialectical Behavior Therapy).

Experiential Avoidance as a Distinct Construct

It is important to note that Hayes and colleagues (1996) state experiential avoidance is a unique and distinct construct from both personality trait neuroticism and trait negative affect (NA), two constructs that are themselves very similar and highly overlapping (e.g., r = .59; Watson et al., 2015). Neuroticism is a personality domain that captures the tendency to experience negative thoughts, feelings, and related behaviors as well as emotional volatility. Neuroticism is the strongest and most empirically supported predictor of the development, severity, and prognosis of internalizing symptoms (Clark & Watson, 1995, 2019; Kotov et al., 2010; Watson, 2012). Similarly, trait NA is a temperamental disposition toward negative emotions (Watson & Clark, 1992). Indeed, while NA is sometimes considered more specific to feelings and does not speak to the level of volatility of those feelings, neuroticism and trait negative affect are often used interchangeably. In short, neuroticism and NA are highly overlapping, strong predictors of anxiety and depressive symptoms. Experiential avoidance, in contrast, is explicitly theorized as distinct from these two constructs (Hayes et al., 1996).

As such, the theoretical distinction of neuroticism and NA from experiential avoidance is essential to third-wave behavioral theory, and many theoretical assumptions rest on the foundation of experiential avoidance as a unique entity. Moreover, it is well-established that verifying discriminant validity from neuroticism and NA is an essential step in determining the construct validity of scale scores for any clinically relevant measure (Clark & Watson, 1995, 2019; Watson, 2012). Failure to do so can lead to inaccurate results and undermine conclusions drawn (Clark & Watson, 1995, 2019; Cronbach & Meehl, 1955; Strauss & Smith, 2009). To align with theory, the distinctions of experiential avoidance from neuroticism and NA must be captured by the scale scores of measures of experiential avoidance. For the current study’s purposes, neuroticism and NA will be discussed together, given that the goal is to distinguish experiential avoidance from these constructs. However, neuroticism is discussed as the main reference point for establishing discriminant validity of experiential avoidance because of its historically robust associations with, and predictive power of, internalizing symptoms.

Issues With Measurement of Experiential Avoidance

The most used measure of experiential avoidance is the Acceptance and Action Questionnaire II (AAQ-II; Bond et al., 2011); however, concerns regarding the construct validity of scores on the AAQ-II, and therefore the measure itself, have emerged (Broman-Fulks et al., 2021; Rochefort et al., 2018; Tyndall et al., 2019; Vaughan-Johnston et al., 2017; Wolgast, 2014). These criticisms have mainly focused on the AAQ-II scale score’s poor discriminant validity from neuroticism and NA. Several researchers have criticized the AAQ-II for including items that contain language with high conceptual overlap with neuroticism, and having poor internal consistency (Gámez et al., 2011; Rochefort et al., 2018; Tyndall et al., 2019; Vaughan-Johnston et al., 2017; Wolgast, 2014). Alternatively, another measure, the Multidimensional Experiential Avoidance Questionnaire (MEAQ; Gámez et al., 2011), was specifically created to capture the theoretical distinctions of experiential avoidance from neuroticism and NA (i.e., demonstrate adequate discriminant validity) as well as to capture the full breadth of experiential avoidance (Gámez et al., 2011).

Rochefort et al. (2018) conducted the most thorough evaluation of how well scores from both measures assess experiential avoidance, show discriminant validity from neuroticism and NA measures, and align with other third-wave behavioral constructs. In two large samples, they assessed (a) the convergent validity of the AAQ-II and the MEAQ scale scores and their discriminant validity from neuroticism and NA and (b) the hierarchical structure of the AAQ-II, MEAQ, mindfulness,¹ neuroticism, and negative affect at the total scale score, subscale score, and item levels. In both samples, the AAQ-II scale scores were so highly correlated with scale scores from measures of neuroticism and NA that the correlations would be considered evidence of convergent validity (r > .70; Clark & Watson, 2019). Moreover, the AAQ-II scale scores were more strongly correlated with neuroticism and NA scores than with scores from other third-wave behavior therapy measures. In contrast, scores on the MEAQ demonstrated the expected moderate to low correlations with scores from all other measures. Factor analytic results demonstrated that the scores from the AAQ-II loaded onto factors containing neuroticism and/or negative affectivity content at the scale, subscale, and item levels. In contrast, the MEAQ performed as expected for a measure of experiential avoidance, forming factors with other third-wave behavioral theory content (e.g., mindfulness) or its own factors at the scale, subscale, and item levels. Rochefort et al. (2018) concluded that the AAQ-II scale scores function as an indicator of neuroticism and NA rather than experiential avoidance, whereas the MEAQ scores capture experiential avoidance.

Need for Replication and Extension of Rochefort and Colleagues (2018) Study

In recent years, scientists have highlighted the importance of replication for clinical psychology (Tackett et al., 2017, 2019). As noted by Tackett and Miller (2019, p. 597), “The replication movement requires greater involvement and engagement by clinical psychological researchers.” Replication, a “core principle of objective, empirical science,” bolsters our trust in scientific results (Tackett & Miller, 2019). Moreover, despite research indicating the AAQ-II scale scores demonstrate poor discriminant validity from those of neuroticism and NA measures, it continues to be the most widely used measure of experiential avoidance; this continued widespread use of the AAQ-II suggests the existing evidence is not sufficient to change practices among researchers. In addition, it is necessary to ensure results are not due to the specific measures used in any given study. It is possible that the neuroticism and NA measures used in Rochefort et al. (2018) were driving the strength of the associations and that other measures would yield different results. As such, replicating their findings in new samples with different measures of neuroticism and negative affectivity is necessary.

It also remains unclear whether the scores from the AAQ-II assess neuroticism and NA at the domain level or more specifically assess facet-level scores of neuroticism and/or NA measures. Facets home in on specific components of neuroticism, such as emotional volatility. This is important because the neuroticism and NA measures used by Rochefort et al. (2018) included a total of only two subscales to target neuroticism facets. Assessing more neuroticism and NA facet scores would allow a more fine-grained test; examining associations of AAQ-II and MEAQ scores with increased specificity is highly relevant for more precisely evaluating what is assessed by scores on the AAQ-II and MEAQ, as well as clarifying results from research using the AAQ-II.

Current Study

The current study replicates and extends Rochefort et al. (2018) using two large samples of online community and undergraduate participants. Following the same procedures and analyses as Rochefort et al. (2018), we expanded upon that study by using more comprehensive and/or updated measures of neuroticism and negative affectivity. This was done to (a) rule out any measure-specific effects that may have contributed to the findings in the original study and (b) include more subscales capturing facets of neuroticism and NA, thereby allowing for a more precise assessment of what the AAQ-II and MEAQ scores are assessing. Our methods, sample size, data analytic plan, and hypotheses were pre-registered (https://doi.org/10.17605/OSF.IO/DK8EX).²

Compared to Rochefort et al. (2018), the current study uses the Big Five Inventory-2 (BFI-2; Soto & John, 2017) instead of the Big Five Inventory (BFI; John et al., 1991), the IPIP-NEO (Johnson, 2014) instead of the Big Five Aspects Scale (BFAS; DeYoung et al., 2007), and the Temperament and Affectivity Inventory (TAI; Watson et al., 2015) instead of the Positive and Negative Affect Schedule (PANAS; Watson et al., 1988). The BFI-2 and TAI were specifically developed to address the psychometric limitations of the BFI and PANAS (see “Measures”). Altogether, the current measures have subscales that target 9 neuroticism facets and 8 NA facets, compared to only 2 facets in the original study (i.e., BFAS Neuroticism: volatility and withdrawal).

Method

Participants and Procedures

The current study follows the same procedures, design, and sample types as Rochefort and colleagues’ (2018) study. The first sample (n = 643) was recruited on Amazon’s Mechanical Turk (MTurk) using the same criteria as Rochefort et al. (2018). This maximizes the replicability of the original study. Given concerns regarding a decrease in data quality on MTurk (see Chmielewski & Kucker, 2020), the Cloud Research MTurk Fraud Detection toolkit was used to prevent participation from accounts that provide low-quality data (i.e., “farmers,” bots, otherwise non-valid responders), block duplicate IDs and suspicious geocodes, and verify worker country locations (Chmielewski & Kucker, 2020; Douglas et al., 2023; Hauser et al., 2022). As an additional data quality check, the reliability of all included measures was compared to previous research (see Chmielewski & Kucker, 2020).

The second sample (n = 488) consisted of undergraduate college students recruited via the subject pool.

All participants provided informed consent and completed the study online via Qualtrics. Participants were removed if they failed to correctly respond to 2 out of 3 validity check items (e.g., “please select ‘always true’”). Both samples were primarily female (MTurk: 58.3%, undergraduate: 71.1%) and non-Hispanic white (MTurk: 93.3%, undergraduate: 89.5%). MTurk participants received monetary compensation, and undergraduate students received course credit for their participation.

Hypotheses

We hypothesize that general findings from Rochefort et al. (2018) will be replicated in the current study. Specifically, based on the original study and the literature regarding discriminant and convergent correlation cut-offs (see Clark & Watson, 2019), we hypothesize that the AAQ-II scale scores will demonstrate poor discriminant validity (r ≥ .70) from neuroticism and NA scale scores. In contrast, the MEAQ scale scores will demonstrate appropriate discriminant validity (r = .40–.60) from neuroticism and NA scale scores and will have appropriate associations (r = .50–.65) with scale scores representing another third-wave behavior therapy construct, mindfulness. Convergent associations between measures of the same construct should be strong (r > .70). However, based on past research (Gámez et al., 2011; Rochefort et al., 2018), the association between the AAQ-II and MEAQ is expected to fall short of this threshold. Likewise, if the AAQ-II and MEAQ scores are measuring experiential avoidance, then their associations with scores of other third-wave construct measures should be significantly stronger than with neuroticism and NA scores (i.e., |z| ≥ 1.96; Steiger, 1980). However, mirroring Rochefort et al. (2018), it is expected that the MEAQ scores will meet this expectation, but the AAQ-II scores will not.

We also hypothesize that the AAQ-II scale scores will function as an indicator of latent neuroticism and NA instead of latent experiential avoidance at the scale, subscale, and item levels in our structural analyses. In other words, the AAQ-II total score and its individual items are expected to consistently load onto factors with neuroticism and NA content across levels of analysis. In contrast, the MEAQ total scale score is expected to function as expected for a measure of experiential avoidance. Specifically, the MEAQ scores will load on factors with the mindfulness measure scores, or they will form their own factors. We did not have any pre-registered hypotheses regarding whether the AAQ-II scores assess neuroticism or NA broadly or are best conceptualized as facets of neuroticism and/or NA.

Measures

Third-Wave Behavioral Therapy Scales

Acceptance and Action Questionnaire-II

The Acceptance and Action Questionnaire-II (AAQ-II) is a widely used measure of experiential avoidance³ (Bond et al., 2011; Hayes et al., 2006). It provides a total score made up of seven items (e.g., “I’m afraid of my feelings,” and “I worry about not being able to control my worries and feelings”) that participants rate using a Likert-type scale ranging from 1 (“Never or very rarely true”) to 5 (“Very often or always true”). Cronbach’s alpha for this study was α = .95 in the MTurk sample and α = .93 in the undergraduate sample, which resembled the values reported by Rochefort et al. (2018) (MTurk: α = .93, college: α = .91).
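For readers less familiar with the internal consistency coefficient reported throughout this section, the following is a minimal sketch of how Cronbach’s alpha can be computed from raw item responses. The function name and the toy ratings are hypothetical and are not study data; the sketch assumes complete responses and sample variances (n − 1 in the denominator).

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item ratings."""
    k = len(responses[0])
    # Transpose so that each tuple holds all ratings given to one item.
    items = list(zip(*responses))
    item_variance_sum = sum(variance(item) for item in items)
    # Variance of each respondent's summed total score.
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical ratings on a 1-5 scale (five respondents, three items).
toy_ratings = [[1, 2, 1], [3, 3, 2], [4, 5, 5], [2, 2, 3], [5, 4, 4]]
alpha = cronbach_alpha(toy_ratings)
```

Alpha approaches 1 as the items covary more strongly relative to their individual variances, which is why very high values can also signal narrow or redundant item content.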

Multidimensional Experiential Avoidance Questionnaire

The MEAQ is a measure of experiential avoidance that was developed using the construct validity approach to scale development (Gámez et al., 2011). In the original development study, the MEAQ had good internal consistency across student, community, and clinical samples (scale level: α = .91–.95, subscale level: .76–.90, M = .83–.87) and appropriate discriminant correlations with measures of neuroticism and negative affectivity (Gámez et al., 2011). It contains 62 items rated on a Likert-type scale ranging from 1 (“Strongly disagree”) to 6 (“Strongly agree”). The MEAQ provides a total scale score and subscale scores for specific components of experiential avoidance: Behavioral Avoidance (e.g., “I am quick to leave any situation that makes me feel uneasy”), Distress Aversion (e.g., “I hope to live without any sadness and disappointment”), Procrastination (e.g., “Why do today what you can put off until tomorrow”), Distraction and Suppression (e.g., “When a negative thought comes up, I immediately try to think of something else”), Repression and Denial (e.g., “People have told me that I’m not aware of my problems”), and Distress Endurance (e.g., “When working on something important, I won’t quit even if things get difficult”). The scale-level Cronbach’s alpha was α = .94 in the MTurk sample and α = .93 in the undergraduate sample. The subscale-level Cronbach’s alphas ranged α = .90–.93 in the MTurk sample and α = .86–.90 in the undergraduate sample. These alphas were identical to those found in the original study (Rochefort et al., 2018).

Five Facet Mindfulness Questionnaire

The Five Facet Mindfulness Questionnaire (FFMQ) is a measure of mindfulness that consists of 39 items. Participants rate items using a Likert-type scale ranging from 1 (“Never or very rarely true”) to 5 (“Very often or always true”). This measure was developed via factor analysis of items from five existing mindfulness measures and is the most comprehensive mindfulness measure to date (Baer et al., 2006). The FFMQ has five subscales: Observe (e.g., “I pay attention to sensations, such as the wind in my hair or sun on my face”), Describe (e.g., “I can usually describe how I feel at the moment in considerable detail”), Act with Awareness (e.g., “I find myself doing things without paying attention”), Nonjudge (e.g., “I tell myself I shouldn’t be feeling the way I’m feeling”), and Nonreact (e.g., “When I have distressing thoughts or images, I just notice them and let them go”). The subscales combine to create an overall, total scale score representing a higher-order dimension of “mindfulness.” In the current samples, the total scale Cronbach’s alpha was α = .92 in the MTurk sample and α = .88 in the undergraduate sample. The subscale score Cronbach’s alphas ranged α = .87 to .93 in the MTurk sample and α = .80 to .90 in the undergraduate sample. Alphas were nearly identical to those found in Rochefort et al. (2018).

Neuroticism and Negative Affect Scales

Big Five Inventory-2–Neuroticism (BFI-2-N)

The BFI-2 is a 60-item measure of the Big Five personality traits. Participants rate how much each phrase generally describes them using a scale that ranges from 1 (“Strongly agree”) to 5 (“Strongly disagree”). The BFI-2 is an update to the BFI, adding subscales (Anxiety, Depression, and Emotional Volatility) and rewording difficult items (Soto & John, 2017). In the current samples, Cronbach’s alpha for the full neuroticism scale was α = .93 in the MTurk sample and α = .89 in the undergraduate sample. For the subscale scores, the Cronbach’s alphas ranged α = .83 to .86 in the MTurk sample and α = .79 to .83 in the undergraduate sample. The domain scale alphas were similar to the value reported by Soto and John (2017), α = .90, but the subscale alphas in the current study were generally higher than theirs (α = .78 to .84; Soto & John, 2017).

International Personality Item Pool–NEO, Neuroticism (IPIP-N)

The International Personality Item Pool (IPIP)-NEO is an open-source measure mirroring the NEO PI-R (Costa & McCrae, 2008; Goldberg, 1999; Goldberg et al., 2006). Like the NEO PI-R, it assesses the Big Five broad trait domains and six lower-order facets of each domain. The IPIP Neuroticism scale contains 60 items and includes subscales targeting the following facets: Anxiety (e.g., “Worry about things”), Anger (e.g., “Get angry easily”), Depression (e.g., “Often feel blue”), Self-Consciousness (e.g., “Am easily intimidated”), Immoderation (e.g., “Often eat too much”), and Vulnerability (e.g., “Become overwhelmed by events”). Participants rate how much these items generally describe them by using a Likert-type scale that ranges from 1 (“Strongly Agree”) to 5 (“Strongly Disagree”). In the current samples, Cronbach’s alpha for the Neuroticism total scale was α = .97 for the MTurk sample and α = .95 for the undergraduate sample. The subscale Cronbach’s alphas ranged α = .82 to .91 in the MTurk sample and α = .75 to .91 in the undergraduate sample, all of which were slightly higher than those found in the original development of the measure, where the Neuroticism domain scale alpha was α = .90 and subscale alphas ranged α = .72 to .87 (Johnson, 2014).

Temperament and Affectivity Inventory

The Temperament and Affectivity Inventory (TAI) is a 93-item measure that assesses trait affectivity developed using the traditional personality format (i.e., full sentences rather than single words or phrases); scales were validated in four samples and showed good convergent and discriminant validity, incremental validity, internal consistency, and dependability (Watson et al., 2015). The TAI was created to address limitations of the single-word items in the trait version of the Positive and Negative Affect Schedule (PANAS, see Watson et al., 2015), which was used in Rochefort et al. (2018). Participants rated how much they agree with a statement using a scale ranging from 1 (“Strongly Disagree”) to 5 (“Strongly Agree”).

In the current research, the Negative Affectivity total score and subscales were used: Regret (e.g., “I have more than my share of regrets”), Depression (e.g., “the world can be a very dreary place”), Anger (e.g., “I tend to be rather irritable”), Anxiety (e.g., “I tend to be nervous”), Mistrust (e.g., “people are usually not what they seem”), Self-Doubt (e.g., “I often doubt myself”), Lassitude (e.g., “I sometimes feel too tired to do anything”), and Attentiveness (reverse-keyed, e.g., “It is easy for me to focus my attention on what needs to be done”). In the current samples, the Cronbach’s alpha for the overall Negative Affect scale score was α = .96 in the MTurk sample and α = .95 in the undergraduate sample. The subscale Cronbach’s alphas ranged α = .84 to .93 in the MTurk sample and α = .74 to .90 in the undergraduate sample. These alphas were similar to those across six different samples in the original development of the TAI (Watson et al., 2015), where alphas ranged α = .80 to .90.

Data Analysis

Mirroring the analysis conducted in Rochefort and colleagues’ (2018) study, missing data were imputed using the SPSS version 25 Imputation Function. To follow the procedures of the original study, all missing data were imputed for every scale that had less than 30% missing data⁴ (MTurk: missing item data ranged 2.2%–4.8% across scales; undergraduates: missing data ranged 1.4%–11.7% across scales). Pearson’s correlations were used to evaluate the convergent and discriminant associations between scale scores, and significance tests (Lee & Preacher, 2013) for dependent correlations were used to assess the difference in strength of associations.
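To make the test for dependent correlations concrete, the following is a minimal sketch of Steiger’s (1980) z test for two correlations that share one variable, using the pooled-correlation form of the covariance term. It is an illustration under stated assumptions rather than a reproduction of the exact computation used here: the function name is hypothetical, and reflecting the FFMQ so that correlation magnitudes (rather than signed values) are compared is an assumption of this sketch.

```python
import math


def steiger_z(r_jk, r_jh, r_kh, n):
    """Compare two dependent correlations, r_jk and r_jh, that share variable j.

    r_kh is the correlation between the two non-shared variables and n is the
    sample size. Returns the z statistic and a two-sided p value.
    """
    # Fisher r-to-z transformation of both correlations.
    z_jk, z_jh = math.atanh(r_jk), math.atanh(r_jh)
    # Pool the two correlations before estimating their covariance.
    r_bar = (r_jk + r_jh) / 2
    psi = r_kh * (1 - 2 * r_bar**2) - 0.5 * r_bar**2 * (1 - 2 * r_bar**2 - r_kh**2)
    covariance = psi / (1 - r_bar**2) ** 2
    z = (z_jk - z_jh) * math.sqrt(n - 3) / math.sqrt(2 - 2 * covariance)
    # Two-sided p value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Illustration with MTurk values from Table 2: AAQ-II with TAI-NA (.77) versus
# AAQ-II with FFMQ (-.63). The FFMQ is reflected here (an assumption), so the
# signs of its correlations with the AAQ-II and with TAI-NA (-.67) are flipped.
z_stat, p_value = steiger_z(r_jk=0.77, r_jh=0.63, r_kh=0.67, n=643)
significant = abs(z_stat) >= 1.96
```

Because the two correlations come from the same participants, the covariance term reduces the standard error relative to a test for independent correlations.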

Following Rochefort et al. (2018), separate exploratory factor analyses (EFA) with promax rotation were conducted at the total scale, subscale, and item levels. For each set of analyses, to ensure every possible opportunity for the AAQ-II scores to distinguish themselves from neuroticism and NA scores, we continued extracting factors as suggested by scree plots, parallel analyses (O’Connor, 2000), and item loadings. In addition, we continued extracting factors (beyond what was suggested by scree plots and parallel analyses) as long as they contained “marker items,” that is, items with loadings of .40 or higher on one primary factor (Clark & Watson, 2019).
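As an illustration of the parallel analysis step, the sketch below compares eigenvalues of the observed correlation matrix with eigenvalues obtained from random normal data of the same dimensions. It assumes principal-component eigenvalues and a 95th-percentile criterion; these choices, the function name, and the simulated data are assumptions of this sketch and may differ from the settings used in the analyses reported here.

```python
import numpy as np


def parallel_analysis(data, n_sims=200, percentile=95, seed=0):
    """Return the number of factors whose observed eigenvalues exceed chance."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    # Eigenvalues of the observed correlation matrix, largest first.
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    simulated = np.empty((n_sims, p))
    for i in range(n_sims):
        random_data = rng.standard_normal((n, p))
        random_corr = np.corrcoef(random_data, rowvar=False)
        simulated[i] = np.sort(np.linalg.eigvalsh(random_corr))[::-1]
    # Chance threshold for each eigenvalue position.
    threshold = np.percentile(simulated, percentile, axis=0)
    exceeds = observed > threshold
    # Retain factors up to the first eigenvalue that fails to beat chance.
    n_retain = p if exceeds.all() else int(np.argmin(exceeds))
    return n_retain, observed, threshold


# Simulated example (not study data): six variables generated from two factors.
rng = np.random.default_rng(1)
factors = rng.standard_normal((500, 2))
loadings = np.zeros((2, 6))
loadings[0, :3] = 0.8
loadings[1, 3:] = 0.8
simulated_items = factors @ loadings + 0.6 * rng.standard_normal((500, 6))
n_factors, observed_eigs, chance_eigs = parallel_analysis(simulated_items, n_sims=100)
```

In practice the parallel analysis result is only one input; as described above, factors were also extracted beyond this suggestion when they contained marker items.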

To summarize the item-level analyses, we followed Goldberg’s (2006) bass-ackward method, where each consecutive factor solution in the hierarchy is graphically presented. Regression-based factor scores were computed for each factor solution and correlated with those of the next consecutive factor solution (e.g., factor scores from the two-factor solution correlated with those of the three-factor solution) (see Goldberg, 2006). This method does not seek to find an optimal factor solution but instead shows the correlations between broader (higher on the hierarchy) and more specific (lower on the hierarchy) factors. Thus, one can follow content as factors split apart into smaller components.
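The logic of correlating factor scores across consecutive solutions can be sketched as follows. For brevity, this sketch substitutes principal components with varimax rotation for the EFA with promax rotation and regression-based factor scores described above, so it illustrates the structure of the procedure rather than the analysis actually run. All function names and the simulated data are hypothetical.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Return an orthogonal varimax rotation matrix for a loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        if s.sum() < criterion * (1 + tol):
            break
        criterion = s.sum()
    return rotation


def component_scores(data, k):
    """Standardized scores on k varimax-rotated principal components."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(data, rowvar=False))
    order = np.argsort(eigvals)[::-1][:k]
    vals, vecs = eigvals[order], eigvecs[:, order]
    loadings = vecs * np.sqrt(vals)
    unrotated_scores = z @ vecs / np.sqrt(vals)
    return unrotated_scores @ varimax(loadings)


def bass_ackwards(data, max_k):
    """Correlate scores from each k-component solution with the (k + 1) solution."""
    scores = [component_scores(data, k) for k in range(1, max_k + 1)]
    links = []
    for upper, lower in zip(scores, scores[1:]):
        k = upper.shape[1]
        corr = np.corrcoef(upper, lower, rowvar=False)
        # Rows index the broader solution, columns the more specific one.
        links.append(corr[:k, k:])
    return links


# Simulated example (not study data): 300 respondents, 8 variables.
rng = np.random.default_rng(0)
latent = rng.standard_normal((300, 3))
weights = rng.uniform(0.3, 0.9, size=(3, 8))
toy_data = latent @ weights + rng.standard_normal((300, 8))
hierarchy = bass_ackwards(toy_data, max_k=3)
```

Each matrix in `hierarchy` corresponds to one set of paths in the graphical hierarchy: a large correlation between a broader factor and a more specific one indicates that the content of the former carries into the latter as the solutions split.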

Results

Descriptive statistics for the total scales and subscales are presented in Table 1. Values resembled those found by Rochefort et al. (2018) and prior research (Broman-Fulks et al., 2021; Vaughan-Johnston et al., 2017). Correlations and corresponding Steiger-z significance tests are presented in Table 2. Notably, results were highly similar across the samples. Correlations between the AAQ-II and MEAQ (MTurk: r = .63, undergraduates: r = .61) indicated that these scores are not assessing the same latent construct (i.e., experiential avoidance), as they did not meet the criteria for convergent validity (r ≥ .70). The AAQ-II scores demonstrated correlations with neuroticism and NA scores (MTurk: r_Mean = .74, r_range = .71–.77; undergraduates: r_Mean = .72, r_range = .69–.76) high enough to meet the criteria for convergent validity between measures of the same latent construct, indicating substantial problems with discriminant validity. The MEAQ scores had more moderate correlations with neuroticism and NA scores (MTurk: r_Mean = .62, r_range = .56–.66; undergraduates: r_Mean = .61, r_range = .52–.68), which more closely aligns with expectations and theory. Finally, the FFMQ scores also demonstrated poor discriminant validity from neuroticism and NA scores (MTurk: r_Mean = −.69, r_range = −.64 – −.76; undergraduates: r_Mean = −.67, r_range = −.59 – −.71) in the current study.

Table 1.

Means and Standard Deviations for All Scales and Subscales.

	MTurk		Undergraduate
Scale	M	SD	M	SD
AAQ-II	3.03	1.59	2.89	1.33
MEAQ	3.30	0.66	3.10	0.65
Procrastination	3.05	1.17	3.37	1.04
Distraction/Suppression	3.71	1.11	3.67	0.90
Behavioral Avoidance	3.28	1.09	3.11	0.88
Repression/Denial	2.44	0.93	2.51	0.86
Distress Endurance	4.09	0.96	4.28	0.86
Distress Aversion	3.43	1.07	3.21	0.91
FFMQ	3.37	0.58	3.23	0.46
Observe	3.19	0.83	3.11	0.77
Describe	3.45	0.87	3.38	0.77
Act with Awareness	3.70	0.88	3.31	0.79
Nonjudge	3.54	0.98	3.47	0.86
Nonreact	2.97	0.84	2.86	0.70
BFI-2-N	3.32	0.92	3.24	0.77
Anxiety	3.04	1.02	2.80	0.91
Depression	3.43	1.05	3.50	0.90
Emotional Volatility	3.49	0.99	3.42	0.91
IPIP-N	2.67	0.74	3.25	0.57
Anxiety	2.77	0.89	3.04	0.72
Anger	2.55	0.87	3.42	0.69
Depression	2.57	0.98	3.48	0.83
Self-Consciousness	2.83	0.80	3.17	0.70
Immoderation	2.78	0.71	3.06	0.60
Vulnerability	2.53	0.85	3.34	0.70
TAI	2.85	0.64	2.82	0.63
Regret	2.76	1.01	2.86	0.94
Depression	2.69	0.98	2.62	0.89
Anger	2.51	0.83	2.52	0.67
Anxiety	2.61	1.02	2.80	0.85
Self-Doubt	2.97	1.12	3.14	0.99
Mistrust	2.87	0.87	2.72	0.71
Lassitude	2.87	0.97	3.12	0.88
Attentiveness	3.52	0.84	3.12	0.83

Note. Item means are reported. MEAQ items: 1–6 scale. All other measure items: 1–5 scale. SD = standard deviation; AAQ-II = Acceptance and Action Questionnaire II; MEAQ = Multidimensional Experiential Avoidance Questionnaire; FFMQ = Five Facet Mindfulness Questionnaire; BFI-2-N = Big Five Inventory-2–Neuroticism; IPIP-N = International Personality Item Pool–Neuroticism; TAI = Temperament and Affectivity Inventory–Negative Affect.

Table 2.

Pearson Correlations Between Total Scales for Both Samples.

Scale	1	2	3	4	5	6
1. AAQ-II	–	0.63	−0.63	0.71	0.75	0.77
2. MEAQ	0.61	–	−0.63	0.56	0.66	0.66
3. FFMQ	−0.63	−0.63	–	−0.64	−0.76	−0.67
4. BFI-2-N	0.67	0.52	−0.59	–	0.83	0.76
5. IPIP-N	0.74	0.63	−0.70	0.82	–	0.85
6. TAI-NA	0.76	0.68	−0.71	0.74	0.83	–

Note. MTurk sample is above the diagonal and the undergraduate sample is below the diagonal. Correlations ≥ 0.70 are bolded. All correlations were significant at the p < .01 level. AAQ-II = Acceptance and Action Questionnaire II; MEAQ = Multidimensional Experiential Avoidance Questionnaire; FFMQ = Five Facet Mindfulness Questionnaire; BFI-2-N = Big Five Inventory-2–Neuroticism; IPIP-N = International Personality Item Pool–Neuroticism; TAI-NA = Temperament and Affectivity Inventory–Negative Affect.
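As a worked illustration of how the Table 2 values map onto the convergent validity cut-off, the sketch below summarizes the AAQ-II correlations with the three neuroticism and NA scales in each sample. It assumes a simple arithmetic mean of the correlations (rather than, for example, an average of Fisher-transformed values); the variable and function names are hypothetical.

```python
from statistics import mean

# Cut-off for convergent validity (r >= .70; Clark & Watson, 2019).
CONVERGENT_CUTOFF = 0.70

# AAQ-II correlations with neuroticism and NA scale scores, copied from Table 2.
aaq_with_n_na = {
    "MTurk": {"BFI-2-N": 0.71, "IPIP-N": 0.75, "TAI-NA": 0.77},
    "Undergraduate": {"BFI-2-N": 0.67, "IPIP-N": 0.74, "TAI-NA": 0.76},
}


def summarize(correlations):
    """Mean, range, and scales at or above the cut-off for one sample."""
    values = list(correlations.values())
    return {
        "mean": round(mean(values), 2),
        "min": min(values),
        "max": max(values),
        "at_or_above_cutoff": [
            scale for scale, r in correlations.items() if r >= CONVERGENT_CUTOFF
        ],
    }


summary = {sample: summarize(rs) for sample, rs in aaq_with_n_na.items()}
```

The same summary can be applied to any row or column of Table 2 by swapping in the relevant correlations.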

Significance tests demonstrated that the AAQ-II scores were more strongly associated with neuroticism and NA scores than with the FFMQ and MEAQ scores in both the MTurk (i.e., BFI-2-N, IPIP-N, TAI, all ps < .01) and undergraduate (i.e., IPIP-N, TAI, all ps < .001) samples. In contrast, the MEAQ scores, in line with what would be expected from a measure of experiential avoidance, were more strongly correlated with the FFMQ scores than the BFI-2-N scores in both samples (MTurk: p = .011, undergraduate: p < .001). However, the MEAQ scores were not more strongly associated with the FFMQ scores than the IPIP-N or TAI scores in the undergraduate sample (p = .003, p = .046, respectively).

Subscale score correlations between the neuroticism subscales, the MEAQ subscales, and AAQ-II total scores are shown in Table 3. Like the scale-level correlations, results were similar across samples. Overall, the AAQ-II scores had moderate to strong associations with neuroticism subscales (MTurk: r_Mean = .64, r_range = .46–.76; undergraduate: r_Mean = .57, r_range = .39–.76). Meanwhile, the MEAQ subscales had moderate correlations with the neuroticism subscales (MTurk: r_Mean = .39, r_range = .06–.64; undergraduate: r_Mean = .35, r_range = .10–.63). The AAQ-II scores demonstrated high levels of convergence with specific neuroticism subscales across samples (i.e., BFI Depression, IPIP Anxiety, IPIP Depression, IPIP Vulnerability, TAI Anger, TAI Depression), whereas the MEAQ subscale correlations were more aligned with expectations based on theory.

Table 3.

Pearson Correlations Between Subscales for Both Samples.

	MEAQ
Scale/Subscale	BA	DA	P	DS	RD	DE	Total	AAQ-II
MTurk
BFI Anxiety	.42**	.37**	.48**	.18**	.22**	−.34**	.48**	.63**
BFI Depression	.46**	.42**	.55**	.14**	.34**	−.46**	.56**	.71**
BFI Emotional Vol.	.37**	.35**	.49**	.06	.29**	−.45**	.47**	.59**
IPIP Anxiety	.48**	.43**	.52**	.18**	.27**	−.47**	.56**	.71**
IPIP Anger	.39**	.38**	.50**	.13**	.38**	−.41**	.52**	.60**
IPIP Depression	.51**	.46**	.59**	.21**	.41**	−.48**	.63**	.76**
IPIP Self-conscious	.56**	.42**	.58**	.27**	.35**	−.46**	.63**	.64**
IPIP Immoderation	.35**	.27**	.53**	.16**	.38**	−.35**	.48**	.47**
IPIP Vulnerability	.53**	.44**	.60**	.17**	.39**	−.53**	.63**	.72**
TAI Anger	.44**	.40**	.56**	.18**	.47**	−.40**	.57**	.58**
TAI Anxiety	.49**	.45**	.52**	.24**	.36**	−.38**	.58**	.71**
TAI Attentiveness	.38**	.28**	.64**	.10*	.42**	−.56**	.56**	.52**
TAI Depression	.49**	.48**	.55**	.20**	.41**	−.43**	.62**	.75**
TAI Lassitude	.46**	.41**	.58**	.23**	.34**	−.34**	.56**	.59**
TAI Mistrust	.40**	.39**	.34**	.19**	.30**	−.21**	.43**	.46**
TAI Regret	.45**	.42**	.58**	.26**	.46**	−.31**	.58**	.69**
TAI Self-Doubt	.50**	.43**	.60**	.27**	.38**	−.38**	.61**	.69**
Undergraduate
BFI Anxiety	.36**	.34**	.39**	.33**	.21**	−.16**	.43**	.55**
BFI Depression	.40**	.38**	.43**	.22**	.40**	−.37**	.52**	.67**
BFI Emotional Vol.	.28**	.29**	.36**	.18**	.25**	−.26**	.40**	.48**
IPIP Anxiety	.47**	.44**	.44**	.30**	.27**	−.30**	.53**	.65**
IPIP Anger	.27**	.25**	.33**	.10*	.26**	−.24**	.34**	.46**
IPIP Depression	.43**	.45**	.46**	.23**	.50**	−.40**	.57**	.75**
IPIP Self-conscious	.54**	.38**	.44**	.29**	.41**	−.33**	.56**	.55**
IPIP Immoderation	.23**	.22**	.55**	.20**	.34**	−.28**	.43**	.43**
IPIP Vulnerability	.51**	.48**	.49**	.27**	.36**	−.44**	.60**	.66**
TAI Anger	.29**	.30**	.32**	.11*	.32**	−.21**	.38**	.39**
TAI Anxiety	.52**	.46**	.52**	.33**	.40**	−.35**	.61**	.67**
TAI Attentiveness	.26**	.30**	−.63**	.18**	.32**	−.36**	.50**	.42**
TAI Depression	.47**	.49**	.50**	.28**	.51**	−.31**	.61**	.76**
TAI Lassitude	.36**	.34**	.57**	.26**	.40**	−.20**	.51**	.56**
TAI Mistrust	.36**	.33**	.41**	.22**	.50**	−.15**	.47**	.47**
TAI Regret	.40**	.41**	.50**	.25**	.46**	−.29**	.55**	.63**
TAI Self-Doubt	.48**	.43**	.43**	.35**	.37**	−.23**	.54**	.60**

Note. BFI = Big Five Inventory; IPIP = International Personality Item Pool; TAI = Temperament and Affectivity Inventory; MEAQ = Multidimensional Experiential Avoidance Questionnaire; AAQ-II = Acceptance and Action Questionnaire II; BA = Behavioral Avoidance; DA = Distress Aversion; P = Procrastination; DS = Distraction and Suppression; RD = Repression and Denial; DE = Distress Endurance.

*p < .05. **p < .01.
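The summary statistics reported for the AAQ-II (mean and range of its correlations with the neuroticism and NA subscales) can be checked directly against the AAQ-II column of Table 3. The sketch below assumes a plain arithmetic mean of the tabled correlations (not a Fisher z-averaged mean); the variable and function names are illustrative.

```python
from statistics import mean

# AAQ-II column of Table 3, in row order (BFI Anxiety ... TAI Self-Doubt)
aaq_mturk = [.63, .71, .59, .71, .60, .76, .64, .47, .72,
             .58, .71, .52, .75, .59, .46, .69, .69]
aaq_undergrad = [.55, .67, .48, .65, .46, .75, .55, .43, .66,
                 .39, .67, .42, .76, .56, .47, .63, .60]


def summarize(correlations):
    """Return (mean rounded to two decimals, minimum, maximum)."""
    return round(mean(correlations), 2), min(correlations), max(correlations)


print(summarize(aaq_mturk))      # (0.64, 0.46, 0.76)
print(summarize(aaq_undergrad))  # (0.57, 0.39, 0.76)
```

Under that assumption, the tabled values reproduce the means and ranges given in the text for both samples.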

Structural Analyses

Scale Level

EFAs with Promax rotation were conducted on the AAQ-II, MEAQ, FFMQ, BFI-2-N, IPIP-N, and TAI scale scores. In the MTurk sample, the scree plot suggested the extraction of two factors, and parallel analysis suggested the extraction of a single factor at the scale level. Up to three factors could be extracted before new factors no longer had any marker items. The two-factor solution contained a neuroticism and NA factor (BFI-2-N, IPIP-N, TAI, and AAQ-II) and an ACT/“third-wave behavior therapy” (MEAQ) factor, with the FFMQ splitting across both factors. The three-factor solution included: (a) a neuroticism (IPIP-N, BFI-2-N) factor, (b) an ACT/“third-wave behavior therapy” (MEAQ, FFMQ) factor, and (c) a worry and negative affectivity (AAQ-II, TAI) factor. The TAI cross-loaded on the neuroticism factor. Moreover, the neuroticism factor was correlated r = .80 with the worry and negative affectivity factor, indicating they were not distinct. In the undergraduate sample, the scree plot suggested the extraction of two factors, and parallel analysis suggested the extraction of a single factor. Only two factors could be extracted before new factors no longer contained marker items: (a) a neuroticism and NA factor (BFI-2-N, IPIP-N) and (b) an ACT/“third-wave behavior therapy” factor (MEAQ, FFMQ). The TAI and AAQ-II scores split across both factors. To summarize, in both samples the AAQ-II either (a) loaded onto a neuroticism and NA factor or (b) split across factors. Meanwhile, the MEAQ and FFMQ consistently emerged as an ACT/“third-wave behavior therapy” factor.

Subscale Level

The results of the subscale analyses are presented in Table 4. The columns are organized by the number of factors extracted for each sample, and the numbers in each row indicate the factor(s) on which each (sub)scale score loaded greater than .40 (e.g., subscales with a “1” loaded onto the first factor, subscales with a “2” loaded onto the second factor, etc.). Subscale scores that load onto more than one factor have more than one number.
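The coding rule described above can be sketched as a small function that turns one (sub)scale's row of factor loadings into a table cell. This is an illustration only: the function name and the loadings are hypothetical, and applying the .40 cutoff to the absolute value of the loading (so that strong negative loadings count) is an assumption, since the text does not say how negative loadings were handled.

```python
def code_loadings(loadings, cutoff=0.40):
    """Return the summary cell for one (sub)scale.

    loadings: the (sub)scale's loadings on factors 1..k, in order.
    Assumption: the cutoff is applied to the absolute loading.
    """
    hits = [str(factor) for factor, value in enumerate(loadings, start=1)
            if abs(value) > cutoff]
    # "*" marks a (sub)scale that does not load above the cutoff on any factor
    return ", ".join(hits) if hits else "*"


# Hypothetical loadings from a three-factor solution
print(code_loadings([0.72, 0.10, 0.05]))   # 1
print(code_loadings([0.45, 0.12, 0.48]))   # 1, 3
print(code_loadings([0.31, 0.22, -0.18]))  # *
```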

In the MTurk sample, the scree plot suggested the extraction of two to eight factors, and parallel analyses suggested the extraction of 10 factors. Up to seven factors were extracted before no new factors had any marker items. Across factor solutions, the AAQ-II always loaded onto a factor consisting of neuroticism and NA subscales (e.g., anxiety, depression, immoderation, self-doubt, mistrust, anger) regardless of how many factors were extracted (see Table 4). Meanwhile, the MEAQ subscales either (a) formed their own factors (i.e., distraction and suppression, behavioral avoidance), or (b) loaded with FFMQ subscales, creating factors that represent broader third-wave behavior therapy constructs (i.e., mindfulness). Starting at the five-factor solution, the broad neuroticism and NA factor separated into specific neuroticism facet content (IPIP-N anger and TAI anger broke away to create an anger facet). Meanwhile, the AAQ-II scores (which are not supposed to measure neuroticism) continued to load on the broad neuroticism and NA factor, made up of anxiety and depression content (e.g., anxiety, depression, immoderation, self-doubt, mistrust). This general pattern continued for the remaining solutions.

Table 4.

Summary of Subscale Level Factor Analysis.

		Number of Factors Extracted
Sample	Scale/Subscale	Two	Three	Four	Five	Six	Seven	Eight
MTurk
	AAQ-II	1	1	1	1	1	1	–
	MEAQ
	Procrastination	2	3	2	2	2	3	–
	Distraction/Suppression	1, 2	2	3	3	3	4	–
	Behavioral Avoidance	2	2	3	3	3	4	–
	Repression/Denial	2	2, 3	2	2	2	3	–
	Distress Endurance	1	3	*	*	6	7	–
	Distress Aversion	2	2	3	3	3	4	–
	FFMQ
	Observe	1	3	4	4	4	5	–
	Describe	1	3	2, 4	2, 4	2, 5	5	–
	Act with Awareness	1	*	2	2	2	3	–
	Nonjudge	2	1	1, 2	1	1	3	–
	Nonreact	1	1	1, 4	1, 4	*	*	–
	BFI-2-N
	Anxiety	1	1	1	1	1	1	–
	Depression	1	1	1	1	1	2	–
	Emotional Volatility	1	1	1	1	*	1	–
	IPIP-N
	Anxiety	1	1	1	1	1	1	–
	Anger	1	1	1	5	4	1, 6	–
	Depression	1	1	1	1	1	2	–
	Self-Consciousness	1	1	1	1	1	1	–
	Immoderation	1	1	1	*	*	*	–
	Vulnerability	1	1	1	1	1	1	–
	TAI
	Regret	1	1	1	1	1	2	–
	Depression	1	1	1	1	1	2	–
	Anger	1	1	1	5	4	6	–
	Anxiety	1	1	1	1	1	1	–
	Self-Doubt	1	1	1	1	1	2	–
	Mistrust	*	1	1	*	1	2	–
	Lassitude	1	1	1	1	1	2	–
	Attentiveness	1	1, 3	2	1, 2	2	3	–
Undergraduate
	AAQ-II	1	1	1	1	1	1	4
	MEAQ
	Procrastination	1	2	2	2	2	2	2
	Distraction/Suppression	1	1	4	4	5	5	6
	Behavioral Avoidance	1	1	4	4	5	5	6
	Repression/Denial	1	1, 2	2	5	4	4	5
	Distress Endurance	*	*	*	*	*	7	7
	Distress Aversion	1	1	4	4	5	5	6
	FFMQ
	Observe	*	*	3	3	6	6	8
	Describe	1	*	*	5	4	4	5
	Act with Awareness	1	2	2	2	2	2	2
	Nonjudge	1	2	1, 2	1	1	*	4
	Nonreact	2	3	3	3	6	7	8
	BFI-2-N
	Anxiety	2	3	1	1	1	1	1
	Depression	2	3	1	1	1	1	4
	Emotional Volatility	2	3	3	3	3	3	*
	IPIP-N
	Anxiety	2	3	1	1	1	1	1
	Anger	2	1, 3	3	3	3	3	3
	Depression	1, 2	3	1	1	1	1	4
	Self-Consciousness	1	1	1	1	1	1	1
	Immoderation	*	2	2	2	2	2	2
	Vulnerability	2	3	1, 3	1	1	1	1
	TAI
	Regret	1	*	1	1	1	1	4
	Depression	1	1	1	1	1	1	4
	Anger	2	2, 3	2, 3	3	3	3	3
	Anxiety	1	1	1	1	1	1	1
	Self-Doubt	1	1	1	1	1	1	1
	Mistrust	1	2	2	*	4	4	5
	Lassitude	1	2	2	2	2	1, 2	2
	Attentiveness	1	2	2	2	2	2	2

Note. Numbers indicate which subscales loaded on which factors. Subscales with multiple numbers indicate that the subscale loaded > 0.40 on multiple factors. Bolded numbers indicate that scales loaded on the same factor regardless of the number of factors extracted. In the eight-factor extraction of the MTurk sample, the eighth factor consisted of no subscales and was uninterpretable. The same was true of the ninth factor in the nine-factor extraction of the undergraduate sample. AAQ-II = Acceptance and Action Questionnaire II; MEAQ = Multidimensional Experiential Avoidance Questionnaire; FFMQ = Five Facet Mindfulness Questionnaire; BFI-2-N = Big Five Inventory-2–Neuroticism; IPIP-N = International Personality Item Pool–Neuroticism; TAI-NA = Temperament and Affectivity Inventory–Negative Affect.

*Indicates the subscale did not load > 0.40 on any factor.

In the undergraduate sample, the scree plot suggested the extraction of two to nine factors, and parallel analyses suggested the extraction of nine factors. However, only eight factors could be extracted before new factors no longer had any marker items. In the initial factor solutions, the pattern was less clear. In the two-factor solution, the first factor included scores from the AAQ-II, all the MEAQ subscales, most of the FFMQ subscales, and most of the TAI neuroticism and NA subscales (e.g., anxiety, depression, self-doubt, mistrust, lassitude, etc.). The BFI and IPIP neuroticism and NA subscales formed the second factor. In the three-factor solution, several IPIP and TAI neuroticism and NA subscales moved to the general neuroticism and NA factor with the BFI and other IPIP subscales, while the AAQ-II and the MEAQ scores remained part of the large, mixed first factor. In addition, some of the FFMQ and TAI subscales formed a “distractedness” factor where mindfulness and attentiveness scales (i.e., Acting with Awareness, Procrastination, Attentiveness, Immoderation, and Lassitude) had negative loadings and TAI subscales (i.e., Lassitude, [in]attentiveness) had positive loadings. Starting at the four-factor solution, loadings shifted dramatically; the AAQ-II scores had high loadings on the broad neuroticism and NA factor, which consisted mainly of depression and anxiety subscale scores. The MEAQ subscales either (a) loaded onto their own experiential avoidance factors (i.e., distress aversion, behavioral avoidance, distraction, and suppression) or (b) loaded with FFMQ and TAI to create “denial” and “distractedness” factors.

In subsequent solutions, additional factors represented specific manifestations of either (a) third-wave behavior therapy facets (i.e., MEAQ subscales) or (b) neuroticism and NA facets (i.e., emotional volatility/anger subscales). However, the AAQ-II scores continued to load onto the neuroticism and NA factors, which consisted mainly of anxiety and depression content. In the eight-factor solution, the broad neuroticism and NA factor split into separate anxiety and depression factors, with the AAQ-II scores loading strongly onto the depression factor. Throughout all factor solutions, AAQ-II scores consistently loaded onto factors containing neuroticism and NA content (particularly anxiety and depression facets of neuroticism and NA). Conversely, the MEAQ subscale scores either formed factors that broadly represent experiential avoidance or loaded with FFMQ subscale scores onto general third-wave factors (e.g., denial, distractedness).

To summarize, in both samples, the AAQ-II scores consistently loaded onto a neuroticism and NA factor for almost every solution. This occurred even when some neuroticism and NA subscale scores broke away to create a specific factor representing anger/emotional volatility, leaving the AAQ-II scores loading with depression/anxiety facets of neuroticism. Conversely, the MEAQ subscale scores loaded together to create a general experiential avoidance factor or loaded with the FFMQ subscale scores to create factors representing other third-wave constructs (i.e., mindfulness, distractedness, denial).

Item Level

Item-level analyses provide information regarding what each item of the AAQ-II and the MEAQ is capturing. The item-level analysis also clarifies whether the discriminant validity issues with the AAQ-II are due to a few problematic items or represent a more widespread issue. EFAs were conducted using the same methods as noted above; scree plots suggested approximately five to 13 factors in both samples, while parallel analyses suggested the extraction of 18 factors in the MTurk sample and 17 factors in the undergraduate sample. In both samples, when more than nine to 10 factors were extracted, several factors contained splitter items (i.e., loadings above .40 on more than one factor) or had no marker items. This suggests overextraction of factors, and that meaningful factors could no longer be extracted. However, 17 to 18 factors were extracted (the maximum suggested by the parallel analyses) to provide every opportunity for the AAQ-II to split away from the neuroticism and NA content. Figures 1 and 2 summarize the results using Goldberg’s (2006) method (due to space limitations, only two through seven factors are presented here; all remaining factor solutions are in the supplement). For each factor, the percentage of items from each scale loading >.40 on that factor is reported. Items that loaded >.40 on more than one factor were included on each of those factors, resulting in some percentages summing to over 100%. Descriptions of patterns found in each individual sample are detailed below.
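To make the percentage summary concrete, the sketch below computes, for each factor, the share of each scale's items that load above .40 on it. It is a hedged illustration of one reading of the description above (the denominator is the number of items in the scale), with hypothetical scale names and loadings; taking the absolute value of each loading is also an assumption.

```python
from collections import defaultdict


def scale_composition(item_loadings, cutoff=0.40):
    """Percentage of each scale's items loading above the cutoff, per factor.

    item_loadings: list of (scale_name, [loading on factor 1, ..., factor k]).
    Returns {factor_number: {scale_name: percentage}}.
    """
    items_per_scale = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))  # factor -> scale -> item count
    for scale, loadings in item_loadings:
        items_per_scale[scale] += 1
        for factor, value in enumerate(loadings, start=1):
            if abs(value) > cutoff:
                # a splitter item is counted on every factor it loads on
                hits[factor][scale] += 1
    return {
        factor: {scale: 100 * count / items_per_scale[scale]
                 for scale, count in by_scale.items()}
        for factor, by_scale in hits.items()
    }


# Hypothetical two-factor solution with two made-up scales
example = [
    ("Scale A", [0.70, 0.10]),
    ("Scale A", [0.50, 0.45]),  # splitter item: counted on both factors
    ("Scale B", [0.10, 0.60]),
    ("Scale B", [0.20, 0.30]),  # loads on neither factor
]
print(scale_composition(example))
```

In this toy case, all Scale A items load on factor 1 and half of them also load on factor 2, so the Scale A percentages sum to 150% across factors, which is the situation the text describes.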

Figure 1.

MTurk Sample Item-Level Latent Construct Hierarchy.

Figure 2.

Undergraduate Sample Item-Level Latent Construct Hierarchy.

In the MTurk sample (see Figure 1), the AAQ-II items initially loaded onto a factor that was a mix of third-wave behavior (distress aversion, distraction and suppression, behavioral avoidance, procrastination, non-judgment) and neuroticism and NA (mistrust, lassitude, self-doubt, depression, regret, anger, anxiety, etc.) content. However, as soon as a three-factor solution was extracted, all the AAQ-II items loaded onto a neuroticism and NA factor (e.g., depression, anxiety, anger, volatility, self-doubt, self-consciousness, vulnerability, lassitude). This pattern continued regardless of how many factors were extracted. As in the subscale analyses, neuroticism and NA began to break into more specific content, with items capturing anger forming their own factor. Nevertheless, all AAQ-II items continued to load onto a factor with content related to depression (depression, anger, volatility, lassitude, immoderation, etc.) and anxiety (anxiety, self-doubt, self-consciousness, vulnerability, etc.) from both neuroticism and trait NA measures. Meanwhile, the MEAQ items loaded onto factors capturing experiential avoidance and third-wave content in every solution. As more factors were extracted, most of the MEAQ items formed their own experiential avoidance factor. Indeed, several subscales of the MEAQ and neuroticism measures were roughly re-created as independent factors in the lower part of the bass-ackward hierarchy. This was true even when we over-extracted factors; despite the emergence of factors without marker items, we continued all the way up to 18 factors as suggested by the parallel analysis, and the patterns described above persisted (see supplemental tree figure).

In the undergraduate sample (see Figure 2), the AAQ-II items initially loaded onto a factor containing a mix of neuroticism and NA and third-wave behavioral therapy content. However, as more factors were extracted, the AAQ-II items loaded onto a broad neuroticism and NA factor while the MEAQ and other third-wave content broke away to form new factors. Mirroring the subscale level, the broad neuroticism and NA factor eventually separated into more specific factors representing facets of neuroticism and NA (i.e., anger separated from anxiety and depression). By the four-factor solution, all AAQ-II items loaded onto a factor representing the depression and anxiety facets of neuroticism and NA. This pattern continued through the eight-factor solution. Starting at the nine-factor solution, there were signs of over-extraction, with the ninth factor consisting of only a splitter item. Indeed, in every subsequent solution, there were one or more factors that either consisted of only splitter item(s) or had no marker items at all. Thus, all factors beyond the nine-factor solution should be interpreted with caution, as results are impacted by issues of over-extraction.

However, we extracted up to 17 factors in the undergraduate sample, as suggested to be the maximum number of possible factors by the parallel analysis (see supplemental tree figure). As we started over-extracting at the nine-factor solution, one AAQ-II item (i.e., “I’m afraid of my feelings”) did not load on any factors. Indeed, in several of the remaining factor solutions, three of the seven AAQ-II items (i.e., “I’m afraid of my feelings,” “Emotions cause problems in my life,” and “I worry about not being able to control my worries and feelings”) showed unique loading patterns. Specifically, items either did not load as a marker item on any factors, or they loaded onto an “Emotional Self-Judgement” factor, which first appeared in the nine-factor solution and captured the tendency to criticize oneself for experiencing negative emotions. However, this loading pattern was inconsistent and likely attributable to over-extraction, as several items across measures were being pulled away to capture increasingly specific content, with new factors often showing signs of being “bloated specifics” or factors formed purely based on similar terms/phrases used within marker items rather than representing a valid underlying latent construct (Clark & Watson, 2019). Of note, even when the items did not load onto any factors, they often nearly loaded onto either the “N/NA” or “Emotional Self-Judgement” factor.

Nevertheless, even as we over-extracted factors, most AAQ-II items loaded with neuroticism/NA content. In contrast, for every factor solution, the MEAQ consistently loaded onto factors capturing third-wave content. Replicating results in the MTurk sample, the MEAQ items separated from items of other measures to form factors roughly representing MEAQ facets (i.e., one factor primarily consisted of MEAQ distress endurance items).

Discussion

The current study adds to the growing body of research demonstrating that scores on the AAQ-II are indicators of neuroticism and NA, with the current results clarifying that the AAQ-II specifically captures content within the anxiety and depression facets of neuroticism. In contrast, the MEAQ scores appear to assess experiential avoidance. These findings held true, regardless of whether the analyses were conducted at the scale, subscale, or item level. The current research replicates Rochefort and colleagues’ (2018) findings using updated neuroticism and NA measures (with numerous subscales) in new samples, ensuring that these results were not sample- or measure-specific.

Comparing Findings Across Studies

In both the current study and Rochefort et al. (2018), the AAQ-II and MEAQ scores were only moderately correlated with each other. Although correlations were slightly higher in the current study, they did not reach the level of convergent validity. Instead, the AAQ-II scores demonstrated convergence with neuroticism and NA scores in both studies, with correlations being slightly higher in the current study. Meanwhile, the MEAQ scores achieved discriminant validity from neuroticism and NA scores in both studies, although discriminant validity correlations were slightly higher in the current study. Indeed, all correlations were generally higher in magnitude in the current study, including those for the FFMQ.

In both studies, AAQ-II scores were more strongly correlated with neuroticism and NA than with the MEAQ and FFMQ scores. Regarding the MEAQ, in the original study, the MEAQ scores were more strongly correlated with FFMQ scores than all neuroticism and NA scores. Indeed, the highest correlation between the MEAQ and neuroticism or NA was a correlation of r = .57 with the Big Five Aspects Scale–Neuroticism scores (BFAS; DeYoung et al., 2007). In the current study, the MEAQ was more strongly correlated with the FFMQ than with the BFI-2-N only, and not than with the other neuroticism and NA measures. The lack of complete replication of significance tests between the MEAQ and FFMQ may be related to the FFMQ scores showing higher than expected correlations with some neuroticism and NA measures in the current study. It is also worth noting that the FFMQ subscales showed differential associations with the MEAQ subscales, both within and across samples. Differences across samples call into question what associations should be expected among the FFMQ and MEAQ subscales, as the facets of each construct might be differentially related while the overall domains remain moderately associated as expected. In addition, these results may highlight potential issues with the psychometric properties of the FFMQ that should also be explored.

Overall, the structural analyses largely replicate Rochefort et al. (2018) with a slight discrepancy in the scale-level EFA. In the current study, AAQ-II scores split across factors in the two-factor undergraduate solution. This did not occur in Rochefort et al. (2018), where the AAQ-II scores always loaded completely with neuroticism and NA. Regarding the subscale structural analyses, general patterns replicated across studies. Furthermore, the current study extended upon the results of the original study, finding that the AAQ-II specifically loaded with depression and anxiety subscale scores in the last factor solutions. This was also true for the item-level analysis. Indeed, in both studies most if not all items from the AAQ-II did not load with other experiential avoidance or third-wave behavioral measures, nor did they form an independent “AAQ-II” factor. Instead, the AAQ-II items formed factors with depression and anxiety content from neuroticism and NA scales, or (in the undergraduate sample) either loaded onto an “Emotional Self-Judgement” factor or did not load onto any factor. When comparing the MEAQ item-level EFAs in the current study to those in Rochefort et al. (2018), we found that many of the same factors emerged, including “Mindfulness,” “Avoidance,” and “Repression and Denial.” As such, the MEAQ scores have continued to foster strong support for their construct validity, particularly discriminant validity from neuroticism and NA scores.

Taken together, the current results combined with previous research (Rochefort et al., 2018; Vaughan-Johnston et al., 2017; Wolgast, 2014) make it clear that the AAQ-II scores demonstrate poor construct validity, fail to assess the target latent construct (i.e., experiential avoidance), and instead assess anxiety and depression content within both neuroticism and NA. The subscale and item-level analyses expanded upon past research, demonstrating that the AAQ-II scores are best conceptualized as assessing a depression facet, and to a lesser extent, an anxiety facet, of the personality trait neuroticism. Rochefort and colleagues (2018, p. 446) concluded, “To the extent that the AAQ-II scores function as an indicator of neuroticism and NA, any conclusions regarding experiential avoidance based on the AAQ-II should be interpreted with caution.” We echo and reemphasize this as researchers and clinicians alike must be aware of the limitations of the AAQ-II.

Implications for Experiential Avoidance as a Construct

It is important to emphasize that the psychometric issues with the AAQ-II scores do not necessarily translate to experiential avoidance as a construct. Rather, reliance on the AAQ-II makes it difficult to understand the nature of experiential avoidance, as previous findings using the AAQ-II are best understood as replications of the well-established associations of neuroticism and negative affect with important clinical outcomes. Indeed, it is pertinent that researchers test the role of experiential avoidance in the development, maintenance, and treatment of internalizing psychopathology using psychometrically sound measures that actually assess experiential avoidance (see “Future Directions”). In the current study, the MEAQ scores demonstrated appropriate discriminant validity from neuroticism and NA scores. Past research provides initial evidence that the MEAQ scores demonstrate criterion validity for psychopathology (Gámez et al., 2011). Taken together, the current results combined with the results of Rochefort et al. (2018) and Gámez et al. (2011) provide evidence that experiential avoidance (when properly assessed) is a valid construct distinct from neuroticism and NA.

Semantic Overlap and Shared Method Variance of the AAQ-II

Of note, Clark and Watson (2019) argue that any item containing “worry,” or other similar words, is essentially guaranteed to capture neuroticism and/or negative affectivity content. As such, several (if not all) items from the AAQ-II seem to overlap semantically with items from neuroticism and negative affect scales.⁵ We speculate that this might explain the poor discriminant validity of the AAQ-II scores from measures of neuroticism and NA. It may also explain the EFA results in the current and original study (Rochefort et al., 2018); it is likely that the AAQ-II scores loading with neuroticism content is the result of poorly worded items instead of something inherent in experiential avoidance as a construct. It is worth noting that there were no major differences across measurement formats for any of the measures included. The minor differences were that the AAQ-II and FFMQ shared the same response format (i.e., “never true” to “always true”), whereas the MEAQ, BFI-2-N, IPIP-N, and TAI used the same response format (i.e., “strongly agree” to “strongly disagree”). As such, shared method variance cannot explain the current results, as it would have increased the associations between the AAQ-II and FFMQ, as well as the MEAQ associations with neuroticism and NA. More importantly, the current results for the MEAQ, as well as the broader MEAQ literature, provide evidence for the validity of experiential avoidance as a construct in and of itself.

Implications for Research and Clinical Work Using the AAQ-II

Despite considerable evidence regarding substantial psychometric problems with the AAQ-II, it remains the most widely used measure of experiential avoidance. It is worth noting that some psychologists may focus on predictive validity and may prioritize usability. However, because outcomes are multi-determined, a focus on predictive validity provides only limited information about what a measure is truly assessing and can lead to inaccurate conclusions. Nevertheless, despite clear evidence that the AAQ-II does not assess experiential avoidance, some may argue that its predictive ability may still provide heuristic utility. It is also true that the brief nature of the AAQ-II makes it feasible for repeated assessments over the course of treatment, whereas longer measures (i.e., the MEAQ) may not be feasible. However, we argue that clinicians must consider their therapeutic goals. Given that therapies like ACT are not trying to alter neuroticism but rather aim to improve psychological well-being, measuring changes in neuroticism over time (as one would be doing with repeated administrations of the AAQ-II) may not align with the core tenets of ACT.

It is also worth noting that other brief measures of experiential avoidance exist, such as the Brief Experiential Avoidance Questionnaire (BEAQ; Gámez et al., 2014), which is a 15-item short form of the MEAQ. To explore the BEAQ as a clinically feasible alternative, we ran post hoc analyses in both samples following the same “bass-ackward” approach with the BEAQ. Critically, results (not reported) replicated those found with the MEAQ. This suggests that the BEAQ is a clinically feasible alternative to the AAQ-II when the full MEAQ cannot be administered.

Ultimately, we urge researchers and clinicians to interpret AAQ-II scores with the knowledge that they are not capturing changes in latent experiential avoidance. Moreover, results from previous studies using the AAQ-II must be re-interpreted with the knowledge that the AAQ-II scores are best conceptualized as targeting anxiety/depression facets of neuroticism. We believe this should be a primary consideration when selecting measures for use in research and clinical work.

Further Evaluation of the MEAQ

Although considerable research has found that the MEAQ scores demonstrate appropriate discriminant validity from neuroticism and NA, evidence regarding its discriminant validity from other constructs (e.g., distress tolerance, impulsivity, committed action, self-as-context) is lacking. Furthermore, it is necessary to empirically test the MEAQ’s ability to predict important outcomes for third-wave interventions, such as decreased experiential avoidance and increased quality of life. Likewise, few studies have evaluated the incremental predictive power of scores on the MEAQ over and above neuroticism. Two studies evaluated the incremental validity of two MEAQ subscales; Naragon-Gainey and Watson (2018) used the Behavioral Avoidance scale, and Anderson and colleagues (2021) used the Distress Aversion scale. These studies found limited or no evidence of incremental validity above and beyond neuroticism (Anderson et al., 2021; Naragon-Gainey & Watson, 2018). However, examining only two subscales makes it impossible to draw any firm conclusions. As such, it is critical for future research to test whether the MEAQ and other experiential avoidance measures demonstrate unique predictive power for internalizing symptoms that are not accounted for by neuroticism.

Limitations and Future Directions

We used a combination of large, age-diverse student and community samples to ensure more generalizable results than a single sample alone, and results replicated across these samples and were consistent with past research. However, the current samples were not racially or gender diverse. Replicating this study using samples with greater variability in ethnicity and gender would improve the generalizability of findings. Likewise, these issues have yet to be explored in clinical samples. Although it is unlikely that different patterns of results would emerge in a clinical sample, it is critical to test this assumption, as much of the research on links between the MEAQ (and AAQ-II) scores and psychopathology has been conducted using student and online samples (see original development papers, Bond et al., 2011; Gámez et al., 2011).

Second, as previously discussed, a semantic analysis of the AAQ-II would help address unanswered questions regarding the AAQ-II. Specifically, if researchers are interested in knowing why the AAQ-II scores continue to perform poorly (as found in the present study, as well as past work; Rochefort et al., 2018; Wolgast, 2014), it would be beneficial to explicitly evaluate the degree of semantic overlap between the AAQ-II items and neuroticism and NA items. Clark and Watson (2019) have stated that “the inclusion of almost any negative mood term (e.g., “I worry about . . .,” . . .) virtually guarantees a substantial neuroticism/negative affectivity component to an item.” This issue applies to several AAQ-II items (e.g., “I worry about not being able to control my worries and feelings,” “Worries get in the way of my success.”). Although outside the scope of the current study, future research could use semantic similarity analyses (e.g., cosine similarity using lexical embeddings) and other methods to empirically test the degree to which poor item-wording is the reason for the poor discriminant validity of the AAQ-II.
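As a rough illustration of the kind of semantic similarity analysis suggested above, the sketch below computes cosine similarity between two item texts. It deliberately swaps in simple bag-of-words count vectors for the lexical embeddings mentioned in the text, so it only captures shared wording; a real analysis would replace `bag_of_words` with vectors from an embedding model. The comparison item is a hypothetical neuroticism-style item written for this example, not an item from any of the scales used in the study.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its word tokens (stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


aaq_item = "I worry about not being able to control my worries and feelings"
comparison_item = "I worry about things"  # hypothetical item, for illustration only

similarity = cosine_similarity(bag_of_words(aaq_item), bag_of_words(comparison_item))
print(round(similarity, 3))
```

Values closer to 1 indicate more shared wording. Because count vectors treat "worry" and "worries" as unrelated tokens, this stand-in likely understates the overlap that embedding-based methods are designed to detect.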

Overall, the wide use of the AAQ-II highlights the need to inform broader audiences about the importance of using psychometrically sound measures whose scores assess the intended construct in research and practice. Of note, past reliance on the AAQ-II and the current results raise the question of whether experiential avoidance as a construct has unique associations with, or provides incremental predictive power for, internalizing psychopathology. This is an important topic for future research that must be tested using measures that actually assess experiential avoidance. Moreover, whether experiential avoidance has unique associations with other theoretically relevant outcomes (i.e., quality of life, life satisfaction) is a rich area of exploration.

A third limitation is that many third-wave behavioral theory constructs have murky conceptualizations, and measures of those constructs lack adequate support for their psychometric properties (Chawla & Ostafin, 2007; Gillanders et al., 2014; Ruiz, 2012). Indeed, in the current study, the FFMQ scores also demonstrated discriminant validity issues with neuroticism and NA scores, albeit not as severe as the AAQ-II scores. In fact, in the current study, much of the FFMQ contributed to the “Emotional Self-Judgement” factor, which seems to be a facet of NA given that it captures self-criticism for experiencing negative emotions. In addition, as mentioned earlier, it is possible that the FFMQ subscale scores are differentially associated with experiential avoidance scale scores. For example, based on their definitions, observing internal experiences (i.e., “Observe” subscale) is likely less related to experiential avoidance than not reacting to internal experiences (i.e., “Nonreact” subscale). Possible differential associations were not explored in the current study. Taken together, the psychometric properties of the FFMQ may need further evaluation in and of themselves, potentially limiting its strength as a comparison third-wave measure. Future studies should also evaluate FFMQ scores alongside neuroticism and NA scores, and scores of measures of related constructs (e.g., EA, mindfulness, fusion, committed action), the same way the AAQ-II and MEAQ scores were evaluated in the current study.

In general, the use of psychometrically sound measures must become a priority in third-wave behavioral theory research and practice. Aside from the MEAQ, there is the Brief Experiential Avoidance Questionnaire (BEAQ; Gámez et al., 2014), which is meant for quicker assessments of experiential avoidance. There are also other potential measures, such as the Multidimensional Psychological Flexibility Inventory (MPFI; Landi et al., 2021), which was developed to ensure discriminant validity from internalizing psychopathology but needs to be evaluated alongside neuroticism and NA. Moving forward, we strongly recommend that researchers use more psychometrically sound measures of experiential avoidance instead of the AAQ-II. More generally, the development of new, more valid and reliable third-wave measures remains an important future direction. While all aspects of construct validity are important, demonstrating discriminant validity will be essential in this endeavor; there must be a greater focus on ensuring that measures capture their target construct and do not capture other similar constructs. Furthermore, the field must explicitly establish and empirically test expected associations between constructs like mindfulness and experiential avoidance. Such work would increase confidence in researchers’ findings, further solidifying third-wave behavioral theory.

Conclusion

Given the results of the current and past research, we recommend against the continued use of the AAQ-II, as there is substantial evidence that the AAQ-II scores represent latent trait neuroticism and NA (specifically anxiety and depression facets). Moreover, measurement must become a more central focus in third-wave behavioral theory research. Improved measures will allow researchers to better understand the constructs relevant to their theory and have greater confidence in their interpretations and results. To continue to advance research on experiential avoidance, and third-wave behavior therapy more broadly, considerable attention must focus on the psychometric properties of third-wave behavioral measures.

Supplemental Material

sj-docx-1-asm-10.1177_10731911261423143 – Supplemental material for Assessing Experiential Avoidance: Further Testing of the AAQ-II and the MEAQ

Supplemental material, sj-docx-1-asm-10.1177_10731911261423143 for Assessing Experiential Avoidance: Further Testing of the AAQ-II and the MEAQ by Alexa Jimenez, Catherine Rochefort Modén and Michael Chmielewski in Assessment

sj-docx-2-asm-10.1177_10731911261423143 – Supplemental material for Assessing Experiential Avoidance: Further Testing of the AAQ-II and the MEAQ

Supplemental material, sj-docx-2-asm-10.1177_10731911261423143 for Assessing Experiential Avoidance: Further Testing of the AAQ-II and the MEAQ by Alexa Jimenez, Catherine Rochefort Modén and Michael Chmielewski in Assessment

Footnotes

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Ethical Considerations

This research study has been reviewed and approved by the SMU Institutional Review Board. Participants provided written informed consent before participating in this study.

Pre-Registration

Study was pre-registered at .

ORCID iD

Alexa Jimenez

Supplemental Material

Supplemental material for this article is available online.

References

Akbari, Seydavi, Hosseini, Z. S., Krafft, & Levin, M. E. (2022). Experiential avoidance in depression, anxiety, obsessive-compulsive related, and posttraumatic stress disorders: A comprehensive systematic review and meta-analysis. Journal of Contextual Behavioral Science, 24, 65–78. https://doi.org/10.1016/j.jcbs.2022.03.007

American Psychological Association (2018). Avoidance. In APA dictionary of psychology. Retrieved January 14, 2026, from https://dictionary.apa.org/avoidance

Anderson, G. N., Tung, E. S., Brown, T. A., & Rosellini, A. J. (2021). Facets of emotion regulation and emotional disorder symptom dimensions: Differential associations and incremental validity in a large clinical sample. Behavior Therapy, 52, 917–931. https://doi.org/10.1016/j.beth.2020.11.003

Angelakis & Gooding (2021). Experiential avoidance in non-suicidal self-injury and suicide experiences: A systematic review and meta-analysis. Suicide and Life-Threatening Behavior, 51, 978–992. https://doi-org.proxy.libraries.edu/10.1111/sltb.12784

Angelakis & Pseftogianni (2021). Association between obsessive-compulsive and related disorders and experiential avoidance: A systematic review and meta-analysis. Journal of Psychiatric Research, 138, 228–239. https://doi.org/10.1016/j.jpsychires.2021.03.062

Baer, R. A., Smith, G. T., Hopkins, Krietemeyer, & Toney (2006). Using self-report assessment methods to explore facets of mindfulness. Assessment, 13(1), 27–45. https://doi.org/10.1177/1073191105283504

Bond, F. W., Hayes, S. C., Baer, R. A., Carpenter, K. M., Guenole, Orcutt, H. K., Waltz, & Zettle, R. D. (2011). Preliminary psychometric properties of the Acceptance and Action Questionnaire–II: A revised measure of psychological inflexibility and experiential avoidance. Behavior Therapy, 42(4), 676–688. https://doi.org/10.1016/j.beth.2011.03.007

Brereton & McGlinchey (2020). Self-harm, emotion regulation, and experiential avoidance: A systematic review. Archives of Suicide Research, 24(Suppl. 1), S1–S24. https://doi.org/10.1080/13811118.2018.1563575

Broman-Fulks, J. J., Hall, C. A., Kelso, K. C., & Kundert (2021). Incremental validity of the AAQ-II for anxiety disorder symptomology. Journal of Contextual Behavioral Science, 22, 77–86. https://doi.org/10.1016/j.jcbs.2021.09.007

Byrne, S. P., Haber, Baillie, Costa, D. S. J., Fogliati, & Morley (2019). Systematic reviews of mindfulness and acceptance and commitment therapy for alcohol use disorder: Should we be using third wave therapies? Alcohol and Alcoholism (Oxford, Oxfordshire), 54(2), 159–166. https://doi.org/10.1093/alcalc/agy089

Chawla & Ostafin (2007). Experiential avoidance as a functional dimensional approach to psychopathology: An empirical review. Journal of Clinical Psychology, 63(9), 871–890. https://doi.org/10.1002/jclp.20400

Chmielewski & Kucker, S. C. (2020). An MTurk crisis? Shifts in data quality and the impact on study results. Social Psychological and Personality Science, 11(4), 464–473. https://doi.org/10.1177/1948550619875149

Clark, L. A., & Watson (1995). Constructing validity: Basic issues in objective scale development. Psychological Assessment, 7, 309–319. http://doi.org/10.1037/1040-3590.7.3.309

Clark, L. A., & Watson (2019). Constructing validity: New developments in creating objective measuring instruments. Psychological Assessment, 31(12), 1412–1427. https://doi-org.proxy.libraries.edu/10.1037/pas0000626

Costa, P. T., Jr., & McCrae, R. R. (2008). The revised NEO personality inventory (NEO-PI-R). In Boyle, G. J., Matthews, & Saklofske, D. H. (Eds.), The Sage handbook of personality theory and assessment: Vol. 2. Personality measurement and testing (pp. 179–198). Sage. https://doi.org/10.4135/9781849200479.n9

Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281–302. https://doi.org/10.1037/h0040957

DeYoung, C. G., Quilty, L. C., & Peterson, J. B. (2007). Between facets and domains: 10 aspects of the Big Five. Journal of Personality and Social Psychology, 93(5), 880–896. https://doi.org/10.1037/0022-3514.93.5.880

Douglas, B. D., Ewell, P. J., & Brauer (2023). Data quality in online human-subjects research: Comparisons between MTurk, Prolific, CloudResearch, Qualtrics, and SONA. PLOS ONE, 18(3), Article e0279720. https://doi.org/10.1371/journal.pone.0279720

Gámez, Chmielewski, Kotov, Ruggero, & Watson (2011). Development of a measure of experiential avoidance: The Multidimensional Experiential Avoidance Questionnaire. Psychological Assessment, 23(3), 692–713. https://doi.org/10.1037/a0023242

Gámez, Chmielewski, Kotov, Ruggero, Suzuki, & Watson (2014). Brief experiential avoidance questionnaire: Development and initial validation. Psychological Assessment, 26(1), 35–45. https://doi.org/10.1037/a0034473

Garey, Zvolensky, M. J., & Spada, M. M. (2020). Third wave cognitive and behavioral processes and therapies for addictive behaviors: An introduction to the special issue. Addictive Behaviors, 108, Article 106465. https://doi.org/10.1016/j.addbeh.2020.106465

Gillanders, D. T., Bolderston, Bond, F. W., Dempster, Flaxman, P. E., Campbell, Kerr, Tansey, Noel, Ferenbach, Masley, Roach, Lloyd, May, Clarke, & Remington (2014). The development and initial validation of the cognitive fusion questionnaire. Behavior Therapy, 45(1), 83–101. https://doi.org/10.1016/j.beth.2013.09.001

Goldberg, L. R. (1999). A broad-bandwidth, public domain, personality inventory measuring the lower-level facets of several five-factor models. In Mervielde, Deary, De Fruyt, & Ostendorf (Eds.), Personality psychology in Europe (Vol. 7, pp. 7–28). Tilburg University Press.

Goldberg, L. R. (2006). Doing it all bass-ackwards: The development of hierarchical factor structures from the top down. Journal of Research in Personality, 40(4), 347–358. https://doi.org/10.1016/j.jrp.2006.01.001

Goldberg, L. R., Johnson, J. A., Eber, H. W., Hogan, Ashton, M. C., Cloninger, C. R., & Gough, H. C. (2006). The International Personality Item Pool and the future of public-domain personality measures. Journal of Research in Personality, 40, 84–96. https://doi.org/10.1016/j.jrp.2005.08.007

Hauser, D. J., Moss, A. J., Rosenzweig, Jaffe, S. N., Robinson, & Litman (2022). Evaluating CloudResearch’s approved group as a solution for problematic data quality on MTurk. Behavior Research Methods, 55, 3953–3964. https://doi.org/10.3758/s13428-022-01999-x

Hayes, S. C., Ciarrochi, Hofmann, S. G., Chin, & Sahdra (2022). Evolving an idionomic approach to process change: Towards a unified personalized science of human improvement. Behaviour Research and Therapy, 156, 104155. https://doi.org/10.1016/j.brat.2022.104155

Hayes, S. C., Luoma, J. B., Bond, F. W., Masuda, & Lillis (2006). Acceptance and commitment therapy: Model, processes and outcomes. Behaviour Research and Therapy, 44(1), 1–25. https://doi.org/10.1016/j.brat.2005.06.006

Hayes, S. C., Strosahl, & Wilson, K. G. (1999). Acceptance and commitment therapy: An experiential approach to behavior change. Guilford Press.

Hayes, S. C., Wilson, K. G., Gifford, E. V., Follette, V. M., & Strosahl (1996). Experiential avoidance and behavioral disorders: A functional dimensional approach to diagnosis and treatment. Journal of Consulting and Clinical Psychology, 64(6), 1152–1168. https://doi-org.proxy.libraries.edu/10.1037/0022-006X.64.6.1152

Hayes-Skelton, S. A., & Eustis, E. H. (2020). Experiential avoidance. In Abramowitz, J. S., & Blakey, S. M. (Eds.), Clinical handbook of fear and anxiety: Maintenance processes and treatment mechanisms (pp. 115–131). American Psychological Association. https://doi-org.proxy.libraries.edu/10.1037/0000150-007

Howell, A. J., & Passmore, H.-A. (2019). Acceptance and commitment training (ACT) as a positive psychological intervention: A systematic review and initial meta-analysis regarding ACT’s role in well-being promotion among university students. Journal of Happiness Studies: An Interdisciplinary Forum on Subjective Well-being, 20(6), 1995–2010. https://doi.org/10.1007/s10902-018-0027-7

John, O. P., Donahue, E. M., & Kentle, R. L. (1991). Big Five Inventory (BFI) [Database record]. APA PsycTests. https://doi.org/10.1037/t07550-000

Johnson, J. A. (2014). Measuring thirty facets of the Five Factor Model with a 120-item public domain inventory: Development of the IPIP-NEO-120. Journal of Research in Personality, 51, 78–89. https://doi.org/10.1016/j.jrp.2014.05.003

Kotov, Gámez, Schmidt, & Watson (2010). Linking “big” personality traits to anxiety, depressive, and substance use disorders: A meta-analysis. Psychological Bulletin, 136(5), 768–821. https://doi.org/10.1037/a0020327

Kroska, E. B., Miller, M. L., Roche, A. I., Kroska, S. K., & O’Hara, M. W. (2018). Effects of traumatic experiences on obsessive-compulsive and internalizing symptoms: The role of avoidance and mindfulness. Journal of Affective Disorders, 225, 326–336. https://doi.org/10.1016/j.jad.2017.08.039

Landi, Pakenham, K. I., Crocetti, Grandi, & Tossani (2021). The Multidimensional Psychological Flexibility Inventory (MPFI): Discriminant validity of psychological flexibility with distress. Journal of Contextual Behavioral Science, 21, 22–29. https://doi.org/10.1016/j.jcbs.2021.05.004

Lee, I. A., & Preacher, K. J. (2013, September). Calculation for the test of the difference between two dependent correlations with one variable in common [Computer software]. http://quantpsy.org

Madley-Dowd, Hughes, Tilling, & Herson (2019). The proportion of missing data should not be used to guide decisions on multiple imputation. Journal of Clinical Epidemiology, 110, 63–73. https://doi.org/10.1016/j.jclinepi.2019.02.016

Naragon-Gainey & Watson (2018). What lies beyond neuroticism? An examination of unique contributions of social-cognitive vulnerabilities to internalizing disorders. Assessment, 25(2), 143–158. https://doi.org/10.1177/1073191116659741

O’Conner, B. P. (2000). SPSS and SAS programs for determining the number of components using parallel analysis and Velicer’s MAP test. Behavior Research Methods, Instruments & Computers, 32(3), 396–402. https://doi.org/10.3758/BF03200807

Ong, C. W., Pierce, B. G., Peterson, J. M., Barney, J. L., Fruge, J. E., Levin, M. E., & Twohig, M. P. (2020). A psychometric comparison of psychological inflexibility measures: Discriminant validity and item performance. Journal of Contextual Behavioral Science, 18, 34–47. https://doi.org/10.1016/j.jcbs.2020.08.007

Robinson, P. L., Russell, & Dysch (2019). Third-wave therapies for long-term neurological conditions: A systematic review to evaluate the status and quality of evidence. Brain Impairment, 20(1), 58–80. https://doi.org/10.1017/BrImp.2019.2

Rochefort, Baldwin, A. S., & Chmielewski (2018). Experiential avoidance: An examination of the construct validity of the AAQ-II and MEAQ. Behavior Therapy, 49, 435–449. https://doi.org/10.1016/j.beth.2017.08.008

Ruiz, F. J. (2012). Acceptance and commitment therapy versus traditional cognitive behavioral therapy: A systematic review and meta-analysis of current empirical evidence. International Journal of Psychology and Psychological Therapy, 12(3), 333–358. https://doi.org/10.1016/j.jcbs.2020.09.009

Soto, C. J., & John, O. P. (2017). The next Big Five Inventory (BFI-2): Developing and assessing a hierarchical model with 15 facets to enhance bandwidth, fidelity, and predictive power. Journal of Personality and Social Psychology, 113(1), 117–143. https://doi.org/10.1037/pspp0000096

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87, 245–251. https://doi.org/10.1037/0033-2909.87.2.245

Stenhoff, Steadman, Nevitt, Benson, & White, R. G. (2020). Acceptance and commitment therapy and subjective well-being: A systematic review and meta-analysis of randomised controlled trials in adults. Journal of Contextual Behavioral Science, 18, 256–272. https://doi.org/10.1016/j.jcbs.2020.08.008

Strauss, M. E., & Smith, G. T. (2009). Construct validity: Advances in theory and methodology. Annual Review of Clinical Psychology, 5, 1–25. https://doi.org/10.1146/annurev.clinpsy.032408.153639

Tackett, J. L., Brandes, C. M., King, K. M., & Markon, K. E. (2019). Psychology’s replication crisis and clinical psychological science. Annual Review of Clinical Psychology, 15, 579–604. https://doi.org/10.1146/annurev-clinpsy-050718-095710

Tackett, J. L., Lilienfeld, S. O., Patrick, C. J., Johnson, S. L., Krueger, R. F., Miller, J. D., Oltmanns, T. F., & Shrout, P. E. (2017). It’s time to broaden the replicability conversation: Thoughts for and from clinical psychological science. Perspectives on Psychological Science: A Journal of the Association for Psychological Science, 12(5), 742–756. https://doi.org/10.1177/1745691617690042

Tackett, J. L., & Miller, J. D. (2019). Introduction to the special section on increasing replicability, transparency and openness in clinical psychology. Journal of Abnormal Psychology, 126(6), 487–492. https://doi.org/10.1037/abn0000455

Tyndall, Waldeck, Pancani, Whelan, Roche, & Dawson, D. L. (2019). The Acceptance and Action Questionnaire-II (AAQ-II) as a measure of experiential avoidance: Concerns over discriminant validity. Journal of Contextual Behavioral Science, 12, 278–284. https://doi.org/10.1016/j.jcbs.2018.09.005

Vaughan-Johnston, T. I., Quickert, R. E., & MacDonald, T. K. (2017). Psychological flexibility under fire: Testing the incremental validity of experiential avoidance. Personality and Individual Differences, 105, 335–349. https://doi.org/10.1016/j.paid.2016.10.011

Watson (2012). Objective tests as instruments of psychological theory and research. In Cooper (Ed.), Handbook of research methods in psychology: Vol. 1. Foundations, planning, measures, and psychometrics (pp. 349–369). American Psychological Association. https://doi.org/10.1037/13619-019

Watson & Clark, L. A. (1992). On traits and temperament: General and specific factors of emotional experience and their relation to the Five-Factor model. Journal of Personality, 60(2), 441–476. https://doi.org/10.1111/j.1467-6494.1992.tb00980.x

Watson, Clark, L. A., & Tellegen (1988). Development and validation of brief measures of positive and negative affect: The PANAS scales. Journal of Personality and Social Psychology, 54(6), 1063–1070. https://doi.org/10.1037/0022-3514.54.6.1063

Watson, Stasik, S. M., Chmielewski, & Naragon-Gainey (2015). Development and validation of the temperament and affectivity inventory (TAI). Assessment, 22(5), 540–560. https://doi.org/10.1177/1073191114557943

Wolgast (2014). What does the Acceptance and Action Questionnaire (AAQ-II) really measure? Behavior Therapy, 45(6), 831–839. https://doi.org/10.1016/j.beth.2014.07.002

Zhenggang, Shiga, Luyao, Sijie, & Iris (2020). Acceptance and Commitment Therapy (ACT) to reduce depression: A systematic review and meta-analysis. Journal of Affective Disorders, 260, 728–737. https://doi.org/10.1016/j.jad.2019.09.040