Abstract
Experiential avoidance is a core construct of third-wave behavioral theories and a predictor of internalizing psychopathology. It has most frequently been measured using the Acceptance and Action Questionnaire-II (AAQ-II). However, several studies have indicated that AAQ-II scale scores demonstrate poor discriminant validity from neuroticism, calling into question the interpretation of past findings and leading some researchers to recommend measuring experiential avoidance with the Multidimensional Experiential Avoidance Questionnaire (MEAQ). In large online community (
Research on third-wave behavioral therapies has provided an abundance of empirical support for their efficacy in fostering well-being (Howell & Passmore, 2019; Stenhoff et al., 2020) and, by extension, treating symptoms of anxiety and depression (Byrne et al., 2019; Garey et al., 2020; Robinson et al., 2019; Zhenggang et al., 2020). As such, third-wave behavioral theories are a major focus of psychological research, particularly for improving clinical interventions (Hayes et al., 2022). Specifically, a core construct of third-wave behavioral theory, experiential avoidance, has been suggested as a key factor in the development and maintenance of psychological distress (Akbari et al., 2022; Angelakis & Gooding, 2021; Angelakis & Pseftogianni, 2021; Brereton & McGlinchey, 2020). Experiential avoidance has been defined as “the tendency to avoid uncomfortable thoughts, feelings, and experiences, even when doing so leads to long-term negative consequences” (Gámez et al., 2011; Hayes et al., 1999). This includes attempts to control internal experiences by means of suppressing thoughts and emotions and judging oneself for having negative internal experiences (Hayes-Skelton & Eustis, 2020). Past research has demonstrated that experiential avoidance predicts several clinical outcomes, such as anxiety, depression, trauma, and obsessive-compulsive symptoms (Akbari et al., 2022; Angelakis & Gooding, 2021; Angelakis & Pseftogianni, 2021; Kroska et al., 2018).
Notably, experiential avoidance expands upon typical conceptualizations of avoidance. Avoidance has been traditionally discussed as the resistance to encountering an unpleasant person, place, or situation due to fear of a specific and immediate consequence (APA, 2018). Experiential avoidance takes this further by emphasizing the act of avoiding
Experiential Avoidance as a Distinct Construct
It is important to note that Hayes and colleagues (1996) describe experiential avoidance as a unique construct, distinct from both the personality trait neuroticism and trait negative affect (NA), two constructs that are themselves highly similar and overlapping (e.g.,
As such, the theoretical distinction of experiential avoidance from neuroticism and NA is essential to third-wave behavioral theory, and many of its theoretical assumptions rest on experiential avoidance being a unique entity. Moreover, it is well established that verifying discriminant validity from neuroticism and NA is an essential step in determining the construct validity of scale scores for any clinically relevant measure (Clark & Watson, 1995, 2019; Watson, 2012). Failure to do so can lead to inaccurate results and undermine the conclusions drawn from them (Clark & Watson, 1995, 2019; Cronbach & Meehl, 1955; Strauss & Smith, 2009). To align with theory, the scale scores of experiential avoidance measures must capture the distinction of experiential avoidance from neuroticism and NA. For the current study’s purposes, neuroticism and NA are discussed together, given that the goal is to distinguish experiential avoidance from both. However, neuroticism serves as the main reference point for establishing the discriminant validity of experiential avoidance because of its historically robust associations with, and predictive power for, internalizing symptoms.
Issues With Measurement of Experiential Avoidance
The most widely used measure of experiential avoidance is the Acceptance and Action Questionnaire-II (AAQ-II; Bond et al., 2011); however, concerns have emerged regarding the construct validity of AAQ-II scores, and therefore of the measure itself (Broman-Fulks et al., 2021; Rochefort et al., 2018; Tyndall et al., 2019; Vaughan-Johnston et al., 2017; Wolgast, 2014). These criticisms have focused mainly on the poor discriminant validity of AAQ-II scale scores from neuroticism and NA. Several researchers have criticized the AAQ-II for including items whose language overlaps conceptually with neuroticism and for having poor internal consistency (Gámez et al., 2011; Rochefort et al., 2018; Tyndall et al., 2019; Vaughan-Johnston et al., 2017; Wolgast, 2014). By contrast, the Multidimensional Experiential Avoidance Questionnaire (MEAQ; Gámez et al., 2011) was specifically created both to capture the theoretical distinction of experiential avoidance from neuroticism and NA (i.e., to demonstrate adequate discriminant validity) and to capture the full breadth of experiential avoidance.
Rochefort et al. (2018) conducted the most thorough evaluation of how well scores from both measures assess experiential avoidance, show discriminant validity from neuroticism and NA measures, and align with other third-wave behavioral constructs. In two large samples, they assessed (a) the convergent validity of the AAQ-II and MEAQ scale scores and their discriminant validity from neuroticism and NA, and (b) the hierarchical structure of the AAQ-II, MEAQ, mindfulness, neuroticism, and negative affect at the total scale score, subscale score, and item levels. In both samples, the AAQ-II scale scores were so highly correlated with scale scores from measures of neuroticism and NA that the correlations would be considered evidence of convergent validity (
Need for Replication and Extension of Rochefort and Colleagues’ (2018) Study
In recent years, scientists have highlighted the importance of replication for clinical psychology (Tackett et al., 2019; Tackett et al., 2017). As Tackett and Miller (2019, p. 597) noted, “The replication movement requires greater involvement and engagement by clinical psychological researchers.” Replication, a “core principle of objective, empirical science,” bolsters our trust in scientific results (Tackett & Miller, 2019). Moreover, despite research indicating that AAQ-II scale scores demonstrate poor discriminant validity from those of neuroticism and NA measures, the AAQ-II continues to be the most widely used measure of experiential avoidance; this continued widespread use suggests that the existing evidence has not been sufficient to change researchers’ practices. In addition, it is necessary to ensure that results are not due to the specific measures used in any given study. It is possible that the neuroticism and NA measures used by Rochefort et al. (2018) drove the strength of the associations and that other measures would yield different results. Replicating their findings in new samples with different measures of neuroticism and NA is therefore necessary.
It also remains unclear whether AAQ-II scores assess neuroticism and NA at the domain level or, more specifically, assess particular facets of neuroticism and/or NA. Facets home in on specific components of neuroticism, such as emotional volatility. This question matters because the neuroticism and NA measures used by Rochefort et al. (2018) included only two subscales targeting neuroticism facets. Assessing more neuroticism and NA facets would allow a more fine-grained test; examining the associations of AAQ-II and MEAQ scores at this level of specificity is highly relevant for evaluating more precisely what AAQ-II and MEAQ scores assess, as well as for clarifying results from research using the AAQ-II.
Current Study
The current study replicates and extends Rochefort et al. (2018) using two large samples of online community and undergraduate participants. Following the same procedures and analyses as Rochefort et al. (2018), we expanded upon that study by using more comprehensive and/or updated measures of neuroticism and negative affectivity. This was done to (a) rule out any measure-specific effects that may have contributed to the findings in the original study and (b) include more subscales capturing facets of neuroticism and NA, thereby allowing a more precise assessment of what AAQ-II and MEAQ scores measure. Our methods, sample size, data analytic plan, and hypotheses were pre-registered (https://doi.org/10.17605/OSF.IO/DK8EX).
Compared to Rochefort et al. (2018), the current study uses the Big Five Inventory-2 (BFI-2; Soto & John, 2017) instead of the Big Five Inventory (BFI; John et al., 1991), the IPIP-NEO (Johnson, 2014) instead of the Big Five Aspect Scales (BFAS; DeYoung et al., 2007), and the Temperament and Affectivity Inventory (TAI; Watson et al., 2015) instead of the Positive and Negative Affect Schedule (PANAS; Watson et al., 1988). The BFI-2 and TAI were specifically developed to address psychometric limitations of the BFI and PANAS (see “Measures”). Altogether, the current measures include subscales targeting nine neuroticism facets and eight NA facets, compared with only two facets in the original study (i.e., BFAS Neuroticism: Volatility and Withdrawal).
Method
Participants and Procedures
The current study follows the same procedures, design, and sample types as Rochefort and colleagues’ (2018) study. The first sample (
The second sample (
All participants provided informed consent and completed the study online via Qualtrics. Participants were removed if they failed to respond correctly to two of the three validity-check items (e.g., “Please select ‘always true’”). Both samples were primarily female (MTurk: 58.3%, undergraduate: 71.1%) and non-Hispanic White (MTurk: 93.3%, undergraduate: 89.5%). MTurk participants received monetary compensation, and undergraduate students received course credit for their participation.
Hypotheses
We hypothesize that general findings from Rochefort et al. (2018) will be replicated in the current study. Specifically, based on the original study and the literature regarding discriminant and convergent correlation cut-offs (see Clark & Watson, 2019), we hypothesize that the AAQ-II scale scores will demonstrate poor discriminant validity (
We also hypothesize that AAQ-II scale scores will function as an indicator of latent neuroticism and NA rather than of latent experiential avoidance at the scale, subscale, and item levels in our structural analyses. In other words, the AAQ-II total score and its individual items are expected to load consistently onto factors with neuroticism and NA content across levels of analysis. In contrast, the MEAQ total scale score is expected to behave as a measure of experiential avoidance should: MEAQ scores will load on factors with the mindfulness measure scores, or they will form their own factors. We did not pre-register any hypotheses regarding whether AAQ-II scores assess neuroticism or NA broadly or are best conceptualized as assessing facets of neuroticism and/or NA.
Measures
Third-Wave Behavioral Therapy Scales
Acceptance and Action Questionnaire-II
The Acceptance and Action Questionnaire-II (AAQ-II) is a widely used measure of experiential avoidance (Bond et al., 2011; Hayes et al., 2006). It provides a total score made up of seven items (e.g., “I’m afraid of my feelings” and “I worry about not being able to control my worries and feelings”) that participants rate on a Likert-type scale ranging from 1 (“Never or very rarely true”) to 5 (“Very often or always true”). Cronbach’s alpha in this study was α = .95 in the MTurk sample and α = .93 in the undergraduate sample, similar to the values reported by Rochefort et al. (2018; MTurk: α = .93, college: α = .91).
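Cronbach’s alpha is reported for every scale and subscale in this section. As a reference for how such values are obtained, the following minimal Python sketch computes alpha from a respondents-by-items matrix using the standard formula alpha = k / (k − 1) × (1 − sum of item variances / variance of the total score). It assumes complete data with any reverse-keyed items already rescored; the function name and the simulated responses are illustrative only and are not the study’s data or software.

```python
import numpy as np


def cronbach_alpha(items) -> float:
    """Cronbach's alpha for an (n_respondents x k_items) array of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score)).
    Assumes complete data (no NaNs) and that reverse-keyed items are already rescored.
    """
    items = np.asarray(items, dtype=float)
    if items.ndim != 2 or items.shape[1] < 2:
        raise ValueError("expected a 2-D array with at least two item columns")
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # sample variance of the summed scale score
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)


# Illustrative use on simulated 7-item, 1-5 Likert responses (hypothetical data)
rng = np.random.default_rng(0)
trait = rng.normal(size=(500, 1))
responses = np.clip(np.rint(3 + trait + rng.normal(scale=0.6, size=(500, 7))), 1, 5)
print(f"alpha = {cronbach_alpha(responses):.2f}")
```

Because alpha depends only on item and total-score variances, it rises with both the number of items and their average intercorrelation, which is one reason long scales such as the 60-item neuroticism scales can yield very high values.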
Multidimensional Experiential Avoidance Questionnaire
The MEAQ is a measure of experiential avoidance that was developed using the construct validity approach to scale development (Gámez et al., 2011). In the original development study, the MEAQ had good internal consistency across student, community, and clinical samples (scale level: α = .91–.95, subscale level: α = .76–.90,
Five Facet Mindfulness Questionnaire
The Five Facet Mindfulness Questionnaire (FFMQ) is a 39-item measure of mindfulness. Participants rate items on a Likert-type scale ranging from 1 (“Never or very rarely true”) to 5 (“Very often or always true”). The measure was developed via factor analysis of items from five existing mindfulness measures and is the most comprehensive mindfulness measure to date (Baer et al., 2006). The FFMQ has five subscales: Observe (e.g., “I pay attention to sensations, such as the wind in my hair or sun on my face”), Describe (e.g., “I can usually describe how I feel at the moment in considerable detail”), Act with Awareness (e.g., “I find myself doing things without paying attention”), Nonjudge (e.g., “I tell myself I shouldn’
Neuroticism and Negative Affect Scales
Big Five Inventory-2–Neuroticism (BFI-2-N)
The BFI-2 is a 60-item measure of the Big Five personality traits. Participants rate how much each phrase generally describes them using a scale that ranges from 1 (“Strongly agree”) to 5 (“Strongly disagree”). The BFI-2 updates the BFI by adding facet-level subscales (for Neuroticism: Anxiety, Depression, and Emotional Volatility) and rewording difficult items (Soto & John, 2017). In the current samples, Cronbach’s alpha for the full Neuroticism scale was α = .93 in the MTurk sample and α = .89 in the undergraduate sample. Subscale alphas ranged from α = .83 to .86 in the MTurk sample and from α = .79 to .83 in the undergraduate sample. The domain-level alphas were similar to that reported by Soto and John (2017; α = .90), whereas the subscale alphas in the current study were generally higher than theirs (α = .78 to .84).
International Personality Item Pool–NEO, Neuroticism (IPIP-N)
The International Personality Item Pool (IPIP)-NEO is an open-source measure mirroring the NEO PI-R (Costa & McCrae, 2008; Goldberg, 1999; Goldberg et al., 2006). Like the NEO PI-R, it assesses the Big Five broad trait domains and six lower-order facets of each domain. The IPIP Neuroticism scale contains 60 items and includes subscales targeting the following facets: Anxiety (e.g., “Worry about things”), Anger (e.g., “Get angry easily”), Depression (e.g., “Often feel blue”), Self-Consciousness (e.g., “Am easily intimidated”), Immoderation (e.g., “Often eat too much”), and Vulnerability (e.g., “Become overwhelmed by events”). Participants rate how much these items generally describe them using a Likert-type scale that ranges from 1 (“Strongly Agree”) to 5 (“Strongly Disagree”). In the current samples, Cronbach’s alpha for the Neuroticism total scale was α = .97 in the MTurk sample and α = .95 in the undergraduate sample. Subscale alphas ranged from α = .82 to .91 in the MTurk sample and from α = .75 to .91 in the undergraduate sample, all slightly higher than those found in the original development of the measure, in which the Neuroticism domain alpha was α = .90 and subscale alphas ranged from α = .72 to .87 (Johnson, 2014).
Temperament and Affectivity Inventory
The Temperament and Affectivity Inventory (TAI) is a 93-item measure of trait affectivity that uses the traditional personality-questionnaire format (i.e., full sentences rather than single words or phrases); its scales were validated in four samples and showed good convergent and discriminant validity, incremental validity, internal consistency, and dependability (Watson et al., 2015). The TAI was created to address limitations of the single-word items in the trait version of the Positive and Negative Affect Schedule (PANAS; see Watson et al., 2015), which was used by Rochefort et al. (2018). Participants rated how much they agreed with each statement on a scale ranging from 1 (“Strongly Disagree”) to 5 (“Strongly Agree”).
In the current research, we used the Negative Affectivity total score and its subscales: Regret (e.g., “I have more than my share of regrets”), Depression (e.g., “the world can be a very dreary place”), Anger (e.g., “I tend to be rather irritable”), Anxiety (e.g., “I tend to be nervous”), Mistrust (e.g., “people are usually not what they seem”), Self-Doubt (e.g., “I often doubt myself”), Lassitude (e.g., “I sometimes feel too tired to do anything”), and Attentiveness (reverse-keyed; e.g., “It is easy for me to focus my attention on what needs to be done”). In the current samples, Cronbach’s alpha for the overall Negative Affectivity scale was α = .96 in the MTurk sample and α = .95 in the undergraduate sample. Subscale alphas ranged from α = .84 to .93 in the MTurk sample and from α = .74 to .90 in the undergraduate sample. These values were similar to those reported across six samples in the original development of the TAI (Watson et al., 2015), where alphas ranged from α = .80 to .90.
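Because Attentiveness is reverse-keyed, its items must be rescored before they contribute to the Negative Affectivity total, so that higher scores consistently indicate more negative affectivity. The following is a minimal Python sketch, assuming items are stored as a respondents-by-items matrix on the 1–5 response scale described above and that scale scores are item means (the text does not state whether sums or means were used; with complete data the two give identical correlations). The column index in the example is hypothetical and does not reproduce the TAI scoring key.

```python
import numpy as np


def reverse_key(scores, low: int = 1, high: int = 5):
    """Reverse-score Likert responses on a low..high scale (1<->5, 2<->4, 3 stays 3)."""
    return (low + high) - np.asarray(scores, dtype=float)


def scale_score(item_matrix, reverse_cols=(), low: int = 1, high: int = 5):
    """Mean item score per respondent after reverse-keying the listed columns."""
    items = np.array(item_matrix, dtype=float)  # copy so the caller's data stay untouched
    for col in reverse_cols:
        items[:, col] = reverse_key(items[:, col], low, high)
    return items.mean(axis=1)


# Hypothetical example: column 2 stands in for a reverse-keyed (e.g., Attentiveness) item
responses = [[5, 4, 1],
             [1, 2, 5]]
print(scale_score(responses, reverse_cols=[2]))
```

Without the rescoring step, a highly attentive respondent would be counted as high in negative affectivity on that subscale, attenuating the total score’s internal consistency.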
Data Analysis
Mirroring the analysis in Rochefort and colleagues’ (2018) study, missing data were imputed using the imputation function in SPSS version 25. Following the procedures of the original study, missing data were imputed for every scale with less than 30% missing data (MTurk: missing item data ranged from 2.2% to 4.8% across scales; undergraduates: from 1.4% to 11.7% across scales). Pearson’s correlations were used to evaluate the convergent and discriminant associations between scale scores, and significance tests for the difference between dependent correlations (Lee & Preacher, 2013) were used to compare the strength of those associations.
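The dependent-correlation tests compare, for example, the AAQ-II’s correlation with a neuroticism scale against its correlation with the MEAQ; both correlations share the AAQ-II and come from the same sample, so they are not independent. One widely used form of this comparison is Steiger’s Z, which works on Fisher-transformed correlations and uses the pooled (average) correlation to estimate their covariance. The Python sketch below implements that form; it is an illustration rather than a reproduction of the Lee and Preacher (2013) calculator, whose exact computations may differ, and the input values are hypothetical.

```python
import math


def steiger_z(r_jk: float, r_jh: float, r_kh: float, n: int) -> tuple[float, float]:
    """Steiger's (1980) Z for two dependent correlations sharing variable j.

    r_jk, r_jh: the two correlations being compared; r_kh: correlation between the
    two non-shared variables; n: sample size. Returns (Z, two-tailed p).
    """
    z_jk, z_jh = math.atanh(r_jk), math.atanh(r_jh)   # Fisher r-to-z
    r_bar = (r_jk + r_jh) / 2.0                        # pooled correlation
    # Asymptotic covariance term between the two correlations
    psi = r_kh * (1 - 2 * r_bar**2) - 0.5 * r_bar**2 * (1 - 2 * r_bar**2 - r_kh**2)
    c = psi / (1 - r_bar**2) ** 2
    z = (z_jk - z_jh) * math.sqrt(n - 3) / math.sqrt(2 - 2 * c)
    p = math.erfc(abs(z) / math.sqrt(2))               # two-tailed normal p-value
    return z, p


# Hypothetical values: r(AAQ-II, neuroticism) = .70, r(AAQ-II, MEAQ) = .40,
# r(neuroticism, MEAQ) = .50, n = 500
z, p = steiger_z(0.70, 0.40, 0.50, 500)
print(f"Z = {z:.2f}, p = {p:.3g}")
```

A positive Z indicates that the first correlation is the stronger one; the more strongly the two non-shared variables correlate (r_kh), the smaller the standard error of the difference and the more sensitive the test.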
Following Rochefort et al. (2018), separate exploratory factor analyses (EFAs) with promax rotation were conducted at the total scale, subscale, and item levels. For each set of analyses, to give AAQ-II scores every possible opportunity to distinguish themselves from neuroticism and NA scores, we extracted the number of factors suggested by scree plots, parallel analyses (O’Connor, 2000), and item loadings. In addition, we continued extracting factors beyond what the scree plots and parallel analyses suggested as long as each new factor contained “marker items,” that is, items loading .40 or higher on a single primary factor (Clark & Watson, 2019).
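The two extraction rules described above can be made concrete. The Python sketch below implements (a) Horn’s parallel analysis on correlation-matrix eigenvalues, retaining factors whose observed eigenvalue exceeds the 95th percentile of eigenvalues from random normal data of the same size, and (b) a check of whether each factor in a loading matrix has at least one marker item. It is a simplified illustration, not O’Connor’s (2000) program, which also offers principal-axis eigenvalues and permutation-based random data; the percentile, the number of simulations, and treating an item’s primary factor as the one with its largest absolute loading are assumptions made here.

```python
import numpy as np


def parallel_analysis(data, n_iter: int = 100, percentile: float = 95.0, seed: int = 0) -> int:
    """Horn's parallel analysis using eigenvalues of the correlation matrix.

    Returns the number of leading observed eigenvalues that exceed the chosen
    percentile of eigenvalues obtained from random normal data of the same shape.
    """
    data = np.asarray(data, dtype=float)
    n, p = data.shape
    rng = np.random.default_rng(seed)
    observed = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]  # descending
    random_eigs = np.empty((n_iter, p))
    for i in range(n_iter):
        sim = rng.standard_normal((n, p))
        random_eigs[i] = np.linalg.eigvalsh(np.corrcoef(sim, rowvar=False))[::-1]
    threshold = np.percentile(random_eigs, percentile, axis=0)
    exceeds = observed > threshold
    # Stop counting at the first eigenvalue that fails to beat its random counterpart
    return p if exceeds.all() else int(np.argmin(exceeds))


def factors_with_markers(pattern, cutoff: float = 0.40):
    """For each factor, True if at least one item loads >= cutoff on it as its primary factor."""
    abs_load = np.abs(np.asarray(pattern, dtype=float))
    primary = abs_load.argmax(axis=1)            # factor with each item's largest loading
    is_marker = abs_load.max(axis=1) >= cutoff   # primary loading reaches the cutoff
    return np.array([bool(np.any(is_marker & (primary == f))) for f in range(abs_load.shape[1])])


# Simulated two-factor data: 10 items, 5 per factor, loadings of .80 (hypothetical)
rng = np.random.default_rng(1)
factors = rng.standard_normal((1000, 2))
loadings = np.zeros((10, 2))
loadings[:5, 0] = 0.8
loadings[5:, 1] = 0.8
items = factors @ loadings.T + rng.standard_normal((1000, 10)) * 0.6
print(parallel_analysis(items))
print(factors_with_markers(loadings))
```

In the study’s procedure the marker-item check is what licenses extraction beyond the parallel-analysis count: a new factor is retained only while every factor still has at least one marker.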
To summarize the item-level analyses, we followed Goldberg’s (2006) bass-ackwards method, in which each consecutive factor solution in the hierarchy is presented graphically. Regression-based factor scores were computed for each factor solution and correlated with those of the next consecutive solution (e.g., factor scores from the two-factor solution were correlated with those from the three-factor solution; see Goldberg, 2006). This method
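The linking step of the bass-ackwards method can be sketched as follows. The code assumes that pattern loadings and factor correlations for each solution have already been estimated by an EFA routine (not shown). It computes Thurstone regression factor scores, F = Z R⁻¹ S, where Z is the standardized data, R is the item correlation matrix, and S = ΛΦ is the structure matrix, and then correlates the k-factor scores with the (k + 1)-factor scores to obtain the coefficients that label the paths in a bass-ackwards figure. Function names and the simulated inputs, including the crude one-factor stand-in, are hypothetical.

```python
import numpy as np


def regression_factor_scores(data, pattern, phi=None):
    """Thurstone regression factor scores: F = Z R^-1 S, with structure matrix S = pattern @ phi.

    data: (n x p) raw item scores; pattern: (p x k) pattern loadings;
    phi: (k x k) factor correlations (defaults to identity, i.e., an orthogonal solution).
    """
    data = np.asarray(data, dtype=float)
    pattern = np.asarray(pattern, dtype=float)
    if phi is None:
        phi = np.eye(pattern.shape[1])
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)   # standardize items
    R = np.corrcoef(data, rowvar=False)
    weights = np.linalg.solve(R, pattern @ phi)   # R^-1 S without an explicit inverse
    return z @ weights


def link_solutions(scores_k, scores_k_plus_1):
    """(k x (k+1)) correlations between factor scores from consecutive solutions."""
    k = np.asarray(scores_k).shape[1]
    full = np.corrcoef(scores_k, scores_k_plus_1, rowvar=False)
    return full[:k, k:]


# Hypothetical inputs: 10 items, 5 per factor with loadings .80, orthogonal factors
rng = np.random.default_rng(2)
true_f = rng.standard_normal((1000, 2))
lam = np.zeros((10, 2))
lam[:5, 0] = 0.8
lam[5:, 1] = 0.8
x = true_f @ lam.T + rng.standard_normal((1000, 10)) * 0.6
two = regression_factor_scores(x, lam)
one = regression_factor_scores(x, np.full((10, 1), 0.55))  # crude one-factor stand-in
print(np.round(link_solutions(one, two), 2))
```

Large entries in the link matrix show which factor at one level “becomes” which factor(s) at the next, which is how the splitting of a broad neuroticism and NA factor into anger versus depression and anxiety factors can be traced.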
Results
Descriptive statistics for the total scales and subscales are presented in Table 1. Values resembled those found by Rochefort et al. (2018) and in prior research (Broman-Fulks et al., 2021; Vaughan-Johnston et al., 2017). Correlations and the corresponding Steiger’s z significance tests are presented in Table 2. Notably, results were highly similar across samples. Correlations between the AAQ-II and MEAQ (MTurk:
Means and Standard Deviations for All Scales and Subscales.
Pearson Correlations Between Total Scales for Both Samples.
Significance tests demonstrated that the AAQ-II scores were more strongly associated with neuroticism and NA scores than with the FFMQ and MEAQ scores in both the MTurk (i.e., BFI-2-N, IPIP-N, TAI, all
Subscale score correlations between the neuroticism subscales, the MEAQ subscales, and AAQ-II total scores are shown in Table 3. Like the scale-level correlations, results were similar across samples. Overall, the AAQ-II scores had moderate to strong associations with neuroticism subscales (MTurk:
Pearson Correlations Between Subscales for Both Samples.
Structural Analyses
Scale Level
EFAs with promax rotation were conducted on the AAQ-II, MEAQ, FFMQ, BFI-2-N, IPIP-N, and TAI scale scores. In the MTurk sample, at the scale level, the scree plot suggested extracting two factors and parallel analysis suggested extracting a single factor. Up to three factors could be extracted before new factors no longer had any marker items. The two-factor solution contained a neuroticism and NA factor (BFI-2-N, IPIP-N, TAI, and AAQ-II) and an ACT/“third-wave behavior therapy” factor (MEAQ), with the FFMQ splitting across both factors. The three-factor solution included (a) a neuroticism factor (IPIP-N, BFI-2-N), (b) an ACT/“third-wave behavior therapy” factor (MEAQ, FFMQ), and (c) a worry and negative affectivity factor (AAQ-II, TAI). The TAI cross-loaded on the neuroticism factor. Moreover, the neuroticism and NA factor were correlated
Subscale Level
The results of the subscale analyses are presented in Table 4. Columns are organized by the number of factors extracted in each sample, and the numbers in each row indicate the factor(s) on which each (sub)scale score loaded above .40 (e.g., subscales marked “1” loaded onto the first factor, subscales marked “2” onto the second factor, etc.). Subscale scores that loaded onto more than one factor have more than one number.
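The coding rule used in the summary table can be written as a short function. This Python sketch takes a pattern-loading matrix (rows are subscales, columns are factors) and lists, for each subscale, the 1-based numbers of the factors on which it loads above .40; an empty string marks a subscale that loads above .40 on no factor. Comparing absolute loadings is an assumption made here because some factors in these analyses had negative loadings; the subscale names and loadings in the example are hypothetical.

```python
import numpy as np


def loading_codes(pattern, names, cutoff: float = 0.40) -> dict:
    """Map each (sub)scale name to the factor numbers where |loading| > cutoff, e.g. "1, 3"."""
    codes = {}
    for name, row in zip(names, np.abs(np.asarray(pattern, dtype=float))):
        hits = np.flatnonzero(row > cutoff) + 1   # 1-based factor numbers
        codes[name] = ", ".join(str(f) for f in hits)
    return codes


# Hypothetical two-factor pattern for three subscales
example = loading_codes([[0.72, 0.05], [0.45, -0.52], [0.10, 0.20]],
                        ["AAQ-II", "Subscale B", "Subscale C"])
print(example)
```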
In the MTurk sample, the scree plot suggested extracting two to eight factors, and parallel analysis suggested extracting 10 factors. Up to seven factors could be extracted before new factors no longer had any marker items. Across all factor solutions, the AAQ-II loaded onto a factor consisting of neuroticism and NA subscales (e.g., anxiety, depression, immoderation, self-doubt, mistrust, anger; see Table 4). Meanwhile, the MEAQ subscales either (a) formed their own factors (i.e., distraction and suppression, behavioral avoidance) or (b) loaded with FFMQ subscales, creating factors that represent broader third-wave behavior therapy constructs (i.e., mindfulness). Starting at the five-factor solution, the broad neuroticism and NA factor separated into more specific neuroticism facet content (IPIP-N Anger and TAI Anger broke away to form an anger facet factor). Meanwhile, the AAQ-II scores, which in theory should not measure neuroticism, continued to load on the broad neuroticism and NA factor, now made up of anxiety and depression content (e.g., anxiety, depression, immoderation, self-doubt, mistrust). This general pattern continued for the remaining solutions.
Summary of Subscale Level Factor Analysis.
Indicates the subscale did not load > 0.40 on any factor.
In the undergraduate sample, the scree plot suggested extracting two to nine factors, and parallel analysis suggested extracting nine factors. However, only eight factors could be extracted before new factors no longer had any marker items. The pattern was less clear in the initial factor solutions. In the two-factor solution, the first factor included scores from the AAQ-II, all the MEAQ subscales, most of the FFMQ subscales, and most of the TAI NA subscales (e.g., anxiety, depression, self-doubt, mistrust, lassitude). The BFI-2 and IPIP neuroticism subscales formed the second factor. In the three-factor solution, several IPIP and TAI subscales moved to the general neuroticism and NA factor with the BFI-2 and other IPIP subscales, while the AAQ-II and MEAQ scores remained part of the large, mixed first factor. In addition, some of the FFMQ and TAI subscales formed a “distractedness” factor on which mindfulness and attentiveness scales (i.e., Acting with Awareness, Procrastination, Attentiveness, Immoderation, and Lassitude) had negative loadings and TAI subscales (i.e., Lassitude, [in]attentiveness) had positive loadings. Starting at the four-factor solution, loadings shifted dramatically: AAQ-II scores had high loadings on the broad neuroticism and NA factor, which consisted mainly of depression and anxiety subscale scores. The MEAQ subscales either (a) loaded onto their own experiential avoidance factors (i.e., distress aversion, behavioral avoidance, distraction and suppression) or (b) loaded with FFMQ and TAI subscales to form “denial” and “distractedness” factors.
In subsequent solutions, additional factors represented
To summarize, in both samples, AAQ-II scores consistently loaded onto a neuroticism and NA factor in almost every solution. This occurred even when some neuroticism and NA subscale scores broke away to form a specific factor representing anger/emotional volatility, leaving AAQ-II scores loading with the depression/anxiety facets of neuroticism. Conversely, the MEAQ subscale scores loaded together to form a general experiential avoidance factor or loaded with the FFMQ subscale scores to form factors representing other third-wave constructs (i.e., mindfulness, distractedness, denial).
Item Level
Item-level analyses provide information regarding what

MTurk Sample Item-Level Latent Construct Hierarchy.

Undergraduate Sample Item-Level Latent Construct Hierarchy.
In the MTurk sample (see Figure 1), the AAQ-II items initially loaded onto a factor that mixed third-wave behavioral content (distress aversion, distraction and suppression, behavioral avoidance, procrastination, non-judgment) with neuroticism and NA content (mistrust, lassitude, self-doubt, depression, regret, anger, anxiety, etc.). However, as soon as a three-factor solution was extracted, all the AAQ-II items loaded onto a neuroticism and NA factor (e.g., depression, anxiety, anger, volatility, self-doubt, self-consciousness, vulnerability, lassitude). This pattern held regardless of how many factors were extracted. As in the subscale analyses, neuroticism and NA began to break into more specific content, with items capturing anger forming their own factor. Nevertheless, all AAQ-II items continued to load onto a factor with content related to depression (depression, anger, volatility, lassitude, immoderation, etc.) and anxiety (anxiety, self-doubt, self-consciousness, vulnerability, etc.) from both neuroticism and trait NA measures. Meanwhile, the MEAQ items loaded onto factors capturing experiential avoidance and third-wave content in every solution; as more factors were extracted, most MEAQ items formed their own experiential avoidance factor. Indeed, several subscales of the MEAQ and neuroticism measures were roughly re-created as independent factors in the lower part of the bass-ackwards hierarchy. This held even when we deliberately over-extracted: although factors without marker items began to emerge, we continued up to the 18 factors suggested by parallel analysis, and the patterns described above persisted (see supplemental tree figure).
In the undergraduate sample (see Figure 2), the AAQ-II items initially loaded onto a factor containing a mix of neuroticism and NA and third-wave behavioral therapy content. However, as more factors were extracted, the AAQ-II items loaded onto a broad neuroticism and NA factor while the MEAQ and other third-wave content broke away to form new factors. Mirroring the subscale level, the broad neuroticism and NA factor eventually separated into more specific factors representing facets of neuroticism and NA (i.e., anger separated from anxiety and depression). By the four-factor solution, all AAQ-II items loaded onto a factor representing the depression and anxiety facets of neuroticism and NA. This pattern continued through the eight-factor solution. Starting at the nine-factor solution, there were signs of over-extraction: the ninth factor consisted of only a splitter item. Indeed, in every subsequent solution, one or more factors either consisted only of splitter item(s) or had no marker items at all. Thus, all solutions from the nine-factor solution onward should be interpreted with caution, as they are affected by over-extraction.
Nevertheless, we extracted up to 17 factors in the undergraduate sample, the maximum number of factors suggested by parallel analysis (see supplemental tree figure). Once over-extraction began at the nine-factor solution, one AAQ-II item (i.e., “I’m afraid of my feelings”) did not load on any factor. Indeed, in several of the remaining solutions, three of the seven AAQ-II items (i.e., “I’m afraid of my feelings,” “Emotions cause problems in my life,” and “I worry about not being able to control my worries and feelings”) showed unique loading patterns. Specifically, these items either did not load as marker items on any factor or loaded onto an “Emotional Self-Judgement” factor, which first appeared in the nine-factor solution and captured the tendency to criticize oneself for experiencing negative emotions. However, this loading pattern was inconsistent and likely attributable to over-extraction, as items across measures were being pulled away to capture increasingly specific content, with new factors often showing signs of being “bloated specifics,” that is, factors formed purely on the basis of similar terms or phrases in their marker items rather than a valid underlying latent construct (Clark & Watson, 2019). Of note, even when the items did not load onto any factors, they often
Nevertheless, even as we over-extracted factors, most AAQ-II items loaded with neuroticism/NA content. In contrast, in every factor solution, MEAQ items consistently loaded onto factors capturing third-wave content. Replicating results in the MTurk sample, the MEAQ items separated from items of other measures to form factors roughly representing MEAQ facets (e.g., one factor consisted primarily of MEAQ Distress Endurance items).
Discussion
The current study adds to the growing body of research demonstrating that AAQ-II scores are indicators of neuroticism and NA, and the current results clarify that the AAQ-II specifically captures content within the anxiety and depression facets of neuroticism. In contrast, MEAQ scores appear to assess experiential avoidance. These findings held regardless of whether the analyses were conducted at the scale, subscale, or item level. The current research replicates Rochefort and colleagues’ (2018) findings in new samples using updated neuroticism and NA measures with numerous subscales, making it unlikely that these results are sample- or measure-specific.
Comparing Findings Across Studies
In both the current study and Rochefort et al. (2018), the AAQ-II and MEAQ scores were only moderately correlated with each other; although the correlations were slightly higher in the current study, they did not reach the level indicative of convergent validity. Instead, AAQ-II scores demonstrated convergence with neuroticism and NA scores in both studies. Meanwhile, MEAQ scores achieved discriminant validity from neuroticism and NA scores in both studies. All correlations, including those for the FFMQ, were generally somewhat higher in magnitude in the current study than in the original.
In both studies, AAQ-II scores were more strongly correlated with neuroticism and NA than with the MEAQ and FFMQ scores. Regarding the MEAQ, in the original study, the MEAQ scores were more strongly correlated with FFMQ scores than
Overall, the structural analyses largely replicate Rochefort et al. (2018), with a slight discrepancy in the scale-level EFA: in the current study, AAQ-II scores split across factors in the two-factor undergraduate solution, whereas in Rochefort et al. (2018) the AAQ-II scores always loaded entirely with neuroticism and NA. In the subscale-level analyses, the general patterns replicated across studies. Furthermore, the current study extended the results of the original study by finding that the AAQ-II loaded specifically with depression and anxiety subscale scores in the later factor solutions. This was also true in the item-level analyses. Indeed, in both studies, most if not all AAQ-II items neither loaded with other experiential avoidance or third-wave behavioral measures nor formed an independent “AAQ-II” factor. Instead, the AAQ-II items formed factors with depression and anxiety content from neuroticism and NA scales or (in the undergraduate sample) either loaded onto an “Emotional Self-Judgement” factor or did not load onto any factor. When comparing the MEAQ item-level EFAs in the current study with those in Rochefort et al. (2018), we found that many of the same factors emerged, including “Mindfulness,” “Avoidance,” and “Repression and Denial.” Thus, evidence continues to support the construct validity of MEAQ scores, particularly their discriminant validity from neuroticism and NA scores.
Taken together, the current results and previous research (Rochefort et al., 2018; Vaughan-Johnston et al., 2017; Wolgast, 2014) make it clear that AAQ-II scores demonstrate poor construct validity: they fail to assess the target latent construct (i.e., experiential avoidance) and instead assess the anxiety and depression content of both neuroticism and NA. The subscale- and item-level analyses expanded on past research, demonstrating that AAQ-II scores are best conceptualized as assessing a depression facet and, to a lesser extent, an anxiety facet of the personality trait neuroticism. Rochefort and colleagues (2018, p. 446) concluded, “To the extent that the AAQ-II scores function as an indicator of neuroticism and NA, any conclusions regarding experiential avoidance based on the AAQ-II should be interpreted with caution.” We echo and reemphasize this conclusion: researchers and clinicians alike must be aware of the limitations of the AAQ-II.
Implications for Experiential Avoidance as a Construct
It is important to emphasize that the psychometric problems with AAQ-II scores do not necessarily extend to experiential avoidance as a construct. Rather, reliance on the AAQ-II makes it difficult to understand the nature of experiential avoidance, because previous findings using the AAQ-II are best understood as replications of the well-established associations of neuroticism and negative affect with important clinical outcomes. It is therefore essential that researchers test the role of experiential avoidance in the development, maintenance, and treatment of internalizing psychopathology using psychometrically sound measures that actually assess experiential avoidance (see “Future Directions”). In the current study, MEAQ scores demonstrated appropriate discriminant validity from neuroticism and NA scores, and past research provides initial evidence that MEAQ scores demonstrate criterion validity with respect to psychopathology (Gámez et al., 2011). Taken together, the current results and those of Rochefort et al. (2018) and Gámez et al. (2011) provide evidence that experiential avoidance, when properly assessed, is a valid construct distinct from neuroticism and NA.
Semantic Overlap and Shared Method Variance of the AAQ-II
Of note, Clark and Watson (2019) argue that any item containing “worry” or similar words is essentially guaranteed to capture neuroticism and/or negative affectivity content. Several (if not all) AAQ-II items appear to overlap semantically with items from neuroticism and negative affect scales. We speculate that this overlap might explain the poor discriminant validity of AAQ-II scores from measures of neuroticism and NA. It may also explain the EFA results in the current and original study (Rochefort et al., 2018): the loading of AAQ-II scores with neuroticism content likely reflects poorly worded items rather than something inherent in experiential avoidance as a construct. It is also worth noting that there were no major differences in measurement format across the measures included. The only minor difference was that the AAQ-II and FFMQ shared one response format (i.e., “never true” to “always true”), whereas the MEAQ, BFI-N, IPIP_N, and TAI shared another (i.e., “strongly agree” to “strongly disagree”). As such, shared method variance cannot explain the current results: if anything, it would have inflated the associations between the AAQ-II and FFMQ and between the MEAQ and neuroticism and NA, whereas the AAQ-II instead correlated most strongly with neuroticism and NA, and the MEAQ showed discriminant validity from them. More importantly, the current results for the MEAQ, as well as the broader MEAQ literature, provide evidence for the validity of experiential avoidance as a construct in and of itself.
Implications for Research and Clinical Work Using the AAQ-II
Despite considerable evidence of substantial psychometric problems with the AAQ-II, it remains the most widely used measure of experiential avoidance. Some psychologists may focus on predictive validity and prioritize usability. However, because outcomes are multi-determined, a focus on predictive validity provides only limited information about what a measure is truly assessing and can lead to inaccurate conclusions. Nevertheless, despite clear evidence that the AAQ-II does not assess experiential avoidance, some may argue that its predictive ability still provides heuristic utility. It is also true that the brevity of the AAQ-II makes repeated assessment over the course of treatment feasible, whereas longer measures (e.g., the MEAQ) may not be. However, we argue that clinicians must consider their therapeutic goals. Because therapies such as ACT aim not to alter neuroticism but to improve psychological well-being, measuring changes in neuroticism over time (as repeated administrations of the AAQ-II effectively do) may not align with the core tenets of ACT.
It is also worth noting that other brief measures of experiential avoidance exist, such as the Brief Experiential Avoidance Questionnaire (BEAQ; Gámez et al., 2014), a 15-item short form of the MEAQ. To explore the BEAQ as a clinically feasible alternative, we ran post hoc analyses in both samples using the same “bass-ackward” approach with the BEAQ. The results (not reported) replicated those found with the MEAQ, suggesting that the BEAQ is a clinically feasible alternative to the AAQ-II when the full MEAQ cannot be administered.
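The “bass-ackward” approach extracts one factor, then two, and so on, and links adjacent levels of the hierarchy by correlating their scores. The sketch below is a minimal illustration, assuming principal components with varimax rotation as a stand-in for the authors' EFA (their exact extraction, rotation, and scoring choices are not given in this section); all function names are hypothetical and the data are simulated.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-6):
    """Kaiser's varimax rotation; returns rotated loadings and the rotation matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion (gamma = 1).
        target = rotated ** 3 - rotated @ np.diag(np.sum(rotated ** 2, axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt  # orthogonal by construction
        new_criterion = np.sum(s)
        if criterion != 0 and new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation, rotation


def rotated_component_scores(data, n_components):
    """Unit-variance, varimax-rotated principal component scores for one level."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(z, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Unrotated scores z @ V / sqrt(lambda) are uncorrelated with unit variance.
    scores = z @ eigvecs / np.sqrt(eigvals)
    if n_components == 1:
        return scores
    _, rotation = varimax(eigvecs * np.sqrt(eigvals))
    return scores @ rotation  # an orthogonal rotation keeps the scores uncorrelated


def bass_ackwards(data, max_components):
    """Correlations between component scores at level k and at level k + 1."""
    levels = [rotated_component_scores(data, k) for k in range(1, max_components + 1)]
    links = []
    for upper, lower in zip(levels[:-1], levels[1:]):
        k = upper.shape[1]
        full = np.corrcoef(upper, lower, rowvar=False)
        links.append(full[:k, k:])  # rows: level-k components; columns: level-(k+1)
    return links


rng = np.random.default_rng(0)
data = rng.normal(size=(300, 6))  # simulated scale scores, not study data
links = bass_ackwards(data, 3)
for level, matrix in enumerate(links, start=1):
    print(f"Level {level} -> {level + 1}:\n{np.round(matrix, 2)}")
```

Reading down the hierarchy, a measure whose variance stays attached to the neuroticism branch at every level (as reported for the AAQ-II) would show high correlations along that branch rather than splitting off into its own component.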
Ultimately, we urge researchers and clinicians to interpret AAQ-II scores with the knowledge that they do not capture changes in latent experiential avoidance. Moreover, results from previous studies using the AAQ-II must be reinterpreted with the knowledge that AAQ-II scores are best conceptualized as targeting the anxiety and depression facets of neuroticism. We believe this should be a primary consideration when selecting measures for use in research and clinical work.
Further Evaluation of the MEAQ
Although considerable research has found that MEAQ scores demonstrate appropriate discriminant validity from neuroticism and NA, evidence regarding their discriminant validity from other constructs (e.g., distress tolerance, impulsivity, committed action, self-as-context) is lacking. Furthermore, the MEAQ's ability to predict important outcomes of third-wave interventions, such as decreased experiential avoidance and increased quality of life, still needs to be tested empirically. Likewise, few studies have evaluated the incremental predictive power of MEAQ scores over and above neuroticism. Two studies evaluated the incremental validity of individual MEAQ subscales: Naragon-Gainey and Watson (2018) examined the Behavioral Avoidance subscale, and Anderson and colleagues (2021) examined the Distress Aversion subscale. These studies found limited or no evidence of incremental validity beyond neuroticism (Anderson et al., 2021; Naragon-Gainey & Watson, 2018). However, because only two subscales were examined, no firm conclusions can be drawn. It is therefore critical for future research to test whether the MEAQ and other experiential avoidance measures demonstrate unique predictive power for internalizing symptoms beyond that accounted for by neuroticism.
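Incremental validity of this kind is commonly assessed with hierarchical regression: neuroticism is entered first, the experiential avoidance score is added, and the change in R² (ΔR²) is examined. The sketch below is a minimal illustration, assuming ordinary least squares and the standard F test for an R² change; the function and variable names are hypothetical, and the data are simulated rather than drawn from any study cited here.

```python
import numpy as np


def r_squared(y, predictors):
    """R^2 from an OLS regression of y on the predictor columns plus an intercept."""
    X = np.column_stack([np.ones(len(y)), predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    return 1 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)


def incremental_validity(y, baseline, added):
    """Hierarchical regression: R^2 change when `added` enters after `baseline`."""
    baseline = np.column_stack([baseline])  # accept 1-D or 2-D predictor arrays
    added = np.column_stack([added])
    n, k_full = len(y), baseline.shape[1] + added.shape[1]
    r2_base = r_squared(y, baseline)
    r2_full = r_squared(y, np.column_stack([baseline, added]))
    delta = r2_full - r2_base
    # F statistic for the R^2 change, df = (added.shape[1], n - k_full - 1).
    f_change = (delta / added.shape[1]) / ((1 - r2_full) / (n - k_full - 1))
    return r2_base, r2_full, delta, f_change


rng = np.random.default_rng(1)
neuroticism = rng.normal(size=400)                    # simulated
avoidance = 0.6 * neuroticism + rng.normal(size=400)  # simulated, correlated with N
symptoms = 0.5 * neuroticism + 0.3 * avoidance + rng.normal(size=400)
r2_base, r2_full, delta, f_change = incremental_validity(symptoms, neuroticism, avoidance)
print(f"R2 (N only) = {r2_base:.3f}, R2 (N + EA) = {r2_full:.3f}, "
      f"delta R2 = {delta:.3f}, F change = {f_change:.1f}")
```

A p-value for the F change can be obtained with `scipy.stats.f.sf(f_change, dfn, dfd)` if SciPy is available. Because the simulated avoidance score is built to correlate with neuroticism, the example also shows why a predictor can correlate with symptoms yet add only modestly once neuroticism is in the model.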
Limitations and Future Directions
We used a combination of large, age-diverse student and community samples to obtain more generalizable results than a single sample alone, and the results replicated across these samples and past research. The first limitation, however, is that the current samples were not racially or gender diverse; replicating this study in samples with greater ethnic and gender variability would improve the generalizability of the findings. Likewise, these issues have yet to be explored in clinical samples. Although a different pattern of results is unlikely to emerge in a clinical sample, it is critical to test this assumption, as much of the research on links between MEAQ (and AAQ-II) scores and psychopathology has been conducted using student and online samples (see the original development papers: Bond et al., 2011; Gámez et al., 2011).
Second, as previously discussed, a semantic analysis of the AAQ-II would help address unanswered questions about the measure. Specifically, to understand why AAQ-II scores continue to perform poorly (as found in the present study and in past work; Rochefort et al., 2018; Wolgast, 2014), it would be beneficial to explicitly evaluate the degree of semantic overlap between AAQ-II items and neuroticism and NA items. Clark and Watson (2019) have stated that “the inclusion of almost any negative mood term (e.g., ‘I worry about . . .,’ . . .) virtually guarantees a substantial neuroticism/negative affectivity component to an item.” This issue applies to several AAQ-II items (e.g., “I worry about not being able to control my worries and feelings,” “Worries get in the way of my success.”). Although outside the scope of the current study, future research could use semantic similarity analyses (e.g., cosine similarity between lexical embeddings) and other methods to empirically test the degree to which poor item wording accounts for the poor discriminant validity of the AAQ-II.
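A semantic similarity analysis of this kind could start from something as simple as word-overlap cosine similarity. The sketch below is a minimal stand-in, assuming bag-of-words count vectors rather than the lexical embeddings mentioned above; the two AAQ-II items are those quoted in the text, while the comparison item is a hypothetical placeholder written for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between bag-of-words count vectors of two item texts."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


aaq_items = [
    "I worry about not being able to control my worries and feelings",
    "Worries get in the way of my success",
]
# Hypothetical comparison item, written for illustration only.
comparison_items = ["I worry about many things"]

for aaq in aaq_items:
    for other in comparison_items:
        print(f"{cosine_similarity(aaq, other):.2f}  {aaq!r} vs {other!r}")
```

Exact word matching treats “worry” and “worries” as unrelated, which is one reason a fuller analysis would use lemmatization or embedding vectors; the cosine computation itself stays the same once each item is represented as a vector.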
Overall, the widespread use of the AAQ-II highlights the need to inform broader audiences about the importance of using psychometrically sound measures whose scores assess the intended construct in research and practice. Of note, past reliance on the AAQ-II, together with the current results, raises the question of whether experiential avoidance as a construct has unique associations with, or provides incremental predictive power for, internalizing psychopathology. This is an important topic for future research that must be tested using measures that actually assess experiential avoidance. Moreover, whether experiential avoidance has unique associations with other theoretically relevant outcomes (e.g., quality of life, life satisfaction) is a rich area for exploration.
A third limitation is that many third-wave behavioral theory constructs have murky conceptualizations, and measures of those constructs lack adequate psychometric support (Chawla & Ostafin, 2007; Gillanders et al., 2014; Ruiz, 2012). Indeed, in the current study, FFMQ scores also showed discriminant validity problems with respect to neuroticism and NA scores, albeit less severe than those of AAQ-II scores. Much of the FFMQ contributed to the “Emotional Self-Judgement” factor, which seems to be a facet of NA given that it captures self-criticism for experiencing negative emotions. In addition, as mentioned earlier, FFMQ subscale scores may be differentially associated with experiential avoidance scale scores. For example, based on their definitions, observing internal experiences (i.e., the “Observe” subscale) is likely less related to experiential avoidance than not reacting to internal experiences (i.e., the “Nonreact” subscale). Such differential associations were not explored in the current study. Taken together, the psychometric properties of the FFMQ may themselves need further evaluation, potentially limiting its strength as a comparison third-wave measure. Future studies should also evaluate FFMQ scores alongside neuroticism and NA scores, as well as scores on measures of related constructs (e.g., experiential avoidance, mindfulness, fusion, committed action), in the same way the AAQ-II and MEAQ scores were evaluated in the current study.
In general, the use of psychometrically sound measures must become a priority in third-wave behavioral theory research and practice. Aside from the MEAQ, the Brief Experiential Avoidance Questionnaire (BEAQ; Gámez et al., 2014) allows quicker assessment of experiential avoidance. Other potential measures include the Multidimensional Psychological Flexibility Inventory (MPFI; Landi et al., 2021), which was developed to ensure discriminant validity from internalizing psychopathology but still needs to be evaluated alongside neuroticism and NA. Moving forward, we strongly recommend that researchers use more psychometrically sound measures of experiential avoidance instead of the AAQ-II. More generally, developing new, more valid, and more reliable third-wave measures remains an important future direction. Although all aspects of construct validity are important, demonstrating discriminant validity will be essential in this endeavor; there must be a greater focus on ensuring that measures capture their target construct rather than other, similar constructs. Furthermore, the field must explicitly establish and empirically test the expected associations between constructs such as mindfulness and experiential avoidance. Such work would increase confidence in researchers’ findings and further solidify third-wave behavioral theory.
Conclusion
Given the results of the current and past research, we recommend against the continued use of the AAQ-II, as there is substantial evidence that AAQ-II scores represent latent trait neuroticism and NA (specifically, their anxiety and depression facets). Moreover, measurement must become a more central focus in third-wave behavioral theory research. Improved measures will allow researchers to better understand the constructs relevant to their theory and to have greater confidence in their interpretations and results. To continue advancing research on experiential avoidance, and on third-wave behavioral therapy more broadly, considerable attention must be paid to the psychometric properties of third-wave behavioral measures.
Supplemental Material
sj-docx-1-asm-10.1177_10731911261423143 – Supplemental material for Assessing Experiential Avoidance: Further Testing of the AAQ-II and the MEAQ
Supplemental material, sj-docx-1-asm-10.1177_10731911261423143 for Assessing Experiential Avoidance: Further Testing of the AAQ-II and the MEAQ by Alexa Jimenez, Catherine Rochefort Modén and Michael Chmielewski in Assessment
Supplemental Material
sj-docx-2-asm-10.1177_10731911261423143 – Supplemental material for Assessing Experiential Avoidance: Further Testing of the AAQ-II and the MEAQ
Supplemental material, sj-docx-2-asm-10.1177_10731911261423143 for Assessing Experiential Avoidance: Further Testing of the AAQ-II and the MEAQ by Alexa Jimenez, Catherine Rochefort Modén and Michael Chmielewski in Assessment
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Ethical Considerations
This research study has been reviewed and approved by the SMU Institutional Review Board. Participants provided written informed consent before participating in this study.
Supplemental Material
Supplemental material for this article is available online.
