Abstract
Measuring trichotillomania is essential for understanding and treating it effectively. Using the Situated Assessment Method (SAM2), we developed a psychometric instrument to assess hair pulling in situations where it occurs. In two studies, pullers evaluated their pulling in relevant situations, along with how much they experience factors that potentially influence it (e.g., external triggers, reduction in negative emotion, negative self-thoughts). Individual measures of pulling, averaged across situations, exhibited high test reliability, construct validity, and content validity. Large differences between situations in pulling were observed, along with large individual-situation interactions (with limited evidence distinguishing focused versus automatic pulling subtypes). In linear regressions for individual participants, factors that influence pulling tended to correlate with pulling as predicted, explaining a median 74%–83% of its variance. By identifying factors that predict pulling for each individual across situations, the SAM2 Trichotillomania Assessment Instrument (TAI) offers a rich understanding of an individual’s pulling experience, potentially supporting individualized pulling interventions.
Trichotillomania, or hair pulling disorder, is characterized by the recurrent pulling of one’s own hair, leading to hair loss and marked functional impairment (American Psychiatric Association, 2013). Trichotillomania is a highly heterogeneous disorder, varying in pulling situations (e.g., watching TV, looking in the mirror), in pulling sites (e.g., head, arm, eyebrows), and in pulling duration (Barber et al., 2024). Hair pulling further varies in whether it is focused or automatic (Flessner, Woods, Franklin, Keuthen, et al., 2008). Focused pulling occurs when an individual pulls their hair intentionally, with awareness of the pulling and an associated urge to do so. Automatic pulling occurs when an individual pulls their hair with little or no awareness that they are doing so. There is debate as to the existence of these subtypes and the potential number, with some researchers suggesting as many as four (Flessner, Conelea, et al., 2008) and others three (Grant et al., 2021). Recent research has also suggested that focused and automatic subtypes are not valid or useful, with individuals often enacting both types within and across pulling episodes (Grant & Chamberlain, 2021a).
Significant distress can be associated with trichotillomania, impacting a person’s quality of life (Barber et al., 2024; Grant et al., 2020). Despite the potentially serious consequences of trichotillomania, relatively limited research has addressed it, compared to other psychopathologies, making the design of effective treatments all the more difficult. To develop appropriate well-motivated treatments, it is first important to measure and characterize trichotillomania accurately. Our primary aim here is to contribute a novel psychometric tool for doing so.
Methods for Measuring Trichotillomania
Current approaches for assessing trichotillomania take an unsituated approach, using decontextualized items that ask an individual to abstract over situations and establish general impressions of how much they agree with statements about pulling. For example, a widely used self-report psychometric instrument, the Massachusetts General Hospital Hair Pulling Scale (the MGH-HPS; Keuthen et al., 1995), asks individuals to answer seven statements, such as “On an average day, how often did you feel the urge to pull your hair?” To answer such assessment items, an individual must abstract over life situations (e.g., watching TV, sitting in a meeting) to provide a general impression of their urges. Individuals need not consult their experience of pulling in specific situations but can simply access or construct general impressions of their overall pulling experience, using whatever information comes to mind. Other examples of unsituated measures used currently to assess trichotillomania include self-report measures such as the Trichotillomania Scale for Children (TSC; Tolin et al., 2008), the Milwaukee Inventory for Styles of Trichotillomania—Adult and Children Versions (MIST-A, MIST-C; Flessner et al., 2007; Flessner, Woods, Franklin, Cashin, et al., 2008), and the Trichotillomania Dimensional Scale (TTM-D; LeBeau et al., 2013). Additional unsituated measures include interview scales such as the NIMH Trichotillomania Impact Scale/Trichotillomania Severity Scale (TIS/TIM; Swedo et al., 1989), the Yale-Brown Obsessive-Compulsive Scale-Trichotillomania (Y-BOCS-TM; Stanley et al., 1993), and the Psychiatric Institute Trichotillomania Scale (PITS; Winchel et al., 1992).
Using these unsituated measures for trichotillomania could lead to inaccurate responses when it is difficult for individuals to abstract an accurate judgment across relevant situations. Instead, individuals may rely on intuitive theories and/or the availability heuristic to do so (Ajzen, 1977; Gelman & Legare, 2011; Tversky & Kahneman, 1973).
A second issue is that unsituated measures ignore situational variability (Bandura, 1978; Cervone, 2005; Cervone et al., 2001; Fleeson & Jayawickreme, 2021; Mischel & Shoda, 1995). Decades of research demonstrate that individuals do not exhibit constant levels of a construct or behavior across situations. Consider hair pulling. An individual may pull their hair regularly when alone watching TV but may pull rarely when at work. In addition, different individuals may respond differently to the same situations, such that an individual-situation interaction results. While one puller might pull mostly in stressful situations, another might pull mostly in boring situations. Thus, when assessing a construct, it is important to go beyond simply establishing a single trait-level measure for each individual. It is also essential to capture how the construct varies for each individual uniquely across situations. Dutriaux et al. (2023) provide further discussion about the implications of situation effects for assessment instruments. Indeed, it has been noted that assessing trichotillomania is particularly challenging due to the heterogeneous nature of the condition both between and within individuals (Barber et al., 2023). Unsituated measures may therefore struggle to capture the rich individual differences documented in the trichotillomania literature (Barber et al., 2023; Woods & Houghton, 2014).
An Alternative Approach to Measuring Trichotillomania—The Situated Assessment Method
The Situated Assessment Method (SAM2) is a general assessment framework that measures diverse behaviors in a situated manner, thereby addressing the limitations of unsituated assessment measures just described (for a detailed treatment, see Dutriaux et al., 2023). When constructing a SAM2 assessment instrument to assess a construct, one first identifies relevant situations where the construct does and does not occur (to ensure unrestricted variance) and then identifies processes that influence the construct in these situations. Thus, to establish a SAM2 Trichotillomania Assessment Instrument (the SAM2 TAI), we first identified a set of situations where pulling typically does and does not occur. We then identified processes established in the scientific and clinical literatures known to influence trichotillomania, presumably in these kinds of situations. The following sections describe how we integrated these two dimensions of situatedness to build the SAM2 TAI.
Establishing Situations Where Pulling Does and Does Not Occur
Experience sampling is often used to measure a construct in situations where it occurs. Experience sampling exhibits two important limitations that can make it difficult to assess individual differences efficiently and accurately (Dutriaux et al., 2023). First, because experience sampling is typically performed over many days, collecting situational data is expensive and effortful, making it a relatively inefficient assessment procedure. Second, because the situations sampled are not controlled, they can vary widely between individuals. As a result, measures are not well controlled across individuals, which makes it difficult to assess individual differences accurately.
The SAM2 approach offers solutions to both problems. First, a SAM2 assessment can be performed in a single session, making it efficient (Dutriaux et al., 2023, further suggest a variety of approaches for creating brief SAM2 instruments that are even more efficient). Second, SAM2 assesses all individuals in a comparable manner by assessing them in the same set of situations (rather than in different sets).
To establish situations for the SAM2 TAI here, we first conducted a norming study that collected 435 unique pulling and non-pulling situations (fully described in SM-1). From these 435 situations, we sampled a representative set of 52 situations to evaluate in the SAM2 TAI (31 pulling situations, 21 non-pulling situations). Tables 1 and 2 present these situations. As Dutriaux et al. (2023) describe, presenting these situations to participants is likely to activate specific situational memories from their life that they then evaluate when responding to survey items.
The 31 Pulling Situations Assessed by All Participants in Studies 1 and 2.
Note. The situations above were established in a separate norming study. Each situation is presented with its domain, its generated frequency, and its average judgment for frequency, arousal, and valence across the participants who produced it. See the text and SM-1 for further details. For both Tables 1 and 2, the domains are as follows: UniWork = activities related to university or work; FamRel = activities related to families or relationships; Travel = travel-related activities; Health = health-related activities; LeisHom = leisure activities at home; LeisOut = leisure activities outside of the home; NonLeis = non-leisure activities at home. Generated frequency refers to how commonly a situation was generated by participants for either pulling or non-pulling situations. Frequency was assessed by participants on a scale from 0 (never) to 5 (once or more a day) for how often the situation occurred. Arousal was assessed by participants on a scale from 0 (no bodily arousal) to 4 (intense bodily arousal) for how much bodily arousal they felt during the situation regardless of pulling. Valence was assessed by participants on a scale from −3 (highly unpleasant) to 3 (highly pleasant) for how pleasant they found the situation regardless of pulling.
The 21 Non-Pulling Situations Assessed by All Participants in Studies 1 and 2.
Establishing Processes in Situations That Influence Pulling
To establish processes likely to influence pulling for individuals with trichotillomania, we turned to the current literature. Of particular interest were three models of hair pulling: the Comprehensive Behavioral (ComB) Model, the Model of Cognitions and Beliefs, and the Emotion Regulation Model. The ComB Model was included because it offers a well-established explanation of hair pulling behavior, developed to capture and address important aspects of the hair pulling experience. The ComB Model also motivated the first treatment developed for trichotillomania, a treatment that has received significant support in the literature (Bottesi et al., 2020; Carlson et al., 2021; Falkenstein et al., 2016). The Emotion Regulation Model was also included here because it offers a widely accepted and established account of hair pulling (Bottesi et al., 2016; Crowe et al., 2024). Finally, the Model of Cognitions and Beliefs was included to establish potentially important cognitions in hair pulling, given that dominant models in the literature have tended to focus on behavior and emotion regulation (Rehm et al., 2016). We address each model next in turn, describing processes that each suggests are likely to influence pulling and urges. Finally, we summarize the processes extracted from these models for use in the SAM2 TAI.
The Comprehensive Behavioral Model
The ComB model is rooted in behavioral theory, following principles of classical and operant conditioning, thereby focusing on conditioned cues, discriminative stimuli, conditioned behaviors, and their consequences (Mansueto et al., 1997). Mansueto et al. propose that encountering a conditioned cue for pulling increases the urge to pull. Cues can be external (e.g., settings, pulling implements) and/or internal (e.g., affective, sensory, and cognitive states). Mansueto et al. posit that external and internal cues become classically conditioned to hair pulling, such that they become triggers for pulling urges and pulling behaviors.
In addition to the proposed processes that trigger urges and pulling, ComB further proposes that instrumental processes can facilitate or inhibit pulling. Similar to cues that initiate pulling, cues that modulate pulling can be external or internal. Once the cycle of pulling begins, accompanying behaviors can occur ritualistically before pulling, during pulling, or after pulling. These behaviors can lead to consequences that are reinforcing, including emotional consequences (e.g., pleasure) and relief from unwanted emotions. Aversive consequences can also occur, such as undesired emotional states that appear when pulling terminates. If these aversive consequences also function as cues for the individual, the pulling cycle may continue.
Model of Cognitions and Beliefs
Rehm et al. (2015) identified six superordinate themes related to cognitions and beliefs that are often central to the pulling cycle: (a) negative self-beliefs, with subthemes for worthless self and viewing oneself as abnormal; (b) control beliefs, with subthemes for loss of control and importance of control; (c) coping beliefs, with subthemes for low coping efficacy and experiential avoidance; (d) negative emotional beliefs that deem emotions as “good” or “bad,” with subthemes for tolerability and acceptability; (e) permission giving beliefs, with subthemes for justification, all-or-nothing, and reward; (f) perfectionism related to judgments about hair quality and pulling quality, with subthemes for “just right” standards and mastery through perfection. These beliefs and cognitions play different roles at different points in the pulling cycle, sometimes being antecedent and sometimes supporting maintenance.
Emotion Regulation Model
Emotion regulation refers to how a person experiences and expresses emotion, along with how they influence its presence and timing (Roberts et al., 2013). The Emotion Regulation Model for hair pulling focuses on negative reinforcement, where the function of pulling is to alleviate negative emotion, with relief subsequently reinforcing and perpetuating pulling behavior. When an uncomfortable emotional experience occurs, it triggers a pulling episode that results in relief, which in turn rewards pulling.
Processes That Influence Pulling Included in the SAM2 TAI
To measure processes that influence pulling behavior in pulling situations, the SAM2 TAI initially included 13 processes extracted from the three models just reviewed (later reduced in Study 2 based on the results of Study 1). Table 3 presents these processes, together with the scales used to measure them. Consistent with the ComB Model, we included processes for triggers (external cues and internal cues), behavior (automatic vs. focused pulling, ritualized behavior), and reward (reduction in negative emotion, how good pulling feels, long-term consequences). Consistent with the Cognitions and Beliefs Model, we included processes for negative self-beliefs (internal triggers, self-valence), negative emotion (self-valence, arousal), control beliefs (external control, internal control), poor coping (experiential avoidance), justifying outcomes (reduction in negative emotion, how good pulling feels, long-term consequences), and perfectionism (perfectionistic standards, ritualized behavior). Consistent with the Emotion Regulation Model, we included processes for emotional states (self-valence, arousal), emotion regulation (internal control), and pulling as emotion regulation (reduction in negative emotion).
Scale, Interrater Agreement, and Test Reliability for the 13 Measures Assessed in Study 1 and for the 8 Distilled Processes in Study 2.
Note. Intraclass correlations were computed using the ICC function in the R psych package. The values on the left that assessed interrater agreement for a measure across situations used the ICC2 measure for random effects, such that these values are likely to generalize across samples of participants from the same population. Test reliability, estimated by ICC3k, for each measure is shown on the far right (i.e., Cronbach’s alpha). The first two measures are the dependent variables (frequency, urge). The following eight measures correspond to the eight distilled measures in Study 2. For the processes that were distilled in Study 2, the measures from Study 1 that they were distilled from are shown as well. For instance, triggers in Study 2 were distilled from external cue and internal cue in Study 1 because of their high correlation.
Because the processes important for each of the three models overlap, most of the included processes were not specific to one model. Instead, our aim was to capture all relevant processes across models to establish a comprehensive set that could potentially predict an individual’s pulling behavior at a high level across pulling and non-pulling situations.
Overview and Hypotheses
The primary aim of the following two studies was to assess the SAM2 TAI’s psychometric properties related to individual differences, test reliability, situation effects, construct validity, and content validity. Another primary aim was to see what we could learn about trichotillomania from using the SAM2 TAI to assess it. A secondary aim was to compare the SAM2 TAI with a traditional unsituated psychometric instrument for assessing trichotillomania (the MGH-HPS). A final aim was to investigate how both measures of trichotillomania are related to personality traits, self-control, and automatic versus focused pulling.
After performing Study 1, we developed two additional aims for Study 2. First, we aimed to replicate the basic pattern of results observed in Study 1. Second, we wanted to improve on the set of predictors in the SAM2 TAI. Study 1 used 13 predictors that, in some cases, were highly correlated, leading to potential problems with collinearity. In addition, participants had to evaluate 52 situations for 13 predictors, thereby requiring much time to complete the assessment. Study 2 therefore distilled the initial 13 predictors into 8 critical predictors, making them less redundant and less work for participants to evaluate. As we will see, reducing the number of predictors did not diminish their overall ability to explain variance in pulling—indeed, the 8 predictors actually explained more variance than the 13 predictors.
Because Studies 1 and 2 were exploratory, we did not pre-register hypotheses. Nevertheless, we did have tentative hypotheses about results that we expected to see, especially after performing Study 1. We were also interested in performing several exploratory analyses.
Hypothesis 1: Large Reliable Individual Differences in Trichotillomania
Specifically, we expected that mean individual scores for pulling frequency and urge strength across situations on the SAM2 TAI would range across at least half the scale from 2.5 to 7.5.
Hypothesis 2a: Substantial Situation Effects
Specifically, we expected that a given participant would pull frequently in some situations but not pull at all in others, such that their judgments would typically range across the entire scale from 0 to 10.
Hypothesis 2b: Substantial Situation by Individual Interactions
Specifically, we expected that participants would differ considerably in how they pull across the same situations, such that the intraclass correlation for agreement between participants would not be high (i.e., <.50).
Hypothesis 3: High Construct and Content Validity for SAM2 Measures of Trichotillomania
Specifically, for construct validity, we predicted that the SAM2 TAI measures for frequency and urge would tend to be moderately to highly correlated with many, if not most, of the influential processes (>|.30| to |.60|). For content validity, we predicted that the influential processes would explain high amounts of variance in individual regressions (>60%), demonstrating that these processes explain pulling comprehensively.
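The content validity prediction rests on fitting one regression per participant and summarizing the variance explained across participants. The following Python sketch illustrates that logic under stated assumptions: ordinary least squares with an intercept, fitted separately for each participant across situations, with the median R² taken over participants. The function names, participant labels, and ratings are hypothetical and are not taken from the study's analysis scripts.

```python
from statistics import median


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(predictors, outcome):
    """Proportion of variance in `outcome` explained by an OLS fit.

    predictors: one row per situation, one column per influential process.
    outcome: one pulling rating per situation.
    """
    x = [[1.0] + list(row) for row in predictors]  # prepend the intercept
    p = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(x, outcome)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in x]
    mean_y = sum(outcome) / len(outcome)
    ss_res = sum((y - f) ** 2 for y, f in zip(outcome, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in outcome)
    return 1.0 - ss_res / ss_tot


# Hypothetical ratings for two participants over five situations and two
# processes (the studies used 52 situations and 13 or 8 processes).
participants = {
    "p01": ([[1, 4], [3, 2], [5, 5], [7, 1], [9, 3]], [2, 3, 6, 5, 8]),
    "p02": ([[2, 2], [4, 1], [6, 6], [8, 3], [9, 9]], [1, 1, 5, 4, 9]),
}
r2_by_person = {pid: r_squared(x, y) for pid, (x, y) in participants.items()}
median_r2 = median(r2_by_person.values())
```

With few situations relative to predictors, R² from such a fit is optimistic, so an adjusted or cross-validated estimate may be preferable in practice.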
Hypothesis 4: Low Correlations Between Situated and Unsituated Measures of Pulling
Specifically, we predicted that the SAM2 TAI measures for frequency and urge would correlate <.30 with the MGH-HPS. Because the SAM2 TAI assesses pulling in a specific set of relevant situations, its trait-level measure of pulling should differ significantly from the trait-level measure in an unsituated instrument, where a much smaller set of situations may be evaluated, a different set, or perhaps none at all.
Discovery
In a first discovery analysis, we assessed how consistently pullers exhibited automatic versus focused pulling across situations. In a second discovery analysis, we explored correlations of the SAM2 TAI measures for pulling frequency and urge strength with unsituated measures for the Big Five personality traits, self-control, and automatic versus focused pulling but had no specific predictions. In a final discovery analysis, we assessed whether participants exhibited awareness of the influential processes that are most important in their pulling. To explore this issue, we assessed the correlation of (a) a participant’s explicit judgments of how much the different processes influence their pulling with (b) the SAM2 TAI’s implicit assessments of how strongly the processes were actually associated with the individual’s pulling across situations.
Methods
Because the methods and analyses used for Studies 1 and 2 were essentially the same, except for the influential processes assessed, the methods for both studies have been combined into a single methods section. Similarly, the results for both studies have later been combined into a single results section.
Participants
Study 1 recruited 124 participants from social media support groups for trichotillomania and from the TLC Foundation for Body-Focused Repetitive Behaviors (www.bfrb.org). Study 2 recruited 99 participants from social media support groups. For both studies, available funds for paying participants determined the number of participants sampled. Participants were required to be aged 18 years or older, be fluent English speakers, and self-report having trichotillomania.
Several diagnostic checks were conducted before running the main analyses to identify participants who either responded mechanically (giving a constant response) or randomly. Seven participants were excluded from Study 1 as a result of these checks, leaving a total of 117 participants (F = 105, M = 7, other = 5, mean age = 29.38, SD = 8.77). No participants were excluded from Study 2 (n = 99, F = 90, M = 8, other = 1, mean age = 28.59, SD = 8.33). For both studies, participants were paid £7 in Amazon vouchers (or the equivalent in USD, CAD, or EUR).
Design
Studies 1 and 2 both used a multilevel design, with all participants at the individual level evaluating the same 52 situations at the situation level (Tables 1 and 2). Both studies assessed the same two dependent variables across situations (pulling frequency and urge strength), together with processes known to influence them (13 processes in Study 1, 8 processes in Study 2; Table 3). In addition, all participants completed four unsituated individual difference measures at the individual level.
Materials
SAM2 Trichotillomania Assessment Instrument
The SAM2 TAI used the 52 situations in Tables 1 and 2, together with 15 judgment scales (Study 1) or 10 judgment scales (Study 2) in Table 3. The situations were sampled from a norming study presented in SM-1. As described in the introduction, the judgment scales were motivated by models of pulling.
In Study 2, we wanted to reduce the number of influential processes assessed in Study 1 for two reasons. First, some of these processes were highly correlated in Study 1, thereby potentially introducing problems of collinearity. Second, participants needed a lot of time to evaluate the 13 processes, and we wanted to reduce the time needed to evaluate them significantly. We therefore assessed the 13 influential processes carefully, first examining the empirical correlations between them in Study 1, and second examining how related they are conceptually and/or theoretically. Based on these analyses, we reduced the 13 influential processes in Study 1 to 8 processes in Study 2 as described next.
Because external and internal cues were highly correlated (r = .66) and are closely related conceptually/theoretically, we distilled them into a single process that combined both types of cues. Because (negative) self-valence and experiential avoidance were highly correlated (r = .66) and are related conceptually/theoretically, we distilled them into a single process that captured negative valence. Because situational control and internal control were highly correlated (r = .69) and are closely related conceptually/theoretically, we distilled them into a single process that combined both types of control. Because hair pulling subtype and perfectionistic standards were modestly correlated (r = .37) and because perfectionism is often associated with more focused pulling (Grant et al., 2021), we distilled them into a single process that focused on pulling subtype. Because reduction in negative emotion and how good pulling feels were moderately correlated (r = .49) and are closely related conceptually/theoretically, we distilled them into a single process that combined both. Again, we wanted to distill the influential processes as much as possible to reduce the time required to perform the SAM2 TAI. As will become clear later, reducing the number of predictors from 13 to 8 did not diminish the SAM2 TAI’s performance; if anything, performance improved. To see how scales for the influential processes evolved from Study 1 to Study 2, please see the specific forms they took in Table 3.
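The empirical half of this distillation step, screening pairs of processes for high correlations, can be sketched as follows in Python. This is only an illustration: the ratings are invented, the `threshold` value is an assumption (the text states no numeric cutoff and also weighed conceptual relatedness), and how the reported correlations were aggregated over participants and situations is not specified here.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length rating lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def redundant_pairs(ratings, threshold):
    """Return (name_a, name_b, r) for process pairs with |r| >= threshold."""
    flagged = []
    for a, b in combinations(sorted(ratings), 2):
        r = pearson(ratings[a], ratings[b])
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 2)))
    return flagged


# Hypothetical ratings (not study data), one value per observation.
ratings = {
    "external_cue": [1, 3, 5, 7, 9, 2],
    "internal_cue": [2, 3, 6, 6, 9, 1],
    "arousal": [5, 1, 4, 2, 3, 6],
}
# The 0.60 cutoff is an illustrative assumption, not a value from the paper.
candidates = redundant_pairs(ratings, threshold=0.60)
```

Flagged pairs are only candidates for merging. As described above, the final decision also depended on whether the two processes are related conceptually or theoretically.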
Unsituated Individual Difference Measures
The following psychometric instruments were used to assess personality, self-control, hair pulling severity, and hair pulling subtype: the Big Five Inventory (BFI; John & Srivastava, 1999); the Brief Self-Control Scale (BSCS; Tangney et al., 2004); the MGH-HPS (Keuthen et al., 1995); and the Milwaukee Inventory for Subtypes of Trichotillomania–Adult version (MIST-A; Flessner, Woods, Franklin, Cashin, et al., 2008).
Awareness of Influential Processes
To assess participants’ awareness of how strongly the influential processes in each study were related to their pulling, they were asked to estimate, “To what extent does [influential process] in a situation influence the amount of pulling you perform?” SM-3 presents all the specific questions asked in Studies 1 and 2. For each process, the estimated influence was measured on a slider scale from 0 to 100, with the labels, “no influence at all,” “moderate influence,” and “very strong influence.” Results, presented in SM-3, indicate that participants exhibited some awareness of the processes that influence their pulling, accompanied by many incorrect beliefs.
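One way to picture the awareness analysis is as a correlation, computed within a participant across processes, between explicit influence estimates and an implicit index of association. The Python sketch below assumes that the implicit index is the absolute Pearson correlation between a process and pulling across situations. That choice, the process labels, and all ratings are assumptions for illustration. The index actually used is documented in SM-3.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def awareness_score(explicit, process_ratings, pulling):
    """Correlate stated influence with observed association, per person.

    explicit: {process: slider estimate from 0 to 100}
    process_ratings: {process: ratings across situations}
    pulling: pulling ratings across the same situations
    """
    names = sorted(explicit)
    # Assumed implicit index: |r| between each process and pulling.
    implicit = [abs(pearson(process_ratings[n], pulling)) for n in names]
    stated = [explicit[n] for n in names]
    return pearson(stated, implicit)


# Hypothetical data for one participant over five situations.
pulling = [0, 2, 4, 6, 8]
process_ratings = {
    "triggers": [1, 2, 4, 7, 8],
    "arousal": [3, 1, 4, 2, 5],
    "ritual": [5, 5, 4, 6, 5],
}
accurate = {"triggers": 90, "arousal": 50, "ritual": 10}
mistaken = {"triggers": 10, "arousal": 50, "ritual": 90}
score_accurate = awareness_score(accurate, process_ratings, pulling)
score_mistaken = awareness_score(mistaken, process_ratings, pulling)
```

Under these assumptions, a positive score indicates that the processes a participant rates as most influential are also those most strongly associated with their pulling, and a negative score indicates the reverse.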
Procedure
All participants performed the study online using the Qualtrics platform, after being referred there by a link on social media or a website. Participants first received an information sheet about the study and then provided informed consent. Ethics approval was granted by the College of Science and Engineering Ethics Committee at the University of Glasgow (application 300180053).
Participants first evaluated the 52 situations (Tables 1 and 2) for the two dependent variables, urge and frequency, and then for the 13 processes in Study 1 or the 8 distilled processes in Study 2 (Table 3). For Study 1, the 15 measures were presented in six blocks that combined two or three measures in a block as follows: Block 1 assessed urge strength and pulling frequency; Block 2 assessed external and internal cues; Block 3 assessed valence, arousal, and experiential avoidance; Block 4 assessed situational and internal control; Block 5 assessed subtype, perfectionistic standards, and ritualized behavior; Block 6 assessed how pulling feels, reduction in negative emotion, and long-term consequences. In each of the six blocks, the 52 situations were presented in a random order. As each situation appeared, participants evaluated it sequentially on the two or three measures assessed in the respective block. For all participants, the six blocks were presented in the order described above. Similarly, the measures within each block were collected for each situation in the order just described. Instructions at the start of each block provided a detailed description of the measures to be evaluated in it.
For Study 2, the two dependent variables were presented initially in two separate blocks ordered randomly for each participant, followed by the eight blocks for the distilled processes in Table 3, also ordered randomly. While 15 measures were combined in 6 blocks for Study 1, 10 measures were collected individually in 10 blocks for Study 2. As for Study 1, the 52 situations were randomized within each block uniquely for each participant, and instructions for each measure were presented at the start of the respective block.
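The Study 2 presentation scheme can be summarized as a short Python sketch. It is an illustration of the ordering logic only, not the Qualtrics configuration used in the study. The process labels are placeholders (Table 3 lists the actual distilled processes), and the function name and seed argument are hypothetical.

```python
import random

DEPENDENT_VARIABLES = ["pulling_frequency", "urge_strength"]
# Placeholder labels: Table 3 lists the eight distilled processes.
PROCESSES = [f"process_{i}" for i in range(1, 9)]
N_SITUATIONS = 52


def study2_presentation_plan(seed):
    """Build one participant's block order and situation orders."""
    rng = random.Random(seed)  # one generator per participant
    # The two dependent-variable blocks come first, in random order.
    dv_blocks = rng.sample(DEPENDENT_VARIABLES, k=len(DEPENDENT_VARIABLES))
    # The eight process blocks follow, also in random order.
    process_blocks = rng.sample(PROCESSES, k=len(PROCESSES))
    plan = []
    for measure in dv_blocks + process_blocks:
        # Situations are reshuffled independently within every block.
        situation_order = rng.sample(range(N_SITUATIONS), k=N_SITUATIONS)
        plan.append((measure, situation_order))
    return plan


plan = study2_presentation_plan(seed=1)
```

Each entry in `plan` pairs one measure with the order in which the 52 situations would be shown for that measure, giving 10 blocks in total.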
For both studies, the collection of demographic information for nationality, gender, age, and education level followed the SAM2 blocks. Then, to assess explicit awareness of the processes that influence pulling, participants estimated how much they believed each of the 13/8 processes influences their pulling. Finally, the four unsituated individual difference measures followed: the BFI, the BSCS, the MGH-HPS, and the Milwaukee Inventory for Subtypes of Trichotillomania (adult version).
At the conclusion of each study, participants were debriefed, thanked for their participation, and paid. Including breaks, participants took approximately 100 minutes to complete Study 1 and approximately 55 minutes to complete Study 2.
Results
All data and analysis scripts are publicly available online at OSF (https://osf.io/sqhzu/).
Hypothesis 1: Large Reliable Individual Differences in Trichotillomania
We predicted that individuals would exhibit considerable variability in trait levels of pulling frequency and urge strength (when averaged across situations). Figure 1 shows each participant’s mean judgment across the 52 situations for each dependent variable (pulling frequency, urge strength), together with their mean evaluation for each of the 13 influential processes in Study 1 and for each of the 8 influential processes in Study 2. Each plot shows the distribution of trait-level values for a measure across the individuals sampled. In both studies, median levels of about 3.5 for pulling frequency and of about 4 for urge strength indicate that many individuals typically experienced low to moderate levels of pulling and urges across these situations. As we will see shortly, however, each individual tended to vary widely in their pulling and urges across situations, typically exhibiting high levels in some situations.

Box and Whisker Plots for Average Pulling Frequency, Urge Strength, and the Influential Processes in Study 1 (Panel A) and Study 2 (Panel B).
The median levels of pulling frequency and urge strength in Figure 1 were accompanied by substantial individual differences, as predicted. In both studies, trait-level values of pulling frequency ranged from about 0.5 to 8, and trait-level values of urge strength ranged from about 0.5 to 9, both covering nearly the entire scale. Across the same 52 situations, some individuals exhibited very low overall levels of pulling frequency and urge strength, whereas others exhibited very high levels.
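The distinction between a trait-level value and the variability underlying it can be made concrete with a small Python sketch. A trait-level value is taken here to be a participant's mean rating across situations, which matches the description above. The participant labels and ratings are invented and use five situations rather than 52.

```python
from statistics import mean, median


def trait_level(ratings):
    """Mean rating across situations for one participant."""
    return mean(ratings)


def situation_span(ratings):
    """Lowest and highest rating a participant gave across situations."""
    return min(ratings), max(ratings)


# Hypothetical pulling-frequency ratings (0 to 10) over five situations.
ratings_by_person = {
    "p01": [0, 1, 0, 2, 1],
    "p02": [3, 8, 0, 5, 4],
    "p03": [9, 7, 10, 8, 6],
}
traits = {pid: trait_level(r) for pid, r in ratings_by_person.items()}
spans = {pid: situation_span(r) for pid, r in ratings_by_person.items()}
sample_median = median(traits.values())
lowest, highest = min(traits.values()), max(traits.values())
```

In this toy example, participant `p02` has a moderate trait-level value of 4 yet spans ratings from 0 to 8, which is the kind of within-person variability that a single averaged score conceals.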
Interestingly, as Figure 1 illustrates further, roughly half the individuals in each study tended to be focused pullers across situations (with a mean value for subtype greater than 0), whereas the other half tended to be automatic pullers (with a mean value less than 0). Although a few individuals in each study exhibited extreme levels of focused pulling (approaching +5) or automatic pulling (approaching −5), most participants had values near 0, reflecting a mixture of both focused and automatic pulling (as seen in more detail later).
As we just saw in Figure 1, the SAM2 TAI establishes large individual differences for trait-level measures of pulling frequency and urge strength. These measures were also reliable, as assessed with Cronbach’s alpha (specifically ICC3k; Shrout & Fleiss, 1979). Table 3 presents these results on the far right. As can be seen, the alphas averaged around 0.95, well above the conventionally acceptable range of 0.70–0.80. Similar levels also occurred for the influential processes in both studies, demonstrating that the SAM2 TAI exhibits excellent test reliability for all its measures. Because we were only interested in the reliability of overall measures, coefficient alpha was sufficient for this purpose. Because it is not necessary that the situations in the SAM2 TAI exhibit internal consistency (Dutriaux et al., 2023), it was not appropriate to assess coefficient omega (Flora, 2020).
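To make the reliability computation concrete, the following minimal sketch shows how Cronbach's alpha can be obtained as ICC(3,k) from the mean squares of a two-way layout, with participants as rows and situations as columns. It is an illustration under stated assumptions, not the analysis script from the OSF repository: the function names and the simulated ratings are hypothetical, and only NumPy is used.

```python
import numpy as np


def cronbach_alpha(scores):
    """Cronbach's alpha for a (participants x situations) matrix.

    Situations play the role of test items.
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    sum_item_var = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - sum_item_var / total_var)


def icc3k(scores):
    """ICC(3,k) from two-way ANOVA mean squares (Shrout & Fleiss, 1979)."""
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)
    col_means = scores.mean(axis=0)
    # Between-participant mean square.
    ms_rows = k * ((row_means - grand) ** 2).sum() / (n - 1)
    # Residual mean square after removing row and column effects.
    resid = scores - row_means[:, None] - col_means[None, :] + grand
    ms_error = (resid ** 2).sum() / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / ms_rows


# Simulated ratings (NOT the study data): 100 hypothetical participants x 52 situations.
rng = np.random.default_rng(0)
trait = rng.normal(4.0, 2.0, size=(100, 1))       # stable individual differences
situation = rng.normal(0.0, 1.0, size=(1, 52))    # shared situation effects
ratings = trait + situation + rng.normal(0.0, 2.0, size=(100, 52))

alpha = cronbach_alpha(ratings)
icc = icc3k(ratings)
print(round(alpha, 3), round(icc, 3))  # the two values coincide
```

The two functions are algebraically equivalent, which is why alpha can be reported as ICC3k. With 52 situations, even noisy single-situation ratings average into a highly reliable trait-level score.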
Hypothesis 2a and 2b: Substantial Situation Effects and Individual-Situation Interactions
We predicted that specific situations would have a substantial impact on an individual’s pulling frequency and urge strength, with their levels varying situation by situation. Rather than exhibiting constant trait levels of pulling as situations varied, we expected to observe substantial variability in each individual’s pulling across situations. Indeed, we expected that a participant’s judgments for pulling frequency and urge strength would typically cover the entire range of these scales across situations (also see Fleeson & Jayawickreme, 2021). We further predicted that there would be a large individual-situation interaction for each measure, as the levels of pulling and urges would depend not only on the situation but also on the individual.
Figures 2A and 2B present strong support for these hypotheses. In each visualization, a row represents a participant’s judgments of pulling frequency in Study 1 or Study 2. Each column represents the judgments for 1 of the 52 situations. Each cell represents a participant’s judgment of pulling frequency in the respective situation. The redder a cell, the higher the pulling frequency; the bluer the cell, the lower the pulling frequency. Highly similar results were obtained for urge strength, but because the two dependent variables correlated .85 and .88 in Studies 1 and 2, respectively, we only present the results for pulling frequency here.

Figure 2. Visualizations of the Pulling Frequency Judgments for the 117 Participants in Study 1 (Panel A) and the 99 Participants in Study 2 (Panel B) Across the 52 Situations.
As Figure 2 illustrates, substantial situation effects are present. For most participants, their individual judgments covered nearly the entire scale across situations. Across participants, some situations exhibited a consistently high (red) pulling frequency, whereas other situations exhibited a consistently low (blue) frequency. Figure 2 also visualizes the trait levels of pulling for individuals shown earlier in Figure 1, reflected here in the overall redness/blueness of a participant’s row.
Finally, Figures 2A and 2B demonstrate substantial individual-situation interactions. Specifically, individuals varied widely in the pattern of pulling frequency they exhibited across the same 52 situations (further reflected in the different clusters of individuals shown on the left). The intraclass correlations for agreement in Table 3 quantify the magnitude of these interactions by establishing the average correlation between participants (rows) in their judgments of pulling frequency across situations (columns), which was only .41 in Study 1 and .43 in Study 2. As these values for agreement indicate, participants interacted with situations considerably, showing different patterns of pulling across the same 52 situations. Again, the SAM2 TAI captured these large individual differences.
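As an illustration of how an agreement coefficient of this kind can be computed, the sketch below implements the single-measure, absolute-agreement intraclass correlation, ICC(2,1), from Shrout and Fleiss (1979). It assumes that situations are treated as the rated targets and participants as the raters, so a participants-by-situations matrix is transposed before use. The function name is hypothetical, and the example matrix is the worked example published by Shrout and Fleiss, not data from these studies.

```python
import numpy as np


def icc2_single(ratings):
    """ICC(2,1), absolute agreement, for a (targets x raters) matrix."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape  # n targets, k raters
    grand = x.mean()
    target_means = x.mean(axis=1)
    rater_means = x.mean(axis=0)
    ms_targets = n and k * ((target_means - grand) ** 2).sum() / (n - 1)
    ms_raters = n * ((rater_means - grand) ** 2).sum() / (k - 1)
    resid = x - target_means[:, None] - rater_means[None, :] + grand
    ms_error = (resid ** 2).sum() / ((n - 1) * (k - 1))
    return (ms_targets - ms_error) / (
        ms_targets + (k - 1) * ms_error + k * (ms_raters - ms_error) / n
    )


# Worked example from Shrout & Fleiss (1979): 6 targets rated by 4 judges.
sf_example = np.array([
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
])
print(round(icc2_single(sf_example), 2))  # 0.29

# For a (participants x situations) matrix `pulling`, situations become the
# targets and participants the raters (an assumption about the layout):
# agreement = icc2_single(pulling.T)
```

Low values indicate that participants disagree about which situations are associated with more or less pulling, which is the sense in which the coefficient reflects individual-situation interactions.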
In an exploratory analysis, we further assessed situation effects for the pulling subtype measure. Of interest was how consistent individuals were across situations in focused versus automatic pulling, and also how much individual patterns differed across situations. Figures 3A and 3B visualize the hair-pulling subtype judgments in Study 1 and Study 2 for each participant (rows) in each situation (columns). As values become redder, individuals pulled in a more focused manner; as values become bluer, they pulled in a more automatic manner.

Figure 3. Visualizations of the Hair-pulling Subtype Judgments for the 117 Participants in Study 1 (Panel A) and the 99 Participants in Study 2 (Panel B) Across the 52 Situations.
As Figure 3 illustrates, only a small minority of individuals solely performed a single type of pulling across the 52 situations. Instead, most individuals performed both types of pulling in different situations, with the specific situations where each type of pulling occurred varying considerably between individuals. As a result, very large individual-situation interactions occurred in both studies, as reflected in agreement (ICC2) of only .05 in Table 3 for the pulling subtype measure. As Figure 3 further illustrates, three clusters of individuals emerged for the subtype. A top cluster in both panels exhibited mixed pulling (both automatic and focused). A smaller middle cluster predominantly exhibited focused pulling (but not always), and a cluster toward the bottom predominantly exhibited automatic pulling (again not always). These patterns not only demonstrate that there are no clear automatic and focused pullers but also show how much situations affect the type of pulling an individual performs, and how these situational effects differ across individuals.
Hypothesis 3: High Construct and Content Validity for SAM2 TAI Measures of Trichotillomania
We next assessed construct validity at the individual level. For each individual, we first computed a composite measure of pulling frequency and urge strength (i.e., for each situation, the average of an individual’s frequency and urge judgments). Because these two measures correlated very highly (r = .85 in Study 1; r = .88 in Study 2), they captured highly similar information. Combining them simplified later analyses and created a robust dependent variable that reflected both measures.
For each individual, we then correlated their composite measure of pulling across the 52 situations with each of their judgments for the 13 influential processes across situations in Study 1, or with each of their judgments for the 8 influential processes across situations in Study 2. The resulting vector of 13/8 correlations constituted a prediction profile for each individual. If the SAM2 TAI exhibits construct validity, correlations within these prediction profiles should be high. The composite measure of pulling should correlate highly with processes known to influence pulling.
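The computation of a prediction profile can be sketched as follows for a single individual. This is a minimal illustration under stated assumptions rather than the authors' script: the function name, the three process names, and the simulated judgments are all hypothetical. The composite is the per-situation average of frequency and urge, and the profile is the vector of Pearson correlations between that composite and each process across situations.

```python
import numpy as np


def prediction_profile(frequency, urge, processes):
    """Pearson r between one individual's composite pulling score and each process.

    frequency, urge: arrays of shape (n_situations,)
    processes: array of shape (n_situations, n_processes)
    Returns an array of shape (n_processes,). A process with no variance
    across situations yields NaN, because its correlation is undefined.
    """
    composite = (np.asarray(frequency, float) + np.asarray(urge, float)) / 2.0
    processes = np.asarray(processes, float)
    c = composite - composite.mean()
    p = processes - processes.mean(axis=0)
    return (c @ p) / np.sqrt((c ** 2).sum() * (p ** 2).sum(axis=0))


# Simulated judgments for one hypothetical individual across 52 situations.
rng = np.random.default_rng(1)
trigger = rng.uniform(0, 10, 52)    # hypothetical process, positively related
control = rng.uniform(0, 10, 52)    # hypothetical process, negatively related
unrelated = rng.uniform(0, 10, 52)  # hypothetical process, unrelated
signal = 4.0 + 0.6 * trigger - 0.5 * control
frequency = np.clip(signal + rng.normal(0, 1, 52), 0, 10)
urge = np.clip(signal + rng.normal(0, 1, 52), 0, 10)

processes = np.column_stack([trigger, control, unrelated])
profile = prediction_profile(frequency, urge, processes)
print(np.round(profile, 2))  # one correlation per process
```

Stacking one such vector per participant gives the rows of a matrix like the one visualized in the prediction-profile figures.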
Figures 4A and 4B visualize the individual prediction profiles obtained in this analysis. Each row in Figure 4A represents the vector of 13 correlations for one individual in Study 1; each row in Figure 4B represents the vector of 8 correlations for one individual in Study 2. Each column represents the correlations for a single influential process across individuals. Each cell in a row visualizes the magnitude of a correlation for an individual between the composite measure of pulling and a specific influential process. As a cell becomes redder, the correlation approaches +1; as a cell becomes bluer, the correlation approaches −1; as a cell becomes whiter, its correlation approaches 0. The correlations are summarized at the bottom of each figure, presenting the median and interquartile range of the correlations for each influential process across participants.

Figure 4. Individual Prediction Profiles of Pulling in Study 1 (Panel A) and Study 2 (Panel B).
General patterns across individuals emerge in Figures 4A and 4B. Consistently, across both studies, internal and external cues (just triggers in Study 2) predicted pulling most strongly (median r = .62, .79, .79). Reduction in negative emotion also predicted pulling strongly in both studies (median r = .55 and .77). In Study 1, internal control (−.53) predicted pulling well, followed by situational control (−.38), ritualistic behaviors (.37), perfectionist standards (.36), valence (−.35), how pulling feels (.30), experiential avoidance (−.29), and long-term consequences (.18). In Study 2, rituals (.70), control (−.64), and long-term consequences (.63) all predicted pulling well, followed by valence (−.39) and arousal (.22). Pulling subtype tended not to predict pulling well in either study (.13, .02). Similar to what we saw earlier in Figures 3A and 3B, individuals varied widely in how subtype related to their pulling. For about one-third of the individuals, pulling increased as focused pulling increased (red cells); for another third, pulling increased as focused pulling decreased and automatic pulling increased (blue cells); and for the final third, little relation emerged between pulling and pulling subtype.
These results establish strong construct validity for the SAM2 composite measure of pulling. Processes established in the literature that influence pulling predicted pulling well in the SAM2 TAI at the individual level (except for pulling subtype, which showed substantial individual differences).
Finally, we assessed the content validity of the SAM2 TAI. We hypothesized that the influential processes would explain a relatively large amount of variance in the composite measure of pulling, demonstrating comprehensive coverage. To assess content validity at the group level for the composite measure, we established the amount of variance that a multilevel mixed-effect model explained in it. For each study, the influential processes were modeled as fixed effects. Due to the moderate-to-high correlations between five pairs of processes in Study 1 (described earlier in the methods), a single component was constructed for each pair using principal component analysis. Three original processes were left unchanged, resulting in a total of eight fixed factors included to predict the composite measure of pulling. For Study 2, all eight of the original processes were included as fixed factors, given that no problems emerged with collinearity. For both studies, random intercepts and slopes were included for participants and situations. Across models, the variance explained at the group level was around 65% in Study 1 and 70% in Study 2. These results indicate that the SAM2 TAI exhibits high content validity at the group level, with the influential processes comprehensively explaining variance in the composite measure of pulling.
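The step of replacing a correlated pair of processes with a single principal component can be illustrated with a short sketch. It assumes that each variable is standardized before the component is extracted, which this section does not state, and the function names and simulated variables are hypothetical. The multilevel model itself is not reproduced here.

```python
import numpy as np


def standardize(x):
    """Z-score a variable using the sample standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)


def first_component(a, b):
    """Scores on the first principal component of two standardized variables."""
    z = np.column_stack([standardize(a), standardize(b)])
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    loadings = vt[0]
    if loadings[0] < 0:  # resolve the arbitrary sign of the component
        loadings = -loadings
    return z @ loadings


# Two simulated, positively correlated process judgments (NOT the study data).
rng = np.random.default_rng(2)
shared = rng.normal(0, 1, 500)
process_a = shared + rng.normal(0, 0.5, 500)
process_b = shared + rng.normal(0, 0.5, 500)

component = first_component(process_a, process_b)
print(round(float(np.corrcoef(component, process_a)[0, 1]), 2))
```

For two positively correlated standardized variables, the first component is simply their sum divided by the square root of 2, so the resulting predictor weights both original processes equally.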
At the individual level, the variance explained was even higher, indicating that explained variance at the group level was attenuated by individual differences. For each individual, their composite measure was regressed onto their judgments for the 13/8 influential processes across situations (using simple linear regression). The median individual variance explained across these individual regressions was 74% for Study 1 and 83% for Study 2. These high levels of explained variance at the individual level again indicate that the influential processes comprehensively explained the composite measure of pulling in the SAM2 TAI.
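A minimal sketch of the individual-level analysis follows. It assumes an ordinary least-squares regression with an intercept and all processes entered together as predictors, fitted separately for each participant, and it reports the unadjusted proportion of variance explained. Whether the reported values were adjusted for the number of predictors is not stated in this section. The function name and the simulated data are hypothetical.

```python
import numpy as np


def r_squared(y, predictors):
    """Unadjusted R^2 from an OLS regression of y on the predictors plus an intercept."""
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), np.asarray(predictors, dtype=float)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = (residuals ** 2).sum()
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot


# Simulated data (NOT the study data): 20 hypothetical participants,
# 52 situations, 8 processes, with individual-specific regression weights.
rng = np.random.default_rng(3)
n_participants, n_situations, n_processes = 20, 52, 8
r2_values = []
for _ in range(n_participants):
    processes = rng.uniform(0, 10, size=(n_situations, n_processes))
    weights = rng.normal(0.0, 0.3, size=n_processes)
    composite = processes @ weights + rng.normal(0.0, 1.0, size=n_situations)
    r2_values.append(r_squared(composite, processes))

median_r2 = float(np.median(r2_values))
print(round(median_r2, 2))  # median variance explained across individuals
```

Because each participant receives their own weights, this approach can capture individual patterns that a single set of group-level fixed effects averages over. Unadjusted R^2 also tends to rise as predictors are added relative to the number of situations, which is worth keeping in mind when comparing the 13-process and 8-process versions.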
Hypothesis 4: Low Correlations Between Situated and Unsituated Measures of Trichotillomania
We predicted that there would be low correlations of the SAM2 measures for pulling frequency and urge strength with the unsituated MGH-HPS (Keuthen et al., 1995). Indeed, the correlation between the SAM2 measures and the MGH-HPS was relatively low, but nevertheless significant in both studies (Study 1 frequency r = .33, p < .001, Study 1 urge r = .31, p < .001, Study 2 frequency r = .23, p = .020, and Study 2 urge r = .24, p = .019). These correlations are noticeably lower than the correlations between the SAM2 measures for pulling frequency and urge strength with each other (r = .85 in Study 1, p < .0001; r = .88 in Study 2, p < .0001).
Discovery: Correlations Between SAM2 TAI Measures and Individual Difference Measures
In a final discovery analysis, we explored correlations of the SAM2 measures for pulling frequency and urge strength with measures for the Big 5 personality traits, self-control, and focused versus automatic pulling but had no specific predictions. For Study 1, only the SAM2 measure for urge strength correlated significantly with neuroticism (r = .32, p = .0005); no other correlations were significant. For Study 2, both SAM2 measures for frequency and urge correlated significantly with neuroticism (r = .38, p = .0001; r = .36, p = .0002) and focused pulling (r = .44, p < .0001; r = .39, p < .0001). Interestingly, all these correlations were higher for the SAM2 measures than for the MGH-HPS measure (and also for Study 2 relative to Study 1; SM-2 presents the full tables of correlations).
Discussion
Using the Situated Assessment Method (SAM2; Dutriaux et al., 2023), we developed a situated approach to assessing trichotillomania. Rather than assessing hair pulling with unsituated test items—as in typical psychometric instruments—we assessed it in specific situations where hair pulling does and does not tend to occur. In addition, we assessed processes known to influence pulling frequency and urge strength in these situations from well-established models of pulling in the literature. Using this approach, we established a rich descriptive profile of pulling for each individual across pulling and non-pulling situations.
Summary of Results
Individual Differences
Using the SAM2 TAI, we established trait levels of pulling frequency and urge strength for each individual (i.e., their mean judgment for each construct across the 52 pulling and non-pulling situations). The median trait-level value for both pulling frequency and urge strength was around 3.5 to 4 in both studies (on a scale of 0–10), indicating moderate levels in our samples (Figure 1). More important was how much these trait judgments varied across individuals, indicating substantial individual differences. Some individuals exhibited very low levels of pulling frequency and urge strength, whereas others experienced very high levels across the same situations. When Cronbach’s alpha was used to assess test reliability, these trait-level measures exhibited excellent levels around .95.
Situation Effects and Situation by Individual Interactions
Not only did the SAM2 TAI establish large individual differences, it also established large differences between situations (Figures 2A and 2B). As expected, some situations exhibited relatively high levels of pulling frequency and urge strength, whereas others exhibited relatively low levels. More importantly, large situation by individual interactions emerged for both pulling frequency and urge strength, indicating that individuals experienced the same 52 situations quite differently with respect to pulling and urges. Across the two studies, pulling frequency for one individual across situations correlated only around .42, on average, with pulling frequency for another individual. A similar level of .42 emerged for urge strength (Table 3).
All these results indicate that both situation effects and situation-individual interactions are important when assessing individual levels of pulling frequency and urge strength. Only focusing on a single trait-level measure masks considerable individual-specific variability at the situation level. Establishing the unique pattern of situational variability for an individual is central to understanding their pulling (Dutriaux et al., 2023; Fleeson & Jayawickreme, 2021). The SAM2 TAI captures these patterns. Because different individuals experience different patterns of pulling and urges across the same situations, the situation alone is not the sole cause of their pulling experience. Instead, each individual’s unique cognitive-affective system also plays a major role, reflecting the kinds of processes proposed in the three models of trichotillomania addressed earlier (Bandura, 1978; Cervone, 2005; Cervone et al., 2001; Dutriaux et al., 2023; Fleeson & Jayawickreme, 2021; Mischel & Shoda, 1995).
Construct Validity
The SAM2 TAI exhibited high levels of construct validity. Specifically, the SAM2 composite measure of pulling correlated well with processes known to influence pulling in the literature (Figure 4). Some of these processes correlated quite highly with pulling, including external cues, internal cues, and reduction in negative emotion. Other processes correlated moderately to weakly with pulling, including self-valence, the abilities to control situations and emotions, ritualized pulling behavior, perfectionist standards, long-term consequences, and arousal. In general, the SAM2 composite measure of pulling captured diverse sources of influence known to affect pulling, thereby establishing its construct validity.
Perhaps one finding that deserves some explanation is the positive correlation between the long-term consequences of pulling and the SAM2 composite measure. It might seem surprising that pulling increases as the negative long-term consequences of pulling increase as well. Instead, it might seem that people would pull less as the long-term consequences of pulling become increasingly severe. What this relationship might indicate instead is that the more people pull, the worse the long-term consequences become. Rather than long-term consequences causing pulling to decrease, increased pulling causes long-term consequences to increase. Because our correlational data do not justify causal conclusions, these possibilities constitute a potential topic for future research.
Content Validity
The SAM2 TAI also exhibited high levels of content validity. Specifically, the influential processes that the SAM2 TAI assessed explained high levels of variance in the composite measure of pulling (i.e., the average of pulling frequency and urge strength). At the group level, the influential processes explained around 65%–70% of the variance. At the individual level, the influential processes explained an even higher 74%–83%. Higher explanation at the individual level most likely resulted from large individual differences attenuating prediction at the group level. These results indicate that the influential processes in the SAM2 TAI explain the construct of hair pulling comprehensively.
Relations to Unsituated Individual Difference Measures
The SAM2 TAI correlated significantly with the unsituated MGH-HPS but only at low to moderate levels (r = .24–.33), indicating that the situated and unsituated measurements captured related but different information. Because the SAM2 TAI assesses pulling in a specific set of relevant situations, its trait-level measure of pulling differed significantly from the trait-level measure in an unsituated instrument, where a much smaller set of situations may have been evaluated, a different set, or perhaps none at all.
Of further interest was the relationship between the SAM2 TAI and other unsituated individual difference measures. For both studies, urge strength correlated positively with neuroticism (emotionality); for Study 2, pulling frequency correlated positively with neuroticism as well. This is perhaps not surprising, given that neuroticism has correlated with trichotillomania consistently (Grant & Chamberlain, 2021b; Hagh-Shenas et al., 2015; Keuthen et al., 2015, 2016).
Implications for Models of Hair Pulling
When examining the correlational results for each individual (Figure 4), evidence for current models of hair pulling emerged. Support for the ComB model emerged most strongly (Mansueto et al., 1997), as reflected in the strong positive correlations for triggering cues for almost every participant. Furthermore, for many participants, but not all, ritualistic behavior also demonstrated strong positive correlations with frequency and urges. Consistent with the reward component of the ComB model, reduction in negative emotion and how good pulling feels exhibited strong positive correlations for the majority of participants.
In support for the Model of Cognitions and Beliefs (Rehm et al., 2015), the importance of negative self-beliefs and negative appraisal of negative emotions was captured by influential processes here for internal cues and self-valence (negative self-beliefs). In Figure 4, self-valence often correlated negatively with pulling, and internal cues often correlated positively. Also central to the model by Rehm et al. is the role of experiential avoidance in pulling. Consistent with this account, Study 1 exhibited a negative relationship between experiential avoidance and pulling for many individuals (Figure 4A). As individuals became less willing to experience negative emotion, they pulled more (although a minority of individuals exhibited the opposite relation). Control in the hair-pulling cycle also plays a central role in this model. Again, in our results we can see that, for many individuals, low levels of control, particularly internal control, were associated with increased pulling. Similar to the ComB model, the positive correlations of pulling with reduction in negative emotion and how good pulling feels also support the cognitions and beliefs model. For both models, pulling is related to the outcomes of pulling. Finally, this model also discusses the importance of perfectionistic standards in the hair-pulling cycle. Figure 4 offers mixed support for this factor, with it being quite important for some individuals but not important for others, in particular, more automatic pullers.
Finally, our results also support the Emotion Regulation Model of hair pulling. Perhaps the strongest evidence comes from the importance of internal cues (which could be one’s emotional state), internal control (evidence of emotion regulation, or the lack of it), and reduction in negative emotion. Although these influential processes have a strong relationship with pulling and offer support for the emotion-regulation model, one could also argue that this model ignores many other important processes in the pulling cycle. Indeed, all three models receive support here, but no single model accounts for all the influential processes observed in pulling.
Perhaps the Situated Action Cycle can be used to integrate the important insights across all three models (Barsalou, 2020; Dutriaux et al., 2023). In the Situated Action Cycle, perceived entities and events in the environment typically initiate the cycle, such as external cues for pulling. Once these cues are perceived, their self-relevance is assessed in relation to the individual’s goals, values, social norms, and identity. For hair pulling, self-relevance takes the form of internal cues, how good pulling feels, reduction in negative emotion, and self-valence. These states of self-relevance then induce affect that can take the form of emotions or motivations, including the urge to pull, self-valence, arousal, internal control, and experiential avoidance. If motivation to pull is sufficiently strong, it can induce actions such as actual hair pulling (frequency of pulling), situational control, subtype behavior (automatic vs. focused), perfectionistic standards, and ritualized behavior. Finally, actions lead to outcomes, including how good pulling feels, reduction in negative emotion, and long-term consequences. As this brief summary illustrates, the Situated Action Cycle offers a natural way to integrate processes across the three models of hair pulling.
Hair Pulling Subtypes
As the distribution of trait-level values for subtype in Figure 1 illustrates, the SAM2 TAI captured individual differences in focused versus automatic pulling. Whereas some individuals exhibited high levels of focused pulling across situations (high positive values), other individuals exhibited high levels of automatic pulling (strongly negative values).
When looking at the correlations between subtype and the composite measure of pulling in Figure 4, similar differences emerged. For some individuals, the more focused their pulling, the more they pulled. For other individuals, the more automatic their pulling, the more they pulled.
Figure 3, however, suggests a striking heterogeneity within pulling types, with most individuals exhibiting various mixtures of automatic and focused pulling across situations. From examining these visualizations, it is difficult to conclude that there are two distinct types of pullers, or even three. Instead, it appears that most individuals pull in both ways, with some individuals pulling more often in an automatic manner, with others pulling more often in a focused manner, and with still others pulling in an evenly mixed manner across situations. Interestingly, high levels of pulling can emerge across situations when pulling is either focused or automatic.
The existence of subtypes, together with their number and associated characteristics, continues to be an important issue in the trichotillomania literature (Flessner, Conelea, et al., 2008; Grant & Chamberlain, 2021a; Grant et al., 2021). Based on the results observed here, however, it is not clear how compelling these typologies are. When examining Figures 1 and 3, strong well-differentiated clusters of pulling subtypes do not emerge. Instead, there simply seems to be tremendous variability in the processes associated with pulling for different individuals, together with large situational effects and situation-individual interactions.
If the type of pulling someone exhibits is related to the efficacy of treatment, then continuing to establish subtypes is important (McGuire et al., 2020). As our findings suggest, though, the most important differences may exist at the level of individuals, not at the level of subtypes. If so, then trying to fit individuals into pulling subtypes may not be all that useful or beneficial for designing effective interventions. Within potential subtypes, large individual variation may affect treatment outcomes significantly. For this reason, it may be more useful if treatment focuses on the individual and is tailored to what influences that individual’s pulling most.
Limitations
One significant limitation of this study is the correlational nature of its design and results. Although these results are informative and provide a rich description of individual differences in trichotillomania, they do not establish causality. We cannot conclude what may cause someone to pull their hair but can only conclude that certain factors are associated with pulling. We cannot be sure, for example, that removing external triggers in an environment will reduce pulling frequency and urge strength, even though they are highly correlated with one another. Exploring these relationships further with causal methods offers a useful avenue for future research, especially for developing effective treatments. Nonetheless, even if a process does not cause pulling, its relationship to pulling can still be useful in treatment for a variety of reasons. For example, knowing that external cues are strongly associated with pulling offers a potential target for managing pulling. The external cues may not cause the pulling, but learning to avoid them may minimize encountering correlated factors that together play causal roles.
Another significant limitation is that we do not use the SAM2 TAI to predict actual pulling experience in everyday life. More specifically, we do not verify that the levels of pulling and urges that an individual indicates in the SAM2 TAI for each situation actually occur when these situations are experienced. An important issue for future research is to establish whether the SAM2 TAI offers accurate predictions of pulling in actual situations, together with accurate trait-level measures across them.
Conclusion
The SAM2 offers a novel approach to assessing the important condition of trichotillomania. By assessing hair pulling in situations, it becomes possible to establish rich descriptive profiles of pulling for individuals and to further examine how individuals vary in their situational profiles. In addition, the SAM2 TAI exhibits high levels of test reliability, construct validity, and content validity. By evaluating processes extracted from existing models of trichotillomania, it became possible to establish the processes associated with pulling at both the group and individual levels. Establishing such relationships can play an important role in defining trichotillomania and in determining effective treatments for reducing it.
Supplemental Material
Supplemental material, sj-docx-1-asm-10.1177_10731911241262140 for Developing and Evaluating a Situated Assessment Instrument for Trichotillomania: The SAM2 TAI by Courtney Taylor Browne Lūka, Katie Hendry, Léo Dutriaux, Judith L. Stevenson and Lawrence W. Barsalou in Assessment
Footnotes
Acknowledgements
We are grateful to the editor and the reviewers for their constructive observations and suggestions that strengthened this article significantly.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
