Abstract
States are increasingly important in personality theory and research. Yet, the assessment of personality states usually relies on ad hoc measures whose development and evaluation are largely separated from theoretical considerations. To enable theory-guided development and evaluation of personality state measures, we introduce a framework based on the revised latent state-trait (LST-R) theory. The theory defines latent states as the expectation of an observed measure given a person in a specific situation, which can be decomposed into latent traits and latent situation-specific state residuals. Consequently, items and scales can be evaluated for their reliability due to latent traits (consistency) and situation-specific influences (specificity). We propose that specificity, in particular, is an appealing property for instruments designed to assess personality states. We illustrate this framework with experience sampling data on personality states. Our framework has implications for both the conceptualisation and the assessment of personality states. On the theoretical side, we provide a formal definition of personality states, which enables integration between trait-, process-, and development-focused theories. On the practical side, we show how using LST-R models allows researchers to develop and evaluate state measures on their own terms rather than applying criteria for trait measures to assess the qualities of state measures.
Plain language summary
A personality state is made up of the feelings, thoughts, behaviours, and/or desires a person experiences at a particular moment in time. Although personality is often thought of in terms of stable traits, personality states can fluctuate from moment to moment. Contemporary personality theories hold that personality traits reflect the distribution of states a person experiences over time. Despite the theoretical importance of personality states, personality (and other) researchers often lack the tools to assess states. One reason for this is the lack of a framework which can guide the development and evaluation of personality state measures. We propose such a framework. In doing so, we build on the revised latent state-trait (LST-R) theory. LST-R theory enables us to explicitly define personality states and evaluate personality state measures. In particular, we propose that personality state measures should be specific, which means that they reliably capture moment-to-moment fluctuations in personality states. We illustrate this framework by re-analysing personality states captured through experience sampling methods. Our framework has implications for both the conceptualisation and the assessment of personality states. On the theoretical side, we provide a formal definition of personality states, which enables integration between trait-, process-, and development-focused theories. On the practical side, we show how using LST-R models allows researchers to develop and evaluate measures which capture the most important properties of personality states.
Introduction
Personality states are increasingly important in personality research (Baumert et al., 2017; Fleeson & Jayawickreme, 2015; Horstmann & Ziegler, 2020). They are crucial ingredients of contemporary theoretical frameworks such as whole trait theory (WTT), which describes personality traits as distributions of states (Fleeson, 2001; Fleeson & Jayawickreme, 2015; Jayawickreme et al., 2019), and feature prominently in recent work on personality dynamics (Danvers et al., 2020; Sosnowska et al., 2020). From a practical perspective, the ubiquity of smartphones has enabled researchers to gather intensive longitudinal data on personality states in everyday life (Hamaker & Wichers, 2017; Van Berkel et al., 2017). This has led to a proliferation of research on the variation of personality states across contexts and relationships (e.g. Church et al., 2013; Geukes et al., 2017; Kuper et al., 2022). Increasingly, personality states have also found conceptual and empirical applications outside of personality psychology. Examples span clinical (Clark et al., 2003; Wright & Simms, 2016) and organisational psychology (Abrahams et al., 2023; Beckmann et al., 2021; Huang & Ryan, 2011; Judge et al., 2014; Nübold & Hülsheger, 2021) as well as computer science (Kalimeri et al., 2013; Staiano et al., 2011). This interdisciplinary popularity is a success story for personality psychology, but it also underlines the importance of coherent approaches to the assessment of personality states.
Despite the theoretical and empirical popularity of personality states, there remains a surprising degree of uncertainty about the development, evaluation, and interpretation of personality state measures (Horstmann & Ziegler, 2020). This is all the more problematic as many personality and applied researchers may not be aware of these uncertainties. A recent survey of the field highlights that researchers typically develop and evaluate such measures ad hoc (Horstmann & Ziegler, 2020). As a consequence, the theoretical conceptualisation of personality states and empirical practices are frequently ill-aligned with each other. In particular, researchers may repurpose methods and criteria designed for trait constructs to validate and evaluate state measures (e.g. using cross-sectional confirmatory factor analysis to establish the reliability of state measures, which conflates the trait and state components of the construct). This mismatch between the theoretical construct of personality states on the one hand and empirical assessment practices on the other hand highlights the need for a unified framework which links the definition and assessment of personality states.
In this article, we define personality states within the revised latent state-trait theory (LST-R theory; Steyer et al., 2015; Steyer et al., 1999). On this basis, we provide a comprehensive framework for developing, evaluating, and interpreting state measures. First, we briefly review definitions of personality states and current approaches to measuring state constructs, highlighting discrepancies between their theoretical conception and their assessment in practice. Second, we introduce a definition of personality states within the framework of LST-R theory (Steyer et al., 1999, 2015) and discuss plausible LST-R models for (intensive) longitudinal data. Third, we define several desiderata for personality state measures within the LST-R framework. Fourth, we use an empirical example to illustrate how models of LST-R theory can be used to evaluate personality state measures. Finally, we discuss the implications of our framework for the theoretical conceptualisation of personality states and for the development, evaluation, and interpretation of personality state measures in practice.
States in contemporary personality theory
A prominent definition describes personality states as “quantitative dimension[s] describing the degree/extent/level of coherent behaviours, thoughts, and feelings at a particular time” (Baumert et al., 2017, p. 528). This definition contains two central elements: the coherence of internal characteristics and behaviour, and the temporal constraint to a particular situation. The first element reflects contemporary theories which define personality as individual differences along dimensions of coherent behaviours, thoughts, and feelings (Baumert et al., 2017). The emphasis on coherence also distinguishes personality states from purely idiosyncratic momentary experiences (DeYoung, 2015; McCabe & Fleeson, 2012). At this point, let us introduce our running example: Consider Ahmed, who has just arrived at his friend Bouke’s apartment for a party. Ahmed feels energised and is eager to meet new people. In this situation, Ahmed can be said to experience a state of Extraversion, reflecting his coherent behaviours (attending a party), feelings (feeling energised), and motives (meeting new people). However, part of Ahmed’s experience is idiosyncratic (e.g. when he recognises the door to Bouke’s apartment from his previous visits). In defining personality states as describing coherent behaviours, thoughts, and feelings, this idiosyncratic part of his experience is excluded from the construct.
The second element of the definition of personality states constrains them to a particular point in time. In practice, personality states are often elicited by asking what a person feels, thinks, or does ‘at this moment’ or ‘in this situation’ (Horstmann & Ziegler, 2020). It is implied that they may have felt, thought, and behaved differently earlier, and that they might again feel, think, and behave differently in the future. Indeed, while individual rank-order differences on many personality dimensions (‘traits’) can be fairly stable across situations and time (Bleidorn et al., 2021, 2022; Henry et al., 2022; Mõttus et al., 2019; Seifert et al., 2022), the extent to which an individual experiences or exhibits specific behaviours, thoughts, or feelings at a particular time (‘states’) may fluctuate from situation to situation (Baumert et al., 2017; DeYoung, 2015; Fleeson & Jayawickreme, 2015). Recall Ahmed, who is in a state of Extraversion at the party. The next morning, he may feel tired and cancel his brunch date (a more introverted state). Both of these states are part of a distribution of states whose central tendency may fall somewhere between them.
Whole Trait Theory provides the most explicit account of the relationship between relatively stable individual differences in traits on the one hand and intrapersonal variation in states on the other hand. In doing so, WTT distinguishes between a descriptive and an explanatory (sub-)model. Descriptively, personality traits reflect the conceptually corresponding states a person experiences across situations (Fleeson, 2001; Fleeson & Jayawickreme, 2015), such that trait measures should correspond closely to the central tendency of the density distribution of states (Fleeson, 2001; Fleeson & Gallagher, 2009; Rauthmann et al., 2019). Explanatorily, personality traits correspond to social-cognitive mechanisms which process inputs and output personality states. Individual differences in social-cognitive mechanisms (as well as stable individual differences in inputs) can explain interindividual variation in traits (i.e. the central tendency of the distribution of states), while situation-to-situation variation in inputs can explain intraindividual variation in states.
Other theoretical accounts largely concur with the descriptive side of WTT, but diverge in their explanation of the relationship between states and traits. Cybernetic Big Five theory (CB5T; DeYoung, 2015) describes traits as individual differences in the parameters of cybernetic systems and posits that traits are situationally specific in that they ‘describe responses to specific classes of stimuli’. Similarly, interactionist or affordance-based theories hold that situational factors afford (or, conversely, constrain) the expression of specific traits in behaviour (e.g. Columbus et al., 2019; de Vries et al., 2016; Hilbig et al., 2018; Horstmann et al., 2021; Mischel & Shoda, 1995; Shoda & Mischel, 2000; Tett & Burnett, 2003; Thielmann et al., 2020; Zettler & Hilbig, 2010). Although such interactionist models are largely silent on personality states, they imply that interindividual differences in traits interact with intraindividual variation in inputs to produce variation in the central tendency and shape of the distribution of behaviours. Contemporary theories thus broadly agree that traits and states are linked because the interaction of social-cognitive mechanisms (or cybernetic systems) and situation-specific inputs (or affordances) produces both cross-situational interindividual differences in traits and intraindividual variability in states across situations.
The substantive content of personality states
Personality dimensions encompass broad factors (e.g. HEXACO or Big Five dimensions), but also narrower facets and nuances lower in the personality hierarchy (Mõttus et al., 2017). Personality states can equally be positioned at any level of the personality hierarchy. WTT further requires that personality states ‘have the same affective, behavioural, and cognitive content as a corresponding trait’ (Fleeson & Jayawickreme, 2015). This isomorphism arises from the definition of traits as distributions of states (although the theory imposes the empirically derived structure of personality traits back onto personality states). However, it is conceivable that the structure of personality states differs from the structure of traits. Therefore, we consider structural trait-state isomorphism an empirical question rather than part of the definition of personality states (Rauthmann et al., 2019).
The proposed content of personality states reflects empirical analyses of Big Five trait scales, which have distinguished between references to affect, behaviours, cognitions, and desires (Wilt & Revelle, 2015; Zillig et al., 2002). Of these, behaviour arguably deserves a special status. Some researchers have equated personality states with behaviour (e.g. Horstmann et al., 2021) or have shown that feelings, thoughts, and motives co-vary with corresponding behaviours (McCabe & Fleeson, 2012). Other frameworks suggest that situational factors may afford or constrain the expression of internal states in overt behaviour (Columbus, Böhm, Moshagen, & Zettler, in prep; Columbus et al., 2019; Thielmann et al., 2020). Consequently, while feelings, thoughts, and desires may be aligned with behaviour in situations which afford their expression, this alignment may not exist in the absence of relevant affordances. Returning to our example, at the party, Ahmed’s energy and motivation to make new connections may express themselves in approaching people. Had Ahmed missed his train, however, he might have had little opportunity to express these internal states. In an affordance framework, the conceptualisation of personality states may thus explicitly exclude behaviour, and corresponding state measures should omit items with purely behavioural content.
Conceptualising and measuring personality states
Several recent publications have highlighted the demand for measures designed to assess personality states (Baumert et al., 2017; Horstmann & Ziegler, 2020; Ringwald et al., 2022). For example, a consensus article by a group of personality psychologists states that ‘we must have measures of cognitive, affective, motivational, and behavioural states under specified situational conditions’ (Baumert et al., 2017, p. 517). One reason that such measures are lacking may be that existing theoretical frameworks underspecify the criteria by which personality state scales should be evaluated. In the absence of dedicated measures, research on personality states has often relied on measures constructed ad hoc by adapting items from trait measures, for example by adding phrases such as ‘in this situation…’ (Horstmann & Ziegler, 2020). Such items were originally worded to capture stable individual differences across situations rather than moment-to-moment fluctuations (Horstmann & Ziegler, 2020). Trait scale evaluation also prioritises consistency over specificity, for example, by relying on factor analyses of cross-sectional data. Consequently, the use of such adapted trait scales may result in overly narrow distributions of states and may attenuate associations of personality states with situation-specific antecedents or consequences. Instead, we argue that personality state scales should be designed to be highly specific in order to capture moment-to-moment fluctuations in coherent affect, cognitions, and desires.
We propose a framework for conceptualising and measuring personality states. Our conceptualisation begins with the definition of personality states as coherent characteristics of a person at a particular time. To capture these core theoretical desiderata of coherence and specificity in a formal definition of personality states, we draw on LST-R theory (Steyer et al., 1999, 2015). LST-R theory defines states as latent variables representing an individual’s characteristics at a particular time in a specific situation. These latent state variables in turn reflect the influences of the person’s immutable characteristics and past experiences (i.e. traits) as well as purely situation-specific influences and person-situation interactions (i.e. state residuals). As such, the conceptualisation of personality states within an LST-R framework is commensurable with the explanatory models of WTT and CB5T without committing to a particular form of the social-cognitive mechanisms or cybernetic system which give rise to traits and states.
By decomposing the variance of observed variables into components due to traits (consistency) and due to systematic situation-specific influences (specificity) as well as measurement error, LST-R theory is particularly well-suited to assess the psychometric properties of state measures. By incorporating autoregressive effects, LST-R theory can also be used to model trait change as a consequence of situational experience in intensive longitudinal data (Eid et al., 2017; Stadtbaeumer et al., 2022). Below, we first introduce the fundamental definitions of states, traits, and state residuals in LST-R theory. In the second step, we translate these definitions into latent variable models for intensive longitudinal data.
Revised latent state-trait theory
Definitions of states, traits, and state residuals in LST-R theory
The basic idea underlying LST-R theory is that a person cannot be assessed in a situational vacuum. That is, observations made at a single time point capture a situation-specific characteristic of the person (state), which may depend on the characteristics of this person independent of the situation (trait), the situation, and the interaction between characteristics of the person and the situation (Steyer et al., 1999). Returning to our example, Ahmed’s state Extraversion may depend on how extraverted he is in general, on the fact that he is at a party (which may make him more extraverted), and on the situation-specific effects of his general Extraversion (as a generally extraverted person, Ahmed may become more outgoing at a party, whereas Bouke, a more introverted person, may feel less comfortable in the same situation). Typically, the items measuring such a state will not be perfectly reliable, which means that the observed variable (i.e. responses on a state measure) also contains measurement error.
Formalising these ideas based on probability theory, each observed variable, $Y_{it}$, can be described as the sum of its latent state variable, $\tau_{it}$, and a measurement error variable, $\varepsilon_{it}$,

$$Y_{it} = \tau_{it} + \varepsilon_{it}.$$
Importantly, the definition of states and traits in LST-R theory is based on a dynamic concept of a person, which assumes that individuals undergo trait changes over time due to their experiences. More specifically, time-specific person variables, $U_t$, are used to indicate that a person at time point t may differ from the person at the previous time point t − 1 by the situation realised at time t − 1, the observations made at time t − 1, and the experiences between t − 1 and t (Steyer et al., 2015). For instance, Ahmed’s general level of Extraversion might change from time 1 to time 2 due to the situation occurring at time 1 (e.g. making new friends at the party), due to filling out a questionnaire at time 1 (e.g. because reflecting on the items encourages him to behave more boldly), or due to experiences between both time points (e.g. becoming the victim of a mugging on the way home from the party).
The latent state variable is defined as the conditional expectation of the observed variable given the person variable, $U_t$, and the situation variable, $S_t$, at time $t$:

$$\tau_{it} := E(Y_{it} \mid U_t, S_t).$$
Here, the := symbol indicates a definition and $E(\cdot \mid \cdot)$ indicates a conditional expectation. In other words, the latent state variable represents the (error-free) state of a person being in a specific situation at a particular time point. Measurement error is the random deviation of the observed variable from this expected value. The latent state variable can further be decomposed into a latent trait variable, $\xi_{it}$, and a latent state residual variable, $\zeta_{it}$, according to

$$\tau_{it} = \xi_{it} + \zeta_{it}.$$
The latent trait variable is defined as the conditional expectation of the observed variable given the person variable, $U_t$, at time $t$:

$$\xi_{it} := E(Y_{it} \mid U_t).$$
A value on the latent trait variable can be interpreted as the expected value for an individual at a particular time across all possible situations that could occur at this time point. Although the latent trait of a person is thus situation-independent, it is defined for a particular time point and can therefore be regarded as a time-specific disposition that is open to change (Eid et al., 2017). Conceptually, the latent trait reflects the influences of the person’s immutable characteristics and experiences up to that point (i.e. Ahmed’s general level of Extraversion) on their personality state in all possible situations that this person may encounter. This corresponds to the parameters of the cybernetic system of CB5T, which likewise may change as a result of new experiences (DeYoung, 2015). However, LST-R models take a decontextualised view of latent traits and assume that these are not situation-specific.
The latent state residual variable, in turn, is defined as the difference between the latent state and trait variables,

$$\zeta_{it} := \tau_{it} - \xi_{it}.$$
It represents a systematic situation-specific deviation from the trait variable and may include both situation effects and person × situation interactions. A positive value on the latent state residual variable would indicate that the state of a person in a specific situation was higher than expected based on the trait level of that person. Conceptually, the latent state residual reflects the influences of characteristics of the situation (i.e. the effect of being at a party) as well as person-situation interactions (i.e. the effect of being at a party conditional on Ahmed’s general level of Extraversion) on a personality state in the given situation.
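The definitions of the latent trait and the latent state residual can be made concrete with a small numerical sketch. It assumes, purely for illustration, that the situations Ahmed could encounter at a given time point form a discrete set with known probabilities and known latent state values. The situation labels, the numbers, and the helper names `latent_trait` and `state_residuals` are hypothetical; in real data, none of these quantities is observed directly.

```python
# Hypothetical situations Ahmed could encounter at time t:
# label -> (probability of the situation, latent state value in that situation)
situations = {
    "party": (0.2, 4.5),
    "work": (0.5, 3.0),
    "home alone": (0.3, 2.0),
}


def latent_trait(situations):
    """Expected latent state across all situations that could occur at time t."""
    return sum(p * tau for p, tau in situations.values())


def state_residuals(situations):
    """Latent state minus latent trait, for each situation."""
    xi = latent_trait(situations)
    return {label: tau - xi for label, (p, tau) in situations.items()}


xi = latent_trait(situations)
zeta = state_residuals(situations)
print(round(xi, 2))  # 3.0
print({label: round(z, 2) for label, z in zeta.items()})
# {'party': 1.5, 'work': 0.0, 'home alone': -1.0}

# Weighted by the situation probabilities, the state residuals average to zero.
mean_residual = sum(situations[label][0] * z for label, z in zeta.items())
print(abs(mean_residual) < 1e-12)  # True
```

Because the latent trait is the probability-weighted mean of the latent states, the state residuals necessarily average to zero across situations, which is why they capture purely situation-specific deviations.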
Taken together, an observed variable can be decomposed as follows:

$$Y_{it} = \xi_{it} + \zeta_{it} + \varepsilon_{it}.$$
This means that Ahmed’s score on a state Extraversion measure is composed of his true Extraversion trait level, a situation-specific deviation from his Extraversion trait level, and measurement error. In principle, each observed variable has its own indicator- and time-specific latent state, trait, state residual, and measurement error variable (as evident from the subscripts i and t). In practice, however, estimating such a model is not possible because the observed variables do not provide sufficient information to infer the free parameters of all latent variables, resulting in mathematical non-identification. Nevertheless, by making assumptions about the equivalence of the latent variables or about the stability and change in the latent variables across time, identified and estimable models can be derived. In the following, we present two types of models suitable for analysing state measures: the basic multistate-singletrait model and its extension including autoregressive effects, which is particularly appropriate for intensive longitudinal data.
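Because the latent trait, state residual, and measurement error variables are uncorrelated by construction in LST-R theory, the variance of an observed variable is the sum of their variances. The following sketch checks this with simulated data for a single indicator at one occasion. It assumes normally distributed components with hypothetical variances of 0.5, 0.3, and 0.2 and uses NumPy's random generator; it is an illustration of the decomposition, not an estimation procedure.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n_persons = 100_000

# Hypothetical population variances of the three components
var_trait, var_residual, var_error = 0.5, 0.3, 0.2

xi = rng.normal(0.0, np.sqrt(var_trait), n_persons)       # latent trait
zeta = rng.normal(0.0, np.sqrt(var_residual), n_persons)  # latent state residual
eps = rng.normal(0.0, np.sqrt(var_error), n_persons)      # measurement error

tau = xi + zeta  # latent state
y = tau + eps    # observed score

var_y = y.var()
shares = {
    "trait": xi.var() / var_y,
    "state residual": zeta.var() / var_y,
    "measurement error": eps.var() / var_y,
}
for component, share in shares.items():
    print(f"{component}: {share:.3f}")
print(f"latent state (reliability): {tau.var() / var_y:.3f}")
```

In this idealised setting, the three shares correspond to the consistency, specificity, and unreliability of the indicator; with real data, they have to be estimated from an identified model.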
Models of LST-R theory
Multistate-singletrait model
The multistate-singletrait model depicted in Figure 1 assumes that all observed variables assessed at a particular time point share the same latent state variable, $\tau_t$. Therefore, only one common latent state variable per time point is specified. These variables represent, for instance, the level of state Extraversion at each time point. However, the observed variables might measure the latent state variable on different metrics, for example, because they use different rating scales or differ in item difficulty (items might vary in the level of Extraversion that is required to agree with an item) or discrimination (some items might be more representative of the Extraversion domain [e.g. an item asking whether a person currently talks to others] than other items [e.g. an item asking whether a person is currently physically active]). Therefore, intercepts, $\nu_{it}$, and factor loadings, $\lambda_{it}$, can be introduced that are allowed to vary between the indicators and time points:

$$Y_{it} = \nu_{it} + \lambda_{it}\,\tau_t + \varepsilon_{it}.$$
Figure 1. Alternative Representations of the Multistate-Singletrait Model. Note. The models in Panels A and B are equivalent in terms of the implied variance-covariance matrix and mean structure as well as model fit. For the sake of clarity, the mean structure of the model is omitted. $\xi_t$ = latent trait variables; $\tau_t$ = latent state variables; $\zeta_t$ = latent state residual variables; $Y_{it}$ = manifest variables; $\varepsilon_{it}$ = measurement error variables; $\lambda_{it}$ = regression coefficients of the latent state variables on the indicators;
It may also be assumed that the intercepts and factor loadings only vary between the indicators but not between time points, such that the latent state variables are measured invariantly across time. To identify the model, the intercept of the first indicator is typically set to 0 and the factor loading of the first indicator is set to 1 (note that the intercepts are omitted in Figure 1 for clarity).
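For a single occasion, the measurement model implies a covariance matrix of the indicators of the form $\Sigma = \lambda\lambda^\top \mathrm{Var}(\tau_t) + \Theta$, where $\Theta$ is a diagonal matrix of error variances (assuming uncorrelated measurement errors). The sketch below computes this matrix with NumPy for three hypothetical indicators, fixing the first loading to 1 as the reference indicator. The function name `implied_covariance` and all parameter values are illustrative assumptions.

```python
import numpy as np


def implied_covariance(loadings, state_variance, error_variances):
    """Model-implied covariance matrix of the indicators of one latent state.

    Assumes Y_i = nu_i + lambda_i * tau + eps_i with mutually uncorrelated errors.
    """
    lam = np.asarray(loadings, dtype=float)
    common = state_variance * np.outer(lam, lam)  # covariance induced by the shared state
    unique = np.diag(np.asarray(error_variances, dtype=float))  # indicator-specific error
    return common + unique


# First loading fixed to 1 for identification; all other values are hypothetical.
sigma = implied_covariance(loadings=[1.0, 0.8, 0.6],
                           state_variance=0.9,
                           error_variances=[0.3, 0.4, 0.5])
print(np.round(sigma, 3))
```

The intercepts do not enter the covariance matrix; they only affect the mean structure.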
As can be seen in Figure 1(a), the time-specific latent state variables are further decomposed into time-specific latent trait variables ($\xi_t$; e.g. the trait Extraversion level at each time point) and time-specific state residual variables ($\zeta_t$; e.g. the situation-specific deviations of the state Extraversion level from the trait Extraversion level). Importantly, although the latent trait variables are time-specific, they are assumed to represent functions of the latent trait variable at the first time point, as indicated by the paths from $\xi_1$ to $\xi_2$ and $\xi_3$. The scale on which the latent trait variables are measured may thus change over time, but this change can be perfectly described by a linear function (i.e. adding the constant $\alpha_t$ and multiplying by a time-specific constant).
The multistate-singletrait model assumes that the stability of the latent state variables across time is entirely explained by the common latent trait factor and implies that the state residual variables, capturing the variability across time, are uncorrelated. This may be reasonable if the lag between time points is long or if the construct under investigation is highly stable, such that the situation-specific deviations from the general trait level are independent from each other. However, for more fluctuating constructs or in experience sampling studies in which individuals are assessed repeatedly within short periods of time, situational influences may carry over to subsequent time points, which makes the multistate-singletrait model inappropriate in this case. For example, when Ahmed feels more extraverted than usual because he is at a party at time 1 (i.e. his Extraversion state deviates from his trait Extraversion level), this might affect how extraverted he feels at time 2, especially if these time points are closely spaced (e.g. because having made new friends at the party, Ahmed becomes more lively and communicative). To account for such carry-over effects between adjacent time points, LST-R models can be extended to include autoregressive effects.
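Under the multistate-singletrait model, occasions are linked only through the common trait, so the model-implied covariance between latent states at different occasions contains trait variance only. The sketch below assumes, for illustration, that each time-specific trait is a linear function of the first, $\xi_t = \alpha_t + b_t \xi_1$, where `b_t` is a hypothetical name for the multiplicative constant, and that state residuals are uncorrelated. The function name `implied_state_covariance` and all values are made up.

```python
import numpy as np


def implied_state_covariance(trait_scales, trait1_variance, residual_variances):
    """Model-implied covariance matrix of the latent states tau_1, ..., tau_T.

    Assumes xi_t = alpha_t + b_t * xi_1 (with b_1 = 1) and state residuals that are
    uncorrelated with each other and with the traits.
    """
    b = np.asarray(trait_scales, dtype=float)
    trait_part = trait1_variance * np.outer(b, b)  # shared via the common trait
    residual_part = np.diag(np.asarray(residual_variances, dtype=float))  # occasion-specific
    return trait_part + residual_part


# Hypothetical values: invariant trait scale, trait variance 0.6, residual variance 0.4
cov = implied_state_covariance([1.0, 1.0, 1.0], trait1_variance=0.6,
                               residual_variances=[0.4, 0.4, 0.4])
sd = np.sqrt(np.diag(cov))
corr = cov / np.outer(sd, sd)
print(np.round(corr, 2))  # every between-occasion correlation equals 0.6
```

With time-invariant parameters, the implied correlation between latent states is the same at every lag. Carry-over effects between closely spaced occasions would instead produce higher correlations at short lags than at long ones, a pattern this model cannot reproduce.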
Multistate-singletrait model with autoregressive effects
There are different approaches to formulate models with autoregressive effects in line with LST-R theory (Stadtbaeumer et al., 2022).
Here, we present the trait-state occasion (TSO) model (Cole et al., 2005) as reformulated by Eid et al. (2017) according to the LST-R theory, which represents a multistate-singletrait model with autoregressive effects. The basic idea of the reformulated TSO model is that individuals exhibit relatively stable levels of a construct over time, but that the situations or life events encountered by individuals (situation effects) and the reactions to those situations (person × situation interactions) can change the trait level. In contrast to the basic multistate-singletrait model, this implies that the latent state residual variables have an effect on all subsequent latent trait variables, as shown by the autoregressive effects in Figure 2.

Figure 2. Multistate-Singletrait Model with Autoregressive Effects. Note. For the sake of clarity, the mean structure of the model is omitted. $\xi_t$ = latent trait variables; $\tau_t$ = latent state variables; $\zeta_t$ = latent state residual variables; $Y_{it}$ = manifest variables; $\varepsilon_{it}$ = measurement error variables; $\lambda_{it}$ = regression coefficients of the latent state variables on the indicators;
To illustrate this idea, let us return to our example. At the beginning of our study, Ahmed has a certain level of trait Extraversion, captured by the value of the time-specific trait variable, ξ1. At the first time point, Ahmed is at a party, which causes his state Extraversion to be higher than expected based on his time-specific trait level, implying a positive value for the first state residual variable, ζ1. Ahmed’s experiences at the party strengthen his social self-esteem, which increases his trait level of Extraversion at the second time point, ξ2. As a consequence, Ahmed is now more lively and communicative in various situations. At the second time point, Ahmed experiences social rejection, leading to a state Extraversion level lower than expected based on the time-specific trait variable, ξ2, and a negative value for the state residual variable, ζ2. This again can affect the subsequent time-specific trait level, ξ3, by lowering the situation-independent Extraversion level at the third time point, but arguably less so than if Ahmed had not experienced a positive situation at the first time point. In other words, situation-specific experiences influence all future trait levels. Importantly, however, the effect of state residual variables on the trait variables is expected to fade out over time (implied by the multiplication of the autoregressive effects over multiple time points, see Figure 2; Eid et al., 2017). Thus, sustained trait change is more likely to occur through repeated situational experiences (Wrzus & Roberts, 2017).
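The carry-over mechanism in this example can be sketched with a simplified recursion. The sketch assumes a single, time-invariant autoregressive coefficient, called `beta` here, so that each past state residual affects later traits with a weight that shrinks geometrically: $\xi_t = \xi_1 + \sum_{s<t} \beta^{\,t-s} \zeta_s$. This is a hypothetical simplification for illustration, not necessarily the exact parameterisation of Eid et al. (2017), and the values for Ahmed are invented.

```python
def trait_trajectory(trait1, state_residuals, beta):
    """Time-specific traits xi_1, ..., xi_T under a simplified autoregressive structure.

    Assumes xi_t = xi_1 + sum_{s < t} beta**(t - s) * zeta_s with a single,
    time-invariant coefficient `beta` (a hypothetical simplification).
    """
    traits = [trait1]
    carry_over = 0.0  # accumulated, geometrically decaying influence of past residuals
    for zeta in state_residuals[:-1]:  # the last residual only affects later occasions
        carry_over = beta * (carry_over + zeta)
        traits.append(trait1 + carry_over)
    return traits


# Party at time 1 (positive residual), social rejection at time 2 (negative residual)
print(trait_trajectory(3.0, [1.0, -1.0, 0.0], beta=0.5))  # [3.0, 3.5, 2.75]
# Same rejection, but without the party at time 1
print(trait_trajectory(3.0, [0.0, -1.0, 0.0], beta=0.5))  # [3.0, 3.0, 2.5]
```

Comparing the two trajectories mirrors the argument in the example: the rejection leaves Ahmed's trait at the third time point below its starting value in both cases, but the net decrease is smaller after the party (2.75 versus 2.5, relative to a starting value of 3.0), because part of the earlier positive residual is still carried forward.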
Coefficients of LST-R theory
Variance decomposition coefficients in LST-R theory.
Note. The example shows the calculation of the coefficients for the second indicator of the third latent state variable in the model with autoregressive effects shown in Figure 2. Depending on the specific definition of a model (e.g. assuming a common state variable at each time point and imposing invariance constraints on the factor loadings across time), the calculation of the coefficients may differ from the general formulae presented. In models without autoregressive effects, the calculation of the coefficients simplifies because all
The reliability coefficient gives the proportion of variance in an observed variable that is explained by the latent state variable, or, in other words, the proportion of variance that is not due to random measurement error (Steyer et al., 2015). The reliability of an observed variable is the sum of the proportion of variance explained by the latent state residual variable (captured by the specificity coefficient) and the proportion of variance explained by the latent trait variable (captured by the consistency coefficient). The specificity coefficient thus represents the degree to which the score on an observed variable is determined by situation-specific influences and the interaction between the person and the situation, with higher values indicating that the observed variable reflects a more state-like construct (Steyer et al., 2015). The consistency coefficient, in turn, represents the degree to which the score on an observed variable is determined by stable individual differences, with higher values indicating that an observed variable reflects a more trait-like construct (Steyer et al., 2015).
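For the basic multistate-singletrait model, the three coefficients follow directly from the variance decomposition of an indicator, $\mathrm{Var}(Y_{it}) = \lambda_{it}^2 \mathrm{Var}(\xi_t) + \lambda_{it}^2 \mathrm{Var}(\zeta_t) + \mathrm{Var}(\varepsilon_{it})$, because trait, state residual, and error are uncorrelated in LST-R theory. The function `lst_coefficients` below is a minimal sketch under that assumption; the two example items and their parameter values are hypothetical.

```python
def lst_coefficients(loading, trait_variance, residual_variance, error_variance):
    """Reliability, consistency, and specificity of one indicator.

    Assumes Y = nu + loading * (xi + zeta) + eps with mutually uncorrelated
    xi (trait), zeta (state residual), and eps (measurement error).
    """
    trait_part = loading**2 * trait_variance
    residual_part = loading**2 * residual_variance
    total_variance = trait_part + residual_part + error_variance
    consistency = trait_part / total_variance
    specificity = residual_part / total_variance
    return {
        "reliability": consistency + specificity,
        "consistency": consistency,
        "specificity": specificity,
    }


# Two hypothetical items with equal reliability but different variance profiles
state_like = lst_coefficients(1.0, trait_variance=0.3, residual_variance=0.5, error_variance=0.2)
trait_like = lst_coefficients(1.0, trait_variance=0.6, residual_variance=0.2, error_variance=0.2)
print({name: round(value, 2) for name, value in state_like.items()})
# {'reliability': 0.8, 'consistency': 0.3, 'specificity': 0.5}
print({name: round(value, 2) for name, value in trait_like.items()})
# {'reliability': 0.8, 'consistency': 0.6, 'specificity': 0.2}
```

The two hypothetical items are equally reliable, but the first is dominated by situation-specific variance and the second by trait variance; given the emphasis on specificity, the first would be the more appealing indicator of a personality state.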
In models with autoregressive effects, the latent trait variable at a specific time point is influenced by both the latent trait variable at the first time point and previous state residuals, which allows further decomposing the consistency coefficient into these two components. The proportion of variance in an observed variable that is explained by the first latent trait variable is given by the predictability by trait 1 coefficient, indicating to which degree the score on an observed variable reflects trait differences at the first time point. The proportion of variance in an observed variable that is explained by previous state residuals is given by the unpredictability by trait 1 coefficient, indicating to which degree the score on an observed variable reflects carry-over effects (Eid et al., 2017). Unpredictability by trait 1 thus indicates to what degree individual differences in states can be attributed to trait changes as a result of diverging experiences over the course of the study.
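The split of consistency into predictability and unpredictability by trait 1 can be sketched in the same way. The example assumes the autoregressive specification ξt = βt0 + βt1(ξt−1 + ζt−1) and, as a simplifying assumption, that ξ1, all state residuals, and the measurement error are mutually uncorrelated. The variance of ξt then splits into a part inherited from ξ1 and a part inherited from earlier state residuals. All names and values are hypothetical.

```python
from math import prod


def consistency_decomposition(t, loading, betas, var_trait1, var_zeta, var_error):
    """Split the consistency of an indicator at occasion t into PredT1 and UPredT1.

    Hypothetical sketch assuming xi_t = b_t0 + b_t1 * (xi_{t-1} + zeta_{t-1}) and
    mutually uncorrelated xi_1, zeta_1..zeta_t, and epsilon.
    betas[s]: autoregressive coefficient b_s1 for s = 2..t.
    var_zeta[s]: Var(zeta_s) for s = 1..t.
    """
    # Variance that xi_t inherits from the first trait variable xi_1.
    from_trait1 = prod(betas[s] for s in range(2, t + 1)) ** 2 * var_trait1
    # Variance that xi_t inherits from earlier state residuals (carry-over).
    from_residuals = sum(
        prod(betas[r] for r in range(s + 1, t + 1)) ** 2 * var_zeta[s]
        for s in range(1, t)
    )
    total = loading ** 2 * (from_trait1 + from_residuals + var_zeta[t]) + var_error
    pred_t1 = loading ** 2 * from_trait1 / total
    upred_t1 = loading ** 2 * from_residuals / total
    return {
        "predictability_t1": pred_t1,
        "unpredictability_t1": upred_t1,
        "consistency": pred_t1 + upred_t1,
        "specificity": loading ** 2 * var_zeta[t] / total,
    }


# Hypothetical values for an indicator of the third latent state variable.
coefs = consistency_decomposition(
    t=3, loading=1.0, betas={2: 0.5, 3: 0.5}, var_trait1=1.0,
    var_zeta={1: 0.4, 2: 0.4, 3: 0.4}, var_error=0.3,
)
print(coefs)
```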
Criteria for evaluating personality state measures
A recent review by Horstmann and Ziegler (2020) has documented that research on personality states largely relies on ad hoc measures. Existing personality state measures have rarely undergone a thorough psychometric evaluation with respect to their validity and reliability (for an exception, see Ringwald et al., 2022). If the validity and reliability of ad hoc state measures are assessed at all, this is often done using methods designed for trait measures (Horstmann & Ziegler, 2020; Wright & Zimmermann, 2019). Here, we describe a framework for developing and evaluating personality state scales on their own terms, and define criteria by which to do so.
LST-R theory provides a coherent framework for defining and measuring personality states. In line with contemporary personality theories, we have defined personality states as coherent affect, cognitions, and desires at a particular time, which can be formalised as latent state variables in LST-R theory. Within the LST-R theory framework, we can define criteria for good personality state measures, that is, criteria to assess the coherence and temporal specificity of proposed state measures. These criteria can be used to evaluate existing scales and to develop new instruments.
Criteria to evaluate the validity and reliability of state measures within an LST-R framework.
Data considerations
We formally model personality states as latent state variables in LST-R models. To fit such models, the construct of interest should be measured with at least two (preferably three or more) items on three or more time points (Clark & Watson, 1995). The interval between measurement occasions should reflect how quickly the construct is expected to fluctuate. For personality states, which are expected to fluctuate from situation to situation rather than, for example, from day to day, experience sampling data with multiple measures per day may be most appropriate. Finally, data should come from the intended population of both persons and situations. For example, if personality states are assessed at the same hour each day, this may underestimate intrapersonal fluctuations because the sampled situations may not reflect the universe of situations the participants experience throughout the day.
Psychometric evaluation and item selection
Internal structure
The internal structure of a scale should correspond to the theoretical structure of the personality state construct it is meant to measure. To assess the degree of correspondence, it is necessary to translate the theoretical assumptions about the structure of the construct into a formal (LST-R) model. Within the LST-R framework, an important aspect to consider is whether the indicators are assumed to reflect multiple states and a single trait (translating into a multistate-singletrait model) or both multiple states and multiple traits (translating into a multistate-multitrait model; Steyer et al., 2015). One must also decide whether autoregressive effects should be included in the model, which may be necessary if a short time lag between measurement occasions or the fluctuating nature of a construct leads to carry-over effects. Note that the omission or inclusion of autoregressive effects also implies different assumptions about the rank-order stability of the trait levels of a construct over the course of the study. Whereas models without autoregressive effects assume a perfectly stable rank order of individuals regarding their trait levels, models with autoregressive effects allow for rank-order changes in trait levels (Geiser, 2021).
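The difference in rank-order stability can be illustrated with the correlation between the first two time-specific trait variables. Assuming ξ2 = β20 + β21(ξ1 + ζ1) with ξ1 and ζ1 uncorrelated (a simplifying assumption for this sketch), the correlation between ξ1 and ξ2 falls below 1 as soon as the first state residual has nonzero variance. The function name and values are hypothetical.

```python
from math import sqrt


def trait_stability(beta, var_trait1, var_zeta1):
    """Correlation between xi_1 and xi_2 under xi_2 = b_20 + beta * (xi_1 + zeta_1).

    Hypothetical sketch assuming xi_1 and zeta_1 are uncorrelated. A model
    without autoregressive effects has a single trait, i.e. a correlation of 1.
    """
    covariance = beta * var_trait1
    var_trait2 = beta ** 2 * (var_trait1 + var_zeta1)
    return covariance / sqrt(var_trait1 * var_trait2)


print(trait_stability(0.6, 1.0, 0.0))   # no state residual variance: fully stable rank order
print(trait_stability(0.6, 1.0, 0.25))  # roughly 0.894
print(trait_stability(0.3, 1.0, 0.25))  # same value: beta cancels out (for beta > 0)
```

Under these assumptions, the rank-order stability of the traits depends only on the share of state residual variance at the first occasion, not on the size of a positive autoregressive coefficient.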
The degree of correspondence between the theoretical structure of the construct and the internal structure of the measure can be evaluated using typical goodness-of-fit indices for factor analytic models (Bader & Moshagen, 2022; Hu & Bentler, 1999; West et al., 2012). However, commonly recommended cut-off points may need to be adjusted for models of intensive longitudinal data with many degrees of freedom (see e.g. Norget & Mayer, 2022; Yuan et al., 2015). If multiple models are theoretically plausible, the most appropriate model can be selected using model comparison (e.g. Columbus, Norget, Mayer, & Balliet, in prep). Ultimately, however, an LST-R model should fit the data. Persistent model misfit may indicate that the internal structure of the measure does not fit the theoretical structure of the construct. This may occur when the observed variables do not cohere in the predicted manner. In this case, it may be necessary to reconsider the theoretical conception of the state construct, or to revise the items.
Personality state measures are often multidimensional (e.g. scales corresponding to the Big Five and HEXACO models of personality). When multidimensional measures are evaluated, it should also be assessed whether theoretical assumptions about the dimensionality of the measure (e.g. number of underlying factors and relation between the factors) are in line with its empirical structure. If correlated factors are assumed, specific attention should be paid to the magnitude of factor correlations to ensure that the correlations between factors are neither too low (indicating a lack of association between constructs) nor too high (indicating a lack of discriminability between constructs).
Longitudinal measurement invariance
When state measures are to be used to compare scores on the latent state and/or trait variables over time, it is necessary to establish the longitudinal measurement invariance of the manifest variables. Measurement invariance refers to the comparability of measurement across different time points or groups, which is important for interpreting changes in the levels of latent variables unambiguously (Widaman et al., 2010). Measurement invariance is typically tested by comparing sequences of nested models with increasingly strict equality constraints (Meredith, 1993). Configural invariance imposes an equal factor structure in terms of the number of factors and with regard to which items load on the factors. Metric invariance additionally constrains unstandardised factor loadings of corresponding items to equality across different time points, and scalar invariance additionally constrains the intercepts of corresponding items to equality across time, which is necessary to compare latent mean levels at different time points. Finally, strict invariance imposes additional equality constraints on the measurement error variables of the items, which implies a constant reliability of corresponding items and thus enables the comparison of manifest scale scores across time (Meredith, 1993).
Specificity
Any psychometric measure should pick up on systematic variation in the proposed construct (and as little as possible on noise). Trait measures are therefore evaluated for reliability (i.e. the proportion of variance due to systematic individual differences) using indices of internal consistency such as Cronbach’s α or McDonald’s ω. Such indices assess whether a measure reliably captures stable interindividual differences; systematic intrapersonal fluctuations can create the impression that a measure is unreliable (Horstmann & Ziegler, 2020). Importantly, though, reliability and within-person variability are not opposed: Fluctuations can be measured reliably, and good personality state scales are designed to do so. 7
Reliability alone thus is not sufficient to ensure that the measure is sensitive to systematic moment-to-moment fluctuations. However, with LST-R models of longitudinal data, it is possible to decompose reliability 8 into specificity (the proportion of variance due to the latent state residuals, i.e. systematic situation-specific influences) and consistency (the proportion of variance due to the latent trait variables). High specificity is particularly desirable for personality state measures. A substantial amount of specificity ensures that the selected indicators are responsive to situation-specific influences and interactions between persons and situations, thereby capturing intraindividual variability in states across time points.
Low specificity is problematic when state measures are used to study the predictors, correlates, outcomes, or dynamics of personality states. If the reliability of indicators were solely attributable to consistency, the resulting measure could only provide insights into stable interindividual differences in trait levels. Using such a measure in a study of personality dynamics may give the impression that personality states have no meaningful antecedents or consequences. This could be the case even when the measure exhibits sizeable intrapersonal variability (e.g. as indicated by the intraclass correlation coefficient), provided that this variability is purely due to measurement error. Researchers interested in studying personality dynamics should therefore develop and select measures with sufficient specificity to answer their substantive research questions.
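A minimal numerical sketch shows how a sizeable within-person share of variance can coexist with zero specificity. It assumes a model without autoregressive effects, Y = α + λ(ξ + ζ) + ε with uncorrelated components, so that within-person variance equals λ²Var(ζ) + Var(ε) and the intraclass correlation equals λ²Var(ξ)/Var(Y). The function name and values are hypothetical.

```python
def within_share_and_specificity(var_trait, var_state_residual, var_error, loading=1.0):
    """Intraclass correlation, within-person variance share, and specificity.

    Hypothetical sketch assuming Y = alpha + loading * (xi + zeta) + epsilon with
    a stable trait (no autoregression) and mutually uncorrelated components.
    """
    between = loading ** 2 * var_trait
    within = loading ** 2 * var_state_residual + var_error  # includes pure noise
    total = between + within
    return {
        "icc": between / total,
        "within_share": within / total,
        "specificity": loading ** 2 * var_state_residual / total,
    }


# Half of the variance lies within persons, yet none of it is systematic.
print(within_share_and_specificity(0.5, 0.0, 0.5))
# {'icc': 0.5, 'within_share': 0.5, 'specificity': 0.0}
```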
One might object that, if the specificity of items is maximised to such an extent that none of the reliable variance in indicators is due to consistency, the indicators could no longer be interpreted as measuring a common underlying trait. In practice, however, it is rather unlikely to obtain indicators that are unrelated to trait effects because psychometric evaluations of existing state measures suggest that indicators are typically affected by stable interindividual differences to a substantial extent (e.g. Rauthmann et al., 2019; Ringwald et al., 2022; Zimmermann et al., 2019), though there might be exceptions (e.g. a brief measure of relative power exhibited high specificity, but only negligible consistency; Columbus, Norget, et al., in prep). In contrast to overall reliability, it is thus not the case that higher specificity is always better, nor is there some optimal level of specificity. Instead, the specificity of state measures should be evaluated with respect to theoretical concerns (how much is the construct expected to vary) and in comparison to related measures.
Further validation
Convergent and discriminant relations
In addition to evidence based on the internal structure, the relation of the target construct to other conceptually related constructs is a further important source of validity evidence. To validate a newly developed state measure, one should therefore concurrently collect data using other measures of the same or very similar constructs as well as of distinct constructs. These constructs can then be included as additional latent variables into the LST-R model to assess their associations with the latent trait and latent state residual variables. Whereas convergent evidence may be obtained from substantial correlations between the latent variables of the newly developed state measure and of alternative measures for the same or very similar constructs, discriminant evidence may be obtained based on weak correlations to the latent variables of measures intended to assess different constructs (AERA, APA, & NCME, 2014).
Importantly, validation should include both the state and trait level of a construct (Wright & Zimmermann, 2019). Correlations at the aggregate level do not necessarily mean that individual states of the constructs co-vary. Therefore, it is necessary to (also) probe convergent and discriminant validity at the level of particular states. That is, there should be theory-consistent associations between latent state residual variables and conceptually related time-varying variables (e.g. a positive deviation of the state Extraversion level from the respective trait level should be positively related to the level of state sociality), but also theory-consistent associations between latent trait variables and related trait measures (e.g. trait levels of Extraversion should be positively related to trait levels of sociality).
Criterion relations
A related source of validity evidence is the relation of the latent variables of the state measure to theoretically meaningful or practically relevant criteria (AERA, APA, & NCME, 2014). For personality states, such criteria may include self-reported behaviour, but also observational data (e.g. smartphone data; Stachl et al., 2021) or physiological measures. Again, the relation to relevant criteria should be demonstrated both with regard to the state and trait level of the measure. For instance, if, for a state measure of Extraversion, the latent state residual variables were shown to correlate significantly with objective measures of talkativeness (e.g. recorded via wearable cameras or electronically activated recorders; Brown et al., 2017) and the latent trait variables were shown to be positively related to the number of (Facebook) friends (Lönnqvist & Itkonen, 2014), this may be interpreted as evidence for the validity of the measure.
Developing personality state measures
The proposed criteria for evaluating personality state measures can also be applied in developing new instruments designed to assess personality states. Most personality state measures are generated ad hoc from existing trait measures (Horstmann & Ziegler, 2020). One reason for the lack of instruments developed explicitly for the assessment of personality states may be the relative scarcity of guidelines. The development of native state measures benefits from general good practice in scale development (DeVellis & Thorpe, 2021). However, there are several aspects of state measures which call for a distinct approach. Below, we highlight how the formal definition of personality states in the LST-R framework and criteria for model evaluation can inform scale development.
Initial theoretical considerations
The development of any psychological measurement instrument ideally begins with a clear account of the target construct and the proposed use of the measure. When developing (personality) state measures, the following questions should be addressed: (1) What is the construct being measured? (2) What is the intended purpose of the measure? and (3) What is the targeted population of persons and situations? (Horstmann & Ziegler, 2020).
Answering the first question requires providing a theoretical account of the structure and content of the proposed construct. The LST-R framework provides an explicit statement of the expected structure of the construct. Different models express different assumptions about the structure of a construct. In particular, one must specify whether the construct corresponds to one or multiple traits (e.g. in the case of personality dimensions with multiple facets). It is also important to consider at which time intervals the construct is assumed to fluctuate. Finally, one must define the content of the construct. When developing state scales corresponding to existing models of (trait) personality, one can draw on conceptual (e.g. Ashton et al., 2014) and empirical (e.g. Zettler et al., 2020) analyses to identify the content of relevant states. Scale developers may also decide to constrain the domain they are studying on theoretical grounds. For example, analyses of trait scales have shown that they capture a mixture of affect, behaviour, cognition, and desire (Wilt & Revelle, 2015; Zillig et al., 2002). However, in developing the HEXACO Personality States Inventory, Columbus, Böhm, et al. (in prep) explicitly excluded behaviours, arguing that concrete behaviours are better understood as consequences of personality states. Such conceptual concerns can inform item generation and scale validation.
Concerning the second question, two main purposes of state measures can be distinguished: (i) using time-specific state scores as an indicator of the state level of a person at a particular time point and (ii) using aggregated state scores as an indicator of the trait level. Depending on the purpose, desirable characteristics of a state measure may differ. For instance, whereas the use of time-specific state scores requires indicators that are responsive to situation effects and person × situation interactions (i.e. high specificity), the use of aggregated state scores benefits from indicators that are strongly influenced by trait effects (i.e. high consistency). Our focus will be on the first purpose, as the assessment of time-specific states is arguably the more common and natural use of state measures (for more information on developing state measures for the purpose of assessing traits, see Horstmann & Ziegler, 2020). Finally, the population for which the state measure is intended refers to both the target population of participants and the target population of situations. This should be specified because it informs the generation of items and the appropriateness of these items for the intended sample of participants (e.g. adolescents, older adults) and situations (e.g. interpersonal situations and situations at work). When items are irrelevant to the sampled situations, this may distort the measurement of personality states (Kritzler et al., 2023).
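Why aggregation favours consistency can be shown with a short calculation. Assuming a model without autoregressive effects and state residuals and measurement errors that are uncorrelated across occasions (simplifying assumptions for this sketch), the mean of T state scores has variance λ²Var(ξ) + [λ²Var(ζ) + Var(ε)]/T. Both situation-specific and error variance shrink with the number of occasions, so the aggregate increasingly reflects the trait. The function name and values are hypothetical.

```python
def aggregate_consistency(n_occasions, loading, var_trait, var_state_residual, var_error):
    """Share of variance in a person's mean state score that is due to the latent trait.

    Hypothetical sketch: no autoregressive effects; zeta and epsilon are
    uncorrelated across occasions, so their contribution shrinks with 1 / T.
    """
    trait_part = loading ** 2 * var_trait
    occasion_part = (loading ** 2 * var_state_residual + var_error) / n_occasions
    return trait_part / (trait_part + occasion_part)


for t in (1, 3, 10, 30):
    print(t, round(aggregate_consistency(t, 1.0, 0.3, 0.3, 0.4), 3))
```

An indicator with modest consistency at a single occasion can thus yield a highly trait-consistent aggregate, whereas the specificity that makes the same indicator useful for time-specific scores is averaged away.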
Item generation
A crucial next step in the development of state measures is the generation of the initial item pool. Guided by the initial theoretical considerations and empirical findings on the nomological net of a target construct, a comprehensive set of all content that might be relevant to the construct should be devised (Clark & Watson, 2019). Here, it is recommended to be overinclusive because whereas subsequent psychometric analyses can detect unrelated content that may be excluded, missing content cannot be identified (Clark & Watson, 1995; Loevinger, 1957). The set of item content then serves as a basis for formulating concrete items. A particular concern when developing state items is the trade-off between breadth and specificity (Horstmann & Ziegler, 2020). On the one hand, items should be broad enough to be applicable to most situations within the sampling frame (Kritzler et al., 2023) and to cover the content domain of the construct. On the other hand, items should refer to specific (momentary) affect, behaviour, cognition, or desire, lest they fail to capture occasion-specific manifestations of the personality dimension and lose specificity. Additionally, general recommendations for optimal item wording (e.g. using simple syntax), choosing an adequate rating scale, and developing appropriate instructions should be taken into account (Horstmann & Ziegler, 2020).
Item selection
The most important contribution of the LST-R framework to state scale development is during the item selection process. Once an initial item pool has been generated, one must reduce the number of items to the final scale. For this purpose, it is important to collect data which fit the criteria outlined in the initial considerations (e.g. the right population of participants and situations and an appropriate lag between measurement points). In selecting items for a state scale, it is important to use longitudinal data to which an LST-R model can be fitted (for a tutorial, see Norget et al., 2023).
While it is common to select individual items based on item-level criteria (e.g. factor loadings), a more principled approach selects item sets which meet the various desiderata of the scale. Using algorithmic methods, it is possible to reduce an initial item pool into a shorter scale by selecting the combination of items which performs best against a set of pre-specified criteria (for a review, see Olaru et al., 2019). This makes it feasible to select item sets using the criteria of model fit, measurement invariance, specificity, and/or convergent and discriminant validity outlined above. For example, Columbus, Böhm, et al. (in prep) used ant colony optimisation to develop a personality state measure corresponding to the HEXACO model of personality. This algorithm fits subsets of the overall item set to a multidimensional LST-R model to select four items per HEXACO dimension (one per facet) while maximising model fit and specificity and minimising correlations between traits across dimensions. Thus, algorithmic item selection methods can be used to create measures which fit the criteria for good personality state measures defined within the LST-R framework.
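The logic of selecting item sets rather than individual items can be sketched with an exhaustive search. This is a deliberately simpler technique than ant colony optimisation: it is only feasible for small item pools, and it treats item-level coefficients from an already fitted model as fixed instead of refitting an LST-R model for every candidate subset. The item names, coefficients, and function name are hypothetical.

```python
from itertools import combinations

# Hypothetical item-level coefficients from a previously fitted LST-R model.
items = {
    "item_a": {"reliability": 0.55, "specificity": 0.10},
    "item_b": {"reliability": 0.40, "specificity": 0.22},
    "item_c": {"reliability": 0.35, "specificity": 0.05},
    "item_d": {"reliability": 0.50, "specificity": 0.18},
    "item_e": {"reliability": 0.30, "specificity": 0.20},
}


def select_items(items, k, min_reliability):
    """Exhaustively pick the k-item set with the highest mean specificity,
    keeping only sets in which every item reaches `min_reliability`."""
    best, best_score = None, float("-inf")
    for subset in combinations(sorted(items), k):
        if any(items[name]["reliability"] < min_reliability for name in subset):
            continue  # set violates the reliability constraint
        score = sum(items[name]["specificity"] for name in subset) / k
        if score > best_score:
            best, best_score = subset, score
    return best, best_score


print(select_items(items, k=3, min_reliability=0.35))
```

In a full implementation, the scoring step would instead fit the LST-R model to each candidate set and combine several criteria (e.g. model fit, specificity, and factor correlations), which is where metaheuristics such as ant colony optimisation become necessary.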
The proposed criteria for the evaluation of personality state measures can be adapted or complemented depending on the construct of interest or the intended purpose of the measure. For instance, for some state measures it may be desirable to reduce the impact of autoregressive effects on resulting scores by selecting indicators that are characterised by high specificity and high predictability by trait 1. Conversely, it may also be desirable to develop state measures that are particularly sensitive to the effect of current and previous latent state residual variables by selecting indicators with high specificity and high unpredictability by trait 1. Customising the set of desirable criteria is facilitated by a clear account of the target construct and the proposed use of the measure.
Either before or after the first selection is made from the initial item pool, items should be rated for construct validity by a set of domain experts. Important questions to ask include whether raters can correctly associate items with the relevant construct (especially when the scale captures multiple dimensions) and whether they consider the item appropriate for the theoretical construct. It can also be valuable to use techniques such as cognitive interviewing to probe whether participants interpret items as intended (Peterson et al., 2017; Ryan et al., 2012). Insights from these qualitative procedures can be used to improve or discard items which may be misinterpreted or fail to capture the construct of interest.
The initially collected evidence for the reliability and validity of personality state measures should not be regarded as the end of the scale construction process. The development of psychological measurement instruments ideally proceeds in an iterative fashion, whereby the psychometric properties of a measure should be replicated and potentially optimised across multiple independent samples. Often, LST-R analyses and expert ratings will reveal gaps in the initial item set. For example, it is possible that some facets of the construct of interest are not captured by any of the original items. In this case, it is important to revise or replace items. Moreover, it is important to avoid overfitting. Therefore, once a final candidate item set has been identified, it is important to fit the chosen LST-R model to a separate, confirmatory sample.
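When collecting a fully independent confirmatory sample is not feasible, one common approximation is to split an existing data set into an exploratory and a confirmatory half. With intensive longitudinal data, the split should be made by participant rather than by observation, so that all repeated measurements of a person end up in the same half. The sketch below is a hypothetical helper illustrating this; the seed and the 50/50 share are arbitrary assumptions.

```python
import random


def split_participants(participant_ids, seed=2023, share_exploratory=0.5):
    """Randomly assign whole participants to an exploratory or a confirmatory subsample.

    Splitting by participant (not by observation) keeps all repeated
    measurements of a person in the same subsample.
    """
    ids = sorted(set(participant_ids))
    rng = random.Random(seed)  # fixed seed for a reproducible split
    rng.shuffle(ids)
    cut = round(len(ids) * share_exploratory)
    return set(ids[:cut]), set(ids[cut:])


# Hypothetical long-format data: (participant, occasion) pairs.
observations = [(pid, t) for pid in range(1, 955) for t in (1, 2, 3)]
exploratory, confirmatory = split_participants(pid for pid, _ in observations)
print(len(exploratory), len(confirmatory))
```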
Empirical example
To illustrate the proposed approach for developing and evaluating personality state measures within an LST-R framework, we use daily diary assessments of Big Five personality states from Ringwald et al. (2022). The data stem from three independent samples of undergraduate students (Sample 1: N = 330; 62% female; mean age = 18.6, SD = 0.96 years), community members (Sample 2: N = 342; 52% female; mean age = 27.6, SD = 4.9 years), and participants of the University of Pittsburgh Adult Health and Behavior project (Sample 3: N = 458; 54% female; mean age = 59.5, SD = 7.2 years). For the present analyses, we pooled the data of all three samples. Upon exclusion of 176 individuals who did not provide any data on the relevant variables at the time points of interest, the final sample comprised N = 954 participants. In all samples, Big Five personality states were measured daily using four items for each of the five traits (for more details on the procedure, see Ringwald et al., 2022). Participants were instructed to indicate which of two (opposing) adjectives described them best in the past 24 hours on a 7-point Likert scale. We investigate the personality state items for extraversion, namely Item 1 (lethargic/energetic), Item 2 (bold/timid; reverse scored), Item 3 (talkative/silent; reverse scored), and Item 4 (unassertive/assertive), and only consider data of the first three time points for the sake of simplicity. The focus of this empirical example is on the psychometric evaluation of and item selection for a state measure of extraversion. No hypotheses were preregistered and all analyses were performed in an exploratory fashion. The data and commented R code for the following analyses are publicly available on the Open Science Framework (OSF; https://osf.io/szfu7/).
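Before fitting any model, the reverse-scored items have to be recoded so that higher values consistently indicate more extraversion. On a 1 to 7 scale, a response x becomes 8 − x. The sketch below uses hypothetical column names (the actual variable names in the OSF data may differ) and also checks the pooled sample size reported above.

```python
# Hypothetical column names for the four extraversion state items (1-7 scale).
REVERSED = {"item2_bold_timid", "item3_talkative_silent"}
SCALE_MIN, SCALE_MAX = 1, 7


def score_item(name, raw):
    """Return the item score so that higher values always indicate more extraversion."""
    if raw is None:
        return None  # keep missing values missing; FIML handles them later
    if not SCALE_MIN <= raw <= SCALE_MAX:
        raise ValueError(f"{name}: response {raw} outside {SCALE_MIN}-{SCALE_MAX}")
    return SCALE_MIN + SCALE_MAX - raw if name in REVERSED else raw


# Pooled sample size: three samples minus participants without relevant data.
SAMPLE_SIZES = {"Sample 1": 330, "Sample 2": 342, "Sample 3": 458}
N_EXCLUDED = 176
n_pooled = sum(SAMPLE_SIZES.values()) - N_EXCLUDED
print(n_pooled)  # 954
```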
To evaluate the psychometric properties of this extraversion state measure, we first translated the theoretical assumptions about the structure of the measure into an LST-R model. Specifically, we assumed that the four indicators measure a distinct extraversion state at each of the three time points as well as a common extraversion trait across all time points. This corresponds to a multistate-singletrait model as presented in Figure 1 (but with four manifest variables for each latent state variable).
As the measurement occasions were closely spaced in time with a lag of one day between adjacent time points, we also investigated the possibility of carry-over effects by additionally estimating a multistate-singletrait model with autoregressive effects as shown in Figure 2 (again with four manifest variables for each latent state variable). To identify the latter model, we imposed equality constraints on the autoregressive effects, the variances of the occasion residual factors, and the variances of the latent state residuals (Prenoveau, 2016). The models were estimated using the R package lavaan (Rosseel, 2012) as well as the R package lsttheory (Mayer, 2015), which facilitates the specification of LST-R models (see also Norget et al., 2023). Missing values and non-normally distributed data were addressed using full information maximum likelihood estimation with robust (Huber-White) standard errors and a test statistic that is asymptotically equal to the test statistic by Yuan and Bentler (2000).
The multistate-singletrait model without autoregressive effects fitted the data poorly, χ²(51) = 358.79, p < .001, RMSEA = .103, RMSEA 90% CI = [.093, .113], SRMR = .061, CFI = .813. We therefore allowed for correlations between the measurement error variables of the same items at different time points to account for stable indicator-specific variance. This significantly improved model fit, ∆χ²(12) = 197.82, p < .001, and the modified model showed an acceptable fit according to descriptive fit indices, χ²(39) = 153.75, p < .001, RMSEA = .072, RMSEA 90% CI = [.060, .084], SRMR = .044, CFI = .930.
Next, the longitudinal measurement invariance of the measure was assessed to test whether the manifest variables measure the latent state variables equivalently across time. Given the acceptable fit for the multistate-singletrait model assuming one latent state variable for each time point, configural invariance was assumed. Constraining the unstandardised factor loadings of the manifest variables to equality across time points did not lead to a significant deterioration of model fit, ∆χ²(6) = 10.90, p = .091, ∆CFI configural−metric = .003, thus supporting metric measurement invariance. When constraining the intercepts of the manifest variables to equality across time points, the model fit deteriorated significantly, ∆χ²(6) = 36.13, p < .001, and the difference in the comparative fit index, ∆CFI metric−scalar = .011, exceeded benchmarks typically considered indicative of a lack of measurement invariance (Chen, 2007; Cheung & Rensvold, 2002). Thus, scalar invariance cannot be assumed for the state measure and differences in the mean levels of the latent variables should be interpreted with caution. Given the lack of scalar invariance, strict invariance was not assessed.
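The two decision rules used at each invariance step can be sketched as follows. The helper converts a reported ∆χ² into a p-value using the closed-form chi-square tail for even degrees of freedom (all differences reported here have even df) and flags a decrease in CFI larger than .01. It does not compute a robust (scaled) chi-square difference from raw fit statistics; it only evaluates the reported values. Because these values are rounded, the recomputed p-value for the metric step (about .092) differs slightly from the reported .091. Function names and the cut-off defaults are assumptions for illustration.

```python
from math import exp, factorial


def chi2_sf_even_df(stat, df):
    """Upper-tail p-value of a chi-square statistic; closed form for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this sketch only handles positive, even degrees of freedom")
    half = stat / 2.0
    # For df = 2k: P(X > x) = exp(-x/2) * sum_{i<k} (x/2)^i / i!
    return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))


def invariance_step(delta_chi2, delta_df, delta_cfi, alpha=0.05, cfi_cutoff=0.01):
    """Flag non-invariance via the chi-square difference test and the drop in CFI."""
    p = chi2_sf_even_df(delta_chi2, delta_df)
    return {"p": p, "chi2_flag": p < alpha, "cfi_flag": delta_cfi > cfi_cutoff}


print(invariance_step(10.90, 6, 0.003))  # metric step: neither rule flags
print(invariance_step(36.13, 6, 0.011))  # scalar step: both rules flag
```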
When adding autoregressive effects to the multistate-singletrait model, the model with autoregressive effects (BIC = 33,025, AIC = 32,816) showed minor improvements in the BIC but not in the AIC compared to the model without autoregressive effects (BIC = 33,035, AIC = 32,816). Moreover, the autoregressive effects did not differ significantly from zero (
Descriptive statistics, standardised factor loadings, and variance decomposition coefficients for the indicators of the extraversion state measure.
Note. Parameters were obtained from a multistate-singletrait model with metric measurement invariance and correlations between the measurement error variables of the same indicators at different time points. Standardised loadings may differ between time points because metric invariance imposes equality constraints on the unstandardised factor loadings. The reliability of an item corresponds to the square of its standardised factor loading. The specificity and consistency coefficients do not always sum up exactly to the reliability coefficient due to rounding.
As can be seen in Table 3, the manifest variables exhibited only a small to moderate degree of reliability. Whereas Item 2, on average, exhibited the highest reliability, Item 1 exhibited the lowest reliability at all time points. For all indicators, a larger proportion of reliable variance was due to stable trait differences than due to situation effects and interactions between the person and the situation, as is evident from the higher values for the consistency coefficients compared to the specificity coefficients. For the total extraversion scale, the reliability was .69, .68, and .60, and the specificity was .28, .23, and .21 at the first, second, and third time point, respectively (see the R code on the OSF for how to calculate the reliability and specificity of the total scale). This means that, on average, only 24% of the variance in the extraversion state scores could be attributed to reliable situation-specific influences. These reliability estimates are lower than those obtained via multilevel modelling, where the within-person reliability for the extraversion scale was .52 and the between-person reliability was .83 (Ringwald et al., 2022).
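Scale-level coefficients follow the same logic as the item-level coefficients, applied to the sum score. The sketch below is one standard way to compute them and is not necessarily how the OSF code does it. It assumes a single latent state variable per occasion, an unweighted sum score, and no error covariances among items within the same occasion (in the model above, error covariances only link the same item across time points). The parameter values are hypothetical. The last line reproduces the averaging behind the 24% figure.

```python
def scale_coefficients(loadings, error_variances, var_trait, var_state_residual):
    """Reliability, consistency, and specificity of an unweighted sum score at one occasion.

    Hypothetical sketch: Var(sum) = (sum of loadings)^2 * (Var(xi) + Var(zeta))
    + sum of error variances, i.e. no within-occasion error covariances.
    """
    lam = sum(loadings)
    true_trait = lam ** 2 * var_trait
    true_residual = lam ** 2 * var_state_residual
    total = true_trait + true_residual + sum(error_variances)
    return {
        "reliability": (true_trait + true_residual) / total,
        "consistency": true_trait / total,
        "specificity": true_residual / total,
    }


# Hypothetical unstandardised parameters for four items at one occasion.
print(scale_coefficients([1.0, 1.2, 1.1, 0.9], [0.9, 0.6, 0.7, 0.8], 0.5, 0.15))

# Average specificity of the total scale across the three reported occasions.
print(round(sum([0.28, 0.23, 0.21]) / 3, 2))  # 0.24
```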
Taken together, the results support the internal structure of the extraversion state measure and suggest that autoregressive effects between adjacent time points are not required. Whereas metric measurement invariance can be assumed for the measure, scalar invariance was not supported. Therefore, additional work should identify which indicators exhibit different values for their intercept across time and ideally replace those indicators with alternative items conforming to scalar invariance. Furthermore, some indicators showed only a small degree of reliability and very small specificity. In particular, Item 1 exhibited the lowest specificity at all three time points and should be replaced by an indicator that can measure situation-specific effects in extraversion more reliably. In addition to identifying more suitable indicators for the extraversion state measure, further steps in evaluating the measure are to investigate the convergent and discriminant relations of state and trait scores to other conceptually relevant constructs and, potentially, criterion variables.
Discussion
Personality states refer to the affect, behaviour, cognition, and desires of a person in a particular situation. We formally define personality states within LST-R theory and translate this definition into testable latent variable models. To examine the predictors, correlates, outcomes, or dynamics of personality states, researchers must rely on valid and reliable measures. Such measures must capture the intrapersonal fluctuations arising from systematic situation-specific influences. On the basis of the LST-R framework, we propose a series of criteria for evaluating personality state measures. In particular, we highlight specificity – the proportion of variance in observed scores due to systematic situation-specific influences – as a key desideratum of personality state measures. These definitions, and the resulting criteria for state measures, have important implications for personality theory and for the assessment of personality states.
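For reference, the variance decomposition behind consistency and specificity can be summarised in common LST-R notation (the symbols are assumed here and may differ from those used elsewhere in the paper): $Y_{it}$ is indicator $i$ at occasion $t$, $\tau_{it}$ the latent state, $\xi_{it}$ the latent trait, $\zeta_{it}$ the latent state residual, and $\varepsilon_{it}$ the measurement error. Because $\xi_{it}$ and $\zeta_{it}$, as well as $\tau_{it}$ and $\varepsilon_{it}$, are uncorrelated by construction, the observed variance splits additively:

```latex
\begin{aligned}
Y_{it} &= \tau_{it} + \varepsilon_{it}, \qquad \tau_{it} = \xi_{it} + \zeta_{it},\\
\operatorname{Var}(Y_{it}) &= \operatorname{Var}(\xi_{it}) + \operatorname{Var}(\zeta_{it}) + \operatorname{Var}(\varepsilon_{it}),\\
\operatorname{Con}(Y_{it}) &= \frac{\operatorname{Var}(\xi_{it})}{\operatorname{Var}(Y_{it})}, \qquad
\operatorname{Spe}(Y_{it}) = \frac{\operatorname{Var}(\zeta_{it})}{\operatorname{Var}(Y_{it})},\\
\operatorname{Rel}(Y_{it}) &= \operatorname{Con}(Y_{it}) + \operatorname{Spe}(Y_{it}).
\end{aligned}
```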
Implications for personality theory
LST-R theory provides a formal definition of key concepts such as states, traits, and state residuals, which map onto commonly used definitions of traits and states in contemporary personality theory (e.g. Baumert et al., 2017). Different models of LST-R theory specify the relationships between traits, states, and state residuals and can be used to inform theoretical accounts such as WTT (Fleeson, 2001; Fleeson & Jayawickreme, 2015; Jayawickreme et al., 2019). Of particular interest, recent models of LST-R theory allow for trait change as a consequence of situational experiences (Eid et al., 2017; Stadtbaeumer et al., 2022), which captures a proposed mechanism of trait change (Wrzus & Roberts, 2017). Defining and modelling personality states within this framework takes a step towards the integration of personality structure, personality processes, and personality development (Baumert et al., 2017).
Comparison to whole trait theory
WTT is a leading contemporary account of the relationship between personality traits and personality states (Fleeson, 2001; Fleeson & Jayawickreme, 2015; Jayawickreme et al., 2019). According to WTT, personality traits are made up of two distinct but linked parts. Descriptively, traits are density distributions of states, such that individual differences can be described in terms of the parameters of the distribution of states. Explanatorily, traits consist of social-cognitive mechanisms which generate states from internal and external cues. People differ in these information processing mechanisms, such that the same inputs can produce different states. Thus, individual differences in social-cognitive mechanisms, but also differences in the inputs experienced, can explain differences in the distribution of states.
The latent states defined by LST-R theory map onto the state construct in WTT. They capture the characteristics of a person at a particular time (Baumert et al., 2017). In particular, by defining states as latent variables, the coherent affects, behaviours, cognitions, or desires which make up the personality state are disentangled from idiosyncratic influences and measurement error. In LST-R theory, these latent states reflect the influences of situation-independent characteristics of the person at this point in time – the latent trait – as well as situation-specific influences – the latent state residuals. The latent state residuals also capture the effects of person × situation interactions.
The latent trait in LST-R theory is an expectation across all possible situations in which the person might be. This maps broadly onto the location of the distribution of states, which has been used to operationalise traits in WTT (Fleeson, 2001). However, whereas WTT defines traits as the distribution of states, in LST-R theory, latent states reflect the influences of characteristics of the person (traits) and situation-specific influences (latent state residuals). Moreover, LST-R theory defines the latent trait separately for each time point, whereas WTT implicitly assumes a static trait, although recent extensions of WTT do allow for the possibility of trait change (Jayawickreme et al., 2019). Thus, the latent trait in LST-R theory is better understood as capturing individual differences in the social-cognitive mechanisms that produce individual differences in the distribution of states.
Conversely, latent state residuals capture situation-specific influences and person × situation interactions. This maps onto the role of cues in WTT. On the explanatory side, WTT posits that intrapersonal variation in states arises from variation in cues. Both LST-R theory and WTT further assume that individuals may differ in their response to cues, which can give rise to individual differences in the distribution of states. On the whole, LST-R theory is thus consistent with the core tenets of WTT as well as personality theories which adopt the descriptive side of WTT, such as CB5T (DeYoung, 2015). At the same time, formalising the definition of personality in LST-R theory allows for cumulative theory development, for example, by incorporating trait change in descriptive models of personality states.
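A small simulation can make this mapping concrete. The sketch assumes normally distributed situational cues with mean zero and a hypothetical person-specific sensitivity parameter; the resulting state residual combines a main effect of the cue with a person × cue interaction. It only generates latent states and does not estimate an LST-R model: two persons with the same latent trait end up with state distributions at the same location (the trait) but with different spreads, because they respond differently to the same kinds of cues.

```python
import random
import statistics


def simulate_latent_states(trait, sensitivity, n_situations, rng):
    """Hypothetical data-generating sketch: latent state = trait + state residual."""
    states = []
    for _ in range(n_situations):
        cue = rng.gauss(0.0, 1.0)  # situational cue, mean 0 across situations
        # sensitivity * cue = cue (main effect) + (sensitivity - 1) * cue (person x cue interaction)
        state_residual = sensitivity * cue
        states.append(trait + state_residual)
    return states


rng = random.Random(2024)
person_a = simulate_latent_states(trait=1.0, sensitivity=0.5, n_situations=20_000, rng=rng)
person_b = simulate_latent_states(trait=1.0, sensitivity=1.5, n_situations=20_000, rng=rng)

for label, states in [("A", person_a), ("B", person_b)]:
    # Location of the state distribution approximates the latent trait; spread differs
    print(label, round(statistics.mean(states), 2), round(statistics.stdev(states), 2))
```

Because the state residuals have mean zero for each person, the mean of a person's states converges on the latent trait, in line with treating the trait as the location of the WTT density distribution.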
Linking personality states and personality development
One challenge for the integration of personality processes and personality development is the question of how state changes may accumulate into trait change (Baumert et al., 2017; Nesselroade & Molenaar, 2010). According to the TESSERA model, personality states triggered by situational factors can be transferred into long-term personality development through reflective and associative processes (Wrzus & Roberts, 2017). Thus, repeatedly experiencing states which differ from one’s previous trait level can elicit personality change. Importantly, though, empirical evidence for such accumulation of states into trait changes is lacking (Baumert et al., 2017; Hofmann et al., 2009; Wrzus et al., 2021).
Recently developed LST-R models with autoregressive effects formalise the accumulation of state residuals (i.e. the part of a state not explained by the trait) into traits: state residuals (i.e. situation-specific influences) lead to trait change, which in turn affects future personality traits and states (Eid et al., 2017; Stadtbaeumer et al., 2022). LST-R models with autoregressive effects are thus consistent with the TESSERA framework in that situational factors lead to trait change through the states a person experiences. One advantage of these models is that they can account for experience-dependent trait change within a single model. Moreover, they can include hypothesised time-varying predictors (e.g. situational factors which may trigger atypical personality states). Future research within the TESSERA framework may thus apply LST-R models to test whether personality states accumulate into trait change.
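The core idea of accumulation can be illustrated with a deliberately simplified, hypothetical updating rule in which each state residual shifts the next trait by a fixed proportion γ, that is, ξ(t+1) = ξ(t) + γ·ζ(t). This rule is an assumption made for illustration and is not the parametrisation used by Eid et al. (2017) or Stadtbaeumer et al. (2022). It shows why, in the spirit of TESSERA, only repeated atypical states (systematically positive or negative residuals) produce net trait change, whereas residuals that cancel out over time do not.

```python
def accumulate_traits(initial_trait, state_residuals, gamma):
    """Hypothetical rule: trait(t + 1) = trait(t) + gamma * state_residual(t).

    Returns the trait at every occasion plus the trait after the final occasion.
    """
    traits = [initial_trait]
    for zeta in state_residuals:
        traits.append(traits[-1] + gamma * zeta)
    return traits


atypical = [1.0] * 10        # repeatedly experiencing states above one's trait level
balanced = [1.0, -1.0] * 5   # deviations that cancel out over occasions

traits_atypical = accumulate_traits(0.0, atypical, gamma=0.1)
traits_balanced = accumulate_traits(0.0, balanced, gamma=0.1)

# Latent state at each occasion = current trait + current state residual
states_atypical = [xi + zeta for xi, zeta in zip(traits_atypical, atypical)]

print(round(traits_atypical[-1], 2), round(traits_balanced[-1], 2))  # 1.0 0.0
```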
Implications for the development, evaluation, and interpretation of personality measures
In line with recent work (e.g. Horstmann & Ziegler, 2020), we hold that state constructs should be measured using ‘native’ state measures. In current practice, state constructs are often assessed using adapted trait measures (Horstmann & Ziegler, 2020). This may lead to an underestimation of intrapersonal variability and the effect of situation-specific influences if the items are designed to capture consistent individual differences rather than situation-specific states. Therefore, state measures should be designed to reliably capture intrapersonal fluctuations, that is, to have high specificity. The use of LST-R models in the development and evaluation of state measures allows researchers to quantify how well a scale can be expected to capture both stability and change in personality states.
A second problematic practice is the use of intraclass correlation coefficients (ICCs) to quantify the extent of intrapersonal variation (Horstmann & Ziegler, 2020). ICCs confound variation due to coherent situation-specific influences with measurement error. If researchers define personality states as coherent, ICCs thus inflate the degree of intrapersonal variation in personality states. Latent states as defined in LST-R theory correspond to this commonly adopted definition of personality states as the level of coherent affect, behaviour, cognitions, and desires at a particular time. Thus, specificity – the proportion of variance in the observed scores that is due to situation-specific influences – is a better indicator of intrapersonal variability than ICCs.
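The difference can be shown with a simple variance decomposition. The sketch assumes a stable trait, so that the between-person variance of single-occasion scores equals the trait variance, and uses hypothetical variance components. The within-person share implied by the ICC (1 − ICC) lumps the situation-specific variance and the error variance together, whereas specificity counts only the situation-specific part.

```python
def variance_shares(var_trait, var_residual, var_error):
    """ICC, ICC-implied within-person share, and specificity (hypothetical components)."""
    total = var_trait + var_residual + var_error
    icc = var_trait / total             # between-person share of observed variance
    within_share = 1.0 - icc            # situation-specific variance AND error
    specificity = var_residual / total  # situation-specific variance only
    return icc, within_share, specificity


icc, within_share, specificity = variance_shares(var_trait=0.5, var_residual=0.2, var_error=0.3)
print(round(icc, 2), round(within_share, 2), round(specificity, 2))  # 0.5 0.5 0.2
```

Here half of the observed variance would be read as intrapersonal variation, although only a fifth reflects reliable situation-specific influences.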
Intensive longitudinal data often exhibit autoregressive effects. However, these are typically not modelled explicitly, especially in mixed-effects models. In LST-R models, autoregressive effects have a particular substantive interpretation. Specifically, autoregressive effects reflect experience-dependent trait change (Stadtbaeumer et al., 2022). Thus, including autoregressive effects in LST-R models enables researchers to examine the degree to which experiencing particular personality states accumulates into trait change, which is, in turn, reflected in future personality states. This provides an opportunity to integrate research on personality processes with personality development (Baumert et al., 2017; Wrzus & Roberts, 2017).
Applications to other state constructs
In this manuscript, we focus on introducing a framework for developing, evaluating, and interpreting personality state measures. To do so, we draw on personality theory to conceptualise personality states. However, there are other psychological state constructs such as affect (Kuppens, 2015), perceived situation characteristics (Rauthmann et al., 2015), and modes (Lazarus & Rafaeli, 2023). LST models have been applied to affect (e.g. Olatunji et al., 2020; Yasuda et al., 2004). However, given the vast and diverse literature on the nature of affective states (e.g. Barrett, 2017; Moors et al., 2013; Scherer, 2009), whether the framework we develop here applies to these constructs depends on their conceptual definition.
Another area in which revised latent state-trait theory has been applied is perceived situation characteristics, which are conceptualised as dimensional mental representations of situations (Rauthmann et al., 2015). Thus, they are state variables which are shaped both by situational cues and by characteristics of the perceiver (Rauthmann et al., 2015, 2019). Columbus, Norget, et al. (in prep) analyse perceived situation characteristics (specifically, perceptions of multiple dimensions of interdependence) using LST-R models. They find that these perceptions reflect trait and state influences to different degrees. Perceived situation characteristics are in many ways similar to personality states in that they are coherent, dimensional, and situation-specific. Thus, our proposed criteria for the evaluation of personality state measures may similarly apply to measures of perceived situation characteristics.
Alternative approaches and possible extensions
A limitation of the LST-R models presented in this paper is that the latent state residuals are a composite of situation effects and person × situation interactions. As such, these models do not allow for insights into whether the situation-specific deviation from the trait level is purely due to situation-specific influences or also dependent on the trait level. For instance, is Ahmed’s high Extraversion state solely caused by the fact that he is at a party or does his generally high Extraversion trait level lead him to enjoy the party even more than a person with a lower Extraversion trait level, or both? However, there are alternative LST models that make it possible to disentangle trait-dependent influences of the situation (i.e. person × situation interactions) from main effects of the situation.
LST models for the combination of random and fixed situations (LST-RF) rely on a specific longitudinal measurement design to disentangle situation effects and person × situation interactions (Geiser et al., 2015). Whereas in LST-R models situations are assumed to be random (i.e. sampled randomly and interchangeably from the universe of possible situations) and unknown, LST-RF models additionally require fixed situations that are either experimentally induced (e.g. manipulated in a laboratory) or naturally occurring (e.g. recorded in ecological momentary assessment studies) and thus known to the researcher. This design allows researchers to compare the effects of situations that are of particular substantive interest on states and traits (Geiser et al., 2015). In addition, compared to LST-R models, which implicitly conceptualise traits as situation-unspecific, LST-RF models enable a more contextualised view of traits and allow researchers to investigate whether and to what degree traits are situation-specific (Castro-Alvarez et al., 2022; Geiser et al., 2015).
Despite their appealing properties, LST-RF models have rarely been applied in empirical research. One challenge is that the models rely on known, fixed situations. One promising avenue for future research may be to combine the assessment of personality states with the assessment of situations using modern situation taxonomies (e.g. Situational Interdependence Scale, DIAMONDS; Gerpott et al., 2018; Rauthmann et al., 2015) or mobile sensing (Harari et al., 2020). Combining the assessment of personality states and situations would enable researchers to use LST-RF models to parse out the contribution of person × situation interactions to personality states. This may be particularly valuable in the context of interactionist affordance models, which posit that manifestations of personality traits are context-dependent (e.g. de Vries et al., 2016; Thielmann et al., 2020).
Besides LST-R theory, there is a range of alternative approaches to modelling intrapersonal dynamics which may be amenable to the assessment of personality states. For example, Xiao et al. (2023) recently suggested estimating both between- and within-person reliabilities using a two-level random dynamic measurement model. Moreover, latent Markov factor analysis can be used to probe intrapersonal changes in measurement models over time or situations (Vogelsmeier et al., 2019). Latent Markov factor analysis may be particularly valuable for identifying context-specific changes in the measurement model of personality states.
LST-R models address only the level of states and do not account for other parameters of their distribution. Therefore, the approach presented here does not provide insights into individual differences in variability across situations. However, mixed-effects location-scale models (Hedeker et al., 2008) allow for between- and within-person heterogeneity in variances, which makes it possible to identify person- and situation-level influences on variability in personality states (for recent applications, see Mader et al., 2023; Shrestha et al., 2024).
Conclusion
Personality states describe how a person feels, thinks, and behaves in a particular situation. We formally define personality states within LST-R theory and translate this definition into testable latent variable models. In this framework, latent states reflect the influences of a person’s characteristics and prior experiences as well as those of systematic situation-specific influences. Within this framework, we propose criteria for evaluating and interpreting measures of personality states. We argue that it is particularly important to design and evaluate personality state measures for their specificity, that is, for their ability to reliably assess intrapersonal variability. An application of this framework to an existing measure of Extraversion illustrates how our approach leads to different interpretations and conclusions compared to common practices in state scale evaluation. Adopting an LST-R framework for research on personality states has the potential to improve measurement practices and to clarify and advance personality theory.
Acknowledgements
We thank Whitney R. Ringwald for sharing the data for the empirical example.
Author contributions
Martina Bader and Simon Columbus share first authorship on this manuscript.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
