Sage Journals: Discover world-class research

Abstract

Dynamic characteristics like intraindividual means (iMs) and standard deviations (iSDs) reliably provide insight into the level and variability of emotions people experience within a given window of time. However, it is unclear the extent to which these measures characterize capture stable individual differences rather than transient states and, if so, how many observations are necessary to ensure reliable measurement. In the current study, we use data from two experience sampling studies that use burst designs to test the effect that the number of within-person observations has on the one-week and one-year stability of iMs and iSDs for various measures of emotion. We find that a modest one-year of stability of r = .50 could be reached for iMs with an average of 14 observations per person, while for iSDs an average of 42 observations were necessary, with many items failing to reach r = .50 at the maximum number of observations available. These findings suggest that in typical experience sampling designs iMs capture stable individual differences but iSDs may not, even with many within-person observations.

Keywords

ecological momentary assessment intraindividual variability measurement

Introduction

Experience sampling studies – which involve repeated assessments of affect and behavior – are thought to capture people’s lives with greater ecological validity and precision than typical, single time point surveys (Trull & Ebner-Priemer, 2013). Repeated assessments are often summarized with statistics like intraindividual means (iM) and standard deviations (iSD), providing insight into the typical level and variability, respectively, of a given emotion or behavior that a person experiences (Dejonckheere et al., 2019; Ram & Gerstorf, 2009); these summary statistics are referred to as dynamic characteristics. Researchers often assume that dynamic characteristics accurately reflect relatively stable, trait-like features of people (Ram & Gerstorf, 2009). However, it is unclear to what extent these estimates capture stable individual differences that persist across time periods of months or years. If dynamic characteristics are sufficiently stable, researchers need to know how many repeated assessments are necessary to collect, given the cost of these studies to both participants and scientists (Hopwood et al., 2022). In the current study, we investigate the one-week and one-year stability of affective and personality iMs and iSDs as a function of the number of assessments administered.

Why use dynamics?

While cross-sectional surveys have long been used to capture individual differences in affect and behavior, repeated assessments are thought to better capture these constructs because they reduce memory biases and allow for the study of within-person variation (Ram & Gerstorf, 2009; Colombo et al., 2020). Repeated assessments can be summarized into dynamic characteristics using both simple equations (e.g., a variance) or more complex calculations (e.g., mean squares of successive difference) to estimate different theoretical person-level properties (e.g., variability and momentary instability, respectively) (Dejonckheere et al., 2019). Researchers often use these dynamic characteristics to summarize the states people experience across a short period of time. Moreover, these dynamic characteristics are often assumed to capture trait-like properties (Ram & Gerstorf, 2009; Scott et al., 2020). It is this latter construal of dynamic characteristics that are the focus of the current paper.

The current paper will focus on two commonly studied dynamic characteristics: intraindividual means (iMs) and standard deviations (iSDs). Intraindividual means are typically construed as corresponding with the trait level of a construct someone experiences. For example, the iM of positive affect is thought to capture a person’s trait positive affect. iSDs, on the other hand, are typically construed as trait variability, or how variably a person experiences positive affect, for example. The variability of emotions is of direct and unique interest in understanding the emotional experience of people with and without mental illness (Ong & Ram, 2017; Bosley et al., 2019).

Experience sampling studies are typically limited in the number of items assessed at each timepoint and it is commonplace to aggregate across items of positive and negative affect to decrease measurement error (Epstein, 1983). However, it is also important to assess single items, which have been shown to be stable, heritable, and predict important life outcomes (Allen et al., 2022; Mõttus et al., 2017, 2019). Additionally, single-item and aggregated measures of momentary affect have shown comparable correspondence with retrospective reports, reliability, and concurrent and predictive validity (Dejonckheere et al., 2022; Ganzach & Yaor, 2019; Song et al., 2023).

Dynamic characteristics like iMs and iSDs of positive affect are relevant to many important outcomes, including mental health and socioemotional well-being (Houben et al., 2015; Heininga et al., 2019; Houben & Kuppens, 2020). Dynamic characteristics have been posited to describe trait-like tendencies and abilities (Ram & Gerstorf, 2009), and there is evidence that dynamic characteristics correspond with trait measures and are heritable (Colombo et al., 2020; Schneider et al., 2020; Zheng et al., 2016).

Are measures of dynamics reliable?

While the utility of complex affect dynamics (including the autocorrelations and the mean square of successive differences) has been questioned (Dejonckheere et al., 2019), research has demonstrated the power of simple dynamics like iM and iSD to predict outcomes like psychological well-being and mental health diagnoses (Wendt et al., 2020; Dejonckheere et al., 2019; Bosley et al., 2019; Schuler et al., 2021). However, the validity of measures is dependent on their reliability. Measures with low reliability have inconsistent or spurious relationships with outcomes, limiting the validity of inferences that can be made from either significant or null results (John & Soto, 2007; McCrae et al., 2011).

A substantial amount of research has assessed the reliability of momentary measures of affect and dynamic indices (Dejonckheere et al., 2022; Scott et al., 2020; Wendt et al., 2020; Wilhelm & Schoebi, 2007). Broadly speaking, this work provides evidence for the good reliability of momentary measures of affect and intraindividual means. For example, Wendt and colleagues (2020) found acceptable split-half reliabilities of intraindividual means (r’s > .9) and standard deviations (r’s between .67 and .95), whereas more complex dynamics were largely redundant with iMs and iSDs. However, other research has found that iSDs have lower reliability and are particularly sensitive to the number of observations (Estabrook et al., 2012; Wang & Grimm, 2012).

Are measures of dynamics stable?

Similar to but distinct from construct reliability is construct stability. Stability can be assessed in many ways (e.g., Aldridge et al., 2017; Parsons et al., 2019; Revelle & Condon, 2019) but is typically assessed through test-retest correlations. Test-retest correlations contain information both about how reliable a measure is (i.e., how consistently a measure captures the underlying construct) and how stable it is (i.e., the degree to which that underlying construct is an enduring, trait-like feature of people; Fraley & Roberts, 2005). When researchers attempt to measure known traits, low test-retest correlations are typically construed as a problem of measurement. On the other hand, in the context of stability, low test-retest correlations are typically construed as countervailing evidence for the trait-like nature of the construct.

The time frame between retests is an important consideration for distinguishing reliability from stability. Reliability is broadly considered to be the consistency of a measure across shorter time frames. For example, Dejonckeere and colleagues (2022) provide a demonstration of this decay across a time scale of minutes in their recent investigation of the reliability of single items of momentary affective measures. Stability, on the other hand, is this consistency of a measure across longer periods of time. For example, the degree to which test-retest correlations decay over long periods of time provides insight into the extent to which the construct is stable (Fraley & Roberts, 2005).

What constitutes an adequate to assess stability, however, depends on the purposes for which these measures are intended. For researchers intending to characterize the emotional dynamics of a single period of time, such as one day or one week, extant research demonstrates the reliability of dynamic indices (Mejía et al., 2014). However, longer time intervals are necessary to determine whether these measures capture stable individual differences (Caspi & Roberts, 2001; Roberts & DelVecchio, 2000). Intervals of a year or longer are typical for assessing the stability of measures of personality (Fraley & Roberts, 2005; Specht et al., 2011). This is an unfortunate gap in the literature: most research on the psychometrics of momentary assessments and dynamic characteristics focus on the (short-term) reliability of these measures (Mejía et al., 2014) but not the extent to these measures capture (long-term) stable individual differences. For example, Scott and colleagues (2020) found adequate to good reliability at both the within- and between-person levels across multiple datasets utilizing a single burst of data collection. However, it is important to recognize that the individual study bursts ranged from 7 to 14 days. Therefore, it is unclear whether the constructs measured during this window capture something about each participant that will remain stable across a longer period. Nonetheless, this work provides evidence that these measures reliably capture meaningful within- and between-person differences within a given period.

Prior research on the stability of affect dynamics utilizes simulations to estimate their reliability and stability under various assumptions (Estabrook et al., 2012; Wang & Grimm, 2012). This research is valuable in that it provides insight into the likely effect of the number within-person observations on the stability of these dynamics (Wang & Grimm, 2012). However, the extent to which these assumptions hold in real-world data collection scenarios is currently unclear (Wrzus & Neubauer, 2023). A goal of the current research is to build on this work using data from studies that more closely resemble the data that most researchers will utilize.

How stable is stable enough?

The optimal stability of measures depends on the purposes of the measures. For measures that are intended to capture transient states, lower levels of stability are to be expected and unproblematic. However, measures that are intended to characterize enduring patterns of experience or behavior (i.e., traits), stability is crucial. Prospective prediction is impeded when measures contain more state variance than trait variance, whereas contemporaneous relationships will not be impacted (Clark et al., 2003).

What constitutes sufficient stability, then, will vary across research goals. Extant research on the stability of personality traits can help contextualize the degree to which affect dynamics are stable. For example, single-timepoint, global measures of neuroticism show a one-year stability between .49 and .67 for young adults (Fraley & Roberts, 2005). Hudson and colleagues (2017a) find that day to day affect using a daily reconstruction method show a one-year stability of ∼ .50 for positive affect and ∼.35 for negative affect. Further, they show that global measures of positive affect are comparably stable, but global negative affect shows increased stability relative to the daily reconstruction method (rs = .60–.73). If affect dynamics capture enduring patterns of emotional experience, we would expect to find comparable stability.

Both reliability and stability need to be established

Reliability is necessary, but not sufficient, for stability (Heise, 1969; Watson & Walker, 1996; Wheaton et al., 1977). Reliability without stability would indicate that dynamic indices capture short-term emotional states but not enduring person-level traits. Possibly – perhaps probably –dynamic characteristics capture both states and traits. Nonetheless, demonstrating that dynamic indices reliably characterize state emotional experience is not the same as demonstrating that these dynamic indices provide useful information about traits (Revelle & Condon, 2019). Understanding whether these constructs are stable measures of individual difference across longer periods of time is, therefore, a separate but important question that addresses the degree to which dynamic characteristics estimate traits (Parsons et al., 2019).

A typical repeated-measures study – such as an ecological momentary assessment or daily diary study – captures participants for many days over a single period of assessment, such as one week or one month. This is an impediment to establishing the stability of dynamic characteristics, as these summary statistics are estimated for each participant only once during the study. Alternatively, burst designs – wherein multiple bursts of repeated assessments are separated by a period of time – provide an opportunity to investigate the test-retest correlations of iMs and iSDs derived from repeated assessments (Gerstorf et al., 2014; Ram et al., 2014). For example, a burst design may assess participants several times a day for one week, then pause data collection for a month, then assess participants for a second week.

How many assessments are sufficient?

The use of repeated assessments assumes that multiple measures produce better estimates of means and variances than a single assessment (Epstein, 1980a; 1980b). More is presumably better, but a practical question is how many assessments are needed to capture something enduring about a person rather than merely summarizing their emotional experience during a short period of time? Wrzus and Neubauer (2023) find that on average, EMA studies scheduled six assessments per day for seven days with a 79% compliance rate, or 33 within person observations. However, there is currently limited evidence regarding the sufficient number of repeated assessments to produce stable dynamics.

Because collecting repeated assessments is burdensome to both participants and researchers, identifying the minimum number of assessments to produce stable dynamics is an important practical goal for researchers. Moreover, such information is essential for a priori power analyses (e.g., Lafit et al., 2021). Simulations suggest that iMs achieve good stability with a smaller number of assessments than iSDs, which rarely achieved a stability of .60 even after 150 assessments (Wang & Grimm, 2012). To our knowledge, there has been no investigation of the effect of burst spacing on dynamic characteristic stability.

In the current study, we utilize data from two studies using a burst design to assess the one-week and one-year stability of affect dynamics. We examine the impact that the number of within-person assessments has on reliability to provide insight into the question: how many assessments need to be included in an experience sampling study to attain stable affect dynamics?

Methods

Participants and procedures

Data used in this study came from two pre-existing datasets: the Personality and Interpersonal Relationships Study (PAIRS; Vazire et al., 2015) and Leuven 3-Wave-Longitudinal Study (Erbas et al., 2018; accessed via the EMOTE Database). Table 1 shows total sample sizes and demographic information for both studies. In the PAIRS study, participants completed surveys four times per day for two weeks for two bursts spaced one year apart (M = 373 days). In the Leuven study, participants completed surveys ten times per day for one week for two bursts spaced one year apart (M = 363 days). As in most EMA studies, some participants failed to complete all surveys, and there was some amount of attrition from one burst to the next. Therefore, different models in our analysis used different subsets of these data. For the number of participants and observations per analysis, see Table 2.

Table 1.

Sample demographics. Gender and race are presented as N (% of full sample).

	Leuven (N = 191)	PAIRS (N = 416)	Total (N = 607)
Gender
N-Miss	0	2	2
Female	107 (56.0%)	278 (67.1%)	385 (63.6%)
Male	84 (44.0%)	136 (32.9%)	220 (36.4%)
Race
African	2 (1.0%)	0 (0.0%)	2 (0.3%)
American Indian or Alaska native	0 (0.0%)	1 (0.2%)	1 (0.2%)
Asian	1 (0.5%)	0 (0.0%)	1 (0.2%)
Asian or Asian-American	0 (0.0%)	98 (23.6%)	98 (16.1%)
Black or African American	0 (0.0%)	44 (10.6%)	44 (7.2%)
European	185 (96.9%)	0 (0.0%)	185 (30.5%)
Hawaiian native or Pacific Islander	0 (0.0%)	1 (0.2%)	1 (0.2%)
Hispanic or Latino	0 (0.0%)	27 (6.5%)	27 (4.4%)
Middle Eastern	3 (1.6%)	0 (0.0%)	3 (0.5%)
Not reported	0 (0.0%)	4 (1.0%)	4 (0.7%)
Other	0 (0.0%)	27 (6.5%)	27 (4.4%)
White	0 (0.0%)	214 (51.4%)	214 (35.3%)
Age
N-Miss	0	15	15
Mean (SD)	18 (1)	19 (2)	19 (2)
Range	17–24	18–39	17–39

Table 2.

Sample sizes and within-person observations.

Study	Interval	N	N Obs
PAIRS	One week	347	10,844
PAIRS	One year	205	10,720
Leuven	One year	178	25,087

Measures

The PAIRS study included thirteen emotion and behavior items asking participants to rate on a scale from 1 (Not at All) to 5 (A lot) “In the last hour, how _____ were you” and two items asking “How much positive/negative emotion did you experience” on the same scale. The Leuven study included ten items asking participants to rate on a slider scale from 0 to 100 “How ____ do you feel at the moment?” and a valence-arousal grid asking people to rate how passive/active and unpleasant/pleasant they felt on a scale from 0 to 8. To maximize the comparability of results from both studies, we present results using items that were either assessed in both studies (depressed, lonely, happy, relaxed) or had close counterparts across these two studies (positive/negative in PAIRS and unpleasant/pleasant in Leuven) are presented in the main analyses. Additionally, we aggregated the items Happy and Relaxed to create a positive affect scale and Depressed and Lonely for a negative affect scale. Results using the full set of items can be found in the supplemental file.¹

Analyses

How stable are iMs across one year and one week?

We tested the one-year stability of dynamic indices by calculating each person’s mean and standard deviation using all their responses to each item (a maximum of 60 responses per person in the PAIRS sample and 80 in the Leuven sample) at each burst. Stability was operationalized as the Pearson test-retest correlation between iM or iSD of each item at each burst. This provided a point estimate for the stability of the iM and iSD of each item. For example, the stability of iM of Positive Emotion was the Pearson correlation between iM of Positive Emotion of burst 1 and iM of Positive Emotion at burst 2. We include the one-year stability for both the PAIRS and Leuven studies to enable comparisons between the findings, but also test the one-week stability of the PAIRS study by dividing the data from the first wave of the study into week 1 and week 2 and repeat all analyses using these data. Additionally, we include stability estimates from all possible time intervals in the supplemental file.²

We then used a bootstrapping approach to quantify the robustness of this point estimate (Efron, 1994). For each bootstrap sample, we sampled N participants with replacement; N was equal to the full sample size for each interval. Then, within-participant, we sampled O_i observations from each burst with replacement; O_i is the number of observations for each participant. In other words, each bootstrapped sample yields the same number of people, N, as the original sample and, for each person, the same number of observations, O_i. We calculated the iM and iSD for each person at each burst and calculated the Pearson correlation between these estimates. This procedure was repeated 500 times, resulting in 500 bootstrapped test-retest estimates for each item and index. Empirical means and 99% confidence intervals were calculated from the bootstrapped distributions of test-retest correlations.

How many assessments are needed for stable iMs?

To test the impact that the number of within-person observations has on the one-year stability of intraindividual means, we calculated a series of bootstrapped Pearson test-retest correlations, increasing the number of observations O_i selected from 1 to the maximum possible (i.e., 60 for the PAIRS sample and 80 for the Leuven sample) for each person. As before, for each of the 500 bootstrap samples we sampled N participants with replacement; then, within-individual, sampled O_i observations from each burst with replacement. We calculated the iM for each person at each burst and calculated the Pearson correlation (and 99% confidence interval) between these estimates.

One-week stabilities were then identified using data from the PAIRS study comparing the first and second weeks of the first burst of the study. We repeated the bootstrapping procedure described above, increasing the number of observations O_i selected from 1 to the maximum possible (i.e., 30 observations) for 500 bootstrap samples.

How stable are iSDs across one-week and one-year, and how many assessments are necessary?

We then investigated the stability of iSDs across one-week and -year, as well as the impact that the number of within-person observations has on that stability. We repeated the analyses described in the previous sections, substituting the calculation of the iSD for the iM in each bootstrap sample, and calculating the test-retest correlations between iSD at each timepoint.

Results

How stable are iMs across one year and one week?

For each study, duration, and item, we computed the average correlation among the bootstrap samples at each number of observations from one to the maximum possible. These results are presented in Figure 1. The average one-year stability of iMs across all items achieved at the maximum number of observations was 0.57. This ranged from a low of 0.39, 99% CI [0.38, 0.40] for Relaxed in the PAIRS study to 0.68 [0.68, 0.69] for the Positive Composite in the Leuven study. The Leuven study had higher maximum reliabilities across shared measures with an average of 0.63 for Leuven and 0.52 for PAIRS. There was a notable difference for Relaxed, which showed a difference of 0.26 across studies. See Supplemental Table 8 for the maximum correlations for each item, study and duration.

Figure 1.

Stability of iMeans as a function of the number of observations. Each curve depicts the 99% confidence interval around the mean retest-correlation at each number of observations.

The average one-week stability of iMs achieved at the maximum number of observations for the PAIRS study was 0.55 with a minimum of 0.40 [0.39, 0.41] for Relaxed to 0.62 [0.62, 0.63] for Depressed. While one-week stability for most items in the PAIRS sample were comparable to one-year stability, a notable exception was for Negative Emotion, with a maximum average stability of 0.55 [0.54, 0.55] at one week and 0.41 [0.40, 0.42] at one year.

How many assessments are needed for stable iMs?

We estimate the minimum number of observations required to achieve an average stability of .50 (an intentionally low threshold for determining when a measure is stable) by taking the average stability across all bootstrap samples for each potential number of observations. Across items and studies, the average number of observations required to reach .50 was 14 with a range of 7–42.

How stable are iSDs across one-week and one-year, and how many assessments are necessary?

As with iMs, for each study, time duration, and item, we computed the average correlation among the bootstrap samples at each number of observations from one to the maximum possible. These results are presented in Figure 2. The average one-year stability of iSDs achieved at the maximum number of observations for each study was 0.36 with a minimum of 0.14 [0.13, 0.15] for Relaxed in the PAIRS study and a maximum of 0.60 [0.58, 0.60] for Positive Emotion in the Leuven study. The Leuven study had higher maximum stability across shared measures, with an average stability of 0.52 for Leuven and 0.20 for PAIRS. Again, these differences persisted when comparing each study at the same number of observations.

Figure 2.

Stability of iSDs as a function of the number of observations. Each curve depicts the 99% confidence interval around the mean retest-correlation at each number of observations.

The average one-week stability of iSDs achieved at the maximum number of observations for the PAIRS study was 0.33 with a minimum of 0.22 for Negative Emotion to a maximum of 0.41 for Depressed, notably higher than the one-year stability for iSDs in the PAIRS sample.

For items that achieved a stability of .50, the average number of observations required to reach a one-year stability of .50 was 42 with a range of 26–55. Notably, the only items that achieved a stability of .50 were from the Leuven study.

Discussion

The current study attempted to estimate the one-week and one-year stability of dynamic characteristics (intraindividual means and standard deviations) commonly extracted from ESM studies. Intraindividual means were moderately stable across both one week and one year, providing justification for their use and interpretation as stable individual differences. Indeed, the current results are comparable to estimates by extant research on the stability of measures of neuroticism and daily emotion (i.e., one-year stabilities between .35 and .67) (Fraley & Roberts, 2005; Hudson et al., 2017a). On the other hand, intraindividual standard deviations showed poor to moderate stability across either window of time, limiting their interpretation as stable traits.

While iMs were largely stable, there were some discrepancies in the stability of single items between the samples. For example, Relaxed showed the lowest stability of all items in the PAIRS sample, which was not the case for the Leuven sample. It is possible that the greater number of daily observations in the Leuven sample increased the likelihood of assessing participants during a time when they were able to be relaxed (i.e., multiple assessments in the evenings), but this was not explicitly tested. Accordingly, future work should examine the impact of assessment frequency and timing on the reliability and stability of these measures.

The single item measures of positive and negative affect showed divergent stability at longer-and comparable stability at shorter-timescales. Intraindividual means of the Positive Affect Composite across one year showed stability that was comparable to most other constructs, while the iM of the Negative Affect Composite showed lower stability than most other constructs. However, the iM of the Negative Affect Composite showed comparable stability across one week showed stability that was comparable to most other constructs. This is consistent with previous findings that daily positive emotion is more strongly related to trait- than state-dynamics compared to negative emotion, though both are moderately stable (Hudson et al., 2017b).

Across both studies, a one-year iM stability of .50 was achieved using an average of 14 observations (Leuven = 12, PAIRS = 16). In the PAIRS sample, a one-week stability of .50 was achieved using an average of 10 observations. These findings are concordant with those from Wang and Grimm’s (2012) simulation. This has direct implications for researchers who use ESM to assess intraindividual means: stable measures can be achieved with a relatively small number of observations per participant. Notably, the average study that collects 33 within person observations (e.g., Wrzus & Neubauer, 2023) should provide useful information about how much positive and negative affect people typically experience.

Intraindividual variability (iSDs) were less stable across one week and one year than iMs. iSDs from the Leuven study demonstrated moderate stability across one year, with an average test-retest correlation of .52 (using 80 observations); in the PAIRS sample, average one-year stability was substantially lower (.20 using 60 observations). The average one-week stability of iSDs in the PAIRS sample was slightly higher (.33). As in Wang and Grimm (2012), iSDs show consistently lower stability than iMs. However, in the current study, we find substantially lower stability for iSDs than in Wang and Grimm (2012).

These results shed some doubt on the notion that iSDs exhibit the qualities of stable, trait-like individual differences without careful attention to measurement. At the very least, iSDs should not be assumed to be characteristic of people, rather than merely descriptions of states, without further evidence that the stable component has been captured. Given that the standard errors of standard deviations are larger than the standard deviation of the mean, it is to be expected that iSDs would show lower stability than iMs at the same sample size (McNemar, 1962; Schmiedek et al., 2009).

Despite this, it may be the case that intraindividual variability is not a stable individual difference or that it is unrealistic to capture the stable component in a typical ESM study, and that it clarifies mechanisms of well-being and mood dysregulation (Houben et al., 2015). However, if measures of intraindividual variability characterize the moment more than the person, it will limit their predictive potential across time and attenuate relationships with future outcomes (McCrae et al., 2011). If researchers fail to find prospective relations between iSDs and future outcomes, it will be unclear whether those findings are evidence that intraindividual variability is not predictive of those outcomes or whether the measures captured mostly state-like fluctuations in affect that are not predictive across time because of features of the study the impacted measurement (Clark et al., 2003). They may falsely conclude that intraindividual variability doesn’t have predictive power, when the truth may be that they did not capture enough of the stable component to adequately test whether variability is related to distal outcomes due to insufficient sampling. In short, assuming that these measures capture stable characteristics of people when they more closely reflect people-during-a-specific-timeframe can lead researchers to draw false inferences about their results.

The studies differ in multiple ways, some of which may help explain their divergent results. While we did not systematically evaluate the impact of study differences on the stability of these measures, researchers may find it useful to evaluate their study plans to determine whether the data they expect to collect may be closer to the Leuven or PAIRS studies. Indeed, the Leuven and PAIRS study diverge in the one-year stability of the variability of the same items at the same number of observations, suggesting that the overall number of observations is not sufficient to explain differences in the average maximum stability. The Leuven study had notably higher adherence than the typical ESM study (Wrzus & Neubauer, 2023), and less attrition across waves than the PAIRS study. The PAIRS study, however, included more within-person observations over a longer period than the typical study (Wrzus & Neubauer, 2023). Additionally, the sampling intensity of the Leuven study (10 times per day) may have captured more meaningful variability in emotions than the PAIRS study (4 times per day). A limitation is that the current study does not address the impact that sampling frequency has on the stability of these measures but focuses on the total number of within person observations (Hopwood et al., 2022). For example, it may be the case that meaningful variability occurs over a shorter timescale and therefore benefits more from intensive sampling in a short window rather than sparse sampling over a longer time, whereas stable means can be obtained simply with a sufficient number of observations. Future research should examine the impacts of different sampling designs and study features on the stability of dynamic indices. Nonetheless, these results provide insight into the stability of emotion dynamics with similar items and sampling designs (Wrzus & Neubauer, 2023).

Conclusions

We find that intraindividual means derived from repeated measures of affect and personality show acceptable stability across one week and one year with relatively few within-person observations. However, intraindividual standard deviations show less consistent stability and may be more sensitive to factors like study design.

Supplemental Material

Supplemental Material - The stability of intraindividual means and standard deviations of affect

Supplemental Material for The stability of intraindividual means and standard deviations of affect by Ian Shryock, David M Condon and Sara J Weston in Personality Science

Footnotes

Author note

The handling editor was Dr Carolyn MacCann.

Acknowledgements

Data for the Leuven study were provided by the EMOTE Database.

Author contributions

Ian Shryock: Conceptualization; Data curation; Formal analysis; Methodology; Visualization; Writing - original draft; Writing - review & editing.

Dr David M Condon: Conceptualization; Methodology; Writing - review & editing.

Dr Sara J Weston: Conceptualization; Data curation; Methodology; Writing - original draft; Writing - review & editing.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was funded by the National Science Foundation Grant BCS-1125553 (to Simine Vazire).

ORCID iDs

Ian Shryock

David M. Condon

Data accessibility statement

Data from study 1 are available from the corresponding author upon request and data from study 2 are available at the EMOTE database.

Supplemental material

Supplemental material for this article is available online. All code and a datafile containing results of the bootstrapping analyses are available at the OSF page for this project: .

Notes

References

Aldridge

V. K.

Dovey

T. M.

Wade

(2017). Assessing test-retest reliability of psychological measures: Persistent methodological problems. European Psychologist, 22(4), 207–218. https://doi.org/10.1027/1016-9040/a000298

Allen

M. S.

Iliescu

Greiff

(2022). Single item measures in psychological science. European Journal of Psychological Assessment, 38(1), 1–5. https://doi.org/10.1027/1015-5759/a000699

Bosley

H. G.

Soyster

P. D.

Fisher

A. J.

(2019). Affect Dynamics as Predictors of Symptom Severity and Treatment Response in Mood and Anxiety Disorders: Evidence for Specificity. Journal for Person-Oriented Research, 5(2), 101–113. https://doi.org/10.17505/jpor.2019.09

Caspi

Roberts

B. W.

(2001). Personality development across the life course: The argument for change and continuity. Psychological Inquiry, 12(2), 49–66. https://doi.org/10.1207/S15327965PLI1202_01

Clark

L. A.

Vittengl

Kraft

Jarrett

R. B.

(2003). Separate personality traits from states to predict depression. Journal of Personality Disorders, 17(2), 152–172. https://doi.org/10.1521/pedi.17.2.152.23990

Colombo

Suso-Ribera

Fernández-Álvarez

Cipresso

Garcia-Palacios

Riva

Botella

(2020). Affect Recall Bias: Being Resilient by Distorting Reality. Cognitive Therapy and Research, 44(5), 906–918. https://doi.org/10.1007/s10608-020-10122-3

Dejonckheere

Demeyer

Geusens

Piot

Tuerlinckx

Verdonck

Mestdagh

(2022). Assessing the reliability of single-item momentary affective measurements in experience sampling. Psychological Assessment, 34(12), 1138–1154. https://doi.org/10.1037/pas0001178

Dejonckheere

Mestdagh

Houben

Rutten

Sels

Kuppens

Tuerlinckx

(2019). Complex affect dynamics add limited information to the prediction of psychological well-being. Nature Human Behaviour, 3(5), 478–491. https://doi.org/10.1038/s41562-019-0555-0

Efron

(1994). The Jackknife, the bootstrap and other resampling plans (6. printing). Society for Industrial and Applied Mathematics.

10.

Epstein

(1980a). The stability of behaviour: II. Implications for psychological research. American Psychologist, 35(9), 790–806. https://doi.org/10.1037/0003-066x.35.9.790

11.

Epstein

(1980b). The stability of behaviour: II. Implications for psychological research. American Psychologist, 35(9), 790–806. https://doi.org/10.1037/0003-066X.35.9.790

12.

Epstein

(1983). Aggregation and beyond: Some basic issues on the prediction of behaviour. Journal of Personality, 51(3), 360–392. https://doi.org/10.1111/j.1467-6494.1983.tb00338.x

13.

Erbas

Ceulemans

Kalokerinos

E. K.

Houben

Koval

M. L.

Kuppens

(2018). Why I don’t always know what I’m feeling: The role of stress in within-person fluctuations in emotion differentiation. Journal of Personality and Social Psychology, 115(2), 179–191. https://doi.org/10.1037/pspa0000126

14.

Estabrook

Grimm

K. J.

Bowles

R. P.

(2012). A Monte Carlo simulation study of the reliability of intraindividual variability. Psychology and Aging, 27(3), 560–576. https://doi.org/10.1037/a0026669

15.

Fraley

R. C.

Roberts

B. W.

(2005). Patterns of continuity: A dynamic model for conceptualizing the stability of individual differences in psychological constructs across the life course. Psychological Review, 112(1), 60–74. https://doi.org/10.1037/0033-295X.112.1.60

16.

Ganzach

Yaor

(2019). The retrospective evaluation of positive and negative affect. Personality and Social Psychology Bulletin, 45(1), 93–104. https://doi.org/10.1177/0146167218780695

17.

Gerstorf

Hoppmann

C. A.

Ram

(2014). The promise and challenges of integrating multiple time-scales in adult developmental inquiry. Research in Human Development, 11(2), 75–90. https://doi.org/10.1080/15427609.2014.906725

18.

Heininga

V. E.

Dejonckheere

Houben

Obbels

Sienaert

Leroy

van Roy

Kuppens

(2019). The dynamical signature of anhedonia in major depressive disorder: Positive emotion dynamics, reactivity, and recovery. BMC Psychiatry, 19. https://doi.org/10.1186/s12888-018-1983-5

19.

Heise

D. R.

(1969). Separating reliability and stability in test-retest correlation. American Sociological Review, 34(1), 93–101. https://doi.org/10.2307/2092790

20.

Hopwood

C. J.

Bleidorn

Wright

A. G. C.

(2022). Connecting theory to methods in longitudinal research. Perspectives on Psychological Science: A Journal of the Association for Psychological Science, 17(3), 884–894. https://doi.org/10.1177/17456916211008407

21.

Houben

Kuppens

(2020). Emotion Dynamics and the Association With Depressive Features and Borderline Personality Disorder Traits: Unique, Specific, and Prospective Relationships. Clinical Psychological Science, 8(2), 226–239. https://doi.org/10.1177/2167702619871962

22.

Houben

Van Den Noortgate

Kuppens

(2015). The relation between short-term emotion dynamics and psychological well-being: A meta-analysis. Psychological Bulletin, 141(4), 901–930. https://doi.org/10.1037/a0038822

23.

Hudson

N. W.

Lucas

R. E.

Donnellan

M. B.

(2017a). Day-to-day affect is surprisingly stable: A two-year longitudinal study of well-being. Social Psychological and Personality Science, 8(1), 45–54. https://doi.org/10.1177/1948550616662129

24.

Hudson

N. W.

Lucas

R. E.

Donnellan

M. B.

(2017b). Day-To-Day Affect is Surprisingly Stable: A 2-Year Longitudinal Study of Well-Being. Social Psychological and Personality Science, 8(1), 45–54. https://doi.org/10.1177/1948550616662129

25.

John

O. P.

Soto

C. J.

(2007). The importance of being valid: Reliability and the process of construct validation. In Handbook of research methods in personality psychology (pp. 461–494). The Guilford Press.

26.

Lafit

Adolf

J. K.

Dejonckheere

Myin-Germeys

Viechtbauer

Ceulemans

(2021). Selection of the Number of Participants in Intensive Longitudinal Studies: A User-Friendly Shiny App and Tutorial for Performing Power Analysis in Multilevel Regression Models That Account for Temporal Dependencies. Advances in Methods and Practices in Psychological Science, 4(1), 251524592097873. https://doi.org/10.1177/2515245920978738

27.

McCrae

R. R.

Kurtz

J. E.

Yamagata

Terracciano

(2011). Internal consistency, retest reliability, and their implications for personality scale validity. Personality and Social Psychology Review: An Official Journal of the Society for Personality and Social Psychology, Inc, 15(1), 28–50. https://doi.org/10.1177/1088868310366253

28.

McNemar

(1962). Psychological statistics (3rd ed.). Wiley.

29.

Mejía

Hooker

Ram

Pham

Metoyer

(2014). Capturing intraindividual variation and covariation constructs: Using multiple time-scales to assess construct reliability and construct stability. Research in Human Development, 11(2), 91–107. https://doi.org/10.1080/15427609.2014.906728

30.

Mõttus

Bates

T. C.

Condon

D. M.

Mroczek

Revelle

(2017). Leveraging a more nuanced view of personality: Narrow characteristics predict and explain variance in life outcomes. PsyArXiv. https://doi.org/10.31234/osf.io/4q9gv

31.

Mõttus

Sinick

Terracciano

Hřebíčková

Kandler

Ando

Mortensen

E. L.

Colodro-Conde

Jang

K. L.

(2019). Personality characteristics below facets: A replication and meta-analysis of cross-rater agreement, rank-order stability, heritability and utility of personality nuances. Journal of Personality and Social Psychology, 117(4), e35–e50. https://doi.org/10.1037/pspp0000202

32.

Ong

A. D.

Ram

(2017). Fragile and Enduring Positive Affect: Implications for Adaptive Aging. Gerontology, 63(3), 263–269. https://doi.org/10.1159/000453357

33.

Parsons

Kruijt

A.-W.

Fox

(2019). Psychological science needs a standard practice of reporting the reliability of cognitive-behavioural measurements. Advances in Methods and Practices in Psychological Science, 2(4), 378–395. https://doi.org/10.1177/2515245919879695

34.

Ram

Conroy

D. E.

Pincus

A. L.

Lorek

Rebar

Roche

M. J.

Coccia

Morack

Feldman

Gerstorf

(2014). Examining the interplay of processes across multiple time-scales: Illustration with the intraindividual study of affect, health, and interpersonal behaviour (iSAHIB). Research in Human Development, 11(2), 142–160. https://doi.org/10.1080/15427609.2014.906739

35.

Ram

Gerstorf

(2009). Time-structured and net intraindividual variability: Tools for examining the development of dynamic characteristics and processes. Psychology and Aging, 24(4), 778–791. https://doi.org/10.1037/a0017915

36.

Revelle

Condon

D. M.

(2019). Reliability from α to ω: A tutorial. Psychological Assessment, 31(12), 1395–1411. https://doi.org/10.1037/pas0000754

37.

Roberts

DelVecchio

(2000). The rank-order consistency of personality traits from childhood to old age: A quantitative review of longitudinal studies. Psychological Bulletin, 126(1), 3–25. https://doi.org/10.1037/0033-2909.126.1.3

38.

Schmiedek

Lövdén

Lindenberger

(2009). On the relation of mean reaction time and intraindividual reaction time variability. Psychology and Aging, 24(4), 841–857. https://doi.org/10.1037/a0017799

39.

Schneider

Junghaenel

D. U.

Gutsche

Mak

H. W.

Stone

A. A.

(2020). Comparability of Emotion Dynamics Derived From Ecological Momentary Assessments, Daily Diaries, and the Day Reconstruction Method: Observational Study. Journal of Medical Internet Research, 22(9), e19201. https://doi.org/10.2196/19201

40.

Schuler

Ruggero

C. J.

Mahaffey

Gonzalez

Callahan

L.J.

Boals

Waszczuk

M. A.

Luft

B. J.

Kotov

(2021). When Hindsight Is Not 20/20: Ecological Momentary Assessment of PTSD Symptoms Versus Retrospective Report. Assessment, 28(1), 238–247. https://doi.org/10.1177/1073191119869826

41.

Scott

S. B.

Sliwinski

M. J.

Zawadzki

Stawski

R. S.

Kim

Marcusson-Clavertz

Lanza

S. T.

Conroy

D. E.

Buxton

Almeida

D. M.

Smyth

J. M.

(2020). A coordinated analysis of variance in affect in daily life. Assessment, 27(8), 1683–1698. https://doi.org/10.1177/1073191118799460

42.

Song

Howe

Oltmanns

J. R.

Fisher

A. J.

(2023). Examining the concurrent and predictive validity of single items. In Ecological momentary assessments. Office for Health Improvement and Disparities.

43.

Specht

Egloff

Schmukle

S. C.

(2011). Stability and change of personality across the life course: The impact of age and major life events on mean-level and rank-order stability of the big five. Journal of Personality and Social Psychology, 101(4), 862–882. https://doi.org/10.1037/a0024950

44.

Trull

T. J.

Ebner-Priemer

(2013). Ambulatory assessment. Annual Review of Clinical Psychology, 9(1), 151–176. https://doi.org/10.1146/annurev-clinpsy-050212-185510

45.

Vazire

Wilson

R. E.

Solomon

Bollich

Harris

Weston

Jackson

J. J.

(2015). Personality and interpersonal roles (PAIRS) .

46.

Wang (Peggy)

Grimm

K. J.

(2012). Investigating reliabilities of intraindividual variability indicators. Multivariate Behavioural Research, 47(5), 771–802. https://doi.org/10.1080/00273171.2012.715842

47.

Watson

Walker

L. M.

(1996). The long-term stability and predictive validity of trait measures of affect. Journal of Personality and Social Psychology, 70(3), 567–577. https://doi.org/10.1037//0022-3514.70.3.567

48.

Wendt

L. P.

Wright

A. G. C.

Pilkonis

P. A.

Woods

W. C.

Denissen

J. J. A.

Kühnel

Zimmermann

(2020). Indicators of affect dynamics: Structure, reliability, and personality correlates. European Journal of Personality, 34(6), 1060–1072. https://doi.org/10.1002/per.2277

49.

Wheaton

Muthen

Alwin

D. F.

Summers

G. F.

(1977). Assessing reliability and stability in panel models. Sociological Methodology, 8, 84. https://doi.org/10.2307/270754

50.

Wilhelm

Schoebi

(2007). Assessing mood in daily life. European Journal of Psychological Assessment, 23(4), 258–267. https://doi.org/10.1027/1015-5759.23.4.258

51.

Wrzus

Neubauer

A. B.

(2023). Ecological momentary assessment: A meta-analysis on designs, samples, and compliance across research fields. Assessment, 30(3), 825–846. https://doi.org/10.1177/10731911211067538

52.

Zheng

Plomin

von Stumm

(2016). Heritability of Intraindividual Mean and Variability of Positive and Negative Affect. Psychological Science, 27(12), 1611–1619. https://doi.org/10.1177/0956797616669994

Supplementary Material

Please find the following supplemental material available below.

For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.

For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.

0.01 MB

0.00 MB