Abstract
Infant temperament is usually considered biologically driven and a precursor of personality. Despite being conceived as trait measures, parent reports for assessing infant temperament use short timescales, for example, the past seven days, implying variability in temperament traits’ expressions. In two daily diary studies, we used the whole trait theory perspective to investigate whether infant temperament is observable daily and to what degree it varies within person across days. In Study 1, N = 137 mothers of infants aged 6–18 months reported on their infant’s daily (state) temperament (median number of days: 8 and total observations: 984). The results suggest a substantial within-person variation in daily infant temperament (ICCs: .41–.54). Study 2 (N = 199 mothers, median number of days: 7, and total observations: 1375) replicated these results on the variability in infant state temperament (ICCs: .41–.51). In addition, infant state temperament was related to infant trait temperament. However, certain temperament items—primarily those assessing surgency—were frequently rated as not applicable and did not seem suitable for daily assessments. Across both studies, results indicate substantial within-person variability in daily infant temperament and a strong trait component.
Plain language summary
Infants differ in their reactions to situations. For instance, some infants are easily soothed while others might cry a lot. While temperament is often seen as something fixed that children are born with and that is related to their future personality, the way researchers ask parents to report on it tends to focus on short time frames, like the past week. In two studies, we asked mothers of infants aged 6 to 18 months to report on how their child behaved each day to see how much infants’ behavior changes from day to day. In the first study, 137 mothers reported on their infants’ daily temperament for about 8 days and the results showed that there are significant variations in their behavior. The second study (199 mothers) confirmed these findings, indicating that infants’ daily behavior differs quite a bit from day to day. Interestingly, certain aspects of temperament were more difficult to observe on a daily basis, so the questions used to assess these aspects might need to be revised. Overall, our results show that infants’ behavior varies daily, but there is also a stable component.
“No trait theory can be sound unless it allows for, and accounts for, the variability of a person’s conduct.” (Allport, 1961, p. 333)
Introduction
Infant and child temperament are consensually considered precursors of personality and have been shown to predict adult personality (Shiner, 2019; Tang et al., 2020) and life outcomes 30 years later (Wright & Jackson, 2022). Infant temperament is typically conceptualized as a biologically driven trait. Though this is rarely made explicit, this notion seems to imply that infant temperament is a stable trait. In terms of assessment, infant temperament is most typically measured as a one-time rating, which is then used to predict later outcomes. In contrast to measures of adult personality, however, instruments assessing infant temperament refer to very short timescales, such as the past seven days, which seems to imply that infant temperament varies day-to-day. In the present study, we address this inconsistency by testing whether infant temperament can be reliably assessed on a daily level and whether trait measures converge with aggregates of daily assessments.
Within- and between-person variability in infant temperament
To date, research on infant temperament has almost exclusively focused on understanding between-person variability (i.e., differences between individuals) using a trait theory perspective. Theoretical conceptions and empirical research in personality psychology show, however, that within-person variability in personality (i.e., variations within one individual across different times and situations) can also be meaningfully assessed and used to predict outcomes of interest (e.g., Baird et al., 2006).
The role of within-person variability in personality has been discussed and recognized in personality psychology for decades (e.g., Bem, 1983; Fournier et al., 2008; Moskowitz, 1982, 1994; for a review, see Mischel, 2004). While some early studies (e.g., Moskowitz, 1982) have demonstrated both cross-situational consistency and variability of specific traits in children, this notion has not yet been integrated into conceptualizations of child temperament. In studying how parent-reported infant temperament varies from day to day, we rely on whole trait theory (Fleeson & Jayawickreme, 2015, 2021) in particular. Whole trait theory offers both a descriptive and explanatory account of traits. The present study focuses on the descriptive side of the theory, which suggests that personality traits can be understood as a density distribution of personality states, that is, momentary enactments of personality traits (Baird et al., 2006; Fleeson, 2001). From this perspective, individuals can be described not only in terms of the mean of the density distribution but also in terms of the variation of the observed personality states. In terms of assessment, individuals are assessed multiple times (typically within similar and/or different situations), allowing a description of both the consistency and dynamics of personality states.
For instance, 13-month-old Joshua might be described by an average mean level of negative affectivity and a low variation in displaying negative affectivity. That is, Joshua displays some negative emotionality across many situations and times but does not vary a lot in his reactions to different situations or across different days. Another 13-month-old, Nora, might also be described by an average mean level of negative affectivity but high variability in displaying negative affectivity. Nora could be characterized by showing strong negative emotional reactions in certain situations (such as being restricted) on certain days but not showing any negative emotional reactions in other situations or on other days.
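The contrast between Joshua and Nora can be made concrete with a minimal sketch. The daily scores below are invented for illustration only (they are not study data), and the 1–7 range simply mirrors the response scale used in the diaries. The sketch describes each infant's state distribution by its location (mean) and width (standard deviation).

```python
from statistics import mean, stdev

# Hypothetical daily negative-affectivity scores (1-7 scale), invented for illustration.
daily_states = {
    "Joshua": [4, 4, 3, 4, 5, 4, 4, 4],
    "Nora": [1, 7, 2, 6, 1, 7, 4, 4],
}


def describe(states):
    """Return the location (mean) and width (SD) of one infant's state distribution."""
    return mean(states), stdev(states)


summary = {name: describe(states) for name, states in daily_states.items()}

for name, (m, sd) in summary.items():
    print(f"{name}: mean = {m:.2f}, within-person SD = {sd:.2f}")
```

Both hypothetical infants have the same mean level, so a single trait score would not distinguish them; only the within-person standard deviation does.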
Infants typically show a wide range of behaviors, which may appear to contradict the notion of a relatively stable temperament. Whole trait theory proposes a way to reconcile this apparent inconsistency. It suggests that traits can both be relatively stable and vary in their expression across situations. Findings regarding different personality traits in adults have suggested substantial variability in personality states at the within-person level (i.e., the personality states of one person typically vary substantially across different situations or days). At the same time, studies have also supported substantial between-person variability in the density distribution of personality states (i.e., general differences in personality dimensions concerning the width, shape, and location of their state distributions), which allows consistency to be identified (Fleeson & Jayawickreme, 2021). In adult personality, trait measures of personality are strong predictors of state manifestations despite variability in state manifestations of personality (Fleeson & Gallagher, 2009). To our knowledge, these ideas have never been applied to the study and assessment of (infant or child) temperament. In the present study, we thus investigated whether this general pattern of substantial within-person and between-person variability also holds for the dimensions of infant temperament. We also explored the implications of this perspective for the assessment of infant temperament.
Dimensions of infant temperament
Temperament is commonly conceptualized to describe individual differences in reactions to internal and external stimuli and in self-regulatory processes. An individual’s temperament is expressed in a general pattern of responses shown in behavior (Rothbart & Bates, 2006). In other words, temperament traits can be defined as “early emerging basic dispositions in the domains of activity, affectivity, attention, and self-regulation” (Shiner et al., 2012, p. 437). However, “temperament is not a static construct but one that develops” (Stifter & Dollar, 2016, p. 547) in interaction with genetic predispositions and the environment. For instance, infants who cry more often, or are more easily excited, evoke different parental soothing strategies (Stifter & Moding, 2018), shaping the development of temperament (Stifter & Dollar, 2016). Thus, behavioral manifestations of temperament differ by age and change over time.
In infancy, temperament typically manifests in the child being easily startled, excited, or soothed. As reviewed by Zentner and Bates (2008), most temperament measures used in infancy rely on either the neurobiological developmental approach of Rothbart (Rothbart, 1981; Rothbart & Bates, 2006) or the child psychiatric approach of Thomas and Chess (1977). Thomas and Chess describe nine dimensions of infant temperament (activity level, regularity, approach/withdrawal, adaptability, threshold of responsiveness, intensity of reaction, quality of mood, distractibility, and attention span/persistence), whereas Rothbart distinguishes three broader biologically based dimensions: surgency, negative affectivity, and orienting/regulation. While the concrete dimensions of infant temperament described by different theoretical conceptions differ, they can be summarized under the four dimensions of emotionality, extraversion, activity, and persistence (Mervielde & Asendorpf, 2000). In terms of the five-factor model of personality (McCrae & John, 1992), temperament dimensions like surgency are seen as a precursor of extraversion, negative affectivity or quality of mood as a precursor of neuroticism, and orienting/regulation or task persistence as a precursor of conscientiousness (Mervielde & Asendorpf, 2000). These dimensions predict the corresponding adult personality traits decades later (Shiner, 2019; Tang et al., 2020).
In the present study, we investigate the higher-order temperament dimensions of surgency (= extraversion and activity), negative affectivity (= emotionality; neuroticism), and orienting/regulation (= persistence; conscientiousness) as conceptualized by Mary Rothbart (Gartstein & Rothbart, 2003).
Assessment, stability, and variability of infant temperament
Infant temperament can be assessed by both observational measures, such as the Neonatal Behavior Assessment Scale (NBAS; Brazelton, 1973) or the Infant Laboratory Temperament Assessment Battery (Lab-TAB; Goldsmith & Rothbart, 1996), and parent questionnaires. These different assessment approaches typically converge to a modest degree, with correlations rarely exceeding .30 (e.g., Planalp et al., 2017; Tirosh et al., 1992). Parent questionnaires represent the most common method for assessing infant temperament (Kiel et al., 2018), despite being subject to biases and limitations (e.g., Gartstein et al., 2012). Parents are seen as being in a unique position to observe their infants across many different situations and over longer periods compared to other observational measures (Daum et al., 2022; Rothbart & Bates, 2006). Already during infancy, parent ratings of infant temperament are relatively consistent across situations (Wachs et al., 2004) and time (Bornstein et al., 2015; Casalin et al., 2012; Gartstein et al., 2015; Putnam et al., 2008; Sieber & Zmyj, 2022). Thus, the temporal stability of temperament measures is generally higher for parental reports than for observational measures (Planalp et al., 2017; Stifter & Dollar, 2016). For instance, Planalp and colleagues (2017) reported that rank-order stabilities for infants aged 6 and 12 months varied between r = .41 and r = .59 for parent reports of infant temperament and between r = .14 and r = .32 for observational measures. Likewise, meta-analytic evidence across various temperament measures suggests a moderate rank-order stability of ρ = .35 for infants under three years of age (Roberts & DelVecchio, 2000). Generally, the stability of temperament increases with age (Lemery et al., 1999; Roberts & DelVecchio, 2000) and decreases with the length of the interval between assessments (Bornstein et al., 2015). 
Although there might be differences in the stability of different temperament dimensions (e.g., Lemery et al., 1999; Worobey & Blajda, 1989), the overall pattern remains unclear and is not well-studied to date.
Despite being conceived as trait measures, parent reports of infant temperament commonly use relatively short timescales as a reference in their instructions, such as the past seven days in the Infant Behavior Questionnaire (IBQ; Gartstein & Rothbart, 2003; Putnam et al., 2014). This is in contrast to typical five-factor model personality questionnaires, which generally do not consider such a period (e.g., Danner et al., 2016). The short reference period does, however, take into account the rapid changes in child development in the first years of life. In 6-month-old children, for example, an age difference of just two weeks can be associated with major differences in language development, cognitive development, and motor repertoire (Bayley, 2006). At the same time, the given time interval of the last seven days implies that all described behaviors should be observable on a daily basis. Yet, to date, no study has asked parents to report daily on their infants’ temperament to determine if all behaviors are observable daily. Therefore, it remains unknown if infant temperament is observable daily, whether infant temperament shows stability across days, and to what degree temperament varies within individuals across multiple days.
Most studies on the stability of infant temperament focused on intervals of several months, and the shortest time intervals used have typically been six weeks or three months (Bornstein et al., 2015; Carranza Carnicero et al., 2000; Sieber & Zmyj, 2022; Worobey & Blajda, 1989). However, given the growing interest in describing infant development on different timescales, for instance, dynamics in parental soothing and infant regulation (Buhler-Wassmann & Hibel, 2021), it becomes pertinent to investigate how infant temperament manifests itself on shorter timescales, such as days. Studies in adults and school-aged children have particularly focused on within-person variability with regard to negative affect (e.g., Brose et al., 2020; Könen et al., 2016), and higher variability in personality states has been typically associated with higher trait neuroticism/emotional instability (Eid & Diener, 1999). This association of trait negative affectivity with variability in state manifestations would also be consistent with theoretical accounts that posit interindividual differences in children’s sensitivity to context (Belsky et al., 2007; Belsky & Pluess, 2009). Such general differences in sensitivity to context have been associated with higher levels of negative affectivity in children (Pluess et al., 2018) and higher levels of neuroticism in adults (Lionetti et al., 2018) and can be considered a trait in its own right. In addition, meta-analytic evidence suggests that children with higher levels of negative affectivity might be more susceptible to parenting behaviors (Slagt et al., 2016), suggesting a stronger within-person coupling between parenting behaviors and behavioral outcomes in children high in trait negative affectivity. Consequently, we expected negative affectivity (as a trait) to be associated with increased variability in daily temperament states.
Overview and aims of the present studies
In two studies, we investigated the variability of infant temperament by assessing daily fluctuations in parent-reported temperament across ten days. In Study 1, we investigated two higher-order dimensions of infant temperament (negative affectivity and orienting/regulation) using a daily diary design. In Study 2, we replicated Study 1’s findings and extended them by assessing a third dimension (surgency) and collecting assessments on trait infant temperament at baseline. The two studies focused on four main objectives. Firstly, we aimed to describe the variability of daily measures of infant temperament at both the within- and between-person levels. Concretely, we addressed the following research questions:
• Do daily measures of infant temperament show substantial within-person and between-person variability? (Studies 1 and 2)
• Does within-person variability of daily infant temperament differ between different dimensions of infant temperament? (Studies 1 and 2)
Secondly, we sought to apply the descriptive side of the whole trait theory (Fleeson & Jayawickreme, 2015, 2021) to infant temperament. In doing so, we aimed to answer the following research question:
• Are temperament states substantially related to the respective temperament trait? (Study 2)
Thirdly, we wanted to test to what extent variability in temperament states represents a trait and to what extent it is related to trait negative affectivity:
• Is within-person variability in daily (state) temperament (across all temperament dimensions) related to the trait temperament dimension of negative affectivity? (Study 2)
Lastly, our fourth aim was to enhance the understanding of the role of time scales in the assessment of infant temperament. Therefore, we investigated the characteristics of individual temperament items (mean, within-person variation, and between-person variation) when assessed daily in Study 2. We also studied the associations of the aggregated state item with its corresponding trait item and scale.
Study 1 was exploratory and was not preregistered. Study 2 was confirmatory and was preregistered (https://osf.io/edh2u).
Study 1: Variability of infant temperament
In Study 1, we exploratively investigated the between- and within-person variability in parent-rated infant temperament. It was part of a larger data collection during the COVID-19 pandemic (Reinelt et al., 2022). Because previous studies using German versions of the IBQ and its short forms (Gartstein & Rothbart, 2003; Putnam et al., 2014) had only found two factors (Bayer et al., 2015; Sieber & Zmyj, 2022; Vonderlin et al., 2012), we only assessed the dimensions of negative affectivity and orienting/regulation (and not surgency) in Study 1. This study’s data can be freely obtained from Zenodo: https://zenodo.org/record/6399959#.ZAVAaa2ZND8.
Method
Participants
In total, N = 357 parents participated in the larger study (Reinelt et al., 2022). In the present study, we included only mothers (i.e., participants who identified as female) with infants aged 6–18 months who completed the baseline questionnaire (containing the demographic questions) and at least two of the ten daily diaries. We further excluded data from very preterm-born children (i.e., <32 weeks of gestation at birth) and diary data from days without variance on the item level (i.e., same answers for all nine temperament items). Documentation of how many observations/participants were lost due to each exclusion criterion can be viewed in a participant flowchart on the OSF. The included and excluded participants (i.e., eligible participants without valid diary data) did not differ significantly with regard to child variables (i.e., age, gestational age at birth, and gender). With regard to parental variables, there were no significant differences in age, employment status, and parental income. However, included parents with valid diaries had a higher level of education (χ2 = 6.54, p = .011) and were less likely to have a migration background (χ2 = 7.01, p = .008).
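The two diary-level inclusion rules described above (dropping days on which all temperament items received the same answer, then keeping participants with at least two valid diaries) can be sketched as follows. The entries, participant identifiers, and the treatment of "not applicable" answers as missing (`None`) are assumptions made for illustration; the original processing scripts are not reproduced here.

```python
from collections import Counter

# Hypothetical diary entries: (participant id, day, nine item responses).
# None stands for "not applicable"; ignoring it is an assumption of this sketch.
entries = [
    ("p01", 1, [4, 4, 4, 4, 4, 4, 4, 4, 4]),  # same answer for all items
    ("p01", 2, [3, 5, 2, 4, 6, 6, 5, 7, 6]),
    ("p02", 1, [2, None, 3, 2, 6, 5, 6, 6, 7]),
    ("p02", 2, [3, 3, 4, 2, 5, 6, 6, 7, 6]),
]


def has_item_variance(responses):
    """True if the answered items on that day are not all identical."""
    answered = [r for r in responses if r is not None]
    return len(set(answered)) > 1


# Rule 1: drop days without variance on the item level.
valid_days = [e for e in entries if has_item_variance(e[2])]

# Rule 2: keep participants with at least two valid diary days.
days_per_person = Counter(pid for pid, _, _ in valid_days)
included = {pid for pid, n in days_per_person.items() if n >= 2}
final_entries = [e for e in valid_days if e[0] in included]

print(sorted(included), len(final_entries))
```

Applying the day-level rule first matters: in the toy data, participant p01 completed two diaries but retains only one valid day and is therefore excluded.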
The final sample consisted of N = 137 infants. Infants were M = 11.6 months old (SD = 3.21 months), 45.3% were boys, and 54.7% were girls. Thirteen of these children (9.49%) were born preterm (i.e., before 37 weeks of gestation), leading to an average corrected age of M = 11.46 months (SD = 3.21 months). Mothers were M = 34.9 years old (SD = 4.01 years, range: 24–49 years). Most of the mothers (n = 127, 94.1%) lived in Switzerland, and 54.8% had a migration background (i.e., were born outside of the country of residence). At the time of the survey, 68.9% of the mothers were working either full-time or part-time, whereas 31.1% were on parental leave or unemployed. The majority of mothers (55.6%) were first-time mothers, while 36.3% were caring for another child, and 8.1% had two or three additional children. Overall, participating mothers were highly educated (82.2% reported having a tertiary education degree) and reported a median household income of 10′000–12′000 CHF (inter-quartile range: [7′500–8′700 CHF; 12′000–15′000 CHF]),¹ which is above the average Swiss household income of families with children younger than four years of age (approximately 8′300 CHF, Bundesamt für Statistik, 2021).
Due to the exploratory nature of Study 1, we did not conduct an a priori power analysis. A sensitivity analysis using G*Power 3.1 (Faul et al., 2009) revealed that given α = .05 (two-tailed), our sample allowed us to detect correlations of at least |r| = .24 with 80% power.
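The order of magnitude of this sensitivity analysis can be checked with a short sketch based on the Fisher z approximation. This is not the exact routine implemented in G*Power, so small deviations in the third decimal are to be expected; the function name is hypothetical.

```python
from math import sqrt, tanh
from statistics import NormalDist


def minimal_detectable_r(n, alpha=0.05, power=0.80):
    """Smallest |r| detectable with the given power (two-tailed test),
    using the Fisher z approximation with standard error 1 / sqrt(n - 3)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-tailed test
    z_power = z.inv_cdf(power)          # quantile corresponding to the target power
    return tanh((z_alpha + z_power) / sqrt(n - 3))


r_min = minimal_detectable_r(137)
print(f"Minimal detectable |r| for N = 137: {r_min:.2f}")
```

With N = 137, the approximation agrees with the reported value of .24 after rounding.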
Procedure
Participants were recruited from April to July 2021. To recruit participants, we mainly contacted parents who had given birth in the 18 months prior to the study at the University Hospital Zurich and who had provided general consent to be contacted for research studies. Additionally, the study was promoted on social media, targeting parents from German-speaking countries. Participants were not paid for participating but took part in a raffle for one of 10 vouchers valued at 50 CHF (approximately 50 USD). Participants received three raffle tickets for completing the baseline questionnaire and one additional raffle ticket for each day they participated in the diary (i.e., a maximum of 13 raffle tickets). After giving informed consent, participants completed a baseline online questionnaire. Starting the following evening, they were invited by e-mail to complete a diary survey every evening for ten consecutive days. On average, mothers reported on their infant’s temperament on 7.14 days (SD = 2.85 days; median: 8 days; total number of observations: N = 984).
Measures
Daily infant temperament was assessed by an adaptation of the German version of the IBQ (Gartstein & Rothbart, 2003; Putnam et al., 2014) used in the German National Educational Panel Study (NEPS; Bayer et al., 2015). This version assesses two higher-order temperament dimensions: negative affectivity (4 items) and orienting/regulation (5 items). The items are answered on a 7-point Likert-type scale (ranging from 1 = “never” to 7 = “always”). To make the instrument suitable for assessing daily temperament, the instructions and all items were rephrased to refer to the current day. For instance, an item starting with “If your child was tired” was adapted to “If your child was tired today.” In most cases, this only required a slight adaptation of the item. Mothers could also indicate that an item did not apply on the respective day. The study (both baseline and daily diary) contained several other measures, as described in Reinelt et al. (2022).
Results
Composition of within- and between-person variabilities of daily infant temperament
Intra-class correlations (ICCs) were computed to differentiate between-person variability from within-person variability and measurement error in mothers’ daily assessments of infant temperament. The ICC and 95% bootstrap confidence interval (CI) for negative affectivity was ICC = .54, 95% CI: [.46; .62], indicating that 54% of the daily variance of the negative affectivity dimension could be attributed to between-person differences, that is, general differences between mothers or infants that do not vary across days. Likewise, an ICC = .41, 95% CI: [.32; .49], for orienting/regulation indicated that 41% of the daily variance of the orienting/regulation dimension could be attributed to between-person differences.
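To illustrate the variance decomposition behind an ICC, the following minimal sketch computes ICC(1) from a one-way random-effects ANOVA with unequal numbers of diary days per infant. It is an assumption that this estimator is comparable to the one used here (ICCs are often derived from multilevel models instead), and the scores are invented.

```python
from statistics import mean


def icc1(groups):
    """ICC(1) from a one-way random-effects ANOVA.

    groups: one list of daily scale scores per infant (lengths may differ).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Effective number of days per infant when the design is unbalanced.
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (k - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)


# Hypothetical daily negative-affectivity scores for three infants.
scores = [[3, 4, 3, 5], [5, 6, 6], [2, 2, 3, 1, 2]]
icc = icc1(scores)
print(f"ICC(1) = {icc:.2f}; within-person share = {1 - icc:.2f}")
```

The complement 1 − ICC is the share of variance located within persons, which in observed scores also contains measurement error.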
McDonald’s ω was calculated to assess whether the temperament dimensions could be reliably assessed both on the between-person and on the within-person level. Results indicated high reliabilities on the between-person level (negative affectivity: ω = .86; orienting/regulation: ω = .89) and satisfactory reliabilities on the within-person level (negative affectivity: ω = .60; orienting/regulation: ω = .66).
Differences in variability across temperament dimensions
When comparing the within-person means and standard deviations between the two temperament dimensions, the average within-person mean was lower for negative affectivity (M = 3.72, SD = 1.17) than for orienting/regulation (M = 5.98, SD = 0.69), t(125) = −16.66, p < .001, d = 2.34. In contrast, the average within-person variability was higher for negative affectivity (mean within-person SD = 0.81, SD of within-person SDs = 0.40) than for orienting/regulation (mean within-person SD = 0.56, SD of within-person SDs = 0.42), t(125) = 5.33, p < .001, d = 0.55.
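A comparison of this kind can be sketched as a paired-samples t-test on each infant's within-person standard deviations for the two dimensions. That the reported tests were paired is an assumption of this sketch, and the values below are invented; the effect size is omitted because its exact formula is not specified.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom for two matched lists."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t_value = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t_value, n - 1


# Hypothetical within-person SDs, one pair per infant.
sd_negative_affectivity = [0.9, 1.1, 0.6, 0.8, 1.2, 0.7]
sd_orienting_regulation = [0.5, 0.7, 0.6, 0.4, 0.9, 0.5]

t_value, df = paired_t(sd_negative_affectivity, sd_orienting_regulation)
print(f"t({df}) = {t_value:.2f}")
```

Under this reading, the reported 125 degrees of freedom would correspond to 126 infants with a within-person SD on both dimensions.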
Whereas the within-person variability of negative affectivity and orienting/regulation was moderately correlated, r = .32, 95% CI: [.15; .47], p = .002, both were unrelated to the infants’ age or corrected age, gestational age at birth, and sex as well as the mother’s age, educational level, migration background, working status, and whether the infant was the mother’s first child (for test statistics see online supplement on the Open Science Framework (OSF): https://osf.io/fb74t).
Discussion
Study 1 explored the extent of within- and between-person variability in infants’ daily temperament. Results indicate substantial variability in the negative affectivity and orienting/regulation dimensions. Approximately 50% of the variance in each dimension could be attributed to the within-person level. Importantly, both temperament dimensions showed a high reliability on the between-person level and a satisfactory reliability on the within-person level. Thus, results support a trait component of infant temperament while also revealing a substantial state component.
Within-person variability was higher for negative affectivity than for orienting/regulation, which aligns with de Weerth et al.’s (1999) argument that emotional reactions like crying or fussing are an infant’s means to communicate with their caregivers. Indeed, intra-individual variability in crying has been considered standard in typically developing infants (St. James-Roberts & Halil, 1991), and it might covary with parenting practices (de Weerth & van Geert, 2001). Yet, neither daily variability in negative affectivity nor daily variability in orienting/regulation was related to descriptive characteristics of the infant (age, gestational age at birth, and sex) or the mother (age, education, and migration background), suggesting that variability in infant temperament might itself constitute an infant’s characteristic.
Such a notion aligns with assumptions that there are between-person differences in children’s sensitivity to environmental stimulation (as a trait), implying between-person differences in within-person behavioral variation (e.g., regarding crying and soothability) (Belsky et al., 2007; Belsky & Pluess, 2009). We found an association between infant variability in daily negative affectivity and daily variability in orienting/regulation, which could indicate a common underlying trait. Although higher state variability has typically been linked to higher trait neuroticism in adults (Eid & Diener, 1999), no conclusions can be drawn from the current study because no measures of infant trait temperament had been assessed. In addition, it was not assessed whether mothers interacted with their children during the day, and some mothers might have consulted their partners or other caretakers.
Study 2: Variability of infant temperament in relation to trait measures
Study 2 aimed to replicate Study 1’s findings and extend them in two ways: First, we included the third higher-order dimension of temperament (surgency) to study the full set of dimensions as described in Rothbart’s conception of temperament. Second, we also included a baseline measure of trait infant temperament to investigate how the daily measures’ mean level and variability relate to the standard trait measure of temperament. Finally, Study 2 addressed a methodological weakness of Study 1 by excluding daily reports if the mother had no contact with her child on the respective day.
The following preregistered hypotheses were tested in Study 2: (1) Daily measures of infant temperament show substantial within-person and between-person variabilities. The variance decomposition of daily temperament measures in the present study is similar to that in Study 1. (2) The within-person variability of daily temperament is larger for the temperament dimension negative affectivity than the temperament dimension orienting/regulation. (3) Temperament states are substantially related to the respective temperament trait. In particular, we expect the convergent associations (e.g., aggregated negative affectivity states with negative affectivity trait) to be larger than the discriminant correlations (e.g., aggregated negative affectivity states with orienting/regulation or surgency). (4) Within-person variability in daily (state) temperament (across all three temperament dimensions) is positively related to the trait temperament dimension of negative affectivity.
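The logic of Hypothesis 3 can be sketched in a few lines: daily states are aggregated to a person mean, which is then correlated with the matching trait scale (convergent) and with a non-matching trait scale (discriminant). All identifiers and values below are invented for illustration, and a plain Pearson correlation is assumed.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equally long lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical daily negative-affectivity states and baseline trait scores.
na_states = {"i1": [3, 4, 3], "i2": [5, 6, 5, 6], "i3": [2, 2, 3],
             "i4": [4, 5, 4], "i5": [6, 5, 6]}
trait_na = {"i1": 3.2, "i2": 5.4, "i3": 2.6, "i4": 4.0, "i5": 5.8}
trait_or = {"i1": 6.0, "i2": 5.0, "i3": 6.2, "i4": 5.6, "i5": 5.9}

ids = sorted(na_states)
state_means = [mean(na_states[i]) for i in ids]  # aggregated states per infant

convergent = pearson(state_means, [trait_na[i] for i in ids])
discriminant = pearson(state_means, [trait_or[i] for i in ids])
print(f"convergent r = {convergent:.2f}, discriminant r = {discriminant:.2f}")
```

The hypothesis is supported when the convergent correlation exceeds the discriminant correlations in absolute size, as it does in these invented data by construction.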
Method
Participants
A total of 369 parents participated in Study 2. As preregistered, we included mothers (i.e., participants who identified as female) with infants aged 6–18 months who completed the baseline questionnaire and at least two of the ten daily surveys. A further inclusion criterion was that infants had to be born after 32 weeks of gestation. In addition, we excluded participants without variance on the baseline temperament measure and diary entries without variation (i.e., on a specific day, mothers chose the same response for all items). Deviating from the preregistration, all diary entries on days when mothers reported having had no contact with their child were also excluded. We documented how many observations/participants were lost due to each exclusion criterion in a participant flowchart on the OSF. Included and excluded participants (i.e., eligible participants without valid diary data) did not differ significantly on child variables (age, gestational age at birth, gender, and baseline temperament dimensions) or parent characteristics (mother’s age, employment status, income, educational level, and migration background). The final sample consisted of N = 199 German-speaking mothers of singleton infants. Mothers were M = 34.84 years old (SD = 4.13; range: 24–51). Infants were M = 12.97 months old (SD = 2.98, range: 6–18 months), 52.8% were boys, and 47.2% were girls. Nine of these infants (4.52%) were born preterm (i.e., before 37 weeks of gestation), resulting in an average corrected age of M = 12.93 months (SD = 2.99 months). Most participants (n = 185, 93.0%) lived in Switzerland, and 55.3% had a migration background (i.e., were born outside of the country of residence). Overall, participants were highly educated (74.4% reported having completed a tertiary education degree) and reported a median household income of 8′700–10′100 CHF (inter-quartile range: [6′400–7′500 CHF; 12′000–15′300 CHF]).¹
At the time of the survey, 82.4% of the mothers were working either full-time or part-time, whereas 17.6% were on parental leave, unpaid vacation, or unemployed. For 67.8% of the mothers, this infant was their first child, while 28.1% reported it was their second child, and 4.0% reported it was their third child.
Procedure
Participants were recruited from August to November 2022.² The sample size was determined by the number of mothers who participated until November 15, 2022 but was also informed by considerations of statistical power. We aimed at recruiting at least 138 participants since this sample size would have allowed us to detect at least medium-sized correlations (|r| ≥ .30) with a power of .95, according to a power analysis using G*Power 3.1 (Faul et al., 2009) assuming α = .05 (two-tailed). We chose the effect size of at least r = .30, building on work on the convergence between personality states and traits in adults, which finds that, meta-analytically, state-trait convergence is between .42 and .56 (Fleeson et al., 2009). Since we had no previous knowledge of the size of this association in infants, we opted for .30 as a more conservative estimate.
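As a plausibility check on this power analysis, the power to detect r = .30 can be approximated with the Fisher z transformation. This normal approximation is an assumption of the sketch and differs slightly from the exact computation in G*Power; the function name is hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def approx_power(r, n, alpha=0.05):
    """Approximate power of a two-tailed test of a correlation (Fisher z)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    noncentrality = atanh(r) * sqrt(n - 3)
    # Probability of exceeding the critical value in either tail.
    return z.cdf(noncentrality - z_alpha) + z.cdf(-noncentrality - z_alpha)


power_target_n = approx_power(0.30, 138)  # planned minimum sample size
power_final_n = approx_power(0.30, 199)   # final sample of Study 2
print(f"N = 138: {power_target_n:.3f}; N = 199: {power_final_n:.3f}")
```

Under this approximation, N = 138 yields a power close to .95, and the final sample of N = 199 exceeds that target.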
To recruit participants, we contacted parents who had given birth in the 18 months prior to the study at the University Hospital Zurich (Zurich, Switzerland) and who had provided general consent to be contacted for research studies. Due to a lower participation rate than expected from Study 1, we additionally used a database of parents recruited from birth registries in communities in and around the city of Zurich who had also given consent to be contacted for research studies. Therefore, the population in this database is highly comparable to the one recruited at the University Hospital Zurich. In addition, we advertised the study on social media platforms. Participants were not paid for their participation but could take part in a raffle for one of 10 vouchers valued at 50 CHF (approximately 50 USD). Participants received three raffle tickets for completing the baseline questionnaire and one additional raffle ticket for each day they participated in the diary (i.e., a maximum of 13 raffle tickets).
Upon providing informed consent, participants completed a baseline questionnaire presented online. Starting the following evening, they were invited by e-mail to complete a diary survey every evening for ten consecutive days, which took around 13 minutes. On average, mothers reported on their infant’s temperament on 6.91 days (SD = 2.55; median: 7 days, Ntotal observations = 1375). Both the baseline and daily diary survey contained additional measures irrelevant to the present study’s aims.
Measures
Trait infant temperament was assessed by the German version of the IBQ (Gartstein & Rothbart, 2003; Putnam et al., 2014) used in the pilot phase of the NEPS (Bayer et al., 2015). This version assesses three higher-order temperament dimensions: negative affectivity (5 items, ω = .81), orienting/regulation (5 items, ω = .77), and surgency (5 items, ω = .56). Compared to Study 1, one additional item was added in the negative affectivity scale, and the surgency dimension was additionally assessed. Items were answered on a 7-point Likert-type scale (ranging from 1 = “never” to 7 = “always”) and referred to the previous seven days.
Daily infant temperament was assessed by an adaptation of the trait measure, parallel to Study 1. Items of the trait measure were adapted to refer to the respective day. Again, mothers could also indicate that an item did not apply on this day.
Data analyses
To test Hypotheses 1 and 2, intra-class correlations were calculated for each dimension of the state temperament measure. Intra-class correlations of this study were statistically compared to intra-class correlations from Study 1, limiting the analysis to the same set of items for this comparison. Equality of intra-class correlations was tested by comparing the 95% confidence intervals of the ICCs derived from 5000 bootstrap samples. To test Hypothesis 3, we used linear regressions predicting the aggregated daily temperament scores (states) by the temperament traits. To test the robustness of these results, we used multilevel models predicting daily infant state temperament with infant trait temperament as a level 2 (person-level) predictor. For all multilevel analyses, we centered person-level predictors around the grand mean. To test Hypothesis 4, we first computed the standard deviation of an infant’s score across all daily measures of temperament. Following the suggestion by Baird et al. (2006) to account for the dependency of the standard deviations on the mean, we first predicted the within-person standard deviation by the associated within-person mean and the square of the within-person mean in a regression analysis. We then used the resulting residuals as dependent variables in a regression analysis with the trait temperament dimensions as independent variables. The item-level research questions were analyzed by the same methods, using individual temperament items instead of scales. Because they might potentially impact reports on infant temperament, the following variables were included in the analyses of Hypotheses 3 and 4 as covariates, as preregistered: infant’s age and sex, gestational age at birth (i.e., week of pregnancy at birth), mother’s age, mother’s educational level (tertiary education: yes/no), and mother’s migration background (yes/no).
Multilevel analyses regarding Hypothesis 3 additionally included measurement time point (i.e., number of completed daily assessments) to control for potential effects of repeated assessment. For robustness checks, all analyses have been repeated without covariates and with the infant’s corrected age instead of the combination of the infant’s chronological age and gestational age at birth. All materials, analysis scripts, and supplementary analyses are provided on the OSF (https://osf.io/fb74t).
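The two-step procedure for Hypothesis 4 (residualizing the within-person standard deviation on the within-person mean and its square) can be sketched as follows. This is a minimal illustration with made-up toy data, not the authors' R scripts; the helper names `solve` and `residualized_sd` and the infant identifiers are hypothetical, and the second-step regression on trait scores and covariates is omitted.

```python
from statistics import mean, stdev


def solve(a, b):
    """Solve the linear system a·x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def residualized_sd(diaries):
    """Return each infant's within-person SD with mean and mean² partialled out.

    diaries: dict mapping an infant id to a list of daily scale scores
    (at least two days per infant, at least four infants).
    """
    ids = sorted(diaries)
    means = [mean(diaries[i]) for i in ids]
    sds = [stdev(diaries[i]) for i in ids]
    # Design matrix: intercept, within-person mean, squared within-person mean
    design = [[1.0, m, m * m] for m in means]
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(3)] for j in range(3)]
    xty = [sum(row[j] * y for row, y in zip(design, sds)) for j in range(3)]
    beta = solve(xtx, xty)  # ordinary least squares via the normal equations
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in design]
    return {i: s - f for i, s, f in zip(ids, sds, fitted)}


# Toy data (invented): daily negative affectivity scores on the 1-7 scale
toy_diaries = {
    "a": [2, 3, 2, 4],
    "b": [5, 5, 6, 4],
    "c": [6, 7, 6, 7],
    "d": [3, 5, 4, 2],
    "e": [4, 4, 5, 6],
}
for infant, value in residualized_sd(toy_diaries).items():
    print(infant, round(value, 3))
```

The quadratic term matters because on a bounded 1 to 7 scale, infants with means near either end have less room to vary, so the raw standard deviation partly reflects the mean level.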
Results
Hypothesis 1: Composition of within- and between-person variabilities of daily infant temperament
Intra-class correlations for the present study indicated that 51% of the variance in daily negative affectivity, 95% CI: [.43; .47], 47% of the variance in daily orienting/regulation, 95% CI: [.40; .53], and 41% of the variance in daily surgency, 95% CI: [.33; .48], could be attributed to differences between infants. The ICCs for negative affectivity and orienting/regulation were similar to the ICCs in Study 1. Likewise, as in Study 1, reliability estimates for negative affectivity and orienting/regulation were high on the between-person level (negative affectivity: ω = .86; orienting/regulation: ω = .93) and satisfactory on the within-person level (negative affectivity: ω = .66; orienting/regulation: ω = .70). However, reliability estimates were low for surgency (ωwithin = .41 and ωbetween = .44).
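To make the ICC and its bootstrap interval concrete, the following sketch estimates the share of between-infant variance from diary data with unequal numbers of days and resamples infants to obtain a percentile interval. It is an assumption-laden stand-in: it uses the one-way ANOVA estimator of the ICC, whereas the study's own scripts may rely on multilevel models, and the data are simulated.

```python
import random


def icc1(groups):
    """One-way ANOVA estimate of the ICC for groups of unequal size.

    groups: list of lists, one list of daily scores per infant.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n_total
    ss_between = sum(n * (m - grand) ** 2 for n, m in zip(sizes, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    # Effective group size when infants contribute different numbers of days
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)


def bootstrap_ci(groups, reps=5000, level=0.95, seed=1):
    """Percentile bootstrap interval, resampling whole infants with replacement."""
    rng = random.Random(seed)
    stats = sorted(icc1(rng.choices(groups, k=len(groups))) for _ in range(reps))
    lower = stats[int((1 - level) / 2 * reps)]
    upper = stats[int((1 + level) / 2 * reps) - 1]
    return lower, upper


# Simulated toy data: equal between- and within-person variance (true ICC = .50)
rng = random.Random(0)
toy = []
for _ in range(60):
    person_level = rng.gauss(4.0, 1.0)
    n_days = rng.randint(2, 10)
    toy.append([person_level + rng.gauss(0.0, 1.0) for _ in range(n_days)])

print(round(icc1(toy), 2), bootstrap_ci(toy, reps=1000))
```

Resampling infants rather than single diary days keeps each infant's days together, which preserves the nesting that the ICC is meant to describe.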
Hypothesis 2: Differences in variability across temperament dimensions
The temperament dimensions differed with regard to the within-person means, F(2, 396) = 307.79, p < .001, and standard deviations, F(2, 396) = 71.02, p < .001. Bonferroni-corrected paired comparisons revealed that within-person means for orienting/regulation (M = 5.90, SD = 0.69) were higher than within-person means for surgency (M = 5.48, SD = 0.67), p < .001, d = 0.62, and negative affectivity (M = 3.98, SD = 1.08), p < .001, d = 2.14. Within-person means for surgency were also higher than within-person means for negative affectivity, p < .001, d = 1.65. The opposite pattern was observed for the within-person standard deviations. On average, within-person standard deviations were larger for negative affectivity (M = 0.86, SD = 0.37) than for orienting/regulation (M = 0.55, SD = 0.34), p < .001, d = 0.89, and surgency (M = 0.58, SD = 0.36), p < .001, d = 0.78. The average within-person standard deviations did not differ between the temperament dimensions orienting/regulation and surgency, p = .60, d = 0.10. Thus, regarding orienting/regulation and negative affectivity, the within-person means and standard deviations show the same pattern as in Study 1.
Hypothesis 3: Associations of temperament traits and aggregated temperament states
Convergent and discriminant associations between temperament traits and aggregated temperament states.
Note. The table displays standardized beta-coefficients from a linear regression controlling for infant age, gestational age at birth, and sex, as well as maternal age, education, and migration background. 95% CIs are given in brackets.
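The logic of the convergent associations can be illustrated by aggregating each infant's daily states and relating the aggregate to the baseline trait score. The sketch below is a simplified illustration with invented data: it omits the covariates named in the table note, in which case the standardized coefficient of a single-predictor regression equals the Pearson correlation. The function names are hypothetical.

```python
from statistics import mean, pstdev


def aggregate_states(diaries):
    """Average each infant's daily scores into one aggregated state score."""
    return {infant: mean(scores) for infant, scores in diaries.items()}


def standardized_beta(x, y):
    """Standardized slope of y on x for a single predictor (equals Pearson r)."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    return mean([((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)])


def convergent_beta(traits, diaries):
    """Relate baseline trait scores to aggregated daily states."""
    states = aggregate_states(diaries)
    ids = sorted(set(traits) & set(states))
    return standardized_beta([traits[i] for i in ids], [states[i] for i in ids])


# Toy data (invented): baseline trait scores and daily state scores
toy_traits = {"a": 2.4, "b": 4.0, "c": 5.2, "d": 3.6}
toy_diaries = {
    "a": [2, 3, 3],
    "b": [4, 5, 3, 4],
    "c": [5, 6, 5],
    "d": [4, 2, 4, 3],
}
print(round(convergent_beta(toy_traits, toy_diaries), 2))
```

A discriminant association would be computed the same way, pairing the trait score of one dimension with the aggregated states of a different dimension.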
Hypothesis 4: Trait negative affectivity as a predictor of within-person variability
As in Study 1, within-person variability in negative affectivity and within-person variability in orienting/regulation were correlated, r = .33, 95% CI: [.20; .45], p < .001. In addition, we observed correlations between variability in negative affectivity and surgency, r = .24, 95% CI: [.11; .37], p < .001, and between orienting/regulation and surgency, r = .43, 95% CI: [.31; .53], p < .001. However, after controlling for differences in the temperamental state mean levels and the covariates, baseline (trait) negative affectivity was only related to within-person variability in surgency, β = .16, 95% CI: [.02; .31], p = .025. There were no associations between baseline (trait) negative affectivity and within-person variability in negative affectivity, β = −.01, 95% CI: [−.16; .13], p = .847, or orienting/regulation, β = .03, 95% CI: [−.11; .17], p = .715. The pattern remained when the covariates of infant age and gestational age at birth were replaced with the infant’s corrected age (see Supplement on the OSF).
Exploratory analyses on the item level
Characteristics of Daily Temperament Items.
Note. Nmothers = 199; Ntotal observations = 1375; M w = mean within-person mean; SD w = mean within-person standard deviation; ICC = intra-class correlation; rid = association between aggregated state item and part-whole corrected aggregated corresponding state scale; rii = association between aggregated state item and corresponding trait item; rit = association between aggregated state item and corresponding trait scale. rid, rii, and rit reflect regression coefficients after controlling for infant age, gestational age at birth, and sex, as well as maternal age, educational level, and migration background. Brackets include the lower and upper limits of a 95% confidence interval.
Discussion
Study 2 aimed to replicate the results from Study 1 and extend them by including the temperament dimension of surgency and a baseline trait measure of infant temperament. As in Study 1, about 50% of the daily variance in the infant temperament states of negative affectivity and orienting/regulation could be attributed to the between-person level. Also, within- and between-person reliabilities were similar to Study 1, indicating both a substantial trait component of infant temperament and a substantial state component. Furthermore, like in Study 1, variability was larger for negative affectivity than for orienting/regulation.
However, these results did not translate to the temperament dimension of surgency. Admittedly, the ICC for surgency was similar to the ICCs for negative affectivity or orienting/regulation. Still, the reliabilities at both the within- and the between-person level were low, reflecting the relatively low reliability for surgency in the baseline trait measure. Low reliabilities for surgency had been reported before for German samples (Bayer et al., 2015; Sieber & Zmyj, 2022; Vonderlin et al., 2012) and might be due to some items not being adequately observable in the given timeframe. Indeed, mothers sometimes questioned the appropriateness of the timescales used in the questionnaires (Bayer et al., 2015). The results of the item-level analyses revealed that items related to the surgency dimension were frequently not applicable to the current day. For instance, a child might not be bathed every day, parents might not play “cuckoo” every day, and some parents might not play “cuckoo” at all. Thus, some items might not only be unsuitable for daily measurements but also result in low reliability (and validity) for standard trait measures of infant temperament.
Regarding construct validity, the convergent associations between aggregated state measures and the baseline trait measures were consistently larger than the discriminant associations for each scale. Thus, the data align with the assumptions of whole trait theory. Item-level analyses further demonstrated that the associations for aggregated items were larger with the aggregated state scales than with the baseline trait measure. This might indicate that the reliability coefficients for these temperament dimensions are higher on shorter timescales, namely, daily, than for a timespan “during the last seven days.” This is supported by the between-person reliabilities for negative affectivity and orienting/regulation, which were higher than those for the baseline trait dimensions. Thus, these results extend previous research arguing that the stability of temperament measures usually decreases with the length of the time interval between assessments (Bornstein et al., 2015; Stifter & Dollar, 2016) to the daily level.
Within-person variability was correlated across scales, suggesting a common underlying factor. However, contrary to our expectations and previous results from adult personality (e.g., Eid & Diener, 1999), trait negative affectivity as a precursor of neuroticism did not explain within-person variability except for the surgency dimension. One reason could be that during infancy, negative affectivity not only reflects a neuroticism-like trait, but crying, fussing, and whining also serve as a way of communication (de Weerth et al., 1999). In addition, infants depend on their caregivers to meet their needs and regulate their emotions (Pauen, 2016; Taipale, 2016). During the first year of life, parents learn how to respond adequately to their infant’s signals. They improve their soothing strategies and sort out strategies that did not work (Dayton et al., 2015). This changes the frequency of infant crying and how easily an infant can be soothed. Thus, in infants, negative affectivity might not be as predictive for variability across temperament dimensions as a developed personality trait like neuroticism has been for variability in adult personality.
General discussion
The present studies addressed the variability of infant temperament when measured by daily mother reports. Study 1 demonstrated substantial within- and between-person variability and Study 2 replicated these findings and further showed that daily measures of infant temperament systematically relate to trait measures.
Our first, primarily descriptive, aim was to study the within-person and between-person variability in daily measures of infant temperament. We found evidence that around 50% of the variance in daily measures of infant temperament can be attributed to between-person differences. To put these numbers into perspective, we can compare them to studies on variability in infant behavior and variability in child and adult affect. Regarding infant behavior, St. James-Roberts and Plewis (1996) found that within-person variability also accounted for around half (44%–53%) of the variability in sleeping, fussing, and crying from day to day. Our results regarding temperament states are comparable to these results. It seems that infant behavior, whether described on a more basic level or as a state expression of temperament, is characterized by variability and relatively stable individual differences. The results are also similar to results on daily affect in children and adults. In elementary school children, within-person variability accounted for 45%–66% of the variance in daily positive affect, negative affect, and interest over one month (Könen et al., 2016). In adults, within-person variability accounted for 46% of the variance in daily negative affect over eight consecutive days (Mroczek et al., 2003).
We compared the degree of variability across daily measurements for the three broad dimensions of temperament (negative affectivity, orienting/regulation, and surgency). Overall, negative affectivity showed more within-person variability than orienting/regulation (Studies 1 and 2) and surgency (Study 2). One possible explanation considers infant negative affectivity, particularly with regard to crying, whining, and fussing, not only as an infant’s characteristic but also as an infant’s way of communicating their needs (de Weerth et al., 1999). Thus, variability in negative affectivity is part of normal development (de Weerth et al., 1999; St. James-Roberts & Halil, 1991) but might decrease with the infant’s age as communication between infants and their caregivers advances, infants develop the ability to self-soothe, and more stable characteristics emerge (Pauen, 2016). In this study, we did not observe any associations of age and variability in infant state temperament. Still, as the sample size did not allow us to analyze more complex age effects, this question remains open.
Our second overarching aim was to apply whole trait theory (Fleeson & Jayawickreme, 2015, 2021) to infant temperament by investigating the extent to which state ratings of infant temperament align with trait ratings. The convergent associations between aggregated temperament states and their corresponding temperamental traits were strong (β ≈ .50) and consistently larger than the discriminant associations with different temperamental traits. The strength of these associations was similar to or even larger than the strength of associations between aggregated states and personality traits in adults (Fleeson & Gallagher, 2009; Rauthmann et al., 2019). This suggests that the descriptive side of whole trait theory can be applied to infant temperament as a precursor of personality traits.
Since there is considerable within-person variability, especially for negative affectivity, it is important to consider both the mean level of temperament and its variability. For instance, crying, whining, and fussing are a normal part of infant development. However, prolonged and excessive crying—that is, high levels of negative affectivity with low variability across days—is clinically relevant and might lead to long-term behavior problems (Hemmi et al., 2011; Zeifman & St James-Roberts, 2017).
The third aim of this study was to test to what extent variability in temperament states represents a trait and to what extent it is related to trait negative affectivity. We found that within-person variability in the different temperament dimensions was positively correlated. Trait negative affectivity was only related to within-person variability in surgency but not to within-person variability in negative affectivity or orienting/regulation.
Our final aim was to expand knowledge on the role of time scales in assessing infant temperament and, specifically, to provide information on which infant behaviors are observable on a daily basis. We found that several items used were not easily observable. Seven of the 15 items were rated as not applicable to the present day more than 10% of the time, five of these (one item assessing negative affectivity, one item assessing orienting/regulation, and three items assessing surgency) more than 25% of the time, and two of these items (“How often did your child seem angry (crying and fussing) when you left them in the crib?” and “When your child was put in the bath water today, how many times did they laugh?”) were even rated as not applicable more frequently than they were answered (i.e., more than 50%).
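The item screening described above amounts to computing, for each item, the share of diary entries marked as not applicable and comparing it against the 10%, 25%, and 50% thresholds. A minimal sketch follows; the item names and responses are invented for illustration, and `None` is an assumed coding for "not applicable".

```python
def not_applicable_rates(entries, thresholds=(0.10, 0.25, 0.50)):
    """Share of 'not applicable' responses per item, with exceeded thresholds.

    entries: list of dicts mapping an item name to a rating (1-7) or to
    None when the mother marked the item as not applicable that day.
    """
    items = sorted({name for entry in entries for name in entry})
    report = {}
    for item in items:
        answers = [entry[item] for entry in entries if item in entry]
        rate = sum(answer is None for answer in answers) / len(answers)
        exceeded = [t for t in thresholds if rate > t]
        report[item] = (rate, exceeded)
    return report


# Toy diary entries (invented item names and ratings)
toy_entries = [
    {"soothed_easily": 6, "laughed_in_bath": None},
    {"soothed_easily": 5, "laughed_in_bath": None},
    {"soothed_easily": 7, "laughed_in_bath": 6},
    {"soothed_easily": 6, "laughed_in_bath": None},
]
for item, (rate, exceeded) in not_applicable_rates(toy_entries).items():
    print(item, f"{rate:.0%}", exceeded)
```

An item whose rate exceeds .50 was marked as not applicable more often than it was answered, which is the situation described for the crib and bath items.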
We also tested the correspondence between each item answered using the trait instruction (“during the last seven days”) at baseline and aggregated across up to 10 days using the state instruction (“today”). Overall, we found a relatively high convergence between these two measures but also considerable variation. If we assume a relative stability of the behaviors assessed in the IBQ, which is supported by both our results and previous work on the test-retest reliability of the scale (Bornstein et al., 2015; Putnam et al., 2014; Worobey & Blajda, 1989), this convergence might be informative about the extent to which parent ratings in the trait version reflect what they observe in their infant’s daily behavior. For some items, this convergence is relatively low, and it is conceivable that for these items, the validity of the assessment could be improved by assessing it daily, allowing parents to report on their more immediate observations instead of recalling their infants’ behavior from several days ago.
Implications for the assessment of temperament
Our findings have implications for the assessment of infant temperament. First, given the between-person differences we observed and the convergence between aggregated state measures and trait measures, it seems that daily measures of infant temperament generally tap into relatively stable individual differences. Thus, our results imply that infant temperament can also be assessed at the daily level.
However, our findings raise concerns about some of the items in the IBQ, one of the most widely used parent questionnaires for evaluating infant temperament (Gartstein & Rothbart, 2003; Putnam et al., 2014). First, some of the IBQ items were not observable on a daily basis. In the IBQ, parents are asked to report how often their infant has displayed a specific behavior in the past week. Our study suggests that parents may have had limited opportunities to observe some behaviors, sometimes only once or twice per week, or even not at all, which calls into question whether the item can accurately reflect the infant’s behavior. Therefore, we suggest carefully examining the situational conditions described in temperament items (e.g., leaving the child in the crib and bathing the child) and determining whether these situations occur frequently enough for parents to report them meaningfully in the given timeframe. The frequency of such situational conditions might also vary across time and changes in parenting practices, environmental conditions (e.g., whether or not a child attends daycare), and cultures. Revised temperament questionnaires could then be based only on items that can be observed over short time intervals. Alternatively, caregivers could be asked to report on their infant’s behavior over a longer period than the past seven days. More generally, we argue that more attention should be paid to the role of situations when assessing infant and child temperament. This would also allow studying whether infants show behaviors consistently in some situations (e.g., approaching an unfamiliar object) but not in other similar situations (e.g., approaching an unfamiliar person). Assessment instruments that carefully consider the role of different situations would allow researchers to investigate behavioral signatures (Fournier et al., 2008; Mischel, 2004), that is, specific patterns of within-person variability across situations.
Second, we observed that the means of some items assessing state orienting/regulation and state surgency were relatively high, that is, close to or above 6 on a 7-point answer scale. In contrast, the means of the items assessing negative affectivity were closer to the scale midpoint. This suggests that some orienting/regulation and surgency items might be less able to differentiate between infants with high trait levels and that the relatively high means may limit our ability to observe variability. Analyzing the item characteristics of state assessments and selecting items accordingly so that within-person variability can be observed and reliably measured with sufficient validity (Mielniczuk, 2023) will be necessary to develop or improve both state measures of infant temperament and trait measures, particularly those using short timescales. In particular, it will be important to further increase the reliability of state measures of infant temperament, as low reliability may result in underestimation of between-person variability when using intra-class correlation (Wilms et al., 2020).
Theoretical implications
With the present studies, we responded to recent calls to apply insights from personality dynamics, specifically whole trait theory, to developmental psychology (Dykhuis et al., 2023). Our findings support the idea that whole trait theory can indeed be applied throughout the lifespan since in our sample of infants aged 6–18 months, we found temperament states and variability to be meaningful and, if aggregated, to converge with trait temperament. Like adult personality, infant temperament displays consistency over time and varies within person across days. Of course, the present effort can only be a start in bridging personality dynamics and developmental psychology, but our results are a promising starting point for the upcoming steps.
For example, although we employed the widely used intra-individual standard deviation as a starting point for assessing within-person variability in infant temperament, measures of within-person variability other than intra-individual standard deviation should be used to replicate the findings of these studies. Such attempts should include both alternative measures to assess within-person variability in infant temperament amplitude (e.g., intra-individual coefficient of variation) and measures that assess temporal dependencies in infant temperament from day to day (e.g., autocorrelations), as well as methods that simultaneously model within-person variability in amplitudes and temporal dependencies (Wang et al., 2012).
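The alternative indices mentioned here can be computed directly from one infant's series of daily scores. The sketch below is illustrative only: it assumes consecutive days without gaps, uses invented scores, and covers the simple descriptive indices rather than the joint modeling approach of Wang et al. (2012).

```python
from statistics import mean, stdev


def intraindividual_sd(scores):
    """Amplitude of day-to-day fluctuation around the infant's own mean."""
    return stdev(scores)


def coefficient_of_variation(scores):
    """Intra-individual SD relative to the infant's mean level."""
    return stdev(scores) / mean(scores)


def lag1_autocorrelation(scores):
    """Temporal dependency: how strongly today's score carries over to tomorrow.

    Assumes the scores are ordered and come from consecutive days.
    """
    m = mean(scores)
    numerator = sum((a - m) * (b - m) for a, b in zip(scores, scores[1:]))
    denominator = sum((a - m) ** 2 for a in scores)
    return numerator / denominator


# Toy series (invented): one infant's daily negative affectivity scores
daily_scores = [3, 4, 4, 5, 3, 2, 4]
print(round(intraindividual_sd(daily_scores), 2))
print(round(coefficient_of_variation(daily_scores), 2))
print(round(lag1_autocorrelation(daily_scores), 2))
```

Two infants can share the same intra-individual standard deviation while differing in autocorrelation: one drifts slowly between good and bad stretches, the other alternates from day to day.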
With manifestations of infant temperament changing substantially during the early years (Stifter & Dollar, 2016), within-person variability, changes of within-person variability over time, between-person differences in within-person variability, and daily within-person couplings of infant temperament with environmental factors (e.g., parenting behavior) could be particularly informative for understanding the development of trait temperament and personality within context. For instance, in developmental psychology, within-person variability is often seen as an indicator of long-term intra-individual change (Nesselroade, 1991) shaping between-person differences (Neubauer et al., 2023). As such, within-person variability should be higher during life transitions or might be indicative of sensitive developmental periods in which the environment has a stronger impact on development (Walasek et al., 2022). Regarding temperament and its development, there might be increased within-person variability at times of change (e.g., when a younger sibling is born or when the child enters daycare).
In adults, both biological and environmental factors influence personality development (Specht et al., 2014). Particularly in infancy and early childhood, parental co-regulation (e.g., calming down a crying infant) and parenting behavior in general (e.g., sensitive or harsh parenting) have been related to between-person differences in temperament (Samdan et al., 2020). Analyzing within-person couplings of temperament and environmental factors (e.g., parenting behavior) over time might, therefore, shed light on which environmental factors are relevant in shaping temperament and personality development at what time, as within-person couplings should be stronger for more important environmental factors.
Furthermore, between-person differences in such within-person couplings of state temperament and environmental factors could be interpreted as evidence for theoretical accounts proposing between-person differences in sensitivity to context (e.g., Belsky et al., 2007; Ellis et al., 2011). If a child is more sensitive to contextual factors (e.g., parenting behavior), this directly implies a covariation of these context factors with behavioral states (e.g., state temperament). However, although sensitivity to context is an inherently within-person research question, it has mainly been investigated with between-person study designs (Fischer et al., 2020). Thus, reliable measures of infant states (e.g., temperament states) could open new possibilities for analyzing sensitivity to context and how this sensitivity and changes in it (e.g., with age) might shape developmental change, particularly in the development of trait temperament.
Limitations
Several limitations of the present studies should be mentioned. First, our samples only comprised mothers, mainly of low-risk families with high socio-economic status. Both parents’ gender and education are related to the measurement of infant temperament (e.g., Casalin et al., 2012; Parade & Leerkes, 2008), and these characteristics limit the generalizability of our results. Likewise, our sample consisted of German-speaking mothers primarily living in Switzerland. Thus, the findings might not readily translate to other languages or cultural contexts.
Second, since we wanted to learn how mothers respond to standard temperament items when assessed at a daily level, we used a small set of temperament items that had not been developed for daily assessments. Our results showed that not all these items are suitable for daily assessments since they describe situations that do not typically occur daily. In addition, the 9 (Study 1) or 15 (Study 2) items focus on the three broad dimensions of negative affectivity, orienting/regulation, and surgency and do not lend themselves to more fine-grained analyses of narrower temperament dimensions. Future studies should try a broader range of items (e.g., from the long version of the IBQ) to find the items best suited for daily assessments while still reflecting all temperament dimensions and considering general guidelines for the assessment of personality states (e.g., Horstmann & Ziegler, 2020).
Third, our sample included mothers of infants between the ages of 6 and 18 months, covering a relatively broad age range within infancy. It is possible that the variability of temperament changes across this period. Although we used age as a covariate in our analysis, our sample size did not permit a more comprehensive examination of potential age effects, such as the trends documented by de Weerth et al. (1999), who showed that within-person variability in crying decreased after the age of ten months.
Conclusion
Both studies found substantial within-person variability in infant temperament and a strong trait component. The convergence between aggregated temperament states and trait measures of temperament suggests that whole trait theory can be applied during infancy. However, some items, particularly those related to surgency, were not applicable on a daily basis, which could impact the reliability and validity of commonly used trait measures of infant temperament.
Supplemental Material
Supplemental Material - How was your child’s temperament today and last week? Considering within-person variability in the measurement of infant temperament
Supplemental Material for How was your child’s temperament today and last week? Considering within-person variability in the measurement of infant temperament by Tilman Reinelt, Lisa Wagner, Debora Suppiger, Moritz M Daum, and Giancarlo Natalucci in European Journal of Personality
Footnotes
Acknowledgments
The authors thank Marco Bleiker, Clarissa Frey, Ronja Noser, and Rebecca Oertel for their assistance in conducting the studies.
Author contributions
T.R.: Conceptualization, investigation, data curation, formal analysis, writing—original draft, and writing—review and editing. L.W.: Conceptualization, investigation, formal analysis, writing—original draft, and writing—review and editing. D.S.: Investigation and writing—review and editing. M.M. D.: Writing—review and editing. G.N.: Writing—review and editing.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
We are grateful for the support of the Family Larsson-Rosenquist Foundation.
Open science statement
Data for Study 1 are published on Zenodo (https://doi.org/10.5281/zenodo.6946048), and data for Study 2 are available on the Open Science Framework. In addition, we provide all materials (e.g., the original wording of instructions and items) as well as the R scripts underlying all analyses presented.
Ethical statement
Supplemental Material
Supplemental material for this article is available online.
Notes
References
