Abstract
Intensive longitudinal designs have been used to examine the fluctuations of callous-unemotional (CU) traits and their dynamic links with daily correlates; however, scant research has explored how CU traits manifest in daily contexts at the within-person level. This study evaluated the multilevel factor structure and psychometric properties of a short version of the Inventory of CU Traits in daily contexts among adolescents (n = 99; 2,132 daily reports) and young adults (n = 313; 6,431 and 4,018 daily reports at waves 1 and 2, respectively). Both bifactor and correlated-factor models showed acceptable fit and reliability at the within- and between-person levels, though the general factor in the bifactor model demonstrated low reliability in university students. Longitudinal measurement invariance was supported among university students over a 2.5-year period, while structural differences emerged between the two samples. Findings highlight meaningful within-person fluctuations in daily CU traits. Future studies should evaluate the applicability of different factor models for a more accurate assessment across age groups.
Keywords
Callous-unemotional (CU) traits are characterized by shallow affect, lack of remorse and empathy, and lack of concern or interest in doing things well (Frick et al., 2014a, 2014b). A large body of research has shown that CU traits in childhood and adolescence are concurrently and prospectively associated with conduct problems (Colins et al., 2016), aggressive behaviors (Ciucci et al., 2014), antisocial behaviors (Cardinale & Marsh, 2020), risky sexual behaviors (Anderson et al., 2017), and substance use (Anderson et al., 2018), underscoring the significance of establishing valid and reliable tools to assess CU traits for both clinical and research purposes. CU traits are not immutable over time (Perlstein et al., 2023) and show substantial fluctuations (Goulter et al., 2024). Recent research has started to use intensive longitudinal designs (ILDs) to examine their fluctuations on a micro timescale and their dynamic links with daily antecedents and correlates (Goulter et al., 2024; Y. Zheng et al., 2025), offering unique insight into the within-person variability of CU traits.
Expanded from the original four items from the Antisocial Process Screening Device, the 24-item Inventory of Callous-Unemotional Traits (ICU; self-, parent-, and teacher-report versions; Frick, 2004) was developed and has since become among the most comprehensive and widely used measurement tools of CU traits among different age groups spanning from children to adults (e.g., Docherty et al., 2024; Kemp et al., 2024; Y. Zheng et al., 2021). Subsequent factor analyses primarily tested three structures/models: (a) a single-factor model, with all items loading on a single CU factor; (b) a correlated-factor model, with each item loading on one of three intercorrelated factors: Callousness (reduced empathic responding; e.g., “I did not care who I hurt to get what I want”), Uncaring (lack of concern about performance and relationships; e.g., “I work hard on everything I do,” reverse scored), and Unemotional (impoverished emotional experience and expression; e.g., “I did not show my emotions to others”; Byrd et al., 2013; Ray & Frick, 2020); and (c) a bifactor model, with each item loading on both an overarching CU factor and one of the three aforementioned specific factors, which are independent from each other. The bifactor model aligns with how CU traits are conceptualized in the Diagnostic and Statistical Manual of Mental Disorders (5th ed.; DSM-5; American Psychiatric Association [APA], 2013) as a specifier for Conduct Disorder (i.e., “with Limited Prosocial Emotions”). To qualify for this specifier, an individual must display at least two of four traits: lack of remorse or guilt, callousness/lack of empathy, unconcern about performance, and shallow or deficient affect. This framework implies a common underlying disposition while allowing for potential heterogeneity in its manifestation, thereby supporting the use of a bifactor model.
In such a model, all items load onto a general CU traits factor, reflecting their shared variance, while also allowing specific factors to retain unique variance. That is, although these items are thought to reflect a unified construct with shared causal or etiological processes, they may not always co-occur perfectly. The bifactor model accommodates this by modeling both the shared and distinct components of CU trait dimensions (Watts et al., 2024). Most studies also revealed that the bifactor model provides the best model fit across different age groups (e.g., adolescents aged 10–16 years, Ciucci et al., 2014; young adults, Byrd et al., 2013), although recent psychometric literature has called for caution in interpreting the superior performance of bifactor models on fit indices, which may be partly attributable to their greater flexibility to accommodate complexity (Reise et al., 2016).
Although the Unemotional factor captures a core aspect of CU traits—namely, shallow or restricted emotional expression (Byrd et al., 2013; Ray & Frick, 2020)—research using the ICU has often reported weak-to-marginal internal consistency for this factor, along with its modest relations with the other two factors and criterion variables (e.g., antisocial behaviors; see Cardinale & Marsh, 2020; Deng et al., 2019 for systematic reviews). Many studies have therefore excluded some or all items from the Unemotional factor, leading to the development of shortened ICU versions. For instance, Hawes et al. (2014) developed a 12-item parent-report version (ICU-12) among boys aged 6–12 years, which removed all but one item from the Unemotional subscale based on item-total correlations and the Item Response Theory (IRT) analysis. Further studies have suggested that an 11-item version (ICU-11), which excluded the only remaining Unemotional item from ICU-12, achieved a better fit across self- (detained females aged 12–17 years, Colins et al., 2016; male adult offenders, Y. Zheng et al., 2021), parent-, and teacher-report (children in Grades 1 to 6, Wang et al., 2020) versions. This revised scale has demonstrated measurement invariance across informants, detained status (e.g., incarcerated vs. community samples), and cultures (e.g., Allen et al., 2021; Corbelli et al., 2024; Wang et al., 2020; Y. Zheng et al., 2021). Nonetheless, it is important to note that the Unemotional factor still carries important theoretical and empirical relevance to the construct of CU traits, especially in calculating an overall CU traits score (Byrd et al., 2013; Kimonis et al., 2008; Ray & Frick, 2020).
During adolescence and young adulthood, psychopathology symptoms arise and manifest primarily in various situations, events, or relationships (Thunnissen et al., 2022; Walz et al., 2014). To investigate how these symptoms fluctuate in the daily contexts where they manifest while minimizing recall bias (Sliwinski, 2011; Y. Zheng & Goulter, 2024), a burgeoning number of studies have employed ILDs to understand day-to-day variability of psychopathology symptoms on a micro timescale as well as their associations with contextual factors and treatment outcomes. Although the term CU “traits” may seemingly suggest relatively stable individual characteristics among adolescents and young adults, research has indicated that these traits are not immutable and are sensitive to environmental factors including treatments (Docherty et al., 2024; Fleming et al., 2022; Frick et al., 2014a; Perlstein et al., 2023). This view aligns with contemporary personality theory, which recognizes that traits reflect stable patterns of thinking, feeling, and behaving, but may still fluctuate meaningfully across time and context (Fleeson, 2004; Soto & Tackett, 2015; Wright & Simms, 2016). In light of these views, some scholars have recommended alternative terms, such as CU behaviors, features, or symptoms (Schuberth et al., 2019; Waller & Hyde, 2017), to emphasize that CU traits are not fixed and may exhibit within-person fluctuations. To the best of our knowledge, however, only two recent studies have examined CU traits in daily contexts using ILDs. Following 99 adolescents for 30 days, Goulter et al. (2024) found that CU traits demonstrated substantial within-person fluctuations at both the item level (intraclass correlation [ICC] = .56–.78) and the subscale level (ICC = .60–.66), as well as cross-day associations with positive affect and conduct problems at the within-person level. The other study revealed that daily parental inconsistent discipline was positively associated with adolescents’ CU traits on the next day (Y. Zheng et al., 2025).
These diary studies measured CU traits by directly using ICU-11/12, as brief instruments are necessary for ILDs to minimize participant burden and accommodate time limits. However, the original ICU and its short versions have only been evaluated psychometrically with cross-sectional or conventional longitudinal (e.g., multi-year) designs (e.g., Byrd et al., 2013; Kemp et al., 2024). The identified (sub)factors are based on between-person level structures, which indicate whether people exhibit general or specific CU traits that are similar to or different from others. For example, a between-person factor implies that individuals who score higher than others on one item also tend to score higher than others on additional items within the same factor. In contrast, a within-person factor structure reflects how items co-fluctuate over time within the same individual (Li et al., 2025; H. Zheng & Zheng, 2025). For instance, a within-person factor suggests that when an individual scores higher than their own average on one item at one time, they are also likely to score higher than their average on other items within the same factor at the same time. Recent studies on positive and negative affect (Cooke et al., 2022) and externalizing and internalizing psychopathology (H. Zheng & Zheng, 2025) have shown that the between-person level structure does not always translate to the within-person level. Therefore, to accurately capture within-person fluctuations in CU traits, it is crucial to first establish the multilevel structure of CU traits measures while properly disentangling variance at the within- and between-person levels.
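The distinction between the two levels can be illustrated with a minimal sketch, written here in Python purely for exposition. It splits daily scores on two items into person means (between-person component) and deviations from those means (within-person component) and correlates each part separately. The function names and the toy data are hypothetical and are not taken from the study; the sketch treats scores as continuous and is not the multilevel factor model estimated in this article.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def between_within_correlations(diary):
    """diary maps a person ID to a list of (item_a, item_b) daily scores."""
    means_a, means_b, dev_a, dev_b = [], [], [], []
    for days in diary.values():
        m_a = mean(a for a, _ in days)
        m_b = mean(b for _, b in days)
        # Between-person component: one average per person.
        means_a.append(m_a)
        means_b.append(m_b)
        # Within-person component: daily deviations from the person's own mean.
        dev_a.extend(a - m_a for a, _ in days)
        dev_b.extend(b - m_b for _, b in days)
    return pearson(means_a, means_b), pearson(dev_a, dev_b)


# Hypothetical toy data: three people, four days, two items.
toy = {
    "p1": [(0, 1), (1, 0), (0, 1), (1, 0)],
    "p2": [(1, 2), (2, 1), (1, 2), (2, 1)],
    "p3": [(2, 3), (3, 2), (2, 3), (3, 2)],
}
between_r, within_r = between_within_correlations(toy)
print(between_r, within_r)
```

In this constructed example the two items rise together across people (between-person correlation of 1) yet move in opposite directions from day to day within each person (within-person correlation of -1), which is the kind of divergence across levels that motivates testing the structure at both levels.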
The secondary CU traits framework posits that, in contrast to primary CU traits, which stem from temperamental or genetically based emotional deficits, secondary CU traits are primarily shaped by adverse contextual factors, such as interpersonal stress and trauma (Craig et al., 2021; Schuberth et al., 2019). During the transition from adolescence to young adulthood, individuals often face challenges such as leaving the family of origin, shifts in financial dependence, and the need to establish new social ties (H. Zheng & Zheng, 2024). This heightened stress may lead some young adults to develop a “mask” of emotional detachment and callousness as an adaptive strategy to cope with stress (Craig et al., 2021). As a result, the patterns of item endorsement and the latent structure of CU traits may shift during this transition period. Evaluating whether the ICU functions equivalently across developmental stages is important to ensure that the observed changes in CU traits reflect actual developmental processes rather than measurement non-invariance.
Previous findings on the links between CU traits and psychopathology outcomes further suggest the existence of a potential secondary variant of CU traits and highlight the importance of distinguishing the factor structures of CU traits across different levels. While the total CU scores and all subfactor scores have consistently exhibited positive associations with externalizing problems (e.g., conduct problems; Cardinale & Marsh, 2020), their associations with internalizing problems are less consistent. Some studies reported that total CU and Uncaring scores were negatively linked with internalizing symptoms (e.g., depressive and anxiety symptoms; Colins et al., 2016; Hawes et al., 2014; Waller et al., 2015), whereas a meta-analysis found a slightly positive association (Cardinale & Marsh, 2020). One potential explanation for these mixed findings is that CU traits may exhibit divergent associations with internalizing symptoms over short- versus long-term periods and/or at different levels. Internalizing problems demonstrate substantial and meaningful daily within-person fluctuations (Walz et al., 2014; H. Zheng & Zheng, 2025). In the short term, emotional numbing and a lack of empathy may serve as a coping mechanism to help people detach from traumatic or distressing situations, creating temporary adaptive effects on mitigating the manifestation of internalizing symptoms (Craig et al., 2021). These short-term adaptive effects, however, may cumulatively lead to adverse outcomes in the long term. It remains to be explored how daily fluctuations in CU traits, when disentangled from between-person differences, are associated with internalizing and externalizing problems at the within-person level. Such an investigation could also help determine whether daily fluctuations in CU traits reflect meaningful variations and clarify whether factor structures across levels represent similar or distinct constructs.
The Present Study
The psychometric properties of the ICU have typically been evaluated in cross-sectional or conventional longitudinal designs at the between-person level. In response to the growing body of research using ILDs to investigate daily fluctuations in CU traits and, more broadly, psychopathology symptoms, this study aims to explore the factor structures and psychometric properties of the ICU at both within- and between-person levels in daily diary studies. The present study used two independent samples (one among adolescents and one among young adults), each with a month-long daily diary design, to evaluate CU trait structures previously identified and supported in the literature. A comprehensive set of standards was used to compare these models, including model fit, factor reliability and properties, longitudinal measurement invariance, as well as within- and between-person level criterion validity. We also explored whether the multilevel structure of CU traits remains invariant between adolescents and young adults. Based on previous studies, we expected that the bifactor model would show the best performance (better model fit and higher factor reliability) at the between-person level. Given the scarce literature on the CU traits structure at the within-person level, especially using daily diary designs, an exploratory approach was taken for within-person structures and measurement invariance, and we opted not to make any specific hypothesis. Regarding criterion validity, it was expected that CU traits factor scores would be positively associated with externalizing problems at both within- and between-person levels. In contrast, the associations between CU traits scores and internalizing problems were expected to be negative at the within-person level, but positive at the between-person level.
Method
Participants and Procedures
This study used data from two independent community-based samples of adolescents and young adults. The research procedure and instruments for both datasets received approval from the research ethics committee at the University of Alberta. Survey instruments were developed and administered using REDCap (Harris et al., 2019). We report all data exclusions, all manipulations, and all measures used in the study. Since this study involved analyses of existing datasets rather than collecting new data, determining sample size was not applicable.
University Student Sample
An initial sample of 313 Canadian university freshmen (Mage = 18.1 years, SD = 1.31, range 17–29, 72% female) completed a baseline survey and at least one day of a 30-day daily diary study (6,431 total observations, M = 21.43 days, SD = 9.65, 72.5% ≥ 20 days) between September and December 2019 in wave 1. Freshmen self-identified as Asian (53%), White (30.3%), Black (5%), Multiracial (4.7%), Native (0.7%), Latino or Hispanic (1%), and Others (5.3%). Two and a half years later, 194 (64% retention rate) participants took part in the baseline survey of wave 2 and at least one day of the 30-day daily diary study (4,018 total observations, M = 21.0 days, SD = 8.84, 68.0% ≥ 20 days) between February and May 2022. Participants who were retained across waves tended to be slightly younger (18.0 ± 0.7 vs. 18.5 ± 1.9, t[310] = 3.58, p < .001), but did not differ in sex, ethnicity-race, or parental education. Participants who did not participate in wave 2 showed slightly higher person-average means of the total CU traits (0.79 ± 0.37 vs. 0.70 ± 0.34, t[297] = 2.24, p = .013) and the Callousness factor (0.40 ± 0.45 vs. 0.29 ± 0.30, t[297] = 2.59, p = .010) in wave 1 than those who remained in the study, but did not differ in the Uncaring factor.
Participants were recruited from a large Western Canadian university through online advertisements, on-campus posters, and short in-class presentations. For wave 1, all first-year undergraduate students were eligible for inclusion, while wave 2 was limited to those who had completed at least the baseline survey in wave 1. In both waves, participants first completed a baseline survey after providing informed consent online, and then participated in daily surveys for 30 consecutive days. Daily surveys were sent by email at 7 pm each night and participants were asked to complete the survey before going to sleep that night. Participants received a $60 e-gift card in wave 1 and a $75 e-gift card in wave 2 as compensation (see Cooke et al., 2022; H. Zheng & Zheng, 2025 for more information on recruitment procedures).
Adolescent Sample
A total of 99 Canadian adolescents (Mage = 14.60 years, SD = 1.76, range 12–17, 55.8% female) participated in a 30-day daily diary study (2,132 total observations, M = 21.6 days, SD = 7.80, 77.8% ≥ 20 days) between April 2019 and September 2020. Participants self-identified as 51.5% White, 23.2% Asian, 8.1% multiracial, 4.0% Latinx or Hispanic, 2.0% Black, 6.1% other, and 5.1% missing. Participants were recruited through newsletters, social media, and flyers posted or circulated in a Western Canadian province. Participants received an online baseline survey after providing consent/assent online, and an online daily survey 5 days after the completion of the baseline survey for 30 consecutive days. The daily survey was sent out at 5 pm, and adolescents were asked to fill out the survey before bed that night. Adolescents received a $45 e-gift card as compensation for their participation. Parental consent and adolescent assent were obtained prior to commencing the study (See H. Zheng et al., 2023 for more information on recruitment procedures).
Measures
Daily Callous-Unemotional Traits
Daily CU traits were assessed with an 11-item shortened version of the Inventory of Callous-Unemotional Traits (e.g., “I do not care if I get into trouble.” Colins et al., 2016; Frick, 2004; Wang et al., 2020) in the daily survey. One item (i.e., “I apologize to persons I hurt”) was excluded because it was deemed to occur infrequently in the daily lives of adolescents and young adults and to show low within-person fluctuations. Therefore, the final measure included 10 items comprising two subscales: Callousness (6 items) and Uncaring (4 items). Participants reported the extent to which they felt the way described by these items on that day on a 4-point scale from 0 (not at all true) to 3 (definitely true). Reverse-worded items were recoded before analyses. Higher scores indicate higher levels of CU traits.
Criterion Validity Measures
Daily Emotional and Conduct Problems
In the daily survey, emotional problems were measured with five items from the emotional problem subscale (e.g., “I have many fears.” “I worry a lot.”) of the Strengths and Difficulties Questionnaire (SDQ; Goodman et al., 1998) validated in daily diary research (H. Zheng & Zheng, 2025). Participants indicated how each item applied to them on that day on a 3-point scale (0 = not true, 1 = somewhat true, 2 = certainly true). Conduct problems were measured with five items from the conduct problem subscale for adolescents and three items (“I got very angry and lost my temper,” “I was generally willing to do what other people want,” and “I fight a lot.”) for university students. Scores were averaged with higher scores representing more emotional (university students wave 1: ordinal ωw = .61, ωb = .90; wave 2: ordinal ωw = .63, ωb = .86; adolescents: ordinal ωw = .73, ωb = .94) and conduct problems (university students wave 1: ordinal ωw = .72, ωb = .90; wave 2: ordinal ωw = .70, ωb = .87; adolescents: ordinal ωw = .65, ωb = .92), respectively.
Daily Anxiety Symptoms
University students reported their daily panic disorder (3 items; e.g., “When I got frightened, my heart beats fast.”), social (2 items; e.g., “I felt nervous with people I didn’t know well.”), and generalized anxiety (3 items; e.g., “I was nervous.”) symptoms modified from the Screen for Adult Anxiety Related Emotional Disorders (Angulo et al., 2017; Li et al., 2025) on a 3-point scale (0 = not true, 1 = somewhat true, 2 = very true) in wave 1. In wave 2, only panic disorder and social anxiety symptoms were assessed. Scores were averaged with higher scores reflecting more panic disorder (wave 1: ordinal ωw = .56, ωb = .87; wave 2: ordinal ωw = .65, ωb = .82), social (correlations between two items in wave 1: rw = .48, rb = .76; wave 2: rw = .48, rb = .80), and generalized anxiety (wave 1: ordinal ωw = .57, ωb = .92) symptoms. This measure was not implemented in the adolescent sample.
Depressive Symptoms
Depressive symptoms were measured using 17 items adapted from the Center for Epidemiological Studies Depression scale (Radloff, 1977) in the two baseline surveys. University students indicated how often the statements described them in the past year on a 4-point scale (0 = rarely or none, 1 = some or a little of the time, 2 = occasionally or a moderate amount of time, 3 = most or all of the time). Items were averaged with a higher score indicating higher levels of depressive symptoms (ordinal ωs = .90–.91). This measure was not implemented in the adolescent sample.
Analytic Strategy
Model Estimation
Multilevel confirmatory factor analyses (MLCFAs) using the weighted least squares mean- and variance-adjusted (WLSMV) estimator (the default for ordinal items) were used to examine the structure of the 10-item ICU at the within- and between-person levels. All observed variables were treated as ordinal. First, ICCs for ICU items were calculated to determine the appropriateness of multilevel modeling. Next, models were estimated using data from the first wave of the university student sample. Three structures/models of the ICU were tested: a single-factor model (i.e., a single factor at each level), a correlated-factor model with two factors, and a bifactor model with two specific factors. To investigate whether the ICU potentially exhibits distinct structures at different levels, models with varying structures across levels were also estimated. Certain subpar models were excluded based on their structural validity. The remaining models were retained for estimation in wave 2 of the university student sample and the adolescent sample to examine their replication across samples. All analyses were conducted in Mplus 8.3 (Muthén & Muthén, 1998–2017).
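As a rough illustration of what an item-level ICC expresses, the following Python sketch computes the classical one-way ANOVA estimator for unbalanced diary data. It is an assumption-laden simplification: scores are treated as continuous, whereas the ICCs reported in this study come from Mplus models for ordinal items, so values from this estimator would not necessarily match. The function name and the example scores are hypothetical.

```python
from statistics import mean


def icc_oneway(groups):
    """One-way random-effects ICC(1) for unbalanced data.

    groups: one list of daily scores per person.
    """
    g = len(groups)
    n_total = sum(len(x) for x in groups)
    grand = sum(sum(x) for x in groups) / n_total
    person_means = [mean(x) for x in groups]

    # Between-person and within-person sums of squares.
    ssb = sum(len(x) * (m - grand) ** 2 for x, m in zip(groups, person_means))
    ssw = sum((v - m) ** 2 for x, m in zip(groups, person_means) for v in x)
    msb = ssb / (g - 1)
    msw = ssw / (n_total - g)

    # Adjusted average number of days per person (handles unequal compliance).
    k0 = (n_total - sum(len(x) ** 2 for x in groups) / n_total) / (g - 1)
    return (msb - msw) / (msb + (k0 - 1) * msw)


# Hypothetical scores for one item: three people, four days each.
example = [[0, 1, 0, 1], [2, 3, 2, 3], [1, 2, 1, 2]]
icc = icc_oneway(example)
print(round(icc, 3))
```

An ICC near 1 would mean that almost all variation lies between people, whereas lower values leave more day-to-day variation to be modeled at the within-person level.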
Structural Validity
Model fit was evaluated against conventional cutoffs (Hu & Bentler, 1999): root mean square error of approximation (RMSEA) < .05, standardized root mean square residual (SRMR) < .08, comparative fit index (CFI) > .90, and Tucker–Lewis index (TLI) > .90. Because the CFI, TLI, RMSEA, and within-person SRMR (SRMRw) are not sensitive to misspecification at the between-person level (Hsu et al., 2015), the between-person SRMR (SRMRb) was used to assess model fit at that level.
Factor reliability was evaluated with the following indices. The index of construct replicability (H index) assesses how well a latent factor can be replicated across studies, with H values > .80 for general factors and > .70 for specific factors indicating optimal reliability (Rodriguez et al., 2016). For the correlated-factor model, omega subfactor (ωs) values > .75 are considered good (Revelle & Condon, 2019). In the bifactor model, omega hierarchical (ωh) and omega hierarchical specific (ωhs) indicate the proportion of total score variance specifically attributable to the general and specific factors, respectively, with ωh/ωhs > .50 demonstrating acceptable reliability. Explained common variance (ECV) quantifies the percentage of common variance explained by each latent factor, which shows the relative strength of factors and the extent of unidimensionality (Rodriguez et al., 2016).
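For readers who wish to see how these indices follow from standardized loadings, a minimal Python sketch is given below. It uses the standard formulas summarized by Rodriguez et al. (2016) for an orthogonal bifactor model with standardized items; the function names and the loadings are hypothetical and do not reproduce the values in Table 3 or Table 4, and the sketch is not the code used for the reported analyses.

```python
def h_index(loadings):
    """Construct replicability H from standardized loadings on one factor."""
    s = sum(l * l / (1 - l * l) for l in loadings)
    return s / (1 + s)


def bifactor_indices(gen, spec):
    """Omega hierarchical, omega hierarchical specific, and ECV.

    gen:  general-factor loadings, one per item.
    spec: dict mapping a specific-factor name to a list of
          (item_index, loading) pairs.
    """
    # Item uniqueness = 1 - communality (standardized, orthogonal factors).
    uniq = [1 - g * g for g in gen]
    for pairs in spec.values():
        for i, l in pairs:
            uniq[i] -= l * l

    gen_part = sum(gen) ** 2
    spec_part = {k: sum(l for _, l in v) ** 2 for k, v in spec.items()}
    total_var = gen_part + sum(spec_part.values()) + sum(uniq)

    gen_common = sum(g * g for g in gen)
    all_common = gen_common + sum(l * l for v in spec.values() for _, l in v)

    omega_hs = {}
    for k, v in spec.items():
        idx = [i for i, _ in v]
        # Variance of the subscale score formed by this factor's items.
        sub_var = (sum(gen[i] for i in idx) ** 2 + spec_part[k]
                   + sum(uniq[i] for i in idx))
        omega_hs[k] = spec_part[k] / sub_var

    return {"omega_h": gen_part / total_var,
            "omega_hs": omega_hs,
            "ecv_general": gen_common / all_common}


# Hypothetical loadings: four items, two specific factors.
gen = [0.5, 0.5, 0.5, 0.5]
spec = {"S1": [(0, 0.6), (1, 0.6)], "S2": [(2, 0.6), (3, 0.6)]}
indices = bifactor_indices(gen, spec)
h_general = h_index(gen)
print(indices, h_general)
```

With these made-up loadings the general factor falls below the .50 benchmark for ωh, a pattern that arises whenever specific loadings outweigh general loadings.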
Longitudinal and Multigroup Measurement Invariance
Longitudinal measurement invariance (MI) was assessed in the university student sample. Unconstrained models were compared with models in which factor loadings (metric MI), thresholds (scalar MI), and residual variances (strict MI) were constrained to be equal across the two waves (Widaman et al., 2010). MI at the within-person level is indicated by a decrease in CFI (ΔCFI) ≤ .01 and a change in RMSEA (ΔRMSEA) ≤ .015 (Khojasteh & Lo, 2015). MI at the between-person level is indicated by a change in SRMRb (ΔSRMRb) ≤ .030 (Khojasteh & Lo, 2015).
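The decision rules above can be expressed compactly, as in the Python sketch below. The function and dictionary keys are hypothetical, the fit values are invented for illustration, and the sketch assumes that each change is computed as the worsening of fit when moving from the less constrained to the more constrained model (a drop in CFI, an increase in RMSEA or SRMRb).

```python
def invariance_supported(less_constrained, more_constrained, level):
    """Apply the change-in-fit cutoffs for one invariance step.

    Each model is a dict with keys "cfi", "rmsea", and "srmr_b".
    level is "within" or "between".
    """
    if level == "within":
        cfi_drop = less_constrained["cfi"] - more_constrained["cfi"]
        rmsea_rise = more_constrained["rmsea"] - less_constrained["rmsea"]
        return cfi_drop <= 0.01 and rmsea_rise <= 0.015
    if level == "between":
        srmr_rise = more_constrained["srmr_b"] - less_constrained["srmr_b"]
        return srmr_rise <= 0.030
    raise ValueError("level must be 'within' or 'between'")


# Hypothetical fit values for three nested models.
configural = {"cfi": 0.960, "rmsea": 0.030, "srmr_b": 0.050}
metric = {"cfi": 0.955, "rmsea": 0.032, "srmr_b": 0.060}
scalar = {"cfi": 0.940, "rmsea": 0.036, "srmr_b": 0.070}

metric_within = invariance_supported(configural, metric, "within")
metric_between = invariance_supported(configural, metric, "between")
scalar_within = invariance_supported(metric, scalar, "within")
print(metric_within, metric_between, scalar_within)
```

In this invented sequence the metric step passes at both levels, while the scalar step fails at the within-person level because CFI drops by more than .01.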
Multigroup MI was tested between the adolescent sample and wave 1 of the university student sample using the maximum likelihood estimator with robust standard errors (MLR). Since Mplus does not support multigroup analysis with multilevel models for categorical items, the ICU items were treated as continuous variables in this analysis only. The same criteria as those used for longitudinal MI were applied to indicate MI between the two groups.
Within- and Between-Person Criterion Validity
Factor scores estimated with the Bayesian estimator were used as observed variables to examine criterion validity. At the within-person level, concurrent predictive validity was evaluated by correlating each latent factor with same-day criterion measures. The between-person concurrent validity was examined by incorporating the correlations between latent factors at the between-person level and person-average levels of validity variables, reflecting the associations between the random intercepts of these components across individuals.
Transparency and Openness
This study was not preregistered. The code and output files for all the analyses are publicly available (https://osf.io/rh7mv). Data are not publicly available due to ethics agreements. However, the data required for the analyses performed in the study are available from the corresponding author upon reasonable request.
Results
Descriptive Statistics
All ICU items showed moderate to high ICCs in the university student (wave 1: .48–.64; wave 2: .46–.62) and adolescent (.56–.77) samples (Table 1). This indicates that approximately 46% to 77% of the variation in ICU items occurred at the between-person level, while the remaining variation can be attributed to within-person fluctuations over days.
Intraclass Correlations and Frequencies of ICU Items.
Structural Validity
University Student Sample
All three structures/models were fully crossed to enumerate all possible combined structures across levels in wave 1 of the university student sample. Fit indices (Table 2) reveal that models with the single-factor model at either within- or between-person level showed unacceptable model fit. The other models, comprising bifactor and correlated-factor models, demonstrated acceptable-to-good fit. Generally, the bifactor models fit the data better than the correlated-factor model at both within- (higher CFI and TLI, and lower RMSEA and SRMRw) and between- (lower SRMRb) person levels. Based on these results, we proceeded with only the bifactor and correlated-factor models to examine replication in wave 2 of the university student sample and the adolescent sample. All four models estimated with wave 2 data demonstrated acceptable-to-good fit (Table 2). The bifactor models provided a better fit to the data than the correlated-factor models at both the within- and between-person levels.
Fit Indices for the Multilevel Confirmatory Factor Analyses for Daily CU Traits.
Note. CF = Correlated Factors, RMSEA = root mean square error of approximation, SRMR = standardized root mean square residual, CFI = comparative fit index, TLI = Tucker–Lewis index.
In the correlated-factor models (Table 3), all factor loadings were positive and significant, with all indicators showing loadings ≥ .35 at both levels. For bifactor models, the specific factors were strongly indicated by the items at both within- (wave 1 & 2 median λw = .61 and .57) and between- (wave 1 & 2 median λb = .83 and .79) person levels. Loadings on the general factors were relatively lower at both within- (wave 1 and 2 median λw = .26 and .34) and between- (wave 1 and 2 median λb = .45 and .48) person levels, with two items showing non-significant factor loadings at the between-person level in each wave.
Standardized Factor Loadings in the Models.
Note. CF = Correlated Factors, Gen = General factor, Spec = Specific factor, w = within-person level, b = between-person level. All loadings are significant, ps < .05, except those in italic. Items 1, 5, 6, 7, 10, and 11 loaded onto the Callousness (specific) factor, and items 2, 4, 9, and 12 onto the Uncaring (specific) factor.
Overall, factor reliability was greater at the between-person level than at the within-person level (Table 4). The correlated-factor models were reliable and well-defined at both levels across waves, except for the Uncaring factor at the within-person level, showing unsatisfactory reliability (i.e., ωs < .75). In the bifactor models, the specific factors generally showed acceptable reliability (i.e., ωhs > .50) except for the Uncaring specific factor at the within-person level in wave 1. At both waves, the general factor did not reliably capture variances (i.e., ωh < .50) at either level.
Factor Reliability Indices.
Note. All models were estimated using the WLSMV estimator. H index = index of construct replicability, ωh = omega hierarchical, ωhs = omega hierarchical specific, ωs = omega subfactor, ECV = explained common variance.
Adolescent Sample
Consistent with the university student sample, all four models among adolescents showed acceptable-to-good fit (Table 2). In the correlated-factor model, all factor loadings were positive and significant, with only one item in the Uncaring factor having loadings < .35 at the between-person level (Table 3). In the bifactor model, the factor loadings of the specific factors were not as strong as those observed among university students (median λw = .47; median λb = .42), whereas the loadings for the general factors were higher than those among university students (median λw = .45; median λb = .67). At the within-person level, one and two items had non-significant factor loadings on the general and specific factors, respectively. At the between-person level, three out of four factor loadings appeared to be non-significant within the Uncaring specific factor.
The reliability indices of the correlated-factor models in the adolescent sample showed a generally consistent pattern with those observed in the university student sample (Table 4). In the bifactor models, the specific factors generally showed unacceptable reliability (i.e., ωhs < .50), with the exception of the Uncaring specific factor at the within-person level. Nonetheless, the general factor exhibited good reliability at both levels (i.e., ωh > .50). At the between-person level, the models were unidimensional to some extent since the general factor accounted for the most variance (ECVb of the general factor = .71) and showed superior reliability and replicability (ωh-b = .78; Hb = .96).
Longitudinal and Multigroup Measurement Invariance
As shown in Table 5, both the correlated-factor and bifactor models achieved longitudinal metric MI at the within-person level (i.e., ∆CFI ≤ .01 and ∆RMSEA ≤ .015), as well as metric, scalar, and strict MI at the between-person level (i.e., ∆SRMRb ≤ .030). The correlated-factor model revealed no significant difference between the latent means of the two factors over time. The bifactor model indicated a slight increase in the general factor from wave 1 to wave 2 (Diff = .19, SE = .10, p = .048), while the two specific factors showed no significant change.
Longitudinal Measurement Invariance Tests Among the University Student Sample.
Note. w = within-person level, b = between-person level. In the unconstrained models, the means of latent factor scores at the within-person level in both waves and at the between-person level in wave 1 were fixed to 0, and the variances at both levels in wave 1 were fixed to 1. The first factor loading of each latent factor was freely estimated, while corresponding first loadings at the same level were constrained to be equal across waves 1 and 2. The first threshold of each latent factor was constrained to be invariant across time at each level (Widaman et al., 2010).
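The invariance decisions above rest on simple change-in-fit cutoffs, which can be sketched as follows. The thresholds (∆CFI ≤ .01, ∆RMSEA ≤ .015, ∆SRMRb ≤ .030) are those stated in the text; the fit values, the `Fit` container, and the function name are hypothetical, and the sketch assumes each delta is read as a worsening of fit (a drop in CFI, a rise in RMSEA or SRMR) when moving to the more constrained model.

```python
from dataclasses import dataclass


@dataclass
class Fit:
    cfi: float     # comparative fit index
    rmsea: float   # root mean square error of approximation
    srmr_b: float  # between-person standardized root mean square residual


def invariance_supported(less_constrained, more_constrained):
    """Apply the change-in-fit cutoffs to a pair of nested models."""
    d_cfi = less_constrained.cfi - more_constrained.cfi          # drop in CFI
    d_rmsea = more_constrained.rmsea - less_constrained.rmsea    # rise in RMSEA
    d_srmr_b = more_constrained.srmr_b - less_constrained.srmr_b  # rise in SRMR-between
    return {
        "within": d_cfi <= 0.01 and d_rmsea <= 0.015,
        "between": d_srmr_b <= 0.030,
    }


# Hypothetical fit values (NOT taken from Table 5 or Table 6).
configural = Fit(cfi=0.960, rmsea=0.030, srmr_b=0.050)
metric = Fit(cfi=0.955, rmsea=0.032, srmr_b=0.058)
decision = invariance_supported(configural, metric)
print(decision)
```

Keeping the within- and between-person decisions separate mirrors the way the tests are reported here, where a model can pass at one level and fail at the other.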
As shown in Table 6, the correlated-factor model demonstrated full MI across the adolescent sample and wave 1 of the university student sample. University students exhibited a higher latent mean of Uncaring (Diff = .59, SE = .06, p < .001) and a marginally higher latent mean of Callousness (Diff = .08, SE = .04, p = .051) than adolescents. Regarding the bifactor model, metric MI at the between-person level was not supported (∆SRMRb > .030). Thus, further steps were not conducted at the between-person level. At the within-person level, metric MI was supported, but strict MI was violated (∆CFI > .01).
Multigroup Measurement Invariance Tests Among the Adolescent Sample and the First Wave of the University Student Sample.
Note. w = within-person level, b = between-person level. All models were estimated with the MLR estimator.
Within- and Between-Person Criterion Validity
University Student Sample
In both waves (Table 7), the Callousness factor in the correlated-factor model was positively correlated with all validity variables at the within- (wave 1: rw = .09–.22; wave 2: rw = .11–.23) and between- (wave 1: rb = .26–.55; wave 2: rb = .26–.49) person levels. The Uncaring factor was correlated with conduct problems in both waves at the within- (wave 1 & 2: rw = .28 and .27) and between- (wave 1 & 2: rb = .60 and .66) person levels. However, the Uncaring factor was negatively correlated with internalizing symptoms. Specifically, at the within-person level, Uncaring was negatively associated with same-day emotional problems (rw = −.05, p = .021), social anxiety (rw = −.06, p = .009), and generalized anxiety (rw = −.12, p < .001) symptoms in wave 1, but not in wave 2. At the between-person level, Uncaring was negatively correlated with person-average levels of emotional problems (rb = −.12, p < .001) and generalized anxiety symptoms (rb = −.22, p < .001) in wave 1.
Criterion Validity Tests at the Within- and Between-Person Level.
Note. Emo = Emotional Problems; Con = Conduct Problems; Panic = Panic Disorder Symptoms; Soc = Social Anxiety Symptoms; Gen = Generalized Anxiety Symptoms; Dep = Depressive Symptoms.
*p < .05. **p < .01. ***p < .001.
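The distinction between within-person (rw) and between-person (rb) correlations can be illustrated with a manifest-variable sketch: daily scores are split into person means and daily deviations from those means, and each part is correlated separately. This is only an approximation for illustration; the correlations in Table 7 come from latent multilevel models, and the toy data and function names below are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def within_between_r(records):
    """records: iterable of (person_id, x, y) daily reports.

    Returns (r_within, r_between): the correlation of person-mean-centered
    daily deviations, and the correlation of the person means.
    """
    by_person = defaultdict(list)
    for pid, x, y in records:
        by_person[pid].append((x, y))

    mean_x, mean_y, dev_x, dev_y = [], [], [], []
    for days in by_person.values():
        mx = sum(x for x, _ in days) / len(days)
        my = sum(y for _, y in days) / len(days)
        mean_x.append(mx)
        mean_y.append(my)
        dev_x.extend(x - mx for x, _ in days)  # daily deviation from own mean
        dev_y.extend(y - my for _, y in days)
    return pearson(dev_x, dev_y), pearson(mean_x, mean_y)


# Toy data (hypothetical): three people, three days each.
records = [
    ("A", 1, 3), ("A", 2, 2), ("A", 3, 1),
    ("B", 4, 6), ("B", 5, 5), ("B", 6, 4),
    ("C", 7, 9), ("C", 8, 8), ("C", 9, 7),
]
r_within, r_between = within_between_r(records)
print(r_within, r_between)
```

In this toy example the two correlations have opposite signs, which shows why an association observed across people need not hold for day-to-day fluctuations within a person.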
In the bifactor model, the general factor was correlated with almost all validity variables at the within-person level (wave 1: rw = .10–.18; wave 2: rw = .05–.26), with the exception of panic disorder symptoms in wave 2. At the between-person level, all person-average validity variables were correlated with general factor scores in wave 1 (rb = .15–.46) and wave 2 (rb = .16–.34). The Callousness specific factor exhibited a generally consistent pattern with the Callousness factor in the correlated-factor model, showing positive correlations with almost all validity variables at both levels in both waves. Regarding the Uncaring specific factor, it only showed positive correlations with conduct problems at both levels, but exhibited negative correlations with internalizing symptoms at both levels.
Regarding prospective correlations, after controlling for the corresponding wave 1 validity variable, the wave 1 Callousness factor in the correlated-factor model was positively correlated with emotional problems (r = .27, p < .001) and generalized anxiety (r = .15, p = .050) in wave 2. The wave 1 Uncaring factor was positively correlated only with conduct problems (r = .23, p = .001) and not with any internalizing symptoms in wave 2. In the bifactor model, the general factor (r = .17, p = .018) and the Callousness specific factor (r = .23, p = .002) in wave 1 were positively correlated with emotional problems in wave 2. The Uncaring specific factor was positively correlated with conduct problems (r = .24, p < .001).
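One common way to express an association that controls for a baseline variable is the first-order partial correlation, sketched below. The text does not specify how the control for the wave 1 validity variable was implemented, so this is an assumption for illustration only; the function name and input values are hypothetical.

```python
from math import sqrt


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z.

    Here x could be a wave 1 CU factor, y a wave 2 validity variable,
    and z the same validity variable measured at wave 1.
    """
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical zero-order correlations (NOT values from the study).
r_controlled = partial_r(r_xy=0.40, r_xz=0.30, r_yz=0.50)
print(round(r_controlled, 3))
```

The partial correlation shrinks toward zero when much of the raw association between x and y is carried by their shared relation to the baseline variable.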
Adolescent Sample
The Callousness factor in the correlated-factor model was positively correlated with emotional and conduct problems at both levels, while the Uncaring factor was only positively correlated with conduct problems (rw = .13, p < .001; rb = .31, p < .001) but not with emotional problems (rw = .05, p = .087; rb = .18, p = .162). In the bifactor model, the general factor was correlated with conduct problems at the within-person level (rw = .09, p = .004), as well as both validity variables at the between-person level (rb = .23–.67). The Callousness specific factor was positively correlated with both validity variables at both levels, while Uncaring was only positively correlated with conduct problems.
Discussion
This study examined the within- and between-person factor structure of CU traits in daily contexts using two independent samples with month-long daily diary designs. The results indicated that both the bifactor and correlated-factor models demonstrated acceptable fit at the within- and between-person levels, though the general factor in the bifactor model in the university student sample showed low reliability and replicability. Longitudinal MI was observed within the university sample over a 2.5-year span, while structural differences emerged between adolescents and university students. At both levels, the general factor and the Callousness (specific) factor were positively associated with internalizing and externalizing problems across both samples. In contrast, the Uncaring (specific) factor was positively associated with conduct problems in both samples but negatively associated with internalizing problems among university students.
Corroborating previous studies conducted at the between-person level (Byrd et al., 2013; Ciucci et al., 2014; Wang et al., 2020; Y. Zheng et al., 2021), conventional fit indices suggest a preference for the bifactor model at both within- and between-person levels. These results warrant cautious interpretation, as these fit indices tend to favor models with greater flexibility (Reise et al., 2016). In the university student sample, the general factors explained < 35% of the total variance at both levels and showed low reliability (ωh < .50) and replicability (H index < .70), suggesting that this general factor may primarily reflect absorbed measurement error rather than a true latent factor (Rodriguez et al., 2016). These results indicate that there is no reliable general factor at either within- or between-person level in this sample. In contrast, the general factor showed acceptable reliability and replicability in the adolescent sample, particularly at the between-person level.
The measurement non-invariance across the two groups further emphasizes that the bifactor model, particularly at the between-person level, is not equivalent across different age groups. This discrepancy aligns with the abovementioned findings regarding divergent psychometric validity across the two samples and suggests that the factor structure of CU traits may change substantially from adolescence to young adulthood. Using a bifactor model may be more appropriate for adolescents. The unidimensionality observed at the between-person level suggests that future studies using ILDs in adolescents could reasonably rely on the general factor of CU traits (Ray & Frick, 2020; Reise et al., 2016; Rodriguez et al., 2016). This finding aligns with a previous meta-analysis, which demonstrated that the reliable variance in the total score of CU traits was largely determined by the general factor in the bifactor models, thereby recommending simply using the total score rather than subscale scores in future studies (Ray & Frick, 2020). Nonetheless, at the within-person level, both the general and specific factors should be considered for a more comprehensive understanding of CU traits in adolescents. If the research specifically focuses on the two subfactors, the correlated-factor model can also be applied to generate separate subfactor scores. In the young adult sample, the unreliable general factors indicate that items in the Callousness and Uncaring factors appear more heterogeneous and distinct and share fewer co-fluctuations. Studies in this population may benefit from assessing Callousness and Uncaring as separate factors to more accurately capture CU traits and to explore their potentially divergent associations with antecedents and behavioral outcomes.
Consistent with findings from between-person level studies (e.g., Hawes et al., 2014), CU traits were consistently associated with conduct problems at the within-person level in both samples. The negative associations between the Uncaring (specific) factor and emotional problems and anxiety symptoms in university students add to existing evidence (e.g., Fontaine et al., 2023; Waller et al., 2015) that CU traits may develop as a coping strategy that offers a short-term protective effect against internalizing symptoms for young adults. The challenges during the transition from adolescence to young adulthood may prompt emotional numbing and avoidance of stressful situations, such as neglecting performance in critical contexts (Byrd et al., 2013; Craig et al., 2021), thereby reducing exposure to stressors that could otherwise exacerbate internalizing symptoms. It has been proposed that these short-term adaptive processes might cumulatively increase internalizing problems over the long term (Craig et al., 2021). However, we did not find evidence supporting this hypothesis in either the between-person associations, which were negative, or the prospective correlations, which showed no link between Uncaring and internalizing symptoms 2.5 years later. It should be noted that the between-person level results in this study do not fully represent long-term effects, as they only capture associations between person-average means over a relatively short 30-day period. These findings underscore the complexity of the link between CU traits and internalizing symptoms and highlight the need for further exploration across various timescales and developmental periods. Such approaches could clarify whether the short-term effects between CU traits and internalizing symptoms persist and how they relate to long-term effects (Y. Zheng & Goulter, 2024), offering a more nuanced perspective on the role of CU traits in the development of internalizing symptoms.
Taken together, these findings offer several important implications for clinical practice and future research. First, the substantial daily fluctuations observed in CU traits highlight their “state” feature. These results support the notion that, similar to personality traits (Fleeson, 2004; Soto & Tackett, 2015; Wright & Simms, 2016), CU traits are not “fixed” individual characteristics but exhibit meaningful dynamics and variability (Fleming et al., 2022; Goulter et al., 2024; Schuberth et al., 2019; Waller & Hyde, 2017). Incorporating intensive assessments within daily contexts may hence provide a more ecologically valid understanding of how CU traits manifest in everyday life. Second, the findings demonstrate that the latent structure of CU traits can diverge across levels. For instance, the unidimensional structure observed at the between- but not the within-person level among adolescents indicates that while CU trait items tend to co-occur across individuals (i.e., individuals who score higher than others on one item also tend to score higher on other items), they do not necessarily fluctuate together within individuals on a day-to-day basis (Li et al., 2025; H. Zheng & Zheng, 2025). These results highlight the importance of differentiating between within- and between-person structures in the assessment of CU traits. Within-person level structures can be used to track and monitor CU traits in daily contexts over time and understand their short-term antecedents and outcomes on a micro timescale. In contrast, between-person level structures are informative for ranking or comparing the level of CU traits across individuals. Third, these insights could inform interventions to treat CU traits as modifiable characteristics in adolescents and young adults. Adopting a micro timescale approach could enable context-sensitive intervention strategies to manage daily variations in CU traits (Y. Zheng & Goulter, 2024) and, by extension, decrease the likelihood of their progression to more severe externalizing problems. In addition, the measurement non-invariance in factor structures across adolescent and university student samples may suggest that CU traits reorganize during the transition from adolescence to young adulthood. In adolescents, targeting general CU traits through broad-based treatments may be effective, whereas in young adults, tailoring interventions to more specific components such as Uncaring or Callousness may be more beneficial.
Strengths, Limitations, and Future Directions
This study has several notable strengths. Previous research has examined the structure of CU traits primarily using cross-sectional or conventional longitudinal designs with long time intervals and has focused on between-person analyses. This study addressed these limitations by using month-long daily diary designs to explore the structure of CU traits at both within- and between-person levels. The findings confirm meaningful daily variations in CU traits, which align with emerging research that emphasizes a micro-level approach to better capture psychopathology symptom variability and dynamic links with contextual factors and behavioral outcomes (Thunnissen et al., 2022; Walz et al., 2014). Moreover, this study replicated findings across two independent samples and identified age-related factor structure differences, as well as stability within young adults over 2.5 years, which highlights both developmental continuity and discontinuity in the manifestation of CU traits.
Despite these strengths, several limitations warrant consideration. First, the current study used data from two community samples with relatively low endorsement of items, reflecting limited levels of severity. Replications with high-risk samples (e.g., incarcerated and clinical) could help validate and extend these findings to populations with elevated CU traits (Fontaine et al., 2023; Kemp et al., 2024; Y. Zheng et al., 2021). In addition, the sex distribution in the current study is unbalanced, with over 70% of the participants in the university student sample identifying as female. This may limit the generalizability of the findings, particularly regarding potential sex differences in the expression and fluctuation of CU traits. Future research should replicate these findings in more sex-balanced and diverse samples to enhance generalizability. Second, this study relied exclusively on self-reports. Different informants (e.g., self- vs. parent-reports) may influence psychometric properties of the ICU (Cardinale & Marsh, 2020; Deng et al., 2019; Wang et al., 2020). Future studies should incorporate multiple informants to examine the robustness of within- and between-person factor structures. In addition, although we confirmed sufficient within-person variability in these ICU items, the scale was originally developed for trait-level assessment. Some items may still reflect retrospective evaluations rather than context-sensitive behaviors. Future research could consider developing or validating CU trait measures specifically designed for ILDs. Third, it remains possible that the factor structure of CU traits is partly driven by method variance (Hawes et al., 2014; Ray & Frick, 2020), as all items in the Callousness factor are negatively worded, while items in the Uncaring factor are positively worded.
Previous studies have found that positively worded items tend to better discriminate individuals with higher levels of CU traits, and negatively worded items discriminate best at lower levels (Hawes et al., 2014; Ray & Frick, 2020). The current study, unfortunately, cannot directly explore potential method variance as it would require positively worded Callousness items and negatively worded Uncaring items (Paiva-Salisbury et al., 2017). Future studies should integrate IRT at the within-person level to examine item discrimination efficiency in daily contexts, as well as consider using the full 24-item ICU scale to better separate substantive variance from method variance to facilitate more accurate comparisons of alternative models. Using the full ICU scale would also enable evaluating the psychometric validity of the Unemotional factor, which reflects a critical component of CU traits but was excluded from the short-form version due to its low internal consistency at the between-person level (Colins et al., 2016; Wang et al., 2020). Investigating its manifestation at the within-person level may offer important and novel insights into its dynamic properties and contribute to a more comprehensive understanding of CU traits. Finally, the current study assessed adolescents' and young adults' CU traits once per day, which limits our ability to detect meaningful fluctuations that may occur on finer timescales, such as within hours. Future research could employ ecological momentary assessment (Thunnissen et al., 2022) to capture more granular, moment-to-moment changes in CU traits and to examine whether there are meaningful and robust underlying structures on more refined timescales.
Conclusion
Daily CU traits exhibit meaningful within-person fluctuations in adolescents and young adults, highlighting their dynamic nature in daily contexts. The manifestation of CU traits appears to change across different developmental periods. In adolescents, both bifactor and correlated-factor models can be used at different levels depending on specific research purposes. Among young adults, the Callousness and Uncaring factors exhibit greater heterogeneity and distinctiveness, suggesting that these factors should be analyzed separately to explore their potentially divergent underlying mechanisms. These findings underscore the importance of ILDs in deepening our understanding of CU traits in real-world settings, which could inform context-sensitive intervention strategies aimed at managing daily fluctuations in CU traits and potentially reducing their progression to severe externalizing problems.
Footnotes
Acknowledgements
The authors gratefully acknowledge all the participants, research assistants, Elk Island and St. Albert public schools, and the following organizations at University of Alberta for their support: International Student Services, English for Academic Purposes program, New Chinese Generation, Chinese Students and Scholars Association, iGeek, Undergraduate Research Initiative, China Institute, East Asian Studies Undergraduate Students Association, and Taiwanese Student Association. Study data were collected and managed using RedCap electronic data capture tools hosted and supported by the Women and Children’s Health Research Institute at the University of Alberta.
Data Availability Statement
Research data are not publicly available due to ethics agreements. However, the data required for the analyses performed in the study are available from the corresponding author upon reasonable request. This study was not preregistered. To promote transparency and openness, the codes for all the analyses are publicly available at
.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported partly with funding from the China Institute at the University of Alberta, the Social Sciences and Humanities Research Council (IDG 430-2018-00317 and 409-2020-00080) and Natural Sciences and Engineering Research Council (RGPIN-2020-04458 and DGECR-2020-00077) of Canada, and a Killam Research Fund Cornerstone Grant. HZ was supported by a Mitacs Accelerate Grant (IT 18227) awarded to YZ, the Ivy A Thomson and William A Thomson Scholarship, and the Women and Children’s Health Research Institute Graduate Studentship.
