Abstract
Callous-unemotional (CU) traits are directly related to psychopathic traits, as they are based on features such as emotional insensitivity, lack of empathy, and lack of remorse. These traits typically develop during childhood and adolescence. Identifying them is extremely important, as they are predictors of antisocial behavior and the potential development of psychopathic traits in adulthood. The Inventory of Callous-Unemotional Traits (ICU) has been widely used to measure callous-unemotional traits in children and adolescents. A systematic literature review was conducted to establish reference values for the ICU in these age groups. Using specific inclusion and exclusion criteria, relevant studies were identified through databases and manual searches. The review included articles with empirical and quantitative methodologies. A total of 297 studies were included. Across regions, ICU scores were consistently higher in clinical, Disruptive Behavior Disorder-diagnosed, and forensic samples than in community samples, supporting the role of CU traits as markers of elevated psychopathological risk. Subscale analyses showed that Callousness and Uncaring more reliably differentiated high-risk groups than the Unemotional subscale. Substantial regional variation was observed, with particularly elevated community scores in Asian samples and broadly comparable forensic scores across Western regions. ICU scores vary systematically by sample type, region, and subscale, underscoring the importance of context-sensitive interpretation. The pooled reference values provided in this review offer an empirical framework for comparative and clinical interpretation of ICU scores in the absence of standardized norms or cutoffs.
Plain Language Summary
Callous-unemotional traits describe patterns such as low empathy, reduced emotional sensitivity, and little feeling of guilt or remorse. These traits usually begin to appear during childhood and adolescence and are important to identify because they are linked to higher risks of serious behavior problems and antisocial behavior later in life. In some cases, they may also be associated with the development of psychopathic traits in adulthood. The Inventory of Callous-Unemotional Traits (ICU) is one of the most commonly used questionnaires to measure these traits in children and adolescents. However, understanding what an ICU score means can be difficult without clear reference values showing what scores are typical for different groups. This study reviewed nearly 300 scientific studies from different regions of the world to summarize reference values for ICU scores in young people. The results show that ICU scores are generally higher in clinical groups, in youth diagnosed with disruptive behavior disorders, and in forensic samples than in children and adolescents from the general community. This supports the idea that higher CU traits are linked to increased psychological and behavioral risk. The review also found that some parts of the ICU, especially those measuring lack of care for others and lack of concern about performance, were better at distinguishing high-risk groups than items focused only on emotional expression. ICU scores also varied across regions, with higher average scores in some Asian community samples and similar patterns in forensic samples across Western countries. Overall, this review highlights the importance of considering context, sample type, and cultural background when interpreting ICU scores and provides useful reference values to support research and clinical decision-making.
Research has shown that psychopathic traits are essential for understanding antisocial behavior in adults, as individuals with pronounced psychopathic traits tend to display a particularly persistent, severe, and aggressive pattern of antisocial behavior (Anderson & Kiehl, 2014; Frick et al., 2003; Gendreau et al., 2002; Hare, 2003; Vitacco & Vincent, 2006). While psychopathic traits were once considered a significant specifier for severe antisocial and aggressive behaviors in adults, they are now recognized as a relevant factor in the development of conduct disorder in youth (Frick et al., 2000; Pisano et al., 2017; Ručević & Andershed, 2022). Over the past two decades, these traits in child and adolescent samples have been consistently linked to more severe and persistent patterns of conduct disorder and aggression, lower levels of social competence and prosocial behavior, delinquency and adult antisocial behavior, and adult psychopathy (Colins et al., 2024; Frick et al., 2014; Lynam et al., 2009; Salekin, 2017; Salekin et al., 2022, 2025). Several of these findings have even been replicated in early childhood research (e.g., Colins et al., 2021; Ezpeleta et al., 2013; López-Romero et al., 2022). Expanding the psychopathy construct to youth has thus become an important focus of research (Byrd et al., 2013; Harris et al., 2024; Ngo et al., 2024; Salekin & Lynam, 2011), with a particular emphasis on callous-unemotional (CU) traits (Frick & Ellis, 1999; Frick & Hare, 2001).
Callous-unemotional traits refer to specific deficits in affective experience and interpersonal style, characterized by a lack of guilt, lack of empathy, disregard for others for personal gain, and diminished emotional expression (Cooke et al., 2006; Fanti et al., 2009; Frick & White, 2008; Kimonis et al., 2015; Muñoz et al., 2011). Evidence from previous studies suggests that these traits are relatively stable from childhood through adolescence (Frick et al., 2003; Obradović et al., 2007), at least when compared to other measures of childhood personality and psychopathology (Frick et al., 2003). While these traits do not necessarily result in adverse outcomes (Kimonis, Frick, Munoz, & Aucoin, 2008), they may serve as significant predictors of distinct patterns of aggressive and violent behavior in youth populations.
Compared to other antisocial youth, those with CU traits are more likely to exhibit impairments in processing negative emotional stimuli (Blair, 1999; Blair et al., 2001; Kimonis, Frick, Munoz, & Aucoin, 2008; Kohls et al., 2025; Loney et al., 2003), show lower levels of fear and anxiety (Frick et al., 2003; Frick & Ellis, 1999; Lynam et al., 2005; Michielsen et al., 2022), and display diminished sensitivity to punishment cues - particularly in contexts where behavior is motivated by potential rewards (Barry et al., 2000; Fisher & Blair, 1998; Paz et al., 2024; Sakki et al., 2024).
It is worth noting, however, that Cleckley’s original conceptualization of psychopathy emphasized that its core traits - comparable to CU traits - can be present in individuals who do not display antisocial behavior (Cleckley, 1976). Indeed, later longitudinal studies suggest that CU traits can appear independently of antisocial behavior (e.g., Barker et al., 2011; Fontaine et al., 2011; Frick et al., 2003; Rowe et al., 2010), though they are often associated with later antisocial outcomes (e.g., Squillaci & Benoit, 2021). Moreover, youth exhibiting CU traits typically demonstrate other disruptive characteristics, such as poor peer relationships, low prosocial behavior, and increased hyperactivity (e.g., Fontaine et al., 2011; Matlasz et al., 2022; Pueyo et al., 2024; Zych et al., 2019). Thus, CU traits may serve as a potential clinical marker of psychiatric vulnerability and psychosocial maladjustment, and may also aid in the subtyping of children and adolescents with disruptive behavior disorders (Donohue et al., 2023; Ezpeleta et al., 2017; Johnson et al., 2014; Wright et al., 2019).
Several tools have been developed to assist in the assessment of both general psychopathy levels and specific psychopathic traits in adolescence and even childhood, including the Psychopathy Checklist: Youth Version (PCL:YV; Forth et al., 2003), the Child Psychopathy Scale (CPS; Lynam, 1997), and the Youth Psychopathic Traits Inventory (YPI; Andershed et al., 2002). Although several instruments include subscales for assessing CU traits, the Inventory of Callous-Unemotional Traits (ICU; Frick, 2004) was explicitly developed as a comprehensive and independent measure, enabling a more detailed and focused assessment of the affective dimensions of psychopathy (Essau et al., 2006).
Inventory of Callous-Unemotional Traits
The ICU (Frick, 2004) was developed to assess CU traits in children and adolescents. It evolved from the Antisocial Process Screening Device (APSD; Frick & Hare, 2001), which in turn was based on the Psychopathy Checklist–Revised (PCL-R; Hare, 2003). Since its development in 2004, the ICU has been translated into more than 28 languages and validated across a wide range of developmental stages, cultural contexts, and assessment settings. Validation studies have supported its use in child samples (e.g., Benesch et al., 2014; Ezpeleta et al., 2013; Figueiredo et al., 2022, 2023; Houghton et al., 2013; Kimonis et al., 2015), as well as in adolescent samples drawn from community settings (e.g., Byrd et al., 2013; Carvalho et al., 2018; Essau et al., 2006) and forensic settings (Pechorro et al., 2016, 2017a, 2017b, 2018; Kimonis, Frick, Skeem, et al., 2008). Although the ICU is also widely used in clinical contexts, no validation study has been conducted specifically in clinically referred samples, meaning that its psychometric properties in these populations are typically inferred from community- and forensic-based research. Today, the ICU is widely recognized and used internationally as a key tool for assessing CU traits in children and adolescents.
The ICU consists of 24 items rated on a 4-point Likert scale, ranging from 0 (= Not at all true) to 3 (= Definitely true). There are five versions of the ICU with identical content but slight wording differences: (a) parent and teacher versions for preschool-aged children; (b) parent and teacher versions for school-aged children; and self-report versions for (c) school-aged children, (d) adolescents, and (e) adults. The first validation study of the ICU was conducted by Essau et al. (2006) in a sample of adolescents and proposed a three-factor model: (a) Unemotional (lack of emotional expression); (b) Uncaring (lack of concern for performance and others’ feelings); and (c) Callousness (lack of empathy, remorse, and guilt). Over the years, many other studies have examined the psychometric properties of the ICU across different countries, languages (e.g., English, Chinese, German, Norwegian, Portuguese, and Spanish), age groups, and settings, with findings suggesting variability in its factor structure - such as two-factor models (Uncaring and Callousness; Carvalho et al., 2018; Figueiredo et al., 2022, 2023; Kimonis et al., 2015; Willoughby et al., 2015) and three-factor models (Uncaring, Unemotional, and Callousness; Benesch et al., 2014; Essau et al., 2006; Ezpeleta et al., 2013; Pechorro et al., 2016, 2017a, 2017b, 2018). This heterogeneity suggests that the factor structure of the ICU may vary across developmental stage, language, culture, and assessment context, underscoring the importance of accounting for these sources of variability when interpreting ICU scores.
Despite its widespread use in research and clinical settings, the ICU lacks empirically established normative values or clinically validated cutoff scores for children and adolescents. To date, no study has defined thresholds that reliably distinguish between typical and clinically significant levels of callous–unemotional traits across age groups, cultures, or assessment settings. As a result, ICU scores are generally interpreted relatively or descriptively, often by comparing individuals or groups within a given sample rather than against standardized reference values. This lack of agreed-upon norms and cutoffs represents an important limitation for both research and clinical practice. It underscores the need for studies that provide reference values across different populations and contexts.
An essential issue in the literature concerns the compatibility and comparability of CU trait assessment measures in children and adolescents. Establishing equivalence across measures is necessary for scientific purposes, as inconsistencies between instruments can obscure findings and potentially lead to inaccurate or misleading conclusions (Borsboom, 2006). More specifically, if CU trait scores systematically vary due to factors such as instrument factor structure or individual characteristics like gender or age, there is a risk of misinterpreting, misclassifying, or underestimating the presence of CU traits in certain youth groups.
For the above reasons, establishing reference values for the ICU can be especially valuable for its practical application and the interpretation of results, both in clinical practice and in scientific research related to CU traits in youth populations. Thus, the present study aims to systematically review the use of the ICU and identify reference values for this inventory.
Method
The systematic literature review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA; Page et al., 2021). These guidelines are designed to ensure quality, transparency, and consistency in the reporting of systematic reviews.
Search and Study Selection Strategy
The literature search was conducted up to June 2025 across multiple databases, namely EBSCO, PubMed, and Web of Science. The EBSCOhost platform provides integrated access to several key databases in psychology and health sciences, including APA PsycNet, PsycINFO, Scopus, and the Cochrane Library, ensuring broad coverage of the relevant literature. The following search expression was applied across all platforms: FX (“Inventory of Callous-Unemotional Traits” OR ICU) AND AB (juvenile* OR adolescen* OR youth) NOT AB (“intensive care unit”). This database search was supplemented by a manual search of the reference lists of the selected studies.
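To make the logic of the search expression concrete, the following Python sketch applies the same Boolean structure to a single bibliographic record. It assumes, hypothetically, that the FX field can be approximated by matching against all searchable text of a record and the AB field by matching against the abstract; the `Record` class and its field names are illustrative. Real databases apply their own tokenization, phrase, and stemming rules, so this is a sketch of the query's logic rather than a reproduction of any platform's behavior.

```python
import re
from dataclasses import dataclass


@dataclass
class Record:
    """A bibliographic record; field names are hypothetical."""
    all_text: str  # assumption: stands in for the FX field (all searchable text)
    abstract: str  # stands in for the AB field


# ("Inventory of Callous-Unemotional Traits" OR ICU)
ICU_CLAUSE = re.compile(r"inventory of callous-unemotional traits|\bicu\b", re.IGNORECASE)
# (juvenile* OR adolescen* OR youth): a trailing * is right-truncation, modeled as \w*
AGE_CLAUSE = re.compile(r"\bjuvenile\w*|\badolescen\w*|\byouth\b", re.IGNORECASE)
# NOT ("intensive care unit"): removes medical ICU records from the abstract field
EXCLUDE_CLAUSE = re.compile(r"intensive care unit", re.IGNORECASE)


def matches_query(record: Record) -> bool:
    """True when the record satisfies all three clauses of the search expression."""
    return (
        ICU_CLAUSE.search(record.all_text) is not None
        and AGE_CLAUSE.search(record.abstract) is not None
        and EXCLUDE_CLAUSE.search(record.abstract) is None
    )


if __name__ == "__main__":
    hit = Record("Inventory of Callous-Unemotional Traits in detained youth",
                 "We studied detained adolescents.")
    miss = Record("ICU outcomes", "Adolescents admitted to the intensive care unit.")
    print(matches_query(hit), matches_query(miss))
```

The truncated terms show why wildcards matter: `adolescen*` captures "adolescent", "adolescents", and "adolescence" with a single term, whereas the untruncated `youth` is matched only as a whole word in this sketch.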
The SPIDER criteria (Sample, Phenomenon of Interest, Design, Evaluation, and Research type; Cooke et al., 2012) were employed to define the inclusion and exclusion criteria (Camilo & Garrido, 2019). Accordingly, the inclusion criteria were as follows: (1) Type of publication – the study must present empirical and quantitative data; (2) Population – the sample had to consist of children and adolescents up to 20 years of age. Although contemporary developmental frameworks conceptualize adolescence and youth as extending into the mid-twenties (up to 26 years; see Sawyer et al., 2018), the present review adopted a more conservative upper age limit to ensure that the included samples primarily reflected childhood and adolescence, minimizing the developmental heterogeneity associated with the transition to adulthood; (3) Outcome – the study must report scores for the ICU total and/or its subscales, providing means (M) and standard deviations (SD) (or another statistical measure that allows for variance calculation). When such data were not reported, the primary authors of the studies were contacted and asked to provide them.
The exclusion criteria were as follows: (1) Type of publication – unpublished articles, reviews, theoretical papers, commentaries, case reports, theses, dissertations, and editorials were excluded; (2) Population – studies including adult participants were excluded unless separate data were provided for participants aged 20 years or younger; samples consisting of children or adolescents with primary diagnoses of autism spectrum disorder, anxiety disorders, or depressive disorders were also excluded; (3) Outcome – (a) studies that did not report means and standard deviations, even after attempts to obtain them by contacting the corresponding authors via email, were excluded; (b) studies were excluded if they used modified versions of the ICU, defined as versions in which items were added, removed, or assigned to factor structures that differed from the original 24-item instrument, unless sufficient psychometric information was provided to support their equivalence. Sample type (community, forensic, clinical, community with DBD, or clinical with DBD) was determined based on the recruitment setting and diagnostic information explicitly reported in each study. A diagnosis of disruptive behavior disorder (DBD) was considered present only when established using standardized diagnostic criteria (e.g., DSM or ICD) or validated diagnostic instruments. When the same sample was used across multiple studies, its results were considered only once to avoid duplication.
Two independent reviewers selected the studies based on abstract screening, as recommended by the PRISMA guidelines (Page et al., 2021). Inter-rater agreement for the study selection process was assessed using Cohen’s kappa, revealing almost perfect agreement according to the benchmarks of Landis and Koch (1977), κ = .86, p < .001. Discrepancies between reviewers were discussed and resolved by consensus.
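As an illustration of the agreement statistic reported above, the following Python sketch computes Cohen’s kappa for two raters who each label the same set of records (for example, "include" versus "exclude"), and maps the result to the verbal benchmarks of Landis and Koch (1977). The function names and the example labels are hypothetical; the sketch shows the standard formula, κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is agreement expected by chance from each rater’s marginal proportions, and is not the authors’ own analysis code.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters assigning nominal labels to the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over labels.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    p_expected = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1 - p_expected)


def landis_koch_label(kappa: float) -> str:
    """Verbal strength-of-agreement benchmark from Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


if __name__ == "__main__":
    # Hypothetical screening decisions for four abstracts.
    reviewer_1 = ["include", "include", "exclude", "exclude"]
    reviewer_2 = ["include", "exclude", "exclude", "exclude"]
    k = cohens_kappa(reviewer_1, reviewer_2)
    print(k, landis_koch_label(k))
```

In this toy example the reviewers agree on three of four abstracts (p_o = .75), but half of that agreement is expected by chance (p_e = .50), so kappa is only .50. This is why kappa, rather than raw percentage agreement, is used to summarize screening reliability.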
Quality and Risk of Bias of Quantitative Studies
The Quantitative Research Assessment Tool (QRAT; Child Care & Early Education Research Connections, 2019) was used to assess the methodological quality of the studies included in this review. The QRAT comprises 12 items regarding the methodological features of the studies. Items can be rated −1, 0, 1, or NA (not applicable), except for the 12th question, where NA is not an option. According to the QRAT specifications, studies with lower scores should be regarded with more caution compared with studies that have higher scores, which are methodologically more robust. Most of the studies included in this review (89%) obtained a score higher than seven (see Supplemental Table 1).
Identification and Screening
A total of 998 articles, published between 2006 and 2025, were identified across the databases listed above. After 206 duplicates were removed, 792 records remained for screening. Based on the abstract review, 195 articles were excluded due to: (a) type of publication (n = 37); (b) population type (n = 24); and (c) failure to include an assessment of CU traits and/or use of the ICU (n = 133). In total, 600 articles were retained for full-text review, of which 312 were excluded for the following reasons: (a) incorrect publication type (n = 17); (b) inclusion of individuals over the age of 20 in the sample (n = 25); (c) failure to assess CU traits using the ICU (n = 54); (d) absence of reported means and/or standard deviations (or other statistics allowing for variance calculation) (n = 201); (e) studies using repeated samples (n = 4); (f) the full text of the study was not accessible, and the authors did not respond to the request for access (n = 7); and (g) composite scores combining different ICU versions (e.g., parent + teacher) (n = 4). After full-text screening, 288 studies were included in this systematic review, supplemented by 11 additional records identified from other sources, for a total of 299 studies (see Figure 1).
Figure 1. Flowchart of the literature review process
Statistical Analysis
To provide reference values for the total ICU scores and their respective subscales, means and standard deviations from all studies included in this systematic review were compiled. The pooled mean was calculated (by summing the means multiplied by their respective sample sizes and dividing the result by the sum of the sample sizes), along with the pooled variance - which estimates the variance across different populations with differing means - and the pooled standard deviation (the square root of the pooled variance) for the total scores of the several available ICU versions, as well as for each subscale (i.e., Callous/Callousness, Uncaring, and/or Unemotional). It is important to note that only results from the same Likert scale and factorial structure were pooled. Given prior evidence that ICU scores and factor structures may vary systematically as a function of developmental stage, language, culture, and assessment context, all pooled estimates were stratified by sample type, sex, geographic region, and ICU version to preserve comparability and to avoid conflating meaningfully different populations. Thus, this procedure was conducted separately for each sample type, namely: (a) forensic samples, referring to participants recruited from institutions within the justice system (e.g., juvenile justice facilities, detention centers, or court-mandated services); (b) community samples, consisting of participants drawn from the general population in non-clinical settings; (c) community samples with a diagnosis of Disruptive Behavior Disorders (DBD), which include participants recruited from community settings who met formal diagnostic criteria for a DBD; (d) clinical samples, comprising participants recruited from mental health or clinical services due to psychological or psychiatric concerns; and (e) clinical samples with a diagnosis of DBD, referring to participants assessed in clinical settings who met diagnostic criteria for a DBD.
These analyses were further stratified by sex (male, female, and mixed samples) and geographic region (North America, Europe, Asia, and Oceania), accounting for the different versions of the ICU available.
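The pooling procedure described above can be sketched in Python as follows. The text specifies the sample-size-weighted pooled mean but not the exact variance formula. This sketch therefore assumes the common combined-sample variance for groups with different means: the within-study sums of squares, (n_i − 1)SD_i², plus a between-study term, n_i(M_i − M_p)², divided by N − 1, where N is the total sample size. Under that assumption the pooled variance equals the variance that would be obtained from the concatenated raw scores. The `StudySummary` class and the stratum tuple (sample type, sex, region, ICU version) are illustrative names, not part of the original analysis.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StudySummary:
    """Summary statistics reported by one study (field names are illustrative)."""
    n: int
    mean: float
    sd: float
    stratum: tuple  # e.g. (sample type, sex, region, ICU version)


def pool(studies):
    """Return (pooled mean, pooled variance, pooled SD) for a list of study summaries."""
    total_n = sum(s.n for s in studies)
    if total_n < 2:
        raise ValueError("need at least two participants in total")
    # Sample-size-weighted mean: sum of n_i * M_i divided by the sum of n_i.
    pooled_mean = sum(s.n * s.mean for s in studies) / total_n
    # Within-study spread: each study's sum of squared deviations around its own mean.
    within = sum((s.n - 1) * s.sd ** 2 for s in studies)
    # Between-study spread: how far each study mean lies from the pooled mean.
    between = sum(s.n * (s.mean - pooled_mean) ** 2 for s in studies)
    pooled_var = (within + between) / (total_n - 1)
    return pooled_mean, pooled_var, math.sqrt(pooled_var)


def pool_by_stratum(studies):
    """Pool separately within each stratum so that different populations are not mixed."""
    groups = defaultdict(list)
    for s in studies:
        groups[s.stratum].append(s)
    return {stratum: pool(group) for stratum, group in groups.items()}
```

Stratifying before pooling is the key design choice. If, for example, forensic and community studies were pooled together, the between-study term would inflate the pooled SD with differences that reflect sample type rather than individual variability.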
Results
A summary of the characteristics of the studies included in this systematic review is presented in Supplemental Table 2. The included studies (n = 297) were published between 2006 and 2025. Most used samples from the general community (n = 159), 64 used samples of individuals involved in the criminal justice system or juvenile courts, 50 used clinical samples, and 24 used community or clinical samples with a DBD diagnosis. All participants were between 3 and 20 years old.
Meta-Analytic Analysis for Community Samples: Pooled Means, Standard Deviations and Variances for the Inventory of Callous-Unemotional Traits
Note. ICU = Inventory of Callous-Unemotional Traits; SD = standard deviation.
Meta-Analytic Analysis for Community Samples With DBD Diagnosis: Pooled Means, Standard Deviations and Variances for the Inventory of Callous-Unemotional Traits
Note. ICU = Inventory of Callous-Unemotional Traits; SD = standard deviation.
Meta-Analytic Analysis for Clinical Samples: Pooled Means, Standard Deviations, and Variances for the Inventory of Callous-Unemotional Traits
Note. ICU = Inventory of Callous-Unemotional Traits; SD = standard deviation.
Meta-Analytic Analysis for Clinical Samples With DBD Diagnosis: Pooled Means, Standard Deviations, and Variances for the Inventory of Callous-Unemotional Traits
Note. ICU = Inventory of Callous-Unemotional Traits; SD = standard deviation.
Meta-Analytic Analysis for Forensic Samples: Pooled Means, Standard Deviations and Variances for the Inventory of Callous-Unemotional Traits
Note. ICU = Inventory of Callous-Unemotional Traits; SD = standard deviation.
Regarding the self-report version of the ICU (24 items, 0–3 Likert scale), European community samples of both sexes exhibited lower mean scores (Mp = 22.90, SDp = 8.02, n = 19,479) than community samples with DBD diagnosis (Mp = 33.59, SDp = 8.68, n = 598), clinical samples (Mp = 31.12, SDp = 9.27, n = 3,084), clinical samples with DBD diagnosis (Mp = 27.26, SDp = 9.15, n = 1,780), and forensic samples (Mp = 29.19, SDp = 10.02, n = 1,707). Within these groups, male participants tended to show the highest scores. A different pattern emerged in North America, where differences among community, forensic, and clinical samples were small: community samples showed the lowest average scores (Mp = 25.99, SDp = 10.43, n = 14,989), followed by forensic samples (Mp = 26.47, SDp = 8.15, n = 13,250), clinical samples with DBD diagnosis (Mp = 27.11, SDp = 10.24, n = 566), and clinical samples (Mp = 27.47, SDp = 9.00, n = 2,267), whereas community samples with DBD diagnosis presented the highest scores (Mp = 35.37, SDp = 9.58, n = 246).
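As an illustration of how pooled reference values like those reported above might be used for comparative interpretation, the following Python sketch expresses a hypothetical adolescent’s ICU self-report total as a z-score relative to the pooled European community values (Mp = 22.90, SDp = 8.02). It then converts that z-score to an approximate percentile. The percentile step assumes that scores are approximately normally distributed, which may not hold for ICU scores. The resulting value is a descriptive comparison, not a validated clinical cutoff. The individual score and the function names are hypothetical, and the reference group should match the ICU version, informant, region, and sample type of the person being assessed.

```python
import math


def z_score(score: float, ref_mean: float, ref_sd: float) -> float:
    """Distance of an individual score from a reference mean, in reference-SD units."""
    if ref_sd <= 0:
        raise ValueError("reference SD must be positive")
    return (score - ref_mean) / ref_sd


def normal_percentile(z: float) -> float:
    """Proportion of a standard normal distribution below z (normality is an assumption)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Pooled self-report reference for European community samples, as reported in the text.
EU_COMMUNITY_M, EU_COMMUNITY_SD = 22.90, 8.02

# Hypothetical adolescent's ICU self-report total score (0-72 possible range).
hypothetical_score = 35
z = z_score(hypothetical_score, EU_COMMUNITY_M, EU_COMMUNITY_SD)
pct = normal_percentile(z)
print(f"z = {z:.2f}, approximate percentile = {100 * pct:.0f}")
```

The same score would look less extreme against a reference group with a higher pooled mean, such as a clinical or DBD-diagnosed sample. This is why the choice of reference group matters as much as the score itself.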
A more detailed analysis of the ICU self-report subscales indicated distinct patterns across regions and sample types. In Europe, clear differences emerged across sample types on the Callousness and Uncaring subscales. For Callousness (11 items, 0–3 Likert scale), pooled means were lowest in general community samples (Mp = 10.48, SDp = 5.57, n = 9,386), slightly higher in forensic samples (Mp = 11.55, SDp = 5.54, n = 751), and highest in community samples with DBD, for which subscale-level data indicated elevated CU traits relative to non-diagnosed community youths. Parent-reported Callousness scores in clinical samples were comparatively low (Mp = 4.39–4.54), suggesting informant-related differences. For Uncaring (8 items, 0–3 Likert scale), community samples with DBD again displayed clearly elevated scores, whereas general community samples showed moderate pooled means (Mp = 10.81, SDp = 5.11, n = 10,303) and forensic samples slightly lower values (Mp = 9.18, SDp = 4.09, n = 751). For the Unemotional subscale (5 items, 0–3 Likert scale), pooled means were nearly identical across community samples with DBD (Mp = 10.72, SDp = 4.09, n = 301), general community samples (Mp = 10.65, SDp = 5.58, n = 6,647), and forensic samples (Mp = 10.61, SDp = 4.51, n = 751).
In North America, differences across the five sample types were more pronounced, particularly for the Uncaring subscale. General community samples showed moderate pooled means for Callousness (Mp = 14.17, SDp = 6.44, n = 2,670), Uncaring (Mp = 12.31, SDp = 5.10, n = 3,226), and Unemotional traits (Mp = 8.12, SDp = 3.07, n = 3,226). In contrast, community samples with DBD presented markedly elevated Uncaring scores (Mp = 17.16, SDp = 3.81, n = 148), exceeding those observed in all other sample types. Forensic samples showed lower pooled means than community samples for Callousness (Mp = 8.81, SDp = 5.89, n = 1,427) and Uncaring (Mp = 9.73, SDp = 4.80, n = 2,173), and comparable Unemotional scores (Mp = 8.01, SDp = 2.90, n = 2,199). Although subscale-level data were not available for North American clinical samples with DBD, the elevated total ICU scores in these groups suggest generally high CU trait levels.
Data from Asia were available primarily for community samples. Asian community youths showed substantially higher pooled ICU total scores (Mp = 47.00, SDp = 8.61, n = 6,238) than European and North American community samples, indicating markedly elevated CU trait levels in this region. At the subscale level, Asian community samples also exhibited comparatively high Callousness (Mp = 10.21, SDp = 5.53, n = 816) and Uncaring scores (Mp = 9.39, SDp = 48.14, n = 999), suggesting a broad elevation across affective and interpersonal CU dimensions relative to Western samples. Parent-reported ICU totals in Asia were likewise higher than those observed in Europe and North America (Mp = 27.80, SDp = 5.61, n = 505) (Table 2).
In Oceania, data were available for community and forensic samples. In community samples, parent-reported ICU total scores were relatively high (Mp = 21.72, SDp = 11.80, n = 1,363), comparable to or exceeding those observed in European and North American community samples. In forensic samples, youths from Oceania showed ICU total scores similar to those observed in Europe and North America (Mp = 26.92, SDp = 8.72, n = 821), with elevated Uncaring (Mp = 11.85, SDp = 4.81) and Unemotional (Mp = 8.60, SDp = 2.75) subscale scores.
Discussion
The Inventory of Callous-Unemotional Traits (ICU) has been widely used to identify and assess callous-unemotional traits in children and adolescents, both in research and in clinical and forensic practice. However, the literature shows that ICU scores vary considerably with factors such as the instrument’s factorial structure and participants’ sex, which can lead to assessment inconsistencies and inaccurate conclusions, such as misclassifying or underestimating CU traits in specific groups. Therefore, the main goal of this study was to establish and provide reference values for the self- and other-report versions of the ICU in its most common factorial structures, considering sample type (community, community with disruptive behavior disorder [DBD] diagnosis, clinical, clinical with DBD diagnosis, or forensic), participants’ sex, and geographic origin.
The ICU has been used with a variety of factor structures. The results obtained reveal distinct patterns in the expression of callous-unemotional (CU) traits between European and North American samples, depending on both sample type (community, clinical, forensic) and informant type (self-report, parents, teachers). These variations suggest that cultural, methodological, and contextual factors may play a significant role in the manifestation of, or at least in the perception of, callousness, lack of empathy, and emotional detachment (Gendron, 2017).
Notably, the present review also provides insight into CU trait expression in underrepresented regions, particularly Asia and Oceania. Asian community samples showed substantially higher ICU total and subscale scores than their European and North American counterparts, suggesting that CU-related affective and interpersonal features may be more frequently endorsed or more strongly expressed in these cultural contexts. Cross-cultural research has shown that emotional expression, empathy, and social conformity are shaped by culturally embedded norms and values, which can influence both the development and the self-reporting of socio-affective traits (Markus & Kitayama, 1991; Mesquita & Leu, 2007; Shou et al., 2019). In addition, differences in response styles, such as greater acquiescence or reduced use of extreme response categories in some East Asian cultures, may further affect how ICU items are endorsed (Chen et al., 1995; He & van de Vijver, 2012).
In Oceania, although the number of available studies was smaller, both community and forensic samples showed ICU scores broadly comparable to those observed in Europe and North America, particularly in forensic contexts. This pattern is consistent with previous findings indicating that CU traits are robustly associated with antisocial and justice-involved behavior across Western cultures (Frick et al., 2014; Kimonis, Frick, Skeem, et al., 2008; Salekin et al., 2018). Nevertheless, the relatively limited number of studies from Oceania underscores the need for further research to establish more stable and culturally sensitive reference values for this region.
Regarding the self-report version of the ICU, it was observed that, in Europe, mean scores were, as expected, higher in clinical and forensic samples than in community samples. This pattern aligns well with the literature, which demonstrates a higher prevalence of CU traits among youth with behavioral disorders or contact with the justice system (Frick et al., 2014). The clear distinction between groups suggests that, in Europe, the criteria for inclusion in clinical and forensic samples more directly reflect elevated levels of these traits. Interestingly, this pattern is reversed in North America, where community samples showed the highest mean scores. This finding may reflect methodological differences in selecting community samples, such as the inclusion of participants with high but undiagnosed risk factors, or differences in affective socialization. In social contexts where assertiveness, emotional independence, or less empathetic behavior (characteristics that may overlap with CU traits) are more accepted or valued, these features may be more evident even in non-clinical populations (Kimonis, Frick, Skeem, et al., 2008). This hypothesis seems to be supported by subscale analyses: although community samples showed higher scores on the Callousness and Unemotional subscales in Europe, the highest Unemotional scores were found in clinical samples. This pattern suggests that the Unemotional dimension, often considered more internalized, may manifest more strongly in individuals with externalizing problems. In North America, all subscales showed surprisingly low scores in community samples, indicating a possible cultural difference in the perception and self-assessment of these traits.
Additionally, data from the ICU’s parent- and teacher-report versions show a more conventional pattern, with higher scores in clinical populations in both Europe and North America. However, North American teachers assigned notably higher scores to community children than their European counterparts did. This discrepancy may reflect differences in teachers’ perceptions of challenging behaviors, possibly linked to pedagogical practices, school norms, or distinct cultural expectations regarding children’s emotional and behavioral expression (Achenbach et al., 2008). It may also reflect cultural diversity: cultural and social norms can influence both how individuals express their traits and how those traits are assessed (e.g., Shou et al., 2019). It is therefore essential to recognize that conducting research across different cultures does not guarantee measurement equivalence across groups. Non-equivalence may arise from differences in the instrument’s content, the testing process, or the instrument’s underlying structure (Waltz et al., 2005).
Differences between community and clinical samples were pronounced in both Europe and North America, with youths recruited from clinical and DBD-diagnosed groups showing substantially higher ICU total scores than those from the general population. This pattern is consistent with theoretical models positing CU traits as a marker of severe and persistent antisocial behavior, particularly when embedded within clinical presentations of disruptive behavior disorders (Frick et al., 2014; Frick & White, 2008). In both regions, the elevation of ICU scores in clinical samples suggests that CU traits are not merely extreme variants of normative personality but are closely linked to psychopathological risk and functional impairment. Importantly, subscale patterns revealed nuances that total scores alone did not always capture. In some cases, clinical and forensic samples showed disproportionately elevated Callousness and Uncaring scores, whereas their Unemotional scores were less differentiated from those of community samples. This finding is in line with psychometric evidence indicating that the Unemotional subscale is less consistently associated with externalizing pathology and shows weaker reliability and validity across samples (Carvalho et al., 2018; Kimonis et al., 2015; Willoughby et al., 2015). The fact that Callousness and Uncaring more clearly distinguished high-risk groups supports their central role in the CU construct and helps explain why two-factor models often outperform three-factor models in applied settings. These regional patterns should also be interpreted in light of cross-national differences in juvenile justice and mental health systems.
In several Nordic and other European countries, youths under the age of 18 are rarely processed through the criminal justice system, even when serious antisocial behavior is present; they are instead referred to child welfare or mental health services (Lappi-Seppälä, 2011, 2018; Pratt & Eriksson, 2013). As a result, “forensic” samples in these contexts may represent a smaller and more severely affected subgroup than in countries such as the United States, where juvenile justice involvement is more common and institutional placement is more frequently used for adolescents with serious conduct problems (Abrams, 2013; Tonry & Farrington, 2005).
Meta-analytic data from clinical, community, and forensic samples reveal sex differences in the expression of CU traits, particularly in European and North American samples. Overall, the data indicate that boys score higher than girls on the ICU total score and on the Callousness and Uncaring subscales, with this trend being more pronounced in clinical and community samples. These sex differences support previous literature showing a higher prevalence of CU traits in boys, especially in relation to externalizing behaviors associated with conduct disorders and psychopathy risk (Essau et al., 2006; Frick et al., 2014). Gender socialization may contribute to this disparity, as boys are often encouraged to suppress emotional expression, including displays of empathy or vulnerability, which may facilitate the development of CU traits or disruptive behavior. Conversely, the pattern observed among girls, with higher scores on the Unemotional subscale in some contexts, raises the possibility of a more internalized profile of female psychopathy that is less visible through disruptive behaviors but still clinically and functionally relevant. This profile may be underestimated by traditional assessments, which prioritize behavioral manifestations more typical of boys (Fontaine et al., 2010).
In forensic contexts, the available data refer primarily to boys, making direct comparisons with girls difficult. Nevertheless, boys in these samples continue to show high scores in Europe and North America. The available subscale data confirm this trend, especially on the Callousness and Uncaring dimensions. The scarcity of data on girls in this context may be related to their lower representation in the juvenile justice system or to the underestimation of disruptive behaviors in girls. CU traits in girls may manifest in less visible ways (e.g., social manipulation, masked CU traits), suggesting a more internalized or covert female CU profile (e.g., Cardoso et al., 2023).
Despite the consistency and validity of the ICU, it is essential to consider the potential biases of this and other instruments used to measure CU traits. Like other psychological tools, the ICU may be more sensitive to expressions of these traits that are more typical of boys. This may occur because many assessment scales were originally developed using predominantly male samples, which may limit their sensitivity to how these traits manifest in girls.
Certain limitations of this study should be noted. As is common in systematic reviews, there is a risk of publication bias, as only studies published in the identified sources were included. However, this risk was partly mitigated by additional manual searches. Another important point is the heterogeneity of the studies included in the current review. Although the review followed rigorous guidelines such as PRISMA, the diversity of samples and the methodological variability across studies hinder the establishment of a solid basis for cross-cultural, age, and sex comparisons. This limitation is compounded by the fact that some studies did not provide sufficient data for complete statistical analysis and were therefore excluded from the review.
Additionally, the study faced challenges in aggregating data across specific ICU versions, further limiting comparability between clinical and community samples. This is a critical issue, as the lack of a cohesive database hampers the establishment of widely applicable reference values, particularly in clinical contexts (e.g., developmental disorders). The lack of standardization across studies limits the generalizability of results and suggests the need for further research to consolidate proposed reference values.
Finally, there is a methodological issue regarding how the ICU is applied across different cultural contexts. The concepts measured by psychological assessment tools do not always translate effectively across cultures, which can affect the instruments’ reliability and validity. Some cultural groups may struggle with negatively worded items or exhibit cultural response preferences that shape their answers. These variations may compromise the accuracy of the assessment, and researchers should focus on developing appropriate cultural adaptations to ensure the ICU’s cross-cultural validity.
Despite the abovementioned limitations, establishing and providing reference values for the ICU is particularly important for drawing valid conclusions about the characteristics of children and adolescents assessed in clinical, community, and forensic settings, especially given the ICU’s widespread use in these evaluative contexts. Moreover, the values provided in this study may serve as useful benchmarks for future research, particularly for comparative purposes.
Supplemental Material
Supplemental Material - Reference Values of the Inventory of Callous-Unemotional Traits in Youth Populations: A Systematic Literature Review
Supplemental Material for Reference Values of the Inventory of Callous-Unemotional Traits in Youth Populations: A Systematic Literature Review by Patrícia Figueiredo, Beatriz Lopes, Melhyssa Gomes, Eduarda Ramião, and Fernando Barbosa in Clinical Child Psychology and Psychiatry.
Footnotes
Ethical Considerations
The study was conducted according to 7th edition APA ethical standards.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.
