Abstract
An accurate assessment of cognitive abilities in populations that differ from the majority in cultural and linguistic characteristics is one of the main challenges in cognitive testing. Previously developed methods for assessment of the validity of cognitive scores in individuals with diverse backgrounds, such as the Culture-Language Interpretative Matrix (C-LIM), have not been empirically substantiated. We tested the applicability of the C-LIM in the European context, by comparing selected test scores from the Woodcock-Johnson-IV Test of Cognitive Abilities (WJ-IV) between Roma children aged 7–11 years (n = 399) and their counterparts from the normative population (n = 131). The largest differences were detected in WJ-IV tests requiring abstract reasoning and manipulation with complex signs. Furthermore, the C-LIM did not reliably discriminate between our groups and its use appears to be inappropriate for making diagnostic decisions about children from populations that do not traditionally rely on processes such as categorical thinking, abstract reasoning, and generalization.
An accurate estimation of the cognitive abilities of students whose cultural and linguistic characteristics differ from those of the majority presents a challenge for psychologists in many countries. Substantial variability in cognitive performance in minorities relative to the majority has been well documented, while it has been acknowledged that these differences are a consequence of standardization procedures that typically do not account for groups outside the normative population. Importantly, the validity of the cognitive scores for these minority groups has not yet received sufficient attention from researchers, particularly in Europe (Ortiz, 2019; Ortiz et al., 2018; Styck & Watkins, 2013). This is highly surprising, given the seriousness of the consequences for many children whose cognitive capabilities cannot be accurately estimated due to their different backgrounds. To provide more insight into the topic, the aim of this study was to evaluate intelligence test bias in children from the Roma minority in the European context.
Test bias can be defined as any nuisance factor (non-cognitive, culture-related) related to the measured construct, method of testing, or item that systematically jeopardizes the validity of an instrument across different populations (van de Vijver & Leung, 2021). Construct-related bias emerges when the concept of a construct differs across cultures. For example, it is known that the most frequent conceptualization of intelligence within cognitive tests omits domains perceived as important by non-Western populations (e.g., interpersonal skills). Correspondingly, preferences for specific cognitive abilities that are considered to reflect high intelligence are also culture-specific: While Western cultures prioritize inductive thinking and rapid problem solving, other cultures value high levels of motivation or more practical skills such as a sense of direction or knowledge of herbal medicines (Singh et al., 2021).
Cultural background therefore significantly impacts the development of cognitive functions (Wang & Kushnir, 2019) and learning styles (Joy & Kolb, 2009). Even the most basic cognitive processes, such as visual perception, appear to be strongly shaped by culture. Members of Western cultures, for instance, tend to process central objects and organize information according to categories, whereas East Asians demonstrate holistic, contextual, information-processing bias with an emphasis on relational rather than categorical information (Park & Huang, 2010; Wang, 2016).
The second type of bias is associated with characteristics of the instrument, its administration, or sampling (van de Vijver & Leung, 2021). The relevant sources comprise familiarity with stimulus material, response style, response procedures, or communication. Cultural conventions and values relevant for psychological assessment from this perspective include, for instance, the degree of assertiveness in the culture of origin or cultural practices associated with communication (Singh et al., 2021). The last type of bias refers to language- or culture-related artifacts present at the item level.
All cultural influences are closely reflected in language development. Necessary and inherent connections between language and another sign (e.g., symbolic) system have been acknowledged in most definitions of culture. Both culture and language are based on meaning-making that is realized via signs and are therefore inseparable from each other (Lotman, 2022). Correspondingly, the language- and culture-related test bias cannot be separated (Ortiz, 2019).
Despite the known impact of distinct cultural and linguistic experience on the accuracy of cognitive scores, empirically driven guidelines for practitioners for the mitigation of this problem are practically non-existent (Cormier et al., 2022). Previous attempts to reduce cognitive test bias in minorities in practice have been mainly twofold: a search for culture-free cognitive tests (or the use of non-verbal tests as culture-fair alternatives) and the development of criteria for the evaluation of the validity of cognitive tests standardized on normative samples (Ortiz et al., 2018).
Data from studies following the first line of research strongly suggest that a cognitive test that could be defined as completely culture- and language-independent does not exist. Although language-related bias is decreased to a certain degree in non-verbal tests, testing bias is not completely eliminated because non-verbal abilities are also shaped by culture (Ortiz, 2019; Singh et al., 2021), specifically by the degree of experience and relevance of skills such as spatial reasoning, visual memory, or drawing on everyday life in the culture of origin (Fasfous et al., 2013; Lozano-Ruiz et al., 2021; Singh et al., 2021).
The second line of research includes attempts to define criteria for determining the validity of cognitive scores for a given individual. One of these efforts is represented by the Culture-Language Interpretative Matrix (C-LIM) that has been developed to assist practitioners examining minority populations in the United States (Flanagan et al., 2013). The C-LIM is a classification system that differentiates the extent to which both cultural and linguistic load influence cognitive performance. In a 3 × 3 matrix, tests are considered to have low, medium, and high cultural/linguistic load. The performance of participants outside the normative sample is presumed to decline systematically as a function of increasing cultural and/or linguistic demand when test validity is likely to be uncertain. Available evidence shows that while this approach appears promising when differentiating minority groups from normative populations, it is inaccurate when examining individual participants’ test scores (Calderón-Tena et al., 2022; Kranzler et al., 2010; Styck & Watkins, 2013, 2014). In other words, cognitive performance is significantly worse and more variable for minorities than for native speakers at the group level; however, only a marginal proportion of minority samples (4%–37%) adheres to the expected decline down the C-LIM diagonal, while nearly a quarter of the normative group’s scores show a similar decline (Kranzler et al., 2010; Styck & Watkins, 2013, 2014).
It has been suggested that the definitions of linguistic and cultural demands in the C-LIM matrix have been formulated largely based on expert-opinion agreement (Styck & Watkins, 2013); the references cited to justify the current classification are either unpublished manuscripts or are greatly outdated (see references in Ortiz et al., 2018). Additional research is certainly necessary to support and validate the use of the C-LIM.
Inadequate measurement of cognitive abilities in minorities can have serious consequences for many children who may be misdiagnosed and consequently likely to attend schools with restricted curricula, which subsequently leads to inadequate educational attainment. The Roma ethnic group is the largest minority in Europe, with long-standing disadvantages related to lower education, income, and job opportunities (EUAFR, 2016). This minority has received relatively little attention from researchers to date—concerns about Roma children’s educational disadvantage have been raised in several European countries only in the past two decades (Dolean & Cãlugãr, 2020). Occasional efforts to develop culture-fair “practice” tests (Čavojová & Belovičová, 2009) and to examine the validity of non-verbal (Páchová, 2013) and verbal intelligence measures standardized for a normative population in samples of European Roma children (Denglerová, 2016) have not yet resulted in systematic and official recommendations for practice, as is the situation elsewhere (Ortiz, 2019). We therefore set out to provide evidence-based guidance on interpreting Roma children’s performance in the Woodcock-Johnson-IV Test of Cognitive Abilities (WJ-IV) compared with their counterparts from normative (majority) population. Previous analyses of the same dataset at item level revealed that the WJ-IV did not systematically favor one group over the other (Authors, under review). In the present paper, we therefore present additional analyses of differences between the groups at the test level.
Previous research has revealed variables that significantly contribute to performance differences between majority and minority populations beyond cultural and linguistic influences. Socio-economic variables such as family income, parental education, and living conditions appear to affect the development of cognitive abilities significantly, and this is particularly true for the Roma minority (Biro et al., 2009; Burneo-Garcés et al., 2019; Dolean & Cãlugãr, 2020). Compared to a medium socio-economic status (SES), a low SES negatively impacts several cognitive processes (e.g., visual–motor coordination and sustained attention; Burneo-Garcés et al., 2019). Socio-economic indices were therefore considered in this study in interpreting Roma children’s cognitive scores.
The overall aim of this study was to contribute to the sparsely investigated issue of intelligence-measurement validity in a specific minority in the European context by revealing potential culturally relevant sources of test bias. To confirm previously reported differences, WJ-IV performances of Romani children were compared with those of their counterparts from the normative population. Second, the relationships between intelligence scores and related factors, including socio-economic, socio-cultural, linguistic, and behavioral characteristics were investigated (Burneo-Garcés et al., 2019; Dolean & Cãlugãr, 2020). Finally, the relevance of the C-LIM algorithm was assessed as a tool for the accurate interpretation of cognitive scores specifically for the Roma minority in the Czech Republic.
Method
Participants
The sample comprised 399 minority children (mean age = 9.4 ± 1.6 years; 189 males) and 131 children from the majority population (mean age = 8.8 ± 1.5 years; 73 males), aged 7–11 years. The latter dataset was drawn from a standardization sample of the Czech adaptation of the WJ-IV (McGrew et al., 2019) for which the data collection was completed under a different project. Roma participants were recruited from all districts of the Czech Republic and selected by quota and age. The two samples did not differ with respect to gender (χ²(1) = 2.43; p = .119); however, the minority group was significantly older than the majority (t(229.82) = −3.45, p < .01, 95% CI [−.83, −.23]). The minority subsample included children from families self-identifying as members of the Roma ethnic group. Of these families, 51% lived in socially excluded environments, as defined by Čada et al. (2015), characterized by limited access to the labor market and public services, limited contact with social surroundings, and limited political participation and possibility of resolving personal crises. This variable was labeled family exclusion status.
Approximately 78% of the Romani children attended primary schools, 13% followed individualized education plans, and 2% attended special or practical schools, while the rest of the sample had not started formal education at the time of testing (6.5%). Most of the Roma families reported use of the official language at home; only 8% of the children and 10% of their parents spoke Romani.
The study was approved by the local Ethics Committee of the University of South Bohemia in České Budějovice, Faculty of Health and Social Studies, nr. 12,062018. The participants and their parents/guardians were acquainted with the aim and purpose of the research and all the parents provided written informed consent prior to participation in the study. The children’s parents each received 300 Kč (approx. 12€) upon completion of the testing protocol.
Instruments
The testing battery for Romani participants comprised the Czech adaptation of the WJ-IV (McGrew et al., 2019) and the Social Exclusion Questionnaire (SEQ; Seifert & Jelínek, 2019).
The WJ-IV
The WJ-IV is an assessment of cognitive abilities borne out of the Cattell–Horn–Carroll theory. The following WJ-IV tests were administered: Oral Vocabulary, Number Series, Verbal Attention, Phonological Processing, Story Recall, Concept Formation, Numbers Reversed, Number-Pattern Matching, and Pair Cancellation (McGrew et al., 2019). The rationale for selecting these tests was to: (1) maximize the collection of data reflecting basic cognitive abilities whilst limiting the testing time to maintain the children’s motivation and attention and (2) coordinate the design of this study with another, related project. The following composite scores were derived from this dataset: Brief Intellectual Ability (BIA; intellectual ability based on verbal comprehension, fluid reasoning, and short-term memory), Short-term Working Memory (Gwm; the ability to store, retrieve, and recode verbal information within working memory), Fluid Reasoning (Gf; quantitative and verbal logical reasoning and the ability to solve unfamiliar problems), and Number Facility (N; the efficiency of visual-perceptual discrimination, temporary storage, and working-memory processing of numerical information; Ding & Alfonso, 2016). The average reliability across the age groups for the WJ-IV tests and composite scores was excellent: Oral Vocabulary: α = .97; Number Series: α = .99; Verbal Attention: α = .97; Phonological Processing: α = .98; Story Recall: α = .96; Numbers Reversed: α = .98; Number-Pattern Matching: α = .98; Pair Cancellation: α = .98; BIA: α = .97; Gwm: α = .97; Gf: α = .98; and N: α = .97 (McGrew et al., 2019). In addition to the cognitive tests, the WJ-IV comprises a brief rating scale that documents observations of a child’s behavior during testing in the following categories: conversational proficiency; self-confidence; cooperation; care in responding; activity; response to difficult tasks; and attention and concentration. 
These ratings were included in the analyses to shed light on relationships between behavioral indices and cognitive scores.
The SEQ
The SEQ (Seifert & Jelínek, 2019) was used to examine the parents’ perceptions of their family’s SES (e.g., Sometimes we cannot make ends meet), socio-cultural differences ([SCD]; e.g., Our family customs differ considerably from those of majority), socio-cultural family assets ([SCFA]; e.g., My son/daughter has good role models for their future professional life in our neighborhood), and school and education-related conflicts ([SERC]; e.g., Our school respects the specifics of our child and our family situation sufficiently). Although the overall internal consistency of this questionnaire was acceptable in this sample (α = .71), the alpha levels for the individual dimensions were relatively poor (SES: α = .53, SCD: α = .66, SCFA: α = .72, and SERC: α = .51).
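For readers less familiar with the internal-consistency index reported above, the following minimal Python sketch shows how Cronbach's α can be computed from item-level responses. The function name, the data layout, and the toy responses are hypothetical illustrations and are not taken from the SEQ dataset; the study's own analyses were run in R.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a questionnaire scale.

    `items` is a hypothetical layout: one inner list per questionnaire
    item, each holding the scores of the same respondents in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sum of the sample variances of the individual items.
    item_var_sum = sum(variance(item) for item in items)
    # Sample variance of each respondent's total score across all items.
    totals = [sum(scores) for scores in zip(*items)]
    total_var = variance(totals)
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Toy data (invented): two items answered by four respondents.
consistent_scale = [[1, 2, 3, 4], [1, 2, 3, 4]]
unrelated_scale = [[1, 2, 1, 2], [1, 1, 2, 2]]
print(cronbach_alpha(consistent_scale))
print(cronbach_alpha(unrelated_scale))
```

Values close to 1 indicate that the items vary together, whereas values near 0 indicate that they share little common variance, which is why the low coefficients of some SEQ dimensions call for cautious interpretation.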
Socio-economic and socio-cultural characteristics
The socio-economic variables included the mother’s and father’s highest education attainment, household crowding (reported by parents), and living in a socially excluded environment (i.e., family exclusion status; Čada et al., 2015). Parental education was coded on a scale from 1 (unfinished elementary education) to 7 (university degree), and these scores were subsequently averaged across both parents (Dolean & Cãlugãr, 2020; range: 0–5). Household crowding was represented by the number of family members per room in a given household (range: .6–8.0). Socially excluded and non-excluded communities were scored as 1 and 2, respectively. Linguistic characteristics were proxied by the language spoken by a child and their parents: The official language was coded as 1 and Romani as 2.
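The coding of the two derived socio-economic indices described above can be sketched as follows. This is only an illustration in Python: the function names are hypothetical, and the handling of a missing parental value (averaging whatever is available) is an assumption that the text does not specify.

```python
def mean_parental_education(mother_code, father_code):
    """Average the parents' education codes (1 = unfinished elementary
    education, 7 = university degree).

    ASSUMPTION: a missing code is passed as None and the average is taken
    over the available parent(s); the paper does not describe this case.
    """
    codes = [c for c in (mother_code, father_code) if c is not None]
    if not codes:
        return None
    return sum(codes) / len(codes)


def household_crowding(family_members, rooms):
    """Number of family members per room in the household."""
    if rooms <= 0:
        raise ValueError("rooms must be positive")
    return family_members / rooms


# Invented example: mother coded 2, father coded 4, six people in two rooms.
print(mean_parental_education(2, 4))
print(household_crowding(6, 2))
```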
Data pre-processing and analyses
All the scores were calculated following the Czech version of the WJ-IV manual (McGrew et al., 2019). Specifically, the raw scores were transformed into the Rasch-model-based W-scores. These W-scores were subsequently transformed into IQ scores according to age-based Czech norms, with each IQ score having its own conditional standard-error value. Participants whose 95% confidence intervals fell below 70 on the BIA IQ composite score (a value that would be indicative of performance within the range for intellectual disability in the normative sample) were identified and removed from the sample. This processing step resulted in the removal of 6.1% of the normative group and a considerable proportion of the Romani sample (46.4%). Such an approach can be regarded as questionable because the diagnostic accuracy of cognitive tests is known to be affected negatively outside the standardization sample (Daugherty et al., 2017; Fasfous et al., 2013; Lozano-Ruiz et al., 2021). To account for this inadequacy, the results for the entire sample are reported in the Supplementary Material.
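The exclusion rule described above can be illustrated with a short Python sketch. Two points are assumptions rather than details given in the text: the interval is built as IQ ± 1.96 × the conditional standard error, and "below 70" is read as the entire interval (i.e., its upper bound) lying below the cutoff. The function names are hypothetical.

```python
def ci95(iq, se):
    """Two-sided 95% interval around an IQ score.

    ASSUMPTION: normal approximation, IQ +/- 1.96 * conditional SE.
    """
    half_width = 1.96 * se
    return iq - half_width, iq + half_width


def is_excluded(bia_iq, se, cutoff=70.0):
    """True when the whole interval lies below the cutoff.

    ASSUMPTION: 'confidence interval below 70' means upper bound < 70.
    """
    _, upper = ci95(bia_iq, se)
    return upper < cutoff


# Invented scores: the first child would be removed, the second retained.
print(is_excluded(60, 4))
print(is_excluded(68, 4))
```

Under this reading, a child is removed only when even the most favorable end of the interval falls in the range for intellectual disability, which is more conservative than applying the cutoff to the point estimate.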
The statistical analyses were conducted in three steps: (1) to express the performance differences between the groups, effect sizes were calculated using Cohen’s d for each WJ-IV test and age group separately and then averaged across age using the weighted mean to account for unequal group sizes; (2) Pearson’s correlations were used to assess the relationships between cognitive performance and the other relevant variables related to language and Romani families’ socio-economic and socio-cultural conditions; and (3) to test the applicability of the C-LIM framework in our sample, the percentage of participants whose test scores corresponded with the C-LIM pattern previously suggested to reflect a combination of cultural and linguistic load was computed (Flanagan et al., 2013). Specifically, difference values were calculated from averaged scores across WJ-IV tests with low cultural and linguistic load (i.e., Number Series, Visualization, Number-Pattern Matching, and Pair Cancellation) and those high on both characteristics (i.e., Oral Vocabulary and Story Recall) in the C-LIM matrix (Flanagan et al., 2013). In the final exploratory step, the difference values were computed to compare the two test clusters defined based on the performance differences that emerged between our study groups in an effort to illustrate a data-driven (and hopefully more appropriate) identification of minority children at risk of misdiagnosis. These difference values were subsequently divided by a pooled, participant-specific standard error to account for within-person variability, and evaluated against a criterion defined to capture minimally meaningful differences between the study groups. The criterion value was set to .5, following previous research on minimal (clinically) relevant difference (Norman et al., 2003). All the statistical analyses were performed using the R statistical software (R Core Team, 2021).
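The first and third analytic steps can be sketched in Python as follows. This is a hedged illustration, not the authors' R code. Three details are assumptions: Cohen's d uses the pooled standard deviation, the age-group weights are the combined sample sizes, and the pooled participant-specific standard error is the root mean square of the conditional standard errors of the tests involved. All function names and toy scores are hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups (pooled standard deviation)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def weighted_mean_d(age_groups):
    """Average d across age groups.

    ASSUMPTION: each age group is weighted by its combined sample size;
    the text only states that a weighted mean was used.
    """
    ds = [cohens_d(a, b) for a, b in age_groups]
    weights = [len(a) + len(b) for a, b in age_groups]
    return sum(d * w for d, w in zip(ds, weights)) / sum(weights)


def standardized_difference(low_load_iqs, high_load_iqs, standard_errors):
    """Cluster difference expressed in units of a pooled standard error.

    ASSUMPTION: pooled SE = root mean square of the conditional standard
    errors of all tests entering the comparison.
    """
    diff = mean(low_load_iqs) - mean(high_load_iqs)
    pooled_se = math.sqrt(mean([se ** 2 for se in standard_errors]))
    return diff / pooled_se


def follows_decline(low_load_iqs, high_load_iqs, standard_errors, criterion=0.5):
    """True when the standardized decline reaches the .5 criterion."""
    return standardized_difference(
        low_load_iqs, high_load_iqs, standard_errors) >= criterion


# Invented data: (majority scores, minority scores) for two age groups.
toy_age_groups = [
    ([100, 110, 120], [90, 100, 110]),
    ([100, 110, 120, 130], [100, 110, 120, 130]),
]
print(weighted_mean_d(toy_age_groups))
print(follows_decline([100, 104], [90, 94], [4, 4, 4, 4]))
```

The percentage reported for each group would then be the share of participants for whom `follows_decline` returns true, computed separately for the majority and minority samples.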
Results
Cognitive Performance
Group differences for several WJ-IV tests reached or exceeded the threshold for large effect sizes, revealing the largest discrepancies in Concept Formation, Number Series, Oral Vocabulary, Numbers Reversed, and Number-Pattern Matching. Notably, this cluster comprised tests from four different cells of the C-LIM classification with various combinations of presumed linguistic and cultural loading. In contrast, small to medium effects were found in Verbal Attention and Phonological Processing, which are characterized as having high linguistic and moderate cultural demand, and in Pair Cancellation, which has low demand on both aspects (see Figure 1 and Table 1). Consistent with these results, composite scores averaged across age groups showed large effects in three out of four of these indices (Gf: ES = 1.35; N: ES = 1.10; BIA: ES = 1.06; and Gwm: ES = .88).
Figure 1. Comparison of average WJ-IV IQ scores between the groups. Note. WJ-IV = Woodcock-Johnson-IV test of cognitive abilities. The order of the WJ-IV tests (left to right) reflects Roma children’s decreasing performance.
Table 1. Comparisons of IQ Scores for Individual WJ-IV Tests and all age Groups (Participants with BIA<70 Filtered out). Note. ES = effect size.
Socio-economic and Socio-cultural Characteristics
Correlation analysis revealed that family exclusion status was not related to any of the WJ-IV composite scores. Similarly, use of the Roma language was not associated with any composite scores (p ≥ .45). Meanwhile, higher parental education was significantly related to a better performance in BIA (r = .22, p < .01), Gwm (r = .16, p = .02), and N (r = .13, p = .05). The degree of household crowding was inversely related to BIA (r = −.21, p < .01), Gf (r = −.24, p < .01), and Gwm (r = −.16, p = .02). Finally, the higher the perceived socio-cultural family differences, the poorer the performance of the child on most of the composite scores (BIA: r = −.18, p = .01; Gf: r = −.21, p < .01; Gwm: r = −.20, p < .01; and N: r = −.19, p < .01). Surprisingly, no relationships were detected with respect to subjectively perceived SES (p ≥ .41), SCFA (p ≥ .30), and SERC (p ≥ .21). Following up on an unexpected absence of a relationship between family exclusion status and cognitive performance, within-sample comparisons contrasting Roma children living in socially excluded and non-excluded environments showed comparable performance across all WJ-IV tests with effect sizes ranging from .01 to .21 (see Supplementary Table 4).
Test-related Behavior
Poorer conversational proficiency was significantly related to lower BIA (r = −.21, p < .01); Gf (r = −.18, p = .01); Gwm (r = −.23, p < .01); and N (r = −.17, p = .01). Specific WJ-IV tests associated with conversational-proficiency ratings were verbal as well as non-verbal (Oral Vocabulary, Number Series, Phonological Processing, Story Recall, Visualization, Numbers Reversed, and Pair Cancellation), with the closest associations detected for Phonological Processing (r = −.34, p < .01) and surprisingly, Visualization (r = −.35, p < .01). Significant associations were also detected for other behavioral ratings (e.g., cooperation, response to difficult tasks, attention, and concentration). Notably, Phonological Processing and Story Recall seemed to be related to behavioral ratings in most cases, whereas Verbal Attention, Concept Formation, and Number-Pattern Matching were largely independent of these characteristics. These relationships were much stronger for the entire Roma sample (see Supplementary Table 6), suggesting the importance of monitoring test-related behavior and conversational proficiency and cooperation specifically when assessing cognitive abilities in minority populations.
Applicability of the C-LIM
Last, analyses of performance patterns corresponding to the C-LIM predictions for the combined effect of linguistic and cultural load identified 57.72% of the majority group and only 42.06% of the minority children as producing scores that followed the expected decline. Such suboptimal performance of the C-LIM is not surprising in light of previous studies, and we were interested in exploring how to improve the accuracy of the differentiation of the participants at risk of misdiagnosis. Close inspection of cognitive performance revealed that the least biased tests involved tasks relying predominantly on auditory processing or comprising concrete stimulus material (objects and pictures) processed visually (Verbal Attention, Phonological Processing, Story Recall, Visualization, and Pair Cancellation), whereas those with the largest bias were mostly visual and involved manipulation of complex sign material (i.e., numbers) and, perhaps most importantly, required abstract and logical reasoning such as induction or inference (Oral Vocabulary, Number Series, Concept Formation, Numbers Reversed, and Number-Pattern Matching). The WJ-IV tests were consequently divided into two groups to reflect high and low complex sign/abstract load. We labeled this classification the Sign Interpretative Matrix (SIM). Additional analyses showed that the pattern of decline from complex sign/abstract-free to sign/abstract-loaded WJ-IV tests emerged for 6.50% of majority participants and 38.78% of Romani children (see Figure 2 for an example comparison between C-LIM and SIM classification accuracies and Table 2 for test scores for both groups).
Figure 2. Classification accuracies of the C-LIM and SIM criteria. Note. The panels depict percentages of participants in each group whose WJ-IV performance patterns were consistent (right) and inconsistent (left) with the C-LIM and SIM decline.
Table 2. WJ-IV IQ Scores Observed in the Two Study Groups Within the Culture-Language Interpretative Matrix. Note. WJ-IV tests with large (dark grey) and small (light grey) performance differences between majority and minority groups. The two colors allow for a direct comparison between the Culture-Language Interpretative Matrix (C-LIM) and Sign Interpretative Matrix (SIM) classifications.
Participants’ Characteristics
Assessment of relationships between the two classification systems and social, cultural, language-related, and behavioral characteristics showed that while the C-LIM was not related to any of these variables (p ≥ .46), a larger SIM value was significantly associated with greater socio-cultural differences reported by the family (r = .20, p < .01, 95% CI [.07, .32]).
Finally, the C-LIM was not associated with any of the behavioral ratings (p ≥ .06); however, lower conversational proficiency and weaker cooperation were related to poorer performance in tests with a high sign/abstract load (r = −.14, p = .05; r = −.16, p = .02; respectively).
Discussion
The aim of this study was to assess the validity of the WJ-IV as a measure of intelligence among the Roma minority in the European context. This goal was achieved by evaluating performance differences between groups of Romani and majority children, exploring the role of socio-economic, socio-cultural, linguistic, and behavioral characteristics in cognitive performance, and evaluating the relevance of the C-LIM classification among the Roma population.
Our results revealed the largest differences between Roma and majority children in tasks building strongly on abstract and logical reasoning together with predominantly visual presentation of complex sign material. In addition, the minority group’s cognitive scores were positively associated with parental education but inversely related to household crowding and reported socio-cultural distinction from the majority. Cognitive performance was also related to the administrators’ behavioral ratings with conversational proficiency, response to difficult tasks, and attention and concentration as the most relevant items in this context. Applying the C-LIM to the dataset did not result in reliable discrimination between the groups on high/high relative to low/low C-LIM cells. Indeed, a higher percentage of individuals from the majority relative to the minority group was misclassified as having invalid test scores. The largest (and smallest) differences were identified in tests classified as having various combinations of high to low linguistic and cultural demand. Furthermore, the C-LIM index was unrelated to any of the socio-cultural, socio-economic, and behavioral characteristics. In contrast, our data-driven SIM classification not only distinguished between the study groups with more precision, but was also related significantly to the socio-cultural differences reported by the families and the children’s conversational proficiency and cooperation as rated by the examiners.
The main results of this study are largely consistent with available evidence, suggesting problematic application of the C-LIM in practice. The C-LIM appears to distinguish individuals with different linguistic and cultural backgrounds from other populations only at the chance rate, while a very limited proportion of minority individuals performed consistently with all its predictions (Calderón-Tena et al., 2022; Kranzler et al., 2010; Styck & Watkins, 2013, 2014). A close inspection of the Roma children’s performance patterns revealed that they experienced difficulties particularly in tests requiring cognitive processes such as categorization, rule inference, induction and deduction, and manipulation of complex signs such as numbers (e.g., Concept Formation, Oral Vocabulary, and Number Series). In contrast, group differences between native English speakers and English language learners have been documented primarily in tests that build on familiarity with the language of administration, such as Verbal Comprehension, Sound Blending (Calderón-Tena et al., 2022; Kranzler et al., 2010), Vocabulary, Similarities (Styck & Watkins, 2013, 2014), and Visual-auditory Learning (Calderón-Tena et al., 2022); however, differences have also been reported in tests based on abstract reasoning, such as Concept Formation (Sotelo-Dynega et al., 2013). We must acknowledge at this point that our findings are comparable to those obtained in the aforementioned research only to a certain extent: The Roma minority has been living in Europe for a long time, and bilingualism was not common in our sample. Issues related to the acquisition of the official language can therefore be regarded as less relevant. In contrast, the cultural background of the Roma is traditionally significantly different from that of the mainstream population, and this is frequently combined with their social exclusion and unfavorable SES (Biro et al., 2009; Dolean & Cãlugãr, 2020).
Given the significant impact of culture on cognition, for instance, on preference for abstract conceptualization over concrete experience (Joy & Kolb, 2009), it might be more appropriate to compare our findings with data from minorities with potentially similar learning styles. A recent investigation into the applicability of the C-LIM in Hmong Americans is relevant in this regard: Consistent with our findings, the Hmong participants scored within the average range in practical tasks (Block counting, Triangles, Story Completion), but achieved very low scores on tests measuring abstract reasoning, generalization and categorization (Conceptual Thinking, Pattern Reasoning; Romstad & Xiong, 2017). Differences in categorization performance either with respect to preference for a certain categorization style (Kundrátová, 2012) or speed of categorization processing (Denglerová, 2016) have already been documented among Romani children. Thus, our data aptly illustrate that the use of the C-LIM in its current form would contribute to testing bias in similarly culturally different populations. A distinction between formal and informal learning styles could facilitate interpretation of these findings: while the former learning style is individual and based on scientific analysis, categorization of information, and logical deduction, the latter occurs naturally during the daily activities and socio-cultural practices of an entire community (Marshall & DeCapua, 2013). As with Hmong Americans, Romani children are likely to learn in a pragmatic and informal manner in their home environments. Consequently, tasks based on a formal learning style do not reflect their cognitive abilities accurately. In addition to the importance of abstract thinking and degree of collectivism in a given culture, other cultural differences, such as familiarity with certain tasks should be considered as impacting cognitive performance in this context (Daugherty et al., 2017; Singh et al., 2021). 
For instance, the educational climate within families also significantly affects children’s intellectual development. Compared to non-Roma children with low SES, Roma children generally do not experience an adequate, stimulating atmosphere at home; their access to age-appropriate toys, books, or writing/drawing tools is typically limited (Biro et al., 2009). We can therefore speculate that our study groups differed in familiarity with the testing situation and experience with specific tasks and stimulus materials; however, these variables were not tested here directly, leaving this question open for future investigations. Another potentially serious confounding factor is the motivation to perform well. Participants with below-average IQ scores may score lower as a result of either low test motivation or low ability, whereas test motivation tends to be consistently higher in participants scoring above average (Duckworth et al., 2011). A weaker need to evaluate their own performance in cognitive tests has already been noted among Roma children (Denglerová, 2016). This corresponds with our finding that poorer working memory and fluency with numerical information are associated with a stronger tendency to give up when presented with difficult tasks, indicating that the Roma children’s performance was confounded, at least to a certain degree, by this non-cognitive aspect.
This study has several limitations. First, language proficiency was not evaluated in detail. On the one hand, the proportion of children who spoke Romani at home was very low in our sample and this variable was unrelated to cognitive performance. On the other hand, the Roma children’s conversational proficiency as rated by examiners yielded strong associations with verbal and non-verbal WJ-IV scores. A poorer ability to communicate thus cannot be attributed to bilingualism, but could rather be a consequence of a different cultural background and learning styles. Second, due to the limited number of administered WJ-IV tests, no inferences can be made with respect to other C-LIM cells or other cognitive abilities. Future studies on Roma and similarly culturally diverse populations should focus on differentiating performance in tests grouped according to requirements for abstract thinking (Romstad & Xiong, 2017). Third, although the overall internal consistency of the SEQ was acceptable, its three dimensions reached questionable reliability values. Thus, the observed associations (or lack thereof) with cognitive performance should be interpreted with caution. Fourth, socio-economic characteristics such as family crowdedness and family exclusion status were not available for the majority children. The nuanced relationships between socio-economic variables and cognitive performance in our sample of Roma children suggest that considering the role of additional SES factors in the cognitive performance of both study groups could be insightful. Fifth, the identified relationships between variables were generally relatively weak. This might be associated with the complex nature of the associations among the variables and with the above-mentioned limitations concerning the socio-economic and socio-cultural indices employed. We therefore encourage investigators to include established measures when studying the Roma population to test these relationships.
Interestingly, test-related behavior appears to be an additional domain worth considering during the assessment of cognition in minority children. Finally, while we acknowledge that our data-driven SIM categorization requires replication in independent samples of Roma children and other minorities with distinct profiles of cultural and linguistic backgrounds, it offers testable hypotheses for future research, which is necessary for an empirically supported revision of the C-LIM method.
In conclusion, the large differences in cognitive performance revealed between majority and minority children highlight the importance of addressing this issue in research and practice. The presented findings outline potential sources of cognitive test bias in Roma children that are related to the measurement method and include familiarity with test stimuli and the test situation, and motivation to perform well. These cultural specificities of the Roma minority could be addressed in practice by categorizing cognitive tests with respect to the required degree of abstract reasoning and manipulation of complex sign material. Further empirical scrutiny should involve more comprehensive socio-economic and socio-cultural indices to confirm fundamental aspects of test bias in the Roma population.
Supplemental Material
Supplemental Material - Validity of Intelligence Assessment Among the Roma Minority Population
Supplemental Material for Validity of Intelligence Assessment Among the Roma Minority Population by Kristína Czekóová and Tomáš Urbánek in Journal of Psychoeducational Assessment
Footnotes
Acknowledgments
We wish to thank Alena Hricová and Matěj Seifert for their efforts in project organization and data collection.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported financially by the project No. TL02000187; Standardization of Woodcock-Johnson IV test for population of Romani children (TAČR) and conducted in cooperation with National Pedagogical Institute of the Czech Republic.
