Abstract
Vocabulary assessment is an important part of measuring language proficiency in both monolingual and bilingual children. The LITMUS Cross-Linguistic Lexical Tasks (CLT) provides a framework for assessing the vocabulary of monolingual and bilingual children using a standardized procedure and comparable stimuli across languages. All language versions of CLT include picture naming and picture recognition tasks, using both nouns and verbs as target words. The present study demonstrates high reliability (internal consistency) and convergent validity of the Polish CLT (CLT-PL) in a monolingual sample of children between 3 and 6 years of age. We present data collected from 479 participants. Based on an analysis of the impact of children’s demographic characteristics on CLT scores, we developed a set of monolingual norms for children aged 3;0 to 5;11. We conclude that the CLT-PL is a reliable and valid tool for assessing vocabulary in monolingual children that can be applied in research and has the potential to serve as a diagnostic tool in the future. This brings us one step closer to tools for assessing the vocabularies of multilingual children learning Polish.
Keywords
Introduction
Vocabulary assessment for monolinguals and bilinguals
Tools measuring the number of words a child knows, that is, the child’s vocabulary size, are useful to practitioners because vocabulary size may serve as a reliable estimate of the child’s overall language competence. For example, a limited vocabulary may indicate a language disorder (McGregor et al., 2013; Rice & Hoffman, 2015; Verbeek et al., 2024), especially if the child has difficulty understanding and producing verbs (Riches et al., 2005). When evaluating the vocabulary size of multilingual children, it is crucial to have tools that can assess monolingual and multilingual children in the same way, for example, to determine language dominance (Bonvin et al., 2023; Hansen et al., 2024). In addition, comparable assessment of vocabulary skills across multiple languages may allow for cross-linguistic research on universals in lexical acquisition (Bleses et al., 2008; Braginsky et al., 2016; Haman et al., 2017).
Both researchers and practitioners emphasize that a reliable assessment of multilingual children’s vocabulary must be conducted in all the languages to which the child is exposed, as assessing the child’s lexicon in only one language may severely underestimate his or her abilities (Peña et al., 2016). However, assessment tools are lacking for many languages. Where instruments (or tasks) do exist, they are often not standardized or normed, and they may not be comparable to tools available for other languages. Direct translation of language assessment tools is not a recommended way to compensate for the lack of comparable tools, because words in different languages can vary considerably in difficulty, as measured by age of acquisition, morphological complexity, or word frequency (Mueller Gathercole et al., 2008; Peña, 2007; Schaefer et al., 2016; van Wonderen & Unsworth, 2021). The first steps toward creating comparable tools and preparing them for use in practice and research are to design the tools according to the same set of rules, then to establish monolingual norms, and finally to check whether these norms are similar across languages.
Normed tools assessing vocabulary in mono- and bilingual preschool-aged children
The Polish version of the Cross-Linguistic Lexical Tasks, CLT-PL, is certainly not the first standardized instrument for measuring the vocabulary of monolingual Polish-speaking children. Psychometric instruments already available in Poland include OTSR (Obrazkowy Test Słownikowy – Rozumienie; Haman & Fronczyk, 2012), which measures receptive vocabulary and has norms for children aged 2;0 to 6;11; TRJ (Test Rozwoju Językowego; Smoczyńska et al., 2015), which includes two vocabulary tasks (naming pictures and understanding words) and has norms for children aged 4;0 to 8;11; and TSD (Test Słownikowy dla Dzieci; Koć-Januchta, 2013), which includes four subtests, among them picture naming, and has norms for children aged 4;0 to 7;11. However, although all these instruments have good psychometric properties in Polish monolingual children, they were not developed in a way that would allow easy adaptation or translation to other languages. Thus, they do not support cross-linguistic studies of monolingual speakers of different languages. Moreover, they do not allow Polish-speaking multilinguals to be assessed in a way that makes their scores comparable across all their languages. Furthermore, these existing tools are not freely available for use in research or diagnosis. Standardization of the CLT-PL can fill this gap by providing a freely available assessment tool for Polish monolinguals, and it is a first step toward providing data for cross-linguistic studies and, eventually, for the assessment of Polish-speaking children living abroad.
Worldwide, there is a general lack of normed vocabulary tests (as well as other language tests) for bilingual populations (Freeman & Schroeder, 2022; Teoh et al., 2018), particularly for children acquiring languages other than English. As a result, researchers and practitioners often use tasks that have not been normed or that have been normed only for monolingual populations (Rethfeldt et al., 2024). There are, however, two examples of tests successfully normed for bilingual populations (Mueller Gathercole et al., 2008; Peña et al., 2018).
One example comes from the Bilingual English-Spanish Assessment (BESA): a test designed to measure language skills of Spanish-English bilingual children aged 4 to 6 years (Peña et al., 2018). Administered to a particular population of Latino children growing up in the United States, the test comparatively assesses multiple aspects of both languages. Based on BESA scores, it is possible to monitor progress in language development, document language dominance, and identify language impairment in bilingual and monolingual children in this specific population (Peña et al., 2018).
Mueller Gathercole et al. (2008) developed a picture-based vocabulary comprehension test in which children aged 7 to 11 years acquiring Welsh and English heard a target word and were then asked to select one of four pictures. As the language input of these children varied considerably, the authors divided the children into three groups (hearing mainly Welsh at home, hearing mainly English at home, and hearing both languages at home). They then calculated norms separately for each of these groups and showed, for instance, that a given score might be typical for children who hear mostly English at home but low for children who hear mostly Welsh at home. These exceptions highlight the widespread need, in the face of global multilingualism, for bilingually normed tools for many language pairs.
Although they have not been normed for monolingual, bilingual, or multilingual children, the CLTs have been used extensively for nearly a decade in over 25 languages to assess vocabulary in both monolingual and multilingual children (Abbot-Smith et al., 2018; Altman et al., 2017; Bohnacker et al., 2022; Chondrogianni et al., 2022; Czapka et al., 2023; Eikerling et al., 2022, 2023; Gatt et al., 2017; Haman et al., 2017; Hansen et al., 2019; Kapalková & Slančová, 2017; Khoury Aouad Saliby et al., 2017; Komeili et al., 2023; Lindgren & Bohnacker, 2022; Łuniewska et al., 2022; Polišenská et al., 2018; Smolík & Bytešníková, 2021; van Wonderen & Unsworth, 2021; van Zwet & Unsworth, 2024; Verbeek et al., 2024). The standardization of the CLT-PL in a monolingual population can provide a model for the monolingual standardization of CLTs in other languages, as well as for standardization attempts in specific bi- or multilingual populations. The bilingual norming studies described above show that norming a test for a bilingual population requires attention not only to the languages spoken but also to sociolinguistic and demographic factors. Previously normed vocabulary tests for monolingual Polish speakers have shown that sex, maternal education, area of residence (rural/urban), or region of residence in Poland may affect the vocabulary scores of Polish-speaking children (Haman et al., 2012; Smoczyńska et al., 2015). In the CLT-PL norming study, we first explore this set of demographic factors to determine the composition of the norming sample. Second, we provide monolingual norms for the CLT-PL.
Cross-Linguistic Lexical Tasks
Cross-Linguistic Lexical Tasks (CLT; Haman et al., 2015) are part of the Language Impairment Testing in Multilingual Settings (LITMUS) battery (Armon-Lotem et al., 2015) developed in the COST Action IS0804 (2010–2013) ‘Language Impairment in a Multilingual Society: Linguistic Patterns and the Road to Assessment’. CLT was developed to measure the comprehension and production of nouns and verbs in children aged 3 to 6 years (Haman et al., 2015). However, the age range of children who have been tested with various CLTs to date is much broader, ranging from 15 months to 11 years (e.g. Haman et al., 2017; Komeili et al., 2023; Smolík & Bytešníková, 2021).
The main idea behind the development of CLT lies in a common procedure of task design. In each language, the tasks take a similar form: they use pictures from the same picture base (Wolna et al., 2023) and have the same structure, with four subtests (noun comprehension, verb comprehension, noun production, and verb production) containing the same number of test items. However, the exact selection of test items is made separately for each language following a similar approach, based on the age of acquisition and the phonological and morphological complexity of the words (Haman et al., 2015; Simonsen & Haman, 2017), as well as on the cultural appropriateness of the pictures (Wolna et al., 2023), for example, by avoiding pictures that are considered taboo (Lew et al., 2025; Yap et al., 2017). Because the test development process in every language began with the same list of 300 concepts, derived from the names of objects and actions that easily evoked a word in a picture naming study across 34 different languages (Haman et al., 2015), the tools have the potential to be comparable across languages. Thus, the list of potential target words in all CLT language versions contains only cross-linguistically recognized concepts, which reduces the likelihood that a test item would be unintelligible to children speaking a given language in a context other than the one in which the norms were collected. An example of such an unintelligible picture is the illustration of ‘carpet beating’ that children are asked about in one of the Polish vocabulary tests. Although children in Poland recognize this picture, children who grew up with Polish as one of their languages and live in another part of the world, for example, Norway or the U.K., may not recognize the activity, as it is not performed in their environment (further examples are given in the Discussion).
The first studies of monolingual children using the CLTs covered 17 different languages, and a comparison of data from these studies suggests that the tasks may have similar levels of difficulty across languages (Haman et al., 2017). Measuring accuracy in the noun and verb naming and comprehension tasks, the authors found only a weak effect of language, in contrast to stronger effects of children’s age, word class, and task (Haman et al., 2017). However, another study reported that an identical item selection procedure does not guarantee that scores can be reliably compared directly across languages (van Wonderen & Unsworth, 2021). Therefore, although CLT was developed to allow comparability of scores across languages, this assumption still needs to be verified, and norming studies should be conducted in each language to ensure the reliability of cross-linguistic comparisons (Haman et al., 2015; van Wonderen & Unsworth, 2021).
Reliability and validity of CLT
Although CLT is currently available in 40 languages and has been used in many studies described in over 70 publications since 2015, data on the reliability and validity of the tasks in each language are still limited. Previous studies analyzing the reliability of various CLTs have typically focused on task construction, in terms of correlations between item accuracy in the CLTs (i.e. the proportion of correct responses for an item) and word properties used during tool development, such as age of acquisition or word complexity (Haman et al., 2017; Hansen et al., 2017; van Wonderen & Unsworth, 2021). The reliability of the tool, understood as internal consistency, has been reported in only four publications. First, Kapalková and Slančová (2017) reported that the internal consistency of the Slovak CLT was very high (Cronbach’s α = .90) in their sample of monolingual Slovak speakers aged 4 to 5 years (including typically developing children and children diagnosed with developmental language disorders). Test–retest reliability was also established in the Slovak study (data from a subsample of 22 children retested after 2 weeks), showing very high reliability for the production tasks (r = .93, p < .01, n = 22) and lower reliability for the comprehension part (r = .77, p < .01, n = 22; Kapalková & Slančová, 2017). Second, Yılmaz Çifteci and Tunçer (2024) recently reported on the reliability of the Turkish CLT in a sample of monolingual Turkish speakers aged 2 to 5 years. They reported very high internal consistency of the CLT-TR (Cronbach’s α = .96) and high reliability in a test–retest study following a 2- to 3-week interval, ranging from rho = .82 (p < .001, n = 25) for noun comprehension to rho = .95 (p < .001, n = 25) for verb comprehension (Yılmaz Çifteci & Tunçer, 2024). Third, the reliability of the unrevised CLT-PL was reported by Łuniewska et al. (2022), who calculated split-half coefficients based on data from a sample of 90 monolingual and bilingual Polish children aged 4 to 7 years. The study reported very high internal consistency, with Cronbach’s alpha coefficients of .90 for production and .87 for comprehension. Finally, Verbeek et al. (2024) reported high internal consistency of the CLT comprehension task for Dutch (Cronbach’s α = .95), Polish (Cronbach’s α = .97), and Turkish (Cronbach’s α = .95) in a sample of monolingual Dutch children and bilingual Dutch–Polish and Dutch–Turkish children, including both typically developing children and children with developmental language disorders.
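For readers less familiar with the internal-consistency coefficient cited above, the following minimal Python sketch shows how Cronbach’s α is obtained from a children × items matrix of 0/1 accuracy scores: α = k/(k − 1) × (1 − Σσ²ᵢ / σ²ₓ), where k is the number of items, σ²ᵢ is the variance of item i, and σ²ₓ is the variance of the total score. The function name and the toy data are illustrative assumptions. They are not taken from the CLT-PL materials or from the analyses in the cited studies.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (children) of item scores (0/1).

    Population variances are used throughout. Because the same estimator
    appears in the numerator and the denominator, the ratio is unchanged
    if sample variances are used instead.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item (column) across children
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    # Variance of each child's total score
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical toy data: 4 children x 3 items (1 = correct, 0 = incorrect)
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(cronbach_alpha(toy), 2))  # 0.75
```

With real CLT-PL data, each row would hold one child’s 32 item scores for a subtask. The toy matrix is only meant to make the arithmetic easy to follow by hand.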
Other measures of the internal validity of CLTs include intercorrelations between scores on the comprehension and production subtasks, which have been reported in only a few studies. In a sample of 90 monolingual and bilingual Polish speakers aged 4 to 7 years, Łuniewska et al. (2022) reported a strong correlation between the comprehension and production subtasks (r = .69, p < .001, n = 90). In a sample of 55 bilingual Scottish Gaelic–English speakers aged 6 to 8 years, the corresponding correlation was lower, although still moderate (r = .41, p < .01, n = 55; Chondrogianni & Butcher, 2023). Altman et al. (2017) reported the intercorrelations separately for monolingual children (26 Hebrew-speaking children aged 5–6 years) and bilingual children (27 Russian–Hebrew bilinguals of the same age), and separately for nouns and verbs. The correlations were stronger in the bilingual sample and stronger for verbs than for nouns (monolinguals: nouns r = .43, p < .05, n = 26; verbs r = .62, p < .01, n = 26; bilinguals: nouns r = .68, p < .001, n = 27; verbs r = .80, p < .001, n = 27; Altman et al., 2017).
The body of research reporting on the validity of CLT is more extensive. It includes studies that compare CLT scores with other measures of linguistic ability (e.g. grammar production and comprehension, word morphology, narrative), with parental reports (e.g. vocabulary reports, reports of general language ability), and with measures of nonlinguistic ability (e.g. perspective taking, nonverbal IQ). Validity studies are summarized in Table 1. In general, they show strong relationships between CLT scores and other test scores of language skills, such as receptive grammar (e.g. Dutch: van Wonderen & Unsworth, 2021; van Zwet & Unsworth, 2024; Farsi: Komeili et al., 2023; Slovak: Polišenská et al., 2018; Spanish: van Wonderen & Unsworth, 2021), productive grammar (Polišenská et al., 2018), narrative microstructure and macrostructure (Farsi and English: Komeili et al., 2023; Swedish and Turkish: Bohnacker et al., 2022; German and Swedish: Lindgren & Bohnacker, 2022), and independent measures of vocabulary (Italian: Eikerling et al., 2023) or language samples from free play (Kapalková et al., 2024). At the same time, the relationships between CLT-measured vocabulary size and parental ratings of language skills are somewhat weaker (Abbot-Smith et al., 2018; Hansen et al., 2019; Kapalková et al., 2024; Khoury Aouad Saliby et al., 2017; Łuniewska et al., 2024; Smolík & Bytešníková, 2021).
Table 1. Convergent validity of Cross-Linguistic Lexical Tasks.
Notes. CCC2–GCC = Children’s Communication Checklist 2, General Communication Composite; CDI = Communicative Development Inventory; DLD = developmental language disorder; ELO-L = Oral Language Evaluation – Lebanese Arabic; ISTs = Internal State Terms; MAIN = Multilingual Assessment Instrument for Narratives; NWR = non-word repetition task; PaBiQ = Parents of Bilingual Children Questionnaire; SLI = specific language impairment; SRT = sentence repetition task; TD = typically developing; TFL = Test Fono-Lessicale: Valutazione delle abilità lessicali in età prescolare.
*p < .05. **p < .01. ***p < .001.
The current study
The current study presents the first norming study of any version of the CLT. First, we present the revised version of the Polish CLT along with information on its reliability in terms of internal consistency and its convergent validity in terms of correlations with other standardized measures of vocabulary and grammar. We then analyze the impact of the demographic characteristics of the sample, that is, sex, region, and area of residence (rural/urban), as well as maternal education, on CLT-PL scores. Based on this analysis, we select a subsample that reflects the Polish population in terms of the characteristics that may affect CLT-PL scores, and we present a set of norms for the CLT-PL for monolingual children aged 3;0 to 5;11. Following the principles of open science, the current paper is accompanied by a complete database that can be freely used for further research (OSF: https://osf.io/8cd73/).
Methods
Participants
This study was part of a larger PolkaNorski project and was approved by the Research Ethics Committee at the Faculty of Psychology, University of Warsaw.
We recruited 763 participants, of whom 479 children completed the testing procedure. Participants were recruited through posters and flyers in preschools, as well as through social media. Parents signed an informed consent form prior to testing. Children who verbally agreed to participate in the study were tested in preschools or at the Faculty of Psychology, University of Warsaw, and received books, diplomas, t-shirts, and stickers as tokens of appreciation, depending on the number of tasks they performed. The group consisted of 257 girls and 222 boys aged 2.88 to 7.46 years (M = 4.97, SD = 1.05), including 4 children younger than 3 years, 96 3-year-olds, 139 4-year-olds, 159 5-year-olds, 68 6-year-olds, and 13 children older than 7 years. The participants attended 37 different preschools located in 31 different cities and villages in all seven macroregions of Poland (Statistics Poland, 2021). Maternal education in the sample was distributed as follows: 69% higher education (bachelor’s degree or above), 26% secondary education, and 4% primary education. For the population of children born in Poland between 2016 and 2020, the expected proportions would be 60% living in urban and 40% in rural areas, with the following distribution of maternal education: 52% higher education, 33% secondary education, and 16% primary education (Statistics Poland, n.d.). Therefore, the overall sample in the current study overrepresented children of mothers with university education and underrepresented children of mothers with primary education. In terms of rural/urban area of residence, the total sample was relatively close to the population structure.
Regarding the children’s language environment, the parents of 77 children (16%) reported that their children had some exposure to English, mainly in preschool or while watching television. The remaining 402 children (84%) were reported to have no regular exposure to languages other than Polish. None of the parents of the children included in the sample reported any serious health or neurodevelopmental problems in their children.
Subsamples of the children participated in additional validation studies in which either the parent completed the Polish CDI-III (Krajewski et al., 2023) or the child was tested with the Obrazkowy Test Słownikowy – Rozumienie (OTSR; Haman & Fronczyk, 2012; see the section on validation studies and Table 2).
Table 2. Characteristics of subsamples of participants in validation studies of the CLT-PL.
Notes. CDI = Communicative Development Inventory; CLT = Cross-Linguistic Lexical Tasks; CLT-PL = Polish CLT; OTSR = Obrazkowy Test Słownikowy – Rozumienie.
To calculate norms for the Polish CLT (percentile scores, standardized scores on an IQ-type scale with M = 100 and SD = 15, and stanine scores with M = 5 and SD = 2), we extracted data from a subsample of children selected so that their demographics were as close as possible to those of the population with respect to the characteristics that affect CLT-PL scores (n = 280; see Table 3, and see the section on the impact of demographic characteristics for how these characteristics were chosen). This subsample included data from all children whose mothers had primary education. For the overrepresented subgroup (children of mothers with university education), we included data from randomly selected children.
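The conversion from a raw score to the three norm types can be sketched as follows. This Python sketch assumes a normalized (area) transformation: the child’s mid-rank percentile within a reference group is mapped to a z score through the inverse standard normal distribution, and z is then rescaled to M = 100, SD = 15 and to stanines. The paper’s exact procedure, including the age bands used as reference groups, is not described in this section. The function name, the dataclass, and the data are therefore hypothetical.

```python
from dataclasses import dataclass
from math import floor
from statistics import NormalDist


@dataclass
class NormScores:
    percentile: float  # percentile rank in the reference group (0-100)
    standard: int      # standardized/IQ-type score, M = 100, SD = 15
    stanine: int       # stanine, M = 5, SD = 2, range 1-9


def norm_scores(raw, reference):
    """Convert a raw score to norm scores relative to a reference group.

    Assumption: normalized transformation via the mid-rank percentile,
    not necessarily the procedure used for the CLT-PL norms.
    """
    n = len(reference)
    below = sum(1 for r in reference if r < raw)
    equal = sum(1 for r in reference if r == raw)
    p = (below + 0.5 * equal) / n
    # Keep p strictly inside (0, 1) so that inv_cdf is defined
    p = min(max(p, 0.5 / n), 1 - 0.5 / n)
    z = NormalDist().inv_cdf(p)
    standard = round(100 + 15 * z)
    # Stanine bands are 0.5 SD wide; stanine 5 covers -0.25 <= z < 0.25
    stanine = min(9, max(1, floor(2 * z + 5.5)))
    return NormScores(percentile=100 * p, standard=standard, stanine=stanine)


# Hypothetical raw scores of one age group in the norming subsample
age_group = [10, 20, 30, 40, 50]
print(norm_scores(30, age_group))
print(norm_scores(50, age_group))
```

An alternative is a linear transformation, which computes z directly from the reference group’s mean and SD. That approach keeps the shape of the raw-score distribution, whereas the normalized version shown here forces the derived scores toward a normal shape. Which of the two underlies the CLT-PL norms should be checked against the norming manual.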
Table 3. Characteristics of the subsample used for the calculation of the norms.
Materials and procedures
Materials
Demographic data, as well as data on children’s language environment and development, were collected using the Polish version of PaBiQ (Parents of Bilingual Children Questionnaire; Mieszkowska et al., 2021; Tuller, 2015). In the current project, we used PaBiQ data on parents’ education, children’s development and health, and language environment.
Polish CLT
As mentioned above, the CLT-PL includes both word comprehension (picture recognition) and word production (picture naming) subtasks. In the comprehension subtasks, children listen to prerecorded questions (e.g. ‘Where is the flag?’ in the noun comprehension task and ‘Who is swimming?’ in the verb comprehension task) and select one of four pictures displayed on a tablet by touching the screen. The correct picture is referred to as the ‘target’ and the other three pictures as ‘distractors’. In the production subtasks, children are presented with a single picture and respond to prerecorded questions (e.g. ‘What is this?’ in the noun production task and ‘What is she doing?’ in the verb production task). Each of the four subtasks (noun comprehension, verb comprehension, noun production, verb production) consists of 32 items, for a total of 128 items. Scores therefore range from 0 to 64 for receptive vocabulary (the two comprehension subtasks) and from 0 to 64 for expressive vocabulary (the two production subtasks). The target words and distractors in the CLT-PL were selected in the same way as in other language versions of CLT, that is, based on the age of word acquisition and on their phonological and morphological characteristics, which allows results to be compared across different versions of CLT. The full list of the CLT-PL items is available in the OSF archive: https://osf.io/8cd73/.
Revisions of the CLT-PL
Reliability
The original version of the CLT-PL has been used in several studies, primarily examining the vocabulary size of Polish monolinguals and bilinguals, including Polish–Dutch, Polish–English, Polish–Italian, and Polish–Norwegian speakers (Abbot-Smith et al., 2018; Haman et al., 2017; Hansen et al., 2017, 2019; Krysztofiak et al., 2025; Łuniewska et al., 2022, 2024; Muszyńska et al., 2024; Verbeek et al., 2024). Using both previously published (Abbot-Smith et al., 2018; Haman et al., 2017; Hansen et al., 2017, 2019; Krysztofiak et al., 2025; Łuniewska et al., 2022; Muszyńska et al., 2023) and unpublished data, we analyzed the psychometric properties of the test items. The data included 386 children aged 3 to 8 years, comprising 209 monolinguals and 177 bilinguals who spoke Polish as their first language and English, Norwegian, or Italian as their second language. In general, the original version of the Polish CLT had high reliability (as assessed by Guttman’s λ-4), which varied between .94 and .97 in the bilingual sample, closely corresponding to the reliability reported in an independent sample of Polish–Dutch bilinguals (Cronbach’s α = .97; Verbeek et al., 2024). In the monolingual sample, reliability was somewhat lower – depending on the task, it varied between .90 and .96 for children aged 4;0 to 4;11, between .84 and .98 for children aged 5;0 to 5;11, and between .64 (noun production) and .97 (noun comprehension) for children aged 6;0 to 8;0. The reduced reliability of the tasks for children above the age of 6 was due to ceiling effects. Therefore, we decided to limit our study to data from children aged 3;0 to 5;11.
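Guttman’s λ-4 is a split-half coefficient. For a split of the items into halves A and B, λ-4 = 2 × (1 − (σ²_A + σ²_B) / σ²_X), where σ²_X is the variance of the total score. This section does not state which split, or which summary over splits, underlies the values reported above. With that caveat, the Python sketch below computes λ-4 for a given split and, for small item sets only, the maximum over all splits. With 32 items per subtask an exhaustive search is impractical, and software typically samples splits or uses heuristics instead. The function names and data are hypothetical.

```python
from itertools import combinations
from statistics import pvariance


def _half_totals(scores, items):
    """Sum of the selected item columns for each child."""
    return [sum(row[j] for j in items) for row in scores]


def lambda4(scores, half_a):
    """Guttman's lambda-4 for the split (half_a, remaining items)."""
    k = len(scores[0])
    a = set(half_a)
    half_b = [j for j in range(k) if j not in a]
    if not a or not half_b:
        raise ValueError("both halves must contain at least one item")
    var_a = pvariance(_half_totals(scores, a))
    var_b = pvariance(_half_totals(scores, half_b))
    var_x = pvariance(_half_totals(scores, range(k)))
    if var_x == 0:
        raise ValueError("total scores have zero variance")
    return 2 * (1 - (var_a + var_b) / var_x)


def max_lambda4(scores):
    """Largest lambda-4 over all splits (exhaustive, so small item sets only)."""
    k = len(scores[0])
    best = float("-inf")
    # Keep item 0 in half A so that each split is visited exactly once
    for size in range(k - 1):
        for extra in combinations(range(1, k), size):
            best = max(best, lambda4(scores, (0,) + extra))
    return best


toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]  # hypothetical 0/1 data
print(round(lambda4(toy, [0]), 2))  # 0.6
print(round(max_lambda4(toy), 2))   # 0.8
```

The toy example shows why the choice of split matters. The same data give λ-4 = .6 for one split and .8 for the best split.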
Naming agreement
In the revised CLT-PL designed for the present study, we replaced 15 words in the noun and verb production tasks because of children’s frequent use of synonyms or regional variants, inconsistent naming by adults (Wolna et al., 2023), or both. To identify words to replace because of children’s use of synonyms, we scored all responses in two ways. ‘Soft scoring’ accepted synonyms and regional variants as correct (Bohnacker et al., 2016). ‘Strict scoring’ awarded points only for responses that included the stem of the target word (Hansen et al., 2017; Łuniewska et al., 2022). We then calculated the differences between soft and strict scores and replaced words for which the two scoring methods differed significantly across age groups. We replaced eight items because children frequently used synonyms (e.g. koszula ‘a shirt’ was often named bluzka ‘a blouse’) or regional variants (e.g. huśtawka ‘a swing’ was often named bujawka or bujaczka). When named with the popular synonym or regional variant, these items would be scored as correct under the rules proposed by Bohnacker et al. (2016) but as incorrect under the ‘strict scoring’ procedure used by other authors (e.g. Hansen et al., 2019; Łuniewska et al., 2022). To identify words to replace because of naming disagreement among adults, we analyzed naming agreement for the pictures in the noun and verb production tasks based on data collected from adult native speakers of Polish (Wolna et al., 2023). We decided that items with a naming agreement below 80% should not be used as targets in the naming subtasks, as children could not be expected to name such pictures consistently. Four items were replaced because of low naming agreement among adults. For example, a picture of a person frying eggs (target word: smażyć ‘to fry’) was labeled with the verb gotować ‘to cook’ by 30% of adults.
Finally, three items were both named by children with synonymous words and showed low (below 80%) naming agreement among adults (e.g. krokodyl ‘crocodile’ was named aligator ‘alligator’). The production targets were exchanged with matching targets or their distractors from the comprehension subtask (see details in Haman et al., 2015). A comprehension match was a word from the comprehension subtask with a similar age of acquisition, morphological/phonological complexity, and semantic domain (e.g. the production target krokodyl ‘crocodile’ was exchanged with the comprehension target pingwin ‘penguin’). In most cases (12 of the 15 replacements), the exchange was made with a comprehension target. In the remaining three cases, the matching comprehension targets also had low naming agreement among adults, so the production target was replaced by a matching distractor. The full list of changes made to the original Polish CLT and their rationale can be found in Supplemental Material 1, which is available in the OSF archive: https://osf.io/8cd73/. Note that revisions in the production subtasks also required changes in the comprehension subtasks, since items were exchanged between the two parts.
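The two screening criteria described above can be sketched in Python. The sketch computes naming agreement against the 80% threshold and the per-item gap in proportion correct between soft and strict scoring. It rests on several assumptions. Agreement is treated as the share of adults who gave exactly the target word. The stem check is a plain substring test. Synonyms and variants come from a hand-made list. The significance test used to decide on replacements is not shown. All names and data are hypothetical.

```python
AGREEMENT_THRESHOLD = 0.80  # items below 80% adult agreement were not used as naming targets


def naming_agreement(adult_names, target):
    """Share of adult responses that match the target word exactly (assumption)."""
    return sum(1 for name in adult_names if name == target) / len(adult_names)


def strict_correct(response, stem):
    """Strict scoring: credit only responses containing the target's stem."""
    return stem in response.lower()


def soft_correct(response, stem, accepted_variants):
    """Soft scoring: also credit listed synonyms and regional variants."""
    text = response.lower()
    return stem in text or any(v in text for v in accepted_variants)


def soft_strict_gap(responses, stem, accepted_variants):
    """Difference in proportion correct between soft and strict scoring."""
    n = len(responses)
    soft = sum(soft_correct(r, stem, accepted_variants) for r in responses) / n
    strict = sum(strict_correct(r, stem) for r in responses) / n
    return soft - strict


# Hypothetical adult labels for the frying picture (target: smażyć)
adults = ["smażyć"] * 7 + ["gotować"] * 3
agreement = naming_agreement(adults, "smażyć")
print(agreement, agreement < AGREEMENT_THRESHOLD)  # 0.7 True

# Hypothetical child responses for huśtawka 'a swing'
children = ["huśtawka", "bujawka", "bujaczka", "huśtawkę"]
print(soft_strict_gap(children, "huśt", ["bujawk", "bujaczk"]))  # 0.5
```

A large gap, as in the swing example, marks an item whose score depends heavily on the scoring rule. These are the kinds of items the revision aimed to remove.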
Picture revisions
In 2021, we also revised the entire CLT picture base (available at: https://osf.io/8cd73/, Wolna et al., 2023). We had three goals for the revisions. First, we wanted to update the images representing objects that had changed significantly between 2011 and 2021, such as technological hardware (e.g. computers and televisions) and machines (e.g. cars and buses). Second, we wanted to increase the representation of people of different ethnicities. Third, we wanted to diversify the gender of people performing actions in the images by increasing the number of male agents performing domestic tasks (e.g. ironing, cooking, cleaning) and increasing the number of female agents performing other types of tasks (e.g. typing on a computer, hammering).
In the original Polish CLT, the majority of the agents in the pictures were white (95%). In the revised version, this proportion was reduced to 85%. Similarly, in the original CLT-PL, the overall proportions of female and male agents performing actions were very close (44% and 46%, respectively), but domestic tasks were mostly performed by female agents (79%). In the revised version of the CLT-PL, we reduced this proportion to 64%, increasing the share of male agents performing domestic tasks from 21% to 36%. The change was most pronounced for the pictures representing target words, where the share of male agents performing domestic tasks increased from 17% to 33%.
Procedure
The CLT-PL test was administered according to the norming study manual (available in the OSF archive: https://osf.io/8cd73/), and the procedure was very similar to previous studies (Hansen et al., 2019; Łuniewska et al., 2022). We administered the CLT-PL using a mobile app (Child Lexicon CLT-PL app; Sobota et al., 2021a, 2021b) that is compatible with Android tablets and iPads (links to the app are provided in the OSF archive: https://osf.io/8cd73/). The tests were administered individually in a quiet room within the children’s preschools (92% of the test sessions), at the Faculty of Psychology, University of Warsaw (6% of the sessions), or at the child’s home (2% of the sessions). The test sequence started with comprehension (nouns followed by verbs) and then moved to production tasks (nouns followed by verbs). For standardization purposes, all children completed the tasks in the same order so that the effect of task order on the results would be the same for all participants. The tasks were ordered from easiest (comprehension) to hardest (production) to allow children to become accustomed to the testing situation with less demanding tasks.
In the comprehension part, children heard prerecorded questions, followed by a brief 100 ms display of a blank screen, before viewing four pictures on the tablet and making their choice. In the production part, pictures were presented sequentially, each accompanied by a corresponding question, and the child tapped the screen to proceed to the next item. If the child did not want to tap the screen, the experimenter did so. The entire CLT-PL test session typically lasted approximately 15 to 20 min. The production tasks were audio-recorded to allow offline scoring of children’s responses. After the CLT assessment, some of the children also took part in another study conducted within the PolkaNorski project, concerning children’s inductive inferences about living things (Tarłowski et al., 2024).
Validation studies
Subsamples of children were additionally tested with another standardized measure of receptive vocabulary or a parental checklist to assess vocabulary and grammar skills. To assess receptive vocabulary, we used the Obrazkowy Test Słownikowy – Rozumienie (Picture Vocabulary Test – Comprehension; OTSR; Haman & Fronczyk, 2012). The OTSR is a standardized picture recognition test designed for children between the ages of 2;0 and 6;11; it assesses comprehension of nouns, verbs, and adjectives. The test consists of a series of picture boards. The participant’s task is to point to the picture (one of the four presented) that corresponds to the given question (e.g. ‘Where’s the horse?’ or ‘Who’s cutting?’). The OTSR test was always administered on the same day as the CLT-PL test.
The parental questionnaire was the Polish CDI-III (Krajewski et al., 2023), which consists of a vocabulary checklist of 100 words typically acquired by children between the ages of 2;10 and 4;0, and a set of grammar questions, including either direct questions on selected aspects of grammar skills or questions presenting two examples of utterances of varying complexity. This format is similar to other CDI-III adaptations (Eriksson, 2017; Holm et al., 2023; Tulviste & Schults, 2020). Parents completed the CDI-III before (n = 24) or after (n = 4) testing with the CLT-PL. The time interval between the CDI-III and the CLT-PL varied between 1 and 51 days (M = 13.1, SD = 12.3, Me = 10) for children whose parents completed the CDI-III before testing with the CLT-PL, and between 1 and 29 days (M = 13.3, SD = 13.5, Me = 11.5) for children who were first tested with the CLT-PL before their parents were asked to complete the CDI-III. Parents who completed the CDI-III received a 40 PLN voucher for an online bookstore.
Data preparation and analyses
CLT-PL scoring
We coded accuracy for the comprehension (picture recognition) and production (picture naming) tasks. For the picture recognition tasks, accuracy was scored automatically by the app. Children received 1 point for selecting the target picture and 0 points for selecting a distractor (or no picture). In the rare cases where children changed their minds about which picture to choose, the accuracy score took into account the child’s last response.
For the picture naming task, we implemented a soft accuracy scoring criterion that is closer to the scoring system proposed by Bohnacker et al. (2016) than to previous studies using the CLT-PL (Haman et al., 2017; Łuniewska et al., 2022). All responses that included a form of the target word, such as inflections, derivations, or mispronunciations, were considered correct. In addition, responses classified as close synonyms and regional variants were considered correct. All other responses, such as definitions, or semantic, phonological, or other errors, were considered incorrect. For each child, we calculated a score in each subtask, a comprehension score (noun comprehension + verb comprehension), a production score (noun production + verb production), and a total score (a sum of all subtasks).
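The two scoring rules above can be sketched in Python as follows. This is an illustrative sketch, not the Child Lexicon app’s implementation: the response-category labels in `CORRECT_NAMING` and the function names `score_recognition`, `score_naming`, and `summarize` are hypothetical, and we assume that every picture a child taps on a recognition item is logged in order.

```python
# Hypothetical coding categories for picture-naming responses (soft scoring).
CORRECT_NAMING = {
    "target", "inflection", "derivation", "mispronunciation",
    "close_synonym", "regional_variant",
}


def score_recognition(choices: list[str], target: str) -> int:
    """1 if the child's LAST selected picture is the target; 0 for a distractor or no choice."""
    if not choices:
        return 0
    return int(choices[-1] == target)


def score_naming(category: str) -> int:
    """Soft scoring: any form of the target, a close synonym, or a regional variant counts."""
    return int(category in CORRECT_NAMING)


def summarize(subtasks: dict[str, list[int]]) -> dict[str, int]:
    """Sum item scores per subtask, then build comprehension, production, and total scores."""
    scores = {name: sum(items) for name, items in subtasks.items()}
    scores["comprehension"] = scores["noun_comprehension"] + scores["verb_comprehension"]
    scores["production"] = scores["noun_production"] + scores["verb_production"]
    scores["total"] = scores["comprehension"] + scores["production"]
    return scores
```

Taking the last element of the choice log is what implements the rule that a changed answer is scored by the child’s final selection.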
Data analysis
The current paper is supported by a complete database containing all responses from all participants who completed the full testing with all four subtasks of the CLT-PL (Supplemental Material 2, see the OSF archive: https://osf.io/8cd73/).
First, we analyzed the reliability of the CLT-PL in terms of internal consistency. Second, we analyzed the validity of the CLT-PL in terms of intercorrelations between the four subtasks and in terms of correlations with the scores of the OTSR and the CDI-III. Third, in a multivariate regression analysis, we examined the relationships between scores on the CLT-PL and demographic measures such as age and sex of the participant, the region where the participant lives, and the size of the place of residence (rural area, urban area: small town, or urban area: large city), as well as maternal education (primary, secondary, or university level). Finally, we selected a subsample of children that was as close as possible to the general population of monolingual children in Poland in terms of the demographic measures that might affect the CLT-PL scores, and used it to provide percentile, standardized, and stanine scores for all four subtasks of the CLT-PL, for the summed comprehension and production scores, and for the total score.
Results
Reliability
Internal consistency
In the group of children aged 3;0 to 5;11, the CLT-PL had high internal consistency in terms of Cronbach’s alphas and Guttman’s lambda-4 split-half coefficients (see Table 4).
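For readers unfamiliar with the two coefficients, the sketch below computes Cronbach’s alpha and Guttman’s lambda-4 from a children × items matrix of 0/1 scores. It assumes NumPy and is not the analysis code used in this study. Lambda-4 is the maximum split-half reliability over all ways of dividing the items into two halves; because exhaustive enumeration is impractical for 32 items, the hypothetical `max_lambda4` function only approximates that maximum by sampling random equal-size splits. Established packages (for example, `splitHalf` in the R package psych) implement more thorough searches.

```python
import numpy as np


def cronbach_alpha(items: np.ndarray) -> float:
    """items: (n_children, n_items) matrix of 0/1 item scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)


def split_half_lambda4(items: np.ndarray, half_a: np.ndarray) -> float:
    """Guttman's split-half coefficient for one split; half_a is a boolean mask over items."""
    a = items[:, half_a].sum(axis=1)
    b = items[:, ~half_a].sum(axis=1)
    total_var = (a + b).var(ddof=1)
    return 2 * (1 - (a.var(ddof=1) + b.var(ddof=1)) / total_var)


def max_lambda4(items: np.ndarray, n_splits: int = 2000, seed: int = 0) -> float:
    """Approximate lambda-4 (the maximum split-half value) by sampling random splits."""
    rng = np.random.default_rng(seed)
    k = items.shape[1]
    best = -np.inf
    for _ in range(n_splits):
        mask = np.zeros(k, dtype=bool)
        mask[rng.choice(k, size=k // 2, replace=False)] = True
        best = max(best, split_half_lambda4(items, mask))
    return float(best)
```

Sampling gives a lower bound on the true maximum, so a larger `n_splits` can only raise (never overstate) the estimate.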
The internal consistency of the CLT-PL.
Note. CLT-PL = Polish CLT.
Validity
Intercorrelations of the subtasks
The comprehension and production subtasks of the CLT-PL were strongly correlated (r(482) = .80, p < .001), and the correlation was still high even after partialing out the participants’ age (r(482) = .56, p < .001).
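A partial correlation of this kind can be obtained by regressing both scores on age and correlating the residuals. The following NumPy sketch assumes that age enters as a single linear covariate (for example, in months); the function name `partial_corr` is hypothetical and this is not necessarily how the reported coefficient was computed.

```python
import numpy as np


def partial_corr(x, y, control) -> float:
    """Pearson correlation of x and y after removing the linear effect of `control` from both."""
    x, y, control = (np.asarray(v, dtype=float) for v in (x, y, control))
    design = np.column_stack([np.ones_like(control), control])  # intercept + covariate

    def residuals(v):
        coef, *_ = np.linalg.lstsq(design, v, rcond=None)
        return v - design @ coef

    return float(np.corrcoef(residuals(x), residuals(y))[0, 1])
```

If two scores correlate only because both grow with age, the partial correlation drops toward zero; a partial r that remains high, as reported above, indicates shared variance beyond age.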
Convergent validity
Scores on the CLT-PL were moderately or strongly correlated with the scores on the CDI-III (Krajewski et al., 2023) and the OTSR (Haman & Fronczyk, 2012; see Table 5 and Figure 1).
Pearson’s correlation coefficients between CLT-PL scores and other measures of language skills.
Notes. CDI = Communicative Development Inventory; CLT-PL = Polish CLT; OTSR = Obrazkowy Test Słownikowy – Rozumienie.
*p < .05. **p < .01. ***p < .001.

Relationship between scores on the CLT-PL (dark blue circles: comprehension, light blue squares: production) and other measures of language skills with fit lines and 95% confidence intervals.
Impact of demographic measures
For the comprehension subtask in children aged 3;0 to 5;11, the multivariate regression analysis revealed a significant overall model fit, F(12, 341) = 24.42, p < .001, adj. R² = .44. Scores on the CLT comprehension subtasks were significantly predicted by the child’s age (β = 0.65, 95% CI [0.57, 0.74], p < .001) and by sex, with girls scoring slightly higher than boys (β = 0.17, 95% CI [0.01, 0.32], p = .037; Figure 2A). A one-way ANOVA revealed a significant effect of age on comprehension scores, F(2, 359) = 127.6, p < .001, η² = 0.42. Tukey’s HSD post-hoc analysis showed that 3-year olds (M = 51.20, SD = 7.72) scored significantly lower than 4-year olds (M = 58.87, SD = 4.39; p < .001, d = 1.29) and 5-year olds (M = 61.49, SD = 2.06; p < .001, d = 2.04). Four-year olds also scored significantly lower than 5-year olds (p < .001, d = 0.77). Additionally, children whose mothers had primary education scored lower than children whose mothers had secondary education (β = −0.55, 95% CI [−0.98, −0.13], p = .011), while there was no significant difference between children of mothers with secondary and university education (β = 0.15, 95% CI [−0.04, 0.34], p = .111; Figure 3A). The other variables, that is, rural/urban area and region of residence, were not significant predictors in the regression (all p-values above .12).
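The F statistic, eta squared, and Cohen’s d reported above can be computed as in the following NumPy sketch. It is illustrative only: the text does not state how the standard deviation for d was pooled, so the hypothetical `cohens_d` assumes the usual n − 1 weighted pooled SD, and Tukey’s HSD p-values, which require the studentized range distribution, are left to a library routine such as `scipy.stats.tukey_hsd` (available in SciPy 1.8 and later).

```python
import numpy as np


def one_way_anova(*groups):
    """Return (F, df_between, df_within, eta_squared) for independent groups."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_scores = np.concatenate(groups)
    grand_mean = all_scores.mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_sq = ss_between / (ss_between + ss_within)  # share of total variance explained by age group
    return f_stat, df_between, df_within, eta_sq


def cohens_d(a, b) -> float:
    """Standardized mean difference (mean of b minus mean of a) using the pooled SD."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return float((b.mean() - a.mean()) / np.sqrt(pooled_var))
```

With three age groups as input, `df_between` is 2, matching the F(2, ·) tests above.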

Relationship between CLT-PL scores (A: comprehension, B: production), children’s age, and sex (boys: dark blue triangles, girls: green circles), with fit lines and 95% confidence intervals.

Relationship between the CLT-PL scores (A: comprehension, B: production), children’s age, and maternal education (primary: dark blue triangles, secondary: blue circles, university: green squares), with fit lines and 95% confidence intervals.
For the production subtask in children aged 3;0 to 5;11, the multivariate regression analysis revealed a similar significant model, F(12, 372) = 26.15, p < .001, adj. R² = .44. Scores on the CLT production subtasks were significantly predicted by the child’s age (β = 0.66, 95% CI [0.58, 0.74], p < .001) and by sex, with girls again scoring slightly higher than boys (β = 0.17, 95% CI [0.03, 0.32], p = .022; Figure 2B). A one-way ANOVA revealed a significant effect of age on production scores, F(2, 391) = 118.6, p < .001, η² = 0.38. Tukey’s HSD post-hoc analysis showed that 3-year olds (M = 39.97, SD = 8.70) scored significantly lower than 4-year olds (M = 49.01, SD = 7.11; p < .001, d = 1.16) and 5-year olds (M = 56.60, SD = 5.17; p < .001, d = 2.03). Four-year olds also scored significantly lower than 5-year olds (p < .001, d = 0.75). Additionally, children whose mothers had university education scored higher than children whose mothers had secondary education (β = 0.21, 95% CI [0.03, 0.39], p = .022), while there was no significant difference between children of mothers with secondary and primary education (β = −0.32, 95% CI [−0.73, 0.10], p = .135; Figure 3B). Again, the other variables, that is, rural/urban area and region of residence, were not significant predictors in the regression (all p-values above .15).
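This section does not state how the β coefficients were standardized. One common convention, assumed in the sketch below, is to z-score the outcome and the continuous predictor (age) while keeping categorical predictors as 0/1 dummy codes, with secondary education as the reference level for maternal education. The helper names `zscore`, `dummy`, and `fit_ols` and the variable names in the usage comment are hypothetical, and the sketch omits the confidence intervals and p-values reported above.

```python
import numpy as np


def zscore(values):
    """Standardize to mean 0 and SD 1 (sample SD)."""
    v = np.asarray(values, dtype=float)
    return (v - v.mean()) / v.std(ddof=1)


def dummy(values, level):
    """0/1 indicator for one category; the category left out serves as the reference."""
    return (np.asarray(values) == level).astype(float)


def fit_ols(y, predictors):
    """OLS with an intercept; returns ({name: coefficient}, adjusted R^2)."""
    y = np.asarray(y, dtype=float)
    names = list(predictors)
    X = np.column_stack([np.ones(len(y))] + [np.asarray(predictors[n], dtype=float) for n in names])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n, p = X.shape  # p counts the intercept column
    r2 = 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p)
    return dict(zip(["intercept"] + names, coef)), adj_r2


# Hypothetical usage (secondary education is the omitted reference level):
# coefs, adj_r2 = fit_ols(zscore(production), {
#     "age": zscore(age_months),
#     "girl": dummy(sex, "F"),
#     "edu_primary": dummy(mother_edu, "primary"),
#     "edu_university": dummy(mother_edu, "university"),
# })
```

Under this convention, a dummy coefficient such as β = 0.21 reads as a difference of about 0.21 SD of the outcome between the coded group and the reference group.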
Norms
Finally, we calculated percentile, standardized, and stanine scores for 1-year age intervals (i.e. 3-year olds from 3;0 to 3;11, 4-year olds from 4;0 to 4;11, and 5-year olds from 5;0 to 5;11). Since scores on the CLT-PL depended on the age of the children, their sex, and their mothers’ level of education (but not on the urban/rural category or the region of residence), these three variables were considered when selecting children for the norming subsample. This was done to ensure that the selected subsample matched the population in terms of children’s sex and their mothers’ level of education. Because children whose mothers had university education (who achieved the highest scores on the CLT-PL) were overrepresented, calculating norms based on all collected data could have resulted in inflated norms. The norms are available in Supplemental Material 3 (available from the archive: https://osf.io/8cd73/).
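The norms convert raw scores into percentile ranks, standardized scores, and stanines within each age band. The exact conversion is not specified here; the sketch below assumes mid-rank percentiles, normalized standardized scores with M = 100 and SD = 15 (consistent with the 85 and 115 cut-offs used in the norms table), and the conventional stanine bands covering 4, 7, 12, 17, 20, 17, 12, 7, and 4% of the distribution. It uses only the Python standard library, and all function names are hypothetical.

```python
from bisect import bisect_right
from statistics import NormalDist

# Cumulative percentages at the upper boundaries of stanines 1 to 8 (conventional bands).
STANINE_CUTS = [4, 11, 23, 40, 60, 77, 89, 96]


def percentile_rank(score, reference):
    """Mid-rank percentile: % of the reference group scoring below, plus half of % scoring equal."""
    below = sum(r < score for r in reference)
    equal = sum(r == score for r in reference)
    return 100 * (below + 0.5 * equal) / len(reference)


def standard_score(pr, mean=100, sd=15):
    """Normalized standard score from a percentile rank.

    Percentiles are clipped to 0.1-99.9 (an assumption) so the inverse normal stays finite.
    """
    p = min(max(pr / 100, 0.001), 0.999)
    return mean + sd * NormalDist().inv_cdf(p)


def stanine(pr):
    """Map a percentile rank to a stanine from 1 to 9."""
    return bisect_right(STANINE_CUTS, pr) + 1
```

Normalizing through percentiles, rather than linearly rescaling raw scores, keeps the standard scores interpretable even when the raw-score distribution is skewed by ceiling effects.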
The norms reflect age-related score increases (see Table 6), higher comprehension than production scores, and better performance on nouns than verbs. Ceiling effects are evident in noun comprehension from age 4, with typical scores of 30 to 31 out of 32, and even 3-year olds scoring 26 to 30 (over 80%). Similar ceiling effects appear in verb comprehension and noun production for 4-year olds (typically >26 points, >80%) and 5-year olds (>28 points, >87%). Verb production is the only task without ceiling effects across all age groups, with typical accuracy ranging from 41% to 63% in 3-year olds, 63% to 75% in 4-year olds, and 72% to 84% in 5-year olds. Overall, comprehension tasks show ceiling effects (>89% accuracy) in 4- and 5-year olds but not in 3-year olds (72% to 89%). Production tasks remain further from ceiling, with typical accuracy of 55% to 72% in 3-year olds, 72% to 84% in 4-year olds, and 80% to 89% in 5-year olds.
Excerpt of the CLT-PL norms: low (L), typical (T), and high (H) raw scores for children aged 3;0 to 5;11.
Note. Complete set of the CLT-PL norms is available from the OSF archive: https://osf.io/8cd73/.
Low scores (L) correspond to standardized scores below 85 and stanine scores 1 to 3, typical scores (T) correspond to standardized scores between 85 and 115 and stanine scores between 4 and 6, and high scores (H) correspond to standardized scores above 115 and stanine scores 7 to 9. In rare cases where there is a slight mismatch between the two scales, the stanine scale took precedence for this table. CLT-PL = Polish CLT.
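The classification rule in the note above, including the precedence of the stanine scale when the two scales disagree, can be written as a small function. This is a hypothetical sketch; treating the standardized-score boundaries 85 and 115 as belonging to the typical band is an assumption.

```python
def band_from_standard(ss):
    """L below 85, T from 85 to 115 (boundaries assumed inclusive), H above 115."""
    if ss < 85:
        return "L"
    return "T" if ss <= 115 else "H"


def band_from_stanine(st):
    """L for stanines 1-3, T for 4-6, H for 7-9."""
    if st <= 3:
        return "L"
    return "T" if st <= 6 else "H"


def norm_band(ss, st):
    """Return (band, mismatch); the stanine band is used when the two scales disagree."""
    by_stanine = band_from_stanine(st)
    return by_stanine, by_stanine != band_from_standard(ss)
```

The `mismatch` flag marks the rare raw scores where the table had to fall back on the stanine scale.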
Discussion
Psychometric properties of the CLT-PL
In this paper, we have presented a standardization of a tool for assessing the receptive and expressive vocabulary of monolingual Polish-speaking preschool children, namely the revised Polish version of the LITMUS Cross-Linguistic Lexical Tasks. We have demonstrated high reliability and validity of the instrument and presented a set of norms for children aged 3;0 to 5;11.
The revised CLT-PL shows high internal consistency and high validity. The comprehension and production scores of the CLT-PL correlated strongly with each other, even after controlling for the age of the participants. The CLT-PL also has high convergent validity: for the youngest children tested, that is, 3-year olds, scores on the CLT-PL are strongly positively correlated with scores on the CDI-III, a parent-report instrument used to assess the language skills of children between the ages of 3 and 4 (Krajewski et al., 2023). Additionally, a strong correlation was found between scores on the CLT-PL and scores on another standardized tool for measuring children’s receptive vocabulary, the OTSR (Haman & Fronczyk, 2012), for children in a broader age range (3–6 years). The strength of the correlation between scores on the CLT-PL and scores on other tools measuring children’s language skills is comparable to previously reported data from other languages (Kapalková et al., 2024; Khoury Aouad Saliby et al., 2017; Smolík & Bytešníková, 2021).
Results on the CLT-PL follow the same pattern as other instruments measuring language proficiency in Polish-speaking children (Haman et al., 2012; Smoczyńska et al., 2015): girls score slightly higher than boys, and the higher the mother’s educational level, the higher the child’s scores on the CLT-PL. Interestingly, we found no effect of the size of the place of residence on the scores on the CLT-PL, that is, children living in urban areas have the same level of scores as children living in rural areas. We also found no effects of region of residence. These results suggest that scores on the CLT-PL are not affected by small regional differences in the vocabulary used in various places in Poland.
The lack of effects of region of residence on vocabulary scores may have two causes. First, in revising the test, we minimized items that elicit synonymic responses or regional variants of target words. If successful, this should reduce dialectal or sociolectal differences. In total, the data collected in the current study contain 33,280 responses to individual items in the production subtasks. Only 204 of these responses were classified as regional variants or synonyms, representing less than 1% of all responses scored as correct (see Supplemental Material 2). Second, we applied a soft, or liberal, scoring of responses in which synonymic and regional responses were scored as correct. This would eliminate dialectal or sociolectal effects that might have remained after revision. Since the effects of linguistic regional differences are removed in this way, the absence of effects of urban/rural area or region of residence suggests that the CLT-PL is also robust to non-linguistic regional variation. It would be interesting to see whether this would also be the case for CLTs in other languages characterized by greater dialectal lexical variability than Polish.
CLT-PL and monolingual norms
The monolingual norms for the CLT-PL may be a first step towards making the CLT-PL a diagnostic tool for use by practitioners working with Polish monolingual children. Our goal is to make the CLT-PL the first reliable, standardized tool that is available to practitioners as a mobile app on iPads or Android tablets and that can be used free of charge. The monolingual norms also make the CLT-PL a more useful research tool in cross-linguistic studies comparing monolingual Polish-speaking children to monolingual children acquiring other languages. This is particularly relevant for researchers who use different language versions of the CLT in their research as a proxy for multilingual children’s language proficiency or language dominance. While initial studies suggest that monolingual children acquiring different languages generally show similar patterns of performance on CLT (Haman et al., 2017), more detailed data from Spanish and Dutch suggest that there are differences in performance between language versions of CLT (van Wonderen & Unsworth, 2021). A large-scale study with comparable groups, for example, monolingual children of the same age and similar socioeconomic status, is needed to assess whether parallel task development results in tools of similar difficulty across languages. Differences in the level of difficulty of the CLT in different languages would suggest that the same method of developing the instruments does not guarantee their comparability. In such a case, one might conclude that the CLTs cannot be used, for example, to determine the linguistic dominance of bilingual children by comparing vocabulary scores in two languages, as is commonly done.
We believe that the data presented in the current paper may be useful for further comparisons of psychometric properties such as reliability, validity, and difficulty of CLTs across languages as more monolingual data are collected in other languages. The CLT-PL should also be tested with children diagnosed with language delay or disorders through other means. Only then will we know whether, and in what way, the tool is sensitive enough to identify children with language impairment. The monolingual norms can be a first step in this direction, as they indicate which scores can be considered lower than typical for a given age.
Although the CLT-PL can also be used to assess bilingual or multilingual children with Polish as one of their languages, the Polish skills of these children should not be compared to norms, or the norms should be carefully adjusted (Elin Thordardottir, 2015). As for multilingual children in general, Polish bilingual and multilingual children show an immense diversity, including children from migrant families for whom Polish is the home language, children from families migrating to Poland who use other languages at home (e.g. Ukrainian), or children who acquire Polish as a minority language in Lithuania. To deal with multilingual diversity, previous research (Mueller Gathercole et al., 2008; Peña et al., 2018) has developed norms for specific bilingual groups, taking into account sociolinguistic and demographic factors. Perhaps a similar approach should be adopted for Polish-speaking bilinguals, taking into account factors such as daily language input, length of stay outside (or in) Poland, or length of attendance at an early childhood education and care center in a municipality or center with a different majority language. On the other hand, the list of factors to be taken into account can quickly become very long if we consider all the variables that may affect the size of bilingual children’s vocabularies, and then the ‘norms for bilinguals’ may essentially have to constitute a set of ‘norms’ for each individual child (see also: Newbury et al., 2020).
We hope that providing monolingual norms for a tool designed for bilinguals may also encourage practitioners working with bi- or multilingual children in Poland to reflect on the limited applicability of norms to diverse populations in Poland. Practitioners should be aware of the potential differences between monolingual and multilingual norms, due to the characteristics of multilingual development, and treat relatively low scores on the CLT-PL not as a sign of delay, but as a sign of typical diversity (Bak et al., 2021; Hoff & Ribot, 2017). We are also aware of the ethical and methodological difficulties of over-comparing the development of bilingual children with the monolingual norm (Leivada et al., 2023; Rothman et al., 2023). Therefore, our goal is not to present Polish monolingual CLT scores as a ‘standard’ to which bilingual children should aspire.
Even so, we do assume that it would be possible to build methods of ruling out language delay in at least some multilingual children based on monolingual normed tools. If, for example, a Polish-speaking child living in Norway scores within the monolingual CLT-PL norms, we may expect the child’s relatively low Norwegian skills to improve with more experience with Norwegian, without any immediate need to worry about this child’s general ability to acquire language. It is also possible that monolingual CLTs may serve as a basis for multilingual norms based on total vocabulary or total conceptual vocabulary. This is yet to be tested.
CLT-PL and ceiling scores
Ceiling effects in the CLT-PL
Despite its good psychometric properties, that is, reliability and convergent validity, the CLT-PL is an easy test for Polish monolinguals. Although 3-year olds showed relatively high variability of scores in all subtasks, 4-year olds already showed ceiling effects in noun comprehension, as 56% of the children in the norming subsample either scored maximally or missed only one item. In the case of 5-year olds, not only noun comprehension but also verb comprehension was characterized by a ceiling effect, with 47% of the children scoring either 31 or 32 points (out of 32). In this group, noun production was also characterized by very high scores, and only verb production was not affected by ceiling effects. These findings are clearly reflected in the norms, according to which, in some subtasks and especially for the older age groups, typical scores correspond to accuracy over 85% or even 90%. These observations suggest that the CLT-PL may be too easy to serve as a reliable diagnostic tool for typically developing Polish monolinguals older than 4 years. Because typically developing children score at or near ceiling, the CLT-PL cannot be used to identify exceptional word knowledge in children older than 3 years, as it cannot distinguish between typical and high scores. On the other hand, the relatively high scores of typically developing children are a very good starting point for identifying children with limited vocabulary, for example, children with language impairment.
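The ceiling criterion used above (a maximal score or at most one error out of 32) and the accuracy percentages quoted in the norms can be expressed as two small helpers. This is a hypothetical sketch; the function names and the default of 32 items per subtask are illustrative assumptions based on the subtask length reported here.

```python
def accuracy(raw, max_score=32):
    """Raw subtask score as a percentage of the maximum."""
    return 100 * raw / max_score


def share_at_ceiling(scores, max_score=32, missed=1):
    """Proportion of children who scored the maximum or missed at most `missed` items."""
    return sum(s >= max_score - missed for s in scores) / len(scores)
```

For example, a raw score of 26 corresponds to 81.25% accuracy, which is why 3-year olds scoring 26 to 30 in noun comprehension are already above 80%.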
Assessment of 2-year olds
Since Polish monolingual 3-year olds were able to follow the instructions of the CLT-PL and solved the majority of the items correctly, and older children reached the ceiling scores, it seems reasonable to think that 2-year olds should be included in future norms for the CLT-PL. In a recent study using the Turkish version of the CLT, 2-year olds achieved accuracies ranging from 30% for naming actions to 68% for noun comprehension (Yılmaz Çifteci & Tunçer, 2024). Smolík and Bytešníková (2021), using the Czech version of the CLT, conducted a study of the youngest children to date, that is, monolingual Czech speakers aged 15 to 42 months. Even in the youngest group, consisting of children aged 15 to 23 months, the authors reported an average accuracy of 43% in the word comprehension tasks. In the group of 2-year olds (24–29 months) and two-and-a-half-year-olds (30–35 months), the authors reported 69% and 74% accuracy, respectively, in the comprehension tasks (Smolík & Bytešníková, 2021). The picture-naming tasks were much more challenging for the Czech monolinguals than the comprehension tasks, as the authors reported that some children did not follow the instructions and did not produce verbal labels for the pictures. In the picture naming tasks, the youngest children, under 2 years of age, named 21% of the pictures correctly. Performance in the picture naming tasks increased rapidly from the age of 2, and the average accuracy of both 24- to 29-month olds and 30- to 35-month olds was around 44% (Smolík & Bytešníková, 2021). The results of the Turkish and Czech studies suggest that CLT can indeed be used with monolingual children under 3 years of age, although perhaps only the comprehension tasks are usable for the youngest children, that is, under 2 years of age.
Assessment of other groups than typically developing monolinguals
The potential of the CLT-PL to be used as a diagnostic tool may depend on the purpose of the assessment. Profiling children’s vocabulary or assessing individual differences for research purposes may be compromised in the two older age groups. However, the CLT-PL may still be useful for screening for language disorders, especially when accompanied by other standardized tools to ensure a more comprehensive assessment. We would recommend that researchers and practitioners using the CLT-PL to assess the vocabulary of monolingual Polish-speaking children focus on evaluating productive vocabulary rather than comprehension. This is particularly advisable because alternative instruments are available to assess comprehension in monolingual Polish-speaking children of similar age, which do not exhibit ceiling effects (see the section comparing the CLT-PL with other Polish instruments). We would also recommend using both parts of a subtask, that is, assessing both nouns and verbs, as these longer scales show higher reliability and suffer less from ceiling effects than assessing only one of the subtasks.
Directions for further revisions
To enhance the utility of the CLT-PL for older children or to create a more reliable task without ceiling effects in typically developing 4- and 5-year olds, it would be necessary to add items of higher difficulty. For production tasks (picture naming), this can be achieved by including words with a higher age of acquisition, as demonstrated by van Wonderen and Unsworth (2021). However, for comprehension tasks, which already show high scores in 3-year olds, selecting a sufficiently difficult set of items based on the words included in the current version of the CLT word list is problematic. This stems from the cross-linguistic procedure used in creating the CLT, which, as a byproduct, limited the word set to those acquired early (mainly before age 6; Łuniewska et al., 2016, 2019) and with high imageability (Haman et al., 2015). A potential way to increase the difficulty of comprehension tasks could involve introducing more challenging distractors, such as words semantically related to the target. However, this approach would not be universally applicable across languages, as pictures representing semantically close words in one language might be named with the same word in another, potentially resulting in item difficulty systematically varying across languages. Alternatively, a revised, more difficult version of the CLT could include items from word classes other than nouns and verbs, which are often among the earliest words acquired (Braginsky et al., 2019). However, the decision to include nouns and verbs in the CLT design again resulted from the cross-linguistic perspective and an assumption that these word classes are the most universal across languages (Haman et al., 2015).
Database of the CLT-PL scores
This paper is accompanied by a database with a complete set of CLT-PL data from 479 children between the ages of 2 and 7 years. This database can serve as a starting point for subsequent analyses or as a reference point for future studies in which the results of children tested with the Polish version of the CLT could be related not to the population results but to the results of a selected control subgroup, for example, the results of children of highly educated parents living in large cities. This database, when combined with new multilingual data, may also allow for more detailed analyses, such as testing the influence of word properties on lexical access in monolinguals and bilinguals (Łuniewska et al., 2022), or reflecting on the organization of the mental lexicon of monolingual and bilingual children (Krysztofiak et al., 2025).
CLT-PL in comparison with other Polish instruments
The standardization of the CLT-PL makes the instrument an interesting alternative to OTSR (Obrazkowy Test Słownikowy – Rozumienie, Haman & Fronczyk, 2012), TRJ (Test Rozwoju Językowego; Smoczyńska et al., 2015), and TSD (Test Słownikowy dla Dzieci, Koć-Januchta, 2013), because it is designed to be easily adaptable and comparable to other languages. In fact, comparable CLTs already exist in 40 languages (https://multilada.pl/en/projects/clt/). In addition, the instrument is easy for clinicians to administer thanks to the mobile app design on tablets. The task is also relatively enjoyable for children, thanks both to the touch-screen interface and to the colorful, child-friendly pictures.
The version of the CLT-PL presented in this text has been additionally revised so that the pictures representing all target words are characterized by very high agreement in naming by native Polish-speaking adults (Wolna et al., 2023). The CLT-PL was the first language version of CLT to be systematically revised based on the data collected and the naming agreement of adults, and the revision procedure of the Polish CLT that we adopted can serve as a model for similar work in subsequent languages.
In contrast to the CLT, other commonly used Polish language assessment tools are not easily adapted to other languages, since the process may radically alter their psychometric properties. Although the OTSR is superficially similar to the popular British English BPVS or American English PPVT (Muszyńska et al., 2024), it cannot easily be translated to other languages. While three of the four pictures might still make sense in translated versions, that is, the target picture (e.g. dog), the semantic distractor (cat), and the thematic distractor (bone), the phonological distractor (e.g. pies ‘dog’ – dres ‘tracksuit’) cannot (Haman et al., 2012). Another example of the inadequacy of the currently available Polish language assessment tools, which we have purposely avoided in the development of the CLT, is the limited cultural relevance of the pictures: for example, TRJ includes a picture of a boy beating a carpet with a traditional carpet beater. Although this image is understandable to almost 70% of Polish 6-year olds and over 80% of 8-year olds (Smoczyńska et al., 2015), it may be completely unrecognizable to individuals from other cultural backgrounds.
The CLT-PL is free from such cross-cultural limitations because it was developed according to a cross-linguistic test development procedure (Haman et al., 2015). The words used in the Polish version of the CLT (as well as in any other language version) were systematically selected in cross-linguistic studies so that the actions and objects depicted in the pictures would be recognizable in as many languages as possible (Haman et al., 2015), thus minimizing the risk that the respective test items would be unfamiliar to children.
Limitations
The most serious limitation of the study, discussed in detail above, is that it does not provide norms for bilingual children acquiring Polish as one of their languages. We hope that future studies will enable the development of such norms. Similarly, we believe that future studies will test the sensitivity and specificity of the CLT-PL in screening for risk of language development disorders.
Another limitation of the present dataset is that, due to the relatively large but still limited number of participants and the significant over-representation of children of mothers with higher education, we were not able to develop norms separately for girls and boys, nor by six-month age ranges as in some other tests, and the norms were based on samples of 80 to 100 children per age range. For this reason, we recommend extending the norms in the future when data are collected from younger children (2-year olds) and more data are collected from children aged 3 to 5 years. With the availability of data from our study in the OSF archive, it will be possible to include these in future analyses.
Finally, the CLT-PL as a task for assessing the vocabulary of monolingual Polish speakers has its own shortcomings. In particular, although the task is reliable and valid, it seems to be too easy for typically developing monolingual Polish speakers, as we observed ceiling effects in almost all tasks (with the exception of verb production) and in almost all age groups (i.e. in the range of 3;0 to 5;11, the older the children, the more the task tended to produce ceiling scores).
Conclusion
The CLT-PL is a new reliable and valid tool for assessing the receptive and expressive vocabulary of Polish-speaking monolingual children. The reliability of the Polish CLT was confirmed by a high internal consistency, while the validity was supported by strong correlations with other standardized instruments. The CLT-PL has potential for use in research with Polish-speaking children between the ages of 3 and 6 (or older in the case of multilingual children, and perhaps younger in the case of monolinguals), as it is the first openly available assessment instrument in Polish. The CLT-PL norms can be used to assess the vocabulary of monolingual Polish-speaking children or to conduct cross-linguistic research on monolingual populations.
Supplemental Material
sj-pdf-1-fla-10.1177_01427237251336497 – Supplemental material for Polish LITMUS Cross-Linguistic Lexical Task: Reliability, validity, and norms for monolingual 3- to 5-year olds
Supplemental material, sj-pdf-1-fla-10.1177_01427237251336497 for Polish LITMUS Cross-Linguistic Lexical Task: Reliability, validity, and norms for monolingual 3- to 5-year olds by Magdalena Łuniewska, Magdalena Krysztofiak, Weronika Białek, Martyna Burdach, Ewa Komorowska, Grzegorz Krajewski, Judyta Pacewicz, Julia Radzikowska, Nina Gram Garmann and Ewa Haman in First Language
Footnotes
Correction (June 2025):
Article updated with the addition of online supplemental material (the Polish translation).
Authors’ Contribution
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was possible thanks to a grant from the Norwegian Financial Mechanism for 2014–2021 (via National Science Centre, grant 2019/34/H/HS6/00615; project PolkaNorski). During the work on the paper, MŁ was also supported by a subsidy from the Polish Ministry of Science and Higher Education in the Excellence Initiative – research university (2020–2026) program.
The revision of the Polish LITMUS-CLT was additionally supported by the Faculty of Psychology, University of Warsaw, from the funds awarded by the Ministry of Science and Higher Education in the form of a subsidy for the maintenance and development of research potential in 2021 (501-D125-01-1250000 zlec. 5011000239 awarded to EH, 501-D125-01-1250000 zlec. 5011000208 awarded to MK), and in 2022 (501-D125-01-1250000 zlec. 5011000202 awarded to MK).
The validation study of the Polish LITMUS-CLT was supported by the Faculty of Psychology, University of Warsaw, from the funds awarded by the Ministry of Science and Higher Education in the form of a subsidy for the maintenance and development of research potential in 2024 (501-D125-01-1250000 zlec. 5011000241 awarded to MŁ and GK).
The revisions of the CLT pictures were supported by an internal grant from the University of Warsaw, POB IV for parents – New Ideas, awarded to MŁ and funded from a subsidy from the Polish Ministry of Science and Higher Education in the Excellence Initiative – research university (2020–2026) program. The University of Warsaw holds the copyright for the CLT picture database.
The preparation of the Polish translation of the paper was funded by the Norwegian and EEA Funds for the years 2014–2021 (bilateral initiative number: 2024/43/7/HS6/00002, Action: PolkaNorski Implemented). The Polish translation of the paper is freely available as supplementary material.
We would like to thank the research assistants and students who participated in recruitment, data collection, or data coding: Julia Bohr, Alicja Caban, Monika Gurak, Alicja Jeleń, Antoni Kędzierski, Zofia Kordas, Bartosz Miklaszewski, Dorota Orzeszek, Łucja Pawluk, Julia Pawłowicz, Julia Skibińska, Martyna Stasiewicz, Marta Strzelczyk, and Hanna Tomczyńska.
The Cross-Linguistic Lexical Tasks (CLT; https://multilada.pl/en/projects/clt/) were first designed within COST Action IS0804 to develop a tool to test the vocabulary of monolingual and bilingual children across languages; these are now part of the LITMUS (Language Impairment Testing in Multilingual Settings) battery (
).
We are very grateful to Krzysztof Sobota, who designed and programmed the Child Lexicon CLT mobile application used in the study. The application is available to any interested researcher from the official app stores: for Android tablets at https://play.google.com/store/apps/details?id=pl.edu.uw.childlexiconclt and for iPads
.
Last but not least, we are grateful to Justyna Kamykowska, the artist who created all the CLT pictures and who has worked for years on developing and continuously expanding the CLT picture database. The full set of CLT pictures, including additional picture variants, is available on request at
.
Ethical approval and informed consent statements
The norming study of the CLT-PL was part of a larger project (PolkaNorski) and was approved by the Research Ethics Committee of the Faculty of Psychology, University of Warsaw. Parents signed an informed consent form prior to testing. Children who verbally agreed to participate in the study were tested in preschools or at the Faculty of Psychology, University of Warsaw.
ORCID iDs
Data Availability Statement
Supplemental Material
Supplemental material for this article is available from the OSF archive. The Polish translation of the article is available as online supplemental material.
Notes
References
