Abstract
Aims and objectives/purpose/research questions:
Students who immigrate to a new country are commonly expected to “catch up” to their mainstream peers in language and academics and to complete their education in the new language and culture. Yet little is known about recent immigrants’ academic trajectories, particularly in Germany, where research remains scarce despite the relatively high proportion of children who immigrate after the start of mandatory schooling (age 6 or later). This study focused on reading as a central skill for literacy and school achievement.
Design/methodology/approach:
The study administered standardized tests of reading fluency, reading comprehension and vocabulary to 76 (+13) recently immigrated students and 192 of their non-immigrated peers in seven lower secondary schools in spring 2022 and followed up on a small subgroup annually for 2 years to provide exploratory information on learning trajectories.
Data and analysis:
Analyses using mixed-effects models compared the reading performance of immigrated and non-immigrated students. In addition, individual factors such as vocabulary breadth were analyzed to determine their impact on reading development.
Findings/conclusion:
Immigrated students consistently scored below their non-immigrated peers across all reading measures, with average performance gaps ranging from 0.2 to 1.2 standard deviations. Longitudinal follow-ups indicated little evidence that these students were closing the gap over time, though individual differences, particularly vocabulary breadth, had some effects.
Originality:
This study contributes novel insights into the long-term academic development of immigrated students in Germany, an understudied population in this context. It highlights the challenges faced by these students in acquiring academic language skills and the limitations of data collection instruments.
Significance/implications:
The findings highlight the need to clarify realistic literacy expectations for immigrant students and to better understand how their literacy trajectories vary. Such insight can guide policies and instructional practices that provide developmentally appropriate, linguistically responsive support.
Introduction
Since 2015, Germany has witnessed a notable increase in the numbers of immigrated persons. It is not only the country with the fourth largest number of external refugees worldwide and the most in Western Europe (currently hosting over 2.6 million refugees, UNHCR, 2023), but it is also a desirable nation for inner-European, primarily economic, migrants. These patterns are reflected in the school population. While the proportion of first-generation school-aged migrants was only about 1% in 2014, by 2024, it had reached 13.6% of all 5- to 20-year-olds. Importantly, a large proportion of the population – 8.9% – had immigrated after the age of 5 (Destatis, 2025). 1
Immigration has wide-reaching effects on society, and educational systems are particularly affected when large numbers of children without knowledge of the language(s) of schooling arrive within a short period of time. In order to meet the needs of schools, educators, parents, and students, information on expected language and academic learning trajectories is helpful to form expectations for students’ achievement, consider additional support needs, and develop teaching and learning materials.
To date, little is known about the academic and linguistic progression of recently immigrated students in Germany, that is, students who immigrated to the country at or after the cut-off for mandatory schooling at age 6 (Dickopp, 1982). The dearth of research is due to various factors, including both the high diversity of the target group as a whole and the differences in educational measures enacted to support them, making educational and systemic factors difficult to estimate (Twente & Marx, 2026). In Germany, schooling models for immigrants are highly diverse. Depending on the state, the type of school, the age of the students, and the financial and personnel capabilities of schools, students may be fully integrated and receive no additional language support, or they may be schooled only in German as an Additional Language (DaZ) classes with up to 25 hours per week for up to 2 years (Ohm & Ricart Brede, 2023; Will et al., 2021), or they may attend any number of partially integrative measures for different lengths of time and with classroom hours varying from 1 to 15 hours per week (Diebel & Ahrenholz, 2023). Regardless of language support received, students are commonly expected to successfully attend mainstream classes and subsequently catch up academically with their non-immigrant peers, that is, reach grade-level norms in content subjects. In Germany, it is unclear whether students reach this goal – and, if so, how long they need to do so.
The continuing lack of research is a central problem for schools, teacher training and research alike. Of special concern is the development of reading skills in German, since much academic coursework is based on written texts, and higher reading skills have consistently been shown to be associated with higher academic achievement (e.g., Sparks et al., 2014), especially among children with varying linguistic backgrounds (Verhoeven & Vermeer, 2006). However, recently immigrated students may have reduced access to grade-level texts due to lower reading skills. Previous studies indicate that the target group remains at a fairly low reading level even after 2 years of schooling in Germany, that is, below the CEFR level of B1 (Marx et al., 2021), and far below the measured reading levels of their cohort for much longer – at least 8 years – than previously assumed (Caspari & Marx, 2024; Marx et al., 2021). However, the evidence base remains sparse and unclear, especially following the Covid-19 pandemic.
The aim of this study, carried out as a stand-alone study within a larger research project (Marx, Barberio, Twente, Fuchs, Eisenbeiss, et al., 2024), was thus to investigate reading skills in German of immigrated students attending mainstream education in Grades 5 to 8 (n = 76) and 13 further students in Grades 9 and 10 after a minimum transition period of 1 year after migration. The study questions long-held assumptions about immigrated students’ language progression and, further, the research instruments used when investigating this specific target population.
Immigrated students in the interface of SLA and general (language) education
Immigrated students pose an interesting group for research, not only due to their increasing numbers in the education system and the resulting practical implications but also because they challenge traditional categories in linguistics research. Unlike simultaneous bilinguals or early L2 learners – who dominate much of psycholinguistics research (Aronin, 2022) – or highly educated adults, often the focus of SLA (second language acquisition) studies (Mackey & Gass, 2022), they also do not fit the typical profile of second language learners in school settings. While school-based second language learners share relatively similar ages, language backgrounds, and educational experiences, immigrated students provide for an exceptional study in heterogeneity. This diversity extends across multiple dimensions, including prior education, personal and linguistic backgrounds, and the varied language learning contexts they encounter in the host country. In Germany, these challenges are compounded by a lack of specialized teacher training in German as a second language (Schröter et al., 2024), as well as the absence of official curricula, standardized teaching materials, or authorized learning resources.
Such individual and contextual differences (Bronfenbrenner & Morris, 2007; Dörnyei, 2005; Ellis, 2005) complicate both participant recruitment and data reliability, and consequently affect sampling, research design, and the validity of conclusions drawn from the data. Data protection guidelines for this vulnerable population, language barriers faced by legal guardians, and schools’ reluctance to accommodate researchers in already overburdened environments further hinder access. High participant attrition due to student mobility to other schools or regions exacerbates these difficulties.
Regarding research designs, data collection instruments are often unsuitable for multilingual students in general (De Angelis, 2021) and for this population in particular. Instruments are either designed for non-immigrant student populations or for foreign language learners, or are developed ad hoc for specific studies. The high heterogeneity of this population makes norming nearly impossible, leading researchers to apply instruments without any comparison to further groups. Moreover, conclusions about the effects of individual and contextual variables on learning must be interpreted with caution, as many such variables are difficult – if not impossible – to measure validly.
In Germany, these challenges have shaped research methodologies, leading to specific approaches in sampling, data collection, and research design. A recent scoping review on research methodologies in studies of immigrant learners of German (Twente & Marx, 2026) analyzed 43 studies published between 2012 and 2024 that focused on language use or development. The findings revealed that over two-thirds of studies had sample sizes below 30 participants, three-quarters relied solely on data from the target group without comparison groups, and nearly all used convenience sampling. There is also a strong preference for cross-sectional designs, particularly in studies with larger samples. Of the nine studies with at least 150 participants, only two included longitudinal analyses. Among the 22 studies with fewer than 30 participants, only six conducted a second round of data collection. Regarding research instruments, non-standardized and non-normed tests, as well as self-report measures, are widely used to assess language skills, all of which have little validity for the population (see, for example, Tomoschuk et al., 2019). When normed tests are employed, they are typically normed on populations that exclusively use the language of schooling as their family language, excluding even non-immigrated students with heritage language backgrounds.
These challenges are further compounded by a strong traditional focus on specific aspects of German proficiency. Most studies emphasize syntax development, following the tradition of the ZISA study from the early 1980s (Clahsen et al., 1983), particularly verb placement (see Gamper & Schlauch, 2021, for a recent example). As a result, research on literacy skills essential for academic success remains scarce. Since 2021, only five published studies have included a measure of writing skills, and only three – including this study – have examined reading. Beyond the methodological limitations outlined earlier, this gap in research presents significant challenges for applying findings to classroom assessment and pedagogy.
Reading skills in educational research
Reading skills were chosen as the central variable of interest for the study. Unlike oral communicative skills, which are often the focus of SLA instruction, literacy-relevant skills – reading and writing – are frequently overlooked in research on this population (see Cummins (2021) for an in-depth discussion, and Twente and Marx (2026) for a discussion in the context of Germany). This oversight is particularly problematic for secondary school children, as written texts serve as the primary source of information in most subjects, making reading central for knowledge expansion, language development, and academic success. Recognizing its importance, international assessments such as the OECD’s PISA have prioritized reading literacy since 2000.
Reading is a complex skill that is difficult to model theoretically and assess empirically. From a cognitive perspective, reading comprehension involves constructing meaning from text through the interaction of lower- and higher-level cognitive, linguistic, and non-linguistic skills (Lenhard, 2013; Nassaji, 2014). This process enables readers to develop a situation model – a coherent mental representation of a text’s meaning – by integrating prior knowledge, making inferences, recognizing words visually, and processing syntax (van Dijk & Kintsch, 1983). Reading fluency mediates these processes (Kuhn et al., 2010), which also depend on general cognitive skills, motivation, and L1 and L2 proficiency (Grabe, 2009; Lenhard, 2013). Given the complex interaction of these skill sets, acquired semantic and pragmatic knowledge is a necessary prerequisite for reading success. It is thus unsurprising that a particularly strong predictor of both reading fluency and comprehension is vocabulary breadth and depth (Verhoeven & van Leeuwe, 2008). For L2 learners, this relationship has been shown to be strongly reciprocal: while reading supports vocabulary development more than any other skill (Jeon & Yamashita, 2014), vocabulary knowledge also significantly influences reading comprehension (Choi & Zhang, 2021).
For L2 learners, challenges arise across all reading-related skills and subskills (Jeon & Yamashita, 2014). Higher-level difficulties include limited prior knowledge such as cultural specificity in concepts and vocabulary breadth and depth, as well as challenges in further top-down comprehension processes (Droop & Verhoeven, 2003). Limited literacy education in the first language can lead to inefficient reading strategies and reduced working memory capacity, further hindering coherence building; this is especially relevant for immigrated students arriving in the host country after the start of formal education (DeCapua et al., 2009). At the same time, reduced reading fluency due to, for example, slow word recognition can impede text comprehension.
It can thus be expected that immigrant students will show better results in reading tests if they have (1) longer stays in Germany, (2) more education in the home country and first language, and (3) higher vocabulary skills in German, as they have had more chances both to become familiar with specific cultural concepts and semantic word knowledge, and to develop reading skills in their L1 and German.
Despite these challenges and the crucial role of reading in academic success, little research has examined the reading skills and progression of immigrant schoolchildren in Germany. Two large-scale surveys offer broad insights. The PISA 2022 study (OECD, 2023) found that first-generation immigrant students in Germany scored 1.2 standard deviations (SD) below their non-immigrant peers – more than double the OECD average gap of 0.58 SD. However, available PISA data do not distinguish between students who arrived in Germany during early childhood and those who immigrated later. The National Educational Panel Study (NEPS; Blossfeld & Roßbach, 2019), a longitudinal survey conducted from 2010 to 2019 with over 20,000 students, provides more detailed insights by tracking cohorts for, among others, students beginning Grade 5 and Grade 9 at the start of the study. A reanalysis of these data by Caspari and Marx (2024) found that immigrated students exhibited a reading comprehension gap of 0.45 to 0.79 SD compared to their non-immigrant peers. However, high attrition rates, especially among older students, and small sample sizes limit the reliability of these findings: for the younger cohort, available data per year ranged from just 22 to 54 students, while for the older cohort, it varied between 100 and 353 students.
Only one further non-survey project investigated reading comprehension, gathering longitudinal data from immigrants and a comparison group. Marx et al. (2021) examined reading skills among 136 immigrated students in Grades 7 and 8 at the start of the study. Results showed that, after one full year of preparatory language classes encompassing approximately 20 hours per week, only 20% of students passed a CEFR-based reading exam at B1 level, the recommended (but not mandatory) level for integration into mainstream in these states. Following full integration, compared to non-immigrated peers in the same schools (n = 517; 58% of these students were heritage language speakers), they showed an average reading comprehension gap of between 1.0 and 1.5 SD and a reading fluency gap of 1.0 to 1.2 SD, differences which were maintained over the course of the 2-year study. Furthermore, students who had attended schooling in Germany for up to 7 years performed similarly to students who had attended only 3 years, and there was no effect for length of stay after the first year of mainstream schooling.
Such results are troubling, as they indicate that, regardless of length of stay, immigrated students are having difficulties with both lower-level (fluency) reading skills and higher-level skills. However, the evidence base remains small, while expectations for the group continue to be high. This was a central impetus for this study, which aimed to consider different aspects of reading and their central correlates – individual and context factors – among immigrated students in Germany.
Research questions
This study posed three central research questions (RQs), aimed at considering reading skills as a central skill for educational achievement and correlates which may moderate these skills:
RQ1: How do immigrated students’ results in German compare to those of their classmates regarding both (a) reading comprehension and (b) lower-level reading processes (reading fluency, cf. Kuhn et al., 2010)?
RQ2: Do results converge with those of non-immigrant students, and if so, after how much time in German schools?
RQ3: Are specific individual and proximal factors, including receptive vocabulary knowledge, age of acquisition, and total length of time in schooling, associated with higher or lower achievement?
Method
Sample
This study was conducted in Grades 5 to 8 of seven lower secondary schools, including three comprehensive schools (Gesamtschule), three vocational schools (Realschule), and one general school (Hauptschule) in North-Rhine Westphalia (NRW), Germany’s most populous state, which also has the highest number of non-German passport holders. These grades were selected because lower secondary education in Germany begins in Grade 5 and typically continues through Grade 9 or 10, a critical period for determining students’ educational and vocational pathways. Schools’ willingness to allow data collection, together with the conditions they imposed, resulted in an unexpected oversampling of Grade 7. Rather than artificially reducing the sample, all data points were retained as a valid representation of the cohort.
The sample included 76 immigrated students who arrived in Germany at or after age 6 (the start of mandatory schooling) and 192 non-immigrated peers; children who had immigrated between the ages of 0 and 5 years were not included in the study. Before recruitment, an a priori power analysis was conducted using G*Power v3.1 to estimate the required sample size, assuming a moderate effect size of .25 based on the results of Marx et al. (2021). With a significance criterion of α = .05 and power = .80, the minimum sample size needed with this effect size is N = 158 for analyses. Thus, the obtained sample size of N = 268 for reading comprehension was deemed adequate to test the hypotheses.
Among the non-immigrated group, 53% spoke a language other than German at home, reflecting current school demographics and enhancing the external validity of the comparison group. To align with previous research (Marx et al., 2021) and ensure that students were able to understand data collection procedures, only students who had been in German schools for at least 1 year and were attending all classes in mainstream (i.e., with their non-immigrated peers) were included. For RQ3, which evaluated data solely from immigrated students, data from a further seven students in Grade 9 (M age = 15.84 years, SD = 0.77) and six in Grade 10 (M age = 16.59 years, SD = 0.87) were included.
At the start of data collection, the immigrated students had attended school in Germany for an average of 51 months (range: 8–96 months). They represented 27 countries of origin, with the largest groups coming from Ukraine (n = 20), Syria (n = 11), and Iraq (n = 7), while 24 other countries, including Afghanistan, Romania, and Kosovo, were represented by between one and five participants each. Most students reported beginning German language acquisition simultaneously with their schooling in Germany, although there was high variation across grades (see Table 1).
Table 1. Demographic variables by grade.
Of the students who answered questions about German as a Second Language (DaZ) classes, 72% reported attending preparatory German classes, with class hours ranging from 2 to 25 hours (M = 7.1, SD = 6.0, n = 28), and 61% reported currently attending some form of DaZ class, with an average of 2.4 hours per week (SD = 3.5, n = 33).
Data collection was conducted in spring 2022 and repeated in 2023 and 2024 at one school, where 13 immigrated students and 48 non-immigrated peers participated. All participants and their legal guardians provided informed consent before participation, with consent forms available in German and English. Prior to data collection, ethics approval was obtained through the Ethics Committee of the Faculty of Humanities, University of Cologne.
Materials
Reading fluency and comprehension data were assessed using the standardized instrument LGVT 5-12+ (Schneider et al., 2017). This paper-and-pencil instrument was chosen for several reasons: it is normed across all non-special needs school types and for Grades 5 to 12, requires only 6 minutes to administer, avoids ceiling effects through its design, and includes three test versions, enabling repeated use without practice effects. It also reports measures of test reliability, including test–retest and parallel-test reliability, as well as measures of convergent validity with four further normed tests. Although the norms were not used for analysis in this study, its usefulness to deliver comparison scores for students across all secondary school grades in mainstream education was the decisive factor in its choice for this study. A full description of the test construction, norming, item analyses, reliability measures and model conformity can be found in the work by Schneider et al. (2017).
The instrument was piloted in spring 2021 at two schools not involved in the main study. Forty-five immigrated students completed two test versions each during the pilot. The objectives were to assess the test’s validity for the target group and to determine the most suitable versions for this demographic. No significant differences were found between the versions Brot und Rosenkohl and Des Königs Laufbursche, which were subsequently selected for the main study and used for data collection in 2022/2024 and 2023, respectively.
Immigrant students also completed a comprehensive questionnaire addressing language use, school career, and other relevant factors. Mainstream students completed a shorter version focused on language use and reading preferences. The questionnaire data provided demographic context, controlled for potential confounding variables, and supported the analysis of RQ3. Both versions were developed by the research team (Marx, Barberio, Twente, Fuchs, Eisenbeiß, et al., 2024) and were administered digitally on tablets using the software LimeSurvey. Two members of the research team were present during data collection in order to assist with questions. In some cases, teachers were asked to verify students’ responses when they were unclear or to supply information if students did not complete questions on school-related issues (such as length of time spent in DaZ classes).
Results from a vocabulary test conducted as part of a related subproject with the same participants were included in the explanatory models (Marx, Barberio, Twente, Fuchs, Eisenbeiß, et al., 2024). The test is a modified and shortened version of the frequency-based tests provided by the Institut für Testforschung und Testentwicklung e. V. (https://itt-leipzig.de). It was developed for German language learners and measures general and academic receptive vocabulary in five different frequency bands. It consists of up to 150 items but is time-limited to 30 minutes to prevent ceiling effects and to limit the test load on participants. Administered digitally on tablets, the modified test has been shown to yield reliable and valid results for this demographic (Caglia & Tschirner, 2021, 2025).
Procedure
Reading tests and questionnaires were administered in students’ regular classrooms whenever possible. To streamline data collection and minimize disruptions to the schools, classrooms without immigrant students were excluded from the study. Trained research assistants administered all tests to ensure consistency in procedures and to clarify task instructions. No additional support was provided to students once testing commenced.
Students were informed at each testing session that participation was voluntary and that all results would remain confidential, with no disclosure to teachers, parents, or schools. To maintain anonymity, pseudonyms were assigned to each student at the start of the study and recorded on test materials. A list linking pseudonyms to real names was securely stored in locked safes and accessed only for testing sessions.
Data collection took place with all students in March 2022. Students completed both the reading test and the questionnaire in a single sitting, which lasted no more than 45 minutes to ensure completion within one class period. The vocabulary test, which took 30 minutes, was administered within 1 month of the reading test.
Longitudinal reading data, which addressed RQ2, was collected from a subgroup of students at only one school due to the previously mentioned constraints. Retesting occurred in May 2023 and April 2024.
Data preparation
The reading test was scored by hand according to the standardized results sheet provided with the instrument. Because the target group under investigation was not expected to reach grade-level norms, data was not transformed to the t-values provided. Rather, raw scores were used for data analysis, that is, number of correct answers achieved in the test. Reading fluency was scored according to the test guidelines (number of words completed during the test time).
The questionnaire results were automatically scored and transferred to the data set. Missing and implausible values were corrected in part through direct inquiry with the students’ teachers. In addition, a paper-and-pencil follow-up questionnaire was administered in two schools in spring 2024 to collect missing information for a total of 25 students.
Data analysis
Reading comprehension and reading fluency were analyzed using linear mixed-effects models to account for the hierarchical structure of the data (students nested within schools). To capture between-school variability in both baseline reading performance and group-related differences, the models included random intercepts and random slopes for Group across schools. Fixed effects included Group, Grade, and their interaction. Grade was treated as a fixed effect because developmental differences across grade levels were a central focus of the analysis. For reading fluency, results were excluded if the calculated t-value corresponding to test norms (not used in the present analyses) exceeded the t-value of the reading comprehension score by more than 15 points. This criterion helped strengthen the validity of the fluency results, as a large discrepancy would suggest that students answered without actually reading the text. As a result, the number of observations differs between the reading comprehension and reading fluency analyses. All results are reported with their 95% confidence intervals (CIs).
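Written out as an equation, the specification described above corresponds to a two-level model of roughly the following form. This is a sketch derived only from the verbal description; the symbols and the coding of Group as a 0/1 indicator are notational assumptions and are not taken from the authors' analysis scripts. Here $y_{ij}$ is the raw score of student $i$ in school $j$.

```latex
\begin{align}
y_{ij} &= \beta_0 + \beta_1\,\mathrm{Group}_{ij} + \beta_2\,\mathrm{Grade}_{ij}
          + \beta_3\,(\mathrm{Group}_{ij} \times \mathrm{Grade}_{ij}) \notag \\
       &\quad + u_{0j} + u_{1j}\,\mathrm{Group}_{ij} + \varepsilon_{ij}, \\
(u_{0j},\, u_{1j})^{\top} &\sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^{2})
\end{align}
```

The terms $u_{0j}$ and $u_{1j}$ are the school-level random intercept and the random slope for Group. In lme4 formula syntax, a model of this shape would be written along the lines of `score ~ group * grade + (1 + group | school)`, where the column names are hypothetical.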
Applying this criterion in the longitudinal analyses, where the sample size was considerably smaller, resulted in a substantial reduction of available fluency cases across measurement points. To preserve statistical power and ensure the reliability of the longitudinal results, only reading comprehension was analyzed longitudinally.
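The plausibility screen for fluency scores can be illustrated with a short sketch. It assumes that each record carries the norm-based T-values for comprehension and fluency; the record layout, field names, and example values are hypothetical, and only the 15-point threshold comes from the text.

```python
from dataclasses import dataclass

# Threshold from the text: a fluency T-value may exceed the
# comprehension T-value by at most 15 points.
MAX_T_DISCREPANCY = 15


@dataclass
class ReadingResult:
    student_id: str         # hypothetical pseudonym
    t_comprehension: float  # norm-based T-value, used here only for screening
    t_fluency: float        # norm-based T-value, used here only for screening


def keep_fluency_score(result: ReadingResult,
                       max_gap: float = MAX_T_DISCREPANCY) -> bool:
    """Return True if the fluency score passes the plausibility screen."""
    return (result.t_fluency - result.t_comprehension) <= max_gap


# Invented example records, not data from the study.
results = [
    ReadingResult("S01", t_comprehension=45, t_fluency=50),  # gap 5: kept
    ReadingResult("S02", t_comprehension=30, t_fluency=52),  # gap 22: excluded
    ReadingResult("S03", t_comprehension=40, t_fluency=55),  # gap 15: kept
]

valid = [r for r in results if keep_fluency_score(r)]
print([r.student_id for r in valid])
```

Because the screen drops records whose gap exceeds the threshold, the fluency analyses end up with fewer observations than the comprehension analyses.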
To explore the effects of additional variables on individual test results (RQ3), exploratory post hoc analyses were conducted using Pearson’s correlation coefficients for metric data and Spearman’s rho for ordinal data. To compare results for DaZ class attendance, analyses of variance (ANOVAs) were calculated. Analyses were conducted using SPSS (v29) and R (v4.3.1), with linear mixed-effects models fitted in R using the lme4 package (Bates, Maechler, et al., 2015).
Data availability
According to project standards, all raw and transformed data collected and analyzed during this study have been uploaded to the Research Data Repository VerbundFDB of the Institute for Educational Quality Improvement (IQB) (Marx et al., 2025). The data set adheres to the repository’s standards and requirements, ensuring accessibility and compliance with research data management best practices (DDP-Bildung & VerbundFDB, 2024).
Results
Reading comprehension
For reading comprehension, the mixed-effects model showed a significant main effect of group: newly arrived students scored lower than mainstream students (b = –4.07 [–5.88, –2.26], p < .001). School grade was also a significant predictor (b = 1.02 [0.41, 1.63], p = .001), indicating better performance in higher grades. The interaction between group and grade was not significant (b = –0.44 [–1.45, 0.58], p = .400), suggesting that the performance gap between groups was consistent across grade levels.
A detailed summary of raw-score means, SDs, and effect sizes (Cohen’s d) for each grade level can be found in Table 2.
Table 2. Means, standard deviations, and effect sizes for reading comprehension by grade level.
Note. N indicates the total number of participants in each group. Effect sizes (d) were calculated based on differences between the recently arrived and comparison groups.
Reading fluency
Regarding reading fluency, the model indicated a significant group difference, with newly arrived students reading fewer words than their mainstream peers (b = –116.45 [–179.39, –53.52], p < .001). On average, comparison group students read 116 words more (M = 551, SD = 167) than immigrated students (M = 435, SD = 178). School grade showed a positive association with reading fluency (b = 65.68 [32.54, 98.82], p < .001), reflecting higher fluency at more advanced grade levels. The interaction between group and grade was non-significant (b = –10.35 [–66.42, 45.73], p = .716), indicating that the group difference did not vary systematically across grades. The descriptive effect sizes nevertheless varied across grade levels (Table 3).
Table 3. Means, standard deviations, and effect sizes for reading fluency by grade level.
Note. N indicates the total number of participants in each group. Effect sizes (d) were calculated based on differences between the recently arrived and comparison groups using pooled SDs.
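For readers who wish to retrace how an effect size with pooled SDs is obtained, the following sketch applies the standard formula to the overall fluency means and SDs reported above. The group sizes used here (192 and 76) are an assumption taken from the full sample; the actual fluency analyses used fewer observations after exclusions, so the result is illustrative and does not reproduce any value in Table 3.

```python
import math


def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Pooled standard deviation of two independent groups."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var)


def cohens_d(m1: float, sd1: float, n1: int,
             m2: float, sd2: float, n2: int) -> float:
    """Cohen's d: mean difference divided by the pooled SD."""
    return (m1 - m2) / pooled_sd(sd1, n1, sd2, n2)


# Means and SDs as reported in the text; group sizes are assumed.
d = cohens_d(m1=551, sd1=167, n1=192,   # comparison group
             m2=435, sd2=178, n2=76)    # immigrated students
print(round(d, 2))  # 0.68
```

Under these assumptions, the overall fluency gap corresponds to roughly two-thirds of a pooled standard deviation.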
Longitudinal analyses of reading comprehension
Retesting was conducted at one school in spring of Grade 7 (2022), Grade 8 (2023), and Grade 9 (2024), with 48 comparison group participants and 13 immigrated students. A linear mixed-effects model indicated that, although the overall sample demonstrated a significant improvement in reading comprehension from 2022 to 2024 (fixed effect of time: b = 2.11 [1.54, 2.69], p < .001), newly arrived students showed lower performance across measurement points (fixed effect of group: b = –3.27 [–5.61, –0.93], p = .006). Importantly, the interaction between time and group was not significant (b = –0.19 [–1.43, 1.05], p = .760), indicating that the performance gap between the two groups remained statistically constant throughout the study period.
A summary of means in reading comprehension can be found in Table 4.
Longitudinal reading comprehension scores by grade level and group.
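To make the interpretation of the non-significant interaction concrete, the following sketch projects the model-implied group difference at each measurement point from the reported fixed effects. It assumes that group was dummy-coded (comparison = 0, recently arrived = 1) and that time was coded 0, 1, 2 for 2022, 2023, and 2024; neither coding is stated in the text, so the numbers are illustrative point estimates only. The function name is hypothetical.

```python
def projected_gap(b_group, b_interaction, years):
    """Model-implied group difference after `years`, given dummy-coded group."""
    return b_group + b_interaction * years


B_GROUP = -3.27        # fixed effect of group (from the text)
B_INTERACTION = -0.19  # time x group interaction (from the text, not significant)

# ASSUMPTION: time coded 0, 1, 2 for the three spring measurement points.
gaps = {2022 + t: round(projected_gap(B_GROUP, B_INTERACTION, t), 2) for t in range(3)}
print(gaps)
```

Under these assumptions the point estimate of the gap does not shrink from year to year, which is the sense in which the gap "remained statistically constant": the interaction's confidence interval includes zero, so neither narrowing nor widening can be claimed.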
Predictor variable: vocabulary knowledge
In the final step, vocabulary score was included in the models to examine whether differences in vocabulary knowledge account for variance in reading performance. When vocabulary was added, the full random-effects structure failed to converge. Following recommended practice for mixed-effects modeling (Bates, Kliegl, et al., 2015), the random-effects structure was simplified to a random-intercept-only model, which converged successfully.
In the extended model for reading comprehension, vocabulary emerged as a strong positive predictor (b = 0.05 [0.04, 0.07], p < .001). After controlling for vocabulary, the previously significant effect of group was substantially reduced and no longer significant (b = –0.30 [–1.48, 0.87], p = .610). Grade level remained a significant, though comparatively small, positive predictor (b = 1.11 [0.44, 1.79], p = .001). The interactions between group and vocabulary, between grade and vocabulary, and the three-way interaction were all non-significant (all p > .25).
For reading fluency, vocabulary knowledge also had a significant effect (b = 1.85 [0.93, 2.77], p < .001) and again nullified the effect of immigrant status (b = –4.15 [–73.51, 65.20], p = .906); the effect of grade level remained moderate (b = 52.38 [16.67, 88.10], p = .004), indicating that fluency increased with grade. Most interaction terms were non-significant, with two exceptions: the interaction between vocabulary and grade level (b = –1.07 [–2.10, –0.03], p = .044), and the three-way interaction among group, vocabulary, and grade level (b = 2.50 [0.58, 4.43], p = .011). These patterns suggest that the relationship between vocabulary and reading fluency varies somewhat depending on students’ grade and group membership, although the effects are modest.
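Readers who wish to check how the reported estimates, confidence intervals, and p values relate to one another can use a simple Wald approximation, sketched below for the fluency coefficients above. This is a back-of-the-envelope check under a normal approximation, not the procedure used in the study; p values from mixed-effects software may rest on t-based approximations, so small discrepancies from the reported values are to be expected. The function name is hypothetical.

```python
from statistics import NormalDist


def wald_from_ci(b, lower, upper, level=0.95):
    """Approximate SE, z, and two-sided p from an estimate and its CI."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    se = (upper - lower) / (2 * z_crit)       # CI half-width divided by z_crit
    z = b / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))    # two-sided normal p value
    return se, z, p


# Estimates and 95% CIs as reported in the text for the fluency model.
reported = {
    "vocabulary": (1.85, 0.93, 2.77),
    "group": (-4.15, -73.51, 65.20),
    "vocabulary x grade": (-1.07, -2.10, -0.03),
    "group x vocabulary x grade": (2.50, 0.58, 4.43),
}

results = {name: wald_from_ci(*values) for name, values in reported.items()}
for name, (se, z, p) in results.items():
    print(f"{name}: SE ~ {se:.2f}, z ~ {z:.2f}, p ~ {p:.3f}")
```

The approximate p values land close to those reported, which suggests the intervals and tests are internally consistent.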
Further predictor variables for immigrant students’ reading
To explore potential predictors of reading comprehension and reading fluency among immigrated students and to inform future research directions, nine variables were analyzed using Pearson’s or Spearman’s correlations, as appropriate. These variables included participant age, age of onset (AoO) of German acquisition, time since AoO, length of schooling in Germany, use of German outside school (with family and friends), reading and writing in German outside of school over the past 2 weeks, weekly hours previously spent in preparatory classes, and current weekly hours in a DaZ (German as a Second Language) class. ANOVAs were conducted to compare mean reading scores between students who self-reported attending preparatory or DaZ remedial classes (yes/no).
Analyses identified significant associations between reading comprehension and six variables: reading in German outside of school (rₛ = .309, p = .003), writing in German outside of school (rₛ = .342, p = .033), number of DaZ class hours attended at the time of testing (r = –.368, p = .013), participant age (r = .240, p = .024), length of schooling in Germany (r = .478, p = .006), and years learning German (see below).
For reading fluency, similar patterns were observed for four variables, including a negative correlation with the number of DaZ class hours attended at the time of testing (r = –.356, p = .026) and positive correlations with participant age (r = .265, p = .030), length of schooling in Germany (r = .415, p = .044), and years learning German (see below). In addition, a positive association was found with the use of German outside of school with family and friends (rₛ = .352, p = .028).
Notably, the number of years students reported learning German had only a moderate correlation with test results (reading comprehension: r = .422, p = .002, n = 70; reading fluency: r = .363, p = .007, n = 54; the differing ns reflect the exclusion of cases for fluency, as detailed above). Figure 1 illustrates the relationship between reading comprehension and years of language learning.

Reading comprehension results by years of German language learning (n = 70).
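To convey how much uncertainty surrounds a "moderate" correlation at these sample sizes, the following sketch computes approximate 95% confidence intervals for the two Pearson correlations with years of German learning reported above, using Fisher's z transformation. These intervals are not reported in the study; they are an illustrative calculation from the published r and n values, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, level=0.95):
    """Approximate CI for a Pearson correlation via Fisher's z transformation."""
    z = math.atanh(r)                # transform r to the z scale
    se = 1 / math.sqrt(n - 3)        # standard error on the z scale
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# r and n as reported in the text.
ci_comprehension = fisher_ci(0.422, 70)
ci_fluency = fisher_ci(0.363, 54)
print([round(x, 2) for x in ci_comprehension])
print([round(x, 2) for x in ci_fluency])
```

Both intervals are wide, which is consistent with treating years of German learning as only a rough indicator of reading performance in this sample.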
For both reading comprehension and fluency, there were main effects of preparatory course attendance and of remedial DaZ class attendance, though in opposite directions. For reading comprehension, there was a significant effect of preparatory class attendance, F(1, 41) = 7.41, p = .009, ηp² = .153, as well as a significant effect of remedial DaZ class attendance, with students not attending such classes scoring higher, F(1, 41) = 6.26, p = .016, ηp² = .133. A similar pattern was observed for reading fluency: students who had attended preparatory classes had significantly higher fluency scores, F(1, 35) = 9.44, p = .004, ηp² = .212, while students in remedial DaZ classes had lower fluency scores than those not attending such classes, F(1, 35) = 7.65, p = .009, ηp² = .179. The interaction was not significant for either reading comprehension, F(1, 41) = 0.92, p = .342, ηp² = .022, or reading fluency, F(1, 35) = 0.70, p = .407, ηp² = .020. Descriptives can be found in Table 5.
Comparison of reading comprehension and fluency based on DaZ class participation (M (SD), n).
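The partial eta squared values reported for the ANOVAs above can be recovered from the F statistics and their degrees of freedom. The sketch below applies the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error) to the published values as a consistency check; the function name and labels are hypothetical, and small deviations arise because the F values are rounded to two decimals.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F, df_effect, df_error, reported partial eta squared), from the text.
reported = [
    ("comprehension: preparatory", 7.41, 1, 41, 0.153),
    ("comprehension: remedial DaZ", 6.26, 1, 41, 0.133),
    ("comprehension: interaction", 0.92, 1, 41, 0.022),
    ("fluency: preparatory", 9.44, 1, 35, 0.212),
    ("fluency: remedial DaZ", 7.65, 1, 35, 0.179),
    ("fluency: interaction", 0.70, 1, 35, 0.020),
]

recomputed = {label: partial_eta_squared(f, df1, df2) for label, f, df1, df2, _ in reported}
for label, _, _, _, eta in reported:
    print(f"{label}: recomputed {recomputed[label]:.3f}, reported {eta:.3f}")
```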
While the highest scores in both reading comprehension and fluency were observed among students who had attended preparatory German classes but were not currently enrolled in remedial DaZ classes, it is important to note that these values are confounded by school-internal decisions. Students are often assigned to remedial DaZ classes if they show especially low proficiency in German after full integration in regular coursework. This selection bias likely influences the observed differences (see Discussion).
Discussion
Results of the study provide insights into the reading skills of recently arrived students by addressing three central research questions.
RQ1: Performance gaps in reading
Regarding RQ1, immigrated students consistently performed considerably lower than their classmates in both (1) reading comprehension and (2) lower-level reading processes (reading fluency), despite full integration into mainstream classes and grading based on the same norms as their non-immigrant peers. These findings align with Marx et al. (2021), who examined students in different states, grade levels, and regions of origin in an earlier study. While such results are perhaps not surprising, considering the complexity of reading and reading development, they do suggest that schools need to adopt more realistic expectations of German reading skills in newly immigrated students and to consider how to support the long-term development of these skills.
RQ2: Longitudinal results
For a smaller sub-cohort, students who were in Grade 7 at the study’s outset were retested yearly until Grade 9. The group as a whole showed significant improvement in both reading comprehension and fluency from 2022 to 2024, but there was no significant interaction between group and time. This suggests that, although immigrant students made progress, they did not begin to close the gap with their non-immigrant peers over the 2-year period. Again, these results are consistent with Marx et al. (2021).
RQ3: Receptive vocabulary knowledge and additional individual variables
Finally, RQ3 examined whether specific individual variables were associated with reading achievement. Given the small cohort, analyses were exploratory and focused on scores at t1. Receptive vocabulary, a well-established correlate of reading skills, was first added to the model. For both reading comprehension and fluency, the inclusion of vocabulary knowledge rendered immigrant status negligible, indicating that vocabulary was a stronger predictor of reading outcomes than group membership. This result is important for several reasons. Most centrally, it implies that any measurement of reading with immigrant students should be accompanied by a vocabulary measure as a central explanatory variable. It also supports the validity of the ITT vocabulary instrument used in this study.
Further exploratory analyses focused on recently arrived students. Reading comprehension was positively and moderately associated with regular reading and writing in German outside of school and the length of schooling in Germany. There was also a weak but significant correlation with participant age, and a moderate negative correlation for attending remedial DaZ classes. For reading fluency, results were similar, although no association was found for reading and writing outside of school.
A notable finding emerged regarding attendance in DaZ remedial classes. At first glance, the data appear to suggest that students attending these classes had lower reading outcomes. However, this pattern likely reflects sample selection rather than program effectiveness. Unlike preparatory courses, which most recently arrived students in our sample attended, remedial DaZ class enrollment was constrained by factors such as instructor availability and the number of immigrant students in mainstream education. Critically, students were placed in remedial DaZ classes when they were considered to have low German proficiency, suggesting that their lower reading performance predates their enrollment in the program. Consequently, DaZ class attendance more likely indicates pre-existing low reading skills than the program’s impact.
Limitations
This study faced several limitations, particularly regarding sampling, sample size, data collection instruments, and the inherent challenges of working with recently immigrated students.
First, two major systemic challenges – the Covid-19 pandemic and the sudden influx of Ukrainian students – disrupted school coordination and limited access to participants. These factors made an already difficult-to-reach population even harder to study. As a result, convenience sampling was employed, relying on schools willing to participate. While this reduced the intended sample size, the absence of main effects for school suggests that the sample was broadly representative of lower secondary schools and their student populations. A detailed description of how these and further issues were addressed is provided in the project report (Marx, Barberio, Twente, Fuchs, Eisenbeiß, et al., 2024).
Second, data collection was hindered by the lack of standardized assessments for recently immigrated students. Unlike language assessments in foreign language classrooms, testing in mainstream education requires comparisons with students who have been educated entirely in the language of instruction. To address this, results were calculated in mixed models, including school and grade level, and analyses of further factors were limited to immigrated students.
Third, ensuring data reliability and validity posed additional challenges. The individual variables were collected through a digital questionnaire, but some students skipped questions or provided inconsistent responses, leading to missing values or possibly unreliable answers. This is a common issue in digital surveys with this population (Ahrenholz & Maak, 2013; Caspari & Marx, 2024). It was also difficult to obtain reliable information on the fidelity of language courses, including whether they even regularly took place. This limits insights into individual and contextual factors affecting learning (Bronfenbrenner & Morris, 2007).
Finally, defining appropriate comparison groups remained a central challenge. Immigrant students constitute a highly heterogeneous population, differing in emotional and often traumatic experiences (e.g., due to forced displacement), linguistic backgrounds, prior education, academic achievement, and host-country schooling experiences. This diversity complicates intra-group comparisons and necessitates cautious interpretation of findings.
Conclusion
This study contributes to the growing body of research on an underexplored group of language learners in Europe, focusing on Germany. The findings highlight both the learning and academic challenges faced by recently arrived students and the methodological difficulties in studying this diverse population.
First, consistent with findings from Germany (Caspari & Marx, 2024; Marx et al., 2021) and internationally in the United States (Clark-Gareca et al., 2019), Israel (Levin & Shohamy, 2008), and Canada (Paradis et al., 2020), recently arrived students, despite attending the same mainstream classes as their non-immigrant peers, exhibit significantly lower reading performance; and although these students improve over time, they do not appear to be closing the achievement gap. While perhaps not surprising, this finding does indicate that educators must recognize the length of time immigrant students need to develop German reading skills and, consequently, consider how to support recently arrived students over a longer period so that they can fully participate in classroom learning.
Second, participation in preparatory and remedial DaZ classes – programs theoretically mandated for immigrant students – is lower than expected. Fewer than three-quarters of the students reported attending preparatory classes, and just over half were enrolled in remedial DaZ classes at the time of the study, despite a clear need for additional language support. In this vein, while preparatory courses appear to play a crucial role in language development after immigration, the mechanisms underlying their impact remain unclear. The seemingly conflicting results in this study are a prime example of how the lack of detailed data on the length, scope, and fidelity of the German language courses and the role of allocation measures complicates interpretation. Future research, including qualitative methods such as interviews with students and teachers, could provide insight into the quality of DaZ instruction, the availability of emotional and academic support, teachers’ decisions and roles, and the extent to which these courses meet students’ needs (see, for example, Abramicheva, 2024).
Third, several individual factors appear to be associated with stronger reading outcomes, the most important of which was receptive vocabulary; in addition, regular reading and writing in German, more time in the German education system, and participant age were associated with better reading results. While exploratory, these findings suggest implications for educational policy (e.g., increasing preparatory class duration) and classroom practice (e.g., promoting independent reading and writing outside of school).
Finally, the study underscores the need to critically assess the methodologies used in research on immigrated students. Digital data collection methods may be less reliable for this population, resulting in higher rates of missing or inconsistent responses compared to traditional paper-and-pencil methods (PAPI), particularly for questionnaires assessing language learning factors (Marx et al., 2021). Furthermore, longitudinal studies are urgently needed to track the long-term language development of immigrant students. The lack of such research not only limits the academic knowledge base but also hampers the development of evidence-based recommendations for educational policy and practice.
In summary, the findings from this study suggest that both research and classroom practice with immigrated students should define more clearly the expectations for literacy-related skills such as reading, as well as the time and instructional conditions required for these skills to develop. Greater clarity regarding what constitutes realistic progress, how literacy trajectories vary across learners, and which forms of support are most effective would enable educators to design instruction that is both developmentally appropriate and responsive to students’ diverse linguistic backgrounds.
Supplemental Material
Supplemental material, sj-docx-1-ijb-10.1177_13670069261432528 for Playing the catch-up game: How long do immigrated students lag behind? by Nicole Marx and Anna Gorsch in International Journal of Bilingualism
Footnotes
Acknowledgements
I would like to thank Anna Schweizer and Johanna Brockmann for scoring and preparing the raw data from the reading tests, and Leonie Twente for verifying and adjusting the questionnaire data. I also especially thank the schools involved, teachers, parents and guardians, and the students who participated in the study.
Ethical considerations
Prior to data collection, ethics approval was obtained through the Ethics Committee of the Faculty of Humanities, University of Cologne.
Consent to participate
All participants and their legal guardians provided informed consent before participation, with consent forms available in German and English.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research project “Language Skills of Newly Immigrated Students in Mainstream Education” was financially and administratively supported by the Mercator Institute for Literacy and Language Education and the University of Cologne from 01.02.2021 to 30.06.2024.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability statement
1. The paper-and-pencil instrument used to test reading comprehension and reading fluency is the LGVT 5-12+, a standardized and normed test under copyright protection. The reference is: Schneider, W., Schlagmüller, M., & Ennemoser, M. (2017). LGVT 5-12+. Lesegeschwindigkeits- und Verständnistest für die Klassen 5-12+. Hogrefe.
2. The instrument “Leipziger Wortschatztests” (“Leipzig Vocabulary Tests”) is publicly available in its non-modified forms via the website of the Institut für interkulturelle Kommunikation e.V.: https://itt-leipzig.de/projekte/#wortschatztests. The direct link to the receptive vocabulary tests for German (non-modified form) is:
. For the format used in this study, an adapted version for use on tablets, no publicly available format exists. The test is under copyright protection. It is reported in: Caglia, D., & Tschirner, E. (2021). Examining the validity and reliability of the Receptive German 3 Vocabulary Size Test (VST) (Technical Report 2021-EU-PUB-1). Institut für Testforschung und Testentwicklung e.V.
3. The (online) questionnaire was developed specifically for the project. It is publicly available via the Research Data Repository VerbundFDB of the Institute for Educational Quality Improvement (IQB) (Marx, N., Barberio, T., Twente, L. R., Fuchs, M., Eisenbeiß, S., Dewitz, N. von, Bredthauer, S., & Goltsev, E. (2025). Sprachkompetenzen neuzugewanderter Schülerinnen und Schüler im Regelunterricht (SpraNZiR) (Version 1) [Data set]. IQB – Institut zur Qualitätsentwicklung im Bildungswesen. http://doi.org/10.5159/IQB_SpraNZiR_v1). An anonymised PDF printout of the questionnaire is uploaded with this submission. Please note that this instrument is available only in German.
Supplemental material
Supplemental material for this article is available online.
