Abstract
The purported foreign language gains of content and language integrated learning (CLIL) over traditional EFL (English as a foreign language) programs with young learners are still unclear. Specifically, little is known about how CLIL time and timing impact grammatical complexity. Additionally, mediating factors such as socioeconomic status (SES) and extramural exposure have been rarely controlled in the existing literature. This study analysed grammatical complexity in four groups of young learners in Spain (n = 108) during an oral task. The sample comprised: (1) an EFL-only group (1,766 EFL hours), (2) a low-exposure CLIL group (1,766 EFL hours + 707 CLIL hours), (3) a high-exposure CLIL group (1,766 EFL hours + 2,473 CLIL hours), and (4) a younger high-exposure CLIL group (1,545 EFL hours + 2,164 CLIL hours). All groups were matched for SES and extramural exposure. The analysis included independent ratings and computational measures of overall sentence complexity, subordination, and coordination. Distribution, Kruskal–Wallis and post-hoc tests were conducted. Results showed significant differences in favour of the high-exposure groups over the EFL-only group in the ratings and in two of the computational measures: overall sentence complexity and subordination. This evidence highlights the potential of high-exposure CLIL to supplement grammatical instruction in EFL programs. Our results also suggest that the comparatively higher exposure of the younger high-exposure CLIL group has the potential to override the one-year cognitive advantage of the older, EFL-only learners.
I Introduction
The teaching of content through a foreign language (FL) by means of content and language integrated learning (CLIL) has become one of the most important pedagogical innovations in the last decades (Oxbrow, 2018), and one which has transcended the European borders as it has been gradually embraced by Latin America and Asia (Pérez Cañado, 2020). In addition to a shift towards more meaning-oriented exposure, rather than form-oriented exposure, the proliferation of CLIL has brought about two important changes when compared with regular English as a foreign language (EFL) programs: both an intensification of the exposure, and the introduction to the FL at an earlier age. Despite said proliferation and an increasing body of research in the field, many more studies are needed in order to gain a finer understanding of the interaction between CLIL exposure, age and specific aspects of FL proficiency, particularly with young learners (YLs) (Muñoz, 2015), among which oral production has been comparatively under-researched (Pérez Cañado & Lancaster, 2017).
Regarding exposure, there is the fundamental issue of how much additional CLIL exposure is needed to produce sizeable language gains over EFL-only populations. This is important because the relationship between FL exposure and proficiency is non-linear, as it is mediated by elements such as quality of input, interaction, and individual learner differences (DeKeyser, 2000; Ellis, 2008; Gass & Mackey, 2007; Long, 1996). Although there is consistent quantitative evidence of 300 CLIL hours as the benchmark beyond which significant language gains surface (for a review, see Muñoz, 2015), this threshold has been underpinned by research carried out predominantly with secondary-school learners, and with data falling within a 200 h – 400 h (hour) range of CLIL exposure. Consequently, the time issue remains largely unanswered when (1) dealing with younger learners and (2) analysing CLIL programs whose variability in terms of exposure has already been shown to affect aspects like motivation significantly (Azpilicueta-Martínez & Lázaro-Ibarrola, 2023; Lázaro-Ibarrola & Azpilicueta-Martínez, 2024). In addition, although the tide seems to be turning, the potential influence of moderating variables such as socioeconomic status (SES) and extramural exposure to the FL has often not been controlled for in previous CLIL-based research (Lázaro-Ibarrola, 2024b; Soto-Corominas et al., 2024). Crucially, empirical evidence has been provided of a significant effect of SES on primary and secondary education learners’ foreign language proficiency (Lorenzo et al., 2020; Rascón & Bretones, 2018). Similarly, a growing body of research is revealing an interplay between extramural exposure and different aspects of learners’ language attainment (Bollansée et al., 2020; Wouters et al., 2024), including oral proficiency with YLs (De Wilde et al., 2020), which makes further studies in the field all the more necessary.
Concerning specific aspects of FL development, there is the complex issue of the different ways and extents to which CLIL affects skills and subskills, for which ample evidence has been reported, although major findings seem to be, again, largely based on secondary-school contexts (Dalton-Puffer, 2008, 2011; Hidalgo & Villarreal, 2024; Ruiz de Zarobe, 2011, 2015). The existing research on CLIL exposure and oral proficiency with YLs mostly points to a significantly positive CLIL effect (Madrid & Barrios, 2018; Martínez Agudo, 2019; Nieto, 2016), although there is also evidence of a non-significant effect (Azpilicueta-Martínez, 2024; Gálvez Gómez, 2021; Lancaster, 2018). Regarding oral grammar specifically, the most recent evidence points to a non-significant effect of CLIL on YLs’ accuracy (Fernández Pena & Gallardo del Puerto, 2021; Martínez-Adrián & Nieva-Marroquín, 2023; Vraciu, 2020). However, recent evidence suggests that CLIL exposure might significantly enhance YLs’ grammatical complexity, although this finding is based on research involving cohorts not matched for age (Martínez-Adrián & Nieva-Marroquín, 2023).
The present study constitutes an attempt to provide empirical evidence of the effect that substantially different levels of CLIL exposure have on age-matched and non-age-matched YLs’ grammatical complexity while performing an oral task. Specifically, it analyses the impact that different amounts of CLIL exposure have on 108 YLs’ (mean age = 10.46 years) oral grammatical complexity using two different approaches, namely a holistic assessment by means of two independent ratings, and computational measures including (1) overall sentence complexity, and amount of (2) subordination and (3) coordination. The study is innovative because (1) it includes markedly high levels of non-optional CLIL exposure (707, 2,164, and 2,473 hours), (2) it combines two markedly different data analysis methods in order to increase reliability, namely independent ratings and computational measures of overall sentence complexity, subordination, and coordination, and (3) it controls the potentially moderating variables of SES and extramural exposure.
II Literature review
1 The CLIL exposure threshold and challenges ahead
The expansion of CLIL towards increased exposure and an ever earlier introduction to the FL, particularly English (Bower et al., 2020; Coyle, 2006; Muñoz, 2014), seems unstoppable (Macaro et al., 2019), as its rapid geographic spread has reached Asia, the Middle East, and South America (Ito, 2018; Riddlebarger, 2013; Siqueira et al., 2018, as cited in Bower et al., 2020). However, the relationship between CLIL exposure, expectations and proficiency gains presents a complex challenge, as it has long been established that instructional time in a FL does not correlate linearly with proficiency, which is affected by the type and quality of exposure, and by individual differences including age and cognitive level (Collins et al., 1999; DeKeyser, 2000; Ellis, 2008; Gass & Mackey, 2007; Long, 1996; Swain, 1981). Moreover, CLIL programs often supplement low-exposure EFL-only instruction, and scholars have warned about the inherent difficulty of attributing FL learning gains exclusively to the CLIL factor rather than to the increased amount of exposure to the FL over time (Ruiz de Zarobe, 2015). In fact, the absence of EFL-only programs offering an amount of exposure comparable to the one in CLIL programs hampers efforts to isolate the CLIL factor and provide a reliable comparison in this respect (Ruiz de Zarobe, 2010).
Adding another layer of complexity, the term ‘CLIL’ encompasses a diverse range of levels of exposure and ages of onset, elements with the potential to boost YLs’ implicit learning. Muñoz (2015) suggested a threshold of 300 h as the amount of additional CLIL exposure needed for significant language gains to take place over regular EFL-only instruction. The author underpinned her claim on the existing empirical evidence provided until 2015, which comprised a series of studies ranging between 200 and 400 CLIL hours of exposure: 259 h (Gallardo et al., 2009), 328–363 h (Villarreal & García Mayo, 2009), or 363 h (Martínez-Adrián & Gutiérrez Mangado, 2009), to name but some, with the lowest difference between CLIL and EFL-only groups at 43.2 h (Xanthou, 2011), and the largest at 480 h (Lázaro Ibarrola, 2012).
Notwithstanding this, Muñoz (2015) enumerated several challenges which should be addressed in order to shed new light on the CLIL exposure issue. First, the body of research analysing CLIL exposure quantitatively is limited, and a great deal of the existing evidence has been obtained with older learners, which casts doubt on the generalizability of its findings to younger populations. Second, previous CLIL-exposure based studies have often failed to account for several cohort-matching issues, such as different SES and extramural exposure levels. Regarding SES, although more research is needed, SES-based empirical studies in the Spanish context, i.e. the one in the present study, point to a significant effect of SES on primary and compulsory secondary education non-CLIL learners’ foreign language proficiency (Lorenzo et al., 2020; Rascón & Bretones, 2018). This fact appears to be linked to CLIL students being more likely to take extracurricular English lessons than their non-CLIL counterparts due to differences in parental SES (Muñoz, 2015). In the same vein, an increasing number of studies are showing how extramural exposure can have an important effect on different aspects of learners’ proficiency, including aspects like vocabulary (Bollansée et al., 2020), reading (Wouters et al., 2024), translation skills (Kuppens, 2010), or, as is the case of the present study, oral proficiency with YLs (De Wilde et al., 2020). In addition to these two limitations, and although the proliferation of CLIL might be minimizing this variable, it would be desirable to reduce the CLIL ‘selection’ effect reported by several authors (De Smet et al., 2019; Mearns et al., 2020), by which the most proficient and/or motivated students are chosen for enrolment in optional CLIL programs. Fortunately, CLIL-based research has started to control some of these variables (e.g. Lázaro-Ibarrola, 2024a; Madrid & Barrios, 2018; Martínez Agudo, 2019), but there is still a pressing need for the analysis and comparison of different levels of CLIL exposure in relation to language proficiency gains, particularly with YLs.
2 CLIL and young learners’ oral grammar
Our knowledge of the ways and extent to which CLIL affects different language aspects is still developing, particularly with YLs. The pioneering review on the existing research carried out in Europe by Dalton-Puffer (2008) summarized the hitherto existing evidence on the aspects in which CLIL exposure had provided a significant advantage over EFL-only populations. These included, predominantly, receptive skills and general aspects like vocabulary or morphology; conversely, her analysis also shed light on elements over which CLIL yielded no significant advantage, or where evidence remained inconclusive, such as writing, syntax or pronunciation. Nevertheless, Dalton-Puffer’s (2008) study was published over a decade ago, and CLIL programs have undergone significant changes and developments since then. Notwithstanding this fact, the body of research dealing with the effects of CLIL on specific aspects of oral production is still rather limited, since this skill has traditionally been one of the least researched (Pérez Cañado & Lancaster, 2017). Regarding oral skill gains specifically, subsequent reviews (Ruiz de Zarobe, 2011, 2015) provided evidence of a CLIL advantage over EFL-only instruction in a series of specific aspects, like communicative efficiency, self-confidence (Dalton-Puffer et al., 2009, as cited in Ruiz de Zarobe, 2015), spontaneous production (Admiraal et al., 2006; Lasagabaster, 2008; Ruiz de Zarobe, 2008), self-reported fluency, flexibility and listener-orientedness (Hüttner & Rieder-Bünemann, 2010; Maillat, 2010; Moore, 2009), as well as fluency and risk-taking in relation to the low affective filter (Ruiz de Zarobe, 2015). In contrast to the positive findings revealed in these aspects, there are two oral competencies over which CLIL has failed to reveal a consistently significant advantage over EFL-only groups: pronunciation and grammar. 
Regarding the former, scholars have argued that learners might require a specific pedagogical intervention in order to improve (Gallardo et al., 2009; Varchmin, 2010). Similarly, concerning grammar in its oral form, different studies have provided evidence of how CLIL exposure alone might fail to lead to sizeable language gains in aspects like subject production or morphology (e.g. Basterrechea & García Mayo, 2013, 2014; García Mayo & Villarreal Olaizola, 2011). Notwithstanding the above, it is timely to highlight how none of the studies cited so far in this subsection have been carried out in primary school contexts. This complicates the generalizability of their findings with younger populations and raises interesting questions as to the optimal age to benefit from CLIL instruction, as different scholars have provided evidence of how older learners might benefit more than YLs thanks to their more developed cognitive, language processing and reasoning skills, first-language foundations, and motivation (Dalton-Puffer, 2007; Lasagabaster, 2011; Llinares et al., 2012).
The number of studies delving into CLIL and language gains in primary education is far less copious, as research has tended to place its focus on receptive skills (Jiménez Catalán & Ruiz de Zarobe, 2009; Navés, 2011; Pérez-Vidal & Roquet, 2015). The existing empirical evidence has provided mixed findings (Azpilicueta-Martínez, 2024; Gálvez Gómez, 2021; Housen, 2012; Lancaster, 2018; Martínez Agudo, 2019; Vraciu, 2020), and it has been hypothesized that the additional exposure provided by CLIL might fail to grant YLs with a significant advantage over EFL-only learners (Housen, 2012, as cited in Muñoz, 2015). The studies analysing aspects of oral production with primary school CLIL learners in their samples have often followed a holistic approach (Gálvez Gómez, 2021; Nieto, 2016), since oral production constituted a fragment within broader projects comparing CLIL and EFL-only groups including other language skills (Lázaro-Ibarrola, 2024a; Madrid & Barrios, 2018) or various educational stages (Martínez Agudo, 2019). In other words, the values for oral production in these studies were global and resulted from the addition of various components including not only grammatical aspects, but also elements like lexical range, fluency or pronunciation (Gálvez Gómez, 2021; Madrid & Barrios, 2018; Martínez Agudo, 2019). Interestingly, when the grammatical component was mentioned as constituting an element in the oral production rubric in these studies, it was done either generally (‘grammar’) or focusing on ‘grammatical accuracy’ alone (Madrid & Barrios, 2018, p. 36; Martínez Agudo, 2019, p. 74), i.e. not grammatical complexity. The studies which have compared oral production in CLIL and EFL-only primary school contexts have nevertheless revealed mixed findings. 
Nieto (2016) compared a series of skills and subskills, including spoken production and interaction, in CLIL and EFL-only groups of YLs in Spain (ages 9, 10), in which the CLIL learners had received an additional 250 h of FL exposure. She reported that the only statistically significant advantage of the CLIL groups over the EFL-only groups lay in the oral production and interaction aspects. In the same vein, research by Madrid and Barrios (2018) and Martínez Agudo (2019) compared, amongst others, oral production with YLs aged 11–12 years and reported statistically significant advantages for the CLIL groups over the EFL-only groups precisely in those particular skills. In contrast to these findings, Lancaster (2018) conducted a study that examined various skills and subskills, including oral production, among CLIL and non-CLIL groups of primary (ages 11–12 years) and secondary (ages 14–15 years) school students. The results showed no significant oral proficiency gains for the CLIL groups in either age group, and, for primary learners specifically, there were no notable advantages in fluency, interaction, pronunciation, or lexical range. Concurring with Lancaster (2018), a study by Gálvez Gómez (2021) compared various language components, including oral production, in CLIL and EFL-only groups of primary (ages 11–12 years) and secondary school learners and reported significant advantages for the CLIL groups over the EFL-only groups in all measures, in both age groups, except for oral production with the primary school learners.
The body of research delving into specific subskills like oral grammar with CLIL YLs is still sparse (Martínez-Adrián & Nieva-Marroquín, 2023) and studies have shown a tendency to prioritize accuracy over complexity (Fernández Pena & Gallardo del Puerto, 2021; Vraciu, 2020) with notable exceptions (Martínez-Adrián & Nieva-Marroquín, 2023). Vraciu (2020) analysed the effect of additional low-exposure CLIL instruction (one weekly CLIL hour) on oral grammatical accuracy, specifically suppletive and affixal verb morphology, with L1-Catalan–L2-Spanish learners of English at different times (ages 9–10, and then 11–12 years), and reported no significant advantage for the CLIL groups over the EFL-only groups at any of the data collection times. In the same vein, Fernández Pena and Gallardo del Puerto (2021) conducted a study which compared L1-Spanish (ages 11–12 years) learners in EFL-only and CLIL settings, where the latter had received an additional 488 h of CLIL exposure. The analysis examined grammatical accuracy operationalized by means of agreement morphology and subject omission errors. Concurring with Vraciu (2020), no statistically significant differences between the EFL-only and CLIL learners were reported in any of the features under study. Finally, a recent longitudinal study by Martínez-Adrián and Nieva-Marroquín (2023) focused on the ‘timing’ element by comparing CLIL exposure with L1-Spanish learners in Primary Year 4 (ages 9–10 years) and Primary Year 6 (ages 11–12 years), and analysed grammatical accuracy (subject omission, subject–verb inversion and third person singular ‘s’ morpheme omission), as well as syntactic complexity, operationalized through the production of simple and complex clauses. Interestingly, the younger group in their study had received low-exposure CLIL instruction, while the older group had received a higher degree of exposure, which led to a 1,064 h-exposure advantage for the Year 6 group over the Year 4 group at the time of testing. 
Despite the substantial increase in CLIL exposure and the cognitive maturity advantage of the older learners, no significant differences were reported between both groups for the grammatical accuracy measures. The results for syntactic complexity, however, showed a statistically significant advantage for the Year 6 learners both on the production of simple and complex clauses, a finding which the researchers attributed to the fact that the older group’s productions ‘were longer and contained more lexical verbs than those of the less proficient group’ (p. 1218). These findings, thus, invite us to consider that grammatical complexity, unlike accuracy, might have the potential to be significantly improved by increased CLIL exposure with YLs. Notwithstanding, it is also true that the cognitive maturity advantage in the Year 6 group over the Year 4 group in the study by Martínez-Adrián and Nieva-Marroquín (2023) might have played a role in the results, raising the question as to what degree the improvements in grammatical complexity can be ascribed to the CLIL exposure factor exclusively.
In view of the above, there is a pressing need to shed light on the extent to which oral grammatical complexity might be affected by different levels of CLIL exposure with YLs, including age-matched and non-age-matched peers, and controlling potential moderating variables like SES and extramural exposure to the FL.
To address this research gap, the following main and secondary research questions were posed:
Research question 1: What is the effect of different levels of additional CLIL exposure on young learners’ grammatical complexity while performing an oral task?
Research question 2: How does the younger high-exposure CLIL learners’ grammatical complexity compare to the other groups’?
III The study
1 Participants
The sample in the study included a total of 108 children (mean age = 10.46 years) from six state schools in the Chartered Community of Navarre (Spain), comprising, in turn, the following populations:
An EFL-only group (Primary Year 6), which had received an average of five weekly EFL sessions* (n = 23; mean age = 10.91 years).
A low-exposure CLIL group (Primary Year 6), which had received an average of five weekly sessions of EFL instruction plus two weekly CLIL sessions (n = 21; mean age = 10.24 years), including the subjects of arts and crafts, music, and physical education.
A high-exposure CLIL group (Primary Year 6), which had received an average of five weekly sessions of EFL instruction plus seven weekly CLIL sessions (n = 32; mean age = 10.91 years), which included the subjects of arts and crafts, maths, science and social studies.
A younger high-exposure CLIL group (Primary Year 5), which had received an average of five weekly sessions of EFL instruction plus seven weekly CLIL sessions (n = 32; mean age = 9.84 years), including the same CLIL subjects as the previous group.
Participants had been enrolled in their respective programmes since the age of three. The amount of exposure to English was calculated by multiplying eight (in the case of the Year 6 students) and seven (Year 5) academic years by the number of weeks of instruction (i.e. 37 weeks per year plus 16 weeks of the ongoing year at the time of data collection) and multiplying the resulting number by the number of weekly sessions, converted into hours, as displayed in Table 1. The participants’ teachers had a certified minimum B2 level of proficiency (CEFR) in English. The CLIL programs in the schools participating in the study were compulsory, thus eliminating the selection bias typical of optional CLIL programs (De Smet et al., 2019; Mearns et al., 2020). All CLIL hours in Table 1 correspond to content-based subjects taught through the medium of English.
Study group characteristics.
Notes. Each session for each of the groups lasted 50 minutes. CLIL = content and language integrated learning. EFL = English as a foreign language.
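The exposure figures can be reproduced, to within an hour, with the short sketch below. It rests on an assumption: the description above is read literally as years × (37 + 16) weeks × weekly sessions × 50 minutes. The function name and the dictionary layout are illustrative only and are not taken from the study.

```python
# Sketch of the exposure calculation under one literal reading of the text.
# Assumption: total weeks = academic years * (37 + 16); this is an
# interpretation, not a formula stated by the authors.
SESSION_MINUTES = 50            # session length given in the notes to Table 1
WEEKS_PER_YEAR_TERM = 37 + 16   # the two week counts given in the text, summed


def exposure_hours(years: int, weekly_sessions: int) -> float:
    """Return total instruction hours for a number of years and weekly sessions."""
    weeks = years * WEEKS_PER_YEAR_TERM
    return weeks * weekly_sessions * SESSION_MINUTES / 60


# (years, weekly sessions) paired with the hours reported in the abstract.
REPORTED = {
    "EFL, Year 6": ((8, 5), 1766),
    "Low-exposure CLIL, Year 6": ((8, 2), 707),
    "High-exposure CLIL, Year 6": ((8, 7), 2473),
    "EFL, Year 5": ((7, 5), 1545),
    "High-exposure CLIL, Year 5": ((7, 7), 2164),
}

for label, ((years, sessions), reported) in REPORTED.items():
    computed = exposure_hours(years, sessions)
    print(f"{label}: computed {computed:.1f} h, reported {reported} h")
```

Under this reading every computed value falls within one hour of the reported total; the small discrepancies would be explained by rounding in the published figures.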
a Socioeconomic status (SES)
The following proxy measures for SES were examined by means of an individual online questionnaire, each scored on a scale from 1 to 5:
Parental (including both parents) level of education (two measures).
Parental (including both parents) employment status (two measures).
Parental (including both parents) current occupation (two measures).
Number of books at home (one measure).
Number of home resources (including PCs, laptops, tablets, a room of their own, to name but some) (one measure).
To quantify each participant’s SES level, the five measures with the highest standard deviation were chosen and summed for each individual. No SES differences were detected between any of the four groups, or between any of the six schools participating in the study, and no outliers were identified by means of boxplot chart analysis. There was a non-normal SES distribution within each group (Kolmogorov–Lilliefors p < 0.01), which provided strong evidence of mixed SES indicators in all groups.
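The selection-and-sum step described above can be sketched as follows. This is a minimal illustration, assuming that the sample standard deviation is computed per measure across all participants; the measure names and the function are hypothetical, and no data from the study are used.

```python
from statistics import stdev


def ses_scores(responses, n_keep=5):
    """Keep the n_keep measures with the highest standard deviation and sum them.

    responses: list of dicts mapping a measure name to a score from 1 to 5,
    one dict per participant (hypothetical layout).
    Returns (kept measure names, list of per-participant SES sums).
    """
    measures = list(responses[0])
    spread = {m: stdev([r[m] for r in responses]) for m in measures}
    kept = sorted(measures, key=lambda m: spread[m], reverse=True)[:n_keep]
    scores = [sum(r[m] for m in kept) for r in responses]
    return kept, scores


# Hypothetical responses for three participants and three measures.
example = [
    {"books_at_home": 1, "home_resources": 3, "parent_education": 3},
    {"books_at_home": 5, "home_resources": 3, "parent_education": 2},
    {"books_at_home": 3, "home_resources": 3, "parent_education": 4},
]
kept, scores = ses_scores(example, n_keep=2)
print(kept, scores)
```

With the eight measures listed above, each participant's sum of five retained measures would range from 5 to 25.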
b Control of extramural exposure and extracurricular foreign language classes
An individual online questionnaire was designed in order to measure extramural exposure and extracurricular FL instruction with all participants. In it, learners were asked:
Whether they attended extracurricular FL lessons.
Whether they had spent more than one month in an English-speaking country, even if such exposure had been achieved by means of several stays.
Whether they interacted in English with parents, siblings or friends on a regular basis.
How much time they devoted to the following activities in English:
○ TV viewing, including cartoons, TV series, films;
○ reading books, comics, or magazines;
○ videogaming.
In order to guarantee that the groups were matched for extramural exposure and extracurricular FL instruction, and thus to be able to ascribe the results to the CLIL exposure factor, learners who replied affirmatively to items 1, 2 and 3, or who reported spending 6 hours or more in total on the activities in item 4, were excluded from the study. Of the initial 222 YLs, 114 were excluded due to differences in extramural exposure or extracurricular FL instruction, which left a total of 108 participants.
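The exclusion rule can be expressed as a simple filter. The sketch below assumes that an affirmative answer to any one of the first three items was sufficient for exclusion; the wording above leaves this open, so the `or` logic is an interpretation, and the function and parameter names are hypothetical.

```python
MAX_ACTIVITY_HOURS = 6  # learners reporting this many hours or more were excluded


def is_excluded(extracurricular_lessons: bool,
                month_abroad: bool,
                regular_interaction: bool,
                activity_hours: float) -> bool:
    """Return True if a learner fails the extramural-exposure criteria.

    Assumption: any single affirmative answer (items 1 to 3) excludes the
    learner, as does a combined item-4 total at or above the threshold.
    """
    return (extracurricular_lessons
            or month_abroad
            or regular_interaction
            or activity_hours >= MAX_ACTIVITY_HOURS)


# Sample bookkeeping with the figures reported in the text.
initial, excluded = 222, 114
print(initial - excluded)
```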
2 Task and procedure
The task used in the present study was Heaton’s (1966) ‘Bicycle’. It is an integrated task (Iwashita et al., 2008) including a six-frame picture prompt. Participants were told the name of the story and the first picture was described to them (see Appendix 1), after which they were asked to continue narrating the story. The procedure tailored for the study was inspired by the format of the story narration in Cambridge’s YLs’ Movers test, because said test is designed for children between the ages of 7 and 12 years and is aimed at encouraging participants’ oral production by building their confidence from the outset. The task was carried out and recorded individually with each student with the help of the teaching staff at the schools participating in the study, i.e. the participants’ EFL or CLIL teacher introduced the task to them in order to ensure familiarity. Proactive or reactive questions (Appendix 1) were used when participants’ responses were nonexistent or too short. Informed consent was granted by the parents or legal guardians of all participants.
3 Data collection and analysis
All the data were collected between 2019 and 2021. Each participant performed the task once. The total amount of recorded time was 197 minutes and 20 seconds, averaging 1 minute and 50 seconds per participant. The oral production was subsequently assessed by two independent raters using a common rubric adapted from Merino and Lasagabaster (2018), in which each participant’s performance was awarded a value which ranged between 1 (minimum value) and 4 (maximum value) (see Appendix 2). Both raters were proficient speakers of English with years of professional experience assessing oral production with YLs. The intraclass correlation coefficient (ICC) between raters was calculated following Shrout and Fleiss (1979), yielding an ICC value of 0.725. Oral data were then fully transcribed for computational analyses. Computational measures included (1) overall sentence complexity, and amount of (2) subordination and (3) coordination, for which the Second Language Syntactic Complexity Analyzer (L2SCA) was used (Lu, 2010, 2011). Following the operationalization used in previous research (Ai & Lu, 2013; Lu, 2010, 2011), the following elements were selected: complex T-unit ratio (overall sentence complexity); dependent clause per T-unit ratio (amount of subordination); and coordinate phrase per T-unit ratio (amount of coordination). The rationale for combining two methods of analysis is to provide the results with greater construct validity. The identification of subordination and coordination is straightforward, and it was done through computational analyses. The ratings were based on a more holistic approach to grammatical complexity, because even if the computational analysis might yield said values, there were many instances of inaccurate grammar use in the participants’ interlanguage (e.g. subject–verb agreement and grammatical clarity, as in ‘In this story eh . . . the boy, eh . . . go to the floor with the face’) which can mediate raters’ perception of complexity. 
Hence the interest in providing two different methods for analysis.
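The three computational indices are ratios over the number of T-units. The sketch below shows how they follow from raw counts; it does not call L2SCA itself, and the counts in the usage lines are hypothetical values chosen for illustration, not data from the study. The keys use the index labels CT/T, DC/T and CP/T.

```python
def complexity_ratios(t_units: int,
                      complex_t_units: int,
                      dependent_clauses: int,
                      coordinate_phrases: int) -> dict:
    """Return the three per-T-unit ratios used as complexity measures."""
    if t_units <= 0:
        raise ValueError("at least one T-unit is needed to compute ratios")
    return {
        "CT/T": complex_t_units / t_units,     # overall sentence complexity
        "DC/T": dependent_clauses / t_units,   # amount of subordination
        "CP/T": coordinate_phrases / t_units,  # amount of coordination
    }


# Hypothetical counts for one transcript.
ratios = complexity_ratios(t_units=4, complex_t_units=1,
                           dependent_clauses=2, coordinate_phrases=3)
print(ratios)
```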
A Lilliefors normality test was conducted, which revealed a non-normal distribution of the population (p < .01). Thus, non-parametric Kruskal–Wallis tests were selected, with significance set at p ⩽ .05. Where significant differences were detected, post-hoc tests (using the Holm correction to adjust p-values) were conducted to interpret the nature of the differences. Effect sizes have also been provided according to Plonsky and Oswald’s (2014) benchmarks for eta-squared (η²) values in applied linguistics research, in which small, medium, and large effects were defined as η² = 0.01, η² = 0.06, and η² = 0.14, respectively.
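For readers who wish to trace the statistics, the sketch below implements the Kruskal–Wallis H statistic with the standard tie correction, an eta-squared estimate, and the Holm adjustment in plain Python. The eta-squared formula used, (H − k + 1) / (n − k), is a common choice but is an assumption here, since the text does not state which formula was applied. The sketch is not the authors' analysis script, and the data in the usage lines are hypothetical.

```python
from itertools import chain


def average_ranks(values):
    """Rank values from 1 to n, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return the tie-corrected H statistic for a list of samples.

    Note: undefined (division by zero) if every pooled value is identical.
    """
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_counts.values()) / (n ** 3 - n)
    return h / correction


def eta_squared_h(h, k, n):
    """Eta-squared from H for k groups and n observations (assumed formula)."""
    return (h - k + 1) / (n - k)


def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=p_values.__getitem__)
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        running_max = max(running_max, (m - rank) * p_values[idx])
        adjusted[idx] = min(1.0, running_max)
    return adjusted


# Hypothetical scores for three small groups.
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h = kruskal_wallis_h(groups)
print(h, eta_squared_h(h, k=3, n=9), holm_adjust([0.01, 0.04, 0.03]))
```

H is compared against a chi-square distribution with k − 1 degrees of freedom to obtain the p-value, a step omitted here to keep the sketch dependency-free.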
IV Results
In this section we will attempt to answer our main research question, that is, to analyse the effect of additional CLIL exposure on YLs’ grammatical complexity while performing an oral task, as well as our secondary research question, i.e. to examine how the younger high-CLIL learners’ grammatical complexity compared to the other groups’. First, the results concerning the ratings will be provided, which will be followed by the results of the computational measures. In order to provide a more comprehensive picture, findings regarding density and distribution have also been included. Box plots have been used to better illustrate the differences and similarities between groups.
1 Ratings
The normality tests revealed a normal distribution for the EFL-only group (all tests), but a non-normal distribution of the data in the low-exposure, high-exposure and younger high-exposure CLIL groups (Appendix 3), which made non-parametric Kruskal–Wallis tests appropriate.
The Kruskal–Wallis test revealed significant differences (p = .011), with a medium-to-large effect size (η² = 0.08). The post-hoc test confirmed the existence of two significantly different groups, with the high-exposure and younger high-exposure CLIL groups (non-significantly different between them) clustering at the highest end of the spectrum, and the EFL-only group at the lowest (Table 2). The low-exposure CLIL group lay in between the other three groups, showing no significant differences with any of them.
Ratings: Post-hoc classification of groups.
Notes. CLIL = content and language integrated learning. EFL = English as a foreign language.
The differences between groups can be graphically observed in the following box plot (Figure 1), where three quartiles of the high-exposure CLIL groups (both ages) are placed above the median of the EFL-only and low-exposure CLIL groups. The figure also illustrates how values were substantially more homogeneous in the high-exposure CLIL group than in the younger high-exposure CLIL group.

Box plot showing raters’ average values for all groups.
Results regarding the density and distribution of the groups reveal a clear asymmetrical pattern towards the lowest value range for the EFL-only group (Figure 2). By contrast, the distribution density for the younger high-exposure group depicts an asymmetrical tendency towards the highest value range. Both the low-exposure and high-exposure CLIL groups are placed between the more polarized distribution shapes observed in the EFL-only and younger high-exposure CLIL groups. It is worth noting that there is particularly clear evidence of mixed populations within the high-exposure group, which is illustrated through the appearance of several bell-shaped curves.

Density and distribution in all groups.
2 Computational measures
a Overall sentence complexity
As previously stated, computational measures for overall sentence complexity were based on the complex T-unit ratio. The normality tests revealed a non-normal distribution of the data in all groups (Appendix 3). Consequently, non-parametric Kruskal–Wallis tests were conducted. The Kruskal–Wallis test showed a significant difference between groups (p = .041) with a small-to-medium effect size (η² = 0.05), as shown in Table 3. A post-hoc Dunn–Bonferroni test then showed a significant difference between the EFL-only and the high-exposure CLIL group, with an adjusted p-value of .037, as noted in Table 4. The box plot in Figure 3 illustrates how the significantly higher values of the high-exposure CLIL group over the EFL-only group were reflected in the distribution patterns.
Results of the Kruskal–Wallis test regarding overall sentence complexity.
Notes. EFL = English as a foreign language.
Results of the post-hoc Dunn–Bonferroni Test regarding overall sentence complexity.
Note. Adj. p = Values adjusted with Bonferroni correction. EFL = English as a foreign language.
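The post-hoc procedure named above can be sketched as follows: a minimal, dependency-free illustration of Dunn's pairwise test on pooled ranks with a Bonferroni adjustment. It assumes the standard tie-corrected variance and two-sided p-values; whether the authors' software used exactly these choices is not stated in the text. Group names and scores are hypothetical placeholders.

```python
import math
from collections import Counter
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def dunn_bonferroni(groups):
    """Map each pair of group names to (z, Bonferroni-adjusted p)."""
    names = list(groups)
    pooled = [x for name in names for x in groups[name]]
    n = len(pooled)
    ranks = average_ranks(pooled)
    # Mean rank of each group within the pooled ranking.
    mean_rank = {}
    start = 0
    for name in names:
        size = len(groups[name])
        mean_rank[name] = sum(ranks[start:start + size]) / size
        start += size
    # Variance term shared by all pairs, reduced when ties are present.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values()) / (12 * (n - 1))
    base_var = n * (n + 1) / 12 - tie_term
    pairs = list(combinations(names, 2))
    results = {}
    for a, b in pairs:
        se = math.sqrt(base_var * (1 / len(groups[a]) + 1 / len(groups[b])))
        z = (mean_rank[a] - mean_rank[b]) / se
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        # Bonferroni: multiply by the number of comparisons, cap at 1.
        results[(a, b)] = (z, min(1.0, p * len(pairs)))
    return results


# Hypothetical complex T-unit ratios (placeholder values only).
ratios = {
    "EFL-only": [0.00, 0.05, 0.10, 0.10, 0.20],
    "low-exposure": [0.05, 0.10, 0.20, 0.25, 0.30],
    "high-exposure": [0.20, 0.30, 0.35, 0.40, 0.50],
    "younger high-exposure": [0.10, 0.20, 0.25, 0.30, 0.45],
}
for (a, b), (z, p_adj) in dunn_bonferroni(ratios).items():
    print(f"{a} vs {b}: z = {z:.2f}, adjusted p = {p_adj:.3f}")
```

With four groups there are six pairwise comparisons, so each unadjusted p-value is multiplied by six before being compared with the significance threshold.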

Box plot showing distribution for overall sentence complexity.
An example illustrating the overall sentence complexity difference between the EFL-only group and the high-exposure CLIL group can be seen in the following utterances, representative of their production: (1) * EFL-only: In this story eh . . . the boy, eh . . . go to the floor with the face. And stand up and go riding the bike. (2) * High-exposure CLIL: When Henry check again, go to her house and he found the angry man repairing her car that was broken.
Note how, in the EFL-only participant, the first sentence remains simple despite the addition of a prepositional phrase (‘with the face’). The second sentence is also a simple sentence, although it includes a compound predicate (‘stand up and go riding the bike’). This type of construction was typical among the EFL-only participants. By contrast, the extract provided by the high-exposure CLIL learner provides evidence of a longer, more complex grammatical construction including an initial subordinate clause (‘when Henry check again’), two coordinate clauses (‘(he) go to her house and he found the angry man’), and two dependent clauses (‘repairing her car’; ‘that was broken’).
b Amount of subordination
As previously stated, computational measures for subordination were based on the dependent clause per T-unit ratio (DC-T). The normality tests revealed a non-normal pattern which mirrored the values for overall sentence complexity with all groups in all tests. Therefore, non-parametric Kruskal–Wallis tests were performed.
The Kruskal–Wallis test revealed a significant difference between groups (p = .003) with a medium-to-large effect size (η² = 0.10), as shown in Table 5. A post-hoc Dunn–Bonferroni test was then conducted, and it showed a significant difference between the EFL-only and the high-exposure CLIL group, with an adjusted p-value of .002, as may be noted in Table 6. The following box plot in Figure 4 illustrates how the values for amount of subordination in the high-exposure group were visibly higher than those in the EFL-only group.
Results of the Kruskal–Wallis test regarding amount of subordination.
Notes. EFL = English as a foreign language.
Results of the post-hoc Dunn–Bonferroni test regarding amount of subordination.
Note. Adj. p = Values adjusted with Bonferroni correction. EFL = English as a foreign language.

Box plot showing distribution for amount of subordination.
An illustrating example of the difference between the high-exposure CLIL and the EFL-only groups regarding the amount of subordination can be spotted in the following extracts: (3) * EFL-only: Then the, the angry man push Henry and Henry fell down to the bushes. (4) * High-exposure CLIL: Henry start putting nervous because the angry man was driving very fast.
Note how the EFL-only participant’s utterance contains two independent clauses (‘Then the angry man push Henry’; ‘Henry fell down to the bushes’) linked by a coordinating conjunction (‘and’) and no subordinate clauses. The oral production by the high-exposure CLIL YL, however, provides evidence of a T-unit integrating a main clause (‘Henry start putting nervous’) followed by a cause-and-effect subordinate clause (‘because the angry man was driving very fast’). This type of construction was much more common among the high-exposure CLIL learners.
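The dependent clause per T-unit ratio for extracts (3) and (4) can be worked through by hand. The sketch below assumes the usual definition of a T-unit (one main clause plus any clauses subordinated to it), so the two coordinated independent clauses in extract (3) are counted as two T-units. The counts are hand-coded from the clause analysis given above and the class name is hypothetical. This is not output from the computational tool used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UtteranceCounts:
    """Hand-coded clause counts for one stretch of learner speech."""
    t_units: int
    dependent_clauses: int

    def dc_per_t(self) -> float:
        """Dependent clauses divided by T-units (DC-T)."""
        if self.t_units == 0:
            raise ValueError("DC-T is undefined without at least one T-unit")
        return self.dependent_clauses / self.t_units


# Extract (3): two coordinated independent clauses, no subordinate clause.
extract_3 = UtteranceCounts(t_units=2, dependent_clauses=0)
# Extract (4): one main clause plus one 'because' clause.
extract_4 = UtteranceCounts(t_units=1, dependent_clauses=1)

print(extract_3.dc_per_t())  # 0.0
print(extract_4.dc_per_t())  # 1.0
```

Under these assumed counts, the coordinated utterance contributes nothing to the subordination measure, which is how a learner can produce long stretches of narration while still scoring low on DC-T.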
c Amount of coordination
Computational measures for coordination were based on the coordinate phrase per T-unit ratio (CP-T). The normality tests revealed a non-normal pattern for all groups (Appendix 3); therefore, non-parametric Kruskal–Wallis tests were conducted.
Unlike the previous two measures, the Kruskal–Wallis test (Table 7) revealed no significant difference between groups (p = .136), so no post-hoc test was computed. Nevertheless, the box plot in Figure 5 provides a visual representation of how values for the coordinate phrase per T-unit ratio were distributed across the four groups under study.
Results of the Kruskal–Wallis test regarding amount of coordination.
Notes. EFL = English as a foreign language.

Box plot showing distribution for amount of coordination.
V Discussion
The primary aim of the present study was to analyse the effect of different levels of additional CLIL exposure on YLs’ oral grammatical complexity using ratings and computational analyses while controlling for SES and extramural exposure. The study also intended to compare the grammatical complexity of younger high-exposure CLIL learners with that of the other groups. The findings are twofold. First, the results of the ratings indicate that only the high-exposure CLIL groups (i.e. both age groups) obtained significantly higher values than the EFL-only group. The low-exposure CLIL group, by contrast, showed no significant advantage over the EFL-only group. Regarding age, the analysis of the distribution suggests that the age factor might have played a role in the high-exposure CLIL group when compared with the one-year younger high-exposure CLIL group, since a greater proportion of the older high-exposure CLIL learners appeared to be clustered in the highest quartile. Second, the results from the computational analyses revealed that the high-exposure CLIL group (older group only) produced significantly more complex oral grammar than the EFL-only group in terms of both overall sentence complexity and amount of subordination. No significant differences between groups were found for amount of coordination, although the box plot analysis showed a more positive trend for the high-exposure CLIL group in relation to the other groups. The fact that there was a significant CLIL effect for overall sentence complexity and amount of subordination, but not for amount of coordination, might be explained by the discoursal features elicited by a picture-based storytelling task. In other words, learners might have naturally gravitated towards narrating the sequence of events in the story using syntactically equivalent units joined by coordinating conjunctions, irrespective of their ability to produce more grammatically complex sentences in terms of overall complexity or subordination.
Consequently, regarding our main research question, our results reveal a consistently significant advantage for the high-exposure CLIL group over the EFL-only group in oral grammatical complexity, as both the ratings and two of the three computational measures provided statistically significant evidence pointing in that direction. This finding suggests that the additional CLIL exposure needed for specific sizeable language gains to take place with YLs might differ greatly from the 300 h threshold estimated for secondary-school-or-older populations (Muñoz, 2015). Our results support the view that grammatical complexity, unlike grammatical accuracy (Fernández Pena & Gallardo del Puerto, 2021; Vraciu, 2020), does seem to benefit significantly from high-CLIL exposure with YLs. Our findings thus dovetail with those by Martínez-Adrián and Nieva-Marroquín (2023) yet refine them by isolating the CLIL exposure variable from the age variable, since the high-exposure, low-exposure and EFL-only groups were matched for age, SES and extramural exposure. Our results also question the capacity of low-exposure CLIL to yield a sizeable oral grammatical complexity advantage over EFL-only instruction. This might be explained by hypothesizing that the 707 h differential in CLIL exposure is not sufficient for YLs at this age to take full advantage of implicit learning, particularly in terms of enhancing oral grammatical complexity.
Concerning our secondary research question, the ratings indicated (1) significantly higher values for the younger high-exposure CLIL group over the EFL-only group, and (2) non-significant differences between the younger high-exposure CLIL group and the high-exposure CLIL group. These findings nuance the notion that YLs might not benefit from CLIL exposure as much as older learners due to their more limited cognitive and linguistic skills (Dalton-Puffer, 2007; Lasagabaster, 2011; Llinares et al., 2012). However, despite the significant improvement in the ratings, the computational analyses revealed no statistically significant advantage for the younger high-exposure CLIL group over any of the other groups on any of the three measures under scrutiny, namely overall sentence complexity, amount of subordination, or amount of coordination. This discrepancy highlights the need for further studies combining diverse, complementary data analysis methods in order to shed light on this issue.
VI Limitations and conclusions
Even though the present study sheds light on the CLIL exposure issue, it is not without limitations. First, it should be acknowledged that the sample in the present study was limited because of the comparatively large number of participants who had to be excluded due to their higher extramural exposure or attendance at extracurricular FL lessons. Second, it is difficult to determine whether the data obtained are the result of additional CLIL exposure, or simply additional exposure to the FL, since there is an absence of EFL-only programs providing an equivalent exposure time for comparison with CLIL programs (Ruiz de Zarobe, 2010). Third, it would have been interesting to include a younger low-exposure CLIL group in the sample. However, since the age-matched low-exposure CLIL group did not show any significant gains over the EFL-only group, it seems even less likely that younger low-exposure CLIL learners would show such advantages. Fourth, the results were obtained with one particular task, and evidence has shown that even seemingly similar tasks with different prompts might significantly affect learners’ oral performance in other aspects, such as lexis or fluency (De Jong & Vercellotti, 2016). Although the task was chosen because it is content-neutral, future studies could explore whether a task based on any of the CLIL subjects might yield different results in this respect. Finally, it is important to acknowledge that the present study did not examine syntactic complexity at the more advanced level of phrasal complexity, which would have provided a more comprehensive understanding of syntactic development.
In spite of these limitations, the present study broadens our understanding of the effect that different CLIL exposure levels have on YLs’ oral grammatical complexity. It does so by controlling for more variables, namely SES, extramural exposure, and the CLIL selection bias. Our results show that the high-exposure group produced significantly more complex oral grammar than the age-matched EFL-only group both in the ratings and on two of the three computational measures analysed. The results of the raters’ assessment also suggest that even the younger high-exposure group produced significantly more complex oral grammar than the EFL-only group. By contrast, the 707 additional CLIL hours of the low-exposure group yielded no significant advantage over the EFL-only group in any of the measures, suggesting that a much higher amount of exposure (in excess of 2,000 h in the case of these learners) might be required to provide oral grammatical complexity gains with age-matched YLs. At a theoretical level, this finding would question the validity of the CLIL threshold suggested by Muñoz (2015) with YLs, and at the same time it would refine Housen’s (2012) claim that CLIL exposure might not be enough for YLs to show significant language gains, since it would limit this non-significant effect to low-exposure CLIL. At the levels of language policy and planning, the findings in the study redefine expectations associated with the general ‘CLIL’ label (Dalton-Puffer et al., 2022) and draw a clear-cut distinction between high-exposure CLIL and low-exposure CLIL with respect to conventional EFL instruction.
The consistently significant advantage of the high-exposure CLIL group over the EFL-only group across the ratings and two of the three measures in the computational analyses, namely overall sentence complexity and amount of subordination, should inform FL policymakers about the affordances of high-CLIL exposure over EFL-only exposure in terms of oral grammatical complexity with YLs. Conversely, the lack of a significant advantage for the low-exposure CLIL group over the EFL-only group in any of the measures analysed provides FL decision-makers with empirical evidence questioning the potential of this type of instruction to foster oral grammatical complexity with children at this age. These findings could be further explored by regional and local administrations conducting diagnostic assessment across schools with varying levels of CLIL intensity. At a research method level, our findings revealed a consistent pattern across the ratings and computational analyses vis-à-vis (1) a significant advantage of the high-exposure CLIL group over the EFL-only group, and (2) a lack of a significant advantage of the low-exposure CLIL group over the EFL-only group. By contrast, differing results across methods were observed for the younger high-exposure CLIL group, where a statistically significant advantage over the EFL-only group emerged only in the ratings. These findings compel us to advocate for the integration of two distinct yet complementary research methods: combining rater assessments with computational analyses to enrich our understanding of the complex interaction between CLIL exposure and different aspects of FL proficiency.
Appendix
Results of the normality tests.
| Measure and test | EFL: statistic | EFL: p | Low exposure: statistic | Low exposure: p | High exposure: statistic | High exposure: p | Young high exposure: statistic | Young high exposure: p |
|---|---|---|---|---|---|---|---|---|
| Ratings: | | | | | | | | |
| Kolmogorov–Smirnov | 0.19 | .328 | 0.26 | .105 | 0.19 | .154 | 0.14 | .526 |
| Kolmogorov–Smirnov (Lilliefors corrected) | 0.19 | .029 | 0.26 | .001 | 0.19 | .003 | 0.14 | .119 |
| Shapiro–Wilk | 0.85 | .003 | 0.88 | .013 | 0.94 | .078 | 0.94 | .067 |
| Anderson–Darling | 1.19 | .004 | 1.25 | .003 | 0.9 | .022 | 0.68 | .075 |
| Complexity (CT-T): | | | | | | | | |
| Kolmogorov–Smirnov | 0.33 | .009 | 0.19 | .399 | 0.12 | .718 | 0.16 | .361 |
| Kolmogorov–Smirnov (Lilliefors corrected) | 0.33 | < .001 | 0.19 | .051 | 0.12 | .296 | 0.16 | .04 |
| Shapiro–Wilk | 0.75 | < .001 | 0.89 | .027 | 0.92 | .026 | 0.91 | .011 |
| Anderson–Darling | 2.53 | < .001 | 0.76 | .047 | 0.66 | .086 | 0.92 | .019 |
| Subordination (DC-T): | | | | | | | | |
| Kolmogorov–Smirnov | 0.31 | .018 | 0.18 | .438 | 0.11 | .776 | 0.22 | .07 |
| Kolmogorov–Smirnov (Lilliefors corrected) | 0.31 | < .001 | 0.18 | .067 | 0.11 | .377 | 0.22 | < .001 |
| Shapiro–Wilk | 0.71 | < .001 | 0.89 | .021 | 0.93 | .046 | 0.87 | .001 |
| Anderson–Darling | 2.67 | < .001 | 0.85 | .028 | 0.56 | .147 | 1.35 | .002 |
| Coordination (CP-T): | | | | | | | | |
| Kolmogorov–Smirnov | 0.18 | .398 | 0.17 | .527 | 0.21 | .113 | 0.17 | .263 |
| Kolmogorov–Smirnov (Lilliefors corrected) | 0.18 | .051 | 0.17 | .117 | 0.21 | .001 | 0.17 | .016 |
| Shapiro–Wilk | 0.83 | .001 | 0.91 | .046 | 0.67 | < .001 | 0.89 | .004 |
| Anderson–Darling | 1.13 | .006 | 0.71 | .065 | 2.68 | < .001 | 1.15 | .005 |
Acknowledgements
We thank all participants, their parents, and schools for their participation. We are grateful to the Spanish Ministry for Science and Innovation, and the Public University of Navarra, for their financial support. We are also grateful to the editors and reviewers for their valuable insight and feedback, and to José Antonio Moler Cuiral for his guidance with the statistical procedures.
Funding
The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Spanish Ministry for Science and Innovation (grant number PID2020-113990GB-I00), and by the Public University of Navarra (grant numbers PJUPNA05-2022; PJUPNA2023-11401).
Ethical approval and informed consent statements
The study complies with the treatment of personal information policy at the Public University of Navarra and has the approval of its Ethical Committee. Authorized consents were granted by the parents or legal guardians of all participants.
