Abstract
Although multiple factors influence language proficiency in instructed settings, the prevalence of content and language integrated learning (CLIL) research in recent decades has placed intensity of exposure (via CLIL lessons) at center stage, sidelining other variables. This study aims to rectify this by examining the impact of CLIL alongside three additional factors: extramural English (EE), socioeconomic status (SES), and non-verbal intelligence (NVI). Specifically, this study analyses the interplay of these variables in the proficiency of 171 young English learners (aged 10–11 years) in Navarre, Spain. The participants were divided into a low-intensity (LI) (n = 54) group and a high-intensity (HI) (n = 117) group depending on exposure to English in school. Results indicate that HI learners are superior in reading, and even more clearly in speaking. EE is very frequent in both groups but more abundant among HI learners, and it shows several positive associations with learners’ scores. Higher NVI levels positively correlate with all skills in both groups, except for speaking, which appears to be affected by EE and, to a lesser extent, by SES.
I Introduction
In the past decades, primary schools have witnessed vital changes in the way foreign languages are implemented. The foreign language, mainly English (Ushioda, 2017), has gained ground in terms of time and timing, that is, in terms of how early and how intensely it is introduced (Muñoz, 2015; Swain, 1981). These days, many primary schools worldwide offer content and language integrated learning (CLIL) sessions in addition to foreign language (FL) instruction and, in many cases, as early as the first years of school (Lorenzo et al., 2021; Muñoz, 2015). The potential that this new linguistic landscape offers to deepen our understanding of language acquisition – particularly the significance of intensity in instructional settings – has also spawned a large number of empirical studies. This research has mostly been conducted in secondary schools, where a number of advantages have been reported for CLIL cohorts (Muñoz, 2015; Pérez Cañado, 2012). Beyond academia, CLIL is also popular and has often become a source of social and political contention (Pérez Cañado, 2021). In Spain, for example, surveys and heated debates highlighting the advantages and downsides of the implementation of foreign languages in schools are common in the media, and so are political decisions increasing or decreasing the amounts and types of L2 instruction.
Focusing on the studies measuring L2 achievements, those conducted in secondary schools agree in establishing a correlation between higher exposure and higher levels of proficiency (e.g. Merino & Lasagabaster, 2018), although not without controversy. In fact, there have been warnings about sample biases in favor of CLIL groups (Bruton, 2013) and about the lack of control of some mediating variables (socioeconomic status, extramural English, intelligence, L2 motivation, etc.), which might be as important as (if not outweigh) CLIL itself (Pérez Cañado, 2018; Soto-Corominas, 2024). However, as the focus of most research has been placed on CLIL, all other variables have generally played a tangential role, with authors mainly referring to them when trying to explain and interpret their results, either in the limitation sections or, frequently, as elements to turn to in order to explain unexpected results. In addition, CLIL learners are usually treated as a single group and benefits are attributed to ‘CLILers’ as if they formed a homogeneous group (vs. ‘non-CLILers’). Yet, in reality, the intensity of CLIL exposure varies a great deal from context to context, that is, from study to study (Lázaro-Ibarrola & Azpilicueta-Martínez, 2024). For instance, in some studies learners only receive one (Pladevall-Ballester, 2019) or two (Artieda et al., 2020) hours of CLIL classes per week on top of the English as a foreign language (EFL) classes, while in other studies CLIL learners receive five (Pérez Cañado, 2018), eight (Merino & Lasagabaster, 2018) or even 13 hours (Somers & Llinares, 2021) of extra exposure to English. Finally, in primary school, research remains rather scarce and the few available studies offer conflicting results.
Some authors have found advantages for young learners exposed to greater intensity via CLIL lessons (Jiménez Catalán & Ruiz de Zarobe, 2009), while some studies have reported no differences between CLIL and non-CLIL cohorts (Agustín-Llach & Canga Alonso, 2016; Gálvez Gómez, 2021; Housen, 2012; Soto-Corominas et al., 2024), or have highlighted the fact that, when there are advantages, these are small and restricted to specific aspects (Nieto Moreno de Diezmas, 2016; Stotz & Meuter, 2003).
In sum, more research with primary school learners is needed and CLIL intensity needs to be considered. Besides, just as telling as the findings of CLIL research themselves is the fact that researchers frequently refer to multiple variables that can interact with the role of intensity via CLIL lessons. This reveals the need to control as many variables as possible in order to move CLIL research forward. In this study, we will focus on some variables that have been brought forward as relevant in second language acquisition (SLA) research (Ellis, 1994; Skehan, 1991). Specifically, we will focus on socioeconomic status (SES), non-verbal intelligence (NVI), and extramural English (EE). Thus, the present study aims to examine the interplay of these three factors and CLIL intensity on the proficiency of primary school students learning English as a foreign language. With our study, we hope to shed more light on the effectiveness of linguistic programs in schools and, ultimately, on the intricacies of SLA in childhood. We also hope to do our bit to embrace the ‘SLA for all’ movements (Andringa & Godfroid, 2020a, 2020b). Our study addresses the age gap and the social gap, as our participants are children who attend state schools in working-class neighborhoods and participate in both CLIL and non-CLIL programs from the start of their education, without any pre-selection based on social or intellectual superiority.
The theoretical background is divided into two sections. The first one highlights the shortcomings in CLIL research, and the second one deals with the mediating variables other than CLIL that are under scrutiny in this article.
II Theoretical background
1 Shortcomings in CLIL research
With the proliferation of CLIL studies, there is consensus among researchers that the future research agenda on CLIL should try to address the flaws of previous studies. This article constitutes an attempt to address three of them: (1) the need to address younger populations (Lázaro-Ibarrola, 2023); (2) the need to consider a series of variables whose effects have been neglected (Pérez Cañado, 2012; Pérez Cañado, 2016); and (3) the need to consider different degrees of intensity, given that the amount of (weekly and accumulated) exposure to English lessons varies a great deal from context to context. Given that the effects of CLIL implementation can vary significantly across countries due to contextual differences (Sylvén, 2013), our discussion of CLIL outcomes mainly concentrates on studies carried out with young learners and in Spain, which is the setting for the present research and also the most prolific setting in the literature. However, as some recent reviews highlight, there are CLIL studies in many other parts of Europe (Goris et al., 2019) and Latin America (Ruiz de Zarobe & Banegas, 2024; Rumlich, 2020).
a CLIL in primary school
CLIL research has greatly contributed to our understanding of the impact of intensity in school contexts. In secondary school, most studies suggest that learners following CLIL strands obtain higher levels of proficiency than their non-CLIL peers (see Dalton-Puffer et al., 2010; Muñoz, 2015; Pérez Cañado, 2012; Pérez Cañado, 2021; Ruiz de Zarobe & Jiménez Catalán, 2009). Nevertheless, to date, there is little evidence to support the advantages of CLIL for elementary school students (Soto-Corominas et al., 2024).
For instance, Housen (2012) compared, among other groups, 24 Italian learners of English (grades 3–4) in an EFL context with 72 Italian learners of English in an elitist CLIL context (European Schools). His results showed very similar outcomes in terms of rate and outcome of L2 learning. Housen (2012) concluded that during the first 3 years of primary education, the extra exposure was not enough for students to benefit from it (although an advantage in lexical development was identified). In another study with 223 learners in Jaén (Southern Spain), Gálvez Gómez (2021) found very clear differences when comparing CLIL and non-CLIL students in secondary education but no differences between cohorts regarding the oral production of primary education learners (year 6). In a similar vein, also in monolingual Spain, Agustín-Llach and Canga Alonso (2016) compared 58 CLIL and 49 non-CLIL learners and found similar receptive vocabulary sizes in CLIL and non-CLIL groups in primary school. Pladevall-Ballester and Vallbona (2016) examined the receptive skills of 287 primary school students in Catalonia and found that a group of CLIL learners had no advantages in reading and were, in fact, inferior to EFL learners in their listening skills. With slightly more optimistic results, in a study conducted with Spanish primary school students in both CLIL and non-CLIL strands, Nieto Moreno de Diezmas (2016) examined the proficiency of over 19,000 learners in Castilla La Mancha (Spain) and discovered benefits for CLIL groups in speaking and interacting. Stotz and Meuter (2003) analysed data from CLIL and non-CLIL learners in Zurich and found that oral production was the same for both groups, but CLIL learners were better at listening.
In sum, the few available studies report few or no advantages, and when advantages are reported, they are small and restricted to specific aspects, which vary from study to study. In light of these findings, some authors have suggested that the introduction of CLIL could take place when learners are a bit older (in middle school), which would help optimize resources (Lorenzo et al., 2010).
b Intensity
Another gap in CLIL research is the need to account for the differences in amounts of exposure not only between non-CLIL and CLIL learners but also between CLIL learners exposed to different quantities of CLIL lessons. Merino and Lasagabaster (2018) tested a group of 285 adolescents divided into three intensity groups twice (when they were 11–12 and when they were 12–13). The results showed that the learners following a high-intensity track were superior in terms of proficiency while there were no differences between low-intensity CLIL learners and non-CLIL learners. Likewise, Ruiz de Zarobe (2008, 2010) analysed data from 89 secondary school learners with different intensities of exposure to CLIL, and also from a non-CLIL group of learners with only EFL lessons. Her findings showed that the high-intensity group scored higher in all the aspects under analysis, suggesting that greater exposure via CLIL enhances learners’ performance. However, the author also explained that while some learners in the CLIL group received extra lessons, the learners included in the non-CLIL group did not.
c Control of variables
CLIL research is necessarily conducted in existing programs, which is extremely valuable to informing real teaching practices (Collins et al., 1999). However, many of the factors that influence second language learning are hard or impossible to control. Thus, studies do not always provide information on key variables that mediate the impact of CLIL (Dalton-Puffer et al., 2010) and very few studies have included experimental and control groups, and when they have, cohort matching has not usually been present (see Pérez Cañado, 2012; Pérez Cañado & Lancaster, 2017). Besides, there have been serious concerns regarding a certain selection of students in CLIL cohorts, which has probably caused an overestimation of the program’s impact (Bruton, 2013; Mearns et al., 2020). The flaws of CLIL studies also become apparent when the authors make reference to the possible influence of different variables while discussing their results. For instance, Admiraal et al. (2006) and Stotz and Meuter (2003) admit that there was no initial matching of their cohorts, which calls into question the attribution of gains to CLIL. Similarly, Prieto-Arranz et al. (2015) are cautious when presenting the gains of their CLIL learners in reading because in two of the tests, the groups were significantly different at the outset. Lasagabaster (2008) reported gains in all tests for his CLIL cohorts but pointed out that students choosing CLIL programs were probably more academically gifted. In a similar vein, Alonso et al. (2008) in their comparison of CLIL and non-CLIL groups in secondary school explain that the CLIL schools only selected students who demonstrated high linguistic competence in English and Basque. In addition, their CLIL learners had started to learn English earlier and performed more out-of-school activities in this language. Ruiz de Zarobe (2008) mentions that the non-CLIL group did not receive any extra English classes outside school, which was not the case for the CLIL groups. 
Likewise, Olaizola and Mayo (2009) report that 78% of the students in the CLIL group attended extracurricular lessons, while none did in the non-CLIL group. Hüttner and Rieder-Bünemann (2010) concluded that their CLIL students performed better on both micro- and macro-level features of oral narratives but also highlighted the fact that students in the CLIL group could have been more motivated. In primary school, Pérez Cañado (2018) conducted a large-scale longitudinal study across various communities in Spain and found that some of the differences between CLIL and non-CLIL learners waned when considering other variables, such as school type (charter vs. public strands). This is also the case in the study by Soto-Corominas et al. (2024). These authors – moved by the same motivation as the present article, that is, to discern the effect of CLIL by considering other variables among young learners – analysed data from children at the onset of primary and did not find evidence that the greater intensity provided by CLIL resulted in any particular advantages once other variables were accounted for. The authors concluded that their results did not provide support for the implementation of CLIL at such early ages.
In conclusion, despite the significant advancement in our understanding of intensity brought about by CLIL research, we still know very little about its impact in primary school and about the conjoint role of other factors. There is consensus that the research agenda on CLIL should remedy the flaws of previous studies by introducing stricter control of the mediating variables (Pérez Cañado, 2012; Pérez Cañado, 2016). Three of them – socioeconomic status, extramural English, and nonverbal intelligence – are the focus of the present study and will be the subject of our discussion in the sections that follow.
2 Mediating variables
a Socioeconomic status
SES is used in sociological research to identify differences related to social classes. In education surveys and reports, it is frequently used as a valid measurement and a relevant variable (Lorenzo et al., 2010). It encompasses a series of aspects such as family income, occupations, space in the house or material possessions.
As Rascón Moreno and Bretones Callejas (2018) point out, addressing socioeconomic status in CLIL research is a dire need because it is one of the factors to which the success of CLIL has been ascribed (Bruton, 2013). The few studies addressing SES in CLIL contexts have focused on measuring its correlation with FL achievements. These studies have found that SES differences do not have an impact (or not so much) on the L2 proficiency of CLIL learners. In contrast, SES has an impact in non-CLIL contexts, where learners from privileged backgrounds tend to obtain higher scores (Admiraal et al., 2006; Lorenzo et al., 2021; Rascón Moreno & Bretones Callejas, 2018). For instance, Lorenzo et al. (2021), in a very large-scale study, reported a certain egalitarian effect in CLIL programs, after finding that proficiency was unaffected by SES level among CLIL learners while a staircase pattern was identified in the performance of non-CLIL students, favoring those from higher social classes. Along the same lines, but focusing on content knowledge, Rascón Moreno and Bretones Callejas (2018) found that SES played a significant role in content knowledge in non-CLIL groups (in primary and secondary education) while it did not impact content knowledge among CLIL learners. With less optimistic results, Anghel et al. (2016) demonstrated that low SES levels correlated negatively with achievement in bilingual programs in primary school. In particular, students of parents without higher education displayed poorer knowledge of the content in the subjects taught through CLIL. However, the authors also speculated that when schools were better equipped and prepared to teach English, with teachers who had received adequate prior English training, the negative effects might be mitigated. Along the same lines, Fernández-Sanjurjo et al. (2018) reported that CLIL learners fared worse than non-CLIL learners in science, and low-SES learners fared worse in both contexts, but especially so in CLIL. 
In fact, the authors reported that low-SES CLIL learners did not meet the academic standards of the primary education curriculum.
b Non-verbal intelligence
The study of intelligence has generated innumerable disagreements, especially regarding its definition and how to measure it (Oller, 1981). Language has often been seen as the main product of human intelligence and as an artifact that enables intelligence to develop. Therefore, language and intelligence are understood as indissoluble (Chomsky, 1972; Oller, 1981). In fact, the concept of intelligence, despite controversies, has been classified into different components, which include, among others, verbal and non-verbal intelligence (NVI) (see Gardner, 2011). While the connection between verbal abilities and language acquisition seems a truism, the connection with NVI is less apparent.
NVI refers to thinking skills and problem-solving abilities that do not rely on verbal language production or comprehension. It aligns well with IQ scores and, although not present in CLIL studies, some researchers have placed this type of intelligence among the many individual variables that shape language learning aptitude. The clearest results come from a series of studies demonstrating a strong association between NVI and L2 morphology (Brooks & Kempe, 2013; Brooks et al., 2016; Kempe & Brooks, 2008). In terms of general proficiency, Genesee (1976) investigated young learners in grades 4, 7 and 11 following two language programs in Canada: French immersion and French as a second language. A sample of average, below average, and above average students regarding their IQs was selected in each program. The results showed that learners’ performance on the reading and language usage tests correlated positively with IQ level. In contrast, no correlation was found in the performance on the tests of listening comprehension and interpersonal communication skills. This was true for all students, regardless of grade level and language program. In a later study, Genesee and Hamayan (1980) also reported a correlation between NVI and language development but showed that it was less strong with younger learners. In line with these findings, a study by Lasagabaster (2001) demonstrated that NVI played a crucial role in the language skills of 14-year-old children learning English as a foreign language, although this was less so for 11-year-olds, as among these younger students, creativity exerted a greater influence. In CLIL studies, NVI is not present either as a measure to control cohort matching or as a variable to focus on.
c Extramural English
Extramural English (EE) is the term used to refer to the English that students encounter outside of the classroom, for example, when students watch TV, play video games, read comics, or listen to music in English. Although mostly neglected in the literature, EE is common and on the rise in Europe and beyond (Schurz & Sundqvist, 2022). Some empirical studies have shown that it has an impact on learners’ proficiency (Kuppens, 2010; Lindgren & Muñoz, 2013; Peters, 2018; Soto-Corominas et al., 2024) and that some children can even acquire a certain level of English proficiency via EE before they receive English lessons (De Wilde et al., 2022; Muñoz et al., 2018; Puimège & Peters, 2019). Also, at a very young age, EE can have greater relevance than exposure to the language in the classroom (Soto-Corominas et al., 2024). In the context of Sweden, Sundqvist (2009) showed an association between oral proficiency and vocabulary, and EE among adolescents (ages 15–16 years). In a follow-up study, Sylvén and Sundqvist (2015) also demonstrated a mutual positive influence between EE and English proficiency in younger children (ages 10–11 years). Olsson and Sylvén (2015) did not find an impact of EE on the vocabulary growth of secondary school students (ages 16–19 years) but reported that CLIL students spent more time on EE than their non-CLIL peers. In a large-scale study conducted in several European contexts, Lindgren and Muñoz (2013) found that EE, especially watching TV, had a positive impact on the listening and reading outcomes of young learners (aged 10–11 years). Likewise, Muñoz et al. (2018) demonstrated that exposure to films in English positively correlated with listening comprehension among Spanish/Catalan and Danish children learning English (at the age of 9 years). In Belgium, Peters et al. (2019) focused on Flemish learners of English and French. 
Their results showed that, despite fewer years of English instruction, learners’ vocabulary knowledge in English was considerably larger and suggested that the reason for this could be the learners’ large amounts of out-of-school exposure to English, mainly via online activities. Yet, the authors did not consider linguistic distance in their study, and Flemish vocabulary is much closer to English than to French. In another study with Flemish participants, Puimège and Peters (2019) examined vocabulary knowledge among 560 participants (aged 10–12 years) who had not yet received any formal English lessons. Their findings showed that extracurricular exposure to English resulted in significant vocabulary acquisition. The study also found that the most influential factor in predicting word knowledge was the presence of cognates (words with similar forms and meanings in Flemish and English).
III Methodology
1 Participants
The original pool of participants consisted of 224 learners belonging to five schools. All the schools were state-funded and located in working-class areas in the city of Pamplona (Navarre) in Northern Spain. Navarre has two official languages: the majority language, Spanish, and the regional language, Basque. Around 30% of state schools offer the possibility to learn in Basque (Basque immersion schools), and the rest have Spanish as the main language. In all cases, English is taught as a foreign language with different degrees of intensity. Regarding the schools involved in our study, two of them were Basque-immersion schools, and three of them were Spanish schools. The choice of school in the area is generally made by the parents based on location and preferred main language of instruction (Spanish or Basque).
To ensure comparability of the groups, a questionnaire was administered to all the learners prior to the data collection (see in supplemental materials online). Only participants who complied with the following requirements were included in the sample: (1) the learners had started to learn English in school from the first year of primary; (2) they did not use English at home with parents, siblings, or friends; and (3) they had not stayed in an English-speaking country for more than two weeks.
In another questionnaire, we also gathered information about learners’ participation in English lessons after school (see in supplemental materials online). These extracurricular lessons are a common option in some contexts, like Spain, and normally take place either in private language schools or via private tutors. Students who were not receiving extra lessons in the present year and had not received extra lessons in the past three years were classified as learners without extra lessons. Students who were receiving lessons in the present year or had received them at least in the past year were classified as learners with extra lessons. All students who did not fit precisely into one of these categories were excluded from our study. However, we could not control the quantity or quality of the extra lessons.
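The classification rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the two input fields, and the encoding of "never attended" as `None` are hypothetical and are not taken from the questionnaire itself, and the reading of "in the past year" as "at most one year ago" is our interpretation.

```python
from typing import Optional


def classify_extra_lessons(attending_now: bool,
                           years_since_last: Optional[float]) -> str:
    """Return 'with', 'without', or 'excluded' for one learner.

    attending_now: learner receives extra lessons in the present year.
    years_since_last: years since the learner last received extra lessons;
        None means the learner never received any (hypothetical encoding).
    """
    # With extra lessons: attending now, or attended within the past year.
    if attending_now or (years_since_last is not None and years_since_last <= 1):
        return "with"
    # Without extra lessons: none now and none in the past three years.
    if years_since_last is None or years_since_last > 3:
        return "without"
    # Anything in between fits neither category and is dropped.
    return "excluded"
```

Under this reading, a learner who stopped attending extra lessons two years ago would fall into neither category and would be excluded from the sample.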
Our final sample consisted of 171 learners. In terms of intensity of exposure to English in school, the participants were divided into two clearly distinguishable groups (Table 1). The HI group comprised 117 students (68 boys and 49 girls). All these learners were enrolled in the Spanish schools and had received five English lessons and between five and seven CLIL lessons per week from the beginning of schooling. The CLIL lessons included Arts and Crafts, Science, and Social Sciences. The LI group comprised 54 students (30 boys and 24 girls). All of them were enrolled in the Basque-immersion schools and had received either five English EFL lessons or five EFL lessons and two CLIL lessons per week in Arts and Crafts, also from the beginning of schooling. As for extra lessons, slightly over 40% of the learners in each group received them after school: 53 learners in the HI group (45.29%) and 24 in the LI group (44.44%).
Participants and amount of exposure in school.
Notes. CLIL = content and language integrated learning. EFL = English as a foreign language. LI = low-intensity.
2 Questionnaires and tests
All the participants completed an SES and an EE questionnaire, an NVI test, and four linguistic tests.
a SES Questionnaire
The participants completed an official socioeconomic status (SES) questionnaire used by the regional Government (see in supplemental materials online). This test, with a maximum score of 30 points, included information on parental education, occupation and income, space in the house, and access to resources (books, culture, and technology). The students took the questionnaires home on paper before completing them online in the school to make sure they knew how to complete the information.
b Nonverbal intelligence test
The Raven’s Progressive Matrices (RPM) Test (version for children 5–12) was employed to measure NVI. All the participants in our sample completed this test and their answers complied with the test validation criteria. These validation criteria are very strict and state that only tests that show consistency across sets when compared to norms (with a maximum variation of +2/−2) and in which learners do not score higher on a later set than on an earlier one (reversals) are valid.
c EE questionnaire
In the EE questionnaire, the learners were asked about the type of activities they performed in English outside school and about the amount of time they devoted to them every week. The activities included are the following: reading, listening to music, playing video games, watching series, films or programs, surfing the internet, and performing other activities (such as sports or music) in English. The questionnaire mirrored questionnaires employed in previous research (Sundqvist & Sylvén, 2016); however, we excluded the question ‘Do you speak English with family or friends?’ The reason for this was that we did not include learners who spoke English with family or friends in our pool of participants. The questionnaire consisted of six items, and the value for Cronbach’s alpha showed acceptable internal consistency reliability for the survey (α = .738).
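For readers unfamiliar with the reliability statistic reported above, the following minimal sketch shows how Cronbach's alpha is computed from a respondents-by-items matrix, using the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of total scores). The function name and the data layout are illustrative assumptions; this is not the script used in the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents.

    rows: one list of item scores per respondent, all in the same item
    order (hypothetical layout; e.g. six EE items per learner).
    """
    k = len(rows[0])                      # number of items
    items = list(zip(*rows))              # transpose: one tuple per item
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)
```

Values closer to 1 indicate that the items vary together across respondents, which is what the reported α = .738 is taken to show for the six EE items.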
d Proficiency tests
To measure proficiency in the four skills, the students completed four proficiency tests. For listening and reading comprehension, a Cambridge Flyers Test (A2) was employed. The test was administered online. For speaking and writing, the students had to narrate a story following six pictures. The pictures (Heaton, 1996) were chosen after discussion with the school teachers (see Appendix A). The oral production of the students was audio and video recorded. The compositions were written by hand. The oral recordings and the written narratives were rated by one independent rater using holistic rubrics (see in supplemental materials online). To assess the speaking test, we employed the rubric used by Merino and Lasagabaster (2018). To assess the compositions, the Cambridge writing assessment subscales for A2 Key for Schools were employed.
3 Ethical issues
The project under which this study was conducted obtained institutional approval and the research was carried out following the institution’s research protocol. All the tests were supervised and approved by the school teachers and administered with their help. The researchers had personal meetings to explain to the schoolteachers how to administer the tests. They also provided them with some basic written instructions about the procedure. Authorized consents were signed by the parents or legal guardians of the participants before data collection. As for the children themselves, the teachers explained the objectives of the research to them, and they were allowed to decide whether they wanted to participate or not. All the children decided to participate.
4 Statistics and ratings
As for statistical analysis, Kolmogorov–Smirnov (K-S) tests for normality were conducted. As a non-normal distribution of the population (p < 0.01) was found, the data analysis between groups for each of the skills was performed using Mann–Whitney U tests. Significance was set at p ⩽ 0.05. Spearman’s correlation coefficients were used to assess the strength and direction of the association between the scores obtained in each of the linguistic tests and the following variables: NVI, SES, and EE. When significant correlations were found using Spearman’s test, regression analyses were conducted to further investigate the relationships and, specifically, to understand how much of the variability in the linguistic test scores can be explained by the independent variables (NVI, SES, EE). To interpret the magnitude of effect sizes, we followed the alternate field-specific scale proposed by Plonsky and Oswald (2014) for L2 research: d values around .40 are small, around .70 medium, and around 1.00 large; for correlations, rho (ρ) coefficients close to .25 are small, close to .40 medium, and close to .60 large.
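The rank-based statistics named above can be illustrated with a short, dependency-free Python sketch. It computes the Mann–Whitney U statistic and Spearman's ρ from average ranks and maps an effect size to its nearest benchmark. All function names are hypothetical, p-values are deliberately omitted, and the "nearest benchmark" rule is only one possible reading of "around" and "close to"; it is an assumption, not the procedure followed in the study.

```python
from statistics import mean

# Benchmarks quoted in the text (Plonsky & Oswald, 2014).
D_BENCHMARKS = {"small": 0.40, "medium": 0.70, "large": 1.00}
RHO_BENCHMARKS = {"small": 0.25, "medium": 0.40, "large": 0.60}


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # mean of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """U statistic for sample a: its rank sum minus n_a(n_a + 1)/2."""
    ranks = average_ranks(list(a) + list(b))
    return sum(ranks[:len(a)]) - len(a) * (len(a) + 1) / 2


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((p - mx) * (q - my) for p, q in zip(rx, ry))
    sx = sum((p - mx) ** 2 for p in rx) ** 0.5
    sy = sum((q - my) ** 2 for q in ry) ** 0.5
    return cov / (sx * sy)


def nearest_benchmark(value, benchmarks):
    """Label an effect size by the closest benchmark (assumed rule)."""
    return min(benchmarks, key=lambda name: abs(abs(value) - benchmarks[name]))
```

Because the scale only gives reference points, values that fall between two benchmarks can reasonably be labelled either way, so a different rule (for example, treating the benchmarks as lower bounds) would yield different labels for some coefficients.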
IV Results
As Table 2 and Figure 1 show, the HI group receives higher ratings across the board. In two skills, listening (p = .317) and writing (p = .162), the difference is not statistically significant. In reading (p = .019), the mean score (p = .004), and especially in speaking (p = .001), the difference between groups is significant, and effect sizes range from small to medium, confirming the superiority of scores among HI learners.
Results for linguistic tests.

Linguistic scores in both groups.
As for SES (see Table 3 and boxplots in Figure 2), the two groups had very similar levels (p = .334), although intragroup variability was slightly greater in the HI group, which can be partly explained by the larger number of learners. In terms of correlations, no association between scores and SES was found in the LI group. In the HI group (Table 4), on the other hand, SES presented associations with all the skills. There was a moderate correlation with the speaking (ρ = .321) and the mean score (ρ = .317), and a weak correlation with the writing (ρ = .221), listening (ρ = .187), and reading scores (ρ = .228). All of these correlations were positive, that is, the higher the SES, the higher the scores.
Table 3. Socioeconomic status (SES) scores (0–20).

Figure 2. Boxplot for socioeconomic status (SES) scores.
Table 4. Correlations between socioeconomic status (SES) and learners’ scores.
When examining NVI, both groups obtained similar scores (Figure 3), and the statistical analyses confirmed no significant differences between them (Table 5). It is also noteworthy that there was considerable intragroup variation in both groups (standard deviations around 9), suggesting that the learners display quite different levels of NVI.

Figure 3. Boxplots for non-verbal intelligence (NVI).
Table 5. Non-verbal intelligence (NVI) scores (0–60).
In terms of associations, positive correlations between NVI and the scores were found in both groups (Table 6). In the HI group, the positive associations were small for listening (ρ = .192) and reading (ρ = .228) and moderate for writing (ρ = .289) and the mean score (ρ = .262); that is, in this group, the speaking score was the only one that did not seem to be affected by NVI. In the LI group, the positive associations were all weak and appeared in connection with listening (ρ = .273), reading (ρ = .341), and the mean score (ρ = .335).
Table 6. Correlations between non-verbal intelligence (NVI) and linguistic scores in high-intensity group.
As for EE (Table 7, Figure 4), the first observation to make is that it was frequent in our database: most learners engaged in some activities in English outside school. Only seven learners (five in the HI group and two in the LI group) did not report engaging in any activity involving English outside school. Overall, the HI group devoted more time to informal activities in English than the LI group (8.4 hours per week vs. 6.7 hours); in fact, these learners devoted more time to all the activities, except for listening to music. This difference was significant (p = .046), with a medium effect size (d = .622). In particular, HI learners devoted a significantly larger amount of time to reading comics and books (1.1 hours per week vs. 0.5 hours) (p = .001; d = .846); to performing other activities, such as sports (1.2 hours vs. 0.5 hours) (p = .002; d = .735); and to playing video games (1.5 hours vs. 1 hour) (p = .037; d = .621). On the other hand, listening to music was the most popular activity in both groups (2.4 hours in the HI group and 2.6 hours in the LI group), and the time devoted to this activity, to watching TV, and to using the internet did not present significant differences between groups.
Table 7. Distribution of extramural English (EE): Mean number of weekly hours in the two groups.

Figure 4. Distribution of extramural English (EE) activities.
In the HI group (Table 8), reading books or comics in English (ρ = .263), listening to music (ρ = .246) and, especially, watching TV (ρ = .323) showed a positive association with the oral and mean scores. In the LI group (Table 9), there was a weak association between using the internet and the reading score (ρ = .322).
Table 8. Correlations between extramural English (EE) and linguistic scores in the high-intensity group.
Table 9. Correlations between extramural English (EE) and linguistic scores in the low-intensity group.
Finally, given that associations of SES and NVI with three of the skills (writing, listening, and reading) and with the mean score were found in the HI group, a multiple regression analysis was conducted. The ANOVAs confirmed the significance of these two factors for the three skills and the mean score (p = .001 for writing, listening, and mean; p = .002 for reading). The R-squared values indicated that 15.8% of variance in the mean scores, 14.5% in the writing scores, 10.1% in the reading scores, and 6.6% in the listening scores could be accounted for by learners’ NVI and SES levels.
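To make the "variance accounted for" reading of R-squared concrete, the sketch below computes R² for a least-squares regression with two predictors from the three pairwise Pearson correlations, using the standard identity R² = (r_y1² + r_y2² − 2·r_y1·r_y2·r_12) / (1 − r_12²). It is only an illustration under stated assumptions: the function names are hypothetical, the data in the usage note are invented toy values rather than the study's data, and the original analysis was presumably run in a statistics package.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def r_squared_two_predictors(y, x1, x2):
    """R-squared of the OLS regression of y on x1 and x2 (with intercept).

    Assumes the two predictors are not perfectly collinear (|r_12| < 1).
    """
    r_y1 = pearson_r(y, x1)
    r_y2 = pearson_r(y, x2)
    r_12 = pearson_r(x1, x2)
    numerator = r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12
    return numerator / (1 - r_12 ** 2)


# Hypothetical toy data: the outcome is exactly the sum of the two predictors,
# so the two predictors together account for all of its variance.
toy_nvi = [1, 2, 3, 4, 5]
toy_ses = [2, 1, 4, 3, 6]
toy_score = [a + b for a, b in zip(toy_nvi, toy_ses)]
toy_r2 = r_squared_two_predictors(toy_score, toy_nvi, toy_ses)
```

In this toy case `toy_r2` equals 1 up to floating-point error. An R² of .158, as reported for the mean score, would instead mean that the two predictors jointly account for 15.8% of the variance, with the remainder attributable to other factors and to measurement error.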
V Discussion
In terms of scores, the HI group obtained higher scores in all skills and this superiority reached statistical significance in reading, in the mean score, and, especially, in speaking. Thus, these positive findings cast an optimistic light on the impact of HI CLIL programs in primary schools, where the evidence so far had not been strong enough to support the advantages of CLIL (Soto-Corominas et al., 2024). The superiority of the HI program also aligned well with the positive findings from secondary schools considering CLIL intensity and reporting advantages in high-intensity tracks (Merino & Lasagabaster, 2018; Ruiz de Zarobe, 2008, 2010). As for the skills, our findings contributed to the controversy as to which skills benefit from CLIL, as they coincided with some previous findings but clashed with others. For instance, they were in line with Nieto Moreno de Diezmas (2016), who also reported advantages in speaking, but contradicted the findings by Stotz and Meuter (2003) and Gálvez Gómez (2021), who found no differences in this skill. Regarding listening, our findings clashed with Stotz and Meuter (2003), as these authors found advantages that we did not. Regarding reading, they clashed with Pladevall-Ballester and Vallbona (2016), who found no advantages in this skill. However, in their study, CLIL learners had received only 54 hours of extra exposure, while in our study the difference in exposure between groups was much larger (over 700 hours accrued in five school years). Also, our findings coincided with Ruiz de Zarobe (2010) in finding no differences in the writing skill, which she attributed to the fact that, due to the learners’ young age, they had not reached control of this skill yet. This could also be the case in our learners. All these differences were probably due to the great variety of variables involved in each study, including research methodologies, hours of exposure, and learners’ profiles.
In any case, our findings contributed to demonstrating the effectiveness of CLIL in primary school, when the intensity of exposure is large and accrued over a long period of time.
As for SES, in the LI group, there were no associations between SES and the scores. By contrast, we found positive associations with all skills in the HI group: weak with writing, listening, and reading, and moderate with speaking (and with the mean). The identification of correlations between SES and scores generally agreed with previous claims (Lorenzo et al., 2021) and so did the fact that these associations were not very strong, given that several authors had reported that CLIL minimizes the influence of SES (Admiraal et al., 2006; Lorenzo et al., 2021; Rascón Moreno & Bretones Callejas, 2018). The fact that no associations were found in the LI group might be because there were fewer learners (n = 54) and intragroup variability was smaller. Therefore, the statistical analyses might not have been powerful enough to detect potential associations.
Our results also revealed that there was a wide range of NVI scores within each group, suggesting diverse cognitive abilities among the learners, and that higher levels of NVI were positively associated with learners’ scores in both groups. These positive associations were present in writing, listening, reading, and the mean score in the HI group; that is, the speaking score was the only one that did not seem affected by NVI (although, as reported above, it was affected by SES). In the LI group, positive associations were found with the listening, reading and mean scores. These associations reinforced the idea that language and intelligence were indissoluble (Chomsky, 1972; Oller, 1981), confirming the few studies that had demonstrated a positive link between NVI and the ability to learn some aspects of a second language (Brooks & Kempe, 2013; Brooks et al., 2016; Kempe & Brooks, 2008; Genesee, 1976). The strength of the associations (small and moderate) also partly reinforced the claim that the influence of NVI was probably not very strong among young learners (Genesee & Hamayan, 1980; Lasagabaster, 2001). Also, our data revealed no associations between NVI and speaking, which might be explained by the fact that this skill involves other types of abilities, such as creativity, phonetic abilities, or even a certain degree of self-confidence. Speaking, measured in terms of interpersonal communication, was also unaffected in the study conducted by Genesee (1976).
Our study also revealed that EE was very frequent among young learners and that, in line with previous research (Olsson & Sylvén, 2015; Sylvén & Sundqvist, 2015), HI learners spent more time engaged with EE than LI learners. Thus, HI learners read, performed other activities, and played video games in English more frequently. Listening to music was the most popular activity in our groups, while in Swedish studies the most popular activity was watching TV. Perhaps Swedish learners found watching TV in English more accessible than our participants, who resorted to listening to music, which was less demanding but also less conducive to learning. In the HI group, where EE was more intense, we also found that reading books or comics in English, listening to music and, especially, watching TV (coinciding with the studies by Lindgren and Muñoz, 2013; Muñoz et al., 2018) showed a positive association with the oral score and with the mean score.
In the LI group, by contrast, there was only a weak association between using the internet and the reading score. Using the internet in this group probably also entailed reading or, perhaps, using some English learning apps, which seemed to be becoming more and more popular. Future studies will need to find out more information about the nature of EE activities. For example, we need to know whether learners watch TV with captions, whether they listen to music and try to understand and/or repeat the words, whether they read in English with the help of dictionaries, and whether their use of the internet entails watching videos or mostly reading. In any case, our data seemed to suggest a weak but positive influence of EE activities, especially in the HI group. All this confirmed previous research reporting a positive mutual influence between EE and proficiency (De Wilde et al., 2022; Muñoz et al., 2018; Olsson & Sylvén, 2015; Sundqvist, 2009; Sundqvist & Sylvén, 2016; Sylvén, 2010; Sylvén & Sundqvist, 2015). Results for the LI group, on the other hand, remained inconclusive in this respect, probably, as we hypothesized in the case of SES, due to the smaller number of participants but, in this case, also due to the smaller amount of time these learners devoted to EE.
Finally, when we considered our findings regarding EE in connection with the findings regarding scores, we saw that in the HI group, EE (watching TV) mainly benefited the oral score, which happened to be the one in which these learners most clearly exceeded the LI group, and the skill that was most affected by SES. All this suggested that the program intensity, reinforced by SES and by EE, exerted a positive influence on the speaking score. In addition, we knew that HI learners read a lot more in their free time and that their reading score was also superior to that of the LI group. Considering all this together, it appeared that the joint potential of intensity and EE, in particular of watching TV and reading, might have been playing a very beneficial role. It was also possible that the positive associations arose because the best students were the ones who devoted more time to EE. In any case, there was probably a mutually reinforcing loop between in-school and out-of-school exposure worth further exploration.
Our study had many limitations. For instance, although the pool of participants was relatively homogeneous in many aspects, it was also relatively small, and the learners belonged to different schools with different teachers and teaching styles. The HI group is made up of learners attending schools whose main language of instruction is Spanish, while the LI group is made up of students whose main language of instruction is Basque. The cultural environment of these two linguistic programs might affect learners’ motivation towards English (Azpilicueta-Martínez & Lázaro-Ibarrola, 2023; Lázaro-Ibarrola & Azpilicueta-Martínez, 2024). Also, the EE questionnaire was very general and may not have captured all the information effectively. In fact, our questionnaire asked participants to estimate the average time spent every week, which could prove particularly challenging for young learners. Future studies could use language diaries and other tracking resources to record time spent on specific activities more accurately. On the other hand, despite its acceptable reliability, the questionnaire might benefit from further examination of its content and construct validity. Addressing these validity concerns in future studies would bolster the overall robustness of the research instrument and would help face the challenge of controlling exposure outside institutional settings (Sundqvist, 2024; Sundqvist & Uztosun, 2023). In addition, the linguistic tests might not have been discriminating enough and learners’ self-reports in the EE questionnaire or the SES test might not have been fully reliable. Furthermore, as the tests were administered by schoolteachers without our direct oversight, we could not rule out the possibility of procedural variations. These limitations called for caution about our claims. Our findings need to be tested in future studies in different contexts, with more participants and with more detailed information about EE.
Also, it would be interesting to see if these variables maintain their weight in secondary schools.
VI Conclusions
Our study aimed to address several gaps in CLIL research. It has shed light on the role of intensity among young learners, operationalizing intensity as the amount of exposure to the language inside and outside the classroom, and demonstrating that both types of exposure appear to reinforce each other, positively affecting learners’ scores. Additionally, we attempted to measure the relative influence of SES and NVI, both of which seem to contribute to the proficiency of participants, as evidenced by small to moderate associations with scores, particularly in the case of NVI, which was found to correlate with scores in both groups and across all skills, except speaking. In sum, our results have shown that HI learners, who also engage more actively in EE, outperform their LI counterparts in reading, and especially in speaking. This skill seems to be unaffected by NVI, but is positively associated with EE activities (watching TV and listening to music) and with higher SES levels. Reading, in turn, also seems to be positively affected by extramural reading in the HI group and by using the internet in the LI group.
From a pedagogical perspective, the study highlights the need to make teachers aware of the importance of EE. EE activities, especially watching TV and reading, could be integrated into L2 lessons, and their use at home could also be encouraged by the school. Also, practitioners should be aware of the role that NVI and SES could play in order to better understand students’ achievements and limitations. For L2 policies, our study supports the implementation of HI programs in primary school, as they are clearly superior to LI ones. Finally, for SLA research, the findings can contribute to gaining a more comprehensive understanding of language acquisition with young learners in instructed settings.
Supplemental Material
Supplemental material, sj-doc-1-ltr-10.1177_13621688241292277 for What factors contribute to the proficiency of young EFL learners in primary school? Assessing the role of CLIL intensity, extramural English, non-verbal intelligence and socioeconomic status by Amparo Lázaro-Ibarrola in Language Teaching Research
Footnotes
Appendix A
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Spanish Ministry of Science and Innovation [grant number PID2020-113990 GB-I00], State Research Agency (AEI).
Supplemental material
Supplemental material for this article is available online.
