Abstract
There is considerable debate about whether bilingual children have an advantage in executive functioning relative to monolingual children. In the current meta-analysis, we addressed this debate by comprehensively reviewing the available evidence. We synthesized data from published studies and unpublished data sets, which equated to 1,194 effect sizes from 10,937 bilingual and 12,477 monolingual participants between the ages of 3 and 17 years. Bilingual language status had a small overall effect on children’s executive functioning (
Questions concerning the bilingual advantage in children have become a critical focus in the broader debate about bilingual language status and its relation to executive functioning. According to the prevailing
In light of these claims, we conducted an exhaustive review of studies of the relationship between language status and executive functioning in children. In all, we included data from 136 peer-reviewed articles, 11 doctoral theses, and two unpublished data sets spanning the period from 1987 to November 2020. Together, these sources reported findings from 23,414 children (10,937 bilinguals and 12,477 monolinguals) between the ages of 3 and 17 years. We chose 3 years as the lower bound because it is around this age that children can complete measures of executive functioning that are comparable with tasks completed by older children. We chose 17 years as the upper bound because, although age-related changes in executive functioning continue into early adulthood, children are furthest from a putative performance ceiling prior to the age of 18 years (Davidson et al., 2006).
Language-status effects were assessed on an exhaustive set of executive-functioning measures including operationalizations considered central to the bilingual-advantage hypothesis (e.g., Bialystok, 2017). In all, we included 1,194 separate effect sizes based on task-based measures of selective attention, flexibility, working memory, response inhibition, automatic attention (such as alerting and orienting), and planning, as well as global survey measures of executive functioning. We tested for an overall effect of language status on all measures of children’s executive functioning aggregated together. We also tested for effects of language status within specific domains of executive functioning, given that executive functioning is generally considered a multidimensional construct and that language-status effects have been hypothesized to be stronger in some domains of executive functioning than others (Bialystok, 2017; Bialystok et al., 2009; Carlson & Meltzoff, 2008). Specific effects of language status were therefore tested within nine different domains of executive functioning, each defined according to gold-standard definitions in the literature. These nine domains included three domains of executive attention thought to be particularly germane to detecting the bilingual advantage (Bialystok, 2017).
In view of concerns surrounding the methodological rigor of studies examining the bilingual advantage in children, we examined the relationship between the magnitude of reported effects and the methodological quality of reporting studies (Morton, 2015). We applied an objective measure of study quality called the Appraisal Tool for Cross-Sectional Studies (AXIS), which evaluates studies according to their reported objective measurement of independent and dependent variables, use of representative samples, and transparent discussion of study limitations. Additionally, we examined specific indices of study quality that have been discussed in the literature, including the measured equivalence of groups and the control of socioeconomic status (SES).
Additional moderation analyses examined whether language-status measurement has implications for the assessment of language-status effects on children’s executive functioning (DeLuca et al., 2019). We tested whether reported effect sizes varied depending on whether children’s language status was measured by means of receptive vocabulary measures in both languages, language-use surveys, or an adult’s nomination. We also compared effect sizes in bilingual children who showed full mastery of two languages with effect sizes in bilingual children who showed emerging proficiency but not mastery of a second language. These analyses were undertaken in response to calls for more nuanced characterizations of bilingualism and a recognition that bilingual language status is not all or nothing (Luk & Bialystok, 2013).
Statement of Relevance
According to some accounts, bilingual language experience leads to a measurable advantage in executive functioning in children, a view that has gained substantial traction within the psychological sciences and the popular media. Critics, however, charge that empirical support for the bilingual advantage is weak because important confounding variables have not been consistently measured and controlled. The present meta-analysis synthesized data from 136 peer-reviewed articles, 11 doctoral theses, and two unpublished data sets, which equated to 1,194 effect sizes, and found a small effect of language status on children’s executive functioning that was largely explained by moderating factors and bias. Therefore, the safest conclusion to be drawn from the current review is that the bilingual advantage in children’s executive functioning is small, variable, and potentially not attributable to the effect of language status.
Finally, we tested for bias in the reporting of research findings by examining the relationship between the size and the precision of reported effects and testing whether there is a disproportionate number of large positive effects among studies reporting imprecise effect-size estimates. We then corrected for distortions in the literature by recalculating estimates of language-status effects on children’s executive functioning while adjusting for bias.
Primary Research Questions
There were four primary research questions. The first was, “Do bilingual children show an advantage in executive functioning relative to monolingual children?” The second question was, “Is the bilingual advantage in children’s executive functioning more pronounced in some domains than others?” The third question was, “What additional variables moderate the relationship between language status and children’s executive functioning?” And the fourth question was, “Is the literature on the bilingual advantage in children biased in favor of confirmatory over disconfirmatory evidence?”
Method
Literature search and study selection
A comprehensive search of PsycINFO, Scopus, and Web of Science databases was conducted using the search term

Fig. 1. Flowchart depicting the article screening and inclusion process.
Coding procedure
Executive-function domains
To guide the classification of individual measures into distinct executive-function domains, we defined executive functioning as a set of higher order cognitive processes that support children’s goal-directed behavior (Zelazo et al., 1997, 2003). These processes include planning, flexibility, decision-making, working memory, and selection. Domain boundaries were refined to ensure that tasks hypothesized to be the locus of language-status effects were aggregated together in the same domain and labeled as such (Bialystok, 2017). The result was nine different executive-function domains, including three “executive-attention” domains (i.e., selection, nonverbal working memory, flexibility) hypothesized to be the locus of language-status effects (Bialystok, 2017). A full list of domains and associated measures appears in Table 1; definitions appear at https://osf.io/jv7wt/.
Table 1. Overview of Executive-Function Domains and Tasks Included in Each Domain
Meta-analytic procedure and analyses
The data, R code for computing all analyses, and additional details on all aspects of the analysis are available at https://osf.io/jv7wt/.
Effect-size calculation
For studies that reported means and standard deviations, effect sizes were transformed to Hedges’s
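As an illustration of the conversion this subsection refers to, the sketch below computes Hedges’s g and its sampling variance from group means and standard deviations using the standard textbook formulas. It is not taken from the authors’ R code (available at the OSF link above); the function name, argument names, and the sign convention (positive values favor the bilingual group) are assumptions made for the example.

```python
import math


def hedges_g(mean_bi, sd_bi, n_bi, mean_mono, sd_mono, n_mono):
    """Return (g, var_g) for a bilingual-minus-monolingual contrast.

    Assumed convention for this sketch: positive g favors bilinguals.
    """
    df = n_bi + n_mono - 2
    # Pooled standard deviation across the two independent groups.
    pooled_sd = math.sqrt(
        ((n_bi - 1) * sd_bi ** 2 + (n_mono - 1) * sd_mono ** 2) / df
    )
    d = (mean_bi - mean_mono) / pooled_sd  # Cohen's d
    j = 1 - 3 / (4 * df - 1)               # small-sample correction factor
    g = j * d
    # Large-sample variance of d, rescaled by the squared correction factor.
    var_d = (n_bi + n_mono) / (n_bi * n_mono) + d ** 2 / (2 * (n_bi + n_mono))
    return g, j ** 2 * var_d


if __name__ == "__main__":
    # Hypothetical summary statistics, not values from any included study.
    g, var_g = hedges_g(10.0, 2.0, 20, 9.0, 2.0, 20)
    print(f"g = {g:.3f}, SE = {math.sqrt(var_g):.3f}")
```

The correction factor shrinks Cohen’s d slightly, and the shrinkage matters most for the small samples that are common in this literature.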
Multilevel model
Individual effect sizes cannot be treated as statistically independent because individual effects can originate from different comparisons within experiments, different experiments within articles, or different articles from the same research group. Dependencies of this kind can produce artificially narrow confidence intervals (CIs) and artificially small estimates of the standard error of the effect (Van den Noortgate et al., 2013, 2015). Therefore, following the removal of outliers (six effects, or about 0.5% of the data, whose absolute
Model-Fit Indices, Comparison Statistics, Estimated Effect Sizes (
Note: AIC = Akaike information criterion; LRT = likelihood-ratio test.
Moderation analysis
Residual effect sizes from the multilevel model were statistically heterogeneous with respect to both the overall effect of language status on children’s executive functioning and the effect of language status within specific executive-function domains (see the Results section). Moderation analysis therefore tested whether the effect of language status on children’s executive functioning was moderated by other variables, including (a) executive-function domain; (b) participant characteristics, including age and degree of bilingualism (balanced, emergent, or unclassifiable); (c) study quality, including an overall assessment of study quality using the AXIS (Downes et al., 2016), measured equivalence of groups (yes or no), and reported objective measurement of SES (yes or no); (d) measure of language status (nomination, survey instrument, or receptive vocabulary test); (e) geographic origin of the sample (North America, Europe, East Asia, Middle East, or mixed); and (f) year of study publication. Details concerning the definition and measurement of moderator variables appear at https://osf.io/jv7wt/.
Analysis of publication bias
Publication bias was assessed by means of funnel plots that display effect-size estimates against the standard error of effect-size estimates (see Figs. 2 and 3). In the absence of publication bias, funnel plots should be symmetrical around the mean effect, and effect sizes should be more closely distributed around the mean effect as precision increases. Funnel-plot asymmetry suggests selective reporting of evidence and was evaluated by means of Egger’s regression test.
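The logic of the asymmetry test can be sketched with the classical form of Egger’s regression, in which standardized effects (effect divided by standard error) are regressed on precision (one divided by standard error) and the intercept is tested against zero. This is a simplified illustration that treats effects as independent; it is not the modified test for dependent effect sizes reported in the Results, and the function name and return values are assumptions made for the example.

```python
import math


def egger_regression(effects, std_errors):
    """Classical Egger test: OLS of (effect / SE) on (1 / SE).

    Returns (intercept, t_statistic, degrees_of_freedom). An intercept far
    from zero indicates funnel-plot asymmetry. Assumes independent effects.
    """
    if len(effects) != len(std_errors) or len(effects) < 3:
        raise ValueError("need at least three effects with matching SEs")
    y = [g / se for g, se in zip(effects, std_errors)]  # standardized effects
    x = [1.0 / se for se in std_errors]                 # precisions
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance and the usual OLS standard error of the intercept.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    resid_var = sse / (n - 2)
    se_intercept = math.sqrt(resid_var * (1.0 / n + mean_x ** 2 / sxx))
    return intercept, intercept / se_intercept, n - 2
```

The t statistic would be compared against a t distribution with the returned degrees of freedom; a clearly positive intercept corresponds to a surplus of large positive effects among imprecise estimates.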

Fig. 2. Contour-enhanced funnel plots for the overall effect, executive-attention domain, and other executive-function domains. Effect-size estimates are plotted against the standard error of effect-size estimates. Dots represent individual studies. Shading in the triangular regions indicates significance (white area:

Fig. 3. Contour-enhanced funnel plots for each executive-function domain with sufficient data. Effect-size estimates are plotted against the standard error of effect-size estimates. Dots represent individual studies. Shading in the triangular regions indicates significance (white area:
Reestimate of language-status effects adjusting for publication bias
To correct for distortions introduced by the selective reporting of evidence, we computed bias-adjusted estimates of language-status effects using the precision-effect test (PET) and the precision-effect estimate with standard error (PEESE; Stanley & Doucouliagos, 2014). In the PET, effect sizes were regressed onto their standard errors in a weighted least-squares regression model. A significant and positive association between effect sizes and their standard errors is taken to suggest that studies with low precision report larger effects and that the overall effect may therefore be biased. The intercept of this model estimates the effect expected in a hypothetical study with a standard error of zero, that is, the bias-adjusted average effect size (Stanley & Doucouliagos, 2014), and was tested against zero. Next, as recommended by Stanley and Doucouliagos, if the PET indicated that the bias-adjusted average effect size (i.e., the intercept) was distinct from zero, the PET was followed up by a PEESE to determine whether the average bias-adjusted effect size remained statistically distinct from zero. The PEESE uses the variance (i.e., the squared standard error) rather than the standard error as the predictor in the weighted least-squares regression model (Stanley & Doucouliagos, 2014).
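The two regressions described above can be sketched as follows. The example fits a weighted least-squares line with inverse-variance weights, once with the standard error as predictor (PET) and once with the variance as predictor (PEESE), and returns the intercepts as bias-adjusted estimates. It is a minimal illustration that ignores the dependence among effect sizes and omits significance tests; the helper names and the dictionary keys are assumptions, and the authors’ R code at the OSF link remains the authoritative implementation.

```python
def wls_line(y, x, weights):
    """Weighted least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    total = sum(weights)
    mean_x = sum(w * xi for w, xi in zip(weights, x)) / total
    mean_y = sum(w * yi for w, yi in zip(weights, y)) / total
    sxx = sum(w * (xi - mean_x) ** 2 for w, xi in zip(weights, x))
    sxy = sum(
        w * (xi - mean_x) * (yi - mean_y) for w, xi, yi in zip(weights, x, y)
    )
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def pet_peese(effects, std_errors):
    """Return PET and PEESE intercepts and slopes (inverse-variance weights)."""
    weights = [1.0 / se ** 2 for se in std_errors]
    variances = [se ** 2 for se in std_errors]
    pet_b0, pet_b1 = wls_line(effects, std_errors, weights)    # predictor: SE
    peese_b0, peese_b1 = wls_line(effects, variances, weights)  # predictor: SE^2
    return {
        "pet_intercept": pet_b0,      # bias-adjusted effect under PET
        "pet_slope": pet_b1,          # small-study (asymmetry) term
        "peese_intercept": peese_b0,  # bias-adjusted effect under PEESE
        "peese_slope": peese_b1,
    }
```

In a full analysis each intercept would be accompanied by a standard error and a test against zero, and the choice between the PET and PEESE estimates would follow the conditional rule described in the text.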
Results
The final data set consisted of 1,194 effect sizes (1,105 following removal of outliers) drawn from 136 peer-reviewed publications, 11 doctoral dissertations, and two unpublished data sets (Cho et al., 2021; Goldsmith, 2021). Descriptive statistics for all included studies are presented in Table 3, and individual effect-size estimates are presented in Figure S2 in the Supplemental Material. Additional details can be found at https://osf.io/jv7wt/.
Table 3. Descriptive Statistics for Studies Included in the Meta-Analysis
Note: NR = not reported.
These studies were included in study-quality subgroup analyses (i.e., study-quality score > 12, matched samples, measured socioeconomic status [SES]).
This is an unpublished data set, and there was not enough information to calculate a study-quality score.
For the method used to assess language status, questionnaires that asked only whether another language was spoken in the home or whether the child spoke another language were classified as self-report (SR) questionnaires, whether the respondent was the participant, a parent, or a school official. Language-use (LU) questionnaires asked parents to indicate the child’s proficiency in the second language, the amount of time the child spoke or was exposed to the second language in the home, and other questions designed to assess proficiency and exposure. Studies that determined language status by enrollment in immersion programs were also included in the SR category. RV = measured receptive vocabulary in both the first and second language.
Details pertaining to the classification of equivalence testing or matched samples are reported in Table S2 at https://osf.io/jv7wt/.
For measure of SES (Y = yes, N = no), authors had to report that SES was objectively measured (income, parental occupation, or parental education). Studies that recruited from low- or high-income neighborhoods or schools without additional measures to confirm that participants in the sample were indeed in that SES bracket received a “no” classification for this measure.
Results of the multilevel model revealed a small effect of language status across all domains of executive functioning that favored bilingual children (
Given a concern that the bilingual advantage may not be apparent in children who learned their second language through immersion schools or other educational programs, we evaluated the overall effect with these studies removed. Results indicated that the overall effect size was unchanged when these samples were removed from analyses (
Moderator analyses
Executive-function domain
Executive-function domain moderated the effect of language status on children’s executive functioning, as reflected by a test for whether the moderator explained heterogeneity in the data,

Forest plots showing the mean effect-size estimate for (a) the overall effect of language status on executive functions (EFs), executive attention, and other EF domains and (b) each executive-function domain. Diamonds indicate overall effect sizes. Error bars represent 95% confidence intervals.
Effect Size (
Note: We provide statistics for key tasks within each domain. These are for descriptive purposes only, and because of small samples within each task, results should be interpreted with caution. For variance components, $\sigma^2_1$ represents variance in the effect-size estimate due to variability between research groups (highest level), $\sigma^2_2$ represents variance in the effect-size estimate between studies clustered within research groups, and $\sigma^2_3$ represents within-sample variance in the effect-size estimate. CI = confidence interval; DCCS = Dimensional Change Card Sorting Task; TMT = Trail Making Test; WCST = Wisconsin Card Sorting Task.
Verbal versus nonverbal tasks
Use of verbal versus nonverbal tasks moderated the overall language-status effect on executive functioning,
Participant characteristics
Age
Mean age did not moderate the overall effect of language status on children’s executive functioning,
Degree of bilingualism
Degree of bilingualism moderated the overall effect of language status on executive functioning,
Geographic origin of the sample
Geographic origin of the sample moderated the overall effect of language status on children’s executive functioning,
Study quality
The AXIS measure of study quality
Study quality as measured by the AXIS (see Table 5) moderated the overall effect of language status on children’s executive functioning,
Table 5. Percentage of Studies Meeting the Yes, No, and Unclear Criteria for All Study-Quality Measurements
Note: One unpublished data set did not include enough information to rate the study on any of the Appraisal Tool for Cross-Sectional Studies (AXIS) dimensions or to use matched samples or equivalence testing. AXIS scores ranged from 7 to 18 (out of a maximum score of 20;
Measured equivalence of groups
The equivalence of monolingual and bilingual groups needs to be established through measurement to ensure that between-groups differences reflect an effect of independent variables rather than unmeasured confounds. Thus, measured equivalence of groups, through either matching or statistical testing, on confounding factors including age, nonverbal IQ, gender, or SES is an important measure of study quality. In all, 41 of 159 studies reported matching monolingual and bilingual samples on at least a single variable, and an additional 32 of 159 studies reported using equivalence testing to ensure that groups were comparable on at least one demographic variable. Ensuring that monolingual and bilingual samples were comparable on any demographic variables by using either matched samples or equivalence testing was not a significant moderator of the language-status effect on overall executive functioning,
The use of matched samples or equivalence testing was, however, a significant moderator within specific executive-function domains,
Table 6. Effect-Size Estimates and Confidence Intervals (CIs) for Studies That Measured Group Equivalence Using Either Matched Samples or Equivalence Testing and Studies That Did Not Measure Group Equivalence
Measurement of SES
Study quality as assessed by reported objective measurement and control of SES moderated the effect of language status on children’s executive functioning. In all, 94 of 158 studies reported objectively measuring SES. Measurement of SES moderated the language-status effect on overall executive functioning,
Language-status measure
Choice of language-status measure moderated the overall effect of language status on children’s executive functioning,
Year of publication
Year of publication was not a significant moderator of the overall effect of language status on children’s executive functioning,
Multiple meta-regression
To consider all moderator variables in tandem, we conducted a multiple meta-regression analysis that predicted residualized effect sizes from participant characteristics, AXIS study-quality scores, use of matched samples or equivalence testing, measurement of SES, language-status measure, and year of publication,
Publication bias
Funnel plots, Egger’s test of asymmetry
Contour-enhanced funnel plots for the overall effect and by executive-function domain are presented in Figures 2 and 3. Asymmetry of effect sizes was clearly observed for the overall effect and for many of the included domains. This asymmetry was confirmed using the modified Egger’s regression test for funnel-plot asymmetry (Pustejovsky & Rodgers, 2019; see Table 7).
Table 7. PET-PEESE-Corrected Estimates and Results From the Modified Egger’s Regression Test for the Overall Language-Status Effect
Note: Values in brackets are 95% confidence intervals. PET = precision-effect test; PEESE = precision-effect estimate with standard error.
PET-PEESE correction for publication bias
PET-PEESE analysis also revealed evidence of publication bias in both the estimate of the overall effect of language status on children’s executive functioning and the effect of language status within specific domains. Overall effect sizes and effect sizes within each domain were significantly associated with both their standard error (
PET-PEESE analysis was then used to adjust for the influence of publication bias. Results indicated that after we adjusted for publication bias, the overall effect of language status on children’s executive functioning was indistinguishable from zero (
Discussion
A systematic review of available literature revealed no coherent evidence that bilingual children are advantaged in executive functioning relative to monolingual children. A multilevel model of 1,194 effect-size estimates revealed a small (
Language-status effects were, however, evident in only one of nine theoretically defined domains of executive function—response inhibition—and were indistinguishable from zero in all three domains of executive attention hypothesized to be the locus of language-status effects in children (Bialystok, 2017). Further, effect-size heterogeneity was elevated in almost every domain of executive functioning. Variability in the magnitude of reported effects derived primarily from the influence of different research groups and studies, suggesting that selected studies and research groups exert an inordinate influence on estimates of language-status effects.
Moderation analyses identified two additional factors that contribute to variability in reported effect sizes: study quality and measurement of SES. Reported effects were larger in low-quality studies and those that did not measure SES and were statistically indistinguishable from zero in high-quality studies and those that measured SES. To be sure, a priori criteria for both moderator variables were not especially stringent. To achieve a high score on the AXIS measure of study quality, a study needed to objectively measure independent and dependent variables, provide evidence of the representativeness of experimental and control samples, and transparently report conflicts of interest and study limitations. And to be classified as measuring SES, a study merely had to measure family income, parental education, or an objective proxy thereof.
The analysis also revealed evidence of a confirmatory bias in the reporting of research evidence. Funnel plots of the magnitude versus the standard error of effect-size estimates revealed asymmetries that were driven by a disproportionate number of large positive effects among studies with low-precision estimates. Such asymmetries are considered a reflection of publication or small-sample bias because they suggest that confirmatory findings are more likely to survive peer review than are disconfirmatory findings. After adjusting for the influence of publication and small-sample bias using the PET-PEESE procedure (Stanley & Doucouliagos, 2014), we found that effect-size estimates for both executive functioning overall and almost all included executive-function domains were statistically indistinguishable from zero; the PET-PEESE-corrected estimate for nonverbal working memory indicated a statistically significant effect in favor of a bilingual disadvantage.
Taken together, the current findings parallel those of Gunnerud et al. (2020), who found little evidence of a bilingual advantage among children ages 2 through 15 years, considerable heterogeneity in the magnitude of reported effects, a moderating effect of SES, and evidence of publication bias in a substantially smaller survey of the pediatric literature (583 vs. the current 1,194 effect sizes). The current findings do, however, extend the findings of Gunnerud et al. (2020) in several important ways. First, we very specifically tested for—and found no evidence of—language-status effects in three domains of executive attention that Bialystok (2017) highlighted as particularly relevant for identifying the bilingual advantage. Thus, our findings show that null effects reported by Gunnerud and colleagues cannot be explained away by arguing that executive-functioning domains were not properly defined to reveal a bilingual advantage. Critical executive-attention domains used in the current analysis were defined according to recent theory (Bialystok, 2017) to maximize the likelihood of detecting language-status effects. Despite this, we found no evidence of any language-status effects. Second, we tested for and found evidence of the importance of study quality in explaining heterogeneity in reported effects. Gunnerud et al. also found substantial heterogeneity in reported effects but identified only two moderating variables: SES and research group. Our findings therefore provide additional insight into methodological considerations that contribute to variance in the magnitude of reported effects, as has been suggested by various critics (for a discussion, see Morton, 2015).
The findings challenge the view that bilingual language status favorably impacts children’s executive functioning. In the face of null findings from the study of adults, proponents of the bilingual-advantage hypothesis have argued that language-status effects are more difficult to detect in adults than in children because adults perform at ceiling on executive-function tasks, whereas children do not. The implication is that if language-status effects are to be detected at all, they are more likely to be detected earlier rather than later in development (see Grundy et al., 2017). The results of the current meta-analysis challenge this argument by suggesting that language-status effects on executive functioning in children, should they exist at all, are diminishingly small and very difficult to detect. Based on the current review, the overall effect (
The current findings have important implications for future research on the bilingual advantage in children. First, there is a need to move away from the use of small samples. Given current estimates, language-status effects are far too small to be detected by comparisons of 20 or 30 children, which is the current standard. Samples need to be scaled up considerably if language-status effects are to be reliably detected, perhaps through the coordinated efforts of a consortium (for a discussion, see Morton, 2015). Second, there is a need to raise basic methodological standards on a number of fronts. This would include a more exhaustive cataloguing of, and matching of groups on, potentially confounding variables such as SES and immigration status. Although language status may influence children’s executive functioning, reported effects to date are highly variable from study to study and likely reflect the influence of factors other than language status. Finally, to properly appreciate the complex relationship between language status and children’s executive functioning, it may be necessary to move away from simple binary characterizations of language status such as the one used in the present review. To achieve this, however, we see no way forward other than to abandon the practice of measuring language status through basic self-nomination or paper-and-pencil measures and to commit to more thorough measurements that yield continuous, standardized, and reliable measures of language proficiency. Only in this way will it be possible to examine the relation between levels of bilingualism and children’s executive functioning across different studies.
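The point about small samples can be made concrete with a rough power calculation. The sketch below uses the normal approximation for a two-sided, two-group comparison with equal group sizes; the effect sizes passed in are hypothetical placeholders rather than estimates from this review, and the function names are assumptions made for the example.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate children needed per group to detect a standardized
    mean difference of `effect_size` (two-sided test, equal group sizes)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / abs(effect_size)) ** 2)


def approx_power(effect_size, n_each, alpha=0.05):
    """Approximate power with `n_each` children per group
    (ignores the negligible rejection probability in the opposite tail)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    noncentrality = abs(effect_size) * math.sqrt(n_each / 2)
    return _STD_NORMAL.cdf(noncentrality - z_alpha)


if __name__ == "__main__":
    # Hypothetical standardized effects chosen only for illustration.
    for d in (0.1, 0.2, 0.5):
        print(d, n_per_group(d), round(approx_power(d, 25), 3))
```

Under these assumptions, a hypothetical standardized effect of 0.1 would require on the order of 1,570 children per group for 80% power, whereas a comparison with 25 children per group would have power only slightly above the 5% false-positive rate.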
Supplemental Material
Supplemental material, sj-pdf-1-pss-10.1177_0956797621993108 for The Bilingual Advantage in Children’s Executive Functioning Is Not Related to Language Status: A Meta-Analytic Review by Cassandra J. Lowe, Isu Cho, Samantha F. Goldsmith and J. Bruce Morton in Psychological Science
Supplemental material, sj-pdf-2-pss-10.1177_0956797621993108 for The Bilingual Advantage in Children’s Executive Functioning Is Not Related to Language Status: A Meta-Analytic Review by Cassandra J. Lowe, Isu Cho, Samantha F. Goldsmith and J. Bruce Morton in Psychological Science
Footnotes
Transparency
C. J. Lowe and J. B. Morton developed the study concept. C. J. Lowe conducted the literature search. C. J. Lowe, S. F. Goldsmith, and I. Cho reviewed the abstracts and articles and coded study quality. C. J. Lowe extracted data from individual articles, coded all moderators, and analyzed and interpreted the data. C. J. Lowe and J. B. Morton drafted the manuscript, and S. F. Goldsmith and I. Cho provided critical revisions. All the authors approved the final manuscript for submission.