Abstract
Despite considerable efforts made to understand the impact that instructional interventions have upon L2 reading development, we still lack a clear picture of the influence that phonological awareness (PA) and phonics instruction has upon reading in English as an L2. A search of the research literature published from 1990 to 2019 yielded 45 articles with 46 studies containing 3,841 participants in total. Effect sizes were recorded for the effect of various PA and/or phonics instructional interventions on word and pseudo word reading. Results demonstrated that L2 PA and phonics instruction has a moderate effect on L2 word reading (g = 0.53) and a large effect on pseudo word reading (g = 0.79). Moderator analyses revealed effects of a number of moderators including testing method, type of PA/phonics intervention, and context where the intervention occurred. Based upon these findings, policymakers and educators can provide beginning learners of English as an L2 with PA and phonics instruction that will enable them to read, understand and enjoy English better. Future research should also strive to adhere to more stringent standards of excellence in educational research.
Keywords
Introduction
Teaching English learners phonological awareness (PA) and phonics to decode unfamiliar printed text is a foundational step in fostering English reading skills. PA is defined as “an awareness of sounds in spoken (not written) words that is revealed by such abilities as rhyming, matching initial consonants, and counting the number of phonemes in spoken words” (Stahl & Murray, 1994, p. 221). PA is widely thought to be the foundation upon which phonics is built. Examples of classroom instructional practices that support the development of PA include having learners pay attention to the sounds of spoken language as they engage in activities such as poems, songs, and games like “I spy” (Yopp & Yopp, 2009). Phonics has been described as both “. . .a system for encoding speech sounds into written symbols. . . [and a way of] teaching learners the relationships between letters and sounds and how to use this system to recognize words” (Mesmer & Griffith, 2005, pp. 366–367). Classroom instructional practices that support the development of phonics knowledge include teaching sound-letter correspondences and having students practice decoding (i.e., sounding out) words (Heilman, 2002).
Birch (2014) highlights the importance of phonics-related skills in L2 reading while lamenting how these skills have traditionally been neglected in the L2 reading classroom due to an overemphasis on whole language principles of reading instruction. In the present study, the term L2 refers to the learning of English as either a second or a foreign language. Adesope et al. (2011) add that “although the effects of phonics instruction have been widely documented in the literature, there is limited understanding of the effectiveness of this approach in teaching literacy to immigrant students whose first language is not English” (p. 632). Although a number of meta-analyses and literature reviews have been conducted in recent years to get a clearer sense of the effectiveness of various methods for supporting L2 literacy development, none have focused on the specific role of PA and phonics instruction in developing English learners’ L2 word decoding ability. Greater attention to word decoding ability is needed because it is widely recognized as an essential component of L2 reading ability (Birch, 2014). Likewise, pseudoword reading is a reliable predictor of reading achievement, and using pseudoword assessments helps to ensure that learners have not previously encountered the target items, so that accurate identification of the pseudowords reflects their decoding ability rather than prior exposure (Curtis, 1980). Pseudowords are letter strings that follow English spelling conventions but have no meaning (e.g., sarp, desh, or chab).
The results of this meta-analysis could help summarize the empirical findings on the effectiveness of teaching PA and phonics on L2 learners’ English word and pseudo word reading. These results may prove useful to policy makers seeking input as they consider the value of devoting limited instructional time and resources to teaching PA and phonics to beginning English L2 readers. At the school and classroom level, these findings may convince teachers who are uncertain about the effectiveness of phonics instructional techniques to integrate them into their L2 reading pedagogy.
Literature review
Previous meta-analyses
A number of meta-analyses, systematic reviews, and traditional literature reviews have yielded valuable insights into the relationship between phonics instruction and L2 literacy development. One early meta-analysis of L2 studies that was based on a rather small sample of only five studies led to the cautious conclusion that explicit PA and phonics instruction benefits L2 learners (August & Shanahan, 2007). However, because these conclusions rest on such a small sample of studies and do not identify the specific aspects of English reading ability that PA/phonics instruction improves, it is difficult for educators to apply the results to their practice.
Several meta-analyses within the field of applied linguistics have also examined research conclusions regarding various aspects of L2 literacy. These meta-analyses have explored the relationship between L2 reading and other reading component variables and discovered that L1 writing system correlated with L2 decoding ability (Jeon & Yamashita, 2014; Melby-Lervåg & Lervåg, 2011) and reading comprehension (Jeon & Yamashita, 2014) indicating that learners’ L1 has an influence on their L2 decoding and comprehension ability. Other meta-analyses have looked at the impact of various interventions on English learners’ PA and phonics knowledge development. Several reading interventions designed for L1 learners (e.g., Success for All) have been found to be effective for the learning of PA and phonics (Adesope et al., 2011; Cheung & Slavin, 2012).
Previous Systematic Literature Reviews
In addition to these meta-analyses, several systematic reviews of the research literature have also been conducted in recent years. Findings from this work indicate that L2 vocabulary and grammatical knowledge (Choi & Zhang, 2021), weak PA (Varghese, 2015), and L1 writing system (Han, 2015) influence English L2 learners’ decoding skill which can in turn hinder their word recognition and comprehension. However, these deficiencies can be ameliorated by explicit instruction of word-level skills (Murphy & Unthiah, 2015).
These researchers also identify several limitations in the extant research literature. Firstly, serious issues exist with regard to the variables in these studies including problems with inconsistent construct definition, operationalization, and measurement that create doubt regarding their legitimacy (Choi & Zhang, 2021; Han, 2015). Murphy and Unthiah (2015) also highlight the lack of intervention studies examining best practices for improving L2 literacy in more varied educational contexts.
Previous Narrative Literature Reviews
A number of traditional narrative literature reviews have generated valuable insights into the effectiveness of various types of L2 literacy instruction. For instance, literacy knowledge and skill have been found to transfer between the L1 and L2 so teachers should take ESL learners’ L1 into account (Snyder et al., 2017). They can do this by adapting their PA/phonics instruction for L2 learners by focusing on problematic sounds that do not exist in these learners’ L1 (Irujo, 2007). Systematic and explicit PA/phonics instruction can efficiently support English learners’ decoding skill and word reading development (August et al., 2014; Irujo, 2007; Snyder et al., 2017) but this instruction must be based on L2 oral language practice and opportunities to read authentic L2 texts to ensure that L2 reading is meaningful to learners (August et al., 2014; Irujo, 2007).
Some reviewers also mentioned specific instructional programs and techniques that can support the growth of L2 reading proficiency. For example, reading programs (e.g., Jolly Phonics) originally designed for English L1 speakers also work with ELLs, especially if they address the unique challenges ELLs face that stem from L1 to L2 differences such as L2 phonemes that do not exist in the L1 (August et al., 2014). Other suggested instructional techniques include August et al.’s (2014) advice that ELLs should receive instruction that is based on mastery learning. August et al. (2014) and Snyder et al. (2017) also agreed on the importance of differentiated instruction, teacher modeling, peer tutoring, and plenty of practice to reinforce new material.
To summarize key findings from previous research, explicit PA and phonics instruction helps beginning L2 readers, but this instruction should be performed upon a firm base of oral L2 proficiency. L1 literacy should also be taken into consideration to expedite the process through facilitating L1-L2 transfer and targeting problem areas. Several specific programs and techniques have been demonstrated to help teach beginning L2 readers, but several limitations associated with this research warrant continued investigation.
Results from these previous reviews offer considerable insight into the role that various issues and pedagogical techniques play in L2 reading development while providing general recommendations to teachers and policy makers. However, these broader scope reviews do not offer a sufficiently complete analysis of the specific influence that PA and phonics instruction can have upon particular aspects of L2 reading such as differing impacts on reading real words versus pseudo words. This is regrettable given the widely acknowledged importance of word and pseudo word reading for both L1 and L2 reading (Williams, 2016). Therefore, the present meta-analysis will address this oversight in the existing research literature by focusing on studies that examined these two central aspects of L2 reading development. The specific research questions to be addressed in this investigation are as follows:
What are the overall effect sizes for L2 readers who have experienced L2 PA and/or phonics instruction on both their L2 word and pseudo word reading?
How do the characteristics of each study influence effect size in L2 word and pseudo word reading?
Method
Literature Search Procedures
This meta-analysis synthesized results from empirical studies of PA and phonics instruction with readers of English as a second or foreign language following established procedures for conducting meta-analyses in the social sciences (Card, 2015; Cooper, 2015). Online and manual bibliographical searches were carried out to locate both published research studies and unpublished dissertations for the meta-analysis. These online database searches were performed using Academic Search Premier (10), Linguistic and Language Behavior Abstracts (0), Web of Science (3), ERIC (7), PsycINFO (2), Google Scholar (7), EBSCO (8), and ProQuest Dissertations and Theses (8). The number in parentheses refers to the number of publications found through each database. Studies published between January 1990 and December 2019 were included in the search because no empirical investigations that were conducted prior to that date could be located. Thirty-four of the studies were published in peer-reviewed journals and 12 were unpublished dissertation studies.
The following search terms were used: immigrant, ELL, ESOL, ESL/second language read*, EFL/foreign language read*, L2 read*, second language literacy, foreign language literacy, and L2 literacy. These search terms were cross-referenced with the following keywords: reading intervention, reading instruction, phonics, phonemic/phonological awareness, letter-sound knowledge, alphabet principle, decoding, letter knowledge, and word recognition.
Inclusion and Exclusion Criteria
Empirical studies and reviews were surveyed in order to develop the criteria by which studies would be included and excluded from the analysis. After careful consideration of relevant factors the following criteria were used.
Participants: studies of learners of English as a second language (ESL) and English as a foreign language (EFL) were included. Research with students learning to read in English as their first language was excluded. Studies of students beyond middle school age were excluded from the analysis as were studies of students with learning disabilities.
Design: Studies of those learning to read in languages other than English were excluded as were those published in a language other than English. Studies in which other variables, such as spelling, were the outcomes measured were excluded. Statistics including means, standard deviations, participant numbers, p-values, etc., were necessary to calculate the Hedges’ g effect size and perform the meta-analysis. The independent variable in included studies was required to be PA and/or phonics instruction and the dependent variable had to be a measure of word or pseudo word reading.
Intervention: the intervention had to be based upon phonological awareness or phonics instruction. Interventions that focused upon other aspects of L2 reading such as fluency, vocabulary, and comprehension were not included in the analysis.
Coding of Study Characteristics and Effect Sizes
The coding process was conducted as follows: the author, title, year, type of publication, and sample size were recorded first. Next, information about the study’s participant and study characteristics was noted. Participant characteristics included the sample grade and first language writing system. The first language writing systems were classified into alphabetic, abjad, alpha-syllabary, and logographic. The study characteristics were also recorded including the research design and assessment type. Research design was categorized into whether the study used a pretest-posttest with no control group, a pretest-posttest with a control group or a posttest only with a treatment and control group.
After that, information was recorded about the interventions themselves. Intervention characteristics included the type of instructional approach, which was categorized into interventions that included PA only, interventions that included phonics only, and interventions that combined PA and phonics instruction. Information was also documented pertaining to intervention implementation. This included the instructional context, which was separated into second (e.g., the US or the UK) and foreign (e.g., Korea or Russia) language contexts. The second category with regard to intervention implementation was the duration of instruction, which was grouped into 1 to 500, 501 to 1,000, 1,001 to 2,000, and 2,001+ minutes.
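The duration coding described above can be sketched as a small lookup. This is only an illustration in Python (the analysis itself was run in R); the function name `duration_band` and the assumption that durations are recorded as whole minutes are hypothetical and not taken from the study.

```python
def duration_band(minutes: int) -> str:
    """Map total instructional minutes to one of the four coded bands."""
    if minutes < 1:
        raise ValueError("duration must be at least 1 minute")
    if minutes <= 500:
        return "1-500"
    if minutes <= 1000:
        return "501-1,000"
    if minutes <= 2000:
        return "1,001-2,000"
    return "2,001+"
```

Under this assumed coding, an intervention of exactly 500 minutes falls in the first band and one of 2,001 minutes falls in the last.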
Rationale for Each Moderator
Research design: Pretest-posttest control group (CG) designs are commonly thought to be the gold standard in experimental methodology, but a large number of other published studies are either post-test only designs or pre-post designs without a CG. This analysis could establish whether these alternative designs yield significantly different effect sizes (ES) than pretest-posttest control group (CG) designs.
Assessment type: A sizable proportion of the studies have used researcher-developed assessments to measure student decoding variables while others have used standardized instruments. A comparison of the results of studies using the two types of assessments should provide some insight into whether or not there are any significant differences in ES between them.
Instructional approach: Various approaches to phonics instruction have been used in these intervention studies, including instructional approaches based on PA, phonics, or a combination of PA and phonics. Detecting differences in ES among these instructional approaches would provide valuable information to educators considering which approach to use with their students.
Instructional context: The distinction is typically made between those learning in second versus foreign language contexts. This moderator analysis can help to establish whether differences in the instructional context have any bearing on students’ decoding performance.
Duration of instruction: It is widely assumed that the more time learners spend learning a particular skill the better they will be able to use that skill. This analysis can investigate this claim with respect to the effect of instruction duration on L2 decoding performance.
Type of L1 writing system: English learners’ L1s are written with different kinds of writing systems (i.e., alphabetic, alpha-syllabary, abjad, and logographic). This analysis may provide insight into whether the type of L1 writing system has any bearing on learners’ L2 decoding ability.
Educational stage: The grade level of the learner is commonly investigated in relation to L2 reading instructional interventions. The widely held assumption that will be investigated here is that earlier interventions yield greater positive L2 decoding outcomes for students.
Effect Size Extraction and Data-Analytic Strategy
Effect size (ES) has been described as “a quantitative reflection of a magnitude of some phenomenon that is used for the purpose of addressing a question of interest” (Kelley & Preacher, 2012, p. 140). In the present investigation, the ES represents the magnitude of the influence of the PA/phonics instructional intervention on L2 word and pseudo word reading. A summative mean effect size was calculated from the weighted effect sizes using inverse-variance weights to evaluate the intervention’s effectiveness. Studies that had larger samples were weighted more heavily because the effect sizes from these studies are usually more precise (Card, 2015).
Data were extracted and coded from 45 articles (containing 46 unique studies) that satisfied the inclusion criteria. These data were entered into the “metafor” (Viechtbauer, 2010) package in the R statistical programming language (R Core Team, 2020) to compute the overall mean effect sizes and perform the moderator analysis. The “esc” (Lüdecke, 2019) package within R was also used to convert results from pre-post and posttest only studies into Hedges’ g for this analysis. Hedges’ g was selected as the effect size statistic in this analysis over Cohen’s d because several of the studies had relatively small sample sizes and Hedges’ g adds a correction factor for studies with small samples.
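To make the small-sample correction concrete, the following is a minimal Python sketch of how Hedges' g can be obtained from two groups' posttest means and standard deviations. It uses the common textbook approximation of the correction factor, J = 1 - 3 / (4df - 1). It is not the authors' code (they used the "esc" package in R), and the function name `hedges_g` and its argument names are illustrative assumptions.

```python
import math

def hedges_g(m_t, sd_t, n_t, m_c, sd_c, n_c):
    """Return (g, var_g) for a treatment-versus-control comparison."""
    df = n_t + n_c - 2
    # Pooled standard deviation across the two groups
    sd_pooled = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (m_t - m_c) / sd_pooled           # Cohen's d
    j = 1 - 3 / (4 * df - 1)              # small-sample correction (approximation)
    g = j * d
    # Sampling variance of d, then scaled by J^2 to give the variance of g
    var_d = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))
    return g, j ** 2 * var_d
```

Because J is always slightly below 1, g is pulled toward zero relative to d, and the shrinkage is greatest when the groups are small.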
A random effects model was used for computing the mean effect size. This model is the most appropriate when data originate from a collection of studies conducted by different researchers in various contexts, calling into doubt the assumption that they share a common effect size. A random effects model is also more suitable when aspiring to generalize to a wider range of populations, which is the aim of this meta-analysis (Borenstein et al., 2007).
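The inverse-variance weighting and random effects pooling described above can be sketched as follows. Note the swap: this illustration uses the simpler DerSimonian-Laird estimator of the between-study variance, whereas the analysis reported here relied on restricted maximum likelihood in metafor, so the numbers it produces would not match the reported ones exactly. The function name `random_effects_dl` is an assumption made for the example.

```python
import math

def random_effects_dl(effects, variances):
    """Pool effect sizes under a random effects model (DerSimonian-Laird)."""
    k = len(effects)
    w = [1 / v for v in variances]                 # inverse-variance weights
    sum_w = sum(w)
    fixed = sum(wi * g for wi, g in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (g - fixed) ** 2 for wi, g in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)             # between-study variance
    # Random effects weights add tau2 to every study's sampling variance
    w_re = [1 / (v + tau2) for v in variances]
    mean = sum(wi * g for wi, g in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {"mean": mean, "se": se, "tau2": tau2, "q": q}
```

Adding the between-study variance to each study's sampling variance flattens the weights, so large studies dominate the pooled mean less than they would under a common-effect model.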
Potential Publication Bias and Moderator Analysis
Publication bias is the tendency of studies that report small or non-significant effects to be underrepresented in the published literature (Lipsey & Wilson, 2001). A funnel plot inspection as well as rank and regression tests for funnel plot asymmetry were employed to check for publication bias. Visual inspection of the funnel plot did not show distinct asymmetry, which suggests minimal publication bias in the sample for both the word and pseudo word reading data. Egger’s regression test for funnel plot asymmetry was also conducted. The Egger test evaluates the statistical significance of the intercept in an unweighted simple regression between the effect size indices and their standard errors. A non-significant result provides no evidence of publication bias. The result of the Egger test was not significant with z = 0.848, p = .396 for word reading data and z = 1.07, p = .281 for pseudo word reading data. The Begg rank correlation test was performed as well. It tests whether Kendall’s rank correlation between the effect sizes and their variances equals zero, that is, whether the observed outcomes and the corresponding sampling variances are correlated. A high correlation would indicate that the funnel plot is asymmetric, suggesting publication bias. The result of the rank correlation test was not significant with Kendall’s τ = 0.149, p = .184 for word reading data and τ = 0.22, p = .14 for pseudo word reading data, which again provides no evidence of publication bias. Taken together, the visual inspection of the funnel plots and both the Egger and Begg tests show no evidence of serious publication bias in the sample of studies used in this meta-analysis.
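The logic of the two asymmetry tests can be illustrated with a short Python sketch. The first function follows the classic formulation of Egger's test (standardized effects regressed on precision), which may differ in detail from the variant the authors ran in R. The second computes a plain Kendall's tau-a between two sequences, as in the description of the Begg test above. Both function names are assumptions, and p-values are omitted because they would require a statistics library.

```python
import math

def egger_intercept(effects, std_errors):
    """Classic Egger regression: (g / SE) regressed on (1 / SE) by OLS.

    Returns (intercept, slope, t_statistic); an intercept far from zero
    points to funnel plot asymmetry.
    """
    y = [g / se for g, se in zip(effects, std_errors)]  # standardized effects
    x = [1 / se for se in std_errors]                   # precision
    n = len(y)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_intercept = math.sqrt(sse / (n - 2) * (1 / n + x_bar ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, slope, t_stat

def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(xs)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (xs[i] - xs[j]) * (ys[i] - ys[j])
            score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)
```

When effect sizes are unrelated to study precision, the Egger intercept and Kendall's tau both sit near zero; when smaller, less precise studies report systematically larger effects, both move away from zero.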
A sensitivity analysis was also performed to determine whether the results of the study were disproportionately influenced by deviant data points or possible outliers. Visual inspection of a forest plot that was sorted according to study precision indicated no strong relationship between the precision of the study and the effect size estimate, which does not imply the presence of availability bias for either the word reading or the pseudo word reading studies. An inspection of the residuals to ensure that none deviated substantially from the mean indicated that two studies in the word reading data and two studies in the pseudo word reading data had z-scores that were larger than 2 but none were larger than 2.5. This result suggests that none of the studies included in the analysis deviate excessively from the mean. A “leave-one-out” analysis showed that removing any study from the analysis would not result in substantial change in the overall outcome estimate. Therefore, the inclusion of all studies in the analysis was justified.
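A leave-one-out analysis of the kind mentioned above can be sketched in a few lines of Python. For brevity this illustration holds the between-study variance fixed at a supplied value (`tau2`, zero by default), whereas a full analysis would normally re-estimate it each time a study is dropped. The function names are assumptions made for the example.

```python
def pooled_mean(effects, variances, tau2=0.0):
    """Inverse-variance weighted mean; tau2 is an assumed between-study variance."""
    weights = [1 / (v + tau2) for v in variances]
    return sum(w * g for w, g in zip(weights, effects)) / sum(weights)

def leave_one_out(effects, variances, tau2=0.0):
    """Return the pooled mean obtained when each study is omitted in turn."""
    results = []
    for i in range(len(effects)):
        rest_g = effects[:i] + effects[i + 1:]
        rest_v = variances[:i] + variances[i + 1:]
        results.append(pooled_mean(rest_g, rest_v, tau2))
    return results
```

If every value in the returned list stays close to the full-sample estimate, no single study is driving the overall result.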
Initial results indicating a significant amount of heterogeneity in overall effect size results prompted the decision to pursue a follow up analysis of various potential moderator variables because effect sizes did not appear to represent a common population mean (Lipsey & Wilson, 2001). The characteristics of the studies themselves (i.e., designs and assessments), features of the interventions (i.e., instructional approach), implementation of the intervention (i.e., context and duration) as well as characteristics of the participants (i.e., L1 writing system and educational stage) were incorporated into the analysis because these variables have been established as being important factors in L2 education. These moderators were individually evaluated through a series of Q tests.
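The idea behind a Q test for a categorical moderator can be illustrated with a simplified Python sketch that compares subgroup means using inverse-variance weights. It is a fixed-effect version written for illustration only: the moderator models reported here were fitted in metafor, and the notes to Tables 2 and 3 mention Knapp and Hartung adjusted F ratios for two-category variables, so this sketch would not reproduce the reported statistics. The function name `subgroup_q_between` and the group labels in the test data are assumptions.

```python
def subgroup_q_between(effects, variances, groups):
    """Return (q_between, df, subgroup_means) using inverse-variance weights."""
    totals = {}  # group label -> [sum of weights, sum of weight * effect]
    for g, v, label in zip(effects, variances, groups):
        w = 1 / v
        sums = totals.setdefault(label, [0.0, 0.0])
        sums[0] += w
        sums[1] += w * g
    means = {label: s[1] / s[0] for label, s in totals.items()}
    total_w = sum(s[0] for s in totals.values())
    grand_mean = sum(s[1] for s in totals.values()) / total_w
    # Weighted squared deviations of subgroup means from the grand mean
    q_between = sum(
        s[0] * (means[label] - grand_mean) ** 2 for label, s in totals.items()
    )
    return q_between, len(totals) - 1, means
```

The resulting statistic is compared against a chi-square distribution with (number of subgroups - 1) degrees of freedom; a large value indicates that the subgroup means differ by more than sampling error alone would predict.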
Results
An overall mean effect size for word reading was calculated from 36 effect sizes taken from 35 different studies based upon the random-effects method utilizing the restricted maximum likelihood estimator. A moderate and statistically significant mean effect size was identified for the effect of phonics instruction on L2 word reading skills: g = 0.53 (SE = 0.12), 95% CI = [0.27, 0.79], and t(35) = 4.17, p < .001. These findings suggest that phonics instruction supports the development of word decoding ability of English learners. The results shown in Table 1 represent the estimated effect sizes, the standard error and the 95% CI for each effect size for both the word reading and pseudo word reading variables. The pre-post experimental-control group studies are located in the upper section of the table, the post-test-only between-group studies are found in the middle section, and the within-group studies in the bottom section.
Table 1. Treatment Effects for Experimental- Versus Control-Group Comparisons.
A heterogeneity analysis yielded Q(35) = 242.26, p < .001, demonstrating statistically significant heterogeneity in effect sizes in the word reading studies. Similarly, an I² of 88.53% also indicated considerable variability among effect sizes in the sample. Therefore, the inclusion of a moderator analysis to possibly account for this unexplained heterogeneity was warranted.
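For readers less familiar with these statistics, the relationship between Q and I² can be sketched with the simple Q-based formula, I² = (Q - df) / Q. This Python illustration is an assumption about how one might approximate the statistic by hand. It is not expected to reproduce the reported 88.53% exactly, most likely because metafor derives I² from the estimated between-study variance rather than directly from Q.

```python
def i_squared(q: float, df: int) -> float:
    """Percentage of total variability attributed to heterogeneity (Q-based)."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100

# Q and df as reported for the word reading studies
print(round(i_squared(242.26, 35), 2))  # 85.55
```

The Q-based approximation of roughly 86% leads to the same substantive reading as the reported value: most of the observed variability reflects real differences between studies rather than sampling error.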
The mean effect size calculated for pseudo word reading was g = 0.79 (SE = 0.202), 95% CI = [0.37, 1.21], and t(23) = 3.92, p < .001. This result suggests that teaching L2 readers using PA and/or phonics methods has a statistically significant and large effect on their ability to decode pseudo words in English. However, as with the word reading result reported above, heterogeneity was statistically significant, Q(23) = 224.79, p < .001, and the I² statistic, which indicates the percentage of the variability in effect estimates due to heterogeneity, equaled 94.12%. This amount of heterogeneity in the effect sizes also justified a moderator analysis of the studies with effect sizes for pseudo word reading.
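As a rough arithmetic check, the reported interval and test statistic for pseudo word reading can be recovered from g and SE, assuming the interval is t-based with 23 degrees of freedom. The critical value of about 2.069 is a standard tabled value and not a figure taken from the study.

```latex
\begin{align*}
g \pm t_{.975,\,23} \times SE &= 0.79 \pm 2.069 \times 0.202 \\
&= 0.79 \pm 0.418 \\
&\approx [0.37,\ 1.21], \\
t &= \frac{g}{SE} = \frac{0.79}{0.202} \approx 3.91 .
\end{align*}
```

The small gap between 3.91 and the reported 3.92 is consistent with rounding of g and SE.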
The Moderator Analyses for Word Reading Variables
Study characteristics
Results for the three different research designs shown in Table 2 revealed significant differences among them (Q = 42.46, p < .001). That is, the studies with a pretest-posttest control group design (g = 0.33) had significantly lower effect sizes than those with a posttest only design (g = 0.75). The within subjects studies with only a pretest-posttest design and no control group had the highest mean effect size (g = 1.68) which was significantly higher than the other two types of studies. These results indicate a substantial difference in effect size depending on the design of the study.
Table 2. Moderator Analysis for Experimental- Versus Control-Group Comparisons for Word Reading Results.
Note. Knapp and Hartung (2003) adjusted F ratio results are reported for the variables that have two categories (e.g., assessment type).
Significant at p < 0.05.
Learners also showed significant differences in the mean effect sizes for their L2 word reading performance depending on the type of assessment used to measure their L2 word reading (Q = 9.27, p < .001). Studies that used standardized assessments tended to report smaller effect sizes (g = 0.26) than those that relied upon researcher designed assessment tools (g = 0.71). Thus, it appears that the type of assessment tool used to measure learners’ progress can influence their performance so that studies using standardized assessments demonstrate substantially less growth in English L2 word reading.
Intervention implementation
The instructional approach that was used during the studies was also shown to impact learners’ L2 word reading performance (Q = 8.57, p < .001). Studies that were based on a combination of PA/phonics (g = -0.03) had the lowest effect size while those that relied upon PA had a significantly higher moderate effect size (g = 0.46) and those that used a phonics-based approach exclusively reported the largest mean effect size of all (g = 0.76). This result suggests that using a phonics-based approach is the most effective method to support the development of word reading in English as an L2.
The difference between second (g = 0.26) and foreign (g = 0.83) language instructional context was statistically significant as well (Q = 26.81, p < .001). Second language instructional contexts showed small effect sizes but those in foreign language instructional contexts were found to be considerably larger. This result indicates that phonics instruction promotes L2 word reading more effectively in foreign language than in second language contexts.
The duration of the phonics instruction was another moderator included in the analysis. Statistically significant differences were identified between the various time durations given for phonics instruction across the studies included in the analysis (Q = 14.81, p = .005). Generally, the studies showed medium effect sizes that tended to increase with the time devoted to phonics instruction, although not monotonically. Investigations that spent from 1 to 500 minutes on phonics instruction had a moderate mean effect size (g = 0.48) while those that allotted 501 to 1,000 minutes had a moderate effect size (g = 0.55). Studies that had students learn about phonics for 1,001 to 2,000 minutes reported a moderate mean effect (g = 0.49) and those whose students received phonics instruction for 2,001 minutes or longer had the largest effect size (g = 0.69). Thus, the longest interventions appear to have the largest effect on English L2 word reading, although the three shorter duration bands differed little from one another.
Participant characteristics
A significant difference in effect sizes for L2 word reading was identified among the different L1 writing systems included in each of the various studies (Q = 27.92, p < .001). Statistically significant mean effect sizes in the small to medium range were observed for learners from alphabet (g = 0.39) and logographic (g = 0.43) backgrounds. A small to medium effect was found for learners from abjad L1 writing systems (g = 0.30) but it was not significant and the number of studies this result was based upon was limited (k = 2). A large and significant effect was found for learners from L1 backgrounds that were alpha-syllabaries (g = 1.68) but the low number of studies (k = 2) this finding was based upon leaves it open to criticism. Nevertheless, these results suggest that the type of L1 writing system background of the learner may influence their L2 word reading.
The other participant characteristic included in the analysis was educational stage. Effect sizes for English word reading differed significantly depending on the educational stage of the learners in a given study (Q = 7.53, p < .001). Those at the primary (g = 0.43) and elementary (g = 0.56) levels showed moderate effect sizes while those at the middle school level (g = 1.72) had a large effect size. Thus, the effect of PA/phonics instruction on learners’ English L2 word reading seems to be related to the grade they were in when they received that instruction. However, it should be noted that the middle school results were based on a low number of studies (k = 2).
The Moderator Analyses for Pseudo Word Reading Variables
Study characteristics
As shown in Table 3, significant differences were observed in the mean effect sizes for learners’ L2 pseudo word reading depending on the research design that was used for the study (Q = 5.76, p = .005). The pretest-posttest CG design yielded a statistically significant medium effect size of (g = 0.60). The posttest-only CG had a large effect size but it was not significant (g = 1.08). Studies with a pretest posttest within-group only design had a large and significant effect (g = 1.51). These results indicate that differences exist in the effect of phonics instruction on L2 learners’ reading of pseudo words depending on the research design used. However, both the posttest only (k = 3) and within groups designs (k = 4) consisted of relatively few studies so care must be taken in interpreting these results.
Table 3. Moderator Analysis for Experimental- Versus Control-Group Comparisons for Pseudo Word Reading Results.
Note. Knapp and Hartung (2003) adjusted F ratio results are reported for the variables that have two categories (e.g., assessment type).
Significant at p < 0.05.
Assessment type was another potential moderator variable that was investigated. In this case, a large and significant difference was observed between studies that used assessments designed by the researcher (g = 0.58) and those that incorporated standardized assessments (g = 0.96) (Q = 7.39, p = .003). As these results show, investigations that used standardized instruments have a large effect, which suggests that learners demonstrate substantial growth in their L2 pseudo word reading ability when standardized assessment tools are used to measure it.
Intervention characteristics
Characteristics of the interventions themselves were also considered as possible moderators between PA/phonics instruction and learner pseudo word reading performance. Instructional approach was explored as a potential moderator and significant differences were found in effect sizes depending on the instructional approach that learners experienced (Q = 15.50, p = .001). Those who were exposed to an approach designed to focus on PA showed a large and significant effect on their L2 pseudo word reading (g = 0.82) as did those who received primarily phonics-oriented instruction (g = 0.83) and a combination of both PA and phonics (g = 0.80) instruction. This suggests that all three instructional approaches are effective for promoting L2 pseudo word reading. While combining both PA and phonics in the same program yielded the lowest estimate, the three estimates differ by only 0.03, so any ranking among them should be treated cautiously.
Intervention implementation
Several potential moderators related to the implementation of the PA/phonics intervention were also analyzed. One of these was the instructional context. Findings showed that studies conducted in second language environments had large and significant effects (g = 0.90), while those that took place in foreign language settings showed moderate and non-significant effects (g = 0.52) (Q = 16.54, p < .001). This suggests that PA and/or phonics instruction is considerably more effective for pseudo word reading in second language contexts than in foreign language contexts.
Analysis of the duration of the instruction period showed statistically significant differences among the mean effect sizes across the various time periods (Q = 75.37, p < .001). A small effect size (g = 0.26) was associated with learners receiving instruction for 1 to 500 minutes. A much larger mean effect size (g = 1.98) was noted in studies where the intervention lasted between 501 and 1,000 minutes. In studies where learners were instructed for between 1,001 and 2,000 minutes, a small non-significant mean effect was found (g = 0.16). However, a small to medium effect size (g = 0.41) was observed when learners received instruction for 2,001 or more minutes. These effects suggest that L2 phonics instruction has the greatest impact on pseudo word reading when it lasts for between 501 and 1,000 minutes.
Participant characteristics
The potential influence of L1 writing system was investigated as a possible moderator in the analysis. Statistically significant differences existed between studies of learners with different L1 writing systems (Q = 13.48, p = .003). In studies of learners with alphabetic L1s, a significant and moderate mean effect size (g = 0.52) was obtained. A moderate to large effect size (g = 0.82) was typically reported for English learners whose L1 writing system was logographic. A comparable effect size (g = 0.77) was observed for learners whose L1 writing system was based upon an alpha-syllabary. Taken together, these results imply that there are meaningful differences in the L2 pseudo word reading performance of learners from different L1 writing systems. However, considering the limited number of studies for some L1 writing systems (i.e., alpha-syllabary), these findings should be interpreted with some caution (Tipton, 2015).
Effect sizes for English pseudo word reading also differed significantly depending on the educational stage of the learners (Q = 9.60, p = .001). Studies of learners in the primary grades had a small mean effect size (g = 0.17), while studies of those in the elementary grades had a large mean effect size (g = 1.03). Thus, PA/phonics instruction appears to be more effective for developing English L2 pseudo word reading ability for learners in the elementary grades than for those in the primary grades.
Discussion
The overall effectiveness of L2 phonics instruction was found to be moderate for L2 word reading and large for pseudo word reading. These moderate to large mean effect sizes are generally in accordance with results from other studies showing that PA/phonics instruction supports decoding skill development and word reading ability (August & Shanahan, 2007; August et al., 2014; Cheung & Slavin, 2012; Han, 2015; Irujo, 2007; Murphy & Unthiah, 2015).
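For readers less familiar with the metric, the g values discussed here are standardized mean differences with a small-sample correction. The sketch below shows the standard textbook computation of Hedges' g for a treatment-versus-control comparison. It assumes two independent groups with reported means, standard deviations, and sample sizes, and is not necessarily the exact formula applied to every design in this meta-analysis. The function name and the numbers in the example are hypothetical.

```python
import math


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Return Hedges' g and its approximate sampling variance.

    Assumes two independent groups (treatment t, control c).
    """
    df = n_t + n_c - 2
    # Pooled within-group standard deviation.
    pooled_sd = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_t - mean_c) / pooled_sd
    # Small-sample correction factor (commonly written J).
    j = 1.0 - 3.0 / (4.0 * df - 1.0)
    var_d = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2.0 * (n_t + n_c))
    return j * d, j ** 2 * var_d


# HYPOTHETICAL posttest scores for an intervention and a control group.
g, var_g = hedges_g(mean_t=12.0, sd_t=4.0, n_t=20, mean_c=10.0, sd_c=4.0, n_c=20)
print(f"g = {g:.3f}, variance = {var_g:.4f}")
```

The correction factor shrinks the raw standardized difference slightly, and the shrinkage matters most for the small samples that are common in classroom intervention studies.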
A number of studies produced moderately negative effect sizes (i.e., Baker et al., 2016; Kamps et al., 2007; Solari & Gerber, 2008; Yen, 2004) that contrasted with the results presented by most other studies. There are at least two possible explanations for these somewhat unexpected results. Yen’s (2004) study was an unpublished master’s thesis conducted by a novice researcher, which may explain the substantial divergence of its results from those of most peer-reviewed investigations. The studies by Baker et al. (2016), Kamps et al. (2007), and Solari and Gerber (2008) all contained sizable numbers of at-risk learners. This relatively high proportion of at-risk learners within the samples may account for the discrepancy between their findings and those of most of the other studies in the analysis.
Moderator analyses pointed to some differences in effect sizes for both word and pseudo word reading depending on the assessment type that was used in the study. While previous research did not specifically examine discrepancies in results between studies conducted with standardized or researcher-designed measures, some researchers have voiced concerns about how L2 literacy-related constructs have been defined, operationalized, and measured in previous research (Choi & Zhang, 2021; Han, 2015). The results of the present analysis are consistent with these concerns: discrepancies in how L2 literacy variables such as word and pseudo word reading are operationalized and measured could explain why effect sizes differ substantially depending on the type of assessment used.
Instructional approach results showed that combined PA/phonics interventions had a small effect size for word reading in comparison to medium effects for PA and large effects for phonics instruction alone. Significant differences were observed for pseudo word reading, but PA, phonics, and PA/phonics all had large and similar effect sizes. Thus, all approaches appeared to be more effective for pseudo word reading, possibly because all approaches focus on teaching decoding skills and pseudo word reading relies more on decoding skills than real word reading does.
Learners who received their instruction in second language contexts showed a small mean effect on their L2 word reading, while those taught in foreign language settings displayed a large mean effect size. Somewhat unexpectedly, the opposite was found for L2 pseudo word reading. A possible explanation for this discrepancy might be that EFL teachers encourage the practice of decoding pseudo words more than ESL teachers, possibly because they do not face the same pressing concerns as ESL teachers to prepare learners to read in their content classes. Only additional research can confirm the veracity of this speculation.
Previous meta-analyses of L2 reading instruction did not discuss the role that factors such as instructional context and duration of instruction play in PA/phonics instruction and L2 word and pseudo word reading development. Given the somewhat counterintuitive findings presented here, additional investigation of these moderators appears warranted.
Findings for the L1 writing system moderator suggest that PA/phonics instruction supports the development of pseudo word reading for learners from logographic L1 backgrounds more than for those from alphabetic L1 backgrounds. Previous meta-analyses and literature reviews concurred that L1 to L2 distance moderates L2 decoding performance (Jeon & Yamashita, 2014; Melby-Lervåg & Lervåg, 2011) and that learners from alphabetic and non-alphabetic L1s perform differently on measures of word recognition (Han, 2015). Fortunately, as demonstrated both here and in previous research, literacy skills transfer between the L1 and L2 (Snyder et al., 2017), but teachers may still need to tailor their PA/phonics instruction to teach the specific phonemes that are absent from the learners’ L1 (Irujo, 2007).
The findings from this meta-analysis support the use of phonics instruction in the L2 reading classroom and demonstrate that phonics instruction can improve L2 word reading and pseudo word decoding, although the moderators discussed above should also be taken into account. Theoreticians and researchers have long debated the role of phonics instruction in the first and second language reading classroom. Second-language researchers and educators have historically tended to encourage a more whole-language-based approach (Birch, 2014). However, in recent years, something of a shift in attitude has occurred toward a more balanced view of L2 literacy instruction that incorporates the best practices of both whole language and phonics instruction (Gunderson et al., 2020).
Based upon the conclusions reported here, policymakers should consider providing beginning learners of English as an L2 with phonics instruction. Likewise, L2 pre-service and in-service reading teachers who are less knowledgeable about phonics instruction should seek training to teach phonics effectively to enable their learners to decode the target language better. This decoding ability is the vital first step to reading, enjoying and ultimately learning the target language.
This meta-analysis has also exposed some possible issues with the current research base that suggest some caution when interpreting the results. First, there are potential concerns about quality in some of the studies. Several failed to report sufficient information about the background characteristics of the learners (e.g., age, gender), the experimental intervention, and the assessment instruments used. As well, some studies were technically pre-experimental because they did not include a control group. This limitation leaves these designs open to criticism regarding their inability to account for threats to internal validity. A related limitation was that the wide variety of assessments used to evaluate learners’ reading ability made it difficult to directly compare results across studies. This problem was compounded by the fact that instrument validity and reliability were often discussed in vague terms or ignored altogether.
These limitations can threaten the conclusions of the present meta-analysis because they may call into question the validity of the studies upon which these results are based. Nevertheless, pointing out these weaknesses in the existing research does not necessarily negate the value of these investigations because the limitations mentioned above are common in social science research in general. Thus, the best we can do at present is to acknowledge and learn from them as we strive to produce more carefully designed research going forward. Reports of future studies should include more complete and detailed information about instrumentation to clearly demonstrate its validity and reliability. Additionally, researchers should make every effort to design and conduct more true experiments that include a control group and the random assignment of participants to experimental conditions.
Conclusion
The present meta-analysis has examined a relatively large sample of the extant empirical studies of L2 PA and phonics instructional interventions and demonstrated that they have a moderate effect on L2 word reading and a large effect on pseudo word reading. Moderator analyses have revealed that standardized tests yielded different effects than researcher-designed tests for both the word and pseudo word reading variables. Combined PA/phonics interventions have a small effect on word reading in comparison to medium effects for PA and large effects for phonics instruction alone. PA/phonics, PA, and phonics all have large and similar effects on pseudo word reading. Learners in second language contexts showed a smaller effect on their L2 word reading than those in foreign language contexts, but the opposite was found for pseudo word reading.
Even though current research efforts have provided us with valuable insights, important work remains to be done. For example, additional research can provide us with a more fine-grained understanding of the interaction between learners’ personal characteristics (e.g., age, L1 reading ability) and the instructional techniques and approaches to which they are exposed. Hopefully, the limitations in existing research pointed out here will also be given careful consideration in future investigations so that the quality of the research that we produce in this important area will continue to improve.
Footnotes
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
Disclosure
The author certifies that he has no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.
