Abstract
Glossing is a widely used and examined vocabulary learning tool, and one of the major branches of glossing research has compared the relative effects of first language (L1) and second language (L2) glosses on reading comprehension and vocabulary learning. However, the findings in this literature have not been consistent, calling for a comprehensive and systematic review. To this end, we conducted a meta-analysis to investigate the relative effects of L1 and L2 glossing on L2 reading comprehension and L2 vocabulary learning. Based on 78 effect sizes gathered from 26 studies representing 30 independent samples (N = 2,189), we found that L1 glossing was more effective than L2 glossing in general (Hedges’ g = .33, SE = .09, p < .001), but the effect size may vary depending on the target outcome measure. The relative effectiveness of L1 glossing was particularly supported by the results of immediate posttests of vocabulary, rather than delayed posttests of vocabulary and reading comprehension tests. Further, among a few selected moderator variables, the results of meta-regression revealed that learners’ L2 proficiency level significantly influenced the average effectiveness, such that L1 glossing is particularly effective for beginner learners compared to those with intermediate or higher L2 proficiency levels.
I Introduction
While vocabulary learning through reading has long been advocated in the second language (L2) teaching and learning literature (e.g. Nation & Wang, 1999), it has also been suggested that L2 learners may not be able to benefit much from such a learning approach, as they often make incorrect inferences about the meaning of unfamiliar words appearing in L2 texts (Nassaji, 2003), and may even retain these incorrectly guessed meanings in their L2 lexicon (Mondria, 2003). To prevent this from occurring, L2 researchers have put forward various ways of modifying reading texts for L2 learners, with glossing (i.e. lexical information about unfamiliar words within reading passages, usually placed on the same page) being one of the most frequently researched topics in this regard. The pedagogical effectiveness of glossing has been examined in the contexts of both traditional, i.e. pen-and-paper, and computer-assisted language learning (CALL). The subsequent findings have indicated the positive effects of this device for L2 vocabulary learning.
As more studies have evidenced positive effects of glossing in general for L2 learners, L2 glossing research has proceeded to examine the effects of different types of glossing. Variables examined to date include the location or format of glosses (e.g. AbuSeileek, 2011; Lee & Lee, 2015; Watanabe, 1997), frequency of target glossed words in the text (e.g. Choi, 2016; Hulstijn, Hollander & Greidanus, 1996), and language of glosses (e.g. Jacobs, Dufon & Hong, 1994; Kang, Kweon & Choi, 2020; Miyasako, 2002; Yoshii, 2006). A large number of studies have addressed the last of these, comparing the relative effects of first language (L1) and L2 glosses on L2 learners’ comprehension of an L2 text and their learning of L2 words included in the text. However, the findings of these studies have not been consistent, with some studies reporting the superiority of L1 glosses and others revealing non-significant differences. These inconsistent findings call for meta-analysis, which could offer a more comprehensive, systematic understanding of the relative effects of L1 and L2 glossing for L2 learning.
To address this call, we have undertaken a meta-analysis of the relative effects of L1 and L2 glosses on L2 vocabulary learning and reading comprehension. In addition, we examined the effects of potential moderators of the relative effects of L1 and L2 glossing, with the aim of providing pedagogical implications for L2 teachers and learners as well as directions for future research.
II Background
1 The issue of L1 and L2 input for L2 learning
The question of whether to provide the L1 or the L2 meaning of target vocabulary as glossary information can be situated within the broader, controversial issue of monolingual versus bilingual approaches to L2 teaching (Cook, 2010). Each side of this debate has gained support from different strands of research in the sociocultural, sociolinguistic, and second language acquisition literature (for review, see Lee, 2012). The proponents of the monolingual approach have stressed the importance of maximum exposure to L2 input (ACTFL, 2010), and its power to enhance students’ motivation to learn the L2 (Macdonald, 1993). In contrast, those supporting the bilingual approach have suggested that the L1 is an important mediator in L2 learners’ minds when processing L2 input (Antón & DiCamilla, 1998; Villamil & De Guerrero, 1996) and that it could efficiently make L2 input more comprehensible (Cook, 2001; Lee, 2012; Macaro, 2009). In fact, the findings of studies on the issue of L1 or L2 glossing lend weight to these somewhat mutually exclusive approaches.
Among several theoretical frameworks that support the monolingual and bilingual approaches (respectively), the two most relevant to the issue of the effects of L1 and L2 glossing are arguably Jiang’s psycholinguistic model of L2 vocabulary acquisition (Jiang, 2000, 2004) and Kroll and Stewart’s (1994) Revised Hierarchical Model, which both predict that L2 learners may gain access to the concept of an unfamiliar L2 word more easily through its L1 equivalent(s) than without it at the initial stage of registering this word in the lexicon (i.e. ‘lexical association’ stage in Jiang’s terms). Simultaneously, however, these psycholinguistic models also point to the possibility that, as L2 learners develop their L2 proficiency (or more specifically, expand their L2 vocabulary system), they may rely less on L1 equivalents in their lexical processing, progressively requiring the L1 to a lesser extent. Thus, based on these models’ propositions, it appears that learners may be able to gain access to the concept of a target L2 word more easily through the L1 equivalent of that word than through its L2 explanation or definition.
The matter of the relative effects of teacher’s code-switching (L1-based instruction) or L2 instruction on L2 vocabulary learning also deserves some space here, as it is closely related to the issue of L1 versus L2 glossing. Research on the effects of teacher’s code-switching (i.e. brief use of learners’ L1 for pedagogical purposes; see Macaro, 2009) has often adopted a framework similar to L1 and L2 glossing research, in that the objective of its pedagogical context is not only to comprehend the meaning of a target text but also to acquire target vocabulary (Hennebry, Rogers, Macaro & Murphy, 2017). The difference between pedagogical code-switching and glossing research lies in the nature of the input provided for L2 vocabulary learning, where ‘input’ in research on the effects of teacher’s code-switching refers to teacher’s oral instruction on target vocabulary, and in L1 and L2 glossing research, to written glosses. Although there is no meta-analysis of the relative effects of code-switching and L2-only instruction on L2 vocabulary learning, the results of teacher code-switching research (e.g. Hennebry et al., 2017; Lee & Levine, 2020; Lee & Macaro, 2013; Song & Lee, 2019; Tian & Macaro, 2012) have been consistent, pointing to the superiority of teachers’ code-switching (i.e. use of L1 input) for L2 vocabulary learning over L2-only instruction.
2 Moderating variables for the present meta-analysis
In this section, we introduce the variables that have been selected as moderators in the present meta-analysis, as they may influence the relative effects of L1 and L2 glossing. We divide these variables into three major categories, namely outcome measures, learner factors, and instructional features of glossing.
a Outcome measures
While reading an L2 text with glosses has the dual goal of achieving reading comprehension of the target text and learning unfamiliar L2 vocabulary, research on L1 and L2 glosses has not been consistent in terms of measuring the target learning outcomes through which the effects of these two types of glosses are examined, with some researchers adopting reading comprehension tests only (e.g. Al-Jabri, 2009; Ha, 2016), others implementing vocabulary tests only (e.g. Rouhi & Mohebbi, 2012; Yoshii, 2006), and still others adopting both types of tests (e.g. Arpacı, 2016; Shiki, 2008). It should be noted that some studies have implemented vocabulary tests both immediately after reading and a certain period after reading, to measure the long-term effects of glossing on vocabulary learning (e.g. Kim & Choi, 2017; Rouhi & Mohebbi, 2012). On the other hand, others have measured participants’ vocabulary knowledge only once, i.e. immediately after the intervention (e.g. Shiki, 2008). This inconsistency calls for a meta-analysis to synthesize findings across studies that have implemented different outcome measures.
b Learner factors
We focused on two moderating factors pertaining to L2 learners, namely their English learning context and their English proficiency level. Regarding the former, the literature showed that most participants had either Asian or Arabic backgrounds, and that all studies were conducted in the English as a foreign language (EFL) context. Although these tendencies require further clarification, one explanation is that reading L2 texts equipped with glosses is a common classroom activity in these regions.
Learners’ L2 proficiency level is another moderator that may influence the relative effects of L1 and L2 glossing, especially in view of the aforementioned literature on the effects of teachers’ code-switching, which may interact with learner proficiency on L2 learning. The empirical studies on this topic (e.g. Lee & Levine, 2020; Lee & Macaro, 2013; Song & Lee, 2019) have examined the relative effects of teachers’ L1 and L2 input on L2 vocabulary learning of students with higher and lower levels of L2 proficiency based on the above-presented psycholinguistic model of L2 vocabulary acquisition (Jiang, 2000, 2004) and Kroll and Stewart’s (1994) Revised Hierarchical Model. Overall, the findings of these studies revealed that learners can benefit more from L1 input than from L2 input, but lower-proficiency learners may benefit to a greater extent. Thus, the research on the effects of teacher code-switching points to learner proficiency as a potential moderator of the effects of L1 and L2 input on L2 learning. In L1 and L2 glossing research, some studies (e.g. Ko, 2017; Miyasako, 2002) have directly addressed the issue of the interaction between language of glossing and proficiency level by sampling learners with various L2 proficiency levels. As most studies on L1 and L2 glossing have only included participants with a specific L2 proficiency level, we aim to examine in the present meta-analysis whether L2 proficiency level could impact the relative effects of L1 and L2 glosses.
c Instructional features of glossing
One instructional feature that has gained attention in the glossing research is the relative distance between the gloss and the glossed word. One of the earliest and most widely cited studies on this moderator is Watanabe (1997), who compared the effect of marginal glosses with that of glosses placed right next to the target word (i.e. appositives). Watanabe’s findings showed that marginal glossing was more effective than appositives. Meanwhile, some researchers (e.g. Pishghadam & Ghahari, 2011; Salimi & Mirian, 2019) have also placed the glosses on a separate sheet, making the distance between glossed words and glossary information even greater than in marginal or bottom glossing. More recently, advances in CALL have enabled researchers and teachers to be more flexible in terms of the format and location of glossing. Indeed, Taylor (2009) suggests that traditional glossing may negatively influence L2 readers, as such glosses are placed in a different part of the text. In a similar vein, Lee and Lee (2015) have discussed the value of tooltip-type glossing on reading in CALL environments, wherein glossary information appears when the user places the mouse cursor over the glossed word. Thus, the review of previous studies on glossary distance has indicated the need for a more systematic investigation into the moderating role of this variable regarding the effects of L1 and L2 glossing.
The other instructional feature of glossing examined in this meta-analysis is related to the density of glossed words. Previous L2 vocabulary research has suggested that understanding of 95% to 98% of the words in the target passage is required for L2 learners to achieve adequate comprehension thereof (Hu & Nation, 2000; Laufer, 1989). These informed estimates have served as practical guidance to help L2 teaching practitioners determine the likely proportion of unfamiliar words in the target passage for their learners, which can subsequently be adjusted if some or all of these words can be glossed by the teachers. In other words, glossing may enable L2 learners to comprehend text including 5% or more unfamiliar vocabulary. Although the density of the glossed words in the target L2 passage has not been a major consideration among researchers on glossing, it appears plausible that this variable may influence the effects of glossing. Thus, along with the distance between glosses and glossed words, the density of glossed words in the target L2 passages is examined as a potential moderator of the relative effects of L1 and L2 glosses in this meta-analysis.
3 Previous meta-analysis studies on glossing
Previous meta-analytic efforts have investigated the effectiveness of glossing in L2 learning. One of the earliest attempts was by Abraham (2008), who meta-analysed 11 studies on computer-mediated glosses and found that they had positive effects on L2 reading comprehension and vocabulary learning. Taylor’s (2009) meta-analysis of 32 studies from both CALL and traditional reading contexts revealed that glossing led to a higher level of L2 reading comprehension than no glossing, with computer-mediated glosses being more effective than traditional ones.
Regarding glossing’s overall positive effects on language learning, the most recent meta-analysis (Yanagisawa, Webb & Uchihara, 2020), based on 42 empirical studies, revealed how various types of glossing have different impacts on L2 vocabulary learning. For example, multiple-choice glosses were more effective than other types of glosses (e.g. in-text glosses, glossaries), and L1 glossing led to higher gains in vocabulary learning than its L2 counterpart.
Our meta-analysis extends Yanagisawa et al. (2020) by focusing on the relative effects of L1 and L2 glosses. First, we focused on empirical studies that included both L1 and L2 glossing. Second, we examined the effects of L1 and L2 glossing on L2 reading comprehension as well as L2 vocabulary learning. Finally, while Yanagisawa et al. (2020) included 11 studies that had both L1 and L2 glossing, we identified and added 15 more studies to this list. Therefore, we provide a more accurate and comprehensive meta-analysis of the relative effects of L1 and L2 glossing on L2 learning.
III Research questions
Based on our review of the literature, the present meta-analysis puts forward the following three research questions:
Research question 1: What are the relative effects of L1 and L2 glosses on L2 learning?
Research question 2: Do the relative effects of L1 and L2 glosses vary across different L2 learning outcomes?
Research question 3: What are the moderating variables and how do they influence the relative effects of L1 and L2 glosses?
IV Methods
1 Identifying primary studies
In this meta-analysis, the first stage of data retrieval adopted the following procedures. First, the first author searched online databases such as Linguistics and Language Behavior Abstracts (LLBA), Education Resources Information Center (ERIC), and ProQuest. The key words and combinations of key words used in this search included glossing, L1 gloss, L2 gloss, L1 and L2 gloss, incidental vocabulary learning, vocabulary acquisition, reading comprehension, second language, and second language learning. Then, the authors conducted a further manual search using the following academic journals accessible online: Applied Linguistics, Language Teaching Research, Modern Language Journal, Studies in Second Language Acquisition, TESOL Quarterly, and Reading in a Foreign Language. In addition, Google Scholar was used to find up-to-date studies, and the authors then tracked back to previous studies cited in those recent studies (i.e. backward search). Finally, we retrieved the reference sections of relevant book chapters and published meta-analyses on glossing (e.g. Abraham, 2008; Taylor, 2013, 2014; Yanagisawa et al., 2020; Yun, 2011) as additional sources of studies (i.e. forward search). After duplicates were removed, this stage yielded 1,095 studies.
2 Inclusion and exclusion criteria
As shown in Figure 1, after retrieving 1,095 studies, we screened their titles and abstracts as a first check before assessing full-text articles, and a total of 1,047 studies were found to be irrelevant to the topic. The remaining 48 studies were analysed for their eligibility for inclusion in the meta-analysis according to the following nine criteria. First, a study should be published between 1990 and 2020. This range was decided to capture both traditional pen-and-paper (i.e. non-CALL) and CALL contexts. Second, a study should examine either English as a second language or English as a foreign language; other languages were excluded to ensure statistical accuracy. Two studies were excluded for this reason. Third, a study should investigate learning contexts in which the primary goal is to comprehend the meaning of a written text and the secondary goal is to acquire the meaning of unfamiliar and new lexical items. Four studies were excluded for this reason; for example, studies in which the primary objective was to memorize target words and their meanings using glosses, without the presence of a reading text (e.g. Meihami & Meihami, 2014; Soureshjani & Riahipour, 2012). Fourth, a study should be designed as an experimental, quasi-experimental, or repeated-measures study comparing the relative effects of L1 and L2 glossing; the repeated-measures design in this case is one in which the same group of participants is exposed to both L1 and L2 glossing conditions. Two studies were excluded on this basis. Fifth, a study should report sufficient statistics to calculate effect sizes, including mean (M), standard deviation (SD), and sample size (n) of each group. On this criterion, a total of nine studies were excluded. Sixth, a study should measure at least one L2 learning outcome, either vocabulary learning or reading comprehension.
Seventh, a study should confirm the homogeneity of the two groups (L1 and L2 glossing) by administering a pretest for target vocabulary or other proficiency measures prior to intervention or by implementing a randomized controlled trial, allocating participants to each target condition. It should be noted that this criterion does not apply to studies based on the repeated-measures design (e.g. Barabadi, Asma & Panahi, 2018; Kongtawee & Sappapan, 2018; Taheri & Zade, 2014). Three studies were excluded on this basis. Eighth, a study should focus solely on comparison between L1 and L2 glossing; in other words, the study was excluded when the treatment of L1 and L2 glossing was not isolated from each other (i.e. both L1 and L2 glosses were provided for the same target vocabulary). Two studies were excluded for this reason. Finally, a study should be written in English. Ultimately, a total of 26 studies remained and were included in the meta-analysis.

Figure 1. Flowchart for Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA).
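The screening counts reported above can be checked with a short tally. The sketch below uses only the numbers stated in the text; the dictionary keys are shorthand labels introduced here for illustration, and criteria for which no exclusions are reported are omitted.

```python
# Screening counts as reported in the text.
retrieved = 1095                      # studies after removing duplicates
irrelevant_on_title_abstract = 1047   # removed at title/abstract screening
full_text_assessed = retrieved - irrelevant_on_title_abstract

# Exclusions reported per criterion (labels are shorthand, not the authors' wording).
excluded = {
    "criterion 2: target language not English": 2,
    "criterion 3: no reading text": 4,
    "criterion 4: research design": 2,
    "criterion 5: insufficient statistics": 9,
    "criterion 7: group homogeneity not confirmed": 3,
    "criterion 8: L1 and L2 glosses not isolated": 2,
}

included = full_text_assessed - sum(excluded.values())
print(full_text_assessed, included)
```

The tally reproduces the 48 full-text articles and the 26 included studies stated in the text.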
3 Data coding
The final set of studies that satisfied our inclusion criteria was coded according to the coding scheme in Table 1. The preliminary version of the coding scheme was established based on previous meta-analytic studies on glossing (Abraham, 2008; Taylor, 2013, 2014; Yanagisawa et al., 2020; Yun, 2011). For the coding procedure, two of the authors of the present study together coded seven randomly selected studies (27%; 7/26), achieving an inter-rater reliability (Kappa coefficient) of .96. Any differences between the two coders were resolved through discussions with the third author. The remaining studies were coded by the first author, while the third author randomly checked five of these studies to ensure accuracy of data coding.
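For readers unfamiliar with the Kappa coefficient, the following is a minimal sketch of Cohen's kappa for two coders. The function name and the example codes are hypothetical and are not taken from the coded data set; the sketch only illustrates how chance-corrected agreement is computed.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who coded the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must code the same non-empty set of items.")
    n = len(rater_a)
    # Observed proportion of agreement.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n**2
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)

# Hypothetical gloss-distance codes assigned by two coders to four items.
coder_1 = ["in-text", "in-text", "out-text", "out-text"]
coder_2 = ["in-text", "in-text", "out-text", "in-text"]
kappa = cohen_kappa(coder_1, coder_2)
print(round(kappa, 2))
```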
Table 1. Coding scheme.
As shown in Table 1, in the coding scheme, the first category was ‘study characteristics’, such as the author, the publication year, and type of publication. The second category was ‘learner factors’. This included participants’ English learning context and their L2 proficiency level. Participants’ instructional status (e.g. college-level, secondary-level) and their major were also included in this category. The third category was ‘research design’. This category included the type of research design (i.e. experimental, quasi-experimental, or repeated-measures). Level of randomization, if any, was coded as either ‘individual’ or ‘class’, except in the repeated-measures design. ‘Pretest’ referred to whether some type of pretest was given to the participants to confirm the homogeneity of the groups. Finally, the number of posttests and the interval between the treatment and delayed posttest, if any, were coded in this category.
The fourth category was ‘instructional features of glossing’. This category included gloss distance and density of glossed words. Gloss distance was coded as either ‘in-text’ or ‘out-text’, depending on where the glossary information was provided in the selected study. For the density of glossed words, the number of glossed words divided by the total number of words included in the target text was calculated; for example, in Miyasako (2002), the number of glossed words was 15, and the total number of words included in the target text was 504; so, the density of glossed words was 3% (15/504). The fifth category was related to the calculation of the effect size. Test type refers to one of the following tests implemented in the selected study: ‘reading comprehension tests’, ‘immediate posttests of vocabulary’, or ‘delayed posttests of vocabulary’. The mean, standard deviation, and group size of L1 and L2 glossing groups were coded based on the information from the selected studies.
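The density calculation described above is a simple proportion. As a small sketch (the function name is introduced here for illustration), the Miyasako (2002) example works out as follows:

```python
def gloss_density(n_glossed_words, n_total_words):
    """Proportion of words in the target text that are glossed."""
    if n_total_words <= 0:
        raise ValueError("The target text must contain at least one word.")
    return n_glossed_words / n_total_words

# Values reported in the text for Miyasako (2002): 15 glossed words, 504 words in total.
density = gloss_density(15, 504)
print(f"{density:.1%}")  # prints "3.0%"
```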
Among the anticipated variables in our coding scheme, we examined four (test type, L2 proficiency level, glossary distance, and density of glossed words; see Table 2), as several anticipated variables had missing values (i.e. the collected studies did not provide details regarding these variables). Further, although the English learning context variable (i.e. EFL or ESL contexts) was coded and did not have missing values, we excluded it from the meta-analysis because all studies were conducted in EFL contexts.
Table 2. Coded moderator variables.
V Data analysis and results
1 Effect size calculation and overall average effect size
Effect sizes are generally used to measure treatment effects by subtracting posttest scores of control conditions from those of treatment conditions; thus, positive effect sizes indicate that treatment conditions were more effective than control conditions. Because the present study focuses on the relative effects of L1 and L2 glossing, we calculated effect sizes by subtracting scores in the L2 glossing condition from those in the L1 glossing condition, such that a positive effect size would indicate that L1 glossing was more effective than L2 glossing. To do so, Hedges’ g was chosen because it corrects for small-sample bias, whereas Cohen’s d may overestimate the effect size, particularly for small samples (Lipsey & Wilson, 2001). In calculating effect sizes, we found that some individual studies contained two or more independent samples; for example, Shiki (2008) had three, while Ha (2016) and Ko (2017) had two each. Thus, the collected 26 studies represented a total of 30 independent samples. Furthermore, we found that the majority of the 30 samples included more than one effect size, because there were multiple implementations of the different test types (reading comprehension tests, immediate posttests of vocabulary, delayed posttests of vocabulary). Instead of calculating one effect size for each study to avoid the dependency issue, we opted to use the multi-level meta-analysis approach in order to include multiple effect sizes for each study, following Hox, Moerbeek, and van de Schoot’s (2010) theoretical suggestion and Lee, Warschauer, and Lee’s (2019) guidance. In this way, we calculated a total of 78 effect sizes (N = 2,189) with a multi-level structure (i.e. effect sizes nested in studies), and the multi-level approach was used to calculate an average effect size based on these 78 effect sizes as well as to fit a meta-regression model estimating the impacts of moderating variables on the average effect size.
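The calculation described above can be sketched as follows for two independent groups. This is an illustration using the standard textbook formulas for Hedges' g and its approximate standard error, not a reproduction of the authors' scripts; the function name and the example scores are hypothetical, and repeated-measures designs would require a different variance formula.

```python
import math

def hedges_g(mean_l1, sd_l1, n_l1, mean_l2, sd_l2, n_l2):
    """Hedges' g (L1 minus L2) and its approximate standard error."""
    df = n_l1 + n_l2 - 2
    pooled_sd = math.sqrt(((n_l1 - 1) * sd_l1**2 + (n_l2 - 1) * sd_l2**2) / df)
    # Cohen's d: positive values mean the L1 gloss group scored higher.
    d = (mean_l1 - mean_l2) / pooled_sd
    # Small-sample correction factor.
    j = 1 - 3 / (4 * df - 1)
    g = j * d
    # Approximate sampling variance of d, scaled by the squared correction factor.
    var_d = (n_l1 + n_l2) / (n_l1 * n_l2) + d**2 / (2 * (n_l1 + n_l2))
    return g, math.sqrt(j**2 * var_d)

# Hypothetical posttest scores for an L1 gloss group and an L2 gloss group.
g, se = hedges_g(12.0, 4.0, 28, 10.0, 4.0, 28)
print(round(g, 3), round(se, 3))
```

Because the correction factor is below 1, g is always slightly smaller in magnitude than d, and the gap widens as the groups get smaller.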
Table 3 includes a list of studies for each of the coded variables, as well as their identified values.
Table 3. List of studies included in the meta-analysis.
Notes. ES = Effect size. L1 = first language. L2 = second language. a When a study has more than one independent sample, the numbers after hyphens differentiate them. b RC, V1, and V2 stand for reading comprehension tests, immediate posttests of vocabulary, and delayed posttests of vocabulary, respectively. c For instructional features of glossing, distance and density stand for glossary distance and density of glossed words, respectively. d A total of 28 students took the reading comprehension test and immediate posttest of vocabulary, and 27 took the delayed posttest of vocabulary. e A total of 28 students took the reading comprehension test and immediate posttest of vocabulary, and 26 took the delayed posttest of vocabulary. f To compute the total sample size for the meta-analysis, the sample size for Arpacı (2016) was counted as 56 (28 for L1 gloss and another 28 for L2 gloss).
After calculating the effect sizes, we checked if there was potentially any small-study effect. In a meta-analysis, where previously conducted studies are gathered to produce an average estimate, each study matters for the synthesized results; for example, it has been widely suggested that studies with small sample sizes (i.e. low precision) are more likely to produce biased results when compared to those with larger sample sizes (i.e. high precision) and that this may cause biased results for calculating overall average effect size.
To this end, we used both a funnel plot and Egger’s linear regression test to check for any sign of a small-study effect among our calculated effect size estimates. As shown in Figure 2, the funnel plot (left) illustrated that the majority of effect size estimates were inside the funnel, which encompasses the 95% confidence interval of the average effect size. Although about 21 effect sizes were outside the funnel, the majority were not, and the distribution of the effect sizes seemed symmetrical, indicating that the calculated effect size estimates are overall naturally correlated with their standard errors, so there was no notable visual sign of a small-study effect. In addition to this visual examination, the results of Egger’s test (the right panel in Figure 2) revealed that the regression line between the precision (the inverse of the standard errors) and the effect sizes had a y-intercept that included zero in its 95% confidence interval (b = −1.07, p > .05) and that there was no statistically significant sign of a small-study effect among the calculated 78 effect sizes.
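As an illustration of the test described above, the sketch below implements Egger's regression in its classical form, in which the standardized effect (effect size divided by its standard error) is regressed on precision and the intercept is inspected. It treats the effect sizes as independent, which is a simplification relative to the nested data in this study; the function name and example values are hypothetical.

```python
import math

def egger_regression(effects, std_errors):
    """Ordinary least squares of (effect / SE) on (1 / SE).

    Returns (intercept, slope, standard error of the intercept).
    An intercept far from zero suggests funnel-plot asymmetry.
    """
    n = len(effects)
    if n != len(std_errors) or n < 3:
        raise ValueError("Need at least three effect sizes with matching standard errors.")
    x = [1 / se for se in std_errors]                     # precision
    y = [g / se for g, se in zip(effects, std_errors)]    # standardized effect
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance with n - 2 degrees of freedom.
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_intercept = math.sqrt(sse / (n - 2) * (1 / n + x_bar**2 / sxx))
    return intercept, slope, se_intercept

# Hypothetical effect sizes and standard errors.
example_effects = [0.10, 0.45, 0.30, 0.62, 0.25]
example_ses = [0.10, 0.20, 0.30, 0.40, 0.50]
intercept, slope, se_intercept = egger_regression(example_effects, example_ses)
print(round(intercept, 3), round(slope, 3), round(se_intercept, 3))
```

A confidence interval for the intercept can then be formed from its standard error and a t distribution with n − 2 degrees of freedom.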

Figure 2. Funnel plot for small-study effect (left) and Egger’s bias plot (right).
Figure 3 illustrates the distribution of the computed effect sizes (n = 78), which ranged from −1.77 to 1.81; the mean was .30 and the standard deviation was .58. The normal density plot and the skewness/kurtosis test showed that the computed effect size estimates were normally distributed (p > .05). Taken together, these descriptive statistics showed that the effect sizes varied across the studies and that, on average, L1 glosses were more effective than L2 glosses.

Figure 3. Histogram of effect size estimates with normal density plot.
However, the mean estimate computed by this approach is not accurate, because each effect size estimate has a different level of precision; therefore, when calculating averages using these estimates, appropriate weights should be considered (Lee et al., 2019). We adopted a multi-level regression approach to tackle this issue. Table 4 presents the results of the multi-level meta-analysis, including effect sizes (n) nested in studies (k). When conducting the regression analyses, the variances (squares of the standard errors) of effect sizes were included, making our model a ‘known-variance model’ (Lee et al., 2019; Raudenbush & Bryk, 2002). In this way, we could take into consideration not only the nested structure of the data set (i.e. effect sizes nested in studies) but also the different precision of each effect size estimate (i.e. effect sizes had a wide range of standard errors stemming from the different sample sizes). As a result, we found that, on average, L1 glossing was more effective than L2 glossing (g = .33, p < .001) based on the 78 effect sizes obtained from 30 samples in 26 studies (see Model 1 in Table 4).
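To illustrate why weighting by precision matters, the sketch below computes an inverse-variance weighted average with a DerSimonian-Laird between-study variance estimate. This is a deliberately simplified stand-in: it treats all effect sizes as independent and therefore does not reproduce the multi-level known-variance model used in this study. The function name and the example values are hypothetical.

```python
import math

def random_effects_mean(effects, variances):
    """Inverse-variance weighted mean with a DerSimonian-Laird tau^2 estimate.

    Returns (pooled mean, standard error of the pooled mean, tau^2).
    """
    k = len(effects)
    if k != len(variances) or k < 2:
        raise ValueError("Need at least two effect sizes with matching variances.")
    w = [1 / v for v in variances]                        # fixed-effect weights
    fixed_mean = sum(wi * g for wi, g in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (g - fixed_mean) ** 2 for wi, g in zip(w, effects))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                    # between-study variance
    w_star = [1 / (v + tau2) for v in variances]          # random-effects weights
    pooled = sum(wi * g for wi, g in zip(w_star, effects)) / sum(w_star)
    return pooled, math.sqrt(1 / sum(w_star)), tau2

# Hypothetical effect sizes and sampling variances (squared standard errors).
example_g = [0.80, 0.15, 0.40, -0.10, 0.55]
example_var = [0.09, 0.04, 0.06, 0.05, 0.12]
pooled, pooled_se, tau2 = random_effects_mean(example_g, example_var)
print(round(pooled, 3), round(pooled_se, 3), round(tau2, 3))
```

Unlike the unweighted mean, this average gives more influence to estimates with smaller sampling variances, which is the motivation stated above for moving beyond the simple descriptive mean.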
Table 4. Overall average effect sizes.
Notes. * p < .05. *** p < .001.
We ran an additional model by test type to compute average effect sizes for the three different test types (see Model 2 in Table 4). Results indicated the relative effectiveness of L1 as compared to L2 glossing across different test types. In particular, we found statistically significant effects on the scores of immediate posttest of vocabulary (p < .001) and delayed posttest of vocabulary (p < .05).
2 Moderator analyses: Differential impact of L2 proficiency level and instructional features of glossing
As previously mentioned, we coded three moderating variables among the anticipated variables. Moderator analyses were conducted to examine how these variables influenced the relative effects of L1 and L2 glossing. First, we conducted simple regression analyses for the three moderator variables (L2 proficiency, glossary distance, and density of glossed words) to estimate average effect sizes for each feature. Then, we conducted a meta-regression analysis for the three moderators. The results of the meta-regression enabled us to statistically compare average effect sizes under the influence of moderator variables in each category; thus, we could interpret the impact of each moderator variable on the effectiveness of L1 glossing over L2 glossing. In doing so, we included a control variable (test type) in the meta-regression model to obtain more accurate estimates and avoid misinterpretations from the synthesis of effect sizes from different measurements. Table 5 shows the three simple regression models and the meta-regression model for L2 proficiency, glossary distance, and density of glossed words. In particular, the meta-regression model helps interpret the coefficient of each moderator variable after controlling for the other moderator variables and the test type variable.
Table 5. Average effect sizes by moderator variables.
Notes. * p < .05. ** p < .01. *** p < .001. A simple regression model is formulated for each feature. The meta-regression model is a comprehensive model including every feature after controlling for test type.
Table 5 presents the average effect sizes for each of the three moderator variables. As for learners’ different L2 proficiency levels, the results indicated that L1 glossing had an average medium-sized effect over L2 glossing especially for beginner L2 learners (g = .80), while the effects were marginal for those with intermediate or higher English proficiency (g = .15). The results of the meta-regression showed a statistically significant difference between beginner level and intermediate and higher levels (ES difference = .65, p < .001); as mentioned earlier, this difference should be interpreted to mean that L1 glossing had a larger impact than L2 glossing for beginner L2 learners, holding test types constant, when compared to learners at higher English proficiency levels.
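The interpretation of the proficiency coefficient can be illustrated with a weighted least squares regression on a single dummy-coded moderator (beginner = 1, intermediate or higher = 0). The sketch is a simplification: it omits the test-type control, the other moderators, and the multi-level structure of the actual model. The function name and the input data are hypothetical; the inputs were constructed so that the group averages mirror the reported values of .80 and .15.

```python
def weighted_simple_regression(x, y, weights):
    """Weighted least squares of y on a single predictor x.

    Returns (intercept, slope). With a 0/1 dummy predictor, the intercept is
    the weighted mean of the reference group and the slope is the difference
    between the two weighted group means.
    """
    if not (len(x) == len(y) == len(weights)) or len(x) < 2:
        raise ValueError("x, y, and weights must have the same length (at least 2).")
    total_w = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total_w
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / total_w
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical data: two beginner samples and three intermediate-or-higher samples.
is_beginner = [1, 1, 0, 0, 0]
effect_sizes = [0.80, 0.80, 0.15, 0.15, 0.15]
inverse_variances = [10.0, 20.0, 25.0, 15.0, 30.0]   # weights = 1 / sampling variance

intercept, es_difference = weighted_simple_regression(
    is_beginner, effect_sizes, inverse_variances
)
print(round(intercept, 2), round(es_difference, 2))
```

Under this coding, the slope is the effect size difference between proficiency groups, which is how the reported ES difference of .65 relates to the two subgroup averages.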
For the glossary distance variable, we found that L1 glossing had average small-sized effects over L2 glossing regardless of the distance of glossary. The meta-regression results showed that the effect size difference was not statistically significant (p > .05). For the density of glossed words, the results indicated that the density did not significantly influence the relative effects of L1 glossing over L2 glossing. The results of meta-regression also revealed a non-significant effect size difference between different densities of glossed words (p > .05)
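To make the logic of the moderator analysis concrete, the following is a minimal sketch of a meta-regression with dummy-coded moderators. It assumes a fixed-effect, inverse-variance weighted least squares model; this section does not specify the estimator or software used for the analysis reported here, so the function `meta_regression`, the variable names, and all numbers below are hypothetical and purely illustrative.

```python
import numpy as np

def meta_regression(effects, variances, moderators):
    """Fixed-effect meta-regression via inverse-variance weighted least squares.

    Illustrative sketch only (not the procedure used in this study).
    Returns the coefficient estimates (intercept first) and their standard errors.
    """
    y = np.asarray(effects, dtype=float)
    w = 1.0 / np.asarray(variances, dtype=float)       # inverse-variance weights
    X = np.column_stack([np.ones(len(y)), np.asarray(moderators, dtype=float)])
    cov = np.linalg.inv(X.T @ (w[:, None] * X))        # (X'WX)^-1
    beta = cov @ (X.T @ (w * y))                       # (X'WX)^-1 X'Wy
    se = np.sqrt(np.diag(cov))
    return beta, se

# Hypothetical effect sizes (Hedges' g) and sampling variances, NOT data from this study
g = np.array([0.90, 0.70, 0.80, 0.20, 0.10, 0.15])
v = np.array([0.04, 0.05, 0.04, 0.03, 0.05, 0.04])
beginner = np.array([1, 1, 1, 0, 0, 0])    # 1 = beginner, 0 = intermediate or higher
immediate = np.array([1, 0, 1, 1, 0, 0])   # control: 1 = immediate vocabulary posttest

# Coefficient on `beginner` = effect size difference, holding test type constant
beta, se = meta_regression(g, v, np.column_stack([beginner, immediate]))
print("intercept, beginner, immediate:", beta)
print("standard errors:", se)
```

With a single dummy-coded moderator and no control variable, the intercept of such a model equals the weighted average effect size of the reference category, and the slope equals the difference between the two category averages, which is how an "ES difference" between proficiency groups can be read.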
VI Discussion
The current meta-analysis was intended to synthesize the relative effect of L1 glossing over L2 glossing and to identify the differential impact of L2 proficiency level as well as instructional features of glossing. The results indicated that L1 glossing had more positive effects than L2 glossing on overall L2 learning (i.e. reading comprehension and vocabulary learning combined). The moderator analyses further indicated that L1 glossing was particularly effective for L2 learners with lower proficiency compared to those with higher proficiency.
1 Overall findings for L1 vs. L2 glosses across test types
Based on 26 empirical studies and 78 effect sizes, the findings of the present meta-analysis revealed that, on average, L1 glosses were slightly more effective than L2 glosses (g = .33). This accords with the results of a recent meta-analysis on glossing (Yanagisawa et al., 2020), as well as those of studies on the relative effects of teachers’ L1 and L2 input on L2 vocabulary learning (e.g. Hennebry et al., 2017; Lee & Levine, 2020; Lee & Macaro, 2013; Song & Lee, 2019; Tian & Macaro, 2012). Thus, when it comes to the explanation of new or unfamiliar L2 vocabulary, L1 input brings about more learning gains than L2 input, whether it is delivered through teachers’ oral explanation or through glossary information. Together, these pieces of evidence lend weight to the bilingual approach and support the use of L1 in L2 teaching (as do, for example, Cook, 2001; Lee, 2012; Macaro, 2009). However, given its small effect size, the above finding should be interpreted cautiously until further studies on L1 and L2 glossing are conducted.
It is worth noting again that the pedagogical context of the sampled studies in this meta-analysis was aligned with the dual goal of offering L2 learners the opportunity to read a target L2 text and exposing them to new or unfamiliar L2 vocabulary, which they could acquire through the glossary information provided in the target text. For this reason, existing studies have examined the effects of glossing on L2 reading comprehension and/or vocabulary learning.
Analysis of test type as a moderating variable showed that, in immediate posttests of vocabulary, L1 glosses were more effective than L2 glosses, with a small effect size (g = .44). This finding can be accounted for by the aforementioned psycholinguistic model (Jiang, 2000, 2004), which predicts that L2 learners’ lexical processing of an unfamiliar L2 lexical item would involve the mediation of its L1 equivalents. According to Jiang (2004, p. 426), when exposed to an unfamiliar L2 lexical item, having access to the L1 equivalent would give ‘the [L2] learner a sense of certainty about the meaning of a word, a certainty that is a vital first step for reinforcing the form–meaning connection’, and would perhaps also increase the possibility of registering that lexical item in the learners’ lexicon. Jiang’s model further suggested that L2 learners are predisposed to processing L2 vocabulary using L1 inputs, indicating that L2 learners may benefit more from L1 than L2 glossing for the short-term intake of the target L2 vocabulary.
Additionally, for L2 reading comprehension tests and delayed posttests of vocabulary, L1 glossing was more effective than its L2 counterpart, although the effect sizes for these differences were smaller. This finding suggests that the language of a gloss affects comprehension of the target text (g = .21) and long-term vocabulary learning (g = .28) less than it affects short-term vocabulary learning (g = .44). Regarding L2 reading comprehension, the language of a gloss may not affect L2 learners’ comprehension when the glossed words do not add significantly to the meaning of the target text. Further, the relative effectiveness of L1 compared to L2 glosses indicated by the results of the delayed posttests of vocabulary may be attributable to several as yet unidentified factors. Further research is therefore required to gauge the superiority of L1 glosses in the long-term learning of L2 vocabulary.
2 Roles of L2 proficiency and other moderator variables in the effectiveness of L1 glossing
The results of the moderator analysis showed that only L2 proficiency level was significantly related to the effects of the language of the gloss, and that the effect size of the difference between L1 and L2 glossing was medium for beginning learners (g = .80) and very small for higher-level ones (g = .15). This finding indicates that, after controlling for test type, L1 glosses were much more effective for beginners, while the language of glossing matters to a lesser extent for higher-level learners. The positive role of L1 input in L2 learning, particularly for learners with beginning-level L2 proficiency, can best be accounted for by Kroll and Stewart’s (1994) Revised Hierarchical Model, which predicts that L2 learners with a limited L2 lexicon and/or at earlier stages of L2 learning would be more reliant on L1 equivalents of L2 vocabulary in processing L2 lexical items. Thus, this model and our finding related to L2 proficiency level provide the important pedagogical implication that L2 teachers may usefully present the meaning of new or unfamiliar L2 vocabulary in a target L2 passage initially through L1 glossing, and then gradually introduce L2 glosses to their learners as their L2 proficiency level develops.
The role of L2 glossing for higher-proficiency learners merits further discussion. Considering the small difference between L1 and L2 glossing for this group, L2 glosses may be a more appropriate linguistic device for those who have passed a certain threshold of L2 proficiency, as they can provide both a lexical explanation of unfamiliar vocabulary and L2 input (Yanagisawa et al., 2020). Additionally, given the incremental nature of L2 vocabulary learning (Schmitt, 2008), more proficient learners are likely to have a greater amount of partial knowledge of unfamiliar words, which will assist their vocabulary learning through L2 glosses. Furthermore, since more proficient learners may access more cognitive resources to engage in the learning of unfamiliar vocabulary (Lee, Warschauer & Lee, 2020), L2 glosses may be cognitively less demanding for this group than for their lower-proficiency counterparts.
The other moderator variables (i.e. glossary distance and density of glossed words) were not significantly related to the difference between L1 and L2 glossing. However, the effect sizes were positive, indicating that L1 glossing was more effective than its L2 counterpart across the categories of these moderator variables, although this finding should be interpreted with some caution, due to the small number of effect sizes included in the meta-analysis. Regarding glossary distance and density of glossed words in the target text in particular, the findings suggest that neither the distance between the glossary information and the glossed words nor the ratio of glossed words to the length of the target text had a significant impact on the relative effects of L1 and L2 glossing. We cannot exclude the possibility that the lack of significance of these moderating variables is due to the small number of effect sizes derived from the selected studies. Some values for the moderator variables were missing due to incomplete descriptions of methodology in some selected studies, which complicated our meta-analytic efforts (for further discussion, see the next section).
VII Limitations and further research
In this section, we discuss the limitations of this meta-analysis, along with directions for future research. The first limitation involves the small number of studies included in the meta-analysis (n = 26). We hope that researchers will conduct more empirical studies comparing the effects of L1 and L2 glossing on L2 learning, which can then be included in future meta-analytic efforts. The second limitation involves missing values for moderating variables across the selected studies. Some of the selected studies did not specify details related to one or more of the following variables: participants’ proficiency levels, the format and location of the glosses in the target text, and/or the length of the target text and number of glossed words. The lack of such information unfortunately reduced the sample size in each of the moderator analyses conducted here, although we made substantial efforts to contact the authors of these studies and ask them to fill in the missing values. In view of this limitation, future research should provide greater detail regarding sampled participants, research instruments, methods of data analysis, and data collection procedures, so that a more comprehensive analysis of the relative effects of L1 and L2 glossing can be conducted. In addition, we suggest that future researchers converge on more standardized and consistent ways of estimating the L2 proficiency level of their participants, as existing studies have used various ways of doing so, which weakens meta-analytic efforts to synthesize their findings.
Thus, based on the current status of the literature on the relative effects of L1 and L2 glossing on L2 learning, and keeping in mind the limitations of this meta-analysis, we call for a greater number of empirical studies on this issue, with more rigorous methodological designs and more detailed methodological information. We would also like to highlight the need to examine a wider range of moderators. Most existing studies only examined the effects of L1 and L2 glossing on passive or receptive knowledge of target L2 vocabulary, with the exceptions being Öztürk and Yorgancı (2017) and Rouhi and Mohebbi (2012), who adopted productive vocabulary tests in addition to the receptive ones. Thus, more glossing studies examining diverse types of L2 vocabulary knowledge are needed. The frequency of the target words in the target text is another important moderator that future research should consider, especially in view of the finding of Choi (2016) that L1 glossing showed a long-term benefit over L2 glossing for target words that appeared four times but not for those that appeared twice. As most existing studies have not controlled for the frequency of appearance of the target words, future studies may stratify target words in terms of their frequency in the target texts, to examine the effects of the frequency of input on vocabulary learning through glossing. We believe that the availability of a greater number of studies on the effects of L1 and L2 glossing and their interaction with the aforementioned moderator variables, with more rigorous research designs, would allow us to estimate the effects of the language of glossing more accurately and contribute to our understanding of the larger issue of monolingual versus bilingual approaches to L2 teaching.
Footnotes
Acknowledgements
The authors are grateful for anonymous reviewers’ constructive feedback and suggestions. An earlier version of this article was based on the first author’s thesis, and they would like to express their gratitude to Professors Jie-Young Kim and Ho Lee for their helpful comments.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
