Lexical Demands and Features of English Textbooks for Vietnamese 10th Graders: An In-depth Comparison of Listening Sections

Abstract

English textbooks are pivotal learning materials in EFL contexts. In Vietnam, however, English textbooks for general education are selected by individual schools and provinces, which raises the question of whether the different series are equivalent enough for students to reach the same proficiency. This study reports the differences in lexical demands and lexical features, including length, sophistication, and diversity, of these textbooks. To this end, the researchers compiled a corpus of 54,566 tokens extracted from the textbooks’ listening transcripts. Results profiled with AntWordProfiler showed that 1,000, 2,000 to 3,000, and 3,000 to 4,000 word families in the BNC/COCA wordlist, plus four supplementary lists of proper nouns (PNs), marginal words (MWs), transparent compounds (TCs), and acronyms, were necessary to comprehend 85%, 95%, and 98% of the listening texts in all the textbooks, respectively. Moreover, pairwise comparisons run in Jamovi indicated significant differences between the textbooks in text length and lexical diversity but not in sophistication. In short, these grade-10 textbooks could support students in learning English vocabulary, but the differences among them mean that they cannot be used interchangeably. Hence, the study highlights the role of teachers in the classroom and paves the way for further research on vocabulary and English textbooks.

Keywords

English textbooks, vocabulary demands, lexical richness, listening, text length

Introduction

In the Vietnamese context, English is a mandatory subject throughout the ten years of general education from Grade 3 to Grade 12 (Vu & Peters, 2021). The foreign language programs for these grades have therefore received particular attention, which has led to the compilation of several textbook series to guide English teaching and learning (Hoang, 2018). Because students have little exposure to English outside school, textbooks hold a preeminent status that far outstrips other sources in the English language classroom (Nu, 2018). They provide language lessons for students and, through their framework, guide teachers on how to build a course (Lau et al., 2018; Richards, 2001; To, 2018). To release new English textbook series, the Vietnam Ministry of Education and Training (hereafter MoET) has been collaborating with several well-known domestic and international publishers, including Pearson Education, National Geographic Learning, Oxford University Press, Cambridge University Press, Express Publishing, and Garnet Publishing. Although the series differ, they are all designed around major themes relating to people and their lives (Hoang, 2018).

In 2020, the MoET stipulated the textbook selection process. Firstly, general education establishments propose a list of textbook series. Afterward, Provincial Departments of Education compile these proposals and send them to the Council, which creates practical textbook lists. Finally, the results are transferred to local administrations, which approve suitable textbook series from the list. As a result, many textbook series have been approved and brought into use in different provinces and schools, depending on the local education syllabus. For various textbook series to be applicable to the same student population, one crucial condition must be met: they should be equal in difficulty and kept at an adequate level so as not to take up too much of students’ time or disrupt lesson planning. Moreover, these series should cover a similar range of vocabulary to prevent differences in the input for incidental vocabulary learning. Otherwise, students will end up with unequal levels of attainment, causing the MoET’s Project to deviate from its initial goal for national general education.

To the best of the researchers’ knowledge, studies on English textbooks in Vietnam have examined sociolinguistic and pragmatic content (Dang & Seals, 2016; Ton Nu & Murray, 2020) and teaching methods or frameworks (Bui & Newton, 2022; To, 2018). Most of them have been conducted at the primary level, although the upper secondary level is more decisive for the educational path of students in Vietnam. Since Grade 10 is the first grade of the upper secondary level, in which students begin preparing for the national general English proficiency evaluation, the listening component of Grade 10 textbooks was chosen as the corpus for investigation. Also, among the four main language skills, listening is the least researched despite its plentiful materials (Matthews & Cheng, 2015). To date, no published research has investigated and compared the spoken vocabulary in the new series. To fill these gaps, this study evaluates the lexical demands, lengths, and lexical richness of grade 10 English textbooks in Vietnam. The textbook series are, moreover, compared to determine whether they are of the same difficulty. Although the scope of this study is confined to Vietnamese Grade-10 textbooks, it can be beneficial to stakeholders in other EFL settings. Its findings will shed light on the textbooks’ lexical features so that teachers, curriculum designers, and students can plan appropriate vocabulary teaching and learning strategies and textbook writers can revise their future products. In this regard, the present study addresses the two research questions below:

How much vocabulary is required for grade-10 students to achieve 85%, 95%, and 98% coverage of listening sections in the eight English textbook series?

To what extent do the listening sections in these eight grade-10 English textbook series differ in terms of text length, lexical diversity, and lexical sophistication?

Literature Review

Vocabulary Knowledge and Listening Comprehension

Vocabulary is closely tied to the listening skill (Nation, 2022). Specifically, the degree to which learners achieve proficient listening comprehension is driven by their vocabulary knowledge (Ha, 2022a, 2022b; Matthews & Cheng, 2015; Trang et al., 2023). To estimate the vocabulary required for adequate listening comprehension, lexical demand and lexical coverage are the two most commonly used concepts (Ha, 2021; Nation, 2022). It is commonly assumed that the strong correlation between lexical coverage and listening comprehension is reflected in the coverage range needed for adequate comprehension. According to the results of van Zeeland and Schmitt (2013), 95% coverage is the minimum, and 98% coverage is preferable for optimal comprehension of listening texts. In addition, high-frequency words play a crucial role in comprehension as learners can effortlessly recognize them in listening tasks (Matthews & Cheng, 2015).

To best facilitate vocabulary learning, learners should set their own long-term goals for the amount of vocabulary they need (Nation, 2022). Following Nation (2022), vocabulary demand is the range of words a learner must know to understand a text. From the perspective of lexical demand, the question is how many words are needed to reach minimal and optimal comprehension of a text. Over the past decade, a large body of published research has proposed three common coverage thresholds for readers or listeners to comprehend a text: 85%, 95%, and 98% (Laufer & Ravenhorst-Kalovski, 2010; Laufer, 2013; Schmitt et al., 2011; van Zeeland & Schmitt, 2013). For language-focused instruction, students are advised to know no less than 85% of the running words (McLean, 2021). For acceptable comprehension, Laufer and Ravenhorst-Kalovski (2010) confirmed that the threshold should be 95%. When instruction focuses on meaning, 98% is the optimal threshold (Nation, 2022). Higher thresholds imply a lower density of unknown words: at 95% and 98% coverage, learners fail to recognize only 1 word in every 20 and 1 word in every 50, respectively.
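The link between a coverage threshold and the density of unknown words is simple arithmetic, and a minimal sketch may make it concrete. The function name `unknown_word_interval` is illustrative only and does not come from any of the cited tools.

```python
def unknown_word_interval(coverage_percent: float) -> float:
    """Average number of running words per unknown word at a given coverage.

    Illustrative helper (not part of any cited tool): if a learner knows
    `coverage_percent` of the running words, the remaining share is unknown,
    so one unknown word appears every 100 / (100 - coverage) words on average.
    """
    if not 0 <= coverage_percent < 100:
        raise ValueError("coverage_percent must be in the range [0, 100)")
    return 100 / (100 - coverage_percent)


for coverage in (85, 95, 98):
    interval = unknown_word_interval(coverage)
    print(f"{coverage}% coverage: about 1 unknown word in every {interval:.1f} words")
```

At 85% coverage the same calculation gives roughly one unknown word in every 6.7 running words, which illustrates why that threshold is associated with language-focused instruction rather than fluent comprehension.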

The notion of word frequency is based on the occurrence of words in a given corpus. Specifically, the present study refers to the British National Corpus/Corpus of Contemporary American English (BNC/COCA) wordlist. This wordlist was introduced by Nation in 2012 and was described as a “better indication of word frequency” (Schmitt et al., 2017, p. 7). It is composed of 25 1,000-word-family levels and four supplementary lists: Proper Nouns (PN), Marginal Words (MW), Transparent Compounds (TC), and Acronyms (Trang et al., 2023). In the BNC/COCA wordlist, words are divided into three bands: high-frequency, mid-frequency, and low-frequency. Schmitt and Schmitt (2014) suggested that the high-frequency band comprises the first three 1,000-word levels. These levels are composed of function words (e.g., the, a, for) and familiar content words (e.g., access, brand, century). They also make up the highest proportion of running words in almost all texts (Nation, 2022; Yang & Coxhead, 2022). Together with the four supplementary lists, they can bring learners to as much as 95% coverage of a text. Following the high-frequency levels are the 6,000 mid-frequency word families spanning the fourth to the ninth 1,000-word levels. These levels include words of medium occurrence. Words from the tenth level onward are classified as low-frequency words. They account for the greatest number of word families of the three bands. However, they make up a limited percentage of a text due to their infrequency. Thus, learners do not need to prioritize them for the purpose of comprehension (Laufer, 2013).

The Impact of Text Length on Listening Comprehension

According to Miller (1956), because listeners have to deal with phoneme sequences consecutively, longer texts can exhaust their capacity for attention. The length of a text can strongly affect comprehension through the demands it places on memory and attention (Forrin et al., 2018; Forrin et al., 2020). For readers, a text can become distracting if it is too long to remember and understand. Listeners face the same problems when listening to a lengthy text. Additionally, unlike readers, listeners cannot go over a passage several times; they therefore have to strain their memory and decode the content in a short time (Wolf et al., 2018). Listeners’ conscious processing is also limited, as they simultaneously receive and process a large amount of information under the time pressure of listening tasks (Csikszentmihalyi, 1990). As a result, text length is a factor that has a strong influence on students’ listening comprehension.

Lexical Richness and Lexical Diversity

Lexical richness has been defined as a measure of the variety of words in a given sample or text (Daller et al., 2003). It is currently considered one of the most common constructs for estimating learners’ vocabulary (Kim et al., 2018). For a long time, the notion was equated with lexical diversity, although the two had been defined differently (Jarvis, 2013). However, many researchers have recently adopted a broader notion of lexical richness. According to Read (2000), lexical richness includes, but is not limited to, lexical diversity and lexical sophistication.

Lexical diversity is influenced by the proportion of types, or unique words, in a text and by their repetition (Jarvis, 2013). It is regarded as an aspect of linguistic complexity and is a useful gauge for assessing language proficiency, vocabulary knowledge, and so forth. Lexical sophistication, in the same vein, helps predict the difficulty of words and reflects the level of vocabulary knowledge in terms of frequency levels (Jarvis, 2013). It concerns advanced words that occur at low frequencies (Laufer & Nation, 1995; Kim et al., 2018). In the present study, lexical sophistication follows this traditional notion and is measured with a frequency-based approach, namely the proportion of words from the 4,000 to 25,000 word-family levels in the BNC/COCA wordlist.

The Most Common Metrics of Lexical Diversity

Among numerous indices of lexical diversity (LD), the type-token ratio (TTR) is “the simplest and most widely used LD index” (Zenker & Kyle, 2021, p. 1), determined by dividing the number of unique words by the total number of words in a text (Daller et al., 2003; Jarvis, 2013). Nonetheless, this index is strongly affected by text length. In an attempt to identify the lexical diversity indices that are least affected by text length, Zenker and Kyle (2021) compared conventional and modern indices on texts of different lengths. The authors drew line graphs to present the raw lexical diversity values and z-scores of the indices, and conducted a binned analysis of Pearson correlations. The results suggested that the Moving Average Type-Token Ratio (MATTR), Hypergeometric Distribution Diversity (HD-D), and the Measure of Textual Lexical Diversity (MTLD) were the three most stable lexical diversity indices, having negligible correlations with text length (r < .100).

Moving Average TTR (MATTR) is an adjustment of the simple TTR that reduces the effect of text length (Covington & McFall, 2010). MATTR slides a fixed-size window through the text one word at a time; with a window of 50 words, the index is also written MATTR50. The TTR is computed for each window, and the final MATTR value is the average of all window TTRs.
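The moving-window procedure can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the implementation used by TAALED: tokens are assumed to be already tokenized and case-normalized, and returning the plain TTR for texts shorter than the window is a choice made here for convenience.

```python
def ttr(tokens):
    """Simple type-token ratio: unique tokens divided by total tokens."""
    return len(set(tokens)) / len(tokens)


def mattr(tokens, window=50):
    """Moving-average TTR over windows of `window` tokens, shifted one token at a time.

    Assumption of this sketch: if the text is shorter than the window,
    the plain TTR of the whole text is returned instead.
    """
    if not tokens:
        raise ValueError("tokens must not be empty")
    if len(tokens) < window:
        return ttr(tokens)
    n_windows = len(tokens) - window + 1
    total = sum(ttr(tokens[start:start + window]) for start in range(n_windows))
    return total / n_windows


# Tiny demonstration with a window of 2 tokens:
# windows are (a, a), (a, b), (b, b) with TTRs 0.5, 1.0, 0.5.
print(mattr("a a b b".split(), window=2))
```

Because every window has the same size, each TTR in the average is computed on the same number of tokens, which is what removes the length effect that affects the simple TTR.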

The hypergeometric distribution diversity (HD-D) index is a metric calculated through probabilities (McCarthy & Jarvis, 2010). For each type in a text, HD-D computes the probability of encountering at least one of its tokens in a random sample of 42 words drawn from the text. The sum of these probabilities over all types is the final value of lexical diversity.
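As a rough illustration of this calculation, the probability that a type with a given frequency is absent from a random 42-word sample follows the hypergeometric distribution, and its complement is the type’s contribution. The sketch below assumes pre-tokenized input and sampling without replacement; it returns the raw sum described above, whereas some tools are understood to divide each contribution by the sample size so that the result lies on a TTR-like scale, a rescaling that is not applied here.

```python
from collections import Counter
from math import comb


def hdd(tokens, sample_size=42):
    """Sum, over all types, of P(at least one token of the type is in a random sample).

    Sketch only: samples are drawn without replacement, so the probability
    that a type with frequency f is absent is C(N - f, s) / C(N, s).
    """
    n_tokens = len(tokens)
    if n_tokens < sample_size:
        raise ValueError("text must contain at least `sample_size` tokens")
    all_samples = comb(n_tokens, sample_size)
    score = 0.0
    for frequency in Counter(tokens).values():
        # comb() returns 0 when the type is too frequent to be missed entirely.
        p_absent = comb(n_tokens - frequency, sample_size) / all_samples
        score += 1.0 - p_absent
    return score


# With 42 distinct tokens, every type is certain to appear in a 42-word sample.
print(hdd([str(i) for i in range(42)]))
```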

Measure of Textual Lexical Diversity (MTLD) is a metric that calculates the average length of token strings that reach a certain TTR value, processed in two directions: forward and backward (Koizumi, 2012; Zenker & Kyle, 2021). In each direction, MTLD cuts the text into segments, each ending at the word where the TTR falls to 0.72 or below. If the string is under 10 words, it is not counted. If the text ends before the TTR reaches 0.72, the remainder of the text is counted proportionately according to how far its TTR has moved along the trajectory from 1.00 to 0.72 (Koizumi, 2012). The total number of tokens in the text is then divided by the number of times the TTR reached 0.72 or below, plus the decimal of the remainder (i.e., $\frac{\text{total tokens}}{\text{times of TTR} \leq 0.72 + \text{decimal of the remainder}}$). The values of the forward and backward directions are finally averaged to produce the final lexical diversity score (Koizumi, 2012; McCarthy & Jarvis, 2010; Zenker & Kyle, 2021).
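The procedure can be sketched as follows. This is one possible reading of the description above rather than the code behind TAALED’s MTLD Original: a segment is closed only once it is at least 10 tokens long and its TTR is at or below 0.72, the remainder contributes a partial factor of (1 − TTR) / (1 − 0.72), and raising an error when no factor is reached is a choice made for this sketch.

```python
def _mtld_one_direction(tokens, threshold=0.72, min_length=10):
    """Tokens divided by the number of (full + partial) factors in one direction."""
    factors = 0.0
    types = set()
    length = 0
    for token in tokens:
        types.add(token)
        length += 1
        # Close the segment once it is long enough and its TTR has dropped to the threshold.
        if length >= min_length and len(types) / length <= threshold:
            factors += 1.0
            types.clear()
            length = 0
    if length > 0:
        # Partial factor for the remainder, proportional to its progress from 1.00 to the threshold.
        remainder_ttr = len(types) / length
        factors += (1.0 - remainder_ttr) / (1.0 - threshold)
    if factors == 0:
        raise ValueError("TTR never fell below 1.0, so MTLD is undefined for this text")
    return len(tokens) / factors


def mtld(tokens, threshold=0.72, min_length=10):
    """Average of the forward and backward passes."""
    tokens = list(tokens)
    forward = _mtld_one_direction(tokens, threshold, min_length)
    backward = _mtld_one_direction(tokens[::-1], threshold, min_length)
    return (forward + backward) / 2


# Twenty repetitions of one word: two 10-token factors in each direction.
print(mtld(["a"] * 20))
```

Higher values indicate that, on average, more tokens are needed before the TTR falls to the threshold, that is, a more diverse text.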

Studies on the Vocabulary of English Textbooks in EFL Contexts

In the Indonesian context, Aziez and Aziez (2018) examined English textbooks for junior and senior high-school students to clarify three aspects: (a) their lexical coverage by the BNC list; (b) their lexical variety calculated by the TTR index; and (c) the number of academic words and words beyond the 2,000 high-frequency words. For lexical coverage, the study found that at 95%, students needed 4,000 word families at the junior level and 5,000 word families at the senior level. For the second question, the TTR scores of the textbooks at all levels were equal to 0.23. In terms of academic words, at the junior level, the percentage of academic words and that of words beyond the 2,000-word level in the textbooks both equaled 1.75%. At the senior level, these words covered 3.56% and 11.87% of the textbooks, respectively. All these findings indicated that the vocabulary level of those textbooks was too high for the students’ competence. Therefore, the authors concluded that Indonesia should adjust its textbooks to be compatible with the students’ knowledge.

Focusing on Chinese secondary schools, Yang and Coxhead (2022) investigated a corpus from Books 3 and 4 of the New Concept English Textbook Series (NCE). To analyze a total of 40,895 tokens, the two researchers ran the RANGE program 1.0.0 (Heatley et al., 2002) and adopted the BNC/COCA lists accompanied by the four supplementary lists. The results showed that learners needed 3,000 and 5,000 word families plus the four supplementary lists to achieve 95% and 98% coverage of Book 3, respectively, and 4,000 to 6,000 word families plus the supplementary lists for 95% and 98% coverage of Book 4. In addition, more than 85% of the words in the textbooks were high-frequency words from the first three levels of the list. Focusing on the same country, Sun and Dang (2020) conducted a study on 265 volunteer high school students who used Yilin textbooks in their curriculum. The researchers compiled a corpus of 273,094 tokens from written texts and listening transcripts and analyzed the data in RANGE with the BNC/COCA wordlist. The results showed that to reach 95% and 98% coverage of the Yilin textbooks, learners needed to know 3,000 and 9,000 word families, respectively, along with the four supplementary lists. The findings also revealed that the Yilin textbooks placed a heavy burden on Chinese high school students.

To the best of the researchers’ knowledge, Nguyen (2020) was one of the pioneers in analyzing the vocabulary of the new high school English textbooks published in 2020 for Vietnamese students, namely Tieng Anh 10, Tieng Anh 11, and Tieng Anh 12. A total of 422 high school students from the three corresponding grades were recruited across the country and tested with the Vocabulary Levels Test (VLT; Webb et al., 2017) to identify their vocabulary knowledge. Then, reading passages in the textbooks were processed by the Vocabprofilers on the Lextutor.ca website using the BNC/COCA wordlist and compared with the VLT results. The findings indicated that students whose knowledge covered the first two high-frequency levels in BNC/COCA could comprehend 87.1% of these textbooks. To reach 95% and 98% coverage of these textbooks, they were expected to know the third and fifth 1,000-word lists in BNC/COCA, respectively. Although Nguyen’s study adopted BNC/COCA, it did not include the four supplementary lists of PNs, MWs, TCs, and acronyms. Therefore, the lexical demand results could be misleading, and the estimated difficulty of the textbooks might have been different if these lists had been counted. Two years later, Le and Dinh (2022) investigated the revised version of the grade 10 textbook examined by Nguyen (2020). To determine the lexical demand of the textbook and its coverage of high-frequency words, they developed a corpus of 41,137 words extracted from written texts and audio transcripts and processed it on Vocabprofilers with the BNC/COCA lists. The findings indicated that Vietnamese students needed 3,000 to 5,000 word families to achieve 95% and 98% coverage of the textbook, which was challenging for them to learn without support. This finding seemingly resonated with Nguyen’s (2020) study; however, the outcome did include the supplementary lists. Moreover, the researchers found that although the textbook covered the first high-frequency level, just over half of the second 1,000-word list was encountered. Thus, teachers are encouraged to adapt the textbook to teach effectively.

Research Gaps and the Present Study

The studies reviewed above have made valuable contributions to the literature on the vocabulary of EFL textbooks. Notwithstanding their contributions, they leave some gaps that need addressing. Firstly, the way textbooks were evaluated in the previous studies reveals two methodological issues. With regard to lexical demand, only Sun and Dang (2020) considered the first three 1,000-word levels as high-frequency words, while the others accepted the first two 1,000-word levels. Additionally, these studies relied on frequency alone to gauge the depth of textbooks rather than on other lexical features, except for the work of Aziez and Aziez (2018), which measured lexical diversity. That study, however, relied on the TTR index, which is highly sensitive to text length. To gain a more accurate insight, the present study works with the 3,000 level and takes into account other features, including length, sophistication, and diversity.

Secondly, the studies of Nguyen (2020) and Le and Dinh (2022) were each conducted on only one new English textbook series. To date, however, nine textbook series have been widely adopted for grade 10. Despite the emergence of these other English textbook series, research on them is almost nonexistent. Accordingly, no investigation has provided evidence for the validity of these textbooks. The present study therefore aims to bridge this gap by comparing the series for a more comprehensive view of grade 10 English textbooks in Vietnam.

Methodology

Research Design

In this research, a quantitative method was employed to measure the lexical coverage and lexical richness of the English textbook series. AntWordProfiler version 2.1.0 (Anthony, 2023), together with the BNC/COCA lists created by Nation (2017), was run to answer Research question 1. Then, Research question 2 was answered by calculating a set of lexical diversity indices with the Tool for the Automatic Analysis of Lexical Diversity (TAALED; Kyle et al., 2021). The study also employed Jamovi version 2.3.28 to examine, through pairwise comparisons, whether these textbooks varied significantly.

Data Collection

In the period of 2022 to 2023, the MoET released nine new English textbook series for grade 10 students. Unfortunately, the authors’ survey on the selection of textbooks indicated that very few provinces had opted for Macmillan Move On, and it was also unavailable for download. Thus, it was excluded from the data source of this study. As mentioned earlier, the key data were the listening sections in those English textbooks. The corpus included a total of 54,566 tokens. The number of units and passages, and hence the amount of text, differed from series to series. Information relating to the eight series of English textbooks is presented in Table 1.

Table 1.

Information of the Eight Textbook Series.

Publishers	Series	Year of publication	Target learners	Number of units	Number of Passages	Number of tokens
Hue University Publishing House, Express Publishing & DTP Education Solutions	Bright	2022	Grade 10	8 units	26	5,532
Vietnam National University Publishing House & Garnet Publishing	C21 Smart	2020	Grade 10	10 units	42	9,095
HCMC University of Education Publishing House & Pearson	English Discovery	2022	Grade 10	9 units & 1 CLIL unit	49	8,540
HCMC University of Education Publishing House & National Geographic Learning	Explore New Worlds	2022	Grade 10	12 units	31	4,258
Vietnam Education Publishing House Limited Company & Oxford University Press	Friends Global	2020	Grade 10	8 units	77	12,487
Vietnam Education Publishing House Limited Company & Pearson	Global Success	2022	Grade 10	10 units & 4 Review units	36	5,662
Hue University Publishing House & DTP Education Solutions	i-Learn Smart World	2022	Grade 10	10 units & 1 Review unit	25	5,140
HCMC University of Education Publishing House & Cambridge University Press	THiNK	2021	Grade 10	8 units, 1 Welcome unit & 2 Review units	25	3,852

CC: Culture Corner; CLIL: Content and Language Integrated Learning; PC: Progress Check.

Source: author.

Data Preparation

The listening transcripts were mostly taken from the teacher’s books of these series. Some of them were collected from the subtitles of the audio files. Then, PowerToys, a text extraction tool from Microsoft, was used to convert raw texts from the data sources into text files. However, the tool’s character recognition introduced some spelling errors. For example, the character “m” and the sequence “rn” look similar and were sometimes confused during scanning. In addition, some uppercase characters such as “I” and “O” were converted into the numbers “1” and “0.” Thus, the researcher manually corrected the errors, removed redundant words, and re-added missing words to the corpus.

Data Analysis

AntWordProfiler (Anthony, 2023) was used to process all of the text files. The program used the BNC/COCA list (Nation, 2017) to classify words into their frequency levels, count their occurrences, and calculate their sums and percentages. Tokens were sorted into the 25 levels and four supplementary lists, and any remaining tokens were placed in “Not in the list.” These not-in-the-list words were then reassigned to their correct positions in the four supplementary lists. The updated data were processed to report the coverage of each full textbook, and each unit in each textbook was further analyzed independently.
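Once tokens have been counted per level, the lexical demand at a given threshold follows from cumulative coverage. The sketch below illustrates that step only; it does not reproduce AntWordProfiler’s output format, and the function names and the treatment of supplementary-list tokens as known from the outset are assumptions made for illustration.

```python
def cumulative_coverage(level_counts, supplementary_count, total_tokens):
    """Cumulative coverage (%) after each 1,000-word level.

    `level_counts[0]` is the token count at the 1st 1,000 level, and so on.
    Tokens in the supplementary lists are treated as known from the start
    (pass 0 to obtain coverage without the supplementary lists).
    """
    coverage = []
    known = supplementary_count
    for count in level_counts:
        known += count
        coverage.append(100 * known / total_tokens)
    return coverage


def word_families_needed(coverage, threshold):
    """Smallest vocabulary size (in word families) whose coverage meets the threshold."""
    for level, value in enumerate(coverage, start=1):
        if value >= threshold:
            return level * 1000
    return None  # threshold is never reached within the listed levels


# Cumulative coverage of Bright with the supplementary lists (first four levels of Table 2).
bright_with_lists = [91.23, 95.88, 98.14, 99.08]
for threshold in (85, 95, 98):
    print(threshold, word_families_needed(bright_with_lists, threshold))
# prints 1000, 2000 and 3000 for the 85%, 95% and 98% thresholds
```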

Research question 2 was answered based on the results of each unit in terms of length, sophistication, and diversity. Length was measured as the total number of tokens in each unit. Lexical sophistication was calculated as the percentage of tokens belonging to the 4,000 to 25,000 word-family levels ($\text{Lexical sophistication} = \frac{\text{Tokens from 4,000 to 25,000 levels}}{\text{Total tokens}} \times 100\%$), and diversity was computed in TAALED using three lexical diversity indices: MATTR50, HD-D, and MTLD Original. In addition, each textbook was coded with a letter of the alphabet, and units in the same textbook series were coded with the same letter. Finally, the data were analyzed in Jamovi to make pairwise comparisons of length, sophistication, and diversity.
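The sophistication measure amounts to a single ratio per unit, which can be sketched as below. The function name and the input layout (a list of token counts ordered from the 1st to the 25th 1,000-word level) are assumptions for illustration, and the total is passed in separately because how supplementary-list tokens enter the denominator is left to the caller.

```python
def lexical_sophistication(level_counts, total_tokens):
    """Percentage of tokens that fall in the 4,000 to 25,000 word-family levels.

    `level_counts[0]` holds the token count of the 1st 1,000 level, so the
    4th to 25th levels are the slice [3:25].
    """
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    beyond_high_frequency = sum(level_counts[3:25])
    return 100 * beyond_high_frequency / total_tokens


# Hypothetical unit of 100 tokens: 3 tokens at the 4th level and 2 at the 5th.
counts = [80, 10, 5, 3, 2] + [0] * 20
print(lexical_sophistication(counts, total_tokens=100))
```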

Results

RQ1: How Much Vocabulary Do Grade-10 Students Need to Achieve 85%, 95%, and 98% Coverage of Listening Sections in the Eight English Series of Textbooks?

Table 2 presents the cumulative lexical coverage at each level in two cases: with and without knowledge of the four supplementary lists of PNs, MWs, TCs, and Acronyms. Learners who knew these lists needed only 2,000 or 3,000 word families to reach 95% coverage of the textbooks. In contrast, learners who did not recognize them had to double or triple their vocabulary knowledge, or expand it even further, to reach 95% coverage.

Table 2.

Cumulative Coverage With and Without the Four Supplementary Lists in the BNC/COCA.

Wordlist	Bright		C21 Smart		English Discovery		Explore New Worlds		Friends Global		Global Success		i-Learn Smart World		THiNK
Wordlist	Without	With	Without	With	Without	With	Without	With	Without	With	Without	With	Without	With	Without	With
1,000	87.11	91.23	83.93	86.63	87.35	91.17	88.05	92.44	88.25	91.61	82.13	88.65	81.34	89.43	88.79	92.42
2,000	91.76	95.88	91.11	93.81	92.00	95.82	92.51	96.90	93.23	96.59	89.05	95.57	87.53	95.62	93.33	96.96
3,000	94.02	98.14	94.79	97.49	93.77	97.59	93.40	97.79	94.77	98.13	91.61	98.13	89.90	97.99	94.42	98.05
4,000	94.96	99.08	95.63	98.33	94.54	98.36	94.20	98.59	95.55	98.91	92.14	98.66	90.70	98.79	94.86	98.49
5,000	95.39	99.51	96.07	98.77	95.33	99.15	94.93	99.32	95.89	99.25	92.60	99.12	91.09	99.18	95.35	98.98
6,000	95.59	99.71	96.60	99.30	95.57	99.39	95.02	99.41	96.05	99.41	92.78	99.30	91.34	99.43	95.61	99.24
7,000	95.64	99.76	96.83	99.53	95.74	99.56	95.28	99.67	96.16	99.52	92.78	99.30	91.36	99.45	95.74	99.37
8,000	95.70	99.82	96.88	99.58	95.75	99.57	95.42	99.81	96.23	99.59	92.99	99.51	91.56	99.65	95.92	99.55
9,000	95.79	99.91	96.99	99.69	95.78	99.60	95.47	99.86	96.28	99.64	93.01	99.53	91.58	99.67	96.21	99.84
10,000	95.79	99.91	96.99	99.69	95.87	99.69	95.51	99.90	96.39	99.75	93.01	99.53	91.58	99.67	96.21	99.84
11,000	95.81	99.93	97.05	99.75	95.91	99.73	95.51	99.90	96.40	99.76	93.01	99.53	91.58	99.67	96.24	99.87
12,000	95.81	99.93	97.06	99.76	95.91	99.73	95.51	99.90	96.42	99.78	93.04	99.56	91.60	99.69	96.24	99.87
13,000	95.82	99.94	97.09	99.79	95.93	99.75	95.51	99.90	96.48	99.84	93.16	99.68	91.67	99.76	96.24	99.87
14,000	95.84	99.96	97.14	99.84	95.93	99.75	95.51	99.90	96.49	99.85	93.16	99.68	91.67	99.76	96.24	99.87
15,000	95.84	99.96	97.17	99.87	95.98	99.80	95.51	99.90	96.52	99.88	93.18	99.70	91.69	99.78	96.31	99.94
16,000	95.84	99.96	97.19	99.89	96.04	99.86	95.51	99.90	96.55	99.91	93.20	99.72	91.79	99.88	96.31	99.94
17,000	95.86	99.98	97.21	99.91	96.08	99.90	95.51	99.90	96.56	99.92	93.24	99.76	91.79	99.88	96.31	99.94
18,000	95.86	99.98	97.23	99.93	96.08	99.90	95.51	99.90	96.57	99.93	93.24	99.76	91.83	99.92	96.31	99.94
19,000	95.86	99.98	97.23	99.93	96.08	99.90	95.51	99.90	96.58	99.94	93.24	99.76	91.83	99.92	96.31	99.94
20,000	95.86	99.98	97.23	99.93	96.08	99.90	95.51	99.90	96.58	99.94	93.24	99.76	91.83	99.92	96.31	99.94
21,000	95.86	99.98	97.23	99.93	96.08	99.90	95.51	99.90	96.58	99.94	93.25	99.77	91.83	99.92	96.31	99.94
22,000	95.86	99.98	97.23	99.93	96.08	99.90	95.51	99.90	96.58	99.94	93.27	99.79	91.83	99.92	96.31	99.94
23,000	95.86	99.98	97.24	99.94	96.08	99.90	95.51	99.90	96.59	99.95	93.27	99.79	91.83	99.92	96.31	99.94
24,000	95.86	99.98	97.24	99.94	96.08	99.90	95.51	99.90	96.59	99.95	93.27	99.79	91.83	99.92	96.31	99.94
25,000	95.86	99.98	97.25	99.95	96.08	99.90	95.51	99.90	96.59	99.95	93.27	99.79	91.83	99.92	96.31	99.94
Total	5,532		9,095		8,540		4,258		12,487		5,662		5,140		3,852

Without knowledge of these lists, learners could reach a maximum coverage of only around 97%, even with all 25 levels. Therefore, learners needed to understand PNs, MWs, TCs, and Acronyms, in which case about 3,000 to 4,000 word families were sufficient to achieve 98% coverage.

Figure 1 shows the vocabulary demand for understanding 85%, 95%, and 98% of the listening parts of the eight textbook series. All of these series required only the first frequency level to reach 85% coverage. For the 95% threshold, C21 Smart required 2,000 to 3,000 word families, whereas all the remaining series were at the 1,000 to 2,000 levels of the BNC/COCA lists. Regarding 98% coverage, Bright, Friends Global, Global Success, and THiNK needed 2,000 to 3,000 word families, while the others needed 1,000 more, that is, 3,000 to 4,000 word families.

Figure 1.

Vocabulary needed to understand 85%, 95% and 98% listening parts of the eight textbook series.

RQ2: To What Extent Do the Listening Sections in These Eight Grade-10 English Textbook Series Vary?

Table 3 shows the correlations of text length with lexical sophistication and diversity. The relationship between text length and sophistication was negligible. For diversity, the three indices recommended by Zenker and Kyle (2021) were used: MATTR, HD-D, and MTLD. These are the three most reliable metrics for measuring lexical diversity across different text lengths.

Table 3.

Correlations Between Length, Lexical Sophistication, and Lexical Diversity Indices.

		Sophistication	Diversity
	Correlation	% 4–25k	MATTR50	HD-D	MTLD Original
Length	Pearson’s r	0.097	−0.029	0.486	0.128
	df	82	82	82	82
	p-value	.382	.795	<.001	.246

In Table 3, MATTR50 showed the weakest correlation with text length (r = −0.029), indicating that, of the three indices, it would reflect the lexical diversity of the textbooks most accurately and reliably, with the smallest text length effect. In addition, the p-value of MATTR50 was .795, far larger than .05. The correlation was therefore not significant; in other words, there was no evidence of a relationship between text length and lexical diversity as measured by MATTR50.

In Table 4, skewness and kurtosis values are reported to describe the shape of the distributions and thereby to choose among the three measures of central tendency (i.e., mean, median, and mode). Some of these values fell outside the interval [−2, 2] (Curran et al., 1996). Given the asymmetric histograms and these outlying values, parts of the data were not normally distributed. Therefore, a non-parametric test based on medians was run to make pairwise comparisons of the eight textbooks.

Table 4.

Descriptive Statistics of Listening Transcripts in the Eight English Textbooks Series on Length, Sophistication, and Diversity.

Series	Variable	Mean	Median	Min	Max	SD	Skewness	Kurtosis
A (Bright)	Length	691.500	654.500	533	941	140.2151	0.7468	−0.3905
	Sophistication	1.865	1.687	0.767	3.242	0.8153	0.5621	−0.2214
	Diversity	0.774	0.774	0.747	0.813	0.0231	0.4560	−0.7218
B (C21 Smart)	Length	909.500	907.000	597	1278	210.0298	0.4078	−0.0925
	Sophistication	2.360	1.806	0.366	4.281	1.3809	0.2119	−1.6409
	Diversity	0.762	0.758	0.733	0.784	0.0146	−0.3331	0.6076
C (English Discovery)	Length	854.000	964.000	189	1190	339.8523	−0.9263	−0.1548
	Sophistication	2.375	1.977	0.930	3.906	1.0106	0.4250	−1.0909
	Diversity	0.749	0.747	0.711	0.816	0.0321	0.9668	0.7972
D (Explore New Worlds)	Length	354.833	367.000	256	441	65.2448	−0.2719	−1.0853
	Sophistication	2.061	1.988	0.263	3.628	1.1142	−0.0781	−1.3476
	Diversity	0.731	0.727	0.617	0.793	0.0425	−1.6403	4.7425
E (Friends Global)	Length	1560.875	1527.500	1386	1960	182.6851	1.6788	3.3981
	Sophistication	1.866	2.008	0.924	2.613	0.6668	−0.3467	−1.8378
	Diversity	0.758	0.758	0.740	0.771	0.0111	−0.3905	−1.2892
F (Global Success)	Length	404.429	470.000	95	726	186.7292	−0.5423	−0.4411
	Sophistication	1.538	1.408	0.000	4.255	1.3136	0.4945	−0.4352
	Diversity	0.780	0.786	0.730	0.842	0.0298	0.0518	0.3001
G (i-Learn Smart World)	Length	467.273	430.000	185	1020	209.1206	1.9014	5.4751
	Sophistication	1.956	2.257	0.541	3.218	0.9341	−0.1040	−1.5614
	Diversity	0.775	0.780	0.727	0.812	0.0227	−0.6740	1.0627
H (THiNK)	Length	350.182	287.000	238	1037	230.3457	3.1859	10.3578
	Sophistication	1.795	2.101	0.000	3.689	1.0958	0.0444	−0.5800
	Diversity	0.740	0.740	0.717	0.769	0.0136	0.5130	1.2315

In Table 5, the Shapiro-Wilk test was run to check the normality of the data. For Sophistication, the test (W = 0.973, p = .072) indicated that the assumption of normality was not violated, whereas Length (W = 0.945, p = .001) and Diversity (W = 0.936, p < .001) departed from normality. Hence, the Kruskal-Wallis test was reported.

Table 5.

Shapiro-Wilk Test of Normality.

Variable	W	p
Length	0.945	.001
Sophistication	0.973	.072
Diversity	0.936	<.001

Because normality was violated, the non-parametric Kruskal-Wallis test was performed on the three constructs: length, sophistication, and diversity. Multiple comparisons were then conducted on the medians of the three constructs. As shown in Table 6, length and diversity differed significantly across the textbooks (p < .001 for both), whereas sophistication did not (p = .683).

Table 6.

Results of the Kruskal-Wallis Test.

Variable	χ²	df	p
Length	55.57	7	<.001
Sophistication	4.81	7	.683
Diversity	30.03	7	<.001
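For readers unfamiliar with the statistic in Table 6, the sketch below shows how a Kruskal-Wallis H value with tie correction and its degrees of freedom can be computed from rank sums. It is a minimal illustration under stated assumptions, not the routine used in Jamovi. The function name and the toy groups are hypothetical, and the p-value (which requires the chi-square distribution) is deliberately left out.

```python
def kruskal_wallis_h(groups):
    """Return (H corrected for ties, degrees of freedom) for k samples."""
    # Pool all observations, remembering which group each came from.
    pooled = sorted(
        (value, index) for index, values in enumerate(groups) for value in values
    )
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tied = j - i
        tie_term += tied ** 3 - tied
        for k in range(i, j):
            rank_sums[pooled[k][1]] += average_rank
        i = j
    h = 12.0 / (n * (n + 1)) * sum(
        r * r / len(g) for r, g in zip(rank_sums, groups)
    ) - 3.0 * (n + 1)
    correction = 1.0 - tie_term / (n ** 3 - n)
    if correction == 0:  # every observation identical: no differences at all
        return 0.0, len(groups) - 1
    return h / correction, len(groups) - 1


# Hypothetical, fully separated groups (not data from the study).
h_value, df = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h_value, 3), df)
```

With eight textbook series the degrees of freedom are 8 − 1 = 7, which matches the df column of Table 6.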

Because the Kruskal-Wallis tests were significant for length and diversity, Dwass-Steel-Critchlow-Fligner post-hoc results are reported in Table 7. Regarding length, D (Explore New Worlds), E (Friends Global), F (Global Success), G (i-Learn Smart World), and H (THiNK) were the five most noticeable series. The trio of D, E, and F differed significantly from A, B, and C. In addition to this trio, both G and H differed significantly from B and E. Two further pairs also showed significant differences in length: D versus E and E versus F.

Table 7.

Pairwise Comparisons Between the Listening Transcripts in the Eight English Textbooks.

		Length		Sophistication		Diversity
Series		W	p	W	p	W	p
A	B	3.457	.220	0.754	.999	−1.131	.993
A	C	1.885	.887	1.759	.919	−2.513	.636
A	D	−5.239	.005	0.327	1.000	−3.601	.176
A	E	4.753	.018	−0.149	1.000	−2.228	.765
A	F	−4.924	.012	−1.160	.992	0.772	.999
A	G	−3.972	.093	0.234	1.000	0.467	1.000
A	H	−4.204	.059	0.000	1.000	−4.437	.036
B	C	−0.107	1.000	0.535	1.000	−2.138	.802
B	D	−5.597	.002	−1.026	.996	−3.637	.166
B	E	5.026	.009	−0.628	1.000	−1.005	.997
B	F	−5.632	.002	−2.072	.826	2.567	.610
B	G	−4.682	.021	−1.394	.977	2.390	.694
B	H	−4.681	.021	−1.195	.991	−3.983	.091
C	D	−4.477	.033	−1.212	.990	−1.119	.994
C	E	5.026	.009	−1.508	.964	1.634	.944
C	F	−4.390	.040	−2.238	.761	3.230	.303
C	G	−3.885	.109	−1.295	.985	3.087	.362
C	H	−3.784	.130	−1.494	.966	−0.598	1.000
D	E	5.239	.005	−0.546	1.000	2.837	.478
D	F	2.474	.655	−1.820	.904	4.655	.022
D	G	2.961	.419	−0.522	1.000	4.352	.044
D	H	−2.960	.420	−0.957	.998	0.957	.998
E	F	−5.407	.003	−1.063	.995	2.896	.450
E	G	−5.140	.007	0.117	1.000	3.036	.385
E	H	−5.138	.007	−0.584	1.000	−3.503	.205
F	G	−0.465	1.000	1.472	.968	−1.084	.995
F	H	−1.781	.914	0.659	1.000	−4.336	.045
G	H	−3.438	.226	−0.882	.999	−4.411	.038

With regard to diversity, H stood out because it differed from more series than any other textbook. H was significantly different from three of the other seven textbooks, namely A, F, and G. Aside from H, D was the only textbook that differed significantly from F and G. Overall, H was noteworthy in terms of both length and diversity, while in respect of length the trio of D, E, and F was the most noteworthy.

Discussion

The analysis shows that the eight English textbook series had comparatively equivalent lexical demands at 85%, 95%, and 98% coverage of the listening transcripts. More specifically, these three coverage levels could correspond to three kinds of instruction in language learning. All of the textbooks appropriately facilitated students’ language-focused learning at around the 1,000 level. Nguyen (2020) showed that students could understand 87.1% of the earlier textbooks with words from the first 2,000 word families; compared with that result, the new series appear to have been adjusted to be more suitable and less difficult than the previous ones. By comparison, Yang and Coxhead (2022) found that the NCE textbooks in China needed the first three high-frequency levels for 85% coverage. It is evident that the textbooks in China were more difficult than the new textbooks in Vietnam.

At 95% coverage, the eight English textbook series were relatively equal. All of them required nearly 2,000 word families, except for C21 Smart, which required upwards of 2,000 word families. In the research of Nguyen (2020) and Le and Dinh (2022), the previous textbooks required students to master 3,000 word families. In essence, the Tieng Anh 10 textbook was more difficult than the eight new textbooks. Compared with the vocabulary demands reported by Sun and Dang (2020) and Yang and Coxhead (2022), Chinese students had to know 1,000 to 2,000 more word families than students using the eight textbooks in Vietnam. Taken together, these findings suggest that the vocabulary demands of the textbooks in Vietnam are lower than those in other EFL contexts, particularly China.

As regards optimal comprehension, the eight series fell into the range of 3,000 to 4,000 word families. More specifically, two main tendencies could be noticed: (1) four textbooks (i.e., Bright, Friends Global, Global Success, and THiNK) did not exceed 3,000 word families, whereas (2) the others were close to the fourth level. These results show that some series reached mid-frequency levels at 98% coverage of running words. The textbooks examined by Nguyen (2020), and more recently by Le and Dinh (2022), demanded up to 5,000 word families, so the vocabulary demands of the preceding series were 1.25 to 1.7 times higher than those of the new series. Similarly, Book 3 and Book 4 of the NCE textbooks in China demanded 5,000 and 6,000 word families respectively, and the Yilin textbooks had an unexpectedly high vocabulary demand of around 8,000 to 11,000 word families. It can be deduced that the new series of English textbooks in Vietnam are less lexically demanding than these textbooks.
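The ratio between the earlier and the new series follows directly from the figures in this paragraph, assuming a demand of 5,000 word families for the earlier textbooks and 3,000 to 4,000 word families for the new series:

```latex
\[
\frac{5{,}000}{4{,}000} = 1.25,
\qquad
\frac{5{,}000}{3{,}000} \approx 1.67
\]
```

The upper value is rounded to 1.7 in the text.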

As seen from the three thresholds above, the new series of English textbooks in Vietnam required a moderate vocabulary compared with textbooks in other contexts. The step from 95% to 98% coverage is also more realistic in the Vietnamese textbooks than in the others, which helps lessen the time and difficulty of vocabulary learning for students. Even so, the lexical demands of these textbooks are still very high for grade-10 students. Researching the receptive knowledge of university students in Vietnam, Dang (2020) pointed out that about one-half of the students had not mastered the first high-frequency word level and just one-tenth of them had mastered a maximum of 2,000 words. Her research focused on university students, whereas the present study targeted grade-10 students. It would therefore seem nearly impossible for grade-10 students to comprehend textbooks that are beyond the knowledge of even university students. Consequently, textbook writers and administrators should recognise the shortcomings of these new series and adjust them. One feasible solution is to apply a profiling program to control the vocabulary load of the textbooks. Furthermore, because English proficiency differs among students, textbook writers and administrators should conduct investigations and empirical studies to establish an appropriate range of vocabulary for students across the country. They should also adjust the difficulty of the textbooks to ensure equivalent outcomes among students.

As remarked above, the four supplementary lists are a decisive factor in the coverage of the textbooks since they reduce the word levels and the number of words that students actually need to learn. The reason is that even if these words are unfamiliar to students, they can be easily recognised or inferred. For instance, PNs can be recognised by their capitalisation (e.g., Jayden, Nguyen Thi Dinh, Phong Nha, Kyoto, China, Nestle, Facebook, Instagram, etc.). Aside from PNs, students can also recognise MWs (e.g., eww, err, haha, uh, uhm, etc.), familiar TCs (e.g., smartphones, ballgame, mindmap, etc.), and acronyms (e.g., CCTV, BTS, USB, etc.). Besides, most PNs were Vietnamese place names or personal names, so students could easily identify them owing to their familiarity. This finding is in line with that of Le and Dinh (2022). As a result, the supplementary lists greatly help students to comprehend the listening parts in the eight textbooks.

Other important findings on the lexical features of these textbooks were also detected. First, the most noticeable feature was the difference in text length between the textbooks. As reported in Tables 4 and 7, some texts contained fewer than 100 words, whereas some transcripts ran to more than 1,900 running words. These dramatic differences can cause students learning from different textbooks to diverge in knowledge acquisition (Mesmer & Hiebert, 2015). Specifically, longer texts take students more time and effort to comprehend, so students may be under more pressure and less productive in class. Moreover, longer texts may distract students’ attention from the main idea of the text and make it harder for them to remember messages while listening.

The next feature to be compared is the lexical sophistication of the textbooks. Because the differences were not significant, the proportion of advanced words in the textbooks can be considered equivalent. In both earlier studies in Vietnam, advanced words accounted for 3.4% of Tieng Anh 10 (Le & Dinh, 2022; Nguyen, 2020). Meanwhile, the lexical sophistication scores in the present study ranged from 1.7 to 2.3. The new series, hence, partially solves the problem of Tieng Anh 10 by substantially reducing the number of advanced words and focusing mostly on high-frequency words. Also, according to Nation (2022), a proportion of around 2% is the most effective for students’ vocabulary absorption and retention. In the Chinese context, the vocabulary demand of the Yilin textbooks stretched to the 11,000-word level, yet words at the 4,000 to 25,000 levels made up just 2.74% (Sun & Dang, 2020). There was thus only a slight disparity between the proportion of sophisticated words in the Yilin textbooks and that in the present study. One more interesting finding is that the sophisticated words in the Tieng Anh 10 textbook ended at the 16,000- or 17,000-word level, while those in some of the eight new series extended to the 25,000-word level. In such a case, even though the new series cut down the number of sophisticated words, rare and unusual words seem to occur more often than advanced words. Accordingly, the new textbooks still hinder students’ vocabulary learning.

Daller et al. (2003) held that lexical sophistication affects lexical variety in a text; in other words, the two are proportional to each other. It is somewhat surprising that in this study lexical diversity did not follow sophistication but instead showed the same tendency as length, in that it varied significantly between the eight textbooks. This finding suggests that although the eight textbooks offer an equivalent amount of advanced vocabulary, previously used word types are also repeated regularly. This can be explained by the presence of conversations in the listening sections. Kim (2002) demonstrated that repetition is a mechanism that enables speakers to maintain talk; therefore, the number of new types decreases as a conversation proceeds. Nonetheless, the lack of equivalence in lexical diversity between the eight textbooks may be detrimental to students’ vocabulary uptake and learning, and may also cause differences in students’ proficiency.

Given these lexical problems, textbook writers should re-examine the level of the words in these textbooks. They can refer to frequency lists such as the BNC/COCA lists to limit the occurrence of rare words. In listening sections especially, students are likely to be confused when they encounter rare words aurally. In addition, textbook writers should balance text lengths to reduce the risk that students are distracted during long listening passages. Teachers also hold a key position in the classroom. They should be aware of the target proficiency in order to plan their teaching methods and adapt the textbooks more effectively. A common recommendation is to use the Updated Vocabulary Levels Test (Webb et al., 2017) to evaluate students’ vocabulary size. Moreover, Younas and Dong (2024) reported that animated movies are considerably effective for linking vocabulary with its real-world use through visual and auditory channels. Hence, teachers can look for movies with a vocabulary range similar to that of these textbooks to increase their students’ exposure to newly acquired words. Apart from teaching the written and oral forms of vocabulary and providing context-based instruction, training metaphorical competence, which is the ability to identify and interpret metaphors in listening and reading, is also recommended for vocabulary development (Zhou et al., 2022). Additionally, teachers should be flexible by eliminating redundancy in the listening sections of each unit or textbook and focusing mostly on key lessons.

Conclusion

This study was undertaken to compare the vocabulary in eight English textbook series in Vietnam. It has provided insights into the soundness of randomising EFL textbooks, with a particular focus on the lexical features of their listening texts. The findings showed that at 85% coverage, students needed approximately 1,000 word families; for adequate comprehension at 95%, the figure rose to 2,000 to 3,000 word families; and for optimal comprehension at 98% coverage, 3,000 to 4,000 word families were required. These vocabulary demands included the four supplementary lists accompanying the BNC/COCA lists, and this held for all of the textbooks. Furthermore, although the differences in lexical sophistication between the textbooks were not significant, their lengths and lexical diversity varied significantly. In conclusion, despite their equivalence in lexical demands and sophistication, the significant differences in length and diversity are a considerable impediment to using these textbooks randomly. Besides, the lexical demands of these textbooks were still high, which may hamper students’ vocabulary learning.

Limitations and Directions for Future Research

Despite its contributions, this study has some limitations that need to be addressed in future research. First, the data source excluded Macmillan Move On because of limited access. Therefore, the researchers could not compare it with the other new textbooks or offer suggestions for students and teachers who use it. Future research can focus on this series to extend the literature on the vocabulary of Grade 10 English textbooks in Vietnam. Secondly, the study explored only the vocabulary coverage of the eight textbooks and did not test students’ vocabulary knowledge to identify the number of words they actually need. This opens an opportunity for research examining the alignment between the lexical demands of the textbooks and students’ knowledge using the Updated Vocabulary Levels Test (Webb et al., 2017) or the New Vocabulary Levels Test (McLean & Kramer, 2015). Another research direction, and the one of most concern to the researchers, is the progression of knowledge in textbooks across grades and levels, which has not yet been investigated. The lexical difficulty of textbooks for subsequent grades should increase gradually to guarantee students’ progress. Future studies can aim to compare the vocabulary in EFL textbooks for different grades.

Footnotes

Acknowledgements

The authors would like to sincerely thank Mr. Hung Tan Ha, Victoria University of Wellington, New Zealand for his great support to this research project.

Authors’ Contribution

The authors declare that this is their original work, except where proper citations are made. It is not considered for publication anywhere else.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research is funded by University of Economics Ho Chi Minh City, Vietnam.

Ethical Approval

Ethical approval is not applicable for this article.

Statement of Human and Animal Rights

This article does not contain any studies with human or animal subjects.

Statement of Informed Consent

There are no human subjects in this article and informed consent is not applicable.

ORCID iDs

Nhi Hoa Mai

Nam Nhat Lien

Nguyen Huynh Trang

Data Availability Statement

The data that support the findings of this study are openly available in Figshare at .

References

1. Alsaif & Milton (2012). Vocabulary input from school textbooks as a potential contributor to the small vocabulary uptake gained by English as a foreign language learners in Saudi Arabia. The Language Learning Journal, 40(1), 21–33. https://doi.org/10.1080/09571736.2012.658221

2. Anthony (2023). AntWordProfiler (Version 2.1.0) [Computer software]. Waseda University. https://www.laurenceanthony.net/software

3. Aziez (2018). The vocabulary input of Indonesia’s English textbooks and national examination texts for junior and senior high schools. TESOL International Journal, 13(3), 66–67.

4. Bauer & Nation (1993). Word families. International Journal of Lexicography, 6(4), 253–279. https://doi.org/10.1093/ijl/6.4.253

5. Bộ Giáo dục và Đào tạo [MoET] (2018a). Chương trình giáo dục phổ thông: Chương trình tổng thể [General School Education Curriculum] (Circular No. 32/2018/TT-BGDĐT).

6. Bộ Giáo dục và Đào tạo [MoET] (2018b). Chương trình giáo dục phổ thông: Chương trình môn tiếng Anh [General Education English Curriculum] (Circular No. 32/2018/TT-BGDĐT).

7. Bộ Giáo dục và Đào tạo [MoET] (2018c). Chương trình giáo dục phổ thông làm quen tiếng Anh lớp 1 và lớp 2 [General School Education Introductory English Curriculum for Grade 1 and Grade 2] (Circular No. 32/2018/TT-BGDĐT).

8. Bộ Giáo dục và Đào tạo [MoET] (2020). Thông tư về Quy định việc Lựa chọn Sách giáo khoa trong cơ sở Giáo dục phổ thông [Circular on the selection of textbooks in general education institutions] (Circular No. 25/2020/TT-BGDĐT).

9. Bộ Giáo dục và Đào tạo [MoET] (2022). Danh mục sách giáo khoa lớp 10 sử dụng trong cơ sở giáo dục phổ thông [List of grade-10 textbooks used in general education institutions] (Decision No. 442/QĐ-BGDĐT).

10. Bui, T. L. D., & Newton (2022). Developing task-based lessons from PPP lessons: A case of primary English textbooks in Vietnam. RELC Journal, 53(1), 203–215. https://doi.org/10.1177/0033688220912040

11. Cheng, Matthews, Lange, & McLean (2022). Aural single-word and aural phrasal verb knowledge and their relationships to L2 listening comprehension. TESOL Quarterly, 57(1), 213–241. https://doi.org/10.1002/tesq.3137

12. Cobb (2008). Web Vocabprofiler (Version 2.6) [Computer software].

13. Covington, M. A., & McFall, J. D. (2010). Cutting the Gordian knot: The moving-average type-token ratio (MATTR). Journal of Quantitative Linguistics, 17(2), 94–100. https://doi.org/10.1080/09296171003643098

14. Council of Europe (2001). Common European framework of reference for languages: Learning, teaching, assessment. Cambridge University Press. http://www.coe.int/t/dg4/education/elp/elpreg/Source/Key_reference/CEFR_EN.pdf

15. Crossley, S. A., & McNamara (2010). Cohesion, coherence, and expert evaluations of writing proficiency. In Catrambone & Ohlsson (Eds.), Proceedings of the 32nd annual conference of the cognitive science society (pp. 984–989).

16. Crystal (2012). English as a global language (2nd ed.). Cambridge University Press.

17. Csikszentmihalyi (1990). The psychology of optimal experience (1st ed.). Harper & Row Publishers.

18. Curran, P. J., West, S. G., & Finch, J. F. (1996). The robustness of test statistics to nonnormality and specification error in confirmatory factor analysis. Psychological Methods, 1(1), 16–29. https://doi.org/10.1037/1082-989X.1.1.16

19. Daller, van Hout, & Treffers-Daller (2003). Lexical richness in the spontaneous speech of bilinguals. Applied Linguistics, 24(2), 197–222. https://doi.org/10.1093/applin/24.2.197

20. Dang, T. C. T., & Seals (2016). An evaluation of primary English textbooks in Vietnam: A sociolinguistic perspective. TESOL Journal, 9(1), 93–113. https://doi.org/10.1002/tesj.309

21. Dang, T. N. Y. (2020). Vietnamese non-English majored EFL university students’ receptive knowledge of the most frequent English words. VNU Journal of Foreign Studies, 36(3), 1–11. https://repository.vnu.edu.vn/bitstream/VNU_123/89354/1/VIETNAMESE%20NON-ENGLISH%20MAJORED%20EFL.pdf

22. Dunkel (1991). Listening in the native and second/foreign language: Toward an integration of research and practice. TESOL Quarterly, 25(3), 431–457. https://doi.org/10.2307/3586979

23. Forrin, N. D., Mills, D’Mello, S. K., Risko, E. F., Smilek, & Seli (2020). TL;DR: Longer sections of text increase rates of unintentional mind-wandering. The Journal of Experimental Education, 89(2), 278–290. https://doi.org/10.1080/00220973.2020.1751578

24. Forrin, N. D., Risko, E. F., & Smilek (2018). In the eye of the beholder: Evaluative context modulates mind-wandering. Acta Psychologica, 185, 172–179. https://doi.org/10.1016/j.actpsy.2018.02.005

25. Gregory (1978). Statistical methods and the geographer (1st ed.). Routledge. https://doi.org/10.4324/9781315837185

26. Ha, H. T. (2021). Exploring the relationships between various dimensions of receptive vocabulary knowledge and L2 listening and reading comprehension. Language Testing in Asia, 11, Article 20. https://doi.org/10.1186/s40468-021-00131-8

27. Hashimoto & Egbert (2019). More than frequency? Exploring predictors of word difficulty for second language learners. Language Learning, 69(4), 839–872. https://doi.org/10.1111/lang.12353

28. Ha, H. T. (2022a). Vocabulary demands of informal spoken English revisited: What does it take to understand movies, TV programs, and soap operas? Frontiers in Psychology, 13, 831684. https://doi.org/10.3389/fpsyg.2022.831684

29. Ha, H. T. (2022b). Lexical profile of newspapers revisited: A corpus-based analysis. Frontiers in Psychology, 13, 800983. https://doi.org/10.3389/fpsyg.2022.800983

30. Heatley, Nation, I. S. P., & Coxhead (2002). Range: A program for the analysis of vocabulary in texts [Computer software]. https://www.wgtn.ac.nz/lals/resources/paul-nations-resources/vocabulary-analysis-programs

31. Hoang, V. V. (2016). Renovation in curriculum design and textbook development: An effective solution to improving the quality of English teaching in Vietnamese schools in the context of integration and globalization. VNU Journal of Science: Education Research, 32(4), 9–20. https://doi.org/10.25073/2525-2445/vnufs.4866

32. Hoang, V. V. (2018). MoET’s three pilot English language communicational curricula for schools in Vietnam: Rationale, design and implementation. VNU Journal of Foreign Studies, 34(2), 1–25. https://doi.org/10.25073/2525-2445/vnufs.4258

33. Hoang, V. V. (2020). The roles and status of English in present-day Vietnam: A socio-cultural analysis. VNU Journal of Foreign Studies, 36(1), 4495. https://doi.org/10.25073/2525-2445/vnufs.4495

34. Hoang, V. V. (2022). Interpreting MOET’s 2018 general education English curriculum. VNU Journal of Foreign Studies, 38(5), 4866. https://doi.org/10.25073/2525-2445/vnufs.4866

35. Jarvis (2013). Capturing the diversity in lexical diversity. Language Learning, 63(1), 87–106. https://doi.org/10.1111/j.1467-9922.2012.00739.x

36. Johnson (2009). The rise of English: The language of globalization in China and the European Union. Macalester International, 22, 131–168.

37. Kim (2002). The form and function of next-turn repetition in English conversation. Language Research, 38(1), 51–81.

38. Kim, Crossley, S. A., & Kyle (2018). Lexical sophistication as a multidimensional phenomenon: Relations to second language lexical proficiency, development, and writing quality. The Modern Language Journal, 102(1), 120–141. http://www.jstor.org/stable/44981049

39. Koizumi (2012). Relationships between text length and lexical diversity measures: Can we use short texts of less than 100 tokens? Vocabulary Learning and Instruction, 1(1), 60–69. https://doi.org/10.7820/vli.v01.1.koizumi

40. Kyle & Crossley, S. A. (2015). Automatically assessing lexical sophistication: Indices, tools, findings, and application. TESOL Quarterly, 49(4), 757–786. http://www.jstor.org/stable/43893786

41. Kyle, Crossley, S. A., & Jarvis (2021). Assessing the validity of lexical diversity using direct judgements. Language Assessment Quarterly, 18(2), 154–170. https://doi.org/10.1080/15434303.2020.1844205

42. Lau, K. H., Lam, Kam, B. H., Nkhoma, Richardson, & Thomas (2018). The role of textbook learning resources in e-learning: A taxonomic study. Computers & Education, 118. https://doi.org/10.1016/j.compedu.2017.11.005

43. Laufer (1989). What percentage of text lexis is essential for comprehension? In Lauren & Nordman (Eds.), Special language: From humans thinking to thinking machines (pp. 316–323). Multilingual Matters.

44. Laufer (1995). Beyond 2000: A measure of productive lexicon in a second language. In Eubank, Selinker, & Sharwood Smith (Eds.), The current state of interlanguage: Studies in honor of William E. Rutherford (pp. 265–272). John Benjamins.

45. Laufer & Ravenhorst-Kalovski, G. C. (2010). Lexical threshold revisited: Lexical text coverage, learners’ vocabulary size and reading comprehension. Reading in a Foreign Language, 22(1), 15–30.

46. Laufer (2013). Lexical thresholds for reading comprehension: What they are and how they can be used for teaching purposes. TESOL Quarterly, 47(4), 867–872.

47. Le, N. T. M., & Dinh, H. T. (2022). Vocabulary coverage in a high school Vietnamese EFL textbook: A corpus-based preliminary investigation. Vietnam Journal of Education, 6(2), 102–113. https://doi.org/10.52296/vje.2022.1877

48. Matthews & Cheng (2015). Recognition of high frequency words from speech as a predictor of L2 listening comprehension. System, 52, 1–13. http://doi.org/10.1016/j.system.2015.04.015

49. May (2001). Language and minority rights: Ethnicity, nationalism, and the politics of language. Longman.

50. McCarthy, P. M., & Jarvis (2007). Vocd: A theoretical and empirical evaluation. Language Testing, 24(4), 459–488. https://doi.org/10.1177/0265532207080767

51. McCarthy, P. M., & Jarvis (2010). MTLD, vocd-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment. Behavior Research Methods, 42(2), 381–392. https://doi.org/10.3758/brm.42.2.381

52. McLean (2021). The coverage comprehension model, its importance to pedagogy and research, and threats to the validity with which it is operationalized. Reading in a Foreign Language, 33(1), 126–140. http://nflrc.hawaii.edu/rfl

53. Mesmer, H. A., & Hiebert, E. H. (2015). Third graders’ reading proficiency reading texts varying in complexity and length: Responses of students in an urban, high-needs school. Journal of Literacy Research, 47(4), 473–504. https://doi.org/10.1177/1086296X16631923

54. Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. The Psychological Review, 63(2), 81–97. Harvard University.

55. Moghadam, S. H., Zainal, & Ghaderpour (2012). A review on the important role of vocabulary knowledge in reading comprehension performance. Procedia - Social and Behavioral Sciences, 66, 555–563. https://doi.org/10.1016/j.sbspro.2012.11.300

56. Nation, I. S. P. (2012). The BNC/COCA word family lists. http://www.victoria.ac.nz/lals/about/staff/paul-nation

57. Nation, I. S. P. (2017). The BNC/COCA Level 6 word family lists (Version 1.0.0) [Data file]. http://www.victoria.ac.nz/lals/staff/paul-nation.aspx

58. Nation, I. S. P. (2022). Learning vocabulary in another language (3rd ed.). Cambridge University Press.

59. Nguyen, T. T. M. (2007). Textbook evaluation: The case of English textbooks currently in use in Vietnam’s upper-secondary schools College of Foreign Languages [Unpublished research report]. RELC SEAMEO. https://doi.org/10.13140/RG.2.1.1219.2165

60. Nguyen, M. T. T. (2011). Learning to communicate in a globalized world: To what extent do school textbooks facilitate the development of intercultural pragmatic competence? RELC Journal, 42(1), 17–30. https://doi.org/10.1177/0033688210390265

61. Nguyen, C.-D. (2020). Lexical features of reading passages in English-language textbooks for Vietnamese high-school students: Do they foster both content and vocabulary gain? RELC Journal, 52(3), 509–522. https://doi.org/10.1177/0033688219895045

62. Nu, T. A. T. (2018). Pragmatic input in newly-published national English textbooks for Vietnamese students [Master’s thesis]. Macquarie University. https://doi.org/10.25949/19433474.v1

63.

Quốc hội nước Cộng hòa Xã hội Chủ nghĩa Việt Nam [The National Assembly of the Socialist Republic of Vietnam] (2019). Luật Giáo dục [Education Law].

64.

Read

(2000). Assessing Vocabulary. Cambridge University Press.

65.

Rost

(2002). Teaching and Researching: Listening. Routledge. https://doi.org/10.4324/9781315833705

66.

Richards

J. C.

(2001). Curriculum development in language teaching. Cambridge University Press. http://dx.doi.org/10.1017/CBO9780511667220

67.

Schmitt

Cobb

Horst

Schmitt

(2017). How much vocabulary is needed to use English? Replication of van Zeeland & Schmitt (2012), Nation (2006) and Cobb (2007). Language Teaching, 50(2), 212–226. https://doi.org/10.1017/s0261444815000075

68.

Schmitt, Jiang, & Grabe (2011). The percentage of words known in a text and reading comprehension. The Modern Language Journal, 95(1), 26–43. https://doi.org/10.1111/j.1540-4781.2011.01146.x

69.

Schmitt (2014). A reassessment of frequency and vocabulary size in L2 vocabulary teaching. Language Teaching, 47(4), 484–503. https://doi.org/10.1017/S0261444812000018

70.

Sun, & Dang, T. N. Y. (2020). Vocabulary in high-school EFL textbooks: Texts and learner knowledge. System, 93, 102279. https://doi.org/10.1016/j.system.2020.102279

71.

The jamovi project (2023). jamovi (Version 2.4) [Computer software]. https://www.jamovi.org

72.

Thủ tướng Chính phủ [The Prime Minister] (2008). Quyết định về việc phê duyệt Đề án dạy và học ngoại ngữ trong hệ thống giáo dục quốc dân giai đoạn 2008 – 2020 [Decision on the approval of the project “Teaching and learning foreign languages in the national education system for the period of 2008 – 2020”] (Decision No. 1400/QĐ-TTg).

73.

Thủ tướng Chính phủ [The Prime Minister] (2017). Quyết định phê duyệt, điều chỉnh, bổ sung Đề án dạy và học ngoại ngữ trong hệ thống giáo dục quốc dân giai đoạn 2017 – 2025 [Decision on the approval, adjustment and supplementation of the project “Teaching and learning foreign languages in the national education system for the period of 2017 – 2025”] (Decision No. 2080/QĐ-TTg).

74.

(2018). Linguistic complexity analysis: A case study of commonly-used textbooks in Vietnam. SAGE Open, 8(3), 7586. https://doi.org/10.1177/2158244018787586

75.

Ton Nu, A. T., & Murray (2020). Pragmatic content in EFL textbooks: An investigation into Vietnamese national teaching materials. TESL-EJ, 24(3), 1–28. http://www.tesl-ej.org/wordpress/issues/volume24/ej95/ej95a8

76.

Trang, N. H., Nguyen, D. T. B., & H. T. (2023). Vocabulary demands of academic spoken English revisited: A case of university lectures and TED presentations. SAGE Open, 13(1), 5334. https://doi.org/10.1177/21582440231155334

77.

van Zeeland & Schmitt (2013). Lexical coverage in L1 and L2 listening comprehension: The same or different from reading comprehension? Applied Linguistics, 34(4), 457–479. https://doi.org/10.1093/applin/ams074

78.

Vu, D. V., & Peters (2021). Vocabulary in English language learning, teaching, and testing in Vietnam: A review. Education Sciences, 11(9), 563. https://doi.org/10.3390/educsci11090563

79.

Webb & Nation (2008). Evaluating the vocabulary load of written text. TESOLANZ Journal, 16, 1–9. https://doi.org/10.26686/wgtn.12552152.v1

80.

Webb (2009). The effects of pre-learning vocabulary on reading comprehension and writing. The Canadian Modern Language Review, 65(3), 441–70.

81.

Webb, Sasao, & Ballance (2017). The updated vocabulary levels test: Developing and validating the VLT. ITL - International Journal of Applied Linguistics, 168(1), 33–69. https://doi.org/10.1075/itl.168.1.02web

82.

Wolf, M. C., Muijselaar, M. M. L., Boonstra, A. M., & de Bree, E. H. (2018). The relationship between reading and listening comprehension: Shared and modality-specific components. Reading and Writing, 32, 1747–1767. https://doi.org/10.1007/s11145-018-9924-8

83.

Yang & Coxhead (2022). A corpus-based study of vocabulary in the New Concept English textbook series. RELC Journal, 53(3), 597–611. https://doi.org/10.1177/0033688220964162

84.

Younas & Dong (2024). The impact of using animated movies in learning English language vocabulary: An empirical study of Lahore, Pakistan. Sage Open, 14(2), 8398. https://doi.org/10.1177/21582440241258398

85.

Yule, G. U. (1944). Reginald Hawthorn Hooker, M.A. Journal of the Royal Statistical Society, 107(1), 74–77. http://www.jstor.org/stable/2981362

86.

Zenker & Kyle (2021). Investigating minimum text lengths for lexical diversity indices. Assessing Writing, 47, Article 100505. https://doi.org/10.1016/j.asw.2020.100505

87.

Zhou, Younas, Omar, & Guan (2022). Can second language metaphorical competence be taught through instructional intervention? A meta-analysis. Frontiers in Psychology, 13, 1065803. https://doi.org/10.3389/fpsyg.2022.1065803