Abstract

This study expands arguments calling for a more rigorous approach to high-frequency vocabulary list-based learning in EFL learning environments. Test and flashcard item designs were validated through quantitative (midterm and final test) and qualitative (survey) results to explore the impact of digital flashcards designed to build recall-level comprehension on both timed gap-fill and traditional multiple-choice posttests. Quizlet was chosen as the platform due to the affordance it provides teachers to create flashcard content and monitor practice. The results showed that multiple-choice, recognition-level test items result in a 20% overestimation of knowledge relative to gap-fill posttests. Additionally, a post-semester survey of 138 Japanese, pre-intermediate students of English showed a highly positive response to the recall-focused practice and testing system. The results demonstrate that for high-frequency L2 vocabulary, a paradigm shift from form-meaning recognition to form-meaning recall is an important direction for high-frequency vocabulary instruction and testing.

Plain language summary

English as a Foreign Language (EFL) instructors face the dilemma of not only choosing which vocabulary items to teach their students but also which level of knowledge to target for practice and assessment. To address this dilemma, this study examines the design, implementation, and evaluation of a Quizlet vocabulary curriculum for a pre-intermediate English course at International University (IU) (pseudonym) in Japan, specifically investigating how a Quizlet vocabulary curriculum can promote recall level knowledge on high-frequency vocabulary lists. Research was conducted to investigate the impact of digital flashcards via Quizlet designed to improve students’ ability to recall specific vocabulary words as opposed to choosing the vocabulary word from a set of multiple-choice options. Quizlet was selected as the tool because it allows instructors to make flashcard material and track student progress. The results of the study demonstrated that compared to recall test items, multiple-choice test questions resulted in a 20% overestimation of new vocabulary knowledge. In addition, 138 Japanese pre-intermediate English language learners responded very favorably to the recall-focused practice and assessment method in a post-semester survey. The findings show that test items for vocabulary tests should change from multiple choice to recall test items to promote a deeper knowledge of high frequency words.

Keywords

digital flashcard design, vocabulary acquisition, Quizlet, New General Service List, high-frequency vocabulary

Introduction

One challenge facing English as a Foreign Language (EFL) teachers is determining not only which lexical items to teach their students, but also the depth of knowledge to target for assessment and practice. Nation (2013) argues that, to maximize learning, EFL teachers should ensure that high-frequency vocabulary is chosen and that most items are unfamiliar to students in the class. Accordingly, intentional study of high-frequency word lists through online apps and websites such as Quizlet, Memrise, and Anki has increased in language classes over the last 10 years (Tsai & Tsai, 2018). Additionally, studying corpus-based high-frequency word lists via an app or online program is becoming the standard for many English language programs (Golonka et al., 2014).

However, in many of these online programs, vocabulary is tested through form-meaning recognition and multiple-choice questions. Recent research has shown the weakness of form-meaning recognition test items (Gyllstad et al., 2015; Kremmel & Schmitt, 2016; Schmitt et al., 2020; Stewart, 2014; Stoeckel & Sukigara, 2018) and suggests that form recall items might be a more valid way to test students’ understanding of word list items, because they not only promote a deeper knowledge of the word but also more accurately mimic the skill needed in reading (Kremmel & Schmitt, 2016; McLean et al., 2020; Nation & Webb, 2011; Stewart, 2014; Stoeckel et al., 2019). However, few studies have explored the use of online applications such as Quizlet specifically for word list recall study. Moreover, analysis of the theoretical framework of L2 vocabulary acquisition is also lacking in existing studies (Yang et al., 2021). To address this gap, this study examines the design, implementation, and evaluation of a Quizlet recall vocabulary curriculum for a pre-intermediate English course at International University (IU) (pseudonym) in Japan, specifically investigating how a Quizlet vocabulary curriculum can promote recall-level knowledge on high-frequency vocabulary lists. Discussion of the results will include recommendations for online vocabulary study from high-frequency word lists, flashcard design, and future research directions.

Literature Review

This paper first summarizes prior studies on word lists, vocabulary recall, and recognition test items. Next, the efficacy of utilizing digital flashcards and Quizlet for vocabulary learning is discussed.

Word Lists

For most language programs, the most common factor utilized in the selection of word lists has been the frequency of lexical elements in the English language. By studying from a high-frequency word list, students focus on the most important and useful words that they will encounter in English. The first English high-frequency word list was the General Service List (GSL) (West, 1953). However, the GSL has become outdated due to the rapid expansion of enormous digital corpora and changing perceptions about what constitutes a “word” (Stoeckel & Bennett, 2015). Recently, the GSL has been updated and renamed the New General Service List (NGSL). Developed through an analysis of the Cambridge English Corpus, it comprises 2,800 lexical items that provide 92% coverage of the words students are likely to ever meet in the average newspaper, book, magazine, TV show, movie, or daily speech (Browne, 2013).

Different Types of Knowledge of a Word

Laufer and Goldstein’s (2004) research elucidates two critical issues for assessing word meaning knowledge: (a) the aspect of lexical information tested, word meaning (supplying the L2 meaning from the given form) or word form (supplying the L2 form for the given meaning), and (b) whether the learner must retrieve this knowledge from memory (recall) or simply recognize it from a list of options (recognition). Four types of form-meaning knowledge may be derived from these two dichotomies and Laufer and Goldstein (2004) have created a hierarchy from easiest to hardest to learn. In the hierarchy, the two recognition levels are considered easier than the two recall levels because the learner only has to recognize the meaning or the form from a set of options, not supply the meaning or the form as in recall. Examples of these concepts, updated in Stoeckel et al. (2019), employed in this study’s curriculum, are shown in Figure 1.

Figure 1.

Levels of vocabulary comprehension (based on a table in Stoeckel et al., 2019).

Criticisms of Meaning and Form Recognition Test Items

Recent scholarship has identified issues with meaning and form-recognition test items. First, fluent reading necessitates quick detection of the word form and retrieval of the appropriate meaning (Grabe, 2009). However, this is not tested in meaning-recognition test items because words that may not be known when reading are potentially recognized by the L2 learner from a list of options (Laufer & Goldstein, 2004; Nation, 2013; Schmitt, 2014; Stoeckel et al., 2019, 2021). Second, random guessing and test strategies can aid in answering meaning-recognition questions correctly (Gyllstad et al., 2015; Kremmel & Schmitt, 2016; Schmitt et al., 2020; Stewart, 2014; Stoeckel & Sukigara, 2018). Many studies have confirmed that recall test items are more congruous with the element of lexical knowledge needed in reading because, when reading, the L2 learner summons the meaning of the word from memory, without referring to multiple-choice options (Kremmel & Schmitt, 2016; McLean et al., 2020; Nation & Webb, 2011; Stewart, 2014; Stoeckel et al., 2019).

Research has shown that item format can greatly affect the overestimation of vocabulary. One of the weaknesses of meaning recognition items mentioned above, random guessing and test-taking strategies, can ultimately lead to the overestimation of lexical knowledge. Research in the last 5 years related to estimates of vocabulary size (the number of word families a learner knows) has shown that meaning recognition tests can overestimate vocabulary size by as little as 13.9% (Stoeckel & Sukigara, 2018) or by as much as 62.7% (Gyllstad et al., 2019). Other studies have predicted around 40% overestimation (Stoeckel et al., 2019; Stoeckel & Sukigara, 2018). In Stoeckel et al.’s (2019) study on the overestimation of student vocabulary when employing the Vocabulary Size Test (Nation & Beglar, 2007), it was found that when vocabulary size tests employ meaning recognition items, the learners’ vocabulary size can be overestimated by 1,000 words or more.

Furthermore, while language testing has become more rigorous in the last decade, many vocabulary tests are still designed with a lack of validation evidence and an over-reliance on meaning recognition items. A recent paper written by three renowned vocabulary specialists, Schmitt et al. (2020), argues for more systematic procedures for vocabulary test creation. They emphasize that test designers must first decide which level of mastery of the word they want students to attain and then, when designing the test, be aware of item format weaknesses. This argument for more rigorous procedures can also be applied to online vocabulary learning and digital flashcard creation, which will be discussed below.

Online Vocabulary Learning

For vocabulary study, L2 learners have access to an ever-growing range of online learning resources. One way of assisting learners in acquiring a command of vocabulary is to employ online programs, which often use gamification to keep learners systematically engaged. For example, digital flashcards have emerged over the past decade as one alternative method for improving student vocabulary acquisition. While there are several online vocabulary study programs on the market today, each with its own set of features and claims to performance, the present study focuses on Quizlet, a free online program with flashcards, games, and other learning tools for vocabulary learning. The website claims there are over 60 million monthly users and over 500 million different types of flashcard sets to supplement or replace textbook learning. The usefulness of Quizlet in improving vocabulary acquisition has been validated in several studies. Dizon (2016) reported substantial learning outcomes and positive student opinions of Quizlet after using it to facilitate Academic Vocabulary List (AVL) study for Japanese EFL learners. Other research has correlated the positive effects of Quizlet with receptive vocabulary acquisition and learner autonomy (Aksel, 2021; Cunningham, 2017; Sanosi, 2018; Waluyo & Bakoko, 2021; Wright, 2016).

In addition to promoting vocabulary learning, research has shown that students also prefer learning vocabulary through Quizlet. Bueno-Alastuey and Nemeth (2022) found that university students favored Quizlet over other vocabulary systems because they felt it was an effective way to acquire new vocabulary items. Consistent with those findings, C. W. Chien’s (2015) comparison of several online digital flashcard programs (Study Stack, Cram.com, and Quizlet) revealed that participants preferred Quizlet, mainly because its gamification offers many modes and activities for study. Specifically, students reported that they liked the typing and spelling option because it offers more engagement than a traditional flashcard (C. W. Chien, 2015), which can help them remember the word and how to spell it better (Altiner, 2011). In another study, C. Chien (2013) specifically investigated which modes of Quizlet Taiwanese EFL students prefer and why. Out of the 76 participants, 21 students picked Speller (now included in Learn) as their most effective vocabulary-learning tool, while 37 students preferred Space Race (renamed Gravity). Students liked Speller because it allowed them to develop their listening and spelling abilities, while with Space Race, students discovered that they could not only improve their typing abilities but also recall vocabulary phrases.

Although Quizlet’s efficacy has already been shown by the quasi-experimental studies discussed above, research is still scant, and the research that has been conducted lacks theoretical underpinnings (Yang et al., 2021) or control groups (Nguyen & Le, 2023). Yang et al.’s (2021) systematic review of theoretical trends in technology-integrated L2 vocabulary research found that recent empirical studies lack a clear theoretical basis and suggested that online flashcard L2 vocabulary curricula should place a strong emphasis on having a clear theoretical justification. Among the various theoretical frameworks, the constructivist approach is important when designing high-frequency vocabulary learning systems because this approach places importance on prior vocabulary knowledge (Yang et al., 2021), so that students only spend time learning individualized unfamiliar words. Additionally, sociocognitive theory and washback theory are also important theoretical frameworks underlying online vocabulary learning. Included in sociocognitive theory is the notion that the teacher plays an important role in the creation of the learning environment (Unrau et al., 2018), while positive washback has been defined as test-design driven, constructive alignment that impacts the way teachers teach and learners learn, such that there is greater validity in the eyes of various stakeholders in a learning community (Shohamy, 1993).

Most studies of vocabulary acquisition that involve Quizlet lack control groups and investigation of students’ prior vocabulary knowledge, and mainly investigate students’ perceptions of Quizlet. One study did in fact use a control group with a pretest-posttest design and semi-structured interviews to determine the effect that working memory and prior vocabulary knowledge had on retention of vocabulary learned via Quizlet with 68 Vietnamese undergraduate participants (Nguyen & Le, 2023). Results from the quasi-experiment and semi-structured interviews demonstrated that students who used Quizlet to review vocabulary significantly improved their ability to recall the L1 meanings of the target terms (Nguyen & Le, 2023).

In summary, a large and expanding body of research has shown how utilizing digital flashcards may help EFL students learn vocabulary items. However, research on the topic has mostly been limited to employing qualitative approaches to describe learners’ preference for online vocabulary programs. Less is known about learners’ perspectives of a recall-focused vocabulary curriculum or about designing digital flashcards for recall word knowledge. Additionally, as mentioned previously, research has demonstrated the weakness of recognition test items for vocabulary study (Laufer & Goldstein, 2004; Nation, 2013; Schmitt, 2014; Stoeckel et al., 2019, 2021); however, curriculum and test designers continue to use those test items. The present study aims to fill these gaps by investigating the overestimation of lexical knowledge and detailing the design, implementation, and evaluation of a recall vocabulary curriculum via Quizlet.

Research Questions

Based on the theoretical background discussed above, the present study seeks to answer the following research questions:

Do form-recognition, multiple-choice test items overestimate the knowledge of high-frequency vocabulary items when compared with recall level items, and if so, by how much?

Which Quizlet modes do students perceive as the most useful?

How do students perceive a recall-focused flashcard and testing system based on Quizlet?

Methods

Research Context

The study took place at International University (pseudonym), a mid-sized private university in Japan, where students are required to take EFL courses up to an upper intermediate level (B1−B1+ on the Common European Framework of Reference for Languages (CEFR)). After completing the English program, students must complete 20 credits of content courses taught in English. Therefore, students must acquire high proficiency in academic English. The participants were 244 students (220 domestic Japanese and 24 from other Asian countries) enrolled in the pre-intermediate English level at IU, which is benchmarked at A2 on the CEFR scale. The course was 15 weeks long, and classes met twice per week for 95 min. Most students were in their first year, although some were second- or third-year students.

Vocabulary Study

The NGSL word list was chosen as an official word list of IU’s English program because of its high percentage of lexical word coverage. The decision to employ NGSL lists was one part of a much more in-depth assurance of learning validation process. For IU’s curriculum, the NGSL list was broken down into manageable sub-lists of about 500 terms per 15-week semester for more in-depth study. A diagnostic test, the New General Service List Test (NGSLT), was conducted to allocate specific NGSL bands to each level (Stoeckel & Bennett, 2015). Based on the results, the pre-intermediate level was allocated the NGSL words from #1301 to #1800 (500 words). The pre-intermediate level was selected purposively as a site to pilot Quizlet (in contrast to other levels of the program, which employed other online systems) because Quizlet afforded more control over the content of word lists, the design of flashcards, and their alignment with in-house tests.

Design of the Quizlet Vocabulary Curriculum

Once the NGSL list and specific band were chosen, a vocabulary curriculum and study program needed to be created to ensure that students reached the objectives of the program, which included mastering vocabulary both receptively and productively. The vocabulary program consisted of two types of Quizlet flashcards for each NGSL word (divided into 10 sets of 50 words), weekly practice exercises on the university learning management system, a form-recall-focused midterm, and a final test based upon the target Quizlet flashcards. Quizlet was chosen for vocabulary study for the pre-intermediate course at IU because of the affordance it provides course designers for controlling vocabulary list content and flashcard design so that they align with the overall program goals. Quizlet provided both teachers and students with a simply designed learning system that did not require a large amount of time to learn to use. The next section explains the vocabulary curriculum, which was designed to improve form-recall mastery.

Flashcard Design

For each of the items in the 10 sets of 50 words in the NGSL band of 1301-1800, two types of Quizlet flashcard sets (definition and gap-fill) were created. Both types provided the Japanese equivalent on the backside of the card. Here is an example of the definition flashcards. Each term had a corresponding definition and L1 translation:

Side 1: Term	Side 2: Definition
Repeat	to say (something) again 繰り返す

Here is an example of the term/gap-fill type flashcard.

Side 1: Term	Side 2: Gap-fill
Repeat	I wasn’t listening. Can you ______ the question? 繰り返す
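
The two card designs above can be thought of as two views of the same underlying item. The following is a minimal sketch rather than the authors' actual workflow: the class name `NgslCard`, its field names, and the tab-separated export layout are assumptions made for illustration, and only the content for "Repeat" is taken from the example cards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NgslCard:
    """One NGSL item holding the content for both card types (hypothetical schema)."""

    term: str
    definition: str
    gap_fill: str
    l1: str  # Japanese equivalent shown on Side 2 of both card types

    def definition_card(self) -> tuple[str, str]:
        # Side 1: term; Side 2: definition followed by the L1 equivalent
        return self.term, f"{self.definition} {self.l1}"

    def gap_fill_card(self) -> tuple[str, str]:
        # Side 1: term; Side 2: context sentence with a gap, followed by the L1 equivalent
        return self.term, f"{self.gap_fill} {self.l1}"


def to_tsv(cards: list[tuple[str, str]]) -> str:
    """Join (side 1, side 2) pairs as tab-separated lines (assumed export layout)."""
    return "\n".join(f"{side1}\t{side2}" for side1, side2 in cards)


repeat = NgslCard(
    term="Repeat",
    definition="to say (something) again",
    gap_fill="I wasn’t listening. Can you ______ the question?",
    l1="繰り返す",
)

print(to_tsv([repeat.definition_card(), repeat.gap_fill_card()]))
```

Keeping one record per word and deriving both card types from it would make it harder for the definition set and the gap-fill set to drift apart when a list is revised.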

Students were asked to complete five activities per week on Quizlet with each of the two types of flashcards (definition and gap-fill). It is important to note that a selected number of nouns and verbs on the gap-fill flashcards included different word endings called inflectional morphemes and derivational morphemes. Inflectional morphemes include making nouns plural (adding s, es) or changing the verb tense (adding s, ing, ed, ied). For example, the verb “read” becomes “reads.” Note the part of speech is not changed. Derivational morphemes also comprise different word endings, but do change part of speech, for example, the verb “read,” becomes a noun by adding er, “reader.” It was hypothesized that by giving students exposure to standard inflectional and derivational morphemes the students would have to notice cues provided in the context sentence and vary the word endings they produce. Rather than relying only on receptive knowledge of the word, their ability to change morphemes would also determine to what extent the student could master the usage of the word in slightly varying contexts.

Quizlet System

The Quizlet system provided various modes for vocabulary engagement for NGSL word list study. At the time of this study, Quizlet provided seven practice activities: Flashcard, Learn, Write, Spell, Test, Match, and Gravity. For each mode, the practice correlates to differing levels of processing depending upon the design of the flashcards (see Table 1).

Table 1.

Target Comprehension Level, Flashcard Design, and Quizlet Modes.

Level of vocabulary comprehension targeted	Flashcard design	Recommended Quizlet modes
Form recall	Side 1: word	Spell, Write, Gravity
Form recall	Side 2: context sentence with gap-fill	Spell, Write, Gravity
Meaning recall	Side 1: word	Matching
Meaning recall	Side 2: word in L2	Matching
Form recognition	Side 1: word	Learn, Test
Form recognition	Side 2: multiple-choice options	Learn, Test
Meaning recognition	Side 1: word	Flashcard, Match
Meaning recognition	Side 2: multiple-choice L2 definitions or L1 options	Flashcard, Match

In other words, each distinct mode gives students a chance to cover a range of form/meaning and recall/recognition questions. Students were instructed to complete five activities from each of these sets (definition and gap-fill), but teachers and students quickly realized that the gap-fill sets aligned better with the midterm and final tests. Students therefore focused on the gap-fill sets, with some adding a few activities from the definition sets. Thus, with seven modes available for each of the two card sets, students might report completing anywhere from zero to 14 activities per set of 50 NGSL words.

Learning Management System (LMS)-Based Exercises

In addition to weekly Quizlet study of the NGSL, students were also instructed to complete an NGSL-graded exercise on the LMS system. The graded exercise was a short story to practice the NGSL words for the given week. The story included matching and multiple-choice questions which were form-recognition level, along with a word bank with NGSL terms in groups by part of speech (nouns, verbs, adjectives) and a few gap-fill exercises that required students to change the inflectional morpheme. Students were instructed to do the exercise multiple times until they received a perfect score for a small portion of homework credit. The LMS exercises, along with the flashcard practice were homework to be completed outside of class time. As these exercises predominantly provided receptive practice in fresh contexts of NGSL terms, these exercises are known to have impacted recognition rather than recall-level test results and are not further described in this report.

Data Collection Tools and Procedures

To assess the participants’ vocabulary learning outcomes and whether form recognition leads to an overestimation of vocabulary knowledge, a midterm and final test were employed as a data source. To assess the participants’ perceptions of the vocabulary curriculum, a survey was administered.

Midterm and Final

The midterm and final were utilized to evaluate whether form recognition items overestimate vocabulary knowledge of NGSL terms when compared to (recall) gap-fills. The midterm and final were designed with the same context sentences provided as on the Quizlet card sets with 30 form recall gap-fill questions and 20 form recognition multiple-choice items. To create review sets, the mastery reports generated in Quizlet were used to determine which words the cohort had the most difficulty with. Then, many of these words appeared on the midterm and final test. The multiple-choice items were adapted from the Quizlet Test function. The main difference between the gap-fill (form recall) and the multiple-choice questions is the presence of the distractors (form recognition). See examples below.

Form Recall (30 items)

I wasn’t listening. Can you ______ the question? 繰り返す

Form Recognition (20 items)

Moderate exercise ________ good health. 促進させる
1. promotes
2. appreciates
3. manufactures
4. settles

Note that on the midterm and final 10 (of 30) gap-fill questions presented the learner with inflectional morpheme changes. For the midterm, the plural -s and third person agreement -s were added after the blank. Below is an example of a mid-term test item gap-fill.

My friends come from a variety of different _________ s. 背景 (Answer: background)

In contrast, the final test did not provide hints in English, although the L1 equivalent included past tense marking (した).

The government _________ a new plan to increase taxes. 適用した/実行した (Answer: implemented)

On both the midterm and final test, the context sentence and L1 equivalent for each test item were the same as those on the Quizlet flashcard sets.

Survey

To answer Research Questions 2 and 3, during week 11 of the 15-week semester, a 10-item survey investigating student perceptions regarding vocabulary study was administered to all students in the pre-intermediate level. The survey consisted of closed-response questions on mode preference, vocabulary study preference, and the usefulness of Quizlet, and one open-response question asking participants for any comments or opinions about the overall vocabulary study systems. A pilot study (N = 149) was conducted in the previous semester to investigate how often students engaged in vocabulary study and which methods and tools they preferred. The survey was refined based on the results of that pilot study. A total of 138 students who took the survey agreed to share their answers in this study, and 52 answered the open-response question. Note that the survey was voluntary; therefore, not all teachers and students in the course decided to participate. Regarding the validity of the data, 138 out of 244 registered students represents 57% of the target population and thus meets sampling sufficiency. Due to the Covid-19 pandemic, classes at the time were conducted online. Therefore, the survey was sent as a Google form via a chat link by class teachers. The open-response question on the survey was used for the purpose of triangulation.

Data Analysis Procedure

Midterm and Final: The mean scores for the form recall (gap-fill) and form recognition (multiple-choice) sections were calculated for both the midterm and final tests. The overestimation is the mean percentage on the recognition section minus the mean percentage score on the recall section. Descriptive statistics were calculated in Microsoft Excel.

Survey: Descriptive statistics, such as frequency counts and mean, were employed for the closed answer quantitative data analysis (Nunan & Bailey, 2008) to enrich understanding of students’ preferences and opinions of Quizlet. For the open-response answers, data was coded and analyzed for thematic constructs and patterns. The data was interpreted by drawing on past research (Creswell, 2008). The open-response answers illuminated the students’ opinions by providing more details, for example, as to why they preferred Quizlet over other means of studying vocabulary.

Findings

Form-Recognition and Its Overestimation of High-Frequency Vocabulary Knowledge

As shown in Table 2, the results indicated that multiple-choice, form-recognition test section scores of NGSL knowledge were, on average, about 20% higher than gap-fill, form-recall test scores (20.7% with exact scoring). Furthermore, when inflections were supplied on the midterm for gap-fill questions, there was still a 16% overestimation with the multiple-choice, form-recognition test items. On the final, where students had to produce the target word and inflection or derivation, there was about a 25% overestimation by the form-recognition test.

Table 2.

Fall 2020 Pre-Intermediate Midterm and Final Test Results.

n = 240	Form recall: raw score	Form recall: percent	Form recognition: raw score	Form recognition: percent	Overestimation (form recognition % minus form recall %)
Midterm gap fill (Exact)	46.05	76.75	37.15	92.88	16.1
Midterm gap fill (Revised)	47.57	79.28	NA	NA	13.6
Final gap fill (Exact)	39.43	65.71	36.42	91.05	25.3
Final gap fill (Revised)	40.91	68.18	NA	NA	22.87
Total average − Exact scoring	42.74	71.23	36.79	91.97	20.7
Total average − Revised for correct synonyms	44.24	73.73	NA	NA	18.24

Note. The error margin for providing correct synonyms in the gap fill is from 0 to 2.5%.
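
The overestimation column of Table 2 can be re-derived from the percentage columns. The sketch below is only an arithmetic check, not part of the study's procedure (the descriptive statistics were calculated in Microsoft Excel): the dictionary layout and the function name `overestimation` are illustrative assumptions, and the revised rows reuse the form-recognition percentage of the same test, as the NA cells imply.

```python
# Mean percentages copied from Table 2.
RECOGNITION_PCT = {"midterm": 92.88, "final": 91.05, "total": 91.97}
RECALL_PCT = {
    ("midterm", "exact"): 76.75,
    ("midterm", "revised"): 79.28,
    ("final", "exact"): 65.71,
    ("final", "revised"): 68.18,
    ("total", "exact"): 71.23,
    ("total", "revised"): 73.73,
}


def overestimation(recognition_pct: float, recall_pct: float) -> float:
    """Mean recognition percentage minus mean recall percentage."""
    return recognition_pct - recall_pct


# One gap per (test, scoring) row, using that test's recognition percentage.
gaps = {
    (test, scoring): overestimation(RECOGNITION_PCT[test], recall_pct)
    for (test, scoring), recall_pct in RECALL_PCT.items()
}

for (test, scoring), gap in gaps.items():
    print(f"{test:8s} {scoring:8s} {gap:6.2f}")
```

Rounded, the differences agree with the values reported in the last column of the table.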

The fact that the gap-fill average of midterm test scores was higher than the gap-fill average for the final test scores suggests that providing the word ending, especially for the plural and the -s in third person verb endings, was facilitative for the students. Moreover, the lower final test scores for the gap-fill section suggest that students had difficulty picking up cues to determine the part of speech represented by the blank in the given sentence, or that students did not know how to alter the word ending to correctly form the target part of speech.

To determine the potential error range, both exact scoring and revised scoring of the gap-fill portion of the midterm and final were carried out. About 2.5% of incorrect responses on the recall test items resulted from the recall of near-synonyms of the target word on the gap-fill questions on the tests. This semantic interference has been reported in other studies (Nation, 2022; Stewart et al., 2021). Students regularly supplied near-synonyms instead of the target term from the flashcard list, sometimes with the correct ending morpheme, and sometimes without it:

The government implemented a plan to increase taxes. 適用した/実行した

In the test, several students provided the words “applied” or “executed.” These answers correctly complete the gap-fill but were different from the word provided in the practice materials. The results in Table 2 show that after correcting for cases in which students supplied different words that had the correct meaning and correct form, including inflection, the recall test scores increased by about 2.5%. It is difficult to know whether a student who did not produce the target word had recalled an alternative solution or did not know the target item, so there is a potential error of 0 to 2.5% in the overestimation calculations.
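
The difference between exact and revised scoring can be illustrated with a short sketch. It is an assumption-laden illustration, not the scoring procedure used in the study: the function `score_gap_fill`, the lowercasing and whitespace trimming, and the list of student responses are hypothetical, while the target word and the two accepted synonyms come from the example above.

```python
def score_gap_fill(response: str, target: str, accepted: tuple[str, ...] = ()) -> tuple[bool, bool]:
    """Return (exact, revised) correctness for one gap-fill response.

    exact:   the response matches the target word from the flashcard set.
    revised: the response matches the target or an accepted synonym that
             carries the correct meaning and inflection.
    """
    answer = response.strip().lower()
    exact = answer == target.lower()
    revised = exact or answer in {word.lower() for word in accepted}
    return exact, revised


TARGET = "implemented"
ACCEPTED = ("applied", "executed")  # near-synonyms reported in the text

# Hypothetical responses: the last two are wrong under both schemes
# (missing past tense inflection, and a blank answer).
responses = ["implemented", "applied", "executed", "implement", ""]

results = [score_gap_fill(r, TARGET, ACCEPTED) for r in responses]
exact_total = sum(exact for exact, _ in results)
revised_total = sum(revised for _, revised in results)

print(f"exact: {exact_total}/{len(responses)}, revised: {revised_total}/{len(responses)}")
```

Under this sketch a synonym only earns revised credit when it is fully inflected, which mirrors the requirement that the alternative word have both the correct meaning and the correct form.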

NGSL Vocabulary Study System Survey Results

Students’ Perceptions of Useful Quizlet Modes

As seen in Figure 2, the students felt the Spell and Write activities were most useful for learning. This is not surprising as these two activities most focused on the form recall knowledge required on the midterm and final NGSL vocabulary tests. The Learn mode, which presents a variety of activities, was the third most helpful according to the students. In this activity with the form-recall design, the learner can listen to the context sentence and choose the best option while having access to the Japanese equivalent and context sentence in English. After these three, the traditional “Flashcard” activity and “Test” activities were about equally rated.

Figure 2.

Quizlet activities perceived as most useful for learning by students.

Students’ Perceptions of Recall-Focused Flashcards on Quizlet

The results in Table 3 show that the hybrid form-recall NGSL Quizlet/LMS system was by far the preferred system over other options.

Table 3.

Student Perceptions of Study Methods for the Current Semester and the Future.

Method type	What method do you think is effective for you when you study vocabulary?	Which method would you prefer for vocabulary study in the future?
Quizlet + LMS NGSL	58.0%	75.4%
LMS NGSL system only	22.5%	8.0%
Previous system	10.1%	10.9%
Something different	9.4%	5.8%

In the open responses, 22 students wrote that Quizlet was useful or helpful for their vocabulary study. For example, one student wrote, “The Quizlet system is great. I think I was able to memorize well because of the repetition.” Another student wrote, “Quizlet had many different modes that helped me remember words.” Five students wrote that they thought Quizlet study was fun. Just 5.8% of the learners expressed a desire to try something completely different in a future semester.

To understand how students studied, the participants were also asked which combinations of methods they used. According to the students’ responses in Figure 3, most of the students reported using the Quizlet NGSL sets with other methods or only the Quizlet sets. Taken together, these two combinations accounted for 72% of the student preferences. Overall, 42% of the respondents reported completion of the university learning management system’s practice quizzes every week.

Figure 3.

Methods used to study.

The students were required to complete more than five Quizlet activities per week, a time-consuming task, so it was important to find out how many activities they completed on average each week. About 65% of the students reported completing five or more activities per week and about 35% reported completing fewer than five activities per week. Eight students wrote in the open response that completing the activities required too much time or effort. For example, one student wrote, “The amount of work I have to do each time is too much, and I lose motivation.” This is not surprising as recall-level items demand more effort from students and are more time-consuming.

Discussion

Overestimation of Vocabulary Knowledge

Overestimation can vary in several ways. Considering the depth of knowledge model in Figure 1, it is possible to conceive of the following gaps in knowledge estimation:

a. Between form recognition and form recall

b. Between meaning recognition and meaning recall

c. Between form recognition and meaning recognition

d. Between meaning recall and form recall

Other overestimation studies have considered other aspects of knowledge and item design. It is interesting that the gap in this study (18%–23%) is not as large as in studies of the overestimation of vocabulary size that used the Vocabulary Size Test. For example, Gyllstad et al. (2019) reported a 62.7% overestimation of vocabulary size on the Vocabulary Size Test. Stoeckel and Sukigara (2018) reported a 13.9% overestimation and Stoeckel et al. (2019) reported a 41.9% overestimation of vocabulary size. The aim of these studies was to provide evidence that using receptive tests to make inferences about learners’ general knowledge is not valid. Additionally, these studies sought to improve recognition-level item designs, which remain inferior to the form recall item design covered in the present study.

There are multiple pieces of evidence in this study suggesting that the gap-fill design of the midterm and final test items is providing positive washback on instructional and student study strategies (Shohamy, 1993). First, the prevalence of students prioritizing the Write, Spell, and Learn modes of Quizlet shows that students are engaging in more cognitively difficult vocabulary tasks. Second, the midterm and final gap-fill items identified high-frequency words that many students missed even though receptive, multiple-choice tests indicated that these words had been mastered, which suggests that the learning community should be ready to embrace more rigorous learning standards for such high-frequency vocabulary. Teng and Xu (2022) have recently demonstrated that moving learners from receptive to productive knowledge would entail not only changing expectations from multiple-choice, recognition-level test items to gap-fill items, but also moving from gap-fill to sentence translation, and from sentence translation to sentence writing item types.

In 2023, Quizlet improved many options and functions to give administrators more flexibility in creating test items to meet their needs. However, it also removed some of the practice functions, such as the “Gravity” and “Write” modes, and moved the “Spell” mode to a special option within the Learn mode. Quizlet seems to be promoting the easiest modes for learners to complete, perhaps in order to encourage a sense of achievement. Course designers and teachers need to be aware of this and develop test item plans that promote the appropriate depth of knowledge for high-frequency vocabulary. Additionally, pretests could be used to determine the extent to which a particular cohort already has receptive knowledge of a particular word list (Sevigny & Ramonda, 2013), and supplemental exercises could then be designed at the sentence translation and sentence writing level, such as those demonstrated in Nguyen and Le (2023). Thus, midterm and final test designs for NGSL items could reasonably expect learners to demonstrate not only inflectional and derivational morpheme manipulation at the word level, but also common collocation knowledge, for example through sentence translation items in which complete phrases are expected in the gap.

To better deal with semantic interference from synonyms the students already know, one solution is to provide the first letter of the target word as a hint along with the L1. For example:

The government _________ a plan to increase taxes.適用した/実行した (first letter = i)

A second approach to such items, where there are multiple possible synonyms, is to provide a scrambled set of the target word’s letters along with the L1 term. This approach can help prime students to notice cues available for inflection and derivation marking (Schmidt & Frota, 1986). In this case, the cues are the past tense marking in the L1 term and the inflectional letters included in the scrambled set. For example:

The government ______ a plan to increase taxes.適用した/実行した (ditmmeeepln)
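Both hint types can be generated automatically when building item banks. The sketch below is a minimal illustration rather than part of the study's materials: the helper names `first_letter_hint`, `scrambled_hint`, and `build_item`, the fixed blank width, and the optional `seed` parameter are all assumptions introduced here.

```python
import random


def first_letter_hint(target):
    """Build the '(first letter = x)' hint for a target word."""
    return f"(first letter = {target[0]})"


def scrambled_hint(target, seed=None):
    """Return the target's letters in shuffled order, never the original order."""
    rng = random.Random(seed)  # a seed makes the scramble reproducible
    letters = list(target)
    if len(set(letters)) < 2:
        return target  # a word with one distinct letter cannot be scrambled
    while True:
        rng.shuffle(letters)
        candidate = "".join(letters)
        if candidate != target:
            return candidate


def build_item(sentence, target, l1_gloss, hint):
    """Blank out the first occurrence of the target and append gloss and hint."""
    gapped = sentence.replace(target, "_" * 9, 1)
    return f"{gapped} {l1_gloss} {hint}"


sentence = "The government implemented a plan to increase taxes."
gloss = "適用した/実行した"
print(build_item(sentence, "implemented", gloss, first_letter_hint("implemented")))
print(build_item(sentence, "implemented", gloss, f"({scrambled_hint('implemented', seed=1)})"))
```

Generating the scramble from the target string guarantees that every letter, including repeated ones and the inflectional ending, appears in the hint, which is easy to get wrong when scrambling by hand.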

Overall, these findings indicate that there was indeed a gap between average recognition and recall item scores. They support the growing body of research showing that teaching recognition knowledge of the dictionary form (lemma form) via multiple-choice items does not result in transfer to recall-level knowledge, the knowledge that is more congruous with the lexical knowledge needed when reading (Kremmel & Schmitt, 2016; McLean et al., 2020; Nation & Webb, 2011; Stewart, 2014; Stoeckel et al., 2019). In other words, implementing more rigorous recall-level, gap-fill vocabulary items in digital flashcards is warranted for high-frequency vocabulary to build core fluency.

Students’ Opinions of Quizlet

These findings add to the growing body of research on which modes students prefer in Quizlet. For example, like the participants in Chien’s (2013, 2015) studies, the participants in this study liked Spell and Write, and liked modes where they typed the word (Altiner, 2011). However, unlike the students in Chien’s study, who preferred Space Race (renamed Gravity), only 9.1% of students in this study liked Gravity the best. This might be because, for those slow at typing, slow at recall, or slow at both, this mode can quickly become an onerous task.

The mainly positive results from the survey support other research on learner preferences toward Quizlet (Bueno-Alastuey & Nemeth, 2022; C. W. Chien, 2015; Dizon, 2016). For example, students in this study also felt Quizlet was fun and engaging, as found in Bueno-Alastuey and Nemeth (2022) and C. W. Chien (2015). Furthermore, as in Dizon’s (2016) research, which had only 9 participants, students in this study perceived Quizlet to be easy to use and answered that they would like to use it in the future. The current study, with its larger sample size, therefore helps fill this knowledge gap.

Overall, it was found that the design of Quizlet as a vocabulary application provides two important affordances for language teachers and program coordinators. First, it allows all stakeholders the power to construct their own flashcard sets. Second, the Quizlet teacher account allows teachers to track which vocabulary activities students have completed and identifies items with which a class group has had difficulties by providing the students’ mastery of each word in percentages. This allows for easily making review sets for midterm and final exam study. In fact, this option combined with the students’ perception that there was too much practice required and the low level of support for subsequent use of LMS activities suggests that the LMS practice activities could be dropped.

Conclusion

This study has sought to elucidate the appropriate rigor with which to assess recall-level knowledge of high-frequency vocabulary. The results of this study highlight and challenge several ongoing misconceptions. One is that if learners are tested on the form-meaning recognition of high-frequency vocabulary, they will be able to recognize the word when reading. Another is that they will be able to recall these words. A third misconception is that if they have form recognition ability, then they will be able to adjust inflections to work in typical contexts. This study adds to research that can improve learning objectives, test item writing, and digital flashcard designs to enhance washback effects and help learners develop a more active L2 vocabulary core.

It is recommended that professional associations of language teachers, and their special interest groups whose focus is vocabulary learning, pool teacher contributors to help with constructing flashcard sets on platforms like Quizlet. Such collaboration can improve all the factors identified with the process of normalizing blended vocabulary programs (Mack et al., 2021): transparency, teacher knowledge, skills, attitudes, and constructive alignment. More collaboration between Ed Tech companies like Quizlet and universities could help with the design of better analytic dashboards for teachers and program coordinators. It could also help ensure that popular modes that research supports as effective are not eliminated, as happened with Gravity mode.

Investing in the creation and implementation of tools such as Quizlet can have major positive impacts on student vocabulary knowledge and can provide a method for practicing form-recall. Moreover, students believe in its efficacy and show some gains in the strength of vocabulary recall. This study also supports the argument that form-recognition and form-recall need to be assessed separately. When choosing vocabulary targets for an English course curriculum, objectives that direct students toward form-recall are more challenging and require consistent, recall-focused practice activities. Form recognition does not need to be eliminated, but rather assessed separately for maximum vocabulary coverage. Curriculum designers need to learn how to distinguish which words in a unit are appropriate to test at the recognition level, which words should be tested at the recall level, and which high-frequency words students should be able to change inflections for. There is a pervasive belief that when students have meaning recognition knowledge of a word, they have mastered the whole word family. This study has challenged that notion and hopes to influence testing instrument makers, industry leaders, and program stakeholders. This research suggests that with vocabulary list study, form-recall practice of common inflectional and derivational forms is an essential part of a high-frequency vocabulary learning program.

Footnotes

Acknowledgements

We would like to thank the coordinators of our Assurance of Learning Team for their thoughtful feedback. Specifically, we want to thank Malcolm Larking and James Blackwell for their support in creating this curriculum.

Contribution

Paul Sevigny: Conceptualization, methodology, investigation, data analysis, resources, data curation, writing-original draft, reviewing and editing, visualization. Lindsay Mack: Conceptualization, methodology, data analysis, writing-original draft, reviewing and editing. Lance Stilp: Resources, writing-original draft, reviewing and editing, visualization. Maiko Berger: Methodology, investigation, resources, reviewing.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Ethics Statement

All decisions about data collection were made in conjunction with regular Assurance of Learning meetings and were part of a larger, program-wide Assurance of Learning project. Data was collected with the permission and approval of the Center for Language Education. Although the Center is not an independent ethics committee, it ensured that we followed the ethical guidelines set forth by our university. Great care was taken to ensure that these ethical procedures and guidelines were followed for data collection, and all permissions were granted. All surveys were anonymous and voluntary, and any identifying information has been removed to protect confidentiality.

ORCID iDs

Paul Sevigny

Lindsay Mack

Lance Stilp

Maiko Berger

Data Availability Statement

The data instruments and anonymized survey results that support the findings in this study are available on Open Science Framework upon request to the author.

References

Aksel (2021). Vocabulary learning with Quizlet in higher education. Language Education and Technology, 1(2), 53–62.

Altiner (2011). Integrating a computer-based flashcard program into academic vocabulary learning. Iowa State University. https://www.learntechlib.org/p/116246/

Browne (2013). The new general service list: Celebrating 60 years of vocabulary learning. Language Teaching, 4, 13–16.

Bueno-Alastuey, M. C., & Nemeth (2022). Quizlet and podcasts: Effects on vocabulary acquisition. Computer Assisted Language Learning, 35(7), 1407–1436. https://www.learntechlib.org/p/116246/

Chien (2013). Perception and practice of Taiwanese EFL learners’ making vocabulary flashcards on Quizlet. International Association for Development of the Information Society.

Chien, C. W. (2015). Analysis the effectiveness of three online vocabulary flashcard websites on L2 learners’ level of lexical knowledge. English Language Teaching, 8(5), 111–121.

Creswell, J. W. (2008). Research design: Qualitative, quantitative, and mixed methods approaches (3rd ed.). Sage Publications, Inc.

Cunningham, K. J. (2017). Quizlet for learner training and autonomy. In Hubbard & Ioannou-Georgiou (Eds.), Teaching English reflectively with technology (pp. 123–135). https://members.iatefl.org/downloads/sigs/LTSIG_ebook.pdf

Dizon (2016). Quizlet in the EFL classroom: Enhancing academic vocabulary acquisition of Japanese university students. Teaching English with Technology, 16(2), 40–56.

Golonka, E. M., Bowles, A. R., Frank, V. M., Richardson, D. L., & Freynik (2014). Technologies for foreign language learning: A review of technology types and their effectiveness. Computer Assisted Language Learning, 27(1), 70–105. https://doi.org/10.1080/09588221.2012.700315

Grabe (2009). Reading in a second language. Cambridge University Press.

Gyllstad, McLean, & Stewart (2019, July 1–3). Empirically investigating the adequacy of item sample sizes of vocabulary levels and vocabulary size tests: A bootstrapping approach [Paper presentation]. Vocab@Leuven Conference, Leuven, Belgium.

Gyllstad, Vilkaitė, & Schmitt (2015). Assessing vocabulary size through multiple-choice formats: Issues with guessing and sampling rates. International Journal of Applied Linguistics, 166(2), 278–306. https://doi.org/10.1075/itl.166.2.04gyl

Kremmel & Schmitt (2016). Interpreting vocabulary test scores: What do various item formats tell us about learners’ ability to employ words? Language Assessment Quarterly, 13(4), 377–392. https://doi.org/10.1080/15434303.2016.1237516

Laufer & Goldstein (2004). Testing vocabulary knowledge: Size, strength, and computer adaptiveness. Language Learning, 54(3), 399–436. https://doi.org/10.1111/j.0023-8333.2004.00260.x

Mack, Sevigny, Larking, & Stilp (2021). Validating the normalization of vocabulary systems in a university EFL program. Cogent Education, 8(1), 1–15. https://doi.org/10.1080/2331186x.2021.1985688

McLean, Stewart, & Batty, A. O. (2020). Predicting L2 reading proficiency with modalities of vocabulary knowledge: A bootstrapping approach. Language Testing, 37(3), 389–411. https://doi.org/10.1177/0265532219898380

Nation, I. S. P. (Ed.). (2013). Learning vocabulary in another language (2nd ed.). Cambridge University Press.

Nation, I. S. P. (2022). Learning vocabulary in another language (3rd ed.). Cambridge University Press.

Nation, I. S. P., & Beglar (2007). A vocabulary size test. Language Teaching, 31(7), 9–13. https://openaccess.wgtn.ac.nz/articles/journal_contribution/A_vocabulary_size_test/12552197

Nation, I. S. P., & Webb, S. A. (2011). Researching and analyzing vocabulary. Heinle, Cengage Learning.

Nguyen, L. Q., & Le, H. V. (2023). Enhancing L2 learners’ lexical gains via Quizlet learning tool: The role of individual differences. Information Technologies in Education, 28, 12143–12167. https://doi.org/10.1007/s10639-023-11673-0

Nunan & Bailey, K. M. (Eds.). (2008). Exploring second language classroom research: A comprehensive guide (1st ed.). Heinle ELT.

Sanosi, A. B. (2018). The effect of Quizlet on vocabulary acquisition. Asian Journal of Education and eLearning, 6(4), 71–77.

Schmidt & Frota, S. N. (1986). Developing basic conversational ability in a second language: A case study of an adult learner of Portuguese. In Day, R. R. (Ed.), Talking to learn: Conversation in second language acquisition (pp. 237–326). Newbury House.

Schmitt (2014). Size and depth of vocabulary knowledge: What the research shows. Language Learning, 64(4), 913–951. https://doi.org/10.1111/lang.12077

Schmitt, Nation, & Kremmel (2020). Moving the field of vocabulary assessment forward: The need for more rigorous test development and validation. Language Teaching, 53(1), 109–120. https://doi.org/10.1017/s0261444819000326

Sevigny & Ramonda (2013). Vocabulary: What should we test? In Sonda & Krause (Eds.), JALT2012 Conference Proceedings. Tokyo: JALT. https://jalt-publications.org/sites/default/files/pdf-article/jalt2012-73.pdf

Shohamy (1993). The power of tests: The impact of language tests on teaching and learning. NFLC Occasional Papers.

Stewart (2014). Do multiple-choice options inflate estimates of vocabulary size on the VST? Language Assessment Quarterly, 11(3), 271–282. https://doi.org/10.1080/15434303.2014.922977

Stewart, Stoeckel, McLean, Nation, & Pinchbeck, G. G. (2021). What the research shows about written receptive vocabulary testing: A reply to Webb. Studies in Second Language Acquisition, 43(2), 462–471. https://doi.org/10.1017/s0272263121000437

Stoeckel & Bennett (2015). A test of the New General Service List. Vocabulary Learning and Instruction, 4(1), 1–8.

Stoeckel, McLean, & Nation (2021). Limitations of size and levels tests of written receptive vocabulary knowledge. Studies in Second Language Acquisition, 43(1), 181–203. https://doi.org/10.1017/s027226312000025x

Stoeckel, Stewart, McLean, Ishii, Kramer, & Matsumoto (2019). The relationship of four variants of the vocabulary size test to a criterion measure of meaning recall vocabulary knowledge. System, 87, 1–14. https://doi.org/10.1016/j.system.2019.102161

Stoeckel & Sukigara (2018). A serial multiple-choice format designed to reduce overestimation of meaning-recall knowledge on the vocabulary size test. TESOL Quarterly, 52(4), 1050–1062. https://doi.org/10.1002/tesq.429

Teng & Xu (2022). Pushing vocabulary knowledge from receptive to productive mastery: Effects of task type and repetition frequency. Language Teaching Research, 1–19. https://doi.org/10.1177/13621688221077028

Tsai, Y. L., & Tsai, C. C. (2018). Digital game-based second-language vocabulary learning and conditions of research designs: A meta-analysis study. Computers & Education, 125, 345–357. https://doi.org/10.1016/j.compedu.2018.06.020

Unrau, N. J., Alvermann, D. E., & Sailors (2018). Literacies and their investigation through theories and models. In Alvermann, D. E., Unrau, N. J., Sailors, & Ruddell, R. B. (Eds.), Theoretical models and processes of literacy (pp. 3–34). Routledge.

Waluyo & Bakoko (2021). Vocabulary list learning supported by gamification: Classroom action research using Quizlet. Journal of AsiaTEFL, 18(1), 289–299.

West (1953). A general service list of English words. Longmans.

Wright, B. A. (2016). Transforming vocabulary learning with Quizlet. In Clements, Krause, & Brown (Eds.), Transformation in language education (pp. 436–440). JALT.

Yang, Kuo, L. J., Eslami, Z. R., & Moody, S. M. (2021). Theoretical trends of research on technology and L2 vocabulary learning: A systematic review. Journal of Computers in Education, 8, 465–483. https://link.springer.com/article/10.1007/s40692-021-00187-8

High-Frequency Vocabulary: Moving From Recognition to Recall Level on Quizlet
