Abstract
Vocabulary card learning stands out as one of the most efficient methods for memorizing a large number of words in a short period of time. Multilinguals can use their multilingual repertoire to their advantage when learning new words, for example in the acquisition of formally overlapping words. This study aims to answer the as-yet unexplored question of whether the use of bilingual translations in vocabulary cards can provide advantages for multilinguals when learning vocabulary even in a linguistically distant language. Furthermore, predictions based on the ‘type of processing – resource allocation’ (TOPRA) model regarding trade-offs through semantic processing are taken into consideration. In this study, traditional vocabulary card learning is compared to two further elaborated learning tasks that include multilingual translations. In a 3 × 3 within-participant design, 72 participants learned 60 Estonian nouns in three conditions: (1) monolingual: one translation in the participant’s dominant language, (2) bilingual: two translations in the participant’s two strongest languages, and (3) multilingual semantic+: one translation in the dominant language including the instruction to retrieve the meaning of the Estonian word in any other known language. The dependent measures were immediate and delayed L2–L1 cued-recall post-tests and efficiency (ratio of learning performance to learning time). The results show that the monolingual condition significantly outperforms the two other conditions in all recall tests. In terms of efficiency, however, the monolingual condition only significantly outperforms the multilingual semantic+ condition. The comparison between the bilingual and the multilingual semantic+ condition shows descriptively (but not statistically significantly) better performance for the bilingual condition.
I Introduction
Recent studies on vocabulary card learning have predominantly focused on the comparative efficacy of physical versus digital vocabulary cards (Nakata, 2008, 2011), as well as the impact of spacing and massing strategies (Kornell, 2009; Pyc & Rawson, 2007). This study addresses a question that, to the best of our knowledge, has not yet been thoroughly addressed or tested in the field of vocabulary card learning: the explicit integration of the learners’ multilingual repertoire through multilingual translations and its possible effects on learning outcomes. We seek to fill this gap by investigating whether it is worthwhile to incorporate multilingual translations for learners who are studying vocabulary of a language that is genealogically unrelated to the languages in their multilingual repertoire.
Moreover, our study delves into the processing of vocabulary words in a foreign language. At present, contrasting perspectives exist regarding how increased semantic engagement through elaborated tasks can affect learning success (see Barcroft, 2002; Craik & Lockhart, 1972). By employing a semantically elaborated vocabulary card learning task, we directly compare and test the impact of increased semantic engagement on learning success. This should contribute to our understanding of how vocabulary card learning can be successfully and efficiently implemented in the learning process.
II Literature review
1 Vocabulary card learning
When learning a new foreign language, learners are dependent on encountering, understanding and ultimately making use of its vocabulary both receptively and productively. Given the sheer number of lexical items that need to be acquired in order to guarantee a successful use of a foreign language (see Hsueh-chao & Nation, 2000; Nation, 2006; Webb & Rodgers, 2009), vocabulary card learning is one of the most efficient methods for memorizing a large number of words within a short period of time (Nakata, 2011; Nation, 2022), and to successfully retrieve them later when using the target language (Bahrick & Phelps, 1987; Beaton et al., 1995; Webb et al., 2020). This paired associate learning can be defined as ‘forming associations between a foreign language word form (written or spoken) and its meaning (often in the form of a first-language translation, although it can also be a second-language definition or a picture or real object)’ (Nation, 2013, p. 437). Vocabulary card learning not only enables learners to establish form–meaning connections, it also provides the opportunity for repeated exposures to previously encountered lexical information and thus fosters active retrieval and entrenchment of memory traces (Barcroft, 2007; Nakata, 2017; Nation, 2022).
2 The multilingual lexicon and its role in learning a new language
Multilinguals store all the words they know productively and/or receptively, in any language, in the mental lexicon. The mental lexicon consists of a system of categories and concepts with strong corresponding associations to words in a person’s first language (De Wilde et al., 2020; Kroll & Stewart, 1994), but associations with languages acquired later are possible too (Nation, 2022). In the mental lexicon, network structures are present that follow the assimilation process; similar concepts or similar phonological or orthographic word forms are stored close to one another and are systematically co-activated whenever similar forms or concepts are accessed receptively or productively (Baxter et al., 2022; Beaton et al., 1995; Dijkstra & van Heuven, 2002; Ellis & Beaton, 1993; Karuza et al., 2016; Zhao et al., 2017; Zinszer et al., 2016). Therefore, the words stored in one’s multilingual lexicon can help to successfully integrate words of a new language into the already existing network structures as long as the degree of overlap between the stored word in a given language and the newly learned word is high (Puimège & Peters, 2019; Tonzar et al., 2009). This is the case for cognate words (De Groot & Keijzer, 2000; Marecka et al., 2021; Sánchez-Casas & García-Albea, 2005; Xue et al., 2022) or words with formal or semantic similarity (Deconinck et al., 2017; Dijkstra & van Heuven, 2002; Karuza et al., 2016; Zhao et al., 2017; Zinszer et al., 2016).
The mainstream view is that multilinguals, thanks to their wider (lexical, phonological, grammatical) repertoires, benefit from heightened linguistic awareness as well as other advantages, and are thus overall better learners of additional languages (D’Angelo, 2023, p. 8; Montanari, 2019, p. 299). There is, however, also a series of studies that question such overall advantages in additional language learning (Berthele & Udry, 2022; Lorenz et al., 2020).
3 Current debates on the effect of increased semantic processing
Cognitive engagement in learning tasks and its positive and negative consequences have been a popular research topic in the last few years and have led to various debates about whether more cognitive involvement leads to greater learning success (see Laufer & Hulstijn, 2001; Zarifi et al., 2021) or to less learning due to a bottleneck of limited cognitive capacities (see Barcroft, 2002). Craik and Lockhart (1972) assume within their ‘levels of processing’ (LOP) framework that the likelihood of learning a word depends on the depth at which it is processed, ranging from shallow to deep processing: the deeper the engagement, the higher the learning potential. Shallow analysis is said to involve visual perception of the word form and auditory phonemic word processing. Deep engagement is then said to be brought about through semantic analysis of the given word. Therefore, according to the LOP framework, semantically oriented learning tasks lead to deeper processing and understanding of the word and hence to higher learning success than structurally oriented learning tasks. Several studies support this hypothesis in the field of memory for first language (L1) words (Hyde & Jenkins, 1969; Schulman, 1971; Tresselt & Mayzner, 1960), as well as in the domain of new words in a foreign language, the latter also including the keyword learning method (Atkinson & Raugh, 1975; Pressley et al., 1982).
Barcroft (2015, p. 60) states that the LOP framework has significantly influenced research in second language (L2) acquisition. However, he criticizes the associated assumption that focused and increased attention to the meaning of a word in L2 vocabulary acquisition will necessarily be beneficial in all test formats. Within his ‘type of processing – resource allocation’ (TOPRA) model, Barcroft (2002, 2003) therefore argues that semantically oriented learning tasks, depending on how word memory is tested, can also lead to poorer learning gains because fewer processing resources remain available for form or mapping. The TOPRA model ‘makes predictions about how different types of tasks are likely to affect processing resource allocation during L2 vocabulary learning’ (Barcroft, 2015, p. 64). Thus, the model assumes that learners do not have infinite cognitive processing resources: at a certain point, they reach their maximum capacity and cannot absorb and process any additional information. This assumption of limited resources is represented in Figure 1 by the two thick lines on the left- and right-hand sides of the graph. Depending on the learner and their individual differences in their processing capacity, these lines may be closer or farther apart, but they represent an individual’s maximum processing capacity.

Figure 1. The type of processing – resource allocation (TOPRA) model.
If within those finite resources all three components – namely semantic processing, form processing and processing for mapping – are used to the same extent, one encounters the situation as depicted in Figure 1A. If, however, a given type of task results in more engagement with one component, a redistribution of processing resources takes place as more resources are allocated to the component in question while fewer resources are available for the other components (see Figures 1B and 1C). This can ultimately lead to different learning outcomes: if, for instance, more processing resources are allocated to the semantic component by means of a semantically oriented task, this results in the reduced availability of structural and mapping resources (see Figure 1B). If the recall test is also semantically related (for example free recall in L1), higher learning outcomes can be expected, as more resources and attention have been allocated to the semantic processing during the learning phase. If, however, the type of test is structural in nature (for example a free-recall test in L2), the semantically oriented task will lead to lower learning outcomes compared to a structurally oriented task (Barcroft, 2002).
A few studies have compared structurally oriented tasks with semantically oriented tasks. They generally support the predictions of the TOPRA model (Barcroft, 2000, 2002, 2004, 2006; Kida, 2010; Kida et al., 2022; Wong & Pyun, 2012). For instance, Barcroft (2002) recruited 48 English speakers learning Spanish as an L2. In a within-participant design, participants learned a total of 24 invented ‘Spanish’ nouns in three different conditions: in the semantically constructed condition, participants had to rank the words according to their pleasantness; in the structural condition, they had to count the letters of the words; and in the third condition, they had to learn the words as best as they could. The participants’ learning gains were assessed in three different tests: a lexical free-recall test in Spanish, a free-recall test in English and a picture-to-Spanish cued-recall test. The Spanish free-recall test and cued-recall test tapped into structural knowledge of the word forms, while the English free-recall test operationalized semantic knowledge. The results were in line with the TOPRA model, as the structural condition yielded significantly better gains than the semantic condition in the Spanish free-recall test, whereas the semantic condition fared significantly better in the English free-recall test. In the cued-recall test, the structural condition showed better gains than the semantic condition, although this difference was not statistically significant. The condition without any instruction on explicit semantic or structural engagement (‘learn as best you can’) yielded significantly higher gains in all tests compared to the other two conditions. Barcroft argues that the semantic and structural conditions ‘essentially “got in the learners’ way” of being able to encode the target words as input in the most effective manner’ (Barcroft, 2015, p. 66).
Fewer studies (see Deconinck et al., 2010, 2014, 2017; Kida & Barcroft, 2018) have tested an increase in the mapping component and its resulting effects on cued-recall tests, which are thought to measure whether mappings were learnt (see Kida & Barcroft, 2018). A study by Kida and Barcroft (2018) focuses on resource allocation of the mapping component through homographs. In this within-participant study with 112 participants and a semantic, a structural and a mapping condition, they found support for the TOPRA predictions, as learning in the mapping condition outperformed all other conditions. Deconinck et al. (2010, 2014, 2017) increased the mapping component through a form–meaning fit rating and also showed that this mapping-oriented task yielded better results. Furthermore, they showed that half of the participants’ verbal elaborations of the form–meaning fit, which were documented in a think-aloud protocol, were cross-linguistic associations (Deconinck et al., 2017).
III Research questions and purpose of the present study
To investigate the influence of bilingual translations in vocabulary card learning on the one hand, and the impact of a task with an increased focus on lexical semantics while using vocabulary cards on the other hand, our study consists of a within-participant design in which the participants had to learn 60 Estonian nouns in the following three conditions:
• Monolingual condition (control condition): traditional vocabulary card learning with the written translation in the dominant (usually first) language of each participant;
• Bilingual condition: vocabulary card learning with the written translation in the two strongest languages of each participant;
• Multilingual semantic+ condition: vocabulary card learning with the written translation in the dominant (usually first) language and a question mark (the question mark indicated an instruction to retrieve and verbalize the same meaning of the word in any other known language).
This study investigates the following two research questions:
• Research question 1: Monolingual vs. bilingual translations: Does the integration of bilingual translations in vocabulary card learning in a linguistically distant language lead to more correctly recalled word items and higher efficiency in the immediate and delayed L2–L1 cued-recall post-tests in comparison to the traditional learning method?
For this research question, the monolingual condition was compared to the bilingual condition. We hypothesized that the bilingual condition would show higher and more efficient learning in the post-tests.
• Research question 2: Increased semantic processing via a multilingual card task: Does a task with an increased focus on lexical semantics in vocabulary card learning lead to fewer correctly recalled word items and lower efficiency in the immediate and delayed L2–L1 cued-recall post-tests in comparison to two other conditions without an increased focus on lexical semantics?
To answer this question, the multilingual semantic+ condition was compared to the monolingual and bilingual conditions. As predicted by the TOPRA model, we hypothesized that the multilingual semantic+ condition would lead to fewer correctly recalled word items and lower efficiency in the immediate and delayed L2–L1 cued-recall post-tests than the other two conditions.
Multilingual learners of additional languages arguably draw on their individually different repertoires, which help them integrate new vocabulary if there are formal or semantic similarities between those languages (De Groot & Keijzer, 2000; Deconinck et al., 2017; Zhao et al., 2017). The question arises as to what extent this individual multilingual repertoire, in addition to the well-known cross-language influence (CLI) of related languages, can be a (dis)advantage for vocabulary card learning of a linguistically distant language. This study examines research question 1 by having participants learn lexical items from Estonian, that is, words that are genealogically unrelated to their translation equivalents in the participants’ multilingual repertoires. Therefore, no positive transfer based on cognate effects is expected. Regardless, we assumed that translations into two languages on vocabulary cards (rather than one) would lead to more correctly recalled word items and more efficient learning, as the meaning of the Estonian word is activated twice through the two languages. Mapping a new word form to two known word forms, arguably, could activate more phonological, orthographic, semantic or experience-based associations with each language and therefore lead to a potentially better integration of the new word form into the network structure of the mental lexicon (see Baxter et al., 2022; Dijkstra & van Heuven, 2002). We also deem it possible that the participants reflect on which of the two translations is a better form–meaning fit to the corresponding Estonian target word and therefore allocate more processing resources to the mapping component (see Barcroft, 2002, 2003). Deconinck et al. (2010, 2014, 2017) showed that learners who performed such form–meaning fit ratings based on their individual multilingual repertoire and the idiosyncratic properties of the target words benefited significantly from this rating task in the conducted meaning and form recall tests.
Furthermore, this study focuses on semantic elaboration in vocabulary card learning. The amount of cognitive involvement is defined as an important aspect of learning success by many researchers (see Barcroft, 2015; Laufer & Hulstijn, 2001; Morris et al., 1977; Schmitt, 2010; Zarifi et al., 2021) and is essentially dependent on the processing of the phonological and orthographic word forms, the semantic processing, and the association between the word forms and their meanings (Barcroft, 2000; Nation, 2022). To capture these different dimensions, Barcroft (2000, 2003) developed the TOPRA model, which states that all three of these components must be taken into account so that efficient learning can take place. The TOPRA model assumes that a stronger focus on semantics (which entails an increased processing demand of the semantic component), contrary to the assumption of the LOP framework (Craik & Lockhart, 1972), can also lead to reduced vocabulary learning success in a foreign language. This prediction is related to research question 2. To test it, we compared a more semantically oriented task condition to two other conditions. To compare the effects of these different conditions on learning, we administered an L2–L1 cued-recall test, as cued-recall tests are primarily dependent on mapping processing (Kida & Barcroft, 2018). Whereas no decrease of processing resources for the mapping component is expected in the monolingual and bilingual conditions, such a decrease is anticipated in the multilingual semantic+ condition, as more processing resources are allocated to the semantic processing component. The multilingual semantic+ condition requires more semantic processing, as the participants first had to associate the Estonian word and its concept with the given written translation in the dominant language. Afterwards, they had to find an additional translation in any other of their languages. This entails that the participants had to actively think about and imagine the concept in order to then retrieve its meaning in another language. Therefore, participants’ attention was mostly geared towards the concept and the meaning of the Estonian word rather than the Estonian word form or the form–meaning connection.
IV Method
1 Participants
Seventy-two participants living in Switzerland, Russia and Germany were recruited for the present experiment. The participants’ ages ranged from 20 to 56 years, with a mean age of 29.2 years. Among the participants were 46 women and 26 men. Thirty-seven of the participants were enrolled at a university while 35 participants were active in their professional lives. On average, participants spoke 3.2 languages.
In a first stage, suitable participants were recruited in the circle of acquaintances of the authors. In addition, recruiting was carried out on social networking platforms, which are widely used by students in Switzerland and exchange students living in Switzerland, as these target groups are often multilingual. Afterwards, some participants helped to find additional participants in their respective social networks. All participants took part voluntarily in the study and did not receive any financial remuneration.
Only participants who either grew up bilingually or had been learning at least one second/foreign language and could communicate fluently in that language according to their self-assessment (see below) were included. To assess their (foreign) language proficiency and to identify and exclude potential participants who could speak Estonian or another Finno-Ugric language, participants had to fill out a questionnaire. In that questionnaire, participants had to self-evaluate their oral language competence in their two strongest languages on a 7-point Likert scale. Participants who judged their oral competence in their second-best language to be below 3 were automatically excluded from the experiment. The two participants who judged their oral competence to be at level 3 had a follow-up conversation with the authors and were admitted to the experiment because their proficiency in that language proved sufficient during the conversation. Twenty additional participants had to be excluded from the experiment as they were not able to invest enough time to complete the full experiment, had technical problems, or did not follow all the instructions during the experiment.
The participants showed different language profiles (see Table 1). Our study uses a within-participant design in which all participants took part in all three conditions. This allowed us to compare within-participant performance across the three learning conditions. It thus seemed unproblematic to include various multilingual profiles, provided that the individual repertoires included only languages that are unrelated to Estonian. Therefore, the most important criterion for participant selection was their lack of knowledge of any Finno-Ugric language.
Table 1. Participants’ language profiles.
2 Target word items
The target words used for this experiment were 60 Estonian nouns. We chose items from a Finno-Ugric language to exclude positive lexical transfer via cognate words (see De Groot & Keijzer, 2000; Marecka et al., 2021; Puimège & Peters, 2019; Sánchez-Casas & García-Albea, 2005; Tonzar et al., 2009; Xue et al., 2022). All items were unknown to all participants. The learning task was cognitively demanding and the number of new word items that had to be learned in the three conditions was quite high. We therefore restricted the target items to nouns, which are generally considered easier to learn than other parts of speech, e.g. verbs (Gillette et al., 1999) or adverbs (Morgan & Bonham, 1944).
As the majority of the participants spoke German (see Table 1), the German corpus ‘SUBTLEX-DE’ and its data set ‘SUBTLEX-DE raw file’ (Brysbaert et al., 2011a) were used as the basis for selecting the 60 vocabulary word items. This corpus consists of 25.4 million words from the subtitles of over 4,610 movies and TV-series (Brysbaert et al., 2011b) and arguably represents spoken everyday language rather well. Thus, it is safe to assume that most lemmas in this corpus are part of the receptive vocabulary of L1 German speakers. We selected only nouns, which we then translated into Estonian. Moreover, proper nouns, vulgar expressions, titles (Miss, Mister, . . .) and deverbal nouns were excluded.
There are a number of within-word variables that may affect word recognition and learning (Barcroft, 2015; Brysbaert et al., 2011b; Nation, 2022). The following word-related parameters were taken into account:
• corpus frequency per million, as multiple studies in different languages have shown that it is one of the most important predictors of learning success in lexical decision tasks (Balota et al., 2007; Brysbaert et al., 2011b; Ferrand et al., 2010; Keuleers et al., 2010; Vitkovitch & Humphreys, 1991) as well as in word recall (Gregg et al., 2006; Lotto & de Groot, 1998);
• the number of syllables, as the length of the word correlates with the learning effort (Baddeley et al., 1975; Barclay & Pellicer-Sánchez, 2021; Brysbaert et al., 2011b; Deconinck et al., 2017, p. 42);
• the normalized Levenshtein distance to standard German (one of the dominant languages of the large majority of participants), as it quantifies the orthographic similarity between a word and its translation equivalent and is therefore an important predictor of learning success (Brysbaert et al., 2011b; Mulder et al., 2019) and spontaneous positive transfer (Vanhove & Berthele, 2015);
• concreteness, as numerous studies have shown that concrete words are easier to learn in recall tests (De Groot & Keijzer, 2000; De Groot & Smedinga, 2014; Miller & Roodenrys, 2009; Tsuboi, 2019) and recognition tests (De Groot & Keijzer, 2000; De Groot & Smedinga, 2014; Tsuboi, 2019).
Corpus frequency was calculated based on the frequency information in the SUBTLEX-DE corpus (raw token count / 25.4, the corpus size in millions of words = token frequency per million). The normalized Levenshtein distance was computed by comparing the German word with the Estonian translation of each word item. Concreteness was assessed drawing on data provided by Charbonnier and Wartena (2020), a data set of 4,181 German words whose concreteness ratings were compiled from three different data sets (Göttingen Word Norms, Web Word Norms and Leipzig Affective Word Norms) and converted to a common scale. The number of syllables of the Estonian translations was counted applying German hyphenation norms, with the exception that onset vowels counted as a separate syllable. For the present experiment, only Estonian nouns with one or two syllables were chosen, as shorter vocabulary items are memorized faster and better than longer words (Baddeley et al., 1975; Barclay & Pellicer-Sánchez, 2021; Deconinck et al., 2017, p. 42).
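The two computed item parameters can be illustrated with a minimal Python sketch. It is not the authors' script: the normalization (dividing the edit distance by the length of the longer word) and the case folding are assumptions, since the text does not specify them, and the function names are hypothetical.

```python
CORPUS_SIZE_MILLIONS = 25.4  # size of SUBTLEX-DE as reported in the text


def frequency_per_million(raw_count: int) -> float:
    """Convert a raw corpus token count into a frequency per million words."""
    return raw_count / CORPUS_SIZE_MILLIONS


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_levenshtein(german: str, estonian: str) -> float:
    """Edit distance scaled to the range 0-1.

    Assumption: words are case-folded first (German nouns are capitalized)
    and the distance is divided by the length of the longer word.
    """
    a, b = german.casefold(), estonian.casefold()
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0
```

Under these assumptions, `normalized_levenshtein("Käse", "juust")` evaluates to 0.8 (four edits over five characters), which reflects the low orthographic overlap that motivated the choice of Estonian.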
We needed three word lists of 20 words each (from a total of 60 Estonian nouns) that were balanced for all of the above-mentioned parameters. To obtain these lists, we formed 20 triplets of vocabulary word items with similar values for frequency, Levenshtein distance, concreteness and number of syllables. The maximum differences we allowed between the three word items of a triplet regarding these item-related parameters are listed in Table 2.
Table 2. Maximum differences for stimulus words to be considered equivalent.
These equivalent lexical triplets were randomly assigned to one of three word lists. To obtain a more or less natural distribution of more concrete (concreteness of 3.5–7.0) and abstract nouns (concreteness of 0–3.5) we calculated the percentage distribution in the data set with all the 515 nouns. Two abstract triplets (12.43% abstract nouns in the dataset) and 18 concrete triplets (87.57% concrete nouns in the dataset) were selected for the lists. The averages of the item-related variables across the three vocabulary lists, as well as across the conditions after the intervention, are listed in Appendix A.
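The matching and assignment steps described above can be sketched as follows. This is an illustration only: the tolerance values in `MAX_DIFF` are hypothetical placeholders (the actual limits are those reported in Table 2), the `Item` class and function names are invented for the example, and the sketch assumes that each triplet contributes exactly one item to each of the three word lists.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    word: str
    freq_per_million: float
    levenshtein: float
    concreteness: float
    syllables: int


# Hypothetical tolerances for illustration; the real values are in Table 2.
MAX_DIFF = {
    "freq_per_million": 5.0,
    "levenshtein": 0.2,
    "concreteness": 1.0,
    "syllables": 0,
}


def is_equivalent_triplet(items):
    """True if the three items differ by no more than MAX_DIFF on every parameter."""
    for attribute, tolerance in MAX_DIFF.items():
        values = [getattr(item, attribute) for item in items]
        if max(values) - min(values) > tolerance:
            return False
    return True


def assign_triplets_to_lists(triplets, seed=None):
    """Randomly place one member of each triplet in each of three word lists."""
    rng = random.Random(seed)
    word_lists = ([], [], [])
    for triplet in triplets:
        shuffled = list(triplet)
        rng.shuffle(shuffled)
        for word_list, item in zip(word_lists, shuffled):
            word_list.append(item)
    return word_lists
```

With 20 triplets as input, `assign_triplets_to_lists` returns three lists of 20 items each, so that the lists are balanced on the matched parameters by construction.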
3 Procedure
To answer the presented research questions, a 3 × 3 within-participant design was applied (see Figure 2). In this design, we tested the (1) monolingual condition, (2) bilingual condition and (3) multilingual semantic+ condition on day 1, day 3 and day 10 of the intervention. All participants were tested in all three conditions. The order of the three conditions was counterbalanced on day 1 as well as on day 3 for each participant in order to cancel out order effects. On day 1, the three word lists were randomly assigned to the three conditions for each participant (for example for participant 12: word list 1 assigned to ‘bilingual condition’, word list 2 to ‘monolingual condition’ and word list 3 to ‘multilingual semantic+ condition’). This allocation then remained unchanged throughout the experiment. Altogether, the experiment spanned 10 days for each participant.

Figure 2. Study design.
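One way to generate such a schedule is sketched below in Python. The scheme is an assumption rather than the authors' procedure: the six possible condition orders are cycled across participants for day 1, the day 3 order is taken from a shifted position in the same cycle, and the word lists are shuffled once per participant. The function and key names are hypothetical.

```python
import itertools
import random

CONDITIONS = ("monolingual", "bilingual", "multilingual semantic+")
ORDERS = list(itertools.permutations(CONDITIONS))  # the 6 possible orders


def plan_for_participant(index, seed=None):
    """Build one participant's schedule (illustrative scheme, see lead-in)."""
    rng = random.Random(seed)

    # Random list-to-condition allocation, fixed for the whole experiment.
    word_lists = [1, 2, 3]
    rng.shuffle(word_lists)
    list_for_condition = dict(zip(CONDITIONS, word_lists))

    # Assumed counterbalancing: cycle through all orders across participants
    # and use a different order on day 3 than on day 1.
    day1_order = ORDERS[index % len(ORDERS)]
    day3_order = ORDERS[(index + 3) % len(ORDERS)]

    return {
        "list_for_condition": list_for_condition,
        "day1_order": day1_order,
        "day3_order": day3_order,
    }
```

With 72 participants, this scheme uses each of the six orders exactly 12 times on day 1, which is one simple way of balancing order effects across the sample.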
All participants learned 20 Estonian nouns in each condition. After learning the word items in each condition, participants were immediately tested with an L2–L1 cued-recall post-test. After having completed the test, participants were instructed to take a 10-minute break before continuing with the next condition to avoid cognitive overload. This procedure remained the same on day 1 and day 3 of the intervention. Seven days after the second learning session (day 10), the delayed L2–L1 cued-recall post-test including all 60 words of the three conditions was directly administered without any learning phase or repetition of the words.
Participants completed the learning and testing process independently at home without the presence of the investigators. This was done to increase willingness to participate in the study, as the participants were more flexible in when to start the intervention. In addition, they were able to learn in their natural environment, without interference from other participants or researchers, and at their own pace, as forcing a time limit upon participants can lead to an uneven distribution of the learning time of target words that are further down on the list of items to be learned (Barclay & Pellicer-Sánchez, 2021, p. 264). To ensure participants had a good understanding of the learning and testing procedure, an individual online briefing via Zoom took place with each participant. In this briefing, an overview of the 10-day intervention was presented and the participants were told exactly what they had to do in each condition and how to use the Memocard (2023) learning and the Involve.me (2023) testing programs. After a comprehensive explanation of the procedure, individual questions were clarified, and the starting day and the desired means of communication with the investigators were determined. On every day of learning and testing, participants received a message with a reminder of the most important instructions. Furthermore, we provided written explanations of the procedure with hyperlinks to the learning tasks and test materials. These materials were customized for each participant according to their individual language profiles. In the following sections we explain the learning procedure of the Estonian words, followed by details regarding the post-tests and scoring.
a Learning procedure
The participants learned the Estonian words in each condition independently through the online learning program Memocard (2023). In this program and for all conditions, the Estonian word to learn was always prompted, after which participants could click on the button ‘answer’ to receive the corresponding translation(s) in their language(s). To get to the next Estonian word, participants had to click on the button ‘next’ until they had gone through all 20 words in each condition and had therefore completed one round of learning. As each participant showed a different language profile and word lists were randomly assigned to the conditions, the learning program for each condition with corresponding translation(s) had to be customized for each participant individually. The translation(s) and the procedure in each condition were as follows:
• In the monolingual condition, the Estonian word to learn was prompted, then its translation in the participants’ strongest language was displayed. For balanced bilinguals, one of the two languages was chosen at random. The participants were asked to repeat the Estonian word and its translation twice aloud, for example: juust – cheese, juust – cheese.
• In the bilingual condition, participants were shown the Estonian word, and received translations in their two strongest languages. The translations were arranged one below the other; the translation in the strongest language always came first, followed by the translation in the second strongest language. For balanced bilinguals, the order of the translations was random. Participants were asked to say out loud the Estonian word and its translation once in each language, for example: juust – cheese, juust – fromage.
• In the multilingual semantic+ condition, participants were shown the Estonian word first, and got the translation in their strongest language, followed by a question mark (?). The translation in the strongest language always came first, followed by a question mark on the next line. The question mark was the prompt for the participants to retrieve the meaning of the presented Estonian word in another language they master. The participants could use any of their languages, and they were free to change the language for each item. As for the other conditions, participants had to repeat the Estonian word once with the given translation and once in their own retrieved translation in another language, for example: juust – cheese, juust – ? (possible answer: Käse). In case participants could not retrieve the meaning of the Estonian word in any other language, they were asked to repeat the word in the given translation twice.
The participants had to use the learning program Memocard (2023) on day 1 and day 3 of the intervention to learn the Estonian word items in each condition. There was no time limit given for the learning sessions; the learning time in each condition for each participant, however, was recorded so that we could later calculate learning efficiency. On average, participants spent 14 minutes 30 seconds in the monolingual, 15 minutes 2 seconds in the bilingual and 16 minutes 12 seconds in the multilingual semantic+ condition studying the word items in our intervention. Furthermore, participants were explicitly told that they should only learn the words as instructed on day 1 and day 3 with the following number of rounds, hence repetitions:
• On day 1, participants learned the 20 Estonian words in each condition in three rounds.
• On day 3, each participant had to repeat the learned words from day 1 with two more rounds in each condition.
The reason for having multiple learning rounds on day 1 and day 3 was to increase the number of repetitions (Nation, 2022; Peters, 2014), which implies more instances of active retrieval of the newly learned words (Barcroft, 2007; Nakata, 2017, p. 20) as well as the possibility of distributed learning through spacing (Karpicke & Bauernschmidt, 2011; Kornell, 2009). Both features are related to long-term retention in learning. We decided to have the learners do three learning rounds on day 1. In the first round, the participants encountered the completely novel Estonian words. In the two subsequent rounds, they were expected to actively retrieve the Estonian words. Therefore, participants went through two rounds of active retrieval for each word on both day 1 and day 3 of the intervention. Determining the number of retrieval sessions is to some extent arbitrary: The literature suggests that there is no noteworthy difference in learning between one and three retrieval practices within a learning session (Nakata, 2017). Our choice of two such retrieval sessions was based upon the remark made by Nakata (2017, p. 675) that giving learners more retrieval opportunities can make them feel more confident about the task.
Because each additional round increases the learning time needed, the number of rounds was restricted to those described above; more rounds would have exceeded a feasible time investment for the participants. A pilot study had shown this number of rounds to yield sufficient learning gains.
b Testing procedure and scoring
After each learning phase in each condition on day 1 and day 3, the participants completed an immediate L2–L1 cued-recall post-test through the survey program Involve.me (2023), which resulted in a total of six immediate L2–L1 cued-recall post-tests per participant. The Estonian words in each immediate post-test were presented in a randomized order, which differed from the order in the learning phase. The delayed L2–L1 cued-recall post-test was administered on day 10 of the intervention, hence 7 days after the last learning session, and included all 60 Estonian words of all conditions in a randomized order. All the tests prompted an Estonian target word, which the participants had to translate into one of their known languages. Scoring did not depend upon which language they chose, whether the spelling was correct, whether articles were used or not before the noun (‘the house’ or ‘house’ or ‘a house’), or whether singular or plural forms were provided (‘the house’ or ‘the houses’). All participants had been informed about the above-mentioned testing and scoring design during the briefing and the relevant information was refreshed at the beginning of each post-test.
The testing platform Involve.me (2023) provided information on when the participants had completed the test, how long it took them and which translations they had given for each Estonian word. Participants who did not start or complete the learning and testing phases in the given time interval (day 1, day 3 and day 10) were contacted that same day by the authors to remind them once again to participate. If they had not participated in all the learning and testing phases according to the time frame, they were excluded from further analyses.
For the analysis, the authors evaluated the number of correctly translated Estonian words in each of the seven post-tests (day 1, day 3 and day 10) for each condition and each participant individually, and summarized the results in a spreadsheet at word and condition level. Each word item was rated as either correct or incorrect. Correctly translated words were scored as correct; false translations were marked as incorrect. If participants indicated that they did not know the translation of a word, marked the answer with a question mark or left the answer blank, the item was also scored as incorrect.
In addition, efficiency for each participant in each condition was calculated. As pointed out by Nakata (2017), the time spent on a task should be considered as a contributing factor when elaborating and evaluating learning strategies. Learning success can be increased by simply spending more time on the words to be learned (Schmitt, 2008). Our participants could determine their own pace of learning, and the time spent on the learning tasks may have differed across the different conditions. Thus, we included efficiency in terms of number of items learnt per time unit in our study.
To calculate efficiency, learning time in each condition on day 1 and day 3 and the number of correctly reproduced vocabulary words on the delayed post-test were used as follows:
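Read together with the definition given in the Results section (correctly retrieved word items per 10 minutes), the calculation can be sketched as follows. This is an illustration only: the function name, the use of seconds as the time unit, the summing of day 1 and day 3 learning time, and the sample values are assumptions, not the authors' analysis code.

```python
def efficiency_per_10_min(correct_delayed: int,
                          seconds_day1: float,
                          seconds_day3: float) -> float:
    """Correctly recalled words on the delayed post-test per 10 minutes of learning.

    Assumption: learning time is the sum of the day 1 and day 3 sessions
    for one participant in one condition, measured in seconds.
    """
    total_seconds = seconds_day1 + seconds_day3
    if total_seconds <= 0:
        raise ValueError("total learning time must be positive")
    # 600 seconds = 10 minutes
    return correct_delayed * 600 / total_seconds


# Hypothetical participant: 12 of 20 words recalled on day 10,
# after 9 minutes of learning on day 1 and 6 minutes on day 3.
print(efficiency_per_10_min(12, 9 * 60, 6 * 60))  # 8.0
```

Expressing the ratio per 10 minutes keeps the values on a scale close to the raw word counts, which makes conditions with different learning times directly comparable.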
V Results
To examine the research questions and assess the learning success for all conditions and test points, a logistic regression model using a 3 × 3 hypothesis matrix (see Schad et al., 2020) was fitted to the data, modeling the participants’ responses on the item level. Because more than nine hypotheses were formulated, the 3 × 3 hypothesis matrix was calculated twice in total (for more details, see Appendix A). In addition, the factor ‘concreteness’ was included in the analysis as a covariate to control for the influence of this word-level variable as a possible confound. Figure 3 and Table 3 summarize the final results of this study.

Figure 3. Results of the post-tests in all conditions.
Table 3. Comparisons of the conditions.
Notes. biling = bilingual condition. mono = monolingual condition. multisem = multilingual semantic+ condition. p-values in bold are < 0.05.
The analysis of each word item shows that concreteness is a significant predictor (p = 0.007). Moreover, more words were recalled on the second immediate post-test (day 3) than on the first immediate post-test (day 1, p ⩽ 0.001). Words were studied in three rounds on day 1 and were then repeated in two more rounds on day 3. These additional two rounds led to better results on day 3. Furthermore, there is a significant difference between the immediate post-tests and the delayed post-test (p ⩽ 0.001): Scores in the immediate post-tests are significantly higher than in the delayed post-test, indicating the characteristic memory decay shown in many vocabulary learning studies (see, for instance, Barclay & Pellicer-Sánchez, 2021; Ellis & Beaton, 1993; Nakata & Webb, 2016; Peters, 2014; Schneider et al., 2002).
Efficiency was calculated for each participant and each condition as the number of correctly retrieved word items per 10 minutes. A multiple regression analysis was conducted using a 3 × 1 hypothesis matrix (see Schad et al., 2020) to compare the three conditions. To test all hypotheses, the 3 × 1 hypothesis matrix was again calculated twice (for more details, see Appendix A). Results are shown in Figure 4 and Table 4.
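In the approach described by Schad et al. (2020), contrast codes are obtained as the generalized inverse of a hypothesis matrix whose rows state the comparisons of interest. The sketch below illustrates this for three conditions. The two comparisons chosen here (bilingual vs. monolingual, bilingual vs. multilingual semantic+) and the grand-mean intercept are assumptions made for illustration; the matrices actually used in the study may differ. Because this hypothesis matrix is square and invertible, its generalized inverse is simply its inverse.

```python
from fractions import Fraction


def invert(matrix):
    """Invert a square matrix exactly using Gauss-Jordan elimination."""
    n = len(matrix)
    # Augment the matrix with the identity matrix.
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Find a row with a non-zero pivot and move it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        # Eliminate the pivot column from all other rows.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


third = Fraction(1, 3)
# Columns: mono, biling, multisem. Each row is one hypothesis about the means.
hypotheses = [
    [third, third, third],  # intercept: grand mean of the three conditions
    [-1, 1, 0],             # biling - mono (illustrative choice)
    [0, 1, -1],             # biling - multisem (illustrative choice)
]

# Rows of the result: mono, biling, multisem. Columns: intercept, contrast 1, contrast 2.
contrasts = invert(hypotheses)
for condition, row in zip(["mono", "biling", "multisem"], contrasts):
    print(condition, [str(value) for value in row])
```

With this coding, each regression coefficient estimates exactly the difference stated in the corresponding row of the hypothesis matrix, which is why a second matrix is needed to test comparisons that the first one does not contain.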

Figure 4. Number of correctly retrieved words per 10 minutes in each condition.
Table 4. Comparison of efficiency between the different conditions.
Notes. biling = bilingual condition. mono = monolingual condition. multisem = multilingual semantic+ condition. meanbiling + multisem = mean of bilingual and multilingual semantic+ condition. p-values in bold are < 0.05.
Research question 1: Using bilingual translations in vocabulary cards
Research question 1 aimed to compare the learning success and efficiency of bilingual translations (bilingual condition) with those of the traditional translation into one language (monolingual condition) in vocabulary card learning. We hypothesized that the bilingual condition would yield better results in the post-tests. The regression model fitted to the data revealed that the monolingual condition significantly outperformed the bilingual condition on the two immediate post-tests (T1: p = 0.004, T2: p ⩽ 0.001) and on the delayed post-test (T3: p = 0.027). The hypothesis that the bilingual condition yields better long-term performance must therefore be rejected: Words were learned significantly better at all testing points with only one translation in the dominant language of each participant. The comparison of efficiency between the monolingual and bilingual conditions, however, yielded no statistically significant effect (p = 0.177), even though the monolingual condition descriptively yields better scores.
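To give a sense of the size of these effects, the odds ratios reported in Appendix A for the bilingual vs. monolingual comparison (0.76, 0.60 and 0.82 at T1, T2 and T3) can be translated into recall probabilities. The sketch below does this for a hypothetical baseline probability of 0.5 in the monolingual condition; the baseline and the function name are assumptions chosen for illustration, and the conversion ignores the random effects of the fitted model.

```python
def apply_odds_ratio(baseline_probability: float, odds_ratio: float) -> float:
    """Return the probability obtained by scaling the baseline odds by an odds ratio."""
    if not 0 < baseline_probability < 1:
        raise ValueError("baseline probability must lie strictly between 0 and 1")
    baseline_odds = baseline_probability / (1 - baseline_probability)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1 + new_odds)


# Odds ratios for biling vs. mono as reported in Appendix A.
reported_odds_ratios = {"T1": 0.76, "T2": 0.60, "T3": 0.82}

# Hypothetical baseline: a 50% chance of recalling a word in the monolingual condition.
for test_point, odds_ratio in reported_odds_ratios.items():
    print(test_point, round(apply_odds_ratio(0.5, odds_ratio), 3))
```

Under this assumed baseline, an odds ratio of 0.60 corresponds to a drop from a 50% to a 37.5% chance of correct recall, whereas 0.82 corresponds to a drop to roughly 45%.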
Figure 5 shows the participants who performed better in the monolingual condition (all data points above the zero line) and the participants who performed better in the bilingual condition (all data points below the zero line) in the delayed post-test: 29 participants benefited more from the bilingual condition, whereas 34 participants were able to reproduce more Estonian words in the monolingual condition. Nine participants achieved the same scores in both conditions.

Figure 5. Comparison of participants’ results in the delayed post-test.
Research question 2: Testing the semantic TOPRA model prediction
Research question 2 aimed to verify whether a multilingual, semantically oriented learning task (multilingual semantic+ condition) would yield worse results in comparison to other learning tasks (monolingual and bilingual condition) in the L2–L1 cued-recall post-tests, as predicted by the TOPRA model (Barcroft, 2002, 2003).
Our analysis reveals that only the monolingual condition performed significantly better than the multilingual semantic+ condition in all the conducted post-tests (T1: p ⩽ 0.001 / T2: p ⩽ 0.001 / T3: p = 0.002; see also Table 3). In terms of efficiency, similar results were found (see Table 4): The monolingual condition was significantly more efficient than the multilingual semantic+ condition (p = 0.016).
As can be seen in Figure 3, the bilingual condition performed slightly better than the multilingual semantic+ condition. This difference, however, is small and reaches statistical significance only for the first immediate post-test (T1: p = 0.01 / T2: p = 0.095 / T3: p = 0.388). In terms of efficiency, the result is similar: There was no statistically significant difference between the two conditions (p = 0.289); the bilingual condition, however, performed descriptively better than the multilingual semantic+ condition (see also Figure 4).
Thus, our hypothesis can only be confirmed in regard to the comparison between the monolingual and the multilingual semantic+ condition. Between the bilingual and the multilingual semantic+ conditions, a significant difference emerged only on the first immediate post-test; no significant difference was found on the later post-tests or in efficiency.
VI Discussion
1 Using bilingual translations in vocabulary cards
Research question 1 was whether bilingual translations on vocabulary cards have a beneficial effect on the learning of Estonian words. To address this question, the monolingual condition, which represents traditional vocabulary learning with a translation into learners’ dominant language, was compared to the bilingual condition, which included two translations in the two strongest languages of each participant. We assumed that the two translations present in the bilingual condition could activate more language-specific associations, which could then lead to a deeper integration of the Estonian target word into the network structure of the mental lexicon (see Baxter et al., 2022; Dijkstra & van Heuven, 2002). Additionally, the two translations could encourage participants to perform spontaneous form–meaning fit elaborations, deciding which of the two translations best fits the Estonian target word. Deliberate form–meaning fit rating tasks have been shown to be beneficial in meaning and form recall tests (Deconinck et al., 2010, 2014, 2017). Based on this rationale, we hypothesized that the bilingual condition would be more efficient and yield better performance in the retention of the newly learned words in the post-tests.
The results show that learning in the monolingual condition significantly outperformed learning in the bilingual condition in all post-tests. In terms of efficiency, however, results indicate no significant difference between the two conditions; that is, both conditions seem more or less equally efficient.
So why did the bilingual condition, contrary to our expectations, not perform better? Studies have shown that any stored representation in the multilingual lexicon is activated as soon as a target item is in some way similar to one of the stored items, even if participants have not been explicitly prompted to go into a multilingual mode (Abbas et al., 2021; Child, 2022; Jamali et al., 2021; Rothman, 2011), as was the case, for example, in the monolingual condition. However, our study design was conceived in such a way that almost no positive transfer on the level of the word form was expected, as the translation equivalents from the participants’ individual multilingual repertoires were formally dissimilar to the Estonian target items. Therefore, it is highly probable that the inclusion of a supplementary language with no cross-linguistic possibilities led to additional learning effort in the bilingual condition.
Explicit form–meaning fit elaborations have resulted in a positive learning effect in some studies (Deconinck et al., 2014, 2017). In those studies, however, participants (1) performed form–meaning fit elaborations with only one translation and not with two as in the bilingual condition of our study, (2) had Dutch translations for English target words, two Germanic languages that are therefore formally similar, and (3) were explicitly asked to do such form–meaning fit elaborations, which was not the case in our study.
Drawing on the TOPRA model (Barcroft, 2000, 2002), the task type in the bilingual condition may also have increased, to some extent, the structural component, since two written word forms (= the two translations) had to be linked to the Estonian word form. Therefore, the participants had to deal more with word forms, as they had to process two known forms and compare them to the target form, maybe assessing how they could formally build a bridge across the very dissimilar languages (as, for example, done in the popular keyword learning method). The underlying trade-off idea of the TOPRA model (Barcroft, 2002, 2003) entails that an increase in components of one kind unavoidably leads to a decrease in processing capacity for components of another kind. In our case, this means that there were fewer semantic processing resources available due to the higher engagement with the other two components. Arguably, this may have had a negative effect on L2–L1 cued recall, since our test not only taps into the form–meaning mapping, but also into semantic processing.
2 Testing the semantic TOPRA model predictions
Research question 2 was whether a task with an increased focus on lexical semantics (multilingual semantic+ condition) has a detrimental effect on vocabulary card learning in the post-tests administered. To answer this question, the multilingual semantic+ condition was compared to the monolingual and bilingual conditions. Drawing on the TOPRA model (Barcroft, 2002, 2003), we hypothesized that the multilingual semantic+ condition, compared to the other two conditions, would lead to fewer correctly recalled word items and lower efficiency in all L2–L1 cued-recall post-tests administered, as more resources are allocated to semantic processing and therefore fewer resources are available for the form and mapping components of the word items.
The results showed that the multilingual semantic+ condition performed significantly worse than the monolingual condition, both in the post-tests and in terms of efficiency. In comparison to the bilingual condition, the multilingual semantic+ condition also performed descriptively worse in the post-tests and in terms of efficiency; these differences, however, were not statistically significant except on the first immediate post-test. The prediction of the TOPRA model that fewer resources are available for the mapping and form components if semantic engagement is more intense can be confirmed, but with a consistently significant difference only in comparison to traditional vocabulary card learning with translations into one language. Studies that compare a semantically oriented learning task to a control task for learning new L2 vocabulary items yield similar results: In all studies we are aware of, the control group reached better results than the group learning via an elaborated semantic learning task. Among these tasks were activities such as writing the vocabulary word item in a sentence (Barcroft, 2004; Wong & Pyun, 2012), a pleasantness rating of the vocabulary word item (Barcroft, 2002; Kida, 2010), asking questions concerning the meaning of the vocabulary word item (Barcroft, 2003) or choosing the more semantically similar word item to the given vocabulary word item (Kida et al., 2022).
The comparison between the multilingual semantic+ and the bilingual condition yielded, as hypothesized, worse results for the semantic+ condition. These results, however, were not significant beyond the first immediate post-test. As mentioned above, we believe that the bilingual condition has potentially promoted higher levels of structural and mapping engagement. This may have resulted in less engagement with the semantic component. Given that L2–L1 cued-recall post-tests rely not only on mapping, but also on semantic elaboration, this may explain why the results of the multilingual semantic+ condition were descriptively, but mostly not significantly, worse in the L2–L1 cued-recall post-tests.
VII Conclusions and practical implications
This study has shown that none of the more complex learning conditions led to better gains than traditional vocabulary card learning. Arguably, straightforward vocabulary card learning is sufficiently cognitively challenging that any additional cognitive load leads to lower gains. We chose Estonian target items to avoid formal overlaps with lemmas in the multilingual mental lexicons of the participants. Barcroft (2015) rightly points out that the smaller the lexical overlap, the higher the learning burden and therefore the more challenging the task. It may be possible that the more complex tasks in the bilingual and multilingual semantic+ condition led to divided attention and/or back and forth switching of attention between the form and the meaning of the corresponding word, which, along the lines of the trade-off idea of our underlying model, may have hampered the learning of the association of forms and meanings.
In regard to research question 1, it is nevertheless important to point out that a minority of 29 out of the 72 participants did in fact benefit more from the bilingual condition than the monolingual condition and that both conditions showed similar efficiency. Further research is needed to identify the individual difference variables that make the bilingual condition more efficient for certain learners. Including two translations, especially for bilinguals, might still be a valuable tool for some students, but future research should be done to identify the individual profiles of these learners.
Regarding research question 2, we conclude that deeper semantic processing does not necessarily lead to better results, even though this assumption ties in with pedagogical common sense and is entailed by influential models of learning such as LOP (Craik & Lockhart, 1972).
A limitation of our study is that participants carried out the learning and testing procedure independently at home rather than in a controlled lab environment. Despite detailed briefing, support materials, reminders on each intervention day and checks of who completed the tests on time, there was in the end relatively limited control of how participants carried out the learning and testing tasks. The authors relied on the individual responsibility and self-motivation of the participants to fulfill all the tasks as described, as they all took part voluntarily in the study.
Furthermore, this study only produced results from an L2–L1 cued-recall post-test. In future research, it would be worthwhile to include other types of tests such as L1–L2 cued recall or free recall in L1 and L2. This would enable us to compare differences depending on which component (structural, semantic or mapping) is being tested. It would be particularly interesting to investigate how the multilingual conditions perform, since we hypothesize that the bilingual condition may have resulted in higher structural and mapping elaboration, which would lead to better L1–L2 cued recall compared to the L2–L1 test used in the present study.
Vocabulary card learning is efficient but cognitively demanding. Teachers should be aware of the limited cognitive processing capacities of the learners and try to create a learning environment and learning strategies that support the actual task, and that eliminate additional input that leads to possible distortions of attention or unduly high processing of one component at the expense of another. We hope the present study contributes to our understanding of the factors that have an impact on deliberate vocabulary learning in multilingual learners of additional languages.
Appendix A
Parameter estimates for the post-test results.
| Results: | | | |
| Predictors | Odds ratios | CI | p |
| (Intercept) | 0.68 | 0.25–1.83 | 0.449 |
| CellT2T1 | 4.48 | 3.59–5.60 | **⩽ 0.001** |
| CellT3T1T2 | 0.21 | 0.20–0.24 | **⩽ 0.001** |
| CellT1 bilingvs.mono | 0.76 | 0.64–0.92 | **0.004** |
| CellT2 bilingvs.mono | 0.60 | 0.49–0.74 | **⩽ 0.001** |
| CellT3 bilingvs.mono | 0.82 | 0.69–0.98 | **0.027** |
| CellT1 bilingvs.multisem | 1.26 | 1.06–1.51 | **0.01** |
| CellT2 bilingvs.multisem | 1.18 | 0.97–1.44 | 0.095 |
| CellT3 bilingvs.multisem | 1.08 | 0.91–1.28 | 0.388 |
| concreteness | 1.27 | 1.07–1.50 | **0.007** |
| Random effects: | | | |
| σ² | 3.29 | | |
| τ₀₀ (Participant) | 1.48 | | |
| τ₀₀ (Item) | 0.49 | | |
| ICC | 0.37 | | |
| N (Participant) | 72 | | |
| N (Item) | 60 | | |
| Observations | 12957 | | |
| Marginal R² / Conditional R² | 0.122 / 0.450 | | |
Notes. biling = bilingual condition. mono = monolingual condition. multisem = multilingual semantic+ condition. p-values in bold are < 0.05.
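The ICC in the table can be reproduced from the variance components listed with it. The short check below assumes the common convention for logistic mixed models, in which the residual variance is fixed at π²/3 (about 3.29) and the ICC is the share of random-intercept variance in the total variance; the variable names are illustrative.

```python
import math

# Residual variance of the standard logistic distribution (assumed convention).
residual_variance = math.pi ** 2 / 3

# Random-intercept variances as reported in the table.
tau_participant = 1.48
tau_item = 0.49

random_variance = tau_participant + tau_item
icc = random_variance / (random_variance + residual_variance)

print(round(residual_variance, 2), round(icc, 2))  # 3.29 0.37
```

An ICC of 0.37 indicates that a little more than a third of the variance on the latent scale is attributable to differences between participants and between items, with participants contributing the larger share.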
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
