Abstract
This study examines whether learners exposed to specific example sentences through data-driven learning (DDL) can not only identify generalized linguistic patterns but also apply the patterns to other expressions, thereby demonstrating that DDL is a learning method based on a usage-based model. Forty-three Japanese learners of English participated in DDL activities to study the use of six verbs from two verb classes (three from the Throw class and three from the Whisper class) in terms of the dative alternation. Specifically, they studied whether these verbs can be used in the double object (DO) construction or the prepositional dative (PD) construction. The participants underwent pre-, post-, and delayed post-tests, during which they evaluated the grammaticality of sentences containing the studied verbs, as well as unstudied verbs from the same classes and verbs from the control classes (the Send and Mention classes). A cumulative link mixed model (CLMM) was employed to analyse the effects of test timing (pre/post/delayed post), learning (studied/unstudied), and construction (PD/DO) on test scores. The results showed that learners made more correct judgments on the post-test than on the pre-test. This improvement was observed not only for the studied verbs but also for unstudied verbs from the same classes, and even for verbs from the control classes. This indicates that DDL embodies the idea of a usage-based model; that is, learners generalize linguistic patterns through language experience. Furthermore, the learning effects were retained even in the delayed post-test, suggesting that DDL is not merely a tool for referencing word usage but also a learning method that converts input into intake.
Keywords
I Introduction
The rapid development and widespread use of personal computers have made it possible to process vast amounts of linguistic data at high speeds. Consequently, a variety of linguistic studies have been conducted using corpora, including their use in language education. Generally, the primary application of corpora in language education involves the use of empirical findings obtained from corpus analyses in the compilation of dictionaries and grammar books. This usage of corpora can be said to be indirect for learners, in that the users of the corpora are not learners but linguists. In 1991, Johns proposed a new corpus-based learning method called data-driven learning (DDL) that has attracted the attention of researchers. The key difference between DDL and other teaching methods is that it views the learner as ‘a research worker whose learning needs to be driven by access to linguistic data’ (Johns, 1991, p. 2). That is, DDL is a learning method in which learners are exposed to examples of language and inductively find patterns within them.
Over 30 years have passed since DDL was first proposed, and many empirical studies have been conducted on this method. However, as Boulton (2017) observed, few of these studies have directly sought to test its theoretical foundations. In other words, the critical question of ‘Why can DDL be effective?’ remains unanswered. In addition, a meta-analysis of DDL studies (Boulton & Cobb, 2017) revealed that DDL research to date rarely reports delayed post-tests. Of course, DDL has been reported to be effective during writing activities as a reference tool for correcting errors or checking the usage of a word a learner is about to use (e.g. Yoon & Jo, 2014). However, DDL appears to be more than a convenient reference tool. Through DDL activities, learners can not only inductively find but also acquire linguistic patterns in the target language. To demonstrate this learning process, it is essential to conduct delayed post-tests to show that the effect is not temporary.
Thus, the aims of this study were twofold. First, it sought to demonstrate that DDL is a learning method supported by a theory of language acquisition (i.e. the usage-based model), thereby establishing DDL as an effective learning method with a solid theoretical foundation. Second, it aimed to demonstrate that the knowledge acquired through DDL persists in delayed post-tests. This addresses a gap in previous studies by examining the long-term effects of DDL.
The remainder of this article proceeds as follows: Section II provides an overview of DDL and introduces research on the usage-based model as a theoretical justification of the learning method. It then describes the problems with current DDL research. Section III briefly explains the dative alternation employed to address these problems in the present study. Section IV presents the methods of this study with results detailed in Section V. This is followed by the discussion and conclusions in Sections VI and VII, respectively.
II How data-driven learning works
1 DDL
Johns (1991) proposed an innovative language-learning method in which learners access linguistic data to identify patterns in the target language. Recently, various types of DDL have been proposed. Most of them use computers, but some studies have used printed material (e.g. Smart, 2014). Some teachers have used preselected concordance lines, whereas others have not. Typically, DDL is conducted as follows: Learners are presented with several example sentences containing the target language patterns in Key Word in Context (KWIC) format. Learners then find patterns on their own. For example, in Johns (1991), learners were asked to discover some differences between convince and persuade using handouts, which are partly shown in Figure 1. In this study, learners read through the concordance lines in the handout and were expected to realize that convince tended to be followed by a that-clause, while persuade was most often followed by a to-infinitive. Furthermore, the learners discovered an unexpected pattern with persuade: When followed by a to-infinitive, it refers to an action, whereas in the case of a that-clause, it refers to a truth. The learners found this explanation more useful than the teacher’s conventional explanation of the differences between the two.

Figure 1. A handout used in Johns (1991).
Since its inception, DDL has attracted significant attention from language-learning researchers. According to Boulton and Cobb (2017), more than 200 empirical DDL studies had been published by June 2014. Many researchers have stated that DDL offers several advantages. For example, Boulton (2010) claimed that DDL encourages ‘noticing and consciousness-raising, leading to greater autonomy and better language learning skills in the long term’ (p. 535), and Johns (2012) argued that DDL helps learners to ‘develop inductive strategies’ and ‘become better language learners outside the classroom’ (p. 297).
2 Usage-based approach of DDL
One of the most important characteristics of DDL is that it reflects learning theories proposed in recent years. For example, some researchers have argued that usage-based models provide a theoretical justification for DDL (Allan et al., 2023; Boulton & Cobb, 2017; O’Keeffe, 2023). Proponents of usage-based/exemplar-based models (e.g. Bybee, 2008, 2010; Evans, 2014; Taylor, 2012; Tomasello, 2003) regard domain-general cognitive processes, such as categorization and analogy, as key factors in language acquisition. Underlying this is the idea that ‘the human brain is programmed to detect patterns in the world around us’ (Boulton & Cobb, 2017, p. 350). Language learning, then, is not special but rather a basic learning process in which humans make full use of their general cognitive abilities, including pattern-finding skills. 1
In a more recent study taking a similar stance, Dehaene (2020) defines learning in general, including language acquisition, as ‘inferring the grammars of a domain’, stating:
Characteristic of the human species is a relentless search for abstract rules, high-level conclusions that are extracted from a specific situation and subsequently tested on new observations. Attempting to formulate such abstract laws can be an extraordinarily powerful learning strategy, since the most abstract laws are precisely those that apply to the greatest number of observations. Finding the appropriate law or logical rule that accounts for all available data is the ultimate means to massively accelerate learning – and the human brain is exceedingly good at this game. (p. 35)
In this view, the human brain is innately able to extract general and systematic patterns, creating an abstract ‘mental model’ (Dehaene, 2020, pp. 6–7) from concrete phenomena. According to usage-based models, learning ability is an important driving force of language acquisition.
The proponents of usage-based models see language as ‘dynamic, complex, probabilistic, interactive, and patterned’ (Boulton & Cobb, 2017, p. 350). For example, Evans (2014) argues that exposure to many examples of the -s suffix used with nouns leads to the discovery of a pattern, plural [NOUN -s], and its application to new nouns, thereby advocating a dynamic process of language acquisition in which a more abstract language system is gradually constructed from exposure to actual language use in specific situations. In recent years, several second language acquisition studies (e.g. Bybee, 2008) have adopted this model.
This view of language acquisition implies a need for substantial input. In an English as a foreign language (EFL) environment such as Japan, it is difficult for learners to gain exposure to sufficient input to acquire the target patterns. Therefore, the lack of language experience must be compensated for. DDL is a promising approach that facilitates learning by providing sufficient input in accordance with the usage-based theory of language learning (O’Keeffe, 2023). That is to say, the accumulation of language experience through DDL could allow learners to ‘proceed toward the target norm by progressive approximations’ (Boulton & Cobb, 2017, p. 350).
3 Problems of previous studies
Recently, DDL has been shown to have excellent learning effects. Boulton and Cobb (2017) conducted a meta-analysis of 84 sample groups from 64 studies and reported medium-to-large effects. However, research on this method has several limitations. One of the greatest concerns is that although the effectiveness and efficiency of the learning method have been revealed, research has not been able to explain why such effects might occur. As Boulton and Cobb (2017) state, ‘a meta-analysis cannot identify which theoretical underpinnings lead to these results’ (p. 386). Boulton (2017) views this as an overall problem in applied linguistics research: ‘[W]hile many empirical studies refer to theoretical and pedagogical foundations, few seek directly to test them, and theory has not been a major driving force leading to new practices’ (p. 485). This lack of theory-driven investigation appears to be one of the problems in DDL research. Although many studies refer to the background theories of DDL, most have focused on peripheral issues such as the effects of proficiency, the effects on complexity, accuracy and fluency (CAF), the need for training in using corpora or computers, and the differences depending on how language data are accessed, whether through direct exposure to the corpus or paper-based access (e.g. Boulton, 2009, 2010; Saeedakhtar et al., 2020; Samoudi & Modirkhamene, 2020; Smart, 2014; Yoon & Jo, 2014). In other words, the underlying theoretical rationale for DDL based on the usage-based model, that is, that the accumulation of intensive language experience allows learners to find patterns in the target language, has not been directly tested. To test this rationale, it is necessary to show with a specific linguistic phenomenon that intensive exposure to many example sentences through DDL can help learners notice language patterns and further generalize their knowledge.
Another problem with DDL studies is a lack of delayed post-tests (Boulton & Cobb, 2017). DDL can be used in various learning situations. One popular situation is in writing activities, where learners use corpora to find the words necessary to write what they want to express. However, DDL seems to be not only a simple and useful reference tool but also a powerful learning method that affords learners exposure to ample examples to generalize and acquire the patterns found in them. To demonstrate this, it is necessary to conduct delayed post-tests to show that the input has turned into intake. Dehaene (2020) explained that active engagement and processing depth are key factors in better learning. From the perspective of neuroscience, he describes learning retained over the long term as follows:
An unconscious image enters sensory areas but creates only a modest wave of activity in the prefrontal cortex. Attention, concentration, processing depth, and conscious awareness transform this small wave into a neuronal tsunami that invades the prefrontal cortex and maximizes subsequent memorization. (p. 180)
As DDL is a learning method in which learners construct their own abstract mental models from given examples and discover general patterns, it emphasizes active engagement and deep processing. This aligns well with Dehaene’s suggestions. By incorporating delayed post-tests, we can better demonstrate the effectiveness of DDL, showing that it not only aids in immediate comprehension but also supports the sustained retention of knowledge.
III Acquisition of dative alternation
To provide empirical evidence that can demonstrate the theoretical foundation of DDL with a specific linguistic phenomenon, the dative alternation is examined as an interesting example. Some verbs can take both the prepositional dative and double-object constructions. For example, the sentence ‘Taro gave a cake to Mary’ can be altered to ‘Taro gave Mary a cake’. SVO and SVOO are basic English constructions that are learned by all learners. Although rewriting between these two constructions is considered important in grammar education, there are subtle rules that are difficult to teach explicitly and are rarely taught in schools.
In school grammar, for convenience, learners are often taught that SVO and SVOO sentences can be rewritten each for the other, although in reality, this rule does not always hold true. The following sentences illustrate this.
(1) a. John threw a ball to Mary.
    b. John threw Mary a ball.
(2) a. John pushed a ball to Mary.
    b. *John pushed Mary a ball.
(Inagaki, 1997, p. 638)
Sentence patterns (1a) and (2a) are called prepositional datives (PDs) because they contain the preposition to. In contrast, sentence patterns (1b) and (2b) are called double-object datives (DOs) because the verb is followed by two objects. Dative alternation refers to the alternation between PD and DO. However, the above sentences clearly show that this alternation does not always hold true. According to Pinker (1989), ‘only certain relatively narrow classes of verb meanings are given the privilege of being reconstruable’ (p. 120). In other words, only some verb classes that share similar meanings can alternate between PD and DO. Based on these similarities, various studies have classified verbs into several classes (e.g. Levin, 1993; Pinker, 1989). For example, Pinker (1989) says that verbs of ‘instantaneous imparting of force to an object causing ballistic physical motion’, to which verbs such as throw belong, are allowed in both PD and DO constructions, but verbs of ‘continuous causation of accompanied motion in some manner’, to which verbs such as push belong, cannot be used in the DO construction, as in (2b).
Bley-Vroman and Yoshinaga (1992) and Inagaki (1997) have conducted studies of the acquisition of the dative alternation by L2 learners. Bley-Vroman and Yoshinaga (1992), based on the Fundamental Difference Hypothesis, hypothesized that Japanese L2 learners cannot acquire semantic verb classes. They investigated whether L2 learners could correctly apply PD and DO constructions with both real and made-up verbs using a grammaticality judgment task. The results revealed that while native speakers could accurately judge the grammaticality of both real and made-up verbs in both constructions, L2 learners could only do so with real verbs. This suggests that, as hypothesized, L2 learners had not acquired semantic verb classes and were learning individual verbs one by one without recognizing a common semantic verb class.
Similarly, Inagaki (1997), building on Bley-Vroman and Yoshinaga’s (1992) research, suggested that Japanese L2 English learners acquire the dative alternation based on the frequency of individual verbs. This implies that learners tended to acquire verbs that appeared more frequently.
These studies propose explanations based on the Fundamental Difference Hypothesis, assuming Universal Grammar (UG). They conclude that learners cannot acquire verb classes because there is no corresponding semantic verb class or dative alternation in their native language; instead, learners acquire the usages of individual verbs one by one. However, more recent research has examined the acquisition of the dative alternation from a perspective that assumes not UG but a usage-based model. Specifically, learners acquire dative alternations by generalizing linguistic patterns from the input to which they are exposed. For instance, de Marneffe et al. (2012) suggested that dative alternation in L1 is incrementally acquired based on input from caregivers. From this perspective, the aforementioned finding that L2 learners cannot make judgments based on semantic verb classes may be explained by insufficient input in a foreign language environment, resulting in insufficient generalization. If this holds true, it can be assumed that a sufficient amount of appropriate input can lead to the acquisition of semantic verb classes by L2 learners as well.
If L2 learners can generalize patterns through intensive DDL input and accurately judge the grammaticality of verbs that they have not explicitly learned, then DDL can be considered a learning method that fosters the discovery and acquisition of language patterns from numerous examples. This in turn will support the usage-based model as the theoretical foundation of DDL.
Considering the above, this study aims to fill the current gap in DDL research by demonstrating that a usage-based learning process underlies the effectiveness of DDL and showing its retention effects. To this end, learners’ ability to generalize verb classes through DDL was tested using pre-, post-, and delayed post-tests. The research questions and hypotheses are as follows:
Research question 1: Can L2 learners generalize linguistic patterns through DDL?
Hypothesis 1: L2 learners who engage in DDL activities will be able to generalize linguistic patterns. According to proponents of usage-based models such as Tomasello (2003), language is acquired through general cognitive abilities, including pattern-finding skills. Therefore, when learners are exposed to extensive input through DDL, they are expected to identify and generalize the underlying patterns. If this hypothesis is confirmed, it will be demonstrated that DDL is a learning method grounded in the theoretical principles of a usage-based model.
Research question 2: Is the knowledge acquired through DDL retained in the long term?
Hypothesis 2: Knowledge acquired through DDL will be retained in the long term. DDL is a learner-centered approach in which learners discover patterns from the examples provided. According to Dehaene (2020), active engagement maximizes memory retention. Therefore, considering the characteristics of DDL, it is expected to have long-term learning effects. In this study, a delayed post-test was conducted one week later to verify whether the learning effects were retained.
IV The present study
1 Procedure
Before participating in the experiment, all participants (and their parents if they were under 18 years old) read and completed a consent form with an explanation written in Japanese. The participants first took a paper-based vocabulary size test (Aizawa & Mochizuki, 2010) with no time limitation. This took no longer than approximately 20 min. After completing an online background questionnaire, the participants completed a pre-test. During the tests, if the participants found any unknown words, a Japanese translation was provided. After the pre-test, the participants were given a 10-minute break.
After the break, at the beginning of the DDL activity, the researcher reviewed the basic grammatical rule of the dative alternation with the participants, using an example with the verb give, which was not included in the list of verbs used in the experiment. The basic rule was that when DO sentences were provided, they could be converted into PD sentences. After the participants had confirmed the rule, the DDL activity started. During the activity, participants were asked to check whether the rule could be applied to each verb in the handouts, named the ‘rule-finding handouts’. DDL instructions were provided in Japanese. An English translation of the instructions is as follows:
Now, let’s see if verbs can be used in both SVO and SVOO constructions. Please read the sentences on the Rule-Finding Handouts, then focus on what kinds of words follow each verb. When you read them, using certain symbols can be helpful. For example, when you see a noun after a verb, put a triangle on it, and put a rectangle when you see to-NP.
After completing this DDL task, the participants proceeded to a post-test, which they performed in the same manner as the pre-test.
Finally, after all the participants finished the post-test, they were given a QR code on paper to access an online delayed post-test and were instructed to take the delayed post-test by themselves a week later. 2
2 Participants
The participants comprised 43 Japanese EFL high school learners (19 male, 24 female) aged between 16 and 18 years attending public high schools in Japan; 24 of them were in grade 10, 4 in grade 11, and 15 in grade 12. An English learning background survey revealed that none of the participants had ever traveled to an English-speaking country, except for one participant who had stayed in Canada for one week. Proficiency was measured using a vocabulary size test (Aizawa & Mochizuki, 2010), and the participants’ average vocabulary size was 3,082 words (SD = 578.4). As Japanese textbooks of English teach approximately 3,000 words (1,200 in junior high school and 1,800 in high school; Ministry of Education, Culture, Sports, Science and Technology [MEXT], 2009), the participants were considered intermediate-level high school students. 3 Table 1 summarizes the characteristics of the participants.
Table 1. Participant information.
Note. Of all participants, only one 10th grader had experience abroad (one week stay in Canada).
3 Materials
a Items
The target verb classes and words were selected based on Pinker (1989) and Levin (1993). Four verb classes were chosen for this study: Instantaneously Causing Ballistic Motion (henceforth, the Throw class), Manner of Speaking (henceforth, the Whisper class), Verbs of Sending (henceforth, the Send class), and Verbs of Communication of Proposition and Propositional Attitude (henceforth, the Mention class). Table 2 lists the selected verbs. Of these, three verbs were randomly selected from both the Throw and Whisper classes and printed in the DDL handouts. None of the verbs in the Send and Mention classes appeared in the handouts (for details, see Section IV.3.b). It was assumed that learners’ knowledge of the Throw and Whisper class verbs would change after the DDL activity, but that of the Send and Mention class verbs would not. In other words, these differences would presumably indicate that the changes in learners’ knowledge in this study were due to DDL learning. Thus, the verbs in the Throw and Whisper classes were termed Experimental Items, whereas those in the Send and Mention classes were termed Control Items. 4 All verbs were at the 3,000-word level or below, based on the JACET List of 8,000 Basic Words (Japan Association of College English Teachers [JACET], 2003), to ensure that all the items were familiar to Japanese high school students. 5 According to Pinker (1989) and Levin (1993), the Throw and Send class verbs allow the PD/DO alternation, but the Whisper and Mention class verbs do not. In other words, verbs belonging to the latter classes can only be used in PD constructions.
Table 2. List of verbs used in the experiment.
Note. DO = double object construction.
b Handouts
Handouts titled ‘rule-finding handouts’ were created for participants to use during the experiment. These handouts included six English verbs (three from each of the Throw and Whisper classes), their Japanese translations, and eight example sentences for each verb. If all 12 verbs from the Throw and Whisper classes were studied, participants would need to read 96 example sentences, which would be too burdensome. Therefore, three verbs from each class were randomly selected, resulting in participants being exposed to 48 example sentences.
Two kinds of handouts (Handouts A and B) were prepared and randomly distributed to the participants to prevent the choice of verbs from influencing the results. Figure 2 shows the process of creating the handouts. As a result of random verb selection, two verbs, slide and scream, were included in both Handouts A and B (Table 3). All sentences were presented in the KWIC format (Figure 3). For the handouts, see Appendix A.

Figure 2. Process for creating handouts.
Table 3. List of verbs included in each handout.
Notes. Verbs marked with A, for example, were used in Handout A. Since verbs were randomly chosen, some verbs were used in both Handouts A and B.

Figure 3. Some examples of Key Words in Context (KWIC) extracted from the handouts.
c Example sentences
In typical DDL activities, example sentences are taken from corpora. In this experiment, however, we created the example sentences ourselves to suit the purpose of the study, which is to demonstrate that learners can generalize patterns in example sentences. If learners independently accessed the corpus on a computer, the example sentences to which they were exposed would vary across individuals, leading to different outcomes. Additionally, authentic example sentences obtained from general corpora can be too difficult for secondary school learners to understand, partly because of unfamiliar vocabulary items or the lack of context; pedagogical mediation is thus required to perform DDL (e.g. Wicher, 2019). Thus, to better control the experimental conditions and to avoid an unnecessary burden on the learners, we created sets of example sentences and prepared printed handouts with sample sentences for the experiment.
As mentioned previously, eight example sentences were created for each verb. For the verbs that could take both PD and DO constructions, namely the Throw and Send class verbs, four PD and four DO example sentences were created. In contrast, for the PD-only verbs, namely the Whisper and Mention class verbs, all eight example sentences were prepared in the PD construction. The example sentences were designed to be sufficiently easy for the participants to understand. All words used in the sentences were at the beginner level, defined as the 3,000-word level or below based on the JACET List of 8,000 Basic Words (JACET, 2003). The grammaticality of all the example sentences was verified by a native English speaker.
d Tests
The pre-, post-, and delayed post-tests were prepared using the same 72 sentences, consisting of three types of sentences: 24 experimental sentences, 24 control sentences, and 24 distractors. Participants were asked to rate the grammaticality of the sentences on a seven-point Likert scale. 6 The ratings ranged from –3 (‘completely odd and impossible in English’) through 0 (‘do not know or unable to decide’) to 3 (‘completely natural and possible in English’). The tests were created using Google Forms and each participant completed the tests using their personal smartphones (Figure 4).

Figure 4. An example question from the grammaticality judgment tests with a scale of –3 (strongly negative) to 3 (strongly positive).
For the experimental sentences, six Throw class verbs and six Whisper class verbs were used, with each verb in both the PD and DO constructions, resulting in a total of 24 sentences. Note that Throw class verbs can be used in both the PD and DO constructions, whereas Whisper class verbs can be used only in the PD construction. As control sentences, six Send class verbs and six Mention class verbs were presented in the same format, yielding 24 sentences. As distractors, 24 sentences unrelated to the dative alternation, such as relative clauses, were included (for a list of sentences, see Appendix B). Hence, there were 72 sentences on the tests in total whose grammaticality participants were asked to rate in the pre-, post-, and delayed post-tests, with the order of the questions randomly changed for each test.
To summarize the experimental design, in this study, only three of the six verbs in each experimental class (Throw and Whisper) were provided to the participants in the rule-finding handouts. As the participants studied sentences that included these three verbs, the grammaticality judgment scores for these studied verbs were expected to improve after the treatment. The critical question here is whether the scores for the remaining unstudied verbs in the same classes would also improve. If generalization of language patterns occurs, the ratings for verbs in the same class but not included in the handouts should also improve after DDL. In contrast, the ratings for the verbs in the control classes (Send and Mention) were expected to remain unchanged because the participants were exposed to none of the verbs from the control classes. However, there is another possibility for the control classes. Contrary to our expectations, the ratings for the control class verbs might improve after DDL treatment. If this happens, it would indicate that learners do not distinguish between the Throw and Send classes or between the Whisper and Mention classes; in other words, the semantic verb class distinctions proposed by linguists, including Pinker and Levin, do not apply to L2 English learners.
4 Analysis
The reliabilities of the pre-, post-, and delayed post-tests were checked using Cronbach’s α, which showed relatively high internal consistency (pre-test = .83, post-test = .94, delayed post-test = .94).
In the tests, participants were instructed to rate the grammaticality of the sentences on a 7-point Likert scale, with −3 as the strongest negative and 3 as the strongest positive. If the sentences were grammatical, the ratings should be inclined toward 3. If ungrammatical, they should be toward −3. In the case of grammatical sentences, all the negative ratings were incorrect, whereas for ungrammatical sentences, all the positive ratings were incorrect. For the correct cases, the absolute values of the ratings represent the degree of certainty from 1 to 3. Based on these ideas, we converted the original ratings into a new score ranging from 0 to 3. All conversion patterns are illustrated in Table 4. For example, Whisper class verbs do not take DO constructions; therefore, sentences with DO constructions using Whisper class verbs are ungrammatical, and positive ratings of 1 to 3 for them are all incorrect and converted to a score of 0.
Table 4. Conversion of grammaticality judgment ratings to scores.
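The conversion described above can be sketched as a small function. This is a minimal illustration rather than the authors' own procedure: the function name `convert_rating` is hypothetical, and the treatment of a rating of 0 (‘do not know or unable to decide’) as a score of 0 is an assumption inferred from the 0 to 3 score range, since the full conversion table is not reproduced here.

```python
def convert_rating(rating: int, grammatical: bool) -> int:
    """Convert a -3..3 grammaticality rating into a 0..3 score.

    Hypothetical helper: the name and the handling of a 0 rating
    are assumptions, not taken from the original study materials.
    """
    if rating not in range(-3, 4):
        raise ValueError("rating must be an integer from -3 to 3")
    # A rating is correct when its sign matches the sentence's status:
    # positive for grammatical sentences, negative for ungrammatical ones.
    correct = rating > 0 if grammatical else rating < 0
    # Correct ratings keep their degree of certainty (1 to 3);
    # incorrect ratings score 0.
    # Assumption: a rating of 0 ("do not know") also scores 0.
    return abs(rating) if correct else 0


# A Whisper class verb in the DO construction is ungrammatical,
# so a positive rating is incorrect and a negative rating is correct.
print(convert_rating(2, grammatical=False))   # incorrect judgment
print(convert_rating(-3, grammatical=False))  # correct, maximally certain
```

Under these assumptions, certainty only counts toward the score when the direction of the judgment is correct.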
Since the scores were ordered and categorical, we conducted an ordered logistic regression analysis. We fitted a cumulative link mixed model (CLMM) to analyse the experimental data (the Throw and Whisper classes). All predictors of construction (PD, DO), timing (Pre, Post, Delayed), and learning (Studied, Unstudied) were sum-coded and the reference levels were the averages of the levels of each variable. 7 The analysis was conducted using R version 4.3.3 and the ordinal package version 2023.12.4. The model included three predictors (timing, learning, and construction) and the interaction of construction and timing as fixed effects, with the intercepts of the participants and items as random effects (score ~ construction + timing + learning + construction:timing + (1 | ID) + (1 | itemid)). 8 In addition, the emmeans package version 1.10.1 was used to analyse simple main effects.
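To make the sum coding concrete, the sketch below builds the contrast rows for a three-level factor such as timing. It is an illustration only: the helper name `sum_contrasts` is hypothetical, and the ordering of the levels (with the delayed post-test as the level coded −1) is an assumption rather than something stated in the text.

```python
def sum_contrasts(levels):
    """Return {level: contrast row} for sum (deviation) coding.

    Each of the first k-1 levels gets its own indicator column;
    the last level is coded -1 in every column, so each column
    sums to zero across levels.
    """
    k = len(levels)
    if k < 2:
        raise ValueError("sum coding needs at least two levels")
    rows = {}
    for i, level in enumerate(levels):
        if i < k - 1:
            rows[level] = [1 if j == i else 0 for j in range(k - 1)]
        else:
            rows[level] = [-1] * (k - 1)
    return rows


# Assumed level order: the delayed post-test is the level coded -1.
timing = sum_contrasts(["Pre", "Post", "Delayed"])
for level, row in timing.items():
    print(level, row)
```

Because every column sums to zero, each estimate expresses the deviation of one level from the average of all levels rather than from a single baseline level.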
The analysis of the control items, namely the Send and Mention classes, followed a similar process. However, because none of the verbs in these two classes were included in the handout, the variable of learning was not included in the model (score ~ construction + timing + construction:timing + (1 | ID) + (1 | itemid)).
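To make the model family concrete, the sketch below shows how a cumulative logit model turns a linear predictor into probabilities for the four score categories (0 to 3), using the parameterization logit P(score ≤ j) = θ_j − η. It is an illustration only: the threshold values are hypothetical (the thresholds estimated in the study are not reported in this section), and η is taken as given rather than built from fixed effects and random intercepts.

```python
import math


def logistic(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def category_probabilities(thresholds, eta):
    """Probabilities of score categories 0..len(thresholds) for linear predictor eta."""
    # Cumulative probabilities P(score <= j); the last category closes at 1.
    cumulative = [logistic(theta - eta) for theta in thresholds] + [1.0]
    probabilities = []
    previous = 0.0
    for value in cumulative:
        probabilities.append(value - previous)
        previous = value
    return probabilities


def expected_score(probabilities):
    """Mean score implied by the category probabilities."""
    return sum(k * p for k, p in enumerate(probabilities))


# Hypothetical thresholds separating scores 0|1, 1|2, and 2|3.
THRESHOLDS = [-1.0, 0.0, 1.0]
low = category_probabilities(THRESHOLDS, eta=-0.5)
high = category_probabilities(THRESHOLDS, eta=0.5)
```

A larger linear predictor shifts probability mass toward higher scores, which is why positive estimates in the tables correspond to more accurate judgments.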
V Results
The changes in grammaticality judgment ratings across the pre-, post-, and delayed post-tests are shown in Figure 5. The figure shows that ratings changed not only for the studied verbs but also for the unstudied verbs. The changes in the ratings of the control items were also similar to those of the experimental items.

This figure illustrates changes in ratings.
Learners’ ratings were converted into scores for analysis, and the score changes for the Throw and Whisper classes are illustrated in Figure 6. For the DO construction in the Whisper and Mention classes (the verb classes that do not permit the DO construction), the mean pre-test scores are noticeably lower than the others, indicating a relatively large number of incorrect responses in the pre-test for these two classes. The overall trend, however, shows that the scores tended to improve after treatment, regardless of whether the verbs had been studied. In addition, the score changes for the experimental and control items are similar.

This figure illustrates changes in scores.
Subsequently, a CLMM was fitted to each dataset for the experimental items (the Throw and Whisper classes) to explain the score with the predictors construction, timing, and learning, together with the construction:timing interaction. First, in the case of the Throw class, as shown in Table 5 (and Figure 7 for the odds ratios), the effects of construction, learning, and the two interaction terms were not statistically significant. That is, the two levels of construction, PD and DO, did not lead to significantly different scores. Similarly, the two levels of learning (whether items were studied or not) made no difference. The effects of timing, however, were statistically significant. The estimate for timing[Pre] (−0.659) represents the difference between the pre-test and the average of the three levels, and the estimate for timing[Post] (0.646) represents the difference between the post-test and the average. Since the estimates across all levels must sum to zero under sum coding, the estimate for the delayed post-test can be calculated as 0.013.
Cumulative link mixed model (CLMM) results for the Throw class.
Notes. PD = prepositional dative construction; model: score ~ construction + timing + learning + construction:timing + (1 | ID) + (1 | itemid); ***p < .001.
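The derivation of the delayed post-test estimate from the two reported timing estimates can be checked with a short sketch. The contrast layout below (Pre and Post each receiving their own column, Delayed coded −1 on both) is an assumption inferred from the coefficient names timing[Pre] and timing[Post]; the names `TIMING_CONTRASTS` and `timing_effects` are hypothetical.

```python
# Sum (deviation) coding for a three-level factor. Assumption: Delayed is
# the level without its own coefficient, so it is coded -1 on both columns.
TIMING_CONTRASTS = {
    "Pre": (1, 0),
    "Post": (0, 1),
    "Delayed": (-1, -1),
}


def timing_effects(estimate_pre, estimate_post):
    """Deviation of each timing level from the average of the three levels."""
    coefficients = (estimate_pre, estimate_post)
    return {
        level: sum(c * b for c, b in zip(codes, coefficients))
        for level, codes in TIMING_CONTRASTS.items()
    }


# Estimates reported for the Throw class in the text.
throw_effects = timing_effects(-0.659, 0.646)
```

Because the three deviations sum to zero, the Delayed entry equals −(−0.659 + 0.646), matching the value of 0.013 given in the text.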

Estimated odds ratios for the Throw class.
The differences in timing for each construction can be illustrated through multiple paired comparisons, as presented in Table 6. In both constructions, the post-test and delayed post-test scores were higher than the pre-test scores, whereas the delayed post-test scores were lower than the post-test scores.
Multiple paired comparisons of the Throw class.
Notes. DO = double object construction; PD = prepositional dative construction; *p < .05; **p < .01; ***p < .001.
Next, a CLMM was fitted to the Whisper class data to examine the predictors influencing the changes in scores. Unlike Throw class verbs, Whisper class verbs take the PD construction but not the DO construction. The results of the CLMM, shown in Table 7, demonstrate that construction[PD] is statistically significant. Here, the estimate for construction[PD] means that PD is 0.716 higher than the average of the two construction levels. Accordingly, the other level, DO, is 0.716 lower than the average. The timing predictors and the interaction between construction[PD] and timing[Pre] were also statistically significant. The estimate for timing[Pre], which represents the pre-test, indicates that the score is 1.125 lower than the overall average, whereas the estimate for timing[Post], which represents the post-test, indicates that the score is 0.854 higher than the overall average. Based on sum coding, this also means that the delayed post-test score is 0.271 higher than the overall average. See also Figure 8.
Cumulative link mixed model (CLMM) results for the Whisper class.
Notes. PD = prepositional dative construction; model: score ~ construction + timing + learning + construction:timing + (1 | ID) + (1 | itemid); **p < .01; ***p < .001.

Estimated odds ratios for the Whisper class.
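For readers who wish to relate the estimates in the text to odds ratios, the sketch below exponentiates the Whisper class estimates quoted above. It assumes a logit link (the analysis is described as ordered logistic regression) and that the figure plots exp(estimate); neither the figure values nor any confidence intervals are reproduced here, and the dictionary and function names are hypothetical.

```python
import math

# Whisper class estimates quoted in the text (log-odds scale, sum-coded).
WHISPER_ESTIMATES = {
    "construction[PD]": 0.716,
    "timing[Pre]": -1.125,
    "timing[Post]": 0.854,
}


def odds_ratio(estimate):
    """Convert a log-odds estimate into an odds ratio."""
    return math.exp(estimate)


whisper_odds_ratios = {
    name: odds_ratio(value) for name, value in WHISPER_ESTIMATES.items()
}

for name, value in whisper_odds_ratios.items():
    print(f"{name}: {value:.2f}")
```

Because the predictors are sum-coded, each odds ratio compares a level with the average of its factor's levels rather than with a single baseline level: values above 1 indicate higher odds of a better score than average, and values below 1 indicate lower odds.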
Furthermore, the significant interaction between construction[PD] and timing[Pre] suggests that the difference between PD and DO scores was especially pronounced at the pre-test. This is consistent with Figure 6, where the mean score for the PD construction is higher than that for the DO construction at the pre-test. In other words, many learners judged DO to be acceptable at the pre-test, even though the Whisper class does not permit the DO construction, resulting in lower DO scores. By the post-test, in contrast, learners had learned that DO is not acceptable, which led to higher DO scores. This suggests that learners came to judge DO sentences as accurately as PD sentences, which is consistent with the CLMM finding no significant effect for construction[PD]:timing[Post].
The differences in timing for each construction in the Whisper class are listed in Table 8. In both constructions, the post-test and delayed post-test scores were higher than the pre-test scores, and the delayed post-test scores were lower than the post-test scores. This result was similar to that of the Throw class.
Multiple paired comparisons of the Whisper class.
Notes. DO = double object construction; PD = prepositional dative construction; **p < .01; ***p < .001.
As a control condition, the Send class was subjected to the same analytical treatment. Unlike the experimental items, no verbs from this class were included in the handouts, which means that the learning variable was not included as a predictor in the model. It was assumed that the scores for this class would remain unchanged because no verbs in this class were learned during the treatment. However, as shown in Figure 6, the scores in this class changed in a manner similar to those in the Throw class. Table 9 shows that both timing[Pre] and timing[Post] were significant. Specifically, the scores were 0.563 lower at the pre-test and 0.357 higher at the post-test than the overall average. This also means that the scores were 0.206 higher at the delayed post-test than the overall average. See also Figure 9 for their odds ratios.
Cumulative link mixed model (CLMM) results for the Send class.
Notes. PD = prepositional dative construction; model: score ~ construction + timing + construction:timing + (1 | ID) + (1 | itemid); ***p < .001.

Estimated odds ratios for the Send class.
The differences in timing for each construction are listed in Table 10. In both constructions, the post-test and delayed post-test scores were higher than the pre-test scores. However, there were no significant differences between the delayed post-test and post-test scores.
The other control condition, the Mention class, was analysed using the same procedure. 9 As in the Send class, the scores were assumed to remain unchanged. However, Figure 6 clearly shows that the scores changed in a manner similar to those for the Whisper class, suggesting that learners came to judge that the verbs in this class can be used only in the PD construction. As shown in Table 11, construction[PD] is statistically significant, meaning that PD is 0.975 higher than the average of the two construction levels. The timing predictors were also statistically significant. The estimate for timing[Pre], the pre-test, indicates that the scores were 0.843 lower than the overall average, whereas the estimate for timing[Post], the post-test, indicates that the scores were 0.540 higher than the overall average. Additionally, based on sum coding, the scores at the delayed post-test were calculated to be 0.303 higher than the overall average. Furthermore, the interaction between construction[PD] and timing[Pre] was significant, whereas that between construction[PD] and timing[Post] was not. This is the same pattern as in the Whisper class. As shown in Figure 6, the Mention class exhibited score changes similar to those of the Whisper class; specifically, the PD construction shows a higher mean score in the pre-test than the DO construction. See also Figure 10 for the odds ratios.
Multiple paired comparisons of the Send class.
Notes. DO = double object construction; PD = prepositional dative construction; ***p < .001.
Cumulative link mixed model (CLMM) results for the Mention class.
Notes. PD = prepositional dative construction; model: score ~ construction + timing + construction:timing + (1 | ID); **p < .01; ***p < .001.

Estimated odds ratios for the Mention class.
The differences in timing for each construction are listed in Table 12. In both constructions, the post-test and delayed post-test scores were higher than the pre-test scores. However, there were no significant differences between the delayed post-test and post-test scores. This result was similar to that for the Send class.
Multiple paired comparisons of the Mention class.
Notes. DO = double object construction; PD = prepositional dative construction; ***p < .001.
VI Discussion
Although some studies, including Boulton and Cobb (2017), have mentioned the usage-based model as the theoretical rationale for DDL, it has not been directly tested. In this study, we addressed this gap by investigating whether learners can generalize linguistic patterns of the dative alternation through DDL. The grammaticality judgments for the experimental items (verbs in the Throw and Whisper classes) improved at both the post- and delayed post-tests. Importantly, this improvement was observed not only in the scores of the experimental-item verbs to which learners were exposed but also in those of the unstudied verbs in the same classes. Additionally, the judgment patterns were similar for the control items (in the Send and Mention classes), to which learners were not exposed at all. The following section presents a detailed discussion of the results.
1 Generalization of linguistic patterns and retention of learning
The grammaticality judgment scores of the verbs included in the handouts demonstrated improvement, indicating that learners recognized the usage of the verbs studied in the handouts. Meta-analyses, including those by Boulton and Cobb (2017), have highlighted the effectiveness of DDL, and the findings of this study corroborate the theoretical perspectives of previous studies.
Furthermore, the learning effect was retained in the delayed post-test not only for the Throw and Whisper classes but also for the Send and Mention classes. Considering that DDL is a learning method in which learners interact with linguistic data and discover patterns within it, it can be described as a learner-centered approach requiring attention, concentration, processing depth, and conscious awareness: factors that Dehaene (2020) argues maximize subsequent memorization. Although this study did not directly examine these factors, the observed retention is consistent with Dehaene’s suggestion. Such characteristics of DDL may therefore contribute to greater learning effectiveness and knowledge retention. While previous studies of DDL often lacked delayed post-test data (Boulton & Cobb, 2017), the present study demonstrated that its learning effect persisted for at least one week. This indicates that DDL is not only a reference tool during writing but also an effective activity for solidifying what has been learned. Future studies should directly examine the factors and mechanisms underlying DDL retention.
Another noteworthy point is that there was no significant difference between the studied and unstudied verbs in either the Throw or Whisper classes. This indicates that the grammaticality judgment scores for both the studied and unstudied verbs improved after the DDL activity. The learners’ ability to correctly judge the usage of verbs that they had not studied indicates that they identified and generalized linguistic patterns from the examples provided by DDL and applied these patterns to novel items of the same semantic classes.
This result contradicts the findings of previous studies, including Bley-Vroman and Yoshinaga (1992) and Inagaki (1997), which reported that Japanese L2 learners of English were unable to judge the grammaticality of novel words concerning the dative alternation. However, these studies did not include interventions. If, as proponents of the usage-based model (e.g. Bybee, 2010; Tomasello, 2003) suggest, learners incrementally acquire language from the input provided to them by using general cognitive abilities such as pattern-finding skills, then in foreign language environments like Japan, it is necessary to support learners by increasing the frequency of input to help them gain generalized and abstract knowledge (O’Keeffe, 2023). We consider that the intensive input provided by DDL in this study facilitated learners’ identification of patterns related to the dative alternation. Therefore, as proposed by Allan et al. (2023) and O’Keeffe (2023), DDL can be regarded as a learning method that uses a usage-based model as its theoretical justification.
However, it is necessary to exercise caution in determining whether the generalization of patterns in this study occurred solely because of the DDL activity with the handouts. The original assumption was that generalization occurred during the DDL activity, in which specific example sentences were observed. Following this line of thought, when exposed to the three verbs of each verb class in the handout, learners might have made inductive inferences based on the example sentences, generalized their usage, and used this knowledge to evaluate grammaticality in the post-test. Here, if we assume that learning linguistic knowledge is dynamic and ‘proceed[s] toward the target norm by progressive approximations’ (Boulton & Cobb, 2017, p. 350), it is possible to assume that two levels of generalization occurred during the experiment, at both the DDL activity and the post-test. 10 During the DDL activity, learners should have acquired generalized knowledge of similar verbs by identifying commonalities between verbs with similar meanings. Then, at the post-test, when presented with new, unstudied verbs similar to the studied ones, learners likely inferred that they could apply the same patterns to these new verbs, further advancing their generalization. Thus, it is plausible that the learners not only applied their learned knowledge but also used analogy and extended their generalized knowledge at the time of testing. In this regard, future studies are necessary to determine the points at which learning occurs: during the DDL activity, at the time of testing, or both. A think-aloud method may be used for this purpose.
In addition, it is necessary to focus on the results of the control items (the Send and Mention classes) to understand what generalizations were made. Since no verbs in the control classes were included in the handouts, it was initially assumed that their ratings would remain unchanged. However, unexpected changes in the ratings were observed. In particular, the ratings for the Send class showed a pattern similar to that observed for the Throw class, while the ratings for the Mention class showed a trend similar to that observed for the Whisper class. This raises the possibility that Pinker’s (1989) and Levin’s (1993) verb classifications, although linguistically sophisticated, may be too fine-grained for learners. In other words, the learners seem to have treated the Throw and Send classes as one class and the Whisper and Mention classes as another. For example, they might have regarded the verbs in the Throw and Send classes as verbs of physical transfer, and the verbs in the Whisper and Mention classes as verbs of vocal activity. Given this, the improvement in the control items’ scores can be explained by the same mechanism as the improvement in the unstudied items. That is, the unstudied and control item scores improved because the learners generalized the patterns from the studied items to both the unstudied and control items based on the semantic similarity between them. 11 However, the limited number of verbs used in these experiments might have influenced our results. Additionally, sensitivity in distinguishing between verb classes may improve as learners’ proficiency increases. In future research, it will be necessary to consider a wider variety of verbs and conduct the experiment with more proficient learners.
2 Limitations and pedagogical implications
This study has certain limitations. First, it remains uncertain what generalizations the learners made. Specifically, although this study treated changes in scores on the grammaticality judgment task for unstudied verbs as evidence of generalization, the changes in scores did not reveal the specific generalizations that occurred. It may be possible to clarify the details by, for example, asking learners to verbalize any patterns they have discovered. In addition, using the think-aloud method during both the DDL activity and testing could clarify the reasoning process at each stage of learning.
Second, in the present study, learners were not required to perform production tasks, although DDL may be effective in improving production. Furthermore, the learning effect may be retained longer if productive activities are performed during the learning sessions. Therefore, the effects of production should be investigated in future studies.
Lastly, while there was a statistically significant difference between the delayed post-test and the pre-test, there seemed to be a tendency for the scores to deteriorate at the delayed post-test compared to the post-test. This lower performance on the delayed post-test could indicate that the knowledge of verb usage, like other knowledge, decays over time. Review activities may be necessary to entrench the knowledge more effectively.
Despite these limitations, the present findings hold important educational implications. DDL can be a very efficient learning method because learners not only learn the verb usages they are directly exposed to but can also apply these usages to similar verbs. As O’Keeffe (2023) states, DDL can be described as a learning method that intensifies learners’ language experiences. In EFL environments like Japan, where language experience is limited to the classroom, DDL can accelerate learning by helping learners generalize linguistic patterns.
VII Conclusions
This study examined whether learners can generalize linguistic patterns through intensive input with DDL. Specifically, the study used the dative alternation as an example of a linguistic phenomenon. The results showed that after engaging in DDL activities, learners could judge the grammaticality of not only the studied verbs but also the unstudied verbs belonging to the same class more accurately. This study concludes that learners can generalize linguistic patterns through DDL. Consequently, DDL can be regarded as a learning method that reflects the principles of the usage-based model.
Furthermore, the study demonstrated that the learning effect persisted even after one week. A notable limitation of previous DDL studies is the absence of delayed post-tests. The present findings indicate that DDL is not merely a tool for referencing word usage but also a learning method capable of consolidating the knowledge acquired through it.
Although many researchers have argued that DDL is an effective language learning method, its use in educational settings remains limited. However, because DDL has been demonstrated to be a theoretically supported method, it can be considered more applicable in practice. Future studies should examine whether these results can be replicated with other verbs or grammatical structures and employ production tasks not only to assess whether learners can judge the grammaticality of usage but also to examine whether they can actually use the learned constructions correctly. We believe that our study can serve as a stepping stone for future research in DDL.
Footnotes
Appendix A
Appendix B
Acknowledgements
We are grateful to the anonymous Language Teaching Research reviewers for their valuable and insightful comments on earlier versions of our manuscript.
Authors’ note
This article is based on the first author’s master’s thesis submitted to Nagoya University and has been revised through reanalysis.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was partially supported by JSPS KAKENHI Grant Number 21K00799 awarded to the second author. This work is partially supported by Open Access Acceleration Project.
