Abstract
In order to facilitate the instruction and acquisition of EFL writing, this paper compares the uses of the light verb
Keywords
Introduction
Knowledge of language is increasingly considered equivalent to knowledge of patterns (i.e., the recurrent elements of languages that are entities; Alexander, 1977), as emphasized by pioneers of usage-based theories of language (e.g., Francis et al., 1996; Hanks & Pustejovsky, 2005; Sinclair, 1991). Corpus pattern analysis (CPA), which focuses on the lexical analysis of collocational patterns based on corpus evidence, is a promising technique for building lexical resources that elucidate word meaning in context (Hanks, 2008). This technique is based on the theory of norms and exploitations (TNE), a lexicocentric, corpus-driven, bottom-up theoretical approach proposed by Hanks (2013). Despite its comparatively recent origin, CPA has attracted considerable attention and has been widely applied in natural language processing (NLP), computational lexicography, and language learning and teaching (Hanks, 2013; Hanks & Ma, 2021; Hanks & Moze, 2019; Maarouf & Baisa, 2013). The Pattern Dictionary of English Verbs (PDEV) is the main product of CPA and seeks to provide a well-grounded, corpus-driven account of English verbs (Bradbury & Maarouf, 2013). However, since the analytic procedure of CPA is labor-intensive and time-consuming, PDEV has covered only approximately 30% of the target verbs. For instance, light verbs (i.e., verbs that make relatively “light” contributions to the semantics of constructions, such as
Light verbs play a central role in the English language due to their maximal analytic complexity and high frequency. Native speakers have internalized light verb patterns and are able to apply them in actual usage due to their lifelong exposure to such patterns. Learners of English as a foreign language (EFL), on the other hand, who lack such experience, have been observed to acquire these patterns in a different manner and with greater difficulty since they are only exposed to a new language later in life (Crosthwaite et al., 2020). In spite of the high frequency of light verb usage, EFL learners tend to consistently exhibit underuse, overuse, and misuse of these verbs due to their intricate semantics and syntax. Additionally, the light verb
Consequently, this study aims to identify the similarities and disparities in the patterning of light verb
Literature Review
English Light Verbs
The term “light verb” was proposed by Jespersen (1949, p. 117) and has since attracted particular attention across languages due to its popularity and complexity in language use. In his conception, light verbs (such as
However, previous studies mainly focus on the typical light uses of English light verbs, leaving the holistic patterns of light verbs underexplored. Moreover, little research has addressed the mapping between the patterns and meanings of light verbs.
Comparative Studies on EFL and L1 English Writing
The past few decades have witnessed significant research comparing EFL and L1 writing at the undergraduate level, driven by advancements in computational processing technologies and the use of corpus-informed and corpus-based approaches (Alfalagg, 2020; Chen, 2017; Shamalat & Ghani, 2020; Sun & Hu, 2023).
The evaluation of EFL writing performance in relation to L1 writing has been extensively explored, with a focus on complexity, accuracy, and fluency, including investigations of cohesive devices (Alfalagg, 2020), personal pronouns (MacIntyre, 2019), and hedging (Sun & Hu, 2023), among others. Previous studies consistently reveal frequent overuse, underuse, or misuse of these linguistic features in EFL writing, which poses significant challenges for EFL learners (Chen, 2017). Guided by the findings of comparative research, pedagogical methodologies such as genre-based instruction (Zhai & Razali, 2023), task-based instruction (Derakhshan, 2018), and production-oriented instruction (Zhang, 2020) have demonstrated the potential to enhance learners’ motivation, self-efficacy, writing awareness, and language proficiency, consequently improving their writing performance. Despite its recent origin, data-driven learning (DDL) has emerged as a thriving field within EFL writing research (Flowerdew, 2015; Muftah, 2023; Sun & Hu, 2023). With the aid of corpus technology, DDL enables students to uncover language patterns by using concordancing tools directly or by employing concordance-based analysis as part of their study.
Comparative studies have also investigated the use of light verbs in L1 and L2/EFL English varieties. According to Crosthwaite et al. (2020), comparing the similarities and disparities between learner corpora and L1 corpora can explicate the gap between native and non-native speakers’ language usage and enable the analysis of the linguistic challenges L2/EFL learners face. The majority of the comparative studies on light verbs focus on frequency analysis (Gilquin, 2019; Giparaite & Baliūte, 2019). For instance, when comparing Asian English varieties with British and American English, Giparaite and Baliūte (2019) found that there is a greater range of LVC types and modifiers in native English varieties than non-native ones. Furthermore, the study illustrates that native English varieties exhibit more diverse patterns of modification in their structure.
Despite the burgeoning body of comparative studies on EFL and L1 writing, research on light verbs in this area remains insufficient.
The Light Verb GET
In comparison with other light verbs,
In brief, light verbs and LVCs have become an active research area in corpus linguistics. However, previous research has several limitations. Firstly, there is still limited research on the association between English light verb patterns and meaning. Secondly, a holistic pattern analysis of the light verb that includes both its light and heavy usages calls for more investigation. Thirdly, comparatively little is known about how light verbs are patterned differently in Chinese EFL and L1 writing. Fourthly, the semi-light verb
Based on the previous studies, the current study presents a CPA-based approach to explore how the meaning and use of the light verb
Methodology
Corpora Description
The present study mainly used two corpora: Chinese Undergraduate English Writing (CUEW), written by Chinese undergraduates, and a sub-corpus selected from British Academic Written English (BAWE), written by L1 undergraduates. The two corpora are comparable in that they are similar in size and both consist of argumentative essays written by undergraduates.
Since 2010, the National Association of English Writing (NAEW) in China has been holding automated English writing (AEE) competitions every 6 months which attract more than a million participants nationwide each time. All the participants in AEE competitions agree to share their assignments for academic use before submitting them to the system. CUEW is a collection of assignments randomly drawn from the automated English writing competitions for Chinese undergraduates from April 2019 to October 2020. The sub-corpus of CUEW contains 5,000 essays which have 1,874,512 words in total. CUEW is deemed to be a useful resource for Chinese EFL instructors, learners, and linguistic researchers in light of its ample and up-to-date data and the wide geographical spread of Chinese universities.
BAWE is a collection of high-standard student assignments across disciplines and levels (over 6.5 million words; Nesi, 2008). BAWE comprises over 30 disciplines, categorized into the four disciplinary groups of arts and humanities, life sciences, social sciences, and physical sciences, and covers Level 1 to Level 3 undergraduates and master’s students, making up more than 2,700 assignments in total. To make BAWE more comparable, only the disciplines under the domain of arts and humanities from Level 1 to Level 3, which map onto the argumentative writing in CUEW, are under investigation (651 texts and 1,479,912 words). Despite the disparate average length of the texts in CUEW and BAWE, both corpora can be seen as broadly representative of the target populations’ English writing proficiency, and the sub-corpus of BAWE provides a high-quality standard for the essay writing of Chinese EFL undergraduates. In summary, the sub-corpus of BAWE is a rich lexical resource that supports the identification of the characteristics of native English writing at the undergraduate level in the genre of argumentative writing; hence, it can be used as a comparable corpus to CUEW.
Furthermore, the British National Corpus (BNC), a 100-million-word native English collection built by the Oxford University Press, mainly from 1991 to 1994, on which PDEV is based, is used as a reference corpus to validate the patterns that are found in CUEW but not in BAWE. More specifically, if instances of such patterns exist in BNC and make sense in writing, they are accepted as systematic; if not, further analysis is required. In addition, BNC is adopted to validate significant disparities between CUEW and BAWE: if a certain pattern is more frequent in CUEW than in BAWE, it is likely that the given pattern is commonly used in colloquial situations rather than in formal writing. Therefore, a comparison of frequencies in the written and spoken texts of BNC can be conducted to confirm the degree of formality.
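The two uses of BNC described above can be illustrated with a minimal sketch. It assumes hypothetical hit counts and approximate placeholder sizes for the spoken and written components of BNC; the function names are illustrative only, and the judgment of whether an instance "makes sense in writing" remains a manual decision passed in as a flag. The sketch only compares normalized rates and does not test whether the difference is statistically significant.

```python
def per_million(hits, corpus_words):
    """Normalize a raw hit count to a rate per million words."""
    return hits / corpus_words * 1_000_000


def classify_divergent_pattern(bnc_hits, acceptable_in_context):
    """Apply the validation rule: attested in BNC and judged acceptable."""
    if bnc_hits > 0 and acceptable_in_context:
        return "systematic"
    return "needs further analysis"


def register_leaning(spoken_hits, spoken_words, written_hits, written_words):
    """Compare normalized rates in the spoken and written components."""
    spoken_rate = per_million(spoken_hits, spoken_words)
    written_rate = per_million(written_hits, written_words)
    if spoken_rate > written_rate:
        return "spoken-leaning"
    if written_rate > spoken_rate:
        return "written-leaning"
    return "neutral"


# Hypothetical counts and placeholder component sizes, for illustration only.
print(classify_divergent_pattern(bnc_hits=42, acceptable_in_context=True))
print(register_leaning(120, 10_000_000, 300, 90_000_000))
```

Normalizing before comparing matters here because the written component of a reference corpus is typically much larger than the spoken one, so raw counts alone would be misleading.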
Data Collection
AntConc, a free concordancing tool designed by Laurence Anthony, is used to extract the concordance lines of lemma
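For readers unfamiliar with concordancing, the following is a minimal keyword-in-context sketch. It is not AntConc's implementation; the list of inflected forms, the function name, and the context width are assumptions made for illustration.

```python
import re

# Assumed inflected forms of the lemma; adjust to the lemma list actually used.
GET_FORMS = ("get", "gets", "getting", "got", "gotten")


def concordance(text, forms=GET_FORMS, width=40):
    """Return (left context, keyword, right context) for each whole-word hit."""
    pattern = re.compile(r"\b(" + "|".join(forms) + r")\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append((left, match.group(0), right))
    return lines


sample = "She got a job. They are getting better, and he gets it. Forget it."
for left, keyword, right in concordance(sample):
    print(f"{left:>40} [{keyword}] {right}")
```

The word-boundary anchors keep forms embedded in other words, such as "Forget", out of the results.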
Corpus Pattern Analysis (CPA)
CPA is primarily concerned with the lexical examination of collocational patterns by utilizing corpus evidence as textual sources. Each pattern set reveals both typical norms associated with a given word and atypical phraseologies that deviate from normal usage, which are categorized as exploitations (Hanks, 2013). This approach offers an alternative perspective on the structured characteristics of natural language, enabling computers to comprehend, process, and generate language data accurately (Hanks & Pustejovsky, 2005). As a result, there has been a growing interest in pattern-based lexical research employing CPA, including studies on metaphors and idioms (Hanks, 2004), similes (Hanks, 2008), nouns (e.g., “way,” Hanks & Moze, 2019), and verbs (e.g., “poison” verbs, Bradbury & Maarouf, 2013). Previous studies have shown that CPA facilitates the mapping of meanings onto word patterns and offers insights into how words are authentically used within specific contexts through the analysis of corpus evidence, thereby providing promising avenues for computational lexicography (Hanks, 2013).
Consequently, CPA has proven to be a reliable technique to analyze the collocational patterns of the light verb
(i)
(ii)
(iii)
To ensure clear and trustworthy annotation, four pre-trained annotators, all postgraduate or doctoral students majoring in linguistics and including the authors, were involved in the CPA process. Prior to the analysis, all annotators achieved a high level of inter-annotator agreement (IAA). Each concordance line was annotated independently by at least two annotators. Any doubts were raised as they arose and resolved through immediate discussion.
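The paper does not state which agreement measure was used. As one common option for two annotators assigning categorical labels, Cohen's kappa can be sketched as follows; the pattern labels and the function name are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of items.")
    n = len(labels_a)
    # Proportion of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each annotator's label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical pattern labels assigned to six concordance lines.
annotator_1 = ["P1", "P1", "P2", "P3", "P1", "P2"]
annotator_2 = ["P1", "P1", "P2", "P3", "P2", "P2"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))
```

Kappa corrects raw agreement for agreement expected by chance, which is useful when a few patterns dominate the data, as is the case for high-frequency verb patterns.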
Log-Likelihood
To compare the two corpora in terms of differences in pattern frequency, log-likelihood (LL) was conducted to examine the similarities and differences in the use of the light verb
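As an illustration of the statistic, the sketch below implements the two-corpus log-likelihood in its widely used observed-versus-expected form, with a sign added to mark relative overuse (+) or underuse (−) in the first corpus. The function name and the signing convention are assumptions; the corpus sizes are the word counts reported for CUEW and the BAWE sub-corpus. Under these assumptions, the output agrees closely with the LL values listed in the appendix for Patterns 1 and 21.

```python
import math


def signed_log_likelihood(freq_a, freq_b, size_a, size_b):
    """Two-corpus log-likelihood, signed by relative frequency in corpus A."""
    total = freq_a + freq_b
    if total == 0:
        return 0.0
    # Expected frequencies if the item were spread evenly by corpus size.
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    ll = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    ll *= 2
    # Positive when the item is relatively more frequent in corpus A.
    overused_in_a = freq_a / size_a >= freq_b / size_b
    return ll if overused_in_a else -ll


CUEW_WORDS = 1_874_512  # word count reported for CUEW
BAWE_WORDS = 1_479_912  # word count reported for the BAWE sub-corpus

# Pattern 1 (585 vs. 80 hits) and Pattern 21 (0 vs. 7 hits) from the appendix.
print(round(signed_log_likelihood(585, 80, CUEW_WORDS, BAWE_WORDS), 2))
print(round(signed_log_likelihood(0, 7, CUEW_WORDS, BAWE_WORDS), 2))
```

By the usual convention, an absolute LL value above 3.84 corresponds to p < .05 and a value above 6.63 to p < .01 (chi-square distribution, one degree of freedom).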
Results
This section explores the patterning of the light verb GET in two steps. Firstly, it interprets the divergent patterns (patterns that deviate from those found in native English corpora), distinguishing between systematic patterns (divergent patterns that make sense and are found in BNC) and erroneous patterns. Secondly, after the erroneous patterns in CUEW are excluded, the remaining patterns, that is, the refined patterns in the target corpora, are compared to analyze the similarities and disparities between Chinese and native English writing.
Interpreting the Divergent Patterns
As illustrated in Table 1,
Total Hits and Patterns of
Systematic Patterns of GET in CUEW
An investigation of BNC identifies two systematic patterns, the details of which are presented in Table 2. Queries in BNC indicate that Patterns 20 and 28 are produced significantly more often in spoken texts than in written texts, suggesting that these two patterns are more likely to be used in informal contexts than in formal ones. It can be inferred that Chinese undergraduates tend to be uncertain about register in their writing.
Systematic Patterns of
Erroneous Patterns of GET in CUEW
Table 3 gives the other three divergent patterns of
Erroneous Patterns of
Instances of Erroneous Pattern 1 of
Instances of Erroneous Pattern 2 of
Instances of Erroneous Pattern 3 of
Regarding the first erroneous pattern
Three sets of instances of Erroneous Pattern 1 are identified from the dataset by examining the meanings in the context (see Table 4). The first set includes
The second divergent pattern of
The third divergent pattern of
Comparing the Refined Patterns
When erroneous uses and instances of patterns that contain fewer than three occurrences are removed, totals of 418 and 1,990 refined hits of
Refined Hits and Patterns of
Table 8 presents the frequencies and LL values of the 30 refined patterns of
Frequencies and LL Values of the Normal Patterns of
Analysis of Significantly Overused Patterns
A further investigation of the concordance lines of this pattern reveals that the two corpora share similar frequencies of semantic types in the subject slot but vary in the object slot. Seventy-five percent of the instances of this pattern are followed by the semantic type
(1)
(2)
In contrast, the other two less frequent semantic roles
Patterns 2 and 3 embody the heavy use of
An in-depth investigation of the instances of Pattern 4 in CUEW reveals that a large percentage of the object slot is filled with high-frequency fixed expressions (
(3)
(4)
Above are a further five significantly overused patterns of
Pattern 20 has been analyzed in Table 2, so no further discussion is carried out here.
Analysis of Significantly Underused Patterns
An observation of the instances of Pattern 12 found that Chinese undergraduates tend to use some fixed collocates, such as
(5)
(6)
Moreover, both Chinese and L1 undergraduates produced
(7)
Pattern 21 was produced only by L1 writers, with no instances found in CUEW. Learners may be more familiar with
Discussion
Similarities in Terms of Light Verb Patterns
The similarities in English writing by Chinese and L1 undergraduates in terms of light verb patterns mainly fall into two areas. Firstly, most of the prototypical patterns identified in the two corpora are used by both Chinese and L1 undergraduates. As shown in Table 8, most of the patterns, except a small proportion identified in either BAWE or CUEW, are shared by the two varieties of English. Secondly, both L1 and Chinese writers are capable of exploiting language innovatively in their writing, according to the results of refined patterns. The exploitation of norms plays a crucial role in linguistic innovation, making language use more vivid, dynamic, and interesting. It indicates that writers do not just exhibit their fundamental linguistic competence, but also occasionally exploit that competence to say new and interesting things or to say old things in a new and interesting way (Hanks, 2004; Hanks & Moze, 2019). Besides semantic-type coercion, typical means of exploitation of a norm found in both BAWE and CUEW include ellipsis, anomalous collocates, metaphors, and similes (Hanks, 2013).
Hanks (2013) highlights that words in isolation do not have meanings; rather, they have meaning potentials which can be activated by contextual triggers such as phraseologies and collocations. This research has also indicated that CPA/PDEV can be an effective method for describing light verb patterns through identifying norms, divergent patterns, and exploitations of norms in context. In view of the abstract nature of light verbs, CPA/PDEV can provide disambiguated patterns, as well as implicatures, frequencies, and authentic examples, making it a rich resource in furthering our understanding of how the meaning and use of English light verbs are associated, so as to clarify the semantic and syntactic ambiguity these polysemous light verbs entail.
Disparities in Terms of Light Verb Patterns
According to the results, there are four aspects to the disparities in terms of light verb patterns in BAWE and CUEW: erroneous patterns, overuse and underuse, language diversity, and register awareness. The contrasting patterns in learner corpora and native-speaker corpora can provide guidance in language acquisition for learners.
Erroneous Patterns
From the perspective of errors, Chinese learners tend to produce correct patterns most of the time and make mistakes in a varied but recognizable way. As revealed in the analysis of divergent patterns, only a few erroneous patterns of
CPA enables the identification of disambiguated patterns in context with the aid of semantic types, allowing the specification of divergent patterns used by a certain number of learners (Alqarni, 2019; Hanks, 2013). The identification of the erroneous patterns of the light verb
Overuse and Underuse
Chinese learners tend to use high-frequency patterns of
Previous studies have also consistently found that the overuse, underuse, or misuse of English items tends to frequently occur in EFL writing and poses enormous challenges to EFL learners (Chen, 2017). Light verbs are often “lexical teddy bears” (Hasselgren, 1994) for non-native learners of English, being part of their vocabulary repository as preferred words because they are reckoned both safe and easy to use. The over-presentation of
Language Diversity
One remarkable consequence of the over-repetition of frequent patterns and under-production of infrequent ones is a lack of diversity in the range of patterns. Although both Chinese and L1 undergraduates tend to produce a similar number of patterns, CUEW displays a much narrower range of normalized patterns. Additionally, L1 writers tend to use more diverse complements in a pattern while Chinese learners appear to use fixed expressions, such as
Therefore, enhancing EFL learners’ language variety is important for writing proficiency. English teachers in China need to recognize that language variety is an indispensable feature of English writing proficiency. Firstly, EFL teachers can design classroom activities that encourage learners to use language patterns widely and exploit them creatively, such as brainstorming as many collocates of a word as possible before writing. Secondly, encouraging students to create topic-related corpora based on authentic written texts is also an efficient way to help them find and acquire diverse alternatives for a given linguistic expression. Finally, with the aid of corpus technology, data-driven learning (DDL) allows students to discover the regularities of language by using concordancing tools directly or by adopting concordance-based output indirectly in their study, which offers extensive authentic exposure to contextualized linguistic data and a full range of word patterns in different varieties of English (Chang, 2020; Jablonkai & Csomay, 2022; Muftah, 2023; Sun & Hu, 2023). EFL learners can also use the data-driven PDEV to broaden the range of senses and forms of polysemous words at their command and to bridge the gap between EFL learners and native speakers in terms of pattern variety.
Register Awareness
Uncertainty of register is another notable feature in Chinese undergraduates’ English writing. The results suggest that Chinese learners appear to overuse some informal patterns that are rarely or not used by native speakers in formal contexts, such as
EFL teachers should attach importance to cultivating learners’ language awareness. Biber and Barbieri (2007) claim that spoken and written registers differ strikingly in their lexico-grammatical features; hence, language teachers should apply different course management techniques for different types of classes. DDL is another promising approach for raising learners’ register awareness and pattern awareness. It exposes learners to a wide range of contextualized texts across genres and production modes, together with the styles and linguistic features appropriate to each, which can raise learners’ register awareness and pattern awareness by reducing the influence of L1 and the target language (Larsson & Kaatari, 2020; M. H. Lin, 2021). Besides, students’ direct engagement in the concordancing and data-analyzing process promotes learner autonomy, language awareness, thinking skills, and cognitive abilities (Flowerdew, 2015). For instance, learners’ direct application of CPA can expose them to a large amount of authentic evidence and help them internalize complex English patterns autonomously, thus bolstering their language awareness and intuition.
Conclusion
Drawing on insights from TNE, this paper compared the patterning of the light verb
Limitations and Suggestions
The present study, however, only tackled a restricted range of issues in a wide and complicated research field, leaving the ground open for further studies. Firstly, due to practical constraints, this study did not provide a holistic picture of how different levels of learners use light verbs in their writing. Thus, further comparison of how different levels of learners use light verbs is needed, which could pave the way for a better understanding of developments in light verb patterning. Secondly, as the process of CPA is labor-intensive and time-consuming, only one light verb was analyzed in the study. How light verbs other than
Appendix
Refined Patterns of
| No. | Pattern and implicature | FC (freq., %) | FB (freq., %) | LL |
|---|---|---|---|---|
| 1 | | 585 (28.43) | 80 (17.82) | +322.98 |
| 2 | | 467 (22.69) | 75 (16.70) | +230.50 |
| 3 | | 308 (14.97) | 46 (10.24) | +160.27 |
| 4 | | 203 (9.86) | 9 (2.00) | +176.51 |
| 5 | | 89 (4.32) | 4 (0.89) | +77.13 |
| 6 | | 62 (3.01) | 21 (4.68) | +16.89 |
| 7 | | 49 (2.38) | 13 (2.90) | +14.63 |
| 8 | | 49 (2.38) | 3 (0.67) | +39.00 |
| 9 | | 25 (1.21) | 30 (6.68) | −2.40 |
| 10 | | 41 (1.99) | 4 (0.89) | +27.27 |
| 11 | | 24 (1.17) | 14 (3.12) | +0.83 |
| 12 | | 15 (0.73) | 30 (6.68) | −9.27 |
| 13 | | 12 (0.58) | 21 (4.68) | −5.07 |
| 14 | | 17 (0.83) | 3 (0.67) | +7.79 |
| 15 | | 8 (0.39) | 14 (3.12) | −3.38 |
| 16 | | 4 (0.19) | 6 (1.34) | −1.01 |
| 17 | | 6 (0.29) | 3 (0.67) | +0.44 |
| 18 | | 3 (0.15) | 6 (1.34) | −1.85 |
| 19 | | 4 (0.19) | 4 (0.89) | −0.11 |
| 20 | | 7 (0.34) | 0 (0) | +8.15 |
| 21 | | 0 (0) | 7 (1.56) | −11.46 |
| 22 | | 3 (0.15) | 3 (0.67) | +0.08 |
| 23 | | 3 (0.15) | 2 (0.45) | +0.03 |
| 24 | | 1 (0.05) | 4 (0.89) | −2.71 |
| 25 | | 0 (0) | 4 (0.89) | −6.55 |
| 26 | | 0 (0) | 4 (0.89) | −2.23 |
| 27 | | 0 (0) | 4 (0.89) | −6.55 |
| 28 | | 3 (0.15) | 0 (0) | +3.49 |
| 29 | | 2 (0.10) | 1 (0.22) | +0.15 |
| 30 | | 0 (0) | 3 (0.67) | −4.91 |
| Total | | 1,990 (96.70) | 418 (93.30) | |
Acknowledgements
We would like to thank all the reviewers who have given suggestions on this paper. Our special thanks go to Prof. Patrick Hanks and Prof. Su Hang for their insightful comments and probing questions, which motivated us to improve this study in various respects.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by a grant from the SISU Postgraduate Research Innovation Project in 2024 (No. SISU2024XK001).
Ethical Approval
This article does not contain any studies with human or animal subjects.
Data Availability Statement
The data supporting the findings of this study are available from the corresponding author upon request.
