A Multidimensional Analysis of Language Use in English Argumentative Essays: An Evidence From Comparable Corpora

Abstract

Multidimensional (MD) analysis contributes to a comprehensive linguistic description of second language (L2) writing texts. However, cross-linguistic studies based on MD analysis remain scarce. In particular, the unavailability of comparable corpora prevents current MD-based L2 writing research from making reliable comparisons of linguistic differences between native writers and L2 learners. To address these gaps, this study adopts corpora of maximum similarity for the cross-linguistic comparative analysis. Using Biber’s MD analysis, this study compares linguistic differences between native English writers and Chinese English learners in argumentative essays with similar communicative purposes. The analysis reveals that the two groups of texts show significant differences in slightly over half of the individual linguistic features but share more similarities than differences in textual dimensions. It also shows that the two groups are, in fact, different varieties of English language use, even though they share a similar text type of “involved persuasion.” The results confirm the importance of comparable corpora for cross-linguistic comparisons and the necessity of MD analysis for understanding L2 writing.

Keywords

multidimensional analysis, argumentative essays, language use, L2 writing, comparable corpora

Introduction

Multidimensional (MD) analysis, also known as multi-feature multi-dimensional analysis (Biber, 1988), is a methodological approach to analyzing linguistic variation across registers (i.e., different situational contexts and communicative purposes) (Biber & Conrad, 2009). This approach integrates corpus linguistics and advanced statistical methods to identify clusters of co-occurring linguistic features and describe linguistic variation (Biber & Conrad, 2013). In contrast to other linguistic analyses (i.e., analyses of individual linguistic features and analyses of a set of features with similar textual functions or indicative of linguistic complexity), MD analysis provides a relatively comprehensive profile of a given text, register, or genre in terms of textual dimensions underlying linguistic co-occurrence patterns (Biber, 2014). As such, it has received increasing attention from researchers interested in descriptive and applied linguistics (cf. Biber, 2014; Conrad & Biber, 2013a).

The past two decades have witnessed a surge of interest in L2 writing research based on MD analysis. Studies have investigated language use and its variation in various situational contexts, such as registers or genres (e.g., Asención-Delaney, 2014; Goulart, 2021; Hardy & Friginal, 2016; Larsson et al., 2021), English proficiency (e.g., Kim & Nam, 2019; Pan, 2018), writing quality (e.g., Friginal et al., 2014; Jarvis et al., 2003), writing tasks (e.g., Biber et al., 2016), and writing topics or prompts (e.g., Weigle & Friginal, 2015). Systematic cross-linguistic comparisons have also been considered in such studies (e.g., Abdulaziz et al., 2016; Cao & Xiao, 2013; Crosthwaite, 2016; van Rooy, 2008; van Rooy & Terblanche, 2009; Weigle & Friginal, 2015; Zhao & Wang, 2017). They reveal linguistic similarities and differences between writers from different L1 backgrounds, mainly native (L1) and non-native (L2) writers of English, providing significant implications for identifying linguistic features of L2 writing and understanding how L2 writers differ from native writers in their use of these features.

Although some cross-linguistic comparisons of L2 writing for research and publication purposes exist (e.g., Cao & Xiao, 2013; Crosthwaite, 2016; Pan, 2018; Weigle & Friginal, 2015), studies comparing linguistic differences between L2 learners’ and native speakers’ writing remain scarce. Moreover, such studies seem to rely more on corpora built in different writing contexts and for different purposes (e.g., van Rooy, 2008; van Rooy & Terblanche, 2009; Zhao & Wang, 2017), possibly due to the unavailability of comparable corpora, that is, corpora of maximum similarity (Moreno, 2008). For example, the corpora of native writers and L2 learners used in previous studies were not equivalent in writing topics, time constraints, and text lengths. However, these factors are essential in contrastive interlanguage analysis (CIA) (Ishikawa, 2013). This inequivalence may lead to more differences than similarities being observed between different L1 writing groups (e.g., van Rooy, 2008; Zhao & Wang, 2017). It has also made it difficult for related studies to account for confounding variables and even to provide reliable results (Moreno, 2008).

The two research gaps deserve further exploration. First, MD analysis contributes to a comprehensive linguistic description of learner writing (Biber, 2014; Biber et al., 2016). Second, for cross-linguistic comparisons, the use of comparable corpora is more reliable (Moreno, 2008). This study addresses these gaps to verify whether language differences in cross-linguistic MD analyses are related to uncontrolled writing variables. It uses a text corpus for CIA to compare language differences between native writers and non-native learners (i.e., Chinese English learners). This study will provide a reliable description of the linguistic features and functional dimensions of L2 learner writing and help to understand the differences between L2 learners and native writers in these two dimensions.

Literature Review

This section reviews multidimensional analysis (Biber, 1988) and MD-based L2 writing research, with a focus on cross-linguistic studies based on the 1988 MD analysis. It highlights the need to apply multidimensional analysis in L2 writing research, identifies current research gaps, and points out the importance of comparable corpora for cross-linguistic comparisons.

Multidimensional Analysis

Multidimensional analysis was originally proposed by American corpus linguist Douglas Biber in 1988 to describe the linguistic variation between spoken and written English registers. The corpus analyzed in the 1988 MD analysis consisted of 481 contemporary spoken and written texts in British English, sampled from 23 register categories, including 15 from the Lancaster-Oslo-Bergen Corpus and 8 from the London-Lund Corpus of Spoken English. The linguistic features (i.e., lexical, grammatical, syntactic, and rhetorical features) involved in this analysis are those used by speakers and writers to construct and realize different texts or discourses in a given context of language use. Based on the theoretical assumption that linguistic features co-occur in texts to achieve potential communicative functions, Biber extracted a seven-factor solution from 67 linguistic features through factor analysis to determine register variation dimensions (Biber, 1988, Chaps. 4 & 5; Biber, 1995, Chap. 5). According to Biber (1988), the first six factors (i.e., functional dimensions) have strong factor structures and the features clustered on each factor are functionally consistent and easily interpreted as dimensions of register variation. As shown in Table 1, these six factors are (1) Involved versus Informational Production; (2) Narrative versus Non-Narrative Concerns; (3) Explicit versus Situation-Dependent Reference; (4) Overt Expression of Persuasion; (5) Abstract versus Non-Abstract Information; and (6) On-Line Informational Elaboration. Table 1 also lists the descriptions of each dimension and the linguistic features in which they co-occur.

Table 1.

Descriptions and Summary of the Six Dimensions for MD Analysis (Biber, 1988) (Adapted from Nini, 2019).

Dimension	Description	Linguistic features
1. Involved versus informational production	High scores: affective/interactional text Low scores: informationally dense discourse.	Involved production features: that-deletions, contractions, present tenses, second person pronouns, do as pro-verb, analytic negation, demonstrative pronouns, emphatics, first person pronouns, pronoun it, be as main verb, causative subordinators, discourse particles, indefinite pronouns, hedges, amplifiers, sentence relatives, wh-questions, possibility modals, non-phrasal coordinations, wh-clauses, stranded prepositions. Informational production features: nouns, average word length, prepositions, type/token ratio, attributive adjectives.
2. Narrative versus non-narrative concerns	High scores: narrative text	Narrative concern features: past tenses, third person pronouns, perfect aspects, public verbs, synthetic negations, present participial clauses.
3. Explicit versus situation dependent reference	High scores: context-independent	Situation-dependent reference features: time adverbials, place adverbials, general adverbs. Explicit reference features: wh-relative clauses on object position, pied-piping relatives, wh-relative clauses on subject position, phrasal coordinations, nominalizations.
4. Overt expression of persuasion	High scores: explicitly marks the author’s point of view as well as their assessment of likelihood and/or certainty	Overt expression of persuasion features: infinitives, prediction modals, suasive verbs, conditional subordinations, necessity modals, split auxiliaries.
5. Abstract versus non-abstract information	High scores: provides information in a technical, abstract and formal way.	Abstract informational features: conjuncts, agentless passives, past participial clauses, WHIZ deletion relatives, other adverbial subordinators.
6. Online informational elaboration	High scores: informational in nature but produced under certain time constraints	Online informational elaboration features: that clauses as verb complements, demonstratives, that relative clauses on object position, that clauses as adjective complement.

The 1988 MD Analysis, built on a large text corpus and multiple linguistic features, has been widely used as a valid and comprehensive framework for register variation analysis in various domains (Biber & Conrad, 2013; Conrad & Biber, 2013b). First, studies based on this framework do not require the selection of new linguistic features for new factor analyses. Second, they do not need to interpret corresponding variation dimensions according to specific contexts. Third, by mapping the new corpus onto this framework, researchers can compare the new register with the established English ones (Nini, 2019). It is worth noting that another type of MD-based research exists, namely, studies that conduct new MD analyses. This type of research entails selecting new linguistic features, conducting factor analysis, interpreting potential functional dimensions, and comparing new registers (Conrad & Biber, 2013b).

The registers examined in current MD-based studies are at different levels of specificity, depending on the definition of textual units and the depth of analysis of situational parameters (Biber, 2014; Egbert & Gracheva, 2023; Goulart et al., 2020). As a relatively specialized register, L2 writing has been studied in a variety of situational contexts to differentiate sub-registers of learner writing (e.g., Abdulaziz et al., 2016; Biber et al., 2016; Biber & Gray, 2013; Cao & Xiao, 2013; Friginal et al., 2014; Friginal & Weigle, 2014; Jarvis et al., 2003; Pan, 2018; Ren & Lu, 2021; van Rooy, 2008; van Rooy & Terblanche, 2009; Zhao & Wang, 2017). Although inadequate, these studies suggest that MD analysis is a powerful method for investigating the linguistic characteristics of L2 writing in different contexts.

Multidimensional Analysis Research in L2 Writing

Language and its use have long been at the forefront of L2 writing research, instruction, and assessment (e.g., Bulté & Housen, 2014; Crossley, 2020; Taguchi et al., 2013). To this end, many comparative and quantitative studies have examined the linguistic features of L2 writing in different contexts through different linguistic analyses. Relevant studies have been conducted mainly through the analysis of individual linguistic features (e.g., Ferris, 1994; Grant & Ginther, 2000; Hinkel, 2002; Lee et al., 2021) and the analysis of groups of linguistic features that have specific rhetorical functions or indicate linguistic complexity (e.g., Jiang, 2015; Larsson & Kaatari, 2020; Parkinson & Musgrave, 2014; Petch-Tyson, 1998; Zhang & Lu, 2022; Zhang & Zhan, 2020). However, over the past two decades, MD analysis has attracted growing attention from L2 writing research because it provides a relatively comprehensive description of textual features in terms of linguistic features and functional dimensions (Jarvis et al., 2003).

MD-based L2 writing research examines language use and its variation in relation to various factors, such as registers or genres (e.g., Asención-Delaney, 2014; Goulart, 2021; Hardy & Friginal, 2016; Larsson et al., 2021), English proficiency (e.g., Kim & Nam, 2019; Pan, 2018), writing quality (e.g., Friginal et al., 2014; Jarvis et al., 2003), writing tasks (e.g., Biber et al., 2016), and writing topics or prompts (e.g., Weigle & Friginal, 2015). Among these studies, cross-linguistic comparisons, although still limited in number, form one of the more frequently studied areas (e.g., Abdulaziz et al., 2016; Cao & Xiao, 2013; Crosthwaite, 2016; Goulart, 2021; Pan, 2012, 2018; van Rooy, 2008; van Rooy & Terblanche, 2009; Zhao & Wang, 2017). Research along this line generally falls into two strands. One strand focuses on linguistic variation in English academic writing for research and publication purposes (e.g., Cao & Xiao, 2013; Crosthwaite, 2016; Friginal & Weigle, 2014; Pan, 2018). The other strand centers on learner writing for general purposes (e.g., Abdulaziz et al., 2016; Goulart, 2021; Pan, 2012; van Rooy, 2008; van Rooy & Terblanche, 2009; Zhao & Wang, 2017). This division highlights the need for more research on learner writing; otherwise, knowledge about the linguistic characteristics of L2 learner writing will remain insufficient. In addition, as insightful as studies on writing for research and publication purposes may be in understanding textual variation across different L1 backgrounds, their findings cannot be taken as indicative of L2 learner writing.

Cross-linguistic studies on L2 learner writing have mainly compared the writing of English native writers with that of non-native learners, with or without considering additional variables (e.g., Abdulaziz et al., 2016; Goulart, 2021; Pan, 2012, 2018; van Rooy, 2008; van Rooy & Terblanche, 2009; Zhao & Wang, 2017). For example, in a comparative analysis of Brazilian English learners’ and native learners’ language use in 11 registers, Goulart (2021) found more similarities than differences between the two writer groups on most functional dimensions. She showed that the group similarities stem from common communicative purposes in most registers, whereas the differences were due to the groups’ different interpretations of the communicative purposes associated with particular registers (i.e., case studies, designs, exercises, and research reports). This study indicates that, for linguistic differences between writers of different L1 backgrounds, it is not necessarily L1 background that matters, but the registers and the writers’ interpretation of those registers. Pan (2012) is a typical example of a comparison of language differences between native writers and Chinese English learners involving various factors, such as genres and learner proficiency. She utilized a corpus that includes native writers’ essays (i.e., the Louvain Corpus of Native English Student Essays, LOCNESS) and Chinese English learners’ texts in two genres produced by learners of two proficiency levels. She concluded that the Chinese learner corpus differs from the native writer corpus in all five dimensions of the 1988 MD analysis she examined. Although the researcher compared the internal differences of each Chinese learner corpus, the cross-linguistic differences could not be attributed to language background.

In contrast to the above cross-linguistic studies involving more variables, other studies only compare linguistic differences between the writing of English native writers and that of non-native learners. For example, in a comparative analysis of student writing between the Tswana Learner English Corpus (TLEC) and LOCNESS, van Rooy (2008) observed significant differences between the two groups in 39 linguistic features and four functional dimensions. Zhao and Wang (2017) compared the corpus of Chinese English learners’ writing (WECCL) and LOCNESS. They found that the two writing groups significantly differed in all six dimensions and 57 linguistic features (more than 87% of 67 features). Based on the 1988 MD analysis, both studies reported more differences than similarities between L1 groups in linguistic features and textual dimensions. These findings may be related to the corpora they used. Specifically, the corpora used in these comparative studies were designed for different communicative purposes in different writing conditions. For example, they did not effectively control factors such as writing topics, text length, and time constraints between corpora, possibly due to the unavailability of comparable corpora.

Previous research may need to pay more attention to writing specificity that can only be identified in texts produced under strict situational conditions (Ishikawa, 2013) and for specific communicative purposes (Biber, 2012; Biber & Conrad, 2009). First, research has identified factors that affect language use in L1 and L2 writing, such as registers or genres (e.g., Goulart, 2021; Larsson et al., 2021; Larsson & Kaatari, 2020; Staples & Reppen, 2016) and writing topics or prompts (e.g., Sarte & Gnevsheva, 2022; Weigle & Friginal, 2015; Yang et al., 2015; Yoon, 2017). Moreover, controlling writing conditions, such as text length and time constraints, is crucial for CIA (Ishikawa, 2013). In addition, language varies depending on communicative purposes (Biber, 2012; Biber & Conrad, 2009). Taken together, as Moreno (2008) stated, “if corpora are not selected carefully, it will be more difficult to determine which contextual factor is responsible for the possible differences” in cross-linguistic research (p. 39). For these reasons, corpora of maximum similarity are crucial for cross-linguistic research. Based on the 1988 MD analysis framework, the present study seeks to expand current cross-linguistic comparisons by using corpora of texts produced in similar writing contexts (e.g., writing genres, writing topics, writing situations) and with similar writing purposes. With this study, it is possible to verify whether linguistic differences in previous cross-linguistic MD analyses are due to uncontrolled writing variables in the corpora under study. It is also possible to identify typical linguistic and functional features of L2 learner writing and provide significant implications for L2 writing research and pedagogy.

Methodology

This study compares the linguistic differences between L1 and L2 writing produced in similar writing contexts (e.g., genres, writing topics, writing situations) and with similar writing purposes. Based on Biber’s (1988) multidimensional analysis, the goals of this paper are threefold: (1) to reveal the differences between L1 and L2 writing in terms of individual linguistic features, (2) to reveal their differences in terms of textual dimensions, and (3) to describe their overall textual features.

Research Questions

Specifically, this study addresses the following three research questions:

What are the significant differences in linguistic features between L1 writing and L2 writing?

What are the significant differences in textual dimensions between the two groups of writing?

What are the overall characteristics of L2 writing, in comparison with L1 writing and other sampled registers investigated in Biber (1988)?

Data

The study is based on a corpus of English argumentative essays, including 162 texts by English native writers (ENS) and 338 by Chinese English learners (CEL) (see Table 2). These texts were sampled from the Written Essay 2.4 module of the International Corpus Network of Asian Learners of English (ICNALE), a learner corpus for CIA (Ishikawa, 2013).

Table 2.

Details of Corpora for English Native Writers and Chinese English Learners.

Corpus	ENS	CEL	Total
Texts	162	338	500
Total words	36,852	82,770	119,622
Mean words	227.48	244.88	239.24
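The mean text lengths in Table 2 follow directly from the text and word counts. The short sketch below recomputes them as a consistency check; the figures are copied from Table 2, while the dictionary layout and the function name `mean_words` are illustrative choices and not part of the original study.

```python
# Text and word counts copied from Table 2.
corpora = {
    "ENS": {"texts": 162, "words": 36_852},
    "CEL": {"texts": 338, "words": 82_770},
}


def mean_words(texts, words):
    """Mean number of words per text, rounded to two decimals as in Table 2."""
    return round(words / texts, 2)


means = {name: mean_words(**counts) for name, counts in corpora.items()}

# Totals across both corpora.
total_texts = sum(c["texts"] for c in corpora.values())
total_words = sum(c["words"] for c in corpora.values())
means["Total"] = mean_words(total_texts, total_words)

for name, value in means.items():
    print(f"{name}: {value}")
```

The recomputed means agree with the values reported in the table, which suggests the corpus sizes were transcribed consistently.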

The texts were carefully controlled to ensure maximum similarity between ENS and CEL. First, they were sampled from the corpus for CIA and completed under identical writing conditions, such as time constraints, text lengths, and writing environment (Ishikawa, 2013). Second, to avoid potential linguistic differences caused by writing topics (e.g., Sarte & Gnevsheva, 2022; Weigle & Friginal, 2015; Yang et al., 2015; Yoon, 2017), these texts had the same topic (i.e., It is important for college students to have a part-time job). Third, to ensure maximum homogeneity in writing purposes, they were all one-sided arguments, that is, either for or against the viewpoint. For example, an essay supporting the view should demonstrate the importance of students taking a part-time job.

Two points are worth noting. The first is the different levels of English proficiency of the writers. Research has shown that advanced L2 writers have limited language reserves and limited access to linguistic devices in time-critical writing tasks compared to native writers (Hinkel, 2002). Moreover, even native writers differ in proficiency. The native writers in this study are no exception, although ICNALE does not assess their level. In this sense, not controlling for L2 learner proficiency provides better diversity and makes their writing more representative, as is shown in previous cross-linguistic comparisons (e.g., Ai & Lu, 2013; Lan et al., 2022). The other is the imbalance in the number of essays between the two corpora. This imbalance is unlikely to influence the research results for the following reasons. First, the frequency counts of all linguistic features are automatically normalized in the data analysis. Second, many MD studies use such unbalanced corpora (e.g., Biber, 1988; Biber & Gray, 2013). In addition, based on the 1988 MD analysis, this study does not require factor analysis but instead plots CEL and ENS on the previously identified dimensions. Most importantly, the analysis is completed using the Multidimensional Analysis Tagger (Nini, 2015) (see Section 3.3 below).

Overall, the corpora used in this study are better controlled between writing groups than those used previously.

Multidimensional Analysis Tagger

In this study, the Multidimensional Analysis Tagger (MAT) version 1.3 (Nini, 2015) was used for the MD analysis, as MAT successfully replicated Biber’s (1988) analysis (Nini, 2015). It determines how a new register differs from or resembles other English registers (Biber, 1988) without the need to extract new dimensions for the register(s) under study. MAT provides the overall register features of the newly investigated text and corpus, the normalized frequency per 100 words of each tagged feature, the z-scores of 67 linguistic features, and the scores on the six dimensions for each text and corpus (Nini, 2015, 2019).
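The relationship between these outputs can be illustrated with a minimal sketch. It assumes the standard procedure of Biber (1988): each feature count is normalized to a rate per 100 tokens, standardized against reference means and standard deviations, and a dimension score is obtained by summing the z-scores of the positive features and subtracting those of the negative features. The function names, the three-feature subset, the counts, and the reference values below are all hypothetical placeholders chosen for readability; they are not Biber's published norms and not MAT's actual code.

```python
def per_100(count, n_tokens):
    """Normalize a raw feature count to a rate per 100 tokens."""
    return count * 100 / n_tokens


def z_score(freq, ref_mean, ref_sd):
    """Standardize a normalized frequency against reference norms."""
    return (freq - ref_mean) / ref_sd


def dimension_score(z, positive, negative):
    """Sum z-scores of positive features, subtract those of negative features."""
    return sum(z[f] for f in positive) - sum(z[f] for f in negative)


# Hypothetical raw counts for one 250-token essay.
n_tokens = 250
counts = {"contractions": 2, "second_person_pronouns": 3, "nouns": 50}

# Placeholder (mean, sd) norms per 100 tokens; NOT Biber's real values.
reference = {
    "contractions": (1.0, 1.0),
    "second_person_pronouns": (1.0, 1.0),
    "nouns": (18.0, 4.0),
}

z = {f: z_score(per_100(counts[f], n_tokens), *reference[f]) for f in counts}

# A toy version of dimension 1: two involved features, one informational.
score = dimension_score(
    z,
    positive=["contractions", "second_person_pronouns"],
    negative=["nouns"],
)
print(round(score, 2))
```

In this toy case, the text is slightly noun-heavy relative to the placeholder norms, so the score leans toward the informational (negative) pole.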

In addition to its reliability in analyzing English native texts from the LOB and Brown corpora in Nini (2015), the reliability of MAT is high for non-native learners’ texts. On the one hand, current probability-based or rule-based POS taggers can provide 96% to 97% tagging accuracy for L2 learners’ texts, comparable to that for English native speakers (Liang, 2006). The Stanford tagger, on which MAT relies, is one of the most widely used and most accurate taggers available (Crosthwaite, 2016). It supports tagging accuracy in MAT and in the subsequent statistics on textual dimensions and linguistic features. On the other hand, MAT has already been applied to L2 learners’ writing (e.g., Crosthwaite, 2016; Kim & Nam, 2019; Zhao & Wang, 2017).

Considering the relatively short text lengths of 200 to 300 words, 100 tokens were used to calculate the type-token ratio during the MAT analysis. In addition, a z-score correction was used to calculate the dimension scores, as it can help short texts avoid the problem of infrequent variables affecting the overall dimension scores. Furthermore, the size and diversity of the corpora currently under study are not an issue in this MD analysis, as MAT does not perform exploratory factor analysis but rather maps new corpora onto the dimensions identified in Biber (1988). Finally, after MAT completed the annotation and analysis of both corpora, the statistics of all linguistic features and dimensions were exported to Excel 2019 and SPSS 25.0 for further statistical analysis.
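The two settings described above can be sketched as follows. The sketch assumes that the type-token ratio is computed over the first 100 tokens of each text and that the z-score correction limits the magnitude of each z-score to 5 before dimension scores are summed; both points reflect our reading of how MAT behaves and should be checked against the MAT manual. The function names and the lowercasing step are illustrative assumptions.

```python
def type_token_ratio(tokens, window=100):
    """Type-token ratio over the first `window` tokens.

    Lowercasing before counting types is an assumption made here for
    illustration; the tagger's own tokenization may differ.
    """
    head = [t.lower() for t in tokens[:window]]
    if not head:
        raise ValueError("cannot compute a type-token ratio for an empty text")
    return len(set(head)) / len(head)


def cap_z(z, limit=5.0):
    """Assumed z-score correction: clip the z-score to the range [-limit, limit]."""
    return max(-limit, min(limit, z))


# Toy example: only the first four tokens fall inside the window.
tokens = ["The", "job", "the", "student", "works", "works"]
print(type_token_ratio(tokens, window=4))

# A rare feature that occurs once in a short text can yield an extreme z-score.
print(cap_z(7.3), cap_z(-6.0), cap_z(1.2))
```

Capping matters for essays of 200 to 300 words because a single occurrence of a feature that is rare in the reference corpus can otherwise dominate a dimension score.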

Results

The results of the statistical analysis of the data generated by MAT 1.3 are reported below.

Comparison of Linguistic Features Between L1 Writing and L2 Writing

Using the z-scores of the 67 linguistic features, a non-parametric Mann-Whitney U-test was performed to identify the differences in individual linguistic features between ENS and CEL. Table 3 shows the means, standard deviations and Z values for the linguistic features that differed significantly. As can be seen, 37 of the 67 linguistic features (55%) differed significantly between the two groups. Specifically, ENS had higher scores than CEL on 22 features and lower scores on 15.
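For readers who wish to reproduce this kind of comparison outside SPSS, the following is a minimal pure-Python sketch of the Mann-Whitney U-test with the normal approximation and tie correction. It reports the smaller U, so Z is never positive, which matches the sign convention of the Z values in this paper. It applies no continuity correction, which is an assumption about how the reported values were obtained. The function names and the toy data are illustrative and are not taken from the study.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_z(x, y):
    """Return (U, Z, two-tailed p) using the tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)  # smaller U, so Z is never positive
    tie_term = sum(t**3 - t for t in Counter(pooled).values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:
        raise ValueError("all observations are identical")
    z = (u - n1 * n2 / 2) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, z, p


# Toy per-text z-scores for one feature in two groups (not study data).
group_a = [1, 2, 3, 4, 5]
group_b = [6, 7, 8, 9, 10]
u, z, p = mann_whitney_z(group_a, group_b)
print(f"U = {u:.1f}, Z = {z:.3f}, p = {p:.4f}")
```

In practice, the test would be run once per feature on the per-text z-scores of the 162 ENS and 338 CEL essays; with samples of that size, the normal approximation used here is appropriate.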

Table 3.

Means of Linguistic Features and Results of Mann-Whitney U-Test against Each Dimension.

Dimensions	Linguistic features	ENS M	ENS SD	CEL M	CEL SD	Z
Involved versus informational production	Second person pronouns	0.78	1.5	1.16	1.87	−2.83*
	Do as pro-verb	0.34	0.4	0.57	0.71	−3.22*
	Demonstrative pronouns	0.68	0.56	0.32	0.38	−7.38***
	First person pronouns	3.42	3.09	4.8	3.25	−4.82***
	Causative subordinators	0.38	0.46	0.15	0.27	−6.39***
	Discourse particles	0.01	0.08	0.04	0.12	−2.4*
	Hedges	0.03	0.11	0.07	0.17	−2.52*
	Amplifiers	0.43	0.53	0.29	0.39	−3.21*
	Sentence relatives	0.02	0.09	0.15	0.28	−5.85***
	Direct WH-questions	0.02	0.08	0.1	0.2	−5.17***
	Possibility modals	1.24	0.89	2.07	1.06	−8.16***
	Non-phrasal coordinations	0.52	0.56	0.43	0.55	−2.01*
	WH-clauses	0.13	0.29	0.18	0.27	−2.47*
	Adverbs	0.04	0.14	0	0.03	−5.7***
	Conditional subordinators	0.62	0.56	0.34	0.4	−5.67***
	Contractions	0.57	0.77	0.76	0.75	−3.09*
	[Nouns]	17.99	3.86	21.03	3.36	−8.3***
	[Total prepositional phrases]	9.01	1.86	8.42	1.96	−2.7*
	[Type-token ratio]	0.68	0.05	0.66	0.06	−4.98***
Narrative versus non-narrative concerns	Past tense verbs	0.97	1.45	0.36	0.66	−5.9***
	Third person pronouns	3.4	2.17	2.39	2.17	−5.34***
	Perfect aspect verbs	0.5	0.63	0.26	0.37	−4.09***
	Public verbs	0.41	0.41	0.34	0.4	−2.22*
	Present participial clauses	0.23	0.41	0.39	0.49	−3.8***
Explicit versus situation dependent reference	WH-relative clauses on subject positions	0.34	0.43	0.22	0.33	−3.18*
	Nominalizations	1.82	1.27	1.41	0.89	−3.05*
	[Time adverbials]	0.29	0.41	0.34	0.36	−2.02*
Overt expression of persuasion	Infinitives	3.36	1.12	2.89	1.08	−4.53***
	Prediction modals	1.14	0.88	0.86	0.79	−3.83***
	Suasive verbs	0.41	0.41	0.24	0.32	−5.23***
	Necessity modals	0.38	0.51	0.53	0.53	−3.27*
	Split auxiliaries	0.56	0.52	0.38	0.47	−4.01***
Abstract versus non-abstract information	Conjuncts	0.4	0.43	0.56	0.52	−2.67*
	Other adverbial subordinators	0.38	0.39	0.19	0.31	−5.89***
Online informational elaboration	Demonstratives	0.89	0.59	0.54	0.52	−6.61***
	That relative clauses on object positions	0.34	0.4	0.18	0.3	−4.75***
	That clauses as verb complements	0.73	0.69	0.32	0.4	−7.43***

Note. Linguistic features in square brackets are negative features on the dimensions, and the others are positive ones; M in bold marks the larger mean between ENS and CEL.

* p < .05 (2-tailed). *** p < .001 (2-tailed).

Regarding the linguistic features on dimension 1, 19 features showed significant differences between ENS and CEL, that is, 56% of the 34 features that co-occur on this dimension (Biber, 1988). Of the 16 positive features, CEL used significantly fewer “demonstrative pronouns,” “causative subordinators,” “amplifiers,” “non-phrasal coordinations,” “adverbs,” and “conditional subordinators,” but significantly more “first and second person pronouns,” “do as pro-verb,” “discourse particles,” “hedges,” “sentence relatives,” “possibility modals,” “WH-clauses,” “direct WH-questions,” and “contractions” than ENS. For the negative features, CEL had significantly fewer “total prepositional phrases” and a lower “type-token ratio” but more “nouns.” For the features on the remaining dimensions, CEL had significantly more “present participial clauses” (dimension 2), “time adverbials” (dimension 3), “necessity modals” (dimension 4), and “conjuncts” (dimension 5) but fewer “past tense verbs,” “third person pronouns,” “perfect aspect verbs,” “public verbs” (dimension 2), “WH relative clauses on subject position,” “nominalizations” (dimension 3), “infinitives,” “prediction modals,” “suasive verbs,” “split auxiliaries” (dimension 4), “other adverbial subordinators” (dimension 5), “demonstratives,” “that relative clauses on object positions,” and “that clauses as verb complements” (dimension 6) than ENS.

Comparison of Textual Dimensions Between L1 Writing and L2 Writing

Table 4 shows the means and standard deviations for both corpora and the results of a non-parametric Mann-Whitney U-test comparing the two corpora on each textual dimension. ENS and CEL showed no significant differences on dimensions 1 (Z = −0.374, p > .05), 2 (Z = −1.498, p > .05), 3 (Z = −1.388, p > .05), and 5 (Z = −0.146, p > .05). Both corpora had positive scores on dimensions 1, 3, and 5, but to different degrees. That is, CEL had a higher score on involved production (M = 6.89) but lower scores on explicit reference (M = 2.24) and abstract information (M = 1.33) than ENS. By contrast, both corpora had negative scores on dimension 2, with the dimension score of CEL (M = −2.33) being lower than that of ENS (M = −1.82).

Table 4.

Dimension Means for Both Corpora and Results of Mann-Whitney U-test.

Dimensions	ENS M	ENS SD	CEL M	CEL SD	Z
1. Involved versus informational production	5.69	10.05	6.89	7.74	−0.374
2. Narrative versus non-narrative concerns	−1.82	2.68	−2.33	2.90	−1.498
3. Explicit versus situation dependent reference	3.08	3.30	2.24	3.18	−1.388
4. Overt expression of persuasion	6.94	4.78	3.90	3.77	−3.005*
5. Abstract versus non-abstract information	2.08	3.21	1.33	3.43	−0.146
6. Online informational elaboration	0.95	2.42	−1.12	1.85	−3.884***

* p < .05 (2-tailed). *** p < .001 (2-tailed).

ENS and CEL showed significant differences on dimensions 4 (Z = −3.005, p < .05) and 6 (Z = −3.884, p < .001). Both corpora showed overt expressions of persuasion on dimension 4, but CEL (M = 3.90) was significantly less overtly persuasive than ENS (M = 6.94). On dimension 6, ENS (M = 0.95) scored significantly higher than CEL (M = −1.12), showing the opposite discourse characteristic. A stepwise regression analysis was conducted to further identify the linguistic features that determine group differences in these two dimensions. As Table 5 shows, “prediction modals” (β = −.191) and “suasive verbs” (β = .162) had significant influences on dimension 4 for ENS (R² = 0.051), explaining 5.1% of the dimension variance, while “split auxiliaries” (β = −.133) and “necessity modals” (β = .110) had significant influences on dimension 4 for CEL (R² = 0.033), accounting for 3.3% of the dimension variance. Table 5 also shows that “that relative clauses on object positions” was the only significant predictor of dimension 6 for CEL, with a negative influence (R² = 0.015, β = −.121).

Table 5.

Stepwise Regression Models of Linguistic Features against Dimensions.

Dimensions	Model		R	R²	R² change	F	F change	B	Beta (β)
4. Overt expression of persuasion	ENS	[Constant]						4.363
		Prediction modals	0.160	0.026	0.026	4.214	4.214*	−0.365	−.191
		Suasive verbs	0.226	0.051	0.025	4.268	4.237*	0.485	.162
	CEL	[Constant]						4.769
		Split auxiliaries	0.145	0.021	0.021	7.259	7.259*	−0.320	−.133
		Necessity modals	0.182	0.033	0.012	5.720	4.114*	0.194	.110
6. Online informational elaboration	CEL	[Constant]						−0.035
		That relative clauses on object positions	0.121	0.015	0.015	5.020	5.020*	−0.106	−.121

*p < .05.
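The R, R², and R² change columns of Table 5 are linked by simple arithmetic: R² is the square of R, R² change is the increase in R² when a predictor enters the model, and R² × 100 gives the percentage of dimension variance explained. The sketch below recomputes these quantities from the rounded values printed in the table. The constant and function names are illustrative, and because the inputs are already rounded to three decimals, agreement is only expected to that precision.

```python
# (model, predictor entered at this step, cumulative R, cumulative R²),
# copied from Table 5.
TABLE5_STEPS = [
    ("ENS, dimension 4", "Prediction modals", 0.160, 0.026),
    ("ENS, dimension 4", "Suasive verbs", 0.226, 0.051),
    ("CEL, dimension 4", "Split auxiliaries", 0.145, 0.021),
    ("CEL, dimension 4", "Necessity modals", 0.182, 0.033),
    ("CEL, dimension 6", "That relative clauses on object positions", 0.121, 0.015),
]


def summarize(steps):
    """Recompute R², R² change, and percent variance explained per step."""
    rows = []
    previous_r2 = {}  # last cumulative R² seen for each model
    for model, predictor, r, r2 in steps:
        rows.append({
            "model": model,
            "predictor": predictor,
            "r2_reported": r2,
            "r2_from_r": round(r * r, 3),
            "r2_change": round(r2 - previous_r2.get(model, 0.0), 3),
            "percent_explained": round(r2 * 100, 1),
        })
        previous_r2[model] = r2
    return rows


for row in summarize(TABLE5_STEPS):
    print(row)
```

Under these assumptions, the final step of each model gives the percentages quoted in the text: 5.1% for ENS and 3.3% for CEL on dimension 4.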

Comparison of Register Characteristics Between L1 Writing and L2 Writing

As can be seen from the dimension statistics produced in the MAT results, ENS and CEL resembled the “involved persuasion” text type, with high scores on dimension 1 (M = 5.69 for ENS, M = 6.89 for CEL) and dimension 4 (M = 6.94 for ENS, M = 3.90 for CEL).

Table 6 presents the mean scores of all six dimensions for ENS and CEL and those for comparable registers selected from Biber (1988). ENS and CEL were closer to prepared speech on dimension 1, academic prose on dimension 2, personal letters on dimension 4, and press reportage on dimension 5. However, they differed on dimensions 3 and 6, where CEL was closer to prepared speech on dimension 3 and broadcast on dimension 6, and ENS was closer to academic prose on both dimensions.

Table 6.

Dimension Means for ENS, CEL, and Others from Biber (1988).

Dimensions	ENS	CEL	AcaW	PreS	PerL	PreR	BroC
1. Involved versus informational production	5.69	6.89	−14.9	2.2	19.5	−15.01	−4.3
2. Narrative versus non-narrative concerns	−1.82	−2.33	−2.6	0.7	0.3	0.4	−3.3
3. Explicit versus situation dependent reference	3.08	2.24	4.2	0.3	−3.6	−0.3	−9.0
4. Overt expression of persuasion	6.94	3.90	−0.5	0.4	1.5	−0.7	−4.4
5. Abstract versus non-abstract information	2.08	1.33	5.5	−1.9	−2.8	0.6	−1.7
6. Online informational elaboration	0.95	−1.12	0.5	3.4	−1.4	−0.9	−1.3

Note. AcaW = academic writing; PreS = prepared speech; PerL = personal letters; PreR = press reportage; BroC = broadcast.
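The "closer to" statements about Table 6 can be read as nearest-mean comparisons: for each dimension, the reference register whose mean has the smallest absolute distance from the corpus mean. The following sketch applies that reading to the values in Table 6. The nearest-mean criterion and the names used are assumptions made for illustration; the source does not state the exact criterion used.

```python
# Dimension means copied from Table 6.
TABLE6 = {
    1: {"ENS": 5.69, "CEL": 6.89, "AcaW": -14.9, "PreS": 2.2, "PerL": 19.5, "PreR": -15.01, "BroC": -4.3},
    2: {"ENS": -1.82, "CEL": -2.33, "AcaW": -2.6, "PreS": 0.7, "PerL": 0.3, "PreR": 0.4, "BroC": -3.3},
    3: {"ENS": 3.08, "CEL": 2.24, "AcaW": 4.2, "PreS": 0.3, "PerL": -3.6, "PreR": -0.3, "BroC": -9.0},
    4: {"ENS": 6.94, "CEL": 3.90, "AcaW": -0.5, "PreS": 0.4, "PerL": 1.5, "PreR": -0.7, "BroC": -4.4},
    5: {"ENS": 2.08, "CEL": 1.33, "AcaW": 5.5, "PreS": -1.9, "PerL": -2.8, "PreR": 0.6, "BroC": -1.7},
    6: {"ENS": 0.95, "CEL": -1.12, "AcaW": 0.5, "PreS": 3.4, "PerL": -1.4, "PreR": -0.9, "BroC": -1.3},
}
REFERENCE_REGISTERS = ("AcaW", "PreS", "PerL", "PreR", "BroC")


def nearest_register(dimension, corpus):
    """Return the reference register whose mean is closest to the corpus mean."""
    scores = TABLE6[dimension]
    return min(REFERENCE_REGISTERS, key=lambda reg: abs(scores[corpus] - scores[reg]))


for dim in sorted(TABLE6):
    print(dim, nearest_register(dim, "ENS"), nearest_register(dim, "CEL"))
```

Under this criterion the two corpora share a nearest register on dimensions 1, 2, 4, and 5 and diverge on dimensions 3 and 6. The dimension 3 result for CEL is a narrow one: its distance to prepared speech (1.94) is only slightly smaller than its distance to academic writing (1.96).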

Discussion

This multidimensional analysis revealed similarities and differences between the writing of native English speakers and Chinese English learners in terms of individual linguistic features, textual dimensions, and overall register characteristics. This section summarizes the main findings and discusses them in the light of relevant previous studies on linguistic analysis, particularly MD-based analysis, and qualitative analysis.

Differences in Linguistic Features Between L1 Writing and L2 Writing

This analysis revealed some similarities but significantly more differences in linguistic features between L1 and L2 writing, indicating the usefulness of considering multiple linguistic features in the comparative analysis. This analysis also revealed consistency and variation in language use along textual dimensions, indicating the need to pay attention to linguistic co-occurrence patterns when understanding the language used in L1 and L2 writing.

A total of 37 of the 67 linguistic features showed significant differences, indicating that native writers and Chinese English learners differed significantly on 55% of the features when writing in identical genres on similar topics and for similar purposes. Compared with previous studies based on the 1988 MD analysis framework, such as 68% between TLE and LOCNESS in van Rooy (2008) and 85% between TECCL and LOCNESS in Zhao and Wang (2017), this study found a relatively lower percentage of differing features among the same 67 features between L1 and L2 writing. The inconsistencies across these studies could be the result of different controls on the writing conditions of the corpora, that is, the extent to which the corpora are similar. If this is the case, studies based on comparable corpora may find fewer differences in language use, as in this study. Other linguistic studies also support the importance of using comparable corpora for comparative research. For example, research has consistently found that writing topics or prompts influence language use in writing produced by writers from different cultural and linguistic backgrounds. In particular, L2 learners’ writing is more susceptible to topic effects (e.g., Sarte & Gnevsheva, 2022; Weigle & Friginal, 2015; Yang et al., 2015; Yoon, 2017). Moreover, research has also demonstrated that other factors, such as genre or register, influence language use in L2 writing, even for the same topic (e.g., Hardy & Friginal, 2016; Qin & Uccelli, 2020). In this regard, it is understandable that this study, among the current cross-linguistic studies based on the 1988 MD analysis, found the fewest differences in language use, as all texts from both corpora used in this study were of the same genre and written on the same topic for the same communicative purposes.
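The proportion behind this comparison is a one-line calculation. The sketch below recomputes the share of significantly differing features for this study and sets it beside the percentages cited for the two earlier studies. The function and variable names are illustrative only.

```python
def percent_differing(n_significant, n_features):
    """Share of features with a significant group difference, in percent."""
    return 100 * n_significant / n_features


# 37 of the 67 features differed significantly in this study.
this_study = percent_differing(37, 67)

# Percentages cited in the text for earlier studies using the same 67 features.
cited = {"van Rooy (2008)": 68, "Zhao and Wang (2017)": 85}

print(f"This study: {this_study:.1f}%")
for study, pct in cited.items():
    print(f"{study}: {pct}% ({pct - this_study:.1f} points higher)")
```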

Textual functions of linguistic features lead to specific language use (Biber et al., 1999). On the dimension of Involved vs. Informational Production, linguistic features such as “1st and 2nd personal pronouns,” “possibility modals,” and “contractions” together mark oral, involved production, while others, such as “prepositional phrases,” “nouns,” and “type-token ratio,” mark a written informational focus (Biber, 1988, pp. 104–108). In this sense, the linguistic manifestations of the two corpora implied that CEL resembled involved writing more than ENS did. For example, CEL used significantly more “1st and 2nd personal pronouns” (e.g., I, we, you), “possibility modals” (e.g., can, may, could), and “contractions” (e.g., can’t, shouldn’t, isn’t), but had a lower “type-token ratio” than ENS. Previous cross-linguistic studies have reported similar findings regarding interpersonal involvement and writer/reader visibility in learners’ writing. For example, van Rooy and Terblanche (2006) found that Tswana learners’ English writing was less formal and more colloquial, as they used the features of involvement 20% more often than native writers. In analyzing the linguistic features that mark writer/reader visibility in EFL learners’ writing, Petch-Tyson (1998) found that L2 learners used more features of writer visibility in their writing and that learners from different L1 backgrounds manifested them differently. In other words, L2 learners’ writing is much more involved than that of native speakers, and different linguistic features are used to express personal feelings and attitudes. For example, Swedish English learners prefer first-person plurals, while Finnish learners use first-person singulars most often (Petch-Tyson, 1998).

The description of past events depends heavily on the features of “past tense verbs,” “3rd personal pronouns,” “perfect aspect verbs,” “public verbs,” and “present participial clauses” (Biber, 1988). A qualitative check on the corpora showed that native writers used more “past tense verbs” (e.g., did, experienced, led, needed, gained), “3rd personal pronouns” (e.g., he, she), “perfect aspect verbs” (e.g., have been done/earned/learned/made), and “public verbs” (e.g., agree, say, admit, assert). In contrast, Chinese English learners relied more on “present participial clauses,” such as having/taking a (part-time) job is/can, having/taking a job, we/you can … For this reason, native speakers’ writing was more narrative than that of Chinese English learners. Example 1, taken from the corpus of English native writers, illustrates how they used past tense verbs (italicized) to recount a personal experience.

Ex. 1 On a personal note, I worked on and off throughout college. Interestingly, I got better grades when I had a part-time job. I focused better, exercised greater discipline and worked harder in the available study time that I had. (ENS)

This finding is largely consistent with that of Zhao and Wang (2017), who found that native writers used more public verbs, past tense verbs, and perfect aspect verbs, while Chinese English learners used more negations, present tense verbs, and third personal pronouns. To some extent, this difference in the use of linguistic features that characterize narration between native English writers and Chinese English learners may be due to their respective rhetorical traditions, that is, the Aristotelian approach and Confucian rhetoric (Hinkel, 2002; Liu & Huang, 2021). This difference may also be due to the lack of a morphological system for marking tense and aspect in Chinese. It is therefore challenging for Chinese English learners to acquire and use these inflectional forms of verbs appropriately (Hinkel, 2002, 2009). Meanwhile, learners may intentionally avoid the inflectional forms in order to avoid making mistakes (Zhao & Wang, 2017).

“Nominalization,” “WH relative clauses on subject position,” and “time adverbials” can distinguish texts with explicit reference from those with situation-dependent reference (Biber, 1988). Native writers used more features indicating explicit reference, whereas Chinese English learners relied more on those marking situation-dependent reference. Specifically, native writers used more “nominalization,” such as graduation, happiness, occupation, situation, to expand their ideas and integrate more information (Biber, 1988). They also used more “WH relative clauses on subject position,” such as students who devote much time to…, many problems which led to …, to show explicit and elaborate identification of persons, jobs, and other entities (Biber et al., 1999). By contrast, Chinese English learners relied more on “time adverbials,” such as after, now, once, today, to indicate temporal situations.

Features of “infinitives,” “prediction modals,” “suasive verbs,” “necessity modals,” and “split auxiliaries” characterize overt persuasiveness (Biber, 1988). A qualitative check of ENS showed that native writers used significantly more “prediction modals” (e.g., will, would, shall, won’t, ’d), “suasive verbs” (e.g., agree, disagree, ask, intend, suggest), and “split auxiliaries” (e.g., should be highly recommended, can be easily ruined, do not know, can be done easily). Similar to the findings of van Rooy (2008), native writers used more linguistic features that mark overt persuasion.

“Conjuncts,” “agentless passives,” “past participial clauses,” “WHIZ deletion relatives,” and “other adverbial subordinators” mark abstract, technical, and formal discourse (Biber, 1988). A qualitative study of both corpora was conducted to further identify the linguistic devices used in L1 and L2 writing to mark similar discourse. It was found that native writers primarily relied on “other adverbial subordinators,” such as since, while, whereas, as/so long as, to mark the logical relations between clauses, while Chinese English learners used more “conjuncts,” such as hence, however, furthermore, moreover, therefore, thus. Moreover, discourses of informational elaboration under strict time constraints are frequently represented with “demonstratives,” “that relative clauses on object positions,” and “that clauses as verb complements” (Biber, 1988). Example 2, taken from ENS, shows how often native writers used these features, particularly “that clauses as verb complements” (italicized), to express stance or thoughts in real-time writing conditions. Since the verbs preceding these that clauses (i.e., think, thought, found, say, believe) are used more often in speech (Biber et al., 1999), the use of these features was consistent with that of features expressing involvement. Overall, L1 writers used significantly more features marking informational elaboration than Chinese English learners.

Ex. 2 I think that even though it has been a bit of a squeeze to fit everything in a lot of the time, I have learned a lot of tips and trick … I wouldn’t say that it has been a lot of fun every day … I thought that my studies would suffer but they haven’t because I found that if I just stick to my time schedules, I seem to get everything done OK. …, I believe that I won’t run into many major problems. This might not be the ideal thing for everyone else, but I can only say that it is working for me. I found that it was important for me to remember that some days … (ENS)

Differences in Textual Dimensions Between L1 Writing and L2 Writing

Under similar writing conditions for both corpora, this study found more similarities than significant differences between L1 and L2 writing in terms of textual dimensions of linguistic co-occurrence. However, this finding is inconsistent with previous cross-linguistic studies based on the 1988 MD analysis, which consistently found more differences than similarities (e.g., van Rooy, 2008; Zhao & Wang, 2017). This inconsistency in the findings on textual dimensions may be due to the lack of control over writing conditions in those studies. For example, writing topics and purposes, which influence language use, were not controlled in previous studies (e.g., van Rooy, 2008; Zhao & Wang, 2017).

In terms of the similar dimension distributions for both corpora, that is, on the positive sides of dimensions 1, 3, 4, and 5 and the negative side of dimension 2, this study showed that L1 and L2 argumentative essays were, in nature, involved, persuasive, context-independent, abstract but non-narrative. This finding further illustrates that using comparable corpora is essential for any cross-linguistic comparative study; only in this way can the similarities and differences between L1 groups be identified and explained reliably (Moreno, 2008). Another noticeable finding was that most dimension scores of ENS were higher than those of CEL, except on dimension 1, indicating that ENS was more informatively persuasive than CEL, with more technical and elaborate information conveyed through narration and explicit reference in the timed conditions. In terms of involved or informational focus, native speakers’ writing was less involved but more informational than that of Chinese English learners. This finding is consistent with research on Chinese English learners’ argumentative writing in different contexts (e.g., Pan, 2012; Zhao & Wang, 2017), which indicates that Chinese English learners’ argumentative writing is characterized by involvedness regardless of the writing conditions. It is also congruent with studies regarding involvement and writer/reader visibility in argumentative essays (e.g., Petch-Tyson, 1998; van Rooy & Terblanche, 2006).

A closer look at the dimension scores for the two corpora reveals significant differences between ENS and CEL on dimensions 4 and 6. Although both were overtly persuasive, the persuasiveness of ENS was significantly more overt than that of CEL. This finding is the opposite of that of Zhao and Wang (2017). One possible reason for this discrepancy is the difference in writing conditions for L1 and L2 writing rather than cultural or language differences. Although both studies compared the argumentative texts of native writers and Chinese English learners, they differed in the extent to which the corpora they used were equivalent. Compared with Zhao and Wang (2017), which relied on learner corpora (i.e., WECCL and LOCNESS) of texts produced under different writing conditions, this study used corpora developed for CIA (Ishikawa, 2013). In addition, it controlled the writing purposes (i.e., texts with one-sided discussions). Therefore, these findings may be more appropriate for understanding the cross-linguistic differences.

Furthermore, the significant difference on dimension 4 may correspond to the two groups’ respective approaches to persuasion. Chinese English learners prefer to appeal to authority and known truths to advance their propositions and opinions. In contrast, native writers tend to express personal goals and viewpoints to validate their opinions and propositions (Hinkel, 2002). Examples 3 and 4, typical argumentative texts written to persuade readers, show how writers from different rhetorical traditions support their arguments about the importance of taking a part-time job. Example 3, taken from CEL, shows that Chinese learners tend to persuade their readers by describing facts from a third party’s point of view, whereas Example 4, taken from ENS, shows that native writers prefer to share their own ideas and experience to persuade their readers.

Ex. 3 Firstly, gaining a part-time job is not only for the purpose of earning much money. What values more is the experience you get to promote your capacity such as communication, independence, your solution to an issue, and so on. That is to say, you can do things on your own if you gain some experience from part-time job. Secondly, it is a good opportunity to be involved in the society, forming a foundation for your future work. We are students but we are adults now. Therefore, the campus is not the only place we should stay. Outside school, you will see more interesting things. (CEL)

Ex. 4 By having a part-time job, I have the freedom of disposable income, and that allows me to go shopping or go out to eat with my friends whenever I want to. Although my parents pay for college, I do not think they should have to pay for all my day to day expenses and so I’m trying to do them a favor by supporting myself a little bit. Also, I really don’t like the rich kids whose parents do everything for them and I think it is important for us to learn how to live and be responsible adults so that we can go out into the world one day and know how to take care of ourselves. Finally, in some cases a part-time job can get you experience that you can use after you graduate. (ENS)

This difference between English and Chinese writing largely echoes Liu and Huang (2021) regarding how Chinese writers develop their expository paragraphs. They reported that Chinese writers prefer to quote authorities, stories, and the words of famous people and books to promote their trustworthiness, moderate their tone, and convey their voice less directly in their writing.

The significant difference in dimension 6 between L1 and L2 writing might result from nativeness. Compared with L2 learners, native writers have a richer lexical and syntactic repertoire, making their writing more explicit and sophisticated. They primarily use features prevalent in written or informational discourse (Hinkel, 2002). Moreover, native writers used a larger number of syntactic structures, such as “that clauses as verb complements” and “that relative clauses on object positions,” which are often used for “informational elaboration under real-time production constraints” (Biber, 1988, p. 113). Thus, native speakers’ writing was more informational than Chinese English learners’ under the strict real-time constraints. They also used these features to express personal opinions, attitudes, or statements (Biber et al., 1999). The finding regarding the greater use of these features in L1 writing suggests that L1 writing also involves the explicit marking of personal stances, thus echoing the finding that native speakers’ writing was more overtly persuasive on dimension 4. Consequently, it can be concluded that writing texts that express personal stances or attitudes also requires informational elaboration under real-time constraints and vice versa.

Overall Register Characteristics of L1 Writing and L2 Writing

Overall, ENS and CEL are closer to involved persuasion (Biber, 1988), displaying the involved and persuasive nature of argumentative writing (Pan, 2012; van Rooy, 2008; Zhao & Wang, 2017). Moreover, this analysis revealed that both were involved, persuasive, context-independent, abstract but non-narrative. However, ENS was more overtly persuasive and appropriately informational than CEL.

A further comparison with the registers investigated in Biber (1988) revealed that L1 and L2 writing manifested linguistic differences on certain textual dimensions, even though they constructed similar textual functions. Take dimension 2 as an example. Although both corpora were closer to academic prose, they drew on different linguistic resources to construct their respective non-narrative discourses, as seen in Table 3. The similarities and differences in linguistic features and textual dimensions between the two corpora indicate that ENS and CEL were, in fact, particular sub-registers of language use within the specific register of argumentative writing. These findings also illustrate the necessity and superiority of Biber’s (1988) MD analysis in differentiating these sub-registers in terms of register variation, compared to other comparative linguistic analyses based on individual linguistic features or textual functions (Biber, 2012, 2014; Goulart et al., 2020).

Conclusion

Using two corpora of argumentative texts produced under similar writing conditions and with similar writing purposes, this MD-based study compared linguistic features and textual dimensions between the texts of native English writers and Chinese English learners. It also demonstrated the register characteristics of these two groups of writing texts through a more comprehensive comparison with the registers studied in Biber (1988). The analysis showed that the two groups of written texts used different linguistic resources to realize similar discourses. It also revealed that the two groups shared more similarities than significant differences in textual dimensions. They were, in fact, different varieties of English language used in the same genre of argumentative writing. The results confirmed the importance of comparable corpora for cross-linguistic comparative analysis and the necessity of MD analysis for understanding L2 learner writing.

The findings enriched the current understanding of L2 argumentative writing and have practical implications for L2 writing research and pedagogy. For example, the study demonstrated that L2 learners’ argumentative writing is inherently involved and persuasive, as previously noted (e.g., van Rooy, 2008; van Rooy & Terblanche, 2009; Zhao & Wang, 2017). The findings on the differences in linguistic features and textual dimensions between L1 groups showed the importance of corpora of maximum similarity for cross-linguistic comparison (Moreno, 2008). Moreover, the findings concerning the overall register characteristics of the two groups showed that MD analysis, especially the 1988 MD analysis, is suitable for characterizing any two or more registers with different levels of specificity for learner writing. In addition, it will be helpful for L2 writing pedagogy to analyze and understand why and how L2 learners differ from native writers in terms of actual language use and the textual functions they seek to realize.

However, limitations still exist, and future avenues are open. For example, this study is limited to argumentative texts by college-level learners. Future studies can examine such texts written by learners at different proficiency levels and even include more genres or text types. Future studies can use much more refined comparable corpora of argumentative essays, such as those supporting a particular viewpoint, to understand differences in language use between L1 and L2 writing for specific communicative purposes.

Footnotes

Acknowledgements

Special thanks are given to the anonymous reviewers for their insightful and instructive comments and suggestions.

Declaration of Conflicting Interests

I declare that I have no financial and personal relationship with any person or organization that could inappropriately influence this research.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Ethics Statement

I declare that no ethics issue is involved in this research.

ORCID iD

Yujiao Zhang

References

Abdulaziz, Muhammad, A. M., & Azher (2016). Variation in learner’s argumentative essays: A multi-dimensional comparative analysis. Science International, 28(4), 413–425.

(2013). A corpus-based comparison of syntactic complexity in NNS and NS university students’ writing. In Díaz-Negrillo, Ballier, & Thompson (Eds.), Automatic treatment and analysis of learner corpus data (pp. 250–264). John Benjamins.

Asención-Delaney (2014). A multi-dimensional analysis of advanced written L2. In Sardinha, T. B., & Pinto, M. V. (Eds.), Multi-dimensional analysis, 25 years on: A tribute to Douglas Biber (pp. 239–269). John Benjamins.

Biber (1988). Variation across speech and writing. Cambridge University Press.

Biber (1995). Dimensions of register variation: A cross-linguistic comparison. Cambridge University Press.

Biber (2012). Register as a predictor of linguistic variation. Corpus Linguistics and Linguistic Theory, 8(1), 9–37.

Biber (2014). Using multidimensional analysis to explore cross-linguistic universals of register variation. Languages in Contrast, 14(1), 7–34.

Biber & Conrad (2009). Register, Genre and Style. Cambridge University Press.

Biber & Conrad (2013). Introduction: Multi-dimensional analysis and the study of register variation. In Conrad & Biber (Eds.), Variation in English: Multi-dimensional studies (pp. 3–12). Routledge.

Biber & Gray (2013). Discourse characteristics of writing and speaking task types on the TOEFL iBT test: A lexico-grammatical analysis. TOEFL iBT Research Report. Educational Testing Service.

Biber, Gray, & Staples (2016). Predicting patterns of grammatical complexity across language exam task types and proficiency levels. Applied Linguistics, 37(5), 639–668.

Biber, Johansson, Leech, Conrad, & Finegan (1999). Longman grammar of spoken and written English. Longman.

Bulté & Housen (2014). Conceptualizing and measuring short-term changes in L2 writing complexity. Journal of Second Language Writing, 26, 42–65.

Cao & Xiao (2013). A multi-dimensional contrastive study of English abstracts by native and non-native writers. Corpora, 8(2), 209–234.

Conrad & Biber (Eds.) (2013a). Variation in English: Multi-dimensional studies. Routledge.

Conrad & Biber (2013b). Multi-dimensional methodology and the dimensions of register variation in English. In Conrad & Biber (Eds.), Variation in English: Multi-dimensional studies (pp. 13–42). Routledge.

Crossley, S. A. (2020). Linguistic features in writing quality and development: An overview. Journal of Writing Research, 11(3), 415–443.

Crosthwaite (2016). A longitudinal multidimensional analysis of EAP writing: Determining EAP course effectiveness. Journal of English for Academic Purposes, 22, 166–178.

Egbert & Gracheva (2023). Linguistic variation within registers: Granularity in textual units and situational parameters. Corpus Linguistics and Linguistic Theory, 19(1), 115–143.

Ferris, D. R. (1994). Lexical and syntactic features of ESL writing by students at different levels of L2 proficiency. TESOL Quarterly, 28(2), 414–420.

Friginal & Weigle (2014). Revisiting multiple profiles of learner composition: A comparison of highly rated NS and NNS essays. Journal of Second Language Writing, 23, 1–16.

Friginal & Weigle (2014). Exploring multiple profile of L2 writing using multi-dimensional analysis. Journal of Second Language Writing, 26, 80–95.

Goulart (2021). Register variation in L1 and L2 student writing. Register Studies, 3(1), 115–143.

Goulart, Gray, Staples, Black, Shelton, Biber, Egbert, & Wizner (2020). Linguistic perspectives on register. Annual Review of Linguistics, 6, 7.1–7.21.

Grant & Ginther (2000). Using computer-tagged linguistic features to describe L2 writing differences. Journal of Second Language Writing, 9(2), 123–145.

Hardy, J. A., & Friginal (2016). Genre variation in student writing: A multi-dimensional analysis. Journal of English for Academic Purposes, 22, 119–131.

Hinkel (2002). Second language writers’ text: Linguistic and rhetorical features. Lawrence Erlbaum.

Hinkel (2009). The effects of essay topics on modal verb uses in L1 and L2 academic writing. Journal of Pragmatics, 41, 667–683.

Ishikawa (2013). The ICNALE and sophisticated contrastive interlanguage analysis of Asian learners of English. Learner Corpus Studies in Asia and the World, 1, 91–118.

Jarvis, Grant, Bikowski, & Ferris (2003). Exploring multiple profiles of highly rated learner compositions. Journal of Second Language Writing, 12, 377–403.

Jiang, F. K. (2015). Nominal stance construction in L1 and L2 students’ writing. Journal of English for Academic Purposes, 20, 90–102.

Kim, J. E., & Nam (2019). How do textual features of L2 argumentative essays differ across proficiency levels? A multidimensional cross-sectional study. Reading and Writing, 32(9), 2251–2279.

Lan, Zhang, Lucas, Sun, & Gao (2022). A corpus-based investigation on noun phrase complexity in L1 and L2 English writing. English for Specific Purposes, 67, 4–17.

Larsson & Kaatari (2020). Syntactic complexity across registers: Investigating (in)formality in second-language writing. Journal of Second Language Writing, 45, 82–94.

Larsson, Paquot, & Biber (2021). On the importance of register in learner writing: A multi-dimensional approach. In Seoane & Biber (Eds.), Corpus-based approach to register variation (pp. 235–258). John Benjamins.

Lee, J. J., Tytko, & Larkin (2021). (Un)attended this/these in undergraduate student writing: A corpus analysis of high- and low-rated L2 writers. Journal of English for Academic Purposes, 50, 1–13.

Liang (2006). POS tagging reliability on EFL learners’ written data. Foreign Language Teaching and Research, 38(4), 279–286. (In Chinese)

Liu & Huang (2021). Rhetoric construction of Chinese expository essays: Implications for EFL composition instruction. Sage Open, 11(1), 1–10.

Moreno, A. I. (2008). The importance of comparable corpora in cross-cultural studies. In Connor, Nagelhout, & Rozycki, W. V. (Eds.), Contrastive Rhetoric: Reaching to intercultural rhetoric (pp. 25–44). John Benjamins.

Nini (2015). Multidimensional analysis tagger (Version 1.3). https://sites.google.com/site/multidimensionaltagger

Nini (2019). The multi-dimensional analysis tagger. In Sardinha, T. B., & Veirano, P. M. (Eds.), Multi-dimensional analysis: Research methods and current issues (pp. 67–96). Bloomsbury Academic.

Pan (2012). MF and MD analysis of written texts produced by Chinese non-English major undergraduates and graduates. Foreign Language Teaching and Research, 44(2), 220–230. (In Chinese)

Pan (2018). A multidimensional analysis of L1–L2 differences across three advanced levels. Southern African Linguistics and Applied Language Studies, 36(2), 117–131.

Parkinson & Musgrave (2014). Development of noun phrase complexity in the writing of English for Academic Purposes students. Journal of English for Academic Purposes, 14, 48–59.

Petch-Tyson (1998). Writer/reader visibility in EFL written discourse. In Granger (Ed.), Learner English on computer (pp. 107–118). Longman.

Qin & Uccelli (2020). Beyond linguistic complexity: Assessing register flexibility in EFL writing across contexts. Assessing Writing, 45, 1–14.

Ren (2021). A multi-dimensional analysis of the management’s discussion and analysis narratives in Chinese and American corporate annual reports. English for Specific Purposes, 62, 84–99.

Sarte & Gnevsheva (2022). Noun phrasal complexity in ESL written essays under a constructed-response task: Examining proficiency and topic effect. Assessing Writing, 51, 1–15.

Staples & Reppen (2016). Understanding first-year L2 writing: A lexico-grammatical analysis across L1s, genres and language ratings. Journal of Second Language Writing, 32, 17–35.

Taguchi, Crawford, & Wetzel (2013). What linguistic features are indicative of writing quality? A case of argumentative essays in a college composition program. TESOL Quarterly, 47, 420–430.

van Rooy (2008). A multidimensional analysis of student writing in Black South African English. English World-Wide, 29(3), 268–305.

van Rooy & Terblanche (2006). A corpus-based analysis of involved aspects of student writing. Language Matters, 37(2), 160–182.

van Rooy & Terblanche (2009). A multi-dimensional analysis of a learner corpus. In Renouf & Kehoe (Eds.), Corpus Linguistics: Refinements and Reassessments (pp. 239–254). Rodopi.

Weigle & Friginal (2015). Linguistic dimensions of impromptu test essays compared with successful student disciplinary writing: Effects of language background, topic, and L2 proficiency. Journal of English for Academic Purposes, 18, 25–39.

Yang & Weigle, S. C. (2015). Different topics, different discourse: Relationships among writing topic, measures of syntactic complexity, and judgments of writing quality. Journal of Second Language Writing, 28, 53–67.

Yoon (2017). Linguistic complexity in L2 writing revisited: Issue of topic, proficiency, and construct multidimensionality. System, 66, 130–141.

Zhang & Zhan (2020). Understanding voice in Chinese students’ English writing. Journal of English for Academic Purposes, 45(4), 1–9.

Zhang (2022). Revisiting the predictive power of traditional vs. fine-grained syntactic complexity indices for L2 writing quality: The case of two genres. Assessing Writing, 51, 1–14.

Zhao & Wang (2017). A multi-dimensional analysis of the register variations of Chinese learner of English: From the perspective of the discrepancy between English temporality and Chinese spatiality. Waiyu Dianhua Jiaoyu, 176(8), 71–78. (In Chinese)