Abstract
Purpose
This article aims to investigate the longitudinal Syntactic Complexity (SC) development of English as a Foreign Language (EFL) learners and the variations by grade level.
Design/Approach/Methods
This study conducts a longitudinal analysis of SC development among 199 high school EFL learners in eastern China. The corpus contains 920 argumentative essays on scientifically and socially contentious topics. We employ the Second Language Syntactic Complexity Analyzer (L2SCA) to examine syntactic features in these essays through 11 computerized indices measuring different dimensions of SC (i.e., coordination, subordination, phrasal complexity). Multi-level regression analyses are used to depict SC development and cross-level interactions are investigated to examine variations by grade.
Findings
Results suggest that all SC indices except T-units per sentence exhibit varying growing trends. The results align with Biber's hypothetical stages of syntactic development, indicating learners’ progression toward greater subordination and phrasal complexity. Negative cross-grade interactions suggest that lower-grade students show heightened improvements in phrasal sophistication over time.
Originality/Value
The study portrays the developmental patterns of SC within and across grade levels, highlighting both group trends and individual variability in language development. It conceptualizes SC as a multidimensional construct and informs more precise measurement of SC features in writing assessment and instruction.
Keywords
Introduction
In the field of English as a Foreign Language (EFL) writing research, academic language has been regarded as a specialized register that requires learners to draw on a set of linguistic skills different from those used in general communication (Figueroa et al., 2018; Gui, 2015; Schleppegrell, 2004). Argumentative writing is a cognitively demanding task that requires learners not only to adopt complex linguistic features but also to engage in critical thinking about important social and academic issues. The quality of argumentative writing has been shown to be a valid indicator of learners’ academic language proficiency (Ravid & Tolchinsky, 2002; Schleppegrell, 2001). Over the past decade, applied linguistics scholars have taken different perspectives on the analysis of argumentative writing. A major line of this work focuses on the lexical, syntactic, and discoursal features of the writing, portraying learners’ academic language from relatively small semantic units to larger contextual elements. Among these three levels of linguistic analysis, scholars have investigated, from diverse vantage points, the development of syntactic complexity (SC) among English learners over extended periods of time (Lu, 2011; Ortega, 2003; Qin & Uccelli, 2020). SC is a pivotal metric for gauging the development of English writing (Crossley & McNamara, 2014). More recently, scholars have begun to shift their focus from cross-sectional investigations toward longitudinal inquiries into the dynamic development of SC in English learners (Kyle et al., 2021). Nevertheless, past research has encountered substantial limitations in research design, sampling, and metric selection, making it difficult to provide a comprehensive and detailed portrayal of the multifaceted SC development of EFL learners.
The primary objective of this study is to establish an extensive learner corpus and undertake a comprehensive longitudinal investigation of SC development among EFL learners.
Literature review
Linguistic demands of argumentative writing
Many studies have examined the genre of argumentative writing because it is widely recognized as a task that requires not only relatively high cognitive capabilities but also a certain command of academic writing skills (Qin & Uccelli, 2016; Schleppegrell, 2001; Taylor et al., 2019). Academic language skills have been shown to be a strong predictor of performance on academic tasks, an important one of which is academic writing. Uccelli et al. developed the construct of Core Academic Language Skills (CALS) in 2015, and further studies validated that CALS was a significant predictor of learners’ academic writing skills (Phillips Galloway et al., 2020; Phillips Galloway & Uccelli, 2019). Other studies have also suggested the predictive power of academic language skills for argumentative writing (Aunurrahman et al., 2017; Taylor et al., 2019).
Previous studies on argumentative writing have mainly centered on three broad areas, namely the lexical, syntactic, and discoursal features produced by learners in writing (Biber et al., 2004, 2011; Du & Cai, 2013; Fang, 2005; Hyland, 2008; Kim, 2022; Qin & Uccelli, 2020; Qin & Zhang, 2022; Yang & Wang, 2016; Zhao, 2013). Researchers hold that academic language skills, which differ from those used in casual daily communication, are required to fulfill this task (Phillips Galloway et al., 2020; Schleppegrell, 2004; Uccelli et al., 2015). At the syntactic level, previous research has provided evidence that distinctly different syntactic structures appear in writing for academic and general purposes. In 2011, Biber and his colleagues proposed hypothesized stages of syntactic development, indicating that, over time, the syntactic structures in learners’ writing tend to develop from coordination to subordination and eventually to phrasal structures. This process, characterized by packing dense information into shorter syntactic structures, suggests learners’ transition from conversational language competence to academic language competence. The hypothesis has been tested across different cultural contexts, including China (Yang & Wang, 2016), where cross-sectional research found that writing by Chinese high school EFL learners with higher English proficiency included more subordination structures than writing by less proficient students. Further research moved from omnibus syntactic features to fine-grained syntactic structures in students’ argumentative writing and reported similar findings (Qin & Zhang, 2022).
The comparison of the SC features applied in academic and colloquial writings of EFL learners was also conducted and results revealed that EFL learners adopted different syntactic structures across the two text genres: A higher frequency of complex noun phrases was found in academic writings than in colloquial writings (Qin & Uccelli, 2020).
Syntactic complexity as a multi-dimensional construct
Syntactic complexity, a key aspect of language development, pertains to the degree of complexity and variation in the syntactic structures employed in both spoken and written language (Lu, 2011). Numerous indices have been developed and applied in research to measure syntactic complexity (Kyle & Crossley, 2018; Lu, 2010, 2011; Ortega, 2003; Wolfe-Quintero et al., 1998). Lu (2010) introduced the Second Language Syntactic Complexity Analyzer (L2SCA), an automatic analysis tool for SC that includes 14 SC indices measuring complexity at the sentence, clausal, and phrasal levels. These indices are divided into five categories: length of production unit, subordination, coordination, phrasal complexity, and overall sentence complexity. Length of production unit encompasses three indices designed to assess the extent of production at the clausal, sentential, and T-unit levels: the mean length of clause (MLC), mean length of sentence (MLS), and mean length of T-unit (MLT). Subordination encompasses four indices describing the degree of subordination at the clausal level: a T-unit complexity ratio (clauses per T-unit, or C/T), a complex T-unit ratio (complex T-units per T-unit, or CT/T), a dependent clause ratio (dependent clauses per clause, or DC/C), and dependent clauses per T-unit (DC/T). Coordination comprises three ratios gauging how structures are coordinated in writing: coordinate phrases per clause (CP/C), coordinate phrases per T-unit (CP/T), and a sentence coordination ratio (T-units per sentence, or T/S). Phrasal complexity consists of three ratios examining the relationship between sub-clausal syntactic structures and larger production units: complex nominals per clause (CN/C), complex nominals per T-unit (CN/T), and verb phrases per T-unit (VP/T).
Overall sentence complexity is characterized by a sentence complexity ratio denoted as clauses per sentence (C/S).
Various studies have indicated that the ability to produce syntactically complex sentences serves as evidence of enhanced syntactic competence and can be utilized as an indicator of higher proficiency in second language (L2) acquisition (Crossley & McNamara, 2014). Lei et al. (2023) broadly delineated three lines of research concerning SC, with a particular focus on L2 writing. The first line of research investigates the indices that can measure SC in L2 writing. For instance, Ortega (2003) reviewed 25 studies related to L2 writing and identified that mean length of sentence (MLS) and mean length of T-unit (MLT) are effective in distinguishing L2 writings from those in the first language (L1). In the second line of research, scholars explore factors influencing SC in L2 writing. It was observed that the essay genre significantly impacts SC, with L2 learners employing more complex sentences in argumentative essays compared to narrative ones (Lu, 2011). The third line of research delves into the relationship between SC and L2 proficiency or writing quality. For example, Lahuerta Martínez (2018) investigated the SC of essays written by EFL learners of different English proficiency levels, as characterized by different grade levels (lower intermediate and intermediate). Results showed that the SC of EFL learners was positively correlated with their English proficiency, meaning that more proficient EFL learners were more likely to generate more complex essays syntactically. Zhang & Lu (2022) looked into the effect of SC features on L2 writing quality across two text genres: application letters and argumentative writings. They concluded that fine-grained SC indices had stronger predictive power of L2 writing quality and that genre effect proved an important factor of the explanatory power of SC on L2 writing quality.
Despite extensive research on SC in L2 writing, many studies have relied on cross-sectional designs, providing only a snapshot of L2 learners’ SC. Such findings may not fully capture the dynamic development of L2 writing SC (Lei et al., 2023). In response, some researchers have adopted a longitudinal perspective to assess syntactic development. For instance, Kyle et al. (2021) conducted a longitudinal study of syntactic development in nine secondary school students, analyzing 54 essays produced over two years. Their results demonstrated significant development in indices such as mean length of T-unit (MLT) and dependent clauses per clause (DC/C). Though longitudinal, the study had a relatively small sample. Lei et al. (2023) filled this gap by conducting a large-scale longitudinal investigation into the syntactic development of 1,081 Chinese college students over two years. All 14 SC indices in L2SCA were included in the analysis. Results revealed that length-based measures, measures of coordination, and measures of phrasal complexity significantly increased, while measures of clausal subordination significantly decreased. Over time, learners became more proficient in adopting the syntactic structures frequently used in academic writing. The study was a comprehensive corpus-based investigation of longitudinal SC development. However, all learners in the study were college-level students who had received training in academic writing, while the SC development of younger learners who have yet to enter the world of academia, such as high school students, was left understudied.
Nevertheless, due to varied research designs, different measures for assessing SC (Lei et al., 2023), and the limited number of longitudinal studies, questions persist regarding how the SC of EFL learners evolves over time. In light of the growing interest in longitudinal investigations of SC in L2 writing, more research is needed to provide a comprehensive understanding of this construct. Two factors warrant additional attention. (1) Sample size: Previous studies often had small sample sizes, limiting the generalizability of findings (Lei et al., 2023). For instance, Menke & Strawbridge (2019) collected only 42 texts written by three Spanish students over six semesters, and only one participant was followed through all six semesters. A corpus of this size, whether measured by the number of essays or of participants, may be too small to produce generalizable findings. (2) Syntactic complexity measurement: Many studies incorporated only a subset of syntactic indices, offering an incomplete portrayal of L2 learners’ syntactic development. Bulté & Housen (2018) included only five indices in their longitudinal study: mean length of T-unit, subclause ratio, coordinate clause ratio, mean length of finite clause, and mean length of noun phrase. Although these five indices cover the sentence, clausal, and phrasal levels, they measure phrasal complexity only partially and do not capture the development of phrasal coordination (such as coordinate phrases per clause). As a result, they do not fully depict the multidimensional nature of SC.
To address these gaps, this study investigates longitudinal syntactic development in high school EFL learners’ argumentative writing. The study distinguishes itself in three main respects. First, it is based on a substantial longitudinal corpus collected from the same participants under controlled conditions, comprising 920 argumentative essays by 199 Chinese high school students. The large sample size and the consistency of data collection support the reliability and generalizability of the findings. Second, the study adopts a multidimensional perspective on SC by examining a variety of SC indices that capture different dimensions of complex language use. The findings show which SC indices are more effective in capturing development at learners’ current stage of writing development. Third, the study combines longitudinal and cross-sectional designs in data collection, which allows researchers to investigate both intra-individual development over an academic year and inter-individual differences across grade levels. The study is guided by the following research questions:
Methods
The participants
The current study draws on an existing corpus of 920 argumentative essays written by 199 high school students from a public high school in eastern China. The sample comprises 86 Grade 9 students and 113 Grade 10 students, aged between 15 and 17 years, of whom 79 are female and 120 are male. They had been learning English for approximately 8–11 years. All participants self-reported speaking Chinese as their native language. According to students’ performance on a general standardized test adapted from TOEFL (Baron & Tannenbaum, 2011), most students have achieved intermediate- to upper-intermediate-level proficiency (B1 and B2) based on the Common European Framework of Reference (Council of Europe, 2001). Moreover, a t-test reveals a significant difference in English proficiency by grade level, with Grade 10 students achieving significantly higher scores, on average, than their Grade 9 counterparts (t = −5.426, p < .001).
We have obtained consent from the participating school to perform secondary data analysis using the corpus data they provided for the purpose of informing and enhancing teaching practices. All essays used for analysis were naturally generated from the writing course self-developed by the school. The data has been stripped of all identifying information and there is no way it could be linked back to the participants from whom it was originally collected. All students were assigned a unique ID prior to data collection so that we could track their individual development over time. Sample sentences used in the current paper were selected from the corpus with all writers’ information removed.
The longitudinal learner corpus
The writing course developed by the school consisted of a ninety-minute English argumentative writing class and a practice lesson, held in alternate weeks. Over the course of an academic year (from September 2022 to June 2023), the students engaged in eight argumentative writing tasks. The writing prompts were selected from the “Word Generation” program, a researcher-designed curriculum to cultivate academic language and critical thinking skills through argumentative writing (Jones et al., 2019). This curriculum includes a series of argumentative topics closely related to adolescent life (e.g., Should teenagers listen to rap music? Should high school students learn a second foreign language? Should the government invest in green technology?), allowing students to engage in critical thinking and writing about socially and scientifically contentious issues. These writing prompts were carefully adapted by researchers to fit the proficiency level and cultural background of Chinese high school students. The writing tasks were completed in the classroom under the supervision of teachers. Hence, the argumentative essays collected from the program can be viewed as students’ spontaneous production of language without reference to external sources, including reading materials, information from the Internet, and generative artificial intelligence. For logistical reasons, not all students were able to complete all eight writing tasks during the academic year. A total of 920 essays were collected to construct the longitudinal learner corpus.
Syntactic complexity measurement
We used the Second Language Syntactic Complexity Analyzer (L2SCA) (Lu, 2010) to measure SC. The instrument allows researchers to examine different dimensions of SC through a variety of computerized indices, as summarized in Table 1. Eleven SC indices were selected and divided into five sub-dimensions, namely overall sentence complexity, clausal coordination, clausal subordination, phrasal coordination, and phrasal sophistication. These five dimensions measure English learners’ SC at different levels, progressing from overall sentence complexity through clausal complexity to phrasal complexity. The following subsections introduce the five dimensions of SC with examples.
Table 1. Indices of syntactic complexity (Lu, 2010).
Note. *N stands for the total number of occurrences.
Overall sentence complexity
We chose mean length of sentence (MLS) as the indicator of overall sentence complexity. A sentence is a set of words segmented by punctuation marks signaling the end of a sentence (Hunt, 1965). As the name suggests, MLS is the average word count per sentence in a single essay. In the following example (see Figure 1), the MLS is 13.

Figure 1. Example of overall sentence complexity.
Clausal coordination
A complex sentence might be constructed by coordinating several independent T-units, each of which is a main clause with all attached or embedded subordinate clauses or non-clausal structures (Hunt, 1970). T-units per sentence (T/S) was selected as the measure of clausal coordination. In the example below (see Figure 2), two T-units are coordinated using the conjunction “and,” so the T/S is 2.

Figure 2. Example of clausal coordination.
Clausal subordination
We then took a closer look at the clausal level to see how students organized different clause-level elements to pack rich meanings. A clause is characterized by a subject and a finite verb, and includes independent clauses, adjective clauses, adverbial clauses, and nominal clauses (Hunt, 1965; Polio, 1997). A dependent clause is a sub-category of clause, which includes finite adjective, adverbial, or nominal clauses (Cooper, 1976; Hunt, 1965; Kameen, 1983). Four indices were selected as measures of clausal subordination: clauses per T-unit (C/T), complex T-unit ratio (CT/T), dependent clauses per clause (DC/C), and dependent clauses per T-unit (DC/T). In the example below, there are two T-units coordinated by the conjunction “but,” four clauses, and two dependent clauses. Therefore, C/T, CT/T, DC/C, and DC/T are 2, 0.5, 0.5, and 1, respectively (see Figure 3).

Figure 3. Example of clausal subordination.
Phrasal coordination
Coordinate phrases per clause (CP/C) and coordinate phrases per T-unit (CP/T) were used to gauge phrasal coordination (Cooper, 1976). In the following example, a coordinate verb phrase appears in each of the first two clauses, giving two coordinate phrases across three clauses and one T-unit. Hence, the CP/C and CP/T are 0.67 and 2, respectively (see Figure 4).

Figure 4. Example of phrasal coordination.
Phrasal sophistication
For phrasal complexity, we further looked into how different types of phrases, either nominal or verbal, were applied in student writings. Complex nominals can be (1) nominal clauses, (2) nouns with adjective, possessive, prepositional phrase, relative clause, participle, or appositive, or (3) gerunds and infinitives as subjects (Cooper, 1976). Verb phrases include both finite and non-finite verb phrases. In this sense, we chose complex nominals per clause (CN/C), complex nominals per T-unit (CN/T), and verb phrases per T-unit (VP/T) as measures for phrasal sophistication. In the example below, there were four complex nominals in the sentence, with one embedded in another, and three verb phrases in each of the clauses. Therefore, the CN/C, CN/T, and VP/T were 1.33, 4, and 3, respectively (see Figure 5).

Figure 5. Example of phrasal sophistication.
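The ratio indices described above can be illustrated with a minimal Python sketch that derives them from raw structure counts. This is not L2SCA itself, which obtains the counts from parsed text; the `Counts` container and the `sc_indices` function are hypothetical names introduced here for illustration. The clause and T-unit counts in the second example are not stated in the text and are inferred from the reported ratios.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Counts:
    """Raw structure counts for one text (hypothetical container, not L2SCA's API)."""
    words: int = 0
    sentences: int = 0
    t_units: int = 0
    clauses: int = 0
    dependent_clauses: int = 0
    complex_t_units: int = 0
    coordinate_phrases: int = 0
    complex_nominals: int = 0
    verb_phrases: int = 0


def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def sc_indices(c: Counts) -> Dict[str, float]:
    """Compute the 11 indices used in this study from raw counts."""
    return {
        "MLS": ratio(c.words, c.sentences),
        "T/S": ratio(c.t_units, c.sentences),
        "C/T": ratio(c.clauses, c.t_units),
        "CT/T": ratio(c.complex_t_units, c.t_units),
        "DC/C": ratio(c.dependent_clauses, c.clauses),
        "DC/T": ratio(c.dependent_clauses, c.t_units),
        "CP/C": ratio(c.coordinate_phrases, c.clauses),
        "CP/T": ratio(c.coordinate_phrases, c.t_units),
        "CN/C": ratio(c.complex_nominals, c.clauses),
        "CN/T": ratio(c.complex_nominals, c.t_units),
        "VP/T": ratio(c.verb_phrases, c.t_units),
    }


# Clausal subordination example: 2 T-units, 4 clauses, 2 dependent clauses.
# One complex T-unit is an inference from the reported CT/T of 0.5.
subordination = sc_indices(
    Counts(sentences=1, t_units=2, clauses=4, dependent_clauses=2, complex_t_units=1)
)

# Phrasal sophistication example: 4 complex nominals and 3 verb phrases.
# 3 clauses and 1 T-unit are inferred from the reported CN/C, CN/T, and VP/T.
phrasal = sc_indices(
    Counts(sentences=1, t_units=1, clauses=3, complex_nominals=4, verb_phrases=3)
)

for name in ("C/T", "CT/T", "DC/C", "DC/T"):
    print(name, round(subordination[name], 2))
for name in ("CN/C", "CN/T", "VP/T"):
    print(name, round(phrasal[name], 2))
```

Because every index is a simple ratio, indices that share a numerator (e.g., CN/C and CN/T) differ only in the production unit used as the denominator.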
Data analysis
A preliminary graphical inspection of the distributions of all syntactic indices suggests that they are approximately normally distributed (Linck & Cunnings, 2015). All 11 indices were standardized as z scores to facilitate comparison of syntactic development across dimensions. To answer the first research question, about the types of SC features adopted in student writing and their variation by grade, the descriptive statistics of all 11 SC indices were listed by grade. Higher mean values indicate more frequent use of the corresponding structures, on average, at a given grade level. T-tests were also conducted to see whether these SC indices vary significantly by grade.
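As a minimal sketch of the standardization step, the snippet below converts raw index values to z scores and computes the per-test significance threshold under a Bonferroni correction for 11 comparisons, the correction reported in the Results. The input values are invented for illustration, and the use of the sample standard deviation is an assumption, since the text does not state which denominator was used.

```python
from statistics import mean, stdev
from typing import List


def z_scores(values: List[float]) -> List[float]:
    """Standardize values to mean 0 and standard deviation 1."""
    m = mean(values)
    s = stdev(values)  # sample SD (n - 1 denominator); an assumption
    return [(v - m) / s for v in values]


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Invented MLS values for five essays, for illustration only.
mls_raw = [12.0, 15.0, 18.0, 21.0, 24.0]
mls_z = z_scores(mls_raw)

# Threshold for 11 t-tests at a family-wise alpha of .05.
threshold = bonferroni_alpha(0.05, 11)

print([round(z, 2) for z in mls_z])
print(round(threshold, 4))
```

After standardization, a regression coefficient for time can be read as the change in standard deviations of the index per data collection point, which makes coefficients comparable across indices with different raw scales.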
The second research question, regarding the developmental trajectories of all SC indices and their variation by grade, was addressed using multi-level regression analysis. In the present study, each student composed up to eight argumentative essays. The essays were therefore nested within individuals, meaning that the SC indices are expected to be more strongly correlated within an individual (e.g., different essays written by the same student) than between individuals (e.g., essays written by different students). Multi-level models were thus more suitable for explaining both intra-individual and inter-individual differences as well as for tracing the longitudinal development of syntactic indices (Cunnings, 2012; Lei et al., 2023). Multi-level models were fitted using StataSE 16, with each SC index as the dependent variable of a separate model. Three independent variables were included as fixed effects and were specified at two levels: (1) the between-individual (level 2) variables were student grade and English proficiency; (2) the within-individual (level 1) variable was time (as represented by data collection points wave0–wave7). Time was the key predictor in the model; student grade and English proficiency were hypothesized to be moderators. Random intercepts were specified so that the baseline SC of each student could vary randomly. Effect sizes were reported in terms of within-individual effects (R2w), between-individual effects (R2b), and overall effects (R2o). Specifically, R2w indicates the within-individual variance explained by the model, R2b the between-individual variance explained by the model, and R2o the overall variance explained by the model. Likelihood ratio tests were conducted to see whether the model was the best fit for the data.
To see whether grade significantly moderates students’ SC development over time, we added a time-grade interaction term to the models. A significant interaction term would indicate that the rate of change in an index over time differs by grade, with the sign of the term showing whether growth is faster (positive) or slower (negative) in the higher grade.
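One way to write down the random-intercept model described above is sketched below. The notation is our own and is not taken from the Stata specification: $SC_{ti}$ denotes the standardized index for student $i$ at wave $t$, and the assumption that proficiency enters only the intercept equation follows from the fact that only the time-grade interaction is reported.

```latex
\begin{align}
\text{Level 1 (essays within students):}\quad
  & SC_{ti} = \pi_{0i} + \pi_{1i}\,\mathrm{Time}_{ti} + e_{ti} \\
\text{Level 2 (students), intercept:}\quad
  & \pi_{0i} = \beta_{00} + \beta_{01}\,\mathrm{Grade}_{i}
    + \beta_{02}\,\mathrm{Proficiency}_{i} + u_{0i} \\
\text{Level 2 (students), slope:}\quad
  & \pi_{1i} = \beta_{10} + \beta_{11}\,\mathrm{Grade}_{i}
\end{align}
```

Here $u_{0i}$ is the student-specific random intercept and $e_{ti}$ the essay-level residual. The slope equation has no random term because only random intercepts are specified. In the main-effects models $\beta_{11}$ is fixed at zero, so $\beta_{10}$ is the common growth rate per wave. In the interaction models, and assuming Grade is coded 0 for Grade 9 and 1 for Grade 10 (a coding the text does not state), the growth rate is $\beta_{10}$ for Grade 9 and $\beta_{10} + \beta_{11}$ for Grade 10, so a negative $\beta_{11}$ corresponds to slower growth in the higher grade.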
Results
Descriptive analysis of the SC indices
We first examine the descriptive statistics of the 11 SC indices for students in Grade 9 and Grade 10 to see how high school EFL learners perform syntactically on different dimensions (see Table 2). As shown in Table 2, learners in the sample use 19.23 words per sentence on average, with a large degree of individual variability ranging from 9.25 to 147 words per sentence. Across grade levels, Grade 10 students on average demonstrate a higher level of overall syntactic complexity (MLS: 19.27 in Grade 10 > 19.20 in Grade 9). However, this sentence complexity is largely manifested in the use of coordinating structures, including clausal coordination (T/S: 1.14 in Grade 10 > 1.13 in Grade 9) and phrasal coordination (CP/C: 0.25 in Grade 10 > 0.23 in Grade 9; CP/T: 0.41 in Grade 10 > 0.39 in Grade 9). In contrast, Grade 9 students make more frequent use of clausal subordination (C/T: 1.72 in Grade 9 > 1.68 in Grade 10; DC/T: 0.70 in Grade 9 > 0.67 in Grade 10) and phrasal sophistication (CN/T: 2.10 in Grade 9 > 2.02 in Grade 10; CN/C: 1.21 in Grade 9 > 1.20 in Grade 10; VP/T: 2.41 in Grade 9 > 2.39 in Grade 10). In sum, descriptive analysis reveals that students in general are capable of using complex sentences in argumentative writing, but the strategies they adopt to extend sentence length vary across grade levels. While higher-grade students prefer clausal- and phrasal-coordinating structures, lower-grade students prioritize clausal subordination and phrasal sophistication. We conduct t-tests on the 11 SC indices to see whether there are significant cross-grade differences in the use of syntactic structures. However, none of the results reaches statistical significance after Bonferroni correction for multiple comparisons. Despite the statistical non-significance, we believe the observed mean differences in SC indices across grades help characterize the corpus at this preliminary analytic stage.
Table 2. Descriptive statistics of SC indices by grade.
SC development and cross-grade variation
To measure the SC development of EFL learners on the five dimensions, we first build a series of multi-level regression models with time as the key predictor and grade level and English proficiency as control variables. The results indicate that, except for T-units per sentence (T/S), all ten other SC indices exhibit an increasing trend over time. To be specific, students’ overall sentence complexity as measured by mean length of sentence follows an increasing trend over the course of one academic year (MLS: t = 3.52, p < .001, R2w = 0.020, R2b = 0.031, R2o = 0.021). MLS increases at a rate of 0.05 standard deviations per data collection point (i.e., every 30 days). Similarly, in terms of clausal subordination, the four indices (C/T: t = 3.14, p < .01, R2w = 0.014, R2b = 0.040, R2o = 0.015; DC/T: t = 3.55, p < .001, R2w = 0.018, R2b = 0.044, R2o = 0.019; CT/T: t = 2.33, p < .05, R2w = 0.007, R2b = 0.052, R2o = 0.008; DC/C: t = 3.99, p < .001, R2w = 0.023, R2b = 0.050, R2o = 0.023) show significant growth over time. Specifically, C/T, DC/T, CT/T, and DC/C increase by 0.03, 0.04, 0.03, and 0.05 standard deviations, respectively, at each successive data collection point. At the phrasal level, significant growth can be observed over time in the phrasal coordination indices CP/C and CP/T (CP/C: t = 3.60, p < .001, R2w = 0.021, R2b = 0.028, R2o = 0.028; CP/T: t = 4.51, p < .001, R2w = 0.032, R2b = 0.037, R2o = 0.034). To be specific, CP/C and CP/T increase at rates of 0.05 and 0.06 standard deviations, respectively, per data collection point. Similar growing trends are also present in all three indices of phrasal sophistication (CN/C: t = 6.29, p < .001, R2w = 0.064, R2b = 0.022, R2o = 0.054; CN/T: t = 6.08, p < .001, R2w = 0.057, R2b = 0.044, R2o = 0.053; VP/T: t = 6.15, p < .001, R2w = 0.056, R2b = 0.058, R2o = 0.051).
The results mean that CN/C, CN/T, and VP/T grow by 0.08, 0.07, and 0.07 standard deviations, respectively, per data collection point (i.e., every 30 days). Conversely, the clausal coordination index decreases over time, and this trend is marginally significant (T/S: t = −1.75, p = .08, R2w = 0.003, R2b = 0.019, R2o = 0.008). In summary, results suggest that EFL learners in the sample demonstrate an increasing trend in four SC dimensions, including overall sentence complexity, clausal subordination, phrasal coordination, and phrasal sophistication, whereas a slight decrease is observed in clausal coordination. Despite the small effect sizes of the regression models, we still believe that they capture the developmental trajectories of SC indices over time.
We also compare the standardized regression coefficients of the key predictor—time—to see how SC develops in the five dimensions. Among the four dimensions with an upward trend, it can be observed that students are capable of producing longer sentences overall (MLS: β = 0.05). Notably, SC indices at the phrasal level, both phrasal coordination (CP/C: β = 0.05; CP/T: β = 0.06) and phrasal sophistication (CN/C: β = 0.08; CN/T: β = 0.07; VP/T: β = 0.07) have higher growth rates than those of clausal subordination (C/T: β = 0.03; CT/T: β = 0.03; DC/T: β = 0.04; DC/C: β = 0.05). This means that EFL learners in our study exhibit more improvements in using phrasal structures than clausal ones (Table 3).
Table 3. Regression model of SC indices development, as predicted by time.
Note. ∼p < .10. *p < .05. **p < .01. ***p < .001.
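To give a sense of scale for the standardized time coefficients reported above, the following sketch converts each per-wave coefficient into an implied cumulative change over the academic year. It assumes linear growth across the seven intervals between wave0 and wave7; this extrapolation is an illustration and is not a quantity reported in the study.

```python
from typing import Dict

# Standardized time coefficients (SD per data collection point) as reported in the text.
BETA_TIME: Dict[str, float] = {
    "MLS": 0.05,
    "C/T": 0.03,
    "CT/T": 0.03,
    "DC/T": 0.04,
    "DC/C": 0.05,
    "CP/C": 0.05,
    "CP/T": 0.06,
    "CN/C": 0.08,
    "CN/T": 0.07,
    "VP/T": 0.07,
}

# wave0 to wave7 gives seven intervals; linear growth is an assumption.
N_INTERVALS = 7


def cumulative_growth(beta: float, intervals: int = N_INTERVALS) -> float:
    """Implied change in standard deviations over the given number of intervals."""
    return beta * intervals


growth = {name: round(cumulative_growth(beta), 2) for name, beta in BETA_TIME.items()}
fastest = max(growth, key=growth.get)

for name, value in growth.items():
    print(f"{name}: {value:.2f} SD")
print("largest implied change:", fastest)
```

Under these assumptions, the phrasal sophistication indices imply a change of roughly half a standard deviation over the year, compared with roughly a fifth to a third of a standard deviation for the clausal subordination indices.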
We then examine the cross-level interaction terms (i.e., the interaction between the intra-individual variable time and the inter-individual variable grade) to investigate whether students at different grade levels demonstrate varying SC development trends over time. The results indicate significant negative interactions between grade and time for the indices of phrasal sophistication. This suggests that, on average, students in Grade 9 tend to show stronger syntactic growth over time as measured by complex nominals per clause, complex nominals per T-unit, and verb phrases per T-unit (CN/C: β = −0.05, t = −2.04, p = .04, R2w = 0.070, R2b = 0.030, R2o = 0.058; CN/T: β = −0.05, t = −2.20, p = .03, R2w = 0.064, R2b = 0.054, R2o = 0.059; VP/T: β = −0.06, t = −2.60, p = .01, R2w = 0.068, R2b = 0.053, R2o = 0.059). Figure 6 provides a more intuitive depiction of how grade negatively moderates the development of these SC indices. It can be observed from the figure that, although students in Grade 9 have relatively lower levels of complexity on the three indices at the beginning of the academic year, they demonstrate stronger growth over time and eventually outperform their Grade 10 counterparts. Interaction terms are also tested with the other SC indices as outcome variables, but none is statistically significant. Thus, for those indices, the interaction terms are removed from the final models and only main effects are reported in Table 4.

Figure 6. Significant time-grade interactions for the SC indices CN/C, CN/T, and VP/T.
Regression model of SC indices development, as predicted by time, with time-grade interaction included.
Note. ∼p < .10. *p < .05. **p < .01. ***p < .001.
In summary, descriptive statistics show that students in different grade levels incorporate different syntactic features in their writing: Students in Grade 10 tend to use clausal and phrasal coordination structures, while their Grade 9 counterparts adopt more clausal subordination structures as well as complex phrases in argumentative writing. However, statistical tests show that these cross-grade differences are not significant. Multi-level regression results reveal growth over time in all SC indices except T-units per sentence. The significant negative time-grade interaction terms at the phrasal sophistication level suggest that students in Grade 9 experience stronger growth than Grade 10 students when it comes to using more complex phrases in writing.
Discussion
The present study tracks the longitudinal SC development of argumentative essays written by 199 high school students in Grade 9 and Grade 10 over the course of one academic year. Results indicate that a higher grade, although grade is a common indicator of EFL learners’ language proficiency, does not always correspond to higher levels of SC. In our study, writing from lower-grade students shows higher values on SC indices indicating subordination and the use of complex phrases, whereas writing from higher-grade students shows higher values on indices of coordination structures. Grade is also found to negatively moderate the development of SC indices over time, especially on the dimension of phrasal sophistication, meaning that students in the lower grade show faster progress in using phrasal structures to pack dense information.
Syntactic complexity indices exhibit different developmental trends over time
This study observes that the changes in SC in student-written essays, sourced from our learner corpus, are mixed and varied. Our findings align with previous research results on EFL writing development (Bulté & Housen, 2018; Lei et al., 2023; Menke & Strawbridge, 2019), demonstrating that in our corpus, as students’ English learning time increases, indices of overall sentence complexity, clausal subordination, phrasal coordination, and phrasal sophistication display stable incremental trends. Since these SC indices are confirmed predictors of English proficiency (Ortega, 2003), our results support the view that English learners gradually improve their English proficiency across grade levels. Nevertheless, the upward trend is not observed in T-units per sentence, the measure of clausal coordination. Previous research has indicated that L2 SC development generally follows a progression from the use of coordination to subordination and eventually to more advanced phrasal structures (Biber et al., 2011; Norris & Ortega, 2009).
Our findings partly confirm the hypothetical developmental stages, as students in our study produce significantly longer sentences and adopt more clausal subordination and phrasal structures over time. This is further supported by the comparison of standardized regression coefficients of the key predictor time, in which the decrease in clausal coordination and the stronger growth of phrasal sophistication indices relative to the others suggest that EFL learners are gradually progressing from using subordination structures in essays to writing more complex phrases. Our findings differ from those of Lei et al. (2023), who identified a decreasing use of subordination structures and attributed it to the relatively stable SC stage that the EFL students in their study had reached (Lei et al., 2023; Menke & Strawbridge, 2019). This discrepancy can possibly be explained by Complex Dynamic Systems Theory (De Bot et al., 2007; Larsen-Freeman & Cameron, 2008; Zheng & Li, 2023), which theorizes that language development is a constant dynamic process with multiple factors, both internal and external, coming into play (Verspoor et al., 2008). Variability is an inherent property of L2 development, in which different linguistic systems (namely lexical and syntactic resources) interact and self-organize. The fluctuation and variation of L2 development signal the trade-offs between these systems (Baba & Nitta, 2014; Qin et al., 2023; Verspoor et al., 2008, 2021; Zheng & Li, 2023). Therefore, such fluctuations should be studied carefully rather than dismissed as noise. In a study investigating the grammatical complexity (GC) of EFL learners’ argumentative writing in a longitudinal corpus (Qin et al., 2023), researchers found significant group-level differences in two phrasal GC features. Individual variability was also observed in four types of clausal and phrasal GC features, suggesting both inter- and intra-individual variability in GC development over time.
In light of this theory, we believe that the SC of EFL students in our study may develop with different patterns. Therefore, the SC indices may not necessarily all exhibit linear development at the same pace. There can be trade-offs between different SC indices in the process of learners’ writing development.
Higher grade level does not always indicate higher SC
The inclusion of cross-level interaction terms in our models, specifically the time-grade interaction terms, reveals intriguing findings. Typically, we might assume that students in higher grades possess richer English learning experiences, which consequently lead to higher English proficiency. Moreover, higher English proficiency has been shown to correlate with greater SC (Polat et al., 2020). However, within our learner corpus, significant negative time-grade interaction terms are observed at the phrasal sophistication level. This indicates that Grade 9 students in our study achieved more substantial progress in phrasal sophistication during the research project and ultimately performed better on this dimension. As a result, we posit that student grade level fails to comprehensively reflect the SC of argumentative writing by EFL learners in our study. Academic language, considered a separate register from day-to-day oral communication, has been extensively shown to be a valid predictor of learners’ academic writing performance (Qin & Uccelli, 2016; Ravid & Tolchinsky, 2002; Schleppegrell, 2001). In light of this, one possible explanation for our results is that, when writing argumentative essays, students draw on a different linguistic repertoire, one that is not captured by the general English proficiency that differentiates their grade levels. Therefore, grade is not necessarily indicative of their academic language performance. Another possible explanation is the degree of EFL learners’ engagement in writing tasks. Higher engagement can lead to higher writing proficiency (Chen et al., 2022). We speculate that Grade 9 students, who generally experience lower academic pressure than Grade 10 students, exhibit greater involvement in writing tasks, thereby producing essays of higher quality.
Conclusion and limitations
This study conducts a longitudinal investigation of the SC development in the argumentative essays of 199 Chinese high school students over the course of one academic year. The findings indicate that, except for T-units per sentence, all 10 other SC indices show an increasing trend over time. Notably, the indices reflecting phrasal sophistication demonstrate more growth than those in other dimensions. Moreover, it has been observed that grade level cannot fully predict the SC of students’ argumentative essay writing. Other factors may exert influence. This underscores the notion that the SC development in English learners is mixed and diverse.
This research contributes to the longitudinal study of SC development in English learners in three key aspects. Firstly, although high school EFL learners in this study experience improvements in SC, this growth does not necessarily translate into an increase in writing quality that correlates with English performance (Lei et al., 2023). In other words, the indicators reflecting SC development might not align with the features frequently emphasized in human-scored essays (Crossley & McNamara, 2014). For instance, in this study, English learners exhibit significant improvements in the use of phrasal structures, but human essay scoring typically places more emphasis on the complexity of coordination and subordination structures (Biber et al., 2011; Lei et al., 2023; Yoon & Polio, 2017). Therefore, English teachers need to understand the linguistic features closely related to writing quality and incorporate them into classroom instruction, implementing teaching interventions where necessary. Secondly, while EFL learners’ SC development displays certain regularities at the macro level, development remains intricate and dynamic (De Bot et al., 2007; Larsen-Freeman & Cameron, 2008). Consequently, English teachers and researchers in English language instruction should attend to the SC development of individual learners. Thirdly, despite the overall SC growth in students’ argumentative writing over time, grade level fails to capture this development, which makes the operationalization of academic language skills an indispensable consideration in both writing research and teaching (Phillips Galloway et al., 2020; Uccelli et al., 2015). We therefore recommend incorporating teaching activities aimed at improving students’ academic language skills into classroom instruction.
This research has several limitations. Firstly, despite significant differences in the English proficiency of students in Grade 9 and Grade 10, they are all intermediate or upper-intermediate (B1 and B2) learners of English. Without data from learners at other levels of English proficiency (i.e., basic or advanced levels), we are unable to obtain a fuller picture of SC development across all proficiency levels. Secondly, we have proposed possible explanations for the negative time-grade interaction, but further research, either quantitative or qualitative, is needed to test them. Thirdly, the SC indices used in the study are relatively large-grained (Biber et al., 2016; Zhang, 2022), and we aim to employ both large-grained and fine-grained SC indices (Kyle & Crossley, 2018) in future research to provide a more detailed and comprehensive representation of English learners’ SC development.
Footnotes
Acknowledgments
Special thanks are given to the students and teachers who participated in the project and to research assistants from the LEAD Lab at the College of Foreign Languages and Literature at Fudan University for their support in data coding and constructive feedback.
Contributorship
Weiran Wang is in charge of conducting data analysis and drafting the paper. Wenjuan Qin, the academic advisor of the first author, is in charge of guiding the research design and revising the paper. Linyi Wang is in charge of supervising the writing project at the research site and revising the paper.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical statement
The present study conducts a secondary data analysis using written texts generated from a writing course at the Shanghai Foreign Language School Affiliated to Shanghai International Studies University. The project is exempt from IRB review at Fudan University for three reasons:
(1) It deals with existing datasets collected for other purposes, not specifically for the research project. According to the IRB guideline provided by a number of higher education institutes, secondary data analysis could be exempt from IRB review.
(2) The data have been stripped of all identifying information when they were provided to the researchers. There is no way it could be linked back to the subjects from whom it was originally collected, so its subsequent use by the researchers would not constitute “human subject research.”
(3) The research team has obtained consent from the data provider to conduct secondary data analysis that would enhance teaching practices.
Given the reasons above, we believe the current project meets the ethical requirements of most academic journals in the field.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Shanghai Institute of Educational Science (Grant number: C2023155) awarded to Dr. Wenjuan Qin at Fudan University. The opinions expressed are those of the authors and do not represent the views of the funders.
