Abstract
Task design has long been viewed as essential in the context of language assessment. This study investigated whether increasing task complexity affects learners' writing performance. It employed three writing tasks with different levels of complexity based on Robinson's Componential Framework. A cohort of 278 participants was selected using a simple random sampling technique, and a corpus of 603 compositions was used for data analysis. The results showed that learners demonstrated higher scores and higher levels of coherence in the compositions produced for more complex tasks. The findings contribute to the growing field of cognitive research by exploring the effects of task complexity on learners' writing production. The results have implications for online automated writing assessment, writing test development, and the design of high-stakes language tests. The findings also provide recommendations on how tasks should be designed to develop learners' language production skills in writing instruction.
Introduction
Writing tasks provided in language exams require special consideration because they originate from and guide the writing process, which produces samples for language assessment and evaluation (Liu, 2022; Xu et al., 2021). Thus, task design in language tests has been assigned a central role in language instruction and assessment (Allaw, 2021; An & Lee, 2021; Bläsing & Bornewasser, 2021). Skehan and Robinson predicted how task design elements affect learning by placing various levels of demand on learners' cognitive resources and focusing their attention on aspects of language usage, production, and learning. Long (2017) argued that in a task-based syllabus, pedagogic problems should be used as the unit of analysis, with challenges progressing from easy to complicated and gradually approaching the demands of real-world activities. When designing a task and syllabus, it is therefore essential to consider task complexity as a criterion for determining validity. The design of the college English-testing syllabus was intended to facilitate maximum learning and effective teaching with appropriate task complexity; accordingly, a task-based language syllabus should intensify as its tasks become more complex.
However, in real writing situations, the influence of task complexity on college students' writing performance remains uncertain, so instructors may fail to provide students with an appropriate level of task complexity in classroom teaching, which can result in a poor ability to handle different types of tasks (Golparvar & Rashidi, 2021). Writing instructors are expected to design diverse writing tasks with different levels of complexity across semesters; more challenging tasks for senior students might encourage them to practice and write more, instead of reciting sample essays or memorizing sentence patterns. In practice, however, teachers follow no established guidelines for designing writing tasks for students in different semesters.
Based on the above discussion, previous research suggests that tasks with different levels of complexity may result in varying levels of writing performance (Adams et al., 2015; Wang & Hu, 2021). Golparvar and Rashidi's (2021) study also evidenced the discrepancy between task complexity in pedagogical design and actual performance outcomes. A notable niche, namely the direct impact of task complexity on writing performance, especially within the nuanced context of college English assessments in Southern China, remains under-investigated and needs more empirical support from diverse contexts.
To this end, the current study investigates how task complexity, particularly the reasoning-demands dimension, influences writing scores and text features in student compositions. By operationalizing a gradient of task complexity, the study examines whether and how varying complexities within the College English Test (CET) writing section correlate with differences in students' writing performance. The CET is a high-stakes standardized assessment of English language proficiency, with approximately 18 million tertiary-level undergraduates participating annually. This investigation is of great necessity and significance, as the results can shed light on pedagogical design and language assessment (Rahimi & Zhang, 2018).
Literature Review
Task complexity is theoretically grounded in Robinson's (2011) "Triadic Componential Framework" (or "Cognition Hypothesis"); it is manipulated through cognitive factors that fall into two categories, resource-directing and resource-dispersing, each comprising several dimensions that can be used in designing tasks. These two categories identify important differences in how these dimensions affect resource allocation during task performance. The last decade has seen increasing empirical interest in the effects of attentional resource allocation on language production (Adams et al., 2015; Ismail & Samad, 2017; Salimi & Dadashpour, 2012); these studies investigated the resource-directing and resource-dispersing categories and compared the effects of these variables on task performance.
Task Complexity and Writing Scores
The relationship between task complexity and writing scores has been the subject of considerable research interest in the field of second-language writing. Robinson (2001) defines task complexity as the result of the attentional, memory, inferential, and other information-processing demands imposed on language learners by the task structure. He argues that several dimensions, such as working memory, reasoning demand, and prior knowledge, can affect the complexity level of a task and thus produce differences in learners' task performance; previous literature likewise suggests that simple tasks are always less demanding than complex tasks (Wang et al., 2024) and that a given complexity level can be met by different task designs. According to Robinson, task complexity and attentional resources are correlated: fewer attentional resources are needed to process simple tasks, whereas more complex tasks demand a greater allocation of attention. Robinson's argument suggests that when generating writing output in a second language, the writer must devote a portion of their cognitive capacity to the language itself, preventing other functions, such as higher-order functions for organizing ideas, from being fully utilized.
Ruiz-Funes (2015) conducted a study on task complexity and writing performance, and the findings revealed a close connection between the two in second-language writing. Task complexity was manipulated using topic familiarity, writing genre, and task type, while writing performance was assessed in terms of complexity, accuracy, and fluency (Ruiz-Funes, 2015). Increased reasoning demands led to increased language output.
Task classification, manipulating the cognitive demands of tasks (i.e., task complexity), has been the primary issue in the classroom context, among other task variables, as advocates of task-based language teaching have assumed that carrying out tasks ordered from simple to complex could promote L2 production and development (An & Lee, 2021). This study was motivated by Robinson’s Cognition Hypothesis and aimed to investigate the impact of task grading along the reasoning-demands dimension on students’ writing scores and text features.
Studies have indicated that learners' writing performance is influenced by many factors (Darong, 2021; Golparvar & Rashidi, 2021; Li & Huang, 2022), such as writing task design in classroom instruction and task complexity in language assessment. Tasks are widely viewed as significant factors in the design of language tests; task characteristics, testing methods, testing production, and language assessment are important components of validation, language testing, and evaluation. This study was conducted in the context of language assessment. The researcher operationalized task complexity through reasoning demand and prior knowledge, based on Robinson's triadic componential framework. Differentiating writing tasks by task complexity not only makes them comparable but also contributes to their design, especially for task sequences (Lawrence, 2017; Robinson, 2011). Accordingly, the tasks were graded as least-complex, mid-complex, or most-complex.
To fill the existing research gap concerning the effect of task complexity manipulations on EFL college students' writing scores and text features, this study examines task-generated cognitive burden, manipulated through prior knowledge, reasoning demand, and writing genre, and its impact on learners' writing performance. The study is expected to contribute to writing task design in classroom instruction, college English tests, and high-stakes language testing and assessment, and it represents an initial effort to use a multitask technique.
Task Complexity and Text Feature
The question of whether increasing task complexity affects learners' written output has attracted considerable attention from scholars and educators. Several studies have explored the relationship between the complexity of writing tasks and textual features. For example, Tavakoli and Skehan (2005) examined the effects of task complexity on text features such as lexical diversity, syntactic complexity, and fluency. The results indicated that more complex tasks led to greater lexical diversity and syntactic complexity, as well as improved fluency in written output. Similarly, Pae (2012) investigated the effects of task complexity on the use of cohesive devices in written output; the findings showed that more complex tasks led to greater use of cohesive devices, linguistic features that signal connections between ideas and help create coherent texts.
However, traditional studies rely mainly on surface measures, such as text and sentence length, and text-based measures, such as lexical diversity and word frequency, to predict or explain writing production (Connor, 1990; Ferris, 1994). Few studies have investigated deeper levels of linguistic features, such as the meaning and intention of discourse (Engber, 1995). In contrast to previous studies, this study used Coh-Metrix to measure both surface and deep levels of discourse. McNamara et al. (2010) analyzed linguistic features and identified predictive indices of the Coh-Metrix tool for writing quality; according to their study, syntactic complexity, lexical diversity, and word frequency are the three most significant predictors of writing quality.
The assessment of text features was based on three constructs generated using Coh-Metrix 3.0: text fluency, lexical diversity, and grammatical complexity, the measures most relevant for discourse analysis. Learning to use articles, for example, is resource-directing, as learners are directed to produce specific aspects of language with increased complexity. Previous investigations of text features in English writing have mainly focused on three dimensions: complexity, accuracy, and fluency. Coh-Metrix effectively measures overall text difficulty using a readability formula suitable for L2 learners, and a more detailed analysis of text difficulty is identified along eight functional dimensions (narrativity, syntactic simplicity, word concreteness, referential cohesion, deep cohesion, verb cohesion, connectivity, and temporality) through the calculation of 53 lexical-grammatical and semantic features (Jiang & Han, 2018).
In task-related research, performance is often measured based on language complexity, accuracy, and fluency (Wang & Skehan, 2014). Text feature analysis was conducted on three major dimensions of the linguistic features of texts: complexity, accuracy, and fluency.
In this study, text fluency, lexical diversity, and grammatical complexity are highlighted as measures of writing quality. Measures of text features were generated using Coh-Metrix 3.0 (Graesser et al., 2004), which was employed to assess the text features of students' writing performance. Coh-Metrix (www.cohmetrix.com) is a computer-based automated tool grounded in computational science and corpus linguistics, and it is among the most comprehensive and sophisticated tools available today for evaluating text and discourse automatically, encompassing a wide range of language and discourse measures (Petchprasert, 2021). A major advantage of this tool is its latent semantic analysis (LSA) technology, which goes beyond superficial features of analysis. LSA is a technique for creating vector-based representations of texts that are claimed to capture their semantic content (Dumais, 2004). Coh-Metrix establishes a latent textual semantic space and then calculates the cosine of the angle between the vectors representing two linguistic units, that is, their semantic similarity (Landauer et al., 2013).
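To illustrate the idea behind this measure, the following minimal Python sketch builds a small LSA-style space with standard scikit-learn tools and computes the cosine similarity between adjacent sentences. It is an illustration only, not Coh-Metrix's implementation: Coh-Metrix relies on a semantic space pre-trained on a large corpus, whereas this toy example fits the space on the input sentences themselves.

# A minimal, illustrative sketch of LSA-style semantic similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

sentences = [
    "Task complexity places demands on learners' attentional resources.",
    "Complex tasks demand more attention from second-language learners.",
    "The weather in Jiangxi Province is humid in summer.",
]

# Represent each sentence as a term-weight vector, then project it
# into a low-dimensional "semantic" space with truncated SVD.
tfidf = TfidfVectorizer().fit_transform(sentences)
lsa_vectors = TruncatedSVD(n_components=2).fit_transform(tfidf)

# Cohesion between adjacent sentences is the cosine of the angle
# between their LSA vectors, roughly the quantity Coh-Metrix reports.
for i in range(len(sentences) - 1):
    sim = cosine_similarity(lsa_vectors[i:i + 1], lsa_vectors[i + 1:i + 2])[0, 0]
    print(f"sentences {i}-{i + 1}: cosine = {sim:.3f}")

In such a space, the first two sentences (which share topical vocabulary) yield a noticeably higher cosine than either does with the third, mirroring the way higher LSA values indicate greater cohesion between adjacent text units.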
Methodology
Data Collection Procedures
This study employed three writing tasks selected from the Authentic CET-4 Writing Section (see Table 1). Writing tests were conducted over three consecutive weeks. Learners' task performance was evaluated based on three independent raters' assessments, and text features were assessed using three constructs (text fluency, lexical diversity, and grammatical complexity) generated by Coh-Metrix 3.0, a text analysis tool.
Information About Writing Tasks in the College English Test.
The participants were students from a comprehensive, open-enrollment college in Jiangxi Province, China. They were undergraduate test-takers who passed the College English Test Band 4 in 2019. This study employed a simple random sampling technique to select the participants. A cohort of 278 participants was selected as the research sample; however, owing to missing or incomplete data, 77 cases were removed from the dataset. The final research sample comprised 201 students who voluntarily agreed to participate in the study.
Sampling Technique
The study utilized a simple random sampling technique to create a sample representative of the population of test-takers with previous experience of the College English Test (CET). The inclusion criteria were specifically designed to select participants who had a direct and practical understanding of the CET, thus ensuring the study's relevance to real-world language assessment scenarios. Following Morgan's (2011) criteria for determining sample size, a minimum sample size of 201 was deemed necessary for a population of 420 eligible test-takers. Initially, 278 participants were selected to account for potential data attrition. After screening for completeness and relevance, the final sample comprised 201 individuals. This methodical approach to sampling was taken to ensure statistical validity and reliability, and to support the generalization of the study's findings to the broader population of CET test-takers.
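The reported minimum of 201 for a population of 420 is consistent with the widely used Krejcie and Morgan (1970) finite-population formula; the short sketch below reproduces the figure under the assumption that this formula underlies the cited criteria.

import math

def krejcie_morgan(N, chi2=3.841, P=0.5, d=0.05):
    # Required sample size s for a finite population of size N.
    # chi2: chi-square value for 1 df at 95% confidence (3.841)
    # P:    assumed population proportion (0.5 maximizes s)
    # d:    margin of error (0.05)
    s = (chi2 * N * P * (1 - P)) / (d ** 2 * (N - 1) + chi2 * P * (1 - P))
    return math.ceil(s)

print(krejcie_morgan(420))  # -> 201, matching the minimum reported above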
Statistical Analysis
Upon collection, the data were subjected to rigorous analysis using SPSS version 26 for statistical computations and Coh-Metrix 3.0 for text analysis. The analysis focused on evaluating the writing compositions against a standardized rubric. To bolster the validity of the evaluation, three raters with substantial experience in language assessment were selected. These raters underwent a training session to align their marking strategies and ensure a consistent application of the assessment criteria across all samples. The high internal consistency score of 0.917 in their evaluations points to strong inter-rater reliability, an essential aspect of trustworthy assessment research.
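The report does not state which internal consistency statistic was used; a common choice when multiple raters score the same compositions is Cronbach's alpha with raters treated as items, sketched below on hypothetical composition scores.

import numpy as np

def cronbach_alpha(ratings):
    # Cronbach's alpha treating raters as "items".
    # ratings: one row per composition, one column per rater.
    ratings = np.asarray(ratings, dtype=float)
    k = ratings.shape[1]                          # number of raters
    item_vars = ratings.var(axis=0, ddof=1)       # each rater's score variance
    total_var = ratings.sum(axis=1).var(ddof=1)   # variance of summed scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical scores from three raters for five compositions (15-point scale).
scores = [[10, 11, 10],
          [8, 8, 9],
          [12, 13, 12],
          [9, 10, 9],
          [11, 11, 12]]
print(round(cronbach_alpha(scores), 3))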
Statistical comparison of writing scores across the three tasks was carried out using one-way ANOVA. This analysis method was chosen for its ability to determine whether there are any statistically significant differences between the means of independent (unrelated) groups. The writing scores were stratified into four distinct performance levels to facilitate a more nuanced understanding of the impact of task complexity on writing quality. Moreover, text features extracted through Coh-Metrix were also compared using one-way ANOVA to determine if task complexity influenced linguistic characteristics such as fluency, diversity, and grammatical complexity within the written texts. These methodological choices—rooted in robust statistical theory—were intended to distill clear insights into the relationship between task complexity and language production, thereby contributing to the development of effective language assessment tools.
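As an illustration of this pipeline, the sketch below reproduces the same sequence of tests (Levene's test, one-way ANOVA, and a Tukey HSD post-hoc test) in Python; the study itself used SPSS 26, and the three score arrays here are simulated rather than the study's data.

import numpy as np
from scipy.stats import levene, f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)
least = rng.normal(10.0, 1.3, 201)  # simulated least-complex task scores
mid = rng.normal(10.8, 1.0, 201)    # simulated mid-complex task scores
most = rng.normal(10.7, 1.0, 201)   # simulated most-complex task scores

# 1. Homogeneity of variances, a precondition for the ANOVA.
print(levene(least, mid, most))

# 2. One-way ANOVA across the three task types.
print(f_oneway(least, mid, most))

# 3. Tukey HSD post-hoc test for pairwise comparisons.
all_scores = np.concatenate([least, mid, most])
groups = ["least"] * 201 + ["mid"] * 201 + ["most"] * 201
print(pairwise_tukeyhsd(all_scores, groups))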
Results
Student Writing Scores
The researcher classified the students’ writing scores into four levels (0–8.99; 9.00–10.04; 10.05–11.99; 12.00–15.00). Detailed information on the students’ writing performance levels across the three writing tasks is shown in the following tables.
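As a small illustration of this classification, the snippet below bins a handful of hypothetical scores into the four reported levels using the cut-points above.

import pandas as pd

# Hypothetical writing scores; the bin edges follow the four levels
# reported above (0-8.99, 9.00-10.04, 10.05-11.99, 12.00-15.00).
scores = pd.Series([7.5, 9.3, 10.6, 12.4, 11.0, 8.9])
levels = pd.cut(scores,
                bins=[0, 8.99, 10.04, 11.99, 15.00],
                labels=["Level 1", "Level 2", "Level 3", "Level 4"],
                include_lowest=True)
print(levels.value_counts().sort_index())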
The writing levels for the three writing tasks are summarized in Tables 2 to 4. Students' writing levels on the College English Test differed according to task complexity. Most students who passed the test demonstrated moderately good writing levels and obtained a total score of more than 70%. More specifically, 10.4% of the participants failed the least-complex writing task, compared to 4% in the mid-complex task and only 3% in the most-complex task, indicating that students performed worst (writing score below nine points) in the least-complex task, scored comparatively better in the mid-complex task, and were least likely to fail in the most-complex task. In contrast, only three students (1.5%) performed excellently (writing scores > 12) on the least-complex writing task; this figure increased to 25 (12.4%) in the mid-complex task and 27 (13.4%) in the most-complex task. The number of students who scored at level 2 (between 9.00 and 10.04) and level 3 (between 10.05 and 11.99) accounted for the largest proportion in the least-complex writing task, at 44.3% and 43.8%, respectively. In comparison, 65.2% (131) and 60.7% (122) of the participants performed moderately well (level 3) in the mid- and most-complex writing tasks, suggesting that students performed much better in the mid- and most-complex tasks than in the least-complex task.
Level of Writing for Least-Complex Task.
Level of Writing for Mid-Complex Task.
Level of Writing for Most-Complex Task.
Differences in Writing Scores
Descriptive analysis, Levene’s test, one-way ANOVA, and post-hoc tests were performed to assess the significance of the students’ writing scores on the three writing tasks. The results are shown in the following tables.
The descriptive statistics for the writing scores of the three tasks (Table 5), as assessed by the three raters, showed that the average score of the least-complex task was relatively low compared with the mid-complex and most-complex tasks. The mean score for the least-complex task was 10.03, with minimum and maximum values of 5.33 and 12.67, respectively. The mid-complex task had the highest mean of the three tasks, 10.80, with minimum and maximum scores of 8.00 and 13.67. The most-complex task had the second-highest mean, 10.72, with a standard deviation of 0.98, a minimum of 8.00, and a maximum of 13.67.
Descriptive Statistics of Writing Scores Among Three Writing Tasks.
The results of Levene’s Test for Equality of Variance in Table 6 show that p = .626 (p > .05), indicating that the error variance of the writing performance was equal across the three writing tasks, and the data met the hypothesis of homogeneity of variance.
Test of Homogeneity of Variances (Writing Scores).
Note. The bolded value indicates that the p-value from Levene's test is .626, which is greater than the commonly used significance level of .05, meeting the condition for conducting an ANOVA.
When the means were compared using one-way ANOVA, a significant difference in mean writing performance among the three writing tasks was detected (Table 7), p < .001. Thus, we rejected the null hypothesis and concluded that writing performance differed significantly across the three writing tasks.
One-way ANOVA of Writing Scores.
Multiple comparisons of writing scores among the three tasks were conducted using a post-hoc Tukey test, which showed that the writing scores of the least-complex task differed significantly from those of the mid-complex and most-complex tasks. However, the difference between the mid-complex and most-complex tasks was not significant (p = .735, above the threshold of .05). Similarly, the LSD (least significant difference) results showed that the least-complex task differed from the mid-complex task at p < .001, and the same held between the least-complex and most-complex tasks. In contrast, the scores of the mid-complex and most-complex tasks did not differ significantly (p = .455).
Different Performance of Text Features
To compare text features across the three tasks, the researchers employed a corpus of 603 compositions written by EFL college students. Each composition was analyzed using Coh-Metrix to generate text features for text fluency, lexical diversity, and grammatical complexity, and these features were compared across tasks using one-way ANOVA. The data were generated using the latest version of Coh-Metrix 3.0, an online computational analysis tool theoretically grounded in computational and corpus linguistics. The six most relevant indices were employed to assess the text features: text fluency (text length and LSA), lexical diversity (type-token ratio [TTR] of all words and VOCD of all words), and grammatical complexity (average sentence length [ASL] and sentence syntax similarity of all sentences across paragraphs). Detailed information is provided in Table 8.
Six Measures of Text Features.
Text fluency was assessed using two measures: text length and LSA. Research has shown that the longer the composition, the more skillful the writer; thus, text length was used to measure writing fluency, with a longer text indicating a higher level of fluency in students' compositions. Furthermore, LSA provides measures of semantic overlap between sentences or paragraphs (Landauer, 2015), varying from 0 (low cohesion) to 1 (high cohesion).
Lexical diversity is the range of distinct word types present in a text relative to the total number of words (tokens). When the total number of tokens and the number of word types are equal, all words are distinct; at this maximum level of lexical diversity, the text is likely to have a very low level of cohesion, because almost every word introduces new material into the discourse. In contrast, when more terms are repeated throughout the text, lexical diversity is lower (and cohesion is higher). In other words, a high TTR score is obtained for texts with high lexical diversity and low cohesion, and vice versa. We used the TTR of all words and VOCD to measure lexical diversity.
We utilized ASL and STRUT (sentence syntax similarity, all sentences across paragraphs) to assess grammatical complexity. Previous researchers who synthesized studies on syntactic complexity in L2 writing showed that it varies significantly with L2 skill level; therefore, ASL was employed as a grammatical complexity metric. By comparing the syntactic trees of each sentence pair, STRUT assesses the degree of similarity in syntactic structures within a passage; lower STRUT values may indicate greater grammatical complexity. Crossley and McNamara (2012) found that STRUT was one of three factors contributing to text readability for EFL learners, although their study did not analyze L2 writers' own production.
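For concreteness, the sketch below computes simplified versions of two of these six indices, TTR and ASL, from raw text. Coh-Metrix's own tokenization and counting rules differ, so the values are illustrative only.

import re

def type_token_ratio(text):
    # TTR = distinct word types / total word tokens (simplified tokenizer).
    tokens = re.findall(r"[a-zA-Z']+", text.lower())
    return len(set(tokens)) / len(tokens)

def average_sentence_length(text):
    # ASL = mean number of word tokens per sentence.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(re.findall(r"[a-zA-Z']+", s)) for s in sentences]
    return sum(lengths) / len(lengths)

essay = ("Complex tasks demand more attention. "
         "Learners allocate attention when tasks demand reasoning. "
         "Reasoning demands push learners to write longer sentences.")
print(f"TTR = {type_token_ratio(essay):.3f}, ASL = {average_sentence_length(essay):.2f}")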
A one-way ANOVA was conducted to assess whether the six text features differed significantly across the three task types, comparing the 603 (201 × 3) compositions written by the same participants over three successive weeks. Significant differences were found for five of the measures across the three task types, whereas no difference was found in text length. Thus, the results indicate a significant effect of task type on text features. The details are presented in Table 9.
Variance Analysis for Task Complexity Effects for Text Features.
Note. Task 1 = least-complex task; Task 2 = mid-complex task; Task 3 = most-complex task.
*p < .05. ***p < .001.
As shown in the ANOVA results, text features differed significantly across the three tasks on five measures: LSA, TTR, VOCD, ASL, and the sentence syntax similarity of all sentences across paragraphs (STRUT).
Specifically, in terms of text fluency, text length did not differ significantly among the three task types, F = 0.33, p = .72. In contrast, a significant difference was found in the LSA measure, which was used to gauge the texts' cohesion. The LSA measure varies from 0 (low cohesion) to 1 (high cohesion), so a higher value indicates greater cohesion. The results show that the compositions for the mid-complex and most-complex tasks were more coherent, with LSA values of 0.181 and 0.184, respectively, compared to 0.124 for the least-complex writing task.
Additionally, lexical diversity differed among the three tasks. The TTR values for the three tasks were 0.581, 0.573, and 0.562, respectively. As noted above, when more words are repeated across a text, TTR is lower and cohesion is higher; the progressively lower TTR values therefore suggest that cohesion was highest in the most-complex writing task, followed by the mid-complex and least-complex tasks. This pattern was supported by the VOCD values of 78.04, 73.17, and 70.91 for the three tasks, indicating that writing in the most-complex task was more cohesive than in the mid-complex and least-complex tasks: lexical diversity was lower, indicating a higher level of cohesion.
Finally, grammatical complexity was assessed using ASL and the sentence syntax similarity of all sentences across paragraphs (STRUT). The ASL values for the three tasks were 13.84, 11.88, and 11.12, respectively, suggesting that sentences in the least-complex task compositions were longer than those in the mid-complex and most-complex compositions. Moreover, lower STRUT values may indicate greater grammatical complexity (Zhang et al., 2023). As shown in Table 9, the STRUT values for the three task types were 0.129, 0.110, and 0.101, suggesting that the compositions for the most-complex task were grammatically the most complex.
In conclusion, text features differed significantly across the three tasks in terms of text fluency, lexical diversity, and grammatical complexity. The results showed that writing requires latent coherence in thinking and concepts, not just surface discourse linkages. For lexical diversity, students tended to repeat words more often in the most-complex task than in the least-complex and mid-complex tasks, which contributed to greater lexical cohesion; this result is consistent with the finding that the most-complex and mid-complex tasks elicited more coherent writing. Finally, in terms of grammatical complexity, the compositions for the most-complex task tended to be the most complex, followed by the mid- and least-complex tasks, with significant differences observed across all three tasks. Overall, the text feature analysis generated by Coh-Metrix 3.0 indicated higher-quality essays in the mid-complex and most-complex tasks, with better coherence, more cohesive vocabulary use, and more complex grammatical patterns.
Discussion and Conclusion
Learners demonstrated higher writing performance on reasoning-demanding tasks (the mid- and most-complex tasks) and displayed no significant differences on tasks manipulated by prior knowledge. The study found that an increased level of task complexity pushed students to generate higher writing scores, suggesting that learners' attention was not overloaded by writing tasks requiring multiple attentional resources: they possessed sufficient working memory capacity to process information demanding multiple attentional capacities and coordinated their attention during the writing tasks. The study also found that prior knowledge of the topic did not significantly affect writing scores; familiarity with the topic did not improve writing test outcomes.
The study found that the same learners (test-takers) performed differently when assigned different tasks on the College English Test, suggesting that reasoning-demanding tasks can activate test-takers' language skills and help them fully express their ideas. Learners were found to be more interested in writing argumentative essays (tasks with reasoning demands) than narrative essays (tasks without reasoning demands). Tasks with different complexity levels on the College English Test resulted in significant variations in test-takers' writing scores.
Enhanced Writing Performance on Reasoning Demanding Tasks
The first major focus of this study was to investigate the level of learners’ writing scores on the College English Test and compare the three writing tasks, as well as the impact of task complexity on EFL learners’ writing scores.
The analysis indicated that a larger proportion of high-quality essays was produced for the mid-complex and most-complex tasks. This finding is supported by Robinson's (2010) observation that more complex tasks can elicit students' full attention to all aspects of the task. According to Robinson's (2001) argument, as the cognitive complexity of a task increases, learners can improve their performance and complete the task successfully. He argued that human attentional resources are unlimited, so learners can attend to and process multiple linguistic aspects without trade-off effects in their language production (Cho, 2015). In a simple activity, learners need not attend fully to all components of language performance simultaneously, whereas in a complicated assignment they must pay full attention to all aspects of the task to accomplish it.
According to Robinson’s cognition hypothesis (2001), increases in task complexity along the so-called resource-directing variables may not decrease language output but may contribute to higher structural complexity and greater accuracy of learner output if the cognitive task complexity dimensions belong to separate attentional resource pools. College learners have a mechanism to process reasoning-demanding writing tasks. Their attentional capacities can become coordinated and cooperative rather than disjointed and oppositional. Tasks manipulated according to reasoning demands push learners to release their potential abilities and coordinate their attention. Participants in this study had a strong working memory capacity to process information. Working memory refers to the storage and processing mechanism of complex cognitive activities such as reading and writing (Lu, 2010). This is why the students in this study demonstrated a higher level of performance with a lower degree of familiarity with the topics. This finding is of great significance for language instruction and assessment, as well as cognitive psychology.
Variations of Writing Scores Among Tasks
This study’s second research objective was to determine whether there was any significant difference in students’ writing performance (writing scores) among the three tasks with different complexity levels. Students’ writing scores were analyzed and compared using one-way ANOVA to determine whether the differences between group means were statistically significant. The study found that learners generated high-scoring essays on reasoning-demanding tasks (mid- and most-complex tasks). In line with previous research, task complexity may result in variations in learners’ writing performance (Abdi Tabari & Ivey, 2015; Adams et al., 2015; Jiaxin, 2015; Kuiken & Vedder, 2007; Simin & Farahman, 2017).
Learners perform better on writing tasks that require reasoning. This is because such tasks push them to generate higher-level language output. According to Robinson (2005), students focus more on complex tasks, improving their performance. Tasks with moderate reasoning demands help learners articulate their thoughts better, matching their writing habits and language proficiency. As learners perform tasks with a higher cognitive load, their language proficiency increases, resulting in better essays and higher writing scores. Another reason for students’ varying writing levels across tasks could be their past practice. Argumentative essay writing is the most prevalent essay genre encountered by college students across the curriculum and in large-scale writing assessments (Lu, 2010). This popularity leads students to invest more time in preparing for it. Practice with this type of writing familiarizes students with reasoning-demanding tasks. Writing teachers should use a variety of task types to train students in handling diverse content.
In conclusion, the reasoning demand of a task largely determines the processing of each output and final task performance. Students demonstrated a higher level of writing performance in reasoning-demanding tasks from the cognitive factors’ perspective.
Variations of Text Features Among Tasks
The results indicated that the students' essays displayed different text features across the three tasks; increasing task complexity resulted in gains in coherence, lexical cohesion, and syntactic complexity. Complex tasks are more likely to elicit multiple attentional faculties, generating a higher level of coherence, more cohesive lexical choices, and more complex grammatical patterns. This finding is supported by Robinson's (2010) argument that humans have different attentional capacities for processing information, meaning that as task complexity increases, so do accuracy, complexity, and fluency.
Previous studies claimed that increasing task complexity along reasoning demands resulted in higher gains in lexical and syntactic complexity (Lawrence, 2017; Yang et al., 2015), whereas no significant effect was detected on accuracy (Sattarpour & Farrokhi, 2017). Lawrence (2017) found that as task complexity increased, syntactic complexity and lexical length increased significantly, but lexical diversity showed no significant change. The current finding is also partially consistent with Ismail et al. (2012), who claimed that task reasoning demands and task conditions significantly affected grammatical accuracy. Another similar study, conducted by Faruji and Kharaghani (2019) among 90 intermediate EFL learners, examined the impact of task grading along reasoning demand on syntactic complexity; the results indicated a significant positive impact of sequencing tasks from simple to complex on the syntactic complexity of intermediate Iranian learners.
The variation in text features among the three tasks again confirmed the robustness of the Coh-Metrix in analyzing texts based on various linguistic features. Coh-Metrix was found to be capable of meeting the needs of discourse analysis for test designers, textbook compilers, and instructors. It has been used to detect a wide range of textual variations. Previous studies have found variations in written discourse using the Coh-Metrix (Graesser et al., 2004; McCarthy et al., 2006). These investigations show that the Coh-Metrix is a powerful text analysis tool capable of assessing and distinguishing a wide range of text types from the chapter level to the sentence level.
The reasons learners displayed a higher level of coherence, lexical diversity, and grammatical complexity of text features in reasoning-demanding tasks (mid-complex and most-complex tasks) can be explained as follows. One explanation is that argumentative tasks, manipulated along reasoning demands, push learners to support their ideas using conjunctions, which contributes to a high level of coherence in essays (Cho, 2015). Another possible explanation could be that learners’ previous practice and the accumulation of argumentative writing led them to apply the samples they memorized, which resulted in more accurate language output.
In conclusion, this study found that different task complexities resulted in variations in the text features of students' written compositions, suggesting that task complexity is an important factor affecting students' productive abilities. The study also confirmed that increasing task complexity leads to gains in coherence, lexical cohesion, and grammatical complexity, suggesting that increasing task complexity may improve linguistic performance and stretch learners' L2 understanding. This finding is supported by Robinson's argument that humans possess multiple attentional capacities for processing writing tasks; task complexity did not compete with task performance for learners' attention.
Implications
The theoretical underpinnings of this study bolster Robinson’s Cognition Hypothesis, which posits that enhanced task complexity catalyzes cognitive engagement, thereby facilitating improved language production. The findings elucidate the dynamic nature of learners’ cognitive resources, suggesting that higher reasoning demands associated with task complexity can activate deeper language processing abilities without overtaxing the learners. This runs counter to the prevailing notion of a zero-sum game between task difficulty and cognitive load. Additionally, the study reveals an intriguing paradigm where prior knowledge of a subject matter is less influential on performance than the cognitive demands of a task. This sheds light on the potential for a more nuanced understanding of how language learning tasks can be structured to accentuate cognitive engagement over content familiarity, thereby enriching the academic discourse on second language acquisition.
From a practical standpoint, the implications of this study resonate with pedagogical strategies, assessment practices, and educational policy in English as a Foreign Language (EFL) contexts. The research advocates for the incorporation of incrementally complex tasks in the curriculum, which can engender substantial gains in language proficiency. This aligns with a pedagogical shift toward enhancing cognitive competencies over mere content retention, suggesting a need for teachers to diversify writing tasks that promote reasoning skills. Assessment-wise, the study signals a need to realign language evaluation metrics, like the College English Test, to mirror the cognitive capabilities of learners, rather than merely testing content knowledge. Moreover, the validation of Coh-Metrix 3.0 for text analysis reaffirms its utility in designing and assessing EFL materials, thereby supporting language educators in their quest to cultivate a richer linguistic output from learners. These practical insights carve a pathway for future educational initiatives to foster a cognitive-centric approach in language education, transcending traditional methodologies that prioritize rote learning.
The findings of this study provide implications for task-based language testing, language assessment, and classroom writing instruction for stakeholders. The findings on task complexity and writing performance carry distinct implications for second language acquisition research, including the cognitive complexity frameworks of Robinson and Skehan, learners' writing processes, and the linguistic performance of text features. They also offer practical implications for language assessment, task design, and writing test development, in which the cognitive complexity of tasks should be taken into consideration. Finally, instructional implications follow from the demonstrated need to train learners on diverse task types, such as tasks with different degrees of topic familiarity, different levels of cognitive complexity, and different writing genres; in this way, students can be equipped with the writing competence to face tasks posing different challenges. Because the world has become increasingly text-oriented, writing has been designated one of the most important abilities.
Limitations and Recommendations for Future Research
This study has some limitations. First, limitations regarding the study sample and their impact on the generalizability of the findings must be acknowledged. Second, the research instrument, Coh-Metrix 3.0, does not detect grammatical errors in writing compositions, so researchers must correct errors before analyzing text features. Finally, the research design has limitations; parallel groups should be established to collect data more effectively.
Developing theoretically motivated, empirically substantial, and pedagogically feasible sequencing criteria has long been acknowledged as a major research goal for operationalizing task-based approaches to syllabus design. Future studies should be conducted to further explore cognitive and affective aspects of writing. Furthermore, a qualitative study might be needed to explore the processing procedures of the writing process and probe why learners’ writing differed across task types and what kinds of task types could best elicit learners to generate high-quality text features that contribute to high-scoring essays.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The Impact of Task Complexity and Task Difficulty on Learners’ Writing Performance in College English Tests, funded by Jiangxi Agricultural University Teaching Reform and Research Project, Grant Number 2023B2ZZ55.
Data Availability Statement
The data that support the findings of this study are available from the corresponding author, Yumei Zou, upon reasonable request.
