Abstract
As integrated writing tasks in large-scale and classroom-based writing assessments have risen in popularity, research studies have increasingly concentrated on providing validity evidence. Given that most of these studies focus on adult second language learners rather than younger ones, this study examined the relationship between written discourse features, vocabulary support, and integrated listening-to-write scores for adolescent English learners. The participants were 198 Taiwanese high school students who completed two integrated listening-to-write tasks. Prior to each writing task, a list of key vocabulary was provided to aid the students’ comprehension of the listening passage. Their written products were coded and analyzed for measures of discourse features and vocabulary use, including complexity, accuracy, fluency, organization, vocabulary use ratio, and vocabulary use accuracy. We then adopted descriptive statistics and hierarchical linear regression analyses to investigate the extent to which such measures were predictive of integrated listening-to-write test scores. The results showed that fluency, organization, grammatical accuracy, and vocabulary use accuracy were significant predictors of the writing test scores. Moreover, the results revealed that providing vocabulary support may not necessarily jeopardize the validity of integrated listening-to-write tasks. The implications for research and test development were also discussed.
Keywords
Introduction
Integrated skill tasks are commonly employed in language assessments of varying stakes and scales to evaluate test-takers’ language production based on information they receive from source texts. This information may take the form of reading materials, lectures, class discussions, or other means of delivering content-specific information (Cumming, 2013; Cumming et al., 2016; Plakans, 2015; Plakans, Gebril, & Bilki, 2019; Weigle & Parker, 2012). Research has shown that integrated skill assessments tap into multiple contributing skills of interest and general language proficiency (Sawaki et al., 2013). The provision of source texts in an integrated assessment allows for a more holistic assessment of a test-taker’s language proficiency, ranging from language use to source use based on the accurate comprehension of a given text, than is possible with independent writing tasks that solely measure writing skills (Plakans, 2008).
In an authentic academic setting, almost all writing assignments require learners to engage with reading or listening materials and may include writing a research synthesis on a scientific phenomenon or reflecting upon a lecture topic (Gebril & Plakans, 2009a, 2009b; Leki & Carson, 1997; Ohta et al., 2018; Plakans et al., 2018; Plakans, Gebril, & Bilki, 2019; Plakans, Liao, & Wang, 2019; Sawaki et al., 2013; Weigle & Parker, 2012). Test-takers are also frequently required to interpret the source materials in their own words, producing what Leki and Carson (1997) classified as “content responsible” writing (p. 41). In second language (L2) integrated writing assessments, scholars have identified an interplay between receptive skills (e.g., reading and listening) and writing skills within both task-completion processes and source integration strategies (Barkaoui, 2015; Cumming et al., 1989; Delaney, 2008; Hirvela, 2016; Plakans, 2009b; Rukthong & Brunfaut, 2020; Shin & Ewert, 2015; Watanabe, 2001; Weigle & Parker, 2012). These studies have also found that discourse synthesis represents a crucial cognitive mechanism underlying integrated writing, which itself includes skills that are essential to (1) organizing source information and summarizing points, (2) selecting specific parts of the source and using them in essays, and (3) making connections between source information, the topic, and personal experiences (Barkaoui, 2015; Cumming et al., 2005, 2006; Gebril & Plakans, 2009; Plakans, 2008, 2009a, 2009b; Plakans et al., 2018; Plakans, Liao, & Wang, 2019).
Research on integrated assessment has been carried out to provide reliability and validity evidence for using scores derived from a test task that integrates different language modalities. While gathering and presenting reliability evidence can be relatively straightforward (see Yan & Fan, 2021, for current methods in reliability and dependability estimation), validity evidence can be quite multifaceted and may be gathered rather broadly (Chapelle, 2020). A common type of validity evidence that has been the focus of integrated assessment research can be best characterized as evidence based on internal structure (Johnson & Christensen, 2020). Researchers have focused heavily on how discourse features vary at different score levels in integrated writing assessments to identify the specific features that predict the resulting writing scores (e.g., Cumming et al., 2005, 2006; Gebril & Plakans, 2013; Guo et al., 2013; Knoch et al., 2014; Plakans & Gebril, 2017; Plakans, Gebril, & Bilki, 2019). In a study of two distinct types of integrated tasks, Cumming et al. (2005, 2006) argued that, when compared with their lower scoring counterparts, higher scoring test-takers tend to accurately write longer clauses with more complicated structures and include higher quality arguments when summarizing source evidence. Gebril and Plakans (2013) also found systematic differences in grammatical accuracy and source text use between the lowest and the upper two score levels. Such studies provide evidence for the systematic differences in essay quality across score levels, thereby contributing to the validation of integrated writing scores. This evidence is critical for proper score interpretation as well as for verifying the rating scale that is used to score integrated writing tasks.
However, it is worth noting that these earlier studies of written discourse features primarily focused on adult L2 learners; as Martínez (2018) and Ortega and Carson (2010) indicated, little is known about the relationship between adolescent English as a foreign language (EFL) learners’ writing performance and written discourse features. Research focusing on adolescent learners’ writing performance in integrated writing tasks is even scarcer, potentially due to the complicated nature of these tasks, which involve other language modalities in addition to writing skills. Such tasks have been used exclusively to assess adult learners’ L2 writing proficiency (Knoch & Sitajalabhorn, 2013). Integrated writing tasks are generally more challenging because they require test-takers to sufficiently comprehend source materials before accurately and coherently expressing their ideas in their own writing (Cumming et al., 2005; Plakans & Gebril, 2017). After conducting factor analysis for the Test of English as a Foreign Language (TOEFL) iBT integrated writing task, Sawaki et al. (2013) attested, “Examinees whose academic English ability has not yet reached the [college] admission level may be having difficulty (a) comprehending the source materials . . . (b) selecting and organizing the appropriate information from source materials, or (c) both” (p. 93). Considering the purported connections between integrated writing tasks and academic writing required in the higher education context, college-bound high school students need to prepare for types of writing tasks that involve multiple language modalities, such as integrated writing tasks. Integrated writing tasks can still be helpful for non-college-bound high school students to the extent that they may help students develop important academic and professional skills of reasoning, argumentation, and critical thinking (Deane et al., 2008). Thus, we argue that the scarcity of research on the use of integrated writing tasks for adolescent L2 learners is problematic and should be addressed.
With limited information about whether integrated writing tasks can be used to assess the writing performance level of adolescent L2 learners, we are left to wonder if language learners and teachers should wholly avoid administering integrated writing assessments to adolescent learners despite the importance of using multiple language skills simultaneously in academic contexts. There are prominent differences between these adolescent learners and their adult counterparts, and research findings on adult learners cannot simply be transferred to this under-researched population of learners. One salient difference is vocabulary knowledge. Older learners tend to possess a larger L2 vocabulary size than younger learners (Puimège & Peters, 2019). Adult learners’ larger vocabulary knowledge in their L1 may help them realize cross-linguistic transfer between L1 and L2 (i.e., words that share morphological, phonological, or semantic similarities), so they are more likely to efficiently acquire L2 vocabulary than secondary school learners (Ke & Xiao, 2015). It is also our understanding that enabling skills, including vocabulary, grammar, and pronunciation, affect performance in all four language skills (i.e., reading, listening, speaking, and writing), and adolescent learners are still in the process of developing these skills (Cameron, 2001; Coxhead, 2006). In addition, compared with adult learners who may have already tackled a variety of writing tasks in academic contexts, adolescent L2 learners usually lack experience and training in academic writing (Maamuujav et al., 2021). With these differences in mind, it is clear that some scaffolding is needed for adolescent learners when administering integrated writing tasks. Given that comprehension skills for input materials are critical to tackle integrated writing assessment tasks (Barkaoui, 2015; Plakans & Gebril, 2013), we are curious as to what scaffolding, if any, can be provided to adolescent learners to aid their comprehension. One answer to these queries is vocabulary support, which is crucial for both comprehension and language production.
The purpose of this research was to investigate the relationship between integrated listening-to-write scores and written discourse features, that is, the extent to which these features predict adolescent EFL learners’ integrated listening-to-write scores. Discourse features in this study included complexity, accuracy, fluency, and organization. Moreover, this study aimed to understand the relationship between vocabulary support and the resulting scores. Measures of vocabulary support impact included vocabulary use ratio and the accurate use of provided words. The present study thus contributes to the literature by describing variations in discourse features and the use of vocabulary support across different score levels for the purpose of providing validity evidence for adolescent EFL learners’ integrated writing scores.
Literature review
In explorations of the trajectory of L2 development, second language acquisition research has focused on three major components of the multidimensional construct of L2 proficiency, namely complexity, accuracy, and fluency (CAF; e.g., Housen et al., 2012; Larsen-Freeman, 1978, 2009; Norris & Ortega, 2009; Skehan, 1996, 1998, 2009). Complexity and accuracy represent L2 learners’ current stage of language knowledge, as they reveal the degree to which learners are able to accurately use language in a sophisticated way. Fluency, by contrast, denotes how quickly L2 learners can access their language knowledge to produce a certain amount of language in a limited amount of time (Wolfe-Quintero et al., 1998). Research studies in second language acquisition and language testing tend to examine these three components in combination when attempting to understand the impact of such conditions as instruction or planning, rather than in isolation (Ellis & Yuan, 2004; Foster & Skehan, 1996). These measures serve as crucial evidence of a learner’s development in L2 speaking or writing.
The validity argument for proper score interpretation in L2 testing requires the investigation of differences in CAF across varying score levels, and it is expected that higher scoring test-takers exhibit more complex, accurate, and fluent language production than their lower scoring counterparts. While a number of studies have been carried out to provide validity evidence for utilizing CAF and relevant measures for written discourse features (e.g., organization and vocabulary knowledge) when assessing L2 learners’ integrated writing proficiency (e.g., Baba, 2009; Cumming et al., 2006; Gebril & Plakans, 2013; Plakans & Gebril, 2017; Plakans et al., 2019), the scope of interpretation is bounded by research design, task type, and task condition. In particular, the lack of studies examining written discourse feature measures in relation to adolescent EFL learners prevents us from understanding the characteristics of the products of integrated writing composed by learners of this profile. For this reason, investigating adolescent EFL learners’ integrated writing with regard to written discourse features would add to the validity argument of integrated writing tasks.
Complexity
In its simplest formulation, the term complexity refers to “the use of more challenging and difficult language” (Ellis, 2003, p. 343). Housen et al. (2012) provided a more detailed definition of the term as “the ability to use a wide and varied range of sophisticated structures and vocabulary in the L2” (p. 2). In practice, complexity is the most problematic construct in the CAF triad (Pallotti, 2009), as it subsumes many different aspects. In the literature, complexity is typically divided into syntactic complexity and lexical complexity (Housen et al., 2012).
Syntactic complexity
Syntactic complexity refers to, as defined by Ortega (2015), “the range and the sophistication of grammatical resources exhibited in language production” (p. 82). Lu (2011) wrote that syntactic complexity can be measured by quantifying one or more of the following aspects: “length of production unit, amount of subordination or coordination, range of syntactic structures, and degree of sophistication of certain syntactic structures” (pp. 36–37). Syntactic complexity is commonly employed as a developmental measure of L2 speaking or writing. Researchers interested in measuring the complexity of L2 production must first determine which aspects of syntactic complexity they will focus upon; such a decision often considers the contextual characteristics of the given study, such as the current language proficiency of L2 learners.
In assessing the writing of L2 learners whose language capacity is still developing, the T-unit serves as a viable index of syntactic complexity. Previous studies have used the T-unit, or the minimal terminable unit, to analyze L1 writing by children and adolescents (e.g., Hunt, 1965; Scott, 1988). Plakans, Gebril, and Bilki (2019) operationalized complexity as the mean length of T-unit to identify its contributions to the variance in integrated writing scores, while Cumming et al. (2006) reported that the length of clauses varied significantly among learners at different score levels.
Lexical complexity
While researchers agree that lexical complexity is an important indicator of L2 proficiency, there is as yet no uniform definition for it. The consensus is that lexical complexity is a multidimensional construct that consists of several different aspects, as with the constructs in the CAF triad. Jarvis (2013) suggested that lexical sophistication and lexical diversity are important dimensions of lexical complexity. In contrast, Read’s (2000) conceptualization of lexical complexity included lexical density, lexical sophistication, lexical variation, and the number of errors in vocabulary use. Among these traits, lexical sophistication denotes the extent to which a learner is able to use low-frequency, advanced lexical choices (Laufer & Nation, 1995). In language assessment, more convenient measures are commonly adopted to denote the maturity of a test-taker’s lexical sophistication. For example, in a comparison study of the discourse features between independent and integrated writing tasks, Cumming et al. (2006) gauged lexical sophistication by utilizing average word length and type–token ratio. However, recent research has indicated that the type–token ratio approach measures lexical cohesion, rather than lexical sophistication (Crossley et al., 2015).
Baba (2009) conducted a study in which she ran correlation and regression procedures to identify the influence of three lexical complexity indices—vocabulary size, word definition ability, and lexical diversity—on summary writing performance. While both vocabulary size and the ability to define words correlated moderately with summary writing performance, lexical diversity had a negative, statistically nonsignificant correlation with the dependent variable. According to the regression analyses, however, the effects of lexical complexity were not comparable to those of basic language abilities, including English reading proficiency, summary length, English proficiency, knowledge of Japanese vocabulary, and Japanese writing proficiency.
Grammatical accuracy
Accuracy refers to the extent to which language production is free of grammatical errors (Housen & Kuiken, 2009; Wolfe-Quintero et al., 1998). Since language production is compared against a presumed norm, accuracy is arguably “the most straightforward and internally consistent construct” (Housen et al., 2012, p. 4). In L2 testing and assessment, the number or rate of errors in a given essay is considered to be a common proxy of writing accuracy. As with other constructs in the framework, accuracy correlates positively with proficiency scores; that is, higher scoring learners tend to produce essays with higher accuracy (e.g., Arnaud, 1992; Scott & Tucker, 1974). A recent study by Peng et al. (2020) provided a more nuanced understanding of the construct, demonstrating that writing accuracy can differ as a function of the linguistic complexity of the source text. After administering a continuation task (a type of integrated writing that requires reading an incomplete story and completing it) to the participants, the researchers found that a simplified text that better reflected the learners’ production capability elicited greater writing accuracy than an unsimplified text. In contrast, in another study by Shi et al. (2020), which explored the effect of prompt type in the continuation task, writing accuracy was comparable across all four different types of prompts. Taken together, the evidence on the level of grammatical accuracy that L2 learners exhibit under different task conditions is mixed. It is also important to note the potentially inverse relationship between grammatical accuracy and syntactic complexity: an L2 learner might construct sentences in a test task in a way that increases grammatical accuracy at the price of reduced syntactic complexity (Jagaiah et al., 2020).
Fluency
Fluency is understood to be the ease and rapidity with which a test-taker accesses his or her language system to retrieve the necessary language forms in response to a task requirement. L2 writing assessment research has mostly utilized length as a proxy for fluency development. As such, fluency in L2 writing assessment contexts indicates the rate and amount of written production generated within a given time. Study results have consistently shown that fluency correlates with performance level; that is, lower scoring test-takers tend to produce briefer essays than higher scoring test-takers (Cumming et al., 2005, 2006; Jiang et al., 2019; Plakans et al., 2019). As an illustration, when investigating the discourse features of L2 test-takers’ integrated writing, Cumming et al. (2005, 2006) and Plakans, Gebril, and Bilki (2019) tallied the total number of words written in a composition as a measure of fluency. Plakans, Gebril, and Bilki (2019) reported that fluency was the biggest contributor to the variance in integrated writing scores, accounting for approximately 25% of variance. It is also worth noting similar findings in Gebril and Plakans (2013) and Shi et al. (2020). In particular, Shi et al. indicated that a source text whose complexity is comparable with the production ability of the test-takers can provide more significant gains in fluency than can an unsimplified source text.
Organization
In assessing the potential relationship between organization and score in a writing task integrated with listening and reading, Plakans and Gebril (2017) observed that ratings on the five-point rubric used for grading essay organization differed significantly across integrated score levels. Coherence, which was operationalized as the logical flow of written production, elicited significantly different means across score levels, whereas the results of a multivariate analysis of variance (MANOVA) suggested that cohesion (e.g., repetition and use of connection words) did not yield significant differences in the integrated writing scores. Similarly, a study by Crossley and McNamara (2012) revealed that essays written by L2 learners categorized as highly proficient were characterized by effective lexical sophistication, rather than cohesion. The purported non-significance of cohesion can be explained by Graesser et al. (2004), who suggested that higher level texts usually contain more implicit cohesiveness, requiring readers to make inferences. Taken together, these studies demonstrate that organization can be a complex construct for L2 learners to effectively learn and display in an evaluative context.
Vocabulary support
As Cumming (2013) and Sawaki et al. (2013) noted, among the constraints that test-takers of an integrated writing task face is the threshold proficiency level that makes writing for integrated tasks possible. That is, learners who are not equipped with a minimum level of language proficiency at the time of assessment are unlikely to produce an effective essay when faced with a prompt that requires the integration of different language skills. While a number of cognitive and affective capabilities are necessary for effective performance in an integrated writing task, vocabulary knowledge is considered one of the most foundational factors in comprehending input materials (e.g., Beglar & Hunt, 1999; Laufer, 1992; Qian, 1999; Stæhr, 2008, 2009; Zhang, 2012). For instance, in a study on the composing processes of L2 writers completing reading-to-write tasks, Plakans (2009a) found that writers of all levels experienced challenges related to vocabulary. These results showed that the limited vocabulary knowledge of lower scoring writers affected how well they were able to understand the source materials. Similarly, Rukthong and Brunfaut (2020) indicated, in an integrated task that involves listening and writing, that high-quality integrated task performance is mediated crucially by a complex interaction between low- and high-level text processing and strategy use. Importantly, activation of high-level text processing (e.g., semantic and pragmatic processing) is impossible without text processing at a low level, such as acoustic–phonetic processing, word recognition, and parsing.
Although studies focusing on the impact of vocabulary support on integrated listening-to-write test performance are lacking, research has shown that providing vocabulary support in advance of a listening task aids test-takers’ listening comprehension. Chung (2002) reported that the post-test scores of an experimental group that had received vocabulary pre-teaching were significantly higher than those of a group given no treatment. Comparing four types of pre-listening support, Babaei and Izadpanah (2019) found using multimedia annotations and pre-teaching key vocabulary to be associated with significantly positive gains in listening comprehension scores for elementary-level EFL learners. In a similar study carried out by Madani and Kheirzadeh (2022), vocabulary preparation and pre-reading questions facilitated elementary-level EFL learners’ listening comprehension.
Studies have shown that analyzing the textual features of integrated writing tasks helps us differentiate test-takers at varying English performance levels. However, it is worth noting that these studies focused solely on one particular population of learners (specifically, adult, university-level English as a second language [ESL]/EFL learners). It is therefore unknown if textual features can be used to differentiate the performance levels of other populations, such as adolescent or young English-learning students. Moreover, the integrated writing tasks that were used in these studies were predominantly reading-to-write or reading–listening–writing, and thus the impact of listening-to-write tasks on textual features is unclear. In addition to the textual features of integrated writing, studies have highlighted the influence of vocabulary on both reading and listening comprehension and writing performance. Furthermore, although previous studies have suggested that pre-learning the vocabulary of source materials may benefit learners’ comprehension, the tasks that were used in these studies were not integrated. Since studies of the effects of pre-learning vocabulary remain scarce, more studies are needed to more fully understand the role of pre-learning vocabulary in integrated writing assessments.
The purpose of the present study was to investigate the textual features that are produced in listening-to-write tasks as well as the contributions of vocabulary support to adolescent L2 learners’ integrated writing performance. With this objective in mind, the current study intends to address the following questions:
1. Do textual features and vocabulary support predict students’ writing performance in integrated listening-to-write assessments?
2. If yes, how are the measures of textual features and vocabulary support characterized by score level?
Methods
Participants
The participants in this study consisted of 198 Taiwanese high school students. In Taiwan, high school education lasts 3 years (Grades 10 through 12 in U.S. education) and is part of compulsory formal education, with a national curriculum set forth by the Ministry of Education (MOE). The graduation requirements specify that students take five English class periods per week for 3 years. Each classroom has mixed proficiency levels.
The participants included 128 males and 70 females, all of whom were in their second year of high school (equivalent to Grade 11 in the U.S. school system). All the participants were 17 years of age, and none had studied English abroad or lived outside their country at the time of data collection. Their native language was Mandarin Chinese, although some also spoke Taiwanese or another indigenous language at home or in their community. Based on information from their English teachers, learning materials, and class assignments, the participants’ overall English proficiency ranged from A2 (low-intermediate level) to B1 (intermediate level) on the Common European Framework of Reference for Languages (Council of Europe, 2001) proficiency scale. It is important to note that the MOE launched in 2021 the
Of the 198 students, 130 completed two integrated writing tasks, and the remaining 68 completed only one integrated writing task. Although we collected 328 essays in total, 60 of these were excluded from the data analysis, since these essays (1) did not include any written responses, meaning that there were no texts to analyze; (2) only expressed the students’ inability to compose written responses (e.g.,
Listening-to-write tasks
Language tasks that require both listening and writing skills are appropriate and useful for adolescent L2 learners. Research has shown that younger language learners tend to comprehend source materials through audio input better than through visual input (Miller & Smith, 1989; Price et al., 2016; Prior & Welling, 2001). Moreover, performing a writing task (e.g., writing a reflection or summary) based on aurally acquired information (e.g., from a lecture or discussion) is a typical task that students encounter in secondary and higher education. Furthermore, integration of listening and writing skills is often seen in standardized English proficiency tests, such as the TOEFL. We contend that compared with other types of integrated writing tasks, such as reading-to-write or reading–listening–writing, listening-to-write tasks were more appropriate to administer to our participants, considering their age and the prevalence of such tasks in a real-life academic context.
Two listening-to-write tasks were provided as part of an extracurricular activity, and data collection took place during the summer, when a special summer curriculum was in place. The school principal and the English teachers at the data collection site considered these tasks to be beneficial for their students, as they would usually learn English in an integrated manner whereby multiple language skills were developed in a single class. Both integrated writing tasks were adopted from a standardized test preparation book (Uehara, 2015). Each task required the students to listen to an academic lecture while taking notes and then write a paragraph describing the lecture based on their listening comprehension. The two writing topics were related to science: one concerned two types of mountains, and the other the life cycle of a hurricane. All students completed the two listening-to-write tasks on the same day, except for those who only completed one task. The 130 students who participated in the two listening-to-write tasks took a 10-minute break after one task had been completed. The order of the two tasks was randomized; 134 students completed the mountain topic first, and the rest completed the hurricane topic first. Those who completed one listening-to-write task performed the first topic only.
Before listening to each lecture, a list of key vocabulary words was provided to aid the students’ understanding of the listening material. When students received the vocabulary list, we explained the meaning of the words in the students’ L1 and taught their pronunciation. We also gave example sentences for words that were abstract in meaning. We encouraged students to ask questions while they were processing these given words. Students were also allowed to take notes before they listened to the lecture. This vocabulary support was considered necessary scaffolding for students who may not have been familiar with some vocabulary words included in the lecture, as research has shown that integrated writing tasks require a threshold level of comprehension of source materials (e.g., Cumming, 2013; Sawaki et al., 2013). To create the vocabulary list, we began by selecting words based on the curriculum guidelines developed by the Taiwan MOE for K–12 English education. For its national high school curriculum, the MOE has selected 4,500 English words for students to learn by the time they graduate from high school (Ministry of Education Republic of China, 2018). To ensure that there were no other words that might be challenging for the students, we asked English teachers to look at the listening scripts and identify words that needed glossing based on their professional judgment. We then added those selected words to the glossary. The vocabulary list included the Chinese equivalents. Since the purpose of providing the glossary was to aid their listening comprehension, the students were not instructed to use these provided words, nor were they encouraged to use the words in their writing.
The two raters (both of whom were experienced English teachers) holistically scored the written products according to the TOEFL Junior Writing Scoring Guide: Listen–Write (Educational Testing Service, 2012). The interrater reliability between the two raters was
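As an illustrative aside, the specific agreement index used for the holistic scores is not reproduced in this excerpt. One common choice for two raters assigning ordinal holistic scores is quadratic-weighted Cohen’s kappa; the sketch below shows how such a statistic could be computed, with invented score vectors standing in for the raters’ actual scores.

```python
# Hedged illustration only: quadratic-weighted Cohen's kappa for two raters'
# ordinal holistic scores. The agreement index used in the study is not shown
# in this excerpt, and these score vectors are invented examples.
from sklearn.metrics import cohen_kappa_score

rater_a = [3, 4, 2, 5, 3, 4, 1, 3]  # hypothetical holistic scores from rater A
rater_b = [3, 4, 3, 5, 2, 4, 1, 3]  # hypothetical holistic scores from rater B

kappa = cohen_kappa_score(rater_a, rater_b, weights="quadratic")
print(f"Quadratic-weighted kappa: {kappa:.2f}")
```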
Levels of writing scores.
Selection of textual features
When analyzing the textual features of integrated writing performance, Cumming et al. (2005, 2006) focused on the following areas: lexical sophistication, syntactic complexity, rhetoric, and pragmatics. Based on Cumming et al. (2005, 2006), Gebril and Plakans (2009), and Plakans, Gebril, and Bilki (2019), we chose several textual features as the focus of this study to further investigate the relationship between textual features and integrated writing performance. These chosen textual features are described in the following sections.
Lexical sophistication
We chose to measure lexical sophistication by examining the average word length (Balota et al., 2007; Cumming et al., 2005, 2006; Grant & Ginther, 2000; Kramer & McLean, 2019; Maamuujav et al., 2021; Sawaki et al., 2013; Yoon, 2017). Microsoft Word was used to calculate the average word length, in accordance with the formula that was proposed by Cumming et al. (2005), namely: “the number of characters divided by the number of words per composition” (p. 9).
Syntactic complexity
Based on Ortega (2003) and prior studies, syntactic complexity was measured by calculating the average number of T-units in each sentence as well as the mean T-unit length (Casal & Lee, 2019; Cumming et al., 2005; Henry, 1996; Jiang et al., 2019; Jin et al., 2020). Previous research has suggested that mean T-unit length is capable of differentiating writing levels among learners for a writing task based on source material (Casal & Lee, 2019), lending support for its use as a measure of syntactic complexity.
Fluency
Studies have shown that word count can be an effective approach for distinguishing writing fluency across score levels (Cumming et al., 2005, 2006; Johnson et al., 2012; Kim et al., 2018; Plakans, Gebril, & Bilki, 2019; Shi et al., 2020). As Johnson et al. (2012) suggested, using the total number of words in a given essay is a feasible approach for measuring fluency, since in an evaluative context, timing is a variable held constant for all test-takers. Also, as mentioned above, Cumming et al. (2005) and Plakans, Gebril, and Bilki (2019) found that in their integrated assessment, the fluency measure operationalized by the total word count accounted for the largest score variance, reflecting ability difference among test-takers.
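The three quantitative textual measures described in the preceding subsections (average word length, T-unit-based syntactic complexity, and total word count) reduce to simple counts once a text has been segmented. The sketch below is illustrative only: it assumes that T-unit boundaries were already marked by hand (here with a slash), and the essay text is invented.

```python
# Minimal sketch of the quantitative textual measures described above, assuming
# T-unit boundaries were marked manually with "/" during coding.
# The essay text is invented for illustration.
essay = ("Mountains form in two ways. / Some mountains rise when plates collide "
         "and the crust folds upward. / Other mountains form when magma pushes "
         "up through the surface and hardens. /")

t_units = [t.strip() for t in essay.split("/") if t.strip()]
sentences = [s for s in essay.replace("/", " ").split(".") if s.strip()]
words = [w.strip(".,;") for t in t_units for w in t.split()]

fluency = len(words)                                        # total word count
avg_word_length = sum(len(w) for w in words) / len(words)   # lexical sophistication proxy
mean_t_unit_length = len(words) / len(t_units)              # words per T-unit
t_units_per_sentence = len(t_units) / len(sentences)        # T-units per sentence

print(fluency, round(avg_word_length, 2),
      round(mean_t_unit_length, 2), round(t_units_per_sentence, 2))
```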
Grammatical accuracy
The grammatical accuracy of the students’ essays was judged based on the holistic scoring rubric used by Cumming et al. (2005), which was developed based on Hamp-Lyons and Henning (1991). The scores ranged from 1 to 3, with 3 being the highest score. The interrater reliability between the two raters was .84 with
Organization
We developed a rating scale and piloted it for revisions, drawing on previous studies that focused on evaluating organization in L2 writing (Carr, 2000; Jacob et al., 1981; Li & He, 2015; Plakans & Gebril, 2017). During piloting, anchor essays at each score level were selected and rated for organization to check the appropriateness of the rubric. We then used the rating scale to rate all the essays, and the interrater reliability between the two raters was .96 with
Rating scale of organization.
Vocabulary support
Although we did not encourage the students to use the provided vocabulary, most of the written products included the given words. Thus, we calculated the vocabulary ratio to explore patterns in the students’ use of the provided vocabulary. To calculate the vocabulary ratio, we counted the number of glossed words in each student’s essay and then divided this count by the total number of words written.
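A minimal sketch of this calculation is given below. It assumes that every occurrence of a provided word is counted (whether the study counted tokens or types is not specified in this excerpt); the glossary, the function name, and the sample essay are invented for illustration.

```python
# Sketch of the vocabulary use ratio: occurrences of provided (glossed) words
# divided by the total number of words written. Glossary and essay are invented.
glossary = {"magma", "crust", "erosion", "plate"}

def vocabulary_use_ratio(essay: str, glossed_words: set) -> float:
    tokens = [w.strip(".,!?").lower() for w in essay.split()]
    glossed_count = sum(1 for w in tokens if w in glossed_words)
    return glossed_count / len(tokens) if tokens else 0.0

sample = "The crust breaks and magma rises, so the crust folds into mountains."
print(round(vocabulary_use_ratio(sample, glossary), 3))  # 3 / 12 = 0.25
```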
Moreover, to investigate how vocabulary support affected the students’ writing performance, we focused on the accurate use of the provided words (i.e., whether the students used the provided words appropriately and meaningfully). To create a scoring rubric for vocabulary use, we first reviewed the literature relevant to the assessment of vocabulary usage. Then we piloted the scoring rubric with anchor essays at each score level and revised the rubric for appropriateness, clarity, and task relevance. The interrater reliability between the two raters was .94 with
Scoring rubric of vocabulary use in terms of accuracy.
Analysis
We employed descriptive statistics and hierarchical multiple regression (HMR) analyses (i.e., stepwise) to answer the research questions. To avoid interplay between the textual feature and vocabulary support variables, separate stepwise regression analyses were carried out, allowing us to examine the impact of textual features and vocabulary support on students’ writing performance. For the first regression model, the criterion variable was the students’ essay scores, and the predictor variables were fluency, organization, grammatical accuracy, mean number of T-units in each sentence, mean T-unit length, and lexical sophistication. For the second regression model, the criterion variable was the students’ essay scores, and the predictor variables were vocabulary use ratio and accurate use of the provided vocabulary words.
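As a hedged sketch of the general procedure (not the exact software or stepwise selection algorithm used in the study), the snippet below enters blocks of predictors in the order reported for the first model, fits nested ordinary least squares models, and inspects the change in R-squared at each step. All column names and variable values are simulated for illustration, and only the first three predictors are shown.

```python
# Hedged sketch of hierarchical (blockwise) regression on simulated data:
# predictors are entered in successive blocks and the change in R-squared is
# inspected. This is not the exact stepwise procedure used in the study.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 198  # number of participants in the study
df = pd.DataFrame({
    "fluency": rng.normal(120, 30, n),                   # simulated word counts
    "organization": rng.integers(1, 6, n).astype(float),
    "grammatical_accuracy": rng.integers(1, 4, n).astype(float),
})
df["essay_score"] = (0.02 * df["fluency"] + 0.3 * df["organization"]
                     + 0.4 * df["grammatical_accuracy"] + rng.normal(0, 0.5, n))

def hierarchical_r2(data, outcome, blocks):
    """Fit nested OLS models, adding one block of predictors per step."""
    steps, predictors = [], []
    for block in blocks:
        predictors = predictors + block
        X = sm.add_constant(data[predictors])
        fit = sm.OLS(data[outcome], X).fit()
        steps.append((block, fit.rsquared))
    return steps

previous_r2 = 0.0
for block, r2 in hierarchical_r2(df, "essay_score",
                                 [["fluency"], ["organization"], ["grammatical_accuracy"]]):
    print(block, "R2 =", round(r2, 3), "delta R2 =", round(r2 - previous_r2, 3))
    previous_r2 = r2
```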
We checked the assumptions for the HMR analyses in accordance with Thorndike and Thorndike-Christ (2009). First, there was no multicollinearity, as the tolerance scores for the variables of interest were all above .1 (organization = .28; fluency = .28; grammatical accuracy = .68; mean number of T-units in each sentence = .92; mean T-unit length = .80; lexical sophistication = .78; vocabulary use ratio = .96; vocabulary use accuracy = .96). Second, the histogram and normal probability plots showed that the residuals were normally distributed. Third, no outlier was identified based on the results of Cook’s distance (i.e., <1). Fourth, homogeneity of variance was checked by reviewing the scatterplots of standardized predicted values by standardized residuals.
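These diagnostics can be reproduced in outline as follows. The sketch reuses the simulated data frame and predictors from the previous snippet and is meant only to illustrate how tolerance (1/VIF) and Cook’s distance might be computed, not the exact procedure followed in the study.

```python
# Sketch of the regression diagnostics mentioned above, reusing the simulated
# data frame `df` from the previous snippet. Tolerance is computed as 1 / VIF,
# and Cook's distance flags unduly influential cases.
from statsmodels.stats.outliers_influence import variance_inflation_factor
import statsmodels.api as sm

predictors = ["fluency", "organization", "grammatical_accuracy"]
X = sm.add_constant(df[predictors])
fit = sm.OLS(df["essay_score"], X).fit()

# Tolerance = 1 / VIF; values above .1 are commonly read as acceptable.
for i, name in enumerate(predictors, start=1):  # column 0 is the constant
    tolerance = 1.0 / variance_inflation_factor(X.values, i)
    print(f"{name}: tolerance = {tolerance:.2f}")

# Cook's distance; values below 1 suggest no single essay dominates the fit.
cooks_d, _ = fit.get_influence().cooks_distance
print("max Cook's distance =", round(float(cooks_d.max()), 3))

# Residual normality and homoscedasticity would typically be inspected with a
# histogram or Q-Q plot and a plot of standardized residuals against predicted values.
```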
Results
Descriptive statistics of textual features and vocabulary support by writing score level
Table 5 presents the descriptive statistics for textual features and vocabulary support across different score level groups. In terms of textual features, a general pattern revealed that some textual features (including fluency, organization, grammatical accuracy, and mean T-unit length) increased as the essay scores rose. However, other features (i.e., mean number of T-units in each sentence and lexical sophistication) did not follow this linear pattern. Regarding vocabulary support, vocabulary use accuracy increased as the essay scores increased, while vocabulary use ratio showed an opposite pattern.
Table 5. Descriptive statistics for textual features and vocabulary support across three score levels.
The predictive power of textual features and vocabulary support for students’ writing performance in integrated listening-to-write assessments
In the first regression model, fluency was entered in the first step, followed by organization, grammatical accuracy, mean number of T-units in each sentence, mean T-unit length, and lexical sophistication. The second regression model included accurate use of provided words in the first step, followed by vocabulary use ratio.
As Table 6 demonstrates, the textual features (i.e., fluency, organization, and grammatical accuracy) accounted for 72% of the variance in essay scores, with sole contributions from fluency (Model 1, Δ
Table 6. Stepwise regression for textual features predicting essay scores.
Table 7 shows that the vocabulary use of scaffolding materials accounted for 45% of the essay score variance, with the major contribution stemming from vocabulary use accuracy (Model 1, Δ
Table 7. Stepwise regressions for vocabulary support predicting essay scores.
In summary, the results of the HMR analyses indicate that written discourse features had varying degrees of impact on the essay score variance. The first regression model revealed that fluency was the most predictive variable of the essay scores, followed by organization and grammatical accuracy. Regarding the influence of vocabulary scaffolding materials, the second regression model indicated that vocabulary use accuracy had more predictive power than vocabulary use ratio for the essay scores.
Discussion and implications
The results of this study address the construct validity of integrated listening-to-write assessments by investigating the relationship between written discourse features, vocabulary support, and the resulting scores. Validation research involves collecting five different types of empirical evidence to support proper interpretations and uses of test scores, one of which is examining the internal structure of test-taker responses (Chapelle, 2020). This research is critical for supporting proper score interpretations and uses that warrant the implementation of an innovative integrated writing task for adolescent EFL learners. According to the HMR analyses, several textual features significantly predicted students’ writing performance in the integrated listening-to-write assessments. That is, the regression model showed that fluency (i.e., text length), organization, and grammatical accuracy were significant predictors of the current study’s integrated listening-to-write scores. Successful performance in listening-to-write assessments depends on students’ ability to (1) comprehend the lecture content by identifying key ideas and supporting details and (2) construct a piece of writing with minimal grammatical errors and proper organization. When students do these successfully, the resulting text will naturally be longer.
The results of our study coincide with those of studies focusing on adolescent L2 learners’ independent writing tasks. Uccelli et al. (2013) and Wolf et al. (2018) indicated that text length, organization, and grammatical accuracy were significant predictors of independent writing scores. Among these significant predictors, text length was the major contributor to determining essay quality, as our study also demonstrated. Our findings also correspond with Cumming et al. (2006), whose investigation of discourse features included a listening-to-write task for adult L2 learners and who found that higher scoring writers tended to write longer compositions and clauses with greater lexical variation and grammatical accuracy. This coincides with another finding from Cumming et al.’s (2005) study, which suggested that lower scoring students tend to write shorter noun phrases and compositions and repeatedly use words provided from source texts with a lack of grammatical accuracy.
Looking more broadly, fluency, organization, and grammatical accuracy have been found to be important for successful performance in adult L2 learners’ integrated writing assessment (e.g., Cumming et al., 2005, 2006; Gebril & Plakans, 2013; Plakans, 2009a; Plakans & Gebril, 2017; Plakans, Gebril, & Bilki, 2019) regardless of the type of integration, including reading–listening–writing and reading-to-write. These studies emphasized that higher scoring writers are more likely to produce longer texts with appropriate organization and grammatical accuracy than their lower scoring counterparts. The current study confirms that listening-to-write tasks for adolescent L2 learners are no exception in assessing these three important writing features as target constructs. While bearing in mind that adolescent L2 learners differ in many ways from adult learners, the findings of this study coincide with previous research focusing on adult L2 learners. That is, adolescent learners must strive to develop skills to transfer their own understanding of the source material into a coherent piece of writing. It is critical for them to practice composing sentences, organizing them based on idea units, and then identifying and correcting language errors during the course of writing.
The present study also investigated how the use of provided vocabulary words predicted integrated listening-to-write performance. The regression model demonstrated that students who used the provided words accurately and meaningfully in their written products tended to receive higher listening-to-write scores than those who frequently copied the words and used them randomly in their paragraph. The lower scoring students in this study used the provided words in their writing more often than did their higher scoring counterparts (see Table 5 for the vocabulary ratio). On the other hand, higher scoring students exhibited incidental use of the provided words in an attempt to express their understanding of the lecture content in their own words. These results correspond with the findings of Kyle (2020) and Weigle and Parker (2012) in that, compared with higher scoring writers, their lower scoring counterparts mainly copied words, phrases, or content from the source texts when performing integrated writing tasks. In the present study, the vocabulary support was provided to aid listening comprehension, but this scaffolding did not necessarily improve students’ performance in the listening-to-write tasks, nor did it disproportionately benefit a particular group of students. Those who chose not to use the provided words in their essays were not systematically penalized. Thus, we argue that vocabulary support does not jeopardize the interpretation of listening-to-write scores.
All in all, vocabulary support should be considered a useful scaffolding tool when administering integrated listening-to-write assessments to adolescent learners. Research in the field of L2 listening has shown that pre-listening activities (e.g., providing key vocabulary) can bolster L2 learners’ vocabulary knowledge, which is required for listening comprehension (Babaei & Izadpanah, 2019; Chung, 2002; Jafari & Hashim, 2012; Madani & Kheirzadeh, 2022). Yet in our study, we did not have empirical data to claim that vocabulary support actually helped students’ listening comprehension, except for positive anecdotal comments that we received from the English teachers and participating students. Clearly, more research is needed to investigate the efficacy of vocabulary support in integrated writing assessments that involve listening skills, for example, through designs that include comprehension questions before the writing stage.
Conclusion
While the findings of this study make an important contribution to the literature on L2 integrated writing assessment for adolescent learners, the study has limitations that should be noted. First, as the content of the lectures was related to science alone, including lectures from more diverse fields (e.g., art, history, or politics) may minimize the topic effect. Second, in accordance with previous research (e.g., Cumming et al., 2005; Gebril & Plakans, 2009; Sawaki et al., 2013; Shi et al., 2020), we chose measures of textual features to understand how these features are related to integrated listening-to-write performance; however, using other measures of discourse characteristics may provide different findings. For example, studies have suggested the use of corpus data, such as average reference corpus word range (i.e., the frequency of a word in reference to a corpus of texts), the average reference corpus bigram and trigram (i.e., combinations of two and three words, such as
Third, while the multicollinearity of the predictor variables was checked, it is possible that the vocabulary ratio may have been influenced by the number of words the students wrote in their essays (i.e., the vocabulary ratio decreases when students write lengthy essays using their own vocabulary bank). Fourth, this study did not have a control group, so we could not conduct group comparisons. This limits our understanding of the impact of vocabulary support on students’ integrated writing performance. In future studies, researchers should consider dividing students into experimental and control groups to examine whether vocabulary support affects their understanding of the listening materials as well as their integrated writing performance. Fifth, since the integrated listening-to-write tasks in this study were given to the students as extracurricular activities, which were not linked to their school performance, their motivation levels might have varied. Consequently, some students might not have made the same effort as they would have if these tasks were given as regular classroom assessments. Another limitation is that since the participants in this study were all Taiwanese EFL learners, it may be difficult to generalize the findings to EFL learners in other parts of the world. Indeed, our findings may have varied had our study included English learners from diverse cultural and language backgrounds. Last, it should be noted that the researchers had only limited information about the students’ overall English proficiency prior to the study; as such, we have a limited understanding of how the students’ English proficiency levels may have interacted with the current integrated writing tasks.
Overall, our findings demonstrate how textual features and vocabulary support predict adolescent English learners’ integrated writing performance. The present study sheds light on the L2 writing performance of an under-researched population of learners. To successfully perform a listening-to-write assessment, adolescent learners must write a text of appropriate length, organize their understanding of the listening materials appropriately, and make as few meaning-interfering grammatical errors as possible. We found that vocabulary support did not pose a threat to the validity of integrated listening-to-write tasks. Thus, practitioners, such as teachers and test developers, can consider providing adolescent learners with vocabulary support as a pre-listening activity, given the importance of vocabulary knowledge in listening comprehension.
Footnotes
Acknowledgements
Our special thanks go to Warren Merkel for his professional feedback. We would also like to thank the four anonymous reviewers for their assistance in revisions of this article.
Author contribution(s)
Declaration of conflicting interests
The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: This study was conducted when Dr. Ohta was affiliated with the University of Iowa.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
