Abstract
Most research into second language (L2) writing has focused on the products of writing tasks; much less empirical work has examined the behaviours in which L2 writers engage and the cognitive processes that underlie writing behaviours. We aimed to fill this gap by investigating the extent to which writing speed fluency, pausing, eye-gaze behaviours and the cognitive processes associated with pausing may vary across independent and integrated tasks throughout the whole, and at five different stages, of the writing process. Sixty L2 writers performed two independent and two integrated TOEFL iBT writing tasks counterbalanced across participants. While participants wrote, we logged their keystrokes and captured their eye-movements. Participants then took part in a stimulated recall interview based on the last task they had completed. Mixed effects regressions and qualitative analyses revealed that, apart from source use on the integrated task, L2 writers engaged in similar writing behaviours and cognitive processes during the independent and integrated tasks. The integrated task, however, elicited more dynamic and varied behaviours and cognitive processes across writing stages. Adopting a mixed-methods approach enabled us to gain more complete and specific insights than a single method would have provided.
Keywords
I Introduction
Most existing research into second language (L2) writing performance, development and assessment has focused on the product of writing (see Cumming, 2016; Polio and Lee, 2017), with researchers relying on increasingly sophisticated analytical tools to study text quality, including corpus-based, natural language processing techniques (e.g. Alexopoulou et al., 2017). Less is known about the writing processes in which L2 writers engage (e.g. Roca de Larios et al., 1999; Spelman Miller, 2000; Stevenson et al., 2006), although there is a growing body of research on L2 writing behaviours (e.g. fluency, pausing) and associated cognitive processes using increasingly advanced research methods (Révész and Michel, 2019). We conducted this study to contribute to and expand this work. Through the triangulation of keystroke logging, eye-tracking, and stimulated recall, we investigated the extent to which writing behaviours (speed fluency, pausing, and eye-movements) and cognitive writing processes associated with pausing (inferred from stimulated recall protocols) may vary across independent and integrated tasks throughout the whole, and at different stages, of the writing process. Few studies have looked into the effects of task type on L2 writing behaviours (Barkaoui, 2015) and cognitive processes (Plakans, 2008), and none have employed a mixed-methods approach combining keystroke logging and verbal protocols with eye-tracking.
II Background
1 Theoretical background
We have adopted Kellogg’s (1996) model of writing as a theoretical framework. Like most existing theoretical accounts of writing (e.g. Flower and Hayes, 1980; Galbraith, 2009; Hayes, 2012; Scardamalia and Bereiter, 1987), this model was developed to explain first language writing. Compared to other writing models, however, Kellogg’s framework puts greater emphasis on linguistic encoding processes. This makes it particularly suitable for studying L2 writing, given that encoding ideas into written form tends to require more conscious attention and effort in one’s second language than in one’s first (Kormos, 2012; Roca de Larios et al., 1999).
Kellogg’s (1996) model distinguishes three main processes: formulation, execution, and monitoring. Formulation involves higher-order processes such as planning content, retrieving ideas from the task input and/or from long-term memory, and organizing ideas into a coherent plan. In addition, formulation entails the lower-order translation processes of lexical retrieval, syntactic encoding, and creating cohesion, which translate the writer’s plan into language form. At the execution stage, the writer uses motor movements to hand-write or type their text. During monitoring, writers ensure that the written text maps onto their intended plan and, if needed, revision is triggered. Importantly, the processes of formulation, execution, and monitoring are assumed to take place concurrently and cyclically until the text expresses the writer’s plan.
2 Researching L2 writing behaviours and cognitive processes
To test this and other models of writing, researchers of L2 writing processes have employed verbal protocols such as the think-aloud and stimulated-recall procedures (e.g. Roca de Larios et al., 2008). Although these techniques have generated useful information, the validity of the think-aloud procedure has been questioned on the grounds of causing reactivity (i.e. altering the writing process), and both verbalization techniques have been argued to raise concerns about veridicality (i.e. they may not capture writers’ thoughts in full). To address these limitations, scholars increasingly rely on keystroke logging, alone or together with verbal protocols, to study L2 writing processes (Barkaoui, 2019; Révész et al., 2017a; Spelman Miller, 2000; Stevenson et al., 2006; van Waes and Leijten, 2015). However, even when combined with verbal reports, keystroke logging does not give insight into what writers look at when they compose. This shortcoming may be addressed through triangulating keystroke logging and verbal report data with eye-gaze recordings. As argued below, adding information about, for example, reading during writing, will provide a more complete picture of the processes underlying writing (Révész and Michel, 2019).
Only a few L2 studies have employed eye-gaze measurements to tap into looking behaviours during writing, with all of them triangulating eye-gaze recordings with other techniques (Chukharev-Hudilainen et al., 2019; Gánem-Gutiérrez and Gilmore, 2018; Révész et al., 2017b, 2019). The quantitative measurement of real-time written production and/or eye-gaze recordings, especially when combined with the qualitative examination of thought processes, should provide a fuller and more specific description of the behaviours and cognitive processes of L2 writers.
To date, only a few L2 writing studies have adopted a mixed-methods approach. For example, Révész et al. (2017a) utilized keystroke logging and stimulated recall to examine the speed fluency, pausing, and revision behaviours of L2 writers and associated cognitive processes during an argumentative task with or without content support. Triangulating data from the two methodologies, the researchers concluded that content support likely decreased the pressure on planning processes (e.g. inducing fewer between-sentence pauses and more below-clause level revisions), thereby freeing up attentional resources for linguistic encoding (i.e. more translation-related pauses and revisions). Khuder and Harwood (2015) combined keystroke logging, stimulated recall, and screen recordings to compare L2 writers performing tasks under a test versus no test condition. Findings revealed more translation and surface revision processes under the test condition, but higher proportions of meaning-focused revisions and evaluation in the non-test situation, with differences being more pronounced at the last writing stage.
Recently, researchers have also begun to include eye-tracking methodology in mixed-methods studies when investigating L2 writing processes. For instance, Gánem-Gutiérrez and Gilmore (2018) complemented digital screen capture data with eye tracking, video recording, and stimulated recall when studying Japanese L2 English writers. Qualitative analyses, which considered the number and frequency of writing activities (e.g. rereading, use of external sources), revealed that most of the writing time was dedicated to text construction, while other activities took up comparatively little time. Analyses across writing stages additionally found that, as participants progressed with the task, they spent gradually less time on text construction and increasingly more time rereading their work. Révész et al. (2017b, 2019) combined eye-tracking with stimulated recall and keystroke logging to examine writing processes of Chinese L2 users of English completing an argumentative essay. As in the present study, one aim was to investigate pausing behaviours and associated cognitive processes. The researchers obtained measures of pause frequency and length, classified according to location (within words, between words, or between sentences). Of interest was whether eye-gazes remained at the inscription point or moved back within the word/phrase, clause, sentence, or paragraph preceding the inscription point. As hypothesized, when participants paused between sentences, pauses were longer, they looked back at longer stretches of text, and they engaged in higher-order writing activities. Pauses within and between words were shorter, induced shorter lookbacks, and involved lower-order writing processes.
Overall, findings of these mixed-methods studies mirror trends in previous writing research, where longer pauses were associated with higher textual units (between clauses and sentences), and shorter pauses were linked to lower textual units (within and between words) (e.g. Deane and Zhang, 2015; Spelman Miller, 2006; Spelman Miller et al., 2008; Xu and Qi, 2017). However, the combination of data sources provided a fuller picture of writing behaviours and a more complete understanding of the underlying processes. Inspired by this earlier work and methodological advances in L2 writing-process research, the present study also adopted a mixed-methods approach employing keystroke logging, eye-tracking, and stimulated recall. Specifically, we investigated L2 writing processes across independent versus integrated tasks, hoping that the triangulation of methods would afford deeper insights into writing processes across these task types.
3 The role of task type: Independent versus integrated tasks
It is well documented in L2 writing research that the type of task in which learners engage has an impact on the writing product (e.g. Alexopoulou et al., 2017; Lu, 2011). Given this understanding, researchers have shown a keen interest in exploring how writing performances might be affected by the distinction between independent and integrated tasks, the latter task type being employed increasingly in language assessment contexts in an attempt to increase authenticity. Independent tasks typically ask writers to address a prompt or respond to a question relying on their own resources. Integrated tasks ‘require learners or test takers to incorporate substantive content from source materials’ (Cumming, 2013: 1), thus writers need to synthesize information from, for example, a listening and/or reading input and summarize it into a coherent text. While in the assessment literature there is a growing body of research comparing L2 performances on independent versus integrated writing tasks, most existing work has examined the extent to which the product of writing or text quality may vary across the task types (e.g. Biber and Gray, 2013; David, 2015). To the best of our knowledge, only three studies have looked into writing processes and behaviours on integrated vs. independent tasks, all examining the TOEFL iBT test.
Based on Kellogg’s model (1996), one might expect integrated tasks to engage writers in planning and translation processes to a lesser degree, because writers can rely on the sources for help with content and language. This, in turn, might elicit greater fluency and more time for monitoring. Previous studies largely reflect these predictions. Using a think-aloud procedure, Plakans (2008) found that in the TOEFL iBT integrated task, students were more likely to reread the prompt, engage in thinking to interpret the task, and do during-writing planning. In contrast, the independent task elicited more initial but less during-writing planning and more frequent rereading. Unlike Plakans, Barkaoui (2015) employed stimulated recalls, and revealed that, during the independent task, participants planned more, experienced greater difficulty with generating content and language, and revised language more frequently. In a keystroke-logging study, Barkaoui (2019), like the present research, examined the impact of task type on pausing behaviours. When working on the integrated task, participants paused longer on average, probably because they went back to the reading while composing. They also paused longer between paragraphs on the independent task, suggesting more time was required for planning content. Finally, writers produced more revision pauses on the independent task, maybe because they could not extract language from a provided text. Through the use of a mixed-methods approach including eye-tracking, we aimed to substantiate and expand this research.
4 Stage of writing
Largely motivated by the work of Rijlaarsdam and Van den Bergh (1996), a growing number of studies have examined the temporal distribution of writing activities, demonstrating that the behaviours and the cognitive processes in which writers engage differ during the composing process. Previous work, mostly focusing on independent, argumentative writing tasks, suggests that L2 writers tend to plan more initially, while formulation activities are more frequent in the middle phases (e.g. Barkaoui, 2015; Roca de Larios et al., 2008; Tillema, 2012; Van Weijen, 2009). Less uniform patterns were observed for revision and rereading. Some studies found increased revision over time (Barkaoui, 2015; Roca de Larios et al., 2008), whereas others reported stable amounts of revision across stages (Gánem-Gutiérrez and Gilmore, 2018; Tillema, 2012). For rereading, Tillema (2012) observed similar amounts throughout the writing process, but in Gánem-Gutiérrez and Gilmore (2018) there was a decrease during the task. In addition, Roca de Larios et al. (2008) found that more proficient writers showed a greater variety of activities and were more versatile in responding to the different demands of the evolving text, whereas lower level writers exhibited similar processes and behaviour across stages.
Only a few studies have considered the distribution of writing activities in integrated writing tasks. In comparing independent and integrated tasks, Barkaoui’s previously discussed (2015) study found that the independent task initially elicited reflection, followed by planning and text generation, while the final stage was characterized by evaluation and revision. On the integrated task, participants interacted with the sources in the first stage, but the other activities were largely parallel to those observed for the independent task. Leijten et al. (2019) revealed similar trends for source-based writing: writers spent most of the first interval consulting the various source texts, followed by an intensive writing period involving only short switches to the sources. In the final stages, hardly any sources were consulted; high achievers engaged in revising their texts. Finally, Barkaoui (2019) observed longer pauses in the first third of the writing process compared to the second and last stage irrespective of task type. However, the average frequency and length of pausing differed across stages and task type. For the first interval, there were more but shorter pauses in the independent than the integrated task. Also, pause frequency was almost equal across the three stages on the independent task, but pauses were three times as frequent in the second and third stages as in the first on the integrated task. Barkaoui attributed this difference to the participants’ rereading of the source text during the initial stage of the integrated task and look-backs to the source at later stages. One aim of this study was to gain direct evidence about eye-gaze behaviours during writing to reach firmer conclusions about processes such as rereading.
5 Study
Through a mixed-methods approach employing keystroke logging, eye-tracking, and verbal protocols, we pursued the following research questions:
1. To what extent does task type influence the behaviours of L2 writers a. during the whole writing process? b. at various stages of the writing process?
2. To what extent does task type influence the cognitive processes underlying writing behaviours a. during the whole writing process? b. at various stages of the writing process?
Task type was operationalized as differences between independent and integrated writing tasks. Writing behaviours were measured through speed fluency and pausing indices obtained from keystroke logging and through indices derived from eye-gaze recordings. Underlying cognitive processes were investigated by eliciting stimulated recall comments on participants’ composing processes.
III Methodology
1 Participants
Participants were 60 L2 users, 20 students each at levels B1, B2, and C1 of the Common European Framework of Reference (CEFR). They were all Chinese L2 users of English, studying at the University of London. We recruited an initial pool of 103 participants. Of these, 84 students were invited to participate based on their performance on the research form of the TOEFL iBT listening and reading tests and a typing test. From among these students, 24 were excluded due to technical issues or because they failed to complete all tasks. The final cohort were mostly females (n = 55), aged 18 to 36 (M = 23.76, SD = 3.22). The majority were studying for an MA (n = 55), two were working towards a BA and three towards a doctorate.
2 Instruments and procedures
a Typing test
We controlled for keyboarding skills using the software, Typing Test Pro (Barkaoui, 2014). Per proficiency level, all individual scores of net typing speed (i.e. words per minute adjusted for accuracy) were within 2 SDs from the mean per group (B1: M = 25.32, SD = 7.52; B2: M = 26.60, SD = 11.71; C1: M = 36.55, SD = 10.90).
b Writing tasks
Participants completed two research versions of the TOEFL iBT independent and integrated writing tasks to control for prompt effects, resulting in 240 performances altogether. The order of the four tasks was counterbalanced across participants. The independent tasks asked participants to write an argumentative essay in 30 minutes. For the integrated task, participants first read a passage, then listened to a lecture on the same topic. While reading and listening, they could take notes on paper. Next, their task was to summarize the points made in the lecture, explaining how the named aspects in the lecture cast doubt on the points that were put forward in the reading passage. The reading text and any notes taken were available in the 20 minutes participants had for writing their summaries.
The tasks were administered in the TOEFL iBT research platform, without additional planning time. The actual writing, however, was completed in a Microsoft (MS) Word document, as the Inputlog software logged data in MS Word. The MS Word document was opened on top of the TOEFL iBT environment and set up in such a way that the font type, font size, spacing and editing tools mimicked the original TOEFL iBT writing window (for an example of the set-up of the integrated task, see Figure 1; in the independent task, the reading passage area was blank). While participants wrote, an EyeLink 1000 eye-tracker with a temporal resolution of 1,000 Hz recorded their eye movements.

Figure 1. Screen set-up for integrated task.
c Stimulated recall
Immediately after completing the writing tasks, all participants took part in a stimulated recall session designed to elicit their thoughts during writing. Recall was based on the last writing task they had performed; that is, we collected these data for 30 independent and 30 integrated performances. To prompt recall, we used the recordings of participants’ keystrokes and eye-movements while writing. We explained to participants how to interpret eye-gaze recording data in everyday language, for example, that the red circles (eye-fixations) and lines (saccades) in the eye-gaze recordings indicated their eye movements, and that larger circles represented longer eye-fixations. Participants also listened to a sample stimulated-recall performance based on a different writing task. Participants could stop the recording at any point when they wanted to share their during-writing thoughts. Additionally, the researcher elicited their thoughts when they paused, revised, or produced unexpected or interesting eye-movements (e.g. longer fixations, regressions) but did not produce a comment. We stressed that participants should report only what they had been thinking during task completion. Participants could use their native language, given that the third author, a Mandarin speaker, conducted all stimulated recall sessions. We video-recorded all sessions to capture participants’ verbal comments as well as gestures (e.g. pointing to the screen). The sessions lasted approximately 60–90 minutes.
d Data collection
First, participants took part in a group session in a computer lab. After providing informed consent, they completed a background questionnaire (10 minutes), the listening (60–90 minutes) and reading (60–80 minutes) components of the TOEFL iBT test, and a typing test (10 minutes). Those who obtained appropriate proficiency and typing scores attended two individual sessions. In the first session, they completed the first two tasks (60–70 minutes); in the second session, they performed the remaining writing tasks (60–70 minutes), followed by the stimulated recall session (60–90 minutes).
Before participants began a writing task, we launched the Inputlog software and the eye-tracker. Participants sat about 600 mm away from the centre of the screen. Once calibration on a 9-point grid was successful, we started the SR Research Screen Recorder software and opened the appropriate research version of the TOEFL iBT writing task. Participants took short breaks between the writing tasks and prior to the stimulated recall interview. To ensure ecological validity, we used the remote set-up of the eye-tracker, which allowed participants to move their head freely during writing. We recalibrated the eye-tracker for each participant between writing tasks, but did not recalibrate during a task. We monitored participants’ eye-movements throughout the writing session on the researcher’s screen and adjusted the seating of the participant if tracking was lost. To account for track loss, we measured blink duration and number for each participant and calculated the mean percentage of blink duration (see Table S1, online supplemental material). There were fewer but on average longer blinks on the integrated than the independent task. Percentage of track loss was around 30% for both tasks, a lower rate than reported in earlier work (e.g. Gánem-Gutiérrez and Gilmore, 2018).
3 Data analysis
a Writing behaviours
We obtained speed fluency and pausing measures from Inputlog 7 (Leijten and van Waes, 2013). We used a pause threshold of 200 ms when calculating the fluency and pausing indices, as this low threshold allowed us to capture lower-level writing processes (van Waes and Leijten, 2015). Speed fluency was expressed with two measures: characters per P-burst (i.e. number of characters produced between pauses) and mean duration of character production (i.e. total writing time excluding pauses divided by number of characters produced). We classified the pause frequency and length measures by location (within words, between words, or between sentences). Between-word pauses were counted as a single pause, with the pause before and the pause after pressing the spacebar added together.
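The two speed fluency measures can be illustrated with a minimal sketch. It is not Inputlog's implementation: the function name, the input format (one timestamp per typed character), and the choice to treat an interval of exactly 200 ms as a pause are assumptions made for illustration only.

```python
from statistics import mean

PAUSE_THRESHOLD_MS = 200  # pause threshold reported in the text


def fluency_measures(key_times_ms, threshold_ms=PAUSE_THRESHOLD_MS):
    """Compute characters per P-burst and active writing time per character.

    key_times_ms: ascending timestamps in ms, one per typed character.
    Assumption: every keystroke counts as one produced character, and an
    inter-key interval at or above the threshold counts as a pause.
    """
    if len(key_times_ms) < 2:
        raise ValueError("need at least two keystrokes")

    intervals = [b - a for a, b in zip(key_times_ms, key_times_ms[1:])]
    pauses = [iv for iv in intervals if iv >= threshold_ms]

    # A pause closes the current P-burst and opens a new one.
    burst_sizes = [1]
    for iv in intervals:
        if iv >= threshold_ms:
            burst_sizes.append(1)
        else:
            burst_sizes[-1] += 1

    total_time = key_times_ms[-1] - key_times_ms[0]
    active_time = total_time - sum(pauses)  # writing time excluding pauses
    return {
        "pause_count": len(pauses),
        "chars_per_p_burst": mean(burst_sizes),
        "active_ms_per_char": active_time / len(key_times_ms),
    }
```

For example, six keystrokes at 0, 100, 200, 700, 800 and 1,500 ms contain two pauses (500 ms and 700 ms) and therefore three P-bursts of three, two and one characters.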
Eye-gaze data were analysed with the SR Research Data Viewer software. Given that the TOEFL iBT research environment leaves little white space around words and lines, we used relatively coarse eye-gaze measures to gauge viewing behaviours within the writing window as a whole. That is, for this study, the box allocated for writing was defined as the area of interest (AOI). For this AOI, we calculated the following indices (Brunfaut and McCray, 2015): fixation count; total fixation duration; mean fixation duration; number of forward and of backward saccades; median length of forward and of backward saccades (in degrees of visual angle); and proportion of regressive movements (i.e. number of backward saccades divided by the total number of saccades). Forward and backward saccades were defined as eye-movements that had a positive (forward) and negative (backward) angle between the horizontal plane and the direction of the current saccade, respectively. For total fixation duration and the numbers of fixations, forward saccades, and backward saccades, we corrected for time on task by dividing each measure by the time needed for task completion.
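The saccade-based indices can be sketched as follows. This is an illustrative reconstruction rather than the Data Viewer computation: the function name and input format are hypothetical, task time is assumed to be expressed in seconds, and saccades with an angle of exactly zero are assumed to be left unclassified and excluded from the total.

```python
from statistics import median


def saccade_indices(saccades, task_seconds):
    """Summarise saccades made within the writing-window AOI.

    saccades: iterable of (angle_deg, amplitude_deg) pairs, where the sign
    of the angle separates forward (positive) from backward (negative)
    movements, following the definition in the text.
    task_seconds: time needed for task completion (unit is an assumption).
    """
    saccades = list(saccades)
    forward = [amp for ang, amp in saccades if ang > 0]
    backward = [amp for ang, amp in saccades if ang < 0]
    total = len(forward) + len(backward)
    if total == 0 or task_seconds <= 0:
        raise ValueError("need classified saccades and a positive task time")

    return {
        # Counts corrected for time on task
        "forward_per_second": len(forward) / task_seconds,
        "backward_per_second": len(backward) / task_seconds,
        # Median saccade length in degrees of visual angle
        "median_forward_length": median(forward) if forward else None,
        "median_backward_length": median(backward) if backward else None,
        # Backward saccades divided by all classified saccades
        "proportion_regressive": len(backward) / total,
    }
```

Medians rather than means are used for saccade length, as in the text, which limits the influence of a few very long movements such as return sweeps.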
b Cognitive processes
The analysis of the stimulated recall comments involved five steps. First, the data were transcribed. Second, the third author reviewed the comments related to pausing and identified emergent categories. Third, the resulting micro-categories were merged into more general categories following Kellogg (1996) (see Table 1). Fourth, the third author coded all the comments. Another Mandarin speaker of L2 English with an L2 research background coded 20 percent of the data, yielding a high inter-coder reliability (Cohen’s kappa = .91). Finally, comments were added up resulting in a frequency count per participant by category.
Table 1. Examples of stimulated recall comments by coding category.
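For readers unfamiliar with the inter-coder reliability statistic reported above, the following minimal sketch shows how Cohen's kappa is computed from two coders' category assignments. The category labels in the usage example are hypothetical stand-ins, not the study's coding scheme or data.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders labelling the same comments."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set")
    n = len(coder_a)

    # Observed agreement: share of comments given identical codes.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

    # Chance agreement: product of the coders' marginal proportions,
    # summed over categories (Counter returns 0 for unused categories).
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2

    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical labels for four comments
coder_1 = ["planning", "translation", "monitoring", "planning"]
coder_2 = ["planning", "translation", "planning", "planning"]
kappa = cohens_kappa(coder_1, coder_2)
```

Because kappa corrects raw agreement for agreement expected by chance, it is lower than the simple proportion of matching codes whenever the coders disagree at all.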
c Stages of writing
All analyses were conducted for the overall writing process. Following earlier research (e.g. Tillema et al., 2011), we also divided the total time participants spent writing each task into five equal intervals and calculated all indices (keystroke logging, eye-gaze, and stimulated recall) for these intervals. This allowed us to capture potential changes in processes as a function of writing stage within participants and to compare writing processes by stage across participants and tasks.
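The division into five equal intervals can be sketched as below. The function names are hypothetical, and the handling of boundary cases (an event exactly on a boundary belongs to the later stage; an event at the end time belongs to the last stage) is an assumption, since the text does not specify it.

```python
def stage_of(event_ms, start_ms, end_ms, n_stages=5):
    """Return the 1-based writing stage in which an event falls."""
    if end_ms <= start_ms:
        raise ValueError("end time must follow start time")
    if not start_ms <= event_ms <= end_ms:
        raise ValueError("event lies outside the writing session")
    width = (end_ms - start_ms) / n_stages
    # An event at the very end of the session is assigned to the last stage.
    return min(int((event_ms - start_ms) / width) + 1, n_stages)


def counts_by_stage(event_times_ms, start_ms, end_ms, n_stages=5):
    """Count events (e.g. pauses or fixations) in each writing stage."""
    counts = [0] * n_stages
    for t in event_times_ms:
        counts[stage_of(t, start_ms, end_ms, n_stages) - 1] += 1
    return counts
```

Because the intervals are defined relative to each participant's own total writing time, stages are comparable across writers who took different amounts of time.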
4 Hypotheses
Based on Kellogg’s (1996) framework, Rijlaarsdam and Van den Bergh’s (1996) temporal model of writing and earlier empirical research, we expected that the independent and integrated writing tasks would yield different writing behaviours, both when considered as a whole and when looking at writing stages.
For task type, we hypothesized that the availability of oral and written sources would ease planning and translation processes during the integrated task, resulting in greater writing fluency (i.e. characters per P-burst and mean duration of character production) and fewer and shorter pauses (particularly between higher textual units). We also anticipated fewer and shorter fixations on the integrated task, accompanied by fewer but longer forward and backward saccades, given that participants were expected to return to the source text and listening notes while writing. A higher proportion of backward saccades was expected during the integrated task, indicating more rereading (a signal of monitoring). Stimulated recall comments were hypothesized to be aligned with these predictions.
Concerning different writing stages, we hypothesized that differences due to source use would be most pronounced at the initial stages, while later stages were anticipated to yield more similar behaviours across the task types, demonstrating focused writing in the middle stages (e.g. fewer/shorter pauses, fixations and saccades) and monitoring (e.g. longer pauses, longer saccades, higher proportion of backward saccades) in the last stage.
5 Statistical analyses
According to G*Power (Faul et al., 2007), a sample size of 60 allowed us to identify medium-size relationships, given the within-subject design and number of observations. To address the research questions, we constructed linear mixed effects models using the lmer function of the lme4 package in the R statistical environment. The r.squaredGLMM function in the MuMIn package was used to compute effect sizes (R2) for fixed effects, and Cohen’s d was employed to obtain effect sizes for Bonferroni post-hoc tests. Following Plonsky and Oswald (2014), d values of .60, 1.00 and 1.40 were considered small, medium, and large, respectively. The alpha level was set at .05 for initial analyses and at .01 for any post-hoc tests. Residual plots were used to check the linearity, homoscedasticity, and normality assumptions for the models; the data met the assumptions.
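The effect size benchmarks can be made concrete with a short sketch. The text does not state which variant of Cohen's d was computed, so the pooled-standard-deviation formula below is an assumption, as are the function names and the treatment of each benchmark as a lower bound.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(x, y):
    """Cohen's d using the pooled standard deviation (assumed variant)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)


def label_d(d):
    """Label |d| against the .60 / 1.00 / 1.40 benchmarks cited in the text."""
    size = abs(d)
    if size >= 1.40:
        return "large"
    if size >= 1.00:
        return "medium"
    if size >= 0.60:
        return "small"
    return "below small benchmark"
```

With these benchmarks, a d of 0.5 falls below the threshold for a small effect, which is considerably stricter than the conventional 0.2 / 0.5 / 0.8 cut-offs.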
IV Results
1 Task type and L2 writing behaviours
Research question 1a investigated the extent to which task type influenced writing behaviours during the whole writing process. In all statistical models, a writing behaviour index served as the dependent variable, the fixed effect was task type, and participant and prompt were the random effects. We also added by-participant random slopes for task type to account for the potentially differential effects of task type on the participants.
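One way to write out a model with this structure is given below. The notation is introduced here for illustration and does not come from the original analysis: $y_{ij}$ is the writing behaviour index for participant $i$ on prompt $j$, and $\mathrm{Task}_{j}$ is assumed to be dummy-coded (0 = independent, 1 = integrated). Whether the by-participant intercepts and slopes were allowed to correlate is also an assumption.

```latex
\begin{align}
y_{ij} &= \beta_0 + \beta_1\,\mathrm{Task}_{j} + u_{0i} + u_{1i}\,\mathrm{Task}_{j} + w_{j} + \varepsilon_{ij},\\
(u_{0i}, u_{1i})^{\top} &\sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_u),\qquad
w_{j} \sim \mathcal{N}(0, \sigma_w^2),\qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\end{align}
```

Here $\beta_1$ is the fixed effect of task type, $u_{0i}$ and $w_{j}$ are the random intercepts for participant and prompt, and $u_{1i}$ is the by-participant random slope that lets the task effect differ across writers.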
The descriptive statistics for the measures of speed fluency and pausing are summarized in Table 2, and those for eye-fixations and (forward and backward) saccades in Table 3 (for complete figures, see supplemental material Table S2 to Table S5), while Table 4 provides relevant inferential statistics. Task type was found to have a significant effect on seven indices. Participants showed greater speed fluency, as measured by active writing time per characters, on the independent as compared to the integrated task, with task type accounting for 39% of the variation. The independent task also yielded significantly shorter pauses, but task type explained less than 1% of the variance. Of the eye-tracking indices, participants fixated significantly longer and more often on the writing window during the independent than the integrated task and made more forward saccades and longer forward and backward saccades when completing the independent task. However, task type explained only 2%–14% of the variance in the eye-tracking measures.
Table 2. Fluency and pausing measures by task type and pause location (n = 60).
Notes. P-burst = between two pauses. Ww = within word. Bw = between words. Bs = between sentences. Full descriptives (including data by stages) are available as online supplemental material S2 and S3.
Table 3. Eye-fixation and saccade measures by task type (n = 60).
Note. Full descriptives (including data by stages) are available as online supplemental material S4 and S5.
Table 4. Significant effects identified by the models examining the effects of task type on writing behaviours.
Notes. * Task = Task type. R2m = R2 marginal. R2c = R2 conditional. chars = characters. sac = saccade.
Research question 1b examined the extent to which task type influenced L2 writing behaviours at various writing stages. In the series of mixed effects analyses the dependent variable was a writing behaviour measure; the fixed effects were task type, stage of writing, and their interaction; and the random effects were participant and prompt. By-participant random slopes for task type and writing stage were also added to take into account participant-by-stage and participant-by-task type variation. For some dependent variables, the participant-by-stage random slope (characters per P-burst, median length of backward saccades) or the participant-by-task slope (total pause number) were removed to ensure model convergence. The predictors of interest were the interactions between task type and writing stage, with significant effects meaning that participants behaved differently in the independent and integrated tasks during a particular writing stage. The analyses yielded a significant interaction effect for 15 measures: characters per P-burst; active writing time per characters; pause length total and between sentences; pause frequency total, within words, between words and between sentences; total fixation duration; fixation count; number of backward and forward saccades; median length of forward and backward saccades; and proportion of backward saccades (see Tables S9–S13, online supplemental material).
To investigate the interaction effects, we ran another series of mixed effects analyses for the independent and integrated tasks separately. This time, writing stage was the single fixed effect in the models, and the random effects remained participant and prompt. By-participant random slopes for writing stage were also added, but these were removed for some dependent variables to achieve convergence (fixation count and number of forward saccades for both independent and integrated tasks; characters per P-burst, median pause length total, and number of backward saccades for independent task only; active writing time per characters, and pause frequency total and between words for integrated task only). As shown in Table 5, Bonferroni post-hoc tests revealed that, overall, stage of writing had a greater influence on writing behaviours during the integrated task, with the analyses yielding considerably more significant differences among writing stages for this task. Notably, although a significant overall interaction effect was identified for median pause length between sentences, median length of forward and backward saccades, and proportion of backward saccades, no significant stage effects emerged in the post-hoc analyses. The significant stage effects are visually represented in Figures 2–6.
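As a minimal sketch of the Bonferroni logic behind such post-hoc comparisons: with five stages there are ten pairwise contrasts, so each raw p-value is multiplied by ten (capped at 1) before being compared with the significance level. The raw p-values below are hypothetical placeholders, not results from this study, and treating .01 as the level applied to the adjusted values is our assumption.

```python
from itertools import combinations

def bonferroni(raw_p, alpha=0.01):
    """Return {comparison: (adjusted_p, is_significant)} for a family of tests.

    raw_p maps each comparison to its unadjusted p-value; the adjustment
    multiplies by the number of comparisons and caps the result at 1.
    """
    m = len(raw_p)
    return {pair: (min(p * m, 1.0), p * m < alpha) for pair, p in raw_p.items()}

# All pairwise stage comparisons for five writing stages (10 pairs).
pairs = list(combinations(range(1, 6), 2))

# Hypothetical raw p-values, for illustration only.
raw_p = {pair: 0.5 for pair in pairs}
raw_p[(1, 5)] = 0.0004   # survives the correction
raw_p[(2, 5)] = 0.003    # significant unadjusted, but not after correction

results = bonferroni(raw_p)
significant = [pair for pair, (_, sig) in results.items() if sig]
print(significant)
```

The second placeholder shows why the correction matters: a contrast that would pass an unadjusted .01 criterion no longer does once the ten comparisons are taken into account.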
Significant results for post-hoc Bonferroni tests examining the effects of task type and stage of writing on writing behaviours (p < .01).
Note. Full Table S6 including p- and SE-values is available as online supplemental material.

Significant interaction effects identified for fluency measures across stages.

Significant interaction effects identified for pausing totals across stages.

Significant interaction effects identified for pause frequency per location across stages.

Significant interaction effects identified for eye-tracking indices based on fixations across stages.

Significant interaction effects identified for eye-tracking indices based on saccades across stages.
Most differences on the integrated task were found between stage 1, stage 5, and the rest of the stages. During stage 5, participants produced significantly fewer characters between pauses and paused less frequently, as compared to earlier stages. Exceptions to this trend were the lower number of pauses observed during stage 1 (overall) and stage 4 (overall) than stage 5. According to the eye-gaze data, participants looked at the writing window more often and for longer during stage 5 than during earlier stages, as evidenced by higher total fixation time and greater numbers of fixations and saccades.
During stage 1, there was more pausing than at stage 5 (overall, within and between words, between sentences), but overall pause length was shorter. In addition, the eye-gaze data revealed that, at stage 1, participants spent less time viewing the writing window than at later stages, reflected in shorter total fixation durations and fewer fixations and saccades. Most effect sizes were in the small range, but a few effect sizes for the eye-gaze indices were large.
On the independent task, speed fluency, as measured by active writing time per characters, was higher during stages 2–4 than during stages 1 and 5, with a small effect size. For pause frequency, most differences were observed between stage 1 and stages 2–4. Participants paused less during stage 1 than during subsequent stages, the only exception being pause frequency within words, where more pauses were observed during stage 1 than during stage 4. The effect sizes were in the small range. Turning to eye-gaze behaviours, participants fixated on the writing window for shorter durations and less often during stage 1 than during later stages, and more saccades were observed during stage 1 than stage 4. The effect sizes were in the small to medium range.
2 Task type and cognitive processes underlying L2 writing behaviours
Research question 2a examined the extent to which task type influenced cognitive writing processes during the whole writing period, as evidenced in the stimulated recall comments prompted by pauses. The comments are summarized in Figure 7 for the independent and integrated tasks, respectively (for exact statistics, see online supplemental material Tables S7 and S8). For the independent task, almost half the comments concerned translation processes, about one-third referred to planning, and approximately a sixth described monitoring.

Stimulated recall comments across stages for the integrated and independent task.
Similarly, on the integrated task, participants referred to translation processes in about half the comments (including resource use), and reported monitoring a sixth of the time. However, they described spending only a fifth of the time planning (including resource use). Overall, about 30 percent of the comments described resource use. Interestingly, resource use was associated with almost 40 percent of the translation-related comments but only a fifth of the planning-related comments.
Research question 2b was concerned with the effects of task type on cognitive processes as a function of writing stage. The distribution of stimulated recall comments (see Figure 7) showed some changes across the five stages. On the independent task, planning- and translation-related comments demonstrated a small decrease across the stages, whereas the comments describing monitoring displayed an increase. Likewise, on the integrated task, there was a decrease in comments referring to translation (excluding resource use) and an increasing number of monitoring-related comments. Unlike on the independent task, however, participants reported slightly more planning (excluding resource use) towards later stages. Resource use was mentioned with gradually lower frequency.
The two task types yielded similar trends regarding pause locations. Within-word pauses and between-word pauses were found to be primarily related to translation processes, mostly lexical retrieval. Only when between-word pauses were associated with resource use did participants mention planning more than translation. On both task types, between-sentence pauses were mostly linked to planning (mainly content). Patterns for pause location were similar across the five stages.
V Discussion
This study aimed to contribute to and expand on existing work on L2 writing processes (e.g. Révész and Michel, 2019) by investigating how writing behaviours and the cognitive processes underlying them may differ across independent and integrated tasks during the whole, and at different stages, of the writing process. To this end, we used keystroke logging to measure speed fluency and pausing behaviours, eye-tracking methodology to gauge viewing behaviours during writing, and stimulated recall to tap into the cognitive processes of L2 writers.
1 Writing behaviours and underlying cognitive processes across independent versus integrated tasks
We found that task type had a significant impact on six behavioural indices. Participants took a relatively shorter time to produce characters on average and had shorter pauses in the independent than the integrated task. During the independent task, they looked at the writing window more often and spent more time viewing it, and made more and longer forward saccades as well as longer backward saccades. The stimulated recall comments revealed that participants used relatively more pauses for planning (about one-third) in the independent than the integrated task (about one-fifth), and about 30 percent of the pauses on the integrated task were linked to resource use. The three data sources converged on the interpretation that, when working on the integrated task, participants spent proportionately more time viewing the reading text and/or notes that they had taken while listening. This is consistent with the fewer eye-fixations and saccades observed in the writing window and the lower active writing time for the integrated task. In other words, increased source use resulted in decreased time spent on writing. The availability of sources accounts for the reduced time spent planning on the integrated task, given that content could be mined from the reading text and listening notes. No significant task type differences emerged for the other speed fluency, pausing, and eye-gaze measures. Nor did the stimulated recall comments reveal differences in the time spent on translation (about a half) and monitoring (about a sixth) processes across the task types. These results suggest that, apart from using the source text and/or notes in the integrated task, L2 writers engaged in similar writing behaviours during the two task types.
Our results partially replicate previous research findings. Similar to Barkaoui (2015) and Plakans (2008), we observed that the largest proportion of the verbal protocol comments referred to source use on the integrated task, and like Barkaoui (2015), we found more planning-related comments during the independent tasks. Yet, our keystroke logging results run counter to Barkaoui’s (2019) findings, where participants displayed differential pausing behaviours during TOEFL iBT independent and integrated tasks. Barkaoui’s independent task elicited longer pauses between higher textual units, suggesting that more planning time was required (Schilperoord, 1996).
Although we found no task type effects for the pause measures, it is worth highlighting that, in line with existing findings (e.g. Révész et al., 2017b, 2019; Spelman Miller, 2000), pause durations were longer between larger (e.g. between sentences) than smaller (e.g. within words) textual units. Also, the stimulated recall comments confirmed the assumption that pauses between smaller and larger textual units tend to be associated with lower- and higher-order writing processes, respectively (Schilperoord, 1996).
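To make the notion of pause location concrete, the sketch below classifies a pause as within word, between words, or between sentences from the text already typed and the next character produced. The rules are illustrative assumptions of ours (for example, treating the very start of the text as a sentence boundary); they are not the classification procedure of the keystroke-logging software used in the study.

```python
def pause_location(text_before, next_char):
    """Classify a pause by the characters around the cursor (assumed rules).

    text_before: the text produced before the pause.
    next_char:   the first character typed after the pause.
    """
    prev_char = text_before[-1:]  # empty string if nothing has been typed yet
    # Letters or digits on both sides: the pause interrupts a word.
    if prev_char.isalnum() and next_char.isalnum():
        return "within word"
    # Otherwise look at the last non-whitespace character typed so far.
    stripped = text_before.rstrip()
    # Assumption: the start of the text counts as a sentence boundary.
    if stripped == "" or stripped[-1] in ".!?":
        return "between sentences"
    return "between words"

print(pause_location("The ca", "t"))         # within word
print(pause_location("The cat ", "s"))       # between words
print(pause_location("The cat sat. ", "I"))  # between sentences
```

Grouping pause durations by the returned label would then allow a comparison of pause length across smaller and larger textual units.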
Importantly, when interpreting our findings for pause behaviours, we have to consider that we used a 200 ms pause threshold to better capture lower-level writing processes (van Waes and Leijten, 2015). This inevitably resulted in smaller average figures for speed fluency (e.g. characters per P-burst: M = 1.54) compared to earlier work using a 2-second threshold. Indeed, when we apply the 2-second threshold to our data, the mean values (independent: M = 24.14; integrated: M = 23.08) are comparable to those in Révész et al. (2017b; M = 20). The higher speed observed in van Waes and Leijten (2015; M = 55) may be explained by differences in proficiency and/or L1 background across the studies. A further result of applying a lower pause threshold is that most pauses in our data occurred within words. This contrasts with previous work using a 2-second threshold (e.g. Barkaoui, 2019; Révész et al., 2017b), where pause frequency was highest between words. Again, applying a 2-second threshold would bring our data in line with earlier work.
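The effect of the pause threshold on P-burst length can be illustrated with a small sketch. The keystroke log is a hypothetical toy example, and counting an interval equal to the threshold as a pause is our assumption; the point is only that the same log yields shorter bursts under a 200 ms threshold than under a 2-second one.

```python
from statistics import mean

def p_bursts(keystrokes, threshold_ms):
    """Split a keystroke log into P-bursts (runs of typing between pauses).

    keystrokes: list of (timestamp_ms, char) tuples in chronological order.
    An inter-keystroke interval >= threshold_ms is treated as a pause.
    Returns the length of each burst in characters.
    """
    if not keystrokes:
        return []
    bursts = [1]  # the first keystroke opens the first burst
    for (prev_t, _), (t, _) in zip(keystrokes, keystrokes[1:]):
        if t - prev_t >= threshold_ms:
            bursts.append(1)   # pause detected: start a new burst
        else:
            bursts[-1] += 1    # no pause: extend the current burst
    return bursts

# Hypothetical log: inter-key intervals of 150, 150, 600, 150, 150, 2800, 100 ms.
log = [(0, "T"), (150, "h"), (300, "e"), (900, " "),
       (1050, "c"), (1200, "a"), (4000, "t"), (4100, ".")]

print(p_bursts(log, 200))    # [3, 3, 2]
print(p_bursts(log, 2000))   # [6, 2]
print(mean(p_bursts(log, 200)), mean(p_bursts(log, 2000)))
```

With the lower threshold, the 600 ms interval also ends a burst, so the mean number of characters per P-burst drops, which mirrors the direction of the difference reported above.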
It is also interesting to consider what insights the eye-tracking data offered in addition to the information we gained from the keystroke-logging and stimulated recall data. We learnt from the keystroke-logging data that the integrated task led to slower writing and longer pauses, and the stimulated recall data revealed that, on the integrated task, about a third of the pauses were associated with resource use and fewer pauses were linked to planning. On the one hand, the eye-tracking data, an objective measure, substantiated some of the patterns emerging from the subjective, stimulated recall comments. The fewer visits to, and the shorter time spent viewing, the writing window during the integrated task are compatible with the stimulated recall finding that participants often consulted sources (outside the writing window) when they stopped writing. On the other hand, the eye-tracking data yielded additional information that could not have been derived from keystroke-logging or stimulated recall alone. The fact that the independent task generated more planning-related comments as well as more and longer saccades supports the interpretation that participants might have reread previously produced text for the purpose of generating new content.
The little eye-tracking research that exists on L2 writing processes has used different approaches to analysing eye-gaze behaviours during writing (e.g. Chukharev-Hudilainen et al., 2019; Gánem-Gutiérrez and Gilmore, 2018; Révész et al., 2017b, 2019); thus, our findings cannot be directly compared with theirs. It is worthwhile, however, to compare our results to those of L2 reading research. Interestingly, we observed a considerably higher proportion of backward saccades (independent: M = .48; integrated: M = .47) than Brunfaut and McCray (M = .28), which supports the view that, compared to regular reading, rereading during writing serves different functions, such as generating new content, managing cohesion, and applying metacognitive revision strategies (Wengelin et al., 2009).
2 The role of writing stage across independent and integrated tasks
The analyses tapping into the role of writing stage revealed that behaviours were considerably more varied during the integrated than the independent tasks (51 vs. 23 significant differences). Most differences set apart stage 1 and/or stage 5 from the middle stages.
For the independent task, the initial stage was characterized by slower writing, fewer pauses, and shorter and fewer fixations in the writing window than the middle stages. Stages 3 and 4 demonstrated faster writing speed as compared to stage 5, and fewer saccades were observed in stage 4 than stage 1. According to the stimulated recall data (elicited in relation to pausing), both planning- and translation-related comments demonstrated a small decrease over time, whereas comments on monitoring increased. Taking the behavioural and verbal protocol data together, participants at stage 1 were probably more in a ‘planning mode’ than during the middle stages, which resulted in less continuous writing than in later periods. Greater speed at stage 4 might reflect that participants were trying to finish the task and thus focus on text production. This interpretation is also compatible with the fewer saccades observed at stage 4. As participants were concerned with text production, they were less likely to reread previous texts, resulting in their eye-gazes remaining more often at the inscription point. Then, at stage 5 they likely slowed down to focus on rereading and monitoring. These results are consistent with Barkaoui (2019), who observed fewer pauses initially on a TOEFL iBT independent task, and others, who found that planning decreased from the initial to later writing stages (Barkaoui, 2015; Roca de Larios et al., 2008; Tillema, 2012; Van Weijen, 2009).
The integrated task reveals a somewhat more varied picture. According to the eye-gaze data, participants spent less time viewing the writing window at stage 1 than at later stages. Pause frequency, at all locations, was highest at stages 2 and 3. At stage 5, participants viewed the writing window more often than previously, reflected in higher total fixation duration and fixation counts, and moved more within the text, evidenced in a greater number of forward and backward saccades. The stimulated recall comments revealed a decrease in translation and resource use but an increase in planning and monitoring at later stages.
Triangulating these findings, we may infer that, as expected, during stage 1 participants focused on reading the source text and/or notes. In stages 2 to 4 they primarily engaged in text construction involving both higher- and lower-order writing processes. At stage 5, participants allocated most of their attention to their text, probably to monitor and revise their summaries to ensure that they reflected their intended content. These patterns are well-aligned with previous research on integrated tasks, which reports greater source use at initial stages of writing, increasing attention to own text construction in middle stages, and a primary focus on revision in the last stage (Barkaoui, 2015, 2019; Leijten et al., 2019). The addition of eye-gaze data helped us confirm an initial visual focus on the source text and greater visual engagement with participants’ own texts towards the end of writing.
3 Limitations and directions for future work
This study has a number of limitations. Our participants were London-based Chinese university students, which affects generalizability to other populations. Next, participants did not complete the TOEFL iBT test under high-stakes circumstances. Accordingly, our results may not transfer to real testing conditions.
A number of limitations pertain to the eye-tracking methodology. We utilized relatively coarse eye-gaze measures focusing on the whole writing window. Word-level analyses were not possible, given that the design of the TOEFL iBT platform did not provide sufficient white space between words and lines, nor was the font size large enough (see Chukharev-Hudilainen et al., 2019; Révész et al., 2017b, 2019). Another artefact of the test setup was that, for the independent task, the space where the reading text was positioned in the integrated task remained empty. Thus, an increased attention to the writing window on the independent task was not unexpected. Furthermore, due to space limitations, we restricted the eye-gaze analyses to the writing window only. In our future work, we aim to report on gaze information for the reading window (integrated task only), directions and question, as well as switching behaviours between these interest areas.
A further issue concerns individual variation in typing style. For writers who are not touch typists, there is considerable track loss when participants look at the keyboard (around 30 percent on average in our study, including both touch and non-touch typists). A possible solution is to recruit touch typists only, but this would limit generalizability. Alternatively, one may consider the use of eye-tracking glasses. However, this would introduce different limitations (for accuracy, for example, see Conklin et al., 2018). For now, we need to report on and consider track loss when interpreting eye-gaze data during writing.
Another methodological issue concerns the 200 ms pause threshold we adopted. This lower threshold enabled us to gain information about lower-level processes. As a consequence, however, our results are not directly comparable to much of the existing research on L2 pausing, where a 2-second threshold was employed.
Finally, following seemingly arbitrary criteria (Tillema et al., 2011), we compared writing behaviours and processes across five stages. It would be valuable to consider writing processes according to the stages of planning, translation and monitoring, as observed in the replay of writing or as reported by participants. For our study, this would have resulted in highly individualized data, making a comparison across participants and tasks challenging. We would, however, encourage future work to explore this approach.
VI Conclusions
We investigated the behaviours and cognitive processes of L2 writers when completing independent and integrated writing tasks. In the integrated task, source use was most prevalent initially, with participants dedicating gradually more attention to the writing pane. Apart from source use, L2 writers engaged in similar types of writing behaviours and cognitive processes during the two tasks. However, the distribution of writing activities varied across the different stages in the two task types. The integrated task elicited more dynamic and varied behaviours and cognitive processes. From a practical perspective, these findings help provide evidence of response process validity of the TOEFL iBT writing test. The two task types generated, as intended (Cumming et al., 2000; Enright and Tyson, 2011), partially different writing behaviours and underlying processes, assisting the measurement of different aspects of the construct. Moreover, it is worth noting that the study yielded no evidence for construct-irrelevant behaviours. Finally, it is important to highlight that adopting a mixed-methods approach, through the use of keystroke logging, eye-tracking and stimulated recall, enabled us to gain more complete and specific insights than the use of a single method would have made possible.
Supplemental Material
Supplemental material, 200216_MichelRevesz_etal_Effects_of_task_type_on_L2_writing_processes_submitted_supplement for Investigating L2 writing processes across independent and integrated tasks: A mixed-methods study by Marije Michel, Andrea Révész, Xiaojun Lu, Nektaria-Efstathia Kourtali, Minjin Lee and Lais Borges in Second Language Research
Footnotes
Acknowledgements
We would like to thank the Educational Testing Service (ETS) for providing financial support to carry out the project on which this study is based. We are also grateful to ETS, and John Norris in particular, for constructive feedback and practical support at various stages of the project. In addition, we would like to thank Khaled Barkaoui for sharing his typing test and Ana Pellicer-Sánchez for advice on eye-tracking issues.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical Approval
Ethical approval from the author’s institution was provided before data collection started.
Funding
The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by Educational Testing Service (ETS) under a Committee of Examiners and the Test of English as a Foreign Language research grant. ETS does not discount or endorse the methodology, results, implications, or opinions presented by the researchers.
Supplemental material
Supplemental material for this article is available online.
References
