Abstract
Drawing on models of first language writing and models of second language (L2) oral task performance, this quasi-experimental study investigated the effect of pre-task planning (PTP) on the product of computer-based L2 writing against a no-planning (NP) condition with the total time-on-task controlled for. It also examined the possible influence of writers’ strategy preferences on the effect of PTP. The study adopted a within-group design. Forty-three participants were required to complete two argumentative essays on a computer, one produced under the PTP condition and the other under the NP condition. A post-task questionnaire was used to collect participants’ strategy preferences. Statistical analyses showed that PTP affected only writing fluency and had no effect on complexity, accuracy, lexis, idea density, or coherence and cohesion. Considerable individual differences were found in how learners were affected by the planning conditions, but no mediating effect of strategy preferences was evident. Based on the results, we argue that teachers can require students to pre-plan when they want to encourage fluent writing but should not expect PTP to improve text quality in computer-based writing. Theoretical implications are discussed in relation to the different theoretical positions on the effect of PTP.
I Introduction
With the development of computer technology, investigating writing on a keyboard is increasingly important. Compared to paper-based writing, the computer makes it easier to revise what has been written without leaving visible traces on the final product. This possibly allows writers more flexibility in coordinating the various cognitive processes involved in writing and may result in very different writing behaviours. Research has shown that the transcription mode has no effect on the nature of the cognitive processes involved (Weir et al., 2007) but can alter how time and attention are allocated among processes (Chamorro, 2022; Chan et al., 2018). Thus, it is important to ask whether strategies commonly recommended for pen-and-paper writing are still useful when the transcription method differs. The study reported in this article investigated the effect of one such strategy – pre-task planning (PTP) – on the product of second language (L2) argumentative writing.
II Theoretical background
Planning is a preparatory activity that involves thinking about what to write and how to write it (Ellis, 2005). Ellis (2003, 2005) distinguished pre-task and within-task planning. Pre-task planning (PTP) commences before the start of a task and can be further divided into rehearsal (or drafting) and strategic planning (Ellis, 2005), which is the focus of the present study. Strategic planning typically involves giving writers the task materials beforehand and asking them to consider the main elements to be included when they write. This is what teachers usually have in mind when they say, ‘plan before you write’. Within-task planning occurs alongside text transcription and is also sometimes referred to as online planning (OLP). It can be pressured or unpressured depending on the time given for writing (Ellis, 2005). The underlying assumption is that OLP always occurs, even when the writers are too busy to do it carefully. As Ellis (2019, 2021) noted, any effect that PTP has on the written product may be mediated by online planning.
1 Models of first language (L1) writing
The role of planning is addressed in a number of cognitive models of writing and models of task performance, but the predictions they offer about the effects of PTP differ somewhat. Cognitive models of L1 writing (e.g. Bereiter & Scardamalia, 1987; Chenoweth & Hayes, 2001, 2003; Galbraith, 1999; Hayes & Flower, 1980; Kellogg, 1996) generally identify four major sub-processes involved in writing: proposing and organizing ideas, transforming ideas into linguistic messages, writing down or typing out the encoded messages, and evaluating the output of these three components and making changes. We will refer to these cognitive operations as proposing, translating, transcribing and revising, respectively. For clarity’s sake, we have adopted the term ‘proposing’ for the first sub-process while using ‘planning’ to refer to pre-task and online planning.
Important for this study is how these cognitive processes are coordinated and managed as a functioning system. There are two contrasting positions on the effect of planning – the Overload Hypothesis (OH) and the Interaction Hypothesis (IH) – both proposed by Kellogg (1990), representing different views on the process management of writing. The Overload Hypothesis stresses the competition among different cognitive processes during writing. Drawing on Hayes and Flower’s (1980) model, the OH assumes the writer has a limited working memory (WM) capacity which is often overloaded during composing. On this view, pre-writing planning can reduce cognitive overload and improve both writing fluency and text quality. Kellogg further developed this view in his writing model (Kellogg, 1996) by linking the different sub-processes of writing with components of WM. The model proposes competition among cognitive operations for limited attentional resources, which can be relieved by PTP. According to Kellogg, pre-task planning funnels writers’ attention to content before they start writing, making resources available for the selection of language during text production, which can enhance writing performance (Kellogg et al., 2013).
Kellogg’s (1990) Interaction Hypothesis, not to be confused with Long’s (1996) Interaction Hypothesis, emphasizes the nonlinear and recursive nature of writing where the different cognitive processes interact extensively during the development of a text. This view is consistent with Chenoweth and Hayes’ (2001, 2003) model, which describes a two-way interaction between proposing and translating. Once a chunk of text has been produced, it becomes part of the task environment that provides input for further content generation. Similarly, Galbraith’s (1999) Knowledge-Constituting Model explains how the process of text production can facilitate the formulation of new ideas and promote discovery by providing ‘inhibitory feedback’ to the proposer. According to the IH, requiring detailed pre-task planning can interrupt the proposing-translating interaction and thus impact negatively on writing performance. Kellogg (1990) tested the OH and the IH with L1 writers, finding evidence supporting the former, but other studies investigating L2 writing have shown otherwise as we will see in Section III.
It is possible that both competition and interaction occur among processes during writing but with one more dominant than the other. As Galbraith (1999) noted, the extent to which writers utilize inhibitory feedback when developing new ideas depends on a number of factors, including the writers’ encyclopaedic and linguistic knowledge, the form of output required, the writer’s personal orientation and the choice of writing strategy. The strategy that writers adopt may affect how they respond to planning conditions (Galbraith, 1999). Research has shown that there is considerable variation in the strategy that different writers prefer (Galbraith & Torrance, 2004) but the choice of strategy across tasks and school years is relatively stable in individual writers (Torrance et al., 1994). Galbraith and Torrance distinguished four broad types of writers: (1) planners who work out the content before setting pen to paper, (2) revisers who work out their ideas in the course of writing, letting them evolve over a series of drafts, (3) mixed strategy users, and (4) writers who use neither planning nor revising strategies (i.e. the non-users). Galbraith (1992) associated strategy choice with the extent to which writers monitor their self-representation against the social and situational context; that is, high self-monitors are planners and low self-monitors are revisers. The former benefit from PTP while the latter perform better under an OLP condition. Kieft et al. (2007) found that both planners and revisers benefited from training on pre-planning strategies while ‘non-users’ benefited from instruction directed at revising. From this perspective, we can expect the effect of PTP to vary across individuals and writing tasks. Because of this, we may find mixed results across planning studies with different participants using different writing prompts. 
Also, because of individual differences we may not see any group differences between a PTP and a no-planning (NP) condition within a single study.
2 Models of oral task performance
Previous studies investigating the effect of pre-task planning on writing frequently draw on two psycholinguistic models of oral task performance, namely, the Limited Attentional Capacity Model (Skehan, 1998, 2015; Skehan & Foster, 2001) and the Cognition Hypothesis (Robinson, 2001, 2005, 2011). Similar to the Overload Hypothesis, Skehan (1998) envisaged competition between meaning and form during task performance due to limited WM resources. He posited a trade-off effect among fluency, complexity and accuracy where a focus on one aspect leads to less attention to others during production (Skehan & Foster, 2001). Pre-task planning was predicted to promote fluency and complexity and perhaps also improve accuracy if learners rehearsed their text while planning (Skehan, 2015). Robinson (2001) viewed PTP as a resource-dispersing factor affecting the procedural demands a task entails. His model claims that removing PTP directs learners’ attention away from their linguistic resources and has a negative effect on fluency, complexity and accuracy. Thus, PTP can enhance task performance on all three aspects. Robinson differs from Skehan in that he views the effect of PTP on accuracy as requiring no deliberate attention to form during planning.
Compared to the writing models, Skehan’s and Robinson’s models offer more specific predictions on how particular aspects of language performance can be affected by PTP. Both models were developed to account for oral task performance by drawing on Levelt’s (1989) model of speech production. Levelt (1989) distinguished three processing stages for speaking – conceptualizing, formulating and articulating – regulated by a self-monitoring process where the speakers access and inspect their overt and/or internal speech. These cognitive processes for speaking are quite similar to the writing sub-processes identified by the writing models. There are thus grounds for utilizing Skehan’s and Robinson’s models for investigating written production as has been done in previous studies (e.g. Ellis & Yuan, 2004).
However, as some scholars (e.g. Galbraith & Baaijen, 2019; Kormos, 2014) have pointed out, there are fundamental differences between speaking and writing. First, there is little real-time communication pressure in writing; and second, the text produced is visible throughout. This allows learners more flexibility for coordinating different cognitive demands (Williams, 2012) and more opportunity to attend to both meaning and form during online planning (Ellis, 2021). Therefore, as Ellis speculated, it may be OLP rather than PTP that has the greater impact on the complexity and accuracy of the written product, especially when there is no time pressure to write rapidly.
III Literature review
Previous studies investigating the effect of PTP on L2 written products have produced mixed results. One reason may be the variation in how the planning conditions have been operationalized and how the written product was measured. To enable a meaningful comparison across studies, we will first consider these issues.
The planning studies can be divided into three groups according to how the NP condition was operationalized: (1) writing time controlled (following a pilot study), (2) writing time controlled (without a pilot study), and (3) overall task time controlled. The classical research design involves controlling the writing time, namely allocating the same time for writing to both the PTP and NP conditions with the former given extra time (e.g. 10 minutes) for pre-task planning. Studies adopting this design can be further divided based on whether there was pilot testing. Some researchers (e.g. Ellis & Yuan, 2004; Tabari, 2020) carried out a pilot study where learners were given unlimited time for composing. They noted down the actual time spent and used the shortest or average time as the time allocated in the main study. This was likely to lead to pressured OLP while writing. Other researchers decided the time limit based on common practice in classrooms or exams (e.g. Johnson et al., 2012; Sattarpour & Farrokhi, 2017). Though ecologically valid, these studies tell us little about the extent to which the NP condition pushed the participants to write faster. The third group of planning studies controlled for the total time-on-task. For example, in Ong and Zhang (2010), the PTP condition involved 10 minutes’ planning followed by 20 minutes’ writing and the NP condition 30 minutes’ writing. When operationalized this way, the NP condition will result in more opportunities for online planning than the PTP condition.
The planning studies have typically measured the written product in terms of complexity, accuracy, lexis and fluency (CALF), using different indices to tap these multidimensional constructs. Following the suggestions of Ellis (2019) and Johnson (2017), we have grouped the different CALF measures according to sub-constructs to facilitate meaningful comparisons across studies. Table 1 shows the main sub-constructs with examples of the measures used. We then examined the statistical significance and effect size of each subconstruct separately. Cohen’s d was calculated using the statistics provided in the original studies. A summary of the results can be found in Appendix A.
Table 1. Subconstructs identified for the complexity, accuracy, lexis and fluency (CALF) metrics.
The effect of PTP seems to depend on the research design. Among the 13 studies that controlled for the writing time, the results are relatively consistent regarding complexity and fluency. Nine studies found a significant positive effect on at least one measure of complexity, and eight out of the 10 studies investigating Fluency I (syllables per minute of the transcription time) reported a benefit for PTP. The results seem to be more consistent when the writing time was based on pilot testing. It seems that the more the learners were pressured during the actual writing, the more they benefitted from PTP. Such a tendency is even clearer for accuracy. There were only four studies reporting a significant effect, all of which involved a pilot study. The above observation lends support to Ellis’ (2019, 2021) assertion that any effect of PTP will be mediated by OLP. It also points to the possibility that offering more time for OLP will compensate for the lack of PTP. The best way to test this possibility would be to adopt a research design that controls for the total time-on-task. However, there are only four studies to date that have done this. These studies tend to show a non-significant difference between the PTP and NP conditions for syntactic complexity and accuracy, and mixed results for lexis and fluency. See Appendix A.
There is a lack of studies investigating the effect of PTP on computer-based writing. Many planning studies failed to mention the method of transcription or to consider its possible influence. This is understandable when pen and paper was the norm. The only study that explicitly stated that composing happened on a keyboard is Tabari (2021), but this study only controlled for writing time not for total time-on-task.
IV The study
Pre-task planning is an important component of traditional L2 writing pedagogy and is frequently recommended in teacher guides (Ellis, 2021, 2022). In the context where we conducted the study, students are often advised or required to pre-plan. However, as the literature review above has demonstrated, research has not consistently found PTP to be effective. We have shown that this is possibly because of the variation in research designs, with most PTP studies controlling for the time spent writing and only a few for the total time-on-task. The present study adopted the latter design. In accordance with Ellis’ (2021) position regarding the role of planning in writing, we argue that research comparing PTP with an NP condition that allows opportunities for online planning has greater pedagogical validity, especially in examinations where the time-on-task is limited and students need to decide whether to spend time on PTP or to get started right away and spend more time planning online.
The study set out to add to the few studies that have investigated the effect of pre-task planning on computer-based writing. A secondary aim was to investigate how learners viewed pre-task planning and whether their strategic preferences could explain the effects of planning. The research questions were as follows.
• Research question 1: What effect does pre-task planning have on the complexity, accuracy, lexis, fluency, idea density, and coherence and cohesion of the written product in computer-based L2 writing?
• Research question 2: Do differences in the learners’ preferences for PTP and NP mediate the effects of PTP on their written products?
The theories of writing and models of task performance along with the results of previous studies point to different possibilities about the effect of PTP on written production. We are not able, then, to propose specific hypotheses. Instead, we use the study to examine the following possibilities:
• Possibility 1: PTP may reduce the cognitive load on online planning since content is prepared beforehand and available in planning notes leading to increased writing fluency as posited by the Overload Hypothesis (Kellogg, 1990). PTP may also result in greater attention to language leading to improved complexity and/or accuracy as proposed by the models of task performance.
• Possibility 2: PTP may inhibit feedback from the translating process resulting in degraded writing performance as claimed by the Interaction Hypothesis (Kellogg, 1990).
• Possibility 3: Learners will vary individually in terms of how PTP and OLP affect the written product, as suggested by Galbraith’s (1999) model. Some writers may benefit from PTP while others do not, depending on their strategy preferences.
• Possibility 4: There will be no difference between PTP and NP on complexity or accuracy when the NP condition affords more opportunities for online planning as previous studies have indicated. This is compatible with Ellis’ (2021) argument that writers may attend to both meaning and form during OLP just as they do in PTP.
V Method
1 Context and participants
The study was conducted in an English-as-a-foreign-language (EFL) context in China. The participants were 48 second-year English majors enrolled in a four-year program at Shanghai International Studies University. They were volunteers recruited from a general writing course where they met with the instructors for 90 minutes once a week. The participants were 40 females and 8 males, who were all native speakers of Chinese aged between 19 and 22 years (M = 19.50, SD = 0.62). They had studied English for at least 10 years. Twenty-one participants had visited an English-speaking country, but most of them (n = 16) for less than a month. Only four had spent more than a month in an exchange or summer camp program lasting 6–8 weeks. In general, the participants received 12–15 hours of classroom instruction per week covering all major language skills. Argumentative writing was an integral part of the program and was also assessed in the national standardized examination. According to their instructors, their proficiency level ranged from intermediate to advanced.
2 Design
The study adopted a within-participant quasi-experimental design. We chose a within-group design for two reasons: (1) it helped to control for possible interference of learner factors such as WM capacity and language proficiency, and (2) it reduced the number of participants needed to enable meaningful statistical analyses. The second reason was of practical importance in our research setting as pilot studies had shown we would have difficulty in finding the participants needed for a between-group design.
The independent variable was the planning condition; the dependent variables were complexity, accuracy, lexis, fluency (CALF), and the content and discourse quality of the written product. Each participant wrote two argumentative essays on different topics on a computer under two different planning conditions: a pre-task planning condition (PTP) and a no-planning condition (NP). The PTP condition was operationalized as 10-minute individual strategic planning. The participants were given 10 minutes to plan before starting to write for 30 minutes. They were instructed to plan individually on a planning sheet with minimal guidance in terms of what and how to plan (see Appendix B). They were encouraged to make detailed notes but only in words and phrases, not complete sentences. We allowed the participants to keep their notes until they finished writing. The no-planning condition required the participants to start as soon as possible and write for 40 minutes. Thus the total time-on-task was the same for both PTP and NP conditions.
The planning conditions and the writing topics were counter-balanced across sessions to avoid possible practice and fatigue effects. As shown in Table 2, participants were randomly assigned to two groups of equal number: Group 1 had the NP condition in the first session and the PTP condition in the second session, and Group 2 vice versa. Each group was further divided into two subgroups, one having Topic A first and the other writing on Topic B first.
Table 2. Counter-balancing the conditions and topics.
Notes. NP = no planning. PTP = pre-task planning. A = Topic A. B = Topic B.
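The counter-balancing scheme described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure the authors followed: the function name `assign_counterbalanced`, the numeric participant IDs and the fixed random seed are all hypothetical.

```python
import random

# The four cells obtained by crossing condition order with topic order.
# Each cell lists (condition, topic) for session 1 and then session 2.
CELLS = [
    [("NP", "A"), ("PTP", "B")],   # Group 1, Topic A first
    [("NP", "B"), ("PTP", "A")],   # Group 1, Topic B first
    [("PTP", "A"), ("NP", "B")],   # Group 2, Topic A first
    [("PTP", "B"), ("NP", "A")],   # Group 2, Topic B first
]

def assign_counterbalanced(participant_ids, seed=0):
    """Shuffle participants, then deal them round-robin into the four cells."""
    ids = list(participant_ids)
    random.Random(seed).shuffle(ids)  # hypothetical seed, for reproducibility
    return {pid: CELLS[i % len(CELLS)] for i, pid in enumerate(ids)}

# 48 hypothetical participant IDs, matching the number recruited.
schedule = assign_counterbalanced(range(1, 49))
```

Dealing shuffled participants round-robin keeps the cells equal in size, so half of the sample meets each condition first and half meets each topic first.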
3 Instruments
The data collection instruments included a background questionnaire, two writing prompts and a post-task questionnaire. The background questionnaire was used to collect information on the basic demographics and English learning history of the participants. Two different writing prompts were used, both of which were modelled on IELTS (Academic) Writing Task 2. Each prompt provided two alternative arguments on the same issue and asked the learners to discuss both of them and give their own opinions (see Appendix C). The length requirement was 250–300 words. The post-task questionnaire elicited learners’ views about PTP. It consisted of a Likert scale question, a dichotomous choice and an open-ended question (see Appendix D). It asked learners to report how useful they found pre-task planning for the writing they had just completed, to indicate whether they preferred PTP or NP and to provide reasons for their choice.
4 Procedures
Data collection was carried out mainly in small groups in a computer laboratory, with one-on-one sessions offered when the students’ schedules were hard to coordinate. The participants were first briefed about the whole process, signed the consent form, and filled in the background questionnaire. They then completed the two 40-minute writing sessions with a 10–15 minute break in between. Before each writing session, the participants were given 2–3 minutes to read and understand the writing prompts and to ask for clarification if needed. During the writing sessions, the learners wrote in a test-like condition where no dictionary or any other external resources were permitted. They were allowed to turn in the essay early, with the exact starting and submitting times recorded by a keystroke logging program (Leijten & Van Waes, 2013). The post-task questionnaire was given out after the second writing session. All instructions were in Chinese.
5 Measures
The participants’ written products were analysed using multiple measures of complexity, accuracy, lexis and fluency (CALF), as recommended by Norris and Ortega (2009). There were also measures of content and organization.
a Complexity
Following Norris and Ortega (2009), the study assessed three different sub-constructs of syntactic complexity: (1) global complexity as measured by mean length of t-units, (2) clause-level complexity as the percentage of subordinate clauses, and (3) sub-clausal complexity as mean length of clauses.
b Accuracy
The study used a general measure of accuracy: percentage of error-free clauses; and two specific measures: the percentage of clauses containing local and global errors. An error was defined as any violation of rules of syntax, morphology, lexical choice and collocation (Ellis & Yuan, 2004). Local errors are errors that affect single elements in a sentence and do not usually hinder communication; global errors are those that affect the overall organization of a sentence and may cause difficulties in understanding the message (Burt, 1975).
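Once an essay has been hand-coded, the complexity and general accuracy measures described above reduce to simple ratios. The sketch below illustrates this with hypothetical counts. The function name, the abbreviation `EFC` and the choice of total clauses as the denominator for subordination are assumptions made here, since the text does not spell out the denominator.

```python
def complexity_accuracy(words, t_units, clauses, subordinate, error_free):
    """Return ratio measures from hand-coded counts for one essay."""
    return {
        "MLT": words / t_units,              # mean length of t-units
        "MLC": words / clauses,              # mean length of clauses
        "Sub": 100 * subordinate / clauses,  # % subordinate clauses (assumed denominator)
        "EFC": 100 * error_free / clauses,   # % error-free clauses
    }

# Hypothetical counts for one 280-word essay.
scores = complexity_accuracy(words=280, t_units=20, clauses=35,
                             subordinate=14, error_free=28)
```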
c Lexis
Lexical variation was measured by Malvern and Richards’ (2002) vocd and was calculated using Coh-Metrix 3.0. This is an advanced version of the mean segment type-token ratio that takes account of the length of text analysed. Two measures were employed to capture lexical sophistication: the average word frequency and range of content words. The former measures the mean frequency of content words against a corpus, and the latter reveals how widely words are used across contexts. Both were calculated using TAALES 2.0 (Crossley & Kyle, 2018). We elected to examine just content words rather than all the words to avoid the indices being inflated by the frequent use of function words such as ‘the’ and ‘of’.
d Fluency
Following Ong and Zhang (2010), we used two different fluency measures: Fluency I was measured by the number of characters per minute of the actual writing, and Fluency II was measured by characters per minute of the total time-on-task (i.e. planning plus writing time). The counts of characters included letters and punctuation marks but excluded spaces as calculated by Microsoft Word.
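The two fluency measures differ only in their denominator, as the following sketch shows. The function names and the example essay are hypothetical, and the character count here simply drops whitespace, which approximates but may not exactly match the count produced by Microsoft Word.

```python
def count_characters(text):
    """Count all non-whitespace characters (letters, punctuation, digits)."""
    return sum(1 for ch in text if not ch.isspace())

def fluency(text, writing_minutes, planning_minutes=0.0):
    """Return (Fluency I, Fluency II) in characters per minute."""
    chars = count_characters(text)
    fluency_1 = chars / writing_minutes                       # writing time only
    fluency_2 = chars / (writing_minutes + planning_minutes)  # total time-on-task
    return fluency_1, fluency_2

essay = "word " * 300   # hypothetical essay: 1,200 non-space characters
ptp = fluency(essay, writing_minutes=30, planning_minutes=10)
np_ = fluency(essay, writing_minutes=40)
```

With identical output in both conditions, Fluency I favours the PTP condition (40 versus 30 characters per minute) while Fluency II is the same, which is why the two indices can lead to different conclusions.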
e Content and organization
The content quality of the essays was operationalized as propositional idea density, calculated through the software CPIDR 5.1 (Brown et al., 2008). Coherence and cohesion were measured using three different indices provided by Coh-Metrix: the frequency of connectives, stem overlap between adjacent sentences, and Latent Semantic Analysis (LSA). LSA considers semantic overlap between ‘explicit words and words that are implicitly similar or related in meaning’ (Dowell et al., 2016, p. 77), and is believed to reflect both cohesion and coherence (McNamara et al., 2007).
6 Analysis
The data collected were first screened for technical quality and ‘condition membership’, i.e. whether the PTP and NP condition had been operationalized successfully. This was done mainly by examining the keystroke logs, the planning sheet, and field notes. Anyone who spent more than 2 minutes before typing the first word in the NP condition was excluded from the dataset. The final sample size was 43.
To enable the calculation of complexity and accuracy measures, we manually coded the essays for clauses, erroneous clauses, subordinate clauses and t-units. A code name was given to each script at the outset, and the code names and conditions under which the scripts had been produced were kept separate throughout the coding process. Intra-rater reliability of the coding was calculated for all features. At least 4 weeks after the initial coding, over 10% of the data (9 scripts) were randomly selected and coded again by the first author. For the error coding, we also attempted to establish inter-rater reliability. Nine randomly selected scripts were coded independently by a native speaker of English after receiving 15 hours of training. The percentage agreement was above 80% for all features; Cohen’s Kappa ranged from 0.689 to 0.825. For indices calculated using automated analysis tools, careful pre-cleaning was performed beforehand in accordance with user manuals. The questionnaire data were used to divide participants according to strategy preferences. Details of this analysis are provided later.
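Percentage agreement and Cohen’s Kappa can both be derived from two parallel lists of codes, as in the minimal sketch below. The function names and the ten clause-level codes are hypothetical and serve only to show how the two indices relate. Kappa is lower than raw agreement because it discounts the agreement expected by chance.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Percentage of items given the same code by both raters."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100 * matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters."""
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's marginal proportions.
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical codes for ten clauses: "err" = erroneous, "ok" = error-free.
rater_1 = ["err"] * 5 + ["ok"] * 5
rater_2 = ["err"] * 4 + ["ok"] * 5 + ["err"]

agreement = percent_agreement(rater_1, rater_2)
kappa = cohens_kappa(rater_1, rater_2)
```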
Statistical analyses were carried out using SPSS 27. A repeated-measures MANOVA was used to compare planning conditions on the various measures. We elected to use a multivariate test, instead of multiple univariate tests (e.g. paired sample t-tests), as it reduces the number of tests needed and thus lowers the probability of Type I errors (Field, 2018). We could have chosen to conduct t-tests with Bonferroni corrections but this would have necessitated a very low alpha level for significance, potentially leading to Type II errors (Larson-Hall, 2010).
To secure adequate test power, the dataset was carefully tested against the assumptions for MANOVA. Normality was checked by examining the histogram of the data distribution, the z scores of skewness and kurtosis (between −1.96 and 1.96) and the results of the Shapiro-Wilk test (p > .05). Outliers were spotted through z scores, P-P plots and histograms. Since there is no non-parametric equivalent to MANOVA, violations of normality were corrected through transformation and winsorizing following the steps outlined in Field (2018, pp. 263–270). MANOVA also assumes the absence of multicollinearity, i.e. there should be no strong correlation between dependent variables. This was examined through multiple regression analyses. Most variables met the requirements of MANOVA after appropriate correction; the rest were investigated using separate univariate tests. The results of assumption testing are reported in the next section. Cohen’s r was calculated for pair comparisons in post-hoc tests following the MANOVA while Cohen’s d was calculated for paired sample t-tests. The effect sizes were interpreted using the discipline-specific benchmarks for small, medium and large effect sizes – 0.25, 0.4, 0.6 for r and 0.4, 0.7, 1.0 for d – as suggested by Plonsky and Oswald (2014).
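The benchmark-based interpretation of effect sizes can be expressed as a small lookup, sketched below. The function name and the label "below small" are hypothetical. Treating each benchmark as an inclusive lower bound is an assumption made here rather than something stated in the text.

```python
# Benchmarks cited above from Plonsky and Oswald (2014): (small, medium, large).
BENCHMARKS = {
    "r": (0.25, 0.40, 0.60),
    "d": (0.40, 0.70, 1.00),
}

def interpret_effect(value, kind):
    """Label an effect size of kind 'r' or 'd' against the benchmarks."""
    small, medium, large = BENCHMARKS[kind]
    size = abs(value)  # the sign only indicates direction
    if size >= large:
        return "large"
    if size >= medium:
        return "medium"
    if size >= small:
        return "small"
    return "below small"

label_r = interpret_effect(0.807, "r")
label_d = interpret_effect(0.30, "d")
```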
VI Results
1 Assumption testing
Assumption testing revealed slight positive skewness of the data distribution for Fluency I and semantic overlap (i.e. LSA) and moderate positive skewness for word frequency (WFr) and Fluency II. Square root and log10 transformations were applied, respectively. Since the scores of WFr range from zero to 1, they were multiplied by 100 before the transformation to avoid ending up with negative scores that are hard to interpret. Extreme outliers (z > 3.29) were spotted among scores on mean length of t-units (MLT), global errors, idea density and connectives, and were corrected using the technique of winsorizing. Once corrected, normality was achieved for all dependent variables. Descriptive statistics were calculated both before and after correction (see Table 3).
Table 3. Descriptive statistics for dependent variables before and after correction.
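The corrections described above can be illustrated with a short sketch. The function names and data are hypothetical, and the winsorizing rule used here (replacing any score beyond the z cutoff with the nearest non-outlying score) is only one of the variants described in the statistical literature; the text does not say which variant was applied.

```python
import math
import statistics

def sqrt_transform(scores):
    """Square-root transformation for slight positive skew."""
    return [math.sqrt(x) for x in scores]

def log10_transform(scores, multiplier=1):
    """Log10 transformation; a multiplier of 100 keeps 0-1 scores positive."""
    return [math.log10(x * multiplier) for x in scores]

def winsorize(scores, z_cutoff=3.29):
    """Replace extreme outliers with the nearest score inside the cutoff."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    inliers = [x for x in scores if abs(x - mean) / sd <= z_cutoff]
    low, high = min(inliers), max(inliers)
    return [min(max(x, low), high) for x in scores]

# Hypothetical scores with one extreme value at the end.
raw = [10, 11, 12, 13, 14] * 4 + [60]
corrected = winsorize(raw)

# A hypothetical word-frequency score between zero and 1.
plain_log = log10_transform([0.5])[0]
scaled_log = log10_transform([0.5], multiplier=100)[0]
```

In this example the extreme score of 60 is pulled back to 14, the highest non-outlying value, and multiplying by 100 before taking the logarithm turns a negative result into a positive one.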
Notes. *M = Mean, SD = standard deviation, CI = 95% confidence interval. **win = winsorizing, sqrt = square root, log = log10, 100 log = multiplied by 100 and then log10 transformed. ***MLT = mean length of t-units, Sub = subordination, MLC = mean length of clauses, WFr = word frequency, Cnn = connectives, LSA = Latent Semantic Analysis.
To avoid multicollinearity, it is necessary to carefully select the dependent variables to include in a MANOVA (Tabachnick & Fidell, 2014). The study employed both an overarching measure for complexity and accuracy (i.e. MLT and error-free clauses) and measures tapping different sub-constructs such as subordination and local accuracy. Pearson correlations showed that these sub-construct measures significantly correlated with their respective superordinate measures (/r/ > 0.4, p < .05). We decided to include only the overarching measures in the multivariate test. Strong correlations were also found between stem and semantic overlap and between the two fluency measures (r > 0.7, p < .001), so only one measure from each pair was included in the one-way repeated-measure MANOVA. The remaining nine variables, as listed in Table 4, were examined using multiple regression analysis. The results suggested no multicollinearity: VIF = 1.469–4.520, tolerance = 0.221–0.681.
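For readers unfamiliar with the two collinearity indices, the standard definitions are given below, where R_j^2 is the squared multiple correlation obtained when variable j is regressed on the remaining variables. The notation is introduced here for illustration and does not appear in the original analysis.

```latex
\[
\mathrm{tolerance}_j = 1 - R_j^{2},
\qquad
\mathrm{VIF}_j = \frac{1}{1 - R_j^{2}} = \frac{1}{\mathrm{tolerance}_j}
\]
```

The reported ranges are consistent with this reciprocal relationship: 1/1.469 ≈ 0.681 and 1/4.520 ≈ 0.221.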
Table 4. Results of post-hoc ANOVAs.
Notes. *M = mean; NP = no planning, PL = planning. **Observed power: calculated using alpha = .05. ***MLT = mean length of t-units, WFr = word frequency, LSA = Latent Semantic Analysis; win = winsorized, log = log10, 100log = multiplied by 100 and then log10 transformed, sqrt = square root.
2 Descriptive statistics for the PTP and NP conditions
As shown in Table 3, large standard deviations (SD) were found for many dependent variables in both PTP and NP conditions, suggesting considerable within-group variation. To get a clearer view, we computed a difference score (Dif) for each dependent variable using the formula Dif = PL – NP to see whether a learner performed better in the PTP than the NP condition. We then calculated standard deviations (SD) for the Difs. For measures with a scale from zero to 1 (e.g. word frequency and LSA), we multiplied the SD by 100 to enable easier comparisons across measures. The analysis revealed large SDs (> 10) for subordination, overall accuracy and local errors, vocd and word frequency, and all three measures of coherence and cohesion. The results suggested large individual differences for the effect of PTP on these measures.
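The difference-score computation described above can be sketched as follows. The function names are hypothetical, and the use of the sample standard deviation (n − 1 in the denominator) is an assumption, since the text does not specify it.

```python
import statistics


def difference_scores(planning, no_planning):
    """Dif = PL - NP for each learner; both lists must follow the same learner order."""
    if len(planning) != len(no_planning):
        raise ValueError("Each learner needs one score per condition.")
    return [pl - np_score for pl, np_score in zip(planning, no_planning)]


def sd_of_differences(planning, no_planning, unit_scale=False):
    """Standard deviation of the Dif scores.

    unit_scale=True multiplies the SD by 100 for measures bounded between
    zero and 1 (e.g. word frequency, LSA) so that SDs are comparable across measures.
    """
    sd = statistics.stdev(difference_scores(planning, no_planning))
    return sd * 100 if unit_scale else sd
```

A positive Dif indicates a higher score under PTP than under NP for that learner; whether higher is better depends on the measure (for error counts, for instance, it is not).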
3 Inferential statistics: The effect of pre-task planning
Using Pillai’s trace, the multivariate analysis revealed a significant main effect for pre-task planning: V = 0.763, F(9, 34) = 12.187, p < .001, partial η² = 0.763. Post-hoc ANOVAs showed that the effect was largely due to the between-condition difference for Fluency I: F(1, 42) = 78.418, p < .001, r = 0.807. No significant effect was found for any of the other measures (see Table 4), nor did their effect sizes reach the benchmark for a small effect.
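As a consistency check on the statistics above, the following sketch recomputes two of them from the others. It rests on assumptions not stated in the text: that, with only two conditions, the multivariate effect has a single dimension, so F = [V / (1 − V)] × (df_error / df_hypothesis), and that the effect size r for a one-degree-of-freedom F test was obtained as r = √(F / (F + df_error)). The function names are hypothetical.

```python
import math


def f_from_pillai(v, df_hypothesis, df_error):
    """Exact F for Pillai's trace when the effect has a single dimension
    (assumed here because only two conditions are compared)."""
    return (v / (1.0 - v)) * (df_error / df_hypothesis)


def r_from_f(f_value, df_error):
    """Effect size r for an F test with one numerator degree of freedom."""
    return math.sqrt(f_value / (f_value + df_error))


print(round(f_from_pillai(0.763, 9, 34), 2))  # close to the reported 12.187
print(round(r_from_f(78.418, 42), 3))         # 0.807, as reported for Fluency I
```

The first value differs slightly from the reported F because V is itself rounded to three decimals; the second reproduces the reported r, which suggests, without confirming, that this conversion was the one used.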
Paired samples t-tests were conducted for the six measures excluded from MANOVA, and Cohen’s d was calculated. To enable easier comparisons across dependent variables, Cohen’s d was converted to coefficient r using the following formula:
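A sketch of such a conversion is given below, assuming the widely used equal-group-size form r = d / √(d² + 4). That this is the form intended is an assumption, and the function name is hypothetical. The form is, however, consistent with figures given elsewhere in the article: the Cohen's d of −1.53 listed for Ong and Zhang (2010) in Appendix A converts to a magnitude of 0.608, the value quoted for that study in the Discussion.

```python
import math


def d_to_r(d):
    """Convert Cohen's d to the correlation coefficient r.

    Assumed form: r = d / sqrt(d^2 + 4), the usual conversion for equal group sizes.
    """
    return d / math.sqrt(d * d + 4.0)


print(round(d_to_r(-1.53), 3))  # -0.608
```

The conversion preserves the sign of d, so a negative effect yields a negative r; the text reports some of these values as magnitudes only.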
Table 5. Results of paired-samples t-tests.
Notes. *M = Mean, NP = no planning, PL = planning, MDif = mean of paired differences, SDDif = standard deviation of paired differences. **MLC = mean length of clauses; win = winsorized, log = log10. ***Test power, calculated using alpha = .05.
To sum up, the results showed that pre-task planning benefited Fluency I (r = 0.807) but had a small negative effect on Fluency II (r = 0.336). No significant effect was found on any other measure of the written product.
4 Questionnaire: The role of strategy preferences
Research question 2 asks whether strategy preferences could explain the effect of the planning conditions. The questionnaire elicited participants’ ratings for the usefulness of PTP on a six-point Likert scale, with 6 representing very useful and 1 indicating not useful at all. The mean and standard deviation of the ratings were calculated: M = 4.01, SD = 1.18. Participants were also asked to indicate their preference between the two conditions. Most of them (n = 36) selected PTP, with only six learners preferring the no-planning condition. However, some participants gave inconsistent answers to the two questions. Ten learners stated a preference for PTP but gave a low score for its usefulness; one writer preferred the NP condition but found PTP relatively helpful. Based on these answers, we divided the participants into planners (n = 26) and non-planners (n = 16): the former showed an unambiguous preference for PTP, whereas the latter either preferred NP or were ambiguous about PTP.
To answer research question 2, we then examined the difference scores (Dif) by group. We re-coded the data into two categories – values above and below the median for the whole sample – and calculated the number of writers in each category. Using Fisher’s exact test, we found no significant differences between the two groups on any measure (see Table 6). The results suggest that strategy preferences did not mediate the effects of the planning conditions.
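The median split and Fisher's exact test described above can be sketched with the standard library alone. Several details are assumptions: the function names are hypothetical, values equal to the median are counted as 'not above', the two-sided p-value sums the probabilities of all tables that are no more likely than the observed one, and the Dif scores in the usage lines are invented purely for illustration.

```python
from math import comb
from statistics import median


def median_split_counts(group_difs, cutoff):
    """Count Dif scores above the cutoff and those at or below it (ties: assumption)."""
    above = sum(1 for dif in group_difs if dif > cutoff)
    return above, len(group_difs) - above


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-7))


# Invented Dif scores for two small groups (illustration only).
planners = [0.4, 1.2, -0.3, 0.9]
non_planners = [-0.5, 0.1, 0.8, -1.0]
cutoff = median(planners + non_planners)
a, b = median_split_counts(planners, cutoff)
c, d = median_split_counts(non_planners, cutoff)
print(round(fisher_exact_two_sided(a, b, c, d), 3))
```

Fisher's exact test suits this design because several cells contain few writers once a group of 16 is split at the median, which makes the chi-square approximation unreliable.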
Table 6. Results of Fisher’s exact test based on the difference scores.
Notes. MLT = mean length of t-units. MLC = mean length of clauses. LSA = Latent Semantic Analysis.
VII Discussion
Research question 1 asked about the effect of pre-task planning on the product of L2 writing. The results revealed significant effects only for writing fluency. There were large standard deviations in the Dif scores for measures of complexity, accuracy, lexis and coherence and cohesion, indicating considerable variation in how PTP affected different writers. No group differences reached significance on these variables. Research question 2 investigated the role of strategy preferences. The results suggested no mediating effect of strategy preferences on the effect of the planning conditions. Our results corroborate the research findings of some previous planning studies but differ from those of others.
In Section IV, we identified four theoretical positions for the effect of PTP: (1) the Overload Hypothesis, (2) the Interaction Hypothesis, (3) Galbraith’s (1999) proposal concerning the varying effects of PTP on individual writers, and (4) Ellis’ (2021) argument concerning the role of online planning. Here, we will first discuss the results in terms of each position and then consider why there is inconsistency with existing studies.
The Overload Hypothesis (OH) claims that PTP reduces the cognitive load for online planning and predicts benefits for both fluency and text quality. Our results lend only partial support to the OH as they showed a large improvement for Fluency I under PTP but no effect for any measures indicative of text quality (e.g. complexity and accuracy). Kellogg (1996) argued that PTP would benefit writing performance by funnelling writers’ attention to content and thus making more resources available for language while writing. Skehan (2015) suggested that PTP would lead to more complex content and language. Both models seem to assume that the content that learners conceptualized during PTP would be translated into verbal messages and included in the resulting essay. This may be true for adult L1 writers for whom linguistic encoding is highly automatized. L2 learners, however, may lack the linguistic resources required to encode the planned content. When this happens, learners may elect to use the online planning time to formulate new content plans that are linguistically less demanding. In such cases, PTP does not necessarily reduce the cognitive resources devoted to proposing.
The Interaction Hypothesis (IH) describes a two-way interaction between proposing and translating where text production provides important feedback for idea generation. The IH argues that PTP will interrupt the interaction and damage writing performance. Our results lend only partial support to this hypothesis: PTP had a small negative effect on text length as indicated by Fluency II but no effect on any measures reflecting text quality. A possible explanation is that PTP as operationalized in this study offered opportunities for the proposing-translating interaction. The participants may have translated part of their content plans and stored the outcome in their planning notes. This would allow the proposer some ‘inhibitory feedback’ (Galbraith, 1999) from the translator and enable writers to test the ‘linguistic feasibility’ of their content plans as well. Another possible explanation concerns the type of writing we investigated. Compared to pen-and-paper writing, computer-based writing allows writers more flexibility as they can revise easily with no worry of producing messy essays full of visible corrections. They could thus adopt strategies to deal with challenges caused by separating planning and text production.
Galbraith’s (1999) Knowledge-Constituting Model proposed that: (1) writers may vary individually in their responses to PTP and OLP – some may benefit more from the former and others more from the latter, and (2) strategy preferences could be one of the factors affecting writers’ responses. The lack of group difference and the large SDs for difference scores (Dif) on almost all aspects of writing except fluency provide some support for these arguments. However, we found no difference between participants who indicated a clear preference for PTP and those who did not. In other words, strategy preferences had no mediating effect on planning conditions, contrary to Galbraith’s prediction. In search of an explanation, we conducted a post-hoc analysis examining the Dif scores of individual learners on the different measures. Three general patterns emerged: (1) there were large positive Difs for almost all participants on at least one aspect of writing besides fluency, (2) there were also large negative Difs for most writers on at least one measure and (3) the positive and negative Difs tended to appear on different measures for different learners. It seems that participants benefited more from PTP on some aspects of writing and more from OLP on others. In other words, the aspects affected by each planning condition varied from one individual to another, but this variation was not related to any general difference in preferences for PTP or NP.
Ellis (2021) predicted that when writers were asked to start immediately but given more writing time in the NP condition, they would engage more fully in online planning with similar benefits to pre-task planning. Our results support these predictions. The lower Fluency I found in the NP condition suggests that the participants spent more time planning online than in the PTP condition, as expected. The non-significant effect found on all measures indicative of text quality confirms Ellis’ hypothesis that prolonged OLP in writing can compensate for the absence of PTP. What Ellis (2021) did not envisage are the differences in what individual learners took from pre-task and online planning, as explained above. It is possible that because of the extra flexibility that computer-based writing offers compared to pen-and-paper writing, writers have more options while coordinating the different cognitive demands and can sequence the cognitive processes in different ways to cope with cognitive overload. In other words, computer-based writing allows learners to individualize their approach when allocating attention to different aspects of writing, which may in turn affect what they take from PTP. Further research of a more qualitative nature is needed to explore what factors are responsible for writers’ individual approaches.
We turn now to compare our results with those of other studies. In fact, there are only four studies that investigated the effect of PTP while controlling for time-on-task as in our study: Gauthier (2007), Lin (2013), Ong and Zhang (2010), and Tabari (2018). Our results confirm the general finding of these studies, i.e. there was no effect on complexity and accuracy. For measures where the results were mixed, our study supports some of the previous studies but contradicts others. For example, our results roughly corroborate those of Lin and Tabari on lexis but we did not find the significant negative effects of PTP reported in Gauthier and Ong and Zhang (see Appendix A). As for fluency, our results are consistent with Lin and Tabari but not with the other two studies.
A possible explanation lies in the differences in the participants in these studies. Based on the analysis of the Dif scores, we found that PTP differed in the aspects it affected in different learners. Because the participants varied from one study to another, inconsistent results can be expected. Another possible explanation may be the differences in task implementation. Unlike the present study, Ong and Zhang (2010) allowed learners to review their essay once it was written and produce a second draft. Since this was made clear to the participants at the outset, it may have induced a revising strategy (as opposed to a planning strategy). According to Galbraith (1999), this would encourage more spontaneous articulation of thoughts rather than deliberate rhetorical planning, as Ong and Zhang themselves pointed out. This may explain why their participants produced much longer texts when given more writing time in the ‘free-writing’ condition rather than slowing down and spending more time planning online as expected, resulting in a much larger negative effect of PTP on Fluency II (r = 0.608) than in our study (r = 0.336) and a non-significant effect of PTP on Fluency I (r = −0.14).
VIII Conclusions
The study set out to investigate four different positions for the effect of PTP on the L2 written product. Controlling for the total time-on-task, we operationalized the PTP condition as 10 minutes of unguided strategic planning followed by 30 minutes of writing and the NP condition as 40 minutes of writing starting immediately. We also collected learners’ views about PTP and examined the possible influence of their strategic preferences. The results showed that PTP substantially improved writing fluency when this was calculated using the actual writing time (i.e. Fluency I) and had a small negative effect when the production rate was measured against the total time-on-task (i.e. Fluency II). No significant effect was found on any other measures, nor was there any mediating effect observed for strategic preferences on the effect of planning conditions. There were large standard deviations for almost all aspects of writing except fluency in both PTP and NP conditions. Post-hoc analyses of the Dif scores suggested considerable individual variation regarding which aspects of writing were affected by PTP and how they were affected (i.e. positively vs. negatively). In general, our results confirm Ellis’ (2021) prediction that no group difference will be seen when PTP is compared to an NP condition that allows more writing time. They also lend some support to Galbraith’s (1999) proposal that the effect of PTP can vary across individuals, although we found no evidence of any moderating effect of strategy preference on planning conditions as Galbraith predicted.
From models of L1 writing we identified three different positions on the effect of PTP: the Overload Hypothesis associated with Hayes and Flower’s (1980) and Kellogg’s (1996) models, the Interaction Hypothesis consistent with Chenoweth and Hayes’ (2001, 2003) model, and the position of Galbraith’s (1999) Knowledge-Constituting Model. Underlying these positions are different views about how cognitive processes are coordinated and managed as a functioning system. The OH and IH emphasize the competition and the interaction among cognitive operations, respectively, while Galbraith assumes the existence of both competition and interaction and argues that writers will vary in how they respond to different tasks and writing conditions. The present study manipulated the writing condition and found considerable individual differences in how learners were affected. In this regard, it lends support to Galbraith’s (1999) view on process management during writing. Skehan’s (1998, 2015) and Robinson’s (2001, 2005) models of task performance have been frequently cited by previous studies investigating the effect of planning on writing. The present study found no evidence of the benefit for complexity and accuracy that both Skehan’s and Robinson’s models envisage for PTP, and thus supports Ellis’ (2021) calls for caution in generalizing theoretical predictions across production modes.
The study was initially motivated by a pedagogical concern, namely whether pre-task planning is a useful strategy for computer-based writing. Based on our results, we would recommend language teachers ask students to pre-plan when they want to encourage fluent writing. However, teachers should not expect PTP to always result in better content or organization or in more sophisticated or accurate language. Our study has shown that there is considerable individual variation in how students respond to PTP. Perhaps, then, as Ellis (2022) has suggested, teachers should experiment with implementing PTP in different ways to see what works for particular students. This could involve giving students the option of PTP or no PTP. It could also involve varying the conditions in which PTP is carried out (e.g. with or without training in how to plan) and asking them to reflect on their writing experiences in the different conditions. Perhaps the most important implication of our study is the need to guard against blanket recommendations regarding PTP.
As always, the study was not without limitations. First, it would have been useful to have had a third condition – 30 minutes writing starting immediately. This would have allowed a comparison of no-planning conditions that differed in the opportunity for OLP. It was not included in this study due to logistical constraints – the impossibility of recruiting sufficient participants in our research setting. Including a third condition would have required another writing session from the learners, which would have made the recruitment of volunteers even harder. Also, the study did not investigate the factors that led to the individual differences we found in the participants. Drawing on Galbraith’s model, we did test one possible contributing factor – strategy preference – but found no evidence of any mediating effect. Future studies should investigate the possible influence of other factors that Galbraith proposed such as topical knowledge and personal orientation. The study is also limited as it focused on the written product alone offering little information on how learners arrived at the texts they composed. Future studies should also consider the process features involved (e.g. pauses and revisions), how these features relate to aspects of the product and whether the relationships are affected by planning conditions. We plan to undertake this in future publications.
Appendix A
Results of previous planning studies
| Design | Study | Size | Complexity: Global | Complexity: Sub** | Complexity: Phrasal | Accuracy: General | Accuracy: Specific | Lexical complexity: Variation | Lexical complexity: Sophi** | Fluency: Speed I | Fluency: Speed II |
|---|---|---|---|---|---|---|---|---|---|---|---|
| W-ctrl, pilot* | Ellis and Yuan (2004) | 42 | –*** | More (0.73) | – | No (0.0) | No (0.43) | No (0.06) | – | More (1.45) | – |
| W-ctrl, pilot | Rostamian et al. (2018) | 60 | – | – | No (0.07) | No (–0.08) | No (–0.19) | No (–0.05) | – | More (1.17) | – |
| W-ctrl, pilot | Tabari (2020) | 120 | – | More (0.88) | More (0.95) | More (0.56) | More (0.28) | More (0.30) | – | More (0.79) | – |
| W-ctrl, pilot | Tabari (2021) | 60 | More (0.92) | More (0.83) | More (0.60) | More (0.83) | – | No (0.02) | No (0.27) | More (0.85) | – |
| W-ctrl, pilot | Farahani and Meraji (2011) | 123 | – | More (1.28) | – | More (0.40) | No (0.06) | No (0.29) | – | More (0.88) | – |
| W-ctrl, pilot | Meraji (2011) | 75 | – | More (1.23) | – | More (0.58) | No (0.60) | No (0.30) | – | – | – |
| W-ctrl, pilot | Tabari (2016) | 78 | No (0.0) | – | – | No (0.0) | – | Less (–1.32) | – | More (0.73) | – |
| W-ctrl, pilot | Tabari (2017) | 90 | – | More (1.86) | – | – | – | No (0.67) | – | – | – |
| W-ctrl, no pilot | Johnson et al. (2012) | 968 | No | – | No | – | – | No | No | No | – |
| W-ctrl, no pilot | Lin (2013) | 75 | No (–0.11) | No (–0.25) | – | No (–0.03) | No | No (0.25) | More (0.38) | No (0.00) | – |
| W-ctrl, no pilot | Rahimi and Zhang (2018) | 80 | – | More (0.87) | No (0.14) | No (0.24) | – | No (0.20) | No (0.14) | More (0.59) | – |
| W-ctrl, no pilot | Sattarpour and Farrokhi (2017) | 226 | – | More (0.72) | – | No | – | No | – | – | – |
| W-ctrl, no pilot | Tabari (2018) | 160 | More (1.12) | No (0.60) | More (0.65) | No (0.21) | No | More (0.86) | – | More (1.07) | – |
| T-Ctrl | Gauthier (2007) | 24 | – | No (0.42) | No (0.40) | – | No (–0.07) | Less (–1.17) | No (–0.28) | – | No (0.55) |
| T-Ctrl | Lin (2013) | 75 | No (–0.25) | No (–0.03) | – | No (0.18) | No | No (0.11) | More (0.35) | More (1.47) | – |
| T-Ctrl | Ong and Zhang (2010) | 108 | – | – | – | – | – | Less (–0.61) | – | No (–0.14) | Less (–1.53) |
| T-Ctrl | Tabari (2018) | 160 | No (–0.13) | No (–0.12) | No (0.08) | No (0.52) | No | No (0.14) | – | – | No (–0.42) |
Notes. *W-ctrl, pilot = writing time controlled (following pilot-testing); W-ctrl, no pilot = writing time controlled (without pilot); T-Ctrl = time-on-task controlled. **Sub = subordination, Sophi = sophistication. ***More = a positive effect, No = no significant effect, Less = a negative effect, ‘–’ = not measured; numbers in brackets = Cohen’s d. An effect is labelled ‘More’ or ‘Less’ if (a) it was reported as statistically significant or (b) the effect size reached medium or higher. The benchmarks for small, medium and large Cohen’s d are 0.4, 0.7 and 1.0, respectively (Plonsky & Oswald, 2014).
Appendix B
Appendix C
Appendix D
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was funded by a joint doctoral scholarship awarded to the first author by Curtin University and the China Scholarship Council, No. 201708250024.
