Abstract
Scholars have long studied the effect of corrective feedback strategies on the writing ability of language learners, but few have designed studies in which more than three feedback strategies are compared. In this research, the ultimate goal was to discover how International English Language Testing System (IELTS) candidates could be helped to perform better in the writing component of the test through the feedback they receive. To this end, 186 learners attending IELTS preparation classes in three different English language institutes participated in this quasi-experimental study. A one-way ANOVA was run to test for significant differences among the six groups. The findings suggested that Iranian English as a Foreign Language (EFL) students’ writing ability improved as a result of the employment of writing feedback strategies but that the reformulation strategy was the most effective one. Teachers can, thus, benefit from the findings of this research by studying the way they should tackle the learners’ inaccurate productions as far as different writing score band descriptors are concerned.
Introduction
One area in second language acquisition that has always attracted practitioners is Corrective Feedback (CF). Scholars and practitioners have studied different aspects of error correction such as the type that best fits certain groups of learners, the time that errors should be taken care of, the type of error that should be addressed, and many more areas. CF has been studied from different perspectives: role of anxiety (see, for example, DeKeyser, 1993), efficiency (see, for example, Lightbown & Spada, 2006; Mohammadi, 2009; Swain, 1985), obtrusiveness (see, for example, C. J. Doughty, 2003; C. Doughty & Varela, 1998; Long, Inagaki, & Ortega, 1998), learners’ preferences (Elwood & Bode, 2014), and proficiency (see, for example, Mackey & Philp, 1998) just to name a few. What is ostensibly missing from the bulk of research in this realm is an all-inclusive study whereby the effect of all these CF strategies is studied. R. Ellis (2009) also pointed out that no research has been carried out that encompasses all the different types of CF: There is an obvious need for carefully designed experimental studies to further investigate the effects of written CF in general and of different types of CF. [This] typology . . . [is based on] the type of CF . . . [making systematic research possible to examine] the effect of distinct types and combinations of CF. (p. 106)
The researchers, therefore, aimed to find out which of the CF strategies best serves the writing proficiency of intermediate-level Iranian English as a Foreign Language (EFL) learners in the second task of a high-stakes test, namely, the International English Language Testing System (IELTS). This research could thus pave the way for English language teachers to identify the most effective CF type, especially for the second task of the IELTS exam for intermediate EFL learners. It can also be conducive to better language performance for the learners should they know which CF strategy works best for them.
Literature Review
There are different classifications for CF strategies proposed by different researchers (Burke & Pieterick, 2010; R. Ellis, 2009; Lyster & Ranta, 1997). However, these classifications differ in essence. Lyster and Ranta’s (1997) classification, which encompasses six different categories, namely, clarification request, explicit feedback, recasts, metalinguistic feedback, elicitation, and repetition, is mainly used for learners’ oral productions, although with a little modification, it could also be used for learners’ writing activities. Some modification is needed because a technique like elicitation, in which the teacher might pause and signal an erroneous part in the speaker’s performance, is not possible in written form. The teacher should, thus, resort to an offline way of correcting rather than an online one, in which the learners get feedback immediately.
Unlike that of Lyster and Ranta (1997), Burke and Pieterick’s (2010) classification focuses on the quality of feedback. Their evaluative and advisory types of feedback look at the writing performance of the learners with the aim of assigning a score on their past performance or with the aim of improving the quality of the learners’ written piece respectively.
Among all, the classification put forward by R. Ellis (2009) best serves the purpose of this research in that the focus here was writing and whether and how it could be improved via the different CF strategies. Ellis’s classification encompasses six major categories, namely, direct, indirect, metalinguistic, focused/unfocused, electronic, and reformulation. In the direct kind of feedback, the teacher provides the correct form in place of the erroneous one. According to Ferris (2006), this could be done by adding or omitting some words to arrive at the correct form. This type of feedback could work best with elementary learners. However, teachers will have to spend a lot of time correcting the learners’ papers (Ferris & Roberts, 2001).
Conversely, in indirect CF, the teacher indicates where the error exists by underlining it or specifying its location. Ferris and Roberts (2001) held that this kind of feedback has an advantage over the direct form in that the learners spend more time trying to figure out what is wrong; hence, there is more processing time. In other words, it allows more reflection on the kind of error the learner has made and thus more cognitive processing.
Metalinguistic feedback could take one of two forms: use of error coding or a brief grammatical description. In the former type, the teacher writes some codes in the margin to suggest what problems learners have (e.g., wo for word order). Of course, the learners will have a list of the codes to avoid confusion. In the second type of metalinguistic feedback, however, the teacher numbers the errors and provides a brief explanation for each error at the end of the text.
The next type of feedback according to R. Ellis (2009) depends on the focus of the feedback. As the name suggests, in unfocused feedback, the scope of correction is unrestrained and the teacher could correct all extant errors, be they grammatical, lexical, sociolinguistic, or the like, but in focused CF, the teacher only focuses on what he or she has taught and ignores the rest of the errors. Processing the errors in the unfocused CF strategy might be overwhelming for the learners because the teacher pinpoints all errors.
The last two types of CF strategies are electronic feedback and reformulation, which are not as common as the ones mentioned earlier. In electronic feedback, learners use electronic software. An electronic resource such as a concordancer can give learners the feedback they need. Reformulation, as the last CF type in Ellis’s classification, is a kind of feedback in which the teacher reconstructs the inaccurate part to make it more natural. In reformulation, the whole idea is to retain the original meaning but to reshape the form to make it more native-like.
All in all, many scholars suggest that CF strategies are fruitful, and some favor direct over indirect feedback (Ferris & Roberts, 2001). Conversely, several others discovered that indirect feedback results in either greater or similar levels of accuracy over time (Lalande, 1982; Robb, Ross, & Shortreed, 1986). Yet, some (Truscott, 1996, 1999, 2007) do not think highly of feedback, at least not in the long run, and claim that there is a gap in research findings regarding the long-term effect of feedback in Second Language Acquisition (SLA). What these scholars neglect is the immediacy of need. Many of those who take part in a high-stakes test such as IELTS or the Test of English as a Foreign Language (TOEFL) do not think of their improvement as a long-term goal but rather as an immediate one. The claims made against CF by Truscott might be true when it comes to long-term goals. However, people who have to sit for high-stakes tests may be after immediate remedies that help them improve their inaccurate performance and obtain a higher score soon. This study could help find out whether any of the feedback types included in the study could have a statistically significant effect on the stakeholders’ writing performance.
Research Question and Hypothesis
To study the effectiveness of the different CF strategies, the researchers posed the following question:
The following null hypothesis was proposed for the aforementioned question:
Method
Participants
The participants in this quasi-experimental research included 186 BA/BS and MA/MS students studying at different universities across the country, and they were preparing themselves to sit for the IELTS exam to attend universities abroad, where the medium of instruction is English. The participants, chosen from the three different institutes where preparation courses were held, ranged between 21 and 35 years of age.
Design
The experimental phase of the present research was based on a pretest–posttest, quasi-experimental design, which involved six experimental groups, as presented in Table 1.
The Design for the CF Strategies Used for Experimental Groups.
The design used here lacked a control group for several reasons. First, depriving the learners of a treatment would be a design defect about which many researchers have reservations. Furthermore, it is unethical: While some participants receive some kind of treatment, others in the control group are deprived of any form of CF. In addition, according to Ary, Jacobs, and Razavieh (1996) and Mackey and Gass (2005), when participants are randomly assigned to one of the experimental groups, the comparison of groups receiving different treatments provides the same control over alternative explanations as does the comparison of treated and untreated groups. They argued that more common than comparing a treatment group with a group receiving no treatment is the situation where we compare groups receiving different treatments.
Procedures
R. Ellis (2009) proposed that there are six major feedback strategies. With the exclusion of one, that is, electronic feedback, and the inclusion of peer feedback, six different groups were required to start the research experimentally. Electronic feedback was removed from the study because access to the electronic material was not easy at the institutes where the research was run, and because the participants had never been exposed to concordancing before, it would have been too time-consuming for them to use. Accordingly, six groups were used, each of which was exposed to one kind of feedback strategy. Below, an explanation is given of how the participants were selected and how the study was carried out.
The participants were all bachelor’s or master’s students or graduates who were planning to sit for the IELTS exam at three different language institutes in Tehran. On referring to the institutes to sign up for IELTS classes, they were given an IELTS exam to see whether they were qualified to attend IELTS classes or whether they needed some remedial work. If they were within an overall band score of 5 to 6, they could take part in the IELTS preparation classes; otherwise, they would be introduced to some other preparatory classes. Because the number of students in the classes did not exceed 15, for each of the six treatment groups, two IELTS classes were used. This would also account for the participant attrition that is a threat to the internal validity of the research. The placement procedure also helped ensure the homogeneity of the participants and thus control subject variability. Overall, in all the six experimental groups, the number of students ranged from 25 to 28.
A one-way ANOVA was run to compare the mean scores of the six CF groups on the pretest of writing to check that they were homogeneous in terms of their writing ability prior to the administration of the treatments. The two assumptions of normality and homogeneity of variances were probed before reporting the results of the one-way ANOVA. The students were then exposed to some writing strategies and required to write. The only thing that differed in the six groups was the CF strategy used to correct their writing.
In the final session of the course, the students were given the final writing task. This task served as the posttest to gauge the participants’ writing ability improvement. The writing samples were rated 3 times by two different raters based on the criteria proposed by the Cambridge English for Speakers of Other Languages (ESOL) Center, which are grammatical range and accuracy, lexical resources, coherence and cohesion, and task achievement. Three ratings, two by one rater and a third by another, made it possible to check intra- and interrater reliability. As displayed later, significantly high reliability suggested that the scoring could be used as a good source for deciding about the participants’ writing skill improvement.
Another one-way ANOVA was run to compare the mean scores of the six CF groups on the writing posttest to find out which error correction method had the most significant effect on the improvement of the students’ writing ability on the posttest. The two assumptions of normality and homogeneity of variances were probed before reporting the results of the one-way ANOVA. The ratios of skewness and kurtosis over their respective standard errors were within the ranges of ±1.96; that is to say, the students’ scores on the posttest of writing enjoyed normal distribution.
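The normality check described above can be sketched in code. The following is a minimal illustration only, assuming the bias-adjusted skewness and kurtosis estimators and the standard errors that statistical packages commonly report; the function names and the sample scores are hypothetical and are not taken from the study.

```python
import math

def skew_kurtosis_ratios(scores):
    """Return (skewness / SE, excess kurtosis / SE) for one group of scores."""
    n = len(scores)
    if n < 4:
        raise ValueError("need at least 4 scores")
    mean = sum(scores) / n
    # Sample standard deviation (n - 1 in the denominator).
    sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))
    z = [(x - mean) / sd for x in scores]

    # Bias-adjusted sample skewness and excess kurtosis.
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

    # Standard errors of the two estimators under normality.
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n ** 2 - 1) / ((n - 3) * (n + 5)))
    return skew / se_skew, kurt / se_kurt

def looks_normal(scores, cutoff=1.96):
    """True when both ratios fall within +/- cutoff."""
    skew_ratio, kurt_ratio = skew_kurtosis_ratios(scores)
    return abs(skew_ratio) <= cutoff and abs(kurt_ratio) <= cutoff

# Hypothetical band scores, not data from the study.
demo_scores = [5.0, 5.5, 5.5, 6.0, 6.0, 6.0, 6.5, 6.5, 7.0]
print(looks_normal(demo_scores))
```

The ±1.96 cutoff corresponds to the two-tailed .05 level of the standard normal distribution, which is why a ratio inside that range is read as no significant departure from normality.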
When the F value indicated significant differences among the mean scores of the six CF groups on the posttest of writing, the researchers ran a follow-up post hoc Scheffe’s test to compare the mean scores in pairs to see where the significant difference lay.
To test the null hypothesis, the researchers first had to make sure that the ratings of the learners’ writings showed inter- and intrarater reliability. Therefore, analyses of these two measures follow.
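The pairwise follow-up described above can be illustrated with a short sketch. It assumes the usual Scheffe statistic for a pairwise contrast (the squared mean difference over its error variance, divided by k − 1) and takes the critical F value as an input read from an F table rather than computing it. The function name, group labels, and scores are hypothetical.

```python
from itertools import combinations

def scheffe_pairs(groups, f_crit):
    """Pairwise Scheffe comparisons after a one-way ANOVA.

    groups: dict mapping a group label to its list of scores.
    f_crit: critical F for (k - 1, N - k) degrees of freedom at the chosen
            alpha, looked up from an F table (an input here, not computed).
    """
    k = len(groups)
    n_total = sum(len(v) for v in groups.values())
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    # Within-group mean square (the error term of the ANOVA).
    ss_within = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v)
    ms_within = ss_within / (n_total - k)

    results = []
    for a, b in combinations(groups, 2):
        diff = means[a] - means[b]
        se_sq = ms_within * (1 / len(groups[a]) + 1 / len(groups[b]))
        # Scheffe statistic: contrast F divided by (k - 1).
        f_s = diff ** 2 / se_sq / (k - 1)
        results.append((a, b, diff, f_s, f_s > f_crit))
    return results

# Hypothetical scores for three groups of five learners each.
demo_groups = {
    "A": [5.0, 5.5, 5.5, 6.0, 6.0],
    "B": [5.5, 5.5, 6.0, 6.0, 6.5],
    "C": [6.5, 7.0, 7.0, 7.5, 7.5],
}
# 3.89 is the tabled .05 critical value for F(2, 12).
for a, b, diff, f_s, significant in scheffe_pairs(demo_groups, f_crit=3.89):
    print(a, b, round(diff, 2), round(f_s, 2), significant)
```

Because the Scheffe procedure protects against all possible contrasts, it is conservative for simple pairwise comparisons, so a pair flagged as significant here would also be significant under less strict procedures.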
Intra- and interrater reliability indexes
The students’ writings were rated 3 times by two raters. The first rater rated them twice with a time interval of 2 weeks. As displayed in Table 2, the intrarater reliability for the first rater’s two ratings is .78 (p = .000 < .05). Based on these results, it can be concluded that the two ratings of the first rater enjoy statistically significant intrarater reliability.
Intrarater Reliability Index.
The interrater reliability between the ratings of the second rater and the mean ratings of the first one is .94 (p = .000 < .05). Based on these results, it can be concluded that the two ratings enjoy statistically significant interrater reliability (Table 3).
Interrater Reliability Index.
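As an illustration of how such reliability indexes might be obtained, the sketch below assumes a Pearson product-moment correlation, which the text does not name explicitly, and averages the first rater's two ratings before comparing them with the second rater's, as described above. All names and scores are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / (sx * sy)

# Hypothetical band scores for six scripts, not data from the study.
rater1_first = [5.5, 6.0, 6.5, 5.0, 7.0, 6.0]
rater1_second = [5.5, 6.5, 6.5, 5.5, 7.0, 6.0]
rater2 = [5.5, 6.0, 6.5, 5.0, 7.0, 6.5]

# Intrarater: the same rater's two rounds, two weeks apart.
intra = pearson_r(rater1_first, rater1_second)
# Interrater: second rater against the mean of the first rater's two rounds.
rater1_mean = [(a + b) / 2 for a, b in zip(rater1_first, rater1_second)]
inter = pearson_r(rater1_mean, rater2)
print(round(intra, 2), round(inter, 2))
```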
Pretest of writing
A one-way ANOVA was run to compare the mean scores of the six groups on the pretest of writing to check that they were homogeneous in terms of their writing ability prior to the administration of the treatments. The two assumptions of normality and homogeneity of variances should be probed before reporting the results of the one-way ANOVA.
As displayed in Table 4, the ratios of skewness and kurtosis over their respective standard errors are within the ranges of ±1.96. That is to say, the students’ scores on the pretest of writing enjoy normal distributions.
Normality of Writing Scores on Pretest.
Levene’s statistic tests the assumption of homogeneity of variances. Levene’s F value of 1.51 is not significant (p = .191 > .05). Thus, the second assumption, homogeneity of variances, is also met (Table 5).
Homogeneity of Variances.
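For readers unfamiliar with Levene's statistic, the sketch below shows one way to compute it. It assumes the mean-centered variant, in which the test amounts to a one-way ANOVA on the absolute deviations of scores from their own group means; the text does not state which variant was used. The function name and the sample data are hypothetical.

```python
def levene_w(groups):
    """Levene's W statistic using absolute deviations from each group mean.

    groups: list of lists of scores, one inner list per group.
    W is referred to an F distribution with (k - 1, N - k) degrees of freedom.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)

    # Absolute deviation of every score from its own group mean.
    z = []
    for g in groups:
        m = sum(g) / len(g)
        z.append([abs(x - m) for x in g])

    z_means = [sum(zi) / len(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total

    # Between-group and within-group sums of squares of the deviations.
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within

# Hypothetical data: same spread in both groups, so W is zero.
same_spread = [[1, 2, 3, 4], [11, 12, 13, 14]]
# Hypothetical data: very different spreads, so W is large.
different_spread = [[1, 2, 3, 4], [0, 10, 20, 30]]
print(levene_w(same_spread), round(levene_w(different_spread), 2))
```

A nonsignificant W, as reported above, means that the spread of scores does not differ reliably across groups, so the pooled error term of the ANOVA is reasonable.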
The SPSS output in Table 6 displays the means, standard deviations, and number of participants in all conditions of the experiment on the pretest of writing.
Descriptive Statistics: Mean of Writing Score on the Pretest.
The results of the one-way ANOVA (Table 7) indicate that there were not any significant differences between the mean scores of the six error correction methods on the pretest of writing (F = 1.67, p = .146 > .05, ω² = .017). Based on these results, it can be concluded that the six groups were homogeneous in terms of their writing ability prior to the administration of the different error correction methods.
One-Way ANOVA Pretest of Writing.
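The F ratio and the ω² effect size reported for the one-way ANOVA can be illustrated as follows. This is a minimal sketch using the standard sums-of-squares definitions; the function name and the three small groups of scores are hypothetical and do not reproduce the study's data.

```python
def one_way_anova(groups):
    """Return (F, omega squared) for a one-way between-groups ANOVA.

    groups: list of lists of scores, one inner list per group.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Partition the total variability into between and within components.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    f_value = ms_between / ms_within

    # Omega squared: a less biased effect size estimate than eta squared.
    ss_total = ss_between + ss_within
    omega_sq = (ss_between - (k - 1) * ms_within) / (ss_total + ms_within)
    return f_value, omega_sq

# Hypothetical scores for three groups of five learners each.
anova_demo = [
    [5.0, 5.5, 5.5, 6.0, 6.0],
    [5.5, 5.5, 6.0, 6.0, 6.5],
    [6.5, 7.0, 7.0, 7.5, 7.5],
]
f_value, omega_sq = one_way_anova(anova_demo)
print(round(f_value, 2), round(omega_sq, 3))
```

In a homogeneity check such as the pretest comparison, a small F with p above .05 is the desired outcome, since it gives no evidence that the groups differed before treatment.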
Posttest of writing
A one-way ANOVA was also run to compare the mean scores of the six error correction methods on the posttest of writing to find out which error correction method has the most significant effect on the improvement of the students’ writing ability on the posttest. The SPSS output in Table 8 displays the means, standard deviations, and number of participants in all conditions of the experiment on the posttest of writing.
Descriptive Statistics: Posttest of Writing.
The two assumptions of normality and homogeneity of variances should be probed before reporting the results of the one-way ANOVA. As displayed in Table 9, the ratios of skewness and kurtosis over their respective standard errors are within the ranges of ±1.96. That is to say, the students’ scores on the posttest of writing enjoy normal distributions.
Normality of Writing Scores on Posttest.
Levene’s statistic tests the assumption of homogeneity of variances. Levene’s F value of 2.10 is not significant (p = .069 > .05). Thus, the second assumption, homogeneity of variances, is also met (Table 10).
Homogeneity of Variances.
The results of the one-way ANOVA reveal that there are significant differences between the mean scores of the six error correction methods on the posttest of writing (F = 9.36, p = .000 < .05, ω² = .49). Thus, the null hypothesis is rejected, and we can say that there are significant differences between the mean scores of the six error correction method groups on the posttest of writing. Table 11 displays the mean scores of the six groups on the posttest of writing.
One-Way ANOVA Posttest of Writing.
Figure 1 displays the mean scores of the groups on the posttest of writing. Reformulation was the most useful method used when correcting the writing performance of the students. Surprisingly, direct form as well as the indirect mode of correction, according to the graph, were the least fruitful techniques when compared with the rest of the CF strategies. These two CF strategies were then followed by metalinguistic, peer correction, and error-coding strategies.

Mean scores on posttest of writing.
Although the F value (F = 9.36) indicates significant differences between the mean scores of the six error correction methods on the posttest of writing, the post hoc Scheffe’s tests should be run to compare the mean scores in pairs. Based on the information displayed in Figure 1, it can be concluded that there are significant differences between the following pairs of means.
A: The reformulation group (M = 6.93) outperformed the direct form group (M = 5.68) on the posttest of writing.
B: The reformulation group (M = 6.93) outperformed the indirect form group (M = 5.68) on the posttest of writing.
C: The reformulation group (M = 6.93) outperformed the metalinguistic group (M = 5.86) on the posttest of writing (Table 12).
Post Hoc Scheffe’s Tests.
The mean difference is significant at the .05 level.
Discussion
Like most other studies so far (e.g., Chandler, 2003; Ferris, 2006; Sheen, 2007), this study investigated CF strategies. The investigation of the effect of CF strategies on the writing performance of the six different groups suggests that the reformulation group scored significantly higher. The learners in the reformulation group outperformed all those in the other groups. The written CF in this study could help learners with their explicit knowledge of L2, for the interface between explicit and implicit knowledge does not occur immediately (N. C. Ellis, 2005), and this study was not a longitudinal one that could show whether the learners’ implicit knowledge was also affected. Reformulation proved to work with the students because the students in the group outperformed all the other students as statistically shown (see Table 7).
The findings in this research are in line with what Bitchener (2008), Bitchener and Knoch (2008), Gass (2003), and Lee (2008) claimed in that the performance of the learners in all the different groups improved. However, the findings go against the claims of Kepner (1991), Semke (1984), and Sheppard (1992), who reported no significant difference in the writing accuracy of the students, for in all the groups, there were significant changes between the pretest and posttest of the participants (see Tables 7 and 11). It should be mentioned that designs such as those of Polio, Fleck, and Leder (1998) or Sheppard (1992) did not include a nonfeedback control group. One big difference between this study and those of R. Ellis, Sheen, Murakami, and Takashima (2008) and Farrokhi and Sattarpour (2012) is that they focused on one aspect of grammar, such as articles, whereas this study focused on a global change to grammatical accuracy.
The findings of this study comport with those of Carroll and Swain (1993) in terms of the efficacy of negative feedback. However, as opposed to what they claimed, that is, the better performance of learners in the implicit groups, this study found an enhancement of performance as a result of an explicit form of feedback, that is, reformulation, in comparison with other forms of feedback. The study also corresponded to that carried out by Lightbown and Spada (1990), who suggested an overall improvement in the participants’ writing skill as a result of the incorporation of form-focused activities in instruction. The significant improvement in the learners’ scores, especially in the reformulation group, bears witness to this view. It can thus be claimed that error correction can lead to more accurate written forms and eventually better writing scores (Shintani & Ellis, 2013).
Although the findings in this research lent support to the view that explicit feedback, namely, reformulation, is more fruitful than other forms of CF, a few additional issues should be taken into account. The participants in the reformulation group (M = 6.93) outperformed all the learners in the rest of the groups. However, reformulation was the only feedback strategy that produced a significant difference. The other groups’ mean scores did not differ significantly from one another, whichever kind of CF strategy was used. In addition, the period between the two tests was only 10 weeks. The learners in the implicit groups apparently needed more time to reflect, so with a lengthier time interval, these strategies might have proved more fruitful. Nevertheless, time is a major problem in matters of high-stakes tests. In other words, it should be emphasized that time is a luxury that the stakeholders, be they the learners or their teachers, can rarely afford.
Theoretical and Pedagogical Implications
Like many other studies, the findings of this study add to the body of knowledge regarding CF strategies. The main theoretical implication of the study is that it dealt with a particular domain of the language and revolved around a phenomenon, writing, under a “parent” category, socioculturalism. In this respect, the researchers studied writing without recourse to other territories or perspectives. Sociocultural theory is profoundly rooted in Vygotskyan social constructivist views. The present research aimed at the feedback aspect of language learning and, in this sense, only dealt with one area of language learning. Therefore, it has much in common with what is claimed by the Vygotskyan approach to learning through intervention and mediation.
Another significant contribution of the study is that the benefits of reformulation extend beyond form and grammatical structure and encompass other band descriptors in the IELTS writing scoring rubric, namely, task achievement, coherence and cohesion, and lexical resources. This is because the feedback provider dealt with those aspects at the same time, for the definition of reformulation sanctioned feedback on other areas as well.
From a pedagogical perspective, the findings of the research yield some implications that could benefit stakeholders in high-stakes tests. Although, as Miller Cleary (1991) suggested, responses to language problems are beneficial as long as they are subsidiary to responses on content and ideas, the posttest results (see Table 9) suggest that learners preparing themselves for a high-stakes test benefit significantly from all kinds of feedback, but reformulation has a more significant role. Miller Cleary (1991) stated, “. . . demand for absolute correctness, rightness, or neatness distract[s] the writer’s concentration from other important aspects of the writing process” (p. 498). However, it is a very strong claim to say that anyone demands “absolute correctness.”
Likewise, Leki’s (1999) claim that refraining from focus on form could lower the learner writers’ anxiety was fundamentally questioned, for almost all of the participants of this research who wished to sit for a high-stakes test claimed they needed feedback, and depriving them of any feedback could aggravate their anxiety rather than allay it. What Leki said might be true in some cases but most probably not in such contexts as that of this research.
The results of this study show that reformulation resulted in better performance in the IELTS Task 2 writing section. The participants in the other five groups improved as well, but their improvement was not statistically significant. One of the strange things about the findings of the study was the mismatch between how the learners felt toward direct and indirect feedback and how these affected the learners’ performance on the test. It is imperative, therefore, that teachers pay attention to the fact that what learners think is more beneficial might not be so in effect. This could pave the way for further research in which researchers look for matches and mismatches between what learners and even teachers think and what works best for the learners. It should also be borne in mind that the results of the study cannot prove whether the improvement in the scores means an enhancement in explicit or implicit knowledge, as Shintani and Ellis (2013) also put forward.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research and/or authorship of this article.
