Abstract
In research on EFL writing, much attention has been paid to teachers’ practices in either assessment or feedback, but little is known about teachers’ behaviors across these two domains as a whole. There also seems to be a paucity of research on how teachers’ responses to student writing develop from pre- through in-service. The current study, using a cross-sectional method, aims to compare pre- and in-service teachers’ assessment and feedback in EFL writing, with a focus on potential changes and challenges in their responses to student text. Three groups of participants (59 pre-practicum trainees, 31 post-practicum trainees, and 32 in-service teachers) in Mainland China completed a simulation task in which they assessed a descriptive text using given scoring rubrics and provided written feedback on the same text. The results of the assessment task showed a salient change in the participants’ severity from pre- through in-service; participants with less teaching experience focused on conceptual aspects of the text, whereas those with more experience highlighted linguistic issues; and some participants in each group had problems using the rating scales properly. Results of the feedback task revealed that all three cohorts, at different professional development stages, likely faced challenges in responding to the text, given the limited quantity and poor quality of their written responses. These findings underscore the necessity of assessment literacy training in both pre-service and in-service teacher education programs.
Introduction
Teachers of English play a pivotal role in students’ learning to write in English as a foreign language (EFL). There is much evidence that teachers’ effective assessment of and feedback on student writing is conducive to improving the teaching and learning of writing (e.g., Hawe & Parr, 2014; Lee, 2016a, 2017; Li, Link et al., 2015; Parr & Timperley, 2010; Yu et al., 2020). Much attention has been paid to teachers’ practices regarding assessment (e.g., Lee, 2007, 2008, 2012) and feedback (e.g., Lee, 2019a, b; Parr & Timperley, 2010; Zhao, 2010) relatively independently of each other, treating them as separate constructs. Yet feedback is a step in the assessment process, which makes it feasible to address the two constructs as an integrated whole.
As is well known, teacher knowledge develops with the accumulation of teaching experience. However, studies in science education have revealed that pre-service teachers struggle either to evaluate students’ academic achievements properly or to give useful feedback in chemistry education (Buck et al., 2010; Ropohl & Rönnebeck, 2019). Other relevant research has found that both pre-service and in-service teachers have difficulty interpreting students’ performance in science education (Furtak et al., 2016; Talanquer et al., 2015). Nevertheless, there appears to be little research to date on how teachers’ assessment of and responses to student texts change from pre- through in-service in the practice of EFL writing. Specifically, little research has investigated how teachers’ assessment of learning (AoL) varies as they progress through their professional development. Also, little is known about their assessment for learning (AfL) practices (and potential obstacles), where teacher written feedback serves the purpose of promoting student writing.
The present study, employing a cross-sectional design, aims to examine how teachers’ assessment of and feedback on student writing change over formative years and experiences. Data were collected from pre-practicum trainees, post-practicum trainees, and in-service teachers in Mainland China, who assessed a piece of descriptive text written by an English language learner and provided feedback on the same text. This study thus sets its sights on EFL teacher preparation and professionalization from pre- through in-service as it relates to teacher development in assessment and feedback in EFL writing. It should be noted that this paper was based on the PhD thesis of the corresponding author (Kong, 2018).
Literature Review
Teachers’ practices in assessment and feedback may be mediated to a certain extent by specific institutional and cultural factors. Accordingly, this section reviews the relevant literature on EFL writing instruction and teacher education in China, and on teacher assessment and teacher feedback more broadly.
The Instruction of EFL Writing and Teacher Education in China
Effective writing skills in either one’s first language or a second language play a vital role in people’s communication in academic, economic, social, and cultural settings. In Mainland China, pupils learn to write in Chinese and English in primary school, as part of their Chinese or English classes. English Writing is a compulsory and fully independent course only at the higher education level for English Majors (including those pursuing the B.A. program of English Language Teaching, i.e., teacher training for primary and secondary schools). Nonetheless, within the prevalent exam-driven conventions of education at all levels in China, writing generally accounts for a substantial proportion of the scores of most English examinations (e.g., 23.3% of the gaokao, the university entrance examination, and 20% of TEM 4/8, the Test for English Majors; see The MoE of China, 2019). It is a crucial factor in students’ success, especially at the secondary and tertiary levels.
To improve students’ comprehensive language competence, China initiated the New English Curriculum Standards nationwide for grades 1 to 9 in 2011, which address students’ Language Skills, Language Knowledge, Attitudes to Learning, Learning Strategies, and Cultural Awareness (The MoE of China, 2011). The English Curriculum Standards for grades 10 to 12 were then released in 2017, targeting students’ fundamental literacy in English Studies, such as Language Competence, Cultural Awareness, Thinking Quality, and Learning Ability (The MoE of China, 2017).
In terms of writing at the basic education stage (grades 1–9), the Curriculum Standards specify relevant requirements for students at corresponding proficiency levels. At the end of primary school (grade 6), learning outcomes include: (1) using capitalization and punctuation with basic accuracy; (2) writing simple greetings; and (3) writing short and simple headings and descriptions to fit pictures or objects (The MoE of China, 2011, p. 14). At the end of junior high school (grade 9), the following learning outcomes are targeted: (1) gathering and organizing material according to the purpose of the writing; (2) drafting short letters and passages independently and editing them with the teacher’s guidance; (3) using common linking devices to express oneself fluently and logically in writing; (4) writing simple descriptions of people or things; and (5) writing simple paragraphs, instructions, and explanations according to prompts given in pictures or tables (The MoE of China, 2011, p. 17).
In the teaching of EFL, Wang and Fu (2011) reported that Chinese teachers generally spend three-quarters of a writing lesson explaining model essays and commenting, leaving students relatively little time to practise in class. The teaching of writing seeks to involve students in practising and mastering the vocabulary and grammar they have learned, and the testing of writing mainly focuses on students’ correct use of language and grammar. Thus, traditional approaches seem to dominate the writing classroom (Lam, 2016, 2017, 2020; Lam & Lee, 2010; Lee, 2016b). The observation of Wang and Fu (2011), who found the teaching and learning of writing to be far from an activity aimed at improving expression and communication, still seems to hold today.
Given the disjunction between the requirements of the New Curriculum Standards and teachers’ practices in real writing classrooms, it is imperative to adjust and strengthen teacher training programs to close the gap. However, there are certain issues with EFL teacher education programs preparing teachers for elementary and secondary schools in Mainland China. First, the curriculum for the B.A. in English language education contains three modules: General Studies (30% of total credits), English Studies (50% of total credits), and Teacher Education Courses (20% of total credits), a distribution prevalent in English teacher training. One could argue that this distribution of credits is not optimal for a teacher training program. Second, writing courses account for only a minor portion (approx. 3.8%) of the total credits, and only about 8% of the English Studies credits. Third, due to limited class time, the writing lessons primarily aim to teach teacher trainees how to learn to write in English, with little attention paid to developing their knowledge and skills in responding to others’ written texts (such as their peers’), much less the writing of younger students. In English Language Teaching courses, there is very little instruction on how to teach writing. As a result, the teacher training program focuses on improving trainees’ own writing skills rather than promoting their instructional abilities. These issues motivate an exploration of how teaching practice influences and enhances teachers’ assessment and feedback skills. Analyzing potential changes and challenges in these competencies across teachers might provide evidence-based recommendations for teacher training in comparable situations.
Teacher Assessment in EFL Writing
Teacher assessment serves two purposes: judging what learners have achieved in writing (i.e., assessment of learning, AoL) and enhancing learners’ processes of learning to write (i.e., assessment for learning, AfL) (see Hawe & Parr, 2014; Lam, 2016; Lee & Coniam, 2013; Mak & Lee, 2014). AoL and AfL are “not mutually exclusive” (Lee, 2007, p. 182) in terms of writing assessment, despite the fact that they work in different ways. Because both AoL and AfL are organized around specific goals (AoL measures learners’ accomplishments against learning objectives, while AfL addresses the achievement of teaching objectives based on students’ authentic performance and problems), they can be integrated so as to better empower learners, promote learning, and improve teaching (Lee, 2017). Teachers, in this vision, are inextricably linked to the interrelationships among teaching, learning, and evaluation. It is therefore of critical importance to look into how teachers assess student writing in order to improve the effectiveness of writing instruction and learning. In other words, it is necessary to investigate teacher assessment literacy, that is, whether teachers are able to distinguish between the two types of assessment (formative and summative) and act appropriately according to these functions.
Many factors, such as teacher perception of scoring criteria, severity or leniency, and the complexity of the rating process, can influence teachers’ assessment of student text (Bejar et al., 2006; Lumley, 2005). Research has shown that teachers as raters differ from one another in a variety of ways, including their conformity to scoring rubrics, understanding of the criteria used in their assessment, severity or leniency in grading texts, interpretation and use of rating scales, and matching of student text to rating criteria (Lumley, 2005; Weigle, 2002; Weir, 2005). Teachers’ evaluations of student writing have also revealed certain parallels. For example, teachers tend to focus on language form, such as vocabulary and grammar (Lee, 2007, 2011), rather than content, such as the ideas, organization, and style of a text (K. Hyland & F. Hyland, 2006), and more experienced teachers are stricter with student texts than those with less teaching experience, giving significantly lower marks and more negative remarks on conceptual, organizational, and linguistic issues (Shi et al., 2003). However, research has also indicated that teachers are not entirely successful in AfL for EFL writing (Lee & Coniam, 2013).
As a result, more research into the growth of teachers’ knowledge and skills in writing assessment from pre-service through in-service is required. Only when teachers are equipped with adequate assessment literacy can they perform meaningful assessments to improve student learning.
Teacher Feedback in EFL Writing
Feedback has been viewed as “input from a reader to a writer with the effect of providing information to the writer for revision” (Keh, 1990, p. 294). That is, teacher feedback on student texts is linked to assessment for learning, as it refers to teachers’ provision of comments, problem identifications, and suggestions to help students improve their writing by revising issues of content, organization, language, and so on. Despite much dispute over the necessity and effectiveness of teacher feedback on student writing (e.g., Ferris, 1999, 2004, 2006; Truscott, 1996, 1999), numerous studies have backed up Ferris’s position and demonstrated the benefits of teacher feedback. For example, Ferris et al. (1997) argue that teacher written feedback on student text is critical to stimulating and encouraging students. Teacher feedback has been widely recognized as pedagogically beneficial (e.g., Hedgcock & Lefkowitz, 1994; Yang et al., 2006) and primarily informative in scaffolding responses and suggestions to promote improvement (F. Hyland & K. Hyland, 2001).
Research has shown that teacher feedback on student text has shifted from a predominant focus on language errors (Zamel, 1985) to issues of content and organization (see Caulk, 1994; Conrad & Goldstein, 1999). Nevertheless, research has suggested that teachers should give equal weight to content, organization, language, and other factors in their feedback on student texts (see Ferris, 2003; K. Hyland & F. Hyland, 2006). More recent studies, however, have indicated that teachers still prioritize linguistic issues when providing feedback on student writing (e.g., Furneaux et al., 2007; Lee, 2008).
It is obvious from the above that teachers give preference to local issues (e.g., vocabulary and grammar) over global features (e.g., content and organization) in their feedback, despite calls for AfL in EFL writing (Lee & Coniam, 2013) that emphasize quality feedback. As yet, however, little is known regarding the changes in teachers’ awareness, knowledge, and skills in both assessing and giving feedback on student writing as they progress through their careers. It is therefore worthwhile to investigate the latent differences and potential difficulties in responding to EFL student writing among pre-service and in-service teachers. The present study is thus expected to provide teacher educators with informative evidence and to contribute to the development of new EFL teacher training schemes relevant to teachers’ development of competencies in both AoL and AfL from pre- through in-service.
The Study
Research Questions
The purpose of this cross-sectional study is to compare pre-service and in-service teachers’ assessment of and feedback on EFL student writing in order to discover improvements and problems in teacher assessment and feedback knowledge and skills at various stages of professional development. The study is designed to answer the following research questions:
1. What is the difference between pre- and in-service teachers’ assessment of the same EFL student text?
2. How do these groups differ in giving feedback to the same English learner’s text?
3. Does their severity level of assessment influence their feedback given to the student text?
Participants
To address the research questions, a cross-sectional study was conducted with convenience samples of participants from Mainland China. Specifically, three groups of respondents at different stages of their professional development were involved: 59 pre-practicum trainees (male = 3, female = 56), 31 post-practicum trainees (all female), and 32 in-service teachers (male = 5, female = 27). All participants took part voluntarily. They were asked to evaluate a student text and to give written feedback on the same text; the tasks were completed pseudonymously and independently. As to teaching experience, the pre-practicum trainees were in their third year of a B.A. program and had no formal teaching experience at the time of data collection. The post-practicum trainee group had just completed an 8-week teaching practice in schools. The in-service teachers’ teaching experience ranged from 1 to 27 years, with novice and experienced teachers represented in roughly equal proportions: 17 teachers had less than 5 years of teaching experience, and the other 15 had more than 5 years. In terms of degrees obtained, 18 teachers had a B.A. degree in English teaching, 10 had an M.A. degree in this field, and four had no degree but had just completed a 3-year English teacher training program.
The in-service teachers are treated as one group in the analysis of data and discussion of results, owing to the homogeneity of the novice and experienced teachers’ behaviors in reacting to the EFL learner text. That is, Independent Samples T-tests revealed no significant differences in the mean values of the assessment criteria and feedback data. In the assessment task, both parties graded the overall quality of the text similarly (M = 4.25 for the experienced teachers, M = 4.12 for the novices), with small standard deviations (0.65 and 0.61, respectively), and both rated the style of the text considerably low (M = 3.87 for the experienced teachers, M = 3.65 for the novices), demonstrating the two subgroups’ unanimity on the assessment task. In the feedback task, no differences were identified in the two subgroups’ responses to the learner text either.
Instruments
Since the present study is designed to compare pre- and in-service teachers’ knowledge and skills in assessing EFL student writing, participants had to complete the same assessment task so that differences in their reactions to the same student text could be identified. For this purpose, the three groups described above were asked to take part in a simulation task assessing a single text written by an English language learner, Pat (a pseudonym). Pat’s assignment was to write a description of a place that students knew well. The descriptive text, 436 words in total, was examined closely by an expert panel beforehand. The panel agreed that the text featured numerous problems at all linguistic levels and was therefore appropriate for the feedback tasks. For the present study, Pat was contacted and informed, and the use of Pat’s text was authorized.
The simulation tasks were designed in English and then translated into Chinese, because an earlier pilot study of an English-language questionnaire targeting Chinese EFL teachers’ practicum experiences revealed that communication in the participants’ mother tongue yielded more information (Kong, 2017). For research question 1 (targeting AoL for writing), the three groups were asked to rate the text on a 5-point scale (1 = extremely poor, 5 = extremely good). The evaluation criteria comprised the overall quality of the text, content, structure, style (including word choice and expression), grammatical correctness, and mechanics, criteria that have been proposed and widely used in relevant studies (see Knoch, 2011; Lee, 2011). For research question 2 (targeting AfL for writing), the same participants were asked to indicate the errors and problems in the text using their own system of correction, such as underlining “_____,” circling “○,” or any other marks they usually use; they were also asked to write a few sentences of feedback in Chinese to Pat. The prompt explained that participants could praise certain aspects of the text, highlight problems, or make suggestions to help the student improve.
Data Collection and Analysis
Data were collected from participants in Mainland China in the autumn of 2017 using a paper-and-pencil instrument and convenience sampling. The three groups of participants were presented with the same sample text from Pat and asked to complete three tasks: (1) rate the text using the rubrics provided; (2) identify the errors and problems in the text; and (3) give written feedback on the same text.
For research question 1, the participants scored the text using the criteria stated above. Many-Facet Rasch Measurement (MFRM) was then used to analyse the participants’ ratings of the sample text across the assessment criteria.
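MFRM is typically estimated in dedicated software such as FACETS, which expresses each rater’s severity in logits based on a raters × criteria rating matrix. As a minimal sketch of the data layout involved, the following Python fragment (with invented rater IDs and scores) computes a crude raw-score severity proxy; this is only a rough stand-in for illustration, not the Rasch estimation itself.

```python
import pandas as pd

# Hypothetical long-format ratings: one row per rater x criterion.
# Rater IDs, criteria, and scores are invented for illustration.
ratings = pd.DataFrame({
    "rater":     ["R01", "R01", "R02", "R02", "R03", "R03"],
    "criterion": ["content", "grammar", "content", "grammar", "content", "grammar"],
    "score":     [4, 3, 2, 2, 5, 4],  # 5-point scale, as in the study
})

# Crude severity proxy: how far each rater's mean score sits below the
# grand mean (higher value = harsher rater). The study itself used MFRM
# (FACETS-style estimation in logits); this raw-score contrast only shows
# the shape of the computation.
grand_mean = ratings["score"].mean()
severity_proxy = grand_mean - ratings.groupby("rater")["score"].mean()
print(severity_proxy.sort_values(ascending=False))
```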
For research question 2, the participants responded to the strengths and weaknesses of the text by identifying problems, giving written feedback (positive and/or negative), and making suggestions. Qualitative data analysis typically involves a preliminary reading of the data, generating subcategories, defining categories, and revising the main categories and subcategories (Schreier, 2012). Accordingly, all responses were sorted into categories based on the marks and themes of the participants’ comments, partially following the analytical framework of feedback types by Ferris (2003). The participants’ responses were translated into English and, based on the frequencies of their reactions to the given tasks, coded into six aspects: holistic, content, structure, style, grammar, and mechanics. MANOVAs were then conducted to identify whether the three groups’ written feedback on the student text differed in terms of these factors, as sketched below.
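As a hedged illustration of this step, the following sketch runs a one-way MANOVA on hypothetical per-participant counts of feedback points in the six coded aspects. The data are simulated (only the group sizes mirror the study’s cohorts), and the column and group names are invented for the example.

```python
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(0)
aspects = ["holistic", "content", "structure", "style", "grammar", "mechanics"]

# Simulated per-participant counts of feedback points in the six coded
# aspects; group sizes mirror the study's cohorts, but all values are fake.
frames = []
for group, n in [("pre", 59), ("post", 31), ("teacher", 32)]:
    frame = pd.DataFrame(rng.poisson(lam=1.2, size=(n, len(aspects))),
                         columns=aspects)
    frame["group"] = group
    frames.append(frame)
df = pd.concat(frames, ignore_index=True)

# One-way MANOVA: do the six feedback-aspect counts differ across cohorts?
mv = MANOVA.from_formula(
    "holistic + content + structure + style + grammar + mechanics ~ group",
    data=df,
)
print(mv.mv_test())  # Wilks' lambda, Pillai's trace, etc., for the group effect
```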
For research question 3, MFRM was used to elicit the severity level of each participant based on their 5-point ratings of the same student text. The participants in each group were divided into two subgroups: severe (measure logit score > 0) and lenient (measure logit score < 0). Independent Samples T-tests were then conducted to identify differences between the two subgroups within each group, along the lines sketched below.
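A minimal sketch of this split-and-compare step, assuming hypothetical per-participant severity logits and feedback counts (all values simulated, variable names invented), might look as follows.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(1)

# Simulated per-participant records for one cohort: MFRM severity in logits
# plus the number of feedback points the participant wrote (values invented).
df = pd.DataFrame({
    "severity_logit": rng.normal(0.0, 1.5, size=59),
    "feedback_count": rng.poisson(2.0, size=59),
})

# Split on the study's cut-off: severe (> 0 logits) vs. lenient (< 0 logits).
severe = df.loc[df["severity_logit"] > 0, "feedback_count"]
lenient = df.loc[df["severity_logit"] < 0, "feedback_count"]

# Independent-samples t-test on the amount of feedback each subgroup gave.
t, p = stats.ttest_ind(severe, lenient)
print(f"t = {t:.2f}, p = {p:.3f}, "
      f"M_severe = {severe.mean():.2f}, M_lenient = {lenient.mean():.2f}")
```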
Results and Discussion
RQ1: What is the Difference Between Pre- and In-Service Teachers’ Assessment of the Same EFL Student Text?
In order to demonstrate the differences among the three groups’ assessments of the same EFL student text more clearly, it is necessary to first illustrate each group’s performance. The results of the pre-practicum trainees’ assessment are presented in Figure 1.

Figure 1. FACETS variable map of the pre-practicum trainee group in the assessment (N = 59).
The variable map in Figure 1 shows that roughly half of the pre-practicum trainees were harsh with the student text, while the other half were relatively lenient, demonstrating that the individuals varied in their degrees of severity. The model fit values (separation index = 1.72, separation reliability = .75, χ² = 213.2, p < .05) also support this result.
Results from the many-facet Rasch analysis also showed a large proportion of misfit and overfit cases. In particular, 12 of the 59 pre-practicum trainees were above the misfit threshold (infit mean square > 1.5), indicating overly inconsistent rating behaviors due to selective attention to certain scoring criteria (Bachman, 2004). Another seven pre-practicum trainees were below the overfit threshold (infit mean square < .5), exhibiting excessive consistency across the specified scoring categories, which suggests a central tendency effect on the rating scales. That is, these participants were unable to distinguish between the various scoring categories and tended to grade equally across all of them (Knoch et al., 2007). Taken together, this raises concerns about these respondents’ full understanding and proper use of the rating criteria. This diversity is therefore an important signal of the need for teacher training in assessment knowledge and skills.
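The misfit/overfit rule applied here is a simple threshold check on infit mean-square values; a small sketch with invented rater IDs and values illustrates it.

```python
import pandas as pd

# Invented infit mean-square values per rater, in the style of FACETS output.
infit = pd.Series({"R01": 0.42, "R02": 0.97, "R03": 1.63,
                   "R04": 1.12, "R05": 0.48}, name="infit_mnsq")

# Thresholds used in the paper: > 1.5 flags misfit (erratic ratings);
# < 0.5 flags overfit (overly predictable, e.g., a central tendency effect).
misfit_raters = infit[infit > 1.5].index.tolist()
overfit_raters = infit[infit < 0.5].index.tolist()
print("misfit:", misfit_raters)    # ['R03']
print("overfit:", overfit_raters)  # ['R01', 'R05']
```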
As to the criteria used in the assessment, the pre-practicum trainees were harshest on structure and most lenient in the holistic assessment of the sample text. Figure 1 also shows that the spread of measure logits across these criteria is not particularly large, ranging between 1 and −1. Still, the separation index (1.22), separation reliability (.60), and chi-square value (12.4, df = 5, p < .05) indicate that the assessment criteria differed significantly from one another in the pre-practicum trainees’ ratings. This result cuts two ways. On the one hand, the assessment criteria seemed to work well in the simulated assessment task for the pre-practicum trainees, given the sound differentiation among the criterion items; on the other hand, some respondents did not differentiate between them. This finding also provides evidence for rater training targeting teacher assessment literacy.
In the same vein, the results of the post-practicum trainees’ assessment are presented in Figure 2. It is clear from the variable map that around half of the participants rated the sample text harshly. However, a wide range of measure logit scores was found, from 4 to −4. The results revealed significant differences in the participants’ severity levels toward the student text. The model fit values (separation index = 2.06, separation reliability = .81, χ² = 165.3, p < .01) corroborated this outcome.

Figure 2. FACETS variable map of the post-practicum trainee group in the assessment (N = 31).
Results from the MFRM model also revealed a relatively high rate of misfit and overfit cases. Specifically, three of the 31 post-practicum trainees were above the misfit threshold, indicating overly inconsistent rating behaviors, and four were below the overfit threshold, indicating excessive consistency. These findings suggest that the same problems regarding writing assessment identified in the pre-practicum trainee group also appeared in the post-practicum trainees’ reactions to the sample text.
In terms of the criteria, the post-practicum trainees scored most severely on style and relatively harshly on grammar, but leniently on mechanics, content, holistic judgment, and structure. As Figure 2 shows, the range of measure logit scores for these criteria, between 2 and −1, is greater than that for the pre-practicum trainees. Also, the separation index (1.62), separation reliability (.73), and chi-square value (18.8, df = 5, p < .01) indicate that the assessment criteria differed significantly from one another in the post-practicum trainees’ ratings, suggesting that the criteria functioned properly in this group’s assessment of the same sample text. Yet, given some participants’ central tendency effect on the rating scales, this finding is also a call for teacher preparation in pre-service programs targeting teacher assessment competence.
The results of the in-service teachers’ assessment of the student text are presented in Figure 3. Most teachers were lenient toward the text, with measure logit scores below 0. Yet a wide range of measure logit scores can also be seen, extending from 5 to −5. The MFRM model fit values (separation index = 2.33, separation reliability = .84, χ² = 203.4, p < .01) also confirmed the significant differences in teachers’ severity levels.

Figure 3. FACETS variable map of the teacher group in the assessment (N = 32).
Results from the MFRM model showed a higher proportion of misfit and overfit cases than in the other two groups. Five of the 32 in-service teachers were above the misfit threshold, indicating overly inconsistent rating behaviors, and seven were below the overfit threshold, indicating excessive consistency. It seems that, within the same context, the three groups of participants at different stages of professional development were confronted with the same problems in assessing EFL student writing, which provides critical reference evidence for pre- and in-service teacher training programs.
As for the criteria, it is worth noting that the teachers also tended to be strictest on style and most lenient on content and the holistic score. The measure logit scores for these criteria range from 1 to −1. Nonetheless, the separation index (1.72), separation reliability (.75), and chi-square value (19.8, df = 5, p < .01) indicate that the assessment criteria in the teachers’ ratings differed significantly from one another. Taken together, these results indicate that the assessment criteria worked soundly in the simulation task across the three groups’ assessments of the identical sample text, though participants in each group tended to be overly consistent or inconsistent with the scoring criteria.
Having discussed the pre- and in-service teachers’ assessments separately, a comparison of the three groups’ evaluations of the same student text is now in order. A thorough examination of Figures 1 to 3 and a synthesis of the findings above reveal clearly what changes the participants experienced and what challenges they encountered in assessing the same student text.
First, the results of the three groups’ assessments suggest that the EFL learner’s text was considered to be of higher quality the longer the raters’ teaching experience. Surprisingly, this finding conflicts with former research on university teachers’ evaluation of English Majors’ writing, which found that experienced teachers tend to be more severe than less experienced ones (Shi et al., 2003). This could be because university teachers have higher expectations of Language Majors’ writing abilities, whereas the school teachers in the present study know their students’ English writing skills well and tend to be more understanding of students’ writing problems in order to encourage them to keep writing with greater self-efficacy. There is much evidence that positive feedback can have a huge influence on students’ conceptions of themselves and can accordingly boost their enthusiasm and effort in a task (e.g., Deci et al., 1999; Hattie & Timperley, 2007).
In contrast, the pre-practicum trainees generally scored the learner text most harshly, which can be partially attributed to the exam-driven culture of the study settings (cf. Yu & Suen, 2005). That is, the participants without much formal teaching experience tended to be severe on the sample text, probably because they themselves had numerous exams (e.g., TEM 4/8, introduced in the Literature Review section) to cope with and followed the rating scales strictly for the sake of preparing for such exams.
Interestingly, another change emerges from the three groups’ preferences among the given criteria in the assessment task. All participants were generally lenient in the holistic scoring, but they showed different severity tendencies in the specific domains of the rating scales. Specifically, the pre-practicum trainees rated the structure of the sample text harshly, whereas the post-practicum trainees and teachers were more severe on its style. This could be regarded as a result of their shifting responsibilities. In the initial teacher training stage, pre-service teachers are supported in improving their knowledge of their subject area (Chan, 2016). That is, the pre-practicum trainees in the current study were in a critical period of developing their writing skills, especially the conceptual aspects of writing, and therefore paid close attention to the formulation and organization of ideas. As discussed above, this might in turn help them cope with various exams, especially those requiring writing abilities. The findings, on the other hand, suggest that teaching experience, whether extensive or limited, seemed to shape or reshape perceptions of the quality of the text. With increasing teaching experience, both post-practicum trainees and in-service teachers shifted their focus to the language problems in the text. This finding is consistent with existing studies (cf. Lee, 2011). However, focusing on the surface features of writing rather than learners’ cognitive development might do little to enhance students’ writing skills.
Unexpectedly, the findings also suggest that certain participants at each phase of professional development faced challenges in accurately understanding and using the rating scales. More surprisingly, the earlier many-facet Rasch analyses revealed that the least experienced and the most experienced participants appeared to face more obstacles than the post-practicum trainee group: approximately 32% of the pre-practicum trainees and 38% of the teachers, but only 23% of the post-practicum trainees, had difficulty applying the given rubrics in the assessment task. For the pre-practicum trainee group, this can be attributed to the lack of appropriate courses in the initial teacher training programs, as introduced earlier. The teacher group may also have experienced the same curricular deficiency in the same educational setting. Furthermore, the teachers may have formed their own ways of evaluating student work after starting their teaching careers; for example, they may have tended to score holistically rather than analytically. The post-practicum trainees, after an 8-week short-term internship, were by contrast cautious about students’ performance, probably because they saw themselves as beginning teachers and approached the evaluation task more seriously and responsibly in order to “survive” (e.g., Çakmak et al., 2019). In any case, it is noticeable that one-quarter to one-third of each group did not sufficiently understand and correctly use the grading criteria, which should serve as a red flag to teacher educators. This finding gives evidence for revising the teacher training curriculum, not only improving teachers’ theories of assessment (Brookhart, 2011) at all stages but also increasing their opportunities for evaluation practice (Genç, 2016).
RQ2: How Do Pre- and In-Service Teachers Differ in Giving Feedback?
The three groups (59 pre-practicum trainees, 31 post-practicum trainees, and 32 in-service teachers) were also involved in the simulation task of giving feedback to the same text (written by Pat), including identifying its problems, providing written feedback and making suggestions to the writer. Table 1 summarizes the frequencies of the responses of the pre-practicum trainees, grouped into categories emerging from the data. They seemed to focus on grammatical and lexical issues when identifying the problems in the student text, echoing the results of previous research (e.g., Furneaux et al., 2007; Lee, 2008). They praised the vocabulary and the overall quality of the text when giving positive feedback and focused on its structure in their negative feedback. They gave more suggestions regarding the structure and grammar.
Table 1. The Pre-Practicum Trainees’ Responses to the Student Text.
However, it is notable that the pre-practicum trainees did not address the given aspects sufficiently, considering the numerous problems in the learner text. The mean number of feedback points was below one for most aspects; in particular, the means for all aspects of positive feedback, negative feedback, and suggestions were below one, and it is precisely these categories that carry meaningful feedback for improvement, that is, AfL for EFL writing. It seems that the participants may have felt uncomfortable with the given tasks or had difficulty with them. In either case, this could be an indicator for pre-service teacher education programs to attach importance to effective teacher feedback.
Table 2 summarizes the feedback provided by the post-practicum trainees. The results show that they focused on grammatical issues when identifying the problems in the student text, which is basically in line with the results of the pre-practicum trainees. They praised the content and structure of the text when giving positive feedback and highlighted grammar in their negative feedback. They gave more suggestions regarding structure and grammar. Likewise, the post-practicum trainees did not provide as much feedback as the text would warrant. It seems that the short teaching practice did not distinguish them from their pre-practicum counterparts in reacting to EFL writing. This is a further indication that pre-service teacher preparation programs should target genuine teacher feedback.
Table 2. The Post-Practicum Trainees’ Responses to the Student Text.
The teacher group’s responses are presented in Table 3. Similarly to the trainees, the teachers focused on grammatical issues when identifying problems in the student text. They commended the vocabulary of the text when giving positive feedback and focused on grammar in their negative feedback. They gave relatively more suggestions regarding structure and grammar. It is also noticeable that, like the trainee groups, the teacher group did not address the given tasks abundantly. This result is somewhat surprising, because longer teaching experience is supposed to influence, more or less, teachers’ thinking and actions in their written feedback (see Junqueira & Kim, 2013). The findings of the present study suggest that the participants’ AfL for EFL writing did not advance substantially with their growth in teaching experience. This is a critical alert for teacher training in both pre- and in-service programs.
Table 3. The Teachers’ Responses to the Student Text.
From the analyses above, it seems that all three groups of participants exhibited similar tendencies and problems when providing feedback on the student text. In particular, the problems identified as well as the feedback and suggestions given by the three groups were all quite limited in both quantity and quality. This finding corroborates previous research on pre-service teachers’ challenges in giving corrective feedback on student writing (see Guénette & Lyster, 2013). The findings also show that teachers’ skills in responding to student text do not seem to develop greatly in tandem with their teaching experience. This could be interpreted as a lack of corresponding feedback knowledge and skills (Brookhart, 2011) or a weakness in their own writing competencies (Bouchefra, 2015). Furthermore, such unexpected findings might be explained by the difficulty of discovering and correcting the problems embedded in the text (Brenes, 2017) or the fear of making mistakes (Guénette & Lyster, 2013) when providing constructive feedback. Whatever the reason, these findings are a clear indication that teachers at various levels of professional development are in urgent need of feedback-related theories and skills (Genç, 2016). Teacher trainers thus need to reexamine and reset initial teacher preparation programs to offer corresponding courses in the realm of constructive feedback.
Still, did the three groups’ responses show any discrepancies, and if so, to what extent? MANOVAs were run to identify the possible differences among the reactions of the three groups. Results showed that the pre-practicum trainees detected more problems regarding vocabulary (F = 3.38, p < .05) than the post-practicum trainee group. Furthermore, the pre-practicum trainees tended to be more positive toward the overall quality of the text (F = 3.55, p < .05) than the post-practicum trainees. No significant difference was found in the trainee and teacher groups’ negative feedback and suggestions supplied to the student text (probably due to the very limited responses in this regard).
In general, the feedback from the three groups yields dual results. On the one hand, it is not surprising that the participants in each group invariably focused on language errors in their feedback, as this has been observed in previous research around the world (e.g., Lee, 2008; Zamel, 1985). Still, it seems that the issue of effective feedback, raised decades ago, prevails today, calling for the immediate attention of teacher educators. On the other hand, it is surprising that the feedback from all three groups seemed to contribute little to the learner’s meaning-making. As introduced earlier, teacher training courses mainly target trainees’ own writing skills and leave out their knowledge and skills in assessment and feedback, and teachers’ classroom teaching primarily addresses students’ language skills in writing, resulting in teachers’ lack of appropriate and complete responses to student writing. For example, teachers often give general feedback on students’ writing in vague terms, such as “very good, perfect, well done” (see Ferris et al., 1997; Waring, 2008), which provides no valuable information for students to revise or rewrite. This lack of awareness and understanding among teachers about how to provide constructive feedback suggests that stakeholders should rethink and reinforce the effectiveness of both pre- and in-service teacher training programs.
RQ3: Does Pre- and In-Service Teachers’ Severity Level in Assessing the Same Student Text Influence Their Feedback on It?
Research question 1 targeted the assessment of learning, and research question 2 addressed the assessment for learning, both using the same sample learner text. Research question 3 aims at identifying the relationships between the assessment behaviors of the participants in these two scenarios.
Earlier, the Many-Facet Rasch Measurement (MFRM) was used to elicit the three groups’ severity levels based on their ratings of the same student text on a 5-point scale. Then the participants in each group were divided into two subgroups: the severe group (measure logit score > 0) and the lenient group (measure logit score < 0) (see Table 4).
Table 4. Severe and Lenient Participants in Each Group.
Independent Samples T-tests were conducted to compare the severe and lenient subgroups within each group on the feedback they provided. Only a handful of differences emerged. Among the pre-practicum trainees, the severe subgroup identified more vocabulary problems (t = 2.22, p < .05) than the lenient one (M = 2.9 vs. M = 1.67). Similarly, among the post-practicum trainees, the severe subgroup identified more vocabulary problems (t = 3.07, p < .05) than the lenient one (M = 1.92 vs. M = 0.56). Teachers in the severe subgroup made more suggestions regarding the structure of the text (t = 2.81, p < .05; M = 0.31) than their lenient counterparts, who did not address this issue in their written feedback at all.
In sum, the participants’ severity level did not appear to have an extensive influence on the three cohorts’ written feedback, probably owing to the very limited quantity and poor quality of their responses to the student text. The results nonetheless offer a glimpse of which features of the text teachers regard as being under the writer’s control (and as learnable from teacher feedback and from no other source).
Implications and Conclusions
The current study examined how teacher assessment literacy develops in the Chinese context. Although the small sample size of each group makes generalization of the findings impractical, the results of the three groups’ assessment and feedback suggest that teachers at various stages of professional development give preference to AoL for EFL writing. The findings of the assessment task nonetheless show a salient change in the participants’ severity toward the sample text from pre- through in-service: in-service teachers tend to be more understanding of student writing than the two groups of pre-service trainees. Yet there are clear signs of inconsistent rating behaviors and of a halo effect among respondents within each group. This is an indication that assessment literacy specific to teaching EFL writing is an area worth including in pre-service and in-service teacher education programs (Genç, 2016; Guénette & Lyster, 2013).
In terms of teacher feedback, the findings suggest that the trainee and teacher groups focused on similar text features. The three groups generally over-emphasized language-related issues, which are easier to judge and respond to, while seeming to overlook the more conceptual problems of the student text. The findings also suggest that neither teaching experience nor raters’ severity level has a systematic effect on the three groups’ feedback, which is somewhat unexpected but still reasonable given numerous studies showing that EFL teachers prefer to focus primarily on language errors when providing written feedback on student writing (see Cheng & Zhang, 2021; Rao & Li, 2017). This suggests that Zamel’s (1985) call to make assessment and feedback contribute to students’ meaning-making still seems pertinent.
It can be drawn from the findings that the three groups’ reactions to the student text are in large part influenced by their cultural, educational, and institutional settings and values (Lee, 2007). Specifically, the product-based method of writing instruction prevails in the examination-driven culture of learning and instruction, and as a result, classroom teaching and even the corresponding writing assessment narrow their focus mainly to lexical and syntactic issues. The imbalanced proportion of language teacher education courses, moreover, seems insufficient to prepare teachers well in these areas of writing teaching and assessment. As such, teachers face a significant barrier in attempting to use AfL for EFL writing, owing to a lack of appropriate evaluation skills and insufficient English language education programs. It is clear that the three groups of participants in the present study, at different professional development phases, seem spontaneously more familiar with the conventional practices of assessing writing, that is, AoL for EFL writing, but remain hesitant from pre- through in-service to use the more meaningful assessment and feedback, that is, AfL for EFL writing.
Future research could adopt a longitudinal design with a larger sample size to investigate the genuine evolution of teachers’ (including pre-practicum trainees’, post-practicum trainees’, novice teachers’, and experienced teachers’) assessment and feedback practices, particularly in the light of AfL for EFL writing. Further research should also include more diverse text types, such as informative and persuasive essays, in order to examine teachers’ writing and writing-instruction-related knowledge in greater depth and to provide a more complete picture of teachers’ knowledge of assessment and feedback in EFL writing, as well as their ongoing development from pre-service through in-service.
Acknowledgements
The authors would like to thank the anonymous reviewers for their insightful comments and constructive guidance in the revision of this manuscript. The authors also thank all pre- and in-service teachers for their participation.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Chongqing Municipal Social Sciences Planning (Key) Project (2021NDZD13).
