Abstract
Technological innovations have promised a great deal to language teachers over the years, raising expectations but not always delivering the benefits we hope for. The potential of automated writing evaluation systems and generative artificial intelligence tools such as ChatGPT, however, might change this. Their ability to relieve teachers of hours of marking while providing instant local and global written corrective feedback across multiple drafts targeted to student needs and in greater quantities seems too good to ignore. In this short ‘viewpoint’ paper, I explore the main pros and cons of these developments and ask if generative artificial intelligence (GenAI) tools are just robotic marking machines or whether they actually help improve our feedback and the writing skills of our students.
Introduction
Technological innovations have often been greeted with caution by language teachers suspicious of both the hype which tends to surround them and the threats they can pose to the integrity of student work. However, with ever-increasing class sizes and constant admonishments to provide ever more – and more useful – feedback on students’ assignments, teachers might welcome the arrival of generative artificial intelligence (GenAI) applications in classrooms. Their potential to relieve teachers of hours of marking while providing instant local and global written corrective feedback across multiple drafts targeted to student needs and in greater quantities seems too good to ignore. In this short ‘viewpoint’ paper, I want to explore the main pros and cons of these developments, particularly in higher education contexts, and ask if GenAI tools are just robotic marking machines or whether they actually help improve our feedback and the writing skills of our students, whether L1 or L2 learners.
GenAI, Feedback, and Writing Instruction
The benefits of feedback on student written work are now well attested in the literature and need no cheerleading from me. Teachers are encouraged to provide feedback that is timely, personalized, and detailed (Hattie and Timperley, 2007), that encourages student engagement (Carless, 2016), and that contains doable recommendations for improvement (Ferris and Kurzer, 2019). Delivering on these strictures, however, is less straightforward and imposes heavy demands on teachers already burdened with substantial workloads. A recent survey of US teachers, for example, found they spent 9.9 hours per week grading, and that 32% had seriously considered leaving the profession in the past year because of this (Learnosity, n.d.). In the UK, a National Audit Office report highlighted excessive marking workloads as a key reason teachers leave (Hudson, 2025). Nor is delivering such high-quality feedback always practical, particularly in the context of large-scale assessments such as the Test of English as a Foreign Language (TOEFL), massive open online courses (MOOCs), or large class sizes.
Automation has the potential to change all this and, in the last few years, we have seen new digital resources riding to the rescue, promising a new dawn of support for teachers suffering from feedback burnout. There are huge potential benefits to AI's ability to correct and explain language use, offer example sentences and translations (Kohnke, 2024), and scaffold and review students’ argumentative writing (Su et al., 2023). Recent developments have produced tools which can create automatic translations, error corrections, and automated scoring systems. It is now relatively straightforward for teachers to use these programmes to give feedback on student texts through text exemplars for specific content and linguistic needs, to generate test items, and to foster learner autonomy through user inquiries, content creation, and feedback with metalinguistic explanations (Godwin-Jones, 2024). More generally, in a recent review of 24 studies, for example, Khalifa and Albadawy (2025) identify six domains where AI helps academic writing and research: (1) facilitating idea generation and research design, (2) improving content and structuring, (3) supporting literature review and synthesis, (4) enhancing data management and analysis, (5) supporting editing, review, and publishing, and (6) assisting in communication, outreach, and ethical compliance.
So far, so good, but there is, as usual, vinegar in the salad oil. Attracting most controversy, of course, is GenAI's apparent ability to instantly produce text in an appropriate register across any genre or discipline through a simple natural language prompt. No teacher is unaware of the risks here. Feedback occurs in a context of instruction, and AI feedback is intimately related to text generation itself. The worry is that students might submit AI-generated texts as their own – and so far it has proved impossible to identify these texts with any certainty (Gao et al., 2023). A recent survey of 3017 high school and college students in the US, for example, found that almost one-third confessed to using ChatGPT for assistance with their homework (Pudasaini et al., 2024). The rise of large language models (LLMs) such as GPT-4, Claude 3.5 Sonnet, and Gemini has therefore led to a surge in academic misconduct and ‘an ongoing technical arms race between detection technologies and evasion tactics’ (Pudasaini et al., 2024).
While studies show that GenAI use is difficult to automatically detect, even with specialist tools such as DetectGPT, RADAR, and GPT-Sentinel, automatically generated texts might not always deliver what is hoped for. Accompanying citations, references, and even content may be factually incorrect (or ‘hallucinated’), and AI writing can seem awkward, impersonal, and shallow. Work I’ve been doing with Kevin Jiang (Jiang & Hyland, 2025a, 2025b, 2025c), for example, shows that ChatGPT produces impressively coherent academic texts. However, to do so it uses a narrower, more repetitive range of lexical bundles, significantly fewer epistemic and attitudinal stance markers – particularly questions and personal asides – and exhibits far less authorial presence in its essays compared with student writers. Research also points to the negative impact of AI use on critical thinking, authorship, and academic integrity (Crompton et al., 2024), which may deprive students of learning opportunities (Barrot, 2023).
So not everything in the AI garden is rosy. GenAI tools, of course, are not human beings. They have only limited topic comprehension, a restricted contextual awareness, an inability to critically assess information, and a deficiency in higher-order thinking skills. But while they are still far from capturing the subtleties of human writing, LLMs have considerable potential to deliver personalized feedback at scale.
Automating Writing Evaluation
We arrive at this point after several years of seeing improvements in automated writing evaluation (AWE), a feature which emerged at the turn of the century to provide students with instant scoring and corrective feedback (Warschauer and Ware, 2006). While early versions were criticized for their over-reliance on surface-level corrections, neglect of rhetorical features, and limited feedback specificity (e.g., Ranalli et al., 2017), recent renderings show much more promise.
Overall, studies show that AWE has a positive effect on writing development, although with some reservations (see Zhai and Ma, 2023, for a meta-analysis). AWE can encourage L2 students to improve the quality of their L2 drafts, with reduced errors, longer texts, and higher scores (e.g. Zhang and Hyland, 2018). Learners see the process as empowering as they have control over their revising and gain the confidence of submitting error-free (or reduced error) work to a teacher. Zhang and Hyland (2018) found that students saw the opportunity to revise an essay multiple times at their own pace without the need to wait as a major advantage of AWE feedback, while other studies have shown that it can promote multi-drafting and learner autonomy (Chen and Cheng, 2008). However, Stevenson and Phakiti (2019) discovered that while L2 students using AWE reduced their errors across drafts of the same assignment, this learning did not transfer well across tasks.
Teachers who encourage their students to use these tools will know that they offer some relief from the drudgery of mundane grammar correction. Principally this is because students can submit an assignment to the programme as many times as they like to improve their score before the teacher sees it. Once a student has reached a threshold score set by the teacher, the teacher can then read the corrected draft without struggling through mechanical errors, perhaps augmenting this with both teacher and peer feedback. Students sometimes receive AWE feedback together with teacher comments, either separately (Zhang and Hyland, 2018) or inserted through the AWE system (Grimes and Warschauer, 2010). When students use AWE first in this way it allows teachers to spend more time on organization, content, and critical thinking issues (Wilson and Czik, 2016). Overall, it seems that positive student and teacher perceptions are greater where the software is used regularly for pre-writing and drafting. In fact, the success of AWE may well depend on how it is integrated into L2 classrooms.
Teachers, then, are not completely off the hook when it comes to marking student writing. AWE systems seem to be particularly effective when they are used in conjunction with teacher and peer feedback. One line of research has dichotomized teacher and peer feedback (e.g., Murillo-Zamorano and Montanero, 2018), or opposed teacher feedback with computer-generated feedback (e.g., Dikli and Bleyle, 2014). But these dichotomies fail to reflect the realities of student learning and ignore the fact that in real classrooms students often have access to more than one type of feedback. Combining AWE with teacher feedback seems to yield greater improvements in writing performance (Han and Sari, 2024), with teachers offering more substantive, higher-order feedback (Wilson and Czik, 2016). Han and Li (2024), for example, asked over 100 students to complete two writing tasks with corrective and holistic feedback provided by ChatGPT and later modified by teachers; the students incorporated more of this co-produced feedback into their subsequent revisions.
A key aspect of the process is
GenAI: The New Kid on the Block
Feedback-capable GenAI systems have only been around since late 2022 with the emergence of more flexible, general-purpose LLMs such as ChatGPT and Gemini. GenAI represents a paradigm shift in both capability and pedagogical potential with the ability to identify infelicities ranging from spelling and punctuation (e.g., Fokides and Peristeraki, 2024) to language and content (e.g., Meyer et al., 2024). Based on different technological principles, involving transformers and neural nets rather than statistical models, GenAI promises to go beyond feedback on spelling and grammar to include coherence, tone, content, and critical thinking. It creates more personalized and interactive responses than AWE programmes and can mimic Socratic feedback which can guide revision, offer models, and explain reasoning. Ideally, then, GenAI programmes’ ability to create text and grasp context offers a promising basis for feedback adapted to different tasks. The fact that they can follow simple prompts enables teachers to integrate rules in the feedback and so bridge students’ need for personalized support and teachers’ involvement in the process.
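To make the point concrete, integrating teachers’ rules into the feedback process can be as simple as embedding them in the prompt sent to the model. The sketch below is purely illustrative – the function, rubric items, and sample essay are hypothetical, not drawn from any study cited here – and shows only the prompt-construction step, not any particular provider’s API:

```python
def build_feedback_prompt(student_text: str, teacher_rules: list[str]) -> str:
    """Compose an LLM prompt that embeds a teacher's feedback preferences.

    The rules are injected as explicit instructions so the model's
    feedback reflects the teacher's priorities rather than generic advice.
    """
    rules = "\n".join(f"- {rule}" for rule in teacher_rules)
    return (
        "You are a writing tutor. Give formative feedback on the essay below.\n"
        "Follow these teacher-set rules:\n"
        f"{rules}\n\n"
        f"Essay:\n{student_text}"
    )

# Hypothetical usage: a teacher prioritizes higher-order issues and
# caps the amount of feedback to avoid overwhelming the student.
prompt = build_feedback_prompt(
    "Climate change are a big problem for all country...",
    [
        "Comment on argument structure before grammar",
        "Suggest improvements; do not rewrite the student's sentences",
        "Limit feedback to three points",
    ],
)
```

The resulting string would then be passed to whichever LLM the institution uses; the design choice is simply that the teacher, not the tool, sets the feedback agenda.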
These are early days, but research on using LLMs for automated feedback has so far been encouraging. Guo and Wang (2024), for example, found that ChatGPT generated a greater quantity of feedback than EFL teachers and that this was more balanced across content, organization, and language. Banihashem et al. (2024) reported that ChatGPT provided more detailed feedback on argument structure than peers, and Wang et al. (2024) found that the tool offered more comprehensive feedback on students’ argumentative texts. Users may also respond more positively to LLM-generated feedback than to that given by a teacher (Wan & Chen, 2024) or peers (Zeevy-Solovey, 2024). Chinese students in Li et al.'s (2024) study, for example, rated ChatGPT-4's written feedback as more relevant to their specific needs than teachers’ more general comments, and as comprehensive, addressing content, organization, and language-related issues.
Before launching a celebratory fireworks display, though, we need to add a note of caution. The ability of digital tools to deliver useful feedback obviously depends on their effectiveness in analysing texts, but this is far from assured. In a recent paper, for example, Curry et al. (2024) found that ChatGPT-4 did a poor job of categorizing keywords in specialized texts, made false inferences about concordance lines, and failed to identify and analyse direct and indirect questions, making function-to-form analysis problematic. Worse, Yoon et al.'s (2023) study discovered that most feedback sentences generated by ChatGPT were highly abstract and generic, failing to provide concrete suggestions for improvement. Moreover, its accuracy in detecting major problems, such as repetitive ideas and the inaccurate use of cohesive devices, depended on superficial linguistic features and was often poor.
There also seem to be significant differences between the scores given by human raters and those generated by GenAI when grading texts, with ChatGPT being a significantly tougher, although more consistent, marker than teachers (Topuz et al., 2025). Another problem is that much of the research has been conducted outside of realistic learning contexts in experimental situations, raising questions about its usefulness to classroom teachers (e.g., Steiss et al., 2024). We might also want to rethink the reliance on large proprietary LLMs like OpenAI's GPT at the expense of smaller open-source models.
So Where Do We Go from Here?
Overall, then, it is now becoming clear that automation is not (yet?) the answer to our prayers and that we cannot just assume that automated marking means automatic improvements in student writing. We have to consider a range of complex factors such as learner engagement, digital literacy, and the role of teachers in the process. Students generally express a desire for richer feedback that includes different modes (Henderson et al., 2021), and there are question marks over the ability of general-purpose LLMs to generate reliable feedback in fields where knowledge of discipline-specific rhetorical conventions is needed (Capellini et al., 2024). In fact, GPT's ready-to-use out-of-the-box chat version, which relies on prompting to shape feedback, may not be the most effective use of AI at all. Research is starting to show that fine-tuning the model may lead to better results (Mazzula and Bullet, 2024). The fact that AI models are able to follow instructions means that they can be programmed to integrate teachers’ preferences in the feedback and thus connect students’ needs for individual support with teachers’ involvement in the process. This kind of fine-tuning, however, requires technical expertise which teachers often lack, running the risk of sidelining our direct involvement as teachers and ceding it to specialist techies.
More problematic, I think, is the key assumption underlying a lot of this triumphant cheerleading for GenAI. It seems to me that much current work sees feedback as a somewhat mechanical process aimed simply at improving student texts rather than encouraging the human activity of learning. What seems crucial, at this stage of tech-assisted feedback, is the need to move away from what has been something of an obsession with improving texts to how we can improve writers. While considerable attention has been devoted to the quality of feedback generated by AWE and GenAI, their future capacity to reform feedback processes is also significant. Technology-enabled feedback potentially allows students, in collaboration with teachers, to become more actively involved in their feedback and how they use it while gaining greater feedback awareness and digital literacy skills. How, in other words, can GenAI be leveraged in the service of developing writers rather than drafts?
There is a tendency in all of this to focus attention on the tools rather than the learners themselves and the skills they need to engage with automated feedback effectively. Navigating AI tools requires an additional skill set that may not be intuitive, especially for students with limited digital literacy. This poses a growing challenge, aggravated by language barriers and the widening digital divide (Warschauer et al., 2023). Recent research suggests that many students miss the ‘human touch’ that teacher-provided feedback offers and stresses the important role of people in the process (Han and Li, 2024; Teng, 2024). At its most effective, formative feedback is more than information transmission: simply providing students with advice on their texts. Teachers recognize that feedback is a dialogue between students and teachers designed to encourage reflection and growth. This understanding of feedback as a social practice requires a greater role for teachers in the process, making it crucial that we leverage AWE and GenAI to create more interactive and collaborative feedback loops rather than static, one-way advice.
I leave the final word to GenAI itself: ‘ChatGPT should be used as a starting point, not a final arbiter of writing quality’ (ChatGPT, 2025).
