
Abstract

Since ChatGPT's emergence, extensive research has explored the role of generative artificial intelligence (GenAI) in delivering written feedback (WF), encompassing diverse aims, methodologies and findings. This includes studies examining second language (L2) contexts in which such feedback is used, although, to date, no attempt has been made to synthesize these studies for cumulative knowledge building. To clarify the state of the art in GenAI-influenced L2 WF research, this scoping review, informed by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses, explores a dataset of 51 such studies taken from Social Sciences Citation Index/Emerging Sources Citation Index-indexed publications since 2022. Two researchers manually coded these studies for data regarding publication outlets, country/region and L2 focus, research aims, research methods, findings and identified (or author-disclosed) limitations. Findings reveal that most studies address improvements to writing quality arising from GenAI-produced feedback, and/or students' and teachers’ perceptions of such feedback. The diverse set of methods includes revision analysis, pre-tests/post-tests of writing quality and quantitative surveys. Results cover improvements in writing quality or skills, mixed perceptions, and varied feedback uptake and revision behaviours, particularly when comparing artificial intelligence and human feedback. We close by identifying gaps in cumulative knowledge and suggesting directions for future research.

Keywords

Generative artificial intelligence, second language writing, written feedback, written corrective feedback, scoping review

Introduction

The field of second language (L2) writing has already seen fundamental changes in the wake of generative artificial intelligence (GenAI), as researchers, practitioners and students struggle to tackle the technological, ethical, and pedagogical challenges that GenAI has raised. Nowhere is this more apparent than in studies on written feedback (WF). This area is no stranger to artificial intelligence (AI)-focused research predating GenAI, for example, studies on written corrective feedback (WCF) tools such as Grammarly or corpus/natural language processing-based automated writing evaluation (AWE) techniques for L2 writing assessment. However, a rapidly increasing number of studies now focus on the provision of WF on L2 writing involving GenAI applications such as ChatGPT.

Despite the availability of syntheses covering WF for L2 writing prior to GenAI (Shi and Aryadoust, 2024) and syntheses of GenAI-based feedback for first language (L1) writing (Lee and Moore, 2024), there is, to the best of our knowledge, as yet no systematic review focusing specifically on GenAI WF for L2 writing. By WF, we refer to studies covering: (a) use of GenAI for general feedback on L2 writing (including feedback on grammar, lexis, content and organization); (b) GenAI WCF (specifically covering feedback on errors and their resolution); (c) GenAI AWE (covering evaluations of writing quality, typically against an assessment rubric); or (d) any combination of these.

While recent L2 GenAI reviews have begun to touch on L2 WF, they do so only in passing. M. Li's (2024) focused literature review examines various aspects of GenAI for L2 writing including development, collaborative processing, teacher feedback and assessment, with WF discussed only as one strand within that wider landscape. Similarly, S. Li's (2025) review explores topics such as prompt engineering, assessment, and discourse comparison. Feedback again forms only one component of a broader synthesis, finding that GenAI feedback is more focused on organization while instructor feedback is more focused on substance, together with a necessity for further research involving criteria-based GenAI feedback practices and GenAI feedback quality evaluation. However, neither review provides a systematic, dedicated analysis of GenAI-generated WF for L2 writing. Therefore, in a contribution to collective knowledge building in this space, the present study offers the first Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA)-informed scoping review that concentrates exclusively on this topic. By systematically mapping 51 empirical studies, coding their publication venues, countries/regions, languages, research objectives, methodologies, findings and limitations, this review extends prior syntheses by offering a comprehensive evidence base to guide both future research and pedagogical practice for GenAI-focused L2 WF.

Method

Typically, scoping reviews follow a PRISMA approach (Page et al., 2021). PRISMA is an evidence-based set of reporting standards designed to improve the transparency and completeness of reporting in systematic reviews and meta-analyses. It is not a methodological guideline for conducting reviews but a framework for transparent reporting, often including a flow diagram to document the study selection process. Figure 1 presents the identification, screening and eligibility statistics for our study, following PRISMA 2020 guidelines.

Figure 1.

Identification, Screening, and Eligibility Statistics.

Identification

We first searched Web of Science and Scopus for relevant Social Sciences Citation Index/Emerging Sources Citation Index (SSCI/ESCI)-indexed studies published up to 23 April 2025. Major keywords incorporated ‘Generative Artificial Intelligence,’ ‘Generative AI,’ ‘GenAI,’ ‘GAI,’ ‘Large Language Model,’ ‘LLM,’ or ‘ChatGPT,’ paired with ‘feedback,’ ‘writing/written feedback,’ ‘corrective feedback,’ ‘error correction,’ ‘error feedback,’ ‘grammar correction,’ ‘direct feedback,’ ‘indirect feedback,’ ‘metalinguistic feedback,’ ‘unfocused feedback,’ ‘comprehensive feedback,’ ‘focused feedback,’ ‘reformulations,’ ‘response,’ ‘comment,’ ‘editing,’ or ‘revision.’ We also utilized the snowballing technique to ensure comprehensiveness (Biernacki and Waldorf, 1981). An initial search yielded 123 pertinent SSCI/ESCI-indexed articles, of which 105 studies remained after removing 18 duplicate articles before screening.
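The pairing of the two keyword groups described above can be sketched as a Boolean search string. This is a minimal illustration only: the article does not report the exact query syntax submitted to Web of Science or Scopus, so the function names, the quoting, and the split of ‘writing/written feedback’ into two terms are all assumptions.

```python
# Hypothetical reconstruction of the search logic; the exact database
# syntax used by the authors is not reported in the article.
GENAI_TERMS = [
    "Generative Artificial Intelligence", "Generative AI", "GenAI", "GAI",
    "Large Language Model", "LLM", "ChatGPT",
]

# 'writing/written feedback' is split into two terms here (an assumption).
FEEDBACK_TERMS = [
    "feedback", "writing feedback", "written feedback", "corrective feedback",
    "error correction", "error feedback", "grammar correction",
    "direct feedback", "indirect feedback", "metalinguistic feedback",
    "unfocused feedback", "comprehensive feedback", "focused feedback",
    "reformulations", "response", "comment", "editing", "revision",
]


def or_group(terms):
    """Quote each term and join the group with OR inside parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(group_a, group_b):
    """Require at least one term from each group (group A AND group B)."""
    return f"{or_group(group_a)} AND {or_group(group_b)}"


query = build_query(GENAI_TERMS, FEEDBACK_TERMS)
print(query)
```

Keeping the two groups as separate lists makes it straightforward to report, extend, or rerun the search when the dataset is updated.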

Screening

Regarding screening, for inclusion into the dataset, studies had to focus specifically on: (a) L2 writing; (b) L2 written (corrective) feedback, or at least AWE containing WF rather than scores alone; and (c) the use of generative AI in the provision of said feedback, either pre-writing, during-writing or post-writing. GenAI in this study is defined as the use of large language models that can produce human-like text to provide automated, contextually relevant WF for L2 writing. This separates GenAI from other forms of AI such as AWE scoring, automated speech recognition, or rule-based (non-predictive) AI feedback systems such as grammar checkers or template-based feedback tools.

Papers were excluded for the following reasons: no specific focus on WF (n = 7); no identifiable L2-specific (or L1 plus L2) writing focus (n = 35); the study was not about AI (n = 1); the topic did not relate to L2 writing (n = 4); the focus was solely on AWE scoring without accompanying WF (n = 2); the AI used was not generative (n = 1); the paper was a software demonstration (n = 1) or a review article (n = 1); or the study focused on L1/L2 translation (n = 2).
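The identification and exclusion counts reported above can be cross-checked with a short tally. The sketch below uses only the figures stated in the text; the variable names and the shortened reason labels are illustrative.

```python
# Figures taken from the Identification and Screening sections;
# labels are abbreviated paraphrases of the stated exclusion reasons.
identified = 123
duplicates = 18

exclusions = {
    "no specific focus on WF": 7,
    "no identifiable L2-specific writing focus": 35,
    "not about AI": 1,
    "topic unrelated to L2 writing": 4,
    "AWE scoring only, no WF": 2,
    "AI not generative": 1,
    "software demonstration": 1,
    "review article": 1,
    "L1/L2 translation": 2,
}

screened = identified - duplicates      # records entering screening
excluded = sum(exclusions.values())     # total excluded across all reasons
included = screened - excluded          # studies retained in the dataset

print(screened, excluded, included)     # 105 54 51
```

The result matches the 51 studies reported for the final dataset.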

Both researchers read each paper individually to determine its inclusion in or exclusion from the dataset, separately colour-coding article entries in a first round of review, then meeting to discuss all agreed exclusions and any disagreements. The first round led to 49 agreed exclusions, with six disagreements proceeding to a second round of coding. The remaining six articles were excluded in the second round, for a final dataset of 51 articles. Quantitative inter-coder reliability checks were not considered necessary for two reasons: such statistics are prone to fluctuation with small samples, and every article was read and discussed by both coders, with disagreements resolved in person following collaborative reading and re-reading of the articles.

Each article was then manually re-read in its entirety (including the abstract, introduction, research questions, methods, results, etc.) to determine: (a) country/region and language focus (where not clear from the metadata); (b) stated research aims; (c) research methods and procedures, including feedback type(s) (WF, WCF and AWE) and feedback timing (pre-writing, during-writing, post-writing); and (d) reported study findings. The researchers also made notes on identified (or author-disclosed) methodological limitations, for example, small participant samples, lack of control group in experimental designs, or limited information about GenAI prompts used in each study.
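One way to picture the coding scheme described above is as a structured record per study. The sketch below is purely illustrative: the class and field names are assumptions rather than the authors' actual coding sheet, and the example record is invented to show the shape of the data, not drawn from the dataset.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Set


class FeedbackType(Enum):
    WF = "written feedback"
    WCF = "written corrective feedback"
    AWE = "automated writing evaluation"


class Timing(Enum):
    PRE = "pre-writing"
    DURING = "during-writing"
    POST = "post-writing"


@dataclass
class CodedStudy:
    """Hypothetical record mirroring coding categories (a) to (d) plus limitations."""
    citation: str
    country_region: str
    target_language: str
    aims: List[str]
    methods: List[str]
    feedback_types: Set[FeedbackType]
    timing: Set[Timing]
    findings: List[str] = field(default_factory=list)
    limitations: List[str] = field(default_factory=list)


def tally_timing(studies):
    """Count how many coded studies involve each feedback timing."""
    return Counter(t for study in studies for t in study.timing)


# Invented placeholder record, not an actual study from the dataset.
example = CodedStudy(
    citation="Example et al. (hypothetical)",
    country_region="(placeholder)",
    target_language="English",
    aims=["writing quality"],
    methods=["pre-test/post-test"],
    feedback_types={FeedbackType.WCF},
    timing={Timing.POST},
    limitations=["prompts not reported"],
)

print(tally_timing([example]))
```

Storing feedback type and timing as sets allows a single study to be coded under more than one category, which the review reports for several studies.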

Results

Publication Outlets

The dataset includes 35 distinct publication outlets. Journals with multiple GenAI and L2 WF studies include Education and Information Technologies (n = 5), Computer Assisted Language Learning (n = 4), Journal of Second Language Writing (n = 3), and several others (n = 2 each) such as System, Cogent Education and Interactive Learning Environments. Thirty outlets were listed in the Social Sciences Citation Index, 19 in the Emerging Sources Citation Index, and one in the Science Citation Index Expanded.

Country/Region and Language Focus

Most studies targeted L2 English (n = 46), though L2 Chinese and German also featured, along with a comparative study of L1 Greek and L2 English. Learners’ L1s were predominantly Chinese (n = 26) and Arabic (n = 10), with individual studies covering Japanese, Indonesian, Greek, Finnish, Turkish, Czech and Vietnamese. Five studies involved multilingual cohorts.

The research was mostly conducted in Mainland China (n = 16), followed by Hong Kong SAR (n = 7) and Macau (n = 3). Other Asian countries/regions represented include Taiwan (n = 2), along with Japan, Indonesia, Vietnam and India with one representative study each. In the Middle East, studies came from Saudi Arabia (n = 5), Iran (n = 2), Qatar (n = 1) and Iraq (n = 1). Western countries/regions represented include the United States (n = 2), Turkey (n = 2), and one each from the Czech Republic, Greece, Australia and Finland (see Figure 2).

Figure 2.

Geographical Location of Featured Generative Artificial Intelligence Second Language Written Feedback Studies.

Research Aims

A diverse set of research aims and questions were present across the studies in our dataset, which can be grouped into several overarching themes.

The largest group of studies (n = 16) aimed to evaluate the impact of ChatGPT-generated feedback on improving L2 writing performance. This included improvements in overall writing quality, proficiency, or specific features such as grammar, lexis and organization (e.g., Alanazi et al., 2025; Tsai et al., 2024). Some studies assessed writing outcomes using standardized rubrics (e.g., International English Language Testing System, Chan et al., 2025), while others focused on targeted domains such as argumentation (Luo et al., 2025).

A significant number of studies (n = 15) investigated student and/or teacher perceptions of AI-generated feedback, covering affective and attitudinal aspects including satisfaction, motivation and emotional responses (e.g., Kurt and Kurt, 2024; Teng, 2024). A notable sub-trend is that of studies taking a multifactorial approach to engagement with GenAI feedback, for example, behavioural, cognitive and affective forms of engagement (e.g., Koltovskaia et al., 2024; Zhan and Yan, 2025).

Eleven studies compared the effectiveness, quality, or uptake of AI-generated WF versus teacher or peer feedback (e.g., Asadi et al., 2025; Guo and Wang, 2024; Lin and Crosthwaite, 2024; Zou et al., 2025). Aims included evaluating differences in feedback types (e.g., direct, indirect and metalinguistic) and their impact on revision practices and writing outcomes.

Relatedly, several studies (n = 7) specifically examined students’ engagement with AI feedback during revision processes, such as acceptance/rejection patterns, the focus on form versus content, and factors influencing feedback uptake (e.g., Chen et al., 2024; Tran, 2025; Yan and Zhang, 2024).

A smaller number of studies (n = 5) focused on the accuracy or reliability of AI feedback, particularly for error detection or correction (e.g., Alsaweed and Aljebreen, 2024; Naz and Robertson, 2024; Yang and Chen, 2025). This includes assessing ChatGPT's performance across languages (e.g., English vs. Greek, Chinese) or model versions (e.g., ChatGPT-3.5 vs. 4.0).

Finally, a few studies explored specific or novel use cases of AI feedback. These included its role in brainstorming or enhancing coherence and grammar (e.g., Arifin et al., 2024; Su et al., 2023), the effects of prompting strategies on feedback quality (e.g., Tam, 2024) and comparisons between collaborative and individual engagement with AI feedback (Yan, 2024). A small subset (n = 3) focused on teachers’ use of AI in their feedback practices, including comparisons between AI-supported teacher feedback and AI-only feedback (e.g., Asadi et al., 2025; Han and Li, 2024; Yao et al., 2025).

Research Methods and Procedures

Feedback Type and Timing

Thirty-two studies were classified as involving GenAI-produced WF, with 11 studies identified as featuring exclusively WCF. Four studies explicitly combined WF and WCF, and three explicitly combined AWE grades with WF. With regard to feedback timing, most studies provided feedback post-writing (n = 45). Only one study (Shan et al., 2025) explicitly discussed the use of GenAI WF at the pre-writing stage, while two (Allen and Mizumoto, 2024; Arifin et al., 2024) were identified as involving students’ use of GenAI feedback during writing. One study (Su et al., 2023) specifically outlined how GenAI feedback could be used at all stages of the writing process.

Research Methods

Many studies (n = 17) used pre/post quasi-experimental designs to assess writing quality or proficiency in writing produced before and after receiving AI-generated feedback (e.g., Alanazi et al., 2025; Chan et al., 2024, 2025; Mahapatra, 2024; Tsai et al., 2024). These typically involved comparing writing scores or drafts (pre-intervention and post-intervention) to measure improvements in specific aspects such as grammar, organization, or overall quality. Six studies also examined revisions made to writing drafts based on AI feedback, often analysing changes in form, content, or error correction (e.g., Chen et al., 2024; Long, 2024; Tran, 2025) through tracking revision operations or comparing drafts.

Surveys were also used in 16 studies to collect data on student or teacher perceptions, attitudes, motivation, or feedback literacy (e.g., Abduljawad, 2024; Guo and Wang, 2024; Teng, 2024). Interviews, often semi-structured, were also employed in 16 studies to explore perceptions, experiences, or challenges with AI feedback (e.g., Arifin et al., 2024; Kurt and Kurt, 2024; Zou et al., 2025), providing in-depth qualitative data on engagement or feedback uptake.

Observation was used to study student interactions with AI feedback or classroom dynamics (e.g., Abduljawad, 2024; Arifin et al., 2024; Yan, 2024). Often, this complemented other methods such as interviews or surveys. Stimulated recall sessions, where participants reflected on their writing or feedback processes (often while reviewing drafts or screencasts), were used to explore cognitive and behavioural engagement (e.g., Koltovskaia et al., 2024; Long, 2024; Yeung, 2025).

Several studies (n = 7) analysed corpora of student writing or feedback (AI and/or human) to evaluate error correction, feedback types, or quality (e.g., Alsaweed and Aljebreen, 2024; Fokides and Peristeraki, 2024; Li et al., 2024). Less common methods included screencasts to capture real-time interaction with AI feedback (e.g., Koltovskaia et al., 2024; Yeung, 2025), chat logs to analyse prompt and response interactions with AI (e.g., Guo et al., 2024; Su et al., 2023; Tam, 2024), keylogging to track revision operations (e.g., Yan and Zhang, 2024) and self-reflection or learning journals to assess perceptions or cognitive processes (e.g., Chen et al., 2025; Yan and Zhang, 2024).

Study Findings

Many studies (n = 17) reported that AI-generated feedback, whether in the form of WF, WCF, or AWE, led to improvements in writing quality. These improvements spanned grammar, lexis, coherence and organization (e.g., Chan et al., 2024, 2025; Mahapatra, 2024; Polakova and Ivenz, 2024; Tsai et al., 2024). Some studies reported significant gains relative to control groups or baseline performance, although others observed improvements in both experimental and control groups (e.g., Alanazi et al., 2025), or found only small effect sizes favouring AI feedback (e.g., Chan et al., 2025, d = 0.36).

A subset of studies compared the effectiveness of AI feedback with teacher feedback. Several studies found no significant difference between AI and teacher feedback in improving writing (e.g., Alsofyani and Barzanji, 2025; Escalante et al., 2023), while others reported teacher and AI feedback as complementary, particularly for organization or specific error types (e.g., Luo et al., 2025; Zou et al., 2025). Several studies noted that GenAI provided more reformulation or metalinguistic feedback than teachers, although this feedback was often redundant or less relevant for content (e.g., Li et al., 2024; Lin and Crosthwaite, 2024). Notably, combining AI and teacher feedback was more effective than either alone for improving writing quality or addressing diverse error types (e.g., Asadi et al., 2025; Han and Li, 2024; Luo et al., 2025).

Students and teachers generally reported positive perceptions of AI feedback, citing benefits such as motivation, practicality, interactivity, or independence (e.g., Abduljawad, 2024; Kurt and Kurt, 2024; Naz and Robertson, 2024). Some studies, however, noted mixed or negative perceptions, including confusion, mistrust and concerns about over-reliance on AI (e.g., Chen et al., 2025; Escalante et al., 2023; Lo et al., 2025). AI feedback was seen to enhance engagement (behavioural, cognitive and affective) and feedback literacy, with students showing increased motivation, self-efficacy, or collaborative tendencies (e.g., Koltovskaia et al., 2024; Rad et al., 2024; Teng, 2024). However, some studies noted superficial engagement with or reliance on AI, reducing creativity or agency (e.g., Shi et al., 2025; Zhan and Yan, 2025). Students preferred AI feedback over peer feedback for editing/proofreading (e.g., Allen and Mizumoto, 2024) or used AI for brainstorming, lexis and coherence (e.g., Arifin et al., 2024). In such cases, AI was perceived as functioning like a personal tutor, although the effectiveness of this role often depended on how the feedback was prompted (e.g., Tam, 2024).

Regarding revisions, studies found that students often accepted and incorporated AI feedback, particularly for form-related corrections (e.g., grammar and lexis), but were less likely to revise content-related feedback (e.g., Chen et al., 2024; Long, 2024). Selective uptake was noted when feedback was excessive or unclear, and uptake varied by proficiency or technological competence (e.g., Tran, 2025; Yan and Zhang, 2024).

Finally, studies assessing the accuracy and reliability of AI feedback reported generally strong performance in detecting and correcting errors in English, but less so in other languages such as Greek or Chinese (e.g., Fokides and Peristeraki, 2024; Yang and Chen, 2025). Other limitations included inaccuracies, misinterpretation of author intent, or errors/hallucinations (e.g., Alsaweed and Aljebreen, 2024; Naz and Robertson, 2024).

Identified Issues

Recurring methodological limitations indicated challenges in ensuring robust study designs in our dataset. In particular, assessments of ‘writing quality’ often did not relate to the feedback sought from GenAI (e.g., Escalante et al., 2023; Lo et al., 2025; Mahapatra, 2024), with a reliance on self-reported proficiency or skills in some studies (Abduljawad, 2024). Small sample sizes or limited participants were explicitly noted in a few studies (e.g., Tran, 2025, n = 17; Yan, 2024, n = 3; Yan and Zhang, 2024, n = 4). With regard to the use of control groups comparing GenAI feedback with other forms (e.g., teacher feedback), 25 experimental studies employed one, while 16 such studies did not use a control group. Importantly, only 27 of the 51 included studies provided information on the prompts used with GenAI, with 21 not including this information.

Discussion and Conclusion

This short review article has outlined the aims, methods, findings, and limitations of 51 published studies on GenAI-assisted written (corrective) feedback for L2 learning and teaching. Overall, the data reveal a rapidly growing and geographically diverse body of research into GenAI-generated feedback on L2 writing (at least, for English L2 writing), employing varied research designs to examine writing improvement, engagement and perceptions. Findings suggest that AI-generated (corrective) feedback does support improvements in linguistic accuracy and learner engagement, though issues remain around methodological consistency, feedback relevance, prompt transparency and the balance between AI and human input. Our synthesis confirms some of the broad trends also noted by M. Li (2024) and S. Li (2025), such as differences between GenAI and teacher feedback and the mixed perceptions of students and teachers. However, unlike these broader reviews, our study contributes a dedicated and systematic analysis of GenAI-generated WF for L2 writing. By focusing exclusively on feedback, we were able to categorize studies by feedback type (WF, WCF and AWE), timing (pre-writing, during-writing and post-writing), research aims, methods and outcomes, providing a level of granularity not available in earlier syntheses. Furthermore, our scoping review highlights issues that have not been systematically charted before, including the lack of prompt transparency, the predominance of English over other L2s, and the under-exploration of pre-writing and during-writing feedback. In doing so, this study moves beyond the descriptive overviews of previous reviews to establish a more comprehensive evidence base that can inform both future empirical research and practical integration of GenAI feedback in L2 writing pedagogy.

Regarding pedagogical implications, the findings highlight the potential of AI-generated feedback, particularly from tools such as ChatGPT, to enhance L2 writing instruction by supporting individualized, scalable and targeted feedback across diverse learner needs. These insights can inform language education policies that integrate AI responsibly into curricula, promote teacher training in AI literacy and guide best practices for balancing human and machine feedback.

We identify six key priorities for future research. First, there is a need for a meta-analysis of GenAI L2 feedback studies dealing with ‘writing quality’ or improvements to other writing skills. Such an analysis could help disentangle evidence-based critiques from uncritical enthusiasm by identifying where and how GenAI feedback genuinely contributes to writing development, and where its limitations remain. Second, more attention needs to be paid to the links between revisions arising from GenAI feedback and said ‘writing quality,’ as many studies have not yet taken this into account. Third, far more studies are needed on L2s other than English. Fourth, we strongly recommend that all GenAI-related studies on L2 WF (and more generally) include, as a matter of course, information on the prompts used, given the crucial importance of prompts for all aspects of GenAI output. Fifth, more studies investigating GenAI feedback use at pre-writing and during-writing stages are needed. Finally, a move from self-reported or rater-led data sources to online methods such as screencasts can help capture more authentic, process-oriented insights into how learners engage with GenAI feedback in real time, revealing patterns that may be overlooked in retrospective accounts.

In terms of the study's limitations, due to space requirements for RELC Journal review articles, certain variables (e.g., participant characteristics including learner level, context and proficiency, or the theoretical frameworks involved in these studies) were not covered in the present study. We also note that the present study is not strictly a full PRISMA systematic review (see also Chong's (2025) Synthesis Methods and Reporting Tool (SMART) for conducting and reporting research syntheses in applied linguistics), as we neither conducted a transparent risk of bias assessment using appraisal tools (e.g., Critical Appraisal Skills Programme; https://casp-uk.net/casp-tools-checklists/) nor registered a structured protocol, due to space and time considerations. However, our methods do align with PRISMA/SMART-style transparent reporting and systematic selection procedures. A fuller systematic review addressing an increased number of study variables and following full SMART review guidelines is forthcoming.

Footnotes

ORCID iD

Peter Crosthwaite

Ethical Approval and Informed Consent Statements

This study received an ethics waiver from the University of Queensland as it does not involve human subjects with all data sourced from publicly available sources.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of Conflicting Interest

The authors declare no conflicts of interest in the submission or publication of this scoping review.

Data Availability Statement

Data is available upon request from the authors.

References

Abduljawad (2024) Investigating the impact of ChatGPT as an AI tool on ESL writing: Prospects and challenges in Saudi Arabian higher education. International Journal of Computer-Assisted Language Learning and Teaching 14(1): 1–19.

Alanazi, Elmotri, Khamis, et al. (2025) Assessing the efficacy of ChatGPT’s automated corrective feedback in enhancing students’ writing proficiency. International Journal of Advanced and Applied Sciences 12(2): 205–214.

Allen and Mizumoto (2024) ChatGPT over my friends: Japanese English-as-a-foreign-language learners’ preferences for editing and proofreading strategies. RELC Journal: 1–18. DOI: 10.1177/00336882241262533.

Alsaweed and Aljebreen (2024) Investigating the accuracy of ChatGPT as a writing error correction tool. International Journal of Computer-Assisted Language Learning and Teaching 14(1): 1–18.

Alsofyani and Barzanji (2025) The effects of ChatGPT-generated feedback on Saudi EFL learners’ writing skills and perception at the tertiary level: A mixed-methods study. Journal of Educational Computing Research 63(2): 431–463.

Arifin, Rahman, Balla, et al. (2024) ChatGPT affordances and Indonesian EFL students’ perceptions in L2 writing: A collaborative reflexive thematic analysis. Changing English 32(2): 195–211.

Asadi, Ebadi and Mohammadi (2025) The impact of integrating ChatGPT with teachers’ feedback on EFL writing skills. Thinking Skills and Creativity 56: 101766. DOI: 10.1016/j.tsc.2025.101766.

Biernacki and Waldorf (1981) Snowball sampling: Problems and techniques of chain referral sampling. Sociological Methods & Research 10(2): 141–163.

Chan and Wong (2024) Enhancing university level English proficiency with generative AI: Empirical insights into automated feedback and learning outcomes. Contemporary Educational Technology 16(4): 1–17.

Chan and Wong (2025) Leveraging generative AI for enhancing university-level English writing: Comparative insights on automated feedback and student engagement. Cogent Education 12(1): 1–22.

Chen, Wei, Zhu, et al. (2025) Unpacking the rejection of L2 students toward ChatGPT-generated feedback: An explanatory research. ECNU Review of Education: 1–20. DOI: 10.1177/20965311241305140.

Chen, Zhu, et al. (2024) L2 students’ barriers in engaging with form and content-focused AI-generated feedback in revising their compositions. Computer Assisted Language Learning: 1–21. DOI: 10.1080/09588221.2024.2422478.

Chong (2025) Synthesis methods and reporting tool (SMART) for research syntheses in applied linguistics. Research Synthesis in Applied Linguistics 1(1): 17–38. DOI: 10.1080/29984475.2025.2456880.

Escalante, Pack and Barrett (2023) AI-generated feedback on writing: Insights into efficacy and ENL student preference. International Journal of Educational Technology in Higher Education 20(1): 1–20.

Fokides and Peristeraki (2024) Comparing ChatGPT’s correction and feedback comments with that of educators in the context of primary students’ short essays written in English and Greek. Education and Information Technologies 30: 2577–2621.

Guo, Pan, et al. (2024) Effects of an AI-supported approach to peer feedback on university EFL students’ feedback quality and writing ability. The Internet and Higher Education 63: 1–15.

Guo and Wang (2024) To resist it or to embrace it? Examining ChatGPT’s potential to support teacher feedback in EFL writing. Education and Information Technologies 29(7): 8435–8463.

Han and Li (2024) Exploring ChatGPT-supported teacher feedback in the EFL context. System 126: 1–11.

Koltovskaia, Rahmati and Saeli (2024) Graduate students’ use of ChatGPT for academic text revision: Behavioral, cognitive, and affective engagement. Journal of Second Language Writing 65: 101130.

Kurt and Kurt (2024) Enhancing L2 writing skills: ChatGPT as an automated feedback tool. Journal of Information Technology Education: Research 23: 1–17.

Lee and Moore (2024) Harnessing generative AI (GenAI) for automated feedback in higher education: A systematic review. Online Learning 28(3): 82–106.

Li, Huang, et al. (2024) Evaluating the role of ChatGPT in enhancing EFL writing assessments in classroom settings: A preliminary investigation. Humanities & Social Sciences Communications 11(1): –9.

Li M (2024) Leveraging ChatGPT for second language writing feedback and assessment. International Journal of Computer-Assisted Language Learning and Teaching 14(1): 1–11.

Li S (2025) Generative AI and second language writing. Digital Studies in Language and Literature 2: 122–152.

Lin S and Crosthwaite P (2024) The grass is not always greener: Teacher vs. GPT-assisted written corrective feedback. System 127: 103529.

Lo, Wong and Chan (2025) The impact of generative AI on essay revisions and student engagement. Computers and Education Open 9: 100249. DOI: 10.1016/j.caeo.2025.100249.

Long (2024) Exploring the use of ChatGPT as a tool for written corrective feedback in an EFL classroom. Journal of Asia TEFL 21(2): 397–412.

Luo and Zhong (2025) The collaboration of AI and teacher in feedback provision and its impact on EFL learner’s argumentative writing. Education and Information Technologies 30: 17695–17715. DOI: 10.1007/s10639-025-13488-7.

Mahapatra (2024) Impact of ChatGPT on ESL students’ academic writing skills: A mixed methods intervention study. Smart Learning Environments 11(1): 1–18.

Naz and Robertson (2024) Exploring the feasibility and efficacy of ChatGPT3 for personalized feedback in teaching. Electronic Journal of E-Learning 22(2): 98–111.

Page, McKenzie, Bossuyt, et al. (2021) The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. British Medical Journal 372: n71.

Polakova and Ivenz (2024) The impact of ChatGPT feedback on the development of EFL students’ writing skills. Cogent Education 11(1): 1–12.

Rad, Alipour and Jafarpour (2024) Using artificial intelligence to foster students’ writing feedback literacy, engagement, and outcome: A case of Wordtune application. Interactive Learning Environments 32(9): 5020–5040.

Shan, Song, Jiang, et al. (2025) Complementing but not replacing: Comparing the impacts of GPT-4 and native-speaker interaction on Chinese L2 writing outcomes. Behavioral Sciences 15(4): 1–21.

Shi and Aryadoust (2024) A systematic review of AI-based automated written feedback research. ReCALL 36(2): 187–209.

Shi, Chai, Zhou, et al. (2025) Comparing the effects of ChatGPT and automated writing evaluation on students’ writing and ideal L2 writing self. Computer Assisted Language Learning: 1–28. DOI: 10.1080/09588221.2025.2454541.

Su, Lin and Lai (2023) Collaborating with ChatGPT in argumentative writing classrooms. Assessing Writing 57(2): 1–11.

Tam ACF (2024) Interacting with ChatGPT for internal feedback and factors affecting feedback quality. Assessment and Evaluation in Higher Education 50(2): 219–235.

Teng (2024) Metacognitive awareness and EFL learners’ perceptions and experiences in utilising ChatGPT for writing feedback. European Journal of Education 60(1): 1–17.

Tran (2025) Enhancing EFL writing revision practices: The impact of AI- and teacher-generated feedback and their sequences. Education Sciences 15(2): 1–22.

Tsai, Lin and Brown (2024) Impacts of ChatGPT-assisted writing for EFL English majors: Feasibility and challenges. Education and Information Technologies 29(17): 22427–22445.

Yan (2024) Collaborative processing of ChatGPT-generated feedback: Effects on L2 writing task improvement and learning. Language Learning & Technology 28(1): 1–19.

Yan and Zhang (2024) L2 writer engagement with automated written corrective feedback provided by ChatGPT: A mixed-method multiple case study. Humanities & Social Sciences Communications 11(1): 1–14.

Yang and Chen (2025) ChatGPT and L2 Chinese writing: Evaluating the impact of model version and prompt language on automated corrective feedback. Computer Assisted Language Learning: 1–29. DOI: 10.1080/09588221.2025.2453205.

Yao, Zhu, Xiao, et al. (2025) Secondary school English teachers’ application of artificial intelligence-guided chatbot in the provision of feedback on student writing: An activity theory perspective. Journal of Second Language Writing 67: 1–18.

Yeung (2025) University students’ engagement with generative AI-supported automated writing evaluation (AWE) feedback. Journal of Second Language Writing 68: 1–15.

Zhan and Yan (2025) Students’ engagement with ChatGPT feedback: Implications for student feedback literacy in the context of generative artificial intelligence. Assessment & Evaluation in Higher Education: 1–14. DOI: 10.1080/02602938.2025.2471821.

Zou, Guo, Wang, et al. (2025) Investigating students’ uptake of teacher- and ChatGPT-generated feedback in EFL writing: A comparison study. Computer Assisted Language Learning: 1–30. DOI: 10.1080/09588221.2024.2447279.

Generative AI and L2 Written Feedback Studies: A Scoping Review
