This study investigated the efficacy of artificial intelligence-based dynamic written corrective feedback on second language writing accuracy, fluency, complexity, and functional adequacy, while also examining user sentiment among teachers and students. Utilizing Claude 3 Opus as the primary artificial intelligence tool, the research compared artificial intelligence-generated feedback with traditional teacher-provided dynamic written corrective feedback within a 15-week intensive English program involving intermediate-high learners of English as a second language (n = 41). Using a quasi-experimental design, participants were randomly assigned to control (teacher-based feedback) and treatment (artificial intelligence-based feedback) groups. Second language writing development was assessed with multiple metrics, including the error-free clause ratio, fluency, syntactic complexity (mean length of T-unit and clauses per T-unit), and rubric-based functional adequacy. Findings from a repeated measures analysis of variance indicated that although both groups receiving dynamic written corrective feedback improved in writing accuracy, the teacher feedback group outperformed the artificial intelligence group in fluency and functional adequacy. No significant differences were observed for measures of syntactic complexity. Sentiment analysis revealed mixed reactions: although most students found the artificial intelligence-based feedback helpful and easy to use, 27% of their commentary expressed concerns about feedback accuracy and clarity. Teachers echoed these concerns, citing some inconsistencies and student confusion. Additionally, the study compared Claude 3 Opus, Claude 3.5 Sonnet, and ChatGPT-4 in their ability to identify errors. Results suggest that Claude 3.5 Sonnet may outperform Claude 3 Opus and ChatGPT-4, although unexpected autocorrections by the Claude models introduced reliability concerns. These findings suggest that although artificial intelligence tools such as Claude 3 Opus may facilitate writing accuracy gains comparable to those achieved through teacher feedback, they could inadvertently hinder other aspects of writing development. Given ongoing advancements in generative artificial intelligence, further research is warranted to explore whether newer models employing test-time compute or generative reasoning can offer improved dynamic written corrective feedback quality without compromising fluency or functional adequacy.