Abstract
Recent studies have highlighted the potential of generative artificial intelligence, such as ChatGPT, to address challenges in providing accurate and pedagogically relevant feedback on second language (L2) writing. However, empirical evidence on how prompt engineering shapes feedback quality remains limited. This study examined how zero-shot, few-shot and chain-of-thought prompting strategies influenced the accuracy and depth of ChatGPT-generated qualitative feedback on L2 essays. A total of 176 essays written by Filipino and Thai learners of intermediate English proficiency were evaluated with ChatGPT-4o under each of the three prompting strategies. Few-shot prompting achieved the highest accuracy, while chain-of-thought prompting produced the most elaborated feedback, particularly in addressing grammatical complexity. Zero-shot prompting lagged behind on both accuracy and depth, with notable weaknesses in grammatical feedback. Implications for L2 writing instruction, assessment and research are discussed.
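For readers unfamiliar with how the three strategies differ in practice, the sketch below issues one prompt of each type through the OpenAI Python SDK. The prompt wording, the placeholder essay and the model parameters are illustrative assumptions for exposition only; they are not the instruments or settings used in the study.

```python
# Minimal sketch of zero-shot, few-shot and chain-of-thought prompting.
# All prompt text here is hypothetical, not the study's actual rubric.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

ESSAY = "<an L2 learner essay to be evaluated>"

# Zero-shot: the task instruction alone, with no examples or reasoning scaffold.
zero_shot = (
    "Give qualitative feedback on the grammar, vocabulary and organization "
    f"of this essay:\n{ESSAY}"
)

# Few-shot: one or more worked essay-feedback pairs precede the target essay.
few_shot = (
    "Example essay: <sample essay>\n"
    "Example feedback: <model feedback on the sample>\n\n"
    f"Now give feedback in the same style for this essay:\n{ESSAY}"
)

# Chain-of-thought: the model is asked to reason step by step before responding.
chain_of_thought = (
    "First analyse the essay step by step (grammar, vocabulary, coherence), "
    f"then give qualitative feedback based on that analysis:\n{ESSAY}"
)

for name, prompt in [("zero-shot", zero_shot),
                     ("few-shot", few_shot),
                     ("chain-of-thought", chain_of_thought)]:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    print(name, "->", response.choices[0].message.content[:200])
```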
