Abstract
This study explores the integration of generative artificial intelligence (GenAI) with human experts to improve the quality of distractors in multiple-choice questions (MCQs) for second language (L2) listening tests. A psychometric analysis of responses from 2267 Chinese EFL undergraduates, using the two-parameter logistic nested logit model (2PLNLM), identified problematic items and distractors. Guided by established distractor design principles, GenAI was applied iteratively to revise these distractors, with human experts providing ongoing feedback throughout the process. The revised versions were then evaluated through expert judgment and NLP-based cosine similarity analysis. The results indicate that GenAI effectively enhanced distractor quality by maintaining content and structural alignment and ensuring semantic independence; however, it struggled to fully capture listening miscomprehension patterns and contextualized language use. These preliminary findings suggest that GenAI revisions, guided by principle-based prompts and supervised by humans, can effectively improve distractor quality. This study offers practical insights into the potential and limitations of GenAI in improving L2 listening tests.
