Abstract
Background
Multiple-choice item (MCI) assessments are burdensome for instructors to develop. Artificial intelligence (AI, e.g., ChatGPT) can streamline the process without sacrificing quality: prior work suggests that AI-generated MCIs are comparable in quality to those written by human experts. However, whether AI-generated MCIs are equally good across various domain- and task-specific prompts remains to be determined. Therefore, we ask whether AI can generate high-quality MCIs to assess learning outcomes from a psychology textbook chapter reading.
Objective
In an exploratory study, we used Item Response Theory analysis and expert review to assess MCIs generated by ChatGPT-4 from a psychology textbook chapter.
Method
We submitted a prompt and a textbook chapter to ChatGPT-4, requesting 20 MCIs. One hundred ninety undergraduate participants read the chapter and then responded to the MCIs. Expert reviewers assessed the MCIs for alignment with learning outcomes and overall quality.
Results
ChatGPT-4-generated MCIs were low in difficulty and high in discrimination. Expert reviewers found that nearly all items were logically sound, aligned with learning objectives, and met prevailing standards of MCI quality.
Conclusion
When carefully prompted, ChatGPT-4 can rapidly generate high-quality MCIs to test comprehension of a psychology textbook chapter. However, due to the uniformly low difficulty of the items, we recommend enlisting ChatGPT-4 to write MCIs for formative, but not summative, assessments.
