Abstract
Traditional item development methods have constrained the advancement of computerized adaptive testing (CAT), hindering the achievement of fully intelligent assessments. With the progress of natural language processing technologies, automatic item generation (AIG) based on large language models (LLMs) offers a promising solution to this challenge. This study employed three LLMs to generate Simplified Chinese Big Five personality items and evaluated the effectiveness of the resulting adaptive item bank through two rounds of empirical testing. The goal was to leverage emerging technologies to address one of the key bottlenecks in CAT development and to promote fully automated, intelligent assessment workflows. Findings indicate that LLM-based AIG can produce high-quality Big Five CAT item banks cost-effectively and efficiently. Moreover, the approach performs robustly across different LLMs, highlighting its cross-model stability and practical potential.