Abstract
The surge of AI in education raises concerns about the quality of measurement, and calls for clear standards are warranted. Fortunately, the field of psychometrics has a long history of developing relevant standards—such as sample invariance and the avoidance of item bias—that are crucial for reliable, valid, and interpretable assessments. This established body of knowledge, not unlike traffic laws for self-driving cars, should guide the development of AI-based assessment. Measuring new constructs necessitates stronger construct validity research. Rather than rewriting the rulebook, our focus should be on educating AI developers about these existing standards. This commentary specifically addresses empowering instructors not with high-stakes testing but with effective item writing through AI. We explore the potential of AI to transform item development, a key area highlighted by researchers. While AI tools offer exciting possibilities for tackling educational challenges, equipping instructors to leverage them effectively remains paramount.