Detecting Compromised Items Using Information From Secure Items

Abstract

In continuous testing programs, some items are used repeatedly across test administrations, and statistical methods are often used to evaluate whether items have become compromised through examinees’ preknowledge. In this study, we proposed a residual method to detect compromised items when a test can be partitioned into two subsets of items: secure items and possibly compromised items. We derived the standard error of the residual statistic by taking into account the sampling error in both the ability estimates and the item parameter estimates. The simulation results suggest that the Type I error rate is close to the nominal level when both sources of error are adjusted for, and that item parameter error can be ignored only when the item calibration sample size is much larger than the evaluation sample size. We also investigated the performance of the residual method when information from secure items is not used, in both simulation and real data analyses.
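
The general idea of a residual check can be illustrated with a minimal sketch. It is not the statistic derived in this study: it assumes a two-parameter logistic (2PL) model, and the function names `p_correct` and `item_residual` are hypothetical. Ability estimates are assumed to come from the secure items only, and the item parameters from an earlier calibration. The standard error shown is a naive one that treats both as known, so it omits the two sources of sampling error that the study adjusts for.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response (model assumed for this sketch)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_residual(responses, thetas, a, b):
    """Return (residual, naive_se, z) for one possibly compromised item.

    responses: 0/1 scores on the item in the evaluation sample.
    thetas: ability estimates, assumed to be based on secure items only.
    a, b: item parameters, assumed to come from an earlier calibration.
    """
    n = len(responses)
    probs = [p_correct(t, a, b) for t in thetas]
    observed = sum(responses) / n   # observed proportion correct
    expected = sum(probs) / n       # model-expected proportion correct
    residual = observed - expected
    # Naive standard error: treats thetas, a, and b as known constants,
    # i.e. it ignores estimation error in abilities and item parameters.
    naive_se = math.sqrt(sum(p * (1.0 - p) for p in probs)) / n
    return residual, naive_se, residual / naive_se


if __name__ == "__main__":
    residual, se, z = item_residual(
        [1, 1, 1, 0], [0.0, 0.0, 0.0, 0.0], a=1.0, b=0.0
    )
    print(residual, se, z)
```

A large positive standardized residual would indicate that examinees answer the item correctly more often than their performance on secure items predicts. Because the naive standard error leaves out estimation error, it should be read only as a starting point for the adjusted standard error described above.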

Keywords

item preknowledge, item response theory, goodness of fit
