This article suggests a new approach based on Bayesian decision theory (e.g., Cronbach & Gleser, 1965; Ferguson, 1967) for detection of test fraud. The approach leads to a simple decision rule that involves the computation of the posterior probability that an examinee committed test fraud given the data. The suggested approach was applied to a real data set that involved actual test fraud.
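As a rough illustration of the kind of decision rule described above, the following minimal sketch computes a posterior probability of fraud by Bayes' rule and flags an examinee when flagging has the smaller expected loss. The function names, the two-action loss structure, and all numeric inputs are assumptions made for illustration; they are not the article's actual model or data.

```python
def posterior_fraud_probability(prior, lik_fraud, lik_no_fraud):
    """Bayes' rule: P(fraud | data) from a prior and two likelihoods.

    prior        -- assumed prior probability that an examinee committed fraud
    lik_fraud    -- assumed likelihood of the data given fraud
    lik_no_fraud -- assumed likelihood of the data given no fraud
    """
    numerator = prior * lik_fraud
    denominator = numerator + (1.0 - prior) * lik_no_fraud
    return numerator / denominator


def flag_examinee(posterior, loss_false_positive, loss_false_negative):
    """Flag when the expected loss of flagging is below that of not flagging.

    Expected loss of flagging:     loss_false_positive * (1 - posterior)
    Expected loss of not flagging: loss_false_negative * posterior
    Flagging is preferred when posterior exceeds the threshold below.
    """
    threshold = loss_false_positive / (loss_false_positive + loss_false_negative)
    return posterior > threshold


# Hypothetical inputs chosen only to exercise the functions.
p = posterior_fraud_probability(prior=0.01, lik_fraud=0.9, lik_no_fraud=0.001)
strict = flag_examinee(p, loss_false_positive=10.0, loss_false_negative=1.0)
lenient = flag_examinee(p, loss_false_positive=5.0, loss_false_negative=1.0)
```

Under these assumed inputs, the posterior is about 0.90, which falls below the threshold of 10/11 when a false accusation is taken to be ten times as costly as a missed case, and above the threshold of 5/6 when it is taken to be five times as costly. The choice of losses, not the posterior alone, determines the decision.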
Allen, J., & Ghattas, A. (2016). Estimating the probability of traditional copying, conditional on answer-copying statistics. Applied Psychological Measurement, 40(4), 258–273. https://doi.org/10.1177/0146621615622780
Berger, J. O. (1989). Statistical decision theory. In J. Eatwell, M. Milgate, & P. Newman (Eds.), Game theory (pp. 217–224). Palgrave Macmillan. https://doi.org/10.1007/978-1-349-20181-5_26
Bishop, S., & Egan, K. (2017). Detecting erasures and unusual gain scores: Understanding the status quo. In G. J. Cizek & J. A. Wollack (Eds.), Handbook of detecting cheating on tests (pp. 193–213). Routledge.
Cizek, G. J., & Wollack, J. A. (2017). Handbook of detecting cheating on tests. Routledge.
Cronbach, L. J., & Gleser, G. C. (1965). Psychological tests and personnel decisions. University of Illinois Press.
Drasgow, F., & Guertler, E. (1987). A decision-theoretic approach to the use of appropriateness measurement for detecting invalid test and scale scores. Journal of Applied Psychology, 72(1), 10–18. https://doi.org/10.1037/0021-9010.72.1.10
Drasgow, F., Levine, M. V., & McLaughlin, M. E. (1987). Detecting inappropriate test scores with optimal and practical appropriateness indices. Applied Psychological Measurement, 11(1), 59–79. https://doi.org/10.1177/014662168701100105
Drasgow, F., Levine, M. V., & Williams, E. A. (1985). Appropriateness measurement with polychotomous item response models and standardized indices. British Journal of Mathematical and Statistical Psychology, 38(1), 67–86. https://doi.org/10.1111/j.2044-8317.1985.tb00817.x
Drasgow, F., Levine, M. V., & Zickar, M. J. (1996). Optimal identification of mismeasured individuals. Applied Measurement in Education, 9(1), 47–64. https://doi.org/10.1207/s15324818ame0901_5
Eckerly, C., Smith, R., & Lee, Y. (2018). An introduction to item preknowledge detection with real data applications. Paper presented at the Conference on Test Security.
Ferguson, T. S. (1967). Mathematical statistics: A decision theoretic approach. Academic Press.
Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1), 29–36. https://doi.org/10.1148/radiology.143.1.7063747
Holland, P. W. (1996). Assessing unusual agreement between the incorrect answers of two examinees using the K-index: Statistical theory and empirical support (ETS Research Report No. RR-96-7). ETS.
Jacob, B., & Levitt, S. (2003). Rotten apples: An investigation of the prevalence and predictors of teacher cheating. Quarterly Journal of Economics, 118(3), 843–877. https://doi.org/10.1162/00335530360698441
Karabatsos, G. (2003). Comparing the aberrant response detection performance of thirty-six person-fit statistics. Applied Measurement in Education, 16(4), 277–298. https://doi.org/10.1207/s15324818ame1604_2
Levine, M. V., & Rubin, D. B. (1979). Measuring the appropriateness of multiple-choice test scores. Journal of Educational Statistics, 4(4), 269–290. https://doi.org/10.2307/1164595
Lewis, C., & Sheehan, K. (1990). Using Bayesian decision theory to design a computerized mastery test. Applied Psychological Measurement, 14(4), 367–386. https://doi.org/10.1177/014662169001400404
Lewis, C., & Thayer, D. T. (1998). The power of the K-index (or PMIR) to detect copying (ETS Research Report No. RR-98-49). ETS.
Lord, F. M., & Wingersky, M. S. (1984). Comparison of IRT true-score and equipercentile observed-score “equatings”. Applied Psychological Measurement, 8(4), 453–461. https://doi.org/10.1177/014662168400800409
Maynes, D. (2013). Educator cheating and the statistical detection of group-based test security threats. In J. A. Wollack & J. J. Fremer (Eds.), Handbook of test security (pp. 173–199). Routledge.
Maynes, D. (2014). Detection of non-independent test-taking by similarity analysis. In N. M. Kingston & A. K. Clark (Eds.), Test fraud: Statistical detection and methodology (pp. 53–82). Routledge.
Maynes, D. (2018). Improving answer-copying inferences through Bayesian analysis. Paper presented at the 2018 Conference on Test Security.
Meijer, R. R., & Sijtsma, K. (2001). Methodology review: Evaluating person fit. Applied Psychological Measurement, 25(2), 107–135. https://doi.org/10.1177/01466210122031957
Mueller, L., Zhang, Y., & Ferrara, S. (2017). What have we learned? In G. J. Cizek & J. A. Wollack (Eds.), Handbook of detecting cheating on tests (pp. 373–389). Routledge.
Phelps, R. P. (2000). Estimating the cost of standardized student testing in the United States. Journal of Education Finance, 25, 343–380.
Sinharay, S. (2017). Detection of item preknowledge using likelihood ratio test and score test. Journal of Educational and Behavioral Statistics, 42(1), 46–68. https://doi.org/10.3102/1076998616673872
Sinharay, S. (2018a). Application of Bayesian methods for detecting fraudulent behavior on tests. Measurement: Interdisciplinary Research and Perspective, 16(2), 100–113. https://doi.org/10.1080/15366367.2018.1437308
Sinharay, S. (2018b). A new person-fit statistic for the lognormal model for response times. Journal of Educational Measurement, 55(4), 457–476. https://doi.org/10.1111/jedm.12188
Sinharay, S. (2020). Detection of item preknowledge using response times. Applied Psychological Measurement, 44(5), 376–392. https://doi.org/10.1177/0146621620909893
Sinharay, S. (2023). Statistical methods for detection of test fraud on educational assessments. In R. Tierney, F. Rizvi, & K. Ercikan (Eds.), International encyclopedia of education (4th ed., Vol. 14, pp. 298–307). Elsevier Science. https://doi.org/10.1016/b978-0-12-818630-5.10030-2
Sinharay, S., Duong, M. Q., & Wood, S. W. (2017). A new statistic for detection of aberrant answer changes. Journal of Educational Measurement, 54(2), 200–217. https://doi.org/10.1111/jedm.12141
Sinharay, S., & Jensen, J. L. (2019). Higher-order asymptotics and its application to testing the equality of the examinee ability over two sets of items. Psychometrika, 84(2), 484–510. https://doi.org/10.1007/s11336-018-9627-8
Sinharay, S., & Johnson, M. S. (2021). The use of the posterior probability in score differencing. Journal of Educational and Behavioral Statistics, 46(4), 403–429. https://doi.org/10.3102/1076998620957423
Sinharay, S., & Monroe, S. (2024). Assessment of fit of item response theory models: A critical review of the status quo and some future directions. British Journal of Mathematical and Statistical Psychology. https://doi.org/10.1111/bmsp.12378
Skorupski, W. P., & Wainer, H. (2017). The case for Bayesian methods when investigating test fraud. In G. J. Cizek & J. A. Wollack (Eds.), Handbook of detecting cheating on tests (pp. 214–231). Routledge.
van der Linden, W. J. (1980). Decision models for use with criterion-referenced tests. Applied Psychological Measurement, 4(4), 469–492. https://doi.org/10.1177/014662168000400404
van der Linden, W. J. (2007). A hierarchical framework for modeling speed and accuracy on test items. Psychometrika, 72(3), 287–308. https://doi.org/10.1007/s11336-006-1478-z
van der Linden, W. J., & Guo, F. (2008). Bayesian procedures for identifying aberrant response-time patterns in adaptive testing. Psychometrika, 73(3), 365–384. https://doi.org/10.1007/s11336-007-9046-8
van Rijn, P., & Sinharay, S. (2023). Modeling item response times. In R. Tierney, F. Rizvi, & K. Ercikan (Eds.), International encyclopedia of education (4th ed., Vol. 14, pp. 321–330). Elsevier Science. https://doi.org/10.1016/b978-0-12-818630-5.10040-5
Wang, X., Liu, Y., & Hambleton, R. K. (2017). Detecting item preknowledge using a predictive checking method. Applied Psychological Measurement, 41(4), 243–263. https://doi.org/10.1177/0146621616687285
Wasserstein, R. L., & Lazar, N. A. (2016). The ASA statement on p-values: Context, process, and purpose. The American Statistician, 70(2), 129–133. https://doi.org/10.1080/00031305.2016.1154108
Wollack, J. A. (1997). A nominal response model approach for detecting answer copying. Applied Psychological Measurement, 21(4), 307–320. https://doi.org/10.1177/01466216970214002
Wollack, J. A., Cohen, A. S., & Eckerly, C. A. (2015). Detecting test tampering using item response theory. Educational and Psychological Measurement, 75(6), 931–953. https://doi.org/10.1177/0013164414568716
Wollack, J. A., & Schoenig, R. W. (2018). Cheating. In B. B. Frey (Ed.), The Sage encyclopedia of educational research, measurement, and evaluation (pp. 260–265). Sage.
Zopluoglu, C. (2017). Similarity, answer copying, and aberrance: Understanding the status quo. In G. J. Cizek & J. A. Wollack (Eds.), Handbook of detecting cheating on tests (pp. 25–46). Routledge.