Sage Journals: Discover world-class research

Abstract

This article suggests a new approach, based on Bayesian decision theory (e.g., Cronbach & Gleser, 1965; Ferguson, 1967), for detecting test fraud. The approach leads to a simple decision rule that involves computing the posterior probability that an examinee committed test fraud given the data. The suggested approach was applied to a real data set involving actual test fraud.
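The abstract describes a decision rule built on the posterior probability that an examinee committed fraud. The Python sketch below shows how such a rule can work under standard Bayesian decision theory, and it rests on several assumptions. The evidence for one examinee is summarized as a likelihood ratio. The user supplies the prior probability of fraud. There are two fixed misclassification losses: flagging an honest examinee, and missing a fraudulent one. The function names, the two-loss structure, and the example numbers are hypothetical illustrations, not the article's own procedure.

```python
"""Hypothetical sketch of a Bayes decision rule for flagging possible test fraud.

Assumptions (not taken from the article): the data for one examinee are
summarized by LR = P(data | fraud) / P(data | no fraud), the prior
probability of fraud is known, and the two decision losses are fixed.
"""


def posterior_prob_fraud(likelihood_ratio: float, prior: float) -> float:
    """Return P(fraud | data) using Bayes' theorem in odds form."""
    if likelihood_ratio < 0:
        raise ValueError("likelihood ratio must be non-negative")
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


def flag_examinee(posterior: float, loss_false_flag: float,
                  loss_missed_fraud: float) -> bool:
    """Flag when flagging has the smaller expected loss.

    Expected loss of flagging     = (1 - posterior) * loss_false_flag
    Expected loss of not flagging = posterior * loss_missed_fraud
    These cross when posterior = loss_false_flag / (loss_false_flag + loss_missed_fraud).
    """
    if loss_false_flag <= 0 or loss_missed_fraud <= 0:
        raise ValueError("losses must be positive")
    threshold = loss_false_flag / (loss_false_flag + loss_missed_fraud)
    return posterior > threshold


if __name__ == "__main__":
    post = posterior_prob_fraud(likelihood_ratio=50.0, prior=0.01)
    print(round(post, 4))  # 0.3356, i.e., 50/149
    print(flag_examinee(post, loss_false_flag=10.0, loss_missed_fraud=1.0))  # False
```

In this illustration, a prior of 0.01 turns even a likelihood ratio of 50 into a posterior of only about 0.34. A false flag is also treated as ten times as costly as a missed case, which puts the threshold at 10/11, so the examinee is not flagged. Both the prior and the relative losses therefore shape the decision, not the strength of the evidence alone.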

Keywords

item preknowledge; posterior probability of cheating; signed likelihood ratio test


References

Allen & Ghattas (2016). Estimating the probability of traditional copying, conditional on answer-copying statistics. Applied Psychological Measurement, 40(4), 258–273. https://doi.org/10.1177/0146621615622780

Berger, J. O. (1989). Statistical decision theory. In Eatwell, Milgate, & Newman (Eds.), Game theory (pp. 217–224). Palgrave Macmillan. https://doi.org/10.1007/978-1-349-20181-5_26

Bishop & Egan (2017). Detecting erasures and unusual gain scores: Understanding the status quo. In G. J. Cizek & J. A. Wollack (Eds.), Handbook of detecting cheating on tests (pp. 193–213). Routledge.

Cizek, G. J., & Wollack, J. A. (2017). Handbook of detecting cheating on tests. Routledge.

Cronbach, L. J., & Gleser, G. C. (1965). Psychological tests and personnel decisions. University of Illinois Press.

Drasgow & Guertler (1987). A decision-theoretic approach to the use of appropriateness measurement for detecting invalid test and scale scores. Journal of Applied Psychology, 72(1), 10–18. https://doi.org/10.1037/0021-9010.72.1.10

Drasgow, Levine, M. V., & McLaughlin, M. E. (1987). Detecting inappropriate test scores with optimal and practical appropriateness indices. Applied Psychological Measurement, 11(1), 59–79. https://doi.org/10.1177/014662168701100105

Drasgow, Levine, M. V., & Williams, E. A. (1985). Appropriateness measurement with polychotomous item response models and standardized indices. British Journal of Mathematical and Statistical Psychology, 38(1), 67–86. https://doi.org/10.1111/j.2044-8317.1985.tb00817.x

Drasgow, Levine, M. V., & Zickar, M. J. (1996). Optimal identification of mismeasured individuals. Applied Measurement in Education, 9(1), 47–64. https://doi.org/10.1207/s15324818ame0901_5

Eckerly (2021). Answer similarity analysis at the group level. Applied Psychological Measurement, 45(5), 299–314. https://doi.org/10.1177/01466216211013109

Eckerly, Smith, & Lee (2018). An introduction to item preknowledge detection with real data applications. Paper presented at the Conference on Test Security.

Ferguson, T. S. (1967). Mathematical statistics: A decision theoretic approach. Academic Press.

Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1), 29–36. https://doi.org/10.1148/radiology.143.1.7063747

Holland, P. W. (1996). Assessing unusual agreement between the incorrect answers of two examinees using the K-index: Statistical theory and empirical support (ETS Research Report No. RR-96-7). ETS.

Jacob & Levitt (2003). Rotten apples: An investigation of the prevalence and predictors of teacher cheating. Quarterly Journal of Economics, 118(3), 843–877. https://doi.org/10.1162/00335530360698441

Karabatsos (2003). Comparing the aberrant response detection performance of thirty-six person-fit statistics. Applied Measurement in Education, 16(4), 277–298. https://doi.org/10.1207/s15324818ame1604_2

Levine, M. V., & Drasgow (1988). Optimal appropriateness measurement. Psychometrika, 53(2), 161–176. https://doi.org/10.1007/bf02294130

Levine, M. V., & Rubin, D. B. (1979). Measuring the appropriateness of multiple-choice test scores. Journal of Educational Statistics, 4(4), 269–290. https://doi.org/10.2307/1164595

Lewis & Sheehan (1990). Using Bayesian decision theory to design a computerized mastery test. Applied Psychological Measurement, 14(4), 367–386. https://doi.org/10.1177/014662169001400404

Lewis & Thayer, D. T. (1998). The power of the K-index (or PMIR) to detect copying (ETS Research Report No. RR-98-49). ETS.

Lord, F. M., & Wingersky, M. S. (1984). Comparison of IRT true-score and equipercentile observed-score “equatings”. Applied Psychological Measurement, 8(4), 453–461. https://doi.org/10.1177/014662168400800409

Maynes (2013). Educator cheating and the statistical detection of group-based test security threats. In J. A. Wollack & J. J. Fremer (Eds.), Handbook of test security (pp. 173–199). Routledge.

Maynes (2014). Detection of non-independent test-taking by similarity analysis. In N. M. Kingston & A. K. Clark (Eds.), Test fraud: Statistical detection and methodology (pp. 53–82). Routledge.

Maynes (2018). Improving answer-copying inferences through Bayesian analysis. Paper presented at the 2018 Conference on Test Security.

Meijer, R. R., & Sijtsma (2001). Methodology review: Evaluating person fit. Applied Psychological Measurement, 25(2), 107–135. https://doi.org/10.1177/01466210122031957

Mueller, Zhang, & Ferrara (2017). What have we learned? In G. J. Cizek & J. A. Wollack (Eds.), Handbook of detecting cheating on tests (pp. 373–389). Routledge.

Phelps, R. P. (2000). Estimating the cost of standardized student testing in the United States. Journal of Education Finance, 25, 343–380.

Reckase, M. D. (1983). A procedure for decision making using tailored testing. In D. J. Weiss (Ed.), New horizons in testing (pp. 237–255). Erlbaum. https://doi.org/10.1016/b978-0-12-742780-5.50023-8

Rorer, L. G., & Dawes, R. M. (1982). A base-rate bootstrap. Journal of Consulting and Clinical Psychology, 50(3), 419–425. https://doi.org/10.1037/0022-006x.50.3.419

Sinharay (2017). Detection of item preknowledge using likelihood ratio test and score test. Journal of Educational and Behavioral Statistics, 42(1), 46–68. https://doi.org/10.3102/1076998616673872

Sinharay (2018a). Application of Bayesian methods for detecting fraudulent behavior on tests. Measurement: Interdisciplinary Research and Perspectives, 16(2), 100–113. https://doi.org/10.1080/15366367.2018.1437308

Sinharay (2018b). A new person-fit statistic for the lognormal model for response times. Journal of Educational Measurement, 55(4), 457–476. https://doi.org/10.1111/jedm.12188

Sinharay (2020). Detection of item preknowledge using response times. Applied Psychological Measurement, 44(5), 376–392. https://doi.org/10.1177/0146621620909893

Sinharay (2023). Statistical methods for detection of test fraud on educational assessments. In Tierney, Rizvi, & Ercikan (Eds.), International encyclopedia of education (4th ed., Vol. 14, pp. 298–307). Elsevier Science. https://doi.org/10.1016/b978-0-12-818630-5.10030-2

Sinharay, Duong, M. Q., & Wood, S. W. (2017). A new statistic for detection of aberrant answer changes. Journal of Educational Measurement, 54(2), 200–217. https://doi.org/10.1111/jedm.12141

Sinharay & Jensen, J. L. (2019). Higher-order asymptotics and its application to testing the equality of the examinee ability over two sets of items. Psychometrika, 84(2), 484–510. https://doi.org/10.1007/s11336-018-9627-8

Sinharay & Johnson, M. S. (2021). The use of the posterior probability in score differencing. Journal of Educational and Behavioral Statistics, 46(4), 403–429. https://doi.org/10.3102/1076998620957423

Sinharay & Monroe (2024). Assessment of fit of item response theory models: A critical review of the status quo and some future directions. British Journal of Mathematical and Statistical Psychology. https://doi.org/10.1111/bmsp.12378

Skorupski, W. P., & Wainer (2017). The case for Bayesian methods when investigating test fraud. In G. J. Cizek & J. A. Wollack (Eds.), Handbook of detecting cheating on tests (pp. 214–231). Routledge.

van der Linden, W. J. (1980). Decision models for use with criterion-referenced tests. Applied Psychological Measurement, 4(4), 469–492. https://doi.org/10.1177/014662168000400404

van der Linden, W. J. (2007). A hierarchical framework for modeling speed and accuracy on test items. Psychometrika, 72(3), 287–308. https://doi.org/10.1007/s11336-006-1478-z

van der Linden, W. J., & Guo (2008). Bayesian procedures for identifying aberrant response-time patterns in adaptive testing. Psychometrika, 73(3), 365–384. https://doi.org/10.1007/s11336-007-9046-8

van der Linden, W. J., & Lewis (2015). Bayesian checks on cheating on tests. Psychometrika, 80(3), 689–706. https://doi.org/10.1007/s11336-014-9409-x

van Rijn & Sinharay (2023). Modeling item response times. In Tierney, Rizvi, & Ercikan (Eds.), International encyclopedia of education (4th ed., Vol. 14, pp. 321–330). Elsevier Science. https://doi.org/10.1016/b978-0-12-818630-5.10040-5

Wang, Liu, & Hambleton, R. K. (2017). Detecting item preknowledge using a predictive checking method. Applied Psychological Measurement, 41(4), 243–263. https://doi.org/10.1177/0146621616687285

Wasserstein, R. L., & Lazar, N. A. (2016). The ASA statement on p-values: Context, process, and purpose. The American Statistician, 70(2), 129–133. https://doi.org/10.1080/00031305.2016.1154108

Wollack, J. A. (1997). A nominal response model approach for detecting answer copying. Applied Psychological Measurement, 21(4), 307–320. https://doi.org/10.1177/01466216970214002

Wollack, J. A., Cohen, A. S., & Eckerly, C. A. (2015). Detecting test tampering using item response theory. Educational and Psychological Measurement, 75(6), 931–953. https://doi.org/10.1177/0013164414568716

Wollack, J. A., & Schoenig, R. W. (2018). Cheating. In B. B. Frey (Ed.), The Sage encyclopedia of educational research, measurement, and evaluation (pp. 260–265). Sage.

Zopluoglu (2017). Similarity, answer copying, and aberrance: Understanding the status quo. In G. J. Cizek & J. A. Wollack (Eds.), Handbook of detecting cheating on tests (pp. 25–46). Routledge.

Application of Bayesian Decision Theory in Detecting Test Fraud
