Abstract
Multiple-choice exams are frequently used as an efficient and objective method to assess learning, but they are more vulnerable to answer copying than tests based on open questions. Several statistical tests (known as indices in the literature) have been proposed to detect cheating; however, to the best of our knowledge, they all lack mathematical support that guarantees optimality in any sense. We partially fill this void by deriving the uniformly most powerful (UMP) test under the assumption that the response distribution is known. In practice, however, we must estimate a behavioral model that yields a response distribution for each question. As an application, we calculate the empirical type I and type II error rates for several indices that assume different behavioral models, using simulations based on real data from 12 nationwide multiple-choice exams taken by fifth and ninth graders in Colombia. We find that, subject to the restriction of preserving the nominal type I error rate, the most powerful index among those studied is one based on the work of Wollack, and that it outperforms the index developed by Wesolowsky.
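The logic behind such a test can be illustrated with a simplified sketch. This is not the paper's index or its UMP derivation; it is a hypothetical Neyman-Pearson-style one-sided test on the raw count of matching answers, under the assumption (as in the abstract) that the response distribution of every question is known. The exam size, number of options, and distributions below are all invented for illustration. The sketch builds the exact null distribution of the match count between two independently responding examinees, picks the smallest critical value that preserves a 5% type I error, and then estimates the empirical type I error rate by Monte Carlo simulation.

```python
import random

# Hypothetical setup (not from the paper): 40 questions with 4 options each,
# and a known response distribution per question.
random.seed(0)
Q, K = 40, 4
dists = []
for _ in range(Q):
    w = [random.random() for _ in range(K)]
    s = sum(w)
    dists.append([x / s for x in w])

# Under independent (non-copying) responding, the probability that two
# examinees match on question q is sum_k p_q[k]^2.
match_p = [sum(p * p for p in d) for d in dists]

# Exact null distribution of the match count M (a Poisson-binomial
# distribution), built by dynamic-programming convolution.
pmf = [1.0]
for p in match_p:
    new = [0.0] * (len(pmf) + 1)
    for m, prob in enumerate(pmf):
        new[m] += prob * (1 - p)      # no match on this question
        new[m + 1] += prob * p        # match on this question
    pmf = new

# Smallest critical value c with P(M >= c) <= alpha: flag a pair as
# suspicious when their observed match count reaches c.
alpha = 0.05
tail, c = 1.0, 0
while tail > alpha:
    tail -= pmf[c]
    c += 1

def sample_answer(dist):
    """Draw one answer from a question's response distribution."""
    u, acc = random.random(), 0.0
    for k, p in enumerate(dist):
        acc += p
        if u < acc:
            return k
    return len(dist) - 1

# Monte Carlo estimate of the empirical type I error rate: simulate
# independent (non-copying) pairs and count how often they are flagged.
trials, rejections = 20000, 0
for _ in range(trials):
    m = sum(sample_answer(d) == sample_answer(d) for d in dists)
    if m >= c:
        rejections += 1

type_i = rejections / trials
print("critical value:", c, "empirical type I error:", round(type_i, 4))
```

Because the critical value is chosen from the exact null distribution, the empirical type I error rate stays at or below the nominal 5% level. The paper's point is that this guarantee depends on the assumed behavioral model being correct: when the response distributions must be estimated, different models yield different null distributions and hence different realized error rates.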
