Abstract
Multidimensional Item Response Theory (MIRT) has been proposed as a means of modeling the relationship between examinee abilities and test responses. Three recent articles proved that when MIRT is used in ability estimation, an examinee’s score can theoretically decrease after a correct answer or increase after an incorrect answer. The current article examines the extent to which such “paradoxical results” arise in practice. In an operational test designed to measure two dimensions, a substantial percentage of paradoxical results occurred when using a MIRT model with a prior correlation of 0 between abilities. Assuming a positive correlation between abilities reduced the prevalence of paradoxical results but did not eliminate them entirely. Associated issues in test fairness are discussed.
