Abstract

A number of statistical methods exist for the detection of differential item functioning (DIF). These methods have been widely studied and generally found to be effective in detecting both uniform and nonuniform DIF. Anecdotal reports suggest, however, that these techniques too often flag one type of DIF when only the other type is present (a Type I error). The purposes of this simulation study are to ascertain whether these observations are accurate and, if so, to gain some understanding of the cause of the inflated Type I error. The results indicate that Type I error rates for detecting one type of DIF in the presence of the other are inflated for most common DIF detection techniques. The discussion focuses on potential causes of these results.

Keywords

differential item functioning, Type I error rate, SIBTEST, IRT likelihood ratio

Anomalous Type I Error Rates for Identifying One Type of Differential Item Functioning in the Presence of the Other
