
Abstract

Differential item functioning (DIF) occurs when an item has different measurement properties for members of one group versus another. Likelihood-ratio (LR) tests for DIF based on item response theory (IRT) involve statistically comparing IRT models that vary with respect to their constraints. A simulation study evaluated how violation of the normality assumption about the random latent variable for one or both groups affected IRT-LR-DIF results. Item response data with or without DIF were generated from the two-parameter logistic model and fitted under the assumption that the latent distribution was normal for both groups. Although the IRT-LR-DIF method performed well when latent distributions were normal for both groups, results were distorted when the distribution was skewed for one or both groups. Specifically, Type I error was inflated, differences between reference- and focal-group item parameter estimates were inaccurate, and group differences in the mean and variance of the latent distribution were overestimated.
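As a rough illustration of the design summarized above, the following Python sketch generates two-parameter logistic (2PL) responses for a reference and a focal group and computes the likelihood-ratio statistic from two log-likelihoods. It is a minimal sketch under stated assumptions, not the study's procedure: the standardized gamma distribution used for the skewed case, the item parameters, the sample sizes, and the function names (`p_2pl`, `draw_theta`, `simulate_group`, `lr_dif_test`) are all hypothetical choices made for illustration. The log-likelihoods passed to `lr_dif_test` would in practice come from fitting the constrained and free models by marginal maximum likelihood, which is not shown here.

```python
import math
import random


def p_2pl(theta, a, b):
    """2PL probability of a keyed response: 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def draw_theta(rng, skewed, shape=4.0):
    """Draw one latent score with mean 0 and variance 1.

    skewed=False gives a standard normal draw.
    skewed=True gives a standardized gamma draw (positively skewed);
    this distribution is an assumption made for illustration only.
    """
    if not skewed:
        return rng.gauss(0.0, 1.0)
    return (rng.gammavariate(shape, 1.0) - shape) / math.sqrt(shape)


def simulate_group(rng, n, items, skewed=False):
    """Return n rows of 0/1 responses; items is a list of (a, b) pairs."""
    data = []
    for _ in range(n):
        theta = draw_theta(rng, skewed)
        row = [1 if rng.random() < p_2pl(theta, a, b) else 0 for a, b in items]
        data.append(row)
    return data


def lr_dif_test(loglik_constrained, loglik_free, df=2):
    """Likelihood-ratio test: G^2 = -2 * (lnL_constrained - lnL_free).

    G^2 is referred to a chi-square distribution with df equal to the
    number of parameters freed across groups (2 for a and b in the 2PL).
    Only even df is handled, via the closed-form chi-square tail.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch handles positive even df only")
    # Guard against tiny negative values caused by numerical error.
    g2 = max(0.0, -2.0 * (loglik_constrained - loglik_free))
    half = g2 / 2.0
    p_value = math.exp(-half) * sum(
        half ** k / math.factorial(k) for k in range(df // 2)
    )
    return g2, p_value


rng = random.Random(2024)
# Hypothetical item parameters; the second item has DIF in b for the focal group.
reference_items = [(1.2, -0.5), (0.8, 0.0), (1.5, 0.7)]
focal_items = [(1.2, -0.5), (0.8, 0.6), (1.5, 0.7)]
reference = simulate_group(rng, 500, reference_items, skewed=False)
focal = simulate_group(rng, 500, focal_items, skewed=True)
```

Fitting both groups under a normality assumption, as the study did, would then show whether the skewed focal-group distribution distorts the test. The closed-form tail is used only to keep the sketch free of third-party dependencies; a general chi-square routine would be needed for odd degrees of freedom.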

Keywords

differential item functioning; LR-DIF; IRT-LR-DIF; item response theory; item bias; measurement invariance



Likelihood-Ratio DIF Testing: Effects of Nonnormality
