A Comparison of Lord's χ² and Raju's Area Measures in Detection of DIF

Abstract

The area between item response functions estimated in different samples is often used as a measure of differential item functioning (DIF). Under item response theory, this area should be 0, except for errors of measurement. This study examined the effectiveness of two statistical tests of this area (a Z test for exact signed area and a Z test for exact unsigned area) for different test length, sample size, proportion of DIF items on the test, and item parameter estimation conditions using the two-parameter model. Errors in detection made using these two statistics were compared with errors made using Lord's χ². Differences between all three statistics were relatively small; however, the χ² statistic was more effective than either of the two Z tests at detecting simulated DIF. The Z test for the exact signed area was the least effective and was the most likely to result in false negative errors.
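To make the three statistics concrete, the following is a minimal Python sketch of how they are commonly written for the two-parameter logistic model. It is not the authors' code. The closed-form areas follow the formulas usually attributed to Raju (1988), and the χ² follows Lord (1980), with the 2 × 2 covariance matrices of the (a, b) estimates from the two groups summed. The function names, the scaling constant D = 1.7, and the signed-area Z form (which assumes independent samples, so the variances of the two difficulty estimates add) are assumptions made for illustration.

```python
import math

D = 1.7  # assumed scaling constant; use 1.0 for a purely logistic metric


def signed_area(b1, b2):
    """Exact signed area between two 2PL item response functions."""
    return b2 - b1


def unsigned_area(a1, b1, a2, b2, d=D):
    """Exact unsigned area between two 2PL item response functions."""
    if math.isclose(a1, a2):
        # Equal discriminations: the curves never cross.
        return abs(b2 - b1)
    k = d * a1 * a2 * (b2 - b1) / (a2 - a1)
    # log(1 + exp(k)), computed without overflow for large |k|
    log_term = max(k, 0.0) + math.log1p(math.exp(-abs(k)))
    return abs(2.0 * (a2 - a1) / (d * a1 * a2) * log_term - (b2 - b1))


def lords_chi_square(a1, b1, cov1, a2, b2, cov2):
    """Lord's chi-square for one item (2 degrees of freedom under the 2PL).

    cov1 and cov2 are ((var_a, cov_ab), (cov_ab, var_b)) for each group.
    """
    da, db = a1 - a2, b1 - b2
    s11 = cov1[0][0] + cov2[0][0]
    s12 = cov1[0][1] + cov2[0][1]
    s22 = cov1[1][1] + cov2[1][1]
    det = s11 * s22 - s12 * s12
    # v' S^-1 v written out for the 2 x 2 case
    return (s22 * da * da - 2.0 * s12 * da * db + s11 * db * db) / det


def signed_area_z(b1, var_b1, b2, var_b2):
    """Z statistic for the signed area, assuming independent samples."""
    return (b2 - b1) / math.sqrt(var_b1 + var_b2)


if __name__ == "__main__":
    cov = ((0.01, 0.0), (0.0, 0.02))  # made-up covariance values
    print(signed_area(0.0, 0.4))
    print(unsigned_area(1.2, 0.0, 1.0, 0.4))
    print(lords_chi_square(1.2, 0.0, cov, 1.0, 0.4, cov))
    print(signed_area_z(0.0, 0.02, 0.4, 0.02))
```

The sketch shows why the signed-area test can miss DIF: when two items share the same difficulty but differ in discrimination, the signed area is 0 while the unsigned area and the χ² are not.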

Keywords

Index terms: area measures, differential item functioning, item response theory, item bias, Lord's χ².

References

Baker, F.B. (1986). GENIRV: A program to generate item response vectors [Computer program]. Madison WI: University of Wisconsin, Laboratory of Experimental Design.

Baker, F.B., Al-Karni, A., & Al-Dosary, I.M. (1991). EQUATE: A computer program for the test characteristic curve method of IRT equating. Applied Psychological Measurement, 15, 78.

Candell, G.L., & Drasgow, F. (1988). An iterative procedure for linking metrics and assessing item bias in item response theory. Applied Psychological Measurement, 12, 253-260.

Candell, G.L., & Hulin, C.L. (1987). Cross-language and cross-cultural comparisons in scale translations: Independent sources of information about item nonequivalence. Journal of Cross-Cultural Psychology, 17, 417-440.

Cohen, A.S., Kim, S.H., & Subkoviak, M.J. (1991). Influence of prior distributions on detection of DIF. Journal of Educational Measurement, 28, 49-59.

Divgi, D.R. (1985). A minimum chi-square method for developing a common metric in item response theory. Applied Psychological Measurement, 9, 413-415.

Drasgow, F. (1989). An evaluation of marginal maximum likelihood estimation for the two-parameter logistic model. Applied Psychological Measurement, 13, 77-90.

Ironson, G.H., & Subkoviak, M.J. (1979). A comparison of several methods of assessing item bias. Journal of Educational Measurement, 16, 209-225.

Kim, S.H., & Cohen, A.S. (1991). A comparison of two area measures for detecting differential item functioning. Applied Psychological Measurement, 15, 269-278.

Kim, S.H., & Cohen, A.S. (1992). Effects of linking methods on detection of DIF. Journal of Educational Measurement, 29, 51-66.

Linn, R.L., Levine, M.V., Hastings, C.N., & Wardrop, J.L. (1981). Item bias in a test of reading comprehension. Applied Psychological Measurement, 5, 159-173.

Lim, R.G., & Drasgow, F. (1990). Evaluation of two methods for estimating item response theory parameters when assessing differential item functioning. Journal of Applied Psychology, 75, 164-174.

Lord, F.M. (1980). Applications of item response theory to practical testing problems. Hillsdale NJ: Erlbaum.

McCauley, C.D., & Mendoza, J. (1985). A simulation study of item bias using a two-parameter item response model. Applied Psychological Measurement, 9, 389-400.

McLaughlin, M.E., & Drasgow, F. (1987). Lord's chi-square test of item bias with estimated and with known person parameters. Applied Psychological Measurement, 11, 161-173.

Mislevy, R.J., & Bock, R.D. (1986). PC-BILOG: Item analysis and test scoring with binary logistic models [Computer program]. Mooresville IN: Scientific Software.

Mislevy, R.J., & Bock, R.D. (1990). BILOG 3: Item analysis and test scoring with binary logistic models [Computer program]. Mooresville IN: Scientific Software.

Mislevy, R.J., & Stocking, M.L. (1990). A consumer's guide to LOGIST and BILOG. Applied Psychological Measurement, 13, 57-75.

Park, D.G., & Lautenschlager, G.J. (1990). Improving IRT item bias detection with iterative linking and ability scale purification. Applied Psychological Measurement, 14, 163-173.

Raju, N.S. (1988). The area between two item characteristic curves. Psychometrika, 53, 495-502.

Raju, N.S. (1990). Determining the significance of estimated signed and unsigned areas between two item response functions. Applied Psychological Measurement, 14, 197-207.

Rudner, L.M., Getson, P.R., & Knight, D.L. (1980). A Monte Carlo comparison of seven biased item detection techniques. Journal of Educational Measurement, 17, 1-10.

Shepard, L.A., Camilli, G., & Williams, D.M. (1984). Accounting for statistical artifacts in item bias research. Journal of Educational Statistics, 9, 93-128.

Shepard, L., Camilli, G., & Williams, D.M. (1985). Validity of approximation techniques for detecting item bias. Journal of Educational Measurement, 22, 77-105.

Stocking, M.L., & Lord, F.M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7, 201-210.