Detection of Differential Item Functioning in the Graded Response Model

Abstract

Methods for detecting differential item functioning (DIF) have been proposed primarily for the dichotomous response model of item response theory (IRT). Three measures of DIF for the dichotomous response model are extended here to Samejima's graded response model: two measures based on area differences between item true score functions, and a χ² statistic for comparing differences in item parameters. An illustrative example is presented.
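The three measures named above can be illustrated with a small computation. The sketch below is a minimal illustration under stated assumptions, not the article's procedure: it assumes both groups' item parameters are already on a common metric (for example, after linking), that categories are scored 0, 1, ..., m-1, and that the logistic constant is D = 1.7. It reads the two area measures as the signed and unsigned areas between the reference and focal item true score functions, approximated numerically on a finite θ interval, and it computes the χ² as a Lord (1980)-style Wald statistic that takes the estimated parameter covariance matrices as inputs. All function names (`boundary_prob`, `true_score`, `area_measures`, `lord_chi_square`, `solve`) are hypothetical.

```python
import math

# Logistic scaling constant. ASSUMPTION: use D = 1.0 if estimates are on the logistic metric.
D = 1.7


def boundary_prob(theta, a, b_k):
    """Samejima boundary curve P*_k(theta): probability of scoring in category k or higher."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b_k)))


def category_probs(theta, a, bs):
    """Category probabilities P_k = P*_k - P*_(k+1), with P*_0 = 1 and P*_m = 0."""
    stars = [1.0] + [boundary_prob(theta, a, b) for b in bs] + [0.0]
    return [stars[k] - stars[k + 1] for k in range(len(bs) + 1)]


def true_score(theta, a, bs):
    """Item true score (expected item score), categories scored 0, 1, ..., m-1."""
    return sum(k * p for k, p in enumerate(category_probs(theta, a, bs)))


def area_measures(ref, foc, lo=-6.0, hi=6.0, n=2400):
    """Signed and unsigned areas between the reference and focal true score
    functions; ref and foc are (a, [b_1, ..., b_(m-1)]) tuples on a common metric.
    The integrals are approximated with the trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / n
    signed = unsigned = 0.0
    for i in range(n + 1):
        theta = lo + i * h
        diff = true_score(theta, *ref) - true_score(theta, *foc)
        weight = 0.5 if i in (0, n) else 1.0  # trapezoid end-point weights
        signed += weight * diff * h
        unsigned += weight * abs(diff) * h
    return signed, unsigned


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [val] for row, val in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def lord_chi_square(est_ref, est_foc, cov_ref, cov_foc):
    """Wald-type chi-square v' (S_R + S_F)^(-1) v, where v is the difference between
    the (a, b_1, ..., b_(m-1)) estimates of the two groups; df = len(v).
    ASSUMPTION: the two calibrations are independent, so their covariances add."""
    v = [r - f for r, f in zip(est_ref, est_foc)]
    k = len(v)
    pooled = [[cov_ref[i][j] + cov_foc[i][j] for j in range(k)] for i in range(k)]
    x = solve(pooled, v)
    return sum(vi * xi for vi, xi in zip(v, x)), k
```

As a usage check: for a four-category item with equal slopes in both groups and every focal threshold 0.5 above the corresponding reference threshold, `area_measures((1.2, [-1.0, 0.0, 1.0]), (1.2, [-0.5, 0.5, 1.5]))` should return a signed and an unsigned area both close to 1.5. This follows because the true score with 0-based scoring is the sum of the boundary curves, and the signed area between two logistic curves without a guessing parameter equals the difference in their locations (Raju, 1988). The unsigned area is reported alongside it because the signed area can be near zero when the true score functions cross (nonuniform DIF). The χ² statistic is referred to a chi-square distribution with m degrees of freedom, one for the slope and one for each of the m-1 thresholds.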

Keywords

Index terms: differential item functioning, graded response model, item response theory.

References

Baker, F.B. (1985). The basics of item response theory. Portsmouth NH: Heinemann.

Baker, F.B. (1986). GENIRV: A program to generate item response vectors [Computer program]. Madison: University of Wisconsin, School of Education, Department of Educational Psychology, Laboratory of Experimental Design.

Baker, F.B. (1992). Equating tests under the graded response model. Applied Psychological Measurement, 16, 87-96.

Baker, F.B. (1993). EQUATE 2.0: A computer program for the characteristic curve method of IRT equating. Applied Psychological Measurement, 17, 20.

Baker, F.B., & Al-Karni, A. (1991). A comparison of two procedures for computing IRT equating coefficients. Journal of Educational Measurement, 28, 147-162.

Burden, R.L., & Faires, J.D. (1985). Numerical analysis (3rd ed.). Boston MA: PWS Publishers.

Candell, G.L., & Drasgow, F. (1988). An iterative procedure for linking metrics and assessing item bias in item response theory. Applied Psychological Measurement, 12, 253-260.

Candell, G.L., & Hulin, C.L. (1987). Cross-language and cross-cultural comparisons in scale translations: Independent sources of information about item nonequivalence. Journal of Cross-Cultural Psychology, 17, 417-440.

Divgi, D.R. (1985). A minimum chi-square method for developing a common metric in item response theory. Applied Psychological Measurement, 9, 413-415.

Hogg, R.V., & Craig, A.T. (1978). Introduction to mathematical statistics (4th ed.). New York: Macmillan.

Johnson, N.L., & Kotz, S. (1970). Continuous univariate distributions: 1. Boston: Houghton Mifflin.

Kim, S.-H., & Cohen, A.S. (1992). Effects of linking methods on detection of DIF. Journal of Educational Measurement, 29, 51-66.

Linn, R.L., Levine, M.V., Hastings, C.N., & Wardrop, J.L. (1981). Item bias in a test of reading comprehension. Applied Psychological Measurement, 5, 159-173.

Lord, F.M. (1980). Applications of item response theory to practical testing problems. Hillsdale NJ: Erlbaum.

Loyd, B.H., & Hoover, H.D. (1980). Vertical equating using the Rasch model. Journal of Educational Measurement, 17, 179-193.

Marascuilo, L.A., & Slaughter, R.E. (1981). Statistical procedures for identifying possible sources of item bias based on χ² statistics. Journal of Educational Measurement, 18, 229-248.

McCauley, C.D., & Mendoza, J. (1985). A simulation study of item bias using a two-parameter item response model. Applied Psychological Measurement, 9, 389-400.

Muraki, E., & Bock, R.D. (1991). PARSCALE: Parameter scaling of rating data [Computer program]. Chicago IL: Scientific Software.

Raju, N.S. (1988). The area between two item characteristic curves. Psychometrika, 53, 495-502.

Raju, N.S. (1990). Determining the significance of estimated signed and unsigned areas between two item response functions. Applied Psychological Measurement, 14, 197-207.

Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometric Monograph, No. 17.

Shepard, L.A., Camilli, G., & Williams, D.M. (1984). Accounting for statistical artifacts in item bias research. Journal of Educational Statistics, 9, 93-128.

Stocking, M., & Lord, F.M. (1983). Developing a common metric in item response theory. Applied Psychological Measurement, 7, 201-210.

Thissen, D. (1991). MULTILOG user's guide (Version 6.0) [Computer program]. Chicago IL: Scientific Software.

Thissen, D. (1992). PLOTLOG for the Macintosh [Computer program]. Chapel Hill: University of North Carolina, L. L. Thurstone Psychometric Laboratory.

Thissen, D., Steinberg, L., & Mooney, J.A. (1989). Trace lines for testlets: A use of multiple-categorical-response models. Journal of Educational Measurement, 26, 247-260.

Thissen, D., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study of group differences in trace lines. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 147-169). Hillsdale NJ: Erlbaum.

Wainer, H. (1993). Model-based standardized measurement of an item's differential impact. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 123-135). Hillsdale NJ: Erlbaum.

Wainer, H., Sireci, S.G., & Thissen, D. (1991). Differential testlet functioning: Definitions and detection. Journal of Educational Measurement, 28, 197-219.