
Abstract

Sample-size restrictions limit the contingency table approaches based on asymptotic distributions, such as the Mantel-Haenszel (MH) procedure, for detecting differential item functioning (DIF) in many practical applications. Within this framework, the present study investigated the power and Type I error performance of empirical and inferential criteria for DIF detection in small samples. Sample sizes (50/50, 100/50, 200/50, and 100/100 for the reference and focal groups, respectively), ability distributions (equal and unequal), and amount of DIF (moderate and high) were manipulated. The results show the advantages of employing the MH chi-square statistic using high levels of significance (α = .20) as opposed to the empirical criteria (cutoffs for categorizing DIF based on the magnitude of the MH common odds ratio estimator and the standardized p-difference statistic). Some considerations concerning Type I and Type II errors are made.
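To make the two MH quantities named in the abstract concrete, the following is a minimal Python sketch of the standard MH common odds ratio estimator and the continuity-corrected MH chi-square statistic (1 degree of freedom). It is an illustration under stated assumptions, not the authors' code: the function name `mantel_haenszel`, the tuple layout of the per-score-level 2×2 tables, and the example counts are hypothetical, and the continuity correction of 0.5 is the usual textbook form rather than something the abstract specifies.

```python
import math

def mantel_haenszel(tables):
    """Hypothetical helper: MH common odds ratio and chi-square.

    tables: one (a, b, c, d) tuple per matched total-score level, where
      a = reference group correct,  b = reference group incorrect,
      c = focal group correct,      d = focal group incorrect.
    """
    num = den = 0.0                      # pieces of the common odds ratio
    sum_a = sum_exp = sum_var = 0.0      # pieces of the chi-square statistic
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:                        # level carries no information
            continue
        n_ref, n_foc = a + b, c + d
        m_right, m_wrong = a + c, b + d
        num += a * d / n
        den += b * c / n
        sum_a += a
        sum_exp += n_ref * m_right / n   # E(a) under the no-DIF hypothesis
        sum_var += n_ref * n_foc * m_right * m_wrong / (n * n * (n - 1))
    odds_ratio = num / den if den > 0 else float("inf")
    if sum_var > 0:
        # Continuity-corrected statistic; clipped at zero for tiny deviations.
        chi2 = max(0.0, abs(sum_a - sum_exp) - 0.5) ** 2 / sum_var
    else:
        chi2 = 0.0
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return odds_ratio, chi2, p_value

# Made-up counts for three score levels, for illustration only.
example = [(12, 8, 7, 13), (15, 5, 10, 10), (18, 2, 14, 6)]
odds_ratio, chi2, p_value = mantel_haenszel(example)
flag_dif = p_value < 0.20                # the liberal alpha level from the abstract
```

Under the inferential criterion, an item is flagged when the p-value falls below the chosen significance level; the empirical criteria mentioned in the abstract would instead compare the magnitude of the odds ratio (or the standardized p-difference) against fixed cutoffs.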

Keywords

differential item functioning; Mantel-Haenszel procedure; significance levels; small samples; standardization index



Utility of the Mantel-Haenszel Procedure for Detecting Differential Item Functioning in Small Samples
