Sage Journals: Discover world-class research

Abstract

MULTISIB is proposed as a statistical test for assessing differential item functioning (DIF) in intentionally two-dimensional test data, such as a mathematics test designed to measure both algebra and geometry. MULTISIB is based on the multidimensional model of DIF presented in Shealy and Stout (1993) and is a direct extension of SIBTEST, its unidimensional counterpart. For an intentionally two-dimensional test, DIF is appropriately modeled as the result of influence from secondary dimensions other than the two intended dimensions. Simulation studies were used to assess how well MULTISIB detects DIF in intentionally two-dimensional tests. The results indicate that MULTISIB adhered reasonably well to the nominal significance level and had good power. Moreover, for each DIF model, the average amount of DIF estimated by MULTISIB over the 100 simulations of that model was close to the true value, confirming that MULTISIB has relatively little statistical estimation bias in assessing true DIF. In addition, the simulation studies supported the importance of using the regression correction to adjust studied-item scores for impact, and the importance of matching examinees on two subtest scores instead of on the total test score.
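The abstract describes MULTISIB only at a high level. As a hedged illustration of its core idea, the following minimal sketch computes a SIBTEST-style DIF estimate in which examinees are matched on the pair of subtest scores rather than on a single total score. It is not the authors' implementation: the function name `multisib_sketch`, the tuple data layout, the minimum cell size, and the use of combined-group cell weights are assumptions made for illustration. The regression correction that the abstract stresses is deliberately omitted, so when the groups differ in ability (impact) this uncorrected estimate may be biased. Following the general SIBTEST form of Shealy and Stout (1993), the estimate is a weighted sum over matching cells of the reference-minus-focal difference in mean studied-item score, divided by its estimated standard error to give a statistic that is approximately standard normal when there is no DIF.

```python
from collections import defaultdict
from math import sqrt
from statistics import NormalDist, fmean, variance


def multisib_sketch(ref, foc, min_per_cell=2):
    """Hypothetical, uncorrected SIBTEST-style DIF test with 2-D matching.

    ref, foc: lists of (sub1, sub2, y) tuples for the reference and focal
    groups; sub1 and sub2 are the two matching-subtest scores (e.g. algebra
    and geometry) and y is the 0/1 score on the studied item.
    Returns (beta_hat, se, z, two_sided_p). No regression correction is applied.
    """
    if min_per_cell < 2:
        raise ValueError("min_per_cell must be at least 2 to estimate variances")

    # A matching cell is a (sub1, sub2) score pair, not a total score.
    cells_r, cells_f = defaultdict(list), defaultdict(list)
    for s1, s2, y in ref:
        cells_r[(s1, s2)].append(y)
    for s1, s2, y in foc:
        cells_f[(s1, s2)].append(y)

    # Assumed rule: keep only cells with enough examinees in both groups.
    usable = [k for k in cells_r
              if len(cells_r[k]) >= min_per_cell
              and len(cells_f.get(k, [])) >= min_per_cell]
    n_total = sum(len(cells_r[k]) + len(cells_f[k]) for k in usable)
    if n_total == 0:
        raise ValueError("no matching cell has enough examinees in both groups")

    beta_hat, var_hat = 0.0, 0.0
    for k in usable:
        yr, yf = cells_r[k], cells_f[k]
        w = (len(yr) + len(yf)) / n_total  # combined-group weight (assumption)
        beta_hat += w * (fmean(yr) - fmean(yf))
        # Variance of the cell mean difference, from sample variances.
        var_hat += w ** 2 * (variance(yr) / len(yr) + variance(yf) / len(yf))

    se = sqrt(var_hat)
    if se == 0:
        raise ValueError("estimated standard error is zero; statistic undefined")
    z = beta_hat / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return beta_hat, se, z, p
```

A positive `beta_hat` indicates that, among examinees matched on both subtest scores, the reference group tends to score higher on the studied item. Keying cells on the (sub1, sub2) pair keeps examinees who reached the same total score through different subtest profiles in separate cells, which is the intuition behind the abstract's finding that matching on two subtest scores matters.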

References

Ackerman, T. (1992). A didactic explanation of item bias, item impact, and item validity from a multidimensional perspective. Journal of Educational Measurement, 29, 67-91.

Ackerman, T. (1993, April). A didactic example of the influence of conditioning on the complete latent ability space when performing DIF analyses. Paper presented at the Annual meeting of the National Council on Measurement in Education, Atlanta.

Ackerman, T. (1996). Graphical representation of multidimensional item response theory analyses. Applied Psychological Measurement, 20, 311-329.

Chang, H., Mazzeo, J., & Roussos, L. (1996). Detecting DIF for polytomously scored items: An adaptation of the SIBTEST procedure. Journal of Educational Measurement, 33, 333-353.

Donoghue, J. R., Holland, P. W., & Thayer, D. T. (1993). A Monte Carlo study of factors that affect the Mantel-Haenszel and standardization measures of differential item functioning. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 137-166). Hillsdale NJ: Erlbaum.

Dorans, N. J., & Kulick, E. (1986). Demonstrating the utility of the standardization approach to assessing unexpected differential item performance on the Scholastic Aptitude Test. Journal of Educational Measurement, 23, 355-368.

Douglas, J., Roussos, L., & Stout, W. (1996). Item bundle DIF hypothesis testing: Identifying suspect bundles and assessing their differential functioning. Journal of Educational Measurement, 33, 465-484.

Douglas, J., Stout, W., & DiBello, L. (1996). A kernel-smoothed version of SIBTEST with applications to local DIF inference and function estimation. Journal of Educational and Behavioral Statistics, 21, 333-363.

Fraser, C., & McDonald, R. P. (1988). NOHARM: Least squares item factor analysis. Multivariate Behavioral Research, 23, 267-269.

Hattie, J. (1985). Methodology review: Assessing unidimensionality of tests and items. Applied Psychological Measurement, 9, 139-164.

Holland, P. W., & Thayer, D. T. (1988). Differential item performance and the Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 129-145). Hillsdale NJ: Erlbaum.

Law School Admission Council. (1989, June). Law School Admission Test. Newtown PA: Author.

Li, H., & Stout, W. (1996). A new procedure for detection of crossing DIF. Psychometrika, 61, 647-677.

Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading MA: Addison-Wesley.

Mazor, K., Kanjee, A., & Clauser, B. E. (1995). Using logistic regression and the Mantel-Haenszel with multiple ability estimates to detect differential item functioning. Journal of Educational Measurement, 32, 131-144.

Millsap, R. E., & Everson, H. T. (1993). Methodology review: Statistical approaches for assessing measurement bias. Applied Psychological Measurement, 17, 297-334.

Nandakumar, R. (1993). Simultaneous DIF amplification and cancellation: Shealy-Stout's test for DIF. Journal of Educational Measurement, 30, 293-311.

Narayanan, P., & Swaminathan, H. (1994). Performance of the Mantel-Haenszel and simultaneous item bias procedure for detecting differential item functioning. Applied Psychological Measurement, 18, 315-328.

Narayanan, P., & Swaminathan, H. (1996). Identification of items that show nonuniform DIF. Applied Psychological Measurement, 20, 257-274.

Raju, N., van der Linden, W., & Fleer, P. (1995). IRT-based internal measures of differential functioning of items and tests. Applied Psychological Measurement, 19, 353-368.

Reckase, M. D., & McKinley, R. L. (1983, April). The definition of difficulty and discrimination for multidimensional item response theory models. Paper presented at the Annual meeting of the American Educational Research Association, Montreal.

Roussos, L., & Stout, W. (1996a). Simulation studies of effects of small sample size and studied item parameters on SIBTEST and Mantel-Haenszel type I error performance. Journal of Educational Measurement, 33, 215-230.

Roussos, L., & Stout, W. (1996b). A multidimensionality-based DIF analysis paradigm. Applied Psychological Measurement, 20, 355-371.

Shealy, R., & Stout, W. (1993). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF. Psychometrika, 58, 159-194.

Shepard, L. A., Camilli, G., & Williams, D. M. (1985). Validity of approximation techniques for detecting item bias. Journal of Educational Measurement, 22, 77-105.

Swaminathan, H., & Rogers, H. J. (1990a, April). A comparison of the logistic regression and Mantel-Haenszel procedures for detecting differential item functioning. Paper presented at the Meeting of the American Educational Research Association, Boston.

Swaminathan, H., & Rogers, H. J. (1990b). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27, 361-370.

Thissen, D., Steinberg, L., & Wainer, H. (1988). Use of item response theory in the study of group differences in trace lines. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 147-169). Hillsdale NJ: Erlbaum.

MULTISIB: A Procedure to Investigate DIF When a Test is Intentionally Two-Dimensional
