Analysis of Differential Item Functioning in Translated Assessment Instruments

Abstract

The usefulness of three IRT-based methods and the Mantel-Haenszel technique in evaluating the measurement equivalence of translated assessment instruments was investigated. A 15-item numerical test and an 18-item reasoning test, both originally developed in English and then translated into French, were used. The analyses were based on four groups, each containing 1,000 examinees. Two groups of English-speaking examinees were administered the English version of the tests; the other two groups were French-speaking examinees who were administered the French version. The percentage of items identified with significant differential item functioning (DIF) in this study was similar to findings in previous large-sample studies. The four DIF methods showed substantial consistency in identifying items with significant DIF when replicated. Suggestions for future research are provided.

Keywords

Index terms: area measures, differential item functioning, item response theory, language translations, Lord's χ², Mantel-Haenszel procedure.

