Detecting DIF across the different language groups in a speaking test

Abstract

The investigation of differential item functioning (DIF) is crucial in language proficiency tests taken by test-takers from diverse backgrounds, because DIF items pose a considerable threat to test validity. To date, DIF analysis in language testing has been conducted mainly for multiple-choice items. However, DIF in polytomously scored items, such as those in writing and speaking tests, should also be examined when validating tests. This study investigates DIF across two broad language groupings, Asian and European, in a speaking test in which the test-takers’ responses are rated polytomously. Data were collected from 1038 nonnative speakers of English from France, Hong Kong, Japan, Spain, Switzerland and Thailand who took the SPEAK test in 1988 (see Educational Testing Service, 1985). The methods used for DIF analysis were the likelihood ratio test and the logistic regression procedure. The primary scoring categories of interest were ‘grammar’, ‘pronunciation’ and ‘fluency’. The results showed that ‘grammar’ and ‘pronunciation’ functioned differentially across the two groups. A content analysis of the DIF items suggested that the types and the numbers of scoring scales might influence test validity. The study provides methodological information on differences between the two approaches to DIF analysis and offers suggestions for future research.
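
As a rough illustration of the logistic regression procedure named in the abstract, the sketch below compares two nested logistic models for a single item: a compact model with a matching score only, and an augmented model that adds group membership and a score-by-group interaction. Twice the difference in log-likelihood is referred to a chi-square distribution with 2 degrees of freedom. This is a simplified sketch and not the study's analysis: the data are synthetic, the response is binary rather than polytomous (rated responses would call for an ordinal model), and the names `fit_logistic`, `lr_dif_test`, `score`, `group` and `resp_dif` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Fit a binary logistic regression by Newton-Raphson.

    Returns the coefficient vector and the maximized log-likelihood.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        grad = X.T @ (y - p)                        # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # observed information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = np.clip(sigmoid(X @ beta), 1e-12, 1.0 - 1e-12)
    loglik = np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    return beta, loglik


def lr_dif_test(score, group, response):
    """Compare compact (score only) and augmented (score, group, interaction) models.

    Returns the statistic G2 and its p-value under chi-square with 2 df.
    """
    ones = np.ones_like(score, dtype=float)
    X_compact = np.column_stack([ones, score])
    X_augmented = np.column_stack([ones, score, group, score * group])
    _, ll_compact = fit_logistic(X_compact, response)
    _, ll_augmented = fit_logistic(X_augmented, response)
    g2 = 2.0 * (ll_augmented - ll_compact)
    # For 2 degrees of freedom the chi-square survival function is exp(-x / 2).
    p_value = float(np.exp(-max(g2, 0.0) / 2.0))
    return g2, p_value


# Synthetic data (an assumption for illustration; not the SPEAK data).
rng = np.random.default_rng(0)
n = 1000
score = rng.normal(size=n)                        # matching variable (ability proxy)
group = rng.integers(0, 2, size=n).astype(float)  # 0 = reference, 1 = focal

# One item simulated with uniform DIF against the focal group, one without DIF.
resp_dif = (rng.random(n) < sigmoid(score - 0.8 * group)).astype(float)
resp_clean = (rng.random(n) < sigmoid(score)).astype(float)

g2_dif, p_dif = lr_dif_test(score, group, resp_dif)
g2_clean, p_clean = lr_dif_test(score, group, resp_clean)
print("item simulated with DIF:    G2 = %.2f, p = %.4f" % (g2_dif, p_dif))
print("item simulated without DIF: G2 = %.2f, p = %.4f" % (g2_clean, p_clean))
```

A statistic above the 2-df critical value (about 5.99 at the .05 level) would flag the item for closer inspection. A statistically significant result does not by itself indicate a practically important effect, so an effect-size measure would normally be reported alongside it.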

References

Ackerman, T. A., Simpson, M. A. and de la Torre, J. 2000: A comparison of the dimensionality of TOEFL response data from different first language groups. Paper presented at the Annual Meeting of the National Council on Measurement in Education, New Orleans, Louisiana.

AERA / APA / NCME (American Educational Research Association, American Psychological Association, and National Council on Measurement in Education) 1999: Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

Bachman, L. F., Davidson, F., Ryan, K. and Choi, I.-C. 1993: An investigation into the comparability of two tests of English as a foreign language: the Cambridge-TOEFL comparability study. Cambridge: University of Cambridge Press.

Brown, J. D. 1999: The relative importance of persons, items, subtests and languages to TOEFL test variance. Language Testing 16(2), 217-238.

Chen, Z. and Henning, G. 1985: Linguistic and cultural bias in language proficiency tests. Language Testing 2(2), 155-163.

Clauser, B. and Mazor, K. 1998: Using statistical procedures to identify differentially functioning test items. ITEMS, 31-44.

Educational Testing Service 1985: SPEAK examinee handbook and sample questions. Princeton, NJ: Educational Testing Service.

Ginther, A. and Stevens, J. 1998: Language background and ethnicity, and the internal construct validity of the Advanced Placement Spanish Language Examination. In Kunnan, A. J., editor, Validation in language assessment. Mahwah, NJ: Lawrence Erlbaum, 169-194.

Hale, G. A., Rock, D. A. and Jirele, T. 1989: Confirmatory factor analysis of the Test of English as a Foreign Language. TOEFL Research Report No. 32. Princeton, NJ: Educational Testing Service.

Kim, S. H. and Cohen, A. 1995: A comparison of Lord’s chi-square, Raju’s area measures, and the likelihood ratio test on detection of differential item functioning. Applied Measurement in Education 8(4), 291-312.

Kirk, R. E. 1996: Practical significance: a concept whose time has come. Educational and Psychological Measurement 56, 746-759.

Kunnan, A. J. 1994: Modelling relationships among some test-taker characteristics and performance on EFL tests: an approach to construct validation. Language Testing 11(3), 225-252.

Lord, F. M. 1980: Application of item response theory to practical testing problems. Hillsdale, NJ: Erlbaum.

Oltman, P. K., Stricker, L. J. and Barrows, T. 1988: Native language, English proficiency, and the structure of the Test of English as a Foreign Language for several language groups. TOEFL Research Report No. 27. Princeton, NJ: Educational Testing Service.

Prentice, D. A. and Miller, D. T. 1992: When small effects are impressive. Psychological Bulletin 112(1), 160-164.

Raju, N. S. 1988: The area between two item characteristic curves. Psychometrika 53, 495-502.

Rogers, H. J. and Swaminathan, H. 1993: A comparison of logistic regression and Mantel-Haenszel procedures for detecting differential item functioning. Applied Psychological Measurement 17(2), 105-116.

Ryan, K. and Bachman, L. F. 1992: Differential item functioning on two tests of EFL proficiency. Language Testing 9(1), 12-29.

Samejima, F. 1969: Estimation of latent ability using a response pattern of graded scores. Psychometric monograph No. 17.

Samejima, F. 1996: Graded response model. In van der Linden, W. and Hambleton, R., editors, Handbook of modern item response theory. New York: Springer, 85-100.

Sasaki, M. 1991: A comparison of two methods for detecting differential item functioning in an ESL placement test. Language Testing 8(2), 95-111.

SPSS Incorporated 1997: SPSS 8.0. Chicago, IL: SPSS, Inc.

Swaminathan, H. and Rogers, H. J. 1990: Detecting item bias using logistic regression procedures. Journal of Educational Measurement 27, 361-370.

Swinton, S. S. and Powers, D. E. 1980: Factor analysis of the Test of English as a Foreign Language for several language groups. TOEFL Research Report No. 6. Princeton, NJ: Educational Testing Service.

Thissen, D. 1991: MULTILOG user’s guide. Computer program. Chicago, IL: Scientific Software.

Thissen, D., Steinberg, L. and Wainer, H. 1988: Use of item response theory in the study of group differences in trace lines. In Wainer, H. and Braun, H. I., editors, Test validity. Hillsdale, NJ: Erlbaum, 147-169.

Thissen, D., Steinberg, L. and Wainer, H. 1993: Detection of differential item functioning using the parameters of item response models. In Holland, P. W. and Wainer, H., editors, Differential item functioning. Hillsdale, NJ: Lawrence Erlbaum, 67-113.

Zumbo, B. 1999: A handbook on the theory and methods of differential item functioning: logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa: Directorate of Human Resources Research and Evaluation, Department of National Defense.