Abstract
The investigation of differential item functioning (DIF) is crucial in language proficiency tests taken by test-takers with diverse backgrounds, because DIF items pose a considerable threat to test validity. To date, DIF analysis in language testing has been conducted mainly on multiple-choice items. However, DIF in polytomously scored items, such as those in writing and speaking tests, should also be examined when validating tests. This study investigates DIF across two broad language groupings, Asian and European, in a speaking test in which test-takers’ responses are rated polytomously. Data were collected from 1038 nonnative speakers of English from France, Hong Kong, Japan, Spain, Switzerland and Thailand who took the SPEAK test in 1988 (see Educational Testing Service, 1985). The methods used for DIF analysis were the likelihood ratio test and the logistic regression procedure. The primary scoring categories of interest were ‘grammar’, ‘pronunciation’ and ‘fluency’. The results showed that ‘grammar’ and ‘pronunciation’ functioned differentially across the two groups. A content analysis of the DIF items suggested that the types and numbers of scoring scales might influence test validity. The study provides methodological information on the differences between the two approaches to DIF analysis and offers suggestions for future research.
