Abstract
Differential item functioning (DIF) occurs when people with the same proficiency have different probabilities of giving a certain response to an item. The present study focused on item residual homogeneity, an assumption that is implicit in popular methods for DIF testing but has received little attention in the published literature. The assumption is explained, a strategy for detecting violations of it (i.e., item residual heterogeneity) is illustrated with empirical data, and simulations are carried out to evaluate the performance of binary logistic regression, two-group item response theory (IRT), and the Mantel–Haenszel (MH) test in the presence of item residual heterogeneity. Results indicated that heterogeneity inflated Type I error and attenuated power for logistic regression; for two-group IRT, it attenuated power and produced biased estimates of the latent focal group mean and standard deviation. The MH test was robust to item residual heterogeneity, probably because it does not use the logistic function.
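To make the last point concrete, the following is a minimal sketch of the textbook MH procedure for a single binary item, not the authors' code. It assumes examinees have already been matched on an observed score and summarized as one 2×2 table of counts per score level; the function name `mantel_haenszel` and the input layout are hypothetical choices for illustration. The statistic is built only from these stratified counts, so no logistic function enters the computation.

```python
def mantel_haenszel(strata):
    """Textbook Mantel-Haenszel DIF statistics for one binary item.

    `strata` is an iterable of (a, b, c, d) counts, one tuple per matched
    score level (hypothetical layout chosen for this sketch):
      a = reference group, correct     b = reference group, incorrect
      c = focal group, correct         d = focal group, incorrect
    Returns (common odds ratio, continuity-corrected chi-square).
    """
    num = den = 0.0                      # pieces of the common odds ratio
    sum_a = sum_e = sum_v = 0.0          # observed, expected, variance of a
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:                        # stratum carries no information
            continue
        num += a * d / n
        den += b * c / n
        n_ref, n_foc = a + b, c + d      # group sizes in this stratum
        m1, m0 = a + c, b + d            # correct / incorrect totals
        sum_a += a
        sum_e += n_ref * m1 / n
        sum_v += n_ref * n_foc * m1 * m0 / (n * n * (n - 1))
    odds_ratio = num / den if den > 0 else float("nan")
    chi2 = ((abs(sum_a - sum_e) - 0.5) ** 2 / sum_v
            if sum_v > 0 else float("nan"))
    return odds_ratio, chi2


# Made-up counts for two score levels, identical across groups.
odds_ratio, chi2 = mantel_haenszel([(30, 20, 30, 20), (40, 10, 40, 10)])
```

A common odds ratio near 1 indicates no DIF at matched score levels; values far from 1 suggest the item favors one group. The chi-square is conventionally compared against a chi-square distribution with one degree of freedom.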
