Abstract
This study introduces a novel differential item functioning (DIF) method based on propensity score matching. The method addresses two challenges in analyzing performance assessment data: task scores that are continuous, and the lack of a reliable internal variable that can serve as a proxy for ability or aptitude. The proposed DIF method consists of two main stages. First, propensity score matching is used to remove group differences that existed before the test, ideally creating groups as equivalent as those in a randomized experimental study. Then, linear mixed effects models are fitted to the matched data set to perform the DIF analysis. We demonstrate this propensity DIF method using a high-stakes functional English language proficiency test. DIF due to education was investigated in the writing component, which consists of two continuously scored performance-based tasks. Although the proposed method is demonstrated in the context of language testing, it can be applied to other types of performance assessments.
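The first stage described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the propensity model, the matching algorithm, or the covariates, so the logistic regression, the greedy 1:1 nearest-neighbour matching, the caliper of 0.05, the function names, and the synthetic data are all assumptions made for illustration. The second stage (a linear mixed effects model fitted to the matched sample) is not shown.

```python
import math
import random


def fit_propensity(X, z, lr=0.5, epochs=500):
    """Fit a logistic regression for P(z = 1 | x) by batch gradient ascent."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)  # intercept, then one weight per covariate
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, zi in zip(X, z):
            eta = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = zi - 1.0 / (1.0 + math.exp(-eta))
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


def propensity(w, xi):
    """Estimated probability of belonging to the focal group."""
    eta = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
    return 1.0 / (1.0 + math.exp(-eta))


def match_pairs(scores, z, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    The caliper (an assumed value) drops focal cases with no close match.
    """
    focal = [i for i, zi in enumerate(z) if zi == 1]
    pool = [i for i, zi in enumerate(z) if zi == 0]
    pairs = []
    for i in focal:
        if not pool:
            break
        j = min(pool, key=lambda k: abs(scores[k] - scores[i]))
        if abs(scores[j] - scores[i]) <= caliper:
            pairs.append((i, j))
            pool.remove(j)
    return pairs


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Synthetic data (hypothetical): one standardized background covariate x,
# with group membership z depending on x so the raw groups differ.
random.seed(0)
X = [[random.gauss(0, 1)] for _ in range(400)]
z = [1 if random.random() < 1 / (1 + math.exp(-0.8 * x[0])) else 0 for x in X]

w = fit_propensity(X, z)
scores = [propensity(w, xi) for xi in X]
pairs = match_pairs(scores, z)

# Covariate gap between groups before and after matching.
gap_before = abs(mean(x[0] for x, zi in zip(X, z) if zi == 1)
                 - mean(x[0] for x, zi in zip(X, z) if zi == 0))
gap_after = abs(mean(X[i][0] for i, _ in pairs)
                - mean(X[j][0] for _, j in pairs))
print(len(pairs), round(gap_before, 3), round(gap_after, 3))
```

Comparing `gap_before` with `gap_after` is a simple balance check: if matching works as intended, the matched groups should differ less on the covariate than the raw groups do. In an analysis following the abstract, the task scores of the matched examinees would then be passed to a linear mixed effects model to test for DIF.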
