Abstract
In item response theory applications, item fit analysis is often performed for precalibrated items using response data from subsequent test administrations. Because this practice involves sampling variability from two distinct samples, which must be properly accounted for in statistical inference, conventional item fit analysis needs to be revisited and modified. This study extends the item fit analysis originally proposed by Haberman et al., which examines the discrepancy between the model-implied and empirical expected score curves. We analytically derive standard errors that accurately account for the sampling variability from both samples within the framework of restricted recalibration. After the derivation, we present findings from a simulation study that evaluates the performance of the proposed method in terms of empirical Type I error rate and power, for both dichotomous and polytomous items. An empirical example is also provided, in which we assess the item fit of a pediatric short-form scale in the Patient-Reported Outcome Measurement Information System.
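To make the quantity under study concrete, the sketch below contrasts an empirical expected score curve (mean observed item score within latent-trait bins) with the model-implied curve from precalibrated parameters. It is a minimal illustration assuming a dichotomous 2PL item with simulated data and hypothetical parameter values; it is not the authors' test statistic, whose standard errors additionally account for calibration-sample variability under restricted recalibration.

```python
import numpy as np

def irf_2pl(theta, a, b):
    """2PL item response function: probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# --- Illustrative data; all parameter values are hypothetical ---
rng = np.random.default_rng(0)
n = 5000
theta = rng.normal(size=n)                    # latent trait scores
a_true, b_true = 1.2, 0.3                     # generating parameters
y = rng.binomial(1, irf_2pl(theta, a_true, b_true))

# Precalibrated parameters, treated as fixed from an earlier sample
a_cal, b_cal = 1.2, 0.3

# Empirical expected score curve: mean observed score within theta bins
edges = np.linspace(-3, 3, 13)
centers = 0.5 * (edges[:-1] + edges[1:])
idx = np.digitize(theta, edges) - 1
emp = np.array([y[idx == k].mean() if np.any(idx == k) else np.nan
                for k in range(len(centers))])

# Model-implied expected score curve evaluated at the bin centers
imp = irf_2pl(centers, a_cal, b_cal)

# Pointwise discrepancy: the quantity whose sampling variability
# (from both the calibration and validation samples) the paper's
# derived standard errors are meant to capture
disc = emp - imp
```

For a polytomous item, the same comparison applies with the expected score computed as a probability-weighted sum over the response categories.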
Supplementary Material
