Abstract
Common items with inconsistent b-parameter estimates may have a serious impact on item response theory (IRT)-based equating results. To identify better ways of handling outlier common items with inconsistent b-parameters, the current study investigated the comparability of 10 variations of four IRT-based equating methods (i.e., concurrent calibration, separate calibration with test characteristic curve [TCC] and mean/sigma [M/S] transformations, and calibration with fixed common item parameters [FCIP]) when outliers were either ignored or considered. Simulated data were generated for the common-item nonequivalent groups matrix design to reflect the manipulated factors: group ability differences and nonequivalent groups, number and score points of outliers, and types of outliers. When no outliers were present, the TCC and M/S transformations performed best. When outliers were present, the methods that considered them (except the M/S transformation with outliers weighted) generally produced a vast improvement over the methods that ignored them.
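To show why a single inconsistent b-parameter matters, the sketch below uses the standard mean/sigma linking formulas, A = SD(b_old) / SD(b_new) and B = mean(b_old) - A * mean(b_new), where A and B place new-form parameters on the old-form scale. The item values are hypothetical, and dropping the flagged item is only one simple way of "considering" an outlier. It is not presented as the study's exact procedure, which also includes weighted variants.

```python
from statistics import mean, pstdev


def mean_sigma(b_new, b_old):
    """Mean/sigma linking constants from common-item b-parameters.

    Returns (A, B) such that b_on_old_scale = A * b_new + B.
    """
    A = pstdev(b_old) / pstdev(b_new)
    B = mean(b_old) - A * mean(b_new)
    return A, B


def rescale_item(a, b, A, B):
    """Put one new-form item (a, b) on the old-form scale."""
    return a / A, A * b + B


# Hypothetical common-item b estimates: the old form is a pure +0.2 shift
# of the new form, except the last item, which drifted by an extra +1.0.
b_new = [-1.0, -0.5, 0.0, 0.5, 1.0]
b_old = [-0.8, -0.3, 0.2, 0.7, 2.2]

# Ignoring the outlier: the drifted item inflates A and distorts B.
A_all, B_all = mean_sigma(b_new, b_old)

# Considering the outlier by removing it (index 4) from the linking set.
keep = [0, 1, 2, 3]
A_kept, B_kept = mean_sigma([b_new[i] for i in keep], [b_old[i] for i in keep])

print(f"ignored:  A={A_all:.3f}, B={B_all:.3f}")
print(f"removed:  A={A_kept:.3f}, B={B_kept:.3f}")
print("rescaled item:", rescale_item(1.2, 0.5, A_kept, B_kept))
```

With the outlier removed, the constants recover the true shift (A close to 1, B close to 0.2). With the outlier retained, A is inflated well above 1, which stretches every rescaled b-parameter and equated score.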
