Abstract
Large-scale testing programs involving classification decisions typically have multiple forms available and conduct equating to ensure cut-score comparability across forms. A test developer might be interested in the extent to which an examinee who happens to take a particular form would receive a consistent classification decision had the examinee taken an equated alternate form. In this article, classification consistency indices directly applicable to equating contexts are introduced, and procedures for estimating these indices are presented under three equating designs: the single-group design, the random-groups design, and the common-item nonequivalent-groups design. Two families of psychometric models (item response theory models and beta-binomial models) are considered, with a focus on procedures for estimating conditional score distributions and ability distributions. Two empirical analyses illustrate the use of the methodology under the common-item nonequivalent-groups design and the random-groups design, using item response theory models and beta-binomial models.
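As a minimal sketch of the kind of index the abstract describes: under a beta-binomial model, an examinee's true score follows a beta distribution, and observed scores on two parallel forms are conditionally independent binomial draws given the true score. A marginal consistency index is then the probability that both forms yield the same pass/fail decision, integrated over the true-score distribution. All parameter values below are hypothetical illustrations, not the article's data, and the quadrature approach is an assumption for this sketch.

```python
import math

def binom_sf(n, p, c):
    """P(X >= c) for X ~ Binomial(n, p) -- probability of passing cut score c."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c, n + 1))

def classification_consistency(n_items, a, b, cut, n_grid=2000):
    """Marginal classification consistency under a beta-binomial model.

    True proportion-correct pi ~ Beta(a, b); scores on two parallel forms
    are conditionally independent Binomial(n_items, pi) given pi.
    Integrates P(same decision on both forms | pi) over the beta density
    by midpoint quadrature on a grid (hypothetical parameters throughout).
    """
    beta_const = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    total = 0.0
    for i in range(n_grid):
        pi = (i + 0.5) / n_grid
        dens = pi**(a - 1) * (1 - pi)**(b - 1) / beta_const
        p_pass = binom_sf(n_items, pi, cut)
        # Same decision: both pass or both fail.
        cond_consistency = p_pass**2 + (1 - p_pass)**2
        total += dens * cond_consistency / n_grid
    return total
```

For example, `classification_consistency(20, 8, 4, 12)` gives the probability of a consistent decision on a 20-item test with cut score 12 under a Beta(8, 4) true-score distribution; because the conditional consistency p² + (1 − p)² is never below 0.5, the index always lies between 0.5 and 1.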
