Abstract
This study compares the efficacy of different strategies for translating item-level, proportion-correct standard-setting judgments into a θ-metric test cutoff score for use with item response theory (IRT) scoring, using Monte Carlo methods. Simulated Angoff-type ratings, consisting of 1,000 independent 75 Item × 13 Rater matrices, were generated at five points along the θ continuum, at three levels of rater fit to the item characteristic curves, yielding 14,625,000 ratings as the basis of the analyses. These simulated proportion-correct ratings were converted to the IRT θ scale using test-level and item-level methods explicated by Kane (1987). Kane's optimally weighted, item-level conversion method initially produced anomalous results; however, imposing a restriction on the weights was found to avoid these anomalies and to render the optimally weighted method the most statistically efficient. Six areas for future research are outlined for advancing the integration of these classical standard-setting ratings into IRT methodology.
