Abstract
National Assessment has devised a scoring system that uses nominal classifications of responses as well as an overall evaluation of whether responses provide acceptable evidence that an educational objective has been met. This study was concerned with selecting a measure of scorer agreement for an ongoing quality-control program; the agreement statistic's purpose was to flag potential problems for further analysis of disagreements. Based on a review of various statistics proposed for measuring agreement, two were chosen for further study: the simple percent of agreement and Cohen's kappa.
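For reference, the two candidate statistics take their standard forms (a notational sketch following the usual literature, not equations reproduced from the article): with $p_{ij}$ the proportion of responses that scorer 1 places in category $i$ and scorer 2 in category $j$,

$$p_o = \sum_i p_{ii}, \qquad p_e = \sum_i p_{i\cdot}\, p_{\cdot i}, \qquad \kappa = \frac{p_o - p_e}{1 - p_e},$$

where $p_o$ is the observed proportion of agreement and $p_e$ the agreement expected by chance from the table's marginals.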
The analysis showed that Cohen's kappa is extremely sensitive to very easy or very difficult items, which comprised more than a third of the measures in this study. The percent of agreement statistic, on the other hand, is overly sensitive to high-variance items: those of medium difficulty with many scoring categories. Since kappa is inappropriate for so many items, and since National Assessment can use the disagreement information on many-category items in later analysis, staff concluded that Cohen's kappa does not add enough information to make its calculation worthwhile.
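Kappa's sensitivity to very easy or very difficult items can be illustrated with a small numeric sketch (the counts are hypothetical, chosen only to show the effect, and the helper function is ours, not the article's):

    def cohens_kappa(table):
        # table: square list-of-lists of cross-scorer counts
        n = sum(sum(row) for row in table)
        k = len(table)
        p_o = sum(table[i][i] for i in range(k)) / n   # observed agreement
        rows = [sum(row) / n for row in table]         # scorer 1 marginals
        cols = [sum(table[i][j] for i in range(k)) / n
                for j in range(k)]                     # scorer 2 marginals
        p_e = sum(r * c for r, c in zip(rows, cols))   # chance agreement
        return (p_o - p_e) / (1 - p_e)                 # undefined if p_e == 1

    # A hypothetical "very easy" item: two scorers, 100 responses, 98% raw
    # agreement, but nearly all responses fall in one category, so chance
    # agreement is already about 0.96 and kappa collapses to roughly 0.49.
    easy_item = [[97, 1],
                 [1, 1]]
    print(cohens_kappa(easy_item))  # -> about 0.49

At the extreme where every response lands in a single category, $p_e = 1$ and kappa is undefined, which is why items that are very easy or very difficult are problematic for the statistic even when raw agreement is nearly perfect.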
