Abstract
A performance assessment consisting of 10 separate exercises was scored with a randomized scoring procedure. All responses to each exercise were rated once; in addition, a randomly selected subset of the responses to each exercise received an independent second rating. Each second rating was averaged with the corresponding first rating before the scores were computed. This article presents a method for estimating the scoring reliability (interrater reliability) coefficient and the standard error of scoring for the resulting scores. The article concludes with numerical examples showing how the reliability estimation procedure can be used to estimate the effect of varying the proportion of responses that are double-scored.
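The abstract does not give the estimation formulas, so the following is only an illustrative sketch under simple classical assumptions, not the article's method. It assumes that the total score is the sum of the exercise scores, that rating errors are independent across ratings and exercises, and that averaging two ratings halves the rating-error variance. Under those assumptions, an exercise with rating-error variance v and a proportion p of double-scored responses contributes an expected error variance of v(1 - p/2). All function names, parameter names, and numerical values below are hypothetical.

```python
from math import sqrt


def scoring_error_variance(rater_error_vars, double_scored_props):
    """Expected scoring-error variance of the total score.

    Assumption (not from the article): a single rating of exercise i has
    error variance v_i, and the average of two independent ratings has
    error variance v_i / 2. With proportion p_i double-scored, the expected
    error variance for that exercise is (1 - p_i) * v_i + p_i * v_i / 2.
    """
    if len(rater_error_vars) != len(double_scored_props):
        raise ValueError("need one proportion per exercise")
    for p in double_scored_props:
        if not 0.0 <= p <= 1.0:
            raise ValueError("proportions must lie in [0, 1]")
    return sum(v * (1.0 - p / 2.0)
               for v, p in zip(rater_error_vars, double_scored_props))


def scoring_reliability(non_error_var, error_var):
    """Share of score variance that is not due to scoring error."""
    return non_error_var / (non_error_var + error_var)


def standard_error_of_scoring(error_var):
    """Standard error of scoring: square root of the error variance."""
    return sqrt(error_var)


# Hypothetical inputs: 10 exercises, each with single-rating error
# variance 0.25, and a score variance of 20.0 that is free of scoring error.
RATER_ERROR_VARS = [0.25] * 10
NON_ERROR_VAR = 20.0

for prop in (0.0, 0.5, 1.0):
    err = scoring_error_variance(RATER_ERROR_VARS, [prop] * 10)
    rel = scoring_reliability(NON_ERROR_VAR, err)
    print(f"p={prop:.1f}  error var={err:.3f}  "
          f"SE={standard_error_of_scoring(err):.3f}  reliability={rel:.3f}")
```

Holding the non-error variance fixed while changing the proportions is what lets the sketch compare double-scoring plans; a real analysis would estimate both variance components from the rating data.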
