Abstract
A method for measuring interrater agreement on checklists is presented. Rather than assigning individual scores to raters, the technique computes a single agreement score from the concordance of their check mark configurations. An overall coefficient of agreement, called phi, is derived. The agreement coefficient expected by chance and the statistical significance of phi are determined by statistical simulation. Although checklist agreement is dichotomous (raters either agree or disagree on each item), we show that the binomial distribution does not provide a valid test of the statistical significance of phi. A medical education study illustrates the phi methodology.
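The simulation approach described in the abstract can be sketched as a Monte Carlo procedure: compute an agreement score from the observed check mark configurations, then repeatedly shuffle each rater's checks across items to build a null distribution, from which both the chance-expected agreement and a p-value follow. The sketch below is illustrative only: it assumes a simple stand-in definition of phi (the proportion of items on which all raters agree) and a shuffle-based null that preserves each rater's number of checks; the paper's actual coefficient and null model may differ.

```python
import random

def phi_agreement(marks):
    """Hypothetical stand-in for phi: the proportion of items on which
    all raters place the same mark (the paper's definition may differ).
    `marks` is a list of per-rater lists of 0/1 check marks."""
    n_items = len(marks[0])
    agreements = sum(
        1 for i in range(n_items)
        if len({rater[i] for rater in marks}) == 1
    )
    return agreements / n_items

def simulate_null(marks, n_sims=10_000, seed=0):
    """Monte Carlo null distribution: shuffle each rater's checks across
    items (preserving that rater's total number of checks), recompute the
    agreement score, and repeat. Returns the observed score, the mean
    chance-level score, and a one-sided p-value."""
    rng = random.Random(seed)
    observed = phi_agreement(marks)
    null_scores = []
    for _ in range(n_sims):
        shuffled = []
        for rater in marks:
            permuted = list(rater)
            rng.shuffle(permuted)
            shuffled.append(permuted)
        null_scores.append(phi_agreement(shuffled))
    chance = sum(null_scores) / n_sims
    p_value = sum(1 for s in null_scores if s >= observed) / n_sims
    return observed, chance, p_value
```

Because the null scores are generated by permutation rather than assumed to follow a binomial law, the p-value reflects the dependence among items induced by each rater's marking pattern, which is the kind of structure the abstract says a binomial test cannot accommodate.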