Abstract
When evaluating a new diagnostic test against a less-than-perfect "gold standard," the kappa coefficient of agreement κ is often inappropriately used as a measure of "diagnostic accuracy," which frequently leads to paradoxical findings. In this paper, κ is expressed as a function of disease prevalence and diagnostic accuracy (subject to Youden's index > 0), whereby necessary and sufficient conditions, given the accuracy rates, are derived to aid in locating the maximizer of κ. Paradoxical behavior of κ can thus be detected in light of diagnostic accuracy. Attempts are made to clarify the subtle difference between "diagnostic accuracy" and "diagnostic reliability." The implication of this difference is then assessed from a regulatory perspective. To extend the idea of κ beyond its originally intended use, the maximum likelihood method, coupled with the Expectation-Maximization algorithm, is proposed as a remedial option, not for measuring diagnostic agreement or reliability but rather for evaluating diagnostic accuracy. Some illustrative examples adapted from published data are provided.
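The prevalence dependence described above can be sketched numerically. The following is a minimal illustration (not the paper's own derivation) that treats the reference standard as perfect and computes Cohen's κ between a binary test and that reference from prevalence, sensitivity, and specificity; holding accuracy fixed while varying prevalence exhibits the paradox in which κ falls even though the accuracy rates do not change.

```python
def kappa(prev, se, sp):
    """Cohen's kappa between a binary test and a perfect reference.

    prev: disease prevalence
    se, sp: sensitivity and specificity of the test
    (illustrative values; not from the paper's data)
    """
    po = prev * se + (1 - prev) * sp           # observed agreement
    q = prev * se + (1 - prev) * (1 - sp)      # marginal P(test positive)
    pe = prev * q + (1 - prev) * (1 - q)       # chance agreement from marginals
    return (po - pe) / (1 - pe)

# Same accuracy (se = sp = 0.9, Youden's index = 0.8) at two prevalences:
for p in (0.05, 0.50):
    print(f"prevalence={p:.2f}  kappa={kappa(p, 0.9, 0.9):.3f}")
# kappa is about 0.43 at 5% prevalence but 0.80 at 50%,
# even though sensitivity and specificity are unchanged.
```

This is why κ, which depends on the marginal distributions, behaves as a measure of agreement rather than of accuracy: the same sensitivity and specificity yield very different κ values across populations.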
