The Equivalence of Multiple Rater Kappa Statistics and Intraclass Correlation Coefficients

Abstract

Using the Gini-Light-Margolin concept of partitioning variance for qualitative data, correspondences are established between various kappa statistics and intraclass correlation coefficients under general conditions (multiple raters and polychotomous category systems). A measure of marginal symmetry for multiple ratings is also developed and is shown to have a proportion-of-variance explanation.
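As an illustration of the kind of correspondence described, the following minimal sketch checks numerically that Fleiss' (1971) multiple-rater kappa can be rewritten as one minus a ratio of within-subject to total variation, using the Gini measure of variation for categorical data as applied by Light and Margolin (1971). It is not taken from the article: the function names and the small ratings table are hypothetical, and the sketch assumes every subject is rated by the same number of raters.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa from a subjects-by-categories table of rating counts."""
    N = len(counts)        # number of subjects
    n = sum(counts[0])     # raters per subject (assumed constant)
    k = len(counts[0])     # number of categories
    # Mean observed pairwise agreement across subjects.
    p_bar = sum((sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts) / N
    # Chance agreement from the pooled category proportions.
    p_j = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)


def gini_variation(category_counts):
    """Gini variation of a categorical sample: m/2 - sum(c^2) / (2m)."""
    m = sum(category_counts)
    return m / 2 - sum(c * c for c in category_counts) / (2 * m)


def kappa_from_variation(counts):
    """Same coefficient written as 1 - (within-subject mean square) / (total variation per rating)."""
    N = len(counts)
    n = sum(counts[0])
    k = len(counts[0])
    pooled = [sum(row[j] for row in counts) for j in range(k)]
    total_ss = gini_variation(pooled)                       # variation of all N*n ratings
    within_ss = sum(gini_variation(row) for row in counts)  # variation inside each subject
    mean_sq_within = within_ss / (N * (n - 1))
    total_per_rating = total_ss / (N * n)
    return 1 - mean_sq_within / total_per_rating


# Hypothetical data: 5 subjects, 3 raters each, 3 categories.
ratings = [[3, 0, 0], [0, 3, 0], [2, 1, 0], [1, 1, 1], [0, 0, 3]]
assert abs(fleiss_kappa(ratings) - kappa_from_variation(ratings)) < 1e-12
```

The agreement of the two functions follows algebraically from the definitions, so the check should hold for any table with a constant number of raters per subject and more than one category in use.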

References

Collis, G. M. (1985). Kappa, measures of marginal symmetry and intraclass correlations. Educational and Psychological Measurement, 45, 55-62.

Conger, A. J. (1980). Integration and generalization of kappas for multiple raters. Psychological Bulletin, 88, 322-328.

Fleiss, J. L. (1965). Estimating the accuracy of dichotomous judgements. Psychometrika, 30, 469-479.

Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76, 378-382.

Fleiss, J. L. and Cohen, J. (1973). The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educational and Psychological Measurement, 33, 613-619.

Fleiss, J. L. and Cuzick, J. (1979). The reliability of dichotomous judgements: Unequal number of judges per subject. Applied Psychological Measurement, 3, 537-542.

Gini, C. (1912). Variabilità e mutabilità: contributo allo studio delle distribuzioni e delle relazioni statistiche. Bologna: Cuppini.

Gini, C. (1939). Variabilità e Concentrazione. Vol. 1 di: Memorie di metodologia statistica. Milano: Giuffrè.

Krippendorff, K. (1970). Bivariate agreement coefficients for reliability of data. In E. F. Borgatta and G. W. Bohrnstedt (Eds.), Sociological methodology 1970. San Francisco: Jossey-Bass.

Light, R. J. and Margolin, B. H. (1971). An analysis of variance for categorical data. Journal of the American Statistical Association, 66, 534-544.

Rae, G. (1984). On measuring agreement among several judges on the presence or absence of a trait. Educational and Psychological Measurement, 44, 247-253.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86, 420-428.

Winer, B. J. (1971). Statistical principles in experimental design (2nd ed.). New York: McGraw-Hill.