Consider a reliability study in which different subjects are judged on a dichotomous trait by different sets of judges, possibly unequal in number. A kappa-like measure of reliability is proposed, its correspondence to an intraclass correlation coefficient is pointed out, and a test for its statistical significance is presented. A numerical example is given.
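The abstract does not state the measure itself, but a standard kappa-type estimator for this design (often attributed to Fleiss and Cuzick) can illustrate the idea: for subject *i* rated by *n_i* judges with *x_i* positive judgments, agreement is assessed against the disagreement expected from the overall positive-rating rate. The sketch below is illustrative only; the pair format and function name are assumptions, not taken from the article.

```python
# Sketch of a kappa-type reliability estimate for dichotomous ratings
# when each subject may be judged by a different number of judges.
# Formula: kappa = 1 - [sum_i x_i(n_i - x_i)/n_i] / [N (n_bar - 1) p_bar q_bar]

def kappa_unequal_judges(counts):
    """counts: list of (n_i, x_i) pairs, where n_i is the number of judges
    rating subject i and x_i is the number of positive judgments."""
    N = len(counts)
    total_n = sum(n for n, _ in counts)
    total_x = sum(x for _, x in counts)
    n_bar = total_n / N            # mean number of judges per subject
    p_bar = total_x / total_n      # overall proportion of positive judgments
    q_bar = 1.0 - p_bar
    # Observed within-subject disagreement, summed over subjects
    disagreement = sum(x * (n - x) / n for n, x in counts)
    return 1.0 - disagreement / (N * (n_bar - 1.0) * p_bar * q_bar)

# Five subjects, rated by 3 to 5 judges each; four show perfect agreement
ratings = [(3, 3), (3, 0), (4, 4), (4, 0), (5, 3)]
print(round(kappa_unequal_judges(ratings), 3))  # → 0.656
```

When every subject's judges agree unanimously, the within-subject disagreement term is zero and the estimate equals 1, matching the usual interpretation of kappa as chance-corrected agreement.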