Weighted kappa, described by Cohen in 1968, is widely used in psychological research to
measure agreement between two independent raters. Everitt subsequently provided the exact
variance of weighted kappa for two raters. In this paper, Everitt's exact variance is
extended to three or more raters.
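
For orientation, the following is a minimal sketch of the two-rater weighted kappa statistic of Cohen (1968) that the paper builds on. It is not the paper's own program (reference 15 describes a FORTRAN implementation); the function name, the quadratic disagreement weights, and the example table are illustrative assumptions only.

import numpy as np

def weighted_kappa(table, weights=None):
    # Cohen's (1968) weighted kappa for a k x k two-rater contingency table.
    # table[i, j] counts items placed in category i by rater 1 and j by rater 2.
    # If no weights are supplied, quadratic disagreement weights are assumed
    # (an illustrative default, not prescribed by the paper).
    table = np.asarray(table, dtype=float)
    k = table.shape[0]
    if weights is None:
        idx = np.arange(k)
        weights = (idx[:, None] - idx[None, :]) ** 2   # 0 on the diagonal
    p = table / table.sum()                  # observed cell proportions
    row = p.sum(axis=1)                      # rater 1 marginal proportions
    col = p.sum(axis=0)                      # rater 2 marginal proportions
    expected = np.outer(row, col)            # chance-expected proportions
    observed_dis = (weights * p).sum()       # weighted observed disagreement
    expected_dis = (weights * expected).sum()  # weighted chance disagreement
    return 1.0 - observed_dis / expected_dis

# Hypothetical example: two raters classifying 100 items into three ordered categories
table = [[30, 5, 2],
         [4, 25, 6],
         [1, 7, 20]]
print(round(weighted_kappa(table), 3))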
References
1. Andrés, A. M., & Marzo, P. F. (2004) Delta: a new measure of agreement between two raters. British Journal of Mathematical and Statistical Psychology, 57, 1–19.
2. Andrés, A. M., & Marzo, P. F. (2005) Chance-corrected measures of reliability and validity in K x K tables. Statistical Methods in Medical Research, 14, 473–492.
3. Banerjee, M., Capozzoli, M., McSweeney, L., & Sinha, D. (1999) Beyond kappa: a review of interrater agreement measures. The Canadian Journal of Statistics, 27, 3–23.
4. Brennan, R. L., & Prediger, D. J. (1981) Coefficient kappa: some uses, misuses, and alternatives. Educational and Psychological Measurement, 41, 687–699.
5. Cohen, J. (1960) A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.
6. Cohen, J. (1968) Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70, 213–220.
7. De Mast, J. (2007) Agreement and kappa-type indices. The American Statistician, 61, 148–153.
8. Everitt, B. S. (1968) Moments of the statistics kappa and weighted kappa. British Journal of Mathematical and Statistical Psychology, 21, 97–103.
9. Fleiss, J. L., Cohen, J., & Everitt, B. S. (1969) Large sample standard errors of kappa and weighted kappa. Psychological Bulletin, 72, 323–327.
10. Graham, P., & Jackson, R. (1993) The analysis of ordinal agreement data: beyond weighted kappa. Journal of Clinical Epidemiology, 46, 1055–1062.
11. Hsu, L. M., & Field, R. (2003) Interrater agreement measures: comments on kappa_n, Cohen's kappa, Scott's π, and Aickin's α. Understanding Statistics, 2, 205–219.
12. Kundel, H. L., & Polansky, M. (2003) Measurement of observer agreement. Radiology, 228, 303–308.
13. Maclure, M., & Willett, W. C. (1987) Misinterpretation and misuse of the kappa statistic. American Journal of Epidemiology, 126, 161–169.
14. Mielke, P. W., & Berry, K. J. (1988) Cumulant methods for analyzing independence of r-way contingency tables and goodness-of-fit frequency data. Biometrika, 75, 790–793.
15. Mielke, P. W., Berry, K. J., & Johnston, J. E. (2005) A FORTRAN program for computing the exact variance of weighted kappa. Perceptual and Motor Skills, 101, 468–472.
16. Nelson, J. C., & Pepe, M. S. (2000) Statistical description of interrater variability in ordinal ratings. Statistical Methods in Medical Research, 9, 475–496.
17. Schuster, C. (2004) A note on the interpretation of weighted kappa and its relations to other rater agreement statistics for metric scales. Educational and Psychological Measurement, 64, 243–253.
18. Schuster, C., & Smith, D. A. (2005) Dispersion-weighted kappa: an integrative framework for metric and nominal scale agreement coefficients. Psychometrika, 70, 135–146.
19. Zwick, R. (1988) Another look at interrater agreement. Psychological Bulletin, 103, 374–378.