A permutation algorithm and an associated FORTRAN program are provided for weighted kappa. Program EWK computes the weighted kappa test statistic and its exact one-sided upper-tail probability value.
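As a rough illustration of what an exact upper-tail probability for weighted kappa means, the sketch below computes the statistic for two raters and then enumerates every reordering of the second rater's labels. It is not the EWK program or its algorithm. It is a brute-force sketch that assumes linear disagreement weights |i - j|, categories coded 0 to k-1, and a sample small enough for complete enumeration. The function names `weighted_kappa` and `exact_upper_tail` are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def weighted_kappa(x, y, k):
    """Weighted kappa for two rating sequences with categories 0..k-1.

    Assumes linear disagreement weights w[i][j] = |i - j| (an illustrative
    choice) and that the expected weighted disagreement is non-zero.
    """
    n = len(x)
    # Observed weighted disagreement, summed over rated objects.
    observed = sum(abs(a - b) for a, b in zip(x, y))
    # Marginal category counts for each rater.
    row = [x.count(c) for c in range(k)]
    col = [y.count(c) for c in range(k)]
    # Expected weighted disagreement under independence, fixed marginals.
    expected = Fraction(
        sum(abs(i - j) * row[i] * col[j] for i in range(k) for j in range(k)),
        n,
    )
    return 1 - Fraction(observed) / expected


def exact_upper_tail(x, y, k):
    """Return (kappa, p), where p is the proportion of all n! reorderings
    of y whose weighted kappa is at least the observed value."""
    kappa = weighted_kappa(x, y, k)
    hits = total = 0
    for perm in permutations(y):  # brute force: feasible only for small n
        total += 1
        if weighted_kappa(x, perm, k) >= kappa:
            hits += 1
    return kappa, Fraction(hits, total)


if __name__ == "__main__":
    rater_a = [0, 0, 1, 1, 2, 2]
    rater_b = [0, 1, 1, 1, 2, 2]
    kappa, p = exact_upper_tail(rater_a, rater_b, 3)
    print(float(kappa), float(p))
```

Exact rational arithmetic (`Fraction`) is used so that ties with the observed statistic are counted reliably. Because the enumeration grows as n!, it is practical only for very small samples, which is why dedicated algorithms that enumerate contingency tables with fixed marginals are preferred in practice.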