An algorithm and associated FORTRAN program are provided for the exact variance of weighted kappa. Program VARKAP provides the weighted kappa test statistic, the exact variance of weighted kappa, a Z score, one-sided lower- and upper-tail N(0,1) probability values, and the two-tail N(0,1) probability value.
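As a rough illustration of the quantities listed above, the following Python sketch computes a weighted kappa statistic from a square agreement table and converts it to a Z score with lower-tail, upper-tail, and two-tail N(0,1) probability values. It is not a reproduction of VARKAP. The function names, the default linear disagreement weights |i - j|, and the use of a zero null mean for kappa are assumptions made for this example, and the exact variance is taken as an input rather than computed, since its formula is not given in this text.

```python
import math


def weighted_kappa(table, weights=None):
    """Weighted kappa for a square table of counts (rows: rater 1, columns: rater 2).

    `weights` holds disagreement weights (0 on the diagonal). The default
    linear weights |i - j| are an assumption made for this sketch.
    """
    r = len(table)
    n = sum(sum(row) for row in table)
    if weights is None:
        weights = [[abs(i - j) for j in range(r)] for i in range(r)]
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(r)) for j in range(r)]
    # Observed weighted disagreement proportion.
    observed = sum(weights[i][j] * table[i][j]
                   for i in range(r) for j in range(r)) / n
    # Weighted disagreement proportion expected from the marginals alone.
    expected = sum(weights[i][j] * row_tot[i] * col_tot[j]
                   for i in range(r) for j in range(r)) / n ** 2
    return 1.0 - observed / expected


def normal_summary(kappa, variance):
    """Z score and N(0,1) tail probabilities for a given kappa and variance.

    `variance` must be supplied by the caller; computing the exact variance
    is outside this sketch. A null mean of zero is assumed.
    """
    z = kappa / math.sqrt(variance)
    lower = 0.5 * math.erfc(-z / math.sqrt(2.0))      # P(Z <= z)
    upper = 0.5 * math.erfc(z / math.sqrt(2.0))       # P(Z >= z)
    two_tail = math.erfc(abs(z) / math.sqrt(2.0))     # P(|Z| >= |z|)
    return z, lower, upper, two_tail
```

A call such as `normal_summary(weighted_kappa(table), variance)` would then return the four values in the order Z, lower tail, upper tail, two tail, where `variance` is obtained separately.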