A permutation algorithm and an associated FORTRAN program are provided for a resampling test of weighted kappa. Program RWK computes the weighted kappa test statistic and its resampling one-sided upper-tail probability value.
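The abstract does not spell out the algorithm, so the following is only a rough Python sketch of what a resampling test of weighted kappa can look like. It is not Program RWK. It assumes Cohen's weighted kappa with agreement weights, kappa_w = (p_o - p_e) / (1 - p_e), where p_o is the weighted proportion of observed agreement and p_e is the weighted proportion expected from the row and column totals. It also assumes linear weights. Random tables with the observed marginal totals are produced by shuffling one rater's ratings rather than by a dedicated random-table generator. The function names `linear_weights`, `weighted_kappa`, and `resampling_p_value` are hypothetical.

```python
import random


def linear_weights(k):
    """Linear agreement weights w[i][j] = 1 - |i - j| / (k - 1), for k >= 2 categories.

    This weighting scheme is an assumption for illustration; other schemes
    (e.g. quadratic) plug in the same way.
    """
    return [[1.0 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]


def weighted_kappa(table, weights):
    """Cohen's weighted kappa for a k x k table of counts with agreement weights."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(table[i]) for i in range(k)]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Weighted observed and chance-expected agreement proportions.
    p_obs = sum(weights[i][j] * table[i][j] for i in range(k) for j in range(k)) / n
    p_exp = sum(weights[i][j] * row_tot[i] * col_tot[j]
                for i in range(k) for j in range(k)) / (n * n)
    return (p_obs - p_exp) / (1.0 - p_exp)


def resampling_p_value(table, weights, n_resamples=10000, seed=None):
    """Return (observed weighted kappa, resampling one-sided upper-tail p-value)."""
    k = len(table)
    # Expand the table into paired ratings, one pair per rated object.
    rater_a, rater_b = [], []
    for i in range(k):
        for j in range(k):
            rater_a.extend([i] * table[i][j])
            rater_b.extend([j] * table[i][j])
    observed = weighted_kappa(table, weights)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        # Shuffling rater B's ratings leaves both sets of marginal totals unchanged.
        rng.shuffle(rater_b)
        resampled = [[0] * k for _ in range(k)]
        for a, b in zip(rater_a, rater_b):
            resampled[a][b] += 1
        # Small tolerance so ties with the observed value are not lost to rounding.
        if weighted_kappa(resampled, weights) >= observed - 1e-12:
            hits += 1
    # Proportion of resampled statistics at least as large as the observed one.
    return observed, hits / n_resamples
```

For example, `resampling_p_value([[20, 5], [10, 15]], linear_weights(2), n_resamples=5000, seed=1)` returns the observed statistic and an estimated upper-tail probability. For this 2 x 2 table the observed value is approximately 0.4, because linear weights reduce to unweighted kappa when there are only two categories. The probability value is a Monte Carlo estimate, so its precision depends on `n_resamples`.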