Resampling Permutation Probability Values for Weighted Kappa

Abstract

A permutation algorithm and associated FORTRAN program are provided for resampling weighted kappa. Program RWK provides the weighted kappa test statistic and the resampling one-sided upper-tail probability value.
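The idea of a resampling upper-tail probability value for weighted kappa can be illustrated with a short sketch. The Python below is not Program RWK and does not reproduce its algorithm. It is a minimal illustration that assumes two raters, categories coded 0 to k-1, and disagreement weights |i - j| raised to a chosen power (1 for linear, 2 for quadratic). The function names `weighted_kappa` and `resample_p_value` and all of their parameters are hypothetical. Shuffling one rater's ratings keeps both sets of marginal frequency totals fixed, which is one common way to generate the reference distribution.

```python
import random


def weighted_kappa(x, y, k, power=1):
    """Weighted kappa for two raters with categories coded 0..k-1.

    Disagreement weights are |i - j| ** power (an assumption of this sketch).
    Undefined (division by zero) if chance-expected disagreement is zero,
    e.g. when every rating falls in a single category.
    """
    n = len(x)
    # Cross-classify the paired ratings into a k-by-k table of counts.
    table = [[0] * k for _ in range(k)]
    for a, b in zip(x, y):
        table[a][b] += 1
    row = [sum(r) for r in table]
    col = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Observed and chance-expected weighted disagreement proportions.
    observed = sum(abs(i - j) ** power * table[i][j]
                   for i in range(k) for j in range(k)) / n
    expected = sum(abs(i - j) ** power * row[i] * col[j]
                   for i in range(k) for j in range(k)) / n ** 2
    return 1.0 - observed / expected


def resample_p_value(x, y, k, n_resamples=10000, power=1, seed=0):
    """Observed weighted kappa and a resampling upper-tail probability value."""
    rng = random.Random(seed)
    kappa_obs = weighted_kappa(x, y, k, power)
    y_perm = list(y)
    hits = 0
    for _ in range(n_resamples):
        # Shuffling one rater's ratings preserves both marginal totals.
        rng.shuffle(y_perm)
        # Small tolerance guards against floating-point ties.
        if weighted_kappa(x, y_perm, k, power) >= kappa_obs - 1e-12:
            hits += 1
    return kappa_obs, hits / n_resamples


if __name__ == "__main__":
    rater_1 = [0, 0, 0, 1, 1, 1, 2, 2, 2]
    rater_2 = [0, 0, 1, 1, 1, 2, 2, 2, 2]
    kappa, p = resample_p_value(rater_1, rater_2, k=3, n_resamples=5000)
    print(f"weighted kappa = {kappa:.4f}, upper-tail p = {p:.4f}")
```

The probability value is the proportion of resampled weighted kappa values equal to or greater than the observed value, so its precision depends on the number of resamples chosen.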
