
Abstract

A permutation algorithm and associated FORTRAN program are provided for weighted kappa. Program EWK provides the weighted kappa test statistic and the exact one-sided upper-tail probability values.
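The idea behind an exact permutation test for weighted kappa can be illustrated with a brute-force sketch. This is not Program EWK, whose FORTRAN source and enumeration algorithm are not shown here. It is a minimal Python illustration under stated assumptions: two raters, linear disagreement weights |i - j|, and a sample small enough that every reordering of the second rater's ratings can be enumerated. The function names `weighted_kappa` and `exact_upper_tail` are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def weighted_kappa(x, y):
    """Weighted kappa for two raters' category codes x and y.

    Assumes linear disagreement weights |i - j|; other weighting
    schemes (e.g. quadratic) are equally possible. Raises
    ZeroDivisionError if all ratings fall in one category.
    """
    n = len(x)
    # Observed mean disagreement across the n rated objects.
    observed = Fraction(sum(abs(a - b) for a, b in zip(x, y)), n)
    # Expected mean disagreement under independence, marginals fixed.
    expected = Fraction(sum(abs(a - b) for a in x for b in y), n * n)
    return 1 - observed / expected


def exact_upper_tail(x, y):
    """Return (kappa, P) where P is the exact one-sided upper-tail
    probability: the share of all n! reorderings of y whose weighted
    kappa is at least as large as the observed value."""
    observed = weighted_kappa(x, y)
    hits = total = 0
    for perm in permutations(y):
        total += 1
        if weighted_kappa(x, perm) >= observed:
            hits += 1
    return observed, Fraction(hits, total)


# Six objects, three ordered categories, perfect agreement.
kappa, p = exact_upper_tail([1, 1, 2, 2, 3, 3], [1, 1, 2, 2, 3, 3])
print(kappa, p)  # prints: 1 1/90
```

Exact rational arithmetic (`Fraction`) is used so that ties with the observed statistic are counted reliably. Enumerating all n! orderings is practical only for very small samples; a production program would need a far more efficient enumeration.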



Exact Permutation Probability Values for Weighted Kappa
