A Computer Program for Calculating Subject-by-Subject Kappa or Weighted Kappa Coefficients

Abstract

This computer program calculates individual subject kappa or weighted kappa coefficients for each of the following three types of categorical data: (a) nominal (dichotomous/polychotomous), (b) ordinal (dichotomous/continuous), and (c) mixed scales of measurement (containing both nominal and ordinal features). Additional output includes criteria for determining levels of both statistical and clinical significance as well as specific tests of examiner bias.
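For readers unfamiliar with the statistics named above, the following is a minimal sketch of the standard kappa and weighted kappa computation from a square table of two examiners' ratings. It is not the program described in the abstract and does not reproduce its subject-by-subject procedure, significance criteria, or bias tests. The function names `weighted_kappa` and `linear_weights`, the linear weighting scheme, and the example counts are all illustrative assumptions.

```python
def linear_weights(k):
    """Hypothetical helper: linear agreement weights for k ordered categories.

    Weight is 1 on the diagonal and falls to 0 for the most distant categories.
    """
    if k < 2:
        raise ValueError("need at least two categories")
    return [[1 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]


def weighted_kappa(table, weights=None):
    """Chance-corrected agreement from a k x k table of counts.

    table[i][j] is the number of cases placed in category i by one examiner
    and in category j by the other. With weights=None, identity weights are
    used, which gives the unweighted kappa.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    if weights is None:
        weights = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]

    row_totals = [sum(table[i]) for i in range(k)]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]

    # Observed weighted agreement.
    p_obs = sum(weights[i][j] * table[i][j]
                for i in range(k) for j in range(k)) / n
    # Weighted agreement expected by chance from the marginal totals.
    p_exp = sum(weights[i][j] * row_totals[i] * col_totals[j]
                for i in range(k) for j in range(k)) / (n * n)

    if p_exp == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_obs - p_exp) / (1 - p_exp)


# Made-up counts for illustration only.
nominal_table = [[20, 5], [10, 15]]
ordinal_table = [[10, 2, 0], [2, 10, 2], [0, 2, 10]]

kappa_nominal = weighted_kappa(nominal_table)
kappa_ordinal_unweighted = weighted_kappa(ordinal_table)
kappa_ordinal_weighted = weighted_kappa(ordinal_table, linear_weights(3))
```

In this sketch the weighted form gives partial credit to near-miss disagreements on an ordinal scale, so for the ordinal example the weighted coefficient comes out higher than the unweighted one.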

