A Computer Program for Determining the Significance of the Difference Between Pairs of Independently Derived Values of Kappa or Weighted Kappa

Abstract

Research investigators often need to know how similar interrater reliability estimates, based upon categorical data, are between pairs of observers, all of whom have independently evaluated the same sample of subjects. The computer program described herein resolves this problem statistically by testing the significance of the difference between pairs of kappa or weighted kappa values.
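The abstract does not state which test statistic the program uses. As an illustration only, the sketch below assumes the common large-sample approach: treat the two kappa (or weighted kappa) estimates as independent and approximately normal, and divide their difference by the square root of the sum of their squared standard errors. The function name `kappa_difference_z` and the numeric inputs are hypothetical and are not taken from the article.

```python
import math


def kappa_difference_z(kappa1, se1, kappa2, se2):
    """Return (z, two-sided p) for the difference between two kappas.

    Assumption (not from the article): the two estimates are independent
    and approximately normal, so the variance of their difference is the
    sum of the two squared standard errors.
    """
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    if se_diff == 0:
        raise ValueError("standard errors must not both be zero")
    z = (kappa1 - kappa2) / se_diff
    # Two-sided tail area of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical values: two observer pairs with kappas of 0.70 and 0.50.
z, p = kappa_difference_z(0.70, 0.06, 0.50, 0.08)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

The standard errors passed in should be the non-null (large-sample) standard errors of each kappa; this sketch does not compute them.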

