Abstract
National Assessment has devised a scoring system that uses nominal classifications of responses as well as an overall evaluation of whether responses provide acceptable evidence that an educational objective has been met. This study was concerned with selecting a measure of scorer agreement for an ongoing quality-control program; the agreement statistic's purpose was to flag potential problems for further analysis of disagreements. Based on a review of various statistics proposed for measuring agreement, two were chosen for further study: the simple percent of agreement and Cohen's kappa.
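For reference, the two candidate statistics take their standard forms (a notational sketch following the usual literature, not equations reproduced from the article): with $p_{ij}$ the proportion of responses that scorer 1 places in category $i$ and scorer 2 in category $j$,

$$p_o = \sum_i p_{ii}, \qquad p_e = \sum_i p_{i\cdot}\, p_{\cdot i}, \qquad \kappa = \frac{p_o - p_e}{1 - p_e},$$

where $p_o$ is the observed proportion of agreement and $p_e$ the agreement expected by chance from the table's marginals.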
The analysis showed that Cohen's kappa is extremely sensitive to very easy or very difficult items, which comprised more than a third of the measures in this study. The percent of agreement statistic, on the other hand, is overly sensitive to high-variance items: those of medium difficulty with many scoring categories. Since kappa is inappropriate for so many items, and since National Assessment can use the disagreement information on many-category items in later analysis, staff concluded that Cohen's kappa does not add enough information to make its calculation worthwhile.
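Kappa's sensitivity to very easy or very difficult items can be illustrated with a small numeric sketch (the counts are hypothetical, chosen only to show the effect, and the helper function is ours, not the article's):

    def cohens_kappa(table):
        # table: square list-of-lists of cross-scorer counts
        n = sum(sum(row) for row in table)
        k = len(table)
        p_o = sum(table[i][i] for i in range(k)) / n   # observed agreement
        rows = [sum(row) / n for row in table]         # scorer 1 marginals
        cols = [sum(table[i][j] for i in range(k)) / n
                for j in range(k)]                     # scorer 2 marginals
        p_e = sum(r * c for r, c in zip(rows, cols))   # chance agreement
        return (p_o - p_e) / (1 - p_e)                 # undefined if p_e == 1

    # A hypothetical "very easy" item: two scorers, 100 responses, 98% raw
    # agreement, but nearly all responses fall in one category, so chance
    # agreement is already about 0.96 and kappa collapses to roughly 0.49.
    easy_item = [[97, 1],
                 [1, 1]]
    print(cohens_kappa(easy_item))  # -> about 0.49

At the extreme where every response lands in a single category, $p_e = 1$ and kappa is undefined, which is why items that are very easy or very difficult are problematic for the statistic even when raw agreement is nearly perfect.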
