Abstract

The validity of five methods of estimating the reliability of criterion-referenced tests was evaluated: one method based on the binomial expansion; two based on Kuder and Richardson's formulae 20 and 21; and two methods based on the analysis of variance. The methods were compared across nine conditions of variability among item means. Within each condition the number of test items, number of testees, the value of the criterion, the population mean, and variance of true scores were varied to form 1024 cases. The results were analyzed by means of a conditions-by-methods analysis of variance, the Newman-Keuls test, and a nonparametric multiple comparison procedure. There was a tendency for all of the methods to be conservative. The KR-21 method tended to be more valid given low variability among item means, and the KR-20 method given high variability.
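As a rough illustration of the quantities named in the abstract, the sketch below computes the textbook forms of KR-20 and KR-21 (Kuder and Richardson, 1937) and of the Livingston (1972) criterion-referenced coefficient for dichotomously scored items. It is not the article's own estimation procedure, and it does not cover the binomial or analysis-of-variance methods. The function names (`kr20`, `kr21`, `livingston_k2`) and the toy data are hypothetical, and the use of the population variance (divisor N) is an assumption.

```python
from statistics import fmean, pvariance


def kr20(scores):
    """KR-20 for rows of 0/1 item scores (one row per examinee).

    Assumes at least two items and a nonzero total-score variance.
    """
    k = len(scores[0])
    totals = [sum(row) for row in scores]
    var_total = pvariance(totals)  # population variance (assumption)
    # Proportion passing each item, then the sum of item variances p*q.
    p = [fmean([row[i] for row in scores]) for i in range(k)]
    sum_pq = sum(pi * (1 - pi) for pi in p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


def kr21(scores):
    """KR-21: same as KR-20 but treats all item means as equal."""
    k = len(scores[0])
    totals = [sum(row) for row in scores]
    mean_total = fmean(totals)
    var_total = pvariance(totals)
    return (k / (k - 1)) * (1 - mean_total * (k - mean_total) / (k * var_total))


def livingston_k2(reliability, mean_total, var_total, criterion):
    """Livingston's coefficient: reliability re-expressed around a criterion score."""
    d2 = (mean_total - criterion) ** 2
    return (reliability * var_total + d2) / (var_total + d2)


# Toy data (hypothetical): 4 examinees, 3 items with unequal item means.
unequal = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
# Toy data (hypothetical): every item has the same mean (0.5).
equal = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]

if __name__ == "__main__":
    print(kr20(unequal), kr21(unequal))  # KR-21 falls below KR-20 here
    print(kr20(equal), kr21(equal))      # the two coincide for equal item means
    print(livingston_k2(kr20(unequal), 1.5, 1.25, 2.5))
```

With unequal item means KR-21 comes out lower than KR-20, and the two agree when every item has the same mean, which is the assumption whose violation the article examines. When the criterion equals the mean total score, the Livingston coefficient reduces to the norm-referenced reliability passed in.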

References

Box, G. E. P. Some theorems and quadratic forms applied in the study of analysis of variance problems: I. Effect of inequality of variance in the one-way classification, Annals of Mathematical Statistics, 1954, 25, 290-302.

Hambleton, R. K. and Novick, M. R. Toward an integration of theory and method for criterion-referenced tests. Journal of Educational Measurement, 1973, 10, 159-170.

Hollander, M. and Wolfe, D. A. Nonparametric Statistical Methods, New York: John Wiley, 1973.

Hoyt, C. J. Test reliability estimated by analysis of variance, Psychometrika, 1941, 6, 153-160.

Kuder, G. F. and Richardson, M. W. The theory of the estimation of test reliability. Psychometrika, 1937, 2, 151-160.

Livingston, S. A. Criterion-referenced applications of classical test theory. Journal of Educational Measurement, 1972, 9, 13-26.

Lord, F. M. Do tests of the same length have the same standard error of measurement? EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT, 1957, 17, 510-521.

Lord, F. M. and Novick, M. R. Statistical theories of mental test scores. Reading, Mass.: Addison-Wesley, 1968.

Lovett, H. T. Criterion-referenced reliability: A comparison of five methods of estimating the Livingston coefficient. 1975 Social Statistics Section Proceedings of the American Statistical Association, 1975, 538-541. (a)

Lovett, H. T. Elaboration and application of a theory of criterion-referenced reliability. Paper read at the meeting of the Southeastern Psychological Association, Atlanta, 1975. (b)

Lovett, H. T. Estimation of the reliability of criterion-referenced tests having multiple criteria. Paper read at the meeting of the Southeastern Psychological Association, New Orleans, 1976.

Lovett, H. T. Criterion-referenced reliability estimated by ANOVA. EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT, 1977, 37, 21-29.

Mehrens, W. A. and Lehmann, I. J. Measurement and evaluation in education and psychology. New York: Holt, Rinehart and Winston, 1973.

Nrand, in Large-scale systems math-pack programmers reference. New York: Univac Division of Sperry Rand, 1973, Section 14, 1-3.

Popham, W. J. An evaluation guidebook: a set of practical guidelines for the educational evaluator. Los Angeles: The Instructional Objectives Exchange, 1972.

Randn, in Large-scale systems math-pack programmers reference. New York: Univac Division of Sperry Rand, 1973, Section 14, 8-12.

Winer, B. J. Statistical principles in experimental design. New York: McGraw-Hill, 1971.

The Effect of Violating the Assumption of Equal Item Means in Estimating the Livingston Coefficient
