Abstract
This article describes the limitations of certain statistical techniques for handling multiple criteria when assessing reliability, concurrent validity, and generalizability, and suggests alternative approaches for performance assessment measures. Applying a latent variable modeling approach to the data revealed a significant improvement in interrater reliability, concurrent validity, and generalizability (over raters and topics) on a scoring rubric; the improvement in concurrent validity was particularly noticeable. However, a main limitation of this study was the small number of subjects, which could have affected the validity of some of the findings.
