
Abstract

The proposed theory provides a basis for both measuring and correcting rater stringency error in some grossly incomplete rating data matrices. The theoretical model fit ratings of student clinical performance made by faculty and resident physicians (n = 47, 31, and 29) in each of three junior-year medical student cohorts (n = 29, 30, and 35) better than alternative models did (R = .92, .85, .82; joint p < .000001). In these data, about 35% of the variation was attributable to stringency and about 40% to ability. Three-month test-retest reliability for rater stringencies was .16 < r < .29 (joint p < .04). Cross-validation supported the proposed model (r = .61) over the conventional alternative (r = .41; z = 2.62, p < .004). Both reliability and convergent validity of the ability construct were .20 greater for one corrected rating than for one observed (uncorrected) rating.
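The abstract does not state the functional form of the proposed model, so the following is only a minimal illustrative sketch of the general idea of separating rater stringency from subject ability when each rater sees only some students. It assumes a simple additive stand-in (rating = ability - stringency + error) fitted by alternating least squares; this is not the authors' model or their MERLIN program, and the function name, variable names, and toy ratings are all hypothetical.

```python
from collections import defaultdict


def fit_additive_ratings(observations, n_iter=200):
    """Fit rating = ability[student] - stringency[rater] by alternating
    least squares on an incomplete set of (rater, student, rating) triples.

    Assumption: an additive stand-in model, not the model from the article.
    Stringencies are centered at zero to make the solution identifiable.
    """
    by_student = defaultdict(list)
    by_rater = defaultdict(list)
    for rater, student, rating in observations:
        by_student[student].append((rater, rating))
        by_rater[rater].append((student, rating))

    ability = {student: 0.0 for student in by_student}
    stringency = {rater: 0.0 for rater in by_rater}

    for _ in range(n_iter):
        # Best ability given current stringencies: mean of corrected ratings.
        for student, rows in by_student.items():
            ability[student] = sum(r + stringency[k] for k, r in rows) / len(rows)
        # Best stringency given current abilities: mean shortfall of ratings.
        for rater, rows in by_rater.items():
            stringency[rater] = sum(ability[s] - r for s, r in rows) / len(rows)
        # Shift both scales together so the mean stringency stays at zero.
        shift = sum(stringency.values()) / len(stringency)
        for rater in stringency:
            stringency[rater] -= shift
        for student in ability:
            ability[student] -= shift
    return ability, stringency


# Hypothetical toy data: three raters, three students, each rater sees two.
observations = [
    ("R1", "A", 6.0), ("R1", "B", 4.0),  # R1 is a stringent rater
    ("R2", "B", 6.0), ("R2", "C", 7.0),  # R2 is a lenient rater
    ("R3", "A", 7.0), ("R3", "C", 6.0),
]
ability, stringency = fit_additive_ratings(observations)

# A corrected rating adds the rater's estimated stringency back in.
corrected = [(k, s, r + stringency[k]) for k, s, r in observations]
print(ability, stringency, corrected)
```

In this toy matrix the uncorrected mean ratings of students A and C are both 6.5, because A happened to draw the stringent rater and C the lenient one. Under the additive assumption, the fitted abilities separate the two students, which is the kind of correction the abstract describes for incomplete rating designs.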



A Deterministic Theory of Clinical Performance Rating
