Detecting Halo Effects in Performance-Based Examinations

Abstract

The main purpose of this article is to demonstrate how halo effects may be detected and quantified using two independent ratings of the same person. A practical illustration is given to show how halo effects can be avoided.

Keywords

rated data, halo effect, performance-based testing, language testing, classical test theory

