Abstract

This study presents a new criterion-referenced approach for exploring rating quality within the framework of latent-class signal detection theory (LC-SDT). The approach goes beyond commonly used reliability indices and provides substantively meaningful indicators of rater accuracy that can be used to inform rater training and monitoring at the individual rater level. Specifically, this study illustrates a flexible application of restricted LC-SDT modeling, in which restrictions can be specified for the true latent classification to reflect the unique characteristics of a particular assessment context. While the LC-SDT modeling framework provides immediately useful characterizations of raters’ behavior, the restricted LC-SDT offers complementary evidence to further support the monitoring of rater behavior by bringing criterion ratings to bear. This study uses ratings from a large-scale writing assessment. The findings suggest that the criterion (i.e., restricted) LC-SDT provides useful information about the rating quality of operational raters relative to criterion ratings, which may ultimately inform rater training and monitoring procedures.

Keywords

rater effects, signal detection theory, latent-class model, expert rater, writing assessment
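
For readers unfamiliar with this model class, the following is a minimal sketch of the general form a latent-class SDT rater model usually takes. The cumulative probability that a rater assigns category k or lower, given latent class eta, is modeled as F(c_k - d * eta), where F is here taken to be the logistic distribution function, d is the rater's detection (discrimination) parameter, and c_k are the rater's ordered response criteria. The function names and all parameter values are hypothetical and chosen only for illustration; they are not estimates or specifications from this study. In the restricted (criterion) variant described in the abstract, the latent class would be constrained using criterion ratings, which is represented here simply by passing a known class value as eta.

```python
import math


def logistic_cdf(x: float) -> float:
    """Logistic distribution function F(x)."""
    return 1.0 / (1.0 + math.exp(-x))


def category_probs(d: float, criteria: list[float], eta: int) -> list[float]:
    """Return P(Y = k | eta) for k = 1..K.

    Built from cumulative probabilities P(Y <= k | eta) = F(c_k - d * eta).
    `criteria` holds the K-1 ordered response criteria c_1 < ... < c_{K-1}.
    """
    cumulative = [logistic_cdf(c - d * eta) for c in criteria] + [1.0]
    probs = []
    previous = 0.0
    for p in cumulative:
        probs.append(p - previous)
        previous = p
    return probs


# Hypothetical rater: detection d = 3.0, four score categories,
# criteria placed midway between adjacent values of d * eta.
d = 3.0
criteria = [1.5, 4.5, 7.5]

for eta in range(4):  # latent classes coded 0..3 (an assumed coding)
    probs = category_probs(d, criteria, eta)
    print(eta, [round(p, 3) for p in probs])
```

Under this sketch, a larger d concentrates each row of probabilities on the category that matches the latent class, which is the sense in which d can be read as an indicator of rater accuracy relative to the (criterion) classification.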

Incorporating Criterion Ratings Into Model-Based Rater Monitoring Procedures Using Latent-Class Signal Detection Theory
