Abstract
In any performance-based musical assessment context, construct-irrelevant variability attributable to raters is a cause for concern when constructing a validity argument. Therefore, evidence of rater quality is a necessary criterion for psychometrically sound (i.e., valid, reliable, and fair) rater-mediated music performance assessments. Rater accuracy is a rater quality index that measures the distance between raters’ operational ratings and an expert’s criterion ratings on a set of benchmark, exemplar, or anchor musical performances. The purpose of this study was to examine the quality of ratings in the context of a secondary-level solo music performance assessment using a Multifaceted Rasch Rater Accuracy (MFR-RA) measurement model. The study was guided by three research questions: (a) Overall, how accurate were the rater judgments in the assessment context? (b) How accurate were the rater judgments across each of the items of the rubric? (c) How accurate were the rater judgments across each of the domains of the rubric? Results indicated that accuracy scores generally matched the expectations of the MFR-RA model, with rater locations higher than the average student performance, item, and domain locations, indicating that the student performances, items, and domains were relatively easy for the sampled raters to rate accurately. Overall, rater accuracy ranged from 0.54 logits (
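The abstract defines rater accuracy as the distance between a rater’s operational ratings and an expert’s criterion ratings on benchmark performances. In Rasch rater-accuracy models, this distance is commonly operationalized as a dichotomous match score (1 = exact agreement with the criterion, 0 = disagreement), which then serves as input to the measurement model. A minimal sketch under that assumption follows; the ratings below are hypothetical and not taken from the study:

```python
# Hedged sketch (not the authors' code): dichotomous accuracy scoring as
# commonly used in Rasch rater-accuracy modeling. All data are hypothetical.

def accuracy_scores(operational, criterion):
    """Return 1 where the rater's rating matches the expert criterion, else 0."""
    return [1 if r == c else 0 for r, c in zip(operational, criterion)]

# Hypothetical ratings on five benchmark performances (e.g., a 1-4 rubric scale)
rater_ratings = [3, 4, 2, 4, 3]   # operational ratings from one rater
expert_ratings = [3, 4, 3, 4, 2]  # expert criterion ratings

scores = accuracy_scores(rater_ratings, expert_ratings)
proportion_accurate = sum(scores) / len(scores)
print(scores)               # [1, 1, 0, 1, 0]
print(proportion_accurate)  # 0.6
```

In the MFR-RA framework, such dichotomous scores are analyzed with a Rasch model so that rater accuracy, performance difficulty, and item (or domain) difficulty are placed on a common logit scale, which is how the accuracy range reported in the abstract is expressed.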
