Does Sparseness Matter? Examining the Use of Generalizability Theory and Many-Facet Rasch Measurement in Sparse Rating Designs

Abstract

Sparse rating designs, where each examinee’s performance is scored by a small proportion of raters, are prevalent in practical performance assessments. However, relatively little research has focused on the degree to which different analytic techniques alert researchers to rater effects in such designs. We used a simulation study to compare the information provided by two popular approaches: Generalizability theory (G theory) and Many-Facet Rasch (MFR) measurement. Previous comparisons used complete, non-simulated data, which limited researchers' ability to manipulate characteristics such as rater effects and to understand the impact of incomplete data on the results. Both approaches provided information about rating quality in sparse designs, but the MFR approach highlighted rater effects related to centrality and bias more readily than G theory.

Keywords

rater effects, Generalizability theory, Rasch model, performance assessment


