Abstract
Sparse rating designs, where each examinee’s performance is scored by a small proportion of raters, are prevalent in practical performance assessments. However, relatively little research has focused on the degree to which different analytic techniques alert researchers to rater effects in such designs. We used a simulation study to compare the information provided by two popular approaches: Generalizability theory (G theory) and Many-Facet Rasch (MFR) measurement. In previous comparisons, researchers used complete data that were not simulated—thus limiting their ability to manipulate characteristics such as rater effects, and to understand the impact of incomplete data on the results. Both approaches provided information about rating quality in sparse designs, but the MFR approach highlighted rater effects related to centrality and bias more readily than G theory.
Get full access to this article
View all access options for this article.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
