Abstract
Rater comments are typically analyzed qualitatively to reveal how raters apply rating scales. This study applied natural language processing (NLP) techniques to quantify meaningful, behavioral information from a corpus of rater comments and triangulated that information with a many-facet Rasch measurement (MFRM) analysis of rater scores. The data consisted of ratings on 987 essays by 36 raters (a total of 3948 analytic scores and 1974 rater comments) on a post-admission English Placement Test (EPT) at a large US university. We computed a set of comment-based features based on the analytic components and evaluative language the raters used to infer whether raters were aligned to the scale. For data triangulation, we performed correlation analyses between the MFRM measures of rater performance and the comment-based measures. Although the EPT raters showed overall satisfactory performance, we found meaningful associations between rater comments and performance features. In particular, raters with higher precision and better fit to the Rasch model's predictions used more analytic components and used evaluative language more similar to the scale descriptors. These findings suggest that NLP techniques have the potential to help language testers analyze rater comments and understand rater behavior.
