Abstract
Automated, or computer-based, scoring represents one promising possibility for improving the cost-effectiveness (and other features) of complex performance assessments, such as direct tests of writing skill, that require examinees to construct responses rather than select them from a set of multiple choices. Indeed, significant advances have been made in applying natural language processing techniques to the automatic scoring of essays. Thus far, most of the validation of automated scoring has focused appropriately (but too narrowly, we contend) on the correspondence between computer-generated scores and those assigned by human readers. Far less effort has been devoted to assessing the relation of automated scores to independent indicators of examinees' writing skills. This study examined the relationship of scores from a graduate-level writing assessment to several independent, non-test indicators of examinees' writing skills, both for automated scores and for scores assigned by trained human readers. The extent to which automated and human scores exhibited similar relations with the non-test indicators was taken as evidence of the degree to which the two methods of scoring reflect similar aspects of writing proficiency. Analyses revealed significant but modest correlations between the non-test indicators and the scores from each of the two scoring methods. These relations were somewhat weaker for automated scores than for scores awarded by human readers. Overall, however, the results provide some evidence of the validity of one specific procedure for automated scoring.
