Abstract
Automated scoring has the potential to dramatically reduce the time and costs associated with the assessment of complex skills such as writing, but its use must be validated against a variety of criteria for it to be accepted by test users and stakeholders. This study approaches validity by comparing human and automated scores on responses to TOEFL® iBT Independent writing tasks with several non-test indicators of writing ability: student self-assessment, instructor assessment, and independent ratings of non-test writing samples. Automated scores were produced using e-rater®, developed by Educational Testing Service (ETS). Both human and e-rater scores correlated moderately but consistently with the non-test indicators, providing criterion-related validity evidence for the use of e-rater along with human scores. The implications of the findings for the validity of automated scores are discussed.
