Abstract
Assessing translation and interpreting (T&I) is essential in tertiary-level T&I education, professional certification, and foreign language testing. Recently, researchers have explored automating T&I assessment, with large language models (LLMs) emerging as a promising agent for automatic scoring. This study presents one of the first large-scale empirical investigations into the scoring reliability, severity, and validity of GPT-4o and DeepSeek-R1 in English–Chinese consecutive and simultaneous interpreting assessment. Using more than 500 pre-scored samples from the Interpreting Quality Evaluation Corpus (IQEC), the study configured eight e-raters per LLM, systematically varying three scoring parameters: reference availability (zero vs. four references), scoring granularity (segment vs. document-level scoring), and model randomness (temperature 0 vs. 1). A combination of correlation, linear mixed model, and Rasch analyses revealed that: (a) both LLMs demonstrated higher reliability than human raters; (b) DeepSeek-R1 applied significantly harsher scoring patterns than GPT-4o; (c) both LLMs achieved moderately strong correlations with human raters, with overall Spearman’s correlation coefficients ranging from .586 to .700; (d) GPT-4o exhibited higher scoring accuracy than DeepSeek-R1; and (e) LLM-based e-raters’ performance varied significantly across different scoring conditions. These results have important theoretical and practical implications, providing insights into optimizing LLM-based automatic scoring for interpreting and broader language assessment contexts.
