Abstract
This study strengthens the validation of learner speech assessment within the Common European Framework of Reference (CEFR) by analyzing quantitative variables related to fluency and accuracy across four CEFR levels (A2, B1, B2, and C1). Drawing on a learner corpus approach, we examine 500,000 tokens from the Louvain International Database of Spoken English Interlanguage (LINDSEI) and its extensions, supplemented by post hoc rater evaluations. Three task types—a semi-monologic topic discussion, a dialogic interaction, and a monologic picture description—are used to elicit variation in speech production. The analysis focuses on speech rate, the frequency of filled and unfilled pauses, and error rates to trace developmental trends in learner speech. The results reveal strong correlations between these fluency and accuracy metrics and CEFR levels, with speech rate emerging as the most reliable indicator of proficiency. The frequency of unfilled pauses decreases as proficiency increases, while filled pauses, although less critical to fluency assessment, offer insights into speech planning mechanisms. Error rates likewise decline at higher proficiency levels, reflecting greater accuracy in speech production. Illustrative examples for each CEFR level are presented, offering practical benchmarks for teaching, assessment, and rater training. While the study's limitations include an overrepresentation of Mandarin Chinese learners and the exclusion of pronunciation errors, these gaps highlight avenues for future research. Overall, the study provides empirical, task-sensitive evidence to enrich CEFR can-do descriptors, enhance rater training, and refine speaking assessments, contributing to more effective language teaching, learning, and assessment practices.