Abstract
This article investigates a group oral test administered at a university in Japan to determine whether its scores are appropriate for higher-stakes decision making. The test is one component of an in-house English proficiency test used for placing students, evaluating their progress, and informing development of the English language curriculum. The recent proposal of a cut-score that students must meet to advance through the university system has brought the group oral test component under increased scrutiny. On two successive occasions, 113 participants sat the oral test in groups composed of different interlocutors each time. Rasch analysis shows rater fit within acceptable levels given the length and nature of the test; however, inter-rater agreement, at a correlation of .74, is lower than has been reported in research on commercially available interview tests. Candidates’ scores on the two test occasions correlate at .61. A generalizability study shows that the greatest systematic variation in test scores comes from the person-by-occasion interaction; topic, or prompt, was not a significant factor. Candidates’ performances, or how raters perceive an individual candidate’s ability, could be affected to a large degree by the characteristics of the interlocutors and the interaction dynamics within the group.