Abstract
This paper presents evidence that supports the valid use of scores from fully automated tests of spoken language ability to indicate a person’s effectiveness in spoken communication. The paper reviews the constructs, scoring, and concurrent validity evidence of ‘facility-in-L2’ tests, a family of automated spoken language tests in Spanish, Dutch, Arabic, and English. The facility-in-L2 tests are designed to measure receptive and productive language ability as test-takers engage in a succession of tasks with meaningful language. Concurrent validity studies indicate that scores from the automated tests are strongly correlated with scores from oral proficiency interviews. In separate studies with learners of each of the four languages, the automated tests predict scores on the live interview tests as well as those tests predict themselves in a test-retest protocol (r = 0.77 to 0.92). Although it might be assumed that the interactive nature of the oral interview elicits performances that manifest a distinct construct, the closeness of the results suggests that the constructs underlying the two approaches to oral assessment have a stable relationship across languages.
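The comparison the abstract makes can be illustrated numerically: a concurrent-validity correlation between automated scores and interview scores is set against the interview’s own test-retest reliability. The sketch below is a minimal, hypothetical example of that comparison; the score vectors are invented for illustration and are not data from the studies reviewed in the paper.

```python
"""Illustrative sketch (not from the paper): comparing a concurrent-validity
correlation with the interview's test-retest reliability."""

import numpy as np
from scipy.stats import pearsonr

# Hypothetical scores for the same learners (invented values, 0-100 scale).
automated  = np.array([62, 75, 81, 58, 90, 70, 66, 84])  # facility-in-L2 test
opi_first  = np.array([60, 78, 80, 55, 92, 72, 64, 85])  # oral proficiency interview, session 1
opi_retest = np.array([63, 74, 82, 57, 89, 71, 67, 83])  # oral proficiency interview, session 2

# Concurrent validity: automated scores vs. live interview scores.
r_concurrent, _ = pearsonr(automated, opi_first)

# Test-retest reliability of the interview itself.
r_retest, _ = pearsonr(opi_first, opi_retest)

print(f"concurrent validity r   = {r_concurrent:.2f}")
print(f"interview test-retest r = {r_retest:.2f}")
# The abstract's claim is that the first correlation is about as high as the
# second (reported range r = 0.77 to 0.92): the automated test predicts
# interview scores roughly as well as the interview predicts itself.
```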
