Abstract
Major differences exist in two approaches to the study of second-language performance. Second-language-acquisition (SLA) research examines effects upon discourse, and is typically unconcerned with scores. Language-testing (LT) research investigates effects upon scores, generally without reference to discourse. Within a general framework of test taking and scoring, we report research from these two fields as it relates to questions of systematic effects on second-language tests. We then examine findings incidental to a test-development project.
The findings were consistent with LT research into systematic effects of task and rater on ratings, and with SLA research into systematic effects of task on discourse. Using empirically derived scales as indicators of salient features of discourse, we infer that task type influences strategies for assessing language performance. Neither the standard LT perspective nor the standard SLA perspective affords an explanation for these joint findings. There is no theory of method to explain how particular aspects of method affect discourse, how those discourse differences are then reflected in ratings, and how task features influence the basis for judgement. We conclude that a full account of performance testing requires a paradigm that incorporates relationships not specified in either the major language-testing research tradition or the tradition of second-language-acquisition research.