Abstract
The purpose of this study is twofold. First, using FACETS (Linacre, 1996), it investigates how the judgements of trained teacher raters are biased towards certain types of candidates and certain criteria in assessing Japanese second language (L2) writing. Previous studies that identified significantly biased rater-candidate interactions did not discuss who the candidates were; this study examines rater-candidate interactions in much greater detail. Second, since there is no established rating scale for assessing Japanese L2 writing, this study explores the potential of a modified version of Jacobs et al.'s (1981) rating scale for norm-referenced decisions about Japanese L2 writing ability. The participants comprised 234 university candidates and three trained teacher raters. The raters produced highly correlated scores and were self-consistent, but significant differences in overall severity surfaced. The raters scored certain candidates and criteria more leniently or harshly, and every rater's bias pattern was different. The highest percentage of significantly biased rater-candidate interactions was found among candidates whose ability was extremely high or low. This study suggests that the modified version of Jacobs et al.'s scale can be a reliable tool for assessing Japanese L2 writing in norm-referenced settings, but that multiple ratings are still necessary.