Sage Journals

Abstract

This study compared holistic and analytic marking methods for their effects on parameter estimation (of examinees, raters, and items) and on rater cognition in assessing speech act production in L2 Chinese. Seventy American learners of Chinese completed an oral Discourse Completion Test assessing requests and refusals. Four first-language (L1) Chinese raters evaluated the examinees' oral productions using two four-point rating scales. The holistic scale covered five dimensions in a single rating: communicative function, prosody, fluency, appropriateness, and grammaticality. The analytic scale had a separate sub-scale for each of these five dimensions. The raters scored the dataset twice, once with each marking method, in counterbalanced order, and verbalized their scoring rationale after each rating. Both marking methods yielded high reliability and highly correlated scores. However, analytic marking showed better assessment quality: higher reliability and measurement precision, higher percentages of Rasch model fit for examinees and items, and more balanced reference to the rating criteria among raters during scoring.
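The abstract reports Rasch-based estimates for three facets (examinees, raters, and items) and compares the methods on reliability and measurement precision. A common way to model such rating data is the many-facet Rasch rating scale model, as implemented in programs such as FACETS. In that model, log(P_k / P_(k-1)) = B_n - D_i - C_j - F_k, where B_n is examinee ability, D_i is item difficulty, C_j is rater severity, and F_k is the threshold into category k. The sketch below assumes that model form, a 0 to 3 coding of the four-point scale, and made-up parameter values. It is an illustration only and does not reproduce the authors' analysis. It also computes the usual Rasch separation statistics from a true standard deviation and the root mean square measurement error (RMSE): separation G, separation reliability G^2 / (1 + G^2), and strata (4G + 1) / 3. The function names are hypothetical.

```python
import math


def category_probabilities(ability, item_difficulty, rater_severity, thresholds):
    """Category probabilities under an assumed many-facet Rasch rating scale model.

    thresholds[h - 1] is F_h, the step from category h - 1 to h, so a
    four-point scale coded 0..3 needs three thresholds (illustrative values).
    """
    # Category 0 has log-numerator 0. Each higher category adds (B - D - C - F_h).
    log_numerators = [0.0]
    for f in thresholds:
        log_numerators.append(
            log_numerators[-1] + (ability - item_difficulty - rater_severity - f)
        )
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_numerators)
    weights = [math.exp(x - top) for x in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(probs):
    """Model-expected rating: sum of category index times its probability."""
    return sum(k * p for k, p in enumerate(probs))


def separation_stats(true_sd, rmse):
    """Return separation index G, separation reliability, and strata H."""
    g = true_sd / rmse
    reliability = g ** 2 / (1 + g ** 2)
    strata = (4 * g + 1) / 3
    return g, reliability, strata


if __name__ == "__main__":
    # Hypothetical examinee (1.0 logits) rated by a neutral rater and a severe rater.
    steps = [-1.0, 0.0, 1.0]
    lenient = category_probabilities(1.0, 0.0, 0.0, steps)
    severe = category_probabilities(1.0, 0.0, 0.5, steps)
    print("neutral rater expected score:", round(expected_score(lenient), 3))
    print("severe rater expected score:", round(expected_score(severe), 3))
    print("G, reliability, strata:", separation_stats(true_sd=2.0, rmse=1.0))
```

The example shows why a rater facet matters. Holding the examinee and item fixed, a more severe rater shifts probability toward lower categories and lowers the expected score. A smaller RMSE relative to the spread of measures, which indicates greater measurement precision, raises both separation reliability and the number of distinguishable strata.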

Keywords

analytic marking, holistic marking, L2 Chinese, marking methods, pragmatics, speech acts




Comparing holistic and analytic marking methods in assessing speech act production in L2 Chinese
