Abstract

Constructing language assessments is a bit like building sandcastles. Language, like sand, is a dynamic phenomenon in a particular ecosystem (for language: Beckner et al., 2009; Haugen, 2001; for sand: Hyndes et al., 2022). Both phenomena shift when there are changes elsewhere in the system, for example, shifts in patterns of language use due to human mobility or new technology. Both language assessments and sandcastles are built from the substance of interest itself (language; sand) along with various kinds of observational infrastructure, for example, tasks and scoring methods for language assessments; buckets and driftwood for sandcastles. Although these infrastructures are designed to capture the phenomena, the very act of capture reshapes the “natural” patterns and the relationships between elements.
As Janna Fox and Natasha Artemeva explain in the opening chapter of their book Reconsidering Context in Language Assessment: Transdisciplinary Perspectives, Social Theories, and Validity, the issue of context in language assessment is thorny, ambiguous and enduring. Their quest is to lay out a toolset of theories and methods that help us to understand and account for context in development and validation research. Firmly rooted in Messick’s (1989) notion of consequential validity and the calls of McNamara (2007) and others to broaden the theoretical scope of language testing, the authors introduce (Chapters 1–3), justify (Chapter 4), demonstrate (Chapters 5–7), and reflect on (Chapter 8) the use of theoretical approaches for understanding “decisions, actions and consequences in context” (p. 4). Drawing on their experiences of working across disciplines, the authors propose a transdisciplinary approach that enables (a) the use of a range of theoretical orientations and (b) contributions from people with diverse disciplinary expertise and/or various relationships to the assessment activities. Transdisciplinarity, as set out in this book, is about partnerships across disciplinary alignments, in which divergent views are forced into discussion as “a catalyst for insight and innovation” (p. 10).
True to its transdisciplinary goal, the book is presented as a conversation in which the authors mediate between two disciplinary clusters that share a mutual interest in language assessment. One cluster is “assessment-centred” communities, comprising test developers, psychometricians, and graduate students in educational measurement or psychology who may be familiar with a more cognitivist approach to research. The other disciplinary cluster is “language-centred” communities, comprising language teachers, writing researchers and graduate students in discourse, culture or writing studies, and others who tend toward more socially oriented approaches to building knowledge. In the first part of the book, the authors set out the key terms of the title—context, language assessment, transdisciplinary perspectives, social theories, and validity—seeking to introduce readers with a language background (i.e., “language-centred” communities) to the concept of validity, and readers with an assessment background (i.e., “assessment-centred” communities) to social theorizations of language ability and language use. While some readers will work comfortably within both areas, the subsequent chapters offer plenty of food for thought even for those already familiar with validity or social theories.
Chapter 2, titled “Validity as an evolving rhetorical art,” sets out the trajectory of validity theory from the mid-20th century to the present. Readers of Language Testing will be familiar with the cast of theorists involved (Meehl, Cronbach, Messick, Chapelle, McNamara, Cizek, Kane, and Moss, among others) and with the evolution of the concept of validity across editions of the Standards for Educational and Psychological Testing. They will likely also recognize the arc of the validity narrative: from distinct validities (content, predictive/concurrent, construct) to Messick’s (1989) unitary concept of construct validity which, in combination with Kane’s argument-based scaffolding, has dominated the educational and language assessment literature since the 1990s. What is refreshing in this discussion is the forensic attention the authors pay to their primary sources as they interrogate the history and nature of validation methodology. In pursuing their argument for transdisciplinarity, Fox and Artemeva emphasize that foundational theorists were not only alert to the impact of context on experience and behavior (especially Cronbach), but they also advocated strongly for multiple methodologies and viewpoints (especially Messick). The authors go on to contend that, despite the considerable developments in validity theory, we remain beholden to sides in the paradigm wars—the “battle between positivist philosophical principles and interpretivist ones”—a division that tends to be simplified as a methodological distinction between quantitative and qualitative research (Bryman, 2008, pp. 13–14). They propose a broad epistemological and methodological landscape for validation evidence, drawing on Moss’s advocacy for “methodological pluralism” in educational assessment, among other voices (e.g., Addey et al., 2020; Maddox, 2015; Moss & Haertel, 2016).
The problem of recontextualizing the phenomenon of interest in order to observe it, as we do with language use in assessment contexts, is akin to the “observer’s paradox” in sociolinguistic research (Labov, 1972). For language assessment, however, it might be more apt to speak of an observer’s dilemma: the issue is not merely that the sample gathered in its new micro-context might be contaminated through the act of observation; it is that the sample is assumed to represent the construct-relevant performance of an individual, and that this assumption forms the basis for inferences and uses. In this new micro-context of the assessment situation, language patterns are in a nested system, subject to its rules, conditions, relations and expectations of “communicative competence” (Harding et al., 2022). Chapter 3 addresses this “conundrum of context” wherein “meaningful interpretability and generalisability” is expected to apply to diverse contexts of assessment use which may shape the construct in unintended ways (pp. 80–81). Fox and Artemeva argue that robust, contextualized argument-based validation agendas call for a more apt theoretical toolkit. Thus, they make the case for social theories. They provide a historical overview that traverses various gear shifts: the cognitive revolution, the communicative turn, the discursive turn, the social turn, the multilingual turn, the visual turn. This is followed by a selection of social theorists and approaches that can be useful in grappling with the conundrum of context. The proposed theoretical approaches range from those more concerned with learning and human sociality (situated learning, communities of practice, cultural-historical activity theory, distributed cognition) to those more focused on language and language ability (rhetorical genre studies, English as a Lingua Franca, intercultural communication and interactional competence).
Turning to research trends, the authors of Chapter 4 (Fox, Montiero, & Artemeva) provide a current state of play with a meta-review of papers in four key language assessment journals over the last couple of decades: Language Testing (LT), Language Assessment Quarterly (LAQ), International Journal of Testing, and Educational Measurement: Issues and Practice. Starting with keywords and word roots and then building a more substantial classification scheme for the theoretical orientations of papers, the authors report that while a cognitive, individualist agenda is dominant in the journals, an awareness of social theoretical perspectives is on the rise. There is increasing attention given to context in publications and a “modest” but increasing use of social theories, concepts, and perspectives to inform research, especially in LT and LAQ, which share a language specialization (pp. 131–132). The ensuing discussion traces positions on the role of “context” in language assessment, from seeing context as one background variable within a quantitative, individualist paradigm to a relational view of scores, inferences, and contexts manifesting in social consequences. The remainder of the chapter traverses broad topics such as fairness, bias, and impact. Although a bit unwieldy in its structure, the chapter revolves around the central theme of assembling socially oriented approaches that grapple with key processes in the field: assessing learners (e.g., dynamic assessment), situating assessment (e.g., ecological theory) and interpreting scores (e.g., critical language testing).
The subsequent three chapters (Part II—Chapters 5–7) present three separate studies, each of which demonstrates how diverse social theories and transdisciplinary research partnerships can contribute to research on language assessment. The first study is the development and validation of a high-stakes oral proficiency test for pilots and air traffic controllers (Montiero & Fox), the second is the development of a diagnostic rating scale used to identify new engineering students in need of academic support (Fox & Artemeva), and the third examines classroom-based assessments in technologically mediated spaces (Hartwick & Fox). Across these three examples, an array of theoretical tools and approaches was employed: English as a Lingua Franca (Jenkins et al., 2011), intercultural communication (Baker, 2011), interactional competence (Young, 2011), distributed cognition (Hutchins, 1995), rhetorical genre studies (Freedman & Medway, 1994), affordance theory (Gibson, 2015), learning-oriented assessment (Turner & Purpura, 2016), and complex adaptive systems (Larsen-Freeman & Cameron, 2008) among them. While demonstrating the applicability of these various theoretical approaches, the three studies offer very different assessment methods, from high-stakes testing to classroom-based methods.
The final chapter, titled “Language Assessment in the wild” (Fox & Artemeva), takes a narrative approach, offering the authors’ personal reflections on their transdisciplinary collaborations. I was struck by how central communication is to their multi-party transdisciplinary efforts. When talking across disciplinary boundaries (with researchers, traditionally defined) and discourse boundaries (with practitioners, stakeholders, and other groups), revelations about differences can emerge. These differences range from understanding each other’s terminologies to reconciling divergent worldviews. The project backstories recounted in this chapter, which draw again on Messick’s (1989) advocacy for “mutual confrontation of theoretical systems” (p. 61), are illuminating, not just for the insights they provide on the practicalities of transdisciplinarity, but for the fascinating and frustrating tangle of project schedules, staff turnover, policy constraints, embedded practices, disciplinary expectations and personalities that will be familiar to anyone who has embarked on major assessment projects in institutional and government contexts. The chapter reminds us that transdisciplinarity is not just about disciplines developing insightful new networks, but about bringing together different worlds. In this sense, the field of language testing/assessment might be considered well-placed for transdisciplinarity because its central phenomena—score meanings—are in fact objects that communicate across worlds (Macqueen et al., 2016).
Given the book’s big-picture focus, questions of the vulnerability, and even isolation, of the field of language assessment are discussed. We are a field that has arisen from what may seem to colleagues in other disciplines to be relatively mundane practices, yet the recent surge in artificial intelligence tools based on Large Language Models (LLMs) is, if anything, a clarion call for expertise that is both language-informed and assessment-informed, that is, expertise from the two communities addressed in this book.
Validity theory has been concerned with the consequences of test use, and this is largely the lens of this book, but if we view assessment methods themselves as the consequences of technological developments, the scope of validity inquiry expands, and with it, ethical considerations. For example, Paullada and colleagues (2021) draw attention to the datasets underpinning LLM applications, observing that “prevailing data practices tend to abstract away the human labor, subjective judgments and biases, and contingent contexts involved in dataset production” (pp. 1–2). Arguably, this broader societal impact is within the ethical remit of tests that rely on LLMs for their infrastructure, as are the environmental impacts of such tools (Bender et al., 2021). At the level of what is actually operationalized in LLM use, the construct validity of LLMs has been called into question when there is a mismatch between the training of the model (e.g., to decode language) and its use (e.g., to provide world knowledge) (Raji et al., 2021), echoing the construct validity concerns that arise when language tests are used for purposes they were not designed for. In relation to the capacities claimed of LLMs, Bender and Koller (2020) have called for “precise language use when talking about the success of current models and for humility in dealing with natural language” (p. 5193), a call which resonates strongly with the transdisciplinary challenges set out in Chapter 8. Furthermore, a critical awareness of the contextual forces driving assessment practices is increasingly necessary across the various activities associated with language assessment: from considering market pressure as a force for cheaper test methods and the impact of this pressure on constructs, to retheorizing the very notion of an “individual performance” in the age of artificial intelligence.
Thus, the book’s twin foci of context and transdisciplinarity are timely. I agree with Fox and Artemeva’s sense that we are at “a turning point in the evolution of research practices” with a “growing awareness and evidence of transdisciplinarity” (p. 6). Transdisciplinarity is gaining traction in applied linguistics more generally (Douglas Fir Group, 2016; Hiver et al., 2022; Larsen-Freeman, 2012). Larsen-Freeman (2012) proposed that a transdisciplinary approach may allow dynamism and context to be properly considered, rather than treated simply as a backdrop against which the action occurs (p. 208). While the role of context has occupied relatively long discussion threads in the fields of educational measurement (Ercikan & Roth, 2009; Messick, 1989; Mislevy, 2018; Moss, 1998) and language assessment (Bachman, 2007; Chalhoub-Deville, 2016; McNamara, 1997, 2007; McNamara & Roever, 2006; Shohamy, 2001, 2007), particularly in relation to assessment use and consequences, it remains a challenge to actually define context. As Larsen-Freeman (2012) noted, it is not possible to study everything, so boundaries have to be set (p. 208). “Context,” then, is not something that simply surrounds an assessee’s performance; it is inextricably embedded within the performance itself. In this sense, three layers of context are operationalized within assessment constructs (Knoch & Macqueen, 2020; Macqueen, 2022): (1) the societal context that engenders the status afforded to assessed languages and the trust afforded to assessment practices (e.g., Broadfoot, 1979; Shohamy, 2006), (2) the infrastructural context of the assessment practices in which societal patterns are recontextualized for observation through context-based notions such as “target language use tasks” and “criterion” (e.g., Bachman & Palmer, 2010, p. 63; Brindley, 1991, p. 140), and (3) the simulation context which emerges at the moment of engagement between an assessee and the other layers of context (e.g., McNamara, 1997, 2007).
What Fox and Artemeva contribute to the field is the articulation of a set of theories and commensurate methodologies that enable us to interrogate these contextual layers in validation research. The authors’ efforts to set out a disciplinary matrix around language assessment reveal our disciplinary strengths, vulnerabilities and potential avenues of collaboration. May it inspire us to look beyond our sandcastles, to the sea, the sky and the beach.
