Abstract

Test preparation has garnered considerable attention in second language (L2) education due to the significant implications that successful performance on a language test may have for academic advancement, future career opportunities, and immigration prospects. Meanwhile, an overemphasis on test preparation has been criticized for encouraging the cultivation of construct-irrelevant test-taking strategies at the expense of developing general language proficiency. To systematically explore how test preparation has been investigated in the literature, we conducted a scoping review of 66 studies on L2 test preparation. Specifically, this study examined the key characteristics of publications on test preparation, the main themes explored, the study and participant characteristics, as well as the essential aspects of their research methodologies. The results of this review revealed various trends in the literature on L2 test preparation, such as the exclusive focus on English as the target language, the lack of diversity in stakeholders as participants, the dominance of international language tests, and the paucity of experimental studies that utilize advanced statistical techniques. In addition to interpreting the results of our analysis, we discuss the implications of this scoping review and outline several directions for future research on test preparation.

Keywords

Coaching, language test, Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA), scoping review, test preparation, washback

Introduction

Overview and definitions

Language tests are widely used for a broad range of purposes around the globe. Driven by expanding globalization and the internationalization of education, there has been a steady increase in the number of international students, migrant workers, and people seeking permanent immigration who are required to demonstrate their proficiency in the target language through standardized language tests (Yu & Green, 2021). Successful performance on such tests depends to a large degree on the type, amount, and quality of test preparation activities and practices (Knoch et al., 2020). As a result, test preparation plays an essential role in language education and, in the case of some Asian countries, has become “a massive enterprise” and “a powerful industry” (Ross, 2008, p. 7). In South Korea, for example, up to 2% of the gross national product is reportedly spent on learning English and preparing for English language tests (Hincks, 2015).

In the literature, test preparation is also referred to as coaching (Hu & Trenkic, 2021; Messick, 1981), teaching to the test (Menken, 2006; Popham, 2001), and test-wiseness or test-taking strategy training (Yu & Green, 2021). As J. Ma (2017) argued, the term “test preparation” is considered to be more inclusive and neutral as it can be applied to various practices in both in-class and outside-of-class contexts. Furthermore, test preparation does not bear any negative connotations, unlike the terms “coaching” and “teaching to the test” that are oftentimes associated with cramming, test-deviousness strategies (Cohen, 2021), and other questionable practices aimed at inflating test scores.

Recognized as a complex, multidimensional construct, test preparation is also deemed to be a contentious phenomenon (Yu et al., 2017; Yu & Green, 2021) and has been dubbed “a double-edged sword” (Cheng & Doe, 2013, p. 19). On the one hand, pedagogically sound test preparation practices can help second language (L2) learners increase their overall language proficiency. On the other hand, improper practices can encourage the use of test-deviousness strategies and other tricks to artificially boost L2 learners’ test scores, especially in high-stakes testing contexts. Crocker (2006), for instance, claims that “[n]o activity in educational assessment raises more instructional, ethical, and validity issues than preparation for large-scale, high-stakes tests” (p. 115). In essence, the issues surrounding test preparation are a matter of an ongoing debate about its impact on the development of language proficiency, validity of test scores, instructional practices, as well as equity and fairness in language assessment (Gebril & Eid, 2017; J. Ma, 2017; Yu et al., 2017; Yu & Green, 2021).

Types of test preparation

Messick (1981) distinguishes between three main types of test preparation (which he refers to as coaching) based on the outcomes: test preparation leading to “genuine improvements” in the measured skills, test preparation resulting in “enhanced test-taking sophistication,” and test preparation focusing on the development of “heightened test-taking artifice” (p. 1). The first type of test preparation leads to the development of the language skill(s) measured by the test, thus contributing to the validity of scores. The second type of test preparation aims at familiarizing test takers with the test format and making them less anxious, which can also be viewed as having a positive effect on the validity of test scores. Finally, the third type of test preparation emphasizes the development of test-deviousness strategies (such as using various tactics to guess the correct answer) to artificially inflate test takers’ scores, which poses a clear threat to the validity of such scores. In sum, while the first two types of test preparation can be viewed as construct-relevant, the last type is a source of construct-irrelevant variance. Knoch et al. (2020) refer to the aforementioned types of test preparation as Type 1, Type 2, and Type 3, respectively.

A similar classification of test preparation types is provided by Fulcher (2010, p. 288). According to Fulcher, two main types of test preparation are available to test takers. The first type focuses on introducing L2 learners to the test, its format, item types, and other relevant information that learners may need to understand the structure of the test and the procedure for its completion. This type appears to correspond to Type 2 in Messick’s (1981) classification. Fulcher’s (2010) second type of test preparation aims at helping L2 learners achieve higher scores on the test by teaching them how to apply test-deviousness strategies (Cohen, 2021) rather than supporting the development of their language skills (similar to Type 3 in Messick, 1981). Unlike the first type, the second type of test preparation has been criticized for incentivizing “shadow education” (i.e., private tutoring) (Yung, 2015) and promoting unethical practices that introduce construct-irrelevant variance (Haladyna et al., 1991; Hamp-Lyons, 1998).

Overall, the above-mentioned classifications fundamentally recognize the same three types of test preparation, with Fulcher’s (2010) classification distinguishing only two of the three types (see Table 1 for a summary).

Table 1.

Summary of the three types of test preparation.

	Type 1^a test preparation	Type 2 test preparation	Type 3 test preparation
Learning Outcomes	Test takers make genuine improvements in language skills measured by the test.	Test takers become familiar with the test format, item types, and other relevant information, which reduces anxiety surrounding test completion.	Test takers learn to apply test-deviousness strategies to artificially increase their test scores.
Score Validity	Contributes positively to score validity	Contributes positively to score validity	Threatens the validity of test scores
Construct Relevance	Construct relevant	Construct relevant	Construct irrelevant

Note. ^aType 1 is recognized by Messick (1981) and Knoch et al. (2020), but not Fulcher (2010).

Research on test preparation

Test preparation in L2 contexts is commonly examined as part of washback research (Alderson & Hamp-Lyons, 1996; Allen, 2016a, 2016b; Di Gennaro, 2017; Green, 2007; Tsang & Isaacs, 2022; Wall & Alderson, 1993). Defined as the impact of a test on teaching and learning (Green, 2007), the concept of washback (or backwash) in language testing is inextricably tied to test preparation. Specifically, various aspects of test preparation, including the content, learning practices, teaching methods, as well as learners’ and teachers’ cognitive (e.g., beliefs) and affective (e.g., motivation) processes, are significantly shaped and affected by the test that L2 learners prepare to take (Cheng et al., 2015; Green, 2007; J. Ma, 2017).

As J. Ma (2017) and Yu and Green (2021) suggest, there are two main strands of research on test preparation. The first strand comprises studies investigating the processes that underlie test preparation, such as the types of test preparation practices that prospective test takers engage in both in class and outside of class (Knoch et al., 2020; J. Ma, 2017), the pedagogical approaches used by teachers to help learners with test preparation (Clark & Yu, 2022a; Irvine-Niakaris & Kiely, 2015), characteristics of test preparation materials (Wall & Horák, 2011), and participants’ attitudes and perceptions related to test preparation (Xie, 2015; Zhan & Wan, 2016). The second strand concerns research on the effectiveness and products of test preparation. In this strand, researchers have examined how different test preparation practices affect test scores (Lertcharoenwanich, 2022) and the development of language proficiency (Minakova, 2020).

Empirical evidence suggests that test preparation can be affected by a broad range of variables, including both teacher-related and context-specific variables (see Gebril & Eid, 2017; Green, 2013). Test preparation has also been shown to have a profound impact on both learning and teaching (Stoneman, 2006), which can be both positive and negative (Yu & Green, 2021). While the importance of test preparation is widely recognized, the ongoing debate about its impact on the curriculum and instruction warrants a more in-depth examination of test preparation practices due to the variety and complexity of factors involved. Furthermore, despite a large number of studies in this area, concerns have been raised about the methodological limitations of existing research on test preparation (e.g., Xie, 2015), including a notable scarcity of quantitative approaches and a heavy emphasis on observational and perceptual studies exploring participants’ beliefs, perceptions, and preferences regarding different aspects of test preparation, including its effectiveness and impact. In addition, the majority of studies appear to have explored test preparation for high-stakes English language proficiency tests such as IELTS and TOEFL. These concerns and observations point to the need for a comprehensive and evidence-based synthesis of published research on test preparation, including scrutiny of the methodological aspects of existing studies in this domain. Conducting such a review would help clarify the extent, range, and nature of research on test preparation and identify overarching trends and patterns in the methodologies utilized across multiple studies. Given that, to our knowledge, no systematic review of the literature on test preparation in L2 contexts has been done to date, we conducted a scoping review, which is a type of systematic review that “summarizes substantive and methodological features of primary studies on a particular topic” (Chong & Reinders, 2022, p. 3). Our scoping review aimed to answer the following research questions (RQs):

RQ1. How much research on test preparation is there? What are the characteristics of publications on test preparation in L2 contexts?

RQ2. What are the main themes explored in primary studies on test preparation?

RQ3. What are the study and participant characteristics in primary studies on test preparation?

RQ4. What types of research paradigms, research designs, data collection methods, data collection instruments, and data analytic methods are used in primary studies on test preparation?

Methodology

Research design overview

This study presents a scoping review of the current state of knowledge on test preparation in an L2 context. To carry out this review, we followed the guidelines for conducting systematic reviews from Macaro (2020) and Newman and Gough (2020), as well as the methodological framework for conducting qualitative research synthesis described in Chong and Reinders (2020) and Chong and Plonsky (2021). When working on this section, we also followed the Journal Article Reporting Standards for qualitative meta-analyses set by the American Psychological Association (APA) Style (Levitt et al., 2018) and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement (Page et al., 2021).

Research team description

The research team is led by a principal investigator (Ruslan Suvorov) who specializes in language testing and assessment. Three research collaborators (Shanshan He, Anne-Marie Sénécal, and Laura Stansfield) are currently pursuing doctoral studies in education, with expertise and interest in language testing and assessment research. Two of the four researchers (Shanshan He and Ruslan Suvorov) have previous experience with conducting and publishing a systematic review in this field.

Study selection

Inclusion and exclusion criteria

The initial step of this scoping review involved establishing a set of inclusion and exclusion criteria to guide the selection of studies and ensure the review’s quality.

The primary studies included in the review met the following inclusion criteria:

examined test preparation as part of washback (or backwash) for L2 tests;

involved participants in second or foreign language contexts;

reported research across the methodological spectrum (i.e., quantitative, qualitative, and mixed methods);

were published in any year;

were disseminated as journal articles, chapters in edited volumes, research reports, or unpublished doctoral dissertations available in repositories such as ProQuest (following Macaro’s [2020] guidelines);

completed a peer review process (with the exception of doctoral dissertations, which are not peer-reviewed publications).

We excluded studies with the following characteristics:

explored test preparation in contexts unrelated to washback (or backwash) for L2 tests;

included participants outside of a second or foreign language learning context;

were disseminated through other types of publications (e.g., non-empirical studies, test reviews, literature reviews, meta-analyses, editorials, master’s dissertations, and published abstracts);

were not subjected to peer review.

Search strategy, screening, and eligibility assessment

We followed the PRISMA guidelines (Page et al., 2021) to identify possible studies, screen them, and assess them for eligibility based on our inclusion and exclusion criteria. To guide the study search, various combinations of the following search terms were used: washback, backwash, test preparation, coaching, language assessment, language test, and language exam. We selected these specific search terms through an iterative process of identifying the keywords, synonyms, and closely associated terms used in research on language test preparation. The final search string was as follows: (“test preparation” OR “coaching” OR “washback” OR “backwash”) AND (“language test*” OR “language exam*”).
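The Boolean logic of this search string can be illustrated with a minimal Python sketch. The sketch is not the procedure used in the review, which relied on the databases' own search interfaces; it only shows how the two OR-groups combine under AND. Treating the trailing asterisk as a truncation wildcard is an assumption about the databases' query syntax, and the function names and sample titles are hypothetical.

```python
import re

# Term groups copied from the reported search string. A trailing "*" is treated
# as a truncation wildcard (an assumption about the databases' query syntax).
TOPIC_TERMS = ["test preparation", "coaching", "washback", "backwash"]
TEST_TERMS = ["language test*", "language exam*"]


def term_to_regex(term):
    """Compile one search term into a case-insensitive, word-bounded regex."""
    if term.endswith("*"):
        # "language test*" should also match "language tests", "language testing", etc.
        pattern = r"\b" + re.escape(term[:-1]) + r"\w*"
    else:
        pattern = r"\b" + re.escape(term) + r"\b"
    return re.compile(pattern, re.IGNORECASE)


def matches_query(text):
    """Return True if the text contains a topic term AND a test term."""
    has_topic = any(term_to_regex(t).search(text) for t in TOPIC_TERMS)
    has_test = any(term_to_regex(t).search(text) for t in TEST_TERMS)
    return has_topic and has_test


# Hypothetical record titles, invented for illustration only.
print(matches_query("Washback of a national language test on classroom teaching"))  # True
print(matches_query("Coaching strategies for novice athletes"))  # False
```

A record is retained only when both groups are satisfied, which is why a title mentioning coaching alone does not match.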

To identify possible studies, three types of searches were used: database searches, journal searches, and forward and backward citation searches. First, two researchers from the team ran the search string in Educational Resources Information Center (ERIC) and Linguistics and Language Behavior Abstracts (LLBA), which are the two most commonly used databases for research syntheses in applied linguistics (In’nami & Koizumi, 2010). Limited to “peer-reviewed” records only, the search of the ERIC and LLBA databases yielded a total of 550 records (see the PRISMA flowchart in Figure 1). These records were then imported into Covidence, a web-based application for managing systematic reviews (Veritas Health Innovation, 2024). Among the imported records, 108 duplicates were removed. The titles and abstracts of the remaining 442 records were individually screened by the two researchers in Covidence, reaching 93% agreement. The remaining 7% (i.e., 31 records) were discussed to achieve consensus. After completing the initial screening, the two researchers retrieved 57 full-text studies and assessed them for eligibility. Of these, 19 studies were excluded because they did not meet the inclusion criteria.

Figure 1.

PRISMA flowchart for study selection.

In addition to the database search, the research team searched the journals Language Testing and Language Assessment Quarterly and conducted backward and forward citation searches. These journals were chosen for their prominence in the field and their history of publishing research on test preparation. The backward citation search was performed by examining the reference lists of studies selected from the databases, whereas the forward citation search was done via Google Scholar. The journal and citation searches yielded 44 additional studies. After screening the full texts of these studies, the research team excluded a further 16 because they did not meet the inclusion criteria. As shown in Figure 1, the final dataset included in the scoping review comprised 66 studies, of which 38 were identified through the database search and 28 were identified through the journal and citation searches. The eligibility of each study in the final list was assessed and agreed upon by all four researchers during weekly meetings over several months.
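The record counts reported for the selection process can be cross-checked with simple arithmetic. The short Python sketch below uses only figures stated in the text; the variable names are illustrative, and the agreement figure is computed as simple percent agreement, which is an assumption about how the reported 93% was derived.

```python
# Counts reported in the text for the database search.
records_identified = 550
duplicates_removed = 108
full_texts_assessed = 57
full_texts_excluded = 19

# Counts reported for the journal and citation searches.
other_studies_found = 44
other_studies_excluded = 16

records_screened = records_identified - duplicates_removed
records_discussed = 31  # screening decisions resolved by discussion
percent_agreement = 100 * (records_screened - records_discussed) / records_screened

included_from_databases = full_texts_assessed - full_texts_excluded
included_from_other = other_studies_found - other_studies_excluded
total_included = included_from_databases + included_from_other

print(records_screened)          # 442
print(round(percent_agreement))  # 93
print(included_from_databases, included_from_other, total_included)  # 38 28 66
```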

Data extraction and synthesis

The extraction and synthesis of the data comprised three main steps. First, we conducted a preliminary round of data extraction using the initial list of 30 variables of interest organized into four categories that corresponded to the RQs: “Bibliographic information” (RQ1), “Themes” (RQ2), “Participant and study characteristics” (RQ3), and “Research design and analysis” (RQ4, see Appendix 1). In the context of this study, variables were defined as “observable attributes or properties of the world that take on different values” (Epstein & Martin, 2004, p. 321). We generated this initial list of variables based on the RQs guiding our scoping review. To extract data for these variables, we utilized an online collaborative Excel spreadsheet hosted in OneDrive, which was used throughout the extraction and synthesis process and ultimately became a coded dataset. The four researchers divided the 66 selected primary studies and independently extracted the data from each study that was relevant to each variable. To ensure consistency, regular discussions of the extraction process were held during weekly meetings.

Since the primary focus of this scoping review was to uncover which key topics have received attention in the literature on test preparation and how they have been investigated, we primarily examined the Introduction and Methods sections of the published works. In cases where information was not explicitly stated in these sections, we consulted other sections of the studies to locate the missing information. For example, details about statistical tests were sometimes found in the Results section.

In the second step, we refined the preliminary list of variables of interest by rephrasing, merging, and/or modifying some of them to better fit the RQs guiding the study. The final list consisted of 22 variables: 4 variables in the “Bibliographic information” category, 1 in “Themes,” 10 in “Participant and study characteristics,” and 7 in “Research design and analysis.”

After revising and finalizing the variables and extracting all the relevant data, we proceeded to data synthesis, which entailed coding the data (i.e., Step 3). At this step, two types of coding were used (Saldaña, 2021). First, the data for all four variables in the “Bibliographic information” category, as well as the data for some variables in the “Participant and study characteristics” category, were coded using attribute codes. We used attribute coding because it is appropriate for providing basic descriptive information such as the year of publication or the number of participants (Saldaña, 2021). To code the data for the remaining variables, we used thematic coding because this type of coding is used to identify patterns and themes in the extracted data.

We used one level of codes for all variables. The only exception was the “Themes” variable, for which we used three levels of codes to reflect the complexity of themes identified in the included studies (see Appendix 2). The first level of codes was based on Hughes’s (1993) classification of the washback components into participants, process, and products (also adopted by J. Ma, 2017). Using this classification, we created the following three codes to represent three broad groups of themes (henceforth Level 1 themes): perceptions (corresponding to Hughes’s participants), processes (same as in Hughes), and impacts (Hughes’s products). The second level of codes was used for the sub-types of Level 1 themes (e.g., administrator perceptions, learner perceptions, and teacher perceptions within the Level 1 theme “Perceptions”), thus providing a more nuanced categorization of perceptions, processes, and impacts. Finally, the third level of codes was used for highly refined and specific themes within each Level 2 theme (e.g., “learner perceptions of test impact on learning” or “teacher perceptions of test”). The final list of variables with sample codes is provided in Table 2.

Table 2.

Variables and sample codes.

RQs	Variables	Examples of individual codes
RQ1	Publication type	Journal article, research report, book chapter
	Publication source	Language Testing, IELTS Research Reports
	Year	Numeric value
	Author	Brown, Knoch et al., Winke and Lim
RQ2	Themes^a	Perceptions: Admin perceptions of test prep; Learner perceptions of test; Teacher perceptions of test
		Practices: Learner test prep practices; Teacher test prep practices; Parental involvement; Characteristics of test prep courses; Factors affecting learner self-efficacy; Relationship between test prep practices and test
		Impacts: Impact of learner perceptions of test on test scores; Impact of strategy use on test scores; Impact of teacher characteristics on test prep practices; Impact of test on teaching; Impact of test prep courses on test scores; Impact of test prep on test scores; Relationship between test scores and academic outcomes
RQ3	Participant type	Learners, teachers, administrators, parents
	Number of learners	Frequency count
	Number of teachers	Frequency count
	Number of other participants	Frequency count
	Learner L1	Chinese, Japanese, varied
	Target language	English
	Geographical location	United States, Japan, Mainland China
	Educational setting	University, language school, high school
	Language test	IELTS, TOEFL iBT, CET-4
	Test type	International, national, regional, institutional
RQ4	Research paradigm	Mixed methods, qualitative, quantitative
	Research design	Exploratory, descriptive, correlational
	Data collection method	Experiment, survey, observation
	Data collection instrument	Test, questionnaire, interview, observation notes
	Quantitative data analytic method	Group comparison analysis, correlation analysis
	Statistical tests and procedures	Paired-samples t-test, ANOVA
	Qualitative data analytic method	Thematic analysis, content analysis

Note. ^aThe “Themes” variable includes examples of Level 1 and Level 3 codes only (listed in the last column). For a complete list of Level 1, Level 2, and Level 3 codes for this variable, see Appendix 2.

The finalized 22 variables were divided among the four researchers, with each researcher coding the data for 4–6 variables. To ensure the quality and reliability of coding, we double-coded the data, as explained below. Considering the importance and complexity of the “Themes” variable (RQ2), which had three levels of codes, 100% of the associated data were double-coded. For all other variables, we double-coded only 11% of the data due to resource constraints. Our decision to double-code 11% of the data was informed by Loewen and Plonsky (2015), who suggest double-coding 10–20% of the qualitative data. During the data synthesis phase, we held weekly meetings to discuss any discrepancies among the coders and reach agreement on all assigned codes, following Smagorinsky (2008) and Vorobel et al. (2021). The full set of variables and coded data are available as supplementary material in Appendix 3.

We used descriptive statistics to quantitatively analyze the codes and provide insights into the features of the studies relevant for answering the four RQs.

Results

RQ1

The first RQ investigated the volume of research on test preparation and the characteristics of primary studies. The 66 included studies comprised 47 journal articles published in 29 different journals (representing 71.2% of all the publications), 13 published research reports (19.7%), 5 unpublished doctoral dissertations (7.6%), and 1 book chapter (1.5%).
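As an illustration of the descriptive statistics used here, the percentages above follow directly from the frequency counts. The Python sketch below reproduces them from the counts reported in this paragraph; the variable names are illustrative and not taken from the coded dataset.

```python
# Frequency counts of publication types, as reported in the text.
counts = {
    "journal article": 47,
    "research report": 13,
    "doctoral dissertation": 5,
    "book chapter": 1,
}

total = sum(counts.values())

# Share of each publication type, as a percentage rounded to one decimal place.
shares = {kind: round(100 * k / total, 1) for kind, k in counts.items()}

print(total)  # 66
for kind, share in shares.items():
    print(f"{kind}: {share}%")
```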

The most common publication venue was IELTS Research Reports (k = 7), followed by ETS Research Report Series (k = 5). Assessment in Education: Principles, Policy, and Practice (k = 5) and Language Testing (k = 4) were the most common publication venues for journal articles, followed by Language Assessment Quarterly, Language Testing in Asia, and RELC Journal (three studies each; see Table 3). The collection of publications spanned a period of nearly 30 years, with the earliest study published by Wall and Alderson in 1993. As evident from Figure 2, the number of studies has steadily increased over the years, with the largest increase observed over the past decade, during which 73% of the reviewed studies were published.

Table 3.

Frequency counts of publication sources and source names.

Publication source	Source name	Number of studies
Journal article	Assessment in Education: Principles, Policy, and Practice	5
	Language Testing	4
	Language Assessment Quarterly	3
	Language Testing in Asia	3
	RELC Journal	3
	Assessing Writing	2
	Language Education and Assessment	2
	Studies in Educational Evaluation	2
	TESL-EJ	2
	TESOL Quarterly	2
	Applied Linguistics	1
	Assessment and Evaluation in Higher Education	1
	Australian Journal of Teacher Education	1
	Electronic Journal of Foreign Language Teaching	1
	Frontiers in Psychology	1
	International Journal of Bilingual Education and Bilingualism	1
	International Journal of Language Testing	1
	Issues in Educational Research	1
	Journal of English for Academic Purposes	1
	Journal of Language Teaching and Research	1
	Journal of the European Second Language Association	1
	Language and Sociocultural Theory	1
	Language Teaching Research	1
	Papers in Language Testing and Assessment	1
	PROFILE Issues in Teachers’ Professional Development	1
	System	1
	TESL Canada Journal	1
	The Asian Journal of Applied Linguistics	1
	The Journal of Educational Research	1
Research report	IELTS Research Reports	7
	ETS Research Report Series	5
	Cambridge ESOL Research Notes	1
Doctoral dissertation	unpublished	5
Book chapter	Washback in Language Testing: Research Contexts and Methods	1

Figure 2.

Number of included studies by year of publication.

RQ2

To answer RQ2, we examined the main themes in the 66 included studies on test preparation. As explained in the Methodology section above, our analysis of the themes was based on a three-level data coding approach. Following the three overarching Level 1 themes—namely, Perceptions, Processes, and Impacts—our analysis revealed 16 corresponding Level 2 themes (see the nested pie chart in Figure 3). Perceptions comprised Level 2 themes illustrating stakeholders’ perceptions of test preparation; Processes contained the themes representing different characteristics, factors, and practices associated with test preparation; whereas Impacts consisted of the themes that indicated how test preparation affected, or was affected by, various variables, including perceptions and processes. Among the 16 Level 2 themes, 3 were within Perceptions, 6 belonged to Practices, with the remaining 7 Level 2 themes constituting Impacts. As illustrated in Figure 3, Impacts was the largest Level 1 theme, with 80 instances of corresponding Level 2 themes found in the dataset (50 studies, or 76%), followed by Practices (60 instances of Level 2 themes in 39 studies, or 59%) and Perceptions (46 instances of Level 2 themes in 28 studies, or 42%). In terms of their frequency, the most common Level 2 themes were “learner perceptions” (found in 30 studies, or 45%), “impact of test” (28 studies, or 42%), and “learner practices” (20 studies, or 30%).

Figure 3.

Distribution of Level 1 and Level 2 themes in the included studies.

At a more granular level, each Level 2 theme comprised a refined list of individual Level 3 themes. There was a total of 66 individual themes identified in the included studies, with 14 themes related to Perceptions, 18 themes to Practices, and 34 themes to Impacts. As shown in Appendix 2, the most common Level 3 themes were “learner perceptions of test preparation (courses, practices, strategies)” in Perceptions (15/66 included studies, or 23%), “learner test preparation practices” in Practices (13/66 studies, or 20%), as well as “impact of test preparation on test scores” and “impact of test on teaching” in Impacts (9/66 studies each, or 14%). The vast majority of studies examined multiple Level 3 themes, with a single theme being the focus of only 12/66 studies (or 18%).

RQ3

RQ3 inquired about the characteristics of the recruited participant sample, geographical locations of the studies, and types of language tests used in primary research on test preparation. The reviewed studies included six different types of research participants: learners (k = 58), teachers (k = 25), administrators (k = 3), alumni (k = 2), parents (k = 1), and employers (k = 1). Twenty out of 66 reviewed studies (or 30%) included two or three participant types, with learners and teachers being the most common combination (k = 17). The number of learner participants in each study ranged from 2 in Minakova (2020) to 14,593 in Liu (2014) (Mdn = 89). The number of teacher participants ranged from 1 in Mickan and Motteram (2008) to 200 in Gebril and Eid (2017) (Mdn = 8). Across all 66 included studies, there were a total of at least 34,840 learner participants, 498 teacher participants, 128 alumni, 100 employers, 28 administrators, and 6 parents.

The reviewed studies included a wide range of learners’ first language (L1) backgrounds (see Table 4), with Chinese (k = 28), Korean (k = 7), and Japanese (k = 7) being the most common. Eighteen out of 66 studies had learner participants with mixed L1s, whereas in 31 studies, all learner participants shared the same L1. Notably, English was the target language in all 66 studies.

Table 4.

Frequency counts of learner participants’ L1s.

Learner participants’ L1	Number of studies
Chinese	28
of which specified Mandarin	5
of which specified Cantonese	1
varied	14
n/a	8
Japanese	7
Korean	7
unknown	6
Arabic	2
Hindi	2
Persian	2
Thai	2
Nepali	1
Spanish	1

Note. The “Chinese” category comprises 22 studies where learner participants’ L1 was given as Chinese (not otherwise specified), 5 studies that specified L1 as Mandarin Chinese, and 1 study that listed Cantonese; the “varied” category refers to the studies that either provided a long list of learners’ L1s or stated that the participants came from various L1 backgrounds; the “n/a” category includes the studies that did not contain any learner participants; the “unknown” category refers to the studies that did not report their learner participants’ L1 background.

As shown in Table 5, the geographical locations of the included studies varied, with Mainland China (k = 19), Australia (k = 6), and the United States (k = 6) being the most common. Sixty studies were conducted in one of three educational settings: university (k = 32), language school (k = 24), or high school (k = 9). Four of these studies drew on data from two educational settings each (i.e., Barnes, 2016, 2017; J. Ma, 2017; Yu et al., 2017). The data for the remaining six studies were collected outside of educational settings (e.g., online or over the phone).

Table 5.

Frequency counts of geographical locations.

Geographical location	Number of studies
Mainland China	19
Australia	6
United States	6
Iran	5
Japan	5
varied	5
South Korea	4
Taiwan	4
United Kingdom	3
Canada	2
Hong Kong	2
New Zealand	2
Vietnam	2
Colombia	1
Cyprus	1
Egypt	1
Fiji	1
Greece	1
Nepal	1
Sri Lanka	1
Thailand	1

We categorized all language tests used in the included studies into four main types (see Table 6): international (k = 44), national (k = 20), institutional (k = 4), and regional tests (k = 1), with two studies examining two test types (i.e., J. Ma, 2017, and Stoneman, 2006). In terms of specific language tests, IELTS was the most commonly researched (k = 21), followed by TOEFL iBT (k = 13) and CET-4 (k = 6). Only four studies investigated more than one language test: Farnsworth (2013), J. Ma (2017), Saif et al. (2021), and Stoneman (2006). Table 6 also shows the geographical distribution of each test (i.e., countries where each test is administered), with the tests in the “international” category being coded as “worldwide” due to their presence in multiple countries.

Table 6.

Frequency counts of language test types and their geographical distribution.

Test type	Language test	Geographical distribution	Number of studies
International^a	IELTS	Worldwide	21
	TOEFL iBT		13
	TOEIC		4
	Basic English Skills Test Plus		1
	Cambridge English: First		1
	Examination for the Certificate of Proficiency in English		1
	high-stakes international English language test		1
	PTE Academic		1
	TOEFL (paper-based)		1
	TOEFL ITP (Institutional Testing Program)		1
	Versant English Test		1
National	College English Test Band 4 (CET-4)	Mainland China	6
	Graduate School Entrance English Examination (GSEEE)	Mainland China	2
	O-level English Exam	Sri Lanka	1
	High school final English exam	Iran	1
	General English Proficiency Test (GEPT)	Taiwan	1
	High school leaving exam (thanaweya amma)	Egypt	1
	Comprehensive Assessment Program (CAP) listening test	Taiwan	1
	Test of English for Academic Purposes (TEAP)	Japan	1
	Test of English Listening Comprehension (TELC)	Taiwan	1
	MA Language Test (MALT)	Iran	1
	Secondary Education Examination (SEE) English test	Nepal	1
	Senior High School Entrance English Test (SHSEET)	Mainland China	1
	English module of the Higher Education Admission Test	Iran	1
Institutional	General Multimedia Assisted Test of English (GMATE)	South Korea	1
	Graduating Students’ Language Proficiency Assessment (GSLPA English)	Hong Kong	1
	Oral English exam	Colombia	1
	University of Tokyo English exam	Japan	1
Regional	Computer-Based English Listening and Speaking Test (CELST)	Mainland China	1

Note. ^aThe geographical distribution of all international tests was coded as “worldwide” without specifying individual countries.

RQ4

The last RQ explored the research paradigms, research designs, data collection methods, data collection instruments, and data analytic methods used in the 66 studies on test preparation. In this review, we used the term research paradigm to refer to the three main methodological paradigms (i.e., quantitative, qualitative, and mixed methods), which we then further broke down into research designs and data analytic methods, as explained below. Forty percent of the included studies (k = 26) employed a qualitative research paradigm, followed by 22 mixed-methods studies and 18 studies that used the quantitative research paradigm. The vast majority of mixed-methods studies (i.e., 17/22) were published in the last decade, pointing to the growing recognition and adoption of this paradigm in research on test preparation.

Next, we loosely followed Creswell and Creswell (2022) to classify all included studies across the three paradigms into four categories of research design defined as “types of inquiry within qualitative, quantitative, and mixed methods approaches that provide specific direction for procedures in a research study” (p. 13). The four categories of research design were descriptive (i.e., studies designed to describe some existing phenomena related to test preparation), exploratory (i.e., studies designed to explore new phenomena), causal-comparative (i.e., studies designed to “compare two or more groups in terms of a cause (or independent variable) that has already happened,” Creswell & Creswell, 2022, p. 13), and correlational (i.e., studies investigating correlations among variables). The most common research design was descriptive (found in 51/66 studies, or 77%), followed by causal-comparative (31/66, or 47%), exploratory (23/66, or 35%), and correlational designs (16/66, or 24%). Eighteen studies were based on one research design, whereas the remaining 48 studies deployed a combination of two or, in some cases, three types of research design.
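
The percentages above can be recomputed from the reported counts, and the counts also permit a rough consistency check. The sketch below assumes, as the paragraph states, that every study used one, two, or three research designs. Under that assumption, the number of studies with exactly two and exactly three designs can be derived from the totals. These derived values are an inference from the reported figures, not numbers reported in the review, and all variable names are illustrative.

```python
TOTAL_STUDIES = 66

# Number of studies coded for each research design (a study may have several).
design_counts = {
    "descriptive": 51,
    "causal-comparative": 31,
    "exploratory": 23,
    "correlational": 16,
}

# Percentage of studies per design, rounded to whole numbers.
design_pct = {d: round(100 * k / TOTAL_STUDIES) for d, k in design_counts.items()}

single_design = 18                            # studies with exactly one design
multi_design = TOTAL_STUDIES - single_design  # studies with two or three designs
total_codes = sum(design_counts.values())     # total design codes assigned

# Solve the system (assumption: no study has more than three designs):
#   two + three         = multi_design
#   2 * two + 3 * three = total_codes - single_design
three_designs = (total_codes - single_design) - 2 * multi_design
two_designs = multi_design - three_designs

print(design_pct)
print(two_designs, three_designs)
```

The design counts sum to more than 66 precisely because most studies were coded for more than one design.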

Table 7 shows that the most common data collection method was survey (k = 61) followed by experiment (which included both experiments and quasi-experiments, k = 21), observation (k = 21), and document collection (k = 13) methods. Twenty-six out of 66 studies (or 39%) used a single data collection method, namely, survey (22 studies), experiment (3 studies), and observation (1 study). The remaining 40 studies used a combination of two (k = 31), three (k = 8), or four (k = 2) data collection methods. Furthermore, researchers in the included studies used a variety of data collection instruments, with interviews (k = 45), questionnaires (k = 41), and tests (k = 21) being the most common.

Table 7.

Frequency counts of data collection methods and instruments.

Categories	Number of studies
Data collection method
Survey	61
Experiment	21
Observation	21
Document collection	13
Data collection instrument
Interview	45
Questionnaire	41
Test	21
Observation protocol	12
Observation notes	11
Teaching materials	5
Documents	3
Homework	2
Journals	2
Coursebooks	1
Think-aloud protocol	1

Among the qualitative data analytic methods (see Table 8), thematic analysis was the most common one (k = 44), followed by content analysis (k = 23) and discourse analysis (k = 1). Two studies that reported collecting and analyzing interview data as part of their mixed-methods research paradigms (i.e., Rao et al., 2003; Winke & Lim, 2017) did not specify the method used for qualitative data analysis and were thus coded as “unknown.”

Table 8.

Frequency counts of analytic methods for quantitative and qualitative data.

Categories	Number of studies
Qualitative data analytic methods
Thematic analysis	44
Content analysis	23
Discourse analysis	1
unknown	2
Quantitative data analytic methods
Descriptive statistics	32
Group comparison analysis	22
Factor analysis	12
Correlation analysis	10
Regression analysis	10
other	8

With regard to the quantitative data analytic methods, descriptive statistics were provided in almost half of the studies (k = 32), whereas group comparison analysis (which entailed testing for statistical significance) was used in one-third of the studies (k = 22). The remaining quantitative data analytic methods included factor analysis (k = 12), correlation analysis (k = 10), and regression analysis (k = 10), with 30 studies using a combination of methods. Within these broader categories of quantitative data analytic methods, there were multiple individual statistical tests and procedures (shown in Table 9), with the most common being paired-samples t-test and Spearman’s correlation (k = 7 each), followed by chi-square test, confirmatory factor analysis, exploratory factor analysis, and structural equation modeling (six studies each).

Table 9.

Frequency counts of statistical tests and procedures.

Statistical tests and procedures	Number of studies
Paired-samples t-test	7
Spearman’s correlation	7
Chi-square test (of independence)	6
Confirmatory factor analysis	6
Exploratory factor analysis	6
Structural equation modeling	6
Independent samples t-test	5
Mann–Whitney U test	5
Analysis of variance (ANOVA)	3
Pearson’s correlation	3
Wilcoxon signed-rank test	3
Factor analysis (unknown)	2
Hierarchical (multiple) regression	2
Post hoc comparison	2
Stepwise regression analysis	2
Analysis of covariance (ANCOVA)	1
Bivariate correlation	1
Correlation analysis (unknown)	1
Group comparison analysis (unknown)	1
Latent multivariate regression analysis	1
MANOVA	1
Mediation modeling construction	1
Multiple correspondence analysis	1
Multiple regression	1
Multivariate analysis of variance (MANOVA)	1
Neural network analysis	1
Principal component analysis	1
Regression analysis (unknown)	1
Sequential multiple regression	1
Simple regression	1

Note. Parenthetical “unknown” refers to the cases where the authors of the included studies did not specify the type of statistical test or procedure they used.
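
To illustrate one of the most frequently reported procedures in Table 9, the sketch below computes a paired-samples t statistic, which compares two measurements taken from the same participants (for example, scores before and after a preparation course). The scores are hypothetical values invented purely for illustration and do not come from any reviewed study; the function name `paired_t` is likewise an assumption. The snippet uses only the Python standard library and returns the t statistic and degrees of freedom, leaving the p-value to a statistics package.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t, df) for a paired-samples t-test on two equal-length lists."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # Work on the per-participant differences (after minus before).
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD with n - 1 in the denominator
    if sd_diff == 0:
        raise ValueError("t is undefined when all differences are identical")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1

# Hypothetical band scores for six learners (illustration only).
pre_scores = [5.5, 6.0, 5.0, 6.5, 5.5, 6.0]
post_scores = [6.0, 6.5, 5.5, 6.5, 6.0, 7.0]

t_stat, df = paired_t(pre_scores, post_scores)
print(round(t_stat, 3), df)
```

A positive t here indicates higher scores at the second measurement; without a control group, however, such a gain cannot be attributed to the preparation itself.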

Discussion

The current study aimed to investigate methodological practices in research on test preparation. Specifically, we examined 66 studies to identify the number and types of publication venues, the main themes explored, methodological and participant characteristics, as well as the types of research approaches, research designs, data collection methods and instruments, and methods of data analysis. In this section, we discuss our results in relation to each RQ. In so doing, we summarize the main trends and patterns in our findings, offer an interpretation, and highlight issues related to the methodological aspects of the included studies, reflecting practices in language test preparation research over the past three decades, that require future consideration.

Amount of research and publication venues

The findings for RQ1 suggest that the number of studies investigating test preparation has been growing in recent years, demonstrating the increasing importance and scholarly maturity of this research area. One potential reason for this growth is a significant influx of international students in some predominantly anglophone countries over the past two decades, with the total number of internationally mobile students globally surging from 2.1 million in 2000 to 6.4 million in 2021 (UNESCO Institute for Statistics [UIS], 2024). This growth trend highlights the need for a deeper understanding of effective preparation practices tailored to different groups of L2 learners and various high-stakes language proficiency tests worldwide. It also underscores the importance of adapting L2 instructional approaches to ensure that this globally expanding demographic possesses the language proficiency necessary for academic success.

Most of the studies in our dataset (k = 47) have been published in journals. Even though the key journals in the field such as Language Testing and Assessment in Education: Principles, Policy & Practice house the largest number of studies per journal, the research report series published by ETS (k = 5) and IELTS (k = 7) contain the largest number of studies per publication venue. The fairly large number of research reports suggests that test preparation holds a prominent place in the research agendas of major test development companies that are willing to fund studies in this area.

While the diverse range of publication venues shows some promise that research on test preparation is of growing interest not only to applied linguists and language educators but also to scholars in adjacent fields such as psychology and cognitive science, the majority of source journals remain in the domains of language teaching/assessment, general education, and applied linguistics. The paucity of publications in adjacent fields suggests that for research on test preparation to have a meaningful impact on, for instance, language assessment policy for immigration or business, more interdisciplinary collaboration is needed between scholars in applied linguistics and other relevant areas such as political science. Such collaborations would ideally result in more cross-disciplinary research dissemination to expand the reach of test preparation literature across various publication venues.

Main themes

In response to RQ2, this scoping review uncovered a wide range of individual themes in the included studies. Interestingly, 60% of all themes (i.e., 40 out of 66) were explored in single studies. This pattern indicates that most themes related to test preparation have been studied only once, which points to a need for further, more in-depth investigation of these themes. In addition, more than half of individual themes (i.e., 34 out of 66 themes) were found to belong to the Impacts category and were investigated in 74% of the included studies (i.e., 49 out of 66 studies). This finding suggests that researchers working in this area tend to prioritize impact-related themes, such as the impact of test on teaching (e.g., Read & Hayes, 2003; Teng & Fu, 2019), the impact of test preparation on test scores (e.g., Winke & Lim, 2017; Xie, 2013), and the impact of test on learning (e.g., H. Ma & Chong, 2022; Wall & Horák, 2006). It should be noted that impact was found to be a complex bi-directional phenomenon, with some studies investigating the impact of different aspects of test preparation on test scores (i.e., impact of test preparation; see, for instance, Farnsworth, 2013; Hu & Trenkic, 2021; Knoch et al., 2020) and other studies examining the impact of a specific test on test preparation processes and practices (i.e., impact on test preparation; for example, Barnes, 2016; Stoneman, 2006; Yang, 2020).

While the studies exploring themes in the other two categories were less common, the proportion of their representation was still notable, with test preparation practices scrutinized in 38 studies (or 58%) and stakeholders’ perceptions being the focus of 29 studies (44%). Coupled with the fact that the vast majority of the included studies (i.e., 54 out of 66 studies, or 82%) investigated multiple themes, this moderately balanced distribution of the overarching categories intimates that researchers tend to study the topic of test preparation from different angles.

Study and participant characteristics

Despite the relevance of language test preparation to various stakeholder groups, the findings for RQ3 revealed that, with the exception of six studies (i.e., Dawadi, 2020; J. Ma, 2017; Pan & In’nami, 2017; Saif et al., 2021; Sayyadi & Rezvani, 2021; Wall & Horák, 2006), the existing body of research focuses overwhelmingly on teacher and learner participants. Other important groups of stakeholders in language testing (e.g., test administrators, immigration officials, and policymakers; see the list in Rea-Dickins, 1997) who use language test scores for making various decisions that have high stakes for L2 learners, such as decisions related to university admission, immigration, and employment, are yet to be examined empirically. The inclusion of other participant types in studies on test preparation appears to be particularly pertinent in light of Taylor’s (2013) call for further research investigating the development of language assessment literacy (LAL) across various stakeholder groups. Built upon the dimensions that Taylor proposed, Kremmel and Harding’s (2020) Language Assessment Literacy Survey contains multiple items related to test preparation and washback in general (e.g., Items 19, 23–25), indicating that the inclusion of more diverse participant (i.e., stakeholder) types in test preparation research has a strong potential to inform practices in the development of LAL, particularly in the component of language pedagogy.

Our results for RQ3 also point toward a sampling bias in the test preparation literature, a phenomenon not uncommon in applied linguistics and adjacent disciplines. Universities and language schools were the most common educational settings observed in the reviewed studies, a finding consistent with Andringa and Godfroid’s (2020) synthesis of meta-analyses in applied linguistics. Specifically, Andringa and Godfroid note that it is unclear what types of learners fall in samples from “language institutes” and that comparatively little is known about the language learning that takes place there. In the current synthesis, the reviewed studies covered multiple types of language schools, such as Japanese “juku” cram schools (Allen, 2016b) and Korean “hagwon” private cram schools (Kim, 2021). Additionally, English was the only observed target language in the literature on test preparation, a finding that is less surprising when considered together with the overwhelming prevalence of studies investigating the preparation for international language tests such as IELTS and TOEFL iBT. Therefore, generalizing findings from educated participants (with likely higher socio-economic status) who study English for high-stakes international assessments to other contexts may not be appropriate. To obtain more generalizable results in this line of research, we stand in agreement with Andringa and Godfroid’s (2020) recommendation to diversify participant samples and expand research contexts beyond academia.

Patterns in research methodologies

The last RQ evinced some intriguing methodological patterns in the literature on test preparation, with three-fourths of the studies using descriptive research design and employing qualitative data analytic methods. Only a handful of quantitative or mixed methods studies in our dataset utilized more advanced statistical tests and procedures such as factor analysis (k = 12) and structural equation modeling (k = 6), with most studies resorting to the use of descriptive statistics (k = 32). This finding echoes Xie’s (2015) concern about a dearth of quantitative research on test preparation that employs “sophisticated” (p. 58) quantitative methodology.

In light of the large body of descriptive studies, it was hardly a surprise that survey was deployed as the data collection method in all but five studies, with interviews and questionnaires being the most common data collection instruments (k = 45 and k = 41, respectively). While survey research can provide useful insights into different aspects of test preparation and has been extensively adopted in language assessment and the broader field of applied linguistics, a number of concerns have been raised in recent years regarding the validity of this method and associated data collection instruments. Such concerns include the impact of the interviewer (e.g., power dynamics) and the interactional context on the interviewee’s responses (Talmy, 2010), the issues related to ill-constructed questionnaires that do not adequately measure the construct under consideration (Dörnyei & Dewaele, 2023), as well as over-reliance on convenience sampling that limits the generalizability of the findings (Wagner, 2015). These limitations highlight the fact that survey research alone cannot elucidate the full spectrum of variables and the complexity of their interaction, suggesting the importance of procuring additional empirical evidence through well-designed experiments that use, for example, the completely randomized design or the factorial design.
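
As a rough illustration of what such designs involve, the sketch below randomly assigns participants to the cells of a 2 × 2 factorial design with equal cell sizes. The two factors (`prep_mode` and `feedback`) and their levels are hypothetical examples chosen for illustration, not variables taken from any reviewed study, and the function `assign_factorial` is likewise an assumed helper rather than an established procedure from the literature.

```python
import itertools
import random
from collections import Counter

def assign_factorial(participants, factors, seed=0):
    """Randomly assign participants to every combination of factor levels.

    Participants are shuffled and then dealt to the cells in turn, so cell
    sizes differ by at most one.
    """
    names = list(factors)
    cells = list(itertools.product(*factors.values()))
    rng = random.Random(seed)  # fixed seed keeps the allocation reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    return {
        person: dict(zip(names, cells[i % len(cells)]))
        for i, person in enumerate(shuffled)
    }

# Hypothetical factors and levels (illustration only).
factors = {
    "prep_mode": ["instructor-led", "self-access"],
    "feedback": ["with feedback", "no feedback"],
}
participants = [f"P{i:02d}" for i in range(1, 21)]

assignment = assign_factorial(participants, factors)
cell_sizes = Counter(tuple(cond.values()) for cond in assignment.values())
print(cell_sizes)
```

Random allocation of this kind is what distinguishes a true experiment from the quasi-experiments and causal-comparative designs that compare pre-existing groups.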

Lastly, the findings for RQ4 also demonstrated that qualitative studies in our dataset relied primarily on thematic analysis of the data. As a widely used approach to identifying, analyzing, and reporting themes in the data, thematic analysis has been embraced by many researchers because of its flexibility and ease of use (Nowell et al., 2017). Meanwhile, one of the main disadvantages of this qualitative data analytic method is that its flexible nature may lead to the lack of consistency and coherence in theme development (Holloway & Todres, 2003, as cited in Nowell et al., 2017). Indeed, in analyzing the research methodologies underlying the included studies, we observed that some of them failed to adequately outline and report the details of the procedure followed in conducting a thematic analysis that can be considered trustworthy and methodologically rigorous. This issue highlights the importance of transparency in qualitative research (cf. Chong & Plonsky, 2021) and points to the need to ensure the methodological rigor of future qualitative studies on test preparation.

Conclusion

The overarching goal of this scoping review was to provide a synthesis of research on test preparation in both second and foreign language contexts. Specifically, we aimed to explore the key characteristics of publications on test preparation, the main themes, the study and participant characteristics, as well as the essential aspects of research methodologies underlying the included studies. Out of 66 studies that we identified using a set of inclusion and exclusion criteria, the most common publication type was journal articles published in various L2 testing and learning journals, followed by research reports published by IELTS and ETS. The earliest publication on test preparation appeared in 1993, with the last decade witnessing a substantial increase in the number of published studies on this topic. Using a three-level coding scheme to examine the themes in each study, we identified 66 individual Level 3 themes within 16 Level 2 themes that were grouped into 3 main Level 1 themes (i.e., Impacts, Practices, and Perceptions). The largest number of studies fell into the Impacts theme, followed by the Practices and Perceptions themes. Almost one-half of the included studies belonged to “learner perceptions” (Level 2 theme), with other common Level 2 themes being “impact of test” and “learner practices.”

In terms of participant characteristics, learners were included as participants in 88% of the studies, with their number in individual studies ranging from two to 14,593. Furthermore, we found that Chinese was the most dominant learners’ L1 and Mainland China was the most common geographical location for data collection, with most studies conducted at universities (48%) or language schools (36% of studies). As the target language of all 66 studies was English, two-thirds of tests examined in the included studies were international English language proficiency tests, such as IELTS and TOEFL iBT, followed by almost one-third of studies that investigated national tests, such as CET-4 in Mainland China.

Finally, our scoping review revealed that studies conducted within the qualitative research paradigm were more prevalent (40%) compared to mixed methods (33%) and quantitative studies (27%). Surveys were used in all but five studies, with interviews and questionnaires reported as data collection instruments in 68% and 62% of the studies, respectively. While thematic analysis was the most common qualitative data analytic method (67% of the studies), descriptive statistics were present in almost half of the included studies and group comparison analysis was used in one-third of the studies. Our review also unveiled a multitude of statistical tests and procedures that varied from fairly basic (e.g., paired-samples t-test) to more advanced (e.g., structural equation modeling).

Implications

Our examination of the individual themes across the 66 included studies revealed that test preparation is a multifaceted phenomenon: Most themes (40 out of 66) have been the subject of individual studies. The broad range of individual themes shows that researchers have embarked on disparate explorations and have approached the issue from different angles, which carries implications for scholars further delving into this topic. When interpreting existing literature findings, scholars should be cautioned against making generalizations, since there is a lack of multiple studies exploring the same theme. Consequently, there is a pressing need for research that revisits and repeatedly explores these individual themes to construct a more informed comprehension of the complex phenomenon that is test preparation. Such replication research is essential for strengthening the validity and generalizability of existing findings (Porte & McManus, 2019)—for instance, on the link between self-access test preparation activities and test performance in different instructional or assessment contexts than the original study (Knoch et al., 2020).

While research on test preparation lacks thematic homogeneity, there has been a noteworthy increase in research on this topic, especially over the last 10 years (with research published since 2013 accounting for 73% of the reviewed studies). This surge underscores the growing importance of test preparation in the field of language assessment, carrying implications for test developers. One such implication is that test preparation practices should be taken into consideration in test validation research that examines the impact of preparation on the assessment of language proficiency. Examining test preparation practices as part of validation research should extend to more local tests, such as institutional and regional language tests that, unlike well-established international language tests, are less likely to be the subject of multiple robust validation studies.

The exclusive focus on English tests in the current literature reflects the dominant role that English continues to play in the global language assessment industry (see, for instance, Isbell & Kremmel, 2020). This has implications for ESL and EFL teachers: Preparation for high-stakes English language tests has been and will most likely remain an integral part of language learning, especially in countries with exam-centric educational structures with institutionalized test-preparation markets, such as China, Japan, and South Korea (Ross, 2008). Therefore, ESL and EFL teachers working with L2 learners striving to prepare for specific English proficiency exams should seek a healthy balance between engaging learners in effective test-preparation practices and helping them improve their general English language proficiency.

Limitations

One limitation of this study pertains to our conceptualization of test preparation. As previously mentioned, there exists a broad range of terminologies denoting topics that are directly or closely associated with test preparation, such as coaching (e.g., Hu & Trenkic, 2021), teaching to the test (e.g., Menken, 2006), and test-wiseness (test-deviousness) or test-taking strategy training (e.g., Yu & Green, 2021). This lack of standardized conceptualization surrounding test preparation posed challenges in the search for studies relevant to our synthesis. To mitigate this challenge, we developed a comprehensive search strategy that encompassed an extensive range and various combinations of search terms (e.g., washback, backwash, test preparation, and coaching). Nonetheless, it remains possible that the variability in conceptualization may have impeded our ability to detect all pertinent studies.

Another potential limitation concerns our approach to data coding. When coding the data to identify the main themes explored in the literature on test preparation (RQ2), we encountered differences in nomenclature used by the authors of the included studies. As a result, our challenge was to decide whether to code the data for a particular theme based on the authors’ original terminology (which at times was misleading and did not accurately reflect the phenomena under investigation), or rather whether to rely on our own interpretation of the theme when assigning codes. To minimize the potential bias in our coding and analysis and ensure high inter-coder reliability, all four authors of this scoping review independently coded the data for themes and then discussed the codes to reconcile any discrepancies and create a final list of themes. A similar issue arose when coding the data for RQ4: The authors of the included studies used different terminologies to refer to their research designs, data collection instruments, and methods of data analysis, which we had to reconcile. For instance, while a number of studies that reported using written “surveys” with Likert-scale items referred to them as data collection instruments, we coded these as “questionnaires” following Brown (2001) and Wagner (2015), with the term “survey” being reserved for a data collection method.

Finally, we did not investigate the methodological quality of the included publications in our scoping review. According to Peters et al. (2020), a formal appraisal of the methodological quality of the research studies included in a scoping review is generally not required. However, such an appraisal would have strengthened the quality of this synthesis by more systematically evaluating individual studies. It could have also provided a snapshot of the methodological strengths and shortcomings of the included studies as an indicator of the state of reporting on language test preparation. Adopting appropriate reporting guidelines to formally evaluate methodological quality would be warranted in any further use of systematic methods to investigate this topic (see, for instance, Isaacs & Chalmers, 2023).

Directions for future research

While this scoping review provided a comprehensive characterization of the extent and nature of research on test preparation and identified a variety of themes related to test preparation perceptions, practices, and impacts, our synthesis did not examine the specific knowledge that was gained about these themes in the included studies. Consequently, future research is needed to review the findings of the existing studies on test preparation in order to advance our understanding of how test preparation is perceived by the key stakeholders, what test preparation practices are commonly used, and what factors contribute to the effectiveness of test preparation.

Our scoping review has also revealed a need for experimental studies that use more advanced statistical tests and procedures to investigate test preparation practices and their impact on L2 learners’ test performance. By employing well-designed experiments and harnessing the power of analytic techniques that align with the RQs, future studies will be able to explore complex relationships among different variables related to test preparation, provide deeper insights into the data, uncover hidden patterns, and enhance the validity of the research findings and their interpretation.

Further research is also warranted to address the existing gaps that we identified in the literature on test preparation. For instance, more empirical studies should be undertaken to explore a broader range of tests that we identified in the included studies based on their geographical distribution. While the majority of studies in this scoping review (approximately 67%) focus on high-stakes international tests, such as IELTS (k = 21) and TOEFL iBT (k = 13), it is essential to investigate other tests, such as national and regional tests, given that some of them also have a substantial number of test-takers. For example, the CET-4 test, a national English proficiency exam in Mainland China administered to undergraduate students (with the exception of English majors), warrants a more thorough investigation. The CET-4 was taken by 13 million students in 2006 (Zheng & Cheng, 2008), compared to a record 3.5 million test-takers reported by IELTS in 2018 (IELTS, 2019). This example highlights the importance of gathering additional empirical evidence on a wider range of tests to gain a deeper understanding of test preparation in L2 contexts. Moreover, it is imperative to undertake additional empirical research aimed at exploring proficiency tests of languages other than English. Notably, among the 66 studies within this scoping review, English has consistently remained the sole focal language. Researchers across various areas of applied linguistics, such as Gillespie (2020) in computer-assisted language learning and Dalman and Plonsky (2022) in L2 listening strategy instruction, have increasingly emphasized the urgent necessity for research examining a more extensive array of languages, including less commonly taught languages.

Finally, future research should investigate the influence of the COVID-19 pandemic and online language proficiency tests on test preparation practices, perceptions, and impacts. The landscape of post-secondary admissions and language proficiency testing has undoubtedly been altered since the onset of the pandemic (Ockey, 2021), with the closing of testing centers in 2020 necessitating the swift development and uptake of digital, at-home language tests such as the Duolingo English Test (DET) (Isbell & Kremmel, 2020). Although the literature on relatively novel, online proficiency tests such as the DET is still rather scarce, the importance of test preparation practices and their relevance to the interpretation of scores from such tests has already been recognized by some researchers (e.g., Isaacs et al., 2023). Given the dearth of investigations into the online language test preparation industry, scholars are left to speculate about the potential washback of assumed test preparation practices on score validity (e.g., Wagner, 2020). Therefore, there is a clear need for future research to examine the perceptions, practices, and impacts of test preparation for new digital at-home language tests compared to more established language proficiency tests administered at testing centers.

Supplemental Material

Supplemental material, sj-xlsx-1-ltj-10.1177_02655322241249754 for A scoping review of research on second language test preparation by Shanshan He, Anne-Marie Sénécal, Laura Stansfield and Ruslan Suvorov in Language Testing

Appendix

Appendix 2.

Frequency counts of the three levels of codes representing the Themes variable.

Level 1	Level 2	Level 3	No. of studies
Perceptions	Admin perceptions	Admin perceptions of test prep	1
	Learner perceptions	Learner perceptions of test prep (courses, practices, and strategies)	15
		Learner perceptions of test	9
		Learner perceptions of teaching method in test prep course	1
		Learner perceptions of technologies for testing	1
		Learner perceptions of test impact on learning	1
		Learner perceptions of test prep outcomes	1
		Learner perceptions of questioning skills in test prep course	1
		Learner perceptions of usefulness of feedback for test prep	1
	Teacher perceptions	Teacher perceptions of test prep (courses, practices, and strategies)	9
		Teacher perceptions of test	3
		Teacher perceptions of questioning skills in test prep course	1
		Teacher perceptions of test impact on language proficiency	1
		Teacher perceptions of usefulness of feedback for test prep	1
Practices	Learner practices	Learner test prep practices	13
		Learner test prep strategies	4
		Learner test-taking strategies	2
		Changes in learner test prep practices over time	1
	Teacher practices	Teacher test prep practices	9
	Parental practices	Parental involvement	1
	Characteristics	Characteristics of test prep courses	8
		Comparison of test prep and non-test prep courses	6
		Characteristics of test prep course materials	1
		Characteristics of test prep resources	1
		Characteristics of participants in test prep courses	1
	Factors	Factors affecting learner test prep (practices, strategies)	5
	Factors	Factors affecting teacher practices in test prep courses	2
		Factors affecting learner self-efficacy	1
	Practice-based relationships	Relationship between test prep practices and test	2
		Relationship between test prep and self-efficacy	1
		Relationship between test prep practices and stakeholder perceptions	1
		Relationship between test-taking strategies and test anxiety	1
Impacts	Impact of perceptions	Impact of learner perceptions of test on test prep practices	4
		Impact of learner perceptions of test on test scores	1
		Impact of teacher perceptions on test prep courses	1
	Impact of practices	Impact of strategy use on test scores	3
		Impact of Dynamic Assessment on language development and test prep	1
		Impact of strategy instruction on strategy use	1
		Impact of strategy instruction on test scores	1
		Impact of teaching method on test scores	1
	Impact of individual characteristics	Impact of learner characteristics (motivation) on test prep practices	3
	Impact of individual characteristics	Impact of teacher characteristics on test prep practices	1
	Impact of test	Impact of test on teaching	9
		Impact of test on learning	7
		Impact of test on teaching methodology	3
		Impact of test on test prep strategies	3
		Impact of test on teaching content	2
		Impact of test on test prep practices	2
		Impact of test on employability	1
		Impact of test on test-taking strategies	1
	Impact of test prep courses	Impact of test prep courses on test scores	8
		Impact of test prep courses on language proficiency	4
		Impact of non-test prep courses on test scores	1
		Impact of test prep course materials on teaching	1
		Impact of test prep courses on learner motivation	1
		Impact of test prep courses on learner practices	1
		Impact of test prep courses on learner perceptions of learning	1
	Impact of test preparation	Impact of test prep on test scores	9
		Impact of test prep on language proficiency	2
		Impact of test prep on test anxiety	1
		Impact of test prep on learning	1
		Impact of test prep on teaching	1
		Impact of test prep on test-taking strategies	1
	Impact-based relationships	Relationship between test scores and academic outcomes	1
		Relationship between test scores and number of test-taking attempts	1
		Comparison of test scores from test prep course participants and non-test prep course participants	1

Acknowledgements

The authors would like to thank Dr. Talia Isaacs and the three anonymous reviewers for their insightful comments and constructive feedback that have been instrumental in shaping and refining this manuscript.

Author Contribution

Shanshan He: Conceptualization; Data curation; Formal analysis; Investigation; Methodology; Writing—original draft; Writing—review and editing.

Anne-Marie Sénécal: Conceptualization; Data curation; Formal analysis; Investigation; Methodology; Writing—original draft; Writing—review and editing.

Laura Stansfield: Conceptualization; Data curation; Formal analysis; Investigation; Methodology; Visualization; Writing—original draft; Writing—review and editing.

Ruslan Suvorov: Conceptualization; Data curation; Formal analysis; Investigation; Methodology; Project administration; Supervision; Writing—original draft; Writing—review and editing.

Declaration of conflicting interests

The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Ruslan Suvorov currently serves as Associate Editor of Language Testing. He was blinded to the manuscript in the ScholarOne online submission platform and Dr. Talia Isaacs managed all stages of its processing as handling editor. The remaining co-authors declared no potential conflicts of interest with respect to the research and authorship of this study.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Supplemental material

Supplemental material for this article is available online.

References

*Abad, J. V., & Alzate, P. A. (2016). Strategies instruction to improve the preparation for English oral exams. Profile: Issues in Teachers’ Professional Development, 18(1), 129–147. https://doi.org/10.15446/profile.v18n1.49592

*Alderson, J. C., & Hamp-Lyons (1996). TOEFL preparation courses: A study of washback. Language Testing, 13(3), 280–297. https://doi.org/10.1177/026553229601300304

*Allen (2016a). Investigating washback to the learner from the IELTS test in the Japanese tertiary context. Language Testing in Asia, 6(1), 1–20. https://doi.org/10.1186/s40468-016-0030-z

*Allen (2016b). Japanese cram schools and entrance exam washback. The Asian Journal of Applied Linguistics, 3(1), 54–67. https://caes.hku.hk/ajal/index.php/ajal/article/view/338/412

Andringa & Godfroid (2020). Sampling bias and the problem of generalizability in applied linguistics. Annual Review of Applied Linguistics, 40, 134–142. https://doi.org/10.1017/S0267190520000033

*Barnes (2016). The washback of the TOEFL iBT in Vietnam. Australian Journal of Teacher Education, 41(7), 158–174. https://doi.org/10.14221/ajte.2016v41n7.10

*Barnes (2017). Washback: Exploring what constitutes “good” teaching practices. Journal of English for Academic Purposes, 30, 1–12. https://doi.org/10.1016/j.jeap.2017.10.003

*Booth (2012). Exploring the washback of the TOEIC in South Korea: A sociocultural perspective on student test activity [Unpublished doctoral thesis]. The University of Auckland. http://hdl.handle.net/2292/19379

*Brown, J. D. (1998). Does IELTS preparation work? An application of the context-adaptive model of language program evaluation (IELTS Research Reports 1998–1). https://s3.eu-west-2.amazonaws.com/ielts-web-static/production/Research/does-ielts-preparation-work-brown-1998.pdf

Brown, J. D. (2001). Using surveys in language programs. Cambridge University Press.

*Chappell, Lynda, & Benson (2019). Investigating test preparation practices: Reducing risks (IELTS Research Reports Online Series, No. 3). British Council, Cambridge Assessment English and IDP: IELTS Australia. https://s3.eu-west-2.amazonaws.com/ielts-web-static/production/Research/investigating-test-preparation-practices-reducing-risks-chappell-et-al-2019.pdf

Cheng & Doe (2013). Test preparation: A double-edged sword. IATEFL-TEASIG (International Association of Teachers of English as a Foreign Language’s Testing, Evaluation and Assessment Special Interest Group) Newsletter, 54, 19–20.

Cheng & Sun (2015). Review of washback research literature within Kane’s argument-based validation framework. Language Teaching, 48(4), 436–470. https://doi.org/10.1017/S0261444815000233

Chong, S. W., & Plonsky (2021). A primer on qualitative research synthesis in TESOL. TESOL Quarterly, 55(3), 1024–1034. https://doi.org/10.1002/tesq.3030

Chong, S. W., & Reinders (2020). Technology-mediated task-based language teaching: A qualitative research synthesis. Language Learning & Technology, 24(3), 70–86. http://hdl.handle.net/10125/44739

Chong, S. W., & Reinders (2022). Autonomy of English language learners: A scoping review of research and practice. Language Teaching Research. Advance online publication. https://doi.org/10.1177/13621688221075812

*Chou, M.-H. (2019). Predicting self-efficacy in test preparation: Gender, value, anxiety, test performance, and strategies. The Journal of Educational Research, 112(1), 61–71. https://doi.org/10.1080/00220671.2018.1437530

*Clark (2022a). Test preparation pedagogy for international study: Relating teacher cognition, instructional models and academic writing skills. Language Teaching Research. Advance online publication. https://doi.org/10.1177/13621688211072381

*Clark (2022b). The pedagogical remit of test preparation: The case of writing acquisition on an IELTS course. Applied Linguistics Review. Advance online publication. https://doi.org/10.1515/applirev-2021-0164

Cohen, A. D. (2021). Test-taking strategies and task design. In Fulcher & Harding (Eds.), The Routledge handbook of language testing (pp. 372–396). Routledge. https://doi.org/10.4324/9781003220756

Creswell, J. W., & Creswell, J. D. (2022). Research design: Qualitative, quantitative, and mixed methods approaches (6th ed.). Sage.

Crocker (2006). Preparing examinees for test taking: Guidelines for test developers and test users. In S. M. Downing & T. M. Haladyna (Eds.), Handbook of test development (pp. 115–128). Lawrence Erlbaum.

Dalman & Plonsky (2022). The effectiveness of second-language listening strategy instruction: A meta-analysis. Language Teaching Research. Advance online publication. https://doi.org/10.1177/13621688211072981

*Damankesh & Babaii (2015). The washback effect of Iranian high school final examinations on students’ test-taking and test-preparation strategies. Studies in Educational Evaluation, 45, 62–69. https://doi.org/10.1016/j.stueduc.2015.03.009

*Dawadi (2020). Parental involvement in national EFL test preparation. RELC Journal, 51(3), 427–439. https://doi.org/10.1177/0033688219848770

*Di Gennaro, J. A. (2017). The washback effects of an English exit exam on teachers and learners in a Korean university English program [Unpublished doctoral thesis]. University of Exeter.

Dörnyei & Dewaele, J.-M. (2023). Questionnaires in second language research: Construction, administration, and processing (3rd ed.). Routledge. https://doi.org/10.4324/9781003331926

Epstein & Martin (2004). Coding variables. In Kempf-Leonard (Ed.), Encyclopedia of social measurement (Vol. 1, pp. 321–327). Elsevier Academic.

*Farnsworth (2013). Effects of targeted test preparation on scores of two tests of oral English as a second language. TESOL Quarterly, 47(1), 148–155. https://doi.org/10.1002/tesq.75

Fulcher (2010). Practical language testing. Hodder Education. https://doi.org/10.4324/980203767399

*Gan (2009). IELTS preparation course and student IELTS performance: A case study in Hong Kong. RELC Journal, 40(1), 23–41. https://doi.org/10.1177/0033688208101449

*Gebril & Eid (2017). Test preparation beliefs and practices in a high-stakes context: A teacher’s perspective. Language Assessment Quarterly, 14(4), 360–379. https://doi.org/10.1080/15434303.2017.1353607

Gillespie (2020). CALL research: Where are we now? ReCALL, 32(2), 127–144. https://doi.org/10.1017/S0958344020000051

*Green (2006). Washback to the learner: Learner and teacher perspectives on IELTS preparation course expectations and outcomes. Assessing Writing, 11(2), 113–134. https://doi.org/10.1016/j.asw.2006.07.002

*Green (2007). Washback to learning outcomes: A comparative study of IELTS preparation and university pre-sessional language courses. Assessment in Education: Principles, Policy & Practice, 14(1), 75–97. https://doi.org/10.1080/09695940701272880

Green (2013). Washback in language assessment. International Journal of English Studies, 13(2), 39–51. https://doi.org/10.6018/ijes.13.2.185891

*Gu, Davis, Tao, & Zechner (2021). Using spoken language technology for generating feedback to prepare for the TOEFL iBT® test: A user perception study. Assessment in Education: Principles, Policy & Practice, 28(1), 58–76. https://doi.org/10.1080/0969594X.2020.1735995

Haladyna, T. M., Nolen, S. B., & Haas, N. S. (1991). Raising standardized achievement test scores and the origins of test score pollution. Educational Researcher, 20(5), 2–7. https://doi.org/10.2307/1176395

Hamp-Lyons (1998). Ethical test preparation practice: The case of the TOEFL. TESOL Quarterly, 32(2), 329–337. https://doi.org/10.2307/3587587

*Hayes & Read (2004). IELTS test preparation in New Zealand: Preparing students for the IELTS academic module. In Cheng, Watanabe, & Curtis (Eds.), Washback in language testing: Research contexts and methods (pp. 97–111). Lawrence Erlbaum.

Hincks (2015). Technology and learning pronunciation. In Reed & J. M. Levis (Eds.), The handbook of English pronunciation (pp. 505–519). Wiley. https://doi.org/10.1002/9781118346952.ch28

*Hu & Trenkic (2021). The effects of coaching and repeated test-taking on Chinese candidates’ IELTS scores, their English proficiency, and subsequent academic achievement. International Journal of Bilingual Education and Bilingualism, 24(10), 1486–1501. https://doi.org/10.1080/13670050.2019.1691498

Hughes (1993). Backwash and TOEFL 2000 [Unpublished manuscript commissioned by Educational Testing Service (ETS)]. University of Reading.

IELTS. (2019). IELTS grows to 3.5 million a year. https://takeielts.britishcouncil.org/about/press/ielts-grows-three-half-million-year

In’nami & Koizumi (2010). Database selection guidelines for meta-analysis in applied linguistics. TESOL Quarterly, 44(1), 169–184. https://doi.org/10.5054/tq.2010.215253

*Irvine-Niakaris & Kiely (2015). Reading comprehension in test preparation classes: An analysis of teachers’ pedagogical content knowledge in TESOL. TESOL Quarterly, 49(2), 369–392. https://doi.org/10.1002/tesq.189

Isaacs & Chalmers (2023). Reducing “avoidable research waste” in applied linguistics research: Insights from healthcare research. Language Teaching. Advance online publication. https://doi.org/10.1017/S0261444823000411

Isaacs, Trenkic, & Varga (2023). Examining the predictive validity of the Duolingo English Test: Evidence from a major UK university. Language Testing, 40(3), 748–770. https://doi.org/10.1177/02655322231158550

Isbell, D. R., & Kremmel (2020). Test review: Current options in at-home language proficiency tests for making high-stakes decisions. Language Testing, 37(4), 600–619. https://doi.org/10.1177/0265532220943483

*Kang, Hirschi, Miao, Ahn, & Won (2022). Test-takers’ IELTS preparations, their attitudes towards IELTS practices, and the use of technologies in the global pandemic (IELTS Research Reports Online Series, No. 2/22). British Council, Cambridge Assessment English and IDP: IELTS Australia. https://s3.eu-west-2.amazonaws.com/ielts-web-static/production/Research/test-takers-ielts-preparations-and-use-of-technologies-in-global-pandemic-kang-et-al-2022.pdf

*Kim (2021). Prepping for the TOEFL iBT writing test, Gangnam style. Assessing Writing, 49, 100544. https://doi.org/10.1016/j.asw.2021.100544

*Knoch, Huisman, Elder, Kong, & McKenna (2020). Drawing on repeat test takers to study test preparation practices and their links to score gains. Language Testing, 37(4), 550–572. https://doi.org/10.1177/0265532220927407

Kremmel & Harding (2020). Towards a comprehensive, empirical model of language assessment literacy across stakeholder groups: Developing the Language Assessment Literacy Survey. Language Assessment Quarterly, 17(1), 100–120. https://doi.org/10.1080/15434303.2019.1674855

*Lertcharoenwanich (2022). The effect of communicative language teaching in test preparation course on TOEIC score of EFL business English students. Journal of Language Teaching and Research, 13(6), 1188–1195. https://doi.org/10.17507/jltr.1306.06

Levitt, H. M., Bamberg, Creswell, J. W., Frost, D. M., Josselson, & Suárez-Orozco (2018). Journal article reporting standards for qualitative primary, qualitative meta-analytic, and mixed methods research in psychology: The APA Publications and Communications Board task force report. American Psychologist, 73(1), 26–46. https://doi.org/10.1037/amp0000151

*Li (2021). Perceived effects of CET4 test preparation, language ability, and test performance: An exploratory study of Chinese EFL learners. Language Education & Assessment, 4(2), 38–58. https://doi.org/10.29140/lea.v4n2.480

*Liu, O. L. (2014). Investigating the relationship between test preparation and TOEFL iBT® performance (ETS Research Report Series, RR-14-15). Educational Testing Service. https://doi.org/10.1002/ets2.12016

Loewen & Plonsky (2015). An A–Z of applied linguistics research methods. Palgrave Macmillan.

*Ma & Chong, S. W. (2022). Predictability of IELTS in a high-stakes context: A mixed methods study of Chinese students’ perspectives on test preparation. Language Testing in Asia, 12(2), 1–18. https://doi.org/10.1186/s40468-021-00152-3

*Ma (2017). Understanding test preparation phenomenon through Chinese students’ journey towards success on high-stakes English language tests [Unpublished doctoral thesis]. Queen’s University.

*Ma & Cheng (2015). Chinese students’ perceptions of the value of test preparation courses for the TOEFL iBT: Merit, worth, and significance. TESL Canada Journal, 33(1), 58–79. https://doi.org/10.18806/tesl.v33i1.1227

Macaro (2020). Systematic reviews in applied linguistics. In McKinley & Rose (Eds.), The Routledge handbook of research methods in applied linguistics (pp. 230–239). Routledge. https://doi.org/10.4324/9780367824471-20

Menken (2006). Teaching to the test: How No Child Left Behind impacts language policy, curriculum, and instruction for English language learners. Bilingual Research Journal, 30(2), 521–546. https://doi.org/10.1080/15235882.2006.10162888

Messick (1981). Issues of effectiveness and equity in the coaching controversy: Implications for educational and testing practice (ETS Research Report Series, RR-81-19). Educational Testing Service.

*Mickan & Motteram (2008). An ethnographic study of classroom instruction in an IELTS preparation program (IELTS Research Reports 2008–8). https://s3.eu-west-2.amazonaws.com/ielts-web-static/production/Research/ethnographic-study-of-classroom-instruction-in-ielts-program-mickan-et-al-2008.pdf

*Mickan & Motteram (2009). The preparation practices of IELTS candidates: Case studies (IELTS Research Reports 2009–10). https://s3.eu-west-2.amazonaws.com/ielts-web-static/production/Research/preparation-practices-of-ielts-candidates-mickan-et-al-2009.pdf

*Minakova (2020). Dynamic assessment of IELTS speaking: A learning-oriented approach to test preparation. Language and Sociocultural Theory, 6(2), 184–212. https://doi.org/10.1558/lst.36658

Newman & Gough (2020). Systematic reviews in educational research: Methodology, perspectives and application. In Zawacki-Richter, Kerres, Bedenlier, Bond, & Buntins (Eds.), Systematic reviews in educational research: Methodology, perspectives and application (pp. 3–22). Springer. https://doi.org/10.1007/978-3-658-27602-7_1

Nowell, L. S., Norris, J. M., White, D. E., & Moules, N. J. (2017). Thematic analysis: Striving to meet the trustworthiness criteria. International Journal of Qualitative Methods, 16(1), 1–13. https://doi.org/10.1177/1609406917733847

Ockey, G. J. (2021). An overview of COVID-19’s impact on English language university admissions and placement tests. Language Assessment Quarterly, 18(1), 1–5. https://doi.org/10.1080/15434303.2020.1866576

*O’Sullivan, Dunn, & Berry (2021). Test preparation: An international comparison of test takers’ preferences. Assessment in Education: Principles, Policy & Practice, 28(1), 13–36. https://doi.org/10.1080/0969594X.2019.1637820

Page, M. J., Moher, Bossuyt, P. M., Boutron, Hoffmann, T. C., Mulrow, C. D., Shamseer, Tetzlaff, J. M., Akl, E. A., Brennan, S. E., Chou, Glanville, Grimshaw, J. M., Hróbjartsson, Lalu, M. M., Loder, E. W., Mayo-Wilson, McDonald, & McKenzie, J. E. (2021). PRISMA 2020 explanation and elaboration: Updated guidance and exemplars for reporting systematic reviews. The BMJ, 372(160), 1–36. https://doi.org/10.1136/bmj.n160

*Pan, Y.-C. (2016). Traditional and non-traditional test preparation practices: Learner performance and perspectives. Electronic Journal of Foreign Language Teaching, 13(2), 170–183. https://e-flt.nus.edu.sg/wp-content/uploads/2020/09/pan.pdf

*Pan, Y.-C., & In’nami (2017). Does TOEIC as a university exit test ensure higher employability in Taiwan? International Journal of Language Testing, 7(1), 1–27.

Peters, M. D. J., Godfrey, McInerney, Munn, Tricco, A. C., & Khalil (2020). Scoping reviews. In Aromataris & Munn (Eds.), JBI manual for evidence synthesis (pp. 406–451). JBI. https://doi.org/10.46658/JBIMES-20-12

Popham, W. J. (2001). Teaching to the test? Educational Leadership, 58(6), 16–20.

Porte & McManus (2019). Doing replication research in applied linguistics. Routledge. https://doi.org/10.4324/9781315621395

*Rao, McPherson, Chand, & Khan (2003). Assessing the impact of IELTS preparation programs on the General Training reading and writing test modules (IELTS Research Report 2003–5). https://s3.eu-west-2.amazonaws.com/ielts-web-static/production/Research/assessing-impact-of-ielts-preparation-programs-on-candidates-performance-rao-et-al-2003.pdf

*Razavipour, Habibollahi, & Vahdat (2021). Preparing for the higher education admission test: Preparation practices and test takers’ achievement goal orientations. Assessment and Evaluation in Higher Education, 46(2), 312–325. https://doi.org/10.1080/02602938.2020.1773392

*Razavipour, Mansoori, & Shooshtari, Z. G. (2020). Test takers’ perspectives on an English language test in Iranian higher education: A washback study. Issues in Educational Research, 30(3), 1058–1083. http://www.iier.org.au/iier30/razavipour.pdf

*Read & Hayes (2003). The impact of IELTS on preparation for academic study in New Zealand (IELTS Research Reports 2003–4). https://s3.eu-west-2.amazonaws.com/ielts-web-static/production/Research/impact-of-ielts-on-preparation-for-academic-study-in-new-zealand-read-et-al-2003.pdf

Rea-Dickins (1997). So, why do we need relationships with stakeholders in language testing? A view from the UK. Language Testing, 14(3), 304–314. https://doi.org/10.1177/026553229701400307

*Robb, T. N., & Ercanbrack (1999). A study of the effect of direct test preparation on the TOEIC scores of Japanese university students. TESL-EJ, 3(4), 1–22. https://tesl-ej.org/wordpress/issues/volume3/ej12/ej12a2/

Ross (2008). Language testing in Asia: Evolution, innovation, and policy challenges. Language Testing, 25(1), 5–13. https://doi.org/10.1177/0265532207083741

*Saif, May, & Cheng (2021). Complexity of test preparation across three contexts: Case studies from Australia, Iran and China. Assessment in Education: Principles, Policy & Practice, 28(1), 37–57. https://doi.org/10.1080/0969594X.2019.1700211

Saldaña (2021). The coding manual for qualitative researchers (4th ed.). Sage.

*Sato (2019). An investigation of factors involved in Japanese students’ English learning behavior during test preparation. Papers in Language Testing and Assessment, 8(1), 69–95. https://www.altaanz.org/uploads/5/9/0/8/5908292/4._plta_8_1__sato.pdf

*Sayyadi & Rezvani (2021). Questioning in TOEFL iBT speaking test: A case of washback and construct underrepresentation. Language Testing in Asia, 11(1), 1–18. https://doi.org/10.1186/s40468-021-00137-2

Smagorinsky (2008). The Method section as conceptual epicenter in constructing social science research reports. Written Communication, 25(3), 389–411. https://doi.org/10.1177/0741088308317815

*Stoneman, B. W. (2006). The impact of an exit English test on Hong Kong undergraduates: A study investigating the effects of test status on students’ test preparation behaviours [Unpublished doctoral thesis]. The Hong Kong Polytechnic University.

Talmy (2010). Qualitative interviews in applied linguistics: From research instrument to social practice. Annual Review of Applied Linguistics, 30, 128–148. https://doi.org/10.1017/S0267190510000085

Taylor (2013). Communicating the theory, practice and principles of language testing to test stakeholders: Some reflections. Language Testing, 30(3), 403–412. https://doi.org/10.1177/0265532213480338

*Teng, H.-C., C.-W. (2019). The washback of listening tests for entrance exams on EFL instruction in Taiwanese junior high schools. Language Education & Assessment, 2(2), 96–109. https://doi.org/10.29140/lea.v2n2.150

*Trenkic (2021). Teaching to the test: The effects of coaching on English-proficiency scores for university entry. Journal of the European Second Language Association, 5(1), 1–15. https://doi.org/10.22599/jesla.74

*Tsagari (2012). FCE exam preparation discourses: Insights from an ethnographic study. Cambridge ESOL Research Notes, 47, 36–48. http://www.cambridgeenglish.org/images/22669-rv-research-notes-47.pdf

Tsang, C. L., & Isaacs (2022). Hong Kong secondary students’ perspectives on selecting test difficulty level and learner washback: Effects of a graded approach to assessment. Language Testing, 39(2), 212–238. https://doi.org/10.1177/02655322211050600

UNESCO Institute for Statistics (UIS). (2024). Internationally mobile students globally, 2000–2021 [Data set]. http://data.uis.unesco.org/

Veritas Health Innovation. (2024). Covidence systematic review software. https://www.covidence.org/

Vorobel, Voorhees, T. T., & Gokcora (2021). Language learners’ digital literacies: Focus on students’ information literacy and reading practices online. Journal of Computer Assisted Learning, 37(4), 1127–1140. https://doi.org/10.1111/jcal.12550

Wagner (2015). Survey research. In Paltridge & Phakiti (Eds.), Research methods in applied linguistics: A practical resource (pp. 83–99). Bloomsbury Publishing.

Wagner (2020). Duolingo English test, revised version July 2019. Language Assessment Quarterly, 17(3), 300–315. https://doi.org/10.1080/15434303.2020.1771343

*Wall & Alderson (1993). Examining washback: The Sri Lankan impact study. Language Testing, 10, 41–69. https://doi.org/10.1177/026553229301000103

*Wall & Horák (2006). The impact of changes in the TOEFL examination on teaching and learning in Central and Eastern Europe: Phase 1, the baseline study (ETS Research Report Series, RR-06-18). https://doi.org/10.1002/j.2333-8504.2006.tb02024.x

*Wall & Horák (2008). The impact of changes in the TOEFL examination on teaching and learning in Central and Eastern Europe: Phase 2, coping with change (ETS Research Report Series, RR-08-37). https://doi.org/10.1002/j.2333-8504.2008.tb02123.x

*Wall & Horák (2011). The impact of changes in the TOEFL® exam on teaching in a sample of countries in Europe: Phase 3, the role of the coursebook Phase 4, describing change (ETS Research Report Series, RR-11-41). https://doi.org/10.1002/j.2333-8504.2011.tb02277.x

*Wang (2019). The impact of TOEFL on instructors’ course content and teaching methods. TESL-EJ, 23(3), 1–18. https://tesl-ej.org/wordpress/issues/volume23/ej91/ej91a2/

*Winke & Lim (2017). The effects of test preparation on second-language listening test performance. Language Assessment Quarterly, 14(4), 380–397. https://doi.org/10.1080/15434303.2017.1399396

*Xie (2013). Does test preparation work? Implications for score validity. Language Assessment Quarterly, 10(2), 196–218. https://doi.org/10.1080/15434303.2012.721423

*Xie (2015). Do component weighting and test method affect time management and approaches to test preparation? A study on washback mechanism. System, 50, 56–68. https://doi.org/10.1016/j.system.2015.03.002

*Xie & Andrews (2013). Do test design and uses influence test preparation? Testing a model of washback with Structural Equation Modeling. Language Testing, 30(1), 49–70. https://doi.org/10.1177/0265532212442634

*Xu (2021). Processes and effects of test preparation for writing tasks in a high-stakes admission test in China: Implications for test takers. Studies in Educational Evaluation, 70, 101015. https://doi.org/10.1016/j.stueduc.2021.101015

*Xu (2022). Construct-oriented or goal-motivated? Interpreting test preparation of a high-stakes writing test from the perspective of expectancy-value theory. Frontiers in Psychology, 13, Article 846413. https://doi.org/10.3389/fpsyg.2022.846413

*Yang (2020). Grammar and vocabulary testing in the senior high school entrance English test in China: A washback study from a learning oriented assessment perspective [Unpublished doctoral thesis]. The Queensland University of Technology. https://doi.org/10.5204/thesis.eprints.203594

Yu & Green (2021). Preparing for admissions tests in English. Assessment in Education: Principles, Policy & Practice, 28(1), 1–12. https://doi.org/10.1080/0969594X.2021.1880120

*Yu, Rea-Dickins, Kiely, Zhang, & Fang (2017). Preparing for the speaking tasks of the TOEFL iBT® test: An investigation of the journeys of Chinese test takers (ETS Research Report Series, RR-17-19). https://doi.org/10.1002/ets2.12145

Yung, K. W. H. (2015). Learning English in the shadows: Understanding Chinese learners’ experiences of private tutoring. TESOL Quarterly, 49(4), 707–732. https://doi.org/10.1002/tesq.193

*Zhan & Andrews (2014). Washback effects from a high-stakes examination on out-of-class English learning: Insights from possible self theories. Assessment in Education: Principles, Policy & Practice, 21(1), 71–89. https://doi.org/10.1080/0969594X.2012.757546

*Zhan & Wan, Z. H. (2016). Test takers’ beliefs and experiences of a high-stakes computer-based English listening and speaking Test. RELC Journal, 47(3), 363–376. https://doi.org/10.1177/0033688216631174

Zheng & Cheng (2008). Test review: College English Test (CET) in China. Language Testing, 25(3), 408–417. https://doi.org/10.1177/0265532208092433
