Abstract
The Chinese Proficiency Test, or Hanyu Shuiping Kaoshi (HSK), significantly influences International Chinese Language Education (ICLE), yet its washback effect has received comparatively little attention in earlier research. This study applied a mixed-methods research design to examine the washback effect of the HSK on ICLE in terms of value, intensity, length, specificity, and intentionality. Data were collected through questionnaires, classroom observations, and interviews, with international Chinese language teachers and international students as research subjects. The study reveals that the HSK has a strong positive effect in terms of value and intensity, and that its effects are present both before the test and for some time after it. The specificity of the washback effect is reflected in teachers’ teaching content, methods, and attitudes, as well as in students’ learning content, methods, and attitudes. In terms of intentionality, the design intentions of the HSK have been partially realized, but the test has also produced unexpected consequences. To optimize the positive washback effects of the HSK, this study offers practical recommendations for international Chinese language teaching and learning.
Keywords
Introduction
Over the past two decades, China has seen a notable rise in international student enrollment; in 2018 alone, 492,185 learners arrived from 196 countries and regions. Of these, 258,122 were degree-seeking students, more than half of the international student population, reflecting continued growth in the number of learners pursuing academic degrees in China (China Education Online, 2020). In response, the Ministry of Education has established detailed language proficiency requirements for incoming international students (Ministry of Education of the People’s Republic of China, 2018). The government scholarship program has also set explicit eligibility criteria that link Hanyu Shuiping Kaoshi (HSK) scores to scholarship disbursements for international students. This regulatory framework underscores the importance of HSK in international Chinese language education. On March 1, 2021, the Ministry of Education and the State Language Commission officially released the Chinese Proficiency Level Standard for International Chinese Language Education (hereafter the “Standard”), which serves as a guide for both the HSK and international Chinese language instruction. Starting on July 1, 2021, the HSK was restructured into three levels and nine grades (HSK 3.0), while the original six-level system was kept stable. As a standardized language assessment tool, the HSK has attracted increasing attention.
The HSK assessment system has also gradually aligned with internationally influential language standards, making its testing methods more rigorous and scientific. It has also facilitated a hierarchical design for Chinese language teaching, including the proper sequencing of Chinese character recognition and writing across proficiency levels (Liu et al., 2022). In the classroom, teachers often incorporate HSK-related content, and students attach great importance to their HSK scores. This influence of testing on teaching and learning is known as the washback effect (Hughes, 2003), which makes the washback effects of HSK an important object of study.
Both in China and internationally, research on the washback effect has focused mainly on large-scale, high-stakes English language tests (J. M. Wang, 2011), such as the Test of English as a Foreign Language (TOEFL), the International English Language Testing System (IELTS), the National College English Test (CET), and the Test for English Majors (TEM); the latter two are sponsored by the Chinese Ministry of Education to assess Chinese college students’ English language skills. Comparatively less scholarly attention has been devoted to the HSK, a national standardized test designed to assess the Chinese proficiency of non-native speakers (C. X. Huang, 2013a). The existing literature on HSK has predominantly investigated washback from the perspective of either teaching or learning alone (F. Y. Kong & Zhang, 2021; J. C. Wang et al., 2023), which limits a comprehensive exploration of the interplay among assessment, teaching, and learning. Washback effects are closely tied to construct validity (Messick, 1989), so studying them offers a new perspective on how a test’s construct validity is reflected in its influence on teaching and learning practices. Moreover, differences in washback effects are inextricably linked to the characteristics of the test format (Green, 2006), which allows a test to fulfill two functions: (1) evaluating teaching practices and (2) fostering pedagogical improvement as part of broader educational reform (Petrie, 1987). Research on the washback effect of HSK therefore provides direct guidance for teaching practice while helping test designers understand the actual impact of HSK on teaching and learning and make targeted improvements to the test. Consequently, this study aims to explore the washback effect of HSK more comprehensively by analyzing its influence on teaching and learning simultaneously within the field of International Chinese Language Education.
Literature Review
Assumptions and Models of the Washback Effect
The concept of the washback effect first gained prominence internationally and began to attract attention in China after 2000. Scholars who have interpreted the concept broadly agree that the washback effect refers to the impact of tests on language teaching and learning (Alderson & Wall, 1993; Gu, 2007b; Hamp-Lyons, 1997).
Alderson and Wall’s (1993) investigation of 15 hypotheses concerning the influence of tests on teaching and learning laid the groundwork for subsequent washback research, revealing that the effect was more complex than previously thought. Several researchers then created models to explain the washback effect (Bailey, 1996; Green, 2007a; Hughes, 1993). Hughes’s “PPP” model posits that testing affects the participants involved in various processes, leading to positive or negative effects (Hughes, 1993). Bailey’s model expanded on this to emphasize the reciprocal interaction between testing and participant engagement, illustrating how this interaction can foster favorable washback (Bailey, 1996). Green’s model similarly acknowledges participants’ reactions to testing and highlights the close relationship between test design and the direction of washback (Green, 2007a). In contrast to these models, the Explicit-Implicit Washback Effect model (Prodromou, 1995) distinguishes between explicit and implicit effects, arguing that the impact of testing on teaching and learning is predominantly negative and deeply ingrained. These models explain washback from different perspectives; together they show that researchers attach great importance to the direction of washback and increasingly recognize its complex nature (Cheng, 2004).
Because different types of exams can have different impacts, defining the dimensions of the washback effect is a prerequisite for in-depth study (Chen, 2022). Watanabe (2004) proposed five dimensions of washback: value, intensity, length, specificity, and intentionality. Value concerns whether a test’s washback is positive or negative; intensity refers to how strong or weak the effect induced by the test is; and length refers to how long the test’s impact lasts. Specificity is contrasted with generality: generality refers to washback common to any test, whereas specificity refers to washback relevant only to a particular type of test or a particular aspect of a test. Finally, intentionality concerns whether the effects produced by a test are planned or unforeseen. These dimensions provide a comprehensive framework for understanding the washback effect and have significantly influenced research in this area.
Studies on Washback
Several studies have explored the washback effects of high-stakes language testing on teaching and learning along one or several dimensions (Gu et al., 2014; Jiang, 2015; Muñoz & Álvarez, 2010). Overall, the impact of testing on teaching and learning may be positive or negative, and the two can even coexist (Alderson & Hamp-Lyons, 1996; Green, 2007b; Gu et al., 2014; Jiang, 2015; Muñoz & Álvarez, 2010); a previous study further highlights that HSK has a more pronounced motivational effect on students (S. Wang, 2018). Although both positive and negative washback effects can affect language education (Cheng, 1997), stakeholders seem more inclined to emphasize the positive ones. In fact, positive and negative effects are relative, and even well-designed tests may produce negative outcomes (Alderson & Wall, 1993), such as an excessive focus on test-oriented education. Therefore, a test should not be assessed solely on the direction of its washback; its washback effect needs to be explored in depth.
The washback effects of high-stakes language testing on teaching center on teaching content, methods, and attitudes (C. X. Huang, 2013a; F. Y. Kong & Zhang, 2021; Wu, 2019), but there is no agreement on the specific effects of testing on these aspects. (i) Adjustments to teaching content: teachers seem to neglect language skills in favor of teaching test-taking skills (Kilickaya, 2016; F. Y. Kong & Zhang, 2021; Laotrakunchai et al., 2021), although some teachers focus on constructing authentic contexts to develop students’ language proficiency (C. Wang et al., 2014). (ii) Changes in teaching attitudes: prolonged exposure to testing pressure can lead some teachers to feel bored and doubt their abilities, diminishing their confidence in their teaching (Wu, 2019). (iii) Choice of teaching methods: although existing research indicates that testing affects teaching methods only indirectly (Alderson & Wall, 1993) and relatively weakly (Alderson & Hamp-Lyons, 1996; Luxia, 2005), teachers have been found to respond to high-stakes language tests with two types of teaching methods: teacher-centered (Barnes, 2016; Y. Z. Huang, 2013b) and student-centered (Saif, 2006; C. Wang et al., 2014). The teacher-centered method relies mainly on the teacher’s explanation, whereas the student-centered method relies on group activities and similar techniques.
The washback effect of high-stakes language testing on learning centers mainly on learning content, methods, and attitudes (Fan & Yang, 2020; Y. Z. Huang, 2013b; Rahman et al., 2021), but there is disagreement on the specific effects of testing on the following aspects. (i) Adjustments to learning content: some students focus on test-related information (Y. Z. Huang, 2013b), while others place more emphasis on vocabulary (S. H. Wang & Huang, 2021). (ii) Choice of learning methods: some students choose superficial learning methods, such as working through practice tests (Fan & Yang, 2020; Zhang & Kong, 2021), whereas other studies report that students do not adopt short-term intensive learning for Chinese and do not focus excessively on test-taking skills (Biggs, 2001; C. X. Huang, 2013a). (iii) Changes in learning attitudes: students increasingly prioritize test preparation and find in the test a motivation that develops their interest in learning Chinese (J. C. Wang et al., 2023); however, other studies indicate that, under the influence of the test, students show negative attitudes (Rahman et al., 2021) and have difficulty building self-confidence even when encouraged (Jia et al., 2022).
A recent study offered a comprehensive analysis of the washback effect of the HSK exam based on Watanabe’s five dimensions; however, its examination of the specificity dimension was limited to learning and teaching content and lacked in-depth analysis (Zhou et al., 2023). The present study seeks to fill this gap in the literature. In sum, language testing influences teachers’ teaching and students’ learning in terms of content, methods, and attitudes, but the specific aspects of this influence remain debated, and how HSK affects teaching and learning requires further investigation.
The studies described above investigated the washback effects of high-stakes language assessments, but they focused primarily on either teaching practices or student learning and paid little attention to the comprehensive nature of HSK, which is designed to promote both teaching and learning through assessment (Consulate General of the People’s Republic of China in Vancouver, 2019). Given the intrinsic interconnection between teaching and learning, the relationship of examinations to both requires joint inquiry. In terms of research dimensions, most studies focus on only one or a few dimensions of the washback effect within a given scope, so their inquiry lacks depth and comprehensiveness. In terms of methodology, most studies rely on isolated and relatively constrained approaches, which prevents them from fully revealing the washback effect.
Accordingly, the present study aimed to conduct a thorough, multidimensional investigation of the washback effects of HSK on international Chinese language education, using Watanabe’s (2004) five dimensions. The study sought to answer the following questions:
What are the impacts of HSK washback on teachers’ instruction in terms of value, intensity, length, specificity, and intentionality?
What are the impacts of HSK washback on students’ learning in terms of value, intensity, length, specificity, and intentionality?
The study uses a mixed-methods approach to analyze comprehensively the washback effects of the Chinese Proficiency Test (HSK) on International Chinese Language Education (ICLE).
Method
This study used a mixed-methods convergent design to explore the intricate nature of the washback effect fully. This approach integrates quantitative and qualitative methodologies to address the research questions (Creswell et al., 2003): the researcher collects quantitative and qualitative data simultaneously, analyzes the datasets separately, and then combines them to synthesize the results (Creswell & Plano Clark, 2018). The rationale for this approach is to combine the strengths of quantitative and qualitative methodologies while offsetting their respective weaknesses (Patton, 1990), so that qualitative insights enrich the quantitative results and foster a more comprehensive understanding of the research questions (Creswell & Plano Clark, 2018). Quantitative methods can show the extent of the washback effect through figures with objectivity and generalizability, but they cannot easily explain the process behind it in depth, whereas qualitative research yields richer, more detailed information. A mixed design therefore allows the complex phenomenon of HSK washback to be analyzed from several angles.
Participants
A convenience sampling technique was used to select 72 international students who had taken the HSK and were pursuing academic degrees in China and 107 international Chinese language teachers who had taught or were teaching undergraduate international students at a university in mainland China. These students must pass the HSK to obtain their degrees, so the HSK is of great importance to their Chinese language learning, and the sampled teachers teach courses centered on the HSK; these participants can therefore provide a clearer understanding of the HSK washback effect. Among the students, 56.9% were male and 43.1% female; among the teachers, 9.3% were male and 90.7% female. Previous washback studies have interviewed two to three people (Y. Z. Huang, 2013b; F. Y. Kong & Zhang, 2021). Because teachers and students are the direct targets of the HSK washback effect, individual factors such as gender and teaching or learning experience might influence their perception of it (Weir, 1984). Therefore, to limit the influence of such individual factors on the interview results, a stratified random sampling method was used to select four degree-seeking international students (two male and two female) who had passed HSK Level 5 and four teachers (two male and two female) who had been teaching for more than 3 years. The four teachers were teaching the HSK listening, HSK reading, HSK writing, and HSK vocabulary courses, respectively. Based on the questionnaire results, the two courses whose content was practiced with the highest and lowest frequency were chosen for classroom observation; contrasting the areas where practice frequency differed most allowed the washback effect of the HSK to be reflected more clearly.
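The interviewee selection above can be sketched as stratified random sampling by gender after an eligibility filter. This is a minimal illustration under stated assumptions, not the authors’ procedure: the candidate records, the `stratified_sample` helper, the field names, and the seed are all hypothetical, and the sketch ignores the further requirement that each selected teacher teach a different HSK course.

```python
import random
from collections import defaultdict


def stratified_sample(candidates, stratum_key, per_stratum, seed=None):
    """Randomly draw `per_stratum` candidates from every stratum.

    `candidates` is a list of dicts and `stratum_key` names the field that
    defines the strata (here, gender). Raises ValueError when a stratum is
    too small to supply the requested number of people.
    """
    rng = random.Random(seed)  # seeded generator so the draw is reproducible
    strata = defaultdict(list)
    for person in candidates:
        strata[person[stratum_key]].append(person)
    sample = []
    for stratum in sorted(strata):  # fixed stratum order for reproducibility
        members = strata[stratum]
        if len(members) < per_stratum:
            raise ValueError(f"stratum {stratum!r} has only {len(members)} members")
        sample.extend(rng.sample(members, per_stratum))
    return sample


# Hypothetical teacher pool; only teachers with more than 3 years qualify.
teachers = [
    {"id": "T01", "gender": "female", "years": 6},
    {"id": "T02", "gender": "female", "years": 2},
    {"id": "T03", "gender": "female", "years": 9},
    {"id": "T04", "gender": "female", "years": 4},
    {"id": "T05", "gender": "male", "years": 5},
    {"id": "T06", "gender": "male", "years": 12},
    {"id": "T07", "gender": "male", "years": 3},
]
eligible = [t for t in teachers if t["years"] > 3]
chosen = stratified_sample(eligible, "gender", per_stratum=2, seed=7)
print([t["id"] for t in chosen])
```

Stratifying before sampling guarantees the two-and-two gender balance, which simple random sampling from a predominantly female pool (90.7% of the surveyed teachers) could easily miss.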
Instruments
The following three research instruments were used to achieve the research objectives.
Questionnaire
This study used a questionnaire adapted from Green’s (2007b) and Gu’s (2007a) scales. Items were rated on a five-point Likert scale, with 1 to 5 representing increasing degrees. The questionnaire comprised three sections with 10 items. The first section collected basic information, including gender, age, and years of teaching experience. The second section examined the value, intensity, and specificity of the HSK washback effect and explored its specific manifestations, addressing Research Questions 1 and 2. The third section investigated the length of the HSK washback effect by exploring differences in the washback effect before and after the test, again for Research Questions 1 and 2. Separate questionnaires were designed for teachers and students, differing only in the phrasing of the questions. Cronbach’s alpha (α) and exploratory factor analysis (EFA) were used to assess the reliability and validity of the instruments. For the teacher questionnaire, Cronbach’s alpha was .876 and the KMO value was .827 (p < .01); for the student questionnaire, Cronbach’s alpha was .889 and the KMO value was .804 (p < .01). Both scales thus showed good reliability and validity.
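For readers unfamiliar with the reliability coefficient, Cronbach’s alpha for k items is α = k/(k − 1) × (1 − Σσᵢ²/σ²_total), where σᵢ² is the variance of item i and σ²_total the variance of respondents’ summed scores. The minimal sketch below computes it from a respondents-by-items matrix with Python’s standard `statistics.variance`; the response matrix is invented for illustration and is not the study’s data (the study itself used SPSS), and the KMO measure is not reproduced here.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items matrix of Likert scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    Sample variances are used throughout; the n - 1 factors cancel in the ratio.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*responses))             # one tuple of scores per item
    totals = [sum(row) for row in responses]  # each respondent's summed score
    item_var_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


# Hypothetical responses: four respondents answering three 5-point items.
demo = [
    [1, 2, 2],
    [2, 3, 3],
    [3, 3, 4],
    [4, 5, 5],
]
print(round(cronbach_alpha(demo), 3))  # 0.982
```

Items that move together across respondents inflate the total-score variance relative to the summed item variances, which is why alpha rises toward 1 as internal consistency increases.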
Classroom Observation
Adapting the observation forms of Shih (2010) and F. Y. Kong and Zhang (2021), we expanded this study’s classroom observation form to include classroom interactions. The form was divided into three parts, namely teaching behavior, learning behavior, and classroom atmosphere, and mainly recorded teachers’ teaching content, teaching methods, and number of mentions of test-related information, as well as students’ learning content, learning methods, interactions, and attendance. The observation results were used to verify whether teachers’ and students’ behavior was consistent with the questionnaire results and to supplement those findings by analyzing other situations that arose.
Interview
The third instrument was a semi-structured interview guide with 10 items covering perceptions of HSK; specific impacts produced by HSK; differences in impacts before and after the test; specific impacts of HSK on the content, methods, and attitudes of teaching and learning; and suggestions for HSK. The interviews were intended to help explain the questionnaire results.
Procedures
Before distributing the questionnaire, the study team informed the students and teachers that their participation in the survey was anonymous and that the collected data would exclusively serve academic research purposes. Those who consented to participate completed the questionnaire voluntarily.
Classroom observation proceeded in three stages. In the preparation stage, the researchers communicated with the teachers, scheduled the observations, and familiarized themselves with the course content and teaching objectives. In the implementation stage, each observation lasted 90 min; a non-participant observation method was used to record the teachers’ teaching behaviors, students’ responses, and classroom interactions, with recording devices assisting the note-taking. In the concluding stage, the observation notes were organized for analysis.
All participants gave explicit consent before the semi-structured interviews, which were conducted in Chinese, either in person or by phone. Interviews with the eight selected participants lasted between 15 and 30 min, depending on the depth and detail of the respondents’ answers.
Data Analysis Method
The questionnaire data were analyzed using SPSS (version 26.0) in several steps.
Two methods were employed to address Research Questions 1 (R1) and 2 (R2):
(i) Descriptive statistics were used to quantify the value (direction), intensity, and length (timing) of the washback effect on teachers and students.
(ii) The Wilcoxon signed-rank test was used to assess whether the washback effect differed significantly before and after the test. Because the pre- and post-test washback scores were not normally distributed (ps < .05), a nonparametric test was employed.
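To make the nonparametric test concrete, the sketch below implements the Wilcoxon signed-rank test with the usual large-sample normal approximation: zero differences are dropped, tied absolute differences receive average ranks, and the variance is tie-corrected. It is a minimal illustration assuming paired pre- and post-test scores from the same respondents; the function name and example scores are hypothetical, and SPSS may differ in the sign convention of z and in small-sample details.

```python
import math


def wilcoxon_signed_rank(pre, post):
    """Wilcoxon signed-rank test for paired scores (normal approximation).

    Returns (z, two-sided p). z is computed from W+, the rank sum of
    positive differences (post > pre), so its sign follows that convention.
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must be paired")
    diffs = [b - a for a, b in zip(pre, post) if b != a]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank |differences|, giving tied values the average of their ranks.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j + 2) / 2  # positions i..j hold 1-based ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t  # tie-group size enters the variance correction
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p = 2 * (1 - Phi(|z|))
    return z, p


# Hypothetical 5-point scores for five respondents before and after the test.
z, p = wilcoxon_signed_rank([3, 3, 4, 2, 5], [4, 5, 4, 3, 4])
print(round(z, 3), round(p, 3))
```

As a consistency check on the reported results, the two-sided normal p-value for |z| = 1.03 is `math.erfc(1.03 / math.sqrt(2))`, about .30, and for |z| = 0.76 it is about .45, matching the values reported for teachers and students.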
Classroom observation data were processed using content analysis. The observation notes and audio recordings were transcribed into text and categorized according to the dimensions of the observation form; the categorized data were then coded and the frequency of each dimension counted (for example, the number of times teachers mentioned HSK).
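The counting step of this content analysis can be sketched as a tally of codes within each observation-form dimension. The three dimension names come from the observation form described above; the `tally_codes` helper, the code labels, and the segments are hypothetical and serve only to illustrate the procedure.

```python
from collections import Counter, defaultdict


def tally_codes(coded_segments):
    """Count code occurrences within each observation-form dimension.

    `coded_segments` is an iterable of (dimension, code) pairs produced by
    coding the transcribed notes; returns {dimension: Counter of codes}.
    """
    by_dimension = defaultdict(Counter)
    for dimension, code in coded_segments:
        by_dimension[dimension][code] += 1
    return dict(by_dimension)


# Hypothetical coded segments from one 90-min observation.
segments = [
    ("teaching behavior", "mentions HSK"),
    ("teaching behavior", "explains question type"),
    ("teaching behavior", "mentions HSK"),
    ("learning behavior", "student question"),
    ("classroom atmosphere", "late arrival"),
    ("teaching behavior", "mentions HSK"),
]
counts = tally_codes(segments)
print(counts["teaching behavior"]["mentions HSK"])  # 3
```

Keeping one `Counter` per dimension mirrors the form’s three-part structure, so frequencies such as mentions of HSK can be compared across classes dimension by dimension.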
The interview data were processed using thematic analysis and coded in the qualitative analysis software NVivo 11; Table 1 shows the coding results. A coding framework based on the interview questions was developed to complement the findings for R1 and R2. The interview coding process involved the following steps:
The interview questions served as the foundation for a broad coding framework that identified key categories.
The interviewees’ verbatim statements were coded sentence by sentence and assigned to categories.
The relationships among the categories were analyzed, and categories with similar associations were grouped.
Finally, the results were generated by summarizing and integrating the core categories and aligning them with the groupings formed in the previous step.
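The grouping logic of these coding steps can be sketched as mapping each coded statement’s category to a broader theme defined by the framework. This is only an illustration of the data flow, not the NVivo workflow the study used: the `FRAMEWORK` mapping, the `group_by_theme` helper, the category names, and the placeholder statements are all hypothetical.

```python
from collections import defaultdict

# Hypothetical coding framework derived from the interview questions:
# each fine-grained category maps to a broader theme.
FRAMEWORK = {
    "acknowledges benefits": "value",
    "notes drawbacks": "value",
    "effect after the test": "length",
    "test-oriented content": "specificity: content",
    "drill-based methods": "specificity: methods",
}


def group_by_theme(coded_statements, framework):
    """Group (interviewee, statement, category) records under their theme.

    Categories missing from the framework are collected under
    "uncategorized" so they can be reviewed when the framework is revised.
    """
    themes = defaultdict(list)
    for interviewee, statement, category in coded_statements:
        theme = framework.get(category, "uncategorized")
        themes[theme].append((interviewee, statement))
    return dict(themes)


# Placeholder statements; these are not quotations from the study.
coded = [
    ("Teacher 1", "statement 1", "acknowledges benefits"),
    ("Teacher 1", "statement 2", "drill-based methods"),
    ("Student 2", "statement 3", "notes drawbacks"),
    ("Student 3", "statement 4", "exam anxiety"),
]
themes = group_by_theme(coded, FRAMEWORK)
print(sorted(themes))  # ['specificity: methods', 'uncategorized', 'value']
```

Routing unmapped categories to an explicit bucket, rather than dropping them, reflects the iterative nature of the third step, where new categories may prompt revisions to the framework.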
Results of Coding.
Table 2 shows how each data collection and analysis method corresponds to the two research questions. The mixed-methods convergent design allows a comprehensive examination of the HSK washback effect across five dimensions: value, intensity, length, specificity, and intentionality.
Correspondence Table Between Research Questions and Methods.
Results
Washback Effect of HSK on Teachers
Table 3 presents the descriptive statistics of the HSK washback effect on teachers’ teaching. The washback effect of HSK on teachers’ teaching was positive (M = 3.86 > 3, the scale midpoint) with an intensity of 3.86, indicating a relatively strong positive impact on teaching. In terms of length, the HSK influenced teachers’ teaching both before (M = 3.79) and after (M = 3.86) the test. The Wilcoxon signed-rank test revealed no significant difference between the pre- and post-test effects on Chinese language teaching (z = 1.03, p = .30 > .05).
Descriptive Statistics of HSK Washback Effects on Teachers.
The research team then examined the specific effects of the HSK on various aspects of teachers’ instruction. Table 3 provides the descriptive statistics for teaching content. The frequency with which teachers incorporated different exercises in the classroom ranked as follows: vocabulary (M = 4.09), speaking practice (M = 4.04), grammatical structures (M = 3.76), reading comprehension (M = 3.58), listening to dialog (M = 3.51), word choice (M = 3.30), and writing practice (M = 3.23). Teachers thus most frequently incorporated vocabulary-building exercises into their classroom activities and least frequently assigned writing practice. As Table 3 also shows, the frequency of teachers’ use of different teaching methods ranked as follows: lecture method (M = 4.75), situational method (M = 4.02), and communicative method (M = 4.00). The lecture method was therefore used considerably more often than the situational and communicative methods.
In sum, vocabulary was practiced most frequently in class and writing least frequently. To examine the washback effect of HSK in more depth, this study compared the teaching behaviors of Teacher A in the HSK writing class with those of Teacher B in the HSK vocabulary class. Table 4 displays the results of these classroom observations. (i) Regarding teaching content, Teacher A primarily focused on writing skills, while Teacher B emphasized word sense analysis, among other aspects. Both teachers prioritized explaining test question types and providing guidance; Teacher A mentioned exam techniques five times and Teacher B four times. (ii) In terms of teaching methods, both teachers relied on lecturing, questioning, and student guidance. Teacher A lectured for 54 min and Teacher B for 57 min, so teacher lecturing took up 60% or more of each class; neither teacher conducted any communicative activities. (iii) Regarding interaction and classroom atmosphere, students in the writing class struggled to answer the teacher’s questions effectively, whereas those in the vocabulary class answered more fluently. Furthermore, students perceived the vocabulary class as easier than the writing class.
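The “60% or more” figure follows from the 90-min length of each observation stated in the Procedures; as a worked check of the lecture share:

```latex
\text{lecture share} = \frac{\text{lecture minutes}}{\text{observation minutes}}, \qquad
\text{Teacher A: } \frac{54}{90} = 0.600, \qquad
\text{Teacher B: } \frac{57}{90} \approx 0.633
```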
Results of Classroom Observations.
Table 5 presents the results of the teacher interviews: (i) Teachers generally acknowledged the positive influence of HSK but also recognized its negative impacts. (ii) The effects of HSK on teaching practices persisted for a considerable period after the test. (iii) Regarding teaching content, the influence of HSK was seen mostly in the design of instructional plans and in test-related content; because teachers tend to follow a set instructional plan, they focus on test questions and test-taking approaches. (iv) Regarding teaching methods, the influence of HSK was evident in method selection: teachers frequently opted for mechanical approaches such as questioning and repeated exercises, reflecting the difficulty of implementing communicative interaction with students. (v) The influence of HSK on teaching attitudes was largely emotional and showed in greater attention to students. (vi) The HSK washback effect on teachers was associated with their knowledge of HSK and of the curriculum. (vii) Teachers offered recommendations on teaching strategies.
Results of Teachers’ Interview.
Washback Effect of HSK on Students
Table 6 shows the descriptive statistics of the HSK washback effect on international students’ Chinese language learning. The washback effect was positive (M = 3.94 > 3, the scale midpoint) with an intensity of 3.94, indicating a relatively strong positive impact on Chinese language learning. In terms of length, the HSK affected students’ learning both before (M = 3.93) and after (M = 3.97) the test. The Wilcoxon signed-rank test revealed no significant difference in the impact on Chinese learning before and after the test (z = −0.76, p = .45 > .05).
Descriptive Statistics of HSK Washback Effects on Students.
Here, we describe the specific influence of HSK on various aspects of student learning. Table 6 presents the descriptive statistics for learning content. The frequency with which students performed different exercises ranked as follows: vocabulary (M = 3.85), listening conversation (M = 3.78), speaking practice (M = 3.76), reading comprehension (M = 3.65), word choice (M = 3.56), grammatical structures (M = 3.47), and writing practice (M = 3.33). Students thus performed vocabulary exercises most frequently and writing tasks least frequently. As shown in Table 6, the frequency with which students used various learning methods ranked as follows: memorizing vocabulary (M = 4.07) > studying the textbook (M = 3.85) > reading extensively (M = 3.64) > answering previous HSK exam questions (M = 3.56) > answering other test questions (M = 3.38) > doing nothing (M = 1.92). Memorizing vocabulary was the most frequent learning method, whereas doing nothing was rare.
This study also compared student learning behaviors in the classes of Teacher A and Teacher B, with the following observations. (i) Learning content: because both courses were skill-based, the content students received in each class centered on the corresponding skill. (ii) Learning methods: students in both classes focused on listening to the teacher’s explanations and practicing after class. Notably, however, students in the vocabulary class asked questions for clarification three times, whereas no student-initiated questions occurred in the writing class. (iii) Classroom atmosphere and interaction: one absence was recorded in the writing class, whereas all students in the vocabulary class attended and arrived on time.
Table 7 shows the results of the student interviews. (i) Students generally acknowledged the positive influence of the HSK but also recognized its negative impacts. (ii) The effects of HSK on student learning persisted for a considerable period after the test. (iii) Regarding learning content, the influence of HSK was seen primarily in the acquisition of foundational knowledge and language skills: students focus mainly on vocabulary acquisition while struggling to master language skills such as reading comprehension and writing. (iv) Regarding learning methods, the influence of the HSK was evident in the choice of study methods, with students relying predominantly on vocabulary memorization and practice with simulation questions. (v) The influence of HSK on learning attitudes was largely affective and related to learning awareness; although students acknowledge the importance of learning, sustaining their interest can be challenging. (vi) The HSK washback effect on students was associated with their motivation for taking the test and their perception of HSK. (vii) Students’ recommendations mostly concerned teachers’ teaching and their development of learning resources.
Results of the Student Interviews.
Discussion
This study explored the washback effect of the HSK on international Chinese language teaching and learning using a mixed-methods design. The results show that the washback effect of the HSK, as a high-stakes language test, is highly complex, and this complexity is reflected mainly in the varied ways the effect manifests across value, intensity, length, specificity, and intentionality.
R1: What are the Impacts of HSK Washback on Teachers’ Instruction in Terms of Value, Intensity, Length, Specificity, and Intentionality?
Regarding the value and intensity of the washback effect, the HSK has a strong positive impact on teachers’ instruction: because the curriculum is centered on the HSK, teachers’ teaching objectives are clear. Teachers are therefore generally willing to acknowledge the facilitating effect of the HSK on their teaching, a finding consistent with previous studies (C. X. Huang & Li, 2010). At the same time, however, teachers who fail to recognize the gap between test objectives and teaching objectives may find their teaching content and methods constrained by the test (Zhang & Kong, 2021). Thus, although the positive impact of the HSK on teaching was significant, its negative impact should not be ignored. In terms of the length of the washback effect, the impact of the HSK on teachers is stage-specific (Zhou et al., 2023). The results show that the HSK’s washback effect is evident during the preparation phase before the test and continues for some time after it (Watanabe, 2004). Additionally, teachers’ concerns about students’ results after the test can make this after-test effect more pronounced.
Teachers’ teaching practices demonstrate the specificity of the HSK’s washback effect on instruction. Regarding teaching content, teachers’ choices appear consistent with previous studies in that they neglected language skills (Furaidah et al., 2015) and emphasized test-taking content (F. Y. Kong & Zhang, 2021; Laotrakunchai et al., 2021); however, this study also found that teachers still paid considerable attention to students’ acquisition of language knowledge. Among the skills practiced, writing was practiced least frequently in teachers’ classrooms, because HSK Levels 1 and 2 contain no writing items; writing items first appear at Level 3, and only at Level 5 does the proportion of writing begin to approach that of the other question types. Although teachers acknowledged the importance of writing skills, they preferred to spend their time on test-taking content, which they regarded as a way to help students pass the exam (Kilickaya, 2016; Sayyadi & Rezvani, 2021). Notably, the high frequency of vocabulary and grammar instruction in the classroom also reveals teachers’ focus on language knowledge, a point not reported in previous studies. Although teaching test-taking techniques may help students pass the test in the short term, it does not benefit long-term language learning. By ensuring, under the guidance of the syllabus, that students acquire the appropriate vocabulary and grammar knowledge, teachers reflect the positive impact of the HSK on instructional content. Regarding teaching methods, and consistent with Y. Z. Huang’s (2013b) finding, teachers most often used teacher-centered methods in the classroom, and lecture-based methods were used significantly more frequently than communicative methods. Because the demands of the test leave teachers with a heavy lecturing load within limited class time, it is difficult to conduct communicative activities in HSK courses, which results in less engaging classes and fewer opportunities for student interaction (Alderson & Hamp-Lyons, 1996; Barnes, 2016).
Regarding attitudes toward teaching, teachers did not exhibit the negative emotions or loss of confidence in their teaching described in Wu’s (2019) study; instead, their overall mood was more optimistic. Wu (2019) highlighted that teachers’ teaching performance is linked to the examination pass rate; consequently, teachers’ pursuit of a high pass rate interferes with normal teaching activities and, in turn, drains their enthusiasm for teaching. For the teachers in this study, by contrast, the pass rate was not the main criterion for evaluating their teaching performance. Nevertheless, they still paid attention to whether students could pass the test, giving extra attention to students preparing for it and providing targeted help with difficult issues. As a result, teachers increased their control over classroom details and became more rigorous. Because they started from different positions, the teachers in Wu’s study and those in this study developed opposing mindsets.
Regarding intentionality, in contrast to the conclusion of Zhou et al. (2023) that the HSK design concept of “using the test to promote teaching” has not been implemented, the results of this study show that this concept has been partially implemented. The consequential aspect of construct validity, which concerns the appropriateness of a test’s actual consequences and impacts, can explain the intentionality of the HSK’s washback effect (Messick, 1995). From the perspective of consequential validity, teachers’ focus on language knowledge and their positive teaching attitudes are expected and reasonable outcomes of the HSK, which aims to enhance education. However, test-oriented content and mechanical teaching methods are unintended effects of the HSK that contradict its intended design concept.
R2: What are the Impacts of HSK Washback on Students’ Learning in Terms of Value, Intensity, Length, Specificity, and Intentionality?
Regarding the value and intensity of the washback effect, the HSK has a strong positive impact on students’ Chinese language learning because it is a high-stakes test for international degree-seeking students. Students’ Chinese language learning revolves around the HSK, and students gradually develop an interest in learning Chinese in the process, a finding consistent with previous studies (Zhang & Kong, 2021). Despite these significant positive impacts, we also observed negative consequences: students feared the difficulty of the test, which hindered their Chinese learning. In terms of the length of the washback effect, the impact of the HSK on students was phased, affecting their learning for a period both before and after the test, which is also consistent with previous research (Zhou et al., 2023). In addition, the magnitude of the impact differed little before and after the test, because students remained concerned about their test results until the results were released.
The specificity of the HSK’s washback effect on student learning is manifested in students’ learning behaviors. Regarding learning content, students paid more attention to the day-to-day accumulation of knowledge, especially vocabulary, which is consistent with the findings of previous studies (S. H. Wang & Huang, 2021). More interestingly, students spent a great deal of time practicing listening questions but less time on writing. Because class time for listening practice is limited, teachers may neglect listening skills (Homran & Asassfeh, 2023), which are also indispensable in daily life; students therefore increase their own listening practice to compensate. Moreover, students’ anxiety about writing is evident in their low motivation in writing courses, and the computer-based test format for entering Chinese characters adds a further challenge. Regarding learning methods, the results revealed that students tended to choose surface learning strategies, such as memorizing vocabulary and taking practice exams. This aligns with the findings of Fan and Yang (2020) but contradicts the conclusion that students use cognitive and metacognitive strategies more frequently than memorization (Chak, 2023; Doan & Piamsai, 2025). This can be explained by students’ aim of passing the exam: passing or failing the HSK can significantly affect a student’s future, and the design of the exam sometimes induces students to focus on coping with the exam itself (Green, 2007a). Memorizing vocabulary and practicing test questions allows students to prepare intensively. However, students also tended to supplement this training by studying textbooks and reading extensively. It is encouraging to observe that the HSK prompts students to adjust their learning strategies accordingly (F. Kong & Zhang, 2024) and that they are gradually shifting from mechanical memorization to meaningful learning that employs cognitive and metacognitive strategies.
Regarding attitudes toward learning, in contrast to previous studies that reported negative attitudes (Rahman et al., 2021), students influenced by the HSK exhibited mixed but generally positive attitudes toward learning. Given the significant role of the HSK in academic requirements, students approach the test with a more positive mindset in the later stages of preparation. During the learning process, curiosity briefly increases students’ interest in learning; however, this attitude and interest diminish over time. Although students engage in study to pass the exam, how long this engagement lasts remains a concern.
Concerning intentionality, the HSK design concept of “using the test to promote learning” has been implemented to a certain extent, but it has also had unintended impacts. In terms of consequential validity, students’ interest in Chinese language learning is an expected and reasonable consequence of the HSK, which is consistent with the findings of existing studies (Zhou et al., 2023); however, sustaining this interest is a critical issue that requires urgent attention. This study also found that students’ use of test-oriented learning methods and their avoidance of writing were unintended consequences of the HSK, which pose a challenge to the development of their skills and knowledge.
Conclusion
This study reviews the theoretical underpinnings and models of the washback effect, with a specific focus on the influence of the HSK, and reports an empirical investigation of the HSK’s washback effects. According to the findings, the HSK washback effect is complex, with both positive and negative influences on teaching and learning practices. These findings not only deepen our understanding of the washback effect of language testing but also contribute to the understanding of the nature and function of educational evaluation, and they may help extend educational evaluation theory from measurement toward the promotion of teaching improvement and student development.
Based on the empirical findings, to maximize the positive washback effect of the HSK and avoid test-oriented teaching, teachers need to: (i) be attentive to the timing and frequency with which they mention test-related information, (ii) reduce their reliance on non-communicative teaching methods and actively diversify their teaching methods, and (iii) adjust the time allotted to each language skill according to students’ learning needs.
To learn efficiently, students need to: (i) overcome learning challenges and improve their understanding of complex areas, such as grammatical structures and writing; (ii) use various learning methods flexibly and appropriately increase discussion and exchange with peers; and (iii) adopt an appropriate attitude toward learning, treating the HSK as one stage in their studies while keeping an eye on in-depth learning and future development.
Relevant institutions should: (i) provide training programs that strengthen teachers’ understanding of the HSK, (ii) create communication platforms that help teachers better understand students’ proficiency in the various skills, and (iii) develop online platforms for sharing teaching resources.
This study analyzed the washback effect of the HSK on teaching and learning in International Chinese Language Education. Because most of the teacher participants were female, the generalizability of the findings may be limited; future studies could sample international Chinese language teachers and student groups in different regions. In addition, because of operational constraints, we could not include administrators, institutions, or schools in the study.
Footnotes
Acknowledgements
We would like to thank the editor and anonymous reviewers for their insightful comments. We also thank Professor Shen for providing feedback on an earlier draft.
Author’s Note
Author1: Research directions: Language Testing, Educational Assessment, International Chinese Language Education. Author2: Research directions: theoretical linguistics, English education, language industry.
Ethical Approval and Informed Consent Statements
In accordance with the regulations, this study is classified as a routine educational quality project and therefore does not require approval from an Ethics Committee or Institutional Review Board. The study does not involve animal or human clinical trials and raises no ethical concerns. In line with the ethical principles outlined in the Declaration of Helsinki, all participants provided informed consent before taking part in the study. The anonymity and confidentiality of participants are guaranteed, and participation was entirely voluntary.
Author Contributions
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by a research project of the Macau University of Science and Technology (A Study on the Washback Effect of Language Testing on International Chinese Language Education, FRG-24-034-UIC).
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The authors confirm that the data supporting the findings of this study are available within the article.
