Abstract
To explore the washback of the HSK on students learning Chinese as a Second Language (CSL), the study surveyed 1,616 CSL students from 25 different mother tongue backgrounds by questionnaire and interviewed international students studying in China. The following was found in this study: (a) The HSK influenced both the process and results of CSL learning, and the positive impact outweighed the negative impact. (b) Students’ motivation and perception of the HSK were the key factors that influenced the level of HSK washback. (c) The HSK had different intensities of washback on students at different HSK levels, with a strong effect on students at the lower levels and a weak effect on students at the higher levels. Based on the findings, some implications and suggestions for learners, teachers and test designers are provided.
Plain language summary
To explore the impact of the HSK on students learning Chinese as a Second Language (CSL), the study surveyed 1,616 CSL students from 25 different mother tongue backgrounds by questionnaire and interviewed international students studying in China. The following was found in this study: (a) The HSK influenced both the process and results of CSL learning, and the positive impact outweighed the negative impact. (b) Students’ motivation and perception of the HSK were the key factors that influenced the level of HSK washback. (c) The HSK had different intensities of washback on students at different HSK levels, with a strong effect on students at the lower levels and a weak effect on students at the higher levels. Based on the findings, some implications and suggestions for learners, teachers and test designers are provided. For language learning, first, learning goals need to be clarified. Second, the HSK needs to be reasonably understood. Third, test results need to be treated correctly. For HSK development, first, test developers need to reasonably design the difficulty gap between different HSK levels and balance the difficulty between different question types of the same level. Second, more HSK levels should be developed. Future research can focus on the personal background of learners and explore the impacts of factors such as gender, mother tongue and family on washback. At the same time, more attention should be given to social policies and administrators to expand the washback of the HSK.
Keywords
Introduction
As of December 2020, approximately 25 million people outside China were learning Chinese, and 70 foreign countries were promoting the Chinese language under their respective national education systems. Statistics from the Ministry of Education of China in 2020 show that over the past five years, a total of 40 million students have taken the Chinese Language Proficiency Test (HSK) and other Chinese tests. The HSK has become the most widely promoted and largest-scale standardized test of Chinese, with the purpose of testing the ability of CSL students to use Chinese in their daily life, study and work. For a long time, academic research in this area has focused on the design and quality of the HSK, mainly addressing the following topics: (1) overall test design (X. Xie, 2011); (2) test syllabus design (Zhao et al., 2003); (3) types of test questions (L. Liu, 1997); (4) reliability, validity, difficulty and fairness of the test (Chai, 2013); and (5) scoring of writing (Zhu et al., 2013). Driven by the expanding influence of the HSK and computer technologies, a growing number of studies have focused on the following topics: the construction and error analysis of HSK-based Chinese composition corpora (Ren, 2010); test digitalization (Yang, 2011); comparison between the HSK and other tests (W. Liu, 2019); and HSK reform and Chinese language standard setting (Y. Liu et al., 2020).
The abovementioned studies involve almost all aspects of standardized language tests that should be considered. However, the washback of the HSK is far less explored. The few studies on the washback of HSK mostly focus on the teaching (C. Huang & Li, 2010; Y. Huang, 2013; Wu, 2019), while the perspective of HSK test takers has not been fully discussed. Therefore, this study attempts to explore the influence of the HSK on CSL students to fill this research gap.
The following sections review the important models, dimensions and studies of washback in language testing, briefly describe the history of the HSK and related empirical studies of washback, and use empirical methods to survey global Chinese learners. Moreover, ways in which to use the findings to promote long-term CSL learning and the development of Chinese language competence are considered, and information for developing the updated version of HSK3.0 under the guidance of the Chinese Proficiency Grading Standards for International Chinese Language Education is provided.
Washback: Models and Dimensions
Since the emergence of language testing as an independent subject, research on the impact of testing has been rising steadily. In general, the term washback refers to the impact of language testing on teaching and learning (Bachman & Palmer, 2010). More precisely, washback occurs when teachers and learners do things “they would not necessarily otherwise do because of the test” (Alderson & Wall, 1993, p. 117). In the 1990s, various hypotheses about washback became the theoretical basis of future relevant studies. The first theoretical framework consists of the 15 hypotheses about washback proposed by Alderson and Wall (1993), who addressed three aspects: students’ learning, teachers’ teaching and the intensity of washback. Hughes (1993) divided the mechanism of washback into three parts: participants, process, and product. Gu (2007) expanded this “P-P-P” model into a 4P model (Participant-Perception-Process-Product), believing that participants’ perception of the test also played a significant role. Based on these findings, Bailey (1996) further discussed the internal mechanism of washback. Bailey’s model reflected the complexity of washback, describing the interactions among the test, participants, processes, and results. Accordingly, the nature of the test first influenced participants’ perceptions and attitudes, which in turn affected their work process and ultimately had an impact on learning outcomes. In Bailey’s model, no attention was given to process, while Green (2003, 2007a) made up for this shortfall and emphasized the role of test design in the washback mechanism. Green’s model insisted that test design (such as test form, content, and complexity) was related to washback but that focusing solely on test design while ignoring construct would lead to negative washback. Green then considered the risk level of tests, which could affect the value of washback. These theoretical hypotheses, particularly the models of Alderson and Wall (1993) and Green (2007b), provide a methodological basis for this study.
Various theoretical models have inspired empirical studies on the washback of language testing. For a long time, attention was paid to the washback on teaching (Alderson & Hamp-Lyons, 1996; Cheng, 2005; Shih, 2010; Watanabe, 1996). Now, students, as those most directly in contact with tests, have also aroused the interest of researchers, and studies on the washback on students have increased accordingly (Gosa, 2004; Zhan & Andrews, 2014). Almost all studies on teachers and students have mentioned the complexity of washback, arguing that washback has five dimensions: specificity, intensity, length, intentionality, and value (Watanabe, 2004). Among these, value, as the most important dimension, has received much attention, especially regarding high-stakes standardized tests. In short, the impact of tests can be both positive and negative (Watanabe, 2004), which is one issue on which our research will focus. Another important dimension is intensity, which is also addressed in this study. Cheng (1997) first proposed the concept of washback intensity, which refers to the extent to which a test affects certain aspects of teaching and learning.
In language testing, research on washback has predominantly focused on high-stakes English testing, involving various stakeholders such as students, teachers, school administrators, and exam designers. Initially, the research primarily centered around teachers and their teaching practices. However, considering that learners are the most direct participants in exams, it has become imperative to also examine students and their learning experiences. Studies on related washback have revealed the following: (1) Exams influenced students’ attitudes (Fan et al., 2014). (2) Exams impacted the learning process (L. Dong & Qiao, 2020; Gu & Xiao, 2013; Zhan & Andrews, 2014; Zhang & Bournot-Trites, 2021). (3) Exams affected learning outcomes (M. Dong, 2020; Green, 2007b; Hayes & Read, 2004). (4) Exams did not fundamentally improve students’ learning (Andrews et al., 2002; Li, 1990). These studies indicate that research on the washback on student learning has a reasonable foundation. However, there appears to be a notable gap in attention towards testing of Chinese as a second language.
In summary, washback can be perceived in a broad and a narrow sense, with the latter mostly referring to the effect of a particular test on the related teaching and learning, which is the definition of washback used in this study. Each of the abovementioned theoretical models of washback has its own features; these models have become increasingly complex and refined, and together they constitute the theoretical framework of this study. As important dimensions of washback, value and intensity deserve to be explored again.
Background
The Chinese Language Proficiency Test (HSK) is an international standardized Chinese proficiency test that is designed to evaluate the Chinese proficiency of nonnative speakers (including foreigners, overseas Chinese, and Chinese ethnic minorities). It is both a proficiency and criterion-referenced test, while being aligned with the Common European Framework of Reference for Languages (CEFR). In the current HSK, listening, reading and writing skills are graded at six levels and the oral test is given separately. The maximum score for each module is 100 points and there is no writing test for Level 1 or Level 2, so the total score for these two levels is 200 points and the total score for the rest is 300 points. For Levels 1 to 4, 60% of the full mark is the passing line, while Levels 5 and 6 do not have a designated passing score, in an attempt to continually encourage learners to pursue higher levels of language proficiency (Zhang & Zhang, 2014). The number of questions in the listening, reading and writing sections varies by level and Pinyin is only shown at Levels 1 and 2. The listening section is divided into multiple choice questions (MCQs) and true or false questions (T/FQs). Tasks include matching pictures with words, sentence comprehension, and short dialog comprehension. The length of dialogs increases level by level, with only two rounds at Level 1, three rounds at Level 2, and five rounds at Level 3 and above. The reading section has the same types of questions as the listening section. Reading tasks include word recognition, sentence comprehension, sentence completion, and reading comprehension. Level 1 deals only with word-level comprehension, while reading skills in Levels 1 to 4 are limited to the sentence level, and Levels 5 and 6 involve textual comprehension. The writing section is included at Level 3 and above. Writing tasks at Levels 3 and 4 are sentence-based, while Levels 5 and 6 require the ability to write an essay. Tasks include writing Chinese characters, completing sentences, making sentences, writing an essay, and abbreviating essays. This section evaluates not only students’ writing skills but also their Chinese character handwriting skills.
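The scoring rules described above can be restated as a short calculation. The following is a minimal sketch in Python; the function names are hypothetical and chosen only for illustration, and the rules encoded are those stated in the text (100 points per module, no writing module at Levels 1 and 2, a 60% passing line for Levels 1 to 4, and no designated passing score for Levels 5 and 6).

```python
# Illustrative sketch only: helper names are hypothetical, rules are from the text.
MODULE_MAX = 100  # maximum score per module (listening, reading, writing)


def total_score(level: int) -> int:
    """Full mark for a level: two modules at Levels 1-2, three at Levels 3-6."""
    if level not in range(1, 7):
        raise ValueError("HSK level must be between 1 and 6")
    modules = 2 if level <= 2 else 3
    return modules * MODULE_MAX


def passing_score(level: int):
    """Passing line (60% of the full mark) for Levels 1-4; None for Levels 5-6."""
    full = total_score(level)  # also validates the level
    if level >= 5:
        return None
    return full * 60 // 100


for lvl in range(1, 7):
    print(lvl, total_score(lvl), passing_score(lvl))
```

Under these rules, the passing line for HSK4 works out to 60% of 300 points.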
At present, the Ministry of Education of China stipulates that a score of 180 (passing score) on HSK4 is the minimum Chinese language proficiency requirement for international students to apply for Chinese government scholarships. A particular HSK score has become a key condition for international students to apply for financing and obtain an academic degree or pursue short-term studies in China. As a result, HSK has gradually become a high-stakes and internationally influential standardized test.
In summary, the HSK is becoming increasingly standardized, scientific and influential. However, empirical studies on the washback of the HSK seem to be scarce in comparison to high-stakes English tests. C. Huang (2013) conducted the earliest officially published study on the washback of the HSK (the old version), focusing on student learning. In this study, 353 valid questionnaires from 9 universities in China were collected in two separate periods, and it was found that the HSK had certain effects on Chinese learners, especially on their learning attitudes, methods and strategies. By conducting a questionnaire survey and interviews with 300 non-Chinese major international students in several universities of Hainan Province in China, Jiang (2015) found that the HSK had a strong effect on students who had taken the test, which promoted their Chinese learning and motivated them to adjust their learning plans and learning strategies. In contrast, the HSK had a weak washback on students who had not taken the test.
Although the above two studies have made a preliminary exploration of the washback of the HSK, the following deficiencies exist. First, the research questions and questionnaire design are very general without specific details, resulting in insufficient information to discuss the findings. For example, both studies discussed the value of HSK washback and even compared differences in intensity, but the reasons for these differences remained unknown. Second, the analysis method is not scientific enough. A common method of presenting the results was to report only the percentage of participants selecting each questionnaire option and then interpret it, without applying further methods of data analysis, such as factor analysis and regression analysis, resulting in a lack of statistically significant data support for the study conclusions. In conclusion, existing research has barely surveyed CSL students worldwide to comprehensively explore the washback of the HSK on Chinese language learning.
Considering these research gaps, this study will focus on the washback of HSK on Chinese learning by surveying global CSL students. It attempts to address the following three questions:
What washback does the HSK have on CSL students?
What factors influence the intensity of washback?
What are the differences in washback of different HSK levels?
Methodology
Participants
At present, there are test centers for the HSK both in China and overseas. The majority of overseas test-takers take Levels 1 to 3, while after studying in China, CSL students are more likely to take advanced levels. All the students (a total of 1,616 after excluding invalid data) who responded to this questionnaire had taken the HSK test and were divided into two groups: 385 international students studying in mainland China and 1,231 students learning Chinese in their own countries. Among them, 455 (28.2%) were male and 1,161 (71.8%) were female. They came from 41 countries and had 25 different native language backgrounds. The numbers of students taking Levels 1 to 6 were 119, 126, 329, 390, 442, and 210, respectively. Personal interviews were conducted with six of the students studying in China: two each at Levels 4, 5, and 6, comprising three male and three female students of different nationalities and grade levels.
Instruments
The generation and manifestation of washback are complex; therefore, a single research method cannot comprehensively answer the research questions. A questionnaire survey and interviews were used in this study to complement and verify each other.
Questionnaire
The questionnaire was based on Scales A and B of Green’s (2007a) study on the washback of IELTS and the core question scale of C. Huang’s (2013) study on the washback of the HSK on CSL students’ learning behavior, and was finalized after refining and modifying the survey on students’ background, proficiency level, learning methods and other basic information. Additionally, according to the current research questions, some content, such as the cognition of certain test questions and perception of the influence of the HSK, was added. The questionnaire included the following six parts. The first part gathered personal information through 12 questions related to language and study background. The second part focused on Chinese learning and testing, with 15 questions designed to obtain relevant information, including the reasons for learning Chinese and taking the HSK test. The third part contained 15 questions regarding preparations for the HSK test. The goal of this section was to understand how students prepare for the test and the content covered in it. The fourth part was centered around the influence of the HSK test and comprised a total of 18 questions. This section directly explored the impact of the HSK on students’ Chinese language learning in various aspects. The fifth part delved into students’ perceptions of the HSK test, covering three subtopics: the overall perception of the test, familiarity with its format, and the perceived difficulty of the questions. Finally, in the sixth part, there were two questions that directly inquired about the influence of the HSK on students’ learning and invited students to provide their suggestions regarding the HSK test. As mentioned above, the types and number of questions differed slightly across different levels of the HSK. Therefore, six questionnaires were designed, one for each HSK level; they differed only in some level-specific question types, while the rest were the same.
Then, the Chinese version of the questionnaire was translated into 8 languages: English, French, Spanish, Russian, Korean, Japanese, Thai, and Vietnamese.
Semistructured Interview
A semi-structured interview was adopted. It had 7 items and mainly covered six topics: the reasons for taking the HSK, views on the importance of the HSK, preparations for the HSK, teachers’ explanations of the HSK in class, influence of the HSK on learning, and suggestions for the HSK. The interviews were recorded for transcription with the knowledge and consent of the interviewees, adhering to the principle of confidentiality and anonymity.
Data Collection Procedure
A total of 56 international undergraduate students studying Chinese language and literature in China participated in the pilot test of the questionnaire. The results showed that the questionnaire design was reasonable and did not need to be modified. During the formal data collection phase, the questionnaire collection process was divided into two parts. For students studying in China, we contacted teachers and asked them to set aside 15 minutes in class for students to complete the questionnaire. For students studying in their own countries, we asked the directors of various Chinese schools and HSK test centers to help distribute the questionnaires in class or after the HSK test was completed. After the questionnaire was completed, six students voluntarily participated in the interview.
Data Analysis
The items of the questionnaire were entered into SPSS 25.0 for data analysis. The data analysis was performed in five steps.
First, a reliability test was run to verify the reliability of the questionnaire. Cronbach’s α was .934, which indicated good internal consistency.
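For readers who wish to reproduce this kind of reliability check outside SPSS, the sketch below computes Cronbach’s α from a respondents-by-items score matrix using the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of the total score). The function name and the toy scores are illustrative assumptions, not the study’s data or code.

```python
import numpy as np


def cronbach_alpha(items) -> float:
    """Cronbach's alpha for a (respondents x items) matrix of scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return k / (k - 1) * (1 - item_vars.sum() / total_var)


# Toy 5-point Likert responses (5 respondents, 3 items), invented for illustration.
scores = np.array([
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 2],
])
alpha = cronbach_alpha(scores)
print(f"Cronbach's alpha = {alpha:.3f}")
```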
Second, an internal correlation analysis was conducted on three 5-point Likert-type scales (the third, fourth, and fifth parts of the questionnaire), and then a separate exploratory factor analysis was performed on the third part (preparations for the HSK) in order to summarize the students’ learning behaviors. In this process, KMO and Bartlett’s tests were first implemented, and the KMO index was .933, indicating that the data met the requirements for further analysis. Then, a factor analysis was carried out. Principal component analysis was used to extract the initial factors, and then varimax rotation was performed on the data to maximize the variance and facilitate structural interpretation. Considering the arbitrary nature, practicality and meaningful interpretability of factor extraction, data reduction in this study followed two principles: (a) factors were extracted when their eigenvalue was greater than 1 and at least two items loaded on them, which yielded four factors; (b) when an item had double loadings, it was classified according to its practical meaning.
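The extraction procedure described above (KMO check, principal component extraction with the eigenvalue-greater-than-1 rule, varimax rotation) can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than the SPSS routine used in the study: the function names are hypothetical, the data are synthetic (six items driven by two latent factors), and Bartlett’s test is omitted for brevity.

```python
import numpy as np


def kmo_index(corr: np.ndarray) -> float:
    """Kaiser-Meyer-Olkin measure computed from a correlation matrix."""
    inv = np.linalg.inv(corr)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale                      # partial correlations
    off = ~np.eye(corr.shape[0], dtype=bool)    # off-diagonal mask
    r2 = (corr[off] ** 2).sum()
    p2 = (partial[off] ** 2).sum()
    return r2 / (r2 + p2)


def kaiser_pca_loadings(corr: np.ndarray) -> np.ndarray:
    """Principal component loadings, keeping components with eigenvalue > 1."""
    eigvals, eigvecs = np.linalg.eigh(corr)     # ascending eigenvalues
    keep = eigvals > 1.0
    order = np.argsort(eigvals[keep])[::-1]     # largest component first
    vals = eigvals[keep][order]
    vecs = eigvecs[:, keep][:, order]
    return vecs * np.sqrt(vals)


def varimax(loadings: np.ndarray, max_iter: int = 100, tol: float = 1e-6) -> np.ndarray:
    """Orthogonal varimax rotation of a loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    crit_old = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        crit_new = s.sum()
        if crit_new < crit_old * (1 + tol):     # no meaningful improvement
            break
        crit_old = crit_new
    return loadings @ rotation


# Synthetic data: items 1-3 follow one latent factor, items 4-6 another.
rng = np.random.default_rng(42)
n = 500
f1, f2 = rng.normal(size=(2, n))
items = np.column_stack(
    [f1 + 0.5 * rng.normal(size=n) for _ in range(3)]
    + [f2 + 0.5 * rng.normal(size=n) for _ in range(3)]
)
corr = np.corrcoef(items, rowvar=False)
kmo = kmo_index(corr)
loadings = kaiser_pca_loadings(corr)
rotated = varimax(loadings)
print(f"KMO = {kmo:.3f}, factors retained = {loadings.shape[1]}")
print(np.round(rotated, 2))
```

Because the rotation is orthogonal, each item’s communality (the row sum of squared loadings) is unchanged by it; only the distribution of loadings across factors changes, which is what makes the rotated matrix easier to interpret.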
Third, based on the factor analysis of test preparations, a multiple regression analysis was used to explore the impact of the importance, difficulty and consequences of the test on learning behavior. As previously indicated, the preparations for the HSK provided insights into students’ learning behaviors. In addition, the questionnaire’s second, fourth, and fifth parts aimed to gather students’ perspectives on the importance, difficulty, and consequences of the HSK, which are crucial aspects for evaluating and comprehending this examination. Considering that attitudes towards exams can influence language learning, it becomes essential to investigate how students’ perceptions of the HSK impact their learning behaviors. This step, together with the second step, was aimed at exploring RQ1: What washback does the HSK have on CSL students?
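The regression design in this step can be illustrated with a small sketch. It assumes that the reported coefficients are standardized (as in the Beta column of SPSS output), which the text does not state explicitly; the function name, variable names and data are synthetic and hypothetical, chosen only to mirror the three predictors and one outcome factor.

```python
import numpy as np


def standardized_betas(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """OLS coefficients after z-scoring predictors and outcome (no intercept needed)."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    beta, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return beta


# Synthetic predictors standing in for importance, difficulty and consequences.
rng = np.random.default_rng(0)
n = 1000
importance, difficulty, consequences = rng.normal(size=(3, n))
# Synthetic outcome standing in for one factor score (e.g., learning plan):
# driven by importance and consequences, unrelated to difficulty.
learning_plan = 0.4 * importance + 0.5 * consequences + 0.5 * rng.normal(size=n)

X = np.column_stack([importance, difficulty, consequences])
betas = standardized_betas(X, learning_plan)
for name, b in zip(["Importance", "Difficulty", "Consequences"], betas):
    print(f"{name}: beta = {b:.3f}")
```

In the study, one such model is fitted for each of the four factors; significance tests for the coefficients would be read from the statistical package’s output rather than from this sketch.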
Fourth, in order to explore the intensity of washback and answer RQ2 (What factors influence the intensity of washback?), the following two methods were used: (1) An independent-samples t test was used to investigate whether the level of HSK influence differed significantly by motivation. Intrinsic motivations (interest in Chinese language, testing Chinese proficiency level, improving Chinese proficiency) and extrinsic motivations (admission to further education, graduation, job hunting, and applying for scholarships) were tested separately. (2) A post hoc multiple comparison test was conducted to examine whether students’ perception of the HSK produced significant differences in the level of HSK influence.
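As an illustration of the first method, the sketch below runs an independent-samples t test with SciPy’s `scipy.stats.ttest_ind` on synthetic influence scores for students who did and did not report a given motivation. The group sizes, means and the equal-variance setting are assumptions made for the example; they are not taken from the study.

```python
import numpy as np
from scipy import stats

# Synthetic mean influence scores (5-point scale) for two groups of students.
rng = np.random.default_rng(1)
with_motivation = rng.normal(loc=4.1, scale=0.6, size=200)
without_motivation = rng.normal(loc=3.5, scale=0.6, size=200)

# Student's t test; set equal_var=False for Welch's test if variances differ.
result = stats.ttest_ind(with_motivation, without_motivation, equal_var=True)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4g}")
```

A significant result indicates that the two motivation groups differ in their mean level of perceived HSK influence.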
Fifth, in order to address RQ3 (What are the differences in washback of different HSK levels?), a post hoc multiple comparison test was used to test whether there were differences among students at different levels of the HSK in terms of preparation behavior, perception and influence of the HSK.
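The text does not name the post hoc procedure that was applied. As one possible illustration, the sketch below performs pairwise t tests with a Bonferroni correction across HSK levels; this choice of procedure, the function name and the synthetic scores are all assumptions made for the example.

```python
from itertools import combinations

import numpy as np
from scipy import stats


def bonferroni_pairwise(groups: dict) -> dict:
    """Pairwise t tests between all groups with Bonferroni-adjusted p values."""
    pairs = list(combinations(groups, 2))
    adjusted = {}
    for a, b in pairs:
        p = stats.ttest_ind(groups[a], groups[b]).pvalue
        adjusted[(a, b)] = min(1.0, p * len(pairs))  # multiply by number of comparisons
    return adjusted


# Synthetic influence scores for three HSK levels (values invented for illustration).
rng = np.random.default_rng(2)
groups = {
    "Level 1": rng.normal(4.0, 0.5, size=100),
    "Level 4": rng.normal(4.0, 0.5, size=100),
    "Level 6": rng.normal(3.0, 0.5, size=100),
}
adjusted_p = bonferroni_pairwise(groups)
for pair, p in adjusted_p.items():
    print(pair, f"adjusted p = {p:.4g}")
```

Other procedures (for example Tukey’s HSD or LSD, both available in SPSS) control the error rate differently and may give different conclusions for borderline comparisons.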
The qualitative analysis software NVivo 12 Plus was used to facilitate the coding process of interview data. A coding framework was constructed based on interview questions, and codes were added or removed based on the actual responses of participants. The coding framework is shown in Table 1.
Table 1. Interview Coding Framework.
Results
The correlations among students’ preparations, perception, and influence of the HSK were tested. Preparation and influence were strongly correlated, r = 0.619, p < .001. Perception and influence were also strongly correlated, r = 0.627, p < .001. Furthermore, a significant positive correlation was observed between perception and preparation, r = 0.527, p < .001. Overall, the three variables were significantly and positively correlated with one another.
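The significance of these coefficients can be checked by hand with the usual test statistic for a Pearson correlation, t = r√(n − 2) / √(1 − r²), with n − 2 degrees of freedom. The sketch below assumes that all 1,616 respondents contributed to each correlation, which the text does not state explicitly; the function name is illustrative.

```python
import math


def t_from_r(r: float, n: int) -> float:
    """t statistic for testing a Pearson correlation r against zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


N = 1616  # assumed sample size for every pair of variables
reported = {
    "preparation-influence": 0.619,
    "perception-influence": 0.627,
    "perception-preparation": 0.527,
}
t_values = {pair: t_from_r(r, N) for pair, r in reported.items()}
for pair, t in t_values.items():
    print(f"{pair}: t({N - 2}) = {t:.2f}")
```

With more than 1,600 degrees of freedom, t values of this size lie far beyond the critical value for p < .001 (roughly 3.3, two-tailed), which is consistent with the reported significance levels.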
In the questionnaire, “preparations” reflected students’ learning behavior. Four factors were extracted on the basis of the total variance explained by the original variables (Q1–Q15 denote the 15 items), and the rotated component matrix is shown in Table 2. The four factors were named according to the items they included, as follows: Factor 1: learning plan (Q8, Q7, Q9, Q1, Q10, and Q11); Factor 2: classroom learning (Q14, Q15, Q13, and Q12); Factor 3: learning method (Q4 and Q2); and Factor 4: supplemental learning (Q5, Q3, and Q6).
Table 2. Four-Factor Rotated Component Matrix.
Based on the results of the factor analysis above, the findings of the multiple regression analysis are also reported. In the four regression analyses, the predictor variables were the importance, difficulty, and consequences of the HSK, while the dependent variables consisted of the four factors obtained from the factor analysis mentioned earlier.
The results of regression analysis on learning plan are shown in Table 3. The predictor variable “Importance” demonstrated a substantial positive effect on the learning plan (β = .392, p < .001), indicating that students who perceived the HSK as more important were more inclined to develop a comprehensive learning plan. In contrast to “Importance,” the predictor variable “Difficulty” did not yield a statistically significant impact (β = .038, p = .128). Although there was a positive correlation, a significant relationship between students’ perceptions of the difficulty and their propensity to create a well-defined learning plan could not be ascertained. The predictor variable “Consequences” exerted a highly significant influence (β = .528, p < .001). It suggests that students who perceived substantial consequences in the HSK were more likely to adopt a more robust and strategic learning plan.
Table 3. Results of Regression Analysis of Importance, Difficulty, and Consequences on Learning Plan.
The results of the regression analysis, as displayed in Table 4, illustrate the impact of predictor variables on classroom learning. The predictor variable “Importance” had a statistically significant effect (β = .179, p < .001), which means when students perceived the HSK as more important, it positively influenced the learning outcomes in the classroom setting. The predictor variable “Difficulty” had a negative coefficient, indicating that students may have perceived that the more difficult the test was, the less the HSK-related knowledge they might learn in class could meet their needs. However, this perception was not significantly related to classroom learning (β = −.002, p = .921). The predictor variable “Consequences” significantly influenced classroom learning (β = .326, p < .001), indicating that the more influential the HSK was to the students, the more effective classroom learning would be.
Table 4. Results of Regression Analysis of Importance, Difficulty, and Consequences on Classroom Learning.
The findings from the regression analysis, presented in Table 5, demonstrate how the predictor variables influence learning method. The results revealed significant associations for the “Importance” (β = .189, p < .001) and “Consequences” (β = .222, p < .001) predictors with the learning method. However, the “Difficulty” predictor did not show a significant association (β = −.020, p = .410). By extension, a higher perceived importance of the HSK was associated with a greater likelihood of choosing an appropriate learning method. Similarly, students tended to opt for an appropriate learning approach when they expected more significant consequences tied to the HSK. The negative coefficient of “Difficulty” suggests a slight possibility that when students perceive the HSK as excessively difficult, they might not actively prepare for it. However, it is essential to note that this relationship was not strong enough to reach statistical significance.
Table 5. Results of Regression Analysis of Importance, Difficulty, and Consequences on Learning Method.
Table 6 presents the results of regression analysis between supplemental learning and three predictor variables. The “Importance” predictor did not show a significant association with supplemental learning (β = .017, p = .487). However, the “Difficulty” predictor exhibited a significant negative association (β = −.086, p = .001), indicating that when students perceived the material as more difficult, they were less likely to engage in supplemental learning. There was a significant positive regression coefficient between HSK consequences and supplemental learning (β = .055, p < .05), showing that the greater the influence of the HSK on students, the more supplemental learning would occur.
Table 6. Results of Regression Analysis of Importance, Difficulty, and Consequences on Supplemental Learning.
The independent samples test results regarding students with different motivations are shown in Tables 7 and 8. According to Table 7, students with and without intrinsic motivation differed significantly in terms of HSK consequences (p < .001). In Table 8, there was no significant difference for extrinsic motivation (p > .05) except for “applying for scholarships” (p < .001), which was the strongest extrinsic motivation.
Table 7. Independent Samples Tests: Intrinsic Motivation.
Table 8. Independent Samples Tests: Extrinsic Motivation.
Post hoc tests were then conducted. The importance that students attached to the HSK, their perception of its difficulty, and their familiarity with it all produced differences in the consequences of the HSK. Significant differences were found among the five levels of emphasis (how much importance students attached to the HSK) (p < .001). In terms of the perception of the overall difficulty of the HSK, there was no significant difference between the students who chose “medium” and “easy” or between those who selected “easy” and “very easy” (p > .05). For familiarity with the test, there were no significant differences between “not clear” and “completely unfamiliar,” between “not clear” and “unfamiliar,” or between “unfamiliar” and “completely unfamiliar” (p > .05). Students at different levels of the HSK judged the most difficult part of the test differently. Students at Levels 1 to 3 found listening to be the most difficult section, students at Level 4 thought writing was the most difficult part, and students at Levels 5 to 6 considered reading the most difficult part. The consequences of the HSK differed significantly among these three types of students (p < .001). The differences among students at different levels in preparation behavior, perception of the HSK and the influence of the HSK were also tested. It is worth noting that Level 6 students were significantly different from students at other levels in these three aspects (p < .001). In terms of perception of the HSK, Level 5 students were significantly different from students at other levels (p < .001). In terms of familiarity with the HSK, Level 1 students were significantly different from students at other levels.
Table 9 presents the interview results. It contains fewer codes than the original coding framework (see Table 1) because it lists only the codes that were actually generated. Given the limited number of students interviewed, the number and frequency of each code were tallied, and example quotes were included to support the survey results.
Table 9. Results of Interview.
While some of the interview questions overlapped with those in the questionnaire, more detailed answers were obtained through the students’ verbal reports. The information derived from the interview results can be summarized as follows: (1) The HSK, being a high-stakes test, motivated students primarily through external factors, yet it also involved a combination of internal and external incentives; (2) Most students perceived the overall difficulty of the HSK as moderate, with reading being the most challenging section compared to listening and writing. Factors influencing their perception of the test’s difficulty were primarily related to language knowledge and familiarity with the question types; (3) All students prepared for the HSK after class by practicing sample tests, memorizing vocabulary, and attending training classes. While their teachers taught them test-taking skills before the test, the HSK-related content covered in daily classes was not enough to help them prepare adequately for the test; (4) The students exhibited a general consensus regarding the positive impact of the HSK on their learning. However, approximately half of them acknowledged negative consequences, particularly concerning heightened stress and anxiety during exam preparation; (5) Suggestions for the HSK were mainly focused on the test design. Students expressed concerns regarding the uneven distribution of difficulty levels, particularly between Levels 5 and 6, which posed challenges in adjusting to the new proficiency level. Furthermore, students with higher language proficiency hoped for more higher-level exams to meet their testing needs.
Discussion
This section discusses the results in connection with the three research questions.
RQ1: What washback does HSK have on CSL students?
The examination process involves multiple participants, including students, teachers, administrators, and exam designers. The present study concentrates exclusively on students. Therefore, the washback discussed below is rooted in a fundamental claim supported by empirical evidence, namely that exams do have an impact on the learning process (Alderson & Wall, 1993; Cheng, 2005). Specifically, RQ1 aims to investigate the impact of the HSK on learning. Nevertheless, it is essential to acknowledge that different participants may hold varying perspectives on the washback of the same examination and may even value the same washback differently (Ferman, 2004; Shohamy et al., 1996). To assess the value of the washback, specific criteria have been defined: An examination is considered to have a positive impact if it enhances students’ language skills, encourages consistent daily learning, and aligns with the expected use of the test (Green, 2007a; Gu, 2007; Q. Xie, 2010). Conversely, an examination is deemed to have a negative impact if it undermines students’ confidence, diminishes their intrinsic motivation, and excessively focuses on test-taking skills (Jones et al., 2003; Rapp, 2002). Building upon Hughes’ (1993) trichotomy model of washback and integrating the research findings, the discussion of RQ1 is divided into two parts: the impact of the HSK on the learning process and its influence on learning outcomes. The washback of the HSK on the learning process is primarily manifested in the following five aspects.
The HSK influenced learning strategies. According to the factor analysis and regression analysis, the influence of the HSK on students’ learning strategies is mainly manifested in learning plans, classroom learning and supplemental learning. (1) Students adapted their learning plans to the HSK as follows: (a) They learned test-taking skills. (b) While balancing their daily Chinese study, they spent significantly more time learning Chinese. These findings suggest that students did employ more learning strategies. However, it should be noted that such behavior may not necessarily yield positive washback. First, focusing on test-taking skills influenced the overall validity of the HSK, potentially leading to the HSK failing to achieve its intended purpose. Second, the interview results indicated that staying up late to review for the HSK reduced students’ efficiency in daily Chinese classes. A study conducted by Watanabe (1992) on the English entrance exam revealed that students utilizing more learning strategies experienced a beneficial impact. However, the present study did not reach the same conclusion and even points to an opposing view, namely that the extent of the positive effects of employing exam strategies depends on the consequences arising from their use. (2) Learning content changed. There were few regular Chinese classes dedicated to HSK training; as a result, students focused more on the test content after class. This was consistent with the research results of Gosa (2004). Specifically, students used sample tests and extra listening and reading materials to improve their Chinese proficiency. In addition, they worked harder on vocabulary and grammar.
The HSK influenced students’ learning methods. Biggs (2001) divides learning methods into deep and surface learning methods, each of which has two dimensions: motivation and strategy. To achieve a learning or test-taking purpose in a short period, learners usually adopt a surface learning method, which is utilitarian to a certain extent. Deep learning focuses more on long-term learning and development, proceeds step by step, and is less affected by external factors. A total of 78.1% of the students prepared for the exam within three months, and the questionnaire and interviews revealed that students attended tutorial classes, memorized vocabulary and practiced sample tests, indicating that these surface learning methods were used regularly during the preparation period. From this perspective, the HSK negatively affected students’ learning methods, hindering the consolidation of their foundation and gradual progress in learning Chinese over a specific period. C. Huang’s (2013) study found that most students did not resort to short-term cramming methods for the HSK. However, the basis for Huang’s conclusion might be considered biased, as it relied solely on the fact that most students aimed to achieve proficiency in Chinese over four years. Such a conclusion did not take into account the possibility of students adopting surface learning methods specifically for exam preparation. It is essential to recognize that high-stakes tests may necessitate some degree of surface learning, and more significant examinations may trigger intensive preparation activities (Stoneman, 2006). Nonetheless, the negative impacts of such practices can be mitigated if they are kept within a specific time range and intensity level (Xu, 2014). A total of 78.2% of the students continued to learn Chinese after the test, which is a manifestation of deep learning. In short, the HSK had both positive and negative effects on learning methods.
The HSK provided the students with learning goals over a period of time. Indeed, 39.2% and 40.8% of students fully agreed and agreed, respectively, with the statement “HSK provides me with a goal of learning.” The students noted that Chinese learning would serve more long-term goals, such as mastering a foreign language, communicating better with Chinese people, meeting job requirements, and having a deeper understanding of the Chinese culture. In addition, as the questionnaire survey showed, the students thought that the HSK was a sufficient way to test their Chinese proficiency. These findings indicated that passing the HSK was a learning goal, but it was not the only or final goal of Chinese learning. Previous studies have overlooked discussions in this area, possibly due to a lack of investigation into language learning, such as its purpose and goals. The impact of examinations on long-term language learning has also been ignored.
The HSK influenced the students’ learning attitudes. The formation of a positive learning attitude involves many factors. External factors play a role, but attitude is mainly influenced by learners’ intrinsic learning experience and the result of reflection. Regarding external factors, the HSK not only measured Chinese proficiency level but also served social purposes such as admission to advanced education programs and employee selection. Interest in the Chinese language could generate an internal driving force, and driven by this internal force, the students were more likely to adopt a positive attitude toward the HSK. This finding aligns with the research outcomes of Tang (2005), despite the fact that Tang’s study focused on the College English Test Band 4. A total of 98.2% of the students were more motivated to prepare for the test when it was considered a way to measure and improve their Chinese proficiency. A total of 72.2% of students agreed that the HSK increased their enthusiasm for learning Chinese, which means that the HSK also provided an intrinsic motivation for learning Chinese.
The HSK influenced the students’ passion for learning. Students thought the difficulty level of the HSK was between “medium” and “relatively difficult.” Overall, it was “relatively difficult.” This shows that although the difficulty of the HSK was controlled within a reasonable range, the HSK remained a challenge for the students, and they felt greater pressure than usual. As demonstrated by responses to the statement “HSK makes me anxious,” the students felt a certain degree of psychological pressure before taking the HSK. Facing this pressure, 64.4% of the students believed that it had encouraged them to improve their Chinese proficiency. However, the HSK also had a small negative impact on their learning attitude. It could be inferred that the remaining students had a more negative attitude toward preparing for the test. At the same time, the regression analysis indicated that when the students thought the test was difficult, they were more likely to give up studying. Shohamy (1993) and Fan and Yu (2009) similarly found that students experienced test anxiety, although they did not report information on test difficulty. Consequently, it can be inferred that difficulty might not be the sole factor contributing to test anxiety, as this issue is commonly observed in high-stakes testing situations.
Learning outcomes refer to the degree to which learning goals have been achieved. Therefore, discussing the impact of the HSK on students’ learning outcomes essentially means examining whether the HSK promotes the improvement of students’ Chinese proficiency. The results demonstrate that the HSK had a mostly positive washback on the students’ learning outcomes. The students generally agreed with all the statements that referred to skill improvement, showing that the HSK could improve the students’ listening, reading, writing, and overall Chinese proficiency to some extent. The students’ perception of improvement in writing was the weakest, probably because HSK1 and HSK2 did not test their writing ability, so they had no way of knowing the change in their writing proficiency. In addition, this showed that the students’ HSK scores could reflect the level of their language skills to some extent. However, in the interviews, some students noted feeling that their Chinese proficiency had not truly reached the level required by HSK6. Therefore, they would continue to work hard in learning Chinese and challenge themselves in the future. This meant that there might be a certain gap between the test result and the actual proficiency level. Limited research exists on the washback on learning outcomes, mainly because such research requires pre- and post-examination comparisons and an independent exam that can effectively measure course objectives (Wall, 2000). Consequently, there is currently insufficient evidence to assert that students learn better or more because of a particular exam. Despite this limitation, the study highlights the crucial role of the HSK in students’ self-evaluation of their learning outcomes, even though it may not offer definitive answers, as it is based solely on subjective feelings reported by the students.
RQ2. What factors influence the intensity of washback?
Green’s (2007a) washback model considers intensity; however, research on HSK washback has yet to explore the factors affecting this dimension. Studying this aspect helps to explain the intensity of the perceived washback between different exams and individuals.
The findings illustrate that the main factors influencing the intensity of washback on Chinese learning include students’ perception of the HSK and their motivation to take the test.
First is the importance that the students attach to the HSK. The results of the questionnaire showed that the students generally attached great importance to the HSK and that the HSK had a strong washback on them. Significant differences were found in how students who attached different importance to the test were influenced, and the students who paid the most attention to the HSK were the most affected. This also confirms Qi’s (2011) viewpoint that the more important the exam, the stronger the washback. The importance of an exam represents how participants measure the level of risk involved in it.
Second is the students’ perception of the difficulty of the HSK. On the one hand, the students subjectively judged the difficulty of the HSK, while on the other hand, difficulty level is one of the criteria for evaluating the reliability and validity of the test, given the scientific nature of the overall structure and specific topics of the test. If students believe that the test is too difficult or too easy, the test will hardly have any washback on them. As Qi (2011) pointed out, the difficulty of the exam is not directly proportional to the intensity of the washback. Even when the difficulty coefficient and discrimination of the test were reasonable, the students had different perceptions of the difficulty of the test, which resulted in different washback. Overall, the students thought that the HSK’s difficulty level was “medium” to “slightly difficult.” Post hoc multiple comparisons showed that students’ perception of the difficulty level of the HSK led to different influences of the HSK. The influence on students who thought the test was difficult was not as strong as that on other students, and these students were more likely to lose their motivation. Therefore, more substantial washback is often triggered among students who believe that exams are challenging but not so challenging that they lose motivation for learning and taking exams.
Third is students’ familiarity with the HSK. The interviews found that some students believed that writing was difficult because they were not familiar with this type of test question. Therefore, one possibility is that the perception of difficulty is related to how familiar the students are with various parts of the test. The impact of the HSK differed significantly according to students’ familiarity with the test questions, and the students who were most familiar with the test questions were the most affected. There were no significant differences between three pairs of responses: “not clear” and “completely unfamiliar,” “not clear” and “unfamiliar,” and “not familiar” and “completely unfamiliar.”
The survey results showed that the students’ motivations to take the HSK varied. Regarding students’ motivation, previous studies have always emphasized the impact of exams on learning motivation (Allen, 2016; Dong et al., 2023), and this study found that exam motivation also plays a particular role in the impact of exams. The intrinsic motivations presented in the questionnaire included interest in Chinese language, testing language level, and improving Chinese proficiency, while the extrinsic motivations were related to applying for scholarships, admission to further education, graduation, and job hunting. There was a significant difference in the impact between students with and without intrinsic motivation, with the HSK having a greater impact on students with intrinsic motivation. There was no significant difference between students with and without extrinsic motivation other than applying for scholarships, indicating that applying for scholarships was the strongest driver of extrinsic motivation.
RQ3: What are the differences in washback of different HSK levels?
The preparation for and perception of the HSK differ somewhat among students at different levels. Accordingly, the washback of the HSK on students at different levels also varies. Previous studies have rarely investigated the washback of high-stakes language tests that involve multiple levels, and therefore, the differences between levels have remained largely unexplored.
For high-level students, especially Level 6 students, the positive washback of the HSK was relatively weak. Level 6 students had the lowest scores in three dimensions: preparation for the HSK, perception of the HSK, and influence of the HSK, which were significantly different from the scores of students at other levels. In terms of the learning process, first, Level 6 students made relatively few adjustments to their learning strategies. Students who had prior test experience before HSK6 were familiar with the test questions, and they took targeted training to address the key and difficult points. Generally, the intensity of their preparation was lower than that of students at other levels. Second, Level 6 students’ perception of the difficulty of the test was different from that of students at other levels. HSK6 was the most difficult of all HSK tests, which was reflected in students’ scores in the questionnaire survey. Students at Levels 1 to 3 thought listening was the most difficult section, students at Level 4 believed writing was the most difficult section, and students at Levels 5 to 6 said that reading was the most difficult section. Correspondingly, the section that students spent the longest time preparing for was the same section they considered the most difficult. There was a significant difference in impact between students at Levels 5 to 6 and the other two groups. Students at Levels 5 to 6 had the lowest score on the impact-related questions, while students at Levels 1 to 3 had the highest score. It could be further inferred that the reading section was the most difficult section in the advanced-level tests and even across the whole test, which made the students lose their confidence and enthusiasm for learning during the preparation process. Third, the learning passion of HSK6 students was greatly affected. Students in the interviews said that the difficulty gap between HSK5 and HSK6 was large, and it was difficult to move up from Level 5 to Level 6 in a short time. Fourth, the intrinsic motivation of HSK6 students was not as strong and lasting as that of students at other levels. HSK6 students had the lowest interest in Chinese, and their main reason for taking the HSK was job hunting. The influence of this external motivation was not strong or lasting. Regarding learning results, compared with students at other levels, HSK6 students showed relatively small improvement in Chinese proficiency after taking the test. As HSK6 participants were advanced Chinese learners, it was more difficult for them to improve further. There were also some students who thought that their actual proficiency had not reached Level 6.
In contrast, the washback of the HSK on students at lower levels (HSK1-3) was stronger than that on students at higher levels. HSK2 and HSK3 students had the highest scores on the influence of the HSK, while HSK1 students had the lowest score among the three student groups. HSK1 students had the lowest familiarity with the test questions because it was the first time they had taken an HSK test. This affected the intensity of HSK washback on them. Specifically, first, HSK2 students showed the strongest interest in Chinese learning, followed by HSK1 students. Second, enthusiasm for learning Chinese was highest among HSK1 students, and it gradually decreased as the level increased. Third, in response to the statement “good HSK results will encourage me to continue to learn Chinese well,” HSK1 students had the highest score, and the score gradually decreased as the level increased. Fourth, the test difficulty perceived by the students in the questionnaire increased with the level, and the difficulty of HSK1 was the lowest, which aligned with the design concept of the HSK: the test lowers the difficulty and entry threshold to stimulate students’ interest in learning. In summary, low-level students had strong intrinsic motivations, and the HSK could provide sustained motivation for these students, which was conducive to their long-term Chinese learning.
The above results indicate that students at different HSK levels experience different washback. This is because students at different levels differ in their perception of difficulty, their familiarity with the test, and their motivation to take it, which means that the third research question is an extension of the second.
Conclusions and Implications
This study investigated 1,616 HSK participants to examine the washback of the HSK on CSL students. The HSK had both positive and negative effects on the process and results of CSL learning, and the positive effect outweighed the negative effect. The positive washback is mainly reflected in the following aspects: (a) The HSK provided students with learning goals over a period of time, and the subsequent motivation generated by the HSK helped students achieve the ultimate goals of Chinese learning. (b) The HSK gave students a more positive attitude toward Chinese learning. (c) The HSK enabled students to adjust their learning strategies appropriately. (d) After taking the HSK, students improved their overall Chinese proficiency and the listening, reading and writing skills involved in the test. The negative washback is mainly as follows: (a) Some students focused too much on preparing for the test but did not perform well in regular Chinese classes. (b) Preparation for the HSK and the pressure caused by the difficulty of the test made students nervous, which undermined their self-confidence and caused some of them to give up the test and even further Chinese learning. (c) In the process of preparing for the test, students tended to adopt surface learning methods, which was not good for learning Chinese in the long run. The main factors affecting the intensity of HSK washback were students’ test motivation and their perception of the HSK. Across levels, the HSK had the strongest washback on students at lower levels, especially on the HSK1 students, and the weakest effect on students at higher levels, especially on the HSK6 students. The reason was that the two groups had different intrinsic motivations and learning strategies; in addition, the difficulty levels of their tests varied.
Based on the results and discussion explained above, insights for Chinese learning and test development have been gained.
First, learning goals need to be clarified. Language learning is a long process and some basic rules must be followed. Students need to pay attention to both language input and output and try to improve their listening, speaking, reading and writing skills in various ways after class. One of the ultimate goals of learning Chinese language should be successful communication in daily life, and the HSK can be used to prove Chinese proficiency level, but it does not fully reflect the competence of students. Therefore, students can use the HSK to test their Chinese proficiency and determine their shortcomings in order to pursue greater progress.
Second, the HSK needs to be reasonably understood. Test-takers’ perception of tests, including their understanding of the importance, difficulty and question types of the test, plays an important role in the generation and intensity of the washback. Among these, the importance of tests is influenced by subjective factors as well as external factors. Currently, the HSK is being given an increasing number of additional social functions, which inevitably makes it utilitarian. Therefore, students need to take a reasonable view of the importance of the HSK and not just pursue high scores at the expense of improving their practical language skills. Above all, students need to further clarify the syllabus of each level and the corresponding proficiency objectives, and find the level that best suits them.
Third, test results need to be treated correctly. Students mostly believe that good HSK results will encourage them to continue learning Chinese, which is a major motivating effect of the test. However, some students think that the test is too difficult and the result is not satisfactory, and then they stop taking the test or even give up learning Chinese. Whether scores meet students’ expectations will affect their subsequent learning attitude and enthusiasm. Therefore, for long-term Chinese learning, students need to treat the test results correctly, remain positive about the test, and improve their Chinese learning through their intrinsic motivation.
For HSK development, first, test developers need to reasonably design the difficulty gap between different HSK levels and balance the difficulty between different question types of the same level. The difficulty gap between HSK5 and HSK6 was significantly larger than that between other adjacent levels. Students who have passed HSK5 can hardly adapt to the difficulty level of HSK6 within a short time, which may discourage them. At the same time, students generally focus on the most difficult part of the test, which is not good for the comprehensive and balanced development of all language skills in the long run.
Second, more HSK levels should be developed. Some students who have already passed HSK6 with good results want to take higher-level tests to measure the outcome of their learning continuously and maintain long-term motivation. Following the guidance of the Chinese Proficiency Grading Standards for International Chinese Language Education, which was released in April 2021, the HSK will be upgraded and redivided into three levels and nine bands, which will serve CSL students better.
The washback of language testing is a complex mechanism that involves not only learners and teachers, but also administrators and social policies. Therefore, future research can focus on the personal background of learners and explore the impacts of factors such as gender, mother tongue and family on washback. At the same time, more attention should be given to social policies and administrators to expand the washback of the HSK.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Major Research Project of Chinese Testing International Co., Ltd. (CTI2021A01).
Data Availability Statement
The authors confirm that the data supporting the findings of this study are available within the article.
