Abstract
Current willingness to communicate (WTC) scales center on WTC in general second language (L2) learning, while L2 writing WTC is underrepresented. This study intended to close this gap by developing and validating an L2 writing WTC scale. A three-phase sequential embedded mixed-methods design was adopted to overcome the over-reliance on quantitative data and provide adequate evidence of validity. Nineteen items were generated based on our literature search and thematic analysis of the interview data (n = 10). With quantitative data collected from 288 learners of English as a foreign language (EFL), the psychometric properties of the initial scale were examined by exploratory factor analysis. After that, the revised 17-item questionnaire was validated by confirmatory factor analysis and other validation methods with data from 224 EFL learners. The results indicated that the underlying structure involved writing task traits, English language ideology, writing teacher support, interest in English language, and self-perception of English language proficiency. The scale was further validated through factor analysis of the quantitative data (n = 173) and thematic analysis of the immediate retrospective interview data (n = 12) from EFL learners to test its generalizability in other L2 learning contexts and for face validity evidence. The findings showcased a promising mixed-methods design for scale development and clarified the underlying factors of L2 writing WTC. Implications for scale development and the teaching and learning of L2 writing were discussed.
Keywords
I Introduction
Willingness to communicate (WTC) in a second language (L2) indicates learners’ intention to actively engage in L2 communicative activities (Dörnyei, 2005; D.M. Kang, 2014; MacIntyre et al., 1998). Research has revealed that WTC positively contributes to learners’ L2 oral performance and output (Dörnyei & Kormos, 2000), willingness to engage in complex speaking and writing tasks (MacIntyre et al., 1999), and foreign language enjoyment (Botes et al., 2022). Given the benefits of WTC to boost L2 learning, especially against the communicative language learning and teaching background, exploring how WTC can be fostered and accurately measured is necessary. Our scrutiny of the literature shows that the illuminated individual constructs as contributors to L2 WTC include motivation (MacIntyre & Charos, 1996; Teimouri, 2017), anxiety (Baker & MacIntyre, 2000), self-perceived L2 communicative competence (Joe et al., 2017; Baker & MacIntyre, 2000; MacIntyre & Charos, 1996), attitudes towards L2 (MacIntyre & Charos, 1996; MacIntyre et al., 2001; Subtirelu, 2014; Yashima et al., 2004), enjoyment (Dewaele & Dewaele, 2018), self-perceived language proficiency (Sato, 2023), and L2 interest (Eddy-U, 2015). Contextual variables have also been found to modulate WTC significantly. The identified factors, at least, consist of teacher and peer support (Cao, 2011; Eddy-U, 2015; S.J. Kang, 2005; MacIntyre et al., 2001; Yashima et al., 2018; Zarrinabadi, 2014; Zhong, 2013), exposure to an L2 (D.M. Kang, 2014; MacIntyre & Charos, 1996), and communicative task features (Cao & Philp, 2006; J. Zhang et al., 2018).
Contributors to WTC in general language learning are well documented, but factors promoting L2 writing WTC are underexplored. Writing is a complex activity indicating learners’ general language proficiency and performance. Understanding and capturing L2 learners’ writing WTC will help them engage better in writing and enhance their language learning progress. Consequently, the investigation into L2 writing WTC’s underlying forces and measurement is warranted. Several scaling instruments have been developed to measure WTC (MacIntyre et al., 2001; McCroskey & Baer, 1985; Ryan, 2009; Weaver, 2005). However, few measure WTC in the L2 writing learning context or provide compelling validation. Therefore, we sought to develop a well-validated questionnaire of L2 writing WTC.
Previous scale development has emphasized factor analysis while neglecting the role of qualitative data and follow-up validation (Sudina, 2023). To pursue methodological rigor and robustness, we set out to develop and validate an L2 writing willingness to communicate scale (L2WWTCS) with a three-phase sequential embedded mixed-methods research design. We began by generating scale items based on our literature review and the interview data. The newly developed scale was then evaluated through statistical analyses. Finally, the revised scale was examined with both quantitative and qualitative data. With the completion of this research, we can discuss potential factors nurturing L2 learners’ writing WTC and measure their variations.
1 Potential contributors to L2 writing WTC
Many researchers have reported the common phenomenon that some language learners with high-level communicative competence tend to avoid communication, while others with minimal linguistic competence seek chances to communicate (Dörnyei, 2005; MacIntyre et al., 1998). This issue has received much scholarly attention in an era emphasizing engagement in meaningful interaction to boost language learning (Swain, 1995). Originating from communication research (McCroskey, 1992; McCroskey & Richmond, 1991), the WTC construct was defined as the willingness to engage in communication when free to do so (McCroskey & Baer, 1985). It was conceptualized as static and subjected to personalities in the first language (L1) environment (McCroskey, 1992; McCroskey & Richmond, 1991). In L2 studies, WTC was considered as the final psychological step before actual communication (MacIntyre et al., 1998). MacIntyre et al. (1998, p. 547) defined WTC in L2 learning as ‘a readiness to enter into discourse at a particular time with a specific person or persons, using an L2’. It combined psychological, educational, linguistic, and communicative perspectives into L2 communication research (Clément et al., 2003). Given the vital role of WTC in learners’ engagement in L2 communication, WTC has been recognized as a critical individual difference factor in second language acquisition (Shirvan et al., 2019; Wang et al., 2021), and its multi-dimensional contributors have been profoundly investigated.
Early research followed McCroskey and Baer (1985), mainly examining WTC in oral contexts. MacIntyre et al. (1998) voiced the necessity of extending the WTC construct to other language skills. Similar to L2 speaking, L2 writing is a significant medium of communication. L2 learners write to inform, explain, or persuade, conveying information to or exchanging it with an audience. On the other hand, L2 writing differs fundamentally from L2 speaking. For example, L2 writing, in most cases, is planned and non-interactive. These differences call for a dedicated exploration of L2 writing WTC. Different from the extended definition of WTC (i.e. the willingness to engage in communication when free to do so), the L2 writing WTC targeted in this article refers specifically to L2 learners’ willingness to engage in L2 writing in instructional contexts. Overall, although there is little research on L2 writing WTC so far, the contributors to L2 WTC may help us understand the construct of L2 writing WTC.
The contributing factors of L2 WTC have been extensively researched at the individual level. For example, MacIntyre and Charos (1996) indicated that global personality features and affective factors associated with language could significantly influence L2 WTC. Baker and MacIntyre (2000) found that the two strongest indicators of WTC were communication anxiety for immersion students and perceived communication competence for non-immersion students. Subtirelu (2014) found that the deficit language ideology adversely affected L2 users’ WTC and the lingua franca ideology positively. Teimouri (2017) linked motivation studies with L2 WTC and suggested ideal L2 self as a predictor of WTC. Joe et al.’s (2017) research revealed that L2 WTC was strongly predicted by the satisfaction of basic psychological needs (i.e. autonomy, relatedness, and competence). Sato (2023) revealed that differences existed within the fluctuations of WTC between low-intermediate and advanced English proficiency groups, although they were both influenced by self-perception of English proficiency. Other individual variables also include L2 enjoyment (Dewaele & Dewaele, 2018) and international posture (Yashima, 2002; Yashima et al., 2004).
As L2 WTC was increasingly understood as a situated concept, researchers began to scrutinize the potential contextual factors contributing to its fluctuation. The effect of teachers in instructional settings on learners’ WTC has been extensively researched (Derakhshan et al., 2023). S.J. Kang’s (2005) qualitative study suggested that teachers’ engagement and positive feedback in conversations could increase learners’ security and situational WTC. Cao’s (2011) research found that a favorable teacher–student relationship would facilitate classroom engagement. Zarrinabadi (2014) identified four factors concerning teachers that might influence learners’ WTC: teachers’ wait time for receiving responses, topic selection, error correction, and support. The social context was also believed to influence L2 learners’ WTC. In their early research, MacIntyre and Charos (1996) discovered that exposure to L2 in social settings would influence WTC. Meanwhile, study-abroad (SA) experiences were reported to affect language learners’ WTC (D.M. Kang, 2014). Cao and Philp’s (2006) research lent evidence to the dynamic nature of WTC by delving into situational WTC. Several elements, including group size, interlocutor conditions, and topic familiarity, were detected to impact actual WTC. Eddy-U (2015) analysed possible factors (de)motivating task-situated WTC from a dynamic systems model and found that good group partners and marks had potential influences. Zhang et al.’s (2018) systematic review showed that situation cues (i.e. interlocutors, classroom atmosphere, and tasks) were overt features that influenced WTC. However, more latent factors (i.e. task-interest, task-usefulness, and task-confidence) were underlying elements that also influenced WTC.
2 Current scales to measure willingness to communicate
As with other psychological or cognitive processes, WTC cannot be detected or observed directly through physiological manifestations or physical data. Previous research has used questionnaires, classroom observations, participant interviews, self-reporting, teachers’ ratings, or idiodynamic methods to measure WTC (e.g. Cao & Philp, 2006; de Saint Léger & Storch, 2009; MacIntyre & Legatto, 2011). Scale-based measurement is the most common method used to assess WTC. McCroskey and Baer’s (1985) scale is the first psychometric measurement of WTC, developed initially for L1 communication. The items examine participants’ WTC with three types of receivers (stranger, acquaintance, and friend) and in four communication contexts (public speaking, meetings, group discussion, and dyad). All of the items are set in everyday circumstances. Data suggested satisfactory reliability and validity of this scale (McCroskey, 1992). However, closer scrutiny revealed that this scale and most of its validation were conducted in bilingual contexts, which weakens its face validity when applied in L2 learning settings (J. Peng, 2013). Cao and Philp (2006) also questioned its application in educational settings because its items are worded around everyday situations.
MacIntyre et al. (2001) designed the first two questionnaires to measure WTC in and out of classrooms. Items in each questionnaire can be classified into four language skills: speaking, reading, writing, and comprehension. However, this scale lacks validity data, and its connection with the theoretical underpinning of WTC is obscure. Weaver’s (2005) L2 WTC scale devised on the Rasch model is the first endeavor to examine L2 speaking and writing WTC. The speaking part of this scale has been modified and validated by J. Peng and Woodrow (2010). Nevertheless, since the wording of its writing part is outdated and the validation data are absent, further modification and validation are needed. Ryan (2009) developed eight items to measure WTC inside and outside classroom settings on a 6-point Likert scale. Other questionnaires are almost modeled after the four aforementioned scales (J. Peng, 2013).
The review indicates the necessity to develop and validate a new scale assessing L2 learners’ writing WTC based on solid theoretical and empirical groundwork. By doing so, differences among learners in their writing WTC can be well discriminated and, in turn, the theoretical underpinnings of L2 WTC as a whole can be strengthened. Adopting a sequential embedded mixed-methods design, we attempted to develop and validate a scale for measuring L2 writing WTC. In addition to examining its validity through inferential statistics, we value the role of qualitative data in strengthening its theoretical framework and adding content and face validity. The research questions were formulated as follows:
• Research question 1: How reliable and valid is the L2WWTCS?
• Research question 2: What is the confirmed underlying factor structure of L2 learners’ writing WTC?
II Method
1 Research design
Given the research questions, a three-phase sequential embedded mixed-methods design was adapted from Creswell et al. (2008), as shown in Figure 1, where qualitative and quantitative data were triangulated to ensure a well-validated L2 writing WTC scale. Qualitative data were collected before quantitative data to help the development of a new scale and afterward to verify the validity of the new scale together with quantitative data.

Figure 1. Sequential embedded mixed-methods design.
2 Participants
The participants included 10 interviewees involved in the preparation of L2WWTCS, 685 EFL learners in the quantitative examination of L2WWTCS, and 12 EFL learners in the follow-up interview recruited through convenience sampling on a voluntary basis. All the participants signed the consent letters after being informed of the research purposes and the anonymity and confidentiality of their personal information. The 10 interviewees were recruited after we posted the participant recruitment information on the notice board in a local tertiary educational institution. In this selection, we strived for the representativeness of different stakeholders. Among the 10 interviewees, two were university instructors responsible for teaching English and English writing, respectively. Their experiences and feelings regarding EFL learners’ writing WTC based on large-scale and long-term teaching contributed to the scale compilation. The other eight interviewees are undergraduates learning English writing in their relevant compulsory courses. Their majors represented various university academic disciplinary groupings, including Engineering (electrical engineering and computer science), Arts (Chinese language and literature), Law, and Media.
Given the three rounds of quantitative validation of L2WWTCS, the participants were divided into three groups. Their demographic information is shown in Table 1. In the first round, 288 undergraduates were recruited from a national university in southeast China. In the initial screening, three questionnaires were detected as mischief answers (i.e. same answers throughout the questionnaire), and one participant reported a different first language. Consequently, the four questionnaires were excluded from the statistical analyses. None of the respondents reported long-term living or studying experiences (longer than one month) in English-speaking countries. In the second round, 224 undergraduates from the same university participated in the research. The initial screening identified seven mischief answers, and one participant reported a one-year learning experience in an English-speaking country. They thus were excluded from the statistical analysis. In the third round, 173 English-language-major undergraduates in another large project completed the new scale. All the data were suitable for further quantitative analysis.
Table 1. Demographic information of participants in quantitative data collection.
Note. EFL = English as a foreign language.
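The initial screening described above (excluding questionnaires with the same answer throughout) can be sketched as follows. This is a minimal illustration, not the authors' actual procedure: the data layout (one list of Likert ratings per respondent), the function names, and the sample ratings are all hypothetical.

```python
# Hypothetical sketch: flag respondents who gave the same answer to every item.
# The data layout (one list of Likert ratings per respondent) is an assumption.


def is_straight_lined(responses):
    """Return True if every item received the identical rating."""
    return len(set(responses)) == 1


def screen(questionnaires):
    """Split questionnaires into (retained, excluded) lists."""
    retained, excluded = [], []
    for answers in questionnaires:
        if is_straight_lined(answers):
            excluded.append(answers)
        else:
            retained.append(answers)
    return retained, excluded


# Made-up 7-point ratings from three respondents, for illustration only.
sample = [
    [5, 6, 4, 7, 5],
    [4, 4, 4, 4, 4],  # identical throughout, so it is flagged
    [2, 3, 5, 3, 6],
]
kept, dropped = screen(sample)
print(len(kept), len(dropped))  # 2 1
```

A rule this strict only catches fully uniform response patterns; other exclusion criteria reported above (e.g. first language or overseas study experience) would need separate checks on the demographic fields.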
The 12 participants (4 males and 8 females) in the follow-up interview were from the third-round survey. They agreed to attend the follow-up interview voluntarily to discuss their relevant experiences.
3 Research procedure
The research procedure can be divided into three phases, during which principles laid out in Dörnyei and Taguchi (2009) were followed in our modification, administration, and analysis of questionnaire items. In Phase One, the scale items were generated based on our literature review and thematic analysis of the interview data. Content validation during item generation is key to ensuring psychometric soundness (Cortina et al., 2020). However, previous research has seldom emphasized improving content validity or reported the details of efforts to improve it. In this research, the literature review and the thematic analysis of the interview data could improve the content validity of the proposed questionnaire by attending to both the conceptual and operational definitions of L2 writing WTC (Sudina, 2023). In Phase Two, the initial scale was examined in two rounds through rigorous statistical analysis, including exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). In Phase Three, further quantitative and qualitative data yielded more evidence of the validity of the new questionnaire. Drawing on Ivankova et al.’s (2006, p. 15) graphic presentation of the mixed-methods design, we illustrate the research procedure in Table 2. We also provide more details on each phase in the following sections.
Table 2. Visual model of the research procedure.
a Qualitative data collection and analysis
In the beginning, we thoroughly reviewed the existing literature exploring contributing factors to L2 learners’ WTC. Accordingly, we attempted to construct a multi-faceted model of L2 WTC to increase the content validity of the new scale. However, the existing literature primarily focused on speaking situations or WTC in general terms. No research paid exclusive attention to L2 writing WTC. As a result, qualitative data collected through semi-structured interviews were examined to tease out features related to the L2 writing WTC so that more questionnaire items could be generated to flesh out the key constructs of the new scale.
An interview protocol (see Appendix A) was pre-developed to ensure the interviews proceeded effectively and efficiently. We framed the initial questions based on the proposed research questions, and an experienced TESOL teacher helped revise them. Four sections with 10 questions were determined in the final protocol and believed to elicit enough responses. The 10 questions were general questions for the interviewer to follow. In the interview, the interviewer asked the questions with more details. If the interviewees were not clear about the questions and did not know how to respond, the interviewer would help them navigate through the questions with prompts. Moreover, the interviewer also asked additional questions based on the interviewees’ answers. Every interview session lasted for no more than 30 minutes in case of fatigue. Interviews were communicated in Chinese since Chinese was the interviewees’ mother tongue, which helped them express their experiences and ideas comfortably and thoroughly. Interviews were audio-recorded for future transcribing, reviewing, and thematic analysis.
The first author was responsible for analysing the interview data. Braun and Clarke’s (2006) six-phase approach to thematic analysis was adopted. The interview data were first transcribed from the verbal form into the written form. A transcribing machine was employed to finish the initial transcription, after which the researchers manually checked the transcripts against the audio recordings to improve their accuracy. In this process, the researcher became familiar with the data. The initial coding criteria were established through (re)reading the data using both theoretically driven (deductive) and data-driven (inductive) methods. The coding process was conducted in NVivo 12. The researcher coded not only segments that corresponded to ideas documented in the existing literature but also segments emerging from the data. After two rounds of consistent and systematic (re-)coding, 21 codes were established. Intra-coder reliability calculated by intraclass correlation coefficient was 88.9%. After that, emerging codes were classified into potential overarching themes, during which the qualitative data were re-examined to justify codes if problems arose. An expert in the relevant field was consulted on the classification and definition of themes. Some of Nowell et al.’s (2017) suggestions on establishing trustworthiness for each phase of thematic analysis were adopted to ensure the credibility of thematic analysis in this study, including member checking, peer debriefing, and researcher triangulation.
An item pool with more than 30 items was generated in this process. In the end, 19 items were adopted to prepare the initial 7-point Likert scale, which was confirmed through consultations with an expert in second language education. Two English language experts were consulted on the wording. Both had studied in English-speaking countries (i.e. England and Canada) and had conducted survey research. Their suggestions made the questionnaire items more accurate and easier to understand. For example, Item 15 initially read ‘I have a good sense of logic’ and was then changed to ‘I can write logically’.
b Quantitative data collection and analysis
The three rounds of quantitative data were collected in the same way. We approached the potential participants after their classes with the lecturers’ permission. Each data collection session took about 15 minutes in total. The questionnaires were distributed to the participants in person in paper-and-pen format. Since participants possessed different levels of English proficiency, the Chinese version of the L2WWTCS was provided to eliminate the potential unfairness arising from language proficiency. One of the researchers translated the questionnaire into Chinese. A university instructor with an accredited translation certificate then revised the draft translation following two principles: accuracy and readability. The Chinese version was also back-translated by a bilingual speaker to ensure the equivalence of meaning.
Based on the results of EFA on the first-round data, the initial scale was modified (see Appendix B). In the second round, we conducted EFA, CFA, and other validation methods, which yielded the factor structure of L2 writing WTC. The measurement invariance of the final scale was then examined with the third-round data through CFA and multigroup CFA.
c Follow-up qualitative data collection and analysis
Follow-up interview data were collected as part of a large research project involving the implementation of the newly developed scale. We interviewed the participants to gather face validity evidence for the scale and to investigate the sources of Chinese EFL learners’ writing WTC. The data were handled in the same way as in the previous qualitative data analysis. All the names used in the corresponding results section are pseudonyms.
III Results
1 Thematic analysis of precedent interview data
In total, 21 codes related to L2 writing WTC were identified. Their names, examples, and frequency numbers are shown in Table 3. After negotiation between the authors, five themes, defined as Writing Task Traits, Individual Differences, Teachers and Peers, Self-Perception of English Language Proficiency, and Miscellaneous, were generated to incorporate the 21 codes (see Figure 2). More than 30 items were then drafted to cover the 21 codes and constitute the item pool. After consultations with an expert in second language education, items related to prompts, interactive modes, physical and mental states, abroad experiences, and scores were judged to be less relevant to the construct of L2 writing WTC and were thus deleted at the expert’s suggestion. Items that overlapped were also deleted. In total, 19 items were selected from the item pool to compile the initial scale.
Table 3. Emerging codes through thematic analysis.

Figure 2. Five themes identified to cluster emerging codes.
2 Descriptive statistics and distribution normality check of quantitative data
All the questionnaire data, including the demographic information and scaling data, were first imported into Excel. Descriptive data for each item, including mean score, standard deviation, skewness, and kurtosis, are presented in Appendices C, D, and E. Box-and-whisker plots showed no outliers in the collected data. Distribution normality data are shown in Table 4.
Table 4. Descriptive and distribution normality statistics of quantitative data.
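The item-level statistics mentioned above (mean, standard deviation, skewness, and kurtosis) can be computed along the following lines. This is a sketch using simple moment-based estimators; statistical packages often apply small-sample corrections, so their output may differ slightly. The function name and the sample ratings are illustrative and not taken from the study.

```python
import math


def describe(scores):
    """Return mean, sample SD, moment-based skewness, and excess kurtosis."""
    n = len(scores)
    mean = sum(scores) / n
    squared_dev = sum((x - mean) ** 2 for x in scores)
    # Sample standard deviation (n - 1 in the denominator).
    sd = math.sqrt(squared_dev / (n - 1))
    # Central moments with n in the denominator (uncorrected estimators).
    m2 = squared_dev / n
    m3 = sum((x - mean) ** 3 for x in scores) / n
    m4 = sum((x - mean) ** 4 for x in scores) / n
    return {
        "mean": mean,
        "sd": sd,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3,  # excess kurtosis: 0 for a normal curve
    }


# Made-up ratings on a 7-point scale, one per respondent.
stats = describe([1, 2, 3, 4, 5, 6, 7])
print(stats["mean"], stats["skewness"], stats["kurtosis"])  # 4.0 0.0 -1.25
```

Skewness and excess kurtosis values close to zero are what a normality check of this kind looks for; which cut-offs to apply is a separate judgment that the sketch does not encode.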
3 Exploratory factor analysis of the first- and second-round data
The Kaiser–Meyer–Olkin (KMO) measure of the first-round data was .80, larger than the cut-off value of .60. Thus, the sample size was adequate for further statistical analysis. The result of Bartlett’s test of sphericity was χ² = 1662.01 (p < .001, df = 105), supporting the appropriateness of EFA. Before the EFA, parallel analysis was conducted to determine the number of factors to retain. The resulting scree plot suggested that five factors existed in the model (see Figure 3).

Figure 3. Scree plot of eigenvalues of principal factors in the first round.
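For readers less familiar with Bartlett’s test of sphericity reported above, its standard form is given below. The symbols are introduced here for illustration and do not appear in the original text: n is the sample size, p the number of items, and R the item correlation matrix.

```latex
\chi^2 = -\left( n - 1 - \frac{2p + 5}{6} \right) \ln \lvert \mathbf{R} \rvert,
\qquad
df = \frac{p(p - 1)}{2}
```

Under this formula, the reported df = 105 would correspond to p = 15 items, since 15 × 14 / 2 = 105.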
This research adopted principal axis factoring with promax rotation. Kaiser normalization was applied before the rotation. Considering the sample size in this round of data, the factor loading cut-off should be at least .35 (Hair et al., 1998). Items 9, 14, 15, and 19 were excluded since their factor loadings were lower than .35. The factor analysis results and the factor loadings of all the items are presented in Figure 4 and Table 5. This model accounted for 46.60% of the total variance.

Figure 4. Graphical presentation of factor analysis in the first round.
Table 5. Factor loadings of the first-round quantitative data (n = 284).
The possible underlying factor structures identified by EFA were then labeled thematically by analysing the contents of items grouped under each factor: Factor 1, Writing Task Traits (WTT); Factor 2, English Language Ideology (ELI); Factor 3, Writing Teacher Support (WTS); Factor 4, Interest in English Language (IEL); Factor 5, Self-Perception of English Language Proficiency (SPELP). Item 5 was deleted since it was theoretically unrelated to the other two items clustered around Factor 4. After the EFA and the deletion of items, this questionnaire was inappropriate for CFA since the number of items clustered under each of Factors 2, 4, and 5 was fewer than three, the minimum number of indicators commonly required per latent variable. As a result, another three items, believed to correlate with the corresponding factors, were extracted from the item pool. An updated questionnaire with 17 items was prepared for further examination.
For the second-round data, the KMO measure was .87, larger than the cut-off value of .60, indicating sampling adequacy. The result of Bartlett’s test of sphericity was χ² = 1887.87 (p < .001, df = 136), indicating that the second-round data were suitable for EFA. As for the number of factors, parallel analysis and the resulting scree plot suggested four factors in the model (see Figure 5). Since parallel analysis is just one of the various statistical methods available for deciding the number of factors to be included in EFA, the most appropriate number can differ from its result. Consequently, this research tried both four and five factors to see which solution accounted for more variance. The 4-factor model, which combined IEL and SPELP into one factor (i.e. IP), could explain 58.20% of the total variance, and the 5-factor model accounted for 61.20%. Besides, scrutiny of the two models revealed that the 5-factor model was theoretically more acceptable.

Figure 5. Scree plot of eigenvalues of principal factors in the second round.
Considering the sample size in this round of data, the factor loading cut-off should be at least .40 (Hair et al., 1998). The factor loadings of all items exceeded this threshold. The results of factor analysis and factor loadings of all the items are shown in Figure 6 and Table 6.

Figure 6. Graphical presentation of factor analysis in the second round.
Table 6. Factor loadings of the second-round quantitative data (n = 216).
4 Validity and reliability tests
a Construct validity
CFA was adopted to verify the factor structure described by EFA. Since the parallel analysis suggested a 4-factor model, we tested both the 4-factor model (Model 1) and the 5-factor model (Model 2) with CFA, as shown in Figures 7 and 8. The model fit indices of both Model 1 (χ² = 265.694; df = 113; χ²/df = 2.351; TLI = .899; CFI = .916; RMSEA = .079 [.067, .092]; SRMR = .068) and Model 2 (χ² = 199.052; df = 109; χ²/df = 1.826; TLI = .938; CFI = .950; RMSEA = .062 [.048, .076]; SRMR = .062) met the threshold values. A further comparison between the two models indicated that Model 2 was better ($\chi^2_{M1} - \chi^2_{M2} = 66.642$; $df_{M1} - df_{M2} = 4$, p < .001). Considering the strong correlation between WTT and SPELP, hierarchical CFA was conducted to test whether including a second-order factor (Model 3, see Figure 9) improved model fit. The results indicated that Model 2 was better than Model 3 ($\chi^2_{M3} - \chi^2_{M2} = 6.036$; $df_{M3} - df_{M2} = 2$, p = .049). The third-round data were also subjected to CFA to generate Model 4 (see Figure 10). The model fit indices of Model 4 (χ² = 226.530; df = 109; χ²/df = 2.078; TLI = .900; CFI = .898; RMSEA = .079 [.065, .094]; SRMR = .074) largely met the threshold values. The results of multigroup CFA between Model 2 and Model 4 (see Table 7) showed the existence of acceptable measurement invariance.

Figure 7. Four-factor model of the second language writing willingness to communicate scale (L2WWTCS).

Figure 8. Five-factor model of the second language writing willingness to communicate scale (L2WWTCS).

Figure 9. The model yielded by hierarchical confirmatory factor analysis (CFA).

Figure 10. The model yielded in the third-round quantitative data.
Table 7. Model fit results of measurement invariance models.
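Several of the fit statistics reported in the construct validity section can be recomputed from the published χ², df, and sample sizes. The sketch below is illustrative only and is not the software the authors used. It assumes the common RMSEA formula with N − 1 in the denominator and uses a closed-form chi-square tail probability that holds for even degrees of freedom; all function names are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate (even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    # Sum of (x/2)^k / k! for k = 0 .. df/2 - 1.
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def rmsea(chi2, df, n):
    """RMSEA point estimate, assuming the (N - 1) convention."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def chi2_difference(chi2_a, df_a, chi2_b, df_b):
    """Chi-square difference test for nested models (a is more restricted)."""
    d_chi2 = chi2_a - chi2_b
    d_df = df_a - df_b
    return d_chi2, d_df, chi2_sf_even_df(d_chi2, d_df)


# Values taken from the text: Models 1 and 2 (n = 216), Model 4 (n = 173).
print(round(rmsea(265.694, 113, 216), 3))  # 0.079
print(round(rmsea(199.052, 109, 216), 3))  # 0.062
print(round(rmsea(226.530, 109, 173), 3))  # 0.079

d_chi2, d_df, p = chi2_difference(265.694, 113, 199.052, 109)
print(round(d_chi2, 3), d_df, p < 0.001)   # 66.642 4 True

# Model 3 vs Model 2: reported difference of 6.036 on 2 df.
print(round(chi2_sf_even_df(6.036, 2), 3))  # 0.049
```

The recomputed RMSEA point estimates and p values agree with the figures reported above. The confidence intervals, TLI, CFI, and SRMR cannot be recovered this way because they require the baseline model or the residual matrix.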
b Internal consistency reliability
Cronbach’s alpha values for the five subscales were .83, .82, .70, .89, and .86, indicating that this questionnaire had high internal consistency reliability. Compared with Cronbach’s alpha, McDonald’s omega takes account of the strength of association between items (McDonald, 1999) and is therefore believed to be a better alternative. In our data, McDonald’s omega values for the five subscales were .83, .82, .71, .90, and .85, all higher than the cut-off value of .70.
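As a reminder of how the alpha coefficient above is defined, the following sketch computes Cronbach’s alpha for one subscale from raw item ratings. The function name, data layout, and ratings are made up for illustration; McDonald’s omega is not shown because it requires factor loadings from a fitted model.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for one subscale.

    item_scores holds one list per item; each list contains the ratings of
    the same respondents in the same order.
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Total score per respondent across the k items.
    totals = [sum(ratings) for ratings in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Made-up ratings for a three-item subscale answered by five respondents.
subscale = [
    [5, 6, 4, 7, 3],
    [4, 6, 5, 7, 2],
    [5, 7, 4, 6, 3],
]
print(round(cronbach_alpha(subscale), 2))  # 0.95
```

Alpha rises when the items covary strongly relative to their individual variances, which is why identical items would yield a value of 1.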
c Split-half reliability
The questionnaire was split into two halves. One half comprised odd-numbered items, and the other consisted of even-numbered items. The Pearson correlation coefficient between the scores for the two halves was .80 (p < .001), suggesting that the two halves measured participants’ writing WTC consistently and indicating high split-half reliability of this scale.
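The odd/even split described above can be sketched as follows. The Spearman-Brown step is an optional addition that the text does not report; it is included only because it is commonly paired with split-half correlations. Function names and the sample ratings are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def spearman_brown(r_half):
    """Step a half-test correlation up to full-test length (not in the source)."""
    return 2 * r_half / (1 + r_half)


def split_half(respondents):
    """respondents: one list of item ratings per respondent, in item order."""
    odd_totals = [sum(r[0::2]) for r in respondents]   # items 1, 3, 5, ...
    even_totals = [sum(r[1::2]) for r in respondents]  # items 2, 4, 6, ...
    return pearson_r(odd_totals, even_totals)


# Made-up ratings: four respondents answering six items.
data = [
    [5, 6, 4, 5, 6, 5],
    [2, 3, 2, 2, 3, 3],
    [6, 7, 6, 7, 7, 6],
    [4, 3, 4, 4, 5, 4],
]
r_half = split_half(data)
print(r_half > 0.9)  # True: the two halves rank respondents very similarly
```

Under the Spearman-Brown formula, a half-test correlation of .80 would correspond to a stepped-up estimate of about .89, although that corrected figure is not part of the reported results.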
d Inter-rater reliability
To examine inter-rater reliability, the average score of each item collected in this study was compared with the data collected in a following project. The Pearson Correlation Coefficient between the pairs was .96 (p < .001), indicating very high inter-rater reliability.
5 Qualitative results of participants’ L2 writing WTC
The follow-up interview conducted immediately after the participants finished the questionnaire was intended to delve deeper into their answers, allowing the researcher to enrich the quantitative interpretation and collect face validity evidence. With their answers, we explored the sources of L2 learners’ writing WTC in the Chinese context.
Of the five sub-scales of L2WWTCS, Writing Task Traits is the most writing task-specific dimension. The interviewees expressed various opinions on the traits of writing tasks that influenced their WTC. Unlike tasks in other language skills, writing tasks usually require task-takers to establish a position, present reasons, and evaluate evidence logically, which increases the cognitive demands on learners. Learners therefore spend much more time planning before they write, which could impair their WTC. Once they have an idea, they need to enumerate evidence, which cannot be improvised on the spot but must draw on prior knowledge accumulated in everyday life. As a result, learners need to employ prior knowledge and think critically about popular topics. Meanwhile, connecting opinions and evidence coherently and cohesively is challenging and requires learners to practice sound reasoning and argumentation. Respondents also referred to other task features, including the structure of essays and how to realize that structure with diversified syntactic structures and accurate vocabulary. Cao expressed her ideas on this issue.
Cao (female, 19, learning English for 10 years): For writing, the principal thing is to have a clear viewpoint. Once you have one, you can continue writing by listing your reasons. You need to persuade others, just like debating. If I want to complete this task well, I really need to think a lot. I cannot start writing once I see the task.
The data revealed that interviewees’ interest in learning English could be categorized into two types. The first originated from a natural or nurtured interest in learning languages, regardless of English being their major. These interviewees favored acquiring English in natural settings and took the initiative in approaching English materials, such as movies, books, and TV series. They had long been keen on learning foreign languages and chose their major out of personal inclination. The second type was closely tied to their recognition of the importance of English. Some English majors confessed that, although they were initially apathetic to English, they gradually developed an interest in it. This change could be attributed to their immersion in the English-speaking environment created by their department and to their decision to work in English-related fields. Chen Z.’s and Chen Y.’s opinions represented these two types of interest.
Chen Z. (male, 20, learning English for 11 years): I have been interested in learning English since I was a kid. When I was an elementary school student, I watched English TV series on electronic devices at night without my parents’ notice. This has influenced me a lot. Now, I feel that classroom knowledge is not enough for me. Basically, I try to learn more by surfing the Internet.
Chen Y. (male, 20, learning English for 11 years): I am interested in learning English because this is my major. I will work in this field in the future. So, I certainly need to learn more stuff of English. And most of my classes are related to English.
Since participants scored lowest in their self-perceived English proficiency, the sources of this perception were carefully investigated. The results revealed that the test-oriented educational system played a prominent role in shaping their low confidence. When explaining why they lacked confidence in their English proficiency, interviewees frequently mentioned their English grades. Test takers can seldom achieve the highest scores consistently, which shifted their attention to comparison with better performers while they overlooked their personal growth. Cheng’s and Pan’s descriptions echoed this phenomenon.
Cheng (female, 19, learning English for 14 years): Before I went to university, my English grades were not good, so I had no confidence in my English ability. In the university, teachers’ evaluations of me, my poor spoken English, and my roommates’ good grades all make me feel disappointed in myself.
Pan (male, 20, learning English for 11 years): I think my proficiency is maybe worse than pupils in the U.S. Oh, not maybe. It should be a sure thing. My proficiency is worse than native speakers. Because they have the environment. The environment to speak, to use. We don’t have the environment. In other words, our English knowledge is totally not from life but from the classroom or our self-study.
Interviewees’ English Language Ideology seems to be shaped by their recognition of English as a future survival skill. Many of them wanted to find English-related jobs, such as EFL teaching or positions in multinational corporations. Writing, as a fundamental language skill, was deemed significant for being competitive in the job market. With this awareness, they were more determined to practice writing and more willing to communicate in writing. One typical illustration was found in Yang’s narrative.
Yang (female, 19, learning English for 12 years): English is my major and the most important thing I want to learn in the four-year study. Another reason is I want to choose a job related to English. So, it’s an essential skill for me, very important to me.
Several characteristics of writing teachers were important in improving EFL learners’ writing WTC. Teachers who arranged classroom activities well and had distinctive charm seemed to be more favored by EFL learners. He’s comments summarized these views.
He (female, 20, learning English for 11 years): Teachers’ teaching styles and their enthusiasm for English or their levels of professionalism influence my recognition of their teaching content. They also determine how willing I am to learn English.
IV Discussion
1 Validation of L2WWTCS
This study examined various aspects of validity and reliability in scale development (see Figure 11). In this section, we discuss how we achieved content, construct, and face validity, three fundamental types of validity evidence (DeVellis, 2017), in our scale development.

Figure 11. Validity and reliability examined in the scale development.
Content validity refers to the extent to which the scale items adequately cover the content domain of the investigated construct (DeVellis, 2017), which was informed by the literature review and qualitative data in this study. Our literature review indicated that individual and contextual attributes should influence L2 writing WTC simultaneously. This finding was corroborated in the thematic analysis of interview data. The five themes that emerged in the first-round thematic analysis were Writing Task Traits, Individual Differences, Teachers and Peers, Self-Perception of English Language Proficiency, and Miscellaneous. The first, third, and fifth sub-constructs tap into contextual dimensions. In their systematic review, J. Zhang et al. (2018) argued that task features (e.g. topic, type of activity, preparation time, and assessment) were critical situational antecedents of L2 WTC. Meanwhile, teacher and peer factors are also significant cues for L2 WTC. Research has revealed that teacher support, teacher engagement, teacher feedback, teacher–student relationship, and peer support can affect L2 WTC (e.g. Cao, 2011; S.J. Kang, 2005; Zarrinabadi, 2014; J. Zhang et al., 2018). The second and fourth sub-constructs tap into individual characteristics. Language ideology (Subtirelu, 2014), interest (Eddy-U, 2015), and language proficiency (Cao & Philp, 2006; Yashima et al., 2018) are well-documented individual traits related to L2 WTC. An expert in L2 education also confirmed these themes.
Construct validity relates to how well a scale measures the underlying structure and can be divided into discriminant and convergent validity (DeVellis, 2017). EFA and CFA, including hierarchical and multigroup CFA, were used to examine construct validity. In this study, factor loadings over .40, the absence of cross-loadings, and acceptable latent factor correlations supported discriminant validity; factor loadings over .40 and the loading of items on their theoretically posited latent variables supported convergent validity. The CFA confirmed a 5-factor model comprising WTT, ELI, WTS, IEL, and SPELP, which are the aforementioned critical cues to L2 WTC. Finally, the model’s measurement invariance was examined with the third-round quantitative data from a different language learning context, and the results indicated satisfactory fit of the model across second language learning contexts.
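The loading criteria described above can be expressed as a simple check. The sketch below is illustrative only: the loading values and item labels are hypothetical, and using the same .40 cut-off to flag cross-loadings is an assumption, since the text does not state a cross-loading threshold.

```python
def check_loadings(loadings, expected_factor, threshold=0.40):
    """Flag items that violate the loading criteria.

    loadings: {item: {factor: loading}}
    expected_factor: {item: factor the item is theoretically posited to load on}
    """
    problems = []
    for item, row in loadings.items():
        target = expected_factor[item]
        # Primary loading must be over the threshold.
        if abs(row[target]) <= threshold:
            problems.append((item, "weak primary loading"))
        # Assumed rule: any other loading over the threshold is a cross-loading.
        for factor, value in row.items():
            if factor != target and abs(value) > threshold:
                problems.append((item, f"cross-loads on {factor}"))
    return problems


# Hypothetical loadings on two of the five factors (WTT and ELI).
toy_loadings = {
    "item_a": {"WTT": 0.71, "ELI": 0.12},
    "item_b": {"WTT": 0.35, "ELI": 0.08},
    "item_c": {"WTT": 0.45, "ELI": 0.62},
}
toy_expected = {"item_a": "WTT", "item_b": "WTT", "item_c": "ELI"}

for item, issue in check_loadings(toy_loadings, toy_expected):
    print(item, "->", issue)
```

In this toy example, item_a passes, item_b is flagged for a weak primary loading, and item_c is flagged for cross-loading on WTT.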
Face validity indicates the degree to which the measure appears to be related to the focal construct in the subjective judgment of non-experts (DeVellis, 2017). In this article, we confirmed the scale’s face validity by collecting follow-up qualitative data on the sources of EFL learners’ writing WTC. The interviewees’ narratives corresponded to the sub-structures confirmed in factor analysis and provided vivid and detailed explanations of each variable, which strengthened the preceding statistical validation.
2 Underlying factor structure of L2 writing WTC
The first factor relates to features of writing tasks. In J. Zhang et al.’s (2018) review of the situational antecedents of L2 WTC, task was recognized as a critical overarching category of situational cues. According to their proposed model, task features, including time (e.g. Zarrinabadi, 2014; Zhong, 2013), types of activity (e.g. Cao, 2011; de Saint Léger & Storch, 2009; Eddy-U, 2015; J.-E. Peng, 2012) and topic (e.g. Cao, 2011; S.J. Kang, 2005), and thematic categories of topic (e.g. Cao, 2011), including content knowledge and relevant vocabulary, affect L2 WTC by regulating L2 learners’ confidence and motivation. In the final scale, time and topic were explicitly stated, and activity was implied.
The second factor, English Language Ideology, refers to L2 learners’ attitudes and beliefs about the roles of the English language in their social worlds. Language ideology has evolved as an essential concept in linguistic anthropology, in which its conceptualization is still under debate. In general, language ideology in linguistic anthropology could be roughly defined as people’s concepts concerning the roles of language in social experiences within a cultural group (see Kroskrity, 2004). Subtirelu (2014) imported this concept into the research on WTC, arguing that it would contribute to the theoretical underpinning of WTC. His research has indicated that positive language ideologies could promote WTC significantly. In this scale, items clustered under this factor presented statements on the importance of the English language in academic achievements and future careers.
The third factor, classified as Writing Teacher Support, deals with the role of teachers’ support in promoting writing WTC. Writing teachers’ support stated in this scale includes teachers’ scaffolding, feedback, and other behaviors that may stimulate a sense of appreciation. Scaffolding has been strongly recommended to be added to the instructional repertoire as it can help learners achieve learning targets with motivation (Cotterall & Cohen, 2003; Hammond, 2002). Research has confirmed that certain types of feedback could enhance the language learning or writing process (Hyland & Hyland, 2006; L.J. Zhang & Cheng, 2021) and promote writing motivation and engagement (Yu et al., 2020). Teacher support has been extensively recognized as a situational cue to promote WTC (J. Zhang et al., 2018). Fallah (2014) maintained that teacher immediacy, defined as teachers’ ‘nonverbal and verbal behaviors, which reduce psychological and/or physical distance between teachers and students’ (Christophel & Gorham, 1995, p. 292), influenced WTC by regulating motivation and confidence. Wen and Clément (2003) also pointed out that teachers’ immediacy, attitudes, and styles affected WTC from a sociocultural perspective.
The fourth factor tapped into L2 learners’ Interest in English Language. The research conducted by Amiryousefi (2018) confirmed that interest contributed significantly to L2 learners’ WTC. Interest is a considerable motivator for initiating WTC (Eddy-U, 2015) and can directly influence learners’ behaviors and involvement in learning (Amiryousefi, 2018). L2 learners with higher interest are thus more willing to engage in their L2 writing.
The fifth factor, Self-Perception of English Language Proficiency, has been documented as a highly correlated antecedent of L2 WTC. Some research acknowledged self-perceived communicative competence as the most crucial predictor of L2 WTC (MacIntyre & Charos, 1996; MacIntyre & Gardner, 1994). In MacIntyre et al. (2002), self-perceived communicative competence was the only variable significantly correlated with WTC. Shirvan et al.’s (2019) meta-analysis revealed that self-perceived communicative competence moderately correlated with L2 WTC.
In the two L2 learning contexts we investigated (i.e. L2 English undergraduate learners across majors and English major undergraduates), there were variations in the overall levels of L2 writing WTC and the five underlying constructs. English major undergraduates seemed to have higher L2 writing WTC. This result was mainly attributable to their higher scores in Writing Task Traits, Writing Teacher Support, and Interest in English Language. Surprisingly, although the English major undergraduates had higher English proficiency (i.e. upper-intermediate), their self-perception of English proficiency was similarly low to that of the L2 English undergraduate learners.
V Conclusions
This article contributes to developing and validating a scale for measuring L2 learners’ writing WTC with a sequential embedded mixed-methods design. To ensure theoretical soundness, the literature was scrutinized to extract potential contributors to L2 writing WTC. These contributors were further examined empirically by interviewing members of the target population to generate the item pool. With large-scale data, carefully selected items were evaluated by EFA, CFA, and other scale validation methods. The results confirmed a 5-factor model of L2 writing WTC: Writing Task Traits, English Language Ideology, Writing Teacher Support, Interest in English Language, and Self-Perception of English Language Proficiency. Finally, this underlying model was examined through another round of quantitative tests and immediate retrospective interviews with EFL learners in a different L2 learning context, conducted after they completed the newly developed scale. The results indicated that the scale could accurately reflect their L2 writing WTC.
Validity is the key to the scale development in this article. A sequential embedded mixed-methods design can maximize the chance of achieving high validity. First, building literature reviews and qualitative data into item generation ensures the content validity of the new scale. Second, rigorous statistical examination further illuminates the proposed theoretical structure and increases its construct validity. Third, examining the model in a different context to test its generalizability strengthens its construct validity. Finally, a follow-up qualitative analysis gives more details of the confirmed model and provides evidence of face validity.
This research has clear implications for scale development in L2 education and for the learning and teaching of L2 writing. The triangulation of qualitative and quantitative data in scale development should be not only present but also iterative. Considering the diversity and complexity of L2 learning contexts, a well-validated scale should be examined over several rounds with multi-source data. Moreover, to increase L2 learners’ writing WTC, educators and practitioners may focus on motivational predispositions, affective and cognitive factors, and writing-specific features. For motivational predispositions, L2 writing WTC could be enhanced by developing L2 learners’ interest in learning English. Furthermore, affective and cognitive features, including L2 learners’ attitudes toward English and their self-perceived English proficiency, also play indispensable roles. Finally, sufficient knowledge of writing tasks and teacher support may also improve L2 learners’ writing WTC.
Appendix A
Appendix B
Appendix C
Descriptive statistics of the first-round quantitative data of the second language writing willingness to communicate scale (L2WWTCS).
| Item | M | SD | Skewness | Kurtosis |
|---|---|---|---|---|
| 1 | 3.37 | 1.28 | 0.07 | −1.15 |
| 2 | 5.61 | 1.01 | −1.33 | 2.27 |
| 3 | 5.84 | 1.04 | −1.24 | 2.18 |
| 4 | 3.94 | 1.28 | −0.20 | −0.53 |
| 5 | 3.56 | 1.42 | 0.24 | −0.83 |
| 6 | 3.10 | 1.31 | 0.24 | −0.64 |
| 7 | 3.45 | 1.30 | 0.16 | −0.79 |
| 8 | 4.65 | 1.23 | −0.54 | −0.34 |
| 9 | 4.98 | 1.32 | −0.73 | 0.19 |
| 10 | 5.27 | 1.27 | −0.66 | 0.06 |
| 11 | 3.29 | 1.30 | 0.15 | −0.75 |
| 12 | 4.79 | 1.39 | −0.67 | 0.35 |
| 13 | 3.66 | 1.26 | −0.04 | −0.59 |
| 14 | 5.11 | 1.24 | −0.88 | 0.32 |
| 15 | 4.53 | 1.47 | −0.36 | −0.81 |
| 16 | 4.15 | 1.19 | −0.20 | −0.32 |
| 17 | 5.51 | 1.11 | −1.04 | 1.65 |
| 18 | 5.21 | 1.29 | −1.00 | 0.96 |
| 19 | 4.95 | 1.20 | −0.66 | 0.17 |
Appendix D
Descriptive statistics of the second-round quantitative data of the second language writing willingness to communicate scale (L2WWTCS).
| Item | M | SD | Skewness | Kurtosis |
|---|---|---|---|---|
| 1 | 3.38 | 1.37 | 0.10 | −1.00 |
| 2 | 5.14 | 1.30 | −1.01 | 0.69 |
| 3 | 5.49 | 1.25 | −0.91 | 0.87 |
| 4 | 3.75 | 1.37 | 0.06 | −0.70 |
| 6 | 3.17 | 1.36 | 0.25 | −0.96 |
| 7 | 3.64 | 1.43 | 0.03 | −0.86 |
| 8 | 4.75 | 1.31 | −0.49 | 0 |
| 10 | 4.85 | 1.47 | −0.45 | −0.31 |
| 11 | 3.25 | 1.35 | 0.29 | −0.64 |
| 12 | 4.12 | 1.52 | −0.26 | −0.66 |
| 13 | 3.36 | 1.26 | 0.30 | −0.08 |
| 16 | 3.87 | 1.34 | −0.08 | −0.37 |
| 17 | 5.36 | 1.33 | −0.93 | 0.75 |
| 18 | 4.61 | 1.53 | −0.54 | −0.47 |
| 20 | 3.01 | 1.29 | 0.33 | −0.74 |
| 21 | 5.78 | 1.19 | −1.39 | 2.69 |
| 22 | 4.06 | 1.62 | −0.08 | −0.91 |
Note. 0 is a rounded number.
Appendix E
Descriptive statistics of the third-round quantitative data of the second language writing willingness to communicate scale (L2WWTCS).
| Item | M | SD | Skewness | Kurtosis |
|---|---|---|---|---|
| 1 | 3.21 | 1.34 | 0.29 | −0.80 |
| 2 | 5.17 | 1.35 | −1.25 | 1.20 |
| 3 | 5.58 | 1.24 | −0.93 | 0.97 |
| 4 | 3.74 | 1.30 | 0 | −0.58 |
| 6 | 3.20 | 1.28 | 0.25 | −0.90 |
| 7 | 3.78 | 1.41 | −0.08 | −1.02 |
| 8 | 4.88 | 1.34 | −0.64 | 0.06 |
| 10 | 4.90 | 1.41 | −0.41 | −0.34 |
| 11 | 3.34 | 1.33 | 0.22 | −0.73 |
| 12 | 4.39 | 1.36 | −0.35 | −0.30 |
| 13 | 3.37 | 1.15 | 0.14 | −0.69 |
| 16 | 3.92 | 1.20 | −0.25 | 0.25 |
| 17 | 5.62 | 1.09 | −0.91 | 1.10 |
| 18 | 4.93 | 1.40 | −0.86 | 0.28 |
| 20 | 3.05 | 1.21 | 0.22 | −0.59 |
| 21 | 5.94 | 1.09 | −1.56 | 4.07 |
| 22 | 4.24 | 1.53 | −0.35 | −0.67 |
Note. 0 is a rounded number.
Author contribution statement
Y. Zhang conceived and designed the study, collected and analysed the data, and drafted the manuscript. All the authors revised and approved the manuscript. L.J. Zhang finalized it for submission as the corresponding author.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The project was supported by a joint doctoral scholarship awarded to the first author by The University of Auckland, New Zealand and the China Scholarship Council (CSC NO. 202108250009).
Ethical approval
This research was approved by the Human Participants Ethics Committee of the University of Auckland, New Zealand (reference number UAHPEC22974).
