The interaction of auditory acuity and L2 input in suprasegmental cue-weighting shifts after study abroad: Mandarin speakers’ perception of Spanish lexical stress

Abstract

This longitudinal study investigates how auditory acuity and the amount of second language (L2) input lead learners to shift L2 cue-weighting strategies towards native norms in an immersion context. Thirty Mandarin-speaking students were tested on Spanish lexical stress perception upon arrival (T1) and before leaving Spain after one academic year (T2). Due to cross-linguistic influence, the learners placed more weight on pitch for stress perception, unlike Spanish natives, who relied more on duration. However, at T2, the learners upweighted the duration cue, suggesting a potential L2 cue-weighting shift. At the individual level, more L2 input helped learners with lower pitch acuity shift towards a native-like cue-weighting strategy, whereas those with higher pitch acuity showed the opposite pattern. The results highlight the persistent influence of L1 prosodic transfer and the interaction of cognitive and experiential factors in reshaping L2 perceptual categorization, which underscores the importance of individual differences in L2 speech acquisition in naturalistic settings.

Keywords

auditory acuity, cue-weighting, L2 input, lexical stress, second language speech perception

1. Introduction

People use multiple acoustic cues to categorize linguistic units (Hillenbrand et al., 1995; Holt and Lotto, 2006), but listeners tend to weigh one cue over the rest in speech perception. For instance, the American English tense–lax vowel pair /i:–ɪ/ contrasts in both duration and formant, but native listeners predominantly use formant cues to distinguish the two vowels (Hillenbrand et al., 1995; Kondaurova and Francis, 2008). Similarly, English /r/ and /l/ acoustically differ in the onset of the second and third formants (F2 and F3), but F3 outweighs F2 for L1 English listeners’ identification of /r–l/ contrast (Iverson and Kuhl, 1994; Iverson et al., 2003). More importantly, the ‘cue-weighting’, which refers to the relative use of various acoustic cues in language perception, varies across languages. For instance, lexical stress can be defined by pitch, duration, intensity, and formants in various languages (Fear et al., 1995; Jasmin et al., 2023). However, Spanish listeners place more weight on duration to perceive Spanish lexical stress (Li and Xi, 2024b), whereas English listeners rely on formant more than pitch to distinguish English lexical stress (Wang et al., 2024).

Crosslinguistic differences in cue-weighting can pose challenges for second language (L2) learners, as they may transfer their first language (L1) cue-weighting patterns to the L2 (Francis and Nusbaum, 2002; Francis et al., 2000; Kondaurova and Francis, 2008; Zhang and Francis, 2010). From the longitudinal perspective, learners gradually shift their cue-weighting strategies towards the native norms of the target L2 (Yazawa et al., 2020). Therefore, understanding how learners make the shift is important for L2 perceptual learning. Moreover, since cue-weighting strategies can vary among individuals in both L1 and L2 (Chandrasekaran et al., 2010; Kong, 2019; Kong and Kang, 2023), it is important to investigate how individual differences account for the L2 cue-weighting change in the long run. This study focuses on L1 Mandarin listeners to assess how they shift their L2 cue-weighting for Spanish lexical stress perception in a study abroad (SA) context over one academic year.

1.1. Cue-weighting in L2 perceptual learning and the role of amount of L2 input

The cue-weighting theory predicts that when learning an L2, L2 learners may differ from L1 listeners in their cue-weighting strategies for speech perception (Francis and Nusbaum, 2002; Holt and Lotto, 2006; Kondaurova and Francis, 2008; Zhang and Francis, 2010). For instance, some L2 learners of English (e.g. L1 Japanese speakers) rely on duration rather than formant to perceive the tense–lax vowel contrasts due to L1 transfer (Yazawa et al., 2020). Even when learners’ L1 does not have a vowel length contrast (e.g. Spanish, Russian), they may still place more weight on duration (Cebrian, 2006; Kim et al., 2018; Kondaurova and Francis, 2008) because processing durational cues may be cognitively less demanding than forming new categories relying on formant (Bohn, 1995; Escudero and Boersma, 2004). To perceive English lexical stress, both L1 Mandarin and L1 English listeners use formant and pitch information, but Mandarin listeners rely more on pitch (Zhang and Francis, 2010), while English listeners rely more on formant (Braun et al., 2011; Cutler, 1986; Cutler et al., 2007; Fear et al., 1995; Small et al., 1988; Wang et al., 2024). Studies on other language pairs, including L1-English–L2-Spanish and L1-Dutch–L2-English, have reached similar conclusions (Ortega-Llebaria et al., 2013; Ortín and Simonet, 2022, 2023; Romanelli and Menegotto, 2015; Romanelli et al., 2015; Sagarra et al., 2024; Tremblay et al., 2021). These studies suggest the existence of a potential hierarchy of acoustic cues in speech perception: acoustic cues that are more salient in the L1 or cognitively less demanding may carry more weight for L2 learners than for L1 listeners of the same language.

L2 perceptual learning involves gradually shifting the cue-weighting strategies from the learners’ L1 patterns towards the native norms of the target L2, which we will term as ‘cue-weighting shift’. As perceptual cue-weighting seems to be stable across listeners and contexts, adult L2 learners often face challenges in cue-weighting shift. For instance, Korean lenis–fortis stops are primarily distinguished by pitch with little difference in voice onset time (VOT) (Holliday, 2014; Martínez García and Holliday, 2019). Lee et al. (2022) tested the development of French-speaking beginners’ perception of L2 Korean stops in a classroom learning setting. After one academic year, the learners showed limited cue-weighting shift from VOT to pitch to distinguish Korean lenis–fortis stops. The results suggest that, with instruction in a classroom context, it is not easy for cue-weighting to shift; more immersive exposure or explicit training is needed.

Nevertheless, cue-weighting shift can occur as learners’ L2 proficiency and L2 experience increase (Gilbert et al., 2026; Kong and Kang, 2023; Yu, 2023). For instance, L1 Dutch listeners with higher English proficiency rely more on formants to perceive L2 English lexical stress than less proficient ones, which is considered more native-like (Tremblay et al., 2021). Similarly, L1 Mandarin listeners with long-term residence (3+ years) in English-speaking countries rely more on duration to perceive English phrase boundary contrasts than those with short-term residence (less than 1 year), suggesting a potential trend for cue-weighting shift resulting from immersion experience (Petrova et al., 2023). In these cross-sectional studies, although the learners still overweigh the acoustic cues transferred from their L1, sufficient L2 experience can modify their cue-weighting strategy, especially in the naturalistic learning context of residence abroad.

However, what matters for cue-weighting shift seems to be the amount of L2 input rather than the length of residence, as the two concepts do not necessarily entail each other (Flege and Bohn, 2021). Yazawa et al. (2017) used simulated data to model the relationship between L2 input and cue-weighting shift with a virtual learner. Their computational model predicts that although the virtual learner begins a cue-weighting shift within 10 months, the shift requires a large amount of L2 input: the model assumes 1,000 exposures to the target sounds per month. However, evidence from real learners shows that long-term residence does not entirely shift the learners’ L2 perceptual cue-weighting (Cebrian, 2006; Ingvalson et al., 2011). Recent studies demonstrate that the amount of L2 input positively affects L2 speech learning in an immersion context (Sun et al., 2024; Turner, 2024). Therefore, it is plausible that for L2 cue-weighting shift to occur, a sufficient amount of L2 input is necessary. In other words, even with the same length of residence, learners with more L2 input are more likely to shift their cue-weighting toward native-like norms than those with less input.

Notably, re-weighting the acoustic cues does not imply an absolute trade-off where upweighting one cue automatically downweights the other. Instead, learners can enhance sensitivity to multiple cues and establish a cue-weighting strategy that falls between their L1 pattern and the native norms of the L2. In Petrova et al.’s (2023) study, for instance, although the long-term residence group placed more weight on duration for phrase boundary contrast than the short-term group, they also did so with pitch. This suggests that even though a potential cue-weighting shift can be expected after long-term immersion, extensive L2 exposure may also upweight the acoustic cues that learners transfer from their L1.

Finally, although self-reports cannot avoid subjectivity, language questionnaires remain the most practical and widely used tool for indexing L2 input in longitudinal studies (Turner, 2024). Large-scale recording or ecological momentary assessment methods may pose challenges from logistical and analytical perspectives, especially when participants are dispersed across various institutions during SA. End-of-SA language history questionnaires can be uniformly administered and reliably coded, and these have yielded consistent results across studies. For instance, with questionnaires, previous studies have discovered the significant predictive role of self-estimated L2 use in the improvement of L2 pronunciation after study abroad, including media exposure, hours per day speaking with native speakers, and so on (Díaz-Campos, 2004; Muñoz and Llanes, 2014; Stevens, 2011; Sun et al., 2024; Turner, 2024). It seems that questionnaires can balance feasibility and validity for examining naturalistic L2 input in an SA context. Therefore, in the current study, we used a questionnaire to survey the participants’ amount of L2 input during SA.

To summarize, learners’ L1 cue-weighting strongly influences L2 perceptual categorization. Therefore, the cue-weighting shift in the L2 may need a long time and, crucially, a substantial amount of L2 input. This calls for longitudinal studies in an immersion context, which is the first motivation of the current study.

1.2. The role of auditory acuity in L2 speech perception

Beyond the influence of learners’ L1 background, individual differences in auditory acuity also play an important role in L2 speech learning. Auditory acuity can be assessed using either linguistic or non-linguistic stimuli, and these two types of abilities affect L2 learning differently. A well-studied example concerns pitch processing and the learning of L2 lexical tones. Linguistic pitch processing abilities, such as the identification or discrimination of Mandarin lexical tones, can predict success in L2 tonal word learning (Bowles et al., 2016; Chui and Qin, 2024; Cooper and Wang, 2012; Qin et al., 2021; Wong and Perrachione, 2007). By contrast, non-linguistic pitch processing abilities, measured with sine-wave or pure-tone stimuli, can better predict learners’ ability to generalize tonal learning to new contexts (Bowles et al., 2016).

Building on this distinction, recent research has increasingly focused on domain-general auditory acuity assessed with non-linguistic stimuli. Domain-general auditory acuity refers to the ability to precisely perceive basic acoustic dimensions such as pitch, formant, and duration, which is crucial for making effective use of L2 input during learning (Saito, 2023). Higher domain-general auditory acuity is associated with greater success in various aspects of L2 learning, including lexical proficiency (Saito et al., 2022c), speech perception (Kachlicka et al., 2019; Saito et al., 2022a), and production (Li et al., 2026; Saito et al., 2020a). Importantly, these effects are more consistently observed in immersion contexts than in classroom settings, possibly because classroom environments provide insufficient auditory input (Saito et al., 2021).

At the same time, domain-specific auditory acuity, particularly sensitivity to acoustic cues that are critical in the target L2 features, plays a central role in speech perception and learning. Higher pitch acuity is associated with better learning outcomes in L2 lexical tones (Qin et al., 2021, 2022; Zhou and Veríssimo, 2026) and intonation (Zheng et al., 2022). Similarly, duration acuity predicts the perception of L2 voicing contrasts based on VOT (Liu, 2022), the learning of L2 vowel length contrasts (Kempe et al., 2015), and lexical stress (Zheng et al., 2022). Finally, formant acuity is particularly relevant for acquiring difficult L2 vowel contrasts or consonant contrasts that require fine-grained spectral discrimination abilities (Lengeris and Hazan, 2010; Saito et al., 2022b).

However, few studies have tested the role of domain-specific auditory acuity in L2 cue-weighting shift, which requires a longitudinal perspective. One can formulate different hypotheses regarding the role of auditory acuity in the cue-weighting shift. First, if a learner has accurate domain-specific auditory acuity, in pitch for instance, more L2 input will strengthen their pitch-cue reliance for L2 perceptual categorization, especially when pitch is relevant to perceiving the L2 category (e.g. lexical stress). In line with this hypothesis, a recent study found that L1 Mandarin students with more precise formant discrimination improved their L2 English stress production after an eight-month period of study abroad (Saito et al., 2020b). It might be that precise formant discrimination abilities directed the learners’ attention to formant cues in speech perception, which was later applied to speech production. However, this speculation cannot be validated without direct perceptual evidence. A second hypothesis could be that, if a learner does not have a precise discrimination ability in the critical acoustic cue, more L2 input may draw their attention to other acoustic cues. For instance, an L1 Mandarin listener with weak pitch discrimination abilities would pay less attention to pitch in L2 stress perception, which would shift their cue-weighting towards other critical cues like duration. Again, evidence is needed to validate this assumption.

Overall, to investigate L2 cue-weighting shift, we need longitudinal evidence for the role of domain-specific auditory acuity and the amount of L2 input. This research gap motivates our decision to include the measures of the two individual difference factors.

1.3. Word-level prosody and acoustic cues of Spanish and Mandarin

The present study explores the lexical stress categorization of L2 Spanish by L1 Mandarin listeners. Therefore, we briefly present here the word-level prosody and the acoustic cues of the two languages.

Spanish is a stress language with lexical stress distinguishing lexical meanings, which is mainly cued by suprasegmental correlates (Ortega-Llebaria et al., 2013). In isolated words, stressed syllables are marked by higher pitch, longer duration and greater intensity than unstressed syllables (Hualde, 2012). When a word is embedded in a phrase, duration becomes a more reliable acoustic cue than pitch for perceiving lexical stress (Torreira et al., 2014), because pitch variations primarily mark phrase-level intonation patterns rather than word-level stress. Specifically, in the sentence-final (nuclear) position, the stressed syllable is often lengthened but shows various pitch patterns depending on the pragmatic function. In the non-final (prenuclear) position, a non-accented or low-accented word may assign a low pitch to the stressed syllable, whereas the pitch peak is sometimes delayed to the post-stressed syllable (Hualde and Prieto, 2015). In this case, duration becomes a more reliable cue for lexical stress contrasts (Ortega-Llebaria and Prieto, 2011). This interplay highlights how duration anchors stress identification at the sentence level, which challenges L2 learners’ perception of Spanish lexical stress.

Mandarin is a tone language, where the lexical tone is primarily organized around pitch specifications, while duration plays a secondary but functionally relevant role in lexical stress (Duanmu, 2007). In Standard Mandarin, each syllable is assigned one of the four citation tones, which differ in pitch height and contours: a flat tone mā ‘mother’, a rising tone má ‘hemp’, a dipping tone mǎ ‘horse’, or a falling tone mà ‘to scold’. Mandarin also shows evidence of lexical stress for which duration is the main phonetic correlate (Chen and Xu, 2006; Qu, 2013; Xu, 1997). A weak syllable with a neutral tone (e.g. ma, a question particle) can only be attached to a strong syllable with one of the four citation tones. Weak syllables are shorter than strong syllables but also vary in pitch contours (Lee and Zee, 2008). Because of the high functional load of lexical tones, tone language speakers are sensitive to pitch specifications (Bidelman et al., 2013; Chandrasekaran et al., 2009; Krishnan et al., 2009; Petrova et al., 2023). At the same time, as duration plays an important role in lexical stress, Mandarin listeners are still capable of acquiring duration as an acoustic cue in the perception of L2 suprasegmental features (Petrova et al., 2023; Qin et al., 2017).

Previous research has shown that Mandarin listeners largely rely on pitch for L2 Spanish lexical stress even after years of residence in Spanish-speaking countries. In perception, L1 Mandarin listeners relied on pitch more than duration to identify Spanish lexical stress at the sentence level, in contrast to L1 Spanish listeners who rely more on duration (Li and Xi, 2024b). Similarly, in running speech, Mandarin speakers produce Spanish stressed vowels with significantly higher pitch and longer duration than unstressed vowels or syllables without pitch accent, but Spanish speakers primarily manipulate duration (Li and Xi, 2022, 2023, 2024a). Therefore, we can predict that L1 Mandarin listeners will initially place more weight on pitch for the perception of Spanish lexical stress at the beginning of their stay in Spanish-speaking environments, which is the case with our participants.

1.4. The present study

The present study is motivated by two research gaps. First, there is not much longitudinal evidence on how the amount of L2 input facilitates cue-weighting shift in L2 perceptual categorization. Second, it is not clear whether and how auditory acuity and its interaction with the amount of L2 input affect the cue-weighting shift in L2 perception. We therefore formulate the following two hypotheses.

Hypothesis 1. At the group level, Mandarin listeners will initially rely more on pitch than duration to perceive L2 Spanish lexical stress due to L1 cue-weighting transfer. However, as duration plays a significant role in Spanish lexical stress perception, learners may upweight duration after SA. At the same time, since pitch is a relevant cue as well, learners can also reinforce their pitch-cue reliance after SA. These changes will result in a more categorical perceptual pattern of lexical stress.

Hypothesis 2. If SA leads to learners’ cue-weighting shift, they will show reduced reliance on pitch relative to duration. At the individual level, we expect that auditory acuity affects L2 cue-weighting, and that this effect will emerge when participants receive a sufficient amount of L2 input. Specifically, participants with higher pitch acuity will place more weight on pitch to identify L2 lexical stress with an increased amount of L2 input. By contrast, those with lower pitch acuity will downweight the pitch cue with an increased amount of L2 input, which will facilitate a cue-weighting shift towards duration. The same hypothesis is formulated for duration acuity and duration cue-weighting.

2. Methods

2.1. Participants

This study was approved by the Norwegian Agency for Shared Services in Education and Research (SIKT). We recruited two groups of participants: a Chinese student group and a Spanish native group. All participants signed a consent form which allowed the researchers to collect and analyze their personal data. No one reported any history of speech or hearing impairments.

The Chinese students were 30 L1 Mandarin female learners of Spanish (M_age = 22.23 years, SD = 0.68). All were graduates in Spanish language and had received formal instruction in Spanish language for an average of 4.17 years (SD = 0.46) in China. None of the participants had prior SA experience. They moved to Spain to pursue a master’s degree, which was taught in Spanish. All the participants had taken at least one standardized Spanish proficiency test, Diploma de Español como Lengua Extranjera (DELE) and/or Servicio Internacional de Evaluación de la Lengua Española (SIELE), according to which their Spanish proficiency ranged from B1 (intermediate) to C1 (advanced).

The Spanish natives were 19 female speakers of Peninsular Spanish (M_age = 21.11, SD = 3.05). They all grew up in Spain and reported using Spanish for their daily communication needs. None reported any study or living abroad experience except for short trips to foreign countries.

2.2. Materials and procedure

This study was part of a large SA project that consisted of a variety of tasks over one academic year. Considering the research questions at hand, we will focus on reporting the Spanish lexical stress perception task, the auditory acuity test, and the L2 use questionnaire.

2.2.1. Spanish lexical stress perception task

The auditory stimuli of the Spanish lexical stress perception task were recorded by a female L1 speaker of Castilian Spanish (age 26 years) in a soundproof room using a Shure SM35 headset microphone and a Zoom H4n Pro portable recorder. The speaker produced the words paso and pasó within the carrier sentence Por la plaza at a normal speech rate. The two sentences differed in meaning according to the stress pattern of the verb: strong–weak (SW) paso, ‘I pass through the square’, and weak–strong (WS) pasó, ‘He passed through the square’.

We used Praat (Boersma and Weenink, 2017) to manipulate the auditory stimuli as follows. First, the intensity of the two syllables pa and so was adjusted to match the carrier sentence’s mean intensity of 64 dB, which remained unchanged in subsequent steps. Next, we created 49 auditory stimuli that varied in seven steps from word-initial stress paso to word-final stress pasó along two dimensions: vowel pitch and duration. For pitch cues, based on previous research indicating a mean fundamental frequency (F0) difference of 12 Hz between stressed and unstressed vowels in female Spanish native speakers (Li and Xi, 2022), we manipulated the mean F0 difference between the vowels in pa and so in seven steps: −18, −12, −6, 0, 6, 12, and 18 Hz, with a 12 Hz difference representing the medium-level contrast. From each of the seven steps along the F0, a seven-step continuum of duration was created. For duration cues, female Spanish speakers produced stressed vowels 1.5 times longer than unstressed vowels (Li and Xi, 2022). The mean vowel duration of the original paso and pasó produced by our model speaker was approximately 100 ms. We thus manipulated the vowel duration difference between pa and so in seven steps: −75, −50, −25, 0, 25, 50, and 75 ms, with a duration ratio of 1.5 (150 ms : 100 ms) being the medium level. The manipulation gave a 7 × 7 matrix of pitch and duration cues for the auditory stimuli. Finally, the 49 stimuli were appended after the carrier sentence to create the experimental stimuli.

The stress perception task was conducted using Praat. Participants first received instructions in their L1 to ensure they understood how to complete the task. They then completed five practice trials and had the opportunity to raise any questions with the experimenter. In the main task, participants listened to the 49 stimuli presented in a randomized order within a single testing block. Each stimulus was repeated 4 times, resulting in a total of 196 trials per participant. The entire task took approximately 10 minutes.
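The factorial design of the stimuli and trials described in this subsection can be illustrated with a minimal sketch. The experiment itself was run in Praat; the Python code below is only an illustration, and the names `PITCH_STEPS_HZ`, `DURATION_STEPS_MS`, and `build_trials` are hypothetical. The step values are those reported above, treated here simply as labels for the pitch and duration differences between pa and so.

```python
import random
from itertools import product

# Step values as reported in the text (differences between the vowels of pa and so).
PITCH_STEPS_HZ = [-18, -12, -6, 0, 6, 12, 18]
DURATION_STEPS_MS = [-75, -50, -25, 0, 25, 50, 75]
REPETITIONS = 4  # each stimulus was presented 4 times


def build_trials(seed=None):
    """Return the 7 x 7 stimulus grid and a randomized trial list."""
    # Cross every pitch step with every duration step: 7 x 7 = 49 stimuli.
    stimuli = list(product(PITCH_STEPS_HZ, DURATION_STEPS_MS))
    # Repeat each stimulus and shuffle within a single block: 49 x 4 = 196 trials.
    trials = stimuli * REPETITIONS
    random.Random(seed).shuffle(trials)
    return stimuli, trials


stimuli, trials = build_trials(seed=1)
print(len(stimuli), len(trials))
```

The seed argument is an assumption added for reproducibility of the sketch; the text only states that the presentation order was randomized.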

Chinese students completed the lexical stress perception task twice: once before SA (at T1) and once after (at T2), whereas Spanish natives completed the task only once. During each trial, participants listened to a sentence stimulus while an incomplete sentence lacking the subject pronoun, ‘Por la plaza, paso/pasó ______’, was displayed at the top of the screen. They had to select the subject pronoun to complete this sentence based on their perception of the lexical stress pattern of the verb. A yo response would indicate that the participants perceived an SW stress pattern of the verb, whereas an él response would indicate that a WS pattern was perceived. Responses were recorded via keyboard key press, with participants pressing either the ‘Z’ or ‘M’ key. To minimize potential hand preference bias, the key-to-response mapping was counterbalanced across participants. Once a response key was pressed, the subsequent trial began automatically.

2.2.2. Auditory acuity test batteries on pitch and duration at T1

We selected two AXB discrimination tasks from the auditory acuity test batteries (Mora-Plaza et al., 2022) to measure individuals’ perceptual sensitivity to pitch and duration. Only the Chinese students completed the two tasks (at T1), which followed the same structure. On each trial, participants were presented with three auditory stimuli and were required to indicate whether the second stimulus differed from the first or the third by selecting the number ‘1’ or ‘3’ displayed on the screen. Both the pitch and duration discrimination tasks comprised a continuum of 100 synthesized stimuli. In the pitch discrimination task, the F0 of the 100 auditory stimuli ranged from 330.3 Hz to 360 Hz. In the duration discrimination task, the duration continuum ranged from 0.25 s to 0.5 s. The 100 stimuli were organized by difficulty, with the task starting at stimulus level 50. Task difficulty was dynamically adjusted based on participants’ responses: three consecutive correct responses increased the difficulty by 10 levels, while a single incorrect response decreased it by 10 levels.
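The adaptive rule described above can be sketched as follows. This is a simplified illustration rather than the test battery's actual implementation: the function name `run_staircase` is hypothetical, a higher level number is assumed to mean a harder trial, the count of consecutive correct responses is assumed to reset after each difficulty increase, and levels are assumed to be clamped to the 1–100 range.

```python
def run_staircase(answers, start=50, step=10, lowest=1, highest=100):
    """Return the stimulus level presented on each trial.

    answers: iterable of booleans (True = correct response).
    Assumption: a higher level number corresponds to a harder trial.
    """
    level = start
    streak = 0  # consecutive correct responses since the last level change
    history = []
    for correct in answers:
        history.append(level)
        if correct:
            streak += 1
            if streak == 3:
                # Three consecutive correct responses: make the task harder.
                level = min(highest, level + step)
                streak = 0
        else:
            # A single incorrect response: make the task easier.
            level = max(lowest, level - step)
            streak = 0
    return history


print(run_staircase([True, True, True, True, False, True]))
```

Under these assumptions, the rule is asymmetric: difficulty rises only after sustained success but falls after any single error, so the level settles near the listener's discrimination threshold.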

The two tasks had a combined duration of 10 minutes. The system generated the results for each participant, which were later converted to Pitch Discrimination Score (in Hertz) and Duration Discrimination Score (in milliseconds) following Zheng et al. (2022). The two discrimination scores indicate an individual’s pitch and duration perceptual threshold, with a low score meaning good auditory acuity. For more details of the stimuli creation, refer to Kachlicka et al. (2019).

2.2.3. Questionnaires

We designed two questionnaires adapted from the language history questionnaire (Li et al., 2020) and the L2 English experience questionnaire (Sun et al., 2024). Only Chinese students were required to complete the questionnaires, which were therefore written in Chinese to ensure understanding. First, the linguistic background questionnaire used at T1 (see Appendix A) collected participants’ demographic information and linguistic experience, including current age, age of acquisition of Spanish, years of learning Spanish, and proficiency of all learned languages. Then, the language use questionnaire used at T2 (see Appendix B) asked participants to estimate the number of hours per day that they spent on various listening and speaking activities in Mandarin, English, Spanish, and Catalan during SA. Catalan was included because the participants resided in a Spanish-Catalan bilingual city. For this study, we only analyze the data of listening activities, which covered language input through music (e.g. songs), auditory materials (e.g. radio), audiovisual materials (e.g. TV), academic content (e.g. lectures), and non-academic content (e.g. casual conversations).

2.3. Data coding and statistical analyses

Three Chinese students dropped out before T2, so the final dataset comprised the following. First, the lexical stress perception task yielded a total of 14,896 responses [(30 Learners at T1 + 27 Learners at T2 + 19 Spanish Natives) × 196 trials]. Second, the auditory acuity test at T1 yielded 30 Pitch Discrimination scores and 30 Duration Discrimination scores. Third, 27 Chinese students filled out the L2 use questionnaire at T2.

2.3.1. Coding the Spanish lexical stress perception task

We binary-coded the participants’ responses in the Spanish lexical stress perception task. Accordingly, 1 represented a yo response, which indicated a SW stress pattern, while 0 corresponded to an él response, a WS stress pattern.

2.3.2. Measure for the cue-weighting shift

We calculated the cue-weighting score for Spanish lexical stress perception of each participant, following Yazawa et al. (2020). We applied a logistic regression for every participant, with the binary responses as the dependent variable, and the z-scored pitch and duration steps as predictors. The resulting regression coefficients (β) for pitch and duration reflected the relative contribution of each cue to stress perception. The pitch cue-weighting (PQ) and duration cue-weighting (DQ) were calculated using Formulas (1) and (2), respectively. The sum of PQ and DQ for each participant equals 1. For instance, if one’s PQ is 0.7, their DQ will be 0.3. In other words, analyzing either measure will yield the same results. Therefore, we opted for PQ as the cue-weighting measure for the subsequent calculation.

PQ = \frac{\beta_{\text{pitch}}}{\beta_{\text{pitch}} + \beta_{\text{duration}}} \qquad (1)

DQ = \frac{\beta_{\text{duration}}}{\beta_{\text{pitch}} + \beta_{\text{duration}}} \qquad (2)
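The per-participant computation behind Formulas (1) and (2) can be illustrated with a minimal, standard-library sketch. It is not the analysis code used in the study, which was run in R. The helper names (`zscore`, `solve_linear`, `fit_logistic`, `cue_weights`) are hypothetical, the z-scoring uses the population SD as an assumption, and the logistic regression is fitted with a plain Newton-Raphson loop for illustration only.

```python
import math
from itertools import product


def zscore(values):
    """Standardize values to mean 0 and SD 1 (population SD, an assumption)."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values]


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_logistic(cues, responses, iterations=25):
    """Fit logit P(SW) = b0 + b_pitch * pitch + b_duration * duration."""
    beta = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        grad = [0.0] * 3
        hess = [[0.0] * 3 for _ in range(3)]
        for (pitch, duration), y in zip(cues, responses):
            x = (1.0, pitch, duration)
            eta = sum(b * v for b, v in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-eta))
            for i in range(3):
                grad[i] += (y - p) * x[i]
                for j in range(3):
                    hess[i][j] += p * (1.0 - p) * x[i] * x[j]
        # Newton-Raphson update: beta <- beta + (X'WX)^-1 X'(y - p)
        beta = [b + s for b, s in zip(beta, solve_linear(hess, grad))]
    return beta


def cue_weights(beta_pitch, beta_duration):
    """Formulas (1) and (2): relative pitch (PQ) and duration (DQ) weights."""
    total = beta_pitch + beta_duration
    return beta_pitch / total, beta_duration / total


# Hypothetical listener: noise-free response proportions stand in for real
# binary responses so that the illustration is deterministic.
steps = zscore([-3, -2, -1, 0, 1, 2, 3])
cues = list(product(steps, steps))
props = [1.0 / (1.0 + math.exp(-(0.2 + 1.4 * p + 0.6 * d))) for p, d in cues]
_, b_pitch, b_duration = fit_logistic(cues, props)
pq, dq = cue_weights(b_pitch, b_duration)
print(round(pq, 3), round(dq, 3))
```

Because PQ and DQ share the same denominator, they sum to 1 whenever the two coefficients do not cancel out, which is why either measure carries the same information.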

To assess the extent to which the Chinese students shifted their cue-weighting strategies towards the native norms, we calculated how much the absolute difference between each learner’s PQ and the Spanish natives’ average PQ decreased from T1 to T2 (Formula 3). A positive score means that the Chinese student’s cue-weighting became closer to that of the Spanish natives at T2 as compared to T1, which indicates a cue-weighting shift towards the native norm. A negative score means the opposite, and zero means no shift. We labelled this score as ‘Cue-Weighting Shift Score’.

\text{Cue-Weighting Shift Score} = \left| PQ_{T1} - PQ_{\text{native}} \right| - \left| PQ_{T2} - PQ_{\text{native}} \right| \qquad (3)
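As a worked illustration of Formula (3), the following sketch computes the score for hypothetical PQ values. The function name `cue_weighting_shift` and all numbers are illustrative assumptions and do not come from the study's data.

```python
def cue_weighting_shift(pq_t1, pq_t2, pq_native):
    """Formula (3): reduction in distance from the native mean PQ, T1 to T2."""
    return abs(pq_t1 - pq_native) - abs(pq_t2 - pq_native)


# Hypothetical learner whose PQ moves from 0.70 to 0.55, with a native mean of 0.40.
towards_native = cue_weighting_shift(0.70, 0.55, 0.40)

# Hypothetical learner who moves further away from the native mean.
away_from_native = cue_weighting_shift(0.70, 0.80, 0.40)

print(round(towards_native, 2), round(away_from_native, 2))
```

Because the score is based on absolute distances, it rewards any movement towards the native mean regardless of direction, and a learner who overshoots the native mean is credited only for the net reduction in distance.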

2.3.3. Measure for the amount of L2 input

As this study focuses on speech perception, we only coded the Chinese students’ answers to the listening experience during SA, which reflected the amount of L2 input. For each participant, we aggregated the self-estimated number of hours per day listening to Mandarin (H_L1) and Spanish (H_L2) for all the activities. Following Li et al. (2020), we divided H_L2 by H_L1 to account for the individual biases in self-estimation. The resulting variable was labelled as ‘L2 Input Score’, which is a ratio between the amount of L2 and L1 input. See Formula (4).

\text{L2 Input Score} = \frac{H_{L2}}{H_{L1}} = \frac{\sum \text{hours per day listening to Spanish}}{\sum \text{hours per day listening to Mandarin}} \qquad (4)
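Formula (4) can be illustrated with a short sketch. The activity labels follow the five listening categories of the questionnaire, but the function name `l2_input_score`, the dictionary layout, and the hours are hypothetical. The guard against zero Mandarin hours is an added assumption, since the text does not state how such a case would be handled.

```python
def l2_input_score(spanish_hours, mandarin_hours):
    """Formula (4): summed daily Spanish listening hours over Mandarin hours."""
    h_l2 = sum(spanish_hours.values())
    h_l1 = sum(mandarin_hours.values())
    if h_l1 == 0:
        # Assumption: the ratio is undefined without any reported L1 listening.
        raise ValueError("Mandarin listening hours sum to zero")
    return h_l2 / h_l1


# Hypothetical self-estimates (hours per day) for one participant.
spanish = {"music": 0.5, "auditory": 0.5, "audiovisual": 1.0,
           "academic": 3.0, "non_academic": 1.0}
mandarin = {"music": 1.0, "auditory": 0.5, "audiovisual": 1.0,
            "academic": 0.0, "non_academic": 1.5}

print(l2_input_score(spanish, mandarin))
```

A score above 1 indicates more reported Spanish than Mandarin listening time; expressing input as a ratio means that a participant who systematically over- or underestimates hours in both languages receives a similar score.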

2.4. Statistical analysis

The statistical analyses were performed in R (R Core Team, 2014) using the lme4 package (Bates et al., 2015), with p-values estimated using the lmerTest package (Kuznetsova et al., 2017). Data visualization was performed using the ggplot2 package in tidyverse (Wickham, 2016) together with the interactions (Long, 2019) and ggdist (Kay, 2024) packages.

To assess Hypothesis 1, we fitted two generalized linear mixed-effects models (GLMMs). The regression coefficients of GLMMs can assess the extent to which the listeners relied on a specific acoustic cue to perceive lexical stress. Larger coefficients indicate more reliance and a crisper categorical boundary (Morrison, 2007). The dependent variable was the participants’ binary responses (1 = SW, 0 = WS). Model 1 assessed whether the Chinese students had changed their perceptual categorization of lexical stress from T1 to T2. The independent variables included Pitch Step, Duration Step, Session (T1 vs. T2) as well as all possible interactions.

Model 2 assessed whether the Chinese students at T2 differed from Spanish natives in the perceptual categorization of lexical stress. The independent variables included Pitch Step, Duration Step, Group (Chinese students vs. Spanish natives), and all possible interactions.

For each model, we included a by-participant random intercept and by-participant random slopes for Pitch Step and Duration Step. For Model 1, as the participants were tested twice (T1 and T2), we also included Session as a by-participant random slope.

To assess Hypothesis 2, we built a linear model (Model 3) with Cue-Weighting Shift Score being the dependent variable. The independent variables included the Pitch Discrimination Score, Duration Discrimination Score, L2 Input Score, and all possible interactions. All continuous predictors were z-score transformed prior to analysis.
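The z-score transformation mentioned above can be sketched as follows. This is an illustrative Python version (the analyses themselves were run in R); the threshold values are invented, and the use of the sample standard deviation (n − 1 denominator) is an assumption.

```python
from statistics import mean, stdev


def z_scores(values: list) -> list:
    """Centre values at their mean and scale them by the sample SD."""
    m = mean(values)
    sd = stdev(values)  # sample SD (n - 1 denominator); an assumption
    return [(v - m) / sd for v in values]


# Hypothetical pitch discrimination thresholds in Hz.
thresholds_hz = [2.0, 3.5, 4.5, 6.0, 9.0]
z = z_scores(thresholds_hz)
print([round(v, 2) for v in z])
```

After the transformation, each coefficient of Model 3 refers to a change of one standard deviation in the predictor, and each main effect is evaluated at the mean of the other predictors.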

3. Results

3.1. The change of lexical stress categorization over time

Figure 1 plots the percentage of SW responses given by the participants to each of the 49 synthesized stimuli. In what follows, we report on the statistical results of the two comparisons outlined in Section 2.

Figure 1.

Proportion of strong–weak (SW) responses to the synthesized stimuli ‘PASO’ in the Spanish lexical stress perception test, split into Spanish natives, Chinese students before study abroad (SA) (T1) and after SA (T2). /a/ is the vowel of the first syllable ‘pa’; /o/ is the vowel of the second syllable ‘so’.

3.1.1. Comparing the Chinese students’ lexical stress perception from T1 to T2

The analysis of Model 1 (Table 1) revealed the following results. First, there were significant main effects of Pitch Step and Duration Step, which means that an increase in pitch or duration differences between the vowels /a/ and /o/ resulted in a significantly higher proportion of SW (paso) responses at T1, as T1 was the reference level.

Table 1.

Summary of Model 1.

	Fixed effects				Random effects (by participant)
	Log-odds	SE	z	p	SD
(Intercept)	0.45	0.13	3.56	< .001	0.64
Pitch step	2.21	0.18	11.96	< .001	0.96
Duration step	1.19	0.15	7.75	< .001	0.80
Session [T2]	−0.05	0.12	−0.38	.702	0.52
Pitch step × Duration step	0.06	0.05	1.30	.195
Pitch step × Session [T2]	0.86	0.09	9.81	< .001
Duration step × Session [T2]	0.45	0.07	6.35	< .001
Pitch step × Duration step × Session [T2]	−0.07	0.08	−0.90	.370

Notes. Model formula: response ~ pitch step * duration step * session + (duration step + pitch step + session | participant). Session is a factorial variable with dummy coding; reference level = T1.

Second, the significant two-way interactions of Pitch Step × Session and Duration Step × Session mean that the effects of Pitch Step and Duration Step on the participants’ responses changed from T1 to T2 (Figure 2). Both interaction coefficients were positive, which means that both acoustic cues had stronger effects on the participants’ responses at T2 than at T1. In other words, at T2, participants weighted both pitch and duration more heavily for lexical stress categorization, although the increase for pitch (β = 0.86) was numerically larger than that for duration (β = 0.45).
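To make the dummy-coded estimates concrete, the sketch below recombines the fixed effects reported in Table 1 into session-specific cue slopes and population-level response probabilities. It is written in Python for illustration only and ignores the random effects. The step values passed to the function are expressed in the model's own predictor coding, which is not restated in this section, so a value of 1.0 is purely hypothetical.

```python
import math

# Fixed-effect estimates (log-odds) copied from Table 1; reference level is T1.
MODEL1 = {
    "intercept": 0.45,
    "pitch": 2.21,
    "duration": 1.19,
    "session_t2": -0.05,
    "pitch_x_duration": 0.06,
    "pitch_x_t2": 0.86,
    "duration_x_t2": 0.45,
    "pitch_x_duration_x_t2": -0.07,
}


def session_dummy(session: str) -> int:
    """Dummy code Session: T1 = 0 (reference), T2 = 1."""
    if session not in ("T1", "T2"):
        raise ValueError("session must be 'T1' or 'T2'")
    return 1 if session == "T2" else 0


def cue_slopes(session: str) -> dict:
    """Pitch and duration slopes for one session, with the other cue held at 0."""
    t2 = session_dummy(session)
    return {
        "pitch": MODEL1["pitch"] + t2 * MODEL1["pitch_x_t2"],
        "duration": MODEL1["duration"] + t2 * MODEL1["duration_x_t2"],
    }


def prob_sw(pitch_step: float, duration_step: float, session: str) -> float:
    """Population-level probability of an SW response (random effects ignored)."""
    t2 = session_dummy(session)
    slopes = cue_slopes(session)
    interaction = MODEL1["pitch_x_duration"] + t2 * MODEL1["pitch_x_duration_x_t2"]
    logit = (
        MODEL1["intercept"]
        + t2 * MODEL1["session_t2"]
        + slopes["pitch"] * pitch_step
        + slopes["duration"] * duration_step
        + interaction * pitch_step * duration_step
    )
    return 1.0 / (1.0 + math.exp(-logit))


for session in ("T1", "T2"):
    s = cue_slopes(session)
    print(session, round(s["pitch"], 2), round(s["duration"], 2),
          round(prob_sw(1.0, 0.0, session), 3))
```

Under this recombination, the pitch slope rises from 2.21 at T1 to 3.07 at T2 and the duration slope from 1.19 to 1.64, so pitch remains the steeper cue in both sessions.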

Figure 2.

The proportion of strong–weak (SW) responses as a function of pitch step (left panel) and duration step (right panel) given by Chinese students split by session (T1 vs. T2).

3.1.2. Comparing the Chinese students’ lexical stress perception at T2 to that of Spanish native speakers

The analysis of Model 2 (Table 2) revealed the following results. First, there were significant main effects of Pitch Step and Duration Step. This suggests that an increase in pitch or duration differences between /a/ and /o/ led to a higher proportion of SW responses in Spanish natives, the reference group. The significant interactions of Pitch Step × Group (Figure 3, left panel) and Duration Step × Group (Figure 3, right panel) suggest that the Chinese students at T2 differed from the Spanish natives in pitch and duration cue-weighting for Spanish lexical stress categorization. Concretely, compared to Spanish natives, the Chinese students relied more on pitch (β = 2.13) but less on duration (β = −1.10) to perceive the Spanish lexical stress contrast.

Table 2.

Summary of Model 2.

	Fixed effects				Random effects (by participant)
	Log-odds	SE	z	p	SD
(Intercept)	0.06	0.16	0.39	.696	0.63
Pitch step	0.99	0.21	4.77	< .001	0.87
Duration step	2.73	0.23	11.82	< .001	0.92
Group [Chinese students T2]	0.32	0.21	1.56	.118
Pitch step × Duration step	0.16	0.07	2.27	.024
Pitch step × Group [Chinese students T2]	2.13	0.28	7.55	< .001
Duration step × Group [Chinese students T2]	−1.10	0.30	−3.71	< .001
Pitch step × Duration step × Group [Chinese students T2]	−0.13	0.10	−1.31	.191

Notes. Model formula: response ~ pitch step * duration step * group + (pitch step + duration step | participant). Group is a factorial variable with dummy coding; reference level = Spanish Natives.

Figure 3.

The proportion of strong–weak (SW) responses as a function of pitch step (left panel) and duration step (right panel) given by Spanish natives and Chinese students at T2.

Interestingly, there was a significant Pitch Step × Duration Step interaction at the reference level (Spanish natives). This means that Spanish natives’ reliance on the pitch cue for Spanish lexical stress categorization was conditioned by the duration cue. The positive coefficient (0.16) means that the effect of Pitch Step on SW responses became stronger as Duration Step increased. In other words, the Spanish natives used pitch and duration cues jointly for lexical stress categorization. By contrast, the Chinese students did not show such a pattern: Duration Step did not significantly modulate the effect of Pitch Step on their responses at either T1 or T2, as indicated by the non-significant Pitch Step × Duration Step and Pitch Step × Duration Step × Session interactions in Model 1 (Table 1).
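The conditional nature of the pitch effect can be illustrated by recombining the Model 2 estimates from Table 2. The Python sketch below computes the pitch slope at a given Duration Step for each group. The duration values of −1, 0, and +1 are hypothetical units in the model's predictor coding, and the computation yields point estimates only, without standard errors or significance tests.

```python
# Fixed-effect estimates (log-odds) copied from Table 2;
# reference level is Spanish natives.
MODEL2 = {
    "pitch": 0.99,
    "duration": 2.73,
    "pitch_x_duration": 0.16,
    "pitch_x_group": 2.13,
    "duration_x_group": -1.10,
    "pitch_x_duration_x_group": -0.13,
}


def pitch_slope(duration_step: float, chinese_t2: bool) -> float:
    """Slope of Pitch Step at a given Duration Step for one group."""
    g = 1 if chinese_t2 else 0
    base = MODEL2["pitch"] + g * MODEL2["pitch_x_group"]
    modulation = MODEL2["pitch_x_duration"] + g * MODEL2["pitch_x_duration_x_group"]
    return base + modulation * duration_step


def duration_slope_at_zero_pitch(chinese_t2: bool) -> float:
    """Slope of Duration Step with Pitch Step held at 0."""
    g = 1 if chinese_t2 else 0
    return MODEL2["duration"] + g * MODEL2["duration_x_group"]


for label, is_chinese in (("Spanish natives", False), ("Chinese students T2", True)):
    slopes = [round(pitch_slope(d, is_chinese), 2) for d in (-1.0, 0.0, 1.0)]
    print(label, slopes, round(duration_slope_at_zero_pitch(is_chinese), 2))
```

For Spanish natives the pitch slope grows from 0.83 to 1.15 across these duration values, whereas for the Chinese students at T2 it stays close to 3.12, with a modulation estimate of only 0.03 per unit of Duration Step.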

To summarize, Chinese students predominantly relied on pitch for L2 Spanish lexical stress categorization before and after SA. Spanish natives relied on both pitch and duration cues although duration was stronger. After SA, the Chinese students upweighted both pitch and duration cues for Spanish lexical stress categorization, with pitch remaining dominant.

3.2. The role of auditory acuity abilities and the amount of L2 input in L2 cue-weighting shift

Table 3 summarizes the descriptive statistics of all the scores related to this section. In the supplementary materials, we provide a descriptive figure of the percentage of SW responses given by the Chinese students to each of the 49 synthesized stimuli, split by pitch discrimination abilities and amount of L2 use.

Table 3.

Descriptive statistics of pitch discrimination score (in Hertz), duration discrimination score (in milliseconds), L2 input score, pitch cue-weighting, and cue-weighting shift score.

	M (SD)	Range
Pitch discrimination score (Hz)	4.74 (3.00)	1.85 – 13.85
Duration discrimination score (ms)	49.83 (40.01)	12.08 – 215.83
L2 input score	1.20 (0.98)	0.31 – 3.67
Pitch cue-weighting
Chinese students T1: before SA	0.67 (0.17)	0.27 – 1.00
Chinese students T2: after SA	0.66 (0.16)	0.30 – 0.97
Spanish natives	0.27 (0.14)	0.08 – 0.67
Cue-weighting shift score	0.01 (0.15)	−0.18 – 0.57

Although we included Duration Discrimination Score as an independent variable, an initial analysis did not reveal any significant main effect of Duration Discrimination Score or its interaction with other factors. We therefore removed Duration Discrimination Score from Model 3, which increased the adjusted R² from 0.49 to 0.53. The final Model 3 involved the scores of Pitch Discrimination, L2 Input, and their interaction.

Model 3 (Table 4) did not reveal significant main effects of Pitch Discrimination Score or L2 Input Score on Chinese students’ Cue-Weighting Shift Score. However, there was a significant two-way interaction of L2 Input Score × Pitch Discrimination Score. Recall that the higher the Pitch Discrimination Score, the less sensitive one is to pitch differences. The positive coefficient means that, as L2 Input Score increased, those who were less accurate in perceiving the pitch cue showed a higher Cue-Weighting Shift Score after SA. In other words, participants with lower pitch acuity may not have sufficient access to pitch information. Therefore, an increased amount of L2 input may have drawn their attention to alternative cues such as duration and shifted their cue-weighting patterns towards native norms. As both independent variables are continuous, Figure 4 plots three regression lines of Cue-Weighting Shift Score as a function of L2 Input Score, estimated at the mean of the Pitch Discrimination Score and at one standard deviation above (+1 SD) and below (−1 SD) the mean, respectively.

Table 4.

Summary of Model 3.

	β	SE	t	p
(Intercept)	−0.00	0.02	−0.07	.945
L2 input score	0.01	0.02	0.52	.611
Pitch discrimination score	0.03	0.02	1.70	.103
L2 input score × Pitch discrimination score	0.09	0.02	5.02	< .001

Notes. Multiple R² = 0.58; Adjusted R² = 0.53. Independent variables are centered at zero.
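The interaction in Table 4 can be unpacked as simple slopes, which correspond to the three pitch-discrimination levels described for Figure 4. The Python sketch below recombines the reported coefficients and also checks the adjusted R² against the multiple R². The sample size of 30 is taken from the abstract on the assumption that all learners entered Model 3, and the standard adjusted R² formula is assumed.

```python
# Coefficients copied from Table 4 (predictors are z-scored).
MODEL3 = {
    "intercept": -0.00,
    "input": 0.01,
    "pitch": 0.03,
    "input_x_pitch": 0.09,
}


def input_slope(pitch_z: float) -> float:
    """Slope of L2 Input Score on the shift score at a given Pitch Discrimination z."""
    return MODEL3["input"] + MODEL3["input_x_pitch"] * pitch_z


def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Standard adjusted R-squared for n observations and k predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Higher Pitch Discrimination Score means LOWER pitch acuity.
for label, z in (("-1 SD (higher acuity)", -1.0),
                 ("mean", 0.0),
                 ("+1 SD (lower acuity)", 1.0)):
    print(label, round(input_slope(z), 2))

# Assumption: n = 30 learners, k = 3 terms (two main effects and one interaction).
print(round(adjusted_r2(0.58, n=30, k=3), 2))
```

The slope of L2 input is about 0.10 for learners one SD above the mean threshold (lower acuity), about 0.01 at the mean, and about −0.08 for learners one SD below it (higher acuity), which is the crossover pattern summarized in the text. Under the stated assumptions, the adjusted R² computed from the multiple R² of 0.58 rounds to the reported 0.53.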

Figure 4.

Chinese students’ cue-weighting shift after study abroad (SA) in perceiving L2 Spanish lexical stress predicted by the pitch discrimination ability and the amount of L2 input during SA.

4. Discussion and conclusions

This longitudinal study investigated how individual differences in domain-specific auditory acuity and the amount of L2 input affect the cue-weighting shift in L2 perceptual categorization after a Study Abroad (SA) program. Overall, the L1-Mandarin–L2-Spanish learners showed a more categorical perception pattern of Spanish lexical stress after SA. The Chinese students upweighted both pitch and duration for Spanish lexical stress perception, with pitch always being the primary perceptual cue. There were considerable individual differences among the participants. Specifically, learners with lower pitch acuity showed more cue-weighting shift towards the Spanish norms given more Spanish input during SA. In what follows, we will organize our discussion by addressing the hypotheses.

Our results confirmed Hypothesis 1 that after SA, the L2 learners would show clearer lexical stress categories than before SA. This is supported by the significantly larger coefficients of both Pitch Step and Duration Step at T2 than at T1. The enhanced effects of pitch and duration can be viewed as an improvement in suprasegmental categorization after SA. However, pitch remained the primary perceptual cue. Specifically, while learners significantly upweighted duration at T2, they upweighted pitch to a greater extent (see Model 1). Consequently, despite the improvement in duration use, the group-level results at T2 showed a larger divergence from the native norms compared to T1. Previous studies revealed similar findings among L1 Mandarin listeners with long-term immersion in the target L2 society, showing a tendency to place more weight on pitch in suprasegmental categorization (Li and Xi, 2024b; Petrova et al., 2023; Wang et al., 2024). The group-level analysis suggests that a cue-weighting shift is not easy to observe at the group level, even in an L2 immersion context. However, due to considerable individual differences among learners, the picture may be different when examining the SA effects at the individual level.

More importantly, the Chinese students’ duration-cue reliance became larger after SA, which suggests a potential cue-weighting shift at the group level. This finding conforms to a previous study where long-term residence in the target L2 society led to more duration-cue reliance for Mandarin listeners’ L2 suprasegmental categorization, compared to short-term residence (Petrova et al., 2023). In practice, Spanish lexical stress can be marked by multiple cues, including pitch, duration, and intensity (Hualde, 2012; Hualde and Prieto, 2015; Ortega-Llebaria and Prieto, 2011). Learners are free to choose whichever cue(s) to identify lexical stress in real-life communication. If paying attention to pitch can already identify stress most of the time, learners are likely to reinforce this perceptual strategy. As a result, the learners did not show a trade-off between pitch and duration weightings. Instead, SA led to a more categorical perceptual pattern with both cues, which allowed learners to integrate new strategies without abandoning the L1-transferred acoustic cues. However, at the sentence level, pitch is not always a reliable cue for Spanish lexical stress, whereas duration plays a more robust role (Torreira et al., 2014). Therefore, increased exposure to the L2 during SA may lead learners to start shifting their perceptual cue-weighting, although with considerable individual variation.

Our data support Hypothesis 2 that, at the individual level, auditory acuity interacts with the amount of L2 input in shaping the cue-weighting shift. Specifically, lower pitch acuity may signal a more flexible cue-weighting shift, which leads participants to reduce pitch-cue reliance relative to duration to perceive lexical stress after SA given sufficient L2 input. By contrast, more L2 input could lead listeners with higher pitch acuity to rely more on pitch cues. In general, tone language speakers are sensitive to pitch changes and thus show stable pitch-cue reliance in the perception of speech and non-speech stimuli (Bidelman et al., 2013; Chandrasekaran et al., 2009; Krishnan et al., 2009). However, our data suggest that even among tone language speakers, individual differences in auditory acuity still result in different L2 learning outcomes. Participants with lower pitch acuity can recalibrate pitch-cue reliance with an increased amount of L2 input. This is likely because they paid less attention to pitch information, which reduced the significance of pitch as a critical acoustic cue in their processing of suprasegmental features. More importantly, our findings can possibly explain why previous studies revealed limited SA effects on L2 perceptual cue-weighting shift (Cebrian, 2006; Ingvalson et al., 2011): such effects might have been obscured by individual variability across participants.

Duration discrimination ability did not significantly predict suprasegmental cue-weighting shift in L2 perceptual categorization. This finding aligns with the distinction between auditory acuity tested with non-speech stimuli and perceptual performance in L2 speech. For instance, L1 Japanese listeners can discriminate F3 differences in non-speech stimuli yet fail to perceive the English /r/–/l/ distinction using the critical F3 cue (Miyawaki et al., 1975). Perhaps auditory acuity reflects lower-order processing abilities (Xu et al., 2024). It cannot fully override the L1-to-L2 cue-weighting transfer in lexical stress perception, which happens at a higher-order cognitive level. However, more empirical evidence is needed to test this hypothesis. Moreover, a specific domain of auditory acuity may affect L2 learners’ cue-weighting in perceptual learning only when the acoustic cue is relevant in their L1. In our case, duration is a secondary acoustic cue for Mandarin word-level prosody, so it may not be easy to detect a significant predictive role of duration acuity in perceiving L2 word-level prosody. Liu (2022) showed that when perceiving the L2 English /d–t/ contrast, Mandarin listeners’ duration acuity was related to their duration-cue reliance (VOT), but their pitch acuity and pitch-cue reliance (initial F0) did not correlate. This mirrors our results, as duration (VOT) is more relevant than pitch for the Mandarin stop contrast, whereas pitch is more relevant than duration for Mandarin word-level prosody. In line with Liu’s (2022) study, another longitudinal study by our team showed that duration acuity predicted Chinese students’ change in the categorical perception of L2 Spanish stops using VOT as the sole acoustic cue (Xi et al., 2026). The role of auditory acuity in L2 cue-weighting seems to be conditional on the relevance of the specific acoustic cue in the learners’ L1. Finally, our sample size and length of immersion might not have provided sufficient power to detect significant effects of duration acuity. Future studies might want to continue exploring this hypothesis with a larger group of learners over a longer period of observation.

The current study makes several theoretical contributions. First, we provide longitudinal evidence showing that learners’ L1 cue-weighting largely determines their L2 cue-weighting (Francis and Nusbaum, 2002; Francis et al., 2000; Kondaurova and Francis, 2008; Zhang and Francis, 2010), with a less-studied language pair, Mandarin–Spanish. In our case, pitch is consistently the primary cue for L1 Mandarin listeners’ perception of L2 lexical stress at the group level. Second, our results expand the cue-weighting theory by showing that the cue-weighting shift in L2 perception is driven by two key factors: individual auditory acuity and the amount of L2 input. In other words, although adults show consistent cue-weighting strategies for speech perception (Jasmin et al., 2020), there is still potential to recalibrate the cue-weighting patterns. This implies that, provided with proper training methods, L2 learners can gradually shift their perceptual strategies of L2 suprasegmental categorization towards the native norms.

The current study has certain limitations that future studies might want to address. First, our Chinese students were all graduates who moved to Spain for a master’s program taught in Spanish rather than to learn the Spanish language per se. It was therefore not possible to have a parallel ‘at-home’ control group in China. A future study might want to focus on learners of the Spanish language to advance this topic. Second, as cue-weighting shift typically takes a long time, a longitudinal study with a longer time frame would be welcome for better understanding the development of L2 perceptual learning. In addition, we relied on self-reports to measure L2 input. Although we carefully controlled for estimation bias by using an L2/L1 ratio, future studies may want to use more ecologically valid methods, such as language diaries or objective tracking, to quantify input more precisely. It would be useful to record changes in L2 use during SA rather than surveying it after SA, in order to dynamically model how it links to changes in the dependent variable, such as cue-weighting shift (for comprehensive methodological recommendations, see Nagle and Zárate-Sández, 2024). Finally, as noted by an anonymous reviewer, learners’ cue-weighting shift may be a proxy of L2 proficiency change after SA. In the present study, however, learners reported their Spanish proficiency levels based on different official tests. Therefore, the proficiency data were not a reliable individual variable for statistical analysis. A replication study using a uniform, dedicated Spanish proficiency assessment would be welcome to examine potential relationships between L2 proficiency gains and cue-weighting shifts.

To conclude, the current study provided longitudinal evidence in a less-investigated language pair – L1-Mandarin–L2-Spanish – to demonstrate that individual differences in auditory acuity and the amount of L2 input jointly affect adult learners’ cue-weighting strategies in L2 suprasegmental perception. Therefore, adult learners still show flexibility in L2 speech learning given proper conditions. Our findings highlight the need for taking individual approaches when investigating the development of L2 learning, especially in naturalistic settings.

Supplemental Material

Supplemental material, sj-docx-1-slr-10.1177_02676583261441156 for The interaction of auditory acuity and L2 input in suprasegmental cue-weighting shifts after study abroad: Mandarin speakers’ perception of Spanish lexical stress by Peng Li, Xiaotong Xi and Clara D. Martin in Second Language Research

Acknowledgements

The authors would like to thank Sara Coego for her support in creating the experimental stimuli.

Ethical approval

This study was approved by the Norwegian Agency for Shared Services in Education and Research (SIKT) under the project title Spanish pronunciation by Chinese learners (reference number 732455).

Consent for publication

All participants provided written consent, allowing the researchers to collect and analyse their personal data for research purposes.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research is supported by the Basque Government through the BERC 2022–25 program and is funded by the Spanish State Research Agency through BCBL Severo Ochoa excellence accreditation (CEX2020-001010/AEI/10.13039/501100011033), the Spanish Ministry of Science and Innovation through the Juan de la Cierva program funded by ‘NextGenerationEU’/PRTR (JDC2022-048729-I to P.L.), the Spanish Ministry of Economy and Competitiveness (PID2023-148756NB-I00 to C.D.M.), and the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement number 819093 to C.D.M.).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability statement

All materials, data, and statistical analyses are publicly available at the following OSF repository: .

References

Bates, Mächler, Bolker, et al. (2015) Fitting linear mixed-effects models using lme4. Journal of Statistical Software 67(1): 1–48. https://doi.org/10.18637/jss.v067.i01

Bidelman, Hutka and Moreno (2013) Tone language speakers and musicians share enhanced perceptual and cognitive abilities for musical pitch: Evidence for bidirectionality between the domains of language and music. PLoS One 8(4): Article e60676. https://doi.org/10.1371/journal.pone.0060676

Boersma and Weenink (2017) Praat: Doing Phonetics by Computer [computer program]. Available at: https://www.praat.org (accessed April 2026).

Bohn O-S (1995) Cross-language perception in adults: First language transfer doesn’t tell it all. In: Strange (ed.) Speech Perception and Linguistic Experience: Issues in Cross-Language Research. York Press, pp.379–410.

Bowles, Chang and Karuzis (2016) Pitch ability as an aptitude for tone learning. Language Learning 66(4): 774–808. https://doi.org/10.1111/lang.12159

Braun, Lemhöfer and Mani (2011) Perceiving unstressed vowels in foreign-accented English. The Journal of the Acoustical Society of America 129(1): 376–387. https://doi.org/10.1121/1.3500688

Cebrian (2006) Experience and the use of non-native duration in L2 vowel categorization. Journal of Phonetics 34(3): 372–387. https://doi.org/10.1016/j.wocn.2005.08.003

Chandrasekaran, Krishnan and Gandour (2009) Relative influence of musical and linguistic experience on early cortical processing of pitch contours. Brain and Language 108(1): 1–9. https://doi.org/10.1016/j.bandl.2008.02.001

Chandrasekaran, Sampath and Wong PCM (2010) Individual variability in cue-weighting and lexical tone learning. The Journal of the Acoustical Society of America 128(1): 456–465. https://doi.org/10.1121/1.3445785

Chen (2006) Production of weak elements in speech – evidence from F₀ patterns of neutral tone in Standard Chinese. Phonetica 63(1): 47–75. https://doi.org/10.1159/000091406

Chui Y-T and Qin (2024) Individual differences in the distributional learning and overnight consolidation of the Mandarin level-falling tone contrast. The Journal of the Acoustical Society of America 156(6): 4256–4268. https://doi.org/10.1121/10.0034717

Cooper and Wang (2012) The influence of linguistic and musical experience on Cantonese word learning. The Journal of the Acoustical Society of America 131(6): 4756–4769. https://doi.org/10.1121/1.4714355

Cutler (1986) Forbear is a homophone: Lexical prosody does not constrain lexical access. Language and Speech 29(3): 201–220. https://doi.org/10.1177/002383098602900302

Cutler, Wales, Cooper, et al. (2007) Dutch listeners’ use of suprasegmental cues to English stress. In: Proceedings of the 16th international congress of phonetic sciences (eds Trouvain and Barry). Universität des Saarlandes, pp.1913–1916.

Díaz-Campos (2004) Context of learning in the acquisition of Spanish second language phonology. Studies in Second Language Acquisition 26(2): 249–273. https://doi.org/10.1017/S0272263104262052

Duanmu (2007) The Phonology of Standard Mandarin. Oxford University Press. https://doi.org/10.1017/S0025100316000359

Escudero and Boersma (2004) Bridging the gap between L2 speech perception research and phonological theory. Studies in Second Language Acquisition 26(4): 551–585. https://doi.org/10.1017/S0272263104040021

Fear, Cutler and Butterfield (1995) The strong/weak syllable distinction in English. The Journal of the Acoustical Society of America 97(3): 1893–1904. https://doi.org/10.1121/1.412063

Flege and Bohn O-S (2021) The Revised Speech Learning Model (SLM-r). In: Wayland (ed.) Second Language Speech Learning: Theoretical and Empirical Progress. Cambridge University Press, pp.3–83. https://doi.org/10.1017/9781108886901.002

Francis, Baldwin and Nusbaum (2000) Effects of training on attention to acoustic cues. Perception and Psychophysics 62(8): 1668–1680. https://doi.org/10.3758/BF03212164

Francis and Nusbaum (2002) Selective attention and the acquisition of new phonetic categories. Journal of Experimental Psychology: Human Perception and Performance 28(2): 349–366. https://doi.org/10.1037/0096-1523.28.2.349

Gilbert, Honda, Friedland-Yust, et al. (2026) The influence of individual differences in language experience on lexical stress cue-weighting: Native and non-native listeners. Brain and Language 273: Article 105674. https://doi.org/10.1016/j.bandl.2025.105674

Hillenbrand, Getty, Clark, et al. (1995) Acoustic characteristics of American English vowels. The Journal of the Acoustical Society of America 97(5 Part 1): 3099–3111. https://doi.org/10.1121/1.411872

Holliday (2014) The perceptual assimilation of Korean obstruents by native Mandarin listeners. The Journal of the Acoustical Society of America 135(3): 1585–1595. https://doi.org/10.1121/1.4863653

Holt and Lotto (2006) Cue weighting in auditory categorization: Implications for first and second language acquisition. The Journal of the Acoustical Society of America 119(5): 3059–3071. https://doi.org/10.1121/1.2188377

Hualde (2012) Stress and rhythm. In: Hualde, Olarrea and O’Rourke (eds) The Handbook of Hispanic Linguistics. Blackwell, pp.153–171. https://doi.org/10.1002/9781118228098.ch8

Hualde and Prieto (2015) Intonational variation in Spanish: European and American varieties. In: Frota and Prieto (eds) Intonation in Romance. Oxford University Press, pp.350–391. https://doi.org/10.1093/acprof:oso/9780199685332.003.0010

Ingvalson, McClelland and Holt (2011) Predicting native English-like performance by native Japanese speakers. Journal of Phonetics 39(4): 571–584. https://doi.org/10.1016/j.wocn.2011.03.003

Iverson and Kuhl (1994) Tests of the perceptual magnet effect for American English /r/ and /l/. The Journal of the Acoustical Society of America 95(5 Suppl.): 2976. https://doi.org/10.1121/1.408983

Iverson, Kuhl, Akahane-Yamada, et al. (2003) A perceptual interference account of acquisition difficulties for non-native phonemes. Cognition 87(1): B47–B57. https://doi.org/10.1016/S0010-0277(02)00198-1

Jasmin, Dick, Holt, et al. (2020) Tailored perception: Individuals’ speech and music perception strategies fit their perceptual abilities. Journal of Experimental Psychology: General 149(5): 914–934. https://doi.org/10.1037/xge0000688

Jasmin, Tierney, Obasih, et al. (2023) Short-term perceptual reweighting in suprasegmental categorization. Psychonomic Bulletin and Review 30(1): 373–382. https://doi.org/10.3758/s13423-022-02146-5

Kachlicka, Saito and Tierney (2019) Successful second language learning is tied to robust domain-general auditory processing and stable neural representation of sound. Brain and Language 192: 15–24. https://doi.org/10.1016/j.bandl.2019.02.004

Kay (2024) ggdist: Visualizations of distributions and uncertainty in the Grammar of graphics. IEEE Transactions on Visualization and Computer Graphics 30(1): 414–424. https://doi.org/10.1109/TVCG.2023.3327195

Kempe, Bublitz and Brooks (2015) Musical ability and non-native speech-sound processing are linked through sensitivity to pitch and spectral information. British Journal of Psychology 106(2): 349–366. https://doi.org/10.1111/bjop.12092

Kim, Clayards and Goad (2018) A longitudinal study of individual differences in the acquisition of new vowel contrasts. Journal of Phonetics 67: 1–20. https://doi.org/10.1016/j.wocn.2017.11.003

Kondaurova and Francis (2008) The relationship between native allophonic experience with vowel duration and perception of the English tense/lax vowel contrast by Spanish and Russian listeners. The Journal of the Acoustical Society of America 124(6): 3959–3971. https://doi.org/10.1121/1.2999341

Kong (2019) Individual differences in categorical perception: L1 English learners’ L2 perception of Korean stops. Phonetics and Speech Sciences 11(4): 63–70. https://doi.org/10.13064/KSSS.2019.11.4.063

Kong and Kang (2023) Individual differences in categorical judgment of L2 stops: A link to proficiency and acoustic cue-weighting. Language and Speech 66(2): 354–380. https://doi.org/10.1177/00238309221108647

Krishnan, Swaminathan and Gandour (2009) Experience-dependent enhancement of linguistic pitch representation in the brainstem is not specific to a speech context. Journal of Cognitive Neuroscience 21(6): 1092–1105. https://doi.org/10.1162/jocn.2009.21077

Kuznetsova, Brockhoff and Christensen RHB (2017) lmerTest package: Tests in linear mixed effects models. Journal of Statistical Software 82(13): 1–26. https://doi.org/10.18637/jss.v082.i13

Lee, Yamaguchi and Fougeron (2022) Why is Korean lenis stop difficult to perceive for L2 Korean learners? In: Proceedings of Interspeech 2022. International Speech Communication Association (ISCA), pp.1861–1865. https://doi.org/10.21437/Interspeech.2022-10912

Lee W-S and Zee (2008) Prosodic characteristics of the neutral tone in Beijing Mandarin. Journal of Chinese Linguistics 36(1): 1–29.

Lengeris and Hazan (2010) The effect of native vowel processing ability and frequency discrimination acuity on the phonetic training of English vowels for native speakers of Greek. The Journal of the Acoustical Society of America 128(6): 3757–3768. https://doi.org/10.1121/1.3506351

(2022) Spanish lexical stress produced by proficient Mandarin learners of Spanish. In: Proceedings of the 4th international symposium on applied phonetics (eds Ishihara and Tronnier). International Speech Communication Association (ISCA), pp.41–45. https://doi.org/10.21437/ISAPh.2022-8

(2023) Nuclear contours of Spanish echo questions produced by proficient Chinese learners of Spanish: A dynamic analysis. In: Proceedings of the 20th international congress of phonetic sciences (eds Skarnitzl and Volín). International Phonetic Association, pp.1201–1205.

47.

(2024a) Nuclear contours of Spanish statements and questions produced by Mandarin speakers with advanced Spanish Proficiency. Phonica 20: 1–26. https://doi.org/10.1344/phonica2024.20.3

48. (2024b) The perception of Spanish lexical stress by proficient Mandarin learners of Spanish. In: Proceedings of the 12th international conference on speech prosody (eds Chen, Chen, Arvaniti). International Speech Communication Association (ISCA), pp.354–358. https://doi.org/10.21437/SpeechProsody.2024-72

49. Ioannidou, Marazzina, et al. (2026) The predictive roles of musical aptitude, auditory abilities, and working memory in L2 speech imitation: Differences between familiar and unfamiliar languages. Vigo International Journal of Applied Linguistics 23: 115–144. https://doi.org/10.35869/vial.v0i23.5772

50. Zhang, et al. (2020) Language History Questionnaire (LHQ3): An enhanced tool for assessing multilingual experience. Bilingualism: Language and Cognition 23(5): 938–944. https://doi.org/10.1017/S1366728918001153

51. Liu (2022) Individual differences in processing non-speech acoustic signals influence cue weighting strategies for L2 speech contrasts. Journal of Psycholinguistic Research 51(4): 903–916. https://doi.org/10.1007/s10936-022-09869-5

52. Long (2019) Interactions: Comprehensive, User-Friendly Toolkit for Probing Interactions. Available at: https://cran.r-project.org/package=interactions (accessed April 2026).

53. Martínez García, Holliday (2019) The perception of Korean stops by native speakers of Spanish. In: Proceedings of the 19th international congress of phonetic sciences (eds Calhoun, Escudero, Tabain, et al.). Australasian Speech Science and Technology Association, pp.2585–2589.

54. Miyawaki, Jenkins, Strange, et al. (1975) An effect of linguistic experience: The discrimination of [r] and [l] by native speakers of Japanese and English. Perception and Psychophysics 18(5): 331–340. https://doi.org/10.3758/BF03211209

55. Mora-Plaza, Saito, Suzukida, et al. (2022) Tools for Second Language Speech Research and Teaching. Available at: http://sla-speech-tools.com (accessed April 2026). http://doi.org/10.17616/R31NJNAX

56. Morrison G-S (2007) Logistic regression modelling for first and second language perception data. In: Prieto, Mascaró, Solé M-J (eds) Segmental and Prosodic Issues in Romance Phonology. Current Issues in Linguistic Theory. John Benjamins, pp.219–236. Available at: https://benjamins.com/catalog/cilt.282.15mor (accessed April 2026).

57. Muñoz, Llanes (2014) Study abroad and changes in degree of foreign accent in children and adults. The Modern Language Journal 98(1): 432–449. https://doi.org/10.1111/j.1540-4781.2014.12059.x

58. Nagle, Zárate-Sández (2024) The phonetics and phonology of adult L2 learners after study abroad. In: Amengual (ed.) The Cambridge Handbook of Bilingual Phonetics and Phonology. Cambridge University Press, pp.542–559. https://doi.org/10.1017/9781009105767.025

59. Ortega-Llebaria, Fan (2013) English speakers’ perception of Spanish lexical stress: Context-driven L2 stress perception. Journal of Phonetics 41(3): 186–197. https://doi.org/10.1016/j.wocn.2013.01.006

60. Ortega-Llebaria, Prieto (2011) Acoustic correlates of stress in Central Catalan and Castilian Spanish. Language and Speech 54(1): 73–97. https://doi.org/10.1177/0023830910388014

61. Ortín, Simonet (2022) Phonological processing of stress by native English speakers learning Spanish as a second language. Studies in Second Language Acquisition 44(2): 460–482. https://doi.org/10.1017/S0272263121000309

62. Ortín, Simonet (2023) Perceptual sensitivity to stress in native English speakers learning Spanish as a second language. Laboratory Phonology 14(1), Article 6. https://doi.org/10.16995/labphon.7978

63. Petrova, Jasmin, Saito, et al. (2023) Extensive residence in a second language environment modifies perceptual strategies for suprasegmental categorization. Journal of Experimental Psychology: Learning, Memory, and Cognition 49(12): 1943–1955. https://doi.org/10.1037/xlm0001246

64. Qin, Chien Y-F, Tremblay (2017) Processing of word-level stress by Mandarin-speaking second language learners of English. Applied Psycholinguistics 38(3): 541–570. https://doi.org/10.1017/S0142716416000321

65. Qin, Jin, Zhang (2022) The effects of training variability and pitch aptitude on the overnight consolidation of lexical tones. Journal of Speech, Language, and Hearing Research 65(9): 3377–3391. https://doi.org/10.1044/2022_JSLHR-22-00058

66. Qin, Zhang, Wang WS-Y (2021) The effect of Mandarin listeners’ musical and pitch aptitude on perceptual learning of Cantonese level-tones. The Journal of the Acoustical Society of America 149(1): 435–446. https://doi.org/10.1121/10.0003330

67. (2013) Representation and acquisition of the tonal system of Mandarin Chinese. Doctoral Dissertation, McGill University, Canada.

68. R Core Team (2019) R: A language and environment for statistical computing [software]. Vienna: R Foundation for Statistical Computing. Available at: https://www.R-project.org (accessed April 2026).

69. Romanelli, Menegotto (2015) English speakers learning Spanish: Perception issues regarding vowels and stress. Journal of Language Teaching and Research 6(1): 30–42. https://doi.org/10.17507/jltr.0601.04

70. Romanelli, Menegotto, Smyth (2015) Stress perception: Effects of training and a study abroad program for L1 English late learners of Spanish. Journal of Second Language Pronunciation 1(2): 181–210. https://doi.org/10.1075/jslp.1.2.03rom

71. Sagarra, Fernández-Arroyo, Lozano-Argüelles, et al. (2024) Unraveling the complexities of second language lexical stress processing: The impact of first language transfer, second language proficiency, and exposure. Language Learning 74(3): 574–605. https://doi.org/10.1111/lang.12627

72. Saito (2023) How does having a good ear promote successful second language speech acquisition in adulthood? Introducing Auditory Precision Hypothesis-L2. Language Teaching 56(4): 522–538. https://doi.org/10.1017/S0261444822000453

73. Saito, Cui, Suzukida, et al. (2022a) Does domain-general auditory processing uniquely explain the outcomes of second language speech acquisition, even once cognitive and demographic variables are accounted for? Bilingualism: Language and Cognition 25(5): 856–868. https://doi.org/10.1017/S1366728922000153

74. Saito, Kachlicka, Sun, et al. (2020a) Domain-general auditory processing as an anchor of post-pubertal second language pronunciation learning: Behavioural and neurophysiological investigations of perceptual acuity, age, experience, development, and attainment. Journal of Memory and Language 115: Article 104168. https://doi.org/10.1016/j.jml.2020.104168

75. Saito, Kachlicka, Suzukida, et al. (2022b) Auditory precision hypothesis-L2: Dimension-specific relationships between auditory processing and second language segmental learning. Cognition 229: Article 105236. https://doi.org/10.1016/j.cognition.2022.105236

76. Saito, Macmillan, Kroeger, et al. (2022c) Roles of domain-general auditory processing in spoken second-language vocabulary attainment in adulthood. Applied Psycholinguistics 43(3): 581–606. https://doi.org/10.1017/S0142716422000029

77. Saito, Sun, Tierney (2020b) Domain-general auditory processing determines success in second language pronunciation learning in adulthood: A longitudinal study. Applied Psycholinguistics 41(5): 1083–1112. https://doi.org/10.1017/S0142716420000491

78. Saito, Suzukida, Tran, et al. (2021) Domain-general auditory processing partially explains second language speech learning in classroom settings: A review and generalization study. Language Learning 71(3): 669–715. https://doi.org/10.1111/lang.12447

79. Small, Simon, Goldberg (1988) Lexical stress and lexical access: Homographs versus nonhomographs. Perception and Psychophysics 44(3): 272–280. https://doi.org/10.3758/BF03206295

80. Stevens (2011) Vowel duration in second language Spanish vowels: Study abroad versus at-home learners. The Arizona Working Papers in Second Language Acquisition and Teaching 18: 77–103.

81. Sun, Saito, Dewaele J-M (2024) Cognitive and sociopsychological individual differences, experience, and naturalistic second language speech learning: A longitudinal study. Language Learning 74(1): 5–40. https://doi.org/10.1111/lang.12561

82. Torreira, Simonet, Hualde (2014) Quasi-neutralization of stress contrasts in Spanish. In: Speech Prosody 2014 (eds Campbell, Gibbon, Hirst). International Speech Communication Association (ISCA), pp.197–201. https://doi.org/10.21437/SpeechProsody.2014-27

83. Tremblay, Broersma, Zeng, et al. (2021) Dutch listeners’ perception of English lexical stress: A cue-weighting approach. The Journal of the Acoustical Society of America 149(6): 3703–3714. https://doi.org/10.1121/10.0005086

84. Turner (2024) The role of L2 input in developing a novel L2 contrast phonetically and phonologically: Production evidence from a residence abroad context. Second Language Research 41(1): 103–133. https://doi.org/10.1177/02676583231217166

85. Wang, Deng, Tang, et al. (2024) The effect of pitch accent on the perception of English lexical stress: Evidence from English and Mandarin Chinese listeners. Languages 9(87): 1–17. https://doi.org/10.3390/languages9030087

86. Wickham (2016) ggplot2: Elegant Graphics for Data Analysis. Springer. Available at: https://ggplot2.tidyverse.org (accessed April 2026).

87. Wong PCM, Perrachione (2007) Learning pitch patterns in lexical identification by native English-speaking adults. Applied Psycholinguistics 28(4): 565–585. https://doi.org/10.1017/S0142716407070312

88. Shea (2026) Perceptual development of second language sound categories in a study abroad program: L1 Mandarin speakers in Spain. Language and Speech. Epub ahead of print 2026. https://doi.org/10.1177/00238309261434241

89. Saito, Mora-Plaza (2024) Task-based pronunciation teaching: Lack of auditory precision but not memory hinders learning. System 127: Article 103532. https://doi.org/10.1016/j.system.2024.103532

90. (1997) Contextual tonal variations in Mandarin. Journal of Phonetics 25(1): 61–83. https://doi.org/10.1006/jpho.1996.0034

91. Yazawa, Kondo, Escudero (2017) Modelling Japanese speakers’ perceptual listening of English /iː/ and /ɪ/ within the L2LP framework. In: Proceedings of the phonetic teaching and learning conference 2017. University College London, pp.115–119.

92. Yazawa, Whang, Kondo, et al. (2020) Language-dependent cue weighting: An investigation of perception modes in L2 learning. Second Language Research 36(4): 557–581. https://doi.org/10.1177/0267658319832645

93. (2023) The perceptual cue weighting of English tense–lax vowel contrasts by first language (L1) and second language (L2) speakers of English. Doctoral Dissertation, Indiana University, IN, USA.

94. Zhang, Francis (2010) The weighting of vowel quality in native and non-native listeners’ perception of English lexical stress. Journal of Phonetics 38(2): 260–271. https://doi.org/10.1016/j.wocn.2009.11.002

95. Zheng, Saito, Tierney (2022) Successful second language pronunciation learning is linked to domain-general auditory processing rather than music aptitude. Second Language Research 38(3): 477–497. https://doi.org/10.1177/0267658320978493

96. Zhou, Veríssimo (2026) L2 difficulties in the perception of Mandarin tones: Phonological universals or domain-general aptitude? Bilingualism: Language and Cognition 29(2): 405–419. https://doi.org/10.1017/S1366728925100114
