Abstract
The current study aimed to explore how individual differences in lower-order perceptual abilities (auditory processing) and higher-order cognitive abilities (declarative and procedural memory) could be linked to various dimensions of second language (L2) speech production (segmentals, prosody, fluency, and lexicogrammar) in a group of 47 late Spanish–English bilinguals with varying levels of L2 proficiency and immersion experience. The statistical analyses revealed several key findings: (1) auditory processing showed significant correlations with L2 speech proficiency, particularly among learners with short-to-mid immersion experience (length of residence [LOR] = 1–5 years); (2) a robust triangular relationship (r = .7–.8) was observed between procedural memory, the duration of immersion experience, and L2 speech outcomes across the entire participant group (LOR = 1–20 years); and (3) declarative memory did not exhibit any significant associations with the factors related to experience and proficiency. Based on these results, it is tentatively suggested that auditory processing abilities may support the development of L2 speech in the initial stages of immersion (LOR = 1–5 years), while procedural memory skills may play a crucial role in achieving high-level L2 speech proficiency, as individuals have more opportunities to practice L2 pronunciation through active usage during an extended stay in a country where the L2 is spoken.
I Introduction
Over the past five decades, scholars have been actively involved in an extensive discourse examining factors such as the age of second language (L2) acquisition and experience (Munro et al., 2025) that influence L2 acquisition after puberty, further highlighting the crucial role of aptitude in attaining advanced levels of L2 proficiency (Skehan, 2016).
Traditionally, aptitude was operationalized as higher-order cognitive abilities, such as working memory and attentional control (for a comprehensive overview of aptitude, see Li, 2016). While earlier research has primarily focused on the impact of such cognitive abilities, including declarative and procedural memory, on the morphosyntactic aspects of L2 learning (e.g. Hamrick, 2015; Morgan-Short et al., 2014), the specific contribution of these abilities to mastering L2 speech remains unclear.
In recent years, there has been a noticeable shift towards investigating the significance of lower-order perceptual and sensory abilities (i.e. auditory processing) in L2 speech development (Kachlicka et al., 2019; Saito et al., 2020). While some argue that auditory processing can be considered as a dynamic interplay of perceptual, cognitive, and motor abilities (Kraus and Banai, 2007; Saito et al., 2024), this ability has been essentially operationalized as a bottom-up processing of basic acoustic information (pitch, duration, and intensity).
There is emerging evidence that both perceptual and cognitive individual differences (auditory processing and memory) interact to influence the outcomes of L2 speech perception (Saito, Cui, Suzukida, et al., 2022). However, the relationship between auditory processing, memory, and L2 speech production has yet to be examined. The present study, involving 47 Spanish–English bilinguals with diverse experience and proficiency levels, aims to clarify how experience, auditory processing, and memory relate to multiple dimensions of L2 speech proficiency, with a specific emphasis on phonology and lexicogrammar.
II Background
1 Auditory processing
In the context of first language (L1) acquisition, Kraus and Banai (2007) proposed the interaction view that auditory processing is a multifaceted phenomenon, a complex interplay of perceptual, cognitive, and motor abilities operating on different levels (for the context of L2 acquisition, see Saito et al., 2024). In general, however, auditory processing has been considered as a relatively low-order, perceptual ability to encode acoustic details of sounds.
Temporal acuity refers to an individual’s ability to discern changes in amplitude over time. This ability is crucial for diverse aspects related to fluency, such as phonation time and pause frequency, and segmental contrasts like short vs. long vowels. Temporal acuity also impacts prosody, including durational differences in weak vs. strong vowels, voice onset time, and rhythm, such as the duration between two syllables or stressed syllables vs. morae (Tallal et al., 1993). Spectral processing, on the other hand, pertains to an individual’s ability to track changes in the frequency content of a signal, such as pitch (the frequency of vocal cord vibration) and formants (the energy concentration at different frequency bands). This ability is essential for the appropriate assignment of stress and intonation, fine-tuning to isochrony (e.g. syllable-timed vs. stress-timed), and the refinement of segmental accuracy (e.g. third formant variability for English [r] and [l]) (Saito, Kachlicka, Suzukida, et al., 2022).
Auditory processing is a pivotal element in the acquisition of an L1, and variations in auditory processing capabilities can significantly influence the pace at which L1 is acquired. As children evolve through the process of L1 acquisition, they become increasingly proficient in distinguishing temporal and spectral cues within their native language(s). However, these improvements do not necessarily translate to the perception of non-native language features, with a plateau often being reached by the end of their first year. Auditory acuity, a key aspect of auditory processing, develops and typically peaks around the ages of 7 to 10 years. Following this peak, there is a gradual decline in the precision of auditory acuity as individuals age (Skoe et al., 2015). Impediments in auditory processing have been linked to a variety of deficits. These include language skills, attention, memory, and reading difficulties, as observed in conditions like dyslexia (Surprenant and Watson, 2001). Moreover, differences in auditory processing at an individual level have been associated with challenges in L1 learning (Goswami et al., 2011). Despite the substantial body of evidence supporting these connections, the causal relationship between auditory processing and L1 acquisition remains a subject of ongoing debate (see Rosen and Manganari, 2001; Snowling et al., 2018).
Auditory processing is considered fundamental in L2 speech learning (Kachlicka et al., 2019; Saito et al., 2024) as presented in Table 1. Research, including studies by Kempe et al. (2012), Chandrasekaran et al. (2010), Perrachione et al. (2011), and Wong and Perrachione (2007), supports the notion that individuals with higher perceptual acuity are better at perceiving and learning unfamiliar sounds. This predictive power of acuity is particularly pronounced in naturalistic L2 learners who have achieved proficiency in a target language through immersive experiences. Specifically, perceptual acuity is a moderate to strong predictor of successful L2 speech learning outcomes for those with medium- to long-term immersion experiences (residing for 1–10 years; Kachlicka et al., 2019; Saito et al., 2022). Recent findings by Saito et al. (2024) underscore the significant role of auditory processing in predicting L2 perception accuracy in both phonology and lexicogrammar. However, it is less predictive for those with short-term immersion experiences (less than 6 months; Saito et al., 2022) or for foreign language learners with no immersion experience (Saito et al., 2021).
Table 1. Summary of key studies in auditory processing and second language acquisition.
Note. LOR = length of residence.
Our research aims to further investigate these relationships, with a specific focus on auditory acuity, particularly temporal and spectral processing, which, as suggested by previous research, may play a critical role in L2 speech learning. In doing so, we aim to elucidate the intricate interplay between auditory processing and the mechanisms of other cognitive abilities (i.e. procedural and declarative memory) in the context of L2 acquisition. By understanding these connections, we can also provide valuable insights into the cognitive processes underlying successful L2 speech learning.
2 Declarative and procedural memory
Ullman’s (2004) Declarative/Procedural (D/P) model emphasizes the essential role of these long-term memory systems in language learning, storage, and utilization, serving as crucial domain-general learning mechanisms (Hamrick et al., 2018). Notably, these memory systems differ in various dimensions, including their relationship with awareness, their role in language learning, and the associated brain regions (Eichenbaum, 2002).
Declarative memory, linked to the hippocampus and its surrounding structures (Poldrack and Packard, 2003), is primarily responsible for the explicit learning of episodic and semantic knowledge and associations (Tulving, 1993). This memory system supports the acquisition of events (episodic memory), facts (semantic memory), and the storage of lexical items crucial for all idiosyncratic linguistic knowledge at the word or multi-word level. In contrast, procedural memory, crucial for implicit learning, encompasses a range of perceptual-motor and cognitive skills like navigation (Ullman and Lovelett, 2018). Despite its slower learning pace and need for repeated exposure, knowledge acquired in procedural memory is processed more rapidly and automatically than in declarative memory (Ullman, 2020). Moreover, procedural memory tends to be more robust in early childhood and diminishes in early adulthood, as evidenced by Nemeth et al. (2013).
The interaction between declarative and procedural memory systems in L2 learning is a complex and dynamic process, as presented in Table 2. Studies have shown that these memory systems influence L2 learners’ linguistic outcomes differently depending on the stage of acquisition, with declarative memory abilities predicting early-stage L2 grammar learning and individuals with stronger procedural learning abilities showing larger gains with increased L2 input (Hamrick, 2015; Morgan-Short et al., 2014). Research in classroom settings has explored how these memory systems relate to L2 speech learning, with findings suggesting that greater procedural learning ability is associated with L2 speech fluency (Granena, 2019). Quam et al. (2018) demonstrated that those with greater declarative memory benefited more from short-term training, whilst those with greater procedural memory showed more gains as the length of training increased. More recently, Saito, Cui, Suzukida, et al. (2022) further highlighted the importance of procedural memory for precise L2 speech perception among Japanese learners of English in foreign language settings.
Table 2. Summary of key studies in declarative and procedural memory and second language acquisition.
Note. SRT = serial reaction time.
However, the extent to which long-term memory abilities can support L2 learners in immersion environments remains to be fully understood. Given the intricacies of these memory systems and their roles in L2 learning, additional research is warranted to enhance our understanding of how they interact, particularly among experienced and proficient L2 speakers in naturalistic settings.
3 L2 speech learning
L2 speech proficiency has been conceptualized as a composite construct consisting of various linguistic domains such as the correct pronunciation of individual sounds (segmentals), the adequate and varied use of stress (prosody), and the accurate use of lexes and morphosyntactic markers (lexicogrammar; for a methodological review on the outcome measures of L2 speech proficiency, see for example Saito and Plonsky, 2019). There is ample cross-sectional and longitudinal research showing how the outcomes of adult L2 speech learning can be associated with the quantity and quality of experience at two different stages, i.e. rate of learning vs. ultimate attainment (for a comprehensive review, see DeKeyser, 2013).
During the initial years of immersion, L2 learners often experience significant improvement in various aspects of L2 speech as they actively seek and engage in conversations. This stage of L2 learning is referred to as the ‘rate of learning’, where individuals enhance their proficiency based on factors such as the length, timing, and intensity of immersion experiences (Derwing and Munro, 2013 for phonology; Saito, 2015 for lexicogrammar; Segalowitz and Freed, 2004 for fluency).
After extensive immersion experience, L2 learners’ proficiency tends to reach a more stable or plateaued state, becoming less susceptible to further practice opportunities. This stage, known as ‘ultimate attainment’, suggests that experience-related factors alone cannot fully explain the outcomes of L2 speech learning. Even among experienced L2 learners, there is considerable individual variation, with some achieving native-like proficiency and others maintaining a noticeable foreign accent (Derwing and Munro, 2013). Further research is needed to explore how learner-internal, aptitude-related factors contribute to the development of high-level L2 oral proficiency (Granena, 2013).
III Motivation for the current study
To date, a growing amount of evidence has pointed out the broad relationship between auditory processing, long-term memory systems (i.e. declarative and procedural), experience, and L2 speech proficiency (Granena, 2019; Kachlicka et al., 2019; Linck et al., 2013; Saito et al., 2022). However, it remains unclear how these variables interact with each other and to what extent they support adult learners in producing more accurate speech in immersive environments. First, few studies have probed the relative weights of auditory processing, long-term memory, and learners’ experience-related factors in L2 speech learning (see Saito et al., 2022).
Second, Skehan’s (2016) aptitude-acquisition model has argued (1) that auditory processing could be a key factor during the early phase of L2 acquisition as it determines the amount of input that learners can take in; and (2) that declarative and procedural memory may relate to the incidence of high-level L2 acquisition as the long-term memory abilities predict the extent to which learners can store, transform, and elaborate received input with a view of sophisticated linguistic analyses. Therefore, it is crucial to determine the potential varying strengths of the relationship between auditory processing, the two long-term memory systems, and L2 speech outcomes among L2 learners across different immersion and proficiency levels. By doing so, we can gain a more comprehensive understanding of these complex interrelationships and their impact on L2 learning. This exploration is vital as it will provide valuable insights into how these factors interact and contribute to the progress and success of L2 learners.
Finally, while the existing literature has shown that auditory processing (spectral and temporal acuity), long-term memory (declarative and procedural), and L2 experience (i.e. Length of Residence [LOR], Age on Arrival [AOA], Current L2 Use, and Foreign Language Education) each predict L2 speech learning, very few studies have examined the inter-relationships among these eight predictor variables in the context of post-pubertal L2 users. There is some evidence that both auditory processing and procedural memory are susceptible to chronological age (older participants are likely to have less auditory precision and weaker procedural memory; for procedural memory, see Nemeth et al., 2013; for auditory processing, see Saito et al., 2022). Little is known about how perceptual-cognitive abilities can be influenced by the length, timing, and intensity of the immersion experience.
The current study seeks to examine the extent to which aptitude and experience factors contribute to more accurate production across multiple dimensions of L2 speech, according to the framework developed and proposed by Saito and Plonsky (2019). These dimensions include vowels, consonants, word stress, intonation, rhythm, speech rate, semantic accuracy, and morphosyntactic accuracy. By examining how they interact and their contribution to variances in the segmental, prosody, fluency, and lexicogrammar aspects of L2 speech proficiency, we hope to provide a more in-depth understanding of L2 speech learning. This exploration is particularly relevant for Spanish speakers of English with varied levels of proficiency and immersion experience in the UK (1–20 years). The understanding derived from this study could inform tailored teaching strategies and methodologies that take into account these key variables. Therefore, the following research questions were formulated to guide this investigation:
Research question 1: How do auditory processing, procedural and declarative memory, and factors related to the experience of L2 learners (Length of Residence [LOR], Age on Arrival [AOA], Current L2 Use, Foreign Language Education) interact with each other?
Research question 2: To what extent do experience factors and individual differences in auditory processing and two memory systems predict the attainment of four dimensions of L2 speech: segmentals (vowels and consonants), prosody (word stress and intonation), fluency (rhythm and speech rate), and lexicogrammar (lexical and morphosyntactic accuracy)?
As for research question 1, both auditory processing (Saito et al., 2022; Skoe et al., 2015) and declarative memory (Ullman and Lovelett, 2018) tend to decline with age. Given this evidence, we anticipated finding a similar relationship in the current study. Furthermore, based on existing research, we hypothesized that auditory processing would not be influenced by the extent of language experience (Saito et al., 2024). This stands in contrast to procedural memory aptitude, which previous studies have found to be related to the length of residence in an L2 speaking country (Saito et al., 2021). Consequently, our study aims to add to this body of knowledge by exploring these relationships further.
For research question 2, we predicted that both auditory processing and the long-term memory systems (procedural more than declarative memory) could play equally significant roles in L2 speech learning. We postulated that learners with higher auditory acuity and memory capabilities might exhibit more precise prosody, segmentals, and lexicogrammar, as suggested by Granena (2019) and Saito et al. (2021).
Considering that declarative or explicit memory learning is believed to be associated with the initial stages of adult learning (Ullman, 2020), we anticipated that our study participants who have varying levels of immersion experience (LOR = 1–20 years) would demonstrate a greater reliance on procedural memory. This cognitive ability, which is thought to be characteristic of the later stages of L2 learning, is often enhanced with language experience (Hamrick, 2015; Morgan-Short et al., 2014; Suzuki and DeKeyser, 2015; Ullman, 2020). Therefore, we hypothesized that participants with more extensive L2 immersion experience would display a higher dependence on procedural memory, reflecting its role in facilitating more advanced stages of language acquisition.
IV Method
All the research materials used in the current project have been deposited on the open science platform for researchers and teachers, L2 Speech Tools (Mora-Plaza et al., 2022).
1 Participants
The participants in the current study were 47 Spanish speakers of English who were studying or working in the UK. Since the Catalan language is spoken in certain regions of Spain (e.g. in Catalonia) in addition to the official language of Spanish, Spanish bilingual speakers who spoke both Spanish and Catalan were excluded from the study to maintain the homogeneity of the language background. All participants responded to a flyer advertisement, and their L2 profiles were carefully scrutinized via a telephone interview by means of a background questionnaire. The study focused on experienced, active users, specifically individuals who regularly use English in their daily lives (Flege et al., 1995), and late L2 learners. These participants arrived in the UK after the age of 17 years and had been residing there for at least a year, demonstrating stable L2 proficiency as outlined by Derwing et al. (2008). The primary objective was to investigate the relative contributions of language experience, auditory acuity, and two memory systems to enhanced L2 proficiency.
In accordance with the screening criteria, we recruited participants who varied in chronological age at the time of the study (M = 30 years, Range = 19–47), had arrived in the UK after the age of 17 years (age on arrival: M = 22 years, Range = 17–33), and had stayed for at least one year (LOR: M = 7.7 years, Range = 1–20). Recognizing that both the quantity and quality of experience matter in L2 learning (Flege, 2016), participants were carefully selected based on their average daily use of English in three distinct settings: school/work, home, and social life (daily usage: M = 70%, Range = 50–90%) (e.g. Flege and Liu, 2001; Flege et al., 1995). All participants reported significant experience in education in English as a foreign language prior to their arrival in the UK (M = 7.53 years, Range = 4–14). Furthermore, none of the participants reported any prior hearing impairments that could potentially impact their perception and production of L2 speech.
To further examine the effects of experience, auditory processing, and memory in different phases of L2 speech learning, the 47 participants were categorized into two subgroups according to their LOR profiles: (1) short-LOR (n = 20, LOR = 1–5 years) and (2) long-LOR (n = 27, LOR = 6–20 years). The distinction here corresponds to the way scholars have typically operationalized the two different phenomena in L2 learning, i.e. rate of learning vs. ultimate attainment (see DeKeyser, 2013). In the rate of learning stage (LOR = 1–5 years), L2 learners are likely to demonstrate a dramatic improvement in their linguistic abilities. After years of immersion (LOR ≥ 6 years), however, their L2 proficiency tends to become relatively stable and unchanged regardless of additional input and output opportunities. In the current study, the short-LOR group was assumed to represent the initial-to-mid state of L2 speech learning, and the long-LOR group was assumed to index their L2 attainment at the time of study.
The two-day experiment included auditory and cognitive abilities tests, administered via Zoom on the first day, and an IELTS speaking task on the second day. Throughout testing, the experimenter supervised the participants, providing assistance and addressing any questions they had. The tests assessed auditory acuity, declarative memory, and procedural memory, with a 15-minute break provided between tests to ensure participants remained refreshed and attentive. No data were excluded from the study due to interference or internet connection issues. For the questionnaire materials, see supplemental material S1.
2 Auditory processing measures
In the current investigation, auditory processing was measured via a set of discrimination tasks developed in Kachlicka et al. (2019). The tasks were designed to assess participants’ abilities to encode acoustic details of sounds. In these tasks, participants were required to listen to and differentiate between sounds using an AXB discrimination format. Due to the COVID-19 pandemic, the task was set up and implemented on the Gorilla platform (Anwyl-Irvine et al., 2020), allowing remote access for participants. This approach differed from Kachlicka et al. (2019), wherein similar data were collected under face-to-face conditions.
a Stimuli
Two aspects of participants’ auditory acuity were measured with three psycho-acoustic tests, i.e. duration, formant, and pitch discrimination. Duration test scores were used for temporal acuity, and formant and pitch test scores were used for spectral acuity. These stimuli were designed to not be perceived as speech by listeners. Each of the three subtests (formants, pitch, and duration) used the same set of stimuli, with the only difference being the target acoustic dimension.
The formant subtest consisted of 101 complex tones, with one standard stimulus and 100 comparison stimuli. Each stimulus had a duration of 500 ms and included 5-ms amplitude ramps at the beginning and end. The fundamental frequency was set at 100 Hz, with harmonics up to 3,000 Hz. Three formants were inserted at 500, 1,500, and 2,500 Hz using a parallel formant filter bank. The second formant of the standard stimulus was set at 1,500 Hz, while the comparison stimuli ranged from 1,502 to 1,700 Hz in 100 steps.
For the pitch and duration subtests, 101 four-harmonic complex tones were prepared. A 5-ms linear ramp was inserted at the beginning and end of each stimulus. In the pitch subtest, the fundamental frequency of the standard stimulus was set at 330 Hz, while the comparison stimuli ranged from 330.3 to 360 Hz in increments of 0.3 Hz. In the duration subtest, the length of the first amplitude ramp varied from 10 to 300 ms, with the standard stimulus set at 15 ms.
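The stimulus continua described above can be illustrated with a minimal Python sketch. It is not the authors' synthesis code: the sampling rate, the equal-amplitude harmonics, the 500-ms duration of the pitch tones, and the helper names `continuum` and `complex_tone` are all assumptions made for illustration, and the formant filtering step is omitted.

```python
import math

SAMPLE_RATE = 44_100  # Hz; assumed, the text does not state a sampling rate


def continuum(first, last, n_steps=100):
    """Return n_steps equally spaced comparison values from first to last."""
    step = (last - first) / (n_steps - 1)
    return [first + i * step for i in range(n_steps)]


def complex_tone(f0, n_harmonics=4, duration_ms=500, ramp_ms=5):
    """Harmonic complex tone with linear onset and offset ramps.

    Equal harmonic amplitudes and the default duration are assumptions.
    """
    n = int(SAMPLE_RATE * duration_ms / 1000)
    ramp = int(SAMPLE_RATE * ramp_ms / 1000)
    samples = []
    for i in range(n):
        t = i / SAMPLE_RATE
        value = sum(math.sin(2 * math.pi * f0 * h * t)
                    for h in range(1, n_harmonics + 1)) / n_harmonics
        # Linear gain that rises over the first ramp and falls over the last.
        gain = min(1.0, i / ramp, (n - 1 - i) / ramp)
        samples.append(value * gain)
    return samples


# Comparison values taken from the text: 100 steps each.
formant_f2 = continuum(1502.0, 1700.0)  # second-formant targets, 2 Hz apart
pitch_f0 = continuum(330.3, 360.0)      # fundamental frequencies, 0.3 Hz apart
standard_pitch_tone = complex_tone(330.0)
```

Each list holds the 100 comparison values, so together with the standard it reproduces the 101 stimuli per subtest reported in the text.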
b Procedure
During each trial, participants listened to three synthesized stimuli, with the second stimulus remaining constant. The first or last stimulus could vary, and participants were required to identify the sound that was different from the other two by pressing either the number ‘1’ or ‘3’ on a computer screen. Using Levitt’s adaptive threshold procedure, the size of the difference between stimuli varied from trial to trial based on participants’ performance. The tests started at the midpoint of the comparison stimuli and changed in increments of 10. An incorrect response decreased the difficulty by increasing the difference between stimuli by 10 steps, while three consecutive correct responses increased the difficulty by decreasing the difference between stimuli by 10 steps. After the first reversal, the step size changed from 10 to 5, and then from 5 to 1. The tests ended after 70 trials or eight reversals. Participants’ auditory processing score was determined by the location of the final reversal, indicating how small a difference they could perceive between the standard and comparison stimuli. Lower scores indicated higher precision in perceptual acuity among participants. Although we did not specifically assess test–retest reliability in our study, Saito and Tierney (2024) have reported that the reliability of this test can be considered good. For the audio materials used in the formant, pitch, and duration discrimination tests, see supplemental material S2.
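The adaptive rule described in this procedure (one incorrect response makes the task easier, three consecutive correct responses make it harder, with step sizes of 10, 5, and 1) can be sketched as follows. This is an illustrative simulation rather than the Gorilla implementation. Several details are assumptions: the step size shrinks at each of the first two reversals, levels are clamped to the 1–100 range, a reversal is logged at the level where the direction changes, and the `respond` callback is a hypothetical stand-in for a listener.

```python
def run_staircase(respond, n_levels=100, max_trials=70, max_reversals=8):
    """Simulate the adaptive track; return the level at the final reversal.

    respond(level) -> bool is a hypothetical listener model that says
    whether the listener answers correctly at a given stimulus level.
    Lower returned values indicate finer perceptual acuity.
    """
    level = n_levels // 2       # start at the midpoint of the continuum
    step_sizes = [10, 5, 1]     # assumed to shrink after reversals 1 and 2
    step_index = 0
    correct_run = 0
    last_direction = 0          # +1 = easier, -1 = harder, 0 = no move yet
    reversals = []

    for _ in range(max_trials):
        if respond(level):
            correct_run += 1
            if correct_run < 3:
                continue        # wait for three consecutive correct answers
            correct_run = 0
            direction = -1      # harder: smaller difference
        else:
            correct_run = 0
            direction = +1      # easier: larger difference

        if last_direction and direction != last_direction:
            reversals.append(level)
            step_index = min(step_index + 1, len(step_sizes) - 1)
            if len(reversals) == max_reversals:
                break
        last_direction = direction
        step = step_sizes[step_index]
        level = max(1, min(n_levels, level + direction * step))

    # If no reversal occurred, fall back to the last level reached (assumed).
    return reversals[-1] if reversals else level


# Hypothetical listener who is correct whenever the level is at least 23.
score = run_staircase(lambda level: level >= 23)
```

With a deterministic listener of this kind the track oscillates around the listener's threshold, which is why the location of the final reversal serves as an acuity estimate.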
3 Declarative memory
LLAMA_B is a cognitive task that has been commonly used in L2 aptitude research as a measure of explicit aptitude and declarative memory learning (Hamrick, 2015). LLAMA_B is considered to be language-neutral, and the ability it measures is also referred to as rote learning or associative memory (Li and Zhao, 2021). Drawing on previous research (Hamrick, 2015) that looked at the role of individual differences in declarative and procedural memory learning, a similar version (3.0) of LLAMA_B was used to measure participants’ declarative memory learning ability (Meara, 2005). The test took approximately 10–15 minutes to complete. First, the participants were asked to learn the names of 20 unfamiliar objects within two minutes. Then they had to match the same but differently arranged objects with their names by clicking on them. This part of the test was not timed. After the test was completed, the scores were calculated automatically by the software and ranged from zero to 20. According to Cronbach’s alpha analyses, the internal consistency of the declarative memory test (k = 20) for the 47 participants was .953, which is far above the required benchmark (> .70) in the field of applied linguistics (Larson-Hall, 2016).
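For readers unfamiliar with the internal-consistency index reported above, the following sketch shows the standard Cronbach's alpha formula applied to an item-level score matrix. The function name and the toy data are hypothetical; the study's item-level responses are not reproduced here.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a participants-by-items score matrix.

    scores: list of rows, one row per participant, one column per item.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(scores[0])
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Toy data (hypothetical): 4 participants, 3 items scored 0/1.
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(toy)
```

In the study's case the matrix would have 47 rows and 20 columns (k = 20), with each cell coding whether an object name was matched correctly.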
For the materials used in LLAMA_B, see https://www.lognostics.co.uk/tools/LLAMA_B/LLAMA_B_test.cgi.
4 Procedural memory
The serial reaction time (SRT) task is widely utilized in L2 research to investigate implicit aptitude and procedural memory learning (Granena, 2013, 2019; Hamrick, 2015; Quam et al., 2018). This task is considered a reliable measure of implicit/procedural memory learning ability (Kaufman et al., 2010; Shanks, 2005). In the present study, we operationalized the deterministic version of the SRT task (Willingham et al., 1989), which has demonstrated high internal reliability across previous studies (Granena, 2019; Granena and Yilmaz, 2019) with a split-halves reliability of 0.79. In contrast, the probabilistic version of the SRT task (Derwing and Munro, 2013; Granena, 2013; Kaufman et al., 2010; Suzuki and DeKeyser, 2015) exhibited lower reliabilities, ranging from 0.33 to 0.44. Specifically, in the deterministic SRT task, the experiment involves alternating blocks of patterned and random trials, with a greater number of patterned blocks compared to random blocks. In contrast, in probabilistic sequences, patterned trials are interspersed with random trials, and the signal-to-noise ratio varies across tasks.
Participants were required to download the free Inquisit6 Lab software and the task, and were asked to complete the test on their own computers. In line with previous research (e.g. Hamrick, 2015), the task included six blocks of 96 trials: a first block presenting a pseudorandom sequence (e.g. V,B,N,M,V,N,B,V,M,B,M,N, with the letters corresponding to the position of the red light appearing in one of four squares on the screen), followed by four blocks of a patterned sequence (e.g. V,B,V,N,M,B,N,V,M,N,N,M), and a final sixth block repeating the pseudorandom sequence. Participants were presented with four grey boxes in four possible screen positions and had to press the spatially corresponding key on the keyboard as fast as possible once one of the boxes turned red. Following the procedure applied by Hamrick (2015), participants were asked to sit approximately 50 cm away from the computer screen so that the visual angle of the stimulus remained constant at approximately 2.3° × 2.0°. According to Destrebecqz and Cleeremans (2001), if the response–stimulus interval (the interval between a response and the presentation of the next stimulus) is too long (e.g. 250–500 ms), explicit learning may take place. Thus, to ensure that participants were learning only implicitly, the response–stimulus interval was set at 50 ms. Participants were not informed that they would encounter random and patterned sequences or that they were intended to learn a pattern.
Following the procedure operationalized in previous language aptitude studies (e.g. Hamrick, 2015; Linck et al., 2013), two scores were calculated based on the participants’ performance. First, reaction times (RTs) were used as an indicator of general processing speed, with lower RTs indicating faster processing. However, lower RTs may result from practice rather than from procedural memory learning. Hence, to factor out the effect of practice, the final pseudorandom block served as a control; in healthy adults, RTs in this block are usually slower than in the final patterned block. Thus, the difference between RTs in the final pseudorandom block and the final patterned block constituted a reactive facilitation score that indicated procedural memory learning. The higher the score, the better the sequential learning. Following Hamrick (2015), the reactive facilitation scores were submitted for further analyses. A participant’s data were to be excluded if their average response accuracy in the fifth and sixth blocks was below 70% (Linck et al., 2013). No data were excluded in the present study.
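The scoring logic described above can be sketched in a few lines. The use of block means (rather than, say, medians), the pooling of accuracy across blocks five and six, and the function names are assumptions made for illustration; the text does not specify these details.

```python
from statistics import mean


def facilitation_score(random_block_rts, patterned_block_rts):
    """Reactive facilitation: RT in the final pseudorandom block minus RT in
    the final patterned block (block means are an assumed summary statistic).
    Higher values indicate more sequence learning."""
    return mean(random_block_rts) - mean(patterned_block_rts)


def meets_accuracy_criterion(block5_correct, block6_correct, threshold=0.70):
    """True if pooled accuracy over blocks 5 and 6 reaches the threshold.

    Each argument is a sequence of 1 (correct) / 0 (incorrect) responses.
    """
    responses = list(block5_correct) + list(block6_correct)
    return sum(responses) / len(responses) >= threshold


# Hypothetical RTs in milliseconds for one participant.
example = facilitation_score([500, 520], [430, 450])
```

For the hypothetical values shown, the pseudorandom block averages 510 ms and the patterned block 440 ms, giving a facilitation score of 70 ms.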
In line with previous research (Granena, 2013; Hamrick, 2015), participants rated their familiarity with the sequence on a 6-point scale (1 = ‘I’m sure that this sequence was part of the test’; 6 = ‘I’m sure that this sequence was not part of the test’) so that the possibility of explicit learning could be ruled out. None of the participants were able to remember the presented sequences. The split-half reliability of the SRT task, with Spearman–Brown correction, was .952.
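For readers unfamiliar with the split-half approach, the sketch below shows one common way of computing it. The odd/even split of trials and all function names are assumptions made for illustration; the text does not specify how the halves were formed.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_brown(r_half):
    """Step the half-test correlation up to full-test length."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(trials_by_participant):
    """Correlate each participant's mean over odd-numbered trials with the
    mean over even-numbered trials, then apply the Spearman-Brown correction.
    The odd/even split is an assumption of this sketch."""
    odd = [mean(trials[0::2]) for trials in trials_by_participant]
    even = [mean(trials[1::2]) for trials in trials_by_participant]
    return spearman_brown(pearson_r(odd, even))
```

Because the correction is 2r / (1 + r), a half-test correlation of .50 would correspond to a corrected reliability of about .67.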
For the materials used in the SRT task, see https://www.millisecond.com/products/inquisit6/weboverview.aspx.
5 Speaking task
We adopted a procedure similar to the IELTS interview task in order to elicit more spontaneous and less controlled speech. Drawing on the multifaceted framework of L2 oral ability (Crowther et al., 2015), we used seven different L2 oral ability measures to tap into four global aspects of L2 speech: (1) segmentals, (2) prosody (word stress and intonation), (3) fluency (rhythm and speech rate), and (4) lexicogrammar (vocabulary and grammar).
Although questionnaires such as the LEAP-Q are valuable tools for assessing language proficiency, they rely primarily on self-reported measures and may not capture the nuances of spontaneous spoken communication. By using the IELTS speaking task, we aimed to provide a more holistic evaluation of participants’ language abilities in real-life communication scenarios. The task allowed us to evaluate participants’ oral communication abilities more comprehensively, including their ability to maintain a natural pace, use appropriate intonation, and pronounce words accurately, all of which are important aspects of spoken language proficiency.
Participants joined scheduled Zoom meetings in which the researchers shared an IELTS-style interview topic, such as ‘What was the hardest and toughest challenge in your life?’, together with a set of follow-up questions, for example, ‘Why did you encounter this challenge?’ The procedure was based on the guidelines outlined in Saito et al. (2017), which allowed participants one minute to prepare their response and 1–2 minutes to speak on the given topic. During the interview, participants’ voice samples were recorded. For speech materials, see supplemental material S3.
a Raters
Previous research has suggested that multilingual raters tend to be more lenient than monolingual judges when assessing speech (Isaacs and Thomson, 2013). Therefore, we recruited three experienced raters who were native speakers of English (M age = 34 years; 3 females). All raters had between 1 and 10 years of experience teaching ESL (M years of teaching = 5.5), and they had been trained in conducting speaking assessments and teaching English phonology at a primary school. The native speaker (NS) raters were carefully selected and screened for their familiarity with Spanish-accented speech to allow for a more objective assessment of the speech samples. To this end, the raters completed a questionnaire in which they reported their familiarity with Spanish-accented English, which was M = 1.7 on a 6-point scale (1 = ‘not at all’; 6 = ‘very familiar’). Finally, the raters assessed the 47 speech samples by rating different dimensions of participants’ L2 speech, as presented in the supplemental material. For the raters’ background questionnaire and materials, see supplemental material S4.
b Rating procedure
The judgements of 47 speech samples were conducted individually by native speaker raters in a quiet room on primary school premises. To ensure consistency, the raters were provided with training on each of the seven speech measures. Following the procedure outlined in Saito et al. (2017), first the researcher provided each rater with a brief explanation of the speech features to be assessed, specifically segmental accuracy. Next, the raters familiarized themselves with the IELTS task and the follow-up questions, and practiced the rating procedure by evaluating three speech samples that were not included in the main dataset (Saito et al., 2017). For training scripts and descriptors, see supplemental material S5.
For segmentals, word stress, intonation, rhythm, and speech rate, the assessment was audio-based: the raters evaluated these speech features solely by listening to the audio recordings of the speech samples, without any visual or written cues. Only the first 30 s of each speech sample was used. Each speech sample was played once on the researcher’s personal laptop. The raters were asked to assess each category on a 9-point Likert scale (1 = ‘not targetlike’; 9 = ‘targetlike’). Given that detailed lexical analysis requires longer samples (e.g. 3–4 minutes, see Saito et al., 2016), the full-length voice samples (M length = 3.3) were transcribed orthographically according to the procedure used by Crossley et al. (2014). Following this procedure, all transcripts were cleaned by removing mispronunciation errors (e.g. ‘I went on the sheep cruise’ was transcribed as ‘I went on the cruise ship’) and filled pauses (e.g. uh, oh). The raters used a 9-point Likert scale to assess lexical accuracy (1 = ‘many inappropriate words used’; 9 = ‘consistently appropriate vocabulary’) and morphosyntactic accuracy (1 = ‘not targetlike’; 9 = ‘targetlike’). Inter-rater agreement was high for all seven measures in the present study: segmental errors = .972; word stress = .945; intonation = .935; rhythm = .933; speech rate = .935; lexical accuracy = .958; morphosyntactic accuracy = .965.
V Results
As summarized in Table 3, participants’ L2 oral abilities, as well as their perceptual-cognitive abilities, varied widely. Since participants’ declarative memory scores deviated significantly from a normal distribution (p < .05), these scores were transformed via the log10 function. Consistent with prior research (Saito et al., 2017), Pearson correlation analyses found participants’ L2 oral abilities to be significantly correlated with each other (r = .705 to .962, p < .001). To examine the relationship between participants’ auditory and memory abilities, a set of Spearman correlation analyses was conducted on the raw scores (alpha set to .008 after Bonferroni correction). Spectral acuity was significantly correlated with temporal acuity (r = .543, p = .001), suggesting that both measures tap into one broad category of auditory processing. Although there was a significant correlation between spectral acuity and declarative memory (r = –.409, p = .004), none of the other correlations reached statistical significance (r = –.346 to –.112, p = .017 to .455). These results indicated that auditory processing, declarative memory, and procedural memory might be somewhat overlapping but essentially independent constructs (at least within the current study).
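The corrected alpha of .008 is what one would obtain if the Bonferroni correction were applied across all pairwise correlations among the four perceptual-cognitive measures. That reading is an assumption, since the number of comparisons is not stated in the text; the variable names below are likewise illustrative.

```python
from math import comb

# Assumed set of measures entering the pairwise correlations.
measures = ["temporal acuity", "spectral acuity",
            "declarative memory", "procedural memory"]

n_comparisons = comb(len(measures), 2)   # number of unique pairs
alpha_corrected = 0.05 / n_comparisons   # Bonferroni-adjusted threshold

# p-values reported in the text for the two significant pairs.
reported_p = {"spectral-temporal": 0.001, "spectral-declarative": 0.004}
significant = {pair: p < alpha_corrected for pair, p in reported_p.items()}
```

Under this assumption the threshold is .05 / 6, which rounds to .008, and the smallest non-significant p-value reported (.017) falls above it.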
Table 3. Descriptive statistics of participants’ second language (L2) oral and perceptual-cognitive abilities.
Notes. a: Kolmogorov–Smirnov test. For spectral and temporal acuity, lower scores are an indicator of a higher auditory sensitivity. For declarative and procedural memory, the higher the score the better sequential learning.
To answer research question 1, the first objective of the statistical analysis was to examine the relationship between experience- and age-related variables and participants’ auditory processing (temporal and spectral acuity) and long-term memory (declarative and procedural memory). Specifically, we aimed to investigate how auditory processing, procedural and declarative memory, and factors related to L2 learners’ experience (such as Length of Residence [LOR], Age on Arrival [AOA], Current L2 Use, and Foreign Language Education) interact with each other. To this end, a set of multiple regression analyses was performed with participants’ auditory and memory scores as dependent variables and four predictor variables: (1) age of arrival (AOA) in the UK (17–33 years), (2) length of residence (short- vs. long-LOR group), (3) current L2 use (%), and (4) foreign language experience.
As summarized in Table 4, none of the experience variables were significantly associated with participants’ temporal acuity, spectral acuity, or declarative memory (p > .05). In contrast, procedural memory was strongly tied to the LOR category (short- vs. long-LOR; p = .003). To follow up, an independent-samples t-test was performed to isolate and confirm the direct group-level difference in procedural memory scores between the two LOR groups without accounting for the other predictor variables. This analysis found that the long-LOR group (LOR > 6 years) demonstrated higher procedural memory scores than the short-LOR group (LOR = 1–5 years) with a large effect, t(45) = 3.257, p = .003, d = 0.942. In addition, a Pearson correlation analysis revealed a relatively strong linear relationship between participants’ individual differences in procedural memory and their immersion profiles, r = .714, p < .001 (see Figure 1).
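As a generic illustration of this kind of follow-up comparison, the sketch below computes a pooled-variance (Student) t statistic and Cohen's d for two independent groups. It is not the authors' analysis script: the function name is hypothetical, the example scores are invented, and the choice of a pooled standard deviation for d is an assumption.

```python
from statistics import mean, variance


def independent_t_and_d(group_a, group_b):
    """Return (t, Cohen's d, df) for two independent groups using the
    pooled variance. Assumes roughly equal variances across groups."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    diff = mean(group_a) - mean(group_b)
    t = diff / (pooled_var * (1 / n_a + 1 / n_b)) ** 0.5
    d = diff / pooled_var ** 0.5
    return t, d, df


# Invented facilitation scores (ms), for illustration only.
long_lor = [82, 95, 77, 101, 88, 93]
short_lor = [61, 70, 58, 74, 66]
t_value, cohens_d, dof = independent_t_and_d(long_lor, short_lor)
```

With group sizes of 20 and 27, the same formula gives df = 20 + 27 − 2 = 45, which matches the degrees of freedom reported for the t-test.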
Table 4. Summary of multiple regression analyses.
Notes. a: group categories (short- vs. long-LOR); b: Variance Inflation Factor.

Figure 1. Differences in procedural memory between short vs. long length-of-residence groups.
VI Perceptual, cognitive, and experiential correlates of L2 oral ability
To address research question 2, the final objective of the statistical analyses was to examine how aptitude factors (temporal and spectral acuity, declarative and procedural memory) and biographical factors (AOA, LOR, current L2 use, foreign language education) interacted to determine the outcomes of participants’ L2 oral abilities. To this end, linear mixed-effects modelling analyses were conducted using seven different dimensions of participants’ L2 speech proficiency scores, i.e. segmentals (vowel and consonant accuracy), word stress, intonation, rhythm, speech rate, lexical accuracy, and morphosyntactic accuracy (47 participants × 7 dimensions = 329 observations).
In Model 1, fixed effects included (1) temporal acuity, (2) spectral acuity, (3) declarative memory, (4) procedural memory, (5) age of arrival, (6) length of residence, (7) current L2 use, and (8) foreign language education. Random effects included participants’ ID (1–47) and composite scores of seven different dimensions of oral abilities (segmentals, word stress, intonation, rhythm, speech rate, lexical accuracy, and morphosyntactic accuracy), which were standardized and validated before entering the analysis. According to the Variance Inflation Factor, the eight fixed-effects factors did not show clear evidence of multicollinearity (VIF = 1.065–1.511).
As summarized in Table 5, the model identified the significant main effects of spectral acuity (β = –.462, t = –6.094, p < .001), procedural memory (β = .012, t = 15.427, p < .001), and LOR (β = –.389, t = –3.263, p = .001). The results indicated that those with more advanced L2 oral abilities were likely to have had not only a longer immersion experience (LOR > 6 years) but also higher levels of auditory precision and procedural memory. Effect size calculations indicated that spectral acuity demonstrated a medium-to-large effect (d = –.680), procedural memory had a very large effect (d = 1.72), and LOR exhibited a small-to-medium effect (d = –.360).
Table 5. Summary of mixed-effects modelling analyses of second language (L2) oral abilities relative to eight predictor variables.
Focusing on the three significant predictors (i.e. spectral acuity, procedural memory, and LOR), Model 2 was constructed to further examine the extent to which these factors interact to determine L2 oral abilities. Similar to Model 1, the factorial model yielded significant main effects of procedural memory (β = .012, t = 13.962, p < .001, d = 1.56) and LOR (β = –.662, t = –2.986, p = .003, d = 0.330). Whereas the main effect of spectral acuity became non-significant (β = –.289, t = –1.362, p = .174), the interaction between spectral acuity and LOR reached statistical significance (β = –.597, t = –2.339, p = .020, d = 0.260).
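The magnitudes of the effect sizes reported for Models 1 and 2 are numerically close to what the common conversion d = 2t / √df would give. The text does not state which formula or degrees of freedom were used, so both the formula and the value df = 321 (329 observations minus 8 fixed effects) are assumptions made here purely for illustration.

```python
from math import sqrt

ASSUMED_DF = 321  # assumption: 329 observations - 8 fixed effects


def d_from_t(t, df):
    """Approximate Cohen's d from a t statistic via d = 2t / sqrt(df)."""
    return 2 * t / sqrt(df)


# t values as reported in the text for Model 1 and Model 2.
model_1 = {"spectral acuity": -6.094, "procedural memory": 15.427, "LOR": -3.263}
model_2 = {"procedural memory": 13.962, "LOR": -2.986,
           "spectral acuity x LOR": -2.339}

for label, model in (("Model 1", model_1), ("Model 2", model_2)):
    for predictor, t in model.items():
        print(f"{label} {predictor}: d = {d_from_t(t, ASSUMED_DF):.2f}")
```

Under these assumptions the Model 1 values come out at roughly –0.68, 1.72, and –0.36, in line with the reported figures. The signs for Model 2 follow the t values and may therefore differ from the unsigned values given in the text.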
To examine the interaction between auditory processing and Length of Residence (LOR), we conducted two Pearson correlation analyses involving participants’ spectral acuity and their averaged L2 oral abilities (n = 27). This analysis allowed us to gain insight into the relationship between these variables and explore the potential impact of LOR on auditory processing in relation to L2 oral abilities. As visually summarized in Figure 2, the relationship between auditory processing and L2 oral abilities was strong among the short-LOR group (LOR = 1–5 years; r = –.670, p < .001), but non-significant among the long-LOR group (LOR > 6 years; r = –.379, p = .051). Interestingly, the correlation coefficients between procedural memory and averaged L2 oral abilities were comparable across the two groups (r = .848 and .878, p < .001).

Figure 2. Correlations between auditory processing (spectral acuity), procedural memory, and L2 oral abilities as per two different immersion groups (short vs. long). (a) Short-LOR group (n = 20 with 1–5 years of immersion experience). (b) Long-LOR group (n = 27 with 6–20 years of immersion experience).
VII Discussion
Focusing on a total of 47 Spanish speakers of English who were assumed to represent the initial-to-mid state of L2 speech learning (n = 20 for 1–5 years of immersion) and the final state of L2 speech learning (n = 27 for 6–20 years of immersion), the current study examined the complex relationship between auditory processing (temporal and spectral acuity), long-term memory (declarative and procedural), experience (LOR, AOA, current L2 use and foreign language education), and L2 oral abilities (segmentals, prosody, fluency, and lexicogrammar). With respect to the experience correlates of auditory processing and memory (research question 1), the multiple regression analyses did not reveal any significant relationships between auditory processing or declarative memory and experience-related factors. In contrast, procedural memory yielded strong linear correlations with participants’ length of residence in the UK (1–20 years). The findings here suggest that both auditory processing and declarative memory can be considered as ‘aptitude’, which is, by definition, a stable trait that is unaffected by experience (Skehan, 2016).
Our argument is in line with the existing literature on the weak or nil relationship between auditory processing and L2 immersion experience (e.g. Saito et al., 2022), although auditory processing can be different when analyses compare groups of individuals with substantially different experience profiles (e.g. for tonal vs. non-tonal language users, see Bidelman et al., 2011; for simultaneous vs. sequential bilinguals, Krizman et al., 2015). Comparatively, very few studies have looked at the different types of memory profiles among L2 users with different experience and proficiency levels. There is some evidence showing that those with more L2 immersion experience tend to have a greater visuo-spatial and phonological short-term memory than monolinguals (e.g. Durand López, 2021; see also Anjomshoae et al., 2021), and that more advanced L2 users not only have ample immersion experience but also high-level cognitive capacities (e.g. Claussenius-Kalman et al., 2021; Linck et al., 2013). On a broad level, we echo Ullman’s (2004, 2005) view that more advanced L2 learners may have a greater procedural memory in particular. At the initial stage of L2 learning, individuals may mainly adopt declarative memory to establish explicit linguistic knowledge. Throughout a long-term immersion experience, they gradually rely on procedural memory to access, reinforce, and automatize what they have declaratively learned. To our knowledge, however, little empirical work has been conducted on how the L2 immersion experience shapes procedural memory.
With respect to the relative weights of auditory processing, memory, and experience in L2 oral abilities, the results of the mixed-effects modelling analyses showed that L2 outcomes can be predicted by three different variables, i.e. procedural memory, length of residence (short vs. long), and auditory processing (spectral acuity). The follow-up analyses (see Model 2) showed that whereas procedural memory was equally predictive of L2 oral skills regardless of LOR profiles (short vs. long), the link between auditory processing and proficiency was stronger among those with short-to-mid immersion experience (LOR = 1–5 years) than among those with an extensive amount of immersion experience (LOR = 6–20 years).
The findings presented here offer empirical backing for Skehan’s (2016) aptitude-acquisition model, which emphasizes auditory processing as a significant influencer in the initial phases of L2 learning (Kachlicka et al., 2019) and underscores procedural memory as a crucial aptitude for achieving advanced levels of L2 proficiency later on (Linck et al., 2013). However, it is essential to approach these results with caution due to the limited sample size per LOR group. Future research directions could involve expanding the sample size to enhance the generalizability of the findings.
Interestingly, participants’ declarative memory was not significantly correlated with L2 oral abilities (nor with any of the biographical backgrounds). This could be partially because the current investigation focused on highly educated L2 learners with sufficient L2 linguistic knowledge (all participants had received years of foreign language education prior to immersion) who may have already passed the very initial stage of L2 immersion (all participants had spent at least one year in the UK). In fact, previous research has shown that declarative memory plays a key role in the initial stage of language-focused, analytic L2 learning (Morgan-Short et al., 2014; Quam et al., 2018). The current study adds that L2 learners may not draw on declarative memory, especially when they are already immersed in L2-speaking environments (LOR > 1 year) where they regularly work on improving spontaneous oral L2 abilities with a view to achieving communication in various social settings (Granena, 2019).
Taken together, three tentative conclusions can be derived to disentangle the effects of auditory processing, memory, and experience on L2 speech learning. First, auditory processing and experience can make separate contributions to L2 speech learning. Whilst all L2 learners can enhance L2 oral abilities as a function of increased experience, those with more precise spectral acuity may make the most of every input opportunity at the initial-to-mid stage of L2 speech learning (LOR = 1–5 years), resulting in more advanced L2 oral proficiency in the long run. Second, the relationship between procedural memory, experience, and L2 oral abilities could be triangular in nature: the more immersion experience learners have, the more robust the procedural memory they obtain and the more advanced, automatized, and nativelike the L2 oral abilities they can use in the long run. Third, the acquisitional role of declarative memory in the development of spontaneous L2 oral abilities in naturalistic settings remains open to further investigation.
Supplemental Material
sj-docx-1-slr-10.1177_02676583251317909 – Supplemental material for Effects of auditory processing, memory, and experience on early and later stages of second language speech learning
Supplemental material, sj-docx-1-slr-10.1177_02676583251317909 for Effects of auditory processing, memory, and experience on early and later stages of second language speech learning by Linda Bakkouche and Kazuya Saito in Second Language Research
Footnotes
Acknowledgements
This work originates from the first author’s (LB) master’s dissertation, submitted to University College London in 2021. We extend our gratitude to all participants for their involvement and to the Second Language Research reviewers and associate editor, Guilherme Garcia, for their constructive feedback.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The development and validation of the test materials was supported by a Leverhulme Trust Research Grant (RPG-2024-391) and a UK-ISPF Research Grant (1185702223), both awarded to the second author (KS).
Open data statement
All the research materials listed below have been deposited on the open science platform for researchers and teachers, L2 Speech Tools:
Mora-Plaza et al. (2022). Tools for second language speech research and teaching. http://sla-speech-tools.com.
Supplemental material
Supplemental material for this article is available online.
