Abstract
This study investigates if two groups of experienced late bilinguals (Arabic-English, English-Arabic) produce the Arabic vowels /ɪ, u, a/ and the English vowels /ɪ, ʊ, æ/ with nativelike formant values (F1, F2) compared with Arabic and English monolinguals, respectively. We aimed to characterize the relationship between second-language (L2) acquisition and first-language (L1) attrition of vowels, that is, does nativelike acquisition of an L2 vowel correspond to attrition of a phonetically similar L1 vowel, and vice versa? Moreover, we explored if nativelikeness of bilingual vowel productions is influenced by the predictor variable sound discrimination aptitude. Results show that bilinguals who produce nativelike L2 vowels are also able to maintain native L1 productions, suggesting that an increased L2 proficiency does not inevitably entail a decline in L1 proficiency.
1 Introduction
Many vowel production studies have shown that late L2 learners, despite prolonged L2-learning experience, often fail to produce L2 vowels with nativelike acoustic features (e.g., Baker & Trofimovich, 2005; Bohn & Flege, 1992; Cobb & Simonet, 2015; Flege et al., 1997, 1999, 2003; Levy & Law, 2010; Piske et al., 2002). Within the framework of the Speech Learning Model (SLM; Flege, 1995; see Flege & Bohn, 2021, for a revised version of the SLM), category
Assimilation and dissimilation processes may not only impact L2 productions but have also been shown to affect L1 phonetic features; that is, bilinguals may produce sounds (e.g., Alharbi et al., 2023; Bergmann et al., 2016; Kornder & Mennen, 2021; Mayr et al., 2012) or prosodic features (e.g., De Leeuw et al., 2012; Mennen et al., 2022) in their L1 differently from monolinguals. Such L2-induced modifications of L1 pronunciation experienced by L2-immersed late bilinguals are referred to as
In terms of vowel production, research has shown that the presence of an L2 system might lead to changes in the realization of L1 vowels, that is, a bilingual’s L1 vowel categories may move acoustically closer (as evidenced in, for example, formant frequency changes) to corresponding L2 vowels, or similar L1 and L2 vowels move further apart (e.g., Bergmann et al., 2016; Flege, 1987; Kornder & Mennen, 2021; Mayr et al., 2012).
While L2 acquisition and L1 attrition of segments are well-studied phenomena, there is still a small number of studies that explore how acquisition and attrition are
1.1 Arabic and English vowels
The target varieties in our study are Modern Standard Arabic (MSA) and Standard Southern British English (SSBE), which are representative of the standard-oriented varieties spoken by our participants.
SSBE distinguishes six short monophthongs /ɪ, ɛ, a, ʌ, ɒ, ʊ/, five long monophthongs /iː, ɜː, ɑː, ɔː, uː/, and eight diphthongs /eɪ, aɪ, ɔɪ, ǝʊ, aʊ, ɪǝ, eǝ, ʊǝ/ that can occur in stressed syllables. English vowels can be characterized in terms of vowel quality (vowel height and vowel frontness) and duration, with the latter distinguishing English short from long monophthongs (e.g., Ladefoged, 2005; Williams & Escudero, 2014).
Compared with English, MSA has a simpler vowel system with only three short /ɪ, a, u/ and three long /iː, aː, uː/ monophthongs as well as two diphthongs /aw, aj/ (some Arabic dialects have more distinctions though; see, for example, Holes, 2004). MSA short vowels contrast with their long counterparts, that is, vowel length is phonemic in Arabic (e.g., Alghamdi, 1998; De Jong & Zawaydeh, 2002; Ryding, 2005), for example, /dʒaˈm
Here, we focus on the short monophthongs that are phonetically similar but not identical (i.e., they differ acoustically in terms of F1 and F2; see section 3.1) in Arabic and English, namely, short /a-æ/, /ɪ/ and /u-ʊ/. 2
1.2 Individual differences
Individuals may differ from each other in terms of successful L2 acquisition and L1 maintenance. Such differences are evident even among age-matched bilinguals with similar language learning histories, and they may be related to various cognitive, motivational and attitudinal factors (see Hansen Edwards, 2018, for an overview). In our study, we explore if and to what extent
1.2.1 Sound discrimination aptitude
Sound discrimination aptitude refers to a cognitive ability that is largely outside one’s control. This capacity is commonly considered an integral part of general language learning aptitude, that is, the innate
In the context of bilingual speech acquisition, the ability to perceive distinct phonetic characteristics of L2 sounds and distinguish them from perceptually-linked L1 sounds is considered essential for accurate production (e.g., Flege, 1995), though more recent models on L2 speech acquisition propose that the link between speech sound perception and production abilities is less straightforward than that (see De Leeuw & Chang, 2024; Flege & Bohn, 2021). As phonemic coding ability includes both sound discrimination and mimicry abilities, individuals with a low phonemic coding ability may encounter difficulties when identifying and remembering unfamiliar sounds and are also less likely to produce them accurately (e.g., Flege et al., 1999; Purcell & Suter, 1980; Saito et al., 2019; Thompson, 1991).
In the context of L1 attrition, speech
1.3. Research questions
The present study aims to expand on the body of research exploring the relationship between L2 acquisition and L1 attrition of
2 Methods
2.1 Participants
The study included four speaker groups (
The second bilingual group included English-Arabic bilinguals (4 males, 11 females; mean age = 33.93 years; range = 33–39 years;
Participants in both bilingual groups had fully acquired their L1 (Arabic or English) in their country of birth and had learned their L2 (English or Arabic) in late adolescence or early adulthood, that is, they can be described as late consecutive bilinguals. Based on the results of the grammar and listening comprehension parts of an English proficiency test (a practice version of the Test of English as a Foreign Language; Educational Testing Services [ETS], 2021) (
Monolingual Arabic and monolingual English speakers served as controls and were matched to the bilingual speakers as closely as possible in terms of level of education (all participants held at least a Bachelor’s degree), regions of residence and thus regional dialects, as specified above. The Arabic monolinguals (4 males, 11 females; mean age = 36.07; range = 30–56 years;
All four participant groups were matched as closely as possible in terms of age and gender. A linear model with age as response variable and participant group as fixed factor (sum-coded for ANOVA-style main effects) shows that the four participant groups do not differ significantly in age (β = -0.15,
2.2 Materials
All anonymized data, analysis scripts, and a result summary are available at https://osf.io/sp2mq/.
2.2.1 Vowel productions
We obtained Arabic and English vowel production data for the short vowels /a-æ/, /ɪ/ and /u-ʊ/ through a reading task. Some studies suggest that L1 attrition effects may be more pronounced in casual speaking styles (spontaneous conversation) than in formal contexts (sentence or word-list reading) because more formal speaking contexts allow participants to pay closer attention to their pronunciation (e.g., Hévrová et al., 2020; Major, 1992). Our aim, however, was to include an equal number of vowel repetitions produced by the individual speakers to provide a representative comparison of their L1 and L2 productions across and within speaker groups. As discussed in previous studies on L1 attrition and L2 acquisition of segmental speech production, using (semi-)spontaneous speech samples does not always allow for testing a balanced number of tokens per target sound and speaker (see Bergmann et al., 2016; Kornder & Mennen, 2021). In addition, as vowels are particularly prone to be affected by co-articulation, we wanted to ensure that the target vowels occurred in a neutral /hVd/ consonant framework in both languages (see, for example, Di Paolo et al., 2011; Robb & Chen, 2009), which would have been rather difficult to control for—if not impossible—in spontaneous speech samples. Accordingly, the English target words included in our study were
Following Mayr et al. (2012), we asked our participants to produce the target words both in isolation and in a fixed carrier phrase. Even though we do not expect considerable production difference in these two conditions, vowels produced in an isolated context may be slightly more peripheral than vowels produced in an embedded context. In our analyses, we do not collapse these two conditions but include speaking condition as a factor (see section 2.2.6) to be able to compare our findings to previous vowel production studies that typically make use of either isolated (e.g., Cobb & Simonet, 2015) or embedded (e.g., Levy & Law, 2010) elicitation tasks. To activate the intended vowel categories (particularly for Arabic nonwords), we primed the target words with real words containing the same vowel sound, for example,
2.2.2 Language background questionnaire
We collected bilinguals’ demographic data (age, gender, years of L1/L2 instruction, language qualifications) using a Language Background Questionnaire (adapted from Schmid, 2002). We also asked participants to report on how frequently they used their respective L1 and L2 with their parents, siblings, partner/spouse, children, and others (including work contexts and friends; see Appendices B and C, for language use information).
2.2.3 Sound discrimination
To investigate sound discrimination aptitude, bilinguals completed an adapted version of
2.2.4 Procedure
We collected data in Saudi Arabia and in the United Kingdom, recruiting participants mainly via personal contacts. The bilingual groups participated in three sessions, each taking place on a different day, with a maximum of 1 week in between: In the first session, the bilinguals completed the proficiency test for their respective L2 and the sound discrimination task (see Alharbi, 2020, for additional tasks not considered here). We then collected Arabic and English production data in two separate sessions. For the vowel production task (see Alharbi et al., 2023; Alharbi, 2020, for plosive and prosody production tasks), participants read out the Arabic and English target words 3 times isolation and 3 times embedded in a carrier phrase while being recorded with a handheld Sony tape recorder (
2.2.5 Data pre-processing
We examined English and Arabic vowels by conducting formant analyses of F1 and F2, corresponding to vowel height and vowel frontness, respectively. Vowel height is negatively correlated with F1, that is, high vowels typically have low F1 values and low values have comparatively higher F1 values. Vowel frontness, by contrast, is correlated with F2, that is, front vowels have high F2 values while back vowels have comparatively lower F2 values (see, for example, Harrington & Cassidy, 1999). For each of the elicited vowels, we set segment boundaries manually in Praat (Boersma & Weenink, 2016) by identifying vowel onset and end point in the spectrogram. Formants were measured at the temporal mid-point of the vowel,
6
using the
To eliminate variation caused by physiological and age-related differences among female and male speakers, we normalized the formant values with the Bark Difference Method (modified from Syrdal & Gopal, 1986), using the
2.2.6 Data analysis
For each vowel, we conducted three sets of analyses in R (R Core Team, 2023), with two analyses per set, one for tongue height (F1-Bark) and one for tongue frontness (F2-Bark). We decided to analyze the three vowels separately considering that our analyses would get too complex and difficult to interpret if we were to conduct one large analysis for all vowels. In addition, individual analyses allow us to more closely determine if and to what extent the vowels are affected differently by both acquisition and attrition. To see if the experimental manipulations affected L2 (RQ1) and L1 (RQ2) productions, we ran linear mixed-effects models using the lme4 package (Bates et al., 2015). The statistical model for each analysis included
To see if participants’ individual differences in terms of sound discrimination aptitude affected vowel productions (RQ3), we ran further linear mixed-effects models restricted to the bilingual groups. The response variables for the models were the distance in tongue height (F1-Bark) and tongue frontness (F2-Bark), respectively, from the monolingual norms (distance-from-norm). To calculate these distances for each language, we subtracted the average F1-Bark and F2-Bark of the respective monolingual groups from each F1-Bark and F2-Bark value for bilingual participants and then took the absolute value of each difference. For example, if a bilingual participants’ F1-Bark value for a certain production of the English vowel /æ/ is 6.3 Bark and the average F1-Bark from monolingual productions of the English /æ/ is 8.3 Bark, then the distance in tongue height from the monolingual norm for this production is |6.3 – 8.3| = |−2| = 2 Bark. The larger this absolute distance from the respective norm, the less nativelike we consider participants’ productions to be. The statistical model for each analysis included
For all linear mixed-effects models, we centered and sum-coded all fixed factors prior to analysis to avoid multicollinearity and to get ANOVA-style main effects and interactions. For each analysis, we then conducted model comparisons using the
To see if successful L2 acquisition might entail L1 attrition (RQ4), we ran Pearson correlations for the bilingual participants.
7
For each participant, vowel, speaking condition, and language, we calculated the distance in tongue height (F1-Bark) and tongue frontness (F2-Bark), respectively, from the monolingual norms. Unlike the previous analysis, where we calculated the distance of
3 Results
3.1 Comparison with previous vowel studies
We first compare the F1/F2 vowel space for /a-æ/, /ɪ/ and /u-ʊ/ of our bilingual and monolingual speaker groups with published reference values (which are only available in Hertz for Arabic) to determine if the formant measurements obtained here are comparable to previous measurements. For Arabic, we chose the average F1 and F2 values of five male 8 Saudi Arabic speakers reported in Alghamdi (1998). For English, we chose the weighted average 9 F1 and F2 values of the male and female speakers in Deterding (1997). Raw and Bark-converted F1 and F2 values for each vowel, language, and speaking condition can be found in Appendices D to F.
Figure 1 shows that our speakers’ Arabic F1 and F2 values are overall higher than the values obtained for Alghamdi’s (1998) Arabic speakers, most likely because Alghamdi (1998) included male speakers only, who tend to have lower F1 and F2 values compared with women (e.g., Huber et al., 1999; Traunmüller, 1988), while participants in our study were mainly female. Figure 1 also shows that our speakers’ English F1 and F2 values are less peripheral than those reported in Deterding (1997). This would be expected as our values comprise vowels from words produced in isolation and in a sentence, whereas values from Deterding (1997) come from citation forms. Overall, our F1 and F2 values for all three Arabic and English vowels are comparable to the values reported in previous studies.

Vowel space for the Arabic and English short vowels /a/-/æ/, /ɪ/ and /u/-/ʊ/ for monolinguals, A-E bilinguals, E-A bilinguals, and reference values.
3.2 Results of formant analyses
For each vowel we now analyze whether (a) the
3.2.1 Vowel /a/-/æ/
Experimental manipulations
We first consider whether the experimental manipulations affect F1-Bark and F2-Bark values of the vowel /a-æ/. Figure 2 shows that Arabic /a/ is produced at an overall higher position, as manifested in higher F1-Bark values, compared with English /æ/. On the front-back axis, Arabic /a/ shows a larger spread in values compared with English /æ/.

F1-Bark and F2-Bark values for Arabic /a/ and English /æ/ in the isolation and sentence conditions. The rectangles represent average F1-Bark and F2-Bark values for the Arabic monolinguals (A), English monolinguals (E), Arabic-English bilinguals (A-E), and English-Arabic bilinguals (E-A). Dots and ellipses show individual data points and distributions, respectively, for each participant group.
The final statistical model for F1-Bark (fixed effects: language, speaking condition; random intercepts: participants; random slopes: speaking condition by participant) in Table 1 shows significant main effects of language, with significantly higher F1-Bark for Arabic compared with English, and speaking condition, with significantly higher F1-Bark for the sentence condition compared with the isolation condition.
Statistical results for the experimental manipulations analysis for F1-Bark for the vowel /a/-/æ/.
The final model for F2-Bark had no fixed effects (random intercepts: participants), such that our experimental manipulations did not significantly affect tongue frontness.
Individual differences
We now explore if our individual differences measure affects the nativeness of /a-æ/ productions. The final model for the F1-Bark distance-from-norm analysis had no fixed effects (random intercepts: participants; random slopes: language by participant), such that speaker group, language, and sound discrimination aptitude did not affect /a-æ/ nativeness in terms of tongue height.
The final model for the F2-Bark distance-from-norm analysis (fixed effects: language; random intercepts: participants; random slopes: language by participant) in Table 2 shows a significant main effect of language, with significantly more target-like productions in terms of tongue frontness for English /æ/ than Arabic /a/. Sound discrimination aptitude did not affect the nativeness of participants’ /a-æ/ productions.
Statistical results for the individual differences analysis for F2-Bark for the vowel /a/-/æ/.
L2 acquisition, L1 attrition relationship
Our third analysis explores whether nativelike realizations of L2 vowels entail nonnative productions of the corresponding L1 vowels. Figures 3 and 4 show how far each individual participant’s F1-Bark and F2-Bark are from monolingual Arabic and English norms. While all correlations had a positive correlation coefficient, we found a significantly positive correlation only for F1-Bark in the sentence condition, suggesting that for /a-æ/ in a carrier phrase, nativelike L2 realizations entail nativelike L1 productions in terms of tongue height.

F1-Bark distance from the Arabic and English norms for the vowel /a-æ/.

F2-Bark distance from the Arabic and English norms for the vowel /a-æ/.
/a/-/æ/ summary
Overall, our analyses suggests that the tongue is in a higher position for Arabic /a/ compared with English /æ/ and for the sentence condition compared with the isolation condition. The absence of any effects involving speaker group suggests that bilinguals have nativelike /a/-/æ/ productions in their respective L2s (RQ1) and show no L1 attrition (RQ2). We also found more target-like productions in terms of tongue frontness for English /æ/ than Arabic /a/ (RQ3). Finally, we found that, for the sentence condition, a more nativelike tongue height in the L2 relates to a more nativelike tongue height in the L1 (RQ4).
3.2.2 Vowel /ɪ/
Experimental manipulations
We now consider whether our experimental manipulations affect the F1-Bark and F2-Bark values of the vowel /ɪ/. Figure 5 reveals that for tongue height, Arabic /ɪ/ shows a larger spread of data points, particularly for monolingual Arabic productions, compared with English /ɪ/. In addition, Arabic /ɪ/ is produced at an overall higher position compared with English /ɪ/.

F1-Bark and F2-Bark values for Arabic and English /ɪ/ in the isolation and sentence conditions. The rectangles represent average F1-Bark and F2-Bark values for the Arabic monolinguals (A), English monolinguals (E), Arabic-English bilinguals (A-E), and English-Arabic bilinguals (E-A). Dots and ellipses show individual data points and distributions, respectively, for each participant group.
The final statistical model for F1-Bark (fixed effects: language, speaker group, speaking condition, speaker group by language interaction, speaking condition by language interaction; random intercepts: participants; random slopes: speaking condition by participant) in Table 3 shows a significant main effect of speaker group, but not of speaking condition and language. The analysis also revealed significant interactions for speaker group by language and speaking condition by language.
Statistical results for the experimental manipulations analysis for F1-Bark for the vowel /ɪ/.
For speaker group, post hoc comparisons showed only that the E-A bilinguals’ F1-Bark values are significantly higher compared with monolinguals.
For the group by language interaction, post hoc simple contrasts for speaker group revealed significant differences only for Arabic /ɪ/, with significantly higher F1-Bark values for both bilingual groups compared with monolinguals. The simple contrasts for language only revealed significant differences for E-A bilinguals, with significantly higher F1-Bark values for Arabic compared with English.
For the speaking condition by language interaction, post hoc simple contrasts for speaking condition revealed a significant difference only for Arabic /ɪ/, with significantly higher F1-Bark for the sentence than the isolation condition. The simple contrasts for language revealed no significant differences.
The final statistical model for F2-Bark (fixed effects: speaking condition; random intercepts: participants; random slopes: speaking condition by participant) in Table 4 shows a significant main effect of speaking condition, with significantly higher F2-Bark for /ɪ/ in the sentence condition compared with the isolation condition.
Statistical results for the experimental manipulations analysis for F2-Bark for the vowel /ɪ/.
Individual differences
We now investigate if our individual differences measure affects the nativeness of bilinguals’ /ɪ/ productions. The final model for the F1-Bark distance-from-norm analysis (fixed effects: language; random intercepts: participants; random slopes: language by participant) in Table 5 shows a significant main effect for language, such that bilinguals’ productions are significantly closer to the monolingual norm in terms of tongue height for English /ɪ/ than for Arabic /ɪ/. We found no significant effect of sound discrimination aptitude on the nativeness of /ɪ/ in terms of tongue height.
Statistical results for the individual differences analysis for F1-Bark for the vowel /ɪ/.
The final model for the F2-Bark distance-from-norm analysis had no fixed effects (random intercepts: participants; random slopes: language by participant). Sound discrimination aptitude did not affect the nativeness in terms of tongue frontness of our bilinguals’ /ɪ/ productions.
L2 acquisition, L1 attrition relationship
The third analysis focuses on the acquisition-attrition relationship of the /ɪ/-vowel. Figures 6 and 7 show how far each participant’s F1-Bark and F2-Bark are from the monolingual Arabic and English norms, respectively. While all correlations again had a positive correlation coefficient, we found a significantly positive correlation only for F2-Bark in the sentence condition, suggesting that for /ɪ/ in a carrier phrase, nativelike L2 realizations entail nativelike L1 productions in terms of tongue frontness.

F1-Bark distance from the Arabic and English norms for the vowel /ɪ/.

F2-Bark distance from the Arabic and English norms for the vowel /ɪ/.
/ɪ/ summary
Overall, our analysis suggests that Arabic /ɪ/ is produced with a higher tongue position compared with English /ɪ/. Furthermore, E-A bilinguals produce Arabic /ɪ/ at an overall higher position than Arabic monolinguals, suggesting nonnative L2 productions (RQ1). Similarly, the A-E bilinguals produce Arabic /ɪ/ at a higher position than Arabic monolinguals, indicating L1 attrition (RQ2). No such effects were found for the E-A and A-E bilinguals’ English /ɪ/, suggesting nativelike productions (successful acquisition, no attrition). We also found more target-like productions in terms of tongue height for English compared with Arabic /ɪ/ (RQ3). Finally, we found that, for the sentence condition, a more nativelike tongue frontness in the L2 relates to a more nativelike tongue frontness in the L1 (RQ4).
3.2.3 Vowel /u/-/ʊ/
Experimental manipulations
We first analyzed whether our experimental manipulations affect F1-Bark and F2-Bark values for /u-ʊ/. Figure 8 shows a larger spread of data points on both axes for Arabic /u/ compared with English /ʊ/. Furthermore, the bilinguals’ Arabic /u/ productions have overall higher F1-Bark and F2-Bark values, that is, a higher and more back position compared with the monolinguals.

F1-Bark and F2-Bark values for Arabic /u/ and English /ʊ/ in the isolation and sentence conditions. The rectangles represent average F1-Bark and F2-Bark values for the Arabic monolinguals (A), English monolinguals (E), Arabic-English bilinguals (A-E), and English-Arabic bilinguals (E-A). Dots and ellipses show individual data points and distributions , respectively, for each participant group.
The final statistical model for F1-Bark (fixed effects: speaker group, speaking condition, language, group by language interaction; random intercepts: participants) in Table 6 only shows significant main effects for speaker group and speaking condition, with the latter showing higher F1-Bark values for the sentence compared with the isolation condition. Post hoc pairwise comparisons for speaker group revealed that the two bilingual groups’ F1-Bark values were significantly higher than the monolinguals’ values, but the two bilingual groups did not differ significantly.
Statistical results for the experimental manipulations analysis for F1-Bark for the vowel /u/-/ʊ/.
The final model for F2-Bark (fixed effects: speaker group, speaking condition, language, speaker group by speaking condition interaction, speaker group by language interaction; random intercepts: participants; random slopes: speaking condition by participant) in Table 7 shows a significant main effect of speaker group (but with no significant differences across groups in the post hoc tests) and significant interactions of speaker group by speaking condition and speaker group by language.
Statistical results for the experimental manipulations analysis for F2-Bark for the vowel /u/-/ʊ/.
For the speaker group by speaking condition interaction, post hoc simple contrasts for speaking condition showed only significantly higher /u/-/ʊ/ F2-Bark values for E-A bilinguals than monolinguals in the isolation speaking condition. Post hoc simple contrasts for speaker group further showed significantly higher /u-ʊ/ F2-Bark values for the sentence compared with isolation condition for monolinguals, and significantly lower /u-ʊ/ F2-Bark values for the sentence compared with isolation condition for E-A bilinguals.
For the speaker group by language interaction, post hoc simple contrasts for language only showed significantly higher F2-Bark values for E-A bilinguals compared with monolinguals for Arabic /u/. Post hoc simple contrasts for speaker group showed only that the E-A bilinguals’ F2-Bark values are significantly higher for Arabic /u/ than English /ʊ/.
Individual differences
We now analyze whether our individual differences measure affects the nativeness of bilinguals’ /u-ʊ/ productions. The final model for the F1-Bark distance-from-norm analysis (fixed effects: language; random intercepts: participants; random slopes: language by participant) in Table 8 shows a significant main effect of language, with more target-like productions for English than Arabic.
Statistical results for the individual differences analysis for F1-Bark for the vowel /u/-/ʊ/.
The final model for the F2-Bark distance-from-norm analysis had no fixed effects (random intercepts: participants; random slopes: language by participant), such that there were no significant effects involving speaker group, language or sound discrimination aptitude.
L2 acquisition, L1 attrition relationship
The final analysis explores the acquisition-attrition relationship for /u-ʊ/. Figures 9 and 10 show how far each participants’ F1-Bark and F2-Bark are from the monolingual Arabic and English norms, respectively. While all Pearson correlations again had a positive correlation coefficient, none of them were significant. This suggests that, for the vowel /u-ʊ/, there is no direct relationship between successful L2 acquisition and L1 attrition/maintenance.

F1-Bark distance from the Arabic and English norms for the vowel /u-ʊ/.

F2-Bark distance from the Arabic and English norms for the vowel /u-ʊ/.
/u-ʊ/ summary
Overall, our analysis suggests that our bilinguals produced /u-ʊ/ at an overall higher position than the monolinguals in both languages. Furthermore, E-A bilinguals produced L2 /u/ with the tongue in a more back position (i.e., higher F2-Bark) than monolingual Arabic speakers, which suggests that they do not meet the native target for this L2 vowel (RQ1). At the same time, their L1 English /ʊ/ productions show no evidence of attrition (RQ2). No such effects were found for the A-E bilinguals, that is, both their L1 and L2 productions of /u-ʊ/ are nativelike. We also found more target-like productions in terms of tongue height for English /ʊ/ compared with Arabic /u/ (RQ3). Finally, we found no L2 acquisition, L1 attrition relationship for /u-ʊ/ (RQ4).
4 Discussion
The present study sought to identify whether long-term L2-immersed bilinguals produce the vowels /a-æ/, /ɪ/ and /u-ʊ/ with nativelike acoustic features in their L1 (Arabic/English) and L2 (English/Arabic) (RQ1, RQ2). In addition, we aimed to determine if and to what extent sound discrimination aptitude may account for individual differences in vowel production accuracy among our bilinguals (RQ3). A further objective was to gain insights into the relationship between L2 acquisition and L1 attrition of segmental productions in a bilingual context, that is, do nativelike L2 vowel productions entail attrition of the corresponding L1 vowels, or is the relationship between the two less straight-forward than that (RQ4)?
4.1 L2 acquisition (RQ1) and L1 attrition of vowels (RQ 2)
A number of acoustic studies has focused on the extent to which either only L1 (Bergmann et al., 2016; De Leeuw, 2019; Hévrová et al., 2020) or L1
Our vowel analysis revealed differences between the bilingual and the monolingual groups for the vowels /ɪ/ and /u-ʊ/: While producing English /ɪ/ with nativelike F1-Bark values (+ successful acquisition), the Arabic-English bilinguals realized the corresponding L1 vowel outside the monolingual Arabic norm (+ L1 attrition). The English-Arabic bilinguals, however, produced L2 Arabic /ɪ/ with nonnative F1-Bark values (–successful acquisition) while maintaining nativelike values for L1 /ɪ/ (–L1 attrition). Thus, only F1-Bark of Arabic /ɪ/ was affected. Similarly, we found higher F1-Bark for /u-ʊ/ in both bilingual groups compared with the monolinguals (–successful acquisition, +L1 attrition). As discussed by Mayr et al. (2012), changes affecting F1 are generally more likely to occur as the human ear is more sensitive to differences in lower than in higher frequencies, which could provide an explanation for our observations. We also found that F2-Bark of English /ʊ/ in an isolated condition was affected in one of the bilingual groups, such that the English-Arabic bilinguals produced L2 /u/ with F2-Bark-values outside the monolingual Arabic norm (–successful acquisition) while maintaining nativelike values in their L1 (–L1 attrition). The Arabic-English bilinguals, by contrast, produced both vowels in a nativelike fashion (+ successful acquisition, –L1attrition).
While differences between our bilingual and monolingual groups were observed for the vowels /ɪ/ and /u-ʊ/, the vowel /a-æ/ was produced within native ranges in both languages by both bilingual groups, that is, they successfully acquired the L2 vowel while maintaining nativelike values in their respective L1s. One reason for this may be found in the notion of similarity proposed by the SLM (Flege, 1995; Flege & Bohn, 2021). It posits that the greater the perceived distance between similar but acoustically nonidentical L1 and L2 sounds, the more likely it is that bilinguals establish two separate categories for these sounds. It might therefore be the case that the bilinguals in our study perceived Arabic /a/ and English /æ/ to be different enough—both acoustically and phonemically (Chang, 2019)—to categorize them as two distinct vowels.
The question of whether the two vowels are perceived to belong to the same or to different phonetic categories also raises the issue of how the relationship between perception and production accuracy should be characterized. Some models proposed in the context of L2 speech acquisition, such as the Perceptual Assimilation Model (PAM-L2; Best & Tyler, 2007), suggest that perception accuracy precedes production accuracy, that is, accurate perception is a prerequisite for accurate production. In contrast, more recent models, such as the Speech Learning Model-revised (SLM-r; Flege & Bohn, 2021) and ADAPPT (De Leeuw & Chang, 2024; see below for more details), argue that there is no clear link between speech perception and production. As noted by De Leeuw and Chang (2024), determining whether and to what extent production and perception accuracy are related requires a more comprehensive examination, not only of the acoustic-phonetic realizations of specific segmental and prosodic features, as investigated in the present study, but also of how individuals perceive these features. Therefore, the findings of our study do not allow for broad conclusions regarding the relationship between production and perception in bilinguals’ L1/L2. However, a vowel perception study could offer further insights into the extent to which our bilinguals perceive the Arabic and English short vowels as similar.
4.2 Individual differences (RQ3)
When it comes to how nativelike our bilinguals’ vowels were, we found that productions were overall more target-like for English than for Arabic. One possible explanation for this observation could be the widespread use of English across the world due to its role as a lingua franca. As a result of its international status, English is not only widely spoken—both as a native and even more so as a second or foreign language—but also frequently encountered in different media, and in educational and professional contexts. This widespread presence of English as a lingua franca may have contributed to a greater familiarity with its phonetic characteristics. This is likely to apply to both bilinguals whose L1 is Arabic but who live in an English-speaking country and to L1 English bilinguals.
Our predictor variable sound discrimination aptitude had no significant effects on the nativelikeness of /a-æ/, /ɪ/, and /u-ʊ/ productions. This finding may suggest that while language learning aptitude in general and phonetic aptitude in particular are often considered important factors in L2 acquisition (of pronunciation) (e.g., Baker Smemoe & Haslam, 2013), they may not be the ultimate determinant of production accuracy. Instead, external factors such as language exposure, quality and amount of L1/L2 input as well as the sociolinguistic status of Arabic and English, respectively, may either override the internal cognitive factor of aptitude or interact with it.
In summary, in terms of individual differences, we found more target-like productions for English compared with Arabic, potentially due to the role of English as a lingua franca. To gain a better understanding of how language learning aptitude may influence L2 acquisition and L1 maintenance, future studies need to more closely explore other cognitive skills associated with language learning aptitude, such as grammatical sensitivity, memory capacity or associative memory (i.e., the ability to remember new words) (see Carroll, 1981). Furthermore, future research should investigate the influence of sociolinguistic factors, such as language prestige, to provide a more comprehensive understanding of bilingual language acquisition and maintenance.
4.3 Relationship between acquisition and attrition (RQ4)
When it comes to characterizing the acquisition-attrition relationship in pronunciation, one hypothesis is that an increased L2 proficiency is likely to cause L1 proficiency to decline (e.g., Flege, 1987; Major, 1992). The relatively sparse number of studies (Alharbi et al., 2023; Kornder et al., 2023; Kornder & Mennen, 2021; Mayr et al., 2012; Stoehr et al., 2017) that examined the acquisition-attrition relationship in late bilinguals’ segmental productions supports this hypothesis, at least to some extent. Kornder et al. (2023), for example, found that the majority of their English-Austrian German bilinguals—despite some inter-speaker variability—produced the /i:/-/ɪ/ contrast in their L2 in a nativelike fashion, but at the same time exhibited nonnative productions of the corresponding L1 contrast. That is, their L1 vowel categories have approached L2-norms, presumably as a result of category assimilation (Flege, 1995; Flege & Bohn, 2021; see also Kornder & Mennen, 2021).
In our study, however, we observed quite the contrary. For the vowel /a-æ/ produced in a sentence, the closer our bilinguals’ F1-Bark Arabic values were to the monolingual Arabic norm, the closer were their English F1-Bark values to the monolingual English norm. We found the same significant positive correlation for the vowel /ɪ/ produced in a sentence, but only for F2. These findings suggest that bilinguals who produce L2 vowels in a nativelike fashion also maintain nativelike L1 categories. It may therefore be the case that nativelike categories in
Adopting the SLM as a theoretical framework to contextualize our findings, the observation that some of our bilinguals exhibit nativelike vowel categories in both the L1 and the L2 may be explained against the background of the category precision hypothesis proposed in the revised version of the SLM (Flege & Bohn, 2021). According to this hypothesis, L2 learners’ ability to establish a new phonetic category for an L2 sound depends, among other factors, on how precisely the closest L1 category has been specified at the onset of L2 learning. That is, speakers whose L1 categories are relatively precise when L2 learning begins are assumed to more successfully distinguish an L2 sound from its closest L1 counterpart compared with bilinguals whose L1 categories are less precise. Precision in this context is defined as “the variability of acoustic dimensions measured in multiple productions of a phonetic category” (Flege & Bohn, 2021, p. 36). Consequently, precise
Taken together, the previous and present findings suggest that the acquisition-attrition relationship may change over the course of bilingual experience and is not as stable and straightforward as one may expect. Rather, it seems that this relationship is adaptive and highly variable, which is in line with a dynamic systems approach (DST) to language development (e.g., De Bot et al., 2007, 2013). According to DST, language development proceeds dynamically and in a nonlinear fashion, such that even small variations in one part of the system may lead to differences in outcome (De Bot et al., 2007). It may be due to such subtle system-internal modifications that, as our findings reveal, not all vowels show a significant positive L2 acquisition, L1 attrition relationship (see also Kornder & Mennen, 2021; Mayr et al., 2012). As proposed by DST, language systems constantly adjust not only to system-internal mechanisms (e.g., cross-linguistic interaction between the L1 and L2) but also to external factors (De Bot et al., 2007). Consequently, as stated above, an exploration of how motivational, attitudinal and emotional factors potentially influence nativelikeness of L1/L2 segmental speech production will shed further light on the relationship between acquisition and attrition in the phonetic domain.
The notion that language development is inherently dynamic is also reflected in one of the core principles of a more recent theory proposed by De Leeuw and Chang (2024) in the context of L1 attrition and drift. The
Footnotes
Appendix
F1 and F2 values in both Hertz and Bark for the vowel /u/-/ʊ/ for each speaker group, speaking condition, and language.
| speaker group | speaking condition | formant | Arabic (Hertz) | English (Hertz) | Arabic (Bark) | English (Bark) |
|---|---|---|---|---|---|---|
| monolinguals | isolation | F1 | 586.98 | 536.92 | 9.27 | 10.07 |
| F2 | 1435.59 | 1467.76 | 4.10 | 4.54 | ||
| sentence | F1 | 564.37 | 520.06 | 9.71 | 10.22 | |
| F2 | 1411.25 | 1351.64 | 4.42 | 5.02 | ||
| A-E bilinguals | isolation | F1 | 479.64 | 495.31 | 10.49 | 10.17 |
| F2 | 1294.5 | 1297.73 | 5.13 | 4.99 | ||
| sentence | F1 | 491.66 | 491.01 | 10.58 | 10.70 | |
| F2 | 1326.62 | 1395.93 | 5.14 | 5.04 | ||
| E-A bilinguals | isolation | F1 | 502.16 | 497.51 | 10.43 | 10.49 |
| F2 | 1256.2 | 1339.53 | 5.48 | 5.08 | ||
| sentence | F1 | 490.0 | 510.89 | 10.66 | 10.38 | |
| F2 | 1357.89 | 1406.25 | 5.09 | 4.76 |
Acknowledgements
We would like to thank the participants who took part in this study. We would also like to thank Ineke Mennen for her valuable input and feedback in the project, and Maryann Tan for recording the Cantonese stimuli.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The second author received a scholarship from Umm Al-Qura University to complete this work. The authors furthermore acknowledge the financial support by the University of Graz.
Ethical approval
The Bangor Ethics Committee of the College of Arts and Humanities approved our study (approval: CAH 1 1) on August 19, 2015.
Consent to participate
Respondents have given written informed consent for review and signature before participating in the study.
