Abstract
Aims and Objectives/Purpose/Research Questions:
Second dialect acquisition is the process of acquiring features of a new dialect. This study directly compares the rates of second dialect acquisition in first (L1) and second (L2) language speakers in speech production and speech perception, and at different linguistic levels to establish the relative flexibility of the two groups.
Design/Methodology/Approach:
The participants completed picture-naming, lexical decision, and wordlist reading tasks, which aimed to estimate their preference for American English or Australian English variants in vocabulary and pronunciation.
Data and Analysis:
The data come from 55 participants forming 4 different groups: 13 L1 American English speakers in the United States, 14 L1 American English speakers in Australia, 15 L1 Russian speakers of English in the United States, and 13 L1 Russian speakers of English who have lived in the United States and are residing in Australia.
Findings/Conclusions:
Both L1 and L2 groups show some second dialect acquisition, but L2 speakers exhibit a comparatively higher increase in second dialect feature use. While the difference in the picture-naming task did not reach significance, L2 speakers showed a greater increase in second dialect feature use in the other two tasks. These higher rates of second dialect acquisition in L2 speakers are likely due to their comparatively lower entrenchment and psycho-social investment in the language.
Originality:
This is the first study to directly compare the rates of second dialect acquisition in first and second language speakers of English, which can help elucidate some long-standing questions as to the factors affecting dialect and language acquisition. It also considers different linguistic levels and examines both speech production and speech perception.
Significance/Implications:
The finding that L2 speakers are more flexible than L1 speakers portrays L2 speakers as successful language learners and highlights their bilingual advantage.
Keywords
Introduction
Learning a second language (L2) has multiple social and economic advantages: bilinguals can communicate with more people in their languages and tap into other cultures more easily than monolinguals, making them more competitive in a global work environment. Some studies find that the bilingual advantage extends to other spheres, particularly related to executive functioning, such as task-switching (e.g., Prior & MacWhinney, 2010), and can have far-reaching health implications with bilinguals showing later onset of age-related cognitive impairments (Bialystok et al., 2016). Unsurprisingly, bilinguals may also demonstrate a linguistic benefit in third language (L3) acquisition, due to higher metalinguistic awareness, more flexibility in learning strategies, and a broader linguistic repertoire (Cenoz, 2013). Such an advantage has been shown for different linguistic levels, including phonetic (Antoniou et al., 2015) and lexical (Kaushanskaya & Marian, 2009) learning. The linguistic distance between the languages involved may play a role in the bilingual linguistic advantage in that transfer to L3 may happen more readily from a typologically closer language (Cenoz, 2013). An extreme case of this would be when the variety is not a separate language, but a dialect of one of the languages spoken.
Second dialect acquisition (SDA) is the process of incorporating features of a different dialect in one’s repertoire. Features of a second dialect (D2) can be acquired by both first (L1) and second language speakers (e.g., when L1 speakers of American English and L1 speakers of Russian move from the United States to Australia). The factors predicting SDA have been extensively studied in L1 speakers (see Siegel, 2010, for a review). Research on SDA in L2 is much more limited, but existing studies suggest that similar constraints play a role there and more generally in second language acquisition (SLA). In terms of age of arrival, Chambers (1992) specifically suggests that speakers who move to a new dialect area before the age of 7 will acquire its features and those above 14 will not, which is comparable to the L2 critical period of 6–12 years of age for phonology and 15 for morphosyntax proposed by Long (1990). Most researchers agree that, after the critical period, age of arrival has no significant effect on SDA (Drummond, 2012) or SLA (e.g., J. S. Johnson & Newport, 1989). The findings in relation to the effect of length of residence (LoR) are much more mixed. A longer length of residence was found to be associated with higher rates of D2 acquisition in some studies with L1 (Foreman, 2003) and L2 (Drummond, 2012) speakers. At the same time, other studies find no such relationship, suggesting that this variable may interact with other, especially psycho-social, variables (e.g., Stanford, 2007). Of such factors, social networks have been shown to play a role, with D2 speakers who have more exposure to native speakers of the dialect (through marriage, etc.) exhibiting higher rates of D2 feature usage in L1 (e.g., Bortoni-Ricardo, 1985) and L2 (Drummond, 2012).
Finally, attitudes toward the dialects involved are suggested to predict SDA: positive attitudes toward the first dialect are associated with feature retention, and negative attitudes (including linguistic insecurity) with D2 acquisition, and vice versa (e.g., in L1: Kang, 2022; L2: Drummond, 2012; L3: Castle et al., 2025). This literature paints a picture of a striking similarity in the factors predicting SDA in L1 and L2, but since these are rarely compared directly, we do not know whether L1 or L2 speakers are more flexible in the use of D2 features. We might expect a difference between L1 and L2 speakers because of their differential experience with the D1, both in terms of amount of exposure and degree of psycho-social investment (i.e., identity and social networks).
Some empirical research, produced in particular by Spinu and colleagues, suggests that bilingual speakers may show higher rates of SDA (though see a finding of a monolingual advantage for a suprasegmental variable in Spinu & Rafat, 2019). For example, Spinu et al. (2018) conducted a study in which English-speaking monolinguals and bilinguals from Canada were asked to imitate the Sussex English accent and their rates of the glottal stop in the word-final position were compared across baseline, training, and testing phases. The bilinguals showed higher rates of the glottal stop after training. Similar results were found in another part of the same larger study considering the acquisition of four phonological features in a constructed, artificial accent using the same general methodology (Spinu et al., 2020). These are important studies suggesting a bilingual advantage in SDA; however, they are limited to several phonological variables and involve very limited exposure to a D2 in a laboratory setting; they also focus on simultaneous bilinguals, so it is unclear whether adult L2 learners would also show such an advantage.
In a study that focused on L2 speakers’ SDA at the lexical level, Gnevsheva et al. (2022) compared the use of American and Australian vocabulary items (e.g., diaper vs nappy) in Australia-based L1 and L2 speakers of English with Australian English as either D1 or D2 (L1 speakers of American and Australian English, and L2 English speakers of L1 Russian with and without prior residence experience in North America). They found significant differences between all the groups. The difference between the L2 speakers with and without residence experience in North America was taken as evidence of SDA in L2. Moreover, the difference between the L1 and L2 groups with Australian English as D2 was taken as evidence that L2 speakers are more flexible in SDA compared to L1 speakers. However, this conclusion is based on an assumption about what the Australian English D2 speakers’ use of vocabulary items was like prior to relocation to Australia. A direct comparison between L1 speakers of American English and L1 speakers of Russian in the United States and Australia would give us more reliable evidence for or against SDA. In addition, as this study focused on lexical production, we do not know how SDA manifests in perception and at other linguistic levels, such as phonology.
The current study builds on Gnevsheva et al. (2022) and aims to address the above research gap by (1) collecting data with additional participant groups in the United States and comparing variety-specific feature use by L1 and L2 speaker groups in the United States (D1) and Australia (D2), (2) considering several linguistic levels (phonetics in addition to the lexicon), and (3) examining speech perception in addition to speech production, with the following research questions:
How does D1 and D2 speakers’ lexical production compare in L1 and L2?
How does D1 and D2 speakers’ lexical perception compare in L1 and L2?
How does D1 and D2 speakers’ phonetic production compare in L1 and L2?
Because of the parallels in SDA and SLA (e.g., for cognitive control mechanisms in language and dialect switching in Kirk et al., 2018), it is possible that L1 and L2 speakers will use D2 features at the same rate. Alternatively, L1 speakers may be more successful at SDA because of a potentially higher typological similarity between a native speaker’s D1 and D2 (cf. Bongaerts et al., 2000). Finally, L2 speakers may exhibit higher SDA rates because of their relatively weaker entrenchment of the D1 (Schmid, 2017) and weaker psycho-social investment in it (Siegel, 2010).
Method
Participants
The analysis is based on the data coming from 55 participants forming 4 different groups: 13 L1 American English speakers in the United States, 14 L1 American English speakers residing in Australia, 15 L1 Russian speakers of English residing in the United States, and 13 L1 Russian speakers of English who have lived in the United States and are residing in Australia. The exact number of participants whose data is analyzed in different tasks varies because of data loss due to poor quality of recording or other technical issues and is specified below for each task. The L1 English speakers in the United States were recruited through the authors’ social circles and the snowballing method. The other participants were recruited through social media posts in relevant groups (such as Americans in Sydney, Russians in Arizona) and the friend-of-friend method.
The participants in the four groups were similar to each other on a number of demographic characteristics (Table 1). Their average age was in the mid- to late 30s. There were about twice as many women as men. The L2 groups (in the United States and Australia) were proficient in English as indicated by their high-skilled occupations (e.g., IT engineers, graduate students, managers); they rated their English ability at 6.7 and 7.5 on a 10-point scale (1 = poor to 10 = native-like) and had lived in the United States for 7.4 and 3.6 years, respectively. None of the mobile participants had moved to their L2 or D2 country before the age of 16, suggesting that age of arrival would have a minimal effect on acquisition. The D2 groups (L1 and L2 speakers in Australia) had lived in Australia for 3.6 and 4.5 years, respectively; they had positive attitudes toward Australia (11.7 and 11.4, respectively, on a 6–36 scale with smaller values meaning more positive). All participants had slightly less positive attitudes toward the United States (range = 12.5–19.3). A Welch two-sample t-test, performed for the D2 groups to confirm that they were not significantly different in the variables which may potentially affect SDA, revealed no significant difference in their Australian LoR and attitudes toward Australia and the United States.
Participants’ demographic information (incl. mean and standard deviation in parentheses for continuous variables).
LoR information is missing for two of the participants.
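The Welch two-sample t-test mentioned above compares two group means without assuming equal variances. The following is a minimal sketch of the statistic and its Welch-Satterthwaite degrees of freedom; the function name `welch_t` and the length-of-residence values are hypothetical and are not the study's data. The p-value lookup in the t distribution is omitted.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean; variances are not pooled.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df

# Hypothetical length-of-residence values in years (illustration only).
lor_l1_au = [1.0, 2.0, 3.0, 4.0, 5.0]
lor_l2_au = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, dof = welch_t(lor_l1_au, lor_l2_au)
```

Because the degrees of freedom are estimated from the two sample variances, they are generally not an integer and shrink when the group variances differ.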
Data collection procedure
The Australia-based groups completed the study on a laptop computer with E-Prime 3.0 (Psychology Software Tools, Inc., 2016) in a phonetics lab on a university campus in 2019. The US-based groups participated in the study during Covid-19-related lockdowns in 2020 and 2021, so the data was collected remotely on participants’ own computers using Gorilla Experiment Builder (Anwyl-Irvine et al., 2020). The participants were recruited and data was collected by an L1 Russian L2 English-speaking author and an L1s Russian/Australian English-speaking research assistant. As data collection was computer-based, there was minimal in-person interaction between the participants and the experimenter. Because the data was collected in the respective country of residence, we expect that the data represent participants’ current production. The participants completed several tasks, followed by a demographic and attitudinal questionnaire; here we focus on the picture-naming, lexical decision, and wordlist reading tasks. This procedure was reviewed and approved by the University Human Ethics Committee, and informed consent was obtained from the participants.
The picture-naming data from a total of 51 participants was analyzed (9 L1 speakers in the United States, 14 L1 speakers in Australia, 15 L2 speakers in the United States, and 13 L2 speakers in Australia). In this task, the participants were presented with 80 images and asked to name what they saw in one word only (for additional methodological detail see Gnevsheva et al., 2022). A + sign appeared on the screen for 1,000 ms, followed by an image on a white background for 2,000 ms. Five of the images were practice items at the beginning of the task. The remaining 75 comprised 50 test items and 25 fillers, randomized for each participant. The fillers and practice images were expected to elicit the same response from speakers of both varieties (e.g., dog). The test items represented objects that can be denoted by different lexical items in American and Australian English (e.g., flashlight/torch). These word pairs were identified with the help of dictionaries of American and Australian English (Macquarie Dictionary, n.d.; Merriam-Webster, n.d.), and with feedback from native speakers of both varieties. One item in each pair was designated as American and the other as Australian based on their relative frequency in the respective varieties in GloWbE (Davies & Fuchs, 2015). Participant responses were coded as American, Australian, or NA. NA included responses that could not be coded according to our scheme: a zero submission (due to failure to produce a response in the time allotted or any technical issue that would have resulted in a non-submission on Gorilla) or an unforeseen response such as beach for vacation/holiday. NAs accounted for 12.7% in L1 speakers in the United States, 21.6% in L1 speakers in Australia, 62.5% in L2 speakers in the United States, and 40.4% in L2 speakers in Australia (the higher proportion of NAs for L2 speakers is unsurprising given their less efficient lexical retrieval in comparison with L1 speakers, Bialystok et al., 2008).
NAs were excluded from further analysis, so the results present a binary choice between American and Australian responses.
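The coding scheme described above can be sketched as follows. This is an illustrative sketch only: the function names, the exact string matching, and the five example trials are assumptions, and the study's actual coding procedure may have differed (for instance, in how spelling variants or transcription were handled).

```python
def code_response(response, american, australian):
    """Code one picture-naming response as 'American', 'Australian', or 'NA'."""
    if response is None:  # zero submission (timeout or technical issue)
        return "NA"
    word = response.strip().lower()
    if word == american:
        return "American"
    if word == australian:
        return "Australian"
    return "NA"  # unforeseen response, e.g. "beach" for vacation/holiday

def percent_australian(codes):
    """Percentage of Australian responses after excluding NAs."""
    valid = [c for c in codes if c != "NA"]
    if not valid:
        return None
    return 100 * valid.count("Australian") / len(valid)

# Hypothetical trials: (response, American item, Australian item).
trials = [
    ("torch", "flashlight", "torch"),
    ("flashlight", "flashlight", "torch"),
    ("beach", "vacation", "holiday"),
    (None, "diaper", "nappy"),
    ("Nappy", "diaper", "nappy"),
]
codes = [code_response(r, us, au) for r, us, au in trials]
share = percent_australian(codes)
```

Dropping NAs before computing the percentage is what turns the outcome into the binary American vs Australian choice that the analysis models.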
The lexical decision data from 52 participants (13 L1 speakers in the United States, 14 L1 speakers in Australia, 12 L2 speakers in the United States, and 13 L2 speakers in Australia) was analyzed (for additional methodological detail see Szakay et al., 2019). The task was conducted using a total of 152 words as stimuli (76 real words, 76 pseudo-words). The real words were further divided into 38 pairs, with one item in each pair representing an Australian lexical item (e.g., torch) and the other a corresponding American item (e.g., flashlight). One Australian and one American male native speaker each read out all the words, resulting in a total of 304 audio stimuli. Each lexical item was thus pronounced in both an Australian and an American accent, with half of the stimuli having matched and half having mismatched speaker accent and word dialect. The participants heard the stimuli individually in random order and had to indicate whether each was a real English word or not by pressing a button on a button-box or computer keyboard as fast as they could. Reaction times (RTs) were measured from the offset of each word, and outliers more than 2.5 SD away from each participant’s mean were removed. RTs were then z-scored to neutralize the effect of the different platforms used in different parts of the project (in-lab E-Prime vs. online Gorilla). Here we report accuracy and RT data on real words with matched dialect and accent only, that is, torch in an Australian accent and flashlight in an American accent: a total of 76 words per participant, resulting in 3952 observations for the data set. Reaction time data was analyzed on accurate responses only, providing insight into processing speed for recognized lexical items.
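The RT preprocessing described above (removal of outliers beyond 2.5 SD of a participant's mean, followed by by-participant z-scoring) might look like the sketch below. The function name and the RT values are hypothetical, and computing the z-scores from the mean and SD of the trimmed data is an assumption, since the text does not specify this detail.

```python
from statistics import mean, stdev

def trim_and_zscore(rts, criterion=2.5):
    """Drop RTs beyond `criterion` SDs of the participant mean, then z-score."""
    m, sd = mean(rts), stdev(rts)
    kept = [rt for rt in rts if abs(rt - m) <= criterion * sd]
    # Assumption: z-scores use the mean and SD of the trimmed data.
    m_kept, sd_kept = mean(kept), stdev(kept)
    return [(rt - m_kept) / sd_kept for rt in kept]

# Hypothetical RTs (ms) for one participant; the last value is an outlier.
lab_rts = [500, 520, 480, 510, 490, 505, 495, 515, 485, 3000]
# The same responses with a constant 150 ms platform delay added.
online_rts = [rt + 150 for rt in lab_rts]

z_lab = trim_and_zscore(lab_rts)
z_online = trim_and_zscore(online_rts)
```

In this toy case the two lists of z-scores are identical, which illustrates why by-participant z-scoring removes a constant latency difference between platforms while preserving each participant's relative pattern across trials.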
The wordlist reading data from 50 participants (12 L1 speakers in the United States, 12 L1 speakers in Australia, 13 L2 speakers in the United States, and 13 L2 speakers in Australia) was analyzed. The stimuli were 100 words of the (C)CVC(C) structure from each of the 10 lexical sets
The choice of the three tasks allows us to investigate SDA at several linguistic levels (picture-naming focuses on lexical production, while wordlist reading focuses on phonetic and phonological production) and to test whether the linguistic level mediates any difference in SDA between L1 and L2 speakers, as previous research suggests that the level of phonology is more susceptible to maturational constraints (e.g., Siegel, 2010). The tasks also cover both speech production and speech perception (picture-naming vs lexical decision) in recognition of the importance and inter-dependence of both (Nycz, 2015). We might expect to see differences between L1 and L2 speakers in one modality and not the other, as L1 speakers may show an advantage in perception while L2 speakers may be more willing to use D2 vocabulary items.
In discussing the tasks and results we talk about acquisition as a short-hand for different underlying processes. The picture-naming task captures use of a word in the given situation and does not mean that the participant does not know the corresponding word from the other dialect. The lexical decision task registers participants’ ability to accurately and quickly recognize lexical items from the two dialects as real words; the difference in performance on words from the two dialects will suggest variable knowledge or speed of access. Accuracy then reflects vocabulary acquisition, while reaction times reflect processing efficiency during lexical access. When a participant correctly identifies a word, the accuracy data shows that the word is part of their (receptive) lexicon. Reaction times are analyzed on accurate responses only, that is, words that participants recognize; therefore, differences in reaction time reveal differences in processing efficiency rather than acquisition. The wordlist reading task taps into phonetic and phonological production of the segments of interest in the specific task; it does not mean that participants do not produce other vowel qualities as part of, for example, style-shifting.
The questionnaire at the end of the study collected applicable demographic information about participants, that is, gender, age, self-rating of English proficiency on a 10-point scale, age of arrival and length of residence in English-speaking countries. It also included six attitudinal statements (e.g., Australia is a good place to live; Gnevsheva et al., 2022, based on Drummond, 2012), which participants responded to on a 7-point agreement scale.
Statistical analysis
Mixed effects models (Baayen et al., 2008) were fit to the data using the lmerTest package (Kuznetsova et al., 2017) in R (R Core Team, 2022). Pair-wise comparisons were conducted in emmeans (Lenth, 2022). The sjPlot package was used for visualizing the model output (Lüdecke & Lüdecke, 2019). A binomial logistic mixed effects model was fit to the picture-naming data with word choice (American vs Australian) as the dependent variable, an interaction between location (US vs AU) and English speaker status (L1 vs L2) as predictor, and participant and word as random intercepts. Two separate models were fit to the lexical decision task data. The accuracy data was analyzed by a binomial logistic mixed effects model—using the bobyqa optimizer (Powell, 2009)—with accuracy (0 or 1) as the dependent variable, participant and word as random intercepts, and a three-way interaction of location (US vs AU), English speaker status (L1 vs L2), and word dialect (American vs Australian) as predictor. The reaction time data was analyzed by a linear mixed effects model with reaction time (calculated from the end of the word and z-scored by participant) as the dependent variable, and participant and word as random intercepts. The predictor variables included the participant’s reaction time on the immediately preceding trial, as well as a three-way interaction of location (US vs AU), English speaker status (L1 vs L2), and word dialect (American vs Australian). A linear mixed effects model was fit to the wordlist reading data for each lexical set with formant measurements as the dependent variable, an interaction between location (US vs AU) and English speaker status (L1 vs L2) as predictor, and participant and word as random intercepts. The models were manually pruned in a step-wise fashion to exclude non-significant predictors.
Results
Picture-naming task
Table 2 summarizes the percentage use of Australian words by the participant groups in the picture-naming task. Australia-based participants use more Australian words than US-based participants, and L2 speakers use more Australian words than L1 speakers. Statistical analysis confirms these observations.
Mean percent use of Australian items in the picture-naming task by group.
The interaction between location (AU vs US) and English speaker status (L1 vs. L2) did not reach significance; however, both fixed effects were found to be significant (p < .001) predictors of word choice (American vs Australian) such that L2 speakers and Australia-based participants were more likely to use Australian words (Figure 1; Table 6 in the Appendix). The significant effect of language status suggests that L2 speakers were generally more likely to use Australian words than L1 speakers were. This is likely because their productive vocabulary already included words from different dialects (specifically American and British English, the latter of which overlaps substantially with Australian English with respect to the test items in the study), acquired through media exposure and formal study. The significant effect of location is such that Australia-based participants were more likely to use Australian words, which indicates SDA for both L1 and L2 speakers. Moreover, the absence of a significant interaction between the two variables suggests that L1 and L2 speakers showed similar rates of SDA, despite a greater numerical difference in the probability of an Australian response in the L2 pair (>20%) than in the L1 pair (<10%).

Model-predicted probability of Australian response across language status and location.
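A logistic model with two main effects and no interaction can still yield unequal differences on the probability scale, because effects are additive in log-odds rather than in probability. The sketch below illustrates this with made-up coefficients; they are hypothetical values chosen for illustration and are NOT the estimates reported in Table 6.

```python
from math import exp

def inv_logit(x):
    """Convert log-odds to a probability."""
    return 1 / (1 + exp(-x))

# Hypothetical log-odds coefficients (illustration only, not from Table 6).
INTERCEPT = -3.2  # reference cell: L1 speakers in the United States
B_AU = 1.2        # effect of being based in Australia
B_L2 = 1.7        # effect of being an L2 speaker

def p_australian(in_australia, is_l2):
    """Predicted probability of an Australian response, no interaction term."""
    return inv_logit(INTERCEPT + B_AU * in_australia + B_L2 * is_l2)

# Probability-scale gain from US to AU within each speaker group.
l1_gain = p_australian(1, 0) - p_australian(0, 0)
l2_gain = p_australian(1, 1) - p_australian(0, 1)
```

With these made-up values, the same location effect in log-odds gives a gain of under 10 percentage points for the L1 pair and over 20 for the L2 pair, simply because the L2 groups start closer to the middle of the probability range.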
Lexical decision task
Accuracy data
Mean accuracy percentages by participant group and word dialect in the lexical decision task are presented in Table 3. Overall, the L1 speakers are more accurate in correctly identifying English words than the L2 speakers are. Australia-based participants are more accurate on Australian words than US-based participants; in turn, US-based participants are more accurate on American words than Australia-based participants are.
Mean percent accuracy in the lexical decision task by group and word dialect, with SD in brackets.
The logistic mixed effects model revealed a significant effect of the three-way interaction between location (US vs AU), English speaker status (L1 vs L2), and word dialect (American vs Australian) on participants’ accuracy. The coefficients table for the accuracy model is given in Table 7, and the corresponding pair-wise comparisons are presented in Table 8 in the Appendix. The interaction plot is shown in Figure 2.

Model-predicted accuracy in the lexical decision task by location, language status, and word dialect.
The results reveal that before moving to Australia, L2 English speakers (i.e., L1 Russian speakers in the United States in D1 context) were significantly less accurate than L1 American English speakers on both American English words (p < .01) and Australian English words (p < .001). This suggests a robust L1 advantage in lexical decision accuracy. Both L1 and L2 speakers exhibited significantly higher accuracy on American words compared to Australian words (p < .001 for both groups), which is expected given their immersion in an American English environment.
After moving to Australia, L1 speakers remained significantly more accurate than L2 speakers on American English words (p < .001), consistent with their continued advantage in their native variety. However, no significant difference was observed between the groups on Australian English words (p = .312), indicating that the L2 speakers had caught up to their L1 counterparts in recognizing Australian English vocabulary. L2 participants in Australia significantly improved in accuracy on Australian English words compared to their counterparts in the United States (p < .001), indicating that extended exposure to Australian English facilitated SDA. L1 American participants also significantly improved in their accuracy for Australian English words (p < .001), showing that L1 speakers can enhance their recognition of a second dialect through immersion. A crucial finding is that L1 Russian participants in Australia showed no significant difference in accuracy between American English and Australian English words (p = .323). This suggests that they fully adapted to recognizing Australian English vocabulary while maintaining their accuracy on American English words. In contrast, L1 American participants still showed a weak trend toward higher accuracy on American words (p = .098) even after prolonged exposure to Australian English. This suggests that while L1 speakers can improve their recognition of a second dialect, their first dialect advantage remains somewhat resistant to full alignment with the new variety.
Neither L1 nor L2 participants in Australia (D2 context) differed in their accuracy on American English words compared to when they lived in the United States (D1 context). That is, both groups maintained their ability to accurately identify American words despite extended exposure to Australian English. This suggests that while SDA leads to gains in recognition of a new dialect, it does not necessarily come at the cost of previously acquired lexical knowledge.
Reaction time data
Mean reaction times by participant group and word dialect in the lexical decision task are presented in Table 4, with latency values given in milliseconds before z-scoring was applied. Overall, the online Gorilla platform in the United States resulted in longer RTs for both L1 and L2 speakers compared to the shorter RTs produced by the in-lab E-Prime platform used in Australia. This difference between the two platforms highlights the importance of z-scoring, as it preserves the key trends in the data while neutralizing the effects of the platforms.
Mean raw reaction times (ms) before z-scoring in the lexical decision task by group and word dialect, with SD in brackets.
The linear mixed effects regression model with the z-scored RTs as the dependent variable revealed a significant effect of participant RT on the preceding trial, as well as of the three-way interaction between location (US vs AU), English speaker status (L1 vs L2), and word dialect (American vs Australian). Model fit was significantly improved by including the preceding RT (Baayen & Milin, 2010), as well as the interaction term for the other three fixed effects. The preceding trial’s RT shows the largest effect size in the model: a significant positive correlation between RTs on two adjacent trials. The coefficients table for the RT model is given in Table 9, and the corresponding pair-wise comparisons are presented in Table 10 in the Appendix, while the model effect plots are shown in Figures 3 and 4.

Model-predicted RT in the lexical decision task by preceding trial RT.

Model-predicted RT in the lexical decision task by location, language status, and word dialect.
The results reveal key differences in how L1 and L2 speakers process American and Australian English words across different immersion contexts. In the D1 context of the United States, there was no significant difference in RTs between L1 American English and L1 Russian participants when responding to American words, suggesting that L2 speakers who had been immersed in American English were able to process these words at native-like speeds. For Australian words, no significant RT differences were found between the groups, although L2 speakers were numerically slower. This contrasts with the accuracy results, where L1 speakers had an advantage, potentially indicating that while L2 speakers in the United States recognize English words less reliably, their processing speed for known words is comparable to that of L1 speakers. Both L1 and L2 speakers were significantly faster when responding to American words compared to Australian words (p < .01), reflecting their greater familiarity with American English vocabulary.
For L1 American English speakers in Australia, the RT pattern remained consistent with the United States, that is, they were still significantly faster on American words than on Australian words (p < .05), and their reaction times did not differ between their D1 (US) and D2 (AU) contexts, on both American and Australian words. This suggests stability in L1 processing speeds, even with extended exposure to a new dialect. For L2 speakers, however, a different pattern emerged. Unlike in the United States, there was no significant difference in RTs between American and Australian words (p = .869). This mirrors the accuracy pattern, where their performance was also equivalent on both dialects, suggesting that extended exposure to Australian English led to not only more accurate recognition but also equalized processing efficiency across dialects.
Crucially, L2 speakers in Australia were significantly faster on Australian words than L2 speakers in the United States (p < .001), demonstrating clear SDA effects in lexical processing. However, they were also significantly slower on American words compared to L2 speakers in the United States (p < .01). This differs from the accuracy results, where they maintained stable performance on American words, suggesting a change in processing speed despite retention of knowledge. In addition, L2 speakers in Australia were significantly slower on American words compared to L1 American speakers in Australia (p < .01), despite showing no such difference in the D1 (USA) context. This suggests that while their accuracy on American words remained stable, their processing speed for these words declined after immersion in Australian English. Interestingly, on Australian words, L2 speakers in Australia were actually faster than L1 Americans in Australia (p < .01), indicating that their immersion in Australian English led to particularly efficient processing.
These results provide strong evidence that SDA affects not only accuracy but also processing speed, with L2 speakers adapting to the new dialect at a deeper cognitive level than just recognition. While L1 speakers can improve in accuracy, their RT results suggest a more stable pattern, maintaining their first dialect advantage in processing speed. The differential patterns observed between L1 and L2 speakers have important implications for SDA. L2 speakers’ ability to adjust both accuracy and RT in response to a new dialect suggests a higher degree of plasticity in their lexical processing systems. This flexibility may stem from their experience in acquiring a second language, which could make them more adept at adapting to additional linguistic variation (Best & Tyler, 2007). L1 speakers’ limited changes in RT, despite improvements in accuracy, indicate that while they can learn new dialectal vocabulary, their processing efficiency remains biased toward their first dialect. This finding aligns with research suggesting that L1 lexical representations are more stable and less susceptible to modification (Pallier et al., 2003).
Wordlist reading
Figure 5 visualizes the monophthongs (at 50%) and diphthongs (at 20%) for the four participant groups in the wordlist reading task. For some of the vowels (e.g.,

Vowel F1 and F2 values across the participant groups.
Statistical analysis confirmed that there were no significant predictors in the models of
To sum up, SDA in the wordlist-reading task is evidenced by a significant effect of location, either as a main effect (as for
Summary of significant (+) and non-significant (−) differences between D1 and D2 groups within each L1 and L2 pair.
Discussion
To briefly sum up the results, we found evidence of SDA for both L1 and L2 speakers in all tasks; however, the extent of SDA varied by speaker group, linguistic domain, and task modality. L1 speakers showed minimal SDA overall, with under 20% predicted Australian word use by L1 speakers in Australia in the picture-naming task. In the lexical decision task, L1 speakers in Australia improved in their accuracy on Australian words but did not react to them faster than L1 speakers in the United States, suggesting only partial adaptation. Their SDA was also limited in the wordlist-reading task, where significant effects emerged for only one out of twelve variables. These rates are lower than those previously reported for L1 children and teenagers (e.g., 9- to 17-year-olds ranging from 24% to 71% and averaging 52% in Chambers, 1992, p. 679), suggesting that adult SDA may be more limited than previously thought (see Gnevsheva et al., 2022).
For L2 speakers, the evidence for SDA was stronger. While direct comparisons with prior research are difficult due to the scarcity of L2 SDA studies, our results align with findings that L2 speakers do exhibit second dialect acquisition (Castle et al., 2025; Drummond, 2012). Our L2 speakers in the picture-naming task showed significantly increased Australian word use in Australia, at rates similar to L1 speakers in previous studies: a predicted probability of Australian word use of more than 40%. The lexical decision results further support SDA in the L2, with L2 speakers in Australia becoming more accurate and faster on Australian words compared to L2 speakers in the United States. In wordlist-reading, L2 speakers showed significantly more Australian-like production for two out of twelve variables (cf. average 27% for phonological acquisition in L1-speaking children in Chambers, 1992, p. 680). These results suggest that our findings for SDA in L2 are comparable to those in earlier work on L1 and L2 speakers.
Overall, L2 speakers showed similar or slightly more SDA than did L1 speakers. In particular, while L1 and L2 groups showed similar rates of SDA in the picture-naming task, L2 speakers outperformed L1 speakers in lexical decision accuracy and RT improvements and in phonetic acquisition. This is similar to the results of Spinu and colleagues, who also found a bilingual advantage in SDA (Spinu et al., 2018, 2020). Our findings extend this to consecutive bilinguals acquiring a D2 in a naturalistic setting and suggest that even adult L2 learners whose proficiency in the L2 is lower than that of simultaneous bilinguals will show a bilingual advantage in SDA.
There may be both cognitive and social explanations for this finding. On the one hand, American English features may be relatively more entrenched for L1 speakers than for L2 speakers (Schmid, 2017). A usage-based account of the results, such as exemplar theory (e.g., Bybee, 2013; K. Johnson, 2007; Pierrehumbert, 2003), would suggest that L1 speakers have had more exposure to D1 features than L2 speakers, especially during the sensitive period (e.g., Long, 1990), resulting in comparatively more robust, less malleable exemplar clouds. Exemplar theory posits that linguistic categories are formed through accumulated experiences with specific instances of speech, known as exemplars, which are stored in memory. More frequently encountered variants create stronger, more entrenched representations, making them less susceptible to change, while less frequent variants remain weaker and more malleable. As a result, when encountering a D2, L1 speakers, whose D1 exemplars are more entrenched, may update their exemplar clouds more slowly and less readily, leading to lower rates of D2 feature use. This is reflected in our findings, where L1 speakers showed minimal SDA overall, particularly in the wordlist-reading task. Their resistance to adopting D2 features aligns with the idea that entrenched exemplar clouds inhibit rapid adaptation, as frequently encountered variants reinforce existing representations and make them more resistant to change (Schmid, 2017). In contrast, L2 speakers showed greater SDA in both production and perception, outperforming L1 speakers in lexical decision gains and phonetic adaptation. This suggests that L2 speakers may have more fluid exemplar representations, either due to their later acquisition of D1 or their continued experience managing multiple linguistic systems.
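The entrenchment argument can be illustrated with a deliberately simplified toy calculation. The sketch below is not the model used in this study and is not a full exemplar-theoretic implementation: it assumes that a category is summarized by the unweighted mean of its stored exemplars, with no memory decay, recency weighting, or social weighting. The function names and all exemplar counts are hypothetical and chosen only for illustration. Under these assumptions, the same amount of D2 exposure moves a smaller D1 cloud further toward the D2 target than a larger one.

```python
def cloud_mean(d1_value: float, d2_value: float, n_d1: int, n_d2: int) -> float:
    """Unweighted mean of a cloud holding n_d1 D1 exemplars and n_d2 D2 exemplars."""
    return (n_d1 * d1_value + n_d2 * d2_value) / (n_d1 + n_d2)


def shift_toward_d2(n_d1: int, n_d2: int) -> float:
    """Share of the D1-to-D2 distance covered by the cloud mean (0 = none, 1 = all)."""
    return n_d2 / (n_d1 + n_d2)


# Hypothetical exemplar counts, chosen only for illustration (not study data).
L1_D1_EXEMPLARS = 10_000  # assumed lifelong exposure to the D1 variant
L2_D1_EXEMPLARS = 2_000   # assumed later, more limited exposure to the D1 variant
NEW_D2_EXEMPLARS = 1_000  # assumed identical D2 exposure for both speakers

l1_shift = shift_toward_d2(L1_D1_EXEMPLARS, NEW_D2_EXEMPLARS)
l2_shift = shift_toward_d2(L2_D1_EXEMPLARS, NEW_D2_EXEMPLARS)

print(f"L1 speaker shift toward D2: {l1_shift:.2f}")  # 0.09
print(f"L2 speaker shift toward D2: {l2_shift:.2f}")  # 0.33
```

With these made-up counts, the hypothetical L2 speaker's category moves about a third of the way toward the D2 target, while the hypothetical L1 speaker's moves less than a tenth. The direction of the difference follows from the averaging assumption alone; the magnitudes carry no empirical weight.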
A social account assumes that L2 speakers have a potentially weaker personal and/or social investment in the D1 because of their L2 status. Existing literature suggests that identification with the D2 group is a predictor of SDA in L1 (Siegel, 2010 and references therein) and identification with the L2 group is a predictor of SLA (e.g., Gluszek et al., 2011). If we assume that identification with the D1/L1 group has the opposite effect, so that weaker identification with it facilitates acquisition of a new variety, then we would expect L2 speakers, who are likely to identify less strongly with American English than L1 speakers do, to show higher rates of SDA. While we deliberately controlled for differences in LoR and attitudes between the L1 and L2 groups, future research may want to consider manipulating these variables to test these hypotheses explicitly. In addition, bilingual speakers may be more flexible in their L2 than monolinguals in their L1 because “they can continue to signal identity in their use of L1” (Sharma, 2024, p. 11). The cognitive and social constraints need not be mutually exclusive and are likely to affect speakers at the same time.
In terms of the linguistic level, groups from both language status backgrounds show an increase in the use of D2 features in lexical production (picture-naming), but L2 speakers show SDA in phonetic production on more variables (wordlist reading). This supports previous literature in suggesting that SDA of vocabulary happens more readily than SDA at the level of phonetics and phonology (e.g., Siegel, 2010). The L2 group shows more acquisition at the more constrained level, suggesting that this is where the flexibility advantage may lie.
Comparing production and perception, we find evidence of SDA in both modalities for both L1 and L2 speakers. It is difficult to compare the tasks directly, but for both language groups there was SDA in speech perception (lexical access and processing) and speech production (picture-naming). We take this as suggesting that SDA affects both production and perception. L2 speakers show a greater degree of SDA than L1 speakers in both perception and (phonetic) production, suggesting that the L2 speakers’ relative flexibility is not driven by an advantage in either modality alone but applies across both. However, more research is needed to support these propositions.
We also acknowledge the relatively small number of participants. While larger samples are usually preferable for increased reliability of results, sometimes they are not achievable logistically when the population under investigation is quite small. Furthermore, we made a methodological decision to have a smaller number of participants but minimize within-group variation by strictly controlling recruitment criteria. This left us with about 15 participants per group, which, in fact, compares favorably to many previous studies of second dialect acquisition (e.g., six participants in Chambers, 1992).
To conclude, this study compared the degree of SDA in L1 and L2 speakers and found that L2 speakers may be slightly more flexible in using D2 features than L1 speakers, at different linguistic levels and in both production and perception. This suggests that L2 speakers may exhibit a bilingual advantage in subsequent variety learning, much as simultaneous bilinguals do in comparison to monolinguals.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: We would like to thank our participants for their time; James Grama and Billy McConvell for providing voices for the stimuli; research assistants Kira Rodionov for help in stimulus preparation and data collection, Chloe Xue for help with data transcription, and Monique Koke for help with manual alignment correction; the Australian Linguistic Society, ANU College of Arts and Social Sciences, Macquarie University, and Paderborn University for financial support; and the GURT 2022 audience, as well as the editor and two anonymous reviewers, for feedback on an earlier version of this work.
