Abstract
Aims:
How bilinguals control multiple languages has been the subject of intense scientific debate in recent years. Empirical research on language control at different linguistic levels, however, remains scarce, and language control at the phonetic level is particularly underexplored. The present study aimed to examine the dynamics of phonetic-level language control during speech production.
Design:
Chinese-English-German speakers named letters of the alphabet in English (L2) or German (L3), either in single-language blocks or in mixed blocks in which the two languages alternated. Letters vary in how similar their pronunciation is across the two languages, which allowed us to explore cross-language phonetic influences.
Data and analysis:
Three-way repeated-measures analyses of variance (ANOVAs) were conducted on the response times and accuracy of 42 participants, separately for mixing costs and switch costs, with trial type (single-language vs. non-switch for mixing costs; non-switch vs. switch for switch costs), response language (English/L2 vs. German/L3), and phonetic similarity (similar vs. neutral vs. different) as variables.
Findings:
Results showed substantial mixing and switch costs, as well as a “reversed language dominance” effect, suggesting inhibitory control in response to cross-language phonetic interference. Cross-language facilitation was observed for phonetically similar letters, and mixing/switch costs were modulated by phonetic similarity in a complex pattern.
Originality:
The findings show a complex interplay of suppression (e.g., as indexed by switch costs) and facilitation (i.e., the effect of phonetic similarity between letter translation equivalents).
Significance:
The evidence for cross-language phonetic interference as well as facilitation effects at local and global levels of control implies a dynamic interaction between the two phonetic systems.
Keywords
Introduction
Language control in the bilingual brain has remained in the limelight of research over the past two decades (e.g., Crinion et al., 2006; Declerck & Koch, 2023; Declerck & Philipp, 2015a; Green, 1998; Green & Abutalebi, 2013; Rodriguez-Fornells et al., 2002). At present, bilinguals’ speech comprehension and production are assumed to be governed by domain-general cognitive control mechanisms (for reviews, see Abutalebi & Green, 2007, 2008). However, the nature and dynamics of the mechanisms underlying bilingual language control may be more complex than is commonly assumed. Because language is a hierarchical system (phonetics, orthography, semantics, syntax, etc.), speech comprehension and production in bilinguals likely involve control at multiple linguistic levels (Bobb & Wodniecka, 2013; Declerck & Philipp, 2015a; Gollan et al., 2014; Kroll et al., 2006; Olson, 2013; Zhang et al., 2019). The present study aimed to examine the dynamics of bilingual language control at the phonetic level, using the alternating language-switching paradigm with a letter-naming task.
Local- and global-level language control during language switching
A wealth of empirical evidence suggests that a bilingual’s two languages are activated in parallel when one language is being processed, yielding cross-language interference (for reviews, see Bialystok & Feng, 2009; Kroll et al., 2006). Resolving this inherently competitive process requires language control (e.g., Crinion et al., 2006; Declerck & Philipp, 2015b; Green, 1998; Green & Abutalebi, 2013). According to the inhibitory control (IC) model (Green, 1998), language control recruits an IC mechanism through which a bilingual speaker suppresses the non-target language in order to produce the target language.
A major paradigm for investigating the mechanisms underlying language control is language switching. In language-switching experiments, bilingual participants switch between two languages, either randomly cued as to the relevant language or in a predictable pattern (e.g., AABBAA . . .). The most replicable empirical finding concerns “switch costs”: when performance on trials that require a language switch from the previous trial is compared with performance on trials in which the previous and the current trial require the same language, performance is worse on switch trials, as shown by slower response times (RTs) and higher error rates (for a review, see Kroll et al., 2008). The “Task Set Inertia” account of task switching assumes that switch costs originate from interference resulting from having previously performed a competing task (Waszak et al., 2004; Yeung et al., 2006). Similarly, the IC model (Green, 1998) explains switch costs by the persistence of inhibition. More specifically, during language switching the IC mechanism suppresses the non-target language schema, and this inhibition persists over time. It is therefore more difficult to reactivate a previously inhibited language schema, which results in a drop in performance.
In the same vein, the IC model predicts a “switch cost asymmetry” in unbalanced bilingual speakers, with larger costs when switching into the more dominant language than when switching into the less dominant language. This is because producing the less dominant language requires stronger inhibition of the more dominant language, so that more effort is needed to overcome this inhibition when switching back into the more dominant language (for a review, see Bobb & Wodniecka, 2013). Despite considerable research supporting asymmetrical switch costs as evidence of inhibition, this idea has been challenged. Studies have shown symmetrical switch costs in both balanced (e.g., Costa & Santesteban, 2004; Costa et al., 2006) and unbalanced bilinguals (e.g., Declerck et al., 2012; Finkbeiner et al., 2006; Heikoop et al., 2016; Mosca & de Bot, 2017; Peeters & Dijkstra, 2018). Furthermore, some findings indicate that switching into a less dominant language requires more effort than switching into a more dominant one, a reversed asymmetry pattern (e.g., Declerck et al., 2015; Zheng et al., 2020). These observations call the reliability of asymmetrical switch costs into question (for a meta-analysis, see Gade et al., 2021).
In addition to switch costs, there are “mixing costs”: performance is worse on non-switch trials in mixed-language blocks than on trials in single-language blocks, and mixing costs are also typically asymmetrical in unbalanced bilinguals (for a review, see Declerck et al., 2020). Switch costs are considered an indicator of transient, trial-to-trial control processes, reflecting a local-level reactive consequence of cross-language interference, whereas mixing costs are considered a marker of sustained control processes, reflecting relatively global-level language control in response to cross-language interference (for a review, see Declerck & Philipp, 2015a). Notably, mixing costs are typically accounted for via proactive inhibition of the non-target language in single-language blocks. In addition to mixing costs, the “reversed language dominance” effect is also a marker of global-level proactive language control. It refers to the observation that, although performance is typically better in a bilingual’s dominant than in the non-dominant language, this pattern reverses when both languages are intermixed: under these circumstances, performance is often slower in the more dominant language. The reversed language dominance effect is considered to reflect global inhibition: when languages are intermixed, processing of the non-dominant language requires substantial inhibition of the dominant language, resulting in a drop in performance (for reviews, see Declerck et al., 2020; Declerck & Koch, 2023; Declerck & Philipp, 2015a).
Language control at the phonetic level
Early research on phonetic production had already established that bilinguals differentiate their two phonetic systems from a very young age (Caramazza et al., 1973; Hazan & Boulakia, 1993; MacLeod & Stoel-Gammon, 2010). Interestingly, these two systems are not completely disconnected; in certain contexts, one may intrude into the other. For instance, it is commonplace for a bilingual speaker to produce his or her L2 with a non-native, L1-influenced accent (Goldrick et al., 2014). However, extant research has yielded mixed results regarding the direction of cross-language phonetic interference: (a) unidirectional L1 influence on the L2 (Caramazza et al., 1973); (b) unidirectional L2 influence on the L1 (Flege et al., 2002); (c) bidirectional interaction between L1 and L2 (Fowler et al., 2008); and (d) no interaction between L1 and L2 (Grosjean & Miller, 1994). Although there is currently no consensus on the direction of influence, there is sufficient evidence for cross-language phonetic interference to suggest that language control mechanisms must operate efficiently at the phonetic level to resolve cross-language intrusions during phonetic production.
A more direct way of exploring the interactions between two phonetic systems is to manipulate phonological characteristics in language-switching paradigms. For instance, several studies have manipulated “cognate status” to investigate the effects of phonological properties on language switching (e.g., Broersma & Cutler, 2011; Christoffels et al., 2007; Declerck & Koch, 2023; Filippi et al., 2014; Li & Gollan, 2018; Thomas & Allport, 2000). Cognates are words that share the same etymological origin in two languages and therefore overlap substantially in phonology. An important finding is that pictures whose names are cognates across a bilingual’s languages are named faster than pictures with non-cognate names (e.g., Costa et al., 2000; Hoshino & Kroll, 2008). This “cognate facilitation effect” has been taken to indicate cross-language phonological activation. Cognate status and language switching also appear to interact in a complex way: when cognate status was manipulated within an experimental block (i.e., cognate and non-cognate responses were intermixed), switch costs were larger for cognates than for non-cognates (e.g., Filippi et al., 2014; Thomas & Allport, 2000); by contrast, the effect reversed when cognates and non-cognates were presented in separate blocks, with smaller switch costs for cognates than for non-cognates (e.g., Broersma & Cutler, 2011; Declerck et al., 2012). The smaller switch costs for cognates in between-block designs could be explained by a global, persisting (facilitatory or inhibitory) effect at the phonological level, whereas the reversed effect in within-block designs would be accounted for by local control processes (Declerck & Philipp, 2015b).
Declerck and Philipp (2015b) employed a different and more fine-grained manipulation to investigate the influence of phonological overlap on language switching. Words on trial n-1 and trial n which shared partial phonological overlap with the first two phonemes being identical (e.g.,
A substantial body of studies has examined language switching in phonetic production by measuring the voice onset time (VOT) difference between languages (Antoniou et al., 2011; Goldrick et al., 2014; Grosjean & Miller, 1994; Olson, 2013; Tsui et al., 2018). VOT corresponds to the lag between the release of a consonant’s constriction and the onset of periodicity signaling modal vocal-fold vibration (Lisker & Abramson, 1964). VOT is often used as a measure of phonetic interference by examining word-initial voiceless consonants at the point of a switch (e.g., the voiceless word-initial stop /k/ in the English words “corn,” “candle,” and “cat”). Recent phonetic research has indicated that the VOT of cognates is more strongly influenced by the non-target language than the VOT of non-cognates (Amengual, 2011; Goldrick et al., 2014). For example, bilingual speakers’ L2 production was found to become more accented (i.e., more influenced by the L1) when they switched the language of production than when they did not (Goldrick et al., 2014). Language switching is therefore subject to cross-language phonological influence, which is moreover modulated by the bilingual speakers’ degree of bilingual experience and proficiency (Filippi et al., 2014; Goldrick et al., 2014; Olson, 2013). These findings from phonetic production research with language-switching tasks suggest that bilinguals use language-control mechanisms to handle the consequences of cross-language interference at the phonetic level.
Taken together, the extant literature has explored cross-linguistic phonetic interference mainly by (a) manipulating cognate status, (b) rating the accentedness of phonetic intrusions, and (c) measuring VOT. Despite much empirical evidence for phonetic-level language control in response to cross-linguistic phonetic interference, it is difficult to dissociate language control at the phonetic level from control at other linguistic levels, because most previous research used words as experimental stimuli. For example, Olson (2013) concluded that switching languages requires a switch at the lexical as well as at the phonetic level. In a recent study on language switching with a picture-naming task (Zhang et al., 2019), a switch in language form was dissociated from a switch in meaning (or concept). Hence, word stimuli are, to some extent, context-sensitive in that they are subject to lexical, semantic, phonetic, and/or orthographic constraints. The nature of phonetic-level language control would be revealed more unequivocally if the stimuli were constrained, as far as possible, to purely phonetic influences.
The present study
As summarized earlier, most research on language switching has involved lexical stimuli. By contrast, the present study used letters as experimental stimuli, which are relatively context-free compared with the context-sensitive word stimuli used previously. The task required bilinguals to name letters of the alphabet (e.g., in English: “b” → [biː]) in one of two of their languages. Letter stimuli are relatively (but not absolutely) context-free in that they exhibit varying degrees of phonetic overlap across two languages and hence may be subject to cross-language phonetic influences (i.e., interference and/or facilitation). For example, the letter “n” is pronounced very similarly in English and German ([ɛn]), whereas the pronunciations of the letter “j” are virtually unrelated (English: [dʒeɪ], German: [ʝɔt]). The present study explored potential cross-language phonetic effects in letter naming. We further aimed to examine the dynamics of phonetic-level language control during speech production. To this end, letters were either named in single-language blocks or named in an “alternating language switching” paradigm, in which the naming language was predictable from an alternating-runs sequence (AABB, etc.). Participants in the present study were Chinese-English-German trilinguals with Chinese as their mother tongue (L1), English as their L2, and German as their L3. Unlike alphabetic languages, which involve phonological mediation, Chinese uses a logographic writing system that is subject to little phonological mediation (Zhou & Marslen-Wilson, 1999). Therefore, we selected English (L2) and German (L3) as the naming languages.
Based on the results of studies on language switching with picture- or digit-naming tasks (for reviews, see Bobb & Wodniecka, 2013; Declerck et al., 2020; Declerck & Koch, 2023; Declerck & Philipp, 2015b), we examined local-level switch costs and global-level mixing costs in the present letter-naming task, as well as their symmetry or asymmetry across response languages. Furthermore, we investigated the “reversed language dominance” effect. Most importantly, we were interested in whether there was cross-language phonetic interference and/or facilitation during spoken production of phonetically similar letters (Oldfield, 1971), and if so, how this effect would affect mixing/switch costs. Evidence regarding local- and global-level language control, together with cross-language phonetic interference and/or facilitation, would shed light on the dynamic interaction between the two phonetic systems. This in turn would illuminate the dynamics of language control at the phonetic level and further advance our understanding of language control at multiple linguistic levels.
While the current study was being conducted, an article appeared that pursued a closely related approach. Zuo et al. (2022) investigated language control in bilinguals with a letter-naming task. In their study, Chinese-English bilinguals named pinyin in Chinese (L1) or letters of the alphabet in English (L2) according to a color cue. A single-Chinese block, a single-English block, and two mixed-language blocks were administered. The behavioral results showed local and global cross-language interference in the form of switch and mixing costs, as well as asymmetrical switch costs (larger switch costs in L1 than in L2) but symmetrical mixing costs. However, in contrast to our study, the potential role of phonetic overlap between translation equivalents of letters was not explicitly investigated. Nonetheless, the results of Zuo et al. suggest that the letter-naming task is an important tool for examining phonetic-level control in bilingual speakers.
Based on previous studies on language switching with digit-, picture-, or letter-naming tasks (de Bruin et al., 2018; Meuter & Allport, 1999; Zuo et al., 2022), we predicted switch costs at the local level as well as mixing costs at the global level. In addition, we predicted a reversed language dominance effect, with slower responses in the more dominant language (English, L2) than in the less dominant language (German, L3). However, our main interest was to explore whether there would be cross-language phonetic facilitation or inhibition, given that some letters have nearly identical pronunciations in the English and German phonetic systems.
Method
Participants
Forty-five Chinese-English-German trilingual speakers were recruited (mean age 20 ± 1.5 years, range 18–22 years, 36 female). All participants were right-handed (Oldfield, 1971) and reported no language, hearing, or neurological impairments. Participants were not color-blind and had normal or corrected-to-normal vision. Each participant gave written informed consent. The experiment was approved by the Ethics Review Board of Southwest University of Political Science and Law, China. Data of three participants were excluded from the analyses due to technical failure of the voice recording (see below); thus, the final sample consisted of 42 participants.
All participants were native Chinese speakers with Mandarin as their mother tongue (L1). They had started learning English as their second language between the ages of 9 and 12 years in primary education and had been learning English for about 6–13 years by the time the experiment was conducted. They had started learning German as their third language between the ages of 18 and 22 years at college. Language proficiency, use, and exposure for each language were assessed with the Chinese version of the Language Experience and Proficiency Questionnaire (LEAP-Q; Marian et al., 2007). There were significant differences between English and German in each measure of language background (all ps < .001), suggesting that the two languages were unbalanced and could therefore be characterized as L2 and L3 (see Table 1).
Table 1. Language background of participants: age of acquisition (AoA), scores of self-rated proficiency, language exposure, and language use.
Materials
The English alphabet consists of 26 letters, while the German alphabet has 30 letters, four of which are German-specific. Therefore, the 26 letters shared by English and German were selected as stimuli. Despite being orthographically identical, these 26 letters vary substantially in their pronunciation in the English and German phonetic systems: some letters have nearly identical pronunciations (e.g., f: [ɛf] in both English and German), others are virtually unrelated (e.g., j: [dʒeɪ] in English, [ʝɔt] in German), and many differ in pronunciation to some extent (e.g., b: [biː] in English, [beː] in German). To categorize letters according to phonetic similarity between the two target languages, we asked an independent cohort of 29 Chinese-English-German trilingual speakers to rate the similarity of each letter’s English and German pronunciations on a 5-point scale ranging from very different (1) to very similar (5). The ratings were collected for the 26 letters, presented in random order in a Word document sent by email. On the basis of these ratings, the 26 letters were assigned to three levels of phonetic similarity, with six being phonetically “different” (j, r, v, w, y, z; mean score 1.19 ± 0.12), six being phonetically “similar” (f, l, m, n, s, x; mean score 4.22 ± 0.54), and 12 being phonetically “neutral” (a, b, c, d, e, g, i, k, p, q, t, u; mean score 2.49 ± 0.45). Statistical comparisons between the three levels of phonetic similarity were all highly significant (ps < .001). Two letters (h, o), which lie at the boundaries between the “different” and “neutral” categories and between the “similar” and “neutral” categories, respectively, were presented as warm-up stimuli at the beginning of each block and excluded from the analyses (“dummy” in Table 2).
Table 2. Letters of the alphabet, with English and German pronunciation in the International Phonetic Alphabet (IPA) format, average ratings and standard deviation (SD), and assigned category of phonetic similarity (similar, neutral, different, dummy; see text).
Procedure
After signing the informed consent form, participants familiarized themselves with the pronunciation of the 26 letters, presented in random order in a booklet, in English and German. Participants then completed a practice session consisting of a single-English block with 26 trials, a single-German block with 26 trials, and two alternate-language blocks with 50 trials each. The instructions and procedure in the practice session were identical to those in the experimental session.
The experimental session comprised eight blocks: four single-language blocks and four alternate-language blocks. The block order was as follows: block 1 (all English), block 2 (all German), blocks 3–6 (alternate languages), block 7 (all German), block 8 (all English). The order of English and German blocks was counterbalanced across participants to offset a potential language-order effect. In single-language blocks, participants were instructed to name each letter in either English or German throughout the block. In alternate-language blocks, we adopted the alternating language-switching paradigm, in which participants were instructed to switch languages after every second trial (e.g., En-En-Ge-Ge-En-En-Ge-Ge, etc.). Thus, the letter-naming language followed a predictable sequence (e.g., Festman et al., 2010; Jackson et al., 2004). A single-language block had 50 trials and an alternate-language block had 98 trials, with the first two trials of each block serving as warm-up trials. In total, there were 576 experimental trials, excluding warm-up trials.
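The block structure above can be made concrete with a small sketch. It generates the response-language sequence of an alternating-runs block, labels each trial as a switch or non-switch trial relative to the preceding trial, and re-derives the total of 576 experimental trials from the reported block lengths. The function names are hypothetical, and starting each alternate-language block with English is an assumption for illustration only, since the text does not state which language opened these blocks.

```python
def alternating_runs(n_trials, first="En", run_length=2):
    """Response language per trial in an alternating-runs (AABB...) block."""
    other = "Ge" if first == "En" else "En"
    return [first if (i // run_length) % 2 == 0 else other for i in range(n_trials)]


def label_transitions(languages):
    """Label each trial relative to the previous one: 'first', 'switch', or 'non-switch'."""
    labels = ["first"]
    for prev, cur in zip(languages, languages[1:]):
        labels.append("switch" if cur != prev else "non-switch")
    return labels


# Trial counts taken from the Procedure; 2 warm-up trials per block.
WARM_UP = 2
N_SINGLE_BLOCKS, SINGLE_LEN = 4, 50
N_ALTERNATE_BLOCKS, ALTERNATE_LEN = 4, 98

experimental_trials = (N_SINGLE_BLOCKS * (SINGLE_LEN - WARM_UP)
                       + N_ALTERNATE_BLOCKS * (ALTERNATE_LEN - WARM_UP))
print(experimental_trials)  # 576

languages = alternating_runs(8)
print(languages)  # ['En', 'En', 'Ge', 'Ge', 'En', 'En', 'Ge', 'Ge']
print(label_transitions(languages))
# ['first', 'non-switch', 'switch', 'non-switch', 'switch', 'non-switch', 'switch', 'non-switch']
```

If the two warm-up trials are counted as part of the run structure, the remaining 96 trials of an alternate-language block split evenly into 48 switch and 48 non-switch trials under this scheme.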
To reduce confounds arising from a phonetically different letter following a phonetically similar letter or vice versa, we constrained the sequence of the three letter types such that each phonetically different or similar letter was followed by a phonetically neutral letter; thus, no phonetically different or similar letters were presented adjacently (e.g., similar-neutral-different-neutral-similar-neutral . . .). Moreover, despite the predictable alternating language sequence, we presented additional color cues in the alternate-language blocks to remind participants of the response language on each trial in case of lapses of attention. Letters were colored red, yellow, green, or blue. Each response language was assigned a pair of colors, with two colors indicating letter naming in English and the other two indicating naming in German. Using four colors to indicate one of two response sets avoids confounding cue switches with language switches (Heikoop et al., 2016), because the cue changed on every trial in the alternate-language blocks, irrespective of whether the language switched (de Bruin et al., 2018; Jevtović et al., 2020). The color-to-language mapping was counterbalanced across participants. In single-language blocks, two color cues were also assigned to each naming language to minimize the visual difference between single- and alternate-language blocks.
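A minimal sketch of the two sequencing constraints described above, assuming letters are ordered by simple shuffling: each similar or different letter is followed by a neutral letter, and the color cue changes on every trial while still indicating the response language. The text does not say how the sequences were actually generated, so the randomization scheme, the function names, and the specific color-to-language mapping below are all hypothetical.

```python
import random

# Letter categories from the Materials section (warm-up letters h and o omitted).
SIMILAR = list("flmnsx")
DIFFERENT = list("jrvwyz")
NEUTRAL = list("abcdegikpqtu")

# Hypothetical color-to-language mapping; the study counterbalanced this across participants.
COLOR_PAIRS = {"En": ("red", "yellow"), "Ge": ("green", "blue")}


def letter_sequence(rng):
    """One pass through the 24 analyzed letters in which every similar or
    different letter is followed by a neutral letter."""
    non_neutral = SIMILAR + DIFFERENT
    neutral = NEUTRAL[:]
    rng.shuffle(non_neutral)
    rng.shuffle(neutral)
    sequence = []
    for critical, filler in zip(non_neutral, neutral):
        sequence.extend([critical, filler])
    return sequence


def cue_colors(languages, rng):
    """Pick a color from the current language's pair that differs from the
    previous trial's color, so the cue changes on every trial."""
    colors, previous = [], None
    for language in languages:
        options = [c for c in COLOR_PAIRS[language] if c != previous]
        previous = rng.choice(options)
        colors.append(previous)
    return colors


rng = random.Random(1)
print(letter_sequence(rng))
print(cue_colors(["En", "En", "Ge", "Ge", "En", "En", "Ge", "Ge"], rng))
```

Because each pass ends with a neutral letter, concatenating passes (e.g., two passes for the 48 experimental trials of a single-language block) preserves the adjacency constraint. On non-switch trials the cue is forced to the other color of the same pair; on switch trials it necessarily changes because the pair changes.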
Each trial began with a 300-ms central fixation cross. A colored letter was then presented for 2,000 ms or until the voice key was triggered. Participants were instructed to name the target letter as quickly and accurately as possible (a) in English in single-English blocks, (b) in German in single-German blocks, or (c) in English or German, alternating languages every two trials, in alternate-language blocks. After the target letter disappeared, a blank screen was presented for 1,200 ms as the inter-trial interval (ITI). Letters were presented in lowercase, one at a time, in the center of the screen. A target letter subtended 3° of visual angle vertically and 0.8°–2.5° horizontally, depending on the letter. The experiment was programmed and run using E-Prime 3.0 on a DELL PC. The viewing distance was approximately 60 cm from a screen with a 60-Hz refresh rate and a 1024 × 768 screen resolution. Errors were coded online by the experimenter. Participants were debriefed at the end of the experiment.
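As a worked example, the reported visual angles can be converted into approximate on-screen sizes with the standard relation size = 2 · d · tan(θ / 2). This assumes the viewing distance was exactly 60 cm (the text says approximately), so the resulting sizes are estimates rather than reported values. The same sketch adds up the longest possible trial duration from the timings above; the function name is hypothetical.

```python
import math


def visual_angle_to_size(angle_deg, distance_cm):
    """Physical extent (cm) that subtends angle_deg at a viewing distance of distance_cm."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


VIEWING_DISTANCE_CM = 60  # reported as approximate
for angle in (3.0, 0.8, 2.5):
    print(f"{angle} deg -> {visual_angle_to_size(angle, VIEWING_DISTANCE_CM):.2f} cm")
# 3.0 deg -> 3.14 cm
# 0.8 deg -> 0.84 cm
# 2.5 deg -> 2.62 cm

# Longest possible trial (no voice-key trigger): fixation + maximal letter display + ITI.
MAX_TRIAL_MS = 300 + 2000 + 1200
print(MAX_TRIAL_MS)  # 3500
```

Under this assumption, letters were roughly 3.1 cm tall and between about 0.8 and 2.6 cm wide on screen.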
Results
Trials with (a) no response, (b) letter naming in the wrong language, (c) hesitation or stuttering, (d) a combination of both languages, or (e) technical errors were excluded from the RT analysis (5.2%). The first two (warm-up) trials of each block, RTs below 200 ms, and RTs more than 2.5 standard deviations (SDs) above or below each participant’s mean RT were also excluded from the RT analysis. The accuracy analysis included all erroneous trials except those with technical errors.
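The exclusion steps above can be sketched as a small filter over trial records. The field names and the function name are hypothetical, and two details are assumptions that the text does not fix: errors, warm-up trials, and RTs below 200 ms are removed before each participant's mean and SD are computed, and a single mean and SD are computed per participant across all conditions rather than per condition.

```python
from collections import defaultdict
from statistics import mean, stdev


def trim_rts(trials, min_rt=200, sd_cutoff=2.5):
    """Return the trials kept for the RT analysis.

    trials: list of dicts with hypothetical keys 'participant', 'rt' (ms),
    'correct' (bool; False covers all error types), and 'warm_up' (bool).
    """
    # Step 1: drop errors, warm-up trials, and RTs below min_rt.
    valid = [t for t in trials
             if t["correct"] and not t["warm_up"] and t["rt"] >= min_rt]

    # Step 2: one mean and SD per participant over the remaining trials.
    rts_by_participant = defaultdict(list)
    for t in valid:
        rts_by_participant[t["participant"]].append(t["rt"])
    bounds = {}
    for participant, rts in rts_by_participant.items():
        m, sd = mean(rts), stdev(rts)  # stdev needs at least 2 trials
        bounds[participant] = (m - sd_cutoff * sd, m + sd_cutoff * sd)

    # Step 3: keep trials within mean +/- sd_cutoff * SD.
    return [t for t in valid
            if bounds[t["participant"]][0] <= t["rt"] <= bounds[t["participant"]][1]]
```

Computing the bounds per participant and condition instead would be a one-line change to the grouping key; which variant the study used is not stated here.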
Table 3 shows mean RTs and error percentages as a function of block type (single-language vs. non-switch vs. switch), response language (English/L2 vs. German/L3), and phonetic similarity (similar vs. neutral vs. different). In addition, we calculated “inverse efficiency scores” (IES = RT / (1 − PE), where PE is the proportion of errors; Townsend & Ashby, 1978), which combine speed and accuracy into a single composite dependent variable and hence provide a more compact summary of the results; these are also displayed in Table 3.
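The inverse efficiency score amounts to a one-line function. This sketch assumes PE is given as a proportion (0 to 1) rather than a percentage and that RT is the mean RT of correct trials, the usual convention for IES; the function name is hypothetical, and the numbers in the usage line are purely illustrative, not taken from Table 3.

```python
def inverse_efficiency(mean_rt_ms, proportion_errors):
    """IES = RT / (1 - PE): mean correct RT inflated by the error proportion."""
    if not 0 <= proportion_errors < 1:
        raise ValueError("proportion_errors must be a proportion in [0, 1)")
    return mean_rt_ms / (1 - proportion_errors)


# Illustrative values only: 700 ms with 5% errors.
print(round(inverse_efficiency(700, 0.05), 1))  # 736.8
```

Because PE enters the denominator, an error rate of 4.4% inflates a given RT by a factor of only about 1.046, so with low error rates IES differences tend to track RT differences closely.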
Table 3. Mean response times (RTs, in milliseconds), error rates (in percentage), and inverse efficiency scores (IES, in milliseconds) dependent on block type (single-language; non-switch; switch), response language (L2/English vs. L3/German), and phonetic similarity (similar, neutral, different).
Note. Standard errors in parentheses.
Mixing cost
Mixing cost refers to the difference in RTs and/or errors between trials in single-language blocks and non-switch trials in the alternate-language blocks. The results are shown in Figure 1 for response latencies (top panel), errors (middle panel), and inverse efficiency scores (bottom panel), by response language (English vs. German) and phonetic similarity (similar, neutral, different). Table 4 (top half) shows the outcome of three-way repeated-measures analyses of variance (ANOVAs) conducted separately on latencies, errors, and inverse efficiency scores, with mixing (single vs. alternate), phonetic similarity (similar vs. neutral vs. different), and language (English vs. German) as factors. Results for latencies are on the left, for errors in the middle, and for inverse efficiency scores on the right.
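Per-participant mixing costs, as defined above, can be computed from trial-level records with a short sketch. The record fields, trial-type labels, and function names are hypothetical; the sketch assumes the trials have already been trimmed as described at the start of the Results and returns one cost per participant and response language, the quantity whose language dependence the mixing-by-language interaction tests.

```python
from collections import defaultdict
from statistics import mean


def condition_means(trials, keys):
    """Mean RT per participant and per combination of the given condition keys."""
    cells = defaultdict(list)
    for t in trials:
        cells[(t["participant"],) + tuple(t[k] for k in keys)].append(t["rt"])
    return {cell: mean(rts) for cell, rts in cells.items()}


def mixing_costs(trials):
    """Mixing cost = mean RT on non-switch trials (alternate-language blocks)
    minus mean RT on single-language trials, per participant and language.

    trials: dicts with hypothetical keys 'participant', 'language', 'rt', and
    'trial_type' in {'single', 'non-switch', 'switch'}.
    """
    relevant = [t for t in trials if t["trial_type"] in ("single", "non-switch")]
    means = condition_means(relevant, ["language", "trial_type"])
    costs = {}
    for (participant, language, trial_type), m in means.items():
        if trial_type == "non-switch":
            costs[(participant, language)] = m - means[(participant, language, "single")]
    return costs
```

Switch trials are filtered out before averaging, so they never contribute to the mixing-cost baseline.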

Figure 1. Effects of “language mixing.” Top panel: response latencies; middle panel: errors; bottom panel: inverse efficiency scores (see text). Averages by response language (English vs. German) and phonetic similarity (similar, neutral, different). Error bars show within-subjects standard errors using the Cousineau-Morey-O’Brien (Cousineau & O’Brien, 2014) method.
Table 4. Analyses of variance performed on response latencies (left), error percentages (middle), and inverse efficiency scores (right; see text), separately for “mixing cost” and “switch cost,” with the variables mixing or switching, phonetic similarity, and response language.
Note. Significant effects are in bold.
Response latencies
Latencies showed a significant mixing cost of 61 ms. There was no main effect of language (English: 713 ms, German: 718 ms). Phonetic similarity exerted a facilitatory effect (similar: 644 ms, neutral: 709 ms, different: 792 ms), with Bonferroni-corrected post hoc tests showing significant differences between all three conditions (ps < .001; all post hoc tests reported below were also Bonferroni-corrected). There was an interaction between mixing and language, with larger mixing costs in L2 (82 ms) than in L3 (39 ms; both ps < .001). There was also an interaction between mixing and phonetic similarity, with post hoc tests showing highly significant mixing costs (ps ⩽ .006) in all three conditions (similar: 30 ms, neutral: 89 ms, different: 63 ms). Finally, phonetic similarity interacted with language: the shape of the similarity effect differed subtly between languages (the difference between neutral and similar was slightly more pronounced in L2 than in L3, but the difference between neutral and different was larger in L3 than in L2). Simple effects showed a significant effect of phonetic similarity both for L2 (F(1, 41) = 61.09, p < .001) and for L3 (F(1, 41) = 17.66, p < .001). The three-way interaction between mixing, phonetic similarity, and language was not significant.
Errors
An analysis of error percentages (see Table 3) showed a significant mixing cost of 2.3% but no main effect of language, with identical error rates in both languages (4.4%). A main effect of phonetic similarity emerged, with post hoc tests showing significant differences between the similar and the neutral condition (4.3%, p < .001) and between the similar and the different condition (4.9%, p < .001), but no difference between the neutral and the different condition (0.6%, p = .578). There was no interaction between mixing and language, with mixing costs of 3.0% in L2 and 1.6% in L3. There was an interaction between mixing and phonetic similarity, with a significant mixing cost in the neutral condition (5.3%, p < .001) but not in the similar condition (0.3%, p = .406) or in the different condition (1.9%, p = .060). Phonetic similarity also interacted with language, again (as in the latencies) with the effect of phonetic similarity modulated by language in a subtle way: the neutral-similar difference was more pronounced in L2 than in L3, whereas the neutral-different difference was positive in L3 but negative in L2. Simple effects showed a significant effect of phonetic similarity both for L2 (F(1, 41) = 24.27, p < .001) and for L3 (F(1, 41) = 6.91, p = .012). The three-way interaction between mixing, phonetic similarity, and language was not significant.
Inverse efficiency scores
An analysis of inverse efficiency scores showed the same statistical pattern as the response latencies (see Table 4).
Switch cost
Switch cost is defined as the difference in RTs and/or errors between switch trials and non-switch trials in the alternate-language blocks. The results are shown in Figure 2 for response latencies (top panel), errors (middle panel), and inverse efficiency scores (bottom panel), by response language (English vs. German) and phonetic similarity (similar, neutral, different). Table 4 (bottom half) shows the outcome of three-way repeated-measures ANOVAs conducted on latencies, errors, and inverse efficiency scores, with switching (switch vs. non-switch), phonetic similarity (similar vs. neutral vs. different), and language (German vs. English) as factors. Results for latencies are on the left, for errors in the middle, and for inverse efficiency scores on the right.
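Because the critical result below concerns switch costs broken down by both response language and phonetic similarity, a sketch of that cell-wise computation may help. It assumes trimmed trial records with hypothetical field names and a hypothetical function name, and returns one switch cost per participant, language, and similarity level (2 × 3 cells per participant).

```python
from collections import defaultdict
from statistics import mean


def switch_costs(trials):
    """Switch cost per participant x language x phonetic similarity:
    mean RT on switch trials minus mean RT on non-switch trials.
    Only alternate-language trials ('switch' / 'non-switch') are used;
    single-language trials are ignored. Field names are hypothetical."""
    cells = defaultdict(list)
    for t in trials:
        if t["trial_type"] in ("switch", "non-switch"):
            key = (t["participant"], t["language"], t["similarity"], t["trial_type"])
            cells[key].append(t["rt"])
    means = {key: mean(rts) for key, rts in cells.items()}
    costs = {}
    for (participant, language, similarity, trial_type), m in means.items():
        if trial_type == "switch":
            repeat = means[(participant, language, similarity, "non-switch")]
            costs[(participant, language, similarity)] = m - repeat
    return costs
```

A missing non-switch cell raises a KeyError rather than silently producing a cost, which is the safer failure mode for a fully crossed design like this one.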

Figure 2. Effects of “language switching.” Top panel: response latencies; middle panel: errors; bottom panel: inverse efficiency scores (see text). Averages by response language (English vs. German) and phonetic similarity (similar, neutral, different). Error bars show within-subjects standard errors computed with the Cousineau-Morey-O’Brien method (Cousineau & O’Brien, 2014).
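The within-subjects error bars can be sketched as follows. This is a minimal illustration of Cousineau normalization combined with the Morey correction. It assumes `data` holds one mean per participant and within-subject cell. The function name and example values are hypothetical, and the number of cells used for the correction factor in the figure is not stated here.

```python
from math import sqrt
from statistics import mean, stdev


def cousineau_morey_se(data):
    """Within-subject standard error per condition.

    data: one list per participant, each holding that participant's mean in every
    within-subject condition (same order for all participants).
    """
    n_subjects = len(data)
    n_conditions = len(data[0])
    grand_mean = mean([v for row in data for v in row])
    # Cousineau normalization: remove each participant's overall level.
    normalized = [[v - mean(row) + grand_mean for v in row] for row in data]
    # Morey correction for the variance removed by normalization: sqrt(J / (J - 1)).
    correction = sqrt(n_conditions / (n_conditions - 1))
    return [
        stdev([row[j] for row in normalized]) / sqrt(n_subjects) * correction
        for j in range(n_conditions)
    ]


# Hypothetical latencies (ms): three participants x two conditions.
example = [[650, 700], [600, 660], [700, 740]]
print([round(se, 1) for se in cousineau_morey_se(example)])
```

A participant who is uniformly slower than the others no longer inflates the error bars. Only inconsistency in the condition differences across participants contributes.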
Response latencies
Latencies showed a significant overall switch cost of 40 ms and a main effect of language, with RTs 18 ms faster in L3 than in L2. Phonetic similarity exerted a facilitatory effect (similar: 674 ms, neutral: 774 ms, different: 849 ms), with post hoc tests showing significant differences between all three conditions (all ps < .001). No interaction between switching and language was found, with similar switch costs in L2 (42 ms) and L3 (39 ms). No interaction between switching and phonetic similarity emerged, with switch costs of 29, 42, and 51 ms for the similar, neutral, and different conditions, respectively. An interaction between phonetic similarity and language was found that was similar in shape to the one in the “Mixing cost” analysis above: the difference between neutral and similar was more pronounced in L2 than in L3, whereas the difference between neutral and different was larger in L3 than in L2. Simple effects showed a significant effect of phonetic similarity for L2 (F(1, 41) = 69.12, p < .001), as was the case for L3 (F(1, 41) = 71.40, p < .001).
Critically, a highly significant three-way interaction between switching, phonetic similarity, and language was found, which is visible in Figure 2. We followed up this interaction in two ways. First, simple effects analyses were conducted for each language separately. For L2, these revealed a significant interaction between switching and phonetic similarity (F(2, 82) = 13.25, p < .001), with post hoc tests showing switch costs of 17 ms (p = .039) in the similar condition, 35 ms (p < .001) in the neutral condition, and 74 ms (p < .001) in the different condition. By contrast, for L3, the interaction between switching and phonetic similarity was not significant (F(2, 82) = 1.86, p = .163), with relatively similar switch costs in the similar (42 ms), neutral (48 ms), and different (27 ms) conditions (all ps < .001). Second, we explored the interplay between switching and language for each level of phonetic similarity separately. For “similar” trials, there was a significant interaction between switching and language (F(1, 41) = 7.91, p = .008), with a switch cost of 17 ms (p = .039) in L2 and 42 ms (p < .001) in L3. For “neutral” trials, the interaction between switching and language was not significant, F(1, 41) = 2.12, p = .153, with switch costs of 35 and 48 ms in L2 and L3, respectively (both ps < .001). For “different” trials, the switching × language interaction was significant, F(1, 41) = 28.25, p < .001, with switch costs of 74 ms (p < .001) in L2 and 27 ms (p < .001) in L3.
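The structure of this three-way interaction can be read off the cell switch costs just reported. A minimal sketch, assuming unweighted averaging over cells: the per-language marginals reproduce the reported 42 ms (L2) and 39 ms (L3). The per-level marginals (29.5, 41.5, 50.5 ms) match the reported 29, 42, and 51 ms to within rounding of the cell values. The L2-minus-L3 difference changes sign across similarity levels, which is the crossover behind the interaction.

```python
from statistics import mean

# Switch costs (ms) in latencies by response language and phonetic similarity,
# as reported in the follow-up of the three-way interaction.
switch_cost = {
    "L2": {"similar": 17, "neutral": 35, "different": 74},
    "L3": {"similar": 42, "neutral": 48, "different": 27},
}
levels = ["similar", "neutral", "different"]

# Marginal switch cost per language (unweighted mean over similarity levels).
by_language = {lang: mean(list(cells.values())) for lang, cells in switch_cost.items()}
# Marginal switch cost per similarity level (unweighted mean over languages).
by_similarity = {lv: mean([switch_cost["L2"][lv], switch_cost["L3"][lv]]) for lv in levels}
# L2 minus L3 switch cost: a sign change across levels signals a crossover.
l2_minus_l3 = {lv: switch_cost["L2"][lv] - switch_cost["L3"][lv] for lv in levels}

print(by_language)    # per-language marginals
print(by_similarity)  # per-level marginals
print(l2_minus_l3)    # crossover contrast
```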
Errors
An analysis conducted on error percentages (see Table 4) showed a significant switch cost of 1.2% and a main effect of language, with error rates 1.0% lower in L3 than in L2. The main effect of phonetic similarity was also significant. Post hoc tests showed lower error rates in the similar condition than in the neutral condition (p < .001) and in the different condition (p < .001), whereas the neutral and different conditions did not differ significantly (p = .783). No interaction between switching and language was found, with switch costs of 1.5% in L2 and 0.9% in L3. An interaction between switching and phonetic similarity was found, with post hoc tests showing a significant switch cost only in the different condition (3.3%, p < .01) but not in the similar condition (p = .430) or the neutral condition (p = .925). An interaction between phonetic similarity and language was found whose shape resembled the one obtained in the “Mixing cost” analysis: the neutral-similar difference was more pronounced in L2 than in L3, whereas the neutral-different difference was positive in L3 but negative in L2. Simple effects showed a significant effect of phonetic similarity for L2 (F(1, 41) = 5.61, p = .023) and a marginally significant effect for L3 (F(1, 41) = 3.00, p = .091). The three-way interaction between switching, language, and phonetic similarity was not significant.
Inverse efficiency scores
An analysis conducted on inverse efficiency scores showed the same statistical pattern as the response latencies (see Table 4), with one exception. A significant interaction between switching and phonetic similarity emerged that was not present in the RT analysis, with switch costs of 32 ms, 45 ms, and 98 ms for the similar, neutral, and different conditions, respectively (all ps < .001). Critically, the same significant three-way interaction as in the latencies was obtained. Simple effects analyses showed a significant interaction between switching and phonetic similarity for L2 (F(2, 82) = 16.84, p < .001), with significant switch costs in the similar, neutral, and different conditions (ps ≤ .029). There was no such interaction for L3 (F(2, 82) = 0.47, p = .625), again with significant switch costs in all three conditions (ps ≤ .005).
Discussion
The present study examined the dynamic nature of language control in bilinguals at the phonetic level. Specifically, we investigated the influence of cross-language phonetic similarity on local and global language control, using the alternating language switching paradigm with a letter-naming task.
Phonetic-level language control in bilinguals
As Figures 1 and 2 show, phonetic similarity had an overall facilitatory effect. This gradient was evident even in “single-language” blocks, where the analyses showed it in the latencies for both English (L2) and German (L3). This suggests co-activation of translation equivalents, which leads to faster and more accurate production of letter names when the L2 and L3 names are phonetically similar. The finding is reminiscent of the “cognate facilitation effect” (Costa et al., 2000), whereby bilinguals produce words faster when the translations are cognates. Costa et al. argue that bilinguals co-activate multiple language systems and that this activation cascades to the phonological level, creating a facilitatory effect. Our study suggests a similar mechanism, whereby co-activated translation equivalents exert their influence at the phonetic level.
In our results, phonetic similarity interacted with language in both the “mixing” and the “switch” analyses, but the interaction was subtle. Simple effects indicated that phonetic similarity affected naming times in each language, and errors in all cases except L3 in the switch analysis, where the effect was only marginally significant. Averaged across the two analyses, the latency difference between the “similar” and “different” conditions was 155 ms for English and 170 ms for German, suggesting a larger effect for the weaker language (L3) than for the stronger language (L2). This aligns with Costa et al. (2000), who found the cognate facilitation effect to be more pronounced in L2 than in L1 among Spanish-Catalan bilinguals.
Our findings showed that phonetic similarity reduced both mixing and switch costs, with clearer effects in inverse efficiency scores. This contrasts with previous studies in which switch costs were larger for cognates than for non-cognates when both were intermixed within a single block (e.g., Christoffels et al., 2007; Filippi et al., 2014; Thomas & Allport, 2000). Smaller switch costs for cognates than for non-cognates have been found only when cognates and non-cognates were presented in separate blocks (e.g., Broersma & Cutler, 2011; Declerck et al., 2012). These within- and between-block manipulations of cognate status may tap local- and global-level control processes, respectively (Declerck & Philipp, 2015a).
Unlike the cognate words in previous studies, the single letters in this study carry no semantic or morphological content, but this difference is unlikely to explain the discrepancy. Further research is needed to clarify the role of phonetic similarity in mixing and switch costs across different block contexts. It is also possible that the mixing/switch cost × phonetic similarity interaction appears exaggerated because of the substantial phonetic similarity gradient in overall latencies: a 148-ms difference between the similar and different conditions in the “mixing cost” analysis and a 175-ms difference in the “switch cost” analysis. Because faster responses often show smaller effects, the reduced costs for more similar letters may partly reflect their faster baseline. Expressed as percentages of the baseline, switch costs were 6.1%, 5.5%, and 4.5% for the different, neutral, and similar conditions, respectively. This suggests a less pronounced switching × phonetic similarity interaction than the figures imply. For mixing costs, the corresponding percentages were 8.3%, 13.4%, and 4.8%. This non-monotonic pattern suggests that the scaling argument applies less to mixing costs.
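The proportional switch costs can be approximated from the group means reported in the Results. This sketch assumes equal numbers of switch and non-switch trials. Under that assumption, each condition mean lies halfway between the two trial types, and the non-switch baseline is the condition mean minus half the switch cost. The sketch yields about 4.4%, 5.6%, and 6.2% for the similar, neutral, and different conditions. These values are close to, but not identical with, the reported 4.5%, 5.5%, and 6.1%, which were presumably computed from unrounded or participant-level data.

```python
# Marginal latencies and switch costs (ms) from the switch-cost analysis above.
condition_mean = {"similar": 674, "neutral": 774, "different": 849}
switch_cost = {"similar": 29, "neutral": 42, "different": 51}

proportional = {}
for level, m in condition_mean.items():
    # Assumption: equal numbers of switch and non-switch trials, so the condition
    # mean lies halfway between them and non-switch RT = mean - cost / 2.
    nonswitch_rt = m - switch_cost[level] / 2
    proportional[level] = 100 * switch_cost[level] / nonswitch_rt

for level, pct in proportional.items():
    print(f"{level}: switch cost = {pct:.1f}% of the non-switch baseline")
```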
Finally, the switch analysis (but not the mixing analysis) revealed a complex three-way interaction between switching, language, and phonetic similarity, which is evident in the top (RT) and bottom (IES) panels of Figure 2. Simple effects analyses confirmed that in L2, switch costs decreased as phonetic similarity increased, whereas in L3, switch costs were unaffected by phonetic similarity. As outlined earlier, previous studies on the interplay between phonological overlap and switch costs have found a complex pattern, with cognates producing larger switch costs than non-cognates, at least when both types are intermixed (e.g., Christoffels et al., 2007). In contrast, our study found smaller switch costs with increased phonetic overlap, but only for L2. This suggests that switching into the less proficient language (L3) is unaffected by its phonetic overlap with L2. By contrast, switching into the more proficient language (L2) is influenced by the overlap between translation equivalents, in the direction opposite to the previously reported pattern. Our interpretation of this interaction remains speculative, but the empirical pattern is clear.
Mixing costs, switch costs, and reversed language dominance
Mixing costs
We found significant mixing costs with a letter-naming task, generally in line with previous language switching studies using picture-naming or digit-naming tasks (e.g., Christoffels et al., 2007; Ma et al., 2016; Peeters & Dijkstra, 2018; Stasenko et al., 2017; Wang et al., 2009; Weissberger et al., 2012; Zhang et al., 2020). Mixing costs are typically attributed to inhibitory control mechanisms operating at a global level (for a review, see Kiesel et al., 2010). This global language control is proactive, in that inhibition is applied preemptively to prevent interference from the non-target language (Braver, 2012; Jiao et al., 2022; Ma et al., 2016; Peeters & Dijkstra, 2018). In this study, letter naming in the non-target language was presumably inhibited proactively during single-language blocks, resulting in less cross-language interference than in alternate-language blocks. Alternatively, mixing costs in alternate-language blocks may stem from the added attentional demands of language monitoring (e.g., Braver, 2012; Koch et al., 2005; Prior & MacWhinney, 2010). Mixing costs were numerically larger for English (L2) than for German (L3), although the mixing × language interaction was not significant. Such an asymmetry accords with some previous studies (e.g., Christoffels et al., 2007; Jylkkä et al., 2017; Mosca & de Bot, 2017; Peeters & Dijkstra, 2018; Prior & Gollan, 2011). Nevertheless, some studies have found no mixing costs, or even mixing benefits, for the less dominant language in cued language switching (e.g., Christoffels et al., 2007; Jylkkä et al., 2017; Mosca & Clahsen, 2016; Mosca & de Bot, 2017). Benefits have also been observed in voluntary language switching (e.g., de Bruin et al., 2018; Gollan & Ferreira, 2009; Jevtović et al., 2020; Jiao et al., 2022; Liu et al., 2021). Further research is needed to clarify when language mixing produces costs and when it produces benefits.
Switch costs
Robust switch costs were observed in the alternate-language blocks, consistent with numerous previous studies on language switching (e.g., Christoffels et al., 2007; Costa & Santesteban, 2004; de Bruin et al., 2018; Declerck & Philipp, 2015a; Meuter & Allport, 1999; Philipp et al., 2007; Timmer et al., 2018; Verhoef et al., 2009; Zhang et al., 2020). Previous studies typically used lexically based stimuli and responses (e.g., picture names or digits). In contrast, we used alphabet letters, which lack semantic and morphological content. Despite this, robust switch costs were observed, similar to recent findings by Zuo et al. (2022). These switch costs indicate transient, trial-to-trial language control, reflecting local inhibition due to cross-language interference during language switching (Declerck & Philipp, 2015a; Guo et al., 2011; Ma et al., 2016). Notably, switch costs were nearly identical when switching into L2 (42 ms) and into L3 (39 ms). Previous studies often reported asymmetrical switch costs in unbalanced bilinguals, with larger costs when switching into the more dominant language (Gade et al., 2021; Macizo et al., 2012; Meuter & Allport, 1999; Peeters et al., 2014; Philipp et al., 2007; Verhoef et al., 2009). However, few studies have examined trilingual speakers switching between their two non-dominant languages (L2 and L3), as we did. Costa et al. (2006) found symmetrical switch costs for Catalan (L2) and English (L3) in Spanish-Catalan bilinguals. Moreover, variables such as language proficiency, preparation time, inter-trial interval (ITI) duration, and task difficulty can influence the direction of switch cost asymmetries (for reviews, see Bobb & Wodniecka, 2013; Declerck & Philipp, 2015a; Green & Abutalebi, 2013). In addition, the volition to switch might affect both switch cost asymmetry and overall switch costs (e.g., de Bruin et al., 2018; Gollan & Ferreira, 2009; Jevtović et al., 2020; Jiao et al., 2022; Liu et al., 2021).
In our study, the predictable sequence in alternate-language blocks allowed ample preparation time on both switch and non-switch trials, potentially reducing switch cost asymmetry (Verhoef et al., 2009). Alternatively, the relatively long ITI (1200 ms) may have allowed inhibition carried over from the preceding language task to decay (Declerck et al., 2012).
Reversed language dominance effect
In single-language blocks, participants responded with similar speed and accuracy in L2 and L3. In alternate-language blocks, however, responses were slower in the dominant language (English, L2) than in the less dominant language (German, L3). This reflects the “reversed language dominance” effect reported in studies with randomly intermixed languages (Costa & Santesteban, 2004; Declerck et al., 2020; Gollan & Ferreira, 2009; Heikoop et al., 2016) as well as in studies with a predictable language sequence (Christoffels et al., 2007; Declerck et al., 2015; Wong & Maurer, 2021). Asymmetrical switch costs are generally viewed as a measure of transient, reactive control over language inhibition (Bobb & Wodniecka, 2013; Declerck & Philipp, 2015a). Reversed language dominance, in contrast, is seen as a marker of sustained, proactive control over language inhibition (Declerck, 2020; Kleinman & Gollan, 2018). However, the generality and robustness of these two phenomena remain controversial. For example, although asymmetrical switch costs have been reported in most language switching studies, some studies have failed to observe such asymmetries (e.g., Christoffels et al., 2007; Verhoef et al., 2010). Others have even observed reversed asymmetries (e.g., Declerck et al., 2012; Macizo et al., 2012; Thomas & Allport, 2000). Inconsistent results may arise from subtle variations in methodology (Declerck & Koch, 2023; Kleinman & Gollan, 2018). Moreover, Gade et al. (2021) used Bayesian linear mixed-effects modeling and obtained results that challenge common assumptions about asymmetrical switch costs and reversed language dominance. In addition, a recent Bayesian re-analysis confirmed the reliability of reversed language dominance. It also indicated that asymmetrical switch costs call for cautious interpretation and further study (Goldrick & Gollan, 2023).
As summarized in the Introduction, a related study by Zuo et al. (2022) examined Chinese-English bilinguals naming a subset of the alphabet in either Pinyin (L1) or English (L2) in “pure” or intermixed blocks. Pinyin is used mainly in early reading acquisition and is rarely read by Chinese adults. English letter pronunciation may therefore be more dominant despite being the L2 (Qin et al., 2016). In contrast, our study involved switching between L2 (English) and L3 (German), with a clearly defined L2/L3 hierarchy. Phonetic overlap between translation equivalents was central to our research but was not explicitly considered by Zuo et al. They found reversed language dominance effects in both the “mixing” and “switch” analyses, whereas our study showed this effect only in the “switch” analysis. In addition, Zuo et al. found no interaction between mixing costs and language but did find an interaction between switch costs and language. Our results likewise showed no mixing × language interaction. Switching, however, interacted with language only in combination with phonetic similarity, as reflected in the significant three-way interaction (see Table 4). These differences might relate to the linguistic status of the target languages (L1 vs. L2 vs. L3) or to their phonological properties. Future studies should compare L1, L2, and L3 in letter-naming tasks with trilingual speakers to further explore these dynamics.
Overall, phonetic switching offers a useful paradigm for investigating the dynamics of language control at the phonetic level. Like previous studies, the present study required a type of phonetic switching. It differed from them, however, in using a letter-naming task in which the letters are maximally context-free. Whereas the stimuli in the present study were single letters, previous studies mostly used words or digits, which carry lexical and semantic information. This is illustrated by manipulations of cognate status in the language switching paradigm with a picture-naming task (Broersma & Cutler, 2011; Christoffels et al., 2007; Declerck et al., 2012; Filippi et al., 2014; Thomas & Allport, 2000). Hence, combining the language switching paradigm with a letter-naming task provides an effective avenue for isolating phonetic-level language control.
Our findings provide further evidence that phonetics plays an important role in language control. Of note, however, language control in more ecologically valid paradigms (e.g., picture naming rather than letter naming) involves cognitive control at multiple levels. These range from the conceptual, lemma, phonological, and orthographic levels to levels outside language processing (Declerck & Philipp, 2015a). For example, a recent study demonstrated facilitatory effects of both phonological and orthographic overlap when participants switched languages in writing (Roembke et al., 2023). This underscores the complexity of language control in the bilingual brain. It follows that the complex pattern of results across studies could be attributed to language control operating at multiple loci rather than at a single locus.
Implications for models of language control
The Inhibitory Control (IC) model (Green, 1998) is an influential model of language control in bilingual production. According to this model, switch costs arise from inhibition persisting from the previously performed language task, and asymmetrical switch costs are the signature of inhibition. Moreover, the inhibitory mechanism also operates at a more global level, as evidenced by mixing costs and reversed language dominance. In addition to inhibition-based models, there are activation-based models (e.g., Costa et al., 1999; Finkbeiner et al., 2006; La Heij, 2005; Roelofs, 1998). In these models, specifying the target language at the conceptual level activates the target language more than the non-target language. In a sense, the activation and inhibition accounts are two sides of the same coin, and both can explain cross-language interference. However, there are also cross-language facilitation (rather than interference) effects, an issue that has received little attention in the existing literature. The facilitation effects of phonetic similarity observed in the present study and the cognate facilitation effects obtained in other studies (e.g., Costa et al., 2000; Hoshino & Kroll, 2008) indicate cross-language activation at the phonetic level. Models of language control have yet to accommodate both cross-language interference and facilitation.
There is broad consensus that language control recruits mechanisms akin to those responsible for cognitive control in general (for reviews, see Abutalebi & Green, 2007, 2008). However, some researchers have proposed language-specific control, on the grounds that language is unique in comprising multiple subsystems (Bobb & Wodniecka, 2013; Declerck & Philipp, 2015a; Gollan et al., 2014; Kroll et al., 2006; Olson, 2013; Zhang et al., 2019). It is therefore plausible that language control unfolds at multiple linguistic levels. For example, Zhang et al. (2019) dissociated concept-level control from language form-level control, using the language switching paradigm with a picture-naming task. Declerck and Philipp (2015a) reviewed evidence for multi-level language control during bilingual speech production. More specifically, the functional loci of language control may lie at the levels of concepts (Declerck & Philipp, 2015a; La Heij, 2005; Poulisse & Bongaerts, 1994; Schwieter & Sunderman, 2008), lemmas (e.g., Bultena et al., 2014; Hartsuiker & Pickering, 2008), phonology (Beauvillain & Grainger, 1987; Broersma & Cutler, 2011; Declerck et al., 2012; Goldrick et al., 2014), and orthography (Orfanidou & Sumner, 2005; Thomas & Allport, 2000). Future research should investigate language-specific control further. At the same time, it is important to maintain an integrative view of cognitive control in general rather than an “isolated consideration of a single theoretical research perspective” (Koch et al., 2018).
A potential limitation of the present study is that we cannot determine the extent to which behavioral performance in L2 (English) and L3 (German) was influenced by the first language (Mandarin). A further study with English-German bilinguals switching between their L1 and L2 would be desirable to characterize language control at the phonetic level unequivocally; we could not recruit such participants in our setting. In addition, the Chinese-English-German participants in this study had studied English for approximately 6–13 years at the time of the experiment. They were, however, learning German intensively as a new language, which resulted in greater exposure to and use of German. Language exposure and use may therefore also have influenced the pattern of L2-L3 interference or facilitation observed in this study.
Our results show a complex interplay of suppression (as indexed by switch costs) and facilitation (i.e., the effect of phonetic similarity between letter translation equivalents). A theoretically relevant question is whether this pattern is specific to the items used within an experimental session, or whether suppression applies more generally to language-specific pronunciation patterns. In principle, item specificity vs. item generality could be tested, for instance, by dividing the items into two halves and using one half in the language switching phase. A subsequent “pure” language block would then test whether suppression transfers to the other half. However, this may be difficult to implement in a letter-naming study, given the small number of available stimuli. Moreover, letters vary in frequency or familiarity, which may affect letter naming. Given the limited number of letters available as stimuli, however, it was impossible to control for both phonetic similarity and frequency simultaneously in the present study. Future research could manipulate letter frequency as a variable.
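The split-half transfer test described above could be organized as in the following sketch. It is a hypothetical design, not a procedure used in this study. Items are split within each phonetic similarity category so that both halves stay balanced, and the half used in the switching phase alternates across participants. The item labels and function names are placeholders, not the letters or materials used here.

```python
import random


def stratified_halves(items_by_category, seed=0):
    """Split items into two halves, balanced within each similarity category."""
    rng = random.Random(seed)
    half_a, half_b = [], []
    for category in sorted(items_by_category):
        items = list(items_by_category[category])
        rng.shuffle(items)
        mid = len(items) // 2
        half_a.extend(items[:mid])
        half_b.extend(items[mid:])
    return half_a, half_b


def counterbalance(half_a, half_b, n_participants):
    """Alternate across participants which half is used in the switching phase.

    The untrained half is later named in a pure block to test transfer of suppression.
    """
    plans = []
    for p in range(n_participants):
        trained, untrained = (half_a, half_b) if p % 2 == 0 else (half_b, half_a)
        plans.append({"participant": p, "switching": trained, "untrained": untrained})
    return plans


# Placeholder labels (s = similar, n = neutral, d = different); not the study's letters.
items = {
    "similar": ["s1", "s2", "s3", "s4"],
    "neutral": ["n1", "n2", "n3", "n4"],
    "different": ["d1", "d2", "d3", "d4"],
}
first_half, second_half = stratified_halves(items, seed=1)
plans = counterbalance(first_half, second_half, n_participants=4)
for plan in plans:
    print(plan)
```

With only a few letters per similarity category, each half would contain very few items per cell, which illustrates the practical difficulty noted above.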
Conclusion
The language switching paradigm with a letter-naming task provided a unique opportunity to investigate the interaction between the two phonetic systems of bilingual individuals. The present study demonstrated significant local-level switch costs, global-level mixing costs, and reversed language dominance, suggesting cross-language phonetic interference. At the same time, there was cross-language phonetic facilitation, as shown by reduced switch and mixing costs on phonetically similar trials. Therefore, further work is needed before language control can be assumed to rely mainly on inhibitory mechanisms. Our results shed light on the dynamics of language control at the phonetic level, while also suggesting that language control is not confined to the phonetic level. Future research focusing on the dynamic interplay of language control at multiple linguistic levels will help to clarify the theoretical landscape.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported by the National Social Science Foundation of China (grant 21XYY001 to Yong Zhang).
