Abstract
Speech of late bilinguals has frequently been described in terms of cross-linguistic influence (CLI) from the native language (L1) to the second language (L2), but CLI from the L2 to the L1 has received relatively little attention. This article addresses L2 attainment and L1 attrition in voicing systems through measures of voice onset time (VOT) in two groups of Dutch–German late bilinguals in the Netherlands. One group comprises native speakers of Dutch and the other group comprises native speakers of German, and the two groups further differ in their degree of L2 immersion. The L1-German–L2-Dutch bilinguals (
I Introduction
Adults speaking a second language (L2) are likely to be identified as non-native speakers due to properties of their first language (L1) in their L2 speech (Brennan et al., 1975; Ferguson and Garnica, 1975; Flege, 1980, 1981; Scovel, 1969). Immersion in an L2 environment may cause the L2 to play a dominant role in everyday life, and may reduce the use of the L1 and contact with other native speakers of the L1. While L2 immersion can help speakers approach a native accent in the L2, the associated reduced L1 use may cause linguistic abilities in the L1 to deteriorate, a phenomenon known as L1 attrition (Freed, 1982; Schmid, 2004). When L1 attrition affects the domains of phonology or phonetics, it can surface as foreign-accented L1 speech (Bergmann et al., 2016; De Leeuw et al., 2010; Hopp and Schmid, 2013). The present study combines investigations of L2 attainment and L1 attrition in the speech of two groups of late bilinguals who differ in their degree of L2 immersion to assess potential bidirectional L1–L2 influences in their phonetic systems.
Bidirectional L1–L2 influences in a bilingual’s speech can be explained by the Speech Learning Model (SLM; Flege, 1995). The SLM postulates that bilinguals have a common L1–L2 phonetic space and that these phonetic systems remain to some degree flexible in adulthood. If an L2 sound is not perceived as sufficiently different from an L1 sound, it may be classified as this phonetically similar L1 sound, a process known as ‘equivalence classification’. As a result of equivalence classification in perception, the speaker’s production of that L2 sound may also differ from native speakers’ productions.
New L2 categories can be established provided they are perceived as sufficiently different from existing L1 sounds. Nevertheless, new L2 categories in a bilingual’s L1–L2 phonetic space may still deviate from those of monolingual native speakers, for example to maintain contrasts with the bilingual’s L1 categories. Hence, the speech of an L2 speaker who acquired new L2 categories may still deviate from native speech.
The SLM’s assumption that phonetic systems remain flexible over the lifespan also implies that L1 categories can change under the influence of L2 acquisition, which can lead to a foreign accent in the L1. For this reason, the SLM has previously been used to interpret phonetic L1 attrition (Bergmann et al., 2016; Chang, 2012; Mayr et al., 2012). In order to understand how phonetic categories are organized in a speaker who accommodates two languages, it is important to characterize phonetic properties in both L2 and L1 speech (Chang, 2012; De Leeuw et al., 2012, 2013; Flege and Eefting, 1987a, 1987b; Mayr et al., 2012; Mennen, 2004; Sancier and Fowler, 1997).
Bilinguals’ linguistic skills in the L2 are typically established by comparing their speech against monolingual native speech (Abrahamsson and Hyltenstam, 2009; Bongaerts et al., 1997). If the goal is to determine to what extent bilinguals have been able to adapt to the phonetic environment in which they actually acquire the L2, a comparison against monolingual native speakers may be unsuitable (for similar thoughts on heritage language acquisition, see Rothman, 2007). For example, consider an L2 learner who acquires the L2 in the home country where he or she is exposed to other non-native speakers (e.g. non-native instructors or fellow L2 speakers in the home country) or to a native speaker with attrited L1 speech (e.g. an immigrant from the L2 country). In this case, comparing L2 speakers with monolingual native speakers implies that L2 speakers are evaluated against a type of speech to which they are barely exposed.
The monolingual reference point is also problematic because bilinguals are affected by cross-linguistic competition between their two languages (Cook, 2007; Hopp and Schmid, 2013; Kroll et al., 2006; Kupisch et al., 2013; Rothman and Treffers-Daller, 2014; Schmid et al., 2014). In addition, bilinguals presumably have to accommodate more phonetic categories than monolinguals. For example, consider a native speaker of Dutch who acquired German as L2 and a monolingual native speaker of German. The L2 speaker’s phonetic system comprises L1-Dutch and presumably L2-German sounds, while the monolingual’s phonetic system only comprises L1-German sounds. The mere process of becoming bilingual, with more phonetic categories to accommodate, may make the monolingual state impossible to attain. If we aim to test to what extent L2 speakers approach the speech of their linguistic environment, both the characteristics of the language to which they are exposed and the fact that they are bilingual need to be acknowledged. These two considerations make it important to compare bilinguals to native speakers who have been exposed to a comparable linguistic environment and who are bilinguals themselves (Cook, 2007; Hopp and Schmid, 2013; Kroll et al., 2006; Kupisch et al., 2013; Rothman and Treffers-Daller, 2014; Schmid et al., 2014).
A bilingual’s daily linguistic environment is largely determined by the country of residence and may influence the linguistic skills in both L1 and L2. Bilinguals immersed in the L2 country are likely to be exposed to more speakers of their L2 compared to L2 speakers who live in their home country. The number of speakers who provide linguistic input has recently been identified as an important factor in the early stages of monolinguals’ phonotactic learning (Seidl et al., 2014) and heritage speakers’ lexical development (Gollan et al., 2015). Furthermore, quality and quantity of native language input play a crucial role in maintaining a native-like L1 accent after immigration to an L2 country (De Leeuw et al., 2010; Mayr et al., 2012). Input quality, quantity and diversity as captured through the country of residence are possibly also crucial factors in L2 acquisition.
The present study specifically focuses on the production of voice onset time (VOT) in two groups of late bilingual adults who live in binational households either in their home country or the L2 country, and who are L2 speakers and potentially L1 attriters. VOT is an acoustic cue that can contribute to a perceived foreign accent in both L2 speakers and L1 attriters (Flege, 1984; Flege and Eefting, 1987b; Major, 1987; Riney and Takagi, 1999; Sancier and Fowler, 1997; Schoonmaker-Gates, 2015). The present research enriches the existing literature on VOT in L2 attainment and L1 attrition in three important ways. First, it implements the methodological considerations on L2 attainment outlined above by evaluating L2 speech against the speech of native speakers who are bilinguals themselves and whose speech is characteristic of the L2 speakers’ linguistic environment. Second, it brings together investigations of L2 attainment and L1 attrition in the same speakers. Third, the present experiments cover VOT production in voiceless and voiced plosives to allow insight into the speakers’ voicing contrasts. By addressing these three considerations, the present study makes it possible to assess the possible restructuring of bilinguals’ voicing systems.
VOT is the most important acoustic cue to distinguish voiced and voiceless plosives, and describes the time interval between a plosive’s burst release and the onset of voicing (Abramson and Lisker, 1973; Lisker and Abramson, 1964). The VOT continuum can be divided into three phonetic categories: prevoicing (negative VOT), short lag (short positive VOT) and aspiration (long positive VOT). Dutch contrasts prevoiced ‘voiced’ and short lag ‘voiceless’ plosives (e.g. Lisker and Abramson, 1964). German contrasts short lag ‘voiced’ and aspirated ‘voiceless’ plosives (e.g. Jessen, 1998). Thus, depending on the language, short lag plosives can be phonologically classified as ‘voiceless’ (in Dutch) or ‘voiced’ (in German). Although voiced plosives do not require prevoicing in German, adult native speakers sometimes prevoice initial singleton plosives (Fischer-Jørgensen, 1976; Hamann and Seinhorst, 2016; Jessen, 1998; Kohler, 1977; Stock, 1971).
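The three-way division of the VOT continuum and its language-specific phonological interpretation can be illustrated with a minimal Python sketch. The 35 ms boundary between short lag and aspiration is a hypothetical value chosen purely for illustration and is not reported in this article; the function and variable names are likewise assumptions.

```python
# Hypothetical boundary (ms) between short lag and aspiration; an
# illustrative assumption, not a value taken from this study.
ASPIRATION_BOUNDARY_MS = 35.0

# Phonological category signalled by each phonetic category, following the
# description of Dutch and German in the text. German prevoicing is listed
# as "voiced" because native speakers sometimes prevoice voiced plosives.
VOICING_SYSTEMS = {
    "Dutch": {"prevoicing": "voiced", "short lag": "voiceless"},
    "German": {
        "prevoicing": "voiced",
        "short lag": "voiced",
        "aspiration": "voiceless",
    },
}


def phonetic_category(vot_ms: float) -> str:
    """Assign a VOT value (in ms) to one of the three phonetic categories."""
    if vot_ms < 0:
        return "prevoicing"  # voicing starts before the burst release
    if vot_ms < ASPIRATION_BOUNDARY_MS:
        return "short lag"  # voicing starts shortly after the release
    return "aspiration"  # voicing starts long after the release


def phonological_category(language: str, vot_ms: float):
    """Return 'voiced' or 'voiceless', or None if the phonetic category
    is not part of the language's canonical contrast in this sketch."""
    return VOICING_SYSTEMS[language].get(phonetic_category(vot_ms))
```

Under these assumptions, a plosive with a VOT of 12 ms counts as ‘voiceless’ in Dutch but as ‘voiced’ in German, which is the cross-linguistic mismatch that the bilinguals in this study have to resolve.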
In production, prevoicing, short lag and aspiration differ in the required velopharyngeal activity, which is reflected in children’s acquisition order (Allen, 1985; Bortolini et al., 1995; Kager et al., 2007; Kewley-Port and Preston, 1974; Khattab, 2000; Macken and Barton, 1980a, 1980b; MacLeod, 2016; Stoehr et al., 2017): across different languages, children produce the least complex short lag VOT in their early babbles. Around their second birthday, children acquiring an aspiration language produce aspiration, for which the glottis must remain open throughout consonantal closure. Substantially later, possibly in the early school years, children speaking a prevoicing language attain adult-like prevoicing, for which the glottis must be closed considerably before consonantal release and, additionally, vocal fold vibration must be initiated and sustained (Kewley-Port and Preston, 1974).
Within each phonetic category, small VOT differences can arise depending on the consonantal place of articulation (e.g. Lisker and Abramson, 1964) and, in the case of voiceless aspirated plosives, word length (Flege et al., 1998; Yu et al., 2015). In addition, male speakers produce optional prevoicing more frequently than female speakers (Ryalls et al., 1997), which can be ascribed to sex differences in vocal tract morphology (Fitch and Giedd, 1999).
1 Previous research into VOT in L2 acquisition
When bilinguals speak two languages that implement the voicing contrast differently, as is the case for the participants in the present study, a potential influence from L1 to L2 can be measured in their VOT. For voiceless plosives, three different acquisition patterns have been observed in late bilinguals whose L1 is a prevoicing language (Arabic, Dutch, French or Spanish) and who learn an aspiration L2 (English or German): (1) native-like acquisition (Schmid et al., 2014; Simon, 2009; Simon and Leuschner, 2010, the phonetically trained participants); (2) differential acquisition (Flege, 1987, 1991; Flege and Eefting, 1987a, 1987b; Simon and Leuschner, 2010, the phonetically untrained participants); and (3) complete L1-to-L2 transfer (Flege, 1987, the least experienced participants; Flege and Port, 1981).
The native-like VOT acquisition pattern has been observed in highly advanced L1-immersed native speakers of Belgian Dutch with L2-English (and some participants with L3-German). The late bilinguals produced VOT in English (and German) voiceless plosives similar to monolingual native speakers (Simon, 2009; Simon and Leuschner, 2010). Similarly, native speakers of Dutch in the Netherlands reached VOT durations in English comparable to those of English native speakers who were also immersed in a Dutch environment (Schmid et al., 2014). These studies demonstrate that native-like aspiration of voiceless plosives can be acquired without L2 immersion.
The differential VOT acquisition pattern occurs when bilinguals produce VOT differently in their L2 than in their L1, but still deviate from native speakers’ VOT in the L2. This pattern has been observed in bilinguals with L1-Spanish who learned L2-English as adults: their VOT was longer in English than in Spanish, but their English VOT was nevertheless shorter than that of monolingual English speakers (Flege, 1991). The same pattern emerged in bilinguals with L1-Spanish who learned L2-English during childhood, and occurred irrespective of whether they were immersed in an English environment or not (Flege and Eefting, 1987a). Similar results come from Dutch native speakers in the Netherlands with L2-English and L3-German who were not formally instructed in L2 and L3 phonetics. The speakers produced distinct VOT values for Dutch short lag voiceless plosives versus English and German aspirated voiceless plosives. Yet, their aspirated VOT productions in English and German still appeared shorter than the VOT of English and German monolinguals, although no direct statistical comparison was administered (Simon and Leuschner, 2010). L2 speakers with some level of L2 proficiency can thus differentiate L1 and L2 plosives in VOT, but do not necessarily reach native-like VOT.
The complete L1-to-L2 VOT transfer pattern has been observed in L1-Arabic speakers with L2-English in the USA (Flege and Port, 1981). Their VOT for English voiceless plosives was similar to Arabic and was therefore shorter than the VOT of English monolinguals. Although the L2 speakers were immersed in the L2 country for several years, they did not show evidence for phonetic differentiation between L1 and L2 VOT. L2 immersion thus does not always lead to the acquisition of new – be it native-like or differential – L2 VOT for voiceless plosives.
In sum, most studies on L2 VOT dealt with the acquisition of voiceless plosives. For long lag voiceless plosives, native-like acquisition, differential acquisition, and complete L1-to-L2 transfer have been observed, as was described above. For the acquisition of short lag voiceless plosives, native-like acquisition has never been reported, but it has only been addressed in one study, on English L2 speakers of French (Flege, 1987).
Studies on late bilinguals’ production of voiced plosives reveal two acquisition patterns: native-like acquisition and L1-to-L2 transfer. The native-like acquisition pattern has been observed for L2 short lag voiced plosives in only one sample of Dutch native speakers with L2-English even though they were not immersed in the L2-speaking country (Schmid et al., 2014). The L1-to-L2 transfer pattern of L1 prevoicing to L2 short lag has also been observed, even in advanced and phonetically trained L2 speakers (Simon, 2009; Simon and Leuschner, 2010). Similarly, bilinguals who acquired their L2 during childhood tend to produce voiced plosives with prevoicing in both languages, especially when their dominant language requires prevoicing (Flege and Eefting, 1987a; Hazan and Boulakia, 1993; MacLeod and Stoel-Gammon, 2009; Sundara et al., 2006).
No data are yet available on the opposite scenario: late bilinguals’ acquisition of L2 prevoiced voiced plosives when their L1 does not require prevoicing. The present study fills this gap in the literature by contributing data on the production of voiced plosives in Dutch by native speakers of German.
In sum, native-like attainment and even VOT differentiation between L1 and L2 do not seem to require immersion, and do not automatically result from immersion. Two studies suggest that VOT differentiation may instead be related to language experience. This relationship was observed for the acquisition of voiceless plosives in bilinguals whose L1 was a prevoicing language (Spanish) learning an aspiration L2 (English), as well as in bilinguals with an aspiration L1 (English) learning a prevoicing L2 (French) (Flege, 1987; Flege and Eefting, 1987a). The more advanced L2 speakers in these two studies produced different VOT in their L2 than in their L1, but still showed differential VOT acquisition. Only the less experienced L2 speakers displayed full L1-to-L2 transfer and thus did not produce language-specific VOT. These studies suggest that language experience contributes to differentiating VOT between L2 and L1, but it may not necessarily be a sufficient predictor for native-like VOT acquisition in the L2.
2 Previous research into VOT in phonetic attrition
In some L2 speakers, the reverse of L1-to-L2 influence can be observed, namely an influence from L2 to L1. Bilinguals whose L2 has become the dominant language, for example through L2 immersion, are generally more prone to L1 attrition than L1-dominant bilinguals (Schmid and Köpke, 2007). The present study also investigates speech production in L2-immersed bilinguals, who may be affected by L1 attrition.
Research on L1 VOT in phonetic attrition is sparse, but there is broad evidence for L1 phonetic attrition at the segmental level (Bergmann et al., 2016; Chang, 2012; De Leeuw et al., 2013; Flege, 1987; Flege and Hillenbrand, 1984; Major, 1992; Mayr et al., 2012; Sancier and Fowler, 1997; Ulbrich and Ordin, 2014; Ventureyra et al., 2004) and the suprasegmental level (De Leeuw et al., 2012; Mennen, 2004). L1 attrition affecting the segmental or suprasegmental level may surface as a global foreign accent (Bergmann et al., 2016; De Leeuw et al., 2010; Hopp and Schmid, 2013). Most of these studies on L1 phonetic attrition reported changes in the realization of L1 speech sounds or prosody under the influence of long-term L2 use (for short-term L2 use, see Chang, 2012), and thus represent a context of language use that is similar to that of the participants in the present study.
Phonetic attrition can surface as a drift of the L1 VOT values towards the L2 VOT values. Four studies have observed phonetic attrition surfacing as durational changes in VOT in highly proficient L2 speakers (Flege, 1987; Major, 1992; Mayr et al., 2012; Sancier and Fowler, 1997). The bilinguals in these studies spoke Dutch, French or Portuguese, which have voiceless short lag plosives, in addition to English, which has voiceless aspirated plosives, like German. Native speakers of English produced shorter VOT in English voiceless plosives when they frequently used French or Portuguese (Flege, 1987; Major, 1992). This was irrespective of whether they were immersed in the L2 or L1 context. Similarly, L1 speakers of French or Portuguese who were immersed in L2-English produced voiceless plosives with longer VOT in L1-French and L1-Portuguese than the respective monolinguals (Flege, 1987; Sancier and Fowler, 1997). Further support for L1 phonetic attrition of VOT comes from a case study of a monozygotic twin who emigrated from the Netherlands to the United Kingdom 30 years before testing (Mayr et al., 2012). Her VOT production was evaluated against the speech of the other twin who lived in the Netherlands throughout her life. The emigrated twin exhibited longer – and therefore more English-like – VOT in voiceless plosives than the Netherlands-based twin. By contrast, the emigrated twin’s L1-Dutch voiced plosives remained prevoiced and were thus not affected by L1 phonetic attrition. These four studies suggest that changes to the L1 VOT may be limited to bilinguals with high L2 proficiency, but appear to occur independently of the immersion context (Flege, 1987).
A more nuanced view on the role of the immersion context on durational changes to L1 VOT and target-like L2 VOT production is provided by longitudinal data of one Portuguese–English late bilingual (Sancier and Fowler, 1997). The speaker produced longer – and thus more English-like – VOT in L1-Portuguese and L2-English after several months of L2 immersion in the USA. In turn, the speaker produced shorter – and thus more Portuguese-like – VOT after subsequent L1 immersion in Brazil. These durational VOT changes were perceived by native listeners of Brazilian Portuguese who rated the speech as more accented right after the informant’s stay in the USA than after a stay in Brazil. This study suggests that changes to L1 VOT do not necessarily reflect an irreversible loss of native-like L1 VOT.
Although L1 attrition surfacing as durational VOT changes has been observed in highly proficient L2 speakers (Flege, 1987; Major, 1992; Mayr et al., 2012; Sancier and Fowler, 1997), high L2 proficiency does not automatically lead to attrition of L1 VOT. Dutch L1 speakers who acquired native-like aspiration in L2-English maintained short lag VOT in Dutch voiceless plosives (Simon, 2009; Simon and Leuschner, 2010). These speakers lived in their L1 country, which suggests that it may be easier to maintain native-like L1 VOT with frequent native L1 input.
The observed cases of L1 VOT drift in voiceless plosives are in line with the Speech Learning Model’s (SLM) assumed flexibility of L1 phonetic categories (Flege, 1995), and showed that L2 VOT can influence L1 VOT. This influence is not limited to an L2 immersion context, but rather seems related to frequency of language use. In addition, frequent L1 exposure through L1 immersion may help to prevent L1 attrition in highly proficient L2 speakers.
Only the case study of Mayr et al. (2012) included investigations of VOT in voiced plosives, but found no evidence for phonetic attrition of L1 prevoicing. The present study follows up on this finding to address whether voiced plosives are indeed resistant to durational changes of L1 VOT, while voiceless plosives are frequently affected.
3 The current study
This study investigates VOT in the L1 and L2 speech of Dutch–German binational couples living in the Netherlands. Each couple consists of one partner with L1-Dutch and L2-German and one partner with L1-German and L2-Dutch. Within each couple, interactions in both languages are common as the two partners have at least one child that they raise bilingually. The L1-Dutch speakers are frequently exposed to German and to non-native Dutch at home through their German partner and their bilingual child or children. Similarly, the L1-German speakers are frequently exposed to Dutch and non-native German at home. The exposure to German in both groups of bilinguals is limited to the family context. Exposure to Dutch occurs, on the other hand, in a variety of contexts and through multiple speakers.
In addition to a difference in immersion, the two groups face a different acquisition task: to produce target L2 VOT, the L1-Dutch speakers need to suppress Dutch prevoicing and learn to produce German aspiration. The L1-German speakers need to suppress German aspiration and learn to produce Dutch prevoicing.
This study combines investigations of VOT in L2 acquisition and L1 attrition in both voiceless and voiced plosives in the same speakers. Addressing the speakers’ two languages and both voicing categories is essential to draw conclusions about the structure of bilinguals’ phonetic space and voicing systems. The use of bilingual couples as participants makes it possible to address L2 attainment by comparing one group of bilinguals’ L2 to the other group of bilinguals’ L1, which offers two crucial advantages. First, a comparison between the L2 of one group of bilinguals and the L1 of the other group of bilinguals accounts for the characteristics of the speech to which the L2 speakers are exposed daily in their immediate social environment. Second, the L1 speech of bilinguals rather than monolinguals represents target speech that L2 speakers can in fact approach, as both groups’ phonologies encompass a similar number of phonemes.
The three questions we specifically ask regarding both groups of bilinguals are whether both acquisition contexts allow them to: (1) produce VOT differently in L1 and L2; (2) realize VOT in the L2 similarly to native speakers who are bilingual themselves; and (3) maintain L1 VOT that is similar to a monolingual control group consisting of speakers representative of the linguistic environment in which the participants acquired and used their L1 before they became bilingual.
Regarding the L1-Dutch speakers, we hypothesize that they produce longer than monolingual-like VOT in L1 voiceless plosives, but maintain native-like prevoicing in L1 voiced plosives (compare Mayr et al., 2012). In L2-German, we expect the L1-Dutch speakers to produce voiceless plosives with longer VOT than in Dutch, but shorter VOT than the L1-German speakers. We further expect transfer of L1 prevoicing to L2 voiced plosives.
Regarding the L1-German speakers, we expect to find shorter than monolingual-like VOT in L1 voiceless plosives, and possibly prevoiced voiced plosives to maintain a clear voicing contrast. If the L1-German speakers are indeed capable of producing prevoicing in L1-German and L2-Dutch, which has never been addressed in previous research, we expect them to be able to suppress aspiration and produce L2-Dutch voiceless plosives with target-like short lag VOT.
II Method
1 Participants
Ninety-seven speakers divided over four groups participated in this study: bilinguals with L1-Dutch and L2-German (
Participant overview.
Sixteen of the L1D–L2G speakers had formal instruction in German in high school; the other two learned German only as adults, when they met their German partner. The L1D–L2G speakers’ average age of first exposure to German was 13 years (range 1–28,
The L1G–L2D speakers learned Dutch at an average age of 23 years (range 8–33,
Although not all participants reported knowledge of an additional language besides Dutch and German, schooling in the Netherlands and Germany requires all students to study English. Language teachers in these countries are, traditionally, non-native speakers of English.
The majority of the bilingual participants came from 17 Dutch–German binational couples, each of which contributed one partner to the L1D–L2G group and the other partner to the L1G–L2D group. One additional participant in the L1D–L2G group and six participants in the L1G–L2D group participated without their partners. The bilinguals were tested in different provinces across the Netherlands.
Of the Dutch monolinguals, two reported some knowledge of German, and three reported speaking English sporadically. All Dutch monolinguals were tested in or around Nijmegen in the Central Eastern Netherlands. Four of the monolingual German participants had some knowledge of Dutch, but none of them reported regular use of a language different from German. The German monolinguals were tested in Central Western Germany (
2 Materials and procedure
The target plosives were voiceless /p/, /t/ and /k/ and voiced /b/ and /d/. As /ɡ/ is not a native phoneme of Dutch, it was not included in this study for either language. For each language and plosive, six target words were selected that were picturable, plosive-vowel-initial nouns, such as the Dutch word
Testing took place in a quiet room in the participants’ homes, after the participants signed informed consent for their family to participate in the study. When both participants from a couple completed the task during the same testing session, the other participant left the room during the recordings. The participants were shown pictures of the target words and they were asked to name them at a comfortable pace without a determiner. The participants then filled out a language background questionnaire, while their children completed three tasks for a different study (Stoehr et al., 2017). Finally, the participants named the pictures in their other language. The language order was counterbalanced across participants. The picture naming took approximately three minutes per language. At the end of the session, the participants and their child were compensated with €10 or a book.
3 Recordings and VOT measurements
Recordings were made with an Olympus Linear PCM Recorder LS-10 with uncompressed 24 bit / 96 kHz recording capability. VOT measurements were performed in Praat (Boersma and Weenink, 2014) taking into account waveforms and spectrograms viewed at 0 to 5,000 Hz. The burst onset was measured as the onset of abrupt energy release. The onset of voicing was defined as the first periodic component of the waveform and was measured at the preceding zero-crossing (Francis et al., 2003). Inter-coder reliability based on 25% of the data indicated 99% agreement. Measurements of voiceless plosives were considered in agreement when they differed by less than 10 ms (Fabiano-Smith and Bunta, 2012). Coding of voiced plosives was considered in agreement when both coders rated VOT as either prevoiced or short lag. Only tokens that allowed unambiguous measurements without coarticulation or speech overlap entered the analyses. Figure 1 shows examples of VOT measurements of prevoicing, short lag, and aspiration, respectively.
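The measurement and agreement criteria described above can be sketched in Python as follows. This is a minimal illustration, not the procedure actually scripted for the study: the function names, the representation of landmarks as time stamps in seconds, and the example values are all assumptions.

```python
def vot_ms(burst_onset_s: float, voicing_onset_s: float) -> float:
    """VOT in ms: onset of voicing minus burst onset.

    Negative values indicate prevoicing (voicing starts before the burst).
    """
    return (voicing_onset_s - burst_onset_s) * 1000.0


def coders_agree(vot_a_ms: float, vot_b_ms: float, voiced: bool,
                 tolerance_ms: float = 10.0) -> bool:
    """Apply the agreement criterion for one token measured by two coders."""
    if voiced:
        # Voiced plosives: both coders must assign the same category
        # (prevoiced = negative VOT, short lag = non-negative VOT).
        return (vot_a_ms < 0) == (vot_b_ms < 0)
    # Voiceless plosives: measurements must differ by less than 10 ms.
    return abs(vot_a_ms - vot_b_ms) < tolerance_ms


def percent_agreement(tokens) -> float:
    """Percentage of tokens on which the coders agree.

    `tokens` is a list of (vot_coder_a_ms, vot_coder_b_ms, voiced) tuples.
    """
    tokens = list(tokens)
    agreed = sum(coders_agree(a, b, voiced) for a, b, voiced in tokens)
    return 100.0 * agreed / len(tokens)


# Hypothetical double-coded tokens, for illustration only.
example_tokens = [
    (62.0, 65.5, False),   # voiceless, 3.5 ms apart: agreement
    (48.0, 60.0, False),   # voiceless, 12 ms apart: disagreement
    (-95.0, -80.0, True),  # voiced, both prevoiced: agreement
    (-70.0, 12.0, True),   # voiced, different categories: disagreement
]
```

With these made-up tokens, `percent_agreement(example_tokens)` evaluates to 50.0, because one voiceless token exceeds the 10 ms tolerance and one voiced token was assigned to different categories.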

Acoustic landmarks from top to bottom: A. prevoicing, B. short lag, C. aspiration.
III Results
In this section, we first provide an overview of the descriptive statistics of voiceless plosives (Table 2 and Figure 2) and voiced plosives (Tables 3 and 4, Figure 3). We then present the statistical models (Table 5) before we turn to the statistical effects of
Voice onset time (VOT) in ms by place of articulation over participants.

Voice onset time (VOT) of voiceless plosives by language background over participants.
Mean percentage of prevoiced plosives by place of articulation over participants.
Voice onset time (VOT) in ms of short lag voiced plosives by place of articulation over participants.

Percentage of voiced plosives produced with prevoicing by language background over participants.
Model specifications.
LangBackgr. = Language Background; PoA-LC = Place of Articulation: Labial vs. Coronal; PoA-CD = Place of Articulation: Coronal vs. Dorsal.
1 only in Dutch model due to convergence problems; 2 only in German model due to convergence problems.
Results overview.
Table 2 provides the means and standard deviations of VOT per voiceless plosive over participants by language and language background. Both groups of bilinguals produced overall longer VOT in German than in Dutch. In each language, the bilinguals produced L1 VOT intermediate to the monolinguals’ L1 VOT and the L2 VOT of the other group of bilinguals. In Dutch, the L1D–L2G speakers produced minimally longer VOT than the monolinguals, and shorter VOT than the L1G–L2D speakers. In German, the L1G–L2D speakers produced VOT that was intermediate to the monolinguals’ overall longer VOT and the L1D–L2G speakers’ overall shorter VOT. Figure 2 visualizes these findings by consonantal place of articulation.
VOT of voiced plosives was bimodally distributed in 47 of the 70 participants in Dutch and in 51 of the 68 participants in German. VOT of voiced plosives was therefore treated categorically as either prevoiced (negative VOT) or short lag (short positive VOT). Table 3 shows the mean percentages and standard deviations of the voiced plosives produced with prevoicing (the remaining tokens being short lag) over participants together with the total number of analysable prevoiced and short lag tokens per voiced plosive by language and language background. Both groups of bilinguals produced overall more prevoiced tokens in Dutch than in German, although this difference is more pronounced in the L1G–L2D speakers. In Dutch, the L1D–L2G speakers produced the highest percentage of voiced plosives with prevoicing, closely followed by the monolingual Dutch speakers. This small between-group difference may be ascribed to the larger number of males in the L1D–L2G group, who typically produce more prevoicing than females (Ryalls et al., 1997). The L1G–L2D speakers produced a lower percentage of prevoiced plosives in Dutch than the two groups of Dutch native speakers. In German, the monolinguals produced the lowest percentage of prevoiced plosives, followed by the L1G–L2D speakers. The L1D–L2G speakers produced the highest percentage of prevoiced plosives. Figure 3 visualizes the percentages of prevoiced plosives by language and consonantal place of articulation across the groups. The voiced plosives produced without prevoicing had short lag VOT values close to 10 ms in both languages and all groups (Table 4).
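The categorical treatment of voiced plosives can be illustrated with a short Python sketch that computes, per participant, the percentage of prevoiced tokens and the mean VOT of the short lag tokens, and then averages over participants. The participant labels and VOT values below are invented for illustration, and the function names are assumptions rather than part of the original analysis.

```python
from statistics import mean


def percent_prevoiced(vots_ms):
    """Percentage of one participant's voiced tokens with negative VOT."""
    return 100.0 * sum(v < 0 for v in vots_ms) / len(vots_ms)


def short_lag_mean(vots_ms):
    """Mean VOT (ms) of the tokens without prevoicing, or None if absent."""
    short_lag = [v for v in vots_ms if v >= 0]
    return mean(short_lag) if short_lag else None


def summarize_over_participants(tokens_by_participant):
    """Average both measures over participants, not over pooled tokens."""
    percentages = [percent_prevoiced(v) for v in tokens_by_participant.values()]
    lag_means = [short_lag_mean(v) for v in tokens_by_participant.values()]
    lag_means = [m for m in lag_means if m is not None]
    return {
        "mean_percent_prevoiced": mean(percentages),
        "mean_short_lag_vot_ms": mean(lag_means) if lag_means else None,
    }


# Hypothetical VOT values (ms) for two participants, for illustration only.
example = {
    "P1": [-90.0, -75.0, 11.0, -60.0],
    "P2": [9.0, 12.0, -80.0, 10.0],
}
summary = summarize_over_participants(example)
```

Averaging over participants rather than over pooled tokens gives every speaker the same weight regardless of how many tokens were analysable, which is one plausible reading of the phrase "over participants" in the table captions.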
1 Description of the statistical models
Statistical analyses using mixed effects regression were performed in
The bilinguals’ differentiation of L1 and L2 VOT was assessed with within-group comparisons of the bilinguals’ Dutch and German. This L1–L2 comparison was conducted separately for the L1G–L2D speakers and the L1D–L2G speakers, and the independent variable (IV) of main interest was
Two between-group analyses addressed nativelikeness of the bilinguals’ VOT in the two languages. L2 attainment was assessed by comparing the bilinguals’ L2 VOT to the other bilinguals’ L1 VOT. L1 attrition was assessed by comparing the bilinguals’ L1 VOT to the VOT of an independent sample of monolinguals. The IV of main interest in all between-group analyses was
Additional IVs were used in all models to account for item-related and participant-related variance due to factors that are known to impact on VOT. Item-related IVs for analyses on voiceless plosives were
Table 5 provides an overview of the model specifications for each group comparison. All models comprised interactions between the IV of main interest and the other IVs, except for the models on L2 attainment, where simplification due to model convergence problems was required. Significant interactions were explored in separate follow-up analyses for each level of the IVs.
2 Results of the statistical models
This section presents the main findings for the three research questions. The first two analyses addressed the bilinguals’ differentiation of VOT in the L1 and L2. Subsequent analyses addressed the bilinguals’ L2 attainment and potential L1 attrition. Lastly, we present findings on variability specific to the target words and participants that did not contribute to the main results.
a Differentiation between L1 and L2 VOT within the bilinguals
The analyses on language differentiation in the L1G–L2D speakers showed that they produced VOT differently when speaking German compared to when speaking Dutch. The L1G–L2D speakers specifically produced longer VOT in voiceless plosives when speaking German (
The L1D–L2G speakers produced distinct VOT for Dutch and German voiceless plosives, but not for voiced plosives. They produced voiceless plosives with longer VOT in German than in Dutch (
b L2 attainment and L1 attrition
The following four analyses concerned the bilinguals’ VOT production in both their L2 and their L1. The reference point for L2 attainment was the other bilinguals’ L1. The reference point for L1 attrition was the speech of monolingual native speakers.
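The pairing of each bilingual group with its reference group can be laid out as a small lookup. This is only an illustrative sketch of the comparison design: the class, function, and label names are hypothetical, group labels use ASCII hyphens, and nothing here reflects the software actually used for the analyses.

```python
from typing import NamedTuple


class Comparison(NamedTuple):
    question: str     # "L2 attainment" or "L1 attrition"
    target: tuple     # (group, language) under test
    reference: tuple  # (group, language) serving as the native benchmark


# Each bilingual group mapped to its (L1, L2).
BILINGUALS = {"L1G-L2D": ("German", "Dutch"), "L1D-L2G": ("Dutch", "German")}
MONOLINGUALS = {"Dutch": "Dutch monolinguals", "German": "German monolinguals"}


def between_group_comparisons():
    """Enumerate the four between-group analyses."""
    comparisons = []
    for group, (l1, l2) in BILINGUALS.items():
        other = next(g for g in BILINGUALS if g != group)
        # L2 attainment: own L2 against the other bilingual group's L1.
        comparisons.append(Comparison("L2 attainment", (group, l2), (other, l2)))
        # L1 attrition: own L1 against monolingual speakers of that language.
        comparisons.append(
            Comparison("L1 attrition", (group, l1), (MONOLINGUALS[l1], l1))
        )
    return comparisons


for comparison in between_group_comparisons():
    print(comparison)
```

For each group, the L2 benchmark is the other group's L1, which is the same language as that group's own L2.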
The analyses on L1 attrition in the L1G–L2D speakers showed that their L1-German VOT of voiceless but not voiced plosives is affected by L1 attrition. The L1G–L2D speakers produced L1-German voiceless plosives with shorter VOT than monolinguals (
The analyses on L1 attrition in the L1D–L2G speakers did not find evidence for attrition of L1-Dutch VOT. The L1D–L2G speakers neither produced L1-Dutch voiceless plosives (
In sum, the results on L2 attainment and L1 attrition show that only the L1G–L2D bilinguals, who were immersed in the L2 country, partially attained native-like L2 VOT. Similarly, only the L1D–L2G bilinguals, who were immersed in the L1 country, maintained native-like L1 VOT.
c Variability related to the words and participants
In the following, we present the significant findings on the IVs relating to the target words and participants. As the bilinguals were part of three analyses, the result for an IV in a given group was considered significant when at least one analysis including that group yielded a significant effect of that IV. The complete output of all models is presented in Appendices 2–4.
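The reporting rule stated above amounts to a logical "any" over the analyses that include a group. A minimal sketch follows, assuming a conventional significance threshold of 0.05 (the threshold is not stated in this section) and using hypothetical p-values.

```python
def significant_for_group(p_values, alpha=0.05):
    """Return True if an IV is significant in at least one analysis of a group.

    p_values: p-values for one IV, one per analysis that includes the group.
    alpha: assumed significance threshold (hypothetical; not taken from the text).
    """
    return any(p < alpha for p in p_values)


# Hypothetical p-values for one IV across the three analyses involving one
# bilingual group (differentiation, L2 attainment, L1 attrition).
print(significant_for_group([0.20, 0.03, 0.60]))  # one analysis below alpha
print(significant_for_group([0.20, 0.31, 0.60]))  # no analysis below alpha
```

Note that this rule is lenient by design: one significant analysis suffices, so it summarizes where an effect appeared at all rather than how consistently it appeared.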
In the analyses on voiceless plosives, all groups produced shorter VOT for /p/ than for /t/ in both Dutch and German, and all groups produced longer VOT for /k/ than for /t/ in Dutch but not in German. In addition, all groups produced longer VOT in monosyllabic than in disyllabic words in German, but not in Dutch. In the analyses on voiced plosives, all groups prevoiced /b/ more frequently than /d/ in both languages. In all groups except the Dutch monolinguals, males prevoiced more frequently than females. Late bilinguals thus produce language-specific within-category VOT variability related to consonantal place of articulation and word length.
IV Summary
The present study investigated how two groups of Dutch–German late bilinguals in the Netherlands realize the voicing contrast in both Dutch and German by means of voice onset time (VOT). The bilinguals who speak Dutch as native language and German as the L2 are referred to as L1D–L2G speakers, and the bilinguals who speak German as native language and Dutch as the L2 are referred to as L1G–L2D speakers. To achieve native-like L2 VOT, the L1D–L2G speakers need to acquire aspiration for L2-German voiceless plosives and suppress prevoicing for L2-German voiced plosives. The L1G–L2D speakers need to suppress aspiration in L2-Dutch voiceless plosives and consistently prevoice L2-Dutch voiced plosives. We investigated whether (1) both groups of late bilinguals produced VOT differently in L1 and L2; (2) both groups of bilinguals achieved native-like L2 VOT; and (3) both groups of bilinguals maintained native-like L1 VOT.
The L1G–L2D speakers produced voiceless plosives with short lag VOT in L2-Dutch /p/ (
The L1G–L2D speakers seem to have attained native-like Dutch short lag VOT, at least for /p/ and /t/, but they did not yet reach native-like consistent prevoicing. In German, their VOT partly seems to be affected by language attrition, as revealed by shorter than monolingual-like VOT in voiceless plosives. Voiced plosives, by contrast, seem to remain unaffected by language attrition.
The L1D–L2G speakers produced voiceless plosives with longer VOT in L2-German (
The L1D–L2G speakers’ differentiation of voiceless plosives between Dutch and German does not go hand in hand with attainment of native-like VOT in German. They hardly aspirate /p/ (
V Discussion
In the following, we first interpret the results in light of the Speech Learning Model’s (SLM) equivalence classification and contrast maintenance hypotheses (Flege, 1995). We then discuss immersion and language use, articulatory constraints, and foreign accentedness as additional explanations of the results.
1 Equivalence classification and contrast maintenance
The SLM (Flege, 1995) attempts to explain L2 phonetic attainment in relation to the L1 phonetic system. The two main concepts applicable to this study are equivalence classification and contrast maintenance. Differential acquisition, that is deviation from native norms, was observed in the L1D–L2G speakers for both L2-German voiceless and voiced plosives, and in the L1G–L2D speakers for L2-Dutch voiceless /k/ and voiced plosives.
One account within the SLM to explain such differential acquisition is equivalence classification (Flege, 1987, 1995): L2 speakers assimilate L2 sounds to their pre-existing L1 categories, and thus produce them in line with their L1 categories. However, equivalence classification cannot explain the specific patterns of differential acquisition in the present results. The L1G–L2D speakers prevoiced less frequently in Dutch than native speakers, but they prevoiced more frequently in L2-Dutch than in L1-German. Similarly, the L1D–L2G speakers did not produce native-like aspiration in L2-German, but they produced voiceless plosives with longer VOT in L2-German than in L1-Dutch. The observed differences between Dutch and German in the L1G–L2D speakers and the L1D–L2G speakers indicate that they perceive differences between the respective Dutch and German plosives. An alternative account for the differential acquisition of Dutch prevoicing and German aspiration lies in articulatory constraints, as discussed in detail below.
Equivalence classification has further limitations in explaining the L1D–L2G speakers’ transfer of prevoicing from L1-Dutch to L2-German. Prevoicing is the main cue for Dutch native listeners’ voicing perception (Van Alphen and Smits, 2004). Equivalence classification would thus predict that the L1D–L2G speakers assimilate German short lag plosives to their equivalent Dutch short lag voiceless category and thus produce German voiced plosives without any prevoicing. The need to maintain contrast between L2-German voiceless and voiced plosives offers an alternative explanation for the L1D–L2G speakers’ transfer of prevoicing to German.
Contrast maintenance is a second hypothesis within the SLM to explain differential L2 phonetic acquisition, and suggests acquisition of deviating phonetic categories in L2 to maintain contrast with already existing phonetic categories. The L1D–L2G speakers may need to produce prevoicing in L2-German to maintain a distinction between their voiced and voiceless categories. The VOT of their German voiceless plosives, especially in /p/, is perhaps too short to be contrasted with target-like short lag voiced plosives (Flege and Eefting, 1987a; Keating, 1984).
In contrast to the SLM’s predictions of differential acquisition, the L1G–L2D speakers reached native-like VOT in L2-Dutch /p/ and /t/. Their short lag space was initially occupied by L1-German voiced plosives, and therefore acquiring L2-Dutch short lag voiceless plosives constitutes an intricate task: keeping L2-Dutch voiceless short lag plosives separate from L1-German voiced short lag plosives requires restructuring of L1 phonetic categories. Native-like L2 phonetic categories can thus be acquired under favorable conditions, including long-term L2 immersion with diverse L2 use, simple articulatory gestures, and the social need to reduce a potential foreign accent. The effect of these conditions on L2 attainment and L1 attrition is discussed in detail below.
2 Immersion and language use
The two investigated immersion contexts, full immersion in an L2 environment and immersion in the L2 at home, are comparable in that both contexts involve natural and frequent use of the L2. Full L2 immersion inherently entails L2 use in a variety of contexts and with numerous speakers, whereas it largely limits L1 use to conversations within the family. By contrast, L2 immersion at home limits L2 use to interactions within the family, while the L1 is continuously used outside the home in a variety of contexts and with numerous speakers. Successful L2 acquisition as well as L1 attrition seem to be limited to an immersion context that involves drastic reduction of native L1 contact due to extensive L2 use, as is the case for the L1G–L2D speakers.
One aspect of full immersion that may influence the outcomes of L2 acquisition is exposure to multiple speakers, which is beneficial in monolingual and heritage L1 acquisition (Gollan et al., 2015; Seidl et al., 2014). Such diverse L2 exposure was experienced by the L1G–L2D speakers (exposed to Dutch in and outside the home), who acquired target L2-Dutch voiceless plosives, but not by the L1D–L2G speakers (exposed to German in the home) who did not acquire target L2-German plosives.
Conversely, frequent L1 contact and use in diverse contexts and with multiple speakers may be necessary to prevent phonetic L1 attrition, as has previously been suggested by Mayr et al. (2012). This hypothesis is in line with previous research that found quality and quantity of native language input to play a crucial role in L1 maintenance (De Leeuw et al., 2010). Only the L1D–L2G speakers, who were exposed to L1-Dutch outside the home, maintain native-like L1 VOT. Without frequent and diverse exposure to the L1, the more prominent L2 is likely to impact on the L1 phonetic categories. The L1G–L2D speakers, whose L1-German use was limited to the family context, were affected by L1 phonetic attrition surfacing as shorter than native-like aspiration in L1-German voiceless plosives. Diversity of language use and exposure are important topics for future research into the circumstances that lead to successful L2 acquisition and L1 maintenance.
3 Articulatory constraints
Articulatory constraints seem to be at play when it comes to successful L2 acquisition and L1 maintenance of VOT. In comparison to short lag VOT, aspiration requires an additional timing component, as the glottis must remain open during burst release and be closed shortly after. Prevoicing requires complete glottal closure, and initiation and sustainment of vocal fold vibration before burst release (Kewley-Port and Preston, 1974).
The articulatorily least complex short lag VOT was successfully acquired for L2-Dutch /p/ and /t/ by the L1G–L2D bilinguals. L1 short lag VOT was furthermore successfully maintained by the L1D–L2G speakers for L1-Dutch voiceless plosives and also by the L1G–L2D speakers for L1-German voiced plosives. Despite the articulatory simplicity of short lag VOT, it is still remarkable that the L1G–L2D speakers were able to suppress their L1-German aspiration and produce short lag VOT in /p/ and /t/ in L2-Dutch. To our knowledge, such suppression of aspiration in an L2 with target short lag voiceless plosives has never been reported in late L2 learners; instead, aspiration has been found to carry over from L1 to L2 (Flege, 1987).
Although short lag VOT is allegedly easy to produce (Kewley-Port and Preston, 1974), the L1D–L2G speakers produced L2-German voiced plosives with prevoicing instead of short lag VOT. As discussed above, the production of prevoiced voiced plosives in L2-German may be caused by the need to maintain phonetic contrast with the L2-German voiceless plosives, which were produced with shorter than target-like VOT.
The articulatorily more complex aspiration was not completely acquired by the L1D–L2G speakers in L2-German. Similarly, the target aspirated L1-German voiceless plosives of the L1G–L2D speakers appear to be affected by phonetic attrition.
The articulatorily most complex Dutch prevoicing was not completely acquired by the L1G–L2D speakers, but was successfully maintained by the L1D–L2G speakers. Despite the complex velopharyngeal activity involved in the production of prevoicing, the L1G–L2D speakers, and also the German monolinguals, are well capable of initiating velopharyngeal adjustments to close the glottis prior to oral release of the consonant, as evidenced by occasional occurrences of prevoicing in their speech. They may, however, not necessarily be able to control the required muscular activities to a similar extent as native speakers of a prevoicing language, which results in overall fewer productions of prevoicing in their speech.
4 Foreign accent
Another factor contributing to successful L2 acquisition and L1 maintenance may be accentedness and the associated social stigmatization (Fuertes et al., 2012; Kinzler et al., 2007). Production of aspiration in a language without aspiration, such as Dutch, is associated with a foreign accent (Flege, 1984; Major, 1987; Riney and Takagi, 1999; Sancier and Fowler, 1997; Schoonmaker-Gates, 2015). Dutch short lag voiceless plosives were successfully acquired by the L1G–L2D speakers and maintained by the L1D–L2G speakers. The social need to avoid stigmatization may be advantageous for the suppression of aspiration in L2-Dutch and the maintenance of short lag VOT in L1-Dutch.
Not all non-native VOT productions are associated with a perceived foreign accent: when target short lag voiced plosives are prevoiced, listeners do not perceive this as foreign-accented (Hazan and Boulakia, 1993). This may explain why the L1D–L2G speakers did not suppress prevoicing in L2-German. The finding that the L1G–L2D speakers did not acquire consistent prevoicing in Dutch calls for additional explanations, which can be related to articulatory complexity, as discussed in detail above.
5 Limitations
The present study comes with two limitations. First, the amount and contexts of L2 exposure are confounded with the speakers’ L1: as a result of the couples living in the Netherlands, all L1-German bilinguals were exposed more to Dutch than all L1-Dutch bilinguals were exposed to German. Second, the genders were not well balanced across groups: more L1-German bilinguals were female, and more L1-Dutch bilinguals were male. Although all analyses included the variable
VI Conclusions
The present study provided new insight into phonetic differentiation between L1 and L2, as well as L2 attainment and L1 attrition, by comparing VOT productions of two groups of L2 speakers who differed in their degree of L2 immersion. Both groups used their L1 and L2 at home, but differed in their L1 vs. L2 use outside the home. Referencing the L2 speakers’ speech to L1 speech of their immediate environment, rather than to a monolingual reference group, addressed the question of the extent to which the L2 speakers had been able to acquire the L2 from the input that was available to them. The results show that both immersion contexts allowed L2 speakers to restructure their phonetic space to accommodate old L1 and new L2 phonetic categories for voiceless plosives. Only the L1G–L2D speakers, who were frequently exposed to Dutch in a variety of contexts and by multiple speakers in their country of residence, restructured their phonetic space to accommodate new L2-Dutch VOT for both voiceless and voiced plosives. The acquisition of language-specific VOT did not automatically go hand in hand with native-like L2 acquisition. Even when the L2 plays a crucial role in everyday life, L1 phonetic attrition seems to be prevented by frequent use of and exposure to the L1 in a variety of contexts and with multiple speakers, for example at the workplace. Combining speech data of bilinguals with L1-Dutch and bilinguals with L1-German for both voiceless and voiced plosives revealed that success in acquiring native-like VOT in L2 and maintaining native-like VOT in L1 may be limited to VOT in the short lag range.
Footnotes
Appendix 1
Appendix 2
Appendix 3
Appendix 4
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
