Abstract
This study investigates the production of English and Tagalog voiceless stops by 14 Tagalog–English bilinguals in California, focusing on the effects of birth country and language dominance. Specifically, the study addresses three questions: (1) How do US-born heritage Tagalog speakers differ from Philippines-born heritage speakers in pronunciation? (2) How do their language-specific acoustic realizations in Tagalog and English vary as a function of birthplace? (3) How does individual language dominance influence the production of voiceless stops in both languages? Data from a reading-aloud task reveal that both US-born (Generation 2) and Philippines-born (Generations 1 and 1.5) bilinguals maintain distinct voice onset time (VOT) patterns for /p/, /t/, and /k/ in each language. Compared to US-born speakers, Philippines-born participants produce more ‘Tagalog-like’ VOTs in English, indicating phonetic convergence. Individual analyses further show that English-dominant bilinguals generally produce longer English VOTs than balanced or Tagalog-dominant speakers, whereas this trend does not reach statistical significance in their Tagalog VOTs. These findings illuminate the sources of cross-linguistic phonological influence and provide novel insight into the phonetic behavior of a heterogeneous and understudied diasporic community in the US, highlighting the interplay of birthplace and language dominance in heritage bilingual speech.
Keywords
I Introduction
Heritage speakers are early bilinguals who are exposed to both a minority (heritage) language and the majority societal language from a young age. This bilingual exposure may occur either simultaneously, when both languages are acquired from birth, or sequentially, when a child initially acquires the heritage language at home and later learns the majority language upon entering school, typically around the age of 5 or 6 years. As a subgroup of early bilinguals, heritage speakers are distinguished by the fact that their primary language is not the dominant language of their surrounding community. They acquire their heritage language naturally in the home environment, either as the sole language of communication or alongside the majority language in bilingual or multilingual contexts. Heritage languages encompass a wide range of linguistic situations, including diasporic languages spoken by the children of immigrants, indigenous or aboriginal languages threatened by colonization, and historical minority languages that coexist with dominant national languages (Kim, 2024; Montrul and Polinsky, 2021). Over the past several decades, research has demonstrated that heritage speakers occupy a unique position along the bilingual continuum: they differ systematically from both monolingual and second language (L2) speakers yet exhibit overlapping linguistic characteristics with each group (Benmamoun et al., 2013; Polinsky, 2018).
Studies of heritage language speech production have consistently shown that early bilinguals demonstrate advantages over late L2 learners in approximating target-like phonetic categories, largely due to early and sustained exposure to the heritage language (Au et al., 2002; Knightly et al., 2003). However, more recent findings indicate that while heritage speakers can form distinct phonetic categories for each language, these categories are not entirely independent; rather, they exhibit measurable interaction across the bilingual’s sound systems (Chang, 2016; Mayr et al., 2019). Such interlingual phonetic relationships, widely documented in bilingualism and L2 speech research, are attributed to the shared representational networks underlying both languages (Escudero, 2005; Escudero and Yazawa, 2024; Flege, 1995; Flege and Bohn, 2021). These interactions give rise to cross-linguistic influence (CLI), defined as ‘the ways in which a person’s knowledge of the sound system of one language can affect that person’s perception and production of speech sounds in another language’ (Jarvis and Pavlenko, 2008: 62).
CLI has thus become a central focus in bilingual phonetics and phonology, offering insight into how dual language systems interact (Amengual, 2023). To understand CLI more fully, researchers must consider a range of factors, such as proficiency, dominance, and language use, alongside community-specific sociolinguistic variables (Amengual, 2024; Flege et al., 1997, 2003; Guion, 2003). One such variable is the generation of immigration, which can influence both the degree of language exposure and the relative stability of phonetic categories among bilingual speakers whose home language differs from the dominant societal language.
Although research on language shift has long documented intergenerational heritage language attrition, most notably through Fishman’s (1964, 1991) three-generation model, few studies have analysed the acoustic properties of bilingual speech across groups differing by birth country. Building on this framework, the present study primarily investigates birth country as a key determinant of bilingual phonetic outcomes. Specifically, it compares speakers born in the Philippines to those born in the US within Tagalog–English bilingual communities in California. This approach allows for a more precise examination of how place of birth and timing of migration jointly influence phonetic production patterns. Such analyses are essential for understanding the fine-grained phonetic and phonological mechanisms underlying language shift, which reflects broader sociolinguistic dynamics. These processes are particularly salient in diasporic communities in the US, where English increasingly displaces the heritage language as the dominant language. By analysing the acoustic realization of Tagalog and English sounds across these groups, this study examines how birthplace and migration timing interact to shape cross-linguistic influence, providing new insight into the mechanisms of phonetic variation, language maintenance, and shift within a diasporic community that remains markedly underrepresented in the heritage bilingualism literature.
II Background
1 Tagalog and the Filipino diaspora in the US and California
Tagalog, a Malayo-Polynesian language native to the Philippines, boasts over 90 million speakers worldwide (Malabonga, 2009). The Philippines itself is home to a linguistically diverse population, with between 120 and 187 languages spoken (Gordon, 2005; McFarland, 1994). Tagalog, standardized as Filipino, is one of the country’s two official languages, alongside English (Gonzalez, 1998), and serves as the lingua franca for Filipinos across various ethnolinguistic backgrounds. As a result, Tagalog is the most widely spoken language among the Filipino diaspora.
Outside the Philippines, the United States (US) is home to the largest number of Tagalog speakers, including both first-generation Filipino immigrants and US-born Filipino Americans. According to the 2019 American Community Survey by the US Census Bureau (Dietrich and Hernandez, 2022), Tagalog ranks as the third most spoken non-English language in the US, with approximately 1.8 million speakers, following Spanish and Chinese. Nearly half of these speakers live in California (Fonacier, 2010), where Tagalog was the third most spoken language in 2005, with 668,073 speakers. It surpassed Mandarin and Cantonese, ranking only behind English and Spanish (Axel, 2011). Tagalog’s prominence is particularly evident in areas with large Filipino populations, such as Los Angeles, San Diego, and the San Francisco Bay Area, where bilingualism is widespread within the Filipino community.
The use of Tagalog in California extends beyond homes and community spaces; it is also reflected in the state’s media landscape, with Tagalog-language radio stations, television programs, and newspapers catering to the Filipino population. Furthermore, public institutions and educational settings are increasingly offering Tagalog courses and resources in response to the growing demand for bilingual education. Despite Tagalog’s significant presence in the US and California, there remains a surprising lack of research on the language’s linguistic features, particularly in relation to contact-induced changes in the speech of bilingual speakers. This study seeks to fill that gap by analysing acoustic data from Tagalog-speaking residents in California.
2 Intergenerational differences in pronunciation and language shift
In the US, Tagalog, like many other languages, is preserved through both immigration and intergenerational transmission, the passing of language knowledge from one generation to the next within families. Demographers have observed that immigrant minority languages typically undergo an intergenerational language shift toward English in US immigrant families (Fishman, 1965, 1966; Veltman, 1983a, 1983b). This language shift refers to the gradual transition from the immigrant (minority) language to the societal (majority) language, which in the US is English, across successive generations. Fishman’s (1964) three-generation model of language shift conceptualizes the intergenerational trajectory of linguistic assimilation within immigrant communities. In this framework, the first generation (G1) consists of foreign-born immigrants who predominantly retain their native language as the primary medium of intra-group communication. The second generation (G2), comprising the native-born children of immigrants, typically develops bilingual competence, maintaining the heritage language within the familial domain while acquiring and using the dominant societal language in educational and public spheres. By the third generation (G3), representing the native-born grandchildren of immigrants, the process of linguistic shift tends to reach its culmination, as this cohort exhibits near-complete assimilation and monolingualism in the dominant language. While scholars in recent decades have investigated the relationship between life course, age at migration, and generational status among first-generation immigrants, there remains considerable inconsistency in the classification of individuals born abroad who immigrated as adults or children, as well as their offspring, particularly with respect to the varying age thresholds employed to delineate these categories (Waters, 2014).
The absence of scholarly consensus regarding the demarcation of generational boundaries based on age at migration is exemplified in the evolving classificatory frameworks advanced by Portes and Rumbaut. In their earlier formulation, Portes and Rumbaut (2001) situated foreign-born children who migrated in early childhood within the second generation. In a subsequent refinement, Rumbaut (2004) and Portes and Rumbaut (2006) reconceptualized these distinctions through the introduction of the ‘1.5 generation’, denoting those who migrated between the ages of 6 and 12 years, and further delineated fractional categories, 1.75 and 1.25 generations, to designate, respectively, those arriving prior to formal schooling (ages 0–5 years) and those immigrating during adolescence (ages 13–17 years). In this study, we follow the categorization of Silva-Corvalán (1994), who defines G1 as foreign-born individuals who immigrated to the US during or after puberty (at age 12 years or later) and G2 as the US-born children of G1. Following Portes and Rumbaut (2006), we additionally distinguish G1.5, operationalized here as those who arrived in the US between the ages of 6 and 11 years.
There are still relatively few studies that explore intergenerational differences in pronunciation. In a cross-linguistic investigation of contact-induced change in Toronto, Nagy and Kochetov (2013) examined the acoustic features of voiceless stops in the speech of three generations of heritage speakers (Italian, Russian, and Ukrainian) who also speak English. Their findings revealed a shift toward English voice onset time (VOT) values in the Russian and Ukrainian groups across generations, but not in the Italian speakers. This difference may be linked to the long-established Italian community in Toronto and the city’s educational resources that help preserve the language. Additionally, the study found that reduced heritage language use and weaker ethnic identity were associated with longer, more English-like VOT values.
In a study on the impact of intergenerational transmission of the heritage language, Mayr and Siddika (2018) analysed the production of stop consonants in both Sylheti and English among bilingual children and adults from two sets of Bangladeshi heritage families: (a) G1 migrants from the Sylhet region of Bangladesh who moved to the UK as adults, together with their UK-born (G2) children; and (b) UK-born G2 adults, together with their (G3) children. The results showed significant generational differences in both Sylheti and English, as well as between the child and adult participants. Specifically, these bilinguals demonstrated gradual shifts toward the English VOT range across generations, with G3 children showing the strongest influence from English due to their linguistic environment. The authors suggest that G3 speakers may face both identity-related and input-related challenges that hinder their ability to achieve a native-like accent in their heritage language.
In a related study with the same language pair (Sylheti–English), Mayr et al. (2021) explored speech sound development across different generations of children raised in heritage language environments, focusing specifically on intergenerational differences in the phonological development of G2 and G3 Bengali heritage children in Wales. The results of a picture-naming task in both Sylheti and English showed high levels of accuracy in consonant and vowel production for children from both immigrant generations, particularly in English. Regarding Sylheti consonants, G2 children outperformed G3 children, but only on sounds specific to Sylheti. However, immigration generation did not significantly predict accuracy for English consonants. Additionally, G3 children showed more error types in Sylheti than G2 children, including a more frequent replacement of Sylheti dental stops with alveolar stops. These findings suggest that generational status may be an important factor to consider when assessing bilingual children’s phonological development in their heritage language.
Focusing on Spanish heritage speakers in California, Amengual (2018) examined the pronunciation patterns of four groups of Spanish–English bilinguals, incorporating the variable of immigrant generation. The study analysed the acoustic realization of voiced lateral approximants in the Spanish and English of Spanish heritage speakers (G1.5, G2, and G3) and L2 Spanish learners in California, who varied in their degree of language dominance. The results showed a language-specific phonetic distribution for each language, with Spanish and English laterals differing in their degree of velarization. English laterals exhibited a darker, more velarized articulation, while Spanish laterals were clearer and less velarized across all speaker groups. Intergenerational differences were also found within the Spanish heritage speaker groups: G1.5 speakers produced lighter (more Spanish-like) laterals, while G3 speakers produced darker (more English-like) laterals. These findings align with predictions related to language shift from Spanish to English across immigrant generations in the US.
Returning to the variable of immigrant generation in the Filipino communities in the US, several sociolinguistic studies have classified participants based on the individual’s or family’s connection to the immigration experience, showing a rapid shift towards English. For instance, Axel (2011) conducted interviews with eleven Filipinos (five G1 participants and six G2 participants). The findings reveal a clear shift towards English. For G2 participants, English is the primary language spoken both at home and outside the home, along with Spanish, which is widely spoken in California. The responses to the questionnaire also indicate that G2 children rarely, if ever, acquire Tagalog, as their parents worry that speaking the heritage language would give their children an ‘accent’ in English (Axel, 2011: 126), which is seen as a significant obstacle to social integration and securing higher-paying jobs (Moro and Russo, 2024).
In this study, birth country serves as the primary factor distinguishing groups of Tagalog–English bilinguals. Participants are first classified based on whether they were born in the Philippines or the US. Within the Philippines-born group, age of arrival is used to operationalize ‘generation’: early-arriving individuals (between the ages of 6 and 11 years) are classified as Generation 1.5 (G1.5), while immigrants who arrived in the US during or after puberty (age 12 years or after) are classified as first generation (G1). The second generation (G2) includes US-born individuals with parents who immigrated from the Philippines. By structuring the sample in this way, the study primarily examines the effects of birthplace on bilingual phonetic outcomes, while also considering age of arrival among Philippines-born participants as a secondary factor influencing language experience. This approach enables a nuanced analysis of how birthplace and migration timing shape the intergenerational transmission of the heritage language within Tagalog–English bilingual communities in California.
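The grouping scheme just described can be sketched as a small decision rule. This is only an illustrative sketch: the function name and arguments are hypothetical, and the handling of arrival before age 6 (which the scheme above does not label) is an assumption made here.

```python
from typing import Optional


def classify_generation(born_in_us: bool, age_of_arrival: Optional[float] = None) -> str:
    """Return 'G1', 'G1.5', or 'G2' following the scheme described in the text.

    Hypothetical helper for illustration; not part of the study's materials.
    """
    if born_in_us:
        # US-born children of parents who immigrated from the Philippines.
        return "G2"
    if age_of_arrival is None:
        raise ValueError("age_of_arrival is required for Philippines-born speakers")
    if age_of_arrival >= 12:
        # Arrival during or after puberty.
        return "G1"
    if age_of_arrival >= 6:
        # Arrival between the ages of 6 and 11 years.
        return "G1.5"
    # Assumption: the text assigns no label to arrival before age 6.
    raise ValueError("arrival before age 6 is not covered by this scheme")


labels = [classify_generation(True), classify_generation(False, 14), classify_generation(False, 8)]
```

Raising an error for unlabelled cases, rather than guessing a category, keeps the sketch faithful to what the scheme actually specifies.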
3 Heritage Tagalog speech
With respect to the existing research on the pronunciation of heritage speakers, only a handful of studies have focused on the sound system of Tagalog in the Filipino diasporic communities of Toronto, Canada (Kang et al., 2016; Umbal, 2023; Umbal and Nagy, 2021). Kang et al. (2016) compared the production of nine heritage Tagalog speakers’ voiced and voiceless Tagalog stops with ten monolingual Tagalog speakers, and their voiced and voiceless English stops with twelve monolingual English speakers. Their results show that the heritage Tagalog speakers produce target-like voiceless stops in both English and Tagalog, forming and maintaining separate phonetic categories for each language. The same speakers nonetheless exhibit considerable cross-linguistic influence in their voiced stops, where the English and Tagalog phonetic categories are merged.
In a study examining the speech of heritage Tagalog speakers in Toronto, Umbal and Nagy (2021) use a variationist sociolinguistic framework to analyse Tagalog rhotics in the spontaneous speech of fifteen Generation 1 (G1) and eight Generation 2 (G2) Tagalog speakers, as well as nine homeland speakers from Manila (Philippines). They note that Tagalog has one rhotic phoneme, which typically appears as a tap or trill (Schachter and Otanes, 1972), although an approximant variant has also been observed, likely due to contact with English (Chen et al., 2016; Lesho, 2018). Comparisons between generations and groups show that G2 speakers use the approximant variant more than G1 speakers. Additionally, heritage speakers who report using or preferring English are more likely to use the approximant variant than those who prefer Tagalog. However, the study did not find a significant effect of ethnic identity on the use of the approximant variant: being oriented towards Filipino identity did not appear to influence its use.
In his dissertation, Umbal (2023) examines the production of word-initial voiceless stops /p, t, k/ by sixteen heritage speakers, categorized by immigrant generation (G1, G2), and twelve homeland speakers, categorized by age group (older, younger). In his analysis of naturalistic speech data, Umbal finds no significant differences between G1 speakers and homeland speakers. Specifically, the degree of English contact among G1 speakers does not appear to influence their Tagalog VOT compared to the homeland speakers. However, a generational difference in VOT is observed between G1 and G2 speakers, with G2 speakers exhibiting a shift towards longer-lag VOT, a pattern more characteristic of English. Umbal interprets this shift as a result of contact-induced change and cross-generational drift.
4 The phonetic variable: Voice onset time (VOT) in Tagalog and English voiceless stops
Voice onset time (VOT) refers to the interval between the release of a stop consonant and the onset of vocal fold vibration (voicing) of the following vowel. This acoustic measure is widely used as the primary correlate of the voicing contrast in many languages. Since VOT is language-specific (Abramson and Whalen, 2017; Cho and Ladefoged, 1999) and can vary along a continuum from prevoicing (negative VOT) through short-lag to long-lag (aspirated) values, it provides a valuable lens for examining interlingual influence in the pronunciation of Tagalog and English.
Tagalog is considered a true voicing language, where voiced stops are produced with a negative VOT and voiceless stops have a short-lag VOT (Kang et al., 2016). Specifically, word-initial /p, t, k/ in Tagalog have a short VOT and are unaspirated, as [p, t, k]. Previous studies have reported that the VOT of voiceless stops in Tagalog ranges from 0 to 30 ms (Kang et al., 2016; Umbal, 2023). In contrast, English is an aspirating language, where voiceless stops exhibit a substantial delay between the release of the stop and the onset of laryngeal vibration, resulting in a long-lag VOT that ranges from 30 ms to 120 ms (Cho and Ladefoged, 1999; Lisker and Abramson, 1964). Even though previous work on Philippine English phonology describes voiceless stops as unaspirated (Tayao, 2008), Lesho (2018) more recently found that speakers mostly produce aspirated realizations, with VOTs ranging between 56 and 87 ms. Measuring the glottal-supraglottal timing (in milliseconds) of voiceless stops in both English and Tagalog, as produced by Californian Tagalog–English bilinguals, serves as a proxy to assess the degree of cross-linguistic influence between their two languages.
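For illustration only, the approximate ranges cited above can be expressed as a simple labelling function. The function name is hypothetical, and the decision to assign a value of exactly 30 ms to the long-lag category is an assumption made here, since the two cited ranges meet at 30 ms.

```python
def vot_category(vot_ms: float) -> str:
    """Label a VOT value (in ms) as lead, short-lag, or long-lag.

    Boundaries follow the approximate ranges cited in the text; the
    treatment of exactly 30 ms is an assumption of this sketch.
    """
    if vot_ms < 0:
        # Voicing starts before the release (true voicing languages).
        return "lead (prevoiced)"
    if vot_ms < 30:
        # Unaspirated voiceless stops, as described for Tagalog /p, t, k/.
        return "short-lag"
    # Aspirated voiceless stops, as described for English /p, t, k/.
    return "long-lag"


examples = {value: vot_category(value) for value in (-80.0, 15.0, 85.0)}
```

Such a categorical label is a coarse summary; the analyses in this study treat VOT as a continuous measure.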
III The present study
This study investigates the acoustic realization of voiceless stops /p, t, k/ in the English and Tagalog speech of two groups of Tagalog–English bilinguals, categorized based on their country of birth: one group born in the Philippines, and another born in the US. Drawing from previous research on the speech production of early bilinguals, the study uses VOT to assess the phonetic influence of one language on the other. However, it extends past studies in three key ways: (1) it includes both Philippines-born and US-born bilinguals, representing generations G1, G1.5, and G2, allowing for an exploration of how language shift from Tagalog to English unfolds across generations; (2) it examines heritage speakers from the Filipino diaspora in California, a sizable yet underexplored community in the US; and (3) it looks at individual language trajectories to shed light on variation in language contact contexts. The primary research questions guiding this production experiment are the following:
Research question 1: Do Philippines-born and US-born bilinguals produce similar VOT values in Tagalog and English, indicating phonetic convergence, or do they maintain distinct realizations in each language, indicating phonetic divergence?
Research question 2: How does place of birth influence the acoustic realization of these voiceless stops?
Research question 3: How do language dominance profiles of individual bilinguals affect their production of VOT in both Tagalog and English voiceless stops?
IV Method
1 Participants
Fourteen participants (8 males and 6 females) took part in this production experiment. All participants reported growing up in bilingual households, where both Tagalog and English were spoken, and none were native speakers of any other language. The participants were aged between 18 and 22 years (M = 19.7, SD = 1.2) and were undergraduate students at a public research university in California at the time of testing. They were recruited through the Filipino Student Association on campus. All participants reported normal speech and hearing, and normal or corrected-to-normal vision. In exchange for their participation, each received a stipend. For further details on the participants’ age, gender, place of birth, age of arrival to the US (if applicable), place raised, and immigration generation, please refer to Appendix A.
The Tagalog–English bilingual participants were divided into two groups based on their place of birth: Philippines-born and US-born. The Philippines-born group (n = 6) consisted of native Tagalog speakers who had immigrated to the US with their families and included Generation 1 (G1) participants, who moved to the US at or after the age of 12 years, and Generation 1.5 (G1.5) participants, who arrived between the ages of 6 and 11 years (Silva-Corvalán, 1994). Although these participants had been raised and educated primarily in English in the US, many, particularly the G1 group, had received significant education in Tagalog, with early schooling occurring in the Philippines. The US-born group (n = 8) consisted of second-generation (G2) heritage Tagalog speakers, born in the US to parents who were both born in the Philippines. These participants were raised speaking Tagalog to varying degrees at home but primarily used English in their everyday lives. Table 1 provides additional details on the age, age of exposure to each language, self-rated accents, and typical daily use of both Tagalog and English for each speaker group.
Table 1. Age, age of exposure, accent self-ratings, and typical daily use of each language.
Each participant completed the Bilingual Language Profile (BLP) questionnaire (Birdsong et al., 2012), which is designed to assess language dominance through self-reported data. The BLP generates a continuous dominance score and a general bilingual profile based on responses to questions across four modules: language history, language use, language proficiency, and language attitudes. The questionnaire was administered in English before the production experiment began. Based on participants’ responses, the BLP produced a global score for each language (English and Tagalog), a language-specific score for each module, and an overall global dominance score (see Appendix B). The dominance score was obtained by subtracting the Tagalog global score from the English global score, so that positive values indicate English dominance and negative values indicate Tagalog dominance. The range of possible scores spans from −218 to 218. As shown in Figure 1, language dominance scores for participants ranged from −43.9 to 175.2. Participants with negative scores were classified as Tagalog-dominant (n = 2), while those with positive scores were classified as English-dominant (n = 12). Figure 1 illustrates the distribution of language dominance scores for both Philippines-born (G1 and G1.5) and US-born (G2) participants based on their BLP values.

Figure 1. Language dominance scores as a function of group (US-born, Philippines-born) and generation of immigration (G1, G1.5, G2) according to the bilingual language profile (BLP).
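The dominance classification can be sketched as follows, assuming that positive values indicate English dominance, as the classification of participants implies, so that the score is the English global score minus the Tagalog global score. The function names are hypothetical, the per-language maximum of 218 is inferred from the stated range of −218 to 218, and the label for a score of exactly zero is an assumption, since the text does not address that case.

```python
MAX_GLOBAL_SCORE = 218.0  # inferred from the stated range of -218 to 218


def dominance_score(english_global: float, tagalog_global: float) -> float:
    """English global score minus Tagalog global score (positive = English-dominant)."""
    for score in (english_global, tagalog_global):
        if not 0.0 <= score <= MAX_GLOBAL_SCORE:
            raise ValueError("global scores are expected to lie between 0 and 218")
    return english_global - tagalog_global


def classify_dominance(score: float) -> str:
    """Map a dominance score to the labels used in the text."""
    if score > 0:
        return "English-dominant"
    if score < 0:
        return "Tagalog-dominant"
    # Assumption: the text does not say how a score of exactly 0 is treated.
    return "balanced"


# Illustrative global scores only; these are not participant data.
example = classify_dominance(dominance_score(180.0, 90.0))
```

Because the score is continuous, the binary labels discard information about how strongly a speaker leans towards either language.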
2 Materials and procedure
Voiceless stop productions were elicited through a reading-aloud task. The materials consisted of 30 experimental items in Tagalog and 30 in English, with 10 items for each voiceless stop /p/, /t/, and /k/ in both languages. These items were controlled for factors such as syllable position, vowel context, and stress. The target voiceless stop, followed by a low vowel, appeared in a stressed syllable in both English and Tagalog (e.g. Tagalog tabo ‘bucket’ and English tap). These factors were held constant across all experimental items. Each target item was embedded in a carrier phrase: I can say TARGETWORD today (English), Kaya kong sabihin ang TARGETWORD ngayon ‘I can say the TARGETWORD today’ (Tagalog). The target experimental items appeared among 200 distractors (e.g. sigaw ‘scream’, sukat ‘measure’ in Tagalog). The full list of materials is provided in Table 2.
Table 2. Experimental items.
The production task was carried out individually in a soundproof booth, with participants seated comfortably in front of a computer display. Each sentence was shown for 5 seconds on the screen, and participants were instructed in English by a Filipina native Tagalog-speaking researcher to read the sentences clearly and at a natural pace. The 60 experimental items were presented four times in a randomized order. Speech samples were recorded using a head-mounted microphone (Shure SM10A) and audio interface (MOTU Ultralite mk3), digitized at 44 kHz with 16-bit quantization, and edited on a computer for subsequent acoustic analysis. Each session produced a total of 240 target productions per participant (i.e. 10 /p/, 10 /t/, and 10 /k/ items in each of English and Tagalog, with 4 repetitions), resulting in a dataset of 3,360 VOT measurements across the 14 participants.
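The token counts reported above follow directly from the design, as the short check below shows. The variable names are illustrative, and the calculation assumes that no tokens were excluded.

```python
ITEMS_PER_STOP = 10                   # items per stop and language
STOPS = ("p", "t", "k")
LANGUAGES = ("English", "Tagalog")
REPETITIONS = 4
PARTICIPANTS = 14

# 10 items x 3 stops x 2 languages = 60 experimental items
EXPERIMENTAL_ITEMS = ITEMS_PER_STOP * len(STOPS) * len(LANGUAGES)

# 60 items x 4 repetitions = 240 target productions per participant
TOKENS_PER_PARTICIPANT = EXPERIMENTAL_ITEMS * REPETITIONS

# 240 productions x 14 participants = 3,360 VOT measurements
TOTAL_TOKENS = TOKENS_PER_PARTICIPANT * PARTICIPANTS

print(EXPERIMENTAL_ITEMS, TOKENS_PER_PARTICIPANT, TOTAL_TOKENS)
```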
3 Data analysis
a Acoustic analysis
The English and Tagalog voiceless stops /p, t, k/ were segmented using Praat (Boersma and Weenink, 2023), with synchronized waveforms and spectrographic displays. Praat scripts were employed to split each recording into individual files for each experimental item, and TextGrids were created by manually marking the onset and offset of VOT in each target segment. VOT values were measured by determining the time interval between the stop release and the onset of voicing, as identified by the periodic (repeating) cycles on the waveform. The measurement, rounded to one decimal place (in ms), was taken from the start of the burst (indicated by a sharp spike where the waveform transitions from quiescent to transient) to the beginning of the first regularly repeating voicing cycle. The onset of voicing was identified as the initial zero crossing in the waveform, as illustrated in Figure 2.

Figure 2. Voice onset time (VOT) measurement obtained from the waveform.
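Once the burst onset and the voicing onset have been marked by hand, the VOT calculation itself is a simple subtraction. The sketch below assumes that the two landmarks are available as times in seconds (for instance, exported from the annotation tier) and that values are rounded to one decimal place in milliseconds. The function name and the example times are hypothetical.

```python
def vot_ms(burst_onset_s: float, voicing_onset_s: float) -> float:
    """VOT in milliseconds: voicing onset minus burst onset.

    A positive value means voicing starts after the release (lag);
    a negative value means voicing starts before the release (lead).
    """
    return round((voicing_onset_s - burst_onset_s) * 1000.0, 1)


# Hypothetical landmark times (in seconds) for two tokens; not study data.
short_lag_token = vot_ms(burst_onset_s=0.5120, voicing_onset_s=0.5355)
long_lag_token = vot_ms(burst_onset_s=0.4980, voicing_onset_s=0.5810)
```

With these made-up landmarks the two tokens come out at about 23.5 ms and 83.0 ms, that is, one short-lag and one long-lag value.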
b Statistical analysis
The VOT values were analysed using a linear mixed-effects model in R (R Core Team, 2023) with the lme4 (Bates et al., 2015) and lmerTest (Kuznetsova et al., 2017) packages, which calculated p-values via the Satterthwaite degrees of freedom method. Post-hoc comparisons were conducted using the emmeans package (Lenth, 2020), applying the Kenward–Roger degrees of freedom approximation and Bonferroni-adjusted p-values when comparing three levels. The model included fixed effects for Speaker Group (Philippines-born, US-born), Language (English, Tagalog), Place of Articulation (/p/, /t/, and /k/), and their interactions, with a random intercept for Speaker. Marginal and conditional R²GLMM values were computed to estimate effect sizes (Johnson, 2014), with the marginal R²GLMM reflecting the variance explained by the fixed factors alone and the conditional R²GLMM reflecting the variance explained by both fixed and random factors. Figures were generated using ggplot2 (Wickham, 2016). The alpha level was set at .05.
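The distinction between the two effect-size measures can be made concrete with a minimal sketch. It assumes a Gaussian model with a single random intercept, in which the total variance is the sum of the fixed-effect, random-intercept, and residual variances. The function name is hypothetical, and the variance components in the example are invented for illustration; they are not estimates from this study.

```python
from typing import Tuple


def r2_glmm(var_fixed: float, var_random: float, var_residual: float) -> Tuple[float, float]:
    """Return (marginal, conditional) R-squared for a random-intercept model.

    marginal    = fixed-effect variance / total variance
    conditional = (fixed-effect + random-intercept variance) / total variance
    """
    total = var_fixed + var_random + var_residual
    if total <= 0:
        raise ValueError("total variance must be positive")
    marginal = var_fixed / total
    conditional = (var_fixed + var_random) / total
    return marginal, conditional


# Invented variance components, for illustration only.
marginal, conditional = r2_glmm(var_fixed=860.0, var_random=50.0, var_residual=90.0)
```

The conditional value can never be smaller than the marginal value, and the gap between the two indicates how much variance is attributable to between-speaker differences.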
V Results
Research question 1 examined whether Philippines-born and US-born Tagalog–English bilinguals produce acoustically similar voiceless stops in English and Tagalog (i.e. cross-language convergence) or if they maintain distinct acoustic realizations in each language (i.e. cross-language divergence). As shown in Figure 3, both speaker groups on average produced longer VOTs in English than in Tagalog (Philippines-born: 65 ms in English vs. 22.8 ms in Tagalog; US-born: 87.1 ms in English vs. 24.1 ms in Tagalog). The data further reveal that the US-born group’s VOT values fell within the expected range for Tagalog and within the longer, ‘target-like’ range for English. In contrast, the Philippines-born group produced substantially shorter VOTs in English, while their Tagalog VOTs were only slightly shorter than those of the US-born group.

Figure 3. Voice onset time (VOT) values as a function of group (US-born, Philippines-born) and language (Tagalog, English).
To investigate the VOT patterns of each speaker group based on the place of articulation for each segment in English and Tagalog (/p/, /t/, and /k/), the means and standard deviations (in ms) were calculated. As shown in Figure 4, the US-born group produced longer mean VOTs across all English segments (/p/: M = 80.4, SD = 25; /t/: M = 90.2, SD = 22.2; /k/: M = 90.6, SD = 23.2) compared to their Tagalog productions (/p/: M = 17.2, SD = 7.5; /t/: M = 20, SD = 9.7; /k/: M = 35, SD = 11). Similarly, the Philippines-born speakers exhibited distinct VOT ranges in their English (/p/: M = 53.4, SD = 23; /t/: M = 73.7, SD = 18.4; /k/: M = 68.2, SD = 19) and Tagalog voiceless stops (/p/: M = 15.9, SD = 6.2; /t/: M = 18.7, SD = 9.1; /k/: M = 33.7, SD = 13.1). These findings largely align with previous research (Cho and Ladefoged, 1999; Theodore et al., 2009), which indicates that voiceless velar stops generally have longer VOT durations than voiceless alveolar/dental stops and bilabial stops. The one exception is the Philippines-born group’s English productions, in which mean VOT was longer for /t/ than for /k/.

Figure 4. Tagalog and English voice onset time (VOT) values plotted separately for /p, t, k/ as a function of group (US-born and Philippines-born Tagalog–English bilinguals).
The dataset was analysed using a linear mixed-effects model to examine the effects of Speaker Group (Philippines-born, US-born), Language (English, Tagalog), and Place of Articulation (/p/, /t/, and /k/) as fixed effects, along with the interactions between these variables and a random intercept for participant. The model was specified as VOT ~ Group * Language * Segment + (1 | Participant). To prevent collinearity, the fixed effects were centered, and sum-coding was applied to facilitate the interpretation of main effects and interactions (Singmann and Kellen, 2019). The results revealed a significant effect of Language on VOT values (β = 27.22, SE = 1.06, t = 27.05, p < .001), confirming that VOTs in English are significantly longer than in Tagalog for both speaker groups. A significant effect of Place of Articulation was found (β = 6.16, SE = 1.06, t = 6.12, p < .001), as well as a significant effect of Speaker Group (β = 5.8, SE = 2.1, t = 2.75, p < .05). The interaction between Speaker Group and Language was also significant (β = 5.18, SE = 1.01, t = 5.11, p < .001), with no other interactions reaching significance. The model’s marginal and conditional R²GLMM values were 0.86 and 0.91, respectively.
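To make the sum-coded coefficients easier to interpret, the following sketch decomposes the four group-by-language means reported earlier into a grand mean, two main effects, and an interaction. It assumes ±1 coding with US-born and English as the +1 levels; the source does not state which levels were coded as positive, so this is an assumption. Because the sketch uses raw cell means and ignores Place of Articulation and the random intercept, its output should approximate, but need not equal, the fitted coefficients.

```python
# Group-by-language mean VOTs in ms, as reported in the text
means = {
    ("US", "English"): 87.1,
    ("US", "Tagalog"): 24.1,
    ("PH", "English"): 65.0,
    ("PH", "Tagalog"): 22.8,
}

# Assumed sum coding: +1 / -1 for each two-level factor
group_code = {"US": 1, "PH": -1}
lang_code = {"English": 1, "Tagalog": -1}


def sum_coded_effects(cell_means):
    """Decompose a 2x2 table of cell means under +1/-1 coding."""
    n = len(cell_means)
    grand = sum(cell_means.values()) / n
    group = sum(group_code[g] * m for (g, _), m in cell_means.items()) / n
    language = sum(lang_code[l] * m for (_, l), m in cell_means.items()) / n
    interaction = sum(
        group_code[g] * lang_code[l] * m for (g, l), m in cell_means.items()
    ) / n
    return grand, group, language, interaction


grand, group, language, interaction = sum_coded_effects(means)
print(f"grand mean = {grand:.2f} ms")
print(f"group = {group:.2f}, language = {language:.2f}, interaction = {interaction:.2f}")
```

Under these assumptions the group and interaction contrasts come out at about 5.85 and 5.2, close to the reported coefficients of 5.8 and 5.18, while the language contrast (about 26.3) is near, but not identical to, the reported 27.22.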
To further investigate the Speaker Group by Language interaction, post-hoc comparisons using simple contrasts were conducted for both speaker group and language. The Bonferroni-corrected post-hoc pairwise comparisons showed a significant difference in VOT between English and Tagalog voiceless stops for both the Philippines-born bilingual group (β = −42.3, SE = 2.81, t = −15.07, p < .001) and the US-born bilingual group (β = −63, SE = 2.43, t = −25.91, p < .001). Additionally, pairwise comparisons revealed significant differences in the acoustic realization of English VOT between the two groups, confirming that the Philippines-born group produces shorter (more Tagalog-like) English voiceless stops compared to the US-born group (β = −22, SE = 4.62, t = −4.76, p < .001), as shown in Figure 3. However, no significant difference was found between the speaker groups in the VOT values of their Tagalog voiceless stops (β = −1.2, SE = 4.62, t = −0.28, n.s.).
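As a quick consistency check on these contrasts, each t value should be approximately the estimate divided by its standard error. The sketch below recomputes the ratios from the rounded values given in the text, so small discrepancies are expected. The Bonferroni helper shows the generic adjustment (multiply by the number of comparisons, cap at 1) and is illustrative only; the function names and contrast labels are hypothetical.

```python
# (label, estimate in ms, standard error, reported t), copied from the text
contrasts = [
    ("Philippines-born, language contrast", -42.3, 2.81, -15.07),
    ("US-born, language contrast", -63.0, 2.43, -25.91),
    ("English, group contrast", -22.0, 4.62, -4.76),
    ("Tagalog, group contrast", -1.2, 4.62, -0.28),
]


def t_ratio(estimate, se):
    """t statistic as the estimate divided by its standard error."""
    return estimate / se


def bonferroni(p, n_comparisons):
    """Generic Bonferroni adjustment of a single p-value."""
    return min(1.0, p * n_comparisons)


for label, estimate, se, reported_t in contrasts:
    print(f"{label}: recomputed t = {t_ratio(estimate, se):.2f}, reported t = {reported_t}")
```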
Because the presentation of group averages may obscure distinct patterns of between-speaker variation, the next step was to analyse the acoustic realization of English and Tagalog VOTs for each participant. To investigate whether individual bilingual language dominance profiles influence the VOTs in both languages, the average VOT for /p/, /t/, and /k/ in English and Tagalog was calculated for each participant, yielding two values per participant: one in English and one in Tagalog. Pearson correlation tests were run to measure the linear correlation between the VOT means across all segments and the BLP scores, separately for English and Tagalog. The correlation on the English data revealed a strong positive relationship between English VOT and language dominance, as indicated by the BLP (r = 0.75, t(12) = 3.87, p < .01). While a moderately positive correlation was also observed between the BLP values and Tagalog VOT, it was not statistically significant (r = 0.48, t(12) = 1.87, p = .086). As shown in Figure 5, VOT values in both languages were generally higher for English-dominant bilinguals (those with larger positive BLP scores) than for those with more balanced or Tagalog-dominant profiles (smaller positive or negative BLP scores). This pattern, however, was not statistically significant in the Tagalog VOT data.

Figure 5. Individual mean voice onset time (VOT) values for all stops in English (left) and Tagalog (right) plotted as a function of a speaker’s bilingual language profile (BLP) score.
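The t statistics reported for the two correlations follow from r and the sample size through t = r·sqrt(n − 2)/sqrt(1 − r²), with n − 2 degrees of freedom. A minimal sketch, assuming n = 14 participants (consistent with the reported t(12)) and using the rounded r values from the text, so the results match the reported t values only approximately:

```python
import math


def pearson_t(r, n):
    """t statistic for testing a Pearson correlation against zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


N_PARTICIPANTS = 14  # assumption, consistent with the reported t(12)

for label, r in [("English", 0.75), ("Tagalog", 0.48)]:
    t = pearson_t(r, N_PARTICIPANTS)
    print(f"{label}: r = {r:.2f}, t({N_PARTICIPANTS - 2}) = {t:.2f}")
```

With the rounded coefficients this gives roughly 3.93 and 1.90, in line with the reported 3.87 and 1.87 once rounding of r is taken into account.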
VI Discussion
1 Summary of results
This study examined the acoustic realization of English and Tagalog voiceless stops /p/, /t/, and /k/ by measuring VOT in the speech of G1, G1.5, and G2 Tagalog–English bilinguals, categorized into two groups based on their place of birth: Philippines-born and US-born. While both groups are early bilinguals, exposed to both languages from birth or an early age and raised speaking Tagalog at home, the Philippines-born group consists of foreign-born G1 and G1.5 individuals, who are either Tagalog-dominant or moderately English-dominant. The US-born group includes G2 heritage Tagalog speakers who were raised speaking Tagalog at home but reported using English predominantly in daily life, leading to a more English-dominant bilingual profile. This production experiment not only explores the impact of intergenerational language shift from G1 and G1.5 to G2 on the acoustic realization of voiceless stops in both languages, but also examines the effects of place of articulation and individual language dominance profiles on the production of /p/, /t/, and /k/ among these bilinguals, all raised and educated within the Filipino diaspora in California.
The results of the production task show that both US-born and Philippines-born Tagalog–English bilinguals have effectively acquired the timing properties of voiceless stops in each of their languages. With respect to their VOT values, these bilinguals maintain language-specific phonetic categories in the production of /p/, /t/, and /k/ in both Tagalog and English, as illustrated in Figure 3. In other words, these early Tagalog–English bilinguals produce voiceless stops with distinct VOT values for each language: a short-lag VOT in Tagalog and a long-lag VOT in English (Kang et al., 2016; Umbal, 2023). Additionally, the acoustic data for both languages reveal that VOT consistently varies according to the place of articulation, which aligns with previous findings on VOT in other languages (Cho and Ladefoged, 1999; Theodore et al., 2009). The ability of both US-born and Philippines-born speakers to establish phonetic categories for voiceless stops in both their dominant and non-dominant languages suggests new category formation (Flege, 1995, 2007).
However, it is important to note that the results also reveal group differences based on language dominance, which can be interpreted as evidence of ‘compromise’ values, indicating cross-linguistic interactions at the phonetic level. The data show that voiceless stops, particularly in English, have longer VOTs when individuals are more dominant in English (i.e. the US-born group). This longer (more ‘normative’) VOT aligns with the predicted intergenerational language shift from Tagalog to English, with values approximating what we expect among English monolinguals. In contrast, only a modest increase in VOT was observed in the Tagalog voiceless stops between the Philippines-born and US-born groups. In short, the analysis of English VOT supports predictions based on the well-documented process of language shift from a minority language to English across generations of immigrant families in California and the US. However, this pattern does not hold for Tagalog, and therefore, does not reflect the cross-generational drift observed in heritage Tagalog speakers in Toronto (Umbal, 2023).
Finally, the analysis of individual data revealed that the degree of dominance in English influenced VOT values in English, with the US-born, English-dominant G2 speaker group showing generally higher VOTs than the Philippines-born G1 and G1.5 groups, who were either more dominant in Tagalog or slightly dominant in English. Specifically, English VOT varied according to language dominance and was significantly correlated with the degree of English dominance. In contrast, for Tagalog, participants were found to produce voiceless stops within the target range, and there was no significant correlation between VOT and language dominance, as measured by the BLP. In line with the results in Kang et al. (2016), Tagalog voiceless stops appear to remain stable within the sound systems of heritage Tagalog speakers.
2 Cross-linguistic influence in heritage language speech
It is commonly assumed that heritage speakers have an advantage in acquiring their heritage language’s sound system due to early exposure. However, as Polinsky (2018) notes, ‘phonetics and phonology remain among the least understood properties of heritage languages’ (p. 162). Considering this, researchers have called for instrumental studies to test the assumption that heritage speakers maintain ‘good phonology’ in their minority language, to better understand the so-called ‘heritage accent’ (Polinsky and Kagan, 2007). This study contributes to this effort by examining the speech production of early bilinguals from different countries of birth. It adds to the broader discussion on intergenerational differences in pronunciation and cross-generational shifts toward the majority language in diasporic communities (Amengual, 2018; Nagy and Kochetov, 2013; Mayr and Siddika, 2018; Mayr et al., 2021; Umbal, 2023).
Previous research has shown that early bilinguals exposed to both their minority (heritage) language and the majority language early in life exhibit persistent effects from their early sound exposure, which continue into adulthood (Amengual, 2019; Sebastián-Gallés et al., 2005). This study explores phonetic variation in the production patterns of Tagalog heritage speakers in California, focusing on how bilingual speech is influenced by the age of onset and the amount of exposure to both languages during early development. For the Tagalog–English bilinguals in this study, both US-born and Philippines-born speakers displayed distinct VOT patterns for /p/, /t/, and /k/ in each of their languages. While these bilinguals maintain language-specific voiceless stops in Tagalog and English, the question remains: does their language experience influence their production patterns? Is language dominance a key factor in understanding the acoustic realization of voiceless stops in bilingual speech?
Heritage speakers, like other bilinguals, tend to have one dominant or stronger language (Cutler et al., 1989; Flege et al., 2002). In immigrant communities in the US, language shift is a well-documented process, with heritage speakers typically becoming more dominant in the majority language across generations (G1 > G1.5 > G2 > G3). Within this Tagalog-speaking immigrant community, bilinguals, especially those who are foreign-born (G1, G1.5), often maintain a high frequency of heritage language use, leading to a different dominance profile compared to those who shift more towards English over time (G2, G3). The effects of language dominance are evident in the analysis of individual data, which shows that dominance operates along a continuum, capturing variations towards more English-like or Tagalog-like VOT values in both the US-born, English-dominant G2 group and the Philippines-born G1 and G1.5 groups.
3 Future directions
This phonetic production experiment examined the phonetic behavior of heritage speakers by considering factors such as language dominance and place of birth. The acoustic feature analysed is VOT, a reliable measure of consonantal voicing distinctions across many languages (Abramson and Whalen, 2017) and one that is particularly sensitive to change in language contact situations (Chang, 2012; Flege and Eefting, 1987). The ease with which VOT can be obtained, measured accurately, and replicated likely explains why voiceless stops are one of the most studied phonetic variables in bilingual speech research. While this study focuses on VOT, future research on bilingual cross-linguistic influence could complement VOT analysis with measurements of other language-specific acoustic properties related to laryngeal timing and stop articulation in voiceless stops. These could include spectral characteristics of stop bursts and aspiration intensity (Repp, 1979; Sundara, 2005), F0 onset frequency and movement patterns (Dmitrieva et al., 2015; Hombert et al., 1979), F1 onset frequency and movement patterns (Hillenbrand, 1984), or spectral tilt and H1–H2 ratios (Kong et al., 2012).
Recent research has increasingly examined short-term, dynamic phonetic interactions, particularly through bilingual studies on language mode induced in laboratory settings (Amengual, 2018, 2021; Simonet, 2014; Simonet and Amengual, 2020). In these experiments, language mode is manipulated by having participants complete separate monolingual and bilingual sessions, spaced at least 72 hours apart. In monolingual sessions, participants read words in only one target language; in bilingual sessions, carrier phrases from both languages are presented in random order. Although monolingual sessions may not fully suppress the non-target language for bilingual speakers, thus still engaging them in a partial bilingual mode, it is assumed that bilingual sessions induce a higher degree of bilingual activation. When both languages are active, competition between their phonetic representations can occur, increasing interference during speech processing (Grosjean, 2001). Future research can expand on the results of the present study by exploring the potential ‘cost’ of this dual activation, investigating how language mode manipulations may measurably affect the phonetics of both languages in heritage speakers.
It is important to note that participant gender is not balanced across generation or country of birth in this study. Research on VOT in American English has produced mixed results regarding gender effects: some studies report that females produce longer VOTs than males for voiceless stops in both American and British English (Koenig, 2000; Robb et al., 2005; Whiteside and Irving, 1998; Whiteside and Marshall, 2001; Whiteside et al., 2004), while others find no systematic gender differences (Morris et al., 2008; Smith, 1978). Cross-linguistic research in languages such as Korean and Mandarin further suggests that VOT variation cannot be explained solely by physiological factors (Li, 2013; Oh, 2011). In the present study, it is unlikely that gender confounds account for the observed patterns, as the Philippines-born group includes more females and the US-born group more males, a distribution that would, if anything, reduce group differences. Nonetheless, future research should consider gender in combination with place of birth, generational status, and language dominance to more fully understand potential gender effects in bilingual speech.
Given the heterogeneity of heritage speakers and the variability observed in acoustic data, it is essential to incorporate larger sample sizes to enable a more fine-grained analysis of generational differences in the speech of Tagalog–English bilinguals. Beyond sample size, research can benefit from focusing on the linguistic and social factors that contribute to this variation, thereby refining the definition of the ‘heritage’ speaker group. One important factor in heritage language acquisition is the potential influence of ethnic identity on speech production. For Tagalog heritage speakers, Umbal (2023: 58) suggests that variability in speech patterns may reflect alignment, or lack thereof, with Filipino identity, predicting that speakers with stronger ties to Canada may produce more English-like patterns (e.g. longer VOTs), whereas those more strongly aligned with Filipino identity may retain homeland-like patterns (e.g. shorter VOTs). Although the present study did not measure ethnic orientation directly (Hoffman and Walker, 2010; Noels, 2014), prior research indicates that ethnic identity can shape linguistic variables in heritage languages, albeit with mixed effects: some studies report measurable influences (Nagy et al., 2014; Umbal, 2023), while others find weak or non-significant effects (Nagy and Kochetov, 2013; Nagy et al., 2011). These findings highlight the complex interplay between social identity and phonetic variation in heritage bilinguals and underscore the importance of considering social factors alongside linguistic variables in future studies.
In addition to broadening the scope of heritage language pairings and generational groupings, expanding research on segmental and suprasegmental features, and examining the relationship between perception and production in heritage language sound systems, greater attention should be directed toward the role of ethnic identity in shaping phonetic variation among heritage speakers. Specifically, it remains unclear which dimensions of ethnic orientation are most strongly associated with factors such as place of birth, generation of immigration, language dominance, patterns of use and exposure, and other biographical or non-linguistic variables that influence the linguistic behavior of heritage speakers. Addressing these questions is critical for developing a more nuanced understanding of the social and cognitive mechanisms underlying variation in heritage language phonetics.
VII Conclusions
This study investigated the acoustic realization of English and Tagalog voiceless stops produced by fourteen Tagalog–English bilinguals in California, categorized into a Philippines-born group and a US-born group. Analyses of a reading-aloud task in both languages revealed three key findings. First, both US-born and Philippines-born bilinguals maintained distinct VOT patterns for /p/, /t/, and /k/ in each language. Second, evidence of cross-linguistic phonetic interaction was observed: English voiceless stops exhibited longer VOTs, particularly among the more English-dominant speakers (i.e. the US-born group), whereas Tagalog voiceless stops showed no comparable intergenerational shift. Third, individual-level analyses indicated that English VOTs were higher for the most English-dominant bilinguals relative to those with more balanced or Tagalog-dominant language profiles, a trend not mirrored in Tagalog VOTs for either group. Collectively, these results provide new insight into cross-linguistic phonological influence within a diverse and underexplored diasporic community, highlighting the differential impact of language dominance and birth country on bilingual phonetic production.
Appendix
Bilingual language profile (BLP) scores for each module per participant.
| Participant | Language history (ENG) | Language history (TAG) | Language use (ENG) | Language use (TAG) | Language proficiency (ENG) | Language proficiency (TAG) | Language attitudes (ENG) | Language attitudes (TAG) | Global score (ENG) | Global score (TAG) | Dominance score |
|---|---|---|---|---|---|---|---|---|---|---|---|
| P05 | 46 | 92 | 29 | 21 | 20 | 24 | 14 | 24 | 129.67 | 173.61 | −43.94 |
| P06 | 54 | 86 | 32 | 12 | 10 | 18 | 14 | 18 | 113.87 | 133.84 | −19.96 |
| P10 | 86 | 73 | 37 | 13 | 24 | 24 | 17 | 21 | 172.44 | 149.46 | 22.98 |
| P01 | 116 | 81 | 44 | 6 | 23 | 18 | 23 | 17 | 205.04 | 122.76 | 82.28 |
| P07 | 103 | 63 | 39 | 11 | 24 | 12 | 23 | 16 | 195.96 | 104.15 | 91.81 |
| P11 | 86 | 67 | 43 | 7 | 24 | 13 | 18 | 16 | 181.25 | 103.87 | 77.37 |
| P02 | 116 | 20 | 50 | 0 | 24 | 1 | 24 | 13 | 216.12 | 40.86 | 175.26 |
| P03 | 117 | 60 | 46 | 4 | 23 | 14 | 21 | 19 | 203.13 | 106.51 | 96.62 |
| P04 | 112 | 42 | 49 | 1 | 24 | 2 | 22 | 6 | 208.67 | 38.31 | 170.36 |
| P08 | 116 | 26 | 44 | 6 | 24 | 9 | 24 | 16 | 209.58 | 75.09 | 134.49 |
| P09 | 116 | 42 | 43 | 7 | 24 | 8 | 23 | 17 | 206.22 | 83.44 | 122.77 |
| P12 | 92 | 43 | 39 | 9 | 24 | 7 | 24 | 21 | 193.23 | 92.89 | 100.34 |
| P13 | 96 | 51 | 44 | 6 | 24 | 13 | 23 | 19 | 198.23 | 102.33 | 95.9 |
| P14 | 120 | 57 | 41 | 9 | 24 | 12 | 24 | 22 | 208.13 | 112.86 | 95.26 |
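The global scores in the table appear to follow the standard BLP scoring scheme, in which each module total is multiplied by a weighting factor so that the four modules contribute equally, and the dominance score is the English global score minus the Tagalog global score. The sketch below assumes the conventional BLP weights (0.454 for history, 1.09 for use, and 2.27 for proficiency and attitudes). These weights are not stated in this article and are an assumption, checked here against the row for participant P05; the function names are hypothetical.

```python
# Assumed BLP module weights (not stated in the article)
WEIGHTS = {"history": 0.454, "use": 1.09, "proficiency": 2.27, "attitudes": 2.27}


def global_score(modules):
    """Weighted sum of the four module scores for one language."""
    return sum(WEIGHTS[name] * value for name, value in modules.items())


def dominance(english, tagalog):
    """Dominance score: English global score minus Tagalog global score."""
    return global_score(english) - global_score(tagalog)


# Module scores for participant P05, copied from the table
p05_eng = {"history": 46, "use": 29, "proficiency": 20, "attitudes": 14}
p05_tag = {"history": 92, "use": 21, "proficiency": 24, "attitudes": 24}

print(f"English global score: {global_score(p05_eng):.2f}")
print(f"Tagalog global score: {global_score(p05_tag):.2f}")
print(f"Dominance score: {dominance(p05_eng, p05_tag):.2f}")
```

A negative dominance score, as for P05, indicates a Tagalog-dominant profile, while larger positive values indicate stronger English dominance.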
Acknowledgements
We would like to thank our participants for their contribution to this study. We would also like to thank the anonymous reviewers and Associate Editor Jeff Holliday for the very helpful comments and suggestions we received during the peer review process. Finally, we wish to express our appreciation to the audience of the 11th International Symposium on the Acquisition of Second Language Speech (New Sounds 2025) at the University of Toronto, for the feedback on our project.
Author contributions statement
Conception and design: MA and MuA; collection of data: MA and MuA; data analysis and interpretation: MA and MuA; drafting of the paper: MA; revisions: MA and MuA; final approval of the manuscript: MA and MuA; agreement to be held accountable for all aspects of the work: MA and MuA.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Ethics approval and informed consent statements
The University of California, Santa Cruz Institutional Review Board at the Office of Research Compliance Administration approved this research (HS-FY2023-122). Prior to data collection, adult participants signed consent forms. This study was conducted according to the guidelines of the Declaration of Helsinki and approved by UCSC IRB on 16 December 2022.
