Abstract
This study investigates socioprosodic variation in Yami, a moribund indigenous language under intense contact with Mandarin in Taiwan: 32 bilingual (Yami-dominant, balanced, Mandarin-dominant) and 5 Yami-monolingual participants were recruited. We used an Interactive Card Game to elicit semi-spontaneous speech for neutral questions (NQ), default declarative questions (DQ1), and declarative questions with lighter incredulity (DQ2). Results reveal that (1) yes/no question intonation in Yami is highly variable; (2) on a broad community level, the DQ1–DQ2 distinction is absent from Yami; and (3) there is prosodic hybridization and innovation in bilingual speech. In particular, we see significant differences in DQ1 and NQ productions, with DQ1s having a rising nuclear configuration, higher pitch level, and wider pitch span, while NQs are realized with a mid-level pattern, lower pitch level, and narrower pitch span. DQ2 utterances exhibited highly varied nuclear configuration patterns with no significant differences in either pitch level or pitch span in DQ1–DQ2 comparisons. Yet, there is evidence that a hybridized DQ2 has begun to be integrated into Yami among younger bilinguals, suggesting that present-day Yami is in flux and is undergoing restructuring.
These intonational variations are not easily attributable to a weakened Yami identity. Rather, younger bilinguals, who are leading the change, are highly dedicated to cultural practices and show strong rootedness in their Indigenous identity. Seemingly, while these less fluent speakers no longer use Yami to fulfill their everyday communicative needs, they are leaning more on its socio-indexical functions to reflect their ethnocultural identity.
1 Introduction
Migration is an integral part of human societies and has impacted language communities in many ways. One consequence is that speakers of distinct languages may incorporate external features into their native language (L1), causing restructuring in the recipient language (Mufwene, 2001, pp. 16–22; Thomason, 2001, pp. 85–88). In cases of intense contact, this incorporation is argued to follow a borrowing hierarchy starting with non-basic words, accompanied by syntactic or phonological features (Thomason, 2001, pp. 69–71; Thomason & Kaufman, 1988, pp. 74–76). With increased intensity and influence, fine-grained inflectional morphology or lower-level prosody 1 may also be added into the recipient language. This highly cited model, however, makes no direct predictions about higher levels of prosody, which is in fact permeable under contact (section 1.2).
When examining contact-based changes in bilingual contexts, Thomason (2001, pp. 12–14) argued that social factors often outweigh linguistic factors in predicting linguistic outcomes. This echoes the language ecology framework, which sees the complex sociopolitical setting, along with speakers’ agency and identity, as influential factors accounting for contact-based changes (Haugen, 1972; Lim & Ansaldo, 2015, pp. 118–127; Mufwene, 2001, p. xii; Ravindranath, 2015; Rodríguez-Ordóñez, 2019).
Yami, an Austronesian stress language spoken by the indigenous people of Orchid Island in Taiwan, is under heavy contact with Mandarin, a lexical tone language. It is a prime example of how sound change occurs in heavy contact contexts and how linguistic innovation prompts rootedness in speakers’ indigenous identity. For instance, recent studies on phonological variation in Yami have evidenced contact-induced segmental (Lai & Gooden, 2014, in press; Lai & Hsu, 2013) and intonational variation (Lai, 2018a; Lai & Gooden, 2018a, 2019) in Yami speech. The situation has been worsened by a recent tourism boom that has boosted Mandarin’s linguistic capital (Bourdieu, 1991) at the expense of Yami, especially among younger people (Lai & Gooden, 2018b; Rickford & Rickford, 2010). It is against this backdrop that we see Yami speakers’ language use and identities being (re)negotiated. We argue that phonological changes in Yami could serve to express speakers’ ethnocultural identity rather than reflecting a wholesale adoption of mainstream Taiwanese language and culture (Lai & Gooden, 2019).
This study further tests contact-influenced intonational variation in Yami (Lai, 2018a; Lai & Gooden, 2018a, 2019) with three specific goals. First, we conduct auditory and acoustic analyses of yes/no questions elicited in three pragmatic conditions to evaluate the ways in which speakers’ language background shapes their linguistic output. Second, we investigate whether Yami exhibits (or has developed) discourse-pragmatic distinctions in yes/no question prosody analogous to those seen in Mandarin. Third, we evaluate the nature of Mandarin influence on Yami and discuss the intersections of observed socioprosodic variation and expressions of speakers’ ethnocultural identity.
Broadly speaking, this study augments our understanding of the impact of language contact on prosodic variation in Yami as well as in other Indigenous and under-represented languages, an area of research in which knowledge is still limited.
The rest of the paper is organized as follows: we review frameworks concerning linguistic variation in bilingual communities (section 1.1) and present different types of contact-induced prosodic variation (section 1.2) to help inform the assessment of intonational variation in Yami. Section 2 compares Mandarin and Yami intonations to highlight potential areas of Mandarin influence on Yami, followed by a social portrait of Orchid Island (section 3). This helps contextualize the language ecology of the Yami community. Section 4 describes the methods and section 5 presents results. Section 6 synthesizes the findings and offers a general discussion of intonational variation in Yami. Section 7 is the conclusion with suggestions for future research.
1.1 Variation in ethnic/minority language in bilingual communities
Early contact literature recognized bilingualism as “the true locus of language contact” (Weinreich, 1968 [1953], p. 71) and held that linguistic features are borrowable between languages (see also Thomason, 2001; Thomason & Kaufman, 1988), causing bilingual speech to differ in several ways from monolingual speech. Prior research suggests alternative models for language variation in bilingual speech—one concerned with contact-induced change (Maher, 1991) and the other with language obsolescence effects (Sasse, 1992a). While these frameworks drew attention to morphosyntactic reduction and simplification in bilinguals’ ethnic languages, we believe these concepts can illuminate the inquiry into prosodic variation as well.
As laid out in Table 1, the two models differ in their treatment of social contexts, locus of change, and mechanisms of variation, thus predicting different linguistic outcomes. According to the contact-induced change model (Maher, 1991), in enclave speech communities with fewer speakers of a minority group than of a dominant language group, speakers acquire their ethnic language (L1) at first via intergenerational language transmission and successively accumulate their L1 language knowledge. Later on, they develop competence in the socially dominant language (L2) to become bilinguals. Coexistence of two language systems may trigger a restructuring process, via transfer or borrowing, 2 which ultimately reduces morphosyntactic complexity in their L1 grammar to promote communicative efficiency. The reduced morphosyntactic structure still exhibits a certain degree of stability such that speakers’ L1 is fully functional, and they are still considered “good speakers” by older monolinguals.
Table 1. Key Differences Between Contact-Induced Change and Language Decay Models.
Under the language decay model (Sasse, 1992a), there is irreversible language loss and reduction among two types of “imperfect speakers.” 3 The first type of speaker develops comparably good proficiency from former fluent speakers, but never reaches full competence due to the lack of regular communication in a later stage of language acquisition. The second type, semi-speakers, produces “pathological” speech forms as they learned their ethnic language by listening to and occasionally talking with elder fluent speakers. Their linguistic competence thus remains confined to a closed list of short sentences and expressions—an indication of irreversible language loss (Sasse, 1992b). Interestingly, Sasse (1992a) noted that semi-speakers may at times hide their reduced language competence and avoid speaking the obsolescent language, while at other times they use it as a phatic symbol of ethnic identification.
Recent scholarship on heritage languages (Polinsky & Scontras, 2020) is also instructive for understanding the patterns of prosodic variation among Yami speakers. This research incorporates views on contact-induced change, language attrition, and divergent attainment 4 to account for differences between bilinguals’ heritage language and the monolingual baseline. Polinsky and Scontras (2020) cite processing pressures and high cognitive demands of maintaining distinct grammars as an impetus for change among bilinguals. Consequently, heritage speakers develop strategies like (1) avoidance of ambiguity, (2) resistance to irregularity, and (3) shrinking of structure for infrequent grammatical properties in their speech. Crucially, empirical observations suggest that some aspects of heritage languages remain resilient, while others are susceptible to change. In particular, segmental systems appear to be robust in heritage languages, whereas prosodic features appear to be among the more vulnerable domains. In a commentary on Polinsky and Scontras’s (2020) article, Flores and Rinke (2020) include a fourth influential factor, language-internal variation among fluent speakers who are mainly exposed to colloquial (non-standard) input.
We do not assume a priori that intonational variation in Yami is the result of contact-induced change, but there are two compelling reasons to consider it so. First, most of our bilingual participants reported Yami as their L1 and primary language by preschool age (section 4.1). Second, there are instances where fluent bilinguals produced novel intonational patterns that are not Yami-like. Such variation is hard to explain solely through a language decay model, given speakers’ grammatical proficiency. Following Lai (2018a), the investigation of intonational variation is primarily situated within the contact-induced change framework (section 1.2), while being open to other theoretical explanations to best account for the varied speech patterns observable in Yami.
1.2 Contact-induced prosodic variation
Recent studies have shown that bilingual speakers exhibit complex prosodic patterns that are intertwined with the sociolinguistic profile of their community. In particular, researchers report several outcomes that have been categorized as cases of borrowing, bidirectional influence, fusion, and hybridization.
Broadly speaking, borrowing is the addition or integration of an external feature into one’s native language and is among the most obvious phenomena in the speech of bilinguals. Examples include convergence toward Italian-like early peak alignment of prenuclear pitch accents and steep final lowering in neutral declaratives in Buenos Aires Spanish (Colantoni & Gurlekian, 2004); the replacement of contrastive lexical accentual rises and falls in Glasgow Gaelic with rising intonation patterns typical of Glasgow English (Nance, 2015); accommodation of Catalan-like falling absolute interrogative intonation in the speech of Peninsular Spanish monolinguals (Romera & Elordieta, 2013); as well as adoption of Catalan-like falling pitch accent in neutral declaratives and falling terminal tune in absolute interrogatives in younger Spanish–Catalan bilinguals’ Spanish production (Simonet, 2008, 2010a).
Potential mechanisms for prosodic change include direct and/or indirect borrowing, 5 both exemplified by Italian influence on Buenos Aires Spanish (Colantoni & Gurlekian, 2004). Direct borrowing is attested in Spanish monolinguals’ speech as they adopted Italian features into their speech due to a high concentration of Italian immigrants, and thus frequent exposure to Italian, in the city. Indirect borrowing is driven by some intermediate varieties in the contact setting. Specifically, in Buenos Aires, Italian prosodic patterns have spread to Lunfardo, a nonstandard variety of Spanish that is gaining prestige from the media. The rise of Lunfardo then accelerated the spread of Italian intonation to the speech of Spanish monolinguals. Similarly, Romera and Elordieta (2013) noted that when Peninsular Spanish monolinguals move to Majorca, Catalan–Spanish bilinguals use their L2, an intermediary variety of Spanish with strong Catalan prosodic features, for interaction. Catalan-like components then indirectly permeate the speech of Peninsular Spanish monolinguals.
Bi-directional influence is observed in cases where L1 and L2 mutually influence each other to create intermediate (phonetic) forms. For example, Dutch–Greek bilinguals (Mennen, 2004) exhibited Dutch-like early alignment in their Greek data (L1–L2 transfer) and had a smaller timing difference in peak alignment in their Dutch data, arguably due to Greek influence (L2–L1 influence). This is unlike the speech of Dutch monolinguals whose peak alignment is influenced by vowel length (early alignment for long vowels and late alignment for short vowels). Mennen suggested that such bi-directional influence may over time lead to intermediate forms that are neither like L1–Dutch nor L1–Greek patterns.
In a fusion scenario, L1 and L2 mutually influence each other to create a system with melodies from both languages, as in Turkish–German bilingual intonation (Queen, 2001). The two languages differ in phrase-final rises in interrogatives: Turkish with a sharp rise (L% H%), 6 and German, a dipped rise (L*+ H H%). Two patterns were reported in bilingual speech: in the minor pattern, bilinguals maintained separate intonation patterns in their Turkish and German production, respectively. The majority pattern showed co-occurrence of Turkish-like and German-like rises in both languages. This mixed pattern did not occur in the speech of German- or Turkish-monolinguals and is not explainable through borrowing or transfer. Queen coined the term “fusion” to capture this distinct prosodic pattern. Similarly, O’Rourke (2005) found that Cuzco Quechua-Spanish bilinguals retained early peak alignment in their L1 Quechua tokens, but displayed heterogeneous patterns in L2 Spanish production, including Spanish-like late alignment, Quechua-like early alignment, alongside intermediate alignment patterns.
In addition, a hybrid system arguably due to contact between stress languages and West African tone languages is seen in some Caribbean Creole languages like Papiamentu, Palenquero, and Saramaccan (Gooden et al., 2009), as well as in Pichi, the Equatorial Guinea English Creole (Steien & Yakpo, 2020; Yakpo, 2009). 7 While the outcomes are somewhat clear, the precise mechanisms by which these systems have been created are far less obvious in many cases and are not uniform across the Creoles (Clements & Gooden, 2009). For example, in Papiamentu (a Spanish and Portuguese-lexicon Creole), words have both tone and stress (Remijsen et al., 2014; Remijsen & Van Heuven, 2005; Rivera-Castillo, 2009). Lexical tone contrasts words with Tone I (HL) and Tone II (LH), and stress may fall on the penult or the final syllable. This yields three combined tone + stress patterns: (1) Tone I + penult stress, (2) Tone II + penult stress, and (3) Tone II + final stress. In Palenquero, a Spanish-lexicon Creole, there is a mixture of Bantu lexical H/L tone contrast and Spanish word-level stress contours (Hualde & Schwegler, 2008). Speakers invariably associate stress with a lexical H tone, leaving L tones unstressed. Hualde and Schwegler attributed this unique word-level contour to Palenquero’s Bantu substrate languages. Saramaccan, an English and Portuguese-lexicon Creole (Good, 2004a, 2004b, 2009), presents a rather different type of hybrid system, with two clearly differentiated lexical strata. One stratum has words of West African origin with lexically specified H/L tones and the other includes words of Portuguese and English origin, with context-dependent tone assignment and a culminative high tone (“pitch accent”) on the syllable with primary stress in the cognate word in these lexifier languages.
As with these cases, Yami (stress language) and Mandarin (lexical tone language) contact offers another opportunity to evaluate cross-linguistic influence on bilinguals’ prosodic systems. To identify potential areas of prosodic variation, we first compare Mandarin and Yami intonations below.
2 Intonation in Mandarin and Yami
2.1 Mandarin question intonation
Mandarin speakers use the subject–verb–object (SVO) order and a falling intonation in declarative sentences (Chuang et al., 2007; Chuang & Fon, 2016; Lai, 2018a, 2019; Lai & Gooden, 2018c; Shen, 1990). Yes/no questions and declaratives have identical syntactic structures, and the most widely used question types are the following:
Particle question: formed by attaching the sentence-final question particle -ma/-ne to solicit information from the interlocutor (Lai, 2019; Li & Thompson, 1982; Liing, 2014). Because they carry no presupposition, particle questions are also called neutral questions (NQ).
Confirmation-seeking question: formed by adding the sentence-final question particle -ba to confirm the asker’s initial guess.
Declarative question: formed with a declarative syntactic frame and a high-rising intonation to express the speaker’s incredulity/surprise (hereafter DQ1). Speakers may add the final particle -ma to a DQ1 to express a lighter degree of incredulity (hereafter DQ2).
These question types are also differentiated prosodically. Shen (1990) found that yes/no questions are realized with a higher tune than corresponding declaratives. Chuang et al. (2007), Chuang and Fon (2016), Lai (2018a), and Lee (2005) added that declarative questions (DQ1 and DQ2) are overall higher in pitch and have a wider (final) pitch range than NQs. Within declarative questions, DQ1 is realized with a high-rising tune and DQ2 with a high-level pitch contour (Chuang et al., 2007; Chuang & Fon, 2016). Lai’s (2019) study on Taiwanese Mandarin showed that, though syntactically similar to NQs, confirmation-seeking questions are higher in pitch and have a steeper F0 slope. Examples of different utterance types, along with their intonation patterns, are illustrated in Table 2.
Table 2. Mandarin and Yami Intonations.
2.2 Yami prosody and intonation
Morphosyntactically, Yami is an agglutinative language with rich affixational morphology and variable word order. Yami declaratives follow a preferred verb–subject–object (VSO) order, with the subject often being dropped (Chang, 2000, pp. 63–65; Rau & Dong, 2006, p. 91). Word order variation is affected by speaker age, such that speakers under 50 prefer an SVO order, possibly due to language contact with Mandarin (Rau & Dong, 2006, p. 97). Yami yes/no questions, as in Mandarin, have a similar syntactic structure to declaratives, with an optional final particle -ri or -ja(n). 8
Scholarly discussion on Yami prosody is comparatively new. Rau and Dong (2006, p. 82) noted that stress is phonemic in Yami, as in [mapiŋʂ
Regarding phrase-level prosody, Lai and Gooden (2015, 2016) proposed a three-layered prosodic hierarchy: word, phonological phrase, and intonational phrase (IP). Every phonological phrase contains at least one pitch accent; each IP has at least one phonological phrase and is marked by a final boundary tone at the right edge. For sentence intonation, Lai and Gooden (2015) and Lai (2018a) found that Yami declaratives and wh-questions have a falling intonation, whereas confirmation-seeking questions have a rising pattern. Lai and Gooden (2015) also showed that the F0 patterns associated with word-level stress may be overridden by sentence-level prosody in IP-final boundary position. 9
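The well-formedness conditions of this three-layered hierarchy can be sketched as a small data structure. This is only an illustration under stated assumptions: the class and method names are hypothetical, the word forms are placeholders rather than Yami data, and the pitch accent label "H*" is a generic stand-in, since no pitch accent inventory is given here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    form: str                           # placeholder form, not real Yami data
    pitch_accent: Optional[str] = None  # e.g. "H*" (illustrative label only)


@dataclass
class PhonologicalPhrase:
    words: List[Word]

    def is_well_formed(self) -> bool:
        # Every phonological phrase contains at least one pitch accent.
        return any(w.pitch_accent is not None for w in self.words)


@dataclass
class IntonationalPhrase:
    phrases: List[PhonologicalPhrase]
    boundary_tone: str  # final boundary tone at the right edge, e.g. "L%"

    def is_well_formed(self) -> bool:
        # Each IP has at least one phonological phrase, all of them
        # well formed, and is closed by a boundary tone.
        return (
            len(self.phrases) >= 1
            and all(p.is_well_formed() for p in self.phrases)
            and self.boundary_tone.endswith("%")
        )


accented = PhonologicalPhrase([Word("w1", "H*"), Word("w2")])
unaccented = PhonologicalPhrase([Word("w3")])

good_ip = IntonationalPhrase([accented], boundary_tone="L%")
bad_ip = IntonationalPhrase([accented, unaccented], boundary_tone="L%")
```

Here `good_ip` satisfies all three conditions, whereas `bad_ip` fails because its second phonological phrase carries no pitch accent.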
Global pitch trends in Yami questions (Lai, 2018a; Lai & Gooden, 2018a, 2019) indicate an association between speakers’ language background and realization of NQ boundary tones. Specifically, older monolingual speakers used an L%, while Mandarin-dominant bilinguals preferred an H%. No clear pattern was observed among balanced bilinguals, who had H%, L%, and mid-level (M%) contours. We posited that the default NQ intonation in Yami is a falling one given its prevalence in older speakers’ speech, which is often considered to represent the traditional or more conservative forms of a language (see Berge, 2010; Dorian, 1981, 1994; Grinevald, 2003). The mid-level boundary tone may be due to Mandarin influence since Mandarin speakers also use M%. For declarative questions, speakers favored a high-rising pattern for DQ1s, but did not display a clear DQ2 intonation pattern.
A comparison between Mandarin and Yami question intonation is shown in Table 2, among which NQ, DQ1, and DQ2 are the focus of this paper. Schematic representations of Mandarin intonation are plotted based on previous research (Chuang et al., 2007; Chuang & Fon, 2016; Lai, 2018a, 2019; Lai & Gooden, 2018c; Lee, 2005; Liing, 2014; Shen, 1990). For the Yami data, given that these are (semi-)spontaneous speech with variable morphosyntactic structure, it is unsurprising that participants produced different variants of the same utterance. 10 Table 2 therefore presents some representative utterances, rather than an exhaustive list from our participants.
2.3 Pitch range as a function of language background and discourse pragmatics
Differences in use and interpretation of F0 may be influenced by sociocultural factors and are thus likely candidates for effects of contact-induced change (Busà & Urbani, 2011). We examined pitch range as a parameter potentially manipulated by speakers in this way. Results on bilingual pitch range, however, are inconsistent. While studies have shown that bilingual speakers had narrower pitch ranges in their L2 production than the monolingual baseline (e.g., Busà & Urbani, 2011 for English–Italian bilinguals; de Leeuw, 2019 for German–English bilinguals; Passoni et al., 2019 for Japanese–English bilinguals; Zimmerer et al., 2014 for both German–French and French–German bilinguals), there is also evidence that Japanese–English bilinguals produced a wider pitch range in their English (L2) speech, and that their pitch variation is even greater than that produced by native English speakers (Aoyama & Guion, 2007). Pitch range may additionally have discourse-pragmatic functions, as greater pitch ranges are associated with stronger degrees of incredulity (Chuang et al., 2007; Chuang & Fon, 2016; Crespo-Sendra et al., 2010).
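Pitch range is commonly decomposed into a level and a span. As a minimal sketch, and not necessarily the measures used in this study, the snippet below expresses span in semitones from the lowest and highest F0 of an utterance and takes the median F0 as a proxy for level. The function names, the choice of the median, and the F0 values are assumptions made for illustration.

```python
import math
import statistics
from typing import Sequence


def pitch_span_semitones(f0_min_hz: float, f0_max_hz: float) -> float:
    """Distance between the lowest and highest F0 values, in semitones."""
    if f0_min_hz <= 0 or f0_max_hz <= 0:
        raise ValueError("F0 values must be positive")
    # One octave (a doubling of frequency) equals 12 semitones.
    return 12.0 * math.log2(f0_max_hz / f0_min_hz)


def pitch_level_hz(f0_samples_hz: Sequence[float]) -> float:
    """Median F0 of the voiced samples, used here as a proxy for pitch level."""
    return statistics.median(f0_samples_hz)


# Hypothetical F0 track (Hz) for a single utterance.
f0_track = [180.0, 200.0, 220.0, 260.0, 360.0]
span = pitch_span_semitones(min(f0_track), max(f0_track))
level = pitch_level_hz(f0_track)
```

Working in semitones rather than hertz makes spans comparable across speakers with different baseline pitch, which matters when groups differ in age and sex.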
In sum, this study aims to clarify the melodic patterns produced in three yes/no question pragmatic contexts (NQ, DQ1, and DQ2) and to see whether they are differentiated by Yami speakers, given heterogeneous language proficiencies across the community. Critically, we would like to clarify whether there is a DQ1–DQ2 distinction in Yami. It is possible that, unlike Mandarin, there is no authentic DQ2 in Yami (section 2.2). If it is present, then we would expect it to occur most prominently in the speech of younger bilinguals because they have integrated Mandarin syntactic–pragmatic structure into their Yami. In addition, given that pitch is modulated by bilinguals’ language background and the pragmatic subtlety of utterances (section 2.3), it is important to evaluate how our participants’ speech aligns with or diverges from patterns reported in other studies. In what follows, we describe how the local linguistic ecology has changed in the Yami community, to better understand the impetus for and directionality of intonational variation in Yami–Mandarin bilingual speech.
3 Language ecology: a social portrait of Orchid Island
Taiwan is a multiethnic society in which Taiwanese Mandarin, Taiwanese Southern Min, Hakka, sixteen Austronesian (Indigenous) languages, and various Southeast Asian languages are spoken. Mandarin has been promoted as the official language since the 1940s. Yami is an indigenous language spoken on Orchid Island, which lies 56 miles off the southeast coast of Taiwan (yellow square in Figure 1) and is accessible only by ferry or flight, both dependent on good weather.

Figure 1. Relative locations of Taiwan and Orchid Island.
Since both Taiwan and Orchid Island are islands, for clarity, we will refer to Orchid Island as “the island” and the Yami people as “islanders.” We refer to Taiwan as the “mainland” and its residents/citizens as “mainlanders.”
3.1 Sociohistorical developments of Orchid Island
Despite its geographic isolation, sociopolitical moves by mainland Taiwan have had a continued influence on Yami language and society. The modern history of Orchid Island from the late-19th century to the present is marked by three major phases (Table 3).
Table 3. Modern History and Outcomes of Language Contact on Orchid Island.
I. Orchid Island in tranquility (1890s to mid-1960s): During this period, the island was designated as an indigenous reserve accessible only with government permission. Life on Orchid Island was relatively “quiet,” and islanders made a living from small-scale farming and/or fishing. Despite the presence of a small number of mainlanders such as teachers, government employees, and soldiers, strong linguistic and cultural barriers hindered intergroup interaction (Lai, 2011; Tsai, 2009, 102f). In 1945, Mandarin was designated as the official language of Taiwan, which simultaneously suppressed other ethnic languages, including the Yami language spoken on Orchid Island.
II. Connecting to the outside world (late-1960s to late-1990s): From 1967 onward, Orchid Island was opened to the public and integration into the wider Taiwan society began. Villages like Imowrod, Iratay, and Yayo, hosts to municipal infrastructure and facilities (Figure 2), are consequently considered to be more modernized and commercialized and have frequent contact with Mandarin speakers. Yami speakers from these villages have been undergoing rapid language shift and show higher degrees of adaptation toward Taiwanese culture than speakers in other villages (Lai, 2011; Lai & Gooden, 2018b; Li & Ho, 1988; Rau, 1995; Tsai, 2009, pp. 33–36). In contrast, during this period, Iraralay, Iranmilek, and Ivalino villages, which are far from the main commercial center, were considered hallmarks of Yami language and culture preservation (Chen, 1998; Li & Ho, 1988; Lin, 2007; Rau, 1995). On the economic front, limited income from the farming and fishing economies pushed outward migration of workers to Taiwan. The traditional lifestyle on Orchid Island remained largely unchanged.
III. New top tourism hotspot: Since the turn of the 21st century, the island has gradually rebranded itself as a summer vacation spot and tourism has rapidly become the main economic engine (Lai, 2018a; Lai & Gooden, 2018a, 2018b, 2019). Iranmilek village, for example, a place renowned for its beautiful sunrise, coastal scenery, and water sports, has transformed itself into a new tourist hub (Figure 2). The recent tourism explosion has noticeably repopulated the island by attracting adult Yami returnees and an increasing number of mainlanders. The latest census (The Department of Household Registration, Taiwan, 2021) shows that the number of current residents on the island is 5,210, among whom about 4,198 (81%) are Yami people. The proportion of non-Yami people has nearly doubled over the past two decades and now represents 19% of the inhabitants on the island. 11

Figure 2. Distribution of municipal infrastructure and facilities on Orchid Island.
3.2 Language contact and sociolinguistic effects
3.2.1 Mandarin language policy (1946–1987) in Taiwan
The 1945 designation of Mandarin as the official language simultaneously established it as a national language and presumed lingua franca for the various ethnic groups in Taiwan. Mandarin was promoted as the language of instruction, and noncompliance was punished: students were fined or forced to wear signs saying “Don’t speak ethnic languages” around their necks (Lai, 2011; Sandel, 2003). Other hegemonic practices (1946–1987) included banning ethnic languages in public spaces and delivering most television programs and broadcasts in Mandarin. These efforts were so effective that a speaker’s proficiency in “standard” pronunciation came to index intelligence and patriotism (Sandel, 2003). Decades later, Mandarin has encroached on most traditional domains of other ethnic languages, resulting in widespread displacement of ethno-linguistic communities (Wei, 2006).
3.2.2 Outcomes of Mandarin–Yami contact
Against this backdrop, we now see community-wide Yami–Mandarin bilingualism on Orchid Island. Sociolinguistic interview data (Lai, 2011) showed that the Yami speech community has shifted from diglossia with an initial stage of bilingualism (1945–late 1960s) to bilingualism without diglossia (1970s–2000) (see Fishman, 1967). With a growing number of bilinguals, the functional differentiation between Mandarin and Yami has largely collapsed: Yami continues to cede to Mandarin as a family language and has been integrated into the school curriculum only since 2001, to promote language preservation and revitalization efforts (Lai, 2011, 2018a).
The current tourism boom has further eroded Yami vitality since Mandarin is the medium of interethnic communication and is considered the key to economic success (Lai & Gooden, 2018b, 2019; Rau, 1995). Today, only speakers over 50 years old (approximately 1,200 people) can still conduct fluent conversations in Yami (Lai, 2018a; Lai & Gooden, 2018b, 2019). A recent fieldwork survey (Lai, 2018a) reveals that currently, there are only a few domains (e.g., rituals and religious settings) that continue to resist major influence from Mandarin. Despite persistent advocacy and language preservation efforts, Yami children and teenagers (except those who are from Iraralay) virtually do not speak Yami other than in schools (Lai, 2018a; Lai & Gooden, 2018b, 2019). This ongoing language loss, as suggested by Crystal (2000, pp. 19–21), poses an immediate threat to the language vitality of Yami.
3.2.3 Language use and indigenous identity
There is increasing consensus (Auer, 2007; Nguyen, 2017; Pavlenko & Blackledge, 2004) that when there are multiple strands of cultural influence (e.g., the heritage culture and the broader national context) in bi/multilingual settings, speakers can exercise agency in deconstructing predefined national/ethnic identities to form a bicultural or hybrid identity. Markstrom’s (2011) conceptual model for American Indian adolescents recognizes three overlapping and integrated components that form an overall identity. These include (1) identification: self-categorization and labeling as a certain tribe/clan; (2) connection: genealogical heritage and connection to their ancestral homeland; and (3) culture: language, cultural practice, and the underlying worldview shared by group members.
It is worth mentioning that while language is indeed a salient marker of cultural identities, there is no direct connection between language proficiency and strength of cultural identity. Instead, an amalgam of cultural activities and community practices shape and continue to influence indigenous/aboriginal groups’ perceptions of who they are (Kulis et al., 2013; Owen, 2011).
This identity (re)negotiating process is also seen among younger Yami people, who claim strong indigenous rootedness despite limited Yami proficiency (see section 6.4 for further discussion). As such, our discussion of speakers’ language profile considers not only language proficiency, but also their residential history and self-selected identity as potential factors influencing their yes/no question intonation (section 4).
4 Method
The data analyzed here were collected as part of a larger project investigating contact-based intonational variation in Yami–Mandarin bilingual speech (Lai, 2018a) and included spontaneous interactive game dialogues in both Yami and Mandarin languages. The present study focuses solely on the Yami data.
4.1 Participant language profile
A modified, post-experiment Language Experience and Proficiency Questionnaire (LEAP-Q) (Marian et al., 2007; Appendix A) was used to gather information about participants’ (1) self-reported language dominance, (2) first acquired language(s) by preschool age, (3) relative percentages of language use in daily communication, (4) education level, and (5) the ratio of the years of residence in Taiwan to the years of residence on Orchid Island after the age of 15. The longer a participant had lived in mainland Taiwan, the higher the ratio. 12
In all, 32 Yami–Mandarin bilinguals were recruited and divided into Yami-dominant, balanced, and Mandarin-dominant bilinguals based on their LEAP-Q responses. In addition, five near-Yami-monolinguals were included as a reference group. As summarized in Table 4, Yami-(near)monolinguals acquired Yami first and had lived on Orchid Island for virtually their whole lives. Although they received some years of schooling in Mandarin, they predominantly use Yami. Their Mandarin is heavily Yami-accented and they often have difficulty carrying on long conversations in Mandarin. Yami-dominant bilinguals also acquired Yami first; they completed a 6-year compulsory Mandarin education and on average spent 8.5 years residing in Taiwan. They are more fluent in Yami and use it as their primary language. Balanced bilinguals either acquired Yami first or learned Yami and Mandarin simultaneously. They completed a 9-year compulsory Mandarin education and on average spent more than a decade living in Taiwan. They make frequent use of Yami and Mandarin and self-reported as being equally fluent in both languages. Finally, Mandarin-dominant bilinguals either acquired Yami and Mandarin simultaneously or were mainly exposed to Mandarin by preschool age, with half receiving a college degree. They also spent more than a decade residing in Taiwan and have shifted toward using Mandarin on most occasions.
Participant (n = 37) Language Profile.
Education level was included as a parameter because Lai and Gooden (2014) found that speakers with higher education level produced higher rates of Mandarin-induced segmental variation. Following this reasoning, we believe education level can be an important aspect of one’s language experience.
Speakers were also grouped in terms of village and cultural identity. Participants came from five of the six villages: Iratay, Yayo, Iraralay, Iranmilek, and Ivalino, 13 and self-claimed identity as Yami or Yami-Taiwanese using a 10-point scale. In total, 27 participants claimed sole Yami identity (Median = 10); the rest considered themselves both Yami (Median = 10) and Taiwanese (Median = 9). No participant claimed an “Other” identity (Appendix B).
4.2 Data collection
The Interactive Card Games (Lai, 2018a, 2018b) were created to elicit spontaneous speech using two unscripted tasks: card-matching and picture-guessing. Paired participants collaborated and conversed while completing each game. These tasks yielded utterances across seven pragmatic contexts, of which we focus on three yes/no questions: NQ, DQ1, and DQ2 with lighter incredulity (cf. Table 2). The games were designed to have six disyllabic target words (and fillers) occur in sentence-final position. The six Yami target words [a.
4.3 Elicitation tasks
A sheet of paper containing the 6 target words and 10 fillers (Appendix C) was given to each participant, who was instructed to use the lexical items shown throughout the games. Both games included an introductory phase to help familiarize participants with the pragmatic nuances of the elicitation contexts and game rules. No explicit instructions were provided on pronunciation or specific syntactic frames, only a request to speak as naturally as possible and to provide utterances appropriate for the different pragmatic conditions.
4.3.1 The card-matching game
In this task, participants worked in dyads. They took turns leading the conversation to have the six target cards matched in pairs. Participant 1 initiated the conversation by asking [ja mijɛn imo ʂo ____?] “Do you have ____?” to request a target card from Participant 2. Upon hearing the request, Participant 2 checked his or her deck of cards to see whether they had the intended target Participant 1 needed. If so, they gave the card to Participant 1 to facilitate card matching. Once the cards were matched, Participant 1 specified which target was matched by saying [o jam ____.] “This is ____.” and then put the pair aside. After that, Participant 2 repeated the same procedure for card matching (Figure 3).

Elicitation context of NQ from the card-matching task.
4.3.2 The card-guessing game
Each participant in this task received a pile of cards with abstract drawings corresponding to the six target words (see Figure 4). Participant 1 randomly drew a card from his or her pile, showed it to Participant 2, and asked Participant 2 to guess what the picture on the card represented by saying [ikoŋ o ja?] “What is this?” Participant 2 picked up an answer from the 16 candidate words and said [___ ja(n)/ɻi?] “(Is that) ___?” to seek confirmation from Participant 1. Participant 1 then revealed the answer and said [bəkən, o jam ___.] “No, this is (the) ___.” Given the task, Participant 2 would have difficulty identifying the picture on each card and would express incredulity/surprise upon hearing the answer given by Participant 1 by saying [koŋ o ___ ɻi!?] “This is (the) ___!?” To convince Participant 2, Participant 1 showed the answer written on the back of the card to Participant 2 and emphasized [nonan,

Elicitation contexts of DQ1 and DQ2 from the card-guessing task. The rightmost column specifies the pragmatic contexts.
It is important to stress that NQ, DQ1, and DQ2 are shorthand notations for the pragmatic contexts designed to elicit those questions rather than the actual productions themselves (see also discussion in section 2.2). Overall, participants performed well in the card-matching task, and we successfully elicited the 6 target NQ responses from all 37 participants, yielding 222 tokens. For the card-guessing task, there were two instances where the participants guessed correctly, and in each case the game was restarted with a fresh card. In total, we elicited 220 (6 × 37 – 2) tokens in the DQ1 context. The dataset for DQ2 is smaller (n = 167) as this context seemed to present difficulty for a few Yami speakers (Yami-monolinguals and Yami-dominant bilinguals in particular), causing them to skip DQ2s. Under such circumstances, the first author reminded the participants that there were two pragmatic conditions. If the participants still skipped DQ2, no further intervention was made, so as to avoid “forced” responses. Altogether, the two games yielded a total of 609 tokens. Recordings were made in a quiet room of participants’ houses using a digital voice recorder (Olympus Zoom H4n) and saved in .wav format at a sampling rate of 44.1 kHz/16-bit resolution.
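The token counts reported above can be cross-checked with a few lines of arithmetic. The sketch below only restates the figures given in the text; the variable names are illustrative and not part of the original study.

```python
# Figures taken from the text; names are illustrative only.
participants = 37
target_words = 6

nq_tokens = target_words * participants        # every NQ target was elicited
dq1_tokens = target_words * participants - 2   # two tokens lost to correct guesses
dq2_tokens = 167                               # reported count; some speakers skipped DQ2

total_tokens = nq_tokens + dq1_tokens + dq2_tokens
print(nq_tokens, dq1_tokens, dq2_tokens, total_tokens)
```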
Table 5 provides some representative tokens, rather than an exhaustive list of Yami utterances (see section 2.2 on morphosyntactic variation in Yami). We focus on NQ, DQ1, and DQ2 (gray shaded).
Representative Sample Utterances.
Note. The blank lines represent the target positions. NOM: nominative case; OBL: oblique case; TOP: topic case marker; DEI: deictic; Q-PAR: question particle (optional); INT-PAR: introductory particle.
In what follows, we summarize the procedures for the prosodic labeling (section 4.4) and statistical modeling (section 4.5).
4.4 Prosodic transcription
Utterances containing background noise, laughter, disfluency, and/or hesitation were eliminated, leaving 341 tokens for analysis. The prosodic transcription focused on the utterance-final nuclear configuration, defined as the combination of nuclear pitch accent and boundary tone (see Frota et al., 2007). For each elicited utterance (which corresponds to an IP in this study), the nuclear configuration was labeled based on the first author’s auditory impression, aided by visual inspection of the F0 trace in Praat (version 6.0.43, Boersma & Weenink, 2018). Tonal configurations were coded using a ToBI-style annotation (Beckman et al., 2006) for Yami (Lai, 2018a; Lai & Gooden, 2015, 2016, 2018a).
As seen in Table 6, the three contexts (NQ, DQ1, and DQ2) for the yes/no questions exhibit four pitch accents, including two monotones (H* and L*) and two complex tones: a fall (H +!H*) and a rise (L + H*). The H* is generally characterized by a high flat contour without audible pitch raising or lowering. L + H*, by contrast, is typified by a sharp rising contour with an audible pitch rise. The falling tone is denoted as H +!H* rather than H + L* because the !H* part of the pitch accent is downstepped relative to the leading H tone of the same pitch accent but is not necessarily realized with a pitch at the bottom of the speaker’s pitch range (Beckman et al., 2006, p. 25). Boundary tones have a three-way distinction: a high tone (H%) characterized by a clear rising movement, a low tone (L%) with a clear downward pitch contour, and a mid-level tone (M%) where there was neither an overt final rise nor fall (Table 7).
Pitch Accent Inventory in Yami Yes/No Question.
Final Boundary Tone Inventory in Yami Yes/No Question.
4.5 Statistical methods and variables
For the acoustic analyses we measured sentence duration, speech rate, pitch level, and pitch span.
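As a rough illustration of the two pitch measures, the sketch below computes them from a list of F0 samples. It assumes that pitch level is the mean F0 of an utterance (as the heading of section 5.3.1 suggests) and that pitch span is the distance between the F0 maximum and minimum expressed in semitones; the unit, the function names, and the sample values are assumptions made for illustration, not the measurement procedure of the study.

```python
import math

def pitch_level(f0_hz):
    """Mean F0 (Hz) over the voiced samples of one utterance."""
    return sum(f0_hz) / len(f0_hz)

def pitch_span_semitones(f0_hz):
    """Distance between F0 maximum and minimum in semitones (assumed unit)."""
    return 12.0 * math.log2(max(f0_hz) / min(f0_hz))

# Hypothetical F0 track in Hz; not data from the study.
f0_track = [180.0, 195.0, 210.0, 240.0, 225.0]
print(pitch_level(f0_track), round(pitch_span_semitones(f0_track), 2))
```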
We conducted logistic and linear mixed effects regressions in RStudio (RStudio Team, 2020) using the
In building the models, all categorical predictor variables were dummy coded and reference levels set as follows:
The distribution of nuclear configuration patterns and the results of statistical tests are reported in section 5.
5 Results
5.1 Distribution of nuclear configuration contours
Of the total 341 labeled tokens (146 DQ1s + 92 DQ2s + 103 NQs), we analyzed NQ tokens with a VSO word order (n = 89; 86% of the NQ sample) and eliminated VOS ones (n = 14; 14%), as the target words occurred sentence-medially. The final dataset contains 327 (146 DQ1 + 92 DQ2 + 89 NQ-VSO) utterances. The analysis in this section focused on the pitch contours associated with the target word (nuclear pitch accent) and the boundary tone, that is, nuclear configuration. Given the small dataset, we did not further split the tokens based on presence/absence of a final particle. As summarized below, speakers produced a variety of nuclear configuration patterns across three intended pragmatic conditions (section 5.1.1) and the elicited nuclear configuration patterns also vary with speaker typology (section 5.1.2).
5.1.1 Pragmatic context
We observed the following patterns for nuclear configuration across three elicitation contexts (Figure 5). In the DQ1 context, speakers predominantly produced a H* pitch accent (90%), with small proportions of L + H* (6%), H +!H* (3%), and L* (1%). Final boundary tones were H% (56%), L% (30%), and M% (14%). Together, these tokens had eight nuclear configuration patterns, among which intermediate-rise H* H% (50%), mid-fall H* L% (25%), and mid-level H* M% (13%) were the majority. There were also five other minor patterns ranging between 1% and 5%.

Distribution of elicited nuclear configurations by intended pragmatic context.
In the DQ2 context, speakers mostly had H* pitch accents (83%), followed by H +!H* (11%), and small portions of L + H* (4%) and L* (2%). For final boundary tone, there were similar amounts of H% (39%) and L% (38%), and a bit less M% (23%). In total, nine nuclear configurations were observed: intermediate-rise H* H% (34%), mid-fall H* L% (27%), mid-level H* M% (22%), sharp-fall H +!H* L% (10%), and five minor patterns of 2% or less.
In the NQ context, speakers had H* (70%) far more frequently than L + H* (18%), H +!H* (10%), and L* (2%). For final boundary tone, M% occurred more frequently (40%) than both L% (33%) and H% (27%). Taken together, eight nuclear configuration shapes were observed, and the majority patterns were mid-level H* M% (40%), sharp rise L + H* H% (20%), intermediate-rise H* H% (15%), sharp-fall H +!H* L% (12%), and mid-fall H* L% (10%) contours. Three other minor patterns each represented 1% of the data.
These varied patterns were then regrouped into three general categories: rise, fall, and mid-level to reduce the complexity for subsequent analyses. This regrouping maintains the general trends as reported in section 5.1.2.
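One way to picture this regrouping is as a lookup from each labeled nuclear configuration to one of the three categories. The sketch below assumes that the category follows the final boundary tone (H% as rise, L% as fall, M% as mid), which matches the named patterns above (e.g., intermediate-rise H* H%, mid-fall H* L%, mid-level H* M%) but may not hold for every minor pattern; the mapping and function name are illustrative assumptions.

```python
from collections import Counter

# Assumed mapping: the general category follows the final boundary tone.
BOUNDARY_TO_CATEGORY = {"H%": "rise", "L%": "fall", "M%": "mid"}

def regroup(nuclear_configuration):
    """Map a label such as 'H* H%' to 'rise', 'fall', or 'mid'."""
    boundary_tone = nuclear_configuration.split()[-1]
    return BOUNDARY_TO_CATEGORY[boundary_tone]

# Hypothetical labels, one per token; not the study's data.
labels = ["H* H%", "L+H* H%", "H* L%", "H+!H* L%", "H* M%"]
counts = Counter(regroup(label) for label in labels)
print(counts)
```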
5.1.2 Speaker typology
Figure 6 shows the percentage distribution of nuclear configuration types organized by speaker typology. Yami-monolingual (61%), balanced bilingual (56.5%), and Mandarin-dominant bilingual (73%) speakers mainly produced a rising contour under the DQ1 context. Yami-dominant bilinguals behaved differently by having both a falling (46%) and a rising (43%) pattern.

Nuclear contour patterns by pragmatic condition and speaker typology.
For the DQ2 context, Yami-monolinguals and Yami-dominant bilinguals patterned together by producing mid-level and falling nuclear contours, each having more than 40%, with fewer rising contours. Balanced bilinguals favored a falling (53%) or a rising (41%) shape over a mid-level contour (6%). Mandarin-dominant bilinguals showed a clear preference for a rising (88%) shape over a mid-level pattern (12%).
In the NQ condition, Yami-monolinguals preferred a falling contour (56%) over a mid-level (38%) and a rising (6%) pattern; Yami-dominant bilinguals used a mid-level contour (49%) more often than falling (28%) and rising (23%) contours; balanced bilinguals had equal proportions of rising, falling, and mid-level contours. Mandarin-dominant bilinguals favored a rising contour (50%) over mid-level (36%) and falling (14%) patterns.
To sum up, speakers mostly produced rising pitch movements in the DQ1 context. Under the NQ context, we observed heterogeneous patterns ranging from Yami-monolinguals’ falling contour and Yami-dominant speakers’ mid-level pattern to Mandarin-dominant bilinguals’ rising shape and balanced bilinguals’ evenly split patterns. Speakers exhibited highly varied tonal configurations in their DQ2 productions, which is consistent with our observation (section 4.3.2) that the subtle discourse-pragmatic nuances between DQ1 and DQ2 (degrees of incredulity) might not be clear to all speakers. As such, we suspect that this distinction is absent from Yami.
Next, we built mixed effects logistic models (section 5.2) to evaluate the relationship between various predictor variables and nuclear configuration (the outcome variable).
5.2 Mixed effects logistic regression
Three separate mixed effects logistic regression analyses were performed, one for each of the binary outcomes of nuclear configuration (i.e., rise vs. non-rise, fall vs. non-fall, and mid vs. non-mid). In each of these models, the estimates represent the likelihood of producing a rise, fall, or mid nuclear configuration. The predicted probability plots were created by implementing the
5.2.1 Rising nuclear configuration
Table 8 presents a summary of the final model and graphs of the predicted probabilities are shown in Figure 7. For
Summary Table of Rise Nuclear Configuration.
Note. The final model is: rise ~ speaker typology + duration + (1 | participant) (χ2 = 158.58, df = 3, p < .001). SE: standard error; CI: confidence interval.
Significant codes: “***” p < .001, “**” p < .01, “*” p < .05, “.” p < .10. (Baayen, 2008)

Fitted model predictions for producing rise nuclear configurations.
5.2.2 Falling nuclear configuration
The final model returned pragmatic context and duration as significant predictors of using a falling contour (Table 9 and Figure 8). For pragmatic context, the negative estimate for NQ indicates that NQ utterances are less likely than DQ1 tokens to have a falling pattern (β = −2.30, p = .006). DQ2 utterances are also less likely than DQ1 utterances to have a falling pattern, but the difference was not significant (p = .399). To compare pitch contours in DQ2 and NQ utterances, we ran the model again and reset DQ2 as the reference level. The result returned a negative estimate, suggesting that NQ tokens are less likely than DQ2 tokens to be realized with a falling contour (β = −1.65, t = −1.98, p = .048). 16 The negative estimate of duration means that longer utterances are less likely to have a fall pattern (β = −159.14, p < .001).
Summary Table of Fall Nuclear Configuration.
Note. The final model is: fall ~ pragmatic context + duration + (1 | participant) (χ2 = 158.58, df = 3, p < .001). SE: standard error; CI: confidence interval; DQ2: declarative questions with lighter incredulity; NQ: neutral questions.
Significant codes: “***” p < .001, “**” p < .01, “*” p < .05, “.” p < .10. (Baayen, 2008)

Fitted model predictions for producing fall nuclear configurations.
5.2.3 Mid-level nuclear configuration
The final model returned two main effects (Table 10 and Figure 9). For pragmatic context, the estimate for NQ is positive, meaning NQ utterances are more likely than DQ1 utterances to have a mid-level pattern (β = 1.50, p < .001). To compare pitch contours in DQ2 and NQ utterances, we ran the model again and reset DQ2 as the reference level. The result indicates that NQ responses are also more likely than DQ2 tokens to have a mid-level pattern (β = 1.16, t = 2.62, p = .009). 17 For duration, the negative estimate means a mid-level pattern is less likely to occur in longer utterances (β = −21.95, p = .015).
Summary Table of Mid-Nuclear Configuration.
Note. The final model is: mid ~ pragmatic context + duration + (1 | participant) (χ2 = 27.51, df = 2, p < .001). SE: standard error; CI: confidence interval; DQ2: declarative questions with lighter incredulity; NQ: neutral questions.
Significant codes: “***” p < .001, “**” p < .01, “*” p < .05, “.” p < .10.

Fitted model predictions for producing mid-level nuclear configurations.
To sum up, three main factors affect the probability of the nuclear configuration patterns produced. For speaker typology, Mandarin-dominant bilinguals are more likely to produce a rise pattern than other groups of speakers. For pragmatic context, speakers in general are more likely to use a mid-level pattern in NQ utterances than in other contexts. Finally, speakers exploit duration to differentiate nuclear configurations—longer sentences are more likely to be produced with a rise pattern, whereas shorter sentences are more likely to be realized with a fall or mid-level pattern.
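To make the size of the pragmatic-context effects easier to read, the estimates reported in sections 5.2.2 and 5.2.3 can be converted to odds ratios. The sketch below assumes, as is standard for logistic regression, that the reported β values are on the log-odds scale; the dictionary keys and function name are illustrative.

```python
import math

def odds_ratio(beta):
    """Convert a log-odds estimate to an odds ratio."""
    return math.exp(beta)

# Estimates as reported in the text for the pragmatic-context contrasts.
reported = {
    "fall: NQ vs. DQ1": -2.30,
    "fall: NQ vs. DQ2": -1.65,
    "mid: NQ vs. DQ1": 1.50,
    "mid: NQ vs. DQ2": 1.16,
}
for contrast, beta in reported.items():
    print(f"{contrast}: beta = {beta:+.2f}, odds ratio = {odds_ratio(beta):.2f}")
```

Under this reading, an odds ratio below 1 means the contour is less likely in NQ than in the reference context, and a value above 1 means it is more likely.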
We examine these results more closely in the analyses of global pitch measures in the following section. Specifically, we examine whether there are phonetic differences in pitch level and pitch span among the yes/no questions produced in the different pragmatic contexts, and whether these productions were influenced by different (socio)linguistic factors.
5.3 Linear mixed-effects regression
This aspect of the analysis focused on the relationship between predictor variables and two pitch measures. To assess whether pitch movements are modulated by speakers’ language experience, discourse pragmatics, and other explanatory variables, we initially built full linear mixed-effects models for pitch level and pitch span, respectively. These full models included three categorical variables (village, identity, and nuclear configuration), one interaction term (speaker typology*pragmatic context), and two continuous variables (sentence duration and speech rate). The inclusion of nuclear configuration in the models allowed us to examine whether rise, fall, and mid intonation contours differ in their phonetic manifestations.
After model selection, speaker typology, village, identity, speech rate, and the interaction term were excluded from both the pitch level and the pitch span analyses. For pitch level, the final model included pragmatic context and sentence duration as fixed effects, and for pitch span, the final model included pragmatic context and nuclear configuration as fixed effects. The final models were those that both converged and provided the most parsimonious fit to the data. Using the anova() function (Baayen, 2008), each final model was compared against a null model containing two random intercepts (word and participant) by means of a likelihood ratio test with a significance level of α = 0.05. As noted above, all categorical variables were dummy coded and the reference levels were set as follows: pragmatic context (DQ1) and nuclear configuration (rise).
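The logic of the likelihood ratio test can be sketched as follows. This is a minimal, standard-library illustration rather than the anova() call used in the study: the function names and the log-likelihood values are hypothetical, and the chi-square tail probability is computed with a simple recurrence that is valid for integer degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)              # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # closed form for df = 1
    # Add two degrees of freedom per step:
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

def likelihood_ratio_test(loglik_null, loglik_full, df, alpha=0.05):
    """Return the test statistic, its p-value, and whether p < alpha."""
    statistic = 2.0 * (loglik_full - loglik_null)
    p_value = chi2_sf(statistic, df)
    return statistic, p_value, p_value < alpha

# Hypothetical log-likelihoods chosen to give a statistic of about 27.51.
statistic, p_value, significant = likelihood_ratio_test(-210.0, -196.245, df=2)
print(round(statistic, 2), p_value, significant)
```

The test statistic is twice the difference in log-likelihood between the fuller and the null model, and df is the number of additional parameters in the fuller model.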
5.3.1 Pitch level (mean F0)
The model returned significant effects of the intended pragmatic context and sentence duration (Table 11). Tokens produced in the NQ context were realized with a significantly lower pitch level than tokens produced in the DQ1 context (β = −.44, t = −2.53, p = .017), but the DQ1–DQ2 difference did not reach statistical significance (p = .302). We ran the model again with DQ2 as the reference level, finding that NQ tokens were also realized with a significantly lower pitch level than DQ2 tokens (β = −.36, t = −2.30, p = .027). 18 The results also show a duration effect in which longer sentences have a higher pitch level (β = 8.27, t = 5.03, p < .001). Recall that the duration variable accounts for the varied use of final particles, so it is likely that additional segmental content would allow for utterances to reach higher pitch targets.
Summary Table of Pitch Level.
Note. The final model is: pitch level ~ pragmatic context + duration + (1 + pragmatic context | participant) (χ2 = 44.23, df = 7, p < .001). SE: standard error; DQ2: declarative questions with lighter incredulity; NQ: neutral questions.
Significant codes: “***” p < .001, “**” p < .01, “*” p < .05, “.” p < .10. (Baayen, 2008)
5.3.2 Pitch span
The results returned main effects of
Summary Table of Pitch Span.
Note. The final model is: pitch span ~ pragmatic context + nuclear configuration + (1 | word) (χ2 = 14.68, df = 3, p = .002). SE: standard error; DQ2: declarative questions with lighter incredulity; NQ: neutral questions.
Significant codes: “***” p < .001, “**” p < .01, “*” p < .05, “.” p < .10. (Baayen, 2008)
Summarizing over these pitch analyses, DQ1 and DQ2 tokens were realized with a higher pitch level and a wider pitch span than NQ tokens. When taking both pitch measures into account, we found that pitch level and pitch span are jointly utilized by the speakers when presented with different pragmatic contexts. As seen in Figure 10, in general, as pitch span increases, pitch level also increases for DQ1 (r = .32) and DQ2 (r = .23) tokens, but there is no correlation between the two pitch measures for NQ responses (r = .02).
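The per-context correlations can be obtained by computing Pearson’s r separately for each pragmatic context. The sketch below uses made-up (pitch span, pitch level) pairs purely for illustration; the values, units, and names are hypothetical and do not reproduce the coefficients reported above.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical (pitch span, pitch level) pairs per context; not the study's data.
tokens = {
    "DQ1": [(2.0, 10.0), (4.0, 11.0), (6.0, 13.0), (8.0, 12.5)],
    "NQ": [(2.0, 9.0), (4.0, 8.0), (6.0, 9.0), (8.0, 8.0)],
}
correlations = {}
for context, pairs in tokens.items():
    spans, levels = zip(*pairs)
    correlations[context] = pearson_r(spans, levels)
    print(context, round(correlations[context], 2))
```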

Relationship between pitch span and pitch level by pragmatic context.
6 General discussion
Although contact-based prosodic change has been well documented for Indo-European languages, our understanding of the mechanisms and outcomes of prosodic variation in Indigenous languages remains sparse. In the Yami community, top-down language policies by the Taiwanese government, along with traumatic changes in the local linguistic ecology over the past two decades, have exerted profound influence on the Yami language. This has caused Yami to be displaced in practically all domains as the number of bilingual speakers increases. It is therefore not surprising to see variation, because the prosodic system among bilinguals (and in fact among all speakers) is quite likely in flux. Our previous work showed evidence for intonational differences across different pragmatic contexts: a falling contour for declaratives and wh-questions, and a rising contour for yes/no questions (Lai & Gooden, 2015). Through an in-depth analysis of yes/no question productions, this article aims to clarify whether and how Yami speakers reliably differentiate among different types of yes/no questions, given different pragmatic contexts.
6.1 Major nuclear configurations of Yami yes/no question production
Our analyses suggest that Yami speakers use different melodic patterns to encode subtypes of yes/no questions, and the specific categorization itself is quite different from Mandarin. Participants mainly used a rising nuclear configuration (56%) in the DQ1 context and a mid-level contour (40%) in the NQ condition. Under the DQ2 context, speakers exhibited a rather mixed profile, making it difficult to draw a reliable conclusion. Critically, the fact that the linear mixed effects regression analyses showed that DQ1 and DQ2 productions did not differ in their pitch level and pitch span (section 5.3) supports the earlier suggestion that Yami does not have an authentic DQ2 category. DQ2 tokens were therefore difficult to elicit from older, fluent Yami speakers (Lai, 2018a; Lai & Gooden, 2018a, 2019).
Conversely, younger Yami speakers seem to have integrated Mandarin DQ2 into Yami, and their intonation pattern also differs from that of Yami-monolinguals in the NQ context. In the discussion below, we take a detailed look at the patterning of tokens produced in the DQ2 and NQ contexts and discuss whether the use of less Yami-like prosodic features is related to a weakened Yami identity.
6.2 A hybrid pattern in the DQ2 context
In general, balanced and Mandarin-dominant bilinguals had no difficulty teasing apart DQ1 and DQ2 contexts over the course of the card-guessing task. In contrast, Yami-monolinguals as a group and Yami-dominant bilinguals often confused DQ1 and DQ2 contexts. As discussed in sections 5.1 and 5.3, the lack of a widespread DQ1–DQ2 distinction suggests that this contrast, which is present in Mandarin, is absent from Yami. Alternatively, it is possible that we may need to revise the game design to make the pragmatic nuance more salient so that it can be clear to older speakers. Methodologically, a blocked design that elicits DQ1 and DQ2 responses in separate sessions (e.g., Chuang et al., 2007; Chuang & Fon, 2016), rather than eliciting DQ1 and DQ2 responses consecutively (Figure 4), may also help the participants to tease subtle pragmatic differences apart. Whatever the cause, it is the knowledge of Mandarin syntactic-pragmatic structure that enables balanced and Mandarin-dominant bilinguals to map DQ2 onto Yami intonation.
Interestingly, when this Mandarin DQ2 is transplanted to Yami, the high-level intonation is not borrowed along with it. Rather, younger bilingual speakers fill the newly added DQ2 “slot” by realizing it with an already-existing high-rising intonation (Figure 11). This demonstrates an interesting case of hybridization, where a Mandarin syntactic frame (DQ2) is fused with Yami intonation (Lai, 2018a; Lai & Gooden, 2018a, 2019). If this new pattern continues, a DQ1–DQ2 distinction may appear in Yami in the future.

Hybridization of Mandarin syntactic pattern and Yami intonation (adapted from Lai, 2018a).
6.3 Prosodic borrowing and innovation in Yami NQ context
Looking more closely at NQ tokens, overall, participants chose a mid-level contour over falling and rising ones (Figures 5 and 9). Regarding speaker typology, we observed a falling contour for Yami-monolinguals (56%) and a rising pattern for Mandarin-dominant bilinguals (50%) (Figure 6). Yami-dominant bilinguals employed a mid-level contour in nearly half of the cases, while balanced bilinguals were evenly spread across mid, rising, and falling patterns.
Again, working with the assumption that older speakers’ speech often represents a more conservative form of the language (Berge, 2010; Dorian, 1981, 1994a; Grinevald, 2003; Sasse, 1992b), we argue that speakers conventionally employ a falling intonation in the NQ context. As such, the occurrence of a mid-level contour instead of a final fall in the NQ context caught our attention. Given Yami-dominant and balanced bilinguals’ grammatical proficiency (section 4.1), it is hard to explain this Mandarin-like mid-level pattern solely through a language attrition model (Sasse, 1992a). We suggest instead that this mid-level pattern embodies a case of contact-induced change (Maher, 1991), and in fact bilingual speakers play a key role in integrating Mandarin features into Yami via different mechanisms (Figure 12). For balanced bilinguals, given their proficiency in and frequent use of Mandarin, the Mandarin intonation pattern may over time be partially transplanted to their Yami via direct borrowing (Colantoni & Gurlekian, 2004). However, balanced bilinguals’ NQ intonation is unstable as we see equal amounts of three nuclear configuration patterns (rise, fall, and mid). Yami-dominant bilinguals, interestingly, have even higher rates of using a mid-level contour, likely due to their use of Mandarin (direct borrowing), coupled with frequent social interaction with balanced bilinguals (indirect borrowing, see Colantoni & Gurlekian, 2004; Romera & Elordieta, 2013). Thus, Mandarin intonation has exerted an “additive effect” in Yami-dominant bilinguals’ Yami speech.

Mechanisms of prosodic borrowing from Mandarin to Yami.
We also noted a rising pattern in Mandarin-dominant bilinguals’ NQ productions, which is not explainable through Mandarin influence, since Mandarin itself has a mid-level pattern. In this case, the language attrition framework (Sasse, 1992a) provides a better explanation. More specifically, our Mandarin-dominant bilinguals showed limited Yami proficiency, which barely goes beyond producing short sentences. Due to this lack of Yami competence, these speakers consistently use a rising intonation in producing questions, even though they indicated understanding the pragmatic nuances between the different question types and intended to produce NQ tokens. As pointed out by Polinsky and Scontras (2020), this simplification/reduction in Mandarin-dominant bilinguals’ Yami intonation system suggests speakers’ resistance to irregularity to make the system more learner-friendly, as maintaining two grammars in parallel imposes relatively high cognitive demands on these younger speakers.
6.4 Language loss and cultural-identity affiliation: nonlinear relationship
The current linguistic ecology plays a crucial role in determining the evolution of Yami bilingual intonation. With increasing reliance on the tourism economy, younger bilinguals now make frequent, if not exclusive, use of Mandarin, thus exhibiting asymmetrical convergence toward the Mandarin intonation system. The declining use and changing grammar of Yami, however, are not easily interpreted as a weakened cultural affiliation with Yami, and the reasons are twofold. First, we did not find an explicit association between participants’ cultural belongingness and their Yami proficiency: of the 37 participants, 10 claimed both Yami and Taiwanese identities, and those who “embrace” a dual Yami–Taiwanese identity include Yami-dominant, balanced, and Mandarin-dominant bilinguals. This echoes the argument that there is no direct one-to-one mapping between language proficiency and cultural identity, as observed for other indigenous groups (Kulis et al., 2013; Owen, 2011).
Second, identity construction and expression are further complicated by the current sociopolitical climate between Yami society and mainland Taiwan, especially for younger speakers. Specifically, even though younger Yami people (under 40) have been undergoing rapid language loss and have an unprecedented dependence on the tourism economy, the sociopolitical tensions between Yami and Taiwanese societies, alongside ongoing sociocultural clashes between islanders and tourists (Lai & Gooden, 2018b), have in fact deepened younger Yami people’s rootedness in their indigenous identity. This rootedness in turn motivates them to be more involved in, rather than distant from, traditional festivals and local affairs. Most recently, this younger group collaborated to issue a Yami ID card not only to strengthen their cultural representation, 20 but also to promote public recognition and support of the Yami culture. 21 Given the declining use and changing grammar of present-day Yami, it seems that while younger speakers no longer use the language to fulfill their communicative needs, they may use it as a socio-indexical resource to voice their ethnocultural affiliation (Irvine & Gal, 2000; Kozminska, 2019).
7 Conclusion and future directions
This study describes key aspects of Yami yes/no question intonation and investigates Mandarin influence on Yami, given the current linguistic ecology on Orchid Island. Three pragmatic contexts (DQ1, DQ2, and NQ) were created to elicit tokens, and speakers’ realizations of the nuclear configuration, pitch level, and pitch span were examined across these contexts. Our analyses suggest that while Yami speakers use different melodic patterns to encode subtypes of yes/no questions, the specific categorization itself is quite different from that of Mandarin. Participants produced a rising contour in the DQ1 context but showed varied intonation patterns in the NQ and DQ2 contexts. Based on the preferred pattern produced by Yami-monolinguals, we argue that (earlier) Yami speakers use a falling intonation in the NQ context. This deviates from Yami-dominant and balanced bilinguals’ Mandarin-like, mid-level pattern and also differs from Mandarin-dominant bilinguals’ rising pattern. For tokens uttered in the DQ2 condition, the mixed patterns in the data make it difficult to delimit a reliable DQ2 intonation. We surmise that the DQ1–DQ2 contrast exists only in Mandarin and not in Yami. Mandarin DQ2, however, has been borrowed into Yami by younger bilinguals to form a hybrid pattern, which intertwines Mandarin pragmatics-syntax and Yami intonation. Taken together, the innovative NQ and DQ2 intonation patterns suggest an intonational convergence toward the Mandarin system.
This adds support to the argument that prosodic features are permeable between languages (Colantoni & Gurlekian, 2004; Good, 2004a, 2004b, 2009; Hualde & Schwegler, 2008; Mennen, 2004; O’Rourke, 2005; Queen, 2001; Remijsen et al., 2014; Remijsen & Van Heuven, 2005; Rivera-Castillo, 2009; Romera & Elordieta, 2013; Steien & Yakpo, 2020; Yakpo, 2009), and that prosodic features appear to be among the more vulnerable domains, thus susceptible to change, in bilingual grammars (Polinsky & Scontras, 2020). At the same time, we argue that this ongoing prosodic variation in younger Yami people’s speech is not easily translated into a weakened cultural affiliation.
The research presented here contributes to discussions of contact effects on (bilingual) prosody, to intonation and sociolinguistics research on language variation in underrepresented languages spoken in non-Western contexts (Stanford, 2016), and to work on the connection between indigenous language and identity. The study can be extended in a number of directions. First, this paper examined utterance-final melodic patterns and global pitch measures. While the results indeed suggest that speakers exploit pitch level and pitch span to fulfill discourse-pragmatic functions (Figure 10), they did not reveal clear tendencies in how the use of pitch is modulated by Yami speakers’ language experiences. A more fine-grained analysis of localized effects, such as peak alignment and the phonetic realization of pitch accents, drawn from a larger dataset may yield a better understanding of the Yami prosodic system.
Another practical extension of this study is to incorporate perception tasks to investigate issues such as (1) whether Yami listeners can correctly identify question types; (2) whether an individual’s language background (Yami proficiency) affects their perceptual abilities; and (3) whether Yami listeners can reliably distinguish between DQ1 and DQ2 contexts. If they cannot, this would support our argument that there is in fact no authentic DQ2 in Yami.
Finally, linguistic ideologies go hand in hand with language variation and are layered with additional complexities in contact situations (Rodríguez-Ordóñez, 2019). Several studies have shown that speakers may take agency in adopting a non-native feature to align themselves with Others (Morris, 2017; Simonet, 2010b). Alternatively, as with the Yami speakers, speakers may be strongly rooted in a local identity, irrespective of whether they retain traditional linguistic features (Moore & Carter, 2015). Collecting metalinguistic commentary from community members, with the use of experimental approaches such as the Matched-Guise Task, the Implicit Association Test, or the Sociolinguistic Monitor (see Labov et al., 2011), is crucial to unpacking the way speakers store, retrieve, and process linguistic features and their associated social meanings (Rodríguez-Ordóñez, 2019). Data such as these would permit a more nuanced understanding of how prosodic variation (and other sound change) is initiated and evolves in languages like Yami, which are under heavy cultural–economic pressure from more dominant language systems and cultures.
Appendix A
性別Gender 年齡Age 填表日期Today’s date
男Male 女Female _______ _______年yyyy _______月mm _______日dd
1. 請圈選您會說的語言Please circle the languages you know:
雅美(達悟)Yami
華語 = 國語 Taiwanese Mandarin
閩南語 Taiwanese Southern Min
其他語言 1 Other language 1: _______
其他語言2 Other language 2: _______
2. 請依序排出您各個語言的
3. 請列出您就學前的
4. 目前,各個語言的使用比率 (各項比率總和為100%) Please list what percentage of the time you are on average exposed to each language. (Your percentages should add up to 100%):
5. 與他人對話時,若該名友人懂的語言跟您一樣多,且各個語言跟您一樣流利,您的語言使用比率為何(各項比率總和為100%)?When choosing a language to speak with a person who is equally fluent in all your languages, what percentage of time would you choose to speak each language? Please report percent of total time. (Your percentages should add up to 100%):
6. 一個人可能有多重的文化/身份認同,請選出最符合您身份認同的選項(認同度0最低,10最高) Please name the culture(s) with which you identify. On a scale from 0 to 10, please rate the extent to which you identify with each culture.
7. 請圈選您的最高學歷 Please circle your highest education level:
8. 您是否曾有過有聽力或語言障礙病史? Have you ever had a hearing or language impairment?
1. 雅美(達悟)語是您的(母 第二)語言 Yami is your (first or second) language.
2. 請填寫下列各階段的大約年齡 The age when you. . .
3. 雅美(達悟)語語言流利度自評 ( 流利度0最低,10最高)On a scale from 0 to 10, please indicate your level of proficiency in speaking, understanding, reading, and writing:
4. 目前,在下列場合/情境下,您會接觸到雅美(達悟)語的程度( 接觸程度0最低,10最高) On a scale from 0 to 10, please rate to what extent you are currently exposed to Yami in the following contexts:
5. 您認為您的雅美(達悟)語,聽起來有不自然的腔調嗎?(不自然程度0最低,10最高)In your perception, how much of a non-native accent do you have in Yami:
6. 曾有當地人說過您的雅美(達悟)語,聽起來有不自然的腔調嗎? Please rate how frequently others identify you have a non-native accent in Yami:
7. 語言使用回顧 Language use pattern at different life stages
1. 出生地Place of birth: 紅頭Imowrod 漁人Iratay 椰油Yayo 朗島Iraralay 東清Iranmilek 野銀Ivalino
2. 成長/居住地 Place of residence
紅頭Imowrod 漁人Iratay 椰油Yayo 朗島Iraralay 東清Iranmilek 野銀Ivalino
• 曾到臺灣工作/求學?Have you ever worked or studied in Taiwan?
若答案為
3. 目前主要職業(
Appendix B
Summary of Participant Profile.
| Participant ID | Speaker typology | Identity | Age | Gender | Village | Language dominance | Primary language at preschool age | % of language use | Education level | Ratio |
|---|---|---|---|---|---|---|---|---|---|---|
| 35 | Yami-monolingual | Yami | 69 | Female | Yayo | Yami > Mandarin | Yami | Yami 100 | ES | 0.06 |
| 36 | Yami-monolingual | Yami | 67 | Male | Yayo | Yami > Mandarin | Yami | Yami 100 | ES | 0.01 |
| 32 | Yami-monolingual | Yami | 65 | Female | Iratay | Yami > Mandarin | Yami | Yami 100 | MS | 0.11 |
| 37 | Yami-monolingual | Yami | 63 | Female | Yayo | Yami > Mandarin | Yami | Yami 100 | ES | 0.02 |
| 19 | Yami-monolingual | Yami | 60 | Male | Iranmilek | Yami > Mandarin | Yami | Yami 60 + Mandarin 40 | ES | 0.55 |
| 33 | Yami-dominant | Yami | 62 | Male | Iranmilek | Yami > Mandarin | Yami | Yami 50 + Mandarin 50 | ES | 1.76 |
| 34 | Yami-dominant | Yami | 62 | Female | Iranmilek | Yami > Mandarin | Yami | Yami 50 + Mandarin 50 | ES | 1.76 |
| 9 | Yami-dominant | Yami | 58 | Female | Iranmilek | Yami > Mandarin | Yami | Yami 80 + Mandarin 20 | ES | 0.02 |
| 20 | Yami-dominant | Yami | 56 | Female | Iraralay | Yami > Mandarin | Yami | Yami 80 + Mandarin 20 | HS | 0.32 |
| 3 | Yami-dominant | Yami | 55 | Female | Iratay | Yami > Mandarin | Yami | Yami 70 + Mandarin 30 | HS | 0.14 |
| 7 | Yami-dominant | Yami | 52 | Male | Yayo | Yami > Mandarin | Yami | Yami 60 + Mandarin 40 | HS | 0.54 |
| 10 | Yami-dominant | Yami | 52 | Female | Iranmilek | Yami > Mandarin | Yami | Yami 80 + Mandarin 20 | MS | 0.06 |
| 11 | Yami-dominant | Yami | 41 | Female | Iraralay | Yami > Mandarin | Yami | Yami 70 + Mandarin 30 | MS | 0.04 |
| 2 | Yami-dominant | Yami & Taiwanese | 58 | Male | Ivalino | Yami > Mandarin | Yami | Yami 50 + Mandarin 50 | MS | 0.23 |
| 1 | Yami-dominant | Yami & Taiwanese | 56 | Female | Ivalino | Yami > Mandarin | Yami | Yami 80 + Mandarin 20 | MS | 0.24 |
| 4 | Yami-dominant | Yami & Taiwanese | 56 | Female | Iratay | Yami > Mandarin | Yami | Yami 60 + Mandarin 40 | MS | 0.17 |
| 8 | Yami-dominant | Yami & Taiwanese | 52 | Female | Yayo | Mandarin > Yami | Yami | Yami 70 + Mandarin 30 | MS | 0.28 |
| 23 | Balanced bilingual | Yami | 56 | Female | Iranmilek | Mandarin > Yami | Yami | Yami 40 + Mandarin 60 | MS | 1.6 |
| 22 | Balanced bilingual | Yami | 49 | Female | Yayo | Mandarin > Yami | Yami | Mandarin 50 + Yami 50 | MS | 2.4 |
| 27 | Balanced bilingual | Yami | 48 | Female | Ivalino | Yami > Mandarin | Yami | Yami 50 + Mandarin 50 | CL | 10 |
| 29 | Balanced bilingual | Yami | 47 | Male | Yayo | Yami > Mandarin | Yami | Yami 50 + Mandarin 50 | HS | 1.67 |
| 6 | Balanced bilingual | Yami | 45 | Female | Iraralay | Mandarin > Yami | Yami | Yami 40 + Mandarin 60 | HS | 0.5 |
| 21 | Balanced bilingual | Yami | 45 | Female | Yayo | Mandarin > Yami | Yami | Mandarin 80 + Yami 20 | HS | 1.5 |
| 28 | Balanced bilingual | Yami | 45 | Female | Yayo | Mandarin > Yami | Yami | Yami 20 + Mandarin 80 | HS | 1.5 |
| 17 | Balanced bilingual | Yami | 42 | Female | Iraralay | Yami > Mandarin | Yami | Yami 40 + Mandarin 60 | HS | 0.57 |
| 24 | Balanced bilingual | Yami | 42 | Female | Yayo | Mandarin > Yami > Taiwanese Southern Min | Yami | Yami 30 + Mandarin 50 + Taiwanese Southern Min 20 | HS | 0.59 |
| 5 | Balanced bilingual | Yami & Taiwanese | 45 | Female | Iraralay | Mandarin > Yami | Yami | Yami 50 + Mandarin 50 | HS | 1.5 |
| 12 | Balanced bilingual | Yami & Taiwanese | 42 | Female | Yayo | Mandarin > Yami | Yami | Yami 30 + Mandarin 70 | HS | 0.05 |
| 15 | Balanced bilingual | Yami & Taiwanese | 42 | Male | Iraralay | Yami > Mandarin | Yami | Yami 80 + Mandarin 20 | HS | 10 |
| 30 | Mandarin-dominant | Yami | 38 | Male | Iranmilek | Mandarin > Yami | Mandarin | Yami 15 + Mandarin 85 | HS | 2.14 |
| 13 | Mandarin-dominant | Yami | 37 | Female | Iraralay | Mandarin > Yami | Yami | Yami 40 + Mandarin 60 | HS | 2.67 |
| 31 | Mandarin-dominant | Yami | 36 | Male | Iranmilek | Mandarin > Yami | Mandarin | Yami 30 + Mandarin 70 | HS | 2.5 |
| 25 | Mandarin-dominant | Yami | 25 | Female | Yayo | Mandarin > Yami | Mandarin | Yami 45 + Mandarin 55 | CL | 4 |
| 26 | Mandarin-dominant | Yami | 23 | Male | Iratay | Mandarin > Yami | Mandarin | Yami 30 + Mandarin 70 | MS | 1.5 |
| 14 | Mandarin-dominant | Yami & Taiwanese | 37 | Female | Iraralay | Mandarin > Yami | Yami | Yami 30 + Mandarin 70 | HS | 0.29 |
| 16 | Mandarin-dominant | Yami & Taiwanese | 37 | Female | Iraralay | Mandarin > Yami | Mandarin | Yami 20 + Mandarin 80 | HS | 10 |
| 18 | Mandarin-dominant | Yami & Taiwanese | 34 | Female | Iranmilek | Mandarin > Yami | Yami | Yami 30 + Mandarin 70 | HS | 1.34 |
Note. The participants were ordered by speaker typology, identity, and age. ES = elementary school; MS = middle school; HS = high school; CL = college/university.
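The identity tally cited in the discussion (10 of 37 participants reporting a dual identity, spread across all three bilingual groups) can be checked directly against the table above. The following is a minimal Python sketch; the counts are transcribed by hand from the table rows, and the variable names are illustrative assumptions rather than part of the original analysis.

```python
# Counts transcribed from the participant table above:
# {speaker typology: {identity: number of participants}}
counts = {
    "Yami-monolingual": {"Yami": 5, "Yami & Taiwanese": 0},
    "Yami-dominant": {"Yami": 8, "Yami & Taiwanese": 4},
    "Balanced bilingual": {"Yami": 9, "Yami & Taiwanese": 3},
    "Mandarin-dominant": {"Yami": 5, "Yami & Taiwanese": 3},
}

# Total participants, bilingual participants, and dual-identity participants.
total = sum(sum(group.values()) for group in counts.values())
bilingual = total - sum(counts["Yami-monolingual"].values())
dual = sum(group["Yami & Taiwanese"] for group in counts.values())

# Speaker groups in which at least one participant reports a dual identity.
groups_with_dual = [
    name for name, group in counts.items() if group["Yami & Taiwanese"] > 0
]

print(total, bilingual, dual, groups_with_dual)
```

Under this transcription, dual identity appears in every bilingual group and in none of the monolingual rows, which is the distribution the discussion relies on.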
Appendix C
Appendix D
Summary Table of Mid-Nuclear Configuration (DQ2 Reference).
| Predictor | Level | β | SE | z | p | CI | Exp B (odds ratio) |
|---|---|---|---|---|---|---|---|
| Intercept | | 41.00 | 17.66 | 2.32 | .020* | [6.38, 75.62] | 6.39E+17 |
| Pragmatic Context | DQ2 | −.33 | 0.42 | −0.78 | .434 | [−1.16, 0.50] | 0.72 |
Note. SE: standard error; CI: confidence interval; DQ2: declarative questions with lighter incredulity; NQ: neutral questions.
Significance codes: “***” p < .001, “**” p < .01, “*” p < .05, “.” p < .10 (Baayen, 2008).
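For readers less familiar with logistic regression output, the columns of the table above are related by simple arithmetic: z is β divided by SE, Exp B is the exponential of β, and the reported intervals are consistent with a Wald interval on the log-odds scale. The Python sketch below illustrates this using the two rows printed in the table. It assumes a 95% Wald interval with a critical value of 1.96, which is an assumption on our part and not a statement of how the original model output was produced; the function name is hypothetical. Because the printed β and SE are rounded, recomputed values may differ from the table in the last decimal place.

```python
import math

Z_95 = 1.96  # assumed normal critical value for a 95% Wald interval


def wald_summary(beta, se, z_crit=Z_95):
    """Return z, the Wald CI on the log-odds scale, and exp(beta)."""
    z = beta / se
    ci = (beta - z_crit * se, beta + z_crit * se)
    odds_ratio = math.exp(beta)
    return z, ci, odds_ratio


# Rounded (beta, SE) pairs as printed in the table above.
rows = {
    "Intercept": (41.00, 17.66),
    "Pragmatic Context: DQ2": (-0.33, 0.42),
}

for name, (beta, se) in rows.items():
    z, (lower, upper), odds = wald_summary(beta, se)
    print(f"{name}: z = {z:.2f}, CI = [{lower:.2f}, {upper:.2f}], Exp B = {odds:.3g}")
```

An interval that includes zero on the log-odds scale, as for the pragmatic context row, corresponds to an odds ratio interval that includes 1.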
Appendix E
Summary Table of Pitch Span (DQ2 Reference Level).
| Fixed effect | Level | β | SE | t | p |
|---|---|---|---|---|---|
| (Intercept) | | .15 | 0.12 | 1.25 | .217 |
| Pragmatic context | DQ1 | −.12 | 0.12 | −1.03 | .306 |
| Nuclear configuration | | | | | |
| | Mid | −.08 | 0.13 | −.58 | .561 |
Note. DQ2: declarative questions with lighter incredulity; SE: standard error; DQ1: default declarative questions; NQ: neutral questions.
Significance codes: “***” p < .001, “**” p < .01, “*” p < .05, “.” p < .10 (Baayen, 2008).
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
