Abstract
In multilingual Macau, Macau Mandarin (MacM) exhibits substantial variation due to its complex linguistic landscape. This study investigates lexical tones in MacM, focusing on differences across age and gender groups. Twenty-four fluent Mandarin speakers were recruited and stratified by age and gender. The findings reveal a reduced tonal space in MacM compared with Putonghua, the standard form of Mandarin used in mainland China, partly as a result of physiological factors. Additionally, despite clear first-language (L1) influence on MacM across all age groups, this study identifies generational differences in tonal production. Younger speakers show greater assimilation toward Putonghua, likely reflecting the increasing use of Mandarin among the younger generation. Gender also appears to interact with age-related patterns, as females seem more likely to lead the merger of T2 and T3 in MacM. This study offers insights into the current characteristics and future trends of MacM tones, providing new evidence on the factors influencing tonal variation and advancing a more nuanced understanding of these dynamics.
Introduction
Mandarin is spoken by a regional and global community, making it a language with shared global ownership. In this context, the concept of Global Chinese has gained prominence in linguistic studies, promoting a paradigm shift toward a multilingual approach to understanding Mandarin varieties worldwide. The global spread of Mandarin can be conceptualized through three concentric circles of Mandarin users (Goh, 2003, 2017; Goh & Lim, 2010): the Inner Circle (mainland China and Taiwan), where Mandarin is the dominant working language and the common language; the Outer Circle, where Mandarin has been used as a lingua franca within local Chinese communities since the early days of Chinese immigration; and the Expanding Circle, where Mandarin is learned as a foreign language. One significant development in the study of Global Chinese over the past decade is the increased scholarly focus on Outer Circle varieties. However, compared with other Outer Circle varieties such as Singapore Mandarin (SgM), Malaysia Mandarin (MalM) and Hong Kong Mandarin (HkM), Macau Mandarin (MacM) remains underexplored.
Although Macau returned to Chinese sovereignty in 1999 and became a Special Administrative Region of the People’s Republic of China, its sociolinguistic reality places it within the Outer Circle. Despite its small geographical size and population, Macau has a highly diverse sociolinguistic environment in which Chinese and Portuguese are the official languages. In this context, “Chinese” encompasses both written and spoken forms, with the spoken variant often referred to as Mandarin. Nevertheless, Cantonese is the most widely used local Chinese dialect, and English has gained increasing prominence. Consequently, Macau promotes triliteracy (Chinese, Portuguese and English) and quadrilingualism (Mandarin, Portuguese, Cantonese and English; Bray & Yee, 2005). Despite the official status of Portuguese, fewer than 3% of Macau’s population speaks the language (Qu, 2021). Conversely, Mandarin has gained significant popularity as the medium of instruction in Chinese language teaching in primary and secondary schools since 1999 (Zhang, 2019). By 2016, more than 50% of Macau’s population were fluent Mandarin speakers (Qu, 2021).
Mandarin has thus become Macau’s most common second language (L2; Zhang, 2019). However, L2 speakers of Mandarin often diverge from first-language (L1) speakers in speech production because of interactions between their L1 and L2 systems. According to the Speech Learning Model (SLM) proposed by Flege (1991, 1995a, 1995b), early exposure to an L2 is crucial for achieving nativelike proficiency, and delayed exposure reduces this likelihood. The model thus focuses on differences between early and late learners. The SLM was later developed into the revised Speech Learning Model (SLM-r; Flege & Bohn, 2021), which incorporates new postulates and shifts the focus to the development of phonetic systems throughout the lifespan based on naturalistic L2 input. According to SLM-r, the quantity and quality of L2 input, as well as language dominance, play a pivotal role in phonetic variation among bilingual speakers (Flege, 2007, 2018).
Macau presents a compelling case of tonal variation, as it serves as a cultural and linguistic crossroads between East and West. The political and social transformations following the 1999 handover have profoundly affected the linguistic environment, leading to differences in the quantity and quality of Mandarin input between older and younger generations. Furthermore, changes in Macau’s official language policies since the handover have likely influenced patterns of language use and acquisition. Against this backdrop, this study investigates recent diachronic changes in the lexical tones of MacM. In addition, while prior research suggests that women often lead language change (Labov, 2001), other studies, such as Y. Liu et al. (2011), have found that men lead tonal change in Hong Kong Cantonese. Despite these mixed findings, the influence of gender on language change is well documented. This research addresses the following research questions:
(1) How are the lexical tones realized in MacM?
(2) What are the effects of age and gender on the lexical tones in MacM?
Literature Review
Previous Studies on Tones in Mandarin Varieties
Mandarin is well known for its tonal system, in which lexical tones distinguish word meanings. The study of lexical tones in isolation therefore serves as a foundational step toward understanding larger units of speech prosody (C. Xu & Zhang, 2024). Four lexical tones are well established in Putonghua and Taiwan Mandarin (TwM): Yinping, Yangping, Shangsheng and Qusheng (Tone 1 to Tone 4; henceforth T1 to T4). In Putonghua, these tones are traditionally described in Chao’s notation as [55], [35], [214], and [51] (Chao, 1968). In contrast, studies on TwM have reported inconsistencies in the realization of T2 and T3, debating whether T2 is a dipping or a rising tone and whether T3 is a low-falling or a level tone (e.g., Deng et al., 2006; Fon & Chiang, 1999; K. Huang, 2017). Fon et al. (2004) noted a tendency for T2 and T3 to merge in TwM, specifically as a dipping tone, and highlighted the need for further investigation into analogous tonal mergers in other Mandarin varieties.
A growing body of literature has advanced our understanding of the tonal features of Inner Circle varieties of Mandarin, yet studies on the tonal variation of Outer Circle varieties remain relatively limited. Early research on Outer Circle varieties focused on SgM and MalM, and descriptions of these varieties generally identify four tonal categories (Lee, 2010; Ren & Chiew, 2024). Despite inconsistencies in the tone values reported for MalM, the predominant realizations are as follows: T1 is a high-level tone, T2 a rising tone, T3 a mid-falling tone, and T4 a high-falling tone. For instance, T. Huang (2016) describes T2 in MalM as [23], whereas Ren and Chiew (2024) report it as [223]. In SgM, Chua’s (2003) findings align broadly with those of Chen (1983), though there is no consensus on whether T3 is [11] (Chen, 1983) or [211] (Chua, 2003). Similarly, Lee (2010) observed that T2 in SgM includes a mid-level stretch followed by a gentle rise, a pattern also noted in TwM (Xiong & Li, 2008) and MalM (Ren & Chiew, 2024).
In addition to the four tone categories, SgM and MalM may feature an additional “fifth tone” (“T5”), derived from the checked tones of the Middle Chinese (MC) tone system (circa 200–900 AD) that survive in southern Chinese dialects. “T5” is characterized by a short high-falling contour and may optionally end with a glottal stop or creaky voice (Chen, 1983; T. Huang, 2016; Khoo, 2014; Ren & Chiew, 2024). This suggests that SgM and MalM may have developed tone systems distinct from those of Putonghua and TwM. However, it remains debated whether “T5” is a contrastive tone (T. Huang, 2016; Ng & Phua, 2012) or a variant tone (Lee, 2010; Ren & Chiew, 2024). Further variation within SgM and MalM has also been documented. For example, T4 in MalM is frequently described as a high-level tone (Khoo, 2013, 2017; Yeoh, 2019), while T1 in SgM is sometimes identified as a high-falling contour (Chen, 1981). These variations are often attributed to the influence of local southern Chinese dialects, such as Cantonese and Hokkien.
While Inner Circle varieties, such as Putonghua and TwM, typically maintain a four-tone system, Outer Circle varieties such as MalM and SgM appear to be evolving toward a five-tone system, although the status of the additional tone, “T5,” remains highly contentious. T1 and T4 are relatively stable across TwM, MalM and SgM, whereas T2 and T3 are more variable. T2 has been observed to include a level stretch before the final rise in TwM, MalM, and SgM (Lee, 2010; Ren & Chiew, 2024; Xiong & Li, 2008). T3 in MalM is predominantly realized as a mid- or low-falling contour, whereas in SgM it is produced as either a low-level or a low-falling tone (Chen, 1983; Lee, 2010). Tone mergers between T2 and T3 have also been documented in TwM and MacM, though the resulting merged contours differ (Bei, 2021; Fon et al., 2004; Ren & Wang, 2022). Variant realizations of T1 and T4 have likewise been reported in MalM and SgM (Chen, 1981; Khoo, 2013, 2017). These tonal variations are largely attributed to contact with local southern Chinese dialects such as Cantonese and Hokkien (e.g., Chen, 1981; Khoo, 2013, 2017; Ren & Chiew, 2024; Xiong & Li, 2008).
Given Macau’s sociolinguistic similarity to Singapore and Malaysia, MacM can be expected to exhibit tonal variations that distinguish it from Inner Circle varieties. However, research on the lexical tones of MacM remains limited, with only one notable study to date (Ren & Wang, 2022). Using recordings from three young speakers, Ren and Wang (2022) reported a merger between T2 and T3 in MacM, as well as the occurrence of a high-level T4 and of “T5,” suggesting the onset of tonal change. Ren and Wang (2022) suggested that all “T5” tokens derive from syllables with checked tones in the MC tone system (as preserved in Cantonese). However, “T5” in MacM shows no duration contrast with its corresponding lexical tones, and the glottal stop appears to be a redundant feature of “T5.” Ren and Wang (2022) proposed that Cantonese likely serves as a social motivator for the occurrence of “T5” in MacM. Given the ongoing debate surrounding “T5” in SgM, MalM, and MacM, this study focuses on the four primary lexical tones of MacM. Furthermore, although Bei (2021) identified tonal mergers in MacM, he attributed these to “errors” in tone production, as his study aimed to establish a normative profile of Putonghua tones from a language acquisition perspective.
Hence, although substantial descriptions of dialectal variants in Outer Circle varieties exist, most of these works tend to overemphasize acoustic results, yielding rich depictions of tonal variation in Mandarin varieties but little exploration of their social meanings. For instance, Bei (2021) approached the topic from a language acquisition rather than a sociolinguistic perspective, leaving the social meanings of tonal variation unexplored. Moreover, while Bei (2021) analyzed tone production across gender groups, Ren and Wang (2022) focused exclusively on young speakers who grew up after the 1999 handover, which limited their ability to examine ongoing generational change. Given the profound political and social realignments following the handover, investigating lexical tone production among the older generation in Macau is essential. As Li and Guan (2019) suggested, the tone productions of younger speakers may be ambiguous, reflecting either language change or deviation from established social norms. Resolving this ambiguity requires examining tone production in older age groups as well. In addition, a growing body of literature has linked dialectal variation to social factors such as age, gender, and geographical distribution (Baran, 2014; Kuo, 2018; Tseng, 2016; Wang & Zhuang, 2022). Among these factors, gender often plays a prominent role in shaping language variation (Labov, 2001; Y. Liu et al., 2011). Building on previous studies, the present research conducts a cross-generational analysis to trace the tonal variation and change associated with Macau’s return to China, and examines the role of gender in shaping tonal realizations in MacM.
Cantonese Tone System
Cantonese is renowned for its intricate tonal patterns, which closely resemble the MC tone system in preserving register contrasts and checked tones. Hong Kong Cantonese (HKC) is widely regarded as the standard variety of Cantonese, with six lexical tones and three checked allotones (Mok et al., 2013; Zhang, 2019). Compared with the extensive research on HKC tones, the tonal characteristics of Macau Cantonese (MCC) have received little attention. Table 1 presents the MCC tone system based on Bei’s (2021) study. These data indicate active mergers between T2 and T5, as well as between T8 and T9. In contrast, Zhang (2019) proposed that T2 and T5, along with T3 and T6, are undergoing tonal mergers in MCC: T2 and T5 tend to be realized as a low-rising tone rather than a dipping contour, while T3 and T6 are typically realized as level tones.
Table 1. Macau Cantonese Tone System.
Methods
Speakers
A total of 24 fluent Mandarin speakers participated in this study, comprising nine male (MG) and 15 female (FG) speakers. Of these, 14 were born and raised in Macau, while the remaining 10 moved to Macau before the age of seven. All speakers identified Cantonese as their native and dominant home language; one speaker also used the Shanghai dialect as a main language of communication at home. The speakers were divided into two age groups: those born before 1992 (EG) and those born in or after 1992 (YG). This division reflects the different educational systems and language policies to which the two groups were exposed. YG speakers experienced the new educational and language policies implemented after the handover throughout their schooling, whereas EG speakers were primarily exposed to the policies of the colonial period and the subsequent transitional phase.
The EG group consisted of 15 speakers aged 33 to 61 years [mean age = 43, standard deviation (SD) = 9]. The YG group included nine speakers aged 22 to 31 [mean age = 27, SD = 3]. Because of the broader age range in the EG group, speakers’ educational backgrounds were recorded, as middle-aged speakers may have attained higher Mandarin proficiency through education. Within the EG group, seven speakers had obtained bachelor’s degrees, six had completed middle school, one had received a junior college diploma, and one had not completed middle school. Notably, among the seven speakers with bachelor’s degrees, four had completed their primary and middle school education, and two their university education, before the 1999 handover. These speakers therefore probably had limited exposure to Mandarin during their formative years of education, as Mandarin was little used during the colonial period. Speaker profiles were compiled from questionnaire responses. None of the speakers reported any hearing or speech disorders. Table 2 provides a detailed overview of the speakers’ demographics.
Table 2. Detailed Demographics of the Speakers.
Reading Materials
To avoid using Putonghua as a benchmark, the reading material was designed on the basis of Chinese dialectology, in which modern Chinese varieties are considered to have evolved from MC. The reading material consisted of 72 monosyllabic words organized according to the eight tones of MC phonology (MCT1 to MCT8), as listed in Appendix 1. All the words are commonly used and can be easily recognized and produced by Mandarin speakers in Macau. To facilitate segmentation, syllables with zero onsets were excluded.
Recording Procedure
Because no recording laboratory was available in Macau, recordings were conducted in a quiet location near the speakers’ homes to provide a familiar and natural speaking environment. The stimuli were presented one at a time as traditional Chinese characters on Google Slides via a laptop. Two filler items were added at the beginning and end of the slide presentation to mitigate “beginning- and end-of-list effects” in reading (Hawkins & Midgley, 2005, p. 108). The order of the target words was randomized separately for each speaker to avoid order effects. Each speaker was recorded individually using a TASCAM DR40X digital recorder (44.1 kHz sampling rate, 16-bit resolution) and a RODE NTF4 microphone positioned approximately 15 cm from the speaker’s mouth. Speakers read the wordlist once, took a brief break, and then read the list a second time. All recordings were saved in WAV format for analysis.
Acoustic Measurements
Tokens in the sound files were manually transcribed and labeled by two principal investigators using Praat (Boersma & Weenink, 2021). The tonal domain of each syllable was identified in the waveform and spectrogram, and annotation boundaries were placed and adjusted based on spectral features. As Ladefoged and Johnson (2014) note, tone is quantified by fundamental frequency (F0). Beyond F0 alone, tonal contrasts are also mediated by the interaction between F0 and duration (Rose, 1989). Therefore, F0 and duration served as the primary acoustic measurements in this study. To ensure accurate F0 extraction, vocal period markings were inspected and manually corrected using ProsodyPro (Y. Xu, 2013), a Praat script. ProsodyPro was then used to extract time-normalized F0 values and raw durations. A total of 129 tokens (3.73%) were excluded from the analysis because of non-modal phonation, leaving 3,327 tokens for further analysis.
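The token counts can be checked with a short calculation. The sketch below assumes, as the recording procedure implies, that each of the 24 speakers read all 72 target words twice and that filler items are not counted; the constant and function names are illustrative only.

```python
# Token accounting for the reading task. ASSUMPTION: every speaker read all
# 72 target words in both repetitions; filler items are not counted.
N_SPEAKERS = 24
N_WORDS = 72
N_REPETITIONS = 2
RETAINED = 3327  # tokens retained for analysis, as reported


def design_total(speakers: int, words: int, repetitions: int) -> int:
    """Number of target tokens the recording design should yield."""
    return speakers * words * repetitions


total = design_total(N_SPEAKERS, N_WORDS, N_REPETITIONS)
excluded = total - RETAINED
excluded_pct = 100 * excluded / total

print(f"design total: {total}")
print(f"excluded: {excluded} ({excluded_pct:.2f}%)")
```

Under this assumption the design yields 3,456 tokens, so retaining 3,327 corresponds to 129 exclusions, or 3.73% of the total.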
ProsodyPro generates time-normalized F0 contours with 10 equidistant measurement points. To counter syllable-edge effects, the first and last points (1 and 10) were excluded, so that point 2 served as the new onset and point 9 as the new offset. Duration was measured in milliseconds, and the central 80% of each token’s extracted duration was analyzed. Whereas raw durations were used, raw F0 values were converted to T-values using Formula (1) (Shi & Wang, 2006):

$$T = 5 \times \frac{\lg x - \lg b}{\lg a - \lg b} \qquad (1)$$

where x is the F0 value at a given measurement point, and a and b are the upper and lower limits of the speaker’s F0 range. The multiplier 5 aligns the normalized F0 values with Chao’s five-level scale.
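A minimal sketch of the edge trimming and T-value conversion, in Python for illustration. It assumes the standard T-value form T = 5 × (lg x − lg b) / (lg a − lg b), with a and b taken as the maximum and minimum F0 of a speaker’s pooled, edge-trimmed contours; how the range limits were actually determined in this study is not stated, and the function names and F0 values are hypothetical.

```python
import math
from typing import Sequence


def trim_edges(contour: Sequence[float]) -> list[float]:
    """Drop points 1 and 10 of a 10-point time-normalized contour,
    keeping points 2-9 (original point 2 becomes the onset, 9 the offset)."""
    if len(contour) != 10:
        raise ValueError("expected a 10-point contour")
    return list(contour[1:9])


def t_value(f0_hz: float, f0_min: float, f0_max: float) -> float:
    """T = 5 * (lg x - lg b) / (lg a - lg b), with b = min and a = max F0."""
    if not 0 < f0_min < f0_max:
        raise ValueError("need 0 < f0_min < f0_max")
    return 5 * (math.log10(f0_hz) - math.log10(f0_min)) / (
        math.log10(f0_max) - math.log10(f0_min)
    )


# Hypothetical speaker: two 10-point contours in Hz.
contours = [
    [210, 205, 204, 204, 203, 203, 202, 202, 201, 198],  # level-like
    [190, 180, 150, 140, 130, 125, 120, 115, 110, 100],  # falling-like
]
trimmed = [trim_edges(c) for c in contours]
# ASSUMPTION: the speaker's range limits come from the pooled trimmed points.
pooled = [f for c in trimmed for f in c]
lo, hi = min(pooled), max(pooled)
t_contours = [[round(t_value(f, lo, hi), 2) for f in c] for c in trimmed]
print(t_contours)
```

Because the range limits are taken from the same pooled points, every T-value falls between 0 and 5; using a different definition of the limits would shift these values.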
However, since actual production and perception do not always align perfectly with Chao’s system, this study adopted L. Liu’s (2007) flexible boundary approach to map T-values onto Chao’s scale. In this framework, the speaker’s pitch range spans T-values from 0 (low) to 5 (high), with 3 representing mid pitch (Chao, 1968). Adjacent numerals have overlapping boundaries; for example, C. Xu and Zhang (2024) propose that Chao’s numeral 2 corresponds to T-values from 0.9 to 2.1. This overlap permits more nuanced assignment of tone numerals. Following C. Xu and Zhang’s (2024) recommendations, this flexible strategy was adopted to improve the accuracy and interpretability of the tonal analysis.
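The flexible mapping can be sketched as follows. Only the band for numeral 2 (0.9 to 2.1) is given in the text; the sketch assumes, hypothetically, that every numeral n nominally covers T-values from n − 1 to n and is widened by the same margin of 0.1 on each side. Values in an overlap return two candidate numerals, leaving the final choice to the analyst’s auditory judgment; the function names are illustrative.

```python
def candidate_levels(t: float, margin: float = 0.1) -> list[int]:
    """Return every Chao numeral (1-5) whose widened band contains t.

    HYPOTHETICAL band scheme: numeral n nominally covers T-values in
    [n - 1, n], and each band is widened by `margin` on both sides, so that
    numeral 2 spans [0.9, 2.1] as in the example cited in the text.
    """
    if not 0.0 <= t <= 5.0:
        raise ValueError("T-values are expected to lie in [0, 5]")
    return [n for n in range(1, 6) if n - 1 - margin <= t <= n + margin]


def label_contour(onset: float, offset: float) -> list[str]:
    """List all onset-offset numeral pairs compatible with the bands."""
    return [f"{a}{b}" for a in candidate_levels(onset)
            for b in candidate_levels(offset)]


print(candidate_levels(1.5))
print(candidate_levels(2.05))
print(label_contour(4.5, 1.5))
```

For example, `candidate_levels(1.5)` returns `[2]`, whereas `candidate_levels(2.05)` returns `[2, 3]` because 2.05 lies in the overlap between the bands for numerals 2 and 3.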
Statistical Analysis
Eight-point F0 contours were used to plot the F0 trajectories, and raw durations were visualized using violin plots. For the statistical analysis, F0 (T-value) onset, F0 (T-value) offset (Li & Guan, 2019; Yang et al., 2022), F0 (T-value) slope (Li & Guan, 2019), and raw duration were included as acoustic correlates. Pitch slope was calculated following Li and Guan (2019). Because the eight-point contours were already time-normalized, the slope reduces to the pitch range, F0offset − F0onset.
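The onset, offset and slope measures can be sketched as below, assuming each token is represented by its eight retained T-values (original points 2 to 9). Because the time axis is normalized, the slope is simply the offset minus the onset; the class and function names and the example contours are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ContourMeasures:
    onset: float   # T-value at the first retained point (original point 2)
    offset: float  # T-value at the last retained point (original point 9)
    slope: float   # offset - onset; time is already normalized


def measure(contour8: Sequence[float]) -> ContourMeasures:
    """Compute onset, offset and slope from an eight-point T-value contour."""
    if len(contour8) != 8:
        raise ValueError("expected an eight-point contour (points 2-9)")
    onset, offset = contour8[0], contour8[-1]
    return ContourMeasures(onset=onset, offset=offset, slope=offset - onset)


# Hypothetical T-value contours for a falling and a level-rising token.
falling = measure([4.6, 4.3, 3.9, 3.4, 2.9, 2.4, 2.0, 1.7])
rising = measure([1.2, 1.0, 1.0, 1.1, 1.4, 1.9, 2.4, 2.9])
print(falling)
print(rising)
```

A negative slope marks a falling contour and a positive slope a rising one; this simple difference ignores any internal dip, which is why the full contours are also plotted.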
To validate the visual observations of F0 contours and tonal duration across groups, a series of Linear Mixed-Effects Models was fitted using the lme4 package (Bates et al., 2015) in R (R Development Core Team, 2019). F0 onset, offset, pitch slope and duration were the dependent variables; age, gender, and tone category were entered as fixed effects; and speakers and items were included as random intercepts. Following the recommendation to keep the random-effects structure maximal (Barr et al., 2013), each analysis began with the most complex model, which included by-speaker and by-item random slopes and intercepts for all relevant fixed effects. The random-effects structure was then progressively simplified until the model converged, and the resulting converging model was retained as the final model.
Model assumptions were tested using the performance package (Lüdecke et al., 2021), and visual diagnostics were conducted with residual plots, histograms and Q-Q plots. Because the assumptions of normality and homogeneity of variance were violated, the dependent variables were log-transformed to reduce skew, and results are reported on the logarithmic scale. As the logarithm is undefined for zero and negative values, a constant was added to non-positive values before transformation. All F0 onset values were positive, but adjustments were necessary for the other measures. For F0 offset, seven of the 2,880 tokens had zero values, so a constant of 0.01 was added to these values. For F0 slope, 2,054 of the 2,880 tokens had non-positive values, with a minimum of −4.54; to make all F0 slope values positive without substantially distorting them, a constant of 4.55 was added to every value. These adjustments allowed all tokens to be log-transformed and modeled.
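The two shifts described above can be sketched as follows. The sketch assumes natural logarithms (the default of R’s log()) and reproduces the reported constants: 0.01 added only to zero-valued F0 offsets, and 4.55 added to every F0 slope so that the minimum of −4.54 becomes 0.01. The function names and input values are hypothetical.

```python
import math
from typing import Sequence


def log_offset(values: Sequence[float], zero_shift: float = 0.01) -> list[float]:
    """Natural-log transform of F0 offsets (T-values). Only zero values get
    `zero_shift` added, mirroring the 0.01 constant described in the text."""
    out = []
    for v in values:
        if v < 0:
            raise ValueError("T-value offsets should not be negative")
        out.append(math.log(v + zero_shift if v == 0 else v))
    return out


def log_slope(values: Sequence[float], shift: float = 4.55) -> list[float]:
    """Natural-log transform of F0 slopes after adding `shift` to every value.
    With a minimum slope of -4.54, a shift of 4.55 leaves a minimum of 0.01."""
    shifted = [v + shift for v in values]
    if min(shifted) <= 0:
        raise ValueError("shift too small: some values remain non-positive")
    return [math.log(v) for v in shifted]


# Hypothetical values
offsets = log_offset([0.0, 1.0, 2.5])
slopes = log_slope([-4.54, 0.0, 0.45])
print(offsets)
print(slopes)
```

Adding the constant only to zero offsets leaves all other offsets unchanged, whereas the slope shift moves every value by the same amount, preserving the differences between tokens before the log is taken.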
In the statistical analyses, age, gender, and tone category were deviation-coded to test main effects. Because the approaches available in the lme4 package for evaluating the significance of fixed effects have been reported to be somewhat weak (Luke, 2017), p-values for the main effects and interactions were obtained via likelihood ratio tests using the mixed() function from the afex package (Winter, 2019). For significant effects, post hoc comparisons were conducted using the emmeans() function in the emmeans package (Lenth, 2023), with degrees of freedom approximated by the Kenward-Roger method and p-values adjusted using Tukey’s method. An alpha level of 0.05 was set for all statistical tests. Tokens that did not follow the main F0 trajectory of their tone were excluded from the mixed-effects analyses because of their scarcity. Given the reported tendency of the younger generation to merge T2 and T3 in MacM (Bei, 2021; Ren & Wang, 2022), additional analyses examined whether the two tones were comparable in F0 parameters and duration.
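The likelihood ratio test behind the reported χ² statistics compares two nested models: the statistic is twice the difference in their log-likelihoods, referred to a χ² distribution whose degrees of freedom equal the difference in the number of parameters. A self-contained sketch for integer degrees of freedom follows; the log-likelihood values are hypothetical, and in practice the nested models would be fitted by maximum likelihood rather than REML.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square distribution with a
    positive integer number of degrees of freedom (closed-form series)."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus, for df >= 3,
    # sqrt(2x/pi) * exp(-x/2) * sum_{i=1}^{(df-1)/2} x^(i-1) / (1*3*...*(2i-1))
    p = math.erfc(math.sqrt(half))
    if df > 1:
        term = total = 1.0
        for i in range(2, (df - 1) // 2 + 1):
            term *= x / (2 * i - 1)
            total += term
        p += math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total
    return p


def likelihood_ratio_test(loglik_full: float, loglik_reduced: float,
                          df_diff: int) -> tuple[float, float]:
    """Return (chi-square statistic, p-value) for two nested models."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df_diff)


# Hypothetical log-likelihoods: the full model adds one parameter.
stat, p = likelihood_ratio_test(-100.0, -102.0, df_diff=1)
print(f"chi2(1) = {stat:.2f}, p = {p:.4f}")
```

As a sanity check, `chi2_sf(4.25, 1)` is about 0.039, in line with the p = .04 reported for the main effect of age on F0 onset.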
Results
Frequency Counts of Tonal Realizations
Figures 1 and 2 present the frequency counts of tonal realizations by age group and gender, respectively. The two age groups were highly consistent in the predominant realizations of the four lexical tones: T1 as a high-level tone, T2 and T3 as low-level-rising tones, and T4 as a high-falling tone. The same pattern held for both genders. Beyond these main realizations, tonal variants were observed for each lexical tone. While variants of T1, T2 and T4 were infrequent, T3 frequently exhibited a mid-falling variant in both age and gender groups. The two age groups also differed in which contours occurred: the low-level-rising variants of T1 and T4 were unique to the EG group, whereas the dipping variant of T3 appeared only in the YG group. Notably, a high-falling variant occurred in T1, T2 and T3, and a level variant occurred in T2, T3 and T4, in both age groups. These occurrences may correspond to the so-called “T5” tokens.

Figure 1. Frequency counts of tonal realizations in EG (a) and in YG (b).

Figure 2. Frequency counts of tonal realizations in MG (a) and in FG (b).
Tone Contour Analysis
Figures 3 and 4 illustrate the primary pitch contours of the four lexical tones across age and gender groups. Consistent with previous findings, the main F0 trajectories for the four lexical tones were stable across both age and gender groups. An intriguing observation emerged with the level-rising contours of T2 and T3, which exhibited an initial decrease followed by a low-level stretch preceding the final increase. Despite this overall consistency, T3 tended to have a more pronounced falling onset compared to T2 in both age and gender groups, giving the appearance of a dipping pitch target. However, based on our auditory perception, these “dipping” contours do not reflect an actual dipping pitch target. Thus, in Chao’s notation, the main F0 trajectories of the four lexical tones can be characterized as [44], [223], [223] and [52]. This pattern suggests a tendency for T2 and T3 to merge in MacM.

Figure 3. Main F0 trajectories of T1 (a), T2 (b), T3 (c), and T4 (d) in the age groups. The shaded area around each contour represents the 95% confidence interval.

Figure 4. Main F0 trajectories of T1 (a), T2 (b), T3 (c), and T4 (d) in the gender groups. The shaded area around each contour represents the 95% confidence interval.
To further assess differences between groups, we examined the impact of age and gender. Panels (a) to (d) in Figure 3 illustrate the main F0 trajectories of the four lexical tones in MacM produced by speakers of different age groups. The trajectories are largely parallel between the two age groups, indicating consistency. However, the F0 trajectories of the four lexical tones produced by the EG group were slightly lower than those of the YG group. Panels (a) to (d) in Figure 4 illustrate the main F0 trajectories of the four lexical tones produced by speakers of different genders. The F0 tracks indicate that females consistently produced higher F0 values across all four lexical tones compared to their male counterparts.
To validate the visual observations, Linear Mixed-Effects Models were employed to analyze the F0 (T-value) onset, offset, and slope of the main F0 trajectories. No significant main effect of gender emerged for F0 onset (χ² (1) = 1.07, p = .30). However, age was a significant predictor (χ² (1) = 4.25, p = .04): F0 onsets produced by YG speakers were higher than those of EG speakers. Tone category was also a significant predictor of F0 onset (χ² (3) = 181.45, p < .001), because all tonal pairs differed significantly in F0 onset (all p < .05) except T2 and T3 (β = −.02, SE = 0.03, t = −0.94, p = .78). Crucially, a significant age × category interaction was observed (χ² (3) = 187.64, p < .001). Pairwise comparisons indicated that for T3, the F0 onsets of EG speakers differed significantly from those of YG speakers (β = −.25, SE = 0.06, t = −3.99, p = .01), whereas no age differences in F0 onset were observed for T1, T2, or T4 (all p > .05). Within each age group, T2 and T3 did not differ in F0 onset (EG: β = .01, SE = 0.03, t = 0.39, p = 1.00; YG: β = −.06, SE = 0.03, t = 154.2, p = .53). A significant gender × category interaction was also observed (χ² (3) = 45.35, p < .001). However, within-group comparisons revealed that neither males nor females differed in F0 onset between T2 and T3 (Female: β = .03, SE = 0.03, t = 1.07, p = .96; Male: β = −.08, SE = 0.03, t = −2.7, p = .13), and the two gender groups did not differ within any individual tone (all p > .05). No age × gender interaction was found (χ² (1) = 0.12, p = .73). A significant three-way interaction among age, gender and category was identified for F0 onset (χ² (3) = 22.29, p < .001). Post hoc within-tone comparisons revealed no differences across gender and age groups (all p > .05). However, comparisons between T2 and T3 indicated that young males produced significantly different F0 onsets for the two tones (β = −.14, SE = 0.04, t = −3.76, p = .02).
Tone category also had a significant effect on F0 offset (χ² (3) = 136.47, p < .001), with all tonal pairs differing significantly (all p < .005). Although age had no main effect (χ² (1) = 2.6, p = .09), a significant age × category interaction was found (χ² (3) = 42.53, p < .001). Post hoc analyses revealed that T4 offsets produced by EG speakers differed significantly from those produced by YG speakers (β = −.27, SE = 0.08, t = −3.40, p = .04). Within each age group, F0 offsets did not differ significantly between T2 and T3 (EG: β = .13, SE = 0.04, t = 3.02, p = .06; YG: β = .13, SE = 0.05, t = 2.34, p = .28). No gender × category interaction (χ² (3) = 4.56, p = .21) or age × gender interaction (χ² (1) = 0.30, p = .59) emerged. A significant three-way interaction was also found for F0 offset (χ² (3) = 18.18, p < .001), but pairwise comparisons showed no significant within-tone differences across gender and age groups (all p > .05), and the F0 offsets of T2 and T3 remained similar in all gender and age groups (all p > .05).
No main effect of gender or age emerged for F0 slope (gender: χ² (1) = 3.20, p = .07; age: χ² (1) = 0.12, p = .73). However, tone category was a significant predictor (χ² (3) = 249.4, p < .001), reflecting the different contours of the tones. A significant age × category interaction was also found (χ² (3) = 96.18, p < .001). Pairwise comparisons indicated that the F0 slopes of T1, T2, and T3 did not differ significantly between age groups (all p > .05), whereas T4 slopes did (β = −.17, SE = 0.04, t = −4.59, p = .002). Within age groups, older speakers produced similar F0 slopes for T2 and T3 (β = .05, SE = 0.02, t = 1.97, p = .51), while younger speakers produced distinct slopes for the two tones (β = .12, SE = 0.03, t = 3.41, p = .02).
A significant gender × category interaction was observed for F0 slope (χ² (3) = 19.20, p < .001). Post hoc analyses revealed that, within each tone category, female and male speakers performed similarly (all p > .05). However, whereas female speakers showed no significant difference in F0 slope between T2 and T3 (β = .04, SE = 0.03, t = 1.36, p = .87), male speakers produced distinct slopes for the two tones (β = .13, SE = 0.03, t = 4.05, p = .002). No gender × age interaction was observed (χ² (1) = 0.87, p = .35).
As with F0 onset and offset, a significant age × gender × category interaction emerged for F0 slope (χ² (3) = 22.04, p < .001). Pairwise comparisons showed that, in T4, older male speakers had F0 slopes distinct from those of young female speakers (β = −.28, SE = 0.06, t = −4.84, p = .004) and young male speakers (β = −.22, SE = 0.05, t = −3.97, p = .03). No significant differences were found within T1, T2, or T3 across gender and age groups (all p > .05). Comparisons between T2 and T3 further revealed that young male speakers produced distinct F0 slopes for the two tones (β = .17, SE = 0.05, t = 3.63, p = .03). Young male speakers also differed significantly from older male speakers (β = .26, SE = 0.07, t = 3.78, p = .03), whereas no significant T2–T3 differences were observed within or across the other gender and age groups (all p > .05).
Duration Analysis
Figures 5 and 6 show violin plots of overall tonal duration across the age groups and the gender groups, respectively. As Figure 5 shows, the overall distributions of the two age groups were largely similar, and tonal duration followed the same order in both: T3 > T1 > T2 > T4. Likewise, as Figure 6 shows, tonal duration in both gender groups followed the order T3 > T1 > T2 > T4. In all groups, T3 tended to be slightly longer than the other three tones, whereas T4 was shorter than the other three. The difference in tonal duration between T1 and T2 was minimal.

Figure 5. Tonal duration distribution of the four lexical tones across the age groups.

Figure 6. Tonal duration distribution of the four lexical tones across the gender groups.
To validate these visual observations, we fitted linear mixed-effects models to tonal duration. No main effect of gender or age emerged (gender: χ² (1) = 0.97, p = .32; age: χ² (1) = 0.20, p = .66). However, the models revealed a significant main effect of tone category on tonal duration (χ² (3) = 140.26, p < .001): T1, T2, and T3 were longer than T4. More importantly, a significant age × category interaction emerged (χ² (3) = 10.71, p = .01). Pairwise comparisons showed that, within each tone category, elder speakers did not differ significantly from young speakers (all p > .05). Comparisons between T2 and T3 revealed a significant duration difference among EG speakers (β = −.12, SE = 0.03, t = −4.12, p = .002) but not among YG speakers (β = −.08, SE = 0.03, t = −2.58, p = .17).

A significant gender × category interaction was also observed (χ² (3) = 8.93, p = .03). No significant differences were found across gender groups within the same tone category (all p > .05). However, gender effects were evident in cross-tone comparisons: T2 and T3 differed significantly in duration for male speakers (β = −.12, SE = 0.03, t = −3.88, p = .004) but not for female speakers (β = −.08, SE = 0.03, t = −2.70, p = .14). No significant gender × age interaction was found for tonal duration.

As in the F0 analyses, a significant age × gender × category interaction emerged in tonal duration (χ² (3) = 76.71, p < .001). Pairwise comparisons revealed no significant differences within or across gender and age groups for the same tone category (all p > .05). However, elder male speakers showed a distinct difference in tonal duration between T2 and T3 (β = −.15, SE = 0.03, t = −4.55, p = .001).
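The χ² values reported in these analyses are presumably likelihood-ratio statistics from comparing nested mixed-effects models (in R, `anova()` applied to two nested lme4 fits produces such a test; the exact model-comparison procedure is an assumption here, as it is not described in this section). A reported χ² and its degrees of freedom can be checked against the stated p-value with the closed-form chi-squared survival function for integer degrees of freedom, sketched below with only Python's standard library; `chi2_sf` is a hypothetical helper name.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: the df = 1 tail is erfc(sqrt(x/2)) ...
    p = math.erfc(math.sqrt(half))
    if df == 1:
        return p
    # ... plus sqrt(2x/pi) * exp(-x/2) * sum_i x^i / (1*3*...*(2i+1)) for df = 3, 5, ...
    term, total = 1.0, 1.0
    for i in range(1, (df - 1) // 2):
        term *= x / (2 * i + 1)
        total += term
    return p + math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total


# Statistics taken from the results above: (chi-squared value, degrees of freedom).
for stat, df in [(3.20, 1), (0.12, 1), (10.71, 3), (8.93, 3), (0.87, 1)]:
    print(stat, df, round(chi2_sf(stat, df), 3))
```

Running the loop should reproduce the rounded p-values given in the text for these statistics (.07, .73, .01, .03, and .35). Closed forms are used only to keep the sketch dependency-free; `scipy.stats.chi2.sf` computes the same quantity for arbitrary degrees of freedom.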
Discussion
We now return to the questions raised at the beginning of this paper: What are the tonal features of Macau Mandarin (MacM), and how can they be explained? Based on the acoustic results, the main tonal patterns in MacM can be summarized as [44], [223], [223], and [52], compared with [55], [35], [214], and [51] in Putonghua. The most noticeable difference from Putonghua is the reduced tone space of MacM. Specifically, T1 in MacM is significantly lower than T1 in Putonghua. MacM tone contours are also less steep: T2 is a low-level-rising tone rather than a mid-rising tone, T3 is realized as [223] rather than [214], and T4 is realized as [52] rather than [51].
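The bracketed values such as [44] and [223] are five-level Chao tone letters. This section does not state how F0 was mapped onto that scale; a widely used option in Chinese phonetics is the log-scaled T-value normalization, T = 5 × (log10 f − log10 fmin) / (log10 fmax − log10 fmin), computed over each speaker's F0 range. The sketch below assumes that formula and an equal-band mapping of T onto levels 1 to 5 (T < 1 → 1, ..., T ≥ 4 → 5); both are assumptions, and `t_value`, `chao_level`, and `chao_label` are hypothetical names.

```python
import math


def t_value(f0_hz, f_min, f_max):
    """Map F0 onto a 0-5 scale by log-scaled normalization over the speaker's F0 range."""
    if not (0 < f_min < f_max):
        raise ValueError("require 0 < f_min < f_max")
    return 5.0 * (math.log10(f0_hz) - math.log10(f_min)) / (math.log10(f_max) - math.log10(f_min))


def chao_level(t):
    """Assign a Chao level 1-5 by splitting 0-5 into five equal bands (assumed convention)."""
    # Values outside the speaker's range are clamped to levels 1 and 5.
    return min(5, max(1, math.floor(t) + 1))


def chao_label(f0_points, f_min, f_max):
    """Concatenate Chao levels for successive F0 measurement points (e.g. onset, midpoint, offset)."""
    return "".join(str(chao_level(t_value(f, f_min, f_max))) for f in f0_points)


# Example: three measurement points for one token from a speaker whose range is
# assumed to be 110-260 Hz (hypothetical values).
print(chao_label([150, 152, 185], 110.0, 260.0))
```

Equal bands are the simplest convention; other boundary placements exist and can shift a label by one level when a value falls near a band edge.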
These findings suggest that MacM is undergoing pitch narrowing and a reduction in contour steepness, which together contribute to tonal reduction. Similar phenomena have been observed in other Mandarin varieties. T. Huang (2016) and Ren and Chiew (2024) found that Malaysia Mandarin (MalM) is undergoing tonal reduction relative to Putonghua, and Chua (2003) observed tonal reduction in Singapore Mandarin (SgM). In both SgM and Taiwan Mandarin (TwM), T2 has been reported to have an initial fall followed by a mid-level stretch and a gentle rise (Lee, 2010; Xiong & Li, 2008). Tonal reduction therefore appears to be a widespread trend across Mandarin varieties.
Xiong and Li (2008) suggested that contact with Hokkien, a dominant southern Chinese dialect spoken in Taiwan, may explain the variation in T2. However, unlike Taiwan, Malaysia, and Singapore, Macau has little direct contact with Hokkien, as it remains predominantly Cantonese-speaking, and none of the speakers in the current study reported speaking Hokkien. Nor does Cantonese offer a ready source: according to Bei and Xiang (2016) and Zhang (2019), Yangping (T4) in MCC is a falling tone rather than a low-level-rising tone, so transfer from Cantonese would not predict a low-level-rising T2. Given Macau's situation, there is insufficient evidence to support a contact-based account of the T2 variation, and its cause is likely more complex.
One alternative explanation is the principle of least effort, which is observed universally in human behavior (Zipf, 1949). This principle holds that individuals strive to accomplish tasks with the least amount of effort. Martinet (1952, 1955) applied this principle to sound change and proposed that speakers are naturally inclined to convey word meanings with minimal articulatory effort. He identified two opposing forces: the need for effective communication and the drive to minimize physical and mental effort.
Thus, one possible explanation for tonal reduction in MacM could be ease of articulation. For instance, producing a dipping contour is more physiologically complex than a level-rising contour because it involves lowering and raising the F0 at the correct points, making it more strenuous to articulate and maintain a long dipping tone. On the other hand, a level-rising contour may be easier to articulate and maintain, thereby minimizing articulatory effort. Furthermore, compared with T2 [35] in Putonghua, T2 in MacM is articulated less distinctly. This variation was also observed in TwM, where T2 was shown to be less dynamic. K. Huang (2017) suggested that the F0 excursion of a dynamic tone, such as T2 in TwM, tends to decrease over time, leading to a flattened contour. Therefore, given that the principle of least effort is fundamental in guiding language use (Wang & Zhuang, 2022), it is likely that Mandarin varieties, including MacM, adhere to this principle.
Our second aim was to understand the effects of age and gender on the four lexical tones in MacM. Our results confirm those of previous studies, showing that T2 and T3 are actively merging in MacM. However, not all speakers exhibited this merger. Although relatively few tokens of the other T3 variants occurred among the 24 speakers, we found that seven speakers from the elder generation (EG) had merged T2 and T3, whereas two younger generation (YG) speakers maintained distinct T2 and T3 contours. These two young speakers treated the mid-falling contour as the basic contour of T3 and never produced T3 with a level-rising contour. The remaining speakers alternated between low-level-rising and mid-falling contours, though one of the two contours was less dominant. Overall, this suggests that T2 and T3 mergers are more advanced among older speakers.

The occurrence of low-level-rising T3 in the current study may be linked to language contact with Cantonese. Yinshang (T2) and Yangshang (T5) in Macau Cantonese (MCC) have merged and are acoustically realized as [324] (Bei, 2021; Bei & Xiang, 2016). However, Bei and Xiang (2016) further noted that Shangsheng in MCC is similar to Yinshang (T2) and Yangshang (T5) in Hong Kong Cantonese (HKC) and Guangzhou Cantonese, which are realized as low-level-rising contours. In Zhang's (2019) study, T2 and T5 in MCC were actively merging, and both were realized as low-rising tones. Hence, MacM T3, realized as [223], may have been influenced by contact with Cantonese.
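The speaker-level classification of merged versus distinct T2 and T3 reported in this discussion does not name a quantitative criterion in this section. Purely as an illustration, one could compare a speaker's mean time-normalized T2 and T3 contours (in semitones) and treat them as merged when their root-mean-square difference falls below a threshold; the function names and the 1-semitone threshold below are hypothetical and should not be read as the criterion used in this study.

```python
import math


def rms_contour_distance(contour_a, contour_b):
    """Root-mean-square difference between two contours sampled at the same normalized time points."""
    if len(contour_a) != len(contour_b) or not contour_a:
        raise ValueError("contours must be non-empty and of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(contour_a, contour_b)) / len(contour_a))


def is_merged(t2_contour, t3_contour, threshold_st=1.0):
    """Hypothetical rule: T2 and T3 count as merged if their mean contours differ by less than threshold_st semitones."""
    return rms_contour_distance(t2_contour, t3_contour) < threshold_st


# Example: hypothetical mean contours (semitones) at five normalized time points for one speaker.
t2_mean = [2.0, 1.6, 1.8, 3.0, 4.5]
t3_mean = [2.2, 1.5, 1.6, 2.7, 4.1]
print(is_merged(t2_mean, t3_mean))
```

A distance rule like this ignores token-level variability; a mixed model with tone category as a predictor, as in the analyses of this study, is the more rigorous way to test whether two categories differ.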
Furthermore, in MacM, the dipping T3 occurs only among younger speakers. In addition, 14 tokens of the word 婦 ([fu51], female) were produced with a level-rising tone, and this variant was observed only among elder speakers. Similarly, 爸 ([pa51], father) was produced with a high-level tone, a variant more frequent among elder speakers: 11 out of 13 elder speakers produced 爸 with a high-level tone, compared with only 3 out of 9 younger speakers. Notably, these level-rising and high-level realizations of 婦 and 爸 correspond to the tones of these words in Cantonese.
This raises two questions: Why would T2 and T3 mergers be more pronounced in older speakers than in younger ones, and why do younger speakers show greater assimilation to Putonghua than older speakers? According to the revised Speech Learning Model (SLM-r), the degree of cross-linguistic influence may depend on factors such as the quantity and quality of L2 input (Flege, 2007, 2018); that is, the formation of target-like phonetic categories is directly affected by L2 input. Before the 1999 handover, speakers had limited access to Mandarin in both official and non-official settings. After the handover, however, Chinese became Macau's official language. As a result, Mandarin use in Macau has increased, especially among younger speakers in formal contexts, and Mandarin is now promoted as the medium of instruction for Chinese in primary and secondary schools in Macau (Zhang, 2019). This growing exposure is likely contributing to younger speakers' assimilation to Putonghua and to their more target-like L2 tonal production. Conversely, the relatively limited Mandarin exposure of elder speakers may explain their lower assimilation to Putonghua. Contact with Mandarin thus appears to be one of the main factors driving tonal change in MacM.
The main age and gender differences emerged in F0 onset, offset, slope, and duration. F0 onset varied considerably in T3, suggesting that any change in pitch onset is most likely to begin with T3. Furthermore, T4 offset and slope differed notably across the age groups: elder speakers maintained a relatively lower offset and a steeper slope for T4, whereas younger speakers exhibited a higher offset and a less steep slope. This pattern parallels tonal changes observed in MCC. In Zhang's (2019) study, elder speakers produced T6 (Yangqu) in MCC with a lower offset and a steeper slope, whereas younger speakers produced it with a higher offset and a less steep slope. Age therefore appears to be the most influential factor in T4 variation, which may also reflect elder speakers' limited exposure to Mandarin.

In addition, although T2 and T3 offsets were comparable across all groups, young male speakers maintained distinct F0 onsets and slopes for the two tones, and elder male speakers maintained distinct durations. This suggests that the tonal contrast between T2 and T3 is shrinking among both elder and young female speakers. It points to a gender effect on speech variation: females may be more sensitive to the merging trend, whereas male speakers have not yet adopted this change. This finding aligns with Labov's (2001) observation that women often lead certain types of language change.
Thus, both physiological and social factors may contribute to the tonal variations produced by MacM speakers. Compared to Inner Circle speakers, who are consistently exposed to Mandarin in all aspects of daily life, Mandarin speakers in Macau receive less uniform Mandarin input. In a multilingual society such as Macau, speakers may experience significant variation in the frequency of language use and exposure to Mandarin. This variation likely explains the tonal and cross-generational variability observed in MacM. However, it is challenging to account for all the key factors influencing these tonal variations and draw definitive conclusions about MacM speakers. Nonetheless, these findings suggest that MacM is a unique and heterogeneous variety of Mandarin within the Outer Circle.
Conclusion
This study explored lexical tones in MacM using sociophonetic methods. To the best of our knowledge, it is the first study to examine apparent-time tonal variation in MacM. MacM was found to be undergoing tonal reduction compared with Putonghua, a process that may be partly shaped by physiological factors. Given that Cantonese remains dominant in Macau, we observed tonal variation that appears to reflect contact with Cantonese, even among the younger generation. Additionally, across a broad age range, younger speakers demonstrated more assimilation toward Putonghua than older speakers, despite L1 influence in all groups. This age-related difference may be attributed to the quantity and quality of L2 input. Furthermore, gender plays a role in phonetic variation in MacM, with females appearing to lead the T2 and T3 mergers.
Our statistical results revealed cross-generational and cross-gender variability in tonal properties, while frequency counts revealed more marked variability across age groups. This is unsurprising, given that Macau returned to China less than 30 years ago and Cantonese continues to dominate the region; generational differences at the level of phonetic implementation may therefore still be relatively subtle. If the use of Mandarin continues to grow, however, more pronounced generational differences in the phonetic realizations of MacM can be expected. This study provides a deeper understanding of how MacM is currently spoken and how it is evolving, and it offers new insights into the factors that influence tonal variation in MacM and the forces driving sound change more broadly.
Despite these contributions, the study has limitations. The relatively small sample size may limit the generalizability of the findings to all MacM speakers. It would also be valuable to explore how MacM tones are produced in word combinations and connected speech, as speakers tend to be more conscious of their pronunciation when reading word lists. Furthermore, given the wide age range of the EG, detailed information about participants' professions would strengthen the study's credibility, as some middle-aged participants may have high Mandarin proficiency because of their professional backgrounds. Future studies with a more inclusive and refined design will provide a more nuanced understanding of MacM.
Appendix
Target Words in the Word List.
| MC tone | Mandarin word | Phonetic form | Gloss | MC tone | Mandarin word | Phonetic form | Gloss |
|---|---|---|---|---|---|---|---|
| MCT1 | 他 | tha | he | MCT2 | 婆 | pho | mother-in-law |
|  | 天 | thian | sky |  | 題 | thi | topic |
|  | 波 | po | wave |  | 拿 | na | take |
|  | 芳 | faŋ | fragrant |  | 娘 | niaŋ | mother |
|  | 哥 | kə | older brother |  | 雷 | lei | thunder |
|  | 偏 | phian | prejudiced |  | 肥 | fei | fat |
|  | 多 | tuo | many |  | 房 | faŋ | house |
|  | 飛 | fei | fly |  | 田 | thian | field |
|  | 加 | tɕia | plus |  | 財 | tshai | fortune |
| MCT3 | 比 | pi | compare | MCT4 | 婦 | fu | female |
|  | 體 | thi | body |  | 弟 | ti | younger brother |
|  | 古 | kʊ | ancient |  | 米 | mi | rice |
|  | 粉 | fən | pink |  | 夏 | ɕia | summer |
|  | 展 | tʂan | exhibit |  | 染 | ʐan | dye |
|  | 坦 | than | level |  | 满 | man | full |
|  | 彩 | tshai | color |  | 老 | lau | old |
|  | 口 | khou | mouth |  | 道 | tau | road |
|  | 走 | tsou | walk |  | 在 | tsai | at |
| MCT5 | 課 | khɤ | course | MCT6 | 怒 | nu | angry |
|  | 爸 | pa | father |  | 路 | lu | road |
|  | 替 | thi | replace |  | 賀 | xɤ | congratulate |
|  | 變 | pian | change |  | 飯 | fan | rice |
|  | 痛 | thuŋ | pain |  | 念 | nian | read |
|  | 欠 | tɕhian | owe |  | 共 | kuŋ | together |
|  | 對 | tuei | correct |  | 耐 | nai | bear |
|  | 嫁 | tɕia | marry |  | 漏 | lou | leak |
|  | 帶 | tai | band |  | 隊 | tuei | team |
| MCT7 | 失 | ʂʅ | lose | MCT8 | 罰 | fa | punish |
|  | 吃 | tʂhʅ | eat |  | 十 | ʂʅ | ten |
|  | 腳 | tɕiao | foot |  | 昨 | tsuo | yesterday |
|  | 刻 | khɤ | carve |  | 學 | ɕyɛ | learn |
|  | 拆 | tʂhai | dismantle |  | 直 | tʂʅ | straight |
|  | 割 | kɤ | cut |  | 入 | ʐu | enter |
|  | 吉 | tɕi | lucky |  | 六 | liou | six |
|  | 则 | tsɤ | rule |  | 白 | pai | white |
|  | 尺 | tʂhʅ | ruler |  | 絕 | tɕyɛ | absolute |
Acknowledgements
We extend our gratitude to the participants who contributed to the data collection for this research. We also thank the anonymous reviewers for their insightful and invaluable comments on the paper.
Author Contributions
XR and HW contributed to the conception and design of the study, data analysis, and drafting of the manuscript. XR also addressed the technical aspects of the analysis in R.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the International Chinese Language Education Research Program 2022 [grant number: 22YH63D].
Ethical Consideration
This study was approved and conducted in accordance with the ethical guidelines established by the Research Committee for Humanities and Social Sciences of Changsha University of Science and Technology [study reference number: 20220629-6].
Consent Details
All participants were informed about the purpose, procedures, and potential risks of the study, and written consent was obtained prior to their participation. Participants were assured of their right to withdraw at any stage without any consequences.
Data Availability Statement
The data that support the findings of this research are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.
