Alignment to a Mandarin Target Correlates With Talker Discrimination Ability in Tibetan–Mandarin Bilinguals

Abstract

Extracting talker identity from speech signals is a core perceptual function, yet the mechanisms underlying second-language (L2) identity processing remain unclear. Grounded in the DIVA model and Source-Filter Theory, this study investigated the coupling between perception and production in Tibetan–Mandarin bilinguals. Participants performed a delayed imitation task and a talker-identity discrimination task. Imitation performance was quantified within a multidimensional acoustic space defined by fundamental frequency (F0), harmonics-to-noise ratio (HNR), and formant dispersion (FD). Results indicated that learners achieved significant acoustic convergence toward the L2 model speaker, and that this convergence persisted as episodic traces across short-term temporal delays. In the discrimination task, sensitivity improved with acoustic distance but plateaued between medium and large distances, while a significant negative response bias in the near condition revealed a tendency toward perceptual assimilation. Crucially, regression and machine-learning analyses revealed that only FD distance was significantly associated with discrimination sensitivity. Unlike source-related cues such as F0, which fluctuate with context, FD reflects relatively invariant vocal-tract structures. These findings suggest that the formation of L2 talker-identity representations involves a functional anatomical alignment with the target speaker through sensorimotor inverse mapping. By locking onto structural invariants like FD, learners can overcome within-person variability to form detailed episodic identity representations. This study extends the scope of auditory targets in speech production models from the segmental to the indexical level.
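As a rough illustration of two quantities named in the abstract, the sketch below computes formant dispersion and the signal-detection measures of sensitivity (d′) and response bias (criterion c). It is a minimal sketch under stated assumptions, not the study's analysis pipeline: FD is assumed to be the mean spacing between adjacent formants, extreme proportions are handled with a log-linear correction, and the function names `formant_dispersion` and `dprime_and_criterion` are hypothetical.

```python
from statistics import NormalDist


def formant_dispersion(formants_hz):
    """Mean spacing between adjacent formants, (F_n - F_1) / (n - 1).

    Assumption: this is one common definition of formant dispersion;
    the abstract does not state which definition the study used.
    """
    if len(formants_hz) < 2:
        raise ValueError("need at least two formant values")
    f = sorted(formants_hz)
    return (f[-1] - f[0]) / (len(f) - 1)


def dprime_and_criterion(hits, misses, false_alarms, correct_rejections):
    """Return (d', c) from raw response counts.

    Assumption: a log-linear correction (add 0.5 to each count and 1 to
    each total) keeps the rates away from 0 and 1.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    print(formant_dispersion([500.0, 1500.0, 2500.0, 3500.0]))
    print(dprime_and_criterion(20, 10, 10, 20))
```

With this formulation, a negative criterion indicates a bias toward giving the response that counts as a hit or false alarm; which response that is depends on how the discrimination task is coded.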

Keywords

L2 imitation; L2 speech perception; Perception–production coupling; L2 talker identity; Formant dispersion
