Sage Journals

Abstract

The present study investigated how the acoustic and phonetic characteristics of synthetic and natural voices affect listeners' personality impressions of those voices. To this end, we conducted a personality rating experiment in which 30 native Korean speakers judged the perceived personality of natural Korean utterances and their synthetic counterparts (voice clones) using the Big-Five personality model. Various acoustic parameters, including measures of voice quality, F0, and articulation rate, were then extracted from the speech, and Intonational Phrase (IP) boundary tones were annotated. The Big-Five trait ratings were reduced to two dimensions using principal component analysis (PCA): P1 (agreeableness, conscientiousness, and emotional stability) and P2 (extraversion and openness). The results suggest that the acoustic differences between state-of-the-art synthetic speech and its original counterpart can affect personality perception in different ways. For example, speech produced with a narrower F0 range received lower scores on both P1 and P2, but for male speakers this effect was observed only in synthetic voices, likely because of their less natural intonation patterns. The intonation analysis further shows that, across speech types, using tones that suit the context or convey positive attitudes improves the overall impression of the voice on both P1 and P2. The results also suggest that a less modal voice raises personality scores overall, but that specific voice qualities (i.e., breathiness and creakiness) and voice pitch seem to affect P1 and P2 differently. The present study thus identifies a range of acoustic and phonetic characteristics that should be considered when designing personas for AI voices or developing more likable synthetic voices.
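The step that reduces five trait ratings to two components can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' analysis. It assumes a rating matrix with one row per rated item and one column per Big-Five trait. It runs PCA on the trait correlation matrix and keeps components with eigenvalues above 1. That retention rule is the Kaiser criterion, and its use here is an assumption, not a detail given in the abstract. No rotation is applied. The helper name `pca_kaiser` and the simulated ratings are hypothetical.

```python
import numpy as np

# Column order for the rating matrix (one column per Big-Five trait).
TRAITS = ["agreeableness", "conscientiousness", "emotional_stability",
          "extraversion", "openness"]


def pca_kaiser(ratings: np.ndarray):
    """PCA on the trait correlation matrix of an (n_items x 5) rating array.

    Hypothetical helper: retains components whose eigenvalue exceeds 1
    (Kaiser criterion, an assumed rule for this sketch). Returns all
    eigenvalues in descending order, the loadings of the retained
    components, and the corresponding component scores.
    """
    # Standardize each trait so the PCA operates on correlations, not raw scales.
    z = (ratings - ratings.mean(axis=0)) / ratings.std(axis=0, ddof=1)
    corr = np.corrcoef(z, rowvar=False)
    # eigh returns the eigenvalues of a symmetric matrix in ascending order.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    k = int(np.sum(eigvals > 1.0))
    # Loadings: eigenvectors scaled by sqrt(eigenvalue), i.e. trait-component correlations.
    loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])
    # Unscaled component scores: projections of the standardized ratings.
    scores = z @ eigvecs[:, :k]
    return eigvals, loadings, scores


# Hypothetical data: 300 simulated items whose ratings are driven by two
# latent factors, mimicking the P1/P2 grouping described in the abstract.
rng = np.random.default_rng(0)
n_items = 300
f1 = rng.normal(size=n_items)  # drives agreeableness, conscientiousness, stability
f2 = rng.normal(size=n_items)  # drives extraversion, openness
ratings = np.column_stack([f1, f1, f1, f2, f2]) + rng.normal(scale=0.3, size=(n_items, 5))

eigvals, loadings, scores = pca_kaiser(ratings)
print("eigenvalues:", np.round(eigvals, 2))
for trait, row in zip(TRAITS, loadings):
    print(f"{trait:>20}: {np.round(row, 2)}")
```

With this simulated structure, the first retained component loads on agreeableness, conscientiousness, and emotional stability. The second loads on extraversion and openness, analogous to P1 and P2. The signs of components from an eigendecomposition are arbitrary. A uniformly negative loading pattern therefore carries the same information as its positive mirror.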

Keywords

Personality perception; synthetic speech; intonation; voice quality




What Determines Personality Impressions of Synthetic and Natural Voices? The Effects of Voice Quality and Intonation
