Sage Journals: Discover world-class research

Abstract

Links between musicality and vocal emotion perception skills have only recently emerged as a focus of study. Here we review current evidence for or against such links. Based on a systematic literature search, we identified 33 studies that addressed either (a) vocal emotion perception in musicians and nonmusicians, (b) vocal emotion perception in individuals with congenital amusia, (c) the role of individual differences (e.g., musical interests, psychoacoustic abilities), or (d) effects of musical training interventions on both the normal hearing population and cochlear implant users. Overall, the evidence supports a link between musicality and vocal emotion perception abilities. We discuss potential factors moderating the link between emotions and music, and possible directions for future research.

Keywords

amusia musicality musical training vocal emotion perception

Introduction

Human social communication depends on the exchange and mutual representation of multiple social signals. Among these, vocally expressed emotions are fundamental to human interaction (Grandjean, 2020; Sauter, 2017; Scherer, 1986; Scherer et al., 2016; A. W. Young et al., 2020). While humans perceive emotions efficiently and often automatically, interindividual differences in vocal emotion perception skills have recently become a focus of scientific attention (Mill et al., 2009; Schirmer et al., 2005). For humans, voices and music are both prominent means of auditory communication of emotions. Some researchers have emphasized the similarities in the acoustic features of certain emotions in the voice and in music (Juslin & Laukka, 2003; Scherer, 1995), whereas others reported similarities in the neural circuits involved in recognizing basic emotions from music as compared to voices (Aubé et al., 2015). Accordingly, differences in vocal emotion perception skills could be associated with different levels of musicality. Music forms a central part of human culture, and appreciation of music is interconnected with intense emotional experiences (Schäfer et al., 2013). However, there is huge variation in terms of musical aptitude and musical training. Here, we integrate the currently available research that assessed possible links between musicality—defined as sensitivity and/or talent regarding music in terms of both aptitude and training effects—and vocal emotion perception.

This review adds vocal emotion perception to several previous integrative works that discussed the association of musical training with other nonmusical abilities. Compared to nonmusical peers, musicians show superior perception of pitch (referring to the perceived frequency of a sound) and timbre (referring to the perceived quality or “color” of a sound, which allows a listener to perceive that two sounds of the same loudness and pitch can be dissimilar), as well as superior temporal processing (Kraus & Chandrasekaran, 2010), and those advantages expand from the musical into the vocal domain (Chartrand et al., 2008). Beyond the auditory modality, musicians have better audiovisual and auditory-motor integration, working memory, spatial abilities, executive functioning, general intelligence, as well as speech and language skills (Elmer et al., 2018; Schellenberg, 2001, 2016). At the brain level, musicality is associated with widespread functional and structural differences such as larger grey matter volume and stronger connectivity between areas associated with auditory, motor, and visuospatial functions (Kraus & Chandrasekaran, 2010; Pantev & Herholz, 2011). Against this background of established links between musicality and comparatively distant competencies, the field remarkably lacks systematic integration of evidence concerning the related domain of voices, and vocal emotion perception in particular.

Given the aforementioned benefits of musicality, a link between musicality and vocal emotion perception seems plausible. We considered that individual musicality may be determined by a combination of genetic (“talent”) and environmental factors (“training”), the contributions of which can be difficult to determine. Similarly, underlying mechanisms for the link between musicality and vocal emotion could be determined by both nature and nurture factors. On the nature side, some people might have an innate capacity to perceive fine-grained acoustic structures of both musical and vocal sounds, alongside with an inner drive to engage in musical activities. Accordingly, musical and vocal emotional capacities would be linked through genetic factors. This view is in line with Darwin’s protolanguage hypothesis, claiming that music and speech both evolved from the same origin, a musical protolanguage comprised of rudimental vocalizations (Darwin, 1871/2008; Thompson et al., 2012) in which the expression of vocal emotions was a key aspect of communication (Fitch, 2013). Indeed, expression of emotion in music and vocal channels seems to be based on similar acoustic cues, supporting the idea that emotional communication in both channels is intertwined (Juslin & Laukka, 2003). This gives rise to the possibility that capacities in both channels are driven by the same underlying genetic factors, and that there are innate forces that create transfer between musical and vocal capacities.

On the nurture side, it is typically assumed that musical training causes the differences that are observed in musicians and nonmusicians. From this perspective, the acoustic similarity between music and vocal emotions might be a reason why extensive training in the musical domain can lead to an improvement in vocal emotion perception. A more elaborated nurture-based approach is offered by the OPERA hypothesis (Patel, 2011). Although it was originally developed to explain music-to-speech transfer effects, it can also be considered in the context of vocal emotion perception. The OPERA hypothesis states that musical-training benefits’ transfer to other domains only occurs when five conditions are met: (1) overlap, (2) precision, (3) emotion, (4) repetition, and (5) attention. Overlap refers to shared neural networks between music and vocal emotion processing. Indeed, neuroimaging data suggest common neural networks for the processing of emotional sounds, including vocal, musical, and environmental sources (Escoffier et al., 2013; Frühholz et al., 2014, 2016; Grandjean, 2020; Schirmer et al., 2012). Core structures include the auditory cortex, the superior temporal cortex, frontal regions, the insula, the amygdala, the basal ganglia, and the cerebellum (Frühholz et al., 2016). Precision refers to the high auditory-motor demands that musical training places on these shared networks. The third condition, emotion, claims that the musical activity has to be perceived as rewarding. The subjective feeling of highly pleasurable experiences, such as “chills” or “shivers-down-the-spine,” is among the main reasons why humans engage in musical activities, and those experiences are associated with brain activity changes in regions involved in reward, emotion, and arousal—including the amygdala, the ventral striatum, midbrain structures, and orbitofrontal and ventromedial prefrontal areas (Blood & Zatorre, 2001; Stewart et al., 2006). Finally, the necessities of repetition and attention stress the point that training-induced benefits depend critically on how frequently and focused musical activity is pursued over time. It may take years of active musical engagement to observe stable differences in musicians’ brains compared to nonmusicians’ (Kraus & White-Schwoch, 2017), which can be regarded as truly reflecting training-induced changes (Elbert et al., 1995; Kraus & Chandrasekaran, 2010; Pantev & Herholz, 2011).

As it stands, the link of musicality with vocal emotion perception has received relatively little attention and is poorly understood. This seems surprising given that adequate emotion perception is crucial for well-being and perceived quality of life (Phillips et al., 2010; Schorr et al., 2009). However, to the best of our knowledge, there has been no attempt so far to integrate the existing evidence in a systematic manner. In this review, we aim at closing this gap while including the full range of musical abilities: We survey findings from highly trained musicians but also from people with exceptionally poor musical abilities. We also assess studies that investigate the role of individual differences in terms of musical interests or psychoacoustic abilities. Finally, we include music intervention studies with normal hearing individuals as well as cochlear implant (CI) users, who have significant difficulty recognizing vocal emotions due to degraded auditory input (Jiam et al., 2017), which might be improved with music-based interventions (Paquette et al., 2018).

Systematic Literature Search

We conducted parallel literature searches on Web of Science, PubMed, and PsychInfo on March 19, 2020, using the search terms “(voice OR prosody) AND (emotion* OR affect*) AND (music* OR auditory expert* OR amusi* OR auditory training).” We restricted publication language to English and considered empirical studies only. In total, the initial search returned 1,723 articles (Web of Science: N = 755; PubMed: N = 405; PsychInfo: N = 563) to which we applied the following inclusion criteria: (a) vocal emotion perception was assessed as dependent variable, (b) musicality was assessed, manipulated, or used as a defining criterion for a group-based comparison, and (c) responses were measured at the behavioral or brain level. We also screened the reference lists of the identified articles for relevant publications. This selection procedure resulted in a total of 33 articles, which we review in what follows (for a summary, please refer to Table 1). When screening the relevant literature, we noted that a substantial proportion (27 out of 33; 82%) of identified articles were published in 2011 or later, reflecting the current attention to this topic that contributed to motivating our present review.

Table 1.

Summary of Empirical Articles on Musicality and Vocal Emotion Perception that met the Selection Criteria.

Reference	Sample (mean age)	Inclusion criteria	Stimuli and task	Emotions	Results
Musical expertise – Behavioral data		Musicians
Başkent et al. (2018)	10 musicians (12 years); 11 nonmusicians (12 years)	Training onset before age 7; > 5 years of musical training; regular training within the last 3 years	Original and degraded speech, melodic contour identification, vocal emotion identification, and speech understanding in noise	Joy, anger, sadness, and relief	Musician effect only on melodic contour identification in degraded speech condition
Dmitrieva et al. (2006)	48 musicians; 46 controls (three age groups: 7–10 years, 11–13 years, and 14–17 years)	Recruited from the Russian National Orchestra, St. Petersburg	Vocal emotion identification, presentation to one ear with white noise on the other	Joy, anger, and neutral	Musicians were better and faster at emotion recognition
Fuller et al. (2014)	25 musicians (23 years); 25 nonmusicians (22 years)	Training onset before age 7; > 10 years of musical training; regular training within the last 3 years	Original and degraded (8-channel simulated) stimuli, tests on speech, vocal emotion, and melodic contour identification	Joy, anger, sadness, and relief	Musicians were better at emotion recognition in both original and degraded conditions
Lima and Castro (2011)	40 musicians; 40 nonmusicians (two age groups: 18–30 years and 40–60 years)	Instrumentalists; > 8 years of musical training; started in childhood; current regular practice	Vocal emotion identification and intensity rating	Happiness, anger, fear, sadness, surprise, disgust, and neutral	Musicians were better at emotion recognition, controlling for cognitive differences
Nilsonne and Sundberg (1985)	62 musicians (21 years); 51 controls (law students, 24 years)	Music conservatory students	Fundamental frequency contours extracted from utterances, presentation in pairs, choice which was made during depression	Depression versus recovery	Musicians made fewer errors
Parsons et al. (2014)	109 (29 years): parents/nonparents (25/26 musicians; 29/29 nonmusicians, respectively)	> 4 years of formal music training	Pairs of infant cries, only differing in pitch, participants decided which of the two infants sounded more distressed	Distressed infant cries	Advantage for parents with musical training, increasing with years of musical training, no musical training effect for nonparents
Twaite (2016)	58 (22 years) musicians; 61 nonmusicians (21 years)	Instrumentalists; > 8 years of musical training; training onset before age 12; currently > 2-hr weekly practice	Emotion perception in prosodic, lexical, facial, and musical channel	Happiness, anger, fear, sadness, surprise, disgust, and interest	Musicians were better at emotion recognition in the prosodic and musical channels
Weijkamp and Sadakata (2017)	16 musicians (29 years); 16 nonmusicians (22 years)	Instrumentalists; > 5 years of musical training; currently > 2.5-hr weekly practice	Audiovisual Stroop task, judging the emotion of either the face or the voice, plus unimodal classification tasks	Happiness, sadness, and neutral	No group differences in unimodal tasks, but musicians were better at the audiovisual tasks (for both faces and voices)
K. S. Young et al. (2012)	57 (27 years): depressed/healthy (13/15 musicians, 14/15 nonmusicians, respectively)	> 4 years of formal music training	Pairs of infant cries, only differing in pitch, participants decided which of the two infants sounded more distressed	Distressed infant cries	Effect of musical training even during depression, correlation between task score and years of musical training
Thompson et al. (2004)	Exp. 1: nine musicians; 11 nonmusicians (adults). Exp. 2: 28 musicians; 28 nonmusicians (adults). Exp. 3: 43 (all 7-year-old)	> 8 years of formal music training; training onset during childhood	Exp. 1 and 2: VER in spoken sentences or melodic analogues. Exp. 3: VER after 1 year of keyboard, singing, drama, or nothing	Happiness, anger, fear, sadness, and neutral	Exp. 1 and 2: musicians performed better than nonmusicians. Exp. 3: keyboard and drama better than nothing, no effect of singing
Musical expertise – Brain data
Nolden et al. (2017)	17 musicians (25 years); 20 nonmusicians (25 years)	> 5 years of musical training; daily instrument practice	Music and vocal stimuli, task: pure tones detection, EEG recording, oscillatory analysis (theta, alpha, beta, gamma bands)	Happiness, fear, sadness, and neutral	Effects of expertise on the alpha and theta bands (greater activation in musicians in both frequency bands)
Park et al. (2015)	12 musicians (20 years); 12 nonmusicians (19 years)	Instrumentalists; received formal music training (M_{years of training} = 13.8, SD = 2.6)	fMRI scans with passive listening followed by a session with emotion classification	Happiness, fear, sadness, and neutral	Increased activation in musicians’ frontal gyrus, posterior cingulate cortex, and the retrosplenial cortex for sad stimuli only
Pinheiro et al. (2015)	14 musicians (23 years); 14 nonmusicians (23 years)	> 8 years of musical training; daily instrument practice	Emotion classification with intelligible semantic content (SCC) or unintelligible semantic content (PPC), EEG recording	Happiness, anger, and neutral	P50 more positive in controls than in musicians in SCC only; musicians better at recognition of anger in PCC only
Rigoulot et al. (2015)	15 musicians (24 years); 18 nonmusicians (25 years)	> 5 years of musical training; daily instrument practice	Music and vocal stimuli, task: pure tones detection, EEG recording	Happiness, fear, sadness, and neutral	Differential electrophysiological response to vocal and music stimuli for musicians and nonmusicians
Strait et al. (2009)	30 adults (25 years): musicians (by onset age: 11; by years of training: 15); nonmusicians (by onset age: 19; by years of training: 15)	Musicians by onset age: training onset before age 7. Musicians by years: > 10 years musical training	Exposure to infants’ unhappy cries, recording of auditory brainstem responses during a complex and more periodic part of the stimulus	Infant cries	In musicians, subcortical activation is enhanced during the complex portion of the sound, and decreased during the more periodic portion
Congenital amusia		Amusia
Cheung et al. (2020)	20 participants with amusia (22 years); 17 controls (22 years), all Cantonese speakers	Global MBEA score < 71%	Emotion prosody rating task, emotion judgment of written words task, valence judgment of written words task	Happiness, anger, fear, and sadness	Participants with amusia performed worse in the VER task than controls; no difference in tasks with written words
Lima et al. (2016)	13 participants with amusia (58 years); 11 controls (53 years)	Pitch-based MBEA scores more than 2 SD below population average (~72%)	Exp. 1: rating of emotions in voices and faces. Exp. 2: rating of spontaneous and posed laughs	Anger, fear, sadness, disgust, relief, amusement, and pleasure; laughs	Exp. 1: impairments in amusia for all stimulus types. Exp. 2: participants with amusia showed decreased sensitivity to authenticity of laughs
Lolli et al. (2015)	Exp. 1: 40 (aged 18–22), nine considered amusic. Exp. 2: 29 (aged 18–28), three considered amusic	Pitch threshold > 16 Hz in pitch discrimination task	Exp. 1: VER, low-pass filtered (500 Hz) and unfiltered speech. Exp. 2: high-pass filtered (4800 Hz) and unfiltered speech	Happiness, fear, sadness, irritation, tenderness, and neutral	Exp. 1: impairments in amusia in the low-pass filtered but not in the unfiltered condition. Exp. 2: no differences
Pralus et al. (2019)	18 participants with amusia (33 years); 18 controls (35 years)	Global MBEA score < 23/30 (76%) and/or a MBEA pitch score < 22/30 (73%)	Emotion categorization and intensity rating of emotional sentences and vowels	Joy, anger, fear, sadness, and neutral	Worse performance of participants with amusia on vowels but not sentences. Intensity ratings: no differences
Thompson et al. (2012)	12 participants with amusia (50 years); 12 controls (46 years)	MBEA scale subtest score < 22/30 (73%) on two consecutive occasions	VER, self-report questionnaire on emotional prosody perception in daily life	Happiness, fear, sadness, tenderness, irritation, and neutral	VER: participants with amusia worse than controls; report awareness of VER problems in daily life
Zhang et al. (2018)	19 participants with amusia (23 years); 19 controls (23 years), all Mandarin speakers	Pitch-based MBEA score < 65/90 (72%)	Vocal emotion recognition, speech and nonspeech (low-pass filtered at 500 Hz) conditions	Happiness, anger, fear, sadness, surprise, and neutral	Performance of participants with amusia worse than controls for all emotions and conditions
Individual differences
Dibben et al. (2018)	52 (32 years)	–	Rating of perceived emotion in music and speech, questionnaires on personality, emotional intelligence, and musical training	Valence and arousal	Ratings associated with emotional stability, agreeableness, musical training (but for musical stimuli only), and age
Globerson et al. (2013)	Exp. 1: 60 (25 years). Exp. 2: 37 (25 years)	–	Exp. 1 and 2: prosody recognition tasks (pragmatic and emotional) and different psychoacoustic tasks	Happiness, anger, fear, and sadness	Psychoacoustic thresholds explained 31% and 38% of affective and pragmatic prosody recognition performance
Trimmer and Cuddy (2008)	100 (22 years)	–	VER of speech utterances and melodic analogues, tests on intelligence, emotional intelligence, and musical abilities	Joy, anger, fear, sadness, and neutral	Emotional intelligence, not musical training, predicted vocal emotion recognition performance
Waaramaa and Leisiö (2013)	250: 50 per country (Finland: 48 years; Russia: 35 years; Estonia: 32 years; Sweden: 27 years; US: 23 years)	–	Vocal emotion recognition task with replay option, questionnaire on musical interests	Happiness, anger, fear, sadness, surprise, disgust, interest, and neutral	Emotion recognition performance above chance in all countries; musical interests tended to have a positive effect on VER
Musical intervention – Normal hearing listeners
Bodner et al. (2012)	80 (29 years): 39 with social anxiety disorder; 41 healthy controls	–	Intervention: training in happiness recognition in music. Test: VER, pre- and postdesign	Happiness, anger, fear, sadness, and surprise	Music intervention improved recognition of vocal happiness in participants with social anxiety
Mualem and Lavidor (2015)	Exp. 1: 12 intervention group (24 years); 12 control group (25 years). Exp. 2: 23 musicians (26 years)	> 6 years of formal music training	Intervention: 4 x 30 min music sessions. Control: art session focused on emotion expression. Test: VER, pre- and postdesign	Happiness, anger, fear, sadness, and neutral	Exp. 1: intervention group better than control group in VER. Exp. 2: no difference between musicians and nonmusicians
Nashkoff (2007)	42 intervention group; 39 controls	–	Intervention: Pitch discrimination practice (8 weeks). Test: VER, pre- and postdesign	Happiness, fear, sadness, and ambiguous	VER improved after intervention
Musical intervention – Cochlear implant (CI) users
Chari et al. (2020)	18 CI users (62 years), postlingually deafened	–	Intervention: auditory-motor, auditory-only, or no training (3 months). Test: speech, music, and VER tests, pre- and postdesign	Happiness, anger, fear, sadness, and neutral	Training effects only on melodic contour identification task
Fuller et al. (2018)	19 CI users (69 years), postlingually deafened	–	Intervention: music therapy, pitch/timbre group, or control (6 weeks). Test: speech, music, and VER tests, pre- and postdesign	Joy, anger, sadness, and relief	Improvement in VER in the music therapy group; improvement in melodic contour identification in the pitch-timbre group
Good et al. (2017)	18 CI users, children (10 years)	–	Intervention: music (piano) or art training (painting), 6 months. Test: music and VER tests, pre-, mid-, and postdesign	Happiness, anger, fear, and sadness	Music perception and emotional speech prosody perception improved in the music group compared to the art group
Petersen et al. (2012)	18 CI users (53 years); six NH controls (54 years)	–	Intervention: 6-month, one-to-one musical ear training or control (no task). Test: speech, music, and VER tests, pre-, mid-, and postdesign	Happiness and sadness	In the music group, earlier onset of improvement in emotional prosody perception; both CI groups still worse than NH controls
Waaramaa et al. (2018)	25 CI users (12 years); 18 NH controls (12 years)	–	Vocal emotion identification task, self-reported musical interests, and measurements of acoustic parameters of the stimuli	Anger, fear, excitement, and contentment	NH controls performed better than CI users; musical interests and voice quality parameters related to correct identification in both groups

Note. VER = vocal emotion recognition, MBEA = Montreal Battery of Evaluation of Amusia (Peretz et al., 2003), NH = Normal hearing, SCC = semantic content condition, PPC = pure prosody condition.

Review of Identified Literature

After screening the identified literature, it became apparent that studies could be grouped according to different operationalizations of musicality, which focused either on individual differences that existed before the study was conducted or on experimental interventions seeking to create such differences via controlled treatment designs. Following this structure, we first review the evidence concerning individual differences, in three sections focusing on (a) differences between musicians and nonmusicians, (b) data on individuals with congenital amusia, and (c) correlations of vocal emotion recognition with either musical interests or psychoacoustic abilities. Subsequently, (d) we discuss effects of musical training interventions on vocal emotion perception.

Differences in Vocal Emotion Perception Between Musicians and Nonmusicians

Behavioral data

Several studies found a musician effect on vocal emotion recognition in the adult population. The presumably first study comes from Nilsonne and Sundberg (1985), where music students outperformed law students in judging whether a vocal sample was recorded during a depressive period of the speaker or not. These findings were replicated by Thompson et al. (2004). Musically trained participants performed better than untrained participants at categorizing emotional tone sequences extracted from vocal utterances. In a second experiment, however, which included additional sentences in familiar and unfamiliar languages, the effects were less straightforward: musicians’ performance was better for sad, fearful, and neutral, but not for happy and angry prosody. As a caveat, musical training was associated with differences in cognitive abilities too, limiting conclusions from this study. This issue was addressed by Lima and Castro (2011), who compared highly trained musicians with nonmusicians in two age groups (18–30 and 40–60 years). Musicians outperformed their nonmusical peers similarly across emotions and age groups, even when effects were controlled for cognitive differences. Further, similar patterns of misattributions and acoustic cue utilization were observed in both groups, suggesting that group differences were of a quantitative rather than a qualitative nature.

Evidence of a musician effect on vocal emotion perception in children and adolescents is less clear. Dmitrieva et al. (2006) tested vocal emotion perception by musicians and controls in children of three age groups (7–10, 11–13, and 14–17 years). Musicians outperformed their nonmusical peers, but this effect was mainly driven by the youngest group. This finding could indicate either that the limited musical experience in very young children allowed innate aptitude to be more visible or that musical experience might promote an earlier development of emotion sensitivity. Başkent et al. (2018) also studied adolescent musicians and nonmusicians. Their work was motivated by Fuller et al. (2014), who conducted a similar design with adults. Both studies compared emotion recognition performance of unprocessed and degraded speech intended to reproduce the spectro-temporal degradation experienced by CI users (Başkent et al., 2018). Whereas Fuller et al. (2014) found a small musician advantage in both conditions, Başkent et al. (2018) did not, potentially due to limited test power with a much smaller sample compared to Fuller et al.’s (2014). Statistical comparison of both experiments revealed significant age effects, suggesting a maturation of the auditory system during emerging adulthood (Başkent et al., 2018).

At an even earlier stage of development, sensitivity to vocal prosody is of particular relevance in parent–infant interactions, where parents’ adequate behavior crucially depends on their capacity to infer the infants’ needs from the sound of their cries. Parsons et al. (2014) found that parents’ musicality could foster a positive parent–child interaction due to enhanced sensitivity to the infants’ emotional state. K. S. Young et al. (2012) used the same paradigm on musicians and nonmusicians with and without depression to show that musicality can potentially protect against compromised auditory sensitivity towards the infant during a depression period.

Several studies raised the question whether emotional sensitivity in musicians is restricted to the auditory domain. Twaite (2016) compared performance of musicians and nonmusicians for prosodic, lexical, facial, and musical emotions, and reported a musician advantage for the prosodic and the musical channels only, indicating that the musicians’ advantage was limited to the auditory modality. However, Weijkamp and Sadakata (2017) reported somewhat conflicting results. In this study, musicians performed better in an audiovisual task, which might point towards more efficient cross-modal integration.

Up to now, most studies comparing musicians and nonmusicians support the hypothesis that musicians have an advantage in vocal emotion perception. This advantage seems to be of a quantitative rather than a qualitative nature (Lima & Castro, 2011; Twaite, 2016). However, the degree to which this advantage is moderated by factors such as innate musicality, the amount of musical training, age at training onset, or maturation of the auditory system all remain subjects for future research.

Brain data

A small number of studies investigated differences between musicians and nonmusicians with respect to the brain basis of vocal emotional processing. Existing work on brainstem potentials and auditory-evoked responses in the electroencephalogram (EEG) suggests that effects of musicality can be observed in very early stages of vocal emotion processing. Strait et al. (2009) recorded brainstem potentials evoked by acoustically simple and complex portions of an infant’s cry, and reported an intriguing interaction between musical expertise and stimulus complexity: Compared to controls, musicians showed reduced responses in the simple, but increased responses in the complex portion of the sound. The authors interpret these findings as indicating that (a) musical expertise results in fine neural tuning to acoustic features that are important to vocal communication, and (b) subcortical mechanisms contribute to vocal emotion perception. At the cortical level, Pinheiro et al. (2015) and Rigoulot et al. (2015) recorded event-related potentials (ERPs) and found that modulatory effects of musical expertise can be observed in early stages of cortical processing before 100 ms (P50, N100), as well as in later stages (P200). Nolden et al. (2017) reanalyzed the data of Rigoulot et al. (2015) with a focus on induced oscillatory activity, and found larger induced power for musicians in the theta (4–8 Hz) and the alpha bands (8–12 Hz).

Using functional magnetic resonance imaging (fMRI), Park et al. (2015) showed that musicians exhibited increased activation in frontal areas, the posterior cingulate cortex, and the retrosplenial cortex. However, these differences were observed for sad stimuli only. Accordingly, the authors hypothesized that sadness might be of “higher affective saliency” for musicians.

Taken together, the existing work on brainstem potentials and EEG suggests that modulatory effects of musicality can be observed in very early processing steps of vocal emotions, which are associated with a basic analysis of auditory cues and allocation of emotional significance (Schirmer & Kotz, 2006). Neuroimaging complements this by implicating brain regions associated with higher order functions, such as evaluative judgements or empathic engagement (Park et al., 2015). Overall, neuroscientific research on links between musicality and vocal emotion perception is still in its infancy, although it clearly has potential to shed light on the underlying mechanisms of musicians’ enhanced ability to process vocal emotions.

Impairments of Vocal Emotion Perception in Individuals With Congenital Amusia

To gain a full understanding of the association between musicality and vocal emotion perception, it is worthwhile to consider the entire performance spectrum by including individuals with exceptionally poor musical abilities, such as in congenital amusia. Individuals with congenital amusia display a perceptual disorder specific to the musical domain, in the presence of normal hearing and otherwise intact cognition (Ayotte et al., 2002; Stewart et al., 2006). Congenital amusia is usually measured using the Montreal Battery of Evaluation of Amusia (MBEA; Peretz et al., 2003). Thompson et al. (2012) were the first to show poorer vocal prosody recognition in participants with amusia compared to controls, and further observed a certain degree of awareness of their perceptual limitations in daily life. Lolli et al. (2015) suggested that a core problem could be poor pitch (F0) perception: although participants with suspected amusia performed similar to controls on emotion perception from unfiltered or high-pass filtered (4800 Hz) utterances, they performed poorer for low-pass filtered (500 Hz) utterances, which presumably degraded timbre while preserving pitch information. Corroborating a selective deficit in pitch perception, Pralus et al. (2019) found that controls and participants with amusia exhibited comparable emotion recognition for whole sentences, but participants with amusia performed worse for vowels. Of relevance, perceived emotional intensity was comparable in both groups for all stimuli, which was interpreted as preserved implicit processing of emotional prosody in amusia.

Lima et al. (2016) took a cross-modal approach: in two experiments, they tested adults with amusia and matched controls on their ability to identify emotions in different types of vocal stimuli and silent facial expressions. Participants with amusia were found to be impaired in the auditory and the visual domain, implying more universal emotion processing difficulties. Zhang et al. (2018) and Cheung et al. (2020) were interested in relationships between amusia and emotional prosody processing in tonal languages. Compared to controls, they reported poorer performance in participants with amusia, both with a Mandarin-speaking and a Cantonese-speaking background, disconfirming the hypothesis that tonal language acquisition might compensate for pitch processing deficits in participants with amusia. Taken together, published findings on amusia paint a fairly consistent picture, suggesting that musical impairments transfer to vocal emotion perception, and that impairments for vocal emotions may originate in poor pitch perception.

Correlation of Vocal Emotion Perception With Musical Interests or Psychoacoustic Abilities

Complementing studies on extreme groups, other researchers measured normal interindividual variation in musicality in the general population, to link it to variability in vocal emotion perception. These studies result in conflicting findings. The most compelling evidence against such a link may have been provided by Trimmer and Cuddy (2008). Their correlational analysis of 100 participants revealed that musicality, as assessed via MBEA scores, was not associated with vocal emotion perception, a finding they replicated with another 92 participants. Trimmer and Cuddy (2008) concluded that emotion perception in music and the voice is not linked via auditory sensitivity but rather via a supramodal emotional processor. This finding conflicts with many results discussed before, and provoked large debates in the field. For instance, Lima and Castro (2011) argued that participants in the study had only 6.5 years of musical training on average, which might have been insufficient to observe a significant effect. However, Dibben et al. (2018) also failed to observe a relationship between musical interests and moment-to-moment reports of perceived emotion in longer (2–3 min) vocal excerpts. As a limitation, musical interests were assessed with a single dichotomous item in this study, which hardly captured fine-grained interindividual variation in musicality.

Other studies reported positive correlations. Globerson et al. (2013) did not assess full-scale musical ability, but psychoacoustic measures of sensitivity to pitch were found to predict vocal emotion perception performance. This highlights the importance of subtle pitch variations for emotional prosody perception, in line with the impairments found in amusia discussed in the previous section. Finally, Waaramaa and Leisiö (2013) investigated the link between musical interests and emotional prosody perception in a large-scale, cross-cultural study across five different countries (Estonia, Finland, Russia, Sweden, US). Musical interests tended to have a positive effect on vocal emotion identification, but as in Dibben et al. (2018), this finding was based on very few self-reported items only.

In summary, correlational studies on the link between musicality and vocal emotion perception have yielded conflicting results. This could be due to substantial differences in the assessment of musicality across studies, ranging from musical tests to short questionnaires. Indeed, the use of standardized and validated instruments for the assessment of musical interests is desirable, as this should promote better comparability across future studies, and contribute to resolving remaining controversies.

Effects of Musical Training Interventions on Vocal Emotion Perception

Apart from comparing individual differences in musicality, the effectiveness of musical interventions was the focus of several studies, which we review in this section. Note that designs with randomized assignments to intervention and control conditions are particularly valuable in the context of the nature/nurture debate, as they permit to de-confound training effects from self-selection effects when seeking musical education. Our literature survey indicated that intervention studies could be grouped into interventions for normal hearing listeners, on the one hand, and interventions for hearing-impaired individuals with cochlear implants, on the other hand.

Intervention-based studies on normal hearing individuals

A few studies suggest the effectiveness of musical training interventions for normal hearing individuals. Thompson et al. (2004) randomly assigned six-year-old children to one year of training in keyboard, singing, drama, or no lesson. Postintervention, the drama and keyboard groups outperformed the no-lesson group in vocal emotion perception. Perhaps surprisingly, this effect was not found in the singing group. Thompson et al. (2004) speculated that singing may have trained vocal production of pitch contours over time that conflicts with natural prosodic use of the voice. Nashkoff (2007) reported that simple pitch perception training alone can improve speech prosody decoding skills, but only for already highly trained musicians. Another attempt to show the effectiveness of musical interventions was made by Mualem and Lavidor (2015), who assigned participants either to music-based or visual-art-based interventions, which focused explicitly on expression of emotions in the respective domain. After only four sessions, an improvement in vocal emotion recognition performance was observed in the music compared to the art group. However, when both groups were compared to a group of highly trained musicians, no performance differences were found. This could suggest that the effectiveness of the intervention partially reflected “training to the test,” as the intervention explicitly focused on emotions. Finally, Bodner et al. (2012) reported that a music-based intervention improved recognition of happiness in patients with social anxiety disorder (SAD), who often display a persistent bias towards negative emotions. Although these findings need further verification, they suggest that perceptual biases in affective disorders may be attenuated by musical interventions. Together, the body of literature on interventions suggests musical training effects, but it is still sparse for the normal hearing population. Of interest, musical training/interventions have been studied more intensely in the field of hearing rehabilitation for cochlear implant users.

Intervention-based studies on cochlear implant users

All studies reviewed in this section included vocal emotion perception as a part of larger test batteries to assess musical training effects on voice, speech, and music perception in cochlear implant (CI) users. Petersen et al. (2012) recruited CI users within 14 days after implantation. Half of them received a 6-month musical ear training. While distinct improvements were observed for musical perception, the pattern was less clear for vocal emotions: the intervention group showed an earlier onset of improvement but the endpoints were comparable. As a qualification, the freshly implanted CI users in this study were in speech therapy during the intervention, which could have interfered with the musical training. In contrast, Fuller et al. (2018) studied adult CI users with a minimum of one year postimplantation, who were randomly assigned to either (a) a pitch/timbre group that received receptive training, (b) a music therapy group with face-to-face sessions including active music production, or (c) a control group with nonmusical activities, over a period of 6 weeks. Crucially, vocal emotion recognition improved only in the music therapy group, emphasizing the importance of active musical engagement and/or social interaction for training success. Similarly, Chari et al. (2020) also studied adult CI users with at least one year of implant experience, and assigned them to auditory-motor, auditory-only, or no training, over a period of 3 months. However, there was no effect on vocal emotion perception even though the intervention period was about twice as long as in Fuller et al. (2018). Notably, both studies used very small sample sizes, with less than 10 participants per group. While findings are intriguing and potentially important, they call for further exploration and replication with more powerful designs, particularly when effect sizes and statistical power are not (yet) routinely reported.

Only one study investigated the role of musical training in children with CIs, aged 6 and 15 years (Good et al., 2017). Improvements in vocal emotion perception were found after 6 months of piano lessons compared to a visual art training. The authors concluded that musical training might be an effective supplement to auditory rehabilitation in children. In addition to intervention-based approaches, Waaramaa et al. (2018) showed that self-reported musical interests—especially a preference for dancing—predicted vocal emotion perception capacity in CI users. Taken together, findings emphasize an importance of active musical engagement, as compared to pure receptive training, in order to promote recovery of emotion perception after cochlear implantation.

Discussion

While the transfer of musicality to speech perception abilities is well documented, the transfer to emotion perception attracted substantial scientific interest only recently. Overall, while associations between musicality and vocal emotion perception ranged from strongly positive to absent, the majority of studies supported the idea that musicality is associated with better vocal emotion perception capacities. Both studies with highly trained musicians, on the one hand, and with individuals with amusia, on the other hand, suggest that musical capacities are positively associated with vocal emotion recognition. Correlational analyses with varying degrees of musicality in a normal population revealed less consistent results, presumably partly due to their methodological heterogeneity. Musical intervention studies are still sparse but illustrate great potential to improve vocal emotion perception capacities both in the normal hearing population and in cochlear implant users. In the following lines, we will first discuss potential moderators by evaluating the effectiveness of active versus receptive musical training, and the role of different acoustic cues signaling emotionality in both music and the voice. We then will discuss how these studies inform us about the contribution of nature and nurture factors to this link.

Active Engagement in Musical Activities Versus Receptive Training

Several studies suggest that active engagement in a musical task is a crucial factor. They compared purely receptive training to auditory-motor training, and reported stable benefits of auditory-motor training in the vocal emotion domain (Chari et al., 2020; Fuller et al., 2018). There is high consensus in the neuroscientific literature that active engagement in music and the synchronized tuning of auditory, visual, somatosensory, and motor processes is a driving force to adaptive neuroplasticity (Kraus & Chandrasekaran, 2010; Kraus & White-Schwoch, 2017; Palomar-García et al., 2017). Specifically, it has been shown that sensorimotor musical training leads to more robust changes in the auditory cortex compared to pure receptive training (Lappe et al., 2008). This surely does not imply that purely receptive music training is ineffective (Bigand & Poulin-Charronnat, 2006), but motor engagement may add a boost to the auditory fine-tuning process during training. This could be of particular relevance for cochlear implant users (Lehmann & Paquette, 2015), who during rehabilitation face the challenge of massive postimplantation adaptation to the new auditory input. Here, auditory-motor interventions could be particularly efficient in fostering neuroplasticity in auditory areas, and in aiding hearing rehabilitation.

The Role of Different Acoustic Cues and Supramodal Processes

Previous literature suggests that musicians show superior processing of auditory cues (Elmer et al., 2018). The present review reveals that superior pitch processing capacities in people with high levels of musicality are particularly tightly associated with vocal emotion perception. On the one hand, pitch discrimination performance was correlated with emotion perception performance in musicians; on the other hand, there was strong agreement that impaired pitch processing was a key deficit in people with amusia, accounting for impairments in the domain of vocal emotions. However, this conclusion has its limitations since amusia was often defined only based on low scores on the pitch subsets of the MBEA (see Table 1). According to Juslin and Laukka (2003), pitch and timbre cues are highly relevant for vocal emotion perception, but timing parameters like speech rate were found to be equally important. The potential role of timing was largely neglected in all reviewed studies, despite its central role in music, therein often referred to as tempo and rhythm. Lagrois and Peretz (2019) showed that although pitch and rhythm deficits are often linked in people with amusia, they sometimes can appear as distinct disorders. In parallel, there is current evidence of different brain mechanisms processing pitch-related versus timing-related structures in music (Sun et al., 2020). In the future, it would be very informative to investigate vocal emotion perception in people with specific impairments related to the temporal domain of music.

Alongside the notion that enhanced sensitivity to acoustic cues may lead to better emotion perception in people with a higher level of musicality, it was also suggested that there might be a domain-general supramodal process that mediates the link between musicality and emotional perception across domains (Schellenberg & Mankarious, 2012; Trimmer & Cuddy, 2008). Lima and Castro (2011) suggested that musical training might increase the level of “emotional granularity,” meaning a more fine-grained conceptualization and differentiation of emotions that, in turn, could aid emotional perception in other domains. However, although the involvement of supramodal processes seems plausible, the reviewed brain data suggest that modulatory effects of musicality can be observed in very early vocal emotion processing steps, which are associated with a basic analysis of auditory cues and detection of emotional saliency (Schirmer & Kotz, 2006). Hence, the link between musicality and vocal emotion perception seems to be based, at least partially, on a more sophisticated analysis of auditory cues.

Nature and Nurture

Musicality in people emerges from a combination of genetic and environmental factors. Likewise, the observed link between musicality and vocally expressed emotions could be either explained by a dispositional sensitivity to the musical and the vocal channels or by a transfer from musical training effects into the vocal domain. Additionally, conditions of nature and nurture interact in individuals, making it difficult to estimate the degree of their respective contributions. Unfortunately, apart from few randomized treatment studies that potentially isolated training effects, all reviewed articles established correlational designs or studied preexisting groups, and thus cannot provide direct evidence of the relative contributions of nature or nurture conditions. Nevertheless, it is worthwhile to consider implications of certain findings for this debate.

Without exception, all the studies on people with amusia suggested that vocal emotion perception deficits can be associated with a congenital music perception impairment. In that sense, the link between musicality and vocal emotion perception seems to occur in the absence of training effects and might therefore be mediated by genetic factors. These may have evolved in parallel with acoustic similarities between vocal and musical emotions (Juslin & Laukka, 2003), and may be expressed in overlapping neural circuits involved in recognizing basic emotions in voices and music (Frühholz et al., 2016). As a qualification, Bigand and Poulin-Charronnat (2006) showed that a remarkable degree of auditory sophistication can be acquired through exposure to music only, without explicit training. Accordingly, it remains possible that these implicit musical learning processes could be limited in people with amusia if they avoid exposure to music because they enjoy it less. Thus, while the limited vocal emotion perception capacities observed in amusia could hint to a genetic predisposition, they could also result in part from selective exposure.

At the same time, a consistent set of findings in highly trained musicians suggests that explicit musical training does play a central role in the development of auditory and vocal perceptual skills. This points to an influence of environmental factors but there is always a possible confound with natural inclination, as people with better auditory skills may be more likely to start and pursue musical training (Pantev & Herholz, 2011). Accordingly, Dmitrieva et al. (2006) observed superior vocal emotion perception capacities in a very young group of musicians, who presumably had very little musical training yet but might have been selected for musical education based on their auditory sensitivity. Note that many authors who found the musician effect on vocal emotion perception argued that it is very unlikely that it is entirely based on predispositional differences (Lima & Castro, 2011; Strait et al., 2009; Thompson et al., 2004): On the one hand, the effect was still present when participants were matched in socioeducational variables, general intelligence, cognitive control, and personality traits (Lima & Castro, 2011). On the other hand, some studies found a correlation between emotion perception capacities and years of musical education, suggesting a clear impact of training duration (Parsons et al., 2014; Twaite, 2016; K. S. Young et al., 2012). However, this could also reflect a gene–environment interaction since people who have a dispositional aptitude might stick longer to the training. Further, vocal emotion perception was found to be related to age at training onset (Strait et al., 2009). Although it is often difficult to disentangle age at onset from years of musical training, this could suggest a sensitive period for the acquisition of some music-training-induced skills. Accordingly, many studies required musicians to have started training before the age of 7 (see Table 1).

Finally, a few intervention studies with randomized assignment to treatment and control groups aimed at isolating learning effects of musical training and succeeded to improve vocal emotion perception in a healthy population. Note that these interventions were qualitatively different from more “natural” settings of musical education where the focus lies on mastery of an instrument or the singing voice. They were shorter and often particularly focused on emotion expression in music (Bodner et al., 2012; Mualem & Lavidor, 2015), except for Thompson et al. (2004), who randomly assigned children to 1 year of keyboard or singing lessons, but found mixed results. Likewise, studies on cochlear implant users showed that musical training can improve vocal emotion perception in this particular group, but, again, those interventions had an entirely different purpose than for normal hearing participants: instead of fine-tuning a healthy auditory system, CI users have to learn how to restore perception from a severely degraded input. Music-based interventions may be particularly effective in groups with poor auditory resolution to improve sensitivity to auditory cues in the vocal domain (Fuller et al., 2018; Good et al., 2017). Overall, while it may be difficult to generalize the results of these intervention studies to settings of instrumental or vocal music lessons, they show that vocal emotion perception can be improved through musical training in some circumstances. Accordingly, it seems worthwhile to incorporate emotionally oriented teaching units in music lessons or treatment programs.

Identification of Relevant Topics for Future Research

The findings surveyed in this article highlight many relevant aspects that can guide future research on relationships between musicality and vocal emotion perception. We hope this article will inform systematic research programs with better powered designs and standardized research materials, and ultimately promote a refined understanding of the putative common mechanisms underlying musicality and vocal emotion perception. Unfortunately, neuroscientific research in this field is still sparse and unsystematic, and the heterogeneity of the previous studies illustrates the need for more systematic research on candidate subcortical and cortical mechanisms to mediate the link between musicality and vocal emotion perception (for a recent review on subcortical and cortical mechanisms of nonverbal voice perception, see Frühholz & Schweinberger, 2021). Important questions for neuroscientific research include how perceptual neuroplasticity is induced in musicians, how this relates to motor plasticity in musicians’ brains (Elbert et al., 1995), what are the relative roles of training or ongoing maintenance (Merrett et al., 2013), and how each of these aspects relates to vocal emotion perception. Moreover, a particularly relevant comparison in the context of vocal emotion perception is the one between singers and instrumentalists. Most of the reviewed studies only included instrumentalists or did not report on that matter. Only Thompson et al. (2004) compared participants with piano and singing lessons, and their results suggested that singing lessons might even hinder vocal emotion perception, perhaps because the vocal patterns that are trained during singing lessons may conflict with natural vocal emotion expression. Another neglected but related field is the link between musicality and emotion production. It may be reasonable to assume that people who are highly trained in an emotionally expressive art have an advantage in vocal expression of emotion.

Conclusion and Outlook

In this review, we systematically identified and discussed the current state of research on the link between musicality and vocal emotion perception. Overall, the available evidence suggests that musicality is indeed associated with better vocal emotion perception performance. Since adequate perception of vocal emotions is a fundamental prerequisite for everyday social interaction, these results also may add weight to the presumed importance of music and musical education for personal development and quality of life. Musical training can provide a promising supplemental intervention for people who struggle with vocal emotion perception, and while supporting evidence can now be considered strong in the case of cochlear implant users, future applied research seems promising in the context of other target groups as well (e.g., individuals with autism or with auditory impairments compromising social communication). Although data often do not allow for causal inferences, their combined consideration can provide useful information on the question of how different factors of nature and nurture contribute to related skills of emotion perception in the domains of voice and music.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Author note:

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: C. N. research has been supported by the German National Academic Foundation (Studienstiftung des Deutschen Volkes). We thank Annett Schirmer for her helpful and critical comments. We are grateful to Susan Müller and Frank Nussbaum for their support in the preparation of the manuscript.

ORCID iDs

Christine Nussbaum

Stefan R. Schweinberger

References

Aubé

Angulo-Perkins

Peretz

Concha

Armony

J. L.

(2015). Fear across the senses: Brain responses to music, vocalizations and facial expressions. Social Cognitive and Affective Neuroscience, 10(3), 399–407.

Ayotte

Peretz

Hyde

(2002). Congenital amusia: A group study of adults afflicted with a music-specific disorder. Brain, 125(Pt. 2), 238–251. https://doi.org/10.1093/brain/awf028

Başkent

Fuller

C. D.

Galvin

J. J.

Schepel

Gaudrain

Free

R. H.

(2018). Musician effect on perception of spectro-temporally degraded speech, vocal emotion, and music in young adolescents. The Journal of the Acoustical Society of America, 143(5), EL311–EL316. https://doi.org/10.1121/1.5034489

Bigand

Poulin-Charronnat

(2006). Are we “experienced listeners”? A review of the musical capacities that do not depend on formal musical training. Cognition, 100(1), 100–130.

Blood

A. J.

Zatorre

R. J.

(2001). Intensely pleasurable responses to music correlate with activity in brain regions implicated in reward and emotion. Proceedings of the National Academy of Sciences of the USA, 98(20), 11818–11823.

Bodner

Aharoni

Iancu

(2012). The effect of training with music on happiness recognition in social anxiety disorder. Journal of Psychopathology and Behavioral Assessment, 34(4), 458–466. https://doi.org/10.1007/s10862-012-9304-7

Chari

D. A.

Barrett

K. C.

Patel

A. D.

Colgrove

T. R.

Jiradejvong

Jacobs

L. Y.

Limb

C. J.

(2020). Impact of auditory-motor musical training on melodic pattern recognition in cochlear implant users. Otology & Neurotology, 41(4), e422–e431. https://doi.org/10.1097/MAO.0000000000002525

Chartrand

J.-P.

Peretz

Belin

(2008). Auditory recognition expertise and domain specificity. Brain Research, 1220, 191–198. https://doi.org/10.1016/j.brainres.2008.01.014

Cheung

Y. L.

Zhang

(2020). Emotion processing in congenital amusia: The deficits do not generalize to written emotion words. Clinical Linguistics & Phonetics. Advance online publication. https://doi.org/10.1080/02699206.2020.1719209

10.

Darwin

(1871/2008). The descent of man, and selection in relation to sex. Princeton University Press.

11.

Dibben

Coutinho

Vilar

J. A.

Estévez-Pérez

(2018). Do individual differences influence moment-by-moment reports of emotion perceived in music and speech prosody? Frontiers in Behavioral Neuroscience, 12, Article 184. https://doi.org/10.3389/fnbeh.2018.00184

12.

Dmitrieva

E. S.

Gel’man

V. Y.

Zaitseva

K. A.

Orlov

A. M.

(2006). Ontogenetic features of the psychophysiological mechanisms of perception of the emotional component of speech in musically gifted children. Neuroscience and Behavioral Physiology, 36(1), 53–62. https://doi.org/10.1007/s11055-005-0162-6

13.

Elbert

Pantev

Wienbruch

Rockstroh

Taub

(1995). Increased cortical representation of the fingers of the left hand in string players. Science, 270(5234), 305–307.

14.

Elmer

Dittinger

Besson

(2018). One step beyond: Musical expertise and word learning. In Frühholz

Belin

(Eds.), The Oxford handbook of voice perception (pp. 209–235). Oxford University Press.

15.

Escoffier

Zhong

Schirmer

Qiu

(2013). Emotional expressions in voice and music: Same code, same effect? Human Brain Mapping, 34(8), 1796–1810.

16.

Fitch

W. T.

(2013). Musical protolanguage: Darwin’s theory of language evolution revisited. In Bolhuis

J. J.

Everaert

(Eds.), Birdsong, speech, and language: Exploring the evolution of mind and brain (pp. 489–504). The MIT Press.

17.

Frühholz

Schweinberger

S. R.

(2021). Nonverbal auditory communication – Evidence for integrated neural systems for voice signal production and perception. Progress in Neurobiology, 199, Article 101948. https://doi.org/10.1016/j.pneurobio.2020.101948

18.

Frühholz

Trost

Grandjean

(2014). The role of the medial temporal limbic system in processing emotions in voice and music. Progress in Neurobiology, 123, 1–17. https://doi.org/10.1016/j.pneurobio.2014.09.003

19.

Frühholz

Trost

Kotz

S. A.

(2016). The sound of emotions – Towards a unifying neural network perspective of affective sound processing. Neuroscience & Biobehavioral Reviews, 68, 96–110.

20.

Fuller

C. D.

Galvin

J. J.

Maat

Başkent

Free

R. H.

(2018). Comparison of two music training approaches on music and speech perception in cochlear implant users. Trends in Hearing, 22, 1–22. https://doi.org/10.1177/2331216518765379

21.

Fuller

C. D.

Galvin

J. J.

Maat

Free

R. H.

Başkent

(2014). The musician effect: Does it persist under degraded pitch conditions of cochlear implant simulations? Frontiers in Neuroscience, 8, Article 179. https://doi.org/10.3389/fnins.2014.00179

22.

Globerson

Amir

Golan

Kishon-Rabin

Lavidor

(2013). Psychoacoustic abilities as predictors of vocal emotion recognition. Attention, Perception & Psychophysics, 75(8), 1799–1810. https://doi.org/10.3758/s13414-013-0518-x

23.

Good

Gordon

K. A.

Papsin

B. C.

Nespoli

Hopyan

Peretz

Russo

F. A.

(2017). Benefits of music training for perception of emotional speech prosody in deaf children with cochlear implants. Ear and Hearing, 38(4), 455–464. https://doi.org/10.1097/AUD.0000000000000402

24.

Grandjean

(2020). Brain networks of emotional prosody processing. Emotion Review. Advance online publication. https://doi.org/10.1177/1754073919898522

25.

Jiam

N. T.

Caldwell

Deroche

M. L.

Chatterjee

Limb

C. J.

(2017). Voice emotion perception and production in cochlear implant users. Hearing Research, 352, 30–39. https://doi.org/10.1016/j.heares.2017.01.006

26.

Juslin

P. N.

Laukka

(2003). Communication of emotions in vocal expression and music performance: Different channels, same code? Psychological Bulletin, 129(5), 770–814. https://doi.org/10.1037/0033-2909.129.5.770

27.

Kraus

Chandrasekaran

(2010). Music training for the development of auditory skills. Nature Reviews Neuroscience, 11(8), 599–605.

28.

Kraus

White-Schwoch

(2017). Neurobiology of everyday communication: What have we learned from music? The Neuroscientist: A Review Journal Bringing Neurobiology, Neurology and Psychiatry, 23(3), 287–298. https://doi.org/10.1177/1073858416653593

29.

Lagrois

M.-É.

Peretz

. (2019). The co-occurrence of pitch and rhythm disorders in congenital amusia. Cortex, 113, 229–238.

30.

Lappe

Herholz

S. C.

Trainor

L. J.

Pantev

(2008). Cortical plasticity induced by short-term unimodal and multimodal musical training. Journal of Neuroscience, 28(39), 9632–9639.

31.

Lehmann

Paquette

(2015). Cross-domain processing of musical and vocal emotions in cochlear implant users. Frontiers in Neuroscience, 9, Article 343. https://doi.org/10.3389/fnins.2015.00343

32.

Lima

C. F.

Brancatisano

Fancourt

Müllensiefen

Scott

S. K.

Warren

J. D.

Stewart

. (2016). Impaired socio-emotional processing in a developmental music disorder. Scientific Reports, 6, Article 34911. https://doi.org/10.1038/srep34911

33.

Lima

C. F.

Castro

S. L.

(2011). Speaking to the trained ear: Musical expertise enhances the recognition of emotions in speech prosody. Emotion, 11(5), 1021–1031. https://doi.org/10.1037/a0024521

34.

Lolli

S. L.

Lewenstein

A. D.

Basurto

Winnik

Loui

(2015). Sound frequency affects speech emotion perception: Results from congenital amusia. Frontiers in Psychology, 6, Article 1340. https://doi.org/10.3389/fpsyg.2015.01340

35.

Merrett

D. L.

Peretz

Wilson

S. J.

(2013). Moderating variables of music training-induced neuroplasticity: A review and discussion. Frontiers in Psychology, 4, Article 606. https://doi.org/10.3389/fpsyg.2013.00606

36.

Mill

Allik

Realo

Valk

(2009). Age-related differences in emotion recognition ability: A cross-sectional study. Emotion, 9(5), 619–630. https://doi.org/10.1037/a0016562

37.

Mualem

Lavidor

(2015). Music education intervention improves vocal emotion recognition. International Journal of Music Education, 33(4), 413–425. https://doi.org/10.1177/0255761415584292

38.

Nashkoff

(2007). The relationship between pitch discrimination skills and speech prosody decoding skills [Doctoral dissertation, Walden University]. ProQuest Dissertations and Theses Global.

39.

Nilsonne

Å.

Sundberg

. (1985). Differences in ability of musicians and nonmusicians to judge emotional state from the fundamental frequency of voice samples. Music Perception, 2(4), 507–516. https://doi.org/10.2307/40285316

40.

Nolden

Rigoulot

Jolicoeur

Armony

J. L.

(2017). Effects of musical expertise on oscillatory brain activity in response to emotional sounds. Neuropsychologia, 103, 96–105. https://doi.org/10.1016/j.neuropsychologia.2017.07.014

41.

Palomar-García

M.-Á.

Zatorre

R. J.

Ventura-Campos

Bueichekú

Ávila

. (2017). Modulation of functional connectivity in auditory-motor networks in musicians compared with nonmusicians. Cerebral Cortex (New York, N.Y.: 1991), 27(5), 2768–2778. https://doi.org/10.1093/cercor/bhw120

42.

Pantev

Herholz

S. C.

(2011). Plasticity of the human auditory cortex related to musical training. Neuroscience and Biobehavioral Reviews, 35(10), 2140–2154. https://doi.org/10.1016/j.neubiorev.2011.06.010

43.

Paquette

Ahmed

G. D.

Goffi-Gomez

M. V.

Hoshino

A. C. H.

Peretz

Lehmann

(2018). Musical and vocal emotion perception for cochlear implants users. Hearing Research, 370, 272–282.

44.

Park

Gutyrchik

Welker

Carl

Pöppel

Zaytseva

Meindl

Blautzik

Reiser

Bao

(2015). Sadness is unique: Neural processing of emotions in speech prosody in musicians and non-musicians. Frontiers in Human Neuroscience, 8, Article 1049. https://doi.org/10.3389/fnhum.2014.01049

45.

Parsons

C. E.

Young

K. S.

Jegindø

E.-M. E.

Vuust

Stein

Kringelbach

M. L.

(2014). Music training and empathy positively impact adults’ sensitivity to infant distress. Frontiers in Psychology, 5, Article 1440. https://doi.org/10.3389/fpsyg.2014.01440

46.

Patel

A. D.

(2011). Why would musical training benefit the neural encoding of speech? The OPERA hypothesis. Frontiers in Psychology, 2, Article 142. https://doi.org/10.3389/fpsyg.2011.00142

47.

Peretz

Champod

A. S.

Hyde

(2003). Varieties of musical disorders: The Montreal Battery of Evaluation of Amusia. Annals of the New York Academy of Sciences, 999(1), 58–75.

48.

Petersen

Mortensen

M. V.

Hansen

Vuust

(2012). Singing in the key of life: A study on effects of musical ear training after cochlear implantation. Psychomusicology: Music, Mind, and Brain, 22(2), 134–151. https://doi.org/10.1037/a0031140

49.

Phillips

L. H.

Scott

Henry

J. D.

Mowat

Bell

J. S.

(2010). Emotion perception in Alzheimer’s disease and mood disorder in old age. Psychology and Aging, 25(1), 38–47. https://doi.org/10.1037/a0017369

50.

Pinheiro

A. P.

Vasconcelos

Dias

Arrais

Gonçalves

Ó. F.

(2015). The music of language: An ERP investigation of the effects of musical training on emotional prosody processing. Brain and Language, 140, 24–34. https://doi.org/10.1016/j.bandl.2014.10.009

51.

Pralus

Fornoni

Bouet

Gomot

Bhatara

Tillmann

Caclin

(2019). Emotional prosody in congenital amusia: Impaired and spared processes. Neuropsychologia, 134, Article 107234. https://doi.org/10.1016/j.neuropsychologia.2019.107234

52.

Rigoulot

Pell

M. D.

Armony

J. L.

(2015). Time course of the influence of musical expertise on the processing of vocal and musical sounds. Neuroscience, 290, 175–184. https://doi.org/10.1016/j.neuroscience.2015.01.033

53.

Sauter

D. A.

(2017). The nonverbal communication of positive emotions: An emotion family approach. Emotion Review, 9, 222–234.

54.

Schäfer

Sedlmeier

Städtler

Huron

(2013). The psychological functions of music listening. Frontiers in Psychology, 4, Article 511. https://doi.org/10.3389/fpsyg.2013.00511

55.

Schellenberg

E. G.

(2001). Music and nonmusical abilities. Annals of the New York Academy of Sciences, 930, 355–371. https://doi.org/10.1111/j.1749-6632.2001.tb05744.x

56.

Schellenberg

E. G.

(2016). Music training and nonmusical abilities. In Hallam

Cross

Thaut

(Eds.), The Oxford handbook of music psychology (Vol. 2, pp. 415–429). Oxford University Press.

57.

Schellenberg

E. G.

Mankarious

(2012). Music training and emotion comprehension in childhood. Emotion, 12(5), 887–891. https://doi.org/10.1037/a0027971

58.

Scherer

K. R.

(1986). Vocal affect expression: A review and a model for future research. Psychological Bulletin, 99(2), 143–165. https://doi.org/10.1037/0033-2909.99.2.143

59.

Scherer

K. R.

(1995). Expression of emotion in voice and music. Journal of Voice, 9(3), 235–248.

60.

Scherer

K. R.

Banse

Wallbott

H. G.

(2016). Emotion inferences from vocal expression correlate across languages and cultures. Journal of Cross-Cultural Psychology, 32(1), 76–92. https://doi.org/10.1177/0022022101032001009

61.

Schirmer

Fox

P. M.

Grandjean

(2012). On the spatial organization of sound processing in the human temporal lobe: A meta-analysis. NeuroImage, 63(1), 137–147.

62.

Schirmer

Kotz

S. A.

(2006). Beyond the right hemisphere: Brain mechanisms mediating vocal emotional processing. Trends in Cognitive Sciences, 10(1), 24–30. https://doi.org/10.1016/j.tics.2005.11.009

63.

Schirmer

Striano

Friederici

A. D.

(2005). Sex differences in the preattentive processing of vocal emotional expressions. NeuroReport, 16(6), 635–639.

64.

Schorr

E. A.

Roth

F. P.

Fox

N. A.

(2009). Quality of life for children with cochlear implants: Perceived benefits and problems and the perception of single words and emotional sounds. Journal of Speech, Language, and Hearing Research, 52(1), 141–152. https://doi.org/10.1044/1092-4388(2008/07-0213)

65.

Stewart

von Kriegstein

Warren

J. D.

Griffiths

T. D.

(2006). Music and the brain: Disorders of musical listening. Brain, 129(10), 2533–2553. https://doi.org/10.1093/brain/awl171

66.

Strait

D. L.

Kraus

Skoe

Ashley

(2009). Musical experience and neural efficiency: Effects of training on subcortical processing of vocal expressions of emotion. The European Journal of Neuroscience, 29(3), 661–668. https://doi.org/10.1111/j.1460-9568.2009.06617.x

67.

Sun

Thompson

W. F.

Liu

Zhou

Jiang

(2020). The human brain processes hierarchical structures of meter and harmony differently: Evidence from musicians and nonmusicians. Psychophysiology, 57(9), Article e13598. https://doi.org/10.1111/psyp.13598

68.

Thompson

W. F.

Marin

M. M.

Stewart

(2012). Reduced sensitivity to emotional prosody in congenital amusia rekindles the musical protolanguage hypothesis. Proceedings of the National Academy of Sciences of the USA, 109(46), 19027–19032. https://doi.org/10.1073/pnas.1210344109

69.

Thompson

W. F.

Schellenberg

E. G.

Husain

(2004). Decoding speech prosody: Do music lessons help? Emotion, 4(1), 46–64. https://doi.org/10.1037/1528-3542.4.1.46

70.

Trimmer

C. G.

Cuddy

L. L.

(2008). Emotional intelligence, not music training, predicts recognition of emotional speech prosody. Emotion, 8(6), 838–849. https://doi.org/10.1037/a0014080

71.

Twaite

(2016). Examining relationships between basic emotion perception and musical training in the prosodic, facial, and lexical channels of communication and in music [Doctoral dissertation, City University of New York]. CUNY Academic Works.

72.

Waaramaa

Kukkonen

Mykkänen

Geneid

(2018). Vocal emotion identification by children using cochlear implants, relations to voice quality, and musical interests. Journal of Speech, Language, and Hearing Research, 61(4), 973–985. https://doi.org/10.1044/2017_JSLHR-H-17-0054

73.

Waaramaa

Leisiö

(2013). Perception of emotionally loaded vocal expressions and its connection to responses to music. A cross-cultural investigation: Estonia, Finland, Sweden, Russia, and the USA. Frontiers in Psychology, 4, Article 344. https://doi.org/10.3389/fpsyg.2013.00344

74.

Weijkamp

Sadakata

(2017). Attention to affective audio-visual information: Comparison between musicians and non-musicians. Psychology of Music, 45(2), 204–215. https://doi.org/10.1177/0305735616654216

75.

Young

A. W.

Frühholz

Schweinberger

S. R.

(2020). Face and voice perception: Understanding commonalities and differences. Trends in Cognitive Sciences, 24(5), 398–410. https://doi.org/10.1016/j.tics.2020.02.001

76.

Young

K. S.

Parsons

C. E.

Stein

Kringelbach

M. L.

(2012). Interpreting infant vocal distress: The ameliorative effect of musical training in depression. Emotion, 12(6), 1200–1205. https://doi.org/10.1037/a0028705

77.

Zhang

Geng

Zhang

(2018, September 2-6). Emotional prosody perception in Mandarin-speaking congenital amusics. In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH) 2018 (pp. 2196–2200). Hyderabad. https://doi.org/10.21437/Interspeech.2018-91

Links Between Musicality and Vocal Emotion Perception

Abstract

Keywords

Introduction

Systematic Literature Search

Review of Identified Literature

Differences in Vocal Emotion Perception Between Musicians and Nonmusicians

Behavioral data

Brain data

Impairments of Vocal Emotion Perception in Individuals With Congenital Amusia

Correlation of Vocal Emotion Perception With Musical Interests or Psychoacoustic Abilities

Effects of Musical Training Interventions on Vocal Emotion Perception

Intervention-based studies on normal hearing individuals

Intervention-based studies on cochlear implant users

Discussion

Active Engagement in Musical Activities Versus Receptive Training

The Role of Different Acoustic Cues and Supramodal Processes

Nature and Nurture

Identification of Relevant Topics for Future Research

Conclusion and Outlook

Footnotes

Declaration of Conflicting Interests

Author note:

ORCID iDs

References