Abstract
The question of how hearing loss and hearing rehabilitation affect patients’ momentary emotional experiences has received little attention but has considerable potential to affect patients’ psychosocial function. This article is a product of the Hearing, Emotion, Amplification, Research, and Training workshop, which was convened to develop a consensus document describing research on emotion perception relevant for hearing research. This article outlines conceptual frameworks for the investigation of emotion in hearing research; available subjective, objective, neurophysiologic, and peripheral physiologic data acquisition research methods; the effects of age and hearing loss on emotion perception; potential rehabilitation strategies; priorities for future research; and implications for clinical audiologic rehabilitation. More broadly, this article aims to increase awareness about emotion perception research in audiology and to stimulate additional research on the topic.
Introduction
One of the roles of hearing care professionals is to work with patients and families to mitigate the negative effects of hearing loss. This is accomplished by identifying, assessing, diagnosing, and treating adults and children with hearing loss, with the goal of fostering communication and psychosocial function (e.g., American Academy of Audiology, 2004; American Speech-Language-Hearing Association, 2004). Mitigating the effects of hearing loss is especially important for older adults because age-related hearing loss is one of the most common chronic conditions associated with aging (Chien & Lin, 2012). Indeed, prevalence estimates suggest nearly one third of older adults have hearing loss (Baltes & Mayer, 2001; Lin, Thorpe, Gordon-Salant, & Ferrucci, 2011). Acquired hearing loss can have considerable consequences, such as reduced speech audibility (Humes, 2007; Plomp, 1986; Sherbecoe & Studebaker, 2003) and increased cognitive load (McCoy et al., 2005; Rabbitt, 1991; Tun, McCoy, & Wingfield, 2009). These consequences of hearing loss can also have downstream psychosocial sequelae, including increased risk of depressive symptoms (Cacciatore et al., 1999; Monzani, Galeazzi, Genovese, Marrara, & Martini, 2008), increased social isolation (e.g., Mick, Kawachi, & Lin, 2014; Perissinotto, Cenzer, & Covinsky, 2012; Stam et al., 2016), and reduced quality of life, evidenced on both generic and hearing-specific measures (Bess, Lichtenstein, Logan, & Burger, 1989; Chia et al., 2007; Chisolm et al., 2007; Dalton et al., 2003; Gopinath et al., 2012; Meyer, Hickson, Lovelock, Lampert, & Khan, 2014). Consequently, it is of considerable interest not only to improve audibility and communication but also to reduce the negative psychosocial consequences of hearing loss.
As a field, audiology has made great strides in the consideration of patients’ emotional responses in clinical situations, such as when delivering difficult news (Donald & Kelly-Campbell, 2016; English, Mendel, Rojeski, & Hornak, 1999), considering the role of significant others (Meyer et al., 2014; Scarinci, Worrall, & Hickson, 2009; Singh, Lau, & Pichora-Fuller, 2015; Stark & Hickson, 2004), and evaluating the role of hearing loss in psychosocial well-being (Mener, Betz, Genther, Chen, & Lin, 2013; Mulrow et al., 1990; Pronk et al., 2011). One area that has received less attention, but has considerable potential to affect patients’ psychosocial function, is how hearing loss affects patients’ momentary emotional experiences. Historically, laboratory investigations have almost exclusively used emotionally neutral stimuli. Little is known about how emotionally charged signals affect listeners with hearing loss; for example, can they identify subtle hints of anger in a talker’s voice? Can listeners with hearing loss control their own affective tone? Do happy sounds, such as a baby cooing or laughter, make them feel as happy as someone listening with normal hearing? Because the answers to these questions, and questions like them, have the potential to significantly affect clinical outcomes for adults with hearing loss, a workshop was convened by researchers with interest and experience in studying emotional communication in listeners with normal and impaired hearing.
The Hearing, Emotion, Amplification, Research, and Training (HEART) workshop was held in April 2017 at Vanderbilt University in Nashville, Tennessee. The purpose of this workshop was to come to a consensus on what is known about the topic, to identify gaps in the existing knowledge, and to suggest priorities for future research. Because research in this area of audiology is in its infancy, the workshop discussions not only included work specific to listeners with hearing loss but also considered research from other fields that typically include participants with normal hearing, including psychology, neuroscience, and computer science. The central themes identified and discussed at the workshop included the following: (a) definitions of emotion perception, (b) appropriate methods and materials, (c) effects of age and hearing loss on emotion, (d) the role of interventions, (e) future directions, and (f) clinical implications. The purpose of this article is to summarize the workshop consensus. Although this article details the HEART workshop discussion and represents the consensus of the participants, it is not intended to be a comprehensive review of the literature or a meta-analysis. Instead, the intent is for this article to provide a framework for researchers and clinicians to think about patients’ auditory emotional experiences and a springboard for future research. The article is organized around the central workshop themes; each is discussed in turn.
Defining Emotion Perception
The word “emotion” commonly refers to the psychological and physiological reactions to sensory stimuli. Because a standardized definition of “emotion” across, and even within, disciplines remains elusive, the working group operationally defined emotion based on the definition proffered by Mulligan and Scherer (2012). Based on their philosophy and psychology backgrounds, Mulligan and Scherer suggest that the term emotion should be reserved for short, momentary affective episodes that are directed toward objects (e.g., things, organisms, events, behaviors, or memories) that elicit changes in the body that may be felt and appraised. The authors argue that longer affective states such as moods or predispositions should be categorized as distinct “affective phenomena.”
Classification Systems
Several models have been proposed to classify emotion, and no single system is currently supported by consensus. Instead, the classification of emotions is often based on one of two general systems, which can be useful for conceptualizing emotion, developing measurement techniques, and evaluating the effects of hearing loss or rehabilitation interventions. The following are the two common classification systems: (a) categorical systems, which suggest all emotions can be described with a few basic descriptors (e.g., anger, fear, sadness, enjoyment) and (b) dimensional systems, which explain the variability in emotions with two dimensions (e.g., valence, arousal) and sometimes additional dimensions (e.g., dominance).
Categorical systems
Many experts agree that emotional states can be described based on a set of basic emotions in isolation or in combination. Categorical systems exhibit high face validity, are intuitive and well accepted, and account for much of the variability in participant reports of emotions. That is, when asked to describe emotions, many people will spontaneously produce the descriptors used by categorical theories. A limitation of categorical systems is that the descriptions are inherently tied to a shared, cultural vocabulary. In addition, there is some disagreement about the number and the categories of the basic emotions. For example, Ekman (1992) proposes that the six basic emotions are anger, fear, disgust, sadness, enjoyment, and surprise. In contrast, Izard (1977) proposes 10 basic emotions: interest, joy, surprise, sadness, anger, disgust, contempt, fear, shame, and guilt. Accordingly, most studies on listeners’ perception of auditory emotion have tested between 4 (e.g., Rodero, 2011; Sobin & Alpert, 1999) and 10 emotions (e.g., Linnankoski, Leinonen, Vihla, Laakso, & Carlson, 2005; Sauter, Eisner, Calder, & Scott, 2010a). While a larger set of emotions can be more descriptive, it also adds to the cognitive load involved in making a response. Adding cognitive load may make it difficult to disambiguate challenges that are primarily cognitive in nature from challenges that are primarily sensory.
Dimensional systems
Unlike categorical systems, dimensional systems can be less dependent on vocabulary. Instead, emotions are described based on a combination of two- or three-dimensional continua. Two dimensions repeatedly emerge: arousal (exciting vs. calming) and valence (pleasant vs. unpleasant). A third dimension, most commonly dominance, has less consistently contributed meaningfully to the classification system (e.g., Bradley & Lang, 1994; Russell, 1980; Russell & Mehrabian, 1977; Watson & Foyle, 1985). Valence can be defined as the hedonic dimension of emotion, ranging from pleasant to unpleasant. Arousal can be defined as the mobilization of energy, ranging from calm to excited. Although semantically orthogonal (an emotion could be high or low on either dimension), emotions that score near the extremes on the valence dimension (i.e., are very pleasant or very unpleasant) are also more likely to score higher on the arousal dimension (e.g., Bradley & Lang, 2000).
Interindividual and Intraindividual Emotion Perception
The working group conceptualized emotion perception as consisting of two types of perception, which are separate, albeit related: (a) how a person perceives or witnesses emotion in others (interindividual perception) and (b) how a person experiences the emotion himself or herself (intraindividual perception). For the remainder of the article, these two types of perception will be considered separately, as each has different methodologies and potentially distinct effects of hearing loss. However, the working group recognizes the significant overlap between witnessing an emotion and experiencing an emotion. For example, facial mimicry of witnessed emotion may be automatic in some contexts (Chan, Livingstone, & Russo, 2013; Hatfield, Cacioppo, & Rapson, 1993; Hoffman, 1984). People also adopt body behaviors congruent with the witnessed emotion (Hess, Blairy, & Philippot, 1999). Evidence suggests that the facial feedback from mimicry can influence an observer to experience the witnessed emotion (Cappella, 1993; Hatfield et al., 1993; Livingstone, Vezer, McGarry, Lang, & Russo, 2016). Although some research has raised important questions regarding the extent to which mimicry can influence experienced or recognized emotions (Hess & Blairy, 2001), the cumulative findings demonstrate that inter- and intraindividual emotion perception are not strictly independent from each other.
Interindividual emotion perception occurs when a person witnesses, observes, recognizes, or identifies emotions in someone or something else. For example, how well can a listener identify that their communication partner is happy or angry? Humans convey a range of emotions in the way they speak to communicate quickly and efficiently (Jang & Elfenbein, 2015). Expressing emotions in speech has three main purposes (Bühler, 1934). First, it can serve as a symptom, allowing the listener to know how the speaker feels (e.g., a squeal of delight when receiving good news). Second, it can be a signal, asking the listener to take action or conveying the speaker’s intent to act (e.g., a shriek of fear that asks for help; Scherer, 1995). Third, it can be a symbol that allows the listener to understand an object or event (e.g., a sigh signifying frustration).
Previous research suggests that typically developing populations are able to correctly identify emotions well above chance levels, for faces (Ebner, Riediger, & Lindenberger, 2010) and vocal cues (Bachorowski, 1999). Accuracy is usually higher for emotion recognition on faces than vocal emotion recognition (Borod et al., 2000; Pell, 2002; Wallbott & Scherer, 1986). Facial emotion recognition is generally around 80% (Ebner et al., 2010; Scherer, Banse, Wallbott, & Goldbeck, 1991), and vocal emotion identification is around 60% (Borod et al., 2000; Scherer, 1995), although the range of accuracies for vocal emotion across studies is from about 60% (Juslin & Laukka, 2001) to more than 80% correct (Lima, Alves, Scott, & Castro, 2014).
Importantly, absolute performance varies considerably depending on the number of emotions presented and response options provided in a task. Some emotions appear to be easier to identify, while others are more difficult. Anger and sadness are relatively easy to identify (Banse & Scherer, 1996; Johnson, Emde, Scherer, & Klinnert, 1986; Juslin & Laukka, 2001; Linnankoski et al., 2005; Paulmann, Pell, & Kotz, 2008; Rodero, 2011; Scherer et al., 1991), whereas disgust (Juslin & Laukka, 2001; Scherer et al., 1991) and surprise (Ebner et al., 2010; Paulmann et al., 2008; Sauter et al., 2010a) tend to be more difficult. The findings for fear are inconsistent, with some studies concluding that fear is well identified (Juslin & Laukka, 2001; Linnankoski et al., 2005; Sobin & Alpert, 1999), while others suggest it is poorly identified (Paulmann et al., 2008; Scherer et al., 1991) relative to other emotions. The hierarchy is generally similar for auditory, visual, and auditory-visual modes (Most & Aviner, 2009), although the identification of happy may be different for facial and vocal emotion (Sen, Isaacowitz, & Schirmer, 2017). Specifically, Dupuis and Pichora-Fuller (2015) report emotion recognition accuracy of voices was lower for happiness than for most other emotions; it is well accepted that happiness is the most readily identifiable emotion on faces (Ruffman, Henry, Livingstone, & Phillips, 2008).
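Because absolute accuracy depends on the number of response options, it can be helpful to express an identification score relative to the guessing rate for the task at hand. The following Python sketch is illustrative only and is not drawn from any of the cited studies. It assumes equally likely response options and independent trials; the function names and the example listener are hypothetical.

```python
from math import comb


def chance_level(n_options: int) -> float:
    """Proportion correct expected from guessing among equally likely options."""
    if n_options < 2:
        raise ValueError("a closed-set task needs at least two response options")
    return 1.0 / n_options


def p_at_least(n_correct: int, n_trials: int, p_chance: float) -> float:
    """Exact one-sided binomial probability of n_correct or more hits by guessing."""
    return sum(
        comb(n_trials, k) * p_chance**k * (1.0 - p_chance) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )


# Hypothetical listener: 30 of 60 trials correct (50% correct).
# The same raw score sits further above chance as the response set grows.
for n_options in (4, 7, 10):
    p0 = chance_level(n_options)
    print(n_options, round(p0, 3), p_at_least(30, 60, p0))
```

Under these assumptions, a score of 50% correct is further from the guessing rate in a 10-alternative task than in a 4-alternative task, which is one reason raw percent correct is difficult to compare across studies that use different emotion sets.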
In the auditory domain, emotions are conveyed through different combinations of acoustic cues, with some emotions overlapping more than others (Linnankoski et al., 2005; Sauter et al., 2010a). For instance, anger, despair, and elation all have high mean F0 and high intensity, while sadness and shame have low F0 and low intensity (Banse & Scherer, 1996; Pell, Paulmann, Dara, Alasseri, & Kotz, 2009). Fear tends to exhibit a high F0 and limited F0 variability (Pell et al., 2009), whereas surprise exhibits high F0 and high F0 variability (Laukkanen, Vilkman, Alku, & Oksanen, 1996; Pell et al., 2009). Interestingly, Pell et al. (2009) report the acoustic cues important for recognition of emotional prosody translate across several languages (English, German, Hindi, and Arabic). These results demonstrate the strength of acoustic cues across language and cultural variables, although cross-cultural recognition might be specific to negative emotions (Sauter, Eisner, Ekman, & Scott, 2010b).
Despite the commonalities across studies in the acoustics underlying interindividual emotion perception, the expression of an emotion can be variable within and across talkers (Spackman, Brown, & Otto, 2009). Differences between talkers can interact with the expressed emotion, possibly making the differences between talkers more noticeable than the differences between emotions. For example, Dupuis and Pichora-Fuller (2015) found a significant interaction between emotion and talker on an emotion recognition task; accuracy for an older talker was better than for a younger talker when both were portraying “anger.” Conversely, accuracy was higher for happiness and sadness when these emotions were portrayed by the younger talker compared with the older talker. This finding is consistent with a body of literature demonstrating effects of talker demographics, such as age (Ebner et al., 2010; Sen et al., 2017) and gender (Chatterjee et al., 2015; Zuckerman, Lipets, Koivumaki, & Rosenthal, 1975) on interindividual emotion perception, particularly as they interact with characteristics of the listener such as age and gender (Ebner et al., 2010; Riediger, Voelkle, Ebner, & Lindenberger, 2011; Thompson & Voyer, 2014).
Intraindividual emotion perception refers to a person’s reactions to and experience of listening to or viewing a stimulus containing emotion information. That is, does a person experience happiness when listening to uplifting music or laughter? This type of emotion might also be referenced as an emotional response, elicited emotion, or emotional reactivity. The motivational theory of emotion suggests emotional responses serve two distinct purposes, depending on the valence (Bradley, Codispoti, Cuthbert, & Lang, 2001; Lang, 1995; Taylor, 1991). Aversive or unpleasant stimuli prepare a body for immediate action (e.g., running away from danger), whereas pleasant stimuli are appetitive, encourage approach behavior, and enhance a person’s well-being. Emotional responses might also have implications for speech recognition and cognition. Unpleasant stimuli can improve speech recognition (Dupuis & Pichora-Fuller, 2010) and facilitate focused attention (e.g., Baumeister, Bratslavsky, Finkenauer, & Vohs, 2001; Kensinger, 2009). Pleasant stimuli have larger effects in studies where an appetitive or broader attention might be beneficial, such as stress recovery (Alvarsson, Wiens, & Nilsson, 2010; Annerstedt et al., 2013; Ulrich et al., 1991) or creative thinking (Fredrickson, 2001).
Unlike interindividual emotion perception, the acoustic cues underlying intraindividual emotion perception are less understood. Arousal ratings generally have a clearer relationship with acoustic cues than valence ratings (Schmidt, Janse, & Scharenborg, 2016b). High-pitched, high-amplitude speech and music both carry higher levels of arousal (Goudbeek & Scherer, 2010; Ilie & Thompson, 2006; Laukka, Juslin, & Bresin, 2005; Ma & Thompson, 2015; Weninger, Eyben, Schuller, Mortillaro, & Scherer, 2013). However, there is relatively weak and mixed evidence for the acoustic encoding of valence (Banse & Scherer, 1996; Goudbeek & Scherer, 2010; Juslin & Laukka, 2001; Laukka et al., 2005; Picou, 2016b). Indeed, the relationship between acoustic cues and valence depends on whether the stimulus is vocal emotion, music, or nonspeech sounds. For example, Weninger et al. (2013) found louder speech was more unpleasant than quieter speech, whereas in music, louder music was associated with higher ratings of pleasantness. Similarly, F0 is inversely related to perceived pleasantness in speech (Schmidt et al., 2016b) but is not related to valence in music or other nonspeech sounds (Weninger et al., 2013).
The Role of Cognition
Although an extended review of the topic was beyond the scope of the HEART workshop, the working group recognized the important contributions of cognition to emotion perception. Significant research attention has been paid to untangling the relationship between emotion and cognition. There is evidence that supports the idea of functional specialization in the brain whereby regions can be considered as primarily “affective” or “cognitive,” thus leading to the notion that emotion and cognition operate, to some extent, independently of each other (i.e., the affective independence hypothesis; Zajonc, 2000). Consistent with the independence hypothesis, identification of emotion has been reported to be rapid, automatic, and nearly effortless for emotions conveyed by the face (Kiss & Eimer, 2008; Tracy & Robins, 2008) and the voice (Lima, Anikin, Monteiro, Scott, & Castro, 2018; Sauter & Eimer, 2010). In contrast, there is also considerable evidence to support connectionist models of the brain, suggesting that emotion and cognition behaviors arise from interactions of networks of brain regions previously considered more specialized (for a review, see Pessoa, 2008). Consistent with the connectionist models, cognitive decline has been shown to negatively affect interindividual emotion recognition ability (Dyck & Denver, 2003; Lambrecht, Kreifelts, & Wildgruber, 2014; Lima et al., 2014). Thus, hearing researchers investigating emotion perception should be aware of potential linkages between emotion and cognition and that cognitive processes may mediate or moderate affective experiences, particularly when listening to complex stimuli (Schirmer & Kotz, 2006). Moving forward, it is anticipated that research on emotion in hearing will often employ methodologies that ask participants to perform judgment and decision-making tasks. Accordingly, it may be important to consider individual differences in cognition (for a review, see Pichora-Fuller et al., 2016).
Methods and Materials
Depending on the research question, the classification scheme, or the type of emotion perception under study, a variety of methodologies and stimuli can be appropriate. The following is a review of some of the methodologies that members of the working group have direct experience with or that are recognized as valid indices of emotion perception, which could be applied to auditory emotion.
Evaluation of Interindividual Emotion Perception
Subjective methods
Although there are several questionnaires that assess the emotional consequences associated with hearing loss (e.g., Hearing Handicap Inventory for the Elderly; Ventry & Weinstein, 1982), we are aware of only one questionnaire that assesses the potential effect of hearing loss on emotional communication. The Emotional Communication in Hearing Questionnaire (EMO-CHeQ; Singh et al., in press) is a 16-item scale assessing vocal emotion hearing difficulties in four subdomains: (a) characteristics of encountered talkers (e.g., voices on television), (b) communication in challenging listening situations (e.g., noisy environments), (c) speech production (e.g., the ability to convey emotion in a subtle manner using one’s own voice), and (d) the associated impact of such deficits on socioemotional well-being (e.g., social isolation). Singh et al. (in press) report that for individuals with hearing loss in unaided conditions, the EMO-CHeQ significantly correlates with vocal emotion-identification performance measured behaviorally (r = −.64).
Objective methods (behavioral)
Objective evaluations of a listener’s ability to identify or recognize emotion in others are often based on a listener’s performance on a recognition or an identification task. These performance-based methodologies involve presenting stimuli and asking participants to identify or recognize the expressed emotion. Responses are typically closed set (participants choose from a limited number of emotions). Outcomes are usually reported in percent correct or rationalized arcsine units (Studebaker, 1985), reflecting the accuracy of identified emotion. In audiology, closed set response formats have been commonly used to investigate the effects of hearing loss and amplification on emotion perception in others (e.g., Chatterjee et al., 2015; Luo, Fu, & Galvin, 2007; Most & Aviner, 2009; Orbelo, Grim, Talbott, & Ross, 2005).
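For readers unfamiliar with rationalized arcsine units (RAU), the transform can be sketched in a few lines of Python. The sketch below uses the commonly cited form of the transform attributed to Studebaker (1985); the function name and the example scores are hypothetical, and the constants should be checked against the original publication before use in analysis.

```python
import math


def rau(n_correct: int, n_items: int) -> float:
    """Rationalized arcsine units for n_correct out of n_items.

    theta = asin(sqrt(X / (N + 1))) + asin(sqrt((X + 1) / (N + 1)))
    RAU   = (146 / pi) * theta - 23
    """
    if n_items <= 0 or not 0 <= n_correct <= n_items:
        raise ValueError("require 0 <= n_correct <= n_items and n_items > 0")
    theta = math.asin(math.sqrt(n_correct / (n_items + 1))) + math.asin(
        math.sqrt((n_correct + 1) / (n_items + 1))
    )
    return (146.0 / math.pi) * theta - 23.0


# Hypothetical scores on a 50-item emotion identification task.
for n_correct in (0, 10, 25, 40, 50):
    print(n_correct, round(rau(n_correct, 50), 1))
```

The transform is intended to stabilize variance near the floor and ceiling of the scale while remaining close to percent correct in the midrange, which is why transformed scores can fall slightly below 0 or above 100.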
Evaluation of Intraindividual Emotion Perception
Subjective methods
While it is beyond the scope of this work to describe self-report measures of emotion in depth, it should be noted that a number of self-report measures exist for assessing subjective intraindividual experiences (for a review, see Mauss & Robinson, 2009). Subjective evaluations generally follow one of the classification systems outlined earlier, categorical or dimensional. Categorical methodologies involve asking participants to label the emotion they are experiencing, either in general or in response to a specific stimulus. To standardize the types of responses elicited, researchers often use validated scales with lists of adjectives that participants can use to rate their emotion. For example, the Differential Emotions Scale (Izard, 1977) includes a 30-item checklist with three adjectives for each of Izard’s 10 identified primary emotions (interest, joy, surprise, sadness, anger, disgust, contempt, fear, shame, and guilt). Rather than using a binary checklist of adjectives where participants check the adjectives that reflect their emotion, some investigators use a scale, asking participants to rate the extent to which they identify with a particular adjective or emotion. For example, the Positive Affect and Negative Affect Scales (Watson, Clark, & Tellegen, 1988) consist of 20 adjectives that describe positive and negative affect. Participants rate the extent to which they feel an adjective from 1 (very slightly or not at all) to 5 (extremely).
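As an illustration of how a rating-scale adjective measure is scored, the following Python sketch sums ratings into separate positive and negative affect totals. It assumes the 20 adjectives are split evenly into 10 positive and 10 negative items; the item wording below is given as an assumption and should be verified against the published instrument, and the function name is hypothetical.

```python
from typing import Mapping, Tuple

# Assumed item lists; verify against the published scale before use.
POSITIVE = ("interested", "excited", "strong", "enthusiastic", "proud",
            "alert", "inspired", "determined", "attentive", "active")
NEGATIVE = ("distressed", "upset", "guilty", "scared", "hostile",
            "irritable", "ashamed", "nervous", "jittery", "afraid")


def score_panas(ratings: Mapping[str, int]) -> Tuple[int, int]:
    """Return (positive_affect, negative_affect) totals, each from 10 to 50."""
    for item in POSITIVE + NEGATIVE:
        if item not in ratings:
            raise ValueError(f"missing rating for '{item}'")
        if ratings[item] not in (1, 2, 3, 4, 5):
            raise ValueError(f"rating for '{item}' must be an integer from 1 to 5")
    positive = sum(ratings[item] for item in POSITIVE)
    negative = sum(ratings[item] for item in NEGATIVE)
    return positive, negative


# Hypothetical participant: moderate positive affect, minimal negative affect.
example = {item: 3 for item in POSITIVE}
example.update({item: 1 for item in NEGATIVE})
print(score_panas(example))
```

Keeping the two totals separate, rather than combining them into a single difference score, reflects the fact that the scale reports positive and negative affect as distinct quantities.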
Methodologies based on dimensional schemes generally involve asking participants to rate their emotion on each dimension under investigation, typically valence and arousal. The Self-Assessment Manikin (SAM; Bradley & Lang, 1994) can be used to assist participants in rating emotions along dimensions. The SAM provides pictorial representations of the dimensions (valence, arousal); each includes schematic figures expressing the range of each dimension. For example, the valence figures range from a frowning face at the far left to a smiling face at the far right. Participants rate their emotion on a scale of 1 to 9 for each dimension, where 1 reflects a low score on the dimension (low valence, low arousal) and 9 reflects a high score on the dimension (high valence, high arousal). The SAM also includes a third dimension, dominance, which reflects the extent to which someone feels in control or is being dominated. Other nonverbal, self-report measurement tools include the ProEmo (Desmet, 2003) and an emotion monitor (Aaker, Stayman, & Hagerty, 1986; Baumgartner, Sujan, & Padgett, 1997). The ProEmo, similar to the SAM, includes images of 14 cartoons whose face and body are portraying an emotion. Emotion monitors, on the other hand, involve eliciting a continuous rating along a single dimension of a particular emotion. For example, a participant could rate their experienced valence during a 6-s music clip by drawing a line along a paper during stimulus presentation, moving it to the left when he or she felt more pleasant and to the right when he or she felt more unpleasant.
Objective methods (peripheral physiologic)
Peripheral physiological measures provide insight into some of the processes underlying the perception of emotional speech. Unlike subjective or behavioral measures of intraindividual emotion, physiologic measures allow for continuous tracking of emotion and do not rely on introspection or shared vocabulary. The valence dimension of speech emotion is best reflected in activity of the zygomaticus major (i.e., smiling) and corrugator supercilii (i.e., frowning) muscles of the face. Positively valenced speech tends to elicit increases in zygomaticus activity and decreases in corrugator activity, while negatively valenced speech tends to elicit the reverse pattern (Hietanen, Surakka, & Linnankoski, 1998; Livingstone et al., 2016; Magnée, Stekelenburg, Kemner, & de Gelder, 2007). Peripheral physiological correlates of the arousal dimension of speech emotion might include measures such as heart rate, respiration, and galvanic skin response. For example, Nespoli, Goy, Singh, and Russo (2018) found that increases in arousal were associated with increases in galvanic skin responses. In the context of music, increases in arousal have been associated with increases in heart rate, respiration, and galvanic skin response (e.g., Baumgartner, Esslen, & Jäncke, 2006; Etzel, Johnsen, Dickerson, Tranel, & Adolphs, 2006; Iwanaga, Ikeda, & Iwaki, 1996; Krumhansl, 1997; Sandstrom & Russo, 2010).
One of the challenges involved with interpreting changes in peripheral physiologic measures is that their relationship with acoustic stimuli is likely to be multifactorial. For example, activation of the corrugator muscle may be the result of emotional contagion (e.g., spontaneous reaction to sad speech), emotional induction (e.g., felt response that emerges after extended listening to sad speech), or a startle response to an aversive stimulus. Similarly, and with relevance for hearing aids, an increase in the galvanic skin level may be due to emotional contagion, emotional induction, or a transient increase in stimulus intensity (Turpin & Siddle, 1979).
Objective methods (neural physiologic)
Although a wide range of neural measures have been used to understand the mechanisms underlying emotion processing, comparatively little work has used such measures to track the dynamics of emotional response to auditory stimuli. At least three magnetoencephalography/electroencephalography measures appear to be well suited to this task. First, the lateralization of fronto-cortical activity in the alpha band appears to be dynamically related to the valence of emotional speech (Bekkedal, Rossi, & Panksepp, 2011; Demaree, Everhart, Youngstrom, & Harrison, 2005). Approach responses to vocal affective stimuli tend to be more left-lateralized, whereas avoidance responses tend to be more right-lateralized. Similar results have been obtained by tracking the hemodynamic response using functional near-infrared spectroscopy (Balconi, Grippa, & Vanutelli, 2015). As an optical method, this may prove to be particularly useful in testing listeners with hearing aids and cochlear implants. Second, time-locked electrophysiological activity such as the N300 component may serve as a measure of spontaneous emotion recognition from vocal cues (Bostanov & Kotchoubey, 2004). It seems feasible that the magnitude or latency of this event-related potential component will vary as a function of hearing loss or amplification. Finally, the mu rhythm (8–13 Hz) is an endogenous neural oscillation that may have relevance for understanding bottom-up and top-down factors involved in emotional speech perception. This rhythm originates from frontal and parietal sources and is thought to reflect sensorimotor network activity that underpins an internal simulation of observed action. The mu rhythm is desynchronized during perception of vocal activity (Bowers, Saltuklaroglu, Harkrider, Wilson, & Toner, 2014; Jenson, Harkrider, Thornton, Bowers, & Saltuklaroglu, 2015; Lévêque & Schön, 2013; McGarry, Pineda, & Russo, 2015). The extent of desynchronization has been shown to be greater in adverse listening conditions (Cuellar, Bowers, Harkrider, Wilson, & Saltuklaroglu, 2012) and when a listener is specifically asked to evaluate vocal emotion (McGarry et al., 2015).
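Two of the measures described above are often summarized as simple indices computed from band power. The Python sketch below is illustrative only: the cited studies may have used different computations, the input values are hypothetical, and the formulas are common conventions (percentage power change relative to baseline for desynchronization, and a difference of log-transformed alpha power for frontal asymmetry) rather than methods taken from this article.

```python
import math


def desynchronization_percent(power_event: float, power_baseline: float) -> float:
    """Percentage change in band power relative to baseline.

    Negative values indicate desynchronization (a power decrease),
    positive values indicate synchronization (a power increase).
    """
    if power_baseline <= 0 or power_event < 0:
        raise ValueError("baseline power must be positive; event power non-negative")
    return 100.0 * (power_event - power_baseline) / power_baseline


def frontal_alpha_asymmetry(alpha_left: float, alpha_right: float) -> float:
    """ln(right alpha power) minus ln(left alpha power).

    Alpha power is usually treated as inversely related to cortical
    activity, so positive values are read as relatively greater
    left-frontal activity under this convention.
    """
    if alpha_left <= 0 or alpha_right <= 0:
        raise ValueError("alpha power must be positive")
    return math.log(alpha_right) - math.log(alpha_left)


# Hypothetical band-power values in arbitrary units.
print(desynchronization_percent(power_event=6.0, power_baseline=10.0))
print(frontal_alpha_asymmetry(alpha_left=2.0, alpha_right=3.0))
```

Both indices are relative measures, which makes them less sensitive to overall differences in signal amplitude across listeners or recording sessions than raw band power.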
Other objective methods (facial, body, and vocal cues)
Yet another method to evaluate experiences of individuals when listening to sounds that elicit emotion involves assessing fine facial behavior, such as upward-turned corners of the mouth or wrinkled forehead. Such measures include ratings by trained observers (for a review, consult Cohn & Ekman, 2005) and camera-based automated systems that employ facial recognition technologies. The most popular coding system is the Facial Action Coding System (Ekman & Friesen, 1978; Ekman, Friesen, & Hager, 2002; Ekman, Friesen, & O'Sullivan, 1988), which trains observers to recognize muscle movements of the face (Ekman et al., 2002). This method is able to reliably code valence, but, in general, the measures are not particularly robust when evaluating arousal (Bonanno & Keltner, 2004; Russell, 1994). Camera-based automated systems are becoming increasingly popular in light of technological progress. In a recent study that tested commercially available facial emotion recognition software, the automated system was as accurate (85%) at identifying emotion as trained human coders when classifying standardized corpora of facial emotion expression (Lewinski, den Uyl, & Butler, 2014). There are many commercial systems available on the market. Of relevance to audiology, there is evidence to suggest that aging may have a significant impact on automated facial emotion recognition (Mary & Jayakumar, 2016), but the understanding of the extent to which aging influences automated emotion recognition is still in its infancy.
Gross facial changes, such as startle responses, also offer a potential avenue for the measurement of emotion. There are several behaviors associated with startle responses, with the eye blink being the most robust observable behavioral measure. Startle amplitude (of the eye blink) is larger for unpleasant stimuli and smaller for pleasant stimuli (Bradley, Cuthbert, & Lang, 1993; Bradley & Lang, 2000; Vrana, Spence, & Lang, 1988). It is more difficult to discern individual emotions with startle, and thus, this measure is better suited to measuring effect of valence, specifically for stimuli that are of sufficiently high intensity to elicit a startle response.
In addition to facial movements, it is possible to infer experienced emotion based on physical changes in the body, including body behavior, startle responses, and vocal characteristics. First, body behaviors complement other measures of intraindividual emotion because, although there is limited research on such behaviors in response to affective stimuli, two specific emotions, pride and embarrassment, have received considerable attention. These two emotions are not particularly discernable from facial expressions. Pride is associated with expansive body positions (Stepper & Strack, 1993), and embarrassment is associated with diminutive body postures (Keltner & Buswell, 1997). Finally, vocal characteristics can be used to infer emotion. There is evidence to suggest that under conditions of increased arousal, there are associated increases in vocal pitch (Bachorowski, 1999; Kappas, Hess, & Scherer, 1991; Scherer et al., 1991), although it is less sensitive to manipulations of valence (Johnstone & Scherer, 2000).
Stimuli for Measuring Inter- and Intraindividual Emotion Perception
Unlike the ample availability of speech materials used to assess speech recognition performance, there are fewer corpora available to assess experiences of listening to acoustic emotional stimuli. The following is a nonexhaustive list of currently available corpora well suited for research in audiology, including a discussion of the relative advantages and disadvantages of each. These could be used to evaluate inter- or intraindividual emotion perception, simply by changing the instruction (“what is the conveyed emotion?” or “how do you feel?”).
The Toronto Emotional Speech Set
The Toronto Emotional Speech Set (TESS; Dupuis & Pichora-Fuller, 2011) consists of audio recordings of the 200 NU-6 (Tillman & Carhart, 1966) items, with each item spoken to portray seven different emotions (angry, disgust, fear, happy, pleasant surprise, sad, and neutral). Each item begins with the carrier phrase “Say the word” that is also portrayed with emotion. There are recordings available for both a younger and older female adult talker, and all items are spoken in a North American accent. An advantage associated with the TESS is that the recordings facilitate simultaneous investigation of both emotion- and word-identification performance; however, because only two actors voiced the test materials, the TESS is not well suited to fully understand effects associated with talker variability.
The Ryerson Audio-Visual Database of Emotional Speech and Song
The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS; Livingstone & Russo, 2018) consists of audio and audio-visual recordings (centered on the face) of 24 actors (12 female, 12 male) speaking and singing with eight emotions (angry, disgust, fear, happy, surprise, sad, calm, and neutral) for the speech set and six emotional expressions for the song set (angry, fear, happy, sad, calm, and neutral). All emotions except neutral are expressed at two levels of emotional intensity: normal and strong. Recordings are available for two different sentences: “Kids are talking by the door” and “Dogs are sitting by the door.” Advantages of the RAVDESS include limited semantic information, auditory and auditory-visual modalities, and a large set of talkers. A disadvantage of the stimuli is that the constrained semantic content potentially limits the realism of the stimuli.
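For researchers organizing RAVDESS trials, the following is a minimal sketch of a filename parser. It assumes the seven-field, hyphen-separated naming scheme (modality, vocal channel, emotion, intensity, statement, repetition, actor) described in the dataset's public documentation, with odd-numbered actors male and even-numbered actors female; the field codes and the helper name parse_ravdess_name are stated here as assumptions and should be verified against the downloaded release before use.

```python
from pathlib import PurePath

# Assumed field codes; confirm against the documentation shipped with the corpus.
MODALITY = {"01": "audio-visual", "02": "video-only", "03": "audio-only"}
VOCAL_CHANNEL = {"01": "speech", "02": "song"}
EMOTION = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fear", "07": "disgust", "08": "surprise",
}
INTENSITY = {"01": "normal", "02": "strong"}
STATEMENT = {
    "01": "Kids are talking by the door",
    "02": "Dogs are sitting by the door",
}


def parse_ravdess_name(filename):
    """Split a RAVDESS-style filename into labeled fields (hypothetical helper)."""
    fields = PurePath(filename).stem.split("-")
    if len(fields) != 7:
        raise ValueError(f"expected 7 hyphen-separated fields, got {len(fields)}")
    modality, channel, emotion, intensity, statement, repetition, actor = fields
    actor_id = int(actor)
    return {
        "modality": MODALITY[modality],
        "vocal_channel": VOCAL_CHANNEL[channel],
        "emotion": EMOTION[emotion],
        "intensity": INTENSITY[intensity],
        "statement": STATEMENT[statement],
        "repetition": int(repetition),
        "actor": actor_id,
        # Assumed convention: odd actor numbers are male, even are female.
        "talker_sex": "male" if actor_id % 2 else "female",
    }


trial = parse_ravdess_name("03-01-06-01-02-01-12.wav")
print(trial["emotion"], trial["intensity"], trial["talker_sex"])
```

Labeling trials this way makes it straightforward to balance emotion, intensity, and talker sex across conditions, or to restrict an experiment to the audio-only speech recordings.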
Corpus of nonverbal vocalizations
Lima, Castro, and Scott (2013a) developed and validated a corpus of nonverbal vocalizations, which includes recordings of two female and two male talkers vocalizing eight emotions. The emotions include four positive ones (triumph/achievement, amusement, sensual pleasure, and relief) and four negative ones (anger, disgust, fear, and sadness). The authors validated the sounds by testing 20 participants with a forced-choice emotion recognition task and an additional 20 participants with a ratings task, where participants rated the extent to which a recording represented the intended emotion, represented valence, represented arousal, and was believable. Recognition accuracy was 86% on average.
Montreal Affective Voices
The Montreal Affective Voices (MAV; Belin, Fillion-Bilodeau, & Gosselin, 2008) includes 90 nonverbal vocalizations that portray 9 emotions (anger, disgust, fear, pain, sadness, surprise, happiness, pleasure, and neutrality) vocalized by 10 actors (5 female). The authors validated the sounds by recording subjective ratings of valence, arousal, and categorical affective state (e.g., not at all happy to extremely happy). Recognition accuracy was 68% and highest for female talkers. The corpus developed by Lima et al. (2013a) and the MAV share similar advantages (e.g., limited semantic content, brief duration, behavioral validation, and range of talkers) and disadvantages (e.g., acted stimuli).
Musical Emotional Bursts
The Musical Emotional Bursts (MEB; Paquette, Peretz, & Belin, 2013) includes 80 music samples that represent four emotions (happy, sad, fear, and neutral), as portrayed by trained musicians improvising on either a violin or clarinet. The authors conceptualize the corpus as a musical corollary to the MAV. The advantages of the MEB include the relatively brief duration of the samples and high recognition accuracy (80%). In addition, use of the MEB in studies is potentially advantageous for scientific inquiry regarding the extent to which vocalizations and music share similar neural processing; some evidence supports shared circuitry (Ilie & Thompson, 2006; Lima & Castro, 2011; Thompson, Marin, & Stewart, 2012), whereas other evidence supports dissociation (Lima, Garrett, & Castro, 2013b; Omar et al., 2011; Peretz & Coltheart, 2003).
Naturally occurring stimuli
Yet another approach adopted in auditory emotion research is to use naturally occurring, rather than acted, stimuli (e.g., Schmidt et al., 2016b). This is potentially an important distinction in emotion perception research because acted and spontaneous vocal emotion have been shown to exhibit different acoustical properties (for example, in laughter; Lavan, Scott, & McGettigan, 2016). Although such tokens exhibit high ecological validity, there are limitations regarding experimental control of the materials. For example, naturally occurring stimuli typically exhibit high variability in semantic content, utterance length, and intelligibility.
The International Affective Digital Sounds
The International Affective Digital Sounds (IADS)-2 (Bradley & Lang, 2007) is a standardized database of 167 naturally occurring sounds widely used in the study of emotion. The sounds vary along the dimensions of valence and arousal and contain a range of nonspeech sounds such as music (e.g., guitar), nature sounds (e.g., rain), body sounds (e.g., vomiting), animals (e.g., cow moos), human emotions (e.g., crying), and man-made sounds (e.g., siren). Normative arousal and valence SAM ratings collected on 78 undergraduate students of unknown hearing status are available (Bradley & Lang, 2007). Given that tokens on the IADS are nonspeech, the corpus is well suited to investigations of individuals who speak languages other than English. The nonspeech stimuli also complement speech corpora, particularly for investigating the potential for distinct emotion perception mechanisms for speech and nonspeech stimuli, as has been reported for cortical auditory perception (e.g., Norman-Haignere, Kanwisher, & McDermott, 2015).
The International Affective Picture System
The International Affective Picture System (IAPS; Lang, Bradley, & Cuthbert, 1999) is a large set (956 color images as of 2005) of emotionally evocative photographs widely used in emotion research that vary along the dimensions of arousal and valence. The corpus includes images of people (e.g., in various states of undress who are happy, angry, sad, fearful, threatening, attractive, etc.), housing projects, toilets, landscapes, waterfalls, sporting events, photojournalism from wars and natural disasters, mutilated bodies, baby animals, threatening animals, erotic images, insects, loving families, and so forth. Normative arousal and valence SAM ratings have been collected across a range of cultures and age groups including children and young adults (McManis, Bradley, Berg, Cuthbert, & Lang, 2001) and older adults 63 to 77 years of age (Grühn & Scheibe, 2008). Although not auditory in nature, the IAPS allows for a detailed picture of the effects of hearing loss on emotional processing by providing for the quantification of emotional processing of visual stimuli. The strength of the IAPS comes from the nonlinguistic nature of the stimuli and the large number of previous investigations that used IAPS stimuli.
Effects on Interindividual Emotion Perception
If people with hearing loss are unable to correctly perceive emotions, it may compromise their ability to communicate and have broad quality of life effects, such as poorer performance at school and in work environments (Elfenbein, Der Foo, White, Tan, & Aik, 2007; Hall, Andrzejewski, & Yopchick, 2009). Indeed, deficits in interindividual emotion recognition have been proposed to be responsible for reduced empathy in children with hearing loss (Netten et al., 2015). To fully understand the relationship between hearing loss and interindividual emotion perception, the working group considered the effects of age and hearing loss on interindividual emotion perception separately.
Age
There are well-established differences between older and younger adults’ abilities to recognize emotion in others in both faces and voice (for meta-analysis, see Ruffman et al., 2008). Although not consistent across all studies (e.g., Dupuis & Pichora-Fuller, 2015), some authors report asymmetrical age differences, where unpleasant emotions are more influenced by age. For example, the effects of age are larger on listeners’ ability to identify sad intonation than happy or neutral vocal intonations (Dupuis & Pichora-Fuller, 2015; Mitchell, Kingston, & Barbosa Bouças, 2011; Paulmann et al., 2008; Sen et al., 2017). Similarly, age has been shown to negatively affect facial recognition of sadness, but not happiness or surprise (Keightley, Winocur, Burianova, Hongwanishkul, & Grady, 2006; Murphy, 1999; Sullivan & Ruffman, 2004). Consistent with this asymmetry, age effects on interindividual emotion perception have been explained in part by a “positivity bias.” As people age, they tend to engage in behaviors and thought processes that promote positive emotional experiences (Carstensen, Pasupathi, Mayr, & Nesselroade, 2000; Isaacowitz, Livingstone, & Castro, 2017; Mather & Carstensen, 2005; Mather & Knight, 2005). The effects of age have also been attributed to cognitive decline (Keightley et al., 2006; Sen et al., 2017; Sullivan & Ruffman, 2004), neuropsychological changes in brain structures associated with sociability (Ruffman et al., 2008), and changes in social environments (Murry & Isaacowitz, 2017).
Hearing Loss
Given the importance of fundamental frequency and pitch range to vocal emotion recognition, one might expect hearing loss to negatively affect interindividual emotion perception. However, study results into the effects of hearing loss on vocal emotion recognition are mixed and depend on degree of hearing loss. Among a group of elderly listeners with near normal hearing, Dupuis and Pichora-Fuller (2015) found no association between either degree of hearing loss (as indicated by pure-tone average thresholds) or suprathreshold auditory processing (as indicated by pitch, loudness, or gap detection abilities) and vocal emotion recognition of semantically neutral sentences.
Among older adults with mild-to-moderate hearing loss, some authors report no association between pure-tone average and emotion recognition, instead attributing population differences to age-related changes in cognition (e.g., Mitchell, 2007; Orbelo et al., 2005). More recently, Singh et al. (in press) demonstrated a significant effect of mild-to-moderate hearing loss on emotion recognition, as evidenced by self-reported handicap (EMO-CHeQ) and behavioral performance on a recognition task using the RAVDESS. The authors also report a strong, negative correlation between four-frequency pure-tone average and emotion recognition (without visual cues), as measured behaviorally (r = −.73, p < .05) for hearing-aided listeners, suggesting a strong effect of hearing loss on interindividual emotion perception. Similarly, Rigo and Lieberman (1989) reported a significant negative correlation between low-frequency hearing threshold (average of 500, 1000, and 2000 Hz) and the ability to correctly label the emotion in a brief utterance devoid of meaning. Furthermore, participants with normal low-frequency thresholds did not demonstrate interindividual emotion perception deficits, highlighting the importance of low-frequency hearing for emotion recognition and suggesting that differences in degree of low-frequency hearing loss are a possible explanation for the mixed findings in the literature. The mixed results might also be partially explained by variability across studies in participants’ perception of complex pitch. Pitch perception is significantly related to emotional prosody recognition (Mitchell & Kingston, 2014), but people with similar audiograms can vary considerably in their pitch perception abilities (Arehart, 1994).
Unlike the mixed findings for adults with mild-to-moderate hearing loss, the evidence is clear that vocal emotion recognition is impaired for adults with severe hearing loss who use cochlear implants, relative to their peers with normal hearing (Chatterjee et al., 2015; Jiam, Caldwell, Deroche, Chatterjee, & Limb, 2017; Luo et al., 2007). Cochlear implants provide a representation of speech that is compressed in intensity range and limited in spectral resolution. Pitch is particularly poorly represented via cochlear implants. Rather than perceiving the pitch of complex sounds via information from lower numbered, spectrally resolved harmonics, listeners with cochlear implants must rely on the cues available in the periodic temporal envelope, which even in normal-hearing listeners produce a pitch that is considerably less accurate and less salient than the pitch from low-numbered resolved harmonics (Bernstein & Oxenham, 2003; Houtsma & Smurzynski, 1990). Most research on emotion perception in cochlear implant users has been related to pitch perception and the lack of clear pitch cues. A recent review article has summarized current research on voice emotion perception and production in cochlear implant users (Jiam et al., 2017). The authors report that cochlear implant users experience major deficits in emotion perception and production in speech, as well as difficulties recognizing emotional content in music. In speech, as well as music, alternative cues to pitch, such as intensity, duration, and speaking rate or tempo, are used to some extent, but not always very effectively.
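The two quantities at the center of these analyses, a pure-tone average and its correlation with an emotion-recognition score, can be sketched as follows. The three-frequency average (500, 1000, and 2000 Hz) follows the description above; the inclusion of 4000 Hz in the four-frequency average is an assumption, as are the function names. The audiogram and the paired values are invented solely to show the arithmetic and do not reproduce data from any cited study.

```python
import math


def pure_tone_average(thresholds_db_hl, frequencies_hz=(500, 1000, 2000)):
    """Average the audiometric thresholds (dB HL) at the requested frequencies."""
    return sum(thresholds_db_hl[f] for f in frequencies_hz) / len(frequencies_hz)


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Invented audiogram for one ear (frequency in Hz -> threshold in dB HL).
audiogram = {500: 20, 1000: 30, 2000: 40, 4000: 60}
pta3 = pure_tone_average(audiogram)                            # 500, 1000, 2000 Hz
pta4 = pure_tone_average(audiogram, (500, 1000, 2000, 4000))   # assumed 4-frequency set
print(pta3, pta4)
```

A negative value of pearson_r computed across listeners (pure-tone averages against percent-correct emotion recognition) would correspond to the direction of the associations reported above; significance testing and correction for age would still be needed.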
Music perception is also highly degraded in cochlear implant users, in terms of their ability to recognize melodies or harmony (McDermott, 2004). Not surprisingly, therefore, cochlear implant users have difficulty distinguishing consonant chords from dissonant chords (Caldwell, Jiradejvong, & Limb, 2016), and they are not able to recognize emotions in music based purely on major or minor scale (Hopyan, Manno, Papsin, & Gordon, 2016). However, in cases of natural music, where other cues such as intensity and tempo are available, the emotion recognition by cochlear implant users is well above chance, although not quite as high as for normal-hearing listeners (Caldwell, Rankin, Jiradejvong, Carver, & Limb, 2015; Hopyan et al., 2016).
Like adults, children with profound congenital hearing loss also demonstrate deficits in emotion perception of speech (Chatterjee et al., 2015; Hopyan-Misakyan, Gordon, Dennis, & Papsin, 2009; Peng, Tomblin, & Turner, 2008) and music (Hopyan et al., 2016; Whipple, Gfeller, Driscoll, Oleson, & McGregor, 2015), although evidence suggests limited emotion perception deficits for children with moderate-to-severe hearing loss (Dyck & Denver, 2003). The examination of interindividual emotion perception in children provides insight into the developmental effects of hearing loss and language on emotion recognition. Children with cochlear implants exhibit impaired emotional recognition or delayed emotional competence (Denham & Auerbach, 1995; Dyck, Farrugia, Shochet, & Holmes-Brown, 2004; Gray, Hosie, Russell, Scott, & Hunter, 2007; Most, Gaon-Sivan, Shpak, & Luntz, 2012; Rieffe, 2012; Rieffe & Terwogt, 2000). It has been proposed that these deficits can be explained, in part, based on a model of hierarchical development of emotion comprehension, which develops with increasing age (Pons, Harris, & de Rosnay, 2004). Limited access to auditory information is expected to disrupt the hierarchy and delay emotional development (e.g., Cole & Flexer, 2015; Denham & Auerbach, 1995; Ziv, Most, & Cohen, 2013). With hearing loss, there might be less verbal sharing (Denham & Auerbach, 1995) and fewer interactions in parent–child communications (Cole & Flexer, 2015). As a consequence, these children have more restricted auditory learning opportunities, leading to less flexible and narrower perception of emotional situations (Pons et al., 2004). It is therefore speculated that the delayed emotional development is a product of delayed speech and language development rather than (necessarily) a distortion or reduction of the acoustic cues necessary for adequate emotion recognition (Ludlow, Heaton, Rosset, Hills, & Deruelle, 2010). 
These findings highlight the important interplay between hearing and language development on emotion perception.
Modality
Although the focus of this article is generally on auditory emotion perception, the consideration of stimulus modality provides insights into the mechanisms associated with changes in emotion perception with age or with hearing loss. That is, does hearing loss or advanced age affect interindividual emotion perception with visual cues? As a result of the aforementioned age-related changes in cognition and social function, it is not surprising that advanced age reduces recognition ability of emotion in faces (Demenescu, Mathiak, & Mathiak, 2014; Mill, Allik, Realo, & Valk, 2009; Murry & Isaacowitz, 2017; Sullivan & Ruffman, 2004). However, the conclusions regarding the interactions between mild-to-moderate, acquired hearing loss and stimulus modality are less clear. Some investigators report changes in interindividual emotion perception in the visual domain with hearing loss (Rigo & Lieberman, 1989), whereas others report hearing loss does not affect emotion perception with stimuli that are audio-visual (Singh et al., in press). How these findings can be reconciled is unclear; the discrepancy might be related to age effects or methodological choices.
In children, emotion recognition in the visual domain has been shown to be resilient to the effects of hearing loss for school-aged children and adolescents (approximately 6 to 18 years old; Dyck et al., 2004; Hopyan-Misakyan et al., 2009; Hosie, Gray, Russell, Scott, & Hunter, 1998; Most & Aviner, 2009). Conversely, hearing loss has been associated with global emotion perception deficits in the visual domain for preschool children (approximately 2 to 5 years old) with hearing aids or cochlear implants. These results suggest that hearing loss has a developmental, modality-independent effect on interindividual emotion perception that is not evident in school-aged children or adolescents.
Effects on Intraindividual Emotion Perception
Age
As with interindividual emotion perception, the recognized positivity bias might be expected to influence intraindividual emotion perception. Instead, age effects on emotional responses have been small and difficult to measure in response to pictures (Grühn & Scheibe, 2008) and sounds (Picou, 2016b), particularly with a small number of stimuli (Mather & Knight, 2005; Mikels et al., 2005; Wieser, Mühlberger, Kenntner-Mabiala, & Pauli, 2006). When effects of age are noted, the findings have been confined to specific stimuli (older adults rate pictures of risky behavior as less pleasant than younger adults; Grühn & Scheibe, 2008). Other researchers report that age increases the range of emotional responses to pictures, where pleasant stimuli are rated as more pleasant (Backs, da Silva, & Han, 2005; Grühn & Scheibe, 2008; Smith, Hillman, & Duley, 2005) and unpleasant stimuli are rated as more unpleasant (Grühn & Scheibe, 2008).
There is mounting evidence that the lack of large aging deficits in intraindividual emotion perception is due to compensatory cognitive strategies. That is, older adults engage more prefrontal cortical regions and exhibit a smaller subcortical (amygdalar) response compared with younger adults, particularly for unpleasant stimuli (Fischer et al., 2005; Tessitore et al., 2005). For pleasant stimuli, older adults have been shown to demonstrate more amygdalar activity than younger participants (Mather et al., 2004). These data suggest that, despite similar behavior between younger and older adults, the interplay between the cognitive and the automatic systems shifts with age.
Hearing Loss
One of the first investigations into the effects of acquired hearing loss on intraindividual emotion perception of sound also revealed cortical changes in response to sounds in the presence of hearing loss. Husain, Carpenter-Thompson, and Schmidt (2014) tested older adults with normal hearing or acquired hearing loss, presented nonspeech sounds, and measured the emotional response behaviorally and physiologically. The results indicated that listeners with hearing loss were less affected by the emotional sounds than their peers with normal hearing. In addition, listeners with hearing loss demonstrated more prefrontal engagement and less amygdalar engagement, suggesting cognitive and behavioral consequences of hearing loss on emotional responses. These findings were confirmed by Picou (2016b), who evaluated subjective ratings of valence and arousal at multiple signal levels to evaluate the effects of age and hearing loss on emotional responses to nonspeech sounds. Although ratings from younger and older listeners with normal hearing were not different from each other, listeners with hearing loss exhibited a reduced range of valence ratings. They rated pleasant stimuli as less pleasant and unpleasant stimuli as less unpleasant compared with listeners with normal hearing. Furthermore, Picou and Buono (2017) found the range of emotional responses to nonspeech sounds to be inversely related to degree of hearing loss, whereby those with higher pure-tone averages exhibited smaller ranges.
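One simple way to express the "range of valence ratings" described above is the difference between a listener's mean rating of pleasant sounds and mean rating of unpleasant sounds. The sketch below assumes ratings on a 9-point SAM scale (1 = most unpleasant, 9 = most pleasant); the function name and the ratings are invented for illustration and are not data from the cited studies.

```python
from statistics import mean


def valence_range(pleasant_ratings, unpleasant_ratings):
    """Mean valence rating for pleasant sounds minus mean rating for unpleasant sounds."""
    return mean(pleasant_ratings) - mean(unpleasant_ratings)


# Invented ratings for two hypothetical listeners (assumed 9-point SAM scale).
listener_normal_hearing = valence_range([7, 8, 7], [2, 3, 2])
listener_hearing_loss = valence_range([6, 6, 6], [4, 3, 4])

# A smaller value indicates a compressed range of emotional responses.
print(round(listener_normal_hearing, 2), round(listener_hearing_loss, 2))
```

Computed per listener, this index could then be related to pure-tone average to examine whether the range narrows as degree of hearing loss increases.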
Modality
As with interindividual emotion perception, the extant literature suggests intraindividual emotion perception is more robust with visual than with auditory stimuli (Bradley & Lang, 2000; Shinkareva et al., 2014), although auditory and visual stimuli result in a similar pattern of behavioral and electrophysiological results (Bradley & Lang, 2000; Czigler, Cox, Gyimesi, & Horváth, 2007; Gerdes, Wieser, & Alpers, 2014; Schupp, Junghöfer, Weike, & Hamm, 2003). The results of most studies evaluating the combined effects of audition and vision on emotional responses to nonspeech sounds suggest that the combination of congruent sensory modalities enhances the emotional response relative to unimodal sensory stimuli (Cox, 2008; Gerdes et al., 2013), similar to findings reported for faces combined with speech stimuli (Livingstone & Russo, 2018). Furthermore, auditory stimuli facilitate the early and immediate processing of visual stimuli, as evidenced by enhanced electrophysiological cortical responses (Gerdes et al., 2013) and priming (Scherer & Larsen, 2011). Not all investigators have reported multisensory enhancement of intraindividual emotion perception with combined auditory and visual cues (Brouwer, Van Wouwe, Mühl, Van Erp, & Toet, 2013), perhaps as a result of suboptimal congruency. Incongruent valence pairing could negatively affect multisensory enhancement. For example, Gerdes et al. (2013) found that valence ratings of pleasant sounds paired with pleasant pictures were significantly higher than those of pleasant sounds paired with unpleasant pictures.
Although yet to be investigated, the results of multisensory evaluations in listeners with normal hearing might provide insight into the expected effects of hearing loss on intraindividual emotion perception. Specifically, because hearing loss reduces ratings of valence of pleasant sounds (Picou, 2016b), one might expect valence ratings of auditory-visual stimuli to be lower for listeners with hearing loss than peers with normal hearing. Conversely, if intraindividual emotion perception of visual stimuli is preserved with hearing loss, and cortical reorganization associated with hearing loss results in dominance of visual sensory processing (Merabet & Pascual-Leone, 2010), or increased reliance on visual cues for processing speech (Rosemann & Thiel, 2018), listeners with hearing loss might not demonstrate differences in intraindividual emotion perception of audio-visual stimuli, relative to peers with normal hearing. Seemingly, the interaction between stimulus modality and acquired hearing loss warrants investigation.
Interventions
Technological Interventions
Hearing aids
To date, few articles have been published investigating how hearing aids influence interindividual emotion perception. One study of emotion understanding in 4- to 5-year-olds found that hearing-impaired children wearing hearing aids exhibit levels of emotion understanding equivalent to those observed in normal-hearing children (Laugen, Jacobsen, Rieffe, & Wichstrøm, 2016). In contrast, on tests of emotion identification, hearing aid users generally have more difficulty than listeners with normal hearing. For example, children and adolescents who wear hearing aids have emotion-identification scores that are about 30 percentage points lower than those of their normal-hearing peers (Most & Aviner, 2009; Oster & Risberg, 1986). It could be that the effects of hearing loss on emotion perception are less readily observed on comprehension tasks, which are more dependent on context and cognition (for a review, see Pichora-Fuller & Singh, 2006).
The performance gap between hearing aid users and listeners with normal hearing appears to be somewhat smaller for adults than for children, possibly due to a different time course and underlying mechanism of hearing loss. One study showed that older adults with normal hearing outperformed older adults who wore hearing aids by 19 percentage points (Waaramaa, Kukkonen, Stoltz, & Geneid, 2016). Importantly, the aforementioned studies on adults and children did not test listeners with hearing loss in both unaided and aided conditions, so it is unclear how much hearing aids improved emotion recognition.
Two studies have investigated emotion-recognition performance in groups of adults with normal hearing, hearing loss in unaided conditions, and hearing loss in aided conditions. It was first observed that, compared with not wearing their own hearing aids, aided listeners experience minimal benefits of about 6 percentage points (Goy, Pichora-Fuller, Singh, & Russo, 2016). Similarly, Singh et al. (in press) found that behavioral performance on an emotion-recognition task and perceived disability using the EMO-CHeQ were similar in both unaided and aided groups of older adults with acquired moderate hearing loss (see also Nespoli et al., 2018). Together, these data suggest limited, if any, positive effects of hearing aid use on emotion-recognition performance.
Conversely, although there have been few studies published in the area, there appear to be some effects of hearing aid use on intraindividual emotion perception. Two studies report on the perception of arousal and valence by older listeners with hearing loss. The findings from these two studies suggest that an increase in sound intensity leads to an increase in arousal ratings. Arousal ratings were generally higher when stimuli were presented at higher intensity (Picou, 2016b) or with the use of hearing aids (Schmidt, Herzog, Scharenborg, & Janse, 2016a). Based on the findings reported by Picou (2016b) that increasing the overall level of sounds from 60 to 80 dB SPL reduced ratings of valence, it might be expected that hearing aids would negatively affect ratings of valence. Instead, results from two studies suggest that providing individualized amplification through hearing aids did not affect ratings of valence (Picou, 2016a; Schmidt et al., 2016a).
Cochlear implants
There have been some investigations into the optimization of emotional prosody recognition for cochlear implant users. For example, the addition of an acoustic hearing aid to a cochlear implant (i.e., bimodal hearing) has been shown to improve prosody recognition for cochlear implant users, both adults (Krull, Luo, & Iler Kirk, 2012; Most et al., 2011) and children (Straatman, Rietveld, Beijen, Mylanus, & Mens, 2010). Within the cochlear implant itself, researchers report improvements in vocal emotion recognition with increased number of channels (Chatterjee et al., 2015; Luo et al., 2007), particularly for the identification of prosody associated with joy (Zhu, Miyauchi, Araki, & Unoki, 2016). In addition, cochlear implant processing schemes can affect prosody recognition. Agrawal and colleagues compared prosody recognition of angry, happy, and neutral sentences using the Psychoacoustic Advanced Combination Encoder (PACE) and the Advanced Combination Encoder (ACE) strategies for listeners with normal hearing (Agrawal et al., 2012) and for cochlear implant users (Agrawal et al., 2013). The ACE and PACE processing strategies are similar, both stimulating electrodes with the highest amplitude in a given cycle. However, PACE targets electrodes that are more important for listeners with normal hearing, rather than focusing on all spectral maxima. Combined, the results of Agrawal et al. demonstrate advantages of the PACE processing for emotional prosody recognition, especially for the identification of happy.
Training Interventions
Investigators have also focused on the effects of musicianship on emotional speech processing (e.g., Dankovicová, House, Crooks, & Jones, 2007; Lima & Castro, 2011; Schön, Magne, & Besson, 2004; Strait, Kraus, Skoe, & Ashley, 2009). The interpretation of these studies is complicated by questions regarding whether group differences are due to preexisting conditions that anticipate music training. However, several experimental studies support the notion that music training can be used to support emotional speech perception in normal-hearing children (Mualem & Lavidor, 2015; Thompson, Schellenberg, & Husain, 2004), as well as deaf children who wear cochlear implants (Good et al., 2017). One way of understanding these experimental effects is that music training may increase the sensitivity and responsiveness to acoustic dimensions that underlie speech emotion: fine temporal structure (pitch), gross temporal structure (dynamics), and spectrotemporal attributes of sound (timbre).
There has been limited work investigating the effects of nonmusic training on emotion recognition, and, to date, the results do not strongly support the idea that training improves emotion perception. Zhang, Dorman, Fu, and Spahr (2012) tested the possibility of a 4-week, bottom-up perceptual, speech phoneme training program to improve speech recognition and emotion recognition for experienced bimodal listeners (cochlear implant and contralateral hearing aid). Their results suggest that, although training improved vowel, consonant, and word identification, training did not affect emotion recognition of prosody in semantically neutral sentences.
More specific to emotion, Dyck and Denver (2003) developed and tested a psychoeducational program for children with prelingual hearing loss (moderate-severe and profound). The training consisted of eleven 45-min sessions related to understanding and recognizing emotions, primarily with pictures. Their results indicate improved emotion vocabulary and emotion comprehension, but not emotion recognition. Krull et al. (2012) also evaluated the potential for a more focused training program to improve emotion recognition, although the emphasis of their intervention was on improving talker-identification. Talker-identification, like emotion recognition, depends in part on F0 variability (Remez, Fellowes, & Rubin, 1997); thus, one might expect talker-identification training to have benefits for emotional prosody recognition. Indeed, the authors report that adults with normal hearing listening to implant simulations over the course of 4 days improved talker-identification performance, and the benefits of the training generalized to speech recognition in noise and also emotional prosody recognition. Taken together, these studies suggest the potential for training programs to improve emotion perception, although a successful training program would likely be in the auditory domain and focus on skills specific to emotion recognition.
Future Directions
The preceding discussion summarizes the general understanding of emotion perception as it relates to age, hearing loss, and hearing loss interventions within the scope of interindividual emotion perception (e.g., emotion recognition, emotion understanding) and intraindividual emotion perception (e.g., emotional responses, elicited emotion). Although the study of emotion perception is relatively new for audiology, we can draw on decades of research from other fields, including psychology and neuroscience. Drawing on existing literature and contributing novel research specifically with hearing considerations in mind fosters a more comprehensive understanding of auditory emotion perception for people with normal and impaired hearing. The working group identified directions for future research. Most of the recommendations apply to both interindividual and intraindividual emotion perception. The recommendations are broadly divided into three categories, those that relate to (a) a foundational understanding of the interplay between hearing loss and emotion perception, (b) methodological choices to be considered in the future, and (c) priorities for intervention research.
Foundational Questions
Emotions result in both psychological and physical changes that influence behavior. Although there is considerable past research exploring the psychosocial impact of hearing loss, to date, hardly any research has investigated experiences of auditory emotion using physiological measures with hearing-impaired populations. This is somewhat surprising given the central role of emotion to human experience. Hence, there is a need to better understand how hearing loss and amplification influence objective physiological experiences when listening to signals that convey or evoke emotion responses. Such work may be of value because emotion-related biomarkers may represent a novel category of outcome measures by which to assess the efficacy of treatment methods designed to reduce the disability associated with hearing loss.
A second direction for future research concerns the need to better understand the broader impact of hearing-related emotion processing deficits. Husain et al. (2014) observed that compared with normal hearing controls, individuals with mild-to-moderate hearing loss exhibit altered brain activation patterns when listening to affective but not neutral sounds. This finding raises several questions regarding the causal relationship between these and other variables. For example, does hearing loss lead to emotion processing deficits and can hearing rehabilitation, such as hearing aid fitting, ameliorate such deficits? What are the direct and indirect relationships between hearing loss, emotion, and other variables such as cognition or social relationships? Does psychological distress (e.g., depression, anxiety, etc.) associated with hearing loss arise from the effect of hearing loss on social relationships, emotion processing, or both?
Finally, the working group recommends research that investigates associations between auditory abilities and emotion perception. Although a picture is emerging of the auditory factors that contribute to interindividual emotion perception for older listeners (e.g., pitch, intensity, duration; Mitchell & Kingston, 2014), there is a lack of clear understanding regarding the relationship between hearing abilities (e.g., pure-tone thresholds, speech perception) and both interindividual (Orbelo et al., 2005) and intraindividual (Picou & Buono, 2017) emotion perception. In addition, future work is warranted to identify how specific perceptual abilities (e.g., loudness perception or frequency selectivity) relate to this type of emotion perception.
Methodological Choices
Currently, most interindividual perception research uses categorical schemes (e.g., name the emotion expressed by that person), whereas intraindividual perception research often uses dimensional schemes (e.g., how do you feel on the valence dimension). These differences in methodology will likely yield unique insights into the experiences of listeners when encountering signals that contain emotion information. Designing studies that tap into more than one model of emotion perception may better inform us on how listeners perceive different aspects of emotion, and it is possible that some schemes of emotion may better explain the acoustic-perceptual link than others.
A second methodological recommendation the working group suggests is the consideration of more naturalistic stimuli. Nearly all studies on emotional speech perception have used professional actors to simulate different emotional states, to control for sentence content and achieve high-quality recordings. However, simulated emotion may reflect societal norms rather than physiological changes in the talker due to their emotional state (Scherer, 2003), and differences in portrayals between talkers have likely contributed to inconsistent findings between studies. Sentences with simulated emotion differ acoustically from those with real emotion in several ways, including having greater F0 variability, greater shimmer values, and more low-frequency energy (Jürgens, Hammerschmidt, & Fischer, 2011). Given these differences between real and simulated emotion, more naturalistic stimuli need to be tested to confirm whether the behavioral patterns and acoustic-perceptual relationships seen with simulated emotion also apply to real emotion. Some examples of naturalistic stimuli that have been used include excerpts from interviews (Jürgens et al., 2011) and talk shows (Schmidt et al., 2016b).
Intervention Priorities
Finally, the working group suggests considerably more research related to interventions that consider emotion perception. One question of interest is related to the effects of hearing aids and cochlear implants on emotion perception. The extant literature would suggest that there is minimal benefit of hearing aids for interindividual emotion perception, with some mixed results for the effects of hearing aids on intraindividual emotion perception. To fully understand the potential effects of interventions on emotion perception, it will be important to consider methodological differences across studies. For example, differences in findings on how hearing aids affect arousal may be partly due to the use of conversational speech samples by Schmidt et al. (2016a) and the use of nonverbal sounds and nonbiological sounds by Picou (2016a). Furthermore, Goy et al. (2016) used speech samples from only one talker (an actor) on the TESS in the emotion-identification task, but it has been shown that talkers differ on how they portray emotions (Bachorowski, 1999). Speech materials from more talkers should be tested, as well as speech materials recorded under more naturalistic conditions, to make experimental findings more generalizable to real-life listening situations.
A related recommendation is to conduct additional research designed to investigate how different hearing aid and cochlear implant processing parameters (e.g., compression, frequency lowering, etc.) affect the cues that convey vocal emotion. Participants were fit with a single prescriptive method in the study reported by Picou (2016a), whereas participants used their own hearing aids in the studies reported by Goy et al. (2016), Singh et al. (in press), and Schmidt et al. (2016a). One advantage of participants using their own aids rather than aids provided for research purposes is that participants were accustomed to their hearing aids and their performance reflected their emotion perception under typical conditions. However, different types of hearing aid processing may have contributed to variation in performance between participants and made it more difficult to gauge the extent of benefit from hearing aids for emotion perception.
A third recommendation would be to examine whether the effects of learning or training might compensate to some extent for the effects of hearing loss on vocal emotion perception. Specifically, are listeners with hearing loss able to make use of new acoustic cues made available by hearing aids, or are these cues permanently lost? It is not known whether emotion perception would improve as listeners become acclimatized to their new hearing aids, or whether explicit training would lead to greater improvements than simple acclimatization. Given that there are so few studies, one recommendation to advance this area would be to evaluate auditory training of emotion perception with listeners in both unaided and aided conditions.
Clinical Implications
Although the working group identified research needs in the area, even before these needs are fulfilled, we can extrapolate several potential implications for audiology practice. To date, there appears to be little to no consideration of a hearing-impaired listener’s experience of emotion when developing or assessing treatment interventions. Hence, one potential clinical application is that the use of emotion-based outcome measures may guide the development of hearing instrument technologies and treatment interventions.
A second application relates primarily to understanding patient experiences relevant for counseling. We believe that, currently, counseling sessions include few discussions regarding the impact of hearing loss on emotion perception, such as the ability to correctly recognize vocal emotion in others and the experience of listening to stimuli that evoke emotion responses (e.g., laughter, music, or crying). Discussion of potential emotion processing sequelae associated with hearing loss may be beneficial in several ways. For example, counseling that discusses perception of signals that contain auditory emotion may raise the patient’s awareness about the potential impact on everyday communication and quality of life.
A third application, also related to counseling, concerns the paucity of discussion of the psychosocial consequences associated with hearing loss. Despite receiving training regarding the emotional impact of hearing loss, and despite the myriad of positive outcomes associated with effective clinician-patient communication (i.e., greater treatment adherence, patient disclosure, and patient satisfaction; for a review, see Ha & Longnecker, 2010), there is evidence to suggest that when opportunities to discuss emotions present themselves, therapeutic communication in audiology inadequately addresses experiences of emotion associated with hearing loss (Grenness, Hickson, Laplante-Lévesque, Meyer, & Davidson, 2015). Reluctance to discuss emotion may stem, in part, from practitioners’ concerns regarding their ability to effectively discuss emotion (Maguire & Pitceathly, 2002). One method to potentially foster better communication about emotion and the psychosocial consequences of hearing loss for patients and families is by first discussing a patient’s experience with vocal emotion understanding. Such an approach may represent a more naturalistic and less threatening method to discuss emotion as it pertains to hearing loss.
Conclusions
Hearing loss is a common, chronic condition that affects many older adults. The effects of hearing loss have typically been considered in relation to audibility, speech understanding, and psychosocial function. One subject that has received little attention, but has considerable potential to affect patients’ psychosocial function, is how hearing loss and hearing rehabilitation affect patients’ momentary emotional experiences. This article is a product from the HEART workshop, which was convened to develop a consensus document describing research on emotion perception relevant for hearing research. The goal of the article is to increase awareness about emotion perception research in audiology and to stimulate additional research on the topic. The working group identified two general categories of emotion perception: (a) interindividual perception, which includes skills such as emotion recognition and identification, and (b) intraindividual perception, which includes emotional responses to stimuli. Hearing abilities have been implicated in both types of emotion perception, although different auditory abilities have been related to each type of perception. Interventions that improve pitch perception and spectral resolution would be expected to improve interindividual emotion perception, whereas improving audibility without excessive loudness might be expected to improve intraindividual emotion perception. Training and counseling interventions hold promise for improving both types of emotion perception, although, similar to technological interventions, more work is necessary before specific interventions can be recommended for clinical practice. Despite its infancy, the study of emotion perception has important implications for hearing science, with the potential to improve clinical outcomes by considering emotional experiences.
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The workshop and the associated article were funded by Sonova AG.
