Abstract
Social emotions have figured prominently in recent research pertaining to music-related emotions. If music is indeed able to evoke social emotions in listeners, the implication is that music may be perceived in some way as akin to a human agent. Yet music, especially instrumental music, is not obviously an agent capable of feeling. Following up on past research linking liking sad music to the fantasy facet of trait empathy, the results of three studies are reported. The first two were online surveys involving 112 and 137 participants, respectively, who rated sets of words in terms of their implied agency, synonymousness, or applicability for describing music. The third involved a listening task in which 299 participants listened to 24 short excerpts of instrumental music, and selected up to 3 words, from a list of 16, that best described each excerpt. The list of 16 words was compiled based on the results of the first two studies and comprised 8 pairs of words that differed in terms of their level of implied agency but were matched in terms of their meaning and applicability to music. Participants also completed the Fantasy subscale of the Interpersonal Reactivity Index. The results did not support the hypothesis that high-fantasy listeners would be more likely to impute (virtual) agency to music. Instead, the attribution of agency was significantly associated with enjoyment and musical arousal.
How does music induce emotion in listeners? Research on emotion suggests a distinction between social and non-social emotions that hinges on awareness of an agent. Many emotions are linked exclusively to social experiences, including feelings of admiration, hate, suspicion, envy, embarrassment, pride, love, pity, and so forth. However, some induced emotions require no perception of agency. For example, although fear can be induced by encountering a threatening burglar, fear can also be induced by encountering a deep hole or chasm; joy might be induced by the return of an absent friend, but it can also be induced by discovering a source of much-needed water; sadness or anger might arise in response to unjustified criticism, or from seeing one’s home destroyed by fire.
As noted, the key to evoking social emotions is the perception of agency. In literature, drama, and film, it is nearly always the case that persons or characters are represented explicitly. Even animated cartoons nearly always portray characters as sentient beings capable of displaying thoughts, intentions, and feelings. In contrast to inanimate objects (furniture, rocks, etc.), animate agents are able to evoke complex emotions in observers, such as feelings of affection, anger, or compassion.
In music involving one or more vocalists, the singer(s) similarly offers a human presence that ostensibly allows listeners to recognize the existence of a sentient being whose thoughts, intentions, and feelings might be expected to evoke appropriate responses in listeners. However, in the case of purely instrumental music, it is not obvious that the music conveys, portrays, or represents a conscious being or agent. While (instrumental) musical sounds can certainly be perceived and understood as resulting from the intentional motor actions produced by a human agent (e.g., Launay, 2015), such music may also be experienced as conveying imagined or virtual agency. Previous empirical work has shown that it is common for listeners to informally describe instrumental music using terms that imply a sentient actor. Watt and Ash (1998), for example, found that people are much more likely to ascribe agency-related descriptors (male, female, good, evil, angry, pleased, gentle, and violent) to music than to food. In addition, music scholars have drawn attention to the ways in which instrumental music exhibits animacy cues suggestive of agency (Broze, 2013; Hatten, 2018). Without some sense of agency, one would expect music-induced emotions to be limited to non-social emotions.
While some philosophers have remained skeptical about the capacity of music to convey imagined agency (e.g., Davies, 1997), many music theorists and philosophers (e.g., Hatten, 2018; Levinson, 1996, 2006; Robinson & Hatten, 2012) have argued the opposite: that the listener sometimes experiences music as conveying virtual agency, or as a narrative containing a fictional or virtual persona. Recent empirical evidence supports this notion, demonstrating that listeners are able to perceive (instrumental) musical improvisations in terms of social intentions (Aucouturier & Canonne, 2017) and often construct imagined narratives when listening to instrumental music (e.g., Margulis, 2017; Margulis et al., 2022). Furthermore, instrumental music has been shown to elicit strong experiences of being moved and touched (e.g., Vuoskoski et al., 2022), a state that has more recently been conceptualized as a social-relational emotion with social bonding functions (e.g., Fiske et al., 2019; Menninghaus et al., 2015).
In approaching questions related to music-induced emotion, it is prudent not to assume that all listeners are having the same experience. Indeed, existing research suggests that there is considerable variability in listeners’ experiences. This variability is most apparent in the diversity of musical tastes. For example, listeners can be divided roughly evenly between those who like and those who dislike nominally sad music (Garrido & Schubert, 2011; Huron & Vuoskoski, 2020; Taruffi & Koelsch, 2014). Variability is also evident in the types of feelings evoked. For example, fans of Heavy Metal experience feelings of power and joy when listening to metal music, whereas non-fans experience tension and anger (Thompson et al., 2019).
Some of these differences have been linked to personality traits. For example, several studies have implicated openness to experience as a factor influencing the intensity of music-induced emotions (Colver & El-Alayli, 2016; Dobrota & Reić Ercegovac, 2015; Maruskin et al., 2012; McCrae, 2007; Nusbaum & Silvia, 2011; Rentfrow & Gosling, 2003; Silvia et al., 2015; Silvia & Nusbaum, 2011). More recently, however, it has been shown that listeners who score high on openness to experience also tend to exhibit high levels of empathy (Costa et al., 2014; Melchers et al., 2016; Sattmann & Parncutt, 2018). In a study involving some 300 participants, Sattmann and Parncutt (2018) found that openness to experience fails to serve as a significant predictor of music-induced emotion when a measure of trait empathy is included, suggesting that the main personality trait predicting individual differences in music-induced emotion is empathy (see also Eerola et al., 2016; Vuoskoski et al., 2012). In many ways, high trait empathy makes more sense than high openness, since empathy more clearly presupposes that the listener is interpreting the music as conveying some degree of agency.
An influential model of empathy is Mark Davis’s 4-factor model, which forms the basis for the Interpersonal Reactivity Index (IRI) (Davis, 1980, 1983). The four factors in Davis’s model include empathic concern, personal distress, perspective taking, and fantasy. Empathic concern is the disposition to feel concern or compassion for another person experiencing some misfortune or stress. Personal distress assesses the disposition to mirror or echo feelings of anxiety or unease when witnessing stress or suffering in others. Perspective taking is the cognitive tendency to spontaneously adopt the psychological point of view of others. Finally, fantasy ostensibly measures the ability to imaginatively transpose oneself into fictional situations, such as those found in literature, drama, or personal daydreams.
In the case of sad music, several studies have shown that the enjoyment of sad music is positively associated with empathic concern but not with personal distress (Eerola et al., 2016; Kawakami & Katahira, 2015; Vuoskoski & Eerola, 2017; Vuoskoski et al., 2012). Huron and Vuoskoski (2020) reviewed behavioral and neuroscientific research identifying compassion as a positively valenced emotion. If empathic concern is interpreted as a measure of compassion and personal distress is interpreted as a measure of (contagious) commiseration, then trait-related studies of sad music offer a ready explanation for the enjoyment of sad music: compared with sad-music dislikers, sad-music likers experience a higher ratio of (positive) compassion to (negative) commiseration. That is, those listeners who most enjoy sad music experience more positive feelings of concern, pity, charity, or sympathy in relation to negative feelings of distress, misery, suffering, or sorrow. This seemingly straightforward account of sad-music enjoyment is confounded, however, by the fact that the IRI fantasy factor is even more predictive of sad-music liking (e.g., Eerola et al., 2016; Vuoskoski et al., 2012).
In order for music to evoke a feeling of compassion, one would expect that a prior condition is that listeners perceive, sense, or infer some form of agency. Since much of the musical stimuli used in extant sad-music experiments have relied on non-vocal instrumental music, one possibility is that people who score high on trait fantasy are more likely to impute agency to instrumental sounds, and so are more likely to experience a socially oriented emotion, such as compassion. This conjecture suggests the hypothesis that high-fantasy listeners are more likely to impute agency to instrumental music. That is, listeners with high-fantasy proneness are more likely to describe music as conveying a sense of animacy or agency.
Method
A simple approach to testing this hypothesis might be to have listeners describe their musical experiences and compare their use of agency-related words or metaphors to their fantasy-facet scores on the IRI. Such open-ended tasks require complex content analysis of participant descriptions, and the design lends itself to low power. A more powerful design might involve asking participants to select music-appropriate descriptive terms from a prior list of independently assessed words with contrasting agency connotations.
In brief, three online studies were conducted. The materials and procedures were reviewed and approved by the Internal Research Ethics Committee at the Department of Psychology, University of Oslo (reference number: 9553162). All participants provided written informed consent before participation. In Study 1a, 57 native-English speakers rated 140 descriptive terms according to their implied agency. In Study 1b, 55 native-English speakers rated the similarity (synonymousness) of pairs of descriptive terms drawn from Study 1a. The result of Studies 1a and 1b was the creation of a word list used in Studies 2 and 3. In Study 2, 137 participants rated the appropriateness of each word as a useful musical descriptor. In Study 3, 299 listeners heard a variety of instrumental musical passages and were asked to choose suitable music descriptors. With the exception of Study 1b, participants in all studies also completed the IRI. All data and code are provided in the Open Science Framework (OSF) repository: https://osf.io/z3xhs/.
Music descriptors
In much music-related emotion research, it is common to have participants provide verbal descriptions of the emotional content, either emotions thought to be represented or conveyed by the music, or emotions thought to be evoked or induced in the listener. As noted, rather than employing an open-ended descriptive task, we aimed to increase experimental power by providing participants with a list of terms from which they would select appropriate descriptors.
In the current study, two properties of descriptive terms were of interest. First, how much does the term connote agency? And secondly, how suitable is the term for describing music? In addition, in order to reduce the possibility of highly skewed data, it would be helpful to match each high-agency term with a nearly synonymous low-agency term. Many musical descriptors clearly imply some sort of agent. Examples include words such as joyful, elated, and proud. Other descriptors imply less agency, such as rough, tranquil, and simple. Apart from the issue of agency, some descriptive terms are more appropriate for music than others. For example, listeners are more likely to describe a musical passage as peaceful or perky but not pastel or plastic.
In describing a particular musical passage, one would expect broad agreement among listeners regarding general features. For example, a given passage might reasonably be described as fast, happy, thrilling, or energetic, as opposed to slow, sad, morose, or lethargic. Ideally, in providing a list of possible descriptors, it would be useful to offer terms that are synonyms or near synonyms with regard to musical character that nevertheless differ with regard to agency or animacy. For example, the term heroic might be expected to imply more agency than the nominal synonym grand. Similarly, the term triumphant would seem to imply more agency than the nominal synonym monumental.
We began by assembling a list of descriptive musical terms gleaned from the existing literature on music-related emotion. For example, Zentner et al. (2008) collected a large number of musical descriptors from French fans attending a musical festival. From the collected descriptors, they produced a 9-factor model of music-related emotion. For the purposes of the present study, however, we used 38 unique descriptive words drawn from Study 2 of Zentner et al. (2008). This set included words (in English translation) such as angry, calm, nervous, and proud. Only terms that would function as descriptive adjectives were selected, and thus terms such as thrills and goose bumps were excluded. In addition to these descriptors, we supplemented our list with a further 48 music descriptive terms drawn from a number of other published and unpublished sources pertaining to the evaluation of music-related emotion, timbre, and other musical features. Our preliminary list of 86 words is shown in Table 1.
Table 1. The preliminary list of 86 words drawn from previous studies on music and emotion.
Since our intention was to compare high-agency descriptors with low-agency descriptors, it was necessary to identify approximate synonyms for each of the 86 candidate terms. Hence, for each term in Table 1, the authors identified one or more companion terms that the authors regarded as plausible synonyms that also contrast with the agency implied by the original term. For example, the authors conjectured that depressed implies relatively high agency and that a plausible low-agency synonym for depressed might be dark. Similarly, we conjectured that dull implies rather low agency and that a plausible high-agency synonym would be boring.
For several especially challenging terms, the authors identified two or more possible synonyms. For example, the descriptor sad appears to exhibit very high agency. It is difficult to identify words that are good synonyms for sad that do not also exhibit high agency. Consequently, we identified three possible low-agency words (gloomy, pale, and sunless), one of which we hoped might be considered more synonymous than the others.
In matching pairs of possible synonyms, we began by looking for other terms in our preliminary list that might provide suitable partners. Hence, we were able to link lively (presumed higher agency) with fast (presumed lower agency) and glum (presumed higher agency) with drab (presumed lower agency). In other cases, it was necessary to select descriptive terms not included in the preliminary list. For example, we matched the listed word slow (presumed lower agency) with a more explicitly higher agency non-list word tired. Similarly, we matched the listed word defiant (presumed higher agency) with a more explicitly lower agency non-list word resistant. In total, the entire roster of descriptors includes some 140 terms representing 70 nominal descriptive pairs (identified in Table 2). Non-list words in Table 2 are shown in italics.
Table 2. The 70 pairs of words used in Studies 1a and 1b, and the means and standard deviations representing implied agency, synonymousness, and difference in implied agency.
Words that were not part of the preliminary list (see Table 1) are shown in italics. The pairs marked with an asterisk (*) satisfied the strict selection criteria, whereas the pairs marked with a dagger (†) satisfied the more relaxed criteria (see text). Pairs rendered in bold were selected for inclusion in Study 2.
The purpose of Study 1a was to rate the degree of agency implied by each of the 140 descriptors. The purpose of Study 1b was to rate the extent to which the 70 pairs of terms were synonymous.
Participants
Participants for all studies were recruited through the Prolific recruitment portal (Palan & Schitter, 2018; Peer et al., 2017). All the studies involved tasks in which participants make subtle distinctions between different words in terms of their connotations and denotations. Language competency was therefore deemed essential. Accordingly, we recruited only participants who reported native or first-language competency in English. In total, 603 participants were recruited. Of these, 576 completed the surveys; 368 (63.89%) were female, and the mean age was 35.76 years. To avoid possible demand characteristics, recruitment was conducted so that no one could participate in more than one study.
Study 1a: Procedure
Sixty-five participants were asked to characterize candidate descriptive terms according to their implied agency using a continuous scale. Participants received the following instructions:
For each of the following words, we want you to rate how much the word implies a human person. For example, words like “happy” and “shout” might be rated as having a human implication, whereas words like “green” and “flow” might be considered less human-like. To what extent would this word be more suitable for describing a person or human action rather than an object, event, or natural phenomenon? Please aim to use the complete range of the scale.
Participants rated each word via a horizontal slider (range: 0–100) with the endpoints labeled very human-like and not at all human-like. The instructions above were visible throughout the task. The results are reported below in conjunction with the results for Study 1b.
Study 1b: Procedure
Sixty-one participants were presented with pairs of words representing our presumed descriptive synonyms and were asked to rate their similarity on a continuous scale (range: 0–100), with the endpoints labeled highly synonymous and not at all synonymous. Participants received the following instructions:
For each of the following pairs of words, rate how synonymous the words are. For example, the words “love” and “affection” might be rated as highly synonymous, whereas “climate” and “color” might be rated as not at all synonymous. Please try to use the full range of the scale.
Results (Studies 1a and 1b)
Prior to analysis, we inspected the data for unfinished responses and outliers. We computed the intercorrelations between all participants’ ratings, yielding for each participant a correlation coefficient that indexes how closely their responses matched those of all other participants. We excluded participants whose coefficients fell more than two standard deviations below the mean of all participant coefficients. In Study 1a, we removed four participants who did not finish the survey and four as outliers. In Study 1b, we removed one participant for not completing the survey and five participants as outliers. Means and standard deviations for both agency (Study 1a) and synonym ratings (Study 1b) are shown in Table 2. The range for both rating scales was 0–100. Differences in mean agency scores are also tabulated. Each line in the table identifies a pair of proposed synonyms. As noted, we proposed more than one synonym in some cases (such as joyful and sad). In total, participants rated 70 pairs of words.
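The exclusion rule can be illustrated with a short sketch. It assumes that each participant's coefficient is the mean Pearson correlation between their ratings and those of every other participant; the text does not say whether this or a correlation with the pooled mean was used. The function names and data layout are hypothetical and are not taken from the OSF code.

```python
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation between two equal-length rating lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def agreement_scores(ratings):
    """Map each participant to their mean correlation with all others.

    ratings: dict of participant id -> list of ratings, with the items
    in the same order for every participant (assumed layout).
    """
    scores = {}
    for pid, own in ratings.items():
        others = [pearson(own, r) for other, r in ratings.items() if other != pid]
        scores[pid] = mean(others)
    return scores


def flag_outliers(scores, n_sd=2.0):
    """Return ids whose score falls more than n_sd SDs below the mean score."""
    values = list(scores.values())
    cutoff = mean(values) - n_sd * stdev(values)
    return sorted(pid for pid, s in scores.items() if s < cutoff)
```

With small samples a single deviant participant can only just reach the two-standard-deviation cutoff, so a rule of this kind behaves conservatively unless the sample is reasonably large.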
Recall that the aim of Studies 1a and 1b was to assemble a set of descriptive pairs satisfying two criteria. The first was that pairs of words differ with regard to agency; that is, where one term connotes relatively lower agency than the other term in the pair. The second criterion was that the pairs were deemed relatively synonymous.
We had set these two criteria a priori for selecting our final list of pairs of descriptive terms: pairs were retained if both their synonym ratings and their agency-difference ratings fell above the median for the respective measure. Because this selection method might prove too strict, we decided to relax the criteria as needed in order to ensure a list of at least 15 paired terms (30 descriptors) for use in the subsequent studies. Moreover, in order to ensure a range of descriptive terms capable of characterizing a variety of music, we resolved to retain at least two descriptive pairs capable of characterizing music in each of the four valence/arousal quadrants in the common two-dimensional emotion model (nominally representing happy, sad, scary, and tender musical passages).
In the data we collected, we found that 15 pairs of words satisfied both criteria (i.e., for contrasting agency and high synonymousness). These pairs are marked with asterisks in Table 2. Unfortunately, three words (cheerful, depressed, and gloomy) were duplicated. We added two further pairs of words (cheerful/radiant and glum/drab) by lowering the agency difference from the median value (23.75) to 22.00 and the synonym ratings from the median value (65.28) to 64.00. By replacing cheerful/bright with cheerful/radiant and depressed/gloomy with glum/drab, we ended up with 15 unique pairs of synonyms employing 30 unique (i.e., unduplicated) words.
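The selection rule amounts to a simple filter, sketched below. The sketch assumes each pair is stored as a tuple of (high-agency word, low-agency word, agency difference, synonym rating); the function name and tuple layout are hypothetical. Leaving the cutoffs unset applies the strict above-median rule, whereas passing explicit values (such as the 22.00 and 64.00 mentioned above) applies the relaxed rule.

```python
from statistics import median


def select_pairs(pairs, agency_cut=None, synonym_cut=None):
    """Keep pairs whose agency difference and synonym rating exceed the cutoffs.

    pairs: list of (high_word, low_word, agency_diff, synonymy) tuples
    (assumed layout). A cutoff left as None defaults to the median of
    that measure across all pairs, i.e. the strict criterion.
    """
    if agency_cut is None:
        agency_cut = median(p[2] for p in pairs)
    if synonym_cut is None:
        synonym_cut = median(p[3] for p in pairs)
    return [p for p in pairs if p[2] > agency_cut and p[3] > synonym_cut]
```

Removing duplicated words and checking quadrant coverage would remain separate manual steps, as described in the text.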
Conveniently, the resulting 15 pairs of synonyms also fulfilled the criterion we had set, a priori, of ensuring a minimum of two pairs to represent happy, sad, scary, and tender moods, respectively. These were as follows: happy (cheerful/radiant, happy/bright, and flamboyant/colorful); sad (sad/gloomy, depressed/dark, drained/colorless, and glum/drab); scary (defiant/resistant, violent/rough, and frightened/shaky); and tender (sympathetic/soothing and caring/soft). The three remaining pairs of synonyms exhibiting contrasting agency with high synonymousness included fat/heavy, intimate/close, and open/wide. These pairs are shown in bold font in Table 2.
Study 2
A total of 154 native-English speakers were recruited for this experiment. Participants were presented with the 30 descriptive terms selected from Studies 1a and 1b. They received the following instructions:
When people describe music they may make use of many different descriptive words. Some words seem better suited than others for describing a particular musical passage. For example, it might be appropriate to describe a particular song or musical passage as “bold” or “charming.” But words like “curly” or “round” might seem less suitable for describing a musical passage. In this experiment, you can aid our research by helping us to identify those words that are most and least useful for describing music. For each of the following words, rate how pertinent the word is for describing music. Please try to use the complete range.
Each participant was presented with the 30 words in unique random order. Each word appeared in isolation, and participants were able to move a vertical slider (range: 0–100) whose endpoints were labeled not at all appropriate word for describing music and highly appropriate word for describing music. Following the 30 target words, five randomly selected words (flamboyant, shaky, wide, sad, and cheerful) were repeated in order to collect test-retest reliability measures.
After completing the word-rating task, participants then completed the 28-question IRI empathy survey (Davis, 1980).
Results (Study 2)
Prior to analysis, we inspected the data for unreliable responses and outliers. First, as in Study 1, we excluded participants whose coefficients fell more than two standard deviations below the mean for all participant coefficients (n = 10). Second, we correlated each participant’s test-retest data for the five randomly selected words: flamboyant, shaky, wide, sad, and cheerful. Again, participants whose coefficients fell more than two standard deviations below the mean for all participant coefficients were removed (n = 7), leaving a final sample size of N = 137.
Linear mixed-effects model
To test our hypothesis that listeners higher in IRI trait fantasy would more often choose musical descriptors higher in word agency, we created a linear mixed-effects model using the lme4 R-package v. 1.1-23 (Bates et al., 2015). Specifically, we tested whether high-agency words would interact with participants’ fantasy scores in predicting music-applicability ratings. Word agency was treated as a dichotomous variable, and fantasy score was treated as a continuous variable. The model employed random intercepts for both participants and words, as well as random slopes for word agency (between-word variable) by participants, and fantasy scores (between-participant variable) by words. The model equation is presented below.
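Read literally, the verbal description corresponds to a model of roughly the following form. This is a sketch under stated assumptions rather than the fitted specification: the symbols are introduced here for illustration, and correlated random intercepts and slopes are assumed within each grouping factor.

```latex
% Assumed notation (not from the original):
%   y_{ij} : music-applicability rating given by participant i to word j
%   A_j    : dichotomous agency of word j (0 = low, 1 = high)
%   F_i    : IRI fantasy score of participant i
y_{ij} = \beta_0 + \beta_1 A_j + \beta_2 F_i + \beta_3 A_j F_i
         + u_{0i} + u_{1i} A_j   % participant intercept and agency slope
         + w_{0j} + w_{1j} F_i   % word intercept and fantasy slope
         + \varepsilon_{ij}
```

In lme4 formula syntax this would read `rating ~ agency * fantasy + (1 + agency | participant) + (1 + fantasy | word)`, where the variable names are again hypothetical.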
The results for the fixed effects are presented in Table 3, and the random effects are provided in the Supplementary Tables on OSF. There were no significant main effects of word agency (β = .132, p = .345) or trait fantasy (β = .044, p = .062), and no significant interaction effect (β = .001, p = .668) in predicting music-applicability ratings.
Table 3. The fixed and interaction effects of the linear mixed-effects model predicting music-applicability ratings.
Correlation analysis
To account for the continuous variation in agency scores, which the dichotomous coding of word agency ignores, we performed a second analysis. First, for each participant, ratings of musical appropriateness for each of the 30 words were correlated with the mean agency scores from Study 1a using Spearman rank-order correlations. Second, these correlation coefficients were correlated with the participants’ Fantasy scores. There was no significant association between the participants’ correlation coefficients and Fantasy scores (ρ = .017, p = .846).
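The two-stage procedure can be sketched as follows. This is a minimal standard-library illustration, not the authors' code: Spearman's ρ is computed as the Pearson correlation of average ranks, and the function names and dictionary layout are assumptions.

```python
from statistics import mean


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank-order correlation (Pearson correlation of ranks)."""
    return pearson(ranks(x), ranks(y))


def fantasy_agency_association(applicability, word_agency, fantasy):
    """Stage 1: per-participant rho between applicability ratings and agency.
    Stage 2: rho between those coefficients and Fantasy scores.

    applicability: dict participant id -> ratings per word (assumed layout)
    word_agency:   mean agency score per word, in the same word order
    fantasy:       dict participant id -> Fantasy subscale score
    """
    ids = sorted(applicability)
    per_participant = [spearman(applicability[p], word_agency) for p in ids]
    return spearman(per_participant, [fantasy[p] for p in ids])
```

A positive second-stage coefficient would indicate that participants with higher Fantasy scores tend to rate higher-agency words as more applicable to music.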
Study 3: Procedure
In Study 2, we asked participants to rate the applicability of different descriptive terms to music; we did not provide any musical stimuli. That is, participants responded to an abstract question in the absence of any actual musical context. There remained, therefore, the question as to whether an association between fantasy and agency would be evident when participants were asked to listen to musical excerpts and characterize them. To that end, we recruited 323 native-English speakers to Study 3, delivered online, in which they responded to 24 musical excerpts. For each excerpt, participants selected the most appropriate descriptive terms (up to a maximum of three) from a list of 16 terms varying in their degree of agency. Excerpts were 15 s in duration, and participants were required to listen to at least 10 s of each excerpt to be included in the analysis. According to this criterion, we excluded 2 participants, and we also excluded 22 further participants who did not complete the survey, resulting in 299 complete responses for analysis.
We obtained the musical stimuli from a film music stimulus set compiled by Eerola and Vuoskoski (2011). The stimuli were selected to represent the four quadrants of the arousal-valence space: high arousal and positive valence (nominally happy), low arousal and positive valence (tender), high arousal and negative valence (scary), and low arousal and negative valence (sad). In the present study, each quadrant was represented by six 15 s musical excerpts. The list of stimuli and their mean valence and arousal ratings are provided in Appendix 1.
So that the task would not take too long, we trimmed the list of 30 descriptive terms (15 synonym pairs) used in Study 2 to 16 descriptive terms (8 synonym pairs). In selecting our reduced subset of descriptive terms, our first criterion was to ensure the availability of suitable descriptors for each of the four quadrants in the arousal-valence space. For each quadrant, our aim was to provide four pertinent descriptors (two synonym pairs).
Each of the three authors of the study independently rated all 30 descriptive terms according to their arousal and valence. The inter-author average paired correlation was .93 for valence ratings and .89 for arousal ratings. More importantly, the authors independently agreed on the quadrant assignments for all 30 descriptive terms.
As documented above, in Study 2, participants deemed some terms more appropriate than others for describing music. For example, terms such as sad and cheerful were rated high as musical descriptors, whereas fat and close were rated low. Moreover, the differences in music applicability for some synonym pairs were greater than for others. Examples include sympathetic/soothing and caring/soft; that is, sympathetic and caring were rated as much less applicable to music than their synonymous partners soothing and soft.
In Study 3, our aim was to investigate inter-individual variability in choosing high-agency vs. low-agency descriptors for musical excerpts. If all participants regarded soothing as more applicable to music than sympathetic, then we would be unlikely to see any tendency for high-fantasy participants to favor high-agency descriptors. Consequently, in order to increase the sensitivity of the study, we should include synonym pairs rated least different for music applicability. Accordingly, for each of the four arousal-valence quadrants, we selected those synonym pairs with the smallest differences between their mean music-applicability ratings.
The synonym pairs are shown in Table 4, classified according to the four arousal-valence quadrants. Also shown are the mean arousal and valence values (ranging from −5 to +5) according to the three authors’ ratings. The differences between the mean music-applicability scores for each synonym pair are shown in the fourth column. (Positive difference scores are obtained when high-agency words are rated more applicable to music; negative difference scores are obtained when low-agency words are rated as more applicable to music.) In each arousal-valence quadrant category, synonym pairs are ordered according to the absolute difference in mean applicability scores. Once again, in order to enhance sensitivity, we aimed to select those synonym pairs whose descriptive words are similarly appropriate for describing music. That is, we chose those pairs with the smallest absolute difference scores for music applicability. For low arousal/positive valence, we selected intimate/close and caring/soft; for low arousal/negative valence, we selected drained/colorless and glum/drab; for high arousal/positive valence, we selected flamboyant/colorful and happy/bright; for high arousal/negative valence, we selected violent/rough and frightened/shaky. Notice that the mean absolute difference in applicability scores was considerably higher (M = 50.6) for the selected low arousal/positive valence word pairs than for all other selected word pairs (M = 12.0).
Table 4. The list of 15 synonym pairs used in Study 2, categorized into the four valence-arousal quadrants, with the authors’ mean ratings of valence and arousal (range: −5 to +5), as well as the mean music-applicability ratings from Study 2 and the difference in applicability for each pair (positive difference scores indicate that the high-agency word was rated more applicable to music; negative difference scores indicate that the low-agency word was rated more applicable to music).
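The pair-selection step for Study 3 can be illustrated with a brief sketch. It assumes each synonym pair is recorded with its quadrant label and the mean applicability rating of each word; the function name and tuple layout are hypothetical.

```python
from collections import defaultdict


def pick_matched_pairs(pairs, per_quadrant=2):
    """For each quadrant, keep the pairs with the smallest absolute
    difference in mean music-applicability ratings.

    pairs: iterable of (quadrant, high_word, low_word,
    applicability_high, applicability_low) tuples (assumed layout).
    """
    by_quadrant = defaultdict(list)
    for quadrant, high, low, a_high, a_low in pairs:
        by_quadrant[quadrant].append((abs(a_high - a_low), high, low))
    return {
        quadrant: [(high, low) for _, high, low in sorted(items)[:per_quadrant]]
        for quadrant, items in by_quadrant.items()
    }
```

A quadrant with only two candidate pairs retains both regardless of how large their applicability differences are.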
Participants received the following instructions:
In this experiment, you will hear a series of 24 musical excerpts. You will also see displayed a number of descriptive words. After the completion of the music, we want you to identify up to three words that you feel best describe the music. You can select fewer than three words, although you must select at least one word for each excerpt. When you have finished selecting the most pertinent descriptive words, press on the “next excerpt” button to continue with the next musical example.
The 16 displayed descriptors were bright, caring, close, colorful, colorless, drab, drained, flamboyant, frightened, glum, happy, intimate, rough, shaky, soft, and violent. The order of the displayed descriptors was randomized for each participant, although the order of descriptors for any given participant was retained for successive musical excerpts. Descriptive terms were displayed throughout the sounded musical excerpt. The response page included the instruction reminder: “Select up to three words.”
Following the end of each musical stimulus and having selected the most appropriate descriptive term(s), participants subsequently rated their enjoyment of the musical excerpt. The screen displayed the question, “How much did you enjoy the musical excerpt?” A horizontal slider was provided with the left and right ends labeled Did not enjoy at all and Enjoyed very much, respectively. The default slider position was set in the middle.
After completing the music descriptor selection and enjoyment rating tasks for all 24 excerpts, participants then completed the 28-question IRI empathy survey.
Results (Study 3)
For each participant, we obtained an aggregate musical-descriptor agency score for each excerpt by summing the agency values of all words chosen and dividing by the number of words chosen. This aggregate score was used as our dependent variable.
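As a minimal sketch of this computation, the aggregate score is simply the mean agency value of the one to three words a participant selected for an excerpt. The function name `aggregate_agency` and the agency values in the example dictionary are hypothetical and are not the Study 1 ratings.

```python
def aggregate_agency(chosen_words, agency_by_word):
    """Mean agency value of the words chosen for one excerpt.

    chosen_words: the 1 to 3 descriptors selected by a participant.
    agency_by_word: mapping from descriptor to its agency rating.
    """
    if not 1 <= len(chosen_words) <= 3:
        raise ValueError("participants selected between one and three words")
    total = sum(agency_by_word[word] for word in chosen_words)
    return total / len(chosen_words)


# Hypothetical agency ratings, for illustration only.
example_agency = {"happy": 80.0, "bright": 20.0, "soft": 30.0}
example_score = aggregate_agency(["happy", "bright"], example_agency)
```

Because the sum is divided by the number of words chosen, a participant who selects one word and a participant who selects three are placed on the same scale.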
A linear mixed-effects model analysis was conducted using the lme4 R package v. 1.1-23 (Bates et al., 2015) to investigate whether people scoring higher in trait fantasy would be more likely to choose musical descriptors that were high in agency. Further, we included enjoyment ratings in the model to investigate whether this factor had a main effect on musical-descriptor agency scores or interacted with IRI fantasy scores in predicting them. In addition, we included valence and arousal as factors to investigate whether the potential main effects of IRI fantasy or enjoyment ratings would differ between low- and high-arousal excerpts and between low- and high-valence excerpts. In order to avoid overfitting, we created three models: the first model tested the interaction between fantasy and enjoyment, the second model tested the interaction between enjoyment and valence, and the third model tested the interaction between enjoyment and arousal. Each model employed random intercepts for both participants and excerpts, as well as random slopes for fantasy (a between-participants variable) by excerpts; for valence and arousal (between-excerpts variables) by participants; and for enjoyment (which varied both between participants and between excerpts) by participants and by excerpts. The model equations are given below:
Model 1: [equation not preserved in this version of the text]
Models 2 and 3: [equations not preserved in this version of the text]
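Based only on the verbal description of the fixed and random terms, the structure of the third model (the enjoyment-by-arousal interaction) might be written as follows. This is a reconstruction under assumptions, not the authors' published equation: the symbols are introduced here for illustration, and it is assumed that fantasy, valence, and arousal all entered each model as main effects alongside the single tested interaction.

```latex
\begin{aligned}
y_{pe} ={}& \beta_0 + \beta_1 F_p + \beta_2 E_{pe} + \beta_3 V_e + \beta_4 A_e
            + \beta_5 \,(E_{pe} \times A_e) \\
          & + u_{0p} + u_{1p} E_{pe} + u_{2p} V_e + u_{3p} A_e
            && \text{(random effects by participant } p\text{)} \\
          & + w_{0e} + w_{1e} E_{pe} + w_{2e} F_p
            && \text{(random effects by excerpt } e\text{)} \\
          & + \varepsilon_{pe}
\end{aligned}
```

Here $y_{pe}$ is the aggregate descriptor-agency score of participant $p$ for excerpt $e$, $F_p$ the IRI fantasy score, $E_{pe}$ the enjoyment rating, and $V_e$ and $A_e$ the valence and arousal of the excerpt. Under the same assumptions, Model 1 would replace the interaction term with $F_p \times E_{pe}$ and Model 2 with $E_{pe} \times V_e$.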
The fixed effects of the linear mixed-effects models are presented in Table 5, and the random effects are provided in the Supplementary Tables on OSF. Measures of model fit (Akaike information criterion; AIC) indicated that Model 3 was the best fit to the data. There were no significant main or interaction effects related to trait fantasy. However, there were significant main effects of arousal (B = 8.13, p < .001) and enjoyment (B = 0.11, p < .001), as well as a significant interaction between arousal and enjoyment (B = −0.07, p = .002). The effect of enjoyment on the aggregate descriptor-agency scores was stronger for low-arousal excerpts compared to high-arousal excerpts.
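The direction of the interaction can be illustrated with a short worked calculation. It assumes, purely for illustration, that arousal entered the model as a 0/1 indicator (0 = low-arousal excerpts, 1 = high-arousal excerpts); the coding actually used is not stated in this section, so the resulting slope values are a sketch rather than reported estimates.

```latex
\frac{\partial \hat{y}}{\partial E} = 0.11 + (-0.07)\,A
\quad\Longrightarrow\quad
\begin{cases}
A = 0 \ (\text{low arousal}): & 0.11 \\
A = 1 \ (\text{high arousal}): & 0.11 - 0.07 = 0.04
\end{cases}
```

Under this assumed coding, each additional unit of enjoyment corresponds to a larger increase in the agency score for low-arousal excerpts than for high-arousal excerpts, which matches the pattern described in the text.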
Table 5. The fixed and interaction effects of the three linear mixed-effects models predicting the aggregate musical-descriptor agency scores for each excerpt.
Discussion (Study 3)
The motivation for this study was to explain why high trait fantasy might be associated with the enjoyment of sad music. We proposed that those individuals who score high on fantasy might be more likely to perceive music as an agent or actor, and therefore sad music might be more likely to evoke a (positive) empathetic response.
In Study 2 we tested whether high-agency words would interact with participants’ fantasy scores in predicting ratings of applicability for describing music. We found no evidence consistent with such an interaction effect.
In Study 3, we had participants judge the applicability of different words in describing actual music examples. Once again, our aim was to test whether those individuals who score high on fantasy might be more likely to characterize music using high-agency descriptors. Consistent with the results of Study 2, we found no relationship between IRI trait fantasy and a tendency to favor high-agency descriptors for music, contrary to our motivating hypothesis.
Interestingly, Study 3 revealed an unexpected relationship between the tendency to favor high-agency descriptors and both musical arousal and participant enjoyment. Specifically, the more a participant enjoyed the music, the greater the likelihood of favoring high-agency descriptors of the music. In addition, the greater the musical arousal, the greater the likelihood of a participant favoring high-agency descriptors. At the same time, the results indicated a negative interaction between these two factors: for high-arousal excerpts, enjoyment contributed less to the tendency to favor high-agency descriptors.
In the case of enjoyment and agency, there is an issue of causality: does greater enjoyment dispose listeners to favor higher-agency descriptors? Or does the perception of higher agency result in greater enjoyment? In the case of arousal, the causality is clearer since arousal levels were not judged by the participants. A preference for high-agency descriptors cannot cause the musical stimuli to be more arousing. Instead, higher-arousal music must be the reason why participants prefer higher-agency descriptors.
Interpreting the overall results, high-arousal music appears to encourage listeners to perceive the music as exhibiting more agency. It may also be the case that greater enjoyment disposes listeners to characterize the music as exhibiting greater agency, but it is also possible that music perceived as exhibiting greater agency is enjoyed more.
Post hoc analyses
Recall that the motivation for the current study was to establish a better understanding of sad-music enjoyment. The musical stimuli used in Study 3 were drawn from a range of emotions, including both sad and non-sad passages. Perhaps the relationship proposed in our main hypothesis is pertinent only to nominally sad music. The stimuli for Study 3 consisted of 24 musical passages explicitly drawn from the four quadrants of the common arousal-valence model. Nominally melancholic music is typically found in the low-arousal/negative-valence quadrant. Accordingly, we conducted a post hoc analysis in which we used only data from the low-arousal/negative-valence quadrant. Once again, the aim of our analysis was to determine whether IRI fantasy scores play a statistically significant role in predicting agency scores. Our analysis showed a non-significant relationship (p = .59).
Since fantasy is implicated in the enjoyment of sad music, another post hoc approach might focus on the degree to which participants enjoy sad music. For this analysis, we created a new variable (sad-music liking). This was operationalized as the difference score between a participant’s mean enjoyment rating for low-arousal/negative-valence stimuli and the participant’s mean enjoyment rating for all other stimuli. Here, we made two predictions: those participants who most enjoy low-arousal/negative-valence stimuli were more likely to score high on IRI fantasy and were also more likely to choose high-agency words as appropriate music descriptors. Restricting our analysis to the low-arousal/negative-valence quadrant, we found a non-significant correlation (p = .60) between low-arousal/negative-valence music liking and IRI fantasy. In addition, we found a non-significant correlation (p = .54) between low-arousal/negative-valence music liking and word agency score.
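The sad-music liking variable is a simple difference of means, and a minimal sketch follows. The function name `sad_music_liking`, the quadrant labels, and the enjoyment values are hypothetical; the sketch only assumes that each rating is tagged with the quadrant of its excerpt.

```python
from statistics import mean

SAD_QUADRANT = "low arousal/negative valence"  # label chosen for this sketch


def sad_music_liking(ratings):
    """Difference between a participant's mean enjoyment of
    low-arousal/negative-valence excerpts and the mean enjoyment
    of all other excerpts.

    ratings: list of (quadrant label, enjoyment rating) for one participant.
    """
    sad = [enjoyment for quadrant, enjoyment in ratings if quadrant == SAD_QUADRANT]
    other = [enjoyment for quadrant, enjoyment in ratings if quadrant != SAD_QUADRANT]
    return mean(sad) - mean(other)


# Hypothetical enjoyment ratings for one participant.
example_ratings = [
    ("low arousal/negative valence", 70),
    ("low arousal/negative valence", 50),
    ("high arousal/positive valence", 40),
    ("low arousal/positive valence", 60),
]
example_liking = sad_music_liking(example_ratings)
```

A positive score indicates that the participant enjoyed the nominally sad excerpts more than the remaining excerpts; a negative score indicates the reverse.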
Recall that apart from fantasy, the IRI assesses three other facets of empathy: empathic concern, personal distress, and perspective taking. Although our original research plan focused exclusively on fantasy, the enjoyment of sad music is known to also be related to empathic concern. Notably, sad-music likers tend to score higher on empathic concern. Consequently, we tested whether those participants who scored higher on empathic concern were more likely to favor high-agency descriptors for low-arousal/negative-valence music. Once again, we found a non-significant correlation.
We also decided to test whether the current results replicated those of earlier studies regarding the relationship between trait fantasy, empathic concern, and sad-music enjoyment. Several studies have found that listeners who most enjoy nominally sad music tend to score high on trait fantasy and empathic concern in the IRI empathy scale (Eerola et al., 2016; Kawakami & Katahira, 2015; Sattmann & Parncutt, 2018; Vuoskoski & Eerola, 2017). Consistent with this research, we found a significant positive association between low-arousal/negative-valence music liking and IRI fantasy (r = .12, p = .045). We also found a significant positive relationship between liking for low-arousal/negative-valence music and empathic concern (r = .16, p = .005).
In our final post hoc analysis, we examined the possibility that the significant relationship between high-arousal music and the selection of more high-agency descriptors might be due to some inherent bias toward high agency among the high-arousal descriptors. We did this by calculating the Spearman rank-order correlation between the mean arousal ratings given by the three authors for the descriptive terms (obtained after Study 2) and the mean agency ratings obtained in Study 1A. There was no significant correlation (r = .11, p = .53), suggesting that the observed association between agency and musical arousal is unlikely to be due to any systematic bias in the study design. However, since the present study utilized only a limited number of musical excerpts and descriptive terms, it is nevertheless possible that the design may have caused certain high-arousal, high-agency words to be favored over their low-agency alternatives.
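For readers unfamiliar with the statistic, the Spearman rank-order correlation is the Pearson correlation computed on ranks, with tied values given the average of their ranks. The sketch below implements it with the Python standard library only; the function names and the example values are hypothetical and are not the descriptor ratings analyzed here.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical arousal and agency values for four descriptors.
example_rho = spearman([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0])
```

In the analysis described above, the two inputs would be the authors' mean arousal ratings and the Study 1A mean agency ratings for the same set of descriptive terms.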
Study 3: Conclusion
The results of this study did not reveal any positive association between trait fantasy and high-agency descriptors in a music-listening task. However, we found that when asked to select words that best describe a particular musical passage, participants were more likely to select words associated with high agency for high-arousal music as well as for music that they enjoyed.
General discussion
In the first instance, our results were not consistent with the motivating hypothesis that high-fantasy listeners are more likely to impute agency to instrumental music. We failed to find a relationship between trait fantasy and the tendency to favor high-agency descriptors for music, both in a word-rating task and in a listening experiment.
While the participants in our studies found many of the high-agency descriptors highly appropriate for describing music, our findings suggest that trait fantasy does not seem to facilitate the detection or perception of agency in instrumental music. Davis defined “fantasy” as “the tendency to imaginatively transpose oneself into fictional situations (e.g., books, movies, daydreams)” (Davis, 1980, p. 11). As defined, the fantasy facet aims to assess the degree to which a person is more or less transported, absorbed, or gets into some fantasized or imagined experience. Thus, it seems plausible that fantasy contributes to the degree to which listeners empathically identify with the detected agency cues in music rather than to the detection of these cues. However, further experimental research is required to investigate this possibility more thoroughly.
Although the present series of studies did not reveal any significant relationship between trait fantasy and agent-sensitivity in music listening, the results nevertheless suggest that the detection of agency may be important in music-related emotion. Specifically, we found that both enjoyment and musical arousal were positively associated with the tendency to select musical descriptors implying high agency. The fact that enjoyment was positively associated with the tendency to favor high-agency descriptors supports the view that social cognition and social emotions play an important role in our experience of music (cf. Clarke et al., 2015; Huron & Vuoskoski, 2020). Specifically, it is possible that the positive relationship between enjoyment and attribution of agency is related to a process of identification with the music. The experience of identification is closely related to empathy (e.g., Davis, 1980; Egermann & McAdams, 2012), and appears to involve heightened enjoyment and increased similarity between the listener’s perceived and felt emotions (see, for example, Egermann & McAdams, 2013; Schubert, 2007, 2013). Sloboda (2000) suggests that music may create an environment where the attribution of the detected emotion (either to oneself or to an external agent) may be particularly fluid. The experience of identification may even involve sharing the emotions of an imagined, indefinite agent or persona conveyed by the music (Levinson, 2006), a process in which trait fantasy may potentially play a modulating role. However, the question of causality remains open: does the perception of agency contribute to increased identification and enjoyment, or do enjoyment and/or the experience of identification make listeners more prone to selecting descriptors implying high agency? Both options appear plausible and should be subject to further experimental investigation.
The arousal dimension is positively associated with energy and activity (cf. Schimmack & Grob, 2000), and thus it is possible that the perception of high activity contributes to an increased sense of agency. It may also be that high-arousal music evokes more intense emotions in listeners (cf. Dibben, 2004) or gives rise to more salient motion imagery (cf. Eitan & Granot, 2006), which in turn could contribute to increased attribution of agency to music. However, it should be noted that this conjecture remains speculative, since the current experiments did not investigate induced emotional responses or imagery. It is furthermore possible that some aspect of the experimental design may have favored high-arousal, high-agency descriptors over their low-agency counterparts, although we attempted to mitigate this possibility with our post hoc analyses. Nevertheless, future studies should investigate how the detection of agency contributes to music-induced emotions (or vice versa) since music enjoyment has been shown to be positively associated with emotional intensity (e.g., Ladinig & Schellenberg, 2012).
By way of summary, the research presented here suggests that listeners find both high- and low-agency descriptors appropriate for describing music. The tendency to favor high-agency descriptors was associated with increased enjoyment and musical arousal. Trait fantasy did not appear to facilitate the attribution of agency to music.
Appendix
List of stimuli used in Study 3, with mean valence and arousal ratings obtained from Eerola and Vuoskoski (2011).
| Category | Number a | Soundtrack name | Valence (range: 1–9) | Arousal (range: 1–9) |
|---|---|---|---|---|
| Positive valence and low arousal | 44 | Pride and Prejudice | 7.38 | 3.21 |
| | 48 | Dracula | 5.85 | 3.56 |
| | 83 | Big Fish | 6.40 | 4.49 |
| | 98 | Naked Lunch | 5.46 | 4.78 |
| | 102 | Shakespeare In Love | 6.01 | 4.96 |
| | 103 | The Fifth Element | 5.87 | 4.54 |
| Positive valence and high arousal | 23 | Shallow Grave | 8.27 | 8.54 |
| | 53 | Gladiator | 7.07 | 6.76 |
| | 72 | Man of Galilee CD1 | 7.45 | 8.39 |
| | 75 | Batman | 7.31 | 8.04 |
| | 77 | Lethal Weapon 3 | 6.27 | 6.34 |
| | 78 | Crouching Tiger | 5.30 | 5.87 |
| Negative valence and low arousal | 32 | Running Scared | 4.04 | 3.67 |
| | 33 | The Portrait of a Lady | 4.38 | 2.48 |
| | 38 | Dracula | 4.73 | 2.79 |
| | 63 | Batman | 4.76 | 3.96 |
| | 89 | Blanc | 4.33 | 3.21 |
| | 90 | Batman Returns | 4.43 | 3.55 |
| Negative valence and high arousal | 2 | The Rainmaker | 2.50 | 8.21 |
| | 64 | The Fifth Element | 3.03 | 5.12 |
| | 91 | The Alien Trilogy | 3.69 | 7.09 |
| | 92 | The Fifth Element | 2.99 | 6.99 |
| | 93 | Babylon 5 | 2.46 | 7.25 |
| | 97 | Shallow Grave | 2.88 | 6.51 |
Note. Stimuli were obtained from the database published by Eerola and Vuoskoski (2011).
a Stimulus number in the set of 110 film music excerpts published by Eerola and Vuoskoski (2011).
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was partially supported by the Research Council of Norway through its Centres of Excellence scheme, project number 262762.
