Abstract
We present a novel framework for music and emotion research that addresses emotional experiences with music as functional episodes. This framework, called the Episode Model, places the situation and the function of the music for the individual at the centre of the experience and integrates acts of affective self-regulation into our understanding of music as emotional experience. The model consists of a set of five common and functionally unique episodes of emotional experiences related to music, which are: (1) Enjoyment–Distraction–Relaxation (EDR), (2) Connection–Belonging (CB), (3) Focus–Motivation (FM), (4) Personal Emotional Processing (PEP), and (5) Aesthetic–Interest–Awe (AIA). Each episode type can be characterised by a distinct configuration of six descriptive schemes: (1) core affect and emotion qualia, (2) induction mechanisms, (3) listening modes and agency, (4) reward and exposure, (5) musical meanings, and (6) functional contexts. This framework of episodes and schemes places the functionality of emotions at the forefront of music and emotion research and explains how emotional experiences are situated and functionally constructed. In addition, we provide a set of assumptions and specific predictions to facilitate focussed empirical studies of emotional engagement with music.
The ways we experience and react to music are diverse, situated, contextualised, and often emotional (Boer et al., 2012; Greb et al., 2018; Randall et al., 2014). Our reactions range from a gentle smile or a curious rise of the eyebrows all the way to being moved to tears or passionately tapping along. That the same person can react differently to the same song, with only the situational context differentiating the experience, illustrates the complexity of studying these experiences. Over the past two decades, researchers have made great advances toward understanding how music relates to emotions. The importance of context and situation for the musical experience has been noted by several scholars (e.g., Juslin & Laukka, 2004; Scherer & Zentner, 2001), with progress made in how to empirically measure contextual features alongside their relevance for music listening experiences (Greb et al., 2018; Randall et al., 2014). In this article, however, we argue that there is a gap in the theoretical perspectives concerning the functional and situated nature of these experiences that has not been fully articulated by the theorising to date. By drawing together some of the recent theoretical developments across the field, we propose a new theoretical framework that addresses this gap and shifts the focus of music and emotion research toward characterising functional and situated patterns of emotional episodes.
Classical paradigms of general emotion psychology, including basic emotions and dimensional models, have offered solid ways to study music-related emotions, exploring, for instance, whether musical features serve as determinants of such previously validated labels of emotional content (Eerola & Vuoskoski, 2012; Juslin et al., 2022; Warrenburg, 2020). Interest in understanding the source of music-related emotions has broadened from musical features to psychological and cognitive mechanisms of emotion induction (Juslin & Västfjäll, 2008). Appraisal and socio-constructivist approaches have emerged and emphasised the role of personal goals, states, and contextual factors as constituents of the complex interplay leading to emotional responses (Céspedes-Guevara & Dibben, 2022; Juslin, 2019; Lennie & Eerola, 2022; Scherer & Coutinho, 2013). Here we critically address the current set of theoretical accounts and argue that they have not yet offered a satisfying solution to understand the contextual nature of music-related emotional experiences. Existing accounts are inadequate in their capacity to explain what is typically taking place in the context of music engagement and why people actually experience emotions in relation to music. Our argument aligns with recent developments in emotion research, where situation-specific needs and contextual (situational and cultural) factors are now recognised to play a much more important role in emotions than previously acknowledged (see Brosch et al., 2008; Chiao & Ambady, 2007; Hoemann et al., 2020).
In this article, we argue that to understand the role of the situation and the individual context for music-related emotions, a more procedural, integrative, and holistic perspective is needed. While it remains important to identify specific emotion labels, mechanisms, and musical features as constituents of an experience, none of these components alone adequately addresses the situationally constructed nature of music-related emotional experiences. In this, we follow Russell’s psychological construction theories (Russell, 2003, 2017) and Barrett’s (2017) notion that addressing emotions as constructed is an insightful way to understand them. Yet, the mere acknowledgement of construction is not enough; the field of music and emotion needs theoretical propositions that are testable, falsifiable and that have explanatory and predictive power (Shoemaker, 2003). Indeed, increasingly holistic theoretical accounts are being introduced in general emotion psychology, including the integrative psychometric models of emotion (Lange et al., 2020) and the combination of different components of the emotion system (Hollenstein & Lanteigne, 2014). We believe that a fruitful avenue for this is to incorporate the concepts of emotional self-regulation and the functions of music engagement into the model itself. As a synthesis of these approaches, we introduce an episode model of emotional experiences of music.
Theoretical overview
We first summarise three theoretical accounts of how music induces emotions, namely an influential theory of mechanisms by Juslin and Västfjäll (2008), an adaptation of the appraisal theory by Scherer and Coutinho (2013), and constructionist perspectives by Céspedes-Guevara (2023) and Lennie and Eerola (2022), together with recent work emphasising the importance of context and construction (Juslin, 2019). We present these according to three levels of psychological analysis – micro, meso, and molar (De la Fuente et al., 2019) – to emphasise the different foci of the theories. Finally, we outline perspectives on emotion regulation and the psychological functions of music.
Mechanisms of emotion induction
One of the most influential models for the research of emotion induction through music is the BRECVEM framework (Juslin & Västfjäll, 2008). BRECVEM mechanisms are offered “as information-processing devices at different levels of the brain, that use various means to track significant aspects of the environment, and that may produce conflicting outputs” (Juslin & Västfjäll, 2008, pp. 568–569). The model provides an evolutionary perspective to differentiate the mechanisms by different brain functions, induced affect, ontogenetic development, cultural learning, and key brain areas (Juslin, 2013; Juslin & Västfjäll, 2008). Continuing to improve what is now the BRECVEMA model, Juslin and colleagues have added a further mechanism (aesthetic judgement) and have acknowledged the role of cognitive appraisals and socially constructed processes for the mechanisms (Juslin, 2013; Juslin et al., 2022). Several mechanisms in the model have been empirically verified to be capable of producing distinct experiential and physiological responses (Juslin et al., 2014, 2015), and cross-cultural support has also been found for some mechanisms (Juslin et al., 2016).
The BRECVEMA model has been criticised for poorly defined mechanisms such as contagion (Thompson & Coltheart, 2008) and for ignoring the contextual and cultural aspects of emotions (Céspedes-Guevara & Dibben, 2022; Trehub, 2008). The model postulates generic predictions about the emotions triggered by particular mechanisms, such as arousal and pleasantness by the brain stem reflex, nostalgia by episodic memory, and surprise, awe, thrills, hope, and anxiety by the musical expectancy mechanism (Juslin & Västfjäll, 2008, p. 571). Beyond these, the model does not take a position on what emotions are triggered by the mechanisms. High-level goals such as plans or motivations are not part of the model. Céspedes-Guevara (2023) also highlights the fact that the BRECVEMA theory lacks predictions based on interactions between the mechanisms.
Past studies utilising the BRECVEMA model have mentioned how the situation has a strong influence on emotions (Juslin et al., 2016; Juslin & Västfjäll, 2008) and have detailed the empirical correspondences between activities, social conditions, listening motives, and emotions. These links were expanded by examining how individual differences, mechanisms, and acoustic characteristics relate to emotions induced by music (Juslin et al., 2022). Acknowledgement of the importance of context has been emphasised in Juslin’s recent monograph on music and emotions (Juslin, 2019), where “it is often the context that ultimately determine whether a mechanism is activated or not” (p. 384). Here, we aspire to turn some of these constructionist observations into a model that explicitly articulates the contextual elements and their likely emotional outcomes.
Appraisal account
Scherer and Zentner (2001) provided a series of routes of emotion induction which incorporate the situation, listener, and music into the equation. The relevance (appraisal) of the music to the listener takes a central role. Continuing this work, Scherer and Coutinho (2013) outlined a meso-level framework that explains appraisal-driven induction processes for musical emotions, known as the multifactorial approach, which divides appraisal processes into concepts of novelty, goal-relevance, intrinsic valence, goal-congruence, control, and agency. This model assumes that the components are causally related to a fuzzy set of emotions, giving basic emotions a privileged position while adding aesthetic emotions (appreciation of beauty, awe, or wonder) and epistemic emotions (referring to information processing, knowledge, and sometimes curiosity and confusion). This model inherits elements of dimensional appraisal theories (Moors, 2013; Scherer, 2009), where appraisals are cognitive mechanisms that modify other components of emotions through associative mechanisms. Appraisal theories disagree on the level of detail concerning the biological nature of the system (Moors, 2014), and the interplay between the appraisals is often vague. Moreover, these theories usually do not associate any specific emotions with specific appraisals.
Constructionist account
Constructionist accounts of emotional experiences postulate that emotions emerge in interaction between physical sensations and actions that are meaningful and related to situations (Barrett, 2017). Barrett postulates that emotions are not innate systems but are constructed states continuously built by the brain to interpret ongoing situations (Barrett, 2006, 2014). This supports the idea that core affect and sensory changes form an underlying baseline against which the individual validates an emotional experience through learnt categorisation developed from similar prior experiences, a process referred to as a conceptual act. This process explains how emotional experiences emerge from interactions between context, brain, and body.
There are several constructionist models of emotions induced by music. Céspedes-Guevara (2023) takes core affect as the fundamental level where changes in emotions occur, but the actual interpretation of the experience is operationalised through two kinds of conceptual mechanism: associative mechanisms and appraisal mechanisms. The former include episodic memories, more general semantic memories (personal and cultural connotations), visual imagery, and lyrics. The latter include the facilitatory role of music, control over music, and aesthetic evaluation of music. His model places attention as the central driver of how mechanisms are given prominence. Another model by Lennie and Eerola (2022) articulates a large number of goal-directed appraisals contextualised by core affect, which are intended to map interactions between a person and the physical and social environment. The evaluative dimensions that are central to the model are novelty, expectations, goal-conduciveness, goal-relevance, emotional coping, and physical coping.
As relatively new approaches, the constructionist models overall lack a supporting body of empirical evidence. The constructionist mechanisms proposed by Céspedes-Guevara (2023) are connected to well-known elements of music induction mechanisms (Juslin, 2013), which ties the theory to existing findings concerning mechanisms, although interactions between model-specific elements, namely the role of attention, have so far been unexplored. The model by Lennie and Eerola (2022) exemplifies the difficulty of gathering empirical evidence: its complexity is exacerbated by the sheer number of interactions between elements deemed necessary to reflect the theoretical predictions.
Related concepts – toward functional approach
While the previously presented theoretical accounts provide valuable additions to the conceptual knowledge on emotional experiences of music and also acknowledge context as a relevant factor, we argue that there is still a need for an improved conceptual elaboration of emotions as processes that simultaneously capture context, function, and meanings. This critique aligns with recent developments in general emotion psychology, which address emotions as dynamic systems that are in part functional to the sociocultural environment in which they occur (Mesquita & Boiger, 2014), as emergent episodes that have functional features (Barrett, 2014), or as network-type structures, in which the entire interconnected network of components constitutes the emotion (Lange et al., 2020). Schemes that systematically describe the context, function, and meaning of music are described in the following sections on music-related emotional self-regulation and the functions of music engagement.
Emotional self-regulation
Classic accounts of emotional self-regulation draw an indeterminate line between emotion induction and emotion regulation (Gross, 1998). According to a systems theory approach, humans are considered to be in a constant, dynamic state of self-regulation behaviour, inherently affecting the emotions evoked and experienced (Quigley & Barrett, 2014; Thompson, 2011). Therefore, a self-regulation perspective provides critical relevance for understanding the procedural and dynamic nature of music-related emotional experiences. Indeed, some models of music-specific emotional self-regulation have been developed: Van Goethem’s (2010) GSTM framework broadly outlines that affective self-regulation can be studied through goals, strategies, tactics, and mechanisms, and the work of Saarikallio (2012) and Saarikallio and Erkkilä (2007) identifies seven music-related self-regulation strategies. However, there is inconsistent use of various concepts across empirical self-regulation studies, from motivational impulses, stress responses, and coping to affect, mood, and emotion (Baltazar & Saarikallio, 2016). There is relatively little research that integrates current self-regulation models with research on music as emotion induction and emotional experience, with the exception of a model by Baltazar and Saarikallio (2019) that focuses on personal mental processing and body-focused distraction.
Functions of music engagement
The functions of music engagement have been addressed from a multiplicity of perspectives, including psychological (Schäfer et al., 2013), anthropological (Merriam & Merriam, 1964), social (Hargreaves & North, 1999), pleasure and reward (Mas-Herrero et al., 2012), psychosocial development (Laiho, 2004), health-relevant (Groarke & Hogan, 2016), and cross-cultural functions (Boer et al., 2012; Mehr et al., 2019). In most of these perspectives, emotions are typically included as one (or several) of the function categories. Although such categorisation may seem logical in terms of providing conceptual clarity, the idea of placing emotions into a single category of functions contradicts the very essence of emotion. Empirical work also lends support to the interrelatedness of emotions and functions. For instance, Maloney’s (2017) meta-analysis of functions of music concludes that function categories are interlinked with emotions. In addition, Greb et al. (2018) demonstrated how several functions of music listening relate to emotion regulation. Furthermore, Juslin and his colleagues (Juslin et al., 2008) have provided evidence of how specific listening motivations involving regulatory functions relate to specific emotions. While there is increased acknowledgement of the importance of psychological and social functions for emotions evoked by music, this has not been integrated into theoretical models.
Aim and rationale
A contemporary theory of music and emotion should give emphasis to contextual factors, as our real-life experiences are contextually and situationally embedded. This aligns with modern conceptions of human cognition, which stress the importance of including situatedness as part of our conceptualisation of emotions (Krueger, 2014; Newen et al., 2018; Ward & Stapleton, 2012). It also accords with the emphasis that constructionist theories place on how a person navigates their physical, social, and cognitive environment by interacting with it. Thus, instead of atoms (emotion labels, mechanisms, the actual music), research should shift its focus to clusters of experiential patterns that encompass the holistic, dynamic, and embedded nature of our emotional experiences with music. We call these clusters episodes and argue that they are determined through the emotion-related functions and self-regulatory behaviours for which music is used, such as relaxation, concentration, stress relief, energising, or motivating an activity. In the language of sceptical theories of emotion (Moors & De Houwer, 2001; Moors & Fischer, 2019), we are broadening the topic from stimulus-driven processes to goal-driven processes by allowing situation, motivation, and decision-making to be a substantial part of the theory.
Our focus on episodes means that we aim to explain how goal-driven processes contribute to emotional experiences. Specifically, we seek to explain how the purpose of listening, the way we attend to and control the music, and the extent to which we find it familiar and meaningful shape the emotions experienced.
The episode model
The episode model consists of a set of five types of episodes that capture common and broad clusters of emotion experiences that are distinct in the configuration of affective functions, emotion qualia, rewards, listening modes, and musical meanings. We have derived these episodes from commonly encountered situations where music evokes emotions, as recorded in an experience sampling study (Randall & Rickard, 2017), and also by incorporating the fundamental and cross-culturally prevalent functions of music (Mehr et al., 2019). The choice of five episodes is close to the number of functions that has been inferred in past studies of functions of music listening (Greb et al., 2018; Henry, 2023; Lonsdale & North, 2011; Sloboda et al., 2009). The episodes are also compatible with the main regulatory functions of emotions involved in music (Baltazar & Saarikallio, 2016). The five episodes proposed are Enjoyment–Distraction–Relaxation (EDR), Connection–Belonging (CB), Focus–Motivation (FM), Personal Emotional Processing (PEP), and Aesthetic–Interest–Awe (AIA). We treat episodes as clusters of affective events, essentially as functional processes, within which the emergence of emotional experiences can take place.
We first describe each episode in detail and then move on to present how existing descriptive schemes offer varying amounts of relevance for each episode. Together, the episodes and the schemes outline a prescriptive model that can be empirically tested.
Table 1 summarises the key elements related to each episode and also provides an estimate of the prevalence of the episodes. The prevalence is based on the summary frequencies adopted from an Experience Sampling Method (ESM) study by Randall and Rickard (2017). Other ESM studies (Greb et al., 2019; Juslin et al., 2008; Sloboda et al., 2001) would give different estimates of prevalence, depending on how the situations listed in these studies are mapped onto the episodes.
Table 1. Summary of Qualities of the Episodes in the Model.
To illustrate how the episodes tackle different types of emotional events, we quote snippets from Ian McEwan’s (2005) book Saturday to illustrate aspects of culture, individual meaning, nuance, and context. In this novel, the author often uses situations where the characters are placed in settings with live or recorded music to describe their inner feelings. The main character, Henry Perowne, is a successful surgeon and his son, Theo, is an aspiring blues guitarist.
EDR
No longer tired, Henry comes away from the wall where he’s been leaning, and walks into the middle of the dark auditorium, towards the great engine of sound. He lets it engulf him. (McEwan, 2005, p. 171)
Experience sampling studies highlight episodes in which music functions as relaxation, enjoyment, or distraction as the most frequently occurring emotional reasons (together 32% of these) for engaging with music (Randall & Rickard, 2017). EDR episodes feature prominently in cross-cultural functions of music, more specifically functions such as dance, play, entertainment, and storytelling (Mehr et al., 2019). In this model, the EDR episode refers to an affective process of enjoyable tension reduction. The essential change in core affect that characterises an EDR episode is one of increased positive valence combined with emotional distraction and tension reduction. The shift in the core affect of arousal can be in either direction because relaxation can be either calming or reviving, thus decreasing or increasing arousal (Saarikallio et al., 2017; Smith, 2019). The role of music is central for EDR; musical enjoyment diverts the attention away from worries and stressors, and shifts it toward the positive experience of the music itself, and the evoked bodily and mental sensations. A shift from externally raised (unwanted) thoughts and feelings toward the bodily level of experience can be expected (Baltazar & Saarikallio, 2016). The embodiment of relaxation can manifest itself as dancing, feeling sensual pleasure, release of muscle tension, or simply immersion of the whole body in music. An EDR episode can be expected to happen particularly with one’s feel-good music, possibly through emotional contagion or imagery (Baltazar & Saarikallio, 2016), and to evoke feelings of relaxation, pleasure, and power (Saarikallio et al., 2019).
CB
Something is swelling, or lightening in him as Theo’s notes rise, and on the second turnaround lift into a higher register and begin to soar. This is what the boys have been working on, and they want him to hear it, and he’s touched. (McEwan, 2005, p. 170)
CB refers to an episode in which the main functional focus is on social connection. It may contain intensified feelings of kinship, mutuality, and belonging, experiences of participating and sharing emotions, being heard, and being included. These episodes are likely to occur in situations with live social interaction, but they can also extend to solitary music listening moments where music serves as a social surrogate (Schäfer et al., 2020). CB episodes are frequently encountered (25%) in everyday situations (Randall & Rickard, 2017). Among the cross-culturally identified functions of music, procession, ritual, love, group bonding, and weddings (Mehr et al., 2019) are typical examples of this episode. In terms of core affect, an increase in positive valence is likely, intertwined with the feeling of social connection. These experiences have been widely discussed in the literature on music’s evolutionary function and social bonding (Dunbar, 2012). We argue that including social rewards and identity affirmation as key aspects of a theory contextualises emotions as being embedded in human interaction, social motivations, and cultural belonging.
FM
Perowne gets Gita to put on Barber’s ‘Adagio for Strings’. It’s been played to death on the radio these past years, but Henry sometimes likes it in the final stages of an operation. This languorous, meditative music suggests a long labour coming to an end at last. (McEwan, 2005, p. 256)
Music is actively used to motivate everyday activities, such as jogging, cleaning, or studying, and these tend to occupy a notable portion (21%) of music-related emotion episodes in the everyday context (Randall & Rickard, 2017). FM refers to experiences in which music supports task-related motivation and focus. Cross-cultural research on the functions of music has identified specific ways in which music is used in this way, often in relation to work or preparation for war/battle (Mehr et al., 2019). In terms of core affect, there is likely to be a shift toward positive valence, while shifts in arousal are likely to be target congruent (e.g., higher arousal for energising functions and lower arousal for tasks demanding lower arousal). The experience is characterised by improved motivation, concentration, and enjoyment of the task at hand. Tasks can range from sports to schoolwork, from cooking to doing arts and crafts, and can also be music making, from singing to performing alone or in a group.
PEP
Theo and Chas drift back to centre stage to sing their unearthly chorus. Or you can be happy if you dare. He knows what his mother meant. He can go for miles, he feels lifted up, right high across the counter. He doesn’t want the song to end. (McEwan, 2005, p. 172)
The episode type of PEP refers to experiences of identifying with the musical content and allowing music to support emotional coping. These are frequent (21%) in everyday sampling studies (Randall & Rickard, 2017), and among the cross-cultural functions of music, two relevant functions fall under this episode: mourning and healing (Mehr et al., 2019). PEP episodes can contain experiences of self-expression, increased emotional awareness and clarity, reappraisal and emotional work, as well as emotional validation and consolation. PEP episodes can serve functions of emotional growth, insight, and identity construction if one allows music to serve as a ‘magic mirror’ (DeNora, 1999). PEP episodes resemble the self-referential aspects of strong experiences of music (Gabrielsson, 2011), and during a PEP episode, the musical focus is typically placed on the inner experience – thoughts, feelings, and meanings. The meanings can be music-evoked, yet often are extra-musically charged, relating to self-referential imagery, associations, and memories. Lyrics can play an important role in activating personal identification (Baltazar & Saarikallio, 2016; Barradas & Sakka, 2022).
AIA
There are these rare moments when musicians together touch something sweeter than they’ve ever found before in rehearsals or performance, beyond the merely collaborative or technically proficient, when their expression becomes as easy and graceful as friendship or love. (McEwan, 2005, p. 176)
Emotional episodes falling into the AIA category are characterised by a selection of special topics, such as being moved, spirituality, detached emotions, aesthetic experiences, or awe. As such, these tend to be rare, occurring in a small minority of experiences sampled in ESM studies (e.g., <2% by Randall & Rickard, 2017), but these experiences are significant for individuals. Such episodes are mentioned in cross-cultural functions of music as religious activity, praise, and creating art (Mehr et al., 2019). These episodes may arise from intellectual curiosity and interest in music or performance, or may be experiences punctuated by chills, music-induced goosebumps, and shivers (Bannister, 2020), or autonomous sensory meridian responses (Kovacevich & Huron, 2018). The phenomena titled awe or kama muta (Fiske et al., 2019), and being moved (Menninghaus et al., 2015) are also examples of AIA episodes. These episodes are particularly sensitive to the context, which may or may not afford introspection, immersion, and focus on music content. The change in core affect is usually toward more positive valence, although aesthetic emotions can also lead to tears and putatively negatively valenced emotions (e.g., Gabrielsson, 2011). Since these experiences tend to be cerebral rather than sensory, the changes in core affect may not be meaningfully characterised in terms of arousal.
These five episodes form the backbone of the model. We next link six descriptive schemes to the episodes that flesh out the contents of the episodes.
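For readers who wish to operationalise the five episodes in an analysis pipeline, the following is a minimal sketch of one possible data structure. It is an illustration rather than part of the model: the class name `Episode`, its field names, and the helper functions are hypothetical. The prevalence figures are those quoted above from Randall and Rickard (2017), with the AIA value stored as an upper bound because it is reported only as <2%, and the cross-cultural functions are those attributed above to Mehr et al. (2019).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Episode:
    """Hypothetical container for one episode type of the episode model."""
    code: str
    name: str
    prevalence_pct: float            # figure quoted in the text above
    prevalence_is_upper_bound: bool  # True when reported only as "<x%"
    cross_cultural_functions: Tuple[str, ...]


EPISODES = (
    Episode("EDR", "Enjoyment–Distraction–Relaxation", 32.0, False,
            ("dance", "play", "entertainment", "storytelling")),
    Episode("CB", "Connection–Belonging", 25.0, False,
            ("procession", "ritual", "love", "group bonding", "weddings")),
    Episode("FM", "Focus–Motivation", 21.0, False,
            ("work", "preparation for war/battle")),
    Episode("PEP", "Personal Emotional Processing", 21.0, False,
            ("mourning", "healing")),
    Episode("AIA", "Aesthetic–Interest–Awe", 2.0, True,
            ("religious activity", "praise", "creating art")),
)


def by_prevalence(episodes):
    """Order episodes from most to least prevalent (ties keep input order)."""
    return sorted(episodes, key=lambda e: e.prevalence_pct, reverse=True)


def episode_for_function(function_name: str) -> Optional[Episode]:
    """Return the episode listing a given cross-cultural function, if any."""
    for episode in EPISODES:
        if function_name in episode.cross_cultural_functions:
            return episode
    return None


for episode in by_prevalence(EPISODES):
    marker = "<" if episode.prevalence_is_upper_bound else ""
    print(f"{episode.code}: {marker}{episode.prevalence_pct:g}%")
```

Other ESM data sets would yield different prevalence values, so in practice these figures are better treated as replaceable inputs than as constants of the model.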
Descriptive schemes applied to episode model
We offer a mapping of six descriptive schemes to elaborate the characterisation of the episodes. These schemes are (1) core affect and emotion qualia, (2) induction mechanisms, (3) listening modes and agency, (4) reward and exposure, (5) musical meanings, and (6) functional contexts. These allow us to disambiguate the episodes from each other and operate as rough guides to the types of functions, activities, mechanisms, experience, features, and focus that each episode typically has. To make these schemes a prescriptive part of the episode model, we provide an estimate of how relevant or how frequently associated each element within each descriptive scheme is for each episode, summarised in Table 2. Note that the descriptive schemes are not considered requisite conditions or triggers for the episodes, but they nevertheless add predictive value to the model.
Table 2. Prescriptive Mapping between Episodes and the Elements of Descriptive Schemes through Relevance.
Note. EDR: enjoyment – distraction – relaxation; CB: connection – belonging; FM: focus – motivation; PEP: personal emotional processing; AIA: aesthetic – interest – awe. GEMS: Geneva Emotional Music Scale (Zentner et al., 2008); GEMIAC: GEneva Music-Induced Affect Checklist (Coutinho & Scherer, 2017); AESTHEMOS: Aesthetic Emotions Scale (Schindler et al., 2017); MEAM: music-evoked autobiographical memories; ESM: experience sampling method.
Core affect and emotion qualia
Several emotion induction labelling schemes (qualia) are used in music and emotion research. A prominent example of aesthetic and epistemic emotions, suggested by Zentner et al. (2008), involves nine dimensions (Geneva Emotional Music Scale, GEMS hereafter), with variants such as GEMIAC by Coutinho and Scherer (2017) and AESTHEMOS by Schindler et al. (2017) offering different classifications. In Table 2, we provide a mapping of the likely emotion qualia for each episode. For core affect, a simple one-to-one mapping of valence and arousal with episodes is not feasible, particularly for episodes such as PEP or AIA. In general, positive valence is expected in most episodes because this is the purpose of listening to music in many situations (Juslin et al., 2008; Zillmann, 2015). Arousal is less well determined for most episodes, as the actual purpose within an emotion episode such as EDR may target either low arousal (to relax) or high arousal (to energise). Thus, while a change in arousal may be a likely feature of the episode, the direction of the change can vary, and hence we label some of these target congruent. The relevance of emotion qualia is offered here as a way to focus interest on specific episodes and to be able to parse natural verbal descriptions of emotion experiences into appropriate episodes.
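The notion of a target-congruent arousal change can be made concrete with a small sketch. This is an illustrative reading of the idea rather than a measure proposed by the model: the function name, the two target labels ("relax" and "energise"), and the convention that a positive number denotes an increase in arousal are all assumptions introduced here.

```python
def arousal_is_target_congruent(target: str, arousal_change: float) -> bool:
    """Check whether an arousal change moves in the direction the listener aims for.

    target: hypothetical label for the listener's aim, "relax" or "energise".
    arousal_change: post-listening arousal minus pre-listening arousal,
        on any consistent scale (positive means arousal increased).
    """
    if target == "energise":
        return arousal_change > 0
    if target == "relax":
        return arousal_change < 0
    raise ValueError(f"unknown target: {target!r}")


# The same decrease in arousal is congruent for one aim and incongruent for the other.
print(arousal_is_target_congruent("relax", -0.4))
print(arousal_is_target_congruent("energise", -0.4))
```

Under this reading, the direction of an arousal change is only interpretable once the purpose of the episode is known, which is why no fixed arousal direction is assigned to episodes such as EDR or FM.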
Induction mechanisms
A well-defined set of mechanisms that induce emotions in musical contexts has been postulated (Juslin, 2013, 2019; Juslin et al., 2008). Here, we connect these mechanisms to the episodes to allow for a better interpretation of studies involving either framework (episodes or mechanisms) and to show how it is possible to incorporate micro and molar levels of analysis. It is outside the scope of this article to list all potential combinations of mechanisms and episodes; rather, we focus on mechanisms which bear particular relevance and have been related to specific emotions, situations, or processes in empirical studies (Juslin et al., 2008, 2014, 2022). For instance, musical entrainment has been linked to happiness-elation in nomothetic and idiographic analyses of emotional experiences (Juslin et al., 2022). While this only specifies a correlation between a broad emotion and one mechanism, we infer that, together with the correspondences between emotions and activities and between emotions and listening motives (Juslin et al., 2008), the EDR and FM episodes are where the musical entrainment mechanism has particularly strong relevance. To illustrate the relevance of another mechanism to episodes, the contagion mechanism has been linked with calm-contentment (Juslin et al., 2022), and the top activities for this emotion are associated with relaxation and work/study (Juslin et al., 2008); hence we identify this mechanism as particularly relevant for FM episodes. Similarly, the episodic memory mechanism has been associated with nostalgia/longing (Juslin et al., 2022), and one of the primary listening motives for this emotion is “to get some company”, which suggests relevance for CB episodes. Juslin et al.’s (2022) data show a correlation between the musical expectancy mechanism and the emotion of interest-expectancy, which has been linked elsewhere with music listening and watching TV/films (Juslin et al., 2008). We take this to signal the high relevance of musical expectancy to AIA. Finally, the contagion mechanism was correlated with sadness-melancholy in Juslin et al.’s (2022) data, and this emotion was linked to a listening motive that involved active influence on one’s feelings (Juslin et al., 2008), which we interpret to be mainly in line with PEP episodes.
Listening modes and agency
Emotional episodes may differ in the way in which they depend on the attention given to music and how much control the individual has over music. Attention, as a term, is used to describe a diverse array of cognitive systems and processes related to the evaluation of the environment (Hommel et al., 2019). Work on attention and listening has a longer history than can be summarised here (Chion, 2019; Schaeffer, 1966; Tuuri & Eerola, 2012), but a recent review by Weining (2022) synthesised seven types of listening modes that we have adapted as a means of capturing differences in attention given to music during an episode. Starting with the least attention, diffuse listening is characterised by marginal attention given to music. Bodily listening relates to embodied motor processing coinciding with the music. Emotional listening is attuned toward emotional aspects of the music. In associative listening, one derives ideas, thoughts, and images prompted by music, usually through learnt and acquired associations. Structural listening places emphasis on musical content (structure, themes, syntax, or any other patterns of sound) and is probably a specialised mode of listening informed by training and experience. Finally, reduced and causal listening is a careful and forensic evaluation, done reflectively and including one’s experience, as a means to analyse particular sound elements.
An associative mapping is offered in Table 2, where the most likely or relevant listening modes for each of the five episodes are represented. However, it is important to acknowledge that not every possible combination is precisely determinable here; some specific situations may indicate that an episode was experienced through a different listening mode than nominated. Nevertheless, articulating the listener’s level of attention, from marginal focus to occupying an intense centre of attention toward the music, provides a useful distinction and possible comparison between those situations and the emotions experienced.
Another key element is the level of control the listener has over the music or the listening situation. We call this agency, that is, the ability to choose or control the music. Agency has been used previously to explore its effect on emotion qualia and activities associated with music (Saarikallio et al., 2020). A high sense of agency in selecting or controlling music has shown more positive results in music intervention studies involving pain relief (Howlin & Rooney, 2020) and physical exercise (Fritz et al., 2018). Control over music is also an important reported moderator of emotions in experience sampling studies (Greb et al., 2018; Juslin et al., 2008). Greasley and Lamont (2011) demonstrated that the sense of choice is associated with functions of music listening such as enjoyment, relaxation, and helping to concentrate/think, suggesting the sense of agency is highly relevant for EDR and FM. Self-chosen music tends to be associated with more positive feelings (Krause et al., 2015), suggesting that agency is highly relevant for EDR (see Table 2).
Reward and exposure
Previous research has often found a link between preference and emotional responses (Ferreri et al., 2019; Kreutz & Cui, 2022; Salimpoor et al., 2011; Schubert, 2007). The role of reward for emotions has not usually been directly acknowledged in the past. However, it has been common to link exposure and familiarity with specific music with emotional episodes (Juslin, 2019; also Völker, 2021). Exposure and familiarity are constructs closely related to reward, as exposure is known to increase liking (Zajonc, 2001). Exposure to specific pieces of music has been shown to be highly relevant for music-induced emotions (Hunter et al., 2011; Schellenberg et al., 2008; Schubert, 2007; Szpunar et al., 2004; Van Den Bosch et al., 2013). In operational terms, when exposure is not manipulated directly, it operates similarly to reward, through the concept of familiarity (for a song, artist/composer, genre, or subculture), which is known to have an impact on induced emotions (Fuentes-Sánchez et al., 2022; Pereira et al., 2011; Schubert, 2007). As reward and familiarity have a clearly identified contribution to emotions experienced, we recognise the need to specify how reward and familiarity with the music may have different relevance for the episodes. We surmise that rewarding music has high relevance for EDR and FM episodes, but such a reward may not be a key factor for PEP or CB episodes. However, AIA episodes that involve awe, chills, or being moved are frequently characterised as deeply rewarding (Cotter et al., 2018; Vuoskoski & Eerola, 2017). For exposure, we hypothesise that high exposure is an element of EDR and CB episodes – the former because exposure has been linked to higher relaxation (Tan et al., 2012) and distraction (Finlay, 2014) and the latter due to nostalgia and autobiographical memories, which are strongly associated with increased exposure to specific pieces (Barrett et al., 2010; Jakubowski & Francini, 2023). 
However, we hypothesise that exposure may have the opposite relationship for AIA experiences, as one frequently engages with music that may be unexpected or unusual due to curiosity (for empirical evidence, see Janata et al., 2018).
Musical meaning
A conceptualisation of musical meaning has recently been captured in a music appreciation framework (Thompson et al., 2023). This framework identifies appreciation and meaning as emerging through three elements: perceiving musical structure (structural); activating personal significance, identity, and autobiographical memories (self); and what is called source sensitivity (source), which encompasses the cultural context (including socio-political and historical context) of music and music-making. For structural forms of musical meaning, there is a long history of associating specific features of music with emotional expression (Gabrielsson & Lindström, 2010; Laukka et al., 2013), and some studies have associated induced emotions with both musical and acoustic characteristics (Juslin et al., 2022). We posit that the structural elements of music carry varying amounts of relevance for different episodes; targeted mapping of structural cues and emotions (Aljanaki et al., 2017) showed that structural elements are likely to be important for EDR and FM episodes, but for more culturally and individually determined episodes, such as AIA, CB, and PEP, the structural elements would likely be less important. For instance, PEP episodes are characterised by personal meanings that are less subject to structural codes, although some common codes for broad affective functions such as sadness and mourning have been identified (Huron, 2015). Similarly, CB episodes have by definition cultural and personal elements which can only partially be captured by structural forms (Levin & Süzükei, 2021), unless the belonging and social bonding are being established through entrainment and synchrony (Clarke et al., 2015; Stupacher et al., 2017). Finally, AIA episodes have elements that some scholars have associated with structural features such as harmonic, timbral, and dynamic changes that have been mapped to experiences of chills with music (Bannister, 2020; de Fleurian & Pearce, 2021).
Under the subcategory of self, several relevant concepts emerge, such as music-evoked autobiographical memories (MEAMs), which contribute to differentiating episodes (Janata et al., 2007; Platz et al., 2015). Such memories are also implied in the establishment of identity, and music is shown to offer a way to affirm and express identity (Boer et al., 2012; DeNora, 1999). Meanings can also be derived from virtual agency, where listeners experience music itself as a surrogate friend or person with whom they share a relationship or identity (Schäfer et al., 2020). We suggest that the relevance of these elements varies across episodes; CB and PEP episodes rely especially on MEAMs and constructors of identity (Ter Bogt et al., 2011), or bonding over music choices (Ter Bogt et al., 2011), even observed in several cultures (Boer & Abubakar, 2014). However, not all episodes hold such relevance with elements of self; for instance, FM episodes do not appear to be influenced to such a degree by the subcategory of self (but see Dyrlund & Wininger, 2008).
Musical meanings can also be generated by knowing the source, which relates to contextual, cultural, historical, or political information. Contextual attribution can be about style, region, visuals, lyrics, or biographical information about music/artist/composer. Conceptual source information, however, refers to a specific cultural, social, or historical context associated with music. Together these are known to inform responses and valuation of music (Gabrielsson & Lindström Wik, 2003; Kreutz et al., 2008; Margulis et al., 2022). We surmise that source elements are particularly relevant for CB, PEP, and AIA episodes, which involve personal meanings (Cross & Tolbert, 2009; Finnegan, 2012; Kramer, 2003). For instance, the type of episodes characterised by CB are exemplified by the way specific pieces of music can be used as a powerful political symbol (Danaher, 2010; Ziv, 2018) or by fans of specific genres (Powell et al., 2023). The value of historical and cultural information on music for PEP episodes involves extramusical associations/cultural topoi (Huovinen & Kaila, 2015; Shevy, 2008). Religious contexts such as trance (Becker, 2004), or EDM events (Solberg & Jensenius, 2017), may offer some insights into how shared cultural meanings can contribute to AIA episodes. Conversely, source elements have lower relevance for EDR and FM episodes.
Functional and affect regulatory context
To connect the episodes with functional contexts of music and the emotional contexts of music listening, we utilise three schemes that we alluded to when defining the episodes. First, we identify the typical emotional functions established in experience sampling method (ESM) studies in the Western context by Randall and Rickard (2017). Similar patterns have been reported by Greb et al. (2019). We also provide a mapping between the episodes and cross-cultural functions of music (Mehr et al., 2019). Their analyses suggested 20 functional uses of music such as dance, healing, mourning, entertainment, praise, and group bonding, which we have associated with episodes (Table 2). To emphasise the perspective of affect regulation, we capitalise on a large review of affect self-regulation studies by Baltazar and Saarikallio (2016). Their summary of goals is largely consistent with the way we have defined the functional nature of the episodes; the most frequent goal is to obtain/stimulate/maintain positive emotions through music, consistent with EDR episodes. The second-most frequent goal is to control affects and motivation, as is done in FM episodes. The aim to reduce loneliness and connect with significant others is also among the goals and corresponds to CB episodes. Dealing with sadness as a goal and gaining personal insight into an issue is identified in multiple studies, which aligns well with PEP episodes. Finally, the goal of experiencing something new, to satisfy curiosity or to reach strong sensations, is consistent with AIA episodes.
The predictive purpose of descriptive schemes
The primary purpose of connecting multiple descriptive schemes to the episode model is to be able to predict and diagnose emotional episodes. This move from descriptive to prescriptive definitions (Paul & Mendl, 2018) allows us to provide information about the likelihood of the specific qualities within the schemes contributing to the episode. One can also derive the probable episode given a sufficient amount of information about attributes within the schemes. The relevance of the elements within the scheme varies significantly across the episodes; for instance, knowing about core affect is informative for EDR, as positive valence characterises these episodes, but the same information is not useful for emotional episodes involving AIA, which may involve affective states of widely differing valence. A further benefit of the prescriptive stance on episodes and schemes is the ability to focus; when the research aim is to clarify the way a specific episode operates, such as EDR, the focus of the research needs to include structural aspects of music, music preferences, familiarity, and diffuse and bodily listening modes. If the purpose is to learn more about the CB episodes, these experiences and situations are intimately connected to historical and cultural context, identity affirmation, and MEAM processes.
We can illustrate the model with a series of related pentagons (episodes), each of which will have a distinct combination of typical contents of the schemes (Figure 1). This reduction of an episode into a dominant content of the schemes does not do full justice to the possibilities within the episode, but here serves as a compact summary of the model.

Figure 1. Examples of the most prominent contents of episodes using dominant constructs from each descriptive scheme.
We have framed the episodes as largely positive emotional encounters with music, as most music research has identified music engagement as an activity that individuals want to engage in (Croom, 2015; Lamont, 2011; Saarikallio et al., 2021). Yet, each episode also has the opposing potential. The antithetical variant of an EDR episode might reflect a maladaptive emotional process if the use of music to distract becomes excessive and involves leaning toward emotional avoidance, emotional numbing, or hyperarousal (Carlson et al., 2015; Silverman, 2020). Similarly, a PEP episode can lead to intensified access to one’s personal experiences which can sometimes become maladaptive and turn into rumination (Carlson et al., 2015; Garrido & Schubert, 2013). FM episodes can turn into maladaptive ones where the focus is lost and the music is distracting and irritating, leading to a lack of motivation. Maladaptive instances of CB episodes could be characterised as feeling lonely, disconnected, and isolated if the episode highlights the absence of such connections.
Discussion
We have presented a novel theoretical model of emotional episodes, which places the functions and situationally constructed meanings at the core of understanding music-related emotions. In the episode model, the emotional experience itself becomes defined as a functionally constructed process. People engage with music for a variety of reasons with a variety of processing depths, familiarity, preferences, and personal meanings with specific affect-regulatory purposes. In the model, these factors are not only external causes of an emotional experience, but they are what partly constitutes the emotional experience. We claim that this perspective leads to a more dynamic and nuanced understanding of music-induced emotions where goal-driven and stimulus-driven explanations operate in parallel (Ede et al., 2020; Moors et al., 2017). The episode model emphasises the situation and affect-regulatory function of the music for the individual and shifts the attention away from stimulus-driven triggers of emotions. The episodes themselves are aligned with current knowledge from ESM studies, affect regulation with music, and cross-cultural functions of music.
The episode model targets a molar level of explanation (De la Fuente et al., 2019), where emotions are related to affect-regulatory functions of music, and these incorporate meanings, including autobiographical, cultural, and historical aspects of the experiences. This approach lends itself to narrative explanations of emotional experiences, but also allows the prediction and assessment of empirical data coming through the combination of episodes and the descriptive schemes proposed. The schemes themselves contain specific predictions of how the elements contribute to each episode. The episode model does not contradict micro-level explanations such as cognitive mechanisms (Juslin & Västfjäll, 2008) but offers a conceptual tool that is flexible and allows researchers to focus their research either on specific emotional episodes with specific functions, or to characterise broad issues related to emotional experiences involved with music.
The shift toward a molar level of explanation is in line with the developments taking place in emotion psychology. A recent overview of psychometric models of emotion (Lange et al., 2020) criticised emotion research for being scattered across various emotion theories (affect-programme, constructionist, appraisal) that are seemingly impossible to integrate. As a response, the authors introduced the psychometric network model, in which emotions are conceptualised as systems of causally interacting emotion components. In this model, the entire network structure of the emotion components, as well as their interconnections, constitute the emotion. In a similar way, the episode model takes a more holistic stance, which provides ground for conceptual integration of the existing music and emotion theories.
Implications and testable predictions
We have constructed the episode model to meet the criteria for a robust theory that include testability, falsifiability, parsimony, and explanatory power (Shoemaker, 2003). The way we have articulated the model with connections to functions, everyday sampling studies, and prescriptive commitment to descriptive schemes for each episode renders the theory testable and falsifiable. To be more precise, we make the following testable predictions:
We claim that the emotional experiences associated with music are influenced by the broad affective functions captured by the episodes.
The episodes are distinct in terms of a likely configuration of (1) emotion qualia, (2) induction mechanisms, (3) listening modes and agency, (4) reward and exposure, (5) musical meanings, and (6) affect regulation functions.
The episodes together with the descriptive schemes will be able to distinguish between a large set of diverse emotional experiences more effectively than the past theories.
In terms of parsimony, the episode model offers an economical account of the listener’s emotional reactions in everyday situations. If the descriptive schemes are operationalised as suggested in Table 2, the theory will be able to differentiate episodes from each other, and the episodes will in turn inform the likely emotion qualia induced and the combination of other qualities of the experience that are expected to be relevant. A more sophisticated weighting coupled with the Bayesian notion of priors for the episodes and the schemes would allow for even more fine-grained interpretation and prediction of the episodes themselves (see Houlihan et al., 2023).
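As a purely illustrative sketch of what such a Bayesian weighting could look like, the following Python fragment combines a prior over the five episodes with per-scheme likelihoods and returns a posterior over episodes. It is not the authors' method. The function name, the conditional-independence (naive Bayes) assumption, and every number in it are hypothetical placeholders chosen only to echo the direction of the hypotheses stated above (e.g., high exposure favouring EDR and CB, low exposure favouring AIA); the link between structural listening and AIA is likewise an assumption made for the example, and none of the values are estimates from data or from Table 2.

```python
# Hypothetical prior probability of each episode type in everyday listening.
# These values are placeholders, not empirical estimates.
PRIORS = {"EDR": 0.40, "CB": 0.15, "FM": 0.20, "PEP": 0.15, "AIA": 0.10}

# Hypothetical likelihoods P(observation | episode) for two descriptive
# schemes. Keys are (scheme, observed value) pairs; all values are made up.
LIKELIHOODS = {
    ("listening_mode", "diffuse"):
        {"EDR": 0.50, "CB": 0.20, "FM": 0.40, "PEP": 0.10, "AIA": 0.05},
    ("listening_mode", "structural"):
        {"EDR": 0.05, "CB": 0.05, "FM": 0.05, "PEP": 0.10, "AIA": 0.50},
    ("exposure", "high"):
        {"EDR": 0.70, "CB": 0.70, "FM": 0.50, "PEP": 0.50, "AIA": 0.20},
    ("exposure", "low"):
        {"EDR": 0.30, "CB": 0.30, "FM": 0.50, "PEP": 0.50, "AIA": 0.80},
}


def episode_posterior(observations, priors=PRIORS, likelihoods=LIKELIHOODS):
    """Return P(episode | observations) via Bayes' rule.

    Observations are treated as conditionally independent given the
    episode (a naive Bayes simplification assumed for this sketch).
    """
    unnormalised = {}
    for episode, prior in priors.items():
        score = prior
        for observation in observations:
            score *= likelihoods[observation][episode]
        unnormalised[episode] = score
    total = sum(unnormalised.values())
    return {episode: score / total for episode, score in unnormalised.items()}


if __name__ == "__main__":
    observed = [("listening_mode", "diffuse"), ("exposure", "high")]
    result = episode_posterior(observed)
    for episode, probability in sorted(result.items(), key=lambda kv: -kv[1]):
        print(f"{episode}: {probability:.3f}")
```

With no observations the function simply returns the priors; each additional scheme observation reweights the episodes, which is the sense in which richer scheme information would sharpen the prediction of the probable episode.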
Challenges and limitations
A theory needs to be measurable, and while we have not yet provided and evaluated a measurement tool for the model, measures for many of the descriptive schemes are already available. Measurements of episodes and descriptive schemes can reveal that some episodes may need splitting or that a specific configuration of descriptive schemes for the episodes is inaccurate. The very act of collecting such data confined to WEIRD samples (Henrich et al., 2010) may amplify the potential bias reflecting Western, consumer-orientated, and technology-driven culture, and may not capture all relevant episodes. Another challenge related to the measurement of the model is that some elements may be challenging to verbalise or consciously assess. For instance, the core affect may be better captured by psychophysiology (Lench et al., 2011), while listening modes and embodied musical meanings such as gestures and facial expressions might require nonverbal or indirect methods.
One of the challenges of any account of the emotions is the temporal nature of the experience. The episode model attempts to capture a broad temporal context (spanning minutes) rather than explaining the triggering process within much shorter timescales (seconds). One can also ask whether two or more episodes can co-exist at the same time. Our interpretation of the episodes would allow episodes to flow into others and thus have a degree of temporal overlap (e.g., an episode may start as a pleasant EDR but can shift into a PEP episode due to poignant and timely lyrics or contextual factors).
With respect to the perspective taken in the model, it is centred around an individual experiencing emotions related to music in one way or another (listening intensively, as part of other activities, or dancing), and this could also be taken to encompass performing music.
Future directions and applied perspectives
A direction forward is to create a self-report instrument that captures the affective function and context (e.g., activities, social composition, and cultural context), and annotates the other main elements of the episode through descriptive schemes. Such an instrument may best operate on a hierarchy where the primary level attempts to capture the episode broadly and the secondary level details the applicable schemes. It might also be valuable to connect individual differences to the model as moderators of the episodes.
The episode model provides a concrete perspective to music and well-being studies as it puts emphasis on affect-regulatory functions and situations underlying the experience. For music therapy and rehabilitation, the model offers a prescriptive set of starting assumptions (the weight of the elements in the schemes for the episodes) that may guide the focus of research. We do not think that all research will always need to focus on all episodes, and selectivity can be seen as an asset here; music therapy targeting personal insights is more likely to operate with elements that have to do with musical meaning than with the other elements or schemes. We also envisage that the episode model offers a fruitful set of structures and hypotheses for Music Information Retrieval research; some episodes such as EDR and FM are likely to be present in the actual patterns of music and its use (e.g., Gómez-Cañón et al., 2021; Knees et al., 2020). Episodes themselves may also be a useful target for predicting the broad musical and acoustic correlates of specific music corpora (Mehr et al., 2019; Scarratt et al., 2023).
The episode model places the context, meaning, and the affective purpose of musical engagement at the forefront of research. We hope that this change in focus will bring music and emotion studies into closer contact with musicology, ethnomusicology, music therapy, music education, and music information retrieval. This direction will help to concentrate future research on how we as musicians and listeners use music in our efforts to create emotional meaning and purpose and to shift our affective states in a variety of real-world contexts that can be best conceptualised as emotional episodes.
Footnotes
Author contributions
The authors made the following contributions. Tuomas Eerola: Conceptualisation, Writing–Original Draft, Writing–Review & Editing, Visualisation; Connor Kirts: Writing–Review & Editing; Suvi Saarikallio: Conceptualisation, Writing–Original Draft, Writing–Review & Editing.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by the Research Council of Finland (346210) and the European Union (ERC, MUSICONNECT, 101045747).
