Abstract
Music videos are a popular method of consuming music; however, the characteristics of these experiences and their effects on music perception are poorly understood. An online survey (N = 155) was designed using theoretical insight from Dasovich-Wilson et al.’s (2022) Intention, Attention, Reaction, and Retention (IARR) framework. The survey consisted of two parts: the first explored the key characteristics of music video experiences, and the second explored their effects on subsequent listening outcomes. Separate principal component analyses (PCAs) were performed on each part to differentiate between the experience itself (Experience components) and the effects on subsequent listens (Retention outcomes). Relationships between Experience components and Retention outcomes were explored using correlation and regression analyses. The results suggest that music video experiences characterized by performance gestures and narratives have the strongest influence on music perception. These findings shed light on how extramusical information from music videos influences mechanisms related to visual imagery and personal associations.
Music videos have been a part of mainstream music culture for more than four decades. Originally intended as a marketing tool for record companies to promote new albums, this form of post-modern art became a cultural phenomenon popular among youth (Aufderheide, 1986); its nonlinear, non-narrative structure was seen as a reflection of the superficial and hedonic culture from which it emerged (Fiske, 1986). Today, music videos function not only as a promotional tool but as a revenue stream in their own right. Short, self-contained, and open to interpretation, music videos are an ideal format for the internet age, in which new content is always in demand (Edmond, 2014). Whether available on music channels such as MTV or accessed on modern streaming platforms such as YouTube, they stand out as a popular method of consuming musical multimedia. However, the characteristics and effects of music video experiences have been largely overlooked in music psychology research.
A recent study by Dasovich-Wilson et al. (2022) proposed a framework for conceptualizing the psychological functions and effects of music video experiences. The purpose of this framework was to address this gap in music psychology research and provide a foundation for future research exploring the topic. In their study, qualitative data from 34 adolescents and young adults were collected and analyzed using an abductive grounded theory approach (see Charmaz, 2006). The resulting framework categorized music video experience-related phenomena across temporal levels: intention, attention, reaction, and retention (IARR). Thematic categories within each level provided insight into participants’ reasons for watching, how the visual and musical content influence attention, the affective outcomes they evoke, and their influence on subsequent hearings. The categories in the retention level of the IARR reflect how perceptual phenomena that occur during the initial experience affect the perception of musical meaning, which can have an influence on subsequent listening outcomes. Dasovich-Wilson and colleagues concluded that extramusical information from music videos—including narrative content, visual effects, dance, and performance gestures—affects what the listener associates with the music. These associations can have a long-term influence on how the listener perceives a song’s meaning and/or affective quality in subsequent listens. However, the IARR did not consider the relationships between conceptual categories, either within or across temporal levels.
The current study uses conceptual insight from the IARR framework to explore the characteristics of music video experiences and their effects on subsequent hearings using quantitative methods. The topic of music videos (and music multimedia in general) is especially relevant in today’s music listening and multimedia climate, where music media is easily accessible from virtually anywhere at any time. To address this phenomenon, we highlight existing research on the functions of music and music videos and current theories of audio-visual perception.
The function of music in an information age
Music has been widely recognized for its ability to fulfill psychological goals which can be reduced to three dimensions: cognitive, affect-regulation, and social relatedness (Hargreaves & North, 1999; Rentfrow, 2012; T. Schäfer et al., 2013). These functions also apply to music video experiences. According to Sun and Lull (1986), youth watch music videos to better understand the meaning of their favorite songs as well as for emotion regulation, to pass the time, as part of social interactions, and for social learning purposes, a finding which has also been supported in recent studies (Dasovich-Wilson et al., 2022; Wilson et al., 2020). Music video experiences, however, have evolved in recent years, in part due to concepts such as what is “trending” on social media platforms, a phenomenon that did not exist at the time of Sun and Lull’s study. T. Schäfer et al. (2013) posited that the social function of music is less important compared to its cognitive and affect-regulating functions. They suggested this reflects Western ideals that emphasize the importance of individuality: “self-acknowledgement and well-being appear to be more highly valued than social relationships and relatedness” (p. 7). Schäfer and colleagues could not have predicted that social media and audio-visual platforms such as YouTube would become as commonplace as they have in recent years. These platforms have become important methods of consuming, engaging with, and sharing music media. Schäfer and colleagues rightly pointed out the importance of self-acknowledgment and well-being. However, current trends in music engagement (including the platforms on which they occur) suggest that music’s social function has evolved.
Social media and streaming platforms like YouTube have contributed to this evolution. This has implications for music’s cognitive and affect-regulation functions because people may spend a lot of time engaging with these platforms, using them to cultivate their own sense of self, change their affective state, or both. However, the strongest contributor affecting these functions is the wealth of extramusical information they provide. For music videos, this comes in the form of visual content. The extent to which this information influences the perception of the music (and, by extension, its function) is an important question that remains unanswered. Understanding how an individual processes and uses that information is a key step toward answering it, and the social component from which this information is accessed cannot be ignored.
Extramusical associations provide a cognitive explanation as to how listeners’ thoughts and meanings about the outside world, such as personal memories and other associated concepts, contribute to their experience of music (Meyer, 1956). Many of the mechanisms of music-evoked emotion (BRECVEMA; see Juslin et al., 2014; Juslin & Västfjäll, 2008) are grounded in the experience and associations of the listener as well, including musical expectancy, episodic memory, and visual mental imagery. Associated meanings in the form of personal memories can trigger mechanisms such as episodic memory (Juslin & Västfjäll, 2008), resulting in emotional outcomes such as nostalgia (Barrett et al., 2010). Extramusical meaning can occur as a result of perceived shared experiences the listener has with the artist (K. Schäfer & Eerola, 2020), fostering parasocial relationships, especially among youth (Greenwood & Long, 2009; Kistler et al., 2010). These shared personal meanings can promote affect regulation through strategies such as solace (Saarikallio & Erkkilä, 2007), providing consolation for the listener during hard times (ter Bogt et al., 2017). However, for an individual whose musical engagement style indicates poorer emotional health and well-being (see Saarikallio et al., 2015), watching a music video for songs with deeply personal meanings attributed to them only to find out the “true” meaning of the song from the perspective of the artist can have potentially devastating consequences if those two meanings are in conflict with each other (Dasovich-Wilson et al., 2022).
Music’s function depends on the individual, their relationship with music, and the context where the listening event takes place. Not all individuals place the same importance on music, and individuals who engage with music more often are more likely to use it to fulfill several functions simultaneously (Greasley & Lamont, 2011). Music videos can potentially change the type of associations attributed to a piece of music, which may affect its utility in fulfilling certain functions, for better or worse. Research that explores the cognitive mechanisms that make this possible will provide more insight into how the affective impact of music video content can manifest itself in subsequent listening experiences, even when the visual component is no longer present.
Audio-visual interactions and music perception
To understand how music videos can contribute to the experience of music, audio-visual interactions and their influence on cognition need to be considered. In film contexts, music can help establish mood, clarify the emotional meaning of a scene, and provide insight into the motivations behind the characters.
Cognitive schemas, which are cognitive networks created from past experiences that represent knowledge about concepts or other stimuli, their attributes, and the relationships between those attributes stored in long-term memory (Sakamoto & Love, 2004), may serve as an explanation for how long-term memory is activated in the Congruence-Association Model, or CAM (Shevy, 2008). A network of cognitive schemas stored in long-term memory is activated depending on the individual who is listening, the context where the listening event is taking place, and the features of the music. This “spreading activation” has been suggested as an underlying cognitive mechanism responsible for the role of familiarity in establishing musical preference (Schubert et al., 2014). In the case of musical multimedia, information about the meaning of the music, the character’s actions, and perceived congruencies between the audio and the visual channels provide an opportunity for new associations to form. Spreading activation posits that as familiarity increases, more associations are formed, making the music more enjoyable. However, spreading activation occurs at an unconscious level: The listener is not necessarily aware that they are making these connections (unless they are consciously trying to do so). For example, in a study by Strobin et al. (2015), the genre of the music that played during a film trailer influenced participants’ expectations of the feelings that would be expressed in the film. Furthermore, they found that the genre of the music played in the trailer influenced expectations of the film’s genre. The music, however, was not an important factor in determining whether they would go to see the film.
This may be because the function of the music in this scenario is to fit with the content of the movie and to clarify the characteristics of the film (genre, events, and semantic quality), whereas music in music video contexts has a potentially more important psychological function that reflects the specific needs of the individual.
Unlike the influence of music in film trailer contexts, however, associations with visual material may be too intrusive to be subconscious in music video experience contexts. The most extreme example of this would be cases where hearing a song triggers mental imagery of the music video in subsequent listening episodes (Dasovich-Wilson et al., 2022). Joint encoding is a possible explanation for this phenomenon: when the music and video are perceived as mood-congruent (e.g., two individuals getting into a physical fight accompanied by angry music), information from both channels is integrated to form a single, audio-visual memory code (Boltz, 2004). This may explain why visual mental imagery from the video was a common occurrence in Dasovich-Wilson et al.’s (2022) study, even in the absence of any other perceptual changes.
A large body of research has investigated how extramusical information influences music listening experiences in both personal music listening and multimedia contexts. Despite this, music video experiences have not yet been adequately explored. This lack of previous research establishes the need and motivation for the current study.
Objectives
The current study explores the relationship between conceptual categories within the IARR framework using quantitative methods to establish the key components of music video experiences (i.e., intention, attention, and reaction level categories) and their influence on subsequent outcomes (i.e., retention level categories). The study investigates two research questions:
R1: What are the key characteristics of music video experiences (Experience components) and their carry-over effects (Retention outcomes)?
Because the study was designed using Dasovich-Wilson et al.’s (2022) framework as a starting point, Experience components should reflect themes such as engaging with narrative content, seeing the artist perform, for emotion regulation purposes, and as part of social media engagement. We expect Retention outcomes to reflect how visual and personal associations with the music change after having seen the music video.
R2: How are Experience components related to Retention outcomes?
This second question is exploratory; however, we expect the strongest predictors for Retention outcomes to be experiences that emphasize visual content such as narratives outlining the meaning of the music and/or lyrics, and performance gestures (musical playing or dance).
Methodology
Participants
Participants were recruited via the authors’ university email lists, Twitter, and Survey Circle (www.surveycircle.com). The first 25 participants were given an Amazon voucher valued at €5. No other compensation was provided; however, all participants who completed the survey and provided a valid email address were entered in a raffle to win one of two Amazon gift cards (value of €25). Participants who did not complete the entire survey were removed from the analysis, as were three participants who gave the same answer to every question. The final sample consisted of 155 participants between the ages of 15 and 44 (M = 27.3, SD = 5.8); 66% of the sample were female and 62.3% were currently living in a European Union (EU) country. The most frequently observed nationality was German (12.25% of cases), followed by British (11.6%), Finnish (11.6%), and French (9.67%). Most participants identified as music-loving non-musicians (44.5%; a full breakdown of participants’ nationalities and musical backgrounds can be found in the Supplementary Data file).
Questionnaire
Participants provided questionnaire data concerning their experiences with music videos using a 5-point Likert scale (1 = disagree strongly, 5 = agree strongly). The items were designed using the qualitative categories from Dasovich-Wilson et al.’s (2022) IARR framework as a starting point. The items reflecting these qualitative categories were organized in two parts. The first part contained items describing the main categories of the intention, attention, and reaction levels of the IARR framework. Intention items describe the cognitive and emotional goals that motivated music video engagement, as well as social influences such as peers and social media platform use. Attention items describe the structural and semantic content of the music video, to gain insight into the type of content that would absorb attention. Reaction items describe the type of affective outcomes that occur during and immediately following the music video experience, such as a boost in energy, as well as emotional processes that occur as a result of mechanisms such as empathy and contagion. Items from the intention level subcategories were labeled as cognitive (C), emotional (E), and social (S), and items derived from the attention and reaction levels were labeled A and R, respectively (see Table 1). Additional information concerning subcategories (e.g., interpretation-focused or affect-focused at the attention level) was not considered necessary for the purpose of this study.
Table 1. Principal Component Structure for Experience Items.
PC: principal component. Loadings above or equal to .50 are in bold.
“Int” items were derived from the main thematic categories from the Intention level of Dasovich-Wilson and colleagues’ (2022) IARR framework. “Exp” items were derived from Attention and Reaction level categories, which pertain to the experience itself. Labels in parentheses: (A) items refer to specific content features of music videos that grab attention, including structural and semantic features; (C) items reflect music’s cognitive function; (E) items reflect emotional uses of music videos; (R) items reflect affective reactions from music videos; (S) items refer to social functions.
The second set of items explored the potential carry-over effects music videos would impose on participants’ subsequent music-listening experiences. These items were based on the three retention-level categories in the IARR: (1) visual mental imagery, which reflects how content from the music video was recalled in subsequent listens; (2) new interpretation of meaning, which reflects a change in understanding of the music’s meaning as a result of the video’s semantic information; and (3) new affect perception, which reflects a change in affective reactions, whether it be a different emotional response to the song or a different appraisal. Retention-level items were also labeled according to the IARR thematic categories they were derived from (see Table 2).
Table 2. Principal Component Structure for Retention Outcome Items.
PC: principal component; VI: visual mental imagery; NIM: new interpretation of meaning; NAP: new affect perception; UA: unaffected; LT: long term; NLT: non-long term.
Labels in parentheses refer to the retention-level categories from Dasovich-Wilson et al.’s (2022) original study.
Loadings above or equal to .50 are highlighted in bold.
LT and NLT items were presented as written.
To provide situational context for the experiences, nominal and categorical data were also collected in the questionnaire concerning the locations where music video experiences occur, whether participants prefer to watch music videos alone or with others, and the devices they use. This information is available in the Supplementary Data.
Procedure
The study was hosted online via the Webropol survey platform. Participants indicated their consent to participate on the opening page, which also detailed the study’s purpose and instructions. Because the study did not put participants at any risk and participation was entirely voluntary, ethics approval was not required from the researchers’ host institution according to the guidelines of the Finnish National Board on Research Integrity. Once participants completed the survey, they were given the option to provide their email to participate in the prize draw.
Results
Preliminary analysis
In total, two principal component analyses (PCAs) were performed. Components with an eigenvalue above 1.0 were retained, and sampling adequacy was evaluated using the Kaiser-Meyer-Olkin (KMO) measure, with values below .60 treated as inadequate (Kaiser, 1974). The first PCA was conducted to reduce the dimensionality of items at the Intention and Experience level (referred to as Experience components for the rest of the article), whereas the second PCA reduced the dimensionality of items at the Retention level; together they address the first research question (R1). Promax rotation was used to increase the interpretability of the loadings because correlations between components were expected. Once components were established, correlations were computed to investigate the relationship between Experience components and Retention outcomes. Linear regressions were performed to establish which Experience components were the strongest predictors for each Retention outcome, addressing the second research question (R2).
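The retention rule described above (keep components whose eigenvalue exceeds 1.0) can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the response matrix is simulated, the function name `kaiser_retained` is hypothetical, and the Promax rotation step is omitted.

```python
import numpy as np


def kaiser_retained(responses: np.ndarray) -> tuple[np.ndarray, int]:
    """Return the eigenvalues of the item correlation matrix (descending)
    and the number of components whose eigenvalue exceeds 1.0."""
    corr = np.corrcoef(responses, rowvar=False)   # items are columns
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh sorts ascending
    n_retained = int(np.sum(eigenvalues > 1.0))
    return eigenvalues, n_retained


# Simulated data (an assumption for illustration): 155 respondents,
# six items driven by two underlying dimensions plus noise.
rng = np.random.default_rng(0)
factor_a = rng.normal(size=(155, 1))
factor_b = rng.normal(size=(155, 1))
items = np.hstack([
    factor_a + 0.5 * rng.normal(size=(155, 3)),
    factor_b + 0.5 * rng.normal(size=(155, 3)),
])

eigenvalues, n_retained = kaiser_retained(items)
# Share of total variance carried by the retained components.
explained = eigenvalues[:n_retained].sum() / eigenvalues.sum()
print(n_retained, round(float(explained), 2))
```

Because the eigenvalues of a correlation matrix sum to the number of items, the proportion of variance accounted for by the retained components is simply their sum divided by the item count.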
Principal component analysis
Experience level
A PCA was performed to investigate the key characteristics of music video experiences. The KMO measure verified excellent sampling adequacy (KMO = 0.88). Seven Experience components were established (see Table 1) which account for 58.51% of the variance. These were labeled: (a) Narrative Content, (b) Visual Performance, (c) Meaning of Lyrics, (d) Personal Interpretation, (e) Emotional Use, (f) Social Use, and (g) YouTube Recommender.
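For readers unfamiliar with the KMO statistic, the sketch below shows one common way to compute the overall measure: it compares the squared correlations between items with their squared partial correlations, which are obtained from the inverse of the correlation matrix. The data are simulated and the function name `kmo_overall` is hypothetical; the software and exact procedure used in the study are not stated in this section.

```python
import numpy as np


def kmo_overall(responses: np.ndarray) -> float:
    """Overall Kaiser-Meyer-Olkin measure for a respondents-by-items matrix."""
    corr = np.corrcoef(responses, rowvar=False)
    inv = np.linalg.inv(corr)
    # Partial correlation between items i and j, controlling for the rest.
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    sum_r2 = np.sum(corr[off_diag] ** 2)
    sum_p2 = np.sum(partial[off_diag] ** 2)
    return float(sum_r2 / (sum_r2 + sum_p2))


# Simulated data (an assumption): six items sharing one common dimension.
rng = np.random.default_rng(1)
common = rng.normal(size=(155, 1))
items = common + 0.5 * rng.normal(size=(155, 6))

kmo = kmo_overall(items)
print(round(kmo, 2))
```

Values close to 1 indicate that the partial correlations are small relative to the raw correlations, meaning the items share enough common variance for a component analysis to be worthwhile.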
The first four components are labeled according to the particular features of music videos that absorb attention and/or motivate listeners to watch them. The strongest items for Narrative Content describe watching music videos that are story-driven (I enjoy music videos that tell a story; I watch music videos that are like short films and have storylines), and depict a different interpretation of the song’s meaning than their own (I enjoy music videos that show an alternative interpretation of the music that is different from my own interpretation). The Visual Performance component consists mostly of items that emphasize the importance of visual content, especially seeing the artist perform (I enjoy music videos of live performances; I watch music videos to see the artist and how they perform the music). However, the item with the strongest loading for this component describes how the experience increases state arousal (music videos can motivate me or energize me and give me a boost of energy). The items with the strongest loading for the Meaning of Lyrics component reflect the intention to learn about the meaning of the music and to reflect on that meaningful content (I watch music videos to gain more insight about the meaning of the music and/or lyrics; I watch music videos to reflect on the content of the music/lyrics). Furthermore, this component has strong loadings on items describing certain affective reactions, including a deeper understanding of the music’s emotional tone and an increase in state arousal. The reaction item with the strongest loading reflects how music videos can cause listeners to feel more connected to the artist on a personal level (seeing the music video can make me relate to the artist more as a person). 
The fourth component, Personal Interpretation, also reflects experiences where attention is directed toward how the meaning of the music (or lyrics) is represented but emphasizes the importance of the listener’s personal interpretation of the music (I enjoy music videos that show an interpretation of the music that is similar to my own personal interpretation).
The last three components more prominently feature items describing the reasons (intentions) for engaging with music videos. Emotional Use items reflect music’s affect-regulating function: they describe using music videos to distract oneself from a negative affective state, to relax, to feel more motivated, or to relieve boredom. The Social Use and YouTube Recommender components reflect the social functions of music listening and the role of social media platforms (including but not limited to YouTube) and other media outlets in influencing music video consumption. The Social Use items with the strongest loadings reflect the influence of both peers and broader media hype (I watch music videos that are hyped in the media and have a reputation for something), and videos with choreography (I watch music videos to watch and/or learn dance moves and choreography; I enjoy music videos that contain dance and choreography). The YouTube Recommender component primarily reflects how YouTube recommendations and its auto-play function influence music video engagement (I watch music videos when YouTube recommends them to me or auto-plays them).
Retention level
A second PCA was performed to investigate the characteristic Retention outcomes. The KMO measure revealed excellent sampling adequacy (KMO = 0.85). Three Retention outcome components were established, accounting for 54.11% of the variance (see Table 2). These components were labeled (a) Visual Mental Imagery, (b) Personal Significance Increase, and (c) Unaffected; they reflect the qualitative characteristics of music video effects and their duration.
The Visual Mental Imagery and Personal Significance Increase Retention outcomes highlight how music videos influence the type of associations made with the music, a song’s personal significance to the listener, and its influence on affective states. Both reflect the carry-over effects imposed on future listening experiences. The Visual Mental Imagery outcome sheds light on how visual information is recalled during listening, whereas Personal Significance Increase reflects how this information affects the song’s influence on the listener’s mood or emotional state. The item, “When I hear a song that I have seen the music video for, it is hard not to think of the music video,” was a key item for Visual Mental Imagery; this suggests that the mental imagery of the video may be triggered automatically, making it difficult for the individual to dissociate the song from the video in the future. Personal Significance Increase not only reflects how music video content can influence the perception of the music’s emotional character (the mood of the song can seem different . . .) but also how this content makes the song more significant to the listener (the song can be more meaningful and/or personally significant to me). The Unaffected component reflects experiences where the music video has no salient influence on how a song is perceived or what is associated with it in subsequent listens. Because this study is concerned with music video experiences and effects, the rest of the analyses and discussions will primarily focus on Visual Mental Imagery and Personal Significance Increase (all correlations and linear regression analyses conducted for the Unaffected outcome are reported in the tables).
Correlation and linear regression results
Correlations and linear regression analyses were performed to establish which Experience components are the strongest predictors for each Retention outcome (research question R2). All correlations are reported in Table 3 and all linear regressions are in Table 4. Bonferroni correction was applied to all regression results (corrected significance threshold: p < .007).
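The corrected threshold of .007 is consistent with dividing a conventional alpha of .05 by seven tests, one per Experience component. Both the .05 starting level and the count of seven are assumptions here, since the section reports only the resulting value; the helper name `bonferroni_threshold` is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Assumed inputs: alpha = .05 and seven predictors tested per outcome.
threshold = bonferroni_threshold(0.05, 7)
print(round(threshold, 3))  # 0.007
```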
Table 3. Correlations between EC and RO.
EC: Experience components; RO: Retention outcomes.
Spearman’s rho.
***p < .001; **p < .01; *p < .05.
Table 4. Retention Outcomes Predicted by Experience Components.
SE: standard error.
β is the unstandardized parameter estimate; β/SE is the t-value associated with the parameter estimate.
Bold values are principal components with an R2 ⩾ .15.
**p < .001; *p < .007 (Bonferroni correction). The significance of bold values is **p < .001.
Both the Visual Mental Imagery and Personal Significance Increase Retention outcomes had moderately strong, positive correlations with the following Experience components: Narrative Content, Visual Performance, Meaning of Lyrics, Personal Interpretation, and Emotional Use. The Experience components with the strongest positive correlations with Visual Mental Imagery were Visual Performance, r(153) = .53, p < .001, and Personal Interpretation, r(153) = .45, p < .001. A moderate positive correlation was also observed between Visual Mental Imagery and Personal Significance Increase, r(153) = .57, p < .001.
Visual Mental Imagery
Linear regression analysis revealed the extent to which each Experience component could predict the Retention outcome Visual Mental Imagery. Interestingly, all seven components were predictors of this outcome; however, four Experience components significantly and positively predicted Visual Mental Imagery and each explained more than 10% of the variance. Visual Performance explained 27% of the variance, R2 = .27, F(154) = 59.25, p < .001, 95% confidence interval (CI) [0.39, 0.66]; Personal Interpretation explained 20% of the variance, R2 = .20, F(154) = 40.16, p < .001, 95% CI [0.31, 0.60]; Narrative Content explained 15% of the variance, R2 = .15, F(154) = 28.49, p < .001; and Meaning of Lyrics explained 14% of the variance, R2 = .14, F(154) = 25.53, p < .001, 95% CI [0.23, 0.526].
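How an R2 value and an F statistic relate in a single-predictor model can be sketched as follows. The example is illustrative only: it assumes each Experience component was entered as the sole predictor (the section does not state this explicitly), it uses made-up numbers rather than the study data, and the function name `simple_regression` is hypothetical.

```python
import numpy as np


def simple_regression(x, y) -> dict:
    """Ordinary least squares with one predictor.

    Returns the slope, intercept, R-squared, and the F statistic,
    which has (1, n - 2) degrees of freedom in this setting.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    sxx = np.sum((x - x.mean()) ** 2)
    syy = np.sum((y - y.mean()) ** 2)
    sxy = np.sum((x - x.mean()) * (y - y.mean()))
    slope = sxy / sxx
    intercept = y.mean() - slope * x.mean()
    r_squared = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    f_stat = r_squared / (1.0 - r_squared) * (n - 2)
    return {
        "slope": float(slope),
        "intercept": float(intercept),
        "r_squared": float(r_squared),
        "f": float(f_stat),
    }


# Made-up component scores for five hypothetical respondents.
result = simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(result)
```

With one predictor, R2 equals the squared Pearson correlation between predictor and outcome, so the proportion of variance explained and the F statistic carry the same information for a given sample size.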
Personal Significance Increase
The linear regression analysis revealed significant positive relationships between Personal Significance Increase and Experience components. The strongest predictor for Personal Significance Increase was also Visual Performance, which explained 23% of the variance, R2 = .23, F(154) = 47.28, p < .001, 95% CI [0.35, 0.63]. Meaning of Lyrics explained 21% of the variance, R2 = .21, F(154) = 42.13, p < .001, 95% CI [0.32, 0.61]; Narrative Content explained 20% of the variance, R2 = .20, F(154) = 40.06, p < .001; Emotional Use explained 19% of the variance, R2 = .19, F(154) = 36.45, p < .001; and Personal Interpretation explained 15% of the variance, R2 = .15, F(154) = 27.88, p < .001. Social Use and YouTube Recommender components explained only 5% and 4% of the variance, respectively. These findings were in line with our hypothesis that experiences that emphasize a focus on the music’s meaning or narrative and affect-regulation functions would be the strongest predictors of Retention outcomes.
Discussion
The aim of the study was to explore the key characteristics of music video experiences (Experience components) and their effects on future listening experiences (Retention outcomes). The study also explored the relationship between Experience components and two Retention outcomes: Visual Mental Imagery and Personal Significance Increase.
At the Experience level, seven principal components emerged. These components reflect the different characteristics of music video experiences based on the content they contain, the reasons for watching, and the affective reactions they evoke. Furthermore, these components highlight the ways in which music videos can satisfy the different cognitive, emotional, and social functions attributed to music. Narrative Content, Visual Performance, Personal Interpretation, and Meaning of Lyrics were characterized by the content they contain, and not necessarily a specific function-based intention such as to change one’s emotional state. The Personal Interpretation component is distinctive in this respect. Whereas the other three content-based components feature at least one intention-related question (e.g., I watch music videos to gain more insight about the meaning of the music and/or lyrics), Personal Interpretation does not: It is characterized by the music video validating an interpretation of the music that is similar to that of the individual. On the other hand, Meaning of Lyrics is characterized by the intent to know what the song is about on a deeper level.
Although the first four components are characterized primarily by their content, the last three reflect why (or how) music video experiences come about. Emotional Use and Social Use reflect specific functions that music video experiences can fulfill. On the other hand, the YouTube Recommender component reflects the influence of the platform itself, which generates recommendations based on the user’s previous activity on the site (see Zhou et al., 2010) and was characterized mostly by a single item (I watch music videos when YouTube recommends them to me or auto-plays them). This reflects YouTube’s popularity as a streaming platform (77% of participants indicated that they sometimes or often use YouTube to stream music, see Supplementary Data). Interestingly, the other two items loading on this component describe enjoying music videos that have engaging choreography and storylines, and getting a boost of energy from the experience (notably, the loadings of these items are comparatively weaker). Given that many of YouTube’s most watched videos have dance moves associated with them which often contribute to their going viral, this was not surprising. For example, the first music video on the platform with over two billion views was PSY’s Gangnam Style (Benjamin, 2014) and the dance from this video was a viral meme at the time. In addition, the YouTube Recommender component reflects music video experiences from using YouTube for music streaming. While the algorithms involved in recommending and auto-playing videos are beyond the scope of this study, this component was included and labeled after the platform itself because it demonstrates a method of listening that has, to date, been under-explored in empirical research.
At the Retention level, two principal components were established that reflect the potential long-term effects of music video content on music perception: Visual Mental Imagery, which reflects how the visual imagery mechanism is affected, triggering internal imagery of the video in subsequent listens; and Personal Significance Increase, which reflects how the personal significance of a song is positively influenced by music video content, influencing the perception of and subsequent responding to the song’s emotional quality in the future. This finding reflects the overlap between changes in the perception of the music’s affective quality and changes in the interpretation of meaning previously reported by Dasovich-Wilson et al. (2022). A moderate correlation was observed between Visual Mental Imagery and Personal Significance Increase outcomes, suggesting they may co-occur depending on the music video and the individual’s relationship with the song. The last Retention component, Unaffected, represents the absence of effects in future listens.
The second aim of the study was to explore potential relationships between Experience and Retention components. The results are in line with our initial expectation that Experience components emphasizing the visual experience and meaning of the music would be the strongest predictors of Retention outcomes. While Visual Mental Imagery was predicted by all Experience components, the strongest predictor was Visual Performance. The cognitive mechanisms involved in mental imagery (which include visual, auditory, and motor imagery) offer a potential explanation for this relationship. Mental imagery is activated by the same neural processes involved during perception of the same modality (in this case, visual) as well as memory, emotion, and motor control mechanisms (Kosslyn et al., 2001). Visual Performance experiences involve attending to physical performance gestures such as a musician playing or dance choreography, and strong affective reactions including a boost in energy. The relationship with mental imagery may result from a joint memory code that is formed during the experience and made salient by the emotion and motor mechanisms activated. Visual Mental Imagery of human gestures may reflect the activation of mirror neuron mechanisms: neurons that activate during both observation and execution of physical action (Overy & Molnar-Szakacs, 2009). Although beyond the scope of the present study, this finding may be of interest for research on embodied cognition or neurological studies of music and multimedia. This is particularly relevant for studies exploring the relationship between music and memory retrieval.
Other Experience components which were strong predictors of Visual Mental Imagery were Personal Interpretation, followed by Narrative Content. Narrative Content experiences reflect an alternative interpretation of the music (or one that is ambiguous and open-ended), whereas Personal Interpretation experiences reflect the preference for the video depicting an interpretation similar to the meaning the listener has already attributed to the song. In both cases, the video’s depiction of the music’s meaning potentially provides a visual association that is recalled in future listens. However, the relationship with Personal Interpretation complements previous research concerning how cognitive networks are activated during music listening. When the music video is congruent with the listener’s personal associations or understanding of the music’s meaning, information from the music video may be more easily stored in memory. This may strengthen existing cognitive networks and create new ones as the music becomes more familiar (Schubert et al., 2014). This finding may also reflect how CAM functions in music video contexts; the individual is not only directing attention toward the music and video’s shared structural features (see Cohen, 2013), but also toward the similarities between the content of the video and their previous associations. It may be easier to store the visual information from the video in long-term memory because it confirms their expectations (i.e., previous associations) about the music, thus reinforcing that association. The reverse can also be true: if the video challenges one’s expectations about the meaning of the music, or if this violation is a result of ironic contrast between the music and the video, having one’s expectations violated may result in a more memorable experience. This complements existing research which has found that ironic contrast can enhance recognition of visual content compared to affectively congruent audio-visual pairings. This suggests that emotion-specific perception drives this process of memory formation in these contexts, as opposed to changes in felt emotion (Damjanovic & Kawalec, 2022).
All Experience components predicted Personal Significance Increase to some extent; however, the strongest predictors were Visual Performance and Meaning of Lyrics. These findings suggest music videos can influence aesthetic judgment and increase absorption, which is an important moderator influencing the strength of emotional responses to music (Sandstrom & Russo, 2013). The Visual Performance component is closely tied to music’s affect-regulating function, specifically boosting energy levels. However, there is also a cognitive component to this experience, because watching the artist perform helps the individual gain a deeper understanding of the artist’s interpretation of the music. This is in line with previous research which has found that being able to see the artist perform influences audience perceptions of a song’s emotionality (Vines et al., 2006). The current study extends these findings by applying them in an everyday personal music-listening context and highlights the potential long-term effects that observing an artist perform can have on how one understands and relates to the music. The relationship between Meaning of Lyrics and Personal Significance Increase sheds light on how underscoring the meaning of the music in the video reduces any ambiguity about what the lyrics are referring to, thereby giving a song more depth. This relationship also highlights how Cohen’s (2001, 2013) CAM works in music video contexts; the structural features of the video draw attention to the lyrics, allowing for new associations to form that potentially have more depth and significance compared to whatever prior meaning was associated with the lyrics (if any).
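To make concrete what "strongest predictor" means in the preceding paragraphs, the following is a minimal sketch of the general analysis pattern (PCA on survey items, then regression of a Retention outcome on Experience component scores). It is not the authors' actual pipeline: the data are synthetic, the item and component counts are hypothetical, the helper names `pca_scores` and `standardized_betas` are illustrative, and any rotation step applied in the study is omitted.

```python
import numpy as np

def pca_scores(items, n_components):
    """Return component scores from a PCA on standardized survey items."""
    z = (items - items.mean(axis=0)) / items.std(axis=0, ddof=1)
    # SVD of the standardized data; rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return z @ vt[:n_components].T

def standardized_betas(predictors, outcome):
    """Ordinary least squares on z-scored variables (no intercept needed)."""
    zx = (predictors - predictors.mean(axis=0)) / predictors.std(axis=0, ddof=1)
    zy = (outcome - outcome.mean()) / outcome.std(ddof=1)
    betas, *_ = np.linalg.lstsq(zx, zy, rcond=None)
    return betas

rng = np.random.default_rng(0)
n = 155  # same size as the survey sample; every value below is synthetic

# Hypothetical responses to 12 Experience items (assumption, not study data).
experience_items = rng.normal(size=(n, 12))
experience = pca_scores(experience_items, n_components=3)

# Hypothetical Retention outcome constructed so that component 0 matters most.
retention = 0.6 * experience[:, 0] + 0.2 * experience[:, 1] + rng.normal(size=n)

betas = standardized_betas(experience, retention)
strongest = int(np.argmax(np.abs(betas)))
print("standardized betas:", betas.round(2))
print("strongest predictor: component", strongest)
```

In this framing, the component with the largest absolute standardized coefficient is the one described as the strongest predictor of the outcome.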
While the study provides interesting insight into music videos and their effects on future listening experiences, there are limitations to consider. The study does not account for whether some experience types occur simultaneously, and if some occur together more often than others. For example, moderately strong, positive correlations were observed between the first three Experience components: Narrative Content, Visual Performance, and Meaning of Lyrics. A single music video may contain content that is both narrative and performative and may draw attention to the lyrics in creative ways. However, whether one or more components characterize the experience depends on the individual and their reason for watching. Future research should consider how individual differences contribute to and/or influence music video experiences and outcomes. Individuals who are highly engaged with music and consider it an important part of their lives may identify multiple characteristics of their experience. This would be in line with previous research which has found that individuals who are more engaged with music overall tend to use it to fulfill multiple functions simultaneously (Greasley & Lamont, 2011).
The intent behind the music video experience is an important factor that can potentially influence subsequent outcomes. In the current study, two components, Emotional Use and Social Use, were characterized by the listener’s intentions. Both components may act as experience mediators and/or moderators. For example, Emotional Use experiences have a more deliberate function; therefore, the video the individual selects should be cohesive with their preferred affect-regulation strategy (Saarikallio & Erkkilä, 2007; van Goethem & Sloboda, 2011). If watching a music video to receive a boost of energy, the individual may lean toward a music video that features energetic performance gestures (i.e., Visual Performance). In contrast, Social Use experiences do not necessarily have a deliberate goal beyond entertainment. Whether Emotional Use and Social Use experiences mediate the relationships between other Experience components and Retention outcomes should be explored further. Future research should consider individual differences in music listening and social media behavior, as well as individual traits (e.g., gender, musical training, personality, empathy, life satisfaction, etc.) to better understand when, how, and for whom music video experiences yield certain outcomes over others. Future research should consider using experience sampling methods (ESM) to gain more insight into how these experiences influence outcomes (see Randall & Rickard, 2017). More sophisticated methods of statistical modeling are necessary to gain better insight into the relationships between individual differences, experiences, and future listening outcomes. Another limitation of the current study is that participants were not listening to music or engaging with music videos prior to filling in the survey; therefore, the data are subject to recall bias. This could be addressed in future research by using either ESM or a controlled experiment with a combination of experimenter- and participant-selected music videos to address the question of how attention is directed during music video experiences, and how this is affected by the music.
Conclusion
Music videos are an easily accessible and popular method of music engagement. The current study explored the key characteristics of these experiences and the main effects they have on future listening experiences. The results suggest that music videos can have a salient influence on mechanisms such as mental imagery, as well as the individual’s personal associations with the song. These findings synthesize previous research on the function of music in film and research on everyday music-listening experiences by providing insight into how visual information about a song can affect personal music-listening outcomes. Future research should explore these relationships further and account for individual differences to better understand when, how, and for whom these experiences yield the strongest effects, and whether these effects are perceived as positive or negative.
Supplemental Material
Supplemental material, sj-docx-1-pom-10.1177_03057356231220943 for The characteristics of music video experiences and their relationship to future listening outcomes by Johanna N Dasovich-Wilson, Marc Thompson and Suvi Saarikallio in Psychology of Music
Footnotes
Contributorship
J.N.D.-W. researched the literature, J.N.D.-W., M.T., and S.S. conceived and designed the study, and J.N.D.-W. wrote the first draft of the manuscript. All authors reviewed and edited the manuscript and approved the final version.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The Academy of Finland (project number 346210), the Alfred Kordelin Foundation (project number 220309), and from the European Research Council (ERC) under the European Union’s Horizon Europe research and innovation programme (grant agreement number 101045747). The views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or European Research Council Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.
Supplemental material
Supplemental material for this article is available online.
