Abstract
This paper offers a critical reflection on the paucity of theories for the phenomenon of “earworms,” also known as involuntary musical imagery (INMI), and poses some as-yet unanswered questions relating to the unique nature of the phenomenon, the optimal conditions for earworm induction, as well as the underlying mechanisms that may drive the behavior. While numerous earworm studies have focused on analyzing the symptoms of the phenomenon, few have attempted to investigate the underlying cause. In addition, common explanations are typically tied to proximal rather than distal causes (e.g., recent exposure). In particular, the question of “why music” (as opposed to other time-based auditory stimuli such as language/poetry), or, perhaps “what about music” is raised, and some conjectures and starting places for future studies are offered. Possible theoretical avenues and testable hypotheses are suggested, based on synthesizing informal observations and existing empirical research across multiple disciplines.
Introduction
As partially evidenced by this very collection, the interest in the topic of musical imagery has grown over the past decade, and along with it a wealth of empirical studies (Hubbard 2018). One question in particular, however, has continued to perplex both music cognition scholars and lay people alike: Why do we experience the phenomenon of earworms (also known as stuck song syndrome)?
Earworms, which are more commonly referenced in the literature by the more technical term involuntary musical imagery (INMI), are often defined as a segment of music that comes to mind in the absence of any external auditory stimulus, and without being recalled voluntarily. In addition, definitions almost always include the notion that there is continuous (or at least frequent) repetition of the fragment. Examples include “a piece of music that repeats a number of times…” (Williams 2015), “[episodes] have a serial, looping nature…” (Halpern and Bartlett 2011), and “…repeats outside of conscious control…” (Beaman and Williams 2010). Note that for the remainder of this article, I will use the term “earworm” rather than INMI, as I concur with other scholars who describe earworms as a subtype of INMI (e.g., Geffen and Pitman 2019). For instance, C. Philip Beaman (2018, p. 43) states: …the more colloquial name [“earworm”] is increasingly familiar to the general public but also … using terms such as involuntary musical imagery synonymously with earworms implies that all forms of involuntary musical image are equivalent—a (possibly unintended) theoretical commitment that has not been established.
Indeed, while definitions of the features of the phenomenon appear to be consistent, the question of what exactly constitutes an earworm (or INMI) has not been rigorously defined, which may give rise to some inconsistencies in comparing the literature on the topic (Hubbard 2018; Williams 2015). For instance, a musical fragment that is “stuck” can be heard in one's head and repeated all day long but not necessarily in a literal continuous loop. There can frequently be gaps of silence (and rather large ones at that) between repetitions (or re-cueing of the song) (Hyman et al. 2013). At times, in the literature, it seems that the implied definition of an “episode” is the total time for which the “stuck” fragment continues to repeat, whereas other studies refer to this as the “earworm,” and an episode is merely the short fragment itself that repeats. Yet other studies use additional labels, such as “section length,” to describe the length of the section of music that repeats (Farrugia et al. 2015). Additionally, in most studies, the definition offered to the participants is not explicitly declared in the paper. In a recent review by Liikkanen and Jakubowski (2020), the authors state that “…several studies have not clearly differentiated between ‘earworms’ and single (nonrepetitive) episodes of musical imagery…,” thus implying a de facto definition of episode (and, to a degree, of earworm as well). The authors do not mention any discrepancy in the term “episode” in their “Duration of Episodes” section, however. Thus, in surveying the “duration” of earworms and episodes across studies, it becomes challenging to know whether the terms refer to a single fragment (i.e., episode), or the overall duration of continuously-looped episodes, or something else altogether. However, for clarity in this paper, I will adopt Liikkanen and Jakubowski's (2020) implicit definitions.
In any case, in order to ensure consistency in interpreting experimental results it is imperative that we, as researchers, clearly present the definition assumed as well as any definitions given to participants.
Thanks to numerous published studies we know more about earworms than ever before. And yet, a rather simple question seems to be taken for granted that rarely appears to be addressed in the literature: why music? That is, it is typically taken as a given that we experience earworms to music—and not fragments of speech, for example—because of music's inherent structure and repetition (Beaman 2018; Margulis 2013). On the other hand, Beaman (2018, p. 42) takes a “business as usual” stance on the earworm phenomenon, proposing that in fact, earworms are not unique at all. He states: …nothing that has been observed so far warrants any special status for the earworm experience outside of existing theories of relevant mental constructs such as auditory imagery, cued recall, and ironic mental control. Instead, I argue that what makes earworms distinct from other aspects of auditory cognition…is the reaction of people experiencing an earworm.
Beaman's implication that it is solely the visceral negative reaction that makes the earworm phenomenon distinct is inconsistent with the majority of the literature, which has generally shown that, for most people, experiencing an earworm is not unpleasant or disturbing most of the time (Halpern and Bartlett 2011; Killingly et al. 2021; Williamson and Jilka 2014). The alternative explanation offered by Beaman is that if one is not voluntarily suppressing the earworm then one is instead voluntarily “singing along,” and that either activity (suppression or singing) could cue repetition. I would argue, however, that rehearsal can be voluntary or involuntary, and can be experienced as either positive or negative. Moreover, I argue that it is not the negative experience that makes the earworm phenomenon unique. Rather, with regard to involuntary imagery that recurs and immediately repeats, often in a direct loop, there is no comparable experience other than, perhaps, accounts of recurring involuntary imagery in post-traumatic stress disorder (PTSD) (McCarthy-Jones and Longden 2015; Snyder 2000) or the intrusive mental imagery experienced by many with obsessive-compulsive disorder (OCD) (Taylor et al. 2014). Yet, while PTSD and OCD are afflictions that affect only a fraction of the population, earworms appear to be a ubiquitous and universal phenomenon (Geffen and Pitman 2019; Liikkanen 2012; Liikkanen and Jakubowski 2020). Beaman (2018, p. 50) himself acknowledges these points, but proposes that the earworm's unique “looping” trait “can be explained within contemporary cognitive theories of memory and auditory imagery.” I argue that while the earworm phenomenon may indeed be explainable in terms of such theories of memory and imagery, to ignore its uniqueness misses an opportunity to understand something greater about music itself as an unusual stimulus, and possibly even to gain further insights into hotly disputed areas such as the shared memory components of music and language, or even the evolutionary basis of music. Again, the phenomenon of earworms raises the question: why music? And if it is simply something inherent to its repetitive structure (as opposed to its tonal or rhythmic structure), can we then induce earworms artificially using other types of stimuli, such as speech, non-musical sounds, or even images, that bear the same characteristics?
In the following sections I will briefly review the findings related to empirical research on earworms, such as musical and environmental features, relation to certain mental or emotional states, voluntary versus involuntary imagery, and the notion of “musical memory.” I will highlight some open questions in these research areas, and attempt to connect some of the findings from existing earworms literature to related research on auditory memory and the role of the phonological loop, types of mental imagery, relation to other “looping” or persistent types of imagery, and the effect of stress on short-term memory. Following that, I will detail the few studies that have proposed theories or frameworks for understanding earworms. Finally, I will critically analyze what is “special” about music to be the only known stimulus to generate involuntary imagery that continuously “loops,” and propose suggestions for future research in this area.
Review of Existing Literature
The majority of research on earworms has focused on examining factors related to their everyday experience, such as environmental and situational precursors, or other idiosyncratic or phenomenological experiences of an earworm, or “getting a song stuck” in one's head, such as its average duration or frequency, or the fidelity of the imagery (see, e.g., Liikkanen and Jakubowski 2020, for a recent review). Few studies have attempted to empirically measure the likely musical features that contribute to the phenomenon (Jakubowski et al. 2017; Williamson and Müllensiefen 2012), though a few studies have attempted to infer simple features or statistics from examining lists of self-reported songs (Beaman and Williams 2010; Farrugia et al. 2015; Hyman et al. 2013).
Several studies have also tested the roles of particular memory structures (Geffen and Pitman 2019; Williamson et al. 2010), movement (Lima et al. 2016; McCullough Campbell and Margulis 2015), or subvocalization (i.e., “covert singing”) (Pruitt et al. 2019; Killingly et al. 2021) by attempting to induce earworms in a laboratory setting under various constraints or while performing different tasks. In addition, a few select papers have proposed conjectures relating to the underlying mechanisms driving earworms. In this section, I briefly review the current state of knowledge relating to these and other factors critical to understanding earworms and mental imagery. In order to avoid redundancy, I only minimally discuss certain topics already covered at length by two recent review papers by Beaman (2018) and Liikkanen and Jakubowski (2020). Instead, I will focus mostly on other related topics or musical features of earworms that were not discussed at length in these two review papers, though at times I summarize directly from them. Specifically, I refer interested readers to these two papers for more in-depth discussions of: research methodologies (including lab-induced and self-report), phenomenology, earworms as intrusive/unwanted, coping mechanisms, external activities and cues, individual differences (including effects of training), and general auditory imagery.
Musical Features
According to a recent comprehensive review of 47 studies by Liikkanen and Jakubowski (2020), “musical features” are one of four common research themes in studying earworms. As multiple studies have commented on the difficulty in studying earworms empirically (mainly due to the necessary reliance on some form of self-reporting), it is not surprising that the estimates of musical features (and indeed other, non-musical factors as well) seem to vary widely from study to study. Nevertheless, it has been shown that the music that generates earworms is extremely variable and highly dependent on individuals’ musical exposure and taste (Liikkanen and Jakubowski 2020). Liikkanen and Jakubowski (2020) cite only four very broad factors that appear common across experiences of earworms: the degree of familiarity, liking the music, “locus around the chorus,” and the presence of lyrics. However, the role of language and lyrics in particular is severely understudied. According to Liikkanen (2012), music in a foreign (unknown) language is equally capable of inducing earworms, and the same study demonstrated that instrumental music is still capable of producing earworms, even if perhaps a less common scenario. Still, this appears to be the only study to have addressed the role of language for earworms directly, and, while the sample size is large, it was restricted to Finnish internet users.
It has been suggested frequently that “catchy” songs are more prone to becoming earworms (e.g., Floridou et al. 2012; Jakubowski et al. 2017; Williamson and Müllensiefen 2012), although the features of “catchiness” (and even repetitiveness) have proven difficult to define, making this common conjecture difficult to test. Indeed, earworms have commonly been described as “simplistic,” “repetitive,” and “easy to sing” (Killingly et al. 2021), all features that likely contribute to a song's “catchiness” (Burgoyne et al. 2013; Van Balen et al. 2015; Clark and Arthur 2022). However, with regard to the “simplistic” feature, Beaman and Williams (2010, p. 646) found that there was “no evidence that songs such as children's songs, jingles, or theme tunes were overrepresented in the earworms reported.” The lack of a formal definition has not stopped researchers from attempting to use “catchy” tunes to induce earworms (e.g., Hemming 2008; Killingly et al. 2021), and indeed, Killingly et al. (2021) found that catchy songs generated significantly more phonological interference compared to non-catchy songs on a memory task, suggesting that the catchy songs were more conducive to earworms and/or induced a greater desire to sing along.
It appears that to date there have only been two corpus-based studies that have attempted to quantify the musical features of earworm-inducing songs (Jakubowski et al. 2017; Williamson and Müllensiefen 2012), yet the results are unfortunately (yet understandably) fairly vague. For example, Williamson and Müllensiefen (2012, p. 1130) found that “INMI tunes tend to contain notes with longer durations but smaller pitch intervals as compared to the matched control tunes.” Similarly, Jakubowski et al. (2017) concluded that “INMI songs had more common global melodic contours and less common average gradients between melodic turning points than non-INMI tunes” (p. 122), which was interpreted in a later paper (by an original author) as meaning that “INMI music typically combines melodic shapes that are familiar overall with some uncommon interval patterns” (Liikkanen and Jakubowski 2020, p. 1209).
The length of an earworm is perhaps the most contentious and widely varying feature reported across studies. Liikkanen and Jakubowski (2020) cite experimental studies with self-reported individual earworm episodes averaging less than 10 seconds, while diary studies included reports of median durations up to 36 minutes. The authors note that retrospective studies appeared to distort and lengthen perceived earworm episode durations from hours to days. While the authors suggest that different experimental methods appear to result in different estimates, it must be noted that the individual instructions to participants (and goals of experimenters) may be quite different. In other words, as previously mentioned, it is unclear whether the studies are all tracking the length of a single musical earworm episode, or the duration of the earworm's continuous persistence.
Environment, Attention, and Personality
The most common environmental or situational antecedent to earworms relates to mental state and attention, or cognitive load. Mind wandering, associated with low cognitive load, has been repeatedly implicated in triggering earworm activity (e.g., Beaman 2018; Floridou and Müllensiefen 2015; Williamson and Müllensiefen 2012). Interestingly, while low cognitive load is more commonly implicated, high cognitive load has also been shown to lead to increased earworm activity (Hyman et al. 2013), suggesting there may not be a simple correlational relationship between the two. A study by Hyman et al. (2013) examined the effect of low and high cognitive load using both verbal and non-verbal tasks and found increased earworms in all conditions (for the most recent song heard only). Floridou et al. (2017) attempted to replicate Hyman et al.'s findings in a controlled study manipulating additional degrees of cognitive load, and were unable to replicate the trend, instead finding that higher cognitive loads reduced INMI. Nevertheless, the authors acknowledge that their two higher cognitive load conditions “placed additional demands on phonological processing” (Floridou et al., 2017, p. 2197). Given the evidence (discussed more below) for the role of the phonological loop in sustaining earworms, this suggests the relationship between cognitive load and earworms requires further investigation.
Mind wandering on its own as a cognitive phenomenon has received growing attention, and a significant body of work covering both theoretical and experimental aspects across both cognitive psychology and neuroscience has formed over the past decade (Christoff et al. 2016). In a somewhat parallel argument to that of the present article, Smallwood (2013) aims to clarify and distinguish between attempts to explain how the mind wanders from why it wanders. In regard to the “why,” Smallwood proposes (at least) three distinct processes that contribute to mind wandering which require explanation: the trigger event, the value of the experience, and one's ability to self-regulate internally-driven mental processes. If earworms are indeed a subtype of mind wandering, then each of these three processes requires further study into how it mediates earworm activity. In particular (and as will be discussed towards the end of this article), the inherent value of the experience, especially in comparison with the competing external environment, has largely been overlooked in the earworm literature. In Christoff et al. (2016, p. 718), the authors argue that “mind-wandering is best understood as a member of a family of spontaneous-thought phenomena that also includes creative thought and dreaming.” The authors distinguish between two forms of mental constraints that serve to direct or hold an attentional state: one type is deliberate (implemented through cognitive control) while the other is relatively automatic (operating outside of cognitive control). The authors then propose that “mind-wandering is a special case of spontaneous thought that is more deliberately constrained than dreaming, but less deliberately constrained than creative thinking and goal-directed thought” (p. 718). The placement of earworms amidst this range of phenomena classified under the umbrella of spontaneous thought has similarly been understudied, and the field would benefit from a greater understanding of the degree and variation of the cognitive control of earworms, and how the phenomenal experience compares to other (non-musical) forms of spontaneous thought.
A distinction is made between internal and external environmental factors or cues that may trigger earworms. While mental or emotional states may predispose one to earworms, the cues that initiate the memory recall may be external or internal. Liikkanen and Jakubowski (2020) discuss “INMI triggers” (i.e., cues) as a commonly-studied phenomenon and report that environmental or situational associations were the most common type of memory trigger identified (e.g., a specific location, object, or procedure).
While mind wandering is the state most commonly mentioned in the literature as leading to internally-cued earworms, another “state” or frame of mind has been largely overlooked: stress. This may be on account of its association with high cognitive load, which, as mentioned, has largely been thought of as countering earworm activity. Yet, being in a state of high stress or anxiety has a notable documented history of influencing memory (Quaedflieg and Schwabe 2018; Wolf 2009), although the directionality of the relationship, as with attentional load, has not been clearly established. A potentially relevant finding from a non-auditory task comes from Buchanan and Tranel (2008), who sought to better understand why memory is at times degraded by stress and at other times improved by stress. In this study the authors manipulated both the emotional content to be retrieved and the level of participants’ stress. The authors found that only some participants produced a cortisol response to the stress condition (6 out of 20 participants) and that those who produced a cortisol response typically had weaker memory performance overall, but especially in response to negatively-valenced stimuli. In contrast, those who did not produce a cortisol response (14 out of 20) showed increased retrieval capabilities. Similarly, a more recent study from Khayyer et al. (2021) examined the interaction of stress induction with emotional valence on auditory working memory specifically. While the authors of this study again found an interaction between valence and stress response, the directionality was reversed: participants generally showed increased memory capabilities following a stressful task for positively-valenced stimuli. However, the authors did not measure cortisol response. Work by Edwards et al. (2015) examined the relation between situational stress, trait anxiety, and cognitive load on performance ability on an attention-switching task. They found an interaction such that at lower cognitive loads, those who exhibited higher trait anxiety but who reported lower situational stress performed significantly better at the task. Finally, Jakubowski et al. (2015) found a significant positive relationship between the tempo of the earworm tune and subjective arousal. This could suggest that the arousing properties of the music could act as an external cue, or it could be that more stressful (or arousing) songs are more likely to become earworms. In any case, both possibilities are interesting given the relation between arousal and stress, and both should be carefully controlled in future earworm studies, and possibly examined more directly in a controlled experiment.
Given these reported impacts of stress on memory performance, it may be fruitful to research the effects of stress (and in particular the cortisol response) and trait anxiety (as well as cognitive load, and their interactions) on earworm cueing and maintenance.
Distinct from particular mental states, there have also been multiple studies supporting the correlation of certain aspects of musical imagery (both voluntary and involuntary) with certain personality traits such as neuroticism and openness to experience (Floridou et al. 2012; Kellaris 2003; Liikkanen and Jakubowski 2020), as well as obsessive-compulsive traits among other individual differences (Müllensiefen et al. 2014). Comparisons across multiple studies have been facilitated by tools specifically designed to gather trait measurements related to musical and auditory imagery, namely, the Involuntary Musical Imagery Scale (IMIS) (Floridou et al. 2015) and the Bucknell Auditory Imagery Scale (BAIS) (Lizotte 1998).
Types of Imagery
Across sensory modalities, there is empirical support showing significant overlap between acts of perception and imagination, both in terms of phenomenology as well as shared functional brain areas and cognitive mechanisms (Floridou et al. 2022; Halpern 2001). Floridou et al. (2022) discuss certain similarities and differences between voluntary and involuntary imagery, in particular within the auditory domain, noting that these differences have often been neglected since typically studies focus on one aspect or the other, often with different goals. For instance, the authors mention that voluntary imagery is commonly studied with reference to its vividness or realism, while involuntary imagery is commonly concerned with aspects of “everyday experience such as its frequency” (Floridou et al., 2022, p. 30).
Mental imagery, regardless of the sensory modality, can be classified as either voluntary, when deliberately brought to mind, or involuntary, when imagery spontaneously (and unintentionally) is brought to mind. Earworms are consistently defined as involuntary. However, it seems clear that maintenance may be either voluntary or involuntary (e.g., Beaman 2018). Since earworms are defined as involuntary imagery, a key component to understanding the phenomenon involves ascertaining whether there are any relevant differences between voluntarily and involuntarily evoked auditory memories or imagery. If differences in voluntary and involuntary imagery are minimal, we may be able to use voluntary tasks to study these domain differences using both behavioral and neuroscientific methods. These two aims were recently pointed out and investigated (behaviorally) in Floridou et al. (2022). However, Hubbard (2018) warns of confounding the two types of imagery before these differences can be known in detail. For instance, he points out that “the time course for cortical activation differs between voluntary auditory imagery and involuntary verbal hallucinations” (p. 24), and that for voluntary versus spontaneous inner speech, areas of neural activation differ. Both Hubbard and Floridou et al. raise this important issue in regard to connecting older and newer research findings, noting that historically researchers failed to distinguish between the level of intentionality of imagery experiences, and that the focus was mainly on the modality of the imagery.
Similarly, understanding how musical imagery differs from speech or from visual imagery will offer critical insights into the study of earworms. For instance, Hubbard (2018) discusses analogies of control—the ability to manipulate or transform a mental image—that have been tested in both the visual and auditory literature. Hubbard notes, however, that pausing the image (“keeping it constant and unchanging”) remains unexplored. Indeed, a lack of ability to perform this control would have significant ramifications for the study of earworms. Differences in memory structures, phenomenology, and capabilities across domains are discussed in the ensuing section.
Musical Memory
An important and largely unanswered question concerns the vividness or fidelity experienced with auditory memories compared with other modalities, especially visual: are auditory memories, and in particular musical memories, more vivid or realistic compared with those of other domains? Musical imagery, especially in the form of earworms, is commonly reported as playing in the mind in extremely high fidelity (Jakubowski et al. 2015, 2017; Liikkanen and Jakubowski 2020; Williamson and Jilka 2014), often like a weakened form of perception or “replay” of the original musical material (i.e., acoustic signal). According to a recent review by Liikkanen and Jakubowski (2020), this has been a repeated finding in both quantitative and qualitative studies, and the authors cite Williamson and Jilka (2014, p. 666) as noting that “musical imagery is frequently comparable to an actual music listening experience.”
Yet, a recent study by Talamini et al. (2022) demonstrated that in comparing the vividness of auditory to visual imagery, only trained and self-taught musicians had higher reported auditory vividness compared to visual images. The non-musicians showed a slightly reversed trend, though the authors do not report whether it was significant. This could suggest the over-presence of self-selection bias in a large portion of earworm research that has mostly relied on self-report and survey methodologies (Liikkanen and Jakubowski 2020). Similarly, Floridou et al. (2022, p. 39) found empirical support for a “modality-general mechanism in relation to vividness of visual, auditory, and motor stimulus modalities.” However, the scant amount of cross-domain research in this area suggests a need for further inquiry.
In addition to the unique fidelity of musical memories, numerous studies have commented on the unusual accuracy of memory for music. For instance, Margulis (2013) points out that while memories for verbal material typically recall the gist quite well but often fail at exact reproduction, musical memories are often “unavoidably verbatim in nature” (p. 75). Several studies have validated the conjecture that our memory for music is retained in a verbatim manner, suggesting fundamentally different underlying representations for memories of linguistic (either read or spoken) versus musical material. An analogous experience in the visual domain would be that of eidetic imagery, colloquially referred to as “photographic memory.” While in vision this type of memory appears to be exceptionally rare, the implication is that most of us carry a much more verbatim representation for music, especially for melody (i.e., “the tune”), both in terms of its near-verbatim representation as well as its fidelity. Recent empirical research has also helped to validate claims of accuracy by measuring the remembered tempo of the earworm tune as well as the key compared with the original song, finding strong correlations in both cases (Jakubowski et al. 2015; Evans et al. 2022, respectively).
An interesting finding by Bigelow and Poremba (2014) is that, in general, humans, like our nonhuman primate relatives, may have a weakness for remembering auditory compared with visual and tactile stimuli. More interesting, perhaps, is that in their experiment, performance on both recognition and recall did not differ between sensory modalities at shorter retention intervals of 1–4 seconds, but at longer retention intervals (8–32 seconds) accuracy for auditory stimuli worsened compared to retention for the visual or tactile stimuli. Their findings were replicated in a subsequent study using more ecologically valid stimuli, and a parallel study cited in their paper shows that professional musicians fared no better than those with no musical expertise. A speculative conjecture arising from this finding would be that additional cognitive resources as well as cognitive strategies (e.g., rehearsal) may be necessary for maintaining auditory memories compared with other types of memories. However, it should be acknowledged that there is research suggesting the reverse. For instance, Conway and Christiansen (2005) found that in a comparison of touch, vision, and audition modalities in the statistical learning of sequential input, the auditory modality displayed a quantitative learning advantage compared with vision and touch. In fact, within the psychological literature on memory, there is a substantial body of research suggesting that in many cases auditory information is remembered better, not worse, than visual information (Watkins 2001). These various findings raise the question of whether “musical memory” is special.
Musical memory has been a topic of study since the emergence of the field of music perception and cognition (e.g., Meyer 1903; O’Brien 1943; Whitely 1934). The notion that we may possess a special type of memory for music, or that there may be neural resources dedicated specifically to musical content, especially with regard to a so-called “tonal working memory,” has been debated for decades. This debate has featured most prominently in the literature surrounding the evolutionary and neuro-functional ties of language and music (e.g., Patel 2010). While a comprehensive review of musical memory is beyond the scope of the present article, the interested reader may consult Jäncke (2019) for a recent review.
With regard to memory structures and capacity, musical memory has most commonly been compared to verbal memory, most likely because of their shared sequential structure and reliance on the auditory domain. There has been strong empirical evidence suggesting that the neural architecture and networks required for representation and manipulation of both language and song are remarkably similar if not identical (e.g., Albouy et al. 2019; Koelsch et al. 2009). However, a few studies have found limited correspondence for the ways that music and language are processed in auditory short-term memory (e.g., Williamson et al. 2010). As described above, the verbatim nature of musical memory appears in stark contrast to that of memory for language. However, this may simply be a function of a specialized role for semantic processing. In other words, when listening to speech our attention is on the semantics rather than at the level of phonemic awareness. It is well known that when presented with linguistic information in one's native language, the listener's focus is on the semantic content rather than the phonetic structure (or resulting sound), and, in fact, when our focus of attention switches from the meaning to the sound (sometimes losing the meaning entirely) it produces a reaction referred to as “semantic satiation” (Black 2003).
Both music and language rely on the phonological loop as a rehearsal mechanism facilitating the encoding of long-term memories (Aboitiz et al. 2010; Baddeley and Hitch 2019). With regard to the earworm phenomenon specifically, it has been suggested that earworms are connected to our short-term memory capacities and, in particular, that the phonological loop plays a role in sustaining or rehearsing them. As pointed out by Beaman (2018), the phonological store supposedly has a time limit of 1.5–2 seconds, and yet the varied estimates of the reported duration of a single earworm episode commonly exceed this threshold by at least a factor of 4 (see the Musical Features section above). As Beaman notes, however, this discrepancy may arise because the literature does not distinguish between the length of the repeating musical fragment itself and the duration of the episode overall (i.e., how long one retains the earworm). As such, the length of the actual musical fragment (and its content) should be given greater attention in future research. Recent work by Evans et al. (2022) may soon provide some novel clues, as their study asked participants to sing the fragment of the earworm episode that was looping. Finally, this 1.5–2 second limit for the phonological store does not take into consideration the effect of perceptual chunking, which is known to extend the capabilities of short-term memory structures (Snyder 2000).
There is a large body of evidence in support of a phonological loop capacity limit of approximately four chunks on average, but the ability for individual chunks to be connected (and how) may alter this limit (Cowan 2001). Interestingly, Gilbert et al. (2014) reference studies citing the appearance of accent groupings in language that do not tend to exceed four syllables on average, and that listeners are capable of detecting such groupings. Overall, research on short-term memory for music suggests that the section of melody that repeats may be related to the size of the units (chunks) as well as the overall duration. That said, it is most common for earworms to come from songs that are already well known (and not novel), and therefore, the material already exists in some form in long-term memory. In this case, the traditional limits given to the phonological loop in its role as a rehearsal buffer could potentially be irrelevant. Yet, according to the “unitary storage account” and the theoretical framework provided by Cowan (2001), the capacity limit would equally apply to all memory mechanisms, including information retrieved from long-term memory. Indeed, Cowan proposes that the four-chunk capacity arises due to limitations of attention rather than memory.
The phonological loop's link to earworms has been implicated in more than just its capacity or size. A recent study by Killingly et al. (2021) presents new evidence implicating the role of the phonological loop during maintenance of auditory imagery. In their study, participants completed serial recall tasks during alternating blocks of music and silence (the music was either “catchy” or “non-catchy” based on piloting). In the silent periods following “catchy” songs, participants were significantly poorer at the serial recall task compared with a baseline (pre-test silence condition), suggesting that participants’ phonological store was “occupied” with the musical material. This paper also presents one of the few cognitive theories for earworms (see next section).
Another way that music has been suggested to hold a special status lies in its apparent mnemonic power. Over multiple decades, numerous studies have attempted to test the effect of music as a mnemonic device (Baird et al. 2017; Ding et al. 2018; McElhinney and Annett 1996; Moore et al. 2008; Moussard et al. 2014; Thaut et al. 2014; Wallace 1994). However, the findings of this research are quite contradictory: in most cases, when subsequent studies attempt to replicate an effect with a more appropriate control in place, the effect disappears (see, e.g., Schulkind 2009). Indeed, it has been proposed that “the ‘special’ power of music as a mnemonic device may be that it fosters excessive rehearsal” by virtue of the pleasure it gives in rehearsing it (Schulkind 2009; Beccacece et al. 2021). Finally, music's role as a mnemonic has also been implicated in evolutionary accounts. In addition to the many other possible functions of music suggested from an evolutionary perspective (which are beyond the scope of this article to list), music may have facilitated long-distance communication and possibly served as a framework for aiding in the remembering of lessons, stories, and the passing down of cultural information over time (Harvey 2018; Trainor 2018).
Proposed Mechanisms and Theories
Over the past several years, a few notable papers have proposed some answers to the “why” of earworms. The first such idea was implied in the work of Hyman et al. (2013). Based on an informal diary study in which they found increased earworms following incomplete presentations of songs, the authors manipulated the length of a song's exposure, hypothesizing that presenting an incomplete song would produce a Zeigarnik effect. Zeigarnik theory proposes that thoughts or tasks that are interrupted trigger an increased desire to complete the thought or task, leading to increased prominence and retention in memory. However, in the formal experiment, Hyman et al. (2013) found no empirical evidence in support of their hypothesis. This hypothesis was brought up again in Williamson et al. (2014) as a result of analyzing over 1,000 reports of reactions to INMI episodes. Here, the authors refer to the work of Hyman et al. (2013), pointing out that “[the] failure to show the Zeigarnik effect as an antecedent for INMI origins does not preclude the idea that it may be a cause of the involuntary repeating, cyclic nature of the experience…” (Williamson et al. 2014, p. 7), and advocate for further testing of the Zeigarnik effect. McCullough Campbell and Margulis (2015) also tested this effect (among other things) in a controlled experiment and likewise found no empirical evidence in support of the Zeigarnik hypothesis. Similarly, Killingly et al. (2021) found no significant effect of song truncation in a controlled experiment. In addition, the number of reports of earworms involving incredibly simple and repetitive songs (e.g., “Baby Shark”), where the full song is known in its entirety (and often presented in its entirety, as in the case of commercial jingles), suggests that even if there is a role for the Zeigarnik effect, it cannot be the sole cognitive mechanism triggering earworms.
Overall, at present there is little empirical support for a role of the Zeigarnik effect in creating earworms.
The second notable paper proposing a theory is by Beaman (2018). Beaman's theory is unique, however, in that he generally proposes “no theory” as a theory. As stated earlier in the Introduction section, Beaman argues that what is special or notable about an earworm is one's reaction to it. Even so, Beaman does attempt to provide a “theoretical framework” for the earworm phenomenon in terms of an “information processing ‘box and arrow’ diagram.” This (self-admittedly “speculative” and “inchoate”) framework attempts to organize the various components of the earworm experience (i.e., involuntary and voluntary processes; positive versus negative reactions) into a parsimonious construct. It does not, however, offer any insights as to the underlying drivers or explanations for the various facets of the phenomenon. In fairness, Beaman explicitly positioned himself as arguing that earworms are like any other ordinary involuntarily evoked memory, in which case no further explanations are required. Nevertheless, Beaman's framework attempts to make explicit certain assumptions about the phenomenon, such as the distinction between the initial cueing (which can be internal or external) and the subsequent maintenance, where the former is always involuntary but the latter need not be.
The third and most recent notable paper is that of Killingly et al. (2021). In this paper the authors propose two contributing factors, one related to the structure of the song itself, and one related to cognition. The authors review several melodic features from numerous surveys and studies that suggest “songs that people report as earworms tend to be those which are easier to sing” (p. 457). Following this, Killingly et al. (2021, p. 458) propose that earworms arise from “an unconscious desire to sing along” and they conjecture that a repetitive fragment of song can become “stuck” in the phonological loop, such that one feels compelled to continually rehearse the line in working memory.
More specifically, subvocal articulation is engaged in rehearsing musical information, activating the phonological loop, which is known to be a crucial component in maintaining verbal and other auditory information in memory and facilitating the passage of information from short-term to long-term memory. The authors conducted a series of experiments investigating the effect of the presence of earworms (presuming phonological loop and subvocal articulation engagement) on participants’ performance on a concurrent phonological task. Killingly et al. found convincing evidence for the use of the phonological loop during earworms, as well as worsened task performance during earworms of songs self-rated as generating a strong “desire to sing along,” thus supporting their hypothesis and offering new evidence for memory structures engaged during earworms. However, the authors did not offer any reason or cognitive basis for why there would be an “unconscious desire” to sing along in the first place. As mentioned earlier, the authors also tested the Zeigarnik effect, but song truncation was not found to be a significant predictor of earworms.
Although not specific to earworms, the work of Snyder (2000) may offer some clues as to why earworms repeat. The first point is that repetition in melodies not only allows “chunking” (and therefore significantly extends the capabilities of short-term memory) but is an implicit, subtle form of rehearsal via redundancy in the information. Snyder (2000, p. 53) says: Indeed, any [sic] repetition of elements in a pattern…constitutes a kind of rehearsal, reduces memory load, and helps us to maintain an image of the immediate past…This is why repetition is essential in the construction of memorable patterns.
The second, and perhaps more easily overlooked, point is that the beginnings and ends of “chunks” of information in short-term memory are more easily recalled and accessed, and can, in fact, be linked. As anyone old enough to recall listening to full CDs (in their original track order) can attest, the end of one song often cues the beginning of the next. Indeed, Snyder (2000, p. 55) mentions (and supports) that “the way we remember long time-ordered sequences of chunks is that the last [sic] element in each chunk can act as a recall cue for the next [sic] chunk.”
Two conjectures arise from connecting the work of Snyder (2000) and Killingly et al. (2021). The first is that musical fragments that contain more internal repetition (redundancy), especially within the estimated range of short-term memory (reported averages vary, but 3–5 s appears to be a modest middle ground), would be more likely to induce earworms. This internal repetition is something that Killingly et al. (2021) mention as one of the criteria for creating “easy to sing” melodies. It remains to be tested whether the other features that contribute to more “singable” melodies (such as small musical interval successions or the range of the song compared to the range of the person's voice) are also important criteria in inducing earworms. The second conjecture is that, in songs where a chunk's ending matches its beginning, the ending acts as another cue to recall the fragment from the beginning, thus creating an implicit loop in which the end of a musical phrase cues itself to start again, and so on. I return to these conjectures in the subsequent section.
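To make the two conjectures concrete, the following minimal sketch shows one way they could be operationalized for a candidate fragment. Everything in it is an illustrative assumption rather than a measure taken from Snyder (2000) or Killingly et al. (2021): the fragment is represented as a hypothetical list of pitch names, internal repetition is approximated as the share of repeated note pairs, and self-cueing is approximated as the longest ending that is identical to the beginning. Duration, rhythm, and the 3–5 s window are deliberately ignored.

```python
from typing import Hashable, Sequence


def internal_repetition(tokens: Sequence[Hashable], n: int = 2) -> float:
    """Share of n-grams in the fragment that repeat an earlier n-gram.

    Hypothetical redundancy index: 0.0 means every n-gram is unique.
    """
    if n < 1 or len(tokens) < n:
        return 0.0
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return 1.0 - len(set(ngrams)) / len(ngrams)


def self_cue_overlap(tokens: Sequence[Hashable]) -> int:
    """Length of the longest ending that is identical to the beginning.

    Hypothetical self-cueing index: 0 means the ending never matches the
    beginning; 1 means only the final and first notes are the same.
    """
    for k in range(len(tokens) - 1, 0, -1):
        if tuple(tokens[:k]) == tuple(tokens[-k:]):
            return k
    return 0


# Two invented fragments (assumed pitch-name encoding, for illustration only).
looping = ["C", "D", "E", "C", "D", "E", "C"]
non_looping = ["C", "D", "E", "F", "G"]

print(internal_repetition(looping), self_cue_overlap(looping))
print(internal_repetition(non_looping), self_cue_overlap(non_looping))
```

Under these assumptions, the first fragment scores higher on both indices than the second. Whether such indices predict earworm induction is exactly what remains to be tested.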
Finally, while not directly proposed as a theory for earworms, Kubit and Janata (2022) found empirical evidence that earworms serve as a consolidation mechanism not only for the music itself, but also for associated episodic information. This may be a crucial component in formulating an etiology of earworms, since several papers mention the lack of any clear benefit or purpose. For instance, Killingly et al. (2021, p. 456) state “[a]n earworm is a phenomenon that serves no apparent purpose to the observer.”
Towards a Theory of Earworms
In the prior sections of this paper, I have reviewed selected literature on earworms while attempting to connect it to research from additional subdisciplines relating to memory and imagery that often have different motivating questions and underlying goals (e.g., visual and auditory short-term memory, external and internal factors on memory cueing, involuntary and voluntary musical (and non-musical) imagery, behavioral and neuroscientific studies). Through reviewing the above literature and attempting to synthesize this information, I suggest that there are some overlooked but fundamental questions that will need to be addressed in order to further support a theory for the optimal conditions as well as the underlying mechanisms for earworm induction and maintenance. In what follows, I will propose several testable hypotheses in response to these questions in the hopes of leading to a unified theory of earworms. Furthermore, these questions are raised in order to encourage a focus for future studies on earworms towards a biologically- or evolutionarily-driven research approach so that the focus remains not only on the symptoms, but also works toward the derivation of a plausible etiology for the phenomenon.
Three Big Questions
There are (at least) three questions that appear to be somewhat taken for granted or overlooked in the study of earworms. The first is, generally speaking, why music? That is, presumably there is something about music that affords or creates the internal repetition. But, what about the music, exactly, is that driving feature? As Margulis (2013, p. 66) notes, “timbres and chords aren’t ‘catchy’ in the way a tune can be. The catchiness arises from the chunked and sequential nature of tunes…”. Yet, the information being sequential or inherently “chunk-able” does not appear sufficient, as there would be other types of stimuli that would fulfill those criteria that do not seem to give rise to earworms.
There are three likely culprits for music's “unique” status as an earworm generator, which may all interact: the inherent repetition on a relatively short scale, the melody (i.e., tonal, pitched material), and the rhythm (and/or rhyme). Notice that I did not suggest speech or lyrics as a key factor. While it has been implied that songs with lyrics are much more likely to become earworms (Beaman 2018; Hyman et al. 2013; Killingly et al. 2021), and it certainly seems plausible that language plays a key role, it clearly cannot simply be the presence of speech alone that drives the phenomenon, since otherwise involuntary imagery of random spoken fragments heard throughout the day would be a commonly reported occurrence. If the melody or rhythm has no role in the creation of earworms, then it should be possible to induce earworms with non-musical stimuli that are simply repeated using similar repetition structures and on a similar time scale (i.e., on the order of a few seconds). Accordingly, this would be the first hypothesis to test: Repetitive, sequential non-musical auditory stimuli are as capable of generating earworms as musical stimuli. One slight problem in carrying out an experiment to test this hypothesis is that repeating inherently non-musical stimuli multiple times is known to highlight their non-semantic properties such as intonation and rhythm (Deutsch et al. 2008, 2011), ultimately creating something music-like out of non-musical stimuli. Even so, it remains a useful hypothesis to test. If results can be obtained that are consistent with the above hypothesis, then the next logical step would be to add in the other features one at a time (i.e., rhythm, rhyme, then melody) to observe whether any one alone can facilitate earworm induction. Similarly, musical stimuli that lack any sense of rhythm or inherent repetition could be tested against matched/similar versions that have them.
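Because the hypothesis above is one of equal capability, a non-significant difference between conditions would not by itself support it. One option, offered here only as a sketch and not as a method proposed in the literature reviewed, is an equivalence test on earworm induction rates: two one-sided z-tests (TOST) on the difference between two proportions. The function name, the normal approximation, the equivalence margin, and all counts below are hypothetical choices made for illustration.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tost_two_proportions(x1: int, n1: int, x2: int, n2: int, margin: float) -> float:
    """Two one-sided z-tests for equivalence of two induction rates.

    x1/n1 and x2/n2 are the (hypothetical) proportions of participants who
    report an earworm in each condition. Returns the larger of the two
    one-sided p-values; a small value suggests the rates differ by less
    than `margin`. Uses a normal approximation, so it assumes large samples.
    """
    if n1 <= 0 or n2 <= 0 or margin <= 0:
        raise ValueError("sample sizes and margin must be positive")
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    if se == 0.0:
        raise ValueError("standard error is zero; the normal approximation does not apply")
    p_lower = 1.0 - normal_cdf((diff + margin) / se)  # null: diff <= -margin
    p_upper = normal_cdf((diff - margin) / se)        # null: diff >= +margin
    return max(p_lower, p_upper)


# Invented counts: 50 of 100 participants report an earworm in each condition,
# with an assumed equivalence margin of 20 percentage points.
p_value = tost_two_proportions(50, 100, 50, 100, margin=0.2)
print(p_value < 0.05)
```

The choice of margin is the substantive decision here: it states how small a difference in induction rates would still count as "equally capable," and it would need to be justified before data collection.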
As illustrated by Halpern and Bartlett (2011), there do appear to be documented cases of non-musical earworms (I find “non-musical INMI” a bit of an oxymoron), but if they do occur, they appear to be extremely rare. For example, Halpern and Bartlett (2011) found some descriptions of non-musical (verbal) auditory memories, but it is not clear from the study what they were, nor what is meant by “persist,” since this could refer to the notion that the memory repeats (or “loops”) but could also simply mean that the memory itself is involuntarily recalled multiple times without looping per se.
The second overarching question is why are earworms so common? It appears that the phenomenon of “getting an earworm” is not only universal but quite commonplace, with most people having at least one (and typically more) per day. For instance, Halpern and Bartlett (2011) reported a median of 1.7 episodes per day while Jakubowski et al. (2015) reported a median of 3.5 per day. In comparison, other forms of involuntary imagery (such as involuntarily recalled autobiographical memories) are reported to occur between 5 and 20 times per day (Berntsen 2010; Rasmussen and Berntsen 2011; Rasmussen et al. 2015), suggesting other forms of involuntary imagery are more common. Yet, it appears unknown whether earworm experiences (over both a single episode and the combined episodes per day) last, or persist, longer than these other types of spontaneous imagery. For instance, Halpern and Bartlett (2011) report a mean duration of 8.22 minutes for earworms, with a median overall duration of 36 minutes; and Beaman and Williams (2010) report an overall duration of earworms happening “over a period of hours.” On the other hand, while the literature on involuntary visual memories also examines factors such as frequency, environment, and personality (Berntsen 2021; Maillet and Schacter 2016), I was unable to find any mention of the length of time that people dwell on such memories, suggesting that duration is unlikely to be a prominent component of the experience (i.e., that they are fleeting), with the exception of the literature on rumination (Watkins and Roberts 2020).
A direct comparison between unbidden visual and auditory imagery ought to be conducted empirically in order to compare the relative frequency, prominence, and time spent on each throughout the day. Another conjecture is that, because of their unusual looping sensation, we perceive earworms to be more common than other forms of involuntary imagery (as Beaman (2018) suggests), simply because of an increased awareness of them. Or could it actually be the case that humans recall musical fragments more than (or as much as) other types of memories on a daily basis and, if so, why?
In the relatively recent work by Barzykowski et al. (2019, p. 667), the authors rhetorically ask, “why are we not constantly flooded by involuntary thoughts?” The answer to the question lies in an inhibitory control mechanism “that prevents our stream of consciousness from being flooded by task-unrelated thoughts about the past and future” (p. 668). The authors of this study theorized that, since this inhibitory process requires energy and maintenance that can be depleted, forcing a depletion (and therefore a breakdown in the inhibitory process) should significantly increase involuntary thoughts. However, to the authors’ surprise, their experiment did not support this theory. How these inhibitory processes relate to the frequency or ubiquity of earworms, or to the variation in individual differences, remains largely unstudied.
The third overarching question is why do we continually rehearse it? The continued rehearsal (whether truly involuntary or not) of the earworm material (i.e., episode) is arguably its most notable characteristic. It is not sufficient to simply lay the blame on the musical stimulus, which carries inherent repetition. If it is simply that inherent repetition in any auditory stimulus somehow “triggers” rehearsal in the phonological loop, that ought to be demonstrable (as already mentioned). Rather, it seems crucial to building an etiology of the earworm phenomenon that we gain an understanding as to why, unlike other sequential memories being recalled, a problem occurs with the image or memory getting “stuck” in the phonological loop, resulting in a (largely unintentional) state of continuously rehearsing the fragment. That is, it would be prudent to investigate what is triggering this presumably non-adaptive trait of the (typically) involuntary rehearsal of non-meaningful information. Is there, perhaps, some inherent or unconscious reward for doing so? Is it instead (or also) an artifact of an ancient system for preserving spoken instructions through song?
A starting point for investigating these questions might be to examine closely related phenomena that share the psychological tendency to involuntarily recall and continuously rehearse information. By searching for similarities in environmental conditions, emotional or behavioral outcomes, trait predispositions, and so on, we may uncover previously unnoticed connections between what appear, on the surface, to be very different phenomenological experiences, connections that could point to shared underlying mechanisms. In this regard, three psychological phenomena appear similar: post-traumatic stress disorder (PTSD), obsessive-compulsive disorder (OCD), and rumination. As noted by Beaman (2018), the visual flashbacks that can accompany PTSD (and similar disorders) have the similar trait of being “stuck in a playback loop.” However, as Beaman mentions, these cases are not ubiquitous but pathological, and are typically accompanied by strong sensory and emotional experiences relating to trauma. Nevertheless, understanding the mechanisms and conditions that lead to repeated flashbacks in PTSD may hold some clues for earworm research.
Traits of OCD typically include either involuntary repetitive movement and/or uncontrollable thoughts that the individual feels compelled to continuously repeat (National Institute of Mental Health, 2022). Indeed, numerous studies reviewed in Liikkanen and Jakubowski (2020) found OCD traits to be positively related to earworm frequency, duration, or unpleasantness.
Rumination, in contrast to PTSD and OCD, is a common psychological process that, while it can be pathological, is also experienced by most people throughout their lives (Watkins 2008; Watkins and Roberts 2020). Rumination is defined as repetitive and prolonged reflection on one's past actions, behaviors, feelings, and experiences (Watkins 2008), and/or remaining “fixated” on problems or feelings (Christoff et al. 2016). PTSD, OCD, and rumination are all thought to be negative experiences, all of which contribute to the exacerbation and maintenance of physiological stress responses (Watkins and Roberts 2020). Although there have been documented cases of prolonged and extended episodes of earworms described in the literature as distressing, earworm experiences (as already mentioned, and unlike rumination and especially PTSD or OCD) are typically not unpleasant. Moreover, unlike PTSD and OCD, rumination is not always maladaptive. For instance, under some circumstances, rumination can help one learn from past mistakes and plan for the future (Watkins and Roberts 2020). It has been suggested that rumination is a largely involuntary process, and that in some cases individuals (especially those with depression) are largely unable to stop themselves from ruminating, suggesting a lack of cognitive control (or “automatic” mental constraint) (Christoff et al. 2016).
Interestingly, neuroticism has been found to be closely related to rumination tendencies, although the nature of this relationship is unknown (Hervas and Vazquez 2011), and, as mentioned, neuroticism (as well as openness to experience) has shown positive relationships to INMI experiences across several studies (Floridou et al. 2012; Kellaris 2003; Liikkanen and Jakubowski 2020).
One conjecture stemming from this research, albeit somewhat unsubstantiated, is that those with more anxious and neurotic dispositions may be more prone to habitual rumination, which in turn may predispose the individual to earworms. In other words, those already prone to, or in the habit of, continuously reflecting on and repeating information may experience more earworms.
Nevertheless, while individual differences and, in particular, personality traits may predispose a person to more earworms over their lifetime, the repetitive, looping nature of earworms clearly remains one of their defining features across all individuals. Thus, understanding why the fragments continuously repeat remains an unsolved puzzle. A final conjecture, introduced in the Proposed Mechanisms and Theories section, is that the looping occurs as a result of self-cueing. That is, either the end of one chunk (e.g., a phrase) is identical to the beginning of the next (even if just the same note), or the cueing occurs at the level of an entire phrase (or subphrase), where the final phrase cues the beginning (e.g., a song with an “A, A, B, A” phrase structure), and so on. While perhaps a somewhat obvious conjecture, and one that has even been implied in the literature (see, e.g., Halpern and Bartlett 2011, p. 425), it has yet to be tested in any empirical study. Nevertheless, despite the fact that music is a rather unique stimulus in its capacity for repetition (Margulis 2013), it does seem unlikely that repetition alone can account for the induction of earworms. As mentioned, this repetition structure could be applied to many different auditory (and non-auditory) stimuli to test whether an earworm-like phenomenon results.
A fourth, tangentially related question to consider is: is memory for music special? In the section Musical Memory, several areas of study were reviewed relating to music as holding some sort of “special” memory status: for instance, the vivid or verbatim nature in which musical memories are often stored compared with memories in other sensory domains; its potential power as a mnemonic device; and its referenced role in evolutionary accounts, whether for long-distance communication, mnemonic value, or in reference to a musical-linguistic “protolanguage.” Definitively answering the question of whether musical memory is, indeed, “special” may hold clues for our understanding of the phenomenon of earworms, and why they appear unique to music.
Discussion
This paper has reviewed notable recent literature on earworms, a form of involuntary musical imagery, or INMI, in conjunction with specific related research in music and memory such as non-musical imagery, stress and personality, mechanisms of auditory memory, and music and evolution. The purpose of the paper was to illustrate that while several recent papers have noted the lack of any theoretical explanation of the earworm phenomenon, the few proposed frameworks and underlying mechanisms lack clear distal explanations (or even conjectures). In the section Three Big Questions, I proposed three broad questions that need to be investigated in order to better understand earworms and whether they relate to some form of “special” musical memory, some evolutionary (possibly vestigial) role of music itself as a memory cue, or whether they are simply a function of the inherent (and unique) repetition structure in music. In sum, these three questions were: Why music?; Why are earworms so common (or are they)?; and What causes the continuous rehearsing of an earworm episode? I propose that searching for answers to these three questions will help to generate a biologically-driven theory for earworms.
The earworm remains a perplexing (and at times, vexing) phenomenon. Although not discussed in the present paper, a recent review discusses the challenges in studying it empirically (Liikkanen and Jakubowski 2020). Nevertheless, a wealth of studies in the past decade has brought new insights and clues as to the foundation for the phenomenon. In particular, the roles of subvocalization and the phonological loop appear quite critical in the rehearsal and maintenance of earworms, and several studies point to low-arousal states, and in particular, mind wandering, as leading to internally-cued earworms.
Given our understanding of earworm research to date, it seems plausible that the combination of music's unique properties, certain optimal conditions (such as mind wandering or stress), and personal traits and predispositions (such as neuroticism or anxiety), at times randomly coincide to produce the perfect conditions for the cuing and maintenance of an earworm. Moreover, the mechanisms of the working memory system may attempt to “run on autopilot” given the right musical conditions, and perhaps music's ability to reward may be sufficient for our brains to content themselves with rehearsing the music. However, much more work remains to be done to give any credence to these tenuous conjectures.
Several hypotheses and directions for future study were put forward in this paper, relating to the propensity of certain songs (over others) to induce earworms (i.e., based on the structure of the internal short-term repetition); the ability of non-musical stimuli to induce earworms; the frequency or propensity of earworms compared to other (e.g., visual) involuntary memories; the predisposition of anxious or rumination-prone individuals to encounter more earworms; and the possible role of stress in leading to earworms. It is my hope that sharing these questions, conjectures, and hypotheses may lead to fruitful avenues for new research, and help to build a theory of the earworm phenomenon.
Footnotes
Action Editor
Liila Taruffi, Durham University, Department of Music
Peer Review
Nicolas Farrugia, IMT Atlantique Bretagne Pays de la Loire, Department of Mathematical and Electrical Engineering
Philip Beaman, University of Reading, School of Psychology and Clinical Language Sciences.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical Approval
This research did not require ethics committee or IRB approval. This research did not involve the use of personal data, fieldwork, or experiments involving human or animal participants, or work with children, vulnerable individuals, or clinical populations.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
