A phrase in a loop: A rethink of verbatim repetition in the speech-to-song illusion and a new approach to the study of involuntary auditory imagery

Abstract

The human mind engages with repetition both in the form of external stimulation and as an internal psychological phenomenon. I studied these two facets of repetition in 111 musically untrained participants. The investigation was in two parts. In the first part, I looked at the speech-to-song (STS) illusion by comparing ratings of musicality for identical looped (repeated) spoken phrases and for looped spoken phrases including pitch transpositions. Contrary to previous evidence, the stimuli containing transposed presentations were rated more song-like. I attribute this finding to the particular patterns of transpositions used and the fact that the interval of transposition was a major second as opposed to the fractions of semitones used in previous studies. In the second part of the investigation, I partially adopted the stimulus design of the STS illusion paradigm, presenting participants with looped phrases and single phrases. Participants were asked to suppress them and click a button if they thought of them. An increase in button clicks suggests that participants found it significantly harder to suppress the looped phrases than the single phrases. As the stimuli in the second part had been shown in previous research not to induce the STS illusion when repeated, I attribute the differences between participants’ responses to the two kinds of stimuli to the repetition of the phrases rather than their being perceived as song-like. I consider these findings in the light of research on earworms, or involuntary musical imagery. Future directions for a more in-depth exploration of the STS illusion in conjunction with auditory imagery are also briefly discussed.

Keywords

repetition transposition imagery suppression earworms

Repetition is ubiquitous in music but relatively rare in language (Margulis, 2013), with notable exceptions including public speeches (Ehrensberger, 1945), poetry (Ribeiro, 2007), and child-directed speech (Hoff-Ginsberg, 1985; McRoberts et al., 2009; Schwab et al., 2018). Given listeners’ focus on semantic content rather than surface structure in everyday conversation, repetition appears irrelevant in naturally occurring speech in contrast to its central role in music (Margulis, 2013). An artificial form of speech has the potential to blur the boundaries between music and language (Deutsch et al., 2008): a spoken phrase played in a loop, or repeated several times. I built on previous research in the first part of the study, exploring the effects of exact repetition and deviations from it on a well-known musical illusion. I brought together previously disparate strands of research in the second part, looking at the effects of repetition on involuntary mental imagery.

Acoustic repetition in the speech-to-song illusion

It has been long known that single spoken phrases repeated several times in quick succession come to be perceived as sung, a phenomenon referred to as the speech-to-song (STS) illusion (Deutsch et al., 2008). There has been extensive inquiry into the acoustic features of speech that may contribute to it (Falk et al., 2014; Groenveld et al., 2020; Rathcke et al., 2021; Tierney et al., 2018) as well as some work demonstrating the generalizability of the illusion to nonspeech stimuli (Margulis & Simchy-Gross, 2016; Rowland et al., 2019; Simchy-Gross & Margulis, 2018). With regard to speech, it has been postulated that the mechanisms underpinning pitch perception may be less active, so as to facilitate the processing of phonological and, in turn, semantic information, but such mechanisms may be activated following consecutive repetitions (Deutsch et al., 2011). In other words, when speech perception is divorced from the taxing process of extracting meaning, attentional resources may be directed to the perception of pitch information.

Although the effects of repetition on the extent to which a stimulus is perceived as music-like have been well studied, less is known about the effect of deviations from exact repetition on the STS illusion. In previous studies, researchers have transposed some of the repetitions in a sequence and looked at how this affects listeners’ ratings of musicality (Deutsch et al., 2011; Vanden Bosch Der Nederlanden et al., 2015). Transposition—the shifting of notes in a melody up or down while the same series of intervals is preserved—is processed in a relative rather than absolute fashion, with only perfect pitch possessors perceiving transposed melodies as distinct (Dowling & Harwood, 1986). The processing of tonal patterns in this relative way is a part of music cognition emerging very early in human development (Chang & Trehub, 1977; Trehub et al., 1984), and our auditory world abounds with transposed melodies, with transposed themes being prevalent in Western music (Kamien & Wagner, 1997; Mason, 1962; Ricci, 2017; Russ, 2012; Yust, 2018). Given the widespread presence of the device in compositional practice, the question arises as to whether pitch transposition in the context of the STS illusion reinforces the notion that listeners shift, when they hear it, to perceiving the stimulus as music rather than speech. The STS illusion is less likely to occur when the stimulus is transposed, at least by small non-tempered intervals such as two-thirds of a semitone, with participants rating transposed versions as less song-like than untransposed ones (Deutsch et al., 2011; Vanden Bosch Der Nederlanden et al., 2015). It is yet to be established whether transposing the stimulus by an interval more likely to be heard in everyday life, such as an equal-tempered major second, also reduces the likelihood of producing the illusion. Aside from the magnitude and precision of the transposition, the resulting pattern may be less or more musical depending on whether the locus of the manipulation mirrors compositional practice; this was an additional concern in this study. In the case of an ostinato, a short musical phrase is repeated and, in some cases, transposed to another pitch (Encyclopaedia Britannica, 2013). Phrases may also take a question-and-answer form, the question being played or sung at a higher pitch and the answer at a lower pitch (Snoman, 2012). Employing such manipulations in the context of the STS illusion can help address the question of whether any deviation from exact repetition compromises the illusion, as previously shown (Deutsch et al., 2011; Vanden Bosch Der Nederlanden et al., 2015), or whether the type of transposition and the extent to which it is perceived as music-like may also have a bearing on the listener’s experience.

Perceived repetition in earworms (involuntary musical imagery)

A separate body of literature shows that repetition can be experienced as an internal or imagined phenomenon in the form of an earworm, or involuntary (and typically musical) imagery. Earworms have been referred to anecdotally in literature dating back to the 19th and early 20th century (Sacks, 2007) and are described as snippets of music playing in a loop in one’s head beyond conscious control (Beaman & Williams, 2010). They seem to be prevalent in the general population (Liikkanen, 2008), but engagement in musical activities has been positively associated with increased frequency or duration of earworms in several studies (Beaty et al., 2013; Floridou et al., 2015; Hyman et al., 2013; Liikkanen, 2012b; Liptak et al., 2022; Moeck et al., 2018). More generally, repeated exposure to the music that is the source of the earworm may lead to more persistent imagery (Byron & Fowles, 2015; Killingly & Lacherez, 2023; Liikkanen, 2012b; Liptak et al., 2022). The findings of these studies of earworms could thus be taken as an indication of a relationship between two apparently distinct types of repetition. The first type of repetition is contained within the external stimulus itself (e.g., repetitive music); the second type of repetition is imagined and involuntary (i.e., an earworm).

Verbal earworms equivalent to musical earworms but involving snippets of speech playing in a loop remain comparatively understudied. Verbal earworms have been found to be less prevalent than musical earworms (Halpern & Bartlett, 2011; Tillmann et al., 2023). One possible explanation is that musical earworms derive from a richer tapestry of acoustic information than speech and are thus activated by a wider variety of cues; another is that people are more likely to be exposed to music than to public speeches or recitations of poetry, the purported sources of verbal earworms (Halpern & Bartlett, 2011). Despite the scarcity of research on verbal earworms, the majority of musical earworms reported in some previous work originated in music containing lyrics (Halpern & Bartlett, 2011; Liikkanen, 2012b; Tillmann et al., 2023). Furthermore, participants report the most vivid components of musical imagery to be not only melody but also lyrics (Bailes, 2007), and exposure to music with lyrics has been associated with longer earworm episodes (Liptak et al., 2022). Hence, although it is music that is typically presumed to be sticky, verbal content appears to be a common component of musical imagery.

If repetition contained within an external stimulus and repetition experienced as an earworm are somehow connected, as suggested above, it may be the repetition itself that causes the involuntary musical imagery rather than some other feature of the stimulus. On the basis of what is known about the longer-term effects of repeated exposure to musical content (Byron & Fowles, 2015; Killingly & Lacherez, 2023; Liikkanen, 2012b; Liptak et al., 2022), it might be predicted that repetition of a stimulus would produce more persistent involuntary imagery than the same stimulus presented once, and this prediction could be tested in controlled conditions. Moreover, evidence linking earworms with a stimulus presented recently (Bailes, 2007; Byron & Fowles, 2015; Floridou et al., 2012; Halpern & Bartlett, 2011; Hyman et al., 2013; Liikkanen, 2012a; Williamson et al., 2012) suggests that involuntary imagery is even more likely to be induced if a stimulus is presented in controlled conditions and the participant is tested immediately afterwards. Earworms have been elicited and studied experimentally in previous research (e.g., Beaman et al., 2015; Hyman et al., 2013; Killingly et al., 2021; Liikkanen, 2012a). Even more relevant to the present investigation is the study of earworms in the lab by intending to induce thought suppression, or the deliberate attempt to ignore a thought (Beaman et al., 2015). This approach was borrowed from earlier studies showing that mental images remain active in consciousness even when people are instructed to suppress them (Wegner et al., 1987). Given the evidence of the role of lyrics in imagery discussed earlier (Bailes, 2007; Halpern & Bartlett, 2011; Liikkanen, 2012b; Liptak et al., 2022; Tillmann et al., 2023), a spoken phrase could serve as a source of involuntary imagery if repeated several times in quick succession. To explore this possibility, I used the thought-suppression paradigm in the second part of this study to compare participants’ ability to mentally control and ignore single phrases, on one hand, and as repeated repetitions of the same phrases, on the other; in the latter case, I adopted the stimulus presentation mode employed in the STS-illusion literature. In this way, I looked at whether repeated repetitions of a spoken phrase can produce a verbal earworm, albeit only in the short term. In an effort to disentangle the effect of repetition per se from listeners’ perception of stimuli as music-like, I used as stimuli only spoken phrases that had already been found by Tierney et al. (2013) not to yield the STS illusion when repeated.

This study was the first, so far as I know, to examine repetition contained within an external stimulus in conjunction with repetition experienced involuntarily in the imagination. My first aim was to find out if the two kinds of repetition are linked; my second aim was to lay the ground for further exploration. Although I used pitch transpositions only in the first part of the study, they could also be used in future research similar to the second part of this study to test the effects of different types of repetition on short-term involuntary imagery.

Method

Ethics

The study received ethical approval from the local ethics officer at the Department of Linguistics at University College London (approval no. LING-2023-04-06). Participants gave informed consent prior to taking part in the study and were treated according to the ethical standards of the Declaration of Helsinki.

Participants

All participants were recruited online on Prolific (https://www.prolific.co). Exclusion criteria included formal music training and/or a diagnosis of dyslexia, other literacy difficulties, and autism. These exclusion criteria were set to ensure that no unwarranted variability in speech and music processing would affect the results. A total of 111 English-speaking monolingual individuals (69 female, 41 male) aged between 19 and 36 years (M = 27.48, SD = 4.39) took part in the study. Each participant carried out the same two tasks, one in each of the two parts of the study. The tasks were administered and completed online via Gorilla Experiment Builder (for more details on this platform, see Anwyl-Irvine et al., 2020).

Materials and procedure

Task I

The raw stimuli for this task were taken from Tierney et al. (2013) with permission from the first author. The stimuli consisted of spoken phrases of varying lengths that were found to be perceived as sung when repeated eight times (Tierney et al., 2013), such as “I have had nothing since breakfast” and “Snags and sandbars.”¹ For the purposes of the present task, target phrases were looped to produce a total of eight repetitions separated by 800 ms pauses.

The task comprised three conditions with stimuli presented as follows: sequences made up of eight identical presentations (original repetitions), sequences whereby the second, third, fourth, fifth, sixth, and seventh presentations were transposed up by two semitones (intervening transpositions), and sequences in which the second, fourth, sixth, and eighth presentations were transposed up by two semitones (alternating transpositions). The intervening-transpositions condition had been used in previous research (Deutsch et al., 2011; Vanden Bosch Der Nederlanden et al., 2015); I added the alternating-transpositions condition to find out if question-and-answer patterns reminiscent of question-and-answer musical stimuli are perceived as even more song-like, as discussed above. The transpositions were applied using the overlap-add method in Praat software.

The experimental manipulations were counterbalanced so that each spoken phrase served as a stimulus for all three conditions (original repetitions, intervening transpositions, and alternating transpositions). A total of 72 stimuli were generated and three stimulus lists were created, each including 24 stimuli presented in fixed random order.

Participants were randomly assigned to one of the three lists and required to rate target stimuli on a 5-point Likert-type scale from 1 (exactly like speech) to 5 (exactly like song) by clicking the appropriate button on the computer screen. They were reminded of the meaning of each of the rating scale categories in every trial. As the stimuli had previously been shown to induce the illusion when repeated (Tierney et al., 2013), participants in this study were required only to provide a single rating at the end of each trial so that I could compare the mean ratings per stimulus obtained in each of the three conditions.

Task II

Again, the raw stimuli used in this task were taken from Tierney et al. (2013). The stimuli consisted of spoken phrases previously found to be heard as speech—not song—when repeated. This task included two conditions: spoken phrases presented once (single presentation) and spoken phrases repeated eight times, the repetitions separated by 800 ms pauses (repeated presentation), as in Task I (original repetitions).

The experimental manipulations were counterbalanced so that the same spoken phrases served both as single presentations and repeated presentations. A total of 28 stimuli were generated and two stimulus lists were constructed, each including 14 stimuli presented in fixed random order. Participants were randomly assigned to one of the lists. Due to the passive nature of the task (see below), not all the raw stimuli from Tierney et al. (2013) were used, to ensure that participants would remain focussed on the task throughout the session.

Participants were explicitly asked to try not to form mental images of (i.e., suppress) the stimuli presented, in line with the thought-suppression conditions described by Wegner et al. (1987) and an earworm study by Beaman et al. (2015). This paradigm was adopted to provide a baseline for the two conditions; I assumed that participants would be able to suppress the stimuli to some extent regardless of condition, but a greater number of clicks would indicate more persistent imagery. After being exposed to each stimulus, participants were asked not to form a mental image of it for one minute but to click a button if it came to mind.

Results

Task I

A linear mixed-effects model was fitted to analyze the data from the first task. The type of manipulation (original repetitions, intervening transpositions, and alternating transpositions) was modeled as a fixed effect. Participants and trials were modeled as random effects. Significance testing was conducted using the lmerTest (Kuznetsova et al., 2017) and lme4 (Bates et al., 2015) packages. Effect sizes were computed using the effectsize (Ben-Shachar et al., 2020) package. Post-hoc tests applying the false discovery rate (FDR) for multiple comparisons were carried out using the emmeans package (Lenth, 2020).

Analysis of variance revealed that the type of manipulation was statistically significant, F(2, 2460) = 15.40, p < .001, η_p² = .01, albeit with a small effect size. Post-hoc pairwise comparisons with FDR corrections showed that alternating transpositions were perceived as more song-like than original repetitions, p < .001, d = .11; intervening transpositions were perceived as more song-like than original repetitions, p < .001, d = .08. There was no significant difference between alternating and intervening transpositions, p = .11, d = .03. Figure 1 shows the corresponding boxplots for the data broken down by type of manipulation.

Figure 1.

Boxplots showing participants’ responses, that is, mean ratings for all phrases by type (alternating, intervening, and original), with higher values indicating stimuli perceived as more song-like. Post-hoc pairwise comparisons showed that both alternating and intervening repetitions were perceived as more song-like than the original version including verbatim repetitions, *** p < .001. Thick horizontal lines represent the medians; black dots represent outliers.

Analysis of mean ratings per stimulus revealed variation in the extent to which the stimuli in the three conditions were perceived as song-like (see Figure 2. Stimuli appear in the same order as in Tierney et al., 2013). Stimuli 1, 2, 3, and 11 (“Here is no less,” “Gave the houses,” “Snags and sandbars,” and “People in the neighbourhood” respectively) were perceived as more musical. Stimuli 4, 14, 16, and 23 (“And his two sisters,” “Linen of this sort in public,” “I have had nothing since breakfast,” and “The prince continued to struggle”) were perceived as least song-like.

Figure 2.

Stacked barplots depicting responses (mean ratings), with different colors in each bar representing different manipulation types and higher ratings indicating stimuli perceived as more song-like. To listen to the original sentences created by Tierney et al. (2013), follow the link to the supplementary data in their article.

Task II

A Poisson regression was originally selected for the second task but due to the overdispersion of the data, a negative binomial regression was fitted to model the number of times participants clicked a button as a function of the type of presentation (single vs. repeated). Results including zero counts showed that repeated presentations were associated with a statistically significantly higher rate of clicks compared to single presentations, β =.48, SE = .06, z-value = 7.20, p < .001. The exponentiated value of the regression coefficient, e^0.48 = 1.61, points to a 61% increase in clicks for repeated presentations compared to single presentations (see Figure 3 for boxplots comparing the two conditions).

Figure 3.

Boxplots displaying mean numbers of clicks for repeated and single presentations, *** p < .001. Thick horizontal lines represent the medians; black dots represent outliers.

Some phrases were associated with a larger mean number of clicks when repeated than others, as shown in the top panel of Figure 4. This does not include zero counts, however, and should be read in conjunction with the bottom panel, which shows that zero-count clicks were primarily associated with single presentations. Single presentations were also more frequently associated with one or two clicks, while repeated presentations were more frequently associated with larger numbers of clicks.

Figure 4.

(a) (Upper panel): Stacked barplots depicting mean numbers of clicks corresponding to single and repeated presentations. Note that this panel does not take zero counts into consideration. Figure 4(b) (lower panel): Total number of clicks per condition for different count ranges including zero counts. For more information about the stimuli, follow the link to the supplementary data by Tierney et al. (2013) and see Table S1 in the supporting information file. They did not number the stimuli but Stimulus 1 in this study corresponds to the first phrase of the Song stimuli in Table S1 and Stimulus 25 in this study corresponds to Stimulus 1 of the Speech stimuli in Table S1.

Discussion

In the first part of the study, I demonstrated that transposing repetitions of a looped spoken phrase up by two semitones not only did not disrupt the STS illusion, but actually enhanced it; listeners perceived the stimuli containing transposed repetitions as significantly more song-like than those comprising exact repetitions. The locus of the transposition, intervening (whereby only the first and last repetitions remained at the original pitch) and alternating (whereby original and transposed presentations alternated), was not associated with statistically significantly different ratings of perceived musicality. Having borrowed the structure of the stimuli from the STS illusion paradigm, I found that repeated repetitions led to more persistent involuntary mental imagery in the second part of the study; participants found it significantly harder to suppress looped phrases than phrases presented in isolation.

The findings of the first part of this study contrast with those of previous studies suggesting that repeated repetitions of the same stimulus are perceived as more music-like than transposed ones (Deutsch et al., 2011; Vanden Bosch Der Nederlanden et al., 2015). I explain this discrepancy by reference to the type of transposition applied; I transposed repetitions by a standard musical interval instead of fractions of semitones and used patterns of manipulations (question-and-answer) that could be considered more musical. I can only speculate on the psychological mechanisms underlying participants’ responses at this point. Regarding untransformed repetitions, Deutsch et al. (2011) showed that, having heard a spoken phrase in a looped form, listeners distort the original pitch content when they sing it back, presumably for perceptual reasons. The authors further reasoned that repetition helps listeners access pitch more easily and draw on their stored memories for music.

In similar vein, listeners may draw on common transpositions they have already absorbed; this may partially account for the present results regarding transposed repetitions. Transposition is a device prevalent in compositional practice (Kamien & Wagner, 1997; Mason, 1962; Ricci, 2017; Russ, 2012; Yust, 2018), contributing to a soundscape in which transposed melodies are linked to the perception of musicality. Yet the size of the transposition interval and the precision of the interval, in terms of Western tonality, may be crucial; musical themes are more likely to be transposed by an equal-tempered major second than, say, two-thirds of a semitone. If transpositions by standard musical intervals are part of listeners’ stored representations, the perception of musicality that they confer may be transferred into the speech domain. Moreover, alternations between a transposed phrase and its lower-in-pitch original version may mirror question-and-answer patterns heard in music (Snoman, 2012), thus sounding more musical than the sequences of transpositions used in previous research (Deutsch et al., 2011; Vanden Bosch Der Nederlanden et al., 2015), although I found no significant difference between the extents to which intervening and alternating transpositions were perceived as song-like.

Further methodological differences that may have contributed to the differences between the present findings and those of previous studies must be considered. First, I used raw stimuli that had been shown in previous research to produce the STS illusion when repeated (Tierney et al., 2013), but instead of collecting ratings after each presentation, I only collected ratings after each trial. This allowed me to compare the three conditions without requiring participants to rate every single repetition. However, the phenomenological experience of the participants in the two studies may have been different, since listening to an intact stimulus (as in this study) is not the same as listening to a stimulus requiring multiple ratings (as in Tierney et al., 2013). Second, the direction of the transposition may have also had a bearing on the results, as I only transposed the original stimuli up, and not down. It is worth emphasizing that repetitions were rated more song-like when they were transposed up than when they were transposed down in the study by Vanden Bosch Der Nederlanden et al. (2015), even though it is still of note that their high transpositions did not boost the illusion. Finally, the discrepancy between my findings and those of previous studies may or may not be attributable to the use of equal-tempered transpositions; this would need to be addressed in future research using a more fine-grained design. Overall, the current findings do not support the suggestion, based on previous results, that the STS illusion can only occur if repetitions are exact (Deutsch et al., 2011; Vanden Bosch Der Nederlanden et al., 2015). They suggest, rather, that the illusion may be enhanced when there is deviation from an exact repetition, depending on type of transposition and the musicality of the pattern it produces.

Repetition is understood to be at the heart of the STS illusion, as non-speech sounds, as well as speech sounds, can be perceived as more like music when they are repeated (Margulis & Simchy-Gross, 2016; Rowland et al., 2019; Simchy-Gross & Margulis, 2018). Music is typically highly repetitive (Ockelford, 2009); however, further work is warranted to find out if repetition explains the STS illusion fully or only in part. Although the illusion is thought to stem from repetition, it could be that listeners perceive sequences of non-musical stimuli as musical whenever the structure of the sequence parallels a particular musical device. Repetition can be understood as an ostinato, with examples including phrases that are repeated (as in all studies on the STS) or transposed by an equal-tempered interval (as in the current study). It remains to be seen whether other musical devices that do not involve repetition produce a similar illusion.

The results of the second part of the study show that participants found it harder to suppress spoken phrases when they were repeated than when they were presented in isolation. Even though the literature on earworms describes different settings and time windows, frequent or repeated material is associated with greater imagery (Byron & Fowles, 2015; Killingly & Lacherez, 2023; Liikkanen, 2012b; Liptak et al., 2022). Recent exposure also increases musical imagery (Bailes, 2007; Byron & Fowles, 2015; Floridou et al., 2012; Halpern & Bartlett, 2011; Hyman et al., 2013; Liikkanen, 2012a; Williamson et al., 2012), as suggested by my findings. This was not the focus of my inquiry, however, so I did not include a control condition that would have enabled me to test length of exposure. Nevertheless, I used stimuli previously rated non-musical, insofar as they had not produced the STS illusion in previous research (Tierney et al., 2013), so it is likely that higher involuntary imagery (inferred from button clicks representing hard-to-suppress stimuli) was the result of repetition rather than perceived musicality. Future research is warranted to find out if the set of stimuli found to induce the illusion would also produce higher involuntary imagery, or not. A potential practice effect precluded testing for this in the present study, as my participants had already been exposed to the illusion-inducing sentences in the first part of the study.

Verbal earworms are the phenomenon closest to the type of involuntary verbal imagery identified in this study. They seem to be rarer than musical earworms (Halpern & Bartlett, 2011; Tillmann et al., 2023). In the past, this has been explained by reference to the range of possible activation cues afforded by music (e.g., pitch, harmony, timbre, and lyrics), which is wider than that afforded by speech, and the relative lack of exposure to sources of verbal earworms (e.g., speeches, poetry) in comparison to exposure to music (Halpern & Bartlett, 2011). I propose a novel explanation on the basis of the current results: as repetition is rare in everyday speech (Margulis, 2013), fewer episodes of involuntary imagery stem from speech. Future research could find out if hearing looped phrases, as in this study, produces verbal earworms in listeners (i.e., they are stored in long-term memory) if participants were followed up.

The findings of previous studies have shown that verbal content in the form of song lyrics is a common component of musical imagery (Bailes, 2007; Halpern & Bartlett, 2011; Liikkanen, 2012b; Liptak et al., 2022; Tillmann et al., 2023). The findings of this study show that the verbal content of repeated spoken phrases can produce involuntary imagery at least in the short term. People with congenital amusia, a neurogenetic disorder affecting music and speech processing, have recently been found also to report musical earworms (Loutrari et al., 2022; Tillmann et al., 2023). Interestingly, verbal earworms seem to be more prevalent in people with congenital amusia than neurotypical controls (Tillmann et al., 2023), but people with amusia, when asked to imagine musical, vocal, and environmental sounds, find it harder than neurotypical controls to control the transition from imagining one sound to another (Loutrari et al., 2022). It would therefore be interesting to explore the effects of repeated repetitions of speech on people with amusia and find out if they can suppress involuntary imagery given their difficulty to control imagined sound transitions

In conclusion, this study replicated previous research in which repeated repetitions of a spoken phrase produce the STS illusion, but my findings suggest that the illusion can be enhanced by presenting not verbatim repetitions but patterns of repetition that deviate in a musical way. Further research is warranted to determine more precisely the kinds of deviation that are more or less likely to produce the illusion. I also used the STS illusion paradigm to test whether repeated repetitions of a spoken phrase are associated with more persistent involuntary auditory imagery in the short term. Further research is needed to find out if involuntary imagery is a function of the extent to which the repetitions of a spoken phrase are perceived as song-like or whether it is a function of repetition per se. This could be achieved in a future thought-suppression study using both illusion-inducing and non-inducing stimuli to compare musical and verbal involuntary imagery.

All the original stimuli are available in the supplemental materials by Tierney et al. (2013). Readers can also access the transcribed sentences corresponding to the stimuli; the second table in the supporting information file contains the phrases that were shown to be perceived as song-like in the original study.

Footnotes

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for profit sectors.

ORCID iD

Ariadne Loutrari

Notes

References

Anwyl-Irvine

Massonnié

Flitton

Kirkham

Evershed

(2020). Gorilla in our midst: An online behavioral experiment builder. Behavior Research Methods, 52(1), 388–407. https://doi.org/10.1101/438242

Bailes

(2007). The prevalence and nature of imagined music in the everyday lives of music students. Psychology of Music, 35(4), 555–570. https://doi.org/10.1177/0305735607077834

Bates

Mächler

Bolker

B. M.

Walker

S. C.

(2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1). https://doi.org/10.18637/jss.v067.i01

Beaman

C. P.

Powell

Rapley

(2015). Want to block earworms from conscious awareness? B(u)y gum! Quarterly Journal of Experimental Psychology, 68(6), 1049–1057. https://doi.org/10.1080/17470218.2015.1034142

Beaman

C. P.

Williams

T. I.

(2010). Earworms (“stuck song syndrome”): Towards a natural history of intrusive thoughts. British Journal of Psychology, 101(4), 637–653. https://doi.org/10.1348/000712609X479636

Beaty

R. E.

Burgin

C. J.

Nusbaum

E. C.

Kwapil

T. R.

Hodges

D. A.

Silvia

P. J.

(2013). Music to the inner ears: Exploring individual differences in musical imagery. Consciousness and Cognition, 22(4), 1163–1173. https://doi.org/10.1016/j.concog.2013.07.006

Ben-Shachar

Lüdecke

Makowski

(2020). effectsize: Estimation of effect size indices and standardized parameters. Journal of Open Source Software, 5(56), 2815. https://doi.org/10.21105/joss.02815

Byron

T. P.

Fowles

L. C.

(2015). Repetition and recency increases involuntary musical imagery of previously unfamiliar songs. Psychology of Music, 43(3), 375–389. https://doi.org/10.1177/0305735613511506

Chang

H. W.

Trehub

S. E.

(1977). Auditory processing of relational information by young infants. Journal of Experimental Child Psychology, 24(2), 324–331. https://doi.org/10.1016/0022-0965(77)90010-8

10.

Deutsch

Henthorn

Lapidis

(2011). Illusory transformation from speech to song. The Journal of the Acoustical Society of America, 129(4), 2245–2252. https://doi.org/10.1121/1.3562174

11.

Deutsch

Lapidis

Henthorn

(2008). The speech-to-song illusion. The Journal of the Acoustical Society of America, 124, 2471.

12.

Dowling

W. J.

Harwood

D. L.

(1986). Music cognition. Academic Press.

13.

Ehrensberger

(1945). An experimental study of the relative effectiveness of certain forms of emphasis in public speaking. Speech Monographs, 12(1), 94–111. https://doi.org/10.1080/03637754509390108

14.

Encyclopaedia Britannica. (2013). Ostinato. In Encyclopaedia Britannica (Sept ed.). https://www.britannica.com/art/ostinato

15.

Falk

Rathcke

Bella

S. D.

(2014). When speech sounds like music. Journal of Experimental Psychology: Human Perception and Performance, 40(4), 1491–1506. https://doi.org/10.1037/a0036858

16.

Floridou

G. A.

Williamson

V. J.

Müllensiefen

(2012). Contracting earworms: The roles of personality and musicality. In Cambouropoulos

Tsougras

Mavromatis

Pastiadis

(Eds.), Proceedings of the 12th ICMPC-ESCOM Conference (302–310). Causal Productions: School of Music Studies, Aristotle University of Thessaloniki.

17.

Floridou

G. A.

Williamson

V. J.

Stewart

Müllensiefen

(2015). The involuntary musical imagery scale (IMIS). Psychomusicology: Music, Mind, and Brain, 25(1), 28–36. https://doi.org/10.1037/pmu0000067

18.

Groenveld

Burgoyne

J. A.

Sadakata

(2020). I still hear a melody: Investigating temporal dynamics of the speech-to-song illusion. Psychological Research, 84(5), 1451–1459. https://doi.org/10.1007/s00426-018-1135-z

19.

Halpern

A. R.

Bartlett

J. C.

(2011). The persistence of musical memories: A descriptive study of earworms. Music Perception, 28(4), 425–431. https://doi.org/10.1525/mp.2011.28.1.425

20.

Hoff-Ginsberg

(1985). Some contributions of mothers’ speech to their children’s syntactic growth. Journal of Child Language, 12(2), 367–385. https://doi.org/10.1017/S0305000900006486

21.

Hyman

I. E.

Burland

N. K.

Duskin

H. M.

Cook

M. C.

Roy

C. M.

McGrath

J. C.

Roundhill

R. F.

(2013). Going gaga: Investigating, creating, and manipulating the song stuck in my head. Applied Cognitive Psychology, 27(2), 204–215. https://doi.org/10.1002/acp.2897

22.

Kamien

Wagner

(1997). Bridge themes within a chromaticized voice exchange in Mozart expositions. Music Theory Spectrum, 19(1), 1–12. https://doi.org/10.2307/745996

23.

Killingly

Lacherez

(2023). The song that never ends: The effect of repeated exposure on the development of an earworm. Quarterly Journal of Experimental Psychology, 76(11). https://doi.org/10.1177/17470218231152368

24.

Killingly

Lacherez

Meuter

(2021). Singing in the brain: Investigating the cognitive basis of earworms. Music Perception, 35(8), 456–472.

25.

Kuznetsova

Brockhoff

P. B.

Christensen

R. H. B.

(2017). lmerTest package: Tests in linear mixed effects models. Journal of Statistical Software, 82(13). https://doi.org/10.18637/jss.v082.i13

26.

Lenth

(2020). Emmeans: Estimated marginal means, aka least-squares means (R package version 1.4.5.e). https://cran.r-project.org/package=emmeans

27.

Liikkanen

L. A.

(2008). Music in everymind: Commonality of involuntary musical imagery. In Miyazaki

Adachi

Hiraga

Nakajima

Tsuzaki

(Eds.), Proceedings of the 10^th ICMPC-ESCOM Conference 408–412. Causal Productions: ICMPC10.

28.

Liikkanen

L. A.

(2012a). Inducing involuntary musical imagery: An experimental study. Musicae Scientiae, 16(2), 217–234. https://doi.org/10.1177/1029864912440770

29.

Liikkanen

L. A.

(2012b). Musical activities predispose to involuntary musical imagery. Psychology of Music, 40(2), 236–256. https://doi.org/10.1177/0305735611406578

30.

Liptak

T. A.

Omigie

Floridou

G. A.

(2022). The idiosyncrasy of involuntary musical imagery repetition (IMIR) experiences: The role of tempo and lyrics. Music Perception, 39(3), 320–338.

31.

Loutrari

Ansell

Philip Beaman

Jiang

Liu

(2022). Auditory imagery in congenital amusia. Musicae Scientiae, 27, 698–716. https://doi.org/10.1177/10298649221122870

32.

Margulis

E. H.

(2013). Repetition and emotive communication in music versus speech. Frontiers in Psychology, 4, 1–4. https://doi.org/10.3389/fpsyg.2013.00167

33.

Margulis

E. H.

Simchy-Gross

(2016). Repetition enhances the musicality of randomly generated tone sequences. Music Perception, 33(4), 509–514.

34.

Mason

(1962). Form in Shostakovich’s quartets. The Musical Times, 103(1434), 531–533.

35.

McRoberts

G. W.

McDonough

Lakusta

(2009). The role of verbal repetition in the development of infant speech preferences from 4 to 14 months of age. Infancy, 14(2), 162–194. https://doi.org/10.1080/15250000802707062

36.

Moeck

E. K.

Hyman

I. E.

Takarangi

M. K. T.

(2018). Understanding the overlap between positive and negative involuntary cognitions using instrumental earworms. Psychomusicology: Music, Mind, and Brain, 28(3), 164–177. https://doi.org/10.1037/pmu0000217

37.

Ockelford

(2009). Similarity relations between groups of notes: Music-theoretical and music-psychological perspectives. Musicae Scientiae, 13(1), 47–98. https://journals.sagepub.com/doi/abs/10.1177/102986490901300104

38.

Rathcke

Falk

Dalla Bella

(2021). Music to your ears: Sentence sonority and listener background modulate the “speech-to-song illusion.” Music Perception, 38(5), 499–508.

39.

Ribeiro

A. C.

(2007). Intending to repeat: A definition of poetry. Journal of Aesthetics and Art Criticism, 65(2), 189–201. https://doi.org/10.1111/j.1540-594X.2007.00249.x

40.

Ricci

(2017). The pump-up in pop music of the 1970s and 1980s. Music Analysis, 36(1), 94–115. https://doi.org/10.1111/musa.12086

41.

Rowland

Kasdan

Poeppel

(2019). There is music in repetition: Looped segments of speech and nonspeech induce the perception of music in a time-dependent manner. Psychonomic Bulletin and Review, 26(2), 583–590. https://doi.org/10.3758/s13423-018-1527-5

42.

Russ

(2012). Bartok, Beethoven and the sonata for two pianos and percussion. Journal of the Royal Musical Association, 137(2), 307–349.

43.

Sacks

(2007). Musicophilia: Tales of music and the brain. Picador.

44.

Schwab

J. F.

Rowe

M. L.

Cabrera

Lew-Williams

(2018). Fathers’ repetition of words is coupled with children’s vocabularies. Journal of Experimental Child Psychology, 166, 437–450. https://doi.org/10.1016/j.jecp.2017.09.012

45.

Simchy-Gross

Margulis

E. H.

(2018). The sound-to-music illusion: Repetition can musicalize nonspeech sounds. Music and Science, 1, 1–6. https://doi.org/10.1177/2059204317731992

46.

Snoman

(2012). The dance music manual: Tools, toys and techniques. CRC Press.

47.

Tierney

Dick

Deutsch

Sereno

(2013). Speech versus song: Multiple pitch-sensitive areas revealed by a naturally occurring musical illusion. Cerebral Cortex, 23(2), 249–254. https://doi.org/10.1093/cercor/bhs003

48.

Tierney

Patel

A. D.

Breen

(2018). Acoustic foundations of the speech-to-song illusion. Journal of Experimental Psychology: General, 147(6), 888–904. https://doi.org/10.1037/xge0000455

49.

Tillmann

Fornoni

Agathe

Hoarau

Albouy

Caclin

(2023). Earworms in the amusic mind? Questionnaire investigation in congenital amusia. Music and Science, 6. https://doi.org/10.1177/20592043231166332

50.

Trehub

S. E.

Bull

Thorpe

L. A.

(1984). Infants’ perception of melodies: The role of melodic contour. Child Development, 55(3), 821–830.

51.

Vanden Bosch Der Nederlanden

C. M.

Hannon

E. E.

Snyder

J. S

. (2015). Everyday musical experience is sufficient to perceive the speech-to-song illusion. Journal of Experimental Psychology: General, 144(2), e43–e49. https://doi.org/10.1037/xge0000056

52.

Wegner

D. M.

Schneider

D. J.

Carter

S. R.

White

T. L.

(1987). Paradoxical effects of thought suppression. Journal of Personality and Social Psychology, 53(1), 5–13. https://doi.org/10.1037/0022-3514.53.1.5

53.

Williamson

V. J.

Jilka

S. R.

Fry

Finkel

Müllensiefen

Stewart

(2012). How do “earworms” start? Classifying the everyday circumstances of involuntary musical imagery. Psychology of Music, 40(3), 259–284. https://doi.org/10.1177/0305735611418553

54.

Yust

(2018). Organized time: Rhythm, tonality, and form. Oxford University Press.