Abstract
While musicologists have long noted that triplet rhythms evoke sensations of rotation in listeners, no theory has been proposed to account for this apparent association. To investigate this phenomenon, 33 excerpts of “spinning, rotating, twirling, or swirling” music were crowd-sourced from an online discussion forum. Analysis revealed a prominence of fast, repeated, isochronous patterns using stepwise pitch movement, with significantly more compound meters than generally found in Western music. Inspired by ecological acoustics, an Ecological Theory of Rotating Sounds (ETRoS) is proposed to explain these associations. The theory maps patterns of loudness fluctuations to trajectories of rotating sound sources. Two experiments tested the theory. In Experiment A, listeners rated how much binary, ternary, quaternary, and quinary figures (of 2–5 notes) evoked sensations of rotation. Experiment B used a two-alternative forced-choice paradigm pitting ecological quaternary stimuli (strong-medium-weak-medium) against unecological stimuli with permuted stress values more typical of Western music (strong-weak-medium-weak). Results indicate that perceived rotation increases with tempo and is poorly evoked by binary rhythms. Loudness patterns consistent with rotating trajectories were perceived as more rotating than unecological patterns—but only when pitch was also moving. Altogether, moderate support is provided for an acoustic-ecological account of rotating sounds.
Introduction
Music commentators have long observed that music seems to have a special affinity with movement (e.g., Levitin, Grahn, & London, 2018). Even apart from dancing and the necessary movements of performers, music commonly evokes kinetic impressions among otherwise immobile listeners. In addressing music’s action affinity, researchers have employed various theoretical approaches. One example is embodied cognition, where perceptual and cognitive processes are linked to the human body and to the body situated in an environment (e.g., Godøy & Leman, 2010; Leman, 2007; Schiavio, 2014). Extending the concept of visuomotor mirror neurons to the auditory modality, others have demonstrated sound–action connections consistent with the existence of auditory-evoked mirror responses in supplementary motor areas (Aziz-Zadeh, Iacoboni, Zaidel, Wilson, & Mazziotta, 2004; Buccino et al., 2005).
Another approach to addressing music’s action affinity is research that draws on metaphor theory (Lakoff & Johnson, 1980; Zbikowski, 2002, 2008). Well-documented metaphors include the cross-modal association between pitch height and spatial elevation (Jeschonek, Pauen, & Babocsai, 2013; Wagner, Winner, Cicchetti, & Gardner, 1981; Walker et al., 2010) and the association of loudness with visual brightness (Lewkowicz & Turkewitz, 1980). Some cross-modal mappings seem straightforwardly related to physical phenomena, such as the association between decreasing loudness and increasing distance (see Neuhoff, 1998). Similarly, the association between pitch height and object size might be explained by the simple acoustic relationship between higher frequency and smaller mass or volume (Eitan & Timmers, 2010; Marks, 1987; Mondloch & Maurer, 2004). However, other mappings seem to defy simple explanation. For example, an association between rising pitch and increasing size has been widely reported (Antovic, 2009; Eitan, Schupak, Gotler, & Marks, 2014; Kim & Iwamiya, 2008). Yet this finding appears to contradict the static association of high pitch with small size. That is, lower pitch is larger, yet falling pitches shrink (Eitan, 2013). Research has also chronicled various mapping asymmetries; a change in one direction (such as rising pitch) may evoke a stronger metaphorical relationship than an equivalent change in the opposite direction (see Eitan, 2013; Eitan & Granot, 2006). Aware of this complexity, Eitan (2013) has proposed multiple underlying explanations. Specifically, while some cross-modal associations are innate, others might arise from multisensory integration, represent true synesthesia, be learned from exposure to common environmental occurrences, or simply originate in unique cultural expressions (see Levitin et al., 2018).
In the current study, we focus on a single sound-movement association related to spinning, rotating, or twirling. We offer an account based on yet another theoretical approach, namely, “ecological acoustics”—an approach that has not been featured in previous music-related research.
In his book, La Musique et l’Ineffable, Vladimir Jankélévitch (1961) wrote of “our sense that triplets ‘whirl’” (p. 91). Indeed, music listeners can readily point to musical examples in which the subdivision of beats into three is associated with rotation, twirling, whirling, or spinning. Famous examples include Franz Schubert’s “Gretchen am Spinnrade,” Op. 2, D. 118, and the swirling water motion at the beginning of Bedřich Smetana’s “Die Moldau.” Other music scholars have observed this apparent relationship:
The repeat…explodes in scurrying triplets.…It embraces every progressive device of keyboard virtuosity…swirling triplets and sextuplets. (Mellers, 1954, pp. 557, 561)
the glockenspiel starts spinning roundabout triplets (Sarsfield, 2014, p. 34)
idea e could be accelerated to twirling triplets. (Mather & Karns, 2015, p. 47)
In the repertoire, musical markers of gaiety abound: lively tempo, major mode, clear and uncomplicated melodic organization, and simple rhythms with a swinging gait (triplet figures are common). (McKee, 2014, p. 171)
Dance treatises also associate spinning dances with steps counted in threes or sixes (Mädel, 1805, as cited in McKee, 2014). Such observations raise the question: why—and under which conditions—would triplets seem to spin? This article aims to answer this question systematically: first, musical excerpts of spinning or rotating music are collected and analyzed; second, inspired by the field of ecological acoustics (Gaver, 1993a, 1993b; Gibson, 1966, 1979; Li, Logan, & Pastore, 1991; see also Bregman, 1990), a theory is proposed to account for this phenomenon; finally, two experiments testing both the theory and the informal and analytical observations are reported.
Survey of music theorists
To study the relationship between acoustic features and rotating qualia in music more systematically, further musical examples were solicited from music scholars via a popular online discussion website sponsored by the Society for Music Theory (SMT). To avoid biasing responses, the following query made no mention of triplets:
Dear collective wisdom,
Some of you may know Jankélévitch’s Music and the Ineffable (1961). In one passage, he talks about various rhythms as connoting spinning, rotating, twirling, or swirling. Examples might include the beginning of Smetana’s Die Moldau, Schubert’s Gretchen am Spinnrade, Saint-Saëns’ Omphale’s Spinning Wheel, and Dvořák’s The Golden Spinning Wheel. We’re looking for more musical examples. Suggestions welcome.
David Huron & Niels Chr. Hansen
Fifteen of the SMT examples include Italian tempo terms, with Allegro (33%) being most common. In order to determine whether this tempo is faster or slower than music in general, a distribution of tempo terms for a representative sample of 750 common-practice orchestral works composed between 1750 and 1900 was used as a comparison (Horn & Huron, 2015). Although this comparison sample may be biased in some way (London, 2013), Figure 1 nevertheless provides back-to-back bar graphs comparing the tempo distributions for the SMT examples with the Horn and Huron corpus. Note that 80% (12 out of 15) of the tempo-designated SMT examples exhibit tempi that are at or above the median tempo for the Horn and Huron distribution (technically Allegretto, but closer to Moderato than Allegro moderato). A Mann-Whitney U-test shows a statistically significant difference between the two distributions (W = 3,980.5, Z = −1.98, p = .047). In short, the nominally rotating SMT passages appear to exhibit somewhat faster tempi compared with common-practice Western music.

Figure 1. Back-to-back bar graph comparing the distribution of tempo terms for a sample of 15 nominally rotating passages (solid bars) with a general distribution (shaded bars) of tempo terms for 750 orchestral works (after Horn & Huron, 2015).
With regard to dynamics, the most common dynamic marking among the 26 SMT examples that contained dynamic markings was piano (Figure 2). Unlike for tempo, no statistically significant difference was found between the dynamic markings of the nominally rotating SMT passages and those of the 750 orchestral works composed between 1750 and 1900 (W = 8,028.5, Z = −1.502, p = .134).

Figure 2. Back-to-back bar graph comparing the distribution of dynamic markings for a sample of 26 nominally rotating passages (solid bars) with a general distribution (shaded bars) of dynamic markings for 750 orchestral works (after Horn & Huron, 2015).
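Both comparisons above use a Mann-Whitney test on ordered categories. The following Python sketch shows one way to run such a test under stated assumptions. Tempo terms (or dynamic markings) are first coded as integers on an ordinal scale, ties receive mid-ranks, and the two-sided p-value comes from the tie-corrected normal approximation. The ordinal coding and the data in the example are hypothetical, not the Horn and Huron corpus. The reported W values may also follow a different software convention than the U statistic returned here.

```python
import math

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the tie-corrected normal approximation.

    Returns (U for sample x, Z, two-sided p).
    """
    # Pool observations, remembering the sample each came from (0 = x, 1 = y).
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(pooled)
    ranks = [0.0] * n
    tie_sum = 0  # sum of t^3 - t over tied blocks, used in the variance correction
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        t = j - i + 1
        tie_sum += t ** 3 - t
        i = j + 1

    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_sum / (n * (n - 1)))
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard-normal tail probability
    return u1, z, p

# Hypothetical ordinal codes (1 = slowest tempo term ... 7 = fastest).
rotating = [5, 5, 6, 4, 5, 7, 6, 5, 4, 6, 5, 3, 6, 5, 4]
reference = [2, 3, 4, 4, 5, 3, 2, 4, 5, 6, 3, 4, 1, 4, 5, 3, 4, 2]
print(mann_whitney_u(rotating, reference))
```

Many tempo terms share the same ordinal code, so the tie correction matters here. It reduces the variance of U and therefore slightly enlarges |Z|.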
With regard to musical meter, Table 1 reports the frequency of occurrence of different metric types in the SMT examples compared with those found in 10,150 classical music themes from Barlow and Morgenstern’s (1983) Dictionary of Musical Themes. Musical meters are divided according to subdivision of the beat (i.e., “metric division”) and the number of beats per measure (i.e., “metric grouping”). Specifically, compound and simple meters contain subdivisions of the beat into three (e.g., 6/8, 9/8) or two (e.g., 2/4, 3/4), respectively; in terms of metric grouping, triple meters contain three beats per measure (e.g., 3/4, 9/8) whereas duple, quadruple, and quintuple meters contain two, four, or five beats per measure, respectively.
Table 1. Metric division and grouping types in Barlow and Morgenstern’s (1983) Dictionary of Musical Themes compared with the SMT list of musical examples “connoting spinning, rotating, twirling, or swirling” (see Appendix). Twelve irregular and ambiguous meter signatures are excluded from the Barlow and Morgenstern corpus.
SMT: Society for Music Theory.
A chi-square test suggests that the rotating examples submitted by music theorists include more compound meters than would be expected given the distribution of meters in Western classical music generally, χ²(1) = 52.3, p < .001.¹ Conversely, there is no statistically significant tendency for the nominally rotating passages to favor triple meter over non-triple meters, χ²(1) = 2.77, p = .096. In fact, the data are skewed in the direction opposite to favoring triple meter. In other words, the music connoting “spinning, rotating, twirling, or swirling” appears to favor subdividing the beat into three (i.e., compound meters) but not measures containing three beats (i.e., triple meters).
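With two categories (compound vs. simple, or triple vs. non-triple), a goodness-of-fit test against reference proportions has one degree of freedom. The Python sketch below assumes this goodness-of-fit variant, because the text does not specify whether a goodness-of-fit or a contingency-table test was used. It converts the statistic to a p-value using the identity that a 1-df chi-square variable is the square of a standard normal variable. The counts and proportions in the example are hypothetical; Table 1 values would be substituted.

```python
import math

def chi_square_gof(observed, expected_props):
    """Pearson chi-square goodness-of-fit statistic (len(observed) - 1 degrees of freedom)."""
    total = sum(observed)
    return sum((o - total * p) ** 2 / (total * p)
               for o, p in zip(observed, expected_props))

def p_value_df1(chi2):
    """Upper-tail p for a 1-df chi-square: P(Z^2 > chi2) = erfc(sqrt(chi2 / 2))."""
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical counts [compound, simple] tested against hypothetical reference proportions.
chi2 = chi_square_gof([14, 16], [0.15, 0.85])
print(round(chi2, 2), p_value_df1(chi2))
```

As a check on the p-value helper, χ²(1) = 2.77 maps to p ≈ .096, which is the value reported above for the triple-meter comparison.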
This finding, that rotation is associated with compound but not with triple meters, is likely to interact with tempo. Music theorists speak of a metric hierarchy ranging from beat subdivisions, through beats and measures, to hypermetric groups of measures (Lerdahl & Jackendoff, 1983). Because the beat (or tactus) can only be perceived within the 200–1,800 ms range, with a peak of maximum pulse salience at 600 ms (Fraisse, 1978), triple and compound meters primarily differ in terms of tempo (London, 2012). For example, a fast triple meter may be felt as a single beat per measure divided into three, much like a compound meter. Conversely, if music slows down, the tactus might shift to what would otherwise be considered a subdivision of the beat. Indeed, when executing ritardandi, musicians mentally subdivide the beat to facilitate a smooth transition (Spitzer et al., 2017). This calls for systematic manipulation of tempo in empirical studies of rotating qualia.
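The tempo dependence can be made concrete with a toy rule for choosing the tactus. The Python sketch below uses a hypothetical heuristic, not a published model. Among the metric levels whose period lies within the 200–1,800 ms range, it selects the one closest to 600 ms on a logarithmic scale. With eighth notes of 300 ms, 3/4 yields a quarter-note beat (triple meter). With eighth notes of 150 ms, the whole measure becomes the beat and is divided into three quarters. This is the same beat-in-three organization that 6/8 has at that tempo.

```python
import math

BEAT_RANGE_MS = (200, 1800)  # perceivable beat periods (after Fraisse, 1978)
PREFERRED_MS = 600           # approximate peak of pulse salience

def likely_tactus(unit_ms, levels):
    """Return the metric level most likely to be heard as the beat.

    `levels` maps a label to the number of notated units per level.
    Hypothetical heuristic: among levels whose period falls in the perceivable
    range, pick the one closest to 600 ms on a logarithmic scale.
    """
    candidates = {
        label: unit_ms * multiple
        for label, multiple in levels.items()
        if BEAT_RANGE_MS[0] <= unit_ms * multiple <= BEAT_RANGE_MS[1]
    }
    if not candidates:
        return None
    return min(candidates,
               key=lambda label: abs(math.log(candidates[label] / PREFERRED_MS)))

three_four = {"eighth": 1, "quarter": 2, "measure": 6}
six_eight = {"eighth": 1, "dotted quarter": 3, "measure": 6}
for eighth_ms in (300, 150):
    print(eighth_ms, likely_tactus(eighth_ms, three_four), likely_tactus(eighth_ms, six_eight))
```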
Apart from tempo, dynamics, and meter, accompaniment figurations were also analyzed. Each passage was classified as either isochronous or anisochronous, depending on whether the notated note values were constant or not. Fully 85% (23 out of 27) of the SMT-list accompaniments made use of isochronous figuration, most commonly using sixteenth notes. In combination with a generally fast tempo (Figure 1), the predominance of sixteenth notes attests to a rather fast pace in passages deemed to be evocative of “rotating.” Furthermore, the SMT examples exhibit a penchant for cyclic, repeated figures. Interestingly, 76% (19 out of 25) of the cycle lengths were multiples of three, consisting of 3, 6, 9, or 12 notes.
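The isochrony and cycle-length codings described above can be operationalized roughly as follows. This Python sketch assumes each accompaniment figure is available as a list of notated durations and MIDI pitch numbers. The cycle length is taken to be the shortest period with which the pitch sequence repeats. The example figure is hypothetical.

```python
from fractions import Fraction

def is_isochronous(durations):
    """True when every notated note value in the figuration is identical."""
    return len(set(durations)) == 1

def cycle_length(pitches):
    """Smallest period p such that the pitch sequence repeats every p notes.

    Returns len(pitches) when no shorter repetition exists.
    """
    n = len(pitches)
    for p in range(1, n + 1):
        if all(pitches[i] == pitches[i - p] for i in range(p, n)):
            return p
    return n

# Hypothetical sixteenth-note figure (MIDI note numbers), repeated four times.
figure = [62, 60, 61, 62, 64, 62]
passage = figure * 4
durations = [Fraction(1, 16)] * len(passage)
print(is_isochronous(durations), cycle_length(passage), cycle_length(passage) % 3 == 0)
```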
Finally, melodic interval size was coded using the intervals between the first six pitches of each passage (i.e., five intervals per passage). The median interval for the SMT examples is two semitones. This is identical to the median interval for a wide-ranging sample of world music reported by Huron (see Figure 5.1 in Huron, 2006). However, in the world-music sample, 23% of melodic intervals are unisons, whereas only 4% (5 out of 125) of melodic intervals in the SMT examples are unisons. Omitting unisons from the analysis increases the median melodic interval in the world-music sample to three semitones, whereas the median interval remains two semitones in the SMT examples. Thus, in nominally rotating passages, small melodic intervals are especially prevalent, even though unison intervals are rare.
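The following Python sketch shows one way to code the intervals. It assumes each passage is encoded as MIDI note numbers and that intervals from all passages are pooled before taking medians. This pooling would be consistent with the count of 125 intervals (five per passage for 25 passages). The example openings are invented for illustration.

```python
from statistics import median

def pooled_interval_profile(passages, n_pitches=6):
    """Summarize absolute melodic intervals (semitones) from each passage's opening.

    The first `n_pitches` MIDI pitches of each passage yield n_pitches - 1
    intervals; intervals are pooled across passages before summarizing.
    """
    intervals = []
    for pitches in passages:
        head = pitches[:n_pitches]
        intervals.extend(abs(b - a) for a, b in zip(head, head[1:]))
    non_unisons = [i for i in intervals if i != 0]
    return {
        "n_intervals": len(intervals),
        "median": median(intervals),
        "unison_share": intervals.count(0) / len(intervals),
        "median_excluding_unisons": median(non_unisons) if non_unisons else None,
    }

# Hypothetical openings (MIDI note numbers), not the SMT examples.
print(pooled_interval_profile([[62, 61, 60, 61, 62, 61], [60, 60, 62, 64, 65, 67]]))
```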
By way of summary, passages of Western music that are nominally evocative of rotating or spinning appear to be associated with at least five musical features. Specifically, there is a marked tendency to employ (1) compound meters, (2) fast tempi with short note values, (3) isochronous rhythms, (4) repeated cyclic patterns with cardinalities favoring multiples of 3, and (5) small melodic intervals while avoiding unison pitch repetitions. By contrast, given the lack of an association between rotating qualia and triple meters, there is no evidence that nominally rotating passages favor a compound triple hierarchy. In light of this observation, the terms “duple,” “triple,” “quadruple,” and “quintuple” will be reserved in the ensuing discussion for situations where stress patterns accord with the conventional metric hierarchy (such as the quadruple “strong-weak-medium-weak” pattern). In most cases, the more neutral terms “binary,” “ternary,” “quaternary,” and “quinary” will be used instead.
Ecological acoustics
Physical models from ecological acoustics may help explain why triplets seem to spin. Ecological acoustics is a branch of “ecological psychology” associated with the work of James Gibson. Gibson (1966, 1979) argued that an analysis of the environment is crucial to explaining behavior. In the case of sound, an ecological, Gibsonian perspective would emphasize that listeners acquire practical, everyday knowledge about sound-producing events in their acoustical environment, rather than about abstract properties such as frequency, duration, and intensity (Fowler, 1990). For example, what are the acoustical properties of sounds that allow a listener to recognize whether the source is made of metal or wood, or whether it was struck, rubbed, or blown? Empirical work in ecological acoustics has indeed established that listeners can infer the length (Carello, Anderson, & Kunkler-Peck, 1998) and geometric width–height ratio (Lakatos, McAdams, & Causse, 1997) of objects that are dropped or struck. Listeners can also infer the hardness of the mallet with which the objects were struck (Freed, 1990) and whether the objects remained intact or broke into several pieces (Warren & Verbrugge, 1984). Similarly, listeners can infer the hand configuration used for clapping and perform above chance in guessing which individual produced a given clapping sound (Repp, 1987). In more musically pertinent studies, it has been argued that string bowing techniques (Halmrast, Guettler, Bader, & Godøy, 2010) and trumpet mouthpiece depth (Poirson, Petiot, & Gilbert, 2005) can also be inferred from the produced sounds.
A paradigmatic example of the ecological approach in the realm of sound is found in Warren and Verbrugge’s (1984) work on “breaking” and “bouncing” sounds. They considered how listeners are able to deduce from sounds whether an object has broken (in response to being dropped) or is still intact (and bouncing). In brief, bouncing follows a predictable exponential temporal pattern of onsets consistent with Newtonian mechanics. Breaking objects, by contrast, separate into component parts that typically differ in shape and mass. In this latter case, the aggregate temporal pattern comprises several overlapping bouncing sequences, one for each fragment, rather than the pattern of a single bouncing object. Experiments showed that listeners are indeed sensitive to these temporal patterns and successfully judge synthesized sounds to be either “breaking” or “bouncing” on the basis of experience with real-world acoustical patterns. The present study asks a similar question regarding rhythmic sounds that might be emitted when either a sound-producing object or the listener is rotating.
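The bouncing/breaking distinction can be illustrated with idealized physics. This Python sketch is an illustration only, not a reconstruction of Warren and Verbrugge’s stimuli. A bouncing object’s flight time is proportional to its take-off speed, and each impact multiplies that speed by a constant coefficient of restitution, so successive inter-onset intervals shrink by a constant ratio. A breaking object is modeled as several such sequences, one per fragment, merged in time. Once the sequences are merged, the constant ratio disappears.

```python
def bounce_onsets(first_interval, restitution, n_impacts, start=0.0):
    """Impact times (s) of an idealized object bouncing on a rigid floor.

    Each flight time is proportional to take-off speed, and each impact scales
    that speed by the coefficient of restitution, so successive inter-onset
    intervals shrink by a constant ratio.
    """
    times = [start]
    interval = first_interval
    for _ in range(n_impacts - 1):
        times.append(times[-1] + interval)
        interval *= restitution
    return times

def breaking_onsets(fragments):
    """Merge independent bouncing sequences, one per fragment:
    each fragment is (first_interval, restitution, n_impacts, start)."""
    return sorted(t for frag in fragments for t in bounce_onsets(*frag))

def interval_ratios(times):
    """Ratios of successive inter-onset intervals."""
    iois = [b - a for a, b in zip(times, times[1:])]
    return [b / a for a, b in zip(iois, iois[1:])]

bouncing = bounce_onsets(0.4, 0.5, 5)
breaking = breaking_onsets([(0.4, 0.5, 4, 0.0), (0.3, 0.6, 4, 0.05), (0.25, 0.4, 3, 0.02)])
print([round(r, 3) for r in interval_ratios(bouncing)])
print([round(r, 3) for r in interval_ratios(breaking)])
```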
An Ecological Theory of Rotating Sounds (ETRoS)
In considering rotating sounds, four circumstances or scenarios may be distinguished (see Figure 3). In the first, a sound source rotates while the listener remains outside of the rotating trajectory. In the second, a sound source rotates around the listener. In the third, the listener moves in a circle with a static sound source outside of the rotating trajectory. In the fourth, the listener moves in a circle around the sound source. For convenience, the initial discussion focuses on the first of these scenarios, and the remaining three are addressed afterwards. Throughout, the listener is treated as a point receiver favoring no particular direction, thus ignoring any effects of having two spatially separated ears. Similarly, the sound source is assumed to radiate isotropically, that is, to emit sound equally in all directions.

Figure 3. Four scenarios involving rotating sounds, distinguished according to the relationship between the sound source and the listener. Theoretically, the relative pattern of loudness (resulting from distance) is invariant, regardless of whether the sound source rotates, scenarios (a) and (b), or the listener rotates, (c) and (d), with the stationary part inside, (b) and (d), or outside, (a) and (c), the circular trajectory.
Figure 3(a) illustrates a sound source moving in a circular pattern with the listener outside of the movement trajectory. Note that the illustration assumes that the listener is facing towards the center of the trajectory and that the circular motion occurs in the horizontal plane. Such a rotating sound source is expected to produce three notable acoustical effects. First, loudness should decrease with distance from the source to the listener. Second, due to the Doppler effect, a receding sound should be slightly lower in pitch, whereas an approaching sound should be slightly higher in pitch. The size of this shift grows with the component of the source’s velocity directed towards or away from the listener. Finally, a truly rotating object will cause changes in perceived location due to the inter-aural time and amplitude differences documented in the sound localization literature (e.g., Wightman & Kistler, 1992).
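The first two effects can be computed directly for a source on a circle. The Python sketch below uses illustrative values: a speed of sound of 343 m/s, a 1 m radius, one revolution per second, and a listener 3 m from the center. Intensity follows the inverse-square law. The received frequency uses the moving-source Doppler formula f' = f·c/(c − v), where v is the source’s velocity component towards the listener.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 °C

def received(angle, radius=1.0, rev_per_s=1.0, listener_x=3.0):
    """Received-frequency ratio and relative intensity for a source on a circle.

    The source moves counter-clockwise on a circle of `radius` (m) centred at the
    origin; the stationary listener is a point receiver at (listener_x, 0).
    """
    omega = 2 * math.pi * rev_per_s
    sx, sy = radius * math.cos(angle), radius * math.sin(angle)
    vx, vy = -radius * omega * math.sin(angle), radius * omega * math.cos(angle)
    dx, dy = listener_x - sx, -sy              # vector from source to listener
    dist = math.hypot(dx, dy)
    v_toward = (vx * dx + vy * dy) / dist      # velocity component towards the listener
    freq_ratio = SPEED_OF_SOUND / (SPEED_OF_SOUND - v_toward)  # moving-source Doppler
    intensity = 1.0 / dist ** 2                # inverse-square law, arbitrary units
    return freq_ratio, intensity

def semitones(ratio):
    """Frequency ratio expressed in equal-tempered semitones."""
    return 12 * math.log2(ratio)

for deg in range(0, 360, 45):
    ratio, level = received(math.radians(deg))
    print(f"{deg:3d} deg  {semitones(ratio):+.3f} st  level {level:.3f}")
```

With these values the source moves at about 6.3 m/s, and the Doppler shift stays within roughly a third of a semitone. Intensity, by contrast, varies by a factor of four (about 6 dB) between the nearest and farthest points.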
If a musician endeavors to create a rotating effect, having the sound source move in a circular pattern sometimes proves impractical since many musical instruments are relatively immobile. With the notable exception of electroacoustic or digitally processed music, inter-aural time and amplitude differences are generally unavailable as cues. Similarly, most musical instruments are designed to produce discrete pitches rather than the continuous frequency shifts needed to emulate the Doppler effect. This renders dynamic level potentially the most important acoustic cue for evoking a sense of rotation in listeners. Accordingly, effects of rotating trajectories on loudness will be considered next.
Since this study was motivated by the apparent association of triplets with rotation, it seems appropriate to focus on intermittent (rhythmic) rather than continuous sounds. In the most basic situation, one might consider a series of isochronous clicks or short tones emitted at fixed phases of the rotation. Figure 4 offers a more detailed analysis of the scenario shown in Figure 3(a). The figure illustrates a hypothetical listener in the presence of an intermittent sound-producing object following a circular trajectory at constant speed. Successive rows of sub-figures illustrate sound onsets occurring at regular intervals that subdivide the rotation into two, three, four, or five parts, respectively. As noted above, loudness will vary systematically with distance to the listener.
The top row in Figure 4 illustrates the situation for a binary pattern where two sound events are emitted for each rotation. The first column illustrates a (rare) perfect phase alignment where one of the sounds is emitted directly in front of the listener. The middle column illustrates an equally rare phase alignment in which the two closest sounds are emitted at equal distance to the listener. The third column illustrates instances of more typically encountered phase relationships. As indicated in the thumbnail bar graph and accompanying rhythmic notation, the binary cases almost exclusively involve a loud/quiet alternation. By analogy, the second, third, and fourth rows of Figure 4 illustrate ternary, quaternary, and quinary patterns in which three, four, or five isochronous sound events are emitted for each rotation.

Figure 4. A listener placed outside the rotating trajectory of a sound-producing object will perceive different patterns of loudness in different instantiations of binary, ternary, quaternary, and quinary cases where two, three, four, or five sounds are emitted per cycle, respectively. Dotted lines connect sound events that form part of the same case (i.e., configuration of sound events), such that the images in the first two columns depict a single case whereas the images in the third column depict a multitude of different cases all stacked upon each other. Unlike the most common quaternary and quinary cases, binary and ternary loudness patterns are consistent with metrical hierarchies used in Western music-making.
In considering the common cases illustrated in the third column of Figure 4, it is worth emphasizing that, in the absence of any already existing metrical context, the listener will tend to hear the loudest event as marking the downbeat (Lerdahl & Jackendoff, 1983; London, 2012). Notably, the recurring strong-weak pattern most commonly present in the binary case (Figure 4) is consistent with duple meters in Western music-making. For the ternary case, either a strong-weak-medium or a strong-medium-weak pattern results. Both of these feature a single strongest event, consistent with common stress patterns in triple music-making. Conversely, in the case of the quaternary pattern, the most common loudness pattern exhibiting a strong-medium-weak-medium pattern contrasts notably with the ubiquitous strong-weak-medium-weak pattern widely regarded by musicians as characteristic of quadruple music-making. Similarly, the most common quinary pattern (strong-medium-weak-weak-medium) deviates from quintuple meters in Western music (most typically, strong-weak-medium-weak-weak or strong-weak-weak-medium-weak). Thus, while music in duple and triple meters exhibits stress patterns consistent with sound sources following circular trajectories, this is typically not the case for music in quadruple and quintuple meters.
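The stress patterns in Figure 4 can be reproduced numerically. This Python sketch assumes a point listener 2 radii from the center of the circle, inverse-square intensity, and onsets at evenly spaced phases. It sweeps the starting phase, rotates each cycle so the loudest onset comes first (as the downbeat), and tallies the resulting rank patterns (1 = loudest). The helper names are hypothetical.

```python
import math
from collections import Counter

def onset_levels(n, phase, listener_dist=2.0, radius=1.0):
    """Inverse-square intensities of n evenly spaced onsets per rotation, in temporal order.

    The source moves on a circle of `radius` centred at the origin; the point
    listener sits at (listener_dist, 0). Onset k occurs at angle phase + 2*pi*k/n.
    """
    levels = []
    for k in range(n):
        angle = phase + 2 * math.pi * k / n
        dx = radius * math.cos(angle) - listener_dist
        dy = radius * math.sin(angle)
        levels.append(1.0 / (dx * dx + dy * dy))
    return levels

def rank_pattern(levels):
    """Rotate the cycle so the loudest onset comes first, then rank (1 = loudest)."""
    start = max(range(len(levels)), key=lambda i: levels[i])
    cycle = levels[start:] + levels[:start]
    order = sorted(range(len(cycle)), key=lambda i: -cycle[i])
    ranks = [0] * len(cycle)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return tuple(ranks)

def pattern_counts(n, steps=1000, **kwargs):
    """Tally rank patterns over evenly spaced starting phases."""
    counts = Counter()
    for s in range(steps):
        phase = 2 * math.pi * (s + 0.5) / steps  # half-step offset avoids exact ties
        counts[rank_pattern(onset_levels(n, phase, **kwargs))] += 1
    return counts

for n in (2, 3, 4, 5):
    print(n, pattern_counts(n).most_common(2))
```

With four onsets per cycle, the event opposite the loudest one is always ranked quietest. The Western strong-weak-medium-weak ordering, in which the third event is second-loudest, therefore never appears. With five onsets, the two middle events are always the two quietest.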
Which trajectory?
Note that any given stress pattern is consistent with multiple possible trajectories. Figure 5 illustrates alternative trajectories that could theoretically produce the same loudness patterns. For example, a binary stress pattern might arise from a figure-of-eight trajectory (Figure 5(a)). Similarly, a ternary stress pattern might arise from a more complicated trajectory (Figure 5(b)). If velocity were also allowed to vary, the number of possible trajectories would increase considerably.

Figure 5. Although sound sources may follow complex trajectories (solid lines), a listener with eyes closed is likely to infer the simplest trajectory consistent with the perceived pattern, in this case, leading to (a) pendular or (b) circular percepts (dashed lines).
So on what basis might a listener favor inferring one trajectory over another? A parallel problem is known from visual perception. For example, in motion-picture film, where apparent motion is induced from a sequence of still images, a rotating wheel may be perceived to move backwards. This happens when the rotation between successive frames falls just short of a whole multiple of the wheel’s rotational symmetry angle (e.g., the angle between identical spokes). While, in reality, the wheel is rotating forwards, low frame rates may result in successive images being visually parsed to favor the simpler trajectory: a smooth backward motion rather than a jumpy forward motion involving multiple rotations between each frame (Watson & Ahumada, 1985). Sampling theory refers to this phenomenon as “aliasing.” Aliasing results from violating the Nyquist criterion, which requires sampling at more than twice the highest frequency to be represented (Nyquist, 1928).
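The wagon-wheel effect can be expressed as a simple wrapping rule. The Python sketch assumes, as a simplification, that the visual system matches each spoke to the nearest spoke position in the next frame. The spoke count, frame rate, and speeds are illustrative.

```python
def apparent_step_deg(rev_per_s, frames_per_s, spokes):
    """Apparent per-frame rotation (degrees) of a wheel with identical spokes.

    Simplifying assumption: each spoke is matched to the nearest spoke position
    in the next frame, so only the true step modulo the spoke spacing is visible.
    Negative values mean apparent backward motion.
    """
    true_step = 360.0 * rev_per_s / frames_per_s
    spacing = 360.0 / spokes
    # Wrap into [-spacing/2, spacing/2): the smallest displacement consistent with the frames.
    return (true_step + spacing / 2) % spacing - spacing / 2

def max_unaliased_rev_per_s(frames_per_s, spokes):
    """Nyquist-style limit: the true step must stay below half the spoke spacing."""
    return frames_per_s / (2 * spokes)

for rev in (1.0, 1.4, 2.9, 3.0):
    print(rev, round(apparent_step_deg(rev, 24, 8), 2))
```

For example, an eight-spoke wheel filmed at 24 frames per second and turning 2.9 times per second advances 43.5° per frame. The rule reports this as a backward step of 1.5°. The Nyquist limit in this case is 24 / (2 × 8) = 1.5 revolutions per second.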
By analogy, the listener depicted in Figure 5(a) most likely infers simple, circular motion over a complex, figure-of-eight trajectory. However, an even simpler candidate trajectory exists: namely, linear back-and-forth, pendular-like motion.
In addition to the cognitive preference for simplicity (Chater & Vitányi, 2003) exemplified here, ecological acoustics predicts that listeners infer the most common real-world movement conforming to the perceived pattern of loudness. Most people encounter linear, back-and-forth movements (e.g., rocking, bouncing, swinging) more often than circular motion which, arguably, is yet more commonly experienced than figure-of-eight trajectories. Thus, environmental exposure and cognitive simplicity preference both speak in favor of pendular and circular interpretations of Figure 5(a) and (b), respectively.
Whereas Figure 4 illustrates expected sound patterns for sources known to follow a circular trajectory, Figure 6 considers how different loudness patterns are likely to be perceived by listeners. A truly rotating sound source will not necessarily result in the perception of a rotating source. A rotating sound source emitting two sounds per cycle (as in Figure 6(a), for example) would most likely be perceived to result from back-and-forth oscillation—the simplest and most commonly encountered movement consistent with the perceived pattern.

Figure 6. Due to the cognitive preference for simpler and more commonly encountered trajectories, binary patterns easily reduce to pendular motion, as in (a). Whereas this would imply change of direction with no co-occurring sound events for ternary and quinary patterns, (b) and (d), it is more ambiguous whether quaternary patterns, (c), are most parsimoniously represented as spinning or bouncing.
Two possibilities are illustrated for the ternary case in Figure 6(b). Consider first the back-and-forth motion where the loudest event occurs when the pendulum is closest to the listener. As the pendulum moves two-thirds of the way towards the far end-point, a second (weaker) sound is produced. After reversing direction at the far end-point, a third (weak) sound occurs in the identical position.
So why would triplets favor a rotating trajectory rather than the illustrated back-and-forth motion? In the real world, sound onsets are strongly linked with changes of acceleration. For example, a percussive mallet recoils after striking an object. Similar principles apply to sounds produced by bowed or plucked strings and by air movements. Assuming constant speed, a simple back-and-forth trajectory contains only two points of acceleration change: the two end-points where direction reverses. However three equidistant sound events are placed, at most one of them can coincide with an end-point. At least two events therefore occur mid-stroke, with no change of motion to account for them. Therefore, from an ecological perspective, back-and-forth trajectories offer a poorer fit than circular trajectories to ternary sound patterns.² The same logic regarding circular motion as the most plausible trajectory applies to quinary rhythms (Figure 6(d)).
Quaternary rhythms are more ambiguous. While accounting for the two sound events coinciding with end-points, a pendular interpretation would leave another two events unexplained. Consequently, it is unclear whether quaternary rhythms favor rotating or pendular trajectories.
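The counting argument for pendular trajectories can be checked exactly. The Python sketch assumes constant-speed back-and-forth motion whose direction reverses at cycle phases 0 and 1/2. It places n equally spaced onsets at every phase that could align one of them with a reversal. It then reports the best achievable number of aligned onsets together with the number left unexplained.

```python
from fractions import Fraction

# Constant-speed back-and-forth motion reverses direction at these cycle phases.
TURNING_POINTS = (Fraction(0), Fraction(1, 2))

def aligned_onsets(n, phase):
    """Number of the n equally spaced onsets that coincide with a reversal."""
    onsets = [(phase + Fraction(k, n)) % 1 for k in range(n)]
    return sum(1 for t in onsets if t in TURNING_POINTS)

def best_pendular_fit(n):
    """Best (aligned, unexplained) split over all starting phases.

    Only phases that put some onset on a turning point can beat zero,
    so it suffices to try those candidates.
    """
    candidates = {(tp - Fraction(k, n)) % 1 for tp in TURNING_POINTS for k in range(n)}
    aligned = max(aligned_onsets(n, p) for p in candidates)
    return aligned, n - aligned

for n in (2, 3, 4, 5):
    print(n, best_pendular_fit(n))
```

Binary patterns fit perfectly (2 aligned, 0 unexplained). Ternary and quaternary patterns each leave two onsets unexplained, and quinary patterns leave four, in line with the argument above.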
Orientation and position
Up until now, discussion has been limited to movement in a horizontal plane with the listener positioned outside the trajectory facing the sound source (Figure 3(a)). Alternative orientations and positions such as those suggested by Figure 3(b) to (d) will now be addressed.
First, consider the situation where a stationary listener is positioned inside the movement trajectory of the sound source (Figure 3(b)). In the unlikely case where the listener occupies the center of the circular trajectory, all sounds will be equally loud. As the head moves away from the center, dynamic patterns become increasingly differentiated, reaching maximum contrast when head position coincides with a point on the circle. Moving outside the circular trajectory, dynamic contrast declines again, asymptotically towards zero differentiation at infinite distance. Note that head position only affects the relative contrast of the pattern, not the pattern itself. In short, the analysis offered earlier applies equally well to scenarios where the listener is inside or outside the sound source trajectory.
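The claim that listener position changes only the contrast, not the order, of loudness can be checked numerically. This Python sketch again assumes inverse-square intensity, with a point listener on a line through the center of a unit circle. The loudness order of four onsets stays the same for every listener distance greater than zero. The level difference between the loudest and quietest onsets grows as the listener approaches the circle and falls off beyond it. Because only the relative positions of source and listener enter the distance calculation, the same numbers apply when the listener, rather than the source, moves.

```python
import math

def levels_at(n, phase, listener_dist, radius=1.0):
    """Inverse-square intensities of n evenly spaced onsets per rotation.

    Source on a circle of `radius` centred at the origin; point listener at
    (listener_dist, 0), inside (< radius) or outside (> radius) the circle.
    """
    levels = []
    for k in range(n):
        a = phase + 2 * math.pi * k / n
        dist_sq = (radius * math.cos(a) - listener_dist) ** 2 + (radius * math.sin(a)) ** 2
        levels.append(1.0 / dist_sq)
    return levels

def loudness_order(levels):
    """Onset indices from loudest to quietest."""
    return sorted(range(len(levels)), key=lambda i: -levels[i])

def contrast_db(levels):
    """Level difference between the loudest and quietest onsets, in decibels."""
    return 10 * math.log10(max(levels) / min(levels))

for d in (0.0, 0.25, 0.5, 0.9, 1.0, 1.1, 2.0, 10.0):
    lv = levels_at(4, 0.3, d)
    print(f"d = {d:5.2f}  order {loudness_order(lv)}  contrast {contrast_db(lv):6.2f} dB")
```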
Next, suppose that a listener is moving in a circle either around a stationary sound source (Figure 3(d)) or next to it (Figure 3(c)). Again, in the unlikely scenarios where the sound source occupies the center of the circle or is infinitely far away from it, all sounds will exhibit the same loudness with no differentiated pattern emerging. In all other cases, apart from rare phase alignments, binary, ternary, quaternary, and quinary patterns will contain one sound event that is louder than the others, as outlined in the ETRoS model. Finally, regarding the plane of movement, any tilt (including a fully vertical plane) will similarly maintain the dynamic patterns described above.
In conclusion, although first exemplified with a horizontally moving sound source with a stationary point-source listener outside the trajectory, ETRoS generalizes to other scenarios of circular relative motion, regardless of which component is moving and whether its movement circumnavigates the stationary component or not.
If this ecologically-inspired theory is correct, the rotating qualia reported by music commentators should also be perceived by ordinary listeners. Accordingly, two experiments are reported. Experiment A examines listener judgments of spinning for rhythmic patterns with different cardinality (binary, ternary, quaternary, quinary) played at four different tempi. Experiment B manipulates relative intensity patterns as a more direct test of ETRoS.
Experiment A
The first experiment tests the following hypotheses:
Method
Participants
Twenty-six participants (23 females; median age, 20 years; age range, 19–57 years), principally students from the Ohio State University Department of Speech and Hearing Sciences, were recruited. Most participants received course credit for their contribution. Participants ranged from 7 to 41 (median, 23.5) on the musical training subscale from the Goldsmiths Musical Sophistication Index (Müllensiefen, Gingras, Musil, & Stewart, 2014). Answering a single question from the Ollen Musical Sophistication Index (Ollen, 2006), of the 26 participants, 2 (8%) self-reported as non-musicians, 14 (54%) as music-loving non-musicians, 6 (23%) as amateur musicians, and 4 (15%) as serious amateur musicians. None self-reported as semi-professional or professional musicians.
Stimuli
To determine which stimuli would be most suitable for testing the two hypotheses, a series of pilot experiments was conducted with different pitch patterns. To induce the intended groupings of 2, 3, 4, or 5 notes, some pilot stimuli made use of pitch height (Figure 7(a)), others of contour changes (Figure 7(b)), and others again comprised repeated unison pitches where rhythmic structure was imposed solely by increasing the intensity of the first member of each grouping (Figure 7(c)). Pilot participants were lab members and colleagues of the experimenters who reported being unfamiliar with the experimental hypotheses. All pilot experiments were conducted online using headphones, with participants asked to set their volume to a “comfortable level where you can easily hear the sound.” Sound stimuli were presented using the Qualtrics presentation software, and responses were provided with a computer mouse on a 7-point scale ranging from “1: Does definitely not sound like it is spinning or rotating” to “7: Definitely sounds like it is spinning or rotating.” In completing the initial pilot experiments, some individuals spontaneously reported having difficulty hearing repeated pitches (Figure 7(c)) as connoting any type of rotation. This was consistent with the analytic observation that only three of the 27 SMT examples employed unisons. Accordingly, further pilot experiments were conducted using repeated isochronous patterns emulating the SMT examples, with constant dynamics and varying pitches in close proximity (Figure 7(a), (b), (d), and (e)). None of the pilot data are included in the data analysis for our main experiments.

Examples of pilot stimuli using (a) pitch height; (b), (d), and (e) contour changes; or (c) dynamic accents to induce the intended groupings of 2, 3, 4, or 5 notes. A range of tempo values was tried, including 100, 150, 200, and 250 quarter notes per minute. Timbres were piano for (a) and (b) and marimba for (c), (d), and (e). For pilot stimuli using consistent note repetitions (i.e., (c)), two MIDI velocities were used: 127 for accented notes and 95 for unaccented notes. Although the final stimuli depicted in Figure 8 were also used as pilot stimuli, none of the pilot data are included in the main analysis reported here.
After several iterations, a stimulus set emerged that appeared to produce consistent results while maintaining pitch contour and tonal implications across different cardinalities. Specifically, D4-C#4 was used for binary, D4-C4-C#4 for ternary, D4-C#4-C4-C#4 for quaternary, and D4-C4-B3-C4-C#4 for quinary patterns. Notably, these patterns are all consistent with Doppler shifts produced by a rotating sound source, whereby pitch is shifted upward as the source approaches and downward as it recedes from the listener. They also follow Thomassen’s (1982, 1983) model of pitch-related accent, whereby perceived accents coincide with changes of pitch contour, with ascending-descending pivots providing greater accent than descending-ascending pivots. Music-compositional practice is known to conform to this model (Huron & Royal, 1996).
Each pattern was generated using a MIDI piano timbre at four different tempi: 200 notes/minute (300 ms tone duration), 300 notes/minute (200 ms), 400 notes/minute (150 ms), and 500 notes/minute (120 ms). Sequences of patterns were 6 seconds in duration and exported as MPEG-1 (“mp3”) files with a sampling rate of 44.1 kHz. To ensure that participants heard the first tone (D4) as the downbeat, a 500 ms fade-out with no fade-in was used. An example of a ternary pattern presented at 400 notes/minute can be found in Audio 1 (Supplemental material).
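The tempo values translate into tone durations by dividing one minute by the event rate. The short Python sketch below reproduces the stated durations and counts how many tone onsets fit into a 6-second sequence. It assumes strictly isochronous onsets starting at time zero and ignores the 500 ms fade-out, so the onset counts are illustrative only; the function names are hypothetical.

```python
import math

MS_PER_MINUTE = 60_000
SEQUENCE_MS = 6_000  # each stimulus sequence lasted 6 seconds


def ioi_ms(notes_per_minute: int) -> float:
    """Inter-onset interval (duration of one tone) in milliseconds."""
    return MS_PER_MINUTE / notes_per_minute


def onsets_in_sequence(notes_per_minute: int, sequence_ms: int = SEQUENCE_MS) -> int:
    """Number of isochronous onsets (starting at t = 0) that begin before the sequence ends."""
    # Onsets fall at k * IOI; those with k * IOI < sequence_ms are counted.
    return math.ceil(sequence_ms / ioi_ms(notes_per_minute))
```

For example, `ioi_ms(400)` returns 150.0 and `onsets_in_sequence(400)` returns 40, so a 6-second sequence at 400 notes/minute contains 40 tone onsets under these assumptions.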
Procedure
Participants were tested individually via headphones in an Industrial Acoustics Corporation sound-attenuated room, with the volume set to a participant-selected comfortable yet audible level. Each participant heard a unique random order of 32 sound files, comprising two copies of each of the four tempo renditions of the binary, ternary, quaternary, and quinary patterns, presented with the Qualtrics software. Participants first viewed three muted computer animations illustrating rotating, bouncing, and figure-of-eight trajectories (see Video 1). Subsequently, participants received the following instructions:
Note that in this experiment, “spinning” and “rotating” are used as two complementary ways of describing a specific way of experiencing musical sounds. Some people may generally find one of these terms more suitable than the other. Also, for a given individual, some sound passages may sound more like they are spinning while others may sound more like they are rotating. Others again will sound as neither spinning nor rotating. Therefore, if you think a sound passage sounds like it is spinning, but not rotating, then you should give it a high rating for spinning/rotating. Similarly, if you think a given sound passage sounds like it is rotating, but not spinning, then you should also give it a high rating for spinning/rotating. On the other hand, if a given sound passage neither sounds like it is spinning nor rotating, then you should give it a low rating on spinning/rotating. Now that you have learned what we mean by “Spinning/Rotating Sounds,” you are ready to begin the listening experiment. You will hear 32 short sound passages. For each passage, your task is to rate how much the passage sounds like it is spinning/rotating.
You provide your answer on a 7-point scale ranging from “1: Does definitely not sound like it is spinning or rotating” to “7: Definitely sounds like it is spinning or rotating.” You can play each sound passage as many times as you like, but please make sure to only play one sound at a time. There are no right and wrong answers so we recommend that you just go with your first intuition.
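The trial structure of Experiment A (4 cardinalities × 4 tempi × 2 copies = 32 trials, in a fresh random order for each participant) can be sketched as follows. The randomization was actually handled within Qualtrics; this Python sketch only illustrates the design, and the function name `trial_order` and the use of a per-participant seed are hypothetical.

```python
import random
from itertools import product

CARDINALITIES = ("binary", "ternary", "quaternary", "quinary")
TEMPI = (200, 300, 400, 500)  # notes per minute
COPIES = 2  # each stimulus type was heard twice (test and retest)


def trial_order(participant_seed: int) -> list[tuple[str, int]]:
    """Return a shuffled list of (cardinality, tempo) trials for one participant."""
    trials = [cond for cond in product(CARDINALITIES, TEMPI) for _ in range(COPIES)]
    rng = random.Random(participant_seed)  # independent generator per participant
    rng.shuffle(trials)
    return trials
```

Each call yields 32 trials in which every one of the 16 stimulus types appears exactly twice; seeding per participant simply makes an individual order reproducible.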
Results
Test–retest reliability
Before data analysis, an exclusion criterion had been set to ensure that results reflected consistent responses. In the movement sciences, test–retest reliability is commonly used to quantify skillful performance, in that highly consistent actions or responses reflect mastery of a given skill (Weir, 2005). To adequately characterize the physical or physiological properties of expert motor skills, one would therefore focus on individuals who can perform the relevant movement patterns consistently every time and exclude individuals whose inconsistent performance would only add noise. Similarly, in the present experiment, to meaningfully characterize the musical properties of rotating sound stimuli, it is appropriate to first focus on listeners who consistently perceive certain stimuli as more evocative of rotating qualia than others. Note that this procedure does not bias results towards responses that are consistent with the experimental hypothesis, but merely towards responses that are less noisy. Thus, results could be reliable, yet inconsistent with the hypothesis. To ensure that results generalize more broadly across individuals, confirmatory analysis on the full dataset was also planned.
It was decided a priori that participants exhibiting test–retest correlations less than +.50 would be excluded. However, if necessary, this reliability threshold would be lowered to ensure that the data of at least 50% of participants were retained for data analysis. In Experiment A, test–retest reliability scores ranged from −.55 to +.97 with a mean of +.46. Following the a priori exclusion criterion, eight participants were excluded, leaving data from 18 participants.
Spinning/rotating ratings
Figure 8 displays average spinning/rotation ratings for the 16 stimulus types. Consistent with our main hypotheses, a 2×4 ANOVA with pattern (ternary vs. non-ternary) and tempo (200 vs. 300 vs. 400 vs. 500 notes/minute) as within-participant factors showed that ratings were generally higher for ternary than non-ternary patterns, F(1, 17) = 19.65, p < .001,

The results from Experiment A show that, compared with non-ternary patterns, ternary patterns evoke a greater sense of spinning/rotating in listeners. This is, however, primarily attributable to low ratings for binary patterns. Spinning/rotating qualia, moreover, increase with musical tempo (spanning from 200 to 500 notes/minute).
Another 2×4 ANOVA conducted on data from all participants confirmed that these results remained robust when participants with low reliability were included. Specifically, significant main effects of pattern, F(1, 25) = 24.19, p < .001,
Notably, when assessing the different types of non-ternary stimuli, somewhat unanticipated responses emerged for quaternary and quinary rhythms (Figure 8). These types of non-ternary stimuli appeared to be perceived as no less spinning or rotating than ternary patterns. This raises the question of whether the main effects of pattern might primarily be driven by lower ratings for binary patterns. To investigate this possibility, a post hoc one-way ANOVA was conducted on data from participants with supra-threshold reliability, explicitly comparing ratings for binary, ternary, quaternary, and quinary patterns. In addition to a significant main effect of pattern, F(1.6, 26.6) = 9.27, p = .002,
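For readers less familiar with within-participant designs, the Python sketch below computes the uncorrected F ratio for a one-way repeated-measures ANOVA (one factor with k levels, n participants) by partitioning sums of squares. It is a textbook illustration, not the authors' analysis script; the non-integer degrees of freedom reported above suggest that a sphericity correction (e.g., Greenhouse–Geisser) was applied, which this sketch omits.

```python
def rm_anova_oneway(data: list[list[float]]) -> tuple[float, int, int]:
    """One-way repeated-measures ANOVA without sphericity correction.

    data[i][j] is participant i's mean rating in condition j
    (e.g., j = binary, ternary, quaternary, quinary).
    Returns (F, df_effect, df_error).
    """
    n = len(data)      # participants
    k = len(data[0])   # conditions
    grand = sum(sum(row) for row in data) / (n * k)

    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]

    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    # Residual = condition-by-participant interaction, the error term here.
    ss_error = ss_total - ss_cond - ss_subj

    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    f = (ss_cond / df_cond) / (ss_error / df_error)
    return f, df_cond, df_error
```

Removing between-participant variance (`ss_subj`) from the error term is what distinguishes this design from a between-groups ANOVA; with four conditions and 18 participants the uncorrected degrees of freedom would be (3, 51).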
Response strategies
In post-experiment interviews, when prompted for explicit response strategies, 11 out of the 26 participants spontaneously reported that they had closed their eyes and internally visualized objects (or themselves) moving along with the sounds. Moreover, four individuals out of the full sample reported that they had entrained with the sounds, for example, by moving their wrists, heads, eyes, or the computer mouse in a circular trajectory. Because two individuals reported using both types of strategy, a total of 13 participants reported using visualization and/or entrainment strategies. Such strategies may have been prompted by the animation videos shown to participants prior to the experiment (Video 1).
To assess whether these strategies were beneficial, Mann–Whitney U tests were conducted. People using visualization strategies generally had higher reliability scores (Median = .782, n = 11) than people who did not (Median = .511, n = 15), U = 38, Z = −2.31, p = .020. Entrainment strategies, on the other hand, were associated with significantly lower reliability scores (Median = .258, n = 4) compared with people not using such strategies (Median = .617, n = 22), U = 16, Z = −1.99, p = .048. Because so few participants used entrainment strategies, however, this latter result should be interpreted with caution.
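The reported Z values can be recovered from U and the two group sizes with the usual normal approximation, where U has mean n1·n2/2 and variance n1·n2·(n1 + n2 + 1)/12. The Python sketch below applies no tie or continuity correction, which is an assumption about how the test was computed; its two-sided p values (about .021 and .047) may therefore differ slightly in the third decimal from those reported.

```python
import math


def mann_whitney_z(u: float, n1: int, n2: int) -> float:
    """Normal-approximation z for a Mann-Whitney U (no tie or continuity correction)."""
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mean_u) / sd_u


def two_sided_p(z: float) -> float:
    """Two-sided p value for a standard normal z."""
    return math.erfc(abs(z) / math.sqrt(2))
```

For the visualization comparison, `mann_whitney_z(38, 11, 15)` gives about −2.31; for the entrainment comparison, `mann_whitney_z(16, 4, 22)` gives about −1.99.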
Even though visualization and entrainment strategies may have influenced the reliability of participants’ responses, two separate 2×4×2 ANOVAs on data from all participants, with pattern (ternary vs. non-ternary) and tempo as within-participant factors and visualization or entrainment strategy as dichotomous between-participant factors, showed no main effects or interactions involving strategy-related factors (for all, p > .05). Thus, while visualizing objects may have led to more consistent responses and entraining with sounds may have led to less consistent responses, neither strategy altered the overall results reported above.
Experiment B
The ETRoS theory suggests that the rotating quality of triplets is not attributable to the ternary pattern itself, but to the conformity of its intensity profile to the behavior of actual rotating sound sources. Consequently, imposing ecologically congruent intensity patterns on other rhythms should render them more rotating-sounding. For example, overriding the classic metric interpretation of quadruple rhythms, “strong-weak-medium-weak”, with “strong-medium-weak-medium”, should render quaternary patterns consistent with an ecological interpretation of a rotating sound-producing or sound-receiving object. This suggests the following hypothesis:
Method
Participants
Experiment B was conducted immediately after Experiment A and employed the same participants.
Stimuli
In light of the strong tempo effects observed in Experiment A, all stimuli in Experiment B employed a constant tempo of 300 notes per minute. One stimulus type (“chromatic”) used the same quaternary pitch contour as in Experiment A (namely, D4-C#4-C4-C#4). Note that even though unison pitch patterns were infrequent in the SMT examples and were generally not perceived as spinning or rotating using neutral intensity manipulations in the pilot experiments, it remains possible that ecological intensity manipulations would be effective even for such patterns. Therefore, a second stimulus type (“repeated”) used groups of four repeated pitches (D4).
As noted, the main manipulation pertained to intensity. Specifically, employing the principle that loudness is inversely proportional to distance, ecological patterns derived from the ETRoS model were pitted against unecological matched patterns (Figure 9).

Example of one of the 18 phase rotations used in Experiment B to create ecological and unecological quaternary stimulus patterns based on the Ecological Theory of Rotating Sounds (ETRoS). (a), (b): Ecological intensity profiles were determined by mapping the Euclidean distance between a listener and the points of sound emission to MIDI velocity values; (c): Unecological “twins” were subsequently created by permuting the ecological MIDI velocity values.
Intensity profiles for ecological stimuli were calculated as follows. Given a phase position representing the loudest event in the cycle, three other phase positions were established by dividing the 360° of the circle into four. For example, a starting phase of 10° entails accompanying events sounding at 100°, −170°, and −80° (Figure 9(a)). Euclidean distances were then calculated between each phase position and the hypothetical listener, assumed to be located at position zero on the circle (Figure 9(b)). Using this distance measure, MIDI key velocities for each note event were interpolated linearly between the maximum MIDI velocity of 127 and a minimum of 31 (deemed to constitute a reasonably quiet though still clearly audible lower bound). This procedure follows from Dannenberg’s (2006) observation that, in the absence of any strictly fixed relationship, a simple square law provides an approximate mapping between MIDI key velocity and peak RMS amplitude. Conveniently, the inverse square law relating distance to amplitude thus cancels out the square-law MIDI mapping. Finally, this procedure was repeated for 18 starting positions, each offset by 5°, spanning the arc from −40° to +45°.
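The distance-to-velocity procedure can be sketched in a few lines of Python. The listener sits at 0° on a unit circle, so the straight-line (chord) distance to a point at angle θ is 2·|sin(θ/2)|. The sketch assumes that velocity 127 corresponds to distance zero and velocity 31 to the largest possible distance (the circle's diameter), with ordinary rounding to whole MIDI values; the text does not state these endpoints or the rounding rule explicitly. Under these assumptions, a starting phase of 10° yields 119, 53, 31, and 65, which closely resembles the example pattern (119, 53, 32, 65) used to illustrate the unecological permutation; the one-step difference in the third value indicates that the authors' exact rounding or interpolation details may differ from this sketch.

```python
import math

V_MAX, V_MIN = 127, 31            # velocity bounds stated in the text
STARTS = list(range(-40, 50, 5))  # 18 starting phases, -40° to +45° in 5° steps


def chord_distance(phase_deg: float, radius: float = 1.0) -> float:
    """Straight-line distance from a point on the circle to a listener at 0° on the same circle."""
    return 2 * radius * abs(math.sin(math.radians(phase_deg) / 2))


def ecological_velocities(start_deg: float, n_events: int = 4, radius: float = 1.0) -> list[int]:
    """MIDI velocities for n equally spaced events, loudest nearest the listener.

    Assumed mapping: velocity falls linearly from V_MAX at distance 0 to
    V_MIN at the largest possible distance (the circle's diameter).
    """
    step = 360 / n_events
    diameter = 2 * radius
    velocities = []
    for i in range(n_events):
        d = chord_distance(start_deg + i * step, radius)
        velocities.append(round(V_MAX - (V_MAX - V_MIN) * d / diameter))
    return velocities
```

Phases beyond 180° (e.g., 190° and 280°) are equivalent to −170° and −80°, and the chord formula handles them without explicit wrapping.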
For each ecological stimulus, an unecological “twin” was created by permuting the four MIDI velocity values (Figure 9(c)). Specifically, the highest MIDI key velocity was retained for the first (i.e., closest) sound event, the second-highest value occupied position 3, and positions 2 and 4 were interchanged. For example, an ecological pattern of MIDI key velocities of 119, 53, 32, 65 (Audio 2 for chromatic and Audio 3 for repeated, Supplemental Material) would be reorganized as 119, 32, 65, 53 (Audio 4 for chromatic and Audio 5 for repeated, Supplemental Material). In this way, 36 stimulus pairs were created: for each of the 18 starting phases, a chromatic and a repeated-pitch version each paired an ecological intensity pattern conforming to a physical model of rotation with an unecological twin conforming to the musically more familiar strong-weak-medium-weak stress pattern. For variety, a marimba sound was used, with slight reverberation to increase realism. All stimulus sequences were 6 seconds in duration, comprising six repetitions of the four-note patterns notated in Figure 9(c). They were prepared in Cubase 7.1.3 (Steinberg Media Technologies GmbH, Hamburg, Germany) and exported as MPEG-1 (“mp3”) files with a sampling rate of 44.1 kHz.
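The permutation rule is compact but admits more than one reading. One reading that reproduces the worked example (119, 53, 32, 65 becoming 119, 32, 65, 53) and yields a strong-weak-medium-weak ordering is: loudest value in position 1, quietest in position 2, second-loudest in position 3, and the remaining value in position 4. The Python sketch below implements that reading; whether it matches the authors' handling of every phase rotation is an assumption, and the function name is hypothetical.

```python
def unecological_twin(velocities: list[int]) -> list[int]:
    """Rearrange a four-note ecological velocity pattern into a strong-weak-medium-weak twin.

    Assumed reading of the permutation rule (it reproduces the worked example):
    loudest value in position 1, quietest in position 2, second-loudest in
    position 3, and the remaining value in position 4.
    """
    if len(velocities) != 4:
        raise ValueError("expected a four-note velocity pattern")
    loudest, second, third, quietest = sorted(velocities, reverse=True)
    return [loudest, quietest, second, third]
```

Because the twin reuses exactly the same four velocities, overall loudness is matched between members of a pair; only the ordering of stresses differs.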
Procedure
Participants completed a two-alternative forced-choice task where they listened to two random permutations of the 36 stimulus pairs. Sound files were presented with the Qualtrics software and played once whenever the participant clicked the relevant on-screen display with a computer mouse. Headphones were used with the volume set to a participant-selected, comfortable, yet audible sound level. The task was self-paced with a voluntary break encouraged halfway through.
Beforehand, participants received the following instructions:
You will now hear 72 pairs of short, repeated sound passages. For each pair, your task is to choose which passage sounds more like it is spinning/rotating. You give your answer by clicking on the frame surrounding the most spinning or rotating sound display. You can play each sound passage as many times as you like, but please make sure to only play one sound at a time. There are no right or wrong answers so we recommend that you just go with your first intuition.
Results
Test–retest reliability
Once again, an a priori test–retest reliability criterion was set to reduce noise in the measurements. Since, however, the data in Experiment B were dichotomous ecological/unecological judgments rather than ratings on a 7-point scale (as in Experiment A), an alternative ratio-based exclusion criterion needed to be adopted. Consequently, it was decided that only participants responding the same way to 60% or more of the repeated paired judgments would be included in analysis. As in Experiment A, this criterion would be loosened as necessary to ensure that no more than half of the data were excluded. Test–retest reliability in Experiment B ranged between 44% and 89% with a mean of 62%. The inclusion criterion was met by 16 of the 26 participants.
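The ratio-based criterion amounts to a simple agreement proportion over repeated pairs. In the Python sketch below, the response labels "eco" and "uneco" and the function names are hypothetical. Assuming 36 repeated pairs per participant, the 60% criterion requires identical answers on at least 22 pairs (22/36 ≈ .61, whereas 21/36 ≈ .58), and the reported range of 44% to 89% corresponds to 16 and 32 of 36 pairs.

```python
def repeat_agreement(first: list[str], second: list[str]) -> float:
    """Proportion of repeated stimulus pairs answered identically on both presentations.

    first[i] and second[i] are the (hypothetically labelled) choices,
    "eco" or "uneco", made for stimulus pair i on each presentation.
    """
    if len(first) != len(second) or not first:
        raise ValueError("need two equal-length, non-empty response lists")
    same = sum(a == b for a, b in zip(first, second))
    return same / len(first)


def meets_criterion(first: list[str], second: list[str], threshold: float = 0.60) -> bool:
    """True if the participant's repeat agreement reaches the inclusion threshold."""
    return repeat_agreement(first, second) >= threshold
```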
Ecological versus unecological patterns
In analyzing the data from Experiment B, the statistical software R (R Core Team, 2014) and the “glmer” function from the lme4 package (Bates, Maechler, Bolker, & Walker, 2015) were used to conduct binary logistic regression with generalized linear mixed-effects modelling by maximum likelihood (Laplace approximation). This made it possible to account for the non-independence of multiple responses from each participant and of the two responses to each repeated stimulus pair. Using this method, an intercept significantly above zero (on the log-odds scale) would imply that participants were more likely than chance to select ecological stimuli over unecological ones as sounding more spinning or rotating. Accordingly, the intercept was the only fixed effect entered into the initial model, while participant ID and stimulus item were entered as random effects. This model was compared with a null model with the same random effects but no intercept.
Indeed, when including 1,152 data points from the 16 participants meeting the inclusion criterion, a significant intercept estimate of 0.79 (SE = 0.17), Z = 4.68, p < .001, was obtained. This model (AIC = 1,408.13) performed significantly better than the null model (AIC = 1,421.74), χ²(1) = 15.61, p < .001. Moreover, a post hoc analysis including 1,872 data points from all 26 participants confirmed this pattern with a significant intercept of 0.62 (SE = 0.14), Z = 4.50, p < .001. This model (AIC = 2,365.72) also outperformed the null model (AIC = 2,380.43), χ²(1) = 16.71, p < .001.
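Two small calculations help interpret these figures. First, a log-odds intercept converts to a probability via the inverse logit, so 0.79 corresponds to roughly a .69 probability of choosing the ecological stimulus (0.62 to roughly .65). In a mixed logistic model this is the probability for a typical participant and item (random effects at zero); population-averaged probabilities are generally somewhat closer to .5. Second, because AIC = deviance + 2k, the likelihood-ratio χ² for a model with one extra parameter equals the AIC difference plus 2, which reproduces the reported values of 15.61 and 16.71. The Python helper names below are illustrative.

```python
import math


def inv_logit(log_odds: float) -> float:
    """Convert a log-odds estimate to a probability."""
    return 1 / (1 + math.exp(-log_odds))


def lrt_from_aic(aic_full: float, aic_null: float, extra_params: int = 1) -> float:
    """Likelihood-ratio chi-square recovered from two AIC values.

    AIC = deviance + 2k, hence
    deviance_null - deviance_full = (AIC_null - AIC_full) + 2 * extra_params.
    """
    return (aic_null - aic_full) + 2 * extra_params


def chi2_df1_p(chi2: float) -> float:
    """Upper-tail p value for a chi-square statistic with 1 degree of freedom."""
    # P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2))
```

For instance, `lrt_from_aic(1408.13, 1421.74)` returns 15.61 (up to floating-point error), and `chi2_df1_p(15.61)` is well below .001.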
As seen in Figure 10, the extent to which ecological stimuli were perceived as more evocative of spinning/rotating than unecological stimuli differed considerably between stimulus pairs making use of repeated versus chromatic pitches. Because ETRoS did not predict that the manipulation of MIDI velocity values would have different effects on these two stimulus types, they were not differentiated in the initial analysis. However, to investigate this unexpected effect, post hoc tests were performed. First, another binary logistic regression model, with stimulus type as an additional fixed effect, was fitted to the data from the 16 participants meeting the inclusion criterion. In this model, stimulus type contributed significantly to the prediction of task responses, with a fixed effect estimate of 1.04 (SE = 0.14), Z = 7.67, p < .001. Second, further regression models fitted separately to the repeated and chromatic data showed that whereas the ecological version was clearly experienced as more spinning/rotating for chromatic stimuli, intercept estimate = 1.34, SE = 0.15, Z = 8.93, p < .001, the difference between ecological and unecological stimuli was not statistically significant for repeated pitches, intercept estimate = 0.33, SE = 0.25, Z = 1.31, p = .189. Similarly, whereas the model for chromatic pitch stimuli outperformed the null model, χ²(1) = 32.83, p < .001, this was not the case for the model for repeated pitch stimuli, χ²(1) = 1.651, p = .199. Thus, whereas the overall results are consistent with ETRoS, post hoc analysis suggests that a statistically significant superiority of ecological over unecological patterns of dynamic accents in evoking the musical qualia of spinning/rotating was only present for stimuli using moving pitch.

The results from Experiment B show that ecological patterns with intensity profiles consistent with circular trajectories are generally deemed to sound more spinning/rotating than unecological patterns reflecting metrical hierarchies typically found in Western music. However, when chromatic and repeated pitch stimuli are analyzed separately, this difference in perceived spinning/rotating qualia between ecological and unecological stimuli is only present when pitch is moving.
Discussion
In examining the musical SMT passages deemed by music theorists to be evocative of rotation, it was observed that nominally spinning passages tend to employ repeated, isochronous pitch patterns using small intervals while avoiding unison pitch repetitions. Tempo was typically fast, and compound meters dividing the beat into three rather than two were prominent. An Ecological Theory of Rotating Sounds (ETRoS) was proposed to account for the apparent association between spinning/rotating and ternary tone sequences, based on patterns of relative loudness as a plausible cue for rotating sounds in the environment. Two experiments provided results that were overall consistent with the theory. Specifically, Experiment A showed that ternary patterns are perceived as more spinning/rotating than non-ternary patterns. Note that any differentiated stress pattern involving isochronous ternary sound sequences will be consistent with a rotating trajectory, including strong-weak-weak, strong-medium-weak, strong-weak-medium, strong-strong-weak, and strong-weak-strong (but not strong-strong-strong or weak-weak-weak). This overall difference between ternary and non-ternary patterns was, however, primarily driven by low spinning/rotating ratings for binary patterns, whereas quaternary and quinary patterns were no less spinning or rotating than ternary patterns. The low ratings for binary patterns may be explained by a presumed cognitive preference for the simplest consistent movement trajectory (see Chater & Vitányi, 2003), which for a two-note figure is back-and-forth (pendular or bouncing) motion rather than rotation, combined with lower environmental exposure to more complex movement trajectories. The fact that these factors sometimes lead to erroneous inferences is consistent with research showing that auditory perception utilizes imperfect heuristics that are serviceable, albeit not always entirely accurate (Aarden, 2003; Huron, 2006; von Hippel & Huron, 2000).
Similarly consistent with ETRoS, Experiment B showed that ecological patterns with rotating intensity profiles were indeed judged by listeners as more spinning or rotating than non-ecological patterns with more culturally familiar intensity profiles.
Before discussing these findings in further detail, it is appropriate to consider possible confounds. Although participants were not explicitly asked to indicate preferences, preferences may nonetheless influence response behavior in selection tasks. Note that increasing preference due to mere exposure (Zajonc, 2001) and processing fluency resulting from familiarity (Reber, Schwarz, & Winkielman, 2004) would favor the more culturally familiar unecological patterns and would therefore predict the opposite pattern from what was observed. Thus, preference effects are unlikely to have confounded the present results.
It may also be argued that our mentioning of four specific musical examples and a single musicological text in our online discussion group message may have biased SMT discussants to report passages in triple or compound meters. Note that because inclusion of musical examples where spinning/rotating qualia were referred to in the work title was not dependent on subjective association, this criticism would only apply to 10 of the analyzed musical examples. The somewhat informal procedure adopted in the present exploratory crowdsourcing study served to increase the likelihood of receiving online responses from professional music theorists, thus maximizing the overall sample size and statistical power of the conducted analysis. Future research could benefit from replicating these findings relating to ecological, real-world musical stimuli in a more controlled lab setting with neutral, non-suggestive wording.
In addition to possible confounds, the empirical results raise a number of interesting questions that are not directly addressed in the current version of ETRoS. First, the positive effect of musical tempo on perceived spinning/rotating requires further discussion. Speculatively, because faster tempi lead to higher arousal (Dillman Carpentier & Potter, 2007), they may give rise to more vivid motion perception, which, in turn, may lead to higher ratings of spinning/rotating. Another possibility is that very slow rotation speeds might weaken the impression of motion, either because their cycle periods exceed short-term memory or because the movement fails to exceed the just-noticeable difference for detecting motion. This latter conjecture accords with empirical evidence that at especially slow tempi, listeners’ ability to predict beat onsets or metric downbeats is compromised (London, 2012). This is thought to arise from a switch from a short-term, beat-based predictor to a duration-based estimate of onsets, and it suggests that the model may not generalize to especially slow tempi. Similarly, it could be objected that at exceptionally fast tempi the required speed would exceed that at which humans (including highly trained ballet dancers) are capable of moving. This would not, however, preclude rotating qualia experienced under these circumstances from having arisen from objects or animals that can move more swiftly than humans (as suggested in Figure 3(a) and (b) as opposed to Figure 3(c) and (d)).
Another ecological interpretation of the tempo effect might argue that rotating sound sources in the environment simply tend to move quickly and/or follow trajectories with a small circumference. This is hard to verify (or falsify, for that matter). However, because tempo is the predominant factor in distinguishing triple from compound meters (Hannon, Snyder, Eerola, & Krumhansl, 2004; London, 2012), the association of spinning with compound but not triple meters in the crowdsourced SMT examples seems strikingly consistent with the experimental tempo effect.
Note that because tempo was manipulated systematically in terms of event rate rather than pattern rate in Experiment A, pattern duration varied between stimulus types employing the same number of notes per minute. For example, the quaternary-200 pattern lasted twice as long as the binary-200 pattern. These different pattern durations could potentially have introduced an unwanted confound. Importantly, however, the remarkably similar spinning/rotating ratings for stimuli using the same number of notes per minute across cardinalities (i.e., ternary, quaternary, and quinary; see Figure 8) suggest that event rate was a better predictor of behavioral responses than pattern rate. Moreover, and very deliberately, no assumptions were made as to which beat rate (i.e., beats per minute) participants would perceive. It is therefore quite possible that some participants perceived 2 beats per 4-note pattern for slow quaternary stimuli. By comparison, due to their prime-number cardinality, ternary and quinary patterns could not have been grouped into composite isochronous beats. The fact that quaternary stimuli received very similar ratings to ternary and quinary stimuli presented at the same event rate suggests that prime-number cardinality cannot in itself account for differences in spinning/rotating qualia ratings.
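The distinction between event rate and pattern rate can be made concrete: the duration of one complete figure is the number of notes times the inter-onset interval. If each figure is read as one revolution (the ETRoS interpretation, assumed here), the stimuli of Experiment A span from 1,500 ms per revolution (quinary at 200 notes/minute, about 0.67 revolutions per second) to 240 ms per revolution (binary at 500 notes/minute, about 4.2 revolutions per second). The Python helper names are illustrative.

```python
MS_PER_MINUTE = 60_000
CARDINALITIES = {"binary": 2, "ternary": 3, "quaternary": 4, "quinary": 5}


def cycle_ms(notes_per_minute: int, n_notes: int) -> float:
    """Duration of one complete figure (read here as one revolution), in ms."""
    return n_notes * MS_PER_MINUTE / notes_per_minute


def cycles_per_second(notes_per_minute: int, n_notes: int) -> float:
    """Pattern rate: complete figures (revolutions) per second."""
    return 1000 / cycle_ms(notes_per_minute, n_notes)
```

For example, `cycle_ms(200, CARDINALITIES["quaternary"])` returns 1200.0, twice the 600.0 ms returned for the binary pattern at the same event rate.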
Second, it remains unclear why unison pitches impede the evocation of rotating qualia, as suggested by the rarity of note repetitions in the SMT examples and the null effect of ecological repeated-pitch patterns in Experiment B. Although this was not tested directly, it seems that spinning/rotating qualia are best evoked by a combination of ecological dynamic patterns and moving pitch sequences. Taken together, the results from the two experiments suggest that spinning or rotating qualia can be evoked simply through the use of ecological pitch (Experiment A) but not simply through the use of ecological dynamics (Experiment B). Ecological dynamics only increased these qualia when pitch was also moving. Note, however, that because all moving-pitch stimuli applied a discrete approximation of ecologically consistent Doppler shifts, it is unknown whether this effect generalizes beyond this particular pitch pattern, and whether ecological congruence between pitch and dynamics or merely pitch movement is the determining factor. Given that motion perception relies on multiple cues (Lutfi & Wang, 1999), further research, for example contrasting Doppler-consistent and Doppler-inconsistent patterns, seems warranted.
Third, why do spinning or rotating sounds seem most likely to be evoked by repeating figures? A main goal of the auditory system is to predict the environmental sources of sensory signals. Location constitutes a key component of this. Inferring circular trajectories is only informative to the extent that such trajectories recur. If they recur, locational predictability increases notably because the range of possible locations is heavily constrained. The resulting increases in the certainty and accuracy of listener predictions are adaptive in their own right. Conversely, without repetition, estimated movement trajectories will be more uncertain and less informative for predicting location and may, therefore, seem less salient to the listener. In ecological terms, repetition in music has been described as a sonic analogue for cyclic processes in the environment (Zbikowski, 2008). Importantly, however, since the current experiments did not systematically test a range of different repeating and non-repeating figures, further research would be appropriate to establish whether repetition is indeed crucial to conveying rotating qualia, and if so, why that might be the case.
Fourth, why are nominally “spinning” passages so strongly disposed to employ isochronous tone sequences? Recall that because we did not conduct an experiment to test this analytical observation, the prominence of isochronous note durations may constitute a cultural phenomenon with no ecological origin. Nevertheless, from an ecological perspective, rotating motion most likely arises from a single, sustained or impulsive force. Maybe because isochronous tones are perceived as implying constant velocity, they better conform to a simpler (and more common) experience of sounds arising from a single force. Such speculations, however, invite future experiments.
Fifth, why do quaternary and quinary patterns appear to evoke spinning as effectively as ternary patterns? Again, these may potentially be cultural associations with no ecological origin. Moreover, since quinary patterns are rare in Western music, there is no standardized way of subdividing quintuplets; unecological strong-weak-medium-weak-weak and strong-weak-weak-medium-weak patterns constitute possible realizations, but so do ecological strong-medium-weak-weak-medium patterns. In any case, as previously noted, quinary patterns are inconsistent with bouncing/pendular motion because no sound events would coincide with the end points of the trajectory (Figure 6(d)).
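Applying the same distance-to-loudness logic as in Experiment B to figures of two to five equally spaced events, with the loudest event placed exactly at the listener (starting phase 0°), yields the ecological stress templates strong-weak, strong-weak-weak, strong-medium-weak-medium, and strong-medium-weak-weak-medium. The Python sketch below derives these labels; the three-level labelling (nearest distance strong, farthest weak, anything in between medium) and the function name are illustrative assumptions.

```python
import math


def ecological_stress(n_events: int, start_deg: float = 0.0) -> list[str]:
    """Label n equally spaced events on a circle by their distance to a listener at 0°.

    Nearest distinct distance -> "strong", farthest -> "weak", and any
    intermediate distance -> "medium" (an assumed three-level scheme).
    """
    step = 360 / n_events
    # Chord distance on a unit circle, rounded so mirror-image positions compare equal.
    dists = [round(2 * abs(math.sin(math.radians(start_deg + i * step) / 2)), 9)
             for i in range(n_events)]
    levels = sorted(set(dists))
    labels = []
    for d in dists:
        if d == levels[0]:
            labels.append("strong")
        elif d == levels[-1]:
            labels.append("weak")
        else:
            labels.append("medium")
    return labels
```

The quaternary result matches the strong-medium-weak-medium profile tested in Experiment B, and the quinary result matches the ecological strong-medium-weak-weak-medium realization mentioned above.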
The spinning character of quaternary patterns seems more puzzling. Unlike ternary and quinary patterns, quaternary patterns indeed seem reconcilable with simple pendular or bouncing back-and-forth movements (Figure 6(c)). Moreover, traditional musical practice provides a stereotypic way of reconceiving quaternary patterns as binary patterns at twice the tempo. Therefore, it seems curious that the quaternary pattern retained some rotating qualia for listeners in Experiment A. Importantly, because no explicit strong-weak-medium-weak differentiation was imposed in Experiment A, it remains unknown whether this manipulation rendered unecological stimuli in Experiment B less evocative of spinning/rotating. Nevertheless, research strongly suggests that Western listeners spontaneously perceive conventional metrical hierarchies when listening to monotone metronome sequences despite the absence of any objective dynamic manipulation (Bååth, 2015). This suggests that the ETRoS model is incomplete and requires incorporation of other pertinent factors.
Sixth, the current study does not address all the implications of the current instantiation of ETRoS. For example, ecological experience with Doppler shifts would predict that continuously changing pitch (glissandi) should enhance spinning in comparison with the discrete pitch changes applied here. In fact, the manipulation of intensity in Experiment B to be consistent or inconsistent with rotating motion could easily be extended to pitch and inter-aural cues. Future research may, moreover, investigate the spinning/rotating potential of hierarchical ternary patterns. Indeed, the finding that meters containing two ternary levels (e.g., 9/8) were no more evocative of rotation than other compound meters (e.g., 12/8) seems consistent with the ecological conjecture that listeners very rarely encounter sound sources following trajectories of nested rotations. However, this was not established experimentally.
Finally, while the current study provides moderate support for various aspects of our ecological theory of rotating sounds, not all possible non-ecological explanations have been definitively refuted. The strongest competing explanation that comes to mind is a conceptual (rather than ecological) mapping between circular trajectories in space and circularity applied to musical parameters. In other words, spatial circularity may be effectively conveyed by (pseudo-)continuous increases and decreases in pitch or dynamics that change direction midway and return to the starting point at regular intervals. Such a metaphorical mapping would indeed account for perceptions of spinning or rotating qualia at exceptionally slow or fast tempi, even though, as argued above, some of these scenarios would entail physical movement that would not realistically be experienced in the environment. However, because the conceptual and ecological theories make coinciding predictions in all the scenarios we can think of, they were not addressed separately here. Future research may be able to design stimuli that cleverly tease apart these interpretations.
In general, the present results have implications for the broader concept of musical motion. Music theorists have long recognized that descriptions of music often resort to motion metaphors (Godøy & Leman, 2010). Ethnomusicologists have observed that “music” as a distinct category does not exist in all cultures; instead, many cultures conceptualize “music and dance” as a single phenomenon (Blacking, 1995, p. 224). This suggests that sound and movement are inextricably linked. While empirical researchers have rightly proposed that embodied cognition may account for the music–motion connection (Clarke, 2005), the specific nature of this relationship often remains obscure. The current study draws attention to ecological acoustics as a potentially informative source for better understanding the relationship between music and movement. In particular, careful modeling of the acoustical consequences of specific movement trajectories may help explain why particular musical patterns evoke particular kinds of movement.
Supplemental material
Supplemental Material, Audio1 for Twirling Triplets: The Qualia of Rotation and Musical Rhythm by Niels Chr. Hansen and David Huron in Music & Science
Supplemental Material, Audio2
Supplemental Material, Audio3
Supplemental Material, Audio4
Supplemental Material, Audio5
Footnotes
Contributorship
NCH and DH conceived and designed the study and jointly wrote the first draft of the manuscript. NCH obtained ethics approval, recruited participants, collected the data, and performed the analysis. Both authors approved the final version of the manuscript.
Author’s Note
Niels Chr. Hansen is also affiliated with The MARCS Institute for Brain, Behaviour, and Development, Western Sydney University, Australia.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical Approval
The research reported here was approved as a sub-study under Protocol “2012B0006 STUDIES IN SYSTEMATIC MUSICOLOGY” by the Behavioral & Social Sciences IRB at Ohio State University.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.
Notes
Peer review
Zohar Eitan, Tel Aviv University, School of Music.
David Greatrex, University of Cambridge, Centre for Music and Science.
Roni Granot, The Hebrew University of Jerusalem, Department of Musicology.
Appendix
References
