Abstract
This study presents an audio stimulus set of 40 drum patterns from Western popular music with empirical measurements of perceived complexity. The audio stimuli are meticulous reconstructions of drum patterns found in commercial recordings; they are based on careful transcriptions (carried out by professional musicians), drum stroke loudness information, and highly precise onset timing measurements. The 40 stimuli are a subset selected from a previously published larger corpus of reconstructed Western popular music drum patterns (Lucerne Groove Research Library). The patterns were selected according to two criteria: a) they only feature the bass drum, snare drum, and one or more cymbals, and b) they plausibly cover the complexity range of the corpus. Perceived stimulus complexity was measured in a listening experiment using a pairwise comparison design with 220 participants (4,400 trials). In each trial, participants were presented with two stimuli, and they stated which of the two sounded more complex to them. The comparison data then served to calculate complexity estimates using the Bradley–Terry probability model. The complexity estimates have an intuitive interpretation: they allow calculation of the probability that one pattern is considered more complex than another pattern in a pairwise comparison. To our knowledge, this is the first set of naturalistic music stimuli with meaningful perceived complexity estimates. The drum pattern stimuli and complexity measurements can be used for listening experiments in music psychology. The stimuli will further allow measures and models of drum pattern complexity to be assessed.
Introduction
The complexity of musical stimuli and its psychological effects on listeners (such as listeners’ aesthetic appreciation or interest in the music, their groove experience, and many other kinds of music-evoked emotions) constitute a wide field of inquiry. Complexity has been shown to affect humans’ appreciation of artifacts such as pictures (Nadal et al., 2010; Osborne & Farley, 1970; Vitz, 1966), narratives (Carney et al., 2014; Stokmans, 2003), music (Chmiel & Schubert, 2019), buildings (Imamoglu, 2000), advertisements (Cox & Cox, 1988), and websites (Mai et al., 2014; Nadkarni & Gupta, 2007; for a general overview, see Mihelač & Povh, 2020; Van Geert & Wagemans, 2020). A popular theory posited by Berlyne (1963, 1971) claims that humans’ preferences form an inverted-U function (also called a Wundt curve, see Wundt, 1874, p. 432) of stimulus complexity. According to this theory, humans prefer percepts of moderate complexity: the percepts need to be complex enough to be interesting, but not so complex as to be confusing. Studies in empirical music aesthetics have been equivocal in their findings: some provided evidence to support the inverted-U hypothesis with respect to musical stimuli (Beauvois, 2007; Heyduk, 1975; North & Hargreaves, 1995, 1996; Steck & Machotka, 1975), whereas others failed to find evidence to support the theory or were inconclusive (Eisenberg & Thompson, 2003; Marin & Leder, 2013; Orr & Ohlsson, 2001, 2005; Russell, 1982; Vitz, 1964).
Besides music appreciation, the inverted-U hypothesis has also become relevant to the study of musical groove, whereby groove is defined as a pleasurable urge to move in response to music (Janata et al., 2012; Senn et al., 2020). Witek et al. (2014) found that popular music drum patterns with intermediate rhythmic complexity evoked a stronger groove experience than drum patterns with low or high rhythmic complexity. This finding was confirmed by several later studies (Cameron et al., 2023; Chmiel & Schubert, 2019; Matthews et al., 2019, 2020; Morillon & Zalta, 2021; Stupacher et al., 2020). A recent theory on the connection between complexity and groove hypothesizes that listeners use repetitive body movement to clarify metric uncertainties triggered by rhythmically complex stimuli. They “fill in the gaps” (Witek, 2017), replacing missing beats in the music with body movements such as foot or hand taps (Spiech et al., 2022; Stupacher et al., 2022). This theory suggests that the groove experience and the associated body movement are essentially an embodied mechanism to clarify the meter underlying a complex musical rhythm.
To our knowledge, the concept of musical complexity has not yet been comprehensively defined in music psychology or in music scholarship. In the present study, we propose to approach musical complexity from a pragmatic and empirical point of view: The approach relies on the availability of one set of naturalistic musical stimuli that cover a wide complexity range and that are associated with reliable complexity measurements. This kind of stimulus set allows one to investigate which stimulus properties correlate with the complexity measures. Thus it contributes to theorizing musical complexity on an empirical basis and potentially to a definition of the concept.
Measuring the complexity of stimuli, however, is not a trivial problem. Past studies on music and complexity used either objective (i.e., stimulus-based) or subjective (i.e., perception-based) methods to assess the complexity of musical stimuli. Objective methods estimate complexity using structural properties of the stimuli themselves. Existing objective measures of rhythmic complexity are based on syncopation (Keith, 1991; Longuet-Higgins & Lee, 1984; Witek et al., 2014, 2015), offbeatness (Gómez et al., 2007), metric complexity (Toussaint, 2002), or weighted note-to-beat distance (Gómez et al., 2005). These approaches link rhythmic complexity to discrepancies between the
In a number of studies, researchers have used approaches from information theory to develop objective measures of rhythmic complexity, such as Shannon entropy, entropy rate, excess entropy, transient information, or Kolmogorov complexity (De Fleurian et al., 2017; Thul & Toussaint, 2008; Witek et al., 2014). These methods represent rhythm as a discrete series of symbols and quantify the information content within this series. If the information content is high (or if redundancy is low), then the rhythm is understood to be complex.
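The information-theoretic idea can be illustrated with a minimal Python sketch. It assumes a hypothetical encoding that is not taken from the cited studies: each rhythm is a one-bar grid of sixteenth notes (1 = onset, 0 = no onset), the symbols are the cyclic inter-onset intervals, and the measure is the Shannon entropy of their distribution. The two example rhythms and the function names are invented for illustration only.

```python
import math
from collections import Counter


def inter_onset_intervals(grid):
    """Return the cyclic inter-onset intervals (in grid steps) of a binary grid."""
    onsets = [i for i, hit in enumerate(grid) if hit]
    if not onsets:
        return []
    n = len(grid)
    # The last onset wraps around to the first onset of the repeated pattern.
    # A single onset yields one interval spanning the whole grid.
    return [
        (onsets[(k + 1) % len(onsets)] - onsets[k]) % n or n
        for k in range(len(onsets))
    ]


def shannon_entropy(symbols):
    """Shannon entropy (in bits) of the empirical symbol distribution."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical rhythms on a 16-step grid (assumption, not from the stimulus set).
steady = [1, 0, 0, 0] * 4                                          # onsets on every beat
syncopated = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0]      # uneven spacing

entropy_steady = shannon_entropy(inter_onset_intervals(steady))
entropy_syncopated = shannon_entropy(inter_onset_intervals(syncopated))
print(entropy_steady, entropy_syncopated)
```

In this toy encoding, the evenly spaced rhythm uses a single interval value and therefore carries no information, whereas the uneven rhythm mixes several interval values and receives a higher entropy. Published measures differ in how they encode the rhythm and which quantity they compute.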
Subjective measures of musical complexity are based on the notion that complexity is not a property of the stimulus itself, but “intrinsically related to how listeners perceive music” (Eerola, 2016, p. 2f.), which in itself is thought to be highly enculturated (Trainor et al., 2012). Pressing (1999) formulated the idea that subjectively perceived rhythmic complexity is linked to the “computational cost” (p. 2), which is the effort that the individual needs to make to relate a concrete rhythm to the underlying meter. With this idea, Pressing anticipated the
From a subjective perspective, the perceiver is the ultimate judge of stimulus complexity. This is also the stance adopted in this study: We assume that stimulus complexity needs to be determined empirically in a listening experiment with human judges, and that a musical stimulus is complex when listeners perceive it as such. Further, we assume that objective models of stimulus complexity based on the properties of the stimuli themselves are most useful if they reliably predict the subjectively measured complexity of the stimuli.
In several studies, researchers measured stimulus complexity subjectively using Likert-type rating scales: Heyduk (1975) let participants rate the complexity of four short piano pieces. Russell (1982) asked participants to judge the complexity of 40 audio excerpts from modern jazz. In North and Hargreaves’ (1995) study, participants rated the complexity of 60 short extracts from popular music tracks. Shmulevich and Povel (2000) let participants judge the complexity of 35 homophonous rhythms, each of them consisting of 9 rhythmic events. Orr and Ohlsson (2005) asked participants to rate the complexity of 40 jazz and 40 bluegrass improvisations. Each of these stimulus sets has properties that limit its usability: The Heyduk (1975) set is very small (four stimuli). The Russell (1982), North and Hargreaves (1995), and Orr and Ohlsson (2005) stimuli differ in instrumentation across stimuli and vary in many other uncontrolled ways. The Shmulevich and Povel (2000) stimuli consist of abstract rhythms that are not played on musical instruments and that are not idiomatic for any specific kind of music. In all of these studies, the stimulus sets were prepared
Clemente et al. (2020) presented a large stimulus set of 200 monophonic melodies that were explicitly designed for repeated use in music psychology. The stimuli vary along several axes that have been found to be important to music perception: balance, contour, symmetry, and complexity. Many stimulus properties are kept invariant across the set (instrumental timbre, monophony, length); others are easily controllable (number of notes, melodic intervals, tessitura, tempo, etc.). The Clemente et al. (2020) stimuli thus have many valuable qualities. Yet, the single-note piano melodies without loudness, timing, or tempo variations are not particularly naturalistic. Consequently, they are not adequate for research contexts in which ecological validity is key.
Witek et al. (2014) presented a set of 50 simplified drum patterns drawn from Western popular music recordings that cover a wide range of degrees of syncopation. These stimuli strike a balance between ecological validity and experimental control. Nevertheless, the simplifications of the drum patterns are substantial: The hi-hat voice is reduced to a simple sequence of eighth notes, thus neutralizing the contribution of the hi-hat (and other cymbals) to the complexity of the originally recorded patterns. The stimuli present all patterns at a tempo of 120 bpm instead of the originally recorded tempi. The stimuli do not implement variations in loudness and/or microtiming/swing. Finally, 13 of the 50 stimuli are not derived from originally recorded popular music but were composed by the researchers to show extreme (high or low) syncopation. All these modifications affect the naturalness of the stimuli. Furthermore, the complexity measures associated with the 50 stimuli are purely objective (
In conclusion, presently there is no stimulus set available that enables the use of naturalistic music to study the effects of complexity on music listeners and that is associated with accurate subjective measures of perceived stimulus complexity. This study has the following goals: (1) it presents a set of 40 idiomatic, naturalistic drum pattern audio stimuli for which there is a spectrum of perceived complexity and that can be used in psychological research and in the modeling of drum pattern complexity; (2) it provides empirical subjective complexity measurements based on a listening experiment; (3) it uses these measurements to assess existing objective measures of drum pattern complexity and formulates a benchmark for future measures.
Methods and Materials
Forming the Stimulus Set
From the corpus of the Lucerne Groove Research Library, 40 popular music drum patterns were selected. This corpus currently consists of 251 audio reconstructions, transcriptions, and metadata relating to drum patterns from Western popular music (rock, funk, rhythm and blues, pop, disco, soul, heavy metal, and rock’n’roll) in common time (4/4 time signature). These patterns were originally recorded between the 1950s and the 2010s. The audio reconstructions of the drum patterns are based on expert transcriptions of eight-bar extracts from the original recordings and were prepared by two professional musicians (second and fourth author). Event timing relies on computer-assisted onset time measurements (precise to a few milliseconds). The loudness of each stroke is based on the perceptual judgment of the two transcribers (for information on the transcription, measurement, and reconstruction process, see Senn et al., 2018, pp. 5–7).
To select the 40 drum patterns for the current study, we applied two criteria: First, we wanted the drum patterns to be more or less homogeneous in terms of their instrumentation. Patterns that included toms or additional percussion instruments were excluded from the selection. The remaining 184 drum patterns were eligible because they only used the bass drum, snare drum, hi-hat, and other cymbals.
Second, we wanted the stimulus set to cover the complexity range of the corpus and to fill this range more or less equidistantly. To achieve this, we calculated the
The 40 patterns were shortened from eight measures to four measures, ending on the first beat of the fifth measure. Since most patterns repeat with periods of a half bar, one bar, or two bars, a duration of four bars is sufficient to present the period at least twice and thus reveal the periodic nature of the patterns. The shortness of the stimuli (between 8.8 and 16.5 s) facilitates their use in a listening experiment.
Stimuli were rendered from MIDI data in
Experimental Design
The goal of the listening experiment was to measure the relative complexity of the 40 drum pattern stimuli as perceived by listeners. The experiment used an incomplete pairwise comparison design: In every trial, two stimuli were presented to the participants, who then selected which stimulus sounded more complex (“winner”) compared to the other (“loser”).
In music psychology, pairwise comparison designs are rarely used; Likert rating is much more frequent. Yet, human participants tend to carry out pairwise comparison tasks with more ease than Likert rating tasks (Clark et al., 2018; Laming, 1984; Phelps et al., 2015). In a comparison task, participants do not need to map their judgment onto an abstract ordinal scale, but they can rank two stimuli in a direct way. Additionally, each trial presents participants with all the information they need to carry out the task. They do not have to remember stimuli from earlier trials to provide a fair judgment. The approach also eliminates well-known biases connected to Likert rating, such as response style bias (which range of a scale participants like to use, see Chen et al., 1995; Oishi et al., 2005) and the reference group effect (the same pattern might appear simple to a musician but complex to a non-musician, see Heine et al., 2002). On the other hand, the comparison method is costly: Many trials are necessary to achieve a good differentiation. This is because every trial only offers ranking information on two stimuli; information pertinent to the size of the perceived complexity difference is lost. Participants only choose a winning pattern; they do not express whether the difference in terms of complexity is small or large.
The experiment was implemented as an incomplete pairwise comparison design because a complete design would have required that participants judged all 780 pairs that can be formed from the 40 drum patterns. The incomplete design was implemented as follows: The 40 stimuli were divided into two subsets. Each subset featured 20 drum patterns that covered the same range on the The first phase of the data collection (November 2020 – January 2021) established the relative complexity of the stimuli The second data collection phase (March 2021) established the relative complexity
The combined dataset collected in phases 1 and 2 allowed for the relative perceived complexity of all 40 stimuli to be estimated using one single probabilistic model (see
Procedure
Participants completed the survey online on the SosciSurvey platform (www.soscisurvey.de). The survey was available in two languages, German and English. Participants gave informed consent and responded to a series of demographic questions (gender, age, country of residence, survey language skills, musical preferences). Participants’ musical expertise was assessed using the
Participants were asked to carry out the experiment in a quiet location using good quality headphones. They listened to one drum pattern (similar to the 40 experimental stimuli, but not part of the set) in order to test their audio equipment and set playback to a fairly loud, but agreeable volume. They were asked not to change the volume settings after this initial adjustment. Participants then carried out 20 experimental comparisons. In each trial, the two stimuli were presented as separate audio players within the browser window. Participants started the playback themselves and were allowed to listen to the stimuli as many times as they liked. They answered the question “Which of the two drum patterns sounds more complex to you?” using radio buttons. No definition of complexity was offered to the participants, because we wanted them to apply their own concept of complexity. Participants were advised to go with their gut feeling if they struggled to make a decision. They could not advance in the survey without selecting a winning pattern in each trial (no ties were allowed).
After the last trial, participants were asked the question: “What makes a drum pattern complex?” They wrote their answers in a free text field but could opt out of answering this question. The data from this text field will be analyzed in a qualitative study in the future. Finally, participants could volunteer to share their e-mail addresses if they wanted to be informed about the outcome of the study and/or be invited to future surveys. This contact information was saved separately from the experimental data to preserve the anonymity of the participants. On average, it took participants 16 min to complete the survey.
Data Collection and Filtering
A total of 260 participants completed phase one of the survey between November 2020 and February 2021. They were recruited through two channels: 84 participants responded to an e-mail invitation that went out to the students and faculty of the Lucerne University of Applied Sciences and Arts (School of Music) and to personal connections of the authors. These participants completed either the German (
In each trial, participants were asked to listen to both drum patterns in their entirety before choosing a winner. Observations (
Phase 2 of the experiment was offered in German only. Invitations to participate were circulated via e-mail among students and faculty of the Lucerne University of Applied Sciences and Arts (School of Music) and personal connections of the authors. Participants received a cafeteria voucher (with the possibility to opt out). Phase 2 yielded complete and valid observations from 11 participants.
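The trial counts of the incomplete design can be cross-checked with simple arithmetic. The following Python sketch uses only figures reported in this article (40 stimuli, two subsets of 20, 20 trials per participant, and 11 comparisons per stimulus pair as stated in the Results); the variable names are illustrative, and the number of valid phase 1 participants is derived here rather than quoted from the text.

```python
from math import comb

N_STIMULI = 40
SUBSET_SIZE = 20
REPEATS_PER_PAIR = 11        # each compared pair was judged 11 times
TRIALS_PER_PARTICIPANT = 20

# A complete design would compare every stimulus with every other stimulus.
complete_pairs = comb(N_STIMULI, 2)

# Phase 1: all pairs within each of the two subsets.
within_subset_pairs = 2 * comb(SUBSET_SIZE, 2)
phase1_trials = within_subset_pairs * REPEATS_PER_PAIR

# Phase 2: each stimulus is paired with one stimulus from the other subset.
cross_subset_pairs = SUBSET_SIZE
phase2_trials = cross_subset_pairs * REPEATS_PER_PAIR

total_trials = phase1_trials + phase2_trials
phase1_participants = phase1_trials // TRIALS_PER_PARTICIPANT
phase2_participants = phase2_trials // TRIALS_PER_PARTICIPANT

# Trials in which any single stimulus takes part.
trials_per_stimulus = (SUBSET_SIZE - 1) * REPEATS_PER_PAIR + REPEATS_PER_PAIR

print(complete_pairs, within_subset_pairs + cross_subset_pairs, total_trials)
```

Under these figures, the incomplete design covers 400 of the 780 possible pairs, and the totals agree with the 4,400 trials and 220 participants stated in the abstract.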
Participants
The data of
A total of 37 participants self-identified as professional musicians, 34 as music students, and 43 as amateur musicians. The remaining participants stated they were either music listeners (104) or not interested in music (2). Participants had a mean
Statistical Analyses
The experimental data were analyzed using the Bradley–Terry probability model, which was developed to analyze the outcomes of pairwise comparison experiments. The model was first studied by Zermelo (1929), then formalized by Bradley and Terry (1952), and later subsumed under the family of generalized linear models, due to its close relationship to the logistic probability model (Agresti, 2007, 2012; Bradley & El-Helbawy, 1976; Perez-Ortiz & Mantiuk, 2017). All statistical analyses were carried out with
Results
Complexity Estimates and Their Interpretation
The experiment yielded 4,400 valid trials. Every stimulus participated in 220 trials: it was compared 11 times against each of the 19 other stimuli in the same subset (phase 1 of the experiment) and another 11 times against one complexity-matched stimulus from the other set (phase 2 of the experiment). A Bradley–Terry model was fitted to the data. The fit was good (

Perceived complexity (Bradley–Terry estimates
The Bradley–Terry coefficients allow the estimation of the probability that one drum pattern wins a pairwise comparison trial against another drum pattern when judged by a random member of the listener population, as represented by the participant sample. The estimated success probability
Inference on the difference between the
The most complex stimulus 40 (“Jelly Belly”) has a very high probability of
Effect of Musical Training on Complexity Judgments
The sample of participants consisted of people with extreme differences in terms of musical training, as measured by the
If so, then music experts’ complexity judgments should agree with each other often, resulting in lopsided distributions of wins between high- and low-complexity stimuli. Conversely, the musical non-experts can be expected to perceive complexity differences less clearly. In this case, we expect the trials to be decided by chance more often, which would result in a more uniform distribution of wins across stimuli.
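One way to compare the distributions of wins between two groups is a chi-square test of homogeneity on a groups-by-stimuli table of win counts. The Python sketch below is only an illustration under that assumption: it is not necessarily the test used in this study, the win counts are invented, and the function name is hypothetical. It computes the Pearson chi-square statistic and its degrees of freedom; a p-value would additionally require the chi-square distribution, which is omitted here.

```python
def chi_square_homogeneity(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts.

    `table` is a list of rows (one row per group, one column per stimulus).
    All row and column totals are assumed to be positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if both groups shared one win distribution.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Invented win counts for four stimuli (assumption, not the study's data).
same_shape = [[5, 10, 20, 25],     # group A: lopsided distribution
              [10, 20, 40, 50]]    # group B: same proportions as group A
different_shape = [[2, 8, 22, 28],   # group A: lopsided distribution
                   [14, 15, 15, 16]]  # group B: nearly uniform distribution

stat_same, dof_same = chi_square_homogeneity(same_shape)
stat_diff, dof_diff = chi_square_homogeneity(different_shape)
print(stat_same, stat_diff, dof_diff)
```

If both groups distribute their wins in the same proportions, the statistic is zero; the more the lopsided and the uniform distributions diverge, the larger it becomes. Wins from pairwise comparisons are not independent draws, so a statistic of this kind should be interpreted with caution.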
Table 2 lists the frequencies of wins for each stimulus, grouped by musical training. The high-training group consists of participants who had an
Number of wins for each stimulus among participants with high (
A
Training Set and Test Set
Future users of the stimulus set may want to divide the stimuli into two subsets of 20 stimuli to define one set as a training set for complexity model development and the other as a test set for model assessment. The variable
Mean, standard deviation, skewness, and excess kurtosis of the
Control Variables
The 40 drum pattern stimuli closely reproduce the rhythm, timing, and dynamics of the drum patterns played in the original recordings. Thanks to the detailed reconstruction, the stimuli are highly idiomatic and allow for the creation of experiments with good ecological validity. However, this implies that the stimuli differ in many ways besides the differences between the rhythmic patterns as represented in the transcriptions (transcriptions.pdf). These additional dimensions may need to be controlled when the stimuli are used in an experimental setting.
The drum pattern data file (drumpatterns.Rda) lists all information that was used to create the audio reconstructions. For every note onset (
The stimuli data file (stimuli.Rda) contains discographic and complexity-related data but also a series of descriptive statistics associated with each of the 40 stimuli. These statistics were derived from the drum pattern file and the audio files: They provide the duration of each stimulus in seconds (
Table 4 presents the results of tests that investigate whether
Pearson correlation coefficients (
Perceived Complexity and Objective Measures of Drum Pattern Complexity
How do the subjective − − − −
The
Table 5 shows that all four measures were strongly and positively correlated with
Pearson correlations of four measures of rhythm complexity with
Figure 2 shows scatterplots of

Scatterplots of
Stimuli and Data Availability
The audio reconstructions of the stimuli (mp3), transcriptions in drumset notation (transcriptions.pdf), and all information used to create the stimuli (drumpatterns.Rda) can be found online at
Discussion
This study presents a set of stimuli consisting of 40 drum patterns drawn from Western popular music. These stimuli replicate the original drum patterns faithfully by using information from detailed transcriptions in drumset notation, measurements of highly precise note onset timing, and subjective assessments of each stroke's loudness, provided by two experienced professional musicians. Drum patterns can be heard alone quite frequently in this kind of repertoire: Many songs start with a solo drum pattern (examples are Michael Jackson's “Billie Jean,” Stevie Wonder's “Superstition,” or Led Zeppelin's “When The Levee Breaks”) or there may be a drum break later in the song (e.g., James Brown's “Funky Drummer” or The Winstons’ “Amen, Brother”). Consequently, drum patterns are likely to evoke a naturalistic listening experience and may be used for experiments that aim at ecological validity. Drum patterns are considered essential elements of groove in Western popular music; hence this set of stimuli will be particularly useful for groove research.
The high similarity of the audio reconstructions with the originally recorded drum patterns comes at the cost of experimental control: The stimuli do not just vary with respect to complexity but show differences in many other dimensions. To account for the effect of these types of variation, a series of control variables has been operationalized that allow experimenters to monitor the effects of nuisance variability. All data that informed stimuli creation have been provided as a dataset (drumpatterns.Rda). This information can be used to develop further control variables or to create new stimuli sets in which specific types of nuisance variability are suppressed.
The listening experiment provided a subjective measure of
We did not find an effect of musical training on the complexity judgments of the participants. Consequently, it is plausible that the participant sample was drawn from a homogeneous population with respect to their ability to discern differences of complexity. This result contributes a nuance to the general findings in music psychology that music training enables superior discrimination across many domains of music perception (Besson et al., 2007; Chartrand et al., 2008; Neuhaus et al., 2006; Penhune, 2019): Musical training does not appear to alter perception of complexity in popular music drum patterns. This counterintuitive result might be due to the ubiquitous presence of popular music in the media of the globalized West: An adult person with intact hearing is likely to have heard songs from this repertoire on many thousands of occasions during their lifetime. So, people both with and without formal music training might implicitly be familiar with the properties of popular music drum patterns by sheer exposure.
It would be interesting to repeat this study's experiment with children, adolescents, and adults to investigate whether these groups show different abilities to distinguish between complex and simple stimuli. Carrying out this experiment with professional drummers/percussionists could potentially also lead to a different result, as this population may not only consider the perceived complexity of the patterns but also the difficulty of learning and playing them. We did not explicitly collect data from the highly proficient drummer population, so this hypothesis cannot be tested on the basis of this study's data.
We did not find significant relationships between eight loudness-, tempo-, or microtiming-related control variables and the
Four objective measures of stimulus complexity were positively correlated with the
Experimenters can use the complexity measurements, the control variables (stimuli.Rda), and the pairwise significance test data (p_matrix.Rda) to choose drum pattern subsets according to their specific requirements. The pairwise comparison data (contests.Rda) is available if researchers want to add new stimuli to the set; in this case it will be necessary to measure the perceived complexity of the new stimuli against each other and in relation to the existing stimuli in the course of a listening experiment and fit a Bradley–Terry model to the combined data.
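To illustrate how a Bradley–Terry model can be fitted to pairwise comparison data, the following Python sketch uses the classic iterative scheme that goes back to Zermelo (1929). It is a minimal illustration, not the analysis code of this study: the input format (a list of winner/loser pairs), the function names, and the example data are assumptions, and the layout of contests.Rda is not reproduced here. The sketch returns log-strengths centred at zero, whereas other software may fix one stimulus as a reference, which shifts all estimates by a constant without changing the win probabilities.

```python
import math
from collections import defaultdict


def fit_bradley_terry(contests, n_iter=500, tol=1e-10):
    """Fit Bradley-Terry log-strengths from (winner, loser) pairs.

    Assumes every item wins and loses at least once and that all items are
    connected through comparisons; otherwise the estimates do not exist.
    """
    wins = defaultdict(float)
    n_pair = defaultdict(float)  # number of comparisons per unordered pair
    items = set()
    for winner, loser in contests:
        wins[winner] += 1
        n_pair[frozenset((winner, loser))] += 1
        items.update((winner, loser))
    items = sorted(items)
    strength = {i: 1.0 for i in items}
    for _ in range(n_iter):
        updated = {}
        for i in items:
            denom = sum(
                n / (strength[i] + strength[j])
                for pair, n in n_pair.items() if i in pair
                for j in pair if j != i
            )
            updated[i] = wins[i] / denom
        # Normalise so that the log-strengths sum to zero.
        log_mean = sum(math.log(v) for v in updated.values()) / len(updated)
        updated = {i: v / math.exp(log_mean) for i, v in updated.items()}
        delta = max(abs(updated[i] - strength[i]) for i in items)
        strength = updated
        if delta < tol:
            break
    return {i: math.log(v) for i, v in strength.items()}


def win_probability(estimate_i, estimate_j):
    """Probability that item i is judged more complex than item j."""
    return 1.0 / (1.0 + math.exp(-(estimate_i - estimate_j)))


# Invented comparison outcomes for three hypothetical patterns A, B, C.
contests = (
    [("A", "B")] * 7 + [("B", "A")] * 3
    + [("B", "C")] * 6 + [("C", "B")] * 4
    + [("A", "C")] * 8 + [("C", "A")] * 2
)
estimates = fit_bradley_terry(contests)
print(estimates, win_probability(estimates["A"], estimates["C"]))
```

When new stimuli are added, the new comparison outcomes would be appended to the existing ones and the model refitted on the combined list, so that all estimates share one scale.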
The stimulus set and measurements have a series of limitations. First, the drum patterns are drawn from commercial recordings of Western popular music; they only cover the complexity range that is idiomatic within this repertoire. Consequently, these stimuli will not be adequate for studies that investigate the effects of extremely high or low complexity. Second, rhythm is the primary source of complexity in these drum pattern stimuli. Other types of complexity (harmonic, melodic, formal, or others) are not varied in this stimulus set. This means that experimental results based on these stimuli will not permit statements on musical complexity in general. Third, we can expect that the perceived complexity of a stimulus decreases with familiarity (Hannon et al., 2012). We did not measure participants’ familiarity with the stimuli, and therefore we cannot assess the relevance of this effect for this study. Fourth, the complexity measurements rely on the perception of participants who predominantly live in Western, educated, industrialized, rich, and democratic (WEIRD) countries (Muthukrishna et al., 2020). Researchers should be aware that experimental results based on these stimuli are primarily valid for the sampled repertoire and population. It may not be warranted to generalize results to different musical repertoires and to other human populations (Apicella et al., 2020; Henrich et al., 2010; Jacoby et al., 2021).
Conclusion
This study (1) presented a set of 40 naturalistic stimuli that vary along a complexity dimension. (2) It provided a perceived complexity measurement that has an intuitive probabilistic interpretation for each of the stimuli. (3) These subjective measurements were then used to test existing objective measures of drum pattern complexity and to formulate a benchmark for future objective measures. The stimuli are more naturalistic than those presented in Clemente et al. (2020) and are associated with a ground truth measurement of complexity that is more accurate than the
Footnotes
Action Editor
Jessica Grahn, Western University, Brain and Mind Institute & Department of Psychology.
Peer Review
Deniz Duman, University of Jyväskylä, Department of Music, Art and Culture Studies.
Maria Witek, University of Birmingham, Department of Music.
Consent to Participate and Publish
Informed consent was obtained from all individual participants included in the study.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethics Statement
This study was performed in line with the principles of the Declaration of Helsinki. Approval was granted by the Ethics Committee of the Lucerne University of Applied Sciences and Arts (October 30, 2020/No. EK-HSLU 005 M20).
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (grant number 100016 192398/1).
