Abstract
The present study approached drumming motion from a three-fold perspective: computational movement feature analysis, perceptual evaluations of movement animations, and a combination of both. We motion-captured a professional drummer performing two rhythms at two tempi with varying combinations of dynamics and amount of movement. Movement feature analysis indicated that the drummer varied their movement such that movement fluidity was higher at the slow tempo than at the fast tempo, in particular for experimental conditions involving more movement and softer dynamics. Movement complexity was highest when playing slowly in the “much movement” condition. Subsequently, a perceptual online experiment was conducted using the drummer's animations with and without audio, asking observers (n = 114) to judge perceived passage of time, expressivity, and tempo. Results indicate that passage of time ratings were related to the movement instructions, with much-movement performances being perceived to pass more quickly than little-movement ones. Furthermore, fast tempo stimuli were perceived to pass more quickly than slow tempo stimuli. Expressivity ratings were also associated with stimulus tempo, with fast tempo stimuli being rated as more expressive than slow tempo stimuli. Moreover, much-movement performances received higher expressivity ratings, reinforcing the link between expressivity and movement. Including the movement features in the statistical model revealed that greater movement complexity and amount of movement were related to higher perceived expressivity. While movement could be well distinguished between the different drumming conditions, the analyses of the other perceptual judgments (passage of time and tempo) indicated less influence of the movement characteristics, suggesting that participants focused on other aspects when making these judgments.
The outcomes suggest systematic links between bodily implementation of drumming and its perception related to time, tempo, and expressivity.
Introduction
Drumming is among the most physical music performance activities, and its movements are directly related to the sounding output. Despite the persistent myth that drummers necessarily play loud when playing fast, the limitations of human motor abilities can indeed constrain expressive performance at different tempi, dynamics, and rhythmic complexities, so that some combinations of physical and musical parameters may be more difficult to execute than others. This study therefore investigates the interplay of tempo, rhythmic complexity, dynamics, and movement in a percussion context, that is, how a highly experienced drummer performs different combinations of these parameters and how those combinations affect external observers’ perception of the drumming performance.
According to one of the fundamental claims of embodied music cognition, body posture and movements are displays of musical expression and intentions (Leman, 2007). Embodied cognition holds that human cognition and intelligent behavior require goal-directed interaction between mind/brain, sensorimotor capabilities, body, and environment, that is, a coupling of action and perception rather than merely passive perception (e.g., Varela et al., 1991). Musical involvement can then be seen as connecting music making and music perceiving through sound as well as through movement, which in turn reflects, imitates, and helps parse and understand musical structure and its elements (Leman, 2007). The goal of a musical performance is to convey both musical structure and intended expression, which happens not only through the sound of the instrument but also through movements of the body. Some of these movements are needed to actually play (e.g., pressing a key or moving a bow), but there are also so-called ancillary movements that are not directly related to creating sound (e.g., Nusseck et al., 2018; Wanderley, 2002). These are of a supportive, communicative nature: they carry, for instance, expressive meaning, maintain the musical and expressive flow of the performance, or are used to interact with other players or the audience. Jensenius et al. (2010) suggested an even more specific division into sound-producing, sound-facilitating, sound-accompanying, and communicative gestures. When musicians are asked to play with exaggerated expression, they often increase their amount of movement (e.g., Thompson & Luck, 2012; Vines et al., 2006).
Drumming is strongly embodied, as the drum kit is one of the few instruments played with all four limbs, usually in a somewhat independent manner (e.g., on different metrical levels). Consequently, most movements executed by a drummer are directly related to the sounding output. In this respect, Dahl (2011) points out that expert drummers exhibit detailed control over timing and sound production, such that movement amplitude and sound characteristics are closely linked: large movements, for example, generally relate to playing with loud dynamics. Kawakami et al. (2008) showed that drummers used less acceleration when asked to play softer than when playing louder. Dahl (2006) furthermore found that dynamics influenced both how high the sticks were lifted before a stroke (a higher lift resulted in louder playing) and the timing of the stroke. Rhythmic characteristics may change as well; for instance, strokes become more prolonged when playing softly than when playing loudly.
Another aspect affecting drumming performance is tempo. Drummers show higher synchronization ability and accuracy than other instrumentalists and non-musicians, in particular at very slow and very fast tempi (e.g., Dahl, 2006; Krause et al., 2010; Repp, 2010). However, biomechanical constraints prevent playing at excessively fast tempi (Fujii et al., 2011), while perceptual and cognitive time-keeping mechanisms impede playing at very slow tempi (Repp & Doggett, 2007). Buck et al. (2021) found that drummers played less accurately at faster than at slower tempi, with expert drummers performing better than amateurs. Gonzalez-Sanchez et al. (2019) found that differences in movement fluency can also relate to tempo and expertise: faster tempi tended to be performed with smoother movements than slower tempi, and experts generally used smoother strokes than less expert drummers. Tempo is also related to dynamics, in that faster playing tends to be louder: in a drumming study by Danielsen et al. (2015), medium and fast tempo excerpts were played louder than slow tempo ones. The authors suggested that playing faster demands more effort, which in turn might increase the force of the drum stroke, so that it hits harder and sounds louder. In addition, sound amplitude and striking speed of expert drummers were found to increase with faster tempo across a variety of motor tasks (Chen et al., 2016; Fujii et al., 2009; Tsuji & Nishitaka, 2006). However, Dahl (2011) indicated that extreme combinations of tempo and dynamics can be difficult even for experts. Moreover, in a study by Câmara et al. (2020), rhythmically more complex patterns were played louder than simpler patterns, suggesting an additional influence of rhythmic structure on dynamics.
Ultimately, Dahl (2006) pointed out that drummers need to perform any music expressively at any given tempo and dynamics; they need “to be able to master the range of expression needed, given the tempo, in terms of dynamics and sound qualities.” (p. 126). This notion is also reflected in a study by Di Mauro et al. (2018), in which a professional drummer successfully encoded different expressive intentions for different genres, tempi, and rhythmic complexities, and these intentions were subsequently identified by musically trained and untrained participants.
Musicians are able to convey expressive intentions to an audience through both audio and movement (for a comprehensive account of musical expressiveness in various musical traditions, see Fabian et al., 2014). The role of musicians’ expressive body movements has been studied extensively, showing, for instance, that performers such as marimba players exhibit body sway movements that are perceived as particularly expressive (Broughton & Davidson, 2016). Audience members, in turn, may perceive these movement features and decode expressive intentions visually, in addition to the auditory cues. In a perception experiment with professional clarinetists, Nusseck and Wanderley (2009) found that animations with higher movement amplitudes received higher ratings of expressivity/intensity. Moreover, results by Davidson (1995) suggest that visual-only presentations have an even stronger impact on observers’ ratings of expressivity (thus carrying a large amount of expressivity information) than audio-only or audio-visual presentations. In performances of large (Western-style) ensembles, conductors shape expressiveness through their gestures, so that the ensemble performance is perceived as more or less expressive (for an overview, see Morrison et al., 2009). There is some debate as to whether perceived expressiveness is mainly related to large body movements (as measured, e.g., by quantity of movement), since pianists’ overall body movements may be highly idiosyncratic (Camurri et al., 2004), and conductors may convey expressiveness in a variety of ways, including nuances in facial expression (Wöllner, 2008). Taken together, visual information provides additional information on musical parameters such as tempo, timing, and dynamics, and may influence the perceived expressivity of the music.
Characteristics of the performance affect not only audiences’ judgments of expressivity but also their perception of time. Several studies have investigated the effects of musical characteristics on subjective experiences of time (for an overview, see Wang & Wöllner, 2020). Faster music, for instance, leads to longer estimates of the music's duration than slow music (Droit-Volet et al., 2013; Hammerschmidt & Wöllner, 2020). A further highly relevant measure of time perception is “passage of time” (PoT), which involves judging how fast or slow a certain time interval passed subjectively, occasionally in relation to presumed clock time (Droit-Volet & Wearden, 2016). An advantage of the PoT measure (a rating scale from slow to fast) over duration estimation (a time interval judgment in seconds) lies in the individual's awareness that time can be perceived differently depending on stimulus properties or emotional states. In a study involving hip-hop music (Wöllner & Hammerschmidt, 2021), participants experienced time as passing more quickly with increased cognitive load (highest for a dual working-memory task, lowest for listening only). There was no effect of musical arousal between hip-hop pieces that differed in parameters such as RMS energy and event density (notes per second), while tempo was kept constant across all musical examples. These results suggest that subjective time experiences may vary with musical listening conditions, in addition to the effects of tempo and other musical parameters found in previous research (Hammerschmidt et al., 2021; Wang & Wöllner, 2020).
Aims
The current study investigates the interplay of musical characteristics such as tempo and rhythmic complexity with performance instructions relating to dynamics (loud, medium, and soft playing) and amount of movement (much, normal, and little) in performances by a professionally trained drummer. The impact of the performer's actions on the perception of the performance by independent participants is also investigated, scrutinizing the link between musical movements under different performance conditions and experiences of expressivity, time, and tempo. Given the strong relationship between movement and sounding outcome, and given that drummers are usually responsible for keeping and providing time in musical ensembles, drumming appears particularly suitable for studying such links between temporal, musical, and especially expressive perception of musical performances in the visual and audio-visual modalities.
The study therefore consisted of a motion capture part and a perceptual part. In the motion capture part, a professionally trained drummer performed different combinations of tempo, rhythmic complexity, dynamics, and movement. The resulting motion capture animations were subsequently rated by external observers in a perceptual experiment with respect to perceived passage of time, expressivity, and tempo, three dimensions relevant to the perception of musical performances. Based on previous literature, we formulated the following predictions regarding the performance characteristics (tempo, rhythmic complexity, loudness, movement), their consequences for the drummer's movement, and the perception of the performances.
H1) We predict that performance tempo is positively associated with the amount and smoothness of movement, perceived expressivity, and perceived passage of time (PoT) (cf. Dahl, 2006; Danielsen et al., 2015; Di Mauro et al., 2018; Droit-Volet et al., 2013; Hammerschmidt & Wöllner, 2020).
H2) We predict that rhythmic complexity is positively associated with the amount, jerk, and complexity of movement as well as perceived expressivity. Higher complexity is further predicted to result in slower PoT (cf. Câmara et al., 2020; Wöllner & Hammerschmidt, 2021).
H3) We predict that performance dynamics (i.e., louder playing) is positively associated with the amount, jerk, and complexity of movement as well as perceived expressivity and tempo (cf. Câmara et al., 2020; Dahl, 2011; Danielsen et al., 2015; Kawakami et al., 2008).
H4) We predict that the movement instruction (i.e., using more movement) is positively associated with the amount, smoothness, and complexity of movement as well as perceived expressivity and tempo (cf. Nusseck & Wanderley, 2009).
Materials and Methods
Participants
A total of 114 participants (mean age: 27.25 years, SD: 8.24, range: 17–75; female: 59, male: 52, other: 3) completed the perceptual online study, which presented one professional drummer under different performance instruction conditions. The study was administered via the platform SoSci Survey (Leiner, 2019). Recruitment was conducted via email lists, as well as social media and survey exchange websites (www.surveyswap.com and www.surveycircle.com). All participants provided informed consent prior to taking part. A lottery of four vouchers, each worth €15, was conducted after the data collection.
Ninety-five participants had a Bachelor’s degree or higher, indicating a relatively highly educated sample. To assess participants’ musical experience, we used the Goldsmiths Musical Sophistication Index (Müllensiefen et al., 2014). Regarding “Active engagement with music” (Factor 1), 53 participants indicated low engagement, 41 medium engagement, and 20 high engagement (based on the percentiles, divided into thirds, provided in Müllensiefen et al., 2013); regarding “Perceptual abilities” (Factor 2), 52 participants fell in the lowest third of the range, 38 in the middle third, and 24 in the highest third; with respect to “Musical training” (Factor 3), 33 participants identified themselves as little musically trained, 54 as intermediately trained, and 27 as highly trained. Eleven participants indicated drums/percussion as their main instrument. Furthermore, most participants reported low physiological arousal (see Kallen, 2002; possible range: 7–63; sample mean: 18.73, SD: 9.92) and a neutral to positive mood (based on the Positive and Negative Affect Schedule, PANAS-SF, Watson et al., 1988; possible range: 10–50; mean of positive terms: 27.27, SD: 7.51; mean of negative terms: 14.61, SD: 5.96).
Stimuli
The stimuli comprised visual-only and audio-visual video clips of 15-s drumming performances by a professional, classically trained drummer (sex: male, age: 27). The drummer performed two rhythms (one simple and one complex) at two tempi (80 and 140 BPM), each combined with a loudness and a movement instruction, on a drum kit consisting of a kick drum (played with the right foot), a snare drum (played with the left stick), and a hi-hat (played with the right stick). The loudness instruction was loud, medium, or soft, and the movement instruction was much (i.e., exaggerated, larger movement than he would usually use), normal (i.e., how he would usually play), or little (i.e., less movement than normal) movement (e.g., ‘play R1 at 80 BPM loud and with little movement’), following the research tradition of, for instance, Davidson (1995), Vines et al. (2006), and others. Example animations can be found at https://tinyurl.com/2p8u7rxe (material will later be published).
All combinations (rhythm, tempo, and the two instructions, resulting in 36 combinations; see Figure 1) were performed in random order, following a prompt on a monitor next to the drum kit that was controlled by a custom-made patch in Max 8 (www.cycling74.com). To provide the drummer with the respective target tempo, metronome click tracks were created using Audacity 2.4.2 (www.audacityteam.org). Each track consisted of four count-in beats to establish the tempo, followed by a 15-s metronome phase, in which the drummer was asked to synchronize as accurately as possible with the beat, and a 15-s continuation phase, in which the drummer was asked to keep the target tempo as accurately as possible without an audible metronome. The drumming was recorded with a set of four Fame drum microphones (one in the kick drum, one attached to the snare, and two overheads, one close to the hi-hat and the other above the middle of the drum kit for the room sound) into Ableton Live 10 on a Windows laptop via a Soundcraft Signature 12 mixing desk, as well as with two Sennheiser MK4 microphones positioned left and right of the drum kit into AudioDesk 4.01 (https://motu.com/products/software/audiodesk) via a Motu 828mk3 sound device. The metronome click track was split and recorded into both applications so that both recordings could be synchronized afterwards. The AudioDesk recording was furthermore used to synchronize the motion capture recordings with the audio recordings via an SMPTE pulse provided by the Motu sound device.
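The factorial design and the click-track structure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' Max 8 patch or Audacity workflow: the factor labels, the seeded shuffle used for the random order, and the function names design and click_track are hypothetical, and the four count-in beats are assumed to be audible clicks at the target tempo.

```python
import itertools
import random

# Hypothetical factor labels mirroring the design described in the text.
RHYTHMS = ["simple", "complex"]
TEMPI_BPM = [80, 140]
DYNAMICS = ["loud", "medium", "soft"]
MOVEMENT = ["much", "normal", "little"]


def design(seed=None):
    """All 2 x 2 x 3 x 3 = 36 condition combinations, shuffled into a random order."""
    trials = list(itertools.product(RHYTHMS, TEMPI_BPM, DYNAMICS, MOVEMENT))
    random.Random(seed).shuffle(trials)
    return trials


def click_track(bpm, count_in_beats=4, metronome_s=15.0):
    """Click onset times (s): count-in beats, then the metronome phase.

    The 15-s continuation phase that follows has no audible clicks,
    so it contributes no onsets here.
    """
    ioi = 60.0 / bpm  # inter-onset interval between beats in seconds
    count_in = [i * ioi for i in range(count_in_beats)]
    start = count_in_beats * ioi
    n_metronome = int(metronome_s * bpm / 60.0)  # beats that fit into the phase
    metronome = [start + i * ioi for i in range(n_metronome)]
    return count_in + metronome


if __name__ == "__main__":
    for rhythm, bpm, dynamics, movement in design(seed=1)[:3]:
        print(f"play {rhythm} rhythm at {bpm} BPM, {dynamics}, {movement} movement")
    print(len(click_track(80)), len(click_track(140)))  # prints: 24 39
```

Seeding the shuffle is a design choice for reproducibility; the text only states that the order was random.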

Stimulus characteristics and performance instructions performed by the professional drummer in the motion capture experiment. (Material will be published later).
The drummer was recorded using a 10-camera Qualisys optical motion capture system (Oqus 7, Qualisys Track Manager 2020.2, www.qualisys.com) at a frame rate of 200 frames per second. Thirty-three markers were attached to the drummer, one to each stick, and five to the drum kit (see Figure 2A). For the animations used in the perceptual experiment and for the movement feature analysis, the marker setup was slightly modified (see Figure 2B) using the Matlab Motion Capture Toolbox (Burger & Toiviainen, 2013). The drummer's markers were reduced to 22 (20 on the body, covering the main moving joints, and 2 on the sticks) by removing redundant markers that had been used to compensate for markers becoming occluded during recording. The number of drum kit markers was increased to 16 to obtain a clearer visualization of the drum kit in the animations.

Visualization of the marker locations during recording (A) and the transformed configuration used in the analysis (blue markers) and the perceptual experiment (B).
Subsequently, all motion capture and audio recordings were trimmed to the playing part using the SMPTE pulse of the Motu sound device, and animations were created using the Motion Capture Toolbox. To produce the audio-visual clips for the perceptual experiment, the audio recorded with the Fame overhead “room” microphone was found most suitable to be added to the animations, as it represented all drumming sounds well and in a balanced way. As this microphone was furthest away from the kick drum, the audio tracks were low-frequency boosted by 12 dB using Audacity (Effect “Bass and Treble”). We further adjusted the dynamics of the recordings, as the raw dynamic differences were too large for the perceptual experiment: the soft stimuli were barely audible and the loud stimuli uncomfortably loud. To this end, we normalized each recording (Effect “Normalize”) to a level between 0 and −10 dB so that its average RMS value (root mean square of the signal amplitude) was close to 0.4, as verified with the Matlab MIRtoolbox (“mirrms”, Lartillot & Toiviainen, 2007). As a result, all audio-visual clips were of similar loudness, although the relative differences were still perceivable (see Figure 3 for an overview of the dynamics processing). Animations were cut to the last 15 s of the continuation phase to provide the observers with the most unrestricted part of the performance. To indicate the start of the clip and focus visual attention on the center of the screen, a 1-s fixation cross was added to the beginning of each video clip. Figure 4 shows the pipeline for collecting and processing the motion capture and audio data.
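The RMS-based level matching can be illustrated with a short sketch. Note that it swaps in a different technique from the one described: instead of Audacity's peak-based “Normalize” effect followed by a check with mirrms, it directly computes the gain that would bring a clip's RMS to the 0.4 target. Samples are assumed to be floats with full scale at 1.0, and the function names are hypothetical.

```python
import math


def rms(samples):
    """Root mean square of a sequence of amplitude samples (full scale = 1.0)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def gain_db_for_target_rms(samples, target_rms=0.4):
    """Gain in dB that would bring the RMS of `samples` to `target_rms`."""
    return 20.0 * math.log10(target_rms / rms(samples))


def apply_gain(samples, gain_db):
    """Scale all samples by a gain given in dB."""
    factor = 10.0 ** (gain_db / 20.0)
    return [s * factor for s in samples]


def peak(samples):
    """Largest absolute sample value; values above 1.0 would clip."""
    return max(abs(s) for s in samples)


if __name__ == "__main__":
    soft_clip = [0.1, -0.1] * 100  # toy "soft" signal with RMS 0.1
    g = gain_db_for_target_rms(soft_clip)
    louder = apply_gain(soft_clip, g)
    print(round(g, 2), round(rms(louder), 3), round(peak(louder), 3))
```

A large positive gain can push peaks beyond full scale, so peak levels should be checked after applying it.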

Dynamics processing of the audio files. Left: raw; middle: after bass boosting; right: after normalization.

Pipeline of the motion capture and audio recording processing. Bold font indicates the experimental variables.
Procedure
In the perceptual online experiment, each participant was asked to rate 18 visual-only and 18 audio-visual animations, that is, half of the entire set of stimuli. This kept the time required to complete the online experiment feasible (25 to 30 min altogether). We chose not to include an audio-only condition because of time constraints and the experiment's focus on movement characteristics (as opposed to modality differences). Prior to the experiment, four groups were created from the total set of stimuli, each consisting of a different subset of 18 visual-only and 18 audio-visual stimuli, with each performance included either in its visual-only or in its audio-visual version. We ensured that each subset (visual-only or audio-visual) contained stimuli representative of all conditions, i.e., performances with both tempi, both rhythms, and both instructions. During the experiment, participants were presented with one of the four groups based on their month of birth (Jan–Mar: group 1 (26 participants); Apr–Jun: group 2 (27 participants); Jul–Sep: group 3 (33 participants); Oct–Dec: group 4 (28 participants)). This birth-month-based selection was used as a simple way to distribute the participants randomly into four similarly sized groups. The 36 stimuli were presented in four blocks of nine, two blocks containing the visual-only clips and the other two the audio-visual clips, in alternating order (either V, AV, V, AV or AV, V, AV, V, depending on the month-of-birth selection). Furthermore, one trial was added to each block, which repeated one clip from the respective other V or AV block to check that participants rated consistently, resulting in 40 trials per participant.
At the start of the online study, participants were asked to provide consent for their participation, given an introduction to the experiment and asked to check their audio and video playback. Next, they provided their demographic information and answered a battery of questions regarding their musical background (Factor 3 “Musical training” plus “best played instrument” from the Goldsmiths Musical Sophistication Index, Müllensiefen et al., 2014) – if they chose “Drums (percussion instruments)” for the instrument they know best, further questions regarding their drumming experience were provided (self-developed, not part of the Gold-MSI). Then, they filled in two short questionnaires regarding their current arousal and mood levels (physiological arousal, Kallen, 2002, and PANAS-SF, Watson et al., 1988) and were given a test trial in which the rating scales and the interface were explained and participants could perform a test round. After this test trial, participants selected their birth month, and the actual experiment started.
Each performance was rated on the following scales (each a 7-point Likert-type scale):
Passage of Time (how quickly time had passed, i.e., it went by relatively fast – as if it “flew” – or relatively slow; scale ranging from slow to fast)
Expressivity (how expressive the performance was perceived to be; scale ranging from not expressive to very expressive)
Tempo (how fast/slow the performance was perceived to be; scale ranging from slow to fast)
After rating all 40 trials, participants were asked about their music engagement and their perception abilities (Factor 1 “Active Engagement” and Factor 2 “Perceptual Abilities” from the Goldsmiths Musical Sophistication Index). The survey was available in English and German (88 participants used the English version, 26 the German one). A visual overview of the flow of the experiment is displayed in Figure 5.

Overview of the perceptual experiment. Light grey monitors indicate information and testing screens, dark grey monitors indicate background questionnaires and materials, and the black monitor indicates the perceptual ratings part.
Data Analysis
Motion Capture Data
Motion capture data analysis was performed to investigate differences between the conditions and instructions. Of the 22 markers in the reduced configuration, the eight markers that move most during playing (elbows, wrists, fingers, sticks) were used for computational feature extraction (see blue markers in Figure 2B). Three higher-level features were extracted from the data using the Motion Capture Toolbox (Burger & Toiviainen, 2013) to describe the movements on a general, overarching level: Movement Fluidity, Movement Complexity, and Amount of Movement. Movement Fluidity describes how smooth or jerky the movement was and was computed from the ratio between velocity and acceleration of the position time series (a high value indicates smoother movement, a low value jerkier movement; see Burger et al., 2013; Mocap Toolbox function mcfluidity using a Butterworth smoothing filter and the default parameters). Movement Complexity refers to how many dimensions/directions the movement covered and is based on the variance explained by the first five components of a principal component analysis (a large proportion of explained variance indicates low complexity, as the first five components suffice to explain the movement, whereas a small proportion indicates more complex movement; see Burger et al., 2013; Mocap Toolbox function mccomplexity). Amount of Movement is based on the cumulative distance that the eight markers travelled during the performance, thus indicating how much movement took place (used as a control feature for the “movement” instruction; Mocap Toolbox function mccumdist). Each feature yielded one value per performance.
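Two of the three features can be sketched in plain Python to make their definitions concrete. This is a simplified illustration, not the Motion Capture Toolbox code: amount_of_movement sums frame-to-frame marker displacements in the spirit of mccumdist, while fluidity_index is a hypothetical stand-in for mcfluidity that divides mean speed by mean acceleration magnitude per marker, without the Butterworth smoothing the toolbox applies. Movement Complexity is omitted because it requires a principal component analysis. Positions are assumed to be given in consistent units and sampled at 200 frames per second.

```python
import math

FPS = 200  # motion capture frame rate reported in the text


def _dist(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def amount_of_movement(frames):
    """Cumulative distance travelled, summed over all markers.

    frames: list of frames; each frame is a list of (x, y, z) marker positions.
    """
    total = 0.0
    for prev, cur in zip(frames, frames[1:]):
        total += sum(_dist(p, c) for p, c in zip(prev, cur))
    return total


def _derivative(series, fps=FPS):
    """Finite-difference time derivative of a list of 3D vectors."""
    return [tuple((c - p) * fps for p, c in zip(prev, cur))
            for prev, cur in zip(series, series[1:])]


def _norm(vec):
    return math.sqrt(sum(v * v for v in vec))


def fluidity_index(frames, fps=FPS):
    """Simplified fluidity: mean speed / mean acceleration magnitude,
    averaged over markers (higher = smoother). NOT the exact mcfluidity formula."""
    ratios = []
    for m in range(len(frames[0])):
        pos = [frame[m] for frame in frames]
        vel = _derivative(pos, fps)
        acc = _derivative(vel, fps)
        mean_speed = sum(_norm(v) for v in vel) / len(vel)
        mean_acc = sum(_norm(a) for a in acc) / len(acc)
        ratios.append(mean_speed / mean_acc if mean_acc > 0 else float("inf"))
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Toy data: one marker oscillating slowly (1 Hz) vs. quickly (4 Hz) for 2 s.
    def oscillation(freq, seconds=2):
        return [[(math.sin(2 * math.pi * freq * i / FPS), 0.0, 0.0)]
                for i in range(int(seconds * FPS))]
    print(fluidity_index(oscillation(1)), fluidity_index(oscillation(4)))
```

For a sinusoid, the ratio approaches 1/(2πf), so in this simplified index slower oscillations come out as more fluid.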
As only one drummer took part, we kept the following analysis descriptive and additionally ran a K-means-type cluster analysis (using R [Version 3.6.2; www.r-project.org] and the function pam from the package “cluster”, which “clusters ‘around medoids’, a more robust version of K-means”, see Mächler et al., 2021) to explore whether the way the drummer implemented the instructions and musical characteristics followed a systematic, visualizable pattern.
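Partitioning around medoids can be sketched as a greedy BUILD step followed by SWAP steps that exchange a medoid for a non-medoid while the total distance of all points to their nearest medoid decreases. The sketch below is a simplified, first-improvement variant written for illustration; it is not the implementation in R's cluster package. Whether the movement features were standardized before clustering is not stated in the text, so any scaling is left to the caller.

```python
import itertools
import math


def _euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _cost(D, medoids):
    """Total distance of every point to its nearest medoid."""
    return sum(min(D[i][m] for m in medoids) for i in range(len(D)))


def pam(points, k):
    """Partitioning around medoids: greedy BUILD, then first-improvement SWAP.

    Returns (medoid indices, cluster label per point).
    """
    n = len(points)
    D = [[_euclid(p, q) for q in points] for p in points]
    # BUILD: add medoids one at a time, each time picking the point that lowers the cost most.
    medoids = []
    for _ in range(k):
        best = min((c for c in range(n) if c not in medoids),
                   key=lambda c: _cost(D, medoids + [c]))
        medoids.append(best)
    # SWAP: exchange a medoid for a non-medoid as long as this strictly lowers the cost.
    improved = True
    while improved:
        improved = False
        current = _cost(D, medoids)
        for mi, h in itertools.product(range(k), range(n)):
            if h in medoids:
                continue
            candidate = medoids[:mi] + [h] + medoids[mi + 1:]
            if _cost(D, candidate) < current - 1e-12:
                medoids, improved = candidate, True
                break
    labels = [min(range(k), key=lambda j: D[i][medoids[j]]) for i in range(n)]
    return medoids, labels


if __name__ == "__main__":
    # Toy 2D feature vectors forming three well-separated groups.
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
    print(pam(pts, 3))
```

Medoids are actual observations, which makes the solution less sensitive to outliers than cluster means in K-means.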
Perceptual Data
To prepare the perceptual data for further analysis, we first checked for outliers (e.g., participants not rating seriously and honestly, given the lack of control in an online experiment; cf. Eerola et al., 2021) by calculating the mean and standard deviation of all ratings per participant. Means ranged between 2.86 and 5.96, with SDs between 0.75 and 1.95. As no implausible distributions (such as a very low mean or variability) were detected, we kept all participants in the analysis. For five participants, a small programming error in the survey caused one stimulus to be presented twice and another not at all. After checking that these participants did not change the overall results, we kept them in the analysis, disregarding the second rating of the doubled stimulus. For the duplicated trials added to check for consistency, intraclass correlations were calculated between the ratings for each pair of identical stimuli. All ICC coefficients [ICC(1)] were significant at p < .001, ranging between .36 and .72, except for the audio-visual trials for Expressivity, which yielded a non-significant result [ICC(1) = .03, p = .41] (Shrout & Fleiss, 1979). Inspection of the cross-tabulated distributions showed that changes were mostly within ±1 scale point, so we concluded that participants rated the consistency trials sufficiently similarly to the original trials, and we removed the consistency trials from further analysis.
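The ICC(1) used for the consistency trials is the one-way random-effects intraclass correlation of Shrout and Fleiss (1979), ICC(1) = (MSB − MSW) / (MSB + (k − 1)·MSW), where MSB and MSW are the between- and within-target mean squares and k is the number of ratings per target. A minimal sketch follows, assuming that each participant contributes one (original, repeat) rating pair for a given consistency trial; the significance test reported above is not reproduced, and the function name icc1 is hypothetical.

```python
def icc1(pairs):
    """ICC(1) after Shrout & Fleiss (1979), one-way random-effects model.

    pairs: list of rating tuples, one tuple per target (here assumed to be
    one participant's original and repeated rating of the same clip).
    """
    n = len(pairs)      # number of targets
    k = len(pairs[0])   # ratings per target
    grand = sum(sum(p) for p in pairs) / (n * k)
    means = [sum(p) / k for p in pairs]
    # Between-target and within-target mean squares.
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((x - m) ** 2 for p, m in zip(pairs, means) for x in p) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


if __name__ == "__main__":
    # Hypothetical 7-point ratings: (original, repeat) per participant.
    ratings = [(4, 5), (2, 2), (6, 5), (3, 4), (5, 5), (1, 2)]
    print(round(icc1(ratings), 3))
```

Identical original and repeated ratings yield an ICC of 1, while disagreement within pairs that is as large as the spread between participants pulls it towards 0 or below.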
Finally, cumulative link mixed models were chosen to account for the ordinal nature of the rating data as well as the sampling method (i.e., participants rating only half of the stimuli). In the first part of the analysis, the ratings were predicted from the stimulus characteristics as fixed factors (dynamics and movement instructions, tempo, rhythmic complexity, and modality, i.e., visual-only or audio-visual animations), with participant and stimulus as random factors.
The second part predicted the ratings from the movement features. Analyses were performed in R using the function “clmm” from the package “ordinal” (Bojesen Christensen, 2019). Post hoc tests evaluating significant main effects and interactions were conducted using the “lsmeans” function from the package “emmeans” (Lenth et al., 2021), with Bonferroni-corrected significance levels for the group comparisons.
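The cumulative link model underlying clmm can be illustrated by how it turns a linear predictor into probabilities for the seven rating categories. With a logit link, P(Y ≤ j) = logistic(θ_j − η), where θ_1 < … < θ_6 are threshold parameters and η combines the fixed effects with the participant and stimulus random effects; to our understanding, this θ_j − η sign convention is the one used by the ordinal package. The thresholds in the usage example are hypothetical values, not estimates from this study.

```python
import math


def logistic(z):
    """Inverse logit link."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probs(thresholds, eta):
    """P(Y = j) for j = 1..J under a cumulative logit model.

    thresholds: increasing cut-points theta_1 < ... < theta_{J-1}.
    eta: linear predictor (fixed effects plus participant and stimulus random effects).
    """
    cum = [logistic(t - eta) for t in thresholds] + [1.0]  # P(Y <= j)
    return [c - p for c, p in zip(cum, [0.0] + cum[:-1])]


def expected_rating(probs):
    """Mean of the rating distribution, categories numbered 1..J."""
    return sum((j + 1) * pj for j, pj in enumerate(probs))


if __name__ == "__main__":
    theta = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]  # hypothetical cut-points for a 7-point scale
    for eta in (0.0, 1.0):
        probs = category_probs(theta, eta)
        print(eta, [round(p, 3) for p in probs], round(expected_rating(probs), 2))
```

A positive coefficient in η shifts probability mass towards higher rating categories without assuming equal spacing between scale points.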
Results
The first part of this section will cover the results regarding the motion capture data to show how the drummer implemented the different stimulus characteristics and performance instructions. In the second part, the results of the perceptual experiment will be presented, giving insights into how observers perceived the drumming performances. The third part will link movement features and perceptual ratings to investigate how the drummer's movements relate to the perceptions indicated by the observers.
Movement
An overview of the distribution of the three movement features across all performances is shown in Figure 6. With respect to Movement Fluidity, the slower tempo performances differ more among themselves than the faster tempo performances, with the much and normal movement instructions being more fluid than the little movement instruction. Furthermore, the loud performances were the least fluid and the soft ones the most fluid. Overall, the faster performances were much more similar with respect to fluidity. Moreover, the variation in rhythmic complexity did not visibly change fluidity. Regarding Movement Complexity, the “slow tempo much movement” conditions resulted in the most complex movements, in particular the loud performances. Both normal and little-movement performances received relatively low complexity values. For the faster tempo performances, complexity values were rather low overall and followed a descending pattern from the “loud much movement” performances to the “soft little movement” ones. Again, rhythmic complexity did not produce a visible difference. As regards the Amount of Movement, both tempi received similar values, in particular for the “much movement” performances, while the normal and little-movement performances showed slightly less cumulative movement at the slower tempo than at the faster tempo. Again, differences between the two rhythmic complexities were marginal.

Descriptive results of movement features, displayed per performance (36 performances including two tempi, two rhythmic complexities, three movement instruction levels, and three dynamics instruction levels).
Subsequently, a K-means cluster analysis on the three movement features was conducted to further explore, visually, the inherent relationships between the instruction and musical variables. Three clusters were chosen on conceptual grounds: three levels of movement and dynamics instructions were used, and the aim was to display this three-fold structure (see Figure 7). When plotted in two dimensions, the three clusters were found to represent the movement instruction (much movement to the right, normal in the middle, and little movement to the left), so that the horizontal axis corresponds to the movement instruction and the vertical axis to the two tempi (slow at the top, fast at the bottom). Performances with the same tempo and instruction combination but different rhythms were located very close to each other, implying that bodily performance features did not differ much with respect to rhythmic complexity. The different dynamics levels, in contrast, were arranged in a c-shaped pattern (in particular the medium and soft performances), possibly suggesting that the movement instruction is more strongly represented than the dynamics instruction. (Note: data-driven methods for determining the optimal number of clusters, such as the Silhouette Method, would yield two clusters, combining the normal and little-movement performances into one cluster.)
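The clustering step can be illustrated with a minimal sketch in Python (standard library only). It assumes that the three movement features are z-scored before clustering and uses Lloyd's k-means with a deterministic farthest-point initialization; the authors' software, preprocessing, and initialization are not stated here, so `zscore_columns`, `kmeans`, and `silhouette` are hypothetical helpers rather than the study's code. The mean silhouette coefficient is included because the note on cluster number contrasts the conceptually chosen k = 3 with a data-driven choice of k = 2. The toy feature values are invented for illustration only.

```python
import math
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each feature column to mean 0 and SD 1 (assumed preprocessing)."""
    cols = list(zip(*rows))
    stats = [(mean(c), pstdev(c) or 1.0) for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with a deterministic farthest-point initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing seed.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [mean(dim) for dim in zip(*members)]
    return labels, centroids


def silhouette(points, labels):
    """Mean silhouette coefficient (needs >= 2 clusters); singleton points score 0."""
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = mean(own)  # mean distance within the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            mean([math.dist(p, q) for j, q in enumerate(points) if labels[j] == c])
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return mean(scores)


# Hypothetical toy features (fluidity, complexity, amount), not the study's data.
toy = [[0.90, 0.80, 1.00], [0.95, 0.85, 1.05],
       [0.50, 0.30, 0.60], [0.55, 0.35, 0.62],
       [0.20, 0.10, 0.20], [0.22, 0.12, 0.25]]
pts = zscore_columns(toy)
for k in (2, 3):
    labels, _ = kmeans(pts, k)
    print(k, labels, round(silhouette(pts, labels), 3))
```

Applied to a 36 × 3 list of feature values, `kmeans` would return one cluster label per performance, and comparing `silhouette` across k mirrors the check described in the note. For real data, a library implementation with several random restarts (e.g., scikit-learn's `KMeans`) would normally be preferred over this single deterministic run.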

K-means cluster solution based on the three movement features.
Taken together, these results show differences between the performance instruction conditions that follow a somewhat systematic pattern. The perception experiment investigated whether these movement differences had an impact on observers’ ratings.
Perception
Cumulative link mixed model (CLMM) analyses were performed to predict the three perceptual rating scales (Passage of Time, Expressivity, and Tempo), with the dynamics and movement instructions, the stimulus characteristics tempo and rhythmic complexity, and modality as fixed factors, and participant and stimulus as random factors. For each perceptual rating scale, ten models were built (see Table 1), in which the fixed factors entered either as main effects only or as interaction terms in varying combinations. To keep the combinations feasible and reasonable given the assumptions and experimental design, interactions were limited to pairings within the two instructions and within the two stimulus characteristics, with modality interacting either with the instructions, with the stimulus characteristics, or with all of them. The random-effects structure was held constant across models: including both random factors outperformed models with only one of the two, so both were included in all models. The model with the smallest AIC (Akaike information criterion) value was chosen as the final model.
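The selection rule amounts to computing AIC = 2k − 2 ln L̂ for each candidate, where k is the number of estimated parameters and L̂ the maximized likelihood, and keeping the candidate with the smallest value. The Python sketch below illustrates only this comparison step; the class and function names are hypothetical, the log-likelihoods and parameter counts are placeholders rather than values from the study, and fitting the CLMMs themselves is left to dedicated software. The comparison is only meaningful because all candidates are fitted to the same ratings with the same random-effects structure.

```python
from dataclasses import dataclass


@dataclass
class FittedModel:
    """Summary of one fitted candidate model (values come from the fitting software)."""
    name: str
    log_likelihood: float  # maximized log-likelihood
    n_params: int          # estimated parameters: fixed effects, thresholds, random-effect SDs


def aic(model: FittedModel) -> float:
    """Akaike information criterion: AIC = 2k - 2 ln(L)."""
    return 2 * model.n_params - 2 * model.log_likelihood


def select_by_aic(models):
    """Return the candidate with the smallest AIC and each model's AIC difference from it."""
    best = min(models, key=aic)
    deltas = {m.name: aic(m) - aic(best) for m in models}
    return best, deltas


# Hypothetical candidates (placeholder numbers, not the study's fits).
candidates = [
    FittedModel("m1", log_likelihood=-1000.0, n_params=10),
    FittedModel("m2", log_likelihood=-995.0, n_params=14),
    FittedModel("m3", log_likelihood=-994.0, n_params=20),
]
best, deltas = select_by_aic(candidates)
print(best.name, deltas)
```

In this hypothetical example, m3 attains the highest log-likelihood, but its six extra parameters cost more than its likelihood gain, so m2 is selected; the AIC differences show how far each rival lies from the chosen model.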
Overview of the model building for each of the three perceptual ratings (“PR”, so either Passage of Time, Expressivity, or Tempo) in either visual-only or audio-visual modality.
“*” indicates an interaction between the factors, “+” indicates no interaction (just added as main factor), fixed factors are displayed in regular font, random factors in italics.
For Passage of Time, model m1 (main effects only) had the smallest AIC and was selected as the final model (see Supp. Table 1 in the supplementary material). It showed significant main effects for movement, dynamics, and tempo (see Table 2); the remaining main effects were non-significant.
Significant model for Passage of Time (random factors Participant: SD = 0.15; Stimulus: SD = 0.00, 114 participants, 36 stimuli).
***p < .001, **p < .01, *p < .05.
Following up the significant main effects of movement, dynamics, and tempo, pairwise comparisons were calculated. The significant main effects and post hoc comparisons are depicted in Figure 8. For movement, much movement differed significantly from little movement (p = .003), with much-movement stimuli perceived as having passed the fastest and little-movement stimuli as having passed the slowest (normal movement in between). For dynamics, none of the pairwise comparisons was significant, so they were not considered further. For tempo, the fast tempo was perceived to pass significantly faster than the slow tempo (p < .001). These results suggest that both the amount of movement and the tempo considerably influence the perceived passage of time, with the more “active” performance instructions (much movement, fast) being related to faster passing time and the more “inactive” performance instructions (little movement, slow) to slower passing time.

Significant main effects of movement and tempo for the Passage of Time judgment. Error bars refer to the standard error. Significance levels of the differences: ***p < .001, **p < .01, *p < .05.
For Expressivity, the final model, m2 (interaction between movement and dynamics; main effects for tempo, rhythm, and modality; see Supp. Table 2), yielded a significant main effect of tempo as well as a significant interaction between movement and dynamics (see Table 3).
Significant model for Expressivity (random factors Participant: SD = 0.00; Stimulus: SD = 0.03, 114 participants, 36 stimuli).
***p < .001, **p < .01, *p < .05.
The post hoc test for tempo revealed that the fast tempo stimuli were perceived as significantly more expressive than the slow tempo ones (p = .007; see Figure 9, left panel). To follow up the significant interaction, pairwise tests between the movement/dynamics levels were conducted (see Figure 9, right panel), comparing the levels of one factor within the same level of the other. Significant differences were found as follows. For loud dynamics, much and little movement differed (p = .003), with the much-movement stimuli perceived as more expressive. For soft dynamics, much and little movement as well as normal and little movement differed (both p < .001), with the much-movement stimuli rated as most expressive, followed by the normal and the little-movement stimuli. For little movement, loud and soft as well as medium and soft dynamics differed (both p < .001), with the loud stimuli perceived as most expressive, followed by the medium and soft stimuli. Overall, the much-movement stimuli were rated as most expressive, relatively independent of the dynamics level (all three dynamics levels rated as high as or higher than the normal and little-movement stimuli). The normal-movement stimuli lie between the much- and little-movement presentations, with little difference between loud and medium dynamics and slightly lower ratings for the soft performances (all three dynamics levels still rated as high as or higher than the little-movement stimuli). Likewise, the little-movement stimuli show little difference between loud and medium dynamics, with a significant decrease for the soft performances.

Significant main effect and interaction for the Expressivity judgment; left panel: tempo, right panel: movement*dynamics. Error bars refer to the standard error. Significance levels of the differences: ***p < .001, **p < .01, *p < .05.
For Tempo, the final model, m2 (interaction between movement and dynamics as well as main effects for tempo, rhythm, and modality; see Supp. Table 3), indicated a significant main effect of tempo (see Table 4).
Significant model for Tempo (random factors Participant: SD = 0.00; Stimulus: SD = 0.00, 114 participants, 36 stimuli).
***p < .001, **p < .01, *p < .05.
Post hoc comparisons following up the main effect revealed that the slower stimuli were perceived as significantly slower than the faster stimuli (p < .001; see Figure 10), indicating that participants reliably distinguished the two tempi. This variable also served as a check that participants attended to the experiment properly.

Significant main effect of tempo for the Tempo judgment. Error bars refer to the standard error. Significance levels of the differences: ***p < .001, **p < .01, *p < .05.
Movement Features and Perceptual Results
To investigate how participants’ ratings could have been shaped by the drummer’s movement characteristics, multiple linear regression analyses were used to test whether the movement features significantly predicted the perceptual ratings (averaged across participants, separately for visual-only and audio-visual presentations).
For the visual-only ratings of Passage of Time, the model was significant (R² = .71, F(3, 32) = 26.20, p < .001). Fluidity (β = −.29, p < .05) and Amount of Movement (β = .98, p < .001) significantly predicted Passage of Time (PoT) (Complexity: β = −.26, n.s.), suggesting that a large amount of movement as well as jerky (less fluid) movement was associated with a faster perceived passage of time. For the audio-visual ratings of Passage of Time, the model was also significant (R² = .71, F(3, 32) = 25.74, p < .001), with Fluidity (β = −.44, p < .001) and Amount of Movement (β = .91, p < .001) significantly predicting PoT (Complexity: β = −.30, n.s.), suggesting a similar influence of movement features in the visual-only and audio-visual conditions.
For the visual-only ratings of Expressivity, the model was significant (R² = .72, F(3, 32) = 27.11, p < .001). Only Amount of Movement (β = .91, p < .001) significantly predicted Expressivity (Fluidity: β = −.09, n.s.; Complexity: β = −.08, n.s.), suggesting that a large amount of movement was perceived as highly expressive. For the audio-visual ratings of Expressivity, the model was also significant (R² = .77, F(3, 32) = 34.85, p < .001), with Amount of Movement (β = .83, p < .001) significantly predicting Expressivity (Fluidity: β = −.12, n.s.; Complexity: β = −.06, n.s.), again suggesting a similar influence of movement features in the visual-only and audio-visual conditions.
For the visual-only ratings of Tempo, the model was significant (R² = .82, F(3, 32) = 49.12, p < .001). Fluidity (β = −.39, p < .001), Complexity (β = −.69, p < .001), and Amount of Movement (β = 1.18, p < .001) all significantly predicted Tempo, suggesting that a large amount of movement as well as jerkier and less complex movement was rated as faster in tempo. For the audio-visual ratings of Tempo, the model was also significant (R² = .69, F(3, 32) = 23.67, p < .001), with Fluidity (β = −.47, p < .001), Complexity (β = −.62, p < .01), and Amount of Movement (β = .95, p < .001) significantly predicting Tempo, again suggesting a similar influence of movement features in the visual-only and audio-visual conditions.
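The statistics reported above can be reproduced in outline with ordinary least squares on z-scored variables: once predictors and outcome are standardized, the slopes are the standardized β weights, and the overall F test has p and n − p − 1 degrees of freedom. With n = 36 performances and p = 3 movement features this gives F(3, 32), consistent with the values reported. The Python sketch below is a minimal, standard-library illustration under these assumptions; the helper names are hypothetical, p-values are omitted (they would require the F distribution), and it is not the authors’ analysis code.

```python
from statistics import mean, pstdev


def zscore(xs):
    """Standardize a list to mean 0 and (population) SD 1; assumes non-constant input."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]


def solve(a, b):
    """Solve the linear system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def standardized_ols(predictors, y):
    """Regress y on predictor columns after z-scoring all variables.

    Returns (betas, r2, f): standardized coefficients, R^2, and the overall
    F statistic with (p, n - p - 1) degrees of freedom.
    """
    cols = [zscore(c) for c in predictors]
    z = zscore(y)
    n, p = len(z), len(cols)
    # Normal equations X'X b = X'y; no intercept is needed because every variable has mean 0.
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, z)) for ci in cols]
    betas = solve(xtx, xty)
    fitted = [sum(b * c[k] for b, c in zip(betas, cols)) for k in range(n)]
    ss_res = sum((zk - fk) ** 2 for zk, fk in zip(z, fitted))
    ss_tot = sum(zk ** 2 for zk in z)  # equals n because z is standardized
    r2 = 1 - ss_res / ss_tot
    f = float("inf") if r2 >= 1 else (r2 / p) / ((1 - r2) / (n - p - 1))
    return betas, r2, f
```

Passing the three feature columns and the 36 averaged ratings of one scale and modality to `standardized_ols` would yield three β weights, R², and F. With a single predictor, the β weight reduces to Pearson’s r, which is a convenient sanity check for such a sketch.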
Discussion
In this study, we investigated drumming movements from both a performance and a perception viewpoint. A professional, classically trained drummer was asked to perform two rhythms of differing complexity at two tempi with three movement and three dynamics levels. Analysis of motion capture data revealed that, depending on tempo and performance instructions, the drummer played differently in terms of fluidity, movement complexity, and amount of movement, and that rhythmic complexity had less of an effect on those characteristics. In a subsequent perceptual experiment, observers perceived the clips differently with respect to passage of time, expressivity, and tempo. Moreover, combining quantitative movement features with the observers’ perception results provided insights into how the perceptual judgments related to movement characteristics of the performances.
The drummer implemented the tasks (movement/dynamics instructions and musical characteristics) through different extents of the three movement features fluidity, complexity, and amount of movement. The slow tempo performances showed much higher and more variable fluidity than the fast tempo performances, in particular for the much and normal movement levels and for medium and soft dynamics. These findings are in line with our predictions (H3 and H4: dynamics and movement instructions) and converge with previous findings by Dahl (e.g., 2018) and Kawakami et al. (2008). However, they do not follow our prediction regarding tempo (H1: higher fluidity at faster tempi), and are not in line with Gonzalez-Sanchez et al. (2019), who found higher fluidity at faster rather than slower tempi. This discrepancy might stem from differences in task and study focus (Gonzalez-Sanchez et al. related their analysis to predicting music sophistication) and in feature computation: we used the ratio between acceleration and velocity, whereas Gonzalez-Sanchez et al. calculated the normalized Fourier magnitude spectrum of each stroke to obtain the “spectral arc length”, which may have classified movements differently. In our case, the performer may have had more time to perform smoother gestures in the slower tempo, especially when asked to use a large amount of movement with softer dynamics. This could have made him play with less force than in the loud trials, following playing techniques described in Dahl (2006; 2018).
Regarding movement complexity, the much-movement performances showed higher complexity, especially in the slow tempo, while normal and little movement in both tempi and at all dynamics levels resulted in relatively similar values. These results are in line with hypothesis 1 (tempo) and hypothesis 4 (movement). The drummer used complex movement covering several dimensions, in particular in the loud much-movement performances in the slow tempo, to implement the given task properly, putting in enough force while possibly also filling the available time. The faster tempo reduced the dimensionality: with less time available to perform the rhythms, the drummer may have reduced his movements to those strictly required. Movement complexity, however, seemed rather unaffected by dynamics (H3). These results confirm previous findings by Buck et al. (2021) and Dahl (2006).
Amount of movement reflected the movement instructions more closely in the faster tempo (a rather “step-wise” decrease) than in the slower tempo. In the slow tempo, the much-movement performances showed slightly less movement for medium than for both loud and soft dynamics, and the normal-movement performances slightly less for loud than for medium and soft dynamics. Overall, the faster tempo showed more movement, in particular for the normal and little movement conditions. These results support H1 and partly H3 and H4. They could indicate that the drummer possibly tried too hard to implement the instructions, resulting in “overshooting”, that is, using more movement in the much-movement performances with extreme (loud and soft) dynamics, as well as in “undershooting” in the loud normal-movement performances, to compensate for the unusual combination of dynamics and movement. The results are in line with movement characteristics found in Kawakami et al. (2008) and Dahl (2011, 2018).
Interestingly, the level of rhythmic difficulty did not produce differences, as the movement feature values were relatively similar for both rhythms, contrary to our prediction (H2). While the rhythms varied in complexity, the main differences arose from the kick drum, which implemented a syncopated pattern. The hi-hat pattern was the same in both rhythms and the snare was only shifted by an eighth note, so the arm movements required to play them would hardly differ. The absence of differences could also be due to the high level of training the drummer of the current study had accumulated (cf., e.g., Buck et al., 2021), which calls for collecting data from less experienced drummers as well as for using more complex rhythms with highly trained drummers.
The cluster solution supports these results and suggests that the drummer was consistent in the way he implemented the given tasks. The clustering followed a strong two-dimensional division into movement (x-axis) and tempo (y-axis), implying that both were strongly represented in the chosen movement features and were relatively easy for the drummer to implement. Nevertheless, the cluster solution also distinguished the different dynamics instructions, suggesting that the drummer was able to manipulate his playing independently to match each instruction. As stated above, generalizing such performance behavior would require extending the data collection to more drummers of different expertise levels in the future.
The perceptual experiment revealed insights into how independent observers perceived the different clips. Regarding the perceived passage of time, movement instruction and tempo showed significant effects, while dynamics, rhythmic complexity, and modality did not. Stimuli containing much movement were perceived to be shorter than stimuli containing little movement (in line with our H4). Additionally, faster stimuli were perceived to be shorter than slower ones (in contrast to H1). Thus, “active” stimuli (in terms of instructions) were perceived to pass more quickly than “inactive” ones, although all stimuli were objectively of the same length. This is partially in line with research by Boltz (2011), who found that more active stimuli (e.g., relating to loudness or ascending pitch in melodies) were judged to be faster. The effect of tempo, on the other hand, contrasts with previous research that found subjective overestimations of time with faster tempi (Droit-Volet et al., 2013; Hammerschmidt & Wöllner, 2020) and relatively longer reproductions with faster compared to slower music of the same duration (Hammerschmidt et al., 2021). However, it should be noted that passage of time measures are related to, but not completely equivalent to, duration estimates (Droit-Volet & Wearden, 2016). In the current study, passage of time ratings appeared to be less influenced by the chosen movement features, suggesting that participants based their ratings mainly on other characteristics, above all the performance tempo. The specific movements might have added little to overall time perception beyond general visible activity (much movement and fast tempo).
Furthermore, results did not show any relation to rhythmic complexity or to dynamics, thus not supporting H2. There is some evidence that more complex music (Bueno et al., 2002) or untypical musical patterns (Hammerschmidt & Wöllner, 2020) lead to time overestimations, which should in turn have been reflected in ratings of time passing more slowly. Future research should further scrutinize the perceived passage of time in relation to controlled variations of musical characteristics.
When asked to judge perceived expressivity, participants’ ratings were guided more by visible movement than by dynamics, as the much-movement performances were rated as more expressive than those with normal and little movement, regardless of the dynamics level. These findings are in line with previous studies (e.g., Vuoskoski et al., 2014) and with our hypothesis 4. Overall, they indicate that participants typically equate expressiveness with larger movements. Although we did not ask the drummer to play expressively (which could be a task in future research), it is generally more challenging to play expressively using little movement, whereas expressivity can still be conveyed relatively easily when playing softly: dynamics is a stylistic musical parameter that can be manipulated independently, while movement is required for sound production in general and can additionally be used to convey expressivity (cf. ancillary gestures, Wanderley, 2002). Furthermore, we found an effect of tempo, with participants rating the fast stimuli as more expressive than the slow stimuli (supporting H1), suggesting an effect of visible activity (i.e., faster tempo) on perceivers’ judgments of expressivity. This is in line with van Zijl and Luck's research on experienced emotions in music performance (Van Zijl & Luck, 2013) and Ilie and Thompson's study on acoustic cues to emotion in music, which showed that fast music was judged as having greater energy arousal than slow music (Ilie & Thompson, 2006).
The regression results (movement features predicting ratings) revealed movement characteristics on which observers potentially based their judgments. Passage of time was related to Fluidity and Amount of Movement, with jerky and large movements being associated with faster passing time. Clips with a lot of jerky movement, being active and energetic, may have captured more attention and thus been perceived as shorter (Block et al., 2010). For the expressivity ratings, only the amount of movement was a significant predictor. A higher amount of movement being related to higher perceived expressivity is in line with previous research, such as Nusseck and Wanderley (2009). Tempo ratings were significantly predicted by all three movement features: a large amount of movement as well as jerkier and less complex movement related to faster perceived tempo. This is in line with movement features related to faster playing (e.g., Dahl, 2018). Moreover, this result might be due to two circumstances: the large amount of movement suggesting active stimuli, and the implicit assumption that it is more difficult to perform complex movements at a fast tempo. Interestingly, the regression analysis revealed the same pattern of significant predictors for both visual-only and audio-visual stimuli, suggesting no modality effect. This could suggest that the visual information was already sufficient in this context, and that the actual audio content was to some extent deducible from the visual information and thus redundant for the perceptual judgments (cf. Davidson, 1995).
The tempo judgment was mainly included as a control variable to check that participants indeed paid attention to the videos. Thus, rather unsurprisingly, we found a strong effect of tempo, with participants rating the slow stimuli as slower and the fast stimuli as faster, and can conclude that participants followed the experiment attentively. This is a promising side result regarding repetitive tasks in online experiments, which tend to create fatigue and boredom in participants.
Instead of regular video, we used motion capture technology in this study. This not only provided us with the possibility to measure the movement in a highly accurate fashion and use the resulting movement features in the analyses, but also to use the animations in the follow-up perceptual experiment. This approach reduces biases regarding the style and appearance of the particular drummer, as for instance facial and bodily features are removed, thus participants are able to focus completely on the movements and watch the performances in a more neutral way. It would certainly be desirable (barring costs for system and data preparation) if such an approach became standard practice in similar research efforts in the future, as it would also support comparison between studies.
One limitation of this study is that, due to the Covid-19 pandemic, we were able to collect data from only one drummer, who was highly skilled and very consistent in his performances. This high level of expertise could have produced a floor effect. It would therefore be beneficial to collect data from more drummers in order to generalize the production and movement feature results, for instance across levels of expertise, as novice drummers might execute the instructions differently from professionals. Additionally, comparisons between drummers of different musical genres (e.g., classical vs. jazz vs. rock/pop) could provide further insights into movement characteristics related to style as well as training, as suggested by findings of, for instance, Buck et al. (2021) and Altenmüller et al. (2020). Jazz drummers might intentionally make their movement look calm and parsimonious to underline the character of the music, while rock drummers might rather exaggerate their movement as part of the show. A larger number of drummers would also facilitate a more advanced analysis of movement features, possibly revealing further insights into movement characteristics in music/drumming performance and allowing inferences about the effects of instructions on drummers’ performances. Furthermore, data from more drummers would provide a more diverse basis for follow-up perceptual experiments, making it possible to investigate how individual differences between drummers influence perception and possibly to reduce fatigue effects, as observers would watch different drummers.
As the current drum patterns included the same number of snare and hi-hat hits at both complexity levels to keep the rhythms comparable, a future study could include more rhythmic patterns, as well as patterns with a different number of snare hits (i.e., four instead of two per bar), following, for instance, the patterns used in Câmara et al. (2020). This would increase complexity in more respects than just rhythmically shifting the snare and adding kick hits. It would naturally increase the amount and possibly the complexity of the (right) arm movements, as more notes would have to be played. Different rhythms might highlight the importance of movement in distinct ways, which makes them worth studying systematically.
Future work could moreover analyze the audio recordings to investigate the dynamics-force relationship. While this is not within the scope of the paper, the drummer's differences in the chosen dynamics, in particular the relative loudness in the different experimental conditions, could be investigated to find out more about performance parameters and the interplay of tempo, rhythm, dynamics, and movement in drumming.
Another potentially interesting analysis would relate to the observers’ musical training, and in particular drumming expertise. Individual drumming experience has been shown to influence how expressive the performance of a professional drummer playing in different styles and genres was judged (Di Mauro et al., 2018), or how asynchronies are detected in experimentally manipulated audio-visual stimuli (Petrini et al., 2009a; Petrini et al., 2009b). However, we used an unconstrained sampling paradigm resulting in a heterogeneous group of participants, with only 10% of them having drumming expertise, and thus refrained from this analysis. Future studies could aim for an equal sample of non-musicians, musicians, and drummers.
Taken together, this study approached drumming motion from the perspective of computational movement feature analysis combined with perceptual evaluations of movement animations. We could connect movement characteristics, such as fluidity, complexity, and amount of movement with performance instructions (movement and dynamics) and musical characteristics (tempo and rhythmic complexity) as well as perceptual ratings of passage of time, expressivity and tempo in drumming animations. Future work should collect data from more drummers performing a wider variety of tasks (e.g., different rhythms and expressivity levels) to provide a more extensive stimulus pool for both movement analysis and follow-up perceptual experiments.
Supplemental Material
sj-docx-1-mns-10.1177_20592043231186870 - Supplemental material for Drumming Action and Perception: How the Movements of a Professional Drummer Influence Experiences of Tempo, Time, and Expressivity
Supplemental material, sj-docx-1-mns-10.1177_20592043231186870 for Drumming Action and Perception: How the Movements of a Professional Drummer Influence Experiences of Tempo, Time, and Expressivity by Birgitta Burger and Clemens Wöllner in Music & Science
Footnotes
Acknowledgments
We wish to thank Fabian Otten for supporting the motion capture part of this study.
Action Editor
Jessica Grahn, Western University, Brain and Mind Institute & Department of Psychology
Peer Review
Jonna Vuoskoski, University of Oslo, Department of Musicology and Department of Psychology
Karen Petrini, University of Bath, Department of Psychology
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical Approval
The study was approved by the Ethics Committee at the Faculty of Humanities, University of Hamburg, Germany (no ID provided, date of issue: 10th September, 2020).
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by a Consolidator Grant to the second author from the European Research Council (Grant No. 725319) for the project “Slow Motion: Transformations of Musical Time in Perception and Performance.”
References
Supplementary Material
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
