
Abstract

Polyrhythms are central to many musical practices. However, the extent to which people can simultaneously track multiple non-metrically related beat patterns remains unclear. Here we studied people's ability to simultaneously track multiple periodic streams containing beat patterns that cannot be perceived according to a single metric framework. Participants listened to one or two beat patterns simultaneously in three conditions: single stream, selectively attending to one of the streams, and simultaneously attending to both streams. Using a probe-tone paradigm, we assessed whether they were tracking each pattern's periodicity, while recording their brain activity using EEG. The EEG showed limited effects of attention. However, our behavioral results show that, while performance during dual attending is overall worse than during selective attending and single stream listening, people are able to perform the dual attending task above chance levels, indicating that they can at least extract and evaluate temporal information from both periodic streams simultaneously. We also found evidence of one group being better at dual attending than another. Our findings suggest that tracking of polymetric streams is possible, and that this ability may depend on individual differences.

Keywords

Attention, auditory cognition, beat perception, polyrhythm

Introduction

Rhythms are a ubiquitous feature of human experience: our hearts beat, our lungs breathe, and our movements can spontaneously synchronize with those of others. Musical rhythms are widely enjoyed and danced to across cultures (Mehr et al., 2019; Singer et al., 2023). Beyond their role in music and dance, periodic rhythms are also essential in speech and general motor control (e.g., Birkett & Talcott, 2012; Harding et al., 2019; Thaut et al., 2015). Therefore, beat perception, that is, extracting temporal regularities from a perceived underlying pulse that arises in response to (musical) rhythm (Honing, 2013; Large & Palmer, 2002; Merchant et al., 2015; Møller et al., 2021; Nozaradan et al., 2011; Rathcke et al., 2024), is an essential skill for many of our daily activities.

In many styles of music, such as much of European classical music and contemporary global pop music, beats are usually perceived within groupings of regularly alternating weak and strong beats. This grouping is known as the meter, which provides the timing framework from which a given rhythmic pattern is understood (e.g., a waltz is a three-beat meter of strong-weak-weak alternation, Levitin et al., 2018; London, 2012). The metric framework relies on multiple levels of subdivisions which are integer multiples of each other, such as half-time or double time. Rhythms of different meters can be combined if the meters share a common metrical level. For example, when a three-beat meter rhythm is combined with a four-beat meter rhythm, the metrical level of the bar can be shared, creating a 3:4 polyrhythm. The beats of each level are of different lengths, so they equally subdivide the bar by 3 and 4, respectively (see Figure 1). In such instances, listeners perceive the combined rhythms according to one of the meters, either three- or four-beat (Møller et al., 2021; Stupacher et al., 2017). But when two rhythms cannot be perceived as sharing a common metrical level, it is unclear whether the beats of those separate rhythms can be perceived simultaneously.

Figure 1.

Adapted from Nijhuis et al. (2026). (A) Schematic representation of 2:3 and 3:4 polyrhythms (Stimulus) and their key metrical levels ranging from Subdivisions to individual Pulses to the overarching Cycle/Bar. Bright green dots represent binary subdivision grouping (binarized pulses) and dark purple dots represent ternary subdivision grouping (ternarized pulses). (B) A polymeter of 2/4 against 5/8: the drum beat and guitar beat are integer multiples of each other, with two guitar beats aligning with one drum beat. The cycle of the guitar beat repeats every five eighth notes, while the cycle of the drum beat (kick-snare) repeats every two quarter notes (i.e., every four eighth notes). The first downbeats of the drum and guitar cycles line up only once every 20 eighth notes; between these points, they are misaligned. (C) Schematic illustration of how polyrhythms and polymeters are defined in this study. In the 2:3 polyrhythm, the IOIs between 2-pulse and 3-pulse differ, while the cycle is shared. In the 2:3 polymeter, the IOIs are shared, while the durations of the cycles differ; the 2-meter goes through three cycles and the 3-meter goes through two cycles before the first downbeats align again.
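The realignment points described in panels B and C follow from the least common multiple of the two cycle lengths when both are counted in the same shared subdivision. The following is a minimal illustrative sketch, not part of the study's materials; the function name `realignment_span` is hypothetical.

```python
from math import lcm

def realignment_span(cycle_a: int, cycle_b: int) -> int:
    """Number of shared subdivisions before both cycle downbeats coincide again."""
    return lcm(cycle_a, cycle_b)

# Panel B: the 2/4 drum cycle spans 4 eighth notes, the 5/8 guitar cycle spans 5.
span_b = realignment_span(4, 5)
print(span_b)                     # 20 eighth notes
print(span_b // 4, span_b // 5)   # 5 drum cycles, 4 guitar cycles

# Panel C: a 2:3 polymeter with shared IOIs realigns after 6 pulses,
# i.e., three cycles of the 2-meter and two cycles of the 3-meter.
span_c = realignment_span(2, 3)
print(span_c // 2, span_c // 3)   # 3 2
```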

Polyrhythms are therefore largely taught and perceived as integrated rhythms of repeating sequences that fit within the perceived meter (Jagacinski et al., 1988). When people are asked to tap the beat, they are strongly biased towards a binary meter (divisible by two), although it is also possible to prime towards a ternary meter (divisible by three) (Møller et al., 2021). Many studies report the perception of a composite rhythm (i.e., the combination of all the event onsets in the two (or more) sequences placed on a single timeline), suggesting integrated rather than segregated perception of the two separate streams within the rhythm (e.g., Jagacinski et al., 1988; Keller & Burnham, 2005). To perceive two separate streams of unrelated meters simultaneously would represent true polymetric perception, and is largely thought to be impossible (Jagacinski et al., 1988; Keller & Burnham, 2005; London, 2012).

Yet polymetric structures are not altogether uncommon in music. Polyrhythm and polymeter are important compositional techniques in various musical practices around the world, from West-African drumming to jazz and death metal (see Poudrier & Repp, 2013 for a summary). In particular, drummers, conductors, and DJs might benefit from polymetric perception, when tracking multiple streams in complex musical pieces.

For true polymetric perception to occur, metrically competing rhythmic structures must be tracked individually but simultaneously. While each rhythm can have its own metrical structure of alternating patterns of strong and weak beats, the tempi of the beats are not related in a 1:N manner (Poudrier & Repp, 2013). This would require divided (or segregated) attention, rather than integrative attention (Loui & Guetta, 2019).

The ability to segregate multiple rhythmic auditory streams, a process that is part of Auditory Scene Analysis (ASA), partially depends on acoustic features. Key factors that aid segregation of streams are pitch and tempo (Bregman & Campbell, 1971; Deike et al., 2012; van Noorden, 1975). The larger the pitch difference and the faster the presentation of the two streams, the easier it is to segregate the streams without interference (Bregman & Campbell, 1971; Miller & Heise, 1950; van Noorden, 1975). In contrast, when two rhythmic auditory streams are similar in pitch and slower, they can interfere with one another and will ultimately be integrated into a single composite stream, making it difficult to track them individually (Snyder & Large, 2005).

Poudrier and Repp (2013) suggest that humans seem to rely on such a composite rhythm to track two separate rhythmic streams. Using a task that required detection of timing deviation, they first tested the perception of 2:3 polyrhythms with and without phase shifts (an eighth note), which can relatively easily be perceived as one composite rhythmic pattern. These rhythms were made up of two streams that represented two non-isochronous rhythms of different meters (2/4 and 6/8). The authors first tested participants’ performance when selectively attending to one stream. Then, when told to attend to both streams simultaneously, participants outperformed the level predicted by a random selective attending strategy (randomly selecting one stream), which was calculated from their selective attending performance. When the authors tested polyrhythms of higher complexity (where the phase relationship was shifted by half of an eighth note), with ambiguous metrical frameworks that could not easily be perceived as a single integrated composite rhythm, participants seemed unable to simultaneously keep track of the two rhythms in the separate streams. Hence, Poudrier and Repp (2013) could not find statistical support for true polymetric perception requiring divided attention to simultaneously monitor two rhythmic auditory streams, and suggested that polymetric perception is likely largely supported by perception of the composite rhythm. However, some individual participants' behavioral patterns did indicate better polymetric perception than expected.

However, Demany et al. (2015) suggest that humans are capable of simultaneously tracking two non-metrically related auditory streams, providing evidence for split auditory attention. Similar to Poudrier and Repp (2013), they segregated the streams by pitch register. However, they used a melodic rather than rhythmic task. What most likely explains their participants’ ability to exhibit simultaneous tracking was the nine hours of training on the tested stimuli. The training, and hence familiarity with the auditory streams, may have reduced the attentional demands or working memory load to allow easier processing and better tracking of separate streams (Engle, 2002). This indicates that auditory stream segregation and simultaneous attending might be a trainable skill that musicians such as DJs, drummers, or conductors may have developed for their particular discipline.

Stream segregation has previously indeed been found to be influenced by individual capacities such as musical training (Benocci & Calcus, 2024), attention (Dalton et al., 2009), and working memory (Conway et al., 2001; Lotfi et al., 2016). Training such individual capacities has been shown to improve stream segregation, for example in children with Auditory Processing Disorder (Moossavi et al., 2015). Highly trained musicians have also been shown to have higher working memory capacities (Talamini et al., 2016) and better selective attending (Loui & Wessel, 2007) and beat perception abilities (Nguyen et al., 2022; for a review see Repp & Su, 2013). This highlights the potential for further investigation into the interindividual differences of segregating and integrating streams in complex polyrhythm perception.

Neural tracking of rhythm can be assessed using frequency tagging of the beat (Nozaradan et al., 2011; 2014). The frequency spectrum of EEG activity while listening to a rhythm shows clear peaks at beat-related frequencies, known as the Steady State Evoked Potential (SSEP). The magnitude of the beat-related SSEPs is functionally relevant. For example, the SSEP is larger when the beat is played by low-frequency notes than high-frequency notes, which explains the special role bass sounds play in conveying the beat of music (Lenc et al., 2018). Furthermore, the strength of neural beat tracking has been correlated with the ability to synchronously tap to the beat (Nozaradan et al., 2016).

Using frequency tagging, Stupacher et al. (2017) demonstrated that the brain tracks the beat of both rhythms in complex auditory streams like polyrhythms. SSEPs reflected the rhythmic structure of both rhythms, with beat-related SSEPs even present during silent periods after listening. The study also underlined the functional relevance of neural tracking for beat perception, as these neural responses correlated positively with the behavioral accuracy of temporal judgments. Frequency tagging has also recently been used to study the perceptual integration of multiple rhythms in the visual domain, during interpersonal coordination (Alp et al., 2017; Varlet et al., 2020). Not only can the tagged frequency of each person's movement during an interaction be observed in the cortical signal, but intermodulation frequencies (frequency A + B) were also identified. The intermodulation frequency was shown to be functionally relevant for the integration of a person's own and the other's movement, with higher amplitude of the integration frequency correlating with higher movement synchrony. Alp et al. (2017) also noted higher-order perceptual integration in addition to low-level sensory integration. These studies highlight the utility of analyzing neural dynamics to understand the perception and processing of complex stimuli like auditory polyrhythms.

In this study, we used a behavioral beat perception task (judging a probe tone) and EEG frequency tagging to investigate whether people can track two simultaneous periodicities that do not share a metrical framework. We presented listeners with either one or two auditory periodicities at two different non-metrically related tempi. When presented with two periodicities, listeners were asked to attend either to one (selective attending) or to both (dual attending). Participants judged a probe tone at the end of the periodicities as in or out of time with either of the patterns, indicating whether the periodicities were perceptually tracked. In addition to the temporal judgment trials, participants tapped along in separate trials to assess movement tracking of the two beats. We recorded auditory SSEPs using EEG and tested whether the amplitudes of neural frequencies of the two periodicities reflected simultaneous tracking. To assess effects of individual differences, we tested each participant's beat perception abilities.

We hypothesized that people can track the two metrically unrelated periodicities simultaneously, evidenced by accurate judging of the probe tones as in or out of time with their respective periodicities. We tested this hypothesis by calculating participants’ predicted performance assuming a selective attending strategy, and comparing performance during dual attending to that. As a function of task difficulty, we further hypothesized temporal judgments would be most accurate for single and selective attending, and least accurate for dual attending. We expected tapping precision and accuracy to reflect participants’ perceptual and neural tracking of the periodicities, with more accurate and precise tapping for patterns that were more strongly attended or tracked. In line with the frequency tagging literature, we hypothesized that attention would be split between the periodic streams, and the tracking (SSEP) would be reduced for dual compared to selective or single attending. Yet, if there was neural tracking of both rhythms, each of the two SSEPs would have to be larger than its unattended counterpart during selective attending. Finally, we expected performance might be improved both by musical experience and beat perception ability.

Methods

Participants

Forty-eight participants volunteered to take part (26 females and 22 males, aged from 18 to 47; M = 24.8, SD = 7.15), receiving £20. All participants had normal hearing. The median years of musical experience (YoME) of participants was 4 ± 11.75.

Participants who did not perform above chance level (50%) in the single track listening condition were excluded (n = 6), since this suggests they were not able to perform any of the tasks. This leaves 42 participants included (21 male, 21 female, Median_YoME = 3 ± 9.5).

There were six additional participants with corrupt files, faulty triggers, or missing data from the EEG recordings, that prevented reading in and processing their EEG data (see detailed list in supplementary materials). Hence, 36 participants were analyzed for the EEG results (Median_YoME = 3.5 ± 10.0). Five participants had to be excluded from the analysis of tapping data altogether, because their taps did not register consistently, or they did not perform the tapping at all (e.g., > 5 s of no tapping, see supplementary materials for details). Therefore, 43 participants were included in tapping-only analyses and 32 participants remained for correlation analyses between tapping and EEG measures (i.e., one participant had been excluded for both tapping and EEG).

Stimuli

Participants were presented with two isochronous periodicities of a woodblock with different tempi, which, when presented simultaneously, did not combine into a composite rhythm within the same metric framework. Both periodicities had loudness accentuations every other beat, expressed via the increase of sound level (by 8 dB). This implied a binary metrical structure. Loudness was chosen as the metric cue (Grahn & Rowe, 2009) over other cues such as pitch and melody, to avoid adding additional complexity to the already complex patterns.

Each pattern was presented at a unique tempo: pattern A had a tempo of 114.6 BPM and pattern B of 163.8 BPM. Patterns with these two tempi are difficult to perceptually combine into a single metric framework, as their phases do not align until after eight onsets of the A pattern and 11 onsets of the B pattern. Furthermore, at the point of alignment, there is still a small phase difference between the two (B approximately 0.002 s, i.e., 2 ms, ahead of A), which accumulates over the course of the whole pattern (approximately 0.015 s, i.e., 15 ms, at the final alignment), but this difference is not perceptually detectable (London, 2012). Pattern A was five semitones higher in pitch than pattern B. As such, the two patterns fall within the ambiguous zone of streaming and integration (Bregman & Campbell, 1971; Miller & Heise, 1950; van Noorden, 1975). Furthermore, the combination of higher pitch with slower tempo for pattern A vs lower pitch with faster tempo for pattern B balanced out the attention that is given to each musical beat pattern in the streams based on pitch and speed (Møller et al., 2021). The patterns were continuously presented for 30 s, followed by a probe sine tone. This probe could either coincide with the next imagined note of pattern A or B, or arrive late by 30% of the period of either pattern. Stimuli (Figure 2) were presented via in-ear headphones (Etymotic Research Studio, ER2SE).
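The near-alignment of the two patterns can be checked from the tempi alone. The sketch below is illustrative only and is not the authors' stimulus code; the function name `closest_alignment` and the 5 s search window are assumptions. It searches for the pair of non-initial onsets that fall closest together and reports the residual lag, which comes out at roughly 2 ms when the eighth onset of A meets the eleventh onset of B.

```python
def closest_alignment(bpm_a: float, bpm_b: float, window_s: float):
    """Find the non-initial onset pair (one per stream) that is closest in time.

    Returns (intervals_a, intervals_b, lag_s). Both streams are assumed to
    start together at t = 0; lag_s > 0 means the B onset precedes the A onset.
    """
    period_a = 60.0 / bpm_a
    period_b = 60.0 / bpm_b
    best = None
    i = 1
    while i * period_a <= window_s:
        j = round(i * period_a / period_b)   # index of the nearest B onset
        lag = i * period_a - j * period_b
        if j >= 1 and (best is None or abs(lag) < abs(best[2])):
            best = (i, j, lag)
        i += 1
    return best

intervals_a, intervals_b, lag = closest_alignment(114.6, 163.8, window_s=5.0)
# Counting the shared first onset, 7 and 10 intervals give the 8th A onset
# and the 11th B onset; the lag is roughly 1.9 ms with B ahead of A.
print(intervals_a + 1, intervals_b + 1, lag * 1000)
```

Over a 30 s presentation this near-alignment recurs about eight times, so a lag of roughly 1.9 ms per recurrence accumulates to roughly 15 ms.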

Figure 2.

Amplitude spectra (noise corrected) for track A, B, and their combined track. Audio files and code for the calculation of this FFT can be found on the OSF repository (https://osf.io/zrwmy/files).

Procedure

On arrival, participants were provided with an information sheet that described the perceptual task and the EEG equipment used. All participants gave informed consent. Participants were then fitted with the EEG caps and electrodes. Participants first completed the Computerized Adaptive Beat Alignment Test (CA-BAT, v0.11.0; Harrison & Müllensiefen, 2018), run via R (version 4.2.2), to test their general beat perception skills.

Participants then received further instructions about the task and performed a training block of 12 trials to learn to discriminate between pattern A and B. To pass the training, participants had to recognize both patterns correctly and answer the question ‘which pattern is the slower, higher pitched pattern, A or B?’ correctly. If they answered incorrectly, the training block was set to restart; however, all participants passed the training in one attempt.

The experiment was run in PsychoPy (version 2022.1.4) on a Dell Latitude (Windows version) with a screen refresh rate of 60 Hz. The trials were self-paced: participants pressed the space bar to continue to the next trial. Participants were encouraged to take breaks between the blocks of trials. Before starting the dual attending block, participants were once again asked to confirm that they were able to recognize which beat pattern was which.

Task

Participants’ main task was to listen and attend to beat patterns and judge whether a probe tone that appeared after the pattern would have coincided with the next beat in the attended pattern, had it continued (i.e., ‘did the probe tone coincide with the beat you were attending to?’).

They judged the probe tone in three conditions: single pattern listening, selective attending, and dual attending. During single pattern listening participants heard a single beat pattern at a time (either A or B) and judged whether the probe tone was in time with the pattern (y/n). For selective attending, participants heard both beat patterns but were instructed to attend to either pattern A or B. They then judged whether the probe tone was in time with the attended pattern or not (y/n). During the dual attending condition, participants had to actively attend to both patterns simultaneously and judge whether the probe coincided with pattern A, B, or neither (a/b/n). We wanted the probe task to reflect the cognitive process and task load when asking to track both patterns simultaneously. If they are tracking both sequences, they should be able to evaluate the interval for both tracks.

A total of 62 trials were grouped into blocks according to conditions. Block 1 contained 20 single pattern listening trials, including 10 repetitions of each pattern, with five ‘on the beat’ (ON) probe trials and five ‘late’ probe trials (OFF). Block 2 contained 20 trials for the selective attending condition, with half of the trials instructing to attend to A and the other half to attend to B. Again, each pattern had five ON and five OFF probe trials. Block 3 contained only 12 dual attending trials, as the stimuli and task are the same for all trials, consisting of pattern A and B playing simultaneously. Half the trials contained probes for pattern A, and the other half for pattern B. Each group of six contained three trials that were ON and three trials that were OFF. This meant that half of the trials contained ON probes where participants were meant to answer A or B and the other half contained OFF probes where the correct answer was ‘neither’ (n). The final block (4) contained 10 tapping trials, 2 for each condition (single tapping to A, single tapping to B, selectively tapping to A, selectively tapping to B, free tapping with dual presentation), to assess motor entrainment. Trials were randomized within blocks.

Behavioral Analysis

The behavioral performance was quantified as the ratio of correct responses within each block. In the single listening block, for example, this would be the number of correct responses divided by 20 trials.

Predicted Behavioral Performance

To investigate whether participants actually attended to and could track both beat patterns simultaneously during the dual attending condition, we tested whether they outperformed not only random chance (33%) but also a selective attending strategy. This strategy assumes that, without instruction to attend to a particular pattern, participants will randomly choose either pattern A or B to attend to, which would be the correct pattern in 50% of trials. This strategy is similar to the selective attending trials, where participants attend to only one pattern. Therefore, based on each participant's performance in the selective attending block, we can estimate how many of the trials they would answer correctly if taking a selective attending strategy during the dual attending trials. The remaining trials would then be performed at chance level.

More specifically, within the 12 dual attending trials, six trials have the probe tones ON the beat, of which three are on A and three on B. During ON trials, if the participant is only paying attention to one pattern (for example, always A), the probe falls on the attended pattern in 50% of the trials, and the selective attention performance applies to the three trials where the correct pattern is attended: (3*Rselective).

The remaining six trials are all OFF beat trials, where the probe tone coincides with neither (n) pattern A nor B. Therefore, regardless of which pattern is being attended (A or B), participants would correctly identify them as OFF at the rate of their selective attending performance. Participants would correctly identify that the probes are OFF in (Rselective * 6) cases. However, there are then still two options to choose from: the other pattern (a/b) or neither (n), which results in a 50/50 chance of a correct response: (Rselective * 6(trials) *.5(chance)).

This same calculation applies to the three ON beat trials where the probe tone falls in line with the other (not attended) pattern. In (Rselective * 3) trials participants should correctly identify the probe as OFF beat, therefore leaving a 50% chance of correct response (the other pattern or neither): (Rselective * 3(trials) *.5(chance)).

Adding up the six OFF and three unattended ON responses results in (Rselective * 9(trials) *.5(chance)).

The predicted performance that participants would need to outperform to establish a dual attending strategy was thus calculated with the following formula:

Predicted performance ratio (using a selective attending strategy) = ((3 (trials)*Rselective) + (Rselective * 9(trials) *.5(chance))) /12 (trials).

We refer to this as their predicted score from hereon.
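As a minimal sketch of the formula above, the predicted score can be computed directly from a participant's selective attending ratio. The function and parameter names are hypothetical; only the trial counts (3 attended ON trials, 9 remaining trials, 12 in total) and the 50% guess rate come from the description above.

```python
def predicted_dual_score(r_selective: float,
                         n_attended_on: int = 3,
                         n_other: int = 9,
                         chance: float = 0.5) -> float:
    """Predicted dual attending ratio under a selective attending strategy.

    n_attended_on: ON trials whose probe falls on the attended pattern.
    n_other: the six OFF trials plus the three ON trials of the other pattern,
             where a correct 'not my pattern' judgment still leaves a 50/50 guess.
    """
    n_trials = n_attended_on + n_other
    correct = n_attended_on * r_selective + n_other * r_selective * chance
    return correct / n_trials

# Hypothetical participant with 80% correct in the selective attending block:
print(predicted_dual_score(0.8))
```

Because both terms scale with Rselective, the expression reduces to 0.625 × Rselective, so even perfect selective attending would predict a dual attending score of only 62.5% under this strategy.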

EEG Recordings and Analysis

EEG signals were recorded at a sampling rate of 2048 Hz using an ANT Neuro (Enschede, Netherlands) mobile EEG system with 64 channels, placed over the scalp according to the international 10/10 system. All electrodes were referenced to the CPz electrode and their impedance was kept below 50 kOhm.

Upon the start of the trial and stimulus presentation, a trigger was sent via serial port command to ensure synchronized EEG recording with the audio presentation.

The EEG data were processed using MATLAB 2022a (The MathWorks, Inc., Natick, MA). Data were first segmented into 30 s trials, then high-pass filtered with a cut-off frequency of .2 Hz to remove slow drifts in the recorded signals and notch filtered to remove 50 Hz (and corresponding harmonics) electrical power contamination with a bandwidth of 1 Hz using a fourth-order Butterworth filter. Based on visual inspection, channels containing excessive artifacts or noise were then interpolated with the neighboring channels (i.e., an average of 2 [SD = .7] interpolated electrodes per participant, and never more than 3 electrodes). After filtering, the EEG signals were decomposed using an independent component analysis (FastICA), as implemented in FieldTrip (Oostenveld et al., 2011), to remove eye movement artifacts. Based on visual inspection of the topography and time-course of independent components, components reflecting eye-blinks and lateralized eye movements were removed from the data (i.e., an average of .51 components [SD = .74] per participant). EEG data were then re-referenced to the average of all scalp electrodes.

At the next stage of data processing, we used a fast Fourier transform (FFT) with zero-padding to compute the amplitude spectra with a frequency resolution of 0.01 Hz to align frequency bins with the frequencies of interest (i.e., 1.91 and 2.73 Hz). In order to examine the occurrence of significant EEG responses at the beat-related frequencies, we pooled the spectra of all EEG channels together and computed Z-scores at each frequency bin as the difference in amplitude between that frequency bin and the mean of 20 neighboring frequency bins (excluding the four immediately adjacent frequency bins, two on each side), divided by the standard deviation of those 20 neighboring bins. Z-scores were computed at the group level (amplitude spectra averaged across conditions and participants) and at the individual level (amplitude spectra averaged across conditions). EEG responses at specific frequency bins were considered to be significant when the Z-score value was greater than 1.645 (p < .05, one-tailed), indicating a signal amplitude significantly larger than the background noise, in line with previous studies that used frequency-tagging techniques (Jacques et al., 2016; Quek et al., 2018). Background noise and muscular artifacts affect amplitude spectra over a large range of frequency bins around those of interest (Varlet et al., 2020). To minimize the effect of such irregularities on participants’ EEG responses, baseline subtraction was applied. At each frequency bin of the amplitude spectra we subtracted the average amplitude of the 20 neighboring frequency bins excluding the two immediately adjacent frequency bins (Jacques et al., 2016; Lenc et al., 2018; Nozaradan et al., 2011; Varlet et al., 2020). Finally, noise-subtracted amplitude spectra averaged across all EEG channels were computed for each participant and condition and then log-transformed to satisfy normality assumptions. The log-transformed SSEP amplitudes were then used for further statistical analyses in order to compare the amplitude of participants’ beat-related frequency responses (SSEPs) across the different conditions and to investigate correlations with the behavioral performance on the probe-tone task.
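The neighbor-bin Z-score and baseline subtraction described above can be sketched as follows. This is an illustration in Python rather than the MATLAB pipeline used in the study, and it rests on stated assumptions: the 20 neighbors are taken as 10 bins on each side, "the two immediately adjacent bins" in the baseline subtraction is read as one bin on each side, the sample standard deviation is used, and the spectrum is a toy list rather than EEG data. All function names are hypothetical.

```python
from statistics import mean, stdev

def bin_index(freq_hz: float, resolution_hz: float = 0.01) -> int:
    """Index of the bin holding freq_hz, assuming bin 0 corresponds to 0 Hz."""
    return round(freq_hz / resolution_hz)

def neighbor_bins(spectrum, k, n_side=10, skip=2):
    """Amplitudes of n_side bins on each side of bin k, skipping `skip` adjacent bins per side."""
    left = spectrum[max(0, k - skip - n_side): max(0, k - skip)]
    right = spectrum[k + skip + 1: k + skip + 1 + n_side]
    return list(left) + list(right)

def z_score(spectrum, k, n_side=10, skip=2):
    """(amplitude at k - mean of neighbors) / SD of neighbors (sample SD, an assumption)."""
    nb = neighbor_bins(spectrum, k, n_side, skip)
    return (spectrum[k] - mean(nb)) / stdev(nb)

def baseline_subtract(spectrum, k, n_side=10, skip=1):
    """Amplitude at bin k minus the mean amplitude of its neighboring bins."""
    nb = neighbor_bins(spectrum, k, n_side, skip)
    return spectrum[k] - mean(nb)

# Toy spectrum: alternating noise floor with one clear peak at bin 50.
toy = [1.0 if i % 2 == 0 else 3.0 for i in range(101)]
toy[50] = 12.0
print(z_score(toy, 50) > 1.645)      # peak exceeds the one-tailed criterion
print(baseline_subtract(toy, 50))    # 12.0 minus a noise floor of 2.0
print(bin_index(1.91), bin_index(2.73))
```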

In addition to the fundamental frequency and harmonics of the beat frequencies in track A and B, we also explored intermodulation frequencies representing non-linear integration between stimuli (i.e., A + B = 4.64 Hz, and 2*A + B = 6.55 Hz) (Alp et al., 2017; Norcia et al., 2015; Varlet et al., 2020).
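The intermodulation frequencies listed above are simple integer combinations of the two beat frequencies. A brief sketch of the arithmetic, with a hypothetical helper name:

```python
def intermodulation(f_a: float, f_b: float, n_a: int, n_b: int) -> float:
    """Frequency n_a*f_a + n_b*f_b in Hz, rounded to the 0.01 Hz bin resolution."""
    return round(n_a * f_a + n_b * f_b, 2)

F_A, F_B = 1.91, 2.73  # beat frequencies of tracks A and B in Hz

print(intermodulation(F_A, F_B, 1, 1))  # A + B
print(intermodulation(F_A, F_B, 2, 1))  # 2*A + B
```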

Statistical Analyses

Statistical analyses were conducted using JASP (version 0.16.4.0). Mixed ANOVAs were performed with Greenhouse–Geisser correction applied when the assumption of sphericity was violated. Pairwise contrasts were used to further examine the significant main effects using Bonferroni corrections for multiple comparisons.

To compare the behavioral performance across the different conditions, a mixed ANOVA with the between-factor Group (musician, non-musician), and the within-factors Pattern (A: 1.91 Hz; B: 2.73 Hz), and condition (Single pattern listening, Selective attending, and Dual attending) was performed on the ratio of correct responses.

The behavioral performance during dual attending was compared to the chance level (33.33%) with a one-sample t-test, and to participants’ predicted performance assuming a selective attending strategy with a one-tailed paired samples t-test.
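For readers who want to reproduce the comparison against chance, the one-sample t statistic and Cohen's d can be computed as sketched below. The analyses in this study were run in JASP; this Python version is only an illustration, the function name is hypothetical, and the scores are made-up values rather than study data.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(scores, mu):
    """Return (t, cohen_d, df) for a one-sample t-test of scores against mu."""
    n = len(scores)
    diff = mean(scores) - mu
    sd = stdev(scores)              # sample standard deviation
    t = diff / (sd / sqrt(n))
    d = diff / sd                   # Cohen's d for a one-sample design
    return t, d, n - 1

# Hypothetical dual attending ratios for five participants, tested against 1/3:
t, d, df = one_sample_t([0.42, 0.50, 0.58, 0.33, 0.67], 1 / 3)
print(t, d, df)
```

Because d = t / √n in this design, a sample of n = 42 with t = 6.871 corresponds to d ≈ 1.060.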

Neural tracking strength, operationalized as the beat-related EEG amplitude peaks (SSEPs), was compared across the different conditions using a one-way repeated measures ANCOVA with the factors Group (musician, non-musician), Track (A: 1.91 Hz; B: 2.73 Hz), and condition (Single pattern listening A, Single pattern listening B, Selective attending A, Selective attending B, and Dual attending).

To investigate the relation between the neural responses and the behavioral performance on the probe task, linear correlations (multiple regressions) were calculated between the behavioral performance (ratio of correct responses) and the amplitude of the beat-related frequency peaks. Finally, we explored correlations between participants’ beat perception ability as measured by their ca-BAT performance and their neural and behavioral responses.

To evaluate tapping performance, we computed the discrete relative phase between the onset of the sounds in the beat patterns and the tap times of participants, expressed between 0 and 360 degrees. These asynchronies were calculated by finding the closest beat to each tap, to avoid problems with missed taps. We then used circular statistics to compute the average relative phase (mean vector angle) and standard deviation of the relative phase (resultant vector length) for each participant (Batschelet, 1981). A smaller relative phase (mean vector angle) would mean more accurate tapping performance, as the temporal gap between the tap and the beat is smaller. A longer resultant vector length, in contrast, indicates more precise tapping, as it results from a less variable relative phase. These measures were then compared across conditions using repeated measures ANOVAs and paired t-tests.
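The circular measures described above can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the authors' analysis code: the beat pattern is treated as an isochronous grid starting at `first_beat`, so the phase of a tap relative to its closest beat is obtained from the tap time modulo the beat period, and the function names are hypothetical.

```python
import math

def relative_phases(tap_times, period, first_beat=0.0):
    """Relative phase of each tap in degrees, in [0, 360), on an isochronous beat grid.

    A tap just after a beat gives a phase just above 0; a tap just before
    a beat gives a phase just below 360.
    """
    return [(((t - first_beat) / period) % 1.0) * 360.0 for t in tap_times]

def circular_summary(phases_deg):
    """Return (mean vector angle in degrees, resultant vector length in [0, 1])."""
    rad = [math.radians(p) for p in phases_deg]
    c = sum(math.cos(r) for r in rad) / len(rad)
    s = sum(math.sin(r) for r in rad) / len(rad)
    mean_angle = math.degrees(math.atan2(s, c)) % 360.0
    return mean_angle, math.hypot(c, s)

# Hypothetical taps that each lag a 0.5 s beat period by 50 ms:
phases = relative_phases([0.05, 0.55, 1.05], period=0.5)
angle, r_length = circular_summary(phases)
print(angle, r_length)   # angle near 36 degrees, vector length near 1
```

Averaging on the unit circle rather than on the raw phase values means that taps falling just before and just after the beat (e.g., 350 and 10 degrees) average to a phase near 0 rather than 180 degrees.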

Whenever assumptions of normality were violated, non-parametric tests were performed.

Transparency and Openness

Data were collected in 2022. This study's design was not pre-registered. Data, stimuli, and statistical analysis files are posted on OSF: https://osf.io/zrwmy/overview (Nijhuis & Witek, 2025).

Results

Behavioral Results

Dual Attending Performance

In the dual attending condition, participants outperformed chance levels (33%) in a one-sample t-test (t(41) = 6.871, p < .001, d = 1.060, see Figure 3a) and also their predicted scores assuming selective attending strategies (see Figure 3b), as assessed by a one-tailed t-test (t(41) = −2.551, p = .007, d = −.394).
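The reported effect sizes are consistent with the standard conversion for a one-sample t-test, d = t / √n, where n = 42 follows from the 41 degrees of freedom. The sketch below is only a consistency check under that assumption; the source does not state how d was computed, and the function name is illustrative.

```python
import math

def cohens_d_one_sample(t, df):
    """Cohen's d for a one-sample (or paired) t-test: d = t / sqrt(n), n = df + 1."""
    n = df + 1
    return t / math.sqrt(n)

d_vs_chance = cohens_d_one_sample(6.871, 41)      # comparison against 33% chance
d_vs_predicted = cohens_d_one_sample(-2.551, 41)  # comparison against predicted score
```

Rounded to three decimals, these values match the reported d = 1.060 and d = −.394.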

Figure 3.

Participants outperforming (a) chance level and (b) expected success rate during dual attending on the behavioral probe-tone task. Error bars represent 95% CI.

A mixed repeated measures ANOVA showed a significant effect of attending condition on the ability to correctly judge the probe tone (F(2,82) = 63.168, p < .001, partial η² = .606). Post-hoc comparisons showed a significant decrease in performance from single listening to selective (p_bonf = .018) and dual attending (p_bonf < .001). Performance decreased further from selective to dual attending (p_bonf < .001), see Figure 4. The effect of the attending condition persisted after correcting for musical experience (years) as a covariate (F(2,76) = 56.635, p < .001, partial η² = .598). Musical experience did not interact with the effect of condition either (F(2,76) = .483, p = .619, partial η² = .013).
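For readers who want to relate the F ratios to the effect sizes, partial η² can be recovered from F and its degrees of freedom as η_p² = (F · df_effect) / (F · df_effect + df_error). The sketch below applies this textbook identity to the three tests in this paragraph; it is a check, not the authors' analysis code.

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)

eta_condition = partial_eta_squared(63.168, 2, 82)    # main effect of condition
eta_with_covariate = partial_eta_squared(56.635, 2, 76)  # with musical experience
eta_interaction = partial_eta_squared(0.483, 2, 76)   # condition x experience
```

Rounded to three decimals, the results agree with the reported values of .606, .598, and .013.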

Figure 4.

Performance on the probe-tone task (ratio correct responses) across attending conditions. Error bars represent 95% CI.

Tapping

Accuracy

A significant interaction between track and attending condition was found (F(2,70) = 9.36, p < .001, η_p² = .211). For track A, single tapping to A (M = 0.71, SE = 0.06, Figure 5: top left) was significantly more accurate than free tapping (M = 1.36, SE = 0.10, Figure 5: top right), but not significantly different to selective tapping to A (M = 0.87, SE = 0.09, Figure 5: bottom left). Selective tapping to A was also more accurate than free tapping (p < .001). Single tapping to track B (M = 1.48, SE = 0.06, Figure 5: top middle) was, instead, significantly less accurate than selective tapping to B (M = 1.33, SE = 0.08; p = .033, Figure 5: bottom middle). Free tapping referenced to B (M = 1.54, SE = 0.12, Figure 5: bottom right) was not significantly different from either condition.

Figure 5.

Tapping performance represented on unit circles. In the left column are the tapping to A conditions (top: single tapping, bottom: selective tapping), in the middle column are the tapping to B conditions (top: single tapping, bottom: selective tapping), and in the right column are the free tapping conditions (top: referenced to pattern A, bottom: referenced to pattern B).

The significant main effect of attentional condition on tapping accuracy (F(2,70) = 18.51, p < .001, η_p² = .346) is driven by the pattern found for tapping to track A, as described for the interaction. Overall, single tapping (M = 1.10, SE = 0.05) was significantly more accurate than free tapping (M = 1.45, SE = 0.60; p < .001) but not significantly different to selective tapping (M = 1.10, SE = 0.07; p = 1.000). Selective tapping was, however, significantly more accurate than free tapping (p < .001).

Across conditions, tapping accuracy was lower for track B, as indicated by the larger mean vector angle (track B: M = 1.45, SE = 0.07 vs track A: M = 0.98, SE = 0.06; p < .001; main effect of track: F(1,35) = 28.25, p < .001, η_p² = .447). The lower accuracy for B can be explained by this track being faster than track A, since a similar deviation from the beat in milliseconds amounts to a larger relative deviation in degrees.
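The tempo argument can be made concrete with a short calculation. The 50 ms asynchrony used here is a hypothetical value chosen for illustration; only the two beat rates (1.91 Hz and 2.73 Hz) come from the study.

```python
def asynchrony_to_degrees(asynchrony_ms, beat_rate_hz):
    """Convert a tap-to-beat asynchrony in milliseconds to relative phase in degrees."""
    period_ms = 1000.0 / beat_rate_hz
    return 360.0 * asynchrony_ms / period_ms

deg_track_a = asynchrony_to_degrees(50, 1.91)  # slower track, longer period
deg_track_b = asynchrony_to_degrees(50, 2.73)  # faster track, shorter period
```

The same 50 ms deviation corresponds to roughly 34 degrees for track A but roughly 49 degrees for track B, so identical timing errors in milliseconds appear as lower accuracy on the faster track.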

The single tapping condition was significantly more accurate in track A (M = 0.71, SE = 0.06) than track B (M = 1.48, SE = 0.06). The same was observed during selective tapping (A: M = 0.87, SE = 0.09, B: M = 1.33, SE = 0.08; p < .001). In the free tapping condition, there was no significant difference (Track A M = 1.36, SE = 0.10; Track B M = 1.54, SE = 0.12; p = .321).

Precision

There was a significant main effect of condition (F(1.24,43.48) = 211.74, p < .001, η_p² = .858) on the precision of tapping, as measured by the circular statistic mean resultant length. Single tapping (M = .903, SE = .010) was significantly more precise than selective tapping (M = .807, SE = .031; p = .002) and free tapping conditions (M = .450, SE = .010; p < .001). Selective tapping was also significantly more precise than free tapping (p < .001). No main effect of track (A/B) or an interaction with track was observed (p-values > .7).

Free Tapping to Dual Beat Patterns

When investigating the free tapping during dual attending, we allocated the trials depending on whether their inter-tap intervals (ITIs) were on average within 10% above or below the inter-onset interval (IOI) of A or B. We found a roughly equal split between participants choosing to tap along to pattern A (10) or B (8), changing between trials (12), or tapping a different tempo altogether (7) (Figure 6).
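The trial allocation rule can be sketched as follows. This is a minimal illustration that assumes tap times in milliseconds and a symmetric ±10% window around each inter-onset interval; `classify_trial` and its labels are hypothetical names, and the participant-level grouping (for example, "changing between trials") is not reproduced here.

```python
from statistics import mean

IOI_A = 1000 / 1.91  # inter-onset interval of pattern A in ms (about 524 ms)
IOI_B = 1000 / 2.73  # inter-onset interval of pattern B in ms (about 366 ms)

def classify_trial(tap_times_ms, tolerance=0.10):
    """Label a trial 'A', 'B', or 'other' from its mean inter-tap interval."""
    itis = [later - earlier for earlier, later in zip(tap_times_ms, tap_times_ms[1:])]
    mean_iti = mean(itis)
    for label, ioi in (("A", IOI_A), ("B", IOI_B)):
        if abs(mean_iti - ioi) <= tolerance * ioi:
            return label
    return "other"

# Hypothetical trials with perfectly regular tapping at three tempi.
trial_near_a = classify_trial([i * 520 for i in range(20)])
trial_near_b = classify_trial([i * 370 for i in range(20)])
trial_other = classify_trial([i * 450 for i in range(20)])
```

Because the two ±10% windows (about 471 to 576 ms and about 330 to 403 ms) do not overlap, a trial can be assigned to at most one pattern.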

Figure 6.

Mean ITI per participant. Legend: orange = track A, cyan = track B, purple = different beat.

EEG Frequency Responses

After initial exploration, only the fundamental frequencies of the beat patterns’ SSEPs were found to be above significance levels (z-score >1.645) in all conditions, but not their harmonics. We therefore report only results for the fundamental frequencies (see supplementary materials for details and comparison).

There was a significant interaction between condition and pattern (F(3,102) = 3.350, p = .022, η_p² = .092), see Figure 7. The main effects of attending condition and pattern (A or B) on the amplitude of the EEG frequency response were not significant (F(3,102) = 2.318, p = .080, η_p² = .066, and F(1,34) = .971, p = .409, η_p² = .029, respectively), while the musical experience (years) of participants was controlled for as a covariate. Adding the covariate only slightly adjusted the effect sizes. The significant interaction also persisted after additionally controlling for tapping precision as a measure of perceptual rhythmic ability (p = .023, η_p² = .118).

Figure 7.

EEG amplitude (log₁₀-transformed) of beat-related frequencies across conditions, split by track. Error bars represent 95% CI.

To investigate the interaction of track and condition, we ran two separate ANOVAs, one for each track. Both ANOVAs included musical experience (years) as a covariate. The effect of condition was significant for track A (F(3,102) = 3.183, p = .027, η_p² = .088), but not for track B (F(3,102) = 1.428, p = .123, η_p² = .056). The significant effect for track A also persisted after controlling for tapping precision as a measure of perceptual rhythmic ability (p = .008, η_p² = .135). Post-hoc tests showed a difference between single attending and selective attending to A that did not reach significance after Bonferroni correction (p_bonf = .089, 6 comparisons); this difference was significant when musical experience was not controlled for (p_bonf = .047). Dual attending was not significantly different from any of the other attending conditions (p-values > .18, see Figure 7).

Correlations

ca-BAT Score Correlations

Participants’ beat perception abilities, as measured with the ca-BAT, did not correlate with their performance in the single listening (Spearman's r = −.068, p = .701), selective attending (r = .144, p = .417), or dual attending conditions (r = .164, p = .355). ca-BAT scores were also not correlated with any of the tapping measures or SSEP amplitudes (p-values between .09 and .9).

Neural Tracking Correlations

No significant correlations were found between the neural tracking of A (SSEP 1.91 Hz) and the behavioral task performance (all p-values > .1). The neural tracking of B (SSEP 2.73 Hz) was significantly correlated with behavioral task performance in the single listening condition (Spearman's r = .375, p = .024), but this correlation did not survive multiple comparison correction (p = .144, 6 comparisons).

Performance on the dual attending task significantly correlated with the second-order integration frequency (SSEP at 6.55 Hz) (Spearman's r = .385, p = .02), although this correlation did not survive correction for multiple comparisons (n = 4, p = .08). It did not correlate with the first-order integration frequency (Spearman's r = .138, p = .421).
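The corrected p-values reported for the neural tracking correlations are consistent with a simple Bonferroni adjustment, in which the uncorrected p-value is multiplied by the number of comparisons and capped at 1. A minimal sketch, assuming that this is the correction applied:

```python
def bonferroni(p_uncorrected, n_comparisons):
    """Bonferroni-adjusted p-value: p times the number of comparisons, capped at 1."""
    return min(1.0, p_uncorrected * n_comparisons)

p_track_b = bonferroni(0.024, 6)      # track B SSEP vs single listening performance
p_second_order = bonferroni(0.02, 4)  # second-order integration frequency vs dual attending
```

Both adjusted values (.144 and .08) exceed the .05 threshold, which is why neither correlation is treated as significant after correction.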

Tapping Correlations

Tapping performance was strongly correlated across conditions (Tables 1 & 2).

Table 1.

Tapping accuracy (absolute angle) correlations across tapping conditions.

Variable	Statistic	Single A	Single B	Selective A
1. Single A	Spearman's rho	—
	p-value	—
	Effect size (Fisher's z)	—
	SE Effect size	—
2. Single B	Spearman's rho	0.537**	—
	p-value	0.002	—
	Effect size (Fisher's z)	0.600	—
	SE Effect size	0.190	—
3. Selective A	Spearman's rho	0.512**	0.374*	—
	p-value	0.003	0.036	—
	Effect size (Fisher's z)	0.565	0.393	—
	SE Effect size	0.190	0.188	—
4. Selective B	Spearman's rho	0.299	0.666***	0.486**
	p-value	0.097	< .001	0.005
	Effect size (Fisher's z)	0.309	0.804	0.531
	SE Effect size	0.187	0.193	0.190

*p < .05, **p < .01, ***p < .001.

Table 2.

Tapping precision (mean resultant length) correlations across tapping conditions.

Variable	Statistic	Single A MRL	Single B MRL	Selective A MRL
1. Single A MRL	Spearman's rho	—
	p-value	—
	Effect size (Fisher's z)	—
	SE Effect size	—
2. Single B MRL	Spearman's rho	0.674***	—
	p-value	< .001	—
	Effect size (Fisher's z)	0.818	—
	SE Effect size	0.193	—
3. Selective A MRL	Spearman's rho	0.647***	0.653***	—
	p-value	< .001	< .001	—
	Effect size (Fisher's z)	0.770	0.780	—
	SE Effect size	0.193	0.193	—
4. Selective B MRL	Spearman's rho	0.696***	0.750***	0.679***
	p-value	< .001	< .001	< .001
	Effect size (Fisher's z)	0.860	0.974	0.828
	SE Effect size	0.194	0.195	0.193

*p < .05, **p < .01, ***p < .001.

For tapping accuracy (absolute angle, Table 1), moderate to strong correlations between track A and B were found during single tapping (Spearman's r = .572, p < .001, z = .651), and selective tapping (Spearman's r = .528, p = .002, z = .587). The single tapping and selective tapping were also significantly correlated for track A (Spearman's r = .551, p < .001, z = .620), and track B (Spearman's r = .712, p < .001, z = .890).

For tapping precision (mean resultant length, Table 2), moderate to strong correlations between track A and B were found during single tapping (Spearman's r = .674, p < .001, z = .818), and selective tapping (Spearman's r = .679, p < .001, z = .828). The single tapping and selective tapping were also significantly correlated for track A (Spearman's r = .647, p < .001, z = .770), and B (Spearman's r = .750, p < .001, z = .974).
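The Fisher's z effect sizes reported alongside the correlations correspond to the inverse hyperbolic tangent of the correlation coefficient, z = atanh(r). The sketch below applies that standard transform to two of the precision correlations; because it starts from coefficients rounded to three decimals, the third decimal of z can differ slightly from the tabled values for some entries.

```python
import math

def fisher_z(r):
    """Fisher's z transform of a correlation coefficient."""
    return math.atanh(r)

z_single_a_vs_b = fisher_z(0.674)        # single tapping, track A vs track B
z_single_vs_selective_a = fisher_z(0.647)  # track A, single vs selective tapping
```

These evaluate to approximately .818 and .770, in line with the values given in the text.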

Tapping precision during single tapping to A (r = .390, p = .037), single tapping to B (r = .378, p = .043), selective tapping to A (r = .593, p < .001), and selective tapping to B (r = .465, p = .011) was in each case correlated with probe task performance during single track listening. Although only the correlation for selective tapping to A would survive multiple comparison correction (n = 24), the consistency of all tapping measures correlating with the probe task performance and the strength of the correlations indicate that this is likely not a spurious correlation.

We also found a correlation between tapping accuracy when tapping to B (single tapping) and probe task performance during the selective attending condition (p = .018), but this correlation did not survive multiple comparison correction (n = 24) and is therefore likely spurious.

No significant correlations (after corrections) were found between tapping performance and the neural tracking of the beat frequencies.

Post-hoc Group Split Analysis: Selective and Dual Attenders

Upon closer inspection, we identified two distinct groups: one that outperformed their predicted score in the dual attending condition (dual attenders) and one that did not (selective attenders) (see Figures 8 and 9). This split results in roughly equal group sizes (20 and 22, respectively). The group split was not confounded by years of musical experience or age. The selective attenders had a median age of 21.0 ± 5.5 years and a median of 5 ± 10 years of musical experience. Dual attenders had the same median age (21.0 ± 8.3 years) and a median of 2.5 ± 12 years of musical experience. A non-parametric Mann–Whitney test showed that neither age nor musical experience differed significantly between the selective and dual attenders, U = 265.000, p = .792, r = .047 and U = 270.000, p = .700, r = .067, respectively.

Figure 8.

(a) Predicted and performed dual attending scores for all participants. (b) Predicted and performed dual attending scores grouped by attending type, based on their ability (or inability) to outperform the predicted score.

Figure 9.

Distribution of difference scores (performance – predicted score) in the dual attending condition, suggesting a bimodal distribution with modes just below 0 (unable to outperform predicted score) and just above 0.1 (able to outperform predicted score). Positive scores are indicative of outperforming a selective attending strategy.

The groups also did not significantly differ on their ca-BAT scores (M_Selective = .530 ± .822, M_Dual = .753 ± .958, t(37) = −.780, p = .441, d = −.250).

When comparing the effect of attending condition (F(2,62) = 47.970, p < .001, η_p² = .607) on behavioral performance of the two groups, a clear interaction between attending condition and attender type group was observed (F(2,62) = 14.353, p < .001, η_p² = .097, see Figure 10), which persisted after correcting for their ca-BAT scores, years of musical experience, and tapping precision. There was no main effect of the grouping by attender type (F(1,31) = 1.341, p = .256, η_p² = .041).

The interaction (Figure 10) shows that the group that did not outperform their own predicted score (i.e., selective attenders) was very accurate in the selective attending condition to the point that their performance was similar (i.e., not significantly different) to their performance during single track listening (t(19) = .237, p_bonf = 1.000). This performance for selective attending is significantly higher than that of the group that did outperform their predicted score (i.e., dual attenders) (t(1,37) = 3.573, p < .001, but not in post-hoc comparisons p_bonf = .309). We refer to this group as the ‘Selective attenders’, since they seem to be very capable of segregating out the irrelevant stream and selectively attending to the relevant stream.

Figure 10.

Interaction between group and the attending conditions for performance on the probe-tone task, split by group (black = dual attenders, white = selective attenders).

The group that outperformed their predicted score also clearly outperformed selective attenders on the dual attending task (t(1,37) = −3.855, p_bonf = .004). Their dual attending performance was even comparable to their selective attending performance (t(18) = 2.161, p_bonf = .518), indicating that this group is very good at simultaneously tracking the two streams, potentially integrating them (at the cost of selectivity). Therefore, this group is termed the ‘Dual attenders’.

SSEP Intermodulation

While an ANOVA on the two attending type groups and the two orders of intermodulation frequencies did not show any effect of the order (first- vs second-order, F(1,34) < .001, p = .706, η_p² = .004), it did show a significant effect of the attending type group (F(1,34) = 16.357, p < .001, η_p² = .198). There was no interaction between intermodulation order and group (F(1,34) < .001, p = .539, η_p² = .011). Post-hoc testing for the two intermodulation frequencies showed significant differences between selective and dual attenders on both frequencies (Figure 11).

Figure 11.

(a) The magnitude of the first intermodulation frequency split by group. (b) The magnitude of the second intermodulation frequency split by group.

The dual attenders demonstrated significantly higher SSEP amplitudes at the intermodulation frequencies of the tracks (4.64 Hz = A + B and 6.55 Hz = 2A + B) than the selective attenders (U = 90.00, p = .017, effect size = −.444, and U = 72.50, p = .003, effect size = −.552, respectively). Because neither the normality nor the equal-variance assumption was met, a non-parametric Mann–Whitney U test was conducted.
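The two intermodulation frequencies follow directly from the beat rates of the tracks as sums of integer multiples, n·f_A + m·f_B. A minimal sketch of that arithmetic (the function name and the rounding to two decimals are illustrative choices):

```python
F_A = 1.91  # beat rate of track A in Hz
F_B = 2.73  # beat rate of track B in Hz

def intermodulation(n, m, f1=F_A, f2=F_B):
    """Intermodulation frequency n*f1 + m*f2 in Hz, rounded to two decimals."""
    return round(n * f1 + m * f2, 2)

first_order = intermodulation(1, 1)   # A + B
second_order = intermodulation(2, 1)  # 2A + B
```

These give 4.64 Hz and 6.55 Hz, the frequencies labeled first- and second-order in this section. Neither coincides with a fundamental of the two tracks, which is what allows a response at these frequencies to be read as a marker of integration rather than of tracking either stream alone.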

No significant differences between the dual attenders and selective attenders were observed for the tapping tasks.

Discussion

In this study, we assessed people's ability to simultaneously track multiple periodic streams containing beat patterns that cannot be perceived according to a single metric framework. Participants listened to one or two isochronous beat patterns simultaneously, in which every other tone was louder, in three conditions: single stream, selectively attending to one of the streams, and simultaneously attending (dual attending) to both streams. After 30 s they judged whether a probe tone at the end would have coincided with the beat of one of the beat patterns (A or B), or neither, indicating whether they were tracking the pattern's periodicity. Using this paradigm, our behavioral results show that, while performance on dual attending is overall worse than selective attending and single stream listening, people are able to perform the dual attending task above chance levels, indicating that they can at least extract and evaluate temporal information from both periodic streams during the 30 s of the trial. Our neurophysiological data, i.e., SSEPs at the beat patterns’ periodicities in the EEG, show that both patterns are at least passively tracked. However, no effects of (selective) attention were observed. We found no effect of musical training or beat perception abilities, but participants showed some individual differences in their ability to selectively or dually attend to the two tracks.

Behavioral Tracking

In addition to performing above chance level, participants also outperformed a selective attending strategy in the dual attending condition. Thus, our participants demonstrate the ability to track period information from two independent, metrically unrelated patterns when simultaneously presented. This implies that, in certain circumstances, people might employ multiple uncoupled time-keepers to keep track of different parts of the auditory scene, such as a musical piece. This might benefit certain types of musicians whose practice involves tracking multiple rhythms, such as drummers, conductors, or DJs.

Poudrier and Repp (2013) have previously suggested that attending to two metrically unrelated auditory streams in parallel, by equally splitting attention across the streams, is likely impossible. They did note that the rhythmic complexity of their non-metrically related auditory streams might have confounded their results. Hence, we used simple isochronous sequences in the streams with fixed frequency registers, which Demany et al. (2015) suggested may aid in the streams’ parallel tracking. Contrary to Poudrier and Repp (2013) and in line with Demany et al. (2015), our results show that temporal information can be processed from simultaneously occurring non-metrically related periodicities. This suggests that parallel attention might be occurring, or that it may not be required for keeping track of the overall periodicities. Mechanisms other than parallel attention might be at play, such as rapidly switching attention between the two streams, or holding temporal intervals in short-term memory.

An alternative explanation for why we find performance above chance level and above the ‘predicted’ performance from a selective attending strategy, where Poudrier and Repp (2013) did not, may be our use of a third response option. In forced choice tasks like ours and that of Poudrier and Repp (2013), the chance level is determined by the number of choices. In our task, when attending to both streams, participants did not simply choose between A or B or ‘on’ versus ‘off’ beat as in Poudrier and Repp's study, but also had the third option of ‘neither’. By increasing the number of choices, we reduced the chance level. In a protocol of only 12 trials, this allowed our participants to outperform random chance and the ‘predicted performance’ under a random selective attending strategy. Hence, our design with a lower chance level might be more suitable for finding task performance above chance, as fewer correct answers can be attributed to luck.
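The effect of the lower chance level on a 12-trial protocol can be illustrated with the binomial distribution. The threshold of 8 correct responses is a hypothetical example, not a criterion used in the study, and the calculation assumes independent guesses with equal probability for each response option.

```python
from math import comb

def p_at_least(k, n, p):
    """Probability of k or more successes in n independent trials with success rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

N_TRIALS = 12
p_two_choice = p_at_least(8, N_TRIALS, 1 / 2)    # guessing between two options
p_three_choice = p_at_least(8, N_TRIALS, 1 / 3)  # guessing among three options
```

Under these assumptions, a pure guesser reaches 8 or more correct responses about 19% of the time with two options but only about 2% of the time with three, so the same raw score is much stronger evidence of above-chance performance in the three-option design.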

Adding a third response option for the dual attending condition can also be considered a limitation, as the task is now not identical in all attending conditions. A follow-up study could address this by using a 50/50 guess between ‘on’ and ‘off’ beat. Another important difference between Poudrier and Repp (2013) and the current experiment is in the specific stimuli. Poudrier and Repp's stimuli were made up of non-isochronous rhythmic patterns, each of which was supported by a different metric framework. The current stimuli, as described, are made up of two isochronous sequences at different rates, in which loudness is used to create a duple pattern. Both sequences thus support a duple meter, albeit at different rates; they do not support categorically different metrical frameworks, as Poudrier and Repp's stimuli did. Finally, our differing results on behavioral tracking could be due to Poudrier and Repp's (2013) very small sample sizes (n = 10), which included (some of) the authors of the paper. Besides the fact that findings from small samples might prove difficult to replicate, our finding of different ability groups among our participants adds further complexity to the reliance on small sample sizes; such small samples are more likely to randomly contain low proportions of dual attenders.

Neural Tracking

The neural tracking of the simultaneous beat patterns in the periodic streams was not attenuated by attention. Single stream listening resulted in greater amplitudes at the beat frequency than any simultaneous stream presentation (selective and dual attending). A reduction of amplitude is expected when attention has to be directed towards or divided between multiple sources. However, this effect was found only for track A (114 bpm) and not for track B (163 bpm). Furthermore, within each selective attending condition, there were no significant differences between the SSEP amplitudes of A and B, regardless of which one was attended to. Moreover, SSEP amplitudes also did not significantly differ when comparing A as the attended track and the unattended track, across selective attending conditions. This was also true for the SSEPs to track B. Therefore, interpretation beyond low-level hearing cannot be substantiated (Nave et al., 2022).

Furthermore, no significant differences were found between selective and dual attending SSEP amplitudes. Together, the EEG results indicate similar neural responses to the beat in the attended and unattended streams during selective attending, as well as simultaneously attended streams. The steady-state response to more than one periodic stream was thus not enhanced by selectively attending to one, compared to equally attending to both. This could imply that unattended auditory streams are still monitored (e.g., Aydelott et al., 2015; Hutmacher & Kuhbandner, 2020) for temporal information in a similar way to attended streams. The only evidence we found in support of attentional effects was a negative correlation between tapping precision and the amplitude of the first-order SSEP integration frequency during selective tapping conditions. This suggests that neural integration of the two tracks might hinder the tracking of the beat when selective attending is required. However, since the correlation was only found when attending to track B, and not A, and there were no further correlations supporting this interpretation, we cannot conclude any clear attentional effects on SSEPs in our study.

The lack of attentional effects on SSEPs may not seem in line with traditional frequency tagging approaches in the visual domain that show attentional enhancement and suppression (e.g., Toffanin et al., 2009; Gulbinaite et al., 2017; Quek et al., 2018). However, in the auditory domain, the effect of attentional enhancement and suppression is less established (e.g., Bharadwaj et al., 2014). While some studies have found some modulation of the brain's response by auditory attention, they are not simple changes in magnitude of the SSEP (Linden et al., 1987). The findings are complex and mixed (Tiitinen et al., 1993), confounded by interhemispheric differences (Bidet-Caulet et al., 2007; Müller et al., 2009; Saupe et al., 2009). A recent study by Lenc et al. (2020) also showed no difference between two auditory attention tasks: attending to tempo versus pitch. In line with our findings, they report that the auditory SSEP was only attenuated when attention was directed at a different modality. Even when the visual modality was attended, the relative strength of beat-related frequencies was the same as in the auditory attention conditions, only the overall gain in the auditory response was reduced. We might thus suggest that the auditory SSEP of beat-related frequencies is not sensitive to additional selective attentional enhancement, as beat-related frequencies have been shown to be selectively enhanced, regardless of attentional tasks (Lenc et al., 2020; Nozaradan et al., 2012). Perhaps timing plays such a crucial role in our environment that temporal information is always processed to a high degree, even without overt attention.

However, it is also important to consider that the task might not have required as much continuous attention as intended. Given that the stimulus is 30 s long, the phase aligns (perceptually) between the two tracks a total of 11 times during a trial. If this alignment could be heard and quickly encoded by participants, they might not need to pay attention until the final couple of alignments. This could explain why we did not observe large attentional differences. This potential fluctuation in attention should be controlled for in future studies, for example by inserting probe tones at randomly allocated points in the stimuli, rather than always at the end.

Attention to Tracks A and B

Importantly, we also found no difference in neural tracking strength between the two periodic streams. We found no evidence that either of the beat patterns in the streams evokes a significantly stronger neural response. While track A (114 bpm) is closer to what is traditionally considered humans' preferred movement frequency (2 Hz, e.g., Fraisse, 1974), it did not result in stronger neural tracking. Our tapping results also did not show a preference for either beat pattern during the free tapping, when both beat patterns were presented, and participants were free to tap along in whichever way they felt was most suitable. Tapping performance also did not show a clear advantage of one of the beat patterns. Participants tapped closer to the beat for pattern A, but were equally consistent for both patterns, showing a similar lack of clear preference in the movement responses as the neural responses. This suggests neither beat pattern attracted more attention simply because of the stimulus design.

Nonetheless, the more accurate tapping to pattern A indicates a potential (perceptual) difference between the patterns that might be addressed in future studies by counterbalancing the tempo (i.e., each pattern being played twice, at 114 and 163 bpm). This would require further training of the participants and might prove to be practically challenging. Another potential limitation to the stimuli is that the loudness accentuation was not validated for its capacity to induce meter in the listeners. Since we did not ask the participants about the perceived meter, we cannot be entirely sure that the duple meter was successfully induced. While loudness accentuation has been shown to work before (e.g., Grahn & Rowe, 2009), it has not been used in a polyrhythmic context.

Dual Attenders vs Selective Attenders

We identified two distinct groups of participants based on their performance on the dual attending task: ‘Selective attenders’, who are better at extracting period information when selectively attending to a single beat pattern compared to dual attending, and ‘dual attenders’, who are better at dual attending than the selective attenders.

The groups were identified and separated based on whether their task performance in the dual attending condition outperformed their predicted score based on a selective attending strategy. It is important to note that a high selective attending score would make it more difficult for a participant to outperform their predicted score in the dual attending condition. For example, 100% accuracy (R = 1) during selective attending would result in a prediction of ((3 trials × 1) + (1 × 9 trials × .5 chance)) / 12 trials = 7.5/12 correct responses, whereas 80% accuracy (R = .8) would result in a prediction of ((3 trials × .8) + (.8 × 9 trials × .5 chance)) / 12 trials = 6/12 correct responses. Therefore, doing well in the selective attending condition increases the predicted score for the dual attending condition, and thus makes it harder to outperform compared to someone who performed poorly in the selective attending condition.
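The worked examples above can be reproduced with a small helper. The split into 3 and 9 trials and the .5 chance factor are taken directly from the examples in this paragraph; the function and parameter names are hypothetical, and the rationale for the split is defined in the methods rather than here.

```python
def predicted_dual_correct(r_selective, n_direct=3, n_guess=9, chance=0.5):
    """Predicted number of correct dual attending trials under a selective strategy.

    r_selective is the proportion correct (R) in the selective attending condition.
    """
    return n_direct * r_selective + r_selective * n_guess * chance

TOTAL_TRIALS = 12
predicted_perfect = predicted_dual_correct(1.0)  # R = 1
predicted_good = predicted_dual_correct(0.8)     # R = .8
```

This yields 7.5 and 6 predicted correct responses out of 12, respectively. Because the prediction scales linearly with R, every gain in selective attending accuracy raises the bar that dual attending performance has to clear.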

However, if selective attenders were just ‘better’ at the task in general than the dual attenders, making it harder to outperform their predicted scores, then the interaction we found between the group and task performance (shown in Figure 10) would not appear. The interaction shows that selective attenders were worse than dual attenders at the dual attending task. If the selective attenders were simply ‘better’ at the task overall, they would have performed well during selective attending and might not have outperformed their own predicted scores, but would still have outperformed the dual attenders in general, and thus on the dual attending task as well. The interaction shows, however, that the selective attenders perform significantly worse than the dual attenders during dual attending. This suggests that selectively attending and dual attending may be two separate skills. These separate skills may also explain the somewhat mixed findings regarding the ability to track multiple non-metrically related beats in the previous literature.

The existence of these two groups also supports the view that dual attending is an incredibly challenging task, to the point that half of the participants cannot attend to more than one stream simultaneously when explicitly instructed to do so. However, we should not ignore that despite the task being challenging, some people seem to be able to dually attend to the two patterns. Poudrier and Repp (2013) also noted that some individual participants performed above chance level in their task. It would be particularly interesting for future research to study what it is about these participants or their listening strategies that makes them able to do this, especially because our participants’ SSEPs of the individual streams did not differ between the groups.

While the selective and dual attenders did not differ in their neural tracking of the fundamental beat frequencies in the periodic streams, the SSEP magnitude of the intermodulation frequencies was larger for the dual attenders group. This suggests that the non-linear integration of the two periodicities that is happening at the cortical level, even if there is no perceptible stable composite rhythm, is stronger for the dual attenders than the selective attenders. The fact that both the first- and second-order integration frequencies were stronger in dual attenders suggests that they integrated the auditory streams more effectively at both early and intermediate stages of processing (Alp et al., 2017; Varlet et al., 2020). The correlation between the amplitude of the second-order integration frequency and dual attending performance was also found when both groups were combined, but the group split analysis suggests this was driven by the dual attenders.

The split between selective and dual attenders was not confounded by musicianship: there were roughly equal numbers of musicians in each group. There were also no effects of musical expertise in our sample. Musicians performed similarly to non-musicians on the task and showed similar neural tracking (SSEP amplitudes) and movement responses. This might be due to the relatively simple patterns used in the task; more complex polyrhythms might have better highlighted an effect of musicianship. The lack of difference might also be due to the heterogeneity of the musician group, i.e., many different kinds of musicians were grouped together. This is also evidenced by the lack of difference in ca-BAT scores between musicians and non-musicians. Previous studies suggest that certain musician groups, such as percussionists, have better perceptual and motor timing abilities (Repp & Su, 2013). We were unable to systematically test whether type of musician influenced our results in this study. Nonetheless, it seems relevant to musical skills that segregating and integrating multiple periodic streams appear to be two independent skills, especially for those who might use simultaneous beat tracking in their musical practice (conductors, DJs, drummers). They might develop a particular expertise for dual attending (or switching) that allows them to keep track of multiple periodicities, potentially at the expense of selectively attending and blocking out other parts of the sound.

While broad musicianship may not explain the difference between selective attenders and dual attenders, studies suggest that auditory stream segregation is constrained by individual capacities, such as working memory limits (Lotfi et al., 2016) and attentional resource allocation (Heinrich et al., 2008). Lower working memory capacity correlates with reduced stream segregation (Lotfi et al., 2016). Working memory is also implicated in anticipating auditory events in a series (Colley et al., 2018) and producing rhythmic intervals from memory (Grahn & Schuit, 2012). Working memory thus plays a role in perceiving and producing rhythms, and likely affects the ability to perceive multiple rhythms or hold multiple time intervals in memory at once. In future studies, individuals’ performance on (auditory) working memory tasks, such as auditory digit span tasks (Hilbert et al., 2015) or rhythm span tasks (Schaal et al., 2014), should be controlled for, or correlated with their ability to selectively attend a single beat pattern embedded in simultaneous auditory streams.

Here, we show that people can track two simultaneous isochronous patterns that cannot easily map onto a single metric framework (their ratio approximates 8:11). To further understand whether people can become entrained to such patterns, the persistence of the beat frequency in neural activity could be measured in silent breaks, or during listening to non-isochronous (syncopated) beat patterns. Syncopated rhythms do not always have acoustic events at time points where a beat is perceived (as in Lenc et al., 2020). Therefore, peaks in neural activity at the beat frequency during silent breaks and syncopated rhythms will be less dependent on evoked responses and can thus be more confidently attributed to entrainment. Moreover, entrainment could be assessed with mismatch negativities and behavioral responses after silent breaks (e.g., Stupacher et al., 2016). If neural activity at the beat frequency persists during silence after presentation of the (syncopated) rhythm(s), or mismatch negativities are larger when the rhythm is re-introduced off-beat than on-beat after the silent breaks, that could be taken as evidence of entrainment to the induced beat rather than a result of active tracking of auditory events. Thus, using non-isochronous beat patterns and silent periods would help distinguish between active tracking of auditory events and induced entrainment to the simultaneous periodic streams.
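To give a sense of why a ratio near 8:11 resists a single metric framework, the following sketch computes how long two isochronous streams that start together take to coincide again. The inter-onset intervals are hypothetical round numbers chosen to give exact 2:3 and 8:11 ratios (the study's patterns only approximate 8:11), and `shared_cycle` is an illustrative helper, not part of the reported analysis.

```python
from math import gcd


def shared_cycle(ioi_a_ms, ioi_b_ms):
    """Return (beats_a, beats_b, cycle_ms) for the shortest span after which
    two isochronous streams that start together coincide again."""
    cycle_ms = ioi_a_ms * ioi_b_ms // gcd(ioi_a_ms, ioi_b_ms)  # least common multiple
    return cycle_ms // ioi_a_ms, cycle_ms // ioi_b_ms, cycle_ms


# Hypothetical inter-onset intervals in ms (assumed for illustration only).
simple_ratio = shared_cycle(600, 400)    # a familiar 2:3 polyrhythm
complex_ratio = shared_cycle(550, 400)   # an exact 8:11 relationship

print("2:3  ->", simple_ratio)
print("8:11 ->", complex_ratio)
```

With these assumed intervals, the 2:3 pair realigns after two and three beats, whereas the 8:11 pair needs eight and eleven beats and a shared cycle of several seconds. When the ratio only approximates 8:11, as in the present stimuli, the streams need not realign exactly even at that point.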

Conclusion

We found behavioral evidence that people can extract period information from two non-metrically related isochronous beat patterns that are playing simultaneously, and that there are individual differences in the ability to attend to the two beat patterns simultaneously. Neural correlates suggest that the ability to attend simultaneously could be linked to higher-order integration of period information within the beat patterns.

Supplemental Material

Supplemental material, sj-docx-1-mns-10.1177_20592043261435532 for Simultaneous Beat Tracking of Two Auditory Rhythmic Periodicities by Patti Nijhuis and Maria A. G. Witek in Music & Science

Footnotes

Acknowledgments

The authors want to thank Rhys Yewbrey for his work on the tapping analysis. This research was supported by UK Research and Innovation (AH/W000954/1) and the Research Council of Finland (346210).

ORCID iDs

Patti Nijhuis

Maria Witek

Author Contribution Statement

P. Nijhuis contributed to concept, design, data acquisition, analysis, interpretation, drafted the first manuscript, gave final approval, and agrees to be accountable for all aspects of the work ensuring integrity and accuracy.

M. Witek contributed to concept, design, data acquisition, interpretation, critically revised the first manuscript, gave final approval, and agrees to be accountable for all aspects of the work ensuring integrity and accuracy.

Ethical Approval Statement

All participants provided written informed consent prior to the experiment, which was approved by the University of Birmingham Humanities and Social Sciences (HASS) ethics committee under number ERN_21-1258.

Ethical approval: ERN_21-1258 approved by the University of Birmingham's College of Arts and Law review. All participants provided written informed consent prior to participation.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by UK Research and Innovation (Arts and Humanities Research Council) (grant number UKRI AH/W000954/1) and the Research Council of Finland (grant number 346210).

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability

Data and statistical analyses are available on OSF (Nijhuis & Witek, 2025): https://osf.io/zrwmy/files/osfstorage

Action Editor

Jessica Grahn, Western University, Brain and Mind Institute and Department of Psychology.

Peer Review

Karli Nave, University of Michigan, Kresge Hearing Research Institute, Department of Otolaryngology – Head and Neck Surgery. Eve Poudrier, University of British Columbia, School of Music.

References

Alp, Nikolaev, A. R., Wagemans, & Kogo (2017). EEG frequency tagging dissociates between neural processing of motion synchrony and human quality of multiple point-light dancers. Scientific Reports, 7(January), 1–9. https://doi.org/10.1038/srep44012

Aydelott, Jamaluddin, & Nixon Pearce (2015). Semantic processing of unattended speech in dichotic listening. The Journal of the Acoustical Society of America, 138(2), 964–975. https://doi.org/10.1121/1.4927410

Batschelet (1981). Circular Statistics in Biology. London: Academic Press.

Benocci & Calcus (2024). Stream segregation, musical abilities, and the development of speech perception in noise. JASA Express Letters, 4(12), 210. https://doi.org/10.1121/10.0034543

Bharadwaj, H. M., Verhulst, Shaheen, Liberman, M. C., & Shinn-Cunningham, B. G. (2014). Cochlear neuropathy and the coding of supra-threshold sound. Frontiers in Systems Neuroscience, 8. https://doi.org/10.3389/fnsys.2014.00026

Bidet-Caulet, Fischer, Besle, Aguera, P.-E., Giard, M.-H., & Bertrand (2007). Effects of selective attention on the electrophysiological representation of concurrent sounds in the human auditory cortex. Journal of Neuroscience, 27(35), 9252–9261. https://doi.org/10.1523/JNEUROSCI.0644-07.2007

Birkett, E. E., & Talcott, J. B. (2012). Interval timing in children: Effects of auditory and visual pacing stimuli and relationships with reading and attention variables. PLoS ONE, 7(8), e42820. https://doi.org/10.1371/journal.pone.0042820

Bregman, A. S., & Campbell (1971). Primary auditory stream segregation and perception of order in rapid sequences of tones. Journal of Experimental Psychology, 89(2), 244. https://doi.org/10.1037/h0031163

Colley, I. D., Keller, P. E., & Halpern, A. R. (2018). Working memory and auditory imagery predict sensorimotor synchronisation with expressively timed music. Quarterly Journal of Experimental Psychology, 71(8), 1781–1796. https://doi.org/10.1080/17470218.2017.1366531

Conway, A. R. A., Cowan, & Bunting, M. F. (2001). The cocktail party phenomenon revisited: The importance of working memory capacity. Psychonomic Bulletin & Review, 8(2), 331–335. https://doi.org/10.3758/BF03196169

Dalton, Santangelo, & Spence (2009). The role of working memory in auditory selective attention. Quarterly Journal of Experimental Psychology, 62(11), 2126–2132.

Deike, Heil, Böckmann-Barthel, & Brechmann (2012). The build-up of auditory stream segregation: A different perspective. Frontiers in Psychology, 3, 461. https://doi.org/10.3389/fpsyg.2012.00461

Demany, Erviti, & Semal (2015). Auditory attention is divisible: Segregated tone streams can be tracked simultaneously. Journal of Experimental Psychology: Human Perception and Performance, 41(2), 356–363. https://doi.org/10.1037/a0038932

Engle, R. W. (2002). Working Memory Capacity as Executive Attention. Current Directions in Psychological Science, 11(1), 19–23. https://doi.org/10.1111/1467-8721.00160

Fraisse (1974). Psychologie du rythme. Paris: Presses Universitaires de France.

Grahn, J. A., & Rowe, J. B. (2009). Feeling the beat. Premotor and striatal interactions in musicians and non-musicians during beat perception. The Journal of Neuroscience, 29(3), 7540–7548. https://doi.org/10.1523/JNEUROSCI.2018-08.2009

Grahn, J. A., & Schuit, D. (2012). Individual differences in rhythmic ability: Behavioral and neuroimaging investigations. Psychomusicology: Music, Mind, and Brain, 22(2), 105–121. https://doi.org/10.1037/a0031188

Gulbinaite, van Viegen, Wieling, Cohen, M. X., & VanRullen (2017). Individual Alpha Peak Frequency Predicts 10 Hz Flicker Effects on Selective Attention.

Harding, E. E., Sammler, Henry, M. J., Large, E. W., & Kotz, S. A. (2019). Cortical tracking of rhythm in music and speech. NeuroImage, 185(October 2018), 96–101. https://doi.org/10.1016/j.neuroimage.2018.10.037

Harrison, P. M., & Müllensiefen (2018). Development and validation of the computerised adaptive beat alignment test (CA-BAT). Scientific Reports, 8(1), 12395.

Heinrich, A., Schneider, B. A., & Craik, F. I. M. (2008). Investigating the Influence of Continuous Babble on Auditory Short-Term Memory Performance. Quarterly Journal of Experimental Psychology, 61(5), 735–751. https://doi.org/10.1080/17470210701402372

Hilbert, Nakagawa, T. T., Puci, Zech, & Bühner (2015). The digit span backwards task. European Journal of Psychological Assessment, 31(3), 174–180. https://doi.org/10.1027/1015-5759/a000223

Honing (2013). Structure and interpretation of rhythm in music. In Deutsch (Ed.), Psychology of music (3rd ed., pp. 369–404). London: Academic Press. https://doi.org/10.1016/B978-0-12-381460-9.00009-2

Hutmacher & Kuhbandner (2020). Detailed long-term memory for unattended, irrelevant, and incidentally encoded auditory information. Journal of Experimental Psychology: General, 149(2), 222–229. https://doi.org/10.1037/xge0000650

Jacques, Retter, T. L., & Rossion (2016). A single glance at natural face images generate larger and qualitatively different category-selective spatio-temporal signatures than other ecologically-relevant categories in the human brain. NeuroImage, 137, 21–33. https://doi.org/10.1016/j.neuroimage.2016.04.045

Jagacinski, R. J., Marshburn, Klapp, S. T., & Jones, M. R. (1988). Tests of parallel versus integrated structure in polyrhythmic tapping. Journal of Motor Behavior, 20(4), 416–442. https://doi.org/10.1080/00222895.1988.10735455

Keller, P. E., & Burnham, D. K. (2005). Musical meter in attention to multipart rhythm. Music Perception, 22(4), 629–661. https://doi.org/10.1525/mp.2005.22.4.629

Large, E. W., & Palmer (2002). Perceiving temporal regularity in music. Cognitive Science, 26(1), 1–37. https://doi.org/10.1207/s15516709cog2601_1

Lenc, Keller, P. E., Varlet, & Nozaradan (2018). Neural tracking of the musical beat is enhanced by low-frequency sounds. Proceedings of the National Academy of Sciences, 115(32), 8221–8226. https://doi.org/10.1073/pnas.1801421115

Lenc, Keller, P. E., Varlet, & Nozaradan (2020). Attention affects overall gain but not selective contrast at meter frequencies in the neural processing of rhythm. bioRxiv, 2020.09.23.309443. https://doi.org/10.1101/2020.09.23.309443

Levitin, D. J., Grahn, J. A., & London (2018). The psychology of music: Rhythm and movement. Annual Review of Psychology, 69(1), 51–75. https://doi.org/10.1146/annurev-psych-122216-011740

Linden, R. D., Picton, T. W., Hamel, & Campbell, K. B. (1987). Human auditory steady-state evoked potentials during selective attention. Electroencephalography & Clinical Neurophysiology, 66(2), 145–159. https://doi.org/10.1016/0013-4694(87)90184-2

London (2012). Hearing in time: Psychological aspects of musical meter. Oxford: Oxford University Press.

Lotfi, Mehrkian, Moossavi, Zadeh, S. F., & Sadjedi (2016). Relation between working memory capacity and auditory stream segregation in children with auditory processing disorder. Iranian Journal of Medical Sciences, 41(2), 110.

Loui & Guetta, R. E. (2019). Music and attention, executive function, and creativity. In Thaut, M. H., & Hodges, D. A. (Eds.), The Oxford handbook of music and the brain (pp. 263–284). Oxford: Oxford University Press.

Loui & Wessel (2007). Harmonic expectation and affect in Western music: Effects of attention and training. Perception & Psychophysics, 69(7), 1084–1092. https://doi.org/10.3758/BF03193946

Mehr, S. A., Singh, Knox, Ketter, D. M., Pickens-Jones, Atwood, Lucas, Jacoby, Egner, A. A., Hopkins, E. J., Howard, R. M., Hartshorne, J. K., Jennings, M. V., Simson, Bainbridge, C. M., Pinker, O'Donnell, T. J., Krasnow, M. M., & Glowacki (2019). Universality and diversity in human song. Science, 366(6468), eaax0868. https://doi.org/10.1126/science.aax0868

Merchant, Grahn, Trainor, Rohrmeier, & Fitch, W. T. (2015). Finding the beat: A neural perspective across humans and non-human primates. Philosophical Transactions of the Royal Society B: Biological Sciences, 370(1664), 20140093. https://doi.org/10.1098/rstb.2014.0093

Miller, G. A., & Heise, G. A. (1950). The trill threshold. The Journal of the Acoustical Society of America, 22(5), 637–638. https://doi.org/10.1121/1.1906663

Møller, Stupacher, Celma-Miralles, & Vuust (2021). Beat perception in polyrhythms: Time is structured in binary units. PLoS ONE, 16(8), e0252174. https://doi.org/10.1371/journal.pone.0252174

Moossavi, Mehrkian, Lotfi, & Adjedi (2015). The effect of working memory training on auditory stream segregation in auditory processing disorders children. Iranian Rehabilitation Journal, 13(1), 27–22.

Müller, Schlee, Hartmann, Lorenz, & Weisz (2009). Top-down modulation of the auditory steady-state response in a task-switch paradigm. Frontiers in Human Neuroscience, 3, 429. https://doi.org/10.3389/neuro.09.001.2009

Nave, K. M., Hannon, E. E., & Snyder, J. S. (2022). Steady state-evoked potentials of subjective beat perception in musical rhythms. Psychophysiology, 59(2), e13963. https://doi.org/10.1111/psyp.13963

Nguyen, Sidhu, R. K., Everling, J. C., Wickett, M. C., Gibbings, & Grahn, J. A. (2022). Beat Perception and Production in Musicians and Dancers. Music Perception: An Interdisciplinary Journal, 39(3), 229–248. https://doi.org/10.1525/mp.2022.39.3.229

Nijhuis, Møller, Bamford, J. S., & Stupacher (2026). Polyrhythm Perception and Production: A Scoping Review. PsyArXiv. https://doi.org/10.31234/osf.io/ymt5v_v1

Nijhuis & Witek, M. A. G. (2025). Simultaneous beat tracking. OSF. https://doi.org/10.17605/OSF.IO/ZRWMY

Norcia, A. M., Appelbaum, L. G., Ales, J. M., Cottereau, B. R., & Rossion (2015). The steady-state visual evoked potential in vision research: A review. Journal of Vision, 15(6), 4. https://doi.org/10.1167/15.6.4

Nozaradan (2014). Exploring how musical rhythm entrains brain activity with electroencephalogram frequency-tagging. Philosophical Transactions of the Royal Society B: Biological Sciences, 369(1658), 20130393. https://doi.org/10.1098/rstb.2013.0393

Nozaradan, Peretz, & Keller, P. E. (2016). Individual differences in rhythmic cortical entrainment correlate with predictive behavior in sensorimotor synchronization. Scientific Reports, 6(1), 20612. https://doi.org/10.1038/srep20612

Nozaradan, Peretz, Missal, & Mouraux (2011). Tagging the neuronal entrainment to beat and meter. Journal of Neuroscience, 31(28), 10234–10240. https://doi.org/10.1523/JNEUROSCI.0411-11.2011

Nozaradan, Peretz, & Mouraux (2012). Selective neuronal entrainment to the beat and meter embedded in a musical rhythm. Journal of Neuroscience, 32(49), 17572–17581. https://doi.org/10.1523/JNEUROSCI.3203-12.2012

Oostenveld, Fries, Maris, & Schoffelen, J. M. (2011). FieldTrip: Open source software for advanced analysis of MEG, EEG, and invasive electrophysiological data. Computational Intelligence and Neuroscience, 2011(1).

Poudrier & Repp, B. H. (2013). Can musicians track two different beats simultaneously? Music Perception, 30(4), 369–390. https://doi.org/10.1525/mp.2013.30.4.369

Quek, G., Nemrodov, D., Rossion, B., & Liu-Shuang, J. (2018). Selective Attention to Faces in a Rapid Visual Stream: Hemispheric Differences in Enhancement and Suppression of Category-selective Neural Activity. Journal of Cognitive Neuroscience, 30(3), 393–410. https://doi.org/10.1162/jocn_a_01220

Rathcke, Smit, Zheng, & Canzi (2024). Perception of temporal structure in speech is influenced by body movement and individual beat perception ability. Attention, Perception, & Psychophysics, 1–17.

Repp, B. H., & Su, Y. H. (2013). Sensorimotor synchronization: A review of recent research (2006–2012). Psychonomic Bulletin & Review, 20(3), 403–452. https://doi.org/10.3758/s13423-012-0371-2

Saupe, Schröger, Andersen, S. K., & Müller, M. M. (2009). Neural mechanisms of intermodal sustained selective attention with concurrently presented auditory and visual stimuli. Frontiers in Human Neuroscience, 3, 688. https://doi.org/10.3389/neuro.09.058.2009

Schaal, N. K., Banissy, M. J., & Lange (2014). The rhythm span task: Comparing memory capacity for musical rhythms in musicians and non-musicians. Journal of New Music Research, 44(1), 3–10. https://doi.org/10.1080/09298215.2014.937724

Singer, Jacoby, Hendler, & Granot (2023). Feeling the beat: Temporal predictability is associated with ongoing changes in music-induced pleasantness. Journal of Cognition, 6(1), 34. https://doi.org/10.5334/joc.286

Snyder, J. S., & Large, E. W. (2005). Gamma-band activity reflects the metric structure of rhythmic tone sequences. Cognitive Brain Research, 24(1), 117–126. https://doi.org/10.1016/j.cogbrainres.2004.12.014

Stupacher, Witte, Hove, M. J., & Wood (2016). Neural entrainment in drum rhythms with silent breaks: Evidence from steady-state evoked and event-related potentials. Journal of Cognitive Neuroscience, 28(12), 1865–1877. https://doi.org/10.1162/jocn_a_01013

Stupacher, Wood, & Witte (2017). Neural entrainment to polyrhythms: A comparison of musicians and non-musicians. Frontiers in Neuroscience, 11(APR), 1–17. https://doi.org/10.3389/fnins.2017.00208

Talamini, Carretti, & Grassi (2016). The Working Memory of Musicians and Nonmusicians. Music Perception, 34(2), 183–191. https://doi.org/10.1525/mp.2016.34.2.183

Thaut, M. H., McIntosh, G. C., & Hoemberg (2015). Neurobiological foundations of neurologic music therapy: Rhythmic entrainment and the motor system. Frontiers in Psychology, 5, 1185. https://doi.org/10.3389/fpsyg.2014.01185

Tiitinen, H. T., Sinkkonen, Reinikainen, Alho, Lavikainen, & Näätänen (1993). Selective attention enhances the auditory 40-Hz transient response in humans. Nature, 364(6432), 59–60. https://doi.org/10.1038/364059a0

Toffanin, de Jong, Johnson, & Martens (2009). Using frequency tagging to quantify attentional deployment in a visual divided attention task. International Journal of Psychophysiology, 72(3), 289–298. https://doi.org/10.1016/j.ijpsycho.2009.01.006

van Noorden, L. P. A. S. (1975). Temporal coherence in the perception of tone sequences.

Varlet, Nozaradan, Nijhuis, & Keller, P. E. (2020). Neural tracking and integration of ‘self’ and ‘other’ in improvised interpersonal coordination. NeuroImage, 206, 116303. https://doi.org/10.1016/j.neuroimage.2019.116303