Abstract
Rhythm perception and production can be disrupted by neurological or neurodevelopmental disorders (e.g., Parkinson’s disease, dyslexia). Rhythm deficits are associated with poor performance in language, attention, and working memory tasks. Re-training rhythmic skills may thus provide a promising avenue for improving these associated cognitive functions. To this end, here we present a new protocol for selective training of rhythmic skills implemented in a tablet serious game called Rhythm Workers. Experiment 1 served to select 54 musical excerpts based on the tapping performance of 18 non-musicians who moved to the beat of music. The excerpts were sorted in terms of the difficulty of tracking their beat, and assigned to different difficulty levels in the game. In Experiment 2, the training protocol was devised and tested in a proof-of-concept study, including two versions of the game. One version (tapping version) required a synchronized motor response (via tapping), while the other (perception version) asked for a perceptual judgment. Ten participants were trained with one version and 10 with the other version of Rhythm Workers, for 2 weeks. A control group (n = 10) did not receive any training. Participants in the experimental groups showed high compliance and motivation in playing the game. The effect of the training on rhythm skills yielded encouraging results with both versions of the game. Rhythm Workers thus appears to be a motivating and potentially efficient way to train rhythmic abilities in healthy young adults, with possible applications for (re)training these skills in individuals with rhythm disorders.
Introduction
Most of us can easily track the beat of rhythmic auditory events, such as music, and move along with it. This can be seen when we synchronize our movements to the rhythm of music while dancing, marching, or doing sport activities (e.g., jogging to the beat of music). This ability is widespread in the general healthy population (Repp, 2010; Sowiński & Dalla Bella, 2013), with just a few exceptions (e.g., beat deafness, Bégel et al., 2017; Launay, Grube, & Stewart, 2014; Palmer, Lidji, & Peretz, 2014; Phillips-Silver et al., 2011; Sowiński & Dalla Bella, 2013). Moving to musical rhythm implies that listeners can extract the beat from an auditory sequence. The beat is defined as a perceived pulse that marks equally spaced points in time (Large & Jones, 1999; London, 2012); it is the pulse we usually follow when we tap a finger or foot or when we dance. Beat tracking can be tested in purely perceptual tasks (e.g., detecting a deviation from isochrony in a sequence of tones, Ehrlé & Samson, 2005) or in sensorimotor tasks (e.g., paced tapping to the sounds of a metronome or to music, Repp, 2005; Repp & Su, 2013). In the past few years, batteries including both perceptual and sensorimotor tasks have been developed for the evaluation of rhythmic and timing abilities, such as the Battery for the Assessment of Auditory Sensorimotor and Timing Abilities (BAASTA; Dalla Bella, Farrugia, Benoit et al., 2017) and the Harvard Beat Assessment Test (H-BAT; Fuji & Schlaug, 2013). These batteries are highly valuable because they make it possible to characterize the timing capacities of distinct populations and to highlight inter-individual differences (Bégel et al., 2017; Benoit et al., 2014; Cochen De Cock et al., 2018; Dalla Bella, Benoit, Farrugia et al., 2017; Dalla Bella, Dotov, Bardy, & Cochen de Cock, 2018; Dalla Bella, Farrugia, Benoit et al., 2017; Falk, Müller, & Dalla Bella, 2015; Puyjarinet et al., 2017).
Rhythmic skills are sustained by a complex neuronal network. Notably, even in the absence of a motor response, mere extraction of the beat from an auditory signal recruits motor regions of the brain, such as the basal ganglia, premotor cortex, pre-SMA, and the cerebellum (Chen, Penhune, & Zatorre, 2008a; Coull, Cheng, & Meck, 2011; Grahn & Brett, 2007; Grahn & Rowe, 2009), in addition to perceptual regions (superior temporal gyrus; Chen, Penhune, & Zatorre, 2008b; Schwartze & Kotz, 2013; Thaut, 2003). When a motor response is coupled to an auditory rhythm, this network extends to sensorimotor integration areas (e.g., dorsal premotor cortex; Chen, Zatorre, & Penhune, 2006; Coull, Cheng, & Meck, 2011; Zatorre, Chen, & Penhune, 2007). Malfunction of these networks typically affects rhythmic skills in neurodegenerative disorders (e.g., Parkinson’s disease; Benoit et al., 2014; Grahn & Brett, 2009; Jones & Jahanshahi, 2014; Pastor, Artieda, Jahanshahi, & Obeso, 1992; Spencer & Ivry, 2005) and in neurodevelopmental disorders (ADHD, Noreika, Falter, & Rubia, 2013; Puyjarinet et al., 2017; stuttering, Falk, Müller, & Dalla Bella, 2015; autism spectrum disorder, Allman, Pelphrey, & Meck, 2012; speech and language impairments, Corriveau & Goswami, 2011; Corriveau, Pasquini & Goswami, 2007; Dalla Bella, Dotov, Bardy, & Cochen de Cock, 2018; Goswami, 2011; Huss, Verney, Fosker, Mead, & Goswami, 2011). Timing and rhythmic skills can also be selectively impaired in otherwise healthy adults (beat deafness, Bégel et al., 2017; Launay, Grube, & Stewart, 2014; Palmer, Lidji, & Peretz, 2014; Phillips-Silver et al., 2011; Sowiński & Dalla Bella, 2013; tone deafness, Dalla Bella & Peretz, 2003; Dalla Bella, Berkowska, & Sowiński, 2015). Interestingly, the ability to track the beat has been associated with other cognitive abilities such as working memory, sustained attention, and language and reading skills in children (Tierney & Kraus, 2013; Woodruff Carr, White-Schwoch, Tierney, Strait, & Kraus, 2014).
Altogether, these studies indicate a tight link between rhythmic skills and motor and cognitive functions. Because of this link, one may expect that improving rhythmic skills could positively affect both motor and cognitive functioning, so that rhythmic training may provide a viable strategy for improving functions above and beyond rhythm. This possibility finds some support in studies showing beneficial effects of rhythmic stimulation on motor functions and cognition. For example, rhythmic training in which patients with movement disorders, such as Parkinson’s disease, walk to a metronome or to music improves their gait by increasing speed and stride length (Dalla Bella, Benoit, Farrugia et al., 2017; de Dreu, Van Der Wilk, Poppe, Kwakkel, & Van Wegen, 2012; Thaut et al., 1996; Spaulding et al., 2013) and reduces their deficits in rhythm perception and production (Benoit et al., 2014; Dalla Bella, Benoit, Farrugia et al., 2017). Notably, the positive response to rhythmic stimulation depends on patients’ perceptual and sensorimotor rhythmic skills, pointing to a strong link between rhythm processing and motor control (Dalla Bella, Benoit, Farrugia et al., 2017; Dalla Bella, Dotov, Bardy et al., 2018; Cochen de Cock et al., 2018). In addition, rhythmic stimulation (e.g., rhythmic priming) can be used to improve speech perception in children with dyslexia and in children with specific language impairment (e.g., Przybylski et al., 2013; Schön & Tillmann, 2015).
In sum, training rhythmic skills appears to be a promising avenue for improving movement and cognition in a variety of populations. To the best of our knowledge, however, no systematic protocol for selectively training rhythmic skills has been proposed so far. The goal of this study was to devise and test a new rhythm training protocol implemented as a serious game that exploits new mobile technologies. A serious game is a game designed specifically for training and education purposes, such as dedicated training for rehabilitation or remediation, delivered in an entertaining and motivating fashion, widely accessible to the targeted public, and low-cost (Annetta, 2010; Kato, 2012). Over the past two decades, serious games have been used extensively in therapy (for a review, see Rego, Moreira, & Reis, 2010). Several studies have shown that serious games involving motor exercises have beneficial effects on movement capacities in stroke (Friedmann et al., 2014; Webster & Celik, 2014), in Parkinson’s disease (Barry, Galna, & Rochester, 2014; Harris, Rantalainen, Muthalib, Johnson, & Teo, 2015; Mendes et al., 2012), and in healthy older adults (Sun & Lee, 2013). Dedicated cognitive training via serious games, such as working memory or executive function training, has also yielded encouraging results over the past 10 years (e.g., Anguera et al., 2010; for a review, see Lumsden, Edwards, Lawrence, Coyle, & Munafò, 2016; however, for a discussion of the limits of computerized cognitive training, see Owen et al., 2010).
There are a few examples of rhythm games on the market, such as Guitar Hero® or Rhythm Heaven Fever®. Unfortunately, none of these games is specifically dedicated to rhythmic training or complies with the measurement standards needed for experimental work, as we highlighted in a recent survey (Bégel, Di Loreto, Seilles, & Dalla Bella, 2017). First, they lack temporal precision in stimulus presentation and/or data acquisition, and their output is insufficient, since no measure of rhythmic performance is recorded. Second, there is usually no progression in the games based on the rhythmic features of the musical stimuli; therefore, these games do not selectively train rhythmic skills. Finally, most of these games can be played without relying on auditory information. Visual cues displayed on the screen, such as images appearing rhythmically, are often sufficient to play, as the goal is to execute a movement within a given temporal window in response to these cues. For instance, in Guitar Hero, the player has to synchronize with moving circles when they reach a given position on the screen. These drawbacks make off-the-shelf rhythm games unsuitable for selective training of rhythmic skills.
Here we devised a new serious game for training perceptual and sensorimotor rhythmic skills, named Rhythm Workers. The goal of the game is to construct a building. Construction games, such as SimCity and Megapolis, are widespread and highly entertaining: as an illustration, EA Mobile’s Vice President, Jason Willig, claimed that nearly 40 million people played the latest version of SimCity in the first 6 months after the game was launched. Rhythm Workers uses rhythmic patterns and musical stimuli with different degrees of beat saliency; the beat saliency of the musical stimuli was assessed in a first experiment. As rhythmic skills can be trained with both perceptual and motor tasks, we devised two versions of the game. In the perception version of Rhythm Workers, training is performed via an adapted version of the Beat Alignment Test (BAT; Iversen & Patel, 2008): the player has to detect whether a sequence of percussion sounds (a metronome) is aligned with the beat of the stimulus. In the tapping version, the goal is to tap to the beat of the stimulus as accurately as possible. Having two versions of the game is particularly useful, as perceptual training alone may be sufficient for improving rhythmic skills: beat perception in the absence of associated movement already activates motor regions of the brain (Grahn & Brett, 2007; Grahn & Rowe, 2009).
The study consists of two experiments. The goal of the first experiment was to select and validate the musical material; in particular, it served to rank the musical stimuli from high to low beat saliency, as measured with a tapping task. Beat saliency is critical for creating difficulty levels in the game. Players of both the perception and the tapping versions need to extract the beat to complete the tasks successfully, which is more difficult when the beat has low saliency than when it is very salient. When the beat is less salient, beat extraction particularly recruits mechanisms devoted to the internal generation of the beat (Grahn & Rowe, 2009). The second experiment was a proof-of-concept pilot study aimed at testing the usability of Rhythm Workers and compliance with a training protocol based on this serious game. Healthy young adults played the game on a tablet at home for two weeks, and the usability of the game and participants’ motivation throughout the training protocol were assessed. Moreover, to obtain first evidence of the effect of Rhythm Workers on rhythmic skills, participants were tested with a version of the BAT taken from BAASTA (Dalla Bella, Farrugia, Benoit et al., 2017) before and after the training. This task was chosen because it is a good indicator of beat perception skills and has proven to be very sensitive to inter-individual differences (e.g., Bégel et al., 2017; Cochen De Cock et al., 2018; Falk, Müller, & Dalla Bella, 2015). The BAT has previously been used successfully to show changes in rhythmic skills following gait training with rhythmic auditory stimulation (in Parkinson’s disease; Benoit et al., 2014).
Experiment 1
The goal of Experiment 1 is to select the musical excerpts for Rhythm Workers and to rank them from high beat saliency (low rhythmic difficulty) to low beat saliency (high rhythmic difficulty). The selection of the other stimuli included in the game (metronome and metrical sequences) was not part of this experiment, but is reported in Experiment 2 (Material and procedure).
Participants
Eighteen participants (5 females, mean age = 26, SD = 3.4) without musical training volunteered to participate in the experiment.
Material and procedure
An initial set of 90 musical excerpts, available in MIDI format in an online music repository (www.midiworld.com), was selected from three musical genres: 30 from classical music, 30 from jazz, and 30 from pop music. Choosing stimuli from different genres ensures variety among the excerpts in terms of beat saliency. In addition, it makes the game less monotonous and more attractive for players regardless of their musical preferences. For excerpts that included a vocal part, the voice was replaced by a melody with a piano timbre. All excerpts were rated for beat saliency, pleasantness, and familiarity by four members of the laboratory (two of them musicians) who are experts in timing and rhythm research. Beat saliency was rated on a seven-point scale (1 = the beat can hardly be perceived, 7 = the beat can be easily perceived). Similar scales were used to rate stimulus pleasantness (1 = not pleasant, 7 = very pleasant) and familiarity (1 = not familiar, 7 = very familiar). The stimuli were ranked based on the beat saliency ratings and assigned to three categories of 30 excerpts each: excerpts with (1) a highly salient beat (ratings between 5.75 and 6.75), (2) a beat of average saliency (ratings between 4.5 and 5.5), and (3) a beat with low saliency (ratings between 1.5 and 4). Within each category, the five least pleasant excerpts were discarded, leaving 75 excerpts, 25 in each category. Ratings of pleasantness, familiarity, and beat saliency for the three categories were compared with a one-way repeated measures Analysis of Variance (ANOVA). The three stimulus categories differed significantly in average beat saliency (mean = 2.76 for stimuli with low beat saliency, 5 for stimuli with average beat saliency, and 6.31 for stimuli with high beat saliency; F(2,48) = 969.23, p < .001).
Note that excerpts with a highly salient beat were also the most familiar ones (mean familiarity = 5.68 for high, 4.91 for average, and 4.45 for low beat saliency; F(2,48) = 5.68, p < .01). The three stimulus categories did not differ in pleasantness (mean pleasantness = 4.69 for high, 5.03 for average, and 5.06 for low beat saliency; F(2,48) = 2.06, p = .14). In this experiment, we wanted to confirm that the excerpts with the most salient beat, as rated by the four experts, were also those whose beat was the easiest to track. To do so, the 75 excerpts were further tested with a tapping task in a group of non-musicians (see Participants above). Participants were asked to tap with their index finger to the beat of the excerpts, presented in random order. Tapping was recorded via a Roland SPD-6 MIDI percussion pad controlled by Sonar software (LE version). In addition, participants rated the excerpts for beat saliency, pleasantness, and familiarity.
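The expert-based pre-selection described above (ranking by rated beat saliency, splitting into three categories of 30, and discarding the five least pleasant excerpts in each) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: `Excerpt` and `preselect` are hypothetical names, the ratings in the usage lines are synthetic values rather than the study's data, and ties at category boundaries are ignored.

```python
from dataclasses import dataclass


@dataclass
class Excerpt:
    name: str
    saliency: float      # mean rated beat saliency (1-7)
    pleasantness: float  # mean rated pleasantness (1-7)


def preselect(excerpts, n_categories=3, n_drop=5):
    """Split excerpts into equal-size beat-saliency categories (high to low)
    and drop the n_drop least pleasant excerpts from each category."""
    ranked = sorted(excerpts, key=lambda e: e.saliency, reverse=True)
    size = len(ranked) // n_categories
    categories = []
    for i in range(n_categories):
        group = ranked[i * size:(i + 1) * size]
        # Sort by pleasantness (ascending) and skip the n_drop lowest.
        kept = sorted(group, key=lambda e: e.pleasantness)[n_drop:]
        categories.append(kept)
    return categories


# Hypothetical ratings for 90 excerpts (not the study's data).
excerpts = [
    Excerpt(f"ex{i:02d}", 1 + (i * 37 % 90) / 15, 1 + (i * 53 % 90) / 15)
    for i in range(90)
]
categories = preselect(excerpts)
print([len(c) for c in categories])  # [25, 25, 25]
```

With 90 input excerpts, the sketch leaves 75, 25 per category, matching the counts reported above.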
Analysis
Motor synchronization to the beat was analyzed with circular statistics (Fisher, 1993; for examples with tapping, see Dalla Bella & Sowiński, 2015; Kirschner & Tomasello, 2009). This method maps the inter-beat interval (IBI) of the stimulus onto a 360° polar scale. The timing of each finger tap relative to the beat is represented as an angle, obtained by comparing the time of the tap with the time of the nearest beat (Figure 1). These angles, treated as unit vectors, are used to compute the mean resultant vector R. The length of vector R, between 0 and 1, represents synchronization consistency, namely how stable the time interval between taps and beats is within a trial (for example, see Dalla Bella, Farrugia, Benoit et al., 2017; Kirschner & Tomasello, 2009; Sowiński & Dalla Bella, 2013). Consistency is treated here as an indicator of rhythmic difficulty: the lower the consistency, the more difficult it is for participants to synchronize to the beat of the music. The angle of vector R (θ or relative phase, in degrees) represents synchronization accuracy (negative angle = the participant taps before the beat; positive angle = the participant taps after the beat). Performances in which participants tapped twice as fast as the expected beat (usually corresponding to the quarter note) were discarded: every other tap then falls halfway between beats, which artificially reduces vector length even when tapping is good. For each excerpt, we also calculated the percentage of participants who tapped at the expected beat rate.

Figure 1. Examples of two distributions of taps corresponding to two musical excerpts. The dots represent the distribution of tap times relative to the beat (0 degrees) for one participant. On the left, the dots cluster around the beat, indicating high synchronization consistency (length of vector R = .90). On the right, the taps are scattered around the circle, showing poor synchronization consistency (vector length = .31). The angle (θ) represents synchronization accuracy. An angle of 0° means that the taps occurred exactly on the beat; an angle of 180° indicates that the taps occurred halfway between the beats (i.e., in antiphase).
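A minimal Python sketch of the circular statistics described above, assuming an isochronous beat whose IBI can be estimated from the first and last beat times. The helper `circular_sync` is hypothetical and is not the analysis code used in the study: each tap is compared with its nearest beat, the asynchrony is mapped onto 360 degrees of the IBI, and the resulting unit vectors are averaged. The length of the mean vector is R (consistency) and its angle is θ (accuracy).

```python
import cmath
import math


def circular_sync(tap_times, beat_times):
    """Return (R, theta): the length of the mean resultant vector (0-1,
    synchronization consistency) and its angle in degrees (synchronization
    accuracy; negative = taps before the beat). Assumes an isochronous beat."""
    ibi = (beat_times[-1] - beat_times[0]) / (len(beat_times) - 1)
    vectors = []
    for t in tap_times:
        nearest = min(beat_times, key=lambda b: abs(t - b))
        # Asynchrony to the nearest beat, mapped onto 360 degrees of the IBI.
        angle = 2 * math.pi * (t - nearest) / ibi
        vectors.append(cmath.exp(1j * angle))  # unit vector
    mean = sum(vectors) / len(vectors)
    return abs(mean), math.degrees(cmath.phase(mean))


beats = [i * 600.0 for i in range(20)]        # 600-ms IBI
late_taps = [b + 20.0 for b in beats[2:18]]   # every tap 20 ms after the beat
r, theta = circular_sync(late_taps, beats)
print(round(r, 3), round(theta, 1))  # 1.0 12.0
```

With every tap 20 ms late relative to a 600-ms IBI, all unit vectors point in the same direction, so R is 1 and θ is 20/600 × 360 = 12 degrees. Spreading taps evenly around the cycle would instead make the vectors cancel and push R toward 0.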
Finally, to obtain an objective measure of beat saliency, we computed the pulse clarity of each excerpt from its acoustic signal, using the “pulse clarity” function of the MIR toolbox in Matlab (Lartillot & Toiviainen, 2007; Lartillot, Toiviainen, & Eerola, 2008). Higher pulse clarity values indicate a more salient beat.
Results and discussion
Twenty-two musical excerpts were rated below 4 for pleasantness and were therefore discarded. The final set of 53 musical excerpts was ranked from the easiest to the most difficult to synchronize with, based on synchronization consistency.
The musical excerpts, ranked by synchronization consistency (treated as an indicator of rhythmic difficulty), are presented in Table 1. It is worth noting that synchronization consistency is positively correlated with pulse clarity (r = .27, p < .05) and with metrical level (r = .40, p < .01). This suggests that participants’ ability to tap to the beat is linked to the presence of a clear pulse in the acoustic signal and to the tendency to consistently identify the beat at a given metrical level. Highly familiar excerpts were also the most pleasant ones (r = .85, p < .01) and those whose beat was judged as the most salient (r = .68, p < .01).
Table 1. Description of musical excerpts. Vector length (between 0 and 1) represents synchronization consistency. Pulse clarity (between 0 and 1), computed from the acoustic signal, is an index of beat saliency. Metrical level indicates the percentage of participants who tapped at the expected metrical level (e.g., at the quarter note). Participants’ ratings of familiarity, pleasantness, and beat saliency, provided on a 7-point scale, are also reported.
This set of 53 musical excerpts ranked for rhythmic difficulty was the musical database used to design a rhythm training protocol implemented in a serious game (Rhythm Workers). The ranking of the excerpts in Table 1 was used in the game to assign the excerpts to levels of increasing difficulty. The game is devised and tested in Experiment 2.
Experiment 2
This experiment is a proof-of-concept pilot study of the game Rhythm Workers. A small group of young adults underwent a 2-week training protocol with the game, and their rhythmic skills were assessed before and after the training period. The goal of this experiment is to assess the usability of the game and participants’ compliance with the protocol. Additionally, it provides first evidence about the effects of this training protocol on rhythmic skills.
Participants
Thirty healthy young adults participated in the experiment (8 females, mean age = 24.67 years, SD = 3.04). Participants described themselves as non-musicians (average musical training = 1.63 years, SD = 2.14). They were randomly assigned to one of three groups: control (n = 10), tapping (n = 10), and perception (n = 10). Participants were remunerated for taking part in the pre- and post-training testing sessions, which took place in the laboratory, and for playing the game according to our instructions (see Procedure) at home during the two-week training period. Participants who did not play the game for the maximum amount of time (300 min) still received the remuneration at the end of the training.
Experimental design
Training protocol: Rhythm Workers
Stimulus material
In addition to the musical excerpts selected in Experiment 1, nine metronome sequences (isochronous sequences of tones) and 37 rhythmic sequences were created. Each metronome sequence consisted of 80 isochronously presented tones. Rhythmic sequences are temporal patterns of tones with different durations and an underlying beat. There were 18 strongly metrical and 19 weakly metrical sequences, classified according to Povel and Essens (1985; see also Patel, Iversen, Chen, & Repp, 2005) (see Table 2). The beat underlying strongly metrical sequences is typically easier to track than that of weakly metrical sequences (Patel et al., 2005). In both metronome and rhythmic sequences, the tones had a woodblock percussion timbre.
Table 2. Metrical sequences. The first two sequences were presented at a tempo of 100 beats per minute (BPM; IBI = 600 ms). The tempos of the following sequences were progressively decreased or increased in steps of 10% of the original value.
x = event onset.
. = silent position.
| = the following event or silent position is associated with a beat.
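To make the notation concrete, the following Python sketch converts a pattern written with these symbols into onset and beat times. It assumes that every `x` or `.` occupies one grid position of equal duration, that `|` takes no time itself, and that beats are evenly spaced on the grid. `parse_pattern` and the example pattern are hypothetical and are not among the sequences listed in Table 2.

```python
def parse_pattern(pattern, ibi_ms):
    """Return (onset_times, beat_times) in ms for a pattern written with
    'x' (event onset), '.' (silent position), '|' (next position is a beat).
    Assumes beats are evenly spaced on the grid of positions."""
    positions = []  # one (is_onset, is_beat) pair per grid position
    beat_next = False
    for ch in pattern:
        if ch == "|":
            beat_next = True
        elif ch in "x.":
            positions.append((ch == "x", beat_next))
            beat_next = False
        else:
            raise ValueError(f"unexpected symbol {ch!r}")
    beat_idx = [i for i, (_, is_beat) in enumerate(positions) if is_beat]
    if len(beat_idx) < 2:
        raise ValueError("need at least two beats to infer the grid")
    step = beat_idx[1] - beat_idx[0]
    if any(b - a != step for a, b in zip(beat_idx, beat_idx[1:])):
        raise ValueError("beats are not evenly spaced")
    grid_ms = ibi_ms / step  # duration of one grid position
    onsets = [i * grid_ms for i, (is_onset, _) in enumerate(positions) if is_onset]
    beats = [i * grid_ms for i in beat_idx]
    return onsets, beats


# Hypothetical pattern (not one of the study's sequences), IBI = 600 ms.
onsets, beats = parse_pattern("|x..x|x.x.|x...|x.xx", 600)
print(onsets)  # [0.0, 450.0, 600.0, 900.0, 1200.0, 1800.0, 2100.0, 2250.0]
print(beats)   # [0.0, 600.0, 1200.0, 1800.0]
```

Here each beat spans four grid positions, so with a 600-ms IBI one position lasts 150 ms.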
To vary rhythmic difficulty, the beat rate of metronome sequences and rhythmic patterns was manipulated. An IBI of 600 ms corresponds to the rate at which individuals, on average, tap spontaneously in the absence of a pacing stimulus (Repp, 2005; Repp & Su, 2013). Stimuli with this beat rate are thus considered the easiest to tap along with, and difficulty was increased by progressively deviating from this optimal rate. We created sequences with IBIs 10%, 20%, 30%, and 40% faster or slower than 600 ms for metronome sequences (IBI range: 360–840 ms), and 10%, 20%, 30%, 40%, and 50% faster or slower than 600 ms for rhythmic sequences (range: 360–900 ms). Two strongly metrical and two weakly metrical sequences were created for each tempo. To make the game less monotonous, faster and slower stimuli were interleaved.
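The tempo manipulation for the metronome sequences can be reproduced with a few lines of Python. This sketch assumes that each step scales the 600-ms IBI by ±10%; `tempo_set` and `interleave` are hypothetical helpers, and the interleaving order shown is only one possibility, since the exact order used in the game is given in Table 3 rather than in the text.

```python
BASE_IBI_MS = 600  # spontaneous tapping rate, treated as the easiest tempo


def tempo_set(max_steps, step=0.10, base=BASE_IBI_MS):
    """IBIs (ms) from max_steps steps faster to max_steps steps slower than base."""
    return [round(base * (1 + k * step)) for k in range(-max_steps, max_steps + 1)]


def interleave(ibis, base=BASE_IBI_MS):
    """One possible interleaving (an assumption): increasing distance from
    the base IBI, with the faster tempo placed before the slower one."""
    return sorted(ibis, key=lambda ibi: (abs(ibi - base), ibi))


metronome_ibis = tempo_set(4)  # +/-10, 20, 30, 40 %
print(metronome_ibis)              # [360, 420, 480, 540, 600, 660, 720, 780, 840]
print(interleave(metronome_ibis))  # [600, 540, 660, 480, 720, 420, 780, 360, 840]
```

The nine IBIs correspond to the nine metronome sequences and to the 360–840 ms range reported above; the tempo in BPM is 60000 / IBI (e.g., 600 ms corresponds to 100 BPM).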
Rhythm Workers
The goal of the game is to construct buildings. The construction of a building is associated with one of the stimuli presented above (metronome, rhythmic sequence, or music; each stimulus includes 80 beats) and corresponds to one level of the game. Ninety-nine levels were designed and grouped into nine degrees of difficulty, referred to as “worlds” (11 levels per world), as illustrated in Table 3. Ninety-nine stimuli (53 musical excerpts, 9 metronome sequences, and 37 rhythmic sequences) were selected and assigned to the different worlds (Table 3). To make the game interesting, and thus potentially motivating, the three types of stimuli were alternated within each world as follows: the game started with a metronome sequence, and a further metronome sequence occurred after every 10 levels with other stimuli. The 10 stimuli following each metronome sequence (i.e., music and rhythmic sequences) were presented in the following fixed order: 2 musical excerpts – 2 strongly metrical sequences – 2 musical excerpts – 2 weakly metrical sequences – 2 musical excerpts. This structure of the rhythmic training protocol was implemented in two versions of the game. In the perception version, the task is an adaptation of the Beat Alignment Test (BAT; Iversen & Patel, 2008), in which the player is asked to detect whether a sequence of percussion sounds (a metronome) is aligned with the beat of the stimulus. In the tapping version, the goal is to tap to the beat of the stimulus as accurately as possible.
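The within-world layout just described can be expressed as a simple generator of the 99 levels. This is a sketch of the stated ordering rule only; `level_plan` and `WORLD_ORDER` are hypothetical names, and the actual assignment of specific stimuli to levels follows Table 3.

```python
from collections import Counter

# Fixed within-world order described above (M = music excerpt,
# SM = strongly metrical sequence, WM = weakly metrical sequence).
WORLD_ORDER = ["metronome", "M", "M", "SM", "SM", "M", "M", "WM", "WM", "M", "M"]


def level_plan(n_worlds=9):
    """Return (world, level, stimulus type) for every level of the game."""
    return [
        (world, level, kind)
        for world in range(1, n_worlds + 1)
        for level, kind in enumerate(WORLD_ORDER, start=1)
    ]


plan = level_plan()
print(len(plan))  # 99
print(Counter(kind for _, _, kind in plan))
```

Applying the fixed order to all nine worlds yields 9 metronome, 54 music, 18 strongly metrical, and 18 weakly metrical slots. Since 53 musical excerpts and 19 weakly metrical sequences are reported, at least one slot must have departed from the fixed order; the sketch does not model this.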
Table 3. Structure of the rhythmic training protocol implemented in Rhythm Workers. The stimuli corresponding to the levels of the game are indicated. For simplicity, tempo changes are indicated only for metronome sequences; tempo changes for metrical sequences are the same as for the metronome (see Table 2).
M = music excerpt.
SM = strongly metrical sequence.
WM = weakly metrical sequence.
The aesthetic quality of the building depended on the player’s performance (see Figure 2). Feedback about performance was provided both in real time, during a level, and at the end of the level (after the end of the stimulus). In both the perception and the tapping versions of the game, real-time feedback was provided four times per level, corresponding to the appearance of the four stories of the building. When the player tapped accurately to the beat (tapping version) or correctly detected whether the percussion sounds were aligned to the beat (perception version), the stories of the building appeared better structured (e.g., more symmetrical), richer, and more aesthetically appealing than when performance was poor. In both versions, feedback was provided every 15 beats of the stimulus.

Figure 2. Two examples of buildings generated by one player. From left to right, the four steps of the building construction are shown (appearance of each of the four stories of the building). The aesthetic quality of the building depends on the player’s performance. A) Example of a poor performance (score = 10 points, 1 star). B) Example of a very good performance (score = 100 points, 5 stars).
Additional feedback about performance was given at the end of each level. This was a final score between 5 and 100 points, calculated from the participant’s overall performance in the level (for details, see below) and converted into a number of stars, from one (score < 70) to five (score > 95). A performance earning at least two stars was sufficient to unlock the next level within the same world. To unlock (and move to) the next world, the player had to gather at least 20 stars in the current world. Note that if the player could not obtain two stars after five trials at the same level, the next level was automatically unlocked. This allowed a player who had particular difficulties with one level to move on to the next level, with the possibility of training on the previous level within the same world later. Finally, if the participant completed all the worlds before the end of the two weeks, the game restarted from world 1 with a slightly more difficult rule: three stars instead of two were needed to unlock the subsequent level.
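The unlocking rules above can be captured in a small state object. This is a minimal sketch, assuming that stars are awarded per trial (1 to 5) and that only the best result per level counts toward the 20 stars needed for the next world; the source does not specify how repeated trials are counted. `WorldProgress` and its fields are hypothetical, and the conversion from scores to stars is left out because only the one-star and five-star thresholds are given.

```python
from dataclasses import dataclass, field

LEVELS_PER_WORLD = 11
STARS_FOR_NEXT_WORLD = 20
MAX_TRIALS = 5


@dataclass
class WorldProgress:
    stars_needed: int = 2  # 3 when the worlds are replayed after completion
    best_stars: dict = field(default_factory=dict)  # level -> best result
    trials: dict = field(default_factory=dict)      # level -> trials so far
    unlocked: set = field(default_factory=lambda: {1})

    def record(self, level, stars):
        """Register one trial of `level` that earned `stars` (1-5)."""
        self.trials[level] = self.trials.get(level, 0) + 1
        self.best_stars[level] = max(self.best_stars.get(level, 0), stars)
        passed = stars >= self.stars_needed
        # After five trials without success, the next level opens anyway.
        stuck = self.trials[level] >= MAX_TRIALS
        if (passed or stuck) and level < LEVELS_PER_WORLD:
            self.unlocked.add(level + 1)

    def next_world_unlocked(self):
        # Assumption: only the best result per level counts toward the 20 stars.
        return sum(self.best_stars.values()) >= STARS_FOR_NEXT_WORLD


world = WorldProgress()
world.record(1, 1)             # one star: level 2 still locked
world.record(1, 3)             # three stars: level 2 unlocked
print(sorted(world.unlocked))  # [1, 2]
```

For the second pass through the worlds, the same object would be created with `stars_needed=3`.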
Perception version
In the perception version of the game, five sequences of isochronously presented tones with a triangle timbre were superimposed on each stimulus at five different moments during its presentation. The first tone sequence lasted five beats and served as an example of an aligned sequence (i.e., the tones were aligned to the beat); no answer was expected from the participant for this sequence. The following four sequences, with tones either aligned or misaligned to the beat, lasted 10 beats each. Successive tone sequences were separated by four beats without superimposed tones.
After each tone sequence, the player had to judge whether the tones were aligned with the stimulus beat, by touching one of two buttons (“Yes” and “No”) presented in the middle of the screen (see Figure 3). The buttons appeared for 2.5 s, starting at the eighth tone of each sequence. A wrong answer led to the appearance of a story of the building that was not aesthetically appealing, and 25 points were subtracted from the final score (initially set to 100 points). For a correct answer, the player’s reaction time (the faster, the better) determined how aesthetically appealing the new story was and whether points were subtracted. If the correct response was given in the first half of the response window (i.e., within 1.25 s), the best version of the building was displayed and no points were subtracted from the final score. If the correct response was given in the second half of the response window (i.e., between 1.25 s and 2.5 s), a less appealing version of the building was displayed and 5 points were subtracted. Finally, if the player gave a wrong response to all four tone sequences, a minimum score of 5 points, corresponding to one star, was assigned, to avoid a null score that would be very demotivating for the player. Altogether, 396 triangle tone sequences were judged in the game. Half of them (198) were aligned to the beat of the stimulus, and the other half (198) were misaligned. Misaligned sequences differed from the stimulus beat either in period (100 sequences) or in phase (98 sequences). Fifty of the period-shifted sequences were presented at a tempo 10% slower, and the other 50 at a tempo 10% faster, than the stimulus. In 49 of the phase-shifted sequences the tones anticipated the stimulus beat by 30% of the IBI, while in the other 49 they were delayed by 30% of the IBI. In both phase-shift cases, the inter-tone interval was the same as the stimulus IBI.
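The scoring rule of the perception version can be sketched as follows. The function `perception_score` and the story labels are hypothetical; the handling of a missing response (no button touched within 2.5 s) is an assumption, treated here as a wrong answer because the text does not describe it.

```python
RESPONSE_WINDOW_S = 2.5
FAST_LIMIT_S = RESPONSE_WINDOW_S / 2  # first half of the window (1.25 s)


def perception_score(responses):
    """Score one level of the perception version.

    responses: four (correct, reaction_time_s) pairs, one per judged tone
    sequence. Returns (final_score, story_qualities)."""
    score = 100
    stories = []
    for correct, rt in responses:
        # Assumption: no answer within the window counts as a wrong answer.
        if not correct or rt > RESPONSE_WINDOW_S:
            score -= 25
            stories.append("unappealing")
        elif rt <= FAST_LIMIT_S:
            stories.append("best")  # fast correct answer: no penalty
        else:
            score -= 5
            stories.append("less appealing")
    return max(score, 5), stories  # floor of 5 points (one star)


print(perception_score([(True, 0.8), (True, 1.8), (False, 1.0), (True, 1.0)])[0])  # 70
```

In this example, one slow correct answer and one wrong answer give 100 - 5 - 25 = 70 points; four wrong answers would give 0, which the floor raises to 5.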

Figure 3. Examples of the response window in the perception version of Rhythm Workers (left panel: response window just before the first story of the building is presented; right panel: response window just before the third story of the building is presented).
Tapping version
As in the perception version of Rhythm Workers, the stimulus at each level included 80 beats. After four beats, a sequence of 10 isochronously presented triangle tones (a metronome) aligned to the beat was presented. This superimposed metronome served to indicate when the beat occurred in the stimulus (at a chosen metrical level). The player’s task was to tap on the screen with a finger to the beat of the stimulus after the metronome had been presented.
Circular statistics computed on the last 15 taps preceding the appearance of each story of the building were used to assess tapping performance in real time. Synchronization consistency (vector length) and accuracy (vector angle) were used to calculate the score at each level, as follows. Synchronization consistency (a value between 0 and 1) was multiplied by 100 to obtain a score between 0 and 100. However, maximum consistency (= 1) is practically impossible to achieve in human performance. Thus, to reward very good performance, three points were automatically added, so that an excellent player could eventually reach a score of 100 points. Points were then subtracted from this score depending on synchronization accuracy. When the vector angle exceeded 60 degrees, 5 points were subtracted for every additional 10 degrees: 5 points for an angle between 60 and 70 degrees, 10 points between 70 and 80 degrees, and so on. Finally, the player might tap right in between the beats (i.e., in antiphase). This situation was treated as an erroneous performance and was detected during the level whenever synchronization accuracy departed from the beat by at least 120 degrees (antiphase corresponds to 180 degrees). In this case, a warning message appeared at the end of the level, informing the player that they had tapped between beats and that the level had to be repeated.
If the score, computed before a building story was shown, was lower than 70, the worst version of the building was displayed; a good version was presented if the score was between 70 and 90, and a very good one if the score was higher than 90. The building versions were the same as in the perception version. The final score was computed on the overall performance. As in the perception version of the game, the minimal score was set to 5 instead of 0.
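The display rule above maps each level score to one of three building versions. A small sketch, assuming that a score of exactly 90 counts as "good" and that the minimum of 5 is applied as a simple floor (the text does not say how either was implemented); the function names are hypothetical.

```python
def building_version(score):
    """Building variant shown after a story, using the thresholds in the text."""
    if score < 70:
        return "worst"
    if score <= 90:  # "between 70 and 90"; exactly 90 assumed to count as good
        return "good"
    return "very good"


def floored_score(score, minimum=5):
    """Assumed reading of 'the minimal score was set to 5 instead of 0'."""
    return max(minimum, score)


for s in (42, 70, 88, 96):
    print(s, building_version(s))
print(floored_score(0))
```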
Assessment of rhythmic skills before and after training
Participants’ rhythmic skills, namely beat perception and sensorimotor synchronization to the beat, were tested before and after the training protocol with two tasks from BAASTA (Dalla Bella, Farrugia, Benoit et al., 2017): the BAT and a paced tapping task. In the BAT, 72 musical fragments lasting 20 beats, taken from Bach’s “Badinerie” and Rossini’s “William Tell Overture”, were presented at three tempos (inter-beat intervals, IBIs, of 450, 600 and 750 ms). From the 7th beat onward, an isochronous sequence (percussion sound) was superimposed onto the music. The sequence was either aligned with the beat of the music or misaligned (in phase or in period). Participants judged whether the percussion sounds were aligned with the beat. In the paced tapping task, participants were asked to tap with their index finger to the sounds of a metronome at three tempos (inter-stimulus intervals, ISIs, of 450, 600 and 750 ms), and to the beat of music (two excerpts from Bach’s “Badinerie” and Rossini’s “William Tell Overture”; IBI = 600 ms). Each paced tapping trial was performed twice.
Procedure
Each participant in the tapping and perception groups received, for the training period, a tablet (LG G Pad 8.0 model) with the application for the corresponding version of the game. Instructions about the game were provided during the pre-training meeting, followed by a short practice session. Participants were instructed to play the game at home up to five times a week, for 30 min per session, and to rate their motivation to play the game on a seven-point scale at the end of each session (1 = very low motivation, 7 = very high motivation). It was made clear to the participants that the rated degree of motivation should reflect their willingness to play the game at
Results
Usability and compliance
The total time played was 205.9 min (SD = 57.6) in the perception group and 208 min (SD = 44.9) in the tapping group.⁴ The two experimental groups did not differ in the amount of time played (t < 1). Only one player in the tapping group did not complete the game, probably because he played the least (118 min); this player also omitted to rate motivation. The other participants finished the game in approximately one week, and three participants in the perception group finished it a second time. On average, participants reached the eighth level of the third world in the tapping group and the sixth level of the seventh world in the perception group, in both cases during the second repetition of the game. Completing the game took longer with the tapping version (on average, 191.0 min; SD = 61.57) than with the perception version (121.9 min; SD = 17.53) (t(7.1) = 2.94, p < .05). Average scores for each of the nine worlds and for the two versions of Rhythm Workers are presented in Figure 4. The performance of the perception and tapping groups did not differ at the beginning of the game (world 1, t(13.04) = 1.10, p = .29). The scores in the nine worlds for the two groups were entered into a 9 x 2 mixed-design analysis of variance (ANOVA), with World (1 to 9) as the within-subject factor and Group (tapping vs. perception) as the between-subject factor. The main effect of Group approached significance (F(1, 18) = 4.08, p = .058), suggesting that the tapping version may be slightly more difficult than the perception version. The average performance across the nine worlds differed as a function of group, as shown by a significant World x Group interaction (F(8, 144) = 3.1, p < .01).⁴ Post hoc tests (Tukey HSD) comparing the two groups in each world showed a significant difference only in world 8 (p < .05).
In sum, the protocol using the tapping version of the game was slightly more difficult than the one using the perception version, an effect which became visible as the game progressed.
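The fractional degrees of freedom reported above (e.g., t(7.1), t(13.04)) are characteristic of Welch's unequal-variance t-test, although the text does not name the test. For reference, a sketch of Welch's t statistic and the Welch-Satterthwaite degrees of freedom computed from group summary statistics; the function name is hypothetical and the numbers in the usage line are illustrative, not the study's data.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    se1_sq = sd1**2 / n1  # squared standard error of group 1's mean
    se2_sq = sd2**2 / n2  # squared standard error of group 2's mean
    t = (mean1 - mean2) / math.sqrt(se1_sq + se2_sq)
    df = (se1_sq + se2_sq) ** 2 / (se1_sq**2 / (n1 - 1) + se2_sq**2 / (n2 - 1))
    return t, df


# Hypothetical summary statistics with very unequal SDs.
t, df = welch_t(100.0, 40.0, 10, 80.0, 10.0, 10)
print(f"t({df:.1f}) = {t:.2f}")
```

When one group's SD is much larger than the other's, the degrees of freedom fall well below n1 + n2 - 2, which is why non-integer values such as 7.1 can appear with groups of about ten participants.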

Figure 4. Mean scores obtained in the nine worlds for the tapping and the perception groups. Error bars indicate standard error of the mean.
Finally, we compared the scores obtained by participants in the two groups with the three types of stimuli (metronome, metrical sequence, and music; see Figure 5). Participants in both groups performed better with the metronome sequences than with the metrical sequences (W(18) = 55, p < .01), and better with the metrical sequences than with music (W(18) = 55, p < .01). In addition, participants in the perception group obtained higher scores than those in the tapping group with the metronome sequences (W(18) = 93, p < .01), but not with the other stimuli.

Figure 5. Mean scores obtained with the three types of stimuli for the tapping and the perception groups. Error bars indicate standard error of the mean.
Motivation
Mean motivation ratings for each session in the two groups are shown in Figure 6. In all sessions, motivation was well above the midpoint of the scale (4), irrespective of the version of the game. Motivation in the 10 sessions was compared between the two groups with a 10 x 2 ANOVA, with Session (1 through 10) as the within-subject factor and Group (tapping vs. perception) as the between-subject factor. Neither the main effects of Group (F(1,17) = 0.01, p = .76) and Session (F(9,153) = 1.16, p = .32) nor the Group x Session interaction (F(9, 153) = 0.92, p = .51) reached significance. Thus, motivation to play the game remained high and constant throughout the protocol in both groups. Participants’ comments about their experience of playing the game were also collected. Overall, all players reported that the game was enjoyable. Several comments concerned the originality of using a game in a scientific context (“I was not expecting playing a game in a scientific experiment”; “it was a great experience playing a game that may finally help people”). All negative comments concerned the time participants had to spend to carry out the entire training protocol (up to 5 hours over two weeks, in addition to the pre- and post-training evaluation sessions). None of the participants reported difficulties in playing the game in general, but most of them found some of the last levels quite difficult.

Figure 6. Motivation ratings across the 10 training sessions. Error bars indicate standard error of the mean.
Beat perception and tapping
For the BAT, a sensitivity index (d′) was calculated from the number of hits (i.e., a misaligned metronome correctly detected) and false alarms (i.e., a misalignment erroneously reported). The performance of the control group did not change between the two sessions (before: mean = 2.96, SD = .92; after: mean = 3.13, SD = 1.06; t(9) = 1.05, p = .32). Because the effect of the training protocol can be expected to vary from one individual to another in the experimental groups, the percentage change between the two sessions was calculated separately for each individual. Post-training improvements or declines of 10% to 25% relative to pre-training performance were considered small, 25% to 50% average, and above 50% large. Mean and individual results for the three groups are presented in Table 4.⁵
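The d′ computation above can be sketched with the Python standard library. The text does not say how hit or false-alarm rates of exactly 0 or 1 were handled; this sketch assumes the log-linear correction (adding 0.5 to each count), and the trial counts in the usage line are hypothetical, since the split of the 72 BAT trials into aligned and misaligned trials is not given here.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Hit: misaligned metronome reported as misaligned.
    False alarm: aligned metronome reported as misaligned.
    """
    # Log-linear correction (assumption): add 0.5 to each count so that
    # rates of exactly 0 or 1 do not produce infinite z-scores.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts: 36 misaligned and 36 aligned trials.
print(round(d_prime(hits=30, misses=6, false_alarms=4, correct_rejections=32), 2))
```

Because of the correction, a perfect listener obtains a large but finite d′, which is why a ceiling value exists on this measure.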
Table 4. Individual performances (d′) for participants in the control, perception and tapping groups in the BAT.⁵ Grey shading indicates the level of improvement.
Improvement: *small, **average, ***large.
Worsening: †small, ††average, †††large.
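The percentage-change bands and the markers used in Table 4 can be expressed as a small classifier. Boundary handling (e.g., whether a change of exactly 25% counts as small or average) is not specified in the text and is assumed here; the function names and d′ values are hypothetical.

```python
def percent_change(pre, post):
    """Change in d' after training, as a percentage of the pre-training d'."""
    if pre <= 0:
        raise ValueError("percentage change is undefined for pre-training d' <= 0")
    return (post - pre) / pre * 100


def table_marker(pct):
    """Marker for a change: * for improvement, † for worsening, repeated by size."""
    size = abs(pct)
    if size < 10:
        return ""  # below 10%: treated as stable
    level = 1 if size <= 25 else 2 if size <= 50 else 3  # boundary handling assumed
    return ("*" if pct > 0 else "†") * level


for pre, post in [(2.0, 2.1), (2.0, 2.4), (2.0, 2.8), (2.0, 3.2), (3.0, 1.2)]:
    pct = percent_change(pre, post)
    print(f"{pre:.2f} -> {post:.2f}: {pct:+.0f}% {table_marker(pct)}")
```

Expressing change relative to each participant's own baseline is what makes the ceiling issue visible: a participant already at the maximum d′ cannot register any improvement.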
A significant improvement was found for the perception group (t(9) = 3.21, p < .01) but not for the tapping group (t(8) = 0.23, p = .41). Despite these group-level results, marked individual differences were observed within all three groups (see Table 4).
In the control group, half of the participants (five) showed only a small improvement in the second evaluation session. Of the other five, two worsened their performance and three remained stable.
Most participants in the perception group (7 out of 10) improved their performance by more than 10% after the training: two showed a large improvement, two an average improvement, and three only a small improvement. The remaining three participants did not improve. This is not surprising, though, as two of them had already reached the maximum score (d′ = 4.37) before the training, and the third obtained a score very close to the maximum (3.93). Thus, the lack of a training effect in these cases can be ascribed to a ceiling effect.
The effects of the training were less visible in the tapping group. More than half of the participants (5 out of 9) improved their performance by more than 10% after the training: two showed an average improvement and three a small improvement. In contrast, and surprisingly, three participants performed worse after the training than before, two of them with a large decrease in the detection of beat alignment.
In the tapping tasks, synchronization consistency (the length of the resultant vector, from 0 to 1) was calculated, as this measure of synchronization performance is particularly sensitive to individual differences (Bégel et al., 2017; Dalla Bella, Farrugia, Benoit et al., 2017; Sowiński & Dalla Bella, 2013). Participants performed at ceiling before the training: synchronization consistency when tapping with a metronome was .96 (SD = .02) for the control group, .97 (SD = .02) for the perception group, and .93 (SD = .01) for the tapping group. Similarly, synchronization consistency when tapping with music was very high in all three groups at baseline (control group, .94, SD = .01; perception group, .96, SD = .01; tapping group, .88, SD = .01). Before the training, the performance of the tapping group did not differ from that of the control group (with music, t(13.2) = 1.01, p = .33; with a metronome, t(12.3) = 2.12, p = .05), nor did the performance of the perception group (with music, t < 1; with a metronome, t < 1). Across all participants, no difference was observed between performance before and after the training (with a metronome, t(28) = .11, p > .05; with music, t(27) = .92, p > .05).
Discussion
The goal of this experiment was to pilot a training protocol based on Rhythm Workers in a small group of healthy young adults, in order to assess the usability of the game and compliance with the protocol. In addition, first evidence was gathered about the effects of the protocol on rhythmic skills, using a beat perception task and paced tapping. In general, the protocol was very well received by the participants. All of them completed the study and played for a substantial amount of time (on average 69% of the maximum time). Participants were highly motivated to play the game across the training sessions and until the end of the game. None of them misunderstood the goals of the game or the instructions, or had issues operating the tablet interface. Although rhythmic difficulty increased across the worlds of the game, all participants but one (the one who spent the least time playing) completed the game in approximately one week. Overall, these findings show excellent usability of, and compliance with, both versions of Rhythm Workers in young adults.
The tapping version, although participants performed it very well, appeared slightly more difficult than the perception version, as seen in performance on some levels of the game, in particular with metronome sequences. In addition, the game was generally easiest with simple rhythmic sequences (i.e., metronomes) and became progressively more difficult with metrical sequences and music. This finding is not surprising, as stimulus complexity influences beat perception (Chen, Penhune, & Zatorre, 2008a; Grahn & Brett, 2007; Lewis, Wing, Pope, Praamstra, & Miall, 2004). A metronome contains a single periodicity, as only one event (i.e., the metronome tick) is repeated in an isochronous pattern. Metrical sequences are more complex temporal patterns (with up to four events per beat) involving more than one periodicity (meter; London, 2012), but without pitch variations. Finally, music is the most complex stimulus, including both pitch variations and a metrical structure. Note that syncopated rhythms (i.e., rhythms in which no event occurs at the time of the beat, Fitch & Rosenfeld, 2007) may occur in metrical sequences and music but not in metronome sequences.
This protocol thus appears to create good conditions for improving rhythmic skills. It provided first evidence that beat perception, as tested with the BAT, was clearly improved by the perception training. All participants in the perception group improved their performance (by up to 100% relative to their pre-training performance), except those who performed at ceiling before the training. More than half of the participants in the tapping group also improved their beat perception by more than 10%. In contrast, no effect of the training protocol was found on sensorimotor synchronization. This negative finding should be interpreted with caution, though, as participants’ performance in the paced tapping task was already at ceiling before the training. More sensitive tasks (e.g., tapping in antiphase or with more complex rhythmic patterns) should be used in the future to properly assess the effect of the training protocol on sensorimotor synchronization.
General discussion
Rhythm Workers, a music-based serious game, is a new tool for training rhythmic skills. In the first experiment, we selected the musical material used in the game. Measures based on expert ratings and on a tapping study served to sort the excerpts from the easiest to the most difficult in terms of beat saliency. In the second experiment, we presented the structure of Rhythm Workers, which includes 99 audio stimuli (music, rhythmic sequences, and metronome sequences), and devised a dedicated training protocol. Two training protocols implementing the two versions of the game (tapping and perception) were tested in a small sample of 20 healthy young adults to assess the usability of the game and compliance with the protocol. In addition, we collected first pilot data on the effect of the training on rhythmic skills.
The results of this proof-of-concept pilot study show that participants were highly motivated to play the game, and that this motivation was sustained over the two weeks of the training protocol. This finding is very encouraging, because players’ motivation attests that the game is equally engaging across the different worlds, in both the perception and the tapping versions. Thus, the protocol with Rhythm Workers is unlikely to be hampered by motivational factors, at least over a period of two weeks, and thereby creates good training conditions. Moreover, because all participants but one managed to finish the game in approximately one week, the serious game shows good usability and compliance with both versions. A caveat of the present study is that participants were paid for taking part in the experiment. This may have partly increased their general motivation to participate and to play the game, at least at the beginning.
All players managed to play and to obtain satisfactory scores, despite notable variability in their performance. The serious game and the protocol are sufficiently flexible to adapt to initial individual differences in rhythmic skills without hindering players’ motivation. Variability in rhythmic skills among individuals without musical training is expected, since it reflects the various profiles of rhythmic abilities in the general population (Bégel et al., 2017; Launay, Grube, & Stewart, 2014; Sowiński & Dalla Bella, 2013; Tranchant, Vuvan, & Peretz, 2016). Participants’ scores in the tapping version of the game, although generally comparable to those in the perception version, differed in at least one of the worlds. Thus, in its current form, the game may be slightly more challenging in the tapping version than in the perception version. This is consistent with the fact that completing the tapping version of Rhythm Workers took longer than completing the perception version. The discrepancy between the two versions may be explained by task factors such as the type of response, and by the temporal processes engaged while playing. In the tapping version, the player produces a continuous performance (i.e., up to 60 taps) for a given stimulus (i.e., level in the game), with little possibility of achieving a good score by chance. In the perception version, the player has a smaller set of possible responses, with a binary choice (yes/no) for each judgment (50% chance of answering correctly) and four judgments to provide at a given level. Because the perception version requires far fewer responses than the tapping version, it is easier for the player to perform above chance (i.e., 3 out of 4 correct answers suffice).
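The chance argument above can be made concrete. The sketch below computes the probability of reaching at least 3 of 4 correct judgments by guessing and, for comparison, simulates the synchronization consistency of 15 taps with uniformly random phases (15 being the window used for scoring). Modelling "tapping by chance" as uniformly random tap phases is a simplifying assumption. The 0.67 cut-off follows from the scoring rule described for the tapping version (consistency × 100 + 3 points) and the 70-point threshold, and ignores the accuracy penalty, so it is an upper bound. The function names are hypothetical.

```python
import math
import random


def p_at_least(k, n, p=0.5):
    """Binomial probability of at least k correct answers out of n guesses."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def random_tapping_length(n_taps, rng):
    """Vector length of n taps whose phases are uniformly random."""
    phases = [rng.uniform(0, 2 * math.pi) for _ in range(n_taps)]
    c = sum(math.cos(ph) for ph in phases)
    s = sum(math.sin(ph) for ph in phases)
    return math.hypot(c, s) / n_taps


print(p_at_least(3, 4))  # 0.3125: guessing yields >= 3/4 correct in ~31% of levels

rng = random.Random(1)
lengths = [random_tapping_length(15, rng) for _ in range(2000)]
print(sum(lengths) / len(lengths))  # mean consistency of random tapping
# Share of random-tapping levels whose consistency alone would reach 70 points.
print(sum(r >= 0.67 for r in lengths) / len(lengths))
```

Guessing clears the perception criterion in about a third of levels, whereas random tapping over 15 taps yields a low mean consistency and very rarely approaches the 70-point threshold, in line with the asymmetry described in the text.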
We provided first evidence that a training protocol using a serious game such as Rhythm Workers can have a positive effect on rhythmic skills. All individuals who received the perception training improved their performance in detecting whether a metronome is aligned with music (BAT), except those who were at ceiling before training. This improvement cannot be ascribed merely to task repetition, because no difference in beat perception between the two testing sessions was found in the absence of training (control group). The evidence provided here shows that the training can improve rhythm skills, although this effect may be task-specific, as only the perceptual training improved performance in the BAT. Unfortunately, because of a ceiling effect in the tapping task at baseline, we could not properly assess the effect of perceptual training on sensorimotor synchronization. Further testing with more sensitive tasks will be needed to assess whether perceptual training can also improve performance in motor rhythmic tasks.
Contrary to our expectations, we found no significant effect of the tapping training at the group level. Yet, at the individual level, most participants in the tapping group also improved their beat perception after the training, whereas three of them showed a decline in performance. This unexpected finding deserves further inquiry in future studies with a larger sample. Notably, these differences in response to our rhythmic training protocol are reminiscent of the results obtained with other rhythmic training protocols such as rhythmic auditory stimulation (RAS; Benoit et al., 2014; Dalla Bella, Benoit, Farrugia et al., 2017; de Dreu et al., 2012; Spaulding et al., 2013; Thaut et al., 1996). RAS consists of presenting auditory rhythmic stimuli to patients with movement disorders, such as Parkinson’s disease (Benoit et al., 2014; Dalla Bella, Benoit, Farrugia, Schwartze, & Kotz, 2015; Dalla Bella, Benoit, Farrugia et al., 2017; de Dreu et al., 2012; Lim et al., 2005; Spaulding et al., 2013; Thaut & Abiru, 2010; Thaut et al., 1996) and stroke (Thaut, McIntosh, & Rice, 1997; Thaut, Kenyon, Hurt, McIntosh, & Hoemberg, 2002; Thaut et al., 2007). For example, some patients with Parkinson’s disease trained with RAS for several weeks respond positively to the training (i.e., by increasing their walking speed), while others either do not respond or respond negatively to the auditory cues (Dalla Bella, Benoit, Farrugia et al., 2017; Dalla Bella et al., 2015). Importantly, individuals who respond positively to the training are those with relatively spared rhythmic abilities (Dalla Bella, Benoit, Farrugia et al., 2017). Therefore, improving rhythmic skills with Rhythm Workers may increase patients’ response to such rehabilitation programs.
In summary, we devised Rhythm Workers, a serious game for training rhythm skills, and tested it in a proof-of-concept pilot study in healthy young adults. This step was necessary to ensure that the game can be used by individuals without neurological or neurodevelopmental disorders. Usability and compliance were excellent, and encouraging results were found regarding the effect of the training on rhythm skills. Further studies are needed to test whether the game can be used by individuals with neurodegenerative (e.g., Parkinson’s disease), neurological (e.g., stroke) or neurodevelopmental disorders (e.g., ADHD, dyslexia or stuttering). Finally, testing the effect of training rhythm skills with Rhythm Workers on associated functions such as movement and cognition in these disorders would be highly relevant for rehabilitation.
Footnotes
Contributorship
VB, AS, and SDB contributed to the design of the experiment. VB conducted the experiments. VB and SDB analyzed the results and wrote the manuscript.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical Approval
The study was approved by the Euromov IRB.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported by a CIFRE grant to V.B., in collaboration with the Company NaturalPad, and a Junior Grant from the Institut Universitaire de France to S.D.B.
Peer review
Felicia Cheng, European Neuroscience Institute Göttingen.
One anonymous reviewer.
