Abstract
As musicians have been shown to have a range of auditory skills superior to those of non-musicians (e.g., pitch discrimination ability), it has been hypothesized by many researchers that music training can have a beneficial effect on speech perception in populations with hearing impairment. This hypothesis relies on an assumption that the benefits seen in musicians are due to their training and not due to innate skills that may support successful musicianship. This systematic review examined the evidence from 13 longitudinal training studies that tested the hypothesis that music training has a causal effect on speech perception ability in hearing-impaired listeners. The papers were evaluated for quality of research design and appropriate analysis techniques. Only 4 of the 13 papers used a research design that allowed a causal relation between music training and outcome benefits to be validly tested, and none of those 4 papers with a better-quality study design demonstrated a benefit of music training for speech perception. In spite of the lack of valid evidence in support of the hypothesis, 10 of the 13 papers made claims of benefits of music training, showing a propensity for confirmation bias in this area of research. It is recommended that future studies that aim to evaluate the association of speech perception ability and music training use a study design that differentiates the effects of training from those of innate perceptual and cognitive skills in the participants.
There has been an increase in interest recently in the question of whether music training has causal benefits for a range of speech perception skills in people with and without sensory or other deficits. This review focuses on the question of whether there is evidence that music training provides benefits in speech understanding for people with hearing impairment. Sensorineural hearing loss is associated with difficulty in speech perception, particularly in background noise, a difficulty that is not wholly ameliorated by amplification. This difficulty arises mostly through loss of spectral or temporal information in the periphery (e.g., loss of hair cells and their connections with the auditory nerve—features that are not known to be amenable to plastic changes due to training; Moore, 1996). However, deafness itself can induce plastic changes centrally (Kral et al., 2002; Lee et al., 2003; Sharma et al., 2015). It is therefore of interest to know whether music training can help to overcome the limitations in speech understanding in adults or children with hearing impairment via induced plasticity in the central language networks. Studies that have tested this hypothesis have made the implicit assumption that central brain plasticity induced by music training will overcome or limit the effects of hearing loss.
The hypothesis that music training has causal benefits for skills outside of the music domain (i.e., far transfer of training—such as improved speech perception) has been based on hypothesized specific and general benefits of music training that could be induced via brain plasticity. First, specific skills acquired by music training may overlap with the skills needed for success in the other domain. For example, it has been hypothesized that training in musical pitch perception might transfer to an increased ability to perceive voice pitch, which is an auditory cue that can help to distinguish between two simultaneous talkers (Darwin et al., 2003) or to detect emotional prosody in speech (Bulut & Narayanan, 2008). These hypotheses have been evaluated in studies measuring the frequency following response (FFR) to the fundamental frequency (F0) and its harmonics evoked by a speech syllable. A systematic review of these papers (Rosenthal, 2020), however, challenged this hypothesis and concluded that, although the subcortical F0 response tends to be larger in musicians, the response size is not correlated with speech perception ability in noise. The overlap, precision, emotion, repetition, and attention (OPERA) hypothesis of Patel (2014) is an example of proposed transfer of benefits from music to speech domains via plasticity in shared neural networks. However, examples of far transfer of training are extremely rare in the psychology literature, and many scholars express doubt that it is possible (e.g., Melby-Lervag et al., 2016).
Second, it has been proposed that music training might improve general cognitive or academic skills, and these skills can be used to improve performance on any task-related outcome measure, including speech perception. In the educational field, many existing studies about the effect of music training are targeted at school-aged children, with the purpose of testing whether music training can transfer to cognitive ability or academic achievement (literacy or mathematics). However, a recent meta-analysis by Sala and Gobet (2020) found that, when quality of the research design was taken into account, there was no effect of music training on cognitive skills or academic achievement regardless of age or duration of training. Very small positive effects were seen only in studies with poor design (no randomization and use of non-active controls), implying that those small positive results are very likely to be false positives. This null result is supported by studies suggesting that innate characteristics of musicians are better predictors of intelligence than music training is. For example, a large co-twin control study, in which one twin of each pair was musically trained and the other was not, concluded that the association between musicianship and intelligence was not causal (Mosing et al., 2016).
There are many published papers that compare normal-hearing musicians (with a variety of definitions of “musician”) to non-musicians using cross-sectional and correlational research designs and that show that musicians have a range of psychoacoustic skills that are better than those of non-musicians. These skills include those directly related to musicianship: pitch or melodic contour discrimination (Baskent et al., 2018; Boebinger et al., 2015; Madsen et al., 2017, 2019; Martinez-Montes et al., 2013), temporal beat discrimination (Sares et al., 2018), and ability to attend to stimuli in a complex environment (Tierney et al., 2020; Vanden Bosch der Nederlanden et al., 2020). However, the question of whether these skills are associated with improved speech understanding in musicians in quiet or background noise has not received consistent experimental support. A number of studies have not found a benefit for speech understanding in musicians in spite of benefits being demonstrated for pitch or intensity discrimination (e.g., Baskent et al., 2018; Boebinger et al., 2015; Madsen et al., 2017, 2019; Ruggles et al., 2014). For a review of studies that investigated speech in noise perception in neurologically normal musicians and non-musicians, the reader is referred to Coffey et al. (2017), who outline the possible theoretical bases of the connection between musicianship and speech in noise perception. However, that review does not discuss the possibility that any advantage for speech perception in noise for musicians may be related not to the music training of musicians, but to their innate skills, except for a note that future studies should “use longitudinal training studies to confirm the causal effects …”
Cross-sectional or correlational studies as described earlier, where musicians and non-musicians are compared, cannot distinguish between putative effects of plasticity due to music training and differences that may be due to innate (genetic or developmental) characteristics and skills. For example, superior innate auditory skills or personality characteristics may be a necessary or at least beneficial characteristic for becoming a professional or amateur musician (Swaminathan & Schellenberg, 2018a). A review by Schellenberg (2015) concluded that the association between music training and speech perception may be largely driven by interactions of genes and environment. This conclusion was based on genetic studies (e.g., Hambrick & Tucker-Drob, 2015) showing that musical aptitude (innate musical ability or potential to succeed in musicianship) has more influence on musical achievement than music practice, demonstrating a powerful influence of genes for becoming a successful musician. In addition, genetic linkage studies (e.g., Oikkonen et al., 2015) link musical aptitude with a range of innate auditory and cognitive skills that would also underpin performance of speech understanding. Cognitive skills, such as working memory, non-verbal intelligence, and attentional skills, may similarly contribute to musicianship (Swaminathan & Schellenberg, 2018a, 2018b). Additional group differences, such as socioeconomic status (itself associated with other characteristics such as educational status, quality of parental interaction during childhood, and engagement with other social and intellectual activities), are likely to be relevant. Some authors have argued that a correlation between duration of music training with speech perception outcomes implies a causal effect (e.g., Kraus et al., 2014). However, it is also the case that people with superior auditory and cognitive skills are likely to persist for longer in music training (Swaminathan & Schellenberg, 2018a).
To directly test whether music training has a causal effect on speech perception, longitudinal training studies are required. This review therefore evaluates the longitudinal training studies that have addressed this question in listeners with hearing impairment.
Review Methods
A search on PubMed, PsyArticles, and Google Scholar was undertaken on March 31, 2020, and rechecked on November 30, 2020, using the terms

Database Searching and Selection Flowchart of Articles for the Review.
The papers are discussed with respect to the design and analysis principles that are detailed below. A meta-analysis statistical approach was not taken in this review, as the small number of studies, and the large number of potential covariables that need to be accounted for in this population, do not allow the statistical approach to be usefully interpreted. Instead, each paper is separately assessed and general conclusions drawn.
Assessment of Research Design: Did the Authors Use a Valid Research Design (Crossover or Randomized Control Trial)?
The gold standard research design to test the efficacy of a training therapy is a randomized controlled trial (RCT). In an RCT, the randomization of subjects between test and control groups aims to make the two groups equivalent in the pertinent individual characteristics that may confound the interpretation (George et al., 2016). Randomization is particularly important when some or all of those characteristics are unknown (e.g., in a new drug trial). However, for randomization to be effective, the number of subjects in each group must be large enough to ensure that the two groups end up being equivalent in all potential confounds (i.e., they are both a good representation of the total population). For subjects with hearing loss, there are multiple known potential confounds related to the hearing impairment (hearing loss type and degree, age, age at onset, type of hearing device, etc.) as well as other general features (e.g., IQ, educational level, incidental exposure to music, etc.) and others we may not be able to identify before starting the experiment (such as differences in baseline performance on the outcome measure). This means that the test and control groups may need to be very large or very tightly defined for randomization to be truly statistically effective in this population.
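The sample-size requirement for effective randomization can be illustrated with a short simulation (this sketch is not from the reviewed papers; the group sizes and the half-standard-deviation imbalance threshold are arbitrary assumptions chosen for illustration):

```python
import random
import statistics

def baseline_imbalance(n_per_group, trials=10_000, seed=1):
    """Randomly split 2 * n_per_group subjects into two groups and return
    the proportion of trials in which the group means on a standardized
    baseline covariate (mean 0, SD 1) differ by more than 0.5 SD."""
    rng = random.Random(seed)
    large_gaps = 0
    for _ in range(trials):
        scores = [rng.gauss(0, 1) for _ in range(2 * n_per_group)]
        rng.shuffle(scores)  # random allocation to the two groups
        gap = abs(statistics.mean(scores[:n_per_group])
                  - statistics.mean(scores[n_per_group:]))
        if gap > 0.5:
            large_gaps += 1
    return large_gaps / trials

print(baseline_imbalance(7))    # roughly 0.35 with 7 subjects per group
print(baseline_imbalance(100))  # near 0 with 100 subjects per group
```

With groups of around seven, roughly a third of randomizations leave the groups more than half a standard deviation apart on even a single baseline covariate; with the many covariates relevant to hearing-impaired participants, some substantial imbalance is almost guaranteed.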
For smaller studies, an alternative research design is a crossover design, in which the same participants undergo both the test and the control training in sequence, with the order of the two trainings counterbalanced across participants. Because each participant acts as their own control, this design removes the confounding effects of innate between-group differences, although the interaction of training order with training method must be checked before data from the two orders are combined.
Assessment of Control Training: Did the Authors Use a Control Group With an Appropriate Active Training? Did They Pay Attention to Potential Biases of Trainers and Trainees?
Without a control group, any increase in outcome measures between before and after training cannot be attributed to the training. A similar problem arises if the control group is a “no-training” passive control group. In that case, any increase in outcomes of the control group may be due to expectations of the effect of training (placebo effect) or the increased interactions with experimenters, and such a study could not distinguish effects of music training from another type of training. Such studies can therefore show, at best, that some intervention was associated with improvement, not that the musical content of the training was its cause.
The choice of the active-control training is also important: The control and test training must only differ in the feature under test. Wright and Zhang (2009) discuss the distinction between stimulus learning and procedural learning, where stimulus learning refers to learning the attributes of a stimulus and procedural learning refers to learning of the factors independent of the trained stimulus. The latter factors include task learning, and environmental factors such as lab environment and interactions with trainers. Without an active control group, the contribution of stimulus learning on its own (usually the aim of the experiment) cannot be assessed. All of the potential factors that could induce confounding differences between effects of test and control training, such as quality and duration of interaction with the trainers, and differences between potential strength of placebo effects need to be carefully controlled by an appropriate choice of control training.
Participant bias is a potential confounding factor that is quite difficult to eliminate or even limit in training studies, as participants will always know what sort of training they are experiencing. In the case of music training, social and mainstream media often contain stories about benefits of music training, making participant bias and expectation particularly difficult to control. Therefore, to limit potential participant bias, both test and control training should be, as far as possible, equally plausibly associated with improved speech perception and equally enthusiastically proposed to the participant by the research team and/or trainer. In the case of child participants, the same principles apply equally to parents. Similarly, in a crossover design, participants (and/or their parents) should be told that both training types are proposed to improve speech perception and that researchers do not know which one is hypothesized to be better. The exact script of what participants are told about the study should be predetermined and included in the report. These techniques do not entirely prevent individuals from having their own biases about the benefits of each training method, however.
Biases arising from both the people supplying the training and the researchers testing the outcome measurements also need to be considered and controlled. The interactions between trainer and trainee should be carefully controlled to make sure that participants in each group are receiving equal quality and type of interaction and encouragement to complete the training. The researchers who are measuring the outcome measures (e.g., speech perception) should be blind as to which training group the participant belongs. This last requirement is the easiest for researchers to implement in their research design.
Assessment of Randomization: Did the Authors Randomize Participants to the Test and Control Training? Did the Authors Report and Statistically Handle Dropouts Appropriately?
Without very careful randomization, it is not possible to control effects of innate abilities on the outcomes. For example, studies that compare people who choose to undertake music training with those who do not cannot separate effects of the training from effects of the innate skills and motivations that led to that choice.
In a crossover study design, the challenge of dropouts and non-compliance is somewhat different from that in an RCT. Leaving out subjects based on any missing data will have the same effect for both training methods, as the hypothesis is tested by a within-subject comparison. However, if many participants drop out, the results of the analysis may only apply to people who have characteristics likely to make them compliant. In this case, rate of dropouts and/or non-compliance should also be measured and reported for each training method. Given equal compliance levels and dropout levels, the analysis can be performed with the remaining subjects (unlike in an RCT) to test the hypothesis.
Assessment of the Statistics: Did the Authors Use an Appropriate Statistical Test to Test Their Hypothesis and Did They Take Into Consideration Multiple Comparisons?
An excellent review of common scientific and statistical errors in clinical trial analysis has been published by George et al. (2016), and several pertinent topics relevant to this review are summarized here. First, when comparing test and control training outcomes, the appropriate test statistic is always one that compares the change in outcomes between the groups (e.g., a Group × Session interaction or a between-group comparison of change scores); separate within-group pre- versus post-training tests cannot establish that one training produced more benefit than the other.
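Why separate within-group tests are inappropriate can be sketched with a hypothetical simulation: when two groups improve by exactly the same true amount, one group's pre–post test will often reach significance while the other's does not, even though the valid between-group comparison of change scores (correctly) finds no difference. All parameters below are invented for illustration:

```python
import math
import random

def simulate(n=20, true_gain=0.5, trials=20_000, seed=2):
    """Both groups improve by the same true amount (true_gain). Count how
    often (a) exactly one group's within-group test is 'significant' and
    (b) the between-group comparison of change scores is significant.
    Change scores are Normal(true_gain, 1); z-tests with the known SD of 1
    keep the example free of external dependencies."""
    rng = random.Random(seed)
    crit = 1.96  # two-sided 5% criterion
    discordant = between = 0
    for _ in range(trials):
        a = [rng.gauss(true_gain, 1) for _ in range(n)]
        b = [rng.gauss(true_gain, 1) for _ in range(n)]
        mean_a, mean_b = sum(a) / n, sum(b) / n
        za = abs(mean_a) * math.sqrt(n)          # within-group test, group A
        zb = abs(mean_b) * math.sqrt(n)          # within-group test, group B
        zdiff = abs(mean_a - mean_b) * math.sqrt(n / 2)  # between-group test
        if (za > crit) != (zb > crit):
            discordant += 1
        if zdiff > crit:
            between += 1
    return discordant / trials, between / trials

disc, betw = simulate()
print(disc)  # often above 0.4: one group 'significant', the other not
print(betw)  # near 0.05: the valid test rarely flags a spurious difference
```

Interpreting the discordant within-group results as a difference in training benefit is exactly the fallacy discussed later in this review, whereas the between-group change comparison keeps the false-positive rate at its nominal level.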
Two other common errors are mentioned by George et al. (2016): failure to account for multiple comparisons and effects of regression to the mean. Many studies test hypotheses about multiple outcome measures, therefore increasing the chance of type I error. “Regression to the mean” describes a phenomenon whereby, in repeated measures in the same subject, those people with the highest scores tend to have lower scores when retested and those with the lowest scores tend to have improved scores on retest. This can be due both to random variation in test results (experimental error) and to real effects. This phenomenon can significantly bias the interpretation of statistical results when the baseline measures of the test and control groups differ and can be measured using a
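Regression to the mean is easy to reproduce in a hypothetical simulation: when each observed score combines a stable true ability with independent measurement error, subjects selected for extreme baseline scores shift toward the mean on retest with no intervention at all (the normal distributions and the 10% cutoff below are illustrative assumptions):

```python
import random
import statistics

def retest_shift(n=10_000, noise_sd=1.0, seed=3):
    """Each subject has a stable true score ~ N(0, 1); each test adds
    independent measurement error ~ N(0, noise_sd). Return the baseline and
    retest means of the subjects whose baseline fell in the bottom 10%."""
    rng = random.Random(seed)
    true = [rng.gauss(0, 1) for _ in range(n)]
    test1 = [t + rng.gauss(0, noise_sd) for t in true]
    test2 = [t + rng.gauss(0, noise_sd) for t in true]
    cutoff = sorted(test1)[n // 10]  # 10th-percentile baseline score
    low = [(t1, t2) for t1, t2 in zip(test1, test2) if t1 <= cutoff]
    base = statistics.mean(t1 for t1, _ in low)
    retest = statistics.mean(t2 for _, t2 in low)
    return base, retest

base, retest = retest_shift()
print(round(base, 2), round(retest, 2))  # retest mean is markedly higher
```

With no training at all, the lowest-scoring subjects “improve” substantially on retest; a group with lower baseline scores will therefore show a larger apparent gain than a better-scoring comparison group.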
Results
Thirteen papers were identified that met the inclusion criteria. Eleven of the 13 addressed the population of users of cochlear implants (CIs; seven in adults, four in children), with the remaining two addressing children with either hearing aids or CIs and adults with normal hearing or mild hearing impairment. The 13 papers are listed in Table 1, in which the columns detail information about the research design (type of design, control group and training, randomization, and outcome measures tested).
List of Papers Reviewed With Information About Research Design and Outcome Measures.
Of the 13 papers identified, four (Cheng et al., 2018; Firestone et al., 2020; Hutter et al., 2015; Smith et al., 2017) used no control group. Although all these papers claimed improvements in speech perception in CI users after training, with no control group, these improvements cannot be validly attributed to training or any aspect specifically about music. Cheng et al. (2018) studied 22 pediatric Mandarin-speaking CI users who were trained using the melodic contour identification (MCI) test for 30–60 hours over 8 weeks. Outcome measures were MCI performance, lexical tone identification, and sentence understanding in quiet, all of which improved over the five test sessions 2 weeks apart. Hutter et al. (2015) studied 12 newly implanted adult CI users, who undertook ten 45-minute music therapy sessions over an average of 134 days. The music therapy consisted of five modules that included both music (pitch, rhythm, timbre) and speech perception training. Outcome measures were three questionnaires (with submeasures) of sound quality, self-concept, and therapy satisfaction, along with three music assessments (pitch discrimination, timbre identification, and melody recognition). Assessment of musical timbre and melody identification, but not pitch discrimination, improved after music therapy, as did estimated sound quality and self-concept. However, in a new CI user, most aspects of hearing improve rapidly in the first 3 months after implantation (Blamey et al., 2013; Blamey et al., 2001; Lazard et al., 2012), so any effect of music therapy cannot be deduced from this study. Smith et al. (2017) studied 21 experienced adult CI users who undertook music training using self-administered melody training software. Speech outcome measures were sentences in quiet and in noise. Participants were assessed before and after training and at 6 months post-training and were split for analysis between low and high levels of baseline music experience.
Results showed improvements in both speech perception measures only for the low-musical-experience group. However, no between-group statistics were presented, and the music-experienced group had better baseline speech perception scores, suggesting that regression to the mean may have affected the difference between groups. No statistics were presented for both groups combined. Firestone et al. (2020) studied 11 experienced adult CI users, who were instructed to listen to music of their choice for 40 minutes a day, 5 days a week, for 4 or 8 weeks. No training tasks were involved. Outcome measures were obtained from three speech perception tests (words, sentences in quiet, and sentences in noise), a hearing questionnaire, a frequency change detection test, audiometric thresholds, and four cortical acoustic change response parameters for three sizes of frequency change. Significant differences between pre- and post-training were observed in all behavioral measures but not in the electroencephalogram (EEG) measures. It should be noted that the audiograms showed lower thresholds in the post-test session compared with the pre-test session, so the improvements in speech perception may have been caused by better audibility (e.g., due to a higher volume or sensitivity setting in the CI or changes in the test environment).
A further paper compared two different music trainings without any non-music control group. Lo et al. (2015) studied 16 experienced adult CI users and randomized them between two types of melodic contour training, one of which manipulated the difficulty using pitch intervals, whereas the other manipulated duration cues. They hypothesized that both types would improve speech perception due to improved prosodic cues and F0 tracking but that the duration group would have additional specific benefit for identification of stop consonants due to better perception of voice onset time and formant transitions. Unfortunately, as there was no non-music control group, the research design only allows inferences about the relative benefits of the two types of music training, not about whether music training itself improved speech perception.
Four papers used a passive “no-training” control group (Dubinsky et al., 2019; Lo et al., 2020; Petersen et al., 2012; Yucel et al., 2009). Again, although any improvements seen in such studies might be due to the training, any difference before and after training or between groups may be due not to the intervention but to expectations of participants or researchers (placebo-type effects), or due to the additional beneficial interactions between trainers and participants that would happen with any training scheme. In addition, none of these four studies randomized participants to control and test groups, making it possible or likely that the two groups differed on important innate characteristics.
Dubinsky et al. (2019) studied older adults with normal hearing or mild hearing impairment. Forty-five participants in a group singing course were recruited for the test group (nine withdrew from the study), and the passive control group was made up of age- and audiometrically-matched other adults. Because the test group participants were self-selected, the possibility of innate differences between groups was high, especially considering that nine withdrew from the study. The 10-week singing course was supplemented by online musical and vocal training exercises. Outcome measures were sentence perception in noise, frequency difference limens, and two EEG measures (FFR amplitude and phase coherence). Group × Session interactions were significant for speech perception in noise and for frequency difference limens; for the EEG measures, there was a trend toward an interaction for FFR amplitude but not for phase coherence. The trend toward significance in the EEG amplitude data seemed to be driven by the unequal baseline EEG amplitudes (i.e., regression to the mean). Although the test group gained more improvement in speech perception in noise than the control group, the use of a passive control and self-selection in the test group makes the interpretation of this result problematic.
Lo et al. (2020) recruited 14 children with moderate to profound hearing loss who used a variety of hearing aids and CIs. Five were assigned to start 12 weeks of music training immediately (but two withdrew before completing the study), and nine were assigned to start 12 weeks later. Three of the 14 did not have music training, making the final composition of the groups unclear. Although group allocation was “pseudorandom,” parents could opt for a different group for convenience. Changes in outcome measures over the first 12 weeks in the wait-list group were used as passive (no-training) control data; however, the main hypothesis testing did not include Group (trained vs. wait-list) as a factor.
Petersen et al. (2012) studied 18 newly implanted adult CI users, divided into a test group who undertook 6 months of music training, and a no-training group. The groups were matched on hearing factors and not randomized. It is unclear whether there was any self-selection for the music training arm. The music training consisted of 1-hour/week face-to-face training plus home practice using computer applications. Outcome measures were a music test battery with five subtests, speech perception in noise, and emotional prosody. Group × Session interactions were calculated for each outcome measure, with three of the five music tests showing greater improvement in the test group compared with the control group (one of the two musical instrument identification tests, rhythm detection, and MCI). Gains in pitch ranking, speech perception, and emotional prosody identification were all not different between the test and control groups. Multiple comparisons were not taken into account.
Yucel et al. (2009) studied 18 newly implanted children (mean age of implantation around 4 years) who were assessed preimplantation and over 2 years following implantation. The test group was enrolled in a program that included music training carried out at home with a computer and electronic keyboard, consisting of pitch and rhythm tasks and color-coded playing of tunes (mean time approximately 2–3 hours per month). The control group was selected from a different research program that did not include music training; no further information about how it was selected was given. Outcome measures at each test point were speech sound detection, closed-set word identification, and two types of open-set sentence perception tests, along with a parent questionnaire about music perception after 12 and 24 months. In addition, parent questionnaires were administered to assess use of sound in everyday situations. Separate statistics compared speech perception of music and control groups at each of 6 time points between preimplant and 24 months postimplant (a total of 18 tests without control of multiple comparisons), with only one instance of a significant group difference.
Two further papers (Chari et al., 2020; Fuller et al., 2018) used a design that compared two types of music training with a non-music control group. Chari et al. (2020) tested the hypothesis that auditory-motor music training is better than auditory-alone music training for adult CI users. Subjects were randomly assigned to the three groups. However, it should be noted that (a) with such small numbers (4–7 in each group), randomization is unlikely to make the groups equivalent on confounding characteristics in CI users and (b) two subjects were excluded for “failure to complete the training,” but it was not stated which group(s) these subjects originally belonged to. Home-based training consisted of 30 minutes a day for 5 days per week for 4 weeks (10 hours total). Both music trainings involved training MCI using commercial software. Outcome measures included speech perception in quiet and noise, speech prosody perception, and two musical tests (pitch perception and melodic contour perception). To test the hypothesis, the “change measure” was calculated for each outcome measure, and a one-way analysis of variance was used to test differences in changes across the three groups. No differences were found between groups for any outcome measure except for MCI using the piano tones (a test directly associated with the training applied), where the auditory-motor training group had greater benefit than the auditory-alone group. However, this last result appears doubtful because the auditory-motor group happened to have lower baseline scores than the auditory-alone group on this measure, making the result susceptible to regression to the mean. This study had an appropriate design and analyses for testing whether auditory-motor was better than auditory-alone music training for the outcome measures, except for the rather small sample sizes.
Fuller et al. (2018) evaluated the effects of two different types of music training (pitch/timbre training and music therapy) in adult CI users in an RCT. An active non-musical control training was also included which consisted of writing, cooking, and woodwork classes. Training consisted of weekly 2-hour sessions for 6 weeks. The outcome measures tested were speech perception in quiet and noise and vocal emotion identification, as well as MCI and quality of life. Although the group allocation was randomized, the authors acknowledged that the final groups could not be equivalent on average for all relevant features as there were too many variables and low numbers of participants (6 or 7 per group). They reported that there was a non-significant Group × Session interaction for both the speech perception outcome measures. For vocal emotional identification, there was also a non-significant Group × Session interaction, indicating no significant difference on training outcome between the three groups. However, they then reported that the music therapy group had a significant within-group training effect, different from the other groups, and invalidly claimed this as evidence of intramodal training (as the music therapy included emotion identification training). The conclusion of this article that “… computerized music training or group music therapy may be useful additions to rehabilitation programs for CI users … ” was therefore not substantiated by the data. This article had a better design than the ones reported earlier, in that they used an active control group, randomized participants, and assessed the Training × Group interaction terms. However, the very low sample size would make randomization ineffective in CI users, and the choice of active control was not ideal to help limit biases of participants or experimenters. Nevertheless, the results did not support the benefit of music training for speech perception.
Good et al. (2017) tested the hypothesis that music (piano and singing) training was superior to visual art training for development of perception of emotional speech prosody in 18 children with CIs. Students were mostly assigned to two different test locations for the two trainings, based on geographical preference, with the remainder being randomized. However, 7 out of 25 students recruited dropped out of the program before completion. The authors did not report which group they were originally in. The test group had 6 months (half hour per week, total 12 hours plus weekly practice) of music training, and the control group had the same hours of visual art training. Most children in both groups also participated in school-based musical activities. Outcome measures were musical abilities (Montreal Battery for Evaluation of Musical Abilities: Peretz et al., 2013) and perception of emotional prosody. For musical abilities, the interaction of Session × Group was significant, showing that the music training group improved musical outcome measures more than the art training group. For emotional prosody tests, there was a non-significant interaction of Group × Session, showing that music training did not improve emotional prosody perception more than did art training. Unfortunately, in spite of the non-significant interaction term, the authors then proceeded to commit the statistical fallacy of interpreting differences in significance of training effects in individual groups as evidence of differences in benefit of the training methods and incorrectly inferred that “music training improved emotional speech prosody.”
Bedoin et al. (2018) tested the hypothesis that musical (rhythmic) primes (played by percussion instruments) used in morphosyntactic training exercises would improve syntax processing in 10 children with CIs more than the same training using primes of (non-rhythmic) environmental sounds. Primes are stimuli that precede the training stimuli and are intended to draw the trainee's attention to the relevant features of the training stimuli. Musical primes were rhythmic structures taken from four 30-second musical sequences. Non-musical primes were 30-second sequences of environmental sounds (street, cafeteria, playground, market). In the grammatical judgment training sessions, 10 sentences were presented (preceded by the primes), and participants were asked to detect and correct morphological errors. In the training of morphosyntactic comprehension, participants had to follow the instructions in five sentences. The musical primes were not matched in meter with the sentences used. The authors used a crossover design, and the outcome measures were grammatical processing, syntax processing, non-word repetition, attention, and memory. The two groups for the different training orders were selected based on the “best balance” of age and performance on the two morphosyntactic outcome measures. Each child had eight 20-minute sessions of each training method, two per week. The Training × Session interaction was significant in favor of the rhythmic primes over the environmental primes for grammatical judgment but not for syntactic processing (in contrast to the hypothesis, which was that syntactic processing would benefit more from rhythmic primes than would grammatical judgment).
For the non-word repetition and all the cognitive tests, the interaction term was not reported (likely because it was not significant—certainly so in the case of non-word repetition, where individual data were presented), and hence the reported within-training analyses of outcome measures cannot be interpreted in terms of which training was better. However, the authors claimed that “ . . . musical primes enhanced the processing of training syntactic material, thus enhancing the training effects on grammatical processing as well as phonological processing and sequencing of speech signals.” All of these claims are unsubstantiated by the data: There was no test of a causal relation between syntactic training material processing and grammatical processing, and the claimed benefits to phonological processing and sequencing of speech were unsupported by cross-group analyses. This article did not assess the effect of music training per se but rather the use of rhythmic primes to improve syntax training. There was no mention of blinding of participants or researchers, and the interaction of training order and training method was not checked before orders were combined for analysis.
Overall, it is notable that none of the 13 papers, including those with better research designs (use of active controls and randomization), mentioned attempts to limit bias of participants, trainers, or testers. In addition, the non-music active control training choices (visual art, woodworking, cooking, and writing) were likely to introduce bias via higher expectations of participants, trainers, and testers for music training. In spite of these design limitations, none of the 13 papers produced statistically valid evidence to support the specific hypothesis that music training improves speech perception in hearing-impaired listeners. Of the papers that included valid analyses of their data, none found that music training improved speech perception: There was no evidence that music training was better than visual art training (Good et al., 2017) or writing/cooking/woodwork training (Fuller et al., 2018), no evidence that auditory-motor music training was better than auditory-alone music training (Chari et al., 2020), and no evidence that rhythmic primes improved syntactic processing in speech or non-word repetition more than did environmental primes (Bedoin et al., 2018).
Discussion
The review has found no evidence to support the hypothesis that music training has a significant causal effect on speech understanding or speech processing in hearing-impaired populations. Indeed, the papers with a higher quality research design showed no significant benefit when valid statistics were used. Although insufficient statistical power in the studies reviewed may have contributed to this null result, an alternative interpretation is that music training does not transfer to speech perception benefits in hearing-impaired people to any clinically relevant degree. Therefore, either music training is ineffective for improving speech understanding in general, or the limitations imposed by hearing loss or listening with a CI cannot be overcome using music training, or both. The first proposal is supported by the studies reviewed in the introduction showing that innate characteristics (such as IQ, musical aptitude, and personality) predict musicianship and independently predict better speech perception (Schellenberg, 2015; Swaminathan & Schellenberg, 2018a, 2018b). The distinction between plastic effects of training and innate characteristics of the person being trained is particularly important when music training is proposed as a therapy in clinical populations and the hypothesis is solely about exploiting plasticity induced by the training. In that case it is extremely important that the experiment (which is essentially a clinical trial) is specifically designed to limit any genetic or innate differences between participants in the comparison groups, for example, by careful randomization or a crossover design. A comparison of children who choose to take music lessons, or who engage more in music lessons, with those who do not, is as much a test of genetic differences as it is a test of the influence of music training.
All but two of the papers reviewed addressed questions in the profoundly deaf population who use CIs. All of those studies except Bedoin et al. (2018) based their hypotheses at least partially on the proposal that improving (musical) pitch perception would translate to speech perception benefits via better perception of voice pitch (F0). CIs cannot convey complex pitch (such as musical pitch or voice pitch) with the degree of salience available to people with normal hearing (Fielden et al., 2015; McDermott & McKay, 1997). The design of CIs means that fine details of harmonic structure in speech and music (such as precise frequency and spacing) are not represented, yet it is resolved harmonics that the normal-hearing system processes to extract salient pitch from sounds. A very weak pitch related to periodicity in the electrical signal can be heard (McKay et al., 1994); however, many studies have shown that this periodicity pitch cannot be reliably heard in real-life situations, as the modulations in different electrical channels can interact with each other (McDermott & McKay, 1997; McKay & Carlyon, 1999; McKay & McDermott, 1996). Many signal processing strategies have attempted to improve the transmission of complex pitch for CI users, but the results have been equivocal or unsuccessful (Wouters et al., 2015). The most successful way to transmit complex pitch to a CI user may be via concurrent use of any low-frequency residual acoustic hearing that the person may have (Chen et al., 2014; Straatman et al., 2010; Visram, Azadpour, et al., 2012; Visram, Kluk, et al., 2012). Because the difficulty of representing pitch in CIs stems from inbuilt limitations of how a CI works, it is doubtful whether pitch training can have a beneficial effect on pitch perception in real-life listening, let alone one that transfers to speech understanding or the perception of emotional prosody.
In addition, most implant systems do not transmit modulations above 300 Hz, so the F0 of high female voices and children’s voices, and the F0 of many musical notes, is not represented in the modulations of the signal (Wouters et al., 2015). Adding to the difficulty of the concept that music training can transfer to better speech understanding in CI users, many studies have shown that, even though musicians may have better pitch discrimination than non-musicians, better pitch discrimination does not necessarily translate to better speech perception in noise, or better use of voice pitch to distinguish between two simultaneous talkers.
Six of the papers hypothesized that music training would improve the perception of speech prosody (emotion detection and/or differentiation between questions and statements) by CI users, based on the fact that F0 provides cues for prosody in people with normal hearing. However, in everyday conversations there are concurrent alternative or correlated cues to prosody, such as changes in intensity, duration, or timbre (Coutinho & Dibben, 2013), that are more reliably perceived than voice pitch by a CI user. Direct training in prosody perception is therefore likely to be a more fruitful way of improving this aspect of speech perception in CI users than pitch training.
The Bedoin et al. (2018) paper investigated whether the rhythm features of music could translate to better processing of speech for CI users. This idea follows a series of papers in which it has been shown that preceding a spoken utterance with a rhythmic prime that matches the meter (stress pattern and number of units) of the utterance leads to improved, or faster, processing of the speech, due to the expectations set up by the prime, compared with a prime that does not match the utterance (Cason, Astesano, et al., 2015; Cason & Schon, 2012). That is, knowing ahead of time the meter structure of the sentence you are going to hear helps you to process the sentence better. This effect is not surprising, as the matched prime provides useful cues to the following sentence or multisyllable utterance and focusses attention on the stressed syllables in the utterance. Similarly, matched primes improved the speech production of a following heard sentence in hearing-impaired children (Cason, Hidalgo, et al., 2015). However, in the Bedoin et al. paper, the rhythmic primes used in the training sessions were not related to the training material that immediately followed the prime, so it is unclear how the rhythmic prime could improve the within-session effectiveness of the training more than the non-rhythmic one (or than no prime at all). In addition, no paper has reported that the effect of using primes extends any further than the immediately following utterance.
This review has highlighted the difficulty of undertaking high-quality research in the hearing-impaired population to investigate potential benefits of music training. First, there is the challenge of participant and experimenter bias, given the popular community view of music as beneficial “brain training,” as promulgated in the media. Experimenters are not immune to this bias, as evidenced by the unsubstantiated claims of benefit made in the majority of the papers reviewed. Confirmation bias, whereby researchers report only the data or analyses that support their original belief and ignore those that do not fit, seems ubiquitous. The same point was made strongly by the authors of the large meta-analysis of music training for cognitive benefit in children (Sala & Gobet, 2020), who state, “We conclude that researchers’ optimism about the benefits of music training is empirically unjustified and stems from misinterpretation of the empirical data and, possibly, confirmation bias.” Along similar lines, a study of whether papers about music training inferred causality from correlation found that 72 of the 109 papers reviewed invalidly made this inference, a proportion that rose to 81% among papers written by neuroscientists (Schellenberg, 2020). If research studies are undertaken with the mindset of “demonstrating a known fact” instead of testing a hypothesis, it is no surprise that researchers are tempted to find whatever they can in the data to present to the reader as supporting the “fact.”
The second challenge faced by experimenters is to find a way to ensure that test and control groups are truly equivalent on all relevant factors. Randomization will only be effective if large numbers of participants are used, and few people withdraw from the program. For small groups of profoundly deaf children or adults, this task is virtually impossible. Trying to match groups poses the same problem and opens the opportunity for bias of group assignment. A crossover research design would seem a better choice in this population, provided effects of test order can be carefully controlled.
A different experimental approach using correlational studies has been proposed by Swaminathan and Schellenberg (2018b), in which effects of training and musical aptitude (and other potential confounding factors) are partialled out. When examining the association between music training and non-music abilities such as speech understanding, or cognitive abilities, the association is tested using partial correlations with multiple known factors that account for overlapping variance. In particular, they use a music aptitude test (which is correlated with cognitive abilities and predicts success in music training) as one such factor. For example, when music training was kept constant, an association between music aptitude and phoneme perception in adults was found (Swaminathan & Schellenberg, 2017); however, when musical aptitude was kept constant, no association between music training and speech perception or intelligence remained. The authors concluded that musical aptitude, rather than music training itself, accounted for the apparent association between training and speech perception.
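The logic of this partialling approach can be sketched with simulated data (a hypothetical illustration; the variable names and effect sizes are invented, not drawn from the cited studies). When a shared factor such as musical aptitude drives both engagement in training and speech scores, the raw correlation between training and speech perception is substantial, but the partial correlation controlling for aptitude collapses toward zero:

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation of x and y after regressing the covariate z out of
    both variables and correlating the residuals."""
    zc = np.column_stack([np.ones_like(z), z])
    rx = x - zc @ np.linalg.lstsq(zc, x, rcond=None)[0]
    ry = y - zc @ np.linalg.lstsq(zc, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]

# Hypothetical data: aptitude drives both training duration and speech
# scores; training itself has NO independent effect on speech scores.
rng = np.random.default_rng(0)
n = 200
aptitude = rng.normal(size=n)
training = 0.7 * aptitude + rng.normal(scale=0.7, size=n)
speech = 0.7 * aptitude + rng.normal(scale=0.7, size=n)

r_raw = np.corrcoef(training, speech)[0, 1]           # inflated by aptitude
r_partial = partial_corr(training, speech, aptitude)  # near zero
print(f"raw r = {r_raw:.2f}, partial r = {r_partial:.2f}")
```

In this construction the raw training-speech correlation is around .5 even though training contributes nothing, mirroring the argument that a simple musician/non-musician comparison cannot distinguish training effects from innate aptitude.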
The finding of this review that there is no evidence that music training benefits speech perception in hearing-impaired people does not detract from the evident social, cultural, and enjoyment benefits that music education can confer. Music is an important feature of human cultural and social life, and the large majority of people enjoy listening to music even if they do not perform music or undergo music education. This cultural relevance is particularly pertinent for the hearing-impaired population, who may be unfairly discouraged from participating in musical activities. It is important that equity of access to music education is sought and that engagement in music making is encouraged for all young people with hearing impairment. There is no need to promote music education on the grounds of supposed academic or speech and language benefits, as it has values of its own that make it worth including in education for everyone, including those with hearing impairment.
Conclusions and Recommendations
It is clear that the benefits of music training for speech perception in the hearing-impaired population have not been convincingly demonstrated and probably do not exist to any practically relevant degree. Future researchers who wish to scientifically study whether music training can lead to benefits in non-music domains of cognition such as language perception or language development need to plan their studies to robustly and validly test their hypotheses and to actively distinguish between plasticity effects of training and innate characteristics.
Supplemental Material
Supplemental material, sj-pdf-1-tia-10.1177_2331216520985678, for No Evidence That Music Training Benefits Speech Perception in Hearing-Impaired Listeners: A Systematic Review by Colette M. McKay in Trends in Hearing.
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
References
