Abstract
Research on the effect of background music (BgM) on cognitive task performance is marked by inconsistent methods and inconclusive findings. To bring clarity to this area, we performed a systematic review of the impact of BgM on performance in a variety of tasks whilst considering the contributions of task, music, and population characteristics. Following the PRISMA and SWiM protocols, we identified 95 articles (154 experiments) comprising cognitive tasks across six cognitive domains: memory; language; thinking, reasoning, and problem-solving; inhibition; attention; and processing speed. Extracted data were synthesized using vote counting based (solely) on the direction of effects and analyzed using a sign test. Overall, our results demonstrate a general detrimental effect of BgM on memory and language-related tasks, and a tendency for BgM with lyrics to be more detrimental than instrumental BgM. Only one positive effect (of instrumental BgM) was found, and in most cases we did not find any effect of BgM on task performance. We also identified a general detrimental impact of BgM on difficult (but not easy) tasks and on introverts (but not extraverts). Taken together, our results show that task, music, and population-specific analyses are all necessary when studying the effects of BgM on cognitive task performance. They also call attention to the need to control for task difficulty as well as individual differences (especially level of extraversion) in empirical studies. Finally, our results show that many areas remain understudied and that much more work is needed to gain a comprehensive understanding of how BgM impacts cognitive task performance.
Introduction
With the growth in the accessibility, exposure, and consumption of music in everyday life, people engage with music listening in a wide variety of situations and contexts (Bull, 2006; North et al., 2004). Interestingly, amongst these music listening behaviors, research shows that on most occasions people listen to music when they are engaged with other tasks like studying or working, exercising or doing housework, shopping or traveling, amongst many others. Some of the key reasons for listening to music in these situations are fighting boredom, passing the time, general entertainment, and out of habit (Greasley & Lamont, 2011; Juslin et al., 2008; Lonsdale & North, 2011; North et al., 2004; Randall & Rickard, 2017; Rentfrow, 2012; Sloboda et al., 2001; Stratton & Zalanowski, 2003).
Amongst these activities, some of the most common involve mental work that requires intensive cognitive functioning. For instance, Calderwood et al. (2014) conducted a study to understand what other activities students normally engage with whilst studying and found that, in a 3-hr study session, students spent more than one-third of the time (73 min) listening to music. Similarly, a survey of 295 office employees in the UK showed that employees reported spending an average of 36% of their working week listening to music (Haake, 2011). In fact, listening to music often features in lists of tips and hacks for achieving better work productivity and cognitive performance (D’Angelo, 2022; Robinson, 2020; Spherion Staffing & Recruiting, 2022). Interestingly, when students and employees were asked about their reasons for listening to music whilst working or studying and the perceived impact music has on them, the answers tended to be mood-related (e.g., improves mood, helps relaxation, alleviates boredom) rather than related to enhancing cognitive performance or work quality (Haake, 2011; Kotsopoulou & Hallam, 2010). Still, other studies suggest that improving ‘efficiency’ is also a key reason (Kononova & Yuan, 2017).
Whether or not music elevates mood and increases motivation whilst people engage in mental work, human cognitive capacity is limited (the brain can only attend to and process a limited amount of information at one time; Eysenck & Keane, 2020), and it is plausible to ask whether (or to what extent) background music (BgM) listening can hinder cognitive performance. At the same time, BgM listening can also help sustain attention and prevent mind-wandering in low-demand cognitive tasks (Kiss & Linnell, 2020) and can improve performance by mitigating task-related cognitive interference (e.g., inhibiting instinctive responses in a color Stroop task; Masataka & Perlovsky, 2013). Therefore, it is also plausible to ask whether, to what extent, and in what circumstances BgM improves cognitive performance.
Music During the Execution of Cognitive Tasks: Good or Bad?
It should come as no surprise that discerning the effects of BgM listening on cognitive performance has become a very popular research area. Indeed, BgM may (consciously or unconsciously, positively or negatively) interfere with a variety of cognitive processes (e.g., Haake, 2011), and the ubiquity of this habit (Calderwood et al., 2014; David et al., 2015; Haake, 2011; Kononova & Yuan, 2017) demands that more attention be paid to its implications. Unfortunately, research in this area has been marked by inconclusive findings, with many studies showing that BgM can have beneficial (Crust et al., 2004; Mammarella et al., 2007; Miller & Schyb, 1989; Proverbio & De Benedetto, 2018), detrimental (Alley & Greene, 2008; Avila et al., 2012; Deng & Wu, 2020; Liu et al., 2017; Perham & Currie, 2014; Perham & Vizard, 2011; Xiao et al., 2020), or no effects (Burkhard et al., 2018; Ferreri et al., 2015; Kou et al., 2018; Liu et al., 2012; Reynolds et al., 2014) on performance in a wide range of cognitive tasks.
In order to foster an understanding of these findings, researchers have conducted one systematic review (De La Mora Velasco & Hirumi, 2020) and two meta-analyses (Kämpfe et al., 2010; Vasilev et al., 2018) of studies published in this area, but there are some inconsistencies amongst their reports:
Kämpfe et al. (2010) concluded that BgM hinders performance on memory-related tasks and reading comprehension. Vasilev et al. (2018) concluded that BgM hinders reading comprehension, that BgM with lyrics (L-BgM) is more detrimental than instrumental BgM (I-BgM) for reading comprehension, and that BgM also hinders reading speed (i.e., slows down reading). De La Mora Velasco and Hirumi (2020) did not find any definitive effects of BgM on cognitive performance.
Arguably, the different conclusions reached by these reviews are the result of methodological differences and limitations. For instance, the inclusion/exclusion criteria (and the resulting list of included articles) are quite diverse across these reviews, which naturally has direct implications for the findings. One of these criteria is the targeted population: Kämpfe et al. (2010) reviewed only studies of adult participants, whereas Vasilev et al. (2018) and De La Mora Velasco and Hirumi (2020) reviewed studies of both adults and children (whose cognitive control capacity differs from that of adults; Cowan et al., 2006). Another criterion concerns the types of outcome measures being assessed: De La Mora Velasco and Hirumi (2020) only vaguely described the eligibility criteria for their outcome measures (“an explicit learning outcome”), whereas Kämpfe et al. (2010) did not outline the inclusion/exclusion criteria for the types of outcome measures being assessed. Another methodological difference amongst these reviews concerns the publication timeline (and correspondingly the sample sizes of the reviews). Kämpfe et al. (2010) included all articles published before the year 2008, and Vasilev et al. (2018) included all articles published before 2017 (the starting year was not mentioned in either review). However, De La Mora Velasco and Hirumi (2020) only reviewed articles published between 2008 and 2018. As a result, De La Mora Velasco and Hirumi (2020) sampled only 30 articles, whereas Kämpfe et al. (2010) and Vasilev et al. (2018) sampled 97 and 65 articles, respectively. Given that the findings in De La Mora Velasco and Hirumi (2020) differed from the other two, the 10-year coverage in their systematic review may not be sufficient to obtain a sizable and representative sample of studies that allows detection of a true effect. The last methodological difference we identified is the analytical approach.
Unlike the other two reviews, De La Mora Velasco and Hirumi (2020) performed vote counting based on the statistical significance reported in their sample. However, they did not conduct any inferential analysis (e.g., binomial test, sign test, etc.) and therefore the results are merely descriptive. This largely reduces the review's power in identifying meaningful outcomes (Chaimani et al., 2021; Deeks et al., 2021), which, again, could conceal any potential true effects of BgM.
Aside from the methodological differences between previous attempts to synthesize evidence on the impact of BgM on cognitive performance, there is also a general lack of consideration of the multiplicative interactions between task-specific factors (e.g., task difficulty, cognitive domain), music-specific factors (e.g., presence of lyrics, arousal level, etc.), and population-specific factors (e.g., personality traits, music education, etc.). For example, considering the impacts of different types of BgM, performance in verbal processing tasks such as reading comprehension tends to be hindered by lyrical but not instrumental music (Perham & Currie, 2014). Masataka and Perlovsky (2013) also observed that cognitive interference (e.g., in a color Stroop task) was mitigated by consonant music but exacerbated by dissonant music. In terms of task complexity, BgM also tends to hinder performance in complex (but not simple) tasks (Gonzalez & Aiello, 2019). Finally, the direction of these effects could further vary amongst different populations, e.g., extraverts vs. introverts (Furnham & Strbac, 2002), musicians vs. nonmusicians (Herath, 2018), or individuals with higher vs. lower working memory capacity (Christopher & Shelton, 2017). Simply put, in order to obtain a more comprehensive understanding of the phenomenon, the evaluation of BgM's effect on cognitive task performance should encompass task, music, and population-specific analyses.
Another relevant aspect is the fact that none of the three reviews conducted a quality assessment (i.e., methodological quality, reporting quality) on their sampled articles. This is also an important limitation because quality assessments enable both reviewers and readers alike to balance a review's findings against the methodological qualities of the articles that contributed to the findings, which then helps them to better appraise the overall quality of the evidence from the review (Wells & Littell, 2009). This is particularly important to an area of research marked by inconsistent findings and a lack of standard procedures that encompass the heterogeneity of variables and situational factors that can interfere with the findings.
Revisiting the Evidence
Due to conflicting evidence regarding the impact of BgM on cognitive performance and the limitations of previous attempts to synthesize evidence in this area, we propose to revisit the evidence from a different perspective. Through a systematic review of the empirical studies published so far on this topic, we aim to quantitatively evaluate the effect of BgM on cognitive task performance separately for different cognitive domains and tasks, and to consider the contribution of different music and population characteristics. Specifically, our research questions are as follows:
How does BgM affect performance in different types of cognitive domains and tasks?
Are there any music characteristics (e.g., presence of lyrics, volume, tempo, genre, tonality, etc.) that contribute to the effect of BgM on cognitive task performance?
Are there any population characteristics (e.g., personality traits, musical background, gender, etc.) that contribute to the effect of BgM on cognitive task performance for specific listeners?
Methods
Protocol and Registration
The protocol of the review was registered on PROSPERO (Cheah et al., 2020). The review was conducted based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement (Page et al., 2021). The PRISMA 2020 Checklist can be found in Appendix A.
Eligibility Criteria
Studies were included if they met the following criteria:
primary research written in English and published in an academic journal from January 1, 1960, to July 2, 2020;
the sample population consisted of individuals aged 16 years or older;
at least one of the interventions consisted of performing a cognitive task whilst listening to BgM;
at least one of the outcomes was a quantitative measure of performance in a cognitive task that is typically required in office desk jobs or studying;
the study had a control condition that was either silence, no music, or natural sounds that are expected to occur in the environment in which the study was executed (e.g., café noises, office noises, etc.).
Studies were excluded if they met any of the following criteria:
the sample consisted of a special population with particular cognitive or health conditions or disabilities that would systematically affect cognitive performance (e.g., people with dementia, Parkinson's, Autism Spectrum Disorder, any form of learning disabilities, etc.);
music was not being listened to during the execution of a cognitive task (e.g., music was listened to before the cognitive task; music-making, etc.);
the study did not contain an adequate control condition (see inclusion criteria above);
the outcome measure consisted of a cognitive performance that is not typical of a regular desk job or studying, which includes:
motoric behaviors (e.g., driving, exercising, surgical procedures, etc.);
moral reasoning/solving moral dilemmas;
communication/social/interpersonal skills (e.g., attention to the body language of others, interpreting the facial expression of others, interpersonal/group bonding or controlling agitated behaviors, etc.);
musical tasks (e.g., pitch identification, tempo recognition, etc.);
sensorimotor adaptations/skills (e.g., pain tolerance, odor discrimination, time perception, etc.);
other types of activities that rely on cognitive skills but are not typical of a regular desk job or studying (e.g., gambling, remembering film scenes/events, remembering advertisements, autobiographical memory, purchasing behaviors, affective impressions, sleep quality, psychological wellbeing, eating behaviors, and alcohol consumption, etc.).
Information Sources and Search Strategy
Given the varied terminology used in studies focusing on music listening and cognitive performance, we devised a multi-step procedure to optimize our search strategy. First, we conducted a preliminary (manual) search to gather a list of relevant articles to serve as a gold standard for our searches and help optimize our search strategy. The articles were sourced from the reference lists of two relevant articles (Kämpfe et al., 2010; Schellenberg & Weiss, 2013), as well as from a blind search through Google Scholar (using the key words ‘background music’ and ‘cognitive performance’). In this process we identified 67 relevant articles (see Appendix B, Table B1). This list was then used as the gold standard to evaluate different search strategies and identify the optimal one, i.e., the one that yielded the highest number of pre-selected articles and the lowest number of total hits (i.e., the total number of sources retrieved by each search strategy).
Our searches were conducted on five electronic databases: PubMed, PsycINFO, Scopus, Web of Science and Google Scholar (the first 500 search results). The optimal search strategy identified via the method described above was ‘music AND cogniti* OR task’ (see also Appendix B, Table B2 for the filters applied to the searches in each database), which yielded 15,407 hits that included 51 of the 67 articles from our list of pre-selected articles. After removing 6,568 duplicates, we added to the pool of articles the 16 pre-selected articles that were not retrieved by the search strategy, and 12 more articles identified in another review that was published in the meantime (De La Mora Velasco & Hirumi, 2020). The final list of articles for screening included 8,867 unique sources.
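The counts reported in this paragraph can be reconciled with a few lines of arithmetic. The sketch below is illustrative only: Python is used for convenience, it is not part of the original analysis, and all variable names are hypothetical. It recomputes the size of the screening pool and the share of the gold-standard list recovered by the search string.

```python
# Counts as reported in the text; variable names are illustrative.
total_hits = 15_407
duplicates_removed = 6_568
gold_standard_total = 67
gold_standard_retrieved = 51
added_from_other_review = 12

# Pre-selected articles missed by the search string were added back manually.
gold_standard_missed = gold_standard_total - gold_standard_retrieved

unique_from_search = total_hits - duplicates_removed
screening_pool = unique_from_search + gold_standard_missed + added_from_other_review

# Proportion of the gold-standard list recovered by the search string.
gold_standard_recall = gold_standard_retrieved / gold_standard_total

print(f"Articles for screening: {screening_pool}")
print(f"Gold-standard recall: {gold_standard_recall:.1%}")
```

The recomputed pool matches the 8,867 unique sources reported for screening.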
Screening Process
The screening and selection process was conducted and recorded on the platform Rayyan (Ouzzani et al., 2016) and consisted of two stages: (1) title and abstract screening; (2) full-text screening. First, the titles and abstracts of 8,867 articles were reviewed by the authors YC and EC, and seven independent researchers. Each article was screened by at least two researchers, and inconsistencies were resolved by two of the review authors (YC, EC). At this stage, 185 articles were identified as relevant and were subsequently retrieved for full-text screening. The full-text screening was conducted by two of the review authors (YC and EC), and discrepancies were discussed by the same authors until consensus was reached. The final list of sources included in this review consists of 95 articles (reporting 154 relevant empirical experiments) that meet all the pre-defined inclusion criteria in our protocol. The PRISMA 2020 flow diagram depicting the study selection process is presented in Figure 1.

Figure 1. PRISMA 2020 flow diagram for searches of databases and other sources.
Data Extraction Process
The data were extracted by one review author (YC) and double-extraction was performed on 25 articles (∼26%; HKW and three independent researchers). All inconsistencies were resolved by EC. The data extracted included the following information:
(1) basic sample characteristics (sample size, age, gender); (2) study design information (repeated measures, between-subjects, mixed); (3) experimental conditions (i.e., the independent variables); (4) characteristics of the music used in each relevant experimental condition (e.g., with lyrics or instrumental); (5) type(s) of cognitive task(s) (e.g., arithmetic, abstract reasoning, verbal reasoning, etc.); (6) level of difficulty of the cognitive tasks (if available); (7) population characteristics (e.g., personality traits, working memory capacity, level of music education; if available); (8) outcome measures (e.g., mean, median, percentage or total scores) for all BgM and control conditions (as defined in the inclusion and exclusion criteria).
Data items 1–7 can be found in Table 1 under the ‘Results’ section. Data item 8 is available as Supplementary Material because it is too large to include in the main article.
Table 1. Master list of extracted data: Population characteristics (sample size, age, gender, other population characteristics), study design, experimental conditions, types of cognitive tasks, and other moderating variables (e.g., task difficulty, personality traits, music training).
Codes [population characteristics]. M = mean; SD = standard deviation
Codes [design]. RM = repeated measures; IDP = independent design (i.e., between-subjects); MIXED = combination of repeated measures and independent design
Codes [experimental conditions]. BgM = background music; I-BgM = instrumental background music; L-BgM = background music with lyrics; HA = high arousal music; LA = low arousal music; P = positive valence music; N = negative valence music; FAM = familiar music; UNFAM = unfamiliar music; LANG = language of lyrics; PREF = preferred music; NON-PREF = non-preferred music
Codes [other variables]. EXT = extraversion level; WMC = working memory capacity; MT = music training; TF = task difficulty
Codes [cognitive domain/tasks]. MEM = memory; LANG = language; THINK = thinking, reasoning, and problem-solving; ATT = attention; INH = inhibition; PS = processing speed; SR = serial recall; I-/D-FR = immediate/delayed free recall; I-/D-PAL = immediate/delayed paired-associates learning; REC = recognition; WM = working memory; RC = reading comprehension; RS = reading speed; LG = linguistic; LL = language learning; WQ = writing quality; WF = writing fluency; MATH = mathematics/arithmetics; VR = verbal reasoning; NVR = non-verbal reasoning; IQ = general IQ tests; CRT = creativity tests; CELL = cell interpretation task; G-NG = Go/No-Go task; STRP = color Stroop task; V = verbal; NV = non-verbal
Data Classification and Synthesis
Classification
In order to answer our research questions, we have further classified the extracted data by types of cognitive task (and, when this information was available, the level of difficulty of the task), BgM characteristics (e.g., music with lyrics, instrumental music, music conveying positive moods, etc.), and relevant population characteristics (e.g., level of extraversion, participants’ level of music education, etc.).
In relation to the types of cognitive tasks, we first created a two-level taxonomy of cognitive domains and the corresponding tasks within each domain. At the first level, we classified the outcome measures reported by each experiment into one of six cognitive domains: (1) memory; (2) language; (3) thinking, reasoning, and problem-solving; (4) inhibition; (5) attention; and (6) processing speed. Then, within each of these domains, we created a second, more specific classification level that indicates the specific type of cognitive task (e.g., the language domain includes tasks such as reading, writing, language learning, etc.). The two-level taxonomy is depicted in Figure 2 and a detailed description of each cognitive task is included in Appendix D (Table D1). Note as well that, for some experiments, outcome measures were also classified (and the data synthesized) according to their difficulty levels (as reported by the authors in each experiment).

Figure 2. Taxonomy of the six cognitive domains and the cognitive tasks within each domain.
Concerning the experimental conditions, in addition to synthesizing and contrasting the results of the control condition and any BgM interventions irrespective of music characteristics (i.e., only considering the presence or absence of BgM), where possible, we also grouped and contrasted the outcomes of different experiments based on intervention characteristics related to certain musical features. In most cases, this meant summarizing and contrasting the outcomes of BgM interventions using L-BgM and I-BgM, but we also considered other music characteristics such as tempo, genre, complexity, or emotional character. Furthermore, given the varied ways in which the emotional characteristics of the music were described across studies (e.g., happy, sad, energetic, relaxed, etc.), we standardized this terminology by mapping discrete emotion terms to their respective quadrant in Thayer's arousal-valence emotion plane (Thayer, 1990). For instance, ‘happy’ music would be classified as ‘high arousal, positive valence’ (HA-P) music, and ‘sad’ music as ‘low arousal, negative valence’ (LA-N) music. Note that, as a consequence of considering these music characteristics and conducting multiple comparisons, the results of the same experiment could be used in multiple analyses (e.g., a comparison of BgM vs. silence might simultaneously be tested as L-BgM vs. silence and HA-P vs. silence). In all these cases, appropriate procedures were adopted to adjust the significance of statistical tests for multiple comparisons (as described later).
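The standardization of emotion terms described above can be pictured as a simple lookup. The following is a minimal sketch in Python, not the procedure actually used in the review: the function and dictionary names are hypothetical, and only the two mappings given in the text (‘happy’ and ‘sad’) are included. Any other term (e.g., ‘energetic’ or ‘relaxed’) would require a coding decision that is not specified here, so the sketch returns None for it.

```python
from typing import Optional

# Quadrant codes follow the abbreviations in the master list:
# HA/LA = high/low arousal, P/N = positive/negative valence.
AROUSAL_CODES = {"high": "HA", "low": "LA"}
VALENCE_CODES = {"positive": "P", "negative": "N"}


def quadrant_code(arousal: str, valence: str) -> str:
    """Build a quadrant label such as 'HA-P' from arousal and valence levels."""
    return f"{AROUSAL_CODES[arousal]}-{VALENCE_CODES[valence]}"


# Only these two term-to-quadrant mappings are stated in the text.
# Further terms are deliberately left out rather than guessed.
TERM_TO_QUADRANT = {
    "happy": quadrant_code("high", "positive"),
    "sad": quadrant_code("low", "negative"),
}


def standardize_emotion_term(term: str) -> Optional[str]:
    """Return the quadrant code for a known term, or None if it is unmapped."""
    return TERM_TO_QUADRANT.get(term.strip().lower())


print(standardize_emotion_term("Happy"))
print(standardize_emotion_term("sad"))
```

Keeping unmapped terms explicit (None) rather than assigning a default quadrant makes it visible which descriptions still need a manual coding decision.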
Finally, because some studies included in this review evaluated the moderating effects of various population characteristics and reported outcome measures separately for each subgroup, we also synthesized and analyzed these results separately. In particular, we considered both the mean difference between a BgM and the control condition for each subgroup (e.g., introverts: BgM vs. silence; extraverts: BgM vs. silence) as well as the mean difference between comparable subgroups for each BgM intervention and the control condition (e.g., BgM: introverts vs. extraverts; silence: introverts vs. extraverts). In situations where population characteristic(s) were measured but subgroup outcomes were not available, we did not consider those subgroups in our analysis and focused only on the impact of the experimental conditions as a whole.
All the data extracted and classified using the strategies described above can be found in the Supplementary Material.
Synthesis
Due to the heterogeneity of the interventions, outcomes and study designs, and the lack of necessary data in some articles, we were not able to perform a meta-analysis. Summarizing the effect estimates or combining the p-values was also not feasible, and narrative synthesis is undesirable as it can make it difficult to determine the validity of the findings (Campbell et al., 2020). As such, we followed the Synthesis Without Meta-analysis (SWiM) reporting guideline (Campbell et al., 2020; McKenzie & Brennan, 2021) and synthesized the extracted data using vote counting based (solely) on the direction of effects (i.e., the sign of the mean difference between two conditions, irrespective of the p-values, effect estimates and participant sample size). We then used a sign test (a form of binomial test in which, under the null hypothesis, the probability that an intervention affects an outcome positively rather than negatively is set to p = 0.5) to determine whether there is any evidence of a specific effect across comparable studies (see Bushman & Wang, 2009; McKenzie & Brennan, 2021). It is necessary to note that data analysis using vote counting is exploratory in nature and comes with limitations; therefore, the results derived from this method need to be interpreted with some caution. We discuss this in more detail later in the Limitations section.
To perform vote counting, we first calculated (for each outcome of each experiment) the mean difference between the measured outcomes for any two experimental conditions, population and/or task-related groups (e.g., BgM vs. silence, L-BgM vs. I-BgM, BgM: introverts vs. extraverts, BgM: easy vs. difficult task). Then, based on the sign of this difference (i.e., negative or positive), the directions of the effects were determined and recorded using a standardized binary metric of 0 (negative) and 1 (positive), where 0/1 means decreased/increased performance in the target condition (‘n/a’ was used when the mean difference was 0). For example, when comparing the effect of BgM (the target) to silence (the control), i.e., BgM vs. Silence, if the performance on a particular cognitive task was better in the BgM condition it would be coded as 1; otherwise it would be coded as 0 (if worse) or ‘n/a’ (if there was no difference). When the measured outcomes were not available in numerical form (e.g., exact mean values), we determined the mean difference based on available graphical data or, when this was not available, we requested the original data from the authors. Directions of effects that could not be identified or obtained through any of the above means were recorded as not available (‘n/a’).
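The coding rule described above can be sketched as a small function. This is an illustrative Python sketch rather than the code used in the review; the function name and the `higher_is_better` flag are assumptions introduced here to handle outcomes where a lower raw score means better performance (e.g., reaction times), since the coding is defined in terms of performance rather than raw scores.

```python
from typing import Optional


def code_direction(
    target_mean: Optional[float],
    comparison_mean: Optional[float],
    higher_is_better: bool = True,
) -> Optional[int]:
    """Code the direction of an effect for vote counting.

    Returns 1 if performance is better in the target condition, 0 if it is
    worse, and None (standing in for 'n/a') if the means are equal or if
    either mean is unavailable.
    """
    if target_mean is None or comparison_mean is None:
        return None  # direction could not be identified
    difference = target_mean - comparison_mean
    if difference == 0:
        return None  # no difference between conditions
    improved = difference > 0 if higher_is_better else difference < 0
    return 1 if improved else 0


# Hypothetical example values, for illustration only.
print(code_direction(14.2, 15.0))                           # accuracy-type score
print(code_direction(520.0, 560.0, higher_is_better=False))  # time-type score
```

Note that the magnitude of the difference plays no role: only its sign is recorded, which is what makes the subsequent sign test applicable.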
Once the above process was completed for all experiments and comparisons, we added up the number of positive signs (i.e., all 1s) for all identical comparisons (e.g., all L-BgM vs. Silence comparisons) for each outcome measure (e.g., reading comprehension task), and performed a two-tailed sign test at a significance level of α = 0.05. To control the false discovery rate under multiple testing, the p-values were also adjusted using the Benjamini-Yekutieli False Discovery Rate procedure (B-Y FDR; Benjamini & Yekutieli, 2001). The confidence intervals (CI) for each intervention effect were calculated using the (SWiM recommended) Wilson's CI formula (Brown et al., 2001; McKenzie & Brennan, 2021). Note that, due to the small number of matched pairs (hereinafter referred to as ‘tests’) used for some of the comparisons (e.g., only a sample of 3 tests that compared the effects of L-BgM vs. silence on creative task performance), it would not be possible to reach statistical significance in the sign test irrespective of the estimated proportion of positive (or negative) effects. Indeed, as shown in Table 2, the smallest number of tests needed to achieve statistical significance (at p < .05) is 6 (comparisons with fewer than 6 tests will always be inconclusive even if all tests are positive or negative). Because of this, we decided to analyze only samples with at least 9 tests, which, at this minimum sample size, requires roughly 10% or fewer (or roughly 90% or more) of the tests achieving positive signs to reach statistical significance.
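The sample-size argument above follows directly from the binomial distribution with p = 0.5. The sketch below is an illustrative Python re-implementation (the original analysis used R); the function names are hypothetical. It computes the exact two-tailed sign test p-value and searches for the smallest number of tests at which a unanimous result becomes significant.

```python
from math import comb
from typing import Optional


def sign_test_p(successes: int, n: int) -> float:
    """Exact two-tailed sign test p-value under H0: P(positive sign) = 0.5."""
    # With p = 0.5 the distribution is symmetric, so the two-tailed p-value
    # is twice the probability of the smaller tail, capped at 1.
    k = min(successes, n - successes)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def min_tests_for_significance(alpha: float = 0.05, max_n: int = 50) -> Optional[int]:
    """Smallest n for which n positive signs out of n gives p < alpha."""
    for n in range(1, max_n + 1):
        if sign_test_p(n, n) < alpha:
            return n
    return None


print(min_tests_for_significance())  # smallest sample that can reach p < .05
print(sign_test_p(5, 5))             # unanimous result with 5 tests
print(sign_test_p(8, 9))             # 8 positive signs out of 9 tests
```

With 5 unanimous tests the p-value is 2 × (1/32) = 0.0625, which is why 6 is the minimum; with 9 tests, 8 positive signs give 2 × (10/512) ≈ 0.039 before any multiple-testing adjustment.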
Table 2. A sampling of the number of successes required to achieve statistical significance at α < .05 in a two-tailed sign test for test samples of 1 to 20 and 140 to 150.
All the analyses were performed using RStudio version 4.1.0 (RStudio Team, 2021). The sign test analysis was performed using the function “binom.test” (p = 0.5; alternative = “two.sided”). Wilson's confidence interval was computed using the function “scoreci” (conf.level = 0.95) from the package “PropCIs” (Scherer, 2018). Lastly, the B-Y FDR adjustment was performed using the “p.adjust” function (method = “BY”).
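For readers who do not use R, the two remaining computations can be sketched in plain Python. This is an illustrative re-implementation under the assumption that “scoreci” returns the Wilson score interval without continuity correction and that “p.adjust” with method “BY” applies the standard Benjamini-Yekutieli step-up adjustment; the function names below are hypothetical and the p-values in the example are made up.

```python
from math import sqrt
from typing import List, Tuple

Z_95 = 1.959963984540054  # standard normal quantile for a 95% interval


def wilson_ci(successes: int, n: int, z: float = Z_95) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (no continuity correction)."""
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


def by_adjust(p_values: List[float]) -> List[float]:
    """Benjamini-Yekutieli adjusted p-values, returned in the input order."""
    m = len(p_values)
    harmonic = sum(1 / k for k in range(1, m + 1))
    # Walk from the largest to the smallest p-value, keeping a running minimum
    # so that adjusted values stay monotone and never exceed 1.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, index in enumerate(order):
        rank = m - position  # ascending rank: 1 = smallest p-value
        running_min = min(running_min, harmonic * m / rank * p_values[index])
        adjusted[index] = running_min
    return adjusted


low, high = wilson_ci(8, 9)
print(f"Wilson 95% CI for 8/9: ({low:.3f}, {high:.3f})")
print(by_adjust([0.01, 0.04, 0.03]))  # made-up p-values for illustration
```

Because the B-Y procedure multiplies by the harmonic sum 1 + 1/2 + ... + 1/m, it is more conservative than the Benjamini-Hochberg adjustment, but it remains valid under arbitrary dependence between tests, which matters here since the same experiments can feed several comparisons.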
Assessment of Quality
To the best of our knowledge, there is no specific tool to evaluate the quality of studies measuring cognitive performance in general or the impact of music interventions on cognition in particular. Therefore, we decided to develop a new tool: the Music and Cognitive Performance Appraisal Tool (MCPAT; see Appendix C). Following the general recommendations in Whiting et al. (2017), we first adapted a list of potential assessment criteria (one that caters for research in music and cognitive performance) based on existing risk of bias assessment tools, namely Cochrane RoB 2.0 (Sterne et al., 2019), ROBINS-I (Sterne, Hernán et al., 2016; Sterne, Higgins et al., 2016), Mixed Methods Appraisal Tool (MMAT) version 2018 (Hong, Fàbregues et al., 2018; Hong, Pluye et al., 2018), and the checklist for reporting music-based interventions (Robb et al., 2011), and refined them through a series of selection and consensus meetings among our team (the authors). With the first draft of the MCPAT, we then piloted the tool using a subset of the articles sampled in this review, followed by further refinements of the criteria. Any selection conflicts during this process were supervised by a third person.
As a result, the MCPAT offers a quality assessment reference for five domains (each with a variety of assessment criteria): (1) experimental design, (2) music selection and characteristics (i.e., the interventions), (3) cognitive tasks and outcome measures, (4) experimental procedures, and (5) results and data analysis. Each assessment criterion is accompanied by a signalling question for reviewers to determine (using the responses ‘Yes’ (Y), ‘No’ (N), ‘Can't tell’ (CT) or ‘Not applicable’ (NA)) whether it was fulfilled by the respective experiments. The complete MCPAT can be found in Appendix C, together with the signalling questions associated with each of the assessment criteria.
The quality assessment was conducted by the review author (YC), and double-extraction was conducted on 25 (∼26%) of the articles by another author (HKW). Any discrepancies were then resolved by the last author (EC). Following this, we calculated the total number (and the corresponding percentage) of experiments that fulfilled each assessment criterion (i.e., [Y / (Y + N + CT)] * 100%).
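The percentage formula above excludes ‘Not applicable’ responses from the denominator. A minimal Python sketch of this calculation follows; the function name and the example responses are hypothetical, and returning None when no experiment is assessable is a choice made here for illustration.

```python
from collections import Counter
from typing import Iterable, Optional


def percent_fulfilled(responses: Iterable[str]) -> Optional[float]:
    """Percentage of experiments fulfilling a criterion: Y / (Y + N + CT) * 100.

    'NA' (not applicable) responses are excluded from the denominator.
    Returns None when no experiment has an applicable response.
    """
    counts = Counter(responses)
    denominator = counts["Y"] + counts["N"] + counts["CT"]
    if denominator == 0:
        return None
    return counts["Y"] / denominator * 100


# Hypothetical responses for one criterion across five experiments.
print(percent_fulfilled(["Y", "Y", "N", "CT", "NA"]))
```

Treating ‘Can't tell’ as unfulfilled (it stays in the denominator but not the numerator) means that poor reporting lowers a criterion's score in the same way as a clear failure.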
Results
In what follows, we provide a summary of the relevant characteristics of our review sample, the quality assessment of individual experiments, the sign test results for each cognitive domain and task (according to music-specific characteristics), and the sign test results for specific population groups.
Study Characteristics
The studies included in this review (mostly published after 2010; cf. Figure 3) include data from 6,246 participants (females: 2,949; males: 2,009; not reported: 1,288), most of whom are young adults (see Table 1, column ‘Age’).

Figure 3. Trend of publication of included articles.
In relation to cognitive tasks, as can be seen in Figure 4, a variety of tasks have been studied, with the most common relating to the memory, language, and thinking domains. The most frequently studied tasks were reading comprehension (n = 21 experiments), mathematics/arithmetics tasks (n = 15), and serial recall and non-verbal attention (n = 13 each). Notably, only 17 experiments (out of 154) considered the role of task difficulty when investigating the impact of BgM on task performance (cf. Figure 4), and the largest share of them (n = 7) related to thinking.

Figure 4. Tree-map of the number of experiments sampled for each of the six cognitive domains as well as for task difficulty, and the corresponding cognitive tasks.
Regarding population distributions by cognitive task (see Table 3), there are very clear differences in the number of participants tested in different cognitive tasks. Many participants were tested on reading comprehension (1544), mathematics/arithmetics (1391), serial recall (781), non-verbal reasoning (717), and verbal attention (649), but very few were tested on tasks such as immediate and delayed free recall of non-verbal materials (68 each), writing process and quality (73 each), and language learning (32).
Table 3. Population size for each individual cognitive task (i.e., level 2 classification), organized by their respective cognitive domains and then by verbal or non-verbal properties.
Note. As some studies evaluated multiple types of cognitive tasks (e.g., abstract reasoning, reading comprehension and mathematics) on the same group of participants, the population and sample size listed above do not add up to the total population (n = 6246, males = 2009; females = 2949; not reported = 1288) of the review.
Plus signs ( + ) indicate the presence of missing data on gender distribution.
Regarding experiments that specifically evaluated the interactions between individual characteristics and BgM (see Table 4), the personality trait level of extraversion (i.e., introverts, extraverts) was the variable tested in the largest number of experiments (n = 31), followed by gender (n = 15), music training (n = 7) and working memory capacity (n = 3). In relation to music characteristics (see Table 5), by far, the largest sample obtainable was that of the absence (I-BgM; n = 103) or presence of lyrics (L-BgM; n = 77). Others included arousal and valence (n = 18), genre (n = 13), tempo (n = 8) and volume (n = 5), complexity of the music (n = 6) and listeners’ preference for the music (n = 7), to name a few.
Table 4. Population and sample sizes (total and by respective cognitive tasks) for each subgroup: personality traits, gender, music education, working memory capacity, and task difficulty.
Codes. Imm. = immediate; Del. = delayed; V = verbal; NV = non-verbal; PAL = paired-associates learning
Plus signs ( + ) indicate the presence of missing data on the corresponding population distributions.
Population characteristics in relation to gender were extracted only from studies that reported summary statistics in relation to gender.
Table 5. Sample of each type of BgM condition, organized by the respective cognitive task.
Codes. L-BgM = background music with lyrics; I-BgM = instrumental background music; A&V = arousal and valence; GEN = genre; PREF = preference; TEM = tempo; VOL = volume; LANG = language of lyrics; TONAL = tonality/harmonicity; COMP = complexity; FAM = familiarity; INST = type of musical instrument; CONT = context
Quality Assessment
Quality assessment was conducted for each empirical study rather than for each article. An overview is shown in Table 6.
Table 6. Quality assessment criteria rating for each individual experiment.
Codes. Y = yes; N = no; CT = can't tell; NA = not applicable
1.1 = sampling criteria; 1.2 = control condition; 1.3 = extraneous variables; 1.4 = baseline comparability; 1.5 = internal validity; 2.1 = music selection rationale; 2.2 = compliance to hypothesis; 3.1 = task description; 3.2 = outcome measures; 3.3 = objectivity of assessors; 4.1 = timing of delivery; 4.2 = intervention strategies; 4.3 = replicability; 5.1 = analyses appropriateness; 5.2 = reporting (direction of effect); 5.3 = reporting (statistical significance); 5.4 = reporting (effect magnitude).
Note. The ‘Y’ response is a sign of good quality.
Y (%) = [nY / (nY + nN/CT)]*100% (i.e., NA responses were discarded from the calculation).
Outcome (n) if either criterion 1.4 or criterion 1.5 is ‘Y’.
Outcome (n) if criteria 1.4 and 1.5 are both either ‘N’ or ‘CT’.
Outcome (n) if and only if criteria 1.4 and 1.5 are both ‘NA’.
As can be seen, the majority of the studies (94%) did not report effect sizes, and only 16% reported exact significance values (i.e., p-values) for all the significant and non-significant results.
At the other extreme, all studies included a control group (naturally, as this was an inclusion criterion in this review), and employed adequate outcome measures to quantify performance on the various cognitive tasks reported. Additionally, most studies contained a clear description of the characteristics of their sampled population (92%), and had a clear rationale for the music selection process (94%) that was aligned with the hypothesis of the experiment (95%). Generally, there were also clear descriptions of the timing of delivery of the BgM interventions (98%), of how the length of the BgM was matched to the length of the cognitive task (69%), and of the cognitive tasks being studied and how they were executed (93%), as well as clear justifications for the chosen statistical analyses (85%). Most studies (86%) also had adequate measures in place (e.g., random allocation, counterbalancing, etc.) to control for extraneous variables. Of the 14% that did not perform or report such measures, 67% had adequate alternative measures in place to ensure homogeneity among participants (e.g., measures of working memory capacity at baseline).
Various other criteria were less consistent (or not clearly described) across studies, namely the existence of an independent outcome assessor for outcome measures that require subjective rating (60%), and the inclusion of sufficient information about the experiments that allows a full replication (60%).
Impact of BgM on Different Cognitive Domains
The sign test analyses showed significant effects of BgM (of particular features) on cognitive performance for two cognitive domains—memory (Table 7) and language (Table 8). There are also significant effects associated with task difficulty (Table 9). The full vote counting and sign test analyses for the other cognitive domains are included in Appendix D (Tables D2-D5). Note that, as mentioned before, some tests were not conducted due to small sample sizes (and therefore comparisons are not included in the tables).
Table 7. Sign test results of BgM and memory-related tasks, performed on RStudio version 4.1.0, using the function ‘binom.test’ (p = 0.5, alternative = ‘two.sided’).
Codes. BgM = background music; L-BgM = background music with lyrics; I-BgM = instrumental background music; S = silence; HA = high arousing music; LA = low arousing music; P = positive valence music; N = negative valence music
* p < .05 ** p < .001
Successes in favour of the first listed condition. For example, the number of successes for a comparison L-BgM vs. I-BgM would correspond to the successes in favour of L-BgM
p-values corrected for multiple comparisons, based on the Benjamini-Yekutieli False Discovery Rate.
Confidence intervals are calculated based on Wilson's CI.
Table 8. Sign test results of BgM and language-related tasks, performed on RStudio version 4.1.0, using the function ‘binom.test’ (p = 0.5, alternative = ‘two.sided’).
Codes. BgM = background music; L-BgM = background music with lyrics; I-BgM = instrumental background music; S = silence
* p < .05
Successes in favour of the first listed condition. For example, the number of successes for a comparison L-BgM vs. I-BgM would correspond to the successes in favour of L-BgM
p-values corrected for multiple comparisons, based on the Benjamini-Yekutieli False Discovery Rate.
Confidence intervals are calculated based on Wilson's CI.
Table 9. Sign test results of BgM and cognitive performances by task difficulty, performed on RStudio version 4.1.0, using the function ‘binom.test’ (p = 0.5, alternative = ‘two.sided’).
Codes. BgM = background music; I-BgM = instrumental background music; S = silence
* p < .05
Successes in favour of the first listed condition. For example, the number of successes for a comparison L-BgM vs. I-BgM would correspond to the successes in favour of L-BgM. In the case of subgroup comparisons, the number of successes will be successes in favour of the first listed subgroup, e.g., in a comparison of BgM (males - females), it would be successes in favour of males in the BgM condition (when compared to females in BgM condition).
p-values corrected for multiple comparisons, based on the Benjamini-Yekutieli False Discovery Rate.
Confidence intervals are calculated based on Wilson's CI.
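The table notes above describe an exact two-sided binomial (sign) test against p = 0.5 with Wilson confidence intervals. The following Python sketch illustrates that combination under stated assumptions: it is not the authors' R code, the function names are hypothetical, and the counts in the example (3 successes out of 15 tests) are invented for illustration. The two-sided p-value sums the probabilities of all outcomes that are no more likely than the observed one, which is the convention R's `binom.test` follows.

```python
from math import comb, sqrt


def sign_test_two_sided(successes, n):
    """Exact two-sided binomial test of `successes` out of `n` against p = 0.5."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    probs = [comb(n, k) / 2 ** n for k in range(n + 1)]
    observed = probs[successes]
    # Sum outcomes that are at most as likely as the observed count
    # (small relative tolerance guards against floating-point ties).
    p_value = sum(q for q in probs if q <= observed * (1 + 1e-7))
    return min(1.0, p_value)


def wilson_ci(successes, n, z=1.959963984540054):
    """Wilson score interval for a proportion (default z gives a 95% interval)."""
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Hypothetical comparison: 3 of 15 tests favour the first listed condition.
print(sign_test_two_sided(3, 15))
print(wilson_ci(3, 15))
```

The p-values produced this way are unadjusted; the tables additionally report values corrected for multiple comparisons.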
Memory
Across 152 tests of the effect of BgM on memory tasks, we found no evidence that BgM (irrespective of its characteristics) either hinders or benefits performance in the memory domain (cf. Table 7, Test M1). Nonetheless, when the music had lyrics, performance was significantly worse than performance in silence (p < .001; cf. Table 7, Test M2) and performance with instrumental music (p = .009; Table 7, Test M4). Performance with instrumental music did not differ from that in silence.
When looking at individual memory tasks, we found some task-specific effects. Compared to silence, there was a strong detrimental effect of BgM (p < .001), L-BgM (p = .003) and I-BgM (p = .021) on serial recall task performance (cf. Table 7, Tests M13-M15). There was also a significant detrimental effect of L-BgM (when compared to silence) on memory recognition of verbal materials (cf. Table 7, Test M25; p = .011). The only positive effect we found was for immediate paired-associates learning of verbal materials, where I-BgM improved performance compared to silence (cf. Table 7, Test M21; p = .003).
No significant results were found for any other types of memory tasks, either because the sign tests were non-significant (sometimes after corrections for multiple comparisons) or because the analyses were not conducted due to the low number of available tests.
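To show how a correction for multiple comparisons can turn a nominally significant sign test into a non-significant one, here is a minimal Python sketch of the Benjamini-Yekutieli adjustment named in the table notes. The function name and the three raw p-values are hypothetical, and the sketch assumes the standard step-up formulation (each sorted p-value is scaled by m · c(m) / rank, with c(m) the harmonic sum 1 + 1/2 + ... + 1/m, then made monotone); it is not taken from the authors' analysis scripts.

```python
def benjamini_yekutieli(pvalues):
    """Return Benjamini-Yekutieli adjusted p-values in the original order."""
    m = len(pvalues)
    if m == 0:
        return []
    # Harmonic correction factor c(m) = 1 + 1/2 + ... + 1/m.
    c_m = sum(1.0 / k for k in range(1, m + 1))
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = pvalues[idx] * m * c_m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical family of three comparisons for one outcome measure.
raw = [0.01, 0.04, 0.03]
print(benjamini_yekutieli(raw))
```

In this invented example all three raw p-values are below .05, yet every adjusted value exceeds .05, which is the pattern the paragraph above refers to.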
Language
Across 106 tests of the impact of BgM on all language-related tasks, we found no evidence that BgM either hinders or benefits performance irrespective of task (cf. Table 8, Test L1). Nonetheless, the analyses of individual tasks revealed two significant effects. First, BgM (compared to silence) hindered reading comprehension (p = .029; cf. Table 8, Test L7), which seems to be associated with the detrimental effect of L-BgM (p = .038; cf. Table 8, Test L8). Second, BgM (compared to silence) slowed down reading speed (p = .022; cf. Table 8, Test L11).
No significant results were found for any other language-related tasks, either because the sign tests were non-significant or because the analyses were not conducted due to the low number of available tests.
Task Difficulty
Across 24 tests of the impact of BgM on tasks varying in difficulty level (irrespective of the cognitive domain and types of tasks), our results show no difference in performance on difficult tasks with or without BgM. The same holds for easy tasks. However, when we directly compared performance between difficult and easy tasks, we found no significant difference in silence, whereas in the presence of BgM, performance in difficult tasks was significantly poorer than performance in easy tasks (p = .023; cf. Table 9, Test TF5). We found a similar effect when we tested only instrumental music (p = .031; cf. Table 9, Test TF6). The analysis of L-BgM was not conducted due to the low number of available tests.
The Contributions of Individual Characteristics
In relation to the contributions of individual listeners’ characteristics to the effect of BgM on cognitive task performance, only the level of extraversion yielded statistically significant effects. As can be seen in Table 10, compared to silence, introverts’ performance across all types of cognitive tasks was significantly poorer in the presence of BgM in general (p = .004; cf. Table 10, Test EXT3) and L-BgM in particular (p = .004; cf. Table 10, Test EXT4), whereas the performance of extraverts was not affected by the presence of music. We also directly compared the performances of introverts and extraverts in both BgM and silent conditions. Interestingly, our results show that, compared to extraverts, introverts had a significantly superior performance in the silent condition (p = .004; cf. Table 10, Test EXT7), but this effect disappeared in the presence of BgM (cf. Table 10, Tests EXT3-EXT4).
Table 10. Sign test results of BgM and cognitive performances by level of extraversion, performed on RStudio version 4.1.0, using the function ‘binom.test’ (p = 0.5, alternative = ‘two.sided’).
Codes. BgM = background music; L-BgM = background music with lyrics; I-BgM = instrumental background music; S = silence; EXT = extraverts; INT = introverts
* p < .05
Successes in favour of the first listed condition. For example, the number of successes for a comparison L-BgM vs. I-BgM would correspond to the successes in favour of L-BgM. In the case of subgroup comparisons, the number of successes will be successes in favour of the first listed subgroup, e.g., in a comparison of BgM (males–females), it would be successes in favour of males in the BgM condition (when compared to females in BgM condition).
p-values corrected for multiple comparisons, based on the Benjamini-Yekutieli False Discovery Rate.
Confidence intervals are calculated based on Wilson's CI.
No significant effects were found for gender and music training because the sign tests were non-significant, and analysis was not conducted for working memory capacity due to the low number of available tests. The full results of the vote counting and sign test analyses for these factors are included in Appendix D (Tables D6-D7).
Discussion
The aim of this systematic review was to evaluate the effect of BgM on cognitive task performances and to provide clarity to the findings in this field. Building upon the findings and limitations of previous reviews, we adopted a task-specific approach towards evaluating BgM's effect on specific cognitive domains and tasks. To this end, we devised a taxonomy to classify the cognitive tasks reported in 95 empirical articles (154 experiments) into one of six cognitive domains: (1) memory; (2) language; (3) thinking, reasoning, and problem-solving; (4) inhibition; (5) attention; and (6) processing speed. Within each domain, we then performed task-specific analyses on each type of cognitive task (e.g., within the memory domain: serial recall, free recall, recognition, etc.); and if the data are available, analyses according to the difficulty levels of the cognitive tasks (as reported in each experiment). We also adopted a music-specific approach to the task-relevant analyses, and identified 13 music characteristics—the presence/absence of music (i.e., BgM vs. silence); and when the information is available, the presence (L-BgM) and absence of lyrics (I-BgM) in the music, arousal and valence, genre, tempo, volume, language of lyrics, tonality/harmonicity, complexity, listeners’ preference, listeners’ familiarity to the music, types of musical instrument used, and/or the context in which BgM was played (e.g., during encoding only, during both encoding and recall, etc.). Further subgroup analyses based on individual differences (extraversion level, gender, and music training) were also performed.
Before discussing our findings, the first important observation to make concerning our results is that, for the majority of the comparisons analyzed (e.g., L-BgM vs. Silence), cognitive performance with BgM did not differ from conducting the same tasks in silence (see results summary in Table 11). Nonetheless, it is important to interpret this carefully: many comparisons had a very low number of tests, which means that achieving the level of significance is very difficult even if the vast majority of them report positive or negative effects. Indeed, amongst the comparisons that we analyzed (those with 9 or more tests), approximately 40% (38 out of a total of 94 comparisons) had a sample size of 15 tests or fewer, which means that, unless fewer than 20% (or more than 80%) of the tests achieved positive signs, it is not possible to achieve statistical significance (see critical values in Table 2). Furthermore, we also could not statistically analyze BgM's effect on a variety of cognitive tasks (see Table 11) due to their small sample of studies (more studies are needed for these tasks).
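The constraint that small numbers of tests place on the sign test can be made concrete with a short Python sketch. It is an illustration under stated assumptions rather than a reproduction of the critical values in Table 2: the function names are hypothetical, the test is the exact two-sided binomial test against p = 0.5, and the p-values are unadjusted, so thresholds after correction for multiple comparisons would be stricter.

```python
from math import comb


def two_sided_p(k, n):
    """Exact two-sided sign-test p-value for k positive signs out of n."""
    smaller = min(k, n - k)
    tail = sum(comb(n, i) for i in range(smaller + 1)) / 2 ** n
    # The null distribution is symmetric, so double the smaller tail.
    return min(1.0, 2 * tail)


def largest_significant_low_count(n, alpha=0.05):
    """Largest k <= n // 2 whose two-sided p-value is below alpha.

    Returns None when even k = 0 is not significant. By symmetry,
    the matching upper threshold is n minus the returned value.
    """
    best = None
    for k in range(n // 2 + 1):
        if two_sided_p(k, n) < alpha:
            best = k
    return best


for n_tests in (5, 9, 12, 15):
    k = largest_significant_low_count(n_tests)
    print(n_tests, k)
```

Under these assumptions, a comparison with 15 tests reaches unadjusted significance only when at most 3 (or at least 12) tests share the same sign, and a comparison with 5 tests cannot reach it at all.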
Table 11. Summary of the detrimental, facilitative and inconclusive effects of BgM (with corresponding number of tests) on different cognitive tasks and population, organized by BgM, L-BgM, I-BgM, arousal and/or valence, and genre.
Codes. S = silence; BgM = background music; L-BgM = background music with lyrics; I-BgM = instrumental background music; HA = high arousing music; LA = low arousing music; P = positive valence music; N = negative valence music; PAL = paired-associates learning
● = detrimental effect;
= facilitative effect;
= no identifiable effect; dash symbol (-) = no available/sizable analysis; n/a = not applicable
Single interventions displayed in the heading indicate comparisons with the control condition (i.e., silence/no music), except where two subgroups are compared (e.g., introverts–extraverts; males–females), in which case the comparison is between the two specified subgroups within the corresponding condition (e.g., introverts–extraverts in L-BgM only).
Codes. S = silence; P = positive valence music; N = negative valence music; FAM = familiar music; UFAM = unfamiliar music; PAL = paired-associates learning
It is also worth mentioning that the methodological quality of the experiments analyzed in this review is generally satisfactory and therefore so is the quality of the evidence presented in this review. Indeed, most studies reported clear sample characteristics, and all had adequate control condition(s) and used adequate outcome measures. The majority of the studies also adequately controlled for the influence of possible extraneous variables, and provided clear procedural description and justification with regard to the selection and delivery of their interventions (i.e., BgM) and outcome measures (i.e., cognitive tasks). There were nonetheless various limitations that should be considered in future work. Firstly, there is a lack of clear descriptions of the complete empirical procedures (which are crucial for replicating the work). Secondly, many studies did not report statistical data on effect sizes and significance values, which are required for better appraisal of the overall magnitude of an intervention effect on an outcome measure. That being said, in the context of this review, as these studies mainly fall short in terms of the data being reported rather than their methodological quality, we do not think that they substantially affect the overall confidence in our findings (especially considering the type of analysis we conducted).
We turn now to our research questions in more detail. For the sake of clarity, Table 11 summarizes our results.
Research Question 1: How Does BgM Affect Performance in Different Types of Cognitive Tasks (i.e., Tasks of Different Cognitive Domains and Levels of Difficulty)?
Our results show that the impact of BgM on cognitive performance differs for different types of cognitive tasks as well as for tasks of different levels of difficulty; and when significant impacts were identified, those impacts are mostly negative. Overall, we found that music with lyrics (compared to silence and instrumental music) has a general detrimental impact in the memory domain, which is particularly evident in tasks involving memory recognition of verbal materials and serial recall tasks (in the case of serial recall, instrumental music also has a detrimental impact). Interestingly, instrumental music led to an improvement in the performance on immediate paired-associates learning of verbal pairs (note that there were no studies that used music with lyrics). In relation to language, the effects were also task-specific, with music (with or without lyrics) slowing down reading speed, and music with lyrics hindering reading comprehension. In relation to task difficulty, we found that instrumental music (there were no experiments using music with lyrics) led to a significant reduction in performance in difficult tasks (compared to easy tasks).
Clearly, our evidence shows that BgM seems to affect a small range of domains (language and memory), that the effects of music on cognitive task performance depend on the nature of the task and its difficulty, and that almost all effects hinder performance.
Research Question 2: What Are the Music Characteristics (e.g., Lyrics, Volume, Tempo, etc.) that Contribute to the Effect of BgM on Cognitive Task Performance?
Whereas many different types of music characteristics have been manipulated (or identified by us) in the studies included in our review (e.g., presence of lyrics, music complexity, music genre, tempo, loudness, mood, etc.), by far the most common (and with adequate sample sizes for our analyses) was the presence/absence of lyrics, followed by arousal-mood and genre. Clearly, music with lyrics seems to hinder cognitive performance more often than music without lyrics, and it particularly affects memory-related tasks and reading comprehension.
With regard to instrumental music, serial recall (memory domain) and difficult cognitive tasks, in general, were hindered by its presence. This suggests that instrumental music is less likely to affect cognitive task performance (in the context of the tasks evaluated in this review) unless the tasks are complex and more cognitively demanding. It is also worth mentioning that the only positive effects of music on cognitive performance were related to instrumental music. In sum, there is clear evidence that music with lyrics tends to have more detrimental effects on cognitive task performance compared to instrumental music.
Taken together, both the task and music-specific findings largely agree with the current models of human working memory, specifically of a capacity and structural limit to the working memory system (Baddeley, 2003, 2012; Eysenck & Keane, 2020). Firstly, due to the limits of working memory capacity, tasks that require high levels of cognitive effort (to the extent that they exhaust or overload working memory resources) will limit the quality of cognitive performance (Kahneman, 1973; Norman & Bobrow, 1975). The fact that instrumental music (there are insufficient data on music with lyrics) impaired performance in cognitive tasks that were generally identified as being ‘difficult’ (regardless of them being verbal or non-verbal tasks) could be a manifestation of an overall cognitive ‘overload’. It is also possible that our results can be related to the structural limits of working memory, whereby ‘[i]f two tasks use the same component [from the working memory system], they cannot be performed successfully together’ (Eysenck & Keane, 2020, p. 247). This is evident in our findings that L-BgM impaired performance in serial recall (of verbal materials), reading comprehension and memory recognition (of verbal materials) tasks. On top of that, serial recall performance was also impaired by I-BgM, and reading speed was significantly slower when BgM was present. All of these are tasks related to verbal processing.
Nonetheless, it is worth noting that following the logic of a structurally limited working memory system, we would expect that BgM should impair performance in all verbal tasks. However, this is not what we found. There is no impact of BgM on linguistic task performance, and no (or potentially only a weak) impact of BgM on the free recall of verbal materials. We even observed improved performance of immediate paired-associates learning of verbal pairs under I-BgM. Some of the precedents that may support these findings could be earlier studies on patients with phonological deficits (Han & Bi, 2009; Hanley & McDonnell, 1997), as well as studies on the impact of articulatory suppression on paired-associates learning of foreign and native language vocabularies (Papagno et al., 1991). These studies postulated that although phonological coding is integral in enabling verbal processing (Leinenger, 2014; Slowiaczek & Clifton, 1980), verbal processing is not necessarily always phonologically mediated (Baron, 1973; Han & Bi, 2009; Hanley & McDonnell, 1997; Levy, 1978; Papagno et al., 1991). Particularly in Papagno et al. (1991), the researchers observed that when participants underwent paired-associates learning in a native language, semantic access to the verbal information can be achieved by bypassing phonological coding.
All in all, the trends of behaviors that we have identified through this review may be explained by general models of the working memory as well as findings related to the role of phonological coding in various verbal tasks. Nonetheless, we must also emphasize that given the exploratory nature of our review and the fact that many BgM-performance relationships remain unidentified due to limited samples (e.g., language learning, writing fluency, writing quality, verbal reasoning), these are attempted explanations, and should therefore be investigated further (rather than used to affirm any existing theories).
Research Question 3: What Are the Individual Characteristics (e.g., Personality Traits, Music Education, etc.) that Contribute to the Effect of BgM on Cognitive Task Performance?
The population characteristics analyzed in our review were the personality trait of extraversion (extraverts vs. introverts; 31 experiments), gender (males vs. females; 15 experiments) and self-reported level of music training (musicians vs. nonmusicians; seven experiments). Our results revealed that introverts’ cognitive task performance is clearly hindered by the presence of music with lyrics (when compared to silence), whereas extraverts’ performance is not affected (note that the impact of instrumental music is unknown due to insufficient tests). This effect is also evident when directly comparing introverts’ and extraverts’ performance in the presence and absence of music: introverts outperformed extraverts in silence, but this advantage disappeared when music was present.
Overall, the results related to extra/introversion seem to cohere partially with Eysenck's theory of personality (Eysenck, 1967). This theory posits that introverts generally have higher cortical arousal at rest, and therefore the presence of BgM during task performance would push their arousal beyond the optimal level, leading to a decline in performance. On the other hand, extraverts’ task performance should benefit from BgM due to their lower cortical arousal at rest. Our results are consistent with the former prediction. As for the latter, instead of extraverts benefitting from BgM, we only observed that BgM did not affect extraverts’ task performance. It is possible that we failed to identify an existing impact due to methodological limitations, or that the impact of BgM on extraverts’ task performance is so subtle that it could not be statistically identified in the context of this review. It is also worth mentioning that current theorizations of how personality traits might moderate the influence of BgM on task performance are still debated (see Küssner, 2017). Therefore, the specific contribution of extra/introversion to how BgM affects cognitive task performance remains an open question and warrants further study.
Comparison with Past Reviews
Generally, our review confirmed the findings of Kämpfe et al. (2010) and Vasilev et al. (2018) regarding the detrimental effect of BgM (and particularly music with lyrics) on memory-related tasks and reading comprehension. Our finding that BgM slows down reading speed is also consistent with that of Vasilev et al. (2018). Additionally, we have extended their findings in relation to the specific types of memory-related tasks that are affected by music with lyrics, and the fact that instrumental music generally does not affect cognitive task performance, except for its detrimental impact on serial recall and complex cognitive tasks, and its positive effect on the immediate paired-associates learning of verbal materials (none of which were identified in past reviews). Furthermore, we have also demonstrated that in order to obtain a comprehensive understanding of how BgM impacts cognitive task performance, we should also account for the contribution of population (i.e., listeners’) characteristics.
Limitations of Our Approach
There are some limitations of this review that must be acknowledged.
Firstly, the use of a vote counting approach to data analysis (necessitated by the lack of sufficient data to conduct a meta-analysis) imposes some limitations. Indeed, through vote counting, we synthesized all data based solely on the direction of effect derived from mean differences, and precluded the summation of significance values (i.e., p-values) and effect sizes (as well as population sample sizes). Without data on the overall significance and magnitude of effect, our findings only demonstrate trends in the findings and are not effect estimates. They should therefore be interpreted with caution. Nonetheless, given the complexity of the field and the lack of consistent findings, our review functions as a preliminary synthesis of available data that provides an exploratory overview of current trends. On this account, our findings can be used by researchers to further the field, for example by generating more research questions and/or hypotheses in relation to music listening and cognitive performance.
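As an illustration of what vote counting on the direction of effect discards, the following Python sketch tallies signs from pairs of condition means. Everything in it is an assumption made for illustration: the function name, the example means, the `higher_is_better` flag, and in particular the decision to drop exact ties, since how ties were handled is not described in this section.

```python
def vote_count(mean_pairs, higher_is_better=True):
    """Tally the direction of effect across tests.

    mean_pairs: iterable of (mean_with_bgm, mean_in_silence) tuples.
    Returns (successes, n), where a success is a test favouring BgM
    and n excludes exact ties (an assumption of this sketch).
    """
    successes = 0
    failures = 0
    for mean_bgm, mean_silence in mean_pairs:
        diff = mean_bgm - mean_silence
        if not higher_is_better:
            # For outcomes such as error counts, lower scores are better.
            diff = -diff
        if diff > 0:
            successes += 1
        elif diff < 0:
            failures += 1
    return successes, successes + failures


# Hypothetical accuracy means (BgM, silence) from four tests.
pairs = [(10.0, 12.0), (8.0, 8.0), (15.0, 11.0), (7.0, 9.0)]
print(vote_count(pairs))  # one success out of three non-tied tests: (1, 3)
```

Note that a difference of 0.1 and a difference of 10 contribute the same single vote, which is why the resulting counts describe trends rather than effect sizes.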
Secondly, due to our analytical approach, the fact that some tasks, music, and population characteristics have been less studied meant that we could not include them in our analysis. As such, we urge our readers to interpret our results whilst considering: (1) the total number of tests (cf. relevant sign test results table) as well as the number of experiments (cf. Table 5) analyzed for each comparison, (2) the number of comparisons (e.g., L-BgM vs. Silence; Pop music vs. Silence) analyzed for each outcome measure (the total number of comparisons will affect the outcome of the adjusted p-values), (3) both the unadjusted and adjusted p-values, and (4) the corresponding CI for each comparison. These can be helpful for data interpretation, as well as for identifying areas in need of further research.
Challenges for Future Reviews
Given that music listening and cognitive task execution are both highly individualized processes, a review of relevant studies on this topic has its own challenges. For instance, the generalizability of some findings (at least, in the context of this review) could be limited due to methodological heterogeneities amongst studies. As a starting point, the musical pieces used across studies were (naturally) not the same. Unlike studies of the so-called Mozart effect, whereby there is a clear scope concerning the specific music (Mozart's K.448) and cognitive tasks involved (visuospatial tasks), the study of BgM and cognitive task performance does not share this level of specificity. Although some prevalent trends are observable, the unavoidable fact for studies in this discipline is the lack (and impracticality) of controlling or standardizing the intervention stimuli beyond those of general music characteristics (e.g., tempo, loudness, presence of lyrics, etc.). Other variables, such as the complexity of music, are less tangible and more complicated to quantify, but might also have an impact on cognitive performance (e.g., Furnham & Allass, 1999; Gonzalez & Aiello, 2019). For instance, the ‘complexity’ of music can be defined on many levels (e.g., instrumental, melodic, harmonic, rhythmic). What constitutes complexity is also multi-faceted (Streich, 2006), operating at different levels depending on the genre (e.g., classical vs. popular music) as well as individual differences (e.g., music training, level of exposure/familiarity, etc.), which complicates the quantification of these aspects (see Downie, 2004). Furthermore, not all studies attempted to control for other possible individual differences (e.g., personality traits, working memory capacity, etc.), and we have shown that at least the level of extraversion is clearly central to the impact of background music on cognitive performance.
Therefore, the conclusions we are able to draw from the current sample simply address the more prevalent trends in terms of (1) the types of relationship between (particular) BgM and (particular) cognitive task performance that are more robust to additional influences from population characteristics (e.g., serial recall performance), and (2) the contributions of non-musical factors that prevail regardless of the types of cognitive task (e.g., levels of extraversion, task difficulty).
Another challenge in reviewing the evidence on this topic is in synthesizing cognitive tasks with comparable levels of difficulty. In the context of this review, some types of tasks were classified based on a collection of shared cognitive processes rather than by the specific tasks per se. As such, there is the possibility that the associated results could be confounded by differing levels of difficulty amongst the tasks. For example, the linguistic tasks assessed in our review are a broad category that consists of different tasks such as proofreading, matching consonants and vowels, and verbal fluency (to name a few). The execution of verbal fluency and proofreading tasks may require more complex cognitive processes than a task of matching consonants and vowels. However, a further classification by each specific linguistic task was not feasible due to the limited sample. There are no means (at least in the context of this review) to identify how comparable (or not) the difficulty levels amongst these tasks are, and the extent to which they are susceptible to the influence of BgM. But as our analysis on the contribution of task difficulty demonstrated a detrimental impact of BgM on difficult tasks, our findings with regard to the impact of BgM on linguistic tasks should be considered with caution, especially in light of the possible confound of task difficulty.
On the other hand, the inconsistent situational and procedural contexts across studies can also pose challenges to the generalizability of some results. For example, the studies on BgM and free recall performance differed in terms of when music was played. Some studies played BgM only during the process of encoding (Ferreri et al., 2015; Woo & Kanachi, 2006), and others played it from the encoding stage all the way until the end of the testing stage (Bottiroli et al., 2014; Furnham & Bradley, 1997). Some studies also started playing BgM seconds in advance of the start of the cognitive tasks (Bottiroli et al., 2014; Echaide et al., 2019; Ferreri et al., 2015), whilst for others, both music and tasks commenced together (Nguyen & Grahn, 2017; Woo & Kanachi, 2006). Past studies have suggested that, with regard to memory consolidation, if the same contextual information (e.g., BgM) is presented during both the encoding and recall stages, it could enhance the formation of memory bonds and prompt better recall (Godden & Baddeley, 1975; Tulving, 1979). Although the studies that set out to actually test this hypothesis found no support for it (Echaide et al., 2019; Ferreri et al., 2015; Nguyen & Grahn, 2017), the sheer difference in the learning and testing environments amongst our sample of studies is still a potential confound that could influence (however subtly) how BgM affected participants’ performance.
Another main challenge in conducting a systematic review on this topic is the trade-off between having a representative sample of articles (but with less robust results) or conducting in-depth, informative, and robust quantitative analysis (but with a smaller and potentially less representative sample of articles). Our quality assessment outcomes from 154 experiments showed that only 3% reported effect sizes, and only 14% reported the exact significance values for all comparisons (both significant and non-significant). To that end, it is not surprising that the meta-analysis conducted by Kämpfe et al. (2010) identified fewer than a handful of cognitive tasks, each with a small number of experiments (eight for reading performance and memory and two for mathematics/arithmetics, compared to our review that includes 21 experiments for reading, 53 for various memory-related tasks, and 15 for mathematics/arithmetics). However, Kämpfe et al. (2010) (as well as Vasilev et al., 2018) provided a more in-depth report on the effect sizes of BgM's impact on each cognitive task performance, whereas our sample data only allow a surface-level evaluation of potential trends. With respect to this, we also advise future empirical studies to consider using the MCPAT in the design phase in order to provide high-quality contributions to this area.
Contributions
The contributions of this work and our findings are manifold.
In relation to the research questions we presented at the outset, we clearly show that it is fundamental to consider the nature of the cognitive tasks when evaluating the effect of BgM on cognitive task performance. Our analysis has demonstrated that even cognitive tasks in the same domain can have different levels of susceptibility to the influence of (different types of) BgM. Beyond that, task difficulty (and perhaps other characteristics not evaluated here) can further determine how BgM affects task performance, irrespective of cognitive domain or type of task. Indeed, human cognition is a complex system, and distinctive cognitive tasks are also functionally different (Eysenck & Keane, 2020). Consequently, they could be affected differently by different types of BgM, and evaluations that do not adequately account for the task-specific effects of BgM might not be representative.
Moreover, our results demonstrated that both music and listeners’ characteristics can further influence how BgM affects cognitive task performance. Indeed, we have clearly shown that at least the presence of lyrics (music-related) and the listener's level of extraversion (listener-related) are determining factors in this process. Therefore, proper control and reporting of relevant music characteristics and individual variables are important in empirical studies. Note also that the effects of (various) music and listeners’ characteristics should not be reduced to the effects found in this review: the data available for our review were limited, and as such we were unable to thoroughly evaluate other relevant effects.
Whilst attempting to answer the research questions, we have also provided a thorough perspective on research in this area, with very detailed insights into the studies that have been conducted in the field, and the sub-areas and topics that still lack research. Clearly, a lot has been done (especially in the last 10 years); but, by far, most studies (see summary results in Table 11) have concentrated on the domains of memory (and especially those concerning verbal materials), language and thinking tasks, with particular focus on serial recall, reading comprehension and mathematics/arithmetics test performance. Other cognitive domains with relevance to everyday life tasks need further research. For instance, there is a lack of studies on BgM's impact on memory for non-verbal materials, language learning, writing quality, writing fluency, reading fluency, working memory, verbal reasoning, or creativity. Furthermore, additional music characteristics are likely to moderate the impact of BgM on cognitive performance (e.g., mood, complexity, language of lyrics, instruments used, tonality, dissonance), sometimes in interaction with listener-related characteristics (e.g., familiarity, preferences, musical training/background, working memory capacity, preference for external stimulation).
Another contribution is the MCPAT, which offers an important tool to appraise both the methodological and reporting qualities of experiments related to music listening and cognitive task performance. Given the challenges in this area, we hope that the development of this tool will not only aid the conduct of future systematic reviews, but also guide the design of future empirical studies. Ideally, we hope that such a guideline could aid the development of the field in the long run, with the production of quality empirical studies that, in turn, contribute to reviews with results that are informative, robust, and representative.
Furthermore, our methodological and analytical approaches are also important contributions. As a methodological contribution, we have demonstrated through our findings that in order to obtain informative results, domain and task-specific mappings of the cognitive tasks should be performed prior to data synthesis and analysis. As an analytical contribution, our analysis approach (i.e., vote counting and sign test analysis) also testifies to the SWiM guidelines that in circumstances when a meta-analysis is not feasible, there are next-best alternatives available that generate meaningful statistical inferences using a smaller amount of data from the reported summary statistics (although the limitations of the approaches should be considered when interpreting the results). We hope that these approaches could also serve as informative examples and/or guidelines for future reviews.
Finally, our detailed protocol and data are freely available and we hope that future work can build upon this review by updating it with new findings.
Conclusion and Outlook
With this systematic review, we have provided a structured approach to analyzing previous works on the impact of background music on cognitive task performance. By doing so, we shed light on this area and demonstrate that future research must consider task, music and population-related factors from the outset in order to offer impactful results. We have also demonstrated that, with the limited available evidence, music does not seem to have a generalized negative impact on cognitive task performance; but, when it does, it tends to have lyrics, primarily affects specific memory-related and language-related tasks, and disproportionately affects individuals who display personality traits of introversion. We hope that these findings are also of value for people who habitually accompany their work or study with music, which is particularly relevant given the increasing prevalence of desk jobs and remote working and studying (especially during and after the COVID-19 pandemic; Office for National Statistics, 2020; Wong, 2020).
Looking forward, we have some key recommendations for future research. As discussed earlier, task, music and population-related factors (and especially the multiplicative impacts amongst these factors) must be considered in any research design. Moreover, there is a clear lack of research in a variety of cognitive domains and tasks that needs to be addressed. On top of that, we suggest that future research ventures beyond the analysis of cognitive tasks based on their content (e.g., verbal or non-verbal materials) and considers the level of cognitive control required to perform the cognitive processes involved in those tasks (some tasks benefit from high-level cognitive control whereas some benefit from a more relaxed cognitive state; Amer et al., 2016). Correspondingly, the potential evaluations could be (1) whether different types of BgM would differentially affect tasks that demand high and low cognitive control, and (2) whether there are multiplicative interactions among the types of music, types of tasks, as well as the level of cognitive control required by those tasks on performance. Furthermore, current studies on the impact of BgM on cognitive task performance have focused mainly on very distinctive types of tasks (e.g., serial recall, reading comprehension, abstract reasoning, etc.). However, the execution of everyday-life activities (including studying and work-related tasks) is much less straightforward, as these activities involve combinations of multiple cognitive processes. Therefore, it will be interesting for future studies to evaluate the differentiated impact of different types of BgM (as well as the moderating impacts of individual differences) on performance in more generalizable tasks that reflect those occurring in everyday life (e.g., evaluate how BgM affects the outcomes in actual study or work sessions; see procedures in Calderwood et al. (2014) and Lesiuk (2005)).
More generally, we suggest that researchers use this systematic review as a platform for informing future research questions and exploring specific areas in this field.
As a final note, we would like to add that the self-reported benefits of BgM on task performance (by both students and office workers; e.g., Haake, 2011; Kotsopoulou & Hallam, 2010; Lesiuk, 2005) suggest that music facilitates the perceived improvement in performance through its impact on the affective states of listeners (e.g., improving mood, enhancing motivation, promoting relaxation, regulating energy levels), which then facilitates engagement and time spent on the tasks (rather than the impacts on performance per se). Interestingly, this is seldom the focus of research in this area and, whereas the main focus thus far has been on the interference (positive or negative) of BgM on specific cognitive processes, we have yet to assess how they interact with affective responses to the music and their relative importance in terms of task engagement, completion and performance. The interesting question here would be to what extent listening to music is helpful towards performance in a concurrent task at all these levels.
Supplemental Material
sj-docx-1-mns-10.1177_20592043221134392 - Supplemental material for Background Music and Cognitive Task Performance: A Systematic Review of Task, Music, and Population Impact
Supplemental material, sj-docx-1-mns-10.1177_20592043221134392 for Background Music and Cognitive Task Performance: A Systematic Review of Task, Music, and Population Impact by Yiting Cheah, Hoo Keat Wong, Michael Spitzer and Eduardo Coutinho in Music & Science
Footnotes
Acknowledgement
We would like to thank Chairos Loo, Yi-Ning Chuah, Emma Risley, Yee-Wen How, Shu-En Lee, Xin-Er Lee, Nalni Moorthi, Sheng-Yee Wan, Jing-Kai Wong, and Xuen Yu who assisted with data screening and extraction. We also express our gratitude to Manuel Gonzalez, Roger Johansson, Samuel Ken-En Gan, and William Thompson for sharing their research data.
Action Editor
Diana Omigie, Goldsmiths, University of London, Department of Psychology.
Peer review
Luca Kiss, Goldsmiths, University of London, Department of Psychology.
E. Glenn Schellenberg, University of Toronto Mississauga, Department of Psychology.
Author Contributions
YC and EC planned the review and wrote the protocol. YC performed all the searches and YC, EC, and HKW conducted the article selection process. YC led the data extraction with the support of HKW and EC. The data analyses were conducted by YC with the support of EC. YC and EC prepared the manuscript and MS provided feedback on the final version. All authors reviewed and approved the final version of the manuscript.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical Approval
This research did not require ethics committee or IRB approval. This research did not involve the use of personal data, fieldwork, or experiments involving human or animal participants, or work with children, vulnerable individuals, or clinical populations.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was partially supported by a Ph.D. scholarship awarded by the University of Liverpool to Yiting Cheah.
Notes
References
