Abstract
This research investigated the cognitive processes involved in classifying the cultural origin of music and how various musical features influence these processes. One hundred White-European (Western) listeners from the United Kingdom, the United States, New Zealand, and Australia listened to 48 ten-second music excerpts, comprising Western classical and Chinese traditional bowed-string music conveying happiness, sadness, calmness, and agitation. Western listeners classified each music excerpt as either Western or Chinese in a forced-choice task and rated their familiarity with the music. Measures of accuracy, sensitivity, response bias, and response time were obtained, while musical features were extracted for subsequent regression analyses. Listeners were more accurate in classifying Western music than Chinese music but responded more quickly when attempting to classify Chinese music than Western music. Listeners also demonstrated high sensitivity in distinguishing between Western and Chinese music, but with a liberal response bias to classify music as Western. Listeners exhibited reduced ability to distinguish between Western and Chinese music (low sensitivity) that was happy or agitated, with a strong response bias to classify such excerpts as Western. Musical features and culture-specific factors significantly predicted classification accuracy and response time for both Western and Chinese music. We discuss the roles of decision-making processes in the cultural classification of music, proposing that both heuristic (automatic) and reflective processes are involved. We also discuss potential mechanisms through which the emotional connotations of music impact upon its perceived cultural origins.
Introduction
Musical behaviours are observed globally, with different forms of music reflecting the unique practices, beliefs, social conventions and values of different human populations (Mehr et al., 2019; Thompson & Balkwill, 2010). Musical structure and accompanying activities vary across geographic, linguistic, religious, and ethnic backgrounds. For example, Western classical music is often distinguished by its use of harmony and complex chord progressions within the heptatonic scale, with legacies that include symphonies, sonatas, and operas often performed in concert halls. In contrast, traditional Chinese music emphasises melodic contour and the pentatonic scale, with instrumental folk traditions often categorised into civil and martial styles. Each tradition employs distinctive instruments with historical and cultural significance. The aesthetics of these musical styles – such as the grandeur of Western symphonic works or the subtlety of Chinese melodies – are intertwined with cultural rituals, belief systems, and historical contexts that shaped their development (Hao, 2023; Nan & Guan, 2024; Thompson et al., 2023).
Acculturation to a particular musical system influences the way we perceive, classify, appreciate, and respond to unfamiliar music from a different culture. Culture-specific knowledge shapes cognitive processes like categorisation, expectation, emotional response, and meaning-making (Berry et al., 2011). Thus, when we encounter unfamiliar music from a different culture, our experience typically involves perceiving the music as foreign and, in many cases, contemplating its cultural origins (Chilvers et al., 2023). The process of categorising familiar and unfamiliar music can be broken down into three key stages. The first stage, cue abstraction, involves identifying similarities and differences in auditory cues to form high-level groupings, such as recognising scales, instrumental timbres, or rhythms characteristic of specific musical traditions (Deliege, 1996). The second stage, comparison and matching, utilises pattern recognition to compare auditory cues with mental prototypes of musical genres. According to prototype theory (Rosch, 1975), categorisation is based on similarity to these central representations rather than strict rule-based definitions, enabling listeners to distinguish traditions such as Western versus non-Western instrumental qualities. The third stage, decision-making, involves generating hypotheses based on abstracted cues and making classification judgements, relying on cognitive strategies like pattern recognition and familiarity to streamline the process.
Long-term exposure to recurring features in music helps us to determine the cultural origins of music, even when we have never heard that music before. The ability to classify music as familiar or foreign results from enculturation, a process by which individuals internalise the norms of their own and other cultural practices (Thompson et al., 2019). Enculturation to Western music allows individuals to quickly recognise exemplars of Western music, and to classify exemplars into more precise styles such as classical, jazz, or heavy metal (Cook, 2000).
When listeners encounter music foreign to their lived experience and culturally unfamiliar to them, the same process of categorisation takes place, but may be prone to error. For example, upon hearing Japanese gagaku or Korean court music, listeners might recognise it as “Asian” but incorrectly assume it is Chinese. By contrast, familiarity with the music of one’s own culture may lead to an in-culture advantage – the tendency to recognise and classify stimuli from one’s own culture more accurately than those from other cultures (e.g., Elfenbein & Ambady, 2002; Thompson et al., 2019).
Decision-Making and Heuristics in Music Classification
Decision-making plays a crucial role in the process of classifying music. When listening to short excerpts of music, people appraise and classify the musical features that are present in order to identify the type of music they are hearing. The response time and accuracy of these decisions depend on the listeners’ ability to identify musical elements, their level of familiarity with those elements, and the mechanisms by which decision-making is achieved.
Heuristics are mental shortcuts that help us make quick decisions but can sometimes lead to biased judgements, especially when dealing with unfamiliar cultural elements. According to dual-system theory (Kahneman, 2003; Sloman, 1996), there are two types of decision-making processes: System 1 thinking is fast, automatic, intuitive, and effortless, whereas System 2 thinking is slower, deliberate, and analytical. In uncertain situations, people often rely on System 1 thinking, using heuristics to make quick decisions.
Two important heuristics relevant to music classification are the availability heuristic and the recognition heuristic. The availability heuristic refers to a tendency to make a rapid judgement based on how easily examples come to mind (Tversky & Kahneman, 1974). The recognition heuristic refers to the tendency to make a judgement based on a single recognised cue while ignoring other potentially conflicting information. It follows the fast-and-frugal principle, which enables quick and efficient decisions by relying on easily accessible cues (Gigerenzer & Gaissmaier, 2011; Gigerenzer & Goldstein, 1996). Both availability and recognition heuristics operate under System 1 thinking, facilitating rapid decisions without the need for detailed analysis. In music, if a listener recognises an instrument that is often used in Balinese music, they might quickly conclude that the music originated from Bali. Once they make this judgement, they may avoid considering other cues that contradict this judgement, overlooking features that could place the music as a Western composition that happens to involve a non-Western instrument.
In familiar situations, people often rely on System 2 thinking. The elimination-by-aspects (EBA) theory posits that System 2 decision-making involves systematically eliminating choices based on specific criteria (Tversky, 1972). Applied to music, this implies that listeners classify the cultural origin of new music by progressively evaluating aspects like instrumentation, tempo, or vocal style, refining their classification as they eliminate options that do not match culturally familiar categories. For example, when hearing a new piece of heavy metal music, listeners might first focus on an aspect like electric guitar distortion, leading them to classify it as “metal.” They might then evaluate other aspects, such as the vocal style or tempo, to refine their classification into subgenres like “thrash metal.” This process of elimination, driven by the salience of different musical features, helps listeners classify music.
Combining System 1 heuristic decision-making with System 2 EBA theory allows us to predict that Western listeners will tend to judge (unfamiliar) non-Western music more rapidly but less accurately than (familiar) Western music. When detailed knowledge of a musical style is lacking, listeners tend to rely on heuristics, using prominent features of non-Western music to make rapid, surface-level judgements that are often imprecise. Conversely, EBA theory suggests that decision-making for familiar music will be slower but more accurate, given that many recognisable musical features can be used to eliminate choices when judging the cultural origins of that music.
Musical Features
Which cues are available for decision-making in music classification, and how do cross-cultural similarities and differences in the perception of these cues influence cognitive processes? The cue-redundancy model (CRM) proposed by Balkwill and Thompson (1999) distinguishes between psychophysical and cultural cues in the context of cross-cultural emotion recognition, and can be applied here to cross-cultural classification. Psychophysical cues, such as tempo and intensity, are acoustic properties that are isomorphic to the natural biomechanical and physiological processes that occur during emotional states. For example, the reduced acoustic activity in slow-tempo music is isomorphic to the reduced energy of low-arousal emotions such as sadness, while fast-tempo music mirrors the heightened energy of high-arousal emotions such as anger or agitation. In contrast, cultural-specific cues are tied to cultural knowledge, such as harmonic progressions in Western classical music or distinctive playing techniques in Chinese music. Such cues must be learned and hence are recognised primarily by cultural insiders.
According to the CRM, listeners rely more on psychophysical cues when evaluating the emotional meaning of unfamiliar music, resulting in responses that are broadly accurate but less precise than judgements of culturally familiar music, where both psychophysical and cultural cues are accessible. Thus, the ability to decode emotional meanings depends on cultural familiarity and sensitivity to psychophysical cues, predicting an in-culture advantage in emotion recognition accuracy (Thompson & Balkwill, 2010).
The CRM can explain, in part, the capacity to judge emotional meaning in unfamiliar music outside of one’s lived experience. The model also provides a useful framework to extend beyond emotion recognition and into cultural classification, specifically the cognitive processes involved in determining the cultural origin of unfamiliar music. Exploring whether psychophysical and cultural cues affect classification accuracy and response time could shed light on the cognitive mechanisms underlying cross-cultural classification of music.
Theoretical Framework and Predictions
The present investigation examined the cognitive processes by which listeners classify the cultural origins of music. These processes were inferred by examining the familiarity, accuracy, sensitivity, response bias, and response time when listeners classified the cultural origins of music from familiar and unfamiliar cultures. Additionally, we explored the impact of musical features and cultural factors on these processes.
Familiarity
There is limited research on how familiarity affects the classification of the cultural origin of music. Fung (1996) found that familiarity influenced world music preferences, but this research did not consider whether familiarity helps people identify the cultural origins of music. However, other research suggests that familiarity with one’s own culture enhances recognition and preference, including emotion recognition (Elfenbein & Ambady, 2002), art judgement (Darda & Cross, 2022), and music emotion recognition (Fritz et al., 2009; Laukka et al., 2013). Neuroimaging studies also show that familiar music activates emotion-related brain regions, whereas unfamiliar music engages broader functional networks, reflected in increased connectivity between regions as the brain works harder to process and integrate novel stimuli (Pereira et al., 2011; Thammasan et al., 2017). These findings suggest that Western listeners are likely to process Western music more efficiently, whereas unfamiliar non-Western music may require greater cognitive effort, prompting listeners to rely on heuristic strategies to determine its cultural origin.
Accuracy
Western listeners are expected to classify Western music more accurately than culturally unfamiliar music, such as Chinese music. While accuracy refers to the proportion of correct responses, sensitivity and response bias – two key factors in decision-making under uncertainty as described by signal detection theory (SDT) – can help explain the underlying patterns in listeners’ decision-making processes (Green & Swets, 1966; Macmillan, 2003). Sensitivity (d′) measures the ability to distinguish between “signal” (here, Western music) and “noise” (here, Chinese music). Response bias (criterion c) reflects the overall tendency to favour one response over the other; a tendency to classify stimuli as “signal” is a liberal bias (a lower decision criterion). In distinguishing between Western and Chinese music, high sensitivity indicates that listeners can reliably tell the two apart. A liberal response bias, however, leads them to classify music as Western more often, resulting in more hits but also more false alarms.
Research supports this prediction. Familiarity with one’s own culture enhances the accuracy of recognising emotional expressions in faces and voices (Chen et al., 2023; Elfenbein & Ambady, 2002, 2003; Juslin & Laukka, 2001; Paulmann & Uskul, 2014), as well as in cross-cultural music emotion recognition (Balkwill et al., 2004; Fritz et al., 2009; Laukka et al., 2013). Similarly, studies in art categorisation and cultural stereotype judgement demonstrate how enculturation fosters familiarity, leading to greater recognition accuracy (Alcott & Watt, 2021; Darda & Cross, 2022; Marsh et al., 2007). Moreover, SDT has been used to measure sensitivity and bias in various categorisation tasks, including pitch discrimination among autistic populations (Bonnel et al., 2003), music perception ability (Whiteford et al., 2023), implicit gender stereotyping (Banaji & Greenwald, 1995) and mood-influenced racial stereotyping (Park & Banaji, 2000). Taken together, we predicted that Western listeners would show greater accuracy in classifying Western music compared to unfamiliar Chinese music, with high sensitivity in discriminating cultural source and a liberal bias to classify music as Western.
Response Time
When people encounter unfamiliar stimuli, they often rely on heuristic decision-making, leading to faster but potentially less accurate judgements. Given that heuristic decision-making is a cognitive shortcut, Western listeners should classify Chinese music more quickly than familiar Western music. However, there are mixed findings on how familiarity affects response times in classification tasks. In some cases, familiarity can lead to faster processing due to cognitive efficiency. For instance, Elfenbein and Ambady (2003) found that emotion recognition for faces from familiar cultures occurred more quickly. Similarly, Filipic et al. (2010) demonstrated that familiar music is recognised accurately even in brief excerpts, highlighting this familiarity effect. In other cases, unfamiliar stimuli can prompt rapid judgements through heuristic strategies. Belfi et al. (2018) reported that liking for unfamiliar music was judged more quickly in aesthetic tasks, reflecting “gut-level” decision-making. Malekmohammadi et al. (2023) also reported that listening to familiar music induced alpha and low-beta power suppression, reflecting increased attention or arousal due to long-term memory engagement with familiar features. In contrast, unfamiliar music did not elicit the same sustained neural suppression, indicating that it is processed without the engagement of long-term memory systems observed for familiar music.
Response time is a key indicator of processing efficiency in studies of implicit attitudes, especially in tasks designed to test implicit bias. The Implicit Association Test (IAT) exemplifies this approach, assessing implicit bias by comparing reaction times for congruent and incongruent judgements (Greenwald et al., 1998). Despite its widespread use, most IAT studies focus on word or visual stimuli rather than auditory targets like music. Thus, there is a gap in understanding the significance of response time in cross-cultural music classification, where factors like familiarity and heuristic processing may function differently.
Given the role of heuristic decision-making in processing unfamiliar stimuli, we predicted that Western listeners would classify unfamiliar Chinese music more quickly but less accurately than familiar Western music. However, we also recognised that the opposite pattern for response time was possible, with quicker classification of Western music than of Chinese music, given evidence that familiarity can increase the speed of decision-making.
Role of Musical Features in Cultural Classification
Research on cross-cultural music perception shows that musical features such as rhythmic patterns, melodic structures, and tempo are closely associated with emotional recognition, a link also well established in general music perception (Balkwill et al., 2004; Balkwill & Thompson, 1999; Fritz et al., 2009; Laukka et al., 2013). These features have been explored through subjective ratings and acoustic analyses, revealing their role in predicting perceived emotions (Balkwill et al., 2004; Balkwill & Thompson, 1999; Laukka et al., 2013; Nordström & Laukka, 2019).
Despite extensive research on the influence of musical features on emotional recognition, little is known about their role in the cognitive processes involved in judging the cultural origin of music. The CRM (Balkwill & Thompson, 1999) suggests that listeners rely on both psychophysical and culture-specific cues to accurately recognise emotions across musical cultures. Similarly, Fritz’s (2013) Dock-in Model explains how musical systems intersect with universal cues, while each tradition incorporates unique features that require cultural familiarity. As music evolves, it may “dock out” of these universal cues, reducing its accessibility to listeners from other cultures.
Given that cultural and emotional meanings in music are intertwined, it is plausible that the same features used to recognise emotions could also help listeners identify the cultural origin of music. For instance, distinct rhythms or instrumentation characteristic of a specific culture might serve as cues for cultural classification. Therefore, we predicted that musical features influencing emotional recognition would also affect cognitive processes, such as accuracy and response time, in cultural classification. This, however, remains an exploratory approach.
The Present Study
This study explored the cognitive processes involved in classifying the cultural origin of familiar and unfamiliar music, examining how listeners’ cultural background may contribute to systematic errors or biases, and how musical features might influence these processes. It provides a novel approach to assessing the cognitive mechanisms underlying cultural classification by examining accuracy, sensitivity, response bias, response time, and familiarity.
Western classical music and Chinese traditional music were selected as representative of two distinct musical cultures, and we recruited Western listeners who were familiar with Western music but not with Chinese music. To ensure instrumental comparability, the Western violin and Chinese traditional erhu 二胡 (both bowed-string instruments) were used (Song & Horner, 2022; Wang et al., 2021). Forty-eight ten-second excerpts (24 from each culture) were selected to convey agitation, happiness, sadness, and calmness, representing the four quadrants of the valence-arousal model (Russell, 1980). Including a range of emotional connotations allowed us to consider how emotional cues in music might impact upon decision-making in cultural classification (Bandyopadhyay et al., 2013; Lerner & Keltner, 2000).
The cultural origin and emotional connotation of the music excerpts used in the study were validated by expert Western and Chinese musicians (n = 8 per group), with extensive training in Western classical and Chinese traditional music, respectively. Each expert had performance and pedagogical experience, ensuring the selected excerpts reliably reflected culturally distinct characteristics. In the primary experiment, Western listeners completed a forced-choice task to classify each excerpt as either Western or Chinese as quickly and accurately as possible, using a 2 (culture: Western, Chinese) × 4 (emotion: agitation, happiness, sadness, calmness) within-subjects design.
We predicted that Western listeners would show greater accuracy in classifying Western music compared to unfamiliar Chinese music, with high sensitivity in discriminating cultural source and a liberal bias to classify music as Western. Conversely, the classification of unfamiliar Chinese music should be faster but less accurate due to heuristic decision-making under uncertain conditions. Additionally, psychophysical and culture-specific features were expected to influence these cognitive processes.
Method
The data reported in this article were collected from the same participants and with the same stimuli as in Li et al. (2025). The data sets do not overlap. This article focuses on classification data, whereas the other examined emotion perception data.
Participants
We recruited 108 Western (White-European) participants from the United Kingdom, the United States, New Zealand, and Australia via Prolific for an online survey hosted on Qualtrics. After eight returns, 100 participants completed the survey. Participants self-identified as White/Caucasian, enjoyed listening to music and had no or limited exposure to Chinese music. All participants passed five attention checks and received £4 for approximately 20 min of participation. Ethics approval was granted by the Macquarie University Human Research Ethics Committee (Reference No: 520231435649832).
Participants were aged 20–75 years (M = 41.58, SD = 14.70) and comprised 48 females, 48 males, 3 non-binary participants, and 1 who preferred not to disclose gender. On average, they had 1.55 years of formal instrumental training (SD = 3.17). By musician status, 36% were nonmusicians, 46% music-loving nonmusicians, 12% amateur musicians, and 6% serious amateur musicians.
Given that participants may respond differently to each music stimulus, no fixed cutoff (e.g., 10 s) was imposed on reaction times. This approach allowed for the varied emotional content of the music to influence response times naturally. During data cleaning, trials on which a participant’s reaction time had an absolute z-score greater than 2, computed within each stimulus, were excluded, resulting in 4.65% of trials (223 out of 4,800; 99 Western music trials and 124 Chinese music trials) being discarded. The number of retained trials per participant ranged from 27 to 48 (M = 45.77, SD = 3.94). No participants were excluded.
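The exclusion rule described above can be sketched as follows. This is only an illustration under stated assumptions: the data layout (a list of `(stimulus, rt)` pairs), the function name `trim_by_stimulus`, the example response times, and the use of the sample standard deviation are all hypothetical, since the source does not specify how the z-scores were computed.

```python
from collections import defaultdict
from statistics import mean, stdev


def trim_by_stimulus(trials, z_cut=2.0):
    """Drop trials whose response time lies more than z_cut standard
    deviations from the mean response time for the same stimulus."""
    rts = defaultdict(list)
    for stimulus, rt in trials:
        rts[stimulus].append(rt)

    kept = []
    for stimulus, rt in trials:
        values = rts[stimulus]
        if len(values) < 2:
            kept.append((stimulus, rt))  # z-score undefined for a single trial
            continue
        sd = stdev(values)  # sample SD (n - 1): an assumption, not from the source
        if sd == 0 or abs(rt - mean(values)) / sd <= z_cut:
            kept.append((stimulus, rt))
    return kept


# Hypothetical response times (in seconds) for a single stimulus
trials = [("excerpt_01", rt) for rt in (1.0, 1.1, 0.9, 1.2, 1.0, 0.95, 1.05, 9.0)]
kept = trim_by_stimulus(trials)
print(len(trials) - len(kept), "trial(s) excluded")

# Consistency check on the counts reported in the text
print(round(100 * 223 / 4800, 2), 4800 - 223)  # 4.65 4577
```

The final line reproduces the reported exclusion rate (4.65%) and the number of trials remaining after exclusion (4,577).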
An a priori power analysis indicated 54 participants would be sufficient for within-subjects designs (e.g., analysis of variance) to detect a medium effect size at 95% power (Faul et al., 2009). Based on the median sample size in cross-cultural emotion recognition research (100 participants; Elfenbein & Ambady, 2002), we recruited 100 participants.
Music Stimuli
Music stimuli consisted of 48 ten-second excerpts: 24 from Western classical violin and 24 from Chinese traditional erhu, selected from solo and ensemble pieces, with the target instrument prominently featured in ensembles. Within each culture, six excerpts represented each intended emotion: happiness, sadness, agitation, and calmness. A pilot study was conducted to validate the stimuli, with cultural experts from each tradition rating the emotional intent of each stimulus and their confidence in its cultural origin, using a 7-point Likert scale. Table 1 displays the mean ratings from this study. Bolded values indicate the highest-rated emotion, confirming that it always matched the intended target (e.g., happiness ratings for music intended to convey happiness). Confidence ratings were close to the maximum of 7, confirming that musical stimuli were unambiguously representative of Western or Chinese music. Full details of the pilot study, including selection procedures and validation results, are available in the supplemental material (Tables S1–S4). Downloadable audio files of the stimuli are available via the Open Science Framework (https://osf.io/3whjz/).
Table 1. Mean Ratings of Emotion by Cultural Expert Musicians of the Music Stimuli.
Note: 24 Western music excerpts were rated by Western musicians (n = 8), and 24 Chinese music excerpts were rated by Chinese musicians (n = 8). There were six music stimuli per emotion type. Bolded values indicate the highest-rated emotion matching the intended emotion.
Procedure
Participants enrolled in this study through Prolific provided informed consent, their Prolific ID, and their ethnicity for pre-screening. They were informed that they would listen to 48 ten-second music excerpts, equally divided between Western and Chinese pieces. Two practice trials familiarised participants with the task. Alerts were prompted during the practice trials if responses were delayed, but were omitted in the main study.
Cultural Classification
Participants first listened to each music excerpt and judged whether the excerpt was Western or Chinese using a two-alternative forced-choice discrimination procedure. They were instructed to respond as quickly and accurately as possible and playback ceased immediately upon response.
Ratings
After classifying each excerpt, participants rated their perceived familiarity and enjoyment of the excerpt, followed by perceived emotional responses (on four emotions). All six ratings were made on a 7-point Likert scale (1 = “Not at all,” 4 = “Moderately,” 7 = “Very much”). The emotional rating question was framed as: “Please rate the extent to which you perceive these emotions in this music, assigning the highest rating to the most salient emotion: Happiness, Sadness, Agitation, Calmness.”
The order of music stimuli and five attention-check questions was randomised. After completing 48 trials, participants provided demographic details, including musician status assessed with a single-item measure ranging from 1 (nonmusician) to 6 (professional musician), developed by Zhang and Schubert (2019).
Design and Analysis
This study adopted a 2 × 4 within-subjects design, with two musical cultures (Western and Chinese) and four intended emotions (happy, sad, agitated, and calm) as the independent variables. The dependent variables were familiarity and, for the cultural classification, accuracy rate, unbiased hit rate, sensitivity (d′), response bias (criterion c), and response time. Familiarity was measured using 7-point Likert scales, where 1 represented “not at all,” 4 “moderately,” and 7 “very much.” Response times for cultural classification were assessed using a series of paired sample t-tests.
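Classification accuracy for Western and Chinese music was first determined by the percentage of correct responses (hit rate). The unbiased hit rate (Hu), which corrects for both overuse and underuse of a category, was calculated following Wagner (1993) as

$$H_u = \frac{n_{\text{hit}}^{2}}{n_{\text{stim}} \times n_{\text{resp}}}$$

where $n_{\text{hit}}$ is the number of correct classifications for a category, $n_{\text{stim}}$ is the number of stimuli presented from that category, and $n_{\text{resp}}$ is the total number of times that response category was used.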
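For illustration, Wagner's (1993) unbiased hit rate for a category is the squared number of hits divided by the product of the number of stimuli presented from that category and the number of times the category was used as a response. The sketch below assumes a 2 × 2 confusion matrix of counts for one listener; the counts and function names are hypothetical and are not taken from the study's data.

```python
def unbiased_hit_rate(hits, n_stimuli, n_responses):
    """Wagner's (1993) Hu: hits squared over (stimuli presented x responses given)."""
    if n_stimuli == 0 or n_responses == 0:
        return 0.0  # category never presented or never chosen
    return hits ** 2 / (n_stimuli * n_responses)


def chance_proportion(n_stimuli, n_responses, n_total):
    """Chance level (Pc): joint probability of the stimulus category and the response category."""
    return (n_stimuli / n_total) * (n_responses / n_total)


# Hypothetical counts for one listener (rows = presented, columns = response)
#                 "Western"  "Chinese"
# Western (24)        22         2
# Chinese (24)         4        20
hu_western = unbiased_hit_rate(hits=22, n_stimuli=24, n_responses=22 + 4)
hu_chinese = unbiased_hit_rate(hits=20, n_stimuli=24, n_responses=2 + 20)
pc_western = chance_proportion(n_stimuli=24, n_responses=26, n_total=48)

print(f"Hu (Western) = {hu_western:.3f}, Hu (Chinese) = {hu_chinese:.3f}")
print(f"Pc (Western) = {pc_western:.3f}")
```

In this example the raw hit rate for Western music (22/24) exceeds its Hu because the "Western" response was used more often than Western stimuli were presented, which is the kind of overuse that Hu penalises.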
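Decision-making accuracy in classification was analysed using the signal detection theory framework (Green & Swets, 1966; Macmillan, 2003; Macmillan & Creelman, 2005). Sensitivity (d′) measures the ability to distinguish between two stimuli, with Western music as the “signal” and Chinese music as the “noise”. It was calculated as

$$d' = z(H) - z(FA)$$

where $H$ is the hit rate (the proportion of Western excerpts classified as Western), $FA$ is the false-alarm rate (the proportion of Chinese excerpts classified as Western), and $z$ is the inverse of the standard normal cumulative distribution function.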
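In the standard signal detection formulation, sensitivity is d′ = z(H) − z(FA) and the criterion is c = −[z(H) + z(FA)]/2, where H is the hit rate, FA is the false-alarm rate, and negative values of c indicate a liberal bias towards the "signal" response. The sketch below is a minimal illustration with hypothetical counts. The log-linear correction for rates of 0 or 1 is an assumption added so that the example always runs; the source does not state whether or how extreme rates were corrected.

```python
from statistics import NormalDist


def dprime_and_criterion(hits, n_signal, false_alarms, n_noise, correct=True):
    """Return (d', c) for a yes/no task with 'Western' as the signal response."""
    if correct:
        # Log-linear correction (an assumption, not from the source):
        # keeps both rates strictly between 0 and 1.
        h = (hits + 0.5) / (n_signal + 1)
        fa = (false_alarms + 0.5) / (n_noise + 1)
    else:
        # Uncorrected rates; inv_cdf raises an error if a rate is 0 or 1.
        h = hits / n_signal
        fa = false_alarms / n_noise
    z = NormalDist().inv_cdf  # inverse standard normal CDF
    d_prime = z(h) - z(fa)
    criterion = -0.5 * (z(h) + z(fa))
    return d_prime, criterion


# Hypothetical listener: 22 of 24 Western excerpts called "Western" (hits),
# 4 of 24 Chinese excerpts called "Western" (false alarms).
d_prime, criterion = dprime_and_criterion(22, 24, 4, 24)
print(f"d' = {d_prime:.2f}, c = {criterion:.2f}")
```

Here the negative criterion reflects a tendency to respond "Western", which is the direction of bias the study refers to as liberal.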
Musical features were extracted using MIR Toolbox 1.8.2 (Lartillot et al., 2008) in MATLAB (version R2023b). Fifteen features in six categories were extracted: (1) dynamics (root mean square [RMS], low energy); (2) rhythm (attack time, tempo, pulse clarity, event density); (3) timbre (spectral centroid, spectral entropy, roughness, spectral flux); (4) register (salient pitch); (5) tonality (key clarity, mode); and (6) musical novelty (spectral novelty, tonal novelty). For descriptions of the features, see Laukka et al. (2013).
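To make the two dynamics features concrete, the sketch below computes frame-wise RMS and a low-energy ratio (the proportion of frames whose RMS falls below the average RMS) for a synthetic signal. It is written in Python for illustration only and is not the MIR Toolbox implementation; the frame length, hop size, sampling rate, and test signal are all assumptions.

```python
import math


def frame_rms(signal, frame_len, hop):
    """Root mean square of each frame of the signal."""
    return [
        math.sqrt(sum(x * x for x in signal[i:i + frame_len]) / frame_len)
        for i in range(0, len(signal) - frame_len + 1, hop)
    ]


def low_energy_ratio(rms_values):
    """Proportion of frames whose RMS is below the mean RMS across frames."""
    average = sum(rms_values) / len(rms_values)
    return sum(1 for r in rms_values if r < average) / len(rms_values)


# Synthetic 2-second, 50 Hz sine at a 1000 Hz sampling rate (assumed values):
# loud for the first 0.5 s, then quiet for the remaining 1.5 s.
sample_rate = 1000
signal = [
    (1.0 if n < 500 else 0.1) * math.sin(2 * math.pi * 50 * n / sample_rate)
    for n in range(2 * sample_rate)
]

rms = frame_rms(signal, frame_len=100, hop=100)  # 20 non-overlapping frames
ratio = low_energy_ratio(rms)
print(f"frames = {len(rms)}, low-energy ratio = {ratio:.2f}")
```

A signal with a brief loud passage and a long quiet remainder yields a high low-energy ratio, whereas a signal with uniform loudness yields a ratio near zero.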
Statistical analyses were conducted using the Statistical Package for the Social Sciences (SPSS, version 28), including paired and independent sample t-tests (two-tailed), analyses of variance (ANOVAs), and partial least squares regression (PLSR). PLSR was used to examine whether acoustic features predicted classification accuracy, response times, and familiarity ratings. PLSR is tolerant of large numbers of predictors and relatively small sample sizes (Carrascal et al., 2009), reduces multicollinearity by extracting latent components, and provides regression coefficients (b values) and variable importance in projection (VIP) scores to assess predictor contributions; model fit was evaluated using explained variance and adjusted R². Predictors (15 acoustic features across six categories) were mean-centred and standardised. Separate PLSR models were tested for Western and Chinese music for each outcome, with extraction limited to six latent components (corresponding to the six acoustic categories). Because SPSS version 28 does not provide cross-validation indices (e.g., Q²), model selection was based on variance explained and adjusted R². Predictors were interpreted using VIP ⩾ 1.0 and meaningful regression coefficients. Data visualisation was conducted using the R programming language (version 4.3.3) (R Core Team, 2024).
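As background to the VIP ⩾ 1.0 threshold, one common definition of the VIP score for predictor j is given below. This is offered as an assumption for illustration; the source does not state the exact formula computed by the software. Here p is the number of predictors (15 in this study), A is the number of latent components (at most six), w_{ja} is the weight of predictor j on component a, and SS_a is the variance in the outcome explained by component a.

```latex
\mathrm{VIP}_j
  = \sqrt{\, p \;
      \frac{\displaystyle\sum_{a=1}^{A} \mathrm{SS}_a
            \left( \frac{w_{ja}}{\lVert \mathbf{w}_a \rVert} \right)^{2}}
           {\displaystyle\sum_{a=1}^{A} \mathrm{SS}_a} }
\qquad\text{so that}\qquad
\frac{1}{p} \sum_{j=1}^{p} \mathrm{VIP}_j^{2} = 1 .
```

Under this definition the squared VIP scores average to 1 across predictors, so a threshold of 1.0 picks out predictors that contribute more than the average predictor.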
Results
Familiarity Ratings
Table 2 presents descriptive statistics for familiarity ratings and results from paired sample t-tests comparing Western and Chinese music across each emotion type. For total music, Western listeners rated music from their own culture (M = 3.35, SD = 1.49) as significantly more familiar than music from the Chinese culture (M = 2.26, SD = 1.09), t(99) = 11.33, p < .001, d = .96. Similar patterns emerged across all emotion categories, ts(99) ⩾ 8.33, ps < .001, ds ⩾ .94. These findings suggest that enculturation or previous exposure leads to greater familiarity with music from one’s own culture.
Table 2. Descriptive Statistics and Paired Sample t-tests for Familiarity Ratings.
Note: N = 100. Ratings are means with standard deviations in parentheses, on a 7-point Likert scale. Total trial counts = 4,577 (Western = 2,301; Chinese = 2,276).
Cultural Classification Accuracy
We analysed cultural classification accuracy by comparing the accuracy rates for Western and Chinese music, measured as the percentage of correct classifications. We then analysed and compared unbiased hit rates between the two music cultures to account for any potential bias. Table 3 displays the confusion matrix of cultural classification and the difference in classification accuracy in response to Western and Chinese stimuli. Paired sample t-tests revealed that the classification accuracy for Western music was significantly higher than for Chinese music, with a mean difference of 8.72%, t(99) = 7.75, p < .001, d = .78. Furthermore, accuracy varied depending on the intended emotion of the music. When the music was agitated, accuracy in cultural classification was 28.75% greater for Western music than Chinese music, t(99) = 11.18, p < .001, d = 1.12; when the music was happy, it was 7.70% greater, t(99) = 5.61, p < .001, d = .56. No significant differences in accuracy were observed for calm and sad music, ts(99) = .30 and –.62, p-values = .386 and .268, ds = .03 and .06, respectively.
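As a brief arithmetic illustration, for paired comparisons the standardised mean difference based on the SD of the difference scores (often written d_z) can be recovered from the t statistic as d_z = t / √n. The sketch below applies this to the accuracy comparisons reported above. That the reported d values are of this type is an assumption; the source does not state which effect-size formula was used.

```python
import math


def cohens_dz_from_t(t_value, n_pairs):
    """Paired-samples effect size from a t statistic: d_z = t / sqrt(n)."""
    return t_value / math.sqrt(n_pairs)


# (t, reported d) pairs taken from the accuracy results in the text; n = 100
reported = {
    "overall": (7.75, 0.78),
    "agitated": (11.18, 1.12),
    "happy": (5.61, 0.56),
}

computed = {label: cohens_dz_from_t(t, 100) for label, (t, _) in reported.items()}
for label, (t, d) in reported.items():
    print(f"{label}: t = {t}, reported d = {d}, t / sqrt(n) = {computed[label]:.3f}")
```

For these three comparisons, t / √n agrees with the reported d to within rounding.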
Confusion Matrix for Cultural Classification Accuracy.
Note: Percentage of trials in which Western and Chinese music stimuli were classified as Western or Chinese. Rows represent the presented stimuli, and columns represent participants’ responses. Correct classifications appear in bold. The final column (Difference %) reports the difference in accuracy between Western and Chinese stimuli. N = 100. Total trial counts = 4577.
p < .001.
The unbiased hit rate (Hu) results supported this pattern (see Table 4 for descriptive statistics). All Hu rates were significantly above-chance (Pc), ts(99) ⩾ 15.52, p-values < .001, ds > 1.55, indicating an above-chance hit rate in classifying both Western and Chinese music. Western listeners showed significantly higher Hu rates in identifying Western music, with a mean difference of .03 (SD = .04), t(99) = 5.97, p < .001, d = .60. High-arousal Western music was classified significantly more accurately than high-arousal Chinese music in Hu rates, showing significant differences in the agitated music (M = .12, SD = .13), t(99) = 9.02, p < .001, d = .90; and happy music (M = .02, SD = .08), t(99) = 3.09, p = .003, d = .31. No significant differences were observed for music intended to convey the low-arousal emotions (calm, sad), ts(99) = .74 and –.86, p-values = .459 and .393, ds = .07 and –.09. Thus, the Hu analysis corroborated the raw accuracy results.
Mean Hit Rates (%) and Unbiased Hit Rates (Hu) for Cultural Classification of Western and Chinese Music.
Note: Hit % = mean percentage of correct classifications. Hu = mean unbiased hit rate. Values are means across participants, with 95% confidence intervals in brackets. N = 100.
Overall, Western listeners were more accurate in classifying Western music relative to Chinese music. However, the accuracy varied based on the emotional intention of the music: high-arousal music (agitated, happy) resulted in higher classification accuracy for Western music than Chinese music, whereas calm and sad music resulted in no significant differences in accuracy. For the Western agitated excerpts, the large discrepancy between hit rate and Hu indicates a strong tendency to classify all excerpts as Western.
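To make the distinction between the raw hit rate and Hu concrete, the following sketch computes both for a single listener, assuming the standard definition in which Hu is the squared number of correct responses divided by the product of the number of stimuli in a category and the number of times that response was used. The trial counts and function name are hypothetical and are not taken from the study.

```python
def unbiased_hit_rates(confusion):
    """Return hit rate, Hu, and chance proportion (Pc) for each category.

    confusion[stimulus][response] holds trial counts for one listener.
    """
    labels = list(confusion)
    total = sum(sum(row.values()) for row in confusion.values())
    results = {}
    for label in labels:
        hits = confusion[label][label]
        n_stimuli = sum(confusion[label].values())                # row total
        n_responses = sum(confusion[s][label] for s in labels)    # column total
        if n_stimuli == 0 or n_responses == 0:
            results[label] = {"hit_rate": 0.0, "Hu": 0.0, "Pc": 0.0}
            continue
        results[label] = {
            "hit_rate": hits / n_stimuli,
            # Hu penalises correct responses that stem from overusing a label.
            "Hu": hits ** 2 / (n_stimuli * n_responses),
            # Pc is the Hu expected if responses were unrelated to the stimuli.
            "Pc": (n_stimuli / total) * (n_responses / total),
        }
    return results

# Hypothetical listener who favours the "Western" response.
example = {
    "Western": {"Western": 22, "Chinese": 2},
    "Chinese": {"Western": 6, "Chinese": 18},
}
results = unbiased_hit_rates(example)
for label, values in results.items():
    print(label, {k: round(v, 3) for k, v in values.items()})
```

In this hypothetical case, the Western hit rate is inflated by the frequent use of the "Western" response, so Hu falls well below the raw hit rate, which is the kind of discrepancy noted above for the Western agitated excerpts.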
Cultural Sensitivity and Response Bias
Sensitivity was measured by calculating d′.

Decision-making patterns from classifying Western and Chinese music.
Response bias was measured by calculating criterion c. The overall response bias criterion c was –0.27, 95% CI [–0.32, –0.20], significantly below zero (p < .001). A negative c reflects a liberal bias, indicating that Western listeners were more likely to classify the music stimuli as Western music rather than Chinese music. Response bias varied across the intended emotional dimensions, as shown in Figure 1(b). The biased tendency to classify music as Western music was greater for the high-arousal excerpts: agitated music (Mc = –.38, 95% CI [–0.44, –0.30]) and happy music (Mc = –.10, 95% CI [–0.14, –0.06]), both below zero. For the low-arousal excerpts, calm (Mc = –.01, 95% CI [–0.05, 0.03]) and sad (Mc = .02, 95% CI [–0.03, 0.06]), the 95% CIs crossed zero, indicating no reliable bias. This finding suggests that low-arousal music elicited a more balanced decision-making approach.
Overall, these findings illustrate high sensitivity by Western listeners to the cultural origins of music, accompanied by a liberal response bias that reflected a tendency to classify music as Western. High-arousal music was associated with reduced cultural sensitivity and increased cultural response bias relative to low-arousal music.
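The two signal detection indices can be sketched as follows, treating "Western" as the signal category so that a hit is a Western excerpt classified as Western and a false alarm is a Chinese excerpt classified as Western. The counts are hypothetical, and the log-linear correction (adding 0.5 to each cell) is an assumption made here to avoid rates of 0 or 1; the source does not state which correction, if any, was applied.

```python
from statistics import NormalDist

def sdt_indices(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion_c) for a two-choice classification task."""
    # Log-linear correction (an assumption, not taken from the source).
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)             # sensitivity
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # response bias
    return d_prime, criterion

# Hypothetical listener: 24 Western and 24 Chinese excerpts.
d_prime, criterion = sdt_indices(hits=22, misses=2,
                                 false_alarms=6, correct_rejections=18)
print(round(d_prime, 2), round(criterion, 2))
```

Under this coding, a negative criterion corresponds to a liberal bias towards responding "Western", and a criterion near zero indicates no preference for either response.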
Cultural Classification Response Time
Table 5 lists descriptive statistics for cultural classification response times in seconds, along with results from paired sample t-tests that compared mean differences in classification response time between Western and Chinese music. Western listeners classified Chinese music significantly faster than Western music, with a mean difference of .765 s, t(99) = 7.73, p < .001, d = .77. Furthermore, there were significant differences in response times for low-arousal Western and Chinese music. First, calm Chinese music was classified more quickly than calm Western music by a mean difference of 1.700 s, t(99) = 10.80, p < .001, d = 1.08. Second, sad Chinese music was classified more quickly than sad Western music by a mean difference of 1.213 s, t(99) = 7.07, p < .001, d = .71. There were no significant differences in response times for high-arousal Western and Chinese music (agitated, happy), ts(99) ⩽ .80, p-values > .05, ds ⩽ .08.
Descriptive Statistics and Paired Sample t-tests for Classification Response Time.
Note: N = 100. Response times are means with standard deviations in parentheses, measured in seconds. Total trial counts = 4,577 (Western = 2,301; Chinese = 2,276).
Figure 2 illustrates the density of response times for each intended emotion across Western and Chinese music. For each intended emotion, overlapping density plots are shown for the two cultures, with response times (in seconds) on the x-axis and probability density on the y-axis. Dotted lines indicate the mean response times for Western music, while solid lines represent those for Chinese music. The figure reveals that for music conveying agitated and happy emotions, the mean response times are similar, as indicated by the minimal or non-existent gaps between the lines. In contrast, music conveying sad and calm emotions shows a significant difference in response times, with larger gaps between the mean value lines. Additionally, both cultures exhibit faster response times for happy and agitated music.

Classification response time difference by intended emotion across cultures.
Furthermore, there were no significant correlations between response times and accuracy, either across emotion categories or within each emotion category, rs(98) ranging from –.20 to .11, all p-values > .05 (for detailed results, see Table S9 in the supplemental material). This finding suggests that accuracy and response times provide distinct insights into cognitive processes in cultural classification, rather than reflecting a speed-accuracy trade-off.
Overall, Western listeners classified Chinese music more rapidly than Western music, but response patterns varied depending on the intended emotion: low-arousal music (calm, sad) elicited faster classification of Chinese music than Western music, whereas response times for classifying Western and Chinese music were similar for high-arousal music (agitated, happy).
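The paired comparisons reported in this section can be illustrated with a short sketch. The source does not state which effect size formula was used; several of the reported values (e.g., t(99) = 7.73 with d = .77) are consistent with d = t/√N, which is the convention assumed here. The response times in the example are hypothetical.

```python
import math
from statistics import mean, stdev

def paired_t_and_d(a, b):
    """Paired-samples t statistic, Cohen's d for paired data, and df."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    sd = stdev(diffs)                  # SD of the difference scores
    t = mean(diffs) / (sd / math.sqrt(n))
    d = mean(diffs) / sd               # equivalent to t / sqrt(n)
    return t, d, n - 1

def d_from_t(t, n):
    """Recover the paired effect size from a reported t and sample size."""
    return t / math.sqrt(n)

# Hypothetical mean response times (s) for five listeners.
western_rt = [5.1, 4.8, 6.0, 5.5, 4.9]
chinese_rt = [4.2, 4.5, 5.1, 4.4, 4.6]
t, d, df = paired_t_and_d(western_rt, chinese_rt)
print(round(t, 2), round(d, 2), df)

# Check against a value reported in the text: t(99) = 7.73, N = 100.
print(round(d_from_t(7.73, 100), 2))
```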
Musical Features Predicting Cognitive Processes
We fitted six PLSR models to examine whether acoustic features predicted classification accuracy, response times, and familiarity ratings for Western and Chinese music separately. For each model, we report the number of components that provided the best balance between explanatory power and parsimony, along with the influential predictors. PLSR is tolerant of large numbers of predictors and relatively small stimulus samples such as in this study, and it has previously been employed to examine the role of musical features in affective judgements (e.g., Wang et al., 2021). A detailed comparison of musical features and the full regression coefficients (b values) and Variable Importance in Projection (VIP) scores is available in the supplemental material (Tables S5–S11). The full acoustic feature dataset extracted from each excerpt is provided in an Excel file (by stimuli features MIR data.xls) available at https://osf.io/3whjz/.
For classification accuracy, a three-component PLSR model explained 68% of the variance (R²adj = .64) in classifying Western music. The most influential predictors were roughness (VIP = 1.69, b = –61.36), spectral novelty (VIP = 1.37, b = 8.04), and spectral entropy (VIP = 1.24, b = –2.21). In addition, a four-component PLSR model explained 79% of the variance (R²adj = .74) in classifying Chinese music. The most influential predictors were pulse clarity (VIP = 1.76, b = –2.46), event density (VIP = 1.45, b = –.48), and roughness (VIP = 1.27, b = 9.23).
For classification response time, a six-component PLSR model explained 82% of the variance (R²adj = .76) in classifying Western music. The most influential predictors were spectral entropy (VIP = 1.40, b = –2.45), pulse clarity (VIP = 1.59, b = –1.31), tonal novelty (VIP = 1.23, b = .65), and RMS (VIP = 1.19, b = –3.27). In addition, a six-component PLSR model explained 75% of the variance (R²adj = .67) in classifying Chinese music. The most influential predictors were pulse clarity (VIP = 1.66, b = .93), RMS (VIP = 1.61, b = –3.10), attack time (VIP = 1.34, b = –8.73), tonal novelty (VIP = 1.35, b = –8.96), and roughness (VIP = 1.24, b = –28.72).
For familiarity, a three-component PLSR model explained 42% of the variance (R²adj = .33) in Western music. The most influential predictors were pulse clarity (VIP = 1.79, b = 1.85), attack time (VIP = 1.64, b = 5.15), event density (VIP = 1.40, b = –.19), and tonal novelty (VIP = 1.38, b = –9.54). In addition, a three-component PLSR model explained 72% of the variance (R²adj = .67) in Chinese music. The most influential predictors were tonal novelty (VIP = 1.40, b = 3.45), spectral novelty (VIP = 1.24, b = 2.92), and spectral entropy (VIP = 1.08, b = –.60).
Figure 3 illustrates significant shared and culture-specific predictors of cultural classification. Timbral features (roughness) were shared predictors of classification accuracy; rhythmic features (pulse clarity), dynamics (RMS energy), and tonal novelty were shared predictors of response times; tonal novelty was a shared predictor of familiarity (Panel A). From a culture-specific perspective, in Western music, spectral entropy (a timbral feature) and spectral novelty were predictors of accuracy, and spectral entropy was a predictor of response time (Panel B). Rhythmic features (pulse clarity, attack time, event density) were predictors of familiarity. For Chinese music, pulse clarity and event density predicted classification accuracy (Panel C). Attack time and timbral roughness predicted response time, while spectral entropy and spectral novelty predicted familiarity. Together, these findings reveal a nuanced interplay between acoustic features and cognitive processes in cultural classification: particular timbral, rhythmic, dynamic, and novelty-related features operate as both shared and culture-specific predictors.

Musical features as shared and culture-specific predictors.
Discussion
We investigated how Western listeners classified 10-s excerpts of Western and Chinese music by measuring cultural classification accuracy, sensitivity, response bias, reaction time, and familiarity to infer underlying cognitive processes. We also examined musical features that predicted classification outcomes. As expected, Western listeners were more familiar with Western music than Chinese music, leading to: (1) higher accuracy for classifying Western music, particularly in high-arousal music (agitated, happy); (2) high sensitivity but a liberal response bias to classify music as Western music, particularly in high-arousal music; and (3) faster responses to Chinese music, particularly in low-arousal music (calm, sad).
Overall, Western listeners showed greater accuracy in classifying Western music than Chinese music. Although listeners could distinguish Western from Chinese music, they tended to misclassify unfamiliar Chinese music as Western, particularly in high-arousal music, where heuristic judgements were more likely. Faster response times for Chinese music, especially for low-arousal Chinese music (calm, sad), indicate a reliance on heuristic processing driven by salient, recognisable acoustic cues. For Western music, the decision-making process listeners employed was dependent on the music’s level of arousal: low-arousal music prompted slower, thoughtful analytical responses (System 2), whereas high-arousal music elicited faster, intuitive heuristic responses (System 1). Timbral features (e.g., roughness) predicted classification accuracy, rhythmic and dynamic features (pulse clarity and RMS) predicted response times, and a tonal feature (tonal novelty) predicted familiarity across both cultures. Sensitivity to these music-specific and culture-specific cues impacted listeners’ ability to recognise cultural origins, affecting accuracy and response times through heuristic or analytical decision processes. Familiarity judgements for Chinese music relied on specific timbral features (e.g., Chinese-sounding instrumentation), while familiarity with Western music was supported by rhythmic cues but overall was less feature-dependent, likely reflecting the increased role of enculturation. Misclassification of Chinese music as Western music often occurred when psychophysical features like rhythm or dynamics overrode culturally distinctive elements. We will now introduce a novel framework to help understand and explain these findings.
A Framework of Cultural Classification
To better understand the cognitive processes in cultural classification, Figure 4 illustrates a framework that links music engagement, musical features, cognitive mechanisms, and sociocultural influences to classification outcomes. This framework provides an organised model for interpreting how listeners respond to musical stimuli based on arousal and feature recognition, and how these processes influence classification outcomes. It includes four layers: music engagement, feature predictors, cognitive and sociocultural influences, and classification outcomes.

A framework of cultural classification.
In the first layer, music engagement refers to Western listeners’ engagement with both Western classical and Chinese musical excerpts. Each excerpt was intended to convey one of four emotions, comprising two high-arousal emotions (agitation and happiness) and two low-arousal emotions (calmness and sadness). These emotional qualities influence how listeners attend to acoustic features and activate cognitive mechanisms (System 1 or System 2) when making classification decisions.
The second and third layers show how acoustic features and cognitive–sociocultural influences shape processing styles and biases. Acoustic features such as dynamics, rhythm, timbre, and novelty feed into cognitive mechanisms associated with signal detection and dual-process theories. Signal detection theory quantifies listeners’ sensitivity and response biases, while dual-system theory distinguishes between heuristic (System 1) and analytical (System 2) processing. Sociocultural influences such as enculturation, prior exposure, and stereotypes further bias these processes and expectations.
The last layer, classification outcomes, comprises accuracy, sensitivity (d′), response bias (c), and response time. Sensitivity underpins accuracy, response bias reflects systematic cultural tendencies (e.g., a bias towards classifying ambiguous excerpts as Western), and processing modes influence response time.
Cultural Familiarity and Classification Accuracy, Sensitivity, and Response Bias
Consistent with prior research on emotion recognition (Elfenbein & Ambady, 2002; Laukka et al., 2013), the present study has now shown that Western listeners’ familiarity with Western music contributes to greater accuracy in classifying the cultural origins of the music. Indeed, enculturation not only influences cross-cultural emotion recognition, but also cultural music classification. Familiarity with Western musical cues (e.g., scales, rhythms) offers cognitive advantages reflecting an in-culture bias (Laukka et al., 2013; Thompson & Balkwill, 2010). Specifically, Western listeners were more adept at distinguishing Western and Chinese music by attending to both psychophysical and culturally specific cues (Balkwill & Thompson, 1999).
Familiarity affected both sensitivity and response bias in classification judgements under conditions of uncertainty. High-arousal Western music (agitated, happy) increased classification accuracy but reduced sensitivity in distinguishing between Western and Chinese music, as listeners often misclassified high-arousal Chinese music as Western. This liberal response bias suggests that universal, high-arousal features (e.g., fast-tempo) overshadow culturally distinctive cues, reducing sensitivity to cultural distinctions. In particular, Chinese agitated music was perceived with higher intensity and higher recognition accuracy than Western agitated music (Li et al., 2025), yet these same excerpts were the most difficult to classify by cultural origin in the current study. In contrast, low-arousal Western music (calm, sad) exhibited more balanced sensitivity and fewer classification errors, as listeners could focus more on culturally specific details (Thompson & Balkwill, 2010). The heightened energy of high-arousal music may further amplify this bias, leading to overconfidence in classifying Western music (Juslin & Västfjäll, 2008; Thompson & Balkwill, 2010).
Classification Response Time and Decision-Making
Western listeners classified Chinese music faster, especially low-arousal music (calm, sad), whereas response times for high-arousal music (agitated, happy) were fast and did not differ significantly between the two cultures. This challenges the typical familiarity bias, which assumes that familiar music would be processed more quickly (Elfenbein & Ambady, 2003; Filipic et al., 2010). This discrepancy may arise because previous studies focused on emotional recognition rather than cultural classification. Importantly, differences in response times may reflect not only faster decision-making processes but also the speed with which certain musical features are processed. Indeed, our acoustic analyses showed that rhythmic cues predicted response times in the classification task.
Our results align with dual-system decision-making theory (Kahneman, 2003). For unfamiliar music, System 1 thinking likely drove rapid classification based on heuristics such as recognition and availability, using salient cues to make quick decisions. For example, when exposed to unfamiliar Chinese music, Western listeners likely relied on salient timbres and instruments, bypassing detailed analysis. This adoption of heuristic judgement was supported by Belfi et al. (2018), who found similar responses in judgements of unfamiliar music.
System 2 thinking may be activated when listeners engage with familiar music, resulting in deliberate and analytical processing. The EBA theory (Tversky, 1972) suggests that listeners evaluate multiple aspects of familiar music systematically, leading to slower yet more accurate classification. This analytical approach increases cognitive load, which is reflected in longer judgement times (Malekmohammadi et al., 2023; Pereira et al., 2011). Specific emotions can also shape judgement and choices (Lerner & Keltner, 2000); for example, sad music may trigger introspection and calm music may induce relaxation, potentially explaining the longer response times for familiar low-arousal Western music. Although high-arousal Western music resulted in greater classification accuracy, response times for high-arousal Western and Chinese music were similar. This similarity suggests that dominant psychophysical cues, such as fast-tempo – an isomorphic feature of the heightened energy associated with happy emotion – drive quick judgements regardless of cultural origin. While our discussion draws on dual-systems theory, we acknowledge that such accounts have been debated as potentially oversimplified (e.g., Stanovich & West, 2000). Nevertheless, the System 1/System 2 framework provides a useful heuristic for understanding differences in response times across music cultures.
The Influence of Musical Features
Some musical features acted as universal or shared cues, such as timbral roughness, rhythmic pulse clarity, dynamic energy, and tonal novelty, while others were more culture-specific, including dynamic intensity, rhythmic complexity (attack time and event density), timbral cues (spectral entropy), and spectral novelty. Shared features helped Western listeners recognise similarities across cultures, facilitating judgements of unfamiliar genres (Laukka et al., 2013; Thompson & Balkwill, 2010). At the same time, culture-specific cues support a more nuanced evaluation of both familiar and unfamiliar music. For example, in high-arousal Western music, familiarity with rhythmic patterns may have reduced sensitivity to culture-specific cues, leading to greater response bias and quick, heuristic-based decisions (Gigerenzer & Goldstein, 2011; Pleskac, 2007). In contrast, low-arousal music triggered more balanced judgements and slower, deliberate processing. However, Chinese music posed challenges due to its timbral roughness, which diverged from Western listeners’ expectations and reduced sensitivity, thereby complicating accurate classification. Yet, these same unfamiliar cues also drove fast, intuitive judgements.
Our findings show that acoustic features explained substantial variance in classification accuracy, response times, and familiarity across cultures. Accuracy and familiarity judgements for Chinese music were more strongly predicted by features, suggesting that listeners relied on salient cues when processing unfamiliar music. In contrast, response times for Western music were more strongly predicted by features, indicating that enculturation supported more systematic, feature-based decision speeds for familiar music. This pattern suggests heuristic cue reliance when evaluating unfamiliar music and more analytical feature integration when evaluating familiar music.
The absence of significant effects for pitch and tonal predictors (e.g., key clarity and mode) suggests that Western listeners’ sensitivity to these cues may be ingrained and less dependent on explicit acoustic variation when classifying music across cultures. In contrast, multiple features such as rhythmic complexity, timbral instability, and high energy levels created uncertainty and lowered familiarity ratings for Chinese music. This reflects a reliance on distinctive cues when encountering unfamiliar musical styles, which led to heuristic decision-making.
Implications and Limitations
This study is among the first to investigate the role of musical features in cultural classification, advancing knowledge of cross-cultural music perception and cognition. It illustrates how universal and culture-specific musical features interact with cognitive processes to influence Western listeners’ classification of music’s cultural origin. Emotional and cultural cues shaped classification decisions, with listeners alternating between heuristic and analytical thinking depending on the inherent musical features.
These findings extend past research investigating cross-cultural emotion recognition (Balkwill et al., 2004; Laukka et al., 2013; Wang et al., 2022) by examining classification accuracy, response time, and cultural familiarity, rather than focusing solely on emotional recognition. Our findings also support the CRM (Balkwill & Thompson, 1999), showing that psychophysical and culture-specific cues that underpin emotion recognition are also important predictors of cultural classification. However, the findings extend this model by demonstrating that the same broad attribute (e.g., timbre, rhythm) can function either as a psychophysical cue or a culture-specific cue. For example, within the attribute of timbre, acoustic roughness may function as a psychophysical cue, whereas more contextually grounded aspects of timbre, like the instrument sound of a Chinese erhu, often carry culture-specific significance. Our findings also refine dual-system decision-making theory (Kahneman, 2003) by showing that, in cultural classification, decision-making strategies (System 1 and System 2) are influenced by the emotional quality expressed by the music, specific musical features, and familiarity with the musical tradition.
One limitation of this study was the absence of a maximum cutoff time for responses. We mitigated this by excluding trials with extreme reaction times but suggest future research could explore a gating paradigm (Belfi et al., 2018) to better understand response strategies. It could be argued that the binary classification task does not fully capture “uncertain situations,” given that participants could rely on their familiarity with Western music to infer that unfamiliar items must be Chinese. However, our data do not support this interpretation: misclassification rates, response times, and signal detection analyses indicate that discrimination was not trivially easy, and that errors in discrimination reflected systematic cultural bias. The purposeful task order, with classification preceding emotion ratings, may influence emotion judgements by the speed with which participants were able to infer the expressed emotion. Future research could examine whether task order affects both emotion perception and classification responses. A further limitation is that the hypotheses were not pre-registered, which increases the risk of hindsight bias. Including Chinese listeners could strengthen cross-cultural comparisons. However, the widespread influence of Western music makes it challenging to find participants with limited exposure (Fritz et al., 2009; Thompson et al., 2019). Additionally, this study featured only two bowed-string instruments (the Western violin and Chinese erhu). Future research could use a wider range of musical instruments and larger music samples to better understand cross-cultural categorisation and the development of cultural stereotypes (Macrae et al., 1996; Tajfel, 1969). Future studies could also compare cultural origin classification between cultures with the classification of different genres within a culture to clarify both shared and distinct cognitive processes involved in music categorisation.
Reaction time patterns were consistent with heuristic decision-making processes, particularly for unfamiliar stimuli. However, reaction time may also reflect properties of the stimuli (e.g., tempo differences in low-arousal excerpts), along with familiarity effects. Thus, it is important to acknowledge that reaction time alone does not provide a direct or conclusive measure of the cognitive mechanisms engaged (Draheim et al., 2019). To confidently characterise the underlying decision-making processes, future research could supplement the current evidence with converging approaches such as computational modelling, process-tracing methods (e.g., mouse-tracking of response trajectories) and neurophysiological indices (e.g., EEG/ERP markers of processing stages). Finally, future research may benefit from incorporating subjective ratings of perceived music features (e.g., Balkwill et al., 2004) alongside the objective acoustic predictors used in this study.
Conclusion
This study investigated the cognitive processes in cultural classification of music, illustrating how features of music can shape cultural classification and how familiarity through enculturation or prior exposure influences sensitivity to cultural and emotional cues (Thompson et al., 2023). Enhanced cultural sensitivity can enrich appreciation for diverse musical traditions and reduce cultural bias (Chilvers et al., 2023; Harwood, 2017; Li et al., 2023). Indeed, exposure to unfamiliar musical cultures fosters cultural empathy and broadens cultural understanding (Li et al., 2023). Such exposure can also inform the development of culturally resonant interventions, thereby enhancing the efficacy of music therapy and education (Crooke et al., 2024).
Supplemental Material
Supplemental material, sj-docx-1-msx-10.1177_10298649261442428 for Cognitive Processes in the Cultural Classification of Music by Marjorie G. Li, Kirk N. Olsen and William Forde Thompson in Musicae Scientiae
Footnotes
Acknowledgements
We thank all the expert musicians for validating the music excerpts.
Ethical Considerations
The research presented in the article was approved by the Macquarie University Human Research Ethics Committee (HREC Reference 520231435649832) and informed consent was obtained from all participants.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Our work was funded by a Discovery Project grant awarded to Professor William Forde Thompson by the Australian Research Council (grant number DP190102978) and a Macquarie University Government Funded Research Training Program Scholarship awarded to the first author (allocation number 20224177).
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
Supplemental Material
Supplemental material for this article is available online.
References
