Abstract
The Goldsmiths Musical Sophistication Index has been translated into several European languages. In the East Asian area, a traditional Chinese language translation is available. Due to differences in written characters and language use in various Chinese regions, a translation using simplified Chinese would reach a wider audience in mainland China and other regions. Our study, therefore, aimed to validate the simplified Chinese version of the Gold-MSI (Gold-MSI-SC) to replicate psychometric properties and factor structures of the Gold-MSI and to investigate the relationship between socioeconomic status (SES) and factors of the Gold-MSI-SC in a mainland Chinese sample (N = 64,555). Following the translation guidelines for intercultural research, the Gold-MSI-SC self-report questionnaire, two music listening tests, and the Musical-Rhythmic Intelligence subscale (M-RI) were included in the main study together with the demographic and SES-related questions. All subscales of the Gold-MSI-SC showed high internal consistency (Cronbach’s α = [.80–.91]) and good test–retest reliability (rtt = [.842–.935]). Confirmatory factor analysis revealed that the original bi-factor structure was replicated with satisfactory fit (root mean square error of approximation (RMSEA) = .053 and comparative fit index (CFI) = .888). Correlations between the Gold-MSI-SC and the music tests, as well as the M-RI, demonstrated strong convergent and discriminant validity; structural equation models revealed negative relationships between age and the Gold-MSI factors, while SES positively correlated with all of the subscales. The Gold-MSI-SC has thus been shown to be a reliable tool in assessing multidimensional musical behaviors in simplified Chinese and in supporting the measurability of musical sophistication in different cultures.
Interacting with music is considered to be a universal phenomenon in the human world. Every known culture has developed specific ways of integrating music as an essential component in rituals, social practices, and everyday routines, and the capacity for taking part in such activities is a property of being human (Blacking, 1973; Wallin et al., 2001). In the anthropological and sociological context, the concept of musicality is a “statistical universal” as every person bears the necessary predispositions to engage in musical activities (Trehub et al., 2015, p. 2). However, conceptualizations and implications of musicality might differ when it comes to psychological questions. The term musicality currently reflects a Western perspective on the production and perception of music as a subjective cognitive process, while music, as an objective phenomenon, varies in its style and structure. Those with a growing interest in a biological basis for the human potential to music (Small, 1999) have emphasized a distinction between music as a social and cultural practice and musicality as “a spontaneous developing set of traits based on and constrained by our cognitive abilities and their underlying biology” (Honing, 2018, p. 51). In other words, music, in all its vast diversity, is fundamentally built on musicality.
From musicality to musical sophistication
Modern research on musicality, understood as a set of various abilities necessary for producing and perceiving music, can be traced back to the early 19th century. Following Gembris (1997), approaches to defining musicality can be categorized into three different phases that accentuate phenomenological (early phase) and psychometric aspects (second phase) in the 19th and early 20th century before focusing on musical meaning in the late 20th century (third phase). From these different approaches, various terms have emerged, such as musical talent, musical intelligence, musical aptitude, musical achievement, musical ability, musical competence, and musicality, that are defined and used inconsistently (Gagné & McPherson, 2016, pp. 33–34).
Early attempts to define musicality were closely linked to the theories of musical aesthetics and taste. For example, Michaelis (1805) described various abilities required for recognizing and appreciating “good” music, and Billroth’s (1895) understanding of being musical closely followed the aesthetics of Hanslick (Gembris, 1997; Hanslick, 1854, p. 19). In the 19th century, the concept of musicality was defined as being a combination of highly developed perceptual skills and musical judgment and thus had an evaluative nature.
In the 20th century, the focus shifted to the productive aspects of engaging with music and included, for example, the distinction between receptive and productive musical abilities (Müllensiefen & Hemming, 2018, p. 95). Furthermore, the idea of discerning musical talent as a non-hierarchical set of sensory, motor, associational, and affective abilities was put forward by Seashore (1915) who then developed a standardized measure to select qualified applicants for conservatoires (Seashore, 1919, 1938). Serving the interests of music education, the majority of standardized tests were developed for school children and adolescents, while the development of standardized tests for adults has been a recent trend since the beginning of the 21st century (Müllensiefen & Hemming, 2018, pp. 104–109; Zentner & Gingras, 2019, pp. 251–253).
The common conceptualizations of the 20th century were influenced by the nature versus nurture debate in psychology and thus implicitly emphasized either heritable characteristics or learnable skills. In contrast, several attempts have been made in recent years to resolve the nature versus nurture debate in the theories of musicality. In particular, Hallam and Prince (2003) introduced a multifaceted conception of musical ability that involves “aural skills, receptive activities, generative activities, the integration of a range of skills, personal qualities, and the extent to which it is learned” (p. 2). In line with this definition, Sloboda (2008) and Levitin (2012) suggested that musicality is a polymorphic concept that integrates physical, emotional, cognitive, and psychosocial factors and includes many different subcomponents. The understanding of musicality and musical abilities has become multidimensional, encompassing learnable skills (e.g., playing musical instruments and singing), various activities of engagement with music (e.g., listening to and gathering information about music) as well as personal and motivational factors (e.g., interest in and commitment to music; Hallam & Prince, 2003). The introduction of the term musical sophistication in the context of standardized testing (Müllensiefen et al., 2014; Ollen, 2006) represents the most recent step in the conceptualization of musicality that considers musical activities to be a general feature of human cultural practices beyond any particular aesthetic ideals or musical genres.
Furthermore, given that musical activities are grounded in various cultural settings, such as music technology, spaces and places of musical encounters, and social practices and rituals, musical socialization always depends on the accessibility of and participation in musical activities (Sloboda, 2008). Therefore, standardized tests should take into consideration the variety of social, political, religious, technological conditions, and processes that form musical cultures.
Research in the fields of music sociology and ethnomusicology has shown that the phenomena of globalization have led to the integration of music cultures from across the world (Yoon, 2018; Zhao, 2011). This highlights the need for cross-cultural research in music psychology that aims to specify culture-specific concepts of musicality by revealing their inner variety (Jacoby et al., 2020). This also calls for conceptualizations that take into account various forms of engaging with music, such as making music, listening to music, gathering information about music, and being able to enjoy music emotionally, as well as cognitive processes such as perceiving rhythm and recognizing melodies. The concept of musical sophistication as posited by Müllensiefen et al. (2014) appears to be adequate to capture such changes because it does not promote particular musical aesthetics but refers to “musical skills, expertise, achievements, and related behaviours across a range of facets” (Müllensiefen et al., 2014, p. 2). These multiple facets are represented using a bi-factor structure comprising five subscales (Active Engagement, Perceptual Abilities, Musical Training, Singing Abilities, Emotions) and a general subscale (General Musical Sophistication). Compared to other measurement tools developed in recent years, the Goldsmiths Musical Sophistication Index (Gold-MSI; Müllensiefen et al., 2014) is unique in that it is designed for people who are not necessarily musically trained, which means it allows for the capture and analysis of individual differences in musical sophistication across a broader spectrum. Furthermore, it has been translated into several languages, including German (Schaal et al., 2014), French (Degrave & Dedonder, 2019), Portuguese (Lima et al., 2020), traditional Chinese (Lin et al., 2021), Japanese (Sadakata et al., 2022), Danish, and Russian (both language versions are available at https://shiny.gold-msi.org/gmsiconfigurator/).
As such, it could stimulate future work within and across research communities and encourage the emergence of more cross-cultural research.
Traditional Chinese and Simplified Chinese
As a first study adapting the concept of musical sophistication to the culture of eastern Asia, Lin et al. (2021) translated the Gold-MSI into traditional Chinese and validated it with a sample from the Taiwanese region. Results showed that the original factor structure could be confirmed, yet Lin et al. (2021) noted that the outreach of the questionnaire could be widened by using simplified Chinese, a written form that is used in mainland China (Lin et al., 2021, p. 247). The simplified Chinese characters were developed in the 1950s by the Script Reform Committee of China to improve the country’s literacy rate. Since 1956, simplified Chinese has gradually become the written standard in mainland China (DeFrancis, 1986), and it is also in official use in countries such as Singapore and Malaysia. Learning traditional Chinese characters is not part of elementary education in mainland China. Simplified Chinese is based on the traditional Chinese characters: some with simple structures were adopted, and more complex ones were simplified in their structure and stroke order. For the simplification, some characters were newly created, and, in total, simplified Chinese has fewer characters than traditional Chinese. Furthermore, in simplified Chinese some characters are used to represent words that share the same pronunciation but have different meanings (Bökset, 2006). Such semantic ambiguities may cause serious problems in understanding items in a written self-report questionnaire, in which psychometric properties largely rely on the understanding of fine semantic differences (van de Vijver & Leung, 2011).
In this regard, a simplified Chinese version of the Gold-MSI (Gold-MSI-SC) can reach a much broader target group (e.g., mainland China and other Chinese-speaking regions and countries) and introduce the concept of musical sophistication derived from up-to-date research with relevance for music psychology and music education to the Chinese-speaking research community. It also enables further cross-cultural research to include another sample from Asia, compensating for the bias arising from research based solely on samples of Western cultures (Berry, 2013; Stevens & Byron, 2016).
Concepts and interests in Chinese research on musicality
Regarding the psychological and sociological issues of musical abilities and skills, it has been observed that “recent developments regarding the assessment of musical ability have barely influenced the Chinese-speaking research community, despite the growing interest in research on musical development and music psychology” (Lin et al., 2021, p. 228). To explore the standardized tests that are used in the Chinese research literature, we searched for internationally published articles on musicality in mainland China and for research articles on musicality published in simplified Chinese in mainland China.
In internationally published Chinese research, musicality appears to be an aesthetic concept and is of interest in the fields of music education and politics (Wang, 2021; Wong, 2020). We also searched for research literature on the largest online academic database in mainland China, the China National Knowledge Infrastructure (CNKI, www.cnki.net), using keywords such as musical ability, musical aptitude, musical talent, musical performance, musical achievement, and musical test. Since 1996 CNKI has been the distribution channel for academic electronic resources in China with a digital publishing platform that integrates all types of academic resources including Chinese and international language academic journals, dissertations, conference papers, newspapers, patents, standards, yearbooks, tools, books, and so forth. Our search revealed that there is much discussion of musical ability in Chinese research, which has served the educational purposes of developing and measuring the musical ability of students in the 9-year period of compulsory education and identifying musically gifted students suitable for music conservatories. Other studies have offered theoretical foundations for creating guidelines for applied testing or screening exams (Ma, 2021; Sun & He, 2015).
In the field of Chinese music psychology research, diverse dimensions of musicality have been explored, and interdisciplinary research has emerged. For example, studies investigating the influence of musical ability on prosodic production and language acquisition have been published (Pei & Ding, 2013). However, in terms of standardized tests of musicality, only one systematic review was found (Jiang, 2005); this introduced several Western measurement tools for measuring music ability to the Chinese research community. Few non-Chinese standardized measures have been translated into simplified Chinese so far. Our search found that the standardized test most often used in Chinese music education research is the Advanced Measures of Music Audiation (AMMA; Gordon, 1989), the foundations of which are currently under intensive discussion and review in Western music education (Hanson, 2019; Platz et al., 2022). While the AMMA and more recently developed tests such as the Musical Ear Test (Wallentin et al., 2010), Profile of Music Perception Skills (Law & Zentner, 2012), and the Swedish Music Discrimination Test (Ullén et al., 2014) focus on perception abilities, the Gold-MSI also includes a variety of musical behaviors that are of interest for music education and music psychology alike. The concept of musical sophistication might be culture-independent, so that the Gold-MSI could be used to help improve communication between research communities in the West and the East. Furthermore, the Gold-MSI-SC can reach a broader population and thus reduce cultural bias toward Western cultures in current music psychology research. Such a translation would be beneficial for the Chinese research community, as this test represents a standardized measurement tool that can be used internationally.
Aims of the study
We aimed to validate the Gold-MSI in simplified Chinese language (Gold-MSI-SC), to examine its psychometric properties, including its factor structure, and to investigate the relationship between socioeconomic status (SES) and factors of the Gold-MSI-SC in a mainland Chinese sample. Accordingly, we posed three research questions: (1) How reliable and valid is the Gold-MSI-SC in a mainland Chinese sample? (2) Can the factor structure of the Gold-MSI be replicated in the Gold-MSI-SC using a mainland Chinese sample? (3) Are there significant correlations between SES and musical sophistication as measured by the Gold-MSI-SC?
Method
Translation and adaptation of the Gold-MSI inventory
The Gold-MSI-SC and the instructions for the Melody Memory Task and the Beat Alignment Perception Task (both parts of the original Gold-MSI test battery) were constructed following the International Test Commission (ITC) guidelines for translating and adapting tests (ITC, 2017) and the guidelines for intercultural psychological instrument development (Tran et al., 2017). The gender of those involved in the development of the translation (including translators, experts, and participants in cognitive interviews) was balanced, as the guidelines for intercultural psychological instrument development (Tran et al., 2017) suggest that gender is an important factor influencing language expressions. This process covered nine steps in total (see Figure 1).

Figure 1. Procedure for translation and adaptation of the Gold-MSI Inventory from English into simplified Chinese.
Table 1. Cross-cultural translation evaluation matrix.
Note: Language clarity: evaluate the use of words and syntax of the translated item and its back-translation; Appropriateness: determine whether the translated items are culturally appropriate in both language and meaning for the target population; Difficulty: determine whether the translated items are difficult for prospective respondents or participants to understand and to respond to; Relevance: determine whether the translated items are culturally relevant to the participants’ experiences in real-life situations. Derived from Tran et al. (2017, p. 31).
Measurements
Gold-MSI self-report questionnaire
The Gold-MSI questionnaire measures self-reported musical sophistication. It consists of 38 items in total; of those, 31 statements are rated on a 7-point Likert-type scale, and seven questions use ordered response categories. Following a bi-factor structure, the Gold-MSI comprises five subscales and one General Musical Sophistication subscale. The general subscale consists of the 18 items, out of the original 38, with the highest loadings on the general factor (Müllensiefen et al., 2014). This factor structure has been replicated by several studies with samples from different countries (Degrave & Dedonder, 2019; Lima et al., 2020; Lin et al., 2021; Schaal et al., 2014), all of which have demonstrated that the reliability of the subscales ranges from sufficient to very good (Table 2).
Table 2. Overview of the Gold-MSI self-report inventory and its internal consistency (Cronbach’s α) in six language versions.
Note: n = number of items. Cronbach’s α values are derived from: en = English (Müllensiefen et al., 2014, N = 147,633), de = German (Schaal et al., 2014, N = 641), pt = Portuguese (Lima et al., 2020, N = 408), fr = French (Degrave & Dedonder, 2019, N = 750), zh-TW = Traditional Chinese (Lin et al., 2021, N = 1108), ja = Japanese (Sadakata et al., 2022, N = 689).
Gold-MSI music tests
We applied two music tests from the Gold-MSI test battery to investigate how musical skills are associated with musical behaviors. The Melody Memory Task contains 13 pairs of melodies. Each melody is between 10 and 17 notes in length and lasts 4–9 s. The participants are asked to identify whether the two consecutively played melodies demonstrate the same pitch structure despite transposition. In addition, participants indicate the confidence rating of their judgment on a three-level scale. The Beat Alignment Perception Task contains 17 music excerpts with a beeping click-track. Participants are asked to determine whether the beeps are on or off the beat and give a three-level confidence rating.
Musical-Rhythmic Intelligence questionnaire
The Musical-Rhythmic Intelligence questionnaire (M-RI) is a subscale of the Eight Multiple Intelligences Questionnaire (Chou, 2006). It contains six items to be rated on a 10-point rating scale to assess musical abilities such as composing, singing harmonies, playing, and tuning musical instruments. The ratings result in a sum score, and the subscale shows a high internal consistency (Cronbach’s α = .93). In our study, the Musical-Rhythmic Intelligence questionnaire was used to examine the convergent and discriminant validity of the Gold-MSI-SC.
SES
SES is a theoretical construct, which is usually discerned by looking at the highest educational level reached, occupational prestige, and income. National educational systems may differ in their levels, making it difficult to compare degrees and statuses. In order to adjust the necessary criteria appropriately to the prevailing situation in mainland China, the UNESCO Classification Guidelines (UNESCO Institute for Statistics, 2012, 2015) were supplemented by categories from the Education Overview of China (Ministry of Education of the People’s Republic of China, 2019). This resulted in the following eight categories to determine the educational level of the main income earner: unfinished primary education, primary school, lower secondary school, upper secondary school (including secondary vocational schools), vocational college (2 or 3 years of post-secondary school education), bachelor’s degree (4 or 5 years), master’s degree (3 years), and doctorate and higher. The occupational prestige of the main income earner was measured with the occupational prestige class questionnaire of Li (2005); this distinguished 12 categories subsequently classified in seven levels from the lowest (e.g., porter) to the highest status (e.g., senior administration officials). In addition, income per household was classified in 10 stages ranging from less than 2,000 Yuan (Stage 1) to more than 32,000 Yuan (Stage 10), based on the currently available data from the Chinese National Bureau of Statistics.
Main study
The main study was conceived as an online survey using the platform SoSci Survey (Leiner, 2019), containing the Gold-MSI-SC self-report questionnaire, the Melody Memory Task, the Beat Alignment Perception Task, the Musical-Rhythmic Intelligence questionnaire (Chou, 2006), and socio-demographic questions pertaining to SES. No ethical review was required as this study was carried out as research for a Master’s thesis and closely followed the procedures of previous Gold-MSI translation studies for which no ethical concerns were raised.
After signing the consent form, participants gave information on demographic and SES variables including gender, age, current place of residence, nationality, main place of residence before the age of 18, level of education, occupational prestige, and monthly household income. Furthermore, they were asked whether they had completed, or would be completing, a music degree, and whether they had been or were engaged in a music-related profession. Afterwards, participants completed the Gold-MSI-SC self-report questionnaire and the Musical-Rhythmic Intelligence subscale of the Eight Multiple Intelligences Questionnaire. Finally, the music tests were presented in randomized order, starting either with the Melody Memory Task or the Beat Alignment Perception Task. It was possible to finish only one of the music tests. The whole survey took approximately 25–30 min. Upon completion of the survey, participants received feedback on their Gold-MSI-SC scores (General Musical Sophistication subscale) and their performance in the music tests (number of correct answers). Furthermore, participants were invited to participate in a retest after 14 days with the same procedure but without having to provide demographic information.
Participants
Participants were recruited through convenience sampling. The link to the survey was distributed via scholars, colleagues, and friends as well as Chinese social media and generated considerable interest. Initially, 128,916 participants were recruited, mainly via social media. After applying several criteria assuring data quality, we formed a main sample with a total of 64,555 participants (see Data Preparation and Analysis); of those, 2,003 participants participated in the retest. The main sample consisted of 39,843 females (61.72%), 24,240 males (37.55%), and 472 participants (0.73%) who did not identify with the previous two options or did not want to disclose their gender. The average age was 22.39 years (SD = 5.07), with a range from 13 to 56 years, and 95% of the participants were aged between 13 and 32 years. As in the study by Lin et al. (2021), the sample consisted of both adolescents and adults, 19% of all participants being aged between 13 and 18 years (see Figure S1 for age distribution in supplementary material). The SES-related descriptive statistics (Table 3) showed that more than half of the participants had completed a bachelor’s degree (54.3%) or a secondary education degree (86.8%); the occupations of a large number of participants corresponded to the higher prestige class levels (39.2% for the upper class and 73.9% in the middle-upper class). Most of the participants did not have or were not in the process of taking a music degree (92.3%), nor did they have work experience in music-related fields of activity (87.26%). In terms of musical background, the sample was thus suitable for measuring musical sophistication in the general population.
Table 3. Socioeconomic variables: descriptive statistics.
Note: ¥ = Chinese Yuan.
Data preparation
The data preparation procedure was carried out in two steps using Python (Van Rossum & Drake, 2009) and R (R Core Team, 2017), respectively, as shown in Figure 2. Data pre-processing was carried out using the Python package Pandas (McKinney, 2011). From the 128,916 initial participants, we excluded participants who had not finished the Gold-MSI-SC self-report, the Musical-Rhythmic Intelligence questionnaire, and at least one of the Gold-MSI music tests (n = 66,192). Participants who were not of Chinese nationality (n = 583) or had missing demographic information (n = 150) were also excluded.

Figure 2. Data-preparation process.
To further eliminate invalid cases, we calculated the response variance of every participant on the Gold-MSI-SC and the median absolute deviation (MAD) of these response variances, and then subtracted the doubled MAD from the median of the response variance. Cases (n = 202) with a response variance less than this value were discarded (Leys et al., 2013). In addition, invalid cases were identified based on demographic and socioeconomic information. Here we applied two methods, density-based spatial clustering of applications with noise (DBSCAN) and Mahalanobis distance. The overlapping outliers identified by both methods (n = 510) were also screened out. Finally, participants younger than 13 years old (n = 192) were excluded, leaving 64,555 cases in the final sample. The same procedure was applied to the retest sample, resulting in a sample of 2,003 valid cases.
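The variance-based screening step can be illustrated with a minimal Python sketch. This is not the preprocessing script used in the study: the function names, the toy data, and the `consistency` parameter are assumptions made for illustration only. With `consistency=1.0` the cutoff is the median minus the doubled raw MAD, as described in the text; Leys et al. (2013) also discuss multiplying the MAD by 1.4826 under normality, which can be passed in if desired.

```python
from statistics import median, variance


def low_variance_cutoff(variances, k=2.0, consistency=1.0):
    """Return median - k * MAD of the per-participant response variances."""
    med = median(variances)
    # MAD: median of the absolute deviations from the median
    mad = consistency * median(abs(v - med) for v in variances)
    return med - k * mad


def flag_low_variance(responses, k=2.0, consistency=1.0):
    """Flag participants whose response variance lies below the cutoff.

    `responses` holds one list of item ratings per participant
    (hypothetical layout, not the study's data format).
    """
    variances = [variance(row) for row in responses]
    cutoff = low_variance_cutoff(variances, k, consistency)
    return [v < cutoff for v in variances]


# Toy data: five varied response patterns and one straight-liner
toy = [
    [1, 7, 1, 7],
    [2, 6, 2, 6],
    [1, 5, 3, 7],
    [3, 5, 2, 6],
    [1, 6, 2, 7],
    [4, 4, 4, 4],
]
flags = flag_low_variance(toy)
```

In this toy example only the last participant, who gave the same rating to every item, falls below the cutoff. Whether a given straight-liner is flagged depends on the spread of variances in the sample and on the chosen scaling.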
Data analysis
Data analyses were performed with R using the integrated development environment RStudio (RStudio Team, 2016). The analysis consisted of four parts: (1) the factor structure of the Gold-MSI, (2) the relationship between the SES variables, age, and the Gold-MSI-SC, (3) reliability, and (4) validity.
Descriptive statistics were calculated using the moments package (Komsta & Novomestky, 2015). The factor structure of the Gold-MSI-SC was determined by comparing the four models presented by Müllensiefen et al. (2014). Structural equation modeling (SEM) was conducted using the lavaan package (Rosseel, 2012). The pattern of the missing values was examined using the R-package MissMech (Jamshidian et al., 2014), and the assumption of multivariate normality was tested using the R-package MVN (Korkmaz et al., 2014). Because the data did not meet the assumption of multivariate normality and missing values were present, we applied the full information maximum likelihood (FIML) method to estimate the parameters of the structural equation models, while reporting the robust statistics. Five indices were used to assess the model fit: RMSEA, SRMR, CFI, TLI, and BIC (Beaujean, 2014). The SES variable with three indicators (educational level, occupational prestige, and income) was defined as a formative construct and the measurement error of SES was set to zero for the purpose of model identification (Korkmaz et al., 2014). Non-significant paths (p > .05) were removed from the structural models, and only the final models were chosen for presentation. To determine the reliability of Gold-MSI-SC, we examined the internal consistency and the test–retest correlations. The psych package (Revelle, 2017) was used to extract five different indices of internal consistency, including Cronbach’s alpha (Cronbach, 1951), McDonald’s omega (Zinbarg et al., 2005), Guttman’s lambda 6 (Guttman, 1945), inter-item correlation, and mean item-total correlation. Pearson correlation coefficients were reported to assess test–retest reliability, convergent validity, and discriminant validity.
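For readers less familiar with the reliability indices, the following Python sketch shows the textbook formulas for Cronbach’s alpha and the Pearson correlation used for test–retest reliability. It is an illustration only, not the psych package implementation used in the study; the function names and the toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of participants, each a list of k item scores."""
    k = len(scores[0])
    # Variance of each item across participants
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Variance of the participants' sum scores
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Pearson correlation, e.g. between test and retest subscale scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Toy data: three participants, two items
toy_items = [[1, 2], [2, 1], [3, 3]]
alpha = cronbach_alpha(toy_items)

# Toy test and retest scores for four participants
r_tt = pearson_r([10, 12, 15, 20], [11, 12, 16, 19])
```

Alpha rises as the items covary more strongly relative to their individual variances, which is why it is read as an index of internal consistency.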
Results
Factor structure of the Gold-MSI-SC inventory
In determining the structure of the Gold-MSI, Müllensiefen et al. (2014) proposed four models differing in their factor structure: a hierarchical model with one general factor and five group factors (Model 1), a bi-factorial variant without correlation between the general factor and the five group factors (Model 2), a model with uncorrelated factors and without general factor (Model 3), and a model allowing five intercorrelated factors without general factor (Model 4). This study compared the fit of the four models by means of confirmatory factor analysis (CFA, see Table 4).
Table 4. Confirmatory factor analysis: fit statistics of the four models.
BIC: Bayesian information criterion; TLI: Tucker–Lewis Index; CFI: Bentler’s comparative fit index; RMSEA: root mean square error of approximation; SRMR: standardized root mean square residual.
The chi-square values for all four models showed a highly significant deviation from an exact model fit. However, the chi-square test is known to be too sensitive with large sample sizes (Steiger, 1990). As a result, the BIC was used to compare the four models’ goodness of fit. The bi-factor model (Model 2) showed the best fit with the lowest BIC value, confirming the findings reported in previous studies (Baker et al., 2020; Degrave & Dedonder, 2019; Lima et al., 2020; Lin et al., 2021; Müllensiefen et al., 2014; Schaal et al., 2014). Moreover, the SRMR (.048) was below the cut-off value of .06, and the RMSEA (.055) was close to the cut-off value of .05. In addition, the TLI (.861) and the CFI (.879) approached the common cut-off value of .90. Taken together, the four fit indices indicated an acceptable fit between the bi-factor model and the observed data.
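As a reminder of how these indices relate to the chi-square statistic, the sketch below gives their common textbook definitions in Python. It is an illustration under stated assumptions: the robust indices reported by lavaan under FIML apply scaling corrections that are not reproduced here, some software uses N rather than N − 1 in the RMSEA, and all input values in the example are hypothetical rather than taken from Table 4.

```python
from math import log, sqrt


def rmsea(chi2, df, n):
    """Root mean square error of approximation (textbook form with N - 1)."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative fit index of a model (m) against the baseline model (b)."""
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)
    return 1.0 if d_b == 0 else 1.0 - d_m / d_b


def tli(chi2_m, df_m, chi2_b, df_b):
    """Tucker-Lewis index of a model (m) against the baseline model (b)."""
    ratio_m = chi2_m / df_m
    ratio_b = chi2_b / df_b
    return (ratio_b - ratio_m) / (ratio_b - 1.0)


def bic(loglik, n_params, n):
    """Bayesian information criterion; lower values indicate the preferred model."""
    return -2.0 * loglik + n_params * log(n)


# Hypothetical values for illustration only
example_rmsea = rmsea(chi2=200.0, df=100, n=101)
example_cfi = cfi(chi2_m=200.0, df_m=100, chi2_b=1100.0, df_b=100)
example_tli = tli(chi2_m=200.0, df_m=100, chi2_b=1100.0, df_b=100)
```

Because the BIC penalizes each additional parameter by ln(N), it remains usable for model comparison in very large samples, where the chi-square test rejects almost any model.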
The factor structure of the Gold-MSI-SC is reported in Figure 3. Although the factor structure reported by Müllensiefen et al. (2014) was replicated in this study, the factor loadings differed from those of other samples (Lin et al., 2021; Müllensiefen et al., 2014).

Figure 3. Factor structure of the simplified Chinese Gold-MSI with standardized factor loadings.
The General Musical Sophistication subscale (FG) was initially formed of 18 items with the highest non-standardized factor loadings across all subscales (Müllensiefen et al., 2014, p. 6). Following this empirical approach, the particular items forming the general factor may vary depending on the respective data. Therefore, the structure of the general factor seems inconsistent across different studies with samples from different cultural regions (Table 5). To enhance the comparability of results between studies, the General Musical Sophistication subscale was calculated using the 18 items proposed by the original study (Müllensiefen et al., 2014).
Table 5. Different subsets of the 18 items with the highest factor loadings in the Gold-MSI.
Note: The 18 items with highest factor loadings that overlapped with Müllensiefen et al. (2014) are in italics.
Reliability and internal consistency of the Gold-MSI-SC
As item analysis did not show any significant improvement in reliability according to Cronbach’s alpha, no item of the Gold-MSI-SC had to be removed, and all items were included in the analyses. All subscales of the Gold-MSI-SC, including the General Musical Sophistication subscale, showed good or excellent reliability (Table 6), implying that the items measured the same construct while containing adequate unique variance (Cohen & Swerdlik, 2009). This is in line with previous studies (Baker et al., 2020; Degrave & Dedonder, 2019; Fiedler & Müllensiefen, 2017; Lima et al., 2020; Lin et al., 2021; Müllensiefen et al., 2014; Schaal et al., 2014). In replication studies in different languages, internal consistency for the Emotions subscale (F5) was the lowest among all the subscales (Lin et al., 2021; Schaal et al., 2014). In this study, the Emotions subscale also had the lowest internal consistency (λ6 = .79) and test–retest reliability (rtt = .795), whereas test–retest correlations for all the other subscales demonstrated good or very good reliability (rtt = .842–.935).
Gold-MSI-SC subscales (F1–F5) and General Musical Sophistication Scale (FG): summary statistics and reliability.
Gold-MSI-SC = simplified Chinese translation of Goldsmiths Musical Sophistication Index; SD = standard deviation; Min. = minimum value of the scale; Max. = maximum value of the scale; rtt = test–retest correlation; n = 2,003 for rtt of the Gold-MSI subscales.
Note: All test–retest correlations were statistically significant (p < .0001).
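For readers who wish to reproduce this type of reliability analysis on their own data, the following Python sketch computes Cronbach’s alpha from an item-score matrix and a test–retest correlation from two sets of scale scores. It is an illustration under stated assumptions: the data are simulated (one latent trait plus independent noise), the function names `cronbach_alpha` and `retest_correlation` are ours, and nothing here reproduces the values in Table 6.

```python
import numpy as np


def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return float(k / (k - 1) * (1.0 - item_variances.sum() / total_variance))


def retest_correlation(first: np.ndarray, second: np.ndarray) -> float:
    """Pearson correlation between scale scores from two administrations."""
    return float(np.corrcoef(first, second)[0, 1])


# Simulated data (assumption): 500 respondents, 6 items, unit trait and noise variance.
rng = np.random.default_rng(0)
n_respondents, n_items = 500, 6
trait = rng.normal(size=(n_respondents, 1))
items_t1 = trait + rng.normal(size=(n_respondents, n_items))
items_t2 = trait + rng.normal(size=(n_respondents, n_items))

alpha = cronbach_alpha(items_t1)
r_tt = retest_correlation(items_t1.sum(axis=1), items_t2.sum(axis=1))
print(f"alpha = {alpha:.3f}, r_tt = {r_tt:.3f}")
```

In this simulation the inter-item correlation is .5 in the population, so alpha should lie close to 6(.5)/(1 + 5(.5)) ≈ .857; real questionnaire data will of course not follow such a simple structure.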
Convergent and discriminant validity of the Gold-MSI-SC
The correlation between the scores of the two music tests was low (r = .182 for the d’ scores), confirming that these tests measure different abilities (Müllensiefen et al., 2014). Table 7 shows the correlations between the Gold-MSI-SC self-report inventory and the two music tests for both the first and the second administration.
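The d’ scores referred to above are sensitivity indices from signal detection theory. As a hedged illustration, the sketch below computes d’ with the standard yes/no formula, z(hit rate) − z(false-alarm rate). The log-linear correction (adding 0.5 to each cell) and the function name `d_prime` are assumptions made for this example; the scoring procedure actually used for the two music tests may differ.

```python
from statistics import NormalDist


def d_prime(hits: int, misses: int, false_alarms: int, correct_rejections: int) -> float:
    """Yes/no d' with a log-linear correction (an assumption of this sketch)."""
    # Adding 0.5 per cell keeps both rates strictly between 0 and 1,
    # so the inverse normal CDF is always defined.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


# Hypothetical response counts for one participant (illustration only).
chance_level = d_prime(hits=10, misses=10, false_alarms=10, correct_rejections=10)
good_listener = d_prime(hits=18, misses=2, false_alarms=2, correct_rejections=18)
print(chance_level, good_listener)
```

A participant who responds at chance obtains d’ = 0, and higher values indicate better discrimination independent of response bias.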
Correlations between the subscales of the Gold-MSI-SC and the Melody Memory Task, Beat Alignment Perception Task, and Musical-Rhythmic Intelligence subscale of the Eight Multiple Intelligences Questionnaire.
Note: All correlation coefficients were statistically significant (p < .0001).
Although the values of Pearson’s r were rather low (r = [.089–.365]), the highest values reached the “upper range of what is usually reported as the correlation between a paper-based self-report measure and actual perceptual or cognitive ability tests” (Müllensiefen et al., 2014, p. 9). On one hand, comparison of the statistics from the first and second tests showed that the correlations between the Gold-MSI-SC and the Melody Memory Task in the data from the second test (r = [.120–.234]) were slightly higher than those from the first test (r = [.089–.183]). On the other hand, the correlations between the Gold-MSI-SC and the Beat Alignment Perception Task remained almost unchanged in both tests (r = [.188–.359] in the first test compared to r = [.183–.365] in the second test). The effect sizes for the Beat Alignment Perception Task were overall greater than those for the Melody Memory Task, which is in accordance with the results of Müllensiefen et al. (2014).
Furthermore, the subscales Musical Training (F3), Perceptual Abilities (F2), Singing Abilities (F4) as well as the General Musical Sophistication (FG) demonstrated stronger relationships with both music tests than the Active Engagement (F1) and Emotions (F5) subscales. These results were also in line with previous investigations (Lin et al., 2021; Müllensiefen et al., 2014) and can be seen as evidence for the convergent validity of the inventory.
In addition, the items in the subscales Emotions (F5) and Active Engagement (F1) mainly focus on emotional and motivational aspects of musical behaviors and do not necessarily address musical perception or skills. The lower correlations between those two subscales and the music tests support the discriminant validity of the Gold-MSI-SC, and they also confirm that both subscales assess different aspects of interacting with music.
Finally, convergent and discriminant validity was also observed in the correlations between the subscales of the Gold-MSI-SC and the M-RI. The M-RI measures self-reported musical abilities derived from music theory and musical training. Correlations between the M-RI and Musical Training (F3), Perceptual Abilities (F2), Singing Abilities (F4), and General Musical Sophistication (FG) were stronger than those between the M-RI and Active Engagement (F1) and Emotions (F5). This confirmed the convergent and discriminant validity of the Gold-MSI-SC.
The relationship between musical sophistication and SES
Two structural equation models were built to investigate the relationship between musical sophistication and SES. The first structural equation model showed a very good fit (see Figure 4). Age was negatively correlated with all the aspects of musical sophistication, while all SES indicators were positively influenced by age. In addition, the latent variable SES had significant positive associations with all five subscales of the Gold-MSI-SC. The effect sizes were weak to medium (standardized path coefficients ranging from .128 to .291). The strongest association was found between Musical Training and SES, whereas the associations between SES and the four other subscales were comparatively weaker.

Structural equation model relating SES to the five factors of the Gold-MSI-SC and age.
The second model, presented in Figure 5, had a good fit (RMSEA ⩽ .001, SRMR < .001, TLI = 1.000 and CFI = 1.000). Age was negatively correlated with General Musical Sophistication; conversely, SES was positively correlated with General Musical Sophistication with a weak effect (standardized path coefficient = .213). To ensure consistency in the estimation of SES across the two formative models (Diamantopoulos et al., 2008), we applied further restrictions for model specification: the unstandardized path coefficients from Education, Occupation, and Income in Figure 5 were constrained to be the same as those in Figure 4.

Structural equation model relating Age, SES, and General Musical Sophistication.
Discussion
Reliability and validity of the Gold-MSI-SC
This study presents the development and validation of the Gold-MSI-SC. Different psychometric measures (Cronbach’s α, Guttman’s λ6, McDonald’s ω, inter-item correlation, and mean item-total correlation) demonstrated good to excellent internal consistency for the individual subscales and the General Musical Sophistication scale of the Gold-MSI-SC. In terms of Cronbach’s alpha, for example, all the subscales had values of at least .80 (α = [.80–.86]), while excellent internal consistency was found for the General Musical Sophistication subscale (Cronbach’s α = .91). The examination of test–retest reliability indicated high stability of this measure over time. The test–retest correlations of all the subscales demonstrated good to very good reliability (rtt = [.842–.935]) except Emotions (rtt = .795). No item had to be removed from the Gold-MSI-SC.
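Of the measures listed above, Guttman’s λ6 is perhaps the least familiar. It is defined as one minus the ratio of the summed residual variances (each item regressed on all remaining items) to the variance of the total score. The sketch below computes it from an item covariance matrix, using the identity that the residual variance of item j equals the reciprocal of the j-th diagonal element of the inverse covariance matrix. The function name `guttman_lambda6` and the example matrix (six items with a uniform correlation of .5) are illustrative assumptions, not the study’s data or analysis code.

```python
import numpy as np


def guttman_lambda6(cov: np.ndarray) -> float:
    """Guttman's lambda-6 from an item covariance (or correlation) matrix."""
    precision = np.linalg.inv(cov)
    # Residual variance of each item given all other items.
    residual_variances = 1.0 / np.diag(precision)
    # Variance of the unweighted sum score is the sum of all matrix entries.
    total_variance = cov.sum()
    return float(1.0 - residual_variances.sum() / total_variance)


# Hypothetical correlation matrix: 6 items, all pairwise correlations .5.
n_items, r = 6, 0.5
corr = np.full((n_items, n_items), r)
np.fill_diagonal(corr, 1.0)

lambda6 = guttman_lambda6(corr)
print(f"lambda6 = {lambda6:.3f}")
```

For uncorrelated items the coefficient is zero, since no item can be predicted from the others; it increases as the items become more predictable from one another.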
The convergent and discriminant validity of the Gold-MSI-SC was demonstrated by examining the correlations between the Gold-MSI-SC and the music tests as well as the M-RI subscale of the Eight Multiple Intelligences Questionnaire. The subscales Active Engagement (F1) and Emotions (F5) showed lower correlations with both music tests as well as with the M-RI subscale. These results are not surprising, as these two subscales primarily measure motivation or response to music. They do not necessarily require the musical skills and cognitive abilities that the two music tests and the M-RI subscale are intended to measure, which can be seen as evidence for the discriminant validity of the inventory.
The other three subscales mainly assess music skills and cognitive abilities. The Perceptual Abilities (F2) subscale measures a person’s self-reported sensitivity of perception, while the items of the Musical Training (F3) and Singing Abilities (F4) subscales involve other dimensions of cognitive processes such as melodic and rhythmic memory. Correspondingly, the two music tasks require both the discrimination of two similar melodies and ability to determine whether a beeping sound was on or off the beat. As observed by Müllensiefen et al. (2014, p. 13), “the performance on both tests clearly benefits from the amount of musical training an individual has had.” Therefore, it is reasonable that the correlations were higher between the subscales mentioned above and performance on the music tests. The findings can be seen as evidence for the convergent validity of the inventory.
The original factor structure of the Gold-MSI has been repeatedly confirmed in previous replications (Degrave & Dedonder, 2019; Lima et al., 2020; Lin et al., 2021; Schaal et al., 2014). This study offers further evidence for the robustness of the factor structure of Gold-MSI by means of a CFA. This invariance of factor structure across different samples from different language regions supports the cross-cultural similarity of musical behaviors.
In sum, the results support the conclusion that the self-report inventory and the two music tests were translated and adapted effectively. The Gold-MSI-SC has appropriate reliability and validity as well as a stable factor structure. It could therefore be used to measure musical behavior and abilities in the Chinese-speaking population.
SES and musical sophistication in the Chinese general population
We investigated the relationship between SES, age, and musical sophistication using two structural equation models. On one hand, we found that age was negatively correlated with all but one of the factors of the Gold-MSI-SC (Perceptual Abilities). On the other hand, SES was positively correlated with all the Gold-MSI-SC factors, with the strongest effects in Musical Training, Perceptual Abilities, and General Musical Sophistication (β = .291, β = .184, and β = .213, respectively). The negative association between age and musical sophistication is consistent with research findings on the development of musical preferences, which indicate that young people tend to listen to music more often and in a broader variety of contexts than middle-aged people (Bonneville-Roussy et al., 2013). This effect of age was also apparent in the English, German, and Portuguese samples (Lima et al., 2020; Müllensiefen et al., 2014; Schaal et al., 2014), whereas Lin et al. (2021) reported that age does not influence musical sophistication. Such variation is also found in the relationship between SES and musical sophistication. While positive correlations have been reported in English, German, and traditional Chinese studies, Lima et al. (2020) found a negative correlation between SES and musical sophistication.
The inconsistencies seen in the relationships between SES, age, and musical sophistication cannot yet be attributed to a specific factor, as the SES variables in the different studies were measured using different indicators and the corresponding samples consisted of participants of varying ages. Notably, the potential impact of age-related differences must be considered carefully when interpreting results from samples comprising both adolescents and adults, such as the differences between their responses to questions relating to disposable income. This makes it challenging to determine if these differences stem from cultural variations, study design issues, sampling bias, or other factors. Further studies with standardized measures and more representative samples across cultures are possible solutions.
Limitations of this study
Convenience sampling did not allow us to control for the source and characteristics of the sample. A large number of participants came from the consensus Tier-1 cities in China, which are the most developed areas of the country in terms of economy, politics, and culture, namely Beijing, Shanghai, Guangzhou, and Shenzhen. This resulted in an unbalanced sample in terms of age, education, occupation, and income; respondents were predominantly young with high levels of education and occupational prestige, which may limit the generalizability of our findings to the broader population. This convenience sample was accepted because of the rather open and somewhat exploratory approach of validating a measure with a large, broad sample. In future studies, a more representative sample could be obtained by means of stratified sampling (Särndal et al., 2003) to enable a more focused approach and to reduce possible biases. This would help in calculating percentiles and test norms for the five subscales and the general factor.
Furthermore, this study was administered not in controlled conditions but online, since both the initial study (Müllensiefen et al., 2014) and subsequent replications were administered as online surveys. While Benfield & Szlemko (2006) and Birnbaum (2004) state that online research is subject to limitations and biases relating to the properties of the sample, respondents’ motivation and attention, and resulting effect sizes, Honing & Ladinig (2008) point out that these are not necessarily resolved by conducting the research in laboratory conditions; problems simply manifest themselves in different ways. We endeavored to establish that our final sample was of the highest quality possible by following a strict procedure when preparing the data. In future, it would be beneficial to consider combining online experiments with laboratory experiments as a way of addressing the potential limitations of relying solely on online research.
Further investigation of musical sophistication
First, replicating the factor structure of the Gold-MSI-SC via CFA, as was done in this study, is a rather basic way of addressing cross-cultural invariance. Until the metric invariance of the Gold-MSI-SC is determined (Putnick & Bornstein, 2016), the scores obtained from administering it cannot be directly compared with the scores obtained from administering other language versions. This issue could be investigated in future studies using data sets from different cultures by comparing the variance and covariance matrices of scores obtained from different samples.
Second, the investigation of the structure of musical-listening ability using the music-test batteries of the Gold-MSI has just begun. Pausch et al. (2021) explored the presence of a general factor using scores from four listening tests: the Computerized Adaptive Beat Alignment Perception Task, the Musical Emotion Discrimination Task, the Melodic Discrimination Task, and the Mistuning Perception Task (the first and third representing advances on the listening tests used here). At the time we conducted our study, only the Melody Memory Task and the Beat Alignment Perception Task were available for our use. As further tests assessing different listening abilities are developed, researchers may include these in test batteries to investigate the factor structure of musical sophistication.
Third, in line with previous replications, it appears remarkable that the subscale Emotions (F5) showed the weakest psychometric quality. The content of the subscale could be adjusted in accordance with previous findings in future revisions of the Gold-MSI. For example, music conveys cultural stereotypes (Susino & Schubert, 2019). Thus, it might be worth elaborating on the issue of culture-specific approaches in the conceptualization of musical sophistication.
Moreover, although the original factor structure of the Gold-MSI (Müllensiefen et al., 2014) was replicated in this study, there were differences in the factor loadings of the items of the General Musical Sophistication subscale in different samples (Lin et al., 2021; Müllensiefen et al., 2014; present study), as shown in Table 5. This might indicate a cultural bias. It would be worth investigating possible differences in the factor loadings of the General Musical Sophistication subscale in future replication studies taking a cross-cultural approach. Adapting such an instrument to culture-specific practices, values, and attitudes requires more than statistical testing. A possible way of integrating culturally specific features would be to provide a set of items covering such specificities. In addition, researchers could check for variables such as personality and musical preferences to reveal a possible cultural bias and enhance the instrument’s adaptability.
Musicality, music, and musical sophistication from a cross-cultural perspective
The concept of musical sophistication can be used to contribute to the study of musicality as a biological issue, and music as a social issue (Honing, 2018), by providing a framework for describing musical engagement and the process of skill development. It could also be used to pave the way for comparing the basic cognitive traits that are required for engaging with music across cultures. The translation, replication, and validation of established standardized measures are therefore essential. The present study not only provides the broader Chinese-speaking research community with a widely established measurement instrument, but also responds to the call for increased cross-cultural awareness in music psychology research.
Supplemental Material
Supplemental material, sj-docx-1-msx-10.1177_10298649231183264 for Measuring musical sophistication in the Chinese general population: Validation and replication of the Simplified Chinese Gold-MSI by Jiaxin Li, Hsin-Rui Lin, Anna Wolf and Kai Lothwesen in Musicae Scientiae
Footnotes
Acknowledgements
We should like to express our sincere gratitude to all those who helped us recruit participants, greatly contributing to the success of this study.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.
References