Abstract
This study presents a Japanese translation of the Goldsmiths Musical Sophistication Index (Gold-MSI). The index consists of 38 self-report questions and provides a general sophistication score as well as subscale scores for Active Engagement, Perceptual Abilities, Musical Training, Singing Abilities, and Emotions. The validation of the translation with 689 native Japanese speakers indicated excellent internal consistency and test–retest reliability. Confirmatory factor analysis revealed that the bifactor model structure formulated by the original study of the Gold-MSI is reasonably well maintained in our data. The strengths of the Gold-MSI self-report inventory are that (1) it offers a multifaceted view of musical sophistication, (2) a subset of the five subscales can be used to measure different aspects of musical sophistication independently, and (3) it is easy to administer because it is a self-report questionnaire. Given that this inventory and its translations increasingly contribute to research on musical expertise, skills, and abilities, having a Japanese translation may enhance future research in these areas even further.
An important predictor variable in much of music research is an individual’s musicality, a complex concept that is associated with various aspects of musical abilities, such as musical skills, aptitude, and musicianship. Traditionally, researchers have adopted various criteria to approximate this complex construct. For example, many studies determined individuals’ musicianship by using the extent of their musical training or asking if they had been admitted to a music conservatory (e.g., Kolinsky et al., 2009). Although these estimates have been helpful, their thresholds and criteria vary considerably among laboratories, making it challenging to compare data on musical expertise across studies. For example, while many studies use 10 years as a cut-off point of musicianship (e.g., Nazari et al., 2018; Rammsayer & Altenmüller, 2006; Wolpert, 2000), many others may consider their participants musicians if they have 4–5 years of experience (e.g., Aksentijevic et al., 2014; Jimenez & Kuusi, 2018; Panagiotidi & Samartzi, 2013; Sadakata & Sekiyama, 2011; Schiavio & Timmers, 2016; Weijkamp & Sadakata, 2017; Yu et al., 2016) and, notably, some studies included 4, 5, and 6 years of experience as an upper cut-off point for defining non-musicians (Collier & Collier, 2007; Hansen et al., 2013; Madison & Merker, 2005; Olsen et al., 2018; Radvansky et al., 1995; Repp et al., 2012; Wolpert, 2000). Therefore, depending on the study, the same person with about 4–5 years of musical experience could be classified as a musician or non-musician. Besides, conceptualizing musicality with reference to the number of years of training fails to capture the full scope of musical experiences in which people engage. 
An increasingly common way to assess musicality is to use batteries of musical listening tests, such as the Profile of Music Perception Skills (Law & Zentner, 2012), the Musical Ear Test (Wallentin et al., 2010), and the Swedish Musical Discrimination test (Ullén et al., 2014), to name but a few. Listening tests commonly address different musical skills such as the perception of pitch differences, pitch contours, melody, chord progression, and rhythm. While listening tests provide more objective measures of musical abilities, the number of specific skills they can measure is usually limited and their administration might require specific material or a local or online testing platform. In contrast, self-report questionnaires can cover a wider range of different aspects of expertise (e.g., not only listening abilities) and are easier to administer.
Although existing batteries consider various dimensions of musical perceptual and performance skills, the question remains whether these measures genuinely reflect the individual’s musical aptitude or musicality. Musicality is a concept whose multifaceted nature (cf. Hallam, 2010, 2016) makes it difficult to offer a simple definition. Gembris (1997) describes three different phases in the history of research on musical ability. The initial investigation of musical ability started in the 1800s, mainly focusing on phenomenological aspects of musical listening and performance (e.g., Michaelis et al. as cited in Gembris, 1997). In a second phase during the 20th century, research focused on the development of psychometric measures of musical abilities in the form of test batteries for musical aptitude or talent (e.g., Galton, 1883; Gordon, 1967; Seashore et al., 1960; Wing, 1962). A more recent approach in the late 1980s focused on the cognitive understanding of musical meaning (Sloboda, 1993; Swanwick & Tillman, 1986). However, the discussion of what constitutes musicality continues to the present day. How accurately one can recognize a melody or a rhythm is undoubtedly a part of this concept, but recent views are increasingly incorporating a broader range of skills. For example, Chin and Rickard (2010) advocate taking into account musical engagement aside from specialized skills in performing and perceiving music, which they define as involving “advanced or frequent listening experiences, a commitment to music activities, or a high level of participation in a particular type of musical activity (e.g., social or emotional)” (p. 198). In accordance with such suggestions, more measurement instruments have started to include assessments of various aspects of individual differences in musicality. The Goldsmiths Musical Sophistication Index (Gold-MSI) is one of the most popular (Müllensiefen et al., 2014). 
Müllensiefen and colleagues coined the term musical sophistication, a more neutral and inclusive term that sheds light on observable behaviors, compared to other terms that relate more to an individual’s musical potential, such as talent and aptitude. The index consists of 38 self-report questions grouped into the following five subscales: Active Engagement, Perceptual Abilities, Musical Training, Singing Abilities, and Emotions. This set of questions was refined from 153 statements that the authors had extracted from existing studies through a rigorous selection and evaluation procedure. The index was validated with a large number of mainly English-speaking participants and has demonstrated good internal reliability for each of the five subscales and the general sophistication index, high test–retest reliability, and reliable correlation with a wide range of objective listening ability tests (e.g., Gelding et al., 2021; Harrison, Collins, & Müllensiefen, 2017; Harrison & Müllensiefen, 2018; Larrouy et al., 2019; Lee & Müllensiefen, 2020; MacGregor & Müllensiefen, 2019; Müllensiefen et al., 2014).
The Gold-MSI has been widely used since it was introduced in 2014 and has been translated into several languages, such as Traditional Chinese (Lin et al., 2021), Simplified Chinese (Li et al., 2021), Portuguese (Lima et al., 2020), German (Schaal et al., 2014), and French (Degrave & Dedonder, 2019). Evaluations of these translations show high internal consistency (Degrave & Dedonder, 2019; Lima et al., 2020; Lin et al., 2021; Schaal et al., 2014) and high test–retest reliability (Lima et al., 2020; Lin et al., 2021). In general, the validation data collected by various studies show a satisfactory-to-good fit with the bifactor model structure suggested by the original Gold-MSI (Degrave & Dedonder, 2019; Lima et al., 2020; Lin et al., 2021; Schaal et al., 2014). These demonstrate that the structure and the set of questions put forward to capture musical sophistication by the original Gold-MSI hold for other cultures and languages as well.
Some batteries have been developed for testing individuals’ musical abilities (i.e., perceptual and performance skills and interpretation of musical pieces) in Japanese, such as Onkenshiki Youji no Ongaku Tekisei Tesuto (Onken Musical Aptitude Test for Young Children, Onken style, Ongaku Shinri Kenkyu Sho, 1969), Takenshiki Ongaku Soshitsu Tesuto (Taken Musical Aptitude Test, Motegi et al., 1959), and New Musical Aptitude Test (Ogawa, Murao, & Mang, 2008). It is interesting to note that many of these batteries are designed to measure the musical abilities of children. Although not many non-Japanese standardized measures have been translated into Japanese so far, a few are available, such as the Bentley Measure of Musical Abilities (Furuichi & Umemoto, 1975), a battery measuring children’s musical abilities. For measuring music-related traits in adults, the Profile of Music Perception Skills (Law & Zentner, 2012) and the Harvard Beat Assessment Test (focusing more on rhythmic aspects, Fujii & Schlaug, 2013) are both available with Japanese instructions. Furthermore, individual studies (e.g., Satoh et al., 2011; Zhang et al., 2020) commonly adopt existing perceptual tests with specially translated Japanese instructions, such as the Montreal Battery of Evaluation of Amusia (Peretz et al., 2003), the Musical Aptitude Profile (Gordon, 1967, 1995), the Musical Ear Test (Wallentin et al., 2010), and the Seashore Measure of Musical Talents (Seashore et al., 1960).
Given the current situation, having a Japanese version of the Gold-MSI would facilitate future work within and across research communities. For self-report questionnaires such as the Gold-MSI, a validated translation of instructions and question items is crucial. Thus, the goal of the current study was to translate and validate the Japanese version of the Gold-MSI, hereafter Gold-MSI-J. We report the evaluation of internal reliability, test–retest reliability, inter-factor correlations, confirmatory factor analyses for model comparisons, as well as correlations between the Gold-MSI-J and important demographic variables. We discuss similarities and differences between the validation results of our version of the Gold-MSI and the original English version, as well as other translated versions.
Translation
Two native Japanese speakers fluent in English and familiar with neuropsychological questionnaires created two independent Japanese translations of the Gold-MSI. The two translators then merged their individual translations into a single version. Another two independent Japanese native speakers checked the translation and adjusted its fluency and readability. Finally, one bilingual speaker of English and Japanese translated the Japanese questionnaire back into English. Differences between the back translation and the original Gold-MSI were resolved unanimously. Whenever necessary, translations were adjusted to capture the conceptual meaning rather than the literal expression of the question. The Gold-MSI-J can be downloaded from https://shiny.gold-msi.org/.
Methods
Participants
A total of 718 participants took part in the online survey. Among them, we discarded the data of 14 participants because they did not want to be included in the study, 8 because they filled in the questionnaire more than once (excluding the invited retest subgroup), 5 because Japanese was not their native language, and 2 because they gave markedly inconsistent answers to differently phrased versions of the same question, which were included to catch cursory responses (difference greater than 4 scale points). This resulted in a final sample of 689 participants (female = 282, male = 396, other = 5, undisclosed gender = 6). While many of our participants were under 29 years old, we tried to recruit other age groups as much as possible to increase diversity and create an appropriate sample. Participants reported their age in categories rather than as an exact value (coded as 1 for “under 20,” 2 for “20–29,” 3 for “30–39,” etc.; mean age category M = 2.58). The age distribution and the distribution of the expected highest educational qualifications are illustrated in Figure 1. Because the questions on educational qualifications were optional, not all participants responded to them (see Tables 4 and 5 for the sample size). A question regarding socioeconomic status was put to a subset of participants (n = 148, see Table 5 for details).
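The exclusion steps described above can be illustrated with a minimal sketch. The exclusion counts are taken from the text; the example records, the 1–7 response scale, and the function names are assumptions made for illustration and do not reproduce the authors' actual screening procedure.

```python
# Hypothetical sketch of the screening logic; not the authors' code.

MAX_DIFFERENCE = 4  # from the text: exclude when the gap is greater than 4 scale points


def is_inconsistent(answer_a, answer_b, threshold=MAX_DIFFERENCE):
    """Return True when two answers to equivalent questions differ by more than threshold."""
    return abs(answer_a - answer_b) > threshold


def screen(responses):
    """Split (participant_id, answer_a, answer_b) records into kept and excluded IDs."""
    kept, excluded = [], []
    for pid, a, b in responses:
        (excluded if is_inconsistent(a, b) else kept).append(pid)
    return kept, excluded


# Invented example records, assuming a 1-7 response scale.
example = [("p01", 6, 5), ("p02", 1, 7), ("p03", 2, 6), ("p04", 7, 2)]
kept, excluded = screen(example)

# Tally of the exclusions reported in the text.
recruited = 718
exclusions = {
    "declined inclusion": 14,
    "multiple submissions": 8,
    "non-native speaker": 5,
    "inconsistent answers": 2,
}
final_sample = recruited - sum(exclusions.values())

print(kept, excluded, final_sample)
```

Note that a difference of exactly 4 points is retained in this sketch, because the text specifies a difference greater than 4.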

Figure 1. Distribution of age and expected highest educational qualifications.
Retest participants
From among the 689 participants, 23 took the survey again, approximately 3 weeks after their first response (mean duration M = 21.1 days, SD = 1.75). All retest participants were university students (undergraduate degree) and aged between 20 and 29 years old at the time of testing.
Data collection
Data collection took place during different periods from October 2018 to May 2021. Participants were recruited through institutional and social networks. Multiple web survey platforms were used to collect the data (Google Forms n = 85, Qualtrics n = 456, CrowdWorks n = 148). A total of 150 participants who took part in the survey via CrowdWorks received a small monetary compensation for their contributions (220 yen = approximately 1.7 euros). Due to a technical error, 239 participants answered the question “Ability to Accompany Novel Tune” twice instead of “Judge Others’ Singing Ability.” The answers to the missing question were estimated by two methods: full-information maximum likelihood (FIML) estimation, as provided by the lavaan package (v0.6-9; Rosseel, 2012), for fitting the models, and multiple imputation (10 imputations using all other variables) in IBM SPSS (v.28.0.1.0) for computing the other statistics reported in Tables 1, 2, and 4. The research was approved by the ethics committee of the Faculty of Humanities of the University of Amsterdam.
Table 1. Summary statistics and indicators of reliable analyses for the Japanese version of the Gold-MSI subscales and general Musical Sophistication scale.
Note: SD: standard deviation; theoretical max.: theoretical scale maximum; Sk.: skewness; Kur.: kurtosis; rtt: test–retest correlation; FG: General Musical Sophistication.
Table 2. Inter-factor correlations between the five subscales of the Gold-MSI (Pearson).
Note: Reported correlations are all highly significant (Bonferroni corrected p < .001).
Table 3. Summary results of confirmatory factor analyses comparing four different structural models.
Note: AIC: Akaike information criterion; BIC: Bayesian information criterion; CFI: Bentler’s comparative fit index; TLI: Tucker–Lewis index; RMSEA: root mean square error of approximation; SRMR: standardized root mean square residual. R: lavaan package (Rosseel, 2012).
Table 4. Associations between the Gold-MSI-J and age, gender, socioeconomic status, and education.
Note: n varies for different items because not all participants provided corresponding answers (no answers or other). The analysis of gender excluded 11 responses that were either “other” (n = 5) or “do not want to share” (n = 6). Coding used for gender: females = 0 (n = 282), males = 1 (n = 396). SES: socioeconomic status (income); Education: highest education qualification expected.
*p < .05, **p < .01 (Bonferroni corrected).
Results
Table 1 shows the summary statistics and reliability indicators of the Gold-MSI-J subscales and general musical sophistication factor. The Shapiro–Wilk test confirmed that the scores for each subscale and the general musical sophistication factor were normally distributed. In general, measures of internal consistency and test–retest reliability were high. The test–retest correlation for the subscale Emotions was somewhat lower but still within the acceptable range, which mirrors the results of existing versions of the Gold-MSI (e.g., Lin et al., 2021). The inter-factor correlations are shown in Table 2. The order of low to high correlation coefficients in Table 2 is comparable to that of the German (Schaal et al., 2014) and French (Degrave & Dedonder, 2019) translations of the Gold-MSI.
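As an illustration of two of the reliability indicators mentioned above, the following minimal sketch computes Cronbach’s α for a set of items and a Pearson test–retest correlation. The data are invented and the function names are assumptions; the study itself computed these statistics in SPSS, so this is not the authors' procedure.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for rows of item scores (one row per respondent)."""
    k = len(rows[0])  # number of items
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (ss_x * ss_y) ** 0.5


# Invented responses: 5 respondents x 4 items, assuming a 1-7 scale.
item_scores = [
    [5, 6, 5, 6],
    [2, 3, 2, 2],
    [7, 6, 7, 7],
    [4, 4, 3, 4],
    [3, 2, 3, 3],
]
alpha = cronbach_alpha(item_scores)

# Invented retest sum scores for the same 5 respondents.
first_scores = [sum(row) for row in item_scores]
retest_scores = [21, 10, 26, 16, 12]
r_tt = pearson_r(first_scores, retest_scores)

print(round(alpha, 3), round(r_tt, 3))
```

Values of α close to 1 indicate that the items of a subscale covary strongly; a high test–retest correlation indicates that respondents keep a similar rank order across the two sessions.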
Subsequently, we investigated whether the bifactor model accounts best for the Japanese questionnaire data, just as in the original study (Müllensiefen et al., 2014) and the Traditional Chinese translation of the Gold-MSI (Lin et al., 2021). Here, four models are compared using confirmatory factor analyses (lavaan package, v0.6-9; Rosseel, 2012). Model 1 is a hierarchical model that assumes an influence of the general factor on the five factors as well as the influence of these five factors on individual items associated with each. Model 2 is a bifactor model that assumes that each item is influenced by both the general factor and one of the five factors. Models 3 and 4 do not assume a general factor but define five factors, each associated with individual items. While Model 3 does not allow factors to correlate, Model 4 allows for factor intercorrelations. Table 3 summarizes the model fit of each model.
Both the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) indicated that the bifactor model showed the best fit among the four models compared. Root mean square error of approximation (RMSEA) and standardized root mean square residual (SRMR) were somewhat higher than in previous studies but still indicate a reasonable fit. Bentler’s comparative fit index (CFI) and Tucker–Lewis index (TLI) were relatively low. However, as Lin et al. (2021) argue, it is hard to achieve values over .90 for a model with many factors, each having many associated items, and indeed, the Chinese version of the Gold-MSI shows similar CFI and TLI values. Overall, our results confirmed that the bifactor structure suggested by the original Gold-MSI outperformed alternative models. Figure 2 presents the factor structure and standardized parameters of the bifactor model (Model 2) for the Gold-MSI-J. Please note that the general factor of the original Gold-MSI consists of 18 items, whereas we included all 38 items, because our 18 items with the highest standardized factor loadings did not match those of the English version (such discrepancies are commonly present in other translations of the Gold-MSI). Because all loadings were significant, we decided to keep all of them in the model.
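To make the information-criterion comparison concrete, the following sketch applies the standard definitions AIC = 2k − 2 ln L and BIC = k ln(n) − 2 ln L, where k is the number of free parameters, n the sample size, and ln L the maximized log-likelihood; lower values indicate a better trade-off between fit and complexity. The sample size is taken from the text, but the log-likelihoods and parameter counts below are invented placeholders, not the values reported in Table 3.

```python
import math


def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_lik


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * log_lik


def best_models(fits, n_obs):
    """Return the model names with the lowest AIC and the lowest BIC."""
    by_aic = min(fits, key=lambda name: aic(*fits[name]))
    by_bic = min(fits, key=lambda name: bic(*fits[name], n_obs))
    return by_aic, by_bic


N_OBS = 689  # final sample size reported in the text

# Hypothetical (log-likelihood, number of free parameters) per model.
hypothetical_fits = {
    "Model 1 (hierarchical)": (-41000.0, 81),
    "Model 2 (bifactor)": (-40700.0, 114),
    "Model 3 (uncorrelated factors)": (-42500.0, 76),
    "Model 4 (correlated factors)": (-40950.0, 86),
}

print(best_models(hypothetical_fits, N_OBS))
```

Because BIC penalizes each additional parameter by ln(n) rather than 2, it favors simpler models more strongly than AIC once n exceeds about 7; agreement between the two criteria, as reported above, is therefore a reassuring sign.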

Figure 2. The bifactor model and standardized parameters, applied to the validation data of the Gold-MSI-J.
Table 4 presents the summary of the association between demographic information, socioeconomic status, and Gold-MSI-J scores. We used Spearman’s ρ, a non-parametric correlation measure, to measure associations between our discrete variables. The age of participants was significantly positively correlated with Perceptual Abilities. Mann–Whitney U tests indicated that scores on three subscales of the Gold-MSI-J differed significantly between males and females: Perceptual Abilities, Musical Training, and Singing Abilities. In all cases, females had significantly higher average scores than males (see also Table 5). There were no significant correlations between socioeconomic status and Gold-MSI-J scores. However, because the number of individuals who provided information about their socioeconomic status was rather small, further validation with a larger sample would be desirable before drawing conclusions.
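The rank-based statistics named above can be sketched in a few lines. This is a minimal illustration with invented data and assumed function names, not the authors' analysis (which was run in SPSS): it computes Spearman’s ρ as the Pearson correlation of tie-averaged ranks, the Mann–Whitney U statistic for the first group, and Bonferroni-adjusted p-values. It does not compute p-values for ρ or U, and it does not handle constant inputs.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    ss_x = sum((a - mx) ** 2 for a in rx)
    ss_y = sum((b - my) ** 2 for b in ry)
    return cov / (ss_x * ss_y) ** 0.5


def mann_whitney_u(x, y):
    """U statistic of sample x against sample y, from the joint ranking."""
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[: len(x)])
    return rank_sum_x - len(x) * (len(x) + 1) / 2


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Invented data: age category codes and subscale scores for 6 people.
age_category = [1, 2, 2, 3, 4, 5]
subscale_score = [30, 34, 31, 40, 42, 45]
print(spearman_rho(age_category, subscale_score))
print(mann_whitney_u([41, 45, 39], [33, 36, 40]))
print(bonferroni([0.01, 0.04, 0.5]))
```

The Bonferroni step is what the table notes refer to as "Bonferroni corrected": each raw p-value is compared with the significance threshold only after multiplication by the number of tests in the family.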
Table 5. Breakdown of statistics for the five subscale scores and general sophistication scores for age, gender, socioeconomic status, and education.
Note: SD: standard deviation; SES: socioeconomic status.
1 million yen = about 7,400 euros. Only participants from CrowdWorks answered the SES questions.
Discussion
This article presents the development and evaluation of the Gold-MSI-J. After a careful translation process, we collected valid responses from 689 Japanese native speakers. The model comparisons confirmed that the internal bifactor structure of the original questionnaire is maintained in the Japanese Gold-MSI. Our results also replicated the high internal consistency (McDonald’s ω, Cronbach’s α, Guttman’s λ6, and inter-factor correlation) and test–retest reliability of the subscales of the questionnaire.
While the relative score structure is reliable and comparable with the original and other translated versions of the Gold-MSI, the comparison of absolute scores seems to show some cultural differences. For example, the original and Traditional Chinese versions demonstrate higher average scores on the five subscales and the general musical sophistication scale than the German, Portuguese, and French versions (see Supplemental Table 1). Absolute average scores of the subscales of the Gold-MSI-J tend to be on the lower side compared to other translated versions. It is hard to identify the reasons for such subscale score variability among translated versions because it may stem from nuances in translation but also from cultural characteristics or differences in sample composition. With regard to the Japanese results, the modest scores may be partially related to characteristics of Japanese culture, which values modesty and shows relatively low self-enhancing motivations (e.g., Heine & Renshaw, 2002; Markus & Kitayama, 1991). Administering objective musical perception and performance tests is an excellent way to identify whether musical abilities are indeed lower in samples from the general Japanese population than in comparable samples from other countries or whether other factors are contributing to the observed empirical differences. In any case, as Lin et al. (2021) rightly state, direct comparisons of absolute self-report scores across different countries and cultures may not be appropriate due to such potential cross-cultural differences.
The ways we think about musical sophistication are changing continuously. Around the world, there is an increasing awareness that defining musicality in terms of a traditionally used single scale such as years of instrumental training falls short of representing the manifold ways in which people engage with music. It is therefore imperative that researchers around the globe are able to use psychometrically valid tools to measure musical sophistication in its full breadth, not least to encourage cross-cultural studies and to facilitate the comparison of research outputs beyond national and linguistic boundaries. Translating, adapting, and further validating already existing test batteries and self-report inventories, as we have done with the Gold-MSI-J, provide an important step toward reaching these goals.
Supplemental Material
Supplemental material, sj-docx-1-msx-10.1177_10298649221110089 for The Japanese translation of the Gold-MSI: Adaptation and validation of the self-report questionnaire of musical sophistication by Makiko Sadakata, Yasumasa Yamaguchi, Chie Ohsawa, Masaki Matsubara, Hiroko Terasawa, Andres von Schnehen, Daniel Müllensiefen and Kaoru Sekiyama in Musicae Scientiae
Footnotes
Acknowledgements
We are grateful to Atsuko Takashima and Maki Suzuki for their support in the translation process. We are also grateful to the people who contributed to the data collection.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.
