Abstract
The general performance of the human cognitive–motor system tends to deteriorate from the fifth decade onward, resulting in a decline in both accuracy and speed. Nevertheless, many professional musicians continue to perform at the highest level at older ages. Although lifespan psychology and its assumption of coping strategies for handling age-related losses in cognitive–motor performance seem plausible, there is no empirical evidence for the successful application of this strategy to date. Using a within-subjects design, we investigated whether the recordings of professional musicians from earlier (before age 40) and later years (after age 65) can be allocated to the respective age segment by listening. Six pairs of early and late recordings of the same piece by the same performer were identified (instruments: piano, violin, saxophone, trombone; all five musicians aged healthily). In an online experiment, the discrimination sensitivity of N = 200 valid participants for early vs. late career phases was measured using the replicated paired A-Not A design from signal detection theory. The resulting mean sensitivity was d’ = 0.01 (95% CI [−0.09, 0.12]), which indicates a performance at chance level. We conclude that the recordings contained no reliable cues for decreasing performance skills in older musicians and therefore suggest successful compensation for age-related deficits. For the first time, we found empirical perceptual evidence that for some musicians and pieces, early and late recordings could not be distinguished by listening in a blind test. The coping strategies for the handling of age-related losses may point to some underlying reasons for the high-level performance in old age.
For human beings, both the aging of the central nervous system and an age-related decline in sensorimotor and cognitive performance are inevitable. From a neurological perspective, two primary processes underlie this phenomenon. As Bartzokis et al. (2010) reported, the myelin content of the human brain (in particular in the frontal lobe) is characterized by a quadratic lifespan trajectory, which peaks at approximately 39 years of age and decreases with an accelerating trajectory with advancing age. Since myelin integrity and maximum motor speed are highly correlated, this development is highly relevant for potentially age-related changes in instrumental performance. Sensorimotor performance was measured by Bartzokis et al. (2010) in a standardized finger-tapping task (speed tapping across 10 s) with 72 participants ranging between 23 and 80 years of age. Myelin integrity is affected by the degree of frontal lobe white matter. Regression analyses revealed a strong, inverted U-shaped relationship between the two variables (r = .43). However, this relationship was not observed in other brain regions, such as the splenium of the corpus callosum. A second explanation for the age-related decline in cognitive–motor performance was proposed by Taki et al. (2011). Specifically, based on brain imaging of a large sample of 1460 healthy individuals aged between 20 and 69 years, the authors revealed a significant interaction between age, gender, and gray matter volume in many brain regions, with a smaller decline in gray matter volume in females than in males. This pattern of gray matter changes seems to be the fingerprint of neurotypical brain aging, as gray matter volume is strongly related to cognitive performance. This age-related change is highly relevant for instrumental performance. 
This short survey of findings on age-related changes in the sensorimotor system led us to explore whether professional musicians can continue to carry out their profession in old age without audible restrictions, as documented by recordings.
However, more recent research on cognitive aging presents a more nuanced picture. Based on panel data from a longitudinal study, Hanushek et al. (2025) found that the decline in cognitive skills—which is assumed by the cross-sectional standard model of ageing to begin around age 30—should be reconsidered by including interacting variables such as below- or above-average skill use, educational attainment, and gender. For example, no decline in literacy or numeracy skills up to age 60 was observed among individuals in white-collar occupations or those with tertiary education.
In our sample, professional musicians can be compared to the white-collar population, which is characterized by lifelong skill maintenance and continuous skill use. This pattern of usage may contribute to resilience against age-related cognitive decline. Thus, such specialized populations may be considered “cognitive super-agers” (Hanushek et al., 2025, p. 9).
In contrast to the overall age-related deterioration of sensorimotor control, two findings from the field of motor performance illustrate the extremely high demands on time-critical cognitive–motor performance in professional music performance. First, based on the standardized task of scale playing (C major in 16th notes at 120 BPM) on the piano, Jabusch (2006) reported that the regularity in scale performance of professional pianists (as measured by the mean standard deviation of interonset intervals) is 9.5 milliseconds on average, which means an extremely low degree of sensorimotor irregularity. These deviations in timing are so small that most listeners cannot perceive them. Second, although the grip force of the human hand is strong (approximately 30 kg for women and 50 kg for men, decreasing in both groups with age; see Hogrel, 2015; Puh, 2010; Steiber, 2016), violin playing needs only a small fraction of this force; the average bow force used in expert violin performance of a standardized task (depending on the bow-bridge distance, the respective string, and dynamics) is in the range between approximately 10 g for a slow bow speed (5 cm/s) and 30 g for a medium bow speed (15–20 cm/s; Schoonderwaldt, 2009). As Schoonderwaldt and Demoucron (2009) reported, the bow force applied in the performance of a genuine composition (e.g., a piece for violin solo by Bach) rarely exceeds the equivalent of 100 g (all bow forces converted from the physical unit Newton). In other words, violin playing requires extremely efficient and time-critical control of inhibitory processes to limit the hand force. To put it bluntly, for professional instrumental performance, the high demands for cognitive–motor control are nonnegotiable—even at an advanced age—which stands in contrast to the general progressive deterioration of the neural system's efficiency.
However, empirical research on this topic is scarce and often based on case studies of outstanding musicians. In her pioneering work on the lifespan development of professional musicians, Manturzewska (2006) surveyed a sample of 35 outstanding Polish musicians (age range: 25–81 years). The self-reported period of optimal artistic achievement in this sample was between 20 and 65 years for instrumentalists; however, the last concert was given up to age 75, with exceptions continuing concert activities up to 90 years of age. This finding confirms the modern neuroscientific view of the key role of regular musical practice in healthy mental aging (Ferreri et al., 2020; Verghese et al., 2003).
More systematic research on age-related changes in instrumental performance is available for orchestra musicians. Although this population seems to be particularly affected by the age-related decline in performance skills, we must consider that the performance situation of orchestra musicians is characterized by more constraints (e.g., selection of repertoire, tempo changes) than that of soloists and thus cannot be directly compared to that of solo performers. In a large survey conducted by Gembris and Heye (2014) of more than 2,500 orchestra musicians, participants self-evaluated the age-related decline in performance skills as 60% of the agreement until the age of 60. Fewer than 1% agreed that musicians older than 60 years can perform at the highest level. For other musicians, such as conductors who do not use an instrument, research has revealed an ambiguous picture of age-related effects. In a cross-sectional study, Jennen and Gembris (2000) reported that the tempi of 25 recordings of Mozart's opera “Don Giovanni” are correlated with age (range: 29–83 years; r = .56) and that the conductors’ tempi slowed beyond the age of 60 years. However, 21 examined recordings of “The Magic Flute” did not show this relationship (r = .13). From a within-subjects perspective, a few multiple recordings (up to three) of “Don Giovanni” confirmed this slowing-with-age tendency of tempo, whereas “The Magic Flute” was caught in the undertow of a tempo acceleration. The authors concluded that the deceleration hypothesis does not apply as a general effect but instead seems to be superimposed by other influential variables, such as traditions in performance practice or personal styles.
Finally, new light has been shed on the slowing-with-age hypothesis by another longitudinal corpus analysis based on 14,556 songs from popular music published between 1956 and 2020 conducted by Luck and Ansani (2024); for artists aged 20 to 30, the tempi of songs slightly increased, while from artists’ thirties to their eighties, the tempi decreased across their lifespan by approximately 2 beats per minute per decade. This is mainly explained by a degradation in general motor capacity (e.g., movement speed at the instrument).
Although there seems to be strong evidence for the slowing-with-age hypothesis caused by a general decline in performance skills in professional instrumentalists, this assumption is in strict contrast with an impressive number of well-known musicians who have demonstrated successful careers up to the age of 80 years (and sometimes even beyond); for example, the pianist Wilhelm Kempff (1895–1991) gave his last recital in 1981 at age 86 and then retired for health reasons (‘Wilhelm Kempff’, 2024); the piano virtuoso Arthur Rubinstein (1887–1982) retired from the stage in 1976 at age 89 after experiencing a constant loss of sight beginning from 88 years (Rubinstein, 1980, chapter 128); and the jazz-rock guitarist John McLaughlin (b. 1942) still goes on tours and performs at the highest level. This short survey of case studies raises the question of how the opposing views from neuroscience and individual biographies might be resolved.
The theoretical framework for the maintenance of high performance levels comes from the “principle of selective optimization with compensation” (the so-called SOC model) by Baltes and Baltes (1990); the model describes the dynamic interplay between development-oriented plasticity and age-related boundaries of such plasticity (p. 21) as a general process of adaptation. In the domain of music, selection refers to a reduction in repertoire or the selection of less difficult pieces; optimization refers to the activation of reserve capacities, for example, by increasing the amount of practice at the instrument; and compensation refers to a reduction in difficulties in challenging passages by a tempo decrease shortly before such a part, followed by a tempo increase immediately after this passage. This example of adaptive musical behavior is given by Baltes and Baltes (1990, p. 26) and exemplified by the case of the pianist Rubinstein. However, the validity of the SOC model in the domain of music is questionable because the margin for the application of compensatory strategies by variable use of tempo is small and can unexpectedly result in an idiosyncratic performance style. To the best of our knowledge, no empirical evidence for the validity of the SOC model in professional musicians has been provided thus far.
Beyond the scope of surveys and autobiographical reports, experimental research on instrumental performance and skill maintenance at advanced ages is important but exists only to some degree (and only in cross-sectional designs). Krampe and Ericsson (1996) studied general cognitive and motor performance in piano playing and various tapping tasks of different complexity in younger vs. older (older than 60 years) and amateur vs. expert pianists. To summarize, no support was found for the assumption that skill maintenance at the expert level is an automatic consequence of expert skills acquired during one's early 20s. Instead, a continued high level of practice was found to be required to compensate for the normal age-related decline in cognitive–motor resources. If older musicians continue to practice intensely, then their performance level is comparable to that of younger musical experts. This differentiated view of age-graded decreases in fine motor movements over general factor explanations of age-related slowing of cognitive–motor performance is also supported by Krampe's (2002) review.
A very rare case of a large collection of compositions performed by an internationally renowned pianist at a late age is the unique “Magaloff corpus” (Flossmann et al., 2009, 2010). It represents Chopin's entire work for piano solo performed live from memory in six public appearances at age 77 by the pianist Nikita Magaloff on a computerized grand piano. Invaluable insights into the conditions of successful aging and skill maintenance at high age can be obtained from this unique corpus of musical instrument digital interface (MIDI) data. For example, the note error rate (insertions, deletions, or substitutions) was between 1.5% (substitutions) and 3.7% (insertions), which is only slightly above the error rates reported in similar studies with younger pianists. A comparison of Magaloff's recordings of selected pieces with those of 14 other recordings by various pianists revealed high levels of variability but no systematic tempo decrease in Magaloff's late recordings; 12 out of 18 recordings at age 77 even showed a faster tempo, and in some cases, his performance was found to be faster than the youngest of the other performers.
The instrumental performance of older musicians can also be characterized by more holistic concepts, such as “maturity,” “wisdom,” or the “tradition/school” of interpretation. However, literature in the field of music criticism presents an inconsistent picture. For instance, Cooke (1917) uses the term maturity in relation to musical child prodigies only, while the term school is applied to broad categories such as the French or Polish school of piano playing. Mach (1991) discusses age-related performance features solely within the context of the initiation of formal music training. In a content analysis by Alessandri et al. (2015), which examined music critics’ reviews of Beethoven piano sonata recordings from 1934 to 2010, descriptors such as maturity, wisdom, and the school of interpretation were not among the 12 dominant themes used in evaluating recordings. When these terms did appear, they were applied sporadically and unsystematically and did not emerge as independent or significant topics within the evaluation process, as noted in the first author's corpus analysis (E. Alessandri, personal communication, July 25, 2025).
To summarize, while musical performance skills are affected by the neurotypical processes of aging, the consequences for the level of achievement can be compensated for by long-term adaptation strategies. Continued practice seems to play a key role in this process of skill maintenance. Many studies and biographical reports have confirmed that professional musicians successfully use such compensation strategies to perform continuously at a professional level up to their eighth decade. This would be a strong argument for the assumption of a null hypothesis as the outcome of a perceptual discrimination task evaluating the playing of younger vs. older expert musicians. Thus, in this study, we take the listener's perspective and ask for the auditory distinguishability of younger and older musicians in a blind listening test.
Research Question and Hypotheses
The present study takes a perceptual perspective on the potential influences of age-related changes in professional instrumentalists and aims to determine whether listeners can tell from recordings of professional musicians whether the musicians were younger or older at the time of each recording. In the case of successful compensation for age-related deterioration in the sensorimotor system, there should be no perceptible differences between younger and older recordings of the same performer playing the same piece. Furthermore, we are interested in the effect of listeners’ musical experience on the ability to allocate stimuli to the musicians’ age group.
This leads to the following hypotheses being investigated in the current study:
Hypothesis 1: By listening to musical examples in a blinded manner, participants can discriminate between younger and older musicians significantly better than chance.
Hypothesis 2: Musicians have a higher discrimination rate than nonmusicians do.
Hypothesis 3: Musicians have a higher discrimination rate for musical examples played on their instrument than on other instruments.
Hypothesis 4: The discrimination rate correlates with a person's degree of general musical sophistication.
Method
Transparency and Openness
The authors declare that there is no overlap in datasets or participants with previous work. This study was not preregistered. The details of the sample size, data exclusions, manipulations, and all measures are fully reported. All data, analysis code, and materials are available for review. The data were analyzed using signal detection theory implemented in R (R Core Team, 2023), following the approach suggested by Düvel and Kopiez (2022). The code for data preparation and analysis is provided on the Open Science Framework: https://doi.org/10.17605/OSF.IO/Q5RZV.
Design
Although the study was not preregistered, the authors adhered to the standard phases of planning, conducting, and analyzing the data, and the process was administered online via SoSci Survey (https://www.soscisurvey.de/). To investigate the discrimination ability between two groups of stimuli, a replicated paired A-Not A design from signal detection theory (SDT; Green & Swets, 1966; Macmillan & Creelman, 2005) was used. The participants were presented with several pairs of stimuli, consisting of performances by the same musicians at younger and older ages, one at a time. They were then asked to classify each stimulus into one of two age categories (younger or older). Detailed background information regarding the design and its selection can be found in the Supplementary Online Material (section A.1).
Stimuli
Selection Criteria
We selected excerpts from the solo repertoire in classical music and jazz based on specific criteria. First, we included performances by the same musician of the same piece at both a younger age (≤ 40 years) and an older age (≥ 65 years). The age limit for younger musicians corresponds to the likely peak of recording and performance activity, as outlined by Manturzewska (1990, p. 136), whose model suggests the greatest artistic output typically occurs between ages 30 and 45, occasionally extending to age 50. Gembris (2014, p. 168) estimated the peak of professional musicianship at 40 years, with surveys by Gembris and Heye (2012; 2014, p. 379) identifying ages 35–39 as the most frequent peak period, and age 40 as the median. Thus, we defined “younger” as age 40 or below, assuming musicians had reached their peak performance. For “older” musicians, we used the threshold of 65 years and above, in line with the commonly used definition of old age according to the United Nations (2019, p. iii). Martin and Kliegel (2014, pp. 26–29) reviewed various old age definitions, noting thresholds between 30 and 100 years. While Orimo et al. (2006) described the 65-year threshold as conventional and suggested raising it to 75 years, we retained 65 years as the cutoff due to the difficulty of finding recordings from older musicians. Second, performances needed to be professionally recorded on CD in .wav format, not downloaded as MP3 files or from YouTube. Third, live recordings were compared with live recordings only, and similarly, studio recordings with studio recordings, to account for the extensive editing possible in studio recordings in recent years. Live recordings were preferred as they best reflect musicians’ true abilities. Fourth, we reviewed musicians’ biographies to confirm neurotypical aging, excluding cases involving illness, substance abuse, or disability. These selection criteria are further discussed in the subsequent analysis of the study results.
Importantly, our findings cannot be generalized to the entire population of professional musicians, as we focus on those musicians who continue to record live or studio performances without severe health issues.
Selected Stimuli
Identifying recordings that met all the selection criteria proved unexpectedly challenging. Ultimately, six eligible stimulus pairs were identified (see Table 1; further discographic details are available in the Supplementary Online Material, Table S1). Several musicians, including Vladimir Horowitz, Simon Preston, and Alfred Cortot, were excluded due to issues such as mismatched recording types, a lack of high-quality recordings at older ages, insufficient audio quality, or health-related concerns.
Table 1. List of stimuli. For more detailed information, see the Supplementary Online Material, Table S1.
Note. a The early recording by Heifetz is a studio recording from 1935. At this early age, audio manipulation and editing were not possible, and this recording must have been recorded in one take. Therefore, this stimulus is compared to a live recording at an older age.
Determining the Specific Excerpts from the Selected Pieces
After identifying suitable pieces, short excerpts (30–60 s) representing each musician's musical and technical abilities were selected for the online study. Lecturers and professors from Hanover University of Music, Drama, and Media who specialized in the relevant instruments and styles chose passages based on these criteria, using both sheet music and recordings. The final stimuli averaged 52.3 s in length (SD = 7.29; range: 44–63 s).
Adaptation of the Sound Quality and Pretests
Because audio recording quality differed between older and newer performances, the newer recordings were edited to match the sound quality of the older recordings. This adjustment ensured that the participants could not distinguish the age of the recordings based on audio features alone. Various audio processing techniques and software were used to achieve comparable audio quality within each stimulus pair. More detailed explanations of the procedure can be found in the Supplementary Online Material (section A.2). The edited stimuli were validated in two pretests to ensure that listeners could not distinguish the old from the new recordings based on the audio quality alone (for details, see the Supplementary Online Material, section A.3).
Measurements, Control Variables, and Procedure
Ethical Approval
The study followed institutional and national guidelines (Föderation Deutscher Psychologenvereinigungen, 2022; Hanover University of Music, Drama and Media, 2024) and the Declaration of Helsinki. Approval from the university ethics committee was not required, as all regulations were met. All participants gave informed consent online and could withdraw at any time without consequences.
Procedure
The online study took place from March to July 2022. The participants were provided the following information: “This study addresses the question of whether an audio recording can be used to determine whether a professional musician was older or younger at the time of recording. In this study, audio samples are to be assigned and, additionally, questionnaires, for example, on musical experience, are to be completed.” Informed consent for data use and anonymization was obtained from the participants. To control for listening conditions, the participants were asked whether their environment was quiet or loud and whether they were located inside or outside. They also completed a module of the Headphones and Loudspeaker Test (HALT; Wycisk et al., 2022) to calibrate playback volume.
In the main part of the study, participants classified 12 stimuli as either A (younger musician) or Not A (older musician). Two additional stimuli (one from each age group) were randomly re-presented as retest items, with participants blinded to the repetition. The 14 stimuli were randomized, ensuring paired versions were not consecutive. The participants specified up to five classification criteria, indicated the age, gender, and main instrument, and completed the general factor of the Goldsmiths Musical Sophistication Index (Gold-MSI, 18 items; Müllensiefen et al., 2014; Schaal et al., 2014). The average completion time was 17.11 min (SD = 1.39, N = 200). No reimbursement was given.
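The constrained randomization described above can be illustrated with a short sketch. It is not the procedure implemented in SoSci Survey; the stimulus labels, the seed, and the rejection-sampling approach are assumptions made purely for illustration.

```python
import random


def randomize_stimuli(stimuli, pair_of, rng, max_tries=10_000):
    """Reshuffle until no two consecutive stimuli stem from the same pair."""
    order = list(stimuli)
    for _ in range(max_tries):
        rng.shuffle(order)
        if all(pair_of[a] != pair_of[b] for a, b in zip(order, order[1:])):
            return order
    raise RuntimeError("no valid order found within max_tries")


rng = random.Random(2022)  # arbitrary seed, for reproducibility of the sketch only

# Hypothetical labels: six pairs (P1..P6), each with an early and a late version.
base = [f"P{i}_{version}" for i in range(1, 7) for version in ("early", "late")]
pair_of = {label: label.split("_")[0] for label in base}

# One retest item per age group, drawn at random (the labels are assumptions).
retest = [
    rng.choice([s for s in base if s.endswith("early")]),
    rng.choice([s for s in base if s.endswith("late")]),
]

order = randomize_stimuli(base + retest, pair_of, rng)
print(order)
```

In this sketch, a retest item is treated as belonging to the same pair as its original, so a repeated stimulus is never presented directly after its first presentation or its counterpart.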
Required Sample Size and Power Analysis
A preliminary power analysis after testing the first participants showed that the effect size was extremely small, resulting in very low statistical power even as the sample grew. Reliable detection of such a small effect would require an unrealistically large sample (more than 10,000 participants), which was neither feasible nor relevant for this study. Therefore, data collection continued to obtain a substantial sample, but we did not aim to demonstrate the very small effect statistically. For more details on the sample size and power analysis, see the Supplementary Online Material (section A.5).
Data Collection
To achieve a sufficient and representative sample, we first recruited a convenience sample via university mailing lists and social media, which typically yielded younger and more musically sophisticated participants (as, for example, experienced by Düvel et al., 2021). To balance the sample, additional participants were recruited through the commercial provider Moweb Research (https://www.mowebresearch.com). Before merging these subsets, we compared their demographic characteristics (see the Sample Description section).
Data Preparation, Filtering, and Description of the Sample
The SoSci Survey platform recorded 499 survey starts, with 242 participants completing the survey and consenting to data protection. These 242 cases were downloaded and filtered in Microsoft Excel 365 (see “Filtering dataset and calculating variables.xlsx” in the OSF supplement) based on the data collection conditions (assumed background noise by surroundings and used playback devices), responses to an instructed response item (Leiner, 2019), and detection of stereotypical response patterns (e.g., always choosing the same option; see Figure 1). The final dataset comprised N = 200 valid cases, 29.5% of which were from the convenience sample, and was used for all subsequent analyses (see “dataset.csv” in the OSF supplement).

Figure 1. Flowchart of participant selection.
Planned Statistical Analyses
To test Hypothesis 1, McNemar's test assessed whether listeners could distinguish between younger and older musicians based on sound examples better than by chance; effect size d’ and response bias c were calculated.
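For readers unfamiliar with the two SDT measures, the following minimal sketch shows the textbook yes/no formulas for sensitivity d’ and response bias c. It is not the authors' analysis script (see the OSF supplement for that). Two points are assumptions: a "hit" is taken to be a "younger" response to an early recording, and a log-linear correction is applied to avoid hit or false alarm rates of exactly 0 or 1.

```python
from statistics import NormalDist


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion_c) from one listener's response counts."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF

    # Log-linear correction (an assumption of this sketch, not of the paper).
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)

    d_prime = z(hit_rate) - z(fa_rate)
    criterion_c = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion_c


# Hypothetical listener who is right on half of the early and half of the
# late recordings: both measures are zero, i.e., chance-level performance
# without a preference for either response option.
d_chance, c_chance = sdt_measures(3, 3, 3, 3)

# Hypothetical listener with 5 of 6 correct in each age group.
d_good, c_good = sdt_measures(5, 1, 1, 5)
```

Under this sign convention, a positive c indicates a tendency toward the "older" (Not A) response and a negative c a tendency toward the "younger" (A) response.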
For Hypothesis 2, the discrimination ability (d’) was compared between musicians and nonmusicians using independent samples t-tests with Cohen's d. Defining musicianship is complex and debated (Ollen, 2006; Zhang et al., 2020; Zhang & Schubert, 2019). Based on Zhang and Schubert (2019), “strong musical identity” was defined as 10 or more years of training, and “no musical identity” as less than 6 years. In the present study, the response options from the Gold-MSI (Müllensiefen et al., 2014; Schaal et al., 2014) were used. Since the survey categories did not allow an exact distinction between 10 and 11 years, the cutoff was set between 9 and 10 years. The Gold-MSI item was also used to represent nonmusicians as those with less than six years of musical training. Additionally, Zhang et al. (2020) found that at least six years of training was a common threshold for classifying musicians in the literature, so a second cutoff was set between five and six years. To investigate Hypothesis 2, discrimination ability was compared between musicians (for scenario 1: 10+ years, scenario 2: 6+ years) and nonmusicians (less than six years of musical training) by calculating independent samples t-tests.
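The two grouping scenarios can be summarized in a small sketch. The function name and the treatment of training duration as a plain number of years are assumptions for illustration; the Gold-MSI item itself uses response categories, which the sketch does not reproduce.

```python
def classify(years_of_training, musician_cutoff):
    """Assign a participant to a group for one scenario.

    musician_cutoff is 10 in scenario 1 and 6 in scenario 2.
    Participants with fewer than six years are nonmusicians in both scenarios.
    """
    if years_of_training < 6:
        return "nonmusician"
    if years_of_training >= musician_cutoff:
        return "musician"
    # Only possible in scenario 1: six to nine years of training.
    return "excluded"


# Hypothetical participants with 3, 7, and 12 years of training.
scenario_1 = [classify(y, 10) for y in (3, 7, 12)]
scenario_2 = [classify(y, 6) for y in (3, 7, 12)]
```

In scenario 1, participants with six to nine years of training fall between the two cutoffs and enter neither group; in scenario 2, every participant is assigned to one of the two groups.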
For Hypothesis 3, analysis was limited to participants who played one of the instruments used in the stimuli. Their discrimination ability (d’) for their instrument was compared to that for other instruments using a dependent samples t-test.
For Hypothesis 4, the correlation between d’ and the Gold-MSI general factor was calculated to examine the relationship between discrimination ability and musical sophistication.
Sample Description
Age
The participants averaged 42.6 years old (SD = 16.7, range 17–75). The convenience sample (M = 26.7, SD = 9.3) was significantly younger than the Moweb sample (M = 49.3, SD = 14.5; Cohen's d = –1.71, 95% CI [–2.1, −1.4]). For comparison, the average age in Germany in 2021 was 44.7 years (Turulski, 2023). Overall, the combined sample closely matches the national average (one-sample t-test not significant).
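The reported effect size can be retraced from the summary statistics alone. The sketch below assumes Cohen's d with a pooled standard deviation and derives the subsample sizes from the 29.5% share of the convenience sample among the N = 200 valid cases (59 and 141 participants); both are assumptions, as the exact formula used is documented only in the analysis script.

```python
from math import sqrt


def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d for two independent groups, using the pooled SD."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (m1 - m2) / sqrt(pooled_var)


n_total = 200
n_convenience = round(0.295 * n_total)  # 59, derived from the reported share
n_moweb = n_total - n_convenience       # 141

d_age = cohens_d(26.7, 9.3, n_convenience, 49.3, 14.5, n_moweb)
mean_age = (n_convenience * 26.7 + n_moweb * 49.3) / n_total

print(round(d_age, 2))     # -1.71
print(round(mean_age, 1))  # 42.6
```

Under these assumptions, both the effect size and the overall mean age agree with the values reported above.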
Gender
Women were slightly underrepresented overall (45.5% female, 54.5% male). Although the participants could select a third gender option or decline to answer, no one used these options. Women were overrepresented in the convenience subsample (64.4%) but underrepresented in the Moweb subsample (37.6%). Combining the subsamples increased the gender imbalance compared with considering each separately.
Musical Sophistication
Musical sophistication was measured with the German version of the Gold-MSI general factor (Müllensiefen et al., 2014; Schaal et al., 2014), with a mean of 74.66 (SD = 18.5, range 20–120; possible value range: 18–126), corresponding to the 55th percentile of the German reference sample (Schaal et al., 2014, p. 445). The convenience sample was slightly more musically sophisticated (M = 77.9, SD = 12.0; 60th–65th percentile) than the Moweb sample (M = 73.3, SD = 20.6; 50th–55th percentile; d = 0.25, 95% CI [–0.06, 0.56]). Overall, 60% of the participants played a musical instrument, and the percentage was higher in the convenience sample (78%) than in the Moweb sample (51.1%). The most common instruments were piano (36), guitar (26), voice (9), flute (8), trumpet (7), and violin (6); all others were played by three or fewer participants.
Audio Device
Participants reported their audio device (n = 187; 13 missing): smartphone speakers (56) or headphones (55) were the most used, followed by laptop speakers (38), loudspeakers (20), monitor speakers (10), and tablet speakers (8).
Results
As reported in the Data Preparation, Filtering, and Sample Description sections, the filtering of the data and the calculation of the variables (e.g., questionnaire scores and sensitivity d’) were conducted in Microsoft Excel 365. All further analyses reported in this chapter were carried out in RStudio (RStudio Team, 2023) using R (R Core Team, 2023). The analysis script is available in the OSF supplement (file “Analysis.R”).
Discrimination Between Younger and Older Musicians (Hypothesis 1)
According to the SDT framework, frequencies a, b, c, and d, which specify response patterns to a pair of stimuli, were calculated (see Table 2; Düvel & Kopiez, 2022). From this information, the proportions of hits, false alarms (FAs), misses, and correct rejections (CRs) were derived. Summarized correct responses (hits and CRs) revealed an accurate response rate of 50.3%. Since there were two response options resulting in a chance level of 50%, this outcome indicates that response behavior was no better than chance.
Table 2. Classification of response patterns and their relative frequency.
For a statistical test of whether participants exceeded the chance level, McNemar's test was used. The exact test was not significant, and the odds ratio was estimated to be OR = 0.97 (95% CI [0.82, 1.16]). Therefore, discrimination performance did not differ from chance, and Hypothesis 1 was rejected. Since null hypothesis significance testing (NHST) does not provide direct confirmation of the null hypothesis, we also applied a Bayesian version of the McNemar test to the data (see the script McNemar_in_Bayes.R in the Supplementary Online Material section). A Bayes factor of BF10 = 0.11 (resulting in a Bayes factor of BF01 = 9 supporting the null hypothesis) strongly supported the rejection of the alternative hypothesis in favor of the null hypothesis (for the terminology, see Wagenmakers et al., 2018).
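The logic of the exact McNemar test can be illustrated with a minimal Python sketch. It is not the authors' R script: the function name `mcnemar_exact`, the generic labels `b` and `c` for the two discordant-pair counts, and the example counts are assumptions chosen for illustration. How the study's frequencies a, b, c, and d map onto these counts is defined in Table 2 and in Düvel and Kopiez (2022).

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> tuple[float, float]:
    """Exact two-sided McNemar test on the two discordant-pair counts.

    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    Returns (p_value, odds_ratio), with odds_ratio = b / c.
    """
    n = b + c
    if n == 0:
        return 1.0, float("nan")
    k = min(b, c)
    # Lower binomial tail P(X <= k), doubled for a two-sided test and capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    p_value = min(1.0, 2 * tail)
    odds_ratio = b / c if c > 0 else float("inf")
    return p_value, odds_ratio


# Hypothetical discordant counts for illustration only, NOT the study's data.
p, odds = mcnemar_exact(b=290, c=300)
print(f"p = {p:.3f}, OR = {odds:.2f}")
```

An odds ratio close to 1 with a confidence interval that includes 1, as reported above, corresponds to nearly balanced discordant counts.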
Nevertheless, the effect size for discrimination performance (sensitivity d’) was calculated with d’ = 0.018 and 95% CI [−0.086, 0.121] (see first error bar in Figure 2). The histogram of d’ values shows a normal distribution (see Figure S8 in the Supplementary Online Material). Since d’ = 0 corresponds to the chance level, this confirms the result of McNemar's test. An additional measure of discrimination behavior is response bias c, which indicates whether participants tend to use one response option more often than the other. The response bias was c = 0.011 (95% CI [−0.038, 0.059]), which indicates almost no tendency toward one of the two response options. Thus, Hypothesis 1 was rejected.
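For orientation, the textbook yes/no formulas for sensitivity and response bias can be sketched in Python as follows. This is an assumption-laden illustration: the replicated paired A-Not A design used in the study may require a different estimator (see Düvel & Kopiez, 2022), and the function name `dprime_and_bias` and the example rates are hypothetical.

```python
from statistics import NormalDist


def dprime_and_bias(hit_rate: float, fa_rate: float) -> tuple[float, float]:
    """Standard yes/no SDT indices from hit and false-alarm rates.

    d' = z(H) - z(FA); c = -(z(H) + z(FA)) / 2.
    Rates of exactly 0 or 1 are not handled here (inv_cdf raises an error),
    so a correction would have to be applied beforehand.
    """
    z = NormalDist().inv_cdf
    z_hit, z_fa = z(hit_rate), z(fa_rate)
    return z_hit - z_fa, -0.5 * (z_hit + z_fa)


# Hypothetical rates for illustration only, NOT the study's data.
d_prime, bias_c = dprime_and_bias(hit_rate=0.51, fa_rate=0.50)
print(f"d' = {d_prime:.3f}, c = {bias_c:.3f}")
```

Equal hit and false-alarm rates yield d' = 0, which is why a sensitivity near zero is read as chance-level discrimination.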

Error bar plot of the sensitivity for the entire sample and different subgroups.
Comparing the Discrimination Ability of Musicians and Nonmusicians (Hypothesis 2)
Hypothesis 2 presumed that musicians would show greater discrimination ability than nonmusicians. Musicians were operationalized in two scenarios, based on the respective item from the Gold-MSI. In the first, musicians were participants with 10 or more years of instrumental practice (including voice training; n = 51); in the second, they were participants with six or more years of instrumental practice (n = 76). In both scenarios, participants with fewer than six years of instrumental practice were classified as nonmusicians (n = 124). To examine Hypothesis 2, an independent-samples t-test on discrimination sensitivity d’ was calculated for each of the two scenarios.
The nonmusicians’ mean sensitivity was d’ = −0.026 (SD = 0.708, 95% CI [−0.150, 0.099]; n = 124 participants with no more than five years of musical training; see Figure 2). Their performance did not differ significantly from chance level (McNemar's test: p = .732). In the first scenario (10 or more years), musicians’ mean sensitivity was d’ = 0.171 (SD = 0.766, 95% CI [−0.039, 0.381]; n = 51; McNemar's test: p = .156). The difference between the two groups corresponded to a small effect size (d = 0.271, 95% CI [−0.067, 0.612]); however, it was nonsignificant (p = .119). In the second scenario (six or more years), musicians showed a mean sensitivity of d’ = 0.088 (SD = 0.812, 95% CI [−0.095, 0.270]; n = 76; McNemar's test: p = .376). Here, too, the comparison of discrimination behavior between musicians and nonmusicians was nonsignificant (p = .317, d = 0.151, 95% CI [−0.136, 0.439]). Thus, Hypothesis 2 was rejected.
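The group effect sizes can be approximately recomputed from the summary statistics given in the text. The following Python sketch assumes Cohen's d with a pooled standard deviation; the helper name `cohens_d` is hypothetical, and the authors' script may use a different variant.

```python
from math import sqrt


def cohens_d(m1: float, sd1: float, n1: int,
             m2: float, sd2: float, n2: int) -> float:
    """Cohen's d for two independent groups, using the pooled SD."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / sqrt(pooled_var)


# Means, SDs, and group sizes as reported in the text
# (musicians first, nonmusicians second).
d_10_years = cohens_d(0.171, 0.766, 51, -0.026, 0.708, 124)
d_6_years = cohens_d(0.088, 0.812, 76, -0.026, 0.708, 124)
print(f"10+ years: d = {d_10_years:.3f}; 6+ years: d = {d_6_years:.3f}")
```

Under this assumption, both values agree with the reported effect sizes up to rounding of the input statistics.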
Stimuli Featuring Own Instrument vs. Another Instrument (Hypothesis 3)
We assumed that the acquisition of expertise might be instrument-specific and would thus give listeners an advantage for the instrument they had learned. For this analysis, only participants who played one of the four instruments featured in the stimuli (piano, violin, trombone, and saxophone) were included (n = 45). For each participant, two sensitivity indices were calculated, namely, one for the stimuli that featured the participant’s own instrument and one for the remaining instruments. The mean sensitivity for the participants’ own instrument was d’ = 0.027 (SD = 0.508, 95% CI [−0.121, 0.175], see Figure 2), whereas the mean sensitivity for the other instruments was d’ = −0.030 (SD = 0.625, 95% CI [−0.153, 0.625]). The difference was negligible and nonsignificant (t(44) = 0.49, p = .625, d = 0.100 with 95% CI [−0.303, 0.502]). Therefore, Hypothesis 3 was also rejected.
Correlation Between Discrimination Ability and General Musical Sophistication (Hypothesis 4)
To employ more general criteria than the years of musical training or the instrument presented, we assumed a relationship between the participants’ general musical sophistication and their discrimination ability as measured by the sensitivity d’. However, the correlation was nonsignificant (p = .31), yielding a negligible effect size (r = .07 with 95% CI [−.07, .21]). Therefore, Hypothesis 4, which stated the presence of a connection between the degree of musical sophistication and discrimination ability, was rejected.
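The confidence interval of the correlation can be approximated with the Fisher z-transformation. This Python sketch assumes that all N = 200 valid participants entered the correlation and that a Fisher-type interval was used; neither is stated explicitly in the text, and the function name `pearson_ci` is hypothetical.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def pearson_ci(r: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """Approximate confidence interval for Pearson's r via Fisher's z."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    se = 1 / sqrt(n - 3)          # standard error on the z scale
    z = atanh(r)                  # Fisher transformation of r
    return tanh(z - z_crit * se), tanh(z + z_crit * se)


# r as reported in the text; n = 200 is an assumption (see lead-in).
lower, upper = pearson_ci(0.07, 200)
print(f"95% CI [{lower:.2f}, {upper:.2f}]")
```

Because the interval includes zero, it is in line with the nonsignificant test result reported above.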
Test-Retest Reliability of the Allocation
As a first measure of test-retest reliability, judgment consistency was calculated as described by Comeau et al. (2017, p. 205). This measure compares the responses to identical stimuli that were presented twice and gives the percentage of identical responses. Since, in this study, two stimuli were presented twice (one of type A, one not of type A, randomly selected), the possible values of judgment consistency were 0%, 50%, or 100%. The mean judgment consistency was 61.5% (SD = 37.1%), which was significantly higher than chance (50%; one-sample t-test: t[199] = 4.39, d = 0.31, small effect). In the present study, the judgment consistency was slightly higher than that reported in a methodologically similar study by Pausch et al. (2022, p. 50; judgment consistency: 56.2%, SD = 44.5), which used recordings by child prodigies and professional musicians for the discrimination task.
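The consistency measure and the accompanying one-sample test can be sketched in Python as follows. The response labels and helper names (`judgment_consistency`, `one_sample_t`) are hypothetical; only the summary statistics passed to `one_sample_t` are taken from the text.

```python
from math import sqrt


def judgment_consistency(first: list[str], second: list[str]) -> float:
    """Percentage of repeated stimuli that received the same response twice."""
    matches = sum(a == b for a, b in zip(first, second))
    return 100 * matches / len(first)


# With two repeated stimuli per participant, only 0, 50, or 100 can occur.
examples = [
    (["young", "old"], ["young", "old"]),    # both responses repeated
    (["young", "old"], ["young", "young"]),  # one response repeated
    (["young", "old"], ["old", "young"]),    # no response repeated
]
scores = [judgment_consistency(f, s) for f, s in examples]


def one_sample_t(mean: float, sd: float, n: int, mu0: float) -> tuple[float, float]:
    """t statistic and Cohen's d for a one-sample test against mu0."""
    return (mean - mu0) / (sd / sqrt(n)), (mean - mu0) / sd


# Summary statistics as reported in the text (M = 61.5, SD = 37.1, N = 200).
t_value, d_value = one_sample_t(61.5, 37.1, 200, 50.0)
print(scores, f"t = {t_value:.2f}, d = {d_value:.2f}")
```

The recomputed t value differs slightly from the reported one because the inputs are rounded.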
As a second measure, the correlation between the two ratings of the same stimulus was calculated. For the A-stimuli (recordings of younger musicians), the correlation between test and retest was
Influence of Audio Devices
Since the quality of audio devices can influence our evaluation of music, we asked the participants to indicate which audio devices they used for their participation. In an exploratory analysis, we examined whether groups of participants with different audio devices differed in their discrimination ability. The between-subjects one-way ANOVA was nonsignificant, indicating no difference in discrimination ability between groups (F(5, 181) = 1.476, p = .20,
Participants’ Criteria for Evaluation
After the participants had classified all stimuli as being played by either a younger or an older musician, they were asked about the subjective criteria they had used for discrimination. Five fields were offered in the survey to specify distinct criteria. For this evaluation, the spelling was corrected, and similar words were combined. A total of 132 criteria were mentioned by 193 participants; a total of 7 participants did not submit any criteria. The most common criteria and their English translations are listed in Table 3.
Most common criteria used for the classification of A-stimuli played by a younger or older musician.
Most criteria name all kinds of musical parameters, such as tempo, which is ranked first, followed by genre, instrument, and rhythm. The term “feel” is used ambiguously here; it can, on the one hand, refer to a musician's good interpretation of the piece. On the other hand, it can mean that participants relied on their subjective feelings while classifying the stimuli. Several other criteria are similar to the latter meaning, the most common being “gut feeling” (as described by the German term “Bauchgefühl”). This collection of subjective criteria remains open to future analyses.
Discussion
In this study, musical examples from different genres featuring different instruments were selected as a basis for a perceptual discrimination experiment. From the perspective of lifelong skill development, we investigated the question of age-related changes in sensorimotor skills and their auditory correlates. Each musical example had been recorded twice by the same musician, at a younger and an older age. In a listening experiment, these musical excerpts were allocated to one of the two age groups.
On the basis of the N = 200 valid cases, the participants were unable to allocate the stimuli above the chance level (Hypothesis 1 was rejected). To examine Hypothesis 2, the participants were classified as either nonmusicians (having less than 6 years of musical training) or musicians (either 6 or more years or 10 or more years of musical training). None of the groups showed an allocation performance that differed from the chance level. The comparisons of musicians and nonmusicians yielded no significant differences and only small effect sizes. Overall, we did not find the assumed differences between musicians and nonmusicians; thus, Hypothesis 2 was rejected. Hypothesis 3 assumed that the participant's familiarity with the instrument featured in the presented stimulus should have a positive influence on their discrimination performance. However, this indicator of musical expertise also resulted in nonsignificant differences. Listening examples based on the familiar instrument did not outperform those based on other instruments. Hypothesis 4 outlined the positive relationship between general musical sophistication and discrimination performance that we expected to find since the presented task should have benefited from general knowledge of age-related effects in music performance in all musical genres. This hypothesis could not be confirmed, and the two variables were not found to be significantly correlated.
The strength of empirical investigations is also influenced by the participants’ judgment stability. In our study, we used test-retest reliability, which was, however, determined by only two randomly selected items. The resulting correlation between the two ratings of the same stimulus showed a small effect (for the recordings of both age groups:
In this study, we employed a discrimination paradigm. This choice does not preclude the future use of rating paradigms or qualitative methods to describe perceived differences in performances. Additionally, analyzing the low-level psychoacoustic features of the recordings could be an alternative approach. However, this approach would necessitate longer recording excerpts and greater standardization of audio quality than is feasible for a discrimination task. We prioritized the fundamental research question of whether perceivable differences exist before exploring the underlying mechanisms and auditory cues. Essentially, we refrained from examining potential parameter differences between old and young recordings until discriminability was established. Future studies should consider diverse methodological approaches.
Another issue arises from using low-level acoustical features in performance descriptions, as provided by software such as MIRtoolbox (Lartillot, 2021; Lartillot et al., 2008). Although the software currently offers approximately 300 acoustical features, the perceptual relevance of these features remains uncertain, and their relationship to human auditory information processing is unvalidated. For instance, MIRtoolbox uses over 20 timbre descriptors for musical genre classification, whereas psychological research identifies fewer than 5 as relevant for timbre perception (Siedenburg et al., 2016). The only descriptor with evident perceptual relevance for classifying early versus late performances is musical tempo. Our tempo analysis of early versus late recordings in the sample revealed a varied pattern of tempo changes: Some recordings demonstrated an increased tempo, others exhibited a decreased tempo, and some maintained a consistent tempo (see the Supplemental Online Material, Figure A.4).
Our results support the SOC model of successful aging (Baltes & Baltes, 1989, 1990); for the sample of selected musicians, we assume a successful application of optimization strategies. Although we cannot exclude that the musicians in our sample used compensation strategies (e.g., ritardando before difficult passages), such strategies at least went unnoticed by the participants as indicators of age-specific adaptation.
Several critical points remain unresolved. First, we cannot rule out selection bias in the availability of recordings. For instance, when musicians like Richter and Heifetz chose to record at an advanced age, it may have been due to their overall good health, continued ability, and confidence to perform at an outstanding level. They might have also opted for less virtuosic, though still challenging, pieces. Second, modern digital recording techniques, such as post-editing or mastering, may have played a role in masking any age-related decline, affecting both studio and live recordings. Third, by using audio-only stimuli rather than audio-visual materials, the performative aspect is largely overlooked. This aspect can significantly impact listeners in a concert setting (Platz & Kopiez, 2012, 2013), so we cannot draw conclusions about evaluative processes under audiovisual conditions. Fourth, we do not claim that the five selected performers and six selected pieces are representative. For instance, we lack information on their criteria for repertoire selection, strategies for compensating age-related decline, and recording preparation strategies. As some of these questions remain unanswered, the absence of this background information should inform the careful interpretation and generalization of the findings.
Limitations and Future Perspectives
A critical limitation of the current study is the number and availability of musical stimuli for the listening tests; the process of tracing adequate recordings sometimes resembled detective activity. Artificial intelligence and large language models such as Qwen (Alibaba Cloud, 2024), ChatGPT (ChatGPT, 2024), or Llama (Llama, 2024) could be of great help in such cases but were unavailable at the time of study conception (beginning of 2022). However, such explorative approaches have shown promising results. Such approaches would not only help to extend the typical focus on classical instruments such as piano or violin to wind instruments or plucked instruments but also open the search to the performers of popular music. However, the problem of self-selection would remain; that is, musicians who drop out of the music business for health reasons would not have their performance skills documented. Another limitation is the use of a discrimination paradigm; since we were interested in the fundamental question of the discriminability of younger and older performers, we focused on listeners’ sensitivity. However, this approach could be complemented by using a rating of expressive qualities to obtain more insight into changes in interpretive concepts. We are aware that we only investigated one end of the scale (internationally outstanding artists); thus, we cannot conclude that there are age-related changes in performance skills in the population of average professional musicians. For this group of musicians, finding recordings will be much more challenging. A deliberate initiative to encourage the rerecording of works at a young age could be one approach. Finally, our study can serve as a starting point for a more systematic investigation of aging processes in professional musicians.
Supplemental Material
Supplemental material, sj-docx-1-mns-10.1177_20592043251400124 for No Perceptual Discrimination Between Healthy Musicians’ Early and Late Recordings of Selected Musical Pieces by Nina Düvel, Patricia Trzciński and Reinhard Kopiez in Music & Science
Footnotes
Action Editor
Sebastian Klotz, Humboldt-Universität zu Berlin, Institut für Musikwissenschaft und Medienwissenschaft.
Peer Review
Eckart Altenmüller, University of Music, Drama, and Media Hannover, Institute of Music Physiology and Musicians’ Medicine. Peter Keller, Aarhus University, Center for Music in the Brain, Department of Clinical Medicine.
Author Contribution Statement
Following the Contributor Role Taxonomy (CRediT), the authors' contributions were as follows: conceptualization and methodology: PT, ND, and RK; investigation: PT; data curation, formal analysis, and data visualization: PT and ND; supervision and validation: ND and RK; writing (original draft, review, editing, revision work): ND and RK; resources and project administration: RK and ND.
Consent for Publication
Informed consent for publication was provided by the participants.
Data Availability Statement
The dataset used for this manuscript has not been published elsewhere. The authors adhere to the APA Style Journal Article Reporting Standards (https://apastyle.apa.org/jars). The musical stimuli, data, and analysis script are available on the Open Science Framework at
. The dataset is published under a CC BY license, along with this article, and can therefore be reused in the future. The supplementary material (tables and figures) has been published on the journal's website, along with this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical Approval
The main study was carried out in accordance with relevant institutional and national guidelines and regulations (Föderation Deutscher Psychologenvereinigungen, 2022; Hanover University of Music, Drama and Media, 2024) and with the principles outlined in the Declaration of Helsinki. Formal approval of the study by the ethics committee of the Hanover University of Music, Drama and Media (2024) was not mandatory, as the study adhered to all the required regulations.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Informed Consent to Participate
Informed consent was provided by all participants by checking a box in the online survey. The participants had the option to withdraw from the study at any time without negative consequences.
License
This work is licensed under a Creative Commons Attribution-Non Commercial-No Derivatives 4.0 International License (CC BY-NC-ND 4.0;
). This license permits copying and redistributing the work in any medium or format for noncommercial use provided that the original authors and source are credited and that a link to the license is included in the attribution. No derivative works are permitted under this license.
Original use of Data
The authors declare that there is no overlap with any prior published work.
This study was conducted as a master's thesis by the second author (PT):
Trzciński, P. (2022). In alten wie in jungen Jahren? Eine empirische Studie zur Unterscheidbarkeit von frühen und späten Aufnahmen professioneller Musiker*innen [In old as in young age? An empirical study on the distinguishability of early and late recordings of professional musicians] [master's thesis]. Hanover University of Music, Drama and Media, Germany.
A subset of findings from this article was presented at the following conferences:
Kopiez, R., Düvel, N., & Trzciński, P. (2024, July 3–6). Is it possible to distinguish between early and late recordings of professional musicians? [Paper presentation]. 12th Triennial Conference for the Cognitive Sciences of Music, York, UK.
Düvel, N., & Kopiez, R. (2024, February 23–24). In alten wie in jungen Jahren? Keine Unterscheidbarkeit von frühen und späten Aufnahmen professioneller Musiker*innen [In old as in young age? No distinguishability between early and late recordings by professional musicians] [Poster presentation]. 22nd Symposium of the German Society for Music Physiology and Musicians’ Medicine, Hannover, Germany.
Düvel, N., Kopiez, R., & Trzciński, P. (2023, October 9–11). In alten wie in jungen Jahren? Keine Unterscheidbarkeit von frühen und späten Aufnahmen professioneller Musiker*innen [In old as in young age? No distinguishability between early and late recordings by professional musicians] [Paper presentation]. 39th Annual Meeting of the German Society for Music Psychology, Hannover, Germany.
Düvel, N., & Kopiez, R. (2023, March 9). In alten wie in jungen Jahren? Keine Unterscheidbarkeit von frühen und späten Aufnahmen professioneller Musiker*innen [In old as in young age? No distinguishability between early and late recordings by professional musicians] [Paper presentation]. Annual meeting of the Specialist Group Systematic Musicology of the Society of Music Research, Trossingen, Germany.
References