Sage Journals: Discover world-class research

Abstract

For people with profound hearing loss, a cochlear implant (CI) is able to provide access to sounds that support speech perception. With current technology, most CI users obtain very good speech understanding in quiet listening environments. However, many CI users still struggle when listening to music. Efforts have been made to preprocess music for CI users and improve their music enjoyment. This work investigates potential modifications of instrumental music to make it more accessible for CI users. For this purpose, we used two datasets with varying complexity and containing individual tracks of instrumental music. The first dataset contained trios and it was newly created and synthesized for this study. The second dataset contained orchestral music with a large number of instruments. Bilateral CI users and normal hearing listeners were asked to remix the multitracks grouped into melody, bass, accompaniment, and percussion. Remixes could be performed in the amplitude, spatial, and spectral domains. Results showed that CI users preferred tracks being panned toward the right side, especially the percussion component. When CI users were grouped into frequent or occasional music listeners, significant differences in remixing preferences in all domains were observed.

Keywords

cochlear implant, music preprocessing, remixing preference, panning, spectral filtering

Introduction

Cochlear implants (CIs) have been shown to be very successful in providing access to sounds that support speech perception, especially in quiet, for people suffering from profound sensorineural hearing loss. CIs stimulate the auditory nerve through electric pulses delivered by electrodes placed in the cochlea, bypassing the damaged hair cells. Today, many CI users are able to understand speech in favorable acoustic scenarios (e.g. Krueger et al., 2008) but face major problems in situations with background noise, reverberation, or multiple speakers, the so-called cocktail party scenario (Cherry et al., 1953). Moreover, CI users report difficulties when listening to music (McDermott, 2004; Limb & Roy, 2014), especially classical music (Gfeller et al., 2003). Classical music refers here to Western art music commonly utilizing polyphony. Music consisting of multiple instruments simultaneously resembles a cocktail party scenario, in which it is especially difficult for CI users to follow any of the sound sources. This work investigates methods of making instrumental music more accessible to CI users.

Especially for tasks related to melody and timbre perception, the vast majority of CI users perform substantially worse than normal hearing (NH) listeners. On the other hand, tasks involving rhythm perception are performed equally well by the two groups (Limb & Roy, 2014). Despite the limited music perception CI users obtain, they can still enjoy music. However, postlingually deafened CI users rate their music enjoyment lower than before their deafness (Migirov et al., 2009; Gfeller et al., 2000) and than their NH peers (Veekmans et al., 2009). Even though it has been shown that there are no strong correlations between perceptual acuity and music enjoyment for CI users (Wright & Uchanski, 2012), still, the self-reported perceived sound quality is correlated with self-reported music enjoyment (Lassaletta et al., 2007). Previous work showed that CI users tend to prefer pop music to classical music (Gfeller et al., 2003). This result can be explained by the simpler and often repeated rhythmic patterns in pop music, which are easier for CI users to recognize. In addition, in the same work, it was shown that NH listeners rated classical pieces higher than CI users. Moreover, it has been reported that CI users prefer music with fewer instruments (Kohlberg et al., 2015), and for this reason, increasing the number of instruments, such as in orchestral music, reduces music enjoyment. Following these insights, two datasets were evaluated in our study, one containing few instruments (such as trios with additional percussion) and one containing a full orchestral ensemble to represent both ends of the spectrum.

Signal processing techniques have been proposed to modify music and make it more enjoyable for CI users (Nogueira et al., 2019; Kohlberg et al., 2015; Buyens et al., 2014; Pons et al., 2016; Nagathil et al., 2017; Gajecki and Nogueira, 2018; Tahmasebi et al., 2020; Gauer et al., 2022). These techniques include the use of filters in the spectral domain, the separation of instruments in the spatial domain, or the emphasis on specific music elements in the amplitude domain.

In the spatial domain, it has been shown that bilateral CI users obtain improved speech understanding when noise and speech are spatially separated (e.g. Van Hoesel et al., 2003). Bilateral CI users probably rely on interaural level differences (ILDs) to obtain this benefit, as the perception of interaural time differences with current CI technology is severely limited (Aronoff et al., 2010). In music-related experiments, Vannson et al. (2015) investigated the perception of dichotic, diotic, and monaural presentation of piano pieces for bilateral CI users. A dichotic presentation was rated as clearer than a diotic presentation, and a diotic presentation was rated as clearer than a monaural presentation. Buechner et al. (2020) showed that bilateral CI users prefer stereo over mono music for both direct coupling and free field sound presentation. This is likely due to the perception of ILDs, which are present in spatially separated music. Inspired by previous studies, we hypothesized that CI users would prefer instrumental classical music remixes with greater spatial separation of musical components than NH listeners in both datasets when participants can pan musical components (melody, bass, accompaniment, and percussion) separately. Additionally, we hypothesized that CI users prefer a remix with an asymmetric panning, as many CI users have an asymmetric speech understanding performance (Mosnier et al., 2009). Since music perception remains a difficult task for CI users, we assume that more exposure to music would impact its perception (Joshua et al., 2010). We assume that CI users who listen frequently to music may have an enhanced ability to distinguish different components in music, enabling them to distribute these components across different spatial locations. Therefore, we hypothesized that CI users frequently listening to music would prefer a greater spatial separation in comparison to CI users who only listen occasionally to music.

In the amplitude domain, it may also be beneficial for CI users to emphasize the amplitude of specific music elements to improve the clarity and consequently their music enjoyment. Numerous studies have focused on Western pop music with vocals. Buyens et al. (2014) showed that whereas CI users preferred a 6-dB enhancement for the vocals, unsurprisingly, NH listeners preferred no enhancement with respect to the background instruments, as the remix was originally aimed at an NH audience. Moreover, CI users preferred a 6-dB enhancement over a 12-dB enhancement, likely to maintain perception of the background instruments. Other studies corroborated these findings (Pons et al., 2016; Gajecki and Nogueira, 2018; Tahmasebi et al., 2020). Buyens et al. (2014) also showed that CI users prefer an enhancement of bass and drums in Western pop music. Additionally, Buyens et al. (2015) showed that separating percussive sounds from harmonic sounds and augmenting the percussion component increased music enjoyment. Using instrumental classical music, Nagathil et al. (2017) presented CI users with a 3-dB and 5-dB melody-to-instrument ratio. CI users did not prefer an enhancement of melody on a population level. Still, seven out of 14 participants rated the 3-dB condition equally or higher than the original condition. Therefore, the original condition was also not significantly preferred over the 3-dB condition. Also, an enhancement of other musical components, for example, percussion, was not tested in this study. Following these studies, we hypothesized that CI users would have a different remixing preference than NH listeners for both datasets when the gain of musical components (melody, bass, accompaniment, and percussion) could be modified separately. In particular, we hypothesized that the gain for melody and percussion would be increased.
For the particular case of percussion, we wanted to assess whether CI users preferred enhanced percussive elements in instrumental classical music. These were present as part of a piano in arrangements with fewer instruments or as cymbals or timpani in full ensembles. The artificial percussion was also included to have the same ontology as in the studies regarding pop music. We also assume that increased exposure to music can lead to CI users perceiving music differently. Therefore, we hypothesized that the remixing preferences of CI users who listen to music frequently would differ from those of CI users who listen only occasionally.

In the spectral domain, current CI technology is limited by the low number of electrode contacts (12 to 22 depending on the manufacturer) and the interaction between channels caused by the spread of current in the highly conductive perilymph of the cochlea. For this reason, detailed spectral information about instrumental music is distorted when transmitted through the CI. Spectral mixing techniques are commonly used in the music industry to emphasize certain elements of music, for example, the main melody or the singing voice (e.g. Ronen et al., 2015). Such spectral mixing techniques can not only emphasize certain frequency regions but can also be interpreted as a reduction in spectral complexity; for example, low-pass filtering background instruments results in fewer frequencies being present at the same time. Nagathil et al. (2017) performed spectral complexity reduction through principal component analysis (PCA) on sliding-window constant-Q transformed instrumental classical music. Participants preferred retaining only eight or 13 of these PCA components over the original version. This approach, however, reduced the information transmitted through the CI by removing PCA components. Regarding instrumental classical music, this method was applied to the whole excerpt, not considering that the underlying musical components (such as melody) might be of dissimilar importance for CI users, which is known to be the case for pop music. Tahmasebi et al. (2023) explored a different approach to spectral reduction specifically for CI users by reducing the number of bands selected in the sound coding strategy, thereby improving music enjoyment for vocal pop music. Hwa et al. (2021) used a median split in frequency to create a treble and bass component of each music excerpt. Participants adjusted the gain of the treble or bass component. For classical music, the average bass boost was 1 dB higher than the treble boost.
It is unclear how well this result can be generalized given that only one excerpt per participant was evaluated. In this work, we investigate methods for remixing music using less complex and therefore easy-to-implement spectral techniques, namely low-pass and high-pass filters. We hypothesized that CI users would prefer different remixing than NH listeners in both datasets when musical components (melody, bass, accompaniment, and percussion) could be filtered separately. Specifically, we expected that CI users, as in the amplitude domain, would make the melody or percussion more salient with respect to the accompaniment and bass. For this purpose, CI users could remove overlapping frequency content from the melody, accompaniment, or bass by a high-pass or low-pass filter applied to each component. Furthermore, we hypothesized that CI users who frequently listen to music would prefer a different spectral remixing than their occasionally listening peers.

In instrumental classical music, the recordings of the individual tracks are usually not available. If CI users have individual remixing preferences for instrumental music different than for NH listeners, source separation algorithms could be used to remix the music. In the present study, we investigated the remixing preferences of bilateral CI users and NH listeners for instrumental classical music using two datasets with different complexity due to the number of instruments involved. One of the datasets was newly created for this study. Different remixing modes were applied to each of the musical components. Furthermore, we evaluated differences in remixing preferences for both datasets and we investigated the influence of the CI users’ daily music listening time on their remixing preferences. Given the huge complexity and variability in instrumental classical music, we decided to simplify the problem by selecting and simplifying excerpts as well as creating well-controlled conditions. The idea of the experimental design was to maximize potential remixing effects by CI users.

Methods

Participants

Demographics of the CI participants (mean age: 60 $\pm$ 13 years) are shown in Table 1. All 10 participants were postlingually deafened bilateral CI users with Nucleus implants (Cochlear Ltd, Sydney, Australia). They were recruited from the database of the Hannover Medical School (MHH). Additional exclusion criteria were residual hearing and mental disorders. Participants took part voluntarily and no compensation was given, except for travel expenses. The study was approved by the ethics committee of MHH with identification number 8874_BO_K_2020. All participants signed a consent form and filled in a questionnaire about their music listening habits. The NH listeners (10 participants, age = 28 $\pm$ 3 years) were paid for participation and self-reported their NH status. The confounding variable of the age difference between the two groups cannot be neglected, as not only does sound perception change with age, but musical style preferences may also differ across generations. Also, the different musical backgrounds, typical for NH and CI groups, have to be considered before interpreting the results: While NH listeners reported listening to music for 2.5 h each day (ranging from 1 to 5 h), CI users listened to only 1.2 h of music each day (ranging from 0 to 5 h). All participants were tested in three experiments consisting of remixing music in the spatial, amplitude, and spectral domains. The order of the experiments was randomized for each participant. CI06 could not perform the spectral remixing experiment, because she could not perceive differences in that domain.

Table 1.

CI Participants’ Demographics.

Participant’s ID	Sex	Age (years)	Years since last implantation	Self-estimated daily music listening time (min)
CI01	Male	70	13	20
CI02	Male	66	1	30
CI03	Female	66	10	30
CI04	Female	26	2	180
CI05	Female	55	4	30
CI06	Female	60	5	0
CI07	Female	63	4	60
CI08	Female	65	10	30
CI09	Female	53	2	300
CI10	Male	74	6	60

CI: cochlear implant; ID: identifier.

Datasets

For this study, a newly created dataset of quartets and an orchestral dataset were used. The datasets differed in the number of instruments to cover a broad range of classical instrumental music. For both datasets, the tracks for each instrument were available. The multitracks were arranged to create four musical components: melody, bass, accompaniment, and percussion. While this separation was inspired by research on pop music, it was extended in the current study to Western instrumental classical music. In general, the percussion component fulfills different roles across these music genres: in pop music, it repeats the rhythm and beat of a song and thereby gives a basic temporal structure for dancing, whereas in classical music it has a more supporting role of emphasizing and highlighting harmony. Still, the investigation of remixing the percussive elements in Western classical instrumental music seemed important, as percussion is one of the few components still well perceived by most CI users because it mainly consists of temporal information. Note that in the spectral and amplitude domains, it was possible to completely remove the percussion component and thereby obtain a more natural remix. To normalize these tracks, ReplayGain was used (Robinson, 2013), which is a technical standard to normalize the perceived loudness in music. It uses a psychoacoustic model to approximate the perceived loudness of an average listener and derives a gain that, when applied to a track, leads to consistent loudness across tracks. It has been used in previous experiments to balance the level of individual tracks for remixing music for CI users (Buyens et al., 2014; Pons et al., 2016; Gajecki and Nogueira, 2018). In this study, the software Foobar2000 (Pawlowski et al., 2022) was used to calculate the ReplayGain. A corresponding gain was then applied to each track.

Trio+ Dataset

This dataset was created and synthesized for this study. The Trio+ dataset consisted of audio stems obtained using MIDI files from the “classical archives,” including different eras and composers to potentially increase generalization. We selected pieces with clear melody, accompaniment, and bass. Moreover, pieces had to have a clear bass distinct from the other components to allow for spectral remixing. Furthermore, pieces had no silence of more than 0.5 s for a single instrument and each instrument played the same voice or component during the 30 s excerpt duration. This ensured a consistent remixing preference during the whole excerpt. Only pieces containing a piano performing a percussive and an accompaniment role were selected. Different parts of the Piano Trio in E minor (Op. 90) by Antonin Dvorak, the Piano Trio No. 1 in G by Claude Debussy, the Piano Trio No. 1 in D minor by Felix Mendelssohn, and the Sonata in G major for two flutes and basso continuo by Johann Sebastian Bach were selected. Originally, these selected MIDI files consisted of trios with a clear melody, accompaniment, and bass. The MIDI files were synthesized in excerpts of 30 s using samples of instruments from the library Kontakt (Native Instruments, Berlin, Germany). To separate the accompaniment function of the piano from its percussion function, we added a percussion track, derived from the hammer action of the piano, so that percussion could be emphasized separately. This percussion component was created based on the onsets of the piano track, emphasizing the inherent percussion of the piano and doubling its main accents and rhythm. It was synthesized using a sound sample of a hammer in a dampened piano. The percussion component was loudness balanced to the other instruments by an expert mixer, as ReplayGain was not conceived for balancing transient or percussive components. With this extra track, an enhancement of the percussion was possible.
During the synthesis, the assignment of instruments differed from the original piece so that each piece contained instruments from different instrument families. While this results in rarely occurring instrument arrangements, more instrument families could be considered in this study. Furthermore, each excerpt was synthesized with two arrangements, as shown in Table 2. The percussive components in the two arrangements differed in the material that the hammers hit against. The instruments assigned to melody and bass were swapped across arrangements to reduce instrument-specific effects. As this was not possible for accompaniment and percussion, different instruments were used across arrangements. Because the sample library was already recorded in stereo, no further panning was performed on this dataset.

Table 2.

The Selected Instruments for Both Arrangements in the Trio+ Dataset for Each Musical Component (Melody, Bass, Accompaniment, and Percussion). For the Dampened Piano, the Dampening Material is Noted in Brackets.

Arrangement	Melody	Bass	Accompaniment	Percussion
1	Clarinet	Cello	Piano	Dampened piano (leather)
2	Cello	Bass clarinet	Harpsichord	Dampened piano (felt)

Synthesized MIDI files sound less natural than a live recording. However, none of the participants complained about the naturalness of the music during the experiments. The dataset can be found at: www.zenodo.org/record/7966531 (doi: 10.5281/zenodo.7966531).

Orchestral Dataset

The orchestral dataset was composed of live recordings of a full symphony orchestra in an anechoic chamber (Patynen et al., 2008). Each instrument was recorded separately. It contained Anton Bruckner’s Symphony No. 8, Gustav Mahler’s Symphony No. 1, and the Donna Elvira of the opera Don Giovanni of Wolfgang Amadeus Mozart. In the latter, only vocal-free parts were considered. Excerpts were selected based on the clarity and presence of melody, bass, and accompaniment. The multitracks were arranged into melody, bass, accompaniment, or percussion components depending on their music role. If excerpts did not contain any percussion component, a percussion component was created and added to the piece based on timpani and cymbals. Each instrument’s track was panned to simulate the location of the instruments in a typical orchestral arrangement with an American seating plan (Meyer et al., 2009).

Experimental Setup

Participants were asked to remix musical components of a classical excerpt to produce their most pleasant listening experience. The experiments were performed in a double-walled sound booth with low reverberation. Music excerpts were presented through a personal computer (PC) connected to a universal serial bus audio interface (“Mobile Pre,” M-Audio, Cumberland, USA). An overview of the experimental setup is shown in Figure 1. The sound was sent either to the CI users’ sound processors through a TV audio streamer (Cochlear Ltd), which enabled stereo streaming, or through headphones (“DT 770 Pro 80 Ohm,” Beyerdynamic, Heilbronn, Germany) for NH listeners. The sound volume setting in the PC was fixed for CI users and the volume setting of the TV audio streamer was adjusted to produce a pleasant sound sensation. For NH listeners, the volume setting of the PC was changed instead. In both cases, a specific sound excerpt was presented and the volume setting was changed until the participant reported a pleasant sound sensation. For both groups, a pleasant sound sensation was defined as a level of 6 on a 10-point loudness scale ranging from 1 (very soft) to 10 (very loud).

Figure 1.

Experimental setup. A graphical user interface based on MT5 was used with which a participant was able to modify a musical excerpt by interacting with sliders. The audio was streamed via headphones for NH listeners or via the TV audio streamer to the CIs. NH: normal hearing; CI: cochlear implant.

The presentation and remixing of the excerpts were conducted through the software “MT5” (Buffa et al., 2015), which was used in a previous experiment of our group (Pons et al., 2016). MT5 allows an online change in the audio stream by presenting a slider for each of the four musical components (melody, bass, accompaniment, and percussion). In each of the three sub-experiments, these sliders controlled either the level (dubbed “amplitude domain”), the panning (dubbed “spatial domain”), or the frequency content (dubbed “spectral domain”) of each track. The order of the sub-experiments was randomized across participants. Each sub-experiment was performed first using the Trio+ and then using the orchestral dataset. There was no visual cue as to which underlying musical component was changed by the movement of a slider. The matching of sliders and musical components was randomized for each excerpt so that a specific slider would not always modify, for example, the melody. A random offset was added to the range represented by the slider, ranging from zero to two steps to the left or to the right to compensate for any tendency to put the sliders in a specific pattern. The Trio+ dataset was presented first, containing a training part of four excerpts and a test part with 18 excerpts, which comprised the two arrangements each with nine pieces. In the training part, the task was explained to the participant as well as the randomized sliders and slider position and the mode of remixing. After the training, the test part started and the excerpts were presented in a random order. Participants were asked to press the “next” button, once they reached their most preferred remixing. They could take as much time as needed. This procedure was then repeated for the orchestral dataset, using four excerpts in the training set and nine in the test set. This procedure was repeated for each remixing domain.
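The slider randomization described above can be sketched as follows. This is a minimal illustration, not the MT5 implementation: the function names `make_trial` and `decode` are hypothetical, and the sign convention (displayed position = effective setting + offset) is an assumption, since the text only states that an offset of zero to two steps to the left or right was applied and later removed.

```python
import random

COMPONENTS = ["melody", "bass", "accompaniment", "percussion"]
MAX_OFFSET_STEPS = 2  # "zero to two steps to the left or to the right"


def make_trial(rng: random.Random) -> dict:
    """Draw a hypothetical slider layout for one excerpt.

    Slider i controls order[i]; offsets[i] is the random shift (in slider
    steps) applied to that slider's range.
    """
    order = COMPONENTS[:]
    rng.shuffle(order)
    offsets = [rng.randint(-MAX_OFFSET_STEPS, MAX_OFFSET_STEPS) for _ in order]
    return {"order": order, "offsets": offsets}


def decode(trial: dict, slider_steps: list) -> dict:
    """Map displayed slider positions back to per-component settings.

    Assumed convention: effective setting = displayed position - offset.
    """
    return {
        component: position - offset
        for component, position, offset in zip(
            trial["order"], slider_steps, trial["offsets"]
        )
    }


trial = make_trial(random.Random(0))
# A participant who leaves every slider exactly at its offset has chosen
# the neutral setting (0) for every component.
neutral = decode(trial, trial["offsets"])
```

Decoding in this way is what allows slider positions from different excerpts to be averaged per musical component, regardless of which slider controlled it.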

Spatial Domain

Each musical component could be panned from $-90^{\circ}$ to $+90^{\circ}$ azimuth with a resolution of $18^{\circ}$ using the sine panning law, expressed by equations (1) and (2):

$$S_{s_{R}} = S_{m} \cdot \sin\left(\frac{a\pi}{2}\right) \tag{1}$$

$$S_{s_{L}} = S_{m} \cdot \sin\left(\frac{(1 - a)\pi}{2}\right) \tag{2}$$

where $S_{s_{R}}$ and $S_{s_{L}}$ are the right and left channels of a stereo signal, $S_{m}$ is a mono signal, and $a = (p / 180^{\circ}) + 0.5$, with $p$ being the panning angle from $-90^{\circ}$ to $90^{\circ}$. To produce the mono signals from the originally stereo-synthesized excerpts, equation (3) was used:

$$S_{m} = \frac{S_{s_{R}} + S_{s_{L}}}{2} \tag{3}$$

To characterize the spatial remixing, two measures were used: the bias, being the mean of the remixed angles of all four musical components of a single excerpt, and the spread, being the standard deviation of the remixed angles of all four musical components of a single excerpt. For further analysis, the bias and spread were averaged across all excerpts of a dataset for each participant.
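The panning law of equations (1) to (3) and the two summary measures can be sketched in a few lines. This is an illustrative sketch rather than the code used in the study; the function names are hypothetical, and the use of the population standard deviation for the spread is an assumption, as the text does not specify which variant was used.

```python
import math
import statistics

# The 11 selectable panning angles: -90 to +90 degrees in 18-degree steps.
PAN_POSITIONS_DEG = list(range(-90, 91, 18))


def pan_gains(p_deg: float) -> tuple:
    """Return (left_gain, right_gain) following equations (1) and (2)."""
    a = p_deg / 180.0 + 0.5
    right = math.sin(a * math.pi / 2.0)
    left = math.sin((1.0 - a) * math.pi / 2.0)
    return left, right


def downmix_to_mono(left: list, right: list) -> list:
    """Equation (3): average the two channels sample by sample."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]


def pan_mono(mono: list, p_deg: float) -> tuple:
    """Pan a mono signal to stereo; returns (left_samples, right_samples)."""
    g_left, g_right = pan_gains(p_deg)
    return [g_left * s for s in mono], [g_right * s for s in mono]


def bias_and_spread(angles_deg: list) -> tuple:
    """Bias = mean angle, spread = standard deviation of the angles.

    Assumption: population standard deviation (statistics.pstdev).
    """
    return statistics.fmean(angles_deg), statistics.pstdev(angles_deg)


# Example remix: melody, bass, accompaniment, percussion angles in degrees.
bias, spread = bias_and_spread([-18, 0, 18, 36])
```

Because $\sin((1 - a)\pi/2) = \cos(a\pi/2)$, the squared left and right gains sum to one at every angle, so the law keeps the total power of a component constant while it is moved across the stereo field.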

Amplitude Domain

For each musical component, the gain could be modified from $-12$ dB to $+6$ dB with a resolution of 2 dB. The gains for bass, accompaniment, and percussion were evaluated relative to the gain for melody. These relative gains were averaged across all excerpts in a dataset for each participant.
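As a small worked sketch of the amplitude measure, the following code converts slider gains from dB to linear amplitude factors and expresses each component relative to the melody. The names are hypothetical, and the sign convention (component gain minus melody gain, so that negative values mean the component is quieter than the melody) is an assumption based on how relative gains are reported in the results.

```python
# The 10 selectable gains: -12 dB to +6 dB in 2-dB steps.
GAIN_STEPS_DB = list(range(-12, 7, 2))


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def apply_gain(samples: list, gain_db: float) -> list:
    """Scale a track by the given gain in dB."""
    factor = db_to_linear(gain_db)
    return [factor * s for s in samples]


def relative_gains(gains_db: dict) -> dict:
    """Gain of each non-melody component relative to the melody, in dB.

    Assumed convention: component gain minus melody gain.
    """
    melody = gains_db["melody"]
    return {
        name: gain - melody
        for name, gain in gains_db.items()
        if name != "melody"
    }


# Example remix of one excerpt (values in dB, chosen for illustration only).
example = relative_gains(
    {"melody": 2, "bass": -2, "accompaniment": 0, "percussion": -4}
)
```

Expressing the gains relative to the melody removes the overall level of a remix from the analysis, so that only the balance between components is compared across participants.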

Spectral Domain

For each of the four musical components, a spectral filter was applied with a varying cutoff frequency. At the center of each slider, an all-pass filter was implemented. Moving the slider to the left resulted in a low-pass filter and moving the slider to the right resulted in a high-pass filter (both second-order filters, 12 dB/octave roll-off). Each step on the slider corresponded to halving or doubling the cutoff frequency of a high- or low-pass filter, resulting in the following frequencies: 109, 219, 438, 875, 1750, 3500, 7000, or 14,000 Hz.
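The cutoff frequencies listed above follow from repeatedly halving 14,000 Hz, which the sketch below reproduces. The mapping of slider steps to filters is an assumption (eight steps per side, with the first step away from the center being the mildest filter), and the magnitude formula assumes an ideal second-order Butterworth response, which the text does not specify beyond the 12 dB/octave roll-off.

```python
import math

# Halving 14 kHz seven times and rounding gives the listed cutoffs.
CUTOFFS_HZ = [round(14000 / 2 ** k) for k in range(7, -1, -1)]


def slider_to_filter(step: int) -> tuple:
    """Map a slider step to (filter_type, cutoff_hz).

    Assumed layout: step 0 is all-pass, steps -1..-8 are low-pass filters
    with decreasing cutoff, steps +1..+8 are high-pass filters with
    increasing cutoff.
    """
    if not -len(CUTOFFS_HZ) <= step <= len(CUTOFFS_HZ):
        raise ValueError("step out of range")
    if step == 0:
        return ("allpass", None)
    if step < 0:
        return ("lowpass", CUTOFFS_HZ[len(CUTOFFS_HZ) + step])
    return ("highpass", CUTOFFS_HZ[step - 1])


def butterworth2_gain_db(kind: str, cutoff_hz: float, f_hz: float) -> float:
    """Magnitude in dB of an ideal second-order Butterworth filter (f_hz > 0)."""
    ratio4 = (f_hz / cutoff_hz) ** 4
    if kind == "lowpass":
        power = 1.0 / (1.0 + ratio4)
    elif kind == "highpass":
        power = ratio4 / (1.0 + ratio4)
    else:
        raise ValueError("kind must be 'lowpass' or 'highpass'")
    return 10.0 * math.log10(power)


# Two octaves above the cutoff, a low-pass filter attenuates by about 24 dB.
attenuation = butterworth2_gain_db("lowpass", 875, 3500)
```

Under these assumptions the gain is about $-3$ dB at the cutoff frequency and falls by roughly 12 dB for every further octave, which is consistent with the roll-off stated in the text.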

Statistical Analysis

For all three domains, the random offset was removed and all slider positions for a specific musical component for each dataset were averaged for each participant. First, it was evaluated whether NH listeners and CI users remixed a dataset differently in any of the remixing domains. Group differences for NH listeners and CI users were evaluated separately for each dataset using a Mann–Whitney $U$ test with a significance level of $α = 0.05$ . A Bonferroni–Holm correction was performed to compensate for multiple comparisons. Secondly, it was investigated whether there were differences in remixing preferences across datasets.
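The two ingredients of the group comparison can be sketched in plain Python. This is not the software used in the study: it only computes the Mann–Whitney $U$ statistics (without the $p$-value) and the Bonferroni–Holm adjustment. The raw and corrected values reported in the results appear consistent with a family of four comparisons in the spatial and spectral domains and three in the amplitude domain, but that family size is an inference; all $p$-values other than the first one in the example are made up for illustration.

```python
def mann_whitney_u(x: list, y: list) -> tuple:
    """Return (U_x, U_y): pairwise wins of x over y and of y over x.

    Ties count as half a win for each group.
    """
    u_x = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u_x += 1.0
            elif xi == yj:
                u_x += 0.5
    return u_x, len(x) * len(y) - u_x


def holm_adjust(p_values: list) -> list:
    """Bonferroni-Holm adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, ...
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # keep monotonicity
        adjusted[idx] = running_max
    return adjusted


# 0.043 is a raw value from the text; the other three are hypothetical.
adjusted = holm_adjust([0.043, 0.2, 0.5, 0.9])
```

With four comparisons, the smallest raw value of 0.043 is multiplied by four and becomes 0.172, which no longer falls below the significance level of 0.05.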

The Trio+ dataset was additionally split into two arrangements. For each group, a separate Friedman test ( $α = 0.05$ ) was applied to check for dataset-dependent effects for each musical component or measure (bias and spread) and remixing domain.
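A minimal sketch of the Friedman statistic for this design is given below, with one row per participant and one column per condition (orchestral, Arrangement 1, Arrangement 2). It uses the standard formula without tie correction and the asymptotic chi-square approximation, both of which are assumptions about how the test was run; for three conditions there are two degrees of freedom, for which the chi-square survival function reduces to $\exp(-x/2)$. The example data are hypothetical.

```python
import math


def average_ranks(values: list) -> list:
    """Ascending 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def friedman_statistic(rows: list) -> float:
    """Friedman chi-square (no tie correction) for n rows x k conditions."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


def chi2_sf_df2(x: float) -> float:
    """Survival function of the chi-square distribution with 2 d.o.f."""
    return math.exp(-x / 2.0)


# Hypothetical relative gains (dB) of three participants in three conditions.
stat = friedman_statistic([[1, 2, 3], [2, 3, 4], [0, 5, 9]])
p_value = chi2_sf_df2(stat)
```

As a plausibility check, `chi2_sf_df2(11.13)` evaluates to roughly 0.004 and `chi2_sf_df2(15.44)` to less than 0.001, in line with the uncorrected values reported for the accompaniment and percussion components of the NH group.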

Lastly, the self-estimated music listening times of CI users were used to investigate whether frequent music listeners had different remixing preferences than occasional music listeners.

The CI users were grouped based on their self-reported estimated daily music listening time because more exposure and training with music is associated with better music enjoyment (Looi et al., 2012). CI users with a daily music listening time of 1 h or more were regarded as “frequent listeners” and the others as “occasional listeners.” This separation resulted in two groups of four and six CI users, respectively. Due to the unequal group sizes, IBM SPSS Statistics (Version 28, Armonk, USA) was used to perform a generalized linear mixed model (GLMM) analysis of the influence of music listening time and dataset (fixed factors) on the remixing. As random effects, participants and songs were chosen. For the amplitude domain, relative gains of bass, percussion, and accompaniment were analyzed. For the spectral domain, the cutoff frequency of each musical component was analyzed. For the spatial domain, the spread and bias features were analyzed. Factors were considered significant if their $p$ -value was $< 0.05$ .
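The grouping rule can be reproduced directly from the self-estimated listening times in Table 1. The sketch below covers only the split into frequent and occasional listeners, not the GLMM itself; the function name is hypothetical.

```python
import statistics

# Self-estimated daily music listening time in minutes (Table 1).
LISTENING_MIN = {
    "CI01": 20, "CI02": 30, "CI03": 30, "CI04": 180, "CI05": 30,
    "CI06": 0, "CI07": 60, "CI08": 30, "CI09": 300, "CI10": 60,
}
FREQUENT_THRESHOLD_MIN = 60  # "1 h or more"


def split_by_listening(minutes: dict, threshold: int = FREQUENT_THRESHOLD_MIN) -> tuple:
    """Return (frequent, occasional) participant IDs, each sorted."""
    frequent = sorted(pid for pid, m in minutes.items() if m >= threshold)
    occasional = sorted(pid for pid, m in minutes.items() if m < threshold)
    return frequent, occasional


frequent, occasional = split_by_listening(LISTENING_MIN)
mean_hours = statistics.fmean(LISTENING_MIN.values()) / 60.0
```

Applied to Table 1, the threshold yields four frequent listeners (CI04, CI07, CI09, and CI10) and six occasional listeners, and the mean listening time is about 1.2 h per day, matching the figures given in the text.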

Results

Remixing Preferences of NH Listeners and CI users

Spatial Domain

The results of the remixing experiment in the spatial domain are presented in Figure 2. A Mann–Whitney $U$ test indicated that in the Trio+ dataset, CI users panned the percussion component more toward the right than NH listeners, with a mean of $11.3^{\circ}$ and $-1.8^{\circ}$, respectively, which was not significant after Bonferroni–Holm correction ($U = 23$, $p = 0.043$, corrected $p^{*} = 0.172$). The same tendency was observed in the orchestral dataset, where CI users panned it significantly more toward the right than NH listeners, with a mean of $13^{\circ}$ and $-6.4^{\circ}$, respectively ($U = 12.50$, $p = 0.003$, corrected $p^{*} = 0.012$). While this was the only significant difference, the median responses of CI users for every musical component were further toward the right side than those of NH listeners. Therefore, the spatial bias measure was evaluated to investigate whether this was a general effect. Additionally, the spatial spread measure was evaluated to check for differences in the overall range chosen by the groups. The two measures are shown in Figure 3. A Mann–Whitney $U$ test indicated that in the Trio+ dataset, CI users had a spatial bias significantly further toward the right than NH listeners, with a mean of $8.2^{\circ}$ and $-1.9^{\circ}$, respectively ($U = 21$, $p = 0.029$). A similar result was found in the orchestral dataset, with a mean of $9.7^{\circ}$ and $0.3^{\circ}$, respectively ($U = 15$, $p = 0.007$). For the spread measure, no significant difference between the NH and CI groups was observed for either the Trio+ or the orchestral dataset.

Figure 2.

Spatial remixing preferences for the musical components for NH and CI users with the Trio+ dataset (left panel) and the orchestral dataset (right panel). The box denotes the 25% and 75% quartiles, crosses represent the individual results, whiskers extend up to 1.5 times the interquartile range, a horizontal line represents the median, and circles indicate outliers. Significant differences between groups are marked with an asterisk. NH: normal hearing; CI: cochlear implant.

Figure 3.

Spatial remixing preferences for the measures bias and spread for NH and CI users with the Trio+ dataset (left panel) and the orchestral dataset (right panel). The box denotes the 25% and 75% quartiles, crosses represent the individual results, whiskers extend up to 1.5 times the interquartile range, a horizontal line represents the median, and circles indicate outliers. Significant differences between groups are marked with an asterisk. NH: normal hearing; CI: cochlear implant.

Amplitude Domain

The amplitude remixing preferences are presented in Figure 4 as the gain in amplitude of each music component (bass, accompaniment, and percussion) relative to the amplitude of the melody component in dB. In the Trio+ dataset, a Mann–Whitney $U$ test indicated a significantly lower relative gain for the percussion component for NH listeners than for CI users, with a mean of $-4.8$ and $0.64$ dB, respectively ($U = 13$, $p = 0.004$, corrected $p^{*} = 0.012$). For the orchestral dataset, there were no significant differences between groups.

Figure 4.

Amplitude remixing preferences for NH and CI users with the Trio+ dataset (left panel) and the orchestral dataset (right panel). The box denotes the 25% and 75% quartiles, crosses represent the individual results, whiskers extend up to 1.5 times the interquartile range, a horizontal line represents the median, and circles indicate outliers. Significant differences between groups are marked with an asterisk. NH: normal hearing; CI: cochlear implant.

Spectral Domain

The spectral remixing preferences are presented as the cutoff frequencies in Hz in Figure 5. In the Trio+ dataset, there was no significant difference between groups. In the orchestral dataset, a Mann–Whitney $U$ test indicated that NH listeners removed more high frequencies from the bass component than CI users: on average, NH listeners applied a low-pass filter with a cutoff frequency of $11.7$ kHz, whereas CI users applied a high-pass filter with a cutoff frequency of $25$ Hz. This difference was not significant after Bonferroni–Holm correction ($U = 16$, $p = 0.017 / p^{*} = 0.068$).

Figure 5.

Group comparison of NH (red) and CI users (blue) for remixing cutoff frequencies in the spectral domain for the Trio+ dataset (left) and orchestral dataset (right). Each data point corresponds to the average spectral cutoff frequency of each subject across all excerpts. The box denotes the 25% and 75% quartiles, crosses represent the individual results, whiskers extend up to 1.5 times the interquartile range, a horizontal line represents the median, and circles indicate outliers. Significant differences between groups are marked with an asterisk. NH: normal hearing; CI: cochlear implant.

Remixing Preferences Across Datasets

Spatial Domain

Neither the spatial bias measure nor the spatial spread measure differed significantly across Trio+ Arrangements 1 and 2 and the orchestral dataset, for either NH listeners or CI users.

Amplitude Domain

There was no significant difference in remixing for the CI group across the orchestral dataset and Trio+ Arrangements 1 and 2 (A1 and A2) for any of the musical components. A Friedman test indicated that for the NH group, the accompaniment was remixed significantly differently across the datasets, with an average relative gain of $0.71$ dB in the orchestral dataset, $-0.1$ dB in A1, and $-3.4$ dB in A2 ($\chi^{2}(2) = 11.13$, $p = 0.004 / p^{*} = 0.016$). For the percussion component, a Friedman test also indicated a significantly different average relative gain across datasets, with $0.7$ dB in the orchestral dataset, $-4.8$ dB in A1, and $-5.0$ dB in A2 ($\chi^{2}(2) = 15.44$, $p < 0.001 / p^{*} < 0.001$). The other musical components showed no significant difference across datasets.

Spectral Domain

There was no significant difference in cutoff frequency selection for the CI group across datasets for any of the musical components. For NH listeners, a Friedman test indicated that only the bass component was remixed significantly differently across datasets ($\chi^{2}(2) = 9.28$, $p = 0.01 / p^{*} = 0.04$): a low-pass filter was applied with an average cutoff frequency of 11,760 Hz in the orchestral dataset, and of 21,000 Hz and 9,800 Hz in A1 and A2, respectively.
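The Friedman statistics reported in this section compare three related conditions (orchestral, A1, and A2), so they are referred to a chi-square distribution with two degrees of freedom, for which the upper-tail probability reduces to $\exp(-\chi^{2}/2)$. The sketch below computes the statistic from per-subject ranks and checks the reported uncorrected $p$ values. It omits the tie correction that statistical packages usually apply, and the function names and the small data matrix are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to k, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def friedman_chi2(rows):
    """Friedman statistic (no tie correction); rows = subjects, columns = conditions."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


def chi2_sf_df2(chi2):
    """Upper-tail probability of a chi-square variable with exactly 2 degrees of freedom."""
    return math.exp(-chi2 / 2.0)


# Hypothetical gains (dB) for three subjects in three conditions.
print(friedman_chi2([[1, 2, 3], [2, 3, 4], [0, 5, 9]]))

# Uncorrected p values for the statistics reported in the text.
for chi2 in (11.13, 15.44, 9.28):
    print(chi2, round(chi2_sf_df2(chi2), 4))
```

With two degrees of freedom, $\chi^{2} = 11.13$, $15.44$, and $9.28$ give $p \approx 0.004$, $p < 0.001$, and $p \approx 0.01$, in agreement with the uncorrected values reported above.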

Influence of Daily Music Listening Time

The results of the remixing experiment after a separation of the CI users into occasional and frequent music listeners based on their reported daily music listening time can be seen in Figure 6.

Spatial Domain

A GLMM indicated that in the CI group, frequent music listeners preferred a significantly larger spatial spread than occasional listeners, with $48.9^{\circ}$ and $32.8^{\circ}$, respectively ($F = 4.429$, $p = 0.036$). There was no significant effect of music listening time on the spatial bias measure.

Amplitude Domain

A GLMM indicated that in the CI group, frequent music listeners preferred a significantly higher relative gain for the bass component than occasional listeners, with $1.24$ dB and $-0.25$ dB, respectively ($F = 5.712$, $p = 0.018 / p^{*} = 0.036$). The relative percussion gain was also higher for frequent listeners than for occasional listeners, with $2$ dB and $-0.23$ dB, respectively ($F = 5.923$, $p = 0.016 / p^{*} = 0.048$). No significant effect of music listening time on relative accompaniment gain was observed.

Spectral Domain

A GLMM indicated that in the CI group, occasional music listeners removed more low frequencies than frequent listeners: on average, occasional listeners applied a high-pass filter with an 89 Hz cutoff, whereas frequent listeners applied a low-pass filter with a 16,520 Hz cutoff. This difference was not significant after Bonferroni–Holm correction ($F = 5.302$, $p = 0.022 / p^{*} = 0.088$).

Discussion

This study investigated whether CI users have different remixing preferences than NH listeners when listening to instrumental classical music (CI users via streaming, NH listeners via headphones). For the amplitude domain, we hypothesized that CI users would increase the gain of melody and percussion; for the spectral domain, that they would use filters to reduce non-relevant spectral components; and for the spatial domain, that they would remix musical components further apart than NH listeners.

Spatial Domain

CI users showed a remixing preference with a significant bias toward the right side for both the orchestral dataset and the Trio+ dataset. NH listeners, on the other hand, showed no preference toward either side. For CI users, all musical components contributed to this effect, with a median spatial remixing above $0^{\circ}$, but only the percussion component showed a significant bias. This may be related to the fact that most participants reported preferring their right CI for listening to music (eight out of 10). However, the two subjects who preferred their left CI for listening to music also showed a spatial remixing bias toward the right. Further experiments with subjects preferring their left CI are needed to confirm the bias toward the preferred side. Contrary to our expectation, there was no significant difference between NH listeners and CI users in the preferred broadness of the spatial distribution of musical components, as measured through spatial spread. However, the current study showed that CI users who listen to music more frequently preferred a significantly broader distribution of instruments across space than occasional listeners. A broader spatial distribution of instruments probably led to a clearer perception of the musical components for these subjects. This result may be related to the findings of Vannson et al. (2015), where CI users rated a dichotic song as clearer than a diotic or monaural version of the same song, and the findings of Buechner et al. (2020), where CI users reported higher music enjoyment for stereo than for mono music. The results of the current study indicate that it is important to individualize spatial remixing.

Amplitude Domain

There was no significant preference for the hypothesized increased gain for melody in either dataset. Our findings are in line with the results of Nagathil et al. (2017), where a 3 dB or 5 dB gain for the instrumental leading voice was not preferred by CI users. However, these results contrast with the preference of CI users for elevated vocals in pop music (Pons et al., 2016; Buyens et al., 2014; Tahmasebi et al., 2020; Gajecki & Nogueira, 2018). The differences between instrumental classical music and pop music might be explained by the strong salience of vocals in pop music, while the melody voice in classical music is often obscured by the surrounding instruments of the accompaniment. It is also possible that CI users increase the level of the vocals in pop music mainly because of the lyrics and not because of their melodic function.

Figure 6.

Group comparisons of occasional music listening cochlear implant (CI) users (light blue) and frequent music listening CI users (purple) for the bias and spread measure in the spatial domain (top left), for the relative gain in the amplitude domain (top right) and spectral domain (bottom). Each data point corresponds to the average remix value across all excerpts of a participant. The box denotes the 25% and 75% quartiles, crosses represent the individual results, whiskers extend up to 1.5 times the interquartile range, a horizontal line represents the median, and circles indicate outliers. Significant differences between groups are marked with an asterisk.

Another difference between the current study and previous ones that found a preference for more gain for vocals is that in previous studies the stimulation was provided monaurally through a single CI, while the current study and the study of Nagathil et al. (2017) provided bilateral stimulation. It has been shown that bilateral CI users obtain better music enjoyment than unilateral CI users (Veekmans et al., 2009).

In our study, CI users did not significantly increase the gain for the percussive element relative to the other musical components in either dataset. However, for the Trio+ dataset, CI users set the percussive gain on average about 5 dB higher than NH listeners, who described the percussive component as unpleasant and reduced its gain by about 5 dB. We argue that for CI users, the percussive component probably helped to follow the rhythm and tempo of the music while not causing an unpleasant sound sensation. Nonetheless, this result could also be explained by CI users simply being unable to perceive the unpleasantness of the sound, without the component providing any additional benefit. Moreover, this has only been tested in the context of trios containing a piano whose onsets were artificially emphasized. The result is nonetheless supported by the results of Buyens et al. (2014), who observed that CI users preferred boosted percussion for pop music. For the orchestral dataset, percussion occurred only sparsely and was not as useful in indicating the rhythm or tempo of the music as it was for the Trio+ dataset. This may explain why CI users did not enhance the percussion component in the orchestral dataset when compared to NH listeners. These results are consistent with the findings of Hwa et al. (2021), which showed a trend toward a slightly increased level for percussion. For the remaining musical components, namely bass and accompaniment, there was no significant difference in remixing preferences across groups in either dataset. CI users who frequently listen to music showed significantly different relative gains for the bass and the percussion compared with occasional listeners. These results are consistent with the findings of Buyens et al. (2014), where enhanced bass and percussion were observed when using the harmonic percussion source separation approach to remix the music. There was no difference in remixing preference for the accompaniment component across music-listening groups.

Spectral Domain

Contrary to our expectations, no significant difference between NH listeners and CI users was observed for the Trio+ dataset. Subjects preferred an all-pass setting, that is, no filtering was applied to any musical component. CI users reported that low- or high-pass filtering resulted in a less full sound. In contrast to our approach, Nagathil et al. (2017) applied signal processing based on PCA to reduce the spectral complexity while keeping the harmonic structure of the music. In their study, CI users obtained a benefit from this processing. Hwa et al. (2021) reported that CI subjects preferred spectral processing of music consisting of bass and treble gains of 5 and 4 dB, respectively, resulting in a slight bass boost. For the orchestral dataset, NH listeners preferred applying a low-pass filter to the bass component. No preference for spectral remixing was observed for CI users. For this particular dataset, the bass component contained a single instrument that, after loudness equalization, presented a high-pitched noise due to the recording. It is likely that NH listeners were able to perceive this noise and removed it through the low-pass filter, while CI users were unable to perceive it. While this does not directly contribute to better music remixing, it still underlines that artifacts in music affect NH listeners and CI users differently. When looking at the subgroups, the results of the study show that occasional CI music listeners set higher cutoff frequencies for the percussive component than frequent CI music listeners, although this difference failed to reach statistical significance after correction for multiple comparisons.

Further research is needed to investigate how strongly the confounding variables, that is, the differences in age and musical background across groups, affect the results. A follow-up experiment should investigate how age difference is related to music style preference, sound perception in general, and music remix preferences. NH listeners tend to play musical instruments or sing more often than CI users (Migirov et al., 2009), which may explain differences in remixing across groups. More exposure to music may allow them to capture more detailed differences across tracks and music in general.

To create a practical experimental paradigm, some constraints on the selection of pieces were imposed: for example, an instrument does not switch between musical components during an excerpt, all musical components are present at all times, and instruments play at a steady level. These constraints need to be considered before generalizing to instrumental classical music. For example, in instrumental classical music, the instruments that contribute to the melody component often change quickly. This aspect and others were not present in our datasets. In this study, we evaluated trios and orchestral music pieces. However, further research is required to determine the generalizability of our findings to the entire genre. We would argue that in pieces without any percussive element (e.g. string quartets), the percussion track could in the future remain empty, while the preferred remixing of the other components (melody, bass, and accompaniment) would stay the same as in our findings. However, it is possible that the addition of the percussion track in the Trio+ dataset led to a different preferred remixing than without it, even though participants were able to remove it during the experiment. An additional point that needs to be investigated is the influence of the mode of listening. For example, the spectral characteristics of the microphone of the CI and the influence of the room acoustics may have an impact on remixing preference that could not be taken into account with our test setup based on the TV audio streamer. Additionally, a larger sample of participants is needed to validate our findings on the different remixing preferences of frequent and occasional music-listening CI users. It also seems beneficial to investigate additional subjective confounding factors such as familiarity with classical music, with musical concepts in general, or aesthetic preferences.

In summary, the current study showed that CI users had different remixing preferences for instrumental classical music than NH listeners. Specifically, CI users preferred higher gain on a percussion component that provided clear musical rhythm and tempo. In addition, CI users generally preferred all instruments to be panned to their preferred side, that is, the right side, with the percussive component panned the most. There are indications that the preferred remix is influenced by individual factors such as how often a CI user listens to music; however, further studies with a larger population are required to validate these findings. Additionally, it is of interest to investigate the influence of other factors, such as familiarity with the music material or style, on remixing preferences. As CI users experience limitations when listening to music and their experience of music is very individual, signal processing algorithms could be adapted to make music more accessible to them. Many people today listen to music through computers or smartphones, which can be connected to the CI in a similar way as in this study. Customized algorithms can be easily implemented in these devices. Generic source separation algorithms could be used to separate the tracks of instrumental music and remix them according to the needs of each CI user. For some CI users, improving music enjoyment might increase the amount of time spent listening to music, which in turn might be beneficial for their auditory rehabilitation (e.g. Dincer D’Alessandro et al., 2022).
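To make the idea of a user-dependent remix concrete, the following minimal sketch applies a per-component gain (in dB) and panning angle to already separated mono stems and sums them into a stereo signal. It is an illustration under stated assumptions, not the processing used in this study: the constant-power pan law, the $\pm 90^{\circ}$ angle range, the function names, and the example settings are all hypothetical. Positive angles pan toward the right, following the sign convention of the results.

```python
import math


def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def pan_gains(angle_deg, max_angle_deg=90.0):
    """Constant-power pan law (assumed): returns (left, right) gains.

    -max_angle_deg is fully left, 0 is centered, +max_angle_deg is fully right.
    """
    angle = max(-max_angle_deg, min(max_angle_deg, angle_deg))
    theta = (angle + max_angle_deg) / (2.0 * max_angle_deg) * (math.pi / 2.0)
    return math.cos(theta), math.sin(theta)


def remix(stems, settings):
    """Mix mono stems into stereo.

    stems:    component name -> list of mono samples
    settings: component name -> (gain_db, angle_deg)
    """
    length = max(len(samples) for samples in stems.values())
    left = [0.0] * length
    right = [0.0] * length
    for name, samples in stems.items():
        gain_db, angle_deg = settings[name]
        gain = db_to_linear(gain_db)
        gain_left, gain_right = pan_gains(angle_deg)
        for i, sample in enumerate(samples):
            left[i] += gain * gain_left * sample
            right[i] += gain * gain_right * sample
    return left, right


# Hypothetical two-sample stems and user settings (gain in dB, angle in degrees).
stems = {"melody": [0.5, -0.5], "percussion": [0.2, 0.1]}
settings = {"melody": (0.0, 0.0), "percussion": (1.0, 15.0)}
left, right = remix(stems, settings)
print(left, right)
```

In a complete system, the stems would come from a source separation algorithm and the settings from the preferences of the individual CI user.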

Conclusions

For Western instrumental classical music, CI users’ remixing preferences differ from those of NH listeners in the spatial domain. CI users panned the instruments toward the right side, especially the percussive element. Remixing preferences differed between occasional and frequent music-listening CI users, which suggests that a user-dependent remixing system is desirable.

Footnotes

Acknowledgements

The authors express their gratitude to Liza Lengert for her valuable contribution, to Classical Archives for the supply of MIDI files, to Native Instruments for the supply of the Kontakt library, and to the participants for their voluntary participation.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)—Project ID: 446611346.

ORCID iD

Jonas Althoff

References

Aronoff J. M., Yoon, Freed D. J., Vermiglio A. J., Pal, Soli S. D. (2010). The use of interaural time and level difference cues by bilateral cochlear implant users. The Journal of the Acoustical Society of America, 127(3), EL87–EL92. https://doi.org/10.1121/1.3298451

Buechner, Krueger, Klawitter, Zimmermann, Fredelake, Holube (2020). The perception of the stereo effect in bilateral and bimodal cochlear implant users and its contribution to music enjoyment. PLoS ONE, 15(7), e0235435. https://doi.org/10.1371/journal.pone.0235435

Buffa, Hallili, Gonin P. R. (2015). MT5: A HTML5 multitrack player for musicians. In Proceedings of the international web audio conference. https://inria.hal.science/hal-01150455

Buyens, Van Dijk, Moonen, Wouters (2014). Music mixing preferences of cochlear implant recipients: A pilot study. International Journal of Audiology, 53(5), 294–301. https://doi.org/10.3109/14992027.2013.873955

Buyens, Van Dijk, Wouters, Moonen (2015). A stereo music preprocessing scheme for cochlear implant users. IEEE Transactions on Biomedical Engineering, 62(10), 2434–2442. https://doi.org/10.1109/TBME.2015.2428999

Cherry E. C. (1953). Some experiments on the recognition of speech, with one and with two ears. Journal of the Acoustical Society of America, 25(5), 975–979. https://doi.org/10.1121/1.1907229

Dincer D’Alessandro, Boyle P. J., Portanova, Mancini (2022). Music perception and speech intelligibility in noise performance by Italian-speaking cochlear implant users. European Archives of Oto-Rhino-Laryngology, 279(8), 3821–3829. https://doi.org/10.1007/s00405-021-07103-x

Gajecki, Nogueira (2018). Deep learning models to remix music for cochlear implant users. The Journal of the Acoustical Society of America, 143(6), 3602–3615. https://doi.org/10.1121/1.5042056

Gauer, Nagathil, Eckel, Belomestny, Martin (2022). A versatile deep-neural-network-based music preprocessing and remixing scheme for cochlear implant listeners. The Journal of the Acoustical Society of America, 151(5), 2975–2986. https://doi.org/10.1121/10.0010371

Gfeller, Christ, Knutson, Witt, Mehr (2003). The effects of familiarity and complexity on appraisal of complex songs by cochlear implant recipients and normal hearing adults. Journal of Music Therapy, 40(2), 78–112. https://doi.org/10.1093/jmt/40.2.78

Gfeller, Christ, Knutson J. F., Witt, Murray K. T., Tyler R. S. (2000). Musical backgrounds, listening habits, and aesthetic enjoyment of adult cochlear implant recipients. Journal of the American Academy of Audiology, 11(7), 390–406. https://doi.org/10.1055/s-0042-1748126

Hwa T. P., Tian L. L., Caruana, Chun, Mancuso, Cellum I. P., Lalwani A. K. (2021). Novel web-based music re-engineering software for enhancement of music enjoyment among cochlear implantees. Otology and Neurotology, 42(9), 1347–1354. https://doi.org/10.1097/MAO.0000000000003262

Joshua K. C. C., Ann Yi C. C., McMahon, Hsieh J. C., Tung T. H., Lieber P. H. L. (2010). Music training improves pitch perception in prelingually deafened children with cochlear implants. Pediatrics, 125(4), e793–e800. https://doi.org/10.1542/PEDS.2008-3620

Kohlberg G. D., Mancuso D. M., Chari D. A., Lalwani A. K. (2015). Music engineering as a novel strategy for enhancing music enjoyment in the cochlear implant recipient. Behavioural Neurology, 2015, Article ID 829680. https://doi.org/10.1155/2015/829680

Krueger, Joseph, Rost, Strauß-Schier, Lenarz, Buechner (2008). Performance groups in adult cochlear implant users: Speech perception results from 1984 until today. Otology and Neurotology, 29(4), 509–512. https://doi.org/10.1097/MAO.0b013e318171972f

Lassaletta, Castro, Bastarrica, Pérez-Mora, Madero, De Sarriá, Gavilán (2007). Does music perception have an impact on quality of life following cochlear implantation? Acta Oto-Laryngologica, 127(7), 682–686. https://doi.org/10.1080/00016480601002112

Limb C. J., Roy A. T. (2014). Technological, biological, and acoustical constraints to music perception in cochlear implant users. Hearing Research, 308, 13–26. https://doi.org/10.1016/J.HEARES.2013.04.009

Looi, King, Kelly-Campbell (2012). A music appreciation training program developed for clinical application with cochlear implant recipients and hearing aid users. Seminars in Hearing, 33(4), 361–380. https://doi.org/10.1055/s-0032-1329225

McDermott H. J. (2004). Music perception with cochlear implants: A review. Trends in Amplification, 8(2), 49–82. https://doi.org/10.1177/108471380400800203

Meyer (2009). Seating arrangement in the concert hall. In Acoustics and the Performance of Music (pp. 263–346). Springer.

Migirov, Kronenberg, Henkin (2009). Self-reported listening habits and enjoyment of music among adult cochlear implant recipients. Annals of Otology, Rhinology and Laryngology, 118(5), 350–355. https://doi.org/10.1177/000348940911800506

Mosnier, Sterkers, Bebear J. P., Godey, Robier, Deguine, Fraysse, Bordure, Mondain, Bouccara, Bozorg-Grayeli, Borel, Ambert-Dahan, Ferrary (2009). Speech performance and sound localization in a complex noisy environment in bilaterally implanted adult patients. Audiology and Neurotology, 14(2), 106–114. https://doi.org/10.1159/000159121

Nagathil, Weihs, Neumann, Martin (2017). Spectral complexity reduction of music signals based on frequency-domain reduced-rank approximations: An evaluation with cochlear implant listeners. The Journal of the Acoustical Society of America, 142(3), 1219–1228. https://doi.org/10.1121/1.5000484

Nogueira, Nagathil, Martin (2019). Making music more accessible for cochlear implant listeners: Recent developments. IEEE Signal Processing Magazine, 36(1), 115–127. https://doi.org/10.1109/MSP.2018.2874059

Pätynen, Pulkki, Lokki (2008). Anechoic recording system for symphony orchestra. Acta Acustica United with Acustica, 94(6), 856–865. https://doi.org/10.3813/AAA.918104

Pawlowski (2022). Foobar2000 (1.6.9). https://www.foobar2000.org/

Pons, Janer, Rode, Nogueira (2016). Remixing music using source separation algorithms to improve the musical experience of cochlear implant users. The Journal of the Acoustical Society of America, 140(6), 4338–4349. https://doi.org/10.1121/1.4971424

Robinson (2013). ReplayGain. https://wiki.hydrogenaud.io/index.php?title=ReplayGain

Ronen (2015). Vocal clarity in the mix: Techniques to improve the intelligibility of vocals. In 139th Audio Engineering Society International Convention, AES 2015. Audio Engineering Society. http://www.aes.org/e-lib/browse.cfm?elib=18001

Tahmasebi, Gajecki, Nogueira (2020). Design and evaluation of a real-time audio source separation algorithm to remix music for cochlear implant users. Frontiers in Neuroscience, 14, 434. https://doi.org/10.3389/fnins.2020.00434

Tahmasebi, Segovia-Martinez, Nogueira (2023). Optimization of sound coding strategies to make singing music more accessible for cochlear implant users. Trends in Hearing, 27, 1–18. https://doi.org/10.1177/23312165221148022

van Hoesel R. J. M., Tyler R. S. (2003). Speech perception, localization, and lateralization with bilateral cochlear implants. The Journal of the Acoustical Society of America, 113(3), 1617–1630. https://doi.org/10.1121/1.1539520

Vannson, Innes-Brown, Marozeau (2015). Dichotic listening can improve perceived clarity of music in cochlear implant users. Trends in Hearing, 19, 1–10. https://doi.org/10.1177/2331216515598971

Veekmans, Ressel, Mueller, Vischer, Brockmeier S. J. (2009). Comparison of music perception in bilateral and unilateral cochlear implant users and normal-hearing subjects. Audiology and Neurotology, 14(5), 315–326. https://doi.org/10.1159/000212111

Wright, Uchanski R. M. (2012). Music perception and appraisal: Cochlear implant users and simulated cochlear implant listening. Journal of the American Academy of Audiology, 23(5), 350–365. https://doi.org/10.3766/JAAA.23.5.6/BIB

Remixing Preferences for Western Instrumental Classical Music of Bilateral Cochlear Implant Users
