Abstract
Users of bilateral cochlear implants (CIs) show above-chance performance in localizing the source of a sound in the azimuthal (horizontal) plane; although localization errors are far worse than for normal-hearing listeners, they are considerably better than for CI listeners with only one implant. In most previous studies, subjects had access to interaural level differences and to interaural time differences conveyed in the temporal envelope. Here, we present a binaural model that predicts the azimuthal direction of sound arrival from a two-channel input signal as it is received at the left and right CI processor. The model includes a replication of a clinical speech-coding strategy, a model of the electrode-nerve interface and binaural brainstem neurons, and three different prediction stages that are trained to map the neural response rate to an azimuthal angle. The model is trained and tested with various noise and speech stimuli created by means of virtual acoustics. Localization error patterns of the model match experimental data and are explicable largely in terms of the nonmonotonic relationship between interaural level difference and azimuthal angle.
Introduction
Normal-hearing (NH) subjects can localize the source of a sound with high acuity across a wide range of azimuthal (horizontal) locations. The highest precision that can be achieved for azimuthal differences in the frontal field—the minimum audible angle—is approximately 1° (e.g., Mills, 1958). This precision has been attributed to exquisite sensitivity to interaural time differences (ITDs) conveyed in the temporal fine structure of low-frequency sounds, particularly in the frequency range of 500 to 800 Hz (Wightman & Kistler, 1992). Cochlear implant (CI) listeners are generally unable to exploit this cue, even with identical CI processors in both ears. One of the reasons is that the ITD is often not preserved in the electrical pulse timing. Another reason is that most CI listeners are insensitive to ITDs if the electrical pulse trains that convey ITDs are presented at a rate greater than 500 pulses per second (pps; e.g., Majdak, Laback, & Baumgartner, 2006). CI listeners instead rely on other localization cues that are generally available to both NH and bilaterally implanted listeners, such as ITDs conveyed in the temporal envelope of modulated sounds (envelope ITDs) and interaural level differences (ILDs). Spectral cues generated by the interaction of the sound with the head and outer ears also contribute to some degree to localization performance in both the horizontal and vertical planes and are available even in monaural listening. The extent to which CI users can exploit these cues, however, is further limited by the microphone position, the CI processing, and the spread of excitation. Together these aspects result in the very limited localization performance of unilateral CI listeners (Kerber & Seeber, 2012).
Localization performance is typically assessed with broadband noise (e.g., Kerber & Seeber, 2012) or speech stimuli (e.g., Potts, Skinner, Litovsky, Strube, & Kuk, 2009) presented via loudspeakers in free-field conditions to provide a wide range of localization cues. The localization performance of different listener groups can then be directly inferred from the availability and absence of the localization cues. Compared with unilateral CIs, the use of bilateral CIs greatly improves the accuracy with which sounds can be localized in the azimuthal plane (e.g., Litovsky, Parkinson, & Arcaroli, 2009; Nopp, Schleich, & D’haese, 2004), which can be attributed to bilateral CI users exploiting the ILD, and potentially the envelope ITD, in their judgments (Laback, Pok, Baumgartner, Deutsch, & Schmid, 2004; Seeber & Fastl, 2008). Nevertheless, most studies reveal a substantial gap in localization performance between NH listeners and users of bilateral CIs (Grantham, Ashmead, Ricketts, Labadie, & Haynes, 2007; Litovsky et al., 2012), likely resulting from the lack of fine-structure ITD information, as well as the independent operation of left and right CI devices, left and right differences in the processor settings, and the electrode-nerve interfaces. Kerber and Seeber (2012) tested the localization ability of NH listeners, bilateral CI users, and unilateral CI users in the same task. Their results revealed a median root mean square (RMS) localization error of about 5° for NH listeners, 30° for bilateral CI listeners, and 47° for the two best performing unilateral CI listeners, whilst two other unilateral listeners were unable even to discriminate between left and right.
Távora-Vieira, De Ceulaer, Govaerts, and Rajan (2015) further showed that CI subjects suffering from unilateral deafness showed significant improvements in localization abilities when the CI was switched on (mean RMS error = 22.8°) compared with when it was not (mean RMS error = 48.9°), despite the very different sound stimulation provided to each ear.
To gain a deeper insight into the origin of the localization errors made by bilateral CI subjects, Jones, Kan, and Litovsky (2014) compared their performance with that of NH listeners listening through a vocoder simulation of virtual acoustic sounds. They found comparable performance of about 30° RMS error and a very similar pattern of systematic errors. The comparison of CI data with NH data using vocoder simulations has proven beneficial in the past because the simulations enable testing of two almost identical ears with identical preprocessing and also show a low intersubject variability (e.g., Goupell & Litovsky, 2014). Therefore, the comparable performances between vocoded NH and CI subjects reported by Jones et al. suggest that the absence of fine-structure ITD is the most limiting performance factor with differences between left and right CI playing a less critical role.
Of the remaining two interaural cues available to typical bilateral CI users, namely envelope ITDs and ILDs, the latter is the more salient and dominant cue (Laback et al., 2004; Seeber & Fastl, 2008). Using their clinical speech processors, subjects could discriminate acoustic ILDs of approximately 2 dB, only about twice as high as NH listeners (Laback et al., 2004).
Despite ILDs being the most important cue for sound source localization with bilateral CIs, little information exists as to how the azimuthal direction of sound arrival translates into different electrode activations, different auditory nerve (AN) response rates, and, ultimately, a perceived direction of sound arrival. Computer models complemented by experimental data are expected to reveal this, by dissecting the specific contribution of each stage of the long processing chain. For example, some monaural models combine a speech-coding strategy with a model of the electrode-nerve interface (e.g., Imennov & Rubinstein, 2009; Stadler & Leijon, 2009). Such models are useful in clarifying whether specific differences between electric and acoustic hearing are caused by the speech-coding strategy or rather by the electrode-nerve interface. To understand spatial hearing with CIs, even more processing stages are required, namely the filtering of the sound by the head and torso, typically referred to as the direction-dependent head-related transfer function (HRTF), as a front end, and the binaural interaction as an additional back end.
To date, such direction-estimating models exist only for NH listeners (e.g., Dietz, Ewert, & Hohmann, 2011; Faller & Merimaa, 2004). They have found various applications in wave field synthesis (e.g., Wierstorf, Raake, & Spors, 2013), computational auditory scene analysis (Spille, Meyer, Dietz, & Hohmann, 2013b; Woodruff & Wang, 2013), and automatic speech recognition (e.g., Spille, Dietz, Hohmann, & Meyer, 2013a). Direction-estimating models for bilateral CI listeners could further be applied to CI algorithm development or individualized performance prediction. However, existing binaural models of CI listening (e.g., Chung, Delgutte, & Colburn, 2014; Colburn, Chung, Zhou, & Brughera, 2009; Nicoletti, Wirtz, & Hemmert, 2013) have mostly been concerned with the response of binaural neurons in the brainstem to electrical stimulation. So far, no published CI model combines all necessary processing stages required to estimate the direction of sound arrival based on simulated neural responses.
Here, we present a processing chain for modeling the spatial hearing of bilateral CI listeners by combining several existing models. The processing chain includes (a) the sound interacting with the HRTF, (b) the CI speech processor, (c) the electrode-nerve interface, (d) a model of the AN responses, (e) a neural stage of binaural processing, and (f) three models of a mapping stage, which estimate an azimuthal direction of the sound source based on the neural left and right input. Investigating the input–output characteristics of the key stages and the nonlinear interaction of some stages will allow for a detailed investigation of the final prediction and of any systematic error. The model-based estimates will also be compared against published data and be used to test hypotheses made in the previously described studies. In particular, the dominance of the ILD and the detrimental effect of the level-dependent compression will be investigated.
Methods
The model was implemented in MATLAB. The code is publicly available.¹
After introducing the stimuli used, the following subsections will follow the processing chain of the stimuli by the model stages as illustrated in Figure 1.
Flow chart of the implemented processing chain. Each stage is described in detail in the Methods section. The localization models operate either on the AN output or on the output of the binaural interaction stage (see last part of the Methods section for details). The output of the binaural interaction stage still has two channels, representing left and right hemisphere brainstem neurons.
Stimuli and HRTF Filtering
Four different stimuli were used to test the model:
1. Stationary speech-shaped noise (SSN), generated from 10 male and 10 female adult German speakers uttering sentences of the Oldenburg Sentence Test (OLSA) by the procedure described in Wagener, Kühnel, and Kollmeier (1999): OLSA noise;
2. Pink noise (PN), that is, stationary noise with a 1/f power spectrum;
3. White Gaussian noise (WN); and
4. A male speech token (Sp).
Dichotic, free-field stimuli were produced from these four stimuli using a previously generated database of head-related impulse responses (Kayser et al., 2009). The head-related impulse responses were taken from the frontal one of the three behind-the-ear microphones mounted on an artificial human head and torso at a virtual source distance of 3 m. The free-field stimuli were then generated at 5° intervals for the azimuthal angles between 0° and 90°, generating 19 virtual sources for each stimulus.
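As an illustration, the dichotic rendering step amounts to convolving the mono stimulus with a left/right impulse-response pair. The sketch below (in Python, for illustration only) uses toy delay-and-attenuate HRIRs as stand-ins for measured responses such as those of Kayser et al. (2009):

```python
import numpy as np

def apply_hrir(mono, hrir_left, hrir_right):
    """Convolve a mono stimulus with a left/right head-related impulse
    response pair, producing a two-channel (dichotic) signal."""
    return np.stack([np.convolve(mono, hrir_left),
                     np.convolve(mono, hrir_right)])

fs = 16000
rng = np.random.default_rng(0)
mono = rng.standard_normal(fs // 5)        # a 200-ms noise burst

# Toy HRIRs: the ipsilateral ear receives the direct path, the
# contralateral ear a delayed copy attenuated by 6 dB (illustrative
# values, not entries of a measured database).
hrir_ipsi = np.zeros(64); hrir_ipsi[0] = 1.0
hrir_contra = np.zeros(64); hrir_contra[10] = 0.5

dichotic = apply_hrir(mono, hrir_ipsi, hrir_contra)
broadband_ild = 20 * np.log10(np.std(dichotic[0]) / np.std(dichotic[1]))
```

For this toy lateral source, the two channels differ by the expected broadband ILD of about 6 dB; with measured HRIRs, the ILD additionally becomes frequency and azimuth dependent.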
All experimental stimuli were 200 ms in duration and adjusted to have 10-ms rise and fall times. Model experiments were performed at three different stimulus levels of 45, 55, and 65 dB sound pressure level (SPL) for the frontal source direction. When the signal was presented from a nonfrontal direction, the same calibration was used, so the level could deviate from the frontal level.
In addition to these four main stimuli, two more natural stimuli were used for additional testing. One was 10 s of continuous male and female speech, obtained from the OLSA, to examine predictive ability in the case of dynamic level fluctuations. The second was SSN combined with a 360° white noise interferer at 5 dB signal-to-interferer ratio to examine the effects of interfering sound on the predictive ability of the model.
CI Processing
The transformation from sound to electrodogram was performed using the advanced combination encoder (ACE) strategy implemented in the Cochlear Nucleus 24 implant (Laneau, 2005). The acoustic broadband signal was sampled at 16 kHz and filtered into 22 frequency bands using a 128-point FFT. The frequency bands had center frequencies linearly spaced below 1000 Hz and logarithmically spaced above 1000 Hz. As part of the ACE processing strategy, an n-of-m strategy was implemented by selecting the 8 most energetic of the 22 channels in each 8-ms time frame. Out of the eight selected bands, the most basal band was stimulated first during each time frame, followed sequentially by the next one. Frequency-independent compression was then performed on each channel by applying a loudness growth function with a steepness controlled by the parameter αc = 415.96 (see Swanson, 2008 for further details). Threshold levels of TSPL = 25 dB SPL and maximum levels MSPL = 65 dB SPL were set, and the resulting levels were then mapped to the CI electrode threshold and saturation levels that were specified in clinical units (CL) of T = 100 CL and M = 200 CL, respectively. These values were then mapped onto output current values.
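A minimal sketch of the frame-wise n-of-m selection and loudness growth compression described above (illustrative Python, not the MATLAB implementation; the band-envelope limits passed to the loudness growth function are assumed values, since the true limits depend on the filterbank scaling):

```python
import numpy as np

ALPHA = 415.96          # loudness growth steepness (Swanson, 2008)
T_CL, M_CL = 100, 200   # threshold / comfort levels in clinical units

def lgf(v, base_level, sat_level):
    """Logarithmic loudness growth function mapping a band envelope
    amplitude to the range [0, 1]."""
    x = np.clip((v - base_level) / (sat_level - base_level), 0.0, 1.0)
    return np.log1p(ALPHA * x) / np.log1p(ALPHA)

def ace_frame(band_env, n_select=8):
    """One 8-ms ACE frame: keep the n most energetic of the 22 bands and
    map their envelopes to clinical units; unselected bands stay off (NaN)."""
    selected = np.argsort(band_env)[::-1][:n_select]
    cl = np.full(band_env.shape, np.nan)
    p = lgf(band_env, base_level=0.01, sat_level=1.0)  # assumed limits
    cl[selected] = T_CL + (M_CL - T_CL) * p[selected]
    return cl

rng = np.random.default_rng(1)
env = rng.uniform(0.0, 1.0, 22)   # hypothetical band envelopes for one frame
frame = ace_frame(env)
```

Each frame thus contains exactly eight active channels with levels confined to the clinical range between T and M.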
Electrode-Nerve Interface
The spread of current within the cochlea and the response of the population of AN fibers was modeled as in Fredelake and Hohmann (2012). The model assumed an unwound cochlea with a length of 35 mm. The 22 electrodes were equally distributed between 8.125 and 23.875 mm (measured from the apex) at a spacing of 0.75 mm, simulating a Cochlear Nucleus 24 electrode array. The electrode locations inside this virtual cochlea corresponded to acoustic frequencies between 363 and 4332 Hz according to the frequency-to-place function of Greenwood (1990). The spread of current in the cochlea was modeled by a double-sided, one-dimensional, exponentially decaying spatial-spread function controlled with the parameter λ = 9 mm. Along the virtual cochlea, 500 AN fibers were equally distributed. To simulate individual AN fiber stimulation to electric pulses, the electric pulse trains were processed by a deterministic leaky integrate-and-fire model (Gerstner & Kistler, 2002) extended with a zero-mean Gaussian noise source, to simulate stochastic behavior of the AN fibers. The model processed electric pulse trains with a stimulus-level-dependent current amplitude across electrodes as input. The model output was a vector of AN spike times over the duration of the acoustic stimulus for each AN fiber in the population.
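The two components of this stage can be sketched as follows: an exponential spatial-spread function summing electrode currents at each fiber position, and a noisy leaky integrate-and-fire neuron. All membrane constants below are illustrative placeholders, not the fitted values of Fredelake and Hohmann (2012):

```python
import numpy as np

def current_at_fibers(electrode_pos, fiber_pos, amplitudes, lam=9.0):
    """Sum exponentially decaying contributions of all electrodes
    (decay constant lam in mm) at each fiber position."""
    d = np.abs(fiber_pos[:, None] - electrode_pos[None, :])
    return (amplitudes[None, :] * np.exp(-d / lam)).sum(axis=1)

def lif_spike_times(drive, dt=1e-5, tau=5e-4, thresh=1.0,
                    noise_sd=0.05, seed=0):
    """Deterministic leaky integrate-and-fire neuron extended with a
    zero-mean Gaussian noise source; returns spike times in seconds."""
    rng = np.random.default_rng(seed)
    v, times = 0.0, []
    for n, i in enumerate(drive):
        v += dt / tau * (i - v) + noise_sd * np.sqrt(dt) * rng.standard_normal()
        if v >= thresh:
            times.append(n * dt)
            v = 0.0                    # reset after each spike
    return np.array(times)

# 22 electrodes at 0.75-mm spacing and 500 fibers on a 35-mm cochlea:
electrodes = np.linspace(8.125, 23.875, 22)
fibers = np.linspace(0, 35, 500)
amps = np.zeros(22); amps[10] = 2.0    # a single active electrode
drive_profile = current_at_fibers(electrodes, fibers, amps)
spikes = lif_spike_times(np.full(2000, 2.0))   # 20 ms of constant drive
```

With a single active electrode, the drive profile peaks at the fibers nearest that electrode and decays exponentially toward the apex and base.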
AN Frequency Bands
Models of binaural processing assume convergent input from a certain number of AN fibers. The model described in the binaural interaction stage used 20 AN inputs from either side (Wang & Colburn, 2012) so that sample groups of 20 fibers along the model cochlea were combined for further analysis of the AN responses. Instead of presenting all 25 possible AN groups from the 500 fibers, only five sample groups were selected, centered at 10, 13, 16, 19, and 22 mm from the apex, corresponding to 5 segments, each covering 2 electrodes in the range of electrodes 3 to 20. The five mean center frequencies of the five electrode pairs were 563, 1063, 1813, 3188, and 5500 Hz. To compute the corresponding acoustic free-field ILDs, left and right audio channels were filtered using equivalent rectangular bandwidth-wide fourth-order gammatone filters (Hohmann, 2002) with the same five center frequencies mentioned earlier. The RMS power was computed in each frequency band and the difference between the right and left channels produced the final frequency-dependent ILD.
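The band-wise acoustic ILD computation can be sketched as band-pass filtering followed by an RMS comparison. Here a Butterworth band-pass stands in for the fourth-order gammatone filters of Hohmann (2002), and the relative bandwidth is an illustrative choice rather than an ERB value:

```python
import numpy as np
from scipy.signal import butter, sosfilt

FS = 16000
CENTER_F = [563, 1063, 1813, 3188, 5500]     # Hz, the five band centers

def band_ild(left, right, fc, rel_bw=0.25):
    """ILD (dB, right minus left) in one analysis band.  A Butterworth
    band-pass is used as a stand-in for a gammatone filter."""
    sos = butter(2, [fc * (1 - rel_bw), fc * (1 + rel_bw)],
                 btype="bandpass", fs=FS, output="sos")
    rms = lambda x: np.sqrt(np.mean(sosfilt(sos, x) ** 2))
    return 20 * np.log10(rms(right) / rms(left))

rng = np.random.default_rng(2)
noise = rng.standard_normal(FS // 5)          # 200 ms of white noise
left, right = 0.5 * noise, noise              # left ear attenuated by 6 dB
ilds = [band_ild(left, right, fc) for fc in CENTER_F]
```

Because the attenuation in this toy example is frequency independent, all five bands recover the same 6-dB ILD; with HRTF-filtered input, the values diverge across bands.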
Binaural Interaction Stage
A Hodgkin–Huxley-type model was used to model binaural interaction at the level of the brainstem (Wang & Colburn, 2012). The single-compartment model contained a sodium channel, a high-threshold potassium channel, and a passive leak channel.
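The full Hodgkin–Huxley dynamics are beyond a short sketch, but the net excitatory–inhibitory character of an LSO cell can be caricatured as a rectified rate comparison. This is a deliberate simplification of the Wang and Colburn (2012) model, not its implementation, and the gains are illustrative:

```python
import numpy as np

def lso_rate(ipsi_rate, contra_rate, g_exc=1.0, g_inh=0.8):
    """Rectified rate caricature of an LSO cell: ipsilateral excitation
    minus contralateral inhibition, floored at zero."""
    return np.maximum(0.0, g_exc * np.asarray(ipsi_rate, float)
                           - g_inh * np.asarray(contra_rate, float))

# Hypothetical AN rates (spikes/s) as a source moves from 0° toward the
# ipsilateral ear:
ipsi = np.array([100.0, 120.0, 140.0, 160.0])
contra = np.array([100.0, 80.0, 60.0, 40.0])
hemi_rate = lso_rate(ipsi, contra)   # grows with source laterality
```

The caricature reproduces the key property exploited later: the hemispheric response rate grows monotonically with the laterality of the source.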
Localization Models
The stages presented so far modeled the neural response rates at the AN and LSO stages in the five AN frequency bands. The neural response rates are expected to change depending on the source location. In this subsection, three different localization models are proposed to map the response rate differences to a predicted source location. The methods, namely linear rate-level localization, linear response difference localization, and maximum-likelihood localization, were implemented on the interaural AN response differences or on the respective LSO response differences. Each model differs from the others in the choice of input–output relations that were used to train the model.
Linear rate-level localization model
This model operates only on AN responses, not on LSO responses. Compared with the other two model types that will be introduced, this model allows for a more functional understanding of the input–output relations of each processing stage shown in Figure 1, thus enabling the user to pinpoint the origin of model predictions to the influence of acoustic ILDs, the influence of speech coding, or influences at the electrode-nerve interface. The model rests on three a priori claims: (1) the ILD can be linearly mapped to the azimuthal direction of sound arrival, (2) the monaural AN response rate can be linearly mapped to stimulus level, and (3) CI subjects’ percept of azimuthal location can be linearly mapped to the interaural AN response rate difference.
Thirty instances of 200-ms SSN were HRTF filtered, and the virtual source at 0° was presented monaurally. Monaural rate-level curves were constructed from the mean AN response rates within each of the AN frequency bands, and, based on Claim (2), a linear regression was performed on these data between 35 and 70 dB SPL, yielding one rate-level slope per frequency band.

Schematics of the localization models via linear fitting. (Left) Two-stage rate-level localization model: the first stage maps azimuthal angle to ILD, and the second stage maps level to spike rate; together, these two mappings produce a final mapping of AN response rate difference to azimuthal angle. (Right) Response rate difference localization model, used on the interaural response difference of either the AN or the LSO output.

(Top) Ipsilateral (dashed lines) and contralateral (solid lines) levels relative to 0°, plotted as a function of azimuthal angle for the five AN frequency bands. (Bottom) Interaural level difference (ILD) as a function of azimuthal angle for the five AN frequency bands.

For training the model to the frequency-dependent mapping of ILD to azimuth, the SSNs were presented (in virtual acoustic space) from each of the azimuthal angles. A further linear regression (based on Claims 1 and 3) generated an ILD-to-azimuth slope for each frequency band.
In each frequency band, an azimuthal angle estimate was then derived from the interaural AN response rate difference by chaining the two linear mappings.
Two different model calibrations were generated by altering the azimuthal angle range over which the linear regression was computed.
To produce a final prediction from the five AN frequency bands, a rate-weighted average of the five single-band estimates was then derived.
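Under Claims (1) to (3), the two-stage read-out reduces to a pair of fitted slopes per band plus a weighted combination. The sketch below uses invented placeholder slopes (not fitted model values) purely to make the chain of units explicit:

```python
import numpy as np

def fit_slope(x, y):
    """Least-squares slope of y against x (intercept included in the fit)."""
    return np.polyfit(x, y, 1)[0]

# Hypothetical training data standing in for model output in one band:
levels = np.arange(35, 71, 5)            # dB SPL, the Claim (2) fit range
rates = 50 + 4.0 * (levels - 35)         # assumed rate-level relation, spikes/s

angles = np.arange(0, 91, 5)             # 0°..90° in 5° steps
ilds = 0.15 * angles                     # assumed ILD-azimuth relation, dB

s_rate = fit_slope(levels, rates)        # spikes/s per dB
s_ild = fit_slope(angles, ilds)          # dB per degree

def predict_azimuth(rate_diff):
    """Two-stage linear read-out: rate difference -> ILD -> azimuth."""
    return rate_diff / s_rate / s_ild

def weighted_estimate(rate_diffs, total_rates):
    """Rate-weighted average of the per-band azimuth estimates."""
    est = np.array([predict_azimuth(d) for d in rate_diffs])
    w = np.asarray(total_rates, dtype=float)
    return float(np.sum(w * est) / np.sum(w))
```

A rate difference of 18 spikes/s in this toy band corresponds to 18 / 4 = 4.5 dB of ILD and hence 4.5 / 0.15 = 30° of azimuth; the weighting then favors bands with stronger overall activation.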
Linear response difference localization model
Slopes of Linear Regression Performed for Each Stage of Localization Models Obtained With 55 dB Speech-Shaped Noise.
Maximum-likelihood estimation
This model assumes that localization can be modeled as maximizing the likelihood of a set of frequency channels over all possible azimuthal angle percepts, similar to the approach of Day and Delgutte (2013). A prediction was produced by performing a maximum-likelihood estimation (MLE) over all angles. This method exploits the azimuth-specific ILD patterns across frequency channels that are also mirrored by LSO neurons (Tollin & Yin, 2002). The MLE method includes, but is not limited to, a possible explicit place coding of ILD that will be detailed in the HRTF Results and Discussion section: Depending on which specific frequency channel is closest to its maximum ILD, the prediction model derives the corresponding azimuth. Hypothetically, this place-coding strategy is suitable at lateral angles, whereas, for more central angles, the probabilistic calibration can effectively operate in terms of a rate code, similar to the other calibrations. The model was trained with interaural AN response rate differences. As this prediction stage is more suited to training with naturally fluctuating signals, rather than long-term averages, 10 s of 55 dB SPL continuous male and female speech originating from each of the 19 possible directions was used as training material. Sample means, and corresponding standard deviations, were computed on response difference data collected in 200-ms windows for each of the 5 frequency bands resulting in 50 × 19 training events. A multivariate Gaussian distribution of response rate differences was then assumed at each angle. In contrast to the other models, this model is restricted to the right hemisphere by the training. However, this limitation can easily be overcome.
In the testing phase, only the five rate differences were supplied to the multivariate probability density function and the angle with the maximum likelihood was determined. This calibration method offers the advantage over the other two calibrations in that it can potentially cope with the nonmonotonic ILD because the azimuth dependence of the ILD is frequency specific, and each angle has its unique combination of ILDs. Because the training material and nature of the calibration were fundamentally different from those of the other, linear methods, the maximum-likelihood results were not included in the statistical analysis of the other calibrations.
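The training and testing steps can be sketched as fitting one Gaussian per candidate angle and taking the argmax of the likelihood. The sketch assumes a diagonal covariance and uses a fabricated training set in which the mean rate difference simply grows with azimuth; the real model is trained on fluctuating speech:

```python
import numpy as np
from scipy.stats import multivariate_normal

ANGLES = np.arange(0, 91, 5)            # the 19 candidate directions

def train_mle(samples_per_angle):
    """samples_per_angle: dict angle -> (n_samples, 5) array of 5-band
    rate differences.  Returns per-angle means and diagonal covariances."""
    return {a: (x.mean(axis=0), np.diag(x.var(axis=0) + 1e-6))
            for a, x in samples_per_angle.items()}

def mle_localize(rate_diff, stats):
    """Return the angle whose Gaussian maximizes the likelihood of the
    observed 5-band rate-difference vector."""
    ll = {a: multivariate_normal(m, c).logpdf(rate_diff)
          for a, (m, c) in stats.items()}
    return max(ll, key=ll.get)

# Fabricated training set: mean rate difference = 0.5 * angle in each band.
rng = np.random.default_rng(3)
train = {a: a * 0.5 + rng.standard_normal((50, 5)) for a in ANGLES}
stats = train_mle(train)
```

A test vector near the mean pattern of one trained angle is then assigned to that angle, even when the per-band mapping would be ambiguous on its own.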
Results and Discussion
Influence of the HRTF
The upper torso and the head have a considerable spectral filtering effect that depends critically on the direction of sound arrival, especially for wavelengths shorter than the head diameter.
Whilst the difference in HRTFs between the ears for lateral sources is well known, and forms the basis of numerous investigations of spatial hearing (e.g., Strutt, 1907), the spectral coloration is typically not considered in studies that are not explicitly concerned with directional hearing. The importance of HRTFs, even for investigations into dimensions of sound processing other than localization, however, is apparent from the transfer function of the HRTF filtering applied to white noise at 0° azimuth and elevation (Figure 4), where an amplification of ∼5 dB is evident in the 2 to 4 kHz range. A collateral benefit of this filtering is an enhancement of the higher frequency formants (1.7, 3.4, 4.3 kHz) that are captured in the corresponding AN frequency bands. The HRTF-filtered white noise also shows a high-pass characteristic, resulting in approximately 10 dB of attenuation at 200 Hz.
Smoothed spectra of the four stimuli at 0° with and without HRTF filtering. Top left: speech-shaped noise (SSN). Top right: A 200-ms male speech token (Sp). Bottom left: 1/f pink noise (PN). Bottom right: White Gaussian noise (WN). The colored rectangles represent the AN frequency bands used in the localization models. The respective color corresponds to the color code introduced in Figure 3.
In contrast to the lack of HRTF consideration in monaural studies, many investigations have analyzed the azimuth dependence of the HRTF (e.g., Duda & Martens, 1998). In the current study, compared with a reference at 0° azimuth, ipsilateral signal levels increased slightly up to azimuthal angles of approximately 45° and flattened beyond that (Figure 3, top). This effect was relatively independent of sound frequency, with the exception of the 3188-Hz band, which showed destructive interference effects (due to the reflections from the shoulder) for angles beyond 45°. Conversely, contralateral levels were strongly dependent on azimuth over the entire range of angles. This effect was highly frequency dependent, with larger negative slopes corresponding to higher frequencies. This frequency dependence is well established and indeed was highlighted in early descriptions of the duplex theory of sound localization (Strutt, 1907). Low-frequency sounds have a longer wavelength than the diameter of the head and, as a result, are less subject to attenuation at locations contralateral to the source. This results in a frequency-dependent mapping of interaural difference to azimuth, expressed by frequency-dependent slopes of the ILD-to-azimuth relation.
Contralateral levels dropped with increasing azimuthal angle up to a frequency-dependent minimum, after which they began to rise again. This increase toward 90° has been analytically described with a spherical head model (e.g., Duda & Martens, 1998): Close to 90°, the pathways around the head are similarly long and the waves interfere constructively at the contralateral ear. At midfrequencies (e.g., 1 kHz), this effect is already apparent at 60°, but at higher frequencies with shorter wavelength, the constructive interference began to appear only at more lateral angles (see Figure 3).
The free-field dependence of ILD on azimuthal angle was dominated by the contralateral effects (Figure 3, bottom). Due to the nonmonotonic behavior, it is not possible to generate a linear mapping of ILD to azimuth without including systematic errors. It is possible that the brain potentially learns this pattern of ambiguity and employs some form of explicit code to map this region, such that the dominant frequency of the neuron with the peak ILD determines the azimuth. The proposed MLE localization model can potentially account for this possibility, whereas the other localization models are limited by their linear approximations.
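A toy numeric demonstration of this point, using an invented nonmonotonic ILD curve that qualitatively mimics the high-frequency bands of Figure 3 (peak at 70°, falling again toward 90°):

```python
import numpy as np

angles = np.arange(0, 91, 5).astype(float)
# Invented ILD curve: rises to a 12-dB peak at 70° and falls toward 90°.
ild = 12 * np.sin(np.deg2rad(angles * 90 / 70))

slope, intercept = np.polyfit(angles, ild, 1)   # best linear ILD-azimuth fit
predicted = (ild - intercept) / slope           # linear read-out of azimuth
```

Because the ILD at 90° equals that of a more central angle, any linear read-out necessarily maps the most lateral sources too centrally; only a pattern-matching scheme across frequency bands, such as the MLE model, can disambiguate such angles.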
Influence of the CI Processing
One aspect that illustrates the interaction of HRTF influences and CI processing is the spectral filtering of the HRTF, which was shown to have relevance beyond azimuthal localization. These spectral filtering properties of the head, torso, and outer ear are typically only investigated when studying localization in the sagittal plane (e.g., Majdak, Goupell, & Laback, 2011). Their particular relevance for electric hearing can be seen in Figure 5, when comparing the monaural AN model output with and without HRTF filtering from 0° azimuth. For instance, the 1/f (pink) noise had the same energy in each octave band and elicited similar activations in all five frequency channels without HRTF preprocessing but showed a strongly frequency-dependent activation after 0° HRTF filtering (Figure 5, column 3). Interestingly, this was precisely opposite to the case when SSN was assessed; here, the HRTF-filtered signal yielded the most homogeneous electrode activation (Figure 5, column 1). This illustrates just how well CI processing is tailored to match free-field HRTF filtering and the average speech spectrum. It is, therefore, important to include HRTF filtering for the purpose of modeling realistic response patterns for broadband stimuli even in monaural studies. CI subjects are also exposed to a similar HRTF filtering when they listen using their behind-the-ear microphones.
(Top row) Cumulative charge output for each of the AN frequency bands of the four HRTF-filtered stimuli at 0°. The charge is summed over the 2 electrodes in each bin for the 200-ms duration of the stimuli. (Middle row) Mean AN frequency band rate-level curves for the same free-field stimuli. (Bottom row) Mean AN frequency band rate-level curves for the nonfree-field stimuli.
To map the large dynamic range of input signals—primarily of speech—to the low dynamic range of CI listeners, a relatively strong compression is applied to the input signal, which generates a nonlinear relationship between input level in dB and output current (Figure 5). This nonlinearity leads to a level-dependent transformation from ILD to interaural current differences (ICDs). In the model, this ultimately resulted in level-dependent localization. At low input levels, where the compression was weak, a given ILD resulted in a larger ICD than at high input levels. This model prediction calls for a subjective evaluation of whether CI subjects show level-dependent localization. Level-dependent lateralization has been demonstrated in NH subjects with conflicting ITD information (Dietz, Ewert, & Hohmann, 2009) and in bimodal listeners with nonmatched loudness growth (Francart & McDermott, 2012). However, we are aware of only one bilateral CI subject whose localization abilities were tested at different levels (60 and 70 dB SPL; van Hoesel, Ramsden, & O’Driscoll, 2002). In line with our model prediction, van Hoesel et al. (2002) argue that level compression likely also has a compressive effect on the perceived angle at higher sound levels (i.e., it causes a central localization bias). Their single subject, in contrast, had a lateral localization bias at 70 dB SPL. This is further backed by data from Grantham, Ashmead, Ricketts, Haynes, and Labadie (2008; their Figure 3) that reveal lower ILD thresholds for bilateral CI subjects when compression is switched off.
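The level dependence can be made concrete with a small numeric sketch: a fixed 6-dB acoustic ILD is pushed through the loudness growth function at a low and a high overall level. The linear-amplitude form of the LGF is assumed here, and the exact filterbank scaling is omitted:

```python
import numpy as np

ALPHA = 415.96   # loudness growth steepness from the ACE description above

def level_to_cl(level_db, t_spl=25.0, m_spl=65.0, t_cl=100, m_cl=200):
    """Map a band level in dB SPL through the logarithmic loudness
    growth function to clinical current units."""
    v = 10 ** (level_db / 20)
    s, m = 10 ** (t_spl / 20), 10 ** (m_spl / 20)
    x = np.clip((v - s) / (m - s), 0.0, 1.0)
    p = np.log1p(ALPHA * x) / np.log1p(ALPHA)
    return t_cl + (m_cl - t_cl) * p

acoustic_ild = 6.0   # dB, held fixed while the overall level changes
icd_low = level_to_cl(30 + acoustic_ild) - level_to_cl(30)
icd_high = level_to_cl(58 + acoustic_ild) - level_to_cl(58)
```

The same 6-dB acoustic ILD yields a larger interaural current difference at the low level than at the high level, which is exactly the level-dependent ILD-to-ICD transformation discussed above.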
Finally, performance is reduced by the independently operating n-of-m strategies in either processor (i.e., either ear). Due to the frequency dependence of ILDs, the contralateral ear had a bias to lower frequencies, resulting in the n-of-m strategy selecting more apical electrodes that were potentially not selected on the ipsilateral side. In some cases, this result led to a stronger stimulation on the contralateral electrodes and, ultimately, a contralateral localization cue in the apical channels (see PN: 65 dB, Figure 7, top right). A similar effect has previously been described for independently operating gain control in each ear (Dorman et al., 2014).
The HRTF-filtered SSN produced the most uniform stimulation pattern across all sound levels, and the dynamic range from approximately 40 to 70 dB SPL was centered around 55 dB (Figure 5, middle left). Because of these properties, 55 dB SSN was chosen to calibrate the localization models. In the case of speech being used as a test stimulus, the instantaneous level fluctuations resulted in a more linear mean stimulus level across the stimulation current curve (Figure 5, second column).
Influence of Spread of Excitation
The spread of current within the cochlea is the major limiting factor of spectral resolution in electric hearing (Bingabr, Espinoza-Varas, & Loizou, 2008). With a 9 mm exponential current decay, and an AN frequency band spacing of just 1.4 mm, a considerable cross talk can be expected. This was visible in the AN-rate-versus-azimuth functions (Figure 6), where differences between neighboring channels were reduced compared with the acoustic ILDs (Figure 3). However, in the tested conditions, with one directional source, even this large spread of excitation did not systematically influence localization ability. Similarly, even an interaural electrode mismatch has been shown not to harm ILD sensitivity very much (Kan, Stoelb, Litovsky, & Goupell, 2013). Further, in a two-source condition, with both sources having different frequency content, spread of excitation would be expected to impact localization more than we demonstrate here. Finally, the spread of excitation resulted in a less reliable relation between response rate difference and azimuth. For instance, the peak of the AN-rate difference over azimuth (Figure 6) was at 70° for both the 19- and the 22-mm channel, whereas the corresponding acoustic ILD had peaks at 70° and 80°, respectively. The spread from the more energetic electrodes around the 19-mm band to the 22-mm band caused the 22-mm rate difference to be dominated by off-frequency ILDs.
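The channel cross talk can be illustrated by smoothing a place profile with a two-sided exponential kernel, a simplified stand-in for the current-spread stage above:

```python
import numpy as np

def spread(profile, positions, lam=9.0):
    """Smooth a place profile with a row-normalized two-sided exponential
    kernel exp(-|d| / lam), mimicking intracochlear current spread."""
    d = np.abs(positions[:, None] - positions[None, :])
    k = np.exp(-d / lam)
    k /= k.sum(axis=1, keepdims=True)
    return k @ profile

pos = np.linspace(8.125, 23.875, 22)          # electrode positions, mm
profile = np.zeros(22)
profile[10], profile[12] = 1.0, 0.4           # two unequally driven places
smoothed = spread(profile, pos)

contrast_before = profile[10] - profile[12]
contrast_after = smoothed[10] - smoothed[12]
```

With a 9-mm decay over a 0.75-mm electrode spacing, the between-channel contrast collapses by roughly two orders of magnitude while the location of the maximum is preserved, which is consistent with the reduced, off-frequency-dominated rate differences described above.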
AN response rate differences for speech-shaped noise at three different levels. (Top) Right (dashed lines) and left AN response rates relative to a 0° reference are plotted as a function of azimuthal angle for the five AN frequency bands. (Bottom) Interaural AN response rate difference as a function of azimuthal angle for the AN frequency bands. The respective color corresponds to the color code as in Figures 3 and 4. The level dependence is due to flooring effects at the contralateral left side (no responses) at 45 dB and strong compression effects, especially at the right side, at 65 dB.

(Top) Interaural AN response rate difference as a function of azimuthal angle for the five AN frequency bands to speech-shaped noise (left) and pink noise (right) at 65 dB. (Bottom) Interaural LSO response rate difference as a function of azimuthal angle for the five AN frequency bands to speech-shaped noise (left) and pink noise (right). The respective color corresponds to the same frequency bins as in previous figures.

Influence of the Binaural Model
Whilst the difference between interaural AN responses solely reflected the ILD cue, the modeled binaural neurons were also sensitive to the ITD. In the case of the ACE strategy assessed here, the only available ITD cue was the envelope ITD. In clinical processors, ILD cues dominate envelope ITD cues for both speech and noise input (Laback et al., 2004). This is in line with the model outcome, namely that the LSO output as a function of azimuth (Figure 7) had a similar shape to that of the acoustic ILDs (Figure 3). When comparing LSO model data with AN model data (Figure 7), the small but positive influence of the envelope ITD can be seen in the 22-mm channel, where the decline toward 90° was smaller in the LSO than in the AN response. The LSO stage of the model is expected to be suitable for testing potential localization benefits of new coding strategies that preserve ITD in the pulse timing.
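The qualitative behavior of such an LSO-like unit can be sketched as a rate-based excitation-inhibition subtraction: the ipsilateral drive excites and the contralateral drive inhibits, so the output grows with the ILD favoring the ipsilateral ear. This toy formulation, including the unit inhibitory gain, is an illustrative assumption and not the model's fitted LSO stage.

```python
import numpy as np

# Minimal rate-based sketch of an LSO-like excitation-inhibition unit.
# The inhibitory gain of 1.0 is an illustrative assumption.
def lso_rate(ipsi_rate: float, contra_rate: float, inhib_gain: float = 1.0):
    """Half-wave-rectified difference of excitatory (ipsi) and
    inhibitory (contra) drive, in spikes/s."""
    return np.maximum(0.0, ipsi_rate - inhib_gain * contra_rate)

# An ILD favoring the ipsilateral side raises the output;
# an ILD favoring the contralateral side silences the unit.
print(lso_rate(120.0, 80.0))   # -> 40.0
print(lso_rate(80.0, 120.0))   # -> 0.0
```

Because the comparison is a rectified difference of rates, a population of such units on each side yields an interaural response difference that tracks the ILD, which is why the LSO curves in Figure 7 resemble the acoustic ILD curves.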
NH listeners can also exploit temporal fine-structure ITD information in the 1-kHz regime, presumably through their faster medial superior olive (MSO) pathway (Remme et al., 2014). In processing binaural information, CI listeners appear to be limited to the slower LSO pathway, consistent with their upper-frequency limen being roughly 200 to 400 Hz or pps, even if ITDs are preserved in the pulse pattern (van Hoesel & Clark, 1997; van Hoesel & Tyler, 2003). A possible reason for CIs not activating the MSO pathway is that the highly synchronized neural stimulation pattern from the electric pulses is not optimal for the synapse and membrane parameters of the MSO (Chung et al., 2014). This single effective pathway of the LSO was represented by our model and contrasts with the complex dual (MSO, LSO) pathway models of the acoustically stimulated binaural system (e.g., Dietz et al., 2009; Hancock & Delgutte, 2004; Takanen, Santala, & Pulkki, 2014).
Results of the Localization Models
In contrast to the previous stages, less is known about how central pathway stages extract a localization percept from the ensemble of binaural brainstem neurons. Therefore, three different localization models were tested here. The first was a two-stage linear mapping: from the monaural AN response rates to level, and from ILD to azimuth. This strategy is less likely to resemble the physiological processes or the learning and estimating strategies of a human subject (Figure 8(a)). The second localization model, a one-stage linear mapping from interaural rate differences of LSO or AN responses to azimuth, is more plausible, at least as a possible learning strategy of a listener's brain (Figures 8(b) and 9).
Figure 8 caption: Model predictions and RMS error for the rate-level and response difference localization models. Frequency band predictions are shown for the SSN stimulus at 55 dB. Weighted average predictions are also shown (black) for all levels tested with the model. (Left) Predictions using the AN-rate-level
Figure 9 caption: Model predictions and RMS error for the AN (left) and LSO (right) response difference localization models for the

The calibration of the linear mapping was somewhat arbitrary; however, it could resemble an individual mapping strategy. For instance, Jones et al. (2014) reported very individualized localization estimation patterns that can be attributed to such mappings. Two possible mapping strategies were tested here: (a) minimizing the error in the frontal segment between −45° and +45° and (b) minimizing the error in the range between −90° and +90°. Both strategies were successful in reaching their specific goal (Figures 8(b) and 9(a)). The first strategy, using the slope
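The two calibration strategies above can be sketched as least-squares fits of a single slope and offset from interaural response rate difference to azimuth, restricted to different azimuth ranges. The sigmoidal rate-difference curve below is synthetic and purely illustrative; only the ±45° and ±90° calibration ranges are taken from the text.

```python
import numpy as np

# Synthetic, saturating azimuth-to-rate-difference curve (assumption).
azimuths = np.arange(-90, 91, 10, dtype=float)
rate_diff = 100 * np.tanh(azimuths / 40.0)

def calibrate(az_limit: float) -> np.ndarray:
    """Least-squares slope/offset fit using only |azimuth| <= az_limit."""
    mask = np.abs(azimuths) <= az_limit
    A = np.column_stack([rate_diff[mask], np.ones(mask.sum())])
    coeffs, *_ = np.linalg.lstsq(A, azimuths[mask], rcond=None)
    return coeffs

for limit in (45.0, 90.0):
    slope, offset = calibrate(limit)
    pred = slope * rate_diff + offset
    rmse = np.sqrt(np.mean((pred - azimuths) ** 2))
    print(f"calibrated on ±{limit:.0f}°: overall RMS error {rmse:.1f}°")
```

With a saturating input curve, the frontal (±45°) calibration achieves the lower error within its own segment but underestimates lateral angles, whereas the full-range calibration trades frontal precision for a lower overall error, mirroring the two goals described above.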
Results of the LSO Localization Models for Both Model Calibrations Compared With the Results of Localization Experiments Performed With Bilateral CI Users (Jones et al., 2014).
Finally, one predictive model was calibrated by creating multivariate normal probability density functions across all AN frequency bands for all azimuthal angles. In the current experiment, the probabilistic model outperformed the linear models by producing lower overall RMS errors and by being able to predict performance at more lateral angles (Figures 10 and 11). The probabilistic model's predictive ability was also compared with that of the linear AN response difference localization model.
Figure 10 caption: Weighted average predictions and RMS error are compared for the linear AN response rate difference model and the maximum-likelihood estimation (MLE) model for the 200-ms speech token and a 10-s combination of OLSA male and female speech. Predictions are made in shifting windows of 200 ms.
Figure 11 caption: RMS errors of the weighted average predictions for each model calibration. Results are shown for the four main stimuli at the three stimulus levels used in the experiment.
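The structure of such a probabilistic stage can be sketched as follows: for every azimuth, fit a (here diagonal-covariance) multivariate normal over the per-band interaural rate differences, then assign a test observation to the azimuth whose density maximizes the likelihood. The band profiles, noise level, and 30° azimuth grid below are synthetic assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic per-band rate-difference "templates" (assumed, for illustration).
azimuths = np.arange(-90, 91, 30)
n_bands = 5
band_gain = np.linspace(0.6, 1.4, n_bands)   # assumed per-band ILD weighting
means = {az: band_gain * 50 * np.sin(np.radians(az)) for az in azimuths}

# "Training": estimate mean and diagonal variance from noisy samples.
params = {}
for az in azimuths:
    samples = means[az] + rng.normal(0.0, 5.0, size=(200, n_bands))
    params[az] = (samples.mean(axis=0), samples.var(axis=0))

def log_likelihood(obs, mean, var):
    """Log density of a diagonal-covariance multivariate normal."""
    return -0.5 * np.sum((obs - mean) ** 2 / var + np.log(2 * np.pi * var))

def predict(obs):
    """Azimuth whose fitted Gaussian gives the observation max likelihood."""
    return max(params, key=lambda az: log_likelihood(obs, *params[az]))

test_obs = means[30] + rng.normal(0.0, 5.0, size=n_bands)
print(predict(test_obs))  # -> 30
```

Because the likelihood combines all bands at once, a band corrupted by off-frequency spread is automatically down-weighted by its larger variance, which is one way to understand the MLE model's advantage at lateral angles.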

Despite their inherent differences, each model produced predictions that were within the range of published subject data (e.g., Grantham et al., 2007; Jones et al., 2014; Kerber & Seeber, 2012), making it difficult to say that one model is more realistic than the rest (Figure 11). It is also not possible to state which model is most accurate. What can be summarized from this subsection is that the LSO model is most robust against changes in signal level and that the MLE model performs best at the most lateral angles. Rather than one version being selected on performance alone, each offers a different functional understanding of the input-output relations and should be chosen according to the focus of study.
Influence of the Test Stimulus
The best performing stimulus level, averaged over all stimulus types and prediction models, was 55 dB SPL with an RMS error of
In addition to testing the four different 200-ms stimuli in the absence of an interferer, data were also obtained for continuous speech (Figure 10, bottom row) and for a +5 dB signal-to-noise ratio condition (Figure 12). For the continuous speech, a new prediction was made every 200 ms, and the resulting large variability was caused by the nonstationary nature of speech. Despite this variability, the distribution of predictions still retained the same basic bimodal form as that for the stationary stimulus. The central bias observed for more lateral angles in the +5 dB signal-to-noise ratio condition is in line with data from actual CI subjects (Kerber & Seeber, 2012) and can be explained by the reduction of the effective ILD caused by the interferer's 0-dB ILD.
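The ILD reduction behind this central bias can be illustrated with a simple power-addition sketch: a target with a given ILD is mixed with an interferer that contributes equal power to both ears (0-dB ILD). The incoherent power-addition model and the symmetric ±ILD/2 split of the target ILD across the ears are simplifying assumptions; the +5 dB signal-to-interferer ratio is the value from the text.

```python
import math

def mixture_ild(target_ild_db: float, snr_db: float) -> float:
    """Effective ILD of target + interferer, assuming incoherent power
    addition and the target ILD split symmetrically across the ears."""
    s = 10 ** (snr_db / 10)                       # target/interferer power ratio
    right = s * 10 ** (+target_ild_db / 20) + 1   # interferer adds unit power
    left = s * 10 ** (-target_ild_db / 20) + 1    # equally at both ears
    return 10 * math.log10(right / left)

# A 10-dB target ILD shrinks toward 0 under a +5 dB SNR:
print(f"{mixture_ild(10.0, 5.0):.1f} dB")  # -> 7.4 dB
```

Under these assumptions, every nonzero target ILD is pulled toward 0 dB by the interferer, so lateral sources are mapped to less lateral rate differences, consistent with the central bias in Figure 12.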
Weighted average predictions and RMS error for clean SSN stimuli at 55 dB SPL (left) and the same stimuli at a signal-to-interferer ratio of +5 dB (right). The AN response difference localization model was used with the 
Summary and Conclusions
A computer model for simulating the spatial hearing abilities of bilateral CI listeners was presented and tested in seven different versions. The model predictions were similar to the subjective performance of bilateral CI listeners not only in terms of the average localization error but also in the occurrence of systematic errors. The spatial resolvability of the model was too good, likely due to the absence of
Irrespective of the particular version, the model confirmed a range of hypotheses from experimental studies, including the dominance of ILD cues in lateralization judgments and the detrimental effect of level-dependent compression. The model allows the origins of unexpected predictions to be identified along the processing chain. Beyond that, the proposed localization model for bilateral CIs can be useful for predicting the performance of individual CI subjects, by customizing the model to their clinical profile, and for early-stage testing of CI algorithms. It can be especially useful for investigating complex acoustic scenarios and more complex input systems (e.g., electro-acoustic stimulation or single-sided deafness), as well as for testing the influence of interaural pulse time differences at low pulse rates (with the ITD-sensitive LSO model). The investigations made to date suggest that binaurally coordinated n-of-m selection and compression are highly desirable for the next generation of CIs. Future model extensions are envisaged to include individualized versions of the processor setting, the electrode-nerve interface, and the mapping strategy.
Footnotes
Acknowledgments
We thank Tim Jürgens and Ben Williges for providing the code and support for the
model, Le Wang for the LSO model code, Bernd Meyer for generating the multitalker speech-shaped noise, and Heath Jones for providing localization data of binaural CI subjects. We are indebted to David McAlpine, Volker Hohmann, and Nathan Spencer for valuable feedback on previous versions of the article. We are grateful to Birger Kollmeier and the Medical Physics group for continuous support and fruitful discussions.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research leading to these results has received funding from the European Union’s Seventh Framework Programme (FP7/2007-2013) under ABCIT grant agreement n° 304912.
