Abstract
Current research on sound source externalization primarily focuses on air conduction (AC). As bone conduction (BC) technology advances and BC headphones become more common, the perception of externalization for BC-generated virtual sound sources has emerged as an area of significant interest. However, there remains a shortage of relevant research in this domain. The current study investigates the impact of reverberant sound components on the perception of externalization for BC virtual sound sources, both with the ear canals open (BC-open) and with the ear canals occluded (BC-blocked). To modify the reverberant components of the Binaural Room Impulse Responses (BRIRs), the BRIRs were either truncated or had their reverberation energy scaled. The experimental findings suggest that the perception of externalization does not significantly differ across the three stimulation modalities: AC, BC-open, and BC-blocked. Across both AC and BC transmission modes, the perception of externalization for virtual sound sources was primarily influenced by the reverberation present in the contralateral ear. The results were consistent between the BC-open and BC-blocked conditions, indicating that air-radiated sound from the BC transducer did not impact the results. Regression analyses indicated that under AC stimulation, sound source externalization ratings exhibited strong linear relationships with the Direct-to-Reverberant Energy Ratio (DRR), Frequency-to-Frequency Variability (FFV), and Interaural Coherence (IC). The results suggest that BC transducers provide a degree of sound source externalization similar to that of AC headphones.
Keywords
Introduction
As virtual reality (VR) (De Vries et al., 2001) and augmented reality (AR) technologies continue to evolve, three-dimensional audio presented via headphones has become a crucial component. However, sound produced through headphones is often perceived as being inside the head, a phenomenon known as in-head localization (Toole, 1970). In contrast, naturally occurring sound sources are perceived as externalized, meaning they are located outside the head (Hartmann & Wittenberg, 1996). The ability to externalize sound is crucial for accurate spatial perception and for creating realistic acoustic environments, both of which are essential for an immersive audio experience (Simon et al., 2016). In recent years, extensive research has focused on the cues necessary for perceiving sound source externalization and methods for achieving this effect in virtual sound sources using headphone reproduction (Catic et al., 2013, 2015; Hassager et al., 2016; Li et al., 2018a, 2018b, 2019, 2021).
In a free-field environment (without environmental reflections), the transmission of sound to the ears can be characterized by head-related transfer functions (HRTFs). HRTFs encode spatial information about sound sources and are critical for studying binaural spatial hearing. Virtual auditory perception uses HRTFs for signal processing to simulate spatial auditory perception when reproduced via headphones or loudspeakers (Xie, 2013). Research has demonstrated that localization of virtual sound sources presented through headphones relies on interaural time differences (ITDs) for frequencies below approximately 1 kHz, interaural level differences (ILDs) across the entire frequency range, and accurate spectral information delivered to both ears (Hartmann & Wittenberg, 1996). Many studies have examined how spectral details affect sound source externalization by manipulating the smoothness of HRTF spectra. For instance, increased smoothness of HRTF spectra has been found to cause the sound image to move inward, closer to the head (Baumgartner et al., 2017). Nonetheless, Kulkarni and Colburn argued that fine spectral details are not always necessary for externalization of wideband noise sources (Kulkarni & Colburn, 1998).
Compared to HRTFs, virtual auditory perception using Binaural Room Impulse Responses (BRIRs) can significantly improve sound source externalization (Begault et al., 2001). BRIRs encode not only spatial information about the sound source but also the acoustic characteristics of the environment, including reflections and reverberation (Sandvad, 1999), which contribute to the perception of distance (Shinn-Cunningham, 2000). Reverberation is essential for creating a sense of spatial presence when listening to virtual sound sources through headphones (Begault et al., 2001). Li et al. (2018b) investigated how BRIR signal length and reverberant energy influence externalization. Their findings for laterally positioned sources suggested that reverberation in the contralateral BRIR signals had a greater effect on externalization than reverberation in the ipsilateral signals. Moreover, variations in Direct-to-Reverberant Energy Ratio (DRR) and Frequency-to-Frequency Variability (FFV) in the contralateral ear were found to correlate with changes in externalization, depending on BRIR signal length or reverberant energy levels (Li et al., 2018b). These effects are all related to monaural information.
Previous studies highlight the importance of binaural information in the reverberant signal for sound source externalization. Catic et al. (2015) showed that when the direct sound contains weak binaural cues and minimal interaural differences, the binaural cues from the reverberant sound become critical for externalization. Conversely, when the direct sound has strong binaural cues and interaural differences (i.e., for a laterally positioned source), the contribution from reverberant binaural information is reduced. Leclère et al. (2019) further noted that reverberant sound enhances externalization only when it generates interaural differences; when reverberation is identical in both ears, its effect on externalization is negligible. For lateralized sound sources, Li et al. (2018b) found that changes in the contralateral ear's BRIR reverberation significantly influenced Interaural Coherence (IC), with increased reverberation leading to decreased IC and enhanced externalization. In contrast, variations in the ipsilateral ear's BRIR reverberation led to relatively minor IC changes.
A key area of research in sound source externalization is identifying the binaural cues that contribute to this perceptual phenomenon. While earlier studies emphasized the role of interaural time differences (ITDs) (Levy & Butler, 1978), more recent research has shifted the focus toward interaural level differences (ILDs) and IC. Catic et al. (2013) investigated the role of binaural cues in sound source externalization for speech in reverberant environments using BRIR signals in normal-hearing participants. Their data suggested that temporal fluctuations in binaural cues play a crucial role. Specifically, for frequencies above 1 kHz, compressing ILD fluctuations in BRIR-processed speech significantly reduces the perceived externalization. However, when the sound is low-pass filtered below 1 kHz, externalization remains largely unaffected by ILD fluctuation compression. In a subsequent study, Catic et al. (2015) expanded their investigation across a wider range of conditions and stimuli, concluding that ILDs and ICs strongly correlate with externalization ratings. Leclère et al. (2019) observed a strong correlation between externalization ratings and both IC fluctuations and overall IC within stimuli. Li et al. (2018b; 2019) further explored the impact of these cues on externalization by selectively modifying BRIR signals for the ipsilateral or contralateral ear, such as by truncating signals or reducing reverberation energy. Their results indicated that for lateral sound sources, externalization is particularly influenced by the reverberation characteristics reaching the contralateral ear, where reverberation fluctuations tend to be greater than in the ipsilateral ear.
Previous research on virtual sound source externalization using air conduction (AC) has significantly enhanced listeners’ spatial perception. With the increasing adoption of VR, AR, bone conduction (BC) headphones, and BC hearing aids, there is growing interest in using BC transducers as spatial audio interfaces, mainly due to their ability to leave the ear canals open (Maier et al., 2022; Surendran et al., 2023).
However, BC transmission introduces crosstalk and results in smaller ILD and ITD values (Rowan & Gray, 2008; Surendran & Stenfelt, 2023; Zurek, 1986). Farrell et al. (2017) used BC transducers to directly measure ILD and ITD based on intracochlear pressure measurements. Their findings indicated that while crosstalk can reduce ITDs, ILDs remain relatively robust against crosstalk in most cases. For bilaterally applied stationary BC signals, ITDs can be transformed into ILDs due to constructive and destructive interference (Deas et al., 2010; Rowan & Gray, 2008; Stenfelt & Zeitooni, 2013). Measurement of intercochlear ITDs and ILDs with bilateral BC stimulation has been conducted using signal cancellation techniques (Surendran & Stenfelt, 2023). Such data suggest that BC provides less effective spatial information compared to AC (Stenfelt & Zeitooni, 2013; Stenfelt et al., 2024; Zeitooni et al., 2016).
The maximum ITDs generated via BC are frequency-dependent, with values ranging from approximately 0.1 to 0.2 ms at frequencies below 500 Hz and increasing to around 0.5 ms at higher frequencies (Surendran & Stenfelt, 2023). In contrast, model-based analyses such as the rigid sphere model, where the radius of the sphere is assumed to be 0.0875 meters, demonstrate that ITDs under AC are approximately frequency-independent in both the low-frequency range, below 0.4 kHz, and the high-frequency range, above 3 kHz, for specific source azimuths (Kuhn, 1977; Xie, 2013). In these frequency regions, the ITD tends to asymptotically approach a constant value, with larger ITD values observed at lower frequencies. For example, at an azimuth of 90°, the ITDs at 0.1 kHz and 3.0 kHz are approximately 767 µs and 676 µs, respectively. In the transition region between 0.5 kHz and 3 kHz, the ITD exhibits frequency dependence, gradually shifting from the low-frequency to the high-frequency asymptote. Similar frequency-dependent ITD behavior has also been observed in empirical measurements using the KEMAR manikin (Kuhn, 1977). Current research on BC sound reproduction has largely focused on sound source localization and spatial unmasking in BC transmission (Wang et al., 2022a, 2024).
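As a rough check on the AC figures quoted above, the two textbook closed-form asymptotes for a rigid sphere can be evaluated directly. The sketch below uses the low-frequency limit 3(a/c)·sin θ and the Woodworth high-frequency limit (a/c)(θ + sin θ); the speed of sound (343 m/s) and the function names are assumptions, not values taken from the cited sources. Under these assumptions the low-frequency limit at 90° comes out close to the quoted 767 µs, and the 676 µs quoted at 3 kHz lies between the two limits, as expected for a frequency that is still approaching the high-frequency asymptote.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed; not stated in the text
HEAD_RADIUS = 0.0875    # m, sphere radius given in the text


def itd_low_freq(azimuth_deg, radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Low-frequency rigid-sphere asymptote: 3 (a/c) sin(theta), in seconds."""
    return 3.0 * radius / c * math.sin(math.radians(azimuth_deg))


def itd_high_freq(azimuth_deg, radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Woodworth high-frequency asymptote: (a/c)(theta + sin(theta)), in seconds.

    Valid for azimuths between 0 and 90 degrees.
    """
    theta = math.radians(azimuth_deg)
    return radius / c * (theta + math.sin(theta))


low_us = itd_low_freq(90.0) * 1e6
high_us = itd_high_freq(90.0) * 1e6
print(f"low-frequency limit: {low_us:.1f} us, high-frequency limit: {high_us:.1f} us")
```

Both limits scale linearly with a/c, so a slightly different assumed speed of sound shifts them by a few microseconds.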
In BC experiments, vibrations from the BC transducer housing can generate airborne sound radiation into the ear canal, potentially affecting results (Surendran & Stenfelt, 2022; Surendran et al., 2023). To minimize this effect, suppressing radiated sound from the BC transducer is essential. In our previous research (Wang et al., 2022a), airborne sound from the BC transducer was blocked by fully inserting foam earplugs into the listeners’ ear canals. However, occluding the ear canal can lead to an occlusion effect, which enhances low-frequency sound perception (Stenfelt & Reinfeldt, 2007). Studies indicate that placing the occlusion device deep into the bony part of the ear canal can reduce or even eliminate the effects of occlusion (Stenfelt & Reinfeldt, 2007; von Békésy, 1941). Therefore, the BC experiments in the current study were conducted with both open and blocked ear canals, where the blocked condition involved deep insertion of foam earplugs. Deep insertion was ensured by positioning the lateral end of the earplug inside the tragus, under the supervision of the experimenter. This procedure eliminated radiated airborne sound from the transducer and simultaneously minimized the occlusion effect (Stenfelt & Reinfeldt, 2007; Wang et al., 2022a).
Despite growing interest in BC auditory processing, studies specifically investigating the externalization of virtual sound sources via BC remain scarce. The current study aims to bridge this gap by examining the externalization using BC stimulation and comparing it to AC, providing insights into the potential of BC transducers as alternatives to AC headphones for virtual sound reproduction. Therefore, the current study investigated sound source externalization at 45° azimuth under both AC and BC conditions, focusing on the influence of reverberation ratio on BC externalization and comparing externalization performance between AC and BC stimulation. The choice of a 45° sound source angle facilitates analysis of the contributions from the two ears. Additionally, several studies on binaural BC hearing have incorporated this specific angulation (Priwin et al., 2004; Stenfelt, 2005; Stenfelt & Zeitooni, 2013; Stenfelt et al., 2024; Zeitooni et al., 2016). Two experimental setups were designed to modify the reverberation ratio of BRIRs. In Experiment I, BRIR durations to the left and right ears were shortened to assess externalization perception using in-ear headphones and BC transducers. Experiment II adjusted reverberation by scaling the reverberant portion of BRIR signals based on a predefined DRR.
Method
Acquisition of BRIRs
The BRIRs used in the experiments were obtained from the University of Surrey's database (Leclère et al., 2019), which contains data from four different rooms. For this experiment, Room D was selected, a medium-to-large exhibition hall with approximate dimensions of 8.7 m × 8 m × 4.2 m (length × width × height) and a reverberation time (RT60) of 0.89 s. The database signals had been generated by placing a Head and Torso Simulator (HATS) 1.5 m away from a loudspeaker. Only the BRIR data corresponding to azimuth angles of 0° and 45° were used in the current study.
Stimulation
The original speech signal used in the experiment was a 1.6-s segment extracted from the Sound Quality Assessment Material (SQAM), developed by the European Broadcasting Union (Tech, 2008). The speech segment was spoken in English by a female speaker (Li et al., 2021). Binaural stimuli were generated by convolving the original speech segment with the modified BRIRs under the different conditions.
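The convolution step described above can be sketched as follows. This is a minimal illustration, not the authors' processing chain: the function names are hypothetical, both BRIRs are assumed to have the same length, and the signals are toy stand-ins rather than the SQAM speech or the Surrey BRIRs.

```python
import numpy as np


def fft_convolve(x, h):
    """Linear convolution of two 1-D signals via zero-padded FFTs."""
    n = len(x) + len(h) - 1
    nfft = 1 << (n - 1).bit_length()  # next power of two >= n
    spectrum = np.fft.rfft(x, nfft) * np.fft.rfft(h, nfft)
    return np.fft.irfft(spectrum, nfft)[:n]


def render_binaural(speech, brir_left, brir_right):
    """Return an (n_samples, 2) array: the mono speech filtered by each ear's BRIR."""
    left = fft_convolve(speech, brir_left)
    right = fft_convolve(speech, brir_right)
    return np.stack([left, right], axis=1)


# Toy stand-ins (NOT the SQAM speech or the Surrey BRIRs).
rng = np.random.default_rng(0)
speech = rng.standard_normal(1000)
brir_left = np.zeros(64)
brir_left[0] = 1.0    # unit impulse: left channel reproduces the input
brir_right = np.zeros(64)
brir_right[3] = 0.5   # delayed, attenuated impulse
stimulus = render_binaural(speech, brir_left, brir_right)
print(stimulus.shape)
```

FFT-based convolution is used here only because direct convolution becomes slow for second-long impulse responses; the result is the same linear convolution.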
Equipment
The experiment was conducted in a semi-anechoic chamber with a background noise level of 22 dBA. The setup consisted of a computer, a Fireface UFX II 12-channel sound card, Sennheiser IE800 in-ear headphones for AC, and Radioear B81 BC transducers. For the BC trials, bilateral BC transducers were positioned on the mastoids and secured with an elastic band, applying a static force of approximately 3 N. The BC transducers were placed only once, ensuring consistency across all BC measurements. Care was taken to avoid contact between the BC transducers and the pinnae while maintaining symmetrical placements on both mastoids. In AC trials, participants used Sennheiser IE800 in-ear headphones.
The frequency response of the Radioear B81 BC transducers was measured using a Brüel and Kjær 4930 artificial mastoid before binaural stereo playback through the bilateral transducers. Based on the measured frequency response, compensation filters were designed and applied to equalize the BC transducer output within an effective bandwidth of 100–8,000 Hz, minimizing frequency response-related distortions during the experiment (Farina, 2007; Xie, 2013).
Subjective Listening Test Procedure
The same nine participants (three females and six males) took part in the two subjective listening experiments. All participants had normal hearing, with hearing thresholds of 25 dB HL or better across all frequencies from 125 Hz to 8 kHz in both ears.
Two graphical user interfaces (GUIs) were designed for Experiments I and II, each corresponding to the linear externalization rating scales in Tables 1 and 2. The first experiment assessed sound source externalization using a subjective rating scale similar to that used in previous studies (Catic et al., 2015; Hartmann & Wittenberg, 1996). The scale, implemented as a graphical slider, had a step size of 0.1 and ranged from 0 (I hear the sound in my head) to 3 (fully externalized), with the integer values as shown in Table 1. A reference signal, derived from the unmodified BRIR-convolved speech signal, served as an anchor corresponding to the loudspeaker location and was assigned the highest rating (3).
Table 1. Scale Used for Evaluating the Perception of Externalization in Experiment I.
Table 2. Scale Used for Evaluating the Perceived Relative Distance in Experiment II.
The second experiment evaluated the perceived relative distance, as relative distance perception correlates with how far sound sources appear to be (Werner et al., 2016). The rating scale, shown in Table 2 (Li et al., 2018b), also had a step size of 0.1 but ranged from −1 to +1. Here, the reference signal, i.e., the speech signal convolved with the unmodified BRIRs, was rated as 0. As the DRR decreased, participants likely perceived the sound source as externalized and positioned farther away than the reference, with ratings approaching +1. Conversely, as the DRR increased, the ratings approached −1.
Before conducting the experiments, loudness equalization was performed for the AC and BC conditions (both BC-open and BC-blocked) to ensure that the BC stimuli were perceived at the same loudness as the AC stimuli. The stimulus used for loudness equalization was derived from the University of Surrey BRIR database, with a 0° azimuth and a sound source distance of 1.5 m. The AC stimulus was presented through IE800 headphones at 65 dB SPL, and the BC stimulus was delivered via a transducer placed on the mastoid process. AC and BC stimuli were presented alternately, and participants adjusted the level of the BC stimulus to match the loudness level of the AC stimulus at 65 dB SPL (Pollard et al., 2013; Qin & Usagawa, 2017).
Before the formal experiments, participants listened to all stimuli once to familiarize themselves with the auditory perception. Each trial began with the presentation of a reference signal, followed by a 400-ms silent interval, and then a modified sound stimulus, which participants rated. It should be noted that this experiment aimed to evaluate perceived externalization for different experimental conditions. Therefore, other perceptual attributes, such as plausibility (Lindau & Weinzierl, 2012), authenticity (Brinkmann et al., 2014), and coloration (Crawfordemery & Lee, 2014), were not evaluated. Experiments were performed sequentially for AC, BC-open, and BC-blocked under each truncation or modification condition. Stimuli were presented in random order, and participants could request a repetition if they were uncertain about their rating. The slider step size on the GUI was 0.1, consistent with previous studies (Li et al., 2021). Each participant completed three repetitions of the blocks, and the average results across the three repetitions were used for analysis.
Experimental Procedures
Experiment I: Monaural and Binaural Truncation
This experiment assessed the effect of BRIR duration on the perceived externalization of BC sound sources. The BRIRs were truncated to various durations: 2.5, 5, 10, 20, 40, 80, 120, and 200 ms. The sound source was located at a 45° azimuth in the right frontal direction, with the left ear as the contralateral ear (farthest from the source) and the right ear as the ipsilateral ear (closest to the source). Three experimental conditions were investigated:
Truncated contralaterally: The BRIRs of the contralateral ear were truncated at different time intervals, while the ipsilateral BRIRs remained unaltered.
Truncated ipsilaterally: The BRIRs of the ipsilateral ear were truncated at different time intervals, while the BRIRs of the contralateral ear remained unaltered.
Truncated bilaterally: The BRIRs of both ears were truncated equally at different time intervals.
All truncated BRIR signals were zero-padded to a total duration of 1,000 ms to ensure uniformity for comparison.
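The truncation and zero-padding described above can be sketched as follows. Several details are assumptions because the text does not specify them: the window is taken to start at the first sample of the BRIR, the cut is rectangular with no fade-out, and the sampling rate of 48 kHz and the helper name `truncate_brir` are purely illustrative.

```python
import numpy as np


def truncate_brir(brir, fs, window_ms, total_ms=1000.0):
    """Keep the first window_ms of a BRIR and zero-pad the result to total_ms.

    Assumes the window starts at sample 0 and uses a hard (rectangular) cut.
    """
    n_total = int(round(total_ms * fs / 1000.0))
    n_keep = min(int(round(window_ms * fs / 1000.0)), n_total, len(brir))
    out = np.zeros(n_total)
    out[:n_keep] = brir[:n_keep]
    return out


fs = 48000  # Hz, assumed sampling rate
rng = np.random.default_rng(1)
brir_ipsi = rng.standard_normal(fs)    # toy 1-s stand-ins, not measured BRIRs
brir_contra = rng.standard_normal(fs)

windows_ms = [2.5, 5, 10, 20, 40, 80, 120, 200]
# "Truncated contralaterally": only the contralateral BRIR is shortened.
contra_only = {w: (brir_ipsi, truncate_brir(brir_contra, fs, w)) for w in windows_ms}
# "Truncated bilaterally": both ears are shortened by the same window.
bilateral = {w: (truncate_brir(brir_ipsi, fs, w), truncate_brir(brir_contra, fs, w))
             for w in windows_ms}
```

The ipsilateral-only condition follows the same pattern with the roles of the two ears swapped.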
Experiment II: Monaural and Binaural Modifications
This experiment explored the role of DRR in the perception of distance and externalization in reverberant environments (Zahorik et al., 2005). DRR is defined as the ratio of direct sound energy to reverberant sound energy within the BRIR signal, expressed as:
The original DRR values for the ipsilateral (right) and contralateral (left) ear BRIRs were 11.3 dB and −2.1 dB, respectively. The DRR values were adjusted by −6, −4, −2, 0, +5, +10, and +20 dB, leading to the following levels:
Ipsilateral (right) ear: 5.3, 7.3, 9.3, 11.3 (original), 16.3, 21.3, and 31.3 dB.
Contralateral (left) ear: −8.1, −6.1, −4.1, −2.1 (original), 2.9, 7.9, and 17.9 dB.
Three experimental conditions were tested:
Modified ipsilaterally: The DRR of the ipsilateral ear was modified, while the contralateral DRR remained unchanged.
Modified contralaterally: The DRR of the contralateral ear was modified, while the ipsilateral DRR remained unchanged.
Modified bilaterally: The DRR of both ears was modified simultaneously.
This experimental design allowed for the investigation of monaural and binaural DRR modifications in the externalization of BC sound reproduction.
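One way to realize such DRR changes is to split each BRIR into a direct and a reverberant part and rescale only the reverberant part. The sketch below is illustrative rather than the authors' implementation: the boundary between the two parts (2.5 ms after the largest peak) is an assumption, the function names are hypothetical, and the impulse response is synthetic. It also recomputes the target DRR levels listed above from the original values and the seven offsets.

```python
import numpy as np


def split_direct_reverb(brir, fs, direct_ms=2.5):
    """Split a BRIR at direct_ms after its largest peak (assumed boundary)."""
    peak = int(np.argmax(np.abs(brir)))
    cut = min(len(brir), peak + int(round(direct_ms * fs / 1000.0)))
    direct = np.zeros_like(brir)
    direct[:cut] = brir[:cut]
    return direct, brir - direct


def drr_db(brir, fs, direct_ms=2.5):
    """Direct-to-reverberant energy ratio in dB."""
    direct, reverb = split_direct_reverb(brir, fs, direct_ms)
    return 10.0 * np.log10(np.sum(direct ** 2) / np.sum(reverb ** 2))


def change_drr(brir, fs, delta_db, direct_ms=2.5):
    """Scale the reverberant part so that the DRR changes by delta_db."""
    direct, reverb = split_direct_reverb(brir, fs, direct_ms)
    gain = 10.0 ** (-delta_db / 20.0)  # less reverberant energy -> higher DRR
    return direct + gain * reverb


# Synthetic impulse response (NOT a measured BRIR): unit direct peak + decaying noise.
fs = 48000  # Hz, assumed
rng = np.random.default_rng(2)
t = np.arange(fs) / fs
brir = 0.05 * rng.standard_normal(fs) * np.exp(-t / 0.1)
brir[0] = 1.0

offsets_db = [-6, -4, -2, 0, 5, 10, 20]
base = drr_db(brir, fs)
achieved = [drr_db(change_drr(brir, fs, d), fs) - base for d in offsets_db]

# Target levels implied by the original DRRs reported in the text.
ipsi_levels = [round(11.3 + d, 1) for d in offsets_db]
contra_levels = [round(-2.1 + d, 1) for d in offsets_db]
```

Because only the reverberant energy is scaled, the direct sound, and with it the dominant localization cues, stays unchanged across conditions.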
Results and Analysis
Experiment I: Effect of Truncation and Stimulation Condition on Sound Source Externalization
Figure 1 shows the average externalization scores for the three truncation conditions (“truncated bilaterally,” “truncated contralaterally,” and “truncated ipsilaterally”) across three stimulation conditions: AC and BC (BC-open and BC-blocked). Among the truncation conditions, ipsilateral truncation resulted in the highest externalization scores of around 2.65, which remained consistent regardless of the truncation window duration. This suggests that ipsilateral truncation by itself did not substantially impair externalization, with participants perceiving the sound source as being close to the reference position. Contralateral truncation yielded the second-highest externalization scores, whereas bilateral truncation produced the lowest scores, indicating an additive effect of bilateral truncation. Externalization scores increased with longer truncation durations, particularly between 20 and 120 ms. However, beyond 200 ms, externalization scores plateaued at around 2.65, indicating that further increases in truncation duration did not significantly enhance externalization.

Figure 1. The Average Sound Source Externalization Scores for the “Truncated Bilaterally” (Blue Square Line), “Truncated Contralaterally” (Orange Circle Line), and “Truncated Ipsilaterally” (Black Cross Line) Across Three Different Stimulation Conditions with a Sound Source Azimuth Angle of 45°. The Error Bars Represent the Standard Deviations, and Dots are the Individual Data. (a) AC Headphones, (b) BC-Open, and (c) BC-Blocked.
A three-factor repeated-measures ANOVA was conducted to examine the effects of window length, stimulation condition (AC, BC-open, BC-blocked), and truncation condition (ipsi, contra, bilateral) on sound source externalization. A Shapiro–Wilk test confirmed that the data in each group followed a normal distribution (p > .05).
The ANOVA showed significant main effects of truncation condition and window length on sound source externalization (F(2,16) = 344.536, p < .001; F(8,64) = 251.970, p < .001), while stimulation condition did not reach significance (F(2,16) = 2.313, p = .131). There was a significant interaction between truncation condition and window length (F(16,128) = 164.735, p < .001), and among all three factors (F(32,256) = 2.106, p = .001). The other interactions did not reach significance (p > .05).
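The degrees of freedom reported above follow from the fully within-subject design, which can be checked with a few lines. The sketch assumes nine participants, three stimulation conditions, three truncation conditions, and nine window lengths; the ninth level (the full 1,000-ms BRIR in addition to the eight truncation windows) is inferred from the reported F(8,64) rather than stated explicitly. The function name is hypothetical.

```python
from math import prod


def rm_anova_df(n_subjects, levels):
    """Degrees of freedom (effect, error) for a within-subject effect.

    levels lists the number of levels of each factor involved in the effect.
    """
    df_effect = prod(k - 1 for k in levels)
    return df_effect, df_effect * (n_subjects - 1)


n = 9  # participants
print(rm_anova_df(n, [3]))        # truncation condition or stimulation condition
print(rm_anova_df(n, [9]))        # window length
print(rm_anova_df(n, [3, 9]))     # truncation condition x window length
print(rm_anova_df(n, [3, 3, 9]))  # three-way interaction
```

The same function applies to any fully repeated-measures design with uncorrected degrees of freedom.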
Bonferroni-corrected post hoc tests were conducted to analyze the externalization results across stimulation conditions (AC, BC-open, and BC-blocked), using the truncation duration of 2.5 ms as baseline. For bilateral truncation, externalization scores were significantly higher for truncation durations of 40 ms or longer (p < .05), compared to 2.5 ms. No significant differences in externalization scores were found between truncation durations of 120 ms and 1,000 ms (p > .05), suggesting that retaining reverberation beyond 120 ms does not significantly improve externalization relative to the fully reverberant BRIR (1,000 ms). This has implications for VR sound rendering, as selecting the minimal truncation duration required for externalization can reduce computational complexity without compromising auditory perception.
For contralateral truncation, during AC stimulation, truncation durations of 40 ms or longer significantly improved externalization. However, under BC stimulation (BC-open and BC-blocked), 80 ms or longer was required for significant externalization improvement. No significant differences were observed between truncation durations of 80 ms and 1,000 ms across all stimulation conditions (p > .05).
For ipsilateral truncation, externalization scores remained unchanged across truncation conditions (p > .05), indicating that ipsilateral truncation by itself had minimal impact on perceived externalization.
Further analysis revealed that there were significant differences in externalization scores between all three truncation conditions (all p < .001). Among the three truncation conditions, bilateral truncation yielded the lowest externalization scores, followed by contralateral truncation, while ipsilateral truncation resulted in the highest externalization scores. These results indicate that reverberant energy at both the ipsilateral and contralateral ears significantly contributes to perceived externalization. However, reverberation energy at the contralateral ear has a greater impact on the externalization ratings than the reverberation energy at the ipsilateral ear.
Experiment II: Effect of DRR Modification on Perceived Relative Distance and Externalization
Figure 2 displays the average perceived relative distance scores across three stimulation conditions: AC, BC-open, and BC-blocked. These scores are shown for the “modified bilaterally,” “modified contralaterally,” and “modified ipsilaterally” conditions. Overall, changes in perceived relative distance due to DRR modification are similar across the three stimulation types. In Figure 2, a score of 0 represents no change in perceived relative distance (corresponding to the unprocessed BRIR signal), while positive scores indicate increased externalization, and negative scores signify a decrease. The average externalization score for the unprocessed BRIR signal is 2.65 (Figure 1), reflecting strong externalization.

Figure 2. Mean Perceived Relative Distances for the “Both Modified” (Blue Square Line), “Modified Contralaterally” (Orange Circle Line), and “Modified Ipsilaterally” (Black Cross Line) Conditions Under Three Different Stimulation Conditions at a Sound Source Azimuth Angle of 45°. The Error Bars Represent the Standard Deviation of the Mean, and Dots are Individual Data. (a) AC Headphones, (b) BC-Open, and (c) BC-Blocked.
Figure 2 shows that, across all three modification conditions, increasing reverberation relative to direct sound (negative DRR variation) results in greater perceived distance, whereas decreasing reverberation (positive DRR variation) reduces perceived distance. This pattern suggests that higher reverberation levels enhance externalization, while reduced reverberation brings the perceived source closer. The effect is most pronounced under the “modified contralaterally” condition and less pronounced under the “both modified” condition.
Based on the data in Figure 2, a three-factor repeated measures ANOVA was conducted to examine the effects of DRR variation, stimulation condition (AC, BC-open, BC-blocked), and ear modification (both modified, modified contralaterally, modified ipsilaterally) on the experimental results. A Shapiro–Wilk test confirmed that the data were normally distributed (p > .05).
There were significant main effects of ear modification and DRR variation on sound source externalization (F(2,16) = 6.594, p = .008; F(6,48) = 156.398, p < .001), while stimulation condition did not reach significance (F(2,16) = 2.142, p = .150). Significant interactions were found between stimulation condition and DRR variation, as well as between ear modification and DRR variation (F(12,96) = 3.360, p < .001; F(12,96) = 70.746, p < .001) and the three-way interaction between stimulation condition, ear modification, and DRR variation (F(24,192) = 2.083, p = .003), while the interaction between stimulation condition and ear modification did not reach significance (F(4,32) = 0.695, p = .601).
Bonferroni-corrected post hoc tests were conducted for each modification condition across the three stimulation conditions, with comparisons made against the baseline condition (DRR variation = 0 dB). For the bilateral condition (both modified), across all three stimulation conditions, a DRR variation of −4 dB or lower resulted in significantly higher externalization scores. Conversely, a DRR variation of 5 dB or larger resulted in significantly lower externalization scores.
For the “modified contralaterally” condition, the results were largely similar to those for the bilateral modified condition. However, under AC and BC-blocked conditions, a DRR variation of 10 dB or greater was required to produce significantly lower externalization scores. Moreover, for all three stimulation conditions, there were no significant differences in externalization scores between 10 dB and 20 dB DRR variation (p > .05).
For the “modified ipsilaterally” condition, for AC stimulation, a DRR variation of −4 dB or lower resulted in significantly higher externalization scores, while 10 dB or greater resulted in significantly lower externalization scores. For both BC-open and BC-blocked, only a DRR variation of −6 dB led to significantly higher externalization scores. Under BC-blocked, a DRR variation of 20 dB or larger was required to significantly reduce externalization scores, whereas no such effect was observed for BC-open.
Further analysis revealed that there were significant differences in externalization scores between “both modified” and “modified ipsilaterally” (p < .001), as well as between the “modified contralaterally” and “modified ipsilaterally” (p = .026) while the difference between “both modified” and “modified contralaterally” was not significant (p = .460).
These findings indicate that externalization is primarily influenced by modifications applied to the contralateral ear, rather than the ipsilateral ear. This conclusion is consistent with the results of Experiment I, where contralateral reverberation had a greater impact on externalization, although reverberation at both the ipsilateral and contralateral ears contributed to the perceived externalization. The results of the two experiments highlight the importance of the contralateral ear in externalization perception, particularly when modifying the DRR in simulated auditory environments.
DRR, FFV and IC in Experiments I and II
Figures 3 to 5 show the DRR, FFV, and IC under varying conditions in Experiments I and II, which were computed from the BRIR-convolved speech signals. It should be noted that due to the unknown characteristics of cross-talk in bone conduction, the cochlear signals under BC stimulation are not known, preventing computation of these parameters. DRR was calculated according to equation (2), while FFV was determined by averaging the absolute amplitude difference between adjacent frequency intervals, with magnitudes calculated in dB. Since the duration of the BRIR signal was aligned to 1,000 ms under all experimental conditions, the resulting frequency resolution was 1 Hz. The average IC was computed by first filtering the BRIR signals using a 4th-order gammatone filterbank consisting of 48 filters. These filters were spaced evenly along the Equivalent Rectangular Bandwidth (ERB) scale, with center frequencies ranging from 100 Hz to 8,000 Hz, covering the spectral range of the test stimuli. For each filtered BRIR signal, the maximum value of the interaural cross-correlation was extracted, and the final IC value was obtained by averaging these maxima across all frequency channels (Jiang et al., 2020). Consequently, IC was obtained from the maximum value of the interaural cross-correlation given by:

Figure 3. DRR Under All Experimental Conditions in (a) Experiment I and (b) Experiment II. In Each Subplot, the Blue Circle Line Represents the DRR of the Contralateral Ear, and the Orange Square Line Represents the DRR of the Ipsilateral Ear.

Figure 4. FFV Under All Experimental Conditions in (a) Experiment I and (b) Experiment II. In Each Subplot, the Blue Circle Line Represents the FFV of the Contralateral Ear, and the Orange Square Line Represents the FFV of the Ipsilateral Ear.

Figure 5. IC Under All Experimental Conditions in (a) Experiment I and (b) Experiment II. In Each Subplot, the Blue Circle Line, Orange Square Line, and Yellow Asterisk Line Represent the IC Under Different Conditions.
Here,
The results suggest that changes in DRR and FFV at the contralateral ear have a greater effect on perceived sound source location than those at the ipsilateral ear. This is likely because reverberation fluctuations are typically more pronounced at the contralateral ear. In both AC experiments, increasing either the length of the BRIR signal or the proportion of reverberation energy resulted in a lower DRR (Figure 3), higher FFV (Figure 4), and lower IC (Figure 5). The changes observed at the contralateral ear played the more critical role in modulating these acoustic parameters and, consequently, in enhancing the externalization percept. Thus, DRR, FFV, and IC serve as reliable indicators of sound source externalization (Li et al., 2018b). The BC experiments in the current study also indicated that reverberation, particularly at the contralateral ear, enhanced externalization perception, as in the AC conditions. However, as stated above, because the characteristics of cross-talk in bone conduction are unknown, these parameters cannot be computed directly from the cochlear signals under BC conditions.
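For the AC case, the three parameters can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the analysis script used in the study: the function names are hypothetical, the 2.5 ms direct-sound window stands in for equation (2), the FFV band limits and the ±1 ms lag range are assumed, the gammatone filters are approximated by FIR impulse responses with Glasberg and Moore ERB bandwidths, and the functions operate on impulse responses rather than on BRIR-convolved speech.

```python
import numpy as np

def drr_db(h, fs, direct_ms=2.5):
    """Direct-to-reverberant energy ratio in dB.
    Assumption: the direct sound ends direct_ms after the largest peak."""
    h = np.asarray(h, dtype=float)
    split = int(np.argmax(np.abs(h))) + int(round(direct_ms * 1e-3 * fs))
    return 10.0 * np.log10(np.sum(h[:split] ** 2) / np.sum(h[split:] ** 2))

def ffv_db(h, fs, f_lo=100.0, f_hi=8000.0):
    """Mean absolute level difference (dB) between adjacent frequency bins.
    Assumption: only bins between f_lo and f_hi are considered."""
    spec = np.abs(np.fft.rfft(h))
    freqs = np.fft.rfftfreq(len(h), d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    level = 20.0 * np.log10(spec[band] + 1e-12)  # small offset avoids log(0)
    return float(np.mean(np.abs(np.diff(level))))

def erb_space(f_lo, f_hi, n):
    """Center frequencies spaced evenly on the ERB-rate scale."""
    lo = 21.4 * np.log10(4.37 * f_lo / 1000.0 + 1.0)
    hi = 21.4 * np.log10(4.37 * f_hi / 1000.0 + 1.0)
    return (10.0 ** (np.linspace(lo, hi, n) / 21.4) - 1.0) * 1000.0 / 4.37

def gammatone_ir(fc, fs, order=4, dur=0.05):
    """FIR approximation of a gammatone impulse response (default 4th order)."""
    t = np.arange(int(dur * fs)) / fs
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)
    return t ** (order - 1) * np.exp(-2 * np.pi * 1.019 * erb * t) * np.cos(2 * np.pi * fc * t)

def band_ic(left, right, max_lag):
    """Maximum normalized cross-correlation within +/- max_lag samples."""
    norm = np.sqrt(np.sum(left ** 2) * np.sum(right ** 2))
    if norm == 0.0:
        return 0.0
    n = len(left)
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            c = np.dot(left[:n - lag], right[lag:])
        else:
            c = np.dot(left[-lag:], right[:n + lag])
        best = max(best, c / norm)
    return float(best)

def mean_ic(h_left, h_right, fs, n_bands=48, f_lo=100.0, f_hi=8000.0, max_lag_ms=1.0):
    """Average of the per-band IC maxima across the gammatone channels."""
    n_fft = len(h_left) + int(0.05 * fs)  # long enough for linear convolution
    spec_l = np.fft.rfft(h_left, n_fft)
    spec_r = np.fft.rfft(h_right, n_fft)
    max_lag = int(round(max_lag_ms * 1e-3 * fs))
    values = []
    for fc in erb_space(f_lo, f_hi, n_bands):
        filt = np.fft.rfft(gammatone_ir(fc, fs), n_fft)
        left = np.fft.irfft(spec_l * filt, n_fft)
        right = np.fft.irfft(spec_r * filt, n_fft)
        values.append(band_ic(left, right, max_lag))
    return float(np.mean(values))
```

With a 1,000 ms response, the bins returned by `np.fft.rfft` are 1 Hz apart, which matches the frequency resolution stated in the text. Calling `drr_db` and `ffv_db` once per ear, and `mean_ic` once per left-right pair, gives one value per condition for each parameter.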
Influence of DRR, FFV, and IC on AC Externalization Ratings
To further investigate the influence of DRR, FFV, and IC on AC externalization ratings, linear regression analyses were conducted for each truncation and modification condition. For each condition, DRR, FFV, and IC were treated as independent predictor variables and entered into the models individually, with contralateral and ipsilateral ear data analyzed separately rather than pooled. The dependent variable for all regression analyses was the mean externalization rating averaged across all participants in each condition, rather than individual participant scores. When only the contralateral ear was truncated or modified, only the DRR and FFV values for the contralateral ear were included as predictors, because the ipsilateral parameters remained constant and thus provided no explanatory power. Conversely, when only the ipsilateral ear was affected, only the ipsilateral parameters were considered. This approach ensured that each regression model incorporated only the variables that actually varied within the given condition, which makes the resulting linear fits easier to interpret.
A Shapiro–Wilk test indicated no significant deviation from normality in the data for any individual regression analysis (p > .05). DRR showed a significant negative association with the mean externalization ratings in all conditions (β < 0, all p < .05), indicating that higher DRR systematically reduced perceived externalization. In contrast, FFV showed a significant positive association with the mean ratings in all conditions (β > 0, all p < .05), suggesting that greater FFV enhanced externalization perception. IC was significantly negatively related to externalization in all conditions (β < 0, p < .05) except the ipsilateral truncation condition, where it showed a significant positive effect (β = 12.022, p = .011). Overall, DRR, FFV, and IC each maintained a stable linear relationship with virtual sound source externalization ratings, with DRR and IC generally exerting a negative effect and FFV consistently a positive one, and all predictors reached statistical significance (p < .05). Figures 6, 7, and 8 present scatter plots and linear fits of the mean externalization ratings against DRR, FFV, and IC, respectively, for the different truncation and modification conditions.
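The per-condition fits described above can be illustrated with a short sketch. It assumes one predictor at a time and mean ratings per condition, as stated in the text; the function name `simple_linear_fit` and the numbers in the example are hypothetical placeholders, not data from the study. The p-values and the normality check reported above are not reproduced here; they could be obtained with, for example, `scipy.stats.linregress` and `scipy.stats.shapiro`.

```python
import numpy as np

def simple_linear_fit(predictor, mean_rating):
    """Least-squares line through (predictor, mean rating) pairs.
    Returns the slope, the intercept, and the coefficient of determination."""
    x = np.asarray(predictor, dtype=float)
    y = np.asarray(mean_rating, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    ss_total = np.sum((y - y.mean()) ** 2)
    r_squared = 1.0 - np.sum(residuals ** 2) / ss_total
    return float(slope), float(intercept), float(r_squared)

# Synthetic placeholder values for one condition (NOT data from the study):
# contralateral DRR in dB and the mean externalization rating per stimulus.
example_drr = [-2.0, 0.0, 2.0, 4.0, 6.0]
example_rating = [4.1, 3.6, 3.0, 2.4, 2.0]

slope, intercept, r_squared = simple_linear_fit(example_drr, example_rating)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r_squared:.3f}")
```

A negative slope in such a fit corresponds to the reported pattern in which higher DRR goes with lower externalization ratings. Fitting each ear and each condition separately, as the text describes, means calling the function once per regression line shown in the figures.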

Figure 6. Scatterplots and Linear Regression Curves Between DRR and the Mean Externalization Ratings Under Each Truncation and Modification Condition: (a) Experiment I and (b) Experiment II. In (a), the Blue Regression Curve Represents the Linear Relationship Between the DRR of the Contralateral Ear and Externalization Ratings Under the “Both Truncated” Condition; the Orange Regression Curve Represents the Linear Relationship Between the DRR of the Ipsilateral Ear and Externalization Ratings Under the “Both Truncated” Condition; the Yellow Regression Curve Represents the Linear Relationship Between the DRR of the Contralateral Ear and Externalization Ratings Under the “Truncated Contralaterally” Condition; and the Green Regression Curve Represents the Linear Relationship Between the DRR of the Ipsilateral Ear and Externalization Ratings Under the “Truncated Ipsilaterally” Condition. In (b), the Same Color Scheme is Used to Represent the Regression Curves that Correspond to the “Modified” Conditions.

Figure 7. Scatterplots and Linear Regression Curves Between FFV and the Mean Externalization Ratings Under Each Truncation and Modification Condition: (a) Experiment I and (b) Experiment II. In (a), the Blue Regression Curve Represents the Linear Relationship Between the FFV of the Contralateral Ear and Externalization Ratings Under the “Both Truncated” Condition; the Orange Regression Curve Represents the Linear Relationship Between the FFV of the Ipsilateral Ear and Externalization Ratings Under the “Both Truncated” Condition; the Yellow Regression Curve Represents the Linear Relationship Between the FFV of the Contralateral Ear and Externalization Ratings Under the “Truncated Contralaterally” Condition; and the Green Regression Curve Represents the Linear Relationship Between the FFV of the Ipsilateral Ear and Externalization Ratings Under the “Truncated Ipsilaterally” Condition. In (b), the Same Color Scheme is Used to Represent the Regression Curves that Correspond to the “Modified” Conditions.

Figure 8. Scatterplots and Linear Regression Curves Between IC and the Mean Externalization Ratings Under Each Truncation and Modification Condition: (a) Experiment I and (b) Experiment II. In (a), the Blue, Orange, and Yellow Regression Curves Represent the Linear Relationships Between IC and Externalization Ratings Under the “Both Truncated,” “Truncated Contralaterally,” and “Truncated Ipsilaterally” Conditions, Respectively. In (b), the Regression Curves for the “Modified” Conditions Use the Same Color Scheme.
Discussion
The Effect of Reverberation Ratio in BRIR Signals on Perceived Sound Source Externalization
Despite significant advances in understanding virtual sound source externalization, the state of the literature leaves several questions open. Studies have demonstrated that several factors, including reverberation, reliable self-motion cues (Brimijoin et al., 2013; Hendrickx et al., 2017; Loomis et al., 1990), personalized HRTFs (Kim & Choi, 2005), and consistent visual information (Klein et al., 2017), can all influence externalization perception. However, the complex interplay among these factors makes it difficult to establish clear criteria for their relative importance (Best et al., 2020; Jiang et al., 2020). Although externalization and distance perception are distinct auditory phenomena, as spatial perception can still occur even in the absence of externalization, there is general agreement that reverberation plays a critical role in spatial perception across various conditions (Leclère et al., 2019; Lombera et al., 2025; Li et al., 2019).
The current study primarily investigated the differences in sound source externalization between AC and BC stimulation, with a specific focus on the role of reverberation. To isolate the influence of sound radiated from the BC transducer, the BC condition was further divided into two conditions: open ear canal (BC-open) and blocked ear canal (BC-blocked). With the duration of the reverberant sound and the DRR systematically manipulated, the findings consistently demonstrated that externalization perception improves as the amount of reverberant sound increases. This enhancement is mainly driven by increased reverberation in the contralateral ear. However, once the reverberation reaches a certain threshold, further increases do not lead to additional improvements in externalization perception, indicating a saturation effect.
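The two manipulations referred to here, truncating the BRIR and scaling its reverberant energy, can be sketched as follows. This is an illustration under assumptions rather than the processing used in the study: the function names, the 1 ms raised-cosine fade-out, and the 2.5 ms boundary between direct and reverberant sound are all hypothetical choices. The truncated response is zero-padded to its original length, in line with the statement that all BRIRs were aligned to 1,000 ms.

```python
import numpy as np

def truncate_brir(h, fs, keep_ms, fade_ms=1.0):
    """Keep the first keep_ms of h and zero the rest (length unchanged).
    Assumption: a short raised-cosine fade-out avoids an audible click."""
    h = np.asarray(h, dtype=float)
    out = np.zeros_like(h)
    n_keep = min(len(h), int(round(keep_ms * 1e-3 * fs)))
    out[:n_keep] = h[:n_keep]
    n_fade = min(n_keep, int(round(fade_ms * 1e-3 * fs)))
    if n_fade > 0:
        fade = 0.5 * (1.0 + np.cos(np.linspace(0.0, np.pi, n_fade)))
        out[n_keep - n_fade:n_keep] *= fade
    return out

def scale_reverb(h, fs, gain_db, direct_ms=2.5):
    """Scale everything after the assumed direct-sound window by gain_db.
    Lowering the reverberant level by g dB raises the DRR by g dB."""
    h = np.asarray(h, dtype=float)
    split = int(np.argmax(np.abs(h))) + int(round(direct_ms * 1e-3 * fs))
    out = h.copy()
    out[split:] *= 10.0 ** (gain_db / 20.0)
    return out
```

Applying either function to one channel only would correspond to the "contralaterally" or "ipsilaterally" modified conditions, and applying it to both channels to the "both" conditions.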
The externalization ratings from both experiments revealed no significant differences in externalization perception across the three reproduction conditions (AC, BC-open, and BC-blocked). This suggests that participants primarily relied on the proportion of reverberation to judge externalization, leading to similar perceptual effects across different reproduction conditions. As a result, the perception of sound source distance remained largely consistent across the three conditions in both experiments. These findings highlight the dominant role of reverberation-based cues in externalization perception, regardless of the specific auditory transmission pathway.
The Influence of AC and BC Stimulation on Externalization Perception
The results presented in Figures 2 and 3 indicate that externalization outcomes were consistent across both AC and BC stimulation. This suggests that the ILD and ITD alterations caused by BC cross-talk did not significantly affect externalization in reverberant conditions. However, previous research demonstrated that ITDs and ILDs provide critical cues for binaural hearing and contribute to the externalization of virtual sound sources in anechoic conditions (Hartmann & Wittenberg, 1996). Reverberation was therefore probably the dominant cue used by participants to evaluate externalization, potentially overshadowing the effects of other factors, such as ITDs and ILDs. Both ITDs and ILDs can be reduced by cross-talk, but ILDs remain more robust, at least at higher frequencies. As such, ILDs might be a useful cue for externalization of sounds presented via BC. Future experiments should investigate whether listeners rely exclusively on ILDs for externalization when reverberation cues are absent (e.g., in anechoic environments) or whether ILDs continue to influence distance perception even in reverberant conditions.
To prevent airborne sound radiated from the BC transducer housing from reaching the ear canal, deeply inserted foam earplugs were used. Such earplugs might introduce an occlusion effect, which is defined as an increase in perceived sound when the ear canal is occluded and the stimulus is presented through BC (Stenfelt & Reinfeldt, 2007), and which is quantified as the difference in hearing thresholds between the open and occluded ear canal conditions (Reinfeldt et al., 2013). The current study found the occlusion effect to be below 10 dB at frequencies up to 8 kHz. This is similar to the results in Wang et al. (2022a), who used the same occluding devices, and also similar to the occlusion effects of Koss Porta Pro headphones and Sennheiser HD650 headphones (Wang, Lu, et al., 2022a; Wang, Stenfelt, et al., 2022b). Therefore, the occlusion effect caused by deeply inserted foam earplugs in this experiment can be considered minimal in terms of altered magnitudes. The consistent externalization ratings obtained across the two BC stimulation conditions (BC-open and BC-blocked) further indicate that radiated sound did not significantly influence BC externalization.
Externalization Versus Distance Perception
An important issue in defining sound externalization is its relationship with auditory distance perception. While these two perceptual phenomena share some similarities, research suggests they can also exhibit independent characteristics depending on the context (Bidart & Lavandier, 2016; Kopco et al., 2020; Lavandier et al., 2024). On the one hand, to some extent, externalization may be a specific manifestation of distance perception, reflecting different levels of spatial localization accuracy (Durlach et al., 1992). They may form a continuum in certain contexts, sharing spatial cues and neural mechanisms (Callan et al., 2013; Hunter et al., 2002; Kopčo et al., 2012). Based on previous findings indicating a substantial correlation between perceived relative distance and the degree of externalization (Werner et al., 2016), the present study employed distance-related perceptual rating scales to assess the externalization of virtual sound sources. The analysis focused on the associations between externalization and three acoustic features: DRR, FFV, and IC. The observed trends were consistent with prior findings on auditory distance perception, indicating that externalization increased when DRR decreased (Zahorik et al., 2005), FFV increased (Shinn-Cunningham et al., 2005), and IC decreased (Jiang et al., 2020). These results imply that distance perception and externalization may not be entirely dissociable constructs. Moreover, although the present study quantitatively assessed the extent to which each of these features contributed to externalization, it remains to be determined whether their relative influence corresponds to their role in distance perception, highlighting the need for further investigation. On the other hand, the study by Lavandier et al. (2024) revealed that externalization is not simply a function of distance perception. Instead, factors such as sound type, reverberation, and source azimuth can lead to a dissociation between the two.
The distinct perceptual mechanisms and methodological differences suggest that externalization and distance perception can also be treated as separate dimensions. For example, Lavandier et al. (2024) noted that, in their setup, the piano was judged as more externalized than the speech but was not evaluated as more distant, highlighting a dissociation between the two percepts. In the current study, a single signal type was used, so this dissociation could not be explored. In future studies that evaluate externalization under multiple influencing factors, the exclusive use of distance-related rating scales may not be appropriate. Additionally, although externalization in the present experiment was manipulated by altering reverberation, participants were instructed to evaluate the perceived externalization of the presented stimuli, not the perceived reverberation itself.
Room Divergence Effect
Another important issue is that while reverberation significantly influences sound source externalization, the auditory signals must remain credible, natural, and aligned with the listener's expectations to ensure robust externalization. Previous studies have shown that if the perceived acoustic environment deviates substantially from the listener's expectations, such as when the acoustic characteristics of a virtually synthesized space differ markedly from those of the actual listening environment, externalization can deteriorate (Plenge, 1974). In audio engineering, this phenomenon is referred to as the “room divergence effect” (Klein et al., 2017; Werner et al., 2016).
Although the listening room in the current study did not match the virtual room, the externalization scores obtained align in trend and range with those reported by Li et al. (2018b), who used the same virtual room. The key difference is that the real and virtual environments were acoustically matched in the study of Li et al. (2018b), whereas the current study used a non-matching listening room and focused on comparing virtual sound externalization between BC and AC. Since the acoustic mismatch between the listening and virtual rooms was the same across all stimulus conditions, it is unlikely to affect the comparative analysis of BC and AC externalization.
The current study only examined externalization in static spatial conditions, without considering dynamic factors, such as head movements or moving sound sources. In real-world listening environments, these dynamic elements are common and can enhance externalization perception (Brimijoin et al., 2013; Hendrickx et al., 2017). Future research should investigate the influence of such dynamic factors on BC-based virtual sound externalization.
The current study provides insights into BC virtual sound source externalization, contributing to the development of more natural and realistic BC sound reproduction. The findings may enhance BC device usability in virtual sound environments, supporting broader adoption and application.
Conclusion
The current study investigated the effects of reverberation on sound source location perception under three experimental conditions: AC, BC-open, and BC-blocked. Two experiments were conducted in which the length of the BRIRs and the DRR were manipulated to change the proportion of reverberation. Results from both rating experiments emphasized the importance of reverberation in enhancing externalization perception for virtual sound sources, with a stronger effect observed at the contralateral ear than at the ipsilateral ear. Notably, externalization scores for BC were as high as those for the more typical AC when the transducers were driven with identical BRIR-processed signals. Separate simple linear regression analyses of ratings based on DRR, FFV, and IC demonstrated that all three parameters maintained stable linear relationships with externalization ratings.
The current study focused on sound source externalization under BC conditions with the BC transducers placed at the mastoid process, and it considered only a single sound source azimuth. Future research should explore additional factors influencing externalization by investigating different BC transducer positions and different sound source azimuths. Additionally, non-individualized BRIRs were used in the experiments. While some studies have shown that personalized BRIRs have minimal impact on externalization, potential influences cannot be entirely ruled out.
Footnotes
Acknowledgments
We would like to thank all the participants in the current study.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation (grant numbers 11974086, 12074403, and 12411530075).
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The data presented in this study are available on request from the corresponding author.
