Comparing the Auditory Distance and Externalization of Virtual Sound Sources Simulated Using Nonindividualized Stimuli

Abstract

When reproducing sounds over headphones, the simulated source can be externalized (i.e., perceived outside the head) or internalized (i.e., perceived within the head). Is it because it is perceived as more or less distant? To investigate this question, 18 participants evaluated distance and externalization for three types of sound (speech, piano, helicopter) in 27 conditions using nonindividualized stimuli. Distance and externalization ratings were significantly correlated across conditions and listeners, and when averaged across listeners or conditions. However, they were also decoupled in some circumstances: (1) Sound type had different effects on distance and externalization: the helicopter was evaluated as more distant, while speech was judged as less externalized. (2) Distance estimations increased with simulated distances even for stimuli judged as internalized. (3) Diotic reverberation influenced distance but not externalization. Overall, a source was not rated as externalized as soon as and only if its perceived distance exceeded a threshold (e.g., the head radius). These results suggest that distance and externalization are correlated but might not be aspects of a single perceptual continuum. In particular, a virtual source might be judged as both internalized and with a distance. Hence, it could be important to avoid using a scale related to distance when evaluating externalization.

Keywords

spatial perception, hearing, virtual acoustics, localization

Introduction

Usually, a sound is experienced as coming from a specific point in the space around the listener (the source position), with a direction (azimuth and elevation) and a distance. The use of headphones often gives the impression that the simulated sound source is “internalized,” perceived within the head (Jeffress and Taylor, 1961). Using binaural synthesis, it is possible to create an externalized percept, simulating a source perceived outside of the head (Blauert, 1997; Durlach et al., 1992). A recent review discussed the subjective experience of sound externalization, its definition, and measurements (Best et al., 2020). The authors review a wealth of knowledge on sound externalization but acknowledge that the link between auditory distance and externalization is not clearly established yet. Specifically, it is not clear whether externalization arises simply because the sound source has a (simulated) distance from the listener. The aim of the present study was to further investigate this link.

Some authors assume that the percepts of distance and externalization are aspects of a single perceptual continuum (Durlach et al., 1992; Hartmann and Wittenberg, 1996), where the center of the head is internalized and is the 0-m reference used to evaluate distance (Best et al., 2020), and where a source is externalized as soon as (and only if) its perceived distance exceeds the head radius. The present study aimed to challenge this idea by systematically comparing ratings of perceived distance and externalization. The conditions tested here were not used to highlight any new effect on distance or externalization. They were chosen based on the effects previously highlighted in the literature (which could reveal differences in behavior between distance and externalization), in studies that measured either distance or externalization. Here, they were measured in parallel by the same listeners on the same stimuli, so that they could be directly compared.

Overall, we expected that distance and externalization ratings would be correlated, as similar activation patterns in the temporal gyri have been reported for distance and externalization tasks (Callan et al., 2013; Kopčo et al., 2012), even if the corresponding brain activations do not completely overlap. However, we also expected that distance and externalization would not be perfectly correlated, and that externalization would not just be a binary version of distance either, that is to say that a sound source would not be rated as externalized as soon as and only if its perceived distance exceeded a threshold (e.g., the head radius). This would indicate that, while distance and externalization are not independent percepts, they might not be aspects of a single perceptual continuum either. In accordance with this hypothesis, distance judgments seem possible when the percept of externalization is weak, such as with diotic sounds or frontal sources (e.g., Bidart and Lavandier, 2016; Kopčo et al., 2020; Prud’homme and Lavandier, 2020). Moreover, it has been established that lateralizing the source enhances its externalization (Brimijoin et al., 2013; Kates et al., 2018; Leclère et al., 2019), while distance perception appears rather similar for frontal and lateral sources (Zahorik, 2002), except at short distance (<1 m; Brungart et al., 1999; Kopčo et al., 2020). On the one hand, externalization ratings seem to depend mainly on binaural cues (Best et al., 2020), in particular their reverberation-induced variations (Catic et al., 2013; Leclère et al., 2019; Li et al., 2019). Leclère et al. (2019) showed that reverberation enhances externalization only if it creates signal differences across the two ears, so that diotic reverberation does not improve externalization. On the other hand, distance perception is dominated by monaural cues (Kolarik et al., 2016; Zahorik et al., 2005), in particular sound level and direct-to-reverberant ratio (DRR). 
While reverberation produces monaural cues crucial for distance perception, it seems that the reverberation-induced variations in binaural cues do not influence distance judgments for frontal sources (Bidart and Lavandier, 2016; Prud’homme and Lavandier, 2020). Interaural differences seem to influence distance only for very close lateral sources (<1 m; Brungart et al., 1999). Finally, the vastly different scales used to evaluate externalization further complicate the issue, in particular when distance labels are used for the externalization scale (for a review of externalization evaluations see Best et al., 2020).

While sound level is a strong distance cue, not much is known about its effect on externalization. Most externalization studies have used stimuli that did not vary in level, because the sources were simulated at a unique distance from the listener (e.g., Brimijoin et al., 2013; Hendrickx et al., 2017; Kates et al., 2018) or because stimuli were equalized in level (e.g., Catic et al., 2015; Hartmann and Wittenberg, 1996; Leclère et al., 2019). Catic et al. (2013) tested stimuli that varied in broadband level but also in bandwidth so that it is not possible to isolate the potential effect of level on externalization ratings. Hartmann and Wittenberg (1996) mentioned a problem with a varying level when evaluating externalization, at least for random level variations that were informally reported as making the sounds “jump around the room, sometimes jumping into the head.” When considering the influence of level on distance, it is also important to keep in mind its potential interaction with the listeners’ experience and their level expectations depending on the type of source considered. Comparing whispered, conversational, and shouted speech, Brungart and Scott (2001) showed that speech type can have more influence on distance estimates than level (which had no influence on whispered speech, perceived at a fixed close distance). Leclère et al. (2019) did not highlight such a strong effect of sound type on externalization ratings when comparing noise, speech, music, and environmental sounds (clinking bottles).

The aim of the present study was to better understand the link between the percepts of distance and externalization by comparing distance and externalization ratings on the same set of stimuli. Participants were asked to judge the same 27 simulated auditory conditions on distance and externalization tasks. These conditions varied in terms of source azimuth and (simulated) distance, level of reverberation, and amount of binaural information present in the stimuli. Three types of sound (speech, piano, helicopter) differing in terms of level expectations were convolved with binaural room impulse responses (BRIRs) and anechoic head-related transfer functions (HRTFs) corresponding to different simulated azimuths (and distances in the room). Some BRIR-convolved stimuli were also averaged across ears to investigate diotic reverberation. Finally, diotic anechoic stimuli were tested as a reference for internalized sound sources.

As the influence of sound level on externalization was not known, all the stimuli were equalized in level. Arend et al. (2021) showed that level or loudness equalization does not fully remove all level cues for distance. These cues could mask the influence of more subtle cues, such as ILDs for nearby lateral sources, and even lead to distance ratings varying inversely to the simulated distance in anechoic conditions where few other distance cues were available. To eliminate the use of level cues, researchers can choose to rove the sound level, which then varies randomly from trial to trial (Brungart et al., 1999; Kopčo and Shinn-Cunningham, 2011). Here, the aim was not to remove level cues for distance, but to avoid level variations that could severely impair the evaluation of externalization (Hartmann and Wittenberg, 1996), so roving the level was not an option. We chose to equalize the overall broadband level of the stimuli, as previous studies have shown that it can still allow for reliable distance ratings, especially when reverberation provides additional distance cues (Akeroyd et al., 2007; Bidart and Lavandier, 2016; Mershon and Bowers, 1979; Prud’homme and Lavandier, 2020). While the overall level was equalized across conditions, this was not the case for the direct and reverberated sound levels and the DRR, which could still be used as cues.

Even if early studies on externalization pointed to the importance of reproducing with headphones the ear signals as they would be produced by real sources, the magnitude of the effect of the individualization of the stimuli on externalization is still unclear. Some studies mentioned some improvements in externalization when stimuli were individualized (using the listeners’ own HRTFs) compared to generic (e.g., using measurements from a manikin), in particular for less externalized frontal sources (Kim and Choi, 2005; Werner et al., 2016). However, the individualization of stimuli only marginally improved the externalization ratings measured by Cubick et al. (2015) and Leclère et al. (2019). Begault et al. (2001) did not observe any significant effect of individualization when measuring externalization in anechoic and reverberant conditions. This was also the case for Kates et al. (2018), who showed that a generic binaural model was as effective as individualized stimuli in producing an externalized image in reverberant conditions. Reviewing recent literature on externalization, Best et al. (2020) concluded that individual spectral cues may not be critical for externalization in realistic listening conditions. Concerning distance perception, Zahorik (2002b) showed that listeners’ performance in judging distance was not impaired (in terms of both the individual estimates and their standard deviations) by using the HRTFs of another listener, while Prud’homme and Lavandier (2020) did not find any significant difference in distance estimates for individualized stimuli compared to stimuli based on manikin measurements. The present study only used such nonindividualized stimuli. Individualized measurements are not yet available to everybody, and certainly not on a large scale, so any applications associated with externalization and virtual distance could benefit from knowledge on the externalization and perceived distance of nonindividualized stimuli.

The distance and externalization ratings for all the stimuli were systematically compared by computing their correlation. The hypothesis was also tested that externalization ratings could result directly from the distance ratings, the source being rated as externalized as soon as the distance rating exceeds a threshold (e.g., the head radius). Finally, distance and externalization were compared by investigating their potential differences in behavior in specific conditions: (1) One could expect an effect of sound type on distance estimates due to a priori level expectations from the listeners (Brungart and Scott, 2001), e.g., the helicopter might be judged at further distances to produce the same sound level as speech at the ears. No such effect was expected on externalization ratings (Leclère et al., 2019). (2) We expected higher externalization ratings for lateralized sources compared to frontal sources (Brimijoin et al., 2013; Kates et al., 2018), while the perception of distance was not expected to change much with azimuth for the distances tested (above 1 m; Zahorik, 2002). (3) BRIRs measured at different distances in the same room were expected to trigger differences in perceived distance (Prud’homme and Lavandier, 2020; Zahorik, 2002), but a relatively fixed high level of externalization compared to sounds convolved with anechoic HRTFs (in particular for lateral sources; Leclère et al., 2019). (4) For the diotic versions of the BRIR stimuli, the perceived distance was still expected to vary with simulated distance (Bidart and Lavandier, 2016), but externalization was not expected to be enhanced compared to the anechoic stimuli (Leclère et al., 2019).

Methods

Stimuli

Nonindividualized BRIRs measured by Leclère et al. (2019) in a gym (33.7 m × 44.5 m × 10.5 m) at three azimuths (0°, −30°, and 60°) and three distances (1, 3, and 5 m) from a manikin were used. They were measured using a log sine sweep technique (Farina, 2000), with a 15-s sweep duration and a 20 Hz to 20 kHz frequency range. The signal was played through a loudspeaker (Tannoy System 8 NFM 2) at the desired location and recorded at the simulated listener position using an MK2/NCF1 dummy head (Neutrik Cortex Instrument). The average broadband reverberation time was 1.35 s. Room acoustical characteristics further describing the BRIRs can be found in the supplementary material (Suppl. Table 1), while the room layout and measurement details were presented by Leclère et al. (2019).

To simulate anechoic stimuli, HRTFs were used. They were measured with a KEMAR manikin and a loudspeaker at 1.4 m by Gardner and Martin (1994). Different manikins and loudspeakers were used for the BRIR and HRTF measurements, leading to differences in the spectrum of the corresponding direct sounds, while the spectrum of the whole BRIR stimuli was further influenced by reverberation (Supplementary Figure 2). Due to a programming error, the HRTFs used here were measured at different azimuths than the BRIRs: 0°, −60°, and 30°. The HRTFs are left/right symmetric so that the lateral sources based on the HRTFs were mirror images of what was intended. There is no reason to expect a left/right asymmetry in the distance and externalization tasks (Arend et al., 2021; Begault et al., 2001; Best and Roverud, 2024; Parseihian et al., 2014), as will be further discussed below. The corresponding data were thus mirror-imaged (the data for the −60° stimuli were assigned to the 60° condition, and the data for the 30° stimuli were assigned to the −30° condition) to allow for a comparison between the anechoic and reverberant conditions.

Three types of sound were considered: a short speech excerpt also used in previous studies (“Toute la nuit” meaning “All night long,” duration 0.9 s; Bidart and Lavandier, 2016; Leclère et al., 2019), the sound of a piano from the NESSTI database (1 s; Hocking et al., 2013), and the sound of a helicopter (1 s; Sound-Ideas-Series, 1992).

For each of these sounds, 27 processing conditions were considered, as summarized in Table 1. The original diotic anechoic signal was used as a reference for a source that should be very internalized. Twelve signal versions were created through convolution with the nine BRIRs (3 distances×3 azimuths) and the three HRTFs (three azimuths). Six diotic versions of the BRIR signals were also created by averaging the left and right channels of the sound produced by the lateral sources at the three distances. Finally, eight lateralized versions of the diotic stimuli were created by adding broadband interaural level/time differences (ILD/ITD) into the diotic BRIR signals, as well as into diotic versions of the HRTF signals (lateral sources only). The broadband ILD/ITD values used in both cases were measured in the corresponding HRTFs. The ITDILD stimuli contain coarse binaural cues that are broadband and constant over time. They were included to test whether lateral sources could be more externalized only because they are simulated on the side, even in the absence of the time-varying frequency-dependent binaural cues associated with reverberation (Catic et al., 2013; Leclère et al., 2019; Li et al., 2019). Best et al. (2020) hypothesized such a lateralization bias when mentioning that “a listener may not be inclined to give a rating of zero for the lateral sounds, which may introduce a bias toward higher externalization ratings.”

Table 1.

Details of the 27 Conditions Tested for the Three Sound Types (Speech, Piano, Helicopter).

Label	Simulated azimuth	Reverberation level/simulated distance	Binaural information/listening mode
REF		Anechoic, non-spatialized	Diotic
HRTF_0	0°	Anechoic at 1.4 m	All binaural info
HRTF_30	−30°	Anechoic at 1.4 m	All binaural info
HRTF_60	60°	Anechoic at 1.4 m	All binaural info
ITDILD_HRTF_30	−30°	Anechoic at 1.4 m	Broadband ITD and ILD
ITDILD_HRTF_60	60°	Anechoic at 1.4 m	Broadband ITD and ILD
BRIR1m/3m/5m_0	0°	Room at 1, 3, or 5 m	All binaural info
BRIR1m/3m/5m_30	−30°	Room at 1, 3, or 5 m	All binaural info
BRIR1m/3m/5m_60	60°	Room at 1, 3, or 5 m	All binaural info
ITDILD_BRIR1m/3m/5m_30	−30°	Room at 1, 3, or 5 m	Broadband ITD and ILD
ITDILD_BRIR1m/3m/5m_60	60°	Room at 1, 3, or 5 m	Broadband ITD and ILD
Dio_BRIR1m/3m/5m_30		Room at 1, 3, or 5 m	Diotic
Dio_BRIR1m/3m/5m_60		Room at 1, 3, or 5 m	Diotic
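
As a cross-check of the condition count, the labels of Table 1 can be enumerated programmatically. The following is an illustrative sketch only: the function name is hypothetical and the expanded label format (e.g., `BRIR3m_30`) is an assumption modeled on labels used in the text such as BRIR5m_60; it is not the authors' code.

```python
def build_condition_labels():
    """Enumerate the processing-condition labels of Table 1 (illustrative)."""
    azimuths = ["0", "30", "60"]    # azimuth tags as they appear in the labels
    lateral = ["30", "60"]          # lateral sources only
    distances = ["1m", "3m", "5m"]  # simulated distances in the room

    labels = ["REF"]                                   # diotic anechoic reference
    labels += [f"HRTF_{az}" for az in azimuths]        # 3 anechoic conditions
    labels += [f"ITDILD_HRTF_{az}" for az in lateral]  # 2 anechoic ITD/ILD conditions
    labels += [f"BRIR{d}_{az}" for az in azimuths for d in distances]        # 9
    labels += [f"ITDILD_BRIR{d}_{az}" for az in lateral for d in distances]  # 6
    labels += [f"Dio_BRIR{d}_{az}" for az in lateral for d in distances]     # 6
    return labels


labels = build_condition_labels()
print(len(labels))          # number of processing conditions
print(len(labels) * 3 * 2)  # conditions x sound types x repetitions per session
```

The enumeration yields 1 + 3 + 2 + 9 + 6 + 6 conditions, which matches the 27 conditions stated in the table caption.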

All stimuli were equalized in overall level such that the average of the root-mean-square (RMS) power of the left and right ear signals was set to the same level. The specific gains used as a function of simulated distance for the BRIR stimuli are provided in Supplementary Table 2. The virtual source produced the same broadband (averaged across ears) level independently of its distance and of the signal it reproduced. This equalization did not affect the variations in DRR and source spectra associated with simulated distance, also preserving any ILD present in the stimuli. Moreover, while the overall level was constant with simulated distance, the direct and reverberated sound levels varied and could also be used as distance cues (Supplementary Figure 1). Note that the variations in direct sound level were reduced by the equalization, but the direct sound level of the equalized stimuli still decreased with simulated distance, so it would not mislead the listeners by providing a cue varying inversely to what they would expect (Arend et al., 2021). All stimuli were sampled at 44.1 kHz, D/A converted and amplified using a Lynx TWO sound card, and delivered through headphones (HD 650 Sennheiser; Wedemark, Germany) at 60 dB SPL (calibrated using the MK2 dummy head).
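
The equalization described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing chain used in the study: the function names and the target value are hypothetical, and "power" is interpreted here as the mean-square value of each ear signal. A single gain is applied to both channels, which is why any ILD present in the stimulus is left unchanged.

```python
import math


def mean_power(signal):
    """Mean-square power of one channel (a list of samples)."""
    return sum(x * x for x in signal) / len(signal)


def equalize_level(left, right, target_power):
    """Scale both channels by one common gain so that the across-ear
    average of the mean-square power equals target_power (assumed target)."""
    current = 0.5 * (mean_power(left) + mean_power(right))
    gain = math.sqrt(target_power / current)
    return [gain * x for x in left], [gain * x for x in right]


def ild_db(left, right):
    """Broadband interaural level difference in dB (right relative to left)."""
    return 10.0 * math.log10(mean_power(right) / mean_power(left))
```

Because the gain is common to both ears and to the whole signal, quantities such as the DRR and the spectral shape are also unaffected, in line with the description above.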

Procedure

Two experimental tasks were used, each in a different experimental session. In each task, the participants were asked to imagine a loudspeaker that produces different types of sound (speech, helicopter, and piano), to close their eyes while listening to the sound played, and then open their eyes to respond to the question displayed on a screen in front of them. The participants were told that each sound could be replayed as many times as they liked. Only the question changed between the two tasks.

For the distance task, listeners were asked to indicate the distance of the virtual source in meters by entering a number (using decimals if needed). No reference was provided, except that they were told to indicate a distance of 0 m when they perceived the sound to originate from within their head. For the externalization task, they were asked to evaluate whether the sound seemed to be originating from inside or outside their head (binary response, coded as 0 for internalized and 1 for externalized), and were also asked to indicate their degree of confidence in this answer using a slider on a horizontal line going from “not at all confident” to “very confident.” The analyses associated with this degree of confidence are not presented below, because confidence ratings were generally high and constant across conditions so that their analysis did not add anything to the analysis of the binary ratings.

A short practice session using six stimuli (the 3 REF and the 3 BRIR5m_60) was included to familiarize the participants with the testing environment. All sessions were performed in a double-walled soundproof booth, with a screen visible through a window and access to a mouse and keyboard within the booth. The participant was always facing the screen placed about 1 m away. The order of sessions was randomized across participants, who completed both sessions on the same day. For each session, the 27 processing conditions were repeated twice for each sound type. Thus, participants responded to one question for 162 sounds (27 × 3 × 2) during each of the two sessions. The experiment lasted on average 1.5 hr per participant (including breaks, practice, instructions, and audiogram measurement).

Listeners

Eighteen university students (mean age = 22 years old, SD = 3 years; 10 female) participated in this study. They had audiometric thresholds <20 dB hearing level (HL) at octave frequencies between 125 and 8,000 Hz. Thresholds were also measured at 6,000 Hz; at this frequency only, one listener had a threshold of 25 dB HL in one ear. All participants gave written informed consent and were compensated for their participation.

Data Analyses

Because perceived distance varies as a power function of simulated distance (Zahorik, 2002; Zahorik et al., 2005), distance ratings d were log-transformed before all analyses. The transformation ln(1 + d) was used as 0 m was a valid response (Prud’homme and Lavandier, 2020). Statistical analysis and calculations of mean and standard error were done using ln(1 + d). The values of means and error bars presented in the figures correspond to the inverse transform of ln(1 + d) applied to the mean, the mean plus the standard error, and the mean minus the standard error calculated with ln(1 + d).
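
The transform and its inverse can be sketched as follows. This is an illustrative example, not the analysis script of the study: the function name is hypothetical and the use of the sample standard deviation (n − 1 denominator) for the standard error is an assumption.

```python
import math


def summarize_distance(ratings_m):
    """Return (mean, mean - SE, mean + SE) in meters.

    The statistics are computed on ln(1 + d) and then back-transformed
    with exp(x) - 1, as described in the text.
    """
    logs = [math.log1p(d) for d in ratings_m]  # ln(1 + d); defined for d = 0
    n = len(logs)
    mean = sum(logs) / n
    var = sum((x - mean) ** 2 for x in logs) / (n - 1)  # sample variance (assumption)
    se = math.sqrt(var / n)
    return tuple(math.expm1(v) for v in (mean, mean - se, mean + se))
```

Because the inverse transform is nonlinear, error bars obtained this way are asymmetric around the back-transformed mean: the upper bar is longer than the lower one.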

The externalization and (log-transformed) distance ratings were systematically compared. First, the raw data were analyzed. A logistic regression was used to evaluate the extent to which the (binary) externalization ratings could be predicted from the distance ratings while controlling for the various effects of the experimental conditions. The externalization data were fitted with a generalized linear mixed-effects model having the (log-transformed) perceived distance and the experimental factors as fixed effects and the participants as a random effect (using lme4::glmer in R version 4.2.1). The point-biserial correlation between externalization and distance ratings was also computed.
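
For illustration, the point-biserial correlation between a binary rating and a continuous rating can be computed as sketched below. The function name is hypothetical and this is not the authors' code. The mixed-effects logistic regression itself was fitted in R with lme4::glmer and is not reproduced here.

```python
import math


def point_biserial(binary, continuous):
    """Point-biserial correlation between a 0/1 variable and a continuous one.

    Uses r = (M1 - M0) / s * sqrt(p * q), with s the population standard
    deviation of the continuous variable. The result is numerically
    equivalent to the Pearson correlation of the two lists.
    """
    n = len(binary)
    group1 = [y for b, y in zip(binary, continuous) if b == 1]
    group0 = [y for b, y in zip(binary, continuous) if b == 0]
    mean_all = sum(continuous) / n
    sd = math.sqrt(sum((y - mean_all) ** 2 for y in continuous) / n)
    m1 = sum(group1) / len(group1)
    m0 = sum(group0) / len(group0)
    p, q = len(group1) / n, len(group0) / n
    return (m1 - m0) / sd * math.sqrt(p * q)
```

In the present context, `binary` would hold the 0/1 externalization responses and `continuous` the ln(1 + d) distance ratings for the same trials.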

Correlation analyses were then performed on the data averaged across repetitions and sound types because the statistical analyses did not reveal any interaction between the effects of sound type and processing conditions. This averaging allowed for the externalization ratings to be more “continuous” rather than limited to three discrete values when averaged only across the two repetitions (0, 0.5, and 1), thus more suitable for Pearson correlation computation. The mean externalization rating is informative of the proportion of externalized responses in a given condition (Brimijoin et al., 2013). The mean distance and externalization ratings were compared by computing their Pearson and Spearman correlation coefficients (r_P and r_S) across processing conditions and listeners, or averaged across listeners or processing conditions. Because three correlations were considered, the alpha value used to evaluate their significance was Bonferroni corrected to 0.05/3 = 0.0166.
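
The two correlation coefficients and the corrected alpha can be sketched as follows. This is a generic illustration with hypothetical function names, not the code used in the study. The Spearman coefficient is computed as the Pearson coefficient of the ranks, with tied values receiving their average rank.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests
```

With three correlations, `bonferroni_alpha(0.05, 3)` gives the corrected per-test level of 0.05/3 used in the text.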

To test whether a source was rated as externalized as soon as and only if its distance rating exceeded a distance threshold (e.g., the head radius), the distance ratings were also used to create binary distance ratings. For each distance rating, the binary rating was set to 0 (inside the head) when the distance rating was below 10 cm, and 1 (outside the head) when the distance rating was above 10 cm. The 10-cm threshold was chosen as an approximation of a head radius and also because it corresponds to a dip in the distribution of the distance ratings (see ln(1 + d) = 0.15 in Supplementary Figure 3). After averaging across repetitions, the binary distance ratings were directly compared to the externalization ratings by computing their correlation (across listeners, conditions, and types of sound, n = 18 × 27 × 3 = 1,458), and analyzing the percentage of the data for which they are 0, 0.5, and 1. To evaluate the sensitivity of this analysis to the choice of distance threshold, it was also performed with two other thresholds set at 20 or 1 cm, the latter arising from the distance task in which listeners were told that 0 m should correspond to a source perceived inside their head.
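
The thresholding and the match/mismatch tabulation can be sketched as follows. This is an illustrative example with hypothetical function names and data layout, not the authors' code. The text does not specify how a rating exactly equal to the threshold was coded, so coding it as 0 here is an assumption.

```python
from collections import Counter


def binary_distance(distance_m, threshold_m=0.10):
    """1 (outside the head) if the rating exceeds the threshold, else 0.
    A rating exactly at the threshold is coded 0 (assumption)."""
    return 1 if distance_m > threshold_m else 0


def compare_ratings(distance_pairs, ext_pairs, threshold_m=0.10):
    """Tabulate binary distance against externalization.

    distance_pairs and ext_pairs hold one (repetition 1, repetition 2)
    tuple per listener x condition x sound type. Returns the percentage
    of cases for each (mean binary distance, mean externalization) pair,
    where each mean is 0, 0.5, or 1.
    """
    counts = Counter()
    for (d1, d2), (e1, e2) in zip(distance_pairs, ext_pairs):
        bin_mean = (binary_distance(d1, threshold_m)
                    + binary_distance(d2, threshold_m)) / 2
        ext_mean = (e1 + e2) / 2
        counts[(bin_mean, ext_mean)] += 1
    total = sum(counts.values())
    return {key: 100.0 * c / total for key, c in counts.items()}
```

In the returned dictionary, the keys (1, 1), (0, 0), and (0.5, 0.5) correspond to matches, (1, 0) and (0, 1) to the first type of mismatch, and any key with exactly one value at 0.5 to a mismatch at one of the two presentations. Calling the function with `threshold_m=0.01` or `threshold_m=0.20` corresponds to the sensitivity analysis.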

To test for potential differences in behavior between externalization and (continuous) distance ratings in specific conditions, different statistical analyses were used. The (binary) externalization ratings were analyzed using logistic regressions, fitting the data with generalized linear mixed-effects models having the experimental factors and their interactions as fixed effects and the participants as a random effect. The significant effects were then assessed using analyses of deviance and Tukey pairwise comparisons of estimated marginal means (with car::Anova and emmeans). The (continuous log-transformed) distance ratings (averaged across repetitions) were analyzed using within-subject two-way analyses of variance and Tukey pairwise comparisons (with aov and TukeyHSD). The hypotheses stated at the end of the Introduction were tested by applying the statistical analyses to three specific subsets of the data depending on the particular experimental factors considered. Because the three analyses were conducted on partially overlapping data, the alpha value used to evaluate significance in these analyses was Bonferroni corrected to 0.0166.

Hierarchical clustering was performed for each session in order to assess the homogeneity of the listeners’ ratings (Gordon, 1999; Prud’homme and Lavandier, 2020). This was done using a matrix of dissimilarities across participants, with the dissimilarity between two participants being calculated as 1 minus the Pearson correlation of their distance/externalization estimates across all conditions in the session (after averaging across repetitions). The participants’ ratings were found to be homogeneous for the two sessions.
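
The dissimilarity matrix that serves as input to the clustering can be sketched as follows. This is an illustrative example with hypothetical function names. The clustering step itself is not shown, since the linkage method used in the study is not stated here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def dissimilarity_matrix(ratings):
    """Pairwise dissimilarity 1 - r between participants.

    ratings: one list per participant with that participant's ratings
    across all conditions of a session (averaged across repetitions).
    """
    n = len(ratings)
    matrix = [[0.0] * n for _ in range(n)]  # diagonal stays at 0
    for i in range(n):
        for j in range(i + 1, n):
            d = 1.0 - pearson(ratings[i], ratings[j])
            matrix[i][j] = d
            matrix[j][i] = d  # the matrix is symmetric
    return matrix
```

The dissimilarity ranges from 0 (perfectly correlated participants) to 2 (perfectly anti-correlated participants); homogeneous ratings correspond to uniformly small off-diagonal values.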

Results

Overall Comparisons of Distance and Externalization Estimates

Figure 1 presents the scatter plot of the raw externalization and distance ratings. The logistic regression controlling for the effects of sound type and processing conditions indicated a significant effect of perceived distance on externalization (χ²(1) = 80.9, p < .001). The point-biserial correlation between externalization and distance ratings is 0.36. When averaging across repetitions, among the 112 distance ratings at 0 m, 86.6% have an externalization rating at 0, 9.8% at 0.5, and 3.6% at 1. Among the 567 sources rated as internalized for the two repetitions (externalization ratings at 0), 17.1% have a distance at 0 m while the remaining 82.9% have a distance different from 0 (up to 125 m): 72.8% above 10 cm, 70.9% above 20 cm, 59.8% above 50 cm, 48.3% above 1 m.

Figure 1.

Scatter plots of the raw externalization and log-transformed distance ratings. The data symbols are partly transparent, so that they appear darker when data overlap.

Figure 2 presents scatter plots of the data averaged across repetitions and sound types, displaying the externalization and distance ratings across processing conditions and listeners (panel A), or averaged across listeners (panel B) or processing conditions (panel C). Distance and externalization ratings are significantly correlated in these three cases. The Pearson and Spearman correlation coefficients are r_P = r_S = .51 across processing conditions and listeners, r_P = .69 and r_S = .62 across processing conditions (averaging across listeners), and r_P = .66 and r_S = .65 across listeners (averaging across processing conditions).

Figure 2.

Scatter plots of the mean (across repetitions and sound types) externalization and log-transformed distance ratings across the 27 processing conditions and 18 listeners (panel A), or averaged across listeners (panel B) or processing conditions (panel C). The data symbols are partly transparent, so that they appear darker when data overlap increases, highlighting the data distribution. The corresponding Pearson and Spearman correlation coefficients (r_P and r_S) are presented in each panel. All correlations are significant.

To test whether a source was rated as externalized as soon as its distance rating exceeded 10 cm, the binary distance ratings were compared to the externalization ratings (after averaging across repetitions but not sound types). Their correlation across listeners, conditions, and types of sound is significant with r_P = .44 and r_S = .46. There is a match between binary distance and externalization for 51.3% of responses: for 40.3% of responses the source was rated as externalized and with a “distance outside the head”; for 10% of responses the source was rated as internalized and with a “distance inside the head”; for 1% of responses, the ratings are both 0.5. There is a first type of mismatch between binary distance and externalization for 22.1% of responses: 0.4% of the sources were externalized with a distance inside the head, while 21.7% were internalized with a distance outside the head. There is a second type of mismatch for 26.6% of responses: one rating is at 0.5 while the other is at 0 or 1 (mostly the externalization at 0.5 and binary distance at 1, in 17.8% of responses, or the externalization at 0 and binary distance at 0.5, in 7.1% of responses), indicating a mismatch at one of the two sound presentations.
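
As a consistency check, the proportions reported above partition the full set of responses. The arithmetic below uses only the percentages given in the text:

```latex
\begin{align*}
\text{match} &= 40.3\% + 10\% + 1\% = 51.3\% \\
\text{mismatch, first type} &= 0.4\% + 21.7\% = 22.1\% \\
\text{match} + \text{first type} + \text{second type} &= 51.3\% + 22.1\% + 26.6\% = 100\%
\end{align*}
```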

These analyses detailed by processing conditions are presented in Supplementary Table 3. The mismatch where an internalized source is rated with a distance outside the head is observed more in the diotic conditions (Dio, REF), with proportions ranging from 25.9% to 37% of the responses, and also for the frontal sources (HRTF_0, BRIR_0), with proportions between 16.7% and 38.9% of the responses. The proportions and correlations remain broadly unchanged with a distance threshold set at 1 or 20 cm (Supplementary Table 4; r_P = .40 and r_S = .43 for 1 cm, r_P = .45 and r_S = .46 for 20 cm).

Specific Comparisons of Distance and Externalization Estimates

Distance and externalization were also compared by investigating potential differences in behavior in specific conditions. Our hypotheses concerned the effects of (1) sound type, (2) simulated azimuth, (3) simulated distance and reverberation, (4) listening mode and diotic reverberation.

Sound Type

To investigate the effect of sound type on distance and externalization estimates, a main analysis was applied to the whole dataset considering the factors of sound type (three levels) and processing condition (27 levels). It revealed significant main effects of sound type (distance: F(2,1377) = 129.1, p < .001; externalization: χ²(2) = 9.7, p < .01) and processing condition (distance: F(26,1377) = 5.1, p < .001; externalization: χ²(26) = 130.2, p < .001). To highlight the effect of sound type, Figure 3 presents the distance (top panel) and externalization (bottom panel) estimates averaged across processing conditions and listeners. Pairwise comparisons on externalization indicated that speech was perceived as less externalized than both the piano and the helicopter. Pairwise comparisons on distance estimates indicated that the helicopter was perceived at a significantly larger distance than both the piano and speech. Distance ratings (averaged across repetitions) varied between 0 and 200 m. Ninety ratings (6.1% of ratings) were at 35 m or more: eight were for speech (from two listeners), one was for the piano (from a listener who also judged seven speech stimuli at 35 m or more), and 81 were for the helicopter (from six listeners). Twenty-nine ratings (2% of ratings) were at 70 m or more, all for the helicopter (from four listeners).

Figure 3.

Mean distance (top) and externalization (bottom) estimates with standard errors across listeners and processing types plotted as a function of sound type. The significant differences are highlighted with *** (p < .0001).

Simulated Azimuth, Distance and Reverberation

Figure 4 presents the distance (top panel) and externalization (bottom panel) estimates averaged across sound types and listeners. To investigate the effects of simulated azimuth and distance/reverberation, subanalysis 1 considered a subset of 12 processing conditions (black filled symbols): the HRTF conditions at the three simulated azimuths and the BRIR conditions at the three simulated azimuths and distances. They involved four reverberation levels (one anechoic and one for each simulated distance). Subanalysis 1 thus tested for three experimental factors: source azimuth (three levels), reverberation level (four levels), and sound type (three levels). On distance estimates, it revealed significant main effects of source azimuth (F(2,612) = 7.8, p < .001), reverberation level (F(3,612) = 19.5, p < .001), and sound type (F(2,612) = 62, p < .001). Pairwise comparisons indicated that the frontal sources (black triangles, top panel of Figure 4) were judged significantly closer than the two lateral sources (black squares and circles). All reverberation levels led to significant differences in distance estimates, apart from the BRIRs at 3 and 5 m. Perceived distance increased when going from anechoic (HRTF) to reverberant (BRIR), and further increased when simulating the source further away in the room at 3 m compared to 1 m. The analysis on externalization indicated significant main effects of source azimuth (χ²(2) = 12.4, p < .01) and sound type (χ²(2) = 9.3, p < .01). Pairwise comparisons revealed that the frontal sources (black triangles, bottom panel of Figure 4) were judged significantly less externalized than the two lateral sources (black squares and circles). The effects of sound type were identical to those already described (Figure 3).

Figure 4.

Mean distance (top) and externalization (bottom) estimates with standard errors across listeners and sound types in the 27 tested conditions grouped by processing type: the categories REF (original unprocessed diotic signal), HRTF (anechoic signals), and BRIR (reverberated signals convolved with a BRIR at one of the three simulated distances: 1, 3 or 5 m) are displayed on the x-axis, while the symbols code for the categories 0/30/60 (signals convolved with the BRIR/HRTF at 0°, −30°, and 60°, respectively), Dio_30/60 (diotic versions of the BRIR signals at −30° and 60°), and ITDILD_30/60 (broadband ITD and ILD added to the diotic versions of the BRIR/HRTF signals at −30° and 60°). A small horizontal offset has been added to the data to reduce symbol overlap. For figure readability, significant differences are reported only in the text.

Diotic Reverberation and Listening Mode

To investigate the effects of listening mode and diotic reverberation, subanalysis 2 considered a subset of 18 processing conditions (all the BRIR data in Figure 4 apart from the black triangles of the frontal conditions): the BRIR conditions at the three simulated distances (1, 3, and 5 m), for the lateral sources, in three listening modes (signals convolved with the BRIRs 30/60, diotic versions of these signals Dio_30/60, and signals with broadband ITD and ILD added to the diotic versions of the BRIR signals ITDILD_30/60). Subanalysis 2 thus tested for four experimental factors: listening mode (three levels), as well as sound type (three levels), source azimuth (two levels), and reverberation level/simulated distance (three levels). On distance estimates, it revealed significant main effects of listening mode (F(2,918) = 13.8, p < .001), sound type (F(2,918) = 71.3, p < .001), and simulated distance (F(2,918) = 11.6, p < .001). Pairwise comparisons indicated that the BRIR sources (black filled circles and squares in the top panel of Figure 4) were perceived further away than their Dio and ITDILD versions (open and gray symbols). The sources simulated at 1 m were perceived significantly closer than the sources simulated at 3 and 5 m. The effect of sound type was identical to the one already described (Figure 3). The analysis on externalization indicated a significant main effect of listening mode (χ²(2) = 14.1, p < .001). Pairwise comparisons revealed that the three listening modes led to externalization estimates that were all significantly different from each other, in particular the ITDILD signals (gray filled symbols in the bottom panel of Figure 4) led to more externalized sources than the Dio signals (open symbols).
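As a reading aid, the denominator degrees of freedom reported for the three distance analyses (1377, 612, and 918) can be checked against the size of each data subset. The sketch below assumes one rating per listener, sound type, and processing condition (ratings averaged across repetitions) and 18 listeners, and it simply subtracts the number of sound type × condition cells from the number of ratings. This is an arithmetic observation that happens to match the reported values; it is not a description of the statistical model that was actually fitted, and the function name is hypothetical.

```python
N_LISTENERS = 18  # number of participants in the study


def residual_df(n_conditions: int, n_sound_types: int = 3,
                n_listeners: int = N_LISTENERS) -> int:
    """Number of ratings minus number of (sound type x condition) cells.

    Assumes one averaged rating per listener, sound type, and condition.
    """
    n_ratings = n_listeners * n_sound_types * n_conditions
    n_cells = n_sound_types * n_conditions
    return n_ratings - n_cells


# Number of processing conditions in each analysis, as stated in the text.
analyses = {"main analysis": 27, "subanalysis 1": 12, "subanalysis 2": 18}

for name, n_conditions in analyses.items():
    print(f"{name}: residual df = {residual_df(n_conditions)}")
# main analysis: residual df = 1377
# subanalysis 1: residual df = 612
# subanalysis 2: residual df = 918
```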

The original diotic anechoic signals (REF) were used as a reference for sources that should be very internalized. Figure 4 confirms that this condition yielded the lowest externalization ratings, along with the frontal anechoic source (HRTF_0, which was diotic and anechoic like REF).

Discussion

The present study aimed at comparing evaluations of perceived distance and externalization by the same listeners for the same source simulations. A logistic regression indicated that externalization ratings could be partly predicted from the distance ratings. The results confirmed that ratings averaged across repetitions and sound types were significantly correlated, r_P and r_S varying between .51 and .69 when considering comparisons across listeners and/or conditions. They also indicated that a source was not rated as externalized as soon as and only if its perceived distance exceeded a threshold: using a 10-cm threshold, only 51.3% of responses are consistent with this hypothesis. The ratings were also compared in specific subsets of conditions chosen to test our hypotheses concerning potential differences in behavior between distance and externalization. These conditions were not used to highlight any new effect. Most of the effects previously highlighted in the literature were replicated here. As discussed below, while the present study cannot conclude concerning a potential difference for the effect of source azimuth on distance and externalization, it highlights differences for the effects of sound type, simulated distance, and diotic reverberation.

Overall Comparisons of Distance and Externalization

The 0.36 point-biserial correlation between the raw externalization and distance ratings indicates that the variance in the distance ratings can account for 13% of the variance in the externalization ratings. When averaging ratings across repetitions and sound types, Figure 2 confirms that distance and externalization are not independent percepts: the corresponding ratings are correlated across processing conditions and listeners (panel A), with r_P= r_S= .51. These correlation coefficients indicate that the variance in distance ratings/rankings explains 26% of the variance in externalization ratings/rankings (and reciprocally). The ratings are also correlated when averaged across listeners (panel B, r_P= .69 and r_S= .62), so that conditions in which sources were perceived as far/near tended to be conditions in which the sources were perceived as externalized/internalized, with the variance in distance ratings/rankings explaining 47.6% and 38.4% of the variance in the externalization ratings/rankings, respectively. The ratings are correlated when averaged across conditions (panel C, r_P= .66 and r_S= .65), so that listeners who judged a source as far/near also tended to judge it as externalized/internalized, with the variance in distance ratings/rankings explaining 43.6% and 42.2% of the variance in externalization ratings/rankings, respectively.
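The percentages of explained variance quoted above follow from squaring each correlation coefficient. A minimal sketch of that arithmetic is given below; the function name is hypothetical, and because the coefficients are entered as rounded in the text, the last digit can differ slightly from the percentages reported (which were presumably computed from unrounded coefficients).

```python
def explained_variance_percent(r: float) -> float:
    """Percentage of variance accounted for by a correlation coefficient r."""
    return 100.0 * r ** 2


# Correlation coefficients as quoted (rounded) in the text.
reported = {
    "raw ratings (point-biserial)": 0.36,
    "panel A (r_P = r_S)": 0.51,
    "panel B (r_P)": 0.69,
    "panel B (r_S)": 0.62,
    "panel C (r_P)": 0.66,
    "panel C (r_S)": 0.65,
}

for label, r in reported.items():
    share = explained_variance_percent(r)
    print(f"{label}: r = {r:.2f} -> {share:.1f}% of variance")
```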

One could argue that the limited amounts of explained variance could result from the hypothesis that externalization is expected to be constant when perceived distance increases above the head radius, the source being perceived inside the head when perceived distance is below this threshold. The binary distance ratings (0 for d ⩽ 10 cm; 1 otherwise) are significantly correlated with the externalization ratings (r_P= .44 and r_S= .46) and 51.3% of responses support this hypothesis. However, a mismatch is observed across repetitions in 22.1% of responses, mostly associated with sources rated as internalized but with a distance outside the head (21.7% of responses), in particular in the diotic conditions and for the frontal sources. For the remaining 26.6% of responses, one of the two sound presentations also led to a mismatch. Overall, these results indicate that a source was not rated as externalized as soon as and only if its perceived distance exceeded a threshold. This was further confirmed by analyses setting the threshold just above 0 or at 20 cm.
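The threshold hypothesis tested above can be made concrete with a short sketch. It assumes that each response consists of two repetitions, each with a distance rating in meters and a binary externalization rating, and it sorts responses into three groups: both repetitions agree with the hypothesis, both disagree, or only one agrees. The function names and the example responses are hypothetical and serve only as an illustration; they are not data from the study.

```python
from collections import Counter


def matches_hypothesis(distance_m: float, externalized: bool,
                       threshold_m: float = 0.10) -> bool:
    """True when 'externalized' coincides with 'distance above the threshold'."""
    return externalized == (distance_m > threshold_m)


def classify_response(repetitions, threshold_m: float = 0.10) -> str:
    """Classify a list of (distance_m, externalized) repetitions."""
    n_match = sum(matches_hypothesis(d, e, threshold_m) for d, e in repetitions)
    if n_match == len(repetitions):
        return "consistent"  # every repetition supports the hypothesis
    if n_match == 0:
        return "mismatch"    # no repetition supports the hypothesis
    return "mixed"           # only some repetitions support it


# Made-up responses for illustration only.
responses = [
    [(2.0, True), (3.0, True)],     # outside the head and externalized
    [(0.0, False), (0.05, False)],  # inside the head and internalized
    [(2.0, False), (1.5, False)],   # internalized but rated at a distance
    [(1.0, True), (1.0, False)],    # repetitions disagree
]

counts = Counter(classify_response(r) for r in responses)
print(counts)
```

Changing `threshold_m` to 0.01 or 0.20 corresponds to the alternative thresholds considered in the supplementary analyses.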

The reference sources for internalization (REF) were on average rated at 2 m by the listeners (despite being instructed to rate at 0 m a source perceived in their head), while still providing among the lowest externalization ratings. This is surprising, even when taking into account that distance estimations are less reliable when measured in a listening booth (Cubick et al., 2015). Three main reasons could explain this result. First, sound level was the only distance cue available for these stimuli (while DRR was another cue for the reverberated sounds) and the listeners were faced with the difficult task of making absolute judgments for sounds equalized in level, a situation which would not be expected in natural environments. Second, spatial perception is dependent on the stimulus context, in particular the perceived location of the preceding stimuli, as highlighted for noise bursts (Andrejková et al., 2023; Carlile et al., 2001), band-limited noises (Laback, 2023) and tones (Lingner et al., 2018). Here, the distance estimate in a given condition was influenced by the other distances simulated in the experiment. The non-spatialized sources REF would have certainly been rated at much shorter distances if the other sources had been simulated only at distances between 15 and 100 cm, instead of 1–5 m. Third, the distance ratings have most probably been influenced also by source-level expectations from the listeners (see the discussion below on the effect of sound type), such that the helicopter was perceived much farther than the speech and piano. The distance ratings for the helicopter were large in all conditions, in particular also in the REF condition, contributing to the average rating of this condition being at 2 m. Although this 2-m value is surprising, it is reassuring to note that the REF sources still led to the lowest distance ratings in the experiment. 
This confirms that the interpretations of absolute distance judgments need to account for the multiple possible influences of context (task, stimuli, instructions), as suggested by Lingner et al. (2018) for absolute spatial perception in general.

Specific Comparisons of Distance and Externalization

Sound Type

Similar trends across processing conditions were obtained for the three types of sound. However, the main effects of sound type on distance and externalization ratings were different: while the helicopter was evaluated at further distances than the piano and speech, the speech was judged as less externalized than the helicopter and piano. Another way of formulating this differential role of sound type on distance and externalization is that, while the relationship between the helicopter and speech was identical in the two measures (the helicopter was rated as more distant and more externalized compared to the speech), the piano was found more externalized but not more distant than the speech, and found less distant but not less externalized than the helicopter.

While the stimuli equalized in level were not equalized in loudness (across sound types with different spectra), the very large distance estimates obtained for the helicopter indicate that factors other than loudness were probably involved, including cognitive factors. Philbeck and Mershon (2002) suggested that, in an experimental context, distance perception can be affected by past experience independently from the comparison with previous stimuli in the experiment. A helicopter is generally not a source experienced at short distances by most listeners, but more critically they would probably expect it to be very loud, and thus could consider that it should be far away to produce the same sound level as speech at the ears. Equalizing the ear level removed a strong distance cue for each source type, but it potentially reinforced or introduced an across-type cue due to different level expectations from the listeners depending on the source. The present study points toward level expectations having a strong effect on absolute distance ratings, generalizing the conclusions of Brungart and Scott (2001) to stimuli other than (whispered, conversational, and shouted) speech.

It is important to note that the present study used convolutions with BRIRs and HRTFs to simulate a loudspeaker (the one used in the BRIR/HRTF measurements) reproducing different types of sound, in agreement with the instructions given to the listeners who were asked to imagine such a quite common situation in real life (e.g., when watching a movie). The observed effect of level expectations could indicate that listeners might have imagined real sources rather than a loudspeaker. The convolutions used would not be appropriate to simulate such a situation: at distances of 1–5 m, a helicopter and even a piano cannot be approximated by a point source. This approximation would only be realistic if these sources were simulated far away. This could have further contributed to the large distance estimates obtained for the helicopter.

The speech was found less externalized than the helicopter and piano, consistent with the results of Leclère et al. (2019) who found speech slightly less externalized than music in part of their experiments. Speech might be more internalized because internal speech is not uncommon as people can hear their own voice (at least partly) inside their head when talking and are also quite familiar with phone conversations with earphones that generally also produce internal speech. However, no strong effect of sound type on externalization was found here nor in the study of Leclère et al. (2019), contrary to the strong effect found here for distance.

Frontal Versus Lateralized Sources

Externalization ratings were expected to be higher for lateral sources compared to frontal sources (Brimijoin et al., 2013; Kates et al., 2018; Leclère et al., 2019). This was true here. The lateral sources were more externalized than the frontal sources, both in anechoic and reverberant conditions. This effect can be further understood by considering the ITDILD sources. They tested whether lateral sources could be more externalized only because they are simulated on the side, even in the absence of the time-varying frequency-dependent binaural cues associated with reverberation that have been shown to enhance externalization (Catic et al., 2013; Leclère et al., 2019; Li et al., 2019). These cues explain here that the ITDILD sources were less externalized than the sources with full binaural cues. However, the fact that the ITDILD sources were more externalized than the corresponding diotic sources indicates that there is something in addition to the time/frequency-dependent binaural cues that enhances externalization for lateral sources. It gives support to the existence of a bias in which listeners might not rate a source as internalized as soon as it is lateralized, even if it might still be inside the head, because it tends to be perceived on the side of the head, closer to the surface of the skull (Best et al., 2020).

Contrary to externalization, perceived distance was not expected to change much with azimuth for simulated distances above 1 m. Zahorik (2002) did not find a direct influence of source direction on perceived distance while comparing the parameters of power functions fitted to distance estimates. Here, the direct comparison of such estimates showed that frontal sources were perceived closer than lateral sources. The discrepancy might be explained by the fact that in the present study stimuli were equalized in sound level, while level was kept as a distance cue in the study of Zahorik (2002), who also showed that the relative weights given by listeners to the level and reverberation cues depended on source direction. The present study indicates that, when overall level is kept constant, distance perception can depend on source direction even for distances above 1 m. The fact that the diotic versions of the lateral sources were perceived at the same shorter distances as the frontal sources indicates that the difference in perceived distance between lateral and frontal sources probably relies on additional binaural information available for the lateral sources (this is further discussed below).

Even if Figure 4 indicates that the effect of source azimuth could be larger for externalization than for distance, the magnitude of the differences in ratings between frontal and lateral sources is difficult to compare on different scales. Moreover, the statistical analyses point to a significant effect of source azimuth for both externalization and distance. Thus it cannot be concluded here that this effect was different for distance and externalization.

Simulated Distance and Reverberation

Reverberation was expected to enhance externalization (Catic et al., 2013; Leclère et al., 2019; Li et al., 2019). Figure 4 shows that sources simulated in the room tended to be perceived as more externalized than the anechoic sources, but this trend was not statistically significant (the p-value of .0447 in subanalysis 1 was above the Bonferroni-corrected significance level of .0166). This is probably due to a lack of statistical power. More listeners or repetitions would have been necessary to confirm this effect highlighted in the literature.
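The comparison with the corrected significance level can be sketched as follows. The uncorrected level of .05 and the number of comparisons (three) are assumptions inferred from the reported corrected level of .0166, which is close to .05/3; they are not stated in this section.

```python
def bonferroni_level(alpha: float, n_comparisons: int) -> float:
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_comparisons


ALPHA = 0.05       # assumption: conventional uncorrected level
N_COMPARISONS = 3  # assumption: inferred from .0166 being close to .05 / 3
P_VALUE = 0.0447   # reported for the reverberation trend in subanalysis 1

corrected_level = bonferroni_level(ALPHA, N_COMPARISONS)
is_significant = P_VALUE < corrected_level

print(f"corrected level = {corrected_level:.4f}")
print(f"significant after correction: {is_significant}")
```

Under these assumptions, the trend would have passed the uncorrected .05 level but not the corrected one.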

Perceived distance was expected to increase with simulated distance in the room (Prud’homme and Lavandier, 2020; Zahorik, 2002) and this was observed when the simulated distance increased from 1 to 3 m. Increasing this distance to 5 m did not further increase perceived distance, maybe because of the existence of an auditory “horizon” observed in several previous studies (Akeroyd et al., 2007; Bronkhorst and Houtgast, 1999; Mershon and Bowers, 1979; Zahorik, 2002). Contrary to perceived distance, externalization remained fixed while varying the source distance in the room.

Diotic Reverberation

Perceived distance was still expected to vary with simulated distance in diotic listening (Bidart and Lavandier, 2016; Prud’homme and Lavandier, 2020) and this was verified: the effect of simulated distance was significant and did not interact with the listening mode (BRIR vs. Diotic vs. ITDILD). Significant differences in perceived distance were obtained in the diotic conditions that were among the most internalized conditions. For the lateral sources, the diotic stimuli led to shorter perceived distances than the BRIR stimuli with full binaural information. This contrasts with previous results obtained for frontal sources, in which diotic and dichotic listening led to very similar distance estimates (Bidart and Lavandier, 2016; Prud’homme and Lavandier, 2020). Diotic listening was not tested here for the frontal sources to limit the number of tested conditions. Nevertheless, the frontal sources were perceived at about the same distances as the diotic and ITDILD versions of the lateral sources, closer than the lateral sources with full binaural information. This suggests that the difference in perceived distance between lateral and frontal sources could be due to the time-varying frequency-dependent binaural differences available for the lateral sources simulated in the room. The long-term broadband approximation of these binaural differences (in the ITDILD stimuli) and the binaural differences created by reverberation for the frontal sources do not seem sufficient to increase perceived distance, consistent with the results of the aforementioned studies in which dichotic and diotic listening led to similar distance estimates for frontal sources.

Contrary to its effect on distance, diotic reverberation was not expected to improve externalization (Catic et al., 2015; Leclère et al., 2019). Our results show a strong reduction in externalization when reverberation was diotic (Dio_30/Dio_60) compared to dichotic (30/60). The diotic conditions were the least externalized conditions with the anechoic frontal sources and the REF conditions. Adding the broadband ITD and ILD to the diotic versions of the BRIR and HRTF stimuli increased externalization (as discussed above), while this was not observed for distance.

Limitations

Many studies on externalization have based their evaluation on an underlying measure of distance (for a review see Best et al., 2020). The most commonly used externalization scale goes from the center of the head towards a fixed point in the environment, usually a (silent) loudspeaker used as a reference, often using adjectives such as “close/near” and “far” (Catic et al., 2013; Gil-Carvajal et al., 2016; Hartmann and Wittenberg, 1996; Hassager et al., 2016; Kim and Choi, 2005). Using such a scale implicitly assumes that distance and externalization are aspects of a single perceptual continuum. Here, such a distance-related measure of externalization was replaced by a binary question, as previously used by Brimijoin et al. (2013). To create a continuous scale for externalization without using distance references, a question concerning the confidence in each externalization rating was included. It varied continuously between 0 (not confident at all) and 1 (very confident). Through multiplication of the externalization rating (changed to −1 for internalized, keeping 1 for externalized) with the confidence rating, one can create a continuous scale between −1 for extremely confident that the sound originated internally, to 1 for extremely confident that the sound was externalized. However, in the present study, the confidence ratings were generally high and these continuous ratings showed highly similar results to the simpler binary ratings.
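The construction of this continuous scale can be illustrated with a minimal sketch, in which the binary rating is recoded to −1 or 1 and multiplied by the confidence rating. The function name and the example values are hypothetical.

```python
def continuous_externalization(externalized: bool, confidence: float) -> float:
    """Combine a binary externalization rating with a confidence in [0, 1].

    Returns a value between -1 (confidently internalized) and
    1 (confidently externalized).
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie between 0 and 1")
    sign = 1.0 if externalized else -1.0
    return sign * confidence


# Illustrative ratings only (not data from the study).
print(continuous_externalization(True, 1.0))    # confidently externalized
print(continuous_externalization(False, 1.0))   # confidently internalized
print(continuous_externalization(True, 0.25))   # externalized, low confidence
```

When confidence ratings are close to 1, as reported here, this scale essentially reduces to the recoded binary rating.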

Stimuli were equalized in overall level in the present study, thus removing a strong distance cue for each sound type (but not the across-type cue associated with level expectations discussed above, nor the variations in direct and reverberated sound levels that might have been used as distance cues). If externalization were independent of level, then, because distance is strongly dependent on level, when natural level variations are present one would expect externalization and distance ratings to behave even more differently than what has already been shown here. However, investigating the potential effect of level on externalization might not prove as trivial as one might think: Hartmann and Wittenberg (1996) informally reported that sounds seemed to “jump around” when they tested roved levels.

The fact that nonindividualized HRTFs/BRIRs were used here for all participants, regardless of their head size compared to the manikins used for the measurements, could have impaired their spatial perception compared to what they would have experienced listening with their own ears. However, as pointed out in the Introduction, based on previous studies, no such impairment was expected for distance perception (Prud’homme and Lavandier, 2020; Zahorik, 2002b) and, if any, the impairment was probably limited for externalization (Begault et al., 2001; Cubick et al., 2015; Kates et al., 2018; Leclère et al., 2019). In particular, Leclère et al. (2019) showed that individualization of their stimuli had a much smaller effect on externalization ratings than the effects of reverberation and source azimuth (front vs. side), also smaller than the already small effect of sound type. The effect of source azimuth on externalization was large in the present study, but the effect of sound type was small and the effect of reverberation did not reach significance. Thus, if tested, the effect of individualization would have probably been very limited. Using nonindividualized stimuli could still have slightly impaired externalization for the less externalized frontal sources (Werner et al., 2016), in particular in the anechoic condition. This does not jeopardize the study conclusions, which might even generalize to individualized stimuli, but this remains to be tested.

One of the limitations of the present study is the programming error that affected the HRTF-based stimuli. The corresponding data had to be mirror-imaged assuming that the distance and externalization tasks were not affected by any left-right asymmetry. The anechoic nonindividualized HRTFs used here are left/right symmetric, so any asymmetry affecting the results would have been perceptual. Arend et al. (2021) measured the perceived distance of nearby sources simulated between 0.25 m and 1.50 m, in anechoic conditions, for three source azimuths: 30°, 150°, and −90°. A left/right asymmetry cannot be evaluated directly because the sources on each side were tested at different azimuths, but already the effect of azimuth was significant essentially at very short distances below 0.5 m, or for conditions without level equalization in which the effect was interpreted as resulting from azimuth-dependent loudness differences in the stimuli. Parseihian et al. (2014) also measured the perceived distance of nearby sources, between 0.72 m and 1.08 m, but for a configuration of sources placed on a tabletop surface. Sources were tested every 30° between ±60° and also at +90° and +120°, in a low reverberant space, with stimuli equalized in loudness. Listeners had to place their hand at the perceived position of the source and results were analyzed in terms of errors in perceived localization. The error in distance was significantly smaller for the lateral sources (60°/90°) compared to the other positions, but no significant difference between ±30° or ±60° was found, indicating no sign of a left/right asymmetry despite the (nonanechoic) room potentially introducing differences in the stimuli for the left/right sources. In the study of Best and Roverud (2024), listeners wearing different types of hearing aids were seated in a large sound-treated booth and presented with sound from one of seven loudspeakers at 0°, ±15°, ±30°, ±90°, at a fixed distance. 
Perceived distance was evaluated on a scale ranging from 10 (at the loudspeaker ring) to 0 (in the head) to −10 (the furthest behind the listener). For normal-hearing listeners, no significant effect of source azimuth was found. The authors also used the distance ratings to compute binary internalization ratings and did not find a significant effect of source azimuth. There was no evidence for a left/right asymmetry in perceived distance or externalization, despite potential differences in the stimuli due to nonsymmetric reverberation. Begault et al. (2001) measured externalization for sources simulated in a room at 0°, ±45°, ±135°, and 180°. Again, while reverberation could have introduced differences in the stimuli for the left/right sources, they did not find a significant effect of source azimuth, in particular no sign of a left/right asymmetry. In their experiment 1, Brimijoin et al. (2013) measured externalization while reproducing sound over loudspeakers placed at a fixed distance, every 30° symmetrically around the listener, in a hemi-anechoic room. The signal was either fixed, reproduced on a single loudspeaker, or it moved across the loudspeakers to follow the listener head movements (simulating headphone listening). The pattern of externalization ratings appears very symmetrical (see their Figure 3), but the ratings of symmetric conditions are not identical, in particular for the signals following head movements (though error bars almost always overlap). The effect of source azimuth was significant but no post-hoc analyses were provided to test for an effect of source laterality. A few other studies measured externalization for sources simulated on both sides of the listeners, but the data were averaged across sides without mentioning/testing for any left/right difference (Wenzel, 1995; experiment 2 of Brimijoin et al., 2013; Hendrickx et al., 2017; Heine et al., 2021). 
Even if these studies point towards the absence of a left/right asymmetry in perceived distance and externalization, none was specifically designed to investigate such asymmetry. Previous studies have highlighted a left/right asymmetry in perceived azimuth (Abel et al., 1999, 2000; Burke et al., 1994; Savel, 2009) and elevation (Butler, 1994), as well as in the build-up of echo suppression (Grantham, 1996). Interestingly, Abel et al. (1999, 2000) and Burke et al. (1994) associated the asymmetry in azimuth perception with more front-back confusions for the right sources, and thus to a difference in the treatment of spectral cues, as it would be the case for the asymmetry in perceived elevation (Butler, 1994). Therefore, it could be interesting to investigate the potential existence of a left/right asymmetry for perceived distance and externalization, particularly in conditions in which they rely on spectral cues (Best et al., 2020; Kolarik et al., 2016; Zahorik et al., 2005), while controlling for any confounding effect that could be associated with nonsymmetric reverberation.

The results of the present study are probably influenced by different contextual effects associated with the tasks and stimuli. The distance and externalization tasks were done in different experimental blocks. Listeners could report the source to be at a distance in one block, and then in their head in the second block, without being challenged about that. The tasks were blocked so that their results could be compared to those of experiments that measured only distance or only externalization, but the results might have been different if the tasks had been done one just after the other for each sound presentation. The contextual effects also concern the different types of stimuli that were mixed in the same experimental block. It was already mentioned that the range of simulated distances influences all distance estimates, even those of non-spatialized sounds (REF). There could also be contextual effects associated with presenting a reverberated sound just after an anechoic one. The effects of sound type and level expectations could have been reduced if the sound types had been tested in different experimental blocks. These contextual effects are impossible to assess here, but they likely influenced the data. This question requires further investigation, e.g., by designing an experiment in which the types of sound are blocked but the tasks are not.

The results of the present study are probably also influenced by the instructions given to the listeners. Visual information may have affected judgments in both tasks. Even if the listeners were asked to keep their eyes closed while listening, they were not monitored to check that they obeyed this instruction. This could have impaired their ability to imagine a loudspeaker at a certain distance producing sounds. If they had been explicitly told that a source perceived at a distance had to be outside their head, and that they should monitor their externalization responses as such, the data might be different. Listeners might make a distinction between making judgments in virtual space where they might vary their distance ratings (e.g., a source is rated farther away when it is more reverberated) even if the source is internalized (no matter how reverberated it is), and real space where sources have to be externalized to vary in distance.

In the present study, simulated distance varied from 1 to 5 m. These distances made it possible to show that externalization does not arise simply because the source has a (simulated) distance from the listener. However, if the aim were to obtain a better spatial resolution close to the head radius, smaller distance increments below 1 m should be considered. Note that at such distances, perceived distance becomes more dependent on binaural information (Brungart et al., 1999), as externalization is (Catic et al., 2013; Leclère et al., 2019; Li et al., 2019), so the correlation between distance and externalization might be stronger. This warrants further investigation.

In the distance task, listeners were told to use a source perceived in their head as the reference for 0 m, as commonly done in distance studies (Best et al., 2020; Kolarik et al., 2016). An externalization reference was thus used for a distance task. In the same way that distance labels might not be the most appropriate for an externalization scale, we will refrain from using an externalization reference to define the origin of a distance scale in the future, keeping in mind that significant differences in perceived distance have been reported in internalized conditions, such as the diotic conditions here and in previous studies (Bidart and Lavandier, 2016; Prud’homme and Lavandier, 2020). Note that listeners might not need to be provided with a reference to evaluate distance, and that their reference might differ between virtual and real spaces.

Finally, one possible criticism is that participants did not spontaneously describe their subjective experience but were guided in the experience they had to report. They might not spontaneously perceive externalization but, when asked under experimental conditions to choose between externalized and internalized, they could develop cognitive strategies to distribute their responses between the two extremes, based on acoustic cues that are unrelated to the externalization percept. The same could be said about distance.

Despite these limitations, it is reassuring to note that the cues that affected externalization and distance in our study have already been reported to do so in the literature (Best et al., 2020; Kolarik et al., 2016; Zahorik et al., 2005). The present study replicated several effects previously reported in studies that measured either distance or externalization.

Conclusion

Comparing evaluations of distance and externalization for the same stimuli by the same listeners indicated that distance and externalization ratings were significantly correlated across conditions and listeners, and when averaged across listeners or conditions. These correlations indicated that the variations in perceived distance explained between 26% and 47.6% of the variations in externalization (and reciprocally; 13% in the raw data before averaging across repetitions and sound types). It was also shown that a source was not rated as externalized as soon as and only if its perceived distance exceeded a threshold, e.g., the head radius. Important differences between externalization and distance were highlighted. Diotic reverberation influenced distance but not externalization. Adding broadband ITD and ILD to diotic sounds increased externalization but not distance. The effects of sound type were different for distance and externalization: the helicopter was evaluated as more distant, while speech was judged as less externalized (or, equivalently, the piano was found more externalized but not more distant than speech, and less distant but not less externalized than the helicopter). Distance ratings increased with simulated distance even for stimuli judged as internalized (diotic sounds, frontal sources). Taken together, these results indicate that distance and externalization are correlated but might not be aspects of a single perceptual continuum. As a consequence, future studies evaluating externalization could consider not using a scale related to distance.
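The shared-variance figures quoted in this conclusion can be related to correlation magnitudes with a short calculation. The following is a minimal sketch, assuming the percentages are squared correlation coefficients (r²); the helper name `correlation_magnitude` is hypothetical, and only the three proportions are taken from the text.

```python
import math


def correlation_magnitude(shared_variance: float) -> float:
    """Return |r| for a given proportion of shared variance r**2.

    Hypothetical helper: assumes the proportion is a squared correlation
    coefficient, which is an interpretation of the text, not a reported method.
    """
    if not 0.0 <= shared_variance <= 1.0:
        raise ValueError("shared_variance must lie in [0, 1]")
    return math.sqrt(shared_variance)


# Proportions of shared variance quoted in the conclusion.
quoted = [("raw data", 0.13), ("lower bound", 0.26), ("upper bound", 0.476)]
for label, r2 in quoted:
    print(f"{label}: r^2 = {r2:.3f} -> |r| = {correlation_magnitude(r2):.2f}")
```

Under this assumption, the quoted proportions correspond to correlation magnitudes of roughly 0.36, 0.51, and 0.69, which leaves at least half of the variance in externalization ratings unexplained by distance ratings.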

Supplemental Material

Supplemental material, sj-docx-1-tia-10.1177_23312165241285695 for Comparing the Auditory Distance and Externalization of Virtual Sound Sources Simulated Using Nonindividualized Stimuli by Mathieu Lavandier, Lizette Heine and Fabien Perrin in Trends in Hearing

Footnotes

Acknowledgments

The authors thank Dr. Norbert Kopčo, Robin Duclermortier, and Dr. Virginia Best for advice on a previous version of this paper.

Data Availability Statement

The data that support the findings of the present study are available from the corresponding author upon reasonable request.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Labex CeLyA (grant number ANR-10-LABX-0060), the PHC Danube program (grant numbers 45268RE, APVV DS-FR-19-0025, and WTZ MULT 07/2020), and the Horizon Europe research and innovation program (grant number 101129903).

ORCID iD

Mathieu Lavandier

Supplemental Material

Supplemental material for this article is available online.

References

Abel, S. M., Giguère, Consoli, & Papsin, B. C. (1999). Front/back mirror image reversal errors and left/right asymmetry in sound localization. Acta Acustica United with Acustica, 85(3), 378–386.

Abel, S. M., Giguère, Consoli, & Papsin, B. C. (2000). The effect of aging on horizontal plane sound localization. The Journal of the Acoustical Society of America, 108(2), 743–752. https://doi.org/10.1121/1.429607

Akeroyd, M. A., Gatehouse, & Blaschke (2007). The detection of differences in the cues to distance by elderly hearing-impaired listeners. The Journal of the Acoustical Society of America, 121(2), 1077–1089. https://doi.org/10.1121/1.2404927

Andrejková, Best, & Kopčo (2023). Time scales of adaptation to context in horizontal sound localization. The Journal of the Acoustical Society of America, 154(4), 2191–2202. https://doi.org/10.1121/10.0021304

Arend, J. M., Liesefeld, H. R., & Pörschmann (2021). On the influence of non-individual binaural cues and the impact of level normalization on auditory distance estimation of nearby sound sources. Acta Acustica, 5, 10. https://doi.org/10.1051/aacus/2021001

Begault, D. R., Wenzel, E. M., & Anderson, M. R. (2001). Direct comparison of the impact of head tracking, reverberation, and individualized head-related transfer functions on the spatial perception of a virtual speech source. Journal of the Audio Engineering Society, 49(10), 904–916.

Best, Baumgartner, Lavandier, Majdak, & Kopčo (2020). Sound externalization: A review of recent research. Trends in Hearing, 24, 2331216520948390. https://doi.org/10.1177/2331216520948390

Best & Roverud (2024). Externalization of speech when listening with hearing aids. Trends in Hearing, 28, 23312165241229572. https://doi.org/10.1177/23312165241229572

Bidart & Lavandier (2016). Room-induced cues for the perception of virtual auditory distance with stimuli equalized in level. Acta Acustica United with Acustica, 102(1), 159–169. https://doi.org/10.3813/AAA.918933

Blauert (1997). Spatial Hearing. The Psychophysics of Human Sound Localization (2nd edition). The MIT Press.

Brimijoin, W. O., Boyd, A. W., & Akeroyd, M. A. (2013). The contribution of head movement to the externalization and internalization of sounds. PLoS One, 8(12), e83068. https://doi.org/10.1371/journal.pone.0083068

Bronkhorst, A. W., & Houtgast (1999). Auditory distance perception in rooms. Nature, 397, 517–520. https://doi.org/10.1038/17374

Brungart, D. S., Durlach, N. I., & Rabinowitz, W. M. (1999). Auditory localization of nearby sources. II. Localization of a broadband source. The Journal of the Acoustical Society of America, 106(4), 1956–1968. https://doi.org/10.1121/1.427943

Brungart, D. S., & Scott, K. R. (2001). The effects of production and presentation level on the auditory distance perception of speech. The Journal of the Acoustical Society of America, 110(1), 425–440. https://doi.org/10.1121/1.1379730

Burke, K. A., Letsos, & Butler, R. A. (1994). Asymmetric performances in binaural localization of sound in space. Neuropsychologia, 32(11), 1409–1417. https://doi.org/10.1016/0028-3932(94)00074-3

Butler, R. A. (1994). Asymmetric performances in monaural localization of sound in space. Neuropsychologia, 32(2), 221–229. https://doi.org/10.1016/0028-3932(94)90007-8

Callan, D. E., & Ando (2013). Neural correlates of sound externalization. NeuroImage, 66, 22–27. https://doi.org/10.1016/j.neuroimage.2012.10.057

Carlile, Hyams, & Delaney (2001). Systematic distortions of auditory space perception following prolonged exposure to broadband noise. The Journal of the Acoustical Society of America, 110(1), 416–424. https://doi.org/10.1121/1.1375843

Catic, Santurette, Buchholz, J. M., Gran, & Dau (2013). The effect of interaural-level-difference fluctuations on the externalization of sound. The Journal of the Acoustical Society of America, 134(2), 1232–1241. https://doi.org/10.1121/1.4812264

Catic, Santurette, & Dau (2015). The role of reverberation-related binaural cues in the externalization of speech. The Journal of the Acoustical Society of America, 138(2), 1154–1167. https://doi.org/10.1121/1.4928132

Cubick, Rodriguez, C. S., Song, & MacDonald, E. N. (2015). Comparison of binaural microphones for externalization of sounds. Proc. Int. Conf. Spat. Audio.

Durlach, N. I., Rigopulos, Pang, X. D., Woods, W. S., Kulkarni, Colburn, H. S., & Wenzel, E. M. (1992). On the externalization of auditory images. Presence: Teleoperators and Virtual Environments, 1(2), 251–257. https://doi.org/10.1162/pres.1992.1.2.251

Farina (2000). “Simultaneous measurement of impulse response and distortion with swept-sine technique,” in AES 108th Convention, Preprint 5093 (D-4).

Gardner & Martin (1994). HRTF Measurements of a KEMAR. The Journal of the Acoustical Society of America, 97(6), 3907–3908. https://doi.org/10.1121/1.412407

Gil-Carvajal, J. C., Cubick, Santurette, & Dau (2016). Spatial hearing with incongruent visual or auditory room cues. Scientific Reports, 6, 37342. https://doi.org/10.1038/srep37342

Gordon, A. D. (1999). Classification (2nd Ed). Chapman and Hall/CRC.

Grantham, D. W. (1996). Left-right asymmetry in the buildup of echo suppression in normal-hearing adults. The Journal of the Acoustical Society of America, 99(2), 1118–1123. https://doi.org/10.1121/1.414596

Hartmann, W. M., & Wittenberg (1996). On the externalization of sound images. The Journal of the Acoustical Society of America, 99(6), 3678–3688. https://doi.org/10.1121/1.414965

Hassager, H. G., Gran, & Dau (2016). The role of spectral detail in the binaural transfer function on perceived externalization in a reverberant environment. The Journal of the Acoustical Society of America, 139(5), 2992–3000. https://doi.org/10.1121/1.4950847

Heine, Corneyllie, Gobert, Luauté, Lavandier, & Perrin (2021). Virtually spatialized sounds enhance auditory processing in healthy participants and patients with a disorder of consciousness. Scientific Reports, 11, 13702. https://doi.org/10.1038/s41598-021-93151-6

Hendrickx, Stitt, Messonnier, Lyzwa, Katz, B. F., & de Boishéraud (2017). Influence of head tracking on the externalization of speech stimuli for non-individualized binaural synthesis. The Journal of the Acoustical Society of America, 141(3), 2011–2023. https://doi.org/10.1121/1.4978612

Hocking, Dzafic, Kazovsky, & Copland, D. A. (2013). NESSTI: Norms for environmental sound stimuli. PLoS One, 8(9), e73382. https://doi.org/10.1371/journal.pone.0073382

Jeffress, L. A., & Taylor, R. W. (1961). Lateralization versus localization. The Journal of the Acoustical Society of America, 33(4), 482–483. https://doi.org/10.1121/1.1908697

Kates, J. M., Arehart, K. H., Muralimanohar, R. K., & Sommerfeldt (2018). Externalization of remote microphone signals using a structural binaural model of the head and pinna. The Journal of the Acoustical Society of America, 143(5), 2666–2677. https://doi.org/10.1121/1.5032326

Kim, S.-M., & Choi (2005). On the externalization of virtual sound images in headphone reproduction: A Wiener filter approach. The Journal of the Acoustical Society of America, 117(6), 3657–3665. https://doi.org/10.1121/1.1921548

Kolarik, A. J., Moore, B. C. J., Zahorik, Cirstea, & Pardhan (2016). Auditory distance perception in humans: A review of cues, development, neuronal bases, and effects of sensory loss. Attention, Perception, & Psychophysics, 78(2), 373–395. https://doi.org/10.3758/s13414-015-1015-1

Kopčo, Doreswamy, K. K., Huang, Rossi, & Ahveninen (2020). Cortical auditory distance representation based on direct-to-reverberant energy ratio. NeuroImage, 208, 116436. https://doi.org/10.1016/j.neuroimage.2019.116436

Kopčo, Huang, Belliveau, J. W., Raij, Tengshe, & Ahveninen (2012). Neuronal representations of distance in human auditory cortex. Proceedings of the National Academy of Sciences, 109(27), 11019–11024. https://doi.org/10.1073/pnas.1119496109

Kopčo & Shinn-Cunningham, B. G. (2011). Effect of stimulus spectrum on distance perception for nearby sources. The Journal of the Acoustical Society of America, 130(3), 1530–1541. https://doi.org/10.1121/1.3613705

Laback (2023). Contextual lateralization based on interaural level differences is pre-shaped by the auditory periphery and predominantly immune against sequential segregation. Trends in Hearing, 27, 23312165231171988. https://doi.org/10.1177/23312165231171988

Leclère, Lavandier, & Perrin (2019). On the externalization of sound sources with headphones without reference to a real source. The Journal of the Acoustical Society of America, 146(4), 2309–2320. https://doi.org/10.1121/1.5128325

Li, Schlieper, & Peissig (2019). The role of reverberation and magnitude spectra of direct parts in contralateral and ipsilateral ear signals on perceived externalization. Applied Sciences, 9(3), 460. https://doi.org/10.3390/app9030460

Lingner, Pecka, Leibold, & Grothe (2018). A novel concept for dynamic adjustment of auditory space. Scientific Reports, 8, 8335. https://doi.org/10.1038/s41598-018-26690-0

Mershon, D. H., & Bowers, J. N. (1979). Absolute and relative cues for the auditory perception of egocentric distance. Perception, 8(3), 311–322. https://doi.org/10.1068/p080311

Parseihian, Jouffrais, & Katz, B. F. G. (2014). Reaching nearby sources: Comparison between real and virtual sound and visual targets. Frontiers in Neuroscience, 8, 269. https://doi.org/10.3389/fnins.2014.00269

Philbeck, J. W., & Mershon, D. H. (2002). Knowledge about typical source output influences perceived auditory distance. The Journal of the Acoustical Society of America, 111(5), 1980–1983. https://doi.org/10.1121/1.1471899

Prud’homme & Lavandier (2020). Do we need two ears to perceive the distance of a virtual frontal sound source? The Journal of the Acoustical Society of America, 148(3), 1614–1623. https://doi.org/10.1121/10.0001954

Savel (2009). Individual differences and left/right asymmetries in auditory space perception. I. Localization of low-frequency sounds in free field. Hearing Research, 255(1–2), 142–154. https://doi.org/10.1016/j.heares.2009.06.013

Sound-Ideas-Series (1992). Sound ideas series 6000. in General Sound Effects Library.

Wenzel, E. M. (1995). The relative contribution of interaural time and magnitude cues to dynamic sound localization. in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 80–83.

Werner, Klein, Mayenfels, & Brandenburg (2016). A summary on acoustic room divergence and its effect on externalization of auditory events. 8th International Conference on Quality of Multimedia Experience, (pp. 1–6).

Zahorik (2002). Assessing auditory distance perception using virtual acoustics. The Journal of the Acoustical Society of America, 111(4), 1832–1846. https://doi.org/10.1121/1.1458027

Zahorik (2002b). Auditory display of sound source distance. Proceedings of the 8th International Conference on Auditory Display, edited by Nakatsu & Kawahara (pp. 239–243), Kyoto, Japan.

Zahorik, Brungart, D. S., & Bronkhorst, A. W. (2005). Auditory distance perception in humans: A summary of past and present research. Acta Acustica United with Acustica, 91, 409–420.
