Abstract
Sound localization testing is key for comprehensive hearing evaluations, particularly in cases of suspected auditory processing disorders. However, sound localization is not commonly assessed in clinical practice, likely due to the complexity and size of conventional measurement systems, which require semicircular loudspeaker arrays in large and acoustically treated rooms. To address this issue, we investigated the feasibility of testing sound localization in virtual reality (VR). Previous research has shown that virtualization can lead to an increase in localization blur. To measure these effects, we conducted a study with a group of normal-hearing adults, comparing sound localization performance in different augmented reality and VR scenarios. We started with a conventional loudspeaker-based measurement setup and gradually moved to a virtual audiovisual environment, testing sound localization in each scenario using a within-participant design. The loudspeaker-based experiment yielded results comparable to those reported in the literature, and the results of the virtual localization test provided new insights into localization performance in state-of-the-art VR environments. By comparing localization performance between the loudspeaker-based and virtual conditions, we were able to estimate the increase in localization blur induced by virtualization relative to a conventional test setup. Notably, our study provides the first proxy normative cutoff values for sound localization testing in VR. As an outlook, we discuss the potential of a VR-based sound localization test as a suitable, accessible, and portable alternative to conventional setups and how it could serve as a time- and resource-saving prescreening tool to avoid unnecessarily extensive and complex laboratory testing.
Introduction
Hearing is a complex process. It entails the transduction of acoustic information arriving at the ears into neural impulses, their transmission through the auditory nerves, and their appropriate interpretation by the central nervous system (Werner et al., 2012). Sound localization and lateralization, auditory pattern recognition, temporal integration and discrimination, and speech understanding in challenging acoustic situations are just a few basic skills that rely on our auditory processing abilities (Bellis, 2003a; Chermak & Musiek, 1997).
Auditory processing disorders (APDs) are difficulties in the perceptual processing of auditory information by the central nervous system, evidenced by poor performance on one or more of the aforementioned tasks (Chermak & Musiek, 2013; de Wit et al., 2016; Geffner & Ross-Swain, 2019). Children and adults with APDs have impaired abilities to attend to, discriminate, organize, or comprehend auditory information despite having average intelligence and normal hearing (NH) sensitivity (Keith, 1986). Thus, Bellis described this phenomenon as “when the brain cannot hear” (Bellis, 2003b). It is estimated that around 5% of school-aged children (Chermak & Musiek, 2013; Geffner & Ross-Swain, 2019), more than 40% of children with learning disorders (Iliadou et al., 2009), and between 26% and 76% of people over the age of 55 are affected by APDs (Cooper & Gates, 1991; Golding et al., 2004; Stach et al., 1990). However, diagnosing APDs can be challenging due to their heterogeneous presentation and similarities to other common disorders such as attention deficit hyperactivity disorder (ADHD), autism spectrum disorder (ASD), language impairments, and learning disabilities (Musiek & Chermak, 2014).
Despite previous efforts to develop methods for diagnosing and treating APDs, providing timely, widely available, and efficient access to diagnosis and treatment remains an ongoing topic of research. Audiologists currently use a comprehensive set of behavioral tests to diagnose APDs (American Academy of Audiology, 2010; American Speech-Language-Hearing Association, 2005; Geffner & Ross-Swain, 2019; Jerger & Musiek, 2000; Musiek & Chermak, 2014). These tests help narrow down the specific deficits contributing to the patient's hearing difficulties and design suitable treatment plans. Therefore, it is crucial to select the right tests for each patient, taking into account their individual history, auditory complaints, and potential comorbidities.
The test battery should include tasks that assess different levels and regions of the central auditory nervous system and different auditory processes, such as speech-in-noise tests (Cameron & Dillon, 2007; Dillon et al., 2012; Nilsson et al., 1994; Soli & Wong, 2008), dichotic listening tests (Hurley & Musiek, 1997; Musiek et al., 1991), auditory discrimination tests (Cranford et al., 1982), or tests of temporal processes such as within-channel gap detection (Musiek et al., 2005), among others.
The test results are interpreted using criterion-referenced scores, also known as normative cutoff values. These normative cutoff values are set at performance levels to provide the best balance between sensitivity (detection rate) and specificity (correct rejection rate). In order to diagnose APDs, there must be performance deficits of at least two standard deviations below the mean on two or more tests in the battery.
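As a minimal illustration of how such a criterion-referenced cutoff is derived from a normative sample (a sketch only; the normative scores below are hypothetical, and error-type scores are assumed, where higher values mean worse performance):

```python
import statistics

def cutoff_two_sd(normative_scores, higher_is_worse=True):
    """Criterion-referenced cutoff set at two standard deviations
    from the normative mean (sample standard deviation)."""
    mean = statistics.mean(normative_scores)
    sd = statistics.stdev(normative_scores)
    # For error-type scores (higher = worse), the cutoff lies above the mean.
    return mean + 2 * sd if higher_is_worse else mean - 2 * sd

# Hypothetical normative RMS localization errors in degrees:
norms = [4.0, 5.5, 6.0, 4.5, 5.0, 6.5, 5.5, 5.0]
cutoff = cutoff_two_sd(norms)
print(round(cutoff, 2))  # → 6.85
```

A patient whose error score exceeds this cutoff would be flagged as performing outside the normative range on that test.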
Sound localization testing is a key part of screening for suspected APDs as it is sensitive to central auditory nervous system involvement. Poor sound localization skills have been observed in individuals with temporal lobe impairment (Moore et al., 1990; Sanchez-Longo & Forster, 1958; Sanchez-Longo et al., 1957), multiple sclerosis (Cranford et al., 1990), the aging population (Cranford et al., 1993), and patients with hearing loss (Musiek & Chermak, 2014). However, the clinical adoption of sound localization testing remains low, in part due to time and resource constraints (Musiek & Chermak, 2015) and concerns about the validity and reliability of sound localization results unless they are obtained in a well-attenuated sound room or even an anechoic chamber using complex loudspeaker arrays (American Academy of Audiology, 2010; Musiek & Chermak, 2015). An accessible, reproducible, and widely accepted method for assessing sound localization abilities is still lacking (Musiek & Chermak, 2015).
Efforts to develop a suitable configuration and procedure for sound localization testing are not new. Directional audiometry, or spatial audiometry, originated in the 1950s (Goodhill, 1954; Jongkees & Groen, 1946; Jongkees & Veer, 1957; Sanchez-Longo & Forster, 1958), and in the early 1970s, Tonning published eight papers on the development and use of directional hearing tests for audiological applications. Six focused on directional speech intelligibility testing (Tonning, 1971a, 1971b, 1972a, 1972b, 1972c, 1973b), and only two addressed localization issues (Tonning, 1970, 1973a). Several publications followed that proposed some form of directional audiometry (Humes et al., 1980; Link & Lehnhardt, 1966; Newton & Hickson, 1981; Vermiglio et al., 1998). However, although several test configurations and data collection procedures have been proposed, the clinical community has not yet agreed on a single standard procedure for assessing sound localization abilities. Unresolved issues include the technical requirements for testing systems, comprehensive yet flexible procedures, and normative data for directional hearing (American Academy of Audiology, 2010; Letowski & Letowski, 2016).
More recently, the ERKI method was developed to fill this gap and as an attempt to embed sound localization testing into typical clinical audiological procedures (Plotz & Schmidt, 2017). ERKI is an acronym for “Erfassung des Richtungshörens bei Kindern” in German, which translates to “measurement of directional hearing in children.” The method determines the listener’s angular localization error in nonspeech localization tasks over the horizontal plane using an approved medical device of the same name. The setup consists of five loudspeakers arranged in a frontal semicircle around the patient (0°, ± 45°, ± 90°; r = 1 m), hidden behind an acoustically transparent curtain so that neither the number nor the position of the loudspeakers is known. The system can display 37 sound sources; five are real (loudspeakers), and the remaining 32 are generated as phantom sources between adjacent loudspeakers using the vector base amplitude panning (VBAP) method (Pulkki, 1997).
The patient sits in the center of the loudspeaker array with the head aligned to the 0° azimuth position. A brief noise signal is presented, and their task is to determine the location of the perceived signal by using a rotatory switch to position a light on an LED bar placed around the semicircle. They press a button to confirm their choice, and the whole procedure can take less than ten minutes.
To ensure that the results were not compromised by the use of phantom sound sources, Plotz and Schmidt compared subjects’ localization performance using a setup with discrete real loudspeakers to that using the proposed ERKI setup (using only five real loudspeakers and generating the remaining 32 phantom sound sources using VBAP). The comparison revealed no significant differences in performance between the two setups, indicating that sound localization testing can be reliably performed using the proposed ERKI method in children and adults.
The ERKI method offers a higher spatial resolution than similar devices previously available and improves the user-friendliness and automatability of the procedure. However, although it is a suitable technical solution with basic hardware equipment, sound localization abilities are still not widely evaluated as a typical audiological practice. Reasons for this may be lack of access to the equipment or lack of space, as typical setups (ERKI and similar) require semicircular loudspeaker arrays with a radius of at least 1 m in sufficiently large and acoustically treated rooms.
To address these concerns, we evaluate the feasibility of performing sound localization testing in virtual reality.
The field of digital healthcare is advancing rapidly, and virtual reality and augmented reality (VR and AR) technologies are playing an important role in medical care. These innovations have made medical training, diagnosis, and treatment more portable, accessible, and affordable. Hearing healthcare is no exception (Murphy, 2017). Previous studies on sound localization training (Steadman et al., 2019), auditory spatial analysis in multi-talker environments (Ahrens & Lund, 2022; Ahrens et al., 2019a), and speech intelligibility measurements (Ahrens et al., 2019b; Salorio-Corbetto et al., 2022) have successfully demonstrated the potential of AR and VR technologies to improve audiological research and care. They could provide cost-efficient alternatives to bulky and technically complex setups.
With great potential to support tele-audiology, they could improve diagnostic and intervention services and facilitate access to hearing healthcare services across geographic boundaries. Therefore, the development of efficient auditory tests and training procedures that can be performed on inexpensive consumer-grade hardware with simple setups is of great research interest today. In this context, modern VR peripherals can be a suitable alternative for evaluating sound localization abilities because they support spatial audio, are portable, and are comparatively inexpensive. In addition, recent versions support standalone operation, which facilitates the reproducibility and scalability of the setups.
To create a virtual sound localization test, it is necessary to make some modifications to the conventional test setup, such as replacing the loudspeaker arrays with a headphone-based binaural audio presentation for the auditory stimuli and using a head-mounted display (HMD) for the visual feedback. However, these changes may reduce test accuracy by increasing localization blur. To measure these effects, we conducted a study with a group of normal-hearing adults, comparing sound localization performance in different AR and VR scenarios. We started with a conventional loudspeaker-based measurement setup and gradually moved to a virtual audiovisual environment, testing sound localization in each scenario using a within-participant design. By comparing localization performance between the conventional loudspeaker-based and virtualized conditions, we could estimate the increase in localization blur induced by virtualization at each step. Furthermore, the results of the virtual localization test provided new insights into localization performance in state-of-the-art VR environments, allowing us to estimate first proxy normative cutoff values for sound localization testing using consumer-grade VR peripherals.
As a baseline against which to compare the (potentially degraded) performance in the virtualized scenarios C2 and C3, we created a conventional loudspeaker-based scenario, hereafter referred to as C1.
C1 basically replicates the ERKI method, but we replaced the original allocentric pointing method, that is, the rotating disk, with an egocentric pointing method using a handheld controller. We chose an egocentric pointing method because it is known to lead to more accurate performance than allocentric methods (Bahu et al., 2016; Djelani et al., 2000; Pernaux et al., 2003). Such egocentric pointing methods are also intuitive and user-friendly, and they are the most commonly used in VR.
As a second test condition, we used an AR scenario (C2), in which the loudspeaker-based audio playback was replaced by a (static) binaural headphone-based presentation using nonindividual head-related transfer functions (HRTFs). All other test components remained identical to C1. For the third condition (C3), we virtualized the visual feedback by replacing the real LED array with its virtual counterpart: a virtual LED array presented through a standalone HMD. In addition, the auditory stimuli were presented using state-of-the-art headphone-based dynamic binaural rendering with nonindividual HRTFs. Thus, C3 was a completely virtual audiovisual environment.
Our first hypothesis was that conditions C2 and C3 would result in lower performance (i.e., higher localization errors) compared to C1 due to the use of headphone-based audio presentation and nonindividual HRTFs. Typically, the use of nonindividual (generic) HRTFs results in reduced localization accuracy due to the lack of appropriate monaural (spectral) cues, increased front-back confusion, and usually higher localization errors in the median sagittal plane (Begault et al., 2001; Møller et al., 1996; Wenzel et al., 1993). However, generic HRTFs still provide robust binaural cues for localization in the frontal horizontal plane (Wenzel et al., 1993), often without a relevant increase in localization error compared to individual HRTFs (Begault et al., 2001). Furthermore, since sound localization testing is typically limited to the frontal horizontal plane when assessing binaural interaction functions, using nonindividual HRTFs may be sufficient for this use case (Brungart et al., 2017).
Our second hypothesis was that localization accuracy in C3 would be equal to or better than in C2. We expected that any performance difference between C2 and C3 would be due to the switch from real to virtual visual presentation, that is, the introduction of the HMD. This is because, with the procedure and short stimulus duration used in our study, the switch from static to dynamic binaural rendering for the audio presentation is unlikely to affect localization performance (see “Setup and Stimuli” section for details). In addition, we expected that the inclusion of matching HMD-based visuals would improve the overall plausibility of the virtual scene and the perceived immersion, thus aiding the sound localization task.
With this initial feasibility study, we aimed to gather preliminary evidence that sound localization abilities in the frontal horizontal plane can be tested in VR (C3 in this study) and that a virtualized version of the conventional test setup could be useful in screening of sound localization abilities, despite the (expected) overall performance degradation induced by virtualization.
Methods
Participants
Twenty engineering students and fellow researchers from the TH Köln University of Applied Sciences participated in the study. They all had NH sensitivity, verified by standard pure-tone audiometry in octave frequency bands from 125 Hz to 8 kHz (hearing threshold ≤ 25 dB HL), and had previous experience participating in listening experiments. Table S1 in the supplemental material lists the relevant demographic details of the study participants and their pure-tone audiogram results for both ears (Ramírez et al., 2024).
One participant was excluded from the study because they had physical difficulties performing the task, that is, extending their arm to point in the direction of the perceived sound source. Therefore, we report data from nineteen participants (n = 19, age 21–51 years, M = 28 years, Mdn = 26 years).
Setup and Stimuli
We replicated the ERKI setup (Plotz & Schmidt, 2017) in the sound-insulated anechoic chamber of the acoustics laboratory at the TH Köln, which has dimensions of 4.5 × 11.7 × 2.3 m (W × D × H), a lower cutoff frequency of about 200 Hz, and a background noise level of ∼ 20 dB(A) SPL. We used five Genelec 8020D loudspeakers as the real sources and generated the remaining 32 virtual sound sources as phantom sources between adjacent loudspeakers using the VBAP method (Pulkki, 1997).
Using VBAP-produced virtual sound sources instead of real sound sources does not reduce localization accuracy in this setup, as shown by Plotz and Schmidt (2017). Furthermore, Frank (2013) showed that VBAP yields sufficient localization accuracy for the setup used in the present study (5° steps in the frontal horizontal plane).
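In the horizontal plane, VBAP reduces to solving a 2 × 2 linear system for the active loudspeaker pair and normalizing the resulting gains to constant power. The following sketch illustrates the principle for a single pair; it is not the ERKI or laboratory implementation, and the example directions merely reuse two of the setup's loudspeaker azimuths (0° and 45°):

```python
import math

def vbap2d_gains(phi, theta1, theta2):
    """2-D VBAP gains (after Pulkki, 1997) for a phantom source at azimuth
    `phi` between loudspeakers at azimuths `theta1` and `theta2` (degrees).
    Returns power-normalized gains (g1, g2)."""
    # Unit vectors of the two loudspeakers and the target direction
    l1 = (math.cos(math.radians(theta1)), math.sin(math.radians(theta1)))
    l2 = (math.cos(math.radians(theta2)), math.sin(math.radians(theta2)))
    p = (math.cos(math.radians(phi)), math.sin(math.radians(phi)))
    # Solve [l1 l2] * g = p for g via Cramer's rule
    det = l1[0] * l2[1] - l2[0] * l1[1]
    g1 = (p[0] * l2[1] - p[1] * l2[0]) / det
    g2 = (p[1] * l1[0] - p[0] * l1[1]) / det
    # Normalize to constant power
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm

# Phantom source halfway between loudspeakers at 0° and 45°:
g1, g2 = vbap2d_gains(22.5, 0, 45)  # g1 ≈ g2 ≈ 0.707 by symmetry
```

A phantom source located exactly at a loudspeaker position yields gains (1, 0), so the real loudspeakers are reproduced unchanged.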
We used two daisy-chained strands of Adafruit WS2801 LED pixels to display the locations of the 37 sound sources as individual LED lights, driven over a serial peripheral interface (data and clock lines) by a microcontroller board (Arduino Mega 2560) (Figure 1a). An opaque, acoustically transparent fabric covered the setup.

(a) Experimental setup in the anechoic chamber of the TH Köln. Rendering of the setup without the fabric cover (K. Altwicker, TH Köln). (b) Participants had to extend their arms and point to the location of the perceived sound object relative to their body axis (egocentric pointing). They received visual feedback about the direction they were pointing by changing the color of the LED dot (R. Gillioz, TH Köln). (c) Screenshot of the VR scenario. A child-friendly, gamified, and fully automated application for sound localization testing based on the ERKI method was developed. It ran on a standalone HMD (Oculus Quest 2).
An OptiTrack system with an update rate of 120 Hz tracked the listener's head orientation and the handheld controller used for pointing. We used head tracking to ensure that stimuli were not presented unless the participant's head was oriented at 0° azimuth (central loudspeaker) for at least two seconds. In addition, the tracking information from the handheld controller was used to know where listeners were pointing and to provide visual feedback by changing the color of the corresponding LED dot (Figure 1b).
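The gating logic described above can be sketched as follows. This is an illustrative Python sketch, not the actual MATLAB/OptiTrack implementation; the sample format and function names are assumptions:

```python
AZIMUTH_TOLERANCE_DEG = 2.0  # allowed deviation from 0° azimuth (per the procedure)
REQUIRED_HOLD_S = 2.0        # head must stay centered at least this long

def head_centered(yaw_deg):
    """True if the tracked head yaw lies within the tolerance around 0°."""
    return abs(yaw_deg) <= AZIMUTH_TOLERANCE_DEG

def may_present_stimulus(samples):
    """Given (timestamp_s, yaw_deg) tracker samples in chronological order,
    allow the next trial to start only if the latest contiguous run of
    centered samples spans at least REQUIRED_HOLD_S."""
    run_start = None
    for t, yaw in samples:
        if head_centered(yaw):
            if run_start is None:
                run_start = t  # run of centered samples begins here
        else:
            run_start = None   # head moved away; restart the hold timer
    return run_start is not None and samples[-1][0] - run_start >= REQUIRED_HOLD_S
```

If the check fails, the real system flashes the central LED and plays a beep to redirect the listener's attention before retrying.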
The stimulus was a 300 ms broadband white noise with 10 ms cosine-squared onset and offset ramps. We chose a duration of 300 ms to ensure that the signal was long enough to be accurately perceived by NH individuals (Tobias & Zerlin, 1959), yet short enough to prevent listeners from turning their heads toward the perceived direction during the stimulus presentation (Brungart et al., 2017; Gaveau et al., 2022; Higgins et al., 2023; Pollack & Rose, 1967; Thurlow & Mergener, 1970), as typical reaction times for such head movements are around 400 ms (Savelsbergh et al., 1991).
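A stimulus of this kind can be generated as in the following sketch. The sampling rate is an assumption (it is not stated in the text), and the pure-Python noise generator stands in for whatever generator the MATLAB application used:

```python
import math
import random

FS = 48_000    # sampling rate in Hz (assumed)
DUR_S = 0.300  # stimulus duration
RAMP_S = 0.010 # cosine-squared onset/offset ramp duration

def make_stimulus(seed=0):
    """300 ms broadband white-noise burst with 10 ms cosine-squared
    onset and offset ramps."""
    rng = random.Random(seed)
    n = int(FS * DUR_S)        # 14400 samples
    n_ramp = int(FS * RAMP_S)  # 480 samples per ramp
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for i in range(n_ramp):
        w = math.sin(0.5 * math.pi * i / n_ramp) ** 2  # cos^2-shaped fade
        x[i] *= w              # onset ramp
        x[n - 1 - i] *= w      # offset ramp
    return x

stim = make_stimulus()
```

The ramps start and end at exactly zero amplitude, avoiding audible clicks at stimulus onset and offset.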
A custom MATLAB application running on a PC controlled the test procedure. Game elements were included to increase user engagement and motivation. For example, child-friendly voice prompts guided the participant through the procedure explanation, initial training trials, and the experiment, making it easy to use and resulting in a fully automated process that did not necessarily require an experimenter.
Materials
Test Condition 1: Baseline Measurement (C1)
As described in the “Introduction” section, we replicated the ERKI setup. However, we replaced their original allocentric pointing method, that is, the rotating disk, with a modified handheld Oculus Quest 2 controller, thus changing the spatial coding of the pointing method from allocentric to egocentric. In this experimental condition, the audio presentation was loudspeaker-based, using the five loudspeakers as real sources and generating the remaining 32 virtual sound sources as phantom sources between adjacent loudspeakers.
A short video illustrating some trials in this experimental condition is part of the supplemental material (Ramírez et al., 2024).
Test Condition 2: Introduction of Headphone-Based Static Binaural Rendering (C2)
In this test condition (C2), we created an AR environment in which the loudspeaker-based audio playback was replaced by a static binaural headphone-based presentation. The test environment was otherwise identical in design and procedure to C1. The subject sat in the center of the semicircle, listened to the stimuli through headphones (Sennheiser HD 600), and indicated the location of the perceived sound source by extending their arm and pointing in the direction of the perceived sound object in the LED array.
For the binaural presentation, we used measured far-field HRTFs from a Neumann KU100 dummy head (Bernschütz, 2013). The HRTF set, initially measured on a Lebedev grid with 2702 spatial sampling points, was transformed to the spherical harmonic domain at a sufficiently high spatial order of N = 44, allowing artifact-free spherical harmonic interpolation to obtain HRTFs for any desired direction using the open-source SUpDEq toolbox (Pörschmann et al., 2019). This processing resulted in accurate HRTFs for the 37 sound source directions, which we then used to generate the corresponding virtual sound sources by convolution with the noise test signal.
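The final rendering step, convolving the test signal with the direction-specific head-related impulse responses, can be sketched as follows. The HRIRs below are deliberately toy ones (a pure interaural delay and level difference), not the measured KU100 set, and the function name is an assumption:

```python
import numpy as np

def binauralize(stimulus, hrir_left, hrir_right):
    """Render a mono stimulus as a static binaural signal by convolving
    it with the left/right head-related impulse responses for the
    desired source direction."""
    left = np.convolve(stimulus, hrir_left)
    right = np.convolve(stimulus, hrir_right)
    return np.stack([left, right])  # shape: (2, len(stimulus) + len(hrir) - 1)

# Toy HRIRs (hypothetical): the right ear receives a delayed, attenuated copy
stim = np.random.default_rng(0).uniform(-1, 1, 480)
hrir_l = np.zeros(64); hrir_l[0] = 1.0  # left ear: direct
hrir_r = np.zeros(64); hrir_r[8] = 0.8  # right ear: 8 samples later, quieter
sig = binauralize(stim, hrir_l, hrir_r)
```

With these toy HRIRs, the rendered source would be lateralized toward the left ear; in the study, one such convolution pair was precomputed for each of the 37 source directions.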
Given the availability of a high-quality, dense HRTF set, we used the 37 discrete HRTFs covering the frontal horizontal plane with a resolution of 5°, rather than only five HRTFs with VBAP interpolation between them (as in C1). Switching from loudspeaker-based to headphone-based binaural rendering with nonindividual HRTFs is already known to degrade localization performance, and interpolating between only five HRTFs using VBAP could reduce the localizability of the virtual sound sources even further. Using discrete HRTFs for all 37 directions avoids this additional degradation and allowed us to evaluate the feasibility of testing sound localization using simple setups as accurately as possible, that is, using the best technology currently available.
A generic headphone compensation filter was applied to the precomputed stimuli (noise test signal convolved with the respective HRTF) to minimize the influence of the headphones used. The filter is based on 12 measurements (putting the headphones on and off the dummy head) to account for repositioning variability and was designed by regularized inversion of the complex mean of the headphone transfer functions (Lindau & Brinkmann, 2012) using the implementation of Erbes et al. (2017).
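The design principle, regularized inversion of the complex mean of repeated headphone transfer function measurements, can be sketched as below. This is a simplified frequency-domain illustration, not the Lindau and Brinkmann (2012) or Erbes et al. (2017) implementation; the regularization weight and the simulated measurements are hypothetical:

```python
import numpy as np

def compensation_filter(htfs, beta=0.005):
    """Headphone compensation filter via regularized inversion of the
    complex mean of repeated headphone transfer function measurements.

    htfs: complex array of shape (n_measurements, n_bins), one half-spectrum
    per headphone repositioning. `beta` (hypothetical value) limits the
    inverse filter gain where |H| is small."""
    h_mean = htfs.mean(axis=0)  # complex mean over repositionings
    inv = np.conj(h_mean) / (np.abs(h_mean) ** 2 + beta)
    return np.fft.irfft(inv)    # impulse response of the compensation filter

# 12 simulated repositioning measurements of a near-flat system (hypothetical)
rng = np.random.default_rng(1)
htfs = 1.0 + 0.05 * (rng.standard_normal((12, 257)) + 1j * rng.standard_normal((12, 257)))
comp_ir = compensation_filter(htfs)
```

For a near-flat system, the resulting impulse response is close to a unit impulse; for real headphones, it flattens the average transfer function while the regularization prevents excessive boosts at deep notches.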
Test Condition 3: Introduction of Headphone-Based Dynamic Binaural Rendering and HMD-Based Visual Feedback (C3)
The third test condition (C3) was a fully immersive audiovisual virtual environment. The LED array was replaced by its virtual counterpart (Figure 1c). Additional gamification elements were included in this condition. For example, the player earned stars as they progressed through the test, and there were short text prompts that displayed encouraging messages in a gamified manner to support the voice instructions.
The subject simply wore the HMD and headphones and started playing the game. The virtual LED array automatically adjusted to the listener’s interaural axis height when the game started. Stimuli were presented when the listener's head was oriented at 0° azimuth and if their head or the HMD was not tilted (roll control). In addition, this scenario integrated dynamic binaural rendering with head tracking.
We evaluated two different renderers for the dynamic binaural presentation: the Steam® Audio SDK (Valve Corporation, 2022) and the Unity wrapper for the 3D Tune-In Toolkit (Cuevas-Rodríguez et al., 2019; Reyes-Lecuona & Picinali, 2022). For the current study, we chose the 3D Tune-In Toolkit because it is open-source, well-documented, and explicitly developed for hearing research. We used the same headphone compensation filter and the same Neumann KU100 HRTFs as in the previous condition (C2), but in this case as a full-spherical HRTF set in SOFA format (Majdak et al., 2022).
A short video illustrating some trials in this test condition is included in the Supplemental Material (Ramírez et al., 2024). Additionally, Table 1 summarizes the parameter settings in all three test conditions and compares them to the traditional ERKI setting.
Parameter Settings: Comparison Between ERKI and All Scenarios Implemented in the Current Study.
Abbreviations: ERKI = Erfassung des Richtungshörens bei Kindern; HRTFs = head-related transfer functions; VBAP = vector base amplitude panning; HMD = head-mounted display; LED = light emitting diode.
Experimental Procedure
We used a within-participant design so that the procedure was identical for all three test conditions (C1, C2, and C3), and the order of presentation of the test conditions was randomized. The subject sat in the center of the loudspeaker array with the head oriented at the 0° azimuth. The stimuli (65 dB(A) SPL) were presented if their head was aligned to 0° (a slight tolerance of ±2° was allowed). If their head was not aligned, the LED dot at 0° azimuth flashed, and a beep signal sounded to direct their attention to the desired direction. Additionally, voice messages instructed the listener to look forward.
The subjects’ task was to determine the location of the perceived signal by extending their arm and pointing in the direction of the perceived sound object in space and pressing a button on the handheld controller to confirm their response. Participants heard encouraging messages and tones regardless of the accuracy of their responses. They could not repeat a trial or proceed without responding, and no feedback was provided on whether a response was correct. The flashing central light and beep redirected listeners’ attention until their head was reoriented to 0° azimuth, and the next stimulus was presented.
Before each test condition, child-friendly voice prompts guided listeners through the procedure. Following the voice instructions, the subject was guided through the first practice trial in a game-like fashion. A total of five practice trials were presented to familiarize them with the setup, stimuli, and procedure. After the practice trials, there was room for questions about the task.
In the experiment, stimuli were randomly presented once from all possible 37 positions (from −90° to +90° azimuth), corresponding to the frontal horizontal plane with a resolution of 5°. We used one trial per position as the previous ERKI studies showed that the number of trials could be reduced to one without negatively affecting the reliability of the results (Plotz & Schmidt, 2017). A complete experimental session lasted ∼ 1 hr, including instructions, training, and short breaks between test conditions.
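The trial list described above, one randomized presentation per position, can be sketched as follows (an illustrative Python sketch; the original application was implemented in MATLAB):

```python
import random

def make_trial_list(seed=None):
    """One trial per source position: 37 azimuths from -90° to +90°
    in 5° steps, presented in random order."""
    positions = list(range(-90, 95, 5))  # 37 positions over the frontal plane
    rng = random.Random(seed)
    rng.shuffle(positions)
    return positions

trials = make_trial_list(seed=42)
```

Each participant thus completed 37 trials per test condition, with an independent random order per condition.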
The study was conducted following the principles of the Declaration of Helsinki (World Medical Association, 2013) and the guidelines of the local institutional review board of the Institute of Computer and Communication Technology at the TH Köln University of Applied Sciences. All participants gave written informed consent for voluntary participation in the study and the subsequent publication of the results. All personal data and experimental results were collected, processed, and archived according to country-specific data protection regulations.
Parameters and Statistical Analysis
We estimated localization performance for each test condition using the root-mean-square (RMS) localization error, that is, the root of the mean squared angular deviation between the perceived and the presented source azimuths across all trials.
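The metric can be computed as in this minimal sketch (the example deviations are hypothetical):

```python
import math

def rms_localization_error(presented_deg, perceived_deg):
    """RMS localization error: root of the mean squared angular deviation
    between perceived and presented azimuths (degrees)."""
    assert len(presented_deg) == len(perceived_deg)
    sq = [(b - a) ** 2 for a, b in zip(presented_deg, perceived_deg)]
    return math.sqrt(sum(sq) / len(sq))

# Example: three trials with 0°, +5°, and -5° deviations
err = rms_localization_error([0, 45, -45], [0, 50, -50])  # ≈ 4.08
```

Because deviations are squared before averaging, occasional large errors (e.g., toward the lateral regions) weigh more heavily than many small ones.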
For statistical analysis of the results, we first applied a Lilliefors test for normality to the RMS localization error data.
We performed a two-way repeated measures analysis of variance (ANOVA) on the RMS localization errors with the within-subjects factors test condition (C1, C2, and C3) and region (frontal vs. lateral).
Results
Sound Localization Performance
Figure 2 shows the results of the listening experiment for two different subjects: one with above-average localization accuracy (subject No. 5) and one with relatively higher localization errors (subject No. 15). The plot shows the perceived (subjective) position of the sound source as a function of its real (objective) position for all the test conditions.

Subjective localization of a sound source as a function of its position over the horizontal plane. The graph shows the objective position (abscissa) and the corresponding subjective localization (ordinate) of two study participants: Subject No. 5 (left) and Subject No. 15 (right) in all the experimental conditions (C1, C2, and C3). The solid lines are the fifth-order best-fit polynomial curves for the discrete data. The solid black line represents the ideal correct localization, and the dotted gray lines represent the closest possible responses (adjacent positions) to both the left and right sides of the ideal correct localization.
Note that our experiment allows localization responses with a resolution of 5° at best. Therefore, for ease of interpretation, we have added dotted gray lines to the graph to represent the closest possible adjacent responses to the left and right sides of the ideal correct localization. Figure S6 in the Supplemental Material (Ramírez et al., 2024) shows the individual results for all subjects.
Figure 3 shows the pooled results of all subjects as a two-dimensional histogram. It shows the perceived localization of a sound source as a function of its position over the horizontal plane in all three test conditions.

Two-dimensional histograms of the perceived localization of a sound source as a function of its position over the horizontal plane in all the experimental conditions: C1 (left), C2 (middle), and C3 (right), pooled over all the subjects (n = 19). The solid black line represents the ideal correct localization, and the dotted gray lines represent the closest possible responses (adjacent positions) to both the left and right sides of the ideal correct localization.
The plots show that localization accuracy tends to decrease in conditions C2 and C3 compared to C1, especially for the lateral locations, which is consistent with our hypothesis and the literature. The average RMS localization errors across subjects with their standard deviations are
To better interpret these results, we present the RMS localization errors for all test conditions as box plots.

RMS localization error in degree across test conditions (C1, C2, and C3) for all subjects (n = 19) over the entire frontal horizontal plane (left) and by region (right). The individual RMS localization errors are shown as colored markers. The boxes represent the IQR across participants, and the medians are shown as solid black lines. Whiskers indicate 1.5 × IQR below the 25th percentile or above the 75th percentile, and asterisks indicate outliers beyond this range. Left: Trend lines connect the results per participant. The color of the line indicates higher (in red) or lower (in green) RMS localization error compared to the previous test condition. Right: Frontal region: Area containing stimuli between ± 45° azimuth in front of the listener. Lateral region: All areas beyond these limits, that is, from −45° to −90° azimuth to the left and from 45° to 90° azimuth to the right.
In addition, we present the RMS localization error by region (Figure 4, right).
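As a concrete illustration of the error metric used here, the RMS localization error, overall and split into the frontal and lateral regions defined in the figure caption, can be computed as in the following minimal sketch. The trial data are hypothetical values chosen for illustration only:

```python
import math

def rms_error(targets, responses):
    """Root-mean-square localization error in degrees."""
    return math.sqrt(sum((t - r) ** 2 for t, r in zip(targets, responses)) / len(targets))

# Hypothetical trial data: target azimuths and one listener's responses (degrees)
targets   = [-90, -45, 0, 45, 90]
responses = [-80, -45, 5, 50, 75]

overall = rms_error(targets, responses)

# Split by region as in the figure: frontal = |azimuth| <= 45 deg, lateral otherwise
frontal = [(t, r) for t, r in zip(targets, responses) if abs(t) <= 45]
lateral = [(t, r) for t, r in zip(targets, responses) if abs(t) > 45]
rms_frontal = rms_error(*zip(*frontal))
rms_lateral = rms_error(*zip(*lateral))
```

Squaring before averaging makes the metric penalize large lateral confusions more heavily than small frontal deviations, which matches the pattern visible in the by-region plots.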
A two-way repeated measures ANOVA for the RMS localization error with the within-subjects factors
Post-hoc Tukey's HSD tests (Tukey, 1949) showed significant differences in the paired comparisons between the RMS localization error
Furthermore, although the listeners’ performance, that is, localization accuracy, was on average
To determine whether subjects’ ability to localize stimuli changed during the experiment due to factors such as fatigue, increased familiarity with certain stimulus features, or increased proficiency with the experimental procedure, we divided the results of each condition into four time periods, or epochs, and calculated RMS localization errors separately for each epoch. Epoch 1 included trials 1–9, epoch 2 included trials 10–18, epoch 3 included trials 19–27, and epoch 4 included trials 28–37.
We performed a one-way repeated measures ANOVA with epoch as the within-subjects factor for statistical analysis. The results showed no significant effect of epoch for any of the three conditions (
Figure S7 in the Supplementary Material (Ramírez et al., 2024) shows the corresponding RMS localization errors for the three conditions as a function of the four epochs using box plots.
Localization Performance in VR Relative to Baseline
Although the sample sizes used in our study are not large enough to establish a normative sample, the data can provide a first hint at classification criteria relevant for localization testing over the frontal horizontal plane in NH adults using conventional setups and in AR and VR environments. In the following, we do not discuss test condition C2 any further: the primary objective of our study is to evaluate the overall feasibility of assessing sound localization skills in VR, and since our results showed no significant differences between the RMS localization errors of C2 and C3, our analysis and subsequent discussion focus on the VR-based application for sound localization testing (C3).
Figure 5 shows the best-fitting normal distribution of the data collected in our study for test conditions C1 and C3, using the mean and standard deviation of the RMS localization error of these test conditions. The vertical lines in the figure represent the cutoff values for normal sound localization abilities based on the mean and standard deviation of our baseline measurement (C1) and the VR condition (C3) for NH adult listeners. The solid lines represent the cutoff values for a scoring criterion of localization deficits of two standard deviations below the mean, and the dashed lines the cutoff values for a criterion of three standard deviations below the mean.

Best fitting normal distribution of the experimental data for test conditions C1 (purple) and C3 (pink). The vertical lines indicate the cutoff values, that is, the maximum RMS localization error a subject could have to pass the localization test. They are calculated based on the mean and standard deviation of the data collected in C1 (purple lines) and C3 (pink lines) from NH listeners. The cutoffs for performance deficit scoring criteria of two and three standard deviations below the mean are shown respectively:
Figure 5 shows that, as expected, the virtual sound localization test (C3) has a reduced mean accuracy compared to the baseline (C1 in our study). The higher localization errors in C3 also lead to a wider distribution and, consequently, higher cutoff values.
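The cutoff computation itself is straightforward: for an error metric, a performance deficit of k standard deviations corresponds to an RMS error k standard deviations above the normative mean. The sketch below illustrates this with hypothetical per-subject values, not the data from our study:

```python
import math

def cutoffs(rms_errors, criteria=(2, 3)):
    """Cutoff RMS error = mean + k * SD for each deficit criterion k.

    A listener whose RMS error exceeds the cutoff would be flagged as
    performing k standard deviations worse than the normative mean.
    """
    n = len(rms_errors)
    mean = sum(rms_errors) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in rms_errors) / (n - 1))  # sample SD
    return {k: mean + k * sd for k in criteria}

# Hypothetical per-subject RMS errors (degrees) for a baseline and a VR condition
baseline = [6.0, 7.5, 5.5, 8.0, 6.5]
vr = [10.0, 14.0, 11.5, 16.0, 12.0]

cut_baseline = cutoffs(baseline)
cut_vr = cutoffs(vr)
```

Because the VR condition has both a higher mean error and a wider spread, its cutoffs necessarily sit higher, which is exactly the pattern described for C3 relative to C1.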
The relationship between C1 and C3 scores for individual listeners is shown in Figure 6. The graph shows that most subjects performed significantly worse in C3 than in C1, with only a few subjects performing similarly or better in C3. The solid gray regression line predicts localization accuracy in VR from localization accuracy in the more standard baseline condition. In addition, the cutoff values, indicated as purple solid and dashed lines, provide a first proxy for classification criteria for sound localization testing in the horizontal plane in real and virtual conditions.

Relationship between RMS localization errors in the loudspeaker-based (C1) and virtual (C3) conditions for individual listeners (pink dots) in our study. The solid gray line is the least-squares regression line, and the dashed gray line depicts a perfect correlation. The purple solid and dashed lines show the cutoff values for evaluation criteria of performance deficits of
There is a low to moderate, positive but nonsignificant correlation between the two data sets (
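The regression and correlation analysis behind Figure 6 can be sketched with plain least squares and Pearson's r; the per-subject values below are hypothetical stand-ins, not our measured data:

```python
import math

def linear_fit(x, y):
    """Least-squares slope/intercept and Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical per-subject RMS errors (degrees): baseline (C1) vs. VR (C3)
c1 = [5.0, 6.0, 7.0, 8.0, 9.0]
c3 = [11.0, 12.5, 12.0, 15.0, 14.5]

slope, intercept, r = linear_fit(c1, c3)

# Predict a listener's VR error from their baseline error
predict_c3 = lambda baseline_error: intercept + slope * baseline_error
```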
Discussion
On the Availability of Normative Data
To determine whether sound localization testing in VR is a viable alternative to the traditional loudspeaker-based setups in the context of comprehensive auditory processing diagnostic testing, it is necessary to consider the criteria used in clinical practice to evaluate auditory test performance. As noted in the “Introduction” section, international guidelines and expert consensus recommend that performance deficits of at least
Published data on the sound localization abilities of NH listeners mainly refer to the abilities observed in laboratory studies on adults (Blauert, 1996; Makous & Middlebrooks, 1990; Middlebrooks & Green, 1991; Perrott & Saberi, 1990; Sabin et al., 2005). Data on children are even scarcer. In most cases, studies of sound localization skills in infants and school-aged children seem to have been motivated by the need to benchmark the development of these skills in NH individuals so that their performance can be compared with the evolution of patients after hearing aid and/or cochlear implant fitting (Beijen et al., 2010; Bess et al., 1986; Grieco-Calub & Litovsky, 2010).
Furthermore, the terms “minimum audible angle” and “localization accuracy” have been used interchangeably in some studies, although both measures are based on different psychoacoustic paradigms. The first measure is derived from a relative spatial discrimination task that assesses the ability to distinguish between different sound source locations. The second is derived from an absolute identification task that measures the ability to identify a single sound location without a reference. It is still unclear whether or how the measures from the two tasks are related (Moore et al., 2008; Werner et al., 2012, Chapter 6). To date, researchers agree that absolute localization (source identification) and spatial discrimination are two distinct auditory tasks that tax (at least in part) different stages of the ascending auditory pathway (Kühnle et al., 2013; Spierer et al., 2009; Zatorre et al., 2002). Moreover, absolute localization is considered a more ecologically relevant measure of localization ability (Werner et al., 2012). Therefore, we limit our discussion to previous studies that used the same psychoacoustic paradigm as in our study, i.e., an absolute localization task.
The methods used to measure localization accuracy in previous studies vary widely, making it difficult to compare their results. Several factors may influence and explain the performance variability seen in previously published data. These factors include the number of loudspeakers used and their distribution in the horizontal plane, whether they are visible or not, the type and duration of the stimuli, the response/pointing methods, and the training protocols prior to the experimental session. In addition, some studies report absolute mean localization errors
Populin (2008) collected data from adults with NH
Our study has the most methodological similarities with the study by Yost et al. (2013), and the results of both studies are also quite comparable. The mean RMS localization error that we obtained in our loudspeaker-based scenario
The development of portable and accessible sound localization testing setups and procedures, such as the one presented in this study, could facilitate the collection of normative data in clinical and laboratory settings.
Sound Localization Testing in VR
VR-based sound localization testing using consumer-grade devices has great potential to simplify the screening of APDs (as part of a larger virtualized comprehensive test battery). However, our results confirm that virtualization increases localization blur.
Binaural reproduction of virtual sound sources using nonindividual HRTFs has the greatest negative impact on localization accuracy. On the other hand, switching from real to virtual visual representation did not significantly affect performance. The use of VR for sound localization testing may be a trade-off between cost efficiency and accessibility on the one hand and lower accuracy on the other.
The results of our baseline localization test are consistent with similar studies in the literature, as are the corresponding proxy cutoff values. This consistency further validates the reliability of our loudspeaker-based localization test. The virtual replica of this test provides the first insight into the localization performance of NH listeners in virtual audiovisual test environments. In particular, our study provides the first proxy cutoff values for sound localization screening using VR-based setups.
One point to consider is that our study included only NH adult listeners. The available data on the sound localization abilities of NH children suggest that they show higher RMS localization errors and intersubject variability than NH adults. Based on the data reported by Grieco-Calub and Litovsky (2010) and Litovsky and Godar (2010), as well as clinical practice guidelines (American Academy of Audiology, 2010; American Speech-Language-Hearing Association, 2005; Musiek & Chermak, 2014), it can be estimated that children (5+ years old) with sound localization deficits would need average RMS localization errors greater than ∼32° before their performance can be considered abnormal (using the most conservative criterion of
In general, our analysis of the feasibility of VR-based sound localization testing is based on the assumption that patients with sound localization deficits will not perform better in the virtual version of the sound localization test than in conventional loudspeaker-based setups. In this context, Brungart et al. (2017) showed that similar to NH listeners, HI listeners have higher sound localization errors when using virtualized auditory stimuli (headphone-based dynamic rendering with nonindividual HRTFs) than when using a traditional loudspeaker-based setup. They conclude that their study provides evidence that a virtual auditory display based on nonindividual HRTFs could be a valuable tool for evaluating the localization abilities of HI listeners. However, it should be noted that the case of patients with suspected APDs may be different from the case of HI patients. Further studies including patients with suspected APDs are needed to verify this.
At the expense of (already expected) reduced sound localization accuracy, the use of headphone-based binaural presentation using nonindividual HRTFs in the assessment of sound localization abilities opens up other possibilities, such as the chance to independently manipulate specific parameters, for example, interaural time and level differences, spectral information, and different HRTF sets, among others. In addition, VR technology allows us to further investigate the effects of multimodal stimulation in such behavioral tests, for example, by manipulating visual and proprioceptive feedback independently and in a controlled manner. The ability to conduct auditory processing tests in an immersive and controlled VR environment free from distraction is also attractive. In particular, gamification is an aspect that can be highly relevant, especially for children who may have shorter attention spans.
If the performance of the virtual sound localization test is still considered insufficient, it could be further improved by enhancing spatialization using individual or individualized HRTFs, thereby improving localization accuracy in general. Many individualization approaches exist (Guezenoc & Séguier, 2018), and methods for adapting generic HRTFs based on individual anthropometric data could be implemented in a VR application in a way that is reasonably manageable for users. For example, a comparatively simple individualization approach would be to adapt the interaural time differences of the generic HRTFs to the listener's individual head size, which would need to be measured in some way. This individualization should generally improve localization accuracy in the horizontal plane. However, individualization (even using individual HRTFs) conflicts with making sound localization testing in VR easy to use and accessible. A balance must be sought between (potentially) increasing accuracy through (sometimes complex) individualization approaches and making an assessment tool general and easy to use.
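As an illustration of such a comparatively simple individualization step, the sketch below uses the Woodworth spherical-head formula, a textbook approximation, to scale the ITDs of a generic HRTF set to a listener's measured head radius. Both the formula and the numeric values are illustrative assumptions, not the procedure proposed in this article:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s

def woodworth_itd(azimuth_deg, head_radius_m):
    """Interaural time difference (s) for a rigid spherical head (Woodworth model).

    Valid for source azimuths in the frontal horizontal plane (-90 to 90 degrees).
    """
    theta = math.radians(azimuth_deg)
    return (head_radius_m / SPEED_OF_SOUND) * (theta + math.sin(theta))

# Scale factor to adapt the ITDs of a generic HRTF set (recorded on a head of
# radius r_generic) to a listener with measured head radius r_listener
r_generic, r_listener = 0.0875, 0.095  # meters (hypothetical values)
scale = r_listener / r_generic

itd_generic = woodworth_itd(45, r_generic)
itd_individual = itd_generic * scale  # equivalent to woodworth_itd(45, r_listener)
```

Because the model is linear in head radius, rescaling the ITDs of an entire generic HRTF set reduces to one multiplication per measurement, which keeps such an individualization step manageable inside a VR application.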
The use of low-cost, standalone HMDs, such as the one used in this study, may increase clinical efficiency by allowing testing to be performed in any clinic, at home, and periodically as needed. Simplifying the equipment and space requirements to assess complex listening skills, including the development of home testing versions, could increase equity of access to hearing care across geographic boundaries, improve the quality of care, and enhance the experience of patients and their families.
Future work should evaluate sound localization skills in VR with NH children, adolescents, and patients with suspected APDs. In addition, the effectiveness of gamified auditory training suites in VR for the remediation of APDs remains to be investigated.
This manuscript presents an initial feasibility study along with the first estimates of normative cutoff values for sound localization abilities in VR in NH adult listeners. However, in order to use such a test for individual screening, it is necessary to collect large, dedicated standardization samples. These samples should include patients with spatial hearing deficits and normal controls in different age groups and should be collected under both conventional loudspeaker-based and VR conditions. This will help to determine appropriate cutoff criteria for sound localization testing using VR setups.
Conclusion
We examined how the virtualization of sound localization testing setups and procedures degrades subjects’ localization accuracy. Moreover, we discussed how, at the same time, sound localization testing in VR could be a viable alternative to conventional loudspeaker-based setups. With the goal of advancing VR-based sound localization testing to aid in the screening of APDs, the study provides initial proxy normative data and approximate cutoff values for sound localization testing in virtual audiovisual environments using state-of-the-art technology and consumer-grade standalone hardware.
Our results encourage further evaluation of sound localization testing in VR as an alternative to bulky and more complex systems. VR may provide a time- and resource-saving prescreening tool. These findings support the further development of VR applications for the assessment of spatial hearing abilities, facilitating the screening and diagnosis of patients with suspected APDs and further research into the mechanisms of spatial auditory processing.
Supplemental Material
Supplemental material for “Toward Sound Localization Testing in Virtual Reality to Aid in the Screening of Auditory Processing Disorders” by Melissa Ramírez, Johannes M. Arend, Petra von Gablenz, Heinrich R. Liesefeld, and Christoph Pörschmann, Trends in Hearing.
Acknowledgments
The authors would like to thank all the participants in the study for their time. The authors thank Kai Altwicker for his help in designing and building the loudspeaker and LED arrays, Miguel Ángel Olivares for his help in developing the VR scenario, and Raphaël Gillioz for the visual documentation of the experimental setup. The authors are grateful to the action editors and reviewers who provided constructive feedback that significantly improved earlier versions of the manuscript.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was sponsored by the German Federal Ministry of Education and Research BMBF (13FH666IA6-VIWER-S) and partly by the German Research Foundation (DFG WE 4057/21-1).
Supplemental Material
Supplemental material for this article is available online at https://doi.org/10.5281/zenodo.10655341.
