Abstract
This paper reports on the deterioration in sound-localization accuracy during listeners’ head and body movements. We investigated sound-localization accuracy during passive body rotations at speeds in the range of 0.625–5 °/s. Participants were asked to judge whether a 30-ms noise stimulus was presented to the left or right of their subjective-straight-ahead reference. Results indicated that the sound-localization resolution degraded with passive rotation, irrespective of the rotation speed, even at speeds as low as 0.625 °/s.
Spatial hearing is considered a multisensory-integration process involving self-motion (Suzuki et al., 2020). Earlier studies demonstrated that the listener's head/body movement facilitates sound localization (Honda et al., 2007, 2013, 2018; Iwaya et al., 2003; Kawaura et al., 1989/1991; Thurlow & Runge, 1967; Wallach, 1939), but recent reports have shown that sound-localization accuracy deteriorates during the listener's head/body rotation (Cooper et al., 2008; Honda et al., 2016, 2020).
It is interesting that these deterioration effects are independent of rotation velocities in the range of 5–60 °/s (Honda et al., 2016, 2020). Recently, Honda et al. (2020) investigated the effect of passive body rotations on the accuracy of the subjective-straight-ahead (SSA) orientation of listeners. In this study, the participants were asked to keep their heads still while their chairs were rotated at speeds of 5, 10, and 20 °/s. The rotating-chair experiment revealed a significant reduction in the sound-localization accuracies measured irrespective of the rotational speed down to 5 °/s. These results indicate that the deterioration in sound localization is not due to bottom-up effects such as ear-input blurring.
If the deterioration persists irrespective of speed at speeds lower than those previously tested, the phenomenon may be attributed to a top-down effect, such as the participant's conscious perception of motion. To test this hypothesis, we conducted an experiment to examine the SSA-relative sound-localization resolution at passive body rotation speeds below 5 °/s. This issue has not been discussed extensively in the literature.
Eight normal-hearing participants (six males and two females; aged 21–23) were recruited for the experiments performed in a dark anechoic room (Figure 1).

Figure 1. Experimental setup considered in this study. Sound bursts were generated using a circular array (radius = 1.1 m; range = ±28.75°) of 30-mm loudspeakers (Hosiden 0254 7N101) separated by a 2.5° angular spacing. No special devices were used to restrict the participants’ head movements, and their head positions were monitored using a magnetic sensor placed at the top of their heads.
Two conditions—still- and rotating-chair—were considered during the experiments. Under both conditions, the participants were asked to judge the azimuthal direction of the sound image (left or right) relative to their SSA (i.e., the two-alternative forced choice (2AFC) approach). The acoustic stimulus comprised 1/3 octave-band noise with a 1-kHz center frequency. Each stimulus lasted 30 ms, including rise and decay times of 5 ms each. The A-weighted sound pressure level, when the band-noise was presented steadily, was set to 65 dB. The location of the acoustic stimulus was selected using the randomized maximum likelihood adaptation method (Takeshima et al., 2001). This study was approved by the ethics committee of the Research Institute of Electrical Communication at Tohoku University.
In the still-chair condition, the participants remained seated on the chair facing 0° and gazed at the LED, which was lit for 1 s, while keeping their heads still. Subsequently, the LED was switched off, and the acoustic stimulus was presented from one loudspeaker located within ±11.25° of the participant's SSA. The participants were then asked to judge the direction from which the stimulus was presented. One hundred such trials were performed.
In the rotating-chair condition, the participants remained seated on the chair facing ±15° and gazed at the LED for 1 s while keeping their heads still. After the LED was switched off, the chair was rotated at a speed of 0.625, 1.25, 2.5, or 5 °/s to align with the 0° direction. Four seconds after the commencement of rotation, the acoustic stimulus was presented from one loudspeaker located within ±15° of the participant's SSA. Once again, the participants were asked to judge the direction of origin of the stimulus. Ten similar sessions were performed for each participant, resulting in 640 trials (2 rotation directions × 4 rotation speeds × 80 trials). The rotation direction and speed were randomly selected.
The total numbers of left (0) and right (1) judgments under each condition were recorded. We then plotted the correct answering rate as a function of the angular distance between the loudspeaker position and the physical-straight-ahead of the observer. The point of subjective equality, i.e., the point of SSA (PSSA), is defined as the 50% point on the psychometric function for each condition and each participant. The just noticeable difference (JND) of the PSSA for each condition is defined as the difference between the 50% point and the 75% point on the psychometric function. To estimate these points, we used the normal cumulative distribution function as the model psychometric function and numerically estimated its parameters (mean m and variance σ²). The function was fitted to the plots using maximum likelihood fitting (Ogura et al., 1989). Thus, the PSSA is the mean m of the estimated normal cumulative distribution function, and the JND of the PSSA is given by 0.6745σ, where σ denotes the estimated standard deviation, because the fitted normal cumulative distribution function crosses 0.75 at the point m + 0.6745σ. One participant was excluded from the analysis because reasonable psychometric functions could not be estimated.
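The estimation step described above can be sketched as follows. This is only an illustrative sketch, not the authors' procedure: the coarse-to-fine grid search stands in for whatever numerical optimizer was actually used, the function names are hypothetical, and the response counts are synthetic values generated from an assumed m = 1.0° and σ = 2.0° rather than measured data.

```python
import math

def normal_cdf(x, m, sigma):
    """Normal cumulative distribution function used as the psychometric model."""
    return 0.5 * (1.0 + math.erf((x - m) / (sigma * math.sqrt(2.0))))

def neg_log_likelihood(m, sigma, angles, n_right, n_total):
    """Binomial negative log-likelihood of the 'right' counts under the model."""
    nll = 0.0
    for x, k, n in zip(angles, n_right, n_total):
        # Clip probabilities so that log() never receives 0.
        p = min(max(normal_cdf(x, m, sigma), 1e-9), 1.0 - 1e-9)
        nll -= k * math.log(p) + (n - k) * math.log(1.0 - p)
    return nll

def fit_psychometric(angles, n_right, n_total, rounds=8, steps=20):
    """Maximum-likelihood estimate of (m, sigma) by coarse-to-fine grid search."""
    m_lo, m_hi = min(angles), max(angles)
    s_lo, s_hi = 0.05, max(angles) - min(angles)
    m_best, s_best = 0.0, 1.0
    for _ in range(rounds):
        best_nll = float("inf")
        for i in range(steps + 1):
            m = m_lo + (m_hi - m_lo) * i / steps
            for j in range(steps + 1):
                s = s_lo + (s_hi - s_lo) * j / steps
                nll = neg_log_likelihood(m, s, angles, n_right, n_total)
                if nll < best_nll:
                    best_nll, m_best, s_best = nll, m, s
        # Shrink the search window to two grid steps around the current best.
        m_step = (m_hi - m_lo) / steps
        s_step = (s_hi - s_lo) / steps
        m_lo, m_hi = m_best - 2 * m_step, m_best + 2 * m_step
        s_lo, s_hi = max(1e-3, s_best - 2 * s_step), s_best + 2 * s_step
    return m_best, s_best

# Synthetic example (assumed values, not experimental data):
# loudspeakers every 2.5 deg within +/-11.25 deg, 1000 trials per position.
angles = [-11.25 + 2.5 * i for i in range(10)]
n_total = [1000] * len(angles)
n_right = [round(1000 * normal_cdf(x, 1.0, 2.0)) for x in angles]

pssa, sigma_hat = fit_psychometric(angles, n_right, n_total)
jnd = 0.6745 * sigma_hat  # distance from the 50% point to the 75% point
print(f"PSSA = {pssa:.2f} deg, JND = {jnd:.2f} deg")
```

The factor 0.6745 is the 75th-percentile point of the standard normal distribution, so the JND follows directly from the fitted σ without a separate search for the 75% point.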
A one-way repeated-measures analysis of variance (ANOVA) was applied to the PSSA with the rotational speed as the factor. The effect of the rotational speed was not significant (F(4, 24) = 1.41, n.s.) (Figure 2). In contrast, the same ANOVA applied to the JND showed a significant effect of the rotational speed (F(4, 24) = 9.26, p < .0001). Moreover, the post-hoc analysis (Ryan's method, ps < .05) showed that the JND was smaller under the still-chair condition than under the rotating-chair conditions (Figure 3).
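For readers unfamiliar with the test, the following is a minimal sketch of a one-way repeated-measures ANOVA. The function name and the toy scores are hypothetical and are not the data of this study. The sketch partitions the total sum of squares into condition, participant, and residual terms, and shows why seven participants and five conditions (still chair plus four rotation speeds) give the degrees of freedom (4, 24) reported above.

```python
def rm_anova_oneway(scores):
    """One-way repeated-measures ANOVA.

    scores: list of rows, one row per participant, one column per condition.
    Returns (F, df_condition, df_error).
    """
    n = len(scores)       # number of participants
    k = len(scores[0])    # number of conditions
    grand = sum(sum(row) for row in scores) / (n * k)
    cond_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subj_means = [sum(row) / k for row in scores]

    ss_cond = n * sum((c - grand) ** 2 for c in cond_means)
    ss_subj = k * sum((s - grand) ** 2 for s in subj_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_cond - ss_subj  # condition x participant residual

    df_cond = k - 1
    df_error = (n - 1) * (k - 1)
    f_value = (ss_cond / df_cond) / (ss_error / df_error)
    return f_value, df_cond, df_error

# Toy scores (illustrative only): 3 participants x 3 conditions.
toy = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 5.0],
    [3.0, 5.0, 6.0],
]
f_value, df_cond, df_error = rm_anova_oneway(toy)
print(f"F({df_cond}, {df_error}) = {f_value:.2f}")

# Degrees of freedom for the design described in the text:
# 7 analysed participants and 5 conditions give (5 - 1, (7 - 1) * (5 - 1)).
print((5 - 1, (7 - 1) * (5 - 1)))
```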

Figure 2. Average PSSA and standard error values.

Figure 3. Average JND and standard error values.
While Honda et al. (2020) confirmed the deteriorating effects at low velocities (5 °/s), this study confirms the occurrence of these effects at speeds as low as 0.625 °/s. More importantly, this study establishes that the PSSA accuracy remains unaffected by the rotational velocity. At the highest rotational speed of 5 °/s considered in this study, the participants rotate by 0.15° over the 30-ms duration of the acoustic stimulus. As the rotation speed was constant during each trial, the angle of the sound source changed linearly; therefore, it could act as a source of noise that contributes to the JND. The variance of a continuous uniform distribution is Δ²/12, where Δ is the difference between the angles at the start and end of the stimulus, i.e., 0.15°. This corresponds to a standard deviation of 0.043°, which is considerably smaller than the JND value of approximately 1.8°. Therefore, the deteriorating effects of sound localization (Cooper et al., 2008; Honda et al., 2016, 2020) are hardly attributable to bottom-up effects, such as ear-input blurring. Honda et al. (2020) reported that the said effect can be attributed to top-down effects such as the participant's conscious perception of motion. The findings of this study suggest the involvement of such a top-down mechanism. During perceptual information processing, the listener must constantly update sound-image position information during conscious head/body movement, which increases the amount of information to be processed. Consequently, information processing becomes difficult owing to the higher cognitive load of computing and integrating dynamic changes in the information. Therefore, it can be assumed that a top-down process induced by listeners’ conscious perception of motion may act as a constraint against such information-processing overload.
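The blur estimate above can be reproduced with a few lines of arithmetic. The sketch below assumes, as the text does, that the source angle sweeps uniformly over Δ = speed × stimulus duration; the function and constant names are illustrative.

```python
import math

STIMULUS_DURATION_S = 0.030            # 30-ms noise burst
SPEEDS_DEG_PER_S = [0.625, 1.25, 2.5, 5.0]

def angular_sweep_deg(speed_deg_per_s, duration_s=STIMULUS_DURATION_S):
    """Angle covered by the rotating listener during the stimulus (Delta)."""
    return speed_deg_per_s * duration_s

def blur_sd_deg(speed_deg_per_s, duration_s=STIMULUS_DURATION_S):
    """Standard deviation of a uniform distribution of width Delta: Delta / sqrt(12)."""
    return angular_sweep_deg(speed_deg_per_s, duration_s) / math.sqrt(12.0)

for speed in SPEEDS_DEG_PER_S:
    print(f"{speed:5.3f} deg/s: sweep = {angular_sweep_deg(speed):.4f} deg, "
          f"SD = {blur_sd_deg(speed):.4f} deg")
```

Because the sweep scales linearly with speed, the blur at the three slower speeds is smaller still than the 0.043° obtained at 5 °/s.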
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the Ministry of Education, Culture, Sports, Science and Technology Grants-in-Aid for Scientific Research (A) (No. 487283 and No. 16H01736) and (B) (No. 26280078). This research was partly performed under the Cooperative Research Project Program of the Research Institute of Electrical Communication, Tohoku University (No. H29/A22 and R02/A32).
