Abstract
The accuracy of ranging and direction finding is crucial in sound source localization, and many methods have been proposed because of its important applications. One such method is triangulation based on time difference of arrival information. In this paper, a modified cross-correlation algorithm is introduced to increase the accuracy of the time difference of arrival and thereby improve sound source localization results. A numerical model is built by assuming multiple sound sources broadcasting in a room environment, and the location of the target sound source is identified with the triangulation algorithm. Real-time data are produced with an experimental setup using an array of four microphones, a target source, and background noise. The signals are processed by both the modified and the conventional cross-correlation for comparison. The impacts of the signal-to-noise ratio and the time difference of arrival on sound source localization results are demonstrated and discussed. Experimental validation in a non-ideal environment shows that the modified cross-correlation algorithm can reduce the error in the time difference of arrival used for sound source localization, thus improving the accuracy of both sound source ranging and direction.
Introduction
Sound source localization (SSL) and acoustic mapping are essential for the analysis of noise and vibration reduction processes. SSL methodologies with different algorithms have been developed to solve specific problems. In the late 1970s, Reddi 1 used a series of triangulation algorithms to locate multiple sound sources and achieved significant localization results for the target noise sources. Detecting and tracking the positions of one or multiple sources are the primary goals of SSL, which is a passive method: the sensing system emits no acoustic signal and relies on the sound generated by the object itself. In recent decades, McGregor et al. 2 suggested that the effectiveness of sound localization is also limited by the accuracy of the time of arrival (TOA) and the time difference of arrival (TDOA). TOA is the time the sound takes to travel from the source to a sensor, and TDOA is the difference between the arrival time at one sensor and the arrival time at the reference sensor. Several mathematical approaches have been adopted to address this problem. Benesty and colleagues3,4 used a cross-correlation (CC) algorithm and employed duplicate sets of microphones as a redundancy check to increase the accuracy of TOA and TDOA, further increasing the accuracy of SSL and the detection range of the device. Brutti et al. 5 noted that SSL algorithms have traditionally relied on TDOA estimates at pairs of microphones. With several microphone pairs, the source position can be estimated as the point in space that most closely matches the set of TDOA measurements. Sun et al. 6 introduced a probabilistic neural network to indoor SSL to address the challenges of reverberation and low signal-to-noise ratio (SNR) environments. Zhu et al. 7 proposed algorithms using Gaussian filters to increase the accuracy of direction of arrival (DOA) results based on TDOA.
Computational approaches, such as learning-based methods 8 and game theory, 9 have also been introduced to SSL in recent years.
Given the many applications of SSL, such as speech SSL in human–robot interaction (HRI), 10 sniper localization, 11 and medium-range aircraft localization, 12 it is imperative to improve the accuracy of SSL. The accuracy of the TDOA estimate is a vital factor in obtaining accurate data to track and localize acoustic sources. To address the problems of SSL using TDOA methods, researchers have analyzed specific sensor configurations.13,14 Meanwhile, Blandin et al. 15 introduced angular spectrum–based methods for TDOA estimation of multiple sources from a two-channel reverberant audio signal, employing SNR weighting and probabilistic multi-source modeling techniques. Further research on SSL for multiple sources in the free field was done by Kotus, 16 in which the DOA of each source was determined with sound intensity methods based on Fourier analysis, allowing the DOA to be determined independently for each frequency. Furthermore, methods have been developed to reduce the influence of noise on TDOA accuracy by approximating various environmental conditions and adjusting the TDOA values accordingly. For example, the TDOA denoising algorithm 17 increases TDOA accuracy by projecting the directly measured TDOAs onto a linear subspace using prior geometric information. Hosseini et al. 18 introduced filters and weighting functions that can eliminate TDOA errors but require a high SNR. Generalized cross-correlation (GCC) 19 is also used in sound mapping for broadband sounds.
In our research, we address this challenge with a modified cross-correlation (MCC) algorithm in which errors in the signals are minimized through peak detection and singularities caused by random noise are eliminated. The new methodology is designed to locate various types of signals, including both narrowband and broadband sounds, in three-dimensional (3D) space without prior knowledge of the geometry of the environment. The MCC result is used as the TDOA in this algorithm, and an experimental setup with a four-microphone array, background noise, and a target source in a typical classroom or laboratory environment is used to produce real-time data for processing. LabVIEW software from NATIONAL INSTRUMENTS™ and MATLAB software from MathWorks® are used for the algorithm operations. To determine the effectiveness of the modified algorithm, the real-time data processed by it are compared with the benchmark values and with the results of conventional CC. The optimized strategy demonstrates satisfactory results in both numerical simulation and experimental validation. The impact of the SNR on the results is also discussed.
The applied methodology
Triangulation for SSL
In this paper, triangulation is used for SSL, with a four-microphone system (Channels 1, 2, 3, and 4) collecting real-time acoustic signals in a non-ideal environment. 20 The four channels can be placed anywhere in 3D space as long as they do not all lie in the same plane. By comparing the TDOAs among the channels, the location of the target signal can be determined in Cartesian coordinates, 21 as shown in equation (1), assuming that the locations of the four sensors are known
where x, y, and z are Cartesian coordinates that pinpoint a location in 3D space; the subscript s denotes the target signal, r the reference channel, and i any other channel.
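Equation (1) is not reproduced in this text. A standard way to write the relation it expresses, assumed here, is one range-difference equation per non-reference channel i: the distance from the source to microphone i minus the distance from the source to the reference microphone r equals c times the TDOA between i and r. With four microphones this gives three equations in three unknowns. The minimal sketch below solves them in closed form by squaring the range equations, which reduces the problem to a quadratic in the source-to-reference range R; this may differ from the solver the authors used. Because the hyperbolic system can admit two valid solutions, the hypothetical function `locate_source` returns every candidate consistent with the TDOAs. The speed of sound of 343 m/s is an assumption.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed constant (air at about 20 degrees C)

def _det3(A):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))

def _solve3(M, v):
    """Solve M x = v for a 3x3 system with Cramer's rule."""
    D = _det3(M)
    if abs(D) < 1e-12:
        raise ValueError("microphones are coplanar: the system is singular")
    x = []
    for col in range(3):
        Mc = [row[:] for row in M]
        for r in range(3):
            Mc[r][col] = v[r]
        x.append(_det3(Mc) / D)
    return x

def locate_source(mics, tdoas, c=SPEED_OF_SOUND, tol=1e-9):
    """Return every source position consistent with three TDOAs (hypothetical helper).

    mics  -- four (x, y, z) positions in m; mics[0] is the reference channel r
    tdoas -- three TDOAs in s, tdoas[k] = t(mics[k + 1]) - t(mics[0])
    """
    ref = mics[0]
    # Shift coordinates so the reference microphone sits at the origin.
    m = [[mi[j] - ref[j] for j in range(3)] for mi in mics[1:]]
    d = [c * t for t in tdoas]  # range differences d_i = c * tau_ir
    # Squaring |p - m_i| = R + d_i with R = |p| gives m_i . p + d_i R = u_i.
    u = [(sum(x * x for x in mi) - di * di) / 2.0 for mi, di in zip(m, d)]
    a = _solve3(m, u)                    # p = a + b R
    b = [-x for x in _solve3(m, d)]
    # Enforce R = |p|: (|b|^2 - 1) R^2 + 2 (a . b) R + |a|^2 = 0
    A = sum(x * x for x in b) - 1.0
    B = 2.0 * sum(x * y for x, y in zip(a, b))
    C = sum(x * x for x in a)
    if abs(A) < 1e-12:
        roots = [-C / B] if B != 0 else []
    else:
        disc = B * B - 4.0 * A * C
        if disc < -tol:
            return []                    # no real solution (inconsistent TDOAs)
        s = math.sqrt(max(disc, 0.0))
        roots = [(-B + s) / (2.0 * A), (-B - s) / (2.0 * A)]
    candidates = []
    for R in roots:
        # Keep roots that give non-negative distances to every microphone.
        if R >= 0 and all(R + di >= -tol for di in d):
            candidates.append(tuple(a[j] + b[j] * R + ref[j] for j in range(3)))
    return candidates

# Usage: microphone layout from the section "Numerical simulation and impact of SNR".
mics = [(0.0, 0.0, 0.0), (-1.0, -0.6, 1.6), (1.0, 0.6, 1.6), (0.0, 1.2, 1.6)]
source = (2.0, 0.5, 3.0)  # illustrative source position, not from the study
tdoas = [(math.dist(source, mi) - math.dist(source, mics[0])) / SPEED_OF_SOUND
         for mi in mics[1:]]
for cand in locate_source(mics, tdoas):
    print(tuple(round(v, 3) for v in cand))
```

For this particular source position both roots of the quadratic turn out to be admissible, so two candidates are printed: the true position and a second point that reproduces the same three TDOAs. Resolving such an ambiguity needs extra information, for example an additional microphone or a prior on the search region.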
During the numerical simulation, various types of signals, such as machine sound, music, and human voice, are used as the target signal. Broadband white noise is used as the background noise to approximate the most likely conditions in a non-ideal environment. Figure 1 shows the time-domain signal of a pure target sound, in this case a segment of a female voice (Figure 1(a)), and the same signal corrupted by background noise (Figure 1(b)).

Example time-domain signals in the numerical simulation: (a) original signal without background noise and (b) signal with background noise (SNR = 0 dB).
Because the microphone locations are fixed during the measurement and the speed of sound c is assumed constant, the only inputs to equation (1) are the TDOA values. Previous research 20 has shown that the CC algorithm identifies the TDOA considerably more accurately than the conventional peak-difference method. The following section discusses the CC process and introduces an MCC algorithm that produces more stable results.
To quantify the ranging accuracy of the results,
where
TDOA algorithms
Once the signals from all four channels have been collected, the TDOA can be calculated with the CC algorithm
where
where
Besides the time-discretization error caused by an insufficient sampling frequency, another common error in the directly measured TDOA is that unreliable singularities may appear and interfere with peak detection because of background noise and the reverberation and reflection of the target sound. Numerical simulation 20 has shown that the lower the SNR, the more frequently singularity errors appear in the TDOA values, while the relative position of the target sound source and the microphone set has only a minor influence. These errors are mainly caused by the increased contribution of the background noise as the SNR decreases and can occur at unpredictable time instances, reducing the accuracy of SSL. In other words, singularity errors in the TDOA values occur randomly. Because equation (1) requires three TDOAs, a singularity error in any one of them results in a significant error, or even a completely wrong SSL result. For example, with the four microphones at (1, 0, 0), (0, 1, 0), (0, 0, 1), and (–1, 0, 0), the sound source at (–3, 3, 0), the background noise at (1, 1, 1) (all in meters, Cartesian coordinates), and SNR = 0 dB, interference from the background noise creates a singularity in the CC result, as shown in Figure 2(a) and (b): the true peak index corresponds to the correct TDOA, but the program automatically picks the location of the singularity, which leads to a TDOA error. The TDOAs between microphones 1 and 2 and between microphones 1 and 4 are both incorrect, whereas the TDOA between microphones 1 and 3 is correct, with an error within tolerance; the detailed values are listed in Table 1. The wrong TDOAs lead to a wrong SSL result of (0.37, 0.37, 0.28) m. The error can potentially be reduced by increasing the SNR, although some error remains and cannot be eliminated this way.
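The CC equation itself is not reproduced in this text. The minimal sketch below assumes the standard discrete cross-correlation r[k] = sum over n of x_i[n + k] x_r[n] and takes the lag of its largest peak, divided by the sampling frequency, as the TDOA. It is written in plain Python for readability and is not the LabVIEW/MATLAB implementation used in the study; the function names are hypothetical. Because the peak can only fall on an integer lag, the TDOA resolution is one sample, 1/fs, which is the discretization error mentioned above: at an assumed 48 kHz that is about 20.8 µs, or roughly 7 mm of path difference at 343 m/s.

```python
import random

def cross_correlation(x, y, max_lag):
    """Discrete CC r[k] = sum_n x[n + k] * y[n] for lags k = -max_lag..max_lag."""
    n = len(x)
    return [sum(x[i + k] * y[i] for i in range(max(0, -k), min(n, n - k)))
            for k in range(-max_lag, max_lag + 1)]

def tdoa_cc(sig_i, sig_r, fs, max_lag):
    """TDOA of channel i relative to reference channel r, in seconds.

    A positive value means the sound reaches channel i later than channel r.
    """
    r = cross_correlation(sig_i, sig_r, max_lag)
    k_peak = max(range(len(r)), key=r.__getitem__) - max_lag  # array index -> lag
    return k_peak / fs

# Usage: white noise at the reference and a copy delayed by 7 samples at channel i.
random.seed(0)
fs = 48000.0  # assumed sampling frequency
ref = [random.gauss(0.0, 1.0) for _ in range(2000)]
delayed = [0.0] * 7 + ref[:-7]
print(tdoa_cc(delayed, ref, fs, max_lag=50) * 1e6, "microseconds")
```

Picking the single largest peak is exactly the step that a noise-induced singularity can defeat, since any spurious peak taller than the true one wins outright.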

Comparison of CC and MCC in array index: (a) wrong
Test example in numerical simulation.
To overcome this problem, an MCC algorithm with a new mathematical model is built to filter out singularities during the CC process. It consists of the following steps:
Step 1. Calculate the CC result array
Step 2. Apply the probability function to
where
Step 3. Apply a rectangular function to
where
Step 4. Pick the maximum peak from
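The probability function of Step 2 and the rectangular function of Step 3 are not reproduced in this text, so the sketch below is not the authors' MCC. It follows the same four-step shape but substitutes two stand-ins, both assumptions: Step 2 is modeled as turning the CC array of each short frame into a probability mass over lags and averaging those masses across frames, so a singularity that appears in only one frame carries little weight; Step 3 is modeled as a rectangular window that keeps only lags within the physically feasible range |τ| ≤ d/c, where d is the distance between the two microphones. All function names and parameters are hypothetical.

```python
import math
import random

def cross_correlation(x, y, max_lag):
    """Discrete CC r[k] = sum_n x[n + k] * y[n] for lags k = -max_lag..max_lag."""
    n = len(x)
    return [sum(x[i + k] * y[i] for i in range(max(0, -k), min(n, n - k)))
            for k in range(-max_lag, max_lag + 1)]

def lag_probability(r):
    """HYPOTHETICAL stand-in for Step 2: clip negative correlation to zero and
    normalize one CC array into a probability mass over lags."""
    pos = [max(v, 0.0) for v in r]
    total = sum(pos)
    return [v / total for v in pos] if total > 0 else [0.0] * len(r)

def rectangular_window(max_lag, k_max):
    """Stand-in for Step 3: 1 for physically feasible lags |k| <= k_max, else 0."""
    return [1.0 if abs(k) <= k_max else 0.0 for k in range(-max_lag, max_lag + 1)]

def tdoa_mcc_sketch(sig_i, sig_r, fs, mic_distance, frame_len, max_lag, c=343.0):
    """Frame-averaged, window-limited CC peak picking (not the authors' exact MCC)."""
    n_frames = len(sig_r) // frame_len
    if n_frames == 0:
        raise ValueError("signal shorter than one frame")
    k_max = math.ceil(mic_distance * fs / c)   # a TDOA cannot exceed d / c
    prob = [0.0] * (2 * max_lag + 1)
    for f in range(n_frames):                  # Steps 1 and 2, frame by frame
        seg = slice(f * frame_len, (f + 1) * frame_len)
        p = lag_probability(cross_correlation(sig_i[seg], sig_r[seg], max_lag))
        prob = [acc + v / n_frames for acc, v in zip(prob, p)]
    window = rectangular_window(max_lag, k_max)
    weighted = [p * w for p, w in zip(prob, window)]                        # Step 3
    k_peak = max(range(len(weighted)), key=weighted.__getitem__) - max_lag  # Step 4
    return k_peak / fs

# Usage: a 5-sample delay plus one loud click pair that fakes a CC peak at lag 30.
random.seed(1)
fs, delay = 8000.0, 5
ref = [random.gauss(0.0, 1.0) for _ in range(2048)]
sig = [0.0] * delay + ref[:-delay]
ref[100] += 100.0
sig[130] += 100.0  # spurious "singularity" 30 samples apart

plain = cross_correlation(sig, ref, 40)
plain_lag = max(range(len(plain)), key=plain.__getitem__) - 40
mcc_lag = round(tdoa_mcc_sketch(sig, ref, fs, mic_distance=0.5,
                                frame_len=256, max_lag=40) * fs)
print("plain CC lag:", plain_lag, "| sketch lag:", mcc_lag)
```

In this example the click pair pushes the plain CC peak to lag 30, outside the feasible range for the assumed 0.5 m spacing (12 lags at 8 kHz), while the windowed, frame-averaged estimate returns the true 5-sample delay. A singularity that falls inside the feasible range would have to be suppressed by the frame averaging rather than by the window.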
As shown in Figure 2(c) and (d), the MCC process described above effectively eliminates the singularities and identifies more accurate TDOAs. A comparison of CC and MCC is shown in Table 1. In the same example mentioned above, microphones are located at the target sound source at
Numerical simulation and impact of SNR
To evaluate the accuracy of SSL, the target source is placed and tested at every 0.1 m within a 10 m × 10 m area, with white noise played at Cartesian coordinates (1, 1, 1) m as the background noise. To keep the prototype compact, the four sensors are placed at the four vertices of a tetrahedron with an adjustable edge length. To make the prototype sensing system fit in the lab environment, one vertex is at the origin and the other three lie in a vertical plane. The edge length of the tetrahedron is set to about 2 m in both the numerical analysis and the experimental validation. The four microphones are located at (0, 0, 0), (–1, –0.6, 1.6), (1, 0.6, 1.6), and (0, 1.2, 1.6) m in Cartesian coordinates. The x–z plane represents a horizontal plane at a given height above the floor, and the y-axis indicates height. The algorithm is evaluated by scanning this horizontal plane at a fixed height, which represents the most common cases in SSL and noise detection.
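As a quick arithmetic check of the stated geometry, the sketch below computes the distance from the reference microphone (Channel 1, at the origin) to each of the other three microphones, and the largest TDOA that each distance allows, d/c, which also bounds the lag range worth searching in the CC. The speed of sound (343 m/s) and the sampling rate (51.2 kHz, the maximum rate of the NI-9234 module) are assumptions; the rate actually used is not stated in this text, and `pair_limits` is a hypothetical helper.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed constant
# Microphone coordinates (m) from the text; channel 1 at the origin is the reference.
MICS = {1: (0.0, 0.0, 0.0), 2: (-1.0, -0.6, 1.6), 3: (1.0, 0.6, 1.6), 4: (0.0, 1.2, 1.6)}

def pair_limits(mics, ref=1, fs=51200.0, c=SPEED_OF_SOUND):
    """For each channel i != ref: (distance to ref in m, max |TDOA| in s, max |lag| in samples)."""
    limits = {}
    for ch, pos in mics.items():
        if ch == ref:
            continue
        d = math.dist(pos, mics[ref])
        tau_max = d / c  # the path difference between two microphones cannot exceed d
        limits[ch] = (d, tau_max, math.ceil(tau_max * fs))
    return limits

for ch, (d, tau, lag) in pair_limits(MICS).items():
    print(f"ch {ch}: d = {d:.3f} m, max TDOA = {tau * 1e3:.2f} ms, max lag = {lag} samples")
```

The three reference-to-channel distances come out at about 1.98, 1.98, and 2.00 m, consistent with the edge length of about 2 m given above; only these three edges, the ones entering the TDOAs, are checked here.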
Various SNR levels are applied, and the scan results are shown in Figure 3 as top-view contour maps of the 10 m × 10 m area. The color maps show the percentage error in ranging, defined by equation (2), and the absolute error in azimuthal angle in the level plane, which indicates the directional accuracy of SSL. The microphones, at the four vertices of the tetrahedron, are marked with black dots in the top views of Figure 3.
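Equation (2) is not reproduced in this text, so the two metrics below are assumptions about its form rather than the authors' definitions. The sketch takes the ranging error as the relative difference between the estimated and true source range, both measured from the origin where the reference microphone sits, in percent; the direction error is the absolute difference in azimuth within the level x–z plane (y is height), measured from the +x axis and wrapped so it never exceeds 180°. The function names are hypothetical.

```python
import math

def ranging_error_pct(est, true):
    """ASSUMED form of equation (2): |r_est - r_true| / r_true * 100, ranges from the origin."""
    r_true = math.hypot(*true)
    r_est = math.hypot(*est)
    return 100.0 * abs(r_est - r_true) / r_true

def azimuth_error_deg(est, true):
    """Absolute azimuth error in the level x-z plane (y is height), wrapped to [0, 180]."""
    az_est = math.degrees(math.atan2(est[2], est[0]))
    az_true = math.degrees(math.atan2(true[2], true[0]))
    diff = abs(az_est - az_true) % 360.0
    return min(diff, 360.0 - diff)

# Usage with illustrative positions (m): a range error and a direction error.
print(ranging_error_pct((0.0, 0.0, 5.1), (0.0, 0.0, 5.0)))
print(azimuth_error_deg((1.0, 0.0, 1.0), (1.0, 0.0, 0.0)))
```

Separating the two metrics this way means a source estimate can have a large ranging error but a small direction error, which matches the pattern the text reports for the level-plane scans.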

Ranging error and absolute error in direction of SSL with MCC: (a) error in ranging (SNR = 10 dB, MCC); (b) absolute error in directions (SNR = 10 dB, MCC); (c) error in ranging (SNR = 5 dB, MCC); (d) absolute error in directions (SNR = 5 dB, MCC); (e) error in ranging (SNR = 0 dB, MCC); and (f) absolute error in directions (SNR = 0 dB, MCC).
As shown in Figure 3, the ranging error increases as the SNR decreases and also depends on the geometric arrangement of the microphones. Although the overall ranging accuracy of the SSL results is satisfactory, results in certain directions are of concern, especially on the negative side of the z-axis. When locating target sources on the positive side of the z-axis, the error can be kept below 1% with a positive SNR. On the other hand, the absolute error in degrees, indicating the directional error, is minimal and independent of the SNR. As expected, directional errors increase when the target source lies on the x-axis because of the geometric arrangement of the microphone system.
The ranging and directional errors of MCC are also compared with those of the CC algorithm in Figure 4. A data set with SNR = 10 dB is processed by both the MCC and CC algorithms, and the resulting TDOAs are used for SSL. The results show that conventional CC has relatively higher errors in both ranging and direction in certain areas because of the geometric arrangement of the microphones, whereas MCC resolves this problem and gives more stable results with smaller errors.

Comparison between the SSL results by MCC and CC: (a) error in ranging (SNR = 10 dB, MCC); (b) absolute error in directions (SNR = 10 dB, MCC); (c) error in ranging (SNR = 10 dB, CC); and (d) absolute error in directions (SNR = 10 dB, CC).
Furthermore, the reliability of SSL, defined in equation (8), is introduced to evaluate the performance of SSL with CC and MCC further
where
A comparison of CC and MCC in terms of reliability is shown in Figure 5. As expected, both CC and MCC produce more accurate SSL solutions as the SNR increases. However, the reliability of the SSL results by CC drops significantly when the SNR decreases and

Impact of SNR and SSL with CC versus MCC.
Experimental results and discussion
The experimental validation was conducted in a typical lab-type classroom containing tables, chairs, and computers. Sound signals were collected with Model 130E21 microphones from PCB Piezotronics, Inc. and processed with an NI-9234 Sound and Vibration Input Module from NATIONAL INSTRUMENTS. As described in the section “Numerical simulation and impact of SNR,” the four microphones are placed at the four vertices of a tetrahedron, as shown in Figure 6, at (0, 0, 0), (–1, –0.6, 1.6), (1, 0.6, 1.6), and (0, 1.2, 1.6) m in Cartesian coordinates. Multiple air-conditioner outlets in the ceiling ran continuously and were treated as background noise. A loudspeaker playing machining sound was placed at a random location as the target sound source. As shown in Figure 7, both the target and background signals are broadband sounds. To estimate the SNR level during the experiment, the loudspeaker is first turned off and the background noise sound pressure level
The SNR in dB can then be determined as

Experimental setup.

Measured signal examples in experiments: (a) target and background (time domain), (b) background (time domain), (c) target and background (frequency domain), and (d) background (frequency domain).
A total of 500 samples were collected in the time domain and processed by CC and MCC. The error defined in equation (2) is used to evaluate the effectiveness of SSL, and the accuracy of the results is shown in Figure 8 and Table 2. The SNR was checked frequently during the experiment and was generally kept between 1 and 3 dB.
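The SNR equation is not reproduced in this text. The sketch below shows one common way, assumed here, to turn the two sound pressure level readings described above (loudspeaker off: background only; loudspeaker on: target plus background) into an SNR, treating the target and background as incoherent so that their mean-square pressures add. The function name is hypothetical and the readings in the usage lines are illustrative, not measured values.

```python
import math

def snr_from_spl(l_on_db, l_off_db):
    """SNR (dB) from SPL with the target on (target + background) and off (background).

    Assumes target and background are incoherent, so mean-square pressures add.
    """
    if l_on_db <= l_off_db:
        raise ValueError("level with the target on must exceed the background level")
    ratio = 10 ** ((l_on_db - l_off_db) / 10)  # (target + background) / background
    return 10 * math.log10(ratio - 1)           # target / background, in dB

# Usage with illustrative readings (not measured values from this study).
print(round(snr_from_spl(63.0, 60.0), 2))  # 3 dB above background: roughly 0 dB SNR
print(round(snr_from_spl(70.0, 60.0), 2))
```

Subtracting the background energy matters most when the two readings are close: a reading only 3 dB above the background corresponds to roughly equal target and background power, not to a 3 dB SNR.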

Error distribution in experimental validation.
Error by two methods in experimental validation.
Both CC and MCC can be successfully applied with the triangulation algorithm to find the target location. However, MCC achieves better accuracy and a higher success rate than CC. As shown in Figure 8, with the tolerance set to a 2% error, SSL with CC has a success rate of 63%, while SSL with MCC reaches 70%. Moreover, as shown in Table 2, when the tolerance is increased to a 15% error, SSL with CC has a success rate of 63.91%, whereas SSL with MCC reaches 81.33%. Although still affected by multiple factors such as the SNR, source location, and signal resolution, MCC can effectively eliminate TDOA errors caused by random interference from the background noise, thus providing more stable SSL results.
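The success rates above can be read as the share of trials whose ranging error falls within a given tolerance. The sketch below computes that share, assuming a trial counts as a success when its error is less than or equal to the tolerance; the function names are hypothetical, and the error values in the usage lines are made up for illustration, not the study's data.

```python
def success_rate(errors_pct, tolerance_pct):
    """Percentage of trials whose ranging error (in %) is within the tolerance."""
    if not errors_pct:
        raise ValueError("no trials given")
    hits = sum(1 for e in errors_pct if e <= tolerance_pct)
    return 100.0 * hits / len(errors_pct)

def success_curve(errors_pct, tolerances_pct):
    """Success rate at each tolerance, e.g. to compare two TDOA methods side by side."""
    return {t: success_rate(errors_pct, t) for t in tolerances_pct}

# Usage with made-up errors for illustration only (not the study's data).
errors = [0.5, 1.2, 3.0, 8.0, 20.0]
print(success_curve(errors, [2.0, 15.0]))  # {2.0: 40.0, 15.0: 80.0}
```

Evaluating the curve over a range of tolerances for each method shows not only which method wins at a given tolerance but also how quickly each one approaches its ceiling as the tolerance is relaxed.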
Conclusion
Improving TDOA accuracy in SSL is a practical problem, which this paper addresses by introducing an MCC method. The results of the proposed method are compared with those of the conventional CC algorithm in both numerical simulation and experimental validation and show a significant improvement in SSL accuracy. The accuracy of SSL with MCC is influenced by the SNR as well as by other factors such as the microphone configuration and the sampling frequency of the recorded time-domain signals. The ranging error increases with the distance of the target sound source, whereas the directional error is minimal and independent of the target distance and SNR. More real-time experiments at various plane heights and SNRs will be conducted in the future.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
This work was supported by the Undergraduate Research Opportunity Program (UROP) at the University of Michigan-Flint.
