Stochastic allocation strategy for irregular arrays based on geometric feature control

Abstract

Irregularities in microphone distribution enrich the diversity of spatial differences to decorrelate interferences from the beamforming target. However, the large degrees of freedom of irregular placements make it difficult to analyse and optimize array performance. This article proposes fast and feasible optimal irregular array design methods with improved beamforming performance for human speech. Important geometric features are extracted to be used as the input vector of the neural network structure to determine the optimal irregular arrangements of sensors. In addition, a hyperbola design method is proposed to directly cluster microphones in the hyperbola areas to produce rich differential distance entropies and yield significant signal-to-noise ratio improvements. These methods can be easily applied to guide non-computer-aided optimal irregular array designs for human speech in acoustic scenes such as immersive cocktail party environments.

Keywords

Irregular array, beamforming, geometric feature, sensor distribution, cluster design

Introduction

Microphone array processing uses temporal, spatial and spectral diversity to capture target acoustic signals and suppress interference and noise. Although regular arrays with uniformly spaced elements have been well studied, their performance is typically limited by problems that derive from the symmetrical array arrangement, such as spatial aliasing and inconsistent performance over the signal spectrum.^1–3 It has been demonstrated that irregular arrays have the potential to outperform regular ones, especially for speech signals in immersive environments.^4,5 However, the large degrees of freedom of irregular microphone placements make it difficult to analyse and optimize array performance. Although optimization approaches have been proposed for irregular arrays that minimize the residual between the gain pattern and a desired pattern shape, it is not clear which geometric features are crucial in determining the beamforming performance of a totally randomly distributed array, in the way that array aperture and element spacing are for regular arrays.^6,7 Irregular array synthesis algorithms for antenna design have been proposed in the literature.^8–13 However, they are not readily applicable to broadband human speech signals with limited knowledge of possible source locations, dynamic acoustic scenes and unknown sound propagation models. In applications with moving sources and varied background noise, such as high-speed trains and crowded public scenes with irregular spaces, a direct and rapid array design method with stochastic arrangement of sensors based on prior knowledge of the acoustic environment is still lacking.

Therefore, this article proposes fast and feasible optimal irregular array design methods with enhanced signal-to-noise ratio (SNR) performance for human speech applications in immersive environments. Important distribution features, which are shown theoretically and experimentally to have a strong impact on performance metrics, are extracted and used as the input of a pre-trained neural network (NN) structure to predict array performance without time-consuming simulations. Optimal arrays with a high probability of superior beamforming performance can then be picked directly based on prior knowledge of the acoustic scene. A second, cluster design method is developed based on hyperbola theory. Optimal arrays can be generated directly for specified acoustic scenes by clustering microphones in the hyperbola areas to produce rich differential distance entropies and provide superior noise suppression capability, even without optimization.

Problem formulation

Assuming a three-dimensional (3D) space for the field of view (FOV), $u (t; r_{s})$ stands for the source signal transmitted from position vector $r_{s}$ . The received signal of the pth microphone can be given as

v_{p} (t; r_{s}, r_{p}) = u (t; r_{s}) * h (t; r_{s}, r_{p})

(1)

h (t; r_{s}, r_{p}) = \sum_{n = 0}^{\infty} a_{spn} (t - τ_{spn})

(2)

where h(·) is the impulse response of the propagation path from $r_{s}$ to $r_{p}$, $a_{spn} (t)$ is the response of the nth propagation path and $τ_{spn}$ is the corresponding time delay.
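As a minimal numerical sketch of equations (1) and (2), the snippet below builds a discrete-time impulse response from a few propagation paths and convolves it with a source signal. It assumes that each path response is a scaled unit impulse (pure attenuation), that delays are rounded to whole samples and that the path sum is truncated; the function name, sampling rate and path values are hypothetical and not taken from the article.

```python
import numpy as np

def received_signal(u, delays_s, gains, fs):
    """Convolve source samples u with a sparse multipath impulse response.

    Assumption: each path is a scaled, delayed unit impulse, so
    h[k] = sum_n gains[n] * delta[k - round(delays_s[n] * fs)].
    """
    delay_samples = np.round(np.asarray(delays_s) * fs).astype(int)
    h = np.zeros(delay_samples.max() + 1)
    # Accumulate gains in case two paths round to the same sample.
    np.add.at(h, delay_samples, gains)
    return np.convolve(u, h)

fs = 16000                          # hypothetical sampling rate (Hz)
u = np.array([1.0, 0.5, -0.25])     # toy source samples
# Direct path at 1 ms plus one reflection at 2.5 ms (illustrative values).
v = received_signal(u, delays_s=[1.0e-3, 2.5e-3], gains=[1.0, 0.3], fs=fs)
```

The output contains one copy of the source per path, shifted by the path delay and scaled by the path gain.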

To study the impact of microphone positions, a delay-and-sum beamforming algorithm with inverse distance weighting is applied in this article. The expected optimal geometries should statistically enhance array performance regardless of the beamformer type. The power leaked from a spatial point $r_{s}$ in the FOV into the beamformer focused at $r_{i}$ can be expressed in the frequency domain as

S (r_{i}, r_{s}) = \int \sum_{p = 1}^{P} \sum_{q = 1}^{P} B_{ip} B_{iq} \hat{V}_{p} (ω; r_{s}, r_{p}) \hat{V}_{q}^{*} (ω; r_{s}, r_{q}) \exp (j ω (τ_{ip} - τ_{iq})) d ω

(3)

where $r_{s}$ is a possible sound source location in the FOV, ${\hat{V}}_{p}$ is the Fourier transform of the received signal of the pth microphone (p = 1, 2, …, P), $B_{ip} = 1 / d_{ip}$ and $τ_{ip} = d_{ip} / c$, where $d_{ip}$ is the propagation distance from the beamforming target at $r_{i}$ to the microphone position $r_{p}$ and c is the real-time speed of sound measured in the application scene.
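The integrand of equation (3) can be evaluated at a single frequency with a few lines of NumPy. The sketch below assumes direct-path propagation only, a unit-amplitude source, a received spectrum of the form exp(-jω d_sp / c) / d_sp and a fixed speed of sound of 343 m/s; the function name `das_power` and the array layout are illustrative choices, not the article's implementation.

```python
import numpy as np

def das_power(mics, r_i, r_s, freq_hz, c=343.0):
    """Single-frequency delay-and-sum output power for a unit source at r_s.

    Assumptions: direct-path propagation only, received spectrum
    V_p = exp(-j*w*d_sp/c) / d_sp, and weights B_ip = 1/d_ip.
    The double sum over (p, q) in equation (3) factorises into |sum_p ...|^2.
    """
    w = 2.0 * np.pi * freq_hz
    d_ip = np.linalg.norm(mics - r_i, axis=1)   # focal point to microphones
    d_sp = np.linalg.norm(mics - r_s, axis=1)   # source to microphones
    v = np.exp(-1j * w * d_sp / c) / d_sp       # received spectra V_p
    steered = (1.0 / d_ip) * v * np.exp(1j * w * d_ip / c)
    return np.abs(steered.sum()) ** 2

rng = np.random.default_rng(0)
mics = rng.uniform([0, 0, 0], [10, 10, 2], size=(16, 3))  # irregular array
r_i = np.array([5.0, 5.0, 1.0])                            # focal point
r_off = np.array([2.0, 7.0, 1.0])                          # off-focus source

p_focus = das_power(mics, r_i, r_i, freq_hz=1000.0)   # source at the focal point
p_leak = das_power(mics, r_i, r_off, freq_hz=1000.0)  # leakage from r_off
```

When the source sits at the focal point, the phase terms cancel and the result reduces to the square of the sum of 1 / d_ip² over all microphones.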

When searching for optimal geometric features, the coefficients of the delay-and-sum beamformer can be considered a function of the microphone coordinates. The distribution of differential paths from all microphone pairs to the potential source positions is the important statistical factor that determines the array's beamforming ability for noise suppression.^7,14,15 By applying the expectation operator to equation (3), assuming that the attenuation factors are uncorrelated with the pairwise distance differences of microphones and considering only direct-path propagation,^14–16 the output power of the beamformer for sound sources at and away from the focal point can be expressed as

E (S (r_{i}, r_{s})) = P^{2} \int \langle | \hat{U} (ω; r_{s}) |^{2} \rangle E (B_{ip} B_{iq} A_{sp} A_{sq}^{*}) E (\exp (j 2 π (\frac{d_{sq} - d_{sp}}{λ} + \frac{d_{ip} - d_{iq}}{λ}))) d ω

(4)

where $A_{sp}$ is the direct-path attenuation factor from $r_{s}$ to $r_{p}$, λ is the wavelength and the angular brackets represent the average power of the source signals. As seen in equation (4), for a target source located at $r_{s} = r_{i}$, the complex exponential terms become 1, and the target signal is enhanced by coherent addition over all microphone pairs, regardless of array geometry. For interfering sources, the average output power of the beamformer is weaker because the phases of the exponential terms are incoherent.

As shown in equation (4), the key to noise source suppression is to increase the incoherence of the transmission phases, which are related to the differential-path distance (DPD) distribution of all microphone pairs with respect to the interfering sources and the focal point. For a fixed signal spectrum and source distribution, a limited range of DPD levels results in stronger partial coherence among the multichannel signals received from interfering sources and may degrade the SNR performance of the beamformer. Therefore, when searching for the optimal array geometry, the diversity and spread of the DPDs matter more than the exact position of each microphone for achieving the incoherence that suppresses noise signals. A DPD distribution with a wide range and rich diversity (such as a uniform distribution), with the phase terms spreading from $- π$ to $π$, can result in a near-zero power gain for non-target source positions.
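The near-zero gain for widely spread DPDs can be checked with a short calculation. Assuming, as an idealisation not stated in the original derivation, that the phase term φ = 2πΔ/λ in equation (4) is uniformly distributed over [−π, π], the expected complex exponential vanishes:

```latex
E\left[ e^{j\varphi} \right]
  = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{j\varphi} \, d\varphi
  = \frac{1}{2\pi} \left[ \frac{e^{j\varphi}}{j} \right]_{-\pi}^{\pi}
  = \frac{\sin \pi}{\pi}
  = 0
```

By the same calculation, a phase confined to [−β, β] gives sin β / β, which tends to 1 as β tends to 0. This is the partial coherence produced by a limited DPD range.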

As shown in Table 1, when combined with the typical array parameters of aperture and centroid, statistics based on DPD distributions can be considered important geometric features for characterizing similar arrays and predicting their beamforming performance without any Monte Carlo experiments.^14 Table 1 also lists results from multi-way analysis of variance (ANOVA) to further demonstrate the strong correlation between the geometric features and key array performance metrics, such as mainlobe width (MLW) and mainlobe-to-peak-sidelobe ratio (MPSR).^18 The proposed geometric features {L, a, σ, J} show statistically significant impacts on the beampattern performance metrics (p < 0.01 in all cases except array dispersion on MPSR, where p < 0.0366). Together they explain over 80% of the array performance variance (as shown by R²) when beamforming for human speech signals.^7,14,15

Table 1.

Key geometry features related to array beamforming performance.

Features	Definition	MLW (ANOVA)	MPSR (ANOVA)
Array centroid offset {L}	The distance from the array centroid to the focal point of the beamformer	p < 0.0001	p < 0.001
Array dispersion {a}	The standard deviation of microphone coordinates about the array centroid	p < 0.0001	p < 0.0366
DPD statistics {σ, J}	The standard deviation and normalized Shannon entropy of DPDs^17	p < 0.0031	p < 0.0001
Model fit (R²)		0.997	0.817

ANOVA: analysis of variance; MPSR: mainlobe-to-peak-sidelobe ratio; MLW: mainlobe width; DPD: differential-path distance.
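The features in Table 1 can be computed directly from microphone coordinates. The sketch below handles a single focal point and a single interfering source; averaging over the source distribution is omitted. It makes two assumptions that the table does not spell out: the dispersion {a} is taken as the root-mean-square distance of the microphones from the centroid, and the normalized entropy {J} is the Shannon entropy of a DPD histogram divided by the logarithm of the bin count. The function name and the bin count of 20 are hypothetical.

```python
import numpy as np

def geometry_features(mics, r_i, r_s, n_bins=20):
    """Return (L, a, sigma, J) for one focal point r_i and one source r_s.

    Assumptions: `a` is the RMS distance of microphones from the centroid,
    and J is the Shannon entropy of a DPD histogram divided by log(n_bins).
    """
    centroid = mics.mean(axis=0)
    L = np.linalg.norm(centroid - r_i)                             # centroid offset
    a = np.sqrt(np.mean(np.sum((mics - centroid) ** 2, axis=1)))   # dispersion

    # Per-microphone differential distance (d_sp - d_ip).
    diff = (np.linalg.norm(mics - r_s, axis=1)
            - np.linalg.norm(mics - r_i, axis=1))
    # Differential-path distance for every microphone pair (p < q),
    # i.e. (d_sq - d_sp) + (d_ip - d_iq) from equation (4).
    p, q = np.triu_indices(len(mics), k=1)
    dpd = diff[q] - diff[p]

    sigma = dpd.std()
    counts, _ = np.histogram(dpd, bins=n_bins)
    prob = counts[counts > 0] / counts.sum()
    J = -np.sum(prob * np.log(prob)) / np.log(n_bins)              # in [0, 1]
    return L, a, sigma, J

rng = np.random.default_rng(0)
mics = rng.uniform([0, 0, 0], [10, 10, 2], size=(16, 3))
L, a, sigma, J = geometry_features(
    mics, r_i=np.array([5.0, 5.0, 1.0]), r_s=np.array([2.0, 7.0, 1.0]))
```

A value of J close to 1 means the DPDs are spread almost evenly over the histogram bins, which is the wide, diverse distribution that the analysis above associates with strong noise suppression.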

In the next section, the proposed geometric features are applied as the input vector for array optimization algorithms (e.g. a NN structure) to rapidly predict array SNR performance. For mutable acoustic applications, such as high-speed trains and crowded public scenes, a feasible cluster design method for stochastic arrays is proposed to directly generate optimal microphone clusters with good values of the proposed features and to guide fast non-computer-aided optimal array design.

Optimal geometries for stochastic arrays

NN method

Because the relationship between irregular array features and beamforming performance is complex and nonlinear, a deep NN, which is well suited to non-deterministic mapping, is applied in this section. Geometric features extracted from the acoustic scene, along with the number of microphones, are applied as the first layer of a NN structure to rapidly predict the array beamforming performance for human speech signals.

As shown in Figure 1, the inputs are the microphone positions and prior knowledge about the acoustic scene, including probability density functions of possible target and noise source locations, which reflect the sources' usual movement tracks and speaking behaviour. If no prior knowledge is available, a uniform distribution is the default setting, so that all spatial points in the FOV are considered equally likely source locations. The objective function is expressed as

F (G) = \int_{r_{i} \in \text{target space}} \int_{r_{s} \in \text{noise space}} Φ (G, r_{i}, r_{s}) p (r_{s} | r_{i}) d r_{s} \, p (r_{i}) d r_{i}

(5)

where G represents the microphone distribution with specified geometric features, $Φ (G, r_{i}, r_{s})$ represents the relation between the geometric features and the key performance metrics in the specified scene, and $p (r_{i})$ and $p (r_{s} | r_{i})$ represent the probability density functions of the desired target and interfering source locations. The criterion for the optimal array geometry is given by

G_{opt} = \underset{G \in \text{mic space}}{\arg\min} \langle F (G) \rangle

(6)
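Equations (5) and (6) can be approximated by Monte Carlo sampling: draw target and noise positions from their densities, average Φ over the samples and keep the candidate geometry with the smallest estimate. The article does not give Φ in closed form, so the sketch below substitutes a hypothetical stand-in, namely the ratio of leaked to focused power of a single-frequency, direct-path delay-and-sum beamformer. The uniform densities follow the default stated above; the speed of sound, frequency, sample counts and all names are assumptions.

```python
import numpy as np

C = 343.0  # assumed speed of sound (m/s)

def leak_ratio(mics, r_i, r_s, freq_hz=1000.0):
    """Hypothetical stand-in for Phi: leaked power / focused power."""
    w = 2.0 * np.pi * freq_hz
    d_i = np.linalg.norm(mics - r_i, axis=1)
    d_s = np.linalg.norm(mics - r_s, axis=1)
    leak = np.abs(np.sum(np.exp(1j * w * (d_i - d_s) / C) / (d_i * d_s))) ** 2
    focus = np.sum(1.0 / d_i ** 2) ** 2
    return leak / focus

def objective(mics, targets, noises):
    """Monte Carlo estimate of F(G) over paired target/noise samples."""
    return np.mean([leak_ratio(mics, t, n) for t, n in zip(targets, noises)])

rng = np.random.default_rng(1)
lo, hi = np.zeros(3), np.array([10.0, 10.0, 2.0])   # FOV bounds (m)
targets = rng.uniform(lo, hi, size=(200, 3))        # samples of p(r_i)
noises = rng.uniform(lo, hi, size=(200, 3))         # samples of p(r_s | r_i)

# Candidate geometries G: 50 random 16-microphone arrays inside the FOV.
candidates = [rng.uniform(lo, hi, size=(16, 3)) for _ in range(50)]
scores = np.array([objective(g, targets, noises) for g in candidates])
best = int(np.argmin(scores))
g_opt = candidates[best]                            # argmin of the estimate
```

Evaluating every candidate this way is the brute-force baseline; the NN structure described in this section is intended to replace the inner simulation with a fast prediction from geometric features.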

The first layer of the optimization structure extracts five geometric features from the input vector: {L, a, σ, J} and the number of microphones. Two pre-trained sub-NNs are applied, one serving as a metric of array noise suppression and the other as a metric of spatial resolution. Each subnetwork is a two-layer feed-forward network trained with Bayesian regularization on a data set of 3D array gain patterns collected from Monte Carlo experiments with human speech signals. Thirty-five neurons are used in each hidden layer of the subnetwork. For both the training and testing data sets, the regression R values reach 91%–96%, indicating successful mappings from the selected array features to the key beamforming performance metrics. The outputs of the subnetworks are combined under probabilistic rules and constraints derived from the acoustic scene. The experimental results of this optimization scheme are presented in a later section.
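The shape of one subnetwork can be illustrated with a forward pass in NumPy. This sketch reads "two-layer" as one hidden layer of 35 tanh units followed by a linear output, which is one possible interpretation of the description above. The weights are random placeholders rather than trained values, Bayesian-regularized training is not reproduced, and the class name, feature values and seeds are hypothetical.

```python
import numpy as np

class SubNetwork:
    """Feed-forward regressor: 5 features -> 35 tanh units -> 1 output.

    Weights are random placeholders; the article's networks are pre-trained
    with Bayesian regularization, which is not reproduced here.
    """

    def __init__(self, n_in=5, n_hidden=35, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(scale=0.1, size=(n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(scale=0.1, size=(n_hidden, 1))
        self.b2 = np.zeros(1)

    def predict(self, x):
        hidden = np.tanh(x @ self.w1 + self.b1)      # hidden-layer activations
        return (hidden @ self.w2 + self.b2).ravel()  # one prediction per row

# Feature rows: [L, a, sigma, J, number of microphones] (illustrative values).
features = np.array([[1.2, 2.5, 0.8, 0.9, 64.0],
                     [0.4, 1.1, 0.3, 0.6, 9.0]])
noise_net = SubNetwork(seed=0)        # stand-in for the noise-suppression metric
resolution_net = SubNetwork(seed=1)   # stand-in for the spatial-resolution metric
pred_noise = noise_net.predict(features)
pred_resolution = resolution_net.predict(features)
```

In practice the inputs would be normalized before training, since the microphone count is on a very different scale from the other four features.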

Figure 1.

Neural network structure to predict array performance.

Hyperbola cluster design

It has been demonstrated that a DPD distribution with high entropy and wide spread, derived from the array geometric statistics and the acoustic scene, can increase the incoherence of noise components in the received multichannel signals and further improve beamforming SNR. However, because DPD statistics lack a simple, intuitive geometric interpretation that could directly guide the allocation of microphones in mutable application environments, a cluster design method based on hyperbola areas is proposed in this section for non-computer-aided optimal array design.

By defining the hyperbola areas based on knowledge of the acoustic scene, the hyperbola cluster (HC) method can be used to directly generate an optimal array with good values of the geometric features and further guide non-computer-aided optimal microphone placement. Following equation (4), for a microphone pair {p, q} and two spatial positions ${r_{i}, r_{s}}$ in the FOV, the DPD term can be rewritten as

Δ_{pq} (r_{i}, r_{s}) = (d_{sq} - d_{iq}) - (d_{sp} - d_{ip})

(7)

where each value of $(d_{sq} - d_{iq})$ corresponds to a hyperbola with its two foci at ${r_{i}, r_{s}}$.

As shown in Figure 2, a hyperbola is the locus of points with a constant absolute difference of distances to two foci. Given two spatial positions ${r_{i}, r_{s}}$ for sound sources, microphones located on the same hyperbola branch share the same value of $(d_{sq} - d_{iq})$. As microphones move towards the outside of the hyperbola pair (into the grey area), the absolute value of $(d_{sq} - d_{iq})$ increases. Therefore, to generate a large spread of DPDs in equation (7), microphone clusters should be distributed in both hyperbola areas. In addition, there is no need to place microphones in the middle area between ${r_{i}, r_{s}}$, because small DPD values are already generated by nearby microphones within the same hyperbola area. Therefore, with prior knowledge of the possible source distribution in the acoustic scene, a large spread and rich entropy of DPDs for each target and noise source pair can be generated simply by placing small microphone clusters over the hyperbola areas, which provides the beamformer with superior noise suppression ability.
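The hyperbola property used above is easy to verify numerically. The sketch below works in two dimensions: it checks that points on one hyperbola branch share the same differential distance to the two foci, then draws a microphone cluster from the region outside the hyperbola pair by rejection sampling. The sampling routine is a hypothetical illustration of "placing clusters in the hyperbola areas" and is not the article's HC procedure; the threshold `k`, the bounds and all names are assumptions.

```python
import numpy as np

def diff_distance(points, r_i, r_s):
    """Differential distance (d_s - d_i) from each point to the two foci."""
    return (np.linalg.norm(points - r_s, axis=1)
            - np.linalg.norm(points - r_i, axis=1))

def sample_cluster(r_i, r_s, k, n, lo, hi, rng):
    """Rejection-sample n microphones with |d_s - d_i| >= k inside [lo, hi]."""
    mics = []
    while len(mics) < n:
        cand = rng.uniform(lo, hi, size=(256, len(lo)))
        keep = cand[np.abs(diff_distance(cand, r_i, r_s)) >= k]
        mics.extend(keep[: n - len(mics)])
    return np.array(mics)

# Foci at (c, 0) and (-c, 0); right branch x = a*cosh(t), y = b*sinh(t).
a_h, c_h = 1.0, 2.0
b_h = np.sqrt(c_h ** 2 - a_h ** 2)
t = np.linspace(-2.0, 2.0, 9)
branch = np.column_stack([a_h * np.cosh(t), b_h * np.sinh(t)])
r_i, r_s = np.array([c_h, 0.0]), np.array([-c_h, 0.0])
on_curve = diff_distance(branch, r_i, r_s)      # constant, equal to 2 * a_h

rng = np.random.default_rng(2)
cluster = sample_cluster(r_i, r_s, k=2.0 * a_h, n=12,
                         lo=np.array([-5.0, -5.0]),
                         hi=np.array([5.0, 5.0]), rng=rng)
```

Because the threshold keeps points on both sides of the hyperbola pair, the resulting cluster contains positive and negative differential distances, which widens the spread of pairwise DPDs in equation (7).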

Figure 2.

Optimal array geometries. The blue circles represent distributed microphones. The red crosses represent the possible noise source locations. The red triangle represents desired target as focal point of beamformer. (a) Computer-aided GA array and (b) HC array.

Figure 2 shows the optimal arrays resulting from computer-aided heuristic searching^19 and from the hyperbola cluster design method. The hyperbola areas are marked by dashed lines in different colours. Figure 2(a) shows that, in the optimal geometry obtained from genetic algorithm (GA) searching,^5,19,20 most of the microphones are in fact clustered in the grey hyperbola areas, which supports the hyperbola analysis. Figure 2(b) shows the corresponding array generated directly by the HC method. Simulations and real-case experiments with human speech signals have demonstrated that the designed HC arrays show comparable or even better beamforming SNR than the computer-aided optimized GA arrays.

Experimental results

Experiments in three acoustic scenes with different potential source distributions and spaces were performed to evaluate SNR performance for human speech signals. An audio cage measuring 10 × 10 × 2 m³ was used to simulate an indoor immersive environment for multi-source audio surveillance applications. Three types of optimized arrays were employed: arrays obtained by 100 GA iterations, arrays directly generated by HC and arrays selected from random distributions by a NN structure. In addition, the SNRs of a relevant random array set and of a regular array with the same centroid and dispersion are provided for comparison.

Table 2 compares the SNR results of the optimized arrays with those of the random array set and regular arrays in cocktail party experiments. Sound sources transmitting broadband human speech signals are distributed in the audio cage and are randomly selected as the target and noise sources. For specified geometry sets with similar aperture, the average and top SNRs over 100 arrays are computed to demonstrate the effectiveness of the proposed array geometry optimization methods. All three types of optimal irregular geometries show enhanced beamforming performance, which demonstrates the feasibility of the array design methods proposed in this article and the effectiveness of the proposed geometric features. Owing to the statistical parameters and probabilistic rules applied in the optimization, an even stronger SNR improvement can be observed as the acoustic scene becomes more complicated (more potential speakers and more microphones in an overlapping noise/target space).

Table 2.

SNR (dB) comparisons for cocktail party experiments.

Array set	Metric	Sixty-four microphones with continuous noise space	Sixty-four microphones with discrete noise sources	Nine microphones with discrete noise sources
GA-optimized array set	Top SNR (1st)	9.03	26.33	18.24
GA-optimized array set	Top SNR (2nd)	8.99	25.78	17.71
GA-optimized array set	Top SNR (3rd)	8.98	25.63	17.65
GA-optimized array set	Average SNR	8.45	23.83	16.56
HC array set	Top SNR (1st)	8.92	28.04	21.83
HC array set	Top SNR (2nd)	8.87	27.38	21.07
HC array set	Top SNR (3rd)	8.71	27.03	20.69
HC array set	Average SNR	6.81	24.76	17.93
NN array set	Top SNR (1st)	10.73	24.42	19.14
NN array set	Top SNR (2nd)	9.43	24.10	16.67
NN array set	Top SNR (3rd)	8.74	23.13	16.65
NN array set	Average SNR	6.96	20.05	10.00
Random array set	Top SNR (1st)	6.47	22.61	17.78
Random array set	Top SNR (2nd)	5.67	22.18	17.62
Random array set	Top SNR (3rd)	5.49	22.09	17.20
Random array set	Average SNR	3.83	17.96	9.01
Regular array	SNR	3.40	17.28	8.89

HC: hyperbola cluster; NN: neural network; SNR: signal-to-noise ratio.
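The improvements reported in Table 2 can be made explicit by subtracting the regular-array SNR from each set's average SNR. The short snippet below does this arithmetic on values copied from the table; the variable names are illustrative.

```python
# Average SNR (dB) per scene, copied from Table 2. Scene order:
# 64 mics/continuous noise, 64 mics/discrete noise, 9 mics/discrete noise.
average_snr = {
    "GA": [8.45, 23.83, 16.56],
    "HC": [6.81, 24.76, 17.93],
    "NN": [6.96, 20.05, 10.00],
    "Random": [3.83, 17.96, 9.01],
}
regular_snr = [3.40, 17.28, 8.89]

# Gain of each array set over the regular array, rounded to 0.01 dB.
gain_over_regular = {
    name: [round(s - r, 2) for s, r in zip(values, regular_snr)]
    for name, values in average_snr.items()
}
```

On average, the HC set gains 3.41, 7.48 and 9.04 dB over the regular array in the three scenes, against 5.05, 6.55 and 7.67 dB for the GA set and 3.56, 2.77 and 1.11 dB for the NN set.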

Heuristic GA searching yields significant SNR improvements in all cases, identifying superior arrays that outperform regular arrays and random array sets (100 arrays per set with similar aperture and design space). Moreover, even without time-consuming optimization or heuristic searching, the direct design methods, HC and NN, generate optimal geometries with a higher probability of good beamforming performance. These directly designed arrays show SNR results comparable with, or even better than, those of the computer-aided GA arrays, and large SNR improvements are observed compared with the corresponding regular arrays. Figure 3 gives the top-view gain patterns for real-case cocktail party experiments when targeting the top source. The optimal arrays show superior noise suppression in this scene, and their spatial resolution is also improved in comparison with the regular array.

Figure 3.

Top-view gain patterns when beamforming at the top source. The red circles represent microphone positions. The triangles and crosses represent the target and noise source positions. (a) Regular array, (b) GA array, (c) HC array and (d) NN array.

Conclusion

This article has proposed feasible irregular array design methods with improved beamforming performance for cocktail party applications. Important geometric features have been proposed as NN structure inputs to predict array performance and directly pick optimal irregular geometries with good beamforming performance. In addition, to generate rich DPD entropy and better suppress noise signals, HC arrays derived from hyperbola areas can be generated directly from prior knowledge of the acoustic scene, providing improved SNR performance comparable with that of more complex optimization methods. The proposed methods can be easily applied to guide non-computer-aided optimal irregular array design in dynamic multi-source acoustic applications, such as mobile platforms with changing acoustic scenes and high-speed trains or aircraft with irregular spaces.

Footnotes

Handling Editor: Xi (Vincent) Wang

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by Fundamental Research Funds for Central Universities (Grant No. 2018JBM008), National Natural Science Foundation of China (Grant No. 61501025) and Beijing Natural Science Foundation (Grant No. 4172045).

ORCID iD

Jingjing Yu

References

1. Benesty, Chen, Huang. Microphone array signal processing. Berlin: Springer, 2008.

2. Rabinkin, Renomeron, French. Optimum sensor placement for array sound capture. Proc SPIE 1997; 3162: 227–239.

3. Christensen, Hald. Technical review beamforming. Naerum: Bruel & Kjaer Sound & Vibration Measurement A/S, 2004, pp.18–54.

4. Yoon, Song T-K. Sparse rectangular and spiral array designs for 3D medical ultrasound imaging. Sensors 2020; 20(1): E173.

5. Microphone array optimization in immersive environments. PhD thesis, University of Kentucky, Lexington, KY, 2013.

6. Rabinkin. Optimum sensor placement for microphone arrays. PhD thesis, Rutgers, The State University of New Jersey, New Brunswick, NJ, 1998.

7. Cheng. Optimization techniques for antenna arrays. Proc IEEE 1971; 59(12): 1664–1674.

8. Bucci, D’Urso, Isernia, et al. Deterministic synthesis of uniform amplitude sparse arrays via new density taper techniques. IEEE Trans Antennas Propag 2010; 58(6): 1949–1958.

9. Aslan, Puskely, Roederer, et al. Multiple beam synthesis of passively cooled 5G planar arrays using convex optimization. IEEE Trans Antennas Propag. Epub ahead of print 9 December 2019. DOI: 10.1109/TAP.2019.2955885.

10. Oliveri, Gottardi, Massa. A new meta-paradigm for the synthesis of antenna arrays for future wireless communications. IEEE Trans Antennas Propag 2019; 67(6): 3774–3788.

11. Pal, Vaidyanathan. Nested arrays: a novel approach to array processing with enhanced degrees of freedom. IEEE Trans Signal Process 2010; 58(8): 4167–4181.

12. Chi. Exploitation of geometry in signal processing and sensing. PhD thesis, Princeton University, Princeton, NJ, 2012.

13. Haykin. Adaptive filter theory. 5th ed. Essex: Pearson Education, 2017, pp.20–31.

14. Donohue. Geometry descriptors of irregular microphone arrays related to beamforming performance. EURASIP J Adv Signal Process. Epub ahead of print 27 November 2012. DOI: 10.1186/1687-6180-2012-249.

15. Donohue. Performance for randomly described arrays. In: Proceedings of the 2011 IEEE workshop on applications of signal processing to audio and acoustics, New Paltz, NY, 16–19 October 2011. Piscataway, NJ: IEEE.

16. Donohue, Saghaian Nejad Esfahani. Constant false alarm rate sound source detection with distributed microphones. EURASIP J Adv Signal Process. Epub ahead of print 15 February 2011. DOI: 10.1155/2011/656494.

17. Pielou. The measurement of diversity in different types of biological collections. J Theor Biol 1966; 13: 131–144.

18. Townsend. Enhancements to the generalized sidelobe canceler for audio beamforming in an immersive environment. MS thesis, University of Kentucky, Lexington, KY, 2009, pp.5–10.

19. Donohue. Optimal irregular microphone distributions with enhanced beamforming performance in immersive environments. J Acoust Soc Am 2013; 134: 2066–2077.

20. Akbari, Ziarati. A multilevel evolutionary algorithm for optimizing numerical functions. Int J Ind Eng Comp 2010; 2: 419–430.