Sound field reconstruction in reverberant environments with rigid spherical microphone arrays

Abstract

To achieve accurate sound field reconstruction in reverberant environments, a method based on the principle of equivalent sources and a mixed wave model is proposed for rigid spherical microphone arrays. The method assumes that the near-field sound sources are sparsely distributed and decomposes the sound field into direct sound components modelled by monopole sources and reflected sound components represented by plane waves, thereby constructing a mixed wave model. The alternating direction method of multipliers is then used to alternately estimate the weights of the monopole sources and plane waves, which decouples the two components of the sound field. Finally, the target sound field in the region of interest is reconstructed using the obtained weights. Both numerical simulations and a finite element method-based virtual validation experiment demonstrate that the proposed method generally expands the effective reconstruction region to twice that of Ambisonics, while improving reconstruction accuracy by 32% and 12.5% within a region spanning two to three times the array radius, compared with the plane wave decomposition method.

Keywords

sound field reconstruction; reverberant environments; mixed wave model; alternating direction method of multipliers

1. Introduction

Sound field reconstruction^1,2 estimates the acoustic information at arbitrary positions by post-processing the signals captured by microphone arrays, and this technique has been widely applied in areas such as personal sound zones,^3,4 virtual reality,^5,6 and noise control.^7,8 The effective reconstruction region and application scenarios are closely related to the geometry of microphone arrays. Common array configurations include linear arrays,^9–11 planar arrays,^12–14 and spherical arrays.^15–17 Compared to linear and planar arrays, spherical arrays, especially rigid spherical arrays, are more compact, flexible, and capable of capturing more spatial acoustic information.

Ambisonics,^18–20 an early proposed sound field reconstruction method with spherical microphone arrays, represents sound fields using spherical harmonics and reconstructs the sound field at unmeasured locations by exploiting the estimated expansion coefficients. However, due to the limited number of microphones and discrete spatial sampling, Ambisonics performs poorly in reconstructing the sound field far from the array. To overcome this limitation, Samarasinghe et al.²¹ employed a distributed higher-order microphone array to expand the effective reconstruction region. The equivalent source method (ESM), another sound field reconstruction method, assumes that the target sound field is generated by a set of monopole sources or plane waves surrounding the region of interest, and estimates source strengths or plane wave weights via sparse optimization to achieve sound field reconstruction. Fernandez-Grande²² used the equivalent sources emitting spherical waves to model the sound field recorded by rigid spherical microphone arrays and adopted the regularization technique to estimate source strengths, achieving accurate three-dimensional sound field reconstruction. Verburg et al.²³ represented sound fields using a limited number of plane waves and reconstructed the room’s spatial frequency responses over the extended regions via compressive sensing. Compared to Ambisonics, ESM is not constrained by the spherical harmonic order truncation and thus provides a larger effective reconstruction region. However, the above ESM is based on the free-field assumption, which limits its applicability to reverberant environments, where sound reflections violate this assumption and degrade reconstruction accuracy.

To achieve accurate sound field reconstruction in reverberant environments, Koyama et al.^24,25 established a mixed wave model with monopole sources and plane waves. Damiano et al.²⁶ modeled early reflections using the image source method, represented the late-reverberant components with a set of plane waves, and simulated the directivity using multipole sources. These mixed wave models are constructed based on linear and planar microphone arrays for one-dimensional and two-dimensional sound field reconstruction. In this paper, we extend the mixed wave modeling framework to three-dimensional space. The signals captured by rigid spherical microphone arrays are decoupled into direct and reflected components using monopole sources and plane waves. Then, sparse constraints are imposed, and an alternating direction method of multipliers (ADMM) is used to estimate the weights of monopole sources and plane waves, enabling accurate three-dimensional sound field reconstruction in reverberant environments.

The remainder of this paper is organized as follows. Sections 2 and 3 introduce the mixed wave model with rigid spherical microphone arrays and the ADMM solver. Sections 4 and 5 present numerical simulations and a finite element method-based virtual validation experiment, respectively. Section 6 draws the conclusions and outlines the perspectives for future research.

2. The mixed wave model with rigid spherical microphone arrays

As shown in Figure 1, a near-field source is located in a bounded space $Z$ . The sound field induced by the source in this space satisfies the inhomogeneous Helmholtz equation

(\nabla^{2} + k^{2}) p (r, ω) = - Q (r, ω),

(1)

where $p (r, ω)$ represents the sound pressure at the position $r = (r, θ, ϕ)$, $r$ is the distance from the origin, $θ \in [0, π]$ is the elevation angle, $ϕ \in [0, 2 π)$ is the azimuth angle, $ω$ is the circular frequency, $k$ is the wave number, $Q (r, ω)$ is the source distribution in the space $Z$, and $\nabla^{2} f = \partial^{2} f / \partial x^{2} + \partial^{2} f / \partial y^{2} + \partial^{2} f / \partial z^{2}$ is the Laplace operator. Therefore, $p (r, ω)$ can be expressed as the sum of the particular solution $p_{p} (r, ω)$ and the homogeneous solution $p_{h} (r, ω)$,

p (r, ω) = p_{p} (r, ω) + p_{h} (r, ω) .

(2)

Figure 1.

The mixed wave model.

In Eq. (2), $p_{p} (r, ω)$ denotes the direct sound components, which can be modelled by monopole sources

p_{p} (r, ω) = \sum_{i = 1}^{I} G (r | r_{S i}) x (r_{S i}),

(3)

where $I$ is the total number of monopole sources, $G (r | r_{S i})$ is the transfer function from the $i$th monopole source at $r_{S i} = (r_{S i}, θ_{S i}, ϕ_{S i})$ to the position $r = (r, θ, ϕ)$, and $x (r_{S i})$ is the strength (weight) of the $i$th monopole source. Due to the rigid boundary of the spherical microphone array, acoustic scattering occurs on its surface.²⁷ Therefore, $G (r | r_{S i})$ is modeled using the Neumann Green’s function²⁸ to account for the scattering effect,

G (r | r_{S i}) = \sum_{n = 0}^{\infty} \sum_{m = - n}^{n} - 4 π j h_{n}^{(2)} (k r_{S i}) [j_{n} (k r) - \frac{j_{n}^{'} (k a)}{{h_{n}^{(2)}}^{'} (k a)} h_{n}^{(2)} (k r)] Y_{n}^{m} (θ, ϕ) Y_{n}^{m *} (θ_{S i}, ϕ_{S i}),

(4)

where $h_{n}^{(2)} (\cdot)$ is the spherical Hankel function of the second kind, $j_{n} (\cdot)$ is the spherical Bessel function of the first kind, ${h_{n}^{(2)}}^{'} (\cdot)$ and $j_{n}^{'} (\cdot)$ are the derivatives of $h_{n}^{(2)} (\cdot)$ and $j_{n} (\cdot)$, respectively, ${(\cdot)}^{*}$ denotes the conjugate, $a$ is the radius of the spherical microphone array, $Y_{n}^{m} (θ, ϕ)$ is the spherical harmonic function in the direction of $(θ, ϕ)$, and $j = \sqrt{- 1}$ is the imaginary unit.

$p_{h} (r, ω)$ denotes the reflected sound components, which are represented by plane waves,

p_{h} (r, ω) = \sum_{l = 1}^{L} W (r | Ω_{S l}) u (Ω_{S l}),

(5)

where $L$ is the number of plane waves, $W (r | Ω_{S l})$ is the transfer function from the $l$th propagation direction $Ω_{S l}$ to the position $r = (r, θ, ϕ)$, and $u (Ω_{S l})$ is the weight of the $l$th plane wave. Similarly, taking scattering into consideration, $W (r | Ω_{S l})$ is expressed as

W (r | Ω_{S l}) = \sum_{n = 0}^{\infty} 4 π j^{n} [j_{n} (k r) - \frac{j_{n}^{'} (k a)}{{h_{n}^{(2)}}^{'} (k a)} h_{n}^{(2)} (k r)] \sum_{m = - n}^{n} Y_{n}^{m} (θ, ϕ) Y_{n}^{m *} (θ_{S l}, ϕ_{S l}) .

(6)
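As an implementation aid, the inner sums over $m$ in Eqs. (4) and (6) can be collapsed with the spherical harmonic addition theorem. The lines below are a sketch that assumes orthonormal spherical harmonics. The angles $Θ_{i}$ and $Θ_{l}$ (between the evaluation direction $(θ, ϕ)$ and the $i$th source direction or $l$th plane wave direction) and the Legendre polynomial $P_{n}$ are symbols introduced here for illustration and are not part of the original formulation.

```latex
\sum_{m = - n}^{n} Y_{n}^{m} (θ, ϕ) Y_{n}^{m *} (θ', ϕ') = \frac{2 n + 1}{4 π} P_{n} (\cos Θ),
\qquad \cos Θ = \cos θ \cos θ' + \sin θ \sin θ' \cos (ϕ - ϕ')

G (r | r_{S i}) = \sum_{n = 0}^{\infty} - j (2 n + 1) h_{n}^{(2)} (k r_{S i})
  \left[ j_{n} (k r) - \frac{j_{n}^{'} (k a)}{{h_{n}^{(2)}}^{'} (k a)} h_{n}^{(2)} (k r) \right] P_{n} (\cos Θ_{i})

W (r | Ω_{S l}) = \sum_{n = 0}^{\infty} j^{n} (2 n + 1)
  \left[ j_{n} (k r) - \frac{j_{n}^{'} (k a)}{{h_{n}^{(2)}}^{'} (k a)} h_{n}^{(2)} (k r) \right] P_{n} (\cos Θ_{l})
```

In this form each matrix entry needs only one Legendre polynomial per order $n$ instead of $2 n + 1$ spherical harmonic products.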

Based on Eq. (2), Eq. (3), and Eq. (5), we construct a mixed wave model to express the pressures measured by rigid spherical microphone arrays,

p = Gx + Wu,

(7)

where $p = {[p (r_{M 1}, ω), p (r_{M 2}, ω), \dots, p (r_{M Q}, ω)]}^{T} \in ℂ^{Q \times 1}$ is the vector of microphone pressures, $r_{M q} = (a, θ_{M q}, ϕ_{M q})$ is the position of the $q$th microphone, $Q$ is the number of microphones, the superscript "T" denotes the transpose, $ℂ$ denotes the set of complex numbers, $x = {[x (r_{S 1}), x (r_{S 2}), \dots, x (r_{S I})]}^{T} \in ℂ^{I \times 1}$ and $u = {[u (Ω_{S 1}), u (Ω_{S 2}), \dots, u (Ω_{S L})]}^{T} \in ℂ^{L \times 1}$ are the vectors of the weights of monopole sources and plane waves, respectively, $G = [G_{1}, G_{2}, \dots, G_{I}] \in ℂ^{Q \times I}$ with $G_{i} = {[G (r_{M 1} | r_{S i}), G (r_{M 2} | r_{S i}), \dots, G (r_{M Q} | r_{S i})]}^{T} \in ℂ^{Q \times 1}$, and $W = [W_{1}, W_{2}, \dots, W_{L}] \in ℂ^{Q \times L}$ with $W_{l} = {[W (r_{M 1} | Ω_{S l}), W (r_{M 2} | Ω_{S l}), \dots, W (r_{M Q} | Ω_{S l})]}^{T} \in ℂ^{Q \times 1}$.
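The matrices $G$ and $W$ of Eq. (7) could be assembled numerically along the following lines. This is a minimal sketch rather than the authors' implementation: it assumes NumPy and SciPy are available, truncates the infinite sums at a hypothetical `order=30`, and replaces the sum over $m$ by the addition theorem $\sum_{m} Y_{n}^{m} Y_{n}^{m *} = (2 n + 1) P_{n} (\cos Θ) / (4 π)$, which holds for orthonormal spherical harmonics. All function names are illustrative.

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn, eval_legendre


def sph_hankel2(n, z, derivative=False):
    """Spherical Hankel function of the second kind, h_n^(2) = j_n - 1j * y_n."""
    return (spherical_jn(n, z, derivative=derivative)
            - 1j * spherical_yn(n, z, derivative=derivative))


def rigid_radial(n, k, r, a):
    """Bracketed radial term of Eqs. (4) and (6) for a rigid sphere of radius a."""
    ratio = (spherical_jn(n, k * a, derivative=True)
             / sph_hankel2(n, k * a, derivative=True))
    return spherical_jn(n, k * r) - ratio * sph_hankel2(n, k * r)


def unit_vectors(theta, phi):
    """Cartesian unit vectors for elevation theta in [0, pi] and azimuth phi."""
    return np.stack([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)], axis=-1)


def monopole_matrix(k, a, eval_sph, src_sph, order=30):
    """G[q, i] of Eq. (4); rows of eval_sph and src_sph are (r, theta, phi)."""
    r, rs = eval_sph[:, 0], src_sph[:, 0]
    cos_g = np.clip(unit_vectors(eval_sph[:, 1], eval_sph[:, 2])
                    @ unit_vectors(src_sph[:, 1], src_sph[:, 2]).T, -1.0, 1.0)
    G = np.zeros((len(r), len(rs)), dtype=complex)
    for n in range(order + 1):  # truncation order is an assumption
        G += (-1j * (2 * n + 1) * rigid_radial(n, k, r, a)[:, None]
              * sph_hankel2(n, k * rs)[None, :] * eval_legendre(n, cos_g))
    return G


def plane_wave_matrix(k, a, eval_sph, dirs, order=30):
    """W[q, l] of Eq. (6); rows of dirs are (theta, phi)."""
    r = eval_sph[:, 0]
    cos_g = np.clip(unit_vectors(eval_sph[:, 1], eval_sph[:, 2])
                    @ unit_vectors(dirs[:, 0], dirs[:, 1]).T, -1.0, 1.0)
    W = np.zeros((len(r), len(dirs)), dtype=complex)
    for n in range(order + 1):
        W += ((1j ** n) * (2 * n + 1) * rigid_radial(n, k, r, a)[:, None]
              * eval_legendre(n, cos_g))
    return W
```

Passing the microphone positions $r_{M q}$ as `eval_sph` gives the measurement matrices, while passing other positions with $r \geq a$ gives the corresponding matrices at those points. The truncation order should be chosen large enough for the largest value of $k r$ involved.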

3. ADMM solver

The source distribution is usually spatially sparse, so $x$ will have few non-zero elements. According to Vekua’s theory,²⁹ the reflected sound component in a bounded region can be well approximated by a limited number of plane waves. When $L$ in Eq. (5) is very large, most elements of $u$ can be approximated as 0. Therefore, to estimate $x$ and $u$, we construct the optimization model,

\begin{array}{l} \min λ_{1} {‖ x ‖}_{0} + λ_{2} {‖ u ‖}_{0} \\ s . t . p = Gx + Wu \end{array},

(8)

where ${‖ \cdot ‖}_{0}$ denotes the $l_{0}$ norm, and $λ_{1} > 0$ and $λ_{2} > 0$ are the regularization parameters. The $l_{0}$ norm minimization problem is non-convex and therefore challenging to solve, so Eq. (8) is relaxed into an $l_{1}$ norm minimization problem,

\begin{array}{l} \min λ_{1} {‖ x ‖}_{1} + λ_{2} {‖ u ‖}_{1} \\ s . t . p = Gx + Wu \end{array} .

(9)

ADMM³⁰ is adopted to solve Eq. (9) due to its computational efficiency and flexibility, where each variable is updated alternately while keeping the other variables fixed. The steps are as follows. We introduce two additional variables, $x_{1}$ and $u_{1}$ , and convert Eq. (9) into

\begin{array}{l} \min λ_{1} {‖ x_{1} ‖}_{1} + λ_{2} {‖ u_{1} ‖}_{1} + \frac{1}{2} {‖ p - Gx - Wu ‖}_{2}^{2} \\ s . t . x = x_{1}, u = u_{1} \end{array} .

(10)

Eq. (10) can be further transformed into an unconstrained optimization problem by forming the augmented Lagrangian,

\begin{array}{l} L (x, x_{1}, v_{x}, u, u_{1}, v_{u}) \\ = \frac{1}{2} {‖ p - Gx - Wu ‖}_{2}^{2} + λ_{1} {‖ x_{1} ‖}_{1} + \mathrm{Re} \{ v_{x}^{H} (x - x_{1}) \} + \frac{ρ_{1}}{2} {‖ x - x_{1} ‖}_{2}^{2} \\ + λ_{2} {‖ u_{1} ‖}_{1} + \mathrm{Re} \{ v_{u}^{H} (u - u_{1}) \} + \frac{ρ_{2}}{2} {‖ u - u_{1} ‖}_{2}^{2} \end{array},

(11)

where $v_{x}$ and $v_{u}$ are the Lagrangian multipliers, $\mathrm{Re} \{ \cdot \}$ represents taking the real part, the superscript "H" denotes the Hermitian conjugate, and $ρ_{1} > 0$ and $ρ_{2} > 0$ are penalty parameters. Let $μ_{x} = v_{x} / ρ_{1}$ and $μ_{u} = v_{u} / ρ_{2}$. Eq. (11) becomes

\begin{array}{l} L (x, x_{1}, μ_{x}, u, u_{1}, μ_{u}) \\ = \frac{1}{2} {‖ p - Gx - Wu ‖}_{2}^{2} + λ_{1} {‖ x_{1} ‖}_{1} + \frac{ρ_{1}}{2} ({‖ x - x_{1} + μ_{x} ‖}_{2}^{2} - {‖ μ_{x} ‖}_{2}^{2}) \\ + λ_{2} {‖ u_{1} ‖}_{1} + \frac{ρ_{2}}{2} ({‖ u - u_{1} + μ_{u} ‖}_{2}^{2} - {‖ μ_{u} ‖}_{2}^{2}) \end{array} .

(12)
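The step from Eq. (11) to Eq. (12) is a completing-the-square identity. A short check for the $x$ block, using $v_{x} = ρ_{1} μ_{x}$ and the expansion ${‖ a + b ‖}_{2}^{2} = {‖ a ‖}_{2}^{2} + 2 \mathrm{Re} \{ b^{H} a \} + {‖ b ‖}_{2}^{2}$, is sketched below; the $u$ block follows in the same way with $ρ_{2}$ and $μ_{u}$.

```latex
\begin{aligned}
\frac{ρ_{1}}{2} \left( {‖ x - x_{1} + μ_{x} ‖}_{2}^{2} - {‖ μ_{x} ‖}_{2}^{2} \right)
&= \frac{ρ_{1}}{2} {‖ x - x_{1} ‖}_{2}^{2} + ρ_{1} \mathrm{Re} \{ μ_{x}^{H} (x - x_{1}) \} \\
&= \frac{ρ_{1}}{2} {‖ x - x_{1} ‖}_{2}^{2} + \mathrm{Re} \{ v_{x}^{H} (x - x_{1}) \}
\end{aligned}
```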

Eq. (12) can be solved iteratively. In the $γ$th iteration, by taking the partial derivatives of $L (x, x_{1}, μ_{x}, u, u_{1}, μ_{u})$ with respect to each variable and setting these derivatives to 0, we can obtain

\begin{cases} x^{(γ + 1)} = {(G^{H} G + ρ_{1} I)}^{- 1} (G^{H} p - G^{H} W u^{(γ)} + ρ_{1} x_{1}^{(γ)} - ρ_{1} μ_{x}^{(γ)}) \\ x_{1}^{(γ + 1)} = T_{λ_{1} / ρ_{1}} (x^{(γ + 1)} + μ_{x}^{(γ)}) \\ μ_{x}^{(γ + 1)} = μ_{x}^{(γ)} + x^{(γ + 1)} - x_{1}^{(γ + 1)} \\ u^{(γ + 1)} = {(W^{H} W + ρ_{2} I)}^{- 1} (W^{H} p - W^{H} G x^{(γ)} + ρ_{2} u_{1}^{(γ)} - ρ_{2} μ_{u}^{(γ)}) \\ u_{1}^{(γ + 1)} = T_{λ_{2} / ρ_{2}} (u^{(γ + 1)} + μ_{u}^{(γ)}) \\ μ_{u}^{(γ + 1)} = μ_{u}^{(γ)} + u^{(γ + 1)} - u_{1}^{(γ + 1)} \end{cases},

(13)

where $T_{α} (\cdot) = \mathrm{sign} (\cdot) \max \{ | \cdot | - α, 0 \}$ is the soft-thresholding operator. Each variable is initialized as a zero vector. The iterations stop when ${‖ x^{(γ + 1)} - x^{(γ)} ‖}_{2} / {‖ x^{(γ)} ‖}_{2} < 10^{- 3}$ and ${‖ p^{(γ + 1)} - p^{(γ)} ‖}_{2} / {‖ p^{(γ)} ‖}_{2} < 10^{- 3}$ are satisfied, or when the maximum number of iterations $γ^{(\max)} = 1000$ is reached.
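The updates of Eq. (13) and the stopping rule can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the soft-thresholding shrinks complex magnitudes while keeping the phase, the $x$ update uses the auxiliary variable $x_{1}^{(γ)}$ in the same way as the $u$ update uses $u_{1}^{(γ)}$, and $p^{(γ)}$ in the stopping rule is taken to be the fitted pressure $G x^{(γ)} + W u^{(γ)}$.

```python
import numpy as np


def soft_threshold(z, alpha):
    """Soft-thresholding T_alpha: shrink |z| by alpha and keep the phase."""
    mag = np.abs(z)
    scale = np.maximum(mag - alpha, 0.0) / np.maximum(mag, 1e-300)
    return scale * z


def admm_mixed_wave(p, G, W, lam1=0.1, lam2=0.1, rho1=1.0, rho2=1.0,
                    tol=1e-3, max_iter=1000):
    """Estimate monopole weights x and plane wave weights u from p = Gx + Wu."""
    n_src, n_pw = G.shape[1], W.shape[1]
    # All variables start from zero vectors.
    x = np.zeros(n_src, dtype=complex)
    x1, mu_x = x.copy(), x.copy()
    u = np.zeros(n_pw, dtype=complex)
    u1, mu_u = u.copy(), u.copy()
    GH, WH = G.conj().T, W.conj().T
    A_x = GH @ G + rho1 * np.eye(n_src)
    A_u = WH @ W + rho2 * np.eye(n_pw)
    p_fit = G @ x + W @ u
    eps = 1e-12  # guards the relative-change tests against division by zero
    for _ in range(max_iter):
        # Least-squares updates; both use the other block from iteration gamma.
        x_new = np.linalg.solve(A_x, GH @ p - GH @ (W @ u) + rho1 * (x1 - mu_x))
        u_new = np.linalg.solve(A_u, WH @ p - WH @ (G @ x) + rho2 * (u1 - mu_u))
        # Sparsity-promoting updates and scaled multiplier updates.
        x1 = soft_threshold(x_new + mu_x, lam1 / rho1)
        mu_x = mu_x + x_new - x1
        u1 = soft_threshold(u_new + mu_u, lam2 / rho2)
        mu_u = mu_u + u_new - u1
        # Relative changes of x and of the fitted pressure (assumed p^(gamma)).
        p_new = G @ x_new + W @ u_new
        dx = np.linalg.norm(x_new - x) / max(np.linalg.norm(x), eps)
        dp = np.linalg.norm(p_new - p_fit) / max(np.linalg.norm(p_fit), eps)
        x, u, p_fit = x_new, u_new, p_new
        if dx < tol and dp < tol:
            break
    return x1, u1  # thresholded (sparse) estimates
```

Returning the thresholded variables is a design choice made here so that the estimates are exactly sparse; returning `x` and `u` instead would be equally consistent with Eq. (13).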

The estimated strengths $\hat{x}$ and weights $\hat{u}$ are used to reconstruct the sound field,

p^{\hat{R} e} = G^{\hat{R} e} \hat{x} + W^{\hat{R} e} \hat{u},

(14)

where $p^{\hat{R} e} \in ℂ^{V \times 1}$ is the vector of reconstructed sound pressures at $V$ positions $r_{Re v} = (r_{Re v}, θ_{Re v}, ϕ_{Re v})$ ($v = 1, 2, \dots, V$), and $G^{\hat{R} e} \in ℂ^{V \times I}$ and $W^{\hat{R} e} \in ℂ^{V \times L}$ can be obtained by replacing $r_{M q}$ in $G$ and $W$ with $r_{Re v}$.

4. Numerical simulations

In this section, the performance of the proposed method is evaluated and compared with the Ambisonics method¹⁸ and the plane wave decomposition method³¹ through numerical simulations. A Brüel & Kjær Type 8608 rigid spherical microphone array with 36 microphones and a radius of 0.0975 m is used. Figure 2 shows the microphone array and the mixed wave model. The center of the array is placed at the origin of the coordinate system, and a sound source is located on the plane $x = - 1$ m, which is parallel to the y–z plane. A spherical equivalent source surface is set for plane waves, which is 1 m from the center of the array, with angular intervals of $Δ ϕ = 10 °$ and $Δ θ = 10 °$. A rectangular equivalent source surface is used for spherical waves, which is at $x = - 1$ m, with boundaries defined by $y_{\min} = - 0.5$ m, $y_{\max} = 0.5$ m, $z_{\min} = - 0.5$ m and $z_{\max} = 0.5$ m. The grid spacing is set to 0.05 m. The penalty parameters are set to $ρ_{1} = ρ_{2} = 1$,³² and the regularization parameters to $λ_{1} = λ_{2} = 0.1$. To clearly demonstrate and compare performance, the reconstruction region $Z^{\hat{R} e}$ is set as a 1 m × 1 m square in the x–y plane, centered at the origin. The theoretical sound pressures are given by the image source method.³³ To evaluate the performance of each method, the reconstruction error is defined as

Error = \frac{{‖ p^{The} - p^{Re} ‖}_{2}}{{‖ p^{The} ‖}_{2}} \times 100 \%,

(15)

where $p^{The}$ is the vector of theoretical pressures at the reconstruction positions, and $p^{Re}$ is the vector of the reconstructed pressures.
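The error metric of Eq. (15) is a relative $l_{2}$ error expressed as a percentage. A minimal NumPy sketch follows; the function name `reconstruction_error` is hypothetical, and the inputs are assumed to be complex pressure vectors sampled at the same reconstruction positions.

```python
import numpy as np


def reconstruction_error(p_the, p_re):
    """Relative l2 reconstruction error of Eq. (15), in percent."""
    p_the = np.asarray(p_the)
    p_re = np.asarray(p_re)
    return 100.0 * np.linalg.norm(p_the - p_re) / np.linalg.norm(p_the)
```

For example, with theoretical pressures `[3, 4]` and reconstructed pressures `[3, 0]`, the numerator is 4 and the denominator is 5, so the error is 80%.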

Figure 2.

Spherical microphone array and the equivalent source model.
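The two equivalent source grids described above could be generated as in the following sketch. It is an illustration only: the function names are hypothetical, and the handling of the poles of the plane wave grid (elevations 0° and 180° are kept for every azimuth, so those directions repeat) is an assumption, since the text specifies only the angular intervals.

```python
import numpy as np


def monopole_grid(x=-1.0, half_width=0.5, spacing=0.05):
    """Cartesian monopole positions on the plane at x, as rows (x, y, z) in m."""
    n = int(round(2 * half_width / spacing)) + 1
    y = np.linspace(-half_width, half_width, n)
    z = np.linspace(-half_width, half_width, n)
    yy, zz = np.meshgrid(y, z, indexing="ij")
    return np.column_stack([np.full(yy.size, x), yy.ravel(), zz.ravel()])


def plane_wave_directions(step_deg=10.0):
    """Plane wave directions as rows (theta, phi) in radians."""
    theta = np.deg2rad(np.arange(0.0, 180.0 + step_deg / 2, step_deg))
    phi = np.deg2rad(np.arange(0.0, 360.0 - step_deg / 2, step_deg))
    tt, pp = np.meshgrid(theta, phi, indexing="ij")
    return np.column_stack([tt.ravel(), pp.ravel()])
```

With the stated spacing of 0.05 m, the rectangular surface holds 21 × 21 = 441 candidate monopoles. Under the pole-handling assumption above, the 10° intervals give 19 × 36 = 684 candidate plane wave directions, so both dictionaries are much larger than the 36 microphone signals, which is why sparse constraints are needed.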

4.1. Sound field reconstruction in a semi-enclosed rectangular space

To simulate a semi-enclosed rectangular space, an infinitely long rectangular duct is modeled. Free boundary conditions are applied along the x-axis, indicating the space is infinitely extended in both the positive and negative x-directions. Along the y- and z-axes, rigid boundary conditions are set at y=0.75 m, y=-0.75 m, z=1.2 m and z=-1.2 m, with an absorption coefficient of 0.3. According to the Eyring formula,³⁴ T₆₀=190 ms. A source is placed at (-1 m, 0, 0), with a strength of 94 dB (referenced to 20 μPa).

The proposed method, the Ambisonics method, and the plane wave decomposition method are used to reconstruct the sound field within the region $Z^{\hat{R} e}$. Figure 3 shows the theoretical values of the real parts of the sound pressures at 1000 Hz in the reconstruction region, obtained using the 10th-order image source method. Figure 4 presents the reconstruction results and the corresponding reconstruction error. The black dotted circle in the figures represents the outline of the rigid spherical microphone array. As shown in Figure 4(a1) and (a2), the reconstruction result of Ambisonics is close to the theoretical sound field within the array radius, with reconstruction error below 10%. However, large distortions occur far away from the array due to the order truncation of spherical harmonics. The result of the plane wave decomposition method, shown in Figure 4(b1), exhibits distinct plane wavefronts. Comparing Figure 4(a2) and (b2), the area where the reconstruction error is below 10% (i.e., the bright region) is larger in the map of the plane wave decomposition method. For the proposed method, as shown in Figure 4(c1) and (c2), the reconstructed real parts of the sound pressures agree best with the theoretical values. Correspondingly, the area where the reconstruction error is less than 10% is significantly larger than that of the other two methods. This demonstrates that the proposed method achieves more accurate reconstruction in a larger region, which is a benefit of the mixed wave model.

Figure 3.

The theoretical values of the real parts of the sound pressures at 1000 Hz in a semi-enclosed rectangular space.

Figure 4.

Reconstruction results and error in the semi-enclosed rectangular space.

As for computational cost, the proposed method requires a longer running time for sound field reconstruction than Ambisonics and the plane wave decomposition method. Specifically, for the above case, the running times of the proposed method, Ambisonics, and the plane wave decomposition method are 4.29 s, 1.09 s, and 2.07 s, respectively, on a 3.6 GHz AMD Ryzen 7 7745HX CPU.

4.2. Sound field reconstruction in a closed rectangular space

The proposed method is further applied to reconstruct the sound field within a closed space, and compared with the Ambisonics method and the plane wave decomposition method. To simulate a closed space, rigid planar boundaries are applied at x=1.5 m, x=-1.5 m, y=0.75 m, y=-0.75 m, z=1.2 m and z=-1.2 m with an absorption coefficient of 0.3. The sound source position and strength remain the same as those used in Section 4.1. Figure 5 shows the closed space and the theoretical values of the real parts of the sound pressures at 1000 Hz, which are computed through the 10th-order image source method.

Figure 5.

The closed space and the theoretical values of the real parts of the sound pressures at 1000 Hz.

Figure 6 presents the sound field reconstruction results and reconstruction error at 1000 Hz in the closed rectangular space. Comparing the results and the error of the three methods leads to conclusions similar to those of Section 4.1. Among the three methods, the reconstructed field of the proposed method is the most consistent with the theoretical field. Correspondingly, the area with reconstruction error less than 10% (i.e., the bright region) is the largest for the proposed method, followed by the plane wave decomposition method. The Ambisonics method maintains reconstruction error below 10% only within the region close to the array. These results demonstrate that the proposed method outperforms the plane wave decomposition method and the Ambisonics method.

Figure 6.

Reconstruction results and error in the closed rectangular space.

Monte Carlo trials are conducted to analyze the reconstruction performance under different frequencies and absorption coefficients. The frequency ranges from 500 Hz to 2000 Hz in a step of 500 Hz. The absorption coefficient $α$ varies from 0.1 to 0.7 in a step of 0.2, corresponding to reverberation time T₆₀ of 568 ms, 159 ms, 82 ms, and 47 ms. For each frequency and absorption coefficient, 50 trials are considered. In each trial, a monopole source in the mixed wave model is randomly selected as the sound source, with a source strength of 94 dB (referenced to 20 μPa).
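The link between absorption coefficient and reverberation time can be illustrated with the Eyring formula. The sketch below is hedged: it assumes the closed room of this section measures 3 m × 1.5 m × 2.4 m (from the boundary positions given above), that all six walls share the same absorption coefficient, and that the constant is 0.161 s/m. The function name `eyring_t60` and the constant are assumptions, not values taken from the paper.

```python
import math


def eyring_t60(lx, ly, lz, alpha, coeff=0.161):
    """Eyring reverberation time in seconds for a uniformly absorbing box room."""
    volume = lx * ly * lz
    surface = 2.0 * (lx * ly + lx * lz + ly * lz)
    return coeff * volume / (-surface * math.log(1.0 - alpha))


# Reverberation times in ms for the absorption coefficients used in the trials.
t60_ms = {alpha: 1000.0 * eyring_t60(3.0, 1.5, 2.4, alpha)
          for alpha in (0.1, 0.3, 0.5, 0.7)}
```

Under these assumptions the sketch gives values close to the quoted reverberation times for absorption coefficients of 0.3, 0.5 and 0.7. The result for 0.1 is more sensitive to the formula and constant assumed, so it may not match the quoted figure exactly.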

Figure 7 shows the average reconstruction error of the three methods within the regions 0∼a, a∼2a and 2a∼3a. In the region 0∼a, the average reconstruction error of Ambisonics, the plane wave decomposition method, and the proposed method is below 10% across all cases. In other words, the area corresponding to error below 10% in the error map accounts for 100%. This indicates that all three methods can achieve accurate reconstruction within the region 0∼a. However, the maximum error of the three methods is 9.4%, 1.5% and 1%, respectively, indicating that the proposed method has higher reconstruction accuracy than the other two methods. In the region a∼2a, the average reconstruction error of Ambisonics exceeds 10% under all conditions. By comparison, the plane wave decomposition method and the proposed method achieve reconstruction error below 10% over 43% and 75% of the error map, respectively. In the region 2a∼3a, the three methods show larger reconstruction error, and the proportion of the area where the error is below 10% decreases to approximately 6.3% for the plane wave decomposition method and 18.8% for the proposed method. In addition, the average reconstruction error of the proposed method is clearly lower than that of the plane wave decomposition method in all cases. In summary, the proposed method achieves higher reconstruction accuracy over a larger spatial region compared with both Ambisonics and the plane wave decomposition method. Specifically, the proposed method generally expands the effective reconstruction region to twice that of Ambisonics, and even up to three times under low-frequency and weakly reverberant conditions. Meanwhile, compared with the plane wave decomposition method, it improves the reconstruction accuracy (measured by the proportion of the area with error below 10%) by 32 and 12.5 percentage points in the regions a∼2a and 2a∼3a, respectively.

Figure 7.

The average reconstruction error within regions marked by radius $0 \sim a$, $a \sim 2 a$ and $2 a \sim 3 a$.

The proposed method outperforms both Ambisonics and the plane wave decomposition method primarily because these conventional methods rely on spherical harmonics expansion, whose effective reconstruction region is inherently constrained by the microphone array.^35,36 Specifically, due to the finite number of microphones and discrete spatial sampling, the spherical harmonics expansion must be truncated, which limits the spatial region over which Ambisonics and plane wave decomposition can achieve accurate reconstruction. In contrast, the proposed method formulates the mixed wave model directly in the spatial domain rather than in the spherical harmonics domain. As a result, it avoids the truncation-related limitations imposed by the microphone array, enabling more accurate sound field reconstruction over an extended spatial region.

5. Finite element method-based virtual validation experiment

In this section, the performance of the proposed method is evaluated based on finite element analyses.³⁷ A sound field model, as shown in Figure 8(a), is built in COMSOL. The gray cuboid represents the computational area, with dimensions of 4 m × 1.5 m × 2.4 m. The sphere inside the cuboid represents the rigid spherical microphone array (Brüel & Kjær Type 8608, 36-channel rigid array), and the symbol “★” denotes the sound source. The sound source position, boundary conditions, and reconstruction region are consistent with those in Section 4.2. Figure 8(b) shows the theoretical values of the real parts of the sound pressures at 1000 Hz, provided by COMSOL.

Figure 8.

Sound field model and the theoretical value of the real part of the sound pressure field at 1000 Hz provided by COMSOL.

Figure 9 presents the reconstruction results and reconstruction error of the Ambisonics method, the plane wave decomposition method, and the proposed method. It is clear that the proposed method continues to demonstrate the best reconstruction performance. It provides a larger region where reconstruction error is below 10%, effectively enabling accurate sound field reconstruction in reverberant environments. These findings are consistent with the conclusions in Section 4.2.

Figure 9.

Reconstruction results and error in the closed rectangular space based on finite element analyses.

6. Conclusions and perspectives

This paper proposes a method with rigid spherical microphone arrays for sound field reconstruction in reverberant environments. This method establishes a mixed wave model that represents the direct and reflected sound fields using monopole sources and plane waves, respectively, and achieves sound field reconstruction via sparse constraints and ADMM. The proposed method generally doubles the effective reconstruction region of Ambisonics and improves reconstruction accuracy by up to 32% and 12.5% within a range of two to three times the array radius, compared with the plane wave decomposition method. Its superiority stems from operating in the spatial domain and avoiding limitations associated with spherical harmonics expansion.

As the frequency increases and reflections become stronger, the proposed method experiences some performance degradation. Future work will focus on addressing this limitation to enhance the method’s applicability. In addition, real-world validation experiments are currently not possible due to the lack of suitable experimental conditions. We therefore plan to improve our experimental setup to provide validation under practical acoustic conditions in the future.

Footnotes

ORCID iDs

Yang Yang

Shijia Yin

Ruixue Ma

Tongrui Peng

Linbang Shen

Zhigang Chu

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the National Natural Science Foundation of China, grant number 12304519, the New Chongqing Youth Innovation Talent Project, grant number CSTB2024NSCQ-QCXMX0068, and the Science and Technology Research Program of Chongqing Municipal Education Commission, grant number KJZD-K202303202.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

1. Borra, Gebru, Markovic. Soundfield reconstruction in reverberant environments using higher-order microphones and impulse response measurements. In: 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019, pp. 281–285.

2. Liu, Qiu, et al. Reproduction of sound field inside an aircraft mock-up for evaluating ANC devices. Applied Acoustics 2022; 188: 108588. https://doi.org/10.1016/j.apacoust.2021.108588

3. Wen, Fan, et al. A multizone sound field reproduction method with constrained zone acoustic energy in the modal domain. Applied Acoustics 2024; 220: 109959. https://doi.org/10.1016/j.apacoust.2024.109959

4. Koyama, Kimura, Ueno. Weighted pressure and mode matching for sound field reproduction: theoretical and experimental comparisons. IEEE/ACM Transactions on Audio Speech and Language Processing 2023; 71(4): 173–185. https://doi.org/10.17743/jaes.2022.0058

5. Miotello, Comanducci, Pezzoli, et al. Reconstruction of sound field through diffusion models. In: 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Korea, 14–19 Apr. 2024, pp. 1476–1480.

6. Verburg, Elvander, van Waterschoot, et al. Optimal sensor placement for the spatial reconstruction of sound fields. EURASIP Journal on Audio, Speech and Music Processing 2024; 2024(1): 41. https://doi.org/10.1186/s13636-024-00364-4

7. Deng, Gao, Chen. Ultrawide attenuation bands in gradient metabeams with acoustic black hole pillars. Thin-Walled Structures 2023; 184: 110459. https://doi.org/10.1016/j.tws.2022.110459

8. Deng, Gao. Broadband vibroacoustic reduction for a circular beam coupled with a curved acoustic black hole via nullspace method. International Journal of Mechanical Sciences 2022; 233: 107641. https://doi.org/10.1016/j.ijmecsci.2022.107641

9. Wang, Fang, Duan, et al. Phased-array-based sub-Nyquist sampling for joint wideband spectrum sensing and direction-of-arrival estimation. IEEE Transactions on Signal Processing 2018; 66(23): 6110–6123. https://doi.org/10.1109/tsp.2018.2875420

10. Wang, Dou, Chen, et al. Effective block sparse representation algorithm for DOA estimation with unknown mutual coupling. IEEE Communications Letters 2017; 21(12): 2622–2625. https://doi.org/10.1109/lcomm.2017.2747547

11. Liu, et al. Experimental synthesis of random pressure fields based on transfer-matrix analysis on 1D arrays. Journal of Sound and Vibration 2025; 597: 118822. https://doi.org/10.1016/j.jsv.2024.118822

12. Meng, Ning, Liu, et al. Broadband two-dimensional off-grid DOA compressive beamforming based on block-sparse Bayesian learning. Applied Acoustics 2025; 230: 110421. https://doi.org/10.1016/j.apacoust.2024.110421

13. Chu, Weng, Yang. Determination of propagation model matrix in generalized cross correlation based inverse model for broadband acoustic source localization. Journal of the Acoustical Society of America 2020; 147(4): 2098–2109. https://doi.org/10.1121/10.0000973

14. Yang, Shu, et al. Data-driven high-resolution total focus imaging from array ultrasonic time-domain signals of reinforced concrete material. Construction and Building Materials 2025; 492: 143048. https://doi.org/10.1016/j.conbuildmat.2025.143048

15. Carneiro, Berry. Three-dimensional sound source diagnostic using a spherical microphone array from multiple capture positions. Mechanical Systems and Signal Processing 2023; 199: 110455. https://doi.org/10.1016/j.ymssp.2023.110455

16. Yang, Chu, Yin. Two-dimensional grid-free compressive beamforming with spherical microphone arrays. Mechanical Systems and Signal Processing 2022; 169: 108642. https://doi.org/10.1016/j.ymssp.2021.108642

17. Chu, Yang. Deconvolution for three-dimensional acoustic source identification based on spherical harmonics beamforming. Journal of Sound and Vibration 2015; 51(20): 45–53.

18. Rafaely. Fundamentals of Spherical Array Processing. Springer Topics in Signal Processing, 2015.

19. Xiao, Hui CTJ, et al. Speech intelligibility in noise with varying spatial acoustics under Ambisonics-based sound reproduction system. Applied Acoustics 2021; 174: 107707. https://doi.org/10.1016/j.apacoust.2020.107707

20. Ben-Hur, Alon, Mehra, et al. Binaural Reproduction Based on Bilateral Ambisonics and Ear-Aligned HRTFs. IEEE/ACM Transactions on Audio, Speech, and Language Processing 2021; 29: 901–913. https://doi.org/10.1109/taslp.2021.3055038

21. Samarasinghe, Abhayapala, Poletti. Wavefield analysis over large areas using distributed higher order microphones. IEEE/ACM Transactions on Audio, Speech, and Language Processing 2014; 22(3): 647–658. https://doi.org/10.1109/taslp.2014.2300341

22. Fernandez-Grande. Sound field reconstruction using a spherical microphone array. Journal of the Acoustical Society of America 2016; 139(3): 1168–1178. https://doi.org/10.1121/1.4943545

23. Verburg, Fernandez-Grande. Reconstruction of the sound field in a room using compressive sensing. Journal of the Acoustical Society of America 2018; 143(6): 3770–3779. https://doi.org/10.1121/1.5042247

24. Koyama, Saruwatari. Sound field decomposition in reverberant environment using sparse and low-rank signal models. In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, 20–25 March 2016, pp. 395–399.

25. Koyama, Daudet. Sparse representation of a spatial sound field in a reverberant environment. IEEE Journal of Selected Topics in Signal Processing 2019; 13(1): 172–184. https://doi.org/10.1109/jstsp.2019.2901127

26. Damiano, Borra, Bernardini, et al. A compressive sensing approach for the reconstruction of the soundfield produced by directive sources in reverberant rooms. IEEE/ACM Transactions on Audio, Speech, and Language Processing 2024; 32: 2667–2679. https://doi.org/10.1109/taslp.2024.3398999

27. Gai, Yang, Zhao. Electroacoustic wave scattering from cylindrical inhomogeneities in transversely isotropic piezoelectric composites. Materials & Design 2026; 261: 115296. https://doi.org/10.1016/j.matdes.2025.115296

28. Jarrett, Habets, Naylor. Theory and applications of spherical microphone array processing. Springer Topics in Signal Processing, 2017.

29. Moiola, Hiptmair, Perugia. Plane wave approximation of homogeneous Helmholtz solutions. Zeitschrift für Angewandte Mathematik und Physik 2011; 62: 809–837. https://doi.org/10.1007/s00033-011-0147-y

30. Boyd, Parikh, Chu, et al. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning 2010; 3(1): 1–122. https://doi.org/10.1561/2200000016

31. Duraiswami, Zotkin, et al. Plane-wave decomposition analysis for spherical microphone arrays. In: 2005 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, 16–19 Oct. 2005, pp. 150–153.

32. Yang, Chu, Yang, et al. Enhancement of direction-of-arrival estimation performance of spherical ESPRIT via atomic norm minimization. Journal of Sound and Vibration 2021; 491: 115758. https://doi.org/10.1016/j.jsv.2020.115758

33. Allen, Berkley. Image method for efficiently simulating small-room acoustics. Journal of the Acoustical Society of America 1979; 65(4): 943–950. https://doi.org/10.1121/1.382599

34. Eyring. Reverberation time in “Dead” rooms. Journal of the Acoustical Society of America 1930; 1(2): 217–241. https://doi.org/10.1121/1.1901884

35. Poletti. Three-dimensional surround sound systems based on spherical harmonics. Journal of the Audio Engineering Society 2005; 53(11): 1004–1025.

36. Kennedy, Sadeghi, Abhayapala, et al. Intrinsic limits of dimensionality and richness in random multipath fields. IEEE Transactions on Signal Processing 2007; 55(6): 2542–2556. https://doi.org/10.1109/tsp.2007.893738

37. Milić, Marinković, Ćojbašić. Geometrically nonlinear analysis of piezoelectric active laminated shells by means of isogeometric FE formulation. Facta Universitatis Series Mechanical Engineering 2025; 23(2): 387–405. https://doi.org/10.22190/fume050123059m