Capabilities of inverse scheme for acoustic source localization at low frequencies

We present the capabilities of a recently developed inverse scheme for source localization at low frequencies within an arbitrary acoustic environment. The inverse scheme is based on minimizing a Tikhonov functional matching measured microphone signals with simulated ones. We discuss the sensitivity of all involved parameters, the precision of geometry and physical boundary modeling for the numerical simulation using the finite element (FE) method, and the automatic determination of the positions of the recording microphones distributed around the object of investigation. Finally, we apply the inverse scheme to a real-world scenario and compare the obtained results to state-of-the-art signal processing approaches.


Introduction
The knowledge about position and distribution of sound sources is necessary for taking actions in noise reduction. To obtain this information, a microphone array method can be applied. Thereby, a common technique is acoustic beamforming, which can be used to determine the source position and distribution as well as the source strength. This technique is based on evaluating simultaneously recorded sound pressure data from microphone array measurements. The sound pressure received at the different microphone positions is mapped onto an image of the acoustic source field. The results are given in so-called source maps (beamform maps), which are used for visualization. Thereby, the localization of acoustic sources, especially in the low frequency range, is still a topic of ongoing research with strong practical applications, e.g., wind turbines, heat pumps, transformers. The fundamental processing method, frequency domain beamforming [1], is robust and fast. Herein, the source map is computed by

  b = w^H C w,

with the (weighted) steering vector w and the cross-spectral matrix C of the microphone signals. Here, the steering vector accounts for the phase shift and the amplitude correction as well as microphone weighting. In this simple method, the limitations regarding resolution and dynamic range are caused by the array response function (point spread function, PSF). A modification of this classic approach is delivered by functional beamforming (FuncBF) [2, 3], which leads to an improvement of resolution and dynamic range, while the computational cost remains almost the same as in the standard approach. Furthermore, deconvolution techniques (e.g., DAMAS [4], Clean-SC [5]) can be used, which attempt to eliminate the influence of the PSF on the raw source map. In addition, there are beamforming-independent inverse methods (e.g., L1 generalized inverse beamforming [6], cross-spectral matrix fitting [7]), which aim to solve an inverse problem considering the presence of all acoustic sources at once in the localization process. Resolution and dynamic range are greatly improved by these advanced methods at the cost of computational time. In the literature, many different comparisons of beamforming methods can be found. Exemplarily, in [8, 9] and [10], simulated data was used as input, and in [11, 12], data coming from experiments. A comprehensive overview of different acoustic imaging methods can be found in [8, 13, 14].
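As a toy illustration of this fundamental processing chain, conventional frequency-domain beamforming can be sketched in a few lines (a minimal sketch with free-field monopole steering vectors; the microphone and focus-grid coordinates are invented for the demonstration and are not related to the measurements in this paper):

```python
import numpy as np

C0 = 343.0  # speed of sound [m/s]

def beamform_map(C, mics, grid, f0):
    """Conventional frequency-domain beamforming b = w^H C w with
    free-field monopole steering vectors, normalized to unit norm."""
    k = 2 * np.pi * f0 / C0
    b = np.empty(len(grid))
    for i, x in enumerate(grid):
        r = np.linalg.norm(mics - x, axis=1)
        w = np.exp(-1j * k * r) / r      # phase shift + amplitude correction
        w /= np.linalg.norm(w)
        b[i] = np.real(w.conj() @ C @ w)
    return b

# toy setup: 6 microphones, one monopole among 9 focus points on a line
mics = np.array([[0.0, 0.0, 0.0], [1.0, 0.1, 0.0], [2.0, -0.1, 0.2],
                 [0.5, 1.0, 0.1], [1.5, 1.2, -0.1], [2.5, 0.9, 0.0]])
grid = np.array([[x, 0.5, 1.0] for x in np.linspace(-1.0, 1.0, 9)])
src = grid[4]                            # true source position
f0 = 1000.0
k = 2 * np.pi * f0 / C0
r_src = np.linalg.norm(mics - src, axis=1)
p = np.exp(-1j * k * r_src) / r_src      # monopole pressure at the mics
C = np.outer(p, p.conj())                # rank-1 cross-spectral matrix
bmap = beamform_map(C, mics, grid, f0)
```

Since the unit-norm steering vector at the true source point is parallel to the measured pressure vector, the map attains its maximum exactly there; the sidelobe structure around that peak is the PSF effect discussed above.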
Despite these advances in beamforming techniques, it has to be mentioned that major limitations are caused by the source model. Most beamforming algorithms model the acoustic sources as monopoles and/or dipoles. Moreover, the steering vector w, describing the transfer function between source and microphone, is usually modeled by Green's function for free radiation. Different formulations of the steering vector exist [15], all of them exhibiting a trade-off between correct reconstruction of the source strength and of the location. Further challenges are posed to these methods by source localization at low frequencies and in environments with partially or fully reflecting surfaces, for which beamforming techniques do not provide physically reasonable source maps. Furthermore, obstacles in the sound propagation path that distort the phase information cannot be considered. To take a reverberant environment into account, the steering vectors have to be adapted. In [16], two approaches to consider reflective surfaces are envisaged, namely modeling the reflections by a set of monopoles located at the image source positions, and an experimentally based identification of the transfer function.
Beamforming is mainly used for acoustic source localization rather than for obtaining quantitative source information. The qualitative statements given by the source map provide information about the distribution and position, and allow a relative comparison of the source strength within the considered map. In many cases, this information may already be sufficient to determine the origin of the sound emission. However, sometimes quantitative source information is also needed. The estimation of quantitative source spectra is not straightforward [17], but can be obtained through integration methods. To this end, the source map is integrated over a certain region to obtain a pressure spectrum for this specific area; thus, the source map is required before integration. A distinction must be made between integrating the raw and the deconvolved map. The deconvolved maps can be seen as ideal images of the source distribution and therefore, the integration may be done without further processing [18]. In [19], an overview of different integration methods is given.
To overcome the above mentioned limitations, an inverse scheme has been developed [20], which fully solves the Helmholtz equation using the finite element (FE) method with the correct boundary conditions. This approach is used in combination with microphone array measurements to localize and quantify low-frequency sound sources. The main advantage of this approach is that a detailed source distribution, both in amplitude and phase, is identified, with which a numerical simulation of the acoustic field can finally be performed. Here, our main focus is on the sensitivity analysis of the parameters involved in the inverse scheme and on its application to real-world sound source identification. Thereby, the challenges in the identification process of acoustic sources at low frequencies in an arbitrary environment are discussed, and the advantages and disadvantages of such an inverse scheme compared to advanced signal processing approaches (e.g., Clean-SC) are pointed out.
The rest of the paper is structured as follows: In Section 2, we summarize our developed inverse scheme and highlight its main features. In addition, a sensitivity analysis of the key algorithmic parameters is performed. In Section 3, the experimental environment, where the sound pressure measurements take place, is described and characterized. Furthermore, we discuss the challenges of the physical modeling and numerical simulation of the experimental environment. In addition, the algorithm and practical realization of an automatic determination of the positions of the recording microphones are presented. In Section 4, we evaluate the quality of the numerical FE model against measurements, when the experimental environment is excited by an omni-directional loudspeaker. Section 5 presents and discusses in detail all results of the inverse scheme for sound source localization at low frequencies and compares them to state-of-the-art methods. Finally, Section 6 concludes with the main findings and provides an outlook.

Inverse scheme
The inverse scheme is based on a sparsity promoting Tikhonov functional to match measured and simulated microphone signals. A main part of the scheme is the solution of the wave equation in the frequency domain (Helmholtz equation), which allows realistic geometry and boundary condition scenarios to be fully considered.

Physical and mathematical model
Assuming that the original geometry of the setup, including the boundary conditions, and the Fourier-transformed acoustic pressure signals p_m^ms(ω) (ω being the angular sound frequency, m = 1, ..., M) at the microphone positions x_m are given, the physical model is represented by the Helmholtz equation. Here, we consider the following generalized form of the Helmholtz equation in the computational domain

  ∇·( (1/ρ̃_eff) ∇p̂ ) + (ω²/K̃_eff) p̂ = −σ_in ,  (1)

with the sought interior acoustic sources σ_in; in air, the effective parameters reduce to the density ρ0 and the bulk modulus ρ0 c0², with c0 the speed of sound. In this way, poroelastic materials (e.g., porous absorbers) can be considered as a layer of an equivalent complex fluid having a frequency-dependent effective density ρ̃_eff and bulk modulus K̃_eff. With this formulation, the acoustic properties of materials can be adjusted to a certain absorption coefficient. For this purpose, a large number of characterization models for poroelastic materials has been established, see e.g. [21]. Furthermore, the sound sources σ_bd on the surface are modeled by an inhomogeneous Neumann boundary condition (2). Since the identification is done separately for each frequency ω, the dependence on ω is omitted in the notation. Now, the considered inverse problem is to reconstruct σ_in and/or σ_bd from pressure measurements at the microphone positions x_1, ..., x_M. For the acoustic sources, the following ansatz is made

  σ = Σ_{n=1}^{N} a_n e^{jφ_n} δ_{x_n} ,

with the sought amplitudes a_1, a_2, ..., a_N ∈ ℝ and phases φ_1, φ_2, ..., φ_N ∈ [−π/2, π/2]. Here, N denotes the number of possible sources and δ_{x_n} the delta function at position x_n.

Optimization based source identification
The fitting of the parameters by means of Tikhonov regularization amounts to solving the following constrained optimization problem

  min_{p̂ ∈ U, a ∈ ℝ^N, φ ∈ [−π/2, π/2]^N}  J(p̂, a, φ)   s.t. Eq. (1) is fulfilled,  (5)

where a = (a_1, ..., a_N), φ = (φ_1, ..., φ_N), and

  J(p̂, a, φ) = ½ Σ_{m=1}^{M} |p̂(x_m) − p_m^ms|² + α Σ_{n=1}^{N} |a_n|^q + β Σ_{n=1}^{N} |φ_n|^q + ϱ B(φ).  (6)

In here, the box constraints on the phases φ_n are realized by the logarithmic barrier term B(φ) with some penalty parameter ϱ > 0. This also helps to avoid phase wrapping artefacts. The penalty parameter ϱ and the regularization parameters α and β are chosen according to the sequential discrepancy principle [22]: they are successively reduced by a fixed factor, with x the smallest exponent (number of reduction steps) such that the inequality

  sqrt( Σ_{m=1}^{M} |p̂(x_m) − p_m^ms|² ) ≤ ε

is fulfilled, with ε being the measurement error. Sparsity of the reconstruction is desired to pick the few true source locations from a large number N of trial sources. By choosing q ∈ (1, 2] close to one, enhanced sparsity can be obtained. The optimization uses a gradient method with Armijo line search, exploiting the adjoint method to efficiently obtain the gradient of the objective function. Hence, the computational time depends neither on the number of microphones M nor on the assumed number of possible sources N. Since the optimization algorithm uses different stopping criteria, a scaling factor is introduced in (6), by which the maximum absolute value of the measured pressure |p_m^ms| is scaled to an amplitude of w. Further details about the inverse scheme can be found in [20].
In the current implementation, the FE method is applied for solving the Helmholtz equation (1). Hence, regarding computational time, the applicability of the inverse scheme is mainly restricted to the low frequency range, since the discretization effort and therefore the number of degrees of freedom in 3D is of the order O(e_size^−3), where the mesh size e_size is determined by

  e_size = c0 / (N_e f_max).  (9)

In (9), f_max denotes the highest frequency of the acoustic sources, c0 the speed of sound, and N_e the number of elements per wavelength, which should be at least 10 for linear FE basis functions [23]. Please note that any other numerical method, e.g., the boundary element method (BEM), can be applied, which may be even more efficient with respect to computation time, depending on the particular scenario.
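The mesh-size rule (9) translates directly into an estimate of the discretization effort; a minimal sketch, assuming N_e = 10 and a hypothetical room volume (the numbers are illustrative, not those of the later experiment):

```python
def mesh_size(f_max, c0=343.0, n_e=10):
    """Element size after Eq. (9): n_e elements per shortest wavelength."""
    return c0 / (n_e * f_max)

def dof_estimate(volume, f_max):
    """Rough number of linear FE nodes in 3D, illustrating the
    O(e_size^-3) growth of the discretization effort."""
    return volume / mesh_size(f_max) ** 3

e = mesh_size(500.0)             # ~0.069 m for f_max = 500 Hz
n = dof_estimate(100.0, 500.0)   # hypothetical 100 m^3 room
```

Halving e_size (doubling f_max) multiplies the node count by eight, which is why the scheme is currently restricted to low frequencies.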

Sensitivity analysis
For identifying important parameters that dominate the model behavior, sensitivity analysis is a commonly used technique. In order to gain an understanding of the relationship between the input variables and the identification result, a global sensitivity analysis (gSA) using the Sobol' method [24] is carried out. It is a quantitative method which allows assessing how much more important one input parameter X_i is than another.
As a demonstrative example, a dipole source above a sound hard floor is chosen (see Fig. 1a). To approximate free radiation, a perfectly matched layer is used on the remaining sides. The chosen frequency is 1000 Hz, so that the wavelength λ is 0.34 m. The sound pressure field as well as the true source distribution can be found in Figure 1b, and one of the results of the inverse scheme is shown in Figure 1c.
To emulate the sound pressure measurement, a forward run on a fine FE mesh is done (e_size = λ/20), whereas the inverse run for the identification is carried out on a coarser grid with e_size = λ/12 in order to avoid an inverse crime (forward and inverse run on the same grid). The obtained results are used to investigate the dependence of the identification result on the inversion parameters α_0, β_0, ϱ_0, q and w. For reasons of simplicity and reduced calculation effort, the simulations are initially performed in 2D. The conclusions drawn from these results are adopted for 3D examples as well as for the real-world situation presented in Section 3.
For the quantification of the quality of the identification, three different error measures were used:
1. the relative L2 error between the measured p_m^ms and simulated pressure values p_m^inv at the M microphone positions

  p_err = sqrt( Σ_{m=1}^{M} |p_m^inv − p_m^ms|² ) / sqrt( Σ_{m=1}^{M} |p_m^ms|² ),  (10)

2. the relative L2 error σ_err between the true source strengths σ_n^ms and the reconstructed σ_n^inv at all N_src points in the source region Ω_src,
3. and the relative L2 error σ_err,loc between the true source strengths σ_n^ms and the reconstructed σ_n^inv at points within a circle of radius λ/10 around the actual source position in the source region.
It should be stressed that the sound pressure p and the source strength σ are complex quantities. In real-life situations, only the error between the sound pressures at the microphone positions, p_err in equation (10), can be determined. Three different microphone arrangements, which are used here as an example to demonstrate the identification, are shown in Figure 1b (each with M = 6). Two of the arrangements show good source identification capability (low σ_err and σ_err,loc), whereas the third configuration results in a bad identification result (see Tab. 1). Next, this setup, including the three different microphone layouts, is used to show the sensitivity to the parameters of the inverse scheme. Additionally, this setup was used to study the effect of the microphone positions and the number of microphones M (see [25]). In [20] and [25] it was shown that the inverse scheme also performs well at low SNR levels (down to 7 dB).
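The relative L2 error used in these measures can be written compactly; the same function applied to the respective point sets yields p_err, σ_err and σ_err,loc (the sample values below are invented):

```python
import numpy as np

def rel_l2(sim, ms):
    """Relative L2 error between complex simulated and measured values,
    cf. Eq. (10)."""
    sim, ms = np.asarray(sim), np.asarray(ms)
    return np.linalg.norm(sim - ms) / np.linalg.norm(ms)

# a perfect reconstruction gives 0, a completely missing field gives 1
p_ms = np.array([1 + 1j, 0.5 - 0.2j, -0.3j])
```

Note that `np.linalg.norm` handles complex entries correctly, so amplitude and phase errors both contribute.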
The Sobol' method is a variance-based decomposition technique which provides a quantitative measure of how each input parameter and its interactions contribute to the output variance. In this popular method, the variance V of the model output Y = f(X_1, ..., X_P) is decomposed into contributions from effects of single parameters, combined effects of pairs of parameters, and so on (see [26]), leading to

  V = Σ_i V_i + Σ_i Σ_{j>i} V_ij + ... + V_{1,2,...,P}.

The variance contribution of each input parameter, and of the interacting contributions, to the total output variance are expressed as so-called Sobol' sensitivity indices (SI) [24]:

  S_i = V_i / V,   S_ij = V_ij / V.

The first order SI S_i provides the main effect of the parameter X_i, namely the contribution of the i-th input parameter to the total model variance. The second order SI S_ij is a measure of the impact of the interaction between parameters X_i and X_j. Higher order SI are usually not of practical interest [27]. Furthermore, total order SI S_Ti were introduced in [28], which take into account the main effect of X_i and all its interactions with the other parameters (second and higher order effects). Therefore, S_Ti is always greater than or equal to S_i. If the considered model is additive, the following relations hold:

  Σ_{i=1}^{P} S_i = 1,   S_Ti = S_i.

For additive models, the effect of the independent parameter X_i does not depend on the values of the other independent parameters X_j; hence, no parameter interaction exists. Table 2 summarizes the interpretations that can be made on the SI. There are several freely available tools for performing sensitivity analyses, e.g., the SAFE Matlab toolbox [29], the stand-alone software package SIMLAB [30], the C++ based PSUADE software [31, 32], the GUI-HDMR Matlab package [33], or the sensitivity analysis library in Python (SALib) [34]. These tools offer different sampling techniques and SA methods. Here, PSUADE is used to perform the SA. Before the SA can be carried out, the input parameters (i.e., α_0, β_0, ϱ_0, q, w) and output parameters (i.e., p_err, σ_err, σ_err,loc) have to be defined. Moreover, the probability density functions and value ranges of the input parameters have to be specified (see Tab. 3 for the definition).
Furthermore, the number of model evaluations N_eval (also called sample points) must be chosen appropriately, since the amount of information revealed via SA depends heavily on the number of sample points [35]. The number of model evaluations N_eval that is sufficient for a reliable Sobol' sensitivity analysis depends on the number of input parameters and the model complexity [36]. In principle, N_eval = Q(2P + 1) evaluations are necessary for obtaining the Sobol' SI (Q is the size of an initial Monte Carlo sample). Using the computational strategy proposed in [37], the computational costs can be reduced to N_eval = Q(P + 2). However, there is no general agreement on the optimal sample size Q. In general, the higher the number of input parameters, the larger the sample size should be [36]. According to [38], the sample size Q was set to 1000, which should provide accurate first order SI (main effects). For obtaining the 95%-confidence intervals on the first and total order SI, bootstrapping with re-sampling was applied [39, 40]. This can be used to check the appropriateness of the chosen sample size; to this end, the parameters should have narrow confidence intervals [36]. The generation of the input matrix as well as the computation of the Sobol' SI is performed in PSUADE. The model evaluations are made with openCFS [23], a FE-based multi-physics modeling and simulation tool.
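The sampling strategy with N_eval = Q(P + 2) model evaluations can be illustrated with a minimal Saltelli-type estimator in pure NumPy, here applied to a toy additive model instead of the actual inverse scheme (the estimator form follows the strategy of [37]; the toy model and all values are invented for the demonstration):

```python
import numpy as np

def sobol_indices(f, bounds, Q=1000, seed=0):
    """Estimate first order and total Sobol' indices with a Saltelli-type
    scheme using Q*(P+2) model evaluations of f on uniform inputs."""
    rng = np.random.default_rng(seed)
    P = len(bounds)
    lo, hi = np.array(bounds, float).T
    A = rng.uniform(lo, hi, (Q, P))      # two independent sample matrices
    B = rng.uniform(lo, hi, (Q, P))
    fA, fB = f(A), f(B)
    V = np.var(np.concatenate([fA, fB]))
    S1, ST = np.empty(P), np.empty(P)
    for i in range(P):
        ABi = A.copy()
        ABi[:, i] = B[:, i]              # A with column i taken from B
        fABi = f(ABi)
        S1[i] = np.mean(fB * (fABi - fA)) / V        # main effect
        ST[i] = 0.5 * np.mean((fA - fABi) ** 2) / V  # total effect
    return S1, ST

# toy additive model: X1 dominates, X2 is weak, X3 is inert
f = lambda X: X[:, 0] + 0.2 * X[:, 1] + 0.0 * X[:, 2]
S1, ST = sobol_indices(f, [(0.0, 1.0)] * 3, Q=4000)
```

For this toy model the third input is inert, so its first order and total SI vanish, while the first input dominates the variance, mirroring the interpretation rules summarized in Table 2.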
In the plots of the sensitivity indices (SI) (Figs. 2a-2c), the bootstrapped result is also shown (dark color bars; bootstrap sample size 100). This was done to check the appropriateness of the sample size. Since all SI have narrow confidence intervals, the sample size for the Sobol' method can be assumed to be sufficient. Considering the results, it can be observed that β_0 and ϱ_0 have a SI < 0.05 in all settings, which is an indication that these two parameters have no significant effect on the output. Since the sum of the total order SI is higher than one in all settings, significant parameter interactions exist. When considering p_err (Fig. 2a), the main effect is primarily determined by the parameters α_0 and w. A similar conclusion can be drawn for σ_err,loc (Fig. 2c), where the main effect is due to w. For σ_err (Fig. 2b), a clear statement cannot be given, but w or α_0 are suspected to determine the main effect. The parameter q has a small first order SI in all settings, but a high total order SI, which indicates that it has significant interactions with the other parameters.
Hence, the following conclusions can be drawn from the sensitivity analysis: w and α_0 are the most significant parameters for all model outputs; q has significant interactions with the other parameters; β_0 and ϱ_0 have neither a significant effect on the model output nor strong interactions with other parameters.
A further, very important point for a successful source identification is the positioning and number of the microphones. In [20] and [25] it is shown that a small p_err does not necessarily lead to a good source identification if the number of microphones is too small or the microphone positions are not sensitive enough. For precise source identification, the microphones should be positioned at locations where the acoustic pressure has a maximum. Since these optimal microphone positions cannot be determined before the actual measurement, a successful source localization may need re-positioning of microphones. It is also preferable to surround the sound source with microphones and to allow for a spatial distribution of microphones near the source. Furthermore, the microphone-to-source points ratio v = M/N should be at least 4%, see [25]. Consequently, a finer discretization of the source region will require a higher number of microphones. Ratios higher than v > 35% should be avoided, since an enhancement of the source identification is not expected, due to numerical noise.

Table 1. Error values of the three microphone arrangements. Inversion parameters: α_0 = 10^−4, β_0 = 10^−3, ϱ_0 = 1, q = 1.

Table 2. Interpretation of the Sobol' sensitivity indices:
  S_i ≈ 0: parameter X_i has no significant effect on the model output*
  S_Ti ≈ S_i: parameter X_i has no significant interactions with others
  Σ S_i = 1: model is additive (no parameter interactions)
  Σ S_Ti > 1: significant parameter interactions in the model exist
  *In [36], the threshold value is defined by 0.05.

Table 3. Value ranges of the input parameters for the SA (the ranges were chosen empirically). The parameters were assumed to be uniformly distributed.
Real-world situation

Experimental setup
To demonstrate the applicability of the inverse scheme in real-world scenarios, microphone array measurements were performed in a room in which a generic sound source was located. The room is partially lined with porous absorbers (Baso Plan 100) on the walls and ceiling, leading to the reverberation time shown in Figure 3. A ground plan of the room, including the sound source location, is shown in Figure 4. There is also a cupboard in the room and the DAQ (data acquisition) unit for recording the sound pressure. The sound source is a box (dimensions: 50 × 100 × 55 cm) made of Doka formwork sheet, see Figure 5a. The sound can be generated by three separately excitable loudspeakers (VISATON WS 20 E, ∅ = 8″). In order to characterize the source, the normal velocity was measured with a laser scanning vibrometer (LSV) Polytec PSV-500. In the measurement, first speaker L1 and afterwards speaker L2 was active. The excitation frequency of the speakers was 250 Hz. The measured normal velocity level L_v is depicted in Figures 5b and 5c; only the side with the active speaker is illustrated.

Physical and numerical modeling
The application of the inverse scheme requires physical and geometrical modeling. In doing so, one has to deal with several challenges concerning the real versus the ideal world, which will be discussed in this section. Three main challenges are considered in detail: the geometrical modeling needed for the application of the FE method, the physical modeling of the boundary conditions, and the determination of the microphone positions.
The presented inverse scheme requires an appropriate FE model of the measurement environment. The model was created with as much detail as necessary, while keeping it as simple as possible. It should be kept in mind that the geometric modeling is not completely independent of the physical modeling (boundary conditions).
The created FE model is depicted in Figure 6. The cupboard and the DAQ unit in the room are modeled by simple blocks. Moreover, the I-beam on the ceiling was considered in the FE model. The supports for the microphones (microphone trees, see Fig. 9) and the mounting frame (see Fig. 5) of the generic sound source were neglected in the modeling process, since their dimensions are small compared to the wavelength of the excitation frequency. The grey region Ω_damp represents the porous absorber on the walls and on the ceiling.

After these considerations of geometrical modeling, the physical modeling aspects will be discussed. In order to obtain accurate data for an acoustic field by simulation, the boundary conditions necessary for the solution of the acoustic wave equation have to be determined in a suitable way. For the characterization of the materials present in the room, the absorption coefficient α was determined by impedance tube measurements applying the 2p-method (ISO 10534-2 [41]). The absorption curves for the plasterboard walls as well as for the Doka formwork sheet are shown in Figure 7.
From these absorption curves, the following assumptions were made: The plasterboard walls (denoted by Γ_plaster in Fig. 6), on which no absorber is mounted, are modeled as fully reflective, since no absorption effect can be observed up to 500 Hz. The same applies to the Doka formwork sheet, which is the material of the generic sound source (Γ_src); it also shows no absorption behavior and is therefore considered to be fully reflective.
Moreover, the concrete floor Γ_floor and the surfaces of the cupboard Γ_cb as well as of the DAQ unit Γ_daq are assumed to be fully reflective. This also applies to the surface Γ_Ibeam of the steel I-beam. Hence, the homogeneous Neumann boundary condition

  ∂p̂/∂n = 0

is used for these surfaces. Next, the modeling of the porous absorber (Baso Plan 100), denoted by Ω_damp, on the walls and on the ceiling is discussed. To this end, impedance tube measurements were first conducted to characterize the properties of the absorber (see Fig. 8 for the obtained absorption curves). The measurements with the impedance tube (grey curves, mean value shown in blue) spread considerably in the lower frequency range, whereas a nearly constant absorption behavior is observed above 400 Hz.
For the modeling of the porous absorber, an equivalent fluid model (assuming isotropic and volume-averaged features) is used. For this purpose, the Delany-Bazley-Miki (DBM) model is applied, which is purely empirical and derived from measurements on many highly porous materials. The basis of the model is presented in [42] and was adapted by [43]. The following formulas are given for the characteristic impedance Z̃_c and the wave number k̃

  Z̃_c = ρ0 c0 [ 1 + 5.50 (10³ f/Ξ)^(−0.632) − j 8.43 (10³ f/Ξ)^(−0.632) ],
  k̃ = (ω/c0) [ 1 + 7.81 (10³ f/Ξ)^(−0.618) − j 11.41 (10³ f/Ξ)^(−0.618) ].  (19)

Here, the only material parameter is the static air flow resistivity Ξ. The limits for the validity of the model are given in [42] by 0.01 < f/Ξ < 1, whereas the lower limit is not that critical for the quality, which means that the model behaves well below this limit [43]. The static air flow resistivity can be measured according to [44], but here an inverse determination was chosen by adapting the material model to the measured absorption values. For this process, the sample thickness of the absorber was measured (h = 100 mm). All other material parameters were fitted with a least squares algorithm by comparing the absorption coefficient measurements α_M to the respective absorption curve α_A of the material model. This approach is based on measurements for which analytical solutions (models) of the measured quantity exist. In the considered case, the measurement is done in an impedance tube using the 2p-method. For this purpose, a sample of the material to be characterized is placed inside the tube. This sample influences the (plane) sound wave propagation in the tube, for which mathematical models exist. Thus, the absorption coefficient can be expressed considering the measurement conditions of the 2p-method, with Z̃_S the surface impedance at the transition between air and absorber, and Z_c,air the characteristic impedance of air.
For perpendicular sound incidence, and due to the fact that the sample is backed by a rigid wall in the tube, the surface impedance can be determined by [45]

  Z̃_S = −j Z̃_c cot(k̃ h),  (22)

resulting in

  α_A = 1 − |R̃|² = 1 − | (−j Z̃_c cot(k̃h) − Z_c,air) / (−j Z̃_c cot(k̃h) + Z_c,air) |².  (23)

To see whether an absorption effect is present, the curve obtained when no sample is placed in the tube is also shown. The attenuation effect in the absorber is accounted for by the complex quantities for the wave number k̃ and the characteristic impedance Z̃_c (j denotes the imaginary unit). These quantities result from the corresponding material model, here the DBM model (19).
Next, the material parameters which minimize the error between the measured α_M and the analytical absorption curve α_A are sought. The measured absorption curve consists of F measurement points (f_i, α_M,i), i = 1, ..., F. Here, the squared L2 error is used as error measure

  J_err = Σ_{i=1}^{F} ( α_M,i − α_A(f_i) )².

Hence, an optimization problem has to be solved to find those material parameters that minimize the error J_err. Normally, and especially for nonlinear problems, such optimization problems are solved numerically, where the specification of a starting point (initial values) is necessary. For the optimization problem, the gradient based Levenberg-Marquardt algorithm [46, 47] was applied. Since gradient based algorithms may converge only locally, a combination with a genetic optimization algorithm was used. Here, the genetic algorithm "optimizes" the initial model parameters, which are used afterwards in the Levenberg-Marquardt algorithm. In Figure 8, the fitted absorption curve of the DBM model is compared with the measurements. The material model was fitted to the mean value of the seven absorption curve measurements and resulted in Ξ = 8.92 × 10³ N s m⁻⁴ for the DBM model. The fit with the model shows good agreement with the measurements.
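Equations (19), (22) and (23) can be combined into a short sketch that evaluates the analytical absorption curve α_A for the fitted resistivity from the text (a minimal sketch: the SI-form Miki coefficients are assumed, and the air data are nominal values, not measured ones):

```python
import numpy as np

RHO0, C0 = 1.205, 343.0      # nominal air density [kg/m^3] and speed of sound [m/s]
Z_AIR = RHO0 * C0            # characteristic impedance of air

def dbm(f, sigma):
    """Characteristic impedance and wave number of a porous absorber
    after the Delany-Bazley-Miki model, cf. Eq. (19).
    sigma: static air flow resistivity [N s m^-4]."""
    X = 1e3 * f / sigma
    Zc = Z_AIR * (1 + 5.50 * X**-0.632 - 1j * 8.43 * X**-0.632)
    k = (2 * np.pi * f / C0) * (1 + 7.81 * X**-0.618 - 1j * 11.41 * X**-0.618)
    return Zc, k

def absorption(f, sigma, h):
    """Normal-incidence absorption coefficient of a rigidly backed layer
    of thickness h, cf. Eqs. (22)-(23)."""
    Zc, k = dbm(f, sigma)
    Zs = -1j * Zc / np.tan(k * h)        # surface impedance, Eq. (22)
    R = (Zs - Z_AIR) / (Zs + Z_AIR)      # reflection factor
    return 1 - np.abs(R)**2              # Eq. (23)

f = np.array([100.0, 250.0, 500.0])
alpha = absorption(f, sigma=8.92e3, h=0.1)   # fitted resistivity, h = 100 mm
```

As expected for a 100 mm porous layer, the sketch reproduces the qualitative behavior of Figure 8: low absorption at 100 Hz and strong absorption towards 500 Hz.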
To consider the porous absorber in the FE simulation, the generalized form of the Helmholtz equation (1) is used, which is suitable for an inhomogeneous medium (spatially varying density). Here, the sought interior acoustic sources σ_in are set to zero, since no sources are assumed to be inside the volume. With formulation (1), the porous absorber on the wall and ceiling is considered as a layer of an equivalent complex fluid. The frequency-dependent effective parameters ρ̃_eff and K̃_eff are obtained by

  ρ̃_eff(ω) = k̃(ω) Z̃_c(ω) / ω,   K̃_eff(ω) = ω Z̃_c(ω) / k̃(ω),

employing the expressions for the characteristic impedance Z̃_c and the wave number k̃ of the DBM model (19). The sound sources on the surface of the generic source are modeled by (2).

Microphone positions
The presented approach for sound source localization requires not only a suitable FE model but also precisely determined positions of the microphones with respect to a reference position. Through many simulations, it has been shown that spatially distributed microphone positions lead to the best identification results. This may be due to a higher sensitivity of the microphone positions compared to, e.g., a planar array. For the positioning within the room, so-called microphone trees have been constructed (see Fig. 9) to achieve a spatial distribution of the microphones. These trees can have different branches with different lengths at various heights, to which the microphones are attached. In order to determine the microphone locations in the room, an acoustic positioning system was developed. The system is based on the principle of multi-lateration, where distances between an unknown position (= microphone) and several known points (= loudspeakers at the walls, see Fig. 9) are used to determine the location. This setting is similar to the well-known global positioning system (GPS) [48], which is mainly used for outdoor positioning purposes. If the distances r_i between the microphone and the speakers s_i (i = 1, ..., N_S) are known, the position of the microphone can be obtained by using the sphere equations

  ||y − s_i||² = r_i²,   i = 1, ..., N_S,  (26)

with the speaker coordinates s_i = (x_i, y_i, z_i)^T and the sought microphone coordinates y = (x, y, z)^T (Cartesian coordinate system). For a unique solution of the set of sphere equations (26), four speakers are used, three of which are essential. In trilateration, three equations are used to solve (26), which provides two possible solutions. For many situations it is clear which of the two solutions is the correct one (e.g., with GPS, the second solution would not be on Earth but in outer space).
Since this is not clear in the present case, multi-lateration is applied. The distances r_i in (26) can be rewritten using the time delay τ_i between speaker and microphone, and the propagation speed c0 (= speed of sound)

  r_i = c0 τ_i.  (27)

Hence, to obtain the distance r_i, a sound signal (sine burst, 16 kHz, 6 periods) is emitted from a speaker, and the time delay between emitting and receiving this signal (at the microphone) is determined. An overview of different time delay estimation techniques in room acoustics is given in [49]. Furthermore, the speed of sound c0(ϑ, U) must be known. For this purpose, the relative humidity of air U and the temperature ϑ (in °C) are measured, and the speed of sound is estimated using [50], with p0 the static pressure and e(ϑ) the temperature-dependent vapor pressure of air, which can be calculated using the Antoine equation [51]

  e(ϑ) = 133.322 × 10^( 8.07131 − 1730.63 / (233.426 + ϑ) ).

Then, the distances between transmitter and receiver can be obtained from the time delay τ and the known propagation speed c0, cf. (27). Here, the time delay is obtained by cross correlation (CCO). In order to improve robustness against reverberation, usually the generalized cross correlation with phase transform (GCC-PHAT) may be applied [52]; however, an improvement using GCC-PHAT could not be observed here. For the correlation, a high sampling rate is advantageous. For the used DAQ unit, the highest possible sampling frequency was 192 kHz. Besides the determination of the distances and the measurement of the temperature ϑ and relative humidity U, the set of equations (26) has to be solved to obtain the microphone coordinates. There exist several multi-lateration algorithms for solving (26) [53-55]. In the actual system, a combination including a Kalman filter (KF) [56] is applied. First, the system (26) is rewritten as an optimization problem

  ŷ = argmin_x Σ_{i=1}^{N_S} ( ||x − s_i|| − r_i )²,

where x = (x, y, z)^T and ŷ is the optimum, i.e., the sought position of the microphone.
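The Antoine relation can be evaluated directly; a one-line sketch (the full expression for c0(ϑ, U) from [50] is not reproduced here):

```python
def vapor_pressure(theta):
    """Saturation vapor pressure of air [Pa] at temperature theta [deg C]
    via the Antoine equation quoted above."""
    return 133.322 * 10 ** (8.07131 - 1730.63 / (233.426 + theta))

e20 = vapor_pressure(20.0)   # roughly 2.3 kPa at 20 deg C
```

The value at 20 °C agrees with the commonly tabulated saturation vapor pressure of about 2.3 kPa, which serves as a quick plausibility check of the constants.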
For this, one of the sphere equations (marked by *) is subtracted from the remaining I = N_S − 1 equations. Expanding the squares cancels the quadratic terms in the unknowns, leading to the linear system A x = b with rows

(s_i − s_*)^T x = 1/2 (r_*^2 − r_i^2 + ||s_i||^2 − ||s_*||^2),  i = 1, ..., I,

which may be solved in the least-squares sense by

ŷ = (A^T A)^(−1) A^T b.  (32)

The question which of the N_S equations is used for subtraction is still open. Here, no fixed equation is used. Since the searched position should be independent of the chosen reference equation *, the process is performed by choosing each equation once as reference and calculating the microphone position ŷ. Moreover, the number of used equations N_S for obtaining ŷ can be varied, where, as pointed out above, at least four are necessary. The total number of possible combinations is

N = Σ_{N_S = 4}^{N_S,max} (N_S,max choose N_S) = 163,

with the maximum number of available speakers on the walls N_S,max = 8. Hence, this procedure results in 163 estimates of the microphone position y. Next, the KF is applied to these data. The KF is an optimal estimation algorithm [57] and can be used for the state estimation (= searched microphone position) of a system with uncertain measurements. The process of the application is depicted in Figure 10. It is a recursive process in which the state estimate is improved with each new input measurement (= position obtained through (32)), leading to an optimal estimate ŷ of y. For the KF, the state transition matrix A, the observation matrix H, the measurement (noise) covariance matrix R, and the process (noise) covariance matrix Q have to be defined. The starting value for the error covariance P of the state was set to R. The standard deviation for the measurement was assumed to be σ_r = 0.1 m and for the process σ_q = 0.1 m (empirically chosen). The overall positioning error ||y − ŷ|| is 37.3 ± 44.7 mm (determined at 70 locations in the room). This accuracy seems sufficient for now, since the identification by the inverse scheme is currently limited to low frequencies in the actual implementation (frequencies lower than 500 Hz seem to be practicable).
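The linearization by subtracting a reference sphere equation and the subsequent least-squares solve can be sketched as follows; the speaker layout and the test position are made up for illustration, while the count of 163 combinations follows directly from the binomial sum with N_S,max = 8.

```python
import numpy as np
from math import comb

def multilaterate(speakers, radii, ref=0):
    """Least-squares position from the sphere equations (26): subtracting
    the reference equation `ref` cancels the quadratic terms and yields a
    linear system A y = b, solved as in (32)."""
    s = np.asarray(speakers, dtype=float)
    r = np.asarray(radii, dtype=float)
    i = np.arange(len(s)) != ref                 # remaining I = N_S - 1 rows
    A = s[i] - s[ref]
    b = 0.5 * (r[ref]**2 - r[i]**2
               + np.sum(s[i]**2, axis=1) - np.sum(s[ref]**2))
    y, *_ = np.linalg.lstsq(A, b, rcond=None)    # (A^T A)^-1 A^T b
    return y

# made-up speaker positions on the walls of a 4 m x 5 m x 3 m room
speakers = np.array([[0, 0, 0], [4, 0, 0], [0, 5, 0],
                     [0, 0, 3], [4, 5, 3]], dtype=float)
mic_true = np.array([1.2, 2.5, 1.1])
radii = np.linalg.norm(speakers - mic_true, axis=1)  # exact distances
mic_est = multilaterate(speakers, radii)             # recovers mic_true

# number of speaker subsets of size 4..8 out of N_S,max = 8
n_comb = sum(comb(8, k) for k in range(4, 9))        # 163 estimates
```

With noise-free distances the linear system is consistent and the position is recovered exactly; with measured delays, each subset yields a slightly different estimate, which motivates the subsequent Kalman filtering.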
It has to be noted that the determined positions near the floor and the ceiling show the highest deviations from the true microphone positions. Moreover, the positions of the speakers on the walls also influence the localization result for the microphones.
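The recursive KF fusion of the individual position estimates can be sketched as below. A static-state model with identity transition and observation matrices (A = H = I) is assumed here, since the state is a fixed microphone position and the measurements are position estimates; the exact matrices of the paper are not printed in the text, so this choice is an assumption. The values σ_r = σ_q = 0.1 m and the initialization P = R follow the text.

```python
import numpy as np

def kalman_fuse(measurements, sigma_r=0.1, sigma_q=0.1):
    """Fuse repeated 3D position measurements with a Kalman filter.
    Static-state model assumed: state transition A = I, observation
    H = I; P is initialized with R as described in the text."""
    I3 = np.eye(3)
    R = sigma_r**2 * I3                  # measurement noise covariance
    Q = sigma_q**2 * I3                  # process noise covariance
    x = np.asarray(measurements[0], dtype=float)
    P = R.copy()
    for z in measurements[1:]:
        P = P + Q                                     # predict (static state)
        K = P @ np.linalg.inv(P + R)                  # Kalman gain
        x = x + K @ (np.asarray(z, dtype=float) - x)  # update state estimate
        P = (I3 - K) @ P                              # update error covariance
    return x

# sanity check: identical measurements are reproduced exactly
pos = kalman_fuse([[1.0, 2.0, 3.0]] * 10)
```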

Model validation
In order to check the appropriateness of the applied modeling approaches, the FE model needs to be validated. To this end, a loudspeaker (Bruel & Kjaer OmniPower Sound Source Type 4292) was placed in the room and excited by white noise (bandwidth 1000 Hz). In the simulation, the source is modeled as a point source (omni-directional). This approach is justified since sound sources generally show such behavior at low frequencies and, in addition, a special omni-directional loudspeaker was used.
To assess the suitability of the FE model, the relative L2 error p_err (10) and the spatial distribution of the sound pressure at the microphone positions were compared with the measurement for each frequency. Here, the regression coefficient (coefficient of determination)

R^2 = 1 − Σ_m (p_m^ms − sc · p_m)^2 / Σ_m (p_m^ms − p̄^ms)^2

was calculated to quantify the model adequacy for the spatial distribution, with p_m^ms the measured pressure, p_m the pressure obtained from the simulation, and p̄^ms the mean of the measured values. Since the source strength is not known, a scaling factor sc is determined by least-squares minimization (min_sc Σ_m (p_m^ms − sc · p_m)^2). This factor is used to scale the simulation before the comparison, to account for the unknown source strength; this approach is admissible since a linear acoustic field is assumed. In Figure 11, the obtained regression coefficient and the relative L2 error are shown. It can be observed that the model is quite good up to 500 Hz, but for higher frequencies both assessment criteria deteriorate. The investigations have shown that a regression coefficient R^2 of approximately 0.5 or higher is needed for a useful localization result. Figure 11 shows that this requirement is just fulfilled up to 350 Hz. Note that the microphone positions used for validation differ from those used for the source identification (see Fig. 12).
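The least-squares scaling and the two assessment criteria can be sketched as follows; the closed-form sc is the minimizer of min_sc Σ(p_m^ms − sc·p_m)², R² is assumed to be the standard coefficient of determination, and real-valued pressure amplitudes are used for brevity.

```python
import numpy as np

def model_adequacy(p_ms, p_sim):
    """Scaling factor sc (closed-form least squares), regression
    coefficient R^2, and relative L2 error between measured (p_ms)
    and scaled simulated (p_sim) microphone pressures."""
    p_ms = np.asarray(p_ms, dtype=float)
    p_sim = np.asarray(p_sim, dtype=float)
    sc = np.dot(p_ms, p_sim) / np.dot(p_sim, p_sim)  # argmin of the LS problem
    res = p_ms - sc * p_sim
    r2 = 1.0 - np.sum(res**2) / np.sum((p_ms - p_ms.mean())**2)
    p_err = np.linalg.norm(res) / np.linalg.norm(p_ms)
    return sc, r2, p_err

# a simulation matching the measurement up to source strength gives R^2 = 1
sc, r2, p_err = model_adequacy([2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0, 4.0])
```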

Source identification
In comparison to the inverse scheme, three common beamforming algorithms are used for the source identification. ConvBF [1] and FuncBF [2,3] were applied, both of which can also handle coherent acoustic sources. Furthermore, the Clean-SC deconvolution algorithm [5] was employed; it removes the side lobes from the raw source map based on spatial source coherence, but does not work satisfactorily with several tonal (i.e., coherent) sound sources. Steering vector formulation II (according to [15]), which provides the correct source strength, was used for the beamforming algorithms. It should be noted that the commonly used steering vector formulations do not exploit the available information about the room. To overcome this limitation, an approach combining measurement with simulation can be applied: the transfer functions between source and microphone positions are computed by an FE simulation, leading to numerically calculated transfer functions (NCTF), as demonstrated in [25]. In doing so, the actual measurement setup is taken into account. Compared to the standard formulations, the NCTF lead to an improvement in resolution as well as dynamic range. However, at the low frequencies considered here, the results showed only minor improvements [25].
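For reference, the raw map of conventional frequency-domain beamforming amounts to evaluating w^H C w per grid point (cf. the formula in the introduction). The sketch below uses normalized free-field monopole steering vectors, not the exact formulation II of [15], together with a synthetic single-source cross-spectral matrix; all positions are made up for illustration.

```python
import numpy as np

def conv_beamform(csm, steering):
    """Raw source map B_g = w_g^H C w_g for all grid points; csm is the
    (M, M) cross-spectral matrix, steering is (G, M)."""
    return np.real(np.einsum("gm,mn,gn->g", steering.conj(), csm, steering))

rng = np.random.default_rng(0)
mics = rng.uniform(0.0, 1.0, size=(31, 3))   # 31 mics, as in the UB array
grid = rng.uniform(2.0, 4.0, size=(64, 3))   # 64 candidate source points
k = 2 * np.pi * 250.0 / 343.0                # wavenumber at 250 Hz

# free-field monopole transfer functions and normalized steering vectors
d = np.linalg.norm(grid[:, None, :] - mics[None, :, :], axis=2)
h = np.exp(-1j * k * d) / d
w = h / np.linalg.norm(h, axis=1, keepdims=True)

g_true = 17                                   # arbitrary "active" grid point
csm = np.outer(h[g_true], h[g_true].conj())   # noise-free single-source CSM
bmap = conv_beamform(csm, w)                  # peaks at g_true (Cauchy-Schwarz)
```

By the Cauchy-Schwarz inequality, the map attains its maximum at the true source point for a noise-free single-source CSM; the limited dynamic range away from the peak is exactly the PSF effect discussed above.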
For the identification, four different microphone arrangements are considered. The first one is an Underbrink (UB) array (2D layout, aperture 1 m). In the other three arrangements (MicA, MicB, MicC), the microphones are spatially distributed throughout the room without special requirements on their positions; however, care was taken not to place them too close to the ceiling, the floor, and the walls. The used microphone positions in the room are depicted in Figure 12. Furthermore, an optimized arrangement named MicBest (depicted by black triangles) was used for the inverse scheme, but only with synthesized measurement data. It should be noted that the UB array consists of 31 microphones, whereas the others (MicA, MicB, MicC, MicBest) use 50. This results in a microphone-to-source-points ratio of ν ≈ 3% and ν ≈ 4%, respectively. The parameters for the inverse scheme were set to: a_0 = b_0 = q_0 = 0.125, q = 1.1, w = 40. During the measurements, the temperature in the room was θ = 25 ± 2 °C and the relative humidity φ = 25 ± 10%. In order to account for the effect of these environmental conditions on c_0, (28) is used. The considered source frequency is 250 Hz (λ ≈ 1.36 m).

Single speaker active
In the first example, speaker L1 on the left side of the generic sound source (see Fig. 5a) is active (the active speaker is marked by a black circle in the following source maps). First, the beamforming methods utilizing the UB array are applied (see Figs. 13a-13c). The localization results (source maps) show the source level L_r (ref. 20 µPa m). The localization results are rather poor, since the source is detected at a wrong location. Nevertheless, the improvement of the dynamic range through the different algorithms can be observed.
At low frequencies, beamforming methods usually do not perform well, especially when many sources are involved; it is then hardly possible to locate them individually. This is not the case in this example, since there is just one active sound source, so the standard methods should not necessarily perform poorly. However, due to the small aperture of the UB array and the fact that the sound waves from the source do not impinge directly on the array, the localization becomes increasingly difficult. In addition, the reflective surfaces in the room make localization even more challenging.
Next, the inverse scheme was applied to the UB array measurement data. The area of the active sound source could be identified correctly. However, on top of the source, spurious acoustic sources were also identified (see Fig. 13d). This can be an indication that the microphone positions are not sensitive enough or that there are too few microphones. According to the guidelines in [25], a spatial distribution of the microphones is also desirable.
To further improve the localization result, the other microphone arrangements (MicA, MicB, MicC) were used (see Fig. 14). Compared to the previous results, there is a clear improvement of the localization, which indicates the importance of the microphone positions. MicA and MicB yield similar localization results, which outperform that of MicC.
Beamforming was also performed with the measured sound pressure data of these three arrangements to allow a comparison with the results of the inverse scheme. Due to the spatially distributed microphones, a considerable improvement can be expected there as well. Exemplarily, the localization performance of MicA is given in Figure 15.
So far, only localization results have been considered, which indicate whether the source was identified at the correct position. In order to make quantitative statements about the identified source distribution, the identified sources are used to perform a sound field computation. This allows comparisons with the original acoustic field measured at the microphone positions. For the inverse scheme this is straightforward and no further steps need to be taken, since a detailed source distribution in both amplitude and phase is identified. Hence, a numerical simulation was performed to obtain the acoustic field. To quantify the result of the sound field computation, the relative L2 error p_err between measured and simulated pressure values at the microphone positions (10) is used. The results are given in Table 4.
Considering the p_err values, the best acoustic field reconstruction is achieved by the simple 2D UB array. However, this may lead to a wrong conclusion if only this error value is considered. The source maps obtained with the UB array (Fig. 13) and the other 3D arrays (Fig. 14) show that the UB array does not provide a good localization result, although its p_err is the smallest. This clearly indicates that microphone positions with a higher sensitivity should be used, or that the number of microphones should be increased, to improve the localization result. Nevertheless, a good agreement with the measured pressure was also obtained with the other arrangements (p_err ≈ 20%). Considering both the localization result and the source field reconstruction, the arrangement MicB is the best.
For the beamforming source maps, the acoustic field computation is not as simple as for the source distributions obtained by the inverse scheme. The main problem is the point spread function (PSF) of the array, since the computed source map (raw map) is a convolution of the PSF with the real source distribution. Deconvolution algorithms (like Clean-SC) try to eliminate the influence of the PSF from the raw source map, resulting in a deconvolved map. Hence, the source maps obtained with Clean-SC can be integrated without further processing. For the ConvBF and FuncBF results, the source power integration technique [58-60] was applied to limit the effect of the PSF. To this end, the raw source map is normalized by the integrated PSF of a point source in the center of the integration area. Beforehand, the integration area must be specified: the maximum of the source map was taken as its center point, a sphere with radius 0.1 m around this point was assumed, and all surface points within this sphere were used for the integration. The results in Table 4 demonstrate the main advantage of the inverse scheme, namely the accurate identification of the source distribution in amplitude and phase for the computation of the acoustic field.
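The source power integration step described above can be sketched as follows, under the assumption that the source map is given on a list of surface points with known coordinates: the integration area is a sphere of radius 0.1 m around the map maximum, and the integrated raw map is normalized by the integrated PSF of a unit point source at the area's center.

```python
import numpy as np

def integration_area(points, source_map, radius=0.1):
    """Boolean mask of all surface points within `radius` of the
    source-map maximum (the center of the integration area)."""
    center = points[np.argmax(source_map)]
    return np.linalg.norm(points - center, axis=1) <= radius

def integrated_source_power(raw_map, psf_map, mask):
    """Sum of the raw map over the integration area, normalized by the
    integrated PSF of a unit point source at the area center."""
    return np.sum(raw_map[mask]) / np.sum(psf_map[mask])

# tiny made-up example: three surface points, map maximum at the first
points = np.array([[0.0, 0.0, 0.0], [0.05, 0.0, 0.0], [0.5, 0.0, 0.0]])
raw = np.array([1.0, 0.8, 0.1])
psf = np.array([0.9, 0.9, 0.5])   # hypothetical PSF of a unit source at center
mask = integration_area(points, raw)          # includes only the first two points
power = integrated_source_power(raw, psf, mask)
```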
Next, the normal velocity level L_v identified by the inverse scheme utilizing MicB is considered in order to allow a comparison with the LSV measurement data (see Fig. 5).
Here, the same situation as before is considered first, with speaker L1 on the left side of the source active (Fig. 16a). The direct comparison with the LSV data shows a deviation of about 17 dB. In the next step, speaker L2 becomes the active sound source; this source radiates towards the floor. As before, the inverse scheme provides a good localization, but the amplitude again deviates, by about 18 dB (Fig. 16b). To demonstrate the capability of the inverse scheme using highly sensitive microphone positions (named MicBest), we proceeded as follows. A forward simulation with prescribed normal velocity at the loudspeaker position (see Fig. 17a) was performed. The positions of the virtual microphones were determined according to the guidelines in [25] as the locations of maximal acoustic pressure. Since the acoustic field in the room is known from the forward simulation, these positions can be found easily: first, the maximum of the sound pressure field is selected as a microphone position, under the constraint of sufficient distance from reflective surfaces. Then the region around the selected position is removed from the search space and the next maximum is searched, so that a spatial distribution is achieved as well. This procedure is repeated until 50 positions are determined, matching the microphone number of MicB. In Figures 17b and 17c, the identification results using the two arrangements are depicted. MicB (the arrangement as used in the measurements) shows a good localization result with a deviation of the velocity level of about 10 dB compared to the original one. However, using the microphone arrangement MicBest, an almost perfect localization result as well as a good agreement in velocity level could be achieved (see Fig. 17c).

Two speakers active
The results achieved so far demonstrate the applicability of the proposed method for identifying low-frequency sound sources in real-world situations. An additional challenge is the localization (separation) of several active sound sources, especially in the low frequency range. To test the proposed method in the presence of more than one acoustic source, two setups have been considered: (case A) speakers L1 and L3 active and (case B) speakers L1 and L2 active. Both sources should have approximately the same source strength, since the excitation signal was the same for both; however, due to speaker tolerances, exactly the same source level may not be achieved. Using the inverse scheme, a good identification was achieved, especially for case A (see Fig. 18a). For case B (see Fig. 19a), the result is not as good, but this setup is also more challenging, since the two sources are closer to each other, which makes the separation harder. The localization was also done with FuncBF (ConvBF is omitted, because FuncBF has the better source separation ability). Since Clean-SC cannot localize both (coherent) sound sources, only the results of FuncBF are considered. In case A (Fig. 18b), FuncBF can separate the two sources, but the position of speaker L3 is identified more accurately with the inverse scheme. Moreover, the two identified source strengths of speakers L1 and L3 (which should be approximately equal) differ less with the inverse scheme. For case B, FuncBF cannot separate the two sources (see Fig. 19b).

Conclusion
In the developed inverse scheme, the Helmholtz equation with the correct boundary conditions is solved. To recover the source locations, an inverse scheme based on a sparsity-promoting Tikhonov functional matching measured (microphone signals) and simulated pressure is used. A clear advantage of such an inverse method is its ability to fully consider realistic geometry and boundary conditions, as well as its straightforward generalizability to situations with convection and/or damping. The implemented optimization-based parameter identification algorithm relies on a gradient method with Armijo line search; for an efficient gradient computation of the objective function, the adjoint method is applied. By this approach, a detailed source distribution in both amplitude and phase is identified, and with this information a numerical simulation can finally be performed to obtain the acoustic field. State-of-the-art beamforming methods, in contrast, require an integration (raising the question of the integration area) or the application of more complex deconvolution algorithms. Since an FE model is needed for the inverse scheme, the modeling was discussed in detail. Special attention was paid to the determination of the boundary conditions, especially the modeling of the porous absorber, as well as to the determination of the microphone positions in the measurement environment. An equivalent fluid model is used for modeling partially absorbing surfaces; to obtain the model parameters of the complex fluid, an inverse parameter determination was applied, where the measurement of the absorption coefficient in an impedance tube served as the starting point for the adjustment of the parameters. To obtain the microphone positions, a set of sphere equations (at least four equations are necessary) was used in combination with a Kalman filter.
The localization results achieved with the inverse scheme demonstrate its applicability at low frequencies in real-world scenarios. Furthermore, simulations showed that a perfect reconstruction of the acoustic sources can be achieved with microphone positions located at pressure maxima, which demonstrates the potential of the inverse scheme. Despite the superiority of the inverse scheme over advanced deconvolution-based signal processing schemes, one has to consider the high effort caused by two challenges: (1) geometry and boundary condition modeling for an accurate FE computation, and (2) the precise determination of the positions of the used microphones.