Low frequency sound field reconstruction in a non-rectangular room using a small number of microphones

An accurate knowledge of the sound field distribution inside a room is required to identify and optimally locate corrective measures for room acoustics. However, the spatial recovery of the sound field would result in an impractically high number of microphones in the room. Fortunately, at low frequencies, the possibility to rely on a sparse description of sound fields can help reduce the total number of measurement points without affecting the accuracy of the reconstruction. In this paper, the use of Greedy algorithm and Global curve-fitting techniques are proposed, in order to first recover the modal parameters of the room, and then to reconstruct the entire enclosed sound field at low frequencies, using a reasonably low set of measurements. First, numerical investigations are conducted on a non-rectangular room configuration, with different acoustic properties, in order to analyze various aspects of the reconstruction frameworks such as accuracy and robustness. The model is then validated with an experimental study in an actual reverberation chamber. The study yields promising results in which the enclosed sound field can be faithfully reconstructed using a practically feasible number of microphones, even in complex-shaped and damped rooms.


Introduction
In room acoustics, sound field reconstruction generally consists of retrieving the entire enclosed sound field by performing a limited number of measurements. While the interpolations of the room impulse responses (RIRs) are commonly used for the purpose of auralization and sound reproduction, at low frequencies, a precise knowledge of its frequency domain equivalentthe room frequency responses (RFRs)can provide useful information on the spatial distribution of sound pressure caused by the resonances of the room (room modes) [1]. In the low-frequency range, room modes highly affect the sound field in the room, yielding irregularities in both the spatial and frequency domains which give rise to coloration as well as masking effects and eventually alter the listening experience. An accurate depiction of the spatial sound field in a room can provide important information for applying ad hoc treatments for room mode correction [2]. It has been shown that, at low frequencies, a knowledge on the modal properties and sound pressure distribution in the room helps improving the design of different passive corrective measures [3][4][5]. This becomes even more crucial in case of active strategies for room modes correction [6][7][8] where control settings could be adjusted based on the knowledge of the resulting sound pressure distribution. This highlights the need of a practical method to accurately reconstruct the sound field in the room at low frequencies.
Each RFR reveals the acoustic transfer from a given source to a given receiver in the room, in the frequency domain. Such RFRs embed the main properties of room modes, namely the resonance frequencies and modal decay times as well as the mode shapes of the room. To retrieve these information for a fixed source position, multiple measurements should be performed at different locations in the room and a reconstruction framework is required to recover the entire spatial information of the aforementioned quantities. The most intriguing question is how to faithfully reconstruct the spatial sound field in a room using the least number of measurements possible.
A regular space and time sampling of the RFRs generally results in an impractically dense microphones grid. It has been shown that, under the frame of the Plenacoustic function in free field [9], the inherent sparsity of the space-time representation of the governing function allows a more effective sampling approach of the sound field. Several studies have also addressed the different sparse properties of enclosed sound fields. In a room with closed boundaries, the sound field is fully dependent on the physics of the room, including its geometries and acoustic properties. Furthermore, at low frequencies, the wave equation is governed by a discrete number of eigenmodes which gives rise to additional sparse approximation. In [10], the spatial RFRs in a rectangular room can be interpolated on a line based on the fact that these transfer functions share the same common poles, with the only difference being their amplitudes (also known as residues) [11,12]. Mignot et al. [13] retrieved the low frequency RIRs in a rectangular room using a finite number of measurement points, by exploiting a low rank approximation using matching pursuit. In [14], a more conventional Compressed Sensing technique using a sensing matrix has been used in combination with plane waves expansion techniques to tackle the block-sparse properties of the acoustic field in a rectangular room.
In this paper, we focus on these inherent sparse properties in room acoustics at low frequencies using approximation techniques such as matching pursuit and global curve-fitting to obtain the low-frequency information of a non-rectangular room under an extensive point of view, where the spatial distribution of sound pressure in a large volume inside the room can be reconstructed and analyzed using a practically small number of microphones. In practice, not every room can be considered as a rectangular room, especially in the case of a conventional listening room or private cinema. Non-rectangular rooms certainly possess a more complex distribution of eigenmodes frequency-wise, and the mode shapes are also harder to predict. This practical challenge is the main motivation to investigate here a model of a non-rectangular reverberation chamber. A first numerical study of this facility is then followed by the experimental validation inside the actual reverberation chamber.
The analysis of the reconstruction results emphasizes on the frequency and spatial aspects of the responses in the room. As can be seen in [15][16][17], recent techniques for room modes equalization require an accurate knowledge of the sound field. For instance, an active electroacoustic absorber system [17], aiming at equalizing and flattening the frequency response of a room at low frequencies, requires an accurate model of the room to optimize the active acoustic impedance. In former studies, the actual efficiency of these low-frequency absorbers has been validated with a limited number of measurements inside the room, especially addressing the performance in terms of modal decay times reduction. With the possible help of the reconstruction framework proposed in this paper, the performance of the absorbers can be assessed space-wise. In addition, the framework can also provide precious information on how to adapt the acoustic impedance to be assigned at the diaphragm of the active electroacoustic absorbers. This motivates investigating how to minimize the number of measurement points for such a reconstruction framework as not only that it allows the reconstruction of the sound field within a specific bandwidth with limited equipment but also saves processing time that will eventually allow potential real time and online active-control strategies.
The outline of the paper is as follows. Section 2 first introduces a sparse representation of room acoustics at low frequencies. The reconstruction method, which is composed of two steps, is then introduced in Section 3. The first part of the method consists of the modal identifi-cation of the room in which two different approaches, respectively in the time and frequency domains, are suggested. The second part aims at recovering mode shape functions through plane wave approximation techniques. Following the descriptions of the reconstruction mechanism, Section 4 is dedicated to the validation of the method using both numerical models and experimental measurements in the actual reverberation chamber at EPFL (non-rectangular room) to emphasize the robustness of the algorithm. Several discussions are raised concerning the accuracy of the sound field spatial recovery as well as the requirements for a faithful reconstruction. Concluding remarks are finally presented in Section 5.

Sparsity in room acoustics
The main motivation of this study is to propose a simple, yet practical, experimental framework allowing a thorough characterization of the room behavior in the low frequency domain. Regardless of the method used to reduce the amount of measurement points, such a framework should rely on a sparse representation of the wave equation in a room at low frequencies. These could be exact sparsity that inherently emerges from the physics of the room or approximate sparsity which requires an approximation framework to reduce the degrees of freedom in the wave equation. In this section, several sparse aspects of room acoustics at low frequencies can be investigated using the modal decomposition form of the wave equation and the mode shape approximation theorem. The objective is to obtain a governing equation of the spatial distribution of sound pressure in a room where the number of variables is well defined and quantifiable. This could serve as the target for the reconstruction framework that follows in Section 3.

Modal decomposition
At low frequencies, where wavelengths are of the same order of magnitudes as the room dimensions, room walls are mostly reflective which give rise to standing waves phenomenon. This creates the so called room modes that occur at discrete resonance frequencies where most of the acoustic energy is concentrated [18]. There exists a formulation of room modes at low frequencies that presents an inherent sparsity, corresponding to a limited number of discrete resonance frequencies bounded by the Schroeder cutoff frequency [1]. In this sparse representation, the solutions of the wave equation can be decomposed as a discrete sum of damped harmonic eigenmodes: where U n are the space-dependent mode shape functions (eigenfunctions of the Helmholtz equation) for each mode n of the room,X is the position in the room, g n ðtÞ is the harmonic time-dependent decaying function and A n is the corresponding complex expansion coefficient of mode n. Each eigenmode of a room is uniquely represented by a complex wavenumber k n ¼ ðx n þ jd n Þ=c 0 (eigenvalues of the Helmholtz equation), where c 0 is the sound celerity in the air, x n is the modal angular frequency and d n > 0 is the corresponding damping factor [18]. The harmonic decaying function g n ðtÞ can be fully expressed as: It is worth noticing that whileX is a variable in equation (1) as the location of the point/microphone of interest, the location of the source and its properties are not explicitly written here. This information is however accounted for in the complex coefficients A n , and will be made implicit in the following derivations. This is motivated by the fact that, in the case investigated here, only a single fixed source will be considered, and hence the location of the source is not a variable.

Mode shape approximation
The previous derivation introduced a structured sparsity originated from the limited discrete modal decomposition of the wave equation at low frequencies. For a room with ideally rigid walls, U n is a space dependent function that corresponds to the exact solution of the Helmholtz equation [1]: It has been shown in [19] that these mode shape functions can be further approximated with spherical harmonics and spherical Bessel functions. Accordingly, any mode shape function can be approximated by a finite sum of plane waves sharing the same wavenumber jk n j, pointing in various directions. Each individual mode shape can then be formulated using the R-th order approximation: within whichk n;r are the 3D wavevectors sharing the same wavenumber jjk n;r jj 2 ¼ jk n j. Note that, in opposition to the exact sparsity in the previous section, this is an approximate sparsity. This decomposition not only provides an approximation for each of the mode shape, but also allows a closed-form interpretation of the mode shape function regardless of the type of the modes in the room. Assuming now that we restrict this representation below a given upper frequency limit, a finite number R of wavevectors would be enough to closely approximate every mode shape function within this frequency range.
Using equations (2) and (4), equation (1) could be expanded as: p t;X À Á ¼ X n;r C n;r e jxnt e Àdnt e jkn;rÁX ; ð5Þ where C n;r ¼ A n B n;r with r R. Hence, through a series of derivations, the expression in equation (1) can be interpreted as the discrete sum of space-time damped harmonics with the expansion coefficients C n;r . This expansion form directly links the acoustic response of the receiver to its location.

Reconstruction framework
The role of the reconstruction framework is to identify and estimate the values of the unknown parameters of equation (5) from a limited set of measurements. The proposed algorithm addresses the general case of a non-rectangular room, the modal behavior of which is less predictable than in a shoe-box room. Figure 1 shows the geometry of the studied room with two simulated room modes. Inside the room, a number of M microphones are randomly placed at different locations to acquire the RIR measurements. Depending on the frequency range of interest, these measurements could be filtered as well as downsampled to reduce computational cost. Calling N t the length of the time vector of each microphone measurement, the (N t Â M) matrix S of signals is defined as the input of the framework. The output of the reconstruction framework, in short, should be all the unknowns present in equation (5), excluding the predefined parameters, namely, the number of modes N and the list of wavevectorsk n;r for each mode shape approximation. The outputs, hence, include the angular frequency x n and the exponential damping factor d n for each eigenmode, as well as the N Â R expansion coefficients C n;r . Once all these values are determined, it is possible to interpolate the responses at any positionX int in the room by simply plugging it into equation (5).
The detailed framework can be divided into two steps. The first one is called modal identification, aiming at estimating the modal wavenumbers k n for the N room modes. Once identified, the second step intends to approximate the expansion coefficients C n;r for a set of predefined wavevectorsk n;r through projection.

Modal identification
Two alternative approaches are introduced here, processing the input signals either in the time or in the frequency domain. The first approach is the simultaneous orthogonal matching pursuit (SOMP) method [20] for damped sinusoids [21]. This method is based on a greedy algorithm approach to recursively estimate each modal parameter of the room from the matrix of input time signals. The second method is based on the rational fraction polynomials (RFP) global curve fitting method [22] which, contrarily to the iterative SOMP, simultaneously estimates the modal parameters of the room from a set of input RFRs of the room.

Time domain approach
This method has been successfully used in [13] to locally interpolate the RIRs in a rectangular room at low frequencies. From a pre-defined set of damped sinusoids, this method finds the ones that are highly correlated with the matrix of input signals using a low-rank approximation approach. To begin, two sets of x and d with x min < x < x max and d min < d < d max , are formed. The range of variation of the sets are roughly estimated based on available knowledge on the room. Combining every pair of entries of the two sets together will produce an overly redundant set of complex components ðjx q À d q Þ in which q 2 ½1; Q with Q as the total number of possible combinations. Each entry of this set is then used to form a time vector of length N t of time-decaying damped sinusoid h q ¼ e jxqt e Àdqt . Using the normalized vectors h q ¼ h q =jjh q jj 2 as column vectors will produce an ðN t Â QÞ array H.
The algorithm performs an iterative matching procedure. Every loop indexed i starts with an ðN t Â MÞ residue matrix R i which is the result of the previous loop. At the first loop, R 1 is set to be equal to the predefined signal matrix S. Through the searching procedure, a damped sinusoid with the highest correlation to the residue matrix (representing a pair of x n and d n ) is chosen. The new residue matrix R iþ1 for the following loop can then be formed by extracting the contribution of this chosen sinusoid from R i . The algorithm at a generic i-th iteration is detailed below: Each row q of N i is composed of the M correlation values between the q-th normalized damped sinusoid and each of the M measurements. By summing the energy of this set of values, process the evaluation correlation value r q between the qth damped sinusoid and the entire set of measurements: Out of the Q available r q , choose the maximum one, which points to the pole with the highest correlation to the measurements. As a result, the identified index (namely, q i ) yields the chosen modal wavenumber of this loop: After a modal wavenumber is found, following the orthogonalization and projection of SOMP in reference [20], the residue matrix for the next loop can be interpreted as R iþ1 ¼ R i À P i R i in which P i is the projection onto the chosen damped sinusoidal.
At the end of the procedure, a group of complex wavenumbers corresponding to the eigenmodes of the room is determined.

Frequency domain approach
As room modes are mostly visible in the RFRs, it seems pragmatical to investigate a frequency-domain approach for room modes identification. One particular example is the global curve-fitting method in the frequency domain using the RFP form [22]. This has been used in [23] to estimate the modal parameters by curve fitting the RFR measurements. Curve-fitting methods are usually processed locally, initiating on a single function at a time. The method in [22], however, performs curve-fitting procedures on multiple frequency response functions at different locations simultaneously to identify the model of the system. The method assumes the linearity of the RFRs and that they can be formulated as a ratio of two polynomials. These RFRs share the same denominator whose poles contain information on the modal angular frequencies (x n ) and damping (d n ) of the room. The method then performs a concurrently curve-fitting on the set of measured RFRs (see Appendix) to acquire the modal parameters of the room within a given bandwidth.

Projection onto spherical sampled wavevectors
Up to now, only the eigenfrequency parameters of the room modes, namely, x n and d n given in equation (5) have been identified. The remaining parameters to be determined are the expansion coefficients C n;r , for which the following algorithm is used: The first step is the separation of the current known and unknown parameters. Note that the time-varying terms in equation (1) have been identified in the former algorithm, and can be discarded from now on. Using a matrical form accordingly to the measurement matrix S gives: S T ¼ WG with G as the ðN Â N t Þ matrix where each of its row is a modal damped sinusoidal g n ðtÞ ¼ e jxn t e Àdn t ¼ e jkn t . Furthermore, W is the (M Â N ) space-dependent matrix of modes with the inclusion of the expansion coefficients A n that appear in equation (1): withX m 's the M position vectors for the location of the input measurements of S. If N t > N (which usually is the case), the system of equation (6) is over-determined with ðM Â N Þ unknown and ðM Â N t Þ equations. Hence, it is possible to estimate the (M Â N ) matrix of W by computing the least-squares estimation.
Based on the expression in equation (5), W can be further expanded using plane waves expansion.
-First, the list of component wavevectors needs to be defined. For each mode shape function, a set of R wavevectorsk n;r is created whose norm and directions match a uniform sampling over a sphere with radius jx n =c 0 j. Spherical sampling (proposed in [24]) is chosen in this case because the room is nonrectangular and hence there is no preferred basis for the formation of mode shape functions. -Each column w n of the matrix W can be treated individually as they are associated with different modes. Calling q n the (M Â R) matrix of the plane wave harmonics for mode n in which q n½m;r ¼ e jkn;r ÁXm , each column vector w n can be individually characterized as: with C n the (R Â 1) vector consisting of the R expansion coefficients C n;r of mode n. First, assuming that R < M, taking q n as the basis, w n can be projected onto this basis to derive the coefficient vector C n using least-square projection: As mentioned above, this derivation is only available when the number of sampled plane waves is lower than the number of microphones. As can be seen in [19,25,26], the convergence of the plane wave approximation is highly dependent on the number of plane waves available, especially in 3D. Hence, in the case where the number of measurement points is fairly low, restricting R < M could affect the reconstruction of mode shape functions. One possibility would be to allow R > M and derive the coefficient vector using a least norm optimization: Further studies need to be done to verify the limitations of this solution as well as the optimal choice for R. In our case, for a low number of microphones, several trials have shown that choosing R > M can estimate the mode shape better and increase the overall correlation. Regardless of the method used, in practice, the applicability of this step can always be cross-checked using a number of evaluation microphones.
Repeating the technique on each mode n N will return the set of expansion coefficients C n;r required for the reconstruction.

Reconstruction results
In this section, the results of the sound field reconstruction framework are analyzed using both numerical and experimental data. In the numerical simulations, a FEM model of a non-rectangular room is built for initial analysis. This first numerical study allows the assessment of multiple aspects of the reconstruction framework. It provides access to a very fine distribution of microphone placements, and the input data, such as wall impedance, can be changed straightforwardly. Furthermore, the FEM simulation not only provides the input but also can be used as the ground-truth reference for cross-checking the reconstruction results. In the second step, measurements are performed in the actual reverberation chamber at Ecole Polytechnique Fédérale de Lausanne (EPFL), with the same geometry as the one considered in the simulations, to confirm the validity and robustness of the framework.

Numerical simulation
The FEM model consists of a non-rectangular room with maximum height of 4.6 m, maximum width of 9.8 m and maximum length of 6.6 m that replicates the actual reverberation chamber at EPFL. The damping of the walls are initially considered very low, with a uniform absorption coefficient of a ¼ 0:01, approaching that of the actual reverberation chamber. The source is chosen to be a monopole point source and is put in proximity to one corner of the room in order to excite all the modes of the room. The measurement points are spread randomly in the room (refer to Fig. 2). This placement of microphones although possibly not the most ideal placement strategy for a given geometry, is guaranteed to capture enough information about the sound field and its modal properties assuming there is no readily available knowledge about the room.

Modal identification
In this section, the modal identification is performed using two different methods, namely SOMP and RFP, and their results are compared with each other. However, instead of directly comparing the retrieved modal properties x n and d n , the focus has been put on two other useful properties in modal analysis: the eigenfrequency (f n ) and the modal decay time (MT 60n ) which is defined as [27,28]: These two properties reflect the modal properties of the room and are directly linked to x n and d n . Using the same number of microphones, the modal decay times estimated from SOMP and RFP methods for the first 12 modes of the room are compared with those computed from the baseline FEM analysis considered as the ground-truth (see Fig. 3). After a few initial tests, it is observed that both methods performed equally well in identifying the frequency f n (in Hz) for each of the modes of the room. As the values of f n obtained using the two methods do not present much differences, the comparison is illustrated here only in terms of the modal decay times to compare their performance with respect to modal damping estimation. Using the numerical results from the FEM analysis as the reference, it can be seen in Figure 3 that both the RFP and SOMP methods are capable of identifying the room's eigenmodes, except that SOMP, on average, may underestimate the damping for the mode at 40.5 Hz, which will be discussed at a later stage. Generally, it can be observed that the RFP method performs slightly better than SOMP. However, there are significant differences between the two methods regarding robustness. Although both methods require a manual input regarding the total number of modes in a limited bandwidth, they process this information differently. For the global curve fitting using RFP, if the total number of modes within a frequency range is not accurately known, it requires a considerable amount of trials and errors to eventually come up with a coherent curve-fitting result. Furthermore, as can be seen in the later stage, without a meticulous consistency check, the interpolation results from RFP can end up with a higher amount of errors. This vulnerability, for most cases, does not exist for SOMP. This is due to the fact that the modal parameters are found in RFP simultaneously whereas in SOMP they are found iteratively using a residual manner: the room modes that have the highest contributions to the collected signals are estimated first, followed by the ones with less. This gives SOMP an advantage for the reconstruction procedure as the results do not deviate much from reality even when underestimating or overestimating the number of modes within the frequency range of interest. The number that users enter can only alter how many times the algorithm is repeated but should not affect the result of each individual loop.
In this particular case, the underestimated damping by SOMP that sometimes occurs at 40.5 Hz also comes from the fact that this algorithm processes residues at each computing step. The modes that are found at the later iterations of the algorithm are prone to higher errors and also, its correlation with the measurements is likely to be less than the ones that come before in the algorithm. When there are two modes that are very close together such as the particular cases at 40.5 Hz and 40.9 Hz respectively, depending on the set of input measurements, one of them may be found at the very far end of the algorithm compared to the other. Since one mode has been found earlier in the process, and its contribution to the residual has been extracted before, the error that occurs at the others would have minimal effects to the overall reconstruction result in the next stage. The same situation also occurs when users overestimate the number of modes. Then, around the final loops, the algorithm will certainly find some frequencies that do not correspond to any mode. As long as the overestimation is not too far from reality, this error in SOMP would have negligible effects on the reconstruction results in the next stage because the contributions of the few mismatched modes are generally significantly small compared to the correct ones.
Although being less robust, the RFP curve fitting method does have a clear advantage over the SOMP method regarding computational cost. SOMP not only performs an iterative mode finding process that requires a regular refreshing of the residual matrix but also does so using multiple costly matrix operations. The RFP method developed in [29] on the other hand, does not perform an iterative process and has taken into account several computational simplifications. For instance, on a conventional work station with 32GB RAM and four cores CPU of 3.4GHZ, in order to perform the results in Figure 3 using 25 microphones, the SOMP method usually would usually take 4-7 min to finish while the RFP would finish in 5-10 s. This significant difference will further increase if the number of input measurements or the number of modes increases. Overall, it can be seen that SOMP is a robust method that works best in cases where not much information about the room is available or where a blind estimation is required. RFP, on the other hand, requires more a priori information about the modes in the room to produce a coherent result. However, RFP generally takes much less processing time than SOMP and hence can potentially be beneficial in certain application such as online estimation or real-time sound field control. In terms of accuracy, it should be noted that under sufficient conditions, both methods are capable of producing a good estimation of the modal information of the room.

Local interpolation
From the outcomes of the algorithm, it is now possible to process and interpolate the responses at any point inside the geometry. The RFRs correspond to the transmissions between the source volume flow rate (in m 3 /s) and the sound pressure (in Pa) acquired at the measurement points. One example can be seen in Figure 4 for an arbitrary point far from the walls but also not too close to the center of the room. The interpolation was processed using both the SOMP and RFP method with the same set of 25 microphone positions in the room. It can be seen that both methods can produce an accurate interpolation of the response at this particular point.
To illustrate the difference between RFP and SOMP, one example of interpolation is plotted in Figure 5 where the total number of modes were underestimated. As an iterative process, SOMP would still give a good estimation of all the modes except the ones it does not find whereas RFP would add some modes that are not from the real system and hence will lead to higher errors. However, testing a few different trials for RFP can solve this problem and hence this method should not be overlooked as its computation time is short and can therefore be advantageous in many situations.
It must be noted, however, that the high level of accuracy seen in Figure 4 from both method is not yet guaranteed for every interpolated point in the room, and the error might be higher depending on the position of the point as well as on the precision of the modal identification results. This, once again, highlights the need for a spatial  representation of the sound field to confirm the global validity of the algorithm.

Sound field reconstruction
In this section, the interpolation process is extended to a large number of points inside the room to acquire a series of processed time responses of the room. The RFRs of the room can then be produced through the Fourier transform of these time responses. These resulting RFRs will allow the reconstruction of the spatial response of the room at any given frequency of interest.
It is known that, for a room with non-ideally rigid boundaries (a > 0), the Helmholtz equation is less valid close to the room walls [1]. Hence, the initial sound field reconstruction is performed for a shoe-box volume inside the room with each face being at least 1m away from the walls of the room. It is then possible to compare these results with frequency domain simulations, obtained using an FEM software, considered as the ground truth. Figure 6 shows three examples of the sound field reconstruction using 25 microphones, compared to such reference, at three different frequencies at very high spatial resolution. It can be observed that the reconstruction of the sound field yields qualitatively highly accurate results. The existence of the mode shapes is also clearly observed in all three examples. This proves that the spherical sampling technique for wave vectors is a powerful tool for rooms with complex geometries. Furthermore, this high level of accuracy is maintained in every direction of the 3D depiction since the input measurement points are spread randomly in the room. A few initial trials using a regular grid of microphones have not achieved such global precision in the results. This, once again, emphasizes the advantage of the much-recommended randomness that is used in common sparse and low-rank approximation frameworks. It is worth noticing that although there can be small differences when comparing the local sound pressure point by point, the general shapes as well as the separation between areas of high and low sound pressure are nevertheless precisely depicted. Furthermore the reconstruction of sound pressure field is accurate not just at the eigenfrequencies (45.25 Hz and 55.08 Hz) but also for frequencies in between two consecutive modes (e.g., at 38 Hz).
The normalized Pearson correlation coefficient for the amplitude of the frequency responses, calculated as below: can be used to evaluate the overall accuracy of the reconstructed frequency responseS f with respect to the reference response S f . Processing this coefficient to the regular grid of 11 Â 11 Â 11 points that samples the inner rectangular volume, yields an average correlation of 99.3% with a standard deviation of 0.8%. It is worth noting that COR % is a good indication of the overall fitting of the reconstructed signals in a bandwidth, but does not provide accurate clues for interpreting the precision frequency-wise. A global error evaluation will be introduced further in the section to address this subject. So far, the analysis has shown good results for the sound field reconstruction of a lightly damped room with a ¼ 0:01. In order to further assess its robustness in more conventional situations with acoustic treatments, the algorithm is tested with various room absorption condition. To verify this, a uniform absorption coefficient a is considered for the room walls, and set first at 0.1 and then increased to 0.3 to better represent a case of a damped room. Using the same number of microphones, the reconstruction of the sound field for these cases is performed as in the preceding case. Figure 7 shows the comparison of the reconstruction at the same room mode (but slightly different eigenfrequencies due to the resulting change of modal damping) between different values of wall absorption. The reconstruction results for these cases still present a good agreement with the reference ones. Not only that the framework captures correctly the reduction in terms of energy in the room but it also succeeds in rendering the smoothing effect of the spatial distribution as the room becomes more damped. Figure 8 shows a global comparison of the dimensionless normalized errors defined as: which quantifies the error of the reconstruction result at each frequency, normalized by R Sðf Þdf =Áf to discard the dependence on the acoustic energy difference in the room between different room absorption conditions. This quantity is then relevant for comparing the performance  of the reconstruction as a function of the room damping because it accounts for the acoustic energy not absorbed by the room, which, for the same reference source, decreases as the room gets more damped. The non-normalized error jSðf Þ ÀSðf Þj is computed for each recovery point in a 11 Â 11 Â 11 points grid that spatially samples the aforementioned shoe-box test volume. Averaging the error within the spatial grid will then give the average error at each each frequency jSðf Þ ÀSðf Þj. Figure 8 shows a decrease in terms of accuracy as the damping of the walls increases. This is explainable as the orthogonality assumption of the mode shape functions in equation (3) becomes weaker with higher damping in the room. Furthermore, the modal identification on the RFRs is generally more challenging in a room with high damping than in a lightly damped room. Regarding the correlation, the average COR % is still high at 98.3% with 1.8% of standard deviation (a ¼ 0:1) and 98.1% in average with 2.1% of standard deviation (a ¼ 0:3). Figure 8 also shows the error on the reconstruction sound field for a plane very close to a wall (maximum distance from the wall is 0.1 m). It can be observed that the sound field reconstruction near the wall induces higher errors, as can be anticipated. This is due to the aforementioned non-orthogonality of the mode shape function as well as the higher errors induced by extrapolation instead of interpolation as the concerned point is mostly outside of the microphones domain. The results from this evaluation are especially meaningful for modal equalization. It shows that this particular reconstruction framework can be used to effectively assess the sound field within a room before and after a given equalization method has been applied, which paves the way for a new tool for assessing the in situ performance of low-frequency room modes treatments [7] space-wise. Figure 8 also shows that the reconstruction error generally increases for higher frequencies. As the algorithm does not particularly favor lower-order modes over the higher ones, this indicates that something over the parameters estimation step affects the accuracy level. The first possible answer appears to be the complexity of the mode shape function. For a room with complex geometry, the complexity of the mode shape functions will also increase for higherorder room modes which will generally require a higher number of plane waves to converge. Even when using a least norm method to increase the number of plane waves, the compromise between regularization and instability of the mode shape approximation [30,31] means that the accuracy still relies heavily on the number of measurements available. Furthermore, as the frequency gets higher, the modal density will increase which means that the average distance (in Hz) between two consecutive modes will be smaller and induce more difficulties for the modal estimation framework.
So far, the number of measurement points (microphones) has not been mentioned. Figure 9 compares the Pearson Correlation criteria (space-wise average and standard deviation) processed for different number of evaluation microphones, and for different absorption coefficients of the walls. For each case, the algorithm is repeated multiple times with the same number of measurement points but each time the locations of the input measurements are chosen randomly from a set of 600 random points. This procedure is chosen so as to eliminate the bias that could emanate from the placement of the microphones, especially in the cases where the number of microphones is considerably low. For each case, the Pearson Correlation is calculated for the 11 Â 11 Â 11 grid that samples the shoebox-shaped reconstruction region. As can be seen, the correlation value gets higher as the number of microphones increases. The standard deviation value mentioned in this figure specifies the standard deviation of the correlation value between different interpolating points in the rectangular reconstruction region. A high standard deviation value will then indicate a highly uneven reconstruction accuracy in which the correlation of the reconstruction varies significantly depending on the location of the interpolation. Conversely, a low standard deviation indicates that the spatial reconstruction result is stable and can be trusted. Figure 9 shows that the correlation values improve as the number of measurement points increases. Furthermore, the standard deviation also decreases significantly when more measurement points are used for the framework. This shows that while the reconstruction gets more accurate, the estimation accuracy becomes also uniformly higher across all interpolation points in the volume. One of the reason is the more measurement points available, the better the chance to estimate correctly the room modes information. It can also be observed that for the analysis within a fixed bandwidth, the performance typically becomes stable and reliable when a certain number of measurement points is reached. In the case of a lightly damped room, for instance, a grid of size 1331 within a volume of 40 m 3 can be reconstructed with a high accuracy of 98.5% using just 30 input measurement points which is an effective result for a practical number of microphones. Furthermore, even with only 20 microphones, the result is still considered stable with a trusted average correlation around 95%.

Experimental results
The reconstruction framework is now applied to actual measurements inside the reverberation chamber at EPFL, which has the same geometry as the FEM model ( Fig. 10). The source, a custom-made subwoofer in a closed wooden cabinet, is located at a corner in the room to excite all room modes at low frequencies. The microphones (PCB 378B02 1/2 00 microphones) are spread randomly in the room to replicate the previous numerical analysis. The reference velocity of the source is measured with a laser velocimeter (Polytec OFV 500) placed in front of the loudspeaker diaphragm.
Two main methods can be used to evaluate the reconstruction results. One method is to directly compare the reconstructed sound field to the simulated one in FEM. This method can certainly verify the faithfulness of the spatial reproduction results but is not recommended for a point-by-point comparison as it is difficult to accurately match the FEM model with the real one, since it relies on the absorbing properties of the room which are not accurately known. Moreover, the reference used for processing the RFRs can be different between the simulation (volume flow) and the actual case (velocity) and it is difficult to accurately match the source excitation as well as its position. Thus, besides this method, a small part of the measurement points can be reserved to serve as an evaluation set. Combining these two evaluation methods provides a more concrete analysis of the reconstruction framework with the experimental data.
In this experiment, the signals from 25 microphone positions are used as the inputs of the algorithm to reproduce the sound field up to 75 Hz (within which about 20 modes can be observed). The microphones are located randomly in the room but were chosen so that they are practically evenly distributed space-wise to cover the area of the reconstructed rectangular volume. Figure 11 shows the spatial comparison between the reconstructed sound field and the reference one obtained from numerical simulation regarding the same eigenmodes. As the numerical model cannot be perfectly matched with the real room, there is a small difference in terms of the exact frequency of the eigenmodes. Comparing these results, it can be seen that similarly to the numerical results in the previous section, the reconstructed sound fields from real measurements yield highly accurate spatial recovery. The mode shapes are visible and the locations of nodal lines are correctly depicted with high spatial resolution. Once again, small mismatch in a few points is to be expected but the overall spatial representation remains to be faithful.
Using the evaluation set of 30 other microphone signals within the domain of interest, the results also agree with the previous simulation validation. Using SOMP for modal estimation, the average correlation stays at 97.8% with 1.89% of standard deviation. As expected, this result is slightly less accurate than the average correlation obtained with simulations but is still highly reliable. As mentioned earlier, SOMP is particularly robust and can perform well even when a priori information is missing. On the other hand, under the same circumstance, using a semi-supervised RFP curve fitting gives a slightly lower average correlation of 96.9% with a higher standard deviation of 2.3%. This result also agrees with the analysis in Section 4.2 regarding the different nature of the two methods. Three different examples of the reconstructed RFRs by both RFP and SOMP are plotted in Figure 12 and compared to the actual measurements from the evaluation set. Generally, without a detailed supervision and calibration, the RFP method will return a slightly less accurate result than SOMP as can be observed from the figure. However, its processing speed is much faster and hence could allow for re-calibration depending on the situation.
It should be noted that in practice, when the room geometries and wall absorption coefficient are not known, evaluation set like this along with the comparison parameters such as the correlation values are among the few available indications to know whether the reconstruction results are reliable. Therefore, practically, it is advised to always have a reserved evaluation set inside the domain of interest to navigate the adequate number of microphones required for any certain objective.  Regarding the experiment set-up, as the number of microphones is practically small, a blind random placement of microphones might leave out crucial areas of the room. Hence out of all the possible randomization, it is advised to choose an appropriate placement that does not leave out crucial areas of the region of interest. Moreover, placement technique like the one suggested in [30] might also be used to improve the recovery results. Lastly, it should be noticed that the measurement was conducted in a reverberation chamber without removing the reflective diffusing panels (Fig. 10). This shows that the framework is robust enough to perform well even in a practical nonempty room.

Conclusion
In this paper, we have investigated a robust sound field reconstruction framework in a room at low frequencies.
Through modal decomposition and plane wave approximation of mode shape functions, the framework allows recovering the entire sound pressure distribution of the room at any frequency within the concerned bandwidth, from a limited set of measurements. Within the framework, the performance of two different modal estimation methods in the time and frequency domain, namely SOMP and RFP, are compared. Both methods are shown to allow retrieving the modal parameters of the room. Between the two approaches, SOMP has been proven to be more robust whereas RFP has significant advantages in terms of computational cost.
The space-wise analysis of the reconstruction results confirms the practical applicability of the framework in the field of modal equalization. The reconstruction is performed inside a non-rectangular reverberation chamber using 20-30 microphones, which are proven sufficient to address the bandwidth of interest (containing around 20 modes). The results first show that the reconstruction is highly accurate for a lightly damped room. The framework is further tested by increasing the global absorption of the room walls. For these cases, the reconstruction shows a slight reduction in terms of accuracy, especially for positions close to the walls. This slight drop in accuracy is anticipated as it is generally more challenging to retrieve the modal parameters for highly damped room. Nevertheless, the overall reconstruction results retain a sufficiently high level of reliability. This means that the framework may be used to assess the space-wise performance of existing passive and active modal equalization methods. More importantly, the results of the method can be used as input for on-the-fly reconfiguration of active low frequency absorbers, such as the electroacoustic absorbers developed in [7]. Such in situ reconfigurability of active devices presents interesting potential for optimizing room mode equalization in real rooms, and should be further studied.
This paper tackles the case where a single source is fixed inside of the room. Further work should focus on retrieving the entire sound field for multiple source positions in the room. The microphones placement in this research are spread randomly in the room. However, considering a low number of microphones, the accuracy could benefit from a predefined microphones placement strategy, that should be further studied.