Issue: Acta Acust., Volume 5, 2021
Article Number: 18
Number of page(s): 10
Section: Noise Control
DOI: https://doi.org/10.1051/aacus/2021011
Published online: 14 April 2021
Technical & Applied Article
Direction of arrival estimation of partial sound sources of vehicles with a two-microphone array
^{1} Federal University of Rio de Janeiro, Electrical Engineering Program, 21941901 Rio de Janeiro, Brazil
^{2} RWTH Aachen University, Institute of Technical Acoustics, 52062 Aachen, Germany
^{*} Corresponding author: mvo@akustik.rwth-aachen.de
Received: 31 December 2020
Accepted: 15 March 2021
The generalized cross-correlation with phase transform (GCC-PHAT) algorithm has proved useful for blindly estimating the direction of arrival of compact sound sources from microphone array recordings. In applications with distributions of partial sources, such as the tires of vehicles in urban environments, GCC-PHAT needs to be improved; otherwise, the detected sound direction jumps between the directions of the main sources or takes an intermediate value between them. This paper presents an extension of GCC-PHAT, based on post-processing of the output delay matrix and on image processing techniques, that separately identifies the directions of the sound produced by the front and rear tires of moving vehicles. The proposed approach can be extended to identify the tire noise directions produced by vehicles with multiple axles. The algorithm performance is analyzed using pass-by measurements of two-axle vehicles, acquired by a two-microphone array. The experiments were conducted with passenger vehicles of four distinct models, running at different speeds. The experimental results show that the proposed method estimates the vehicle speed with an average absolute error of 10.8 km/h and the vehicle wheelbase with an average error of 26 cm. A possible application is multiple source characterization for parametric vehicle sound synthesis in auralization.
Key words: Vehicle pass-by / Partial source separation / Array measurement / Source characterization / Traffic noise auralization
© G.D. Rocha et al., Published by EDP Sciences, 2021
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1 Introduction
Noise maps are the main tool for assessing the sound distribution in urban environments. However, they do not portray the real auditory perception of a given urban area, because they are essentially a visual tool displaying long-term sound level averages. An additional approach is the use of auralization [1] in order to extend the noise assessment into audible sound, using recordings and/or simulated data [2–5]. The simulation of an urban acoustic scene must be based on an accurate model for the sound signals from the main sources and for propagation effects due to multiple paths of reflections and diffractions. In the context of urban noise, the most relevant sound sources in road traffic are light and heavy vehicles [6–9].
Car noise emissions contain contributions from various separate sources, which have distinct spectral contents and are spatially distributed throughout the vehicle. Among these sources are the tires, the engine and the exhaust system, the first one being dominant for vehicle speeds above 30 km/h [10–12]. This work aims at developing a signal processing method to track and separate noise contributions from the main vehicle sound sources, which, as observed in previous recordings, are the front and rear tires of two-axle vehicles. The sound of each individual source, corresponding to the tires of one of the axles, is obtained by modifying time delay estimation methods originally designed for a single sound source.
In [13], generalized cross-correlations combined with particle filtering were employed for tracking vehicle sounds. The vehicle noise was modeled as a bimodal source signal to account for the sound coming from both axles. This model produced more accurate estimates of car speed and wheelbase than those obtained with the unimodal source model. In a subsequent work [14], the authors proposed a methodology for choosing the appropriate inter-sensor distance to improve the accuracy of wheelbase estimates from pass-by observations, and addressed the problem of tracking road vehicles. However, the separate identification of sound directions coming from the front and rear axles has not been addressed in [14], nor, as far as we know, in any other publication.
It is relevant to consider complex partial sound sources in their spatially specific components. In far-field conditions, a sound source may well be characterized by a concentrated source with a directional pattern. This might apply to street vehicles or airplanes at very large distances. When it comes to the simulation of dynamic scenes with moving sound sources and moving listeners, near-field conditions are also relevant, so that the vehicle is perceived as a spatially extended source. This is obvious for trains and trams. However, for street vehicles such as cars there is no comprehensive approach to this aspect. In this paper, we develop a technique to extract data for extended sound sources. These data can be used for physically-based synthesis of virtual sources in virtual acoustic environments.
The estimation of speed and wheelbase from acoustic signals has been reported in the literature, with the former [13, 15–18] receiving more attention than the latter [14, 15]. In [16], for instance, a maximum likelihood approach is proposed for speed estimation, which resulted in more accurate and robust estimates when compared to the conventional analysis of sequential short-time cross-correlations.
The direction of arrival (DoA) method proposed in this paper combines time difference of arrival (TDoA) estimates of audio signals in a pair of microphones during a given observation interval. From the results presented in [19], which compared several DoA estimation algorithms for car pass-by recordings, it was concluded that the best TDoA estimates were obtained with the generalized cross-correlation with phase transform (GCC-PHAT) method [20, 21]. In light of this result, the GCC-PHAT algorithm is employed in this work to generate cross-correlation estimates of the signals simultaneously acquired by the two microphones, for different time lags. A matrix is formed with the cross-correlation values of successive time intervals recorded while the car is close to the microphone array. By applying image processing techniques to this matrix, the time arrival differences of the predominant source signals are emphasized. Then, using a curve fitting procedure, which employs a dynamical model for the vehicle sound signal, the directions of sound arrival from the front and rear tires are obtained. From parameters extracted from these curves, car speed and wheelbase estimates can be obtained. Since information from the entire pass-by time range is employed to generate such estimates, they are expected to be more accurate than those based on data from short periods.
2 Direction of arrival estimation
Acoustic source localization can be achieved by time-delay estimation when more than one input channel is available. Let us consider the setup depicted in Figure 1. A two-microphone array with inter-sensor distance d is placed parallel to a road lane. If a single source contribution is assumed, the microphone signals are modeled as
$$x_1(t) = s(t) + n_1(t), \tag{1}$$
$$x_2(t) = s(t - \tau_0) + n_2(t), \tag{2}$$
where s(t) is the source signal and n _{1}(t), n _{2}(t) are the noise components.
Figure 1 Two-microphone setup for delay τ _{0} and direction of arrival ϕ estimation. 
Assuming free-field conditions, the DoA needed for tracking the source, represented by the azimuth angle ϕ, can be estimated by Equation (3), where c is the sound speed and τ _{0} denotes the TDoA between the two microphone signals.
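As an illustration of the TDoA-to-angle conversion, the following minimal sketch assumes a far-field source and an azimuth ϕ measured from the array broadside, so that τ _{0} = d sin(ϕ)/c. This angle convention, the function name doa_from_tdoa and the sound speed of 343 m/s are assumptions made for the example and are not taken from Figure 1.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees Celsius


def doa_from_tdoa(tau0, d, c=SPEED_OF_SOUND):
    """Return the azimuth in radians for a far-field source.

    Assumed convention: tau0 = d * sin(phi) / c, with phi measured from
    broadside, so tau0 = 0 means the source is directly in front of the array.
    """
    ratio = c * tau0 / d
    # Measurement noise can push the ratio slightly outside [-1, 1];
    # clamp it so that asin stays defined.
    ratio = max(-1.0, min(1.0, ratio))
    return math.asin(ratio)


# Example with the inter-sensor distance used in the experiments (d = 0.25 m).
angle_deg = math.degrees(doa_from_tdoa(0.5 * 0.25 / SPEED_OF_SOUND, d=0.25))
```

With this convention, |τ _{0}| can never exceed d/c, which is why the ratio is clamped before the inverse sine is taken.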
The estimator presented next was tested for DoA estimation of vehicle sound sources in a previous work [19], in which five algorithms were compared on this task. The results indicated two algorithms as suitable choices, as shown next.
2.1 Generalized cross-correlation method
The cross-correlation is a measure of similarity between two signals, expressed as a function of the lag τ between them. For white noise signals that differ only by a time delay τ _{0}, the cross-correlation function presents a well-defined peak, with the maximum value occurring for a lag τ equal to τ _{0}. For colored noise, such as tire noise, the generalized cross-correlation method applies a normalization function to the cross-power spectrum of the two microphone signals in order to obtain a more prominent peak in the cross-correlation function [20], making it easier to estimate the TDoA.
The cross-power spectrum of the two-microphone discrete-time signals, x _{1}(n) and x _{2}(n), can be recursively estimated by [22],
$$\hat{S}_{x_1 x_2}(m, k) = \alpha\, \hat{S}_{x_1 x_2}(m - 1, k) + (1 - \alpha)\, X_1^{*}(m, k)\, X_2(m, k), \tag{4}$$
where X _{1}(m, k) and X _{2}(m, k) are, respectively, the N-point discrete Fourier transforms (DFT) of the windowed microphone signals w(n − mJ)x _{1}(n) and w(n − mJ)x _{2}(n), the asterisk denotes complex conjugation, m is the frame index and k ∈ {0, …, N − 1} is the discrete frequency index. The N-length sequence w(n) usually employed in audio applications is the Hamming window with shift size J = N/2. The exponential weighting coefficient α is empirically set to α = 0.7.
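The generalized cross-correlation function for frame m is then calculated as,
$$\hat{r}^{\mathrm{PHAT}}_{x_1 x_2}(m, n) = \frac{1}{N} \sum_{k=0}^{N-1} \frac{\hat{S}_{x_1 x_2}(m, k)}{\left|\hat{S}_{x_1 x_2}(m, k)\right|}\, e^{j 2 \pi k n / N}, \tag{5}$$
with n ∈ {0, …, N − 1}. The power spectrum normalization factor employed in Equation (5) is the magnitude of the cross-power estimate, resulting in a spectral function known as the phase transform (PHAT) of the cross-power spectrum estimate $\hat{S}_{x_1 x_2}(m, k)$. The inverse DFT of this function produces the generalized cross-correlation function $\hat{r}^{\mathrm{PHAT}}_{x_1 x_2}(m, n)$, which presents sharper peaks for most audio signals and gives rise to the GCC-PHAT method [20].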
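Finally, the TDoA for frame m is estimated from the generalized cross-correlation function $\hat{r}^{\mathrm{PHAT}}_{x_1 x_2}(m, n)$ as follows:
$$\hat{\tau}_0(m) = T\, \underset{n}{\arg\max}\; \hat{r}^{\mathrm{PHAT}}_{x_1 x_2}(m, n), \tag{6}$$
where T is the sampling period of the discrete-time microphone signals x _{1}(n) and x _{2}(n).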
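The processing chain of this section (windowed frames, recursive cross-power estimate, PHAT normalization, inverse DFT and peak picking) can be sketched with NumPy as follows. This is an illustrative sketch and not the authors' implementation: the function names, the frame length n_fft = 1024, the conjugation of the first channel, the placement of α on the previous estimate and the centring of the lags with fftshift are assumptions. The Hamming window, the shift size N/2 and α = 0.7 follow the text.

```python
import numpy as np


def gcc_phat_matrix(x1, x2, n_fft=1024, alpha=0.7, eps=1e-12):
    """Return a (frames, n_fft) array of GCC-PHAT values.

    Column i corresponds to the lag i - n_fft // 2 (in samples).
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    hop = n_fft // 2                       # shift size J = N / 2
    window = np.hamming(n_fft)
    n_frames = 1 + (len(x1) - n_fft) // hop
    s12 = np.zeros(n_fft, dtype=complex)   # recursive cross-power estimate
    gcc = np.empty((n_frames, n_fft))
    for m in range(n_frames):
        seg = slice(m * hop, m * hop + n_fft)
        X1 = np.fft.fft(window * x1[seg])
        X2 = np.fft.fft(window * x2[seg])
        # Exponentially weighted update (assumed form: alpha weights the past).
        s12 = alpha * s12 + (1.0 - alpha) * np.conj(X1) * X2
        # PHAT: discard the magnitude and keep only the phase.
        r = np.fft.ifft(s12 / (np.abs(s12) + eps)).real
        gcc[m] = np.fft.fftshift(r)        # move lag 0 to index n_fft // 2
    return gcc


def tdoa_per_frame(gcc, fs):
    """Convert the lag of the largest GCC value of each frame to seconds."""
    lags = np.arange(gcc.shape[1]) - gcc.shape[1] // 2
    return lags[np.argmax(gcc, axis=1)] / fs
```

Under these conventions, a second channel that lags the first one by D samples yields a peak at lag +D. The array returned by gcc_phat_matrix plays the role of a delay matrix holding one row of cross-correlation values per frame.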
3 Two-axle vehicle tracking
The TDoA estimation of Equation (6) provides a single value for each time window, corresponding to the sound of the dominant source captured by the microphone signals. Therefore, the GCC-PHAT algorithm, derived for a single sound source, is not suitable for separately detecting the various noises produced by vehicles. The generalized cross-correlation function of Equation (5), calculated from the signals of a car pass-by acquired by a two-microphone horizontal array, is shown in Figure 2, where the darker the gray level, the higher the generalized cross-correlation value. The GCC-PHAT estimates, obtained from the dominant GCC peaks by Equation (6), are shown by the blue curve. It can be observed that when the car passes in front of the microphone array, the estimated TDoA alternates between the directions of the front and rear vehicle axles.
Figure 2 TDoA estimate obtained by the GCC-PHAT algorithm. 
To overcome this problem, we propose the two-source DoA estimation system depicted in Figure 3, where image processing techniques followed by a curve fitting method are appended to the single-source GCC-PHAT algorithm. Although GCC-PHAT is used throughout this work, the system was developed to allow changing and testing different single-source TDoA estimators, such as those examined in [19]. Alternative algorithms, such as the minimum-variance distortionless response (MVDR) [23, 24] and least-mean-square (LMS) [25, 26] algorithms, can obtain the TDoA estimates using different criteria, but they must have the essential characteristics that are exploited in our system. All single-source estimation algorithms must use two-channel audio recordings and provide as output a matrix A(t, τ), which is a function of the time index t and of the possible discrete time delays τ between the microphone signals. In the GCC-PHAT algorithm, this matrix corresponds to the generalized cross-correlation estimate of Equation (5), from which the delays are obtained through Equation (6).
Figure 3 Schematic diagram of the proposed two-source DoA estimation system. 
Single-source estimators in their original formulation provide a single estimated DoA curve as output, formed by the location of the maximum value in each time frame. This single-source estimation stage is bypassed in the proposed system and replaced by processing that accounts for two sources. The matrix A(t, τ) is used as input data for the image processing stage, which handles the information of all relevant time frames together. This is possible for offline applications such as the one targeted in this project.
Besides the data from the single-source TDoA estimator, the curve fitting algorithm must be fed with a curve model. A rough model for TDoA curves is derived in agreement with the pass-by dynamics and geometry, as shown in Section 3.1. Moreover, the steps comprised in the “Two-source Extension” block are explained in Sections 3.2 and 3.3. The data preprocessing stage aims at cleaning and adjusting the input matrix A(t, τ) by removing spurious and irrelevant data. This is carried out in order to improve the curve fitting performance.
3.1 Theoretical TDoA model
The theoretical evolution of TDoA over time is obtained by calculating the difference in sound path between the source and the two microphones separated by a distance d.
In the following derivation, it is assumed that: the source is at ground level, in the z = 0 plane; the microphone array is in the y = 0 plane, with its axis parallel to the z = 0 plane and at the height h, measured from the floor up to the array center; the vehicle velocity v is parallel to the x-axis and has constant magnitude v _{ x }. From simple trigonometric relations and assuming that the vehicle speed is much smaller than the sound speed, it can be demonstrated that the time difference of arrival of the source sound at the two sensors at a given instant t is given by Equations (7)–(9), with s _{ x } and s _{ y } equal to the distances between the source and the x = 0 and y = 0 planes, respectively.
An illustration of the theoretical TDoA behavior for constant speed is shown in Figure 4, where τ was obtained from Equations (7)–(10) with v _{ x } = 60 km/h, s _{ y } = 3 m, d = 0.25 m and t _{0} = 1.5 s, the time instant at which the vehicle passes exactly in front of the microphone array.
Figure 4 Theoretical TDoA curve for a 60 km/h moving vehicle. 
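To make the shape of such a curve concrete, the sketch below evaluates a path-difference model under an explicitly assumed geometry: microphones at (−d/2, 0, h) and (+d/2, 0, h), a ground-level source at (v _{ x }(t − t _{0}), s _{ y }, 0), and τ equal to the difference of the two path lengths divided by c. This geometry, the array height h = 1.0 m, the sound speed of 343 m/s and the function name tdoa_model are assumptions made for illustration; the sketch is not a transcription of Equations (7)–(10). The values of v _{ x }, s _{ y }, d and t _{0} are those quoted for Figure 4.

```python
import numpy as np

C_SOUND = 343.0  # m/s, assumed


def tdoa_model(t, v_x, s_y, t0, d=0.25, h=1.0, c=C_SOUND):
    """TDoA in seconds of a ground-level source moving along x at constant speed.

    Assumed geometry: microphones at (-d/2, 0, h) and (+d/2, 0, h), source at
    (v_x * (t - t0), s_y, 0). Speeds in m/s, lengths in m, times in s.
    """
    s_x = v_x * (np.asarray(t, dtype=float) - t0)
    l1 = np.sqrt((s_x + d / 2) ** 2 + s_y ** 2 + h ** 2)  # path to microphone 1
    l2 = np.sqrt((s_x - d / 2) ** 2 + s_y ** 2 + h ** 2)  # path to microphone 2
    return (l1 - l2) / c


# Parameters quoted for Figure 4: 60 km/h, s_y = 3 m, d = 0.25 m, t0 = 1.5 s.
t = np.linspace(0.0, 3.0, 301)
tau = tdoa_model(t, v_x=60 / 3.6, s_y=3.0, t0=1.5)
```

In this assumed geometry the curve is antisymmetric about t _{0}, crosses τ = 0 at t = t _{0}, and saturates towards ±d/c when the source is far from the array.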
3.2 Data preprocessing
The data provided by the single-source DoA estimation algorithm goes through a preprocessing stage consisting of level scaling and noise reduction to adjust it to the curve fitting algorithm. Using GCC-PHAT, the matrix A(t, τ) contains the generalized cross-correlation between the two microphone signals for different time windows and lag values. Alternative DoA estimation algorithms can provide metrics other than the cross-correlation. The scaling of the input data is, therefore, needed in order to adjust the two-dimensional data contained in A(t, τ), so that it can be treated as a digital image representation, with values in the grayscale range. This image will be further processed by two-dimensional filters to reduce noise.
Due to the decreasing amplitude of the sound propagating over long distances between the source and the microphones, the relevant audio data is concentrated around t = t _{0}, when the source-receiver distance is minimum and the signal-to-noise ratio is maximum. Thus, the data scaling is followed by cropping, which discards irrelevant audio data acquired when the vehicle is far away from the microphones. A 3-length data window is selected around the sample corresponding to maximum signal power (t = t _{0}) and the rest is ignored.
In a real pass-by scenario, interfering noise generated by sources other than the vehicle can corrupt the recordings. Depending on the position, intensity and spectral content of the noise source, this interference can seriously impair the performance of the curve fitting method. For this reason, the input data is filtered in an attempt to reduce noise.
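The scaling and cropping steps can be sketched as below. The sketch assumes that A(t, τ) is stored as a NumPy array with one row per time frame, that the gray levels are normalized to the range [0, 1], and that a per-frame signal power vector is available to locate t _{0}. The function name scale_and_crop and the arguments frame_power and frames_to_keep are hypothetical.

```python
import numpy as np


def scale_and_crop(a, frame_power, frames_to_keep):
    """Rescale A(t, tau) to [0, 1] and keep a window around the loudest frame.

    a:              (frames, lags) array from the single-source estimator
    frame_power:    signal power of each frame (hypothetical input)
    frames_to_keep: number of frames to retain around the power maximum
    Returns the cropped gray-level image and the index of its first frame.
    """
    a = np.asarray(a, dtype=float)
    span = a.max() - a.min()
    gray = (a - a.min()) / span if span > 0 else np.zeros_like(a)
    centre = int(np.argmax(frame_power))            # frame taken as t0
    start = max(0, centre - frames_to_keep // 2)
    stop = min(a.shape[0], start + frames_to_keep)
    return gray[start:stop], start
```

Returning the index of the first retained frame makes it possible to map the cropped rows back to absolute time when the fitted curves are interpreted.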
First, the grayscale image is converted into a binary image using a threshold value, selected based on empirical tests. A binary image, composed of either zero or one values, is required by signal processing algorithms to perform morphological operations, such as dilation and erosion. Such operations are used to reduce noise and to provide more precise edge detection to fit the DoA estimation. This empirical optimization leads to different threshold values for the alternative single-source DoA estimators, as a consequence of the diversity in the pixel distribution of the generated images. To illustrate this effect, the images generated by GCC-PHAT and MVDR are shown in Figure 5, together with the respective histogram plots. The histograms indicate how the pixels are distributed across gray levels. While for GCC-PHAT the pixels are highly concentrated around the 0.3 level, for MVDR they are more evenly distributed across the different levels. This clear difference between images indicates the relevance of choosing an appropriate threshold value. Binary images generated for different thresholds are depicted in Figure 6. A trade-off between removing noise and maintaining sufficient relevant data is observed.
Figure 5 Images obtained with (a) GCC-PHAT and (b) MVDR single-source estimators, with respective pixel histograms in (c) and (d). 
Figure 6 Binary images after applying the following threshold values: (a) 0.025, (b) 0.1 and (c) 0.25. 
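The thresholding step and the trade-off it introduces can be illustrated with the short sketch below. It assumes a gray-level image scaled to [0, 1] in which pixels above the threshold are set to one; the function names binarize and kept_fraction are hypothetical. The three candidate thresholds are those of Figure 6.

```python
import numpy as np

FIGURE_6_THRESHOLDS = (0.025, 0.1, 0.25)


def binarize(gray, threshold):
    """Return a boolean image that is True where the gray level exceeds the threshold."""
    return np.asarray(gray, dtype=float) > threshold


def kept_fraction(gray, thresholds=FIGURE_6_THRESHOLDS):
    """Fraction of pixels that survive each threshold, for comparing candidates."""
    return {th: float(binarize(gray, th).mean()) for th in thresholds}
```

Comparing the surviving fractions for several thresholds gives a quick numerical view of how much data each candidate value discards.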
Next, we describe the image processing approach proposed for noise reduction, with the illustrative results of the main steps shown in Figure 7. The morphological opening operation (erosion followed by dilation [27, 28]) is performed on the binary image using a square structuring element, which eliminates isolated black pixels. Most noise is removed after the opening operation, as can be observed in Figure 7c.
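A minimal sketch of the opening step, using scipy.ndimage.binary_opening with a square structuring element, is given below. In the standard definition, opening is an erosion followed by a dilation. The structuring element size of 3 × 3 pixels and the function name remove_isolated_pixels are assumptions, since the size used in the experiments is not stated.

```python
import numpy as np
from scipy import ndimage


def remove_isolated_pixels(binary, size=3):
    """Morphological opening (erosion, then dilation) with a size x size square.

    Foreground structures smaller than the structuring element are removed,
    while larger ones keep their extent.
    """
    structure = np.ones((size, size), dtype=bool)
    return ndimage.binary_opening(np.asarray(binary, dtype=bool), structure=structure)


# Toy image: one 5 x 5 block (kept) and one isolated pixel (removed).
image = np.zeros((12, 12), dtype=bool)
image[2:7, 2:7] = True
image[9, 9] = True
cleaned = remove_isolated_pixels(image)
```

A larger structuring element removes more noise but also erases thin parts of the delay curves, which mirrors the trade-off discussed for the threshold value.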
Figure 7 Illustration of preprocessing and curve fitting results. (a) Grayscale image; (b) binary image; (c) image after morphological opening; (d) mean TDoA curve; (e) selected data; (f) fitted curves. 
In this image, two main parallel curves are highlighted, indicating the presence of two dominant noise sources. The time shift between them is consistent with sounds emitted by two sources separated by a distance equal to the average wheelbase of passenger cars. Therefore, it can be concluded that these sound components result from the noise generated by the vehicle’s front and rear axles. This conclusion is supported by the work presented in [14], in which a pair of loudspeakers was placed in front of the vehicle’s wheels and the resulting TDoA estimate exhibits the same pattern observed in Figure 7a.
In order to track the noise emitted by the tires of each axle separately, two distinct data sets must be provided to the curve fitting algorithm. Firstly, an “average curve” is calculated to define the points which belong to each axle. This curve is represented by the red line in Figure 7d and is obtained by averaging the indexes of the nonzero values of each column of the binary image. Then, this curve is used as the border line to separate the data into two sets. The data to the left of the average curve, represented in blue in Figure 7e, is associated with the emission of the front-axle tires (since the left curve appears ahead in time). Likewise, the data to the right of the average curve, displayed in orange in Figure 7e, is associated with the emission of the rear-axle tires. These two data sets, corresponding to the left and right pixels with respect to the average TDoA curve, respectively, are delivered to the curve fitting stage.
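The separation into two data sets can be sketched as follows. The sketch adopts one possible reading of the procedure: the binary image is indexed as binary[frame, lag], the average curve holds the mean frame index of the nonzero pixels of each lag column, and pixels that occur before this mean are assigned to the front axle. The array orientation and the function name split_by_average_curve are assumptions.

```python
import numpy as np


def split_by_average_curve(binary):
    """Split the nonzero pixels of binary[frame, lag] into early and late sets.

    For each lag column, the mean frame index of its nonzero pixels defines the
    average curve. Pixels before the curve are labelled front axle, the others
    rear axle. Returns two (n, 2) arrays of (frame, lag) pairs and the curve.
    """
    binary = np.asarray(binary, dtype=bool)
    frames, lags = np.nonzero(binary)
    n_lags = binary.shape[1]
    counts = np.bincount(lags, minlength=n_lags)
    sums = np.bincount(lags, weights=frames, minlength=n_lags)
    # Columns without any nonzero pixel get NaN in the average curve.
    mean_frame = np.divide(sums, counts, out=np.full(n_lags, np.nan), where=counts > 0)
    early = frames < mean_frame[lags]
    front = np.column_stack((frames[early], lags[early]))
    rear = np.column_stack((frames[~early], lags[~early]))
    return front, rear, mean_frame
```

Each returned set can then be converted from (frame, lag) indexes to (t, τ) samples and passed to the curve fitting stage.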
3.3 Curve fitting method
The theoretical model obtained for the TDoA in Equation (7) serves here as the prototype to fit the data, and the least-squares algorithm is used as optimizer. In light of the nonlinear theoretical TDoA graph in Figure 4, the curve fitting method is applied to the image resulting from the preprocessing stage using a trust-region method for nonlinear optimization [29–32]. In addition, spurious data may survive the preprocessing stage and appear as anomalies in the selected data. This is especially problematic, given the known sensitivity of least-squares algorithms to outliers. For this reason, a robust least-squares approach is employed, in which the cost function is iteratively reweighted using bisquare weights [33, 34].
The curve parameters are initialized with random values from a Gaussian distribution. The unknown variables in Equation (7) are the instant t _{0}, distance s _{ y } and the speed v _{ x }. For each of them, we can define lower and upper limits. These intervals can be used as prior knowledge for initialization by setting the mean values of the random distributions as the centers of the defined intervals. The result is a fitted curve for each data vector, as shown in Figure 7f.
The fitting algorithm finds the model parameter values that minimize the mean squared error between data points and the corresponding fitted curve values. The speed is a parameter of the model of Equation (7) and is estimated directly in the optimization process. The two curves are adjusted independently and the estimated parameters, including speed, may be slightly different for each axle. For simplicity, the average value is used as speed estimate. On the other hand, the wheelbase is estimated as the distance between the fitted curves. This distance is evaluated when the delay τ = 0, that is, when each axle is symmetrically in front of the microphone array. In a controlled test scenario, the optimized parameters can be compared to the actual values as a measure of the accuracy of the algorithm.
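The fitting and estimation steps can be sketched with scipy.optimize.least_squares, which provides a bounded trust-region method ("trf"). Several choices below are assumptions made for illustration: the path-difference geometry of tdoa_model (microphones at (±d/2, 0, h), source on the ground), the array height h = 1.0 m, the sound speed of 343 m/s, and the function names. SciPy does not offer bisquare weights in this routine, so its built-in soft_l1 robust loss is used here as a stand-in. The wheelbase is computed as the average speed multiplied by the time gap between the τ = 0 crossings of the two fitted curves, which is one reading of "the distance between the fitted curves".

```python
import numpy as np
from scipy.optimize import least_squares

C_SOUND = 343.0  # m/s, assumed


def tdoa_model(params, t, d=0.25, h=1.0):
    """Assumed model: mics at (-d/2, 0, h) and (+d/2, 0, h), source on the ground."""
    t0, s_y, v_x = params
    s_x = v_x * (t - t0)
    l1 = np.sqrt((s_x + d / 2) ** 2 + s_y ** 2 + h ** 2)
    l2 = np.sqrt((s_x - d / 2) ** 2 + s_y ** 2 + h ** 2)
    return (l1 - l2) / C_SOUND


def fit_axle(t, tau, lower, upper, rng):
    """Fit (t0, s_y, v_x) to the (t, tau) samples of one axle."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    # Gaussian initial guess centred on each interval, clipped to stay feasible.
    x0 = np.clip(rng.normal((lower + upper) / 2, (upper - lower) / 10), lower, upper)

    def residuals_ms(params):
        return (tdoa_model(params, t) - tau) * 1e3  # residuals in milliseconds

    fit = least_squares(residuals_ms, x0, bounds=(lower, upper),
                        method="trf", loss="soft_l1", f_scale=0.1)
    return fit.x


def speed_and_wheelbase(front_params, rear_params):
    """Average the two speeds; wheelbase = speed * gap between tau = 0 crossings."""
    speed = 0.5 * (front_params[2] + rear_params[2])
    wheelbase = speed * abs(rear_params[0] - front_params[0])
    return speed, wheelbase
```

In this assumed geometry, with d much smaller than the source distance, τ depends on v _{ x } and s _{ y } almost only through the ratio v _{ x }/√(s _{ y }² + h²). Narrow bounds on s _{ y } therefore help to pin down the speed in the sketch, whereas t _{0} is determined much more sharply.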
4 Experimental results
A set of experiments was conducted with signals acquired by an array of two microphones, aligned horizontally and spaced by d = 0.25 m. The experiments consisted of cars passing in front of the array, one at a time, in a quiet region without traffic and with negligible background noise, located at the Brazilian National Institute of Metrology (INMETRO), Rio de Janeiro, Brazil. Four different passenger cars were used in the experiments, as detailed in Table 1. Each car passed by the microphone array at constant speeds of around 30, 50, 60 and 70 km/h and at a distance s _{ y } = 2.0 ± 0.2 m from the array origin, as depicted in Figure 8. A GPS device placed inside the vehicle was used to estimate the speed, based on the car position and time. The vehicle wheelbases were obtained from information provided by their respective manufacturers.
Figure 8 3D representation of the measurement setup: microphone array parallel to the vehicle trajectory; height h from the floor up to the array center; distance l between the source and the reference sensor. 
Table 1 Vehicles used in pass-by tests.
The performance of the proposed system was evaluated by comparing the actual values of the speed and wheelbase of the vehicles, measured during the experiments, with the estimated ones. The speed estimates are represented by blue circles in Figure 9 for each test, while measured values are represented by red stars. Test results were sorted in ascending order of the measured speed values for visualization purposes. The average absolute error was 10.8 km/h. It can be seen in this figure that the estimation error clearly increases for higher speeds, especially for values above 50 km/h. If only the experiments with measured speed below 50 km/h are considered, the average absolute error is 6.3 km/h. On the other hand, if only the experiments with measured speed above 50 km/h are considered, the average absolute error increases to 18.1 km/h.
Figure 9 Speed estimates obtained using the proposed system with random parameter initialization. 
A tendency to underestimate the car velocity is observed in Figure 9, especially for speeds above 50 km/h. One possible cause is the simplistic model used for the theoretical evolution of the TDoA over time, which assumes a constant speed much smaller than the speed of sound and does not take the Doppler effect into account. The curves obtained for Test 22, which presented the largest speed error, are shown in blue in Figure 10. Despite the poor speed estimate, the fitted curves are well adjusted to the delays obtained for the noise of the two axles of the vehicle, indicating that the error is not caused by the fitting optimization process, but by the theoretical model of the curve.
Figure 10 Tracking results for the highest-speed-error test. 
The wheelbase estimation results are presented in Figure 11 for the experiments in ascending speed order. Blue circles and red stars indicate estimated and measured values, respectively. The average estimation error was 26 cm. Unlike the speed estimates, the wheelbase results do not appear to be correlated with the vehicle speed. This is in agreement with the sensitivity study carried out in [13], where an increase in vehicle speed had no influence on wheelbase estimation, but caused a larger mean error and standard deviation in speed estimation.
Figure 11 Wheelbase estimates obtained using the proposed system with random parameter initialization. Errors of Test IDs 3 and 8 are due to the loud engine noise of Car 2, which was prevented from changing gears during the passage. 
Tests 3 and 8 stand out in Figure 11 for presenting high errors. These two tests share similarities which explain the obtained result. The audio files used in Tests 3 and 8 contain sound emissions from the same vehicle, identified as Car 2 in Table 1. Car 2 emissions were also recorded in Tests 4, 15, 16, 19 and 22, which resulted in estimation errors below 30 cm. However, for pass-by trials 3 and 8, the vehicle was forced to travel in low gear and at high engine speed. Engine noise is affected by engine speed [4] and could increase to a level comparable to or higher than tire noise. The proposed system assumes that the tire noise is the dominant sound source and, when this condition is not met, such as in Tests 3 and 8, the estimation of delay and wheelbase is expected to fail.
Given that the same four vehicles were used throughout the 23 experiments, an average wheelbase estimate is calculated for each vehicle and depicted in Figure 12. For each vehicle, a blue circle indicates the wheelbase estimate averaged over all trials in which the vehicle was recorded and a red star indicates the measured value. Car 2 presented the highest wheelbase estimation error, equal to 32 cm, due to increased engine noise, whereas Cars 1, 3 and 4 presented errors of 14, 24 and 12 cm, respectively.
Figure 12 Average wheelbase estimate for each vehicle. 
A second set of estimations is performed using measured speed values to initialize the parameters, instead of random initialization. As expected, the speed estimation error decreases, as depicted in Figure 13, and the average absolute error in this scenario is 4.9 km/h. The tendency to underestimate is even more noticeable, as it occurs in 22 (or 96%) of the 23 tests. The increasing error for higher speeds is still present, although the average absolute error for speeds above 50 km/h decreased to 7.4 km/h.
Figure 13 Speed estimates obtained using the proposed system and parameter initialization with measured speed values. 
In contrast, wheelbase estimation is barely affected by this change in initialization. The estimates in Figure 14 are almost identical to the ones in Figure 11 and the same 26 cm average error was obtained. The wheelbase is estimated as the distance between both fitted curves when τ = 0. Therefore, this value is highly correlated to parameter t _{0}, which indicates the instant when the curve crosses τ = 0. Speed parameter, in contrast, affects the curve slope around t _{0} and for that reason does not present an impact on wheelbase estimates.
Figure 14 Wheelbase estimates obtained using the proposed system and parameter initialization with measured speed values. 
5 Conclusion
In this paper, we presented a method for separately tracking the axles of a two-axle vehicle using a pair of microphones. The approach is divided into two main steps. First, a time difference of arrival estimator calculates short-time cross-correlations between the microphone signals over an observation time interval, which are preprocessed and stored in a data matrix. Secondly, the matrix with the accumulated processed data is used to obtain two curves, corresponding to the directions of the tire noise of the two axles. From the curve fitting results, the speed and wheelbase of the vehicle are estimated. This modular approach makes it easier to test new algorithms and models and to compare their performance.
In the preprocessing stage, the concatenated cross-correlation data is processed as a whole using image processing techniques. Thresholding and opening operations are applied to the image for noise reduction. The choice of the threshold value is critical for the overall system performance and should be further investigated and optimized. Image pixels are then separated into two data vectors, which are separately used in the two-curve fitting model. The proposed system relies on a theoretical model for the time difference of arrival which should be improved in further studies.
The results for vehicle noise tracking were satisfactory and can be applied, as intended, for obtaining source models to be used in acoustic virtual reality systems. The speed estimate is not accurate, especially at high speeds: the average absolute speed error was 6.3 km/h for the experiments below 50 km/h, 18.1 km/h for the experiments above 50 km/h, and 10.8 km/h over all experiments. In contrast, the wheelbase estimation errors were not correlated to the speed estimation errors, with an average absolute error equal to 26 cm. With this, we can continue to build improved models of virtual vehicles as sources in virtual acoustic environments.
It is planned to apply the technique in long-term measurements at busy roads. With additional methods to identify categories of vehicle types and their speeds by video annotation, the aim is to fill a database of parametric data of street vehicles.
Conflict of interest
The authors declared no conflict of interest.
Acknowledgments
The authors would like to thank CAPES and DAAD for partially supporting this research. We are also grateful to the National Institute of Metrology, Standardization and Industrial Quality (INMETRO), in the persons of Paulo M. Massarani and Zemar M. D. Soares, for assistance with the recordings, and to Prof. Fernando A. N. C. Pinto, from the Laboratory of Acoustics and Vibration of COPPE/UFRJ, for instrumentation support.
Cite this article as: Rocha GD, Torres JCB, Petraglia MR & Vorländer M. 2021. Direction of arrival estimation of partial sound sources of vehicles with a two-microphone array. Acta Acustica, 5, 18.
All Figures
Figure 1 Two-microphone setup for delay τ_{0} and direction of arrival ϕ estimation.
Figure 2 TDoA estimate obtained by the GCC-PHAT algorithm.
Figure 3 Schematic diagram of the proposed two-source DoA estimation system.
Figure 4 Theoretical TDoA curve for a 60 km/h moving vehicle.
Figure 5 Images obtained with (a) GCC-PHAT and (b) MVDR single-source estimators, with respective pixel histograms in (c) and (d).
Figure 6 Binary images after applying the following threshold values: (a) 0.025, (b) 0.1 and (c) 0.25.
Figure 7 Illustration of preprocessing and curve fitting results. (a) Grayscale image; (b) binary image; (c) image after morphological opening; (d) mean TDoA curve; (e) selected data; (f) fitted curves.
Figure 8 3D representation of the measurement setup: microphone array parallel to the vehicle trajectory; height h from the floor up to the array center; distance l between the source and the reference sensor.
Figure 9 Speed estimates obtained using the proposed system with random parameter initialization.
Figure 10 Tracking results for the highest-speed-error test.
Figure 11 Wheelbase estimates obtained using the proposed system with random parameter initialization. Errors of Test IDs 3 and 8 are due to the loud engine noise of Car 2, which was forced to not change gears during the passage.
Figure 12 Average wheelbase estimate for each vehicle.
Figure 13 Speed estimates obtained using the proposed system and parameter initialization with measured speed values.
Figure 14 Wheelbase estimates obtained using the proposed system and parameter initialization with measured speed values.