Open Access
Acta Acustica, Volume 8 (2024), Article Number 35, 13 pages
Section: Virtual Acoustics
DOI: https://doi.org/10.1051/aacus/2024026
Published online: 13 September 2024

© The Author(s), Published by EDP Sciences, 2024

Licence: Creative Commons. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1 Introduction

1.1 Background

Beyond the scope of classical noise mapping, auralization can be applied to acoustically focused urban planning [1, 2], to compare acoustic prototypes in listening experiments, or for physics-based sound field rendering at virtual microphone arrays. These auralization frameworks can be visually enhanced with head-mounted displays by using virtual reality (VR) methods [3–5]. To extend a simple audio recording to a multidimensional representation, auralization models aim to reproduce the spatial and spectro-temporal radiation characteristics of real-world sound sources. Model-based syntheses are especially challenging to produce when perceptual plausibility is a quality criterion. For perceptually plausible auralizations, the consideration of fluctuations in the sound field is crucial, regardless of whether they are generated at the source itself or during sound propagation. The latter case has been demonstrated, for example, for aircraft noise [6] and wind turbine noise [7]. In the case of drones, plausible synthesis is especially critical with regard to time-variant amplitude and frequency modulations of the tonal component. These emanate, on the one hand, from the speed control of the electric motors and, on the other hand, from advection and refraction effects due to aerodynamic turbulence in the rotor surroundings. This paper presents a drone auralization model accounting for the statistical amplitude and frequency modulation distributions of the tonal component. For model completeness, the drone directivity is computed by using a time-variant wave backpropagation method that compensates for wave propagation effects as well as for non-ideal, directional properties of the measurement microphones.

General drone noise emission characteristics have been widely investigated in a large number of publications, cf. [8, 9]. In most of these publications, microphone array signals are processed to compute one-third octave band spectra and the emitted sound power. An important finding is that the directivity pattern of drones can be assumed to be independent of the rotational speed of the rotors and of the flight procedure [10]. By using a comprehensive circular microphone array, Herold [11] measured the in-flight sound power and directivity of a drone in an anechoic chamber and compared his results to analytical monopoles and dipoles. Furthermore, he reconstructed the flight path using a multi-dimensional beamforming analysis. Alkmim et al. [12] presented a spherical harmonics decomposition of drone noise emission based on static hemi-spherical array data in order to predict far-field sound pressure levels. In both studies, no compensation of sound propagation effects due to atmospheric absorption or of the microphone directivity was reported. Heutschi et al. [13] presented a drone auralization model with focus on a parameterization of the rotational speed dependency and derived a generic directivity. For auralization, these model parameters can be used in combination with an anechoic source signal recording from a hovering maneuver. Focusing on psychoacoustic noise assessments in the field of advanced air mobility (AAM), Lotinga et al. [14] found in a review paper that sharpness and tonality describe drone sound emission more accurately than level-based noise metrics in terms of their agreement with annoyance ratings.

1.2 Contribution of this work

Beyond previous work, this paper presents a drone auralization model focusing on perceptual plausibility by statistically modeling the amplitude and frequency modulation distributions in the drone’s tonal sound emission. Both elements are calibrated by using a power matching approach so that the spectro-temporal source characteristics of the reference drone are fully replicated and optionally level-adapted to measurement- or simulation-based data of other drones (Fig. 3). A common approach of source signal synthesis for auralizations is to discretize the overall signal mixture into a procedural audio synthesis of tonal components and a subtractive synthesis of white noise representing stochastic components, cf. [15]. The basic idea for statistically modeling the modulations in the drone’s tonal sound emission is sketched in Figure 1. A simple solution for auralizing the tonal component would be based on amplitude estimates An at N discrete frequencies from a spectrum of a drone recording. Such a synthesis would reproduce these amplitudes with purely mono-frequency sinusoidal oscillators by using additive synthesis (vertical black lines), with the drawback of sounding unnatural and artificial. For enhanced plausibility, the presented work focuses on the statistical modeling of the instantaneous amplitude and frequency variations (blue Gaussian distributions, modulation variance σ²) to include the characteristic “bee” sound of a drone in auralization. In conclusion, the core of this paper comprises a method for estimating amplitude and frequency variations of drone noise tonal components from measurements in a controlled environment, a method for estimating drone noise emission directivity by time-variant wave backpropagation, and a validation of the procedural audio synthesis method including tonal and broadband components by comparing psychoacoustic metrics obtained on the recorded and auralized samples for the hovering drone.

Figure 1

Schematic spectral effect in the tonal component synthesis with (blue) and without (black) modulations.
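To make the sketched idea concrete, the following minimal Python sketch synthesizes tonal components whose instantaneous frequency and amplitude fluctuate with Gaussian statistics around their means, as in the blue distributions of Figure 1. The per-tone values and the modulation bandwidth are hypothetical placeholders, not the calibrated model parameters from the supplemental table.

```python
import numpy as np
from scipy.signal import lfilter

def synth_tonal(freqs, amps, fm_var, am_var, fs=48000, dur=5.0,
                mod_cutoff=20.0, seed=0):
    """Additive synthesis of tonal components with Gaussian amplitude and
    frequency modulations (cf. Fig. 1). freqs/amps: per-tone mean frequencies
    (Hz) and amplitudes (Pa); fm_var/am_var: modulation variances in Hz^2 and
    Pa^2. The modulation signals are band-limited by a one-pole low-pass with
    cutoff mod_cutoff (an assumed parameter)."""
    rng = np.random.default_rng(seed)
    n = int(fs * dur)
    alpha = np.exp(-2.0 * np.pi * mod_cutoff / fs)

    def slow_noise(var):
        # white Gaussian noise -> one-pole low-pass -> rescale to target variance
        m = lfilter([1.0 - alpha], [1.0, -alpha], rng.standard_normal(n))
        return m * (np.sqrt(var) / m.std())

    out = np.zeros(n)
    for f0, a0, vf, va in zip(freqs, amps, fm_var, am_var):
        f_inst = f0 + slow_noise(vf)                   # instantaneous frequency (Hz)
        phase = 2.0 * np.pi * np.cumsum(f_inst) / fs   # integrate frequency to phase
        out += (a0 + slow_noise(va)) * np.sin(phase)
    return out

# Example with hypothetical values for two tones:
sig = synth_tonal([143.5, 287.0], [0.02, 0.015], [4.0, 9.0], [1e-5, 6e-6])
```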

This paper is structured as follows: Section 2 introduces the modeling concept and presents details on the hemi-anechoic measurement setup. In Section 3, the analysis and synthesis techniques for both drone sound emission components are presented. Section 4 contains technical aspects on the time-variant wave backpropagation and the resulting drone directivity. Notable aspects regarding the psychoacoustical quality of the synthesis result are presented in Section 5. Finally, the results are discussed and an outlook is given in Section 6.

2 Source characterization of drone sound emission

The acoustic source characterization of a reference drone (DJI Mavic Pro, shown in Fig. 2) is based on measurements in the hemi-anechoic chamber of the Institute for Hearing Technology and Acoustics (IHTA, RWTH Aachen University, Germany).

Figure 2

Reference drone type DJI Mavic Pro.

2.1 Modeling concept

The general goal of source characterization as a basis for use in auralization is the description of a sound source operating under free-field conditions, i.e. excluding any influences from room reflections or non-absorbing boundaries. Note that although a drone radiates a considerable amount of sound energy in the ultrasonic frequency range (Fig. 5), these frequencies are not considered in this auralization model since they are inaudible. The auralization model aims to reproduce the spectro-temporal and spatial characteristics of recorded drone sound emission data. In this work, the model structure consists of two blocks, an emission signal in the form of a synthesized audio stream and a directivity in the form of angle-dependent magnitude spectra or impulse responses (Fig. 3). As described in more detail in Section 2.4, both components are calibrated with regard to a correct sound pressure reconstruction on a virtual spherical surface with a radius of 1 m around the drone’s center. For example, this model structure is compatible with the open-source auralization framework Virtual Acoustics (VA) [16], with the directivity coded in the OpenDAFF format [17]. For emission signal modeling (cf. Sect. 3), the tonal and stochastic contributions to the overall sound are first obtained from recorded data of a steady-state hovering maneuver as two separate components. The stochastic contribution is modeled by using an auto-regressive approach to extract coefficients for a digital infinite impulse response (IIR) filter. For modeling the tonal contribution, after individual extraction of the emitted tones, their statistical variations regarding frequency and amplitude modulations are analyzed. For directivity modeling, the radiated emission to the lower hemisphere in the angular range 0° < ϕ < 180° and 20° < θ < 90° is reconstructed on a virtual spherical surface with a radius of 1 m around the drone’s center. In the case of drones with contra-rotating propellers, a symmetry of the sound emission along the vertical median plane can be assumed, thus reducing the number of microphones in the array (cf. Fig. 4). The reconstruction is computed from flyover array measurements using a time-variant wave backpropagation (cf. Sect. 4).

Figure 3

General auralization model concept: The omnidirectional component is implemented in the sound synthesizer and convolved with higher-order spatial components (directivity). This combined approach can optionally be level-calibrated with any external data from measurements or simulations.

Figure 4

Schematic of the drone measurement setup in the hemi-anechoic chamber. The table indicates the lateral microphone distances to the fly-over trajectory and the corresponding emission elevation angle θ.

2.2 Measurement setup

The measurement is valid down to the hemi-anechoic room’s cut-off frequency at about fcut-off ≈ 125 Hz. The sound emission was recorded in two settings, a hovering maneuver and a flyover. The first measurement is the basis for the extraction of the drone’s sound emission signal (Sect. 3), whereas the second measurement is post-processed to extract the emission directivity (Sect. 4). For the flyover measurement shown in Figure 4, the drone repeatedly passes a non-uniformly spaced linear microphone array at a constant speed of v = 0.75 m/s and a flyover altitude of h = 2 m. The drone speed and altitude were controlled by using its optical sensor-assisted automation.1 The speed was externally validated by placing two timer microphones on the ground at both ends of a trajectory of length L = 10 m. The microphone array is placed orthogonally to the flight direction on one side of the trajectory and centered half-way along it. In the case of drones with contra-rotating propellers, the measurement data can be mirrored at the vertical median plane due to the geometric symmetry of the drone’s fuselage. The microphones are non-uniformly spaced on the floor such that their positions correspond to an angular spacing of 10° in elevation on a hemi-spherical surface with radius r = 1 m around the drone’s center. The emission angles for elevation θ and azimuth ϕ, as shown in Figure 4, are calculated by

θ = arctan( h / √(x² + d²) )    (1)

ϕ = arccos( x / √(x² + d²) )    (2)

with x being the distance travelled relative to the array line, d the lateral microphone distance, and h the flyover altitude.
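A minimal sketch of this geometry, consistent with the reconstructed equations (1) and (2) and assuming θ is measured relative to the horizontal:

```python
import numpy as np

def emission_angles(x, d, h=2.0):
    """Emission angles per equations (1) and (2): x is the along-track drone
    position relative to the array line (m), d the lateral microphone
    distance (m), h the flyover altitude (m)."""
    r_h = np.hypot(x, d)                    # horizontal source-receiver distance
    theta = np.degrees(np.arctan2(h, r_h))  # elevation: 90 deg directly below the drone
    phi = np.degrees(np.arctan2(d, x))      # azimuth: equals arccos(x / r_h) for d >= 0
    return theta, phi
```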

2.3 Measurement equipment

The measurement was performed by using a set of eight calibrated measurement microphones placed on the rigid floor (Fig. 4). On the flyover median plane (channel 1), an NTI Audio M2230 (Class 1 acc. to IEC 61672 and ANSI S1.4) was installed, whereas omnidirectional Sennheiser KE4 microphones were installed flat on the ground at the off-axis positions (channels 2–8). All microphones were connected to an RME Octamic XTC audio interface, set to a sampling rate of fs = 96 kHz and a bit depth of 24 bit. The microphones were equipped with windshields to avoid aeroacoustic distortion at the capsules. The atmospheric conditions were 20 °C temperature and 40% relative humidity.

2.4 Calibration procedure

The drone auralization model comprises an iterative calibration option directly at the source (yellow boxes in Fig. 3). The calibration procedure depends on three prerequisites:

  1. The digital full-scale value of the source signal synthesis must be referenced to a defined sound pressure (in the unit of Pascals).

  2. The corresponding directivity data are stored in the form of linearly scaled amplitude values.

  3. The area represented by each directivity data point on the unit sphere is known.

All requirements are fulfilled for the extracted model parameters from Section 3. The procedure is as follows: An overall calibration gain factor Gcal (small yellow box) is defined as the ratio of the target sound source power Pcal and the uncalibrated sound source power Puncal

Gcal = Pcal / Puncal    (3)

with Puncal being calculated by

Puncal = 4π ⋅ Σn wn ⋅ In    (4)

with the area weight wn being the proportional fraction of each represented directivity data point on a unit sphere. The sound intensity In is computed for each directivity surface element n as

In = (1/Z0) ⋅ Σf (df ⋅ sf)²    (5)

with df being the directivity’s linearly scaled amplitude factors of each frequency bin, sf being a frequency bin’s amplitude of the uncalibrated emission signal and Z0 = ρ0 ⋅ c being the characteristic wave impedance.
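A compact sketch of this calibration step, equations (3)–(5), under assumed data layouts (N directivity surface elements, F frequency bins); the rms/peak convention of the amplitude spectra is an assumption:

```python
import numpy as np

def calibration_gain(P_cal, D, s, w, Z0=413.3):
    """Overall calibration gain factor Gcal = Pcal / Puncal, equations (3)-(5).
    D:  (N, F) linearly scaled directivity amplitude factors (element x frequency bin)
    s:  (F,)   amplitude spectrum of the uncalibrated emission signal in Pa
    w:  (N,)   area weights, i.e. fractions of the unit sphere (sum(w) == 1)
    Z0: characteristic wave impedance rho0 * c (approx. 413 kg m^-2 s^-1 at 20 degC)."""
    I = np.sum((D * s[None, :]) ** 2, axis=1) / Z0  # intensity per surface element, eq. (5)
    P_uncal = 4.0 * np.pi * np.sum(w * I)           # total radiated power, eq. (4)
    return P_cal / P_uncal                          # eq. (3)
```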

Alternatively, as often applied in auralization frameworks, the source’s playback level can be calibrated at the very end of the auralization chain by reproducing the decibel value given in Figure 5 at a virtual receiver in the free field at a distance of 1 m and at emission angles θ = 45° and ϕ = 0°.

Figure 5

Reference drone broadband emission spectrum (125 Hz < f < 48 kHz) and sound pressure level (SPL), recorded during a hovering maneuver at a 1 m distance at emission angles θ = 45° and ϕ = 0° under free-field conditions. A 1/24-octave band smoothing is applied for better visual clarity.

3 Analysis and synthesis of drone sound emission

3.1 Broadband spectrum

The drone’s sound emission is recorded at a 2 m distance during a hovering maneuver in the hemi-anechoic chamber. The sound pressure level at a 1 m radial distance from the source under free-field conditions can be directly calculated from the calibrated audio recording: the +6 dB gained by halving the distance from 2 m to 1 m is exactly offset by removing the pressure doubling (+6 dB) at the rigid floor. In the broadband spectrum, shown in Figure 5, tonal and broadband noise components can be distinguished by local level peaks and a shaped noise floor.

Three different characteristic spectral ranges can be distinguished in the broadband spectrum (Fig. 5):

  1. The frequency range f < 2.5 kHz is dominated by tonal components.

  2. The range f > 2.5 kHz contains amplitude-modulated, shaped noise with a perceptually insignificant tonal component.

  3. A broad peak emission can be observed in the ultrasonic frequency range 35 kHz < f < 45 kHz.

Due to their clearly distinguishable characteristics, the first two components are synthesized separately in the presented drone auralization model. Based on spectral masking considerations [18], the model assumes the broadband noise to be inaudible in the low frequency range f < 2.5 kHz and the tonal components to be inaudible in the range f > 2.5 kHz. Note that the ultrasonic component is inaudible to human hearing and is therefore not considered in the presented model. However, despite high atmospheric absorption, an impact of ultrasonic drone emission on bat echolocation has been critically discussed in [19].

3.2 Analysis of tonal amplitude and frequency modulations

3.2.1 Tone extraction

The broadband spectrum (Fig. 5) is a two-dimensional representation of the source’s spectral characteristics, omitting details on temporal variations. In order to analyze the amplitude modulations (AM) and frequency modulations (FM) individually by using the analytic signal (Sect. 3.2.2), each tone first needs to be extracted from the broadband signal using steep bandpass filters, symmetrically centered around the observed peak frequency. The recorded data of the hovering maneuver are sampled at fs = 96 kHz, which is necessary to realize a filter bank of steep bandpass filters. As a rule of thumb, a filter bandwidth of Bn = ±0.1 fc,n around the center frequency fc,n of tone n (with n ∈ [1, 22]) was found necessary for precise tone extraction. The passband ripple was chosen as 0.1 dB with a stopband attenuation of 60 dB. The filters are realized as linear-phase finite impulse response (FIR) filters based on the Kaiser window method. The resulting spectra of the first 22 individual tones are plotted as colored lines in Figure 6 and compared to the broadband spectrum.
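The tone extraction can be sketched with standard DSP tools as follows; the transition width below is an assumed design choice, and the Kaiser design yields rather long filters at fs = 96 kHz, in line with the required steepness:

```python
import numpy as np
from scipy.signal import kaiserord, firwin

def extract_tone(x, fc, fs=96000, rel_bw=0.1, atten_db=60.0):
    """Extract one tonal component with a steep linear-phase FIR bandpass
    (Kaiser window design), +-rel_bw*fc around the observed peak frequency fc."""
    trans_width = 0.05 * fc / (fs / 2.0)          # transition width (assumed design choice)
    numtaps, beta = kaiserord(atten_db, trans_width)
    numtaps |= 1                                   # odd length -> exactly centered delay
    taps = firwin(numtaps, [fc * (1.0 - rel_bw), fc * (1.0 + rel_bw)],
                  window=('kaiser', beta), pass_zero=False, fs=fs)
    # mode='same' keeps the output time-aligned with the input (constant
    # group delay of the linear-phase filter is compensated)
    return np.convolve(x, taps, mode='same')
```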

Figure 6

Spectra of the first 22 extracted tones (colored), compared to the originally recorded broadband spectrum. A 1/24-octave band smoothing is applied for better visual clarity.

3.2.2 Instantaneous amplitude and frequency computation

For plausible drone sound synthesis, each of the drone’s tonal oscillators can be described by its instantaneous amplitude and frequency. The time-variant information of both parameters can be directly extracted from the bandpass-filtered tones by computing the discrete-time analytic signal, which, in turn, is the basis for the computation of instantaneous magnitude and phase [20]. The analytic signal is given by the complex sum of the original signal and an imaginary part equal to its Hilbert transform:

xa(t) = x(t) + j ⋅ H{x(t)}    (6)

with the Hilbert transform being defined by

H{x(t)} = (1/π) p.v. ∫−∞+∞ x(τ) / (t − τ) dτ    (7)

After bandpass filtering around each oscillator frequency, the time-varying amplitudes for synthesis are obtained as the envelopes en of all N tonal oscillators by calculating the instantaneous magnitude of the analytic signal:

en(t) = |xa,n(t)| = √( xn(t)² + H{xn(t)}² )    (8)

The time-variant description of each tonal oscillator’s instantaneous frequency is obtained by differentiating the analytic signal phase φn(t):

fn(t) = (1/2π) ⋅ dφn(t)/dt    (9)

As an example, it can be observed in Figure 7 that the instantaneous amplitude of the fifth extracted tone varies within the pressure range ≈0.005 Pa < p5 < 0.033 Pa, whereas its instantaneous frequency varies in the range 352 Hz < f5 < 364 Hz. Assuming a normal distribution, each individual tone’s statistical information about amplitude and frequency modulation rate and depth is stored in the form of variance values σ².2 These values are directly applicable to the oscillator properties for procedural sound synthesis. The time-variant plot of the extracted harmonics (Fig. 8) shows the frequency modulations (FM) as well as the amplitude modulations (AM). To exclude distortion of the statistics by possible drift-related variation of the geometrical spreading loss between the measurement microphone and the noise-emitting oscillators, the zero-mean modulations are computed first.
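The analysis chain of equations (6)–(9) maps directly to standard tools; detrending as a stand-in for the zero-mean modulation computation is an assumption:

```python
import numpy as np
from scipy.signal import hilbert, detrend

def am_fm_statistics(tone, fs=96000):
    """Instantaneous amplitude and frequency of one extracted tone via the
    analytic signal, equations (6)-(9), and the modulation variances."""
    z = hilbert(tone)                             # analytic signal, eqs. (6)-(7)
    env = np.abs(z)                               # instantaneous amplitude, eq. (8)
    phase = np.unwrap(np.angle(z))
    f_inst = np.diff(phase) * fs / (2.0 * np.pi)  # instantaneous frequency in Hz, eq. (9)
    # remove slow drift (e.g. varying spreading loss) before taking the statistics
    var_am = np.var(detrend(env))
    var_fm = np.var(detrend(f_inst))
    return env, f_inst, var_am, var_fm
```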

Figure 7

Sound pressure (left) and instantaneous frequency (right) of the fifth extracted tone with mean frequency f5,mean = 358.75 Hz.

Figure 8

Instantaneous tonal amplitude and frequency fluctuation in the far-field sound measurement. Coloration of each tone acc. to Figure 6.

Since the sound is measured after travelling through the turbulent layer, each oscillator frequency is individually modulated and therefore varies independently over time. The statistical mean and variance values of the instantaneous amplitude and instantaneous frequency are shown in Figure 9, and the corresponding numerical values are provided as a separate table in the supplemental materials.

Figure 9

Instantaneous amplitude and frequency variance in the tonal component. The frequency variance is exaggerated in the plot by a factor of 10 for better visual clarity.

3.3 Auto-regressive modeling of noise component

The spectral characteristics of the noise component are modeled in this work as an autoregressive (AR) process, whose precision depends on the model order p. The goal of autoregressive modeling is to determine the numerical parameters of a p-th order all-pole IIR filter that, when driven with white noise at its input, produces a time signal with the same random-process statistics as the originally recorded process. Since the phase response of the digital filter has no audible impact on the white-noise-excited output signal – and the filter’s group delay can be neglected for stationary excitation signals – an IIR filter structure is adequate, offering low order, feature preservation and minimal storage requirements. The power spectral density (PSD) of a p-th-order autoregressive process is [21]

Pxx(e^jω) = σw² ⋅ |b(0)|² / |1 + Σk=1..p a(k) e^(−jkω)|²    (10)

with a and b being the IIR filter coefficients resulting from solving the Yule-Walker equations by means of the Levinson-Durbin recursion, and σw² being the variance of the white Gaussian noise input. Another advantage is that this method inherently produces a stable model. The PSD at the output of the IIR filter is thus given by the squared magnitude of its frequency response multiplied by the variance of the white Gaussian noise input.
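A sketch of the Yule-Walker fit via a direct Toeplitz solve, which is equivalent in result to the Levinson-Durbin recursion (the recursion exploits the same Toeplitz structure):

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def yule_walker_ar(x, p=40):
    """Fit a p-th order all-pole (AR) model by solving the Yule-Walker
    equations; returns denominator coefficients a (with a[0] = 1) and the
    white-noise input variance that reproduces the PSD of x, cf. eq. (10)."""
    r = np.correlate(x, x, mode='full')[len(x) - 1:] / len(x)  # biased autocorrelation
    a_tail = solve_toeplitz(r[:p], -r[1:p + 1])  # symmetric Toeplitz system R a = -r
    a = np.concatenate(([1.0], a_tail))
    sigma2 = r[0] + np.dot(a_tail, r[1:p + 1])   # prediction-error (innovation) variance
    return a, sigma2
```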

3.3.1 Noise synthesis

The high-frequency noise is synthesized in a computationally efficient way by using cascaded biquads: the AR model transfer function is expanded into second-order sections (SOS) form (Fig. 10). As a result, the overall IIR filter block consists of a cascaded chain of SOS, each defined by a small number of coefficients derived from the solution of equation (10). Details on the algorithmic procedure are described in [22]. Note that, depending on the auralization framework, a conversion to an alternative form by means of a parametric equalizer with high-pass, low-pass, parametric or shelving filter elements, each defined by its frequency f, gain G and quality factor Q, might be preferred.
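A minimal sketch of the noise synthesis: the all-pole model is expanded into SOS form and driven with white Gaussian noise; the gain convention via the innovation variance is an assumption:

```python
import numpy as np
from scipy.signal import tf2sos, sosfilt

def synth_noise(a, sigma2, n_samples, seed=0):
    """Synthesize the stochastic component: white Gaussian noise shaped by the
    all-pole AR filter, realized as a cascade of second-order sections (Fig. 10)."""
    sos = tf2sos([np.sqrt(sigma2)], a)  # expand H(z) = b0 / A(z) into biquads
    w = np.random.default_rng(seed).standard_normal(n_samples)
    return sosfilt(sos, w)
```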

Figure 10

IIR filter structure consisting of n cascaded second-order sections.

The number of SOS elements depends on the number of coefficients to be handled, i.e. on the model order p. The required order is assessed by comparing the drone’s averaged near-field radiation spectra with synthesized noise spectra (Fig. 11). At order p = 40, an approximation of the original spectrum is achieved that is sufficient in the sense of being audibly indistinguishable.3 As noted above, the Yule-Walker method inherently yields a stable all-pole model.

Figure 11

Comparison of the emitted high-frequency noise and the cascaded SOS filter responses of different order in the frequency range 2.5 kHz < f < 20 kHz.

4 Directivity model

The directivity is computed from flyover data (see video in the Supplemental data) by using a time-variant wave backpropagation method. It compensates for wave propagation effects as well as non-ideal, directional properties of the measurement microphones. Since most trajectories of drone auralizations refer to flight altitudes at ear level or above head height, the directivity is measured only in the lower hemisphere. In the following, a homogeneous atmosphere is assumed so that the path length of the direct wave can be determined along straight lines. Furthermore, the method assumes a rigid floor.

4.1 Measurement uncertainty from positioning inaccuracies

As stated earlier, the flight speed and altitude were controlled by using the drone’s optical sensor-assisted automation. It is assumed that deviations from the target positions can still occur due to a non-ideal control system. As the method described below assumes, for each time step, acoustic wave propagation between two clearly defined positions of the sound source and the receiver, the averages of n = 10 measurement repetitions are evaluated. The averaged spectrum recorded at microphone channel 1 is shown in Figure 12. It indicates the standard deviation to be within ±2 dB at the lower end of the valid frequency range (just above 125 Hz) and within ±1 dB in the frequency range above.

Figure 12

Spectral mean and standard deviation at microphone channel 1 for n = 10 flyover measurement repetitions.

4.2 Time-variant wave backpropagation method

4.2.1 Compensation of microphone directivity

Since the wave backpropagation method requires compensation of every spectral influence on the originally emitted wave spectrum, the properties of the electroacoustic measurement chain must be equalized first. Since the measurement microphones are placed at fixed positions and with fixed orientations on the ground of the hemi-anechoic chamber (cf. Fig. 4), the emitted wave reaches the microphones from time-variant incidence angles. As shown in the following, even high-quality measurement microphones are not perfectly omnidirectional receivers and must be compensated in such a use case. Based on the standardized measurement method according to IEC 61094-8 for the determination of a microphone’s free-field sensitivity [23], the angle-dependent transfer functions of all eight microphones used in the measurement setup were individually measured in the frontal hemisphere on an equiangular sampling grid with a resolution of 10°.

As an example, Figure 13 shows the mean transfer function (over all incidence angles) of the NTI M2230 measurement microphone as a dark blue line in the frequency range 1 kHz < f < 20 kHz, as well as the standard deviation over all measurement points (transparent blue shaded area). In this plot, a perfectly omnidirectional receiver would appear as a spectrally flat transfer function at 0 dB. However, the result shows that for off-axis incidence the directivity pattern is omnidirectional only in the frequency range f < 2 kHz. For higher frequencies f > 2 kHz, the measurement microphone is on average less sensitive (lower mean) and more directional (wider spread of the shaded area). A mean attenuation (over all directions) of 4 dB can be observed at 20 kHz. This high-frequency attenuation is explained by diffraction at the microphone’s cylindrical housing. The same trend applies to all microphones in the measurement setup and is individually equalized in the measured data.
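A sketch of the magnitude-only, frame-wise equalization implied here; the data layout and the nearest-neighbor angle lookup (instead of interpolation) are assumptions:

```python
import numpy as np

def equalize_mic_frame(frame, inc_angle_deg, H_mag, grid_deg):
    """Equalize one STFT frame of a recording for the angle-dependent
    microphone sensitivity. H_mag is a hypothetical (n_angles, n_bins) array
    of measured magnitude responses on the 10 deg grid given in grid_deg;
    n_bins must equal len(frame)//2 + 1."""
    idx = int(np.argmin(np.abs(np.asarray(grid_deg) - inc_angle_deg)))  # nearest grid angle
    spec = np.fft.rfft(frame)
    spec /= np.maximum(H_mag[idx], 1e-3)   # regularized inverse magnitude
    return np.fft.irfft(spec, n=len(frame))
```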

Figure 13

Off-axis frequency response (elevation range 0° < θ < 45°) of the NTI M2230 microphone. Free-field sensitivity measurement acc. to IEC 61094-8 [23].

4.2.2 Compensation of wave propagation effects

The aforementioned sound propagation is calculated according to the relevant effects to be considered by ISO 9613-2 [24]. Each effect individually contributes to the overall transfer function between the source (drone) and receiver (microphone) positions. For the inversion, time-variant transfer functions are applied; they are calculated for the discretized trajectory of the moving source. The Doppler effect in the recorded data was compensated for each individual channel by applying dynamic resampling based on the speed information. Neglecting attenuation due to diffraction (no barriers were placed between source and receiver), atmospheric inhomogeneities and reflections, the remaining partial contributions are added to calculate the total attenuation in decibels according to the standard as:

A = Adiv + Aatm + Agr    (11)

with Adiv being the attenuation due to spherical spreading, Aatm the attenuation due to air absorption and Agr the contribution of the ground effect. Following the inverse distance law, Adiv is frequency-independent and depends only on the distance between sound source and receiver. Since the measurement microphones are placed directly on the acoustically hard surface of the hemi-anechoic chamber, Agr increases the measured sound pressure by a factor of 2 (or +6 dB) compared to a free-field measurement. The air absorption Aatm depends on frequency and is calculated by integration over the absorption coefficient α [25]. For the dimensions of the hemi-anechoic chamber measurement setup with a maximum lateral distance of 5 m, the resulting distance-dependent air absorption Aatm is shown in Figure 14.
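The per-time-step attenuation terms of equation (11) can be sketched as follows; the sign conventions (attenuations positive, the rigid-floor term as −6 dB to be removed during backpropagation) are assumptions:

```python
import numpy as np

def total_attenuation_db(dist_m, alpha_db_per_m):
    """Total attenuation A = Adiv + Aatm + Agr in dB, equation (11).
    dist_m: source-receiver distance in m; alpha_db_per_m: frequency-dependent
    air absorption coefficient in dB/m (e.g. after Bass et al. [25]), so the
    result is an array over the same frequency bins."""
    A_div = 20.0 * np.log10(dist_m / 1.0)  # spherical spreading re 1 m
    A_atm = alpha_db_per_m * dist_m        # air absorption
    A_gr = -6.0                            # rigid-floor pressure doubling, removed in backpropagation
    return A_div + A_atm + A_gr

# Example: at 5 m and a bin with alpha = 0.05 dB/m:
# total_attenuation_db(5.0, np.array([0.05]))  -> about 14.0 + 0.25 - 6.0 = 8.2 dB
```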

Figure 14

Spectrum of the distance-dependent air absorption Aatm for the on-site measurement range up to 5 m, based on the air absorption coefficient α for the measurement conditions in the hemi-anechoic chamber, calculated acc. to [25].

4.3 Radiation in the median and frontal plane

The median plane (x-z plane in Fig. 4) directivity spectra (Fig. 15) show the tonal component to be more prominent for smaller elevation angles, i.e. near the x-y plane. As observed in other publications (e.g. [11, 13]), a trend towards a vertical dipole radiation characteristic can be observed in a broad frequency range. However, this dipole behavior cannot be generalized to every frequency and may cause an audible difference in auralization. Moreover, the dipole shows a rather broad lobe towards the lower hemisphere. An audible shift of the spectral notch in the frequency range 2.5 kHz < f < 5 kHz towards lower frequencies can be observed for larger elevation angles. This notch in turn leads to a quadrupole-like directivity.

Figure 15

Median plane directivity spectra (x-z plane) for azimuthal angle ϕ = 0° and elevation range 15° < θ < 85° (sliced into ±5° segments around the indicated middle value).

The frontal plane (y-z plane in Fig. 4) directivity (Fig. 16) shows the resulting spectra from each of the microphones. In these data, the spectral notch is only visible at the median plane emission angle ϕ = 0°. The spectral notch likely originates from a fixed phase relation between the front and rear axis propellers, since no ground or Doppler effects are apparent in the post-processed data. Again, a broad lobe of the vertical dipole is visible.

Figure 16

Frontal plane directivity spectra (y-z plane) in the elevation range 20° < θ < 90°.

5 Synthesis results

The spectrogram in Figure 17 visually compares the tonal component synthesis of the drone auralization model without modulations (time range 0–6 s) and with modulations (time range 6–12 s). Each oscillator’s amplitude is matched in both cases. In the following, all spectrograms are computed with an FFT block length of 8192 samples, 50% overlap, a Hann window, and a 96 kHz sampling frequency. For audible comparison, the corresponding auralization result is provided in the supplemental files.
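For reproducibility, the stated spectrogram settings map directly to standard DSP tooling; the dB floor constant is an assumption:

```python
import numpy as np
from scipy.signal import spectrogram

def drone_spectrogram(x, fs=96000):
    """Spectrogram with the settings stated above: 8192-sample FFT blocks,
    50% overlap, Hann window, fs = 96 kHz."""
    f, t, Sxx = spectrogram(x, fs=fs, window='hann', nperseg=8192, noverlap=4096)
    return f, t, 10.0 * np.log10(Sxx + 1e-20)  # dB scale for plotting
```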

Figure 17

Synthesis result: Spectrogram of tonal component synthesis without (left) and with (right) modulations.

Focusing on the omnidirectional emission signal, i.e. without the directivity pattern, the spectrograms of the reference recording and of the synthesized emission from the presented drone auralization model are compared in Figure 18.4 The perceptual similarity is objectively evaluated by means of the psychoacoustic measures loudness, sharpness and tonality in Table 1. Their computation is based on the reference recording and the synthesis signals for the hovering maneuver. The algorithms take the signal’s spectro-temporal properties into account and evaluate their effects using a non-linear hearing model. In this sense, loudness can be understood as the aurally correct counterpart to level.

Figure 18

Spectrogram of the reference recording (left) and of the synthesized emission (right) in the frequency range f < 12 kHz.

Table 1

Comparison of drone reference recording and the auralization model synthesis by means of calculated psychoacoustic measures.

Since a drone flyover is a rather slowly changing event regarding its spectral content, the 5% percentile loudness values (N5) in this study were calculated according to the ANSI S3.4-2007 standard. This algorithm is based on a model by Glasberg and Moore [26] aiming at the loudness determination of stationary signals. Unlike the level description in decibels, loudness values on the sone scale can be compared intuitively: the ratio of two sone values corresponds to their subjectively perceived ratio. Since the tonal component is a main contributor to drone noise, the psychoacoustic parameter sharpness is used for evaluation. It is closely inversely related to sensory pleasantness and reflects the ratio of high-frequency to low-frequency contributions in a sound. In order to account for the relationship between the sensations of sharpness and loudness for technical sounds, this study uses the free-field calculation method according to Aures [27]. The tonality is calculated separately according to ECMA-418-2 [28].

The auralization model synthesis shows its largest percentage deviation in the tonality reproduction, being about 1% more tonal than the reference. Slightly overestimating the reference, the model produces a 0.25% higher loudness and, showing a very similar sharpness, is about 0.5% sharper. These differences are not audible since they are below the just noticeable differences for loudness (7%), sharpness (≈0.05 acum), and tonality (≈0.05 tuHMS).

6 Conclusion and outlook

This paper expands the view of previous publications on drone auralization by going beyond the analysis of sound emissions and presenting a complete synthesis model, including the oscillator and digital filter coefficients for procedural audio synthesis. The auralization model aims to reproduce the spectro-temporal and spatial characteristics of recorded drone sound emission data during flight in cruise mode at constant altitude. The general model structure consists of two blocks, an emission signal in the form of a synthesized audio stream and a directivity as a set of angle-dependent magnitude spectra. Both components undergo a combined level calibration with regard to the reconstructed sound pressure on the surface data. For directivity modeling, the radiated emission to the lower hemisphere in the angular range 0° < ϕ < 180° and 20° < θ < 90° is reconstructed on a virtual spherical surface with a radius of 1 m around the drone’s center. The reconstruction is computed from flyover array measurements using a time-variant wave backpropagation. In the case of drones with contra-rotating propellers, a symmetry of the sound emission along the vertical median plane can be assumed, thus reducing the number of microphones in the array (cf. Fig. 4).

The synthesis model depends on an initial measurement phase, for which the conclusion can be drawn that a free-field recording of a drone flyover is required under the following conditions: To ensure useful instantaneous amplitude and frequency computation from the analytic signal (cf. Sect. 3.2.2), the recording should be free from audible rotor speed variations caused by interruptions from the automatic positioning control. Furthermore, use of the equipment’s maximum sampling rate and bit depth is recommended for an improved frequency-domain resolution during post-processing. In general, the measurement conditions play an important role when precise modeling of the tonal composition is required. Our measurements show that for drone measurements in a hemi-anechoic chamber, signal-to-noise ratios of SNRHAC > 60 dB can be achieved. In cases where the tonal details of the source are less important or a large anechoic chamber is not available, it can be concluded from experience – without presenting details in this paper – that measurements can also be made in the field, with signal-to-noise ratios between 20 dB and 30 dB. In this case, it is still possible to determine the directivity with good accuracy. For tonal component feature extraction, however, care must be taken to ensure that prominent ambient or natural noises are not recorded.

Two effects can be observed from the directivity. First, the radiation of tonal components is more prominent for smaller elevation angles towards the x-y plane. Second, a trend towards a vertical dipole radiation with a broad lobe can be confirmed independently, as observed in other publications (e.g. [11, 13]). However, compared to previous publications, this dipole behavior cannot be generalized to every frequency, which may cause an audible difference in auralization. An audible shift of a spectral notch in the frequency range 2.5 kHz < f < 5 kHz towards lower frequencies can be observed for larger elevation angles, and only in the x-z symmetry plane. This notch in turn leads to a quadrupole-like directivity. Since no ground or Doppler effects are apparent in the post-processed data, it can be concluded that the spectral notch originates from a fixed phase relation between the emissions of the front and rear axis rotors.

For emission signal modeling, the tonal and stochastic contributions to the overall sound were first separated from recorded data of a steady-state hovering maneuver. The stochastic contribution was modeled by using an auto-regressive approach to extract coefficients for a digital IIR filter chain. For modeling the tonal contribution, after individual extraction of the emitted tones, their statistical variations regarding frequency and amplitude modulations were analyzed. The comparison of synthesized data with recordings by means of psychoacoustic measures shows good agreement. The presented model is limited regarding the auralization of dynamic flight procedures, such as climbing or descending maneuvers. Furthermore, the extracted oscillator and filter coefficients describe a specific drone type (DJI Mavic Pro) and may not be valid for larger drones. However, the documented modeling steps are applicable to further drone types and flight maneuvers. The tonal synthesis presented in this paper is essential for a very small vehicle, where tonal noise clearly dominates; in heavier drones, the noise signature might be dominated by broadband noise. The drone auralization model can be applied in future research on urban soundscapes, where UAVs may appear in a great variety of use cases. Since the model describes the source properties free from outdoor sound propagation effects, it can also be used as an input for auralization frameworks that specifically study the influence of different wind and weather conditions on drone noise. Furthermore, it can deliver input data for simulation tools where the spatial radiation characteristics of a drone should be included, such as in the development of array-based drone detection.

Acknowledgments

This research was supported by the HEAD-Genuit Foundation under project grant number P-22/02-W. The authors would like to thank Roland Sottek for giving advice regarding psychoacoustic evaluation and Niklas Demel for piloting the drone during the measurements.

Conflicts of interest

The authors declare no conflict of interest.

Data availability statement

The research data associated with this article are included in the supplementary material of this article.

Supplementary material

SuppPubmm1: Video of drone sound emission measurement in the hemi-anechoic chamber.

SuppPubmm2: Calibrated reference recording of the hovering maneuver at 1 m distance.

SuppPubmm3: Auralization result for the drone's tonal component without and with modulations.

SuppPubmm4: Audio synthesis example for the drone's stochastic component.

SuppPubmm5: Audio synthesis of the complete drone auralization model.

SuppPubmm6: Table with amplitude and frequency modulation statistics of the tonal oscillators.

SuppPubmm7: Table with digital filter coefficients for the stochastic component synthesis using a cascaded second-order sections (SOS) IIR filter.


Footnotes

1. A video example of a fly-over measurement is provided in the supplemental files.

2. All numerical values are provided in the supplemental files.

3. The numerical discrete-time filter coefficients b and a for the IIR filter in cascaded second-order sections form are provided in the supplemental files.

4. Both audio files are provided in the supplemental materials.

References

  1. K. Eggenschwiler, K. Heutschi, A. Taghipour, R. Pieren, A. Gisladottir, B. Schäffer: Urban design of inner courtyards and road traffic noise: influence of façade characteristics and building orientation on perceived noise annoyance, Building and Environment 224 (2022) 109526.
  2. J. Llorca-Bofí, C. Dreier, J. Heck, M. Vorländer: Urban sound auralization and visualization framework – case study at IHTApark, Sustainability 14, 4 (2022) 18.
  3. R. Aalmoes, M. Boer, H. Verbeek: Virtual reality aircraft noise simulation for community engagement, in: Proc. INTERNOISE, Chicago, USA, 2018, pp. 1559–1566.
  4. V. Puyana-Romero, L. Lopez-Segura, L. Maffei, R. Hernández-Molina, M. Masullo: Interactive soundscapes: 360°-video based immersive virtual reality in a tool for the participatory acoustic environment evaluation of urban areas, Acta Acustica united with Acustica 103, 4 (2017) 574–588.
  5. F. Stevens, D. Murphy, S. Smith: Soundscape auralisation and visualisation: a cross-modal approach to soundscape evaluation, in: Proc. DAFx, Aveiro, Portugal, 2018, pp. 1–8.
  6. D. Lincke, T. Schumacher, R. Pieren: Synthesizing coherence loss by atmospheric turbulence in virtual microphone array signals, The Journal of the Acoustical Society of America 153, 1 (2023) 456–466.
  7. A. Bresciani, J. Maillard, L. de Santana: Physics-based scintillations for outdoor sound auralization, The Journal of the Acoustical Society of America 154, 2 (2023) 1179–1190.
  8. B. Schäffer, R. Pieren, K. Heutschi, J.M. Wunderli, S. Becker: Drone noise emission characteristics and noise effects on humans – a systematic review, International Journal of Environmental Research and Public Health 18, 11 (2021) 5940.
  9. N. Zawodny, A. Christian, R. Cabell: A summary of NASA research exploring the acoustics of small unmanned aerial systems, in: Proc. AHS Int. Tech. Meeting on Aeromech. Des. for Trans. Vert. Fl., San Francisco, 2018, pp. 1–11.
  10. J.M. Wunderli, J. Meister, O. Boolakee, K. Heutschi: A method to measure and model acoustic emissions of multicopters, International Journal of Environmental Research and Public Health 20, 1 (2022) 96.
  11. G. Herold: In-flight directivity and sound power measurement of small-scale unmanned aerial systems, Acta Acustica 6, 58 (2022) 13.
  12. M. Alkmim, J. Cardenuto, E. Tengan, T. Dietzen, T. Van Waterschoot, J. Cuenca, L. De Ryck, W. Desmet: Drone noise directivity and psychoacoustic evaluation using a hemispherical microphone array, The Journal of the Acoustical Society of America 152, 5 (2022) 2735–2745.
  13. K. Heutschi, B. Ott, T. Nussbaumer, P. Wellig: Synthesis of real world drone signals based on lab recordings, Acta Acustica 4, 6 (2020) 24.
  14. M. Lotinga, C. Ramos-Romero, N. Green, A. Torija: Noise from unconventional aircraft: a review of current measurement techniques, psychoacoustics, metrics and regulation, Current Pollution Reports 9 (2023) 724–745.
  15. R. Pieren: Auralization of environmental acoustical sceneries: synthesis of road traffic, railway and wind turbine noise. PhD thesis, Delft University of Technology, Delft, 2018.
  16. Virtual Acoustics – A real-time auralization framework for scientific research. Institute for Hearing Technology and Acoustics, RWTH Aachen University. Available at http://www.virtualacoustics.org, accessed 2024-06-13.
  17. OpenDAFF – An open source file format for directional audio content. Institute for Hearing Technology and Acoustics, RWTH Aachen University. Available at http://www.opendaff.org, accessed 2024-06-13.
  18. H. Fastl, E. Zwicker: Psychoacoustics: facts and models, Springer Series in Information Sciences, 2007.
  19. G. Ednie, D. Bird, K. Elliott: Fewer bat passes are detected during small, commercial drone flights, Scientific Reports 11 (2021) 11529.
  20. B. Boashash: Estimating and interpreting the instantaneous frequency of a signal. I. Fundamentals, Proceedings of the IEEE 80, 4 (1992) 520–538.
  21. M. Hayes: Statistical digital signal processing and modeling, ch. 8 – Spectrum estimation, John Wiley and Sons, New York, 1996, pp. 391–492.
  22. S. Mitra: Digital signal processing: a computer-based approach, ch. 6.4.2 – Digital filter structures – cascade realizations, McGraw-Hill, New York, 2001, pp. 370–372.
  23. IEC 61094-8: Measurement microphones – Part 8: Methods for determining the free-field sensitivity of working standard microphones by comparison. International Electrotechnical Commission, Geneva, CH, 2012.
  24. ISO 9613-2: Acoustics – Attenuation of sound during propagation outdoors – Part 2: General method of calculation. International Organization for Standardization, Geneva, CH, 1996.
  25. H. Bass, L. Sutherland, A. Zuckerwar, D. Blackstock, D. Hester: Atmospheric absorption of sound: further developments, The Journal of the Acoustical Society of America 97, 1 (1995) 680–683.
  26. B. Moore, B. Glasberg, T. Baer: A model for the prediction of thresholds, loudness, and partial loudness, Journal of the Audio Engineering Society 45, 4 (1997) 224–240.
  27. W. Aures: Der sensorische Wohlklang als Funktion psychoakustischer Empfindungsgrössen (Sensory pleasantness as a function of psychoacoustic sensations), Acustica 58 (1985) 282–290.
  28. ECMA-418-2: Psychoacoustic metrics for ICT equipment – Part 2: Models based on human perception. Ecma International, Geneva, CH, 2022.

Cite this article as: Dreier C. & Vorländer M. 2024. Drone auralization model with statistical synthesis of amplitude and frequency modulations. Acta Acustica, 8, 35.
