Acta Acustica, Volume 8, 2024
Article Number: 35
Number of pages: 13
Section: Virtual Acoustics
DOI: https://doi.org/10.1051/aacus/2024026
Published online: 13 September 2024
Technical & Applied Article
Drone auralization model with statistical synthesis of amplitude and frequency modulations
Institute for Hearing Technology and Acoustics, RWTH Aachen University, Kopernikusstraße 5, 52074 Aachen, Germany
^{*} Corresponding author: christian.dreier@akustik.rwth-aachen.de
Received: 7 May 2024
Accepted: 13 June 2024
This paper presents a drone auralization model that reproduces the spectrotemporal and spatial characteristics of a drone during flight. Focusing on perceptual plausibility, the time-variant processes are modeled by taking into account the statistical amplitude and frequency modulation distributions of a reference drone sound. For completeness, the far-field directivity is extracted based on time-variant wave backpropagation from microphone array signals. Both components share a combined level calibration with regard to the reconstructed sound pressure on a spherical surface around the source. With regard to reproducibility, this paper is accompanied by supplemental data that present a synthesis model including the oscillator and digital filter coefficients for procedural audio synthesis. In the evaluation, the model shows good agreement between psychoacoustic measures of the synthesized drone sound and a recorded reference. The drone auralization model can be applied in future research on urban soundscapes, where Unmanned Aerial Vehicles (UAV) may appear in a great variety of use cases. Furthermore, it can deliver input data for simulation tools where the spatial radiation characteristics of a drone should be included, such as the development of array-based drone detection.
Key words: Auralization / Drone sound synthesis / Drone directivity / Modulation / Time-variant wave backpropagation
© The Author(s), Published by EDP Sciences, 2024
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1 Introduction
1.1 Background
Beyond the scope of classical noise mapping, auralization can be applied to acoustically focused urban planning [1, 2], to compare acoustic prototypes in listening experiments, or for physics-based sound field rendering at virtual microphone arrays. These auralization frameworks can be visually enhanced with head-mounted displays by using virtual reality (VR) methods [3–5]. To extend the features of a simple audio recording to a multidimensional representation, auralization models aim to reproduce the spatial and spectrotemporal radiation characteristics of real-world sound sources. Model-based syntheses are especially challenging to produce when perceptual plausibility is a quality criterion. For perceptually plausible auralizations, the consideration of fluctuations in the sound field is crucial, regardless of whether they are generated at the source itself or during sound propagation. In the latter case, this has been shown, for example, for aircraft noise [6] and wind turbine noise [7]. In the case of drones, plausible synthesis is especially critical with regard to time-variant amplitude and frequency modulations of the tonal component. These emanate, on the one hand, from the speed control of the electric motors and, on the other hand, from advection and refraction effects due to aerodynamic turbulence in the rotor surroundings. This paper presents a drone auralization model accounting for the statistical amplitude and frequency modulation distributions of the tonal component. For model completeness, the drone directivity is computed by using a time-variant wave backpropagation method that compensates for wave propagation effects as well as for non-ideal, directional properties of the measurement microphones.
General drone noise emission characteristics have been widely investigated in a large number of publications, cf. [8, 9]. In most of these publications, microphone array signals are processed to compute one-third octave band spectra and the emitted sound power. An important finding is that the directivity pattern of drones can be assumed to be independent of the rotational speed of the rotors and of the flight procedure [10]. By using a comprehensive circular microphone array, Herold [11] measured the in-flight sound power and directivity of a drone in an anechoic chamber and compared his results to analytical monopoles and dipoles. Furthermore, he reconstructed the flight path using a multidimensional beamforming analysis. Alkmim et al. [12] presented a spherical harmonics decomposition of drone noise emission based on static hemispherical array data in order to predict far-field sound pressure levels. In both studies, no compensation of sound propagation effects due to atmospheric absorption or of the microphone directivity was reported. Heutschi et al. [13] presented a drone auralization model with a focus on parameterizing the rotational speed dependency and derived a generic directivity. For auralization, these model parameters can be used in combination with an anechoic source signal recording from a hovering maneuver. Focusing on psychoacoustic noise assessments in the field of advanced air mobility (AAM), Lotinga et al. [14] found in a review paper that sharpness and tonality describe drone sound emission more accurately than level-based noise metrics with regard to the comparability of annoyance ratings.
1.2 Contribution of this work
Beyond previous work, this paper presents a drone auralization model focusing on perceptual plausibility by statistically modeling the amplitude and frequency modulation distributions in the drone’s tonal sound emission. Both elements are calibrated by using a power matching approach so that the spectrotemporal source characteristics of the reference drone are fully replicated and can optionally be level-adapted to measurement- or simulation-based data of other drones (Fig. 3). A common approach of source signal synthesis for auralizations is to decompose the overall signal mixture into a procedural audio synthesis of tonal components and a subtractive synthesis of white noise representing the stochastic components, cf. [15]. The basic idea for statistically modeling the modulations in the drone’s tonal sound emission is sketched in Figure 1. A simple solution for auralizing the tonal component would be based on amplitude estimates A_{N} at N discrete frequencies from a spectrum of a drone recording. With the drawback of sounding unnatural and artificial, this synthesis would reproduce the amplitudes of purely monofrequent sinusoidal oscillators by using additive synthesis (vertical black lines). For enhanced plausibility, the presented work focuses on the statistical modeling of the instantaneous amplitude and frequency variations (blue Gaussian distributions, modulation variance σ^{2}) to include the characteristic “bee” sound of a drone in the auralization. In conclusion, the core of this paper comprises a method for estimating amplitude and frequency variations of drone noise tonal components from measurements in a controlled environment, a method for estimating the drone noise emission directivity by time-variant wave backpropagation, and a validation of the procedural audio synthesis method including tonal and broadband components by comparing psychoacoustic metrics obtained on the recorded and auralized samples for the hovering drone.
Figure 1 Schematic spectral effect in the tonal component synthesis with (blue) and without (black) modulations. 
This paper is structured as follows: Section 2 introduces the modeling concept and presents details on the hemi-anechoic measurement setup. In Section 3, the analysis and synthesis techniques for both drone sound emission components are presented. Section 4 covers technical aspects of the time-variant wave backpropagation and the resulting drone directivity. Notable aspects regarding the psychoacoustic quality of the synthesis result are presented in Section 5. Finally, the results are discussed and an outlook is given in Section 6.
2 Source characterization of drone sound emission
The acoustic source characterization of a reference drone (DJI Mavic Pro, shown in Fig. 2) is based on measurements in the hemi-anechoic chamber of the Institute for Hearing Technology and Acoustics (IHTA, RWTH Aachen University, Germany).
Figure 2 Reference drone type DJI Mavic Pro. 
2.1 Modeling concept
The general goal of source characterization as a basis for auralization is the description of a sound source operating under free-field conditions, i.e. excluding any influences from room reflections or non-absorbing boundaries. Note that although a drone radiates a considerable amount of sound energy in the ultrasonic frequency range (Fig. 5), these frequencies are not considered in this auralization model since they are inaudible. The auralization model aims to reproduce the spectrotemporal and spatial characteristics of recorded drone sound emission data. In this work, the model structure consists of two blocks, an emission signal in the form of a synthesized audio stream and a directivity in the form of angle-dependent magnitude spectra or impulse responses (Fig. 3). As described in more detail in Section 2.4, both components are calibrated with regard to a correct sound pressure reconstruction on a virtual spherical surface with a radius of 1 m around the drone’s center. For example, this model structure is compatible with the open-source auralization framework Virtual Acoustics (VA) [16] with additional directivity coding in the OpenDAFF format [17]. For emission signal modeling (cf. Sect. 3), at first the tonal and stochastic contributions to the overall sound are obtained from recorded data of a steady-state hovering maneuver as two separate components. The stochastic contribution was modeled by using an autoregressive approach to extract coefficients for a digital infinite impulse response (IIR) filter. For modeling the tonal contribution, after individual extraction of the emitted tones, their statistical variation regarding frequency and amplitude modulations was analyzed. For directivity modeling, the radiated emission to the lower hemisphere in the angular range 0° < ϕ < 180° and 20° < θ < 90° is reconstructed on a virtual spherical surface with a radius of 1 m around the drone’s center.
In the case of drones with contra-rotating propellers, a symmetry of sound emission along the vertical median plane can be assumed, thus reducing the number of microphones in the array (cf. Fig. 4). The reconstruction is computed from flyover array measurements using a time-variant wave backpropagation (cf. Sect. 4).
Figure 3 General auralization model concept: The omnidirectional component is implemented in the sound synthesizer and convolved with higher-order spatial components (directivity). This combined approach can optionally be level-calibrated with any external data from measurements or simulations.
Figure 4 Schematic of the drone measurement setup in the hemi-anechoic chamber. The table indicates the lateral microphone distances to the flyover trajectory and the according emission elevation angle θ.
2.2 Measurement setup
The frequency range of the measurement is valid down to the hemi-anechoic room’s cutoff frequency at about f_{cutoff} ≈ 125 Hz. The sound emission was recorded in two settings, during a hovering maneuver and during a flyover. The first measurement is the basis for the extraction of the drone’s sound emission signal (Sect. 3), whereas the second measurement is post-processed to extract the emission directivity (Sect. 4). For the flyover measurement shown in Figure 4, the drone repeatedly passes a non-uniformly spaced linear microphone array with a constant speed of v = 0.75 m/s at a flyover altitude of h = 2 m. The drone speed and altitude were controlled by using its optical sensor-assisted automation.^{1} The speed was externally validated by placing two timer microphones on the ground at both ends of a trajectory with length L = 10 m. The microphone array is placed orthogonally to the flight direction on one side of the trajectory and symmetrically centered halfway along it. In the case of drones with contra-rotating propellers, the measurement data can be mirrored at the vertical median plane due to the geometric symmetry of the drone’s fuselage. The microphones are non-uniformly spaced on the floor in geometrical continuation of their position with an angular spacing of 10° on a hemispherical surface with radius r = 1 m around the drone’s center. The emission angles for elevation θ and azimuth ϕ, as shown in Figure 4, are calculated by

θ = arctan( h / √(x² + d²) ), (1)

ϕ = arctan( d / x ), (2)

with x being the distance travelled and d being the lateral microphone distance.
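As an illustration, the flyover geometry (altitude h, lateral microphone distance d, travelled distance x) can be evaluated with a short script. The function name and the exact angle conventions below are assumptions of this sketch, not taken from the paper:

```python
import numpy as np

def emission_angles(x, d, h=2.0):
    """Hypothetical sketch of the flyover emission-angle geometry.
    x: distance travelled relative to the array plane (m),
    d: lateral microphone distance (m), h: flyover altitude (m).
    Returns elevation theta and azimuth phi in degrees (conventions assumed)."""
    r_xy = np.hypot(x, d)                    # horizontal source-receiver distance
    theta = np.degrees(np.arctan2(h, r_xy))  # elevation below the horizontal
    phi = np.degrees(np.arctan2(d, x))       # azimuth relative to flight direction
    return theta, phi
```

At the closest point of approach (x = 0) with d = 2 m and h = 2 m, this yields an elevation of 45°, consistent with the geometry sketched in Figure 4.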
2.3 Measurement equipment
The measurement was performed by using a set of eight calibrated measurement microphones placed on the rigid floor (Fig. 4). On the flyover median plane (channel 1), an NTI Audio M2230 (Class 1 acc. to IEC 61672 and ANSI S1.4) was installed, whereas omnidirectional Sennheiser KE4 microphones were installed flat on the ground at the off-axis positions (channels 2–8). All microphones were connected to an RME Octamic XTC audio interface, set to a sampling rate of f_{s} = 96 kHz and a bit depth of N_{ADC} = 24 bit. The microphones were equipped with windshields to avoid aeroacoustic distortion at the capsules. The atmospheric conditions were 20 °C temperature and 40% humidity.
2.4 Calibration procedure
The drone auralization model comprises an iterative calibration option directly at the source (yellow boxes in Fig. 3). The calibration procedure depends on three prerequisites:
The digital fullscale value of the source signal synthesis must be referenced to a defined sound pressure (in the unit of Pascals).
The according directivity data are stored in form of linearscaled amplitude values.
The area represented by each directivity data point on the unit sphere is known.
All requirements are fulfilled for the extracted model parameters from Section 3. The procedure is as follows: An overall calibration gain factor G_{cal} (small yellow box) is defined as the ratio of the target sound source power P_{cal} and the uncalibrated sound source power P_{uncal},

G_{cal} = P_{cal} / P_{uncal}, (3)
with P_{uncal} being calculated by

P_{uncal} = 4π ∑_{n} w_{n} I_{n}, (4)
with the area weight w_{n} being the proportional fraction of each represented directivity data point on a unit sphere. The sound intensity I_{n} is computed for each directivity surface element n as

I_{n} = (1 / Z_{0}) ∑_{f} (d_{f} ⋅ s_{f})², (5)
with d_{f} being the directivity’s linearly scaled amplitude factors of each frequency bin, s_{f} being a frequency bin’s amplitude of the uncalibrated emission signal and Z_{0} = ρ_{0} ⋅ c being the characteristic wave impedance.
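The power-matching calibration described above can be sketched as follows. The array shapes, the function name, and the default value for Z_0 are illustrative assumptions:

```python
import numpy as np

def calibration_gain(P_cal, d, s, w, Z0=413.3):
    """Overall calibration gain G_cal = P_cal / P_uncal (a sketch).
    d: (n_dirs, n_bins) linear directivity amplitude factors,
    s: (n_bins,) amplitude spectrum of the uncalibrated emission signal,
    w: (n_dirs,) area weights w_n summing to 1 over the unit sphere."""
    I = np.sum((d * s) ** 2, axis=1) / Z0   # intensity per directivity element
    P_uncal = 4.0 * np.pi * np.sum(w * I)   # power on the r = 1 m sphere
    return P_cal / P_uncal
```

For an isotropic test case with unit directivity and a flat unit spectrum, the uncalibrated power is 4π/Z_0, so choosing P_cal equal to that value yields a gain of one.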
Alternatively, as often applied in auralization frameworks, the source’s playback level can be calibrated at the very end of the auralization chain by reproducing the decibel value given in Figure 5 at a virtual receiver in the free field at a 1 m distance and at emission angles θ = 45° and ϕ = 0°.
Figure 5 Reference drone broadband emission spectrum (125 Hz < f < 48 kHz) and sound pressure level (SPL), recorded during a hovering manoeuvre at a 1 m distance at emission angles θ = 45° and ϕ = 0° under free-field conditions. A 1/24-octave band smoothing is applied for better visual clarity.
3 Analysis and synthesis of drone sound emission
3.1 Broadband spectrum
The drone’s sound emission is recorded at a 2 m distance during a hovering manoeuvre in the hemi-anechoic chamber. The sound pressure level at a 1 m radial distance from the source can be directly calculated from the calibrated audio recording, since the doubled sound pressure on the rigid floor (+6 dB compared to free-field conditions) compensates the spreading loss over the doubled distance (−6 dB). In the broadband spectrum, shown in Figure 5, tonal and broadband noise components can be distinguished by local level peaks and a shaped noise floor.
Three different characteristic spectral ranges can be distinguished in the broadband spectrum (Fig. 5):
The frequency range f < 2.5 kHz is dominated by tonal components.
The range f > 2.5 kHz contains amplitude-modulated, shaped noise without perceptually relevant tonal components.
A broad peak emission can be observed in the ultrasonic frequency range 35 kHz < f < 45 kHz.
Due to their clearly distinguishable characteristics, the first two components are synthesized separately in the presented drone auralization model. Based on spectral masking considerations [18], the model assumes the broadband noise to be inaudible in the low frequency range f < 2.5 kHz and the tonal components to be inaudible in the range f > 2.5 kHz. Note that since the ultrasonic component is inaudible to human hearing, it is not considered in the presented model. However, despite high atmospheric absorption, an impact of ultrasonic drone emission on bat echolocation has been critically discussed in [19].
3.2 Analysis of tonal amplitude and frequency modulations
3.2.1 Tone extraction
The broadband spectrum (Fig. 5) is a two-dimensional representation of the source’s spectral characteristics, omitting details on temporal variations. In order to analyze the amplitude modulations (AM) and frequency modulations (FM) individually by using the analytic signal (Sect. 3.2.2), each tone first needs to be extracted from the broadband spectrum using steep bandpass filters, symmetrically centered around the observed peak frequency. The recorded data of the hovering maneuver is sampled at f_{s} = 96 kHz, which is necessary to realize a filter bank of steep bandpass filters. As a rule of thumb, a filter bandwidth of B_{n} = ±0.1 f_{c,n} around the center frequency f_{c,n} of tone n (with n ∈ [1, 22]) was found necessary for a precise tone extraction. The passband ripple was chosen as 0.1 dB with a stopband attenuation of 60 dB. The filters are realized as linear-phase finite impulse response (FIR) filters based on the Kaiser window method. The resulting spectra of the first 22 individual tones are plotted as colored lines in Figure 6 and compared to the broadband spectrum.
Figure 6 Spectra of the first 22 extracted tones (colored), compared to the originally recorded broadband spectrum. A 1/24-octave band smoothing is applied for better visual clarity.
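A tone-extraction filter of this kind can be sketched with scipy's Kaiser-window FIR design. The paper specifies a ±10% passband, 0.1 dB ripple, and 60 dB stopband attenuation; the transition-width parameter and the zero-phase filtering below are assumptions of this sketch:

```python
import numpy as np
from scipy import signal

def extract_tone(x, fc, fs=96000, trans=0.05):
    """Band-pass one tone at center frequency fc from the mixture x.
    trans sets the transition width as a fraction of fc (an assumption)."""
    # Kaiser design for 60 dB stopband attenuation
    numtaps, beta = signal.kaiserord(60.0, trans * fc / (0.5 * fs))
    numtaps |= 1                                 # odd length -> type-I linear phase
    taps = signal.firwin(numtaps, [0.9 * fc, 1.1 * fc],
                         window=('kaiser', beta), pass_zero=False, fs=fs)
    return signal.filtfilt(taps, 1.0, x)         # zero-phase band-pass
```

Applied to a mixture of two tones, the filter isolates the component inside the ±10% passband while suppressing the other by at least the stopband attenuation.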
3.2.2 Instantaneous amplitude and frequency computation
For plausible drone sound synthesis, each of the drone’s tonal oscillators can be described by its instantaneous amplitude and frequency. The time-variant information of both parameters can be directly extracted from the bandpass-filtered tones by computing the discrete-time analytic signal, which in turn is the basis for the computation of an instantaneous magnitude and phase [20]. An analytic signal is given by the complex sum of the original signal and an imaginary part equal to its Hilbert transform,

x_{a}(t) = x(t) + j H{x(t)}, (6)

with the Hilbert transform being defined by

H{x(t)} = (1/π) p.v. ∫ x(τ) / (t − τ) dτ. (7)
After bandpass filtering around each oscillator frequency, the time-varying amplitudes for synthesis are obtained from the envelope e_{n} for all N tonal oscillators by calculating the instantaneous magnitude of the analytic signal,

e_{n}(t) = |x_{a,n}(t)| = √( x_{n}(t)² + H{x_{n}(t)}² ). (8)
The time-variant instantaneous frequency of each harmonic oscillator is obtained by differentiation of the analytic signal phase ϕ_{n}(t),

f_{n}(t) = (1 / 2π) dϕ_{n}(t)/dt. (9)
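The analytic-signal analysis above maps directly onto scipy.signal.hilbert; a minimal sketch:

```python
import numpy as np
from scipy.signal import hilbert

def instantaneous_am_fm(tone, fs=96000):
    """Instantaneous amplitude and frequency of a band-passed tone via the
    discrete-time analytic signal (a sketch of the analysis described above)."""
    z = hilbert(tone)                                # x(t) + j * H{x(t)}
    envelope = np.abs(z)                             # instantaneous magnitude
    phase = np.unwrap(np.angle(z))                   # unwrapped phase in rad
    inst_freq = np.diff(phase) * fs / (2.0 * np.pi)  # phase derivative in Hz
    return envelope, inst_freq
```

Applied to a band-passed tone, this returns the pressure envelope and the instantaneous frequency trace whose statistics are stored as model parameters.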
As an example, it can be observed in Figure 7 that the instantaneous amplitude of the fifth extracted tone varies within the pressure range ≈0.005 Pa < p_{5} < 0.033 Pa, whereas the instantaneous frequency varies in the range 352 Hz < f_{5} < 364 Hz. Assuming a normal distribution, each individual tone’s statistical information about amplitude and frequency modulation rate and depth is stored in the form of variance values σ^{2}.^{2} These values are directly applicable to the oscillator properties for procedural sound synthesis. The time-variant plot of the extracted harmonics (Fig. 8) shows the frequency modulations (FM) as well as the amplitude modulations (AM). To exclude distortion of the statistics due to any possible drift-induced variation of the geometrical spreading loss between the measurement microphone and the noise-emitting oscillators, the zero-mean modulations are computed first.
Figure 7 Sound pressure (left) and instantaneous frequency (right) of the fifth extracted tone with mean frequency f_{5,mean} = 358.75 Hz. 
Figure 8 Instantaneous tonal amplitude and frequency fluctuation in the farfield sound measurement. Coloration of each tone acc. to Figure 6. 
Since the sound is measured after travelling through the turbulent layer, each oscillator frequency is individually modulated and therefore varies independently of the others over time. The statistical mean and variance values of the instantaneous amplitude and instantaneous frequency are shown in Figure 9, and the corresponding numerical values are provided as a separate table in the supplemental materials.
Figure 9 Instantaneous amplitude and frequency variance in the tonal component. The frequency variance is exaggerated in the plot by a factor of 10 for better visual clarity.
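The stored means and variances can drive a simple modulated-oscillator synthesis. The following sketch draws Gaussian amplitude and frequency modulators and integrates the instantaneous frequency; the modulation bandwidth f_mod and the moving-average low-pass are assumptions, since the paper's exact modulation-rate model is not reproduced here:

```python
import numpy as np

def modulated_tone(f_mean, a_mean, var_f, var_a, dur, fs=96000, f_mod=10.0, seed=0):
    """One tonal oscillator with Gaussian AM and FM (illustrative sketch)."""
    n = int(dur * fs)
    rng = np.random.default_rng(seed)

    def slow_gaussian(var):
        # white Gaussian noise, smoothed to a slow modulation signal
        w = rng.standard_normal(n)
        k = max(int(fs / f_mod), 1)
        w = np.convolve(w, np.ones(k) / k, mode='same')  # crude low-pass
        return np.sqrt(var) * w / (np.std(w) + 1e-12)    # rescale to variance var

    inst_f = f_mean + slow_gaussian(var_f)                   # instantaneous frequency
    inst_a = np.maximum(a_mean + slow_gaussian(var_a), 0.0)  # non-negative envelope
    phase = 2.0 * np.pi * np.cumsum(inst_f) / fs             # integrate frequency
    return inst_a * np.sin(phase)
```

Summing one such oscillator per extracted tone, each parameterized by its stored mean and variance, reproduces the modulated tonal component sketched in Figure 1.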
3.3 Autoregressive modeling of noise component
The spectral characteristics of the noise component are modeled in this work as an autoregressive (AR) process, whose precision depends on the model order p. The goal of autoregressive modeling is to determine the numerical parameters of a pth-order all-pole IIR filter that, by convolution with white noise at its input, produces a time signal with the same random process statistics as the originally recorded process. Since the phase response of the digital filter has no audible impact on the signal resulting from convolution with the white noise excitation – and the filter’s group delay can be neglected for stationary excitation signals – an IIR filter structure is adequate for a low-order, feature-preserving representation with minimal storage requirements. The power spectral density (PSD) of a pth-order autoregressive process is [21]

S(f) = σ_{w}² |b_{0}|² / |1 + ∑_{k=1}^{p} a_{k} e^{−j2πfk/f_{s}}|², (10)

with a and b being the IIR filter coefficients resulting from solving the Yule-Walker equations by means of the Levinson-Durbin recursion, and σ_{w}² being the variance of the white Gaussian noise input. Another advantage is that this method inherently produces a stable model. The PSD at the output of the IIR filter is given by the magnitude-squared of its frequency response multiplied by the variance of the white Gaussian noise input.
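A minimal Levinson-Durbin solution of the Yule-Walker equations can be sketched as follows (the function name is illustrative; production code would typically use a library routine):

```python
import numpy as np

def yule_walker_ar(x, p):
    """Fit a pth-order all-pole AR model to x via Levinson-Durbin.
    Returns the coefficients a (a[0] = 1) and the white-noise variance."""
    n = len(x)
    # biased autocorrelation estimates r[0..p]
    r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(p + 1)])
    a = np.zeros(p + 1)
    a[0] = 1.0
    err = r[0]
    for k in range(1, p + 1):
        acc = r[k] + np.dot(a[1:k], r[k - 1:0:-1])  # correlation with past coeffs
        lam = -acc / err                            # reflection coefficient
        new = a.copy()
        new[1:k] = a[1:k] + lam * a[k - 1:0:-1]     # order-update of coefficients
        new[k] = lam
        a = new
        err *= (1.0 - lam ** 2)                     # prediction-error update
    return a, err
```

Fitting a first-order model to a synthetic AR(1) process recovers its pole location and driving-noise variance, which is a quick sanity check for the recursion.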
3.3.1 Noise synthesis
The high-frequency noise is synthesized in a computationally efficient way by using cascaded biquads: For this purpose, the AR model transfer function is expanded into a series of second-order sections (SOS) (Fig. 10). As a result, the overall IIR filter block consists of a cascaded chain of SOS, each of which is defined by a small number of coefficients from the solution of equation (10). Details on the algorithmic procedure are described in [22]. Note that, depending on the auralization framework, the conversion to an alternative form by means of a parametric equalizer with high-pass, low-pass, parametric, or shelving filter elements defined by the parameters frequency f, gain G, and quality factor Q might be preferred.
Figure 10 IIR filter structure consisting of n cascaded second-order sections.
The number of SOS elements depends on the number of coefficients to be handled. The required number is assessed by comparing the drone’s averaged near-field radiation spectra with synthesized noise spectra (Fig. 11). At order 40, a sufficient approximation of the original spectrum – audibly indistinguishable from it – is achieved.^{3} The Yule-Walker method inherently yields a stable all-pole model.
Figure 11 Comparison of the emitted highfrequency noise and the cascaded SOS filter responses of different order in the frequency range 2.5 kHz < f < 20 kHz. 
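With scipy, the expansion into second-order sections and the subsequent noise shaping can be sketched as follows (names are illustrative):

```python
import numpy as np
from scipy import signal

def ar_to_sos(a, b0=1.0):
    """Expand the all-pole transfer function b0 / A(z) into cascaded SOS."""
    return signal.tf2sos([b0], a)

def shape_noise(sos, n, seed=0):
    """Filter white Gaussian noise through the SOS cascade (a sketch)."""
    w = np.random.default_rng(seed).standard_normal(n)
    return signal.sosfilt(sos, w)
```

Each row of the SOS matrix holds the six coefficients of one biquad, matching the cascade sketched in Figure 10; the cascaded form is numerically more robust than evaluating the full 40th-order polynomial directly.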
4 Directivity model
The directivity is computed from flyover data (see video in the Supplemental data) by using a time-variant wave backpropagation method. It compensates for wave propagation effects as well as for non-ideal, directional properties of the measurement microphones. Since most trajectories of drone auralizations refer to flight altitudes at ear level or above head height, the directivity is measured only in the lower hemisphere. In the following, a homogeneous atmosphere is assumed so that the path length of the direct wave can be determined along straight lines. Furthermore, the method assumes a rigid floor.
4.1 Measurement uncertainty from positioning inaccuracies
As stated earlier, the flight speed and altitude were controlled by using the drone’s optical sensor-assisted automation. Nevertheless, the non-ideal control system can still lead to deviating positions. As the method described below assumes clearly defined positions of the sound source and the receiver for the acoustic wave propagation effects at each time step, the averages of n = 10 measurement repetitions are evaluated. The averaging result for the recorded spectrum at microphone channel 1 is shown in Figure 12. It indicates the standard deviation to be within ±2 dB in the lower frequency range from 125 Hz and within ±1 dB in the frequency range above.
Figure 12 Spectral mean and standard deviation at microphone channel 1 for n = 10 flyover measurement repetitions. 
4.2 Timevariant wave backpropagation method
4.2.1 Compensation of microphone directivity
Since the wave backpropagation method requires compensating for every spectral influence on the originally emitted wave spectrum, the properties of the electroacoustic measurement chain must be equalized first. Since the measurement microphones are placed at fixed positions on the ground of the hemi-anechoic chamber (cf. Fig. 4) with a fixed orientation, the emitted wave reaches the microphones from time-variant incidence angles. As shown in the following, even high-quality measurement microphones are not perfectly omnidirectional receivers and must be compensated as well in such a use case. Based on the standardized measurement method acc. to IEC 61094-8 for the determination of a microphone’s free-field sensitivity [23], the angle-dependent transfer functions of all eight microphones used in the measurement setup were individually measured in the frontal hemisphere on an equiangular sampling grid with a resolution of 10°.
As an example, Figure 13 shows the mean transfer function (over all incidence angles) of the NTI M2230 measurement microphone as a dark blue line in the frequency range 1 kHz < f < 20 kHz, as well as the standard deviation over all measurement points (transparent blue shaded area). In this plot, a perfectly omnidirectional receiver would appear as a spectrally flat transfer function at 0 dB. However, the result shows that for off-axis positions the directivity pattern is omnidirectional only in the frequency range f < 2 kHz. For higher frequencies f > 2 kHz, the measurement microphone is on average less sensitive (lower mean) and more directional (wider spread of the transparent blue shaded area). A mean attenuation (over all directions) of 4 dB can be observed at 20 kHz. This high-frequency attenuation is explained by diffraction at the microphone’s cylindrical housing. The same trend applies to all microphones in the measurement setup and is individually equalized in the measured data.
Figure 13 Off-axis frequency response (elevation range 0° < θ < 45°) of the NTI M2230 microphone. Free-field sensitivity measurement acc. to IEC 61094-8 [23].
4.2.2 Compensation of wave propagation effects
The aforementioned sound propagation is calculated according to the relevant effects to be considered by ISO 9613-2 [24]. Each effect individually contributes to the overall transfer function between the source (drone) and receiver (microphone) positions. Generally, for the inversion, time-variant transfer functions are applied and therefore calculated for the discretized trajectory of the moving source. The Doppler effect in the recorded data was compensated for each individual channel by applying dynamic resampling based on the speed information. Neglecting attenuation due to diffraction (no barriers were placed between source and receiver), atmospheric inhomogeneities, and reflections, the remaining partial contributions are added to calculate the total attenuation in decibels according to the standard as

A = A_{div} + A_{atm} + A_{gr}, (11)
with A_{div} being the attenuation due to spherical spreading, A_{atm} due to air absorption, and A_{gr} due to the ground effect. Due to the inverse-distance law, the spreading attenuation is frequency-independent and depends only on the distance between sound source and receiver, which is represented as A_{div} in equation (11). Since the measurement microphones are placed directly at the sound-hard surface of the hemi-anechoic chamber, A_{gr} increases the measured sound pressure by a factor of 2 (or +6 dB) compared to a free-field measurement. The air absorption A_{atm} depends on frequency and is calculated by integration over the absorption coefficient α [25]. For the dimensions of the hemi-anechoic chamber measurement setup with a maximum lateral distance of 5 m, the resulting distance-dependent air absorption A_{atm} is shown in Figure 14.
Figure 14 Spectrum of the distance-dependent air absorption A_{atm} for the on-site measurement range up to 5 m, based on the air absorption coefficient α for the measurement conditions in the hemi-anechoic chamber, calculated acc. to [25].
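Under the stated assumptions (straight-line propagation, rigid floor), the attenuation terms can be combined as in this sketch, where alpha is treated as a per-meter absorption value in dB/m for one frequency bin and the +6 dB floor pressure doubling enters as a negative attenuation:

```python
import numpy as np

def total_attenuation_db(dist, alpha_db_per_m, d_ref=1.0, ground_db=-6.0):
    """Total attenuation A = A_div + A_atm + A_gr in dB (illustrative sketch)."""
    A_div = 20.0 * np.log10(dist / d_ref)  # spherical spreading (inverse-distance)
    A_atm = alpha_db_per_m * dist          # frequency-dependent air absorption
    A_gr = ground_db                       # rigid-floor pressure doubling (+6 dB)
    return A_div + A_atm + A_gr
```

In the backpropagation, the inverse of this transfer function is applied per channel and time step to recover the emitted spectrum on the 1 m reference sphere.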
4.3 Radiation in the median and frontal plane
The median plane (xz plane in Fig. 4) directivity spectra (Fig. 15) show the tonal component to be more prominent for smaller elevation angles, near the xy plane. As observed in other publications (e.g. [11, 13]), a trend towards a vertical dipole radiation characteristic can be observed in a broad frequency range. However, this dipole behavior cannot be generalized to every frequency and may cause an audible difference in auralization. Moreover, the dipole shows a rather broad lobe towards the lower hemisphere. An audible frequency shift of the spectral notch in the frequency range 2.5 kHz < f < 5 kHz can be observed towards lower frequencies for larger elevation angles. This notch in turn leads to a quadrupole directivity.
Figure 15 Median plane directivity spectra (xz plane) for azimuthal angle ϕ = 0° and elevation range 15° < θ < 85° (sliced into ±5° segments around the indicated middle value). 
The frontal plane (yz plane in Fig. 4) directivity (Fig. 16) shows the resulting spectra from each of the microphones. In this data, the spectral notch is only visible at the median plane emission angle ϕ = 0°. The spectral notch likely originates from a fixed phase relation between the front and rear axis propellers, since no ground or Doppler effects are apparent in the post-processed data. Again, a broad lobe of the vertical dipole is visible.
Figure 16 Frontal plane directivity spectra (yz plane) in the elevation range 20° < θ < 90°. 
5 Synthesis results
The spectrogram in Figure 17 visually compares the tonal component syntheses of the drone auralization model without modulations (in the time range 0–6 s) and with modulations (in the time range 6–12 s). The oscillator amplitudes are matched in both cases. In the following, all spectrograms are computed with an FFT block length of 8192 samples, 50% overlap, a Hann window, and a 96 kHz sampling frequency. For an audible comparison, the corresponding auralization result is provided in the supplemental files.
Figure 17 Synthesis result: Spectrogram of tonal component synthesis without (left) and with (right) modulations. 
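The spectrogram settings quoted above can be reproduced with scipy, e.g. (the placeholder input signal is an assumption):

```python
import numpy as np
from scipy import signal

fs = 96000                                        # sampling frequency (Hz)
x = np.random.default_rng(0).standard_normal(fs)  # 1 s placeholder signal
f, t, Sxx = signal.spectrogram(x, fs=fs, window='hann',
                               nperseg=8192,      # FFT block length
                               noverlap=4096)     # 50 % overlap
```

The returned frequency axis has nperseg/2 + 1 bins; Sxx holds one PSD column per time frame.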
Focusing on the omnidirectional emission signal, i.e. without the directivity pattern, the spectrograms of the reference recording and of the synthesized emission from the presented drone auralization model are compared in Figure 18.^{4} The perceptual similarity is objectively evaluated by means of the psychoacoustic measures in Table 1. Their computation is based on the reference recording and the synthesis signals for the hovering maneuver. The algorithms take the signals’ spectrotemporal properties into account and evaluate their effects with a nonlinear hearing model. In this sense, loudness can be understood as the aurally correct counterpart to level.
Figure 18 Spectrogram of the reference recording (left) and of the synthesized emission (right) in the frequency range f < 12 kHz. 
Table 1. Comparison of the drone reference recording and the auralization model synthesis by means of calculated psychoacoustic measures.
Since a drone flyover is a rather slowly changing event regarding its spectral content, the 5% percentile loudness values (N5) in this study were calculated according to the ANSI S3.4-2007 standard. This algorithm is based on a model by Glasberg and Moore [26] aimed at the loudness determination of stationary signals. Unlike the level description in decibels, loudness values on the sone scale can be compared intuitively: the ratio of two sone values corresponds to their subjectively perceived loudness ratio. Since the tonal component is a main contributor to drone noise, the psychoacoustic parameter sharpness is used for evaluation. It is closely inversely related to sensory pleasantness and reflects the ratio of high-frequency to low-frequency contributions in a sound. In order to account for the relationship between the sensations of sharpness and loudness for technical sounds, this study uses the free-field calculation method according to Aures [27]. The tonality is calculated separately according to ECMA-418-2 [28].
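The N5 computation itself reduces to a percentile of a time-varying loudness track. The sketch below uses a placeholder loudness-vs-time array in sone; in the paper this track would be the output of an ANSI S3.4-2007 (Glasberg and Moore) loudness model, which is not reimplemented here.

```python
import numpy as np

def percentile_loudness(n_t, exceedance=5.0):
    """N_x percentile loudness: the sone value exceeded x % of the time."""
    return float(np.percentile(n_t, 100.0 - exceedance))

# Placeholder loudness track in sone; a real track would come from an
# ANSI S3.4-2007 loudness model applied to the calibrated recording.
rng = np.random.default_rng(1)
n_t = 10.0 + rng.normal(0.0, 0.5, size=1000)

n5 = percentile_loudness(n_t)   # N5 equals the 95th percentile of the track
```

The sone scale is ratio-valid: scaling the whole track by a factor of two doubles N5 and, by construction of the scale, the predicted perceived loudness.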
The auralization model synthesis shows its largest percentage deviation in the tonality reproduction, being about 1% more tonal than the reference. Slightly overestimating the reference, the model produces a 0.25% higher loudness. With a very similar sharpness, the synthesis is about 0.5% sharper. These differences are not audible, since they are below the just noticeable differences for loudness (7%), sharpness (≈0.05 acum), and tonality (≈0.05 tu_{HMS}).
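The audibility argument can be stated as a simple threshold test. The numerical deviations below are illustrative placeholders of the order of magnitude reported above, not the paper’s exact table values:

```python
# Hedged sketch: flag whether a reference/synthesis deviation exceeds the
# just noticeable difference (JND) of the respective psychoacoustic measure.
def audible(diff, jnd):
    return abs(diff) >= jnd

# JNDs as quoted in the text; deviations are illustrative placeholders.
loudness_rel_diff = 0.0025   # ~0.25 % relative loudness deviation
sharpness_diff = 0.005       # in acum, well below the ~0.05 acum JND
tonality_diff = 0.01         # in tu_HMS, well below the ~0.05 tu_HMS JND

print(audible(loudness_rel_diff, 0.07))   # False: below the 7 % loudness JND
print(audible(sharpness_diff, 0.05))      # False
print(audible(tonality_diff, 0.05))       # False
```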
6 Conclusion and outlook
This paper expands on previous publications on drone auralization by going beyond the analysis of sound emissions to present a complete synthesis model, including the oscillator and digital filter coefficients for procedural audio synthesis. The auralization model aims to reproduce the spectro-temporal and spatial characteristics of recorded drone sound emission data during flight in cruise mode at constant altitude. The general model structure consists of two blocks: an emission signal in the form of a synthesized audio stream, and a directivity as a set of angle-dependent magnitude spectra. Both components consider a combined level calibration with regard to the reconstructed sound pressure of the surface data. For directivity modeling, the radiated emission to the lower hemisphere in the angular range 0° < ϕ < 180° and 20° < θ < 90° is reconstructed on a virtual spherical surface with a radius of 1 m around the drone’s center. The reconstruction is computed from flyover array measurements using a time-variant wave backpropagation. In the case of drones with contra-rotating propellers, a symmetry of the sound emission along the vertical median plane can be assumed, which reduces the required number of microphones in the array (cf. Fig. 4).
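As an illustration of the emission-signal block, a single tonal oscillator with slowly varying amplitude and frequency modulation can be sketched as follows. The modulation statistics used here are invented for the example; the paper’s fitted values are tabulated in SuppPubmm6.

```python
import numpy as np

fs = 96_000
n = fs  # 1 s of signal
rng = np.random.default_rng(3)

def slow_noise(std, smooth=2048):
    """Low-pass-filtered Gaussian noise as a slow modulation track."""
    kernel = np.hanning(smooth)
    kernel /= kernel.sum()
    return std * np.convolve(rng.standard_normal(n), kernel, mode="same")

def modulated_tone(f_mean, amp_mean, f_std, amp_std):
    """One oscillator with statistically drawn AM and FM (illustrative)."""
    f_t = f_mean + slow_noise(f_std)         # instantaneous frequency in Hz
    a_t = amp_mean + slow_noise(amp_std)     # instantaneous amplitude
    phase = 2 * np.pi * np.cumsum(f_t) / fs  # integrate frequency to phase
    return a_t * np.sin(phase)

# Example tone near the fifth extracted tone's mean frequency (358.75 Hz).
tone = modulated_tone(f_mean=358.75, amp_mean=0.1, f_std=2.0, amp_std=0.01)
```

Summing one such oscillator per extracted tone, each with its own modulation statistics, and adding the filtered-noise component yields the omnidirectional emission signal.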
The synthesis model depends on an initial measurement phase, for which a free-field recording of a drone flyover is required under the following conditions: To ensure useful instantaneous amplitude and frequency computation from the analytic signal (cf. Sect. 3.2.2), the recording should be free from audible rotor speed variations caused by interruptions from the automatic positioning control. Furthermore, using the equipment’s maximum sampling rate and bit depth is recommended for an improved frequency-domain resolution during post-processing. In general, the measurement conditions play an important role when precise modeling of the tonal composition is required. Our measurements in a hemi-anechoic chamber show that signal-to-noise ratios of SNR_{HAC} > 60 dB can be achieved. In cases where the tonal details of the source are less important or a large anechoic chamber is not available, it can be concluded from experience – without presenting details in this paper – that measurements can also be made in the field, with signal-to-noise ratios of 20 dB < SNR_{outdoor} < 30 dB. In the latter case, it is still possible to determine the directivity with good accuracy. For tonal component feature extraction, however, care must be taken to ensure that prominent ambient or natural noises are not recorded.
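The instantaneous amplitude and frequency computation mentioned above (cf. Sect. 3.2.2) follows the standard analytic-signal recipe via the Hilbert transform; a sketch on a clean test tone:

```python
import numpy as np
from scipy.signal import hilbert

fs = 96_000
t = np.arange(fs) / fs                        # 1 s of signal
tone = 0.5 * np.cos(2 * np.pi * 358.75 * t)   # test tone near f_5,mean

z = hilbert(tone)                             # analytic signal
inst_amp = np.abs(z)                          # instantaneous amplitude
inst_phase = np.unwrap(np.angle(z))
inst_freq = np.diff(inst_phase) * fs / (2 * np.pi)  # instantaneous frequency, Hz

# Away from the block edges, both quantities recover the tone's parameters.
mid = slice(fs // 4, 3 * fs // 4)
print(np.mean(inst_freq[mid]))   # ≈ 358.75 Hz
print(np.mean(inst_amp[mid]))    # ≈ 0.5
```

This also shows why audible rotor-speed variations are problematic: if the underlying frequency drifts during the analysis window, the extracted modulation statistics mix control-loop transients into the steady-state distributions.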
Two effects can be observed from the directivity. First, the radiation of tonal components is more prominent for smaller angles towards the xy plane. Second, a trend towards a vertical dipole radiation with a broad lobe can be confirmed independently, as observed in other publications (e.g. [11, 13]). Compared to previous publications, however, this dipole behavior cannot be generalized to every frequency, which may cause an audible difference in auralization. An audible shift of a spectral notch in the frequency range 2.5 kHz < f < 5 kHz towards lower frequencies can be observed for larger elevation angles, and only in the xz symmetry plane. This notch in turn leads to a quadrupole directivity. Since neither ground nor Doppler effects are apparent in the post-processed data, it can be concluded that the spectral notch originates from a fixed phase relation between the emissions of the front and rear axis rotors.
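The fixed-phase-relation explanation is consistent with simple two-source interference: two coherent sources separated along the flight axis produce notches at frequencies set by the path-length difference toward the receiver. The spacing and angle below are illustrative values, not measured drone geometry:

```python
import numpy as np

c = 343.0                  # speed of sound in m/s
d = 0.2                    # assumed front-to-rear axis spacing in m (illustrative)
theta = np.deg2rad(60.0)   # illustrative receiver direction

# Path-length difference between the two phase-locked sources.
delta = d * np.cos(theta)

f = np.linspace(100.0, 8000.0, 4000)
# Magnitude of the sum of two equal, phase-locked sources:
# |1 + exp(-j 2 pi f delta / c)| = 2 |cos(pi f delta / c)|
mag = np.abs(1 + np.exp(-1j * 2 * np.pi * f * delta / c))

# First notch where the path difference equals half a wavelength.
f_notch = c / (2 * delta)
print(f_notch)   # ≈ 1715 Hz for these assumed values
```

Since the path difference varies with emission angle, the notch frequency shifts with elevation, qualitatively matching the observed angle-dependent notch in the xz plane.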
For emission signal modeling, the tonal and stochastic contributions to the overall sound were first separated from recorded data of a steady-state hovering maneuver. The stochastic contribution was modeled using an autoregressive approach to extract coefficients for a digital IIR filter chain. For modeling the tonal contribution, the emitted tones were extracted individually and their statistical variation regarding frequency and amplitude modulations was analyzed. The comparison of synthesized data with recordings by means of psychoacoustic measures shows good agreement. The presented model is limited regarding the auralization of dynamic flight procedures, such as climbing or sinking maneuvers. Furthermore, the extracted oscillator and filter coefficients describe a specific drone type (DJI Mavic Pro) and may not be valid for larger drones. However, the documented modeling steps are applicable to further drone types and flight maneuvers. The tonal synthesis presented in this paper is necessary for a very small vehicle, where tonal noise is clearly dominant; for heavier drones, the noise signature might be dominated by broadband noise. The drone auralization model can be applied in future research on urban soundscapes, where UAVs may appear in a great variety of use cases. Since the model describes the source properties free from outdoor sound propagation effects, it can also be used as an input for auralization frameworks that specifically study the influence of different wind and weather conditions on drone noise. Furthermore, it can deliver input data for simulation tools where the spatial radiation characteristics of a drone should be included, such as the development of array-based drone detection.
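The stochastic-component synthesis summarized above can be sketched as white noise shaped by a cascade of second-order IIR sections. The coefficients below are placeholder peak-filter designs; the paper’s AR-fitted SOS coefficients are provided in SuppPubmm7 and would be loaded instead:

```python
import numpy as np
from scipy import signal

fs = 96_000

def peak_sos(f0, q):
    """One second-order section with a resonance at f0 (placeholder design)."""
    b, a = signal.iirpeak(f0, Q=q, fs=fs)
    return np.hstack([b, a])   # SOS row layout: [b0, b1, b2, a0, a1, a2]

# Placeholder cascade of two sections; the paper's fitted coefficients
# (SuppPubmm7) would replace these.
sos = np.vstack([peak_sos(3000.0, 2.0), peak_sos(8000.0, 3.0)])

# Stochastic component: 1 s of white noise through the SOS cascade.
rng = np.random.default_rng(2)
stochastic = signal.sosfilt(sos, rng.standard_normal(fs))
```

Realizing the high-order filter as cascaded second-order sections, as in the filter structure of Figure 10, is the numerically robust standard choice for IIR implementation.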
Acknowledgments
This research was supported by the HEADGenuit Foundation under project grant number P22/02W. The authors would like to thank Roland Sottek for giving advice regarding psychoacoustic evaluation and Niklas Demel for piloting the drone during the measurements.
Conflicts of interest
The authors declare no conflict of interest.
Data availability statement
The research data associated with this article are included in the supplementary material of this article.
Supplementary material
SuppPubmm1: Video of the drone sound emission measurement in the hemi-anechoic chamber.
SuppPubmm2: Calibrated reference recording of the hovering maneuver at 1 m distance.
SuppPubmm3: Auralization result for the drone's tonal component without and with modulations.
SuppPubmm4: Audio synthesis example for the drone's stochastic component.
SuppPubmm5: Audio synthesis of the complete drone auralization model.
SuppPubmm6: Table with amplitude and frequency modulation statistics of the tonal oscillators.
SuppPubmm7: Table with digital filter coefficients for the stochastic component synthesis using a cascaded second-order sections (SOS) IIR filter.
A video example of a flyover measurement is provided in the supplemental files.
All numerical values are provided in the supplemental files.
The numerical discretetime filter coefficients b and a for the IIR filter with cascaded secondorder sections form are provided in the supplemental files.
Both audio files are provided in the supplemental materials.
References
[1] K. Eggenschwiler, K. Heutschi, A. Taghipour, R. Pieren, A. Gisladottir, B. Schäffer: Urban design of inner courtyards and road traffic noise: influence of façade characteristics and building orientation on perceived noise annoyance, Building and Environment 224 (2022) 109526.
[2] J. Llorca-Bofí, C. Dreier, J. Heck, M. Vorländer: Urban sound auralization and visualization framework – case study at IHTApark, Sustainability 14, 4 (2022) 18.
[3] R. Aalmoes, M. Boer, H. Verbeek: Virtual reality aircraft noise simulation for community engagement, in: Proc. INTER-NOISE, Chicago, USA, 2018, pp. 1559–1566.
[4] V. Puyana-Romero, L. Lopez-Segura, L. Maffei, R. Hernández-Molina, M. Masullo: Interactive soundscapes: 360°-video based immersive virtual reality in a tool for the participatory acoustic environment evaluation of urban areas, Acta Acustica united with Acustica 103, 4 (2017) 574–588.
[5] F. Stevens, D. Murphy, S. Smith: Soundscape auralisation and visualisation: a cross-modal approach to soundscape evaluation, in: Proc. DAFx, Aveiro, Portugal, 2018, pp. 1–8.
[6] D. Lincke, T. Schumacher, R. Pieren: Synthesizing coherence loss by atmospheric turbulence in virtual microphone array signals, The Journal of the Acoustical Society of America 153, 1 (2023) 456–466.
[7] A. Bresciani, J. Maillard, L. de Santana: Physics-based scintillations for outdoor sound auralization, The Journal of the Acoustical Society of America 154, 2 (2023) 1179–1190.
[8] B. Schäffer, R. Pieren, K. Heutschi, J.M. Wunderli, S. Becker: Drone noise emission characteristics and noise effects on humans – a systematic review, International Journal of Environmental Research and Public Health 18, 11 (2021) 5940.
[9] N. Zawodny, A. Christian, R. Cabell: A summary of NASA research exploring the acoustics of small unmanned aerial systems, in: Proc. AHS Int. Tech. Meeting on Aeromech. Des. for Trans. Vert. Fl., San Francisco, 2018, pp. 1–11.
[10] J.M. Wunderli, J. Meister, O. Boolakee, K. Heutschi: A method to measure and model acoustic emissions of multicopters, International Journal of Environmental Research and Public Health 20, 1 (2022) 96.
[11] G. Herold: In-flight directivity and sound power measurement of small-scale unmanned aerial systems, Acta Acustica 6, 58 (2022) 13.
[12] M. Alkmim, J. Cardenuto, E. Tengan, T. Dietzen, T. Van Waterschoot, J. Cuenca, L. De Ryck, W. Desmet: Drone noise directivity and psychoacoustic evaluation using a hemispherical microphone array, The Journal of the Acoustical Society of America 152, 5 (2022) 2735–2745.
[13] K. Heutschi, B. Ott, T. Nussbaumer, P. Wellig: Synthesis of real world drone signals based on lab recordings, Acta Acustica 4, 6 (2020) 24.
[14] M. Lotinga, C. Ramos-Romero, N. Green, A. Torija: Noise from unconventional aircraft: a review of current measurement techniques, psychoacoustics, metrics and regulation, Current Pollution Reports 9 (2023) 724–745.
[15] R. Pieren: Auralization of environmental acoustical sceneries: synthesis of road traffic, railway and wind turbine noise. PhD thesis, Delft University of Technology, Delft, 2018.
[16] Virtual Acoustics – A real-time auralization framework for scientific research. Institute for Hearing Technology and Acoustics, RWTH Aachen University. Available at http://www.virtualacoustics.org, accessed on 2024-06-13.
[17] OpenDAFF – An open source file format for directional audio content. Institute for Hearing Technology and Acoustics, RWTH Aachen University. Available at http://www.opendaff.org, accessed on 2024-06-13.
[18] H. Fastl, E. Zwicker: Psychoacoustics: facts and models, Springer Series in Information Sciences, Springer, 2007.
[19] G. Ednie, D. Bird, K. Elliott: Fewer bat passes are detected during small, commercial drone flights, Scientific Reports 11 (2021) 11529.
[20] B. Boashash: Estimating and interpreting the instantaneous frequency of a signal. I. Fundamentals, Proceedings of the IEEE 80, 4 (1992) 520–538.
[21] M. Hayes: Statistical digital signal processing and modeling, ch. 8 – Spectrum estimation, John Wiley and Sons, New York, 1996, pp. 391–492.
[22] S. Mitra: Digital signal processing: a computer-based approach, ch. 6.4.2 – Digital Filter Structures – Cascade Realizations, McGraw-Hill, New York, 2001, pp. 370–372.
[23] IEC 61094-8: Measurement microphones – Part 8: Methods for determining the free-field sensitivity of working standard microphones by comparison. International Electrotechnical Commission, Geneva, CH, 2012.
[24] ISO 9613-2: Acoustics – Attenuation of sound during propagation outdoors – Part 2: General method of calculation. International Organization for Standardization, Geneva, CH, 1996.
[25] H. Bass, L. Sutherland, A. Zuckerwar, D. Blackstock, D. Hester: Atmospheric absorption of sound: further developments, The Journal of the Acoustical Society of America 97, 1 (1995) 680–683.
[26] B. Moore, B. Glasberg, T. Baer: A model for the prediction of thresholds, loudness, and partial loudness, Journal of the Audio Engineering Society 45, 4 (1997) 224–240.
[27] W. Aures: Der sensorische Wohlklang als Funktion psychoakustischer Empfindungsgrössen (Sensory pleasantness as a function of psychoacoustic sensations), Acustica 58 (1985) 282–290.
[28] ECMA-418-2: Psychoacoustic metrics for ITT equipment – Part 2: Models based on human perception. Ecma International, Geneva, CH, 2022.
Cite this article as: Dreier C. & Vorländer M. 2024. Drone auralization model with statistical synthesis of amplitude and frequency modulations. Acta Acustica, 8, 35.
All Tables
Table 1: Comparison of the drone reference recording and the auralization model synthesis by means of calculated psychoacoustic measures.
All Figures
Figure 1 Schematic spectral effect in the tonal component synthesis with (blue) and without (black) modulations.
Figure 2 Reference drone type DJI Mavic Pro.
Figure 3 General auralization model concept: The omnidirectional component is implemented in the sound synthesizer and convolved with higher-order spatial components (directivity). This combined approach can optionally be level-calibrated with any external data from measurements or simulations.
Figure 4 Schematic of the drone measurement setup in the hemi-anechoic chamber. The table indicates the lateral microphone distances to the flyover trajectory and the corresponding emission elevation angle θ.
Figure 5 Reference drone broadband emission spectrum (125 Hz < f < 48 kHz) and sound pressure level (SPL), recorded during a hovering maneuver at a 1 m distance at emission angles θ = 45° and ϕ = 0° under free-field conditions. A 1/24-octave band smoothing is applied for better visual clarity.
Figure 6 Spectra of the first 22 extracted tones (colored), compared to the originally recorded broadband spectrum. A 1/24-octave band smoothing is applied for better visual clarity.
Figure 7 Sound pressure (left) and instantaneous frequency (right) of the fifth extracted tone with mean frequency f_{5,mean} = 358.75 Hz.
Figure 8 Instantaneous tonal amplitude and frequency fluctuation in the far-field sound measurement. Coloration of each tone acc. to Figure 6.
Figure 9 Instantaneous amplitude and frequency variance in the tonal component. The frequency variance is exaggerated in the plot by a factor of 10 for better visual clarity.
Figure 10 IIR filter structure consisting of n cascaded second-order sections.
Figure 11 Comparison of the emitted high-frequency noise and the cascaded SOS filter responses of different order in the frequency range 2.5 kHz < f < 20 kHz.
Figure 12 Spectral mean and standard deviation at microphone channel 1 for n = 10 flyover measurement repetitions.
Figure 13 Off-axis frequency response (elevation range 0° < θ < 45°) of the NTI M2230 microphone. Free-field sensitivity measurement acc. to IEC 61094-8 [23].
Figure 14 Spectrum of distance-dependent air absorption A_{atm} for an on-site measurement range up to 5 m, based on the air absorption coefficient α for the measurement conditions in the hemi-anechoic chamber, calculated acc. to [25].