Synthesis of real world drone signals based on lab recordings

There is great interest in the generation of plausible drone signals in various applications, e.g. for auralization purposes or the compilation of training data for detection algorithms. Here, a methodology is presented which synthesises realistic immission signals based on laboratory recordings and subsequent signal processing. The transformation of a lab drone signal into a virtual field microphone signal has to consider a constant pitch shift to adjust for the manoeuvre-specific rotational speed and the corresponding frequency-dependent emission strength correction, a random pitch shift variation to account for turbulence-induced rotational speed variations in the field, Doppler frequency shift, and time- and frequency-dependent amplitude adjustments according to the different propagation effects. By evaluation of lab and field measurements, the relevant synthesizer parameters were determined. It was found that for the investigated set of drone types, the vertical radiation characteristics can be successfully described by a generic frequency-dependent directivity pattern. The proposed method is applied to different drone models with a total weight between 800 g and 3.4 kg and is discussed with respect to its abilities and limitations by comparing recordings taken both in the lab and in the field.


Introduction
Apart from the many new and fascinating possibilities offered by small unmanned aerial vehicles (UAV), here called drones, there is also growing concern with respect to noise annoyance or even damage. Since the professional use of drones in a commercial context, e.g. parcel drones, is still in its infancy, an estimate of the future drone density in cities is pure speculation. However, it can be assumed that the noise caused by drones will give rise to complaints as the number of applications increases. The still widespread lack of knowledge about the annoyance of drone sounds requires field experiments or listening tests in the lab to derive annoyance/exposure relations. While drone noise can be annoying [1,2], the emitted sound, on the other hand, opens up the possibility to detect and identify drones by their acoustic signature [3].
In both applications, be it the auralisation of a drone flight in the lab or the build-up of a database to train deep learning algorithms to detect drones, realistic audio signals for a ground-based receiver position are needed. While standard audio recordings only cover one or a few specific scenarios, an approach with synthetically generated virtual microphone signals appears more appealing. This allows for arbitrary variation of drone type, flight path, receiver geometry and ambient noise. The synthesis of a virtual microphone signal splits into two major tasks. The first one is the generation of the emission signal radiated by the drone, the second one is the simulation of the propagation effects by time-varying digital filters. The noise radiated by a rotating propeller is composed of strong tonal components [4] related to the rotational speed and a broadband signal. A fundamental decision has to be made on how this emission signal is obtained. The most versatile approach would be a synthesis from scratch based on the physics of the sound generation mechanism. It has been shown that computational fluid dynamics (CFD) simulations can predict the amplitudes of the tonal components with reasonable accuracy [5]. However, these simulations are somewhat idealised as they do not consider broadband rotor noise and motor noise and also ignore interactions between rotors and between rotor and the drone body, which have been shown to be important [6]. Moreover, during operation, propeller imperfections can also lead to changes in the acoustic signature [7], which can hardly be reproduced with theoretical models. In our context, an exact acoustical signature is considered essential. Therefore, sound pressure recordings in the lab are used to derive an appropriate initial emission signal that is then modified to simulate the radiation and propagation to the virtual microphone. Section 2 of this paper reports recordings of drone signals taken in the lab.
The experiments were conducted on a set of drones listed in Table 1. In Section 3, recordings in the field are evaluated in order to derive flight manoeuvre dependent rotor speeds and their random fluctuations as a function of wind speed. Section 4 describes the signal processing aspects of the virtual microphone signals. It will be shown that the recorded emission signal has to be resampled to account for the desired manoeuvre-specific rotor speed and then filtered to mimic the various propagation effects. With the audio recording as a starting point, the synthesis of the virtual microphone signal depends on the following input parameters: drone type, manoeuvre type, flight path of the drone, microphone location, ground type, and wind speed and direction. Sections 5 and 6 finally compare virtual and real microphone signals and discuss limitations and further improvements of the synthesis.
2 Recordings of drone sounds in the lab

Set-up
The acoustic emission of a drone is most easily observed in the anechoic chamber [8,9]. In our case, the recordings were taken in a semi-anechoic room with a rigid floor and highly absorbing walls and ceiling. The lab is specified as highly absorptive down to 100 Hz and has dimensions of 8 m × 5 m × 3.5 m. In order to suppress the ground reflection, the floor was covered temporarily with a foam layer of 20 cm thickness. In the experiments, the drone was operated either attached to a tripod, that is, in a fixed position, or in hover condition in those cases where the drone sensor system allowed for accurate and stable positioning in space. For the fixed drones, the rotational speed of the propellers (rpm) was varied directly by adjusting vertical thrust. The drones in hover were operated with different payloads attached in order to vary the propeller rpm.
The acoustic radiation of a drone can be assumed angle-independent in the horizontal rotor plane [10,11]. In the vertical direction, on the other hand, a pronounced directivity is expected. To capture the elevation angle-dependent radiation strength, a multi-microphone arrangement was set up (Fig. 1).
Special care had to be taken to suppress the wind induced microphone noise generated by the downwash of the rotors. To this end, the lowest microphone M1 was covered by a special wind screen (Rycote, model 086014). The substantial high frequency excess attenuation introduced by the wind screen was first measured (Fig. 2) and then compensated for in the subsequent analysis of the data.

Generic vertical directivity pattern
An evaluation of the microphone signals in the different elevation directions, relative to the chosen reference microphone M3 (radiation angle of −30° with respect to the rotor plane), is shown in Figure 3. The average and the standard deviations are to be understood over all the drone types and all operating conditions. For the subsequent considerations, a generic vertical directivity pattern with weakest radiation in the horizontal plane and a symmetrical continuous increase in radiation strength for positive and negative elevation angles is assumed. In order to easily apply the effect of the radiation pattern to an emission audio signal, a filter representation was chosen as a model. The angle- and frequency-dependent average directivity pattern from Figure 3 can be modeled by a second-order high-shelving filter [12] with corner frequency $f_c = 500$ Hz, $Q = 0.5$ and a radiation angle-dependent amplification $G$ [dB] according to equation (1), where $\theta \in [-90^\circ, +90^\circ]$ is, from a drone perspective, the radiation angle with respect to the drone horizontal plane. The amplification $G$ has been normalised to 0 dB for a radiation angle $\theta = \pm 30^\circ$. The radiation polar plot is shown in Figure 4.
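A second-order high-shelving filter with these parameters can be sketched as a biquad. The coefficient formulas below follow the widely used audio EQ cookbook form, which is an assumption here; the design in [12] may differ in detail. The gain is passed in directly because its dependence on the radiation angle is given by equation (1), and all function names are illustrative.

```python
import cmath
import math

FS = 48000.0  # sampling rate used for the synthesis

def high_shelf_coeffs(gain_db, fc=500.0, q=0.5, fs=FS):
    """Normalised biquad coefficients (b, a) of a high-shelving filter.

    Cookbook-style design (assumed, not necessarily the design of [12]).
    """
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * fc / fs
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2.0 * q)
    k = 2.0 * math.sqrt(amp) * alpha
    b0 = amp * ((amp + 1) + (amp - 1) * cos_w0 + k)
    b1 = -2.0 * amp * ((amp - 1) + (amp + 1) * cos_w0)
    b2 = amp * ((amp + 1) + (amp - 1) * cos_w0 - k)
    a0 = (amp + 1) - (amp - 1) * cos_w0 + k
    a1 = 2.0 * ((amp - 1) - (amp + 1) * cos_w0)
    a2 = (amp + 1) - (amp - 1) * cos_w0 - k
    return [b0 / a0, b1 / a0, b2 / a0], [1.0, a1 / a0, a2 / a0]

def response_db(b, a, f, fs=FS):
    """Magnitude response of the biquad in dB at frequency f."""
    zi = cmath.exp(-2j * math.pi * f / fs)  # z^-1 on the unit circle
    h = (b[0] + b[1] * zi + b[2] * zi * zi) / (a[0] + a[1] * zi + a[2] * zi * zi)
    return 20.0 * math.log10(abs(h))

def apply_biquad(x, b, a):
    """Filter the sample list x with the biquad (direct form I)."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y
```

With this form the response tends to 0 dB at low frequencies and to the shelf gain at the Nyquist frequency, so a gain of 0 dB leaves the signal unchanged.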

Rotational speed dependent emission models
The transformation of an audio signal of a drone operating at a reference rotational speed into the signal at a different rpm has to consider a suitable resampling as well as a frequency-dependent amplification. The amplification is described here by a third-octave band equalizer and forms the actual emission model. As an example, Figure 5 shows, for the DJI Mavic 2 Pro drone, the 1 kHz band of the equalizer amplification evaluated for seven measurements at various rotational speeds, relative to the reference signal at 6540 rpm.
The relation between the equalizer setting $E(i)$ in dB and the rotational speed $R$ can be described reasonably well by a linear model (Eq. (2)), where $S(i)$ is the slope parameter for the third-octave band $i$ and $R_{\mathrm{ref}}$ is the reference rotational speed, chosen here according to Table 2. The baseline third-octave band spectra at $R_{\mathrm{ref}}$ are shown in Figure 6, the slope parameters are given in Table 3. For the drones Mavic 2 Pro and Inspire 2, seven measurements with different rpms were available to adjust the model. In case of the drones F-450 and S-900, the model fit is based on two measurements only. Where meaningful, the coefficient of determination $R^2$ is also listed in Table 3.
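The band-wise equalizer setting can be illustrated with a short sketch. It assumes that the linear model has the form E(i) = S(i) (R − R_ref), so that the correction vanishes at the reference speed, and that the slope is expressed in dB per rpm; both are assumptions, and the slope value in the usage note is hypothetical, not taken from Table 3.

```python
def equalizer_gain_db(slope_db_per_rpm, rpm, rpm_ref):
    """Equalizer setting of one third-octave band (assumed linear form)."""
    return slope_db_per_rpm * (rpm - rpm_ref)

def band_gains_db(slopes, rpm, rpm_ref):
    """Gains for all bands; slopes maps band centre frequency to slope."""
    return {fc: equalizer_gain_db(s, rpm, rpm_ref) for fc, s in slopes.items()}

def db_to_amplitude(gain_db):
    """Convert a level change in dB into a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)
```

For instance, `band_gains_db({1000: 0.002}, 7540, 6540)` gives a correction of about 2 dB in the 1 kHz band for this hypothetical slope.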

Flight manoeuvre dependent rotor speeds
During two measurement campaigns, one day in 2018 in Thun, Switzerland, and four days in 2019 in Felixdorf, Austria, the drones were operated in the field. Audio recordings were taken at two fixed microphone positions, one at a height of 1.2 m above ground and the other flush-mounted on the ground. A third, on-board microphone was attached to the drone by a 0.8 m long cable to capture the emission (Fig. 7). In addition, the position of the drone during the different flight procedures was logged with the help of an on-board GPS tracker.
Based on spectral representations of the on-board recordings, the fundamental frequency $f_0$ was evaluated for the different flight manoeuvres. All drone models had two-blade propellers, so that the time $T$ for a complete revolution of the rotor is $T = 2/f_0$ and the rotational speed in rpm is obtained as $R = 30 f_0$. The velocities of the horizontal flights were evaluated as air speeds by subtracting the corresponding wind speed component from the velocity derived from GPS. Table 4 shows the findings. Note that the slope parameter of the F-4 drone is valid for rotational speeds > 6200 r/min only.
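The conversion from fundamental frequency to rotational speed and the air speed correction described above amount to two one-line functions. The following sketch uses illustrative names; the number of blades is kept as a parameter although all drones considered here have two-blade propellers.

```python
def rpm_from_f0(f0_hz, n_blades=2):
    """Rotational speed in rpm from the blade passing frequency f0."""
    return 60.0 * f0_hz / n_blades

def air_speed(gps_speed, wind_component):
    """Air speed along the flight direction: GPS speed minus wind component.

    wind_component is positive for wind blowing in the flight direction.
    """
    return gps_speed - wind_component
```

A fundamental of 218 Hz, for example, corresponds to 6540 rpm for a two-blade propeller.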

Introduction
The operation of a drone in an inhomogeneous wind field requires a control mechanism that automatically compensates for changes in lifting force. This results in a random variation of the rotational speed of each rotor. In order to identify the magnitude and time scale of these fluctuations, repeated experiments with a DJI Mavic 2 Pro drone, equipped with low-noise propellers, were performed.
For the subsequent signal synthesis, the results of these measurements will be transferred analogously to the other drone models.

Set-up
The rotational speed was derived from audio signals captured with a two-channel on-board recorder Sony PCM-A10 (Fig. 8). The two cardioid microphones were oriented towards the two rear rotors (as seen in the flight direction). Due to the proximity effect, the microphone signals exhibit a substantial low frequency amplification. However, as only the identification of the fundamental frequency is of interest, this has no effect on the subsequent evaluation.
To guarantee repeatability of the experiments, a flight route at a height of 50 m along an orthogonal cross was defined with the help of way points. Each segment of 250 m length was consecutively flown (forward flight) with a speed of 8 m/s in both directions to create downwind and upwind conditions. In addition, the manoeuvres hover (at a height of 10, 20 and 50 m above ground), climb (+4 m/s) and sink (−3 m/s) were flown. Meteorological data on temperature, atmospheric pressure and humidity were obtained from a nearby weather station; the local wind speed $v_{\mathrm{wind,average,2.5\,m}}$ and direction were determined with the help of a hand-held anemometer at 2.5 m above ground.

Evaluation of the rotational speed
The estimation of the fundamental frequency $f_0$ with a temporal resolution of 1 ms was obtained in the time domain. A frame with a total length of 10 periods centered at the time of interest (5 periods before and 5 periods after) was copied and shifted until a local maximum was reached in the autocorrelation function. This delay was then interpreted as one period of $f_0$ and converted into the momentary rotational speed $R = 30 f_0$. Based on a priori knowledge of the order of magnitude of $f_0$, the search range could be narrowed down to avoid octave errors.
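A simplified version of this time-domain estimate is sketched below. It assumes an exhaustive search over integer lags within a given frequency range and a normalised correlation measure; the frame length is derived from the longest admissible period. These choices, and all names, are illustrative rather than the exact procedure used for the evaluation.

```python
import math

def estimate_f0(x, center, fs, f0_min, f0_max, n_periods=10):
    """Estimate f0 around sample index `center` by autocorrelation.

    The search is restricted to [f0_min, f0_max] to avoid octave errors.
    """
    lag_min = int(fs / f0_max)
    lag_max = int(math.ceil(fs / f0_min))
    half = (n_periods * lag_max) // 2
    start, stop = center - half, center + half
    if start < 0 or stop + lag_max > len(x):
        raise ValueError("frame does not fit into the signal")
    frame = x[start:stop]
    frame_energy = sum(v * v for v in frame)
    best_lag, best_corr = lag_min, -math.inf
    for lag in range(lag_min, lag_max + 1):
        shifted = x[start + lag:stop + lag]
        num = sum(a * b for a, b in zip(frame, shifted))
        den = math.sqrt(frame_energy * sum(v * v for v in shifted))
        corr = num / den if den > 0.0 else 0.0
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return fs / best_lag
```

Calling the function for centre positions spaced 1 ms apart (48 samples at 48 kHz) yields a time history of $f_0$ that can be converted to rpm with $R = 30 f_0$.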
As a plausibility check, each new estimate R[n + 1] was compared to the old one R[n]. Based on a power considera-

Magnitude of the rotational speed variation
The variation of the rotational speed during one manoeuvre and flight was evaluated as normalised standard deviation of the rotational speed: $\sigma_n = \sigma(R[n])/R_{\mathrm{average}}$. In combination with the average wind speed $v_{\mathrm{wind,average,2.5\,m}}$ for that flight, a data pair was obtained to finally derive a linear relation between $\sigma_n$ and $v$ (Eq. (4)).
For the manoeuvre forward flight, the wind was categorised as downwind whenever the angle between the wind flow and the flight direction was smaller than 90° and upwind whenever the angle was larger than 90°. In these two categories, the full wind speed was considered in the subsequent analysis. The drone flights were performed 18 times under different meteorological conditions. The average wind speed at 2.5 m varied between 0.3 and 5.3 m/s, the temperature ranged from 4 to 23 °C. Figure 12 shows the pairs $(\sigma_n, v_{\mathrm{wind,average,2.5\,m}})$ for the different manoeuvres and Table 5 lists the model parameters $a$ and $b$ (Eq. (4)) as well as the coefficient of determination.
The comparison of the different hover manoeuvres shows an increase of the rotational speed variation with height, which is in line with the expected increase of wind speed with height. The manoeuvre sink exhibits very large rpm variations, almost independent of the wind speed. The reason for this is that the drone flies into a zone of very turbulent air produced by its own downwash. In forward flight, the rpm variation is substantially larger in upwind conditions compared to downwind.
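The two evaluation steps, the normalised standard deviation per flight and the straight-line fit against wind speed, can be sketched as follows. The fit is an ordinary least-squares line returned as intercept and slope; how these map onto the parameters $a$ and $b$ of equation (4) is not assumed here, and the function names are illustrative.

```python
import statistics

def normalised_std(rpm_series):
    """Standard deviation of the rpm time history divided by its mean."""
    return statistics.pstdev(rpm_series) / statistics.mean(rpm_series)

def fit_line(x, y):
    """Least-squares line y ~ intercept + slope * x with R^2."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return intercept, slope, r_squared
```

Each flight contributes one pair of average wind speed and normalised standard deviation; `fit_line` is then applied per manoeuvre to the collected pairs.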

Temporal pattern of the rotational speed variation
The evaluation of the time scale of the rotational speed variations is based on the power spectral density of the time histories. For the analysis, the DC component of the time histories was removed and the amplitudes were scaled to a standard deviation of 1. For the comparison, the different hover operations were combined and an analogous grouping was done for the forward flights. Figure 13 shows the power spectral density for hover, climb, forward and sink. With the exception of the frequency range around 10 Hz, the curves show only a weak dependency on the manoeuvre, so a generic spectrum is assumed in the synthesis.

Generation of virtual microphone signals

4.1 Emission signal
The starting point is an audio recording taken in the anechoic chamber at an angle of −30° with respect to the rotor plane, whereby the drone was operated at the reference rotational speed according to Table 2. The sampling frequency of the recording and of the subsequent signal processing was set to 48 kHz.

Adjustment of rotational speed
In order to mimic real flight conditions, a transformation of the stationary drone signal recorded in the lab has to be performed. To this end, the audio signal is resampled twice. This is achieved by introducing a time-dependent delay $\Delta s(t)$, where the time derivative of $\Delta s$ equals $f(t)$ and $f(t) = (R_{\mathrm{ref}} - R(t))/R(t)$, with rotational speed $R(t)$ at time $t$ and reference rotational speed $R_{\mathrm{ref}}$.
A first step implements the constant frequency shift according to the average rotational speed for the specific flight manoeuvre (see Sect. 3.1). In a second step, a random variation due to the non-stationary operation of the propellers in the inhomogeneous wind field is generated (see Sect. 3.2). The required function $f(t)$ is generated for a specific manoeuvre and a given average wind speed based on a random signal with a spectrum according to Figure 13, with appropriate amplitude scaling for the normalised standard deviation $\sigma_n$ (Eq. (4), Tab. 5).
The resampling process requires suitable interpolation, as access to samples at fractional delays is required. Here, Lagrange interpolation [13] is used to determine the emission sample $e$ at arbitrary time $n + s$, where $n$ is an integer and $s$ is the fractional part.
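A variable-rate resampler with Lagrange interpolation can be sketched as follows. The sketch assumes a four-point (third-order) interpolator, since the order is not specified here, and it realises the time-varying delay as a read position that advances by R/R_ref input samples per output sample; both are illustrative choices, as are the function names.

```python
def lagrange4(x, n, s):
    """Interpolate x at position n + s (0 <= s < 1) from x[n-1] .. x[n+2]."""
    c_m1 = -s * (s - 1.0) * (s - 2.0) / 6.0
    c_0 = (s + 1.0) * (s - 1.0) * (s - 2.0) / 2.0
    c_1 = -(s + 1.0) * s * (s - 2.0) / 2.0
    c_2 = (s + 1.0) * s * (s - 1.0) / 6.0
    return c_m1 * x[n - 1] + c_0 * x[n] + c_1 * x[n + 1] + c_2 * x[n + 2]

def resample_to_rpm(x, rpm, rpm_ref):
    """Resample x so that its pitch follows rpm (one value per output sample).

    A ratio rpm / rpm_ref above 1 reads the recording faster and raises
    the pitch; a ratio below 1 lowers it.
    """
    y = []
    pos = 1.0  # first position with a full four-point neighbourhood
    for r in rpm:
        n = int(pos)
        if n + 2 >= len(x):
            break  # recording exhausted
        y.append(lagrange4(x, n, pos - n))
        pos += r / rpm_ref
    return y
```

The constant shift and the random variation can both be expressed through the sequence `rpm`, for example as the manoeuvre-specific average speed multiplied by one plus a random fluctuation with standard deviation $\sigma_n$.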

Amplitude equalisation
The amplitude equalisation implements the necessary additional spectrum adjustment after resampling of the laboratory recording to convert it to the emission signal for the specific flight manoeuvre. Table 4 shows the manoeuvre-specific rotational speeds, Table 2 lists the rotational speeds of the lab recordings. The amplification $E(i)$ [dB] to be applied in the third-octave band $i$ is determined with equation (2) and the parameter setting from Table 3.

Propagation filtering
The propagation filter mimics the following effects: radiation directivity, geometrical spreading, Doppler frequency shift, atmospheric absorption, ground effect, and amplitude modulations due to turbulence. As the source is moving, the propagation filtering is time-variant according to the changing geometry of source, receiver and reflecting objects.

Radiation directivity
For a specific source-receiver geometry, the radiation directivity (Sect. 2.2) is considered by applying the appropriate high-shelving filter to the emission signal.

Geometrical spreading and Doppler frequency shift
Geometrical spreading is modeled as a frequency-independent amplitude scaling with a factor $s = d_{\mathrm{ref}}/d$, where $d$ is the actual distance and $d_{\mathrm{ref}} = 1.5$ m is the distance of the recording. Doppler frequency shift is the result of the motion of the source relative to the receiver and thus of a time-dependent sound propagation delay. This is implemented by a time-dependent mapping of the emission time axis onto the receiver time axis.
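Both effects follow directly from the source-receiver distance along the flight path, as the following sketch illustrates. A constant speed of sound of 343 m/s, straight-line propagation and linear interpolation of the inverse time mapping are assumptions; the function names are illustrative.

```python
import bisect
import math

C_SOUND = 343.0  # assumed speed of sound in m/s
D_REF = 1.5      # distance of the lab recording in m

def spreading_gain(d, d_ref=D_REF):
    """Frequency-independent amplitude factor d_ref / d."""
    return d_ref / d

def arrival_times(emission_times, source_positions, receiver, c=C_SOUND):
    """Map each emission time to the time of arrival at the receiver."""
    return [t + math.dist(p, receiver) / c
            for t, p in zip(emission_times, source_positions)]

def emission_time_at(t_receiver, arrivals, emission_times):
    """Invert the mapping by linear interpolation.

    arrivals must be strictly increasing, which holds for subsonic sources.
    """
    i = bisect.bisect_right(arrivals, t_receiver)
    i = min(max(i, 1), len(arrivals) - 1)
    t0, t1 = arrivals[i - 1], arrivals[i]
    w = (t_receiver - t0) / (t1 - t0)
    return emission_times[i - 1] + w * (emission_times[i] - emission_times[i - 1])
```

Reading the emission signal at `emission_time_at(t, ...)` for a uniform grid of receiver times `t` compresses the time axis while the drone approaches and stretches it while the drone recedes, which produces the Doppler shift.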

Ground effect
In addition to the direct path, sound is reflected at the ground. The two contributions superimpose and form an interference pattern. The amplitude and phase of the ground-reflected wave with respect to the direct sound depend on the geometry and the ground type. Here, a characterization based on the airflow resistivity $\sigma_{\mathrm{ground}}$ is used, with $\sigma_{\mathrm{ground}}$ = 20 000 kPa s/m² for asphalt surfaces, 5000 kPa s/m² for compact soil and 300 kPa s/m² for grass. With the help of the empirical Delany-Bazley model [14], a frequency-dependent surface impedance $Z(f)$ is determined and finally the spherical wave reflection coefficient $Q(f)$ is calculated [15]. $Q(f)$ is modeled by a finite impulse response (FIR) filter with 100 taps. This concept can also be applied to consider reflections at other surfaces, such as building facades with their corresponding flow resistivities.
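The first part of this chain can be sketched with the commonly quoted form of the Delany-Bazley fit. For brevity, the sketch evaluates the plane-wave reflection coefficient of a locally reacting surface instead of the spherical wave reflection coefficient $Q(f)$, which is a deliberate simplification; the air density of 1.2 kg/m³ and the function names are assumptions as well.

```python
import math

RHO0 = 1.2  # assumed air density in kg/m^3

def delany_bazley_impedance(f, sigma_kpa):
    """Normalised surface impedance Z / (rho0 c).

    sigma_kpa is the airflow resistivity in kPa s/m^2; the sign of the
    imaginary part corresponds to an exp(+j omega t) time convention.
    """
    x = RHO0 * f / (sigma_kpa * 1000.0)
    return complex(1.0 + 0.0571 * x ** -0.754, -0.087 * x ** -0.732)

def plane_wave_reflection(f, sigma_kpa, grazing_angle_deg):
    """Plane-wave reflection coefficient (simplification of Q(f))."""
    z = delany_bazley_impedance(f, sigma_kpa)
    s = math.sin(math.radians(grazing_angle_deg))
    return (z * s - 1.0) / (z * s + 1.0)
```

At 1 kHz and normal incidence, the magnitude is close to 1 for the asphalt value and clearly lower for grass. Note that the empirical fit is usually quoted for a limited range of the ratio of frequency to flow resistivity, so values for very hard surfaces are extrapolations.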

Air absorption
Air absorption considers the additional frequency-dependent attenuation as the sound wave travels through air. The standard ISO 9613-1 [16] offers a set of formulas to calculate temperature- and humidity-dependent spectral air absorption, which is finally represented by a linear-phase FIR filter with 50 taps.
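The pure-tone attenuation coefficient of ISO 9613-1 can be evaluated as in the following sketch. The function name and argument defaults are illustrative; the design of the FIR filter from the resulting attenuation spectrum is not shown.

```python
import math

def air_absorption_db_per_m(f, temp_c=20.0, rel_humidity=70.0, pressure_kpa=101.325):
    """Atmospheric absorption in dB/m at frequency f in Hz (ISO 9613-1)."""
    t = temp_c + 273.15
    t0, t01, p_ref = 293.15, 273.16, 101.325
    p_rel = pressure_kpa / p_ref
    # Molar concentration of water vapour in percent
    p_sat_rel = 10.0 ** (-6.8346 * (t01 / t) ** 1.261 + 4.6151)
    h = rel_humidity * p_sat_rel / p_rel
    # Relaxation frequencies of oxygen and nitrogen
    fr_o = p_rel * (24.0 + 4.04e4 * h * (0.02 + h) / (0.391 + h))
    fr_n = p_rel * (t / t0) ** -0.5 * (
        9.0 + 280.0 * h * math.exp(-4.170 * ((t / t0) ** (-1.0 / 3.0) - 1.0)))
    classical = 1.84e-11 / p_rel * math.sqrt(t / t0)
    oxygen = 0.01275 * math.exp(-2239.1 / t) / (fr_o + f * f / fr_o)
    nitrogen = 0.1068 * math.exp(-3352.0 / t) / (fr_n + f * f / fr_n)
    return 8.686 * f * f * (classical + (t / t0) ** -2.5 * (oxygen + nitrogen))
```

Multiplying the result by the propagation distance gives the attenuation in dB per frequency, which defines the target magnitude response of the air absorption filter.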

Turbulence effects
As a consequence of temporal and local inhomogeneities of the atmosphere, the amplitude of a sound wave at a receiver location at distance $d$ varies randomly [17]. Following reference [18], this phenomenon is modeled by a high-shelving filter with $Q = 0.52$ and $f_c = 10\,000/\sqrt{d\,[\mathrm{m}]}$ [12] (Fig. 14). The filter amplification $G$, expressed in dB, is steered by a fourth-order 2 Hz low-pass filtered white noise signal with a standard deviation of 1 dB.
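The control signals of this filter can be sketched as follows. The low-pass is approximated by a cascade of four one-pole sections as a simple stand-in for the fourth-order filter, and the gain is generated at an assumed control rate of 100 Hz; the seed and the function names are illustrative.

```python
import math
import random
import statistics

def shelf_corner_frequency(distance_m):
    """Corner frequency of the turbulence shelving filter."""
    return 10000.0 / math.sqrt(distance_m)

def gain_fluctuation_db(n, rate_hz=100.0, cutoff_hz=2.0, std_db=1.0, seed=0):
    """Slowly varying random shelf gain in dB with the given standard deviation."""
    rng = random.Random(seed)
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    k = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / rate_hz)
    for _ in range(4):  # four cascaded one-pole low-pass sections
        state = 0.0
        smoothed = []
        for v in g:
            state += k * (v - state)
            smoothed.append(state)
        g = smoothed
    mean = statistics.mean(g)
    spread = statistics.pstdev(g)
    return [std_db * (v - mean) / spread for v in g]
```

The gain sequence would be interpolated to the audio rate and used to update the shelving filter coefficients over time.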

Background noise
The last step in the generation of the virtual microphone signal is the superposition of any desired background noise. Calibrated recordings or synthesised environmental sounds are suitable for this purpose.

Virtual vs. real microphone signals
As an example, Figure 15 shows spectrograms of a recording taken at 1.2 m above ground during the 2018 campaign and the corresponding synthesis based on the lab recordings. The drone was a DJI F-450 in forward flight with a speed of 8 m/s at a height of 10 m above ground. Background noise that was added to the synthesis was recorded during a non-flight period.
The overall picture is dominated by a time-dependent interference pattern due to the superposition of direct and ground-reflected sound. The comparison of the recording and the synthesis confirms an accurate simulation of the ground effect. The Doppler frequency shift of the 8 kHz component in the recording can also be seen in the synthesis, although down-shifted and randomly varied in frequency due to the adjustment of the rotational speed. The air absorption induced attenuation of the high frequencies before and after the time of shortest distance in the synthesis is also in line with the recording. The comparison demonstrates the validity of the synthesis method in principle and its capability to correctly model the relevant propagation effects.
The synthesis approach in its present state creates very plausible signals; however, it still has some limitations. First, the synthesis considers the manoeuvre and the geometry of the flight path of the drone but ignores flight dynamics. Non-stationary operating conditions, such as a transition from hover to forward flight, cannot be modeled yet. Second, both the constant and the random pitch shift processes so far modify the full emission signal, assuming that the frequency of each signal component scales with the rotational speed. As the high-frequency tonal component (likely the electric power supply) in Figure 15 shows, this is not necessarily the case. Third, the emission signal captured in the lab is composed of the superposition of all rotors. Consequently, a rotor-individual adjustment of the rotational speed is not possible.

Conclusions
The proposed approach to create virtual microphone signals of flying drones makes it possible to efficiently synthesize (that is, without having to record every signal separately) a large set of very plausible audio data to train machine learning algorithms and for auralisation purposes. The synthesis process is composed of an emission signal generation and a propagation filtering step. Here, the emission signal is based on a lab recording that is subsequently manipulated to simulate a manoeuvre-dependent rotational speed of the rotors and non-stationary conditions in the field. The lab recordings can be obtained quickly and inexpensively, so that further drone models or variants with e.g. modified propellers [19] can be added to a collection with little effort. So far, the necessary resampling process performs a broadband pitch shift that ignores rotor-specific characteristics. A decoding of the emission signal into individual components and subsequent signal synthesis [20] would introduce more flexibility to vary the pitch more specifically. Further refinement of the synthesizer could be achieved by a breakdown of the total emission into individual rotor signals. This would allow for a rotor-individual random variation of rotational speed. However, care must be taken not to lose the interaction phenomena between the rotors [21]. The propagation effects that transform the emission into a signal at a stationary receiver are physically well understood and can be implemented efficiently by digital filters. Greater effort must be made to include shielding and reflection effects in urban areas.