Acta Acustica – Volume 9 (2025)
Topical Issue – Virtual acoustics
Article Number: 1 | Number of pages: 15
DOI: https://doi.org/10.1051/aacus/2024073
Published online: 03 January 2025
Scientific Article
Source-time dominant modeling of the Doppler shift for the auralization of moving sources
1 Institute of Sound Recording, University of Surrey, Stag Hill, Guildford, GU2 7XH, UK
2 Applied Acoustics Branch, NASA Langley Research Center, Building 1208, Hampton, VA 23681, USA
* Corresponding author: r.ali@surrey.ac.uk
Received: 19 February 2024 – Accepted: 16 October 2024
When developing an auralization for acoustic scenarios involving moving sources and receivers, one key feature is the ability to simulate the Doppler shift, i.e., the changing frequency content from the receiver’s perspective. As the time-varying delay between a source and receiver is what accounts for the Doppler shift, an approximation of this delay is required to successfully render the changes in frequency content at the receiver. Depending on the signal-processing strategy chosen to accomplish this task, there is, however, a potential to introduce audible artifacts due to frequency folding (aliasing), frequency replication (imaging), and broadband noise. In this paper we discuss the manifestation of such artifacts and propose a method to eliminate them, which can be integrated into the digital signal processing chain of larger auralization schemes. The method is built upon a source-time dominant approach and uses a combination of oversampling, interpolation, and time-varying filtering to predict and eliminate frequency regions at the receiver that are vulnerable to aliasing and imaging. We demonstrate the strengths and weaknesses of the method using a circularly moving source with a fixed receiver.
Key words: Auralization / Signal processing / Doppler shift / Sample rate conversion / Virtual acoustics
© The Author(s), published by EDP Sciences, 2025
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1 Introduction
Auralization refers both to the signal processing chain used to synthesize and reproduce a complex acoustic scenario and to the output of that process [1]. In the latter sense, the simulated sound of the scenario can be saved as an audio file and presented to multiple listeners. Auralizations can therefore be particularly useful in understanding how different sounds affect people: controlled experiments can be conducted where the listener is embedded in a virtual acoustic environment and variations on a target sound are presented for judgement [2, 3]. While these sorts of experiments can be done using recordings, this would necessitate a substantial recording effort for even a moderate range of design possibilities (for instance, see the flight-test campaign that generated the data for a study by Boucher et al. [4, 5]). Such an effort will likely require advanced recording techniques and hardware – arrayed microphones to capture spatial aspects of the source and/or location data (such as GPS) recorded in synchrony with the audio in order to “put the source in motion” back in the laboratory [6]. Most importantly, for a recording to be taken, the vehicle must already exist, and this underscores a major strength of the auralization approach: an auralization produced from an object still in its design stage can engender noise-aware decision making earlier in development, when significant changes may be made at lower cost and could have a large effect on the ultimate impact of noise on communities [2].
In acoustic scenarios consisting of sources and receivers moving relative to one another, one salient and immersive feature of an auralization is the ability to simulate the Doppler shift, i.e., the changing frequency content from the receiver’s perspective. The Doppler shift is an essential aspect of the auralization for urban scenarios [7], aircraft flyover scenarios [8], road-traffic noise [9, 10], and railway noise [11]. Wind turbines share similar geometry and noise-generation mechanisms to rotorcraft, but on a larger scale, and taking Doppler shift (as well as Doppler amplification) into consideration is important, especially for observers close to the disk plane [12]. Another interesting musical application comes in the simulation of “Leslie” speaker cabinets [13], a celebrated piece of electroacoustical-musical equipment that has a spinning speaker somewhat resembling a helicopter rotor.
The time-varying delay between a source and receiver during subsonic motion is what accounts for the Doppler shift; thus, an approximation of this delay is required for its simulation. Well-established methods for simulating the Doppler shift within larger auralization schemes often employ variable-length delay lines along with some sort of interpolation [7, 10, 13], where the accuracy of this interpolation directly impacts the accuracy and naturalness of the simulated Doppler shift [14]. This interpolation can be done at the source so that, once the samples are propagated, they appear uniformly spaced at the receiver (a “receiver-time dominant” approach, e.g., [15]). It can also be done at the receiver, working from nonuniformly-spaced samples that have been propagated before interpolation (a “source-time dominant” approach, e.g., [9]).
Regardless of the method used for simulating the Doppler shift, the time dilation will cause the bandwidth of the source signal to increase and/or decrease, and so there is a potential for audible artifacts to arise during the interpolation step as a result of frequency folding (aliasing) and frequency replication (imaging). Aliasing can occur when the sampling frequency of the receiver is less than twice the bandwidth of the Doppler shifted source signal. On the other hand, even if the sampling frequency of the receiver is greater than twice the bandwidth of the Doppler shifted source signal, undesired frequency content (Doppler shifted frequency replicas from a discrete-time source signal) can creep in below the Nyquist frequency of the receiver due to imaging. Hence any method of simulating the Doppler shift needs to take care in keeping track of how the bandwidth of the source signal changes during motion and apply relevant filtering to eradicate such artifacts.
The manifestation of these types of artifacts is a consideration that has not been given much attention in the literature. This is perhaps because it can be addressed in situations where the entire source-receiver trajectory is known beforehand and changes in source bandwidth at the receiver can be precomputed. Hence aliasing may be avoided by appropriate filtering at the source [7] and/or by choosing a sufficiently high sampling frequency at the receiver. Imaging, meanwhile, is typically taken care of as a byproduct of filtering in other parts of an auralization scheme, e.g., air absorption filters [7].
In this paper, we discuss how aliasing and imaging artifacts arise when auralizing moving sources, and propose a method for simulating the Doppler shift that is inherently designed to eliminate them. Unlike existing methods which make use of variable delay lines, the proposed method uses a different framework altogether. It adopts the source-time dominant paradigm and operates on a sample-by-sample basis, keeping track of the rate at which samples are coming into the receiver after simulating propagation. This allows one to figure out what frequency regions at the receiver will be vulnerable to aliasing and imaging, and subsequently use a combination of oversampling, filtering, and downsampling strategies to predict and eliminate these regions. We will refer to this approach as the “N-Oversampled Transmitted Acoustic Pressure” (NOTAP) method [16, 17].
The NOTAP method has several advantages in that simple “zero-order hold” interpolation (ZOH, a.k.a. “sample-and-hold”) can be effectively used, the entire source-receiver trajectory does not have to be known beforehand, and imaging is taken care of directly as part of the Doppler shift simulation (as opposed to other parts of an auralization scheme). Additionally, many contemporary auralization approaches coarse-grain the Doppler shift factor, which can lead to audible artifacts if the factor is changing rapidly or is not interpolated appropriately [18]. In contrast, the NOTAP method uses the exact sample-wise propagation delay, an approach also taken in some other auralization methods (e.g., [9, 13]). Thus, it produces Doppler shifts that are not approximations, making it appropriate for applications that involve large and/or rapidly-changing Doppler shifts.
NOTAP has been inspired by the problem of auralizing aircraft rotors like those found on helicopters and many so-called “advanced air mobility” vehicles (drones, air taxis, etc., see [19]). Rotors present an interesting challenge, as aeroacoustic predictions made on the blade surfaces may undergo very large Doppler shifts. If the vehicle is flying toward the observer, sound emitted by the advancing blade will be shifted toward higher frequencies both by the motion of the blade around the rotor hub and by the motion of the vehicle as a whole, risking aliasing. This is characterized by the “advancing tip Mach number” of the rotorcraft which, for contemporary helicopters, is kept around 0.7 [20]. Similarly, sound emitted from a retreating blade as the vehicle flies away from the receiver may undergo large negative Doppler shifts (toward lower frequencies) that could open the door to the incomplete removal of frequency replicas or images (see Sect. 2 for a detailed discussion of the problem). Typically these two motions are not processed together in contemporary rotorcraft auralization schemes (e.g., [19, 21], and [22] for a fixed-wing example), though they could be in concept, and the frequency shifts will compound whether they come from one DSP stage or two.
The NOTAP method bears a large resemblance to existing DSP strategies that have been used to tackle other problems in the past. Massie describes a “drop-sample tuning” scheme for musical wavetable synthesis that likely sees wide use in samplers and synthesizers [23]. Strauss described a similar “Asynchronous Sampling Frequency Conversion” method for Doppler simulation in the 1990s [24]; however, there appear to be no examples of its use for auralization. Another class of previous work comes from authors trying to solve the “plesiochronous” problem: how to interface two digital systems that are running on nearly the same, but not exactly synchronized, clocks. A text by Zölzer describes a “multistage” method [25] to tackle this problem. A hardware solution described by Adams and Kwan [26] closely resembles the NOTAP method, and they note that their device generates “brief doppler-shift effects [sic]” when used to track artificially large swings in sample rates. In all, although the general approach taken by NOTAP is not an entirely new concept, there seem to be no contemporary works in auralization that utilize such a scheme.
The rest of this paper is organized as follows: Section 2 gives a conceptual overview of the problems of aliasing and imaging and describes how they might creep into an auralization DSP chain. Section 3 gives the technical description of the NOTAP method. Section 4 shows the results of the NOTAP method on an exaggerated example (with a slow simulated speed of sound to generate a high Doppler shift) that would produce both aliasing and imaging artifacts if processed with a naïve DSP strategy.
2 Artifacts in the chain
Two fundamental pitfalls that need to be avoided in the DSP chain of an auralization are the generation of imaging and aliasing artifacts. This section covers the source of these artifacts on a conceptual level, gives an account of how both may be avoided, and sets the stage for the technical description given in Section 3.
2.1 Images vs. interpolation
A band-limited continuous-time signal is depicted in Figure 1a, and its frequency spectrum is shown in Figure 1b with a bandwidth, fb (Hz). Discretization of this continuous-time signal involves its multiplication with an impulse train, whose impulses are evenly spaced in time by 1/fs (fs being the sampling frequency (Hz)), the result of which is shown in Figure 1c. This multiplication in the time domain is equivalent to a convolution in the frequency domain of the frequency spectrum of the continuous-time signal from Figure 1b with another impulse train, whose impulses are now spaced by fs, resulting in copies or replicas of the frequency spectrum from Figure 1b at integer multiples of fs, as shown in Figure 1d [27]. These frequency replicas extend infinitely in frequency and are also referred to as images in the literature [29], a terminology which we adopt throughout this paper.
Figure 1 The effects of sampling in the time and frequency domains. (a) A continuous time-domain signal, (b) the corresponding frequency spectrum of (a) with bandwidth, fb, (c) the discrete-time signal after multiplication of (a) with an impulse train, and (d) the corresponding frequency spectrum of (c), where the frequency spectrum of the continuous-time signal is repeated at integer multiples of the sampling frequency, fs. These frequency spectrum replicas are referred to as images (only the first two positive images are shown, but they continue to ±∞). fN is the Nyquist frequency.
At a fundamental level, any technique that is trying to infer the information that ought to exist between the samples of a discrete-time signal is doing something to remove these images. That is, the operation that returns a continuous-time signal from a discrete one is the same as removing the images that implicitly exist in the discrete signal – returning to a signal with infinite frequency range (though most of this range is now empty, as in Fig. 1b). Thus, in the ideal case, this is done by convolving the discrete-time input signal with a sinc function that is properly scaled in magnitude and time (the canonical rectangular low-pass filter, [27]). This operation is impossible in a practical setting as it involves an infinite acausal sum. Whatever approximations are made in order to make computation realizable will necessarily create imperfections in the resultant signal. This is what is meant by “imaging” in this work – the persistence of images due to their incomplete removal by filtering or interpolation stages in a signal processing chain (cf. [25]).
There are several classes of algorithms to design a filter to remove images, i.e., an anti-imaging filter, where the main approach is to directly approximate the ideal filter (rectangular low-pass filter). This can be done in several practical ways based on what the goal of the system is (does it need to be causal, execute in real time, etc.). In this way, the DSP subdisciplines of fractional delay interpolation [30–32], asynchronous sample-rate conversion (ASRC) [25], signal reconstruction [27], and bandlimited interpolation [33] are all fields grappling with the same problem. An important attribute that all of these concepts share is that they need a uniformly-sampled signal as an input. As will be seen, once the source is put into motion (resulting in a Doppler shift), processing the samples in a source-time dominant fashion (at the receiver) will necessitate the use of interpolation methods that can contend with nonuniformly-sampled inputs.
Another class of algorithms is based on geometrical conceptions of interpolation (e.g., linear interpolation between samples). These geometrical algorithms tend to offer suboptimal performance for their cost. They can inherently be used with nonuniform inputs; however, some can be cast as LTI operations (e.g., as “Lagrangian” methods [30, 34]), making them computationally faster but necessitating a uniform input.
A final class of interpolation approaches that can deal with nonuniform inputs comprises the n-order hold methods [27, 28]. The simplest of these is zero-order hold, which is a sample-and-hold element that copies the last received sample over and over into a new time basis. Higher orders are defined by the method of (usually) geometrical interpolation that is used between the incoming samples – first-order hold uses linear interpolation between two samples, second-order would use quadratic interpolation between sets of three samples, and so on. Higher-order hold algorithms will offer a lower prevalence of images in the oversampled domain due to these geometrical operations acting as rudimentary low-pass filters. Higher-order hold methods may also avoid inducing noise (discussed more in Sect. 4).
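To make the zero- and first-order hold operations concrete, here is a minimal sketch (our own naming and construction, not code from the paper) of both interpolants acting on nonuniformly spaced input samples:

```python
import numpy as np

def zero_order_hold(t_in, p_in, t_out):
    """Sample-and-hold: copy the most recent input sample onto each output time.

    Works for strictly increasing, nonuniformly spaced t_in.
    """
    # Index of the last input sample at or before each output time
    idx = np.searchsorted(t_in, t_out, side="right") - 1
    idx = np.clip(idx, 0, len(t_in) - 1)
    return p_in[idx]

def first_order_hold(t_in, p_in, t_out):
    """Linear interpolation between successive (possibly nonuniform) samples."""
    return np.interp(t_out, t_in, p_in)

# Nonuniform arrivals interpolated onto a uniform output grid
t_in = np.array([0.0, 0.9, 1.7, 3.1, 4.0])
p_in = np.array([0.0, 1.0, -0.5, 0.25, 0.0])
t_out = np.linspace(0.0, 4.0, 9)
print(zero_order_hold(t_in, p_in, t_out))
print(first_order_hold(t_in, p_in, t_out))
```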
Although N-order hold schemes are not encountered very often in auralization literature, they are used extensively in hardware systems for digital-to-analog and ASRC purposes [26, 27]. While these approaches necessarily generate images (which can be quite strong for the lower-order methods), they will always be above the desired signal content in frequency, and at locations that can be predicted by observing the rate of incoming samples. In this way, filtering down to the new sampling rate can be used to ensure that no images functionally remain in the resultant signal.
2.2 Aliasing
The other pitfall we want to avoid is the generation of aliasing artifacts. When a signal is sampled, its “baseband” will be bounded by the negative and positive Nyquist frequencies −fN and fN (fN being half of the sampling frequency fs). Provided the input is real-valued, the negative frequency content will simply be a reflection of the content from 0 to fN. In the example of Figure 1d, the Nyquist-Shannon sampling theorem is satisfied (fN > fb [27]), and the continuous-time signal can be recovered with an ideal low-pass filter placed between fb and fN. If, on the other hand, the Nyquist-Shannon sampling theorem is violated so that fs ≤ 2fb, then the images begin to overlap with the frequency content of the continuous-time signal, so that it cannot be recovered by low-pass filtering. This irreversible distortion is referred to as aliasing. This rearranging of signal energy destroys any harmonic structure that may be present in the input, making aliasing distortion easily noticeable when listening, even for relatively weak signal components. This problem is most commonly encountered in analog-to-digital converters, where aliasing is typically avoided by a cascade of both analog and digital low-pass filtering combined with an oversampling stage [27].
Not only can frequency content from the baseband of the source become aliased during an auralization, but images stemming from interpolation stages can be as well. For the baseband, when dealing with moving sources, the danger of aliasing arises when there are positive Doppler shifts. Although a source at rest may only have content confined below the Nyquist frequency implied by the receiver sampling frequency fs, this does not guarantee that the same fs will be sufficient to avoid aliasing at the receiver when the source is in motion. If the Doppler shift is positive, fs may now be less than twice the Doppler-shifted bandwidth necessary to faithfully receive the source content. Thus, choosing an appropriate fs at the receiver is not just a matter of deciding what sampling rate is convenient for playback of the final signal, but also of deciding the robustness of the auralization to aliasing (especially in the absence of a strategy to deal with aliasing otherwise).
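As a rough worked example of this sizing problem (assuming the classical moving-source Doppler factor c/(c − v) for a source approaching a stationary receiver; the numbers below are illustrative, not from the paper):

```python
c = 343.0      # speed of sound, m/s
v = 240.0      # source speed toward the receiver (roughly advancing-tip Mach 0.7), m/s
f_b = 8_000.0  # highest source frequency content at rest, Hz

f_shifted = f_b * c / (c - v)  # Doppler-shifted bandwidth: ~26.6 kHz
f_s_min = 2.0 * f_shifted      # minimum receiver sampling rate to avoid aliasing: ~53.3 kHz
print(f_shifted, f_s_min)
```

A receiver rate chosen for convenient playback (e.g., 44.1 kHz) would alias in this case, even though the source at rest fits comfortably below its Nyquist frequency.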
The danger with aliased images generated by interpolation stages is a bit more complex. If inappropriate interpolation is used (or the images coming from interpolation are not correctly filtered out later in the scheme), unexpected content will be liable to show up at the receiver as aliased information. This is a rather insidious problem, as the aliased images will have very complex relationships to the original baseband content, making identification of their source very difficult (this will be demonstrated emphatically in Fig. 5d). Due to the corrupting effect that both images and aliasing can have on the timbre and subjective impression of a sound, taking care of these issues can be of great importance if the auralization is to be used in a listening test.
3 The NOTAP method
This section contains the technical details of the NOTAP method. The key to understanding the method is to realize that the information that defines how to handle both the problems of aliasing and imaging is encoded in the arrival times of the samples as they get to the receiver in a source-time dominant auralization method. If they are coming in very quickly relative to the desired receiver fs, we know we need to oversample considerably before decimating down to the final result. If they are coming in slowly, we know that aliasing will not be a problem, but parts of the receiver baseband should now be subject to filtering, as the only thing that could occupy that higher frequency region would be images of the original signal.
3.1 Technical description
The auralization process for a moving source typically begins with a discrete-time representation of the continuous-time acoustic pressure and position of the moving sound source. We therefore define the following finite discrete-time signals with a source sampling frequency, fs,s (Hz):

$$\mathbf{t}_s = [t_s(0), t_s(1), \ldots, t_s(L-1)]^T, \tag{1}$$

$$\mathbf{p}_s = [p_s(0), p_s(1), \ldots, p_s(L-1)]^T, \tag{2}$$

$$\mathbf{R}_s = [\mathbf{r}_s(0)^T, \mathbf{r}_s(1)^T, \ldots, \mathbf{r}_s(L-1)^T], \tag{3}$$

where ts is the vector of discrete times pertaining to the source signal with a uniform spacing of 1/fs,s, ps is the vector of acoustic pressures whose elements correspond to the times in ts, Rs is the matrix of source positions whose elements also correspond to the times in ts, rs(l) = [xs(l), ys(l), zs(l)] is the source position vector at time ts(l) (l = 0, 1, …, L − 1), and L is the total number of samples.
Starting with these discrete-time source signals, the goal of the NOTAP method is to simulate the Doppler shift for the auralization of the moving source in a free field at some fixed receiver position, rr = [xr, yr, zr], ensuring the elimination of aliasing and imaging artifacts. An overview of the entire method is shown in Figure 2, depicting the intermediate processing stages of the method, each of which will be explained in more detail in this section. Also shown in Figure 2 are the signals (times and corresponding acoustic pressures) that are output at each of the intermediate processing stages. Signals that have the same subscript share a common sampling frequency and hence can be analyzed in isolation to further understand the impact of a particular processing stage. In Section 4, we will gain further insight into the various stages through an example; this section is reserved for the technical details only.
Figure 2 The N-oversampled transmitted acoustic pressure (NOTAP) method for simulating the Doppler shift for the auralization of a moving sound source and fixed receiver in the free field.
Proceeding down the chain in Figure 2, the first step of the NOTAP method is the transmission stage, which yields what we will refer to as the transmitted times and corresponding transmitted acoustic pressures at the receiver position. These samples use the same index l as the source samples but now, in general, will not be equispaced in time due to the motion of the source. Similar to equations (1) and (2), we can also define vectors of the transmitted times and acoustic pressures respectively as tx = [tx(0), tx(1), …, tx(L − 1)]T and px = [px(0), px(1), …, px(L − 1)]T. For each time sample of the source signal, ts(l) (l = 0, 1, …, L − 1), the transmitted time, tx(l), can be computed as

$$t_x(l) = t_s(l) + \frac{\lVert \mathbf{r}_s(l) - \mathbf{r}_r \rVert_2}{c}, \tag{4}$$

where ∥rs(l) − rr∥2 is the Euclidean distance between the source at time ts(l) and the receiver, and c is the speed of sound (m s−1). The corresponding transmitted acoustic pressure at times tx(l) is then given by

$$p_x(t_x(l)) = \alpha\, p_s(t_s(l)), \tag{5}$$

where α is a factor that can be used to represent effects of acoustic propagation that can be expressed as a gain alone, such as spherical spreading or Doppler amplification [35]. Due to how we have defined the vectors – as only a function of the discrete index l – equation (5) also corresponds to simply px(l) = αps(l). Other relevant effects, such as frequency-dependent simulations of atmospheric attenuation or reflections from surfaces with finite impedance, need to be dealt with separately (i.e., after the signals are in their final sample rates at the receiver), hence the reason why NOTAP is better suited as part of the DSP chain within a larger auralization scheme.
Taking a closer look at the time difference between any two samples in tx, from equation (4) we find that

$$t_x(l) - t_x(l-1) = \frac{1}{f_{s,s}} + \frac{d_R(l)}{c}, \tag{6}$$

where dR(l) = ∥rs(l) − rr∥2 − ∥rs(l − 1) − rr∥2, and the term dR(l)/c is the change of propagation delay. This equation demonstrates that the samples in tx, and hence also in px, are indeed not equispaced in time unless dR(l) is constant. As a result, the transmitted signal is now defined by a time-varying sampling frequency at the receiver, which we will refer to as an instantaneous transmitted sampling frequency, fs,x, that can be expressed as

$$f_{s,x}(l) = \frac{1}{t_x(l) - t_x(l-1)} \tag{7}$$

$$= \frac{c\, f_{s,s}}{c + f_{s,s}\, d_R(l)}, \tag{8}$$

where equation (6) has been substituted into equation (7).
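A minimal sketch of the transmission stage and the instantaneous transmitted sampling frequency (Eqs. (4)–(8)) might look as follows; function and variable names are our own:

```python
import numpy as np

def transmit(t_s, p_s, R_s, r_r, c=343.0, alpha=1.0):
    """Transmission stage (Eqs. (4)-(5)): propagate each source sample to the receiver.

    t_s : (L,) uniform source times;  p_s : (L,) source pressures
    R_s : (L, D) source positions at t_s;  r_r : (D,) fixed receiver position
    """
    dist = np.linalg.norm(R_s - r_r, axis=1)  # ||r_s(l) - r_r||_2 for each l
    t_x = t_s + dist / c                      # transmitted (arrival) times, Eq. (4)
    p_x = alpha * p_s                         # transmitted pressures, Eq. (5)
    return t_x, p_x

def instantaneous_fs(t_x):
    """Instantaneous transmitted sampling frequency, Eq. (7), for l = 1, ..., L-1."""
    return 1.0 / np.diff(t_x)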
The problem we now encounter is that there is not a single uniform sampling frequency which readily allows us to make an auralization of the signal at the receiver, i.e., we cannot simply listen to the samples we have in px from equation (5). Thus, an interpolation strategy is needed that can handle nonuniform input, as discussed in Section 2.1. Specifically in the NOTAP method, we proceed by using n-order hold methods to interpolate the samples in px over some uniform time grid corresponding to an oversampled version of the user-specified receiver sampling frequency, fs,r.
This is illustrated in Figure 3a, where the stems with the crosses are transmitted acoustic pressures obtained after applying equations (4) and (5), which are nonuniformly spaced along the time axis. The circles correspond to the samples in a newly defined vector of receiver times with uniform spacing Δts,r = 1/fs,r, where fs,r = 10 kHz. This serves as an interpolation grid over which tx and px are interpolated to obtain the corresponding acoustic pressures at the receiver. Both a ZOH and linear interpolation over this interpolation grid are shown in Figure 3a, yielding signals that can be auralized at sampling frequency fs,r. This naïve approach to the auralization of px will result in artifacts from either aliasing or imaging. A visual way to conceptualize the condition necessary for aliasing to occur is when two (or more) samples of px are within Δts,r, for instance between 0.4 ms and 0.5 ms in Figure 3a. On the other hand, if two of the interpolation grid points are within the time between two samples of px, for instance approximately between 0.5 ms and 0.8 ms in Figure 3a, then imaging can occur. In this way, the information needed to evaluate the danger of both aliasing and imaging is contained in the arrival times of the samples as they get to the receiver, but that information needs to be acted upon, either by sampling fast enough at the receiver to avoid aliasing or by low-pass filtering appropriately to avoid imaging.
Figure 3 Different levels of oversampling with different simple interpolation schemes, where px is the transmitted acoustic pressure with nonuniformly spaced samples, i.e. the input signal for the interpolation schemes. (a) Undersampled receiver, where aliasing is possible. (b) Highly sampled receiver, producing a large frequency range above the possible transmitted content where imaging is possible.
Focusing firstly on preventing aliasing, we proceed to the oversampling stage of Figure 2. Here we choose an intermediate sampling frequency fs,o = Nfs,r, where N is an oversampling ratio from which we can define an oversampled time vector to = [to(0), to(1), …, to(J − 1)]T that has J samples uniformly spaced by 1/fs,o such that to(J − 1) = tx(L − 1). Using fs,r = 10 kHz, this is illustrated in Figure 3b for N = 10, where the circles correspond to the samples in the oversampled time vector to. The oversampling creates a finer grid that can now be used along with tx and px to do an interpolation for the vector of acoustic pressures po, corresponding to the oversampled times in to. N now defines a soft upper limit on the incoming frequency content: it can either be chosen arbitrarily large, such that fs,o is much greater than any audible frequency that may be generated by the expected Doppler shift of the source baseband, or the method may be formulated to set this ratio dynamically so that no incoming transmitted samples are ever closer together than any two oversampled ones. Both ZOH and linear interpolation of px at the oversampled times in to are shown in Figure 3b, though any interpolation approach that can deal with nonequispaced input samples can be used. Various orders of interpolation can be chosen to interpolate tx and px at the oversampled times in to, with a trade-off between latency and amount of distortion suppression, a discussion of which we reserve for Section 4.3.
The oversampled signal now consists of the transmitted baseband signal and a series of images. In the case of ZOH, these images are very strong and will persist up to the Nyquist frequency of the oversampled signal, though no further than that. Note that there will always be images at this stage, no matter what interpolation strategy is used, since the transmitted data is nonequispaced, and therefore sinc-based interpolation strategies that perfectly recreate the baseband signal without generating images cannot be used. Thus, by this step, aliasing has been avoided, but now the signal needs to be decimated to fs,r and appropriately filtered to remove the images, which carries us to the decimation stage in Figure 2.
This stage consists of a low-pass filter followed by downsampling with a factor of N. The low-pass filter will have a cut-off frequency just below fs,r/2 and is applied to the signal po so that any signal content introduced above fs,r/2 due to the Doppler shift will also be removed. Recall that if the oversampling and interpolation had not been done, this additional frequency content would have been aliased; hence the low-pass filter serves the purpose of an anti-aliasing filter. It also serves the purpose of an anti-imaging filter, removing any images present above fs,r/2. This signal is then downsampled to fs,r, yielding the vector of decimated acoustic pressures at the receiver, pd = [pd(0), pd(1), …, pd(L − 1)]T, which corresponds to the time vector, tr = [tr(0), tr(1), …, tr(L − 1)]T, whose samples are uniformly spaced by 1/fs,r.
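The oversampling and decimation stages might be sketched as below (an illustrative implementation under our own naming; the filter order and exact cut-off are arbitrary choices, not values prescribed by the paper):

```python
import numpy as np
from scipy.signal import firwin, lfilter

def oversample_and_decimate(t_x, p_x, fs_r, N, num_taps=255):
    """ZOH-interpolate nonuniform (t_x, p_x) onto a grid at fs_o = N * fs_r,
    low-pass just below fs_r / 2, then keep every N-th sample (decimation)."""
    fs_o = N * fs_r
    t_o = np.arange(t_x[0], t_x[-1], 1.0 / fs_o)  # oversampled time grid t_o
    idx = np.clip(np.searchsorted(t_x, t_o, side="right") - 1, 0, len(t_x) - 1)
    p_o = p_x[idx]                                # zero-order hold onto t_o

    # Combined anti-aliasing / anti-imaging low-pass below the receiver Nyquist
    h = firwin(num_taps, 0.45 * fs_r, fs=fs_o)
    p_filt = lfilter(h, 1.0, p_o)
    return p_filt[::N]                            # decimated pressures p_d at fs_r
```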
At this point we are seemingly done with the auralization process; however, the low-pass filter in the decimation stage may not have been sufficient, as some images (or parts thereof) may have persisted in the receiver baseband below fs,r/2. This is a direct consequence of the (time-varying) instantaneous transmitted sampling frequency, fs,x(l) (cf. Eq. (8)), associated with the transmitted acoustic pressure, px. As opposed to images being copies of a signal at integer multiples of a fixed sampling frequency, as is the case for uniformly-sampled signals, images in our scenario are expected to occur at integer multiples of fs,x(l). At each time index, l, the highest nonzero frequency component of these images (defining their bandwidth) is given by

$$f_{b,m}(l) = m\, f_{s,x}(l) + f_{b,x}(l), \tag{9}$$

for m = 1, 2, 3, …, where fb,x(l) is the highest nonzero frequency component of px(l). This equation demonstrates that the images will essentially have a time-varying bandwidth, making it possible that they can persist below fs,r/2.
Given that the low-pass filter from the decimation stage may not be sufficient, we alternatively need to apply a low-pass filter with a time-varying cut-off frequency at fs,x(l)/2 (or some factor of fs,x(l)/2) in order to remove the images. As shown in the Appendix, provided the discrete-time source signal satisfies the Nyquist-Shannon sampling theorem, fs,x(l) − fb,x(l) > fb,x(l), i.e., we can be sure that for any time index, l, the lowest frequency image will not intersect with the highest nonzero frequency of px, and hence the time-varying images can be completely removed. A low-pass filter with a time-varying cut-off frequency could ultimately be used in place of the fixed anti-imaging filter in the decimation stage, but for simplicity in this work we maintain it as a separate block in Figure 2, implementing it as a finite impulse response (FIR) filter.
The final acoustic pressure at the receiver, i.e., the elements of pr, can be obtained by filtering the elements of the decimated acoustic pressure, pd, as follows:

$$p_r(l) = \sum_{k=-K}^{K} h(k, l)\, p_d(l - k), \tag{10}$$

for l = 0, 1, …, L − 1, where h(k, l) is the time-varying, symmetric, windowed FIR anti-imaging filter of length P defined as [36]:

$$h(k, l) = w(k)\, \nu(l)\, \operatorname{sinc}\!\big(\nu(l)\, k\big), \tag{11}$$

where |k| ≤ K, the filter length P = 2K + 1, w(k) is a symmetric window (i.e., w(k) = w(−k) and w(k) > 0 for |k| ≤ K), sinc(x) = sin(πx)/(πx), and

$$\nu(l) = \frac{2 f_c(t_r(l))}{f_{s,r}}, \tag{12}$$

where fc(tr(l)) is the time-varying cut-off frequency defined by some factor, Q, such that fc(tr(l)) = Qfs,x(tr(l))/2, with Q ≤ 1.
We have defined the cut-off frequency as a function of the receiver time, tr(l), since pr(l) in equation (10) corresponds to the receiver times tr(l). Hence, in equation (12), we need to obtain an estimate for fs,x(tr(l)). We can use equation (7) for this; however, we first need to find the index in the vector of transmitted times, tx, that corresponds to tr(l). In general, we do not expect any time in tx to be exactly equal to tr(l); hence, we simply find the index, j, for which tx(j − 1) < tr(l) < tx(j) and estimate fs,x(tr(l)) as

$$f_{s,x}(t_r(l)) \approx \frac{1}{t_x(j) - t_x(j-1)}. \tag{13}$$
In summary, for this time-varying anti-imaging filter block, for each receiver time tr(l), we estimate fs,x(tr(l)) from equation (13), from which we can design the FIR filter from equation (11), and then finally filter the decimated signal as in equation (10) to obtain pr. Following this last step of applying the time-varying anti-imaging filter, pr can then be listened to at the sampling frequency fs,r or can alternatively be upsampled to some desired listening sampling frequency as shown in Figure 2.
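For concreteness, a direct per-sample sketch of the time-varying anti-imaging filter of equations (10)–(13) follows, with a Hamming window standing in for the unspecified symmetric window w(k); names and default values are our own assumptions:

```python
import numpy as np

def time_varying_anti_imaging(p_d, t_r, t_x, fs_r, Q=0.9, K=32):
    """Apply Eqs. (10)-(13): a windowed-sinc low-pass whose cut-off follows f_{s,x}."""
    k = np.arange(-K, K + 1)
    w = np.hamming(2 * K + 1)          # symmetric window w(k)
    p_pad = np.pad(p_d, K)             # zero-pad so the sum in Eq. (10) stays in range
    p_r = np.empty_like(p_d)
    for l, t in enumerate(t_r):
        # Eq. (13): local transmitted rate from the bracketing arrival times
        j = np.clip(np.searchsorted(t_x, t), 1, len(t_x) - 1)
        fs_x = 1.0 / (t_x[j] - t_x[j - 1])
        fc = min(Q * fs_x / 2.0, 0.5 * fs_r)   # cap at the receiver Nyquist
        nu = 2.0 * fc / fs_r                   # normalized cut-off, Eq. (12)
        h = w * nu * np.sinc(nu * k)           # Eq. (11)
        # Eq. (10): sum over k of h(k, l) * p_d(l - k)
        p_r[l] = h @ p_pad[l : l + 2 * K + 1][::-1]
    return p_r
```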
3.2 Considerations for implementation
The NOTAP method as described does not include compensation for the slight attenuation of high frequencies that comes with ZOH interpolation [27]. This compensation may simply be deemed unnecessary as it represents at most a 4 dB loss at high frequencies relative to the source sampling frequency. Implementation of a compensation filter, while possible, is not straightforward due to the fact that the ZOH period is variable with time. Thus, it would either come at significant computational expense as a variable filter implemented on the oversampled signal, or it could perhaps be formulated in such a way that it could be applied to the incoming samples before the ZOH operation. First-order hold interpolation causes more attenuation in the transmitted signal (perhaps counterintuitively [28]), and so an implementation based thereon may be more in need of compensation.
Considering the algorithm in general, the NOTAP method using a ZOH interpolant can be a system that runs in absolute real time [17]. That is, it can run on its own output sample clock completely independently of the processes that define the incoming samples. While the oversample rate does define a much softer upper bound than fs,r on the aliasing performance of the scheme, it would not be difficult (or computationally that much more expensive) to swap decimation stages in and out as needed to cope with high-frequency information arriving from the source – it is easy to check whether samples are arriving at a rate that risks aliasing and add more oversampling as needed. On the other hand, for offline processing, the NOTAP method is harder to parallelize than other block-wise propagation approaches for auralization, as algorithms that work on blocks of samples have natural delineations that lend themselves to parallelization.
4 A spinning contrivance
To illustrate the previously discussed issues of auralizing a moving source and show how the NOTAP procedure can circumvent them, we consider a 2D scenario of a sound source moving in an anticlockwise direction along a circular path with a single receiver, as depicted in Figure 4. This can be thought of as an extremely rudimentary model of a helicopter rotor blade (perhaps a worst-case scenario, where someone has attached a whistle to the tip of a blade). The radius of the circular path is r = 3 m, perhaps the size of the blade of a small helicopter. The starting position of the moving source is always that closest to the receiver, i.e., rs(0) = [3, 0]T. The center of the circle is set as the origin, and the receiver was placed 5 m away from the origin along the x-axis so that rr = [5, 0]T. The velocity of the moving source is set to v = 62 m s−1; one revolution takes approximately 0.3 s. The monopole emits a simple harmonic tone of 800 Hz with an amplitude of 0.5 Pa. The effect of spherical spreading loss is not included in the α factor for this example, so the resultant amplitude of the signal at the receiver should be 0.5 Pa when no filtering is present.
Figure 4 Moving source scenario for auralization at the receiver position. The source is moving in an anticlockwise direction as indicated by the arrows along the circular path.
The speed of sound was set at an unphysically low c = 80 m s−1. This is done in order to slow the entire scenario down while maintaining a realistic Doppler shift for a helicopter scenario. The slower orbit rate of the source leads to a situation where the peaks and artifacts of the resulting auralizations can clearly be heard; if they were presented at typical helicopter shaft-order frequencies, they might only be heard as timbral shifts and would be difficult to see in a spectrogram.
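The scenario can be set up in a few lines; the following is a sketch under our own assumptions (in particular, the source sampling frequency fs_s below is an assumed value, as it is not fixed by this description):

```python
import numpy as np

fs_s = 16_000.0             # source sampling frequency, Hz (assumed)
c = 80.0                    # unphysically slow speed of sound, m/s
r, v = 3.0, 62.0            # path radius (m) and source speed (m/s)
omega = v / r               # angular rate, rad/s: one revolution in ~0.3 s
r_r = np.array([5.0, 0.0])  # fixed receiver position, m

t_s = np.arange(0.0, 3 * 2 * np.pi / omega, 1.0 / fs_s)  # three revolutions
p_s = 0.5 * np.sin(2 * np.pi * 800.0 * t_s)              # 800 Hz tone, 0.5 Pa
R_s = np.stack([r * np.cos(omega * t_s), r * np.sin(omega * t_s)], axis=1)
```

These arrays can then be passed through the transmission, oversampling, decimation, and anti-imaging sketches of Section 3.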
Figure 5 shows spectrograms of the signal at the receiver for three revolutions around the circular path using various schemes for simulating the Doppler shift and the associated parameters. Sound files of these examples are made available as supplementary material. They are available for download so that the reader can “listen along” and hear the impacts of the various schemes for simulating the Doppler shift. In general, it can be surprising when plots of two waveforms look visually the same even though they sound very different when listened to (small bits of high frequency may not show up on a plot of a large low-frequency carrier while completely changing the impression). Similarly, two waveforms may look very different but give the same impression (as two realizations of white noise would). The examples shown here are made in such a way that both the frames of Figure 5 and the sound files clearly demonstrate the differences between the approaches, but listening along will still likely be instructive for most readers.
Figure 5 Spectrograms for various auralization methods using a sinusoidal source (a) ‘Ground truth’ using equation (14) with fs,r = 16 kHz, (b) aliased with continuous time source with fs,r = 4 kHz, (c) linear interpolation of a discrete time source with fs,r = 44.1 kHz, (d) linear interpolation of discrete time source and aliasing with fs,r = 4 kHz, (e) decimated signal from (a) at fs,r = 4 kHz, (f) NOTAP using ZOH interpolation, an oversampling ratio of 50, and fs,r = 4 kHz, (g) NOTAP using ZOH interpolation, an oversampling ratio of 500, and fs,r = 4 kHz, (h) NOTAP using linear interpolation, an oversampling ratio of 50, and fs,r = 4 kHz.
Figure 5a is representative of the ground truth of the signal we should expect at the receiver. We can clearly see the effect of the Doppler shift: the frequency is lowered as the source moves away from the receiver and raised as it moves toward the receiver. A receiver-time dominant approach, an inverse formulation of equation (4), was used to compute this signal. Specifically, we first define a vector of known, uniformly-spaced receiver times, tr, according to a receiver sampling frequency, fs,r. The corresponding times at which the signal was emitted from the source, te(l) (l = 0, 1, …, N − 1), were then computed by solving

$$t_r(l) = t_e(l) + \frac{\lVert \bar{\mathbf{r}}_s(t_e(l)) - \mathbf{r}_r \rVert_2}{c}, \tag{14}$$

where $\bar{\mathbf{r}}_s(t)$ is the analytical function that gives the position of the source on the moving path at time, t, so that for the circular path with radius r we have

$$\bar{\mathbf{r}}_s(t) = \begin{bmatrix} r \cos(v t / r) \\ r \sin(v t / r) \end{bmatrix}. \tag{15}$$
The corresponding acoustic pressure at the receiver is then finally set equal to the acoustic pressure of the source at times te(l). Here, the harmonic source is known as a single sine function, which can be evaluated for arbitrary time. For discrete-time source signals, however, an interpolation would also be needed in the source domain, since only the source pressures in ps would be known at the times in ts. In Figure 5a, fs,r = 16 kHz, greater than twice the highest frequency introduced by the Doppler shift, so the Nyquist-Shannon theorem is satisfied. This ground-truth case can be heard in sound file A.
Upon comparison with equation (4), equation (14) is not trivial to solve, since the source position is also a function of the unknown, te(l). As such, equation (14) has to be solved indirectly (for instance, using root-finding methods). While this problem may be solvable directly for simple geometries, a general system for auralization will always have to treat it as an indirect problem. It has been noted that computing emission times can be a substantial component of the overall execution time of an auralization system [37]. Obviously, using a source-time dominant algorithm (such as NOTAP) removes this stumbling block entirely.
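One way to solve equation (14) numerically per receiver sample is with a bracketed root finder, since the emission time must precede the reception time by no more than the maximum propagation delay. A sketch using SciPy (our own construction, not the paper's implementation):

```python
import numpy as np
from scipy.optimize import brentq

def emission_times(t_r, source_pos, r_r, c, max_dist):
    """Solve Eq. (14) for the emission time of each receiver-time sample.

    source_pos : callable t -> source position (e.g., the circular path of Eq. (15))
    max_dist   : upper bound on the source-receiver distance, used to bracket the root
    """
    t_e = np.empty_like(t_r)
    for l, t in enumerate(t_r):
        # f(te) = te + propagation delay - reception time; its root is the emission time
        f = lambda te: te + np.linalg.norm(source_pos(te) - r_r) / c - t
        t_e[l] = brentq(f, t - max_dist / c - 1e-9, t)
    return t_e

# Circular path of this example (max source-receiver distance is 5 + 3 = 8 m)
path = lambda t: np.array([3.0 * np.cos(62.0 / 3.0 * t), 3.0 * np.sin(62.0 / 3.0 * t)])
t_e = emission_times(np.arange(0.0, 0.1, 1.0 / 16_000.0), path, np.array([5.0, 0.0]), 80.0, 8.0)
```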
4.1 Aliasing and imaging creep in
Reducing the receiver sampling frequency to fs,r = 4 kHz results in a violation of the Nyquist-Shannon theorem, and aliasing will occur in the output. Figure 5b demonstrates what happens to the ground-truth signal under this condition. As the signal frequency increases toward the end of a revolution around the circle, we can observe the signal being “reflected” off the receiver Nyquist frequency of 2 kHz. The resulting rapid downward spike in frequency produces a highly audible artifact, both due to the missing high-frequency content relative to Figure 5a and due to the rather percussive nature of the sound as the tone spikes toward DC (similar, perhaps, to a simple analog drum machine). A further artifact generated in this signal is significant ringing at the receiver Nyquist frequency, which seems to happen near the times when the tone is reflected off of fN. These effects can be heard in sound file B.
One way to circumvent this would be to first generate the signal in Figure 5a at fs,r = 16 kHz and subsequently decimate it to fs,r = 4 kHz (this will be shown in Fig. 5e). This process can be extremely inefficient, especially for a receiver-time dominant approach, and also requires advance knowledge of the entire source-receiver path so that we can precompute how the signal bandwidth changes due to the Doppler shift in order to choose a sampling frequency that will avoid aliasing.
Figure 5c was produced using a discrete-time version of the source with a source sampling frequency of fs,s = 4 kHz. Thus, interpolation is required at the source in order to produce data at the times te(l) that is subsequently used to populate the receiver pressure vector as required by the receiver-time dominant auralization method. The interpolation method used here is a simple linear interpolant which, although fast to compute, does not do a good job of suppressing images (i.e., it is used here in order to demonstrate the dangers of images in auralization). The result is that there are copies of the source content created above the effective source Nyquist frequency. Because these artifacts are created in the source domain – in a continuous representation with effectively infinite frequency resolution – these images extend infinitely far up in frequency, although in practice they are limited by the fact that they decay in amplitude with increasing order. In this simulation, the receiver sampling frequency is set to 44.1 kHz in order to avoid significant aliasing of the images in the example result. In this case, although the tone is not aliased and reflected, the presence of the images significantly alters the timbre of the tone – as the timbre of a musical instrument would be altered if its overtones were adjusted. Note that these overtones do not form a harmonic series, so the timbral effect is rather harsh and unpleasant. This effect can be heard in sound file C.
Figure 5d shows what happens when both aliasing and imaging occur during the auralization. Again the receiver sampling frequency is set to fs,r = 4 kHz and linear interpolation is used for the sampled version of the source signal. The true signal component at the source once again takes on a sharp negative spike at the receiver via aliasing. The images, instead of being timbre-changing overtones of the intended signal, are now aliased down into the receiver baseband in a way that decorrelates them from one another. In the spectrogram, one can clearly see the structure that the aliased images retain in the result – they do not simply become broadband noise. The resulting sound has multiple tones sweeping through the signal in various ways. The auralization does repeat with the same period as the other examples, as the process is deterministic; however, the impression is severely corrupted. This can be heard in sound file D.
The desired behavior for fs,r = 4 kHz is shown in Figure 5e. This result is free of both aliasing and images. It was generated by downsampling Figure 5a – generating the full-frequency ideal case and then keeping only the part of the spectrum that fits into the receiver frequency range up to fs,r/2 = 2 kHz. Again, the downside to doing the computation in this way is at least two-fold: it depends on perfect knowledge of the source as a continuous-time signal, and it requires complete knowledge of the transmission path (i.e., the expected maximum frequency component at the receiver) in order to set the receiver sampling frequency a priori. It is also computationally intensive relative to the NOTAP method. This can be heard in sound file E.
Figures 5f–5h show results obtained from the NOTAP procedure using three different configurations. Figures 5f and 5g use zero-order hold interpolation with oversampling ratios of 50 and 500 respectively, whereas Figure 5h uses a linear interpolant with an oversampling ratio of 50. In all of these figures, the main signal component matches that of the desired behavior seen in Figure 5e, and there is no evidence of aliasing or imaging in the generated signal. The differences between the frames are primarily a matter of differing amounts of broadband noise generated during the NOTAP procedure, noise that reduces with increasing oversampling ratio and interpolation order. The source of this noise will be discussed below, along with an examination of intermediate points along the NOTAP signal path for this scenario. The sounds of Figures 5f, 5g, and 5h can be heard in sound files F, G and H respectively.
The noise in Figure 5f also provides a good example of the action of the time-varying anti-imaging filter. It can be seen that there are two zones of filtering – one in which the filtering is limiting content that would be aliased in the receiver domain (this cuts off both the broadband noise and the desired tone near 2 kHz), and another in which the broadband noise is being limited at a lower frequency.
It is interesting to think about the results of Figure 5 in terms of the accuracy of the Doppler simulation. In general, any variable delay line will implicitly simulate this effect, and the “accuracy and naturalness of a Doppler shift” will, in turn, depend on the accuracy of the fractional-delay method used [14]. In this sense, all of the simulations of Figure 5 are perfect. That is, there was no approximation or coarse-graining of the time delays made in any of those methods. This is in contrast to other recent work that explored perceptual artifacts generated when the “scheduled” time delays were too coarse or were improperly interpolated between [18]. Thus, all of the strategies and processing settings discussed in this section do not discriminate between the approaches in terms of accuracy of the Doppler simulation itself – the main yellow tone component is effectively identical across all of the panes of Figure 5. The differences between the approaches lie in the unintended/extra signal content that is either generated or not filtered out by the various ASRC methods. The plausibility of the simulation as a whole will be linked to the perceptibility of this extra content. Unfortunately, this will be situation dependent, as it will rely on the strength of the intended Doppler effect and the nature of the signal (e.g., are there broadband components that will tend to mask artifacts, or is the signal very sparse or harmonic so that pathologies can be easily heard). Thus, giving absolute guidance on what NOTAP settings will be plausibly sufficient is not possible in general.
4.2 Steps along the way
Snapshots of the intermediate signals after processing by the blocks in Figure 2 are shown in Figure 6. This signal is a segment of Figure 5g, and the time values (abscissae) are shared between the figures. It shows a moment where the received tone is sweeping up in frequency through a region where it would become aliased given the receiver sampling frequency. Figure 6a is a plot of the transmitted acoustic pressures px at the transmitted times tx. It can be observed that the samples are unevenly spaced in time. In this example, the source tone is a factor of the sampling frequency, so the pattern of source samples repeats for every cycle of the tone. However, this pattern is compressed in time in the transmitted domain as the Doppler shift increases from left to right in the figure. Figure 6b shows the result of interpolating the transmitted pressures onto the oversampled time base to using a zero-order hold method. The signal now has a (fine) stair-step quality to it. The signal is then low-pass filtered just below fs,r/2 (the Nyquist frequency of the receiver), downsampled, and the time-varying anti-imaging filter applied. This results in Figure 6c: the final result of pressures pr at times tr, now uniformly spaced by 1/fs,r. Although it may look messy, this signal accurately represents the desired response, as is revealed by upsampling for the “listening” version shown in Figure 6d.
Figure 6 Snapshots of signals obtained from different points along the NOTAP signal path used to generate Figure 5g (ZOH, N = 500). (a) Transmitted acoustic pressure with non-uniformly spaced samples, px; (b) ZOH interpolation of (a) onto the oversampled grid, po; (c) decimation of (b) to fs,r = 4 kHz with the time-varying anti-imaging filter applied, pr; (d) upsampled version of (c) to a listening sampling frequency of 44.1 kHz.
It is also instructive to look at the frequency content of the oversampled domain shown in Figure 6b. A spectrogram of this time series is shown in Figure 7, where the first 3 integer multiples of the instantaneous transmitted sampling frequency, fs,x (computed using Eq. (7)), are overlaid as white broken lines. Here, the series of images induced by the zero-order hold process can be seen above the intended signal component. They are again not a harmonic series, but rather a series of overtones that interact with integer multiples of fs,x in accordance with equation (9). Note that the image component magnitudes are stronger in Figure 7 than in Figure 5c. This is due to zero-order hold being an inferior (albeit faster) interpolation method relative to linear interpolation, so it does less to suppress the images. However, this does not ultimately matter since the time-varying anti-imaging filter ensures that all of these images are removed. The main difference between interpolation methods in the NOTAP method is thus the amount of broadband noise induced in the result.
Figure 7 Spectrogram of the ZOH oversampled time-series from Figure 6b. The first 3 integer multiples of the instantaneous transmitted sampling frequency (computed using Eq. (7)) are overlaid as white broken lines.
There is a subtle point that needs to be highlighted that favors interpolation via oversampling at the receiver rather than forming a continuous-time reconstruction at the source: The oversampled representation of the transmitted signal provides a limit to the images of the source – although there may be many images of the source in the frequency range above the (effective) source baseband and below the oversampled Nyquist frequency (and although those images may be quite strong, as in the case of Fig. 6), those source images are limited by the resolution of the oversampled domain. On the other hand, the continuous representation has effectively infinite frequency resolution and images of the source baseband never actually cease – they all wind up aliased into the receiver baseband, even if they are highly attenuated by using good interpolation methods. In this way, there is less of a chance for aliased images to be a problem if the interpolation is done via an oversampling method.
4.3 A source of noise
In the ZOH NOTAP cases of Figures 5f and 5g, it is seen that there is a large increase in the induced broadband noise coming from the lower oversample ratio case. This paper will only give a conceptual account of the source of this noise, as a technical discussion has already appeared in the work of other authors [24].
Using a ZOH interpolator, the noise is induced via a “miss” between the actual time when a new (transmitted) sample arrives and the first oversample that takes on its value. An exaggerated illustration of this can be observed in Figure 3 between a transmitted sample and the first subsequent ZOH oversample. During the time between these two events, the oversampled signal will not reflect the transmitted signal. This miss can occur uniformly anywhere in the time between oversamples, and thus the result will be white broadband noise. Since the miss distance is related to the time between oversamples, the oversample rate will be one factor that determines the magnitude of this noise. Further, the size of the miss will depend on how different two consecutive transmitted samples are from each other – the amplitude of the noise is related to the first derivative of the transmitted signal. It is therefore dependent on the transmitted signal magnitude and frequency content, with higher frequencies generating more noise.
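A rough empirical check of this behavior (our own construction, not from the paper): ZOH-resample a tone with jittered arrival times at two oversampling ratios, keep only the receiver-baseband part of the error, and compare its power; the in-band error should fall as N grows.

```python
import numpy as np
from scipy.signal import firwin, lfilter

rng = np.random.default_rng(0)
fs_r, f0 = 4_000.0, 800.0
t_in = np.cumsum(rng.uniform(0.9, 1.1, 20_000)) / 44_100.0  # jittered arrival times
for N in (50, 500):
    fs_o = N * fs_r
    t_o = np.arange(t_in[0], t_in[-1], 1.0 / fs_o)
    idx = np.clip(np.searchsorted(t_in, t_o, side="right") - 1, 0, len(t_in) - 1)
    err = np.sin(2 * np.pi * f0 * t_o) - np.sin(2 * np.pi * f0 * t_in[idx])  # ZOH error
    h = firwin(511, 0.45 * fs_r, fs=fs_o)   # keep only the receiver baseband
    err_bb = lfilter(h, 1.0, err)[1_000:]   # drop the filter transient
    print(N, 10.0 * np.log10(np.mean(err_bb**2)), "dB in-band error power")
```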
When using a linear interpolation scheme (recall Fig. 3), this noise source will disappear. This is due to the fact that the oversamples generated by linearly interpolating between two transmitted samples will have no error associated with them. This obvious benefit of using linear interpolation is offset by several factors. The first is the higher computational cost associated with performing the interpolation rather than simply copying the last received sample over and over into the oversamples. This computation, while simple, has to be performed for each oversample, though it also allows one to use lower oversample ratios if noise is no longer an issue. A second factor is that the algorithm winds up with indeterminate execution time, as it has to wait for the next sample to arrive before it can process the interpolant between the two most recently received samples. Thus, a minimum delay will be necessary that is related to the rate of incoming samples. This may be quite short if the incoming rate is high and can be bounded beforehand, but if it is low or uncertain, the delay might be prohibitive or impossible to implement. The final consideration is that linear interpolation will tend to act as a more aggressive low-pass filter on the source baseband signal than ZOH will [27], requiring either more compensation or leading to a loss of high-frequency content.
In all, depending on the oversample ratio and interpolation order that are chosen for the NOTAP method, there is a multi-way tradeoff to be made between the simplicity of the interpolation mechanism, the amount of induced decorrelated noise, the computational burden, the nearness to absolute real-time performance with which one wants the scheme to run, and the frequency response. This tradeoff frontier may be able to be pushed through, perhaps by keeping track of the miss errors in the ZOH approach and subtracting the resultant noise prediction from the output, or by replacing the ZOH rectangular convolution with that of another function that may have more favorable properties while still maintaining the real-time aspects of the approach.
5 Conclusion
In this paper, we have proposed a source-time dominant approach to model the Doppler shift for the auralization of moving sources, referred to as the NOTAP method. The method involves keeping track of the rate at which samples arrive at the receiver and uses a combination of oversampling, interpolation, and time-varying filtering. By considering an exaggerated example of a moving source and static receiver, it was shown how the NOTAP method successfully eliminates aliasing and imaging artifacts while preserving as much of the original signal content as possible for a given receiver sampling frequency. The NOTAP method is particularly suited to the auralization of scenarios that incur large Doppler shifts (e.g., rotorcraft blades), as these are more susceptible to such artifacts. It has the benefit of employing simple interpolation schemes (e.g., zero-order hold), which can facilitate a real-time implementation, but is limited by its difficulty to parallelize compared with other block-wise propagation approaches.
Many pieces of a more realistic auralization were purposefully not addressed in this work, so as to dedicate sufficient attention to the production and elimination of aliasing and imaging artifacts arising from the fundamental signal processing of an auralization. Nevertheless, incorporating other propagation effects such as spherical spreading, Doppler amplification, atmospheric absorption, and multipath should be no more difficult in the NOTAP workflow than in other auralization strategies. As the NOTAP method differs significantly in its basis from other Doppler simulation approaches, its usage within an auralization scheme may generate performance tradeoffs relative to current methods.
Acknowledgments
The authors would like to thank a number of researchers and colleagues with whom they have had many fruitful conversations, namely Haik Martirosyan, Matthew Boucher, Eric Greenwood, Toon van Waterschoot, and Thomas Dietzen. Randall Ali is a voluntary research associate at KU Leuven, Belgium and has been supported by the FWO Research Project G0A0424N. Support for Andrew Christian came primarily from the NASA Revolutionary Vertical Lift Technology project. Thanks to those involved in the review and editorial process at NASA and Acta Acustica for feedback that led to significant improvements in the quality of this manuscript.
Conflicts of interest
The authors declare no conflict of interest.
Data availability statement
The sounds referenced in this article are openly available as supplementary material with this publication.
Supplementary material
All audio files correspond to the panels of Figure 5.
(Fig. 5a): Representative of a ground truth of the signal we should expect at the receiver. The receiver sampling frequency was 16 kHz.
(Fig. 5b): The aliased signal when the receiver sampling frequency was set to 4 kHz.
(Fig. 5c): A case where excessive images are present in the signal and have not been appropriately filtered out, resulting in imaging artifacts. A discrete-time source where linear interpolation was used and the receiver sampling frequency was 44.1 kHz.
(Fig. 5d): A case where aliasing of both the original signal and the images has occurred. A discrete-time source where linear interpolation was used and the receiver sampling frequency was 4 kHz.
(Fig. 5e): Representative of the signal we should expect at the receiver when the receiver sampling frequency is 4 kHz. Although there is some bandwidth loss in comparison to A_ground_truth_fsr16k.wav, there are no aliasing/imaging artifacts. This is the intended behaviour of any auralization scheme at such low receiver sampling frequencies.
(Fig. 5f): NOTAP result using Zero Order Hold interpolation, an oversampling ratio, N = 50, and the receiver sampling frequency of 4 kHz.
(Fig. 5g): NOTAP result using Zero Order Hold interpolation, an oversampling ratio, N = 500, and the receiver sampling frequency of 4 kHz.
(Fig. 5h): NOTAP result using linear interpolation, an oversampling ratio, N = 50, and the receiver sampling frequency of 4 kHz.
Footnotes
8. The first edition of this book dates from 1996 and gives references from multiple research groups from the 1980s. Curiously, his later text on Digital Audio Effects [14] does not mention this approach when discussing Doppler shift processing.
9. Although the common textbook by Oppenheim and Schafer is used as a reference throughout this paper [27], a more tutorial version by similar authorship (that uses many of the same figures) is also available [28]. Chapter 7 ("Sampling") of the latter reference, in particular, contains most of the background given in this section, albeit not with the present context of nonuniform sampling and auralization.
10. Generalizing to the case where the receiver is moving can add considerable complexity. It becomes necessary to keep track independently of how the source and the receiver each move with respect to the medium, and only indirectly of how they move relative to each other: one cannot simply use the difference between the two positions to compute a time-of-flight and derive the Doppler shift, as motion of the source and motion of the receiver produce different results (especially for Mach numbers greater than 0.3 or so); a minimal numerical sketch of this bookkeeping is given after these footnotes. More precisely, the formulations given here are for the case where the medium is moving homogeneously with the receiver.
11. In order to apply sinc-based interpolation perfectly to nonequispaced data, even if one were willing to do the computation, one would need to know the time dilation factor between transmitted samples perfectly. Any attempt to infer that dilation (e.g., by linear interpolation) would necessarily introduce imperfections to the computation that would leave images in the oversampled signal.
12. While this source sampling frequency may seem artificially low, it is reasonable as a rudimentary model of a helicopter rotor. In such a case, fs,s will be based on the rotational rate of the blade and the density of the aeroacoustic data being auralized. If one started with predictions of the loading of the blade at 360 "stations" around its rotation (a.k.a. "1-degree data," perhaps a natural value), and the blade passing frequency is near 20 Hz, then fs,s = 360 × 20 Hz = 7.2 kHz, so the effective source Nyquist frequency will be around 4 kHz, leaving lots of room in the audible spectrum for images caused by poor interpolation techniques.
13. This noise source is distinct from the quantization source that also generates band-limited white noise in ZOH D/A converters, as discussed, for instance, in [27]. There should be no (significant) quantization noise in the NOTAP method, provided it is implemented with floating-point magnitude values.
14. This claim is predicated on storing the values of the transmitted samples as continuous (floating-point) values in both time and amplitude, not as values quantized to the nearest time sample or to a fixed precision. The discussion of noise from linear interpolation in Strauss appears to differ from this work in this respect [24].
15. For interpolation methods of higher than first order, even more samples must arrive before the oversamples can be processed. From this point of view, there may not be a strong reason to bother with greater-than-linear interpolations, as they would only increase the complexity, expense, and delay of the system.
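The minimal sketch referenced from footnote 10 follows. It assumes a medium at rest and illustrative trajectories (the names rs, rr, and reception_time, the circular source path, and the drifting receiver are our own, not from the paper), and solves the implicit reception-time equation by fixed-point iteration to show why the arrival-time bookkeeping must track both motions independently.

```python
import numpy as np

c = 343.0  # speed of sound [m/s]; the medium is assumed at rest here

def rs(t):  # illustrative source path: circle of radius 50 m, one revolution per second
    return np.array([50.0 * np.cos(2 * np.pi * t), 50.0 * np.sin(2 * np.pi * t), 0.0])

def rr(t):  # illustrative receiver path: slow drift along +x
    return np.array([200.0 + 10.0 * t, 0.0, 0.0])

def reception_time(t_em, tol=1e-12, max_iter=100):
    # Solve t_rx = t_em + ||rs(t_em) - rr(t_rx)|| / c by fixed-point iteration;
    # this converges while the receiver speed is well below c.
    t_rx = t_em
    for _ in range(max_iter):
        t_next = t_em + np.linalg.norm(rs(t_em) - rr(t_rx)) / c
        if abs(t_next - t_rx) < tol:
            break
        t_rx = t_next
    return t_rx

# Arrival-time spacing (and hence the Doppler shift) depends on both motions:
t_em = np.arange(0.0, 1e-2, 1e-3)            # uniformly spaced emission times
t_rx = np.array([reception_time(t) for t in t_em])
print(np.diff(t_rx))                          # nonuniform, unlike np.diff(t_em)
```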
References
- M. Vorländer: Auralization: fundamentals of acoustics, modelling, simulation, algorithms and acoustic virtual reality, Springer-Verlag, Berlin Heidelberg, 2008. [Google Scholar]
- S. Rizzi: Toward reduced aircraft community noise impact via a perception-influenced design approach, INTER-NOISE and NOISE-CON Congress and Conference Proceedings 8 (2016) 220–244. [Google Scholar]
- P. Davies: Perception-based engineering: integrating human responses into product and system design, Bridge Washington National Academy of Engineering 37, 3 (2007) 18. [Google Scholar]
- M.A. Boucher, A.W. Christian, S. Krishnamurthy, S.A. Rizzi: Perceptual evaluation of sound exposure level in annoyance ratings to helicopter noise, Journal of the American Helicopter Society 69, 32006 (2024) 1–18. [CrossRef] [Google Scholar]
- M. Watts, E. Greenwood, C. Smith, J. Stephenson: Noise abatement flight test data report, Technical Report TM-2019-220264, National Aeronautics and Space Administration (NASA), Hampton, VA, 2019. [Google Scholar]
- A. Christian, R. Cabell: Initial investigation into the psychoacoustic properties of small unmanned aerial system noise, in: Proceedings of the 23rd AIAA/CEAS Aeroacoustics Conference, Denver, CO, 5–9 June, 2017. [Google Scholar]
- J. Stienen, M. Vorländer: Real-time auralization of propagation paths with reflection, diffraction and the Doppler shift, Fortschritte der Akustik-DAGA 44 (2014) 1302–1305. [Google Scholar]
- S.A. Rizzi, A.K. Sahai: Auralization of air vehicle noise for community noise assessment, CEAS Aeronautical Journal 10, 1 (2019) 313–334. [CrossRef] [Google Scholar]
- R. Pieren, T. Bütler, K. Heutschi: Auralization of accelerating passenger cars using spectral modeling synthesis, Applied Sciences 6 (2016) 1. [Google Scholar]
- S. Damiano, T. van Waterschoot: Pyroadacoustics: a road acoustics simulator based on variable length delay lines, in: Proceedings of the 25th International Conference on Digital Audio Effects (DAFx‘22), Vienna, Austria, 6–10 September, 2022. [Google Scholar]
- J. Maillard, A. Kacem, N. Martin, B. Faure: Physically-based auralization of railway rolling noise, in: Proceedings of the 23rd International Congress on Acoustics (ICA 2019), Aachen, Germany, 9–13 September, 2019. [Google Scholar]
- D. Mascarenhas, B. Cotté, O. Doaré: Propagation effects in the synthesis of wind turbine aerodynamic noise, Acta Acustica 7 (2023) 23. [CrossRef] [EDP Sciences] [Google Scholar]
- J. Smith, S. Serafin, J. Abel, D. Berners: Doppler simulation and the Leslie, in: Proceedings of the 5th International Conference on Digital Audio Effects (DAFx ‘02), Hamburg, Germany, 26–28 September, 2002. [Google Scholar]
- U. Zölzer (Ed.): DAFX – Digital Audio Effects, 2nd edn., John Wiley & Sons, Chichester, United Kingdom, 2011. [CrossRef] [Google Scholar]
- S. Rizzi, B. Sullivan: Synthesis of virtual environments for aircraft community noise impact studies, 11th AIAA/CEAS Aeroacoustics Conference 1 (2005) 58–66. [Google Scholar]
- A. Christian, R. Ali, E. Greenwood: An n-order hold approach for fractional-delay interpolation in auralization, Journal of the Acoustical Society of America 151 (2022) A188. [CrossRef] [Google Scholar]
- A. Christian, R. Ali: Real-time considerations for a source-time dominant auralization scheme, Journal of the Acoustical Society of America 153 (2023) A36. [CrossRef] [Google Scholar]
- P. Schäfer, J. Fatelaa, M. Vorländer: Interpolation of scheduled simulation results for real-time auralization of moving sources, Acta Acustica 8, 9 (2024) 1–11. [CrossRef] [EDP Sciences] [Google Scholar]
- S. Krishnamurthy, S.A. Rizzi, R. Cheng, D.D. Boyd, A.W. Christian: Prediction-based auralization of a multirotor urban air mobility vehicle, in: AIAA Scitech 2021 Forum, Virtual event, 11–15 and 19–21 January, 2021, p. 0587. [Google Scholar]
- F.H. Schmitz: Chapter 2: Rotor Noise, in: H.H. Hubbard (ed.), Aeroacoustics of flight vehicles: theory and practice volume 1: Noise sources (RP-1258), National Aeronautics and Space Administration (NASA), Hampton, VA, 1991, pp. 65–150. [Google Scholar]
- S.A. Rizzi, S.J. Letica, D.D. Boyd, L.V. Lopes: Prediction of noise-power-distance data for urban air mobility vehicles, Journal of Aircraft 61, 1 (2024) 166–182. [CrossRef] [Google Scholar]
- M. Arntzen, L. Bertsch, D.G. Simons: Auralization of novel aircraft configurations, Technical Report NLR-TP-2015-284, National Aerospace Laboratory, Amsterdam, Netherlands, 2015. [Google Scholar]
- D.C. Massie: Wavetable sampling synthesis, in: M. Kahrs, K. Brandenburg (eds.), Applications of digital signal processing to audio and acoustics, Kluwer Academic Publishers, Boston, MA, 2002, pp. 311–341. [CrossRef] [Google Scholar]
- H. Strauss: Implementing Doppler shifts for virtual auditory environments, in: Preprints AES 104th Convention, Amsterdam, The Netherlands, 16–19 May, 1998. [Google Scholar]
- U. Zölzer (Ed.): Digital audio signal processing, John Wiley & Sons, Chichester, United Kingdom, 2022. [Google Scholar]
- R. Adams, T. Kwan: Theory and VLSI architectures for asynchronous sample-rate converters, Journal of the Audio Engineering Society 41, 7/8 (1993) 539–555. [Google Scholar]
- A.V. Oppenheim, R.W. Schafer: Discrete-time signal processing, 3rd edn., Prentice Hall, Inc, New Jersey, USA, 2009. [Google Scholar]
- A.V. Oppenheim, A.S. Willsky, S.H. Nawab: Signals and systems, 2nd edn., Prentice-Hall, Inc., New Jersey, USA, 1996. [Google Scholar]
- L. Wanhammar: DSP integrated circuits, Academic Press, San Diego, USA, 1999. [Google Scholar]
- T. Laakso, V. Valimaki, M. Karjalainen, U. Laine: Splitting the unit delay, IEEE Signal Processing Magazine 13, 1 (1996) 30–60. [CrossRef] [Google Scholar]
- D. Rocchesso: Fractionally addressed delay lines, IEEE Transactions on Speech and Audio Processing 8, 6 (2000) 717–727. [CrossRef] [Google Scholar]
- W.M. Hartmann: Digital waveform generation by fractional addressing, Journal of the Acoustical Society of America 82, 6 (1987) 1883–1891. [CrossRef] [PubMed] [Google Scholar]
- J. Smith, P. Gossett: A flexible sampling-rate conversion method, Proceedings of the 1984 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ‘84) 9 (1984) 112–115. [CrossRef] [Google Scholar]
- U. Zölzer, T. Boltze: Interpolation algorithms: Theory and application, in: 97th Audio Engineering Society Convention, San Francisco, USA, 10–13 November, AES, 1994. [Google Scholar]
- P.M. Morse, K.U. Ingard: Theoretical acoustics, McGraw-Hill, New York, USA, 1968. [Google Scholar]
- P. Jarske, Y. Neuvo, S.K. Mitra: A simple approach to the design of linear phase FIR digital filters with variable characteristics, Signal Processing 14, 4 (1988) 313–326. [CrossRef] [Google Scholar]
- F. Farassat: Derivation of formulations 1 and 1A of Farassat, Technical Report TM-2007-214853, National Aeronautics and Space Administration (NASA), Hampton, VA, 2007. [Google Scholar]
Appendix
For the images with a time-varying bandwidth, for any time index, l, we need to know if it would be possible for the lowest frequency image to intersect with the highest nonzero frequency of px. If this is the case, then the time-varying images cannot be completely removed without affecting the bandwidth of px. We can think of this as a sort of "effective aliasing", and from equation (9), a criterion which we will need to satisfy to ensure this does not occur is

fs,x(l) − fb,x(l) > fb,x(l),  i.e.,  fs,x(l) > 2 fb,x(l). (A1)
The similarity to the Nyquist-Shannon sampling theorem is noted, but here the instantaneous transmitted sampling frequency, fs,x(l), which is by definition time-varying, plays the role of the sampling frequency.
For a simple harmonic source, the frequency fb,x(l) (or bandwidth in this case) of the transmitted acoustic pressure will vary as [35]

fb,x(l) = fo (1 − (1/c) dR(l)/dtx(l)), (A2)

where fo is the constant frequency of the source signal, c is the speed of sound, and it is recalled that dR(l) = ∥rs(l) – rr∥2 − ∥rs(l − 1) – rr∥2 and dtx(l) = tx(l) − tx(l − 1) = 1/fs,x(l). Therefore, the quantity dR(l)/dtx(l) is representative of how the source-receiver distance is changing with respect to the transmitted times.
From equation (8), we can derive an expression for dR(l)/dtx(l) as

dR(l)/dtx(l) = c fs,s dR(l) / (c + fs,s dR(l)). (A3)
Substitution of equation (A3) into equation (A2) then yields

fb,x(l) = fo c / (c + fs,s dR(l)). (A4)

Likewise, the instantaneous transmitted sampling frequency becomes fs,x(l) = 1/dtx(l) = fs,s c / (c + fs,s dR(l)), so that fs,x(l)/fb,x(l) = fs,s/fo at every time index.
This demonstrates that in order for equation (A1) to hold, the sampling frequency of the source signal must be greater than twice the bandwidth of the source, i.e., fs,s > 2fo. Hence, once the Nyquist-Shannon sampling theorem is satisfied for the source signal, the time-varying images of the transmitted acoustic pressure will not overlap with the bandwidth of the transmitted acoustic pressure at any particular time.
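As a numerical sanity check on equations (A2)–(A4) as reconstructed above, the short sketch below (not the authors' code; the sound speed, sample rate, tone frequency, and circular path are illustrative assumptions) computes fs,x(l) directly from the transmitted times and confirms that fs,x(l)/fb,x(l) = fs,s/fo at every index, which is precisely why fs,s > 2fo suffices.

```python
import numpy as np

c, fs_s, fo = 343.0, 8000.0, 1000.0          # sound speed, source rate, source tone
ts = np.arange(0.0, 1.0, 1.0 / fs_s)         # uniform source (emission) times
rs = 50.0 * np.stack([np.cos(2 * np.pi * ts), np.sin(2 * np.pi * ts)], axis=1)
rr = np.array([200.0, 0.0])                  # fixed receiver position

R  = np.linalg.norm(rs - rr, axis=1)         # source-receiver distance per sample
tx = ts + R / c                              # transmitted (arrival) times
dR = np.diff(R)

fs_x = 1.0 / np.diff(tx)                     # instantaneous transmitted sampling freq.
fb_x = fo * c / (c + fs_s * dR)              # eq. (A4) as reconstructed above

print(np.allclose(fs_x / fb_x, fs_s / fo))   # True: the ratio is constant in time
```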
Cite this article as: Ali R. & Christian A. 2025. Source-time dominant modeling of the Doppler shift for the auralization of moving sources. Acta Acustica, 9, 1. https://doi.org/10.1051/aacus/2024073.
All Figures
Figure 1 The effects of sampling in the time and frequency domain. (a) A continuous time domain signal, (b) the corresponding frequency spectrum of (a) with bandwidth, fb, (c) the discrete-time domain signal after multiplication of (a) with an impulse train, and (d) the corresponding frequency spectrum of (c), where the frequency spectrum of the continuous time domain signal is repeated at integer multiples of the sampling frequency, fs. These frequency spectrum replicas are referred to as images (only the first two positive images are shown, but they continue to ±∞). fN is the Nyquist frequency.
Figure 2 The N-oversampled transmitted acoustic pressure (NOTAP) method for simulating the Doppler shift for the auralization of a moving sound source and fixed receiver in the free-field. |
Figure 3 Different levels of oversampling with different simple interpolation schemes, where px is the transmitted acoustic pressure with nonuniformly spaced samples, i.e. the input signal for the interpolation schemes. (a) Undersampled receiver, where aliasing is possible. (b) Highly sampled receiver, producing a large frequency range above the possible transmitted content where imaging is possible. |
Figure 4 Moving source scenario for auralization at the receiver position. The source is moving in an anticlockwise direction as indicated by the arrows along the circular path. |
Figure 5 Spectrograms for various auralization methods using a sinusoidal source (a) ‘Ground truth’ using equation (14) with fs,r = 16 kHz, (b) aliased with continuous time source with fs,r = 4 kHz, (c) linear interpolation of a discrete time source with fs,r = 44.1 kHz, (d) linear interpolation of discrete time source and aliasing with fs,r = 4 kHz, (e) Decimated signal from (a) at fs,r = 4 kHz, (f) NOTAP using ZOH interpolation, an oversampling ratio of 50, and fs,r = 4 kHz, (g) NOTAP using ZOH interpolation, an oversampling ratio of 500, and fs,r = 4 kHz, (h) NOTAP using linear interpolation, an oversampling ratio of 50, and fs,r = 4 kHz. |
Figure 6 Snapshots of signals obtained from different points along the NOTAP signal path used to generate Figure 5g (ZOH, N = 500). (a) Transmitted acoustic pressure with non-uniformly spaced samples, px, (b) the ZOH interpolation of (a), po, (c) decimation of (b) to fs,r = 4 kHz and application of the time-varying anti-imaging filter, pr, (d) upsampled version of (c) to a listening sampling frequency of 44.1 kHz.
Figure 7 Spectrogram of the ZOH oversampled time-series from Figure 6b. The first 3 integer multiples of the instantaneous transmitted sampling frequency (computed using Eq. (7)) are overlaid as white broken lines. |