Issue 
Acta Acust.
Volume 5, 2021



Article Number  56  
Number of page(s)  14  
Section  Hearing, Audiology and Psychoacoustics  
DOI  https://doi.org/10.1051/aacus/2021050  
Published online  21 December 2021 
Scientific Article
The PeriodModulated Harmonic Locked Loop (PMHLL): A loweffort algorithm for rapid timedomain multiperiodicity estimation
^{1}
Department of Medical Physics and Acoustics, University of Oldenburg, 26129 Oldenburg, Germany
^{2}
Hörzentrum Oldenburg gGmbH, 26129 Oldenburg, Germany
^{3}
Cluster of Excellence Hearing4all, 26129 Oldenburg, Germany
^{*} Corresponding author: volker.hohmann@uol.de
Received:
14
July
2021
Accepted:
19
November
2021
Many speech and music analysis and processing schemes rely on an estimate of the fundamental frequency f_{0} of periodic signal components. Most established schemes apply rather unspecific signal models such as sinusoidal models to the estimation problem, which may limit time resolution and estimation accuracy. This study proposes a novel timedomain lockedloop algorithm with low computational effort and low memory footprint for f_{0} estimation. The loop control signal is directly derived from the input time signal, using a harmonic signal model. Theoretically, this allows for a noiserobust and rapid f_{0} estimation for periodic signals of arbitrary waveform, and without the requirement of a prior frequency analysis. Several simulations with short signals employing different types of periodicity and with added wideband noise were performed to demonstrate and evaluate the basic properties of the proposed algorithm. Depending on the SignaltoNoise Ratio (SNR), the estimator was found to converge within 3–4 signal repetitions, even at SNR close to or below 0 dB. Furthermore, it was found to follow fundamental frequency sweeps with a delay of less than one period and to track all tones of a threetone musical chord signal simultaneously. Quasiperiodic sounds with shifted harmonics as well as signals with stochastic periodicity were robustly tracked. Mean and standard deviation of the estimation error, i.e., the difference between true and estimated f_{0}, were at or below 1 Hz in most cases. The results suggest that the proposed algorithm may be applicable to lowdelay speech and music analysis and processing.
Key words: Multiperiodicity / Multipitch / Fundamental frequency estimation / Harmonictonoise ratio / Hearing acoustics
© V. Hohmann, Published by EDP Sciences, 2021
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1 Introduction
Oscillations are common processes in natural and technical systems, leading to (quasi) periodicity constituting an important informationbearing feature of the signals emerging from such systems. Examples of natural signals are human speech, music, animal songs and echolocation cries. Technical examples are oscillating mechanical or electrodynamic systems. Detecting and processing (quasi) periodicity is therefore a key element of psychoacoustic and physiologic auditory systems modeling (e.g., [1, 2]), speech and music analysis and processing (e.g., [3, 4]), radio communication systems (e.g., [5]) as well as of power electronics systems (e.g., [6]). This study derives a lockedloop algorithm for fast detection and estimation of (quasi) periodicity emerging in time signals. It is motivated by auditory modeling approaches and adopts several processing principles commonly used in audio signal processing. A basic evaluation of the algorithm is provided using tests signals typical for audio applications, demonstrating its potential for widespread application in audio signal processing.
Signal periodicity emerges in many physical systems under the influence of internal delay, such as delayed feedback or coupling [7]. Therefore, modeling emerging periodicity often includes timedelayed coupling as well, examples being correlation analysis, differential equations and finite difference equations (i.e., filtering). To detect and estimate periodicity in the time domain, PhaseLocked Loops (PLL) are often used [8]. However, most PLLs do not include internal time delays corresponding to the periodicity to be detected, and are therefore best adapted to detecting sinusoidal signals by design. An exception is the DelayLockedLoop (DLL) [9, 10], which uses internal delays to detect and estimate periodic signals of rectangular shape optimally. DLLs are used for binary data and clock signal regeneration [11] as well as for synchronizing power grids [12]. However, they are not suited for (quasi) periodic signals of an arbitrary waveform shape. Several lockedloop techniques were proposed to estimate such generic periodic signals in audio signal processing [13, 14], but these techniques use several PLLs to estimate the fundamental and the harmonics of the periodic signal in parallel, which makes them computationally rather expensive and implies the underlying componentwise signal model to be sinusoidal rather than periodic. The contribution of this study is the combination of the lockedloop principle with delaybased superposition and cancellation mechanisms known from auditory systems modeling [15–17] to derive a novel delaybased locked loop, the PeriodModulated Harmonic Locked Loop (PMHLL). The PMHLL is explicitly tailored to modeling (quasi) periodic signals of arbitrary waveform shape and can estimate emerging periodicity quickly with high accuracy and low memory footprint. To the author’s knowledge no other lockedloop algorithm has been proposed that is directly controlled by a periodic signal component emerging in the input signal.
The remainder of the paper is organized as follows: First, the PMHLL is introduced in detail, including a thorough discussion of its properties, leading to a set of testable hypotheses as a basis for simulation design. Then, simulation results are presented and discussed. Although the algorithm is motivated by auditory models and was evaluated with signals in the audio frequency range, results generalize towards frequencytransposed versions of these signals, because the PMHLL is scaleinvariant by design. Finally, potential applications of the PMHLL in several fields are discussed. Matlab scripts generating the simulations and figures of this study are available in Zenodo: https://doi.org/10.5281/zenodo.5727729 [18].
2 Methods
2.1 Algorithm outline
The PMHLL is a timedomain lockedloop algorithm that adapts and locks to a periodic signal component with timevarying fundamental frequency f_{0}(t) emerging in its input signal x(t). Once locked to the periodic component, the PMHLL adapts its internal PMHLL oscillator frequency f_{c}(t) over time to the fundamental frequency of the periodic component, i.e., the time course of the PMHLL oscillator frequency is an estimate of the fundamental frequency: f_{c}(t) = $\widehat{{f}_{0}}\left(t\right)$. T_{c}(t) = 1/f_{c}(t) is an estimate of the signal’s fundamental period T_{0}(t), and is called the PMHLL oscillator period.
In addition to the fundamental frequency, the PMHLL estimates the HarmonictoNoise Ratio HNR(t), i.e., the timevarying ratio between the power of the periodic component at f_{0}(t) and the power of all other components present in the signal at time t. Locking of the PMHLL to a periodic component is indicated by the HNR being above 0 dB, which is the expected value of the HNR for pure noise signals. The higher the HNR, the higher is the saliency of the periodic component the PMHLL is locked to.
The PMHLL only adapts and locks to a periodic signal component emerging in the input signal, if its fundamental frequency lies in a welldefined PMHLL catch range around the current value of the PMHLL oscillator frequency. In the absence of a periodic component in the catch range, the PMHLL oscillator frequency f_{c}(t) meanders freely with the HNR estimate being low, indicating that the PMHLL is not locked. Note that the catch range follows the PMHLLoscillator frequency. If the fundamental frequency of the component the PMHLL is locked to jumps out of the current catch range, the PMHLL loses track.
In an application, several instances of the PMHLL can be run in parallel or serially, each starting at a different initial PMHLL oscillator frequency f_{c,0}, to scan a signal for the presence of periodic components in a certain predefined range of fundamental frequencies. For example, to scan a full octave, it can be divided into 12 semitones, and one PMHLL instance is started in each semitone range with its PMHLL oscillator frequency confined to that range.
In the following, the signal model and theory behind the PMHLL algorithm are discussed, followed by the specifics of how it is implemented in this study.
2.2 Signal model and PMHLL algorithm
The signal model underlying the PMHLL is a periodic signal with timevarying fundamental frequency f_{0}(t), computed as the derivative of the instantaneous phase φ(t), and a finite number N_{p} of harmonics:
$$\begin{array}{c}x\left(t\right)=\sum _{n=1}^{{N}_{p}}{a}_{n}\mathrm{sin}\left(\mathrm{n\phi}\left(t\right){\phi}_{0,n}\right),\\ {f}_{0}\left(t\right)=\frac{1}{2\pi}\frac{\mathrm{d}\phi \left(t\right)}{\mathrm{d}t}.\end{array}$$(1)
The amplitude a_{n} and phase φ_{0,n} of the single harmonics are not considered, i.e., only the signal periodicity emerging at the fundamental period T_{0} = 1/f_{0} drives adaptation and locking. Because uncontrolled amplitudes and relative phases of the harmonic components lead to an uncontrolled shape of the signal waveform, the PMHLL cannot use a reference signal as a modulator for deriving a loop control signal, as it is the case for standard lockedloop algorithms [8]. Instead, the signal model (1) is explicitly used to define a periodsynchronous modulation filter, which, when applied to the input signal, yields the phasebased control signal required to adapt the loop.^{1} Although the PMHLL has no intrinsic reference oscillator, the fundamental frequency estimate f_{c}(t) = $\widehat{{f}_{0}}\left(t\right)$is called the PMHLL oscillator frequency throughout this work, as introduced above.
The remainder of this section describes the details of the PMHLL algorithm, in particular periodstabilization and periodsynchronous modulation filtering. Algorithm properties are discussed in some depth to be able to formulate testable hypotheses as a basis for simulation design.
To stabilize the PMHLL control loop and to compute periodsynchronous signal features,^{2} the modeling concept of strobed temporal integration [17] is adopted. Socalled strobe points are generated once per signal period in this concept. Whereas strobe points are computed from the input signal in [17], specifically from the signal envelopes in frequency subbands, the PMHLL generates strobe points from its oscillator frequency f_{c}, which is loopcontrolled by the input signal. Figure 1 shows the principle of strobepoint generation, assuming that f_{c} increases linearly from 50 Hz to 150 Hz in the time interval 0 ms to 100 ms (plotted in blue, left scale). The corresponding phase function,
$$\mathrm{\Phi}\left(t\right)=\underset{0}{\overset{t}{\int}}{f}_{c}\left({t}^{\mathrm{\prime}}\right)\mathrm{d}{t}^{\mathrm{\prime}},$$(2)normalized by 2π and taken modulo 1, is plotted in red (right scale). Whenever the phase wraps, a strobe point is set, indicated by black bars in Figure 1. The procedure yields a monotonically increasing sequence of strobepoint temporal positions,
$${T}_{s}\left(t\right)=\left\{{t}_{s,1},{t}_{s,2},{t}_{s,3},\dots ,{t}_{s,n}\right\},$$(3)where t_{s,1} denotes the temporal position of the first strobe point found, and t_{s,n} denotes the position of the last strobe point elicited before the current time t. The length of the sequence increases with time. The time span between subsequent strobe points indicates the stabilized period. Averaging signal features such as fundamental frequency, signalpower or HNR across the time span between strobe points yields the corresponding periodsynchronous (or: periodstabilized) features. Furthermore, strobed integration is applied by computing a stabilized image (called stabilized auditory image (SAI) in [17]) of a signal s(t) by,
$$\begin{array}{c}{t}^{\mathrm{\prime}}=t{t}_{s,n},\\ {b}_{\mathrm{SI}}\left({t}^{\mathrm{\prime}}\right)={\mathrm{lp}}_{\tau}\left(s\left({t}_{s,n}+{t}^{\mathrm{\prime}}\right),s\left({t}_{s,n1}+{t}^{\mathrm{\prime}}\right),s\left({t}_{s,n2}+{t}^{\mathrm{\prime}}\right),\dots ,s\left({t}_{s,nN}+{t}^{\mathrm{\prime}}\right)\right),\\ {y}_{\mathrm{SI}}\left(t\right)={b}_{\mathrm{SI}}\left({t}^{\mathrm{\prime}}\right),\end{array}$$(4)where t denotes the current time instant, t’ denotes the time elapsed since the last strobe point, and lp_{τ}(·) denotes lowpass filtering with τ being the time constant.^{3} s(t) may be the input signal x(t) itself, or a filtered version of it (see below). Equation (4) means that the signal segment starting at each strobe point is continuously averaged into a signal buffer b_{SI}, which enhances the waveform of a periodic signal that emerges at the current stabilized period: Signal components that match the current stabilized period superpose constructively, whereas all other components average out (“noise”). According to (4), each entry in the signal buffer is updated once per period, and the buffer entry b_{SI}(t′) updated at time t is copied to an ongoing periodstabilized output signal y_{SI}(t).^{4}
Figure 1 Principle of strobepoint generation: The fundamental frequency estimate of a periodic signal is assumed to increase linearly from 50 Hz to 150 Hz in the time interval 0–100 ms (blue curve, left scale). The corresponding phase function according to (2) is normalized by 2π and taken modulo 1 (red curve, right scale). Whenever the phase wraps, a strobe point is set (indicated by black bars with circles on top). The time span between subsequent strobe points indicates the stabilized period. 
The main algorithmic novelty in this study is the proposed periodsynchronous modulation filter, which loopcontrols the adaptation of the PMHLL oscillator frequency f_{c}. Figure 2 shows the transfer functions in the range between f_{c}/2 and 3f_{c}/2 of the filters used in the proposed modulation filter (amplitude: blue curves, left scale; normalized phase: red curves, right scale). Two period synchronous filters are computed in parallel, a periodconstructive (solid lines) and a periodsuppressive (dashed lines) comb filter:
$$\begin{array}{c}{y}_{p}\left(t\right)=x\left(t\right)+x\left(t{T}_{c}\right),\\ {y}_{m}\left(t\right)=x\left(t\right)x\left(t{T}_{c}\right),\end{array}$$(5)where T_{c} = 1/f_{c} is the current PMHLL oscillator period and y_{p}, y_{m} are the outputs of the periodconstructive and –suppressive comb filters. If the fundamental frequency f_{0} of the signal matches the PMHLL oscillator frequency f_{c}, the output of the periodconstructive filter is twice the input, whereas the output of the periodsuppressive filter vanishes. If f_{0} is lower (indicated as f_{0,1}) or higher (indicated as f_{0,2}) than f_{c}, the ratio between the filter output amplitudes y_{p}(t)/y_{m}(t) decreases, and the phase difference between y_{p}(t) and y_{m}(t) is +90° or −90°, respectively. The PMHLL uses this phase difference as its loop control signal: If the phase difference is positive, the PMHLL oscillator frequency is increased by some factor, and it is decreased by some factor if the phase difference is negative. The PMHLL control loop sensitivity is expected to be higher than for DLL designs due to the abrupt phase flip at the locking point. Several options exist for computing the timevarying phase difference, one of which is using the sign of the instantaneous frequency of the complexvalued signal y_{p}(t) + iy_{m}(t). The filter outputs y_{p}(t) and y_{m}(t) may be periodstabilized according to equation (4) prior to computing the phase difference, and/or the phase difference may be periodstabilized by taking a timeaverage across a small fraction of the current PMHLL oscillator period T_{c} before taking the sign. For the specific implementation of the PMHLL used in this study see the next subsection.
Figure 2 Filter transfer functions in the range between f_{c}/2 and 3f_{c}/2, centered at the current PMHLL oscillator frequency (i.e., fundamental frequency estimate) f_{c}. The amplitude is shown in blue (left scale), and the normalized phase is shown in red curves (right scale). The periodconstructive comb filter (solid lines) enhances a component at f_{c}, whereas the periodsuppressive comb filter (dashed lines) cancels it. If the true fundamental frequency f_{0} of a signal is lower (indicated as f_{0,1}) or higher (indicated as f_{0,2}) than f_{c}, the phase difference between the two comb filters is +90° or −90°, respectively. This phase difference is used by the PMHLL algorithm as a control signal to adapt the fundamental frequency estimate f_{c} towards f_{0}. Note that, due to the periodicity of the transfer functions involved, the same behavior applies to all harmonics of the signal, i.e., the phase flip between +90° and −90° occurs at the same value of f_{c} for the fundamental and all its harmonics. Thus, the fundamental and all its harmonics contribute to the control signal in the same way. 
The HNR at the current PMHLL oscillator frequency is estimated by,
$$\mathrm{HNR}\left(t\right)=\frac{{\langle {\left{y}_{p}\left(t\right)\right}^{2}\rangle}_{\tau}}{{\langle {\left{y}_{m}\left(t\right)\right}^{2}\rangle}_{\tau}},$$(6)where t denotes the current time instant and 〈∙〉_{τ} denotes a running temporal average with a time constant of τ. The HNR may be periodstabilized by using the stabilized images of the comb filter outputs in the nominator and denominator and/or by adapting the time constant to be a multiple of the current PMHLL oscillator period T_{c}.
Figure 2 shows a single period of the periodic comb filter transfer functions, centered at the PMHLL oscillator frequency f_{c}. If the signal contains only the fundamental frequency component, the PMHLL catch range is [f_{c}(1 − 1/2), f_{c}(1 + 1/2)]. Beyond this range, the phase difference flips, and the control signal becomes ambiguous. If the signal contains higher harmonics up to harmonic number N_{p}, the frequency difference f_{c} − f_{0} at the fundamental increases to N_{p}·(f_{c} − f_{0}) at the highest harmonic. This means that the PMHLL catch range reduces to,
$$\Delta {f}_{c}=\left[{f}_{c}\cdot \left(1\frac{1}{2{N}_{p}}\right),\hspace{1em}{f}_{c}\cdot \left(1+\frac{1}{2{N}_{p}}\right)\right]\approx \left[\frac{{f}_{c}}{\left(1+\frac{1}{2{N}_{p}}\right)},\hspace{1em}{f}_{c}\cdot \left(1+\frac{1}{2{N}_{p}}\right)\right].$$(7)
As an example, the catch range for N_{p} = 7, which may be a typical choice for lowpassfiltered speech, is $\left[{f}_{c}\left(1\frac{1}{14}\right),{f}_{c}\left(1+\frac{1}{14}\right)\right]\approx \left[\frac{{f}_{c}}{1.07},1.07{f}_{c}\right]$, i.e., it corresponds to a semitone in each direction (factor 2^{1/12} ≈ 1.06). Thus, a bank of about six PMHLL instances is required to monitor a full octave. If the highest harmonic is N_{p} = 11, which may be a typical choice for auditory models, the catch range is reduced, and about ten PMHLL instances are required per octave. Note also that all PMHLL instances can share a single delay line for computing the comb filters (5).
Note that a PMHLL may still converge to periodic signals outside its catch range, because the higher harmonics that go beyond the nonambiguity range may not fully destroy the phase difference signal, depending on the relative amplitude of the lower and the higher harmonics. To ensure convergence in the case of an emerging periodic component in the catch range, a lowpass shelving filter at N_{p}·f_{c} may be used to suppress ambiguous harmonics potentially present in the input. The implementation used in this study does not include such a filter.
Note that random noise components added to the periodic component contribute at random to the positive and negative phase regime of the periodsynchronous modulation filter. The phasebased control signal thus may have a larger variation across time, but the noise does not bias its mean value. It is therefore expected that the PMHLL is rather insensitive to additive noise. Furthermore, a somewhat counterintuitive expectation is that increasing the noise bandwidth may improve the controlsignal statistics, because the balance of components falling in the positive and in the negative phase regime may be improved. On the other hand, a control system that uses the combfilter amplitude ratio for adaptation by, e.g., maximizing the HNR according to equation (6), would need a rather long time constant to estimate the HNR gradient, at least in the presence of noise.
Note that small deviations from the harmonic signal model (1), e.g., by mistuning of some or all harmonics or by stochastic modulation, lead to quasiperiodic signals, which may not fully destroy the phasebased control signal. It is therefore expected that the PMHLL also converges in the case of small deviations from the signal model.
Note also that all frequency components below f_{c}/2 lie in the positive phase regime, erroneously driving the PMHLL towards a higher oscillator frequency. Depending on the strength of these components, a highpass shelving filter at f_{c}/2 may be needed to suppress these components. The implementation used in this study does not include such a filter.
2.3 PMHLL Implementation
The digital implementation of the PMHLL [18] used in this study closely follows the algorithm description given above. It includes the following processing steps:
Sample the input signal x(t) at a sampling frequency f_{s} = 5 kHz.
Initialize an instance of the PMHLL with an initial PMHLL oscillator frequency f_{c,0} ≡ f_{c}(t = 0), which was chosen differently for the different simulations (cf. next section). f_{c} was set to a minimum of 96 Hz (80 Hz in simulation VII), adaptation was limited to this lower value. A maximum number of harmonics N_{p} = 7 was set to define the catch range.^{5}
Compute the two combfilter outputs y_{p} and y_{m} according to equation (5) from the input signal. If the current PMHLL oscillator period T_{c} lies between two sampling points of the signal delay line, linear interpolation of sample values is applied.
Compute the stabilized image y_{p,SI} and y_{m,SI} of the two combfilter outputs according to equation (4).
Compute the HNR according to equation (6) from the stabilized image y_{p,SI} and y_{m,SI} of the two combfilter outputs.
Compute the control signal c_{s}(n) by computing the phase difference between subsequent samples of the complexvalued signal y_{p,SI} + iy_{m,SI} and averaging the phase difference across time:
$$\begin{array}{c}c\left(n\right)={y}_{p,\mathrm{SI}}\left(n\right)+i{y}_{m,\mathrm{SI}}\left(n\right),\\ \mathrm{adc}\left(n\right)=\mathrm{arg}\left(c\left(n\right)\cdot {c}^{\mathrm{*}}(n1)\right)\\ \mathrm{cs}\left(n\right)=\langle \mathrm{adc}\left(n\right){\rangle}_{\tau}.\end{array},$$(8)
Adapt the PMHLL oscillator frequency f_{c} at each sample by multiplication with (control signal negative) or division by (control signal positive) a factor. The factor is chosen such that the PMHLL adapts to an emerging periodic component in its expected catch range (here: about a semitone) within two signal repetitions, i.e., within three times the PMHLL oscillator period T_{c} from the onset of the periodic component.
All running temporal averaging filters were implemented as 1storder IIR lowpass filters with time constants defined as a multiple of the current PMHLL oscillator period T_{c}. This means that all parameters controlling the PMHLL relate to T_{c}, rendering the PMHLL fully scalable with T_{c}. Time constants were 1.0 T_{c} for computing the stabilized image of the combfilter outputs (processing step 4), 0.5 T_{c} for the computation of the HNR (processing step 5) and 0.1 T_{c} for the phase difference averaging (processing step 6). The HNR was further smoothed with a time constant of 0.05 T_{c}. Thus, time constants were in the order of the magnitude of the oscillator period T_{c} or below, ensuring quick adaptation of the loop. Informal tests showed that the PMHLL convergence is robust against some variation in the time constants, but confirmed the expectation that the control loop starts oscillating if the time constants are increased to a small multiple of the oscillator period. Time constants were updated at each sample, but could also be updated once per period, i.e., when a strobe point is elicited.
The PMHLL processing is performed sample by sample, i.e., it is a lowdelay online algorithm. The adaptation time is expected to be about three times the period of an emerging periodic component and a fraction of this period once the loop is locked to a periodic component with continuously varying fundamental frequency. The memory footprint is mainly determined by the maximum oscillator period, according to equation (5). At a minimum fundamental frequency to be tracked of 50 Hz and a sampling frequency of 5 kHz, e.g., the delay line is 100 samples long. Furthermore, a small number of filter states and input and output samples need to be stored. An autocorrelation function testing the same range of delays needs a delay line of the same length, but, being the inverse Discrete Fourier Transform of the power spectrum, uses implicitly a less specific sinusoidal signal model. The PMHLL requires a couple of operations per sample, as described above, and thus is low in computational effort. A prior frequency analysis (shorttime Fourier transform or filterbank) is not required.
2.4 Simulations
By design and parameterization of the PMHLL, several hypotheses about its behavior and properties can be formulated:
The PMHLL adapts and locks to a periodic component emerging in the input signal in the PMHLL catch range within a few repetitions of the signal.
Discontinuities in the signal periodicity are indicated by a drop in the HNR estimate. Locking is reached again after a few repetitions of the signal as long as the signal fundamental frequency stays in the PMHLL catch range.
PMHLL adaptation is ongoing once the PMHLL is locked to a periodic component with a delay below the duration of one period.
The PMHLL is noiserobust. Quick adaptation is even reached at low SignaltoNoise Ratios (SNR) and with wideband noise signals. Noise may include harmonic components at other fundamental frequencies.
The PMHLL also adapts to quasiperiodic signals, including complex tones with shifted harmonics and signals with stochastic periodicity.
To test these hypotheses, simulations were performed using harmonic signals according to equation (1) with additive Ν(0, 1) – distributed Gaussian noise at different SNRs. Signals with different frequency content (fundamental missing or present) and with fixed, noncontinuous and sweeping fundamental frequency tracks were used, which lie in the catch range of the PMHLL instance. Furthermore, complex tones with shifted harmonics and a threetone musical chord signal were tested. In the latter case, three PMHLL instances were used in parallel to track each tone of the chord separately. Finally, Iterated Rippled Noise (IRN, [20]) was tested as an example of stochastic periodicity. IRN was generated by adding a 4s sample of Ν(0, 1)–distributed Gaussian noise and its delayed version in an iterative process (ADDSAME procedure in [20]). Because the delay was implemented with a circular signal shift, a segment around the center of the resulting IRN was selected for processing to exclude edge effects. Short signals with a duration between 100 ms and 400 ms were used to check the high adaptation speed expected for the PMHLL. The following simulations were performed, the details of which are listed in Table 1:
The signal was a harmonic complex tone with a duration of 400 ms and a fixed f_{0}, which abruptly changed from 98.5 Hz to 101 Hz after 200 ms. The complex tone had five harmonics at different amplitudes and included the fundamental. Three different SNRs were employed, 21.5 dB, 7.5 dB and 1.5 dB. Simulation I mainly tests hypotheses 1 and 2, but also 4.
Simulation II is the same as simulation I at a SNR of 7.5 dB, but all harmonics were shifted by +6 Hz or −6 Hz. A bias in the f_{0}estimate is expected, but HNR should still remain rather high, and the variance in the f_{0}estimate should not significantly increase compared to the harmonic case. Simulation II mainly tests hypothesis 5, but also 1, 2 and 4.
The signal in simulation III only contained the 6th and 7th harmonic, the SNR was 4.1 dB. Again, f_{0} switched from 98.5 Hz to 101 Hz after 200 ms, with a total signal duration of 400 ms. Simulation III tests mainly hypothesis 1 for the special case of a missing fundamental.
Simulation IV tests a complex tone with a linear sweep of its fundamental frequency from 96.0 Hz to 103.0 Hz within 100 ms. Again, harmonic numbers 1, 3, 4, 6 and 7 were included. Three SNRs, 21.3, 7.6 and 1.4 dB were tested. Simulation IV mainly tests hypotheses 3 and 4.
Simulation V tested an IRN with 5, 3 and 1 iterations to see whether the PMHLL also adapts to stochastic periodicity (hypothesis 5). The fundamental frequency was 98 Hz.
Simulation VI tested a chord signal with three tones at equal level, each including harmonic numbers 3, 4, 6 and 7. Furthermore, noise was added such that the SNR per component (against the noise plus the other two components) was −2.2 dB, and the total SNR of all three tones together was 3.1 dB. Each tone was estimated by a PMHLL instance with an initial oscillator frequency at a semitone above the tone fundamental frequency. Simulation VI mainly tested hypothesis 4.
List of simulations performed. 1st col.: Simulation number; 2nd col.: Brief signal description; 3rd col.: Total signal duration; 4th col.: Fundamental frequency; 5th col.: Composition of each complex tone (number of harmonic and its amplitude, n/a for IRN); 6th col.: Amount of mistuning of all components (simulation II only); 7th col.: Number if iterations of the IRN (simulation V only); 8th col.: Gain of additive Ν(0, 1)distributed Gaussian noise; 9th col.: SignaltoNoise Ratio computed across the whole signal duration (n/a for IRN); 10th col.: Initial PMHLL oscillator frequency: 11th col.: Tracking error in terms of mean and standard deviation across the whole signal duration, excluding the time period of initial convergence (cf. Sect. 3).
2.5 Data analysis
The estimation error was quantified by the mean and standard deviation across time of the difference between the PMHLL oscillator frequency and the fundamental frequency of the signal f_{c}(t) – f_{0}(t). The initial convergence part of the PMHLL, which amounts to about 10% of the signal duration, was excluded from the statistics. Note that signals were rather short, the phases of the harmonics were selected at random, and the noise was random as well. The error values thus may vary when repeating the simulation. Yet, results show a clear pattern, so that further simulations were not required to test the hypotheses.
3 Results
Figure 3 shows the data of simulation I. SNR decreases from left to right panel, and each panel shows the HMPLL oscillator frequency f_{c}(t) referenced to 96 Hz (red curve, right scale), the HNR (blue curve, left scale) and the strobe points (black bars). The signal’s fundamental frequency f_{0} was +2.5 Hz (first half) and +5 Hz (second half) on this scale. Data show an initial convergence within about three periods, also after the change in the fundamental frequency, and an ongoing locking. The HNR increases quickly during adaptation, and drops within one period after the change in fundamental frequency, indicating an unlocked state. The HNR is higher than the nominal SNR due to the combfiltering, and decreases with SNR. The statistics (last col. of Tab. 1, 1st entry) shows that mean and std. dev. of the tracking error increases with decreasing SNR, but remains in all cases well below 1 dB.
Figure 3 Simulation I: Complex tone with a fixed fundamental frequency of 98.5 Hz (1st half) and 101.0 Hz (2nd half), SNR is 21.5 dB, 7.5 dB and 1.5 dB (left, center and right panel). Each panel shows the PMHLL oscillator frequency (i.e., fundamental frequency estimate) f_{c} referenced to 96 Hz (red curve, right scale), the HNR (blue curve, left scale) and the strobe points (black bars), as a function of time. The black line shows the signal’s fundamental frequency trajectory. Data show quick convergence and small estimation variance even at the lowest SNR. The HNR depends on SNR and quickly goes low during loop adaptation, thus indicating the unlocked state. 
The two left panels of Figure 4 show the same as simulation I for an SNR of 7.5 dB (center panel of Fig. 3), but with a mistuning of all harmonics by +6 Hz (left panel) and −6 Hz (center panel). Data show the same pattern of loop convergence and HNR as in the harmonic case, with a similar trackingerror std. dev. of below 1 dB (last col. of Tab. 1, 2nd entry). This indicates that quasiperiodicity due to mistuning does not affect PMHLL functionality, as expected. The fundamental frequency estimate during ongoing locking is biased by +1.21 Hz and −1.39 Hz, respectively, indicating that the mistuning systematically biases estimation.
Figure 4 Complex tone with all harmonics mistuned by +6 Hz (left panel) and −6 Hz (center panel) at a SNR of 7.5 dB (Simulation II; same as center panel of Figure 3, but with mistuning). Data presentation is the same as in Figure 3. Mistuning does not affect PMHLL functionality, but the fundamental frequency estimate is biased on average by +1.21 Hz and −1.39 Hz, respectively. Right panel: Data for a complex tone with missing fundamental (simulation III) shows a similar pattern as the corresponding case of a complex tone with the fundamental present (center panel of Fig. 3). The HNR drops by about 3.4 dB compared to the corresponding case, because the three lowest of in total five harmonics were discarded from the signal. 
The right panel of Figure 4 shows the data for a complex tone with missing fundamental (simulation III). A pattern similar to the corresponding case of a complex tone with the fundamental present (center panel of Fig. 3) is found. Note that SNR is by 3.4 dB lower than for the corresponding case, because the three lowest harmonics (1, 3 and 4) were discarded from the signal. Accordingly, the HNR drops by about the same amount. The statistics (last col. of Tab. 1, 3rd entry) shows that mean and std. dev. of the tracking error again remain well below 1 dB.
Figure 5 shows the data of the sweep signal (simulation IV). SNR decreases from left to right panel, data presentation is as in Figure 3. Data clearly show quick convergence and an ongoing locking with a delay that is shorter than one period, even at the lowest SNR of 1.4 dB. The statistics (last col. of Tab. 1, 4th entry) again shows that mean and std. dev. of the tracking error increases with decreasing SNR, but remains in all cases below 1 dB.
Figure 5 Complex tone with a linear sweep of the fundamental frequency from 96 Hz to 103 Hz (simulation IV). SNR decreases from left to right panel, data presentation is as in Figure 3. Data clearly show quick convergence and an ongoing locking with a delay that is shorter than one period, even at the lowest SNR of 1.4 dB (right panel). Note that the total duration of the signal is only 100 ms. 
Figure 6 shows the data of the IRN (simulation V). The number of iterations was 5, 3 and 1 (left to right panels), data presentation is as in Figure 3. The estimate converges quickly to 98 Hz, which corresponds to the IRN delay. Mean and std. dev of the tracking error increases with less iterations, but stays in all cases well below 1 dB (last col. of Tab. 1, 5th entry). The HNR decreases with less iterations, and shows a higher variance as for the complex tones. It appears that the loop sometimes loses track and catches up again quickly, as indicated by the dips in the HNR, which coincide with a larger deviation in the fundamental frequency estimate.
Figure 6 IRN with 5, 3 and 1 iterations (left to right panels, simulation V). Data presentation is as in Figure 3. The estimate converges quickly to the target value of 98 Hz (+2 Hz on the relative frequency scale). The HNR decreases with less iterations and shows some variance, but the fundamental frequency estimate is still quite accurate. 
Figure 7 shows the data of the 3tone major chord signal (simulation VI). The upper panel shows the fundamental frequency estimate of the three PMHLL instances as a function of time. For each instance, the scale was referenced to the fundamental frequency of the tone it converges to. Although the SNR is at −2.2 dB for each of the tones, the PMHLL instances converge within about 20 ms, which corresponds to about four periods of the lowestfrequency tone. The mean tracking error (last col. of Tab. 1, 6th entry) is at or below 1 Hz. The std. dev. of the tracking error is higher than in the other simulations, but it is still rather low, given the high level of noise. The panel also shows the strobe points of each PMHLL instance, with shorter bar lengths for higher tones. The middle panel of Figure 7 shows the mean HNR per component, which is around 5 dB, i.e., about 7 dB higher than the signal SNR. The lower panel shows the nonnormalized intensity spectrum of the first 30 ms of the signal (coarse curve) and of the complete signal (fine curve). The spectrum of the first 30 ms does not show any clear tonal components, i.e., it appears impossible to estimate the chord from this FFT data. The spectrum of the complete signal would allow a chord estimate, but some logic would be needed to derive the correct tones and associate the tonal components. Note that the PMHLL converged within 20 ms.
Figure 7 3tone major chord signal (simulation VI). The upper panel shows the oscillator frequency (i.e., fundamental frequency estimate) f_{c} of three PMHLL instances as a function of time. Each instance targeted one of the tones by selecting the instance’s initial f_{c} to be a semitone above the target’s f_{0}, i.e., f_{0} lies at the upper edge of the instance’s catch range. In the plot, each instance’s f_{c} time course was referenced to its target f_{0}, i.e., the target f_{0} is plotted at 0 Hz. The PMHLL instances converge within about 20 ms at an SNR of −2.2 dB for each tone. Strobe points are plotted separately for the different instances, with shorter bar lengths for higher tones. The middle panel shows the mean HNR per component, and the lower panel shows the nonnormalized intensity spectrum of the first 30 ms of the signal (coarse curve) and of the complete signal (fine curve). The tones may not be derived from the 30ms FFT data, because tonal components are not resolved. Note that the PMHLL converged in 20 ms. 
4 Discussion
The simulations using a variety of test signals showed that the PMHLL adapts to periodic signal components emerging in a wideband time signal within 3–4 periods of the onset of the component and tracks changes in the component’s fundamental period, or fundamental frequency, with a delay that is well below the duration of one period. To the author’s knowledge, similarly short adaptation and tracking time constants have not been achieved with established methods, at least not for SNR values at or even below 0 dB. This achievement may be due to the sensitivity of the novel phasebased control signal, which is directly derived from the model of a harmonic signal with varying fundamental frequency.
To illustrate a possible application of the PMHLL to speech signals, a signal with timevarying values of fundamental frequency, first and second formant was generated using the vocoder method introduced in [21]. The signal sounds like a continuous succession of different vowels with varying pitch. This type of signal was used in [21] to investigate human attentive tracking of voices and was therefore considered suitable for the purpose of demonstrating pitch tracking using the PMHLL.^{6} Figure 8 shows the spectrogram of the signal using a 512tap windowed FFT with a window overlap of 90%. The signal’s initial fundamental frequency is at about 245 Hz. The left panel shows the spectrogram of the signal at an SNR of 0 dB and the fundamental frequency estimate of a PMHLL instance that was started at an oscillator frequency f_{c} of 310 Hz (black curve). The right panel shows the spectrogram of the clean voice and again the PMHLL estimate, together with its integer multiples 2–9. The PMHLL instance catches the fundamental frequency trajectory at about 100 ms into the signal and keeps track throughout the duration of the signal. The precision of the estimate can be best assessed at the higher harmonics. The deviation at the 8th harmonic is at or below about 3%. Even the rapid changes at about 0.6 s and 1.0 s are tracked. Note that the spectrogram of the noisy signal, from which the PMHLL estimate was derived, does not show the fundamental, and most of the time only the second harmonic is visible.
Figure 8 Spectrogram of a vocoded voiced speech signal with timevarying trajectories of fundamental frequency, first and second formant. The left panel shows the spectrogram of the signal at an SNR of 0 dB as well as the fundamental frequency estimate derived from the noisy signal with a PMHLL instance that was started at an oscillator frequency f_{c} of 310 Hz (black curve). The right panel shows the spectrogram of the clean voice and again the PMHLL estimate, together with its integer multiples 2–9. The PMHLL instance catches the fundamental frequency trajectory at about 100 ms into the signal and keeps track throughout the duration of the signal. Note that the spectrogram of the noisy signal does not show the fundamental, and most of the time only the second harmonic is visible. 
A requirement for the algorithm to adapt and converge is that the emerging periodic component lies in the catch range of the PMHLL instance. This is the case for the example in Figure 8, although the initial values of the signal’s fundamental frequency and of the PMHLL instance’s oscillator frequency were quite apart. To scan for a whole range of potentially observable fundamental frequencies in parallel, a set of PMHLL instances with different initial oscillator frequencies is required, all sharing the same delay line. The data from such a set, i.e., the tracks of oscillator frequency and HNR for each instance, can be sent upstream for further analysis and interpretation, e.g., by a Computational Scene Analysis (CASA) algorithm or a multipitch tracker. The tracks with salient periodicity (i.e., HNR is high) form segregated periodic glimpses of the sound, of which streams can be formed, e.g., for tracking speech. If only one instance of the set indicates a salient periodicity, only one source is active at that point in time, as it is the case for the signal from Figure 8, and the subsequent analysis is rather easy. If multiple salient periodicities are detected at the same point in time, a multisource scenario is detected. Consequently, the subsequent analysis stage must form multiple streams to segregate the different sources. In that sense, a set of PMHLL instances provides a set of periodicityrelated features that may replace, e.g., an autocorrelation analysis, if lowdelay processing with high temporal resolution is required. The advantage is that multiple periods can be detected in parallel, which allows the subsequent processing to segregate different sources.
Depending on the application, about 7–12 PMHLLinstances may be sufficient to monitor a full octave of fundamental frequencies, depending on the signal’s bandwidth. To monitor three octaves, about 30 PMHLL instances are needed. However, periodic signals are also periodic at integer multiples of the period (subharmonics), i.e., PMHLL instances can be coupled across octaves. Although each PMHLL instance is low in computational effort, the effort scales with the number of instances used. Note, however, that an established algorithm like YIN [22] performs a greedy search across all possible periods in each time frame. To scan the full range of three octaves from 50 Hz to 400 Hz at a sampling rate of 5 kHz, for example, about 90 different periods must be tested (delays between 10 and 100 samples). For each period and each frame, a measure related to the power of the output of the periodsuppressive comb filter (2nd line of Eq. (5)) is computed, and the period is selected that is associated with the minimum value of the power measure. This means that YIN is comparable to a set of PMHLL instances in terms of computational effort, but lacks the sensitivity and timeresolution of the phasebased adaptation and is by definition not capable of finding multiple periods in parallel.
The application of the PMHLL is more straightforward if the targeted value or range of fundamental frequencies is known. Assuming that prior knowledge is available at a certain instance of time on which fundamental frequencies may emerge, e.g., from a predictive CASA algorithm, new PMHLL instances may be initialized at these frequencies. The number of required PMHLL instances may be reduced then. Combinations with CASA algorithms will be investigated in future studies.
Although not tested in this study, the results suggest that the PMHLL can be applied to frequency subband signals from a filterbank or from shorttime Fourier transform (STFT). For STFTbased filterbanks as well as other modulating filterbanks, a phase reconstruction may be required by modulating all frequency bands into the baseband (Eq. (4) in [23]).
To properly represent the signal waveform while being able to follow changes in the signal period, all time constants must be selected to reflect the rate of change of the fundamental frequency that occurs in the signal. Speech, e.g., includes rapid changes in the fundamental frequency, i.e., time constants must be rather short, thereby limiting the effective number of periodaverages to only a few periods. For music, the time constants may be longer. In this study, time constants were chosen to be a fixed (small) multiple or fraction of the current PMHLL oscillator period. Any options for its automatic adaptation to the input’s characteristics may be investigated in a subsequent study.
Note that all simulations of this study computed the PMHLL control signal without further processing. By design of the adaptive loop control, the control signal must oscillate around the true fundamental frequency. Tracking algorithms such as Kalman filters (e.g., [24]) or particle filters (e.g., [25]) may be used to stabilize the estimate without adding significant delay.
The fact that all time constants are related to the current PMHLL oscillator frequency makes the algorithm fully scalable to other frequency and fundamental frequency ranges, as long as the sampling frequency is adapted properly. This is an important feature that makes the algorithm potentially interesting for a range of applications, in particular if lowdelay processing is required.
The PMHLL may be applied to extending auditory models of finegrain temporal processing. It can be considered an extension to the strobedintegration model [17] in that it adds a reliable strobepoint generation mechanism. All results obtained with the auditory strobedintegration model thus also apply to the PMHLL. The PMHLL can also be considered an extension of the cancellation model [16] in that it simplifies the search for cancellation delays. The results obtained with the cancellation model thus also apply to the PMHLL.^{7}
Another application area may be audio signal processing, in particular for devices that require lowdelay processing. Hearing aids, e.g., require the total delay to be well below 10 ms, which makes the application of pitchbased processing difficult but yet interesting (e.g., [23]). The PMHLL may allow a new perspective on pitchbased processing for hearing aids. In particular, it may be used for cancelling unwanted periodic components and/or enhancing desired targetrelated periodic speech components. Another option is to improve spectral estimates by using the PMHLL to derive periodsynchronous (“pitchsynchronous”) signal features such as the spectral profile or the HarmonictoNoise Ratio (HNR). Although not tested in this study, the results suggest that a periodsynchronous spectral analysis may be performed on the periodsynchronous features to estimate the spectrotemporal profile of the periodic component the PMHLL is locked to.
The PMHLL may also augment algorithms for lineartifact suppression for electrophysiological experiments (e.g., [27]) by providing a precise estimate of the power of the fundamental and harmonics of the line artifact for optimal suppression. Note that the periodsuppressive comb filter (Eq. (5)) suppresses all harmonics of the component the PMHLL is locked to.
Although rather speculative, the PMHLL may also be used for highfrequency applications such as radio receivers. For this, it may be required to use tunable analog delay lines (e.g., [28]) for implementing the comb filters (Eq. (5)).
Another speculative application is power electronics and grid synchronization [6], because here, specific PLLs are used, which may not be optimal for estimating line signals with unknown phase and amplitude values of its harmonics.
5 Conclusions
The simulation data confirm the theoretical expectation, that the PeriodModulated HarmonicLockedLoop (PMHLL) enables rapid and noiserobust f_{0} (fundamental frequency) estimation from wideband periodic signals in the timedomain. In particular, the following conclusions can be drawn:
Rapid convergence: A PMHLL instance converges within 3–4 periods from the signal onset, even at low SNRs around 0 dB.
Quasiperiodicity: The PMHLL also tracks quasiperiodic signals, such as complex tones with shifted harmonics and signals with stochastic periodicity.
Subperiod sensitivity: Once locked to a periodic component, a PMHLL instance follows a continuously changing f_{0} with an estimation delay of one period or below.
Multiperiodicity: A PMHLL instance only adapts and locks to emerging periodic signal components, if its fundamental frequency lies within an interval around the current oscillator frequency of the PMHLL instance. This “catch range” depends on the signal bandwidth and number of harmonics. For typical applications, around ten parallel PMHLL instances are sufficient to monitor a full octave of fundamental frequencies.
The PMHLL is the first algorithm to combine additive and subtractive (cancellation) models of periodicity processing to derive a phasebased loop control signal. Its adaptation speed and sensitivity is due to the direct application of a timedomain harmonic signal model to the estimation problem.
Parallel and/or sequential estimation of temporallyoverlapping periodic components using several PMHLL instances may facilitate separation of concurrent sources in scene analysis algorithms.
Acknowledgments
Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – ProjectID 352015383 – SFB 1330 – project B2.
Conflict of interest
The author declares that he has no conflicts of interest in relation to this article.
Data Availability Statement
All simulation data and figures presented here can be computed with Matlab scripts that are available in Zenodo: https://doi.org/10.5281/zenodo.5727729 [18].
The periodsynchronous modulation filter is specific to the PMHLL algorithm, which is why its acronym includes “PeriodModulated” (PM). The term HLL was coined before for lockedloops that track harmonic signals [14], which is also the case for the PMHLL.
The term pitchsynchronous is often used in speech research, see, e.g., [19]. Because pitch is a perceptual quantity, however, this term is avoided to make clear quantities estimated here refer to physical quantities.
The signal is included in the supplementary material [18]. After listening to the signal, a human can reproduce it. It is thus very similar to a human voice, but has the advantage that the groundtruth tracks of the fundamental frequency and of the two formants are known. Code for stimulus generation is publicly available at http://mcdermottlab.mit.edu/Attentive_Tracking_StimGen_Package_v2.zip (retrieved on 23.11.2021).
An important characteristic of auditory peripheral processing is the demodulation of the beating of harmonics falling into the same auditory frequency bands by the inner haircell function, a basic model of which is halfwave rectification and lowpass filtering. Periodicity can therefore be detected from resolved (single) harmonics at lower frequencies, and at the fundamental and potentially the first harmonic at higher frequencies [26]. Such a specific demodulation may also be interesting for audio signal processing applications.
References
 A. De Cheveigné: Pitch perception models, in Pitch, Springer, New York, NY. 2005, pp. 169–233. [CrossRef] [Google Scholar]
 A. Josupeit, V. Hohmann: Modeling speech localization, talker identification, and word recognition in a multitalker setting. The Journal of the Acoustical Society of America 142, 1 (2017) 35–54. [CrossRef] [PubMed] [Google Scholar]
 H. Kawahara, M. Morise, T. Takahashi, R. Nisimura, T. Irino, H. Banno: TandemSTRAIGHT: A temporally stable power spectral representation for periodic signals and applications to interferencefree spectrum, F0, and aperiodicity estimation, in 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE. 2008, March, pp. 3933–3936. [Google Scholar]
 B. Gold, N. Morgan, D. Ellis: Speech and audio signal processing: Processing and perception of speech and music. John Wiley & Sons, 2011. [CrossRef] [Google Scholar]
 I. Nakazawa, K. Umeno: Fundamental study on almost periodic frequency arrangements for supermultiaccess radio communication systems. IEICE Communications Express 6 (2017) 673–678. https://doi.org/10.1587/comex.2017XBL0135. [CrossRef] [Google Scholar]
 Y. Han, M. Luo, X. Zhao, J.M. Guerrero, L. Xu: Comparative performance evaluation of orthogonalsignalgeneratorsbased singlephase PLL algorithms – A survey. IEEE Transactions on Power Electronics 31, 5 (2015) 3932–3944. [Google Scholar]
 S. Yanchuk, P. Perlikowski: Delay and periodicity. Physical Review E 79, 4 (2009) 046221. [CrossRef] [PubMed] [Google Scholar]
 R.E. Best: Phaselocked loops: Design, simulation, and applications. McGrawHill Education, 2007. [Google Scholar]
 B. Razavi: The delaylocked loop [A circuit for all seasons]. IEEE SolidState Circuits Magazine 10, 3 (2018) 9–15. [CrossRef] [Google Scholar]
 J.J. Spilker, D.T. Magill: The delaylock discriminatoran optimum tracking device. Proceedings of the IRE 49, 9 (1961) 1403–1416. [CrossRef] [Google Scholar]
 T. Xanthopoulos: Digital delay lock techniques, in Clocking in Modern VLSI Systems. Integrated Circuits and Systems, Xanthopoulos T, Ed., Springer, Boston, MA. 2009. https://doi.org/10.1007/9781441902610_6. [CrossRef] [Google Scholar]
 P. Li, L. Xue, P. Hazucha, T. Karnik, R. Bashirullah: A delaylocked loop synchronization scheme for highfrequency multiphase hysteretic DCDC converters. IEEE Journal of SolidState Circuits 44, 11 (2009) 3131–3145. [CrossRef] [Google Scholar]
 J. Böhler, U. Zölzer: Monophonic pitch detection by evaluation of individually parameterized phase locked loops, in 19th International Conference on Digital Audio Effects (DAFX16), Vol. 68. 2016, September. [Google Scholar]
 R.M. Bittner, A. Wang, J.P. Bello: Pitch contour tracking in music using Harmonic Locked Loops, in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE. 2017, March, pp. 191–195. [Google Scholar]
 A. de Cheveigné: Separation of concurrent harmonic sounds: Fundamental frequency estimation and a timedomain cancellation model of auditory processing. The Journal of the Acoustical Society of America 93, 6 (1993) 3271–3290. [CrossRef] [Google Scholar]
 A. De Cheveigné: Cancellation model of pitch perception. The Journal of the Acoustical Society of America 103, 3 (1998) 1261–1271. [CrossRef] [PubMed] [Google Scholar]
 R.D. Patterson, M.H. Allerhand, C. Giguere: Timedomain modeling of peripheral auditory processing: A modular architecture and a software platform. The Journal of the Acoustical Society of America 98, 4 (1995) 1890–1894. [CrossRef] [PubMed] [Google Scholar]
 V. Hohmann: Raw data and scripts for “The PeriodModulated Harmonic Locked Loop (PMHLL): A loweffort algorithm for rapid timedomain periodicity estimation”. https://Zenodo.org (2021). https://doi.org/10.5281/zenodo.5727729. [Google Scholar]
 M.V. Mathews, J.E. Miller, E.E. David Jr: Pitch synchronous analysis of voiced sounds. The Journal of the Acoustical Society of America 33, 2 (1961) 179–186. [CrossRef] [Google Scholar]
 W.A. Yost: Pitch of iterated rippled noise. The Journal of the Acoustical Society of America 100, 1 (1996) 511–518. [CrossRef] [PubMed] [Google Scholar]
 K.J. Woods, J.H. McDermott: Attentive tracking of sound sources. Current Biology 25, 17 (2015) 2238–2246. [CrossRef] [PubMed] [Google Scholar]
 A. De Cheveigné, H. Kawahara: YIN, a fundamental frequency estimator for speech and music. The Journal of the Acoustical Society of America 111, 4 (2002) 1917–1930. [CrossRef] [PubMed] [Google Scholar]
 M. Krawczyk, T. Gerkmann: STFT phase reconstruction in voiced speech for an improved singlechannel speech enhancement. IEEE/ACM Transactions on Audio, Speech, and Language Processing 22, 12 (2014) 1931–1940. [CrossRef] [Google Scholar]
 O. Das, J.O. Smith, C. Chafe: Realtime pitch tracking in audio signals with the extended complex Kalman filter, in Proceedings of the 20th International Conference on Digital Audio Effects. 2017, September, pp. 118–124. [Google Scholar]
 J. Luberadzka, H. Kayser, V. Hohmann: Glimpsed periodicity features and recursive Bayesian estimation for modeling attentive voice tracking, in Proc. ICA 2019, Aachen, Germany. 2019, pp. 6569–6576. http://pub.degaakustik.de/ICA2019/data/articles/000836.pdf. [Google Scholar]
 P.X. Joris: Entracking as a brain stem code for pitch: The butte hypothesis, in Physiology, Psychoacoustics and Cognition in Normal and Impaired Hearing, Springer, Cham. 2016, pp. 347–354. [CrossRef] [PubMed] [Google Scholar]
 A. de Cheveigné: ZapLine: A simple and effective method to remove power line artifacts. NeuroImage 207 (2020) 116356. [CrossRef] [PubMed] [Google Scholar]
 J. Jing, X. Wang, L. Lu, L. Zhou, H. Shu, X. Wang, J. Chen: Reconfigurable RF notch filter based on an integrated silicon optical true time delay line. Journal of Physics D: Applied Physics 52, 19 (2019) 194001. [CrossRef] [Google Scholar]
Cite this article as: Hohmann V. 2021. The PeriodModulated Harmonic Locked Loop (PMHLL): A loweffort algorithm for rapid timedomain multiperiodicity estimation. Acta Acustica, 5, 56.
All Tables
List of simulations performed. 1st col.: Simulation number; 2nd col.: Brief signal description; 3rd col.: Total signal duration; 4th col.: Fundamental frequency; 5th col.: Composition of each complex tone (number of harmonic and its amplitude, n/a for IRN); 6th col.: Amount of mistuning of all components (simulation II only); 7th col.: Number if iterations of the IRN (simulation V only); 8th col.: Gain of additive Ν(0, 1)distributed Gaussian noise; 9th col.: SignaltoNoise Ratio computed across the whole signal duration (n/a for IRN); 10th col.: Initial PMHLL oscillator frequency: 11th col.: Tracking error in terms of mean and standard deviation across the whole signal duration, excluding the time period of initial convergence (cf. Sect. 3).
All Figures
Figure 1 Principle of strobepoint generation: The fundamental frequency estimate of a periodic signal is assumed to increase linearly from 50 Hz to 150 Hz in the time interval 0–100 ms (blue curve, left scale). The corresponding phase function according to (2) is normalized by 2π and taken modulo 1 (red curve, right scale). Whenever the phase wraps, a strobe point is set (indicated by black bars with circles on top). The time span between subsequent strobe points indicates the stabilized period. 

In the text 
Figure 2 Filter transfer functions in the range between f_{c}/2 and 3f_{c}/2, centered at the current PMHLL oscillator frequency (i.e., fundamental frequency estimate) f_{c}. The amplitude is shown in blue (left scale), and the normalized phase is shown in red curves (right scale). The periodconstructive comb filter (solid lines) enhances a component at f_{c}, whereas the periodsuppressive comb filter (dashed lines) cancels it. If the true fundamental frequency f_{0} of a signal is lower (indicated as f_{0,1}) or higher (indicated as f_{0,2}) than f_{c}, the phase difference between the two comb filters is +90° or −90°, respectively. This phase difference is used by the PMHLL algorithm as a control signal to adapt the fundamental frequency estimate f_{c} towards f_{0}. Note that, due to the periodicity of the transfer functions involved, the same behavior applies to all harmonics of the signal, i.e., the phase flip between +90° and −90° occurs at the same value of f_{c} for the fundamental and all its harmonics. Thus, the fundamental and all its harmonics contribute to the control signal in the same way. 

In the text 
Figure 3 Simulation I: Complex tone with a fixed fundamental frequency of 98.5 Hz (1st half) and 101.0 Hz (2nd half), SNR is 21.5 dB, 7.5 dB and 1.5 dB (left, center and right panel). Each panel shows the PMHLL oscillator frequency (i.e., fundamental frequency estimate) f_{c} referenced to 96 Hz (red curve, right scale), the HNR (blue curve, left scale) and the strobe points (black bars), as a function of time. The black line shows the signal’s fundamental frequency trajectory. Data show quick convergence and small estimation variance even at the lowest SNR. The HNR depends on SNR and quickly goes low during loop adaptation, thus indicating the unlocked state. 

In the text 
Figure 4 Complex tone with all harmonics mistuned by +6 Hz (left panel) and −6 Hz (center panel) at a SNR of 7.5 dB (Simulation II; same as center panel of Figure 3, but with mistuning). Data presentation is the same as in Figure 3. Mistuning does not affect PMHLL functionality, but the fundamental frequency estimate is biased on average by +1.21 Hz and −1.39 Hz, respectively. Right panel: Data for a complex tone with missing fundamental (simulation III) shows a similar pattern as the corresponding case of a complex tone with the fundamental present (center panel of Fig. 3). The HNR drops by about 3.4 dB compared to the corresponding case, because the three lowest of in total five harmonics were discarded from the signal. 

In the text 
Figure 5 Complex tone with a linear sweep of the fundamental frequency from 96 Hz to 103 Hz (simulation IV). SNR decreases from left to right panel, data presentation is as in Figure 3. Data clearly show quick convergence and an ongoing locking with a delay that is shorter than one period, even at the lowest SNR of 1.4 dB (right panel). Note that the total duration of the signal is only 100 ms. 

In the text 
Figure 6 IRN with 5, 3 and 1 iterations (left to right panels, simulation V). Data presentation is as in Figure 3. The estimate converges quickly to the target value of 98 Hz (+2 Hz on the relative frequency scale). The HNR decreases with less iterations and shows some variance, but the fundamental frequency estimate is still quite accurate. 

In the text 
Figure 7 3tone major chord signal (simulation VI). The upper panel shows the oscillator frequency (i.e., fundamental frequency estimate) f_{c} of three PMHLL instances as a function of time. Each instance targeted one of the tones by selecting the instance’s initial f_{c} to be a semitone above the target’s f_{0}, i.e., f_{0} lies at the upper edge of the instance’s catch range. In the plot, each instance’s f_{c} time course was referenced to its target f_{0}, i.e., the target f_{0} is plotted at 0 Hz. The PMHLL instances converge within about 20 ms at an SNR of −2.2 dB for each tone. Strobe points are plotted separately for the different instances, with shorter bar lengths for higher tones. The middle panel shows the mean HNR per component, and the lower panel shows the nonnormalized intensity spectrum of the first 30 ms of the signal (coarse curve) and of the complete signal (fine curve). The tones may not be derived from the 30ms FFT data, because tonal components are not resolved. Note that the PMHLL converged in 20 ms. 

In the text 
Figure 8 Spectrogram of a vocoded voiced speech signal with timevarying trajectories of fundamental frequency, first and second formant. The left panel shows the spectrogram of the signal at an SNR of 0 dB as well as the fundamental frequency estimate derived from the noisy signal with a PMHLL instance that was started at an oscillator frequency f_{c} of 310 Hz (black curve). The right panel shows the spectrogram of the clean voice and again the PMHLL estimate, together with its integer multiples 2–9. The PMHLL instance catches the fundamental frequency trajectory at about 100 ms into the signal and keeps track throughout the duration of the signal. Note that the spectrogram of the noisy signal does not show the fundamental, and most of the time only the second harmonic is visible. 

In the text 
Current usage metrics show cumulative count of Article Views (fulltext article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.
Data correspond to usage on the plateform after 2015. The current usage metrics is available 4896 hours after online publication and is updated daily on week days.
Initial download of the metrics may take a while.