Issue | Acta Acust., Volume 6, 2022, Topical Issue - Auditory models: from binaural processing to multimodal cognition
---|---
Article Number | 34
Number of page(s) | 23
DOI | https://doi.org/10.1051/aacus/2022020
Published online | 12 August 2022
Scientific Article
Hybrid multi-harmonic model for the prediction of interaural time differences in individual behind-the-ear hearing-aid-related transfer functions
1 Institute for Hearing Technology and Acoustics, RWTH Aachen University, 52074 Aachen, Germany
2 Acoustics Research Institute, Austrian Academy of Sciences, 1040 Vienna, Austria
* Corresponding author: florian.pausch@akustik.rwth-aachen.de
Received: 5 May 2021
Accepted: 2 June 2022
Spatial sound perception in aided listeners partly relies on hearing-aid-related transfer functions (HARTFs), describing the directional acoustic paths between a sound source and the hearing-aid (HA) microphones. Compared to head-related transfer functions (HRTFs), the HARTFs of behind-the-ear HAs exhibit substantial differences in spectro-temporal characteristics and binaural cues such as interaural time differences (ITDs). Since assumptions on antipodal microphone placement on the equator of a three-concentric sphere are violated in such datasets, predicting the ITDs via Kuhn’s simple analytic harmonic model entails excessive errors. Although angular ear-canal offsets have been addressed in an extended Woodworth model, the prediction errors remain large if the frequency range does not comply with the model specifications. Tuned to the previously inaccurately modelled frequency range between 500 Hz and 1.5 kHz, we propose a hybrid multi-harmonic model to predict the ITDs in HRTFs and HARTFs for arbitrary directions in the horizontal plane with superior accuracy. The target model coefficients are derived from individual directional measurements of 30 adults, wearing two dual-microphone behind-the-ear HAs and two in-ear microphones. Model individualisation is facilitated by the availability of polynomial weights that are applied to subsets of individual anthropometric and HA features to estimate the target model coefficients. The model is published as part of the Auditory Modeling Toolbox (AMT, pausch2022) and supplemented with the individual features and directional datasets.
Key words: ITD model / Hearing aids / Adult anthropometrics / Binaural technology / Virtual acoustics
© The Author(s), published by EDP Sciences, 2022
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1 Introduction
Implementation of conventional binaural synthesis requires knowledge of the sound field, generated by a free-field sound source, at the ear-canal entrance [1]. The spatial filtering of the sound field by the individual anatomy of a listener is described by HRTFs [2]. When wearing HAs, the sound field is captured by the microphones of an HA, with the corresponding spatial filters referred to as HARTFs or by similar terms [3–6]. It is well understood from systematic investigations that the HA-device style and the microphone positions have considerable effects on HARTFs and directional hearing [7–9]. In part, a reduced localisation performance in the horizontal plane when using behind-the-ear receiver-in-ear HAs can be attributed to modified binaural cues, and limited access to ITDs owing to the reduced bandwidth of the HA receivers [10].
Previous work demonstrated that HRTFs can be decomposed into a minimum-phase and an excess-phase component, the latter consisting of a linear-phase and an allpass term [11, 12]. Although some authors suggested that the allpass term can be neglected [13, 14], there is perceptual evidence that this simplification may be audible, while perceptually indistinguishable results can be achieved when replacing the allpass term by its interaural group-delay difference evaluated at 0 Hz [15]. Applying the simple harmonic analytic models by Woodworth [16] and Kuhn [17] to HRTF datasets allows the ITDs in the horizontal plane to be predicted, representing the linear-phase term when considered as mere time delays. In such models, the head is approximated by an ideal three-concentric sphere, whose radius can be estimated by a weighted linear combination of anthropometric features [18, 19], with featureless ears arranged antipodally on the equator [16, 17]. Alternatively, the head geometry can be approximated by an ellipsoid [20, 21]. Numerical models [22–25] additionally allow refined pinna geometries, head asymmetries, and off-centre ear canals to be included. The latter aspect has been partially addressed in an extended variant of Woodworth’s model, considering horizontal ear-canal angles larger than 90 deg [26], which also inspired a more recent geometric head model [27]. While the application of the two mentioned simple harmonic analytic models will lead to large errors, such refined models are likely better suited for estimating the ITDs in behind-the-ear HARTFs, since the HA microphone positions even further violate the assumptions of antipodally arranged reception points at the lateral head centres.
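For orientation, the two simple spherical-head models mentioned above can be sketched in a few lines. The sketch uses the commonly cited textbook forms, namely Woodworth's ray-tracing formula (a/c)(θ + sin θ) and Kuhn's low-frequency limit 3(a/c) sin θ, which are not necessarily the exact variants evaluated in this article. The head radius of 8.75 cm, the speed of sound of 343 m/s, and all function names are illustrative assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value
HEAD_RADIUS = 0.0875    # m, assumed generic head radius


def _lateral_angle(azimuth_deg):
    """Fold an azimuth in [0, 180] deg onto [0, 90] deg (front-back symmetry)."""
    az = azimuth_deg if azimuth_deg <= 90.0 else 180.0 - azimuth_deg
    return math.radians(az)


def itd_woodworth(azimuth_deg, radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Textbook Woodworth ITD in seconds for antipodal ears on a rigid sphere."""
    theta = _lateral_angle(azimuth_deg)
    return radius / c * (theta + math.sin(theta))


def itd_kuhn_low_freq(azimuth_deg, radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Textbook low-frequency limit of Kuhn's model in seconds."""
    theta = _lateral_angle(azimuth_deg)
    return 3.0 * radius / c * math.sin(theta)


# ITD curves on a 2.5 deg azimuth grid covering one hemisphere
azimuths = [2.5 * i for i in range(73)]
woodworth_curve = [itd_woodworth(az) for az in azimuths]
kuhn_curve = [itd_kuhn_low_freq(az) for az in azimuths]
```

Both curves peak at exactly 90 deg azimuth and are front-back symmetric, which is the antipodal-ear assumption that behind-the-ear microphone positions violate.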
The development of ITD models typically relies on a reasonably large pool of acoustic measurement data and a collection of individual features [18, 21]. However, publicly available individual HARTF databases are rather sparse, cf. first to third columns of Table 1. Such databases provide a valuable basis for comparisons across HA devices, to highlight specific features and acoustic characteristics, and their variations between participants [28]. Further benefits extend to the development of acoustically transparent HAs [5] and individualised HA algorithms [29]. Individualisation of generic receiver directivities using individual anthropometrics will likely be more effective for HA users when based on ITD models derived from HARTF instead of HRTF datasets [30]. In binaural HAs, it is crucial to preserve the ITDs, which becomes a challenging task in presence of unwanted side effects of HA algorithms [31, 32]. In this context, the application of ITD models could serve as individual ITD reference data to continuously improve adaptive HA algorithms [33] and increase their robustness against physical imperfections, such as gain, position and phase errors in the HA microphone arrays [34], or detrimental acoustic conditions inflicted by noise sources or reverberation [35, 36].
Table 1 Comparison of previously measured individual HARTF databases with the presented database.
Limiting factors of existing databases include low spatial resolution and selective angular coverage. This restricts their application to objective evaluations and perceptual experiments evaluating metrics for a discrete set of directions or a combination thereof with a static listener. The development of direction-continuous ITD models, however, requires spatially dense measurement data [24]. Similar demands apply to dynamic binaural real-time reproduction of virtual acoustic environments, where the listener’s unrestricted head movements are captured by a motion tracking system. Consequently, the auralisation system needs to continuously update the spatial transfer functions to maintain the position of a virtual sound source for a plausible listening experience [6, 37]. In such dynamic scenarios, the spatial resolution ideally lies below the minimum audible angle [38, 39]. If the binaural signals include simulated room acoustics, a less dense grid of 2 deg × 2 deg is perceptually sufficient for auralising the direction-dependent reflection patterns at the listener position [40, 41].
In order to approach the target criteria of the aforementioned application areas, we specifically created a database of individual directional transfer functions. The transfer functions were measured at the front and rear microphones of two behind-the-ear HA prototypes, while attached to adult participants, and at in-ear microphones at the blocked ear-canal entrances (IHTA-indHARTF [42]), cf. last column of Table 1. The measurements cover directions that are distributed on an equiangular 2.5 deg × 2.5 deg grid in azimuth and elevation, including elevation angles down to −70 deg. Based on the acoustic data obtained for equatorial directions, a hybrid multi-harmonic model (AMT: pausch2022 [43]) to predict the ITDs in the horizontal plane is proposed for each type of dataset, that is, for HRTFs, and front and rear HARTFs. The model is tuned to a frequency range between 500 Hz and 1.5 kHz, accounting for the range of highest sensitivity to ITDs between 700 Hz and 1.4 kHz [44], which is not covered with sufficient accuracy by Kuhn’s simple harmonic analytic model [17] and the extended Woodworth model [26]. To facilitate application to individualisation, we calculated optimal polynomial-regression weights to be applied to dataset-specific subsets of anthropometric and HA features for estimating the model coefficients. It is demonstrated to what extent the prediction performance of Kuhn’s analytic model is affected by a violation of the specified microphone placement and the frequency range, and how well the former can be compensated by the extended Woodworth model. All three models are evaluated in terms of prediction errors, and estimation accuracy of maximum ITD values and arguments of the ITD maxima.
The remainder of this article is structured as follows: The first part (Sect. 2) focuses on the creation of the acoustic database, including information on the study participants (Sect. 2.1), as well as the measurement apparatus (Sect. 2.2) and procedure (Sect. 2.3). After introducing the ITD-estimation method, the ITD characteristics of the measured acoustic datasets are analysed across participants, also addressing measurement reliability (Sect. 2.4). The procedure and repeatability of the manual feature-extraction approach are described and complemented by a correlation analysis (Sect. 2.5). Building upon the acoustic database and specific feature subsets, we present the rationale and design steps that led to the hybrid multi-harmonic model in the second part (Sect. 3). The results on model evaluation (Sect. 4) include assessment of structural model complexity (Sect. 4.1), comparative broadband and frequency-selective performance analyses (Sect. 4.2), and model-robustness analysis (Sect. 4.3). Finally, the main findings are discussed in consideration of perceptual implications, possible future evaluations, and limitations (Sect. 5), ending with conclusions (Sect. 6).
2 Database creation
2.1 Participants
We recruited N = 30 participants (8 female, 22 male) aged 28.9 ± 6.1 yrs (μ ± σ, range: 22–43 yrs). Prior to the experiment, all participants were informed of their rights, the study purpose and the contents of the database. They subsequently provided consent to publish the collected data in a pseudonymous way. The personal data and experimental results were processed and archived in accordance with country-specific and European data-protection regulations [45].
2.2 Apparatus
All acoustic measurements were performed in a hemi-anechoic chamber with the dimensions 11 m × 5.97 m × 4.5 m (length × width × height) and a lower frequency limit of about 100 Hz [48].
The loudspeaker array consisted of 64 1-inch full-range loudspeakers (W1-2025SA, Tang Band, Taipei, Taiwan) which were built into identically designed closed array segments, resulting in a frequency range of about 500 Hz to 18 kHz [49]. Covering elevation angles between 88.75 deg and −70 deg at a radius of 1.2 m, the loudspeakers were arranged in steps of 2.52 deg. A remote-controlled motor allowed the loudspeaker array to be rotated around the participant for continuous HRTF measurements [50].
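As a quick plausibility check of the stated array geometry, the elevation step follows from the number of loudspeakers and the covered elevation range. The sketch below assumes that the 64 loudspeakers are spaced uniformly and that both end angles are occupied, which the text implies but does not state explicitly. The variable names are illustrative.

```python
N_LOUDSPEAKERS = 64
ELEVATION_TOP_DEG = 88.75
ELEVATION_BOTTOM_DEG = -70.0

# Uniform spacing with a loudspeaker at both ends gives N - 1 intervals
step_deg = (ELEVATION_TOP_DEG - ELEVATION_BOTTOM_DEG) / (N_LOUDSPEAKERS - 1)

# Nominal loudspeaker elevations from top to bottom
elevations_deg = [ELEVATION_TOP_DEG - i * step_deg for i in range(N_LOUDSPEAKERS)]

print(f"elevation step: {step_deg:.2f} deg")
```

Under these assumptions the step evaluates to 158.75 deg / 63, which rounds to the 2.52 deg given in the text.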
For the HARTF measurements, we used a pair of custom-made behind-the-ear receiver-in-ear HAs (GN ReSound, Ballerup, Denmark) without digital signal processor and with full access to the raw microphone signals [6]. Each device contained two omnidirectional micro-electro-mechanical-systems microphones (Knowles, Itasca, IL, USA), see Figure 1. The microphone signals were transmitted via slim cables with a diameter of 1.4 mm (Hi-Pro cable, Sonion, Roskilde, Denmark). In the current measurement, the receiver cables without an ear piece attached were used to position the devices behind the participant’s ear with negligible acoustic influence.
Figure 1 Detail of the technical drawing showing the HA prototype, with front and rear HA microphones coloured in blue. (Original drawing released with permission of GN ReSound and adapted.) |
Conventional HRTFs were measured simultaneously for comparison purposes using in-ear microphones (KE3, Sennheiser, Wedemark, Germany), which were installed in power domes and placed at the ear-canal entrances. All cables were taped to the sides of the neck and to the participant’s clothing to ensure that both the domes and research HAs remain in place throughout the measurement, see Figure 4.
Based on the method of multiple exponential sweeps [51, 52], the measurement signal was generated between 20 Hz and 20 kHz at a sampling rate of 44.1 kHz with an individual sweep length of 2^15 samples using MATLAB™ (The MathWorks, Inc., Natick, MA, USA) and the ITA-Toolbox [53]. The measurement signal was transmitted via MADI protocol (RME HDSPe MADI, Audio AG, Haimhausen, Germany), DA-converted and amplified by two 32-channel power amplifiers with MADI interface (MA 32 D, KS Audio Research GmbH, Hettenleidelheim, Germany), and played back via the array loudspeakers. The captured signals were AD-converted (RME Octamic II, Audio AG, Haimhausen, Germany) and sent to the audio interface (RME Hammerfall DSP Multiface II, Audio AG, Haimhausen, Germany) via ADAT protocol for further processing in MATLAB™ [53].
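To illustrate the excitation signal, a single exponential sweep with the stated band limits and sampling rate can be generated as follows. This is a minimal sketch using the standard exponential-sweep phase formula and assuming a sweep length of 2^15 = 32768 samples. It does not reproduce the interleaving and overlapping of sweeps across loudspeakers that the multiple-exponential-sweep method adds, nor the ITA-Toolbox implementation used by the authors. The function name is illustrative.

```python
import math


def exponential_sweep(f_start, f_stop, n_samples, fs):
    """Return one exponential sine sweep as a list of samples in [-1, 1]."""
    duration = n_samples / fs
    log_ratio = math.log(f_stop / f_start)
    # Phase scaling so that the instantaneous frequency starts at f_start
    k = 2.0 * math.pi * f_start * duration / log_ratio
    return [
        math.sin(k * (math.exp(log_ratio * n / n_samples) - 1.0))
        for n in range(n_samples)
    ]


FS = 44100          # Hz, from the text
N_SAMPLES = 2**15   # assumed sweep length in samples

sweep = exponential_sweep(20.0, 20000.0, N_SAMPLES, FS)
sweep_duration_s = len(sweep) / FS
```

With these values a single sweep lasts about 0.74 s.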
Four infrared cameras (Flex 13, NaturalPoint Inc., Corvallis, Oregon, USA) were installed at the ceiling to capture the poses, i.e. the positions and orientations, of the rigid bodies attached to the participant and the loudspeaker array during the acoustic measurement. Synchronised recording and post-processing of the movement trajectories was facilitated using the software Motive (NaturalPoint Inc., Corvallis, Oregon, USA) and a dedicated MATLAB™ interface [53]. A 20-inch monitor (SyncMaster 205BW, Samsung, Seoul, South Korea) at 2.3 m distance in viewing direction provided real-time feedback on the participant’s pose. At the bottom of the displayed graphical user interface, see Figure 2, six boxes showed the errors by degrees of freedom relative to the target pose, with colour coding and numerical values in units of cm and deg for position and orientation, respectively. To visualise the current pose, a dynamic black cross-hair was displayed in the centre of the screen and contrasted with a static green cross-hair. Additional arrows indicated the suggested corrective motions for alignment [49]. A tolerance range of ±1 cm and ±0.8 deg for position and orientation offset, respectively, defined the optimal pose range. When drifting beyond these thresholds, the smileys changed, while both frames and text turned yellow, and red for pose errors larger than 3.5 cm and 1.5 deg. In comparison, the feedback system used by Denk et al. [54] prescribed similar thresholds, with slightly stricter limits of ±0.6 deg for the yaw angle. More lenient thresholds were used in the orientation-only feedback system by Begault et al. [55], with lamps indicating the current orientation, discretised in steps of 1 deg.
Figure 2 Graphical user interface providing real-time feedback to assist in correcting misalignment with the participant’s target pose. |
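The threshold logic of the pose feedback can be summarised in a small sketch. It assumes that each of the six degrees of freedom is classified independently against the stated limits and that an in-tolerance state is shown in green. Both are assumptions for illustration, as are the function and dictionary names.

```python
# Limits taken from the text: optimal range and red threshold per error type
TOLERANCE = {"position_cm": 1.0, "orientation_deg": 0.8}
RED_LIMIT = {"position_cm": 3.5, "orientation_deg": 1.5}


def feedback_colour(error, kind):
    """Classify one pose-error component as 'green', 'yellow' or 'red'.

    error: signed error relative to the target pose
    kind:  'position_cm' or 'orientation_deg'
    """
    magnitude = abs(error)
    if magnitude <= TOLERANCE[kind]:
        return "green"   # assumed colour for the optimal pose range
    if magnitude <= RED_LIMIT[kind]:
        return "yellow"
    return "red"


# Example: a lateral offset of 1.1 cm and a yaw error of 0.1 deg
example = (feedback_colour(1.1, "position_cm"), feedback_colour(0.1, "orientation_deg"))
```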
Figure 3 shows the alignment errors averaged over time and participants, split into position and orientation components. While the black boxplots represent absolute median errors with respect to the target pose, the grey boxplots depict relative pose errors referenced to the individual mean pose. In addition to the intrinsic orientation-error components, the error with respect to the great-circle central angle shows the absolute angular error between ideal and actual loudspeaker directions given a misoriented participant. In the following, median errors with interquartile ranges (IQRs) are reported.
Figure 3 Absolute (black) and relative (grey) pose errors with respect to the target pose and mean participant pose, respectively. Boxplots represent median (Mdn) values (horizontal lines) and interquartile ranges (IQRs), with whiskers covering 1.5 times the IQR. |
The relative errors are predominantly within the pre-defined tolerance ranges. This holds true for the relative orientation error, with a roll error of −0.1 deg (IQR: [−0.4, 0.3] deg), a pitch error of 0.1 deg (IQR: [−0.4, 0.6] deg), and a yaw error of 0.1 deg (IQR: [−0.3, 0.5] deg). The relative position errors amounted to offsets of 0 cm (IQR: [−0.3, 0.3] cm) in x-direction, 0 cm (IQR: [0, 0] cm) in y-direction, and 0 cm (IQR: [−0.5, 0.5] cm) in z-direction. Thereby, the feedback system effectively helped avoid drifting off the target pose over time, as typically observed in measurements without feedback [49], even if a fixed viewpoint is provided [56]. A larger absolute angular error of 1.5 deg (IQR: [0.9, 2.2] deg) is primarily caused by a lateral-position offset of 1.1 cm (IQR: [−0.6, 3.5] cm) in x-direction, and a front-back-position offset of 0.2 cm (IQR: [−0.7, 1.2] cm) in z-direction. Smaller absolute errors of approximately 0.3 deg within a range of ±0.5 deg are achievable when measuring seated participants with an additional headrest [54]. However, no interfering reflections from the thighs and a headrest are contained in the data using the presented measurement routine, and misalignment is compensated during post-processing, cf. Section 2.3.2.
2.3 Acoustic measurements
2.3.1 Procedure
Free-field reference measurements were performed in absence of the participants with the in-ear microphones without attached power domes positioned in the centre of the loudspeaker array. Similarly, the research HAs were oriented at a typical carrying angle and mounted in a way that the centre of the HA microphones and their connection axis corresponded with the participant’s target pose, i.e. the array centre with orientation in nominal viewing direction.
As part of the preparations of the individual receiver-directivity measurements, the interaural and longitudinal axes of the standing participants were adjusted using two cross-line lasers to coincide with the centre of the loudspeaker array, see Figure 4. To make the compensatory movements suggested by the feedback system more intuitive, the offset of the head-mounted rigid body to the centre of the interaural axis was corrected individually [49]. In a training session, the participants were familiarised with the graphical user interface and their task to realign with the presented target pose. Individual receiver-directivity measurements were subsequently conducted based on the method proposed by Richter and Fels [50], favouring minimal participant movement by rotating the loudspeaker array around the ideally static participant. Before the actual start of the measurement, the participants had 20 s to align themselves with the target pose. With a pre-defined angular speed of 1.3 deg/s after completed acceleration, a full rotation required 4.5 min, resulting in an azimuthal resolution of approximately 2.49 deg.
Figure 4 Placement and fixation of the in-ear microphones, the research HAs and the head-mounted rigid body. Two cross-line lasers were simultaneously used for optical validation of the participant’s target pose. |
2.3.2 Post-processing
To eliminate unwanted reflections in the impulse responses after deconvolution [51, 52], an adaptive time window was applied on each of the raw binaural datasets per participant. After direction-dependent determination of the latest impulse-response start instances for all microphone signals, a right-sided Hann window (1 ms fade-out duration, starting 2 ms after the latest start instance) was used. 5.8 ms before the end of the fade-out, we applied a left-sided Hann window with 1-ms fade-in duration. We used the same time window on both the raw binaural impulse responses and the reference measurements. Subsequent cropping resulted in a final impulse-response length of 256 samples at a sampling rate of 44.1 kHz. The windowed spatial transfer functions were divided by the corresponding reference spectra, applying spectral regularisation [57] and band limitation between 100 Hz and 20 kHz. Due to the lack of energy below the lower cut-off frequency of the loudspeakers, the magnitude spectra were extrapolated towards 0 dB at 0 Hz for frequencies below 500 Hz for physical reasons [2]. Causality of the resulting impulse responses was restored by setting a pre-delay of 0.5 ms in all spatial transfer functions, while preserving binaural cues.
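The time window described above can be sketched as follows. The sketch reads the description as a 1 ms Hann fade-in that begins 5.8 ms before the end of the fade-out, a flat section, and a 1 ms Hann fade-out that begins 2 ms after the latest impulse-response start. This reading, the function name, and the example start time are assumptions.

```python
import math


def adaptive_window(start_s, n_samples, fs=44100):
    """Two-sided Hann window placed relative to the latest start instance."""
    fade_s = 1e-3                              # fade-in and fade-out duration
    fade_out_begin = start_s + 2e-3
    fade_out_end = fade_out_begin + fade_s
    fade_in_begin = fade_out_end - 5.8e-3      # assumed reading of the text
    fade_in_end = fade_in_begin + fade_s
    window = []
    for n in range(n_samples):
        t = n / fs
        if t < fade_in_begin or t > fade_out_end:
            w = 0.0
        elif t < fade_in_end:
            # Rising half of a Hann window
            w = 0.5 * (1.0 - math.cos(math.pi * (t - fade_in_begin) / fade_s))
        elif t <= fade_out_begin:
            w = 1.0
        else:
            # Falling half of a Hann window
            w = 0.5 * (1.0 + math.cos(math.pi * (t - fade_out_begin) / fade_s))
        window.append(w)
    return window


# Example with a hypothetical latest start instance at 5 ms
window = adaptive_window(start_s=5e-3, n_samples=512)
n_nonzero = sum(1 for w in window if w > 0.0)
```

At 44.1 kHz, the 5.8 ms window span corresponds to roughly 256 samples, which matches the final impulse-response length after cropping.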
Further necessary post-processing steps [49] included a directional mapping of the spatial transfer functions via rotation matrices, accounting for time-dependent loudspeaker positions, and time-dependent participant poses relative to the nominal array centre and the loudspeakers. The second step in optimising the directional mapping involved an azimuthal rotation of the global coordinate system so that the individual ITDs, estimated as per the method described in Section 2.4 for all directions in the horizontal plane, crossed zero at approximately 0 deg and 180 deg azimuth [21]. This was necessary to compensate for a remaining azimuth offset of 0.4 ± 3.6 deg (μ ± σ), averaged across participants, between the coordinate system of the optical tracking system, and the coordinate-system reference set by the mechanical switch of the motor controlling the loudspeaker array. Finally, the estimated directions of incidence, distributed on a frequency-dependent non-uniform grid, were transformed to a frequency-independent equiangular grid with a resolution of 2.5 deg × 2.5 deg in azimuth and elevation angles, allowing easier data handling. For spatial interpolation, we estimated the complex spectra of the spatial transfer functions for directions of incidence within the convex hull of input points using a natural-neighbour interpolation of the real and imaginary parts separately [58].
Due to issues with local electrical crosstalk in the loudspeaker array, the measurement results at elevation angles of 75 deg and 10 deg had to be excluded prior to spatial interpolation during post-processing. Since it was not possible to entirely restore fine spatial details in the eliminated directions, the introduced errors were examined by comparing the spectra at ϑ = 45° to their spatially interpolated counterparts, revealing a loss of predominantly high-frequency components. However, applying an adapted version of the inter-subject spectral difference [59] revealed a smaller error between the approximated and the original spectra, compared to the error between the original spectra and the spectra of neighbouring points at ϑ = 47.5°.
2.4 Interaural time differences
2.4.1 Estimation method
We define the ITD as the difference in times of arrival of a sound wave between the right and left ear, resulting in positive values for 0° ≤ φ ≤ 180°. The ITD characteristics of the HARTF datasets and their differences to HRTFs were analysed for directions in the horizontal plane, accounting for variations across participants. To avoid ITD misestimations due to noise, the post-processed measurements were band-limited using a low-pass filter with a cut-off frequency of 1.6 kHz, implemented as a 10th-order digital Butterworth low-pass filter. Direction-dependent ITDs were subsequently evaluated per participant and dataset in a frequency range between the lower loudspeaker cut-off frequency of 500 Hz, and an upper frequency limit of 1.5 kHz [15]. For reasons of increased robustness in the intended modelling-frequency range, we decided to evaluate the maximum of the interaural cross-correlation of energy envelopes (MaxIACCe) instead of applying threshold-based methods, accepting slight sacrifices to perceptual relevance [60]. The resulting ITD-curve progressions allowed trends in the overall shape, the maximum ITD values, max{ITD}, and the arguments of the ITD maxima to be derived, and were used for modelling purposes, see Section 3.
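The core of a cross-correlation-based ITD estimate can be sketched as below. This is a simplified illustration rather than the authors' MaxIACCe implementation. It omits the band limitation, approximates the energy envelope by the squared impulse-response samples, and skips the normalisation of the cross-correlation, which does not change the location of its maximum. The function names and the lag limit are assumptions.

```python
def energy_envelope(h):
    """Crude energy envelope: squared samples (assumption for illustration)."""
    return [x * x for x in h]


def itd_from_envelopes(h_left, h_right, fs, max_lag):
    """Return the ITD in seconds as t_right - t_left.

    The ITD is the lag (in samples) that maximises the cross-correlation of
    the left and right energy envelopes, searched within +/- max_lag samples.
    """
    e_left = energy_envelope(h_left)
    e_right = energy_envelope(h_right)
    n = len(e_left)
    best_lag, best_value = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                acc += e_left[i] * e_right[j]
        if acc > best_value:
            best_lag, best_value = lag, acc
    return best_lag / fs


# Synthetic example: the right-ear response arrives 30 samples after the left
FS = 44100
h_left = [0.0] * 64
h_right = [0.0] * 64
h_left[10] = 1.0
h_right[40] = 0.8
itd_s = itd_from_envelopes(h_left, h_right, FS, max_lag=44)
```

A positive result means the sound reaches the left ear first, in line with the sign convention defined above.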
2.4.2 Measurement-based interaural time differences
In Figure 5, the mean measurement-based ITD estimations with standard deviations, averaged across participants, are shown per type of dataset for source directions in the horizontal plane.
Figure 5 Comparison of measurement-based mean ITDs between datasets in the horizontal plane, evaluated in a frequency range of 0.5–1.5 kHz and averaged across participants with standard deviations σ (grey areas). Dashed and dotted lines show the mean ITD differences in front (F) and rear (R) HARTFs, respectively, compared to the mean ITDs in HRTFs. |
A considerable variation in curve progressions and ITD maxima was observed across participants, on average amounting to 724.6 ± 24.6 μs (μ ± σ, range: 684.8–780.1 μs), 691.3 ± 19.8 μs (μ ± σ, range: 653.1–734.7 μs), and 674.1 ± 20.7 μs (μ ± σ, range: 630.4–721.1 μs) in HRTF, and front and rear HARTF datasets, respectively. The average maximum ITD values in HRTF datasets were found to lie between the ones reported for the CIPIC database (μ ± σ = 646 ± 33 μs) [19] and the KEMAR dataset (≈790 μs) [61]. The general decrease of ITD maxima in HARTF datasets can be explained by the resulting shorter path lengths due to the HA-microphone positions further off the head centre and increased horizontal HA-microphones angles ΘHA compared to the smaller horizontal ear angles ΘE, see Figure 6 and Table 2.
Figure 6 Schematic side and top views of the right ear and the head, respectively, showing ear-related and HA features, and the HA device in light blue with blue dots representing the HA microphone positions. The auxiliary parameter ζ is defined in equation (A2). Figure adapted from [19]. |
Statistical summary of the results on measured and calculated anthropometric and HA features.
Reproducibility of selected anthropometric and HA features from participant 18 using manual feature extraction. The inter- and intra-individual results are based on feature extractions performed by six different participants, and three times by one of the six participants, respectively.
In HRTFs, the mean argument of the ITD maxima was estimated at 86.1 ± 2.6 deg (μ ± σ, range: 80–90 deg). For the front and rear HARTF datasets, angularly shifted results of 81.3 ± 2.8 deg (μ ± σ, range: 77.5–87.5 deg) and 79.3 ± 2.3 deg (μ ± σ, range: 75–85 deg), respectively, can be reported. The deviations in the arguments of the ITD maxima can be largely attributed to the deviating horizontal ear-canal and HA-microphones angles, ΘE and ΘHA, respectively.
Further analyses of ITD characteristics are provided in Section 4.2.
2.4.3 Reproducibility of measurement results
To examine the impact of slight variations in HA positioning on measured ITDs and overall reproducibility of the measurement results, one participant was measured acoustically four times (measurements I–IV), after repositioning the HA devices between runs. While the ear canals were blocked by the in-ear microphones used for the HRTF measurements in measurements I–III, measurement IV was conducted without in-ear microphones to investigate any potential effects of the closed ear canal on HARTFs, compared to an open one.
We generally obtained very similar ITD-curve progressions for directions in the horizontal plane. The mean deviations of 0.3 ± 6.5 μs (μ ± σ), 1.6 ± 7.7 μs (μ ± σ), and 1.2 ± 6.0 μs (μ ± σ) for HRTFs, front and rear HARTFs, respectively, are below the just-noticeable differences (JNDs) reported in Section 5.2.1, with a maximum error of ±31.7 μs. The different measurements further entailed consistent progressions in magnitude spectra, with spectral differences [62] also being for the most part below reported JND values [50]. Slight narrowband overshoots between 5 kHz and 7 kHz were only observed for HARTFs, and for frequencies below 1.5 kHz in all datasets. Most likely, the reason for the spectral similarity lies in the fact that the HA devices are placed behind the ears, with the HA microphones being shadowed by the pinna and therefore not being susceptible to wave interaction effects. Measurement IV did not reveal any prominent deviations, either in the resulting ITDs or in the magnitude spectra for frequencies above about 4 kHz [63], where monaural cues are to be expected, ruling out an effect of the ear-canal blockage.
2.5 Feature acquisition
2.5.1 Methods and definitions
For the extraction of individual features related to the body, head, pinna and HA placement, frontal and profile photos of each participant were taken. For this purpose, we mounted a ruler on a stand, next to which the participants stood. To minimise errors due to mismatched optical planes, the longitudinal axis of the participant was aligned with the ruler and the camera’s nominal line of sight using frontal and lateral cross-line lasers. The centre of the HA-microphone positions was additionally marked with white tape at the respective ear to be visible in the profile photos. The camera (X-Pro2 with XF35mmF2 R WR FUJINON, Fujifilm, Minato, Tokyo, Japan) with a resolution of 6000 × 4000 pixels was placed at a distance of about 3 m to capture the frontal view, left and right profile, and the seating position. We also captured detail views of each of the participant’s ears at about 50 cm distance, including a ruler for scaling. Inherent lens distortions and perspective distortions were minimised using image editing software (Capture One 21 Fujifilm v14, Phase One, Copenhagen, Denmark).
Most features were extracted manually [64], comprising available definitions [19] and new ones that specify the individual placement of the HA devices, see Figure 6 and Table 2. Additional features were calculated using the geometrical formulas provided in Appendix A. The final set can be subdivided into head- and body-related features (x1 to x17), ear-related features (d1 to d8, Θ1, Θ2, and ΘE), and HA-related features (d9 to d14, Θ3, and ΘHA). Two different feature subsets were defined for the specific ITD-model implementation, see Section 3.4, to predict individual ITDs in HRTF and HARTF datasets.
2.5.2 Correlation analysis
Table 2 lists all measured and calculated features with means μ, standard deviations σ and percentage deviations χ. The results for most features, averaged across ears, are in the range (±10%) of those reported by Algazi et al. [19], except for larger deviations in neck height x7 (−47.6%), neck depth x8 (18.3%), torso-top width x10 (30.2%), seated height x15 (34.2%), cymba-conchae height d2 (13.6%, averaged across ears), fossa height d4 (11.4%, averaged across ears), cavum-conchae depth d8 (91.5%, averaged across ears), and pinna-rotation angle Θ1 (57.9%, averaged across ears), ignoring the non-meaningful deviations in pinna offset down x4, pinna offset back x5 and head offset forward x13. Possible reasons for the observed deviations include measurement uncertainties, and differences in sample-inherent characteristics such as age (CIPIC: 25.4 ± 8.6 yrs, range: 18–63 yrs), weight (CIPIC: 70.9 ± 14.8 kg, range: 39.5–117.9 kg; not available in the current database) and gender distribution (CIPIC: 16 f, 27 m). Exclusion of the aforementioned non-meaningful features (as well as HA-microphones offset up d13 and back d14) resulted in average percentage deviations of 20.1 ± 15% (μ ± σ) regarding head- and body-related features, and 36.9 ± 21.2% (μ ± σ) and 34.9 ± 21.8% (μ ± σ) regarding ear and HA-related features of the left and right ear, respectively.
To identify interrelationships between features, averaged across ears, Figure 7 visualises the upper correlation matrix in hierarchical clustering order [65], only showing bivariate correlations that were significant at the 95% confidence level. No significant correlations were found for intertragal-incisure width d7 and HA-microphones-to-scalp offset d10, whereas torso-top height x10 only resulted in an inconclusive significant correlation with x3 (ρ = .43), and was therefore removed. The most relevant of the top forty correlations are summarised below, largely in descending order of magnitudes in bivariate correlation coefficients ρ.
Figure 7 Visualisation of the upper correlation matrix between the features in Table 2, showing bivariate correlations that are significant at the 95% confidence level.
Starting with ear-related and HA-related features, some of the strongest but also most obvious correlations were found between pinna offset back x5 and horizontal ear-canal angle ΘE (ρ ≈ 1) as well as horizontal HA-microphones angle ΘHA (ρ = .9), and between pinna-rotation angle Θ1 and frontal HA-microphones angle Θ3 (ρ = .96). For obvious reasons, the horizontal HA-microphones angle ΘHA was strongly correlated with HA-microphones offset back d14 (ρ = .99), pinna offset back x5 (ρ = .9), and horizontal ear-canal angle ΘE (ρ = .9). So were cymba-conchae height d2 and pinna height d5 (ρ = .77), as well as fossa height d4 and pinna height d5 (ρ = .73). Moderate correlations were present between cavum-conchae width d3 and pinna width d6 (ρ = .69). We additionally report negative correlations for HA-microphones offset up d13 with pinna-rotation angle Θ1 (ρ = −.75) and frontal HA-microphones angle Θ3 (ρ = −.7). No other meaningful negative correlations were present within the considered range of correlation magnitudes. Fossa height d4 was naturally related to the HA behind-pinna depth d11, represented by a moderate correlation coefficient of ρ = .69. The HA-microphones-to-ear-canal offset d9 appeared to be strongly correlated with HA-microphones offset up d13 (ρ = .75), and moderately correlated with multiple ear dimensions, that is, pinna height d5 (ρ = .61), cymba-conchae height d2 (ρ = .61), pinna width d6 (ρ = .6), and fossa height d4 (ρ = .59). Cymba-conchae height d2 was additionally moderately correlated with fossa height d4 (ρ = .56) and pinna width d6 (ρ = .51).
Regarding head- and body-related features, strong and plausible correlations were found between torso-top width x9 and shoulder circumference x17 (ρ = .86), shoulder width x12 (ρ = .84), and head width x1 (ρ = .77). Shoulder width x12 was strongly correlated with shoulder circumference x17 (ρ = .82) and moderately correlated with torso-top depth x11 (ρ = .61) and neck width x6 (ρ = .57). Moderate correlations were also observed between head height x2 and height x14 (ρ = .68), and head depth x3 (ρ = .67), the latter also showing moderate correlations with head circumference x16 (ρ = .59) and head width x1 (ρ = .58).
2.5.3 Variability in feature extraction
To test the manual feature-acquisition approach for reproducibility, six participants extracted the feature subsets relevant for ITD modelling, cf. Section 3.4, from the photos of participant 18, one of whom performed two additional extractions. This approach allows inter- and intra-individual variations to be analysed; the results are summarised in Table 3. For head width x1, head height x2, pinna offset down x4, and calculated HA-microphones offset up d13, the intra-individual extractions entailed smaller standard and percentage deviations than the inter-individual results. For the remaining features, both variations were in a comparable range, see Table 2. The effects of these variations on the ITDs predicted by Model 3 are analysed in Section 4.3.
3 Modelling interaural time differences
This section starts with a summary of the key aspects of the analytic ITD models by Kuhn [17] and Woodworth [16, 66] (in its extended version [26]). To address their shortcomings in view of the inherent ITD characteristics of HARTFs, see Section 2.4.2, and the mismatched target frequency range, a hybrid multi-harmonic ITD model is introduced that predicts the ITDs in the horizontal plane. In an intermediate step, the target coefficients of all three models are derived by fitting the ITD curves in the horizontal plane, which were estimated from the corresponding measured directional datasets of the database presented in Section 2. Model individualisation is made possible by estimating these target model coefficients from weighted individual feature subsets. The method to obtain feasible weights, the specific model implementations using weighted individual feature subsets, and the metrics and methods used for a comparative model-performance evaluation are introduced. All modelling steps and analyses were performed using R [67], RStudio [68] and MATLAB™ [53].
3.1 Modelling rationale
Kuhn’s analytic symmetric-sphere model [17], referred to as Model 1 with corresponding values of

$$\mathrm{ITD}_{\text{Model 1}}(\varphi) = \frac{3 a_n}{c}\,\sin\varphi, \qquad (1)$$

relies on the sphere-equivalent effective head radius a_n of the nth participant (n = 1, …, N) with horizontal ear-canal angles of ΘE = 90°. Model 1 is valid to predict the ITDs in the horizontal plane in HRTF data for low frequencies f, i.e., (ka_n)² ≪ 1, with k = 2πf/c and c representing the wave number and the speed of sound, respectively. Given an average effective head radius of approximately 87.5 mm [18, 69] and c = 343 m/s, the valid range of Model 1 corresponds to frequencies much smaller than about 624 Hz.
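The frequency limit quoted above can be checked numerically. The following Python sketch assumes the standard low-frequency form of Kuhn's model, ITD = 3·a/c·sin φ, and evaluates the product ka for a given radius; the function names are illustrative and are not taken from the AMT implementation.

```python
import math

C = 343.0  # speed of sound in m/s, as used in the text


def itd_model1(phi_deg, a, c=C):
    """Low-frequency ITD in seconds, assuming ITD = 3*a/c*sin(phi)."""
    return 3.0 * a / c * math.sin(math.radians(phi_deg))


def ka(f, a, c=C):
    """Dimensionless product k*a with wave number k = 2*pi*f/c."""
    return 2.0 * math.pi * f / c * a


def frequency_at_ka_one(a, c=C):
    """Frequency in Hz at which k*a = 1 for a sphere of radius a (in m)."""
    return c / (2.0 * math.pi * a)


a_avg = 0.0875  # average effective head radius in m
print(round(frequency_at_ka_one(a_avg)))      # 624
print(round(itd_model1(90.0, a_avg) * 1e6))   # 765 (microseconds)
```

With the same radius, ka is approximately 0.8 at 500 Hz and 2.4 at 1.5 kHz, which brackets the frequency range that Model 3 targets.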
Also based on a symmetric sphere with horizontal ear-canal angles of ΘE = ±90°, the analytic model by Woodworth [16, 66], referred to as Model 2, predicts values of

$$\mathrm{ITD}_{\text{Model 2}}(\varphi) = \begin{cases} \dfrac{a_n}{c}\,(\varphi + \sin\varphi), & 0 \le \varphi \le \dfrac{\pi}{2}, \\[1ex] \dfrac{a_n}{c}\,(\pi - \varphi + \sin\varphi), & \dfrac{\pi}{2} < \varphi \le \pi. \end{cases} \qquad (2)$$

Due to the assumed left–right symmetry, the first and second angular ranges also apply to the corresponding angular sectors in the right half space relative to the median plane, using negated azimuth angles φ, and a negative π in the term describing the second angular range. Model 2 notably underestimates the ITDs in the low frequency range but becomes more suitable than Model 1 for ka_n ≫ 2.4, which corresponds to frequencies greater than 1.5 kHz, and allows optimal results for frequencies of 4 kHz and above [26].
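A minimal Python sketch of the piecewise behaviour described above, assuming the classical Woodworth formula for a symmetric sphere with ears at ±90°. The mirroring into the right half space is implemented by antisymmetry, which is equivalent to negating φ and π as described in the text; the function name is illustrative.

```python
import math


def itd_model2(phi_deg, a, c=343.0):
    """Woodworth ITD in seconds for azimuths in [-180, 180] degrees.

    Assumes a symmetric sphere of radius a (in m) and plane-wave incidence.
    """
    phi = math.radians(phi_deg)
    sign = 1.0 if phi >= 0.0 else -1.0
    p = abs(phi)
    if p <= math.pi / 2.0:
        # first angular range: 0 <= |phi| <= pi/2
        return sign * a / c * (p + math.sin(p))
    # second angular range: pi/2 < |phi| <= pi
    return sign * a / c * (math.pi - p + math.sin(p))


a = 0.0875
ratio = itd_model2(90.0, a) / (3.0 * a / 343.0)
print(round(ratio, 3))  # maximum relative to the low-frequency value 3*a/c
```

For the same radius, the maximum of this formula is (π/2 + 1)/3, i.e. about 86%, of the low-frequency maximum 3a/c, which illustrates the underestimation at low frequencies mentioned above.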
Model 1 and Model 2 were applied to estimate the low- and high-frequency ITDs, respectively, in a generic HARTF dataset, measured from behind-the-ear HAs [3]. However, the ITD curves, evaluated for the front and rear HARTF datasets, see Figure 5 in Section 2.4, differ in part considerably from the ITDs evaluated for the HRTF datasets. Since especially the horizontal HA-microphones angle deviates from 90 deg in the former dataset type, the assumption of a front-back-symmetric head is violated, leading to ITD curves that are asymmetric about the frontal plane. We therefore implemented the extended Woodworth model, cf. equations (b1)–(b5) and Figure 4 in [26], referred to as Model 2+ with corresponding values ITDModel 2+. Plane-wave incidence was assumed since the measurement radius of the applied measurement system, see Section 2.2, satisfied assumptions about the distal region [70]. Model 2+ accounts for arbitrary horizontal microphone angles, 90° ≤ ΘE ≤ 180° and 90° ≤ ΘHA ≤ 180°.⁴
Due to the lack of a model that combines the advantages of Model 1 and Model 2+, while also addressing the ITD-curve irregularities and additional deviations from a sine wave particularly in HARTF datasets, we propose a multi-harmonic model. Model 3 (AMT: pausch2022 [43]) predicts values of
with the model coefficients γqn (q = 1, …, 3; per dataset type) for the nth participant. The model is optimised for 0.8 ≤ ka_n ≤ 2.4, targeting the frequency range between 500 Hz and 1500 Hz, within which the ITDs are not predicted with sufficient accuracy by Model 1 and Model 2(+). All modelling stages described below considered the ITD results for L = 73 azimuth angles in the horizontal plane, covering 0° ≤ φ ≤ 180° in steps of 2.5 deg.
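To make the idea of a multi-harmonic ITD model concrete, the Python sketch below assumes that the model is a sine series in the azimuth, scaled by a_n/c, with one coefficient per harmonic. This functional form, the function names and the coefficient values are assumptions made for illustration only and should be checked against the published AMT implementation (pausch2022). The sketch also builds the azimuth grid of L = 73 angles.

```python
import math


def azimuth_grid(start=0.0, stop=180.0, step=2.5):
    """Azimuth angles in degrees, from start to stop inclusive."""
    n = int(round((stop - start) / step)) + 1
    return [start + i * step for i in range(n)]


def itd_multi_harmonic(phi_deg, a, gammas, c=343.0):
    """ASSUMED form: ITD = a/c * sum_q gamma_q * sin(q * phi), q = 1, 2, ...

    a is the effective head radius in m; gammas holds one hypothetical
    coefficient per harmonic.
    """
    phi = math.radians(phi_deg)
    series = sum(g * math.sin((q + 1) * phi) for q, g in enumerate(gammas))
    return a / c * series


grid = azimuth_grid()
print(len(grid))  # number of evaluated azimuth angles

# Hypothetical coefficients: a positive second harmonic moves the ITD
# maximum towards the front, away from 90 degrees.
gammas = (3.0, 0.5, 0.0)
phi_max = max(grid, key=lambda p: itd_multi_harmonic(p, 0.083, gammas))
print(phi_max)
```

Under this assumed form, setting the coefficients to (3, 0, 0) reproduces the single-sine shape of the low-frequency sphere model, while non-zero higher harmonics allow the maximum to shift away from 90° and the curve to deviate from a pure sine.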
3.2 Estimation of effective head radii
To obtain a common basis for model comparisons, we estimated the effective by-participant head radii a_n, similar to [18] but using Model 1 instead of Model 2 and a different ITD-estimation method, see Section 2.4. The corresponding measurement-based ITD data were fitted in R (function nlxb from the package nlmrt [71]). Model 1 was chosen for this estimation because it better matches the evaluated frequency range used for the ITD estimation and the measurement-based ITD-curve shapes in HRTF datasets. Due to the position of the HA devices and microphones, the resulting effective head radii in the HARTF datasets were estimated by adding the individual HA-microphones-to-scalp offsets d10, see Figure 6, in line with the geometric calculations of the horizontal HA-microphones angle ΘHA, cf. equation (A3).
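As an illustration of the radius estimation, the sketch below fits the effective head radius to an ITD curve in the least-squares sense. It assumes the low-frequency sphere form ITD = 3·a/c·sin φ for Model 1, for which the least-squares solution is available in closed form; this is a simplification of the nonlinear fitting routine named above, and the offset value for d10 is hypothetical.

```python
import math


def fit_head_radius(phis_deg, itds, c=343.0):
    """Least-squares radius a (in m) for ITD = 3*a/c*sin(phi).

    The model is linear in a, so a = c * sum(ITD*sin) / (3 * sum(sin^2)).
    """
    s = [math.sin(math.radians(p)) for p in phis_deg]
    num = sum(t * si for t, si in zip(itds, s))
    den = sum(si * si for si in s)
    return c * num / (3.0 * den)


# Synthetic "measured" ITDs generated with a known radius of 78 mm.
phis = [i * 2.5 for i in range(73)]
itds = [3.0 * 0.078 / 343.0 * math.sin(math.radians(p)) for p in phis]

a_hrtf = fit_head_radius(phis, itds)
d10 = 0.005  # hypothetical HA-microphones-to-scalp offset in m
a_hartf = a_hrtf + d10  # radius correction for HARTF datasets
print(round(a_hrtf * 1e3, 1), round(a_hartf * 1e3, 1))
```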
3.3 Estimation of model coefficients
Model 1 can be readily evaluated using these estimated effective head radii. Model 2+ [26] additionally required the calculation of across-ear averages of horizontal microphone angles, ΘE and ΘHA, see Appendix A. To obtain the coefficients of Model 3, that is, γ1n to γ3n, the measurement-based by-participant ITD curves were fitted per type of dataset in the same way as for the estimation of effective head radii, while retaining the previously estimated effective head radii. Model 3 can therefore be considered a hybrid model: it is still based on a physically motivated effective head radius but takes advantage of an empirical curve-fitting approach.
In case no individual acoustic measurement data is available, a link to the individual features, see Table 2, needs to be established to make the models usable for individualisation purposes. Algazi et al. [18] estimated the effective by-participant head radii of Model 2, cf. equation (2), by a linear combination of selected head-related anthropometric features and an offset.⁵ Applying the same strategy to Model 3 failed due to its increased complexity and a curvilinear relationship between the features and the model coefficients a_n, and γ1n to γ3n. We instead used polynomial regression (see, e.g. [72]) to estimate the effective head radii a_n in Model 1 and Model 2+, and the effective head radii a_n together with the qth by-participant coefficients in Model 3, as described in Appendix B. The performance of all models was evaluated using model coefficients that were estimated by means of polynomial-regression weights applied to a subset of features, as selected for the specific model implementation described below.
3.4 Specific model implementation
An empirical selection of M = 5 features (HRTF dataset: {x1, x2, x3, x4, x5}; front and rear HARTF datasets: {x1, x2, x3, d13, d14}), averaged across ears, was chosen to support the prediction of maximum ITD values [18, 19] and the shift in the arguments of the ITD maxima [26]. The feature selections are further corroborated by their largely loose interrelationships: in addition to the few moderate correlations for feature combinations within each subset, reported in Section 2.5.2, only weak (|ρ| < .5) or non-significant correlations (p > .05) were found, except for an additional moderate correlation between x2 and d13 (ρ = .55). To find a suitable polynomial complexity level and avoid over-fitting, the polynomial degree P was varied and then set such that the respective residual standard errors in the univariate and multivariate polynomial-regression models, cf. Appendix B, became minimal, which suggested P = 4.
3.5 Model-performance evaluation
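Individual model-prediction performance was quantified on the basis of the mean absolute error (MAE)

$$\mathrm{MAE} = \frac{1}{L} \sum_{l=1}^{L} \left| \mathrm{ITD}_{\text{mod}}(\varphi_l) - \mathrm{ITD}_{\text{meas}}(\varphi_l) \right|$$

between model-based and measurement-based ITDs, ITDmod and ITDmeas, respectively, and the root-mean-square error (RMSE)

$$\mathrm{RMSE} = \sqrt{\frac{1}{L} \sum_{l=1}^{L} \left( \mathrm{ITD}_{\text{mod}}(\varphi_l) - \mathrm{ITD}_{\text{meas}}(\varphi_l) \right)^{2}}.$$

Both error metrics were calculated for the L evaluated azimuth angles, reporting means μ and standard deviations σ of results averaged across participants.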
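Both error metrics can be sketched in a few lines of Python, assuming their standard definitions over the L evaluated azimuth angles. The function names and the example values are illustrative.

```python
import math


def mae(itd_mod, itd_meas):
    """Mean absolute error between two equally long ITD sequences."""
    n = len(itd_mod)
    return sum(abs(m - t) for m, t in zip(itd_mod, itd_meas)) / n


def rmse(itd_mod, itd_meas):
    """Root-mean-square error between two equally long ITD sequences."""
    n = len(itd_mod)
    return math.sqrt(sum((m - t) ** 2 for m, t in zip(itd_mod, itd_meas)) / n)


# Hypothetical ITDs in microseconds at three azimuth angles.
modelled = [100.0, 400.0, 650.0]
measured = [110.0, 380.0, 650.0]
print(mae(modelled, measured), rmse(modelled, measured))
```

Because the RMSE squares the deviations before averaging, it weights large direction-specific errors more strongly than the MAE and is never smaller than the MAE for the same data.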
To demonstrate the effect of the individual model coefficients on MAEs and RMSEs, the base variant of Model 3, consisting of a_n and γ1n only, was gradually increased in complexity by sequentially adding the remaining coefficients γ2n and γ3n, and compared to Model 1. In all complexity levels of Model 3, and for Model 1, the model coefficients were estimated using polynomial regression weights with a polynomial degree of P = 4 that were applied to the respective feature subsets, see Section 3.4.
For a performance comparison between all models, that is, Model 1, Model 2+, and the final Model 3, the correlation of measured and modelled ITD maxima was quantified by Pearson’s correlation coefficient ρ, and the variance explained was evaluated by the adjusted coefficient of determination R², describing the proportion of variance predictable by simple linear regression. To predict Δmax{ITD} and Δarg max{ITD}, representing the errors between model-based predictions and the corresponding measurement-based estimations of max{ITD} and arg max{ITD}, respectively, six independent-sample one-way rANOVA models (function t1way from the package WRS2 [73]) were formulated. We entered the factor Model at the levels Model 1, Model 2+ and Model 3, and ran the analyses based on trimmed means (20% trimming level) with 10,000 bootstrap samples, separately for each dataset and error metric. For post-hoc testing, linear contrasts were conducted (function lincon [73]), reporting adjusted p-values given multiple comparisons [74]. All statistical test results were interpreted at the 95% confidence level.
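The robust analyses above are based on 20% trimmed means. The Python sketch below illustrates only this location estimate, assuming symmetric trimming with the number of discarded values per tail rounded down; it does not reproduce the bootstrap test or the post-hoc contrasts, and the example data are hypothetical.

```python
import math


def trimmed_mean(values, trim=0.2):
    """Mean after discarding floor(trim * n) values from each tail."""
    x = sorted(values)
    g = int(math.floor(trim * len(x)))
    kept = x[g:len(x) - g]
    return sum(kept) / len(kept)


# Hypothetical prediction errors in microseconds with one outlier.
errors = [1.0, 2.0, 3.0, 4.0, 100.0]
print(trimmed_mean(errors))       # outlier removed before averaging
print(sum(errors) / len(errors))  # ordinary mean for comparison
```

For a sample of 30 participants, a trimming level of 20% discards the six smallest and six largest values, so that single outliers in the prediction errors have little influence on the compared means.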
Direction-dependent deviations in model-based ITD predictions from measurement-based ITD estimations were analysed per dataset type by calculating the direction-dependent ITD differences, and subsequently averaging the results across participants.
The frequency-selective performance evaluation relied on evaluating the datasets of all participants for their ITDs in bands of ±100 Hz around centre frequencies of 500 Hz, 1 kHz, 1.25 kHz, and 1.5 kHz. The results were compared to the corresponding model predictions, using the model coefficients as approximated by polynomial regression based on broadband measurement-based ITD estimations between 500 Hz and 1.5 kHz.
Since it is of particular interest how robustly Model 3 performs given variations in the model input data, that is, in the feature subsets introduced in Section 3.4, we evaluated its ITD-prediction performance using the inter- and intra-individual variations in feature subsets, summarised in Table 3.
4 Modelling results
4.1 Effect of model complexity
Figure 8 shows the effect of the individual coefficients on the performance of Model 3 in terms of MAEs and RMSEs, in comparison to Model 1. In HRTF datasets, Model 3 delivered results similar to those of Model 1 when the predictions were based on the first coefficient (γ1n) or on the first and second coefficients (γ1n and γ2n), but on average performed substantially better when its complexity was increased by including γ3n, although with larger error variability. In front and rear HARTF datasets, adding γ2n already allowed a refined ITD adaptation in Model 3, which manifested itself in lower prediction errors compared to Model 1. The largest performance improvement in HARTF datasets was observed after adding the second harmonic, that is, γ2n; this trend continued after adding γ3n but was no longer as pronounced. Adding higher harmonics would likely lead to only marginal improvements. The specific implementation of Model 3 thus represents a good compromise between complexity and achievable prediction performance.
Figure 8 Effect of the complexity in Model 3 on mean ITD-prediction errors, averaged across participants, in HRTFs, and front (F) and rear (R) HARTFs. In comparison to i) Model 1, Model 3 was evaluated with ii) γ2n = γ3n = 0, iii) γ3n = 0, and iv) all coefficients, each optimised based on feature subsets that were scaled using polynomial-regression weights (P = 4). Error bars represent standard deviations.
4.2 Inter-model comparison
Figure 9 compares the measurement-based ITD curves with the ones predicted by the three ITD models. Direction-dependent deviations between the model-based ITD predictions and the corresponding measurement-based ITD estimations are represented by dashed-dotted (Model 1 and Model 3) and dashed (Model 2+) lines. The last three rows show the correlation between modelled and measurement-based ITD maxima, the differences in ITD maxima, Δmax{ITD}, and the differences in the arguments of the ITD maxima, Δarg max{ITD}. Corresponding quantitative results are presented in Table 4. All reported correlations and predictors in linear regression models were significant (p < .001).
Figure 9 (a) Measurement-based (light grey) and (b), (c) model-based broadband ITD estimations, colour-coded in grey, black and blue, respectively, evaluated for directions in the horizontal plane for HRTF and front (F) and rear (R) HARTF datasets. Deviations from measurement-based ITDs are shown as dashed-dotted (Models 1 and 3) or dotted lines (Model 2+), with standard deviations as shaded areas. (d) Scatter plots comparing measurement-based and model-based ITD maxima, fitted by linear regression lines. Box plots show medians and IQRs of differences in (e) ITD maxima and (f) arguments of the ITD maxima, with whiskers covering 1.5 times the IQR, and outliers being displayed as crosses. Horizontal black lines indicate non-significant (n.s.) mean differences at the 95% confidence level.
Table 4 Broadband ITD-model prediction errors, quantified by MAEs and RMSEs, averaged across participants. Bivariate correlation coefficients ρ quantify the correlation strength between measurement-based and model-based ITD maxima, further described by the corresponding linear-regression equations, κ · ITDmeas + λ, and adjusted coefficients of determination R². The last two columns present model-prediction errors regarding ITD maxima and the differences in the arguments of the ITD maxima.
The results, comparing frequency-selective measurement-based and broadband model-based ITD estimations and predictions, respectively, across datasets and models, are shown for all participants in Figure 10. Due to the influence of noise in the narrowband evaluations, application of the chosen ITD estimation method resulted in discontinuous ITD curves between 25 and 60 deg azimuth (HRTF), 17.5 and 45 deg (front HARTF), and 15 and 45 deg (rear HARTF), mainly for the frequency bands with centre frequencies above 500 Hz, and only for some participants. For this reason, the ITD maxima and the arguments of the ITD maxima were evaluated in an angular range of 90 ± 27.5 deg when calculating the means and standard deviations, represented by error bars.
Figure 10 Results of frequency-selective measurement-based ITD estimations (solid light grey curves) and ITD predictions of (a) Model 1 (dotted grey curves), (b) Model 2+ (dotted black curves) and (c) Model 3 (dotted blue curves), evaluated for the HRTF and front (F) and rear (R) HARTF datasets per participant. The measurement-based ITDs were estimated in frequency bands of ±100 Hz around the corresponding centre frequencies, displayed on the right ordinate. Green and red horizontal and vertical error bars indicate μ ± σ of the arguments of the ITD maxima and ITD maxima of measurement-based and model-based ITD estimations, respectively.
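Restricting the search for the ITD maximum to 90 ± 27.5 deg can be sketched as follows. The Python function below is a minimal illustration under the assumption that the ITD curve is sampled on a common azimuth grid; the spurious value at 30 deg mimics a discontinuity of the kind described above and is hypothetical.

```python
import math


def windowed_max(phis_deg, itds, centre=90.0, half_width=27.5):
    """Return (argument, value) of the ITD maximum inside the angular window."""
    candidates = [
        (itd, phi)
        for phi, itd in zip(phis_deg, itds)
        if abs(phi - centre) <= half_width
    ]
    itd_max, phi_max = max(candidates)
    return phi_max, itd_max


phis = [i * 2.5 for i in range(73)]  # 0 to 180 deg in steps of 2.5 deg
itds = [700e-6 * math.sin(math.radians(p)) for p in phis]
itds[12] = 900e-6  # hypothetical outlier at 30 deg, outside the window

print(windowed_max(phis, itds))  # maximum found at 90 deg
print(phis[itds.index(max(itds))])  # unrestricted search picks the outlier
```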
Noteworthy characteristics of the individual models are discussed based on various factors below.
4.2.1 Estimation of effective head radii
Fitting Model 1 to the measurement-based ITD curves and estimating the effective head radii by polynomial regression resulted in average values of 78 ± 2.2 mm (μ ± σ, range: 74.5–83.6 mm) and 83.0 ± 2.9 mm (μ ± σ, range: 78.6–91.6 mm) for the HRTF and HARTF datasets, respectively. The effective head-radius estimations for the HRTF datasets were slightly lower than the average value of 87 mm (range: 79–95 mm) reported by Algazi et al. [18], with higher average RMSEs, see Table 4, compared to their 32 μs (range: 22–47 μs).
4.2.2 Prediction of interaural-time-difference maxima
Application of the estimated effective head radii to Model 1 resulted in underestimated measurement-based ITD maxima in HRTF datasets, see Figures 9b, 9d and 9e, which tended to be more pronounced for higher ITD maxima. This was confirmed by a strong and significant correlation between max{ITD}-values, as well as the corresponding linear-regression result with a large adjusted coefficient of determination, see left panel of Figure 9d, and the linear-regression equation in Table 4. Reasons for the reduced max{ITD}-values can be traced back to the shape of the measurement-based ITD curves in HRTF data, which started to deviate from the sinusoidal form for the evaluated sub-bands with centre frequencies above 500 Hz, see left panel of Figure 10a. The additional development of dips between 0–50 deg and 120–180 deg likely further contributed to lowered broadband maximum ITD-model magnitudes. These phenomena also impaired the predictions of ITD maxima in the lowest frequency band of HRTFs, which actually approaches the target frequency range of Model 1. Interestingly, the radius correction in HARTF datasets resulted in regression lines above but, in case of front HARTFs, closer to the ideal diagonal, albeit with generally lower correlation coefficients and adjusted R-squared values, see centre and right panel of Figure 9d. Deviations in Δmax{ITD}-values between front and rear HARTF datasets further reflected the effect of a slight horizontal rotation of the connection line running through the HA microphones, cf. Figure 6, indicating different effective head radii which were, however, estimated by an average correction offset d10 to the HA-microphones centre.
In accordance with previous results [16, 21, 26], Model 2+ drastically underestimated the ITD maxima in all datasets using the estimated effective head radii, see Figures 9b, 9d and 9e. This was expected since the model is only intended to be used at higher frequencies, cf. Section 3.1. Approaching its valid frequency range, the estimation error gradually decreased towards higher frequency bands across datasets, cf. Figure 9b. The corrected effective head radii and the reduced magnitudes of ITD maxima in HARTF datasets additionally favoured smaller prediction errors, see centre and right panels of Figure 9d. However, the prediction consistency for HARTF datasets degraded, entailing steeper but less determined linear regression lines, and lower correlation factors, see Table 4.
Due to its increased complexity and flexibility, Model 3 was able to widely adapt to the ITD-curve-shape properties in all types of datasets. This allowed the dips in ITD curves, problematic for Model 1, to be considered, enabling generally more accurate ITD-curve fittings, which was reflected by Δmax{ITD}-values closer to zero across datasets, see Figures 9d and 9e. Consistently high correlation factors and well-determined linear-regression lines, approaching the ideal diagonals, indicate superior performance of Model 3, see Figure 9d and Table 4. Although slightly underestimating the ITD maxima in the lowest frequency bands around 500 Hz, a very good prediction performance across the remaining evaluated frequency bands was observed, outperforming especially Model 2+, see Figure 10c.
Table 5 summarises the results from rANOVA and post-hoc tests, analysing trimmed-mean differences in Δmax{ITD} and Δarg max{ITD}, with indices of Δ corresponding to the respective ITD models, see also Figures 9e and 9f. All rANOVAs of Δmax{ITD}-values turned out to be significant with large explanatory measures of effect size ξ. Post-hoc test results indicated significant trimmed-mean differences for all model comparisons, in all datasets. In HRTF datasets, the lowest mean model-prediction errors closest to zero were observed for Model 3, increasing significantly towards underestimation in Model 1, and particularly in Model 2+. In HARTF datasets, Model 3 again outperformed Model 1. While, compared to the prediction results in HRTF data, application of Model 1 resulted in higher median and mean errors in rear and front HARTF datasets, with a general tendency towards overestimation, Model 2+ showed reduced underestimation tendency for reasons discussed earlier. These results were supported in the same way by MAEs and RMSEs, see Table 4.
Table 5 rANOVA table with degrees of freedom (df), Fisher-ratios (F), p-values, and explanatory measures of effect size ξ. The test statistics of linear contrasts are reported with corrected p-values given multiple comparisons. The indices of Δ correspond to the ITD-model variants.
4.2.3 Prediction of the arguments of the interaural-time-difference maxima
Due to the approximately sinusoidal shape of the ITD curves in the HRTF datasets, all models performed comparably when predicting the arguments of the ITD maxima, yielding median and mean Δarg max{ITD}-values close to zero with rather small spreads but some outliers, see left panels of Figures 9b and 9f, and Table 4.
The mathematical form of Model 1 only allowed a unique argument of the ITD maximum at φ = 90°, resulting in an overestimation of arguments of measurement-based ITD maxima in HARTF datasets, see centre and right panels of Figures 9b and 9f, and Table 4.
This limitation was specifically addressed by Model 2+, allowing a shift of the argument of the ITD maximum by accounting for a horizontal ear-canal and a HA-microphones angle, ΘE and ΘHA, respectively. Although the predictions were shifted in the right direction, the measurement-based arg max{ITD}-values were underestimated by Model 2+, and the error exhibited a larger spread. This can be attributed to the assumption of a spherical geometric model [26] for calculating ΘE and ΘHA, see Figure 6 and equations (A1) and (A3). Calculations based on a refined ellipsoidal head [21] or extracting the angles based on photos from the top, or photogrammetric data [19, 75, 76] would result in smaller horizontal angles and likely lead to more accurate estimations in HARTF datasets using Model 2+. The observations for Model 1 and Model 2+ also apply to the results when comparing the model-based broadband ITD predictions to measurement-based narrowband ITD estimations, see Figure 10.
Model 3 eliminated the shortcomings of the two other models and resulted in Δarg max{ITD}-values close to zero across HARTF datasets, see Figures 9c and 9f and Table 4, also when compared with the narrowband ITD estimations, see Figure 10c. Although the effective head radius was predefined, the measurement-based ITD curves could still be fitted precisely due to the remaining degrees of freedom in Model 3.
The already hinted performance similarity across models in HRTF datasets regarding Δarg max{ITD} was corroborated by a non-significant rANOVA, see Table 5. Both rANOVAs of the results for HARTF datasets were significant with large ξ. All post-hoc test results were significant, except between Model 2+ and 3 in rear HARTF datasets, indicating a certain degree of prediction accuracy in Model 2+ when estimating the arguments of the ITD maxima, which, however, was accompanied by a large prediction variance. Based on the error distributions, cf. Figure 9f, and the results from the post-hoc tests, the best overall performance can still be attributed to Model 3, while observing largely significant over- and underestimation in Model 1 and Model 2+, respectively, which was also supported by the patterns in MAEs and RMSEs, see Table 4.
4.2.4 Direction-dependent prediction of interaural time differences
Figures 9b and 9c show the direction-dependent ITD-prediction errors by each model. All results provided in this section represent means and standard deviations, μ ± σ, averaged across participants.
Model 1 achieved fair ITD predictions in HRTF datasets, exhibiting an error pattern that was nearly symmetrical about the frontal plane across the evaluated directions in the horizontal plane, see Figure 9b. On average, the ITD predictions in the azimuth ranges within 0–90 deg and 90–180 deg reached maximum deviations of 36.9 ± 24.5 μs (at 32.5 deg) and 70.1 ± 21.0 μs (at 142.5 deg), respectively, dropping to minimum prediction errors of −49.5 ± 17.6 μs (at 65 deg) and −44.3 ± 8.7 μs (at 105 deg). Due to the shift of arg max{ITD} and the aforementioned curve irregularities in measurement-based ITD estimations in front and rear HARTF datasets, the model predictions within 0–90 deg changed to minimum and maximum errors of −45.6 ± 21.2 μs (at 40 deg) and 41.7 ± 23.8 μs (at 90 deg; rear HARTFs: −43.5 ± 22.6 μs at 35 deg, and 61.8 ± 25.5 μs at 90 deg), respectively, and resulted in prominent overestimations up to 170.4 ± 25.5 μs (at 125 deg; rear HARTFs: 197.5 ± 24.7 μs at 122.5 deg) within 90–180 deg.
Model 2+ consistently underestimated the measurement-based ITDs of HRTF datasets and also showed an error pattern that was rather symmetrical about the frontal plane, see Figure 9c. Deviation minima of −201.8 ± 25.9 μs and −198.8 ± 14.7 μs occurred at 62.5 deg and 112.5 deg, respectively, with a local maximum of −157.4 ± 16.8 μs at 90 deg. In front HARTF datasets, the ITD-deviation patterns shifted in the same direction as arg max{ITD} did, with slightly higher minima of −180.3 ± 16.7 μs at 42.5 deg (rear HARTFs: −172.5 ± 19.0 μs at 40 deg), and −138.1 ± 30.8 μs at 92.5 deg (rear HARTFs: −116.7 ± 31.2 μs at 90 deg), and a local maximum of −112.2 ± 26.1 μs at 75 deg (rear HARTFs: −97.0 ± 25.1 μs at 75 deg). Noticeably lower model-prediction errors were present in both types of HARTF datasets for the evaluated directions above 120 deg.
Model 3 also entailed an error pattern that was roughly symmetrical about the frontal plane in HRTF datasets, with the lowest fluctuation among all models, resulting in minimum prediction errors of −39.7 ± 18.1 μs and −38.0 ± 13.6 μs at 55 deg and 120 deg, respectively. A maximum model-prediction error of 13.9 ± 9.8 μs occurred at 85 deg. The prediction errors also remained consistently low in front HARTF datasets across all analysed directions with minimum deviations of −31.8 ± 16.8 μs and −20.1 ± 14.9 μs, appearing at 40 deg and 155 deg (rear HARTF datasets: −25.1 ± 19.6 μs and −19.3 ± 13.6 μs at 35 deg and 155 deg), respectively. Maximum errors of 17.8 ± 28 μs and 27.6 ± 17.5 μs at 17.5 deg and 120 deg (rear HARTF datasets: 12.9 ± 26.7 μs and 34.7 ± 17.4 μs at 12.5 deg and 117.5 deg, respectively) were observed.
4.3 Model robustness
The resulting ITD curves, predicted by Model 3 given inter- and intra-individual variations in the manually extracted feature subsets, cf. Table 3, are shown in Figure 11. In view of the uneven gender distribution, extracting the features from a female participant to test the model-prediction performance represents a somewhat stringent scenario. The corresponding mean MAEs and RMSEs in ITDModel 3 are presented in Table 6 and compared to the mean ITD errors obtained for the originally extracted feature values.
Figure 11 Effect on ITDModel 3 given (a) inter-individual and (b) intra-individual variations in feature subsets, whose elements were repeatedly extracted from participant 18 (female). Solid and dotted lines indicate measurement-based and model-based ITD estimations and predictions, respectively, in HRTFs, and front (F) and rear (R) HARTFs.
Table 6 Effect of feature variations on MAEs and RMSEs in the ITD predictions of Model 3. The errors were calculated by evaluating the predictions of Model 3 using inter- and intra-individual variations when extracting the selected anthropometric and HA features from participant 18 (female) multiple times, see Table 3, in comparison to measurement-based ITD estimations. Deviations with respect to the mean MAEs and RMSEs, cf. Table 4, are indicated by Δμ, with percentage deviations provided in brackets.
With comparable magnitudes for inter- and intra-individual variations in feature extractions, the lowest errors were observed when predicting the ITDs of HRTFs. When applying inter-individual variations, the ITD errors increased for rear HARTF datasets, and most noticeably for front HARTF datasets. Similar tendencies can be observed given intra-individual feature variations, although the mean error increase is not as pronounced as in the former scenario.
5 Discussion
5.1 Inter-model comparison
In comparison to Model 1 and especially Model 2+, the results from the specific model evaluation collectively indicate that Model 3 is superior in terms of ITD-prediction accuracy, while addressing their shortcomings. This conclusion applies to direction-dependent ITD predictions, regardless of the type of directional dataset, to the estimation of maximum ITD-values and the arguments of the ITD maxima (except for ITDs in HRTFs), as well as to MAEs and RMSEs. When compared to the measurement-based ITD estimations in frequency sub-bands within the target frequency range, Model 3 also delivered more consistent and precise prediction patterns. Although MAEs and RMSEs increase after considering inter- and intra-individual variations in feature extractions, Model 3 demonstrated robustness and still resulted in ITD-prediction errors that were substantially below the best-case prediction errors of the other two models. A more pronounced error increase in predicted ITDs for front HARTFs indicates that particular care is advised when manually extracting the corresponding feature subset. Note that these ITD predictions, obtained for variations of the selected female feature subset, are likely biased due to the gender distribution among participants, and thus may have been pushed towards the upper range of the underlying error distribution.
5.2 Perceptual implications
5.2.1 Just-noticeable differences
For a better understanding of the perceptual implications, the observed ITD-prediction errors of the individual models need to be compared to the JNDs in ITDs known from the literature.
One of the earliest studies investigating the JNDs in listeners with normal hearing was conducted by Klumpp and Eady [77], reporting values as low as 9 μs for stimuli which were band-limited between 200 Hz and 1.7 kHz. Similarly low values of 10 μs were observed when discriminating pulsed sinusoidal stimuli [78], and even lower ones of 6.9 μs in trained listeners (18.1 μs in untrained listeners) [79]. Aussal et al. [61] calculated a more relaxed threshold of 16 μs which, however, was not verified perceptually. Simon et al. [80] also reported higher perceptual JNDs of 33 μs and 68 μs at 30 deg azimuth and 90 deg azimuth, respectively, based on an alternative forced-choice left/right discrimination protocol. In perceptual experiments investigating the influence of simulated room reflections on JNDs using stimuli based on musical instruments [81], thresholds of 20 μs were determined, increasing by a factor of five to eight in reverberation.
Spencer et al. [82] measured notably higher average JNDs of 68 μs in low-frequency ITDs with high inter-individual variability in participants with symmetric hearing loss, which were not significantly different from those with normal hearing in their study. Phase distortions and corresponding ITD errors are likely to be expected when such thresholds are measured via bilateral HAs that are not synchronised (for reviews, see [83, 84]). Poorest performance with JNDs of 150 μs and 100–350 μs has been observed in bilateral cochlear-implant users [85] and bimodally-fitted listeners [86], respectively.
The lowest reported JNDs generally impose only minimal tolerance of prediction errors in ITD models. None of the compared models in the present investigation would fulfil the highest perceptual standards in terms of MAEs and RMSEs when using the modelled ITDs of HRTF datasets for applications involving listeners with normal hearing. Using more relaxed JND values in the upper range of 16–68 μs, Model 1 and especially Model 3 would be suitable for modelling purposes. The upper JND limits are also largely complied with by Model 1, and entirely by Model 3, when accounting for direction-dependent ITD-prediction errors.
Considering the JNDs in listeners with symmetrical hearing loss, Model 1 would severely distort but roughly reproduce the ITD characteristics in HARTF datasets. However, Model 1 particularly violates the perceptual demands for azimuth angles between 90 and 180 deg. In view of the additional localisation errors that already occur when using behind-the-ear receiver-in-ear HAs [9, 10], application of the significantly more precise Model 3 is therefore recommended, particularly given the consistently lower direction-dependent ITD-prediction errors.
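The comparison between prediction errors and JNDs can be illustrated with a short sketch. The thresholds are the values quoted above for listeners with normal hearing; the dictionary labels and the helper name `jnds_met` are introduced here for illustration only, and the two example errors are the mean direction-dependent errors reported for Model 3 in HRTF datasets (−39.7 μs at 55 deg and 13.9 μs at 85 deg).

```python
# JNDs in µs as quoted in the text for listeners with normal hearing.
JND_US = {
    "Klumpp and Eady [77]": 9.0,
    "trained listeners [79]": 6.9,
    "Aussal et al. [61]": 16.0,
    "Simon et al. [80], 30 deg": 33.0,
    "Simon et al. [80], 90 deg": 68.0,
}


def jnds_met(error_us, jnds=JND_US):
    """Return the labels of all JNDs that the absolute error does not exceed."""
    return [label for label, jnd in jnds.items() if abs(error_us) <= jnd]


# Mean direction-dependent errors of Model 3 in HRTF datasets (from the text).
met_at_55_deg = jnds_met(-39.7)
met_at_85_deg = jnds_met(13.9)
```

Such a check only considers mean errors; the reported standard deviations would have to be taken into account for a per-participant statement.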
5.2.2 Outlook on perceptual model evaluation
In addition to the model assessment based on JNDs, future listening experiments are necessary to evaluate Model 3. Application of the required azimuthal rotation on the individual datasets during post-processing likely introduced an error in the resulting measurement-based ITD estimations, which consequently affected the modelling procedure and the model-based ITD predictions. It remains to be investigated whether this post-processing step is noticeable and has perceptual effects. In this context, it is of particular importance to systematically examine the influence of the observed ITD-prediction errors on spatial audio quality metrics, such as, for example, sound localisation in the horizontal plane. Variations of previously manipulated factors and their evaluation [9, 10] may be extended to include a weighting between the ITDs from HRTFs when utilising residual hearing, which are superimposed with the ITDs from behind-the-ear HARTFs [6]. To this end, it is worth investigating the influence of different types of ear pieces, which can be included as additional filters, or (binaural) HA algorithms. The results of such experiments involving young adults with normal hearing will likely allow important baseline measures to be derived, and further corroborate the perceptual validity of the proposed ITD model.
5.3 Limitations
Head asymmetries are not addressed by any of the compared models in their current mathematical forms, which only allow ITD predictions of zero at 0 deg azimuth. The attempt to introduce an identical phase-shift coefficient in all sine terms of Model 3 initially led to promising ITD predictions but resulted in robustness issues when varying the magnitudes in feature subsets. Applying the proposed ITD model using feature subsets measured from women may lead to increased prediction errors given the unequal gender distribution with a bias towards male participants in the presented database. However, we proposed an extensible model framework for the estimation of specific model coefficients, which can be adapted to new input data. For more general applicability, the model may subsequently be extended to include elevation angles [87, 88].
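The first limitation can be made concrete with a generic sum of sine harmonics. The sketch below is not equation (3) of this article: the function name, the harmonic structure, and the coefficient values are assumptions chosen for illustration. It shows that any pure sine series is zero at 0 deg azimuth, and that a phase shift shared by all sine terms lifts this constraint.

```python
import math


def sine_series_itd(azimuth_deg, coeffs_us, phase_shift_deg=0.0):
    """Generic sine series: sum over k of c_k * sin(k * phi + shift), in µs."""
    phi = math.radians(azimuth_deg)
    shift = math.radians(phase_shift_deg)
    return sum(
        c * math.sin((k + 1) * phi + shift) for k, c in enumerate(coeffs_us)
    )


# Hypothetical harmonic coefficients in µs (not fitted to any dataset).
coeffs_us = [700.0, 40.0, -25.0]

# Without a phase shift, the prediction at 0 deg azimuth is always zero.
itd_front = sine_series_itd(0.0, coeffs_us)

# A shared phase shift of 2 deg yields a non-zero frontal ITD.
itd_front_shifted = sine_series_itd(0.0, coeffs_us, phase_shift_deg=2.0)
```

At 0 deg, the shifted series reduces to the sine of the shift times the sum of all coefficients, so the frontal offset depends on every coefficient simultaneously.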
6 Conclusion
We presented a hybrid multi-harmonic model (AMT, pausch2022 [43]) for the prediction of ITDs in different types of spatial transfer functions, that is, HRTFs, and front and rear HARTFs, for arbitrary directions in the horizontal plane. The model coefficients were derived from the horizontal-plane datasets of the IHTA-indHARTF database [42], containing the individual receiver directivities from 30 adults, wearing two behind-the-ear research HAs and in-ear microphones at the blocked ear-canal entrance. Tailored to each type of spatial transfer function, optimal polynomial weights were calculated to facilitate the estimation of individualised model coefficients by scaling individual subsets of anthropometric and HA features. Objective evaluations against the simple and extended harmonic analytic models by Kuhn and Woodworth, respectively, revealed superior ITD-prediction performance of the proposed model when compared to the broadband, frequency-selective, and direction-dependent measurement-based ITD estimations in the evaluated frequency range, regardless of the considered type of spatial transfer function. It was demonstrated that the two analytic models cannot satisfactorily predict the (arguments of the) ITD maxima and the generally more complex ITD-curve shapes in HARTF datasets. In this context, the extended Woodworth model suffered from an oversimplified calculation of the horizontal HA-microphones angle based on a spherical head geometry, motivating the use of more accurate calculation or acquisition methods. In addition to the prediction accuracy of the proposed model, being within an acceptable range in view of perceptual requirements, robust ITD predictions are feasible even under varying input data, which further substantiates its applicability.
Acknowledgments
We thank all study participants, and Julian Burger for manually extracting the individual features from the photos. We also appreciate the constructive comments by Robert Baumgartner and the two anonymous reviewers. Special thanks go to Clara Hollomey and Piotr Majdak for the smooth model integration into the AMT.
Declaration of conflicting interests
We declare that there are no conflicts of interest. GN ReSound was not involved in any stage of this publication other than in providing the hardware.
Funding
The work of F.P. was supported by the European Union’s Seventh Framework Programme for research, technological development and demonstration (grant no. 607139, Improving Children’s Auditory Rehabilitation), and partially financed within the Horizon 2020 research and innovation funding programme (grant no. 101017743, SONICOM–Transforming auditory-based social interaction and communication in AR/VR). S.D. received funding from the German Research Foundation (grant no. 402811912, Individual Binaural Synthesis of Virtual Acoustic Scenes).
Data availability statement
The IHTA-indHARTF database is available for download via RWTH Publications at https://doi.org/10.18154/RWTH-2022-04267 [42]. Implementations of the presented models (pausch2022) and corresponding simulations (exp_pausch2022) are publicly available as part of the Auditory Modeling Toolbox (AMT, https://www.amtoolbox.org) [89] in the release of version 1.2.0 [43].
Appendix A
Calculation of additional features
For simplification, the horizontal ear-canal angle ΘE and the horizontal HA-microphones angle ΘHA, both with respect to the median plane, were calculated based on the assumption of a spherical head [26]. We estimated the effective by-participant head radii a_n as described in Section 3.2. Projecting the pinna offset back x5 onto the sphere led to
The horizontal distance ζ between the HA-microphones centre and the ear-canal entrance was calculated by
which was defined along the vertical plane running through the axis of the HA microphones, see Figure 6. We assumed the HA-microphones centre to be separated from the head centre by a distance (a_n + d10). From the resulting triangle between the ear-canal entrance, the head centre and the centre of the HA microphones, the horizontal HA-microphones angle was determined by (A3)
using the law of cosines. The calculation of the HA-microphones offset up d13 included the known pinna offset down x4, yielding (A4)
Finally, the HA-microphones offset back d14, describing the horizontal displacement relative to the head centre, was determined by
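The law-of-cosines step can be sketched as follows. This is an illustration under stated assumptions rather than a reproduction of equation (A3): the ear-canal entrance is assumed to lie on the sphere at distance a_n from the head centre, the function name is hypothetical, and only the angle subtended at the head centre is computed. How this angle is combined with ΘE is not shown here.

```python
import math


def angle_at_head_centre(a_n, d10, zeta):
    """Angle in degrees at the head centre between the directions to the
    ear-canal entrance and to the HA-microphones centre (law of cosines).

    a_n  : effective head radius (head centre to ear-canal entrance, assumed)
    d10  : HA-microphones-to-scalp offset
    zeta : distance between HA-microphones centre and ear-canal entrance
    All lengths must share the same unit.
    """
    side_ear = a_n
    side_ha = a_n + d10
    cos_gamma = (side_ear**2 + side_ha**2 - zeta**2) / (2.0 * side_ear * side_ha)
    if not -1.0 <= cos_gamma <= 1.0:
        raise ValueError("side lengths do not form a valid triangle")
    return math.degrees(math.acos(cos_gamma))


# Hypothetical lengths in mm (not taken from the database).
gamma_deg = angle_at_head_centre(a_n=80.0, d10=5.0, zeta=25.0)
```

If ζ equals d10, the three points are collinear and the angle is zero, which serves as a simple plausibility check.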
Appendix B
Multivariate polynomial regression
To estimate the effective head radius and the dataset-specific sets of model coefficients of Model 3, cf. equation (3), based on M features, a multivariate polynomial-regression model (see, e.g., [90]) with a polynomial degree of P was defined according to equation (B1)
or, equivalently, as
The matrices A, B and E contain the individual features, polynomial regression weights, and residual errors, respectively. Note that the effective head radii in both types of HARTF datasets differ from those in the HRTF dataset after accounting for the HA-microphones-to-scalp offset d10, cf. Section 3.2, as do the feature-subset selections, cf. Section 3.4. Minimising the sums of squared errors, regarding B, with (·)T representing the transpose, yielded (MP + 1) × 4 optimal polynomial-regression weights [90]
which were estimated in R (functions lm and poly, both from the package stats) for ordinary polynomials. Based on the resulting set of optimal weights, the ITD-model coefficients were approximated through
To obtain the model coefficients for one participant using the corresponding feature subset, equation (B4) reduces to
with and . For Model 1 and 2+, the multivariate regression model simplifies to a univariate one, with the estimated polynomial-regression weights contained in a vector , cf. first column of .
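The regression procedure of this appendix can be sketched in plain Python. This is not the authors' R code: it is a textbook least-squares solution via the normal equations, assuming ordinary (raw) polynomials without interaction terms so that each design-matrix row has MP + 1 entries. All function names, the feature values, and the weights are hypothetical.

```python
def design_row(features, degree):
    """Row [1, x1, ..., x1^P, ..., xM, ..., xM^P] with M*P + 1 entries."""
    row = [1.0]
    for x in features:
        row.extend(x**p for p in range(1, degree + 1))
    return row


def solve(lhs, rhs):
    """Solve lhs @ X = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(lhs)
    aug = [[float(v) for v in lhs[i]] + [float(v) for v in rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: collinear features or too few rows")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_weights(feature_rows, targets, degree):
    """Least-squares weights, shape (M*P + 1) x number of target columns."""
    a = [design_row(f, degree) for f in feature_rows]
    k = len(a[0])
    n_out = len(targets[0])
    ata = [[sum(r[i] * r[j] for r in a) for j in range(k)] for i in range(k)]
    aty = [
        [sum(r[i] * y[c] for r, y in zip(a, targets)) for c in range(n_out)]
        for i in range(k)
    ]
    return solve(ata, aty)


def predict(features, weights, degree):
    """Estimate all target columns for one participant's feature subset."""
    row = design_row(features, degree)
    return [sum(r * w[c] for r, w in zip(row, weights)) for c in range(len(weights[0]))]


# Hypothetical example: M = 1 feature, degree P = 2, four target columns.
true_weights = [[80.0, 1.0, 0.5, -0.2], [2.0, 0.1, 0.0, 0.3], [0.5, 0.0, 0.2, 0.0]]
feature_rows = [[0.0], [1.0], [2.0], [3.0], [4.0]]
targets = [predict(f, true_weights, 2) for f in feature_rows]
weights = fit_weights(feature_rows, targets, 2)
```

Because the synthetic targets are noise-free, the fitted weights reproduce `true_weights` up to rounding. With measured data, residual errors remain, and a numerically more stable solver (for example, a QR-based routine) would usually be preferred over the normal equations.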
Definitions of orientation and position rely on a right-handed head-related spherical coordinate system. Azimuth angles, increasing counter-clockwise, and elevation angles are represented by φ ∈ [0°, 360°) and ϑ ∈ [−90°, 90°], respectively. In default position and orientation, the listener faces in the direction of the negative z-axis, with the x-axis representing the inter-aural axis, both axes intersecting the origin of the global coordinate system. Intrinsic rotations within the local coordinate system associated with the participant’s head are defined by roll, pitch, and yaw angles, increasing clockwise around the negative z-axis, the positive x-axis, and the positive y-axis, respectively.
For the current HRTF dataset, linear regression analysis yielded a_n,reg = 0.27 · x1/2 + 0.11 · x2/2 + 0.06 · x3/2 + 0.04 mm (μ ± σ = 79.5 ± 1.9 mm), predicting the effective head radii obtained through fitting Model 1, with ITD maxima strongly correlating with the measurement results (ρ = .79, p < .001).
References
- M. Vorländer: Auralization – fundamentals of acoustics, modelling, simulation, algorithms and acoustic virtual reality, Springer, 2020.
- J. Blauert: Spatial hearing: the psychophysics of human sound localization, MIT Press, Cambridge, MA, 1997.
- H. Kayser, S.D. Ewert, J. Anemüller, T. Rohdenburg, V. Hohmann, B. Kollmeier: Database of multichannel in-ear and behind-the-ear head-related and binaural room impulse responses. EURASIP Journal on Advances in Signal Processing 2009 (2009) 6:1–6:10. https://doi.org/10.1155/2009/298605.
- M.F. Mueller, A. Kegel, S.M. Schimmel, N. Dillier, M. Hofbauer: Localization of virtual sound sources with bilateral hearing aids in realistic acoustical scenes. The Journal of the Acoustical Society of America 131, 6 (2012) 4732–4742. https://doi.org/10.1121/1.4705292.
- F. Denk, S.M.A. Ernst, S.D. Ewert, B. Kollmeier: Adapting hearing devices to the individual ear acoustics: database and target response correction functions for various device styles. Trends in Hearing 22 (2018) 2331216518779313. https://doi.org/10.1177/2331216518779313.
- F. Pausch, L. Aspöck, M. Vorländer, J. Fels: An extended binaural real-time auralization system with an interface to research hearing aids for experiments on subjects with hearing loss. Trends in Hearing 22 (2018) 2331216518800871. https://doi.org/10.1177/2331216518800871.
- V. Durin, S. Carlile, P. Guillon, V. Best, S. Kalluri: Acoustic analysis of the directional information captured by five different hearing aid styles. The Journal of the Acoustical Society of America 136, 2 (2014) 818–828. https://doi.org/10.1121/1.4883372.
- F. Denk, S.D. Ewert, B. Kollmeier: Spectral directional cues captured by hearing device microphones in individual human ears. The Journal of the Acoustical Society of America 144, 4 (2018) 2072–2087. https://doi.org/10.1121/1.5056173.
- F. Denk, S.D. Ewert, B. Kollmeier: On the limitations of sound localization with hearing devices. The Journal of the Acoustical Society of America 146, 3 (2019) 1732–1744. https://doi.org/10.1121/1.5126521.
- F. Pausch, J. Fels: Localization performance in a binaural real-time auralization system extended to research hearing aids. Trends in Hearing 24 (2020) 2331216520908704. https://doi.org/10.1177/2331216520908704.
- J.-M. Jot, V. Larcher, O. Warusfel: Digital signal processing issues in the context of binaural and transaural stereophony. Audio Engineering Society Convention 98 (1995) Feb. 1995. [Online]. Available: http://www.aes.org/e-lib/browse.cfm?elib=7786.
- H. Møller, P. Minnaar, S.K. Olesen, F. Christensen, J. Plogsties: On the audibility of all-pass phase in electroacoustical transfer functions. Journal of the Audio Engineering Society 55, 3 (2007) 115–134.
- S. Mehrgardt, V. Mellert: Transformation characteristics of the external human ear. The Journal of the Acoustical Society of America 61, 6 (1977) 1567–1576. https://doi.org/10.1121/1.381470.
- D.J. Kistler, F.L. Wightman: A model of head-related transfer functions based on principal components analysis and minimum-phase reconstruction. The Journal of the Acoustical Society of America 91, 3 (1992) 1637–1647. https://doi.org/10.1121/1.402444.
- J. Plogsties, P. Minnaar, S.K. Olesen, F. Christensen, H. Møller: Audibility of all-pass components in head-related transfer functions. Audio Engineering Society Convention 108 108 (2000) Feb. 2000. [Online]. Available: http://www.aes.org/e-lib/browse.cfm?elib=9206.
- R.S. Woodworth, H. Schlosberg: Experimental psychology. Rev ed., Holt, Oxford, England, 1954.
- G.F. Kuhn: Model for the interaural time differences in the azimuthal plane. The Journal of the Acoustical Society of America 62, 1 (1977) 157–167. https://doi.org/10.1121/1.381498.
- V.R. Algazi, C. Avendano, R.O. Duda: Estimation of a spherical-head model from anthropometry. Journal of the Audio Engineering Society 49, 6 (2001) 472–479. [Online]. Available: http://www.aes.org/elib/browse.cfm?elib=10188.
- V.R. Algazi, R.O. Duda, D.M. Thompson, C. Avendano: The CIPIC HRTF database, in Proceedings of the 2001 IEEE Workshop on the Applications of Signal Processing to Audio and Acoustics (Cat. No. 01TH8575). 2001, 99–102.
- R. Duda, C. Avendano, V. Algazi: An adaptable ellipsoidal head model for the interaural time difference, in 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No. 99CH36258), Vol. 2. 1999, 965–968. https://doi.org/10.1109/ICASSP.1999.759855.
- R. Bomhardt, M. Lins, J. Fels: Analytical ellipsoidal model of interaural time differences for the individualization of head-related impulse responses. Journal of the Audio Engineering Society 64, 11 (2016) 882–894. [Online]. Available: http://www.aes.org/e-lib/browse.cfm?elib=18525.
- B.F.G. Katz: Boundary element method calculation of individual head-related transfer function. I. Rigid model calculation. The Journal of the Acoustical Society of America 110, 5 (2001) 2440–2448. https://doi.org/10.1121/1.1412440.
- N.A. Gumerov, A.E. O’Donovan, R. Duraiswami, D.N. Zotkin: Computation of the head-related transfer function via the fast multipole accelerated boundary element method and its spherical harmonic representation. The Journal of the Acoustical Society of America 127, 1 (2010) 370–386. https://doi.org/10.1121/1.3257598.
- H. Ziegelwanger, P. Majdak: Modeling the direction-continuous time-of-arrival in head-related transfer functions. The Journal of the Acoustical Society of America 135, 3 (2014) 1278–1293. https://doi.org/10.1121/1.4863196.
- H. Ziegelwanger, P. Majdak, W. Kreuzer: Numerical calculation of listener-specific head-related transfer functions and sound localization: Microphone model and mesh discretization. The Journal of the Acoustical Society of America 138, 1 (2015) 208–222. https://doi.org/10.1121/1.4922518.
- N.L. Aaronson, W.M. Hartmann: Testing, correcting, and extending the Woodworth model for interaural time difference. The Journal of the Acoustical Society of America 135, 2 (2014) 817–823. https://doi.org/10.1121/1.4861243.
- D. Romblom, H. Bahu: Blockhead: A simple geometric head model, in Audio Engineering Society Conference: 2019 AES International Conference on Headphone Technology, Aug. 2019 [Online]. Available: http://www.aes.org/e-lib/browse.cfm?elib=20502.
- F. Denk, B. Kollmeier: The Hearpiece database of individual transfer functions of an in-the-ear earpiece for hearing device research. Acta Acustica 5 (2021) 2. https://doi.org/10.1051/aacus/2020028.
- A.H. Moore, J.M. de Haan, M.S. Pedersen, P.A. Naylor, M. Brookes, J. Jensen: Personalized signal-independent beamforming for binaural hearing aids. The Journal of the Acoustical Society of America 145, 5 (2019) 2971–2981. https://doi.org/10.1121/1.5102173.
- F. Pausch, Z.E. Peng, L. Aspöck, J. Fels: Speech perception by children in a real-time virtual acoustic environment with simulated hearing aids and room acoustics, in 22nd International Congress on Acoustics: ICA 2016, Invited paper, Buenos Aires, Catholic University of Argentina: Asociacion de Acusticos Argentinos, Sep. 5, 2016. [Online]. Available: http://www.ica2016.org.ar/ica2016proceedings/ica2016/ICA2016-0431.pdf.
- A.D. Brown, F.A. Rodriguez, C.D.F. Portnuff, M.J. Goupell, D.J. Tollin: Time-varying distortions of binaural information by bilateral hearing aids: effects of nonlinear frequency compression. Trends in Hearing 20 (2016) 2331216516668303. https://doi.org/10.1177/2331216516668303.
- F.P. Itturriet, M.H. Costa: Perceptually relevant preservation of interaural time differences in binaural hearing aids. IEEE/ACM Transactions on Audio, Speech, and Language Processing 27, 4 (2019) 753–764. https://doi.org/10.1109/TASLP.2019.2895973.
- T.J. Klasen, T. Van den Bogaert, M. Moonen, J. Wouters: Binaural noise reduction algorithms for hearing aids that preserve interaural time delay cues. IEEE Transactions on Signal Processing 55, 4 (2007) 1579–1585. https://doi.org/10.1109/TSP.2006.888897.
- T. Van den Bogaert, J. Wouters, T. Klasen, M. Moonen: Distortion of interaural time cues by directional noise reduction systems in modern digital hearing aids, in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005, 57–60. https://doi.org/10.1109/ASPAA.2005.1540167.
- H. Husstedt, A. Mertins, M. Frenz: Evaluation of noise reduction algorithms in hearing aids for multiple signals from equal or different directions. Trends in Hearing 22 (2018) 2331216518803198. https://doi.org/10.1177/2331216518803198.
- M. Jeub, M. Schäfer, T. Esch, P. Vary: Model-based dereverberation preserving binaural cues. IEEE Transactions on Audio, Speech, and Language Processing 18, 7 (2010) 1732–1745. https://doi.org/10.1109/TASL.2010.2052156.
- F. Brinkmann, A. Lindau, S. Weinzierl: On the authenticity of individual dynamic binaural synthesis. The Journal of the Acoustical Society of America 142, 4 (2017) 1784–1795. https://doi.org/10.1121/1.5005606.
- A.W. Mills: On the minimum audible angle. The Journal of the Acoustical Society of America 30, 4 (1958) 237–246. https://doi.org/10.1121/1.1909553.
- D.R. Perrott, S. Pacheco: Minimum audible angle thresholds for broadband noise as a function of the delay between the onset of the lead and lag signals. The Journal of the Acoustical Society of America 85, 6 (1989) 2669–2672. https://doi.org/10.1121/1.397764.
- D. Schröder: Physically based real-time auralization of interactive virtual environments. Ph.D. dissertation, Institute of Technical Acoustics, RWTH Aachen University, Germany. 2011.
- A. Lindau, H.-J. Maempel, S. Weinzierl: Minimum BRIR grid resolution for dynamic binaural synthesis. The Journal of the Acoustical Society of America 123, 5 (2008) 3498–3498. https://doi.org/10.1121/1.2934364.
- F. Pausch, S. Doma, J. Fels: IHTA-indHARTF – database of individual behind-the-ear hearing-aid-related transfer functions with high spatial resolution, Institute for Hearing Technology and Acoustics, RWTH Aachen. 2022. https://doi.org/10.18154/RWTH-2022-04267.
- The AMT Team: The auditory modeling toolbox full package (version 1.x), 2022. [Online]. Available: https://sourceforge.net/projects/amtoolbox/files/AMT%201.x/amtoolbox-full-1.2.0.zip/download.
- W. Hartmann, E. Macaulay: Anatomical limits on interaural time differences: An ecological perspective. Frontiers in Neuroscience 8 (2014) 34. https://doi.org/10.3389/fnins.2014.00034.
- EU: European Union, Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation). Official Journal L110 59 (2016) 1–88.
- AES69-2020: AES69-2020, AES Standard for file exchange – Spatial acoustic data file format, Audio Engineering Society, Inc., New York, NY, USA, Standard AES69-2020, Dec. 6, 2020. p. 2020.
- F. Wefers: OpenDAFF – Ein freies quell-offenes Software-Paket für richtungsabhängige Audiodaten, in Fortschritte der Akustik : 36. Deutsche Jahrestagung für Akustik; 15.–18. März 2010, Dt. Ges. für Akustik, Berlin. 2010. [Online]. Available: http://publications.rwth-aachen.de/record/118694.
- L.L. Beranek, H.P. Sleeper: The design and construction of anechoic sound chambers. The Journal of the Acoustical Society of America 18, 1 (1946) 140–150. https://doi.org/10.1121/1.1916351.
- J.-G. Richter: Fast measurement of individual head-related transfer functions. Ph.D. Dissertation, RWTH Aachen University. 2019. https://doi.org/10.18154/RWTH-2019-04006.
- J. Richter, J. Fels: On the influence of continuous subject rotation during high-resolution head-related transfer function measurements. IEEE/ACM Transactions on Audio, Speech, and Language Processing 27, 4 (2019) 730–741. https://doi.org/10.1109/TASLP.2019.2894329.
- P. Majdak, P. Balazs, B. Laback: Multiple exponential sweep method for fast measurement of head-related transfer functions. Journal of the Audio Engineering Society 55, 7/8 (2007) 623–637 [Online]. Available: http://www.aes.org/e-lib/browse.cfm?elib=14190.
- P. Dietrich, B. Masiero, M. Vorländer: On the optimization of the multiple exponential sweep method. Journal of the Audio Engineering Society 61, 3 (2013) 113–124 [Online]. Available: http://www.aes.org/e-lib/browse.cfm?elib=16672.
- M. Berzborn, R. Bomhardt, J. Klein, J.-G. Richter, M. Vorländer: The ITA-Toolbox: An open source MATLAB toolbox for acoustic measurements and signal processing, in 43th Annual German Congress on Acoustics, 6 Mar 2016 – 9 Mar 2017, Kiel (Germany). 2019, pp. 222–225.
- F. Denk, J. Heeren, S.D. Ewert, B. Kollmeier, S.M. Ernst: Controlling the head position during individual HRTF measurements and its effect on accuracy, Fortschritte der Akustik-DAGA, Kiel, Germany. 2017.
- D.R. Begault, M. Godfroy, J.D. Miller, A. Roginska, M. Anderson, E.M. Wenzel: Design and verification of HeadZap, a semi-automated HRIR measurement system, in AES 120th Convention, 2006 May 20–23, Paris, France, 2006: 1–19.
- T. Hirahara, H. Sagara, I. Toshima, M. Otani: Head movement during head-related transfer function measurements. Acoustical Science and Technology 31, 2 (2010) 165–171.
- O. Kirkeby, P.A. Nelson, H. Hamada, F. Orduna-Bustamante: Fast deconvolution of multichannel systems using regularization. IEEE Transactions on Speech and Audio Processing 6, 2 (1998) 189–194.
- S. Park, L. Linsen, O. Kreylos, J. Owens, B. Hamann: Discrete Sibson interpolation. IEEE Transactions on Visualization and Computer Graphics 12, 2 (2006) 243–253. https://doi.org/10.1109/TVCG.2006.27.
- J.C. Middlebrooks: Individual differences in external-ear transfer functions reduced by scaling in frequency. The Journal of the Acoustical Society of America 106, 3 Pt 1 (1999) 1480–1492. https://doi.org/10.1121/1.427176.
- A. Andreopoulou, B.F.G. Katz: Identification of perceptually relevant methods of inter-aural time difference estimation. The Journal of the Acoustical Society of America 142, 2 (2017) 588–598. https://doi.org/10.1121/1.4996457.
- M. Aussal, F. Alouges, B. Katz: HRTF interpolation and ITD personalization for binaural synthesis using spherical harmonics, in Audio Engineering Society Conference: UK 25th Conference: Spatial Audio in Today’s 3D World, Mar. 2012. [Online]. Available: http://www.aes.org/e-lib/browse.cfm?elib=18111.
- J.-G. Richter, G. Behler, J. Fels: Evaluation of a fast HRTF measurement system, in Audio Engineering Society Convention 140, Audio Engineering Society. 2016.
- E.H.A. Langendijk, A.W. Bronkhorst: Contribution of spectral cues to human sound localization. The Journal of the Acoustical Society of America 112, 4 (2002) 1583–1596. https://doi.org/10.1121/1.1501901.
- M.D. Abràmoff, P.J. Magalhães, S.J. Ram: Image processing with ImageJ. Biophotonics International 11, 7 (2004) 36–42.
- T. Wei, V. Simko: R package “corrplot”: visualization of a correlation matrix (Version 0.84). 2017 [Online]. Available: https://github.com/taiyun/corrplot.
- J.W. Kling, L.A. Riggs: Woodworth & Schlosberg’s experimental psychology. 1971.
- R Core Team: Language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. 2021 [Online]. Available: https://www.R-project.org/.
- RStudio Team: RStudio: integrated development environment for R, RStudio, PBC, Boston, MA. 2021 [Online]. Available: http://www.rstudio.com/.
- R.V.L. Hartley, T.C. Fry: The binaural location of pure tones. Physical Review 18 (1921) 431–442. https://doi.org/10.1103/PhysRev.18.431.
- D.S. Brungart, W.M. Rabinowitz: Auditory localization of nearby sources. Head-related transfer functions. The Journal of the Acoustical Society of America 106, 3 (1999) 1465–1479. https://doi.org/10.1121/1.427180.
- J.C. Nash: nlmrt: Functions for nonlinear least squares solutions. R package version 2016.3.2. 2016. [Online]. Available: https://CRAN.R-project.org/package=nlmrt.
- D.C. Montgomery, E.A. Peck, G.G. Vining: Introduction to linear regression analysis, John Wiley & Sons. 2021.
- P. Mair, R. Wilcox: Robust statistical methods in R using the WRS2 package. Behavior Research Methods 52 (2020) 464–488.
- R.R. Wilcox: Improved simultaneous confidence intervals for linear contrasts and regression parameters. Communications in Statistics – Simulation and Computation 15, 4 (1986) 917–932. https://doi.org/10.1080/03610918608812552.
- H.S. Braren, J. Fels: A high-resolution individual 3D adult head and torso model for HRTF simulation and validation: HRTF measurement, 2020. Published under Creative Commons Attribution 4.0 License. https://doi.org/10.18154/RWTH-2020-06761.
- A. Mäkivirta, M. Malinen, J. Johansson, V. Saari, A. Karjalainen, P. Vosough: Accuracy of photogrammetric extraction of the head and torso shape for personal acoustic HRTF modeling, in Audio Engineering Society Convention 148, May 2020 [Online]. Available: http://www.aes.org/e-lib/browse.cfm?elib=20740.
- R.G. Klumpp, H.R. Eady: Some measurements of interaural time difference thresholds. The Journal of the Acoustical Society of America 28, 5 (1956) 859–860. https://doi.org/10.1121/1.1908493.
- W.A. Yost: Discriminations of interaural phase differences. The Journal of the Acoustical Society of America 55, 6 (1974) 1299–1303. https://doi.org/10.1121/1.1914701.
- S. Thavam, M. Dietz: Smallest perceivable interaural time differences. The Journal of the Acoustical Society of America 145, 1 (2019) 458–468. https://doi.org/10.1121/1.5087566.
- L.S.R. Simon, A. Andreopoulou, B.F.G. Katz: Investigation of perceptual interaural time difference evaluation protocols in a binaural context. Acta Acustica united with Acustica 102, 1 (2016) 129–140. https://doi.org/10.3813/AAA.918930.
- S. Klockgether, S. van de Par: Just noticeable differences of spatial cues in echoic and anechoic acoustical environments. The Journal of the Acoustical Society of America 140, 4 (2016) EL352–EL357. https://doi.org/10.1121/1.4964844.
- N.J. Spencer, M.L. Hawley, H.S. Colburn: Relating interaural difference sensitivities for several parameters measured in normal-hearing and hearing-impaired listeners. The Journal of the Acoustical Society of America 140, 3 (2016) 1783–1799. https://doi.org/10.1121/1.4962444.
- C. Geetha, R.R. Rajan, K. Tanniru: A review of the performance of wireless synchronized hearing aids. Journal of Hearing Science 5, 4 (2015) 9–12. https://doi.org/10.17430/895179.
- P. Derleth, E. Georganti, M. Latzel, G. Courtois, M. Hofbauer, J. Raether, V. Kuehnel: Binaural signal processing in hearing aids. Seminars in Hearing 42 (2021) 206–223. Thieme Medical Publishers, Inc. https://doi.org/10.1055/s-0041-1735176.
- D.T. Lawson, B.S. Wilson, M. Zerbi, C. van den Honert, C.C. Finley, J.C. Farmer Jr, J.T. McElveen Jr, P.A. Roush: Bilateral cochlear implants controlled by a single speech processor. American Journal of Otology 19, 6 (1998) 758–761.
- T. Francart, J. Brokx, J. Wouters: Sensitivity to interaural time differences with combined cochlear implant and acoustic stimulation. Journal of the Association for Research in Otolaryngology 10, 1 (2009) 131–141. [CrossRef] [PubMed] [Google Scholar]
- V. Larcher, J.-M. Jot: Techniques d’interpolation de filtres audio-numériques, Application à la reproduction spatiale des sons sur écouteurs, in Proc. CFA: Congrès Français d’Acoustique, Citeseer. 1997.
- L. Savioja, J. Huopaniemi, T. Lokki, R. Väänänen: Creating interactive virtual acoustic environments. Journal of the Audio Engineering Society 47, 9 (1999) 675–705 [Online]. Available: http://www.aes.org/e-lib/browse.cfm?elib=12095.
- P. Majdak, C. Hollomey, R. Baumgartner: AMT 1.x: A toolbox for reproducible research in auditory modeling, Acta Acustica 6 (2022) 19. https://doi.org/10.1051/aacus/2022011.
- G.A. Seber, A.J. Lee: Linear regression analysis, John Wiley & Sons. 2012.
Cite this article as: Pausch F., Doma S. & Fels J. 2022. Hybrid multi-harmonic model for the prediction of interaural time differences in individual behind-the-ear hearing-aid-related transfer functions. Acta Acustica, 6, 34.
All Tables
Table 1 Comparison of previously measured individual HARTF databases with the presented database.
Table 2 Statistical summary of the results on measured and calculated anthropometric and HA features.
Table 3 Reproducibility of selected anthropometric and HA features from participant 18 using manual feature extraction. The inter- and intra-individual results are based on feature extractions performed by six different participants, and three times by one of the six participants, respectively.
Table 4 Broadband ITD-model prediction errors, quantified by MAEs and RMSEs, averaged across participants. Bivariate correlation coefficients ρ quantify the correlation strength between measurement-based and model-based ITD maxima, further described by the corresponding linear-regression equations, κ · ITDmeas + λ, and adjusted coefficients of determination R². The last two columns present model-prediction errors regarding ITD maxima and the differences in the arguments of the ITD maxima.
Table 5 rANOVA table with degrees of freedom (df), Fisher-ratios (F), p-values, and explanatory measures of effect size ξ. The -statistics of linear contrasts are reported with corrected p-values given multiple comparisons. The indices of Δ correspond to the ITD-model variants.
Table 6 Effect of feature variations on MAEs and RMSEs in the ITD predictions of Model 3. The errors were calculated by evaluating the predictions of Model 3 using inter- and intra-individual variations when extracting the selected anthropometric and HA features from participant 18 (female) multiple times, see Table 3, in comparison to measurement-based ITD estimations. Deviations with respect to the mean MAEs and RMSEs, cf. Table 4, are indicated by Δμ, with percentage deviations provided in brackets.
All Figures
Figure 1 Detail of the technical drawing showing the HA prototype, with front and rear HA microphones coloured in blue. (Original drawing released with permission of GN ReSound and adapted.)
Figure 2 Graphical user interface providing real-time feedback to assist in correcting misalignment with the participant’s target pose.
Figure 3 Absolute (black) and relative (grey) pose errors with respect to the target pose and mean participant pose, respectively. Boxplots represent median (Mdn) values (horizontal lines) and interquartile ranges (IQRs), with whiskers covering 1.5 times the IQR.
Figure 4 Placement and fixation of the in-ear microphones, the research HAs and the head-mounted rigid body. Two cross-line lasers were simultaneously used for optical validation of the participant’s target pose.
Figure 5 Comparison of measurement-based mean ITDs between datasets in the horizontal plane, evaluated in a frequency range of 0.5–1.5 kHz and averaged across participants with standard deviations σ (grey areas). Dashed and dotted lines show the mean ITD differences in front (F) and rear (R) HARTFs, respectively, compared to the mean ITDs in HRTFs.
Figure 6 Schematic side and top views of the right ear and the head, respectively, showing ear-related and HA features, and the HA device in light blue with blue dots representing the HA microphone positions. The auxiliary parameter ζ is defined in equation (A2). Figure adapted from [19].
Figure 7 Visualisation of the upper correlation matrix between the features in Table 2, showing bivariate correlations that are significant at the 95% confidence level.
Figure 8 Effect of the complexity in Model 3 on mean ITD-prediction errors, averaged across participants, in HRTFs, and front (F) and rear (R) HARTFs. In comparison to i) Model 1, Model 3 was evaluated with ii) γ2n = γ3n = 0, iii) γ3n = 0, and iv) all coefficients, each optimised based on feature subsets that were scaled using polynomial-regression weights (P = 4). Error bars represent standard deviations.
Figure 9 (a) Measurement-based (light grey) and (b), (c) model-based broadband ITD estimations, colour-coded in grey, black and blue, respectively, evaluated for directions in the horizontal plane for HRTF and front (F) and rear (R) HARTF datasets. Deviations from measurement-based ITDs are shown as dashed-dotted (Models 1 and 3) or dotted lines (Model 2+), with standard deviations as shaded areas. (d) Scatter plots comparing measurement-based and model-based ITD maxima, fitted by linear regression lines. Box plots show medians and IQRs of differences in (e) ITD maxima and (f) arguments of the ITD maxima, with whiskers covering 1.5 times the IQR, and outliers being displayed as crosses. Horizontal black lines indicate non-significant (n.s.) mean differences at the 95% confidence level.
Figure 10 Results of frequency-selective measurement-based ITD estimations (solid light grey curves) and ITD predictions of (a) Model 1 (dotted grey curves), (b) Model 2+ (dotted black curves) and (c) Model 3 (dotted blue curves), evaluated for the HRTF and front (F) and rear (R) HARTF datasets per participant. The measurement-based ITDs were estimated in frequency bands of ±100 Hz around the corresponding centre frequencies, displayed on the right ordinate. Green and red horizontal and vertical error bars indicate μ ± σ of the arguments of the ITD maxima and ITD maxima of measurement-based and model-based ITD estimations, respectively.
Figure 11 Effect on the ITD predictions of Model 3 given (a) inter-individual and (b) intra-individual variations in feature subsets, whose elements were repeatedly extracted from participant 18 (female). Solid and dotted lines indicate measurement-based and model-based ITD estimations and predictions, respectively, in HRTFs, and front (F) and rear (R) HARTFs.