The Hearpiece database of individual transfer functions of an in-the-ear earpiece for hearing device research

We present a database of acoustic transfer functions of the Hearpiece, a commercially available multi-microphone multi-driver in-the-ear earpiece for hearing device research. The database includes HRTFs for 87 incidence directions as well as responses of the drivers, all measured at the four microphones of the Hearpiece as well as the eardrum in the occluded and open ear. The transfer functions were measured in both ears of 25 human participants and a KEMAR with anthropometric pinnae for five insertions of the device. We describe the measurements of the database and analyse derived acoustic parameters of the device. All regarded transfer functions are subject to differences between participants and insertions. Also, the KEMAR measurements are close to the median of human data in the present results for all assessed transfer functions. The database is a rich basis for development, evaluation and robustness analysis of multiple hearing device algorithms and applications. It is openly available at https://doi.org/10.5281/zenodo.3733190.


Introduction
Development and evaluation of hearing devices like hearing aids or hearables, and of appropriate algorithms, is greatly facilitated by simulations. It is well understood that realistic simulations are required to obtain results that reflect the real behaviour of hearing devices in the ear of a user [1,2]. To simulate input signals of hearing devices, signals can be convolved with appropriate Head-Related Transfer Functions (HRTFs) that describe the acoustic free-field transmission to the hearing device microphone from a certain incidence direction. Several researchers have presented measurements of hearing device HRTFs [2-5], and it has been shown that there are significant differences in the HRTFs between hearing device styles and microphone positions, as well as differences and perceptual consequences with respect to HRTFs measured in the unobstructed ear [6-8]. Also, the differences between individuals and the implications for designing signal processing algorithms have been recognized [3,9]. One limitation of existing hearing device HRTF datasets is that, while they are well suited to studying the theoretical performance of algorithms, most authors used custom devices that are not available to other researchers. This means that others would have to build their own devices from the (often sparse) documentation in order to transfer their developments to real-time devices that are usable in the field.
Several other transfer functions related to hearing devices are crucial for their real-ear performance. However, they seem to be rarely addressed in the current literature, and the authors are not aware of public datasets of hearing device HRTFs that also include the driver responses, feedback paths, or transfer functions to the occluded eardrum. For instance, the responses of the driver(s) at the eardrum determine the sound that is perceived by the user, and recently several researchers tackled the problem of individualized equalization of the presented sound [10-12]. Also, the feedback paths, i.e., the responses of the drivers at the device's microphones, are a factor that may greatly affect the performance of hearing devices, especially if gain is to be provided [13]. Furthermore, many devices feature a loose, vented or completely open fit, allowing significant sound energy from external sound sources to enter the ear canal directly without being processed by the device. Usually, the tightness of fit also interacts with the driver responses and feedback paths [14,15].
We present a database of all linear transfer functions of the Hearpiece, an in-the-ear earpiece with wired transducers for hearing device research introduced in 2019 that is commercially available [16]. The database contains the HRTFs from 87 directions to the four microphones of the Hearpiece as well as the eardrum, both in the open ear and the ear occluded by the passive device. In addition, it contains the responses of the two drivers at the eardrum as well as at the microphones integrated in the device (i.e., feedback paths). The transfer functions were measured in both ears of 25 adult participants and a KEMAR with anthropometric pinnae (G.R.A.S. 45BB-12 [17]) for each of five insertions and variations of the sound field by placing a telephone near the ear. Furthermore, the between-device variation was assessed for 10 pairs of the Hearpiece that were either vented or completely occluded the ear. The database thus amounts to 169,878 HRTFs and 5740 driver responses.
This work includes a description of the conducted measurements and the database, as well as an evaluation of derived acoustic parameters of the Hearpiece. While the general properties of the device were already analyzed in [16], data analyses in this work mainly focus on A) the differences between participants and devices from a series, B) the variations of the transfer functions with reinsertions, and C) to what degree measurements in KEMAR are suitable to capture the acoustic properties of such an in-ear hearing device in a median human ear. One special feature of the Hearpiece is a microphone in the ear canal, which allows the implementation of novel algorithms in the field of individualized equalization, feedback cancellation, or active noise/occlusion control [11,18-20]. These approaches require knowledge, estimation or modelling of the relative transfer function from the In-Ear microphone to the eardrum, which is referred to in the following as the Residual Ear Canal Transfer Function (RECTF). Since such relative transfer functions and their properties have only been assessed very sparsely before [21], the RECTF is also given special consideration here.
The database is made openly available. It can be broadly applied in research and development of in-the-ear type hearing devices, which have gained popularity with the advent of hearables and other hearing devices targeted at (near-to) normal-hearing users in different applications [22]. For such devices in general, the database allows researchers to develop and evaluate algorithms for applications like sound equalization [11,18], feedback cancellation [19,23], active noise and occlusion cancellation [20,22,24], electro-acoustic modelling [25,26] and beamforming [27]. In particular, the database allows the robustness of algorithms against between-participant and between-insertion variability to be evaluated. Moreover, algorithm developments based on the database can be directly transferred to portable real-time prototypes due to the joint availability and compatibility of this database, the Hearpiece [16], and portable signal processing platforms [28,29].

Earpiece and measurements in the ear
A photograph of the Hearpiece and its schematic layout are shown in Figure 1; the detailed geometry is documented in [16,30]. The Hearpiece includes two Balanced-Armature drivers and four microphones in each side that are contained in an acrylic shell with a generic fit. Its outer shape was derived from the earphone "ProPhile" by InEar, and fits into about 90% of human ears according to the manufacturer [31] and our own experience with approx. 100 users. The shape is similar to individualized in-the-ear hearing aids and many current hearables, and sits shallow in the cavum concha. An optimized fit is achieved using exchangeable silicone domes in four sizes; the size selected for each participant is given with the database.
Both drivers and two of the microphones are distributed along the included vent with a cross-section of approx. 1.5 mm² and a length of 19 mm, where the two microphones are positioned approx. 2.5 mm from the inner and outer ends (referred to as In-Ear and Outer Vent microphones, respectively). The two drivers couple into the vent independently with a relative distance of approx. 3.1 mm and are referred to as inner and outer driver. Two more microphones are located on the faceplate with a distance of 12.5 mm, one near the position of the ear canal entrance (Entrance microphone) and one in the rear part (Concha microphone). The drivers are balanced armature drivers of two different types (inner: Knowles WBFK-30042, outer: FK-26768), while all microphones are MEMS microphones of the same type (Knowles SPH1642HT5H-1). The Hearpiece is available as both a vented and a closed version, where in the closed version the outer part of the vent is occluded and the Outer Vent microphone is omitted. The main body of the data presented here regards the vented version; for differences of the closed version the reader is referred to [16]. The transducers were connected to the measurement system through a custom adaptor and amplifier box without implementing any real-time processing. The custom transducer layout facilitates several tailored applications like automatic individual equalization using either one or both drivers [11,18,32] and feedback suppression using multiple microphones or drivers [19,23] that make use of electro-acoustic models individualized to the user [25,26]. Furthermore, the In-Ear microphone opens up possibilities to implement active noise or occlusion cancellation [20,24] features.
In addition to the microphones integrated in the device, measurements were also conducted at the eardrum. To this end, an audiological probe tube connected to an Etymotics ER7C microphone was inserted into the ear canal until the participant reported contact with the eardrum, and then pulled back by a minimal amount. The device was then inserted on top of the probe tube. To minimize squeezing and movement of the probe tube, it was placed at the lower anterior corner of the ear canal entrance, led towards the eardrum on the lower side of the ear canal and out of the ear between tragus and anti-tragus. This positioning of the probe tube resulted in measuring close to the innermost corner in the ear canal, since the eardrum is usually slanted with respect to the ear canal surface. This location is close to the umbo and the maximum pressure on the eardrum [33], and was found to be most reproducible between participants with the practical constraints of the present measurement [34]. Insertion of the probe tube and the device was executed by an experienced hearing aid acoustician. Upon reinsertions of the device (cf. Sect. 2.3), the probe tube was kept in place as well as possible.

Individual participants and dummy head
Twenty-seven adult participants (age 21-60, median age 28.5, 13 females, 14 males) took part in the measurements. In two female participants, it was not possible to insert the Hearpiece properly since their cavum conchae were too small, leaving 25 participants in the database. Screening measurements and ear inspection assured that the participants had clinically normal hearing (hearing threshold better than 20 dB HL for frequencies up to 8 kHz, normal loudness perception), no audiological abnormalities and no large accumulations of cerumen in their ear canals. No participant wore jewellery in their ears during the measurements.
The participants comprised one author, employees of the University of Oldenburg (Department of Medical Physics and Acoustics) and RWTH Aachen University (Institute of Technical Acoustics) as well as paid volunteers. All participants signed a written informed consent, and the experiment was approved by the University of Oldenburg Ethics council. Some relevant data of the participants (sex, age, ear canal lengths and silicone dome sizes) are supplied with the database.
The measurements were also conducted in a GRAS KEMAR 45BB-12 dummy head with anthropometric pinnae and low-noise ear simulators [17]. The anthropometric pinnae facilitated realistic fitting of the in-ear device, which was not possible with the standard pinnae. Measurements and evaluation were conducted identically for KEMAR and the human participants except for the eardrum data, where KEMAR's ear simulators were utilized.

Apparatus and procedure
The measurements were conducted in the Virtual Reality Lab of Oldenburg University, which is an anechoic chamber with 94 Genelec 8030 loudspeakers installed in a 3D layout. Forty-eight loudspeakers were spaced uniformly in the horizontal plane, leading to an azimuth resolution of 7.5°. Further circles of loudspeakers were installed at ±30° and ±60° elevation with a horizontal resolution of 30° and 60°, respectively, as well as one loudspeaker each directly above and below the center. Eight further loudspeakers were installed in the median sagittal plane to achieve a vertical resolution of 15° in this plane above −30° elevation. The seven incidence directions at elevations of −60° could not be considered due to obstruction by a sitting platform, such that a total of 87 loudspeakers were used for the measurements. The loudspeakers were mounted at a distance of 2.5-3 m from the participant, and the spatial separation of woofer and tweeter (approx. 1.3°) can be neglected. All active loudspeakers together produced a noise floor of 17.1 dBA at the array centre, which lies below the self-noise of the utilized microphones (except KEMAR ear simulators) and is thus negligible.
Figure 2 shows a photograph of the setup with a participant during an experimental session. The participants were seated on a small metal grid platform covered by absorbers, with their legs wrapped in absorbing material to minimize reflections. To stabilize the head position during the course of the experiment, a small headrest was provided, as well as a graphical display giving feedback on the head position as described in our previous work [35]. The visual feedback utilizes head position data continuously recorded using a headtracker (Polhemus Patriot) mounted on top of the participant's head, and displays the corrections necessary to restore a reference head position and orientation. The visual feedback was displayed on a screen mounted below the loudspeaker in front of the participants (see Fig. 2), which they monitored during the measurements for guidance to (re-)adjust their head to the reference position. At the beginning of the experiment, the participants were positioned and oriented at the center of the loudspeaker array using crossed laser markers, and the reference head position was recorded. The head position and orientation with respect to the reference position are supplied with the database.
In the first step of the experimental procedure, the probe tubes were inserted into the open ears, and the participant was positioned and oriented in the center of the array. Second, the HRTF to the eardrum of the open ear was measured (see Sect. 2.4). Third, the device was inserted and the HRTF to all microphones of the device and the eardrum measured. The device was inserted by the hearing aid acoustician to minimize the risk of pushing the probe tube closer to the eardrum and to avoid squeezing the probe tube. Fourth, the responses of the device's drivers were measured (see Sect. 2.5), once with nothing close to the ear and once while the participant held a telephone (Galaxy S3 mini, size approx. 12 × 6 cm with glass surface, turned off) close to their right ear. Then, the device was taken out and steps 2-4 repeated for a total of five insertions of the device. Finally, the HRTF with the telephone held close to the right ear was measured both with the device inserted (no reinsertion after fifth round) and subsequently in the open ear. The participants were instructed to hold the telephone as they would normally do during phone calls. All measurements were conducted for both ears simultaneously.
The database was measured with the device pair with serial number DV-0001, except for additional measurements that assessed nine devices with serial numbers 0003-0011. These additional measurements of the between-device differences were gathered as described above but exclusively in KEMAR, with only three insertions and without the telephone nearby.

HRTF measurements and processing
The measurements and processing of the HRTFs were performed very similarly to our previous work [3]. HRTFs were measured for both ears and all 87 incidence directions where loudspeakers were installed, using exponential sweeps covering a range from 30 Hz to 22.05 kHz (= half the sampling rate of 44.1 kHz) with an individual length of 3.2 s. To speed up the measurements, the sweeps were overlapped in time using the multiple exponential sweeps technique [36], leading to an overall duration of 27 s. The average level of the sweeps was 81 dB SPL in the free field. For each round of measurements, the order of incidence directions was randomized independently.
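As an illustration of the excitation signal, the following minimal sketch generates a single exponential sweep with the parameters stated above (30 Hz to 22.05 kHz, 3.2 s, 44.1 kHz). The function name and the standard closed-form phase expression are assumptions of this sketch and not taken from the measurement software; the time-overlapping of sweeps across loudspeakers [36] is not reproduced here.

```python
import numpy as np

def exponential_sweep(f_start=30.0, f_stop=22050.0, duration=3.2, fs=44100):
    """Exponential sine sweep; the instantaneous frequency grows from f_start to f_stop."""
    n = int(round(duration * fs))
    t = np.arange(n) / fs
    rate = np.log(f_stop / f_start)  # natural log of the frequency ratio
    # Phase is the integral of 2*pi*f(t) with f(t) = f_start * exp(t / duration * rate)
    phase = 2.0 * np.pi * f_start * duration / rate * (np.exp(t / duration * rate) - 1.0)
    return np.sin(phase)

sweep = exponential_sweep()
```

The impulse response would then be obtained by deconvolving the recorded signal with this sweep, which is outside the scope of this sketch.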
From the raw impulse responses, acoustic reflections from equipment were removed by frequency-dependent truncation [37]. That is, frequency bands for frequencies >750 Hz in a short-time Fourier transform representation of the impulse responses (3.5 ms block length) were truncated to a length of 3.5 ms. Lower frequency bands contain less reflected energy due to the limited size of reflecting surfaces like loudspeakers, and the truncation lengths could be increased in lower frequency bands to avoid low-frequency errors (600 Hz: 6 ms, 300 Hz: 13 ms, 0 Hz: 90 ms).
Next, the responses were compensated for the influence of the loudspeakers (measured using a 1/8" reference microphone G.R.A.S. 46DP-1, acoustic reflections removed likewise) and microphone sensitivities by regularized spectral division. That is, the spectral division was only applied in a frequency range where significant signal energy was present (here 60 Hz to 21 kHz). Furthermore, a lower boundary of 30 dB below the broadband average was imposed on the divisor, which is the product of loudspeaker response and microphone sensitivity.
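The regularized spectral division can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing code used for the database: the function name is hypothetical, the "broadband average" is taken here as the mean linear magnitude of the divisor within the band, and bins outside 60 Hz to 21 kHz are simply left uncompensated.

```python
import numpy as np

def regularized_division(raw_ir, divisor_ir, fs=44100,
                         f_lo=60.0, f_hi=21000.0, floor_db=-30.0):
    """Divide the spectrum of raw_ir by that of divisor_ir within [f_lo, f_hi] only."""
    n = len(raw_ir)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    raw = np.fft.rfft(raw_ir)
    div = np.fft.rfft(divisor_ir, n)
    in_band = (freqs >= f_lo) & (freqs <= f_hi)
    mag = np.abs(div)
    # Assumption: lower bound is 30 dB below the mean in-band magnitude of the divisor
    floor = np.mean(mag[in_band]) * 10.0 ** (floor_db / 20.0)
    # Limit the divisor magnitude from below while keeping its phase
    limited = np.maximum(mag, floor) * np.exp(1j * np.angle(div))
    out = raw.copy()
    out[in_band] = raw[in_band] / limited[in_band]
    return freqs, out

# Usage with synthetic data: a flat divisor leaves the response unchanged
ir = np.zeros(512); ir[0] = 2.0
flat = np.zeros(512); flat[0] = 1.0
freqs, compensated = regularized_division(ir, flat)
```

Limiting the divisor rather than the quotient keeps deep notches in the loudspeaker or microphone response from being amplified into large peaks in the compensated result.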
For the microphones included in the device, one representative sensitivity that included the pre-amplifiers was utilized. To this end, the sensitivity of 6 identical microphones was determined in free-field measurements, and the sensitivity of the microphone with a magnitude response closest to the average was chosen. For each microphone of the device, an additional broadband gain was applied such that the resulting HRTFs measured in KEMAR reached the expected 0 dB around 100 Hz. For the two probe tube microphones, individual sensitivities to a free sound field were determined. The microphone sensitivities were measured with the microphones/probe tube inlets mounted in free space on a thin wooden pole facing a loudspeaker at 2.5 m distance, with respect to a 1/8" reference microphone mounted in the same manner. The resulting sensitivity of the probe tube microphones includes the propagation delay through the probe tube, hence this propagation delay is compensated through the spectral division.
Finally, the HRTFs were set to the expected 0 dB at frequencies below 60 Hz, shifted in time by 44 samples (approx. 1 ms) and truncated to a final length of 356 samples at 44.1 kHz sampling rate. The truncation length was manually determined as the minimal length where the impulse response truncation does not disturb the low-frequency end of the corresponding transfer function by more than ±1 dB.

Driver responses at the eardrum and feedback paths
The linear responses of the four drivers (two in each side) were measured sequentially, and for each driver simultaneously at the eardrum and all microphones of the device. An exponential sweep covering the frequency range from 30 Hz to 22.05 kHz (= half the sampling rate of 44.1 kHz) with a length of 2 s was employed. A frequency-dependent gain was applied to the sweep prior to playback to achieve a frequency-independent level of approx. 80 dB SPL equivalent to the free field at the eardrum [38]. Afterwards, the microphone sensitivities (same procedure as for HRTF) as well as the delay and sensitivity of the sound card were compensated, such that the impulse responses are stored in units of Pa/V. Finally, the impulse responses were truncated to 756 samples at 44.1 kHz, including 44 samples (= 1 ms) time shift as in the HRTF. Note that the longer length of driver responses as compared to the HRTF was found necessary due to the low-frequency behaviour of the driver responses.
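Since the driver impulse responses are stored in units of Pa/V, their magnitude responses can be expressed in dB SPL/V by referencing to 20 µPa. The following minimal sketch shows this conversion; the function name is hypothetical and the input is a synthetic placeholder, not data from the database.

```python
import numpy as np

P_REF = 20e-6  # reference sound pressure in Pa (0 dB SPL)

def driver_level_db_spl_per_volt(ir_pa_per_v, fs=44100):
    """Magnitude response of a driver impulse response (Pa/V) in dB SPL per volt."""
    freqs = np.fft.rfftfreq(len(ir_pa_per_v), d=1.0 / fs)
    magnitude = np.abs(np.fft.rfft(ir_pa_per_v))
    # Guard against log of zero for empty bins
    level = 20.0 * np.log10(np.maximum(magnitude, 1e-20) / P_REF)
    return freqs, level

# Synthetic placeholder: 756 samples with a unit pulse (1 Pa/V) after the 44-sample shift
ir = np.zeros(756)
ir[44] = 1.0
freqs, level = driver_level_db_spl_per_volt(ir)
```

For this placeholder, a flat response of 1 Pa/V corresponds to approx. 94 dB SPL/V at all frequencies.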

Results and analysis
In the following, example results that aim to represent the extensive database as well as possible are presented and discussed. Explicit sample data is shown for KEMAR and two representative human participants: one male participant where the device fit particularly well (ER03ED10) and one female participant where only a less reliable fit could be achieved (EL08RD06). In Sections 3.1-3.5, various acoustic parameters are assessed and shown for one sample device and insertion (DV-0001, third insertion into right ear). The variation of these parameters across human participants is evaluated and compared to measurements in KEMAR. Averages of responses in the human participants were computed from the magnitude responses in decibels. In Section 3.6, the variability of all transfer functions is evaluated. In Section 3.7, the variation of driver responses and feedback paths within each of eight vented and two closed pairs of devices measured in KEMAR is shown. Finally, Section 3.8 demonstrates a possible application scenario where real-time processing with different insertion gain settings is simulated for the device in a sample human ear. While aspects particular to the shown data are discussed with the presentation of the results, an overarching discussion is given in Section 4.

Head-related impulse responses and transfer functions
Figure 3 shows samples of the HRIR, the impulse response representation of HRTFs, measured for frontal incidence in the right ear of participant ER03ED10, for all microphone locations of interest as denoted at the right of the panel. First, small timing differences of up to 0.1 ms between microphones originate from the geometric propagation difference. It should be noted that the open eardrum HRIR was measured separately from the other responses, and the good temporal alignment between both measurements at the eardrum verifies the stability of the head position throughout the experimental session of better than ±1 cm [35]. Second, the HRIR at the eardrum of the open ear is considerably longer than the others, which is caused by an oscillation at the λ/4 resonance frequency of the ear canal. The level differences between microphone locations are caused by attenuation through insertion of the device (Eardrum, occluded) or due to (destruction of) such resonances of the open ear [3]. Third, in the HRIRs measured at the eardrum, additional acausal peaks are seen at around −0.3 ms in both responses, but not in the HRIRs measured in the microphones of the device or those of KEMAR. These peaks are not to be confused with the mild pre-ringing artefacts as present in the Outer Vent or Concha microphones. The acausal peaks very likely originate from a sound path leaking directly into the body of the probe tube microphone without travelling through the tube. This interpretation is consistent with the additional observation that the temporal alignment of this component with respect to the main response varies with incidence direction, and that it is not present in the driver responses (cf. Sect. 3.3). This component thus does not belong to the HRIR to be measured at the probe tube inlet, and can be interpreted as a disturbance similar to noise. As can be seen in Fig. 3, the acausal parts are most critical in the occluded eardrum data, where the main response is attenuated with respect to the open-ear case but the acausal part is constant in level. In the occluded eardrum responses, the additional energy of this disturbing component may impose a lower boundary on the measured HRTFs due to an effective decrease in SNR. Nevertheless, comparisons to KEMAR data (see following sections) and further analyses showed that up to about 10 kHz, the HRTFs measured at the occluded eardrum yield results that are as reliable as can be expected from probe tube measurements in the occluded ear (cf. Sect. 4.4). In the responses at the eardrum of the open ear, this disturbance is generally low enough in level to be neglected.
Figure 4 shows a direction-frequency representation of the HRTFs at the eardrum of the open ear and the Concha microphone after 1/12 octave smoothing, for the right ears of the three example participants. Somewhat altered but similar structures are seen in all participants for each microphone location. As expected, the HRTFs differ notably between microphone locations in each participant. The most prominent difference is an amplification around 2-10 kHz originating from ear cavity resonances, which is seen at the eardrum but not at the Concha microphone [3,39]. While the differences between left and right hemispheres are evident at both microphone locations, the eardrum HRTFs also contain more spatial dependences than those at the Concha microphone. These originate from directional pinna filtering effects, which are largely destroyed by inserting the device [6,7].

Insertion loss
Figure 5 shows the diffuse-field insertion loss, i.e., the attenuation of external sounds reaching the eardrum caused by inserting the passive device. The insertion loss was calculated by dividing the approximated diffuse-field responses at the occluded eardrum by the corresponding open-eardrum response. The diffuse-field responses were approximated from the HRTFs by calculating the power spectrum averaged across 47 uniformly distributed incidence directions (thus simulating uncorrelated summation, same directions as in [3]) after 1/12 octave smoothing of individual HRTFs. Each black line in Figure 5 denotes the result for one right ear of a human participant in the third insertion, the green line denotes the average across participants, and the orange line denotes the corresponding result in KEMAR.
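The diffuse-field insertion loss computation (power averaging across incidence directions, then the ratio of the occluded to the open-ear response) can be sketched as follows. This is a minimal illustration with hypothetical function names and synthetic placeholder data; the 1/12 octave smoothing of individual HRTFs is omitted, and the sign convention (negative values denote attenuation) is an assumption of this sketch.

```python
import numpy as np

def diffuse_field_power(hrirs, n_fft=512):
    """Power spectrum averaged across directions; hrirs has shape (n_directions, n_samples)."""
    spectra = np.fft.rfft(hrirs, n=n_fft, axis=-1)
    return np.mean(np.abs(spectra) ** 2, axis=0)

def insertion_loss_db(hrirs_occluded, hrirs_open, n_fft=512):
    """Ratio of occluded to open diffuse-field power in dB (negative = attenuation)."""
    p_occluded = diffuse_field_power(hrirs_occluded, n_fft)
    p_open = diffuse_field_power(hrirs_open, n_fft)
    return 10.0 * np.log10(p_occluded / p_open)

# Synthetic placeholder: 47 directions, 356 samples, one unit pulse per direction
hrirs_open = np.zeros((47, 356))
hrirs_open[np.arange(47), 44 + np.arange(47) % 5] = 1.0
hrirs_occluded = 0.1 * hrirs_open  # uniform 20 dB attenuation
il = insertion_loss_db(hrirs_occluded, hrirs_open)
```

Averaging the squared magnitudes rather than the complex spectra corresponds to the uncorrelated summation of the incidence directions mentioned in the text.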
The typical insertion loss curve approaches 0 dB at the low frequencies, showing that the vent allows low-frequency sounds to leak into the ear canal unattenuated. Around 400 Hz, an amplification of up to 5 dB is seen most prominently in KEMAR data, but also for the human participants. The amplification probably results from a Helmholtz resonance of the residual ear canal volume and vent opening. Above approx. 500 Hz, the attenuation increases for most participants up to about 30 dB around 4 kHz. Only in some participants, where only a poor fit could be achieved, the device does not attenuate sounds below 1-2 kHz. It should be noted that even poorer fits than included here occur in the database, and within some participants the fit varies significantly between insertions (see also Sect. 3.6). Between 4 and 10 kHz, the attenuation decreases again down to 10 dB, which might be caused by approaching the λ/2 resonance of the vent (length: 19 mm). Above 10 kHz, the insertion loss increases again for KEMAR measurements, but decreases further in the human participants (see also next paragraph). Apart from outliers with a very poor fit, the insertion loss in the human participants lies within a range of approx. ±7 dB around the average for frequencies >600 Hz.
Up to 10 kHz, the data from the human participants and KEMAR are very consistent, and KEMAR may be seen as a human participant where a particularly good fit could be achieved. In the human participants, the presence of the probe tube unavoidably prevents a tight seal between the ear canal and the silicone dome of the earpiece. Thus, in practice the low-frequency insertion loss in users may look even more like the KEMAR curve. Above 10 kHz, the results deviate consistently between human participants and KEMAR. The lower attenuation seen in the human participants could, on the one hand, be caused by utilizing KEMAR out of its intended frequency range [40]. On the other hand, we believe it is more likely that the apparently lower attenuation in the human participants is an artefact of a low SNR in this frequency region of the occluded eardrum measurements (cf. Sect. 3.1).

Driver responses at eardrum
Figure 6 shows the responses of both drivers of the device at the eardrum in separate panels. Generally, the responses differ between drivers, which is caused by using different driver types (cf. Sect. 2.1 and [16]). A low-frequency roll-off with cut-off frequencies varying between approx. 300 Hz and 1 kHz is seen in all curves, which is caused by incomplete sealing of the ear canal due to the vent and imperfect fit. A varying fit in the ears of different participants probably explains most of the between-participant variation below 1 kHz, as for the insertion loss. The tight seal that can be achieved in KEMAR also leads to the lowest cut-off here, at around 300 Hz; for most participants it lies at around 400 Hz. A resonance around 400 Hz can be seen in KEMAR data, but hardly in any of the participants (see also Sect. 3.6).
In the range of 2-6 kHz, the differences between participants are small (< ±5 dB with respect to average) and mostly comprise a broadband offset. This can be explained by the fact that the corresponding half wave lengths are still larger than the residual ear canal lengths, and the responses are largely governed by the properties of the drivers and the device. The broadband differences are probably a result of residual ear canal volumes that differ between participants. Beyond 6 kHz, the resonances in the individual residual ear canals (and potentially small shifts of the probe tube, cf. Sect. 4.4) come into play, which lead to shifted resonances and between-participant deviations of 30 dB and more.
The KEMAR data reflects a median response of the present human participants well across the full frequency range, and is very close (difference <2 dB except frequency range from 7 to 9 kHz) to the average above 800 Hz. As for the insertion loss (Sect. 3.2), it can be assumed that in human participants a better seal than in the presented data can be achieved in practice due to the absence of the probe tube. Therefore, the actual low-frequency response in human participants may be even closer to the KEMAR data than in the present results.

Feedback paths
Figure 7 shows feedback paths, i.e., the responses of the drivers at the device microphones. The relevant feedback paths are given by the responses measured from the inner and outer driver to the Concha and Entrance microphones, which are most relevant for incoming signal pickup [16]. Since the behaviour of these four responses is very similar, only the response from the inner driver to the Concha microphone is shown in Figure 7 as an example. Results with free-field conditions are shown in the top panel; conditions when a telephone was held near the right ear are shown in the bottom panel. For either condition, again the individual participants are denoted by black lines, the participants' average is denoted by green lines and the KEMAR result is denoted by orange lines in the corresponding line style. For reference, the response of the inner driver at the eardrum measured in KEMAR in the appropriate field condition is also shown as a purple line. The difference in magnitude between the response at eardrum and the feedback path determines the gain margin before instabilities occur.
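The level difference between the driver response at the eardrum and the feedback path, which governs the gain margin, can be computed per frequency as in the following minimal sketch. The function name and the synthetic inputs are hypothetical, and reading this difference directly as the maximum stable gain is a simplification that ignores the phase of the loop and any processing delay.

```python
import numpy as np

def level_difference_db(ir_eardrum, ir_feedback, fs=44100, n_fft=1024):
    """Eardrum response minus feedback path level in dB, per frequency bin."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    h_eardrum = np.abs(np.fft.rfft(ir_eardrum, n_fft))
    h_feedback = np.abs(np.fft.rfft(ir_feedback, n_fft))
    eps = 1e-20  # guards against division by zero and log of zero
    return freqs, 20.0 * np.log10((h_eardrum + eps) / (h_feedback + eps))

# Synthetic placeholder responses in Pa/V (756 samples as in the driver data)
ir_eardrum = np.zeros(756); ir_eardrum[44] = 1.0
ir_feedback = np.zeros(756); ir_feedback[50] = 0.01
freqs, margin = level_difference_db(ir_eardrum, ir_feedback)
worst_case_margin = margin.min()
```

Taking the minimum across frequency gives the most critical band, which is where a feedback canceller or gain limit would need to act first.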
The feedback path in free-field conditions is generally lower in level than the response at the eardrum by more than 30 dB up to 4 kHz, and about 20 dB at higher frequencies. The lower boundary of the feedback paths at around 60 dB SPL/V (around 50 dB SPL/V in KEMAR) is probably caused by noise in the measurements and caused by the necessity to limit the level reaching the participants' eardrum. It is expected that below 2 kHz, the downward slope of approx. 24 dB per octave observed between 2 and 4 kHz actually continues toward lower frequencies.
Placing a telephone near the ear generally results in a rather broadband amplification of the feedback paths of approx. 10 dB on average. However, the influence of the telephone is subject to large variations between participants of up to 30 dB around 6 kHz, presumably because the participants held the telephone in different ways.
The variation across participants for the free-field case is in the range of the between-participant variation of the driver responses at the eardrum (cf. Sect. 3.3). The KEMAR results lie well in the range of human participants data, and coincide with the present human average curve within ±2 dB between 2 and 10 kHz, and within ±10 dB above 10 kHz. In the condition with the telephone near the ear, the increase of the feedback path in KEMAR is rather high but still in the range of human data, presumably because the telephone was placed very close to the ear.

Residual ear canal transfer functions
Figure 8 shows the RECTF, the relative transfer function from the In-Ear microphone to the occluded eardrum, in the three example ears for external sound sources as well as the two drivers. The RECTF was computed by dividing the corresponding 1/12 octave smoothed magnitude responses at both microphone locations. It can generally be seen that the RECTF is only flat at 0 dB in the low frequencies below 400 Hz, and generally has to be considered when estimating the sound pressure at the eardrum using the In-Ear microphone [25]. The sign of the RECTF shows that the sound pressure level is mostly higher at the In-Ear microphone; at some frequencies this difference amounts to 30 dB and more.
Besides the dependence on frequency, considerable differences between sound sources are noted. That is, the RECTF is different between external sound sources and the two included drivers of the device. While the RECTF differs between external sound sources at different directions, it is very similar between the two drivers as sound sources. This deviation between sound sources is most prominent in a band between approx. 1.5 and 5 kHz. In the human participants the deviation between sound sources is also seen above 10 kHz, which may be an artefact of poor SNR (cf. next paragraph). The results are consistent with observations made in a previous prototype of the Hearpiece [21]; however, the underlying reason is still unclear. A discussion on the possible origin of this difference is provided in Section 4.3.
Figure 9 shows the RECTF for all human participants, their average and the KEMAR data (as already shown in Fig. 8) for diffuse-field incidence (upper panel) and the inner driver (lower panel). The results for the outer driver are very similar to the inner driver (cf. Fig. 8). The RECTF varies between participants above 1 kHz by up to ±10 dB with respect to the average for diffuse-field incidence, and within ±5 dB for the inner driver. For the inner driver, the between-participant differences increase to ±15 dB above approx. 6 kHz due to shifted resonances, similar to the driver response (cf. Sect. 3.3). The KEMAR data is very consistent with the human data up to 10 kHz for diffuse-field incidence, and across the whole frequency range for the drivers. The deviations between the KEMAR and human data for diffuse-field incidence above 10 kHz might again be explained by a low SNR at the probe tube microphone for external sound sources but not the internal drivers (see Sect. 3.1).
The hypothesis is supported by the observation that the between-participant differences are decreasing with increasing frequency in this range for diffuse-field incidence, but increasing for the inner driver as a sound source. Between-participant differences that are increasing with frequency as seen for the inner driver as sound source would generally be expected in this frequency range due to increasing differences between ear canal geometries.
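The RECTF computation described in this section (ratio of the 1/12 octave smoothed magnitude responses at the eardrum and at the In-Ear microphone) can be sketched as follows. This is a minimal illustration, not the authors' processing code: the function names are hypothetical, and the choice to smooth the power spectrum with a rectangular window on a logarithmic frequency axis is an assumption, since the text does not specify the smoothing method.

```python
import numpy as np

def smooth_fractional_octave(freqs, power, fraction=12):
    """Average power within a 1/fraction-octave band centred on each bin.

    Assumption: rectangular window, smoothing applied to |H|^2.
    """
    freqs = np.asarray(freqs, dtype=float)
    power = np.asarray(power, dtype=float)
    half_band = 2.0 ** (1.0 / (2 * fraction))  # half bandwidth as a frequency ratio
    out = np.empty_like(power)
    for i, fc in enumerate(freqs):
        mask = (freqs >= fc / half_band) & (freqs <= fc * half_band)
        out[i] = power[mask].mean()  # the bin itself is always inside the band
    return out

def rectf_db(freqs, h_eardrum, h_in_ear, fraction=12):
    """RECTF in dB: smoothed eardrum response divided by smoothed In-Ear response."""
    p_drum = smooth_fractional_octave(freqs, np.abs(h_eardrum) ** 2, fraction)
    p_mic = smooth_fractional_octave(freqs, np.abs(h_in_ear) ** 2, fraction)
    return 10.0 * np.log10(p_drum / p_mic)

# Synthetic check: the eardrum pressure is half the In-Ear pressure at all bins.
demo_freqs = np.linspace(100.0, 10000.0, 200)
demo_rectf = rectf_db(demo_freqs, np.full(200, 1.0), np.full(200, 2.0))
```

With this sign convention, negative values mean that the level at the In-Ear microphone is higher than at the eardrum, as observed for most frequencies in the present data.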

Variations between participants and insertions
This section presents results on the variation of the transfer functions assessed above. The left column of Figure 10 shows results for all five measured insertions in the three example ears used above, where each colour represents one participant and each line represents one insertion. Besides these explicit examples of variations, the standard deviations of transfer functions (calculated for the magnitudes in dB) across insertions and participants are shown in the right column of Figure 10. The cyan line shows the standard deviation across all 25 ears over frequency (for the third insertion only, standard deviation across individual data shown in the previous figures). The shaded light and dark blue areas show the 10%-90% inter-quantile range of between-participant standard deviations occurring when 5 and 15 participants are randomly drawn from the database, respectively. The inter-quantile ranges are based on bootstrapping all participants' right ear data with 100 unique resamplings. Note that the medians of standard deviation distributions across bootstrapped subsamples are identical to the standard deviation across all 25 participants. The red line denotes the median across all participants of the standard deviation between insertions, the shaded red area denotes the 10%-90% inter-quantile range across participants. The orange line denotes the standard deviation between insertions in KEMAR. Finally, the light purple line denotes the standard deviation across all 25 participants and five insertions. Only right ears were regarded for the analysis here for simplicity. Also, relative to the other variations assessed here, only small differences between the two ears of each participant were assumed.
For all transfer functions shown, the standard deviation between participants is larger than between insertions (see right column of Fig. 10). Also, the standard deviation across combined participants and insertions (purple line) is identical or only marginally larger than the between-participant standard deviation determined for one insertion (cyan line). The inter-quantile range of the bootstrapped between-participant standard deviations further increases by a factor of about 2 for all transfer functions when the number of participants is reduced from 15 to 5. The inter-quantile range of standard deviations between 15 participants is already below 1 dB (below 2 dB for the insertion loss) in the relevant frequency ranges, and would further decrease for larger sample sizes. Hence, the sample size of 25 is large enough to estimate a representative between-participant standard deviation of the assessed metrics.
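The subsampling analysis described above can be illustrated with a short sketch. It is not the authors' analysis code: the function name, the random seed and the synthetic data are hypothetical, and drawing participants without replacement is an assumption (the text only states that 5 or 15 participants were randomly drawn in 100 unique resamplings; uniqueness of the draws is not enforced here).

```python
import numpy as np

def subsample_sd_range(mag_db, n_draw, n_resamples=100, seed=0):
    """10%-90% range of between-participant SDs over random subsamples.

    mag_db: array of shape (n_participants, n_freqs), magnitudes in dB.
    Returns (lower, upper), each of shape (n_freqs,).
    """
    rng = np.random.default_rng(seed)
    mag_db = np.asarray(mag_db, dtype=float)
    n_participants, n_freqs = mag_db.shape
    sds = np.empty((n_resamples, n_freqs))
    for r in range(n_resamples):
        # Assumption: each subsample contains distinct participants.
        idx = rng.choice(n_participants, size=n_draw, replace=False)
        sds[r] = mag_db[idx].std(axis=0, ddof=1)
    lower, upper = np.percentile(sds, [10, 90], axis=0)
    return lower, upper

# Synthetic stand-in for 25 right-ear magnitude responses at 64 frequencies.
demo = np.random.default_rng(1).normal(0.0, 3.0, size=(25, 64))
lo5, hi5 = subsample_sd_range(demo, n_draw=5)
lo15, hi15 = subsample_sd_range(demo, n_draw=15)
```

On such synthetic data the range `hi5 - lo5` is clearly wider than `hi15 - lo15`, mirroring the dependence on the number of participants reported for the measured transfer functions.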
The results for the HRTFs for a frontal source to the Concha microphone are shown in the top row of Fig. 10. The between-insertion standard deviation (right panel) is below 1 dB for the median across participants and below 2 dB for the extreme cases. This is consistent with the data seen in the example participants, where the maximum differences between insertions are below 1 dB except for one insertion in EL08RD06, where variations up to 5 dB below 10 kHz and up to 10 dB at higher frequencies are noted. This insertion represents an outlier that is also visible in the other transfer functions, as assessed below.
The results for the Insertion Loss are shown in the second row of Figure 10. The difference between insertions is a bit larger than for the HRTFs, but lies within ±3 dB for each participant (median standard deviation 1-1.5 dB), with the exception of the same poor insertion in participant EL08RD06 as discussed above. There, one poor insertion results in a decreased insertion loss and deviations up to 20 dB to the other insertions in this participant. The standard deviation between participants lies at 4-6 dB in a broad frequency range above 500 Hz, where the insertion loss is negative (cf. Fig. 5), and goes towards zero at lower frequencies.
The results for the responses of the inner driver are shown in the third row of Figure 10. Reinsertion typically causes only small variations below 8 kHz of less than 1 dB in the participants' median, and within ±2 dB in the example data except the outlier in EL08RD06. This outlier includes deviations greater than 10 dB with respect to the other insertions, especially at low frequencies. A considerable insertion variability of the driver response above 8 kHz is seen in the human participants but is less pronounced in KEMAR. The standard deviation between participants is approx. 2 dB between 1 and 6 kHz, and increases both at lower frequencies (approx. 4 dB below 500 Hz) and higher frequencies (up to 10 dB).
The results for the feedback paths (inner driver to Concha microphone) are shown in the fourth row of Figure 10. A generally very low reinsertion variability of smaller than ±2 dB in the sample data and smaller than 1 dB in the median standard deviation is seen above 1 kHz up to the high frequency end, where the SNR of the measurement is sufficient (cf. Sect. 3.4). Again, the outlier EL08RD06 where only a poor fit was achieved on one insertion leads to differences of up to 10 dB to the other insertions. However, it should be noted that the change in the feedback path due to poor fitting in this participant is much smaller compared to the effect on the insertion loss or driver response. Also the between-participant standard deviation is smaller than in the driver responses and lies between 2 and 4 dB above 1 kHz. Below 1 kHz, the high standard deviations between both participants and insertions are probably caused by the poor SNR (cf. Sect. 3.4).
Finally, the results for the RECTF for the inner driver are shown in the bottom row of Figure 10. Up to 6 kHz, only very small variations between insertions (smaller than 1 dB median standard deviation) are noted. The exception is again the single poor insertion of the device in participant EL08RD06, which causes deviations greater than 10 dB from the other measurements, similarly to the driver response. At high frequencies, insertion variabilities are very similar to those seen in the driver response, best seen for the medians of between-participant standard deviations but also in participant ER03ED10. The between-participant standard deviation is very similar to that of the driver response, with values around 2 dB below 6 kHz that increase to up to 10 dB at higher frequencies.

Inter-device variability
The top panel of Figure 11 shows the inter-device variation for the driver responses of a series of ten device pair samples measured in KEMAR. The sample included eight pairs of vented devices (light lines), as well as two pairs of closed devices (darker lines), both ears shown for the second insertion. The responses are very similar between devices, and the deviation can mostly be described as a broadband sensitivity variation not exceeding ±2 dB, except for one vented device where only a poor fit was achieved. The general differences between both drivers, as well as between open and closed devices are intended by design, further analyses regarding these differences are provided in [16].
The between-device differences of the feedback paths (inner driver to Concha microphone, as in Fig. 7) are shown in the bottom panel of Figure 11. Again, only small differences exist between devices (except between open and closed design) that are mostly in the range of the variation of the driver response.

Application example
Finally, an application example utilizing the database as described above is given. Real-time processing in a linear hearing device based on the Hearpiece including all sound paths to the eardrum (occluded response, device output including processing delay, feedback) was simulated for anechoic frontal sound incidence as depicted in Figure 12. Processing included a constant filter that was designed similar to [11] such that three different insertion gain curves as denoted in Figure 13 were provided. The insertion gains were chosen arbitrarily, but could be prescribed for a neutral setting (no amplification, often referred to as hear-through [10,12]), a mild and a moderate hearing loss (e.g., N2 and N3 standard audiograms [41]). A frequency-independent processing delay of 3.5 ms was assumed for the simulations, which was attributed to the driver response at eardrum as well as the feedback paths. Only the Concha microphone and the inner driver were included for sound pickup and reproduction, respectively.
As a basis, the transfer functions obtained in the right ear for participant ER03ED10 with the third insertion of the vented device with no telephone near the ear were utilized. To demonstrate the impact of transfer function variations, several parameters were varied. First, the set of transfer functions stated above was utilized both to compute the processing filter and simulate the hearing device ("Individual TFs"). Second, the general influence of feedback was assessed using the same transfer functions and filter, but with the feedback path set to 0 ("No Feedback"). Third, the influence of individualization was assessed by using appropriate transfer functions measured in KEMAR for calculating the processing filter, but simulating the device in the ear of the participant by using the same transfer functions as above ("KEMAR filter"). Fourth, the combined effect of insertion variability and placing a telephone next to the ear was assessed in the "Reinsertion + Telephone" condition. To this end, the processing filter was computed based on the individual transfer functions stated above, but the behaviour of the hearing device was simulated using transfer functions taken from the fifth insertion and with the telephone near the ear in the same participant.
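The simulated signal flow (occluded response, device output including processing delay, and feedback) can be sketched in the frequency domain as below. This is a minimal illustration under stated assumptions, not the simulation used for Figure 13: the function and argument names are hypothetical, the closed-loop expression is the standard linear feedback formula rather than one taken from the paper, and the processing filter `g_proc` is assumed to be given (its design following [11] is not reproduced here).

```python
import numpy as np

def aided_response(freqs, h_occ, h_mic, d_drum, f_fb, g_proc, delay_s=3.5e-3):
    """Pressure at the eardrum of the aided ear, relative to the source signal.

    h_occ:  external source to eardrum with the device inserted
    h_mic:  external source to the pickup microphone (HRTF)
    d_drum: driver to eardrum
    f_fb:   driver to the pickup microphone (feedback path)
    g_proc: processing filter without delay
    All transfer functions are complex arrays sampled at freqs (Hz).
    """
    freqs = np.asarray(freqs, dtype=float)
    delay = np.exp(-2j * np.pi * freqs * delay_s)  # frequency-independent delay
    forward = g_proc * delay                       # processing incl. delay
    loop_gain = forward * f_fb                     # open-loop transfer function
    # Driver input: U = forward * (h_mic * S + f_fb * U), solved for U / S.
    driver_in = forward * h_mic / (1.0 - loop_gain)
    return h_occ + d_drum * driver_in, loop_gain

# Scalar example at 0 Hz, where the delay term equals 1.
f0 = np.array([0.0])
with_fb, loop = aided_response(f0, 0.1, 1.0, 2.0, 0.1, 0.5)
no_fb, _ = aided_response(f0, 0.1, 1.0, 2.0, 0.0, 0.5)
```

Setting `f_fb` to zero corresponds to the "No Feedback" condition. Using transfer functions from another insertion for the simulation while keeping `g_proc` fixed corresponds to the reinsertion case. A loop gain magnitude below 1 at all frequencies is a sufficient (not necessary) condition for stability in this sketch.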
The results are shown in Figure 13. The top panel shows results with a flat 0 dB target insertion gain, i.e., a neutral "hear-through" setting that would let the user hear the environment similar to the open ear [10,12,32]. The resulting aided responses are very close to the open-ear response, except for a spectral ripple below approx. 2 kHz, which originates from interferences of the occluded response with the delayed device output [11]. The best match to the open-ear response is achieved in the Individual TF case. The difference to the No Feedback case is negligible, that is, no significant influence of feedback is noted with this insertion gain when only individual transfer functions are utilized. Calculating the processing filter based on KEMAR measurements (KEMAR filter) results in spectral deviations distributed across the full frequency range, with deviations up to 10 dB below 10 kHz but more than 20 dB at higher frequencies. A variation of transfer functions due to reinsertion and placing a telephone near the ear leads to spectral deviations similar to the KEMAR filter condition up to 10 kHz. Around 13 kHz, a high excess amplification of approx. 30 dB is noted in this condition that comes close to an instability, most probably because the presence of the telephone amplifies the feedback path (cf. Sect. 3.4).
The middle and bottom panels show aided responses including linear amplification by the insertion gains denoted by the green lines. The aided responses correspond well to the open-ear response with added insertion gains, up to residual mismatches that are well in line with the 0 dB gain setting discussed above. Particularly with the gain setting in the bottom panel, setting the feedback paths to zero changes the simulated aided response significantly. The feedback leads to a very rippled response around the peaks of the response, and further analyses verify that this gain setting is at the upper end of the stability region in all three simulation conditions including feedback. In the Reinsertion + Telephone condition, an excess amplification around 13 kHz yields a maximum of 81 dB in the aided response (out of limits of Fig. 13) which lies approx. 40 dB above the desired insertion gain. These results stress the importance of including variations that occur in practice into simulations of hearing devices and algorithm development.

Variations of transfer functions
All assessed transfer functions showed substantial variations between participants, between insertions in the same participant, and between devices from a series. Similar variations of transfer functions as presented here have been previously reported in the literature, although not in a comprehensive manner for one device as in the present work. The between-participant variation of HRTFs measured in the ear canal has been studied extensively and is usually larger than in the HRTFs of the present work [42]. However, it has been shown before that the between-participant differences decrease at hearing device microphones, due to partial destruction of pinna cues that are very individual [6]. Variations both between participants and insertions are also well-known for headphone transfer functions, and the variations of the driver responses seen here are consistent with previous literature on the issue [43].
Especially the results shown in Section 3.6 demonstrate that the variations of hearing device related transfer functions between participants are, on average, larger than those between insertions in one participant. The differences between devices were usually in the range of between-insertion variations seen with reliable fits. Certainly, differences between participants and devices inherently include differences that are usually seen between insertions, since the device is separately inserted in each ear. It is therefore no surprise that the between-participant variation does not significantly increase when five instead of one insertions are considered (cf. Fig. 10, right column). However, it cannot be neglected that the distributions of variations between participants and insertions are different. The insertion variabilities assessed by means of standard deviations across a small number of insertions within one participant are not necessarily representative for typical variations that can occur, since these may be governed by one outlier (cf. Fig. 10). While the variability between insertions with good fits is typically less than 3 dB, imperfect fits can have large effects on all assessed transfer functions and may thus impact the device performance dramatically. These effects and the associated robustness of algorithms can be studied well with the present database. To facilitate such analyses, a list describing the fits of each insertion across all participants is provided with the database.
Variations of the fit cause variations of the transfer functions across the whole frequency range, but are most pronounced in the low frequencies below 1 kHz [15]. A reduction in the tightness of fit introduces a leak between the device and the skin, and thus jointly leads to a smaller insertion loss (cf. Fig. 5), a poorer low-frequency reproduction with the balanced armature drivers (cf. Fig. 6), and variation of the feedback path, driver response and RECTF (see participant EL08RD06 in Fig. 10). A poor fit acts similarly to a vent, and comes in addition to the effect of a vent that is already included in the Hearpiece. Contrarily, effects of ear geometries are characterized by features that are structurally similar between participants, but shifted in frequency (e.g., HRTFs, driver response and RECTF in left column of Fig. 10) [3,44]. While the tightness of fit may vary both between participants in general, as well as between insertions inside one participant, differences due to the ear geometry can only occur between participants, but may influence the fit.
We were also able to demonstrate that the number of participants in the database is large enough to reliably estimate between-participant standard deviations. That is, the uncertainty of the between-participant standard deviation is already below 1 dB for most metrics when 15 participants are analyzed (cf. Fig. 10). According to the law of large numbers, this uncertainty should be further reduced by a factor of nearly $1/\sqrt{2}$ for the 25 participants in the database. Only for the insertion loss, larger uncertainties of the between-participant standard deviation are seen (approx. 2 dB between 1 and 4 kHz for 15 participants), which is probably caused by the larger between-insertion variability in this metric that also affects the between-participant variations. Altogether, we conclude that the 25 participants in this database are sufficient to estimate the typical between-participant variations with high accuracy, under the assumption that the 25 participants are representative for the whole population. It has to be stated here that the selection of participants excluded those with particularly small ears (cf. Sect. 2.2) where the Hearpiece cannot be inserted. The participants are thus not a representative cohort of the whole population, but rather a representative cohort of potential users of the Hearpiece.
Neglecting either source of variations of transfer functions in the design of signal processing algorithms is likely to lead to an unexpected behaviour of the hearing device, as demonstrated in Section 3.8. Note that in this work, only variations of transfer function magnitudes were evaluated. For several applications, variations of the transfer function phase are also of great importance, and it should be noted that these are included in the data and that similar variations as for the magnitude do occur. Notwithstanding the potential impact of large variabilities due to single (poor) insertions, the usually dominating source of variance of hearing device transfer functions is the difference between individual ears. This result shows once more that it is worthwhile to adapt hearing devices to the acoustics of the individual ear [3,9,32].

Utilization of KEMAR data
In all metrics assessed, measurements in KEMAR with anthropometric pinnae [17] represented a median ear from our participant cohort very well up to 8 kHz, and reasonably well also at higher frequencies (cf. Figs. 5-9). This observation is consistent with previous literature [3,40]. Remaining low-frequency differences can probably be attributed to a very tight fit of the device that could be achieved in KEMAR, whereas in the human participants the probe tube unavoidably generated an additional small leak. It should be noted here that the anthropometric pinnae greatly facilitated these measurements, and we were not able to fit the Hearpiece into standard artificial pinnae. Systematic differences to human data beyond 10 kHz occurred only in data that included the response of external sound sources at the occluded eardrum (cf. Figs. 5 and 9, top panel). As discussed above, these deviations were probably caused by insufficient SNR in the probe tube measurements at the occluded eardrum, although it should be noted that KEMAR was not originally designed for measurements beyond 10 kHz [40]. Variations of transfer functions as occurring between participants are inaccessible in KEMAR measurements, even if several insertions are conducted (cf. Fig. 10). Regarding the insertion variability, the standard deviation between insertions in KEMAR was typically in the range of the human participants, but below the median (except for the insertion loss, see Fig. 10).
In conclusion, KEMAR with anthropometric pinnae seems like a suitable tool to assess the acoustic properties of in-ear hearing devices in a median adult human ear: no less, but no more. For the development and robustness analysis of algorithms, we suggest measurements on multiple human participants (as provided in this database) rather than several repeated measurements in KEMAR [6,9].

Transfer functions in the residual ear canal
The RECTF, i.e., the relative transfer function between the In-Ear microphone and the eardrum, is of interest in applications that exploit an in-situ estimate of the sound pressure generated at the eardrum. Ideally, the sound pressure at the eardrum could be computed by convolving sound pressure at the In-Ear microphone with the RECTF, which could be individually determined through electroacoustic models [25,26] or by one measurement during the fitting process [14]. However, in the present data the RECTF deviates significantly between sound sources, that is, between the device's drivers and external sound sources, and less pronounced also between external sound sources at different positions (cf. Figs. 8 and 9). This behaviour is undesired, and unexpected if one-dimensional sound transmission through the ear canal is assumed [25,33,44]. We verified this behaviour in independent measurements with the present prototype [16], and similar variations have been noted in a previous version of the Hearpiece [21,32] as well as in vented hearing aids with a microphone at the inner end of the earmould [45].
Possible reasons for the dependence on the sound source are near-field coupling of the drivers to the In-Ear microphone, as well as deviations from one-dimensional sound transmission through the ear canal that are known to occur near jumps in cross-section [46]. The port of the In-Ear microphone has been placed approx. 2.5 mm away from the inner end, and jumps in cross-section at the couplings of drivers were minimized to avoid such effects. However, the present results show that this may not have been enough. Furthermore, the variation between external sound sources from different directions could be caused by direction-dependent interference of sound paths through the vent and additional leakage paths. The present and previous results demonstrate that such effects have to be taken into account when designing hearing devices with an In-Ear microphone. While several other studies presented developments exploiting an In-Ear microphone that was usually placed at the inner face of the device rather than inside the vent [24,44,45,47], to our knowledge only [45] reported on the RECTF directly, and identified similar dependence on the sound source as revealed here. Thus, identifying optimum designs based on the literature and the present results is not possible and further research is necessary to understand the underlying mechanisms and assure source-independent RECTFs in future hearing device designs.
It should also be pointed out that in the Hearpiece, higher levels are generated at the In-Ear microphone as compared to the eardrum at most frequencies (see Figs. 8 and 9). This is probably caused by the acoustic impedance inside the tube where the In-Ear microphone is located. This level difference of up to 30 dB has to be considered when designing the dynamic range of prototype systems that make use of the Hearpiece.

Probe tube measurements at the eardrum
Finally, the accuracy of the present probe tube measurements should be discussed. General issues that may limit the reliability of such measurements are a poor SNR at both ends of the audio frequency range due to transmission losses in the probe tube, as well as potential high-frequency errors due to standing wave minima if the probe tube is not placed directly at the eardrum [34].
A poor positioning of the probe tube would be expected to show up here as an increased variability in the high frequency end. In the present data, this is indeed observed above approx. 6-8 kHz (e.g. Fig. 6), however, it is hard to separate this issue from actual variations of the transfer functions due to geometric differences between ear canals. The insertion depth of the probe tube was controlled very carefully (cf. Sect. 2.3) and it can be assumed that the probe tube was placed within 1.5 mm from the eardrum, which should circumvent problems due to standing wave minima at frequencies below 20 kHz. However, above 10 kHz also transverse movements of the probe tube in the ear canal could influence the measured pressure quite drastically (> ±10 dB) due to complex pressure distributions across the eardrum [33].
The potentially limited SNR in a frequency range below 1 kHz had no evident influence on the present measurements. If poor SNR was an issue in the low-frequency responses, an increase of both between-participant and between-insertion standard deviations would be observed. This is not the case for any of the transfer functions at the eardrum assessed in the present database (see Fig. 10). At the high frequency end above 10 kHz, the SNR of probe tube microphones typically also decreases rapidly [34], which comes in addition to the positioning issue. In this study, another issue at high frequencies was an interfering sound component that did not travel through the probe tube but entered the probe microphone body directly (cf. Sect. 3.1). Our analyses showed that this component has a non-negligible influence on the responses at the occluded eardrum above 10 kHz, which is consistent with typical attenuation properties of probe tubes [34]. To reiterate, this component does not occur in the driver responses at eardrum.
We conclude that due to the well-known issues with probe tube microphone measurements, the presented transfer functions to the eardrum should be interpreted with care at frequencies above 8-10 kHz. This especially holds for conditions with the device in the ear. Fewer artefacts were noted in the driver responses as compared to the HRTFs at the occluded eardrum. At lower frequencies, no observations that would impose doubts on the responses obtained at the eardrum were found in the analyses.

Conclusion
We presented and evaluated a database of transfer functions of the Hearpiece, a commercially available earpiece for hearing device research [16]. The database comprises HRTFs from 87 incidence directions as well as responses of the device's two drivers, all measured at the four integrated microphones and the eardrum in the open and occluded ear. Measurements were made in both ears of 25 human participants and a KEMAR dummy head with anthropometric pinnae, with five insertions each of one sample device, as well as KEMAR measurements of nine additional devices from a series. In total, the database amounts to 169,878 HRTFs and 5740 driver responses/feedback paths and represents a full characterization of the acoustic properties of an in-ear hearing device and their practical variations.
The database can be used in research, development and evaluation of all sorts of hearing device algorithms and applications such as hearing aids, directional microphones, noise reduction, active noise and occlusion cancellation, feedback management, augmented reality or active hearing protection. A special benefit is that the algorithms can be easily transferred to portable real-time prototypes due to the joint open availability of the Hearpiece and compatible mobile processing platforms [16,28].
We analysed the variations of transfer functions in in-ear devices between participants and insertions and their implications on the behaviour of a device in the ear. While KEMAR data generally represents the behaviour of the present device in a median adult ear of our participant cohort well, our results demonstrate that for the design and evaluation of signal processing algorithms for in-ear devices, it is crucial to regard the variations occurring between individual human ears, insertions, and changes of the external sound field as captured in this database. Furthermore, the relative transfer function between the In-Ear microphone located in the ear canal and the eardrum showed a dependence on the sound source that is problematic for several applications. While possible reasons like near-field effects or multiple interference paths have been identified in previous literature, further research is required to better understand the exact origin of this effect and circumvent it in future hearing device designs.