Cross-site investigation on head-related and headphone transfer functions: variabilities in relation to loudness balancing

Headphone transfer function (HpTF) and head-related transfer function (HRTF) measurements are crucial in acoustic science and in binaural virtual acoustic applications. Yet, their measurement set-up, procedure or post-processing is different for nearly every lab, especially for the HRTF measurements. To compare findings between different labs, these measurement deviations have to be quantified along with their influence on perceptual aspects. In the scope of a cross-site investigation on loudness balancing between headphone and loudspeaker listening, a set of HpTFs with three different headphones (open, closed, insert earphones) and HRTFs close to the eardrum were measured in 14 participants travelling to two different measurement sites at Aachen and Oldenburg.


With the advent of three-dimensional video capturing and consumer products for playback such as head-mounted displays, correct spatial audio presentation is desired for immersive experiences. Binaural playback over headphones is the most economical, easiest to set up and most flexible solution and is therefore typically used these days. To increase the authenticity of these methods, individual head-related transfer functions (HRTFs) are usually considered ideal for binaural synthesis, with playback over headphones compensated by individual headphone transfer functions (HpTFs) [1]. Rather complex multi-loudspeaker set-ups and methods for performing HRTF measurements have been implemented at multiple research labs (see [2] for an overview); however, no ubiquitous method has been established, and it cannot be expected that unified standards for this task will be established. In contrast, HpTFs are directly measured on ear simulators or humans, but they depend on the fitting of the headphone to the head, which changes with every taking off and putting on (repositioning) of the headphones [3, 4, 5, 6]. In addition, both HRTFs and HpTFs depend on the recording location in the ear, thus increasing the likelihood that the resulting functions critically depend on the exact procedure pursued in each laboratory and with each individual subject. The aim of the current study therefore is to estimate the expected variation across laboratories in relation to the variability across subjects, and to evaluate these variations by their respective consequences regarding the estimated level at the eardrum in a loudness balancing experiment.
Hence, in a cross-site comparison between the lab sites in Aachen and Oldenburg, a dataset of HRTFs and HpTFs for three different headphones was measured with probe microphones close to the eardrum in 14 participants travelling to both measurement sites. Even though these probe tube measurements are more time-consuming than blocked-entrance measurements, we employed this method since any kind of ear canal entrance measurement would not be applicable for the insert earphones that were included in the study, and in order to correctly capture effects of acoustic loading of the ear by headphones [5, 7]. Additionally, the probe tube placement close to the eardrum is an anatomically well-defined location, as opposed to insertion at

[...] shown in a companion paper [13] for the open headphone presentation. Such mismatches, i.e. level differences at the eardrum at equal loudness, have been reported repeatedly since the 1940s [14, 15, 16], mostly for low frequencies and closed headphones. The phenomenon was named after the amount of mismatch occurring at low frequencies: "The missing 6 dB". In 2011, Völk et al. [17] showed that the loudness mismatch disappears when applying binaural synthesis and compensation of the headphone transfer function in a blind comparison paradigm. They assume that the loudness mismatch is caused by a mismatch in time and phase relations between left and right ear. Nevertheless, the results from our earlier study [13], with a larger variation across listening conditions, showed that some unexplained mismatch still remains, especially across the anechoic rooms in Aachen and Oldenburg, thus calling for a closer look at the underlying variabilities in transfer functions for both sites.

The current paper therefore quantifies the cross-site effects on HRTF and HpTF measurements as well as on loudness mismatch experiments. 14 participants travelled to both laboratories, one in Aachen and one in Oldenburg. In each lab, a full HRTF set as well as the HpTFs for three different headphones were measured. As detailed in [13], a loudness balancing listening test was conducted in which the subjects adjusted headphone playback to equal loudness with loudspeaker playback of the same stimulus. Using individual HRTFs and HpTFs, the corresponding levels created at the eardrum and the mismatch, i.e. the level difference at equal loudness, were calculated. While the previous paper explored several factors influencing this mismatch (room, stimulus, binaural parameters of headphone playback), the present study focusses on how the mismatch, and in particular apparent disparities between sites, could be explained by cross-site differences between the underlying HRTF and HpTF data. To this end, we restrict ourselves to the loudness balancing results obtained in anechoic environments with diotic headphone playback, but consider the results of all three headphones and a wider frequency range of the employed stimuli (125 Hz to 12 kHz).

To measure the sound pressure level close to the eardrum, a probe tube was inserted into the ear canal until participants noticed a soft contact with the eardrum. After slightly pulling back the probe tube, the microphone and its housing were fixed with medical tape on the subject's cheek near the ear to minimize the influence of the measurement device on the fitting of the headphones and on the incident sound field. The 76 mm long probe tubes (Type 76109MBB, Precision Cast Plastic Parts, Redding, CA, USA) were used on ER7C series probe microphones (ER7C, Etymotic Research, Elk Grove Village, IL, USA).
Once placed and fixed, the probe tubes were not repositioned between HpTF and HRTF measurement.

HpTFs were measured in Oldenburg and Aachen with duplicates of the same sound card with integrated high-power headphone amplifier (ADI-2 PRO FS, RME, Haimhausen, Germany). The headphones were repositioned by the experimenter between measurements, and each headphone was measured eight times for each participant to account for variabilities due to fitting [22]. Measurements that revealed obvious faults, e.g. broadband level reduction due to clogging or squeezing of probe tubes, or those including false notches due to microphone misplacements, were excluded. The probe tubes were not repositioned between measurements.

The sensitivity and frequency response of the probe tube microphones were acquired via the substitution method as described in IEC 61094-8 [23] in the anechoic chambers. In Aachen they were compared to a GRAS 40AF half-inch microphone, while in Oldenburg an eighth-inch GRAS 46-DP1 was used. The measurements were shifted to a common delay and windowed to a length of 93 ms (4096 samples at 44100 Hz

To understand the influence of measured HpTF and HRTF values on the results of the listening test, we briefly formulate the mismatch (with '*' denoting a convolution) as:

Figure 3 shows the difference between the maximum magnitude level measured in eight repetitions of fitting and the minimum level for each frequency bin.
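
The repositioning spread described for figure 3 can be illustrated with a minimal sketch. It assumes the eight repeated HpTFs of one headphone and one participant are available as a complex array of shape (repetitions, frequency bins); the function name `repositioning_spread_db` and the synthetic data are hypothetical stand-ins and are not taken from the study.

```python
import numpy as np

def repositioning_spread_db(hptfs):
    """Maximum minus minimum magnitude level (dB) per frequency bin.

    hptfs: array of shape (n_repetitions, n_bins), complex or real,
    one row per headphone fitting (assumed layout, not from the paper).
    """
    levels_db = 20.0 * np.log10(np.abs(hptfs))
    return levels_db.max(axis=0) - levels_db.min(axis=0)

# Synthetic stand-in for eight repositionings (NOT measured data):
# random gain deviations of about 0.5 dB per fitting and bin.
rng = np.random.default_rng(0)
n_rep, n_bins = 8, 2049  # 2049 bins result from a 4096-sample real FFT
gains_db = rng.normal(0.0, 0.5, size=(n_rep, n_bins))
hptfs = 10.0 ** (gains_db / 20.0)

spread = repositioning_spread_db(hptfs)
print(spread.shape)
```

Because the spread is a maximum-minus-minimum measure, a single outlier fitting dominates it at the affected frequencies, which is one reason to exclude obviously faulty measurements beforehand.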

For the open HD650 these differences are small (

In the Aachen anechoic chamber the respective mismatches for the tbn1000 and the tbn12000 differ significantly from other stimuli. In the Oldenburg anechoic chamber the high-frequency tbn8000 and tbn12000 differ significantly from the lower ones, with further analysis revealing only a difference for the ER4 for this case and an additional difference to the uen17 stimulus.

The measurement conditions differed in various points between the labs, as listed in section 2.1.2. Only the type of probe tube microphones used was the same, and agreements were made on the final response (length, overall window and data format) for better comparison as well as for evaluation of the listening test. The measured HRTFs across laboratory sites and participants in figure 1 show an increasing variability with increasing frequency. Up to 6000 Hz the increase of variability and its absolute value are moderate (up to 10 dB above 2000 Hz), while for higher frequencies variations of up to 20 dB occur due to different notch centre frequencies.

The deviations across laboratory sites are reasonably small (median below 2.7 dB) up to 6000 Hz. Besides the very different approaches in measuring the HRTFs

Overall, the deviations found do not exceed the expected variability due to repetition of the measurement reported in the literature. Yet, some details differ from these findings and can be attributed to laboratory influence. The observable systematic deviation at 500 Hz is most probably due to the interpolation start in the Aachen post-processing, yet it is fairly small (less than 1.5 dB). The deviation around 1600 Hz, which results from a notch in the Oldenburg data that is not present in the Aachen data, could be caused by a change in posture between the different settings (seated vs. standing with neck-rest). The influence of a possible reflection from legs or knees is expected to affect lower frequencies [10, 18]. Also, the influences of clothing as investigated in [10] and [25] do not point towards a narrow-band difference that occurs only in this region. Another explanation could be found in different angles of head pitch. As the effect is in a narrow band and can be observed for both ears equally, head rotation other than in pitch angle seems unlikely. For the same reason, influence by a systematic displacement of the participants from the reference point of the measurement set-up seems unlikely, as it would affect the left and right side differently. Furthermore, the effect decreases with increasing distance of the incidence direction from the horizontal plane and vanishes at about 45 degrees to the left or right side. In the median plane, the effect shifts towards lower frequencies for negative elevations, to about 1100 to 1200 Hz at -20 degrees, and vanishes at an elevation of -30 degrees. For positive elevations, it shifts to 2300 Hz at +20 degrees and vanishes (or averages out) at +30 degrees.

Another systematic difference can be observed around 10 kHz in the shape of a notch followed by a peak.

The frequency range and the wider characteristics of

The DT770 shows slightly increased variability towards lower frequencies, whereas the HpTFs become unreliable for the ER4 insert earphones at frequencies below 900 Hz, with median variabilities of up to 8 dB due to repositioning alone around 70 Hz and a variability of 15 dB across subjects and laboratory sites at that frequency. Towards higher frequencies, intra- and inter-subject variabilities increase for all headphones used, whereas a systematic influence across laboratory sites can only be observed at 10 kHz and above 13 kHz.

The results found for intra-subject variability due to headphone repositioning in figure 3 show that the frequency range for small deviations of up to ± 1 dB reaches up to 6000 Hz, which is comparable to the findings of Völk [6]. Even though their investigation focuses on open-type headphones and blocked ear canal measurements only, it offers a good reference, especially for the HD650, because of its high number of repetitions and the use of both inter- and intra-subject measurements. For the influence of repositioning the HD650, they found variabilities below 1 dB up to 6000 Hz, with increased variabilities of up to 6 dB above these frequencies, whereas the present study shows higher deviations, especially for frequencies above 13 kHz, which might relate to the use of probe tubes and the measured sensitivity used for the compensation of the microphones.

[...] difference for 1000 Hz can neither be found in the HRTF differences nor in the HpTFs. The discussed deviations around 1600 Hz (see section 4.1) are above the band limits of the one-third octave band noise at 1000 Hz. Reflections from legs or knees that might occur when seated are below these frequency limits. As the same hardware with synchronized settings was used at both sites, the digital headphone output levels measured in Aachen and Oldenburg are comparable to a large extent, and the deviation can already be found in this data, before applying any HpTF or HRTF data. Therefore, small differences between the rooms are expected to cause the mismatch around 1 kHz (see [13] for a detailed discussion).
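
The statement that 1600 Hz lies above the band limits of the one-third octave band at 1000 Hz can be checked with a short calculation. The sketch below assumes the common base-2 definition of the band edges (centre frequency times 2^(±1/6)); the exact filter definition used for the stimuli is not given in this section, and the function name is hypothetical.

```python
def third_octave_band_edges(centre_hz):
    """Lower and upper edge of a one-third octave band.

    Assumes the base-2 definition: edges at centre * 2**(-1/6)
    and centre * 2**(+1/6). The base-10 variant gives nearly
    identical values.
    """
    factor = 2.0 ** (1.0 / 6.0)
    return centre_hz / factor, centre_hz * factor

lower, upper = third_octave_band_edges(1000.0)
print(f"{lower:.0f} Hz to {upper:.0f} Hz")  # prints "891 Hz to 1122 Hz"
```

Under this assumption the upper band edge is roughly 1.1 kHz, so a narrow-band deviation around 1600 Hz falls outside the stimulus band.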

The effect of probe tube misplacements, as described in the sections above, on the listening test results is minimized, as they cancel out in the division of HpTF by HRTF data as described in equation 1. This can also be seen in figure 6, at least for the HD650 and the DT770.

[...] repositioning induced more variability towards lower frequencies, as the closed fit is more sensitive to fitting. The ER4 insert earphones highly depend on the fitting of the sealing of the ear canal using double-dome silicone plugs. This fitting was further influenced by the presence of the probe tubes and led to high variations between labs and between repositionings, the latter particularly towards lower frequencies. However, in the frequency range of about 900 to 3500 Hz the ER4 exhibited reproducible HpTFs. Towards higher frequencies, variability increases for all headphones due to shifted peaks and notches in the transfer functions.

Last, the probe tube position, as one possible source of inaccuracy, may lead to rather large deviations between measured transfer functions across sites. Yet, this effect cancels out in loudness balancing tests when HpTF and HRTF measurements are done with the same placement of the probe tubes, i.e. when they stay fixed between both measurements.
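
The cancellation argument can be made concrete with a small numerical sketch. It assumes, as a simplification, that the probe tube position acts as one frequency-dependent factor that multiplies both the HpTF and the HRTF measured with the same placement; all transfer functions below are synthetic, and the names (`random_tf`, `probe_effect`, `level_difference_db`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_bins = 512

def random_tf(n):
    """Synthetic complex transfer function (NOT measured data)."""
    magnitude = 10.0 ** (rng.normal(0.0, 3.0, n) / 20.0)
    phase = rng.uniform(-np.pi, np.pi, n)
    return magnitude * np.exp(1j * phase)

hptf_true = random_tf(n_bins)     # headphone to eardrum
hrtf_true = random_tf(n_bins)     # loudspeaker to eardrum
probe_effect = random_tf(n_bins)  # assumed effect of probe position

def level_difference_db(hptf, hrtf):
    """Level of the ratio HpTF / HRTF in dB per frequency bin."""
    return 20.0 * np.log10(np.abs(hptf / hrtf))

reference = level_difference_db(hptf_true, hrtf_true)

# Same probe placement in both measurements: the factor cancels.
fixed = level_difference_db(hptf_true * probe_effect,
                            hrtf_true * probe_effect)
residual_fixed_db = np.max(np.abs(fixed - reference))

# Probe moved between measurements: the factor remains in the ratio.
moved = level_difference_db(hptf_true * probe_effect, hrtf_true)
residual_moved_db = np.max(np.abs(moved - reference))

print(residual_fixed_db, residual_moved_db)
```

In this simplified model the ratio is unchanged as long as the probe stays fixed, whereas a placement change between the two measurements carries over fully into the level difference.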

On the one hand, the influence of the transfer functions on the loudness balancing results can be shown for deviations around 12 kHz, where differences in microphone sensitivities lead to high deviations in HpTF measurements between the laboratory sites and consequently to high deviations in the loudness mismatch at this frequency. Also, the use of individual measurements removes the difference between the HD650 and the DT770 at lower frequencies. On the other hand, deviations in transfer functions cannot explain the cross-site differences of the loudness mismatch at 1000 Hz.

The gathered HRTF and HpTF data sets [30] can be used for a more thorough investigation of general (cross-site) measurement accuracy, while the present investigation was focussed mainly on explaining differences in loudness balancing. The data shown here focus solely on the frontal direction and magnitude differences. Time and phase information, interaural cues as well as influences of incidence angles over the whole sphere can additionally be taken into account. For each participant, blocked ear canal measurements were also taken (except for the insert earphones) and can be compared to the presented eardrum measurements.