Influence of sound spatial reproduction method on the detectability of reversing alarms in laboratory conditions

This paper investigates whether binaural recording and reproduction can be used to measure the detectability of reversing alarms in laboratory experiments. A complex and repeatable scenario was created using a wave-field synthesis (WFS) system and in-situ recordings in a lime mine. The reproduced sound field was further recorded with a dummy head. Participants were asked to perform a visual task (target tracking) while detecting two types of reversing alarms (tonal and broadband) mimicking an approaching vehicle. The experiment was conducted twice: once at the center of a WFS array, and once in a sound-proof booth using binaural recordings presented over headphones. Results showed that the detection times measured with binaural listening were significantly different from those measured in a fully immersive sound field reproduction. These differences were also greater for tonal sounds than for broadband sounds. This study shows the limitations of the binaural technique for such applications.


Introduction
Almost 25% of fatal workplace accidents involving vehicles occur when the vehicle is reversing [1]. Moreover, from accident reports published by the Occupational Safety and Health Administration (OSHA) from 1972 to 2001, Purswell and Purswell [2] estimated that approximately 43% of the 150 reported accidents involving vehicles occurred even though the reversing alarm was functional at the time. Consequently, the detectability of reversing alarms is an important safety issue, and a better understanding of alarm detectability is required. Two types of alarms are mainly used on reversing vehicles. The first, referred to as tonal in this paper, uses short pure-tone bursts (approximately 1000 Hz; 0.5 s signal followed by 0.5 s of silence) [3,4], while the second, referred to as broadband, uses band-limited noise bursts (approximately 1500 to 5000 Hz; 0.5 s signal followed by 0.5 s of silence) [4,5]. Compared with tonal alarms, broadband noise alarms seem to be easier to localize because their sound propagation pattern is more homogeneous: interference effects are strongly attenuated because these alarms emit an audible signal over a wide frequency range that includes the most sensitive hearing range, between 2000 and 4000 Hz. Consequently, broadband noise alarms are recommended, as operatives can locate them more easily, which improves site safety [6].
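To make the two alarm types concrete, the following Python sketch synthesizes both burst patterns with NumPy, using the approximate parameters quoted above (1000 Hz tone; 1500 to 5000 Hz noise band; 0.5 s on, 0.5 s off) and the 48 kHz sampling rate of the recordings. The function names, the FFT-masking method used for band-limiting, and the unit-RMS normalization are illustrative assumptions, not the procedure used to synthesize the stimuli of this study.

```python
import numpy as np

FS = 48_000  # Hz; same sampling rate as the on-site recordings


def tonal_alarm(n_cycles: int = 3, f0: float = 1000.0, on_s: float = 0.5,
                off_s: float = 0.5, fs: int = FS) -> np.ndarray:
    """Pure-tone bursts: on_s seconds of a unit-RMS sine at f0, then off_s s of silence."""
    t = np.arange(int(on_s * fs)) / fs
    burst = np.sqrt(2.0) * np.sin(2 * np.pi * f0 * t)
    period = np.concatenate([burst, np.zeros(int(off_s * fs))])
    return np.tile(period, n_cycles)


def broadband_alarm(n_cycles: int = 3, f_lo: float = 1500.0, f_hi: float = 5000.0,
                    on_s: float = 0.5, off_s: float = 0.5, fs: int = FS,
                    seed: int = 0) -> np.ndarray:
    """Band-limited noise bursts: white noise with FFT bins outside [f_lo, f_hi] zeroed."""
    rng = np.random.default_rng(seed)
    n_on, n_off = int(on_s * fs), int(off_s * fs)
    freqs = np.fft.rfftfreq(n_on, d=1 / fs)
    out_of_band = (freqs < f_lo) | (freqs > f_hi)
    periods = []
    for _ in range(n_cycles):
        spectrum = np.fft.rfft(rng.standard_normal(n_on))
        spectrum[out_of_band] = 0.0
        burst = np.fft.irfft(spectrum, n=n_on)
        burst /= np.sqrt(np.mean(burst ** 2))  # unit RMS during the burst
        periods.append(np.concatenate([burst, np.zeros(n_off)]))
    return np.concatenate(periods)


tonal = tonal_alarm()
broadband = broadband_alarm()
```

Normalizing both bursts to unit RMS makes their levels directly comparable before any gain is applied, which is convenient when the alarm-to-noise ratio is set afterwards.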
For safety and ethical reasons, the detectability of reversing alarms can be evaluated in a laboratory environment using recording and playback devices. One of the most widely used techniques in this field consists of recording sounds with a dummy head and reproducing them over headphones. While more sophisticated methods, such as Higher Order Ambisonics (HOA) recordings with binaural reproduction using personalized Head-Related Transfer Functions (HRTF), have been developed, the "dummy-head" technique remains prevalent in industry. The advantages of this procedure are (1) its ease of implementation and (2) the accurate reproduction of the sound signal at the entrance of the listener's ear canal when the listener's head is static. However, this procedure has known limitations. The HRTFs of the dummy head can differ slightly from those of the listeners, thereby altering spatial sound localization cues [7]. Furthermore, in a real listening situation, Interaural Level Difference (ILD), Interaural Time Delay (ITD), spectral cues and binaural cues change with even the slightest head rotation. In contrast, with binaural reproduction without head tracking, the auditory scene presented over headphones rotates along with the listener's head, which contradicts what is experienced when the head rotates in natural listening conditions, i.e. on site.
*Corresponding author: m.olivier.valentin@gmail.com
These underlying limitations of binaural reproduction raise the question of whether binaural recordings can be used to measure detection times: is this reproduction method realistic enough to provide a reliable assessment of alarm detectability in noise? In the absence of conclusive evidence in the literature, we evaluated the realism of binaural listening for assessing the detectability of reversing alarms in noise. Binaural listening test results were benchmarked against evaluations performed in a fully immersive sound field reproduction of an acoustic scenario, used as a simplified simulation of the in-situ acoustic reality.
In this paper, the fully immersive reproduction is based on a sound-field reconstruction (SFR) method, a more general yet adapted form of wave field synthesis (WFS), using point-source distributions (i.e. the loudspeakers). The gains and phases of the loudspeaker signals are adjusted at each frequency and for each time frame to reproduce the target sound field as measured by a microphone array [8]. This adjustment relies on a multichannel system inversion, where the inverted system consists of all the transfer paths from the loudspeakers to the microphones of the measuring array.
The initial hypothesis was that alarm detection in a fully immersive sound field reproduction would differ from detection in the binaural evaluation, because test subjects can move their heads inside the area bounded by the loudspeaker array, which is not possible with binaural recording and playback without head tracking.
The paper is organized as follows: Section 2 presents the methodology, while results are presented in Section 3 and discussed in Section 4. Conclusions and future work are presented in Section 5.

Participants
Twenty individuals with hearing thresholds below 25 dB HL (Hearing Level) participated in the experiment carried out at the Laboratoire Vibrations Acoustique (LVA) in INSA-Lyon, France. Ten participants were master's students and ten were staff of the laboratory.
Twenty-three individuals with hearing thresholds below 25 dB HL participated in the experiment carried out at the Groupe d'Acoustique de l'Université de Sherbrooke (GAUS), Canada. None of them took part in the LVA experiment. Eleven participants were graduate students, nine were interns and three were staff of the laboratory.
This study was carried out in accordance with the Declaration of Helsinki and was reviewed and approved by the Comité d'éthique pour la recherche, the Internal Review Board at Université de Sherbrooke, Québec, Canada. Informed consent was obtained from all participants before they were enrolled in the study.

Experimental procedure
Two sets of experiments were conducted to investigate the influence of the spatial sound reproduction approach on the detectability of reversing alarms in laboratory environments. The first set, conducted in Canada at GAUS, assessed the detection of reversing alarms using SFR with WFS over a loudspeaker array. WFS was selected as the reference baseline because it is a physically based approach to spatial audio that relies on the sound field reproduction paradigm. In addition, the accuracy of the reproduced sound field was verified physically with the microphone array. The second set, conducted in France at the LVA, assessed the detection of reversing alarms using headphone playback of binaural recordings inside a double-walled audiometric booth.
In both experiments, participants remained seated and had to track a 2 cm² square moving on a computer screen. The square remained fixed for a duration randomly selected between 0.8 and 1.5 s, after which its position changed randomly. The purpose of this task was to draw the participants' attention away from the main task of the test, namely alarm detection, as is the case in real-life situations. Simultaneously, participants had to press a key on a computer keyboard as soon as they could hear a back-up alarm whose level increased over time. Detection times (relative to the alarm sequence, whose level increase is known) were stored on the computer running the experiment and used to compute the corresponding detection levels. A total of 2 × 10 tonal alarms and 2 × 10 broadband alarms (cf. Tab. 1, where A and B correspond to two sequences of the same alarm) were presented to each participant. The sound environment (excluding the alarms) was played at 75 dB and presented over headphones or loudspeakers, as detailed in the following sections.
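The visual distractor task can be sketched as a schedule of square positions. In this hypothetical Python version, the screen and square sizes in pixels are assumptions (the text only gives the 2 cm² area), and positions are drawn uniformly so that the square always stays fully on screen.

```python
import random
from typing import List, Optional, Tuple


def tracking_schedule(duration_s: float, min_dwell_s: float = 0.8, max_dwell_s: float = 1.5,
                      screen_px: Tuple[int, int] = (1920, 1080), square_px: int = 80,
                      seed: Optional[int] = None) -> List[Tuple[float, int, int]]:
    """Return (onset in s, x, y) for each successive position of the tracking square.

    Screen and square sizes in pixels are hypothetical; the square stays in place
    for a uniformly drawn dwell time before jumping to a new random position.
    """
    rng = random.Random(seed)
    width, height = screen_px
    schedule: List[Tuple[float, int, int]] = []
    t = 0.0
    while t < duration_s:
        x = rng.randint(0, width - square_px)   # top-left corner keeps the square on screen
        y = rng.randint(0, height - square_px)
        schedule.append((t, x, y))
        t += rng.uniform(min_dwell_s, max_dwell_s)
    return schedule
```

A fixed seed makes the sequence reproducible across participants, whereas omitting it gives each participant a different sequence.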

On-site recordings
In-situ recordings were used to generate the reproduced sound environment. The target sound field was measured with a custom microphone array at an open-air lime mine with moving and stationary large machinery, as well as in a factory. Recordings were performed at five different locations on the site. Pictures of the measurements in the lime mine are shown in Figure 1.
The microphone array used for the sound field capture is a circular double-layer array (microphones alternate between the inner and outer radius) with a 1.23 m inner radius and a 1.27 m outer radius [9]. The array comprises 85 custom-built microphones and preamplifiers. Five of the microphones are located inside the circular region; one of them, located at the center of the circular array, is used as the main reference. The microphone array was calibrated before the on-site measurements. All recordings were made at a sampling rate of 48 kHz.
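A minimal sketch of the ring geometry, assuming uniform angular spacing and assuming that the 80 microphones not located inside the circle (85 minus 5) alternate between the two radii. The positions of the four off-center interior microphones are not given, so they are left out; only the central reference microphone is placed.

```python
import math
from typing import List, Tuple


def ring_positions(n_ring: int = 80, r_inner: float = 1.23,
                   r_outer: float = 1.27) -> List[Tuple[float, float]]:
    """(x, y) in metres of the ring microphones, alternating inner and outer radius.

    Uniform angular spacing is an assumption of this sketch.
    """
    positions = []
    for i in range(n_ring):
        r = r_inner if i % 2 == 0 else r_outer
        theta = 2 * math.pi * i / n_ring
        positions.append((r * math.cos(theta), r * math.sin(theta)))
    return positions


ring = ring_positions()
centre_reference = (0.0, 0.0)  # main reference microphone at the array centre
```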

Stimuli generation
The sound field reproduction was performed in a room at Université de Sherbrooke equipped with a square array of 96 loudspeakers measuring approximately 4 m by 4 m, located 1.55 m above the ground. Adjacent loudspeakers are separated by 16.25 cm. Four subwoofers, used to generate the frequency content below 120 Hz, are located at the four corners of the square loudspeaker array. Therefore, the reproduction is not expected to be spatially accurate below 120 Hz. The subwoofer signals are derived as a downmix of the corresponding 24-loudspeaker bars.
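The loudspeaker layout can be sketched as four bars of 24 units (4 × 24 = 96) with 16.25 cm spacing, so that each bar spans 23 × 0.1625 = 3.7375 m between its first and last loudspeaker, which fits within the approximately 4 m side and leaves room for the corner subwoofers. Centering each bar on its side of a 4 m square, and the order in which the sides are listed, are assumptions made for illustration.

```python
import numpy as np


def square_array(n_per_side: int = 24, spacing: float = 0.1625,
                 half_width: float = 2.0, height: float = 1.55) -> np.ndarray:
    """Return a (4 * n_per_side, 3) array of loudspeaker positions in metres.

    Each bar of n_per_side loudspeakers is centred on one side of a square of
    side 2 * half_width (a hypothetical layout); bars are stacked one after another.
    """
    offsets = (np.arange(n_per_side) - (n_per_side - 1) / 2) * spacing
    edge = np.full_like(offsets, half_width)
    bars = [
        np.column_stack([offsets, edge]),    # side at y = +half_width
        np.column_stack([edge, offsets]),    # side at x = +half_width
        np.column_stack([offsets, -edge]),   # side at y = -half_width
        np.column_stack([-edge, offsets]),   # side at x = -half_width
    ]
    xy = np.vstack(bars)
    return np.column_stack([xy, np.full(len(xy), height)])


positions = square_array()
bar_length = (24 - 1) * 0.1625  # distance between first and last loudspeaker of a bar
```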
A 30-minute sound environment was designed by overlaying several sound environments captured in the factory and at the mining site using the microphone array described in Section 2.3.1. The superposition of these environments aimed at obtaining a relatively realistic, stationary, broadband factory sound environment [the average fluctuation strength, computed using ArtemiS SUITE with the Psychoacoustics Module (HEAD acoustics, Herzogenrath, Germany), was below 0.45 mvacil in every critical band of hearing].
Then, the virtual sound environment was reproduced using the loudspeaker array. The loudspeaker driving signals were obtained by solving a multichannel equalization problem, with the microphone array now placed at the center of the loudspeaker array. The loudspeaker inputs are calculated at each frequency so as to produce the same complex sound pressures at the microphone locations as those measured on site. In the multichannel inversion, the loudspeakers are assumed to behave as point sources in free-field conditions [9]. The reproduced sound environment is therefore accurate, in terms of sound pressure, over the entire area covered by the microphone array, i.e. a circle of about 1.3 m in diameter. An average sound pressure level of 75 dB lin (± 1 dB) was measured at the center of the loudspeaker array using a SQuadriga II recorder with a long averaging time.
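The free-field inversion can be illustrated at a single frequency. This sketch assumes monopole transfer functions exp(-jkr)/(4πr) (time convention e^{jωt}), a speed of sound of 343 m/s, and Tikhonov-regularized least squares, q = (GᴴG + λI)⁻¹ Gᴴp. The regularization rule and its parameter `beta` are assumptions, since the text does not state how the inversion was regularized. Likewise, the circular toy geometry and the virtual-monopole target stand in for the real arrays and the measured pressures.

```python
import numpy as np


def free_field_transfer(src: np.ndarray, mic: np.ndarray, f: float,
                        c: float = 343.0) -> np.ndarray:
    """M x L matrix of monopole Green's functions exp(-jkr) / (4 pi r), e^{jwt} convention."""
    k = 2 * np.pi * f / c
    r = np.linalg.norm(mic[:, None, :] - src[None, :, :], axis=-1)  # (M, L) distances
    return np.exp(-1j * k * r) / (4 * np.pi * r)


def drive_signals(G: np.ndarray, p_target: np.ndarray, beta: float = 1e-3) -> np.ndarray:
    """Regularized least-squares loudspeaker inputs q at one frequency.

    Minimizes ||G q - p||^2 + lam ||q||^2 with lam = beta * largest eigenvalue of G^H G
    (this regularization rule is an assumption of the sketch).
    """
    GhG = G.conj().T @ G
    lam = beta * np.linalg.eigvalsh(GhG)[-1]
    rhs = G.conj().T @ p_target
    return np.linalg.solve(GhG + lam * np.eye(G.shape[1]), rhs)


# Toy geometry standing in for the real arrays: 96 loudspeakers on a 2 m circle,
# 80 microphones on a 1.25 m circle, all in the same horizontal plane.
ang_l = 2 * np.pi * np.arange(96) / 96
ang_m = 2 * np.pi * np.arange(80) / 80
speakers = np.column_stack([2.0 * np.cos(ang_l), 2.0 * np.sin(ang_l), np.zeros(96)])
mics = np.column_stack([1.25 * np.cos(ang_m), 1.25 * np.sin(ang_m), np.zeros(80)])

f = 500.0
G = free_field_transfer(speakers, mics, f)
# Stand-in for the measured pressures: a virtual monopole outside the array.
p = free_field_transfer(np.array([[5.0, 3.0, 0.0]]), mics, f)[:, 0]
q = drive_signals(G, p)
rel_err = np.linalg.norm(G @ q - p) / np.linalg.norm(p)
```

In a complete system this solve would be repeated for every frequency bin of each time frame, and the resulting loudspeaker spectra converted back to time signals; larger `beta` values trade reproduction accuracy for smaller, better-conditioned drive signals.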
This 30-minute sound environment (Fig. 2) was split into four segments of about 7-8 min to provide participants with several breaks. Each of these four segments was mixed with 10 alarm-segments of 20 s each. An alarm-segment consisted of 20 alarms of 0.5 s, each followed by 0.5 s of silence. All alarms within an alarm-segment were of the same type, either tonal or broadband. The alarm-segments were mixed with the sound environment such that the interval between two consecutive alarm-segments lasted between 10 s and 40 s, and the start of each alarm-segment was randomly placed within the 7-8 min sound segment. Finally, the direction of arrival of each alarm-segment was randomly drawn among front, back, right, and left. Each direction of arrival was reproduced with a single loudspeaker of the array, giving a strictly physical (i.e. not virtual) position for the alarm sound source. Note that the alarm stimuli, either tonal (Fig. 3, left) or broadband (Fig. 3, right), were synthesized to give full control over their time and frequency parameters. The sequences for each type of alarm are presented in Table 1.
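The placement rules can be sketched as follows. The 450 s segment length (the middle of the 7-8 min range), the interpretation of the 10-40 s interval as the silence between the end of one alarm-segment and the start of the next, and the rejection-sampling approach are all assumptions made for illustration.

```python
import random
from typing import List, Optional, Tuple

DIRECTIONS = ("front", "back", "right", "left")


def schedule_alarm_segments(segment_s: float = 450.0, n_alarms: int = 10,
                            alarm_s: float = 20.0, min_gap_s: float = 10.0,
                            max_gap_s: float = 40.0, seed: Optional[int] = None,
                            max_tries: int = 10_000) -> List[Tuple[float, str]]:
    """Return (onset in s, direction) for each alarm-segment of one sound segment."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        gaps = [rng.uniform(min_gap_s, max_gap_s) for _ in range(n_alarms - 1)]
        span = n_alarms * alarm_s + sum(gaps)
        if span <= segment_s:  # redraw the gaps until everything fits
            break
    else:
        raise ValueError("cannot fit the alarm-segments with these gap limits")
    t = rng.uniform(0.0, segment_s - span)  # random start of the first alarm-segment
    schedule = []
    for i in range(n_alarms):
        schedule.append((t, rng.choice(DIRECTIONS)))
        if i < n_alarms - 1:
            t += alarm_s + gaps[i]
    return schedule
```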
To simulate a vehicle backing up towards the participant, the overall alarm level was increased step by step, by 1 dB every second. The signal-to-noise ratio (alarm sound to background noise) therefore varied from −30 dB to −10 dB within each alarm-segment.
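A sketch of the stepwise level ramp, assuming the alarm signal is normalized to unit RMS during its bursts, so that multiplying it by the gain below directly sets the alarm-to-noise ratio. The function name and the `noise_rms` argument are hypothetical.

```python
import numpy as np


def ramped_gain(n_samples: int, fs: int, noise_rms: float, start_snr_db: float = -30.0,
                step_db: float = 1.0, step_s: float = 1.0) -> np.ndarray:
    """Per-sample amplitude gain for a unit-RMS alarm: SNR rises by step_db every step_s."""
    t = np.arange(n_samples) / fs
    snr_db = start_snr_db + step_db * np.floor(t / step_s)  # staircase, not a smooth ramp
    return noise_rms * 10.0 ** (snr_db / 20.0)
```

A mixed alarm-segment would then be `noise + alarm_segment * ramped_gain(len(alarm_segment), fs, noise_rms)`. With this staircase convention, each 1 s alarm period (0.5 s burst plus 0.5 s silence) is played at a constant level.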
Because the same stimuli were to be presented binaurally over headphones during the second set of experiments, binaural recordings of this laboratory auditory scene were made using a G.R.A.S. KEMAR head and torso simulator type 45BA placed at the centre of the loudspeaker array. The ears of the manikin were placed in the same plane as the loudspeakers. The manikin was equipped with G.R.A.S. large ears type KB0065 (right ear) and KB0066 (left ear). Both ears were fitted with G.R.A.S. 1/2″ prepolarized pressure microphones type 40AD and G.R.A.S. preamplifiers type 26CB. This recording was later used for binaural reproduction.

Stimuli presentation
At GAUS, stimuli were presented using the 96-loudspeaker system while participants remained seated at the centre of the loudspeaker array with their heads approximately in the loudspeaker plane. At the LVA, stimuli were presented binaurally using Sennheiser HD600 electrodynamic headphones while participants remained seated inside a double-walled audiometric booth. The recorded signals were filtered to compensate for the frequency response of the headphones.
Figure 2. Frequency spectra of the sound environment generated using the sound-field reconstruction (SFR) method with wave field synthesis (WFS). The spectra were recorded binaurally using a head and torso simulator equipped with intra-auricular microphones and averaged over 30 s of signal. The grey and black curves correspond to the signals recorded at the right and left ears of the head and torso simulator, respectively.

Results
Figure 4 presents the average SNR for each detected alarm, computed from the average detection time for both sets of experiments using the following formula:
$$\mathrm{SNR}_{\mathrm{detection}} = \mathrm{SNR}_{\mathrm{alarm}}(t = t_0) + \Delta p_{\mathrm{alarm}} \, t_{\mathrm{detection}}$$
where SNR_alarm(t = t_0) is the lowest, initial SNR of the alarm presented to the participant (i.e., −30 dB), Δp_alarm is the rate at which the SPL of the alarm presented to the participant increases (i.e., +1 dB/s), and t_detection is the alarm detection time.
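The conversion from detection time to detection SNR is a linear mapping, so the SNR at the average detection time equals the average of the individual detection SNRs. The following sketch implements it with the −30 dB start and +1 dB/s slope given above; the detection times in the usage lines are illustrative values, not measured data, and are assumed to be counted from the start of the alarm-segment.

```python
from typing import Iterable


def snr_at_detection(t_detection_s: float, snr_start_db: float = -30.0,
                     slope_db_per_s: float = 1.0) -> float:
    """SNR (dB) reached when the participant pressed the key."""
    return snr_start_db + slope_db_per_s * t_detection_s


def mean_detection_snr(detection_times_s: Iterable[float]) -> float:
    """SNR at the average detection time (equal to the mean SNR, as the mapping is linear)."""
    times = list(detection_times_s)
    return snr_at_detection(sum(times) / len(times))


print(snr_at_detection(12.0))                 # -18.0
print(mean_detection_snr([9.5, 11.0, 12.5]))  # -19.0
```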
Individual detection times, undetected-alarm scores (i.e., the percentage of missed alarms) and click scores (i.e., the percentage of clicks on the square moving on the monitor) obtained for both sets of experiments are shown in Tables 2 and 3.

Discussion
The low target detection error rates, referred to as the "undetected alarm" scores in Tables 2 and 3, show that participants readily detected the alarms against the background noise (mean = 1.63% for the SFR with WFS set of experiments and 2.50% for the binaural set of experiments). Furthermore, the high click scores (mean = 94.08% for the SFR with WFS set of experiments and 94.81% for the binaural set of experiments) confirm that participants remained engaged in the visual tracking task while detecting the alarms.
The SNR results presented in Figure 4 indicate that detection thresholds differ significantly across the alarm conditions, i.e. across the combinations of alarm type and sound reproduction method. This observation is supported by a Friedman rank sum test [11], computed using R 3.6.1 with MASS 7.3-51.4 [12], which rejects the null hypothesis at the 1% significance level (p = 6.376 × 10⁻¹¹), indicating that the detection threshold is not the same across the combinations of alarm type and sound reproduction method.
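For readers who want to reproduce this kind of analysis outside R, the following pure-Python sketch computes the Friedman statistic and its chi-square p-value. It omits the tie correction that R's `friedman.test` applies, so results may differ slightly when ties occur. How the measurements were arranged into blocks (rows) and conditions (columns) is not specified here, so that layout is an assumption.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1; tied values share the mean of the ranks they occupy."""
    first = {}
    for pos, v in enumerate(sorted(values)):
        first.setdefault(v, pos)
    counts = Counter(values)
    return [first[v] + (counts[v] + 1) / 2 for v in values]


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability P(X > x) for an integer df >= 1."""
    h = max(x, 0.0) / 2
    if df % 2 == 0:
        return math.exp(-h) * sum(h ** i / math.factorial(i) for i in range(df // 2))
    series = sum(h ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * series


def friedman_test(blocks: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Friedman statistic (without tie correction) and its chi-square p-value.

    Each row is one block (e.g. one participant), each column one condition.
    """
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    q = 12 * sum(r * r for r in rank_sums) / (n * k * (k + 1)) - 3 * n * (k + 1)
    return q, chi2_sf(q, k - 1)
```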
Additionally, the detection times in Tables 2 and 3 suggest that the tonal alarm is detected earlier than the broadband alarm, regardless of the presentation method (Δ(SFR with WFS) ≈ −2.7 s, Δ(binaural reproduction) ≈ −1.0 s). This was expected because the energy of tonal alarms is concentrated in a narrower frequency band than that of broadband alarms; therefore, at the same SPL, tonal alarms are easier to detect. Wilcoxon rank sum tests support this observation, rejecting the null hypothesis at the 1% significance level: with SFR with WFS, the average detection time of a broadband alarm is higher than that of a tonal alarm (p = 4.2945 × 10⁻⁶), and the same holds with binaural reproduction over headphones (p = 4.7014 × 10⁻³).
Also, presenting the stimuli using SFR with WFS reduces the average detection time compared to binaural presentation, for both types of alarm (Δ(tonal) ≈ −3.8 s, Δ(broadband) ≈ −2.2 s). This observation is also supported by Wilcoxon rank sum tests, which reject the null hypothesis at the 1% significance level: the average detection time of a tonal alarm presented using binaural reproduction with headphones is higher than that of a tonal alarm presented using SFR with WFS (p = 8.2744 × 10⁻⁸). Similarly, the average detection time of a broadband alarm presented using binaural reproduction with headphones is higher than that of a broadband alarm presented using SFR with WFS (p = 3.3335 × 10⁻⁶).
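A one-sided Wilcoxon rank-sum test can likewise be sketched in pure Python using the normal approximation with a tie-corrected variance and a continuity correction. R's `wilcox.test` uses an exact p-value for small samples without ties, so p-values from this sketch may differ from those reported above. The function name and the choice to return the rank sum of `x` (rather than R's W statistic) are specific to this illustration.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1; tied values share the mean of the ranks they occupy."""
    first = {}
    for pos, v in enumerate(sorted(values)):
        first.setdefault(v, pos)
    counts = Counter(values)
    return [first[v] + (counts[v] + 1) / 2 for v in values]


def rank_sum_greater(x: Sequence[float], y: Sequence[float],
                     continuity: bool = True) -> Tuple[float, float]:
    """One-sided Wilcoxon rank-sum test, H1: values in x tend to exceed those in y.

    Returns the rank sum of x and a normal-approximation p-value.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    w = sum(average_ranks(pooled)[:n1])
    mean_w = n1 * (n + 1) / 2
    ties = sum(c ** 3 - c for c in Counter(pooled).values())
    var_w = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    z = (w - mean_w - (0.5 if continuity else 0.0)) / math.sqrt(var_w)
    return w, 0.5 * math.erfc(z / math.sqrt(2))
```

For example, `rank_sum_greater(binaural_times, wfs_times)`, with hypothetical lists of detection times, would test whether detection under binaural reproduction tends to take longer.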
These differences between the SFR with WFS results and the binaural results were expected. Indeed, in the SFR with WFS experiments, participants were able to move their heads to exploit the spatial variations of the sound field (more marked in the tonal case), thus facilitating alarm detection during signal presentations compared to the binaural experiments.

Conclusion and future work
For reasons of replicability, safety and cost, performing alarm detectability tests in situ is challenging. Consequently, such evaluations are typically performed in a laboratory environment, using stimuli recorded with a dummy head and presented over headphones. However, assessing alarm detection with such a static binaural technique is not optimal, since participants cannot exploit the spatial variations of the sound field by moving their head, which can greatly improve the localization of the sound to be detected [13][14][15][16][17].
In this paper, we benchmarked binaural alarm detection tests against those performed in a spatial sound field reproduction of the original scene using a loudspeaker array and SFR with WFS. Our results suggest that tonal alarms are detected earlier than broadband alarms, probably because the energy of tonal alarms is concentrated in a narrower frequency band compared to broadband alarms. Furthermore, alarms presented using a loudspeaker array with SFR with WFS had a lower detection threshold than when presented using headphones, regardless of the type of alarms (tonal or broadband). The proposed explanation is that, in the SFR with WFS experiments, participants were able to move their heads to exploit the spatial variations of the sound field (more marked in the tonal case), thus facilitating alarm detection during signal presentations compared to the binaural experiments.
Future work will include binaural testing with head tracking to confirm whether the lack of head movement in binaural listening introduces a bias when evaluating alarm detection in noisy environments.