| Issue | Acta Acust., Volume 6, 2022, Topical Issue - Auditory models: from binaural processing to multimodal cognition |
|---|---|
| Article Number | 21 |
| Number of page(s) | 14 |
| DOI | https://doi.org/10.1051/aacus/2022009 |
| Published online | 26 May 2022 |
Scientific Article
Using a blind EC mechanism for modelling the interaction between binaural and temporal speech processing
1 Medizinische Physik and Cluster of Excellence Hearing4All, Carl-von-Ossietzky Universität Oldenburg, 26129 Oldenburg, Germany
2 Fraunhofer Institute for Digital Media Technology IDMT and Cluster of Excellence Hearing4All, Marie-Curie-Straße 2, 26129 Oldenburg, Germany
* Corresponding author: saskia.roettges@uol.de
Received: 1 April 2021
Accepted: 28 February 2022
We reanalyzed a study that investigated binaural and temporal integration of speech reflections with different amplitudes, delays, and interaural phase differences. We used a blind binaural speech intelligibility model (bBSIM), which applies an equalization-cancellation (EC) process to model binaural release from masking. bBSIM is blind in that it requires only the mixed binaural speech-and-noise signals and no auxiliary information about the listening conditions. bBSIM was combined with two non-blind back-ends, the speech intelligibility index (SII) and the speech transmission index (STI), resulting in hybrid models. Furthermore, bBSIM was combined with the non-intrusive short-time objective intelligibility measure (NI-STOI), resulting in a fully blind model. The fully non-blind reference model used in the previous study achieved the best prediction accuracy (R² = 0.91 and RMSE = 1 dB). The fully blind model yielded a coefficient of determination (R² = 0.87) similar to that of the reference model, but also the highest root mean square error of the models tested in this study (RMSE = 4.4 dB). By adjusting the binaural processing errors of bBSIM as done in the reference model, the RMSE could be decreased to 1.9 dB. In addition, the dynamic range of the SII had to be adjusted in this study to predict the low speech reception thresholds (SRTs) of the speech material used.
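The equalization-cancellation idea behind a blind EC stage can be illustrated with a toy sketch. It assumes a single broadband channel (real models typically process each frequency band separately), integer-sample delays, a coarse gain grid, no binaural processing errors, and the classic minimum-output-energy cancellation criterion; the actual bBSIM parameter-selection rule is not reproduced here. The function names (`blind_ec_params`, `ec_apply`, etc.) and the N0Sπ test signals (diotic noise, antiphasic tone standing in for speech) are hypothetical. The point it shows: the EC parameters are chosen from the binaural mixture alone, yet applying them to the separate speech and noise components yields a large SNR gain.

```python
import math
import random

def delay(x, d):
    """Shift x by d samples (d > 0 delays, d < 0 advances), zero-padding the edges."""
    n = len(x)
    if d >= 0:
        return [0.0] * d + x[: n - d]
    return x[-d:] + [0.0] * (-d)

def energy(x):
    return sum(v * v for v in x)

def ec_apply(left, right, d, g):
    """Equalize (delay d and gain g on the right channel), then cancel by subtraction."""
    return [xl - g * xr for xl, xr in zip(left, delay(right, d))]

def blind_ec_params(left, right, max_delay=5, gains=None):
    """Blind search: uses only the binaural MIXTURE.

    Assumed criterion: pick the (delay, gain) pair minimising the EC output energy.
    """
    if gains is None:
        gains = [0.5 + 0.1 * k for k in range(11)]  # 0.5 ... 1.5
    best_e, best_d, best_g = math.inf, 0, 1.0
    for d in range(-max_delay, max_delay + 1):
        for g in gains:
            e = energy(ec_apply(left, right, d, g))
            if e < best_e:
                best_e, best_d, best_g = e, d, g
    return best_d, best_g

def snr_db(speech, noise):
    en = energy(noise)
    return math.inf if en == 0 else 10 * math.log10(energy(speech) / en)

# Hypothetical N0S-pi scenario: identical noise at both ears, antiphasic "speech" tone
rng = random.Random(0)
n_samples = 2000
noise = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
speech = [0.3 * math.sin(2 * math.pi * 0.05 * t) for t in range(n_samples)]
s_left, s_right = speech, [-v for v in speech]
n_left, n_right = noise, noise
mix_left = [a + b for a, b in zip(s_left, n_left)]
mix_right = [a + b for a, b in zip(s_right, n_right)]

d, g = blind_ec_params(mix_left, mix_right)         # derived from the mixture only
snr_in = snr_db(s_left, n_left)                     # same SNR at both ears here
snr_out = snr_db(ec_apply(s_left, s_right, d, g),   # same EC parameters applied to the
                 ec_apply(n_left, n_right, d, g))   # separate components (evaluation only)
print(d, round(g, 1), round(snr_in, 1), round(snr_out, 1))
```

In this toy case the search settles on zero delay and a gain just below 1, so most of the diotic noise is cancelled while the antiphasic tone adds constructively. With real speech, frequency-dependent processing and internal processing errors, the achievable gain is far smaller.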
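The abstract reports a fully blind model whose R² is close to that of the reference model while its RMSE is much larger. A short numerical sketch shows how this can happen, assuming R² is computed as the squared Pearson correlation between measured and predicted SRTs (an assumption; the exact definition used in the study is not stated here). The SRT values and the helpers `r_squared` and `rmse` are hypothetical and for illustration only.

```python
import math

def r_squared(measured, predicted):
    """Squared Pearson correlation (assumed definition of R^2 here)."""
    n = len(measured)
    mx = sum(measured) / n
    my = sum(predicted) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(measured, predicted))
    sxx = sum((x - mx) ** 2 for x in measured)
    syy = sum((y - my) ** 2 for y in predicted)
    return sxy ** 2 / (sxx * syy)

def rmse(measured, predicted):
    """Root mean square error (dB) between predicted and measured SRTs."""
    return math.sqrt(sum((y - x) ** 2 for x, y in zip(measured, predicted)) / len(measured))

# Hypothetical measured SRTs in dB SNR (illustrative only, not data from the study)
measured = [-20.0, -18.5, -16.0, -14.5, -12.0]
# Model A: perfectly correlated, but every prediction is 4 dB too high
predicted_biased = [x + 4.0 for x in measured]
# Model B: unbiased, with +/-1 dB scatter
predicted_scatter = [-19.0, -19.5, -15.0, -15.5, -12.0]

for name, pred in [("biased", predicted_biased), ("scatter", predicted_scatter)]:
    print(name, round(r_squared(measured, pred), 3), round(rmse(measured, pred), 2))

# Removing the mean offset leaves R^2 unchanged but lowers the RMSE of model A
bias = sum(p - m for p, m in zip(predicted_biased, measured)) / len(measured)
corrected = [p - bias for p in predicted_biased]
print("corrected", round(rmse(measured, corrected), 2))
```

The biased model reaches R² ≈ 1.0 with an RMSE of 4.0 dB, whereas the scattered model has the lower R² (≈ 0.90) but an RMSE of only about 0.89 dB. This only shows why the two metrics can diverge; it does not imply that adjusting the binaural processing errors of bBSIM amounts to a constant shift of the predictions.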
Key words: Speech intelligibility prediction / Temporal processing / Binaural processing / Auditory model
© The Author(s), Published by EDP Sciences, 2022
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.