Modeling of speech-dependent own voice transfer characteristics for hearables with an in-ear microphone

Open Access

Issue: Acta Acust. Volume 8, 2024
Article Number: 28
Number of page(s): 13
Section: Speech
DOI: https://doi.org/10.1051/aacus/2024032
Published online: 28 August 2024