Subjective quality evaluation of personalized own voice reconstruction systems

Open Access

Issue: Acta Acust. Volume 10, 2026
Article Number: 26
Number of page(s): 17
Section: Auditory Quality of Systems
DOI: https://doi.org/10.1051/aacus/2026021
Published online: 06 April 2026
