Direction specific ambisonics source separation with end-to-end deep learning

Topical Issue - Audio for Virtual and Augmented Reality

Open Access

This article has an erratum: [https://doi.org/10.1051/aacus/2023032]

Issue		Acta Acust. Volume 7, 2023 Topical Issue - Audio for Virtual and Augmented Reality


Article Number		29
Number of page(s)		12
DOI		https://doi.org/10.1051/aacus/2023020
Published online		16 June 2023

