Sparse DNN Model for Frequency Expanding of Higher Order Ambisonics Encoding Process

Shan Gao, Jing Lin, Xihong Wu, Tianshu Qu
DOI: https://doi.org/10.1109/taslp.2022.3153266
2022-01-01
IEEE/ACM Transactions on Audio Speech and Language Processing
Abstract: The performance of higher-order Ambisonics (HOA) signals obtained using the spherical harmonics decomposition method is degraded by two primary sources of error: noise pollution in the low-frequency band and spatial aliasing in the high-frequency band. Inspired by the HOA signal upscaling method, which exploits the sparsity of the sound field, this paper proposes a sound field decomposition model based on a sparse deep neural network that offers HOA signals with a wider frequency bandwidth. We use a frequency-domain multi-scale convolutional network to realize the spherical harmonics decomposition and to learn the spatial aliasing pattern, from which aliasing-free HOA signals can be derived. In addition, we apply a sparse encoding network to capture the sparse features of the sound field, which improves model performance when the sparsity condition is satisfied. The experimental results show that the proposed model can obtain HOA signals with a wider operating frequency range under multiple sources (up to 10 sources) and in low-reverberation environments ($T_{60} \le 400$ ms). When the sparsity condition cannot be satisfied ($T_{60} = 800$ ms), the proposed network model still maintains the same performance as the traditional methods.
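For orientation, the conventional spherical harmonics decomposition that the abstract takes as its baseline can be sketched as a least-squares fit of spherical harmonic coefficients to the microphone signals. The sketch below is an illustrative assumption, not the paper's network or its exact formulation: it is limited to first order, uses real spherical harmonics in ACN order with N3D normalization, and omits the radial (mode-strength) equalization of a real spherical array, which is where low-frequency noise amplification arises. The function names and the six-microphone layout are hypothetical.

```python
import numpy as np

def real_sh_first_order(azimuth, elevation):
    """Real spherical harmonics up to order 1 (ACN order, N3D normalization).

    Angles are in radians. Returns an array of shape (..., 4).
    """
    azimuth = np.asarray(azimuth, dtype=float)
    elevation = np.asarray(elevation, dtype=float)
    w = np.ones_like(azimuth)                                 # order 0
    y = np.sqrt(3.0) * np.sin(azimuth) * np.cos(elevation)    # order 1, degree -1
    z = np.sqrt(3.0) * np.sin(elevation)                      # order 1, degree 0
    x = np.sqrt(3.0) * np.cos(azimuth) * np.cos(elevation)    # order 1, degree +1
    return np.stack([w, y, z, x], axis=-1)

def encode_least_squares(mic_signals, mic_azimuth, mic_elevation):
    """Estimate first-order coefficients from Q microphone signals.

    Solves Y a = p in the least-squares sense via the pseudo-inverse of the
    (Q, 4) spherical harmonic matrix Y. Radial equalization is omitted
    (simplifying assumption for this sketch).
    """
    Y = real_sh_first_order(mic_azimuth, mic_elevation)
    return np.linalg.pinv(Y) @ mic_signals

# Hypothetical layout: six microphones at the vertices of an octahedron.
mic_az = np.deg2rad([0.0, 90.0, 180.0, 270.0, 0.0, 0.0])
mic_el = np.deg2rad([0.0, 0.0, 0.0, 0.0, 90.0, -90.0])

# Order-limited field of one source with amplitude 0.5 at (az, el) = (0.3, 0.2) rad.
a_true = 0.5 * real_sh_first_order(0.3, 0.2)
p = real_sh_first_order(mic_az, mic_el) @ a_true

a_hat = encode_least_squares(p, mic_az, mic_el)
```

Because the simulated field here contains no components above first order, the coefficients are recovered exactly up to numerical precision. With a real sound field, higher-order components fold into the estimate once the array can no longer resolve them, which is the spatial aliasing the abstract refers to.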