Multi-Source Localization Method Based on the Log-Mel Spectrum Augmented Noise Subspace

Haiwei Duan, Changchun Bao, Jing Zhou
DOI: https://doi.org/10.1109/icspcc59353.2023.10400324
2023-01-01
Abstract: Deep learning (DL) based direction-of-arrival (DOA) estimation is an active research topic, and many methods have been proposed recently. However, most of these methods suffer serious performance degradation under the adverse effects of overlapping sources, noise, and reverberation. A primary cause is that performance depends heavily on the pre-extracted features, which often exhibit spectral aliasing and peak confusion in complex scenarios. In this paper, a new feature is proposed that stacks the log-Mel spectrum with the noise subspace of the covariance matrix of the relative sound pressure; it is referred to as the log-Mel spectrum augmented noise subspace (LMNS) and is used for DL-based DOA estimation. The LMNS is more robust than conventional features because it represents both spectral and spatial information effectively. The LMNS is fed as the input feature to a Conformer-based residual network that maps it to the spatial pseudo-spectrum, from which the DOAs of the sound sources are obtained. Experimental results show that the proposed method achieves better DOA estimation performance, which verifies that the LMNS feature is more robust and effective in scenarios with multiple sources, noise, and reverberation.
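To make the feature construction described in the abstract more concrete, the following is a minimal NumPy sketch of how a log-Mel vector and a noise subspace might be stacked. It is an illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the snapshots are assumed to already hold relative sound pressure for one frequency bin, the number of sources is assumed known, the Mel filterbank uses the common HTK-style formula, and stacking is assumed to be a plain concatenation of the log-Mel values with the real and imaginary parts of the noise-subspace eigenvectors.

```python
import numpy as np


def noise_subspace(snapshots, n_sources):
    """Noise subspace of the sample covariance matrix.

    snapshots: complex array (n_mics, n_frames), assumed here to already
    contain relative sound pressure for one frequency bin.
    Returns an (n_mics, n_mics - n_sources) matrix of eigenvectors.
    """
    n_mics, n_frames = snapshots.shape
    cov = snapshots @ snapshots.conj().T / n_frames
    # eigh returns eigenvalues in ascending order, so the first columns
    # belong to the smallest eigenvalues, i.e. the noise subspace.
    _, eigvecs = np.linalg.eigh(cov)
    return eigvecs[:, : n_mics - n_sources]


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular Mel filterbank (HTK-style Mel scale, an assumption)."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        low, center, high = hz_points[i:i + 3]
        rising = (freqs - low) / (center - low)
        falling = (high - freqs) / (high - center)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def log_mel(power_spectrum, bank, eps=1e-10):
    """Log-Mel spectrum of a one-sided power spectrum (n_fft // 2 + 1 bins)."""
    return np.log(bank @ power_spectrum + eps)


def stack_feature(log_mel_vec, noise_sub):
    """Hypothetical stacking: concatenate spectral and spatial parts."""
    return np.concatenate([
        log_mel_vec.ravel(),
        noise_sub.real.ravel(),
        noise_sub.imag.ravel(),
    ])
```

For a single source, the columns returned by `noise_subspace` should be nearly orthogonal to that source's steering vector, which is the property that subspace-based pseudo-spectra exploit. How the paper actually arranges the two parts before the Conformer-based network is not stated in the abstract, so the concatenation here is only a placeholder.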