Multiple Sound Sources Localization Using Sub-Band Spatial Features and Attention Mechanism
Dongzhe Zhang, Jianfeng Chen, Jisheng Bai, Mou Wang, Muhammad Saad Ayub, Qingli Yan, Dongyuan Shi, Woon-Seng Gan
DOI: https://doi.org/10.1007/s00034-024-02925-6
IF: 2.311
2024-12-15
Circuits, Systems, and Signal Processing
Abstract: Deep learning-based sound source localization is a growing research topic for wireless acoustic sensor networks. However, current methods either simply combine the DOA estimates provided by each microphone array node or use an end-to-end architecture fed with the multi-channel features of the arrays. Both approaches suffer from performance degradation in environments with high noise and reverberation. In this paper, we propose a deep learning-based method that uses spatial spectrum features and attention mechanisms to estimate the locations of sound sources. We first propose a new set of features that represent the spatial information in multiple frequency bands. With these sub-band spatial representations, the model can make fuller use of the geometric properties and the spatial spectrum of the array nodes. We then propose a CNN-Transformer-based network that identifies the correct peaks and suppresses spurious peaks by modeling both local and global information in the spatial spectrum features. To evaluate the proposed method, we perform experiments on simulated datasets covering different noise and reverberation conditions, as well as different array and source positions. Experimental results show that the proposed method achieves the lowest RMSE and the highest F1 score among the compared baseline methods. Further analysis demonstrates that the proposed method localizes sound sources robustly in both simulations and real-data experiments.
engineering, electrical & electronic
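
The abstract does not specify how the sub-band spatial features are computed, so the following is only a minimal sketch of the general idea, assuming a plain delay-and-sum steered response power as the spatial spectrum. The four-microphone circular array, the frequency bins, the 5-degree azimuth grid, the single noiseless far-field source, and all function names are hypothetical choices made for illustration; they are not taken from the paper.

```python
import cmath
import math

C = 343.0  # assumed speed of sound in m/s


def steering_phase(mic, azimuth_deg, freq_hz):
    """Phase factor of a far-field plane wave from azimuth_deg at one microphone."""
    az = math.radians(azimuth_deg)
    # Projection of the mic position onto the unit vector pointing at the source.
    proj = mic[0] * math.cos(az) + mic[1] * math.sin(az)
    return cmath.exp(2j * math.pi * freq_hz * proj / C)


def spatial_spectrum(snapshot, mics, freq_hz, grid_deg):
    """Normalized delay-and-sum power over candidate azimuths for one frequency bin."""
    spectrum = []
    for az in grid_deg:
        beam = sum(x * steering_phase(m, az, freq_hz).conjugate()
                   for x, m in zip(snapshot, mics))
        spectrum.append(abs(beam) ** 2 / len(mics) ** 2)
    return spectrum


def subband_features(snapshots, mics, freqs_hz, grid_deg, n_bands):
    """Average the per-bin spectra inside n_bands contiguous groups of bins."""
    per_band = len(freqs_hz) // n_bands
    features = []
    for b in range(n_bands):
        bins = range(b * per_band, (b + 1) * per_band)
        spectra = [spatial_spectrum(snapshots[k], mics, freqs_hz[k], grid_deg)
                   for k in bins]
        features.append([sum(col) / per_band for col in zip(*spectra)])
    return features


# Hypothetical toy setup: 4-mic circular array, 5 cm radius, one source at 60 degrees.
RADIUS = 0.05
mics = [(RADIUS * math.cos(math.pi * i / 2), RADIUS * math.sin(math.pi * i / 2))
        for i in range(4)]
freqs_hz = [500.0 + 125.0 * k for k in range(12)]  # 500 Hz to 1875 Hz
grid_deg = list(range(0, 360, 5))
TRUE_AZIMUTH = 60

# Noiseless frequency-domain snapshots: one complex value per microphone and bin.
snapshots = [[steering_phase(m, TRUE_AZIMUTH, f) for m in mics] for f in freqs_hz]

# Feature matrix with one row per sub-band and one column per candidate azimuth.
features = subband_features(snapshots, mics, freqs_hz, grid_deg, n_bands=3)
peaks = [grid_deg[row.index(max(row))] for row in features]
print(peaks)
```

Each row of `features` is the spatial spectrum of one sub-band. A matrix of this kind, stacked over array nodes, is the sort of input a CNN-Transformer network could use to separate true peaks from spurious ones; with noise, reverberation, or several sources the rows would no longer share a single clean peak.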