IFAN: An Icosahedral Feature Attention Network for Sound Source Localization
Xin-Cheng Zhu, Hong Zhang, Hui-Tao Feng, Deng-Huang Zhao, Xiao-Jun Zhang, Zhi Tao
DOI: https://doi.org/10.1109/tim.2023.3348907
IF: 5.6
2024-01-01
IEEE Transactions on Instrumentation and Measurement
Abstract: Currently, deep-learning-based sound source localization (SSL) techniques rely mainly on traditional signal processing methods to generate their input features. However, how well these features work differs significantly from one acoustic environment to another. To overcome this limitation, this study proposes a new single-source SSL model, the icosahedral feature attention network (IFAN). Besides steered response power with phase transform (SRP-PHAT), IFAN takes as input a newly developed feature, steered response power with least mean square (SRP-LMS). The network introduces icosahedral convolutions, which encode spatial position information into the convolution kernels. In addition, it uses the sigmoid function to adaptively learn feature weights suited to the input acoustic environment, thereby capturing the spatial distribution of the sound source. In single-source localization and tracking scenarios on the localization and tracking (LOCATA) challenge data corpus, the proposed method outperforms other state-of-the-art models. Moreover, it learns complementary information even in acoustic simulations covering a wide range of reverberation conditions. IFAN can therefore improve localization robustness and performance across different environments.
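The abstract names SRP-PHAT as one of IFAN's input features, icosahedral geometry for the convolutions, and sigmoid-based feature weighting. The following is a rough illustration only, not the authors' implementation. It computes a classical far-field SRP-PHAT map over the 12 vertices of a regular icosahedron used as candidate directions, then applies a sigmoid-weighted fusion of feature maps. Several assumptions apply. The model is a free-field plane wave, and the speed of sound is 343 m/s. The coarse 12-vertex grid is an assumption because IFAN's actual grid resolution is not stated here. The function names and the fusion step, with logits passed in rather than learned, are hypothetical. SRP-LMS is not reproduced because the abstract does not define it.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s


def icosahedron_vertices():
    """12 unit vectors at the vertices of a regular icosahedron (coarsest icosahedral grid)."""
    phi = (1.0 + np.sqrt(5.0)) / 2.0
    verts = []
    # Cyclic permutations of (0, +-1, +-phi)
    for a in (-1.0, 1.0):
        for b in (-phi, phi):
            verts += [(0.0, a, b), (a, b, 0.0), (b, 0.0, a)]
    v = np.array(verts)
    return v / np.linalg.norm(v, axis=1, keepdims=True)


def srp_phat(frames, mic_pos, directions, fs, nfft=None):
    """Classical SRP-PHAT power for each candidate direction (far-field, free-field model).

    frames:     (M, N) array, one time frame per microphone
    mic_pos:    (M, 3) microphone coordinates in metres
    directions: (D, 3) unit vectors pointing toward candidate source directions
    """
    M, N = frames.shape
    nfft = nfft or 2 * N
    X = np.fft.rfft(frames, n=nfft, axis=1)    # (M, K) one-sided spectra
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)  # (K,) bin frequencies in Hz
    power = np.zeros(len(directions))
    for i in range(M):
        for j in range(i + 1, M):
            cross = X[i] * np.conj(X[j])
            cross /= np.abs(cross) + 1e-12     # PHAT weighting: keep phase only
            # Expected TDOA (t_i - t_j) for a plane wave arriving from each direction
            tau = directions @ (mic_pos[j] - mic_pos[i]) / C          # (D,)
            steer = np.exp(2j * np.pi * np.outer(tau, freqs))         # (D, K)
            power += np.real(steer @ cross)    # GCC-PHAT evaluated at tau
    return power


def sigmoid_weighted_fusion(feature_maps, logits):
    """Hypothetical gate: weight each feature map by sigmoid(logit) and sum.

    feature_maps: (F, D) array, e.g. F different SRP-style maps over D directions
    logits:       (F,) array; in a network these would be learned from the input
    """
    weights = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))  # each in (0, 1)
    return np.tensordot(weights, feature_maps, axes=1)                # (D,)
```

For example, `srp_phat(frames, mic_pos, icosahedron_vertices(), fs)` returns one power value per vertex, and `np.argmax` of that map gives the coarse direction estimate. Unlike a softmax, independent sigmoid weights do not force the feature weights to sum to one, so a gate of this kind can emphasize or suppress each feature separately.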
Engineering, Electrical & Electronic; Instruments & Instrumentation