Sound Event Localization and Detection Using Parallel Multi-attention Enhancement

Zhengyu Chen, Qinghua Huang
DOI: https://doi.org/10.1007/s00034-023-02489-x
2023-09-05
Abstract: As a combination of sound event detection and direction-of-arrival estimation, the joint task of sound event localization and detection (SELD) is an emerging audio signal processing task that is widely applied in many areas. A popular convolutional recurrent neural network (CRNN)-based method uses a convolutional neural network (CNN) to extract high-level spatial features from manually designed features and a recurrent neural network to model sequence context information. Some studies have shown that a standard CNN is not robust in challenging acoustic environments, such as those with overlapping, moving and discontinuous sources. To improve the performance of SELD in more complex acoustic scenes, this paper proposes parallel multi-attention enhancement (PMAE), a convolution enhancement method that boosts the representation ability of the CNN. PMAE consists of attention feature enhancement (AFE) and a parallel multi-attention (PMA) block. PMA, embedded into AFE, extracts enhanced global–local features using efficient attention modules along different dimensions. AFE, as a feature fusion strategy, fuses multi-scale enhanced features to improve feature representation. AFE performs well on overlapping sources. PMA adequately extracts the characteristic information of different sound events and, when combined with AFE, performs better on moving and discontinuous sources. Based on this framework, the SELD system remains robust when the target sources are moving and overlap with unknown interference classes. Simulations show that the proposed PMAE substantially improves SELD performance without other techniques such as data augmentation and post-processing.
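As a rough illustration of what "attention modules along different dimensions" applied in parallel could look like, the sketch below gates a channel × time × frequency feature map along each of its three axes and fuses the gated result with a residual sum. The abstract does not specify the PMA or AFE internals, so everything here is an assumption made for illustration: the parameter-free mean-pool-plus-sigmoid gates, the averaging of the three gates, the residual fusion, and the names `axis_gates` and `parallel_attention`. It is not the authors' implementation.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def axis_gates(x, axis):
    """Mean-pool x (nested lists, C x T x F) over the two other axes,
    then squash each pooled value into (0, 1) with a sigmoid.
    Assumption: a learned attention module would replace this pooling."""
    dims = (len(x), len(x[0]), len(x[0][0]))
    sums = [0.0] * dims[axis]
    for c in range(dims[0]):
        for t in range(dims[1]):
            for f in range(dims[2]):
                sums[(c, t, f)[axis]] += x[c][t][f]
    count = dims[0] * dims[1] * dims[2] / dims[axis]
    return [sigmoid(s / count) for s in sums]


def parallel_attention(x):
    """Compute channel, time and frequency gates independently (in parallel),
    average them per element, and add the gated features to the input."""
    gates = [axis_gates(x, axis) for axis in range(3)]
    out = []
    for c, plane in enumerate(x):
        out_plane = []
        for t, row in enumerate(plane):
            out_row = []
            for f, v in enumerate(row):
                g = (gates[0][c] + gates[1][t] + gates[2][f]) / 3.0
                out_row.append(v + g * v)  # residual fusion (assumed)
            out_plane.append(out_row)
        out.append(out_plane)
    return out


if __name__ == "__main__":
    # Toy feature map: 2 channels, 3 time frames, 2 frequency bins.
    features = [[[0.1 * (c + t + f) for f in range(2)] for t in range(3)]
                for c in range(2)]
    enhanced = parallel_attention(features)
    print(len(enhanced), len(enhanced[0]), len(enhanced[0][0]))
```

The output keeps the input shape, so a block of this kind could in principle sit between convolution layers; how the paper actually embeds PMA into AFE and fuses multi-scale features is not described in the abstract.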
engineering, electrical & electronic