Sound Event Localization and Detection Using a Spatial Omni-Dimensional Dynamic Interactions Network

Tongyang Dao, Min Guo, Miao Ma
DOI: https://doi.org/10.1007/s11760-023-02901-8
2023-01-01
Abstract: To improve the performance of real-world sound event localization and detection (SELD), we developed a new architecture based on a spatial omni-dimensional dynamic interactions (SODI) network. The proposed architecture (SODI-SELD) is composed mainly of ACSmixBlock, SODBlock, and ConformerBlock. ACSmixBlock mixes self-attention, convolution, and SoftPool to extract richer channel features. SODBlock extracts adjacent features using omni-dimensional adaptive gated convolution (ODAgConv) and implements higher-order spatial interactions recursively to extract deeper channel features. Together, these two modules improve the depth and breadth of channel feature extraction, while ConformerBlock improves modeling capability. The SODI-SELD architecture as a whole uses SoftPool to reduce the information lost when sound events are downsampled and uses multi-head attention to prevent overfitting during training. Experimental results on a real dataset with a maximum overlap of five show that SODI-SELD outperforms the Baseline model: the F_20° (macro) and LR_CD metrics improve by 8.2% and 8.5%, respectively, and the LE_CD metric decreases by 6.6°. The code is available at https://github.com/daotongyang/SODI-SELD.git.
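As an illustration of why SoftPool loses less information during downsampling than max pooling does, the sketch below applies the general SoftPool idea (a softmax-weighted sum over each pooling window) to a 1-D signal. It is a minimal NumPy sketch, not the authors' implementation: the function name softpool_1d, the non-overlapping windows, and the 1-D input are assumptions made for illustration only.

```python
import numpy as np

def softpool_1d(x, kernel=2):
    """Softmax-weighted pooling over non-overlapping windows of a 1-D array.

    Hypothetical helper for illustration; not taken from the SODI-SELD code.
    """
    x = np.asarray(x, dtype=float)
    n = (len(x) // kernel) * kernel          # drop a trailing partial window
    windows = x[:n].reshape(-1, kernel)      # one row per pooling window
    # Subtract the per-window max for numerical stability; the softmax
    # weights are unchanged by this shift.
    w = np.exp(windows - windows.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    # Every element contributes in proportion to its softmax weight,
    # so smaller activations are attenuated rather than discarded.
    return (w * windows).sum(axis=1)
```

For a window such as [0, 1], the result lies between the average (0.5) and the maximum (1.0): larger activations dominate, but smaller ones still contribute to the pooled value.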