Improving Sound Event Localization and Detection with Class-Dependent Sound Separation for Real-World Scenarios

Shi Cheng, Jun Du, Qing Wang, Ya Jiang, Zhaoxu Nian, Shutong Niu, Chin-Hui Lee, Yu Gao, Wenbin Zhang
DOI: https://doi.org/10.1109/apsipaasc58517.2023.10317385
2023-01-01
Abstract: In this study, we propose a novel approach to sound event localization and detection (SELD) that uses sound separation (SS) models to tackle two key challenges in real-world scenarios: a high percentage of overlapped segments between sound events and imbalanced distributions of sound event classes. Specifically, we introduce class-dependent SS models to deal with overlapping mixtures and extract features from the SS model as prompts for SELD of a specific event class. In contrast to many other classification methods that can be affected by interfering events, the proposed class-dependent SS-SELD framework enhances the overall performance of the SELD system, resulting in improved accuracy and robustness in real-world scenarios. When evaluated on the Sony-TAu Realistic Spatial Soundscapes 2023 (STARSS23) dataset, the method demonstrates significant improvements in both sound event detection (SED) and direction-of-arrival (DOA) estimation. Our findings suggest that sound separation is a promising strategy for enhancing the performance of SELD systems, particularly in scenarios with high overlap between sound events and imbalanced distributions of event classes. In addition, the proposed framework contributed to building our champion systems submitted to Task 3 of the DCASE 2023 Challenge.