Sound Event Localization and Detection Based on Iterative Separation in Embedding Space

Zeyu Yuan, Donghang Wu, Xihong Wu, Tianshu Qu
DOI: https://doi.org/10.1109/icicsp59554.2023.10390682
2023-01-01
Abstract: Current Sound Event Localization and Detection (SELD) methods mainly adopt the output format of SELDnet, in which the Direction Of Arrival (DOA) is predicted for each category rather than for each event; these methods therefore cannot handle sound events of the same type occurring simultaneously in different directions. Although track-wise methods can detect such homogeneous overlap, they still require the maximum number of overlapping sound sources to be known in advance. To solve these problems, we propose Sep-SELD, a SELD method based on iterative separation in embedding space. Localization and detection are performed on each single event, rather than on all events at the same time, by introducing separation in the embedding space. To deal with the inconsistent and potentially unknown number of active events across frames, the separation is performed iteratively. We conduct experiments on the DCASE2020 Task 3 dataset, and the results show that the proposed method achieves performance comparable to track-wise methods and has the flexibility to handle overlapping events without retraining from scratch.
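The iterative idea in the abstract can be illustrated with a toy sketch. This is not the Sep-SELD network: the abstract does not specify the architecture, so everything below is an assumption made for illustration. The hypothetical `extract_one` function stands in for a learned separator by projecting onto fixed prototype vectors, and the residual-norm stopping rule and the `stop_threshold` and `max_iters` parameters are likewise invented here. The sketch only shows the control flow of peeling one event at a time from a mixed embedding until nothing remains, so the number of events need not be fixed beforehand.

```python
import numpy as np


def extract_one(residual, prototypes):
    """Hypothetical stand-in for a learned separator.

    Picks the prototype with the largest projection onto the residual,
    and returns its index, the extracted component, and the new residual.
    """
    scores = prototypes @ residual
    k = int(np.argmax(scores))
    event = scores[k] * prototypes[k]
    return k, event, residual - event


def iterative_separation(mixture, prototypes, stop_threshold=1e-3, max_iters=10):
    """Peel events off the mixture one at a time until the residual is negligible.

    max_iters is only a safety guard for this toy loop, not an assumed
    maximum number of overlapping sources.
    """
    residual = mixture.copy()
    events = []
    for _ in range(max_iters):
        if np.linalg.norm(residual) < stop_threshold:
            break  # nothing left to separate in this frame
        k, event, residual = extract_one(residual, prototypes)
        events.append((k, event))
    return events, residual


# Toy frame: two active events mixed in a 4-dimensional embedding space.
prototypes = np.eye(4)  # assumed orthonormal prototypes, for illustration only
mixture = 2.0 * prototypes[0] + 1.0 * prototypes[2]
events, residual = iterative_separation(mixture, prototypes)
print([k for k, _ in events])  # [0, 2]
```

In a real system each extracted embedding would then be passed to detection and DOA estimation for that single event; those steps are omitted here because the abstract gives no detail on them.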