An Experimental Study on Sound Event Localization and Detection under Realistic Testing Conditions

Shutong Niu, Jun Du, Qing Wang, Li Chai, Huaxin Wu, Zhaoxu Nian, Lei Sun, Yi Fang, Jia Pan, Chin-Hui Lee
DOI: https://doi.org/10.1109/icassp49357.2023.10094681
2023-01-01
Abstract: We study four data augmentation (DA) techniques and two model architectures on realistic data for sound event localization and detection (SELD). First, based on ResNet-Conformer (RC), we compare the four DA approaches on the realistic DCASE 2022 SELD test set, which is difficult to handle because of room reverberation and overlapping audio in spontaneous recordings. Experimental results show that, apart from audio channel swapping (ACS), the three other DA methods that work well on the simulated SELD data set are no longer effective, owing to mismatches between simulated and realistic conditions. Next, with ACS-based augmentation, the two improved ResNet-Conformer networks further enhance SELD performance under realistic conditions. By combining these two sets of techniques, our overall system ranked first in the SELD task of the DCASE 2022 Challenge.