Sound Event Detection Via Conformer Recurrent Neural Networks

Fangqing Gao, Xin Li, Xiukun Wei
DOI: https://doi.org/10.1109/ccdc58219.2023.10327134
2023-01-01
Abstract: Sound event detection (SED) is a core task in machine listening that aims to mimic the capabilities of the human auditory system. Recently, convolutional recurrent neural networks (CRNNs) have attained state-of-the-art SED performance. The convolution module in a CRNN extracts local time-frequency information from the audio, but it cannot capture global information because the convolution kernel covers only a limited region. To address this shortcoming, the convolution module is replaced with a conformer block, which combines the advantages of transformers and convolutional neural networks to model both the local and global dependencies of audio sequences. Compared with CNN, RNN, and CRNN models on the TUT-SED 2017 dataset, the proposed method improves the F1-score by 9.86% and reduces the error rate (ER) by 0.1235 on the development dataset, and improves the F1-score by 9.13% and reduces ER by 0.0836 on the evaluation dataset. Experimental results demonstrate the superiority and effectiveness of the proposed approach.
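The two reported metrics, F1-score and ER, can be illustrated with a minimal sketch. It assumes the segment-based definitions commonly used in SED evaluation, where ER is the sum of substitutions, deletions, and insertions divided by the number of active reference events. The abstract does not state which variant the authors used, and the function name `segment_metrics` and the toy labels are hypothetical.

```python
def segment_metrics(reference, estimated):
    """Compute segment-based F1 and error rate (ER).

    reference, estimated: equal-length lists holding one set of active
    event labels per time segment (an assumed input format).
    """
    if len(reference) != len(estimated):
        raise ValueError("reference and estimated must have the same length")

    tp = fp = fn = 0
    subs = dels = ins = 0
    n_ref = 0
    for ref, est in zip(reference, estimated):
        seg_tp = len(ref & est)   # events active in both
        seg_fp = len(est - ref)   # events detected but not present
        seg_fn = len(ref - est)   # events present but missed
        tp += seg_tp
        fp += seg_fp
        fn += seg_fn
        # Within a segment, a false positive paired with a false negative
        # counts as one substitution; the remainder are deletions or insertions.
        subs += min(seg_fp, seg_fn)
        dels += max(0, seg_fn - seg_fp)
        ins += max(0, seg_fp - seg_fn)
        n_ref += len(ref)

    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    er = (subs + dels + ins) / n_ref if n_ref else 0.0
    return f1, er


# Toy example with four segments (labels are made up for illustration).
reference = [{"car"}, {"car", "people"}, set(), {"children"}]
estimated = [{"car"}, {"car"}, {"people"}, {"brakes"}]
f1, er = segment_metrics(reference, estimated)
print(f1, er)  # 0.5 0.75
```

In this toy case there is one deletion, one insertion, and one substitution against four active reference events, so ER is 0.75. A higher F1 and a lower ER both indicate better detection, which is why the abstract reports an F1 increase alongside an ER decrease.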