Polyphonic Sound Event Detection Based on CapsNet-RNN and Post Processing Optimization

Liujun Zhang, Liyan Luo, Mei Wang, Xiyu Song, Shuting Guo, Wenshan Chu, Jingwen Tan
DOI: https://doi.org/10.1109/icisce50968.2020.00208
2020-01-01
Abstract: Sound event detection (SED) aims to detect the onset and offset times of sound events and to assign a label to each event. In real-world settings, polyphonic SED is a challenging task because multiple sound events may occur at the same time. Recently, deep learning has offered valuable techniques for this task, such as convolutional neural networks (CNN) and recurrent neural networks (RNN). In this paper, we design a new model that combines capsule neural networks (CapsNet) with an RNN. For the detection layers, we use various temporal segmentation methods and optimization algorithms to obtain the best-performing threshold. CapsNet overcomes some limitations of CNN, such as the loss of position information after max-pooling. The RNN mainly models the temporal dependencies of context information. The combination of CapsNet and RNN greatly strengthens the modeling of part-whole relationships and improves experimental performance in polyphonic SED. Experiments on the TUT-Sound Events 2017 dataset show that the proposed approach improves the error rate (ER) by 21% and the F1-score by 39.6% absolute, respectively, compared with state-of-the-art algorithms.
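The abstract reports ER and F1-score and mentions searching for the best-performing detection threshold, but it does not say how either is computed. As a rough illustration only, the sketch below assumes the segment-based definitions commonly used for DCASE-style evaluation (substitutions, deletions, and insertions counted per segment) and replaces the paper's unspecified optimization algorithms with a plain grid search. The function names `segment_metrics` and `best_threshold`, the threshold grid, and the toy data are all hypothetical and not taken from the paper.

```python
import numpy as np


def segment_metrics(ref, est):
    """Segment-based error rate and F1 for polyphonic SED.

    ref, est: binary arrays of shape (n_segments, n_classes), where 1 means
    the class is active in that segment. The definitions are an assumption
    (common DCASE-style segment metrics), not confirmed by the paper.
    """
    ref = np.asarray(ref, dtype=bool)
    est = np.asarray(est, dtype=bool)

    # Per-segment counts across classes.
    tp = np.logical_and(ref, est).sum(axis=1)
    fp = np.logical_and(~ref, est).sum(axis=1)
    fn = np.logical_and(ref, ~est).sum(axis=1)

    # A false positive paired with a false negative in the same segment
    # counts as one substitution; leftovers are deletions or insertions.
    subs = np.minimum(fp, fn).sum()
    dels = np.maximum(0, fn - fp).sum()
    ins = np.maximum(0, fp - fn).sum()

    n_ref = ref.sum()  # total number of active reference events
    er = (subs + dels + ins) / max(n_ref, 1)

    denom = 2 * tp.sum() + fp.sum() + fn.sum()
    f1 = 2 * tp.sum() / denom if denom else 0.0
    return float(er), float(f1)


def best_threshold(probs, ref, grid=np.linspace(0.05, 0.95, 19)):
    """Grid search (a stand-in for the paper's optimizers) for the
    threshold that minimizes ER; ties go to the smallest threshold."""
    probs = np.asarray(probs, dtype=float)
    scored = [(segment_metrics(ref, probs >= t)[0], t) for t in grid]
    er, t = min(scored)
    return float(t), float(er)


# Toy data: 3 segments, 2 classes (hypothetical values).
ref = [[1, 0], [0, 1], [1, 1]]
probs = [[0.90, 0.22], [0.33, 0.61], [0.80, 0.42]]

er_half, f1_half = segment_metrics(ref, np.asarray(probs) >= 0.5)
t_best, er_best = best_threshold(probs, ref)
```

With these toy values, a fixed threshold of 0.5 misses the second class in the last segment, while the search finds a lower threshold that recovers it. In practice the threshold would be tuned on a validation split rather than on the evaluation data.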