Multi-Scale Recurrent Neural Network for Sound Event Detection

Rui Lu, Zhiyao Duan, Changshui Zhang
DOI: https://doi.org/10.1109/icassp.2018.8462006
2018-01-01
Abstract: Sound event detection (SED) in real life is an interesting but challenging task due to the polyphonic and long-term dependent nature of sound events. Recently, multi-label recurrent neural networks (RNNs) have shown promise. However, even when equipped with long short-term memory (LSTM) or gated recurrent unit (GRU) cells, RNNs are still limited in their ability to model long-term dependencies. In this paper, we propose a multiscale RNN to address this issue. By integrating information from different time resolutions, we can better capture both the fine-grained and long-term dependencies of sound events. We experiment on the development sets of Task 3 of DCASE2016 and DCASE2017. Compared to our previously proposed single-scale RNN, which placed third among the 13 teams in Task 3 of DCASE2017, the proposed multiscale model achieves statistically significantly better performance on the development datasets of both DCASE2016 and DCASE2017.