NAS-DYMC: NAS-Based Dynamic Multi-Scale Convolutional Neural Network for Sound Event Detection

Jun Wang,Peng Yao,Feng Deng,Jianchao Tan,Chengru Song,Xiaorui Wang
DOI: https://doi.org/10.1109/ICASSP49357.2023.10096621
2023-01-01
Abstract:CNN+RNN models have become the mainstream approach for semi-supervised sound event detection, and the CNN part is mainly a stack of several 2D convolutional layers to capture the representations of the time-frequency features. However, conventional 2D convolution is of limited ability in capturing detailed information about acoustic events. In this paper, to enhance the representation ability of CNN, we propose NAS-DYMC, a NAS-based dynamic multi-scale convolutional neural network to extract a more effective acoustic representation. Specifically, multi-scale convolution can capture the characteristics of sound events with different time-frequency distributions and dynamic convolution enhances the representation capability of conventional convolution by adapting attention weights onto basis kernels. Furthermore, a neural architecture search (NAS) method is adopted to find the optimal network architecture from the search space consisting of various dynamic multi-scale convolutions for the DCASE 2021 Task4 dataset. Experimental results demonstrate the superiority of our proposed method.
What problem does this paper attempt to address?