An Investigation of Transfer Learning Mechanism for Acoustic Scene Classification

Hengshun Zhou, Xue Bai, Jun Du
DOI: https://doi.org/10.1109/iscslp.2018.8706712
2018-01-01
Abstract: One main challenge in acoustic scene classification (ASC) is that different acoustic scenes overlap and resemble one another considerably. Moreover, most existing ASC tasks lack adequate training data to distinguish the classes well, especially for deep learning approaches such as convolutional neural networks (CNNs). Motivated by the success of transfer learning from an image classification task with a large amount of training data (e.g., ImageNet) to other computer vision tasks with less training data [1], in this study we investigate the possibility of transfer learning between two quite different classification tasks, one taking 2D image signals as input and the other 1D audio signals. One strong motivation is that the spectrogram of an audio signal can also be treated as a 2D image, which potentially has structures similar to those of the samples in the image classification task. Specifically, we perform transfer learning by adapting CNNs with different architectures, pre-trained on the ImageNet task, to DCASE2018 ASC subtask A. Furthermore, by leveraging more input channels and training data fragments, the classification accuracy of our proposed system on the evaluation set is increased from 59.7% to 77.8%, in comparison to the officially provided CNN system trained using only audio data.
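The input preparation implied by the abstract can be illustrated with a minimal sketch: turn a 1D audio signal into a 2D log-magnitude spectrogram, cut it into fixed-width fragments along time, and give each fragment three channels so that it has the layout an ImageNet-style CNN expects. This is not the authors' pipeline. The function names, frame sizes, and the synthetic test tone are hypothetical, and replicating one spectrogram across three channels is only an assumed stand-in, since the abstract does not say which input channels were used. A naive DFT from the standard library keeps the example self-contained; a real system would use an optimized FFT and would then feed the result to a pre-trained network, which is not shown here.

```python
import cmath
import math


def log_spectrogram(signal, frame_len=64, hop=32):
    """Return a log-magnitude spectrogram as a list of frames (time x frequency)."""
    # Hann window to reduce spectral leakage at frame boundaries.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # non-negative frequencies only
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(n_bins):
            # Naive DFT of the windowed frame at frequency bin k.
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(math.log(abs(acc) + 1e-6))
        frames.append(spectrum)
    return frames


def split_into_fragments(spec, width, hop):
    """Cut a spectrogram into overlapping fragments of `width` frames each."""
    return [spec[s:s + width] for s in range(0, len(spec) - width + 1, hop)]


def to_three_channels(fragment):
    """Replicate one fragment into 3 channels (assumption: mimics an RGB input)."""
    return [[row[:] for row in fragment] for _ in range(3)]


# Hypothetical input: a 440 Hz tone sampled at 8 kHz, 1024 samples long.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(1024)]

spec = log_spectrogram(signal)                        # time x frequency
fragments = split_into_fragments(spec, width=8, hop=4)
image = to_three_channels(fragments[0])               # channels x time x frequency

# Frequency bin with the most energy in a frame from the middle of the signal.
peak_bin = max(range(len(spec[10])), key=lambda k: spec[10][k])
```

Each fragment becomes a separate training example that inherits the scene label of its source recording, which is one plausible reading of "training data fragments": it multiplies the number of labelled inputs without requiring new recordings.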