Environmental Sound Recognition Based on Double-input Convolutional Neural Network Model

Xiyu Song, Wenshan Chu, Man Yao, Shuting Guo, Mei Wang, Lu Bai, Liyan Luo, Xin Liu
DOI: https://doi.org/10.1109/ICCASIT50869.2020.9368517
2020-10-14
Abstract: In recent years, most environmental sound recognition models have been trained on acoustic features such as the log-mel spectrogram (Logmel) or mel-frequency cepstral coefficients (MFCC), but their recognition results remain unsatisfactory. These acoustic features were originally designed for speech and music recognition and may not represent environmental sounds comprehensively. In this paper, we designed a double-input convolutional neural network model that takes Logmel features and raw waveforms as inputs, extracts features from each input separately with a convolutional neural network, and then combines these features for recognition. Because the network extracts features directly from raw waveforms, it can capture specific information that may not be contained in the acoustic features. This information is complementary to the Logmel features and improves the ability of the combined features to represent environmental sounds. Experiments on the Google AudioSet dataset show that the proposed model achieves a recognition result of 95.1%, outperforming network models that take a single feature, or a combination of multiple acoustic features, as input.
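The abstract describes the architecture only at a high level: two convolutional branches (one on Logmel, one on the raw waveform) whose features are combined before classification. The sketch below illustrates that data flow in plain Python under loudly stated assumptions, and is not the authors' implementation. Each branch is a single convolution layer with ReLU and global max pooling; "feature combination" is assumed to be concatenation; the classifier is one linear layer with softmax; all weights are random (untrained); kernel sizes, stride, and class count are made up. The Logmel input is taken as a precomputed (mel bands x frames) matrix; in practice it could come from a library such as `librosa.feature.melspectrogram` followed by `librosa.power_to_db`. All function names (`waveform_branch`, `logmel_branch`, `double_input_forward`, `init_params`) are hypothetical.

```python
import math
import random


def conv1d(signal, kernel, stride=1):
    """Valid 1-D cross-correlation (what CNN conv layers compute), no padding."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(0, len(signal) - k + 1, stride)]


def conv2d(image, kernel):
    """Valid 2-D cross-correlation over a (mel bands x frames) map, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(w - kw + 1)]
            for i in range(h - kh + 1)]


def waveform_branch(waveform, kernels, stride):
    """Hypothetical raw-waveform branch: conv -> ReLU -> global max pool, one value per kernel."""
    return [max(max(0.0, v) for v in conv1d(waveform, k, stride)) for k in kernels]


def logmel_branch(logmel, kernels):
    """Hypothetical Logmel branch: 2-D conv -> ReLU -> global max pool, one value per kernel."""
    return [max(max(0.0, v) for row in conv2d(logmel, k) for v in row)
            for k in kernels]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def double_input_forward(waveform, logmel, params):
    """Extract features from each input separately, concatenate them, then classify."""
    wave_feat = waveform_branch(waveform, params["wave_kernels"], params["wave_stride"])
    mel_feat = logmel_branch(logmel, params["mel_kernels"])
    fused = wave_feat + mel_feat  # assumed fusion: list concatenation
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(params["fc_weights"], params["fc_bias"])]
    return fused, softmax(logits)


def init_params(n_wave_kernels, n_mel_kernels, n_classes, seed=0):
    """Random, untrained weights; a real model would learn these by backpropagation."""
    rng = random.Random(seed)
    n_fused = n_wave_kernels + n_mel_kernels
    return {
        "wave_kernels": [[rng.gauss(0, 0.1) for _ in range(64)]
                         for _ in range(n_wave_kernels)],
        "wave_stride": 16,
        "mel_kernels": [[[rng.gauss(0, 0.1) for _ in range(3)] for _ in range(3)]
                        for _ in range(n_mel_kernels)],
        "fc_weights": [[rng.gauss(0, 0.1) for _ in range(n_fused)]
                       for _ in range(n_classes)],
        "fc_bias": [0.0] * n_classes,
    }


if __name__ == "__main__":
    rng = random.Random(1)
    sr = 8000  # toy sample rate (assumption)
    waveform = [math.sin(2 * math.pi * 440 * n / sr) + 0.05 * rng.gauss(0, 1)
                for n in range(sr // 4)]  # 0.25 s noisy test tone
    logmel = [[rng.gauss(-20, 5) for _ in range(20)] for _ in range(16)]  # stand-in 16 x 20 Logmel
    params = init_params(n_wave_kernels=8, n_mel_kernels=8, n_classes=5)
    fused, probs = double_input_forward(waveform, logmel, params)
    print(len(fused), [round(p, 3) for p in probs])
```

The point of the two branches is that the fused vector contains one block of features learned from the raw samples and one learned from the Logmel map, so the classifier can draw on information that either representation alone might miss. A practical version would use deeper branches and a deep learning framework, and would train the weights on labelled clips.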
Computer Science, Engineering, Environmental Science
What problem does this paper attempt to address?