Acoustic scene classification using deep CNN with fine-resolution feature
Tao Zhang, Jinhua Liang, Biyun Ding
DOI: https://doi.org/10.1016/j.eswa.2019.113067
IF: 8.5
2020-04-01
Expert Systems with Applications
Abstract: Convolutional neural networks (CNNs) with spectrogram feature representations are attracting increasing attention for acoustic scene classification owing to their favorable performance. However, most existing methods are still constrained by the trade-off between the minimum coverage area across the time-frequency feature representation, i.e. the time-frequency feature resolution, and the depth of the CNN model. Thus, it is infeasible to improve performance by simply deepening the network. In this paper, a fine-resolution convolutional neural network (FRCNN) is proposed that draws on advances in very deep architectures, feature fusion, and convolutional operations. Specifically, lateral construction is applied to generate a fine-resolution feature map with semantic information, and depth-wise separable convolution is utilized to reduce the number of trainable parameters. Extensive experiments demonstrate that the proposed FRCNN exhibits high performance on several metrics, with low computational complexity.
computer science, artificial intelligence; engineering, electrical & electronic; operations research & management science
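
The abstract states that depth-wise separable convolution is used to reduce the number of trainable parameters. The sketch below illustrates why that holds in general by counting weights for a standard convolution and for a depth-wise convolution followed by a 1 x 1 point-wise convolution. The function names and the layer sizes (64 input channels, 128 output channels, 3 x 3 kernel) are assumptions chosen for illustration and are not taken from the paper.

```python
def standard_conv_params(c_in, c_out, k, bias=False):
    """Trainable parameters of a standard k x k 2-D convolution."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def separable_conv_params(c_in, c_out, k, bias=False):
    """Depth-wise k x k conv (one filter per input channel)
    followed by a 1 x 1 point-wise conv that mixes channels."""
    depthwise = k * k * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


# Hypothetical layer sizes, for illustration only (not from the paper).
c_in, c_out, k = 64, 128, 3

std = standard_conv_params(c_in, c_out, k)
sep = separable_conv_params(c_in, c_out, k)
ratio = sep / std  # equals 1/c_out + 1/k**2 when bias is omitted

print(std, sep, round(ratio, 3))  # prints: 73728 8768 0.119
```

For this assumed layer, the separable form needs roughly 12% of the weights of the standard convolution. Without bias terms the ratio is 1/c_out + 1/k^2, so the saving grows with the kernel size and the number of output channels.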