Acoustic scene classification by feed forward neural network with class dependent attention mechanism

Jia-Ming Liu, Hui-Hui Wang, Mingyu You, Ruiwei Zhao
2016-01-01
Abstract: For the acoustic scene classification task, we propose a novel attention mechanism embedded in feed-forward networks. On top of a shared input layer, 15 separate attention modules, one per class, are computed and output 15 class-dependent feature vectors. These feature vectors are then mapped to class labels by 15 subnetworks, and a softmax layer is placed at the very top of the network. In our experiments, the default features, MFCC and mel filterbank with delta and acceleration coefficients, are used to represent each segment. We split each 30 s audio recording into 1 s segments, predict a label for each segment, and output the most frequent label as the label of the 30 s recording. The best single neural network achieves 77.4% cross-validation accuracy without further feature engineering or data augmentation. We train 5 models with MFCC features and 5 models with mel filterbank features, then combine them by majority vote, reaching a final cross-validation accuracy of 78.6%. For submission, the 10 models are retrained on the full dataset, and the final submission is a majority-vote ensemble of the 10 models' outputs.
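The two voting steps described in the abstract (segment labels to a recording label, then model outputs to an ensemble label) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the tie-breaking rule (first label seen wins) is an assumption, and so is the choice to vote across models on recording-level labels, since the abstract does not specify either detail.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label.

    Assumption: ties are broken in favour of the label that appears
    first in `labels`; the abstract does not state a tie-breaking rule.
    """
    if not labels:
        raise ValueError("majority_vote needs at least one label")
    counts = Counter(labels)
    best = max(counts.values())
    for label in labels:
        if counts[label] == best:
            return label


def recording_label(segment_labels):
    """Label a 30 s recording from the labels of its 1 s segments."""
    return majority_vote(segment_labels)


def ensemble_label(per_model_segment_labels):
    """Combine several models' predictions for one recording.

    Assumption: each model first votes over its own segment labels,
    and the ensemble then votes over the resulting recording labels.
    """
    recording_votes = [recording_label(s) for s in per_model_segment_labels]
    return majority_vote(recording_votes)
```

For example, with three hypothetical models whose recording-level votes are "park", "bus", and "park", `ensemble_label` returns "park".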