Abstract: In this paper, we investigate high-resolution modeling units for deep neural networks (DNNs), ranging from concrete to abstract, for acoustic scene classification. The units are derived from Gaussian mixture models (GMMs) and ergodic hidden Markov models (HMMs). A direct modeling strategy for a DNN is to map each frame-level feature of an audio recording to one scene category. However, tagging all frames with the same label may not be the best choice, because the representative patterns of an audio recording are sparse. A GMM is also often employed as a generative model of each acoustic scene. Because the Gaussians in a GMM contribute at different levels, and each Gaussian can be seen as a subclass of the scene category, we use the GMM subclasses as slightly more abstract modeling units and build a DNN-GMM system. When a single scene category is subdivided into several subclasses, prior scores for each subclass, calculated from the training set, are stored as part of the model to account for the sparseness of the representative patterns. An ergodic HMM should be more appropriate than a GMM for modeling acoustic scenes because the structure of scene audio is uncertain. Using HMM states as modeling units, we build a DNN-HMM hybrid system. By comparison, we find that high-resolution modeling units are more effective than direct modeling. The final system is obtained by system combination, which takes advantage of the complementarity of modeling units at different levels. Experiments on the acoustic scene classification task of the DCASE2016 challenge show that our final system yields a 25.9% relative error rate reduction compared with a GMM baseline on the evaluation set.
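As an illustration only, the subclass idea can be sketched in a few lines of NumPy. The abstract does not specify how the subclass priors enter the decision, so this sketch assumes the common hybrid-system convention of dividing DNN posteriors by priors to obtain scaled likelihoods, then mixing the subclasses of each scene and summing frame log-scores over the clip. The function name `classify_clip`, its parameters, and the toy numbers are hypothetical and not taken from the paper.

```python
import numpy as np

def classify_clip(subclass_post, subclass_to_scene, subclass_prior, n_scenes, eps=1e-10):
    """Pick a scene for one clip from frame-level subclass posteriors.

    subclass_post:     (T, K) per-frame DNN posteriors over K subclasses
    subclass_to_scene: (K,) scene index of each subclass
    subclass_prior:    (K,) subclass priors estimated on the training set
    Assumption: posterior / prior scaling, as in standard hybrid systems.
    """
    subclass_post = np.asarray(subclass_post, dtype=float)
    subclass_prior = np.asarray(subclass_prior, dtype=float)
    subclass_to_scene = np.asarray(subclass_to_scene)

    # One-hot membership matrix: member[k, s] = 1 if subclass k belongs to scene s.
    member = np.eye(n_scenes)[subclass_to_scene]            # (K, S)

    # Scaled likelihoods p(x | k) up to a constant factor.
    scaled = subclass_post / (subclass_prior + eps)          # (T, K)

    # Mixture weight of each subclass within its own scene.
    scene_prior = subclass_prior @ member                    # (S,)
    weights = subclass_prior / scene_prior[subclass_to_scene]

    # Per-frame scene likelihood, then clip-level log-score.
    frame_lik = (scaled * weights) @ member                  # (T, S)
    clip_score = np.log(frame_lik + eps).sum(axis=0)         # (S,)
    return int(np.argmax(clip_score)), clip_score

# Toy usage: 2 scenes with 2 subclasses each, 3 frames.
post = np.array([[0.1, 0.1, 0.7, 0.1]] * 3)
scene, scores = classify_clip(post, [0, 0, 1, 1], [0.25, 0.25, 0.25, 0.25], n_scenes=2)
```

Under this assumed scoring rule, a few frames that strongly activate one subclass can dominate the clip decision, which is one plausible way to reflect sparse representative patterns.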

Auditory Scene Classification with Deep Belief Network.

Audio Sentiment Analysis by Heterogeneous Signal Features Learned from Utterance-Based Parallel Neural Network.

Utterance-Based Audio Sentiment Analysis Learned by a Parallel Combination of CNN and LSTM.

Using Deep Belief Network to Capture Temporal Information for Audio Event Classification.

Deep Neural Network Based Environment Sound Classification and Its Implementation on Hearing Aid App

Application of Deep Belief Networks for natural language understanding

Hierarchical learning for DNN-based acoustic scene classification

An Investigation of High-Resolution Modeling Units of Deep Neural Networks for Acoustic Scene Classification

Binaural Classification for Reverberant Speech Segregation Using Deep Neural Networks

A Hybrid Approach to Acoustic Scene Classification Based on Universal Acoustic Models.

Deep Belief Networks Based Voice Activity Detection

Classification of Lung Nodules Based on Convolutional Deep Belief Network

Deep Neural Decision Forest for Acoustic Scene Classification

Audio Event Recognition Based on DBN Features from Multiple Filter-Bank Representations.

A Deep Neural Network for Audio Classification with a Classifier Attention Mechanism

Denoising Deep Neural Networks Based Voice Activity Detection

A Comparison of deep learning methods for environmental sound

Ensemble Of Deep Neural Networks For Acoustic Scene Classification

Improved Classification Based on Deep Belief Networks

Digital Audio Scene Recognition Method Based on Machine Learning Technology

Improving Unsupervised Anomalous Sound Detection Performance of Autoencoder and Its Variant with Pretrained Deep Belief Network