Auditory Scene Classification with Deep Belief Network.

Like Xue, Feng Su
DOI: https://doi.org/10.1007/978-3-319-14445-0_30
2015-01-01
Abstract: Effective modeling and analysis of auditory scenes is crucial to many context-aware and content-based multimedia applications. In this paper, we explore the effectiveness of a multi-layer generative deep neural network model in discovering the underlying higher-level, highly non-linear probabilistic representations of acoustic data from unstructured auditory scenes. We first create a more compact and representative description of the input audio clip by focusing on the salient regions of the data and modeling their contextual correlations. Next, we use a deep belief network (DBN) to discover, in an unsupervised manner, high-level descriptions of the scene audio as the activations of units in the higher hidden layers of the trained DBN. These descriptions are finally classified into a scene category by either the discriminative output layer of the DBN or a separate classifier such as a support vector machine (SVM). The experiment demonstrates the effectiveness of the proposed DBN-based classification approach for auditory scenes.
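The two-stage idea in the abstract (unsupervised feature learning followed by a separate SVM) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn, approximates the DBN with two greedily trained `BernoulliRBM` layers without any fine-tuning, omits the salient-region and context-modeling step, and uses synthetic feature vectors in place of real acoustic data. The layer sizes, learning rate, and iteration counts are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical stand-in for per-clip acoustic feature vectors:
# 200 clips, 40 features each, two scene classes with shifted means.
n_clips, n_features = 200, 40
labels = rng.integers(0, 2, size=n_clips)
features = rng.normal(loc=labels[:, None] * 1.5, scale=1.0,
                      size=(n_clips, n_features))

model = Pipeline([
    # BernoulliRBM expects inputs in [0, 1].
    ("scale", MinMaxScaler()),
    # Stacked RBMs, trained one after another without labels; each layer
    # is fit on the hidden activations of the layer below it.
    ("rbm1", BernoulliRBM(n_components=32, learning_rate=0.05,
                          n_iter=20, random_state=0)),
    ("rbm2", BernoulliRBM(n_components=16, learning_rate=0.05,
                          n_iter=20, random_state=0)),
    # Separate classifier on the top-layer activations.
    ("svm", SVC(kernel="rbf")),
])

# Train on the first 150 clips, predict scene labels for the rest.
model.fit(features[:150], labels[:150])
predictions = model.predict(features[150:])

# Top hidden-layer activations, i.e. the learned high-level descriptions.
hidden = model[:-1].transform(features[150:])
```

Here `hidden` plays the role of the high-level description that the abstract says is passed to the classifier; replacing the `svm` step with a discriminative output layer would correspond to the other option the abstract mentions.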