Large-scale audio feature extraction and SVM for acoustic scene classification

Jürgen T. Geiger, Björn Schuller, Gerhard Rigoll
DOI: https://doi.org/10.1109/waspaa.2013.6701857
2013-10-01
Abstract: This work describes a system for acoustic scene classification using large-scale audio feature extraction. It is our contribution to the Scene Classification track of the IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (D-CASE). The system classifies 30-second recordings of 10 different acoustic scenes. From the highly variable recordings, a large number of spectral, cepstral, energy and voicing-related audio features are extracted. Classification follows a sliding-window approach: support vector machines (SVMs) classify short windows, and a majority voting scheme combines the window-level decisions into a decision for the whole recording. On the official development set of the challenge, an accuracy of 73% is achieved. SVMs are compared with a nearest neighbour classifier and an approach called Latent Perceptual Indexing, and SVMs achieve the best results. A feature analysis using the t-statistic shows that Mel spectra are the most relevant features.
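The sliding-window classification and majority voting described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no window length, hop size, or tie-breaking rule, so `win`, `hop`, and the first-seen tie-break are assumptions, and `classify_window` is a hypothetical stand-in for the trained SVM.

```python
from collections import Counter
from typing import Callable, List, Sequence


def sliding_windows(n_frames: int, win: int, hop: int) -> List[range]:
    """Return frame-index ranges of length `win`, advanced by `hop` frames."""
    if win <= 0 or hop <= 0:
        raise ValueError("win and hop must be positive")
    return [range(s, s + win) for s in range(0, n_frames - win + 1, hop)]


def majority_vote(labels: Sequence[str]) -> str:
    """Return the most frequent label; ties go to the label seen first (assumption)."""
    if not labels:
        raise ValueError("no window predictions to vote on")
    return Counter(labels).most_common(1)[0][0]


def classify_recording(
    frames: Sequence[float],
    classify_window: Callable[[List[float]], str],  # hypothetical stand-in for the SVM
    win: int,
    hop: int,
) -> str:
    """Classify each short window, then vote to get one label per recording."""
    windows = sliding_windows(len(frames), win, hop)
    predictions = [classify_window([frames[i] for i in w]) for w in windows]
    return majority_vote(predictions)


def toy_classifier(window: List[float]) -> str:
    """Toy rule used only for illustration: threshold the window mean."""
    return "street" if sum(window) / len(window) > 0.5 else "office"


if __name__ == "__main__":
    # Eight one-dimensional "feature frames"; real features would be vectors.
    frames = [0.9] * 6 + [0.1] * 2
    print(classify_recording(frames, toy_classifier, win=4, hop=2))
```

In this toy run, two of the three windows are labelled "street", so the recording-level decision is "street" even though the final window disagrees. Voting in this way makes the recording-level label robust to a few misclassified windows.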