Large Scale Environmental Sound Classification Based on Efficient Feature Extraction.

Xiaoyan Wang, Hao Zhou, Zhi Liu, Yu Gu
DOI: https://doi.org/10.1109/icppw.2016.64
2016-01-01
Abstract: In recent years, many studies have sought to analyze everyday auditory scenes by mining non-speech sounds. Conventional audio recognition schemes, as in speech recognition, draw a clear boundary between the feature extraction and recognition stages. However, this separation leaves the two stages with inconsistent objectives. The recognition stage aims to portray the global data distribution, focusing on the "relationship" between signal samples. Such considerations can hardly be embedded into the feature extraction process, which centers on local structure; as a result, the prominent "relationship" information is destroyed. In this paper, we propose a unified acoustic recognition framework that takes primitive features as input, without damaging discriminant information, and adopts an effective classification scheme accordingly. We formulate each sound as a subspace representation and initially adopt the Grassmannian distance to classify the subspace-indexed non-speech sounds. To validate the proposed framework, we conducted experiments using the RWCP Sound Scene Database. The experimental results demonstrate that the proposed framework achieves good recognition performance with high efficiency.
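The abstract does not say how the subspaces are built or which Grassmannian distance is used, so the following is only a minimal sketch under stated assumptions: each sound is a feature matrix (dimensions by frames), its subspace is spanned by the top-k left singular vectors, the distance is the geodesic distance computed from principal angles, and classification is nearest-subspace. The function names `subspace_basis`, `grassmann_distance`, and `classify` are hypothetical and do not come from the paper.

```python
import numpy as np


def subspace_basis(frames, k):
    """Return a d x k orthonormal basis from a d x n feature matrix.

    Assumption: the subspace is spanned by the top-k left singular vectors.
    """
    u, _, _ = np.linalg.svd(frames, full_matrices=False)
    return u[:, :k]


def grassmann_distance(a, b):
    """Geodesic distance between the subspaces spanned by orthonormal a and b.

    The singular values of a.T @ b are the cosines of the principal angles;
    the distance is the Euclidean norm of those angles.
    """
    cosines = np.linalg.svd(a.T @ b, compute_uv=False)
    angles = np.arccos(np.clip(cosines, -1.0, 1.0))
    return float(np.linalg.norm(angles))


def classify(query, references):
    """Return the label of the reference subspace closest to the query.

    references is a list of (label, basis) pairs.
    """
    label, _ = min(references, key=lambda item: grassmann_distance(query, item[1]))
    return label
```

Under these assumptions, a sound would be labeled by computing `subspace_basis` on its feature matrix and passing the result to `classify` together with one or more reference subspaces per class.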