Environmental Sound Classification Based on Pitch Shifting

Wen Zhao, Bo Yin
DOI: https://doi.org/10.1109/scset55041.2022.00070
2022-01-01
Abstract: Environmental sound classification (ESC) is a challenging problem that differs from traditional music and speech classification because environmental sounds are non-stationary. Classifying environmental sound with a convolutional neural network (CNN) faces two problems: audio samples of unequal length and limited data diversity, which lead to overfitting and low classification accuracy. To address these problems, we propose a waveform stretching method to handle the unequal audio lengths and adopt pitch-shifting data augmentation to prevent the overfitting caused by the limited data diversity. Log-Mel features are extracted and classified using an improved LeNet network. Experimental results show that the proposed method achieves 95.65% accuracy on UrbanSound8K.
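The two preprocessing ideas named in the abstract, length equalization and pitch-shifting augmentation, can be sketched as follows. This is a minimal NumPy illustration and not the authors' implementation: the abstract does not say how waveform stretching is performed, so linear interpolation to a fixed length is an assumption, and the pitch shifter here is a plain overlap-add time stretch followed by resampling, chosen only because it is short. The function names, the frame size of 1024, and the semitone values are all hypothetical.

```python
import numpy as np


def stretch_to_length(y, target_len):
    """Resample y to exactly target_len samples by linear interpolation.

    Assumption: stands in for the paper's "waveform stretching", whose
    details the abstract does not give. No anti-aliasing filter is applied.
    """
    y = np.asarray(y, dtype=float)
    src = np.linspace(0.0, 1.0, num=len(y))
    dst = np.linspace(0.0, 1.0, num=target_len)
    return np.interp(dst, src, y)


def time_stretch_ola(y, factor, frame=1024):
    """Change duration by `factor` (>1 is longer) using windowed overlap-add.

    Frames are copied unchanged and only their spacing changes, so pitch is
    roughly preserved. Plain overlap-add leaves phase artifacts.
    """
    y = np.asarray(y, dtype=float)
    if len(y) < frame:
        raise ValueError("signal must be at least one frame long")
    hop_out = frame // 2           # spacing of frames in the output
    hop_in = hop_out / factor      # spacing of frames in the input
    window = np.hanning(frame)
    n_frames = int((len(y) - frame) / hop_in) + 1
    out = np.zeros(hop_out * (n_frames - 1) + frame)
    weight = np.zeros_like(out)
    for i in range(n_frames):
        a = int(round(i * hop_in))
        b = i * hop_out
        out[b:b + frame] += window * y[a:a + frame]
        weight[b:b + frame] += window
    # Divide by the summed window to undo the amplitude modulation.
    return out / np.maximum(weight, 1e-8)


def pitch_shift(y, n_steps, frame=1024):
    """Shift pitch by n_steps semitones, returning len(y) samples."""
    rate = 2.0 ** (n_steps / 12.0)
    stretched = time_stretch_ola(y, rate, frame)
    # Resampling by `rate` restores the duration and scales the pitch.
    n = max(2, int(round(len(stretched) / rate)))
    shifted = stretch_to_length(stretched, n)
    out = np.zeros(len(y))
    m = min(n, len(y))
    out[:m] = shifted[:m]          # zero-pad or trim to the input length
    return out
```

A training set could then be enlarged with, for example, `[pitch_shift(clip, s) for s in (-2, -1, 1, 2)]` after each clip has been passed through `stretch_to_length`. The semitone offsets are illustrative only. In practice a phase-vocoder routine such as `librosa.effects.pitch_shift` would likely give cleaner audio than this overlap-add sketch.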