SS-ESC: a spectral subtraction denoising based deep network model on environmental sound classification

Yunus Korkmaz
DOI: https://doi.org/10.1007/s11760-024-03649-5
IF: 1.583
2024-12-05
Signal Image and Video Processing
Abstract: Environmental Sound Classification (ESC), also referred to as Sound Event Classification, is an essential part of many speech processing applications in terms of separating background audio from the original signal. With recent developments in deep learning, studies in the ESC area have also improved significantly. Because of the nature of digital sound signals, ESC has so far mostly been developed using manually extracted one-dimensional (1D) features. In this paper, a novel deep-learning-based ESC pipeline that uses spectral subtraction denoising as a preliminary stage is proposed. The well-known deep learning architectures GoogLeNet, AlexNet, ShuffleNet, SqueezeNet and ResNet-18 were applied to the ESC problem using the ESC-10 benchmark dataset. Log-mel spectrogram images were used as feature matrices for these networks. The results showed that the proposed SS-ESC model achieved its best results with AlexNet, outperforming many state-of-the-art methods with a test accuracy of 99.17% on ESC-10. These findings show that spectral subtraction denoising, when used as a preliminary stage, can improve classification accuracy in environmental sound classification.
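The abstract does not state which spectral subtraction variant or parameters were used, so the following is only a minimal sketch of basic magnitude spectral subtraction, not the author's implementation. The function name `spectral_subtract`, the frame length, hop size, spectral floor, and the assumption that the first few frames contain noise only are all illustrative choices that do not come from the paper.

```python
import numpy as np


def spectral_subtract(x, frame_len=512, hop=256, noise_frames=10, floor=0.02):
    """Basic magnitude spectral subtraction (illustrative sketch only)."""
    x = np.asarray(x, dtype=float)

    # Periodic Hann window. With hop == frame_len / 2 the shifted windows
    # sum to 1, so plain overlap-add reconstructs the interior samples.
    n = np.arange(frame_len)
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * n / frame_len)

    # Zero-pad so that every sample falls inside a complete frame.
    n_frames = 1 + int(np.ceil(max(len(x) - frame_len, 0) / hop))
    padded = np.zeros((n_frames - 1) * hop + frame_len)
    padded[:len(x)] = x

    # Short-time Fourier transform, one row per frame.
    starts = np.arange(n_frames) * hop
    frames = np.stack([padded[s:s + frame_len] * window for s in starts])
    spec = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)

    # Assumption: the first `noise_frames` frames contain noise only.
    noise_mag = mag[:noise_frames].mean(axis=0)

    # Subtract the noise estimate, keeping a small fraction of the
    # original magnitude as a floor to avoid negative values.
    clean_mag = np.maximum(mag - noise_mag, floor * mag)

    # Resynthesise with the noisy phase and overlap-add the frames.
    clean_frames = np.fft.irfft(clean_mag * np.exp(1j * phase),
                                n=frame_len, axis=1)
    out = np.zeros_like(padded)
    for s, frame in zip(starts, clean_frames):
        out[s:s + frame_len] += frame
    return out[:len(x)]


# Synthetic demo: 0.25 s of noise only, then a 440 Hz tone in white noise.
rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
clean = np.where(t >= 0.25, np.sin(2 * np.pi * 440 * t), 0.0)
noisy = clean + 0.1 * rng.standard_normal(sr)
denoised = spectral_subtract(noisy)
```

In a pipeline like the one the abstract describes, the denoised signal would then be converted to a log-mel spectrogram image before being passed to the network; that step is omitted here. Real recordings rarely begin with a clean noise-only segment, so the noise estimate would usually need a more careful strategy than this sketch uses.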
engineering, electrical & electronic; imaging science & photographic technology