Rethinking environmental sound classification using convolutional neural networks: optimized parameter tuning of single feature extraction

Yousef Abd Al-Hattab, Hasan Firdaus Zaki, Amir Akramin Shafie
DOI: https://doi.org/10.1007/s00521-021-06091-7
2021-05-26
Neural Computing and Applications
Abstract: The classification of environmental sounds is important for emerging applications such as automatic audio surveillance, audio forensics, and robot navigation. Existing techniques combine multiple features and stack many CNN layers (very deep learning) to reach the desired accuracy. Instead of using many features and stacking resource-intensive layers to go deeper, this paper proposes a novel technique that uses only a single feature, the Mel-Frequency Cepstral Coefficient (MFCC), and just three CNN layers. We demonstrate that such a simple network can considerably outperform several conventional and deep learning-based algorithms. By fine-tuning the parameters of the input data, we report a model with a significantly less complex architecture that nonetheless achieves an accuracy of 95.59% on the UrbanSound8K dataset, comparable to state-of-the-art deep models.
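For readers unfamiliar with the single feature named in the abstract, the following is a minimal sketch of how MFCCs can be computed for one audio frame using the textbook pipeline (window, power spectrum, triangular mel filterbank, log, DCT-II). It is not the authors' implementation: the function names, the Hamming window, and the defaults of 20 mel filters and 13 coefficients are assumptions chosen for illustration, and the abstract does not state the tuned input parameters the paper actually uses.

```python
import cmath
import math


def hz_to_mel(f):
    """Convert frequency in Hz to the mel scale (common 2595*log10 formula)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hamming-window the frame and return its one-sided power spectrum."""
    n = len(frame)
    windowed = [
        x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
        for i, x in enumerate(frame)
    ]
    spectrum = []
    for k in range(n // 2 + 1):
        # Naive DFT: fine for a short illustrative frame, slow for real audio.
        s = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(s) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Build triangular filters spaced evenly on the mel scale."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = [low + i * (high - low) / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mel_points]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):      # rising edge of the triangle
            row[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank


def dct2(values, n_coeffs):
    """Unnormalized DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(n_coeffs)
    ]


def mfcc_frame(frame, sample_rate, n_filters=20, n_coeffs=13):
    """Compute MFCCs for a single frame (illustrative defaults, not the paper's)."""
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Small epsilon avoids log(0) when a filter captures no energy.
    log_energies = [
        math.log(sum(w * p for w, p in zip(row, spectrum)) + 1e-10) for row in bank
    ]
    return dct2(log_energies, n_coeffs)
```

Applying `mfcc_frame` to successive overlapping frames of a clip yields a two-dimensional coefficient-by-time array, which is the kind of single-feature input a small CNN would consume. In practice an audio library would replace the naive DFT used here.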
computer science, artificial intelligence