Studying the Effect of Audio Filters in Pre-Trained Models for Environmental Sound Classification

Aditya Dawn, Wazib Ansar
2024-08-25
Abstract: Environmental Sound Classification is an important problem in sound recognition and is more complicated than speech recognition, because environmental sounds are not well structured with respect to time and frequency. Over the past years, researchers have used various CNN models to learn from different audio features generated from the audio files, such as log mel spectrograms, gammatone spectral coefficients, and mel-frequency spectral coefficients. In this paper, we propose a new methodology, Two-Level Classification: the Level 1 Classifier is responsible for classifying the audio signal into a broader class, and the Level 2 Classifiers are responsible for finding the actual class to which the audio belongs, based on the output of the Level 1 Classifier. We also show the effects of different audio filters, among which a new method, Audio Crop, is introduced in this paper; it gave the highest accuracies in most cases. We used the ESC-50 dataset for our experiments and obtained a maximum accuracy of 78.75% for Level 1 Classification and 98.04% for Level 2 Classification.
Sound, Artificial Intelligence, Audio and Speech Processing
What problem does this paper attempt to address?
This paper addresses the problem of Environmental Sound Classification (ESC). Unlike speech, environmental sounds have irregular structure in both the time domain and the frequency domain, which makes them harder to classify. Specifically, the paper focuses on the following issues:

1. **Complexity of environmental sounds**: Environmental sounds are highly varied and often come with complex background noise. Examples include barking dogs, birdsong, knocking on a door, and vacuum cleaners. These sounds lack the clear time and frequency structure of speech.
2. **Limitations of existing methods**: Previous studies have used various Convolutional Neural Network (CNN) models to learn audio features, but most of them classify audio files directly into 50 categories and do not exploit the hierarchical structure of the classes.
3. **Improving classification accuracy**: To improve the accuracy of environmental sound classification, the paper proposes a new two-level classification method and studies the influence of different audio filters on pre-trained models.

### Solutions proposed in the paper

To address these issues, the paper proposes the following:

1. **Two-level classification method**:
   - **Level 1 Classifier**: Classifies audio signals into broader categories, such as animals, birds, natural sounds, etc.
   - **Level 2 Classifiers**: Based on the output of the Level 1 Classifier, assign the audio to a specific sub-category. For example, if the Level 1 Classifier labels the audio as "animal", the corresponding Level 2 Classifier determines whether it is a dog, a cow, a sheep, etc.
2. **Introduction of new audio processing methods**:
   - **Audio Crop**: Fills the silent parts of an audio file by repeating the non-zero part of the audio segment, so that the audio length stays consistent.
   - **Other audio filters**: Low-pass, high-pass, band-pass, and band-stop filters, which remove unwanted frequency components with the aim of improving classification performance.
3. **Experimental verification**:
   - The experiments use the ESC-50 dataset, which contains 2,000 five-second audio clips covering 50 sound categories.
   - The Level 1 Classifier combined with Audio Crop and the EfficientNetB2 model reached the highest Level 1 accuracy, 78.75%; the highest Level 2 accuracy was 98.04%.

### Summary

By combining a two-level classification method with several audio processing techniques, the paper reports accuracies of up to 78.75% for Level 1 classification and 98.04% for Level 2 classification on ESC-50. The approach tackles the complexity of environmental sound classification by exploiting the hierarchy of sound classes, and it offers ideas and techniques for future related research.
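The Audio Crop step described above can be illustrated with a minimal sketch. The summary does not give the authors' exact procedure, so this version makes several assumptions: "silent" means an absolute sample value at or below a threshold `eps`, only leading and trailing silence is discarded, and the remaining segment is tiled until it fills the target length. The function name `audio_crop` and its parameters are hypothetical.

```python
import numpy as np

def audio_crop(signal, target_len, eps=0.0):
    """Repeat the non-silent part of `signal` until it fills `target_len` samples.

    Assumption: samples with |x| <= eps count as silence, and only leading
    and trailing silence is dropped (silence inside the segment is kept).
    """
    signal = np.asarray(signal, dtype=np.float32)
    active = np.flatnonzero(np.abs(signal) > eps)
    if active.size == 0:
        # Entirely silent clip: nothing to repeat, so return silence.
        return np.zeros(target_len, dtype=np.float32)
    # Keep everything between the first and last non-silent sample.
    segment = signal[active[0]:active[-1] + 1]
    repeats = -(-target_len // segment.size)  # ceiling division
    return np.tile(segment, repeats)[:target_len]

# Toy clip: three non-zero samples surrounded by zero padding.
clip = np.array([0, 0, 1, 2, 3, 0, 0, 0])
print(audio_crop(clip, target_len=8))  # the segment 1, 2, 3 repeated to length 8
```

For an ESC-50 clip, `target_len` would be the number of samples in five seconds at the clip's sample rate, so every clip keeps the same length after cropping.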
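The four filter types listed above differ only in which frequency range they keep. The summary does not say which filter design or cutoff frequencies the paper uses, so the sketch below swaps in the simplest possible stand-in: an ideal "brick-wall" filter that zeroes FFT bins outside the pass band. The function name `fft_filter` and the cutoff values are assumptions for illustration, not the authors' settings.

```python
import numpy as np

def fft_filter(signal, sample_rate, kind, low_hz=None, high_hz=None):
    """Ideal (brick-wall) filter: zero out FFT bins outside the pass band.

    Illustrative stand-in only; a practical pipeline would more likely use
    a designed filter (for example a Butterworth filter).
    """
    signal = np.asarray(signal, dtype=np.float64)
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / sample_rate)
    if kind == "lowpass":
        keep = freqs <= high_hz
    elif kind == "highpass":
        keep = freqs >= low_hz
    elif kind == "bandpass":
        keep = (freqs >= low_hz) & (freqs <= high_hz)
    elif kind == "bandstop":
        keep = (freqs < low_hz) | (freqs > high_hz)
    else:
        raise ValueError(f"unknown filter kind: {kind!r}")
    # Masked bins are multiplied by 0, kept bins by 1.
    return np.fft.irfft(spectrum * keep, n=signal.size)

# One second of a 50 Hz tone mixed with a 300 Hz tone, sampled at 1 kHz.
sample_rate = 1000
t = np.arange(sample_rate) / sample_rate
low_tone = np.sin(2 * np.pi * 50 * t)
high_tone = np.sin(2 * np.pi * 300 * t)
mix = low_tone + high_tone

# A low-pass filter at 100 Hz should leave only the 50 Hz tone.
filtered = fft_filter(mix, sample_rate, "lowpass", high_hz=100)
print(np.allclose(filtered, low_tone, atol=1e-8))
```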
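The two-level routing described above, where the Level 1 output selects which Level 2 classifier runs, can be sketched as follows. This is a minimal illustration under stated assumptions: the classifiers are plain Python callables standing in for trained models, and the toy rules, feature values, and the helper name `two_level_predict` are all hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple

# A classifier is modeled as any callable mapping features to a label.
Classifier = Callable[[Sequence[float]], str]

def two_level_predict(
    features: Sequence[float],
    level1: Classifier,
    level2: Dict[str, Classifier],
) -> Tuple[str, str]:
    """Route a sample through Level 1, then the matching Level 2 classifier."""
    broad = level1(features)
    if broad not in level2:
        raise KeyError(f"no Level 2 classifier registered for {broad!r}")
    return broad, level2[broad](features)

# Toy stand-ins for trained models (hypothetical decision rules).
def toy_level1(features):
    return "animal" if features[0] > 0.5 else "natural"

toy_level2 = {
    "animal": lambda f: "dog" if f[1] > 0.5 else "cow",
    "natural": lambda f: "rain" if f[1] > 0.5 else "wind",
}

print(two_level_predict([0.9, 0.8], toy_level1, toy_level2))  # ('animal', 'dog')
```

One consequence of this design is that a Level 1 mistake cannot be recovered at Level 2, because the sample is routed to a classifier that does not contain its true class. Level 2 accuracy therefore needs to be read together with the Level 1 accuracy.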