Abstract: Environmental Sound Classification is an important problem in sound recognition. It is more complicated than speech recognition because environmental sounds are not well structured with respect to time and frequency. Over the past years, researchers have used various CNN models to learn from different audio features generated from the audio files, such as log mel spectrograms, gammatone spectral coefficients and mel-frequency spectral coefficients. In this paper, we propose a new methodology, Two-Level Classification. The Level 1 Classifier classifies the audio signal into a broader class. The Level 2 Classifiers, chosen according to the output of the Level 1 Classifier, then find the actual class to which the audio belongs. We also show the effects of different audio filters, including a new method called Audio Crop, introduced in this paper, which gave the highest accuracies in most cases. We used the ESC-50 dataset for our experiments and obtained a maximum accuracy of 78.75% for Level 1 Classification and 98.04% for Level 2 Classification.
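
The routing step of Two-Level Classification can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. `two_level_predict`, the `Classifier` type and the stub scores are hypothetical. Any trained model could stand behind each callable, for instance an EfficientNetB2 network like the one named in the reported results. The "animals" group with dog, cow and sheep follows the example given in the summary. The "natural" group with "rain" and "wind" is invented for illustration.

```python
from typing import Callable, Dict, Sequence, Tuple

import numpy as np

# Hypothetical type: any callable that maps an input feature array
# (e.g. a log mel spectrogram) to a 1-D vector of class scores.
Classifier = Callable[[np.ndarray], np.ndarray]


def two_level_predict(
    features: np.ndarray,
    level1: Classifier,
    level1_labels: Sequence[str],
    level2: Dict[str, Classifier],
    level2_labels: Dict[str, Sequence[str]],
) -> Tuple[str, str]:
    """Predict (broad class, fine class) for one clip."""
    # Level 1: choose the broad category with the highest score.
    broad = level1_labels[int(np.argmax(level1(features)))]
    # Level 2: consult only the classifier for that category,
    # which scores the fine classes belonging to it.
    fine_scores = level2[broad](features)
    fine = level2_labels[broad][int(np.argmax(fine_scores))]
    return broad, fine


if __name__ == "__main__":
    # Stub models standing in for trained networks (hypothetical scores).
    def level1_stub(x: np.ndarray) -> np.ndarray:
        return np.array([0.7, 0.3])  # favours "animals"

    def animals_stub(x: np.ndarray) -> np.ndarray:
        return np.array([0.1, 0.8, 0.1])  # favours "cow"

    def natural_stub(x: np.ndarray) -> np.ndarray:
        return np.array([0.6, 0.4])

    labels1 = ["animals", "natural"]
    labels2 = {"animals": ["dog", "cow", "sheep"], "natural": ["rain", "wind"]}
    level2 = {"animals": animals_stub, "natural": natural_stub}
    # The zero array is a placeholder input; the stubs ignore it.
    print(two_level_predict(np.zeros((128, 216)), level1_stub, labels1, level2, labels2))
    # ('animals', 'cow')
```

In this sketch, each Level 2 model only sees the classes of its own group. As a result, a Level 1 mistake cannot be corrected at Level 2.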

What problem does this paper attempt to address?

This paper addresses Environmental Sound Classification (ESC). Unlike speech, environmental sounds have an irregular structure in both the time domain and the frequency domain, which makes them harder to classify. The paper focuses on three issues:

1. **Complexity of environmental sounds**: Environmental sounds are highly varied and often come with complex background noise. Examples include barking dogs, birdsong, knocking on a door and vacuum cleaners. Unlike speech, these sounds have no clear temporal or spectral structure.
2. **Limitations of existing methods**: Previous studies have used various Convolutional Neural Network (CNN) models to learn audio features. However, most of them classify each audio file directly into one of 50 categories and do not exploit the hierarchical grouping of the sound classes.
3. **Improving classification accuracy**: To improve accuracy, the paper proposes a new two-level classification method. It also studies how different audio filters affect pre-trained models.

### Solutions proposed in the paper

To address these problems, the paper proposes the following:

1. **Two-level classification**:
   - **Level 1 Classifier**: assigns the audio signal to a broad category, such as animals, birds or natural sounds.
   - **Level 2 Classifiers**: use the output of the Level 1 Classifier to assign the audio to a specific sub-category. For example, if the Level 1 Classifier labels a clip "animal", the Level 2 Classifier decides whether it is a dog, a cow, a sheep, and so on.
2. **New audio processing methods**:
   - **Audio Crop**: fills the silent parts of an audio file by repeating the non-zero segment of the audio, so that every clip keeps the same length.
   - **Other audio filters**: low-pass, high-pass, band-pass and band-stop filters. These remove unwanted frequency components, which improves classification performance.
3. **Experimental verification**:
   - The experiments use the ESC-50 dataset, which contains 2,000 five-second audio clips covering 50 sound classes.
   - The highest Level 1 accuracy, 78.75%, was achieved with Audio Crop and the EfficientNetB2 model. The best Level 2 Classifier reached 98.04%.

### Summary

The paper combines two-level classification with several audio processing techniques to improve environmental sound classification. On ESC-50 it reports maximum accuracies of 78.75% for Level 1 and 98.04% for Level 2. The approach splits the 50-way problem into a broad-category decision followed by a smaller within-category decision. This structure, together with the filter study, offers a direction for future research.
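
Audio Crop and the frequency filters described above can be sketched with NumPy alone. This is an illustration under stated assumptions, not the paper's implementation.

`audio_crop` (a hypothetical name) assumes that the silence consists of exact zeros from padding, and its `threshold` parameter is a hypothetical knob. It keeps only the span between the first and last non-silent samples and repeats that span until the original length is reached.

`fft_filter` (also hypothetical) is an idealized brick-wall filter that zeroes FFT bins. The paper's actual filter design and cutoff frequencies are not given here. A Butterworth design such as `scipy.signal.butter` is a common alternative.

The 44.1 kHz rate matches the ESC-50 clips.

```python
from typing import Tuple, Union

import numpy as np


def audio_crop(signal: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    """Replace leading/trailing silence by repeating the non-silent segment."""
    # Indices of samples louder than the (assumed) silence threshold.
    active = np.flatnonzero(np.abs(signal) > threshold)
    if active.size == 0:
        return signal.copy()  # all-silent clip: nothing to repeat
    segment = signal[active[0]: active[-1] + 1]
    # np.resize tiles the segment cyclically until the original length is reached.
    return np.resize(segment, signal.shape)


def fft_filter(
    signal: np.ndarray,
    fs: float,
    btype: str,
    cutoff: Union[float, Tuple[float, float]],
) -> np.ndarray:
    """Ideal (brick-wall) low/high/band-pass or band-stop filter via the FFT."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    if btype == "lowpass":
        keep = freqs <= cutoff
    elif btype == "highpass":
        keep = freqs >= cutoff
    elif btype == "bandpass":
        low, high = cutoff
        keep = (freqs >= low) & (freqs <= high)
    elif btype == "bandstop":
        low, high = cutoff
        keep = (freqs < low) | (freqs > high)
    else:
        raise ValueError(f"unknown filter type: {btype}")
    # Zero the rejected bins and transform back to a signal of the same length.
    return np.fft.irfft(spectrum * keep, n=signal.size)


if __name__ == "__main__":
    fs = 44100  # ESC-50 sample rate
    clip = np.zeros(5 * fs)  # a 5-second clip, mostly zero padding
    clip[:fs] = np.random.default_rng(0).standard_normal(fs)  # 1 s of "sound"
    filled = audio_crop(clip)
    print(filled.shape)  # (220500,)
    low = fft_filter(filled, fs, "lowpass", 1000.0)  # hypothetical cutoff
    notch = fft_filter(filled, fs, "bandstop", (900.0, 1100.0))  # hypothetical band
```

Using `np.resize` keeps the crop to one line: it repeats the input cyclically when the target shape is larger. The FFT filter is exact for the toy case but has sharp edges that can ring on real recordings. This is why smoother designs are usually preferred in practice.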

Studying the Effect of Audio Filters in Pre-Trained Models for Environmental Sound Classification
