Abstract: This paper presents a new approach to environmental sound classification that combines a texture feature, the local binary pattern (LBP), with audio features. To our knowledge, this is the first time that the LBP (or its variants), which has a proven track record in image recognition and classification, has been generalized to one dimension (1D) and combined with audio features for an environmental sound classification task. To this end, we have defined LBP-1D and local phase quantization (LPQ)-1D on the 1D audio signal, and we have applied the original LBP, the variance LBP (VARLBP) and the extended LBP (ELBP) to the spectrogram of the audio signal in order to model the sound texture. We have also extensively compared these new LBP-based features with the classical audio descriptors commonly used in environmental sound classification, such as MFCC, GFCC, CQT, chromagram, STE and ZCR. We have evaluated our algorithm on the ESC-10 and ESC-50 datasets using classical machine learning algorithms, such as support vector machines (SVM), random forest and k-nearest neighbor (kNN). The results show that the LBP features outperform the classical audio features.
We mix the LBP features with the audio descriptors, and our best mixed model achieves state-of-the-art results for environmental sound classification: 88.5% on ESC-10 and 64.6% on ESC-50. These results outperform those of methods that use handcrafted features with classical machine learning algorithms and are similar to those of some convolutional neural network-based methods. Although our method is not at the cutting edge of the state of the art, it is faster than any convolutional neural network method and is a better choice when data are scarce or computing power is limited.
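
To make the idea of an LBP on a 1D signal concrete, the following is a minimal sketch in Python. The abstract does not give the paper's exact definition of LBP-1D, so this uses one common formulation as an assumption: each sample is compared with its neighbours on both sides, each neighbour that is greater than or equal to the centre sets one bit, and the resulting codes are summarized as a normalized histogram. The function names `lbp_1d_codes` and `lbp_1d_histogram` and the `half_window` parameter are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def lbp_1d_codes(signal: Sequence[float], half_window: int = 4) -> List[int]:
    """Return one binary-pattern code per sample that has a full neighbourhood.

    Assumed formulation (not taken from the paper): a neighbour sets its bit
    when its value is >= the centre sample.
    """
    codes = []
    for i in range(half_window, len(signal) - half_window):
        center = signal[i]
        # Left neighbours first, then right neighbours; the centre is skipped.
        neighbours = list(signal[i - half_window:i]) + list(
            signal[i + 1:i + 1 + half_window]
        )
        code = 0
        for bit, value in enumerate(neighbours):
            if value >= center:
                code |= 1 << bit
        codes.append(code)
    return codes


def lbp_1d_histogram(signal: Sequence[float], half_window: int = 4) -> List[float]:
    """Normalized histogram of the codes, usable as a fixed-length feature vector."""
    n_bins = 2 ** (2 * half_window)
    hist = [0.0] * n_bins
    codes = lbp_1d_codes(signal, half_window)
    for code in codes:
        hist[code] += 1.0
    if codes:
        hist = [count / len(codes) for count in hist]
    return hist


if __name__ == "__main__":
    # A steadily rising signal: only the right-hand neighbour exceeds the centre.
    print(lbp_1d_codes([0, 1, 2, 3, 4], half_window=1))
    print(lbp_1d_histogram([0, 1, 2, 3, 4], half_window=1))
```

Under these assumptions the histogram has a fixed length of 2^(2 × `half_window`) bins regardless of clip duration, so it could be concatenated with other descriptors (for example MFCC statistics) before being passed to a classical classifier such as an SVM.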

Using audio content and emotional response to predict soundscape perception through machine learning

Predicting Emotions Perceived from Sounds

Interpretable and Robust Machine Learning for Exploring and Classifying Soundscape Data

Audio-Based Emotion Recognition Using Self-Supervised Learning on an Engineered Feature Space

Environmental Sound Classification Using Local Binary Pattern and Audio Features Collaboration

Audio Recognition using Mel Spectrograms and Convolution Neural Networks

Robust Audio Sensing with Multi-Sound Classification

Construction of AI Environmental Music Education Application Model Based on Deep Learning

Spectral images based environmental sound classification using CNN with meaningful data augmentation

Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification

A Comparison of deep learning methods for environmental sound

The Construction of a Neural Network Model for Speech Emotion Recognition

Self-Supervised Learning for Audio-Based Emotion Recognition

Artificial intelligence-based collaborative acoustic scene and event classification to support urban soundscape analysis and classification

A novel hybrid ensemble approach to enhance the acoustic event classification in environmental sound analysis

An Ensemble One Dimensional Convolutional Neural Network with Bayesian Optimization for Environmental Sound Classification

Improving the Environmental Perception of Autonomous Vehicles using Deep Learning-based Audio Classification

Fusion of electroencephalographic dynamics and musical contents for estimating emotional responses in music listening

End-to-End Environmental Sound Classification using a 1D Convolutional Neural Network

Using Deep Learning to Recognize Therapeutic Effects of Music Based on Emotions

Rethinking environmental sound classification using convolutional neural networks: optimized parameter tuning of single feature extraction