Abstract: Automatic sound classification attracts increasing research attention owing to its wide range of applications, such as robot navigation, environmental sensing, musical instrument classification, medical diagnosis, and surveillance. In this research, we propose an ensemble convolutional bidirectional Long Short-Term Memory (CBiLSTM) network with optimal hyper-parameter selection for sound classification. We first transform each audio signal into a spectrogram representation using the short-time Fourier transform (STFT). A Particle Swarm Optimization (PSO) variant is then proposed to optimize the learning rate, the weight decay, and the numbers of filters and hidden units in the convolutional and BiLSTM layers, respectively, in order to extract effective spatial–temporal characteristics from the spectrogram inputs. To tackle stagnation during optimization, the proposed algorithm incorporates local exploitation using the secant and Newton–Raphson methods, promising leader generation using regular and irregular super-ellipse formulae, and three-dimensional spherical search coefficients. Moreover, it combines multiple fused elite signals with numerical-analysis-based exploitation to balance diversification and intensification. A variety of CBiLSTM networks with distinct optimized settings are devised, and an ensemble model is then constructed from three of the resulting networks using a majority voting scheme. Evaluated on several audio data sets, our ensemble CBiLSTM networks outperform networks with default settings, networks with optimal settings identified by other search methods, existing deep architectures, and state-of-the-art related studies. Beyond sound classification, the proposed PSO algorithm also outperforms a number of classical and advanced search methods on diverse unimodal and multimodal benchmark functions with statistical significance.
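
The majority voting step over three networks can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `majority_vote`, the array layout, and the tie-breaking rule (lowest class index wins) are assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def majority_vote(predictions: np.ndarray, num_classes: int) -> np.ndarray:
    """Combine hard class predictions from several models by majority vote.

    predictions: integer array of shape (n_models, n_samples), where each row
    holds the class labels predicted by one model (hypothetical layout).
    Ties are broken in favour of the lowest class index (an assumption).
    """
    n_models, n_samples = predictions.shape
    # counts[i, c] = number of models that predicted class c for sample i
    counts = np.zeros((n_samples, num_classes), dtype=int)
    sample_idx = np.arange(n_samples)
    for model_preds in predictions:
        counts[sample_idx, model_preds] += 1
    # argmax returns the first maximum, which gives the tie-breaking rule above
    return counts.argmax(axis=1)


# Three hypothetical networks voting on four audio clips
preds = np.array([
    [0, 1, 2, 1],
    [0, 1, 1, 2],
    [1, 1, 2, 0],
])
ensemble = majority_vote(preds, num_classes=3)
```

With three voters, a tie arises only when all three networks disagree, as in the last clip above; a real system might instead fall back on the most confident network.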

Sound Classification Based on Multihead Attention and Support Vector Machine

Automatic Respiratory Sound Classification Via Multi-Branch Temporal Convolutional Network

Robust Audio Sensing with Multi-Sound Classification

Advanced Framework for Animal Sound Classification With Features Optimization

Hierarchical Support Vector Machines for Audio Classification

Eco-Environmental Sound Classification Based on Matching Pursuit and Support Vector Machine

A Svm-based Classification Approach for Natural Sounds

Audio Classification and Segmentation Based on Support Vector Machines

Sound Classification Using Evolving Ensemble Models and Particle Swarm Optimization

Robust sound event classification using deep neural networks

Environmental Sound Classification Using Temporal-Frequency Attention Based Convolutional Neural Network

An Automatic Classification System for Environmental Sound in Smart Cities

A Spiking Neural Network Framework for Robust Sound Classification

A multi-view CNN-based acoustic classification system for automatic animal species identification

Acoustic Target Recognition Based on MFCC and SVM

Heterogeneous sound classification with the Broad Sound Taxonomy and Dataset

Large Scale Environmental Sound Classification Based on Efficient Feature Extraction

Multi-stream Network With Temporal Attention For Environmental Sound Classification

Wavelet Scattering Transform for Multiclass Support Vector Machines in Audio Devices Classification System

Improved Multi-Model Classification Technique for Sound Event Detection in Urban Environments

Research on Environmental Sound Classification Algorithm Based on Multi-feature Fusion