Abstract: Environmental sound detection is a challenging application of machine learning because of the noisy nature of the signal and the small amount of (labeled) data that is typically available. This work therefore presents a comparison of several state-of-the-art deep learning models on the task and data of the IEEE challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) 2016, which requires classifying sounds into one of fifteen common indoor and outdoor acoustic scenes, such as bus, cafe, car, city center, forest path, library, and train. In total, 13 hours of stereo audio recordings are available, making this one of the largest datasets of its kind. We perform experiments on six sets of features, including standard Mel-frequency cepstral coefficients (MFCC), Binaural MFCC, log Mel-spectrum, and two different large-scale temporal pooling features extracted using OpenSMILE. On these features, we apply five models: Gaussian Mixture Model (GMM), Deep Neural Network (DNN), Recurrent Neural Network (RNN), Convolutional Deep Neural Network (CNN), and i-vector. Using a late-fusion approach, we improve on the 72.5% baseline by 15.6% in average 4-fold cross-validation (CV) accuracy and by 11% in test accuracy, which matches the best result of the DCASE 2016 challenge. With large feature sets, deep neural network models outperform traditional methods and achieve the best performance among all the studied methods. Consistent with other work, the best-performing single model is the non-temporal DNN model, which we take as evidence that sounds in the DCASE challenge do not exhibit strong temporal dynamics.
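Late fusion combines the outputs of separately trained models rather than their input features. The abstract does not specify the fusion rule, so the following is only a minimal sketch under an assumed rule: each model (for example the GMM, DNN, and CNN) produces a probability for every scene class, the per-class probabilities are combined by a weighted average, and the class with the highest fused score is predicted. The function name `late_fusion` and its `weights` parameter are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def late_fusion(
    model_probs: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> Tuple[int, List[float]]:
    """Hypothetical late-fusion helper (weighted averaging is an assumption).

    model_probs[m][c] is model m's probability for scene class c on one clip.
    Returns (index of the predicted class, fused probability vector).
    """
    if not model_probs:
        raise ValueError("need at least one model")
    n_classes = len(model_probs[0])
    if any(len(p) != n_classes for p in model_probs):
        raise ValueError("all models must score the same set of classes")

    # Equal weights unless the caller supplies one weight per model.
    if weights is None:
        weights = [1.0] * len(model_probs)
    if len(weights) != len(model_probs):
        raise ValueError("expected exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Weighted average of the per-class probabilities across models.
    fused = [
        sum(w * p[c] for w, p in zip(weights, model_probs)) / total
        for c in range(n_classes)
    ]
    # Predict the class with the highest fused probability.
    predicted = max(range(n_classes), key=lambda c: fused[c])
    return predicted, fused
```

For instance, with three toy models scoring three classes as `[0.5, 0.3, 0.2]`, `[0.2, 0.7, 0.1]`, and `[0.3, 0.4, 0.3]`, equal weights give fused scores of about `[0.333, 0.467, 0.200]`, so class 1 is predicted. In practice the weights (or a learned combiner) would be tuned on validation folds; averaging probabilities is simply the most common baseline choice.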

DD-CNN: Depthwise Disout Convolutional Neural Network for Low-complexity Acoustic Scene Classification

A Low-Complexity Deep Learning Framework For Acoustic Scene Classification

Low-Complexity Acoustic Scene Classification Using Parallel Attention-Convolution Network

Acoustic scene classification based on three-dimensional multi-channel feature-correlated deep learning networks

A convolutional neural network approach for acoustic scene classification

Sample Dropout for Audio Scene Classification Using Multi-Scale Dense Connected Convolutional Neural Network

Hierarchical learning for DNN-based acoustic scene classification

Deep Neural Decision Forest for Acoustic Scene Classification

Deep semantic learning for acoustic scene classification

An Investigation on Multiscale Normalised Deep Scattering Spectrum with Deep Residual Network for Acoustic Scene Classification

Deep Segment Model for Acoustic Scene Classification

Acoustic scene classification using deep CNN with fine-resolution feature

Low-Complexity Acoustic Scene Classification Using Data Augmentation and Lightweight ResNet

Constrained Learned Feature Extraction for Acoustic Scene Classification

Domestic Activities Classification from Audio Recordings Using Multi-scale Dilated Depthwise Separable Convolutional Network

Low-complexity acoustic scene classification in DCASE 2022 Challenge

Data-Efficient Low-Complexity Acoustic Scene Classification in the DCASE 2024 Challenge

Environmental Sound Classification Based on Multi-temporal Resolution Convolutional Neural Network Combining with Multi-level Features

Dual-path convolutional neural network based on band interaction block for acoustic scene classification

Multi-Scale Convolutional Recurrent Neural Network with Ensemble Method for Weakly Labeled Sound Event Detection

A Comparison of deep learning methods for environmental sound