Abstract: In this paper, we present a comprehensive analysis of Acoustic Scene Classification (ASC), the task of identifying the scene of an audio recording from its acoustic signature. In particular, we first propose an inception-based, low-footprint ASC model, referred to as the ASC baseline. The proposed ASC baseline is then compared with benchmark and high-complexity network architectures: MobileNetV1, MobileNetV2, VGG16, VGG19, ResNet50V2, ResNet152V2, DenseNet121, DenseNet201, and Xception. Next, we improve the ASC baseline by proposing a novel deep neural network architecture that leverages residual-inception architectures and multiple kernels. Given this novel residual-inception (NRI) model, we further evaluate the trade-off between model complexity and model accuracy. Finally, we evaluate whether sound events occurring in a sound scene recording can help improve ASC accuracy, and then show how a sound scene context can be effectively presented by combining both sound scene and sound event information. We conduct extensive experiments on various ASC datasets, including Crowded Scenes and the IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) 2018 Task 1A and 1B, 2019 Task 1A and 1B, 2020 Task 1A, 2021 Task 1A, and 2022 Task 1. The experimental results on these ASC challenges highlight two main achievements: first, robust, general, and low-complexity ASC systems that are suitable for real-life applications on a wide range of edge and mobile devices; second, an effective visualization method for comprehensively presenting a sound scene context.
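The abstract gives no layer-level details of the NRI model, so the following is only an illustrative sketch and not the authors' architecture. It shows one way the complexity side of the complexity/accuracy trade-off can be quantified: counting the parameters of a hypothetical residual-inception block in which parallel convolutions with several kernel sizes read the same input, their outputs are concatenated, and a residual shortcut uses a 1×1 projection when channel counts differ. The names `Conv2D`, `residual_inception_params`, and `footprint_kb`, the kernel sizes, the channel widths, and the 4-byte (32-bit float) storage assumption are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Conv2D:
    """Shape-only description of a 2-D convolution (no weights are stored)."""
    in_ch: int
    out_ch: int
    kernel: tuple[int, int]
    bias: bool = True

    def params(self) -> int:
        # kernel_h * kernel_w * in_channels * out_channels, plus one bias per output channel.
        kh, kw = self.kernel
        weights = kh * kw * self.in_ch * self.out_ch
        return weights + (self.out_ch if self.bias else 0)


def residual_inception_params(in_ch: int, branch_ch: int,
                              kernels: list[tuple[int, int]]) -> tuple[int, int]:
    """Parameter count of a hypothetical residual-inception block.

    Assumptions (illustrative only, not taken from the paper):
      * one convolution per kernel size, all reading the same input;
      * branch outputs are concatenated along the channel axis;
      * the shortcut is an identity when channels match, otherwise a 1x1 conv.
    Returns (total_params, out_channels).
    """
    branches = [Conv2D(in_ch, branch_ch, k) for k in kernels]
    out_ch = branch_ch * len(kernels)
    total = sum(b.params() for b in branches)
    if out_ch != in_ch:
        # Projection shortcut so the residual addition has matching shapes.
        total += Conv2D(in_ch, out_ch, (1, 1)).params()
    return total, out_ch


def footprint_kb(n_params: int, bytes_per_param: int = 4) -> float:
    """Model size in KiB; 4 bytes per parameter corresponds to 32-bit floats."""
    return n_params * bytes_per_param / 1024


if __name__ == "__main__":
    kernels = [(1, 1), (3, 3), (5, 5)]
    total, out_ch = residual_inception_params(64, 32, kernels)
    single = Conv2D(64, out_ch, (5, 5)).params()
    print(f"multi-kernel block: {total} params, {footprint_kb(total):.2f} KiB (fp32)")
    print(f"single 5x5 conv:    {single} params, {footprint_kb(single):.2f} KiB (fp32)")
```

With a 64-channel input and three 32-channel branches (1×1, 3×3, 5×5), this sketch counts 78,016 parameters (about 305 KiB at 32-bit, about 76 KiB at 8-bit), against 153,696 for a single 5×5 convolution producing the same 96 output channels without any shortcut. Mixing small and large kernels in parallel is one plausible way such designs keep the footprint low while still covering several receptive-field sizes.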

A Hybrid Approach to Acoustic Scene Classification Based on Universal Acoustic Models

Deep Segment Model for Acoustic Scene Classification

Audio Sentiment Analysis by Heterogeneous Signal Features Learned from Utterance-Based Parallel Neural Network

An Acoustic Segment Model Based Segment Unit Selection Approach to Acoustic Scene Classification with Partial Utterances

High-Resolution Attention Network with Acoustic Segment Model for Acoustic Scene Classification

Utterance-Based Audio Sentiment Analysis Learned by a Parallel Combination of CNN and LSTM

Hierarchical classification for acoustic scenes using deep learning

Deep Neural Decision Forest for Acoustic Scene Classification

Deep semantic learning for acoustic scene classification

An Investigation of High-Resolution Modeling Units of Deep Neural Networks for Acoustic Scene Classification

A Simple Fusion of Deep and Shallow Learning for Acoustic Scene Classification

A Hybrid Approach with Multi-channel I-Vectors and Convolutional Neural Networks for Acoustic Scene Classification

A Low-Complexity Deep Learning Framework for Acoustic Scene Classification

An Investigation of Transfer Learning Mechanism for Acoustic Scene Classification

Acoustic scene classification based on three-dimensional multi-channel feature-correlated deep learning networks

Robust, General, and Low Complexity Acoustic Scene Classification Systems and An Effective Visualization for Presenting a Sound Scene Context

An Investigation on Multiscale Normalised Deep Scattering Spectrum with Deep Residual Network for Acoustic Scene Classification

A Two-Stage Approach to Device-Robust Acoustic Scene Classification

Integrating the Data Augmentation Scheme with Various Classifiers for Acoustic Scene Modeling

CNN-Based Acoustic Scene Classification System

Hierarchical learning for DNN-based acoustic scene classification