Abstract: In this paper, we propose lightweight deep neural networks for Acoustic Scene Classification (ASC) and a visualization method for presenting a sound scene context. To this end, we first propose an inception-based ASC model with a low memory footprint as the ASC baseline. The ASC baseline is then compared with benchmark and high-complexity network architectures. Next, we improve the ASC baseline by proposing a novel deep neural network architecture that leverages a residual-inception architecture and multiple kernels. Given the novel residual-inception (NRI) based model, we apply multiple model compression techniques to evaluate the trade-off between model complexity and model accuracy. Finally, we evaluate whether sound events detected in a sound scene recording can help to improve ASC accuracy and to present the sound scene context more comprehensively. We conduct extensive experiments on various ASC datasets, including the sound scene datasets proposed for the IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) 2018 Task 1A and 1B, 2019 Task 1A and 1B, 2020 Task 1A, 2021 Task 1A, and 2022 Task 1. Our experimental results on these ASC challenges highlight two main achievements. First, based on the analysis of the trade-off between model performance and model complexity, we propose two low-complexity ASC models: the medium-size model (MM) has 4.96 M trainable parameters, 19.3 MB memory occupation, and 7.12 BFLOPs; the small-size model (SM) has a very low complexity of 120 K trainable parameters, 120 KB memory occupation, and 0.82 BFLOPs. These ASC systems are competitive with state-of-the-art systems and suitable for real-life applications on a wide range of edge devices. Second, from the analysis of the role of sound events in a sound scene, we propose an effective visualization method for comprehensively presenting a sound scene context. By combining sound scene and sound event information, the visualization method not only indicates the predicted sound scene contexts with high probabilities but also provides statistics of the sound events occurring in these sound scene contexts.
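The complexity figures quoted above can be related to each other with simple arithmetic. The sketch below divides each model's reported memory occupation by its reported parameter count to get the average storage per parameter. It assumes decimal units (1 MB = 10^6 bytes, 1 KB = 10^3 bytes), which the abstract does not specify, and the function name `bytes_per_parameter` is purely illustrative. The results (roughly 4 bytes per parameter for MM and 1 byte per parameter for SM) would be consistent with 32-bit and 8-bit weight storage respectively, but the abstract does not state the numeric precision, so this is only a rough sanity check.

```python
def bytes_per_parameter(memory_bytes: float, n_params: float) -> float:
    """Average storage per trainable parameter, in bytes."""
    return memory_bytes / n_params


# Figures quoted in the abstract.
# Assumption: decimal units (1 MB = 1e6 bytes, 1 KB = 1e3 bytes).
MM = bytes_per_parameter(19.3e6, 4.96e6)  # medium-size model
SM = bytes_per_parameter(120e3, 120e3)    # small-size model

print(f"MM: {MM:.2f} bytes per parameter")
print(f"SM: {SM:.2f} bytes per parameter")
```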
