Abstract: In this paper, we propose lightweight deep neural networks for Acoustic Scene Classification (ASC) and a visualization method for presenting a sound scene context. To this end, we first propose an inception-based ASC model with a low memory footprint as the ASC baseline. The ASC baseline is then compared with benchmark and high-complexity network architectures. Next, we improve the ASC baseline by proposing a novel deep neural network architecture that leverages a residual-inception architecture and multiple kernels. Given the novel residual-inception (NRI) based model, we apply multiple model compression techniques to evaluate the trade-off between model complexity and model accuracy. Finally, we evaluate whether sound events detected in a sound scene recording can help to improve ASC accuracy and to present the sound scene context more comprehensively. We conduct extensive experiments on various ASC datasets, including the sound scene datasets proposed for the IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) 2018 Task 1A and 1B, 2019 Task 1A and 1B, 2020 Task 1A, 2021 Task 1A, and 2022 Task 1. Our experimental results on these ASC challenges highlight two main achievements. First, based on the analysis of the trade-off between model performance and model complexity, we propose two low-complexity ASC models: the medium-size model (MM) has 4.96 M trainable parameters, 19.3 MB memory occupation, and 7.12 BFLOPs; the small-size model (SM) has a very low complexity of 120 K trainable parameters, 120 KB memory occupation, and 0.82 BFLOPs. These ASC systems are competitive with state-of-the-art systems and suitable for real-life applications on a wide range of edge devices. Second, from the analysis of the role of sound events in a sound scene, we propose an effective visualization method for comprehensively presenting a sound scene context. By combining sound scene and sound event information, the visualization method not only indicates the predicted sound scene contexts with high probabilities but also provides statistics of the sound events occurring in these sound scene contexts.

Lightweight deep neural networks for acoustic scene classification and an effective visualization for presenting sound scene contexts
