Abstract: Acoustic Scene Classification (ASC) is the task of classifying a scene from its environmental acoustic signals. Audio recordings collected in different cities and on different devices often exhibit biased feature distributions, which may degrade ASC performance. Treating the city and the device of a recording as two types of data domain, this paper attempts to disentangle the audio features of each domain in order to remove the related feature biases. A dual-alignment framework is proposed to generalize the ASC system to new devices or cities by aligning boundaries across domains and decision boundaries within each domain. During the alignment, maximum classifier discrepancy and a gradient reversal layer are used to disentangle the scene, city, and device features, while four candidate domain classifiers are proposed to explore the optimal solution for feature disentanglement. To evaluate the dual-alignment framework, three biased ASC experiments are designed: 1) cross-city ASC in new cities; 2) cross-device ASC on new devices; 3) cross-city-device ASC in new cities and on new devices. Results demonstrate the superiority of the proposed framework, with classification accuracy improvements of 0.9, 19.8, and 10.7, respectively. The effectiveness of the proposed feature disentanglement approach is further evaluated on both biased and unbiased ASC problems, and the results show that better-disentangled audio features can lead to a more robust ASC system across devices and cities. This paper advocates integrating feature disentanglement into ASC systems to achieve more reliable performance.
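
The two building blocks named in the abstract can be illustrated with a minimal, dependency-free sketch. It assumes the L1 form of classifier discrepancy commonly used with maximum classifier discrepancy, and the usual definition of a gradient reversal layer (identity in the forward pass, negated and scaled gradient in the backward pass). The function names and the `lam` scaling factor are hypothetical; the abstract does not specify the paper's exact losses, so this is not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classifier_discrepancy(logits_a, logits_b):
    """Mean absolute difference between two classifiers' class probabilities.

    Assumption: the L1 discrepancy often paired with maximum classifier
    discrepancy; the paper may use a different measure.
    """
    p, q = softmax(logits_a), softmax(logits_b)
    return sum(abs(x - y) for x, y in zip(p, q)) / len(p)


def grl_forward(features):
    """Gradient reversal layer, forward pass: features pass through unchanged."""
    return list(features)


def grl_backward(upstream_grad, lam=1.0):
    """Gradient reversal layer, backward pass: negate and scale the gradient.

    `lam` is a hypothetical weighting factor, not a value from the paper.
    """
    return [-lam * g for g in upstream_grad]
```

In an adversarial setup of this kind, a domain classifier (city or device) would sit behind the reversal layer, so that minimizing its loss pushes the shared feature extractor toward features that carry less domain information.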

City Classification from Multiple Real-World Sound Scenes

Artificial intelligence-based collaborative acoustic scene and event classification to support urban soundscape analysis and classification

A multi-device dataset for urban acoustic scene classification

Urban Sound Classification: striving towards a fair comparison

A convolutional neural network approach for acoustic scene classification

CNN-Based Acoustic Scene Classification System

Data-Efficient Low-Complexity Acoustic Scene Classification in the DCASE 2024 Challenge

Device-Robust Acoustic Scene Classification Based on Two-Stage Categorization and Data Augmentation

Look and Listen: A Multi-modality Late Fusion Approach to Scene Classification for Autonomous Machines

Clustering by Errors: A Self-Organized Multitask Learning Method for Acoustic Scene Classification

Hierarchical classification for acoustic scenes using deep learning

Acoustic Scene Classification Across Cities and Devices via Feature Disentanglement

Robust, General, and Low Complexity Acoustic Scene Classification Systems and An Effective Visualization for Presenting a Sound Scene Context

Acoustic scene classification based on three-dimensional multi-channel feature-correlated deep learning networks

A Comparative Study on Approaches to Acoustic Scene Classification using CNNs

A Comparison of deep learning methods for environmental sound

Analysis Acoustic Features for Acoustic Scene Classification and Score fusion of multi-classification systems applied to DCASE 2016 challenge

Cross-task learning for audio tagging, sound event detection and spatial localization: DCASE 2019 baseline systems

Deep semantic learning for acoustic scene classification