Abstract: Acoustic Scene Classification (ASC) is the task of classifying a scene according to its environmental acoustic signals. Audio recordings collected from different cities and devices often exhibit biases in their feature distributions, which may degrade ASC performance. Treating the city and the device of the audio collection as two types of data domains, this paper attempts to disentangle the audio features of each domain in order to remove the related feature biases. A dual-alignment framework is proposed to generalize the ASC system to new devices or cities by aligning boundaries across domains and decision boundaries within each domain. During the alignment, the maximum classifier discrepancy and a gradient reversal layer are used to disentangle the features of scene, city, and device, while four candidate domain classifiers are proposed to explore the optimal solution for feature disentanglement. To evaluate the dual-alignment framework, three biased ASC experiments are designed: 1) cross-city ASC in new cities; 2) cross-device ASC on new devices; 3) cross-city-device ASC in new cities and on new devices. Results demonstrate the superiority of the proposed framework, with improvements in classification accuracy of 0.9, 19.8, and 10.7, respectively. The effectiveness of the proposed feature disentanglement approach is further evaluated on both biased and unbiased ASC problems, and the results show that better-disentangled audio features lead to a more robust ASC system across different devices and cities. This paper advocates integrating feature disentanglement into ASC systems to achieve more reliable performance.
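
The two building blocks named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names (`softmax`, `classifier_discrepancy`, `grl_backward`) and the scaling factor `lam` are hypothetical, and the discrepancy shown is one common choice (the mean absolute difference between two classifiers' class probabilities). A gradient reversal layer is the identity in the forward pass, so only its backward rule is sketched.

```python
import math


def softmax(logits):
    """Convert a list of logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classifier_discrepancy(logits_a, logits_b):
    """Mean absolute difference between two classifiers' class probabilities.

    Assumption: this L1-style measure stands in for the discrepancy term;
    the paper may define it differently.
    """
    p_a = softmax(logits_a)
    p_b = softmax(logits_b)
    return sum(abs(a - b) for a, b in zip(p_a, p_b)) / len(p_a)


def grl_backward(upstream_grad, lam=1.0):
    """Backward rule of a gradient reversal layer.

    The forward pass leaves features unchanged; the backward pass flips the
    sign of the gradient and scales it by the hypothetical factor `lam`.
    """
    return [-lam * g for g in upstream_grad]


if __name__ == "__main__":
    # Two scene classifiers that agree give zero discrepancy.
    print(classifier_discrepancy([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
    # Gradients flowing back from a domain classifier are reversed.
    print(grl_backward([0.5, -2.0], lam=0.5))
```

In an adversarial setup of this kind, the reversed gradient pushes the feature extractor to make the domain (city or device) harder to predict, while the discrepancy term measures how much two scene classifiers disagree on the same input.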

Acoustic Scene Classification Across Cities and Devices via Feature Disentanglement