Abstract: Acoustic scene classification (ASC) aims to identify the scene in which a piece of audio was recorded. In practice, ASC must handle audio from a variety of recording devices, including devices that did not appear during training. Audio recorded by different devices, especially unseen devices, differs in sampling rate, amplitude, data distribution, and so on. These differences can greatly interfere with the feature learning of CNNs and degrade the performance of the ASC model. To learn high-level features that are less susceptible to device differences from hand-crafted features that contain device information, we propose an ASC method based on a multi-level distance embedding space, called multi-level distance embedding learning (MDEL). Acoustic scene categories are hierarchically related: three coarse-grained categories (indoor, outdoor, and transportation) subdivide into more fine-grained categories. This hierarchy corresponds to a similarity relation between categories of different granularity. MDEL exploits this hierarchical similarity between acoustic scene classes to construct an embedding space containing multi-level distances. During learning, the model is guided to focus more on the common features of the same scene class and to learn high-level features that are more robust to the device, thus improving the robustness of the model to data from unseen devices. Our method was evaluated on the audio dataset provided for Task 1a of the DCASE2020 Challenge, where the overall classification accuracy improved by 1.2%. For audio data from unseen devices, the classification accuracy improved by 2.3%.
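The idea of an embedding space with multi-level distances can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the loss, so the squared-error objective, the target distances `d_fine` and `d_coarse`, the function names, and the small label-to-group mapping below are all assumptions chosen for illustration. The sketch assigns each pair of clips a target distance based on how far apart their labels sit in the scene hierarchy, then penalises embeddings whose pairwise distances deviate from those targets.

```python
import math
from itertools import combinations

# Illustrative two-level hierarchy: fine scene label -> coarse group.
# The three coarse groups come from the abstract; this particular
# mapping of fine labels is an assumption for the example.
COARSE_OF = {
    "airport": "indoor",
    "metro_station": "indoor",
    "park": "outdoor",
    "street_traffic": "outdoor",
    "bus": "transportation",
    "tram": "transportation",
}


def target_distance(a, b, d_fine=1.0, d_coarse=2.0):
    """Target embedding distance for two scene labels (assumed values)."""
    if a == b:
        return 0.0       # same fine-grained scene: pull together
    if COARSE_OF[a] == COARSE_OF[b]:
        return d_fine    # same coarse group, different scene
    return d_coarse      # different coarse groups: push furthest apart


def multilevel_distance_loss(embeddings, labels, d_fine=1.0, d_coarse=2.0):
    """Mean squared gap between pairwise Euclidean distances and targets."""
    pairs = list(combinations(range(len(labels)), 2))
    if not pairs:
        return 0.0
    total = 0.0
    for i, j in pairs:
        dist = math.dist(embeddings[i], embeddings[j])
        target = target_distance(labels[i], labels[j], d_fine, d_coarse)
        total += (dist - target) ** 2
    return total / len(pairs)
```

Under these assumptions, two clips of the same scene recorded on different devices share a target distance of zero, so minimising the loss pushes the model to discard device-specific variation, which is the intuition the abstract describes.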

Multi-level distance embedding learning for robust acoustic scene classification with unseen devices
