Abstract: Acoustic scene classification (ASC) aims to identify the type of scene (environment) in which a given audio signal was recorded. The log-mel feature and the convolutional neural network (CNN) have recently become the most popular time-frequency (TF) feature representation and classifier, respectively, in ASC. An audio signal recorded in a scene may contain various sounds that overlap in time and frequency. A previous study suggested that considering long-duration and short-duration sounds separately in a CNN may improve ASC accuracy. This study addresses the generalization ability of acoustic scene classifiers. In practice, the characteristics of acoustic scene signals may be affected by various factors, such as the choice of recording device and changes in recording location. When an established ASC system predicts scene classes for audio recorded in unseen scenarios, its accuracy may drop significantly. Long-duration sounds contain not only domain-independent acoustic scene information but also channel information determined by the recording conditions, and learning this channel information makes the classifier prone to over-fitting. For a more robust ASC system, we propose a robust feature learning (RFL) framework to train the CNN. The RFL framework down-weights CNN learning specifically on long-duration sounds. To do so, the proposed method trains an auxiliary classifier that takes only long-duration sound information as input, using an auxiliary loss function that assigns less learning weight to poorly classified examples than the standard cross-entropy loss does. The experimental results show that the proposed RFL framework yields an acoustic scene classifier that is more robust to unseen devices and cities.
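The two ingredients named in the abstract (a long-duration input for the auxiliary classifier, and a loss that gives poorly classified examples less weight than cross-entropy) can be illustrated with a minimal numpy sketch under stated assumptions. The abstract does not say how long-duration sounds are isolated, so the sliding median along the time axis used here, and the names `long_duration_component` and `win`, are hypothetical. The abstract also does not name the auxiliary loss; the generalized cross-entropy (GCE) loss (1 - p_t^q)/q is used only as one known loss with the stated property, because its gradient equals the cross-entropy gradient scaled by p_t^q <= 1. It is a stand-in, not necessarily the RFL loss, and the CNN classifiers themselves are omitted.

```python
import numpy as np


def long_duration_component(logmel: np.ndarray, win: int = 31) -> np.ndarray:
    """Hypothetical long-duration estimate for a (n_mels, n_frames) log-mel array.

    Assumption: a sliding median over `win` frames in each mel band keeps sounds
    lasting longer than about win/2 frames and suppresses short events.
    """
    if win % 2 != 1:
        raise ValueError("win must be odd")
    pad = win // 2
    # Repeat edge frames so the output has the same number of frames as the input.
    padded = np.pad(logmel, ((0, 0), (pad, pad)), mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, win, axis=1)
    return np.median(windows, axis=-1)


def target_prob(logits: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Softmax probability p_t assigned to the true class of each example."""
    z = logits - logits.max(axis=1, keepdims=True)  # numerically stable softmax
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return probs[np.arange(len(labels)), labels]


def cross_entropy(p_t: np.ndarray) -> np.ndarray:
    """Standard per-example cross-entropy, -log p_t."""
    return -np.log(p_t)


def generalized_cross_entropy(p_t: np.ndarray, q: float = 0.7) -> np.ndarray:
    """GCE loss (1 - p_t**q) / q, a stand-in for the unnamed auxiliary loss.

    d/dp_t of this loss is -p_t**(q - 1) = p_t**q * d/dp_t(-log p_t), so each
    example's gradient is the cross-entropy gradient scaled by p_t**q.
    """
    return (1.0 - p_t ** q) / q


def weight_relative_to_ce(p_t: np.ndarray, q: float = 0.7) -> np.ndarray:
    """Per-example gradient weight of GCE relative to cross-entropy (small when p_t is small)."""
    return p_t ** q


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logmel = rng.normal(size=(40, 200))          # stand-in for a real log-mel patch
    long_part = long_duration_component(logmel)  # hypothetical auxiliary-classifier input
    short_part = logmel - long_part              # residual short-duration part

    logits = np.array([[3.0, 0.0], [0.0, 2.0]])
    labels = np.array([0, 0])                    # the second example is misclassified
    print(weight_relative_to_ce(target_prob(logits, labels)))
```

In this toy run the well-classified example keeps a weight of roughly 0.97 while the misclassified one drops to roughly 0.23, so under this assumed loss the auxiliary classifier learns less from long-duration inputs that it cannot already classify.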

A convolutional neural network approach for acoustic scene classification

Acoustic scene classification based on three-dimensional multi-channel feature-correlated deep learning networks

SubSpectralNet - Using Sub-Spectrogram based Convolutional Neural Networks for Acoustic Scene Classification

A Low-Complexity Deep Learning Framework for Acoustic Scene Classification

A Hybrid Approach with Multi-channel I-Vectors and Convolutional Neural Networks for Acoustic Scene Classification

A Two-Stage Approach to Device-Robust Acoustic Scene Classification

CNN-Based Acoustic Scene Classification System

Robust, General, and Low Complexity Acoustic Scene Classification Systems and An Effective Visualization for Presenting a Sound Scene Context

An Investigation of Transfer Learning Mechanism for Acoustic Scene Classification

Deep Segment Model for Acoustic Scene Classification

Deep semantic learning for acoustic scene classification

Bi-level Acoustic Scene Classification Using Lightweight Deep Learning Model

Robust Acoustic Scene Classification using a Multi-Spectrogram Encoder-Decoder Framework

Low-Complexity Acoustic Scene Classification Using Parallel Attention-Convolution Network

Dual-path convolutional neural network based on band interaction block for acoustic scene classification

Multi-Temporal Resolution Convolutional Neural Networks for Acoustic Scene Classification

Robust Feature Learning on Long-Duration Sounds for Acoustic Scene Classification

Device-Robust Acoustic Scene Classification Based on Two-Stage Categorization and Data Augmentation

Convolutional Neural Networks and x-vector Embedding for DCASE2018 Acoustic Scene Classification Challenge

Acoustic Scene Classification Using Fusion of Attentive Convolutional Neural Networks for DCASE2019 Challenge

A Comparative Study on Approaches to Acoustic Scene Classification using CNNs