Abstract: Acoustic scene classification (ASC) has attracted much attention in recent years. In previous studies, the most common architecture is a convolutional neural network (CNN) fed by one of three main features: log-mel energies, harmonic-percussive source separation (HPSS) and the constant-Q transform (CQT). In this paper, we present a hybrid constant-Q transform (HCQT) based CNN system for ASC. Specifically, we first extract CQT and HCQT from each audio clip as the acoustic features, along with several other features such as Mel-frequency cepstral coefficients, log-mel energies and their HPSS. Then, we feed each feature separately into a 5-layer or 9-layer CNN with average pooling. Because different features carry complementary information, we further develop several methods to integrate the outputs of the CNNs, including averaging, weighted averaging, random forests and extremely randomized trees. To the best of our knowledge, this is the first time an HCQT-based method has been used for ASC. Essentially, the method combines two CQTs with different resolutions to remedy the shortcomings of the traditional CQT in its high-frequency bins. In addition, we thoroughly investigate different ensemble strategies for the CNN models. We evaluated the proposed system in the DCASE 2019 challenge. Experimental results show that HCQT is more effective than the conventional CQT. Furthermore, the accuracies of our system on the validation and leaderboard datasets are 77.5% and 79.3% respectively, significantly outperforming the two comparison baselines.
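The two simplest fusion strategies named in the abstract, averaging and weighted averaging of the per-feature CNN outputs, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy probabilities, the three-class setup and the weights are all hypothetical, and the abstract does not say how the weights were chosen.

```python
import numpy as np


def average_fusion(probs):
    """Average class probabilities over models.

    probs has shape (n_models, n_clips, n_classes).
    """
    return np.mean(probs, axis=0)


def weighted_average_fusion(probs, weights):
    """Weighted average over models; weights are normalised to sum to 1."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    # Contract the model axis of probs against the weight vector.
    return np.tensordot(weights, probs, axes=1)


# Hypothetical softmax outputs of two CNNs for two clips and three scene classes.
probs = np.array([
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],  # model A (assumed: trained on HCQT)
    [[0.4, 0.4, 0.2], [0.1, 0.2, 0.7]],  # model B (assumed: trained on log-mel)
])

avg = average_fusion(probs)
wavg = weighted_average_fusion(probs, [3.0, 1.0])  # illustrative weights only

avg_labels = avg.argmax(axis=1)
wavg_labels = wavg.argmax(axis=1)
print(avg_labels, wavg_labels)
```

In this toy case the second clip receives a different label under the two schemes, which shows why the choice of weights matters. The tree-based fusion methods mentioned in the abstract would instead train a classifier on the concatenated CNN outputs.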

Attention-based convolutional neural networks for acoustic scene classification.

Acoustic Scene Classification Using Pixel-Based Attention

A convolutional neural network approach for acoustic scene classification

CAA-Net: Conditional Atrous CNNs With Attention for Explainable Device-Robust Acoustic Scene Classification

Attention-based Atrous Convolutional Neural Networks: Visualisation and Understanding Perspectives of Acoustic Scenes.

High-Resolution Attention Network with Acoustic Segment Model for Acoustic Scene Classification

Acoustic Scene Classification Based on Dense Convolutional Networks Incorporating Multi-channel Features

A Two-Stage Approach to Device-Robust Acoustic Scene Classification

Acoustic Scene Classification Using CNN Ensembles and Primary Ambient Extraction Technical Report

Low-Complexity Acoustic Scene Classification Using Parallel Attention-Convolution Network

Multi-level Attention Model with Deep Scattering Spectrum for Acoustic Scene Classification.

Frequency-based CNN and attention module for acoustic scene classification

Acoustic scene classification by feed forward neural network with class dependent attention mechanism

Acoustic scene classification based on three-dimensional multi-channel feature-correlated deep learning networks

Hybrid Constant-Q Transform Based CNN Ensemble for Acoustic Scene Classification

Device-Robust Acoustic Scene Classification Based on Two-Stage Categorization and Data Augmentation

CNNs-based Acoustic Scene Classification using Multi-Spectrogram Fusion and Label Expansions.

Spatio-Temporal Attention Pooling for Audio Scene Classification

A Low-Compexity Deep Learning Framework For Acoustic Scene Classification

Acoustic Scene Classification Using Deep Convolutional Neural Network and Multiple Spectrograms Fusion.

An Investigation of Transfer Learning Mechanism for Acoustic Scene Classification