Abstract: Acoustic scene classification (ASC) has attracted much attention in recent years. In previous studies, the most common architecture is a convolutional neural network (CNN) fed by three main features: log-mel energies, harmonic-percussive source separation (HPSS), and the constant-Q transform (CQT). In this paper, we present a hybrid constant-Q transform (HCQT) based CNN system for ASC. Specifically, we first extract CQT and HCQT from each audio clip as acoustic features, along with several other features such as Mel-frequency cepstral coefficients, log-mel energies, and their HPSS. We then feed each feature separately into a 5-layer or 9-layer CNN with average pooling. Because different features carry complementary information, we further develop several methods to integrate the outputs of the CNNs, including averaging, weighted averaging, random forests, and extremely randomized trees. To the best of our knowledge, this is the first time an HCQT-based method has been used for ASC. Essentially, the method combines two CQTs with different resolutions to compensate for the weaknesses of the traditional CQT in its high-frequency bins. In addition, we thoroughly investigate different ensemble strategies for the CNN models. We evaluated the proposed system in the DCASE 2019 challenge. Experimental results show that HCQT is more effective than the conventional CQT. Furthermore, the accuracies of our system on the validation and leaderboard datasets are 77.5% and 79.3%, respectively, significantly outperforming the two comparison baselines.
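The two simplest fusion strategies named in the abstract, averaging and weighted averaging of the CNN outputs, can be illustrated with a minimal sketch. It assumes each CNN emits a vector of class probabilities for a clip. The function names `fuse_probabilities` and `predict`, the example probabilities, and the weights are all hypothetical and are not taken from the paper; how the authors actually chose their weights is not stated in the abstract.

```python
from typing import List, Optional, Sequence


def fuse_probabilities(
    model_probs: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Fuse per-model class probabilities by (weighted) averaging.

    model_probs holds one probability vector per model (for example one
    CNN per input feature). With weights=None every model counts equally,
    which is plain averaging. Weights are normalised to sum to one.
    """
    if not model_probs:
        raise ValueError("need at least one model output")
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    if any(len(p) != n_classes for p in model_probs):
        raise ValueError("all models must score the same number of classes")
    if weights is None:
        weights = [1.0] * n_models
    if len(weights) != n_models:
        raise ValueError("need exactly one weight per model")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    fused = [0.0] * n_classes
    for probs, w in zip(model_probs, weights):
        for k, p in enumerate(probs):
            fused[k] += (w / total) * p
    return fused


def predict(fused: Sequence[float]) -> int:
    """Return the index of the class with the highest fused score."""
    return max(range(len(fused)), key=lambda k: fused[k])


# Hypothetical outputs of three CNNs for one clip and three scene classes.
cqt_probs = [0.5, 0.3, 0.2]
hcqt_probs = [0.2, 0.6, 0.2]
logmel_probs = [0.4, 0.4, 0.2]

plain = fuse_probabilities([cqt_probs, hcqt_probs, logmel_probs])
weighted = fuse_probabilities(
    [cqt_probs, hcqt_probs, logmel_probs],
    weights=[3.0, 1.0, 1.0],  # illustrative weights only
)
print(predict(plain), predict(weighted))
```

In this toy example the weighting changes which class wins, which is the reason weighted averaging can behave differently from plain averaging. The tree-based combiners mentioned in the abstract (random forests, extremely randomized trees) would instead be trained on the stacked CNN outputs and are not covered by this sketch.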

Acoustic scene classification based on three-dimensional multi-channel feature-correlated deep learning networks

Acoustic Scene Classification Using Deep CNNs with Time-Frequency Representations

Acoustic Scene Classification Based on Dense Convolutional Networks Incorporating Multi-channel Features

A convolutional neural network approach for acoustic scene classification

Multi-level Attention Model with Deep Scattering Spectrum for Acoustic Scene Classification

Deep Segment Model for Acoustic Scene Classification

Deep semantic learning for acoustic scene classification

Hybrid Constant-Q Transform Based CNN Ensemble for Acoustic Scene Classification

Bi-level Acoustic Scene Classification Using Lightweight Deep Learning Model

Learning Temporal Relations from Semantic Neighbors for Acoustic Scene Classification

Acoustic Scene Classification Using Deep Convolutional Neural Network and Multiple Spectrograms Fusion

Acoustic scene classification using multi-layer temporal pooling based on convolutional neural network

ATReSN-Net: Capturing Attentive Temporal Relations in Semantic Neighborhood for Acoustic Scene Classification

Hierarchical classification for acoustic scenes using deep learning

A Low-Compexity Deep Learning Framework For Acoustic Scene Classification

Attention-based Atrous Convolutional Neural Networks: Visualisation and Understanding Perspectives of Acoustic Scenes

A Two-Stage Approach to Device-Robust Acoustic Scene Classification

Environmental Sound Classification Based on Multi-temporal Resolution Convolutional Neural Network Combining with Multi-level Features

Multi-LCNN: A Hybrid Neural Network Based on Integrated Time-Frequency Characteristics for Acoustic Scene Classification

CNN-Based Acoustic Scene Classification System

Acoustic Scene Classification Using CNN Ensembles and Primary Ambient Extraction (Technical Report)