Abstract: Acoustic scene classification (ASC) has attracted much attention in recent years. In previous studies, the most common architecture is a convolutional neural network (CNN) fed with three main features, i.e., log-mel energies, harmonic-percussive source separation (HPSS), and the constant-Q transform (CQT). In this paper, we present a hybrid constant-Q transform (HCQT) based CNN system for ASC. Specifically, we first extract CQT and HCQT from each audio clip as acoustic features, along with several other features such as Mel-frequency cepstral coefficients, log-mel energies, and their HPSS. Then, we feed each feature separately into 5-layer or 9-layer CNNs with average pooling. Because different features carry complementary information, we further develop several methods to integrate the outputs of the CNNs, including averaging, weighted averaging, random forests, and extremely randomized trees. To the best of our knowledge, this is the first time an HCQT-based method has been used for ASC. Essentially, the method combines two CQTs with different resolutions to remedy the high-frequency bins of the traditional CQT. In addition, we thoroughly investigate different ensemble strategies for the CNN models. We evaluated the proposed system in the DCASE 2019 challenge. Experimental results show that HCQT is more effective than the conventional CQT. Furthermore, the accuracies of our system on the validation and leaderboard datasets are 77.5% and 79.3%, respectively, which significantly outperform the two comparison baselines.
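The two simplest integration methods named in the abstract, averaging and weighted averaging of the CNN outputs, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `average_fusion` and `predict`, the example probabilities, and the weights are all hypothetical, and the sketch assumes each CNN emits one class-probability vector per audio clip.

```python
from typing import List, Optional, Sequence


def average_fusion(
    model_probs: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Fuse per-model class probabilities for one clip.

    With weights=None every model counts equally (plain averaging);
    otherwise each model is scaled by its weight and the result is
    normalized by the sum of the weights (weighted averaging).
    """
    n_models = len(model_probs)
    if weights is None:
        weights = [1.0] * n_models
    if len(weights) != n_models:
        raise ValueError("need exactly one weight per model")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    n_classes = len(model_probs[0])
    fused = [0.0] * n_classes
    for probs, w in zip(model_probs, weights):
        if len(probs) != n_classes:
            raise ValueError("all models must score the same classes")
        for c, p in enumerate(probs):
            fused[c] += w * p / total
    return fused


def predict(fused: Sequence[float]) -> int:
    """Return the index of the class with the highest fused score."""
    return max(range(len(fused)), key=lambda c: fused[c])


# Hypothetical outputs of three CNNs (one per feature type) for a
# single clip and three scene classes; the numbers are made up.
example_probs = [
    [0.6, 0.3, 0.1],  # CNN trained on feature A
    [0.2, 0.5, 0.3],  # CNN trained on feature B
    [0.1, 0.7, 0.2],  # CNN trained on feature C
]

plain = average_fusion(example_probs)
weighted = average_fusion(example_probs, weights=[4.0, 1.0, 1.0])
print(predict(plain), predict(weighted))
```

In this toy example plain averaging and weighted averaging select different classes, which is why the choice of weights matters. The abstract does not say how the weights were chosen, and the random forest and extremely randomized trees variants would instead train a second-stage classifier on the stacked CNN outputs.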

Hybrid Constant-Q Transform Based CNN Ensemble for Acoustic Scene Classification

Acoustic Scene Classification Using Deep CNNs with Time-Frequency Representations

A Hybrid Approach with Multi-channel I-Vectors and Convolutional Neural Networks for Acoustic Scene Classification

CNN-Based Acoustic Scene Classification System

A convolutional neural network approach for acoustic scene classification

Acoustic scene classification based on three-dimensional multi-channel feature-correlated deep learning networks

Multi-LCNN: A Hybrid Neural Network Based on Integrated Time-Frequency Characteristics for Acoustic Scene Classification.

Multi-level Attention Model with Deep Scattering Spectrum for Acoustic Scene Classification.

A TWO-STAGE APPROACH TO DEVICE-ROBUST ACOUSTIC SCENE CLASSIFICATION

ACOUSTIC SCENE CLASSIFICATION USING CNN ENSEMBLES AND PRIMARY AMBIENT EXTRACTION Technical Report

Acoustic scene classification using multi-layer temporal pooling based on convolutional neural network.

A Low-Compexity Deep Learning Framework For Acoustic Scene Classification

Deep Segment Model for Acoustic Scene Classification

Attention-based convolutional neural networks for acoustic scene classification.

Hierarchical classification for acoustic scenes using deep learning

A Hybrid Approach to Acoustic Scene Classification Based on Universal Acoustic Models.

High-Resolution Attention Network with Acoustic Segment Model for Acoustic Scene Classification

An Investigation of Transfer Learning Mechanism for Acoustic Scene Classification

SubSpectralNet - Using Sub-Spectrogram based Convolutional Neural Networks for Acoustic Scene Classification

Convolutional Neural Networks and x-vector Embedding for DCASE2018 Acoustic Scene Classification Challenge

Attention-based Atrous Convolutional Neural Networks: Visualisation and Understanding Perspectives of Acoustic Scenes.