Abstract: Acoustic scene classification (ASC) has attracted much attention in recent years. In previous studies, the most common architecture is a convolutional neural network (CNN) fed with three main features, namely log-mel energies, harmonic-percussive source separation (HPSS) and the constant-Q transform (CQT). In this paper, we present a hybrid constant-Q transform (HCQT) based CNN system for ASC. Specifically, we first extract the CQT and HCQT from each audio clip as acoustic features, along with several other features such as Mel-frequency cepstral coefficients, log-mel energies and their HPSS. We then feed each feature separately into 5-layer or 9-layer CNNs with average pooling. Since different features carry complementary information, we further develop several methods to integrate the outputs of the CNNs, including averaging, weighted averaging, random forests and extremely randomized trees. To the best of our knowledge, this is the first time an HCQT-based method has been used for ASC. Essentially, the method combines two CQTs with different resolutions to remedy the high-frequency bins of the traditional CQT. In addition, we thoroughly investigate different ensemble strategies for the CNN models. We evaluated the proposed system on the DCASE 2019 challenge. Experimental results show that the HCQT is more effective than the conventional CQT. Furthermore, the accuracies of our system on the validation and leaderboard datasets are 77.5% and 79.3% respectively, significantly outperforming the two comparison baselines.
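The two simplest fusion methods named in the abstract, averaging and weighted averaging of the CNN outputs, can be illustrated with a minimal sketch. It assumes that each CNN outputs one row of class probabilities per audio clip. The function names `fuse_average` and `predict`, the example probabilities and the weights are all hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Sequence

# One row per audio clip, one column per scene class (assumed layout).
Probs = List[List[float]]


def fuse_average(model_probs: Sequence[Probs],
                 weights: Optional[Sequence[float]] = None) -> Probs:
    """Fuse per-model class probabilities by plain or weighted averaging."""
    n_models = len(model_probs)
    if n_models == 0:
        raise ValueError("need at least one model output")
    if weights is None:
        weights = [1.0] * n_models  # equal weights give plain averaging
    if len(weights) != n_models:
        raise ValueError("one weight per model is required")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    norm = [w / total for w in weights]  # normalise so weights sum to 1

    n_clips = len(model_probs[0])
    n_classes = len(model_probs[0][0])
    fused = [[0.0] * n_classes for _ in range(n_clips)]
    for w, probs in zip(norm, model_probs):
        for i in range(n_clips):
            for k in range(n_classes):
                fused[i][k] += w * probs[i][k]
    return fused


def predict(fused: Probs) -> List[int]:
    """Return the index of the most probable class for each clip."""
    return [max(range(len(row)), key=row.__getitem__) for row in fused]


# Made-up outputs of two hypothetical CNNs (2 clips, 3 scene classes).
cqt_probs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
hcqt_probs = [[0.2, 0.7, 0.1], [0.1, 0.3, 0.6]]

plain = predict(fuse_average([cqt_probs, hcqt_probs]))
weighted = predict(fuse_average([cqt_probs, hcqt_probs], weights=[3.0, 1.0]))
print(plain, weighted)
```

In practice the weights would be tuned on held-out data. The random forest and extremely randomized trees variants would instead train a classifier on the concatenated CNN outputs, which this sketch does not cover.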

Acoustic Scene Classification Using Aggregation of Two-Scale Deep Embeddings

Deep semantic learning for acoustic scene classification

Device-Robust Acoustic Scene Classification Based on Two-Stage Categorization and Data Augmentation

Deep Segment Model for Acoustic Scene Classification

Hierarchical classification for acoustic scenes using deep learning

A Two-Stage Approach to Device-Robust Acoustic Scene Classification

Multi-level Attention Model with Deep Scattering Spectrum for Acoustic Scene Classification

Acoustic scene classification based on three-dimensional multi-channel feature-correlated deep learning networks

Low-Complexity Acoustic Scene Classification Using Data Augmentation and Lightweight ResNet

An Investigation of Transfer Learning Mechanism for Acoustic Scene Classification

CNN-Based Acoustic Scene Classification System

Bi-level Acoustic Scene Classification Using Lightweight Deep Learning Model

Acoustic Scene Classification Based on Dense Convolutional Networks Incorporating Multi-channel Features

Acoustic Scene Classification Using Deep CNNs with Time-Frequency Representations

An Investigation on Multiscale Normalised Deep Scattering Spectrum with Deep Residual Network for Acoustic Scene Classification

Integrating the Data Augmentation Scheme with Various Classifiers for Acoustic Scene Modeling

A Low-Compexity Deep Learning Framework For Acoustic Scene Classification

Long-term scalogram integrated with an iterative data augmentation scheme for acoustic scene classification

A Hybrid Approach with Multi-channel I-Vectors and Convolutional Neural Networks for Acoustic Scene Classification

Robust Acoustic Scene Classification using a Multi-Spectrogram Encoder-Decoder Framework

Hybrid Constant-Q Transform Based CNN Ensemble for Acoustic Scene Classification