Abstract: Deep neural networks (DNNs) have proven to be powerful models for acoustic scene classification. State-of-the-art DNNs have millions of connections and are computationally intensive, which makes them difficult to deploy on resource-constrained systems. With a focus on acoustic scene classification, we describe a new learnable module, the simulated Fourier transform module, which allows deep neural networks to perform the discrete Fourier transform 8x faster on a graphics processing unit (GPU). We frame the signal processing procedure as an adaptive machine learning problem and introduce learnable parameters into the module so that it can adapt quickly to complex and variable acoustic signals. This module enables neural networks to model audio signals directly from raw waveforms, without separate fast Fourier transform and filter bank stages. We then use the previously published temporal transformer module to alleviate the information loss caused by the simulated Fourier transform module. These techniques can be integrated into existing fully connected neural network (FCNN), convolutional neural network (CNN), and recurrent neural network (RNN) models. We evaluate the proposed strategy on four acoustic scene datasets (LITIS Rouen, DCASE2016, DCASE2017, and DCASE2018). We show that the proposed approach significantly outperforms the vanilla FCNN, CNN, and RNN approaches in both efficiency and classification performance. For instance, it can reduce inference time by a factor of 8 while lowering the classification error on the LITIS Rouen dataset from 3.21% to 1.81%.
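The abstract describes the simulated Fourier transform module only at a high level. One common way to make a DFT learnable is to express it as a dense layer whose weights are initialized to the cosine and sine Fourier bases, which gradient descent can then adjust. This is an assumption here and is not confirmed by the abstract. The NumPy sketch below shows only the forward pass from raw waveform frames to a magnitude spectrum. The names `frame_signal` and `SimulatedFourierLayer` are hypothetical. Training is omitted, and the sketch makes no attempt to reproduce the reported GPU speedup.

```python
from __future__ import annotations

import numpy as np


def frame_signal(x: np.ndarray, n_fft: int, hop: int) -> np.ndarray:
    """Split a 1-D waveform into overlapping frames of length n_fft (requires len(x) >= n_fft)."""
    num_frames = 1 + (len(x) - n_fft) // hop
    # Row i holds the sample indices i*hop .. i*hop + n_fft - 1.
    idx = np.arange(n_fft)[None, :] + hop * np.arange(num_frames)[:, None]
    return x[idx]


def dft_basis(n_fft: int) -> tuple[np.ndarray, np.ndarray]:
    """Real and imaginary DFT basis matrices for the n_fft // 2 + 1 non-negative frequency bins."""
    n = np.arange(n_fft)
    k = np.arange(n_fft // 2 + 1)
    angle = 2.0 * np.pi * np.outer(k, n) / n_fft  # shape (bins, n_fft)
    # X[k] = sum_n x[n] * (cos(angle) - i * sin(angle))
    return np.cos(angle), -np.sin(angle)


class SimulatedFourierLayer:
    """Hypothetical sketch: a DFT written as a dense layer.

    The weights start at the exact Fourier basis; treating them as trainable
    parameters (not shown) is the assumed source of adaptivity.
    """

    def __init__(self, n_fft: int):
        self.w_real, self.w_imag = dft_basis(n_fft)

    def __call__(self, frames: np.ndarray) -> np.ndarray:
        # frames: (num_frames, n_fft) -> magnitude spectrum: (num_frames, bins)
        real = frames @ self.w_real.T
        imag = frames @ self.w_imag.T
        return np.sqrt(real ** 2 + imag ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    waveform = rng.standard_normal(4096)       # stand-in for a raw audio clip
    frames = frame_signal(waveform, n_fft=512, hop=256)
    spec = SimulatedFourierLayer(512)(frames)
    print(spec.shape)                          # (15, 257)
    # Before any training, the layer reproduces the FFT magnitude exactly.
    print(np.allclose(spec, np.abs(np.fft.rfft(frames, axis=-1))))
```

At initialization the layer's output matches `np.abs(np.fft.rfft(frames, axis=-1))`, so any learned adaptation would start from an exact DFT. A dense matrix multiplication is also the kind of operation GPUs execute efficiently, which is one plausible reason such a module could be fast. The abstract does not state the actual mechanism behind the 8x figure.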