Abstract: In this technical report, we present a joint effort of four groups, namely GT, USTC, Tencent, and UKE, to tackle Task 1 - Acoustic Scene Classification (ASC) in the DCASE 2020 Challenge. Task 1 comprises two sub-tasks: (i) Task 1a focuses on ASC of audio signals recorded with multiple (real and simulated) devices into ten fine-grained classes, and (ii) Task 1b concerns classification of data into three higher-level classes using low-complexity solutions. For Task 1a, we propose a novel two-stage ASC system that leverages an ad-hoc score combination of two convolutional neural networks (CNNs), which classify the acoustic input into three classes and ten classes, respectively. Four different CNN-based architectures are explored to implement the two-stage classifiers, and several data augmentation techniques are also investigated. For Task 1b, we leverage a quantization method to reduce the complexity of two of our top-accuracy three-class CNN-based architectures. On the Task 1a development data set, an ASC accuracy of 76.9% is attained using our best single classifier and data augmentation. An accuracy of 81.9% is then attained by a final model fusion of our two-stage ASC classifiers. On the Task 1b development data set, we achieve an accuracy of 96.7% with a model size smaller than 500KB. Code is available at https://github.com/MihawkHu/DCASE2020_task1.
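The two-stage idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the score combination rule, so the product rule below (each ten-class score weighted by the three-class score of its parent class, then renormalized) is an assumption made for illustration only. The names `FINE_TO_COARSE`, `two_stage_scores`, and `predict` are hypothetical, and the class labels follow the DCASE 2020 Task 1 scene grouping.

```python
# Assumed mapping from the ten fine-grained scenes to the three coarse classes.
FINE_TO_COARSE = {
    "airport": "indoor",
    "shopping_mall": "indoor",
    "metro_station": "indoor",
    "street_pedestrian": "outdoor",
    "public_square": "outdoor",
    "street_traffic": "outdoor",
    "park": "outdoor",
    "bus": "transportation",
    "tram": "transportation",
    "metro": "transportation",
}


def two_stage_scores(coarse_probs, fine_probs):
    """Weight each fine-class probability by its parent's coarse probability.

    The product rule is an illustrative assumption, not the report's
    confirmed combination method. Scores are renormalized to sum to 1.
    """
    combined = {
        fine: fine_probs[fine] * coarse_probs[coarse]
        for fine, coarse in FINE_TO_COARSE.items()
    }
    total = sum(combined.values())
    return {fine: score / total for fine, score in combined.items()}


def predict(coarse_probs, fine_probs):
    """Return the fine-grained class with the highest combined score."""
    scores = two_stage_scores(coarse_probs, fine_probs)
    return max(scores, key=scores.get)


# Example: the ten-class model slightly prefers "metro_station", but the
# three-class model is fairly confident the clip is "transportation".
fine = {name: 0.05 for name in FINE_TO_COARSE}
fine["metro_station"] = 0.32
fine["metro"] = 0.28
coarse = {"indoor": 0.2, "outdoor": 0.1, "transportation": 0.7}
print(predict(coarse, fine))
```

In this toy example the coarse classifier overrides a close call in the fine classifier, which is the kind of correction a two-stage combination is meant to provide.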

CNN-LTE: a Class of 1-X Pooling Convolutional Neural Networks on Label Tree Embeddings for Audio Scene Recognition

Classifying Variable-Length Audio Files with All-Convolutional Networks and Masked Global Pooling

Acoustic scene classification using multi-layer temporal pooling based on convolutional neural network

Deep semantic learning for acoustic scene classification

Data-Efficient Low-Complexity Acoustic Scene Classification in the DCASE 2024 Challenge

CNN-Based Acoustic Scene Classification System

A convolutional neural network approach for acoustic scene classification

Convolutional Neural Networks and x-vector Embedding for DCASE2018 Acoustic Scene Classification Challenge

Device-Robust Acoustic Scene Classification Based on Two-Stage Categorization and Data Augmentation

A Low-Complexity Deep Learning Framework For Acoustic Scene Classification

Robust Acoustic Scene Classification using a Multi-Spectrogram Encoder-Decoder Framework

Cross-task learning for audio tagging, sound event detection and spatial localization: DCASE 2019 baseline systems

Acoustic Scene Classification Using Fusion of Attentive Convolutional Neural Networks for DCASE2019 Challenge

Low-Complexity Acoustic Scene Classification Using Parallel Attention-Convolution Network

A Hybrid ASR Model Approach on Weakly Labeled Scene Classification Technical Report

An Investigation of Transfer Learning Mechanism for Acoustic Scene Classification

Low-complexity deep learning frameworks for acoustic scene classification using teacher-student scheme and multiple spectrograms

Low-Complexity Acoustic Scene Classification Using Data Augmentation and Lightweight ResNet

Acoustic Scene Recognition Based on Convolutional Neural Networks

Spatio-Temporal Attention Pooling for Audio Scene Classification