The NERCSLIP-USTC System for Semi-Supervised Acoustic Scene Classification of ICME 2024 Grand Challenge

Qing Wang, Guirui Zhong, Hengyi Hong, Lei Wang, Mingqi Cai, Xin Fang, Ya Jiang, Jun Du
DOI: https://doi.org/10.1109/icmew63481.2024.10645399
2024-01-01
Abstract: Acoustic scene classification (ASC) aims to classify audio clips into pre-defined scene classes. It remains challenging when domain generalization and semi-supervised learning are required. In this paper, we propose a two-stage training strategy based on a fully convolutional neural network (FCNN) to improve ASC performance. We first pre-train FCNN models on two publicly released datasets; these datasets are then combined with the development dataset of the ICME ASC challenge to fine-tune the pre-trained models. For semi-supervised learning, we generate reliable pseudo labels for the unlabeled data in the development dataset according to the confidence of different models. Furthermore, in addition to the ten-class ASC system, we train a three-class classifier that assigns an input audio scene to one of three main classes: indoor, outdoor, and transportation. We also adopt manifold mixup augmentation during model training. Evaluated on the test set of the ASC task in the ICME 2024 Grand Challenge, our proposed approach outperforms the baseline by a large margin and ranks first in the challenge.
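The abstract states that pseudo labels are chosen according to the confidence of different models but does not give the selection rule. The sketch below shows one plausible rule, purely as an illustration: a clip is accepted only if every model predicts the same class and the averaged posterior for that class reaches a threshold. The function name `select_pseudo_labels`, the agreement requirement, and the default threshold of 0.9 are assumptions, not details taken from the paper.

```python
from typing import List, Optional, Sequence


def select_pseudo_labels(
    model_probs: Sequence[Sequence[Sequence[float]]],
    threshold: float = 0.9,  # assumed value, not from the paper
) -> List[Optional[int]]:
    """Return one pseudo label per clip, or None if the clip is rejected.

    model_probs[m][i][c] is model m's posterior for class c on clip i.
    """
    num_models = len(model_probs)
    num_clips = len(model_probs[0])
    labels: List[Optional[int]] = []
    for i in range(num_clips):
        num_classes = len(model_probs[0][i])
        # Top-1 class predicted by each model for this clip.
        votes = [
            max(range(num_classes), key=lambda c: probs[i][c])
            for probs in model_probs
        ]
        # Posterior averaged over models, and its most likely class.
        avg = [
            sum(probs[i][c] for probs in model_probs) / num_models
            for c in range(num_classes)
        ]
        best = max(range(num_classes), key=lambda c: avg[c])
        # Hypothetical acceptance rule: all models agree and the
        # averaged confidence reaches the threshold.
        agreed = all(v == best for v in votes)
        labels.append(best if agreed and avg[best] >= threshold else None)
    return labels
```

Clips that receive `None` would stay unlabeled; accepted clips could be added to the labeled pool for the fine-tuning stage. A stricter threshold trades pseudo-label coverage for reliability.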