Long-term scalogram integrated with an iterative data augmentation scheme for acoustic scene classification

Hangting Chen, Zuozhen Liu, Zongming Liu, Pengyuan Zhang
DOI: https://doi.org/10.1121/10.0005202
Abstract: In acoustic scene classification (ASC), acoustic features play a crucial role in extracting scene information, which can be stored over different time scales. Moreover, the limited size of the dataset may lead to a biased model that performs poorly on recordings from unseen cities and on easily confused scene classes. This paper proposes a long-term wavelet feature that captures discriminative long-term scene information. The extracted scalogram requires less storage and can be classified faster and more accurately than classic Mel filter bank coefficients (FBank). Furthermore, a data augmentation scheme is adopted to improve the generalization of ASC systems; it extends the database iteratively using auxiliary classifier generative adversarial neural networks (ACGANs) and a deep learning-based sample filter. Experiments were conducted on datasets from the Detection and Classification of Acoustic Scenes and Events (DCASE) challenges. On the DCASE17 and DCASE19 datasets, the proposed techniques improved performance over the FBank classifier. Moreover, the ACGAN-based data augmentation scheme achieved an absolute accuracy improvement of 6.10% on recordings from unseen cities, far exceeding classic augmentation methods.
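The iterative augmentation scheme described in the abstract (generate candidates, filter them, add the survivors to the database, repeat) can be illustrated with a minimal control-flow sketch. This is not the authors' implementation: the generator below is a per-class Gaussian sampler standing in for the ACGAN, the filter is a nearest-centroid check standing in for the deep learning-based sample filter, and the function names `generate`, `keep_mask`, and `augment_iteratively`, as well as the parameters `rounds` and `n_per_class`, are hypothetical.

```python
import numpy as np

def generate(features, labels, n_per_class, rng):
    """Stand-in generator (assumption: the paper uses ACGANs instead).
    Samples from a per-class diagonal Gaussian fitted to the current data."""
    xs, ys = [], []
    for c in np.unique(labels):
        cls = features[labels == c]
        mean, std = cls.mean(axis=0), cls.std(axis=0) + 1e-6
        xs.append(rng.normal(mean, std, size=(n_per_class, features.shape[1])))
        ys.append(np.full(n_per_class, c))
    return np.concatenate(xs), np.concatenate(ys)

def keep_mask(features, labels, cand_x, cand_y):
    """Stand-in sample filter (assumption: the paper uses a deep model).
    Keeps a candidate only if its nearest class centroid matches its label."""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(cand_x[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)] == cand_y

def augment_iteratively(features, labels, rounds=3, n_per_class=20, seed=0):
    """Extend the dataset round by round with filtered synthetic samples."""
    rng = np.random.default_rng(seed)
    x, y = features.copy(), labels.copy()
    for _ in range(rounds):
        cand_x, cand_y = generate(x, y, n_per_class, rng)   # propose samples
        mask = keep_mask(x, y, cand_x, cand_y)              # reject poor ones
        x = np.concatenate([x, cand_x[mask]])               # grow the database
        y = np.concatenate([y, cand_y[mask]])
    return x, y

# Toy usage: two well-separated classes of 8-dimensional feature vectors.
toy_rng = np.random.default_rng(1)
real_x = np.concatenate([toy_rng.normal(0, 1, (30, 8)),
                         toy_rng.normal(4, 1, (30, 8))])
real_y = np.repeat([0, 1], 30)
aug_x, aug_y = augment_iteratively(real_x, real_y)
print(real_x.shape, "->", aug_x.shape)
```

In each round the generator is refitted on the enlarged database, which is the sense in which the loop is iterative; in the paper's setting the same loop would presumably retrain the ACGAN and the filter on the extended data.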