Automated Audio Data Augmentation Network Using Bi-Level Optimization for Sound Event Localization and Detection

Wenjie Zhang, Peng Yu, Jun Yin, Xiaoheng Jiang, Mingliang Xu
DOI: https://doi.org/10.1109/lsp.2024.3475350
2024-10-16
IEEE Signal Processing Letters
Abstract: In sound event localization and detection (SELD), conventional methods typically treat the localization and detection algorithms separately from data augmentation, and the augmentation strategy is applied in a fixed, non-learnable manner during training. As a result, existing audio data augmentation strategies struggle to find augmentation parameters that work well for SELD systems. To address this challenge, we introduce a network-based strategy, termed the Automated Audio Data Augmentation (AADA) network, which uses bi-level optimization to integrate audio data augmentation with the SELD task. In the AADA network, the lower-level SELD task serves as a constraint on the upper-level data augmentation process. The augmentation parameters are optimized adaptively using intermediate feature information passed from the SELD task, yielding parameters tailored to that task. Evaluated on the Sony-TAU Realistic Spatial Soundscapes 2023 (STARSS23) dataset, our approach achieves a SELD score of 0.4801, significantly outperforming all traditional data augmentation strategies for SELD.
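The bi-level structure described in the abstract can be illustrated with a deliberately tiny toy problem. This is a minimal sketch, not the AADA network: the SELD model is replaced by a scalar linear model `y_hat = w * x`, the augmentation is a single hypothetical gain `a` applied to training inputs, and the upper level uses a one-step unrolled hypergradient (a DARTS-style approximation), which may differ from how the paper actually couples the two levels. The lower level takes a gradient step on the augmented training loss; the upper level adjusts `a` so that the updated model does better on a held-out validation set whose inputs are recorded at twice the training gain. All names (`train_grad`, `bilevel`, etc.) are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy setup (illustration only, not the paper's method):
# model y_hat = w * x; learnable augmentation = gain a on training inputs.
x = np.linspace(-1.0, 1.0, 101)
x = x / np.sqrt(np.mean(x ** 2))   # normalise so mean(x**2) is 1 (up to rounding)
y = 3.0 * x                        # training targets
x_val, y_val = 2.0 * x, 3.0 * x    # validation inputs have twice the gain


def train_grad(w, a):
    """dL_train/dw for L_train = mean((w * a * x - y)**2)."""
    return 2.0 * a * np.mean(x * (w * a * x - y))


def train_grad_da(w, a):
    """d(train_grad)/da, the mixed second derivative d^2 L_train / (dw da)."""
    return 4.0 * a * w * np.mean(x ** 2) - 2.0 * np.mean(x * y)


def val_grad(w):
    """dL_val/dw for L_val = mean((w * x_val - y_val)**2)."""
    return 2.0 * np.mean(x_val * (w * x_val - y_val))


def bilevel(a=1.0, w=0.0, lr_w=0.05, lr_a=0.5, steps=1000):
    for _ in range(steps):
        # Lower level: one gradient step on the augmented training loss.
        w_next = w - lr_w * train_grad(w, a)
        # Upper level: one-step unrolled hypergradient dL_val(w_next)/da,
        # using dw_next/da = -lr_w * d^2 L_train / (dw da).
        hyper = val_grad(w_next) * (-lr_w * train_grad_da(w, a))
        a = a - lr_a * hyper
        w = w_next
    return a, w


a_opt, w_opt = bilevel()
print(f"augmentation gain a = {a_opt:.3f}, model weight w = {w_opt:.3f}")
```

The lower-level optimum is `w = 3 / a`, while the validation optimum is `w = 1.5`; the two agree only when `a = 2`, so the loop should settle near `a ≈ 2`, `w ≈ 1.5`. Unrolling a single lower-level step avoids differentiating through a full training run, at the cost of a biased hypergradient.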
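The abstract reports results as a single SELD score. As a hedged sketch, assuming the aggregate used in the DCASE Task 3 baselines (the paper may define or weight it differently), the score averages four error-like terms: the location-aware error rate `ER`, one minus the location-aware F-score `F`, the class-dependent localization error `LE` normalised by 180 degrees, and one minus the localization recall `LR`. Lower is better, and a perfect system scores 0. The function name `seld_score` and its argument names are illustrative.

```python
def seld_score(er, f, le_deg, lr):
    """Aggregate SELD score, assuming the DCASE-style definition.

    er     -- location-aware error rate (0 is best)
    f      -- location-aware F-score in [0, 1] (1 is best)
    le_deg -- localization error in degrees (0 is best, 180 is worst)
    lr     -- localization recall in [0, 1] (1 is best)
    """
    return (er + (1.0 - f) + le_deg / 180.0 + (1.0 - lr)) / 4.0


# Arbitrary illustrative inputs, not results from the paper.
print(seld_score(er=0.5, f=0.5, le_deg=18.0, lr=0.6))
```

Because every term is oriented so that smaller means better, a reported score such as 0.4801 should be read as "lower than the baselines".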
Subject category: Engineering, Electrical & Electronic