A Multi-level Attention Fusion Network for Weakly Supervised Audio Classification

Weibo Zhao, Yuan He, Junsheng Mu, Xiaojun Jing
DOI: https://doi.org/10.1007/978-981-33-4102-9_83
2020-01-01
Abstract: Audio classification aims to distinguish different kinds of sounds and is of great importance to artificial intelligence applications. Nevertheless, challenges remain in this field, especially in the classification of weakly labeled audio signals. An audio clip contains both temporal and spatial information, but existing methods exploit only part of it, which leaves room to improve classification performance. To improve classification accuracy, we propose a multi-level attention fusion network (MLAFNet) based on deep supervision, which comprises a multi-attention fusion (MAF) module and a multi-level fusion (MLF) module. The MAF module takes full advantage of information from both the time and space domains. The MLF module, built on a deep supervision strategy, combines coarse-grained and fine-grained information. Extensive experiments on Google Audio Set demonstrate that the proposed network outperforms several state-of-the-art approaches, achieving an AUC of 0.970 and a d-prime of 2.652.
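The two reported metrics are closely related. In Audio Set evaluations, d-prime is commonly derived from AUC as d' = sqrt(2) * Phi^{-1}(AUC), where Phi^{-1} is the inverse CDF of the standard normal distribution. The following is a minimal sketch of that conversion, assuming the paper follows this convention; the function name `d_prime_from_auc` is hypothetical and does not come from the paper.

```python
import math
from statistics import NormalDist


def d_prime_from_auc(auc: float) -> float:
    """Convert an AUC value to d-prime.

    Assumes the common Audio Set convention d' = sqrt(2) * Phi^{-1}(AUC);
    the paper itself does not state which formula it uses.
    """
    if not 0.0 < auc < 1.0:
        raise ValueError("AUC must lie strictly between 0 and 1")
    # Phi^{-1}: inverse CDF (quantile function) of the standard normal.
    z = NormalDist().inv_cdf(auc)
    return math.sqrt(2.0) * z


if __name__ == "__main__":
    # AUC as reported in the abstract, rounded to three decimals.
    print(round(d_prime_from_auc(0.970), 3))
```

Applied to the rounded AUC of 0.970, this gives a value near 2.66, close to the reported 2.652. A small gap is expected because the published AUC is rounded, and because averaging per-class d-prime values is not the same as converting an averaged AUC.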