Audio-visual scene recognition using attention-based graph convolutional model
Ziqi Wang, Yikai Wu, Yifan Wang, Wenjuan Gong, and Jordi Gonzàlez
DOI: https://doi.org/10.1007/s11042-024-19654-2
IF: 2.577
2024-06-19
Multimedia Tools and Applications
Abstract: Scene recognition aims to automatically understand scenes and is widely used in fields such as autonomous driving, intelligent security, and robotics. Current research predominantly employs local audio feature extractors, so the extracted features cannot capture long-range contextual characteristics. Moreover, most studies assume that the extracted features of each modality are equally important. Our work introduces a long-range audio feature extractor and employs a self-attention module to re-weight different features, addressing both the limitations of local audio features and the varying importance of different modalities. We propose a visual-audio fusion model based on a self-attention-based graph convolutional neural network (SAGCN). In this model, we introduce an attention-based cross-modal learning module into a structured multi-modal fusion network and integrate the features extracted from different modalities to achieve precise scene recognition. The proposed model achieves an accuracy of 93.1% on a standard multi-modal scene recognition dataset, the TAU dataset. Compared with standard early and late fusion methods, prediction accuracy improves by 1.4% and 10%, respectively. Compared with state-of-the-art (SOTA) methods, SAGCN exceeded the TAU baseline and the attentional graph convolutional network on the TAU dataset by 8.3% and 1.5%, respectively, and achieved 95.0% accuracy on the UCF101 dataset, outperforming the evolved loss method by 1.2% and the cross-modal deep clustering method by 0.8%. The code is available at https://github.com/submission1234/SAGCN.
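The abstract describes SAGCN only at a high level. As a rough illustration of the two ingredients it names (self-attention that re-weights modality features, followed by graph convolution over a graph whose nodes are modalities), here is a minimal pure-Python sketch. It is not the authors' implementation, which is in their repository. The two-node audio/visual graph, the identity query/key/value projections, the feature sizes and weights, and every helper name (`self_attention`, `normalized_adjacency`, `gcn_layer`, `classify`) are hypothetical choices made for illustration.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix) -> Tuple[Matrix, Matrix]:
    """Scaled dot-product self-attention over modality tokens (rows of x).
    Assumption: identity query/key/value projections, i.e. no learned weights."""
    d = len(x[0])
    scores = matmul(x, transpose(x))
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, x), weights


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(h: Matrix, adj_norm: Matrix, w: Matrix) -> Matrix:
    """One graph-convolution layer: ReLU(A_norm @ H @ W)."""
    out = matmul(matmul(adj_norm, h), w)
    return [[max(0.0, v) for v in row] for row in out]


def classify(audio: List[float], visual: List[float],
             w_gcn: Matrix, w_cls: Matrix) -> Tuple[List[float], Matrix]:
    # Hypothetical pipeline: one node per modality, attention re-weighting,
    # one GCN layer, mean pooling over nodes, then a linear classifier.
    attended, attn = self_attention([audio, visual])
    adj = [[0.0, 1.0], [1.0, 0.0]]  # audio and visual nodes are connected
    h = gcn_layer(attended, normalized_adjacency(adj), w_gcn)
    pooled = [sum(col) / len(h) for col in zip(*h)]
    logits = matmul([pooled], w_cls)[0]
    return softmax(logits), attn


# Toy inputs: 3-dimensional features, 2 hidden units, 3 scene classes.
audio = [0.2, 0.8, 0.1]
visual = [0.9, 0.1, 0.4]
w_gcn = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
w_cls = [[1.0, -1.0, 0.0], [0.0, 1.0, -1.0]]
probs, attn = classify(audio, visual, w_gcn, w_cls)
print("attention weights:", attn)
print("class probabilities:", probs)
```

In a real model the attention projections and the graph weights would be learned, and the graph could contain more nodes than one per modality; the sketch keeps them fixed only so that each step (attention re-weighting, normalized propagation, pooling, classification) is visible.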
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering