Sentiment Analysis of Social Media Comments Based on Multimodal Attention Fusion Network
Ziyu Liu, Tao Yang, Wen Chen, Jiangchuan Chen, Qinru Li, Jun Zhang
DOI: https://doi.org/10.1016/j.asoc.2024.112011
IF: 8.7
2024-01-01
Applied Soft Computing
Abstract: Social media comments are no longer confined to a single textual modality; they are heterogeneous data spanning multiple modalities such as vision, sound, and text, which is why multimodal sentiment analysis strategies have been introduced. However, most current multimodal sentiment analysis models adopt the Transformer architecture because of its strong impact and benefits, which increases resource overhead. In this paper, a multimodal attention fusion (MAF) network model is proposed for sentiment analysis of multimodal data. MAF is mainly composed of a cross attention unit and a residual unit. The cross attention unit selects one of the three modalities as the core modality, while the remaining modalities serve as base modalities. The core modality is then combined with the base modality information to facilitate significant interaction between the two modalities, resulting in three sets of pairwise attention computations. Moreover, a residual unit is employed to integrate the overall information into the attentional information. This approach not only enables modality-to-modality interaction but also supplements the overall information. Finally, experiments are conducted on two publicly available multimodal sentiment analysis datasets from Carnegie Mellon University (CMU), CMU-MOSEI (abbreviated as MOSEI) and CMU-MOSI (abbreviated as MOSI). The results validate that the method achieves high performance while removing the complex structure and that, in an ordinary hardware environment, it is comparable to state-of-the-art (SOTA) models run on high-performance A100 and V100 graphics processing units (GPUs).
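To make the core/base interaction concrete, the following is a minimal sketch of cross attention followed by a residual addition, written in plain Python. It is not the authors' implementation: the abstract does not give the layer definitions, so the function names (`cross_attention`, `mean_pool`, `maf_sketch`), the omission of learned projections, the mean pooling, and the choice of "overall information" as the average of the pooled modalities are all assumptions made for illustration. The sketch also assumes every modality has already been mapped to the same feature dimension.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(core, base):
    """Scaled dot-product attention with queries from `core` and
    keys/values from `base`. Learned projections are omitted (assumption)."""
    d = len(core[0])
    out = []
    for q in core:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in base]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, base)) for j in range(d)])
    return out


def mean_pool(seq):
    """Average a sequence of feature vectors into a single vector."""
    n = len(seq)
    return [sum(row[j] for row in seq) / n for j in range(len(seq[0]))]


def maf_sketch(core, bases):
    """Hypothetical fusion step: one attended vector per (core, base) pair,
    each with an 'overall information' vector added as a residual."""
    attended = [mean_pool(cross_attention(core, b)) for b in bases]
    # Assumption: "overall information" is the mean of the pooled raw modalities.
    pooled = [mean_pool(m) for m in [core] + bases]
    overall = [sum(p[j] for p in pooled) / len(pooled) for j in range(len(pooled[0]))]
    # Residual connection: attended information plus overall information.
    return [[a + o for a, o in zip(vec, overall)] for vec in attended]


# Toy inputs: three modalities with different sequence lengths, feature size 2.
text = [[1.0, 0.0], [0.0, 1.0]]
audio = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
vision = [[0.2, 0.8]]

fused = maf_sketch(text, [audio, vision])  # text plays the core modality here
```

Here `text` is the core modality and `audio` and `vision` are the base modalities, so `fused` holds one vector per core/base pair. How the paper arrives at three sets of pairwise computations is not specified in the abstract; calling `maf_sketch` once with each modality as the core is one possible reading, not a confirmed one.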