A Multimodal Sentiment Analysis Method Integrating Multi-Layer Attention Interaction and Multi-Feature Enhancement

Shengfeng Xie, Jingwei Li
DOI: https://doi.org/10.4018/ijitsa.335940
2024-01-12
International Journal of Information Technologies and Systems Approach
Abstract: Current multimodal sentiment analysis (MSA) methods represent text semantics insufficiently and lack deep fusion between intra-modal and intermodal information. To address these issues, a new method integrating multi-layer attention interaction and multi-feature enhancement (AM-MF) is proposed. First, multimodal feature extraction (MFE) is performed with RoBERTa, ResNet, and ViT models for text, audio, and video information, and high-level features of the three modalities are obtained through self-attention mechanisms. Then, a cross-modal attention (CMA) interaction module based on the Transformer is constructed to fuse features across modalities. Finally, a soft attention mechanism deeply fuses the intra-modal and intermodal information to perform multimodal sentiment classification. Experimental results on the CH-SIMS and CMU-MOSEI datasets show that the classification results of the proposed MSA method are significantly superior to those of other advanced comparative methods.
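The abstract gives no equations or code, so the following is only a minimal NumPy sketch of the two fusion steps it names: cross-modal attention between modalities, then soft attention over the resulting representations. Everything concrete here is an assumption for illustration, not the authors' AM-MF implementation: the random arrays stand in for RoBERTa, ResNet, and ViT features, the function names and the scoring vector `w` are hypothetical, learned projections and multi-head structure are omitted, and mean pooling over time is a simplification.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(query_feats, context_feats):
    # Scaled dot-product attention in which one modality (query) attends
    # to another (context). Shapes: (Tq, d) and (Tc, d) -> (Tq, d).
    # Learned Q/K/V projections are omitted (assumption, for brevity).
    d = query_feats.shape[-1]
    scores = query_feats @ context_feats.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ context_feats

def soft_attention_fusion(vectors, w):
    # Soft attention over n candidate vectors of size d.
    # `w` is a hypothetical scoring vector that would be learned in practice.
    scores = np.tanh(vectors) @ w
    alpha = softmax(scores)
    return alpha @ vectors, alpha

rng = np.random.default_rng(0)
d = 8
text = rng.normal(size=(5, d))   # placeholder for text features
audio = rng.normal(size=(7, d))  # placeholder for audio features
video = rng.normal(size=(6, d))  # placeholder for video features

# Intermodal information: text attends to audio and to video.
text_from_audio = cross_modal_attention(text, audio)
text_from_video = cross_modal_attention(text, video)

# Pool each sequence over time, then fuse intra-modal and intermodal vectors.
candidates = np.stack([
    text.mean(axis=0),
    text_from_audio.mean(axis=0),
    text_from_video.mean(axis=0),
])
w = rng.normal(size=d)
fused, alpha = soft_attention_fusion(candidates, w)
```

In this sketch `fused` would feed a classifier head, and `alpha` shows how much weight the fusion step gives to the intra-modal text vector versus the two intermodal vectors.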