Abstract: Because information on social media spreads quickly and widely, false information and rumors can also spread rapidly on public platforms. Attackers can exploit false information to trigger public panic and disrupt social stability. Traditional multimodal sentiment analysis methods fuse multimodal features suboptimally, which lowers classification accuracy. To address this, this study introduces a novel emotion classification model that explicitly models the interaction between modalities, which direct fusion of multimodal features neglects, and thereby improves the model's ability to understand and generalize emotional semantics. Transformer encoder layers extract sentiment-related semantic encodings from the audio and text sequences. A bimodal feature interaction fusion attention mechanism then examines intramodal and intermodal correlations and captures contextual dependencies. The fused cross-modal features are fed into a classification layer for sentiment prediction. On the IEMOCAP dataset, the proposed model achieves an emotion recognition accuracy of 78.5% and an F1-score of 77.6%, improving on other mainstream multimodal emotion recognition methods across all reported metrics. These results indicate that combining the Transformer with an interactive attention mechanism lets the network capture the emotional features of utterances more fully. This research provides technical support for public sentiment security monitoring on social networks.
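The abstract does not specify how the bimodal interaction fusion attention is built internally. The following is a minimal pure-Python sketch that assumes scaled dot-product attention is used in two ways. Self-attention within each modality captures intramodal correlations, and cross-attention captures intermodal ones, with text querying audio and audio querying text. The four streams are then mean-pooled, concatenated, and passed to a linear softmax classifier. For brevity the sketch omits learned Q/K/V projections, multiple heads, and the Transformer encoders themselves. Its inputs are stand-ins for already-encoded sequences that share a hypothetical width `d`. All function names, sizes, and weights are illustrative assumptions and do not reproduce the authors' implementation.

```python
import math
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(xs: Vector) -> Vector:
    top = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(q: Matrix, k: Matrix) -> Matrix:
    """Row-wise softmax of Q K^T / sqrt(d); each query row sums to 1."""
    d = len(q[0])
    scores = matmul(q, transpose(k))
    return [softmax([s / math.sqrt(d) for s in row]) for row in scores]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    return matmul(attention_weights(q, k), v)


def mean_pool(m: Matrix) -> Vector:
    """Average over the time axis to get one vector per sequence."""
    return [sum(col) / len(m) for col in zip(*m)]


def interactive_fusion(text: Matrix, audio: Matrix) -> Vector:
    """Hypothetical bimodal fusion: intramodal plus intermodal attention."""
    # Intramodal correlations: each modality attends to itself.
    text_self = attention(text, text, text)
    audio_self = attention(audio, audio, audio)
    # Intermodal correlations: text queries audio and audio queries text.
    text_to_audio = attention(text, audio, audio)
    audio_to_text = attention(audio, text, text)
    # Pool each stream over time and concatenate into one fused vector (length 4*d).
    return (mean_pool(text_self) + mean_pool(audio_self)
            + mean_pool(text_to_audio) + mean_pool(audio_to_text))


def classify(fused: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Linear layer followed by softmax over emotion classes."""
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)


if __name__ == "__main__":
    random.seed(0)
    d, num_classes = 8, 4  # hypothetical sizes
    # Random stand-ins for encoder outputs; the modalities have different lengths.
    text = [[random.gauss(0, 1) for _ in range(d)] for _ in range(5)]
    audio = [[random.gauss(0, 1) for _ in range(d)] for _ in range(12)]
    fused = interactive_fusion(text, audio)
    weights = [[random.gauss(0, 0.1) for _ in range(len(fused))] for _ in range(num_classes)]
    probs = classify(fused, weights, [0.0] * num_classes)
    print(len(fused), [round(p, 3) for p in probs])
```

Cross-attention lets the two sequences differ in length (5 text positions and 12 audio frames in the usage example). Each query row attends over however many key rows the other modality provides, and pooling then brings every stream to a fixed width. In a trained model, the projections and classifier weights would be learned rather than sampled at random.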
