Abstract: Information spreads quickly and widely on social media, which also lets false information and rumors propagate rapidly on public platforms. Attackers can exploit false information to trigger public panic and disrupt social stability. Traditional multimodal sentiment analysis methods fuse multimodal features suboptimally, which reduces classification accuracy. To address these issues, this study introduces a novel emotion classification model. The model accounts for the interaction between modalities, which direct fusion of multimodal features neglects. The Transformer's encoding layer is applied to extract sentiment semantic encodings from audio and textual sequences. Subsequently, a bimodal feature interaction fusion attention mechanism examines intramodal and intermodal correlations and captures contextual dependencies. This approach enhances the model's capacity to comprehend and generalize sentiment semantics. The cross-modal fused features are fed into the classification layer for sentiment prediction. Experiments on the IEMOCAP dataset show that the proposed model achieves an emotion recognition accuracy of 78.5% and an F1-score of 77.6%. Compared with other mainstream multimodal emotion recognition methods, the proposed model shows significant improvements in all metrics. The results indicate that the proposed method, based on the Transformer and an interactive attention mechanism, captures the emotional features of discourse more fully. This research provides technical support for monitoring public sentiment security on social networks.
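
The intermodal attention step described in the abstract can be illustrated with a minimal sketch. This is not the authors' architecture: it assumes plain scaled dot-product attention with no learned projections, a single head, and mean pooling before concatenation, and the names `cross_attention`, `mean_pool`, `fuse`, `text_seq`, and `audio_seq` are hypothetical. The toy vectors stand in for encoder outputs and are not real IEMOCAP features.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    # Scaled dot-product attention: every query row attends over all key rows.
    # Assumption: no learned Q/K/V projections, single head.
    d = len(keys[0])
    attended, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        attended.append(
            [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        )
    return attended, weights

def mean_pool(rows):
    # Average a sequence of vectors into one vector.
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def fuse(text_seq, audio_seq):
    # Intermodal interaction in both directions, then pool and concatenate.
    text_ctx, _ = cross_attention(text_seq, audio_seq, audio_seq)   # text attends to audio
    audio_ctx, _ = cross_attention(audio_seq, text_seq, text_seq)   # audio attends to text
    return mean_pool(text_ctx) + mean_pool(audio_ctx)

# Toy stand-ins for encoded sequences: 3 text tokens and 2 audio frames, dimension 4.
text_seq = [[1.0, 0.0, 0.5, 0.2], [0.3, 0.8, 0.1, 0.0], [0.0, 0.2, 0.9, 0.4]]
audio_seq = [[0.5, 0.1, 0.0, 0.7], [0.2, 0.6, 0.3, 0.1]]

fused = fuse(text_seq, audio_seq)
print(len(fused))  # fused vector length: 4 (text side) + 4 (audio side)
```

Calling `cross_attention` with the same sequence as queries, keys, and values gives the intramodal (self-attention) case. In a full model the fused vector would feed a classification layer, which this sketch omits.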
