Abstract: Multimodal Sentiment Analysis (SA) is gaining popularity due to its broad application potential. Most existing studies have focused on the SA of a single modality, such as text or photos, which makes it difficult to handle social media data that combine multiple modalities. Moreover, most multimodal research has merely combined the two modalities rather than exploring their complex correlations, leading to unsatisfactory sentiment classification results. Motivated by this, we propose a new visual-textual sentiment classification model named Multi-Model Fusion (MMF), which uses a mixed fusion framework for SA to capture both the essential information in the visual and textual content and the intrinsic relationship between them. The proposed model comprises three deep neural networks. First, two separate neural networks extract the most emotionally relevant aspects of the image and text data, so that more discriminative features are gathered for accurate sentiment classification. Then, a multichannel joint fusion model with a self-attention technique exploits the intrinsic correlation between visual and textual features to obtain emotionally rich information for joint sentiment classification. Finally, the results of the three classifiers are integrated using a decision fusion scheme to improve the robustness and generalizability of the proposed model. An interpretable visual-textual sentiment classification model is further developed using Local Interpretable Model-agnostic Explanations (LIME) to ensure the model's explainability and resilience. The proposed MMF model has been tested on four real-world sentiment datasets, achieving 99.78% accuracy on Binary_Getty (BG), 99.12% on Binary_iStock (BIS), 95.70% on Twitter, and 79.06% on the Multi-View Sentiment Analysis (MVSA) dataset. These results demonstrate the superior performance of our MMF model over single-model approaches and current state-of-the-art techniques, as measured by the model evaluation criteria.
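The decision fusion step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which fusion rule MMF uses, so the weighted averaging of class probabilities below is an assumption, and the function name `decision_fusion`, its `weights` parameter, and the sample probabilities are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def decision_fusion(
    prob_sets: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> Tuple[int, List[float]]:
    """Fuse per-classifier class probabilities by weighted averaging.

    NOTE: weighted averaging is an assumed rule for illustration only;
    the abstract does not specify the scheme used by MMF.
    """
    n = len(prob_sets)
    if n == 0:
        raise ValueError("at least one classifier output is required")
    if weights is None:
        weights = [1.0] * n  # equal trust in every classifier
    if len(weights) != n:
        raise ValueError("one weight per classifier is required")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    num_classes = len(prob_sets[0])
    fused = [0.0] * num_classes
    for probs, w in zip(prob_sets, weights):
        if len(probs) != num_classes:
            raise ValueError("all classifiers must score the same classes")
        for c, p in enumerate(probs):
            fused[c] += (w / total) * p  # normalized weighted sum

    # Predicted label is the class with the highest fused probability.
    label = max(range(num_classes), key=fused.__getitem__)
    return label, fused


# Hypothetical outputs of the image, text, and joint-fusion classifiers
# for one post, over the classes (negative, positive).
image_probs = [0.40, 0.60]
text_probs = [0.70, 0.30]
joint_probs = [0.55, 0.45]
label, fused = decision_fusion([image_probs, text_probs, joint_probs])
print(label, fused)
```

In this sketch the image classifier alone would predict the second class, but the fused decision follows the agreement between the text and joint classifiers, which is the kind of robustness a decision-level scheme is meant to provide.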

Dynamic Invariant-Specific Representation Fusion Network for Multimodal Sentiment Analysis

MFDR: Multiple-stage Fusion and Dynamically Refined Network for Multimodal Emotion Recognition

Sentiment Analysis Using Deep Robust Complementary Fusion of Multi-Features and Multi-Modalities

Heterogeneous Hierarchical Fusion Network for Multimodal Sentiment Analysis in Real-World Environments

Multimodal Sentiment Analysis in Realistic Environments Based on Cross-Modal Hierarchical Fusion Network

Toward Robust Incomplete Multimodal Sentiment Analysis via Hierarchical Representation Learning

GSIFN: A Graph-Structured and Interlaced-Masked Multimodal Transformer-based Fusion Network for Multimodal Sentiment Analysis

Multi-Model Fusion Framework Using Deep Learning for Visual-Textual Sentiment Classification

FDR-MSA: Enhancing multimodal sentiment analysis through feature disentanglement and reconstruction

Multimodal Sentiment Analysis Method Based on Hierarchical Adaptive Feature Fusion Network

Hierarchical denoising representation disentanglement and dual-channel cross-modal-context interaction for multimodal sentiment analysis

A Multimodal Sentiment Analysis Method Integrating Multi-Layer Attention Interaction and Multi-Feature Enhancement

Application of Multimodal Data Fusion Attentive Dual Residual Generative Adversarial Network in Sentiment Recognition and Sentiment Analysis

MISA: Modality-Invariant and -Specific Representations for Multimodal Sentiment Analysis

Mutual information maximization and feature space separation and bi-bimodal modality fusion for multimodal sentiment analysis

Tri-Modalities Fusion for Multimodal Sentiment Analysis

Multimodal Sentiment Analysis Missing Modality Reconstruction Network Based on Shared-Specific Features

Multimodal Sentiment Analysis Using Multi-tensor Fusion Network with Cross-modal Modeling

SKEAFN: Sentiment Knowledge Enhanced Attention Fusion Network for multimodal sentiment analysis

Multimodal Sentiment Analysis Based on Composite Hierarchical Fusion

SentDep: Pioneering Fusion-Centric Multimodal Sentiment Analysis for Unprecedented Performance and Insights