An efficient multimodal sentiment analysis in social media using hybrid optimal multi-scale residual attention network

Murugesan, Kanipriya
DOI: https://doi.org/10.1007/s10462-023-10645-7
IF: 9.588
2024-02-06
Artificial Intelligence Review
Abstract: Sentiment analysis is a key component of many social media analysis projects. Prior research has concentrated on a single modality, such as text descriptions for visual information. In contrast to standard image databases, social images frequently connect to one another, making sentiment analysis challenging. Most current methods consider images individually, which makes them ineffective for interrelated images. In this paper, we propose a hybrid Arithmetic Optimization Algorithm-Hunger Games Search (AOA-HGS)-optimized Ensemble Multi-scale Residual Attention Network (EMRA-Net) technique that explores the correlations among modalities, including text, audio, social links, and video, for more effective multimodal sentiment analysis. The hybrid AOA-HGS technique learns complementary and comprehensive features. EMRA-Net uses two segments, an Ensemble Attention CNN (EA-CNN) and a Three-scale Residual Attention Convolutional Neural Network (TRA-CNN), to analyze multimodal sentiment. Adding the wavelet transform to TRA-CNN reduces the loss of spatial-domain image texture features. EA-CNN, a feature-level fusion technique, combines visual, audio, and textual information. Evaluated on the Multimodal Emotion Lines Dataset (MELD) and EmoryNLP datasets, the proposed method performs significantly better than the existing multimodal sentiment analysis techniques HALCB, HDF, and MMLatch. Moreover, even as the training-set size varies, the proposed method outperforms the other techniques in recall, accuracy, F-score, and precision, and takes less time to compute on both datasets.
computer science, artificial intelligence
What problem does this paper attempt to address?
### The Problem the Paper Attempts to Solve

This paper aims to address the issue of multimodal sentiment analysis in social media. Specifically, existing methods mostly focus on sentiment analysis of a single modality (such as text descriptions or visual information) and overlook the interrelationships between different modalities. Images in social media are often interrelated, making sentiment analysis more complex. Most existing methods handle different images separately and cannot effectively process interrelated images.

To solve these problems, the authors propose a Hybrid Arithmetic Optimization Algorithm-Hunger Games Search (AOA-HGS) optimized Ensemble Multi-scale Residual Attention Network (EMRA-Net) method. This method can comprehensively consider information from various modalities such as text, audio, and video, thereby achieving more effective multimodal sentiment analysis. Additionally, this method outperforms existing multimodal sentiment analysis techniques on the Multimodal Emotion Lines Dataset (MELD) and EmoryNLP datasets, and it also has a shorter computation time.
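
The abstract describes feature-level fusion of visual, audio, and textual information but gives no implementation detail. As a rough illustration of the general idea only, the sketch below weights each modality's feature vector with a softmax over per-modality relevance scores and concatenates the results. It is not the paper's EA-CNN: the function names, the toy feature values, and the relevance scores are all hypothetical, and a real system would learn the weights rather than take them as input.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_features(modalities, scores):
    """Feature-level fusion sketch (hypothetical, not the paper's EA-CNN).

    modalities: dict mapping a modality name to its feature vector (list of floats)
    scores:     dict mapping the same names to a raw relevance score
    Returns the fused vector and the attention weight used for each modality.
    """
    if set(modalities) != set(scores):
        raise ValueError("modalities and scores must have the same keys")
    names = sorted(modalities)  # fixed order so the fused layout is reproducible
    weights = softmax([scores[name] for name in names])
    fused = []
    for name, weight in zip(names, weights):
        # Scale every feature of this modality, then append to the joint vector.
        fused.extend(weight * value for value in modalities[name])
    return fused, dict(zip(names, weights))


# Toy inputs: made-up features for one utterance, one vector per modality.
features = {
    "text": [0.2, 0.7],
    "audio": [0.5],
    "visual": [0.1, 0.3, 0.9],
}
relevance = {"text": 1.0, "audio": 0.0, "visual": 0.5}

fused, weights = fuse_features(features, relevance)
print(len(fused), weights)
```

The fused vector keeps every modality's features side by side, so a downstream classifier can pick up cross-modal correlations, which is the motivation the abstract gives for fusing at the feature level rather than analyzing each modality separately.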