A Fine-Grained Modal Label-Based Multi-Stage Network for Multimodal Sentiment Analysis.
Junjie Peng, Ting Wu, Wenqiang Zhang, Feng Cheng, Shuhua Tan, Fen Yi, Yansong Huang
DOI: https://doi.org/10.1016/j.eswa.2023.119721
IF: 8.5
2023-01-01
Expert Systems with Applications
Abstract: Sentiment analysis is a challenging but valuable research topic in affective computing. It can improve the quality of various real-world applications, including financial market prediction, disease analysis, and even politics. As sentiment may be expressed through text, image, audio, video, etc., multimodal sentiment analysis has emerged to capture information from multiple sources. Taking video as an example, the analysis can be difficult because the modalities in a video are heterogeneous and may express different sentiments. To deal with such issues, a Fine-grained modal label-based Multi-Stage Network (FmlMSN) is proposed. Utilizing seven sentiment labels at the unimodal, bimodal and trimodal levels, the model focuses on information at different granularities from text, audio, image and their combinations. Meanwhile, inspired by stacking ensemble learning, which has seen only limited use in sentiment analysis, multi-stage training is performed for the acoustic-visual, visual-textual and acoustic-textual base learners. In each stage, the singleton and pair-wise modalities are interconnected by hard parameter sharing multi-task learning. Subsequently, the hidden bimodal features are used to train the meta-learner for the final sentiment prediction. Extensive experiments on three public datasets, one in Chinese and two in English, indicate that our model outperforms the existing state-of-the-art methods. Furthermore, empirical analysis suggests that the model is flexible and can reduce training time and computation to some extent.
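As an illustration of the label structure the abstract describes, the short sketch below enumerates the seven label granularities (three unimodal, three bimodal, one trimodal) and the three pair-wise base learners. It is a sketch, not the authors' implementation. The function names are hypothetical. The grouping of each pair with its two singleton modalities in one multi-task stage is an assumed reading of the abstract.

```python
from itertools import combinations

# Modality names as listed in the abstract.
MODALITIES = ("text", "audio", "image")


def label_granularities(modalities=MODALITIES):
    """Return every non-empty modality combination, smallest first.

    For three modalities this gives 3 unimodal + 3 bimodal + 1 trimodal
    combinations, matching the seven sentiment labels in the abstract.
    """
    return [
        combo
        for size in range(1, len(modalities) + 1)
        for combo in combinations(modalities, size)
    ]


def base_learner_tasks(modalities=MODALITIES):
    """Map each modality pair (one base learner) to its assumed task set.

    Assumption: each stage jointly trains the two singleton modalities
    and the pair itself; the abstract does not spell out the grouping.
    """
    return {
        pair: [(pair[0],), (pair[1],), pair]
        for pair in combinations(modalities, 2)
    }


labels = label_granularities()
stages = base_learner_tasks()

for pair, tasks in stages.items():
    print("base learner", pair, "-> tasks", tasks)
print("number of label granularities:", len(labels))
```

Under this reading, the hidden features of the three pair-wise base learners would then feed a meta-learner that predicts the trimodal label, which the sketch does not model.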