A cross modal hierarchical fusion multimodal sentiment analysis method based on multi-task learning
Lan Wang, Junjie Peng, Cangzhi Zheng, Tong Zhao, Li’an Zhu
DOI: https://doi.org/10.1016/j.ipm.2024.103675
IF: 7.466
2024-02-01
Information Processing & Management
Abstract: Humans often express affection and intention in multiple forms when communicating, involving the text, audio, and vision modalities. Judging the sentiment state from a single modality may be biased, whereas combining multiple cues gives access to more comprehensive information. Effective fusion of heterogeneous data is one of the core problems of multimodal sentiment analysis. Most cross-modal fusion strategies inevitably introduce noisy information, which results in low-quality joint feature representations and degrades the accuracy of sentiment classification. Considering the unique modality-specific cues, the common information shared between modalities, and the sentiment variability among different layers, we introduce multi-task learning and propose a cross-modal hierarchical fusion method for multimodal sentiment analysis. The model combines unimodal, bimodal, and trimodal tasks to enhance the multimodal feature representation for the final sentiment prediction. We conduct extensive experiments on CH-SIMS, CMU-MOSI, and CMU-MOSEI; the first dataset is in Chinese and the other two are in English. The results demonstrate the generalizability of the proposed method. Compared with existing models, it effectively improves the accuracy of sentiment analysis while reducing the adverse impact of noise.
computer science, information systems; information science & library science
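
The abstract says the model combines unimodal, bimodal, and trimodal tasks under multi-task learning but does not state how the task losses are combined. The following is a minimal sketch of one common choice, a weighted sum of per-task losses. The L1 loss, the task names, the weights, and the function names `l1_loss` and `multitask_loss` are all illustrative assumptions, not details taken from the paper.

```python
from typing import Dict, List


def l1_loss(pred: List[float], target: List[float]) -> float:
    """Mean absolute error between predicted and target sentiment scores."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def multitask_loss(
    preds: Dict[str, List[float]],
    targets: Dict[str, List[float]],
    weights: Dict[str, float],
) -> float:
    """Weighted sum of per-task losses (an assumed combination rule).

    Keys are hypothetical task names, e.g. "text" (unimodal),
    "text+audio" (bimodal), "trimodal" (final prediction).
    """
    return sum(
        weights[name] * l1_loss(preds[name], targets[name]) for name in preds
    )


# Toy example with one auxiliary unimodal task and the main trimodal task.
preds = {"text": [0.5, -0.5], "trimodal": [1.0, 0.0]}
targets = {"text": [0.0, 0.0], "trimodal": [1.0, 1.0]}
weights = {"text": 0.5, "trimodal": 1.0}  # hypothetical task weights
total = multitask_loss(preds, targets, weights)
```

In this kind of setup the auxiliary unimodal and bimodal terms act as extra supervision during training, while only the trimodal output is used for the final sentiment prediction.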